Real World Experiences of Hybrid IT – Part 2
My previous post provided a short introduction to the series.
In this post, I’m going to be focusing on public cloud. I’ll cover my day-to-day experiences in using the major hyperscalers, costs, how I think about managing them, and my reasons for adopting hybrid IT as an operational model.
By now, almost everyone reading this should have had some experience with using a public cloud service. Personally, I’m using public cloud services daily. Being an independent consultant, I run my own company and I want to be able to focus on my customers, not on running my own IT.
When setting up my business, it made perfect sense for me to utilize public offerings such as Office 365 to get my communication and business tools up and running. When I want to work on new services or train myself, I augment my home lab with resources within the public clouds. For these use cases, this makes perfect sense: SaaS products are cheap, reliable, and easy to set up. Short-lived virtual machines or services for development/testing/training are also cost effective when used the right way.
This works for me, but I’m a team of one. Does this experience translate well into other cases? That’s an interesting question because it isn’t one-size-fits-all. I’ve worked with a wide range of customers over the years, and there are many different starting points for public cloud. The most important part of any cloud journey is understanding what tools to use in what locations. I’ve seen lift-and-shift style migrations to the cloud, use of cloud resources for discrete workloads like test/development, consumption of native services only, and every combination in between. Each of these have pros and cons, and there are areas of consideration that are sometimes overlooked in making these decisions.
I want to break down my experiences into the three areas where I’ve seen cost concerns arise, and how planning a hybrid IT approach can help mitigate these.
All public clouds offer many forms of virtual machines, ranging in cost, size, and capabilities. The provisioning model of the cloud make these easy to consume and adopt, but this is a double-edged sword. There are several brilliant use cases for these machines. It could be that you have a short-term need for additional compute power to supplement your existing estate. It might be that you need to perform some testing and need the extra resources available to you. Other options include being able to utilize hardware that you wouldn’t traditionally own, such as GPU-enabled platforms.
When planned out correctly, these use cases make financial sense. It is a short-term need that can be fulfilled quickly and allows business agility. The cost vs. benefit is clear. On the flip side, leaving these services running long-term can start to spiral out of control. From my own test environment, I know that a running VM that you forget about can run up bills very quickly, and while my own environment and use cases are relatively small even for me, bills into the hundreds of pounds (or dollars) per month for a couple of machines that I had forgotten to shut down/destroy are common. Now multiply that to enterprise scale, and the bill can become very difficult to justify, let alone manage. Early adopters that took a lift-and-shift approach to cloud without a clear plan to refactor applications are now hitting these concerns. Initial savings of moving expenditure from Cap-Ex to Op-Ex masked the long-term impact to the business.
The cloud offers easy access to many different types of storage. The prices can vary depending on the requirements of that storage. Primary tier storage is generally the most expensive to consume, while archive tiers are the cheapest. Features of storage in the cloud are improving all the time and catching up to what we have come to expect from the enterprise storage systems we have used on-premises over the years, but often at additional cost.
Consuming primary tier storage for long-term usage quickly adds up. This goes hand in hand with the examples of monitoring VM usage above. We can’t always plan for growth within our businesses, however, what starts off as small usage can quickly grow to multiple TB/PB. Managing this growth long-term is important for keeping costs low; ensuring that only the required data is kept on the most expensive tiers is key. We’ve seen many public examples where this kind of growth has required a rethink. The most recent example that comes to mind is that of Dropbox. That might be an exception to the rule, however, it highlights the need to be able to support data migrations either between cloud services or back to on-premises systems.
Getting data to the cloud is now a relatively easy task and, in most instances, incurs little to no cost. However, moving data within or between clouds, and in cases of repatriation, back to your own systems, does incur cost. In my experience, these are often overlooked. Sprawl within the cloud, adopting new features in different regions, or running a multi-region implementation can increase both traffic between services and associated costs.
Starting with a good design and maintaining locality between services helps minimize this. Ensuring the data you need is as close to the application as possible and minimal traffic is widely distributed needs to be considered from the very start of a cloud journey.
Why a Hybrid IT Mindset?
With those examples in mind, why should we adopt a hybrid IT mindset? Having all the tools available to you within your toolbox allows for solutions to be designed that maximize the efficiencies and productivity of your business whilst avoiding growing costs. Keeping long-running services that are low on the refactor/replace priorities in on-premises infrastructure is often the most cost-effective method. Allowing the data generated by these services to take advantage of cloud-native technologies and systems that would be too costly to develop internally (such as artificial intelligence or machine learning) gives your business the best of both worlds. If you have short-term requirements for additional capacity or short-lived workloads like dev/test, giving your teams access to resources in the cloud can speed up productivity and even out spikes in demand throughout the year. The key is in assessing the lifetime of the workload and what the overall costs of the services consumed are.
Next time I’ll look at how we build better on-premises data centers to adopt cloud-like practices and consumption.