Organizations large and small are taking advantage of cloud computing. This can be as simple as using Office 365 or as involved as migrating all of their business applications from a data center into the cloud. We’ve all witnessed the explosive growth of cloud applications over the past several years.
When organizations start considering migrating business applications to the cloud, they face many considerations, including security, compliance, cost, and disaster recovery. Eventually, the discussion evolves into which systems or apps should be migrated first.
Once systems are identified, the discussion turns to the type of workload each system has and what it will look like in the cloud. In this post, I’ll explain what cloud workloads are and the different types of workloads to know about. I’ll also discuss important considerations and what it takes to be successful as you assess whether your current workloads are good candidates for migrating to the cloud.
What Is a Workload in Cloud Computing?
What exactly is a cloud workload? A cloud workload is essentially a resource or service consuming computing power in the cloud. This could be a web service, application server, database server, or any other business process.
When I’m working with customers on cloud migrations, I always like to evaluate the type of workload they have before diving too deep into planning. Even database workloads can fall into one of several different categories.
Cloud Workload Types Explained
Some workloads fit into a general compute tier. From a database workload perspective, I would generalize this as a departmental-level workload: for example, an application that serves a single department, such as accounting or human resources, within a much larger organization.
Some database workloads are much more CPU intensive and demand a higher number of vCores. Along the same lines as workloads that demand more CPU, we have workloads that are much more memory intensive. Memory-intensive workloads are very common with databases because relational database systems like SQL Server cache data in memory. Traditionally, to get more memory, you had to scale up CPU too. With Microsoft Azure, we can now select “constrained core” VMs that give you the benefits of a larger VM size with a reduced core count. For example, say I only need eight vCores for my VM, but I need the 128GB of memory that is only available with the 16 vCore model. I could select a constrained core version of the 16 vCore VM that has been reduced to eight vCores. All other resources of the 16 vCore model remain available, such as memory, storage throughput, network throughput, and more.
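To make the constrained core idea concrete, here is a minimal Python sketch that picks the smallest VM size meeting a memory requirement and then uses a constrained core variant when the full size has more cores than needed. The catalog, size names, and constrained options are hypothetical placeholders, not real Azure sizes; check your provider’s published size list before relying on any numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSize:
    name: str
    vcores: int
    memory_gb: int
    # vCore counts this size can be constrained down to (hypothetical values)
    constrained_options: tuple[int, ...] = ()


# Hypothetical catalog for illustration only; real sizes and limits differ.
CATALOG = [
    VmSize("example-8", 8, 64, (2, 4)),
    VmSize("example-16", 16, 128, (4, 8)),
    VmSize("example-32", 32, 256, (8, 16)),
]


def pick_vm(required_vcores: int, required_memory_gb: int) -> tuple[str, int]:
    """Return (size name, active vCores) for the smallest size that meets the
    memory and CPU requirement, constraining cores when the full size has more."""
    for size in sorted(CATALOG, key=lambda s: s.memory_gb):
        if size.memory_gb < required_memory_gb or size.vcores < required_vcores:
            continue
        # Smallest constrained option that still covers the CPU requirement.
        fits = [c for c in size.constrained_options if c >= required_vcores]
        if fits:
            return size.name, min(fits)
        return size.name, size.vcores
    raise ValueError("no size in the catalog satisfies the requirement")


print(pick_vm(8, 128))  # ('example-16', 8)
```

The memory requirement picks the size, and the CPU requirement only decides how many of that size’s cores stay active, which mirrors the eight-of-sixteen example above.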
Most cloud providers also have a storage optimized version of their cloud offerings. These are typically used for large data warehouses, NoSQL databases, and other workloads that put a lot of demand on storage.
It’s worth noting most cloud providers also have a GPU optimized workload offering. I typically don’t see these needed for database workloads; however, with the close integration of machine learning, artificial intelligence, and advanced analytics, these GPU optimized resources could come into play.
Some cloud providers like Azure have options for periodic workloads. For certain database platforms, such as Azure SQL Database, we have a serverless tier. Serverless automatically scales compute resources as the workload demands. There’s also an auto-pause feature that pauses the compute resources when the database is inactive, which can save on compute cost. Automatic scaling is common for things like web services; however, having this capability for a database workload is much less common.
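The auto-pause behavior can be illustrated with a simplified simulation. This Python sketch counts billable intervals when compute pauses after a configurable number of idle intervals and resumes on the next active one. The interval granularity and the `auto_pause_delay` parameter are assumptions for illustration; the model ignores how active compute is actually priced and any resume latency.

```python
def billed_intervals(activity: list[bool], auto_pause_delay: int) -> int:
    """Count intervals billed under a simplified auto-pause model.

    activity[i] is True when the database receives work in interval i.
    Idle intervals are still billed until more than `auto_pause_delay`
    of them occur in a row; compute then stays paused (unbilled) until
    the next active interval resumes it. The database starts online.
    """
    billed = 0
    idle_run = 0
    paused = False
    for active in activity:
        if active:
            paused = False
            idle_run = 0
        else:
            idle_run += 1
            if idle_run > auto_pause_delay:
                paused = True
        if not paused:
            billed += 1
    return billed


# Hypothetical day: activity only during business hours (08:00 to 18:00),
# modeled in one-hour intervals with a one-interval pause delay.
business_hours = [8 <= hour < 18 for hour in range(24)]
print(billed_intervals(business_hours, auto_pause_delay=1))  # 12
```

Compared with 24 billed hours for always-on compute, the gap in this toy model is the kind of saving auto-pause targets for periodic workloads.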
For unpredictable workloads, cloud providers may provide a pool of resources that various systems can share. For Azure SQL Database, this feature is called Elastic Pools. An elastic pool lets an organization set a shared limit of vCores or eDTUs that multiple single databases draw from. If the workloads are balanced properly, so that the databases don’t all peak at the same time, this can save the organization a significant amount of money by avoiding over-allocating resources to handle short bursts of demand.
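Why pooling saves money comes down to the difference between the sum of each database’s individual peak and the peak of their combined usage. This Python sketch compares the two for three hypothetical databases whose peaks don’t overlap; the usage numbers are invented, and the series are assumed to be aligned samples over the same time period.

```python
def sum_of_peaks(usage: dict[str, list[float]]) -> float:
    """vCores needed if each database is provisioned for its own peak."""
    return sum(max(series) for series in usage.values())


def peak_of_sum(usage: dict[str, list[float]]) -> float:
    """vCores needed if the databases share a pool sized for their combined peak.
    Assumes every series has the same length and covers the same time slots."""
    return max(sum(slot) for slot in zip(*usage.values()))


# Hypothetical hourly vCore usage for databases that peak at different times.
usage = {
    "payroll":   [1, 1, 4, 1, 1, 1],
    "hr":        [1, 4, 1, 1, 1, 1],
    "reporting": [1, 1, 1, 1, 4, 1],
}
print(sum_of_peaks(usage))  # 12
print(peak_of_sum(usage))   # 6
```

If the same databases all peaked in the same hour, the two numbers would converge, which is why balancing the workloads in a pool matters.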
Another common scenario is a Hybrid workload. This is increasingly popular with organizations that want to take advantage of cloud features while keeping some systems on-premises. With VPNs and fast dedicated connections like ExpressRoute with Azure, FastConnect with Oracle, or Direct Connect with AWS, it’s becoming much easier to create hybrid cloud environments by connecting or extending on-premises locations with private and/or public cloud infrastructure, or in some cases, to create multi-cloud environments.
Some common Hybrid scenarios I encounter with database workloads are using Power BI to connect to local data sets for reporting, or Azure Data Factory to connect to local data sets for ETL processes. Another common scenario is using the cloud for disaster recovery, either by storing backup files in the cloud or by extending availability groups or log shipping to a cloud VM.
Top Considerations for Migrating a Workload
When considering moving a workload to the cloud, you must determine what resource specifications are needed. All too often, on-premises environments are oversized to accommodate future growth. A huge benefit of the cloud is that you can easily scale up resources as your business and workload grow. For this reason, organizations should focus on what their existing workloads actually need, not just what is currently provisioned.
For CPU, memory, and storage capacity, sizing is fairly straightforward. If I have a database server with eight logical cores and I’m averaging 60–70% CPU load during normal business hours, I’ll make sure I’m using a VM with a processor of similar clock speed and stay with at least eight logical cores. I handle memory in a similar fashion: I review current memory counters to see how long data stays in the buffer pool and make sure the new system has adequate memory. I’ve been involved in projects where the on-premises server was so severely under-provisioned that we built out the cloud server with more memory, as well as projects where servers were vastly over-provisioned, so we backed off and selected a more reasonable amount. Storage capacity is the simplest: provision what is currently used plus what is forecast for the near future.
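The sizing reasoning above can be expressed as simple arithmetic. In this Python sketch, the core count never drops below what the server has today, as described above, and grows only if the observed average load would exceed a target utilization; the 70% target and the linear storage growth model are assumptions for illustration, not fixed rules.

```python
import math


def size_cores(current_cores: int, avg_cpu_pct: int, target_cpu_pct: int = 70) -> int:
    """Keep at least the current core count; add cores only if the observed
    average load would exceed the (assumed) target utilization."""
    needed = math.ceil(current_cores * avg_cpu_pct / target_cpu_pct)
    return max(current_cores, needed)


def size_storage_gb(used_gb: int, monthly_growth_gb: int, months_ahead: int) -> int:
    """Provision what is used today plus forecast near-term growth (linear model)."""
    return used_gb + monthly_growth_gb * months_ahead


print(size_cores(8, 65))             # 8: the 60-70% case stays at eight cores
print(size_cores(8, 90))             # 11: a heavily loaded server needs more
print(size_storage_gb(500, 20, 12))  # 740
```

In practice you would round the core result up to the next size your provider actually offers.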
Benefits of Capturing Storage Throughput Through Automation
An area consistently overlooked is storage throughput: how much data is read from and written to disk. Many organizations monitor disk latency, which is important; however, they may not focus on overall throughput because they aren’t being throttled by their local or network storage on-premises.
Traditionally, we’d measure storage throughput by running a benchmark against the existing storage to see what it’s capable of. That’s a great way to stress test storage; however, it shows what the storage can deliver rather than what the workload actually consumes, so it’s not a good way to size a new cloud environment. What’s critical is to capture the existing workload’s storage throughput so you can size the cloud environment properly. Often this isn’t done, and the organization suffers performance issues after migrating that ultimately cause reputational damage and possibly unforeseen budget issues from having to resize the infrastructure.
To capture storage throughput on a SQL Server database server, one of the simplest methods is to use Performance Monitor (perfmon) to capture ‘Disk Read Bytes/sec’ and ‘Disk Write Bytes/sec,’ then convert the values to MB/s for easier comparison. Another option I use more frequently is the sys.dm_io_virtual_file_stats dynamic management function, which returns cumulative I/O statistics for each database file since the SQL Server service last started.
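Once a perfmon log has been exported to CSV (for example, with relog), the conversion to MB/s is straightforward. This Python sketch averages every column whose counter name ends in ‘Disk Read Bytes/sec’ or ‘Disk Write Bytes/sec’ and converts bytes to MB, assuming 1 MB = 1,048,576 bytes. The server name and drive letter in the sample are hypothetical, and the column layout is an assumption about a typical export.

```python
import csv
import io

BYTES_PER_MB = 1024 * 1024  # assumption: MB here means 2**20 bytes
COUNTERS = ("Disk Read Bytes/sec", "Disk Write Bytes/sec")


def perfmon_throughput_mb_s(csv_text: str) -> dict[str, float]:
    """Average MB/s for each disk read/write bytes counter column in a perfmon CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = {i: name for i, name in enumerate(header) if name.endswith(COUNTERS)}
    totals = {name: 0.0 for name in wanted.values()}
    counts = {name: 0 for name in wanted.values()}
    for row in reader:
        for i, name in wanted.items():
            value = row[i].strip() if i < len(row) else ""
            if value:  # skip blank cells, which can appear in exported logs
                totals[name] += float(value)
                counts[name] += 1
    return {name: totals[name] / counts[name] / BYTES_PER_MB
            for name in wanted.values() if counts[name]}


# Hypothetical export from a server named SQL01 sampling drive D:.
sample = "\n".join([
    r'"(PDH-CSV 4.0)","\\SQL01\LogicalDisk(D:)\Disk Read Bytes/sec","\\SQL01\LogicalDisk(D:)\Disk Write Bytes/sec"',
    '"01/01/2024 09:00:00.000"," "," "',
    '"01/01/2024 09:00:15.000","2097152","1048576"',
    '"01/01/2024 09:00:30.000","4194304","3145728"',
])
print(perfmon_throughput_mb_s(sample))  # read averages 3.0 MB/s, write 2.0 MB/s
```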
Paul Randal has a blog post on capturing I/O latencies over a period of time that uses WAITFOR DELAY to capture the difference between two snapshots of this data. I’ve also blogged about a few changes I made to Paul’s script: I shortened the capture duration from 30 minutes to five, converted num_of_bytes_read and num_of_bytes_written to MB, and then divided by 300 seconds (five minutes) to get the read/write throughput per database. I find this per-database view more helpful than the overall disk level, because if we’re moving certain databases and not the entire workload, we can focus on just those databases. It also helps pinpoint the heavy consumers so we can see whether any tuning could decrease the overall throughput requirement.
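The snapshot-and-delta calculation described above looks roughly like this in Python. It assumes each snapshot is a list of rows returned by `SELECT database_id, file_id, num_of_bytes_read, num_of_bytes_written FROM sys.dm_io_virtual_file_stats(NULL, NULL);`, taken a known number of seconds apart (300 here, matching the five-minute window). It is a sketch of the arithmetic only, not Paul Randal’s script or the modified version mentioned above, and collecting the rows from SQL Server is left out.

```python
from collections import defaultdict

BYTES_PER_MB = 1024 * 1024


def per_database_throughput(
    first: list[dict[str, int]],
    second: list[dict[str, int]],
    seconds: float = 300,
) -> dict[int, tuple[float, float]]:
    """Return {database_id: (read MB/s, write MB/s)} from two snapshots of
    sys.dm_io_virtual_file_stats taken `seconds` apart."""
    baseline = {(r["database_id"], r["file_id"]): r for r in first}
    read_bytes: dict[int, int] = defaultdict(int)
    write_bytes: dict[int, int] = defaultdict(int)
    for row in second:
        key = (row["database_id"], row["file_id"])
        if key not in baseline:
            continue  # file appeared between snapshots, so there is no baseline
        read_delta = row["num_of_bytes_read"] - baseline[key]["num_of_bytes_read"]
        write_delta = row["num_of_bytes_written"] - baseline[key]["num_of_bytes_written"]
        if read_delta < 0 or write_delta < 0:
            # Counters are cumulative, so a drop typically means a restart
            # happened between the two snapshots.
            raise ValueError("counters went backwards; was the service restarted?")
        # Sum data and log files so the result is per database.
        read_bytes[row["database_id"]] += read_delta
        write_bytes[row["database_id"]] += write_delta
    return {
        db: (read_bytes[db] / BYTES_PER_MB / seconds,
             write_bytes[db] / BYTES_PER_MB / seconds)
        for db in read_bytes
    }
```

Summing per database rather than per drive is what makes it possible to size only the databases that are actually moving.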
Another benefit of using automation to capture this data over time is being able to see when the heavy usage happens. Is the peak usage during backups, maintenance operations, or off-hours ETL processes? If so, in most cases those values can be excluded to see what real end-user activity requires.
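Excluding known maintenance windows before looking at peaks can be sketched as follows. The window times are hypothetical examples, the filter only handles windows that start and end on the same day, and samples are assumed to be (timestamp, MB/s) pairs produced by whatever capture job you automate.

```python
from datetime import datetime, time

# Hypothetical windows to exclude (server local time); adjust to your schedule.
EXCLUDED_WINDOWS = [
    (time(1, 0), time(3, 0)),    # e.g., nightly backups
    (time(22, 0), time(23, 30)), # e.g., index maintenance
]


def in_excluded_window(ts: datetime) -> bool:
    """True if the sample falls inside any same-day excluded window."""
    t = ts.time()
    return any(start <= t < end for start, end in EXCLUDED_WINDOWS)


def end_user_peak(samples: list[tuple[datetime, float]]) -> float:
    """Peak MB/s after dropping samples taken during excluded windows."""
    kept = [mbps for ts, mbps in samples if not in_excluded_window(ts)]
    return max(kept) if kept else 0.0


samples = [
    (datetime(2024, 1, 1, 2, 0), 450.0),   # backup window, excluded
    (datetime(2024, 1, 1, 10, 0), 120.0),
    (datetime(2024, 1, 1, 14, 0), 180.0),
]
print(end_user_peak(samples))  # 180.0
```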
The reason this is so important is that cloud resources limit I/O and throughput based on the size of the resource. In Azure, this is true for Azure SQL Database, Azure SQL Managed Instance, and Azure SQL VMs. As the size of the tier or number of vCores increases, so do other limits, such as storage IOPS and throughput.
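Because limits scale with the size of the resource, measured throughput can end up driving the choice of size. This Python sketch walks a table of sizes from smallest to largest and returns the first one whose vCore, IOPS, and throughput limits all cover the measured workload. The tier names and limits are hypothetical round numbers, not Azure’s published limits, which vary by service, tier, and storage configuration.

```python
# Hypothetical limits for illustration only; look up the published limits
# for the service and tier you are actually considering.
TIER_LIMITS = [
    # (name, vCores, max IOPS, max throughput MB/s), smallest first
    ("tier-4", 4, 5000, 100),
    ("tier-8", 8, 10000, 200),
    ("tier-16", 16, 20000, 400),
]


def smallest_fitting_tier(vcores: int, iops: float, mbps: float) -> str:
    """Return the first (smallest) tier whose limits cover every requirement."""
    for name, cores, max_iops, max_mbps in TIER_LIMITS:
        if cores >= vcores and max_iops >= iops and max_mbps >= mbps:
            return name
    raise ValueError("workload exceeds every tier in the table")


# A workload that needs only four vCores but 250 MB/s of throughput:
print(smallest_fitting_tier(4, 3000, 250))  # tier-16
```

In this example, throughput rather than CPU forces the jump to the largest size, which is exactly the situation that goes unnoticed when only CPU, memory, and capacity are measured.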
Monitoring Cloud Workloads vs. On-Premises Workloads
Monitoring workloads in the cloud is very similar to monitoring on-premises, with a few exceptions. As noted above, storage throughput is a critical component of an environment that can handle your workload. There are limits based on your environment, and knowing when you’re nearing or at capacity is very important. The same process outlined above, using Performance Monitor or sys.dm_io_virtual_file_stats, will continue to tell you what you’re consuming. However, for Azure Virtual Machines, you can also take advantage of a few metrics to monitor for disk-level or VM-level capping:
- At the disk level: Data Disk IOPS Consumed Percentage and Data Disk Bandwidth Consumed Percentage
- At the VM level: VM Cached IOPS Consumed Percentage, VM Cached Bandwidth Consumed Percentage, VM Uncached IOPS Consumed Percentage, and VM Uncached Bandwidth Consumed Percentage
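Once those metric values are collected (for example, from Azure Monitor), a simple threshold check can flag when a disk or VM is approaching its cap. In this Python sketch the metric names come from the lists above, while the 90% threshold and the shape of the `latest` dictionary are assumptions.

```python
CAPPING_METRICS = [
    "Data Disk IOPS Consumed Percentage",
    "Data Disk Bandwidth Consumed Percentage",
    "VM Cached IOPS Consumed Percentage",
    "VM Cached Bandwidth Consumed Percentage",
    "VM Uncached IOPS Consumed Percentage",
    "VM Uncached Bandwidth Consumed Percentage",
]


def capping_alerts(latest: dict[str, float], threshold_pct: float = 90.0) -> list[str]:
    """Names of capping metrics at or above the threshold, highest value first.
    Metrics missing from `latest` are treated as 0%."""
    hits = [(latest[m], m) for m in CAPPING_METRICS if latest.get(m, 0.0) >= threshold_pct]
    return [m for _, m in sorted(hits, reverse=True)]


latest = {
    "VM Uncached Bandwidth Consumed Percentage": 97.0,
    "Data Disk IOPS Consumed Percentage": 92.5,
    "VM Cached IOPS Consumed Percentage": 40.0,
}
print(capping_alerts(latest))
```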
Proper Migration Planning Is the Key to Success
The cloud is prepared to handle just about any workload; however, proper planning is crucial to a successful migration. Being aware of cloud resource limitations, especially for storage throughput, will help ensure you have the proper compute resources available after migrating. Once you’re in the cloud, monitoring for resource consumption should also include storage throughput, as it’s a crucial factor for database workloads and could be the driving factor for having to increase compute resources.
You can learn more about how SolarWinds database solutions are built to provide a deep level of insights to help drive optimizations and improve performance across your data environments before, during, and after migrations. For even deeper insights, see how database observability is a key offering for SolarWinds® Observability.