Home > A Guide to Stack Overflow’s Path to the Cloud

A Guide to Stack Overflow’s Path to the Cloud

Like many companies, Stack Overflow is trying to get out of the business of running our architecture in our own data centers; instead, we want to offload some of the more mundane parts of system administration to a cloud service offering like Azure. I’m going to cut to the chase for the purpose of this article and concede we’ve already decided on Azure for the majority of our infrastructure and, most importantly to me, our databases.

Why Stack Overflow Chose Azure SQL Managed Instance

We eventually want to move our existing on-premises databases to platform as a service (PaaS), and our target at the moment is Azure SQL Managed Instance. Managed Instance is the right balance of alleviating our patching, backups, and high availability efforts, while still allowing us to support cross-database queries, applications that haven’t been fully tested with read committed snapshot isolation, and offerings like Microsoft Teams, which all create many tenant databases—a cost-prohibitive approach in Azure SQL Database. (Several other factors described here apply, too.) All of our existing on-premises databases run on SQL Server 2019. To move to Managed Instance, we have a self-imposed but critical contingency limitation: we need the ability to fail back. This is a bad combination because, today, Managed Instance (MI) only supports one-way migration from SQL Server 2019. So, we could join an MI to our on-premises Availability Group (AG) and use it as a secondary. But as soon as we fail over to it, that’s it; we’re not failing back. Diagram of one way migration of Managed Instance from SQL Server 2019

Pros and Cons of Upgrading to SQL Server 2022

The other fast-track option for us would be to upgrade to SQL Server 2022 in our data center, which would give us a direct migration path to MI. But this means we’d have to plan out the upgrade in the data center before ever thinking about moving to the cloud. And adding a 2022 instance to an existing 2019 AG presents the same type of problem: we can’t test failing over to that instance as the primary while keeping the contingency of failing back. Diagram of adding a SQL Server 2022 instance to an existing 2019 Availability Group This also would introduce too many major architectural changes a little too rapidly for our taste. While we were part of the Early Access Program (EAP), we simply didn’t have the time budgeted to spend upgrading to SQL Server 2022 in our data center before the move to Azure. Never mind that to be fully confident in 2022 for our critical workloads, we’d have to wait for it to be generally available to start the process, and probably not until at least the first cumulative update.

Migrating to Azure VMs Before Upgrading to SQL Server 2022

For our timeline, it makes more sense to move to Azure VMs in the short term, where we would put SQL Server 2019 instances and add them to our existing AGs. This approach would retain our ability to fail back to the data center if needed since it’s merely another on-premises instance in a new location. Diagram of adding SQL Server 2019 instance to existing Availability Groups with Azure VM We would wait to add SQL Server 2022 to the mix until after we finish migrating to Azure VMs, and after all applications and services are certified to run there successfully. With the data center (and the potential need to fail back there) out of the picture, we can then start the task of upgrading to SQL Server 2022 in our Azure VMs. We could begin by adding a secondary, using it for read-only workloads, and eventually make it the primary (and phase out 2019 as we add additional 2022 instances). Diagram of phasing out SQL Server 2019 for SQL Server 2022 In this scenario, we’d still have a point of no return, where the first time we switch to a 2022 primary would mean we can’t fail back to 2019. But once we’re fully in Azure, we’d be able to test all of our infrastructure independently and with no other differences, something we can’t do today. This testing would give us much greater confidence in our ability to upgrade our production workloads with less reliance on needing to fail back. Sometime after this, we’d start the Managed Instance conversation. With multiple 2022 instances in place in Azure VMs, we could then introduce an MI secondary. We could begin with read-only workloads, as well as testing our applications and services against MI independently (like we plan to do with 2022). Now, though, we would have the full ability to fail back from MI to 2022, so there would no longer be a point of no return for this part of the evolution. Diagram of multiple SQL Server 2022 instances in Azure VMs with Managed Instance secondary

SQL Server 2022 New Features To Help Ease Migration Challenges

Regardless if it takes a long time to work up from being on 2022 in Azure VMs to being on Managed Instance, we’ll get to take advantage of several SQL Server 2022 features in the meantime:
  • Contained Availability Groups – To reduce the amount of server-specific configuration, we need to mirror on every replica, like logins and jobs
  • Azure Active Directory – To ease the migration to Azure as a whole and to help eliminate the dependency on our existing on-premises Active Directory
  • Data virtualization – To reduce the need to ingest SQL Server data from other analytics platforms, such as Snowflake
  • New T-SQL functionality – Some of our queries would be greatly simplified by new built-in functions like DATETRUNC, DATE_BUCKET, and GENERATE_SERIES, as well as the additional ordinal capability of STRING_SPLIT
  • Various performance enhancements – From more WAIT_AT_LOW_PRIORITY options to memory and parallelism feedback to Query Store and Accelerated Database Recovery enhancements, there are potential low-effort improvements in many corners of our workloads

Other Considerations for Cloud Migration

There are many different pieces of information factoring into whether you should move to the cloud and which one(s). And while I’ve covered many of the technical details, there are other considerations to also keep in mind: cost (including the ability to shift existing sunk cost licenses), stability, existing expertise, job market, and many other factors. So, this is our current long-term plan to migrate Stack Overflow to the cloud.  While not designed for speed, our ultimate goal is to introduce the fewest points of no return, which I highly recommend you also prioritize with your own cloud migration plans.

How SolarWinds Can Help Set Cloud Migrations Up for Success

Understanding what’s happening across your environment(s) before, during, and after cloud migration activities is key to planning better, rolling out faster, and optimizing performance more easily. However, monitoring database performance is typically only one piece of the strategy. With a solutions suite that built to supports relational and open-source database types, SolarWinds® software can help you benchmarks database performance on on-premises and then , then help you measure it once those workloads are in the cloud. You can learn more about baselining performance as you’re planning your cloud migration journey in this webcast.
Aaron Bertrand
Aaron Bertrand has been working with SQL Server since version 6.5. He created the first great SQL Server-related website, aspfaq, and has earned the Microsoft…
Read more