Shipping Code Fast and Often
For Shopify’s developers and DBAs, working in a high-availability environment means delivering at high velocity. It’s not unusual for them to deploy new software 20 times per day, and during peak periods such as Black Friday and Cyber Monday, they have deployed new code over 80 times in a single day. Sergio Roysen, Shopify's MySQL DBA team lead, told us: "Shopify deploys new software twenty times a day, sometimes. After Black Friday, after the code freeze, we deployed eighty-six times in the same day. Each of these new deploys can introduce new queries. Tracking those queries without SolarWinds® Database Performance Monitor (DPM) would take 24 hours. It could be too late. We could be down for 24 hours until we come up with a solution. Real-time monitoring is the only way to keep our wheels running."
Monitoring What Matters
The Shopify team maintains a laser focus on anything that gets in the way of performance. Their approach demands extreme precision and, for that precision, clear visibility into how the system behaves. Here’s how:
- Tracking fast query effects. Shopify’s existing monitoring process included maintaining a digest of slow queries. The problem was obtaining visibility into high-frequency, really fast queries. Each query executed in just microseconds, and low latency is great, but potentially millions were firing every hour. Together they consumed significant time in the application, and their sheer volume could make it hard to see whether any one of these fast queries had a problem. Once Shopify started using SolarWinds DPM, they were able to zoom in to one-second granularity, which allowed them to spot and cache every query, pinpointing any that were causing performance issues.
- Getting rid of bad queries. The team constantly looks for patterns that could reveal issues, such as a particular shop generating an inordinate number of queries. By profiling queries, they can quickly find those that are causing a problem on a certain database. Once discovered, these queries can either be removed entirely or fixed, for example when they were slowed by a bad index or no index at all. This continual process of finding, fixing, and culling helps purge their application of performance issues.
- Working from packet captures. During a recent period of high volume, the team deployed new Redis capacity. They were seeing spikes in latency and needed to confirm that they had provisioned enough resources to match the heavy traffic. Because DPM allowed them to monitor packet captures, and not just queries, they could see the latency increases introduced by the new connections caused by the Redis deploy. In the past, it was difficult to gain visibility into this dimension, as traditional database monitoring solutions don't enable engineering teams to see packet traffic. With the right monitoring tool, they could watch the exact metrics they knew they needed throughout the traffic surge.
- Developer testing of queries. Developers across different teams in the organization are responsible not only for writing new queries, but for performance testing them as well. They use performance monitoring to evaluate poor-performing queries and confirm that they have either been fixed or removed. By giving developers access to monitoring tools, Shopify eliminates the need for complicated back-and-forths between devs and DBAs to ensure that code pushes won't have a detrimental effect in production. Administrative functions such as role-based and team-based access control make it possible for even non-DBA engineers to navigate sophisticated monitoring platforms without risk. Shopify has over 500 users on their DPM install, benefiting from SAML integration and Okta single sign-on. That means users of many different specialties and approaches can have input into Shopify's monitoring.
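
The point about fast queries in the list above comes down to simple arithmetic: a query that takes microseconds can still dominate total database time if it runs often enough. The following is a minimal sketch in Python of that effect, not a description of how DPM works. The query texts, latencies, and execution counts are hypothetical illustrations and are not figures from Shopify.

```python
from collections import defaultdict


def rank_by_total_time(samples):
    """Rank query digests by total time consumed.

    `samples` is an iterable of (digest, latency_microseconds) pairs.
    Returns a list of (digest, count, total_microseconds), largest total first.
    """
    totals = defaultdict(lambda: [0, 0])  # digest -> [count, total_microseconds]
    for digest, latency_us in samples:
        entry = totals[digest]
        entry[0] += 1
        entry[1] += latency_us
    return sorted(
        ((digest, count, total) for digest, (count, total) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


# Hypothetical one-hour sample (all values are assumptions for illustration).
FAST = "SELECT name FROM products WHERE id = ?"      # 200 microseconds each
SLOW = "SELECT shop_id, SUM(total) FROM orders GROUP BY shop_id"  # 2 seconds each

samples = [(FAST, 200)] * 1_000_000 + [(SLOW, 2_000_000)] * 10

ranked = rank_by_total_time(samples)
for digest, count, total_us in ranked:
    print(f"{total_us / 1_000_000:8.1f} s  {count:>9} calls  {digest}")
```

In this made-up sample the fast query accounts for 200 seconds of database time against 20 seconds for the slow one, so a digest that lists only slow queries would miss the larger consumer.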