
Finding the Sweet Spot Between Innovation and Optimization

Balancing growth and performance optimization is a constant tension in every part of a business. This is particularly true in DevOps, where the drive to release more features faster to outpace the competition collides with the need to stabilize and optimize existing systems to deliver the best user experience. At the same time, many DevOps budgets and resources are shrinking, so with each release, teams are forced to make difficult trade-off decisions. How can you ensure you are striking the right balance and meeting the needs of new users, existing users, and the business?

Just A Part of the Job

There is an inherent tension between building new features and optimizing the performance of existing ones. It existed in waterfall development, persisted through agile, and remains in DevOps. The tension arises from a finite set of resources (time, budget, and personnel) and the need to prioritize between two competing goals:

  • Building new features: delivering new capabilities quickly to attract users and outpace the competition.

  • Optimizing performance: stabilizing and tuning existing functionality to deliver the best possible experience to current users.

Not managing these competing priorities can lead to several challenges:
  • Scope creep: Continually adding new features expands the scope of the system, making existing features more complex and harder to maintain.
  • Technical debt: Delaying performance optimization can accumulate technical debt, making future development more difficult and expensive.
  • Business impact: Missing features and unaddressed stability or performance issues can result in user dissatisfaction, potentially impacting the bottom line.

Metrics can play a crucial role in managing these tensions by providing data-driven insights to inform decisions. They can help:
  • Measure the impact of new features: Track user engagement, feature adoption, and business value delivered by new features. This helps assess if the development effort was worthwhile.
  • Monitor performance metrics: Track key performance indicators (KPIs) like response time, page load time, and error rates. Identify areas where performance can be improved and quantify the potential benefits.
  • Measure technical debt: Use code analysis tools and code coverage metrics to assess the technical debt accrued by delaying performance optimization. This helps understand the potential future costs of neglecting performance.
  • Use cost-benefit analysis: Compare the expected benefits of new features with the cost of performance optimization. This helps make informed decisions about resource allocation.
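To make the cost-benefit idea concrete, here is a minimal sketch that weighs a hypothetical feature's expected value against the estimated cost of continuing to defer a performance fix. Every figure and function name below is an invented placeholder; in practice, the inputs would come from your backlog, incident history, and finance data.

```python
# Hypothetical cost-benefit sketch: weigh the expected value of shipping a new
# feature against the expected cost of continuing to defer a performance fix.
# Every figure below is a made-up placeholder; plug in your own estimates.

def expected_feature_value(adoption_rate, users, revenue_per_user):
    """Rough value of a new feature: expected adopters times revenue per adopter."""
    return adoption_rate * users * revenue_per_user

def deferred_optimization_cost(incidents_per_quarter, hours_per_incident,
                               hourly_cost, churn_loss):
    """Rough cost of postponing an optimization: firefighting time plus churn."""
    return incidents_per_quarter * hours_per_incident * hourly_cost + churn_loss

feature_value = expected_feature_value(adoption_rate=0.25, users=10_000,
                                       revenue_per_user=3.00)
optimization_cost = deferred_optimization_cost(incidents_per_quarter=4,
                                               hours_per_incident=6,
                                               hourly_cost=120.0,
                                               churn_loss=5_000.0)

print(f"Expected feature value this quarter: ${feature_value:,.2f}")
print(f"Cost of deferring the optimization:  ${optimization_cost:,.2f}")
if optimization_cost > feature_value:
    print("The numbers favor prioritizing the performance work this cycle.")
else:
    print("The numbers favor shipping the feature first.")
```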

Identifying, measuring, and communicating meaningful metrics can help each DevOps team member prioritize more efficiently. The right metrics can serve as a north star and align prioritization decisions to achieve business and technical objectives. Regularly monitoring and communicating progress toward objectives empowers team members to make informed decisions, make course corrections, and work toward continual improvement. To strike the right balance, teams must select metrics that span performance and development objectives.

    Performance Metrics

    Everyone is familiar with the key performance indicators (KPIs) that gauge the health, performance, and resilience of a SaaS system. These metrics can be used to optimize the performance of preproduction and production systems and can be broadly categorized into four main areas:

    Availability and uptime:

  • Uptime: A measure of the percentage of time the system is accessible to users. Aim for an uptime of 99.9% or higher. A dip in uptime can indicate system instability or outages, impacting the user experience and potentially leading to churn.
  • Mean Time to Repair (MTTR): A measure of the average time to resolve an outage or incident. A lower MTTR indicates a faster response and recovery time, minimizing downtime and its impact on users.
[Image: Availability and response time metrics displayed in SolarWinds Observability]
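As a quick illustration, the following sketch derives an uptime percentage and MTTR from a simple list of incident durations over a 30-day window; the incident figures are invented.

```python
# Minimal sketch: derive uptime percentage and MTTR from a list of incident
# durations (in minutes) over a 30-day window. The durations are illustrative.

incident_minutes = [12, 45, 8]          # three outages this month (hypothetical)
window_minutes = 30 * 24 * 60           # 30-day measurement window

downtime = sum(incident_minutes)
uptime_pct = 100 * (window_minutes - downtime) / window_minutes
mttr = downtime / len(incident_minutes) if incident_minutes else 0.0

print(f"Uptime: {uptime_pct:.3f}%")     # 99.9% allows roughly 43 minutes of downtime/month
print(f"MTTR:   {mttr:.1f} minutes")
```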

    Performance and speed:

  • Response time: A measure of the time it takes for the system to respond to user requests. Faster response times improve user experience and increase user satisfaction. Aim for response times below 2 seconds for optimal performance.
  • Page load time: A measure of the time it takes for a webpage to load fully. Similar to response time, faster page load times contribute to a better user experience and improved engagement.
  • Throughput: A measure of the number of requests the system can handle per unit of time. High throughput ensures the system can handle peak loads without performance degradation.
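Because averages can hide slow outliers, response times are often summarized with percentiles as well. The sketch below computes an average and a 95th percentile from a sample of request durations; the sample values are invented.

```python
# Sketch: summarize response times with an average and a 95th percentile.
# The sample durations (in seconds) are invented for illustration.
import statistics

durations = [0.4, 0.6, 0.5, 1.8, 0.7, 2.4, 0.5, 0.9, 0.6, 3.1]

avg = statistics.fmean(durations)
p95 = statistics.quantiles(durations, n=20)[-1]   # last cut point = 95th percentile

print(f"Average response time: {avg:.2f}s")
print(f"p95 response time:     {p95:.2f}s (target: below 2 seconds)")
```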
    Resource utilization:

  • CPU utilization: This measures the percentage of available CPU resources the system uses. High CPU utilization can lead to performance bottlenecks and slowdowns.
  • Memory utilization: This measures the percentage of available memory resources the system uses. Similar to CPU utilization, high memory usage can impact performance.
  • Network utilization: This measures the percentage of available network bandwidth the system uses. High network utilization can lead to congestion and slowdowns.
[Image: CPU and memory utilization metrics displayed in SolarWinds Observability]
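For a rough sense of how these numbers are gathered, the snippet below takes a single reading of CPU, memory, and network counters using the third-party psutil library (assumed to be installed); an observability platform would collect the same signals continuously and store them as time series.

```python
# Sketch: sample CPU, memory, and network counters with the third-party
# psutil library (pip install psutil). A monitoring platform would collect
# these continuously; this takes a single point-in-time reading.
import psutil

cpu_pct = psutil.cpu_percent(interval=1)        # CPU usage over a 1-second sample
mem_pct = psutil.virtual_memory().percent       # percentage of RAM in use
net = psutil.net_io_counters()                  # cumulative bytes sent/received

print(f"CPU utilization:    {cpu_pct:.1f}%")
print(f"Memory utilization: {mem_pct:.1f}%")
print(f"Network I/O:        {net.bytes_sent} bytes sent, {net.bytes_recv} bytes received")
```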

    Error rates and stability:

  • Error rate: A measure of the percentage of user requests resulting in errors. A low error rate indicates a stable and reliable system.
  • Application crashes: This measures the number of times the application unexpectedly crashes or terminates. Frequent crashes can be disruptive to users and indicate underlying system instability.
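A simple way to approximate error rate is to count server-side failures against total requests. The sketch below does this from hypothetical per-status-code counts, treating HTTP 5xx responses as errors.

```python
# Sketch: compute an error rate from hypothetical per-status-code request
# counts, treating HTTP 5xx responses as errors.
status_counts = {200: 9_640, 301: 120, 404: 85, 500: 42, 503: 13}  # invented numbers

total = sum(status_counts.values())
errors = sum(count for code, count in status_counts.items() if code >= 500)
error_rate = 100 * errors / total

print(f"Requests: {total}, errors: {errors}, error rate: {error_rate:.2f}%")
```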

    Development and Release Metrics

    While performance metrics such as uptime, error rates, and response time have been used since the 1970s to measure system performance, it was with the rise of Agile and DevOps that release metrics became common. First, the adoption of Agile and its focus on continuous improvement created a need to measure release metrics such as deployment frequency to track improvements. Then DevOps expanded the definition of a release to include the entire software delivery pipeline and increased the focus on performance across the DevOps team.

    In 2014, the DevOps Research and Assessment (DORA) research program, now part of Google Cloud, performed its first study on the practices of software delivery teams. The DORA survey and report aim to understand the capabilities that drive software delivery and operations performance.

    The survey of DevOps professionals is conducted annually and measures release performance across a variety of release, performance, and business metrics. The DORA report is widely cited by industry analysts and thought leaders. It is a valuable resource for anyone interested in learning more about DevOps practices and how to improve their own team's performance.

    The DORA metrics have become widely accepted as a standard for measuring software delivery performance. Teams of all sizes and in all industries use them to track progress and identify areas for improvement. The framework consists of four metrics:

  • Deployment frequency: How often does the team deploy new code to production?
  • Lead time for changes: How long does it take for a change to go from being committed to code to being deployed to production?
  • Change failure rate: How often does a deployment cause a production failure?
  • Mean time to restore (MTTR): How long does it take to recover from a production failure?
    Using these metrics and comparing outcomes across a variety of teams, the DORA team found that high-performing teams have significantly higher deployment frequencies, shorter lead times for changes, lower change failure rates, and faster MTTRs than low-performing teams.
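If you already log deployments, the four DORA metrics can be approximated with a few lines of code. The sketch below computes them from a small, invented deployment log; the field names and sample values are placeholders, not a standard schema.

```python
# Sketch: derive the four DORA metrics from a hypothetical deployment log.
# Each record notes when the change was committed, when it was deployed,
# whether the deployment caused a failure, and how long recovery took.
from datetime import datetime, timedelta

deployments = [  # invented sample data
    {"committed": datetime(2024, 5, 1, 9),  "deployed": datetime(2024, 5, 2, 15),
     "failed": False, "restore_minutes": 0},
    {"committed": datetime(2024, 5, 3, 11), "deployed": datetime(2024, 5, 6, 10),
     "failed": True,  "restore_minutes": 95},
    {"committed": datetime(2024, 5, 8, 14), "deployed": datetime(2024, 5, 9, 9),
     "failed": False, "restore_minutes": 0},
]
window_days = 30

deployment_frequency = len(deployments) / window_days
lead_times = [d["deployed"] - d["committed"] for d in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)
failures = [d for d in deployments if d["failed"]]
change_failure_rate = 100 * len(failures) / len(deployments)
mttr = (sum(d["restore_minutes"] for d in failures) / len(failures)) if failures else 0.0

print(f"Deployment frequency: {deployment_frequency:.2f} deploys/day")
print(f"Average lead time for changes: {avg_lead_time}")
print(f"Change failure rate: {change_failure_rate:.0f}%")
print(f"Mean time to restore: {mttr:.0f} minutes")
```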

    Performance and Release Metrics Together

    Release delivery metrics and performance metrics work together to paint a holistic picture of your custom application's health and effectiveness. Measuring and analyzing both types of metrics provides the perspective needed to inform better prioritization and trade-off decisions.

    Release Delivery Metrics: Track the predictability, efficiency, and quality of the release cycle. These may include:

  • Release cycle time: Time from idea to production.
  • Deployment frequency: Number of releases per unit time.
  • Lead time for changes: Time for a change to go from code commit to running in production.
  • Defect escape rate: Percentage of defects that make it to production.
    Performance Metrics: Measure the application's ability to meet user expectations and business objectives. These may include:

  • Response time: Time taken for the application to respond to a user request.
  • Uptime: Percentage of time the application is available for use.
  • Resource utilization: CPU, memory, and network usage by the application.
  • User feedback: Surveys of users about their experience with the application.

    At a high level, release metrics measure the efficiency and quality of the delivery pipeline, while performance metrics measure how well the running application serves users. Together, they identify areas for improvement and provide a complete picture of application health. Using this combined insight, you can make informed decisions to optimize your development process, enhance application performance, and ultimately deliver greater value to your users and business.
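As a loose illustration of that combined view, the snippet below places a handful of release and performance figures side by side and flags anything outside a target; all values and thresholds are made-up placeholders, not recommendations.

```python
# Sketch: place release delivery metrics and performance metrics side by side
# and flag anything outside a (hypothetical) target. All values are placeholders.
release_metrics = {
    "deployment_frequency_per_week": 5,
    "lead_time_days": 2.5,
    "defect_escape_rate_pct": 4.0,
}
performance_metrics = {
    "uptime_pct": 99.92,
    "p95_response_time_s": 1.6,
    "error_rate_pct": 0.8,
}
targets = {  # illustrative thresholds only
    "defect_escape_rate_pct": ("<=", 5.0),
    "uptime_pct": (">=", 99.9),
    "p95_response_time_s": ("<=", 2.0),
    "error_rate_pct": ("<=", 1.0),
}

for name, value in {**release_metrics, **performance_metrics}.items():
    if name in targets:
        op, limit = targets[name]
        ok = value <= limit if op == "<=" else value >= limit
        status = "OK" if ok else "NEEDS ATTENTION"
        print(f"{name}: {value} (target {op} {limit}) -> {status}")
    else:
        print(f"{name}: {value}")
```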

    Wrapping up

    A challenge teams face regularly is finding the right balance between driving new feature development and optimizing system performance. Tracking and analyzing a combined set of release and performance metrics captures both objectives and offers deep insight into an application's strengths and weaknesses. For example, devoting resources to delivering more features more quickly may feel like progress, but if those new features add instability or degrade system performance, additional resources will be needed for troubleshooting and bug fixes. In the same way, identifying performance issues can highlight areas for improvement in both the system and the release process.

    Release delivery and application performance metrics are complementary and work together to provide a complete understanding of application health. To operationalize this approach, select the metrics to track based on your goals and identify and put in place tools, like SolarWinds Observability, to collect and analyze the data. SolarWinds Observability provides a full-stack monitoring solution connecting the data points across web applications and back-end systems to streamline management and provide actionable insights. If you want to try it for yourself, sign up for a fully functional 30-day free trial of SolarWinds Observability.

Melanie Achard
Melanie Achard is a director of product marketing at SolarWinds. Achard’s career in technology started out as a favor to a friend when she wrote her…