If the users aren't happy, then in short order, neither will anyone else be. The ubiquity of information technology has exposed users to the best the IT industry has to offer—and the worst. So, if end-user experience is king, how does one go about ensuring users stay happy?
The short answer is
application performance management (APM) tools. Users don’t care if the apps they use are in the cloud, on-premises, or something in between. Users expect their experience to be painless, fast, and error free. It makes the most sense, then, to measure the user experience with tools also capable of application-specific monitoring and analysis in conjunction with infrastructure monitoring and analysis, and that’s exactly where APM tools come in.
APM tools tease signal from noise, examining multiple metrics from “clicks” to “code” and all the infrastructure in between to identify performance problems with applications and their related databases, no matter how complex or distributed they may be. These measurements range from traditional infrastructure performance indicators, such as CPU or storage utilization, to actual transactional requests performed directly against an application and its infrastructure to determine real-world latency, with comprehensive, deep visibility at every layer in between.
APM tools can be powerful, but they require knowledge to use well. User experience is subjective. IT practitioners using an APM tool need to know how to translate user performance expectations into “metrics that matter” for the tool to measure and manage, and this requires at least a basic understanding of how the application works, the infrastructure it relies upon, and what users expect of it.
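As a rough sketch of what that translation can look like, the following example turns two common expectations, "the checkout page works" and "it responds within a couple of seconds," into numbers a synthetic transaction can verify. The endpoint, content marker, and threshold values are invented for illustration, not drawn from any particular APM product.

```python
import time
import urllib.request

# Hypothetical "metrics that matter," derived from user expectations rather
# than from the infrastructure itself.
CHECKOUT_URL = "https://shop.example.com/checkout"   # hypothetical endpoint
MAX_LATENCY_SECONDS = 2.0                            # "fast enough" per user expectation
EXPECTED_MARKER = b"Order summary"                   # proof the right page came back

def run_synthetic_transaction(url: str) -> dict:
    """Issue one real request and record what a user would actually experience."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            body = response.read()
            status = response.status
        elapsed = time.perf_counter() - start
        return {
            "available": status == 200,
            "accurate": EXPECTED_MARKER in body,
            "latency_s": round(elapsed, 3),
            "fast_enough": elapsed <= MAX_LATENCY_SECONDS,
        }
    except OSError as exc:  # DNS failure, timeout, connection refused, TLS error
        return {"available": False, "accurate": False,
                "latency_s": None, "fast_enough": False, "error": str(exc)}

if __name__ == "__main__":
    print(run_synthetic_transaction(CHECKOUT_URL))
```

A real APM tool runs checks like this continuously, from many locations, and correlates the results with infrastructure metrics; the sketch only illustrates that the thresholds have to come from user expectations, not from the servers.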
Users don’t care about the details of how applications work any more than they care how the traction control works in their car. If it works as expected, when needed, that’s all they need to know. However, when a user needs an application and what performance they expect from it can both vary radically from one user to the next.
And if user expectation variability weren't enough of a challenge, end-user diversity certainly is. Users can be customers, employees, business partners, and more. Users can operate from inside the organizational perimeter, where most of the infrastructure involved is directly controlled by your organization—and thus not shared with uncontrollable or unforeseen “noisy neighbors.” Users can also operate from outside the organizational perimeter, where there are any number of intermediate infrastructures, all controlled by different intermediate organizations, with each offering the potential of being a performance bottleneck at any given time.
Adding to the complexity, the applications themselves may be "simple" traditional monolithic applications, or massive, sprawling, microservices-based architectures consisting of thousands of dynamically changing individual workloads spread across multiple geographic regions.
It's up to your team to fill in the blanks. So how do you do that?
What Users Care About
Ultimately, users care about four things: uptime (is the service available when they need it?), accuracy (does it reliably get the job done?), speed (is it fast enough not to be frustrating?), and simplicity. Of these, simplicity is both the most subjective and, to a certain extent, out of the hands of IT operations staff.
Simplicity can be defined as both the ability to find features users want within an application (discoverability) and the ability to accomplish what is desired with minimum fuss. Simplicity is mostly the responsibility of app designers but could become an IT operations problem in some situations.
If a user workflow spans several different apps and services, IT operations becomes part of the simplicity equation, because operational interactions between those services can break or degrade the workflow. Poor IP address management (IPAM), for example, could leave a key part of an application offline as individual workloads fail to obtain IP addresses or never get registered with a load balancer.
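One low-tech way to catch that particular failure mode, assuming the workloads are expected to register in internal DNS (the hostnames below are hypothetical), is simply to confirm that every expected workload actually resolves to an address:

```python
import socket

# Hypothetical workloads that should have obtained addresses and registered in DNS
EXPECTED_WORKLOADS = [
    "orders-1.app.internal.example.com",
    "orders-2.app.internal.example.com",
    "payments-1.app.internal.example.com",
]

def unresolved_workloads(hostnames):
    """Return the workloads with no address record, i.e., invisible to users and load balancers."""
    missing = []
    for host in hostnames:
        try:
            socket.gethostbyname(host)
        except socket.gaierror:
            missing.append(host)
    return missing

if __name__ == "__main__":
    for host in unresolved_workloads(EXPECTED_WORKLOADS):
        print(f"WARNING: {host} did not resolve; check IPAM and load balancer registration")
```

A production check would query the IPAM system or load balancer API directly, but the principle is the same: verify registration, don't assume it.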
Similarly, the relative speeds of different services within a distributed application matter. One link in a process chain exhibiting poor performance can easily render the entire application experience undesirable for the user.
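A toy calculation makes the point. The service names and timings below are invented, but the arithmetic is the user's reality:

```python
# Hypothetical per-service latencies along one user request's path, in milliseconds
service_latency_ms = {
    "api-gateway": 12,
    "auth": 18,
    "catalog": 25,
    "pricing": 30,
    "inventory": 640,   # the one poorly performing link
    "checkout": 22,
}

end_to_end_ms = sum(service_latency_ms.values())
slowest_service, slowest_ms = max(service_latency_ms.items(), key=lambda item: item[1])

print(f"End-to-end latency: {end_to_end_ms} ms")
print(f"'{slowest_service}' alone accounts for {slowest_ms / end_to_end_ms:.0%} of it")
```

Shaving a millisecond off the five healthy services changes almost nothing; finding and fixing the one slow link changes everything the user notices.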
Further complicating matters, the user experience can vary for different users of the same distributed application. Each user can potentially have their work processed through unique chains of microservices, as determined by any number of variables, ranging from geography to load balancers routing workloads based on usage.
The use of a microservices architecture mitigates but doesn’t eliminate shared resource contention, especially when certain services are more monolithic—or less able to horizontally scale—than others. This means operational decisions about workload locality and prioritization (for shared services) will also affect the performance of the application as experienced by the end user, further increasing the importance of adequate monitoring and analysis.
Inside the Perimeter, Outside, or Both?
IT inside the perimeter can refer to on-premises IT, IT in colocation facilities, or any IT deployment wherein the organization has complete (or nearly complete) control of the underlying infrastructure, including compute, storage, network, and databases. There’s room for equivocation regarding various implementation details, but the basic principle behind the term is that if you control the infrastructure itself, it's a lot easier to identify and solve problems.
It's easy to accept on-premises IT infrastructure as "inside the perimeter." It's physically located on an organization's premises. It’s quite literally behind the perimeter firewall(s) and almost always under the control of the organization itself. (Although with the growing trend toward on-premises IT managed by outside service providers, even this definition isn’t as simple as it once was.)
Applications using public cloud technology, however, can also be inside the perimeter. Administrators have significant control over certain public cloud technologies—VMware on AWS is a great example—and the use of virtual firewalls with VPN or software-defined WAN (SD-WAN) technologies can connect resources in the public cloud to on-premises infrastructure in a secure fashion.
Of course, anything in the public cloud is subject to the vagaries of all intervening networks. If a critical internet service provider goes down between a user and the cloud resource, the user may experience significantly degraded performance or lack of any service availability at all.
End-user computing (EUC) is another example of IT that’s potentially contentious to define as either wholly inside or outside the perimeter. EUC devices may be entirely controlled by the organization, offering the insight and configuration capabilities traditionally associated with being inside the perimeter. They can also use VPNs, giving them a similar profile to secured public cloud resources.
Like public cloud resources, however, EUC devices can be "somewhere on the other side of the internet," and often not in a friendly part, either. An EUC device might be a smartphone trapped behind multi-tiered carrier-grade NAT, meaning the device has no public IPv4 address and no realistic means of port forwarding. It could also sit behind a firewall (at a hotel, for example) that severely restricts the protocols allowed.
The variability in experience and the rapidly evolving definition of "the perimeter" underscore the importance of measuring the user experience from multiple points. It isn't enough to slap a monitoring application on your on-premises application, verify it looks fine from a server physically next door to the one the application runs on, and call it a day.
To consistently deliver an acceptable—let alone excellent!—user experience, you simply must measure the application under the conditions your users will actually experience—wherever those users happen to be, and over whatever real-world path connects them to the applications they access.
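In rough sketch form, measuring from multiple points can be as simple as running the same small probe at each vantage point (a branch office, a home VPN user, a public cloud region) and reporting what it saw, tagged with where it ran. Everything specific below, including the location name, the URL, and the sample count, is an assumption for illustration rather than a prescription.

```python
import json
import time
import urllib.request

PROBE_LOCATION = "branch-office-frankfurt"    # set differently at each vantage point
TARGET_URL = "https://app.example.com/login"  # hypothetical application endpoint

def probe(url: str, attempts: int = 5) -> dict:
    """Measure availability and latency as seen from this probe's location."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                response.read()
            samples.append(time.perf_counter() - start)
        except OSError:
            samples.append(None)              # record the failure rather than hiding it
    successes = sorted(s for s in samples if s is not None)
    return {
        "location": PROBE_LOCATION,
        "success_rate": len(successes) / attempts,
        "median_latency_s": successes[len(successes) // 2] if successes else None,
    }

if __name__ == "__main__":
    # Ship this JSON to a central collector and compare locations side by side.
    print(json.dumps(probe(TARGET_URL)))
```

Nothing here is sophisticated; the value comes from running the same measurement everywhere your users actually are and comparing the results in one place.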
Consistency Is Crucial
Many IT professionals focus too narrowly on infrastructure performance as a “proxy” for application performance and availability. IT teams wait until someone screams, look for a bottleneck, alleviate the bottleneck, and repeat. Apart from being entirely reactive, this approach fails to measure and respond to much of the complexity affecting the modern user experience. After all, you can’t manage (let alone optimize) what you don’t measure!
It's easy to tweak the nerd knobs on equipment entirely under one's control. It takes skill, patience, and the right tools to assess and remediate user experience issues where a significant portion of the underlying and/or intervening infrastructure belongs to, and is operated by, someone else.
The key word to take away from all of this is "consistently." Can you
consistently deliver the user experience that you've promised, no matter where that user happens to be? Can you meet their expectations, no matter what they are?