Are You “Flying Blind”?
Would you travel on an airplane if the flight crew were “flying blind”?
What does it mean to be “flying blind”? Freedictionary.com defines it as “to do something based on guesswork, intuition, or without any help or instructions.”
Think about it. Every aircraft is equipped with systems to monitor critical information and alert the flight crew if an indicator is outside of defined thresholds. These systems monitor information about things like air speed, wind direction, external air pressure, fuel levels, and altitude.
Subtle changes in these indicators could result in more significant issues for a flight if not addressed in a proactive and timely manner.
The same can be said for your IT environment. When it comes to monitoring your IT environment, are you “flying blind”?
Are you managing critical technology resources based on intuition? How do you know that systems and infrastructure are performing optimally? How do you know that a critical log file is not going to exhaust its allocated space today—or in the next 10 minutes? How are you monitoring your IT environment?
“Why monitor when everything is going fine?”
That’s exactly the point: everything is fine. Until it’s not.
Whether it’s a disruption of a critical business system, a router that’s down, or a filesystem reaching capacity, is there ever a good time for an interruption of IT services? Your business is fully dependent on having reliable, highly available systems provided by IT. An interruption means that someone within IT must drop whatever they are doing and respond. An interruption means that the business is unable to do its business.
And when things go wrong, it damages IT’s reputation and credibility with the user community.
From the IT perspective, reacting to and addressing outages and interruptions gets in the way of getting requests fulfilled or projects done.
And there is often a cascading effect. Because project deadlines don’t move, IT has less time to complete a project. This often results in insufficient testing or reduced quality prior to implementation.
From a business perspective, a system outage could mean revenue loss. There is also the added, unplanned expense of responding to and recovering from the interruption.
But by proactively monitoring the IT environment, you instill confidence in the IT organization. Users grow to depend upon and have trust in the systems that IT provides.
How monitoring saves time and money
The best way that monitoring saves time and money is by alerting IT to intervene before something goes wrong. For example, performance or utilization thresholds can be defined so that when a threshold is reached, IT can act, such as by adding space to a filesystem, at a time that doesn’t affect users. This capability helps minimize the unexpected downtime that would have resulted had no action been taken.
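As a rough illustration of a utilization threshold, the sketch below classifies filesystem usage against a warning level and a critical level. The 80% and 90% values and the function names are assumptions chosen for the example, not recommendations from any particular monitoring product.

```python
import shutil

# Hypothetical thresholds, expressed as a fraction of total capacity.
WARN_FRACTION = 0.80
CRIT_FRACTION = 0.90


def classify_usage(used_bytes, total_bytes, warn=WARN_FRACTION, crit=CRIT_FRACTION):
    """Return 'ok', 'warning', or 'critical' for the given usage figures."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= crit:
        return "critical"
    if fraction >= warn:
        return "warning"
    return "ok"


def check_path(path="/"):
    """Read the real usage for `path` and classify it."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return classify_usage(usage.used, usage.total)


if __name__ == "__main__":
    print(check_path("/"))
```

A "warning" result is the cue to add space at a convenient time, while "critical" would justify an immediate alert.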
But when things do go wrong, monitoring still provides a huge benefit.
First, an effective monitoring solution helps quickly pinpoint what is causing an outage. This saves time by letting IT focus on resolving the actual issue rather than investigating what might be causing the interruption.
Second, IT can assign the right resources to an issue. For example, network technicians can be assigned to issues involving a network device, and server administrators can be assigned to server performance issues. While it sounds trivial, getting the right resource on the right issue at the right time saves time and money and optimizes the use of IT technicians.
What are some basic things to look for in a monitoring tool?
Here are some essential features to look for when evaluating a monitoring tool.
- Ability to define alert thresholds – Monitored resources will have varying degrees of importance and criticality for an organization. Being able to define appropriate thresholds for monitored resources is a must.
- Reporting – Reporting is a “must-have” in any monitoring tool. Monitoring alerts indicate a current state; reporting provides the ability to identify trends and patterns over time.
- Dashboard – The ability to view the current status of all monitored resources from a dashboard is also a “must-have.”
- Automated remediation actions – Monitoring and alerting are just two parts of the monitoring tool puzzle. An effective monitoring tool will also facilitate automated remediation actions, such as restarting services and applications, backing up files, or running pre-defined scripts, in the event of a threshold breach.
- Intuitive setup – In this modern age of IT, implementing a tool should be simple and straightforward, with no customization or coding required. Upon installation, the tool should discover potential computing resources for monitoring. Configuration should be as simple as defining the phone numbers and email addresses of those who should receive alerts.
- Basic definitions and alerting thresholds for common monitoring targets – Using these “out-of-the-box” definitions is a quick way to implement a basic monitoring safety net and deliver quick value from an investment in a tool.
- Monitoring architecture – Does the solution require the installation of agents or probes on each monitoring target? Does the tool require a database server? Different solutions have different infrastructure requirements, and these must be considered as part of tool selection.
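To make the link between alert thresholds and automated remediation concrete, here is a minimal sketch of a rule table that runs an action when a reading breaches its threshold. The metric names, threshold values, and action functions are all hypothetical placeholders; a real tool would call its own service manager or scripts instead of returning a message.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    metric: str                     # name of the monitored value
    threshold: float                # breach when the reading is >= this
    action: Callable[[float], str]  # remediation to run on a breach


def restart_service(value):
    # Placeholder: a real action might ask the service manager to restart.
    return f"restart requested (reading {value})"


def rotate_logs(value):
    # Placeholder: a real action might run a pre-defined cleanup script.
    return f"log rotation requested (reading {value})"


def evaluate(rules: List[Rule], readings: Dict[str, float]) -> List[str]:
    """Run the action of every rule whose metric breaches its threshold."""
    results = []
    for rule in rules:
        value = readings.get(rule.metric)
        if value is not None and value >= rule.threshold:
            results.append(rule.action(value))
    return results


# Hypothetical rules for illustration only.
RULES = [
    Rule("app_memory_percent", 95.0, restart_service),
    Rule("log_volume_percent", 85.0, rotate_logs),
]

if __name__ == "__main__":
    print(evaluate(RULES, {"app_memory_percent": 97.2, "log_volume_percent": 40.0}))
```

Keeping thresholds and actions in a table like this reflects the idea that different resources warrant different thresholds and responses.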
Whatever the solution, you must understand what options and functionality your IT organization must have in a monitoring tool. As with most things IT, it’s easy to be distracted by “nice to have” functionality and features. Keep in mind that those “nice to have” features come at a cost. Identify what you must have in a monitoring tool and how it will benefit your business and your IT organization.
Convincing your management team
The benefits of a monitoring tool are compelling. But how do you convince your management team that you need a monitoring tool?
The first thing to understand is “what is the cost of an outage?” There are both tangible and intangible costs of service interruptions. The tangible costs are relatively easy to identify:
- Time – A system outage means lost time from a business perspective. And in the business world, time is money. But the time that a system or resource is unavailable is only part of the picture: the time required to recover from the outage must be added to it. Recovery time is often overlooked when analyzing the impact and cost of a system outage.
- Resources – What resources typically become involved in the event of a service interruption? Don’t forget that it’s not just IT resources—business resources are usually affected as well.
- Business impact – Service interruptions can be easily translated to business impact. If an interruption occurs on a sales support system, your sales department should be able to provide data about lost sales or revenues. If the interruption occurs on a student registration system, the registrar’s office can provide information about the number of registrations that had to be processed manually or how many students were unable to register for classes.
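The tangible costs listed above can be combined into a rough estimate. The sketch below assumes a simple model: staff time is charged for both the outage and the recovery period, and lost revenue accrues only while the system is down. The formula, parameter names, and figures are illustrative assumptions; substitute the numbers your own departments provide.

```python
def outage_cost(outage_hours, recovery_hours, staff_count,
                hourly_rate, lost_revenue_per_hour):
    """Estimate the tangible cost of one outage under the assumed model."""
    # Time: outage and recovery both consume staff hours.
    total_hours = outage_hours + recovery_hours
    # Resources: IT and business staff pulled into the incident.
    labor_cost = total_hours * staff_count * hourly_rate
    # Business impact: revenue lost while the system is unavailable.
    revenue_cost = outage_hours * lost_revenue_per_hour
    return labor_cost + revenue_cost


if __name__ == "__main__":
    # Hypothetical figures: 2-hour outage, 1-hour recovery, 5 people at 60/hour,
    # and 1,000 in lost revenue per hour of downtime.
    print(outage_cost(2, 1, 5, 60, 1000))  # prints 2900
```

Even a rough figure like this gives a business case a concrete number to set against the price of a monitoring tool.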
Intangible costs are less straightforward to identify and capture but are equally important. System interruptions mean that IT resources must drop whatever they are doing and address the issue. As a result, project work often gets delayed or end user requests go unfulfilled for longer than they should. If systems delivered by IT are perceived by the user community as being unreliable, IT suffers damage to its reputation and credibility.
After collecting this information, you’re ready to build a compelling business case. A business case not only educates your management about the need for a monitoring tool but also gives them the information needed to make a decision. Be sure to define, in layman’s terms, the problem you’re trying to solve with a monitoring tool. Discuss both the benefits of implementing a monitoring tool and the risks of not having one. Most importantly, recommend a course of action and clearly ask for a decision.
If you go through this process, then pick out and implement the right solution for your environment, a year from now you’ll be thanking your past self. In addition to the time and money you’ll have saved, you’ll hopefully have freed up some of your cycles to focus on initiatives that are more strategic for your company and your career.