Managing The User Experience And The Perception Of Downtime

IT System Downtime

IT is not a luxury – it’s a key enabler for almost every business activity. Just as you expect water to come out of the tap when you turn it on, you expect your corporate IT systems to perform properly every time you log on. However, every corporate IT system will experience performance issues at some point. These can be caused by many things: an application fault, rising user expectations of application response times, or a problem with a service provided by a third party.

As enterprise networks become increasingly borderless, tracking down a problem is harder than ever: it could be located anywhere across the WAN, LAN or WLAN, or at a remote site. The key to tackling these issues is end-user experience monitoring.

Users today expect increasingly rapid response times. When faced with a slower response than usual, they will not be aware of the root cause of the problem – it’s simply that their application is not working properly. So they call the Helpdesk. “Just get it working, the sooner the better,” they say (or demand). However complex the problem may be, the user expects it to be fixed quickly so they can continue with their job. When the user’s response time expectations are not met, they perceive this as ‘downtime’, even if the network or application is not technically ‘down’.

Furthermore, downtime wastes time and money, affecting every aspect of the business from user productivity to customer service. If customers experience a slow response, they will simply go elsewhere. According to a report from the Ponemon Institute, the average cost of downtime is $5,600 per minute. When you consider that the average incident lasts around 90 minutes, the cost of a single incident can reach half a million dollars.
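Taking the Ponemon figures above at face value, the arithmetic behind that claim is straightforward:

$5,600 per minute × 90 minutes ≈ $504,000 per incident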

The challenge is to identify these incidents quickly so that they can be addressed before they have a major impact on users. The growing interdependency of networks and applications, and the cost of downtime – or what users perceive as downtime – mean it is no longer enough to use a discrete tool and say “it’s not the network” or “my servers are fine”. Separate network and application monitoring tools are not designed to manage the interplay between the network and application environments. End-user experience monitoring is key, as it allows IT staff to identify problems at an early stage and track down their root cause more quickly.

The system also has to show when incidents have been handled by network redundancy – for example, when traffic is rerouted through a back-up route because a link goes down. While the network may be able to cope with the problem, if the team are not alerted they will not know that a component has failed, leaving them exposed to an increased risk of a complete failure that would impact the business.

Another factor is the number of problems that are never fixed, including transient or intermittent issues that are difficult to track down unless historical data is available or events can be reconstructed. According to Forrester, 31% of performance issues take more than a month to resolve or are never resolved at all.

The solution that is emerging is called Application-aware Network Performance Management (AANPM). It takes an application-centric view of everything happening across the network and its interdependencies, enabling engineers to monitor and optimise the end-user experience. It looks at applications in terms of how they are deployed and how they are performing, not in terms of their code.

Earlier this year Gartner recognised this holistic monitoring by publishing its first Magic Quadrant for the Network Performance Monitoring and Diagnostics (NPMD) market, which some other analysts call AANPM. This emphasised the need for a single integrated solution to monitor, troubleshoot and analyse networks and – even more importantly – the applications and services they carry.

By combining data from both application and network performance methodologies, AANPM helps IT staff work together as a team to resolve downtime problems quickly and optimise performance. It helps network engineers overcome the visibility challenges presented by virtualisation, BYOD and cloud-based services, and identify performance issues anywhere along the network path. By tracking the flow of data and messages between the various elements of the network, it helps engineers get to the root cause of any performance problem more quickly, so they can spot and fix issues before they escalate.

It also keeps the focus on the end-user experience by identifying when a user is experiencing poor response times and which component is contributing to the delay. This actionable performance data can be shared between the network and application teams to identify what led to the problem and which component needs attention.
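To make this idea concrete, here is a minimal, hypothetical sketch in Python – not drawn from any particular AANPM product – of how a monitoring tool might break a user transaction’s response time into network, server and client components and flag which one is driving a slowdown. All names, values and thresholds are illustrative assumptions.

# Hypothetical sketch: attributing a slow end-user transaction to the
# component (network, server or client) that has drifted from its baseline.

from dataclasses import dataclass

@dataclass
class TransactionSample:
    user: str
    app: str
    network_ms: float   # time spent in network transit
    server_ms: float    # time spent in server/application processing
    client_ms: float    # time spent rendering on the client

    @property
    def total_ms(self) -> float:
        return self.network_ms + self.server_ms + self.client_ms


def diagnose(sample: TransactionSample, baseline: TransactionSample,
             threshold: float = 1.5) -> list[str]:
    """Return the components whose latency exceeds the baseline by the
    given factor - i.e. where the user's perceived 'downtime' comes from."""
    culprits = []
    for component in ("network_ms", "server_ms", "client_ms"):
        if getattr(sample, component) > threshold * getattr(baseline, component):
            culprits.append(component.removesuffix("_ms"))
    return culprits


# Example: the network path is fine, but the application server is slow.
baseline = TransactionSample("jdoe", "CRM", network_ms=40, server_ms=120, client_ms=30)
current = TransactionSample("jdoe", "CRM", network_ms=45, server_ms=480, client_ms=32)

print(diagnose(current, baseline))   # ['server'] - so it is not a network problem

In practice an AANPM tool derives these timings from packet and flow data rather than from instrumented samples, but the principle is the same: compare each component against its normal behaviour so the right team gets actionable evidence instead of a vague “the application is slow” complaint.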

It is important to remember, however, that tools such as AANPM cannot be used in isolation. IT staff need to be trained to work across disciplines rather than in silos, and the proper planning processes need to be in place. With the appropriate tools, training and planning, teams can prevent user perceptions of downtime from leading to lost productivity and a reduced quality of customer service.


Roger Holder is currently EMEA Field Marketing Manager for Fluke Networks. Roger has tremendous industry knowledge and unique experience within IT, including extensive experience of SAP systems. He is a thought leader in his field, writing about key industry issues such as the requirements of network management software and the challenges of monitoring in today’s complex enterprise networks. He has particular expertise in Application Performance Management solutions, having specialised in the field since 2011.