Successfully adapting application performance management for the cloud -- that is, cloud APM -- involves several...
- Setting resource boundaries to limit performance variability;
- Adapting monitoring practices to cloud conditions; and
- Employing compensatory measures when direct resource adjustments can't resolve Quality of Experience (QoE) problems.
First, some definitions: APM is the process of monitoring and adjusting application resources to meet a specific Quality of Experience standard set by the business usage of the application. Technically, QoE is the sum of the application execution time and the network delivery time, and both of these can vary significantly in the cloud.
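To make the definition concrete, here is a minimal Python sketch that treats QoE as the sum of execution time and network delivery time and flags requests that miss a target. The `Sample` class, the `violations` helper, the 200 ms target and all of the numbers are assumptions made up for illustration; they do not come from any particular APM tool.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One request measurement; both fields are in milliseconds."""
    execution_ms: float  # time spent inside the application
    network_ms: float    # time spent delivering the request and response

    @property
    def qoe_ms(self) -> float:
        # QoE as defined above: execution time plus network delivery time.
        return self.execution_ms + self.network_ms


def violations(samples, target_ms):
    """Return the samples whose total response time exceeds the target."""
    return [s for s in samples if s.qoe_ms > target_ms]


# Hypothetical measurements, for illustration only.
samples = [Sample(120, 40), Sample(80, 150), Sample(310, 45)]
slow = violations(samples, target_ms=200)

for s in slow:
    larger = "network" if s.network_ms > s.execution_ms else "execution"
    print(f"{s.qoe_ms:.0f} ms exceeds the target; the larger share is {larger}")
```

Splitting each measurement this way shows whether a missed target is mostly a hosting problem or mostly a network problem, and either component can be the culprit in the cloud.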
Addressing cloud performance variables
In fact, one of the most significant truths about cloud computing is this: When the potential cloud resource pool is large and geographically diverse, network response times will inevitably vary among the locations in that pool.
Typically, the difference in network response time is due less to the actual distance than to the number of route hops taken. Farther hosting points typically take more route hops to reach and incur more delay, but the exact number of hops between your users and the cloud hosting points may vary significantly among potential cloud network providers.
Simple testing -- for instance, using the traceroute diagnostic tool -- can establish the performance of connections from each primary worker location to various points in the cloud, and this will help identify the network provider with the best performance.
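As a rough sketch of how such test results might be compared, the Python below parses saved traceroute output and reports the hop count and the average round-trip time of the last responding hop. It assumes the common Unix traceroute text layout (a hop number followed by times in "ms"); the `summarize` function, the provider labels and the sample traces are hypothetical, and the addresses come from ranges reserved for documentation.

```python
import re

# A hop line starts with the hop number, e.g. " 3  203.0.113.10  20.0 ms ..."
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
RTT = re.compile(r"(\d+(?:\.\d+)?)\s*ms\b")


def summarize(trace_text):
    """Return (hop_count, avg_rtt_ms of the last hop that answered)."""
    hops = 0
    last_rtt = None
    for line in trace_text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # skip the header line
        hops = int(match.group(1))
        rtts = [float(x) for x in RTT.findall(match.group(2))]
        if rtts:  # hops that time out print "* * *" and carry no times
            last_rtt = sum(rtts) / len(rtts)
    return hops, last_rtt


# Hypothetical traces from one worker location to two candidate providers.
trace_a = """traceroute to 203.0.113.10 (203.0.113.10), 30 hops max, 60 byte packets
 1  192.0.2.1  1.0 ms  1.2 ms  1.1 ms
 2  198.51.100.1  9.0 ms  9.5 ms  10.0 ms
 3  203.0.113.10  20.0 ms  22.0 ms  21.0 ms
"""
trace_b = """traceroute to 203.0.113.20 (203.0.113.20), 30 hops max, 60 byte packets
 1  192.0.2.1  1.0 ms  1.0 ms  1.0 ms
 2  * * *
 3  198.51.100.7  15.0 ms  15.0 ms  15.0 ms
 4  198.51.100.9  30.0 ms  30.0 ms  30.0 ms
 5  203.0.113.20  40.0 ms  44.0 ms  42.0 ms
"""

results = {"provider-a": summarize(trace_a), "provider-b": summarize(trace_b)}
best = min(results, key=lambda name: results[name][1])
print(results, "-> lowest delay:", best)
```

A single trace is only a snapshot, so in practice you would repeat the test from each worker location at different times of day before drawing conclusions.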
Monitoring performance, gauging response time
Once you've done everything possible to contain the range of application performance variations associated with the cloud distribution of applications' hosting points, the next step is restructuring application-monitoring practices and tools for the cloud.
Traditionally, APM starts with measuring response time at the user level and then "backs into the application" through successive layers of connection and functionality. Where APM tools are applied either at the user point of service or inside the application/component itself, it should be possible to employ the same tools and practices for cloud-based applications that were used in the data center.
The only requirement for cloud APM: Tools that are expected to co-reside with the application/component must be incorporated into the deployed software image, which means they have to be compatible with the cloud service's hardware and software platform.
Some APM users will employ network probes or other network management tools to detect application packets at critical points to isolate delay sources and identify other problems, something that's clearly impossible to do inside a public cloud. The only realistic monitoring strategy is to inspect packets only at network boundary points, meaning the point of connection to the user and to the application's components. It's likely that APM tools already monitor the user boundary, so what may be needed is integrating network monitoring with application images so that tools deploy with the application into the cloud and can be accessed there.
Where the cloud service involves connections provided by several operators, boundary-point monitoring will be difficult unless at least one of the operators provides some form of monitoring probe at the interconnection. It may be possible to uncover problems by route tracing (again, using traceroute), but only if the operators expose their infrastructures to the control protocols used. If not, then fault isolation and network-specific remediation (via service-level agreement inquiries) will be difficult. At that point, you'll have to move on to compensatory measures.
The goal of isolating problem sources is to fix the specific issue causing an application performance problem -- rerouting a connection, changing a hosting location and so on. When you can't isolate the problem or make needed changes because you lack resource control, you'll need to take compensatory performance enhancement measures to improve cloud APM.
Boosting cloud application performance
Available APM techniques fall into two primary groups: network acceleration and component replication for load sharing. Among the biggest mistakes an IT pro can make is thinking that the former should be used only for network issues and the latter only for computing issues. Anything that improves performance can be used in compensatory enhancement -- whatever the base cause of the performance problem -- because the goal is to make up for the problem rather than fix it.
Network performance enhancement usually involves a combination of data compression, multipath transmission and traffic prioritization. About half of all enterprises use some form of network performance enhancement in their applications, so they'd naturally expect to carry the same tools over for use in cloud APM.
The problem is that when the technique requires a pair of devices, one at each end of the network path, it may be impossible to install the application side of the pair in the cloud. Look for network performance tools that will operate with server-side software rather than with an appliance. But be sure the software is compatible with the hardware or software platform of the cloud because it will have to be integrated with the machine image for deployment.
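Of the techniques named above, data compression is the easiest to picture as software that travels inside the machine image rather than as an appliance. The sketch below uses Python's standard `zlib` module purely as an illustration; the payload is a made-up, highly repetitive stand-in for an application response, so the savings it shows are far better than typical traffic would see.

```python
import zlib

# Hypothetical payload: a repetitive JSON-like response body.
payload = b'{"status": "ok", "items": []}' * 200

# Compress on the sending side, inside the application image.
compressed = zlib.compress(payload, level=6)

# Decompress on the receiving side; the content must survive unchanged.
restored = zlib.decompress(compressed)

print(len(payload), "bytes before,", len(compressed), "bytes after")
```

Compression trades CPU time for fewer bytes on the wire, so it helps most when network delivery time, not execution time, is the larger part of the delay.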
Replication of application components provides additional parallel-processing capacity to improve performance under load, but this mechanism will work for cloud APM only if the performance problem is caused by application load. If you suspect that's the case, your preferred choice would be a higher-performance server or dedicated server.
However, if server performance can't resolve the problem and it's clearly load-related, consider replication. To make replication work, the application must be designed to run as a set of parallel instances with a load balancer dividing the work. To be used in the cloud, the load balancer will likely have to be a cloud-hosted software component.
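The arrangement described above, parallel instances behind a software load balancer, can be sketched in a few lines of Python. This is a minimal round-robin illustration under stated assumptions, not a production design: the `RoundRobinBalancer` class, the `make_instance` helper and the instance names are all hypothetical, and the "instances" are plain functions standing in for replicated application components.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each request to the next instance in a fixed rotation."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._rotation = cycle(instances)

    def dispatch(self, request):
        instance = next(self._rotation)
        return instance(request)


def make_instance(name):
    """Build a stand-in for one replicated application component."""
    def handle(request):
        return f"{name} handled {request}"
    return handle


balancer = RoundRobinBalancer(
    [make_instance("instance-a"), make_instance("instance-b")]
)
results = [balancer.dispatch(f"req-{i}") for i in range(4)]
for line in results:
    print(line)
```

Round-robin only divides the work evenly when any instance can serve any request, which is why the application has to be designed for parallel operation before replication will help.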
Most cloud performance problems can be resolved by tuning cloud and network connections, following the same general procedures used for private data center application hosting. The risk is that the procedures needed to keep QoE within bounds will incur additional cloud-hosting charges for special services, and that may compromise the cloud business case. It's best to validate cloud APM issues in the trial phase -- while there's still time to review the decisions and make changes where needed.