The best reason to seek an understanding of the past is to help predict the future. Analytics' implied goal is just that, but there's often a gap between the two elements of that value proposition. For cloud analytics, the promise of using historical data to drive everything from planning to real-time operation is exciting but hard to realize. If you want to get the most predictive value from cloud analytics, be sure to capture the scope of information you need, frame the data in a way that supports predictions, expect to hybridize tools to get your best results and don't forget to support action.
Data is information. Knowledge is information and context. Prediction is the extension of knowledge into the future. This is the framework of predictive analytics at a high level, but to get the best results from a predictive analytics project, you have to look deeper into all three of these terms -- not in this order but starting with your expected outcomes.
Most cloud users start their predictive cloud analytics project with an implicit but all too often vague goal. "I want to use the cloud more efficiently" is probably how most enterprises would state things, but what's needed is a definition of efficiently. Does it mean lower cost, better quality of experience or something else? A good predictive analytics project has to start with clearly defined goals, because those goals determine what the output of the prediction has to be. That, in turn, defines the data to be collected and the contextual insight that needs to be applied to it.
Beginning a predictive analytics project
A good starting point is to presume you will collect hard metrics on two results of cloud deployment: application response time and cost. To do that, you'll need to collect specific data on everything that impacts cost, meaning everything your provider charges for. You'll also need to collect data on everything that contributes to response time, including both the cloud and the network services that connect users with the cloud. This data has to be collected with some level of synchronicity; you can't measure cost on Tuesday and quality of experience (QoE) on Friday, or you'll lose the context of your information.
Many users find that they overanalyze. Cloud analytics is most useful when it's correlated with distinct changes in cloud deployment, rather than simply gathered every minute for six months. The low-level information has to be related to things that have changed so that you can predict the results of further changes of that type in the future. You'll need to time-stamp all your data and mark the points where any cost/performance changes to your cloud environment were made.
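As a minimal sketch of what time-stamped, synchronized data with marked change points could look like, consider the Python below. The names Sample, ChangePoint and split_at_change are hypothetical, as are the fields; your own metrics will depend on what your provider charges for and how you measure response time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    """Cost and response time measured over the same interval (hypothetical fields)."""
    timestamp: datetime
    cost_per_hour: float
    response_time_ms: float


@dataclass(frozen=True)
class ChangePoint:
    """Marks the moment a cost/performance change was made to the cloud environment."""
    timestamp: datetime
    description: str


def split_at_change(samples, change):
    """Partition samples into those taken before and those taken at or after the change."""
    before = [s for s in samples if s.timestamp < change.timestamp]
    after = [s for s in samples if s.timestamp >= change.timestamp]
    return before, after
```

Keeping cost and response time in one record is one way to preserve the context the two measurements share; the change marker is what later lets you relate the low-level data to something you actually did.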
The predictive approach to using analytics will almost always focus on a before-and-after analysis of these change points. The essential idea is to analyze conditions at a change point and extrapolate the results to predict what would happen if the same sort of change were made again. Obviously, this can't be done where the change has no impact on cost or QoE, or where the change can't be replicated. This is where your goals come in.
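A before-and-after analysis can be sketched in a few lines of Python. This is only an illustration under a strong assumption: that repeating a change shifts the metric by the same amount it did last time. Real workloads rarely scale that linearly, and the function names and sample figures here are invented for the example.

```python
from statistics import mean


def change_effect(before, after):
    """Average shift in a metric across a change point (after minus before)."""
    return mean(after) - mean(before)


def predict_next(current, effect):
    """Naive extrapolation: assume the same change produces the same shift again."""
    return current + effect


# Illustrative measurements around one change point (adding an instance).
response_before = [420.0, 400.0, 410.0]   # milliseconds
response_after = [300.0, 310.0, 290.0]
cost_before = [2.0, 2.0, 2.0]             # cost per hour
cost_after = [3.0, 3.0, 3.0]

response_effect = change_effect(response_before, response_after)
cost_effect = change_effect(cost_before, cost_after)

predicted_response = predict_next(mean(response_after), response_effect)
predicted_cost = predict_next(mean(cost_after), cost_effect)
```

If either effect comes out near zero, the change has no measurable impact on that metric and there is nothing to extrapolate from.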
Most predictive cloud analytics is applied to the scaling of virtual machines and containers, to the location of resources and to the routing of workflows, particularly where they cross cloud boundaries. Often, the conditions that would trigger analysis will have to be forced. For example, change the number of instances of an application to see how the cost-QoE relationship changes.
This type of predictive application drives cloud planning. It's also possible to drive cloud operations with a predictive analytics project in real time, but this requires very careful planning to avoid unwanted cost or performance consequences.
Few organizations will want to couple predictive cloud analytics directly into detailed cloud control, because this will almost always require custom software. Instead, think of using analytics to generate events that will then trigger cloud DevOps actions. Using predictive analytics to drive an architected cloud change via DevOps lets a company build scaling or redeployment limits into applications. Those limits will prevent goal-seeking on QoE from, for example, doubling the number of application instances deployed, at double the cost.
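The idea of limits on analytics-driven changes can be sketched as a simple clamp applied before any DevOps action is triggered. The ScalingLimits type and bounded_target function below are hypothetical and don't correspond to any particular DevOps tool; they only illustrate the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingLimits:
    """Hypothetical guardrails applied to analytics-generated scaling events."""
    min_instances: int
    max_instances: int
    max_step: int  # largest change in instance count allowed per event


def bounded_target(current, requested, limits):
    """Clamp a requested instance count to the per-event step and absolute limits."""
    step = max(-limits.max_step, min(limits.max_step, requested - current))
    return max(limits.min_instances, min(limits.max_instances, current + step))
```

With limits of two to six instances and a maximum step of one, an analytics event asking to go from four instances to eight would result in five, not a doubling.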
Some users are interested in using analytics to do dynamic tuning of application scaling, deployment zone usage or multicloud usage. This demands a quick condition-response cycle that is more consistent with complex event processing than with traditional analytics. Many public cloud providers also offer the option to build scaling and resiliency triggers into your cloud hosting using parameters. If yours does, use analytics and configuration testing to create different cloud hosting models, and test their cost/performance. Then, enforce the specific configuration using the cloud provider tools.
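Comparing tested hosting models might look something like the Python below. It assumes you have already measured cost and response time for each configuration and that your goal is the lowest cost that still meets a response time target; HostingModel and cheapest_meeting_target are illustrative names, not part of any provider's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostingModel:
    """Measured results for one tested cloud configuration (hypothetical fields)."""
    name: str
    cost_per_hour: float
    response_time_ms: float


def cheapest_meeting_target(models, max_response_ms):
    """Return the lowest-cost model whose response time meets the target, or None."""
    candidates = [m for m in models if m.response_time_ms <= max_response_ms]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.cost_per_hour)
```

A different goal, such as the best response time within a fixed budget, would simply swap the filter and the sort key.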
Cloud analytics and management tools, like Microsoft Operations Management Suite or Amazon CloudWatch, combine analytics and at least basic remedial steps into a single approach that doesn't rely on external tools or coupling with operations processes via DevOps. Where these tools generate alerts, the alerts can be used to trigger things like scaling. If you want more control, it's normally not difficult to convert these alerts into DevOps events for full control of cloud configuration and integration through an availability or scalability change.
Cloud analytics ultimately gains value by creating actionable insights -- at least at the planning level but eventually in the real-time domain as well. There are many cloud trends that are linked to real-time events, both at the application level and for cloud management. As these event-driven concepts mature, they'll impact both the requirements applications impose on the cloud and the mechanisms we have available to turn cloud performance and status information into action.