Coburn Watson doesn't fear the challenge of optimizing one of the largest Amazon Web Services deployments in the world. He revels in it.
The cloud performance engineer came to movie rental giant Netflix Inc. less than a year ago because he loves solving problems. He has been doing it ever since he was a kid, back when high-tech meant building applications in the Logo programming language on his TI-99/4A computer. Engineering skills run in the family. Watson's dad started working with large computer systems in the 1970s, focusing on performance issues.
Watson's path to Netflix did not run in a straight line. He left Texas for California as a student, attending UC Santa Barbara to study aquatic biology rather than computer science or engineering. Watson was drawn to the ocean and to the analytical side of biology. After graduation he started working in biotechnology, around the time computers began to play a larger role in analyzing data.
Drawing on his early grounding in technology, he became the lab's systems administrator. That led to a varied career in IT: he has been an Oracle database administrator, a Java developer, a solutions architect and a performance manager.
When the chance to work with one of the world's largest cloud systems came along, Watson found the opportunity too good to pass up. Since he started as manager of cloud performance engineering, his team has identified a major challenge facing Netflix: ensuring that overhead does not grow at the same rate as subscriptions. He has worked with the engineering and performance teams to create an environment where the cost of cloud infrastructure does not slow down the business.
"We're not necessarily trying to reduce costs," said Watson, who describes Netflix as being in the early stages of growth despite its 26 million online subscribers. "Our goal is really to not increase our AWS costs linearly with subscriber count. We do that by optimizing resources."
Controlling spending on Amazon instances was a major concern among attendees at re:Invent, AWS's global partner and customer conference, held in Las Vegas in November. Several sessions and talks addressed the topic, including one by Watson that offered pointers on managing instances through optimization rather than hard limits and rules.
"We approached [AWS usage] from the perspective of -- we have a bunch of very sharp engineering teams, everyone wants to make the right decision," he said. "We don't enforce policies around AWS usage. We don't tell teams you can only use 100 instances, we don't tell teams you can't deploy until next Monday."
Instead of a dictatorial approach, Watson takes a more holistic view. Key to that philosophy is the enormous amount of usage data Netflix gathers through internal monitoring tools and the analytics reports those tools generate.
"We have so many things running out there, I think the ability to have a nice aggregated view is something that's required," Watson said. Netflix has tens of thousands of instances running on a given day and uses automated monitoring and reporting tools to turn all that usage into actionable data. "Given the scale that we're at, you really have to have these tools."
Netflix's tools are crucial for the engineering team, and a few, including an open-source cloud management and deployment tool called Asgard, are available for free to AWS users.
Solving business problems with technology
Watson says the point of all these tools is to constantly deliver data to the people in the company who need it to make decisions. Periodically, he leads a group of managers who discuss what they've learned from the reports. He thinks the meetings help the organization solve business problems. The meetings are particularly productive, he said, because they are held in person at the company's office in Los Gatos, Calif., rather than by teleconference. Watson values being able to work with his team in person, calling it a perk of working at Netflix.
"When we set these meetings and talk about usage, I can drill down along account, region, zone, instance type. And I can break it down by team," he said. "I work with a bunch of stunning colleagues on very complex problems. We have really incredible efficiency because we all work together in one location."
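The article does not say how Netflix's reporting tools work internally. As a minimal sketch, assuming each usage record carries the dimensions Watson lists (account, region, zone, instance type, team) plus an instance-hours figure, a drill-down view comes down to grouping records by whichever dimensions you choose and summing them. The `UsageRecord` type, its fields and the `aggregate` function are hypothetical illustrations, not Netflix's schema or any AWS API, and the sample data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical record shape for illustration only; not Netflix's actual schema.
    account: str
    region: str
    zone: str
    instance_type: str
    team: str
    instance_hours: float


def aggregate(records, dimensions):
    """Sum instance-hours grouped by the named dimensions, e.g. ("region", "team")."""
    totals = defaultdict(float)
    for rec in records:
        # Build a grouping key from the requested fields, in the order given.
        key = tuple(getattr(rec, dim) for dim in dimensions)
        totals[key] += rec.instance_hours
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample data.
    sample = [
        UsageRecord("prod", "us-east-1", "us-east-1a", "m2.4xlarge", "api", 240.0),
        UsageRecord("prod", "us-east-1", "us-east-1c", "m2.4xlarge", "api", 120.0),
        UsageRecord("prod", "us-west-2", "us-west-2a", "m1.xlarge", "encoding", 480.0),
        UsageRecord("test", "us-east-1", "us-east-1a", "m1.large", "api", 24.0),
    ]
    print(aggregate(sample, ("region",)))
    print(aggregate(sample, ("team", "instance_type")))
```

Because the dimensions are passed in as a tuple, the same function answers "by region," "by team" or "by team and instance type" without a separate report for each view, which is the kind of flexibility the drill-down Watson describes relies on.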
Through communication, close monitoring of operations and continued development of best practices, Watson has helped create an environment where costs are kept in line without stifling innovation or shackling the engineers.
"We have a principle that we really don't want to get in the way of engineers with capacity planning," he said.
The principle is part of Watson's philosophy toward cloud computing and core to his mission of keeping costs under control regardless of increases in subscribers. To date it has been a success, to the point where potential customers interested in AWS are given the Netflix story as a reason to sign up. It even got the big-screen treatment at re:Invent.
Watson remains more fascinated by the problems posed by capacity management than the attention he gets at conferences. He hopes that more companies will benefit from the best practices and open-source technology his team has developed.
This was first published in December 2012