How to solve your TMI problem: Data science analytics to the rescue
The need to access data diminishes over time. This spectrum of data currency, from new and "hot" to old and "cold," is best served by a combination of different storage practices and technologies.
Speaking to a packed conference room at the Cloud Computing Expo in New York, Scott Cleland, senior director of product marketing at Western Digital's HGST business unit, said organizations should consider data currency when devising an enterprise storage strategy.
As an example, Cleland discussed an approach that encompasses a consolidated big data platform for hot-to-warm files up to five years old and a so-called active archive for older cool-to-cold files. An active archive should be thought of as a store of data that is too valuable to discard, but which is accessed only occasionally. Traditionally, files more than a decade old were often relegated to offline tape storage, a technology that today garners little mind share, but which remains widely used.
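The age-based tiering Cleland described can be sketched in a few lines. This is a hypothetical illustration, not an HGST product feature: the tier names and the five-year threshold are taken from the example above, and everything else (function names, the reference date) is assumed for demonstration.

```python
from datetime import datetime, timedelta

# Hypothetical tiering rule based on Cleland's example: files up to five
# years old stay on the consolidated big data platform; older, occasionally
# accessed files move to an active archive. Threshold is illustrative.
HOT_WARM_LIMIT = timedelta(days=5 * 365)

def storage_tier(last_accessed: datetime, now: datetime) -> str:
    """Return the tier a file belongs to based on its age."""
    age = now - last_accessed
    if age <= HOT_WARM_LIMIT:
        return "big-data-platform"   # hot-to-warm data
    return "active-archive"          # cool-to-cold data

# Example classification against an assumed reference date
now = datetime(2016, 6, 1)
print(storage_tier(datetime(2015, 1, 1), now))  # big-data-platform
print(storage_tier(datetime(2008, 1, 1), now))  # active-archive
```

A real policy would likely weigh access frequency and business value alongside raw age, but the age cutoff captures the spectrum of data currency the talk described.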
A complicating yet crucial factor to managing data currency, one that Cleland said is often overlooked, is that data almost always exists in multiple copies.
"We are always working with multiple copies of data, even though we often don't realize it," Cleland said. "We have copies for backup and disaster recovery, copies used by different corporate departments, and more copies around the globe to ensure fast response times from any user location."
These multiple copies not only strain storage budgets, but they also add complexity to the process of analysis, he said.
Though the challenge of keeping multiple copies of gigantic databases and other files fully synchronized was beyond the scope of Cleland's discussion, he cited areas where data duplication can evolve unintentionally. These include analysis clusters, data warehouses, silos that arise from line-of-business departmental use, disaster-recovery snapshots, and overlaps in public- and private-cloud application implementations.
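One common way to spot the unintentional duplication Cleland listed is to compare files by content hash across storage locations. The sketch below is an assumption-laden illustration (file paths, contents, and function names are invented), not a method from the talk:

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def find_duplicates(files: dict) -> dict:
    """Group file paths by content hash; any group with more
    than one path is a set of duplicate copies."""
    by_hash = {}
    for path, data in files.items():
        by_hash.setdefault(content_hash(data), []).append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

# Illustrative copies: a warehouse extract duplicated in a DR snapshot
files = {
    "warehouse/q1_sales.csv": b"region,amount\nEMEA,100\n",
    "dr_snapshot/q1_sales.csv": b"region,amount\nEMEA,100\n",
    "marketing/campaign.txt": b"spring launch notes",
}
print(find_duplicates(files))  # one group containing both q1_sales.csv copies
```

Hashing finds byte-identical copies only; copies that have drifted out of sync, the harder problem Cleland set aside, would need record-level comparison.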
Data currency is growing increasingly important because the rate at which we now amass data is unprecedented, Cleland said. "Data continues to be created at an exponential rate and total capacity of storage hardware being shipped is not able to keep up." Cleland cited data from researcher IDC forecasting that, by 2020, the total amount of data created and replicated annually will surpass 40 zettabytes, while so-called capacity shipments will fall shy of 10 zettabytes.
Speaking directly to developers of cloud and mobile applications, Cleland said that storage hardware is only as good as the software tools that surround it. "Developers need to understand that it is first important to get all your data in one place, then do a deep analytic run on that data."