Big data is of little use until applications can get their hands on it. Data ingestion -- the process of obtaining, importing and formatting data -- becomes critically important as data volumes grow and applications demand its immediate availability. Wal-Mart Stores Inc., the world's largest retailer, with more than 5,000 stores in the United States, turned to Robin Systems for help.
"As big data gets bigger and applications require instant access for real-time streaming analytics, you have to take that data in faster, store it and prepare it for use. That's data ingestion," said Judith Hurwitz, CEO of Hurwitz & Associates LLC, a cloud consultancy in Needham, Mass. "For two enterprises that are otherwise equal, the one that can ingest data faster is going to have a distinct advantage."
That was exactly the challenge facing Wal-Mart. Through the use of a container-based platform for compute and data virtualization, the company was able to raise the ingestion speed of 250 million files by a factor of 8.5, ultimately improving query performance by 250%. These gains were achieved alongside a simultaneous cut in infrastructure, from 16 servers with 320 cores to 10 servers with 160 cores.
Despite its massive presence and technology expertise, Wal-Mart -- like other retail businesses -- is struggling in a world turned upside down by the advent of digital-only retailers, typified by Amazon, according to Sushil Kumar, chief marketing officer at Robin Systems, based in San Jose, Calif.
One challenge facing Wal-Mart was the Savings Catcher component of its mobile app. Shoppers use the app to scan the bar code on their point-of-sale receipts, and Wal-Mart uses the information to compare prices to other retailers that are geographically close by. If a lower price is found, Wal-Mart refunds the difference to a virtual gift card that shoppers can spend on a subsequent store visit. The ingestion process needs to occur within milliseconds of the bar-code scan, Kumar said.
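The comparison Kumar describes can be pictured with a minimal Python sketch. It is an illustration only, not Wal-Mart's implementation: the function name `savings_credit`, the shape of the receipt and competitor data, and the sample prices are all hypothetical.

```python
from decimal import Decimal


def savings_credit(
    receipt: dict[str, Decimal],
    competitor_prices: dict[str, list[Decimal]],
) -> Decimal:
    """Sum the difference for every item a nearby competitor sells for less.

    Hypothetical model: `receipt` maps item -> price paid, and
    `competitor_prices` maps item -> prices found at nearby retailers.
    """
    credit = Decimal("0")
    for item, paid in receipt.items():
        nearby = competitor_prices.get(item, [])
        if not nearby:
            continue  # no competitor data for this item
        lowest = min(nearby)
        if lowest < paid:
            credit += paid - lowest  # refund only the difference
    return credit


# Made-up sample data for illustration.
receipt = {"milk": Decimal("3.49"), "bread": Decimal("2.00")}
competitors = {
    "milk": [Decimal("3.19"), Decimal("3.59")],
    "bread": [Decimal("2.25")],
}
print(savings_credit(receipt, competitors))
```

In this sketch only the milk qualifies, because the bread was already cheaper than the competing price. The hard part in production is not the comparison itself but having the scanned receipt and current competitor prices ingested quickly enough to run it.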
Wal-Mart manages about 30 PB of data in Hadoop, with dedicated server clusters for each application, according to Kumar. "For five different Hadoop environments, there would be five different clusters," he said. This approach had several downsides: Application development and testing slowed while individual clusters were configured, data was duplicated, ingestion bottlenecks formed and servers were typically vastly underutilized.
Problems were exacerbated when different applications needed access to the same data. In this siloed, clustered environment, the only way to provide that access was to keep a copy of the data for each application. Hadoop's own automatic redundant replication for data protection created additional copies and infrastructure overhead. As a result, the number of application-driven and Hadoop-driven copies of the same data was on the verge of spiraling out of control.
Robin Systems' approach was to containerize storage and decouple it from compute. Doing so eliminated the need for multiple copies, enabled sharing across applications and slashed storage overhead by 50%, Kumar said. Each application still sees the storage as local.
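A rough way to see why copies multiply in a siloed setup, and why sharing helps, is the back-of-the-envelope Python sketch below. The function `physical_copies` is hypothetical, and the numbers are assumptions: five applications echoes Kumar's example of five clusters, and a replication factor of three is the HDFS default, not a figure reported for Wal-Mart or for Robin Systems' platform.

```python
def physical_copies(num_apps: int, replication_factor: int, shared: bool) -> int:
    """Count the physical copies of one dataset.

    Siloed: every application cluster holds its own copy, and each copy
    is replicated. Shared: one logical copy serves all applications.
    """
    logical_copies = 1 if shared else num_apps
    return logical_copies * replication_factor


# Assumed values for illustration only.
APPS = 5          # one cluster per application, as in Kumar's example
REPLICATION = 3   # HDFS default replication factor

siloed = physical_copies(APPS, REPLICATION, shared=False)
pooled = physical_copies(APPS, REPLICATION, shared=True)
print(f"siloed: {siloed} copies, shared: {pooled} copies")
```

The point of the sketch is the multiplication: application-driven copies and replication-driven copies compound each other, so removing the per-application copies shrinks the total far more than tuning replication alone would.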
Though improving the shopping experience remained the overriding goal, a direct beneficiary is Wal-Mart's army of application developers, Kumar said. "Decoupling storage from compute gives new agility and flexibility to developers," he said. "Anything that simplifies the development process leads to better applications that can be built, tested and deployed in less time."
The rate at which data can be ingested and acted upon is becoming a differentiator and competitive advantage for businesses, according to Brian Hopkins, a Forrester Research principal analyst serving enterprise architecture professionals. "[In 2016,] ingestion and analytics will become a must-have for digital winners," he said. "The window for turning data into action is narrowing."