The plotline for the data center's Big Data story is still being written, and it is already full of twists and turns. The Internet of Things (IoT) is just the latest installment in a multi-episode drama that will spawn its share of shark-jumping forecasts.
One of the subplots that tends to be overlooked is the new attitude toward "archive." Past application design usually relegated archiving to an afterthought, treating archived data the same as old log files and superseded installation images. Data scientists and others have argued that these archived data sets deserve, at the least, elevated status. Marketing gurus even believe some archives might hold keys to improved customer retention and higher-resolution metrics for past campaigns.
They're not speaking of occasional use — pulling a tape from the vault every now and then. They want to fully integrate current data analytics with those of the past. A challenge for records management firms? A new business model, maybe, but more importantly, a bigger model for the data center.
"If businesses are to extract value from years of history and corporate memory, they must store data in a fully accessible database or data store with access methods that are standards-based so they don't need to maintain a different set of skills and tools," Ashar Baig wrote in Gigaom. "For some organizations, combining current and historical data sets is optimal for providing organizational stakeholders with query access to production data warehouses and data archives."
In its work to define "Big Data," the NIST Big Data Working Group made "archive" a leading characteristic: a Big Data set is one too big to archive readily under older notions of what an archive is.
These ideas bring archives out of the vault and back into the data center.
Sure, there will be degrees of relevance for data. It's tempting to be dismissive of old logs associated with long-discarded servers and desktops, or of applications whose developers closed up shop decades ago.
But not so fast.
A few farsighted folks in HR have studied data lakes such as payroll giant ADP's DataCloud, which is used for predictive HR analytics. Those supposedly useless logs might help characterize employee performance, or help workforce managers better shape position management, a discipline that tends to limp along with a bare minimum of empirical traction. Other logs, from configuration management systems, could inform a CFO's lease-vs.-buy and cloud-vs.-local processing decisions.
The dominant recipe for today's application design is to capture more data, and to discard less of it or shunt less of it into cold, offline archives.
Then there's forensics. Title that episode of the show, "Data Science, Meet Dr. Compliance."
Just as past tips on managing network bandwidth inevitably touched on fat email attachments, the impact of the death of archiving as we know it will ripple across growing data lakes for years to come. The drought across the western U.S. drags on, but neither water shortages nor the sucking sound of data center power draw will keep those lakes from rising.
There'll be no cancelling this show anytime soon.