From the Blogosphere
A Sixth Thing CIOs Should Know About Big Data
Are you prepared?
Aug. 13, 2012 05:00 AM
Amid the slew of articles offering advice on Big Data, Joab Jackson's "Five Things CIOs Should Know About Big Data" stood out because it was absolutely spot on.
The five points he makes nearly always come up in our conversations with customers and prospects:
- You will need to think about big data. What we're seeing now is that the price of entry to big data, at least from a CapEx standpoint, is pretty low. Open source tools like Hadoop, Cassandra, MongoDB, MapReduce and others, combined with the relatively low price of cloud computing, mean that organizations that may not have been inclined to collect, store and analyze their data volumes are now more willing and able to do so.
- Useful data can come from anywhere. One way to characterize big data is as data that used to be "dropped on the floor." Gazzang CEO Larry Warnock likens big data to a giant fishing net trolling the ocean floor. What we're hearing from customers is that big data is often a combination of innocuous machine exhaust, customer transaction histories, geolocation, and some personally identifiable information like health records and bank account data. How you use those disparate pieces of data to enhance your business or advance a project is what big data is all about.
- You will need new expertise for big data. Could big data be the next growth industry? We certainly think and hope so.
- Big data doesn't require organization beforehand. Here we have the analogy of big data as a "dumping ground." Poor big data. In just the last three paragraphs, we've referred to it as stuff you drop on the floor, a fishing net scooping up debris and a dumping ground. If big data were a kid, he'd be in therapy right now.
The point is valid nonetheless. Big data allows you to ingest what you want, and worry about how you're going to use it later. This is how sensitive information often winds up in a big data environment.
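That "ingest first, structure later" pattern is often called schema-on-read, and a minimal sketch shows why it lets sensitive data slip in unnoticed. The event fields and log format below are made up for illustration; nothing here reflects a specific product's API:

```python
# Sketch of schema-on-read: raw events are stored untouched, and
# structure is imposed only when the data is read back.
import json

# At ingest time, every line is kept as an opaque string -- no one
# has inspected what the fields actually contain.
raw_events = [
    '{"user": "jdoe", "action": "login", "ip": "10.0.0.5"}',
    '{"user": "asmith", "card": "4111-1111-1111-1111"}',  # sensitive!
]

# Only at read time do we parse the records and discover what was
# collected -- which is exactly how credit card numbers and health
# records end up sitting unprotected in a cluster.
parsed = [json.loads(line) for line in raw_events]
sensitive = [e for e in parsed if "card" in e or "ssn" in e]
print(f"{len(sensitive)} of {len(parsed)} events contain sensitive fields")
```

The convenience is real, but so is the blind spot: the environment happily accepts data that no one has classified yet.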
- Big Data is not only about Hadoop. There are a number of really popular tools out there to help you make sense of your massive volumes of data. Joab mentions Splunk, HPCC Systems and MarkLogic. We have customers also using MongoDB, Ironfan from Infochimps and Chef for cloud infrastructure automation.
In the next few weeks, Gazzang will bring to market a new big data monitoring and diagnostics tool called zOps. Stay tuned for news on the newest member of the Gazzang product family.
Finally, I wanted to add a sixth and final piece of advice to Joab's article.
- Think about security before you start. Too often, we hear from companies that leave data unprotected in a big data environment only to realize later that usernames and passwords, credit card data or health records were at risk of exposure. Fortunately, this hasn't come back to bite anyone yet (that we know of), but it's likely only a matter of time.
Retrofitting security into an existing big data cluster, which may contain thousands of nodes, is challenging. It also takes time to understand what data is being collected and whether it's even worth protecting.
Data encryption and key management can act as a last line of defense against unauthorized access or attack. It's relatively inexpensive and won't noticeably impact the performance or availability of a big data environment. So our advice to customers is, if you think you might have some sensitive data in your environment, it's better to be safe than sorry.
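The idea can be sketched in a few lines. This is an illustration only, assuming the third-party `cryptography` package; the record layout and field names are hypothetical, and it does not represent any particular vendor's product:

```python
# Minimal sketch: encrypt sensitive fields before they land in the
# cluster, keeping the key outside the data environment.
from cryptography.fernet import Fernet

# In production the key would be fetched from a separate
# key-management service, never stored alongside the data it protects.
key = Fernet.generate_key()
cipher = Fernet(key)

record = {"user": "jdoe", "ssn": "123-45-6789"}

# Encrypt only the sensitive field at write time, before ingestion.
record["ssn"] = cipher.encrypt(record["ssn"].encode())

# Anyone who copies the stored record sees only ciphertext; an
# authorized consumer with access to the key can recover the value.
assert cipher.decrypt(record["ssn"]).decode() == "123-45-6789"
```

The point of the separation is that even if the cluster itself is compromised, the attacker walks away with ciphertext rather than usernames, card numbers or health records.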