The topic of big data may sound a bit dry and technical, but Viktor Mayer-Schönberger and Kenneth Cukier, in their book Big Data: A Revolution That Will Transform How We Live, Work, and Think, bring it to life with clear explanations, historical references, and fascinating examples. To explain how big "big data" is, they write:
"If it were all printed in books, they would cover the entire surface of the United States some 52 layers thick. If it were placed on CD-ROMs and stacked up, they would stretch to the moon in five separate piles."
Numerous critics dislike the label "big data" because it is a relative, ill-defined term. Mayer-Schönberger and Cukier simplify the debate by writing:
"One way to think about the issue today -- and the way we do in the book -- is this: big data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments, and more."
The topic of big data includes everything from the collection of data (which comes from everywhere), to the computing power needed to analyze that data, to the software that has been developed to make the analysis possible. When most people write about big data and its value, what they are generally talking about are the analytical insights that can be drawn from very large sets of data. As Mayer-Schönberger and Cukier note, "The real revolution is not in the machines that calculate data but in data itself and how we use it."
They point out that historically mankind has progressed as our access to data has increased. As mankind's ability to gather and store knowledge has improved, what we have been able to do with that data has also improved. "It's the same with big data," write Mayer-Schönberger and Cukier, "by changing the amount, we change the essence." Part of that "essence" will be the ubiquity of computers and networks used to augment human judgment. They write:
"Big data is all about seeing and understanding the relations within and among pieces of information, that until recently, we struggled to fully grasp. ... Big data is about three major shifts of mindset that are interlinked and hence reinforce one another. The first is the ability to analyze vast amounts of data about a topic rather than be forced to settle for smaller sets. The second is a willingness to embrace data's real-world messiness rather than privilege exactitude. The third is a growing respect for correlations rather than a continuing quest for elusive causality."
One example the authors give of an insight gained through the analysis of big data involves Walmart. Analysis showed that the sales of Pop-Tarts increased dramatically at stores in areas predicted to be hit by a hurricane. As a result, store managers were directed to place displays of Pop-Tarts near the entrance of the store when hurricanes were forecast. It makes sense, of course, that when faced with a potential natural disaster, people would want to stockpile a food that requires no preparation or electricity to eat and comes in a waterproof pouch. But, as Mayer-Schönberger and Cukier note, a retailer like Walmart doesn't need to know why that relationship exists in order to provide its customers what they want and, at the same time, increase its bottom line. It's the relationship, not the cause, that is important. That's the beauty of big data analytics.
The authors note that even though big data has been a hot topic for a couple of years, "In some ways, we haven't yet fully appreciated our new freedom to collect and use larger pools of data." They tell an intriguing tale about statistics and analysis that begins three centuries ago and leads up to how today's big data systems are able to do things like sequence DNA. They write:
"Sampling is an outgrowth of an era of information-processing constraints, when people were measuring the world but lacked the tools to analyze what they had collected. ... The concept of sampling no longer makes as much sense when we can harness large amounts of data. ... So we'll frequently be okay to toss aside the shortcut of random sampling and aim for more comprehensive data instead. Doing so requires ample processing and storage power and cutting-edge tools to analyze it all. It also requires easy and affordable ways to collect the data. In the past, each one of these was an expensive conundrum. But now the cost and complexity of all these pieces of the puzzle have declined dramatically. What was previously the purview of just the biggest companies is now possible for most. Using all the data makes it possible to spot connections that are otherwise cloaked in the vastness of information."
Mayer-Schönberger and Cukier's book has received numerous reviews over the past several months -- most of them recommending the book as a good read. Some reviewers, apparently, believe that Mayer-Schönberger and Cukier are cheerleaders for big data. For example, Kirkus Reviews' assessment of the book (one of the more enthusiastic you will read) states, "Plenty of books extol the technical marvels of our information society, but this is an original analysis of the information itself -- trillions of searches, calls, clicks, queries and purchases. ... A fascinating, enthusiastic view of the possibilities of vast computer correlations and the entrepreneurs who are taking advantage of them." ["Big Data," Kirkus Reviews, 17 February 2013] When Gil Press asked them if they were cheerleaders for big data, Cukier quickly remarked, "We are messengers of big data, not its evangelists." Mayer-Schönberger added, "The reviewer did not read the book." ["What's to be Done about Big Data?" Forbes, 11 March 2013] Maybe the reviewer only read the beginning of the book. Most of the worrying aspects of big data are discussed near the end of the book. As Evgeny Morozov notes in his review of the book, "Fortunately, 'Big Data' isn't just another cyber-utopian tome, and the final section of the book offers a critical look at some of the darker effects of recording and analyzing everything." ["When More Trumps Better," Wall Street Journal, 8 March 2013]
Press calls the book "an excellent introduction for general audiences." I agree with that assessment. He adds, "The most important part of the book is the authors' discussion of potential risks and possible ways to address them, providing a launch-pad to a much-needed conversation regarding what's to be done about big data." Another reviewer, Hiawatha Bray, agrees with Press that Mayer-Schönberger and Cukier recognize that big data has challenges and the potential for misuse. In his review of the book, he writes, "To their credit, the authors are well aware of technology's relentless erosion of privacy. Even if you strip names and addresses from a database, it's possible to identify individuals by analyzing enough of the websites they visit or the Google searches they run. ... Mayer-Schönberger and Cukier offer up some sensible suggestions on how we can have the blessings of big data and our freedoms, too. Just as well; their lively book leaves no doubt that big data's growth spurt is just beginning." ["'Big Data' by Mayer-Schönberger and Cukier," The Boston Globe, 5 March 2013]
Big data is an important topic that is only going to grow in importance in the years ahead. Most pundits believe that we are in the infancy stage of big data and that, as it matures, the uses to which it can be put might surprise us all. That's why I agree with the authors that big data will transform how we live, work, and think. If you want to gain a good basic understanding of the subject, Mayer-Schönberger and Cukier's book is a good place to start.