The big pros of big data

There is no downside to storing digital data.

With the start of a new year and resolution-making in full force, many of us are purging, streamlining and simplifying. In most areas of our lives, this is probably a good goal. When it comes to big data, however, not so much. In fact, as someone whose office has gone 100% digital, I say hang onto all that mega data — at least until we have a better understanding of how we can put it to work.

The fact is that we cannot ignore big data anymore. Why? Because big data isn’t just big; it’s beyond comprehension and growing exponentially every minute. According to International Data Corporation (IDC), the world’s information now doubles about every year and a half. At press time, IDC reckoned we would create at least 4 ZB (zettabytes) of data in 2014. That data would fill about 32 billion iPads (the 128 GB model), which, if stacked, would make a pile longer than the Great Wall of China. Former Google CEO Eric Schmidt has suggested that we now create as much information every two days as humanity did from the dawn of civilization through 2003.
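For readers who want to check the iPad comparison, here is a back-of-the-envelope sketch. It assumes decimal units (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes) and treats the full 128 GB as usable storage; the function name `projected_volume_zb` is made up for illustration. The result comes out at a little over 31 billion devices, in line with the roughly 32 billion cited.

```python
# Back-of-the-envelope check of the iPad comparison.
# Assumption: decimal units, and every byte of a 128 GB iPad is usable.
ZETTABYTE = 10**21
GIGABYTE = 10**9

data_created_bytes = 4 * ZETTABYTE      # IDC estimate quoted above
ipad_capacity_bytes = 128 * GIGABYTE    # 128 GB model

ipads_needed = data_created_bytes // ipad_capacity_bytes
print(f"iPads needed: {ipads_needed:,}")


def projected_volume_zb(start_zb, years, doubling_period_years=1.5):
    """Project data volume, assuming it doubles every `doubling_period_years`."""
    return start_zb * 2 ** (years / doubling_period_years)


# If the doubling rate held, three years would mean two doublings.
print(projected_volume_zb(4, 3))
```

The second function simply applies the "doubles about every year and a half" figure; whether that rate holds over any given period is an assumption, not a forecast.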

What is big data? Simply put, it’s the digital trail of everything we do, much of it generated by the Internet of Things. Everything we do is being captured digitally: websites browsed, books read, geo-locations visited, music listened to, payments made, credit cards used, photos and videos uploaded. You name it, and it’s being captured.

Big data can be structured, semi-structured or unstructured. Structured and semi-structured data can include a personal component (Facebook, Instagram, Twitter) and a business or enterprise component (financial records, XBRL, RFID, social media postings). Data scientists are already proficient at indexing this data and identifying trends in it. For example, homophily, or our tendency to associate with people like us, means that it’s possible for data analysts to predict how and where we’ll spend time and money based on our Facebook friends and LinkedIn contacts. Or they can turn to the saved GPS coordinates from our smartphones to create an accurate picture of exactly where we’ve been in the past month and then derive key pieces of information, such as where we like to shop and where we spend our free time.
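To make the GPS example concrete, here is a minimal sketch of how saved coordinates could be turned into a list of frequently visited places. It is an illustration only, not a description of how any particular analyst or product works: the coordinates are invented, the function name `frequent_places` is hypothetical, and snapping points to a grid by rounding is a deliberately crude stand-in for real clustering methods.

```python
from collections import Counter


def frequent_places(points, precision=3, top_n=3):
    """Count GPS fixes per grid cell and return the busiest cells.

    Rounding to 3 decimal places groups fixes into cells roughly
    100 m across north to south (0.001 degree of latitude is about 111 m).
    """
    cells = Counter(
        (round(lat, precision), round(lon, precision)) for lat, lon in points
    )
    return cells.most_common(top_n)


# Invented sample data: (latitude, longitude) pairs from a phone's history.
points = [
    (43.6532, -79.3832),
    (43.6534, -79.3831),
    (43.6529, -79.3829),
    (43.7001, -79.4163),
    (43.7002, -79.4161),
    (43.6426, -79.3871),
]

for cell, visits in frequent_places(points):
    print(cell, visits)
```

Matching the busiest cells against a map of shops, gyms or restaurants would be the next step in deriving where someone likes to spend time and money.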

Unstructured data includes voice, videos, PDFs and all the text available on all web pages. According to a recent editorial in the Journal of Information Systems (published by the American Accounting Association), "unstructured data represent the largest proportion of existing data and the greatest opportunity for exploiting Big Data." We can already do simplified searches of all kinds of information, and techniques are being developed and refined to filter through all untagged text, photos and videos. We’ve advanced far enough now that there is huge potential to improve client services.

Before our office became digital, I was often asked how long we should hang on to client documents. The answer used to be a balance among litigation exposure, client service and cost. Now that we’re paperless, I see no reason to get rid of anything that has been published because in our digital world, information exists beyond our office and is ready for someone else to use. And, for all practical purposes, data storage costs nothing. Given that there’s no negative to saving digital data, we should absolutely hang on to published unstructured data because one day soon we will be able to sort it, index it and use it to offer better client service and find ways to work more efficiently. Think about what you might learn if you could access all the correspondence you’ve had with clients and suppliers over the past 10 years. The fact is, we have effectively unlimited capacity to store information in all its forms. Don’t be afraid. Save it, embrace it and use it.