In recent posts, I have talked at length about “Big” files and the issues around transporting and managing data of that size; now it is time to talk about “Big” data. The market is well aware of the pros and cons of traditional methods for addressing file transfer challenges, and of the Managed File Transfer (MFT) solutions that resulted. However, the many benefits of implementing an MFT solution come with a side effect: additional data silos. These silos, while created to protect sensitive data with tools such as encryption, multi-factor authentication, and data abstraction, invariably hide a sizable amount of data from Big Data tool sets and ancillary solutions.
Let’s put this into perspective. According to IBM, we create 2.5 quintillion bytes (2.5 exabytes in SI terms) of data every day. Assuming a typical hard drive today holds about 1 terabyte (TB), this equates to approximately 2.5 million drives’ worth of data generated every day. Ninety percent of the data in existence today was created in the past two years, and totals are expected to grow exponentially. When we start to aggregate these figures, the result is a staggering sum.
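For readers who want to check the arithmetic, here is a minimal sketch of the drive-count estimate. It assumes decimal (SI) units, so 1 TB is taken as 10^12 bytes and 2.5 quintillion bytes as 2.5 x 10^18; the variable names are only illustrative.

```python
# Back-of-the-envelope check of the drive-count estimate.
# Assumption: SI (decimal) units, i.e. 1 TB = 10**12 bytes.
BYTES_PER_DAY = 25 * 10**17      # 2.5 quintillion bytes (2.5 exabytes)
BYTES_PER_DRIVE = 10**12         # one 1 TB hard drive

drives_per_day = BYTES_PER_DAY // BYTES_PER_DRIVE
print(f"{drives_per_day:,} drives per day")
```

If drive capacity were counted in binary units instead (2^40 bytes per drive), the figure would come out somewhat lower, at roughly 2.27 million drives, so the estimate is best read as an order of magnitude.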
From the file transfer vantage point, MFT vendors are seeing corresponding growth in individual file sizes, with an estimated 40% increase over a five-year period (2006-2011). This can be attributed to a variety of factors, such as backwards compatibility support and rich object embedding, combined with cheaper storage and faster networks.
Although the issues around how to store, structure, and extract data are a focus for enterprises today, one must recognize that IT organizations will be looking for applications that integrate easily into their Big Data tool sets. MFT data silos certainly fall into this category, as they contain data from a multitude of agnostic sources: data generated by people, process, or product. The data resident in these silos is often invisible even to the originating application itself, much less to Big Data offerings, and that gap is critical to address. As exciting as Big Data is, there are still a number of areas to consider, especially when it comes to the known and unknown data stores across the enterprise. To avoid marginalizing these technology efforts, MFT vendors need to evaluate their own solutions and provide the logging, reporting, and APIs needed to make that data visible to Big Data tools.