Is Big Data Creating Big Issues For Storage Performance And Scalability?


Over the last decade there has been an explosion in data growth. According to a report released earlier this year by analyst house Gartner, many IT leaders are attempting to manage “big data” challenges by focusing on the high volumes of information to the exclusion of the many other dimensions of information management, leaving massive challenges to be addressed later.

One of the key factors behind this explosion of big data is the rise of content-intensive businesses and applications, which place demanding workloads on storage and file systems. For example, just a few short years ago it was practically unheard of for organisations to have file counts measured in the billions or content scaling from several hundred terabytes to several petabytes.

Previously, organisations would archive a large percentage of their content on tape. In today’s on-demand, social-media-centric world, this is no longer feasible: users demand instantaneous access to their data and will not tolerate the delays involved in restoring it from a tape vault.

Today, organisations are increasingly unable to keep a handle on the sheer volume of data flowing through the business. Aggravating the problem is the need to distribute content among multiple data centres, either for disaster recovery or to place content at the network edge, close to the requesting users, to keep latency to a minimum.
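To make the latency point concrete, the sketch below shows the kind of nearest-replica selection a geo-distributed store performs when serving a request. The site names and round-trip times are hypothetical, and this is a minimal illustration rather than any particular product’s logic.

```python
# Illustrative sketch (not any vendor's API): choose the replica site with the
# lowest measured round-trip time to the requesting user. Site names and
# latency figures are hypothetical.

SITES = {
    "eu-west":  {"rtt_ms": 12,  "has_object": True},
    "us-east":  {"rtt_ms": 85,  "has_object": True},
    "ap-south": {"rtt_ms": 160, "has_object": False},  # not yet replicated
}

def pick_replica(sites: dict) -> str:
    """Return the name of the closest site that already holds the object."""
    candidates = {name: info for name, info in sites.items() if info["has_object"]}
    if not candidates:
        raise LookupError("object not replicated to any site")
    return min(candidates, key=lambda name: candidates[name]["rtt_ms"])

if __name__ == "__main__":
    print(pick_replica(SITES))  # -> "eu-west"
```

If content has not been placed near the user, the request falls back to a distant site and the user pays the full wide-area round trip on every access, which is exactly the delay this distribution strategy is meant to avoid.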

This is important for the performance of applications critical to the day-to-day running of the business, such as Oracle Fusion or SAP enterprise resource planning (ERP). In addition to the data these resource-heavy applications produce, IT departments must now deal with the massive scale of traffic and content associated with consumer sites such as Facebook and YouTube.

While Content Delivery Networks (CDNs) can help for very frequently accessed content, they quickly become too expensive as a way of distributing content at petabyte scale. Traditional storage and file systems are not well suited to these problems either, leaving the storage architect with little choice but to attempt to engineer a solution on their own and suffer with complex, poorly performing systems and networks.
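As a rough back-of-envelope illustration of why CDN delivery stops scaling economically, consider the calculation below; the per-gigabyte rate is an assumed figure for illustration, not a quoted price.

```python
# Back-of-envelope sketch only: the egress rate is an assumed, illustrative
# figure, not a real CDN price list.

PETABYTE_IN_GB = 1_000_000        # 1 PB expressed in decimal gigabytes
assumed_rate_usd_per_gb = 0.05    # hypothetical delivery charge per GB

monthly_cost = PETABYTE_IN_GB * assumed_rate_usd_per_gb
print(f"Serving 1 PB per month at ${assumed_rate_usd_per_gb:.2f}/GB "
      f"comes to roughly ${monthly_cost:,.0f}")
# -> Serving 1 PB per month at $0.05/GB comes to roughly $50,000
```

Whatever the actual rate, the cost grows linearly with the volume served, which is why CDNs suit hot, frequently requested content rather than bulk distribution of petabyte-scale repositories.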

Traditional storage and file systems struggle in these environments because they were designed to be deployed in a single location. They provide no mechanism for creating a common data pool across multiple geographically dispersed sites, leaving the systems architect to attempt to solve the problem with a multitude of individual storage systems, file systems, replication software instances, custom coding, and management points.
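The sketch below gives a flavour of the ad-hoc glue code this patchwork forces on the architect: a per-site push using rsync, with hypothetical hosts and paths. It is an illustration of the problem rather than a recommended design, and it still leaves each site as a separate island of storage rather than a single global namespace.

```python
# Illustrative only: the sort of ad-hoc glue an architect ends up writing when
# each site is its own island of storage. Hosts and paths are hypothetical.

import subprocess

PRIMARY = "/exports/content/"                     # assumed local export path
REMOTE_SITES = [
    "storage01.eu-west.example.com:/exports/content/",
    "storage01.us-east.example.com:/exports/content/",
]

def replicate_to_all() -> None:
    """Push the primary site's content to every remote site, one at a time."""
    for remote in REMOTE_SITES:
        # Each site needs its own transfer, its own schedule, and its own
        # failure handling -- none of which adds up to a single global namespace.
        subprocess.run(["rsync", "-az", "--delete", PRIMARY, remote], check=True)

if __name__ == "__main__":
    replicate_to_all()
```

Every additional site multiplies the replication jobs, monitoring, and failure modes to manage, which is precisely the operational burden a geographically aware storage system is meant to remove.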

The good news is that there are now technologies on the market that can solve these storage performance and scalability problems. They deliver an easy-to-use, massively scalable cloud storage solution for the storage architect who must contend with the massive amounts of data running through a modern organisation’s network.

They can also handle extremely high file counts and access rates, and they are up to the challenge of simply and efficiently making content available wherever it is needed across the globe.

In summary, in a world that demands massive content stores and the ability to easily and rapidly place the content around the globe, a different approach is clearly needed. Content intensive applications need a storage system that can start small, yet scale easily to dozens of petabytes. The system must be capable of storing tens of billions of files in a single repository.

In addition, it must be able to serve content rapidly and handle the workload of millions of users each day. It must be exceedingly easy to use and scale, requiring no particular expertise to deploy or tune. Finally, and perhaps most importantly, it must change the paradigm of storage arrays from being cemented to a single location to being a globally dispersed (yet centrally managed) system that can actively store, retrieve, and distribute content anywhere it is needed.


Oliver Robinson is director of product management at Data Direct Networks (DDN). Oliver is a strategic thinker and specialist in product design and strategy, focusing on new product launches and working with organisations entering new lines of business. He has over ten years of customer-facing experience, including more than four years in senior product management and eight years in pre-sales systems engineering with major companies including NetApp, Sun and Symantec.