Storage Magazine - UK
HOW TO HANDLE THE DATA EXPLOSION!

From STORAGE Magazine Vol 7, Issue 8 - November/December 2007

Ron Condon, of information management company EMC, assesses the impact that rocketing data volumes will have on business and suggests ways to handle the challenge

The world is experiencing what can only be described as an explosion of digital data.

From the emails that keep us in instant communication, to the music that we download on the internet, to the videos that sit on social networking sites such as YouTube, the volume of this data is expanding at such a rate that we are in danger of running out of the means to store it all (visit EMC ticker for the latest figures: http://www.emc.com/about/destination/digital_universe/).

The sums involved are truly mind-boggling. Most of us have become acquainted with terms like a megabyte (equal to one million bytes or characters of information) from our encounters with personal computers. Some of us might even know that a gigabyte is 1,000 megabytes. But these terms are inadequate to describe the huge volumes of data being unleashed into our systems and on the internet. According to research company IDC, the amount of digital information created and replicated last year amounted to 161 exabytes - that's 161 billion gigabytes. Bringing it into more familiar terms, that is about three million times the information contained in all the books ever written, up to the present day.

But that is just the beginning. By 2010, states IDC, the amount of information added every year to the digital universe will increase more than six-fold to 988 exabytes. And the pace of growth will continue to increase.
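The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (one exabyte taken as one billion gigabytes, in line with the article's "a gigabyte is 1,000 megabytes") and assumes that "last year" means 2006, giving a four-year span to 2010. The function names are illustrative only.

```python
# Decimal (SI) units: 1 exabyte = 10**9 gigabytes (assumption stated above).
GB_PER_EXABYTE = 10**9

CREATED_2006_EB = 161   # IDC figure quoted in the article for "last year"
FORECAST_2010_EB = 988  # IDC forecast quoted in the article


def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * GB_PER_EXABYTE


def growth_multiple(start, end):
    """How many times larger `end` is than `start`."""
    return end / start


def implied_annual_growth(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


print(f"{exabytes_to_gigabytes(CREATED_2006_EB):,} GB")                  # 161,000,000,000 GB
print(f"{growth_multiple(CREATED_2006_EB, FORECAST_2010_EB):.1f}x")      # 6.1x
print(f"{implied_annual_growth(CREATED_2006_EB, FORECAST_2010_EB, 4):.0%} a year")  # 57% a year
```

On those assumptions, the IDC forecast implies that the yearly output of digital information grows by a little under 60% every year.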

What's driving it?
Much of the expansion is being driven by the conversion of analogue material to digital format. This includes the move from film to digital image capture, from analogue to digital voice, and analogue to digital TV. And, says IDC, the biggest component in the digital universe is the images captured by more than one billion devices around the world, from digital cameras and camera phones to medical scanners and security cameras.

These images are replicated over the internet, on private company networks, by PCs and servers, in data centres, in digital TV broadcasts and on digital projection movie screens. Social networking sites such as MySpace, Facebook and YouTube, which allow individual consumers to share their inner thoughts, holiday pictures and even videos with the rest of the world, are also boosting the sheer weight of data circulating on the internet.

As a recent article in the Washington Post pointed out, YouTube alone consumes as much bandwidth today as the entire internet consumed in 2000. Users upload 65,000 new videos every day and download 100 million files daily, a 1,000% increase from just one year ago. Other research shows that more than a billion songs a day are shared over the internet in MP3 format. By 2010, adds IDC, this kind of consumer-generated material will account for 70% of those 988 exabytes, with business and government accounting for the rest.

Does it matter?
And what does this have to do with business? This may be a consumer-driven trend, but it will have a huge impact on business, too, not least because most of the digital data (around 85%, by IDC's reckoning) will be sitting on some kind of corporate network. The owners of those networks will be responsible for the security, privacy, reliability and compliance of that data.

Moreover, they will have to deal with all the other new forms of business-based data flooding into their networks. This can range from digital CCTV signals, to new digital telephone systems and RFID tags, all of which will place an extra burden on networks and the companies trying to manage and control the information. "We estimate that the volume of data that companies need to manage and store is rising by around 60% a year, and that rate is increasing," says Adrian McDonald, UK managing director of EMC. "Of the total, some 80% of it is unstructured - emails, Word documents, presentations and so on."
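To see what growth of that order means in practice, the sketch below compounds a 60% rise year on year. It assumes the 60% figure is annual and stays constant, which is a simplification, since McDonald says the rate itself is rising. The starting volume of 100 terabytes is purely hypothetical.

```python
import math

ANNUAL_GROWTH = 0.60  # McDonald's estimate, treated here as a constant annual rate


def projected_volume(start_volume, years, rate=ANNUAL_GROWTH):
    """Volume after `years` of compound growth at `rate`."""
    return start_volume * (1 + rate) ** years


def doubling_time_years(rate=ANNUAL_GROWTH):
    """Years needed for the volume to double at a constant growth rate."""
    return math.log(2) / math.log(1 + rate)


start_tb = 100  # hypothetical starting volume in terabytes
for year in range(6):
    print(f"year {year}: {projected_volume(start_tb, year):,.0f} TB")

print(f"doubling time: {doubling_time_years():.2f} years")
```

At a steady 60% a year, stored data doubles roughly every year and a half and grows more than ten-fold in five years.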

The growing volume of data, and new types of data, will put extra pressure on computing operations. IT managers will see the span of their domains greatly enlarged, as VoIP phones are added to corporate networks; building automation and security migrate to IP networks; CCTV cameras go digital; and RFID and sensor networks proliferate.

Of course, all this information should be a good thing. Accurate information is at the heart of effective decision-making in a smooth-running business. But if the information is poorly managed, misfiled and inconsistent, then it is worse than useless and can even do damage by generating wrong decisions. At the same time, information is coming under increased scrutiny from lawmakers and industry regulators. Personal and financial data needs to be properly protected from unauthorised access, and company files need to be in a fit state to be inspected by the regulators and law enforcement.

As new rules make senior company executives personally accountable for the way their organisations handle information, the subject will become a boardroom issue. Companies therefore face a threefold challenge - how to take real advantage of all this new information, how to stop it overwhelming their systems, and how to avoid security breaches that will damage their reputation and risk legal penalties.

The key to these questions, says McDonald, is to become what he describes as an "information-centric organisation". But what does that mean? It is perhaps easier to define what it is not. It is definitely not what most companies do today, where silos of information sit with different computer applications and different company departments, and where information is regularly duplicated on different systems. That approach makes data hard to handle and risks inaccuracies, as information is changed in one system, but not in others.

The answer, McDonald adds, is to move towards consolidating the data to a central point where it can be more easily managed; and where multiple instances of the same information can be eliminated and usage of the data more closely monitored. The aim is to achieve what he calls "qualified" information, defined as information that is "secure, continuant and compliant". In other words, the information has to be protected from unauthorised use or theft; available when required by an authorised user; and it should be in a state that will satisfy all applicable regulations, in terms of accuracy and ease of retrieval. By creating this information-centric infrastructure, he says, it is possible to create an agile business, where the brand is supported and new applications can be rolled out quickly.

New applications may cause new data to be generated, but this needs to be rapidly consolidated into the infrastructure and then managed through its lifecycle as it ages. At a certain stage, data will be archived on to less expensive storage media and finally deleted when no longer needed for the business or to meet regulations. That could prove a huge management task, but McDonald believes it is possible - in fact, essential - to automate the whole information lifecycle process, from de-duplicating multiple copies of the same record, to long-term archiving and on to the final deletion. And there is no time to delay. As he points out, data volumes in corporations are growing by 60% a year, and companies risk wasting vast amounts of money and effort on storing and backing up information of no business value (such as their staff's private MP3 collections).
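The lifecycle described above (de-duplicate, archive, delete) can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any EMC product: the 30-day archive threshold mirrors the EMC policy McDonald cites later in the article, the seven-year retention period is hypothetical, and duplicates are detected here by a SHA-256 hash of the content, which is one common approach among several.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta

ARCHIVE_AFTER = timedelta(days=30)      # mirrors the 30-day policy cited in the article
DELETE_AFTER = timedelta(days=7 * 365)  # hypothetical retention period


@dataclass
class Record:
    name: str
    content: bytes
    last_used: date


def deduplicate(records):
    """Keep only the first record seen for each distinct content hash."""
    seen = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(record.content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


def lifecycle_stage(record, today):
    """Return 'primary', 'archive' or 'delete' according to the record's age."""
    age = today - record.last_used
    if age >= DELETE_AFTER:
        return "delete"
    if age >= ARCHIVE_AFTER:
        return "archive"
    return "primary"


today = date(2007, 12, 1)
records = [
    Record("report.doc", b"Q3 results", date(2007, 11, 20)),
    Record("report-copy.doc", b"Q3 results", date(2007, 11, 21)),  # duplicate content
    Record("budget.xls", b"2007 budget", date(2007, 6, 1)),
    Record("memo-1999.txt", b"old memo", date(1999, 1, 1)),
]

for record in deduplicate(records):
    print(record.name, "->", lifecycle_stage(record, today))
```

In a real deployment the thresholds would come from the negotiation between IT and the business that the article goes on to describe, and deletion would also have to respect any regulatory hold on the data.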

STORING RUBBISH
"A lot of the information that businesses are storing is rubbish," says McDonald. "They are employing a lot of IT infra- structure and people to manage stuff that will provide no benefit to man nor beast. And the problem multiplies with the volume of data. That is why it is essential to understand the nature of the information you have and which parts of the information are most valuable to you."

Defining what is vital information and what is rubbish will require negotiation and discussion between IT and the various departments of the business, but it can pay off in hard cash. Important data that needs to be retrieved instantly by users will always be stored on the best available medium - either in computer memory (RAM) or on high-performance disks. Once data ages and is not needed so often, it can be migrated to a more cost-effective storage medium for archiving.

McDonald cites the example of his own company EMC, which operates a policy of archiving all data at 30 days. "That might seem draconian, but the skill is to put in aggressive management policies, while ensuring the user experience remains seamless. If someone in EMC looks for an email that is more than 30 days old, they don't need to know it is coming from an archival system, which is fundamentally less costly."
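The "seamless" experience McDonald describes amounts to a single retrieval interface sitting in front of two tiers of storage. The sketch below shows the idea with plain in-memory dictionaries. The class and method names are hypothetical, and it makes no claim about how EMC's own archiving works.

```python
class TieredMailStore:
    """Toy two-tier store: callers fetch by ID and never need to name a tier."""

    def __init__(self):
        self._primary = {}  # stands in for fast, expensive storage
        self._archive = {}  # stands in for cheaper archival storage

    def store(self, message_id, body):
        """New messages always land on the primary tier."""
        self._primary[message_id] = body

    def archive(self, message_id):
        """Move a message from the primary tier to the archive tier."""
        self._archive[message_id] = self._primary.pop(message_id)

    def fetch(self, message_id):
        """Return the message body, checking the primary tier first."""
        if message_id in self._primary:
            return self._primary[message_id]
        return self._archive[message_id]

    def tier_of(self, message_id):
        """Report where a message lives (for administrators, not end users)."""
        return "primary" if message_id in self._primary else "archive"


mail = TieredMailStore()
mail.store("msg-001", "Minutes of the October meeting")
mail.archive("msg-001")       # e.g. triggered once the message is 30 days old
print(mail.fetch("msg-001"))  # the caller's code is the same for either tier
```

The design point is that only `fetch` is exposed to users, so data can be moved between tiers as aggressively as policy demands without changing how anyone asks for it.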

Careful classification of data has another benefit, too. It allows companies to impose security where it is most needed - and ensure that information such as credit card numbers and company secrets is properly encrypted and protected from unauthorised use. Meanwhile, less sensitive data can be treated more freely and may not need to be encrypted.
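A classification scheme of this kind can be expressed as a simple lookup from data category to sensitivity level, with the protection rule driven by the level. The categories and levels below are hypothetical examples (only credit card numbers and company secrets come from the article), and the choice to treat unknown categories as confidential is an assumption made here for safety.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical classification table agreed between IT and the business.
CATEGORY_SENSITIVITY = {
    "credit_card_number": Sensitivity.CONFIDENTIAL,
    "company_secret": Sensitivity.CONFIDENTIAL,
    "staff_newsletter": Sensitivity.INTERNAL,
    "press_release": Sensitivity.PUBLIC,
}


def classify(category):
    """Look up a category; anything unclassified is treated as confidential."""
    return CATEGORY_SENSITIVITY.get(category, Sensitivity.CONFIDENTIAL)


def requires_encryption(category):
    """Only confidential data must be encrypted under this sketch policy."""
    return classify(category) is Sensitivity.CONFIDENTIAL


for category in ("credit_card_number", "press_release", "unlabelled_export"):
    print(category, "-> encrypt" if requires_encryption(category) else "-> no encryption needed")
```

Defaulting to the strictest level means that a gap in the classification table leads to over-protection rather than to a leak.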

Ultimately, the impetus to adopt an information-centric approach needs to come at board level. Senior company directors must understand that, if they do not act, their organisations risk drowning in a rising tide of data and that, if they are unable to manage it, serious errors (such as leakage of personal data or failure to comply with regulators' demands) could lead to fines, prosecution and serious damage to their brand.

The IT department cannot achieve this all by itself, because it needs the full weight of the company behind it to make things happen. But once companies do take control of their growing mountains of data, then they really can turn it to their advantage - not least by working smarter, discovering new markets and serving their customers better. And the sooner they tackle the problem, the easier it will be.

©2006 Business and Technical Communications Ltd. All rights reserved.