HOW TO HANDLE THE DATA EXPLOSION!
From STORAGE Magazine
Vol 7, Issue 8 - November/December 2007
Ron Condon, of information management company EMC, assesses the impact that
rocketing data volumes will have on business and suggests ways to handle the
challenge
The world is experiencing what can only be described as an explosion of
digital data.
From the emails that keep us in instant communication, to the music that we
download on the internet, to the videos that sit on social networking sites such
as YouTube, the volume of this data is expanding at such a rate that we are in
danger of running out of the means to store it all (visit EMC ticker for the
latest figures: http://www.emc.com/about/destination/digital_universe/).
The sums involved are truly mind-boggling. Most of us have become acquainted
with terms like a megabyte (equal to one million bytes or characters of
information) from our encounters with personal computers. Some of us might even
know that a gigabyte is 1,000 megabytes. But these terms are inadequate to
describe the huge volumes of data being unleashed into our systems and on the
internet. According to research company IDC, the amount of digital information
created and replicated last year amounted to 161 exabytes - that's 161 billion
gigabytes. Put in more familiar terms, that is about three million
times the information contained in all the books ever written, up to the
present day.
But that is just the beginning. By 2010, states IDC, the amount of information
added every year to the digital universe will increase more than six-fold to 988
exabytes. And the pace of growth will continue to increase.
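The arithmetic behind these figures can be checked with a short Python sketch. The two volumes are the IDC numbers quoted above; the four-year gap between them (2006 to 2010) is an assumption about which years the figures refer to, and the function name is purely illustrative.

```python
# Figures quoted in the article (IDC): exabytes created and replicated per year.
EB_START = 161   # "last year" in the article
EB_2010 = 988    # IDC forecast for 2010
YEARS = 4        # ASSUMPTION: the two figures are four years apart (2006 to 2010)

GB_PER_EB = 10**9  # 1 exabyte = 10^18 bytes = one billion gigabytes (decimal units)


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Return the compound annual growth rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


growth_factor = EB_2010 / EB_START
annual_rate = implied_annual_growth(EB_START, EB_2010, YEARS)

print(f"{EB_START} EB = {EB_START * GB_PER_EB:,} GB")
print(f"Growth factor: {growth_factor:.2f}x")
print(f"Implied annual growth: {annual_rate:.0%}")
```

Under that four-year assumption, the "more than six-fold" increase works out to compound growth of a little under 60% a year.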
What's driving it?
Much of the expansion is being driven by the conversion of analogue material to
digital format. This includes the move from film to digital image capture, from
analogue to digital voice, and analogue to digital TV. And, says IDC, the
biggest component in the digital universe is the images captured by more than
one billion devices around the world, from digital cameras and camera phones to
medical scanners and security cameras.
These images are replicated over the internet, on private company networks, by
PCs and servers, in data centres, in digital TV broadcasts and on digital
projection movie screens. Social networking sites such as MySpace, Facebook and
YouTube, which allow individual consumers to share their inner thoughts, holiday
pictures and even videos with the rest of the world, are also boosting the sheer
weight of data circulating on the internet.
As a recent article in the Washington Post pointed out, YouTube alone consumes
as much bandwidth today as the entire internet consumed in 2000. Users upload
65,000 new videos every day and download 100 million files daily, a 1,000%
increase from just one year ago. Other research shows that more than a billion
songs a day are shared over the internet in MP3 format. By 2010, adds IDC, this
kind of consumer-generated material will account for 70% of those 988 exabytes,
with business and government accounting for the rest.
Does it matter?
And what does this have to do with business? This may be a
consumer-driven trend, but it will have a huge impact on business, too, not
least because most of the digital data (around 85%, by IDC's reckoning) will be
sitting on some kind of corporate network. The owners of those networks will be
responsible for the security, privacy, reliability and compliance of that data.
Moreover, they will have to deal with all the other new forms of business-based
data flooding into their networks. This can range from digital CCTV signals, to
new digital telephone systems and RFID tags, all of which will place an extra
burden on networks and the companies trying to manage and control the
information. "We estimate that the volume of data that companies need to manage
and store is rising by around 60% and is increasing," says Adrian McDonald, UK
managing director of EMC. "Of the total, some 80% of it is unstructured -
emails, Word documents, presentations and so on."
The growing volume of data, and new types of data, will put extra pressure on
computing operations. IT managers will see the span of their domains greatly
enlarged, as VoIP phones are added to corporate networks; building automation
and security migrate to IP networks; CCTV cameras go digital; and RFID and
sensor networks proliferate.
Of course, all this information should be a good thing. Accurate information is
at the heart of effective decision-making in a smooth-running business. But if
the information is poorly managed, misfiled and inconsistent, then it is worse
than useless and can even do damage by generating wrong decisions. At the same
time, information is coming under increased scrutiny from lawmakers and industry
regulators. Personal and financial data needs to be properly protected from
unauthorised access, and company files need to be in a fit state to be inspected
by the regulators and law enforcement.
As new rules make senior company executives personally accountable for the way
their organisations handle information, the subject will become a boardroom
issue. Companies therefore face a threefold challenge - how to take real
advantage of all this new information, how to stop it overwhelming their
systems, and how to avoid security breaches that will damage their reputation
and risk legal penalties.
The key to these questions, says McDonald, is to become what he describes as an
"information-centric organisation". But what does that mean? It is perhaps
easier to define what it is not. It is definitely not what most companies do
today, where silos of information sit with different computer applications and
different company departments, and where information is regularly duplicated on
different systems. That approach makes data hard to handle and risks
inaccuracies, as information is changed in one system, but not in others.
The answer, McDonald adds, is to move towards consolidating the data to a
central point where it can be more easily managed; and where multiple instances
of the same information can be eliminated and usage of the data more closely
monitored. The aim is to achieve what he calls "qualified" information, defined
as information that is "secure, continuant and compliant". In other words, the
information has to be protected from unauthorised use or theft; available when
required by an authorised user; and it should be in a state that will satisfy
all applicable regulations, in terms of accuracy and ease of retrieval. By
creating this information-centric infrastructure, he says, it is possible to
create an agile business, where the brand is supported and new applications can
be rolled out quickly.
New applications may cause new data to be generated, but this needs to be
rapidly consolidated into the infrastructure and then managed through its
lifecycle as it ages. At a certain stage, data will be archived on to less
expensive storage media and finally deleted when no longer needed for the
business or to meet regulations. That could prove a huge management task, but
McDonald believes it is possible - in fact, essential - to automate the whole
information lifecycle process, from de-duplicating multiple copies of the same
record, to long-term archiving and on to the final deletion. And there is no
time to delay. As he points out, data volumes in corporations are growing by 60%
a year, and companies risk wasting vast amounts of money and effort on storing
and backing up information of no business value (such as their staff's private
MP3 collections).
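The lifecycle McDonald describes (de-duplicate, archive, then delete) might be automated along the following lines. This is a minimal Python sketch, not EMC's method: the Record type, the function names and both age thresholds are hypothetical, and using age alone to decide deletion is a simplification of "no longer needed for the business or to meet regulations".

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta

# ASSUMPTIONS: both thresholds are illustrative, not figures from the article.
ARCHIVE_AFTER = timedelta(days=90)
DELETE_AFTER = timedelta(days=365 * 7)


@dataclass
class Record:
    name: str
    content: bytes
    last_used: date
    tier: str = "primary"   # "primary" or "archive"


def deduplicate(records: list[Record]) -> list[Record]:
    """Keep the first record for each distinct content, identified by SHA-256."""
    seen: set[str] = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(record.content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


def apply_lifecycle(records: list[Record], today: date) -> list[Record]:
    """De-duplicate, then archive or drop each record according to its age."""
    kept = []
    for record in deduplicate(records):
        age = today - record.last_used
        if age >= DELETE_AFTER:
            continue                  # past retention: delete
        if age >= ARCHIVE_AFTER:
            record.tier = "archive"   # move to cheaper storage
        kept.append(record)
    return kept


today = date(2007, 12, 1)
store = [
    Record("q3-report.doc", b"Q3 results", today - timedelta(days=10)),
    Record("q3-report-copy.doc", b"Q3 results", today - timedelta(days=5)),
    Record("old-invoice.pdf", b"Invoice 1", today - timedelta(days=400)),
    Record("ancient-memo.txt", b"Memo", today - timedelta(days=3000)),
]
managed = apply_lifecycle(store, today)
for record in managed:
    print(record.name, record.tier)
```

In this example the duplicate report is dropped, the 400-day-old invoice moves to the archive tier and the oldest memo is deleted. A real policy would draw its thresholds from the classification agreed between IT and the business.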
STORING RUBBISH
"A lot of the information that businesses are storing is rubbish," says
McDonald. "They are employing a lot of IT infrastructure and people to manage
stuff that will provide no benefit to man nor beast. And the problem multiplies
with the volume of data. That is why it is essential to understand the nature of
the information you have and which parts of the information are most valuable to
you."
Defining what is vital information and what is rubbish will require negotiation
and discussion between IT and the various departments of the business, but it
can pay off in hard cash. Important data that needs to be retrieved instantly by
users will always be stored on the best available medium - either in computer
memory (RAM) or on high-performance disks. Once data ages and is not needed so
often, it can be migrated to a more cost-effective storage medium for archiving.
McDonald cites the example of his own company EMC, which operates a policy of
archiving all data at 30 days. "That might seem draconian, but the skill is to
put in aggressive management policies, while ensuring the user experience
remains seamless. If someone in EMC looks for an email that is more than 30
days old, they don't need to know it is coming from an archival system, which
is fundamentally less costly."
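The "seamless" retrieval McDonald mentions can be pictured as a single lookup that hides which tier holds the message. The Python sketch below is an illustration only: the TieredMailStore class and its methods are hypothetical, with in-memory dictionaries standing in for fast and cheap storage. Only the 30-day threshold comes from the article.

```python
from datetime import date, timedelta

ARCHIVE_AFTER = timedelta(days=30)  # the 30-day policy described in the article


class TieredMailStore:
    """Hypothetical store that hides which tier a message lives on."""

    def __init__(self) -> None:
        # Each tier maps a message id to (date received, body).
        self._primary: dict[str, tuple[date, str]] = {}
        self._archive: dict[str, tuple[date, str]] = {}

    def save(self, msg_id: str, received: date, body: str) -> None:
        self._primary[msg_id] = (received, body)

    def run_archiving(self, today: date) -> int:
        """Move messages more than 30 days old to the archive; return how many moved."""
        old_ids = [
            msg_id
            for msg_id, (received, _) in self._primary.items()
            if today - received > ARCHIVE_AFTER
        ]
        for msg_id in old_ids:
            self._archive[msg_id] = self._primary.pop(msg_id)
        return len(old_ids)

    def fetch(self, msg_id: str) -> str:
        """Return the message body; the caller never sees which tier served it."""
        for tier in (self._primary, self._archive):
            if msg_id in tier:
                return tier[msg_id][1]
        raise KeyError(msg_id)


today = date(2007, 12, 1)
mail = TieredMailStore()
mail.save("recent", today - timedelta(days=3), "Lunch on Friday?")
mail.save("old", today - timedelta(days=45), "Budget figures attached")

moved = mail.run_archiving(today)
print(moved)
print(mail.fetch("recent"))
print(mail.fetch("old"))
```

The user calls fetch in the same way for both messages; only the archiving job knows that one of them now sits on the cheaper tier.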
Careful classification of data has another benefit, too. It allows companies to
impose security where it is most needed - and ensure that information such as
credit card numbers and company secrets is properly encrypted and protected
from unauthorised use. Meanwhile, less sensitive data can be treated more freely
and may not need to be encrypted.
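As a rough illustration of classification driving security, the Python sketch below flags text that appears to contain a payment card number, so that only those records are marked for encryption. The pattern, the function names and the two labels are assumptions made for this example; the Luhn checksum is the standard check digit scheme for card numbers. A production classifier would cover many more data types.

```python
import re

# ASSUMPTION: a candidate card number is 13 to 19 digits, optionally
# separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def passes_luhn(digits: str) -> bool:
    """Return True if the digit string satisfies the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:      # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def classify(text: str) -> str:
    """Label text 'sensitive' if it appears to hold a card number, else 'general'."""
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if passes_luhn(digits):
            return "sensitive"
    return "general"


def needs_encryption(text: str) -> bool:
    """Only records classified as sensitive are marked for encryption."""
    return classify(text) == "sensitive"


# 4111 1111 1111 1111 is a widely used test number, not a real card.
print(classify("Card on file: 4111 1111 1111 1111"))
print(classify("Meeting moved to 10:30 in room 4"))
```

The checksum step filters out most digit runs that only look like card numbers, which keeps less sensitive data out of the encrypted tier.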
Ultimately, the impetus to adopt an information-centric approach needs to come
from board level. Senior company directors must understand that, if they do not
act, their organisations risk drowning in a rising tide of data and that, if
they are unable to manage it, serious errors (such as leakage of personal data
or failure to comply with regulators' demands) could lead to fines, prosecution
and serious damage to their brand.
The IT department cannot achieve this all by itself, because it needs the full
weight of the company behind it to make things happen. But once companies do
take control of their growing mountains of data, then they really can turn it to
their advantage - not least by working smarter, discovering new markets and
serving their customers better. And the sooner they tackle the problem, the
easier it will be.