For several years, the growth of unstructured content has been substantially more rapid than that of structured information. Because they contain highly structured data and metadata, applications such as ERP, financials, CRM, supply chain, etc., can be credibly used to indisputably re-create a sequence of events such as who placed the order, for what, for how much, when, and on what terms. In contrast, unstructured information such as e-mails, documents, spreadsheets, voicemail, and pictures consists of masses of stored information where it’s much more difficult to determine what’s relevant and replicate a decision flow in a manner that is provable with a reasonable degree of certainty. This represents a huge liability for organizations as information uncertainty grows daily and exponentially. (See figure, below.)
The legal community has discovered that unstructured information in general, and e-mail archives in particular, represent a new source of revenue, leading to a large number of lawsuits in which old e-mails become the key piece of evidence used to confirm charges of corporate wrongdoing. What’s more, changes to the Federal Rules of Civil Procedure (FRCP) adopted late in 2006 now require civil litigants to consider electronic evidence as part of the discovery process.
Not only is e-mail a prime candidate to look for a “smoking gun,” but the procedures by which e-mail is managed are also coming under attack and intensified scrutiny. Sometimes, it is easier to attack a firm’s business policies and procedures than it is to argue the pure merits of a case. E-mail management has become fertile ground for these types of attacks. Organizations that are unable to demonstrate clear, well-defined, and followed policies, procedures, and tools for e-mail management are open to accusations of systematic destruction or spoliation of e-mail-based evidence. Penalties for these types of violations can be severe, ranging from sanctions all the way to an adverse inference whereby the court essentially states that the evidence not produced by an organization should be assumed adverse to their case, effectively shifting the burden of proof. Policies and procedures for retention, backup, departing staff, disaster recovery, security, and access control are all fair game and may come under scrutiny.
As a result, e-mail archiving capabilities should not only focus narrowly on IT interests (e.g., holding down the cost of storing e-mails), but also begin to align with business imperatives, especially from the legal function within organizations. Legal departments view e-mail archives as critical to their day-to-day operations, and e-mail has become a mission-critical application. As such, while storage cost containment remains important, and in some respects is under even greater pressure than ever, successful e-mail archive initiatives emphasize not just reducing the cost of writing e-mail to an archive, but also querying e-mail that is necessary to support an efficient and effective discovery process. As shown in the table below, the business dynamics and corollary IT/storage administration implications of this imperative are evolving.
Evolution Of Unstructured Archiving Systems And Storage/IT Roles
The two main drivers behind this trend are the need to better support the legal activities of the corporation (i.e., reduce exposure to expensive lawsuits) and the necessity to maintain compliance with increasingly information-oriented regulations. It is important to note that the shifting center of gravity for e-mail archiving from the IT server room to the courtroom will evolve still further. The technologies, functionality, and business capabilities associated with being able to query, extract, and rapidly decipher large amounts of unstructured data will eventually find themselves at the center of boardroom discussions as organizations attempt to proactively verify that business practices and employee behaviors are consistent with the cultural and policy edicts of the organization. Furthermore, over time, line-of-business executives will begin to demand a payback on e-mail archiving investments beyond risk reduction, deriving more value from information assets and directly improving the performance of their businesses (Stage 5 of table).
These trends have three significant consequences for storage administrators:
- Storage administrators and e-mail managers will be required to serve many masters, including business lines, legal departments, compliance, and an emerging records management function, once viewed as a backwater of corporations and more recently emerging as a fundamental driver of business requirements;
- Incremental expenditures on core e-mail archiving activities will initially be seen as non-value producing (e.g., like buying insurance) and pressures to keep storage costs down will escalate even further;
- As business lines begin to demand that e-mail archiving activities produce tangible value, this will force integration with other key information management systems of the organization, including document management and records retention. This means storage processes will become dramatically more “interesting” over time. This is both a challenge and opportunity for storage administrators.
Over the next few years, we expect to see significant new case law test the limits and requirements of e-mail archiving, which will continue to drive new investment in these technologies. We will also start seeing businesses further experiment with the capabilities created by e-mail archiving in other business domains in an attempt to affect document and information lifecycle management (ILM) strategies beyond e-mail.
IT’s role should be to facilitate the transition to a legal business-driven e-mail archiving approach, recognizing that approach will evolve with increased focus on access to unstructured data. However, IT organizations must also help the business fully factor, in a more comprehensive way, how messaging technology affects business risk and opportunity by addressing new e-mail technologies (e.g., hosted e-mail) as well as other messaging technologies (e.g., instant messaging and voicemail) that employees use to perform both business and personal communications today.
IT organizations must also recognize that as the economics and value proposition of e-mail archiving and other messaging and content technologies shift, their centers of expertise will also shift to initially serve corporate counsel in support of electronic discovery activities and over time to radically re-architect the unstructured information infrastructure of the organization.
IT Serves Many Masters
E-mail managers are under pressure to ensure the cost, performance, responsiveness, and validity of e-mail archives are maintained. Users and senior managers notice when things go wrong or performance is not adequate. At the same time, the legal and compliance functions are responsible for ensuring organizations are adequately protected from outside (or internal) exposures. To IT, e-mail archiving systems should be implemented in ways that have minimum impact on operations—but this is not always easy. For example, the simplest way to populate an e-mail archive system is to use the daily backup as a source of record. However, using the e-mail backup copy will not ensure all the e-mails are captured. E-mails immediately deleted by users, for example, will not be accessible because they’ll never be backed up.
The alternative method of using a journaling feed ensures every e-mail is captured. However, journaling introduces overhead and complexity that may impact the production e-mail system and may also increase storage costs substantially. We have heard storage and/or e-mail managers argue: “We don’t need to install journaling; it’s an unnecessary expense because we already back up and archive the vast majority of e-mails that come in to the organization.” This line of thinking misses the business requirement by a mile. The truth is that counsel must be able to say they have included all e-mails in a discovery request. If users can “shred” e-mails by deleting them immediately, they will, and often do so as a normal course of their daily routines. If that e-mail is discovered in, say, another organization, counsel has lost all credibility, and the organization will lose the trust of any court. This exposure can far outweigh the incremental costs and overhead of installing journaling.
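The gap between the two capture methods can be illustrated with a small sketch. It is a toy model under stated assumptions, not a description of any particular mail server: the `Message` record, its `deleted_before_backup` flag, and the three function names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical record of one e-mail for this toy model."""
    msg_id: str
    deleted_before_backup: bool  # True if the user deleted it before the nightly backup ran


def backup_capture(messages):
    """IDs that a nightly backup would see: only mail still in the mailbox."""
    return {m.msg_id for m in messages if not m.deleted_before_backup}


def journal_capture(messages):
    """IDs that a journaling feed would see: every message, copied on arrival."""
    return {m.msg_id for m in messages}


def discovery_gap(messages):
    """IDs captured by journaling but missing from a backup-sourced archive."""
    return sorted(journal_capture(messages) - backup_capture(messages))


if __name__ == "__main__":
    day = [
        Message("order-confirmation", deleted_before_backup=False),
        Message("read-and-deleted", deleted_before_backup=True),
        Message("weekly-report", deleted_before_backup=False),
    ]
    print("Missing from backup-based archive:", discovery_gap(day))
```

Any non-empty result from `discovery_gap` is an e-mail that counsel could not produce from a backup-sourced archive, which is exactly the exposure described above.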
It is a cost of business to ensure all e-mails are captured, and that the legal department has a comprehensive understanding of the e-mail content in question. It must be assumed that opposing legal counsel will mercilessly exploit any weakness in your e-mail systems. IT must ensure the business clearly specifies the requirements for e-mail archiving, including the needs for provenance, permanence, comprehensiveness, and deletion policies; otherwise, any supposed legal umbrella will be illusionary.
Impact On Users
E-mail is the lifeblood of most enterprises. Its value to the business is speed, flexibility, and lack of formal structure. Any initiative that seeks to change this must include an evaluation of the risk/reward equation.
As such, a primary requirement of e-mail archiving initiatives is to make activities as non-disruptive as possible for users. Unfortunately, that’s not always feasible. While today’s best-of-breed e-mail archiving systems preserve as much of the user footprint as possible (e.g., folder structures), user disruption is inevitable.
Newly introduced policies and consequent migration activities will invariably introduce change. For example, in an effort to keep costs down and e-mail manageable, an organization might mandate that after 60 days, e-mails be migrated to more cost-effective storage tiers. Users will at that time have access to a “stub” while the e-mail itself resides in the archive. Users will find the performance of accessing this migrated file to be much lower than what they’ve been conditioned to expect.
The key to managing this disruption is communicating the benefits to users. The two main benefits are virtually unlimited mailbox sizes, and simplification of e-mail management (e.g., elimination of managing .pst files).
On balance, users will find they are much more productive with the new system in place; however, it will take time for them to realize these benefits, and IT can expect some friction in the transition process.
Moreover, the availability of free, Web-based, consumer e-mail services forces firms to complement efforts to implement the right mix of e-mail archiving technology with the right mix of end-user education. Use of internal e-mail systems can be monitored to verify that guidelines are being followed. However, users increasingly can circumvent an organization’s e-mail mandates by using third-party e-mail services such as Gmail, Yahoo, and Hotmail to conduct company business. It’s clear that the courts expect firms to be able to access and provide copies of any digital communication conducted in support of business. It’s not clear if the courts will relax this expectation simply because a business document was created and distributed over a third-party mail system.
As a result, e-mail governance policies must include users’ activities with third-party e-mail systems, with special efforts made to determine the risk to the firm posed by third-party e-mail provider archiving practices. Bad user behavior will trump any technology implementation, necessitating clear policies regarding third-party services.
The bottom line is that e-mail archiving will require introducing changes to the way users work. With older e-mails, response times will be slower and certain activities more restrictive. But users will ultimately be freed from the hassles of running out of mailbox space and dealing with .pst files.
Technology Integration Actions
Case studies of large and mid-sized organizations show that as much as 50% of file-based storage (e.g., NAS) is allocated to storing .pst files, often a huge amount of storage. This is not surprising given the overlap in e-mail content as, for example, the same attachment goes to 10 different users.
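To see why duplicated attachments inflate storage, consider a minimal sketch of single-instance storage keyed on a content hash. The `SingleInstanceStore` class and its method names are hypothetical; commercial archives and de-duplication products use their own, more elaborate schemes.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: identical attachments are kept once."""

    def __init__(self):
        self._blobs = {}  # SHA-256 digest -> attachment bytes (stored once)
        self._refs = {}   # (user, filename) -> digest (one entry per mailbox copy)

    def put(self, user, filename, data):
        """Record that `user` holds `filename`; store the bytes only if new."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        self._refs[(user, filename)] = digest
        return digest

    def logical_bytes(self):
        """Bytes consumed if every mailbox kept its own copy (the .pst situation)."""
        return sum(len(self._blobs[d]) for d in self._refs.values())

    def physical_bytes(self):
        """Bytes actually stored after de-duplication."""
        return sum(len(b) for b in self._blobs.values())


if __name__ == "__main__":
    store = SingleInstanceStore()
    attachment = b"x" * 1000  # stand-in for one 1,000-byte attachment
    for n in range(10):       # the same file sent to 10 users
        store.put(f"user{n}", "budget.xls", attachment)
    print(store.logical_bytes(), "logical bytes vs", store.physical_bytes(), "physical bytes")
```

In this toy case the ten mailbox copies collapse to one stored instance; real savings depend on how much content actually overlaps.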
The good news for storage administrators is that a comprehensive e-mail archive system can get rid of .psts over time, freeing up storage space. The better news is storage administrators receive this benefit on the coattails of a wider corporate e-mail archiving initiative. Storage managers should move legacy .pst files to the slowest, least-expensive devices and over time phase out .psts through attrition, depending on legal retention policies. New .pst files should be aggressively deleted as the new archive will house the e-mails of record.
The challenge is that e-mail archiving initiatives will create truckloads of additional storage that needs to be managed, especially as journaling is introduced. Technologies such as thin provisioning, data de-duplication, and low-cost SATA arrays should be aggressively investigated. In addition, N-tier storage strategies make sense with e-mail archiving, as newer e-mails will reside on near tier-1 storage (let’s call it T1-B) and will be migrated to T2 and T3 tiers as they are archived over time to save additional costs. This will require clear policies, data classification efforts, and technologies to automate the migration of files.
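An age-based migration policy of this kind can be expressed as a simple lookup. The sketch below is illustrative only: the 60-day threshold echoes the example given earlier in this article, while the 365-day threshold, the `TIER_POLICY` table, and the `tier_for` function are assumptions made for the example.

```python
from datetime import date

# (maximum age in days, tier name). The 365-day cutoff is a hypothetical value;
# a real policy would come from the business and legal functions.
TIER_POLICY = [
    (60, "T1-B"),
    (365, "T2"),
]
FINAL_TIER = "T3"  # everything older than the last threshold


def tier_for(received, today):
    """Return the storage tier for a message received on `received`."""
    age_days = (today - received).days
    for max_age, tier in TIER_POLICY:
        if age_days <= max_age:
            return tier
    return FINAL_TIER


if __name__ == "__main__":
    today = date(2024, 6, 1)
    for received in (date(2024, 5, 15), date(2024, 1, 1), date(2022, 1, 1)):
        print(received, "->", tier_for(received, today))
```

Keeping the thresholds in a table rather than in code makes the policy easy to review and change, which matters when retention rules are set outside IT.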
It is important to note that data classification must be automated at the point of creation or use; otherwise, classification efforts will become impossible to maintain (Stage 4 of table). This is challenging because tools and metadata are often lacking, especially for unstructured content, although e-mails contain plenty of metadata that serve as a good starting point for auto-classification efforts. Over time, more business-relevant metadata can be introduced, including user-driven classification schema.
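As a starting point, header metadata alone can drive a first pass at auto-classification. The sketch below uses Python's standard `email` module to read headers; the rule table, the labels, and the example addresses are hypothetical and would be replaced by an organization's own schema.

```python
from email import message_from_string

# Hypothetical rules, checked in order: (label, predicate over the parsed message).
RULES = [
    ("legal", lambda m: "legal@" in m.get("To", "").lower()
                        or "privileged" in m.get("Subject", "").lower()),
    ("finance", lambda m: "invoice" in m.get("Subject", "").lower()),
]
DEFAULT_LABEL = "general"


def classify(raw_message):
    """Assign a label to a raw RFC 822-style message using header metadata only."""
    msg = message_from_string(raw_message)
    for label, matches in RULES:
        if matches(msg):
            return label
    return DEFAULT_LABEL


if __name__ == "__main__":
    raw = (
        "From: sales@example.com\n"
        "To: accounts@example.com\n"
        "Subject: Invoice 1042 attached\n"
        "\n"
        "Please find the invoice attached.\n"
    )
    print(classify(raw))
```

Because the rules run at the point of capture and need no user input, they fit the requirement that classification be automated rather than maintained by hand.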
Also noteworthy is that installing journaling requires additional overhead on both the processor and the network and often requires changes to the core e-mail system to ensure application performance and behavior are predictable. While journaling is necessary to ensure all e-mails are captured, it will require more storage, servers, and I/O performance. On the flip side, restoration of e-mail backups is dramatically simplified because the first line of restoration is the e-mail archive (versus e-mail backup media that must be accessed in snapshots of time).
Interestingly, we have begun to see some courts and attorneys start to question the authenticity of electronic evidence. As any IT professional knows, electronic files are not the same as paper, and proving authenticity is difficult to impossible in many cases. As a consequence, several additional technology integration points are worth noting:
- In the near term, we expect to see more hardware- and software-based write-once, read-many (WORM) technologies and some newer PKI/time-based approaches, including tamperproof marking technologies;
- Security in many ways is more difficult because more people have access to the repository. Audit trails and other security practices are fundamental; and
- Getting rid of e-mails is a key challenge, invoking a variant of Einstein’s advice, “keep everything you need but no more.” Shredding policies and technologies to ensure files are deleted and storage devices are scrubbed are critical to keeping costs down and archives manageable.
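One way to picture the tamper-evident marking mentioned in the first point is a hash chain, in which each archived record's digest depends on the record before it. This is a minimal sketch of that general idea, not the mechanism of any specific WORM or PKI product; the `GENESIS` value and the `seal` and `verify` functions are assumptions made for illustration.

```python
import hashlib

GENESIS = "0" * 64  # hypothetical starting value for the chain


def chain_digest(previous_digest, payload):
    """Digest of one record, bound to the digest of the record before it."""
    return hashlib.sha256(previous_digest.encode("ascii") + payload).hexdigest()


def seal(payloads):
    """Return (payload, digest) records, each chained to its predecessor."""
    records = []
    previous = GENESIS
    for payload in payloads:
        digest = chain_digest(previous, payload)
        records.append((payload, digest))
        previous = digest
    return records


def verify(records):
    """True only if every stored digest still matches its payload and position."""
    previous = GENESIS
    for payload, digest in records:
        if chain_digest(previous, payload) != digest:
            return False
        previous = digest
    return True


if __name__ == "__main__":
    archive = seal([b"first e-mail", b"second e-mail", b"third e-mail"])
    print("intact archive verifies:", verify(archive))
    archive[1] = (b"altered e-mail", archive[1][1])  # edit content, keep old digest
    print("altered archive verifies:", verify(archive))
```

A chain like this only detects alteration; it does not prevent it, and it is only as trustworthy as the place where the latest digest is kept. That is why it would complement, rather than replace, WORM storage and audit trails.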
E-mail archiving shows up as the tip of the iceberg when it comes to managing, securing, and exploiting the unstructured data within an organization. While some vendors are trying to integrate the pieces into a single solution, no vendor has articulated a complete and scalable technology road map to achieve this. Today’s e-mail archiving products can largely be viewed as point solutions that either focus on infrastructure challenges, or attempt to solve end-user problems for archiving or e-discovery. Many solutions fail to provide adequate functions for users, or provide user functions without adequate performance, scalability, or integration with other unstructured data in the organization. The market will likely bifurcate, and users should expect products that concentrate either on data infrastructure or best-of-breed functionality.
As such, in designing strategies for unstructured data in general, and e-mail archives in particular, IT organizations should avoid dependence on single-vendor products that try to integrate the infrastructure and end-user functions within the same solution. The likelihood of success in the long run is limited. Short-term tactical adoption of these products is necessary, but the business case should assume a cost of migration to other products within five years as technology advancements are rapid.
Information As An Asset
Much, if not most, of today’s activity around e-mail archiving is being driven by the need to reduce corporate exposure. Organizations realize, however, that the broader use and integration of unstructured information systems brings potentially enormous value in terms of improved opportunity mining, cross-selling, and massive productivity enhancements. These initiatives will require storage administrators to consider accommodating the policy edicts of many other parts of the organization within the process framework of the e-mail archiving infrastructure. This means balancing the needs of maturing e-mail archiving processes with other document and ILM activities, and integrating what may be “siloed” security, compliance, retention, and other practices. Developing cross-organizational standards today will dramatically accelerate integration efforts down the road.
We are just beginning to understand this vision in terms of business requirements, key metrics, technologies, and challenges. Nonetheless, mid- and long-term plans must begin to incorporate the notion that information value can be viewed and measured using a balance-sheet metaphor, where information assets and liabilities, while evolving, can be observed as snapshots in time. The composition of that information balance sheet can be measured, albeit somewhat subjectively, and affected by specific strategies and actions that, like a balance sheet, can become an indicator of health, viability, and opportunity.
David Vellante is a co-founder of The Wikibon Project (www.wikibon.org), an open-source research and advisory community of practitioners and consultants dedicated to improving the adoption of technology and business systems. Michael McCreary is senior director, legal business technology, at Pfizer and is a member of the Wikibon community. The views and opinions expressed in this article are his own and do not necessarily represent those of his employer.