Take an integrated approach to e-mail archiving

"Unified archival" integrates regulatory compliance, attachment management, and content management.

By Kon Leong

It's almost enough to drive us back to snail mail.

Last year, the SEC got serious, corralled some big banks, and collectively fined them $8.25 million. Their offense? They didn't have an e-mail archival system that the SEC deemed adequate. And in case you think you're exempt because you're not a bank, think again: Recent legislation, such as the Patriot Act, HIPAA, Basel II, and the Sarbanes-Oxley Act, reaches wider and deeper than ever to affect most corporations.

Performance is another critical e-mail issue. Many e-mail servers in use today were never designed for today's volumes. Attachments are a big part of the problem. As much as 85% of all e-mail data is attachments, according to the Radicati Group research firm. As e-mail servers get bogged down, adding more servers gets expensive.

And who wants to admit that the bulk of corporate intelligence lies buried in e-mail and attachments, but that few companies have figured out a cost-effective way to store, access, and search them? Instead, end users spend as much as 20% of their time searching through e-mail, according to an estimate by the Gartner Group.

So, how does e-mail archival address these diverse challenges? First, the word "archival" should be archived.

E-mail archival today is a lot more than just spooling data onto tape and retrieving it when necessary. Companies now want e-mail archival to address multiple challenges, including compliance with government regulations and management of attachments and e-mail content.

Partial solutions exist today, but they fall short of a "unified archive," an integrated approach that combines three functions in one platform:

Compliance—New regulations continue to lengthen the period of retention, expand the scope of e-mails covered, and shorten the required time of data access, search, and retrieval. Each regulated sector has specific archival requirements. E-mail archiving should support the requirements of multiple sets of regulations, since organizations may fall into two or more compliance categories. A sophisticated rules engine makes it easier to deal with the complexity in compliance requirements. Flexibility is also critical since compliance parameters change often. Tools for random sampling and approval processes should also be part of the archiving solution.

Attachment management—To lighten the data load on e-mail servers such as Exchange and Notes, attachments can be separated from the main messages and replaced with a link. This reduces the volume of data handled by the e-mail servers, resulting in faster performance, lower costs (by decreasing the number of e-mail servers required), and better content control. Other features to look for include implementation at the gateway and server level (as opposed to desktops); a single-copy approach, with no duplicates; Web-based attachment viewing by users and administrators; and fast attachment search capabilities.

Content management—As much as 80% of corporate intelligence can be buried in e-mail and attachments. Unlocking the value of this resource requires archival software that can organize huge stores of unstructured content and enable fast, on-demand searches. In addition, access, search, and restore privileges should be available for all employees, not just administrators, with built-in safeguards restricting access according to company policies. Capabilities for e-mail content management can include quick searches of e-mail and attachment content, hierarchical access privileges, detailed audit trails, and timely and scalable indexing of new content.


Ideally, these three archival applications should be integrated in a single system. The benefits of integrated archival include cost savings, a single point of control, integrated reporting, and lower administration overhead.

There's a wealth of functions to look for in today's e-mail archiving systems, and end users should weigh the various functions according to their specific requirements. The rest of this article explains the various functions.

Ability to run three applications on a single system—Running all three applications on one system can save a lot of money. If you run three separate systems, you need three servers (or six for redundancy), along with multiple administrators, separate monitoring and response systems, and multiple backup-and-restore devices and processes. You also give up integrated reporting and a single point of control. With an integrated system, costs are much lower because you need only one production system and one backup system, and administration is easier with a single interface, a single point of control, and integrated reporting.

Ability to sit at the gateway—The archival system should be able to sit at the "gateway" (i.e., the point at the edge of the network through which all e-mail traffic passes) in order to catch all incoming and outgoing traffic. Many archival solutions depend on the e-mail server to provide copies of e-mail. Unfortunately, the e-mail server does an incomplete job. For example, it may report the contents of the "letter" (the message headers and body) but ignore the contents of the "envelope" (the addresses actually used for delivery). This has security implications, since a user could address the inside letter to Recipient A but put Recipient B's address on the envelope for actual delivery. The archive would, therefore, have no record of Recipient B. Another source of data loss is the blind-copy ("bcc:") feature, to which e-mail servers are effectively "blind." Other data leaks come from messaging systems that bypass the main e-mail servers (e.g., automated CRM e-mail and statement deliveries), or from e-mail servers being unaware of events such as messages blocked by content filters. All these leaks and inaccuracies can be caught at the gateway.
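The letter-versus-envelope gap can be illustrated with a short Python sketch. It assumes the gateway hands the archive both the envelope recipient list and the raw message text; the function name and the sample addresses are hypothetical, and this is not a description of any particular product.

```python
from email import message_from_string
from email.utils import getaddresses


def hidden_recipients(envelope_rcpts, raw_message):
    """Return envelope recipients that never appear in the To/Cc headers.

    These are the addresses (bcc: or a mismatched envelope) that an archive
    built only from the "letter" would miss.
    """
    msg = message_from_string(raw_message)
    header_values = msg.get_all("To", []) + msg.get_all("Cc", [])
    visible = {addr.lower() for _, addr in getaddresses(header_values) if addr}
    return sorted(r.lower() for r in envelope_rcpts if r.lower() not in visible)


# Hypothetical message: the letter names Recipient A only.
raw = (
    "From: alice@example.com\r\n"
    "To: recipient.a@example.com\r\n"
    "Subject: Q3 numbers\r\n"
    "\r\n"
    "See attached.\r\n"
)
# The envelope, as seen at the gateway, also delivers to Recipient B.
envelope = ["recipient.a@example.com", "recipient.b@example.net"]

print(hidden_recipients(envelope, raw))  # prints ['recipient.b@example.net']
```

An archive that records the envelope list alongside the message keeps a trace of Recipient B even though the letter never mentions that address.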

Scalable and flexible engine—E-mail archival involves enormous volumes of data. In a single year, a 12,000-person company generates more e-mail data than all the printed content to date in the U.S. Library of Congress. The software engine must have the horsepower to handle the combined burden of attachment, compliance, and content management, which depends on the scalability and flexibility of the base architecture and core engine, or message transfer agent.

Industrial database, plus metadata for scalability—E-mail archiving should incorporate a well-known database, or offer a choice of databases, such as Oracle, IBM DB2, or Microsoft SQL Server. Proprietary databases can lead to "vendor lock-in," along with difficulties in integrating, troubleshooting, and modifying software. For larger installations, a "metadata" layer improves scalability.

High-availability architecture with fail-over and load balancing—High availability is a critical requirement for regulatory compliance and attachment management. The archiving system should be able to "fail-over," or transfer the workload of a failed server to a working one. It should also enable load balancing (continuously distributing the workload across multiple servers).
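As a rough illustration of how fail-over and load balancing fit together, the sketch below rotates work across a pool of servers and skips any server marked as down. The class, method, and server names are hypothetical, and a production system would rely on real health checks rather than manual flags.

```python
from itertools import cycle


class ServerPool:
    """Round-robin load balancing that fails over past unhealthy servers."""

    def __init__(self, names):
        self.healthy = {name: True for name in names}
        self._ring = cycle(names)
        self._size = len(names)

    def mark_down(self, name):
        self.healthy[name] = False

    def mark_up(self, name):
        self.healthy[name] = True

    def next_server(self):
        # Try each server at most once per call, in rotation order.
        for _ in range(self._size):
            name = next(self._ring)
            if self.healthy[name]:
                return name
        raise RuntimeError("no healthy archive servers")


pool = ServerPool(["archive-1", "archive-2", "archive-3"])
print([pool.next_server() for _ in range(3)])  # one message to each server
pool.mark_down("archive-2")                    # simulate a failed server
print([pool.next_server() for _ in range(3)])  # work continues on the others
```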

Platform independence—Archival software should run on commonly used systems, including operating systems, databases, and servers.

Software technology platform—The e-mail archival system should be built on either .NET or Java 2 Enterprise Edition (J2EE). These standard platforms provide benefits such as easier modifications, lower maintenance, and faster integration with other applications. Legacy or non-standard platforms are more expensive to maintain, modify, and integrate.

Compatibility with popular mail servers—Archiving software should work with multiple e-mail platforms because of mixed environments and changing circumstances.

Attachment management, including "stubbing"— To improve e-mail server performance, e-mail archiving should provide a complete storage offloading capability by replacing all attachments with a link in the original message. Additional features may include the ability to download attachments through the link for both internal and external recipients; no client software to install; single-copy, multiple views; and attachment life-cycle management.
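A minimal sketch of stubbing with single-copy storage follows. It assumes attachments are keyed by a SHA-256 hash of their content so that identical files are stored once; the base URL, class name, and function name are hypothetical.

```python
import hashlib


class AttachmentStore:
    """Keeps one copy of each distinct attachment, keyed by content hash."""

    def __init__(self, base_url="https://archive.example.com/attachments/"):
        self.base_url = base_url  # hypothetical download location
        self.blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # a duplicate is not stored again
        return self.base_url + digest


def stub_message(body, attachments, store):
    """Replace (filename, bytes) attachments with links in the message body."""
    links = [f"{name}: {store.put(data)}" for name, data in attachments]
    return body + "\n\nAttachments (stored in the archive):\n" + "\n".join(links)


store = AttachmentStore()
report = b"quarterly report contents"
first = stub_message("Please review.", [("q3.pdf", report)], store)
second = stub_message("Forwarding this.", [("q3-copy.pdf", report)], store)

print(first)
print(len(store.blobs))  # prints 1
```

Both stubbed messages point at the same stored object, which is one way to present multiple views of a single copy.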

Granular access, search, and restore by users, with restrictions and privileges—The archival solution should extend access, search, and restore functions not only to administrators but also to every employee, subject to company policies and restrictions. This requires the ability to set up hierarchical privileges. Extending access in this controlled way lowers administration costs, maintains security, and increases the value of the archived data by giving access to the people who can use it best. An additional feature could include restore services to the client for offline reading, preferably without installing client software.

Rules engine—Enterprises need to define and enforce archival rules in a flexible and granular fashion, including rules based on parameters such as sender, recipient, subject, date, size, domain, and body text.
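One way to picture such a rules engine is as a list of named conditions, each paired with an action. The sketch below is an assumption-laden illustration: the `Message` fields, the rule names, the thresholds, and the action labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    sender: str
    recipients: List[str]
    subject: str
    size: int  # bytes
    body: str


@dataclass
class Rule:
    name: str
    matches: Callable[[Message], bool]
    action: str


# Hypothetical rules keyed on sender domain, size, and body text.
RULES = [
    Rule("trading desk mail",
         lambda m: m.sender.endswith("@trading.example.com"), "retain_long"),
    Rule("large message",
         lambda m: m.size > 5_000_000, "offload_attachments"),
    Rule("risky wording",
         lambda m: "guarantee" in m.body.lower(), "flag_for_review"),
]


def actions_for(message, rules=RULES):
    """Return the actions of every rule the message matches, in rule order."""
    return [rule.action for rule in rules if rule.matches(message)]


msg = Message("bob@trading.example.com", ["client@example.org"],
              "Fund update", 12_000, "We guarantee a 10% return.")
print(actions_for(msg))  # prints ['retain_long', 'flag_for_review']
```

Because the rules are data rather than hard-coded logic, they can be changed when compliance parameters change without touching the engine itself.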

Compression—Storage costs are a significant component of archival costs. Compression of content is the most direct way of minimizing costs, freeing network bandwidth, and shortening the time for data transport.
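The effect is easy to see with Python's standard `zlib` module. The sketch below is only an illustration: the function names are hypothetical, and the sample text is highly repetitive, so real savings depend on the content (files that are already compressed shrink very little).

```python
import zlib


def compress_for_archive(text: str) -> bytes:
    """Compress message text at the highest zlib compression level."""
    return zlib.compress(text.encode("utf-8"), 9)


def restore_from_archive(blob: bytes) -> str:
    """Reverse compress_for_archive and return the original text."""
    return zlib.decompress(blob).decode("utf-8")


body = "Please see the attached quarterly report.\n" * 200
blob = compress_for_archive(body)

print(len(body.encode("utf-8")), "bytes before,", len(blob), "bytes after")
assert restore_from_archive(blob) == body  # the round trip is lossless
```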

Fast, indexed search of message header, body, and attachments—The ability to quickly search, find, and retrieve data drives the value of an archival solution. Ideally, the software should support near real-time indexing of header data, body text, and attachments.

Detailed audit trails and integrated reporting—It is important to maintain a record of activities at the message, user, and administrator level. Such audit trails are useful for performance profiling, security, and accountability.

Administration tools with Web-based interface—A Web-based user interface enables control from anywhere, via any device, including wireless. The solution should also offer hierarchical administration, where housekeeping tasks can be shifted from the administrator to users.

A complete archive, including internal-internal e-mail—The archival system should capture and manage both internal-external e-mail and internal-internal e-mail.

Internal and external load balancing—Most systems offer rudimentary load balancing, which allocates inbound traffic to the appropriate server for processing. Some approaches also offer internal load balancing, where the servers allocate tasks between themselves internally to achieve maximum throughput.

Snapshot management—Snapshot management enables tracking of images of the archive at specific points in time, for retention or restoration. Some snapshots go further by enabling traffic management and finely tuned archival, such as setting snapshot intervals to any arbitrary period. Snapshots can be taken with negligible impact on network or system performance. Snapshots can also enable efficient recovery, comprehensive journaling, and flexible data management.

Security, encryption, authentication, authorization—The sensitive information often contained in archives, combined with the need to access it via the Internet, may require end-to-end security, management of privileges, authentication, authorization, digital signatures, and audit trails.

Multiple modes of access to the archive—Because the Web has redefined the concept of data access, e-mail archiving should provide multiple modes of access via open standards, which might include IMAP4, WebDAV, MAPI, POP3, SMTP, HTTP/HTTPS, and Web services.

Single-copy, multiple-views of the archive—Many archival products tolerate duplicate copies and display archive data in a limited way. You should look for single-copy capabilities that provide multiple virtual views of the same content. For example, a user can view an archive in the same folder format as the primary e-mail.

Support of wireless access—The e-mail archive should offer access capability via any device (e.g., PC, PDA, handset) and any wireless protocol (e.g., WAP, i-mode).

Life-cycle management and granular retention policies—Archiving should enable life-cycle management of archived material, from initial storage to re-classification to destruction. Granularity of control should be fine-tuned down to the message level. In addition, hierarchical storage management can automatically migrate data to different media, such as disk, tape, and optical storage.
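A retention policy of this kind can be sketched as a function that maps a message's age and category to a life-cycle stage. The categories, retention periods, and the one-year migration threshold below are hypothetical placeholders, not regulatory guidance.

```python
from datetime import date

# Hypothetical retention periods, in days, per message category.
RETENTION_DAYS = {"regulated": 7 * 365, "business": 3 * 365, "personal": 90}


def lifecycle_stage(archived_on, category, today, migrate_after_days=365):
    """Return 'disk', 'tape', or 'destroy' for a single archived message."""
    age = (today - archived_on).days
    if age >= RETENTION_DAYS[category]:
        return "destroy"   # retention period has expired
    if age >= migrate_after_days:
        return "tape"      # older data migrates to cheaper media
    return "disk"          # recent data stays on fast storage


today = date(2003, 12, 1)
print(lifecycle_stage(date(2003, 11, 1), "business", today))   # prints disk
print(lifecycle_stage(date(2001, 1, 1), "regulated", today))   # prints tape
print(lifecycle_stage(date(2003, 1, 1), "personal", today))    # prints destroy
```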

Near real-time indexing—Most archival systems are unable to do near real-time indexing and updates. Instead, they index in batches, with the time between updates varying widely, so the archive's search results may not be up to date. Ideally, archiving would support near real-time incremental indexing and updates and be capable of parallel indexing across multiple machines to ensure scalability.
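The difference between batch and incremental indexing can be illustrated with a small in-memory inverted index that is updated as each message arrives, so the message is searchable immediately. This is a single-machine sketch with hypothetical names; it does not attempt the parallel, multi-machine indexing described above.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


class IncrementalIndex:
    """Inverted index that is updated one message at a time."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of message ids

    def add(self, msg_id, *fields):
        """Index header, body, and attachment text as soon as they arrive."""
        for field in fields:
            for term in TOKEN.findall(field.lower()):
                self.postings[term].add(msg_id)

    def search(self, query):
        """Return ids of messages that contain every term in the query."""
        terms = TOKEN.findall(query.lower())
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


index = IncrementalIndex()
index.add(1, "Subject: Q3 forecast", "Revenue forecast attached.")
print(index.search("forecast"))          # prints {1}
index.add(2, "Subject: Lunch", "Forecast says rain.")
print(sorted(index.search("forecast")))  # prints [1, 2]
print(index.search("revenue forecast"))  # prints {1}
```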

Complete end-user Webmail view and UI—Users should be able to reach the archive either through a Web browser or through clients such as Outlook.

Compliance tools and process applications—Look for tools to make the compliance process easier for auditors and compliance officers. Potential features include random sampling and automatic distribution of samples to inspectors for examination, with tracking of each batch until approved.
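Random sampling with batch tracking might look like the following sketch. The sampling rate, the class and function names, and the two status labels are hypothetical; an actual compliance workflow would follow the organization's own supervisory procedures.

```python
import random


def draw_review_batch(message_ids, rate, seed=None):
    """Pick a random sample of message ids for inspection (at least one)."""
    ids = list(message_ids)
    if not ids:
        return []
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    count = max(1, round(len(ids) * rate))
    return sorted(rng.sample(ids, count))


class ReviewBatch:
    """Tracks one inspector's batch until every message is approved."""

    def __init__(self, batch_id, message_ids, inspector):
        self.batch_id = batch_id
        self.inspector = inspector
        self.pending = set(message_ids)
        self.approved = set()

    def approve(self, msg_id):
        self.pending.remove(msg_id)  # raises KeyError if not in this batch
        self.approved.add(msg_id)

    @property
    def status(self):
        return "approved" if not self.pending else "in_review"


sample = draw_review_batch(range(1000), rate=0.02, seed=1)
batch = ReviewBatch("batch-001", sample, inspector="compliance.officer")
print(len(sample), batch.status)  # prints 20 in_review
for msg_id in sample:
    batch.approve(msg_id)
print(batch.status)               # prints approved
```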

Kon Leong is the president and CEO of ZipLip Inc. (www.ziplip.net) in Mountain View, CA. (The evaluation is available in spreadsheet format at www.ziplip.net.)

This article was originally published on December 01, 2003