Distributed storage located at the "edges" of the Internet can help service providers overcome two of the key drawbacks to the Internet: bandwidth and latency.
By Dennis Hoffman
Service providers today share a common dilemma: Their core service offerings have reached commodity status while operations and customer acquisition costs continue to rise. This situation has made profitability an elusive goal and has created an urgent need for them to find a way to deploy new, premium services so they can derive more value from their existing networks.
The good news is that many industries have overcome this problem in the past, breaking out of the commodity trap through the introduction of high-profit, premium services such as cable television and overnight delivery. The bad news is that the current Internet infrastructure is ill-suited to the introduction of compelling, premium services such as high-performance streaming and network-based file sharing. As a result, service providers need to adopt new infrastructure technologies that can cost-effectively support premium services.
The fundamental problems with the Internet are twofold:
- Bandwidth is relatively expensive, limited, and unpredictable; and
- Latency (i.e., the time between when data is requested and when it actually arrives) is often high and always unpredictable.
Furthermore, bandwidth usage and latency are most unpredictable at times of peak demand, precisely when service providers would like the infrastructure to operate at peak efficiency.
The figure shows a high-level topology of the Internet and demonstrates why it is so difficult for service providers to address the bandwidth and latency issues. Aside from the sheer distance that data must travel from its source to its destination, it also must travel through numerous routers and other bottlenecks, which all contribute to the overall unpredictability of data delivery. And, data must traverse the networks of numerous players (Web-hosting companies, backbone providers, access networks, content delivery networks (CDNs), enterprise LANs, and consumers), most of which have little economic incentive to cooperate and limited technical ability to provide Quality of Service (QoS) through their network.
The various points of latency and congestion in the Internet include the following:
- First-mile delays that are caused when either the server's connection to the backbone is too small to handle the requested traffic, or when the server itself is unable to keep up with the requests for data.
- Backbone latency and congestion that are usually the result of either congestion at the routers, insufficient backbone bandwidth, or the numerous router hops the average packet of data takes to go from the source to the end user (which today stands at 17).
- Network peering relationships that interject latency because they provide incentives for backbone providers to dump the data off their network as quickly as possible instead of finding the most expedient path for the data to travel to the end user.
- Last-mile delays that are caused by narrowband connections to the end user as well as under-provisioned connections from the access provider to the backbone.
CDNs have tried to solve the bandwidth and latency problems of the Internet by bypassing the backbone and staging static content closer to end users. However, they, too, have had a difficult time deploying high-end services because their infrastructures are implicitly designed for static content, not new, dynamic data. And, it is not practical for them to build true rich-content infrastructure because the products required to do so are too expensive to purchase and difficult to maintain.
Clearly, there needs to be a fundamental change in the operational underpinnings of the Internet before it can provide the type of consistency and manageability that service providers need to achieve maximum profitability.
There is a common misconception that Internet bandwidth will soon be virtually unlimited and cease to be a problem for service providers. The reality is that bandwidth at the Internet's core is expensive, extremely difficult to manage effectively, and still relatively limited. According to a Morgan Stanley Dean Witter report, the Internet backbone would be completely saturated delivering 200,000 concurrent DVD-quality video streams. To put this in context, the final episode of "Survivor" attracted more than 51 million viewers; delivering it to that audience as DVD-quality streams would require more than 250 times the bandwidth of the current Internet.
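The "250 times" figure follows directly from the two numbers quoted above. The short sketch below reproduces the arithmetic; it assumes one DVD-quality stream per viewer, and the function and constant names are illustrative only.

```python
# Figures quoted in the article.
BACKBONE_STREAM_CAPACITY = 200_000  # concurrent DVD-quality streams that saturate the backbone
SURVIVOR_VIEWERS = 51_000_000       # audience for the final episode of "Survivor"


def backbone_multiples(viewers: int, capacity_streams: int) -> float:
    """Return how many backbones of the given capacity the audience would need.

    Assumes every viewer receives a separate DVD-quality stream.
    """
    if capacity_streams <= 0:
        raise ValueError("capacity_streams must be positive")
    return viewers / capacity_streams


print(backbone_multiples(SURVIVOR_VIEWERS, BACKBONE_STREAM_CAPACITY))  # 255.0
```

A result of 255 is consistent with the "more than 250 times" claim.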
Furthermore, broadband access and rich media are driving up operational costs as they proliferate, because they consume more bandwidth than dial-up access and static content and they encourage consumers to stay online longer. The table shows how broadband access can cause up to a 90-fold increase in bandwidth usage (and hence bandwidth costs) while bringing in a fraction of the revenue per megabyte delivered.
To defray the high cost of operations, service providers currently over-subscribe their networks by up to a factor of 100:1 to get as much revenue as possible from access subscription fees. This practice operates on the theory that only a percentage of total subscribers will be online at any given time. Therefore, service providers can significantly over-subscribe and still be reasonably sure that all subscribers will have at least a minimally acceptable online experience. This practice is rapidly reaching the end of its useful existence, however. Over-subscription is already causing significant performance problems during peak-usage times, and the situation is growing worse every day as bandwidth-intensive rich content and broadband access continue to proliferate.
This convergence of bandwidth-hungry forces leaves service providers with two undesirable options: Over-subscribe and run the risk of losing customers by having the network slow to a crawl during peak hours, or buy enough bandwidth to accommodate peak demand, thus increasing operational costs.
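The trade-off behind over-subscription can be illustrated with a simple model. The sketch below is not drawn from any provider's planning method; it assumes upstream capacity is shared evenly among active users, and the subscriber count and line rate are hypothetical. Only the 100:1 ratio comes from the text.

```python
def per_user_rate_kbps(subscribers: int, access_rate_kbps: float,
                       oversub_ratio: float, active_users: int) -> float:
    """Rate each active user can expect when upstream capacity is shared evenly."""
    if oversub_ratio <= 0:
        raise ValueError("oversub_ratio must be positive")
    # Upstream capacity the provider actually buys for this subscriber base.
    provisioned_kbps = subscribers * access_rate_kbps / oversub_ratio
    if active_users <= 0:
        return access_rate_kbps
    # No user can exceed the access-line rate, however idle the network is.
    return min(access_rate_kbps, provisioned_kbps / active_users)


# Hypothetical numbers: 10,000 subscribers on 1,000 kbps lines at 100:1.
for active in (100, 1_000, 5_000):
    print(active, per_user_rate_kbps(10_000, 1_000, 100, active))
```

Under these assumptions, users get the full line rate only while no more than 1% of subscribers are active; at 10% concurrency each user sees a tenth of it, which is the peak-hour slowdown described above.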
Caching appliances have partially overcome these challenges by moving the most frequently used content to the edges of the network, closer to end users. However, caching appliances have not solved all the problems because they address only bandwidth reduction for small, static content and do not meet the broader range of requirements, including:
- Support for the storage and delivery of all types of Web content, including static content, rich media content, and dynamic data;
- Ability to guarantee delivery performance;
- Ability to monitor and log all billable events and integrate into billing and subscriber management systems; and
- Ability to easily scale delivery bandwidth and storage capacity as data grows.
The bottom line is that network operators need new, intelligent infrastructure capable of handling the growth in traffic and rich data and enabling new subscription-based revenue streams. Caching appliances are a step in the right direction, but they don't have the bandwidth and storage capacity to deliver large amounts of all types of Web content.
Distributed storage, on the other hand, is that new infrastructure. It solves the technical constraints of today's Internet by minimizing bandwidth and latency problems for all types of Web content, reducing operational costs and opening up new service possibilities.
Distributed storage at the edge
Distributed storage is a product category made up of devices that can store and deliver massive amounts of data from the edge of service provider networks, at a price/performance ratio an order of magnitude better than today's caches and storage subsystems. Distributed storage could fundamentally change the way in which service providers store and deliver data on the Internet and will eliminate some of the key technical and operational hurdles blocking the road to profitability.
The "edge" of the Internet can be located in a number of places, and therefore distributed storage has applicability at a number of network points. There is a complex value chain linking content owners to Internet end users that includes Web-hosting companies, CDNs, and network service providers (NSPs). Each of these organizations has an edge where their networks touch other networks or end users.
By placing large amounts of storage at the edges of their networks, each of these providers can greatly increase the predictability of Internet data delivery while drastically reducing the cost of doing so. Distributed storage enables these companies to offer new value-added services while decreasing their operational costs.
The concept of storing and delivering data from the edge of the Internet is not new. Distributed storage simply represents the culmination of a process that began with the adoption of caching software, followed by caching "appliances."
While these technologies have helped to reduce the cost of existing services, they have not been suitable as platforms on which to launch new services because they are both unpredictable in the quality of their data delivery (especially for rich content) and incapable of running service-level applications.
Distributed storage provides a breakthrough by offering compelling capabilities that existing solutions do not, including the following:
"Edge" economics: Distributed storage enables service providers to greatly reduce the cost of rich data delivery. For example, at $600 per Mbps and 100% bandwidth utilization, the backbone bandwidth cost alone for delivering a 2-hour, 1Mbps video (about $2) is far too high to be profitable using traditional content delivery techniques.
Because distributed storage provides an order of magnitude improvement in price/performance over traditional systems, it enables this same video stream to be delivered for just pennies, thus creating viable economic conditions for service providers to deploy premium rich-media services.
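The "about $2" and "just pennies" figures can be checked with a short calculation. The sketch below rests on two assumptions not stated in the article: that $600 per Mbps is a monthly price, and that a billing month is 30 days. It also treats the "order of magnitude" improvement as exactly a factor of 10 for illustration. The function name is hypothetical.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: 30-day billing month


def backbone_cost_usd(price_per_mbps_month: float, stream_mbps: float,
                      duration_hours: float, utilization: float = 1.0) -> float:
    """Backbone bandwidth cost attributable to one stream.

    Assumes the price is charged per Mbps per month and that the stream
    pays only for the share of the month it occupies the link.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Fraction of a full Mbps-month consumed by this stream.
    mbps_months = stream_mbps * duration_hours / HOURS_PER_MONTH
    # Lower utilization spreads the same monthly bill over less traffic.
    return price_per_mbps_month * mbps_months / utilization


traditional = backbone_cost_usd(600, 1, 2)
print(f"traditional delivery: ${traditional:.2f}")            # $1.67
print(f"with a 10x improvement: ${traditional / 10:.2f}")      # $0.17
```

Under these assumptions, the traditional cost comes to roughly $1.67, in line with the article's "about $2", and a tenfold improvement brings it to about 17 cents.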
High-capacity storage: Caching appliances, as the name suggests, cache relatively small amounts of frequently accessed content in memory and local storage for a short period of time. Distributed storage, however, provides massive amounts of storage, enabling service providers to "permanently" store large amounts of data on the edge of the network. This not only supports the efficient storage and delivery of rich data, but it also enables the deployment of new value-added services for customers such as network-based file sharing and remote backup services.
This combination of edge economics and high-capacity storage allows service providers to instill into the Internet the communications characteristics required for long-term profitable operations: predictable delivery of data at economically practical costs. This has been impossible in the past due to the prohibitive cost of acquiring and maintaining high-capacity storage and overly complex content delivery systems.
Today, however, the availability of off-the-shelf, high-capacity storage and computing hardware has given rise to a new crop of innovative companies delivering distributed storage solutions that are affordable enough to support ubiquitous deployment.
The technical and economic flaws inherent in today's Internet infrastructure have made it difficult for Web hosters, CDNs, and NSPs to develop more-profitable business models. Distributed storage provides the critical piece of infrastructure that service providers need to deploy high-profit, premium services, while drastically reducing the cost of content delivery.
CDNs are among the early adopters of this new technology, which is not surprising, since CDNs have consistently led the migration of content toward the edges of the Internet. They will be followed by Web-hosting companies and NSPs as the compelling benefits of distributed storage technology become field-proven and tangible. Ultimately, distributed storage will enable service providers to break out of the commodity trap and emerge with new business models that enable long-term profitability.
Dennis Hoffman is co-founder and president of Storigen Systems (www.storigen.com) in Lowell, MA.