Gold class Internet
- Information Age, November 2000
Poor web site performance can kill a company's online strategy. What technologies and techniques can organisations leverage to create lightning-fast e-commerce?
Eight seconds. That’s all it takes. No matter how good the marketing campaign, the word of mouth, the product or even the after-sales service, eight seconds is all it takes for hard-earned customers to leave a company for its competitor. Because, more often than not, it’s a competitor that a frustrated customer will go to after a web page has failed to load within eight seconds of their arrival at a web site.
Zona Research calculates that £2.6 billion of revenue is lost each year from purchases not made as a result of slow web sites. Nielsen//NetRatings has shown that users view more web pages if they can access them faster. Jupiter Communications found that 75% of sites with high traffic levels received complaints from users regarding slow page delivery, and 42% of these sites reported complaints from users of pages failing to load. Various studies have shown that up to 35% of customers at an e-commerce site will abandon their shopping cart after a wait of a minute. Even more worrying is that 24% switch services after a site outage.
“Last Christmas, there was a massive fiasco in the States over toy store web sites,” says David Caddis, vice president and general manager of e-business assurance at Candle Corporation. “Orders weren’t being delivered or were turning up late. Not only did that stop people going back to the web sites, but they stopped going to the stores, too. A bad experience on the web translates not just into bad returns but also changes in brand loyalty off the web.” Levi’s shut down its fledgling site and bailed out of the e-retail business when it realised its systems were unable to cope with demand.
What are CIOs doing to prevent their businesses going the same way? Many are taking the prudent option of first finding out what the user experience is like at their companies’ sites, rather than investing in upgrades the firms don’t need. Indeed, many beta-test their sites before they have any users at all, to estimate how they will perform in a real-world situation. The tests are kept as close to real-world conditions as possible, so the results aren’t skewed by assumptions about the site that won’t hold once it goes live.
But Caddis says, “We had some customers who built large web sites. They tested them and thought they were great. When we tested the sites, they were shocked. The end user was experiencing 12.5 to 13 seconds, even though the tests had been serving pages in eight seconds. The tests had all been done on T1 lines (fast connections that can cope with large amounts of data), whereas people were accessing from a variety of different systems outside.”
Andy Crosby, field market manager for Mercury Interactive, says that of the 500 sites his company has tested in the last three quarters, 97% had a critical problem, although no single fault accounted for most of the slowdowns. Equally bad, 70% of sites failed to reach 30% of their intended capacity (and therefore revenue). His company has centres all over the world that can bombard sites with traffic from over 100,000 simulated users from many apparent locations. By seeing how a site responds as the traffic is increased, Mercury can provide feedback on how it is faring. “After three to five tests, we can usually increase performance five times. It’s partly a configuration issue. Just by retuning the application or web server, you can see some really good returns.” Yet despite so widespread a problem with web site performance, most of his call-outs have come in the last week before sites go live. “Most say ‘we haven’t had time to test it’,” he admits.
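The principle behind this kind of testing is easy to sketch. The fragment below is a minimal, illustrative load ramp in Python, not Mercury’s tooling: the target address and user counts are placeholders, and real test centres simulate full user sessions from many locations rather than simple page fetches.

```python
# Minimal load-ramp sketch: measure how response time changes as the number
# of concurrent simulated users grows. Illustrative only -- real load-testing
# tools simulate full user sessions from many apparent locations.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://www.example.com/"  # placeholder target site

def fetch(url):
    """Fetch one page and return the elapsed time in seconds."""
    start = time.time()
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()
    return time.time() - start

def run_ramp(url, user_counts=(10, 50, 100, 200)):
    for users in user_counts:
        with ThreadPoolExecutor(max_workers=users) as pool:
            timings = list(pool.map(fetch, [url] * users))
        average = sum(timings) / len(timings)
        print(f"{users:4d} concurrent users: "
              f"avg {average:.2f}s, worst {max(timings):.2f}s")

if __name__ == "__main__":
    run_ramp(URL)
```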
Monitoring tools range from the simple to the complicated. The IT manager of one online banking site asked relatives with stopwatches to record their access times at different times of the day. Richard Marsh, an administrator at Vodafone, which uses Candle’s ETEWatch, has been able to use the monitoring tool to determine whether the company’s Gemini web-based call-centre application is performing as it should. “We can see how long Gemini takes to call up business account numbers and customer telephone numbers, two transactions performed thousands of times a day. From this, we can ask developers to see whether the performance of the underlying code that actions these requests can be improved.”
Similarly, Tonic for HTTP (HyperText Transfer Protocol: the protocol web browsers use to request content from web servers and servers use to deliver it) from Tonic Software simulates Internet traffic through a site and raises warnings if performance drops below tolerance levels. It can also conduct stress tests, identify performance bottlenecks and simulate millions of concurrent users interacting with a web application from locations around the world. Caddis’ company, Candle, has a couple of offerings that are downloaded or installed on the client and monitor the user experience there, reporting the full picture back to the server.
After the site goes live, analysis of server logs provides information about the user experience, as do robots that automatically probe the sites at different times to determine access times and availability from different parts of the world; companies that provide this service need to have physical points of presence on the Internet all over the world.
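A monitoring robot of this kind can be as simple as a script that fetches key pages on a schedule, records how long they took and flags failures. The sketch below is a generic illustration rather than any vendor’s product; the page addresses, the eight-second threshold and the probing interval are all placeholders.

```python
# Minimal availability/latency probe: fetch a list of pages at regular
# intervals and log response times, flagging slow or failed requests.
# Page addresses, threshold and interval are placeholders.
import time
import urllib.request

PAGES = ["http://www.example.com/", "http://www.example.com/catalogue"]
THRESHOLD_SECONDS = 8.0   # the article's eight-second rule of thumb
INTERVAL_SECONDS = 300    # probe every five minutes

def probe(url):
    """Fetch one page; return (elapsed seconds, None) or (None, error)."""
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            response.read()
        return time.time() - start, None
    except Exception as exc:          # the page failed to load at all
        return None, exc

while True:
    for url in PAGES:
        elapsed, error = probe(url)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if error is not None:
            print(f"{stamp}  {url}  FAILED: {error}")
        elif elapsed > THRESHOLD_SECONDS:
            print(f"{stamp}  {url}  SLOW: {elapsed:.1f}s")
        else:
            print(f"{stamp}  {url}  OK: {elapsed:.1f}s")
    time.sleep(INTERVAL_SECONDS)
```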
The results of these tests usually indicate that it’s the web pages themselves that need fixing, not the infrastructure. Caddis suggests moving objects that aren’t needed on the home page onto other pages – most of the time, users click through before the objects have even resolved. German broadcaster Orb found most of the information on its site was being ignored because it sat on navigation pages. By moving it to pages that already carried more content, it found users were reading information they had previously ignored, and the navigation pages sped up significantly.
While it might take only a few seconds to download the code of a web page, it can take a minute and a half for an object-heavy page to resolve on the screen. Technical consultant Richard Donkin, of UK-based software supplier Orchestream, says there are a few tricks web designers can use to improve that. “The biggest thing is optimising your graphics and not to use too many different buttons. If you cut down the colour depth of the graphics and make them simpler, that makes a huge difference. In fact, it’s better to have a button bar rather than individual buttons because that’s only one transaction under HTTP, so you don’t keep having to make requests back to the server for additional objects.”
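The effect of cutting colour depth can be reproduced with standard tools. The sketch below uses the Pillow imaging library for Python to quantise a full-colour button graphic down to a 64-colour palette, which usually shrinks the file considerably; the filenames and colour count are placeholders.

```python
# Reduce the colour depth of a web graphic to shrink its file size.
# Requires the Pillow library (pip install Pillow); filenames are placeholders.
import os

from PIL import Image

SOURCE = "button.png"        # original full-colour graphic
TARGET = "button_small.png"  # optimised version to serve on the page

# Quantise from millions of colours down to an adaptive 64-colour palette.
image = Image.open(SOURCE).convert("RGB")
optimised = image.quantize(colors=64)
optimised.save(TARGET, optimize=True)

print(f"{SOURCE}: {os.path.getsize(SOURCE)} bytes")
print(f"{TARGET}: {os.path.getsize(TARGET)} bytes")
```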
Under HTTP 1.0, the original version of the protocol used by web servers for transactions, it was impossible to keep a connection open between the desktop client and the web server. Every request for a new page object had to be a new transaction, with all the performance penalties that brought. HTTP 1.1, the latest version, allows a persistent connection between client and server and is understood by all version 3.0 and later browsers.
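The difference is visible even in a few lines of code. The sketch below, using Python’s standard http.client library with a placeholder host and paths, fetches several page objects over a single persistent HTTP 1.1 connection rather than opening a new connection for each object.

```python
# Fetch several page objects over one persistent HTTP/1.1 connection,
# rather than paying the connection-setup cost for every object.
# Hostname and paths are placeholders.
import http.client

HOST = "www.example.com"
OBJECTS = ["/index.html", "/images/logo.gif", "/images/buttonbar.gif"]

# One TCP connection is opened here and reused for all the requests below.
connection = http.client.HTTPConnection(HOST, timeout=30)
try:
    for path in OBJECTS:
        connection.request("GET", path)
        response = connection.getresponse()
        body = response.read()   # drain the body before reusing the connection
        print(f"{path}: {response.status}, {len(body)} bytes")
finally:
    connection.close()
```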
It’s this advance that’s behind start-up company Redline Networks’ TX web accelerators. The systems intercept HTTP transactions directed to the server and compress them together into a few dozen TCP/IP (the protocols used to transmit data across networks and the Internet) transactions. It then filters out unnecessary data, such as unrendered scripts and comments, before sending the data back to the client in a compressed form, which is uncompressed natively by the browser. A rival accelerator from Packeteer (acquired from British firm Workfire Technologies in September) stores tables on how web objects interact with browsers and how to optimise those transactions.
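Whatever the vendors’ particular implementations, the broad idea can be sketched simply: strip out content the browser never renders, then compress what remains so that a browser advertising gzip support can decompress it natively. The example below is an illustration of that principle only, not Redline’s or Packeteer’s technology.

```python
# Sketch of the idea behind HTML acceleration: remove content the browser
# never renders (comments, redundant whitespace between tags), then gzip the
# result so a browser advertising 'Accept-Encoding: gzip' can decompress it.
# Illustration of the principle only, not any vendor's implementation.
import gzip
import re

def shrink_html(html: str) -> bytes:
    stripped = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)  # drop comments
    stripped = re.sub(r">\s+<", "><", stripped)  # collapse whitespace between tags
    return gzip.compress(stripped.encode("utf-8"))

page = """
<html>
  <!-- navigation bar generated 12 Nov 2000 -->
  <body>
    <h1>Catalogue</h1>      <p>Welcome to the store.</p>
  </body>
</html>
"""

compressed = shrink_html(page)
print(f"original: {len(page)} bytes, compressed: {len(compressed)} bytes")
```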
Even so, Redline recommends using the TX accelerators with a standard caching system. Caching systems are software or hardware-based systems, usually independent of the web server, that intercept calls for often-requested objects and static content (content that is the same for all transactions) and serve them from their own stores of fast electronic memory.
This frees up the web server for processing dynamic content. Inktomi, CacheFlow and Cisco all have caching systems; CacheFlow’s is based on its own operating system, CacheOS. The company reckons that its server accelerators can process up to 95% of inbound page requests, giving response times up to 80% faster for web users.
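In outline, a cache sits in front of the web server, answers repeat requests for static objects from memory and passes only misses through to the origin. The sketch below is a bare-bones illustration of that flow; the origin fetch, the object names and the five-minute expiry are all stand-ins.

```python
# Bare-bones illustration of a static-content cache: repeat requests for the
# same object are answered from memory, and only cache misses go through to
# the origin web server. The origin fetch function is a stand-in.
import time

class StaticCache:
    def __init__(self, fetch_from_origin, ttl_seconds=300):
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl_seconds
        self.store = {}          # path -> (content, time stored)

    def get(self, path):
        entry = self.store.get(path)
        if entry is not None:
            content, stored_at = entry
            if time.time() - stored_at < self.ttl:
                return content                     # cache hit: origin untouched
        content = self.fetch_from_origin(path)     # cache miss
        self.store[path] = (content, time.time())
        return content

def fetch_from_origin(path):
    """Stand-in for the real origin web server."""
    print(f"origin hit for {path}")
    return f"<contents of {path}>"

cache = StaticCache(fetch_from_origin)
cache.get("/images/logo.gif")   # first request goes to the origin
cache.get("/images/logo.gif")   # second request is served from memory
```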
But one particular problem with e-commerce web pages, points out Donkin, is that they include server-side includes (points where the server substitutes information from a database). Because that content is dynamic, standard caching technology can’t help, except where static elements are referenced. One approach to this problem is provided by Persistence Software’s new offering, Dynamai, which the company claims has been used on eBay and Discovery.com to improve performance 35-fold. Dynamai remembers the answers to common e-commerce questions about price and availability and responds with pre-computed answers. The software listens for events such as price changes and invalidates the associated cache contents to avoid serving stale data.
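The underlying idea, caching pre-computed answers and discarding them the moment the underlying data changes, can be shown in a few lines. The sketch below is a generic illustration, not Dynamai itself; the product identifiers and the lookup function are invented.

```python
# Sketch of event-driven cache invalidation for dynamic e-commerce data:
# common price answers are served from a cache, and a price-change event
# removes the stale entry so old data is never served. Generic illustration,
# not any vendor's product; product ids and the lookup are invented.

price_cache = {}   # product id -> price

def compute_price(product_id):
    """Stand-in for the expensive database lookup behind a dynamic page."""
    print(f"database lookup for {product_id}")
    return {"widget-42": 19.99}.get(product_id, 0.0)

def get_price(product_id):
    if product_id not in price_cache:
        price_cache[product_id] = compute_price(product_id)
    return price_cache[product_id]

def on_price_change(product_id):
    """Called when the back end announces a price change: invalidate."""
    price_cache.pop(product_id, None)

print(get_price("widget-42"))   # miss: hits the database
print(get_price("widget-42"))   # hit: served from the cache
on_price_change("widget-42")    # price changed, cached answer thrown away
print(get_price("widget-42"))   # miss again: fresh value from the database
```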
The next problem down the line is how quickly the back end can respond to requests for data in real time. Donkin estimates this is the cause of 50% of performance problems. One rule, he says, is that companies shouldn’t use the same database system for corporate data and e-commerce. “At the American Express site, you have the ability to look at a statement. The corporate system is probably running on a mainframe. But the web server will be running on a Unix box, probably linked to a separate database.”
This means American Express can scale up easily if demand outweighs the capabilities of the mainframe. And if one system goes down, not all the services go down. Having a separate database also means it can be optimised for web serving, with different tables and heavy indexing of particular areas, and less indexing of transactions, thus reducing the amount of data the server has to get through before it locates what it wants.
However, with two separate databases, synchronisation becomes an issue: both databases have to hold the same information. Copying the data between the two takes time and needs a lull in database activity, otherwise performance will take a hit.
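One common shape for that copy job is sketched loosely below: during a quiet period, only the rows changed since the last run are pulled across to the web database. The table and column names are invented for illustration, and a production job would normally lean on the databases’ own replication tools.

```python
# Loose sketch of periodic synchronisation between a corporate database and a
# separate web-serving database: only rows changed since the last run are
# copied across, during a quiet period. Table and column names are invented;
# real systems would normally use the databases' own replication facilities.
import sqlite3

corporate = sqlite3.connect(":memory:")   # stand-in for the corporate system
web = sqlite3.connect(":memory:")         # stand-in for the web-facing database

for db in (corporate, web):
    db.execute(
        "CREATE TABLE products (id TEXT PRIMARY KEY, price REAL, updated_at INTEGER)"
    )

corporate.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("widget-42", 19.99, 100), ("widget-43", 4.50, 250)],
)

def synchronise(last_sync_time):
    """Copy rows changed since last_sync_time from the corporate to the web database."""
    changed = corporate.execute(
        "SELECT id, price, updated_at FROM products WHERE updated_at > ?",
        (last_sync_time,),
    ).fetchall()
    web.executemany("INSERT OR REPLACE INTO products VALUES (?, ?, ?)", changed)
    web.commit()
    return len(changed)

print(synchronise(last_sync_time=0), "rows copied")    # initial full copy
print(synchronise(last_sync_time=200), "rows copied")  # only the later change
```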
What some companies have found is that picking a time to do this isn’t necessarily as obvious as choosing the middle of the night. David Caddis has another of his cautionary tales of e-commerce. “A bank in the Mid-West was using 4-6 in the morning to back up the database. But they discovered they were getting an unusual amount of activity at that time and the users were getting poor performance as a result of the backup. It turned out that farmers were going on-line to check their accounts every morning before going out into the fields. The result was the bank genuinely had to become a 24/7 online bank.”
Database performance can also be hit in unexpected ways. HFC Bank was getting particularly poor back-end performance on its branch-based, on-line credit-check system, but only on searches coming from particular geographic areas. It turned out the areas had particularly large Latino populations and searches on the name Rodriguez, even with a qualifying initial, were producing lists of names hundreds of entries long. By requiring more qualifiers, the back-end performance was improved.
Companies that have started out with relatively small hits on their sites have found their servers unable to cope with demand as it increases and have had to use load-balancing to improve performance. This uses multiple web servers and a load-balancer, which is either hardware-based or software-based. Windows 2000 Advanced Server and Datacenter Server have load-balancing capabilities, as do Linux, Solaris and most versions of Unix. Hardware-based load-balancers are available from ArrowPoint, Alteon, F5 Networks, HydraWeb and Radware among others.
The load-balancer sits between a network’s routers (which direct traffic to the right area of the network) and its servers and distributes the requests among the servers. An intelligent balancer is able to determine how much the servers’ resources are being stretched and can direct a transaction request to the server with the least load. MCI, BT and Dow Jones use HydraWeb’s load-balancer to monitor how well applications are coping before deciding whether to pass them any more requests.
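Reduced to its essentials, the balancer’s decision looks something like the sketch below: keep track of how busy each back-end server is and hand the next request to the least loaded one. The server names are invented, and real balancers weigh response times and health checks as well as request counts.

```python
# Crude sketch of 'least load' request distribution: the balancer tracks how
# many requests each back-end server is currently handling and sends each new
# request to the least busy one. Server names are invented.

active_requests = {"web1": 0, "web2": 0, "web3": 0}

def pick_server():
    """Choose the back-end server with the fewest requests in flight."""
    return min(active_requests, key=active_requests.get)

def handle_request(request):
    server = pick_server()
    active_requests[server] += 1
    print(f"{request} -> {server}")
    # ...forward the request to that server; when its response comes back,
    # the balancer would decrement the count: active_requests[server] -= 1
    return server

for number in range(6):
    handle_request(f"request-{number}")
```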
An added advantage for companies using load-balancing is that when an old server starts to be outpaced by the demands placed on it, it can remain online while another server is added alongside it, doubling the web-serving capability. And if one of the servers breaks down or needs maintenance, a reduced service remains rather than a complete outage. According to Donkin, with a static site, “load-balancing is the most useful thing you can do.” And Crosby adds that “load-balancing is another area that can cause problems. But just by changing the configuration, you can see some fairly dramatic results.” Another good point: the servers don’t all have to be on the same network but can be spread across a WAN (wide-area network: two or more networks connected together, typically by links slower than the networks themselves) or the Internet. Radware’s WSD-NP, for instance, can redirect requests from clients across the Internet to servers physically nearer to them.
At Virgin’s two latest online ventures, Virgin Cars and Virgin Wine, switches from Alteon provide web traffic load-balancing, but they also improve the performance of the firewalls by distributing packets between them for processing, enabling traffic to be processed by both firewalls equally. “We’ve been able to increase the resilience as well as the performance of front- and back-end systems that are critical to our online operations,” maintains Richard Shearn, IT director at Virgin Cars.
Companies with global web sites have also had to modify their operations for the benefit of their overseas customers. Traffic doesn’t travel from the client directly to the server and back again. It hops from node to node on the Internet. The further away the client is from the server and the more congested the Internet, the more hops traffic has to make to get there and back.
One way to solve this problem is to use the services of a content delivery network such as Akamai. Akamai has several thousand servers around the world with identical content. Requests for content on sites such as apple.com and cnn.com are redirected to an Akamai server close to the client, so the client is served more quickly, simply because of its proximity to the server.
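Conceptually, the redirection step amounts to the sketch below: map each client to the replica closest to it. The region names and server addresses are invented, and real content delivery networks use DNS and live network measurements rather than a static table.

```python
# Conceptual sketch of content-delivery redirection: send each client to a
# replica server close to it. Real CDNs use DNS and live network measurements;
# here a static table of invented regions and servers stands in for that.

EDGE_SERVERS = {
    "europe": "edge-london.example.net",
    "north-america": "edge-newyork.example.net",
    "asia": "edge-tokyo.example.net",
}
ORIGIN = "origin.example.com"

def choose_server(client_region):
    """Return the replica nearest the client, or the origin if none is close."""
    return EDGE_SERVERS.get(client_region, ORIGIN)

for region in ("europe", "asia", "antarctica"):
    print(f"client in {region} -> {choose_server(region)}")
```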
The Content Bridge Alliance, a group of companies with similar aims to Akamai, is working on standards to make their own servers interoperable with each other’s in order to have a greater number of local points of presence than Akamai.
“Bandwidth. It’s always bandwidth,” says Mercury Interactive’s Gareth Heaton, who helps maintain the company’s own site. In his experience, the problem isn’t the network itself or even the web pages. It’s either the link between the company network and the Internet, or the Internet itself, not being able to cope with the amount of traffic on it. It’s impossible to serve up 100,000 simultaneous connections over a 56k modem. And even on dedicated lines, there’s a problem getting data into and out of a network fast enough, particularly if there’s streaming video or audio on the site: the bandwidth consumption of these applications, which require far more consistent streams of traffic than their HTTP counterparts, is phenomenal if any quality is to be attained.
But sometimes, the bandwidth problem is caused by the ISP rather than solved by it, maintains Andy Crosby. “An ISP will supply what it can get away with, rather than what was actually purchased, because it’s actually fairly difficult for a web site operator to test the bandwidth of its connection. I had one client who had purchased a 2Mb pipe. We tested it and he was getting 500k. We phoned the ISP and they said they’d see if they could fix it. A couple of minutes later, they were getting 2Mb.”
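Measuring the throughput of a line is not hard in principle: download a file of known size from a well-connected server and time it. The sketch below gives a rough figure; the URL is a placeholder that should point at a reasonably large file, and repeated runs at different times of day give a fairer picture.

```python
# Rough throughput check: download a file of known size and time it. The URL
# is a placeholder and should point at a reasonably large file hosted close to
# the Internet connection being measured; repeat at different times of day.
import time
import urllib.request

TEST_URL = "http://www.example.com/testfile.bin"  # placeholder

start = time.time()
with urllib.request.urlopen(TEST_URL, timeout=60) as response:
    data = response.read()
elapsed = time.time() - start

bits_transferred = len(data) * 8
print(f"{len(data)} bytes in {elapsed:.1f}s "
      f"= {bits_transferred / elapsed / 1_000_000:.2f} Mbit/s")
```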
In order to reduce the demands placed on their networks by their users, most ISPs have their own caching systems installed through which all their users’ transactions pass. Commonly visited sites have a high proportion of their content cached at local ISPs, so transactions don’t have to be passed across the Internet to obtain this content, saving the ISPs’ bandwidth. Although most companies wouldn’t want to deal with the thousands of ISPs worldwide in order to have their content cached, Akamai has also been giving servers with cached content to local ISPs for free, to improve their selling point with customers. “I do rate Akamai very highly,” says Crosby. “If your problem’s not your back-end system, Akamai can really help, more so than someone like CacheFlow or Alteon with a switching box.”
But companies with very popular servers that don’t have the budget for unlimited bandwidth have had to prioritise applications and users to ensure Quality of Service (QoS) levels. At their simplest, QoS techniques involve the web server ascertaining (from a user’s login name, IP address or a cookie stored on the user’s hard drive) whether he or she is more important than other users, and then prioritising traffic to and from that client accordingly.
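In code, the classification step is little more than the sketch below: look at who the request comes from and attach a priority that later stages of the pipeline can act on. The cookie value, address range and priority tiers are all invented for illustration.

```python
# Sketch of the classification step in a simple QoS scheme: decide a request's
# priority from a login cookie or the client's address, so later stages can
# favour high-priority traffic. Cookie value, addresses and tiers are invented.
import ipaddress

GOLD_CUSTOMER_COOKIE = "tier=gold"
PARTNER_NETWORK = ipaddress.ip_network("192.0.2.0/24")  # example address block

def classify(client_ip: str, cookie_header: str) -> int:
    """Return a priority: 0 is highest, 2 is best-effort."""
    if GOLD_CUSTOMER_COOKIE in cookie_header:
        return 0                                     # identified gold customer
    if ipaddress.ip_address(client_ip) in PARTNER_NETWORK:
        return 1                                     # request from a partner network
    return 2                                         # everyone else: best effort

print(classify("203.0.113.9", "session=abc; tier=gold"))  # -> 0
print(classify("192.0.2.50", "session=xyz"))              # -> 1
print(classify("198.51.100.7", ""))                       # -> 2
```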
One way to do this is to set up mirror servers on a VPN (Virtual Private Network: a portion of the Internet secured for use by a company. These systems use encryption and other security mechanisms to ensure that only authorized users can access the network and that the data cannot be intercepted) and have their load-balancer redirect high-priority business customers to it after giving them access to the VPN.
Another common technique is to prioritise traffic for clients browsing important pages at the expense of traffic related to ancillary pages. Web-based applications such as sales tools can be classed as more important than casual browsing, and traffic carrying text as more important than that carrying pictures.
The best way to mark traffic is using ‘switches’, fast network devices that direct packets to their destinations, says Donkin. There are various emerging protocols, such as RSVP (Resource Reservation Protocol), that allow applications to notify network devices that they want priority and the devices to notify the sender whether they can provide the resources required. If they can, the resources are pre-allocated to the traffic between the server and the client application. The switch or router can also tag TCP/IP packets by inserting data into their headers, marking them with a certain priority level. Network devices that understand this tagging then transmit the packets with appropriate speed when the network is congested, delaying or even dropping low-priority traffic and passing on high-priority traffic.
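From the sending side, marking traffic amounts to setting bits in the packet headers. The sketch below sets the IP type-of-service byte on a socket, on systems that expose the IP_TOS option, so that QoS-aware switches and routers downstream can favour its packets; the destination host is a placeholder and the value used is the conventional ‘low delay’ bit.

```python
# Minimal illustration of packet marking from the sending side: set the IP
# type-of-service byte on a socket so QoS-aware devices downstream can
# prioritise its packets. 0x10 is the conventional 'low delay' bit; the
# destination host is a placeholder. Works where the OS exposes IP_TOS.
import socket

IPTOS_LOWDELAY = 0x10

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, IPTOS_LOWDELAY)

sock.connect(("www.example.com", 80))
sock.sendall(b"GET / HTTP/1.1\r\nHost: www.example.com\r\nConnection: close\r\n\r\n")
reply = sock.recv(4096)
print(reply.splitlines()[0])   # status line of the response
sock.close()
```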
Switches, because they don’t have to open up each packet, only reading the headers, are quicker at this than routers so are preferred QoS devices.
The latest version of Cisco’s IOS (Internetwork Operating System) software contains NBAR (Network-Based Application Recognition), a classification engine that can recognise traffic from web-based applications and then apply appropriate QoS measures. Microsoft Windows, too, has a built-in QoS API (application programming interface: a pre-built tool that programmers can use in their own applications) that Windows Media Player and Internet Explorer use to notify servers of their respective network demands. Packets from Windows Media Player are given priority over Internet Explorer’s traffic because web traffic is less vulnerable to interruption than streaming media.
The drawback with packet-marking is that once packets have left a ‘QoS’ network, their priority tags are removed so the QoS is only guaranteed on that network, not on the Internet or another connected network.
Further problems may set in when traffic isn’t HTTP-based but is HTTPS-based (a secure version of HTTP that encrypts the traffic using Netscape’s Secure Sockets Layer protocol). Servers alone can experience a 50-fold degradation of performance, according to Networkshop tests, in many cases allowing only a few transactions a second, as they process the encrypted transaction information and encrypt their responses.
Worse still, load-balancers cannot read the header information inside encrypted traffic, and so don’t know how to redirect packets based on it. That means that if a customer has been prioritised onto a fast server, as soon as they try to pay or perform a secure transaction, they suffer a massive slowdown.
There are ways to overcome at least some of this slowdown, including installing an SSL card in the server or placing a standalone SSL accelerator on the network that decrypts and encrypts content on the server’s behalf. According to F5, SSL accelerators can improve e-commerce application performance by 50% and handle up to 47 times more connections per second. The company has even gone to the extent of producing an SSL load-balancer that integrates an SSL card with a load-balancer so that it can redirect packets based on the decrypted HTTP information.
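Stripped to its essentials, a standalone accelerator terminates the SSL connection itself and passes plain HTTP to the servers behind it. The sketch below shows the shape of that offload in software only; the certificate files, ports and back-end address are placeholders, and a real device does the cryptography in dedicated hardware and handles traffic far more carefully than this loop does.

```python
# Stripped-down sketch of SSL offload: accept the encrypted connection here,
# then pass plain HTTP to the back-end server, which never touches SSL.
# Certificate paths, ports and the back-end address are placeholders; real
# accelerators do the cryptography in dedicated hardware.
import socket
import ssl

CERT_FILE = "server.crt"      # placeholder certificate and private key
KEY_FILE = "server.key"
LISTEN_PORT = 443
BACKEND = ("10.0.0.5", 80)    # placeholder back-end web server

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(CERT_FILE, KEY_FILE)

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("", LISTEN_PORT))
listener.listen(5)

while True:
    raw_client, _ = listener.accept()
    client = context.wrap_socket(raw_client, server_side=True)  # decrypt here
    backend = socket.create_connection(BACKEND)
    backend.sendall(client.recv(65536))   # plaintext request to the back end
    client.sendall(backend.recv(65536))   # response is re-encrypted on the way out
    backend.close()
    client.close()
```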
Another solution is to skip chunks of the Internet or even circumvent it altogether. Companies such as AboveNet and Digital Island have installed their own fibre-optic networks around the world onto which traffic can be routed for faster access. Because the web servers have to be one hop from the Internet backbone for this to work, the drawback is that interested companies have had to have their web sites hosted by AboveNet or Digital Island (and since Digital Island is based in Hawaii, because of the military fibre optics connecting it to Asia and the US, if there’s a problem there’s not much they can do about it).
Web-design firm e-Media, an AboveNet customer, says that the AboveNet network is capable of handling up to 30 or 40 times its normal traffic. “An internal study showed that for every second the web site was mentioned in a television promotion, the site attracted an extra 150,000 page views,” says CTO Michael Terrata. “Most providers cannot guarantee the headroom required to handle the traffic the site generates without notice. Some of the competitors are running lines at 80 to 90% capacity, much higher than AboveNet’s 50%.”
Since the connection to the US from Europe is one of the bigger bottlenecks in Internet traffic, “most people pay for a direct connection between Uunet and Sprint,” according to Richard Donkin. Traffic from the US intended for these companies’ servers is given priority by routers at either end of the connection.
Start-up QoS, based in Dublin, has leased fibre and satellite connections under an irrevocable right of use from companies such as Global Crossing. Routers classify information packets, giving the company’s clients the ability to select and classify the way content can be provided to customers. CFO Michael Keane says “it’s like classifying a bus lane on a busy highway. You don’t build a new lane to solve the congestion. You just restrict the use of one or two lanes to those with certain needs and functions.” Clients can even make changes to their own levels of bandwidth through a web interface. However, the system has yet to go live so the company can’t say for sure how reliable its service will be.
Figuring out what could be slowing down a web site is not an easy job, but working out how to speed it up is even harder. But with business-to-business e-commerce alone predicted to be worth $7.29 trillion in 2004 by the Gartner Group, the rewards are as high as the penalties are severe.
