
Jan 20, 2010

Effective usage of Online tools for web based testing

Web Standards
  • The World Wide Web Consortium (W3C) is an international consortium where Web standards are developed. Its mission is to lead the Web to its full potential by developing protocols and guidelines that ensure its long-term growth.
  • A major portion of the population suffers from some form of impairment. The target segments for accessibility include visual and hearing impairments, old age, low literacy, and low-bandwidth connections.
  • The Web Content Accessibility Guidelines (WCAG) explain to Web content developers how to make content accessible to people with disabilities.
  • Section 508 and the ADA (Americans with Disabilities Act) are laws that govern Web accessibility in the US.
Browser Compatibility
  • Different web browsers, created by different organizations, support different features and user-interface enhancements. The designer must devise a strategy that accounts for this variation: either ensure the website remains usable without an enhancement, or use only features supported by all browsers.
  • Browser compatibility testing verifies Web sites across various browsers and operating systems.
Readability
  • Readability is a measure of how easy a document is to read and comprehend
  • Reading level algorithms only provide a rough guide, but can give a useful indication as to whether you've pitched your content at the right level for your intended audience
  • Gunning Fog, Flesch Reading Ease, and Flesch-Kincaid are reading level algorithms that can be helpful in determining how readable your content is
Web Page Speed Analysis
  • Web page speed is based on page size, composition, and download time.
  • Page weight is a measurement of the file size of a Web page that includes the combined size of all the elements of the page, including HTML files, images, audio or video, Flash animation, etc.
  • Reducing the page weight directly improves the performance of the web site
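As a quick sketch, page weight is just the sum of everything the page pulls in. The component sizes below (in KB) are the ones reported in the case study that follows:

```python
# Sketch: compute page weight by summing the sizes of a page's resources.
# Component sizes (KB) taken from the case study's speed-analysis table.

def page_weight_kb(resources):
    """Sum resource sizes (in KB) to get the total page weight."""
    return sum(resources.values())

resources = {
    "html": 67,      # HTML files
    "css": 55,       # stylesheets
    "images": 390,   # all images combined
    "scripts": 223,  # all JavaScript combined
}

print(page_weight_kb(resources))  # 735
```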
Tools Usage

  • HTML Validator – Checks HTML markup against W3C standards; validates Web documents in HTML, XHTML, SMIL, MathML, etc.
  • Link Checker – Checks link navigation against W3C standards; looks for issues in links, anchors, and referenced objects in a Web page, or recursively across a whole Web site.
  • CSS Validator – Checks CSS against W3C standards.
  • HTML WDG – Checks HTML code against WDG standards.
  • Juicy Studio – Checks the readability of the website's content.
  • Page Speed Report – Analyzes Web page speed.
  • WAVE – Checks for accessibility errors against WCAG standards.

All these evaluators are available with Web Accessibility Tool Bar (WAT) 2.0 for IE.

WAT 2.0 is freeware provided by the Web Accessibility Tools Consortium. The tool aids in the manual examination of web pages for a variety of accessibility aspects. Its functions are to:

1. Identify components of a web page

2. Provide access to alternate views of page content

3. Facilitate the use of third-party online applications

Along with these evaluators, WAT 2.0 also features tools such as the Lynx viewer, grey scale, FAE, contrast analyzer, Vischeck color-blindness simulator, resize, etc.
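One of those helpers, the contrast analyzer, reduces to a published formula: the WCAG contrast ratio between two colors, computed from their relative luminance (WCAG asks for at least 4.5:1 for normal text). A minimal sketch:

```python
# Sketch of what a contrast analyzer computes: the WCAG contrast ratio
# between two sRGB colors, via their relative luminance.

def _linear(channel):
    """sRGB gamma expansion; channel is an integer 0..255."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb):
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    l1, l2 = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Black text on a white background gives the maximum ratio, 21:1
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0
```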

Other Tool

http://www.webshot.org – a free online tool for cross-browser testing

Analysis of the XXX Website

This case study demonstrates the effective use of free online tools to test the XXX Website.

Evaluator       Standard   Errors   Warnings
HTML            W3C        123      107
Link Checker    W3C        1        110
CSS             W3C        49       203
HTML            WDG        30       21

Not an issue: Though these errors are reported, they do not affect the core functionality.

Nice to have: Fixing them would ease display across various browsers, PDAs, pagers, etc.

* All the results reported are with respect to the XXX Website.

Accessibility analysis using WAVE

Web page              No. of errors
Home page             29
Collection page       55
Product page          46
Shopping Cart         27
Sign in / check out   24
Recipient page        36
Billing page          48
Review order page     79

Not an issue: Within a given system configuration, these errors cause no problems.

Nice to have: Fixing them ensures the web pages meet WCAG standards and remain accessible in any situation. (The errors reported are missing "alt text" for images, links, and form labels; these become critical if the page does not load fully.)
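The missing-alt-text errors WAVE reports are mechanical to detect. A few lines of Python illustrate the idea (a sketch with made-up markup, not WAVE's actual implementation):

```python
# Minimal sketch of one check WAVE performs: finding <img> tags that
# have no alt text. The sample markup is a made-up illustration.
from html.parser import HTMLParser

class MissingAltChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if not attrs.get("alt"):           # absent or empty alt text
                self.missing.append(attrs.get("src", "?"))

html = '<img src="logo.png"><img src="cart.png" alt="Shopping cart">'
checker = MissingAltChecker()
checker.feed(html)
print(checker.missing)  # ['logo.png']
```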

* Refer appendix for screen shot of WAVE Tool

Browser Compatibility

Browser        Versions
Avant          11.7
Chrome         0.2.149.30, 0.3.154.9, 0.4.154.25, 1.0.154.65
Epiphany       2.22
Firefox        0.8, 0.9, 1.0.8, 1.5, 2.0.0.4, 3.0.1, 3.1
Flock          1.2.6, 2.03
Galeon         2.0.4
Iceweasel      3.0.4
Kazehakase     0.5.2
K-Meleon       1.1.4, 1.5.0
Konqueror      4.2
Minefield      3.1, 3.2, 3.6
MSIE           5.01, 6.0, 7.0
Navigator      9.0.06
Opera          7.03, 7.11, 7.54, 8.0, 8.54, 9.25, 9.26, 9.27, 9.6, 9.62, 9.63, 9.64, 10.0
Safari         3.2.1, 4.0
SeaMonkey      1.1.14, 2.0
Shiretoko      3.1, 3.5

47 browsers showed positive results for the XXX Website; almost all the widely used browsers were able to access the site.

Evaluator              Value
Gunning Fog Index      10.89
Flesch Reading Ease    52.04
Flesch-Kincaid Grade   6.92

  • For the Gunning Fog Index and the Flesch-Kincaid Grade, the value represents the years of schooling needed to read the content. For example, a value of sixteen means sixteen years of schooling are required to read the content
  • For Flesch Reading Ease, the result is an index number that rates the text on a 100-point scale. The higher the score, the easier it is to understand the document. A score of 60–70 is the standard range, and this site stands at only 52.04 (close, but still short of the standard)
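These indexes are published formulas over word, sentence, and syllable counts; given the counts, each is one line of arithmetic. A sketch with hand-supplied counts (the sample numbers are hypothetical, not from the XXX Website):

```python
# Readability scores from raw counts. Automatic syllable counting is
# the hard part; here the counts are supplied by hand for a
# hypothetical 100-word sample.

def flesch_reading_ease(words, sentences, syllables):
    # 0-100 scale; higher is easier to read
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # US school grade level
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words, sentences, complex_words):
    # complex_words: words of three or more syllables
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

print(flesch_reading_ease(100, 5, 130))   # about 76.6: fairly easy text
print(flesch_kincaid_grade(100, 5, 130))  # about grade 7.5
print(gunning_fog(100, 5, 8))             # about 11.2
```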
Web Page Speed Analysis

Test             Value    Remarks                                                Status
Total HTML       1        The total HTML count should be 1                       Pass
Total Objects    93       The optimum number of objects is < 20                  Warning
Total Images     69       The image count should be lowered for quick loading    Warning
Total CSS        5        The CSS count should be lowered for quick loading      Caution
Total Size       735 KB   The optimum size is < 100 KB                           Warning
Total Script     18       The script count should be lowered for quick loading   Warning
HTML Size        67 KB    The optimum HTML size is < 50 KB                       Caution
Image Size       390 KB   The optimum image size is < 100 KB                     Warning
Script Size      223 KB   The optimum script size is < 20 KB                     Warning
CSS Size         55 KB    The optimum CSS size is < 55 KB                        Warning
Multimedia Size  0        The multimedia size should be < 10 KB                  Pass

Nice to have: The page size (images, CSS, HTML) can be decreased so that the pages load faster.

Note: Actual load time also depends on parameters like bandwidth, system configuration, etc.
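The pass/warning judgments above are simple threshold comparisons. A sketch of the logic (thresholds taken from the table; the report's three-level Pass/Caution/Warning scale is simplified here to Pass/Warning):

```python
# Sketch: grading page metrics against optimum thresholds, the way the
# Page Speed Report does. Pass/Caution/Warning is simplified to
# Pass/Warning for illustration.

THRESHOLDS = {           # metric: maximum optimum value
    "total_objects": 20,
    "total_size_kb": 100,
    "html_size_kb": 50,
    "image_size_kb": 100,
    "script_size_kb": 20,
}

def grade(metrics):
    """Return Pass/Warning for each metric against its threshold."""
    return {name: ("Pass" if metrics[name] <= limit else "Warning")
            for name, limit in THRESHOLDS.items()}

# Values measured for the XXX Website
report = grade({
    "total_objects": 93,
    "total_size_kb": 735,
    "html_size_kb": 67,
    "image_size_kb": 390,
    "script_size_kb": 223,
})
print(report["total_size_kb"])  # Warning
```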

Merits

The benefits of rectifying these errors are:

Clean code – Any developer can analyze the code easily.

Search engine friendly – Search engines can crawl the website with ease.

Faster page loading – A minor change can help the pages load quickly.

Standards compliance – By publishing its standards compliance, the client can win customer confidence.

Accessibility – The website becomes easily accessible to various categories of people.

Once the site is W3C standards compliant, it may display the W3C logo.


Conclusion
  • Accessibility can be improved to meet WCAG standards, opening a new path to disabled customers
  • Writing web pages in accordance with the standards shortens site development time and makes pages easier to maintain. Debugging and troubleshooting become easier, because the code follows a standard
  • Cross-browser testing can ensure that the Website can be used by the largest possible audience, with minimal variation in the user experience
  • The readability test predicts the reading ease of the document based on sentence length and polysyllabic words. The results should be used together with good writing-style guidelines to increase readability
  • Lower page load times make visitors more comfortable and enhance the Website's brand perception
Screen Shots

Web Accessibility Tool bar 2.0


WAVE report



Sep 29, 2009

Why Cache the Web?

The short answer is that caching saves money. It saves time as well, which is sometimes the same thing if you believe that "time is money." But how does caching save you money?

It does so by providing a more efficient mechanism for distributing information on the Web. Consider an example from our physical world: the distribution of books. Specifically, think about how a book gets from publisher to consumer. Publishers print the books and sell them, in large quantities, to wholesale distributors. The distributors, in turn, sell the books in smaller quantities to bookstores. Consumers visit the stores and purchase individual books. On the Internet, web caches are analogous to the bookstores and wholesale distributors.

The analogy is not perfect, of course. Books cost money; web pages (usually) don't. Books are physical objects, whereas web pages are just electronic and magnetic signals. It's difficult to copy a book, but trivial to copy electronic data.

The point is that both caches and bookstores enable efficient distribution of their respective contents. An Internet without caches is like a world without bookstores. Imagine 100,000 residents of India each buying a copy of the same comic book directly from its publisher. Now imagine 50,000 Internet users in Australia each downloading the Yahoo! home page every time they access it. It's much more efficient to transfer the page once, cache it, and then serve future requests directly from the cache.

In order for caching to be effective, the following conditions must be met:
  • Client requests must exhibit locality of reference.
  • The cost of caching must be less than the cost of direct retrieval.
We can intuitively conclude that the first requirement is true. Certain web sites are very popular. Classic examples are the starting pages for Netscape and Microsoft browsers. Others include searching and indexing sites such as Yahoo! and Altavista. Event-based sites, such as those for the Olympics, NASA's Mars Pathfinder mission, and World Cup Soccer, become extremely popular for days or weeks at a time. Finally, every individual has a few favorite pages that he or she visits on a regular basis.

It's not always obvious that the second requirement is true. We need to compare the costs of caching to the costs of not caching. Numerous factors enter into the analysis, some of which are easier to measure than others. To calculate the cost of caching, we can add up the costs for hardware, software, and staff time to administer the system. We also need to consider the time users save waiting for pages to load (latency) and the cost of Internet bandwidth.
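A back-of-the-envelope version of that comparison looks like this (every number below is hypothetical, chosen only to show the shape of the calculation):

```python
# Hypothetical cost comparison: bandwidth saved by a cache vs. the
# cache's own monthly cost. All figures are made-up illustrations.
users = 500
traffic_gb_per_user = 2.0       # monthly traffic per user, in GB
hit_ratio = 0.35                # fraction of traffic served from cache
cost_per_gb = 0.10              # bandwidth price, dollars per GB
cache_monthly_cost = 25.0       # amortized hardware, software, admin

bandwidth_saved = users * traffic_gb_per_user * hit_ratio * cost_per_gb
print(bandwidth_saved > cache_monthly_cost)  # True: caching pays off here
```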

Three primary benefits of caching web content:
  • To make web pages load faster (reduce latency)
  • To reduce wide area bandwidth usage
  • To reduce the load placed on origin servers
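Locality of reference can be made concrete with a toy simulation: replay a request stream against a small LRU cache and count the hits. The request stream below is invented for illustration:

```python
# Toy cache simulation: repeated requests for popular pages become
# hits, saving origin-server fetches. The request stream is made up.
from collections import OrderedDict

def simulate(requests, capacity):
    cache, hits = OrderedDict(), 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)         # LRU bookkeeping
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(requests)

requests = ["/home", "/news", "/home", "/home", "/sports", "/news", "/home"]
print(simulate(requests, capacity=3))  # 4 of 7 requests are hits
```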
Types of Web Caches


Web content can be cached at a number of different locations along the path between a client and an origin server. First, many browsers and other user agents have built-in caches. For simplicity, I'll call these browser caches. Next, a caching proxy (a.k.a. "proxy cache") aggregates all of the requests from a group of clients. Lastly, a surrogate can be located in front of an origin server to cache popular responses.


Browser Caches



Browsers and other user agents benefit from having a built-in cache. When you press the Back button on your browser, it reads the previous page from its cache. Nongraphical agents, such as web crawlers, cache objects as temporary files on disk rather than keeping them in memory.

Netscape Navigator lets you control exactly how much memory and disk space to use for caching, and it also allows you to flush the cache. Microsoft Internet Explorer lets you control the size of your local disk cache, but in a less flexible way. Both have controls for how often cached responses should be validated. People generally use 10–100MB of disk space for their browser cache.

A browser cache is limited to just one user, or at least one user agent. Thus, it gets hits only when the user revisits a page. As we'll see later, browser caches can store "private" responses, but shared caches cannot.

Caching Proxies

Caching proxies, unlike browser caches, service many different users at once. Since many different users visit the same popular web sites, caching proxies usually have higher hit ratios than browser caches. As the number of users increases, so does the hit ratio.

Caching proxies are essential services for many organizations, including ISPs, corporations, and schools. They usually run on dedicated hardware, which may be an appliance or a general-purpose server, such as a Unix or Windows NT system. Many organizations use inexpensive PC hardware that costs less than $1,000 (about Rs. 45,000). At the other end of the spectrum, some organizations pay hundreds of thousands of dollars, or more, for high-performance solutions from one of the many caching vendors.

Caching proxies are normally located near network gateways (i.e., routers) on the organization's side of its Internet connection. In other words, a cache should be located to maximize the number of clients that can use it, but it should not be on the far side of a slow, congested network link.

Caching Proxy Features

The key feature of a caching proxy is its ability to store responses for later use. This is what saves you time and bandwidth. Caching proxies actually tend to have a wide range of additional features that many organizations find valuable. Most of these are things you can do only with a proxy but which have relatively little to do with caching. For example, if you want to authenticate your users, but don't care about caching, you might use a caching proxy product anyway.



Authentication
A proxy can require users to authenticate themselves before it serves any requests. This is particularly useful for firewall proxies. When each user has a unique username and password, only authorized individuals can surf the Web from inside your network. Furthermore, it provides a higher quality audit trail in the event of problems.
Request filtering
Caching proxies are often used to filter requests from users. Corporations usually have policies that prohibit employees from viewing pornography at work. To help enforce the policy, the corporate proxy can be configured to deny requests to known pornographic sites. Request filtering is somewhat controversial. Some people equate it with censorship and correctly point out that filtering schemes are not perfect.
Response filtering
In addition to filtering requests, proxies can also filter responses. This usually involves checking the contents of an object as it is being downloaded. A filter that checks for software viruses is a good example. Some organizations use proxies to filter out Java and JavaScript code, even when it is embedded in an HTML file. I've also heard about software that attempts to prevent access to pornography by searching images for a high percentage of flesh-tone pixels.

For example, see http://www.heartsoft.com, http://www.eye-t.com, and http://www.thebair.com.

Prefetching
Prefetching is the process of retrieving some data before it is actually requested. Disk and memory systems typically use prefetching, also known as "read ahead." For the Web, prefetching usually involves requesting images and hyperlinked pages referenced in an HTML file. Prefetching represents a tradeoff between latency and bandwidth. A caching proxy selects objects to prefetch, assuming that a client will request them. Correct predictions result in a latency reduction; incorrect predictions, however, result in wasted bandwidth. So the interesting question is, how accurate are prefetching predictions? Unfortunately, good measurements are hard to come by. Companies with caching products that use prefetching are secretive about their algorithms.
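The discovery step can be sketched in a few lines; which candidates actually get fetched is the part vendors keep secret. (The markup below is a made-up example.)

```python
# Sketch: the first step of prefetching is discovering the objects an
# HTML page references. A proxy would then fetch some of them ahead
# of the client's requests.
from html.parser import HTMLParser

class PrefetchCandidates(HTMLParser):
    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.candidates.append(attrs["src"])   # embedded images
        elif tag == "a" and "href" in attrs:
            self.candidates.append(attrs["href"])  # hyperlinked pages

page = '<img src="/banner.gif"><a href="/story.html">Read more</a>'
parser = PrefetchCandidates()
parser.feed(page)
print(parser.candidates)  # ['/banner.gif', '/story.html']
```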
Translation and transcoding
Translation and transcoding both refer to processes that change the content of something without significantly changing its meaning or appearance. For example, you can imagine an application that translates text pages from English to German as they are downloaded. Transcoding usually refers to low-level changes in digital data rather than high-level human languages. Changing an image file format from GIF to JPEG is a good example. Since the JPEG format results in a smaller file than GIF, they can be transferred faster. Applying general-purpose compression is another way to reduce transfer times. A pair of cooperating proxies can compress all transfers between them and uncompress the data before it reaches the clients.
Traffic shaping
A significant number of organizations use application layer proxies to control bandwidth utilization. In some sense, this functionality really belongs at the network layer, where it's possible to control the flow of individual packets. However, the application layer provides extra information that network administrators find useful. For example, the level of service for a particular request can be based on the user's identification, the agent making the request, or the type of data being requested (e.g., HTML, Postscript, MP3).
Meshes, Clusters, and Hierarchies

There are a number of situations where it's beneficial for caching proxies to talk to each other. There are different names for some different configurations. A cluster is a tightly coupled collection of caches, usually designed to appear as a single service. That is, even if there are seven systems in a cluster, to the outside world it looks like just one system. The members of a cluster are normally located together, both physically and topologically.

A loosely coupled collection of caches is called a hierarchy or mesh. If the arrangement is tree-like, with a clear distinction between upper- and lower-layer nodes, it is called a hierarchy. If the topology is flat or ill-defined, it is called a mesh. A hierarchy of caches makes sense because the Internet itself is hierarchical. However, when a mesh or hierarchy spans multiple organizations, a number of issues arise.

List of caching products

Squid
http://www.squid-cache.org
Squid is an open source software package that runs on a wide range of Unix platforms. There has also been some recent success in porting Squid to Windows NT. As with most free software, users receive technical support from a public mailing list. Squid was originally derived from the Harvest project in 1996.

Netscape Proxy Server
http://home.netscape.com/proxy/v3.5/index.html
The Netscape Proxy Server was the first caching proxy product available. The lead developer, Ari Luotonen, also worked extensively on the CERN HTTP server during the Web's formative years in 1993 and 1994. Netscape's Proxy runs on a handful of Unix systems, as well as Windows NT.

Microsoft Internet Security and Acceleration Server
http://www.microsoft.com/isaserver/
Microsoft currently has two caching proxy products available. The older Proxy Server runs on Windows NT, while the newer ISA product requires Windows 2000.

Volera
http://www.volera.com
Volera is a recent spin-off of Novell. The product formerly known as Internet Caching System (ICS) is now called Excelerator. Volera does not sell this product directly. Rather, it is bundled on hardware appliances available from a number of OEM partners.

Network Appliance Netcache
http://www.netapp.com/products/netcache/
Network Appliance was the second company to sell a caching proxy, and the first to sell an appliance. The Netcache products also have roots in the Harvest project.

Inktomi Traffic Server
http://www.inktomi.com/products/network/traffic/
Inktomi boasts some of the largest customer installations, such as America Online and Exodus. Their Traffic Server product has been available since 1997.

CacheFlow
http://www.cacheflow.com
Intelligent prefetching and refreshing features distinguish CacheFlow from their competitors.

InfoLibria
http://www.infolibria.com
InfoLibria's products are designed for high reliability and fault tolerance.

Cisco Cache Engine
http://www.cisco.com/go/cache/
The Cisco 500 series Cache Engine is a small, low-profile system designed to work with their Web Cache Control Protocol (WCCP). As your demand for capacity increases, you can easily add more units.

Lucent imminet WebCache
http://www.lucent.com/serviceprovider/imminet/
Lucent's products offer carrier-grade reliability and active refresh features.

iMimic DataReactor
http://www.imimic.com
iMimic is a relative newcomer to this market. However, their DataReactor product is already licensed to a number of OEM partners. iMimic also sells their product directly.

Why Cache the Web?

The short answer is that caching saves money. It saves time as well, which is sometimes the same thing if you believe that "time is money." But how does caching save you money?

It does so by providing a more efficient mechanism for distributing information on the Web. Consider an example from our physical world: the distribution of books. Specifically, think about how a book gets from publisher to consumer. Publishers print the books and sell them, in large quantities, to wholesale distributors. The distributors, in turn, sell the books in smaller quantities to bookstores. Consumers visit the stores and purchase individual books. On the Internet, web caches are analogous to the bookstores and wholesale distributors.

The analogy is not perfect, of course. Books cost money; web pages (usually) don't. Books are physical objects, whereas web pages are just electronic and magnetic signals. It's difficult to copy a book, but trivial to copy electronic data.

The point is that both caches and bookstores enable efficient distribution of their respective contents. An Internet without caches is like a world without bookstores. Imagine 100,000 residents of India each buying one copy of Comics Book from the publisher in India. Now imagine 50,000 Internet users in Australia each downloading the Yahoo! home page every time they access it. It's much more efficient to transfer the page once, cache it, and then serve future requests directly from the cache.

In order for caching to be effective, the following conditions must be met:
  • Client requests must exhibit locality of reference.
  • The cost of caching must be less than the cost of direct retrieval.
We can intuitively conclude that the first requirement is true. Certain web sites are very popular. Classic examples are the starting pages for Netscape and Microsoft browsers. Others include searching and indexing sites such as Yahoo! and Altavista. Event-based sites, such as those for the Olympics, NASA's Mars Pathfinder mission, and World Cup Soccer, become extremely popular for days or weeks at a time. Finally, every individual has a few favorite pages that he or she visits on a regular basis.

It's not always obvious that the second requirement is true. We need to compare the costs of caching to the costs of not caching. Numerous factors enter into the analysis, some of which are easier to measure than others. To calculate the cost of caching, we can add up the costs for hardware, software, and staff time to administer the system. We also need to consider the time users save waiting for pages to load (latency) and the cost of Internet bandwidth.

Three primary benefits of caching web content:
  • To make web pages load faster (reduce latency)
  • To reduce wide area bandwidth usage
  • To reduce the load placed on origin servers
Types of Web Caches


Web content can be cached at a number of different locations along the path between a client and an origin server. First, many browsers and other user agents have built-in caches. For simplicity, I'll call these browser caches. Next, a caching proxy (a.k.a. "proxy cache") aggregates all of the requests from a group of clients. Lastly, a surrogate can be located in front of an origin server to cache popular responses.


Browser Caches

Browsers and other user agents benefit from having a built-in cache. When you press the Back button on your browser, it reads the previous page from its cache. Nongraphical agents, such as web crawlers, cache objects as temporary files on disk rather than keeping them in memory.

Netscape Navigator lets you control exactly how much memory and disk space to use for caching, and it also allows you to flush the cache. Microsoft Internet Explorer lets you control the size of your local disk cache, but in a less flexible way. Both have controls for how often cached responses should be validated. People generally use 10–100MB of disk space for their browser cache.

A browser cache is limited to just one user, or at least one user agent. Thus, it gets hits only when the user revisits a page. As we'll see later, browser caches can store "private" responses, but shared caches cannot.
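The private/shared distinction comes from the HTTP Cache-Control header, which a shared cache must check before storing a response. A minimal sketch (the header parsing here is deliberately simplified and ignores most other HTTP caching rules):

```python
def storable_by_shared_cache(cache_control: str) -> bool:
    """Return True if a shared cache (a proxy) may store this response.

    A browser cache may keep 'private' responses; a shared cache may not.
    Only two directives are checked here -- real HTTP has many more rules.
    """
    directives = {part.strip().split("=")[0].lower()
                  for part in cache_control.split(",") if part.strip()}
    return "private" not in directives and "no-store" not in directives

print(storable_by_shared_cache("max-age=3600"))           # True
print(storable_by_shared_cache("private, max-age=3600"))  # False
```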

Caching Proxies

Caching proxies, unlike browser caches, service many different users at once. Since many different users visit the same popular web sites, caching proxies usually have higher hit ratios than browser caches. As the number of users increases, so does the hit ratio.

Caching proxies are essential services for many organizations, including ISPs, corporations, and schools. They usually run on dedicated hardware, which may be an appliance or a general-purpose server, such as a Unix or Windows NT system. Many organizations use inexpensive PC hardware that costs less than $1,000 (about Rs. 45,000). At the other end of the spectrum, some organizations pay hundreds of thousands of dollars, or more, for high-performance solutions from one of the many caching vendors.

Caching proxies are normally located near network gateways (i.e., routers) on the organization's side of its Internet connection. In other words, a cache should be located to maximize the number of clients that can use it, but it should not be on the far side of a slow, congested network link.

Caching Proxy Features

The key feature of a caching proxy is its ability to store responses for later use. This is what saves you time and bandwidth. Caching proxies also tend to have a wide range of additional features that many organizations find valuable. Most of these are things you can do only with a proxy but which have relatively little to do with caching. For example, if you want to authenticate your users but don't care about caching, you might use a caching proxy product anyway.

Authentication
A proxy can require users to authenticate themselves before it serves any requests. This is particularly useful for firewall proxies. When each user has a unique username and password, only authorized individuals can surf the Web from inside your network. Furthermore, it provides a higher quality audit trail in the event of problems.
Request filtering
Caching proxies are often used to filter requests from users. Corporations usually have policies that prohibit employees from viewing pornography at work. To help enforce the policy, the corporate proxy can be configured to deny requests to known pornographic sites. Request filtering is somewhat controversial. Some people equate it with censorship and correctly point out that filtering schemes are not perfect.
Response filtering
In addition to filtering requests, proxies can also filter responses. This usually involves checking the contents of an object as it is being downloaded. A filter that checks for software viruses is a good example. Some organizations use proxies to filter out Java and JavaScript code, even when it is embedded in an HTML file. I've also heard about software that attempts to prevent access to pornography by searching images for a high percentage of flesh-tone pixels.

For example, see http://www.heartsoft.com, http://www.eye-t.com, and http://www.thebair.com.

Prefetching
Prefetching is the process of retrieving some data before it is actually requested. Disk and memory systems typically use prefetching, also known as "read ahead." For the Web, prefetching usually involves requesting images and hyperlinked pages referenced in an HTML file. Prefetching represents a tradeoff between latency and bandwidth. A caching proxy selects objects to prefetch, assuming that a client will request them. Correct predictions result in a latency reduction; incorrect predictions, however, result in wasted bandwidth. So the interesting question is, how accurate are prefetching predictions? Unfortunately, good measurements are hard to come by. Companies with caching products that use prefetching are secretive about their algorithms.
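The first half of the problem, finding candidates to prefetch, is straightforward: parse each HTML response for embedded images and hyperlinks. A sketch using Python's standard-library parser (the prediction policy itself is the hard, proprietary part the vendors keep secret):

```python
from html.parser import HTMLParser

class PrefetchCandidates(HTMLParser):
    """Collect URLs a prefetching proxy might fetch ahead of the client."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:    # inline images: near-certain
            self.urls.append(attrs["src"])
        elif tag == "a" and "href" in attrs:   # hyperlinks: speculative
            self.urls.append(attrs["href"])

page = '<html><img src="/logo.gif"><a href="/news.html">News</a></html>'
parser = PrefetchCandidates()
parser.feed(page)
print(parser.urls)  # ['/logo.gif', '/news.html']
```

Inline images are almost always requested next, so fetching them early is usually a win; following hyperlinks is where wrong guesses waste bandwidth.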
Translation and transcoding
Translation and transcoding both refer to processes that change the content of something without significantly changing its meaning or appearance. For example, you can imagine an application that translates text pages from English to German as they are downloaded. Transcoding usually refers to low-level changes in digital data rather than high-level human languages. Changing an image file format from GIF to JPEG is a good example: because the JPEG format typically produces a smaller file than GIF, the image can be transferred faster. Applying general-purpose compression is another way to reduce transfer times. A pair of cooperating proxies can compress all transfers between them and uncompress the data before it reaches the clients.
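The compression half of this is standard-library territory. A sketch of the cooperating-proxy idea (the function names are mine, purely for illustration): the upstream proxy compresses the body before it crosses the slow link, and the downstream proxy restores it before the client sees it.

```python
import gzip

def upstream_proxy_send(body: bytes) -> bytes:
    """Compress the response before it crosses the inter-proxy link."""
    return gzip.compress(body)

def downstream_proxy_receive(wire_bytes: bytes) -> bytes:
    """Restore the original bytes before handing them to the client."""
    return gzip.decompress(wire_bytes)

# HTML is repetitive, so it compresses well.
page = b"<html>" + b"the same markup repeated " * 100 + b"</html>"
wire = upstream_proxy_send(page)
assert downstream_proxy_receive(wire) == page  # client sees identical bytes
print(f"{len(page)} bytes on the LAN, {len(wire)} bytes on the wire")
```

The clients never know the transfer was compressed, which is exactly why this belongs in the proxies rather than the browsers.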
Traffic shaping
A significant number of organizations use application layer proxies to control bandwidth utilization. In some sense, this functionality really belongs at the network layer, where it's possible to control the flow of individual packets. However, the application layer provides extra information that network administrators find useful. For example, the level of service for a particular request can be based on the user's identification, the agent making the request, or the type of data being requested (e.g., HTML, Postscript, MP3).
Meshes, Clusters, and Hierarchies

There are a number of situations where it's beneficial for caching proxies to talk to each other, and the different configurations go by different names. A cluster is a tightly coupled collection of caches, usually designed to appear as a single service: even if there are seven systems in a cluster, to the outside world it looks like just one. The members of a cluster are normally located together, both physically and topologically.

A loosely coupled collection of caches is called a hierarchy or mesh. If the arrangement is tree-like, with a clear distinction between upper- and lower-layer nodes, it is called a hierarchy. If the topology is flat or ill-defined, it is called a mesh. A hierarchy of caches makes sense because the Internet itself is hierarchical. However, when a mesh or hierarchy spans multiple organizations, a number of issues arise.
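The lookup logic in a hierarchy is just a chain: check locally, then ask the parent, and only contact the origin server at the top. A toy sketch (the class and method names are mine, not taken from any product or protocol):

```python
class Cache:
    """A node in a cache hierarchy: try local storage, then the parent."""

    def __init__(self, name, parent=None):
        self.name, self.parent, self.store = name, parent, {}

    def get(self, url, origin_fetch):
        if url in self.store:
            return self.store[url], self.name          # hit at this node
        if self.parent is not None:
            body, where = self.parent.get(url, origin_fetch)
        else:
            body, where = origin_fetch(url), "origin"  # top of the hierarchy
        self.store[url] = body                         # cache on the way down
        return body, where

national = Cache("national")
campus = Cache("campus", parent=national)

fetches = []
def origin_fetch(url):
    fetches.append(url)
    return b"page body"

campus.get("http://example.com/", origin_fetch)  # miss everywhere: one origin fetch
campus.get("http://example.com/", origin_fetch)  # hit in the campus cache
print(len(fetches))  # 1 -- the origin server was contacted only once
```

Every cache along the path keeps a copy, so later requests from other campuses under the same national cache never reach the origin server either.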

List of caching products

Squid
http://www.squid-cache.org
Squid is an open source software package that runs on a wide range of Unix platforms. There has also been some recent success in porting Squid to Windows NT. As with most free software, users receive technical support from a public mailing list. Squid was originally derived from the Harvest project in 1996.

Netscape Proxy Server
http://home.netscape.com/proxy/v3.5/index.html
The Netscape Proxy Server was the first caching proxy product available. The lead developer, Ari Luotonen, also worked extensively on the CERN HTTP server during the Web's formative years in 1993 and 1994. Netscape's Proxy runs on a handful of Unix systems, as well as Windows NT.

Microsoft Internet Security and Acceleration Server
http://www.microsoft.com/isaserver/
Microsoft currently has two caching proxy products available. The older Proxy Server runs on Windows NT, while the newer ISA product requires Windows 2000.

Volera
http://www.volera.com
Volera is a recent spin-off of Novell. The product formerly known as Internet Caching System (ICS) is now called Excelerator. Volera does not sell this product directly. Rather, it is bundled on hardware appliances available from a number of OEM partners.

Network Appliance Netcache
http://www.netapp.com/products/netcache/
Network Appliance was the second company to sell a caching proxy, and the first to sell an appliance. The Netcache products also have roots in the Harvest project.

Inktomi Traffic Server
http://www.inktomi.com/products/network/traffic/
Inktomi boasts some of the largest customer installations, such as America Online and Exodus. Their Traffic Server product has been available since 1997.

CacheFlow
http://www.cacheflow.com
Intelligent prefetching and refreshing features distinguish CacheFlow from their competitors.

InfoLibria
http://www.infolibria.com
InfoLibria's products are designed for high reliability and fault tolerance.

Cisco Cache Engine
http://www.cisco.com/go/cache/
The Cisco 500 series Cache Engine is a small, low-profile system designed to work with their Web Cache Control Protocol (WCCP). As your demand for capacity increases, you can easily add more units.

Lucent imminet WebCache
http://www.lucent.com/serviceprovider/imminet/
Lucent's products offer carrier-grade reliability and active refresh features.

iMimic DataReactor
http://www.imimic.com
iMimic is a relative newcomer to this market. However, their DataReactor product is already licensed to a number of OEM partners. iMimic also sells their product directly.


Copyright © Vinay's Blog | Powered by Blogger
