Performance Research, Part 2: Browser Cache Usage – Exposed!

By Tenni Theurer, January 4th, 2007

This is the second in a series of articles describing experiments conducted to learn more about optimizing web page performance. You may be wondering why you’re reading a performance article on the YUI Blog. It turns out that most of web page performance is affected by front-end engineering, that is, the user interface design and development.

In an earlier post, I described What the 80/20 Rule Tells Us about Reducing HTTP Requests. Since browsers spend 80% of the time fetching external components including scripts, stylesheets and images, reducing the number of HTTP requests has the biggest impact on reducing response time. But shouldn’t everything be saved in the browser’s cache anyway?

Why does cache matter?

It’s important to differentiate between end user experiences for an empty versus a full cache page view. An “empty cache” means the browser bypasses the disk cache and has to request all the components to load the page. A “full cache” means all (or at least most) of the components are found in the disk cache and the corresponding HTTP requests are avoided.

The main reason for an empty cache page view is that the user is visiting the page for the first time, so the browser has to download all the components to load the page. Other reasons include:

  • The user visited the page previously but cleared the browser cache.
  • The browser cache was automatically cleared, based on the browser’s settings.
  • The user reloaded the page in a way that caused the cache to be bypassed. For example, the browser will bypass the cache if you hold down the control-shift key while clicking the Refresh button in Internet Explorer.

Strategies such as combining scripts, stylesheets, or images reduce the number of HTTP requests for both an empty and a full cache page view. Configuring components to have an Expires header with a date in the future reduces the number of HTTP requests for only the full cache page view.
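To make the Expires idea concrete, here is a minimal Python sketch of what a far-future Expires header line looks like. The function name `expires_header` and the one-year lifetime are assumptions chosen for illustration; in practice the header is normally set in the web server configuration rather than generated by hand.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now: datetime, days_ahead: int) -> str:
    """Build an Expires header line dated `days_ahead` days after `now`.

    `now` should be timezone-aware; it is converted to UTC before formatting.
    """
    expiry = now.astimezone(timezone.utc) + timedelta(days=days_ahead)
    # usegmt=True renders the HTTP-style "GMT" suffix instead of "+0000".
    return "Expires: " + format_datetime(expiry, usegmt=True)


# Hypothetical example: a component served on the day of this post,
# marked as fresh for one year (an assumed lifetime, not a recommendation).
served_at = datetime(2007, 1, 4, 12, 0, 0, tzinfo=timezone.utc)
print(expires_header(served_at, days_ahead=365))
```

A browser that has this component cached can reuse it without any HTTP request until the date passes, which is why the header only helps the full cache page view.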

Previously, we observed where the time is spent when a user requests http://www.yahoo.com with an empty cache. When a user loads the page, the browser downloads approximately 30 components (see Figure 1). Figure 2 is a graphical view of where the time is spent loading http://www.yahoo.com with a full cache. Each bar represents a specific component requested by the browser. Since components are already in the cache on a full cache page view, and the Expires header has a date in the future, the browser only has to download three components, including the HTML document.

Figure 1. Loading http://www.yahoo.com with an empty cache


Figure 2. Loading http://www.yahoo.com with a full cache


Table 1 shows a summary of the total size and number of requests for each type of component to load http://www.yahoo.com. How much does a full cache benefit the user? Loading the page over my cable modem at home, it took 2.4 seconds with an empty cache and only 0.9 seconds with a full cache. The full cache page view had 90% fewer HTTP requests and 83% fewer bytes to download than the empty cache page view.
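The percentages quoted above follow from simple before/after arithmetic. The Python sketch below reproduces the request and load-time reductions from the rounded figures given in the text (roughly 30 components versus 3, and 2.4 seconds versus 0.9 seconds). The helper name `percent_reduction` is an assumption, and the byte reduction is not recomputed because the sizes in Table 1 are not reproduced here.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Rounded figures taken from the text above.
requests_saved = percent_reduction(30, 3)   # about 30 components vs. 3
time_saved = percent_reduction(2.4, 0.9)    # seconds over the cable modem

print(f"{requests_saved:.0f}% fewer requests, {time_saved:.1f}% less load time")
```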

Table 1. Empty and Full Cache Summary to load http://www.yahoo.com


* Times were measured over cable modem (~2.5 Mbps).

How many users view Yahoo! pages with a full cache?

The performance team at Yahoo! ran an experiment to determine the percentage of users and page views with an empty cache on some of Yahoo!’s most popular pages. We defined the experiment to measure users’ cache behavior related to a new component (an image). For this new image we measured the following statistics each day:

  1. What percentage of users requested the new image?
  2. What percentage of page views requested the new image?

The new image was configured with the following HTTP headers:

Expires: Thu, 15 Apr 2004 20:00:00 GMT
Last-Modified: Wed, 28 Sep 2006 23:49:57 GMT

When the browser saves a component in its cache, it also saves the Expires and Last-Modified values. Specifying an Expires date in the past forces the browser to request the image every time the page is viewed (with a few exceptions, such as when users click the browser’s “back” button to return to a page). If the image is already in the browser’s cache and is being re-requested, the browser passes the saved Last-Modified date in the If-Modified-Since request header. This is called a conditional GET request; if the image has not been modified, the server returns a 304 Not Modified response. The requests from browsers, therefore, result in one of the following response status codes:

  • 200 — The browser does not have the image in its cache.
  • 304 — The browser has the image in its cache, but needs to verify the last modified date.
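The decision between these two status codes can be sketched in a few lines of Python. This is a simplified illustration of a conditional GET check, not the logic of any particular web server: the function name `response_status` is an assumption, and real servers handle more cases (ETags, malformed dates, and so on). The browser sends its saved Last-Modified date back in the If-Modified-Since request header.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def response_status(last_modified: str, if_modified_since: Optional[str]) -> int:
    """Return 200 or 304 for a request, given HTTP-date strings.

    `last_modified` is the server's date for the image; `if_modified_since`
    is the date the browser sent, or None for an unconditional GET.
    """
    if if_modified_since is None:
        return 200  # no cached copy, so the full image must be sent
    resource_time = parsedate_to_datetime(last_modified)
    cached_time = parsedate_to_datetime(if_modified_since)
    # Unchanged since the browser's copy was saved: no body needs to be sent.
    return 304 if resource_time <= cached_time else 200


# Last-Modified value copied from the headers shown earlier in the post.
image_date = "Wed, 28 Sep 2006 23:49:57 GMT"

print(response_status(image_date, None))        # first visit
print(response_status(image_date, image_date))  # repeat visit, image cached
```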

Since the status codes are recorded in the Apache access logs, we are able to determine the empty and full cache measurements by analyzing the logs.
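As a rough idea of what that log analysis involves, the Python sketch below pulls the status code for one image out of a log line. It assumes the logs use the Common Log Format, uses the remote host field as a stand-in for "user" (the post does not say how users were identified), and uses a made-up image path and sample line; none of these details come from the actual experiment.

```python
import re
from typing import Optional, Tuple

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+'
)


def parse_status(line: str, image_path: str) -> Optional[Tuple[str, int]]:
    """Return (host, status) if the line is a request for image_path, else None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    parts = match.group("request").split()
    # A request line looks like: GET /path HTTP/1.1
    if len(parts) < 2 or parts[1] != image_path:
        return None
    return match.group("host"), int(match.group("status"))


# Hypothetical log line and image path, for illustration only.
sample = '192.0.2.10 - - [05/Oct/2006:10:00:00 +0000] "GET /img/probe.gif HTTP/1.1" 304 -'
print(parse_status(sample, "/img/probe.gif"))
```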

The percentage of users with an empty cache is:

       # of unique users with at least one 200 response
       -------------------------------------------------
                    total # of unique users

The percentage of page views with an empty cache is:

                      # of 200 responses
           ----------------------------------------
           # of 200 responses + # of 304 responses
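The two formulas above translate directly into code. Here is a minimal Python sketch that computes both percentages from a day's worth of (user, status) pairs. The function name `empty_cache_rates`, the input shape, and the sample data are assumptions made for illustration; they are not the tooling or the numbers from the experiment.

```python
from typing import Iterable, Tuple


def empty_cache_rates(records: Iterable[Tuple[str, int]]) -> Tuple[float, float]:
    """Return (percent of users, percent of page views) with an empty cache.

    Each record is (user_id, status) for one request of the test image.
    Only 200 and 304 responses are counted; anything else is ignored.
    """
    all_users = set()
    users_with_200 = set()
    count_200 = 0
    count_304 = 0
    for user, status in records:
        if status == 200:
            count_200 += 1
            users_with_200.add(user)
        elif status == 304:
            count_304 += 1
        else:
            continue
        all_users.add(user)

    page_views = count_200 + count_304
    if page_views == 0:
        raise ValueError("no 200 or 304 responses to analyze")

    user_rate = 100.0 * len(users_with_200) / len(all_users)
    view_rate = 100.0 * count_200 / page_views
    return user_rate, view_rate


# Made-up sample: user "a" arrives with an empty cache and then returns twice;
# user "b" already has the image cached on both visits.
day = [("a", 200), ("a", 304), ("a", 304), ("b", 304), ("b", 304)]
print(empty_cache_rates(day))
```

Note that a user counts toward the empty cache percentage after a single 200 response, no matter how many 304 responses follow that day, which is why the user percentage can sit well above the page view percentage.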

Figure 3 shows the percentage of users and page views with an empty cache plotted over each day of the experiment. On the first day of the experiment, no one had the image cached, so the empty cache percentage was 100%. As the days passed, more users had the image cached, so the percentages dropped until they reached a steady state.

Figure 3. Percentage of Users and Page Views with an Empty Cache


Surprising Results

40-60% of Yahoo!’s users have an empty cache experience and ~20% of all page views are done with an empty cache. To my knowledge, there’s no other research that shows this kind of information. And I don’t know about you, but these results came to us as a big surprise. They show that even if your assets are optimized for maximum caching, a significant number of users will always have an empty cache. This goes back to the earlier point that reducing the number of HTTP requests has the biggest impact on reducing response time. The percentage of users with an empty cache may vary from one web page to another, especially for pages with a high number of active (daily) users. However, we found in our study that regardless of usage patterns, the percentage of page views with an empty cache is always ~20%.

Conclusion: Keep in mind the empty cache user experience. It might be more prevalent than you think!

86 Comments

  1. [...] Browser Cache Usage – Exposed! [...]

  2. nice post.. btw: did you try YSlow against the blog site ? eheh – Performance Grade: D (68)

  3. [...] an empty cache experience and about 20% of all page views are done with an empty cache (see this article for more information on browser cache usage) This fact outlines the importance of keeping web pages [...]

  4. [...] for improving performance for first time visitors. As described in Tenni Theurer’s blog Browser Cache Usage – Exposed!, 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these [...]

  5. [...] If the engineers at Yahoo are to be believed, 40% to 60% of Yahoo users have an empty cache experience, and 20% of all page views are done with no cache at all (more on this topic in this blog entry). [...]

  6. 40-60% users with an empty cache seems pretty high. I could understand that for Firefox users as they are considered more tech savvy to know how to delete private data. but I doubt that for the majority of IE users and normally they stand for the bigger piece of cake in the browser game…

  7. [...] Performance Research, Part 2: Browser Cache Usage – Exposed! » Yahoo! User Interface Blog [...]

  8. The best possible way to achieve cacheability of an object is to perform a server-side re-write of all linked content (images, scripts etc) and re-write the links to refer to a file name based off the MD5 hash of the file content.

    So, your link:

    http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif

    is re-written:

    http://us.i1.yimg.com/never-expire/A31D5F12.gif

    Where A31D5F12 is the MD5 hash.

    The never-expire directory is configured via the Apache .htaccess file (with mod_expires) so that all content contained expires a long time in the future.

    Since the hash generates a globally unique value from a given file (ahem), the content is cacheable forever. Any referenced media can be treated in this way.

    Every time any linked media is changed, the page must be re-written and the appropriate new MD5 based file names substituted.

  9. [...] into the service with an empty browser cache (a figure derived as the result of a great deal of research on the topic by Yahoo), one can derive the following [...]

  10. Fernando Beck said:
    December 22, 2007 at 6:58 am

    I’ve overheard quite a few times that using 304 as a response to Yahoo/Google crawlers may affect the way my website is ranked afterwards.

    Is there any piece of truth behind that affirmation?

  11. [...] 20 percent of all views of that page are occurring with an empty browser cache, according to studies conducted at Yahoo (the logo image not being in a visitor’s browser cache, either because the have not visited [...]

  12. [...] some might argue that testing the speed of un-cached pages would be unfair, however according to Yahoo’s research on caching, approximately 50% of users will never have the opportunity to have the page contents be cached. [...]

  13. [...] describing experiments conducted to learn more about optimizing web page performance (Part 1, Part 2, Part 3, Part 4). You may be wondering why you’re reading a performance article on the YUI [...]

  14. [...] of an operating system, these others have to be downloaded (if they have not been cached, which happens 50% of the time) by every user who views the site. This poses a performance problem, since [...]

  15. [...] who are visiting you for the first time. As stated in Tenni Theurer’s blog: “40-60% of visitors arrive at the site with an empty cache.” [...]

  16. [...] 80% of those page views are done with a primed cache (based on Yahoo!’s browser cache statistics). We’re down to 80M page [...]

  17. [...] 80% of those page views are done with a primed cache (based on Yahoo!’s browser cache statistics). We’re down to 80M page [...]

  18. Kevin Pearcey said:
    March 18, 2008 at 9:11 am

    Have you done any similar research to sites using https/ssl? It seems that the browsers cache usage is very different once the content is delivered with ssl in some cases never appearing to cache any content, even small images – though using css does seem to help.

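  19. [...] To achieve certain visual effects, the UI designers of many websites use a large number of images in page modules that users need to visit frequently. Research shows that in such cases, for sites with relatively high user stickiness, setting an Expires header for this kind of object on the web server is well worth doing: a great deal of bandwidth is saved this way, and so is cost. By the way, for things like CAPTCHAs, add a simple rule to filter them out. [...]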

  20. [...] for improving performance for first time visitors. As described in Tenni Theurer’s blog Browser Cache Usage – Exposed!, 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these [...]

  21. [...] for improving performance for first time visitors. As described in Tenni Theurer’s blog post Browser Cache Usage – Exposed!, 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these [...]

  22. Hi,

    I’ve a lot of confusion about Header Expires. I know it enables the users to load the pages fast if the components like CSS, images, scripts etc… in cache. But the thing I want to know is What is the syntax to use this, and where to use? I mean where to use this Header Expires in the html file or in CSS file?

    Your help is valuable to me and greatly appreciated.

    Regards,

    SreeRam.

  23. Tenni Theurer said:
    August 19, 2008 at 9:33 am

    @SreeRam: The Expires header can be set in your web server configuration. For example, Apache uses optional modules to include headers, including both Expires and Cache-Control. Use the ExpiresDefault directive to set an expiration date relative to the current date. For more information on this rule, take a look here.

  24. I am managing a website that has hundreds of thousands of new unique visitors daily. Which means that they have empty cache, how to cope up with this problem in a high javascript/css/AJAX based website?

  25. for the best result, how long must we set expires header?

  26. IE’s cache has been broken since the start, it will not check for new copies of a page at appropriate times, IE does not work with dynamic websites unless you effectively disable the cache. That is part of the reason you will always see 20% or more no cache page views.

  27. Is it possible to specify conditional caching for page loads? i.e. cache the html but do not cache the javascript that it links to, for example?

  28. for HTTP STATUS 304, we still need time to check if the media modified or not, right? how to remove the checking time?

  29. [...] Performance Research, Part 2: Browser Cache Usage – Exposed! [...]

  30. [...] When you measure your pages, you must test them using both an empty and a primed cache. The general assumption is that 20 percent to 50 percent of your incoming requests are being done with an empty cache. This supposition was proven to be true in a test that Yahoo conducted. [...]

  31. Surprised at the number of ‘use a unique filename for every rev. of the file’ and ‘make the filename an MD5 hash of its contents’ type comments.

    Have you never heard of ETags? Or do most IE flavours not support it?

  32. [...] for improving performance for first time visitors. As described in Tenni Theurer’s blog post Browser Cache Usage – Exposed!, 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these [...]

  33. [...] I ran an experiment to measure browser cache stats from the server side. Tenni’s write up, Browser Cache Usage – Exposed, is the stuff of legend. There she reveals that while 80% of page views were done with a primed [...]

  34. [...] and I ran an experiment to measure browser cache stats from the server side. Tenni’s write up, Browser Cache Usage – Exposed, is the stuff of legend. There she reveals that while 80% of page views were done with a primed [...]