Real Browser Monitoring: the browser profile approach

Posted by Pieter Ennes on February 15th, 2011

A few weeks ago we formally launched Real Browser Monitoring after a soft introduction last December. We’re really excited about this new monitoring capability, and are working hard to add more meat to this type of monitoring. Currently we’re adding support for:

  • Filling forms (POST) in a page
  • Multi-step monitors with Real Browsers
  • Flash (that’s actually a hard one to scale over all our customers)

In this post I’d like to share some of the implementation details, what additional improvements we’re working on, and a bit on other approaches we’re exploring.

Why Real Browser monitoring

As many websites rely extensively on third-party content (images, like-buttons, ads, …) and client-side code (JavaScript), synthetic website monitoring is shifting too: from basic measurements using HTTP probes towards monitoring of complex pages, including the third-party content and code. And because visitors use different browsers, there is also a need for insight into how a site loads in each of them.

To satisfy this demand, many website monitoring companies have been competing to set up arrays of servers running Linux + Firefox, or Microsoft Windows + various flavours of Internet Explorer. For each real browser check, they need to fire up an instance of the browser (and in the case of IE, most likely including the surrounding virtual machine), load the web page, retrieve the results, and kill the instance. From an engineering point of view this scenario has room for optimisation, and from an operational point of view, with thousands of monitors that need to be executed every couple of minutes, scaling it up becomes key.

So at WatchMouse we looked at the problem from a different angle: we researched the main differences between browsers, asked clients and partners which of these differences they found most important, and finally built a monitoring solution that is both satisfactory in terms of browser profiling and has some extra features that are hard to offer using the more brute-force method mentioned before. Let me try to describe some of the gory details.

Browser profiles

A lot of research on key inter-browser differences has been done by the great people involved in browserscope.org, a community-driven project for profiling web browsers. In fact, the BrowserScope network tab (also containing Steve Souders’ excellent UA profiler work) is the basis of our solution: instead of firing up every browser executable on the market and trying to puppeteer it into loading a page, on various platforms, for thousands of monitors, we embrace the BrowserScope network parameters, exploit the fine-grained control that we have over a single engine, and simulate the different browsers.

Some major browsers

Some of the key factors in browsers that affect the performance are:

  • the number of parallel connections that the browser can open to the same host
  • the way the browser handles concurrent downloads of scripts, images and CSS files

Using these parameters we realized the emulation of a number of actual browsers, which we fine-tuned by comparing the waterfall charts of different browsers on various test sites to the waterfall charts rendered by the emulation.
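
To make the first parameter concrete, here is a minimal Python sketch of how a per-host connection limit shapes a waterfall. It is an illustration only, not the WatchMouse engine: the `Profile` class, the profile names and all download times are hypothetical, and DNS, connect times and script blocking are ignored.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical browser profile; the field names are illustrative."""
    name: str
    connections_per_host: int


def simulate_host(durations_ms, profile):
    """Return (start, end) times in ms for each download from one host.

    Requests are issued in document order; each one waits until one of the
    profile's parallel connections becomes free.
    """
    # Min-heap holding the time at which each connection becomes idle.
    free_at = [0.0] * profile.connections_per_host
    heapq.heapify(free_at)
    timeline = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)   # earliest idle connection
        end = start + duration
        heapq.heappush(free_at, end)
        timeline.append((start, end))
    return timeline


def load_time(durations_ms, profile):
    """Time at which the last download from this host finishes."""
    timeline = simulate_host(durations_ms, profile)
    return max((end for _, end in timeline), default=0.0)


if __name__ == "__main__":
    resources = [120, 80, 200, 50, 90, 60]   # made-up download times in ms
    for profile in (Profile("two-connections", 2), Profile("six-connections", 6)):
        print(profile.name, load_time(resources, profile), "ms")
```

With these made-up numbers the two-connection profile finishes at 320 ms and the six-connection profile at 200 ms, which is the kind of difference that shows up when comparing waterfall charts.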

More on the different parameters can be found on the site of the browserscope.org project. Some factors, however, I would like to discuss in more detail below.

Resolving host names

DNS lookups are nowadays handled by most browsers in a similar way: By resolving host names asynchronously as soon as they are available, even before there is a free TCP socket. The differences between browsers are therefore negligible when examining the full page load.

Some modern browsers (notably Chrome, Safari 5, and some versions of Firefox) also have a feature called DNS pre-fetching. This option is unrelated to the above and doesn’t affect loading times of single web pages. In multi-step transactions, however, the option enables resolving host names in links the user may click on in the near future. There are some privacy concerns with this and we will most probably leave it disabled in any future multi-step monitors. With multi-step transactions in mind, I propose to add a new BrowserScope network parameter, indicating whether DNS pre-fetching is enabled by default or not. In single-step monitors there should be no measurable influence on performance anyway.

But what about IPv6, or AAAA/A vs. A/AAAA lookups?

Still on the topic of resolve time: when one of our stations has IPv6 connectivity, it is configured to perform DNS lookups in accordance with RFC 3484. Early versions of Windows, on the other hand, seem to have had no or non-standard IPv6 support. In other words: even with IPv6 connectivity available, Windows may prefer IPv4 over IPv6. More modern versions (i.e. Windows Vista/Server after 2008) seem to adhere to RFC 3484 just like the nodes in our monitoring network.

Mac OS X also had its share of problems with RFC 3484, but the other way around: it always prefers IPv6. This situation doesn’t arise on our network, and unless someone is specifically interested in synthetic testing using this broken setup, we can ignore it.
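
To see which address family a given machine would try first, a small Python sketch like the following can help. It rests on an assumption you should verify for your platform: that the system resolver behind `socket.getaddrinfo` returns addresses in its preferred order (many C libraries sort them along the lines of RFC 3484). The function name and the `resolver` parameter are ours, added so the logic can be tried without network access.

```python
import socket


def address_family_order(host, port=80, resolver=socket.getaddrinfo):
    """List (family, address) pairs for `host` in the order the resolver returns them.

    The first entry is the address a typical client would try first.
    `resolver` defaults to the system resolver and can be swapped for a stub.
    """
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    order = []
    for family, _type, _proto, _canon, sockaddr in resolver(
        host, port, type=socket.SOCK_STREAM
    ):
        # sockaddr[0] is the textual address for both IPv4 and IPv6 results.
        order.append((labels.get(family, str(family)), sockaddr[0]))
    return order


# Example (needs network access and a dual-stack host name of your choice):
# print(address_family_order("www.example.com"))
```

If the first entry is an IPv6 address on a station with working IPv6, the resolver is preferring AAAA over A for that host.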

Summarizing: We ignore any platform-induced differences with respect to name resolution, but as explained above, this should have minimal impact on the way a single page will be loaded.

TCP connect times and simultaneous connections

The overall connect times are significantly influenced by the number of connections a browser opens to each host, and to all hosts in total. Our engine sets the number of connections based on the previously mentioned research by Souders. We can therefore expect (and confirmed this in our experiments) that the connect times are very similar to those of the real browser.

Response times

Response times can be split into two components: the actual server response time and any network-induced latency. The latter will depend only on geographical location, and hardly on OS or browser version. The former, the server response time, can be influenced by the number of concurrent connections opened to the server by the client, or in theory by the ordering of the requests. However, as our engine sets the correct number of connections for each browser profile, only the ordering of the requests may be different. So far we have not seen any difference in practice due to this.

Total pageload time

After obtaining a HAR file describing the pageload, we use many of the remaining BrowserScope network parameters for the last step: fine-tuning the pageload profile to the selected browser. For example, the JS/CSS concurrency levels, combined with information from the HTML, tell us how the elements are to be ordered.
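
As a point of reference, the total pageload time can be read straight from a HAR file before any profile-specific tuning is applied. The Python sketch below does only that; it relies on the standard HAR fields `log.entries[].startedDateTime` and `log.entries[].time` (in milliseconds), and it does not attempt the re-ordering step described above. The function names and the sample data are made up.

```python
import json
from datetime import datetime, timedelta


def parse_har_time(stamp):
    """Parse a HAR startedDateTime (ISO 8601); map a trailing 'Z' to UTC."""
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    return datetime.fromisoformat(stamp)


def total_load_ms(har):
    """Span from the first request start to the last response end, in ms."""
    entries = har["log"]["entries"]
    if not entries:
        return 0.0
    starts = [parse_har_time(e["startedDateTime"]) for e in entries]
    ends = [s + timedelta(milliseconds=e["time"]) for s, e in zip(starts, entries)]
    return (max(ends) - min(starts)) / timedelta(milliseconds=1)


if __name__ == "__main__":
    sample = json.loads("""
    {"log": {"entries": [
        {"startedDateTime": "2011-02-15T10:00:00.000Z", "time": 300},
        {"startedDateTime": "2011-02-15T10:00:00.250Z", "time": 400}
    ]}}
    """)
    print(total_load_ms(sample), "ms")
```

In the sample, the second request starts 250 ms in and takes 400 ms, so the span from first start to last end is 650 ms.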

Check result displayed in the HAR viewer

JavaScript execution times

For now, we don’t touch these at all, at least until we have done more research on how to heuristically predict the changes.

Pros and cons of the profiling approach

Benefits:

  • Accurate timings (using real browser events)
  • Waterfalls for all profiles, not just one
  • HTTP authentication and SSL client certificates for all profiles
  • Matching on dynamic texts for all profiles
  • Snapshots for all profiles
  • Scalability, immediacy

Drawbacks:

  • Single JavaScript engine
  • Rendering quirks and artefacts won’t show
  • No interpretation of conditional comments (like <!--[if IE]>)

An independent expert opinion

Further considerations are pointed out by Aaron Peters, an independent Web Performance consultant:

It’s great to see WatchMouse acknowledging the added value of Real Browser Monitoring and adding this as a service to their customers. Measuring page load times with real browsers is key to understanding the end user experience.

he says, while adding:

WatchMouse takes an interesting and innovative approach by using a single platform and then applying algorithms based on browser profiles to calculate the ‘actual’ page load times. The results I have seen so far are impressive

Aaron does believe, however, that for certain web pages the above mentioned drawbacks can have a significant impact on the results:

For example, the performance of JavaScript engines in browsers differ greatly. On a JavaScript-heavy page, the speed of the JavaScript engine can have a large impact on the overall page load times and user experience. In order to make their RBM service top of the bill, WatchMouse will have to get those last few important browser differences into their algorithm.

Conclusion

We believe that we have built a product that allows us to offer competitive, scalable and user-friendly real browser monitoring in a different way. We show that the profile approach has advantages as well as limitations, but when used in a synthetic monitoring environment, the benefits outweigh the imperfections by far.

If you feel you don’t get enough browser-specific detail from one of your checks, then it’s easy to do a single-shot measurement on the page in question using one of the many terrific tools out there.

When more insight is needed into different browsers on a continuous basis, passive (i.e. non-synthetic) methods may be a suitable alternative. Naturally, we’re working hard to get our version of Real User Monitoring (RUM) out there as soon as possible!

Next steps

Having this product out in the open allows us to gather bigger volumes of data required to tune the profiles further. A current topic of interest is to see if we can predict how the first-visual event translates across browsers. We’re also looking into installing real browsers on a small set of nodes to be used as a reference platform.

But that’s not all and we’re certainly not finished yet! So let us know in case you have any thoughts on this matter.

Pieter Ennes
VP Engineering

References

  1. UA Profiler, http://stevesouders.com/ua/
  2. DNS Prefetching, http://www.chromium.org/developers/design-documents/dns-prefetching
  3. DNS Prefetching and Its Privacy Implications, http://www.usenix.org/event/leet10/tech/full_papers/Krishnan.pdf
  4. RFC 3484, http://www.ietf.org/rfc/rfc3484.txt
  5. RFC 3484 in Windows, http://support.microsoft.com/kb/969029
  6. Measuring and combating IPv6 brokenness, http://ripe61.ripe.net/presentations/162-ripe61.pdf

Where is this cloud thing anyway?

Posted by mark on February 11th, 2010

If you are considering a cloud infrastructure, your first questions should be: Does it exist? Does it scale? With the WatchMouse network of monitoring stations, I did some research over the past year into the computing clouds to shed some light on these basic questions.

We looked at the uptime of a number of cloud providers, such as Amazon, Akamai, and Google AppEngine. The results are in the following figure. What we see is that cloud uptime is pretty good, and it is getting better. As a matter of fact, these clouds perform better than a lot of commercial websites.

Cloud providers uptime

Another interesting question is: where is the cloud? We looked at the distance from the WatchMouse monitoring stations to a Google App Engine application. The following figure shows that the App Engine cloud is in fact spread out around the world!

Cloud providers performance

On the vertical axis we see a number of WatchMouse monitoring stations; each horizontal bar represents the connect time from that station to either a regular website in the Netherlands or an application at Google App Engine. From the graph it is clear that the cloud is in more than one place on the globe. Given the programming model of Google App Engine, this means that it is very scalable. Our research also shows that this cloud is expanding. For a more in-depth discussion of this data go to the slidecast (slides plus audio) of my presentation at the 2009 Computer Measurement Group conference.

So if you still thought that clouds are a foggy notion, this research is a wake-up call. Clouds do exist, are scalable and are getting better.

This is a guest column by Peter van Eijk, owner of Digital Infrastructures, a consultancy firm. He publishes a blog at http://petersgriddle.net and is currently setting up the Computer Measurement Group’s Dutch chapter. If you speak Dutch, follow the blog and join the NLCMG LinkedIn group.  Contact information is listed on LinkedIn and on www.digitalinfrastructures.nl/contact

Public DNS servers performance, worth the trouble?

Posted by Pieter Ennes on December 9th, 2009

Google’s announcement to offer a public DNS service similar to OpenDNS was already discussed exhaustively last week. But we noticed that a lot of people wondered how the different public DNS recursors compare performance-wise, worldwide. We therefore did a small study in this field using the WatchMouse monitoring network.

Test method
We set up monitors for the following public DNS services:

  • Google Public DNS
  • OpenDNS
  • DNSAdvantage
  • DNSResolvers

Another one, ScrubIT, turned out to be DOA and was removed from the test as neither of its servers answers queries, even though their blog says “It works!”. Yet another, OpenNIC, was left out since its configuration requires a different set of IP addresses in each country. Finally, there are public servers from Level3 with widely known IPs, but an official web page about this service seems hard to find. We’ve only just started monitoring them and therefore left them out of this test.

For a period of six days, we probed each of the above DNS services for five different records: ‘A’ records for cnn.com (TTL=300), bbc.co.uk (TTL=300) and www.techcrunch.com (TTL=1200); an ‘MX’ record for yahoo.com (TTL=7200); and an ‘AAAA’ record for it.ipv6.watchmouse.com (TTL=28800). Each monitor was executed every 5 minutes and was configured to send a single UDP query (with no retries), and time out after 3 seconds without a reply. This set-up was duplicated to monitor two different recursors for each service provider; the obtained results for these were averaged.
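
For readers who want to try a comparable probe themselves, here is a minimal Python sketch of one measurement: a single UDP query, no retries, a 3-second time-out. It is not the WatchMouse monitor; the function names are ours, the query is hand-encoded following the standard DNS wire format, and any reply with a matching ID and response code 0 is counted as a success without inspecting the answer section.

```python
import random
import socket
import struct
import time

QTYPES = {"A": 1, "MX": 15, "AAAA": 28}   # record types used in the test


def build_query(name, qtype, query_id):
    """Encode a single-question DNS query (recursion desired) as wire bytes."""
    # Header: ID, flags (0x0100 = RD bit), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)   # class IN


def probe(server, name, qtype, timeout=3.0):
    """Send one UDP query without retries; return latency in ms, or None on failure."""
    query_id = random.randrange(0x10000)
    packet = build_query(name, qtype, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(packet, (server, 53))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None                    # counted as a failure
        elapsed_ms = (time.monotonic() - start) * 1000.0
    if len(reply) < 12 or reply[:2] != packet[:2]:
        return None                        # truncated or mismatched reply
    rcode = reply[3] & 0x0F                # low 4 bits of the flags word
    return elapsed_ms if rcode == 0 else None


# Example (192.0.2.53 is a placeholder; substitute the recursor under test):
# print(probe("192.0.2.53", "cnn.com", "A"))
```

Keeping time-outs as `None` rather than as a 3000 ms sample is what allows failures and performance to be reported separately.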

For each DNS service roughly 17,000 queries were performed over the six-day period (2 servers * 6 days * 288 probes/day * 5 monitors = 17,280), rotating over the 42 locations in our network.

We also added two reference monitors: one utilising a local recursor running on our monitoring station, and another one, dubbed ‘Direct’, querying one of the listed name servers directly.

Results
Before going into the performance details, let’s have a look at the relative number of errors per service:

Failures per 1000 queries

The first thing to note is that a local recursor generates fewer failures than direct queries to one of the listed name servers, probably due to its ability to cache and to prefer fast, correctly functioning servers over ones that are slow or failing. Both DNSAdvantage and OpenDNS do a good job of masking name server errors and minimising lookup time-outs; their failure rates were below 5‰. DNSResolvers has a failure rate of just under 10‰. Google’s free service, on the other hand, fails to return records within the 3-second time-out in about 15‰ of the queries. That’s worse than a direct query to one of the listed name servers (13‰) or a query through a local resolver (9‰), and three times as high as the two other major competitors.

World-wide DNS performance

When we consider the performance of the queries that did receive a valid response, we see that a query through a local resolver is a little slower than a direct query, again on average, worldwide. The difference of ~20ms, however, cannot easily be explained by the (negligible) round trip time of the UDP query to the local host. The variance in performance of the local resolver is also larger, so it is likely that other factors play a role here.

DNSResolvers either has very busy servers or does not do a good job of reducing network latency: its lookups take nearly twice as long as those through the local resolver. Most likely the service does not use anycast to route queries to a nearby data centre. The remaining providers do use anycast, and clearly have an advantage because of it.

Discarding time-outs, Google’s public DNS (59ms on average) is definitely the best in terms of performance, followed by OpenDNS at 80ms. The services offered by DNSAdvantage also display solid sub-100ms performance at 93ms.

Discussion
Most of the tested host names were either for high-volume sites or had large TTLs, causing public caches to be easily primed and to expose their qualities. OpenDNS, Google and DNSAdvantage all show that they master this and have better lookup times than a local resolver or a direct query.

To avoid influences of second order effects, the measurements were done using only a single UDP query in each probe. By doing so, we were able to separate real query performance from packet loss and server failures. In real life, however, a typical PC would retry the query after a certain time (~5 seconds). Our DNS monitors can do this, but this would cause the failures to be blended into the performance measurements, inducing an (arbitrary) bias from the chosen time-out setting.

Also, real people (or offices for that matter) are most often not in multiple places at the same time. Thus in real life you would be more interested in which service provider is best in your area. We hope to have some time for a second blog item on this in the near future, with typical user settings, and a breakdown per area.

Measuring averaged worldwide performance the way we did now is still nice as a synthetic benchmark. So for fun, and because everyone finds DNS response times very important, we can introduce a DNS time-wasting score. It is based on, say, 100 sequential lookups per day for an average user and a time-out of 3 seconds, so that failures induce extra waiting time and influence the score:

Rank  Provider      Daily quality score
1     OpenDNS       100 lookups/day * (80ms + 4.8‰ * 3s) = 9.4 seconds/day
2     Google        100 lookups/day * (59ms + 15‰ * 3s) = 10.4 seconds/day
3     DNSAdvantage  100 lookups/day * (93ms + 4.4‰ * 3s) = 10.6 seconds/day
4     Local         100 lookups/day * (157ms + 8.6‰ * 3s) = 18.2 seconds/day
5     DNSResolvers  100 lookups/day * (289ms + 9.5‰ * 3s) = 31.7 seconds/day

(10 seconds per day adds up to roughly one hour of DNS waiting per year)
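
The score is simple enough to recompute. The Python sketch below applies the formula from the table to the rounded averages and failure rates quoted in this post (93ms for DNSAdvantage, as in the results above); the function name is ours. Because the inputs are rounded, the last digit can differ slightly from the table, which was presumably computed from unrounded measurements.

```python
def daily_wait_seconds(avg_ms, failures_per_mille, lookups_per_day=100, timeout_s=3.0):
    """Expected DNS waiting time per day, in seconds.

    Each lookup costs its average response time, plus the time-out
    weighted by the chance that the lookup fails.
    """
    per_lookup_s = avg_ms / 1000.0 + (failures_per_mille / 1000.0) * timeout_s
    return lookups_per_day * per_lookup_s


# (average response in ms, failures per 1000 queries), rounded values from this post
MEASURED = {
    "OpenDNS": (80, 4.8),
    "Google": (59, 15.0),
    "DNSAdvantage": (93, 4.4),
    "Local": (157, 8.6),
    "DNSResolvers": (289, 9.5),
}

if __name__ == "__main__":
    ranked = sorted(MEASURED, key=lambda name: daily_wait_seconds(*MEASURED[name]))
    for rank, name in enumerate(ranked, start=1):
        score = daily_wait_seconds(*MEASURED[name])
        print(f"{rank} {name:<13} {score:5.2f} seconds/day")
```

Changing `timeout_s` shows how strongly the ranking depends on the chosen time-out: the longer a failed lookup is assumed to stall the user, the more a high failure rate outweighs a fast average.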

Conclusion
Three of the researched providers clearly are competitive in terms of ‘clean’ performance and offer a useful service to the public. But the number of failures shown in the first chart must be considered as an intrinsic part of the quality of service.

And this is where Google, with a 3 times higher failure rate, seems to have more problems than OpenDNS and DNSAdvantage. For 15 out of 1000 host name lookups, Google fails to respond within 3 seconds (or the packet is just lost), causing an extra lookup after a time-out and an observed lookup time of multiple seconds. It seems that the Google service is fast indeed, but not the most reliable.


Want to monitor DNS too? WatchMouse offer DNS monitoring in their packages!
