I Crawled the Top 25K Sites for Speed Analysis

I decided to scan the top 25K websites on the internet. The original purpose was to see which of them were still using the meta keywords tag. As I started crawling these websites, the experiment grew bigger and bigger. Below I present some of my findings.

I used the top 25K list from Quantcast. 1120 of the sites had a hidden profile, so I could scan only 23880 websites.

Response times

Approximately 1352 websites either failed a DNS lookup or timed out on the connection (no response within 20 seconds).
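
A minimal sketch of how a probe like this could be written in Python with only the standard library. The function name probe, the outcome labels, and the injectable fetch parameter are illustrative assumptions, not the crawler used for this post; only the 20 second cutoff comes from the text above.

```python
import socket
import time
import urllib.error
import urllib.request

TIMEOUT_SECONDS = 20  # cutoff mentioned in the post


def probe(url, fetch=urllib.request.urlopen, timeout=TIMEOUT_SECONDS):
    """Return (outcome, elapsed_seconds) for one URL.

    outcome is one of "ok", "http_error", "timeout" or "unreachable".
    `fetch` is injectable so the function can be exercised without a network.
    """
    start = time.monotonic()
    try:
        # Elapsed time covers the wait for the response headers, not the body.
        with fetch(url, timeout=timeout):
            outcome = "ok"
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx or 5xx status.
        outcome = "http_error"
    except urllib.error.URLError as exc:
        # DNS failures and refused connections arrive wrapped in URLError.
        timed_out = isinstance(exc.reason, (socket.timeout, TimeoutError))
        outcome = "timeout" if timed_out else "unreachable"
    except (socket.timeout, TimeoutError):
        # A timeout while waiting for the response can surface unwrapped.
        outcome = "timeout"
    return outcome, time.monotonic() - start
```

Calling probe("https://example.com") returns a pair such as ("ok", 0.4). Sites that come back as "timeout" or "unreachable" would fall into the failed group described above.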

Here is a table for the response times of these sites.

RESPONSE TIME (SECONDS)    URLS (TOTAL)    % OF TOTAL
1                          9170            38.4
2                          7380            30.9
3                          3393            14.21
4                          1218            5.1
5                          818             3.43
6                          196             0.82
7                          65              0.27
8                          44              0.18
9                          187             0.78
10                         31              0.13

Approximately 92% of the sites responded within the first 5 seconds.

  • Fewer than 1000 websites took longer than 5 seconds.
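
The 92% figure can be checked directly from the table. The short sketch below copies the counts as I read them from the table (bucket in seconds, number of sites) and divides by the 23880 scanned sites; the helper name share is just an illustrative choice.

```python
TOTAL_SCANNED = 23880  # sites actually scanned, per the post

# Response-time bucket (seconds) -> number of sites, copied from the table.
counts = {1: 9170, 2: 7380, 3: 3393, 4: 1218, 5: 818,
          6: 196, 7: 65, 8: 44, 9: 187, 10: 31}


def share(n, total=TOTAL_SCANNED):
    """Percentage of the scanned sites, rounded to two decimals."""
    return round(100 * n / total, 2)


within_5s = sum(c for sec, c in counts.items() if sec <= 5)
over_5s = sum(c for sec, c in counts.items() if sec > 5)

print(within_5s, share(within_5s))  # 21979 92.04
print(over_5s)                      # 523
```

So 21979 sites fall in the 1 to 5 second buckets, and 523 sites fall in the 6 to 10 second buckets.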

Response codes

  • 5.6% (1352) took longer than expected or had no response.
  • 63.85% returned a 200 OK status (2xx).
  • 28.57% used a redirect (3xx).
  • 1.68% (402 websites) had a not found error (4xx).
  • 0.23% had a server error (5xx).
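
A breakdown like the one above can be produced by grouping each status code by its first digit. This is a minimal sketch under the assumption that each site's result is stored as an integer status code, or None when there was no response; the names status_class and summarize are hypothetical.

```python
from collections import Counter


def status_class(status):
    """Map a status code to its class label; None means no response."""
    if status is None:
        return "no response"
    return f"{status // 100}xx"


def summarize(statuses):
    """Return {class label: percentage of all sites}, rounded to two decimals."""
    counts = Counter(status_class(s) for s in statuses)
    total = len(statuses)
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Tiny made-up sample, only to show the shape of the result.
print(summarize([200, 200, 301, 404, None]))
```

One thing to keep in mind: most HTTP clients follow redirects by default, so recording a 3xx status means inspecting the first response rather than the final one.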

Analysis of analytics software

I also crawled each site's web page to find out how many were using Google Tag Manager. While I was at it, I checked for other tools too. The full list:

  • Google Tag Manager
  • Google Analytics
  • Doubleclick
  • Marketo
  • Adobe Analytics

The percentages are relative, because not every site could be tracked or followed. Approximately 13K sites included some sort of JS code that made it possible to detect which system they were using.

The following is the table breakdown.

ANALYTICS SOFTWARE            TOTAL    % RELATIVE
Contains: googletagmanager    1243     8.17%
Contains: google-analytics    8553     56.19%
Contains: doubleclick         3218     21.14%
Contains: marketo             126      0.83%
Contains: adobe               986      6.48%
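
The "Contains:" labels suggest a plain substring check against the page source. Here is a minimal sketch of that idea. The function names detect and tally are hypothetical, and since the post does not spell out what the percentages are relative to, the sketch assumes the denominator is the number of pages where at least one marker was found.

```python
from collections import Counter

# Substrings taken from the "Contains:" labels in the table.
MARKERS = ["googletagmanager", "google-analytics", "doubleclick", "marketo", "adobe"]


def detect(html):
    """Return the markers that occur anywhere in the page source."""
    lowered = html.lower()
    return [m for m in MARKERS if m in lowered]


def tally(pages):
    """Return {marker: (page count, % of pages with at least one marker)}."""
    hits = [detect(page) for page in pages]
    tracked = sum(1 for h in hits if h)  # assumed denominator, see lead-in
    counts = Counter(m for h in hits for m in h)
    return {
        m: (counts[m], round(100 * counts[m] / tracked, 2) if tracked else 0.0)
        for m in MARKERS
    }
```

A substring match is cheap but broad. A short marker like "adobe" will also match pages that merely link to an Adobe product, so the counts are best read as upper bounds.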