I crawled the top 25K sites for speed analysis.

I decided to scan the top 25K websites on the internet. The purpose was to see which of them were still using the meta keywords tag. As I started crawling these websites, the experiment grew bigger and bigger. Below I present some of my findings.

I used the top 25K list from Quantcast. Approximately 1120 sites had a hidden profile, so I could scan only 23880 websites.

Response times

Approximately 1352 websites either failed a DNS lookup or timed out (no response within 20 seconds).
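To illustrate the kind of check described above, here is a minimal sketch in Python using only the standard library. It is not the crawler used for this experiment: the function name `timed_fetch` and the `opener` parameter are made up for illustration, and the only value taken from the post is the 20 second cutoff. Note that `urllib.request.urlopen` follows redirects by default, so this measures the time to reach the final page.

```python
import http.client
import time
import urllib.error
import urllib.request

TIMEOUT_SECONDS = 20  # cutoff mentioned in the post


def timed_fetch(url, timeout=TIMEOUT_SECONDS, opener=urllib.request.urlopen):
    """Return (outcome, elapsed_seconds) for a single URL.

    outcome is the HTTP status code, or the string "failed" when the
    DNS lookup fails, the connection fails, or the timeout is hit.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()  # include the body download in the timing
            outcome = response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx answers are still responses, so keep the code.
        outcome = err.code
    except (OSError, http.client.HTTPException):
        # URLError, DNS errors and socket timeouts are all OSError subclasses.
        outcome = "failed"
    return outcome, time.monotonic() - start
```

Calling `timed_fetch("https://example.com/")` returns a pair such as a status code and the elapsed seconds. The `opener` argument exists only so the function can be exercised without network access.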

Here is a table for the response times of these sites.

RESPONSE TIME (SECONDS)   URLS (TOTAL)   % OF TOTAL
1                         9170           38.4
2                         7380           30.9
3                         3393           14.21
4                         1218           5.1
5                         818            3.43
6                         196            0.82
7                         65             0.27
8                         44             0.18
9                         187            0.78
10                        31             0.13

Approximately 92% of the sites responded within the first 5 seconds.

  • Fewer than 1000 websites took longer than 5 seconds.
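The arithmetic behind these two statements can be checked with a short Python sketch. The counts are copied from the response time table. The denominator of 23880 scanned sites is an assumption on my part; it is the figure the published percentages appear to be consistent with. The function names are illustrative.

```python
# Sites per response-time bucket (seconds), copied from the table above.
BUCKET_COUNTS = {1: 9170, 2: 7380, 3: 3393, 4: 1218, 5: 818,
                 6: 196, 7: 65, 8: 44, 9: 187, 10: 31}

# Assumed denominator: the number of sites that could be scanned.
TOTAL_SCANNED = 23880


def percent_of_total(count, total=TOTAL_SCANNED):
    """Share of all scanned sites, rounded to two decimals."""
    return round(100 * count / total, 2)


def cumulative_percent(max_seconds):
    """Percentage of sites that responded within max_seconds."""
    count = sum(n for sec, n in BUCKET_COUNTS.items() if sec <= max_seconds)
    return percent_of_total(count)


def sites_slower_than(seconds):
    """Number of sites in the table buckets above the given second."""
    return sum(n for sec, n in BUCKET_COUNTS.items() if sec > seconds)
```

With these inputs, `cumulative_percent(5)` gives 92.04 and `sites_slower_than(5)` gives 523, which matches both claims for the buckets listed in the table.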

Response codes

  • 5.6% (1352 websites) took longer than expected or gave no response.
  • 63.85% returned a 200 OK status (2xx).
  • 28.57% used a redirect (3xx).
  • 1.68% (402 websites) returned a not found error (4xx).
  • 0.23% returned a server error (5xx).
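Grouping raw crawl results into these classes is straightforward. The following Python sketch shows one way to do it, assuming each result is either an integer status code or some non-integer value (such as `None`) for sites that never answered. The names `status_class` and `summarize` are illustrative. To see 3xx codes at all, the HTTP client has to be configured not to follow redirects, since many clients follow them by default.

```python
from collections import Counter


def status_class(outcome):
    """Map one crawl outcome to a label such as "2xx" or "no response"."""
    if not isinstance(outcome, int) or not 100 <= outcome <= 599:
        return "no response"
    return f"{outcome // 100}xx"


def summarize(outcomes):
    """Return the percentage of outcomes in each status class."""
    counts = Counter(status_class(o) for o in outcomes)
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}
```

For example, `summarize([200, 200, 301, None])` reports half of the outcomes as 2xx and a quarter each as 3xx and no response.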

Bonus analysis on analytics software

I also crawled each site's web page to find out how many were using Google Tag Manager. While I was at it, I checked for other tools as well. The full list:

  • Google Tag Manager
  • Google Analytics
  • Doubleclick
  • Marketo
  • Adobe Analytics

These are relative percentages, since not every site exposed a tracking script. Approximately 13K sites included some kind of JS code that revealed which system they were using.

The breakdown is in the following table.

ANALYTICS SOFTWARE           TOTAL   % RELATIVE
Contains: googletagmanager   1243    8.17%
Contains: google-analytics   8553    56.19%
Contains: doubleclick        3218    21.14%
Contains: marketo            126     0.83%
Contains: adobe              986     6.48%
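The "Contains:" column suggests a plain substring match against each page's source. A minimal Python sketch of that idea follows. The marker strings are taken from the table; the function names and the case-insensitive matching are assumptions. A bare substring such as "adobe" can also match unrelated content (for example a link to an Adobe download page), so the counts from this kind of check are best read as rough indicators.

```python
from collections import Counter

# Substrings taken from the "Contains:" column of the table above.
MARKERS = ["googletagmanager", "google-analytics", "doubleclick",
           "marketo", "adobe"]


def detect_markers(html):
    """Return the markers that appear anywhere in the page source."""
    lowered = html.lower()
    return [marker for marker in MARKERS if marker in lowered]


def count_markers(pages):
    """Tally how many pages contain each marker."""
    counts = Counter()
    for html in pages:
        counts.update(detect_markers(html))
    return counts
```

Passing a list of downloaded page sources to `count_markers` yields a per-marker tally that can then be turned into percentages against whichever denominator is appropriate.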