What is this?
Page web bloat score (WebBS for short) is calculated as follows:
WebBS = TotalPageSize / PageImageSize
TotalPageSize is the size of all requests, and PageImageSize is the size of a full-page screenshot.
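To make the formula concrete, here is a minimal sketch in Python. The numbers and the file name are hypothetical, chosen only for illustration; in practice TotalPageSize comes from summing every response in the network panel and PageImageSize is just the size of the screenshot file.

```python
def web_bloat_score(total_page_bytes: int, page_image_bytes: int) -> float:
    """WebBS = TotalPageSize / PageImageSize (both in bytes)."""
    return total_page_bytes / page_image_bytes

# Hypothetical numbers: a page whose requests add up to 416 kB and whose
# full-page PNG screenshot is 60 kB scores roughly 6.9.
print(round(web_bloat_score(416_000, 60_000), 2))
```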
The larger the WebBS, the more bloated a web page is relative to its image representation. For example, Tim Berners-Lee's homepage has a WebBS of 0.204, which makes it really efficient, while CNN has a score of ~6, making it bloated.
Why?
In order to fix something, we need to measure it first.
Web bloat is a hot topic now: see posts by Maciej, Ronan, and Tammy. However, most of us still use a subjective, absolute measure: if it loads fast on your computer, then it's good. That measure is flawed; a web page with only two paragraphs that weighs 500 kB is going to load fast, but it's still bloated!
So, how to measure web bloat?
HTML is a text-based format, designed so that the client renders the graphical document. The idea was that text and markup are smaller to transfer than a full-resolution image of the document. If that weren't so, Tim Berners-Lee would have designed a protocol to transfer images, not text and markup. That gives us a convenient way to measure the bloat of any static web page: just compare it to a full-page screenshot of the same page.
An example
Take the "web bloat" Google search results page (SERP).
Measured absolutely, it weighs 416 kB and loads in 1.11 seconds. That's fast.
However, a WebBS of 6.94 shows that it is bloated relative to its information content.
Not a huge surprise, as in 1998 the Google homepage was only 10 kB. And the content is pretty much the same: ten links. Sure, today it has voice search and auto-suggest, but most people just click on the first few links.
My web page has a WebBS > 1. What do I do?
Convert your page to an image map. For example, the Google SERP as a clickable image map is only 106.7 kB (and two requests).
Just kidding! But a surprising number of pages would be faster if Tim had created a pixel transfer protocol.
A high WebBS usually indicates unused stuff on the page: JavaScript, CSS, oversized images, etc. Maybe you have a valid reason for that content. But more often than not, it means you can optimize it more.
But wait: your sponsor's page is also bloated?!
Oops. Yes, the WebBS of TestDome.com is 3.35.
It used to be much worse, and that's how all this started. As a TestDome co-founder, I noticed a few months ago that the homepage had become slightly slower. I opened the source HTML and found that nine customer reference logos were embedded in full resolution, like this 150 kB monster. I asked a developer to fix it, and he did a great job converting the logos to CSS sprites. But he told me he would leave the 13 other requests for web chat unchanged, because they are async and provided by a third party. The same went for five requests for Google Analytics. The designer wanted to keep a custom web font, and jQuery gets cached anyway. In the end, we're a startup, and there are more urgent tasks than hand-coding our homepage.
We left it as it was, and I realized that popular web stacks make it hard to develop non-bloated pages.
How do you generate screenshots, and why PNG?
Requests go to an Azure VPS running SlimerJS, a scriptable browser; it's similar technology to what we use for web programming tests. The current calculator can process about 5 screenshots per minute; after that limit is reached, it displays a page with instructions on how to calculate WebBS manually.
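If you hit that limit, one rough way to compute the score yourself is to export the page load as a HAR file from your browser's developer tools and save a full-page screenshot as a PNG. The sketch below assumes those two files exist under the placeholder names page.har and fullpage.png; it is not the calculator's actual code.

```python
import json
import os

def total_page_size_from_har(har_path: str) -> int:
    """Sum the size of every response recorded in a HAR export."""
    with open(har_path, encoding="utf-8") as f:
        har = json.load(f)
    total = 0
    for entry in har["log"]["entries"]:
        body_size = entry["response"].get("bodySize", -1)
        if body_size < 0:  # some tools report -1 when the transfer size is unknown
            body_size = entry["response"]["content"].get("size", 0)
        total += body_size
    return total

total_bytes = total_page_size_from_har("page.har")
image_bytes = os.path.getsize("fullpage.png")
print(f"WebBS = {total_bytes / image_bytes:.2f}")
```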
Using PNG was an arbitrary decision on our part, as it is a lossless format. Maybe JPEG would be better; most web pages contain lossy JPEGs anyway.
To measure the web bloat of a dynamic page (video or animation), you would compare it with a compressed video of the same page. But this calculator doesn't do that.
How can the image of a page be smaller than the page itself?
Because web pages have been growing exponentially:
As of September 2016, the average web page is 2496 kB in size and requires 140 requests.
To understand why, we need a bit of history...
The long tale of two tribes
Since the first computers were connected, there was a fight. Between the "thin" tribe and the "fat" tribe.
The thin tribe wanted to render everything on the source server and make the destination machine a "dumb" terminal. Quick, simple, and zero dependencies. But the fat tribe said no, it's stupid to transfer every graphics element. Let's make a fat "smart" client that does the rendering (or part of the business logic) on the destination machine. Then you don't need to transfer every graphics element, just the minimum of data. The fat tribe always advertised three benefits of fat, smart clients: smaller bandwidth, less latency, and that the client can render arbitrary stuff.
But, in the early days of computing, "graphics" was just plain text. Data was pretty much the same as its graphical representation, and people could live with a short latency after they pressed Enter at a command line. The thin tribe won, and the text terminal conquered the world. The peak of this era was the IBM mainframe, a server that could simultaneously serve thousands of clients thanks to its I/O processors. The fat tribe retreated, shaking its collective fist, saying, "Just you wait: one day graphics will come, and we'll be back!"
They waited until the 80s. Graphics terminals became popular, but they were sluggish. Sending every line, color, or icon over the wire sucked up the bandwidth. When dragging and rearranging elements with the mouse, you could see the latency. And unlike a simple text flow, graphics brought a myriad of screen resolutions, color depths, and DPIs.
"We told you so!" said the fat tribe, and started creating smart client-server solutions. Client-servers and PCs were all the rage in the 80s. But even bigger things were on the horizon.
In 1989, Tim Berners-Lee was thinking about how to create a world wide web of information. He decided not to join either tribe but to take the middle route. His invention, HTML, would transfer only the semantic information, not the representation. You could override how fonts or colors looked in your client, to the joy of the fat tribe. But for all relevant computing you would do a round trip to the server, to the delight of the thin tribe. Scrolling, resizing, and text selection were instantaneous; you only had to wait when you decided to go to the next page. Tim's invention took the world by storm. It was exactly the "graphics terminal" that nobody wished for but everybody needed. It was open, and people started creating clients and adding more features.
The first candy was inline images. They required more bandwidth, but the designers promised to be careful and always embed optimized thumbnails in the page. They also didn't like free-floating text, so they started using tables to make fixed layouts.
Programmers wanted to add code on the client for validation, animation, or just for reducing round trips. First they got Java applets, then JavaScript, then Flash.
Publishers wanted audio and video, and then they wanted ads.
Soon the web became a true fat client, and everybody liked it.
The thin tribe was acting like a crybaby: "You can't have so many dependencies: the latest Java, the latest Flash, the latest RealMedia encoder, different styles for different browsers. It's insane!" They went on to develop Remote Desktop, Citrix XenDesktop, VNC, and other uncool technologies used by guys in grey suits. But they knew that adding crap to the client couldn't last forever. And there is a fundamental problem with HTML…
HTML was designed for academics, not the average Joe
Look at the homepages of Tim Berners-Lee, Bjarne Stroustrup, and Donald Knuth. All three together weigh 235 kB, less than one Google SERP. Images are optimized, most of the content is above the fold, and their pages were "responsive" two decades before responsive design became a thing. But they are all ugly. If the father of the WWW, the father of C++, and the father of computer algorithms were in an evening web development class, they would all get an F and be asked to redo their homepages.
The average Joe prefers form over content and is too lazy to write optimized code. To be honest, I would be an average Joe if I were a web developer. Implementing customer features brings more money and fame than optimizing CSS sprites. This leads me to a conclusion:
You can't blame web developers for making a completely rational decision.
If a 2496 kB page weight is the average, not the exception, then it is a failure of the technology, not of all the people using it.
At one point, Google realized there was an issue with the web. Their solution: SPDY (now part of HTTP/2) and Brotli. The idea is that, although the web is bloated, we will create technology that fixes the bloat on the fly. Brotli is particularly interesting, as it uses a predefined 120 kB dictionary containing the most common words in English, Chinese, and Arabic, as well as common phrases in HTML and JavaScript! But there is only so much that lipstick can do for a pig. Even the best web compressor can't figure out whether all that JS and CSS is actually going to be used, replace images with thumbnails, or improve the JPEG compression ratio because the user would never notice the difference. Losslessly compressing some 255 kB JS library doesn't help much.
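A tiny sketch of why lossless compression only goes so far. It assumes the third-party brotli Python bindings are installed, and the "library" is just made-up repetitive text standing in for unused JavaScript:

```python
import brotli  # pip install brotli

# A stand-in for a JS library: repetitive, highly compressible, and never called.
payload = b"function doNothingUseful() { return 42; }\n" * 5000
compressed = brotli.compress(payload)
print(len(payload), "bytes ->", len(compressed), "bytes after Brotli")
# The transfer shrinks dramatically, but the unused function still reaches
# the browser, which still has to decompress and parse it.
```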
The thin tribe realized that with a good compressor and good bandwidth, the game changes. OnLive Game Service launched in 2010, allowing you to stream games from the cloud. The next year, Gaikai launched its own cloud gaming service. They were not competitors for long: Sony purchased Gaikai in 2012 and all OnLive patents in 2015, and used the technology to create PlayStation Now. Today I can play more than 400 live games on a Samsung Smart TV, at 30 frames per second. But I still need to wait 8.3 seconds to fully load the CNN homepage. Who is crazy here?
Remember the main arguments of the fat tribe: smaller bandwidth, less latency, and that the client can render arbitrary stuff. It seems that, with the websites of 2016, the thin tribe can do all of that equally well or better.
In my modest opinion, the web is in a state of bloat because the fat tribe screwed it up. Today's technology makes it too easy to create bloated websites and too hard to make slim ones.
About
Zel is the author of this text, parts of which are copied from his earlier "PXT Protocol" article. But that article is a piece of crap, as even the smart readers of Reddit didn't figure out that it was sarcastic.
If you think that this page does a better job of demonstrating the web bloat problem, please share it.