Quick, optimistic page responses for faster TTFB

How quick is your site's time-to-first-byte? At Wavo.me, we noticed that our TTFB was slower than it should be. Occasionally, depending on the specific page being loaded, the time to first byte could hit as high as 1.5s (higher, actually, if significant portions of the data required to render the page were not cached).

Unfortunately, TTFB can be a difficult problem to solve. The way many web apps are architected, in order to serve a response, you need to wait for all database or cache queries to complete before you can render the page and respond. This period - the period after the browser has issued its request but before the server has begun serving the response - leaves the browser entirely idle. The browser has literally nothing to do but wait.

Why should the browser be idle, though? A typical single-page webapp is going to have a common, minified set of JS, CSS, fonts, etc. Why shouldn't the browser fetch and parse those files while it's waiting for the server to respond with the database-driven portion of the response?

That's more or less what we've implemented at Wavo.me. Right now, when you fetch a page on Wavo.me, we serve you a small portion of the HTML that is common to all pages on Wavo before we know anything about the request - before we've even made our first database query. We serve this to the browser immediately, knowing that browsers are smart enough to begin parsing and processing HTML before the entire document has arrived. As soon as the browser sees references to external resources (JS, CSS, fonts), it starts fetching them while it waits for the server to complete its response.

Meanwhile, after serving the common portion of the HTML, our server begins the typical request-response pattern by querying caches and databases and rendering the page. When it is done rendering, it emits the remaining portion of the response. Effectively, time-to-first-byte drops to nearly zero (locally, without any networking overhead, median TTFB is 7ms - on Wavo itself, there is some overhead simply in the networking and in the load balancing/TLS that we do). Instead of waiting for 1s worth of database queries to resolve, we're able to respond with useful content within tens of milliseconds of receiving the request.
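
To make the flow above concrete, here is a minimal sketch written as a Python WSGI app. It is not Wavo's actual code: the `COMMON_HEAD` markup, the `/static/...` URLs, and the `stream_page` and `render_rest` names are all made up for illustration, and the `time.sleep` call stands in for real database and cache queries.

```python
import html
import time
from typing import Callable, Iterator

# Hypothetical shared prefix: markup that is identical on every page.
COMMON_HEAD = (
    b"<!doctype html><html><head>"
    b'<link rel="stylesheet" href="/static/app.css">'
    b'<script src="/static/app.js" defer></script>'
)


def stream_page(render_rest: Callable[[], bytes]) -> Iterator[bytes]:
    """Yield the shared prefix first, then the request-specific remainder."""
    # This chunk leaves before any database or cache query has been issued.
    yield COMMON_HEAD
    # Only now do the slow, request-specific work.
    yield render_rest()


def app(environ, start_response):
    """WSGI entry point that returns its body as two separate chunks."""
    # No Content-Length header: the total size is unknown at this point.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])

    def render_rest() -> bytes:
        time.sleep(0.05)  # stand-in for database and cache queries
        title = html.escape(environ.get("PATH_INFO", "/"))
        return f"<title>{title}</title></head><body>...</body></html>".encode("utf-8")

    return stream_page(render_rest)
```

Because `stream_page` is a generator, `render_rest` does not run until the server asks for the second chunk. Whether the first chunk actually reaches the browser early still depends on the server and on any proxy in front of it not buffering the response.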

Now, there are some gotchas to this approach. We're currently testing this in production on Wavo.me, and it's entirely possible that we'll decide that it causes more problems than it is worth.

We're optimistically serving pages. That means that we're serving pages that we know responded with a 200 status code at least once since the last time we pushed code to the server, but it is possible that the current request will result in some other status code. We optimistically respond with a 200, immediately output the common content, and then when we see that the actual backend response is something unexpected, we abort by emitting a meta refresh in the head, evicting the cached common content, and reloading the page. On reload, the user will correctly see (for example) a 404 page.
By optimistically serving the head, we short-circuit the ability to emit custom HTTP headers while we're rendering. Headers in an HTTP response come before the body, and we're immediately serving up a portion of the body. That means that our code cannot output any headers once rendering has started. This is particularly problematic for code-driven redirects, for example.
We no longer emit a Content-Length header in our responses (because, well, we emit a portion of the body before we actually know the content length). For our stack, it doesn't much matter - browsers are perfectly happy without a Content-Length, and we have no special caching proxies that rely on it. But it is quite possible that this would cause problems for others.
There's a legitimate argument to be made that the performance gained by implementing this is outweighed by the increased complexity and, well, weirdness of the server code. The code is clean and well encapsulated, but it introduces restrictions (see above) that aren't typical. I could see it leading to bugs if new developers aren't aware of the quickstart code and the issues involved.
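
The optimistic part, and the way it backs out when the guess is wrong, can be sketched roughly as follows. This is an illustration under assumptions rather than our real code: the `known_ok` set, the `respond` function, and the `backend` callable (which returns a status code and the markup that follows the common head) are hypothetical names.

```python
from typing import Callable, Iterator, Set, Tuple

COMMON_HEAD = b"<!doctype html><html><head>"  # hypothetical shared prefix

# Hypothetical cache: paths that returned 200 at least once since the last deploy.
known_ok: Set[str] = set()

# Closes the document with a meta refresh that reloads the current URL.
ABORT = b'<meta http-equiv="refresh" content="0"></head><body></body></html>'

Backend = Callable[[str], Tuple[int, bytes]]


def respond(path: str, backend: Backend) -> Tuple[int, Iterator[bytes]]:
    """Return (status, body chunks), streaming early only for known-good paths."""
    if path not in known_ok:
        # Conventional route: wait for the backend, then send the real status.
        status, rest = backend(path)
        if status == 200:
            known_ok.add(path)
        return status, iter([COMMON_HEAD, rest])

    def optimistic() -> Iterator[bytes]:
        # By the time this chunk is sent, the 200 status line is already on the wire.
        yield COMMON_HEAD
        status, rest = backend(path)
        if status == 200:
            yield rest
        else:
            # Wrong guess: evict the path and ask the browser to reload the page.
            known_ok.discard(path)
            yield ABORT

    return 200, optimistic()
```

After an abort, the reloaded request finds the path missing from `known_ok`, takes the conventional route, and receives the real status code (the 404, say). The sketch also shows why redirects are awkward here: once `COMMON_HEAD` has been yielded, there is no way left to send a `Location` header.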

Overall, this has given us a small but measurable performance win on page rendering. It's not perfectly clear that we'll keep this code active in production - in fact, HTTP/2 server push targets the same idle period and may make this optimization unnecessary where it is available - but it has been an interesting little experiment.