In recent weeks, I’ve been working on Tello, a web app to track and manage television shows.
As web apps go, it’s relatively small — about 10,000 lines of code. It’s a React/Redux app, bundled by Webpack, served by a thin Node backend (using Express and MongoDB). 90% of the code is on the front-end.
Source code is available on GitHub.
There are a lot of aspects to web performance. Historically, I’ve focused more on the post-load front-end side of things: stuff like making sure scrolling is jank-free, and animations are smooth.
In contrast, I don’t often pay much attention to page-load time, at least not on small pet-projects. After all, there isn’t that much code being shipped; it has to be pretty fast out-of-the-box, right?
After doing an initial benchmark, however, I was surprised to see that my 10k-LOC pet project was slowwww: on a 3G connection, it took around 5 seconds for anything meaningful to show up, and more than 15 seconds for all network requests to resolve.
I realized that I needed to invest some time and energy in this problem. It doesn’t matter how beautiful my animations are if folks bail after staring at a blank white screen for 5 seconds!
All in all, I experimented with half a dozen techniques over the course of a weekend. The end result is that the page shows meaningful content after ~2300ms, down from ~5000ms: a reduction of over 50%!
This blog post is a case-study in the specific techniques I tried, and how well they worked. More broadly, though, it’s about what I learned along the way about diagnosing problems, and what my thought process was for coming up with solutions.
All profiling was done with the same settings:
- “Fast 3G” throttled network speed.
- Desktop resolution.
- Disabled HTTP cache.
- Logged-in, to an account with 16 tracked TV shows.
We need a baseline that we can compare results against!
The page we’ll be testing is the main logged-in Summary view. It’s the most data-heavy page, and so it offers the most room for optimization.
The Summary view contains a set of cards, like this:
Here’s our baseline profile, on 3G speed. It’s… not great.
First Meaningful Paint: ~5000ms
First image loaded: ~6500ms
All requests finished: >15,000ms
Oof. The page doesn’t display anything useful until ~5 seconds in. The first image only finishes loading around 6.5 seconds, and it takes well over 15 seconds for all network requests to complete.
This timeline view offers a bunch of insights. Let’s walk through it and examine what it’s doing:
1. The initial HTML page is fetched. This is nice and quick, because the app is not server-rendered.
2. The monolithic JS bundle has to be downloaded. This takes forever. 🚩
3. React traverses the component tree, computes the initial mount state, and pushes it to the DOM. It’s a header, a footer, and a lot of black space. 🚩
4. After mount, the application realizes that it needs some data, so it makes a GET request to /me, an API endpoint that returns the user’s data, as well as an array of shows they care about, and which episodes they’ve seen.
5. Once we have that crucial list of shows, the application can fetch:
   - an image for each show,
   - an array of episodes for each show.
This data comes from the wonderful TV Maze API.
You may be wondering why I don’t just store episode info in my database, so I can skip all those calls to TV Maze. Ultimately, TV Maze is the source of truth; it has all the info about new episodes. I could make these requests on the server during Step 4, but that would greatly increase the amount of time that step takes, while a user stares into a sea of empty black space. Plus, I like having a lean server layer.
One potential workaround could be to set up a cron job that does daily syncs with TV Maze, and only request episodes directly if I don’t already have recent data. I kinda like that the data is realtime, though… this avenue will be left unexplored, at least for now.
The biggest bottleneck right now is that initial JS bundle’s size; it takes too long to download!
The bundle size is 526kb, and it’s not currently being compressed at all. Gzipping to the rescue!
With a Node/Express backend, this is easy; we just need to install the compression module, and use it as an Express middleware.
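For reference, the compression package is usually wired in with a single `app.use(compression())` call ahead of the routes. The sketch below is not that middleware; it's a dependency-free approximation of what it does for each response, using Node's built-in `zlib`. The `maybeGzip` helper and the fake bundle contents are made up for illustration.

```javascript
const zlib = require('zlib');

// Hypothetical helper: gzip a response body, but only when the client's
// Accept-Encoding header says it can handle gzip.
function maybeGzip(body, acceptEncoding = '') {
  const raw = Buffer.from(body);

  if (!/\bgzip\b/.test(acceptEncoding)) {
    return { headers: {}, payload: raw };
  }

  return {
    headers: { 'Content-Encoding': 'gzip', Vary: 'Accept-Encoding' },
    payload: zlib.gzipSync(raw),
  };
}

// Stand-in for a JS bundle: source code is repetitive text, which is
// exactly the kind of input gzip handles well.
const fakeBundle = 'function render(props) { return props.children; }\n'.repeat(2000);
const compressed = maybeGzip(fakeBundle, 'gzip, deflate, br');

console.log(`${fakeBundle.length} bytes -> ${compressed.payload.length} bytes`);
```

A real bundle won't shrink as dramatically as this artificially repetitive string, but the principle is the same: fewer bytes over the wire, at the cost of a little CPU time on each end.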
With that incredibly simple fix, let’s see what effect it has on our timeline:
First meaningful paint: 5000ms -> 3100ms
First image loaded: 6500ms -> 4600ms
All data loaded: 6500ms -> 4750ms
All images loaded: ~15,000ms -> ~13,000ms
The bundle went from 526kb over-the-wire to just 156kb, and it made a huge difference on page-load speed.
With the obvious first step taken, I looked at the timeline. The first paint is at 2400ms, but it isn’t meaningful. It gets better at 3100ms, but all that episode data isn’t received until almost 5000ms.
I started thinking about server-rendering, but that wouldn’t actually fix the problem; the server would still have to make a call to the DB, and then a call to the TV Maze API. Worse, the user would be staring at a white screen while the server did its work.
Why not use local-storage? We can persist all state changes to the browser, and rehydrate from that state when the user returns. The data will be stale, but that’s OK! The real data won’t be far behind, and this will make the initial load feel so much faster.
Because this app uses Redux, persisting/hydrating state is pretty straightforward. First, I needed a way to update localStorage whenever the Redux state changed:
Next, we need to subscribe our Redux store to this method, as well as initialize it with any data from previous sessions:
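Here's a rough sketch of both steps: persisting on every change, then rehydrating on startup. It isn't Tello's actual code. The storage key, the reducer, and the action type are invented, and `createStore` and `createMemoryStorage` are tiny stand-ins for Redux and `window.localStorage` so that the snippet runs on its own.

```javascript
const STORAGE_KEY = 'example-redux-state'; // hypothetical key name

// Write the latest state to storage. Failures (quota exceeded, private
// browsing) are swallowed: persistence is a nice-to-have here.
function saveState(storage, state) {
  try {
    storage.setItem(STORAGE_KEY, JSON.stringify(state));
  } catch (err) {
    // Ignore; the app still works without a persisted copy.
  }
}

// Read the state from a previous session, or undefined if there is none
// (or if it is corrupt), so the reducers fall back to their defaults.
function loadState(storage) {
  try {
    const serialized = storage.getItem(STORAGE_KEY);
    return serialized === null ? undefined : JSON.parse(serialized);
  } catch (err) {
    return undefined;
  }
}

// Tiny stand-in for Redux's createStore(reducer, preloadedState).
function createStore(reducer, preloadedState) {
  let state = reducer(preloadedState, { type: '@@INIT' });
  const listeners = [];
  return {
    getState: () => state,
    subscribe: (listener) => { listeners.push(listener); },
    dispatch: (action) => {
      state = reducer(state, action);
      listeners.forEach((listener) => listener());
    },
  };
}

// In-memory stand-in for window.localStorage.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

const reducer = (state = { shows: [] }, action) =>
  action.type === 'ADD_SHOW' ? { shows: [...state.shows, action.name] } : state;

const storage = createMemoryStorage();

// First "session": every state change is persisted.
const store = createStore(reducer, loadState(storage));
store.subscribe(() => saveState(storage, store.getState()));
store.dispatch({ type: 'ADD_SHOW', name: 'Stranger Things' });

// Second "session": the store starts from the persisted state.
const rehydrated = createStore(reducer, loadState(storage));
console.log(rehydrated.getState());
```

In the browser you'd pass `window.localStorage` instead of the in-memory stand-in. If the state changes very frequently, it may be worth throttling `saveState`, since serializing the whole tree on every action isn't free.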
There were a few kinks to work out, but for the most part, this was a really simple change, thanks to how Redux is architected.
Let’s take a gander at the new timeline:
Cool! It’s hard to tell from how small the captured screenshots are, but our very first paint, at 2600ms, is meaningful now; it contains a full list of the shows and episodes from our previous session.
First meaningful paint: 3100ms -> 2600ms
Episode data available: 4750ms -> 2600ms (!)
While this hasn’t actually affected the loading time (we still do make those API requests and they still take a while), the user has data immediately, and so the perceived speed improvement is very noticeable.
Gone is the staggered, things-keep-changing second where content appears as it’s available. While this is often a popular technique for getting stuff on the page sooner, it can be overwhelming when the page keeps updating as new content is available. I much prefer being able to render the “final” UI immediately.
As an extra bonus, this winds up being pretty useful in non-perf ways too. For example, users have the ability to change the sorting of shows, but before this change, that preference would be forgotten when the session ends. Now, that preference is restored when they come back!
There is a downside to this, though: it’s no longer clear whether you’re still waiting for new data. I plan to add a spinner in the corner that shows whether any requests are still pending.
Also, you may be thinking, “This is great for returning users, but does nothing for new users!” You’re right, but the problem doesn’t really apply to new users. New users have no tracked shows, and so their page load is super quick; just a call-to-action to start adding shows. So we’ve effectively killed the experience of “staring at a black screen forever” for all users, new and returning.
Even with this latest improvement, images are still taking forever to load; this timeline doesn’t show it, but it still takes 12+ seconds for all images to be loaded, with 3G speeds.
The reason for this is simple: TV Maze returns large movie-poster-style photos, whereas I only need a narrow strip, used to help tell shows apart at-a-glance.
To solve this problem, my initial thought was to use something like ImageMagick, a wonderful CLI tool I used while making ColourMatch.
When the user adds a new show, the server would request a copy of the image, use ImageMagick to crop out the middle of the image, send it over to S3, and use the S3 URL on the client, rather than using the TV Maze image link.
Rather than deal with this myself, though, I decided to outsource this concern to Imgix. Imgix is a service that sits in front of S3 (or other cloud storage providers) and allows you to dynamically create cropped, resized images. You just use a URL like this, and it creates and serves an appropriate image:
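As an illustration of the URL shape, here's a small sketch that builds one. The host and path are placeholders, not Tello's real ones, and the dimensions are arbitrary. The query parameters (`w`, `h`, `fit`, `crop`, `dpr`) come from Imgix's URL API.

```javascript
// Build an Imgix URL for a cropped strip of a show's poster.
function buildImgixUrl(host, path, { width, height, dpr = 1 }) {
  const url = new URL(path, `https://${host}`);
  url.search = new URLSearchParams({
    w: String(width),
    h: String(height),
    fit: 'crop',     // crop to the requested dimensions instead of squashing
    crop: 'entropy', // favour the busiest region of the image over the center
    dpr: String(dpr),
  }).toString();
  return url.toString();
}

console.log(buildImgixUrl('example.imgix.net', '/shows/123.jpg', { width: 495, height: 128 }));
// -> https://example.imgix.net/shows/123.jpg?w=495&h=128&fit=crop&crop=entropy&dpr=1
```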
A nice bonus is being able to crop based on interesting areas of the photos. You’ll notice in the left/right comparison photo above that it crops the 4 kids on the bikes, instead of just cropping the exact center of the image.
For Imgix to work, the image has to be available via S3 or similar. Here’s a snippet from my back-end code, which uploads an image when a new show is added:
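A rough sketch of what such an upload step could look like (this is not Tello's actual back-end code). `fetchImage` and `putObject` are injected stand-ins for an HTTP client and an S3 client, and the bucket name, key format, and show shape are all assumptions.

```javascript
// Copy a show's poster into our own bucket so that Imgix can serve
// cropped versions of it. Resolves with the key the image was stored under.
function uploadShowImage({ fetchImage, putObject }, show) {
  const key = `shows/${show.id}.jpg`;

  return fetchImage(show.imageUrl)
    .then((body) =>
      putObject({
        Bucket: 'example-show-images', // hypothetical bucket
        Key: key,
        Body: body,
        ContentType: 'image/jpeg',
      })
    )
    .then(() => key);
}

// Usage with in-memory fakes, so the sketch runs without network access:
const uploaded = [];
const fakeDeps = {
  fetchImage: (url) => Promise.resolve(Buffer.from(`bytes of ${url}`)),
  putObject: (params) => Promise.resolve(uploaded.push(params)),
};

const uploadPromise = uploadShowImage(fakeDeps, {
  id: 123,
  imageUrl: 'https://example.com/poster.jpg',
});
uploadPromise.then((key) => console.log('uploaded', key));
```

Injecting the two dependencies keeps the function easy to test; in a real server they would wrap whichever HTTP and S3 clients the project already uses.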
By running every new show through this promise, we get images that are ready to be dynamically cropped.
On the client, I use the image attributes srcset and sizes to make sure that images are being served based on the window size and display pixel ratio:
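A minimal sketch of how those two attributes fit together, written as plain string-building so it runs outside of React. The widths, the 600px breakpoint, the `33vw` desktop value, and the image URL are all assumptions for illustration.

```javascript
// Hypothetical widths (in pixels) at which the image can be requested.
const CARD_WIDTHS = [330, 495, 660, 990];

// One srcset candidate per width; the browser picks the best match for
// the layout width and the display's pixel ratio.
function buildSrcSet(baseUrl, widths) {
  return widths.map((w) => `${baseUrl}?w=${w} ${w}w`).join(', ');
}

// On narrow screens a card spans the whole viewport; on wider ones it
// takes up roughly a third of it.
const SIZES = '(max-width: 600px) 100vw, 33vw';

function showImageTag(baseUrl, alt) {
  return (
    `<img src="${baseUrl}?w=${CARD_WIDTHS[0]}" ` +
    `srcset="${buildSrcSet(baseUrl, CARD_WIDTHS)}" ` +
    `sizes="${SIZES}" alt="${alt}">`
  );
}

console.log(showImageTag('https://example.imgix.net/shows/123.jpg', 'Show poster'));
```

In JSX the same values would go on the `srcSet` and `sizes` props of an `<img>` element.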
This helps ensure that mobile clients get the larger version of the image (since those cards wind up taking up the whole viewport’s width), whereas desktop clients get a slightly smaller version.
Each image is now way smaller, but we’re still loading an entire page worth of shows at once! On my large desktop window, only 6 shows are visible at a time, but we fetch all 16 images on page-load.
Happily, the awesome package react-lazyload offers really simple lazy loading. The code is as simple as:
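With react-lazyload itself, this amounts to wrapping the image in the package's `<LazyLoad>` component. To show the idea underneath, here's a dependency-free sketch of the kind of visibility check a lazy loader performs. It is not the library's implementation, and the stacked 150px cards and 900px viewport are assumptions chosen to mirror the "6 of 16" situation.

```javascript
// Decide whether an element is close enough to the viewport to load.
// `top` and `bottom` are offsets from the top of the viewport, like the
// values returned by getBoundingClientRect(); `offset` lets loading start
// slightly before the element scrolls into view.
function shouldLoad({ top, bottom }, viewportHeight, offset = 0) {
  return top < viewportHeight + offset && bottom > -offset;
}

// 16 cards of 150px each, stacked in a 900px-tall viewport.
const CARD_HEIGHT = 150;
const cards = Array.from({ length: 16 }, (_, i) => ({
  top: i * CARD_HEIGHT,
  bottom: (i + 1) * CARD_HEIGHT,
}));

const toLoad = cards.filter((rect) => shouldLoad(rect, 900));
console.log(`${toLoad.length} of ${cards.length} images requested up front`);
// -> 6 of 16 images requested up front
```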
Alright, it’s been a while since we looked at a timeline.
Our first-meaningful-paint numbers haven’t changed, but image download times are way better:
First image: 4600ms -> 3900ms
All visible images: ~9000ms -> 4100ms
Eagle-eyed readers might have noticed that this timeline only downloads episode data for 6 shows, instead of all 16. This is because my initial attempt (and the only one I remembered to capture) lazy-loaded the episode card, not just the show’s image.
Ultimately this introduced more problems than I was able to solve in this weekend-long-perf-tune, and so I simplified it. The impact on above-the-fold image load-times is unchanged, though.
We’re definitely getting to a pretty good place, perf-wise.
One obvious issue is that we only have a single bundle. Let’s use codesplitting to reduce the amount of on-request code needed!
Because I’m using React Router 4, it’s a simple matter of following the docs to create a <Bundle /> component. I played around with a few different configurations, but ultimately, there wasn’t a lot of splitting that made sense.
In the end, I split out the mobile views from the desktop ones. The mobile version has its own views, which use a swiping library, custom assets, and a few extra components. This bundle wound up being surprisingly small — about 30kb before compression — but it nevertheless had a noticeable impact:
First meaningful paint: 2600ms -> 2300ms
First image loaded: 3900ms -> 3700ms
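The mobile/desktop split can be sketched without React at all. With Webpack, a loader like `() => import('./mobile-views')` becomes its own bundle; the snippet below uses fake loaders so that it runs on its own. The view names and the 600px cutoff are made up, and this is not the `<Bundle />` component from the React Router docs.

```javascript
// Wrap a loader so the chunk is requested at most once, no matter how
// many times it is asked for.
function lazyChunk(loader) {
  let pending = null;
  return () => {
    if (pending === null) {
      pending = Promise.resolve().then(loader);
    }
    return pending;
  };
}

// Fake loaders standing in for dynamic import() calls.
let mobileRequests = 0;
const loadMobileViews = lazyChunk(() => {
  mobileRequests += 1;
  return { default: 'MobileSummaryView' };
});
const loadDesktopViews = lazyChunk(() => ({ default: 'DesktopSummaryView' }));

// Pick a bundle based on viewport width (the cutoff is an assumption).
function loadViewsFor(viewportWidth) {
  return viewportWidth <= 600 ? loadMobileViews() : loadDesktopViews();
}

const firstVisit = loadViewsFor(375);
const secondVisit = loadViewsFor(375);
firstVisit.then((views) => console.log(views.default, 'requests:', mobileRequests));
```

Desktop visitors never trigger the mobile loader, so they never pay for the swiping library or the mobile-only components.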
Lesson learned: codesplitting’s effectiveness is hugely dependent on the given application. In this case, the biggest dependencies — React and its ecosystem packages — are used across the site, and don’t need to be split off.
The components themselves could be split off at the route level for marginal gains in initial page load, but then you introduce additional latency on every route change; dealing with spinners everywhere isn’t fun.
I did toy with the idea of rendering a “shell” — a placeholder with the right layout, but without data — on the server.
The issue I foresaw was that the client already has access to the previous session’s data, through localStorage, and it initializes with that data. The server isn’t privy to that, and so I’d wind up with the warning about markup not matching between client and server.
I figure that I might’ve been able to shave half a second off my first-meaningful-paint time with SSR, but the site wouldn’t be interactive in that time; a personal pet-peeve of mine is when a site looks ready, but isn’t.
Plus, SSR introduces a lot of complexity, and can slow down development time. Performance is important, but “good enough” is good enough.
Something I’m interested in exploring, but haven’t found the time, is compile-time SSR. This would only work for static pages like the logged-out homepage, but I can imagine it being hugely effective. As part of my build process, I’d create and persist the index.html. This would get served to the users by the Node server as the plain HTML file that it is. The client would still download and run React, so the page would become interactive, but there would be zero server-side build time, since we’ve paid that cost before the code was even deployed.
An idea I thought had a lot of potential was to serve React and ReactDOM from a CDN.
Webpack makes this easy; you can specify an externals key to have it not bundle the given dependency.
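A minimal config fragment, assuming the CDN builds expose the usual `React` and `ReactDOM` globals. A real config would also have entry, output, and loader settings, which are omitted here.

```javascript
// webpack.config.js (fragment)
// Keys are the module names used in import/require statements; values
// are the global variables Webpack should read them from at runtime.
const webpackConfig = {
  externals: {
    react: 'React',
    'react-dom': 'ReactDOM',
  },
};

module.exports = webpackConfig;
```

With this in place, the HTML page has to load the React and ReactDOM scripts from the CDN before the app bundle runs, or those globals won't exist yet.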
It seemed as though there were two strong benefits to this approach:
- Serving a popular library from a CDN means it’s likely to already be cached for the user
- Dependencies could be parallelized, downloading in tandem with the app bundle, instead of being a single large file.
I was surprised to see, at least in the worst case where the user doesn’t already have the files cached, that moving React to a CDN was harmful:
First meaningful paint: 2300ms -> 2650ms
You’ll notice that React and React DOM are downloaded in parallel to my main desktop bundle, and yet… it actually slows down the total time.
I don’t want to imply that using a CDN like this is never a good idea. I’m not an expert in this stuff and it’s entirely possible that this is something I’m doing wrong, not a flaw with the idea! At least in my case, though, it didn’t pan out.
There are two main ideas I hope this post communicates:
- Small side-project apps are pretty fast out-of-the-box, but a weekend of experimentation can yield huge speed improvements. The Chrome developer tools make it really easy to poke around and see where the bottlenecks are, and it may surprise you how much low-hanging fruit there is. Services like Imgix allow you to defer hard problems to other people, often for free or for very low cost.
- Every application is different. This post details specific tricks for Tello, which has its own unique set of concerns. Even if these tips aren’t directly applicable in your app, I hope I’ve showcased how performance is a creative part of web development.
For example, the conventional wisdom is that server-side rendering is the way to go. In many applications it is, but maybe a client-side solution using local-storage and/or service-workers would be better! Maybe you can pay some of the cost of SSR during compile-time, or maybe you can do what Netflix does for some pages, and skip shipping React to the client entirely!
Performance is actually really fun when you realize how much creativity and outside-the-box thinking is involved.
Thanks for reading! I hope this was helpful :) Lemme know what you think on Twitter.
View Tello’s source on GitHub.🌟