
Prevent 404s on bulk resource requests

Chris Calo

404s are a problem, and they're sometimes unavoidable. HTTP requests are expensive, and every resource request that fails wastes precious time you could be spending elsewhere. If a fallback is available for assets that may fail, loading it once and eliminating the doomed calls helps a lot!

Now the only question is, how do we know if something is going to fail before it fails?

Problem

Oftentimes, when loading bulk images from an API, you're at the mercy of the data provider. You receive a bunch of metadata, including a URL string for an image. You queue it up as an HTTP request and fire away. Some come back good (most do, actually), but 5–10% are useless. Something on the API changed, or some static images fell out of a remote cache, and you're on the hook for it. The JavaScript console glows red. Your page-load time slips behind schedule.

What we'd like is to predict which requests will fail. If they will, we want to load a low-cost alternative instead. Or, worst case, make a single HTTP request to a fallback image and stop calling the other duds. This limits your resource loading to only the requests that are actually required.

Note: this approach assumes you have some knowledge of the entire data set, such as a list of potential IDs, filenames, et cetera.

Example

You are building an app to sell fresh produce to subscribers. You have access to an API from a local distributor, and it makes a few guarantees. You will always have a collection of up-to-date SKU numbers (IDs, if you will) for their products. When you query an endpoint, say /fruits/:sku, a JSON response is returned. Each response includes a product name and some information about the latest inventory: a harvest date, a sell-by date, the name of the grower, and a URL to a macro image of the fruit. The image is of extreme importance! Otherwise, how will you tease customers with close-up photos of the freshest blueberries and the ripest mangos?
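The exact shape of the payload isn't important, but for the sake of the example, imagine a response to GET /fruits/1234 looking something like this (every field name and URL here is made up for illustration):

{
  "sku": 1234,
  "name": "Blueberries, pint",
  "grower": "Hilltop Farms",
  "harvestDate": "2017-06-12",
  "sellByDate": "2017-06-19",
  "imageUrl": "https://images.example-distributor.com/fruits/1234-macro.jpg"
}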


Well, the only thing worse than not showing the tantalising close-up is showing nothing at all. Our provider, a small group of growers and distributors, keeps its database up to date, but their images sometimes lag behind. As such, we will maintain a cache of images week to week. If we know an image will be missing, we'll load last week's kiwis instead. They may not be the youngest of the bunch, but unless something is drastically wrong, last week's kiwis don't look much different from this week's kiwis to our customers. It'll suffice until the images are updated later this afternoon.

And, of course, our API doesn't provide the courtesy of an imageAvailableBoolean flag. The provider seems to think of images as more of a nice-to-have, whereas our application depends on their availability.

Solution

We will solve this with a unique caching mechanism, leveraging age-old algorithmic knowledge. All our SKUs (again, IDs for the database-oriented) will be known. The only real problem is whether, at run-time, we will have all their bits and bobs. Easy fix! Let's cache the IDs that have functional data associated with them. Anything that seems askew, we ignore; everything else, we add to an array.

This approach is language agnostic. But, since JavaScript is a favourite these days, we'll use ES6-flavoured JS.
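Here's a rough sketch of that build-time pass. The http-util module, the endpoint URL, and the file names are all invented for illustration; the general shape is what matters.

// build-valid-skus.js — offline sketch: keep only SKUs whose image actually resolves.
// httpUtil is our fictitious module; assume getSync/headSync perform synchronous
// requests and return an object with a status code (and, for getSync, a parsed body).
const fs = require('fs');
const httpUtil = require('http-util'); // fictitious

const allSkus = require('./skus.json'); // the distributor's full, up-to-date SKU list

const validSkus = [];

for (const sku of allSkus) {
  const { status, body } = httpUtil.getSync(`https://api.example-distributor.com/fruits/${sku}`);

  // Anything askew gets ignored: a bad response, or no image URL in the payload.
  if (status !== 200 || !body.imageUrl) continue;

  // Check that the image itself resolves before trusting it.
  if (httpUtil.headSync(body.imageUrl).status === 200) {
    validSkus.push(sku);
  }
}

// Sort numerically so we can binary-search the array at runtime.
validSkus.sort((a, b) => a - b);

fs.writeFileSync('./valid-skus.json', JSON.stringify(validSkus));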

Dope. So, now we have an array of our quality objects. Sorted, too!

Do note, we are using a fictitious http-util module here. You could replace it with an XMLHttpRequest wrapped in a Promise, or some off-the-shelf utility that makes HTTP requests synchronously. All we care about is that we can check the response state within the context of our loop.

Alright, so, why sorted? Most of the audience with at least a tenuous grasp on algorithms likely knows where we're going here. Binary search is the runtime answer. It's easy to put in place, cheap to use, and quick, ringing in at a decent O(log n). (In other words, not the best, but workable. We're favouring legibility over micro-optimisations here.) We can run the above offline, as part of a build step, to get our big array of SKUs. Then, at runtime, we swiftly dance over our dataset: if we find the appropriate SKU, we fetch the latest image; if not, we load from cache.
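A minimal version of the search itself might look like this, assuming the numeric, sorted array produced by the build step:

// Iterative binary search: is this SKU in our sorted array of known-good SKUs?
function hasSku(sortedSkus, sku) {
  let lo = 0;
  let hi = sortedSkus.length - 1;

  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (sortedSkus[mid] === sku) return true;
    if (sortedSkus[mid] < sku) {
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return false;
}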

Again, the beauty here is that this is quite simple. Here's where we'd actually get our image, when we go to render everything:
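Something along these lines, with the same made-up httpUtil module and file names as before:

// Runtime sketch: fetch the fresh image URL only for SKUs we know are good;
// otherwise serve last week's cached copy without making a request at all.
const httpUtil = require('http-util'); // still fictitious
const validSkus = require('./valid-skus.json');
// hasSku is the binary search routine from above.

function getProductImage(sku) {
  if (hasSku(validSkus, sku)) {
    // The SKU passed validation at build time, so the round-trip is worth it.
    return httpUtil.get(`https://api.example-distributor.com/fruits/${sku}`)
      .then(response => response.body.imageUrl);
  }

  // Known dud: skip the HTTP call entirely and fall back to our weekly cache.
  return Promise.resolve(`/cache/images/${sku}.jpg`);
}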

Again, we're making up fantastical HTTP modules. Real ones exist, and they're common and easy to use, but pseudocode is easier to write and keeps us general!

The Payoff

The result? Much faster load times, since we only fetch the data we care about. There's also the added benefit of fewer holes in our user interface. Designers and users alike will appreciate the effort!

In theory, we don't need cached images as the fallback to use this method; that's simply how we illustrated our little story here. You could also do this with something like headshots for a sports-scoring application. If a headshot you've collected is already saved in your S3 bucket and available at runtime, great! If not, show the classic silhouette SVG you have baked into your stylesheet. All the better: you didn't have to hunt down the athlete's picture, and you didn't make a costly HTTP round-trip for nothing!
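A quick sketch of that variant, reusing the same search routine; the bucket URL, the validHeadshotIds list, and the CSS class are all hypothetical:

// If the headshot was validated at build time, point at the S3 copy;
// otherwise render the silhouette baked into the stylesheet.
function headshotProps(athleteId) {
  return hasSku(validHeadshotIds, athleteId)
    ? { src: `https://our-bucket.s3.amazonaws.com/headshots/${athleteId}.jpg` }
    : { className: 'headshot--silhouette' }; // the classic SVG silhouette
}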

When and Where to Use It

The first bit of code above is meant to run offline, or as rarely as possible on the server. Ideally, it would be part of your build step (if you release often enough to keep up with image updates), or run as a cron job at some fixed interval; in our produce example's case, perhaps daily at 08:00. The latter bit of code is embedded in your application, consuming the array at runtime in your rendering function.
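For instance, a crontab entry along the lines of 0 8 * * * node scripts/build-valid-skus.js would rebuild the array every morning at 08:00 (the script path is, of course, hypothetical).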


To put this in perspective, imagine you are building a React web application with webpack. The build config should have a step that creates the initial array of valid SKUs (and, optionally, queues up a cron job for the next run). You store the resulting array in a locally accessible JavaScript file. The binary search utility lives in a helper module you create, where any image of shaky availability can be checked via the corresponding routine. Those helpers are then called in your components' render() methods, as needed.
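As a rough, hypothetical illustration, assuming the image URLs follow a predictable pattern (if they don't, resolve them ahead of render as in the earlier snippet); the module paths and component are made up:

import React from 'react';
import validSkus from './valid-skus.json'; // the array produced at build time
import { hasSku } from './helpers/sku-search'; // the binary search helper module

class FruitCard extends React.Component {
  render() {
    const { sku, name } = this.props;

    // Pick the image source at render time: the fresh image if the SKU checks out,
    // last week's cached copy if it doesn't.
    const src = hasSku(validSkus, sku)
      ? `https://images.example-distributor.com/fruits/${sku}-macro.jpg`
      : `/cache/images/${sku}.jpg`;

    return (
      <div className="fruit-card">
        <img src={src} alt={name} />
        <h3>{name}</h3>
      </div>
    );
  }
}

export default FruitCard;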