Posted: June 03, 2019

Image Optimization with Gatsby

TL;DR - Websites built with Gatsby primarily solve the problem of image optimization by querying for images using GraphQL and then inserting them into their React components using gatsby-image. For images referenced from Markdown, the vast majority of the plugins out there are primarily geared towards manipulating images that are already included in the repository’s filesystem. To optimize images stored remotely, I tried a bunch of things and they didn’t work, so I gradually built a solution that did.

The solution is published here, rather than being hosted on GitHub (at the moment), because I am waiting on a response from one of the library authors whose own Gatsby plugin ended up providing crucial context for solving remote image optimization. Once all that is sorted out, there is some more work to be done to genericize the plugin (and provide auth options for images that require htaccess …) prior to it being published. In the meantime, if you’d like to add image optimization for remote images to your own site, you can! Take the code and run with it - and thanks for reading!


Initial setup - creating a local Gatsby plugin

There is some initial setup when using local plugins with Gatsby:

  • you need to have a plugins folder at the root level of your project
  • you need to have a folder for each local plugin you’d like to use with its name
  • inside that folder, you need to set up a package.json with the dependencies for your plugin, and its source code
  • … profit. JK - after writing your source code for the plugin you need to include it in your plugins list within gatsby-config.js
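As a rough sketch of the bullets above: the folder name `custom-image-optimizing-plugin`, the `my-site` root, and the `maxWidth` option are placeholders chosen for illustration, not requirements.

```javascript
// Assumed project layout (site and plugin names are placeholders):
//
//   my-site/
//   ├── gatsby-config.js
//   └── plugins/
//       └── custom-image-optimizing-plugin/
//           ├── package.json     <- the plugin's own dependencies
//           └── gatsby-node.js   <- the plugin's source code

// gatsby-config.js
const config = {
  plugins: [
    {
      // the name matches the folder inside ./plugins
      resolve: `custom-image-optimizing-plugin`,
      // whatever is passed here arrives as `pluginOptions` in gatsby-node.js
      options: { maxWidth: 650 },
    },
  ],
};

module.exports = config;
```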

A great way to learn how to build a local plugin is covered in this Gatsby tutorial.

How the “problem” started

I wanted to share with you the process of building a Gatsby plugin to download images referenced in my CMS’ markdown so that Gatsby’s native gatsby-remark-images could optimize the images for me.

But let me back up a second and outline the “problem” and why I went down this rabbit hole to begin with. My girlfriend and I are currently traveling the world - several months ago we quit our day jobs with the intent to take some time for ourselves. I had previously been using Next.js for both work and my personal site but had been interested in learning more about Gatsby after seeing that React’s site was built using it. My only acceptance criterion was a site that could be hooked up to a CMS that was easy for my girlfriend to use; bonus points if it had an iOS app so we could write blog posts on the road. After spending less than an hour with Gatsby and exploring the rich plugin library, we were off to the races!

Any regular reader of r/reactjs (or user of Gatsby) will recognize some of the features of our blog; on the front end, I wasn’t trying to reinvent the wheel. 1

Yet a funny thing happened while we were traveling in Slovenia - we had just finished up a three-day packrafting trip when our guide emailed us the photos of us on the river … and they wouldn’t load. Some of them were upwards of 10 MB. Up until this moment, neither of us had truly noticed how constrained our access to bandwidth was, but our friend’s WiFi was intermittent, and even when it worked it could only download the photos with extreme patience. I realized that serving the full-size photos from a remote host was preventing us from doing something as simple as previewing the above post while we were working on it. Something had to be done! But what … 🤔

Step one: Cover all your bases

The first thing I did was check out the Gatsby plugin library. No need to reinvent the wheel, remember??

No dice. There was a plugin that looked promising, gatsby-plugin-remote-images, but it locates images using lodash’s .get method, and the remote image paths from my CMS don’t fit the get API. Bummer. The plugin wouldn’t have worked for me anyway, because all it does is download the images for you - ideally, I wanted to not only download the images but also get their relative paths working in my Markdown so that gatsby-remark-images would just work. 2
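One way to picture that mismatch, as a sketch only: `getByPath` below is a hand-rolled stand-in for lodash’s `get` (not the real implementation), and both node shapes and the example.com URLs are invented for illustration. A dotted path can reach a URL that lives in its own field, but not one buried inside a Markdown string.

```javascript
// Hand-rolled stand-in for lodash's `get`, for illustration only:
// walks a dotted path and returns undefined if any step is missing.
const getByPath = (obj, path) =>
  path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), obj);

// A node where the image URL sits in its own field: a path can reach it.
const nodeWithImageField = {
  featuredImage: { url: "https://example.com/river.jpg" },
};

// A node shaped like CMS Markdown content: the URL is inside the post body.
const nodeWithMarkdown = {
  content:
    "We went packrafting!\n\n![On the river](https://example.com/raft.jpg)\n",
};

const reachable = getByPath(nodeWithImageField, "featuredImage.url");
const wholeBody = getByPath(nodeWithMarkdown, "content");

console.log(reachable); // the bare URL, ready to download
console.log(wholeBody === nodeWithMarkdown.content); // the entire post body, not a URL
```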

At first I thought I might be able to write a plugin that would just download the images and reference their downloaded path - it looked like I might be able to chain that together with gatsby-remark-copy-linked-files to achieve the effect that I was hoping for: 3

Something like this…

// gatsby-config.js
module.exports = {
  plugins: [
    {
      resolve: `gatsby-transformer-remark`,
      options: {
        plugins: [
          {
            resolve: `custom-image-optimizing-plugin`,
            options: remarkImagesOptions,
          },
        ],
      },
    },
  ],
};

However, after examining the Gatsby Node API, I realized I didn’t know enough about what I was doing to proceed. Having authored several other Remark plugins for Gatsby previously, I could see there was a disconnect between what gets passed to Remark plugins and the Gatsby Node APIs you have access to as a transformer plugin. In particular, I realized that I not only needed to download the images, I also needed to create new Gatsby nodes using the createNode action passed by Gatsby. 4

Secondly, I came to find that the createRemoteFileNode method exposed by gatsby-source-filesystem didn’t expose a way to specify where the downloaded files went - and their default location without further processing was the .cache folder, which is one of the few areas where Gatsby’s documentation could really use some 💖. After much searching, I found a Spectrum chat where Kyle Mathews (Gatsby’s creator) seemed to suggest that including directories with gatsby-source-filesystem would get gatsby-remark-images to work its magic. I learned two things by including the .cache folder in my gatsby-config.js file:

  • It’s much easier to include files by use of globbing patterns than it is to exclude them (and including the entire .cache folder, which is clearly a bad idea even to somebody as intellectually challenged as myself, generates some scary console statements, let me tell you)
  • Images were being processed by gatsby-remark-images in this fashion, but due to the way it / gatsby-source-filesystem builds out the directories in the public folder by default, there was no deterministic way for me to refer to the optimized images that were produced in this fashion.

I had gotten far - images were being downloaded! - but my code seemed stuck at a dead end:

// the gatsby-node.js file of my localized plugin ...
const { createRemoteFileNode } = require(`gatsby-source-filesystem`);

exports.sourceNodes = async (
  { getNodes, cache, reporter, store, actions, createNodeId },
  pluginOptions
) => {
  const { createNode } = actions;

  const nodes = getNodes();

  let nodeContent = nodes.filter(
    node => node.internal.type === "MyCMSNodeType,YoursMightVary"
  );

  await Promise.all(
    nodeContent.map(async node =>
      findRemoteImagesAndDownloadThem(
        {
          cache,
          createNode,
          createNodeId,
          node,
          reporter,
          store,
        }
      )
    )
  );
};

const findRemoteImagesAndDownloadThem = async (
  {
    cache,
    createNode,
    createNodeId,
    node,
    reporter,
    store,
  }
) => {
  let { content } = node;

  if (!content || typeof content !== "string") return;

  // I'm sorry, but this regex is what did the trick ...
  const matchedImages = content.match(/\!\[(|.)+?\]\(https:\/\/.+?\)/g) || [];

  let images = [];

  for (var index = 0; index < matchedImages.length; index++) {
    images.push(matchedImages[index]);
  }

  let resolvedPromises = await Promise.all(
    images.map(image =>
      processImage({
        cache,
        createNode,
        createNodeId,
        reporter,
        store,
        image
      })
    )
  );
};

// pulls the URL out of one matched Markdown image and downloads it
const processImage = async ({
  cache,
  createNode,
  createNodeId,
  reporter,
  store,
  image
}) => {

  // not proud of this one, either, but the match is assured
  // based on the first regex
  const url = image.match(/https:\/\/[^\)]*/)[0];

  const imageNode = await downloadImage({
    cache,
    createNode,
    createNodeId,
    store,
    url
  });
  // ok the image has been downloaded...
  // now what?!!
}

const downloadImage = async ({
  cache,
  createNode,
  createNodeId,
  store,
  url
}) => {
  try {
    let imageNode = await createRemoteFileNode({
      url,
      store,
      cache,
      createNode,
      createNodeId
    });

    return imageNode;
  } catch (e) {
    console.log(`Image download errored: `, e);
  }
};
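For anyone who wants to poke at the two regular expressions outside of Gatsby, here is a small sketch that runs them against a made-up Markdown string (the sample text and example.com URLs are invented). The `capturingPattern` at the end is one possible tidy-up that grabs alt text and URL in a single pass; it is an assumption for illustration, not what the code above uses.

```javascript
const markdown = [
  "Day one on the river.",
  "![Our raft](https://example.com/photos/raft.jpg)",
  "![](https://example.com/photos/no-alt.png)",
  "![Local file](./images/local.jpg)",
].join("\n\n");

// The same two patterns used in the draft above.
const imagePattern = /\!\[(|.)+?\]\(https:\/\/.+?\)/g;
const urlPattern = /https:\/\/[^\)]*/;

// Only remote (https) images match; the relative ./images path is skipped.
const matchedImages = markdown.match(imagePattern) || [];
const urls = matchedImages.map((image) => image.match(urlPattern)[0]);

console.log(matchedImages);
console.log(urls);

// Possible alternative (not the original code): capture the alt text and
// the URL together so no second regex is needed.
const capturingPattern = /!\[([^\]]*)\]\((https:\/\/[^)\s]+)\)/g;
const parsed = Array.from(markdown.matchAll(capturingPattern), (m) => ({
  markdown: m[0],
  alt: m[1],
  url: m[2],
}));

console.log(parsed);
```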

Between a rock and a hard place

Well, what a pickle I’d landed myself in. I had the files in the computer but I couldn’t seem to replace the remote URLs in my markdown with the localized images in a way that worked. I had wanted, quite badly, to get the images to a point where they could just be recognized by gatsby-remark-images without any further leg work on my part. Yet no massaging of the absolutePath property returned by the image node downloaded using createRemoteFileNode seemed to be picked up by gatsby-remark-images.

At this point I nearly threw in the towel - I’d already gone this far, and we still couldn’t load our website easily. Each dev run of my Gatsby server was taking 5+ minutes to start up, and I had nothing to show for it so far. Defeat seemed imminent. Browsing the Gatsby plugin library one last time, I got lucky 6 - I found a plugin that promised to do almost the exact same thing for WordPress sites, gatsby-wordpress-inline-images. Reading through the source code, I finally saw the link between Gatsby nodes and what could be passed to gatsby-plugin-sharp - that output is normally obtained via one of the various GraphQL fragments that come with gatsby-plugin-sharp, but it can also be produced directly and still be passed to gatsby-image.

The only thing missing was some further customizations that I wanted:

// ... regex to strip out the title attribute from the incoming markdown ...
<figure>
  <a href="/now_local_path_to_image">
    <Image fluid={fluidResult} />
  </a>
  <figcaption style={{ fontStyle: `italic` }}>{strippedOutTitle}</figcaption>
</figure>

This takes care of:

  • semantics (all image elements are self-contained instead of being sheltered in some random div)
  • the image still can respect the option (passed via the same object powering gatsby-remark-images) to provide a source link to the image, which in my particular case is appropriate because while WEBP is the perfect format for display, nobody wants to download a WEBP image …
  • My CMS doesn’t let us set the title property of uploaded images, but it does allow us to set alt text. My conditional logic can set up both the title prop AND caption the image appropriately.
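A rough sketch of the conditional logic in the bullets above, using plain string templates instead of React so it can run on its own. `buildFigureHtml`, `escapeHtml`, and the argument names are hypothetical, and the plain `<img>` tag merely stands in for the optimized gatsby-image output.

```javascript
// Escape the handful of characters that matter inside HTML text and attributes.
const escapeHtml = (text) =>
  text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");

// Hypothetical helper: the alt text from the CMS doubles as title and caption.
const buildFigureHtml = ({ src, alt, linkToOriginal, showCaptions }) => {
  const safeSrc = escapeHtml(src);
  const title = escapeHtml(alt || "");
  const img = `<img src="${safeSrc}" alt="${title}" title="${title}" />`;
  // Only wrap the image in a link when the option asks for it.
  const body = linkToOriginal ? `<a href="${safeSrc}">${img}</a>` : img;
  // Only caption images that actually have alt text.
  const caption =
    showCaptions && title
      ? `<figcaption style="font-style: italic">${title}</figcaption>`
      : "";
  return `<figure>${body}${caption}</figure>`;
};

const figureHtml = buildFigureHtml({
  src: "/static/raft.jpg",
  alt: "Our raft",
  linkToOriginal: true,
  showCaptions: true,
});

console.log(figureHtml);
```

An image without alt text simply gets no `<figcaption>`, which keeps empty captions out of the page.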

It’s Business Time

The full solution isn’t pretty. I guess I pretty much promised that in the beginning, but I want to say it again because, in imagining my original goal, I had hoped for something elegant and clever - manipulating the pre-existing Markdown from my CMS so that gatsby-remark-images could do the rest of the legwork. In the end, my solution rips out the existing Markdown and replaces it with pre-rendered HTML. I’m positive that a more sophisticated and less complicated solution exists out there. 5

That being said, it’s presented below in all its dubious glory:

// local plugin's gatsby-node.js

const React = require(`react`);
const ReactDOMServer = require(`react-dom/server`);
const { createRemoteFileNode } = require(`gatsby-source-filesystem`);
const { fluid } = require(`gatsby-plugin-sharp`);
const Img = require(`gatsby-image`);

// helper function that strips out the
// original image from the now optimized one
// to set that sweet sweet OG image property in <head>
const getImageSrc = require("../../src/logic/getImageSrc");

exports.sourceNodes = async (
  {
    getNodes,
    cache,
    createContentDigest,
    reporter,
    store,
    actions,
    createNodeId,
  },
  pluginOptions
) => {
  const { createNode } = actions;

  // Sorry, I just can't require lodash just to merge objects ...
  const options = {
    maxWidth: pluginOptions.maxWidth || 650,
    wrapperStyle: pluginOptions.wrapperStyle || ``,
    backgroundColor: pluginOptions.backgroundColor || `#fff`,
    // default to true, but still allow an explicit `false`
    withWebp: pluginOptions.withWebp !== false,
    linkImagesToOriginal: pluginOptions.linkImagesToOriginal || false,
    showCaptions: pluginOptions.showCaptions !== false,
    nodeType: "MyCMSNodeType,YoursMightVary",
    // always optimize in production, but since optimization is costly
    // give option to not do it in dev
    optimize:
      process.env.NODE_ENV === "development" ? pluginOptions.optimize : true,
  };

  console.log(`Optimizing blog post photos: ${options.optimize}`);

  const nodes = getNodes();

  let nodeContent = nodes.filter(
    (node) => node.internal.type === options.nodeType
  );

  await Promise.all(
    nodeContent.map(async (node) =>
      findAndDownloadRemoteImages(
        {
          cache,
          createContentDigest,
          createNode,
          createNodeId,
          node,
          reporter,
          store,
        },
        options
      )
    )
  );
};

const findAndDownloadRemoteImages = async (
  {
    cache,
    createContentDigest,
    createNode,
    createNodeId,
    node,
    reporter,
    store,
  },
  options
) => {
  let {
    content,
    fields: { slug },
    id,
    ...rest
  } = node;

  if (!content || typeof content !== "string") return;

  const matchedImages = content.match(/\!\[(|.)+?\]\(https:\/\/.+?\)/g) || [];

  let images = [];

  for (var index = 0; index < matchedImages.length; index++) {
    images.push(matchedImages[index]);
  }
  if (options.optimize) {
    let resolvedPromises = await Promise.all(
      images.map((image) =>
        replaceImage({
          cache,
          createNode,
          createNodeId,
          image,
          options,
          reporter,
          store,
        })
      )
    );

    for (
      var resolvedIndex = 0;
      resolvedIndex < resolvedPromises.length;
      resolvedIndex++
    ) {
      const resolvedContent = resolvedPromises[resolvedIndex];
      if (resolvedContent) {
        content = content.replace(
          resolvedContent.oldContent,
          resolvedContent.newContent
        );
      }
    }
  }

  const nodeData = Object.assign(
    {
      content,
      ...rest,
    },
    {
      id: createNodeId(`optimized-${id}`),
      parent: null,
      children: [],
      internal: {
        // create a new node; use the OG
        // transformed markdown for RSS feed
        // use this one for the dangerouslySetInnerHTML
        // on the actual blog
        type: `${options.nodeType}New`,
        mediaType: "text/markdown",
        content: content,
        contentDigest: createContentDigest(content),
      },
    }
  );

  createNode(nodeData);
};

const replaceImage = async ({
  cache,
  createNode,
  createNodeId,
  image,
  options,
  reporter,
  store,
}) => {
  const url = image.match(/https:\/\/[^\)]*/)[0];

  const imageNode = await downloadImage({
    cache,
    createNode,
    createNodeId,
    store,
    url,
  });

  if (!imageNode) return;

  let classes = options.wrapperStyle;
  let formattedImgTag = {};
  let title = image.match(/\[(|.)+?\]/)[0];
  title = title.substring(1, title.length - 1) || "";

  // I know ... somebody help me regex this better.
  if (!title || title === "[") {
    title = ``;
  }

  formattedImgTag.url = url;
  if (!formattedImgTag.url) return;

  formattedImgTag.classList = classes ? classes.split(" ") : [];
  formattedImgTag.title = title;
  formattedImgTag.alt = title;

  const fileType = imageNode.extension;

  // GIFs and SVGs are left untouched; everything else gets optimized
  if (fileType !== `gif` && fileType !== `svg`) {
    const rawHTML = await getStringifiedImageHtml({
      cache,
      formattedImgTag,
      imageNode,
      options,
      reporter,
      title,
    });

    if (rawHTML) {
      return { oldContent: image, newContent: rawHTML };
    }
  }
};

const downloadImage = async ({
  cache,
  createNode,
  createNodeId,
  store,
  url,
}) => {
  try {
    // cache is passed because createRemoteFileNode
    // keeps track of previously requested URLs
    // however the creation of the file node itself
    // is not cached. I thought that using
    // touchNode / defining createNodeId with a
    // deterministic, URL based, value would
    // allow for me to skip the creation of the node,
    // but keeping the node in the filesystem
    // doesn't give me access (here) to its values;
    // it could just be that the value
    // I need to cache is the stringified image HTML
    // to avoid hitting this method entirely.
    // In other words, full caching (and the
    // concomitant speed that would bring),
    // is still a WIP, and any / all
    // guidance would be appreciated!
    let fileNode = await createRemoteFileNode({
      url,
      store,
      cache,
      createNode,
      createNodeId,
    });

    return fileNode;
  } catch (e) {
    // right now the catch could trigger
    // for images that require auth
    // odds are internet connectivity is
    // the only other thing that would
    // trigger this, which is fine;
    // you'd get an exception well before
    // making it here if that were the case
    console.log(`Image download failed: `, e);
  }
};

const getStringifiedImageHtml = async function ({
  cache,
  formattedImgTag,
  imageNode,
  options,
  reporter,
  title,
}) {
  if (!imageNode || !imageNode.absolutePath) return;

  let fluidResultWebp;
  let fluidResult = await fluid({
    file: imageNode,
    args: {
      ...options,
      maxWidth: formattedImgTag.width || options.maxWidth,
    },
    reporter,
    cache,
  });

  if (options.withWebp) {
    fluidResultWebp = await fluid({
      file: imageNode,
      args: {
        ...options,
        maxWidth: formattedImgTag.width || options.maxWidth,
        toFormat: "WEBP",
      },
      reporter,
      cache,
    });
  }

  if (!fluidResult) return;

  if (options.withWebp) {
    fluidResult.srcSetWebp = fluidResultWebp.srcSet;
  }

  const imgOptions = {
    fluid: fluidResult,
    style: {
      maxWidth: "100%",
    },
    loading: `eager`,
    imgStyle: {
      opacity: 1,
    },
    key: imageNode.id,
    title: title,
    alt: title,
  };
  if (formattedImgTag.width) imgOptions.style.width = formattedImgTag.width;

  const imageLink = getImageSrc(fluidResult.originalImg);

  const ReactImgEl = React.createElement(Img.default, imgOptions, null);
  // fall back to the bare image so it still renders when linking is off
  const AnchorEl = options.linkImagesToOriginal
    ? React.createElement("a", { href: imageLink, key: imageLink }, ReactImgEl)
    : ReactImgEl;

  const FigCaption = title
    ? React.createElement(
        "figcaption",
        { key: title, style: { fontStyle: "italic", fontSize: "13px" } },
        title
      )
    : null;

  const Figure = React.createElement(
    "figure",
    { style: { textAlign: "center" } },
    [AnchorEl, FigCaption]
  );

  return ReactDOMServer.renderToString(Figure);
};
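On the caching question raised in the downloadImage comment, one possible direction is to cache the stringified HTML per image URL, so repeat builds can skip both the download and the sharp processing. The sketch below is an assumption about how that might look, not something the plugin does today: `getCachedHtml` and the key prefix are hypothetical, and the Map-backed `createFakeCache` only imitates the promise-based get / set shape of the cache helper Gatsby passes to plugins, so the example can run outside of a build.

```javascript
// Stand-in for Gatsby's `cache` helper (promise-based get/set), for demo only.
const createFakeCache = () => {
  const store = new Map();
  return {
    get: async (key) => store.get(key),
    set: async (key, value) => {
      store.set(key, value);
      return value;
    },
  };
};

// Hypothetical wrapper: return cached HTML for a URL, or build and store it.
const getCachedHtml = async ({ cache, url, buildHtml }) => {
  const key = `optimized-image-html-${url}`;
  const cached = await cache.get(key);
  if (cached) return cached;

  const html = await buildHtml(url);
  // Failed builds are not cached, so a later run can retry the download.
  if (html) await cache.set(key, html);
  return html;
};

// Usage: the expensive builder runs only once for the same URL.
const demo = async () => {
  const cache = createFakeCache();
  let builds = 0;
  const buildHtml = async (url) => {
    builds += 1;
    return `<figure><img src="${url}" /></figure>`;
  };

  const url = "https://example.com/raft.jpg";
  const first = await getCachedHtml({ cache, url, buildHtml });
  const second = await getCachedHtml({ cache, url, buildHtml });
  return { first, second, builds };
};

const demoResult = demo();
demoResult.then(({ builds }) => console.log(`builds: ${builds}`));
```

One caveat worth checking before relying on this: the cached HTML points at image files written to the public folder, so it is only safe as long as those files persist alongside the cache.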

  1. The blog is special to us because we can create awesome travel content on it, and I worked very hard to NOT work hard on it, if you know what I mean. This is something that flies under the radar, particularly in tech culture - technology is a means to an end and, as a perfectionist, resisting the urge to immediately build an RSS email service to alert subscribers to new blog posts (to give one example) proved difficult. That one’s still in the backlog, where perhaps it will remain till the perfect rainy day presents itself.

  2. Turns out this was a total pipedream, but I wasn't smart enough to realize it at the time. D'oh!

  3. Backfitted foresight - I was hoping to reuse the same options object for both this plugin and gatsby-remark-images!

  4. I was still going to need the original markdown with remotely sourced images to power my RSS feed.

  5. In addition, it seems Kyle Mathews wants more work on images to be done under the Gatsby hood, but this will still only be for images referenced from GraphQL.

  6. Sometimes that just happens. Sometimes it doesn't. I'm a spelunker. I love reading source code. I know that I would have gotten to reading gatsby-plugin-sharp's source and making the link between file nodes and what the fluid method takes in; I'm just glad that this one time I didn't have to go the full way down that road just to get a solution working.
