Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More webpack updates - package size optimizations and long term caching #1126

Merged
merged 22 commits into from
May 29, 2020

Conversation

salvatore-fxpig
Copy link
Contributor

@salvatore-fxpig salvatore-fxpig commented May 14, 2020

Description of proposed changes

The goal of this pull request is providing a stable and definitive solution to transpilation of extensions, using babel to transpile all the code (app and dependencies). This increases the build time to around 75s which is still largely tolerable, considering that rebuild time in dev mode is not affected.

while at the same time tackling the problem of package size creep by implementing a number of optimizations:

  • Splitting packages in vendor bundles and appbundle for long term caching.
  • Splitting away locales
  • Splitting away some heavy dependencies that are used only on specific paths, notably papaparse, dompurify, marked, react-ga
  • Using the optimized (smaller) version of react-select, excluding non-used parts of the library
  • Fixing the default @extensions path to avoid the webpack import being bamboozled
  • Reducing the package size of d3-scale by selectively including the used modules
  • Reducing the package size of lodash by applying lodash-webpack-plugin

Related issue(s)

#1106
#902

Testing

I tested the build, with nextstrain.org extensions enabled (but running as standalone), and removing the forced include of the transpiled version of iso-639-1 from the extension code, and it displays in IE11 and Safari 5.1 (the latter with some errors, the same on current master build).

The size of the core app bundle (i.e. the bundle that is downloaded when navigating to localhost:4000/ and then picking e.g. dengue) is reduced to 390kB (from 400kB), of which around 200kB can be cached on long term (i.e. change a couple times per year) and another 50kB can be cached for mid term (i.e. change once every couple months).

To make the nexstrain repo fully compatible with the present PR, a minimal update of the code is required to make nextstrain.org serve index.html from the dist/ folder, I opened a separate (backwards-compatible) PR on nextstrain.org.
nextstrain/nextstrain.org#166

Some more cross-browser testing would probably be a good idea.

Thank you for contributing to Nextstrain!

@salvatore-fxpig
Copy link
Contributor Author

It must be noted that absolute package size reduction is not that impressive mainly because full babel transpilation + some arrangements required to ensure real stable long term hashes end up bloating the package a fair bit.

@salvatore-fxpig
Copy link
Contributor Author

If this gets merged it's very important that somebody from the core team goes through the core-vendors configuration to make sure that those modules are really stable, both internally (i.e. you're happy with using same version of the module for a long time) and externally (i.e. the module is not going to be switched out).

@salvatore-fxpig salvatore-fxpig force-pushed the webpack-updates-4 branch 3 times, most recently from 5c0dc2b to db615d5 Compare May 14, 2020 19:44
Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool, and it's going to make auspice a lot better 😄

I've gone through the code and it all looks good, but have only tested briefly so far & will do more over the coming days.

In the meantime, let's get #1120 merged as this sits on top of that, and the comments there will also apply here.

babel.config.js Outdated
"@babel/preset-env",
{
useBuiltIns: "usage",
targets: "cover 95%",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this covers IE9 which is still needed for Auspice

async ensureLanguageResources(lang) {
for (const ns of ["language", "sidebar", "translation"]) {
if (!i18n.hasResourceBundle(lang, ns)) {
const res = await import(/* webpackMode: "lazy-once" */ `../../locales/${lang}/${ns}.json`); // eslint-disable-line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the locales are all bundled together (thanks for the clear comment related to this). Does this "lazy-once" behavior allow for locals to each be in their own chunk if one day we stop enforcing them to be together via the wepack config?

Copy link
Contributor Author

@salvatore-fxpig salvatore-fxpig May 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically you have 3 scenarios:

  1. Without an explicit locale config in webpack.config.js, and with no webpackMode: "lazy-once", each locale will create its own micro-chunk. This may seem a good idea at first glance because most people are only ever going to use one locale, however what happens is that various chunk merging plugins tend to merge these micro-chunks in very unpredictable ways and you get the locales bundled up with other stuff, defeating the purpose. So I would advise against this.

  2. With lazy-once and the explicit single-package locale config (as is now), the lazy-once is basically ignored, it only serves as a safeguard against scenario (1) above.

  3. If you decide to split locales further, e.g.

          locales1: {
            test: /\/src\/locales\//,
            name: "locales-others",
            enforce: true,
            priority: 1,
            chunks: "all"
          },
          locales2: {
            test: /\/src\/locales\/(es|pt|fr)\//,
            name: "locales-es.pt.fr",
            enforce: true,
            priority: 2, // This priority must be higher than the "locales-others" chunk
            chunks: "all"
          },

in that case you should absolutely remove the lazy-once directive because otherwise when the locales get lazy loaded, all the separate packages will be loaded at once (even if they are in multiple chunks, all chunks will be requested at once), thus nullifying the benefit of splitting them.

/**
* Here we put the libraries that are unlikely to change for a long time.
* Every change or update of any of these libraries will change the hash
* of the big vendor bundle, so we must be sure they're stable both internally
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this needs changing, the bundle hash will change and so will the script src (in dist/index.html), correct? And this means that the client will treat it as completely different request, and thus there's no danger of using the old, now invalid bundle from the cache?

If this is true, is there benefit to adding a (large) max-age to the response for these bundles, since if their contents change the hash will also change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes all of the above is correct, I didn't check how you're caching .js files atm on the public site but basically the whole point of this setup is you set index.html to be served with no no cache whatsoever (something like no-cache,no-store,must-revalidate), then you set nice and long (multiple months) caching for the .js part, and the hashes in the names of the scripts take care of cache busting for you.

And when you build the index.html will generate automatically with the correct hashes, modifying only those that should be modified.

Copy link
Member

@jameshadfield jameshadfield May 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! While nextstrain.org uses a custom server, I know there are other groups using auspice view as a production server, so we should set these here. Playing around with it, it seems as simple as

- res.sendFile(path.join(auspiceBuild.baseDir, "dist/index.html"))
+ res.sendFile(path.join(auspiceBuild.baseDir, "dist/index.html"), {headers: {"Cache-Control": "no-cache, no-store, must-revalidate"}});

and

- app.use("/dist", expressStaticGzip(auspiceBuild.distDir));
+ app.use("/dist", expressStaticGzip(auspiceBuild.distDir, {maxAge: '365d'}));

but wanted to double check with you (and the nextstrain.org server uses very similar code)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for forgetting to reply here, yep it should suffice as far as I can tell

});
const pluginHtml = new HtmlWebpackPlugin({
filename: 'index.html',
template: './index.html'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the template is unusable by itself, would it be more typical to move this to src/index.html?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I think that could be a good idea 👍

Copy link
Member

@jameshadfield jameshadfield May 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great -- i'm in favor of this. But currently, this doesn't build when auspice is run as a project's dependency with extensions. For instance, on nextstrain.org we get

internal/fs/utils.js:230
    throw err;
    ^

Error: ENOENT: no such file or directory, open '/Users/naboo/github/nextstrain/nextstrain.org/node_modules/auspice/index.html'
    at Object.openSync (fs.js:457:3)
    at Object.readFileSync (fs.js:359:35)
    at Object.exportIndexDotHtml (/Users/naboo/github/nextstrain/nextstrain.org/node_modules/auspice/cli/utils.js:112:17)
    at /Users/naboo/github/nextstrain/nextstrain.org/node_modules/auspice/cli/build.js:57:15
 ...

I think this is relatively straightforward to fix by updating exportIndexDotHtml()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually with index.html being outputted in the /dist folder, exportIndexDotHtml() becomes completely obsolete. As far as I can see, so I think it can be killed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of keeping the risk of breaking other moving parts at a minimum, it's still probably better to updated exportIndexDotHtml. This new updated version I pushed is compatible with nextstrain.org master as well.

@salvatore-fxpig
Copy link
Contributor Author

I added the "new" node_modules resolution from #1120 in a slightly different flavor, virtually this implementation is more robust, in practice it's gonna be the same for 99.99% of practical cases.

@salvatore-fxpig
Copy link
Contributor Author

I also switched a couple of dependencies and devDependencies, namely putting @babel in devDependencies was causing the build to fail when auspice is used as a module

@salvatore-fxpig
Copy link
Contributor Author

salvatore-fxpig commented May 26, 2020

In this latest commit I updated the bundling strategy with another tiny bit: using as entry point a micro-async entrypoint that only imports (asynchronously) the main, real entrypoint, and is hashed with [contenthash].

The reasoning for this is as follows:
If long term caching policies are applied, it will happen in some builds that one component chunk is updated (e.g., only the <Main> component is updated so its chunk hash, alone, is updated), while all the rest of the code and importantly all of the code contained in the entrypoint bundle is left untouched.

Now, in webpack, the URL of the initial synchronous chunks are dropped in the HTML, but the URLs of the async chunks are stored in the entrypoint (the main, non-numbered JS bundle). This means that we have this situation:

  1. If we use an algorithm that ignores the additional webpack code when calculating the hash (e.g. md5 [chunkhash]), the entrypoint bundle will keep the same hash, but people with a cached copy will then have an entrypoint bundle that points to a js chunk that does not exist anymore, and the site will break for them until they CTRL+F5
  2. If we use for the entrypoint bundle an algorithm that considers everything in the bundle, including the (modified) webpack import code, like [contenthash], then every time we change code anywhere in the codebase, both the hash of the chunk that includes that code and the entrypoint bundle hash will change, forcing everybody to download the entrypoint bundle (47kB gzipped) again.

The solution: having an ant-sized entrypoint bundle (<1.5kB) that basically just contains the updated URLs of the chunks at every build, and is [contenthash]ed so that everytime that any of the chunk hashes change, its hash changes, updating for everyone, so that caching is intact and overconsumption of bandwidth is minimal.

@salvatore-fxpig
Copy link
Contributor Author

To be noted that the bundlesize test, which hardcodes the vendor bundle hashes, has been passing nicely through these latest modifications which hit both the entrypoint and the webpack configuration, so we can see that they're decently stable... !

Due to the hardcoding of the hashes in the bundlesize configuration, it means the bundlesize test will fail if for some reason the vendor bundles for long term hashing are unwittingly changed.
Example: in a commit before, without realizing it, I had added back the

resolve: {
      mainFields: ['browser', 'main', 'module'],
...
}

from the other PR.
The build was still successful, but obviously the whole code (and hashes) changed, and the bundlesize test started to fail, which helped me figure out what was wrong.


componentWillMount() {
async componentWillMount() {
if (!this.props.language || this.props.language === "en") return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all translate keys return the same string value in english -- t(x) === x is mostly, but not always, true. I think the current implementation deliberately doesn't load the English local, instead utilizing the fact that i18n will fall back to the provided strings. This results in errors in only a few places, for instance clicking on the "Download data" modal produces

image

I think an acceptable solution would be to let the English locale be bundled wherever webpack wants to put it (it's <2kb gz) but keep the other languages in a separate bundle which is only fetched when necessary (as done now).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, I had completely missed this. Yes I think your solution is best, I modified a tiny bit index.js and webpack.config.js to accommodate for it.

Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resulting bundle sizes are pretty amazing and i'm really excited to get this out as it'll make a big difference.

The hot reloading ability of auspice develop seems to have gone with this PR (it was never perfect, but it's become worse here). The bundle is still being rebuild (the server prints messages to this effect, and reloading the page reflects changes) but the browser doesn't auto-update with changes. We can potentially fix this after merge if needs must.

@salvatore-fxpig
Copy link
Contributor Author

I must walk back on my initial claim that the modified node_modules searching function in webpack.config.js wouldn't make a difference in 99.9% of cases... It turns out that actually such a package conflict exists with the master version of nextstrain.org and as such the new searching function (which I further amended to solve a bug in it) turns out to be absolutely essential!

@salvatore-fxpig
Copy link
Contributor Author

The resulting bundle sizes are pretty amazing and i'm really excited to get this out as it'll make a big difference.

The hot reloading ability of auspice develop seems to have gone with this PR (it was never perfect, but it's become worse here). The bundle is still being rebuild (the server prints messages to this effect, and reloading the page reflects changes) but the browser doesn't auto-update with changes. We can potentially fix this after merge if needs must.

I managed to make (most of) hot reload work again altho it's not exactly a best-in-class solution, it should suffice while we try to figure out why it does not work as expected in the first place.

Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful. I'm going to take care of the cache-related headers in separate PRs (for both auspice & nextstrain.org), but this is working well in all my testing setups & represents a big step forward for auspice. Thanks again @salvatore-fxpig!

@jameshadfield jameshadfield merged commit 2a8f8a5 into nextstrain:master May 29, 2020
jameshadfield added a commit that referenced this pull request May 29, 2020
This commit (building on top of #1126) enforces that
index.html is never cached, as it gets recreated when auspice rebuilds and the src for the main JS bundle changes accordingly. This may not be strictly necessary, as the max-age was 0 and the ETag changes as the file changes, but setting this explicitly is good practice.

JS bundles, which use hashes to facilitate cache busting, are served with a 30d cache lifetime. Note that this is not enough to _force_ browsers to not revalidate within the 30d (i.e. 304 response) -- see [this post](https://stackoverflow.com/questions/26166433/how-to-prevent-request-that-returns-304/26339940#26339940) for context. Revalidation during the 30d lifetime will return a 304 which is no worse than the current situation we have.
@jameshadfield jameshadfield mentioned this pull request May 29, 2020
jameshadfield added a commit that referenced this pull request May 29, 2020
This commit (building on top of #1126) enforces that
index.html is never cached, as it gets recreated when auspice rebuilds and the src for the main JS bundle changes accordingly. This may not be strictly necessary, as the max-age was 0 and the ETag changes as the file changes, but setting this explicitly is good practice.

JS bundles, which use hashes to facilitate cache busting, are served with a 30d cache lifetime. Note that this is not enough to _force_ browsers to not revalidate within the 30d (i.e. 304 response) -- see [this post](https://stackoverflow.com/questions/26166433/how-to-prevent-request-that-returns-304/26339940#26339940) for context. Revalidation during the 30d lifetime will return a 304 which is no worse than the current situation we have.
jameshadfield added a commit that referenced this pull request Jul 14, 2020
When running `auspice view` there is some logic to ask if we have a customised client bundle present, which we should serve, or whether we should serve the vanilla (default) auspice client.

PR #1126 introduced hashes into the (client) JS bundle names, which resulted in this logic missing when there was a customised client present, so we would always fall back to the vanilla auspice client.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants