Investigate tree-shaking #24

borkdude · 2022-07-27T08:57:20Z

When using only one function from cherry/cljs.core.js, e.g. js-keys, and then processing this with a treeshaker like esbuild or ncc, it seems that it's still pulling in the entire package.

Repro:

index.mjs:

import { js_keys } from 'cherry-cljs/cljs.core.js'

console.log(js_keys({a: "hello"}));

shell:

echo '{}' > package.json
npm install --save-dev @vercel/ncc
npm install cherry-cljs
npx ncc build index.mjs -m

Output:

ncc: Version 0.34.0
ncc: Compiling file index.mjs into ESM
297kB  dist/index.mjs
297kB  [1023ms] - ncc 0.34.0

The text was updated successfully, but these errors were encountered:

borkdude · 2022-07-27T09:09:11Z

@thheller I haven't looked into the esm build implementation that closely yet, but maybe this rings any bells?

thheller · 2022-07-27T10:05:24Z

Regular JS tree-shaking is very limited. It is not comparable to the Closure Compiler :advanced optimizations, and recognizes very few patterns. That works when you write code specifically for it, but quickly gives up otherwise.

The :target :esm is just a "hack" to make the :advanced output look like ESM when it technically isn't.

So, for exports it does this and the final JS output it generates looks something like:

// defined as a prepend, not going through :advanced
// each export gets a placeholder name
let export_js_keys;

// then actual advanced output of module
<advanced output>

// defined as a append, also added after :advanced
// each placeholder name is then exported as the actual configured export name
export { export_js_keys as js_keys };

Where part of the :advanced input will include a export_js_keys = cljs.core.js_keys;. The export_js_keys is added to externs so it doesn't get renamed. So, what will remain is export_js_keys = aA (or just js-keys impl directly inlined). This is done because there is no other way to have the Closure Compiler generate export statements itself directly. You cannot feed it export since :advanced will remove it.

The regular JS tree-shaking doesn't recognize this and as such doesn't eliminate anything.

borkdude · 2022-07-27T10:21:55Z

Makes sense.

borkdude · 2022-07-27T11:12:41Z

This might be a hack, but it seemed to work. Invoking the google closure compile once again:

npx google-closure-compiler --module_resolution NODE --compilation_level ADVANCED  --js=node_modules/cherry-cljs/lib/cljs_core.js --js=node_modules/cherry-cljs/cljs.core.js --js=index.mjs  --js_output_file=dist/out.js --language_in ECMASCRIPT_2020

Instead of 300kb we now end up with 160kb.

borkdude · 2022-07-30T08:34:08Z

Perhaps we can do some rewriting manually on the outputted cherry/cljs.core ES6 module as a one time hack on the Closure output. A nice research project of its own which is fairly isolated. If anyone wants to investigate this, help is welcome!

jo-sm · 2022-08-03T13:38:06Z

I did some research and found a pretty comprehensive comment on this topic from Evan Wallace, who wrote esbuild (I also found a related comment). Long story short, the optimizations that Closure Compiler does, by looking at object properties and rewriting/eliminating the ones that aren't used, are considered to be unsafe and inherently problematic, and in general tools such as esbuild or Terser do more conservative static analysis, and this kind of analysis and minimization works well for basically all JS code that isn't written in Closure Compiler's format.

Another point of consideration is that all of the current minimization tools (including the two above) are basically on par or sometimes slightly better at minimizing JS code than Closure Compiler based on this benchmark.

With these in mind, and since cherry is targeting a more standard ES6 output, I would say that it's probably worthwhile to rewrite cljs_core.js and related files to work well with JS minimizers. I believe this should be possible by either making a bespoke script that analyzes the AST of that file and rewrites it (using, for example, babel), or doing it by hand (probably the former since it's automatically generated 🙂).

borkdude · 2022-08-03T13:42:25Z

@jo-sm Thanks. Yes, rewriting the Closure output is probably the best idea. That, or bootstrapping core itself with cherry (which will be more work but maybe in the future, attainable).

borkdude · 2022-08-03T13:46:51Z

(Or hacking Closure)

jo-sm · 2022-08-03T13:58:05Z

I would guess that for the short term rewriting makes sense, but it depends on how simple/complex/feasible bootstrapping in Cherry is. Do you think it makes sense to automate it, and if so, is it fine to hardcode some of the things based on the output generated by shadow (e.g. I noticed that there is $APP)? If that's acceptable then I think it should be somewhat straightforward to rewrite the AST, and I could look into it (I've done something similar experimenting rewriting the Google Closure libraries to be entirely ES6 compatible with import/export).

borkdude · 2022-08-03T14:00:23Z

@jo-sm Yes, it will take a long time before we're able to fully bootstrap, if we will ever get there: it depends on if it's worth pursuing as well. So for now, the rewriting strategy seems good. You can probably get some insights using --debug as well. Thanks for looking into this.

thheller · 2022-08-03T16:12:26Z

Another point of consideration is that all of the current minimization tools (including the two above) are basically on par or sometimes slightly better at minimizing JS code than Closure Compiler based on this benchmark.

I have to strongly disagree here. The JS tools are not even remotely close.

One big caveat of course is that you only get the full benefit if you have code that was written with :advanced in mind. This nowadays includes proper ES6+ code and is no longer limited to ClosureJS. Testing it against random npm libraries is pointless, it also misses the "whole program optimization" part that :advanced is built around. If you disregard this then the Closure Compiler is indeed average. But thats like saying a Plane is as fast as a Car, and never actually letting the Plane fly.

Another huge caveat is of course the flexibility you lose. Sometimes its not worth losing that, especially when publishing libraries that then run through another build tool again. Of course the unsafe part is true, thats why externs are sometimes an issue still.

(e.g. I noticed that there is $APP)?

shadow-cljs uses the $APP object as basically the root of everything. So cljs.core/assoc becomes $APP.cljs.core.assoc but due to :advanced collapsing namespaces and renaming everything its something like $APP.ab. Each module will import this $APP via import and also re-rexport it. It'll also add its own properties to it. Basically this is the "shared global namespace" that we no longer have otherwise because of module isolation.

This is actually just using the :rename-prefix-namespace feature from the closure compiler. Basically it just prefixes all objects that would otherwise end up in the global namespace with $APP.. The modules import and re-rexporting it is just added by shadow-cljs. Multi-module :browser builds also use this.

jo-sm · 2022-08-03T16:49:50Z

We can agree to disagree 🙂 At the risk of making the thread go into the rabbit hole of GCC discussion: Yes it's not running benchmarks against big projects, but it's running benchmarks against nontrivial JS code (these libraries are large, including ones like echarts which is 3MB uncompressed), so it's a decent indication of how the minimizer will work against normal JS code (as in, code that isn't necessarily written with GCC in mind, which the majority of large CLJS projects use via npm). I have not myself benchmarked GCC against other compilers in terms of "whole program optimization" but I am skeptical that the difference matters enough in practice to matter -- but I would be open to being proven wrong here, and I'll leave it at that.

With all this in mind I'll look at rewriting cljs_core.js to export the functions like js_keys directly and see how things go from there. @borkdude I just need to run bb run build to build the npm package files, right?

borkdude · 2022-08-03T16:53:02Z

I just need to run bb run build to build the npm package files, right?

If you want to discuss GCC vs other tools, you can open a discussion here.

thheller · 2022-08-03T17:01:19Z

... , but it's running benchmarks against nontrivial JS code

That is missing the point. The point of this thread and my comment is tree-shaking or dead code elimination, ie. removing unused code. When minifying libraries on their own this obviously cannot be done. It can't know what the consumer is going to use, so it has to keep everything. So, you are really just minifying. We agree that the Closure Compiler is an average (and slow) minifier compared to the alternatives.

That is what the benchmark you linked is testing. It is not testing tree-shaking or DCE, which is sort of what :advanced is all about.

If you are fine with just minifying just use :simple.

borkdude · 2022-08-03T17:45:09Z

I think it's a good discussion to have, but let's continue here: #36

jo-sm · 2022-08-03T18:24:46Z

Hmm, I'm having trouble running bb run build -- I started a thread on the #cherry slack channel to discuss further. I think I might just be missing something minor.

borkdude · 2022-08-03T18:42:25Z

@jo-sm You probably just need to upgrade bb.

borkdude · 2022-08-07T18:05:25Z

Update from #36:

@thheller did some experiments with ES module, but unfortunately, binding (and in general set!) in cljs.core generates a direct re-assignment of vars the were imported which isn't valid JS...

thheller/shadow-cljs@e8e0cf9

This is also a challenge for cherry itself btw.

jo-sm · 2022-09-07T15:31:42Z

BTW just to give a quick update: I worked on it a bit a while back but had to drop it to deal with other things going on (for now, at least). I would like to pick it back up and finish my prototype, but basically what I did was to perform a few operations:

In the original version of cljs_core.js, there're a lot of "indirect" assignments, basically something like var x = y; var y = z which can be simplified to var x = z. I simplify these cases.
If the export is a function, instead of defining it like var some = function () {...}, I define it as function some() {...}.
For everything exported, I determine what exactly is needed from outside of the scope and perform a kind of depth first search to get the order in which these things are defined. We then keep only those, add them to an empty new AST in order, already eliminating code that isn't used (which there seems to be quite a lot of, particularly Google Closure code).

I was also doing this in TypeScript because working with the JS AST is otherwise a huge pain (it still kind of is, but TS makes it much easier to work with and makes me feel more confident because of the complexity of the JS AST).

It was still a bit buggy when I worked on it a few weeks ago, but I was able to kind of get proper dead code elimination working.

borkdude · 2022-09-07T15:35:39Z

@jo-sm Very interesting and hopeful! I'd love to see continued work on this.

borkdude added the help wanted Extra attention is needed label Jul 27, 2022

mk mentioned this issue Aug 4, 2022

Set :min-bb-version in bb.edn #37

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate tree-shaking #24

Investigate tree-shaking #24

borkdude commented Jul 27, 2022

borkdude commented Jul 27, 2022

thheller commented Jul 27, 2022 •

edited

Loading

borkdude commented Jul 27, 2022

borkdude commented Jul 27, 2022

borkdude commented Jul 30, 2022 •

edited

Loading

jo-sm commented Aug 3, 2022 •

edited

Loading

borkdude commented Aug 3, 2022

borkdude commented Aug 3, 2022

jo-sm commented Aug 3, 2022

borkdude commented Aug 3, 2022

thheller commented Aug 3, 2022

jo-sm commented Aug 3, 2022

borkdude commented Aug 3, 2022

thheller commented Aug 3, 2022

borkdude commented Aug 3, 2022

jo-sm commented Aug 3, 2022

borkdude commented Aug 3, 2022

borkdude commented Aug 7, 2022

jo-sm commented Sep 7, 2022

borkdude commented Sep 7, 2022

Investigate tree-shaking #24

Investigate tree-shaking #24

Comments

borkdude commented Jul 27, 2022

borkdude commented Jul 27, 2022

thheller commented Jul 27, 2022 • edited Loading

borkdude commented Jul 27, 2022

borkdude commented Jul 27, 2022

borkdude commented Jul 30, 2022 • edited Loading

jo-sm commented Aug 3, 2022 • edited Loading

borkdude commented Aug 3, 2022

borkdude commented Aug 3, 2022

jo-sm commented Aug 3, 2022

borkdude commented Aug 3, 2022

thheller commented Aug 3, 2022

jo-sm commented Aug 3, 2022

borkdude commented Aug 3, 2022

thheller commented Aug 3, 2022

borkdude commented Aug 3, 2022

jo-sm commented Aug 3, 2022

borkdude commented Aug 3, 2022

borkdude commented Aug 7, 2022

jo-sm commented Sep 7, 2022

borkdude commented Sep 7, 2022

thheller commented Jul 27, 2022 •

edited

Loading

borkdude commented Jul 30, 2022 •

edited

Loading

jo-sm commented Aug 3, 2022 •

edited

Loading