Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gatsby): use V8.serialize instead of JSON.stringify if available #10732

Merged
merged 24 commits into from
Apr 11, 2019

Conversation

stefanprobst
Copy link
Contributor

Use v8.serialize instead of json-stringify-safe for persisting redux state for a small performance gain (from ~250ms to ~150ms in the markdown benchmark). Useful?
NOTE: Could only be merged once we switch minimum Node version to 8.0.

@stefanprobst stefanprobst requested a review from a team as a code owner December 30, 2018 21:56
@KyleAMathews
Copy link
Contributor

Oh nifty. Didn't know about this. We could check the node version being used and use one or the other.

@pieh
Copy link
Contributor

pieh commented Dec 31, 2018

or just straight check if v8.serialize and v8.deserialize exists, and fallback to current method if they don't. @stefanprobst, did you measure memory usage by any chance with your changes (json-stringify-safe can hog so much memory sometimes)?

@stefanprobst
Copy link
Contributor Author

I'll add a fallback next year, @pieh no memory usage numbers yet, but it seems that v8 serialization happens out-of-process, i.e. process.memoryUsage().heapUsed stays constant.

@stefanprobst
Copy link
Contributor Author

Some memory usage numbers: markdown benchmark is at 73%. Also did a benchmark with a 100.000 entries Map (10 run average):

v8
time: 466ms
ΔheapUsed: 3 MB
Δexternal: 207 MB

json-stringify-safe
time: 1577ms
ΔheapUsed: 444 MB
Δexternal: 112 MB

Two questions:

  • why json-stringify-safe and not JSON.stringify?
  • any reason for writeFile not writeFileSync in the event callback?

@DSchau
Copy link
Contributor

DSchau commented Jan 2, 2019

@stefanprobst whoa! That is looking encouraging! Nice work!

@pieh
Copy link
Contributor

pieh commented Jan 2, 2019

  • why json-stringify-safe and not JSON.stringify?

"Like JSON.stringify, but doesn't throw on circular references." - there were issues with those in the past, so this package was taking care of them.

  • any reason for writeFile not writeFileSync in the event callback?

We shouldn't be blocking main thread here. Are there any advantage to using sync version?

@stefanprobst
Copy link
Contributor Author

stefanprobst commented Jan 2, 2019

circular references

Where are they coming from? I'm asking because we cannot recreate the references with JSON.parse? And also because easiest would be to just use jest-serialize.

any advantage to using sync version?

From the docs: It is unsafe to use fs.writeFile() multiple times on the same file without waiting for the callback.

@stefanprobst
Copy link
Contributor Author

Has anyone had a chance to test this?

@pieh
Copy link
Contributor

pieh commented Jan 11, 2019

@stefanprobst We didn't.

This will need to wait at least a week, because most of us will be offline next week - so we wouldn't be able to deal with any potential bugs it might introduce

@stefanprobst
Copy link
Contributor Author

stefanprobst commented Jan 15, 2019

One issue: v8.serialize throws when it encounters a function, e.g. like here.

EDIT: This is also an issue with gatsby-plugin-feed, which allows functions in plugin options.

@ivanoats
Copy link
Contributor

This looks really promising. We could use it on large (18K+ pages) site that is not building because our cache is too big (can't serialize redux index saveSate) here.

How could we help with this?

@stefanprobst
Copy link
Contributor Author

How could we help with this?

@ivanoats Would be awesome if you could test this with your site and report back your findings!

@stefanprobst
Copy link
Contributor Author

Not sure how to proceed with this, but another idea is to break the node store out of app state, i.e. keep using json-stringify-safe for everything but the nodes redux namespace, where we could switch serialization method. We would have to separate out nodes anyway once we fully make the Loki switch.

@ivanoats
Copy link
Contributor

ivanoats commented Mar 8, 2019

We (@rsstdd and I) tested this just now. Our site looks fine in develop, and is no longer crashing on build ... but it looks like many files are missing from the build. Some images, JSON files, and JS files.
image

I think this is a good direction to go in, not sure how best to help with @stefanprobst 's code right now. It looks like he was looking for feedback from other contributors.

@stefanprobst
Copy link
Contributor Author

@ivanoats Thanks so much for testing this! To quickly clarify: on your site this works without issues with gatsby develop - but with gatsby build you get missing files? Is this with a cleared .cache folder or without?

@rsstdd
Copy link

rsstdd commented Mar 10, 2019

@stefanprobst We were able to build ~30k pages after removing .cache and public for both gatsby develop and gatsby build.

Thank you for your work!

@stefanprobst
Copy link
Contributor Author

@rsstdd That's great!! Do you still see issues with a populated .cache folder? Because that would indicate that we are losing something somewhere in serialization=>deserialization.

@DanielRuf
Copy link
Contributor

The Serialization API is currently experimental and the stability is 1. Generally I would not recommend to use this in production or in general as this could change at any time in the future (Node 11, 12, ...).

https://nodejs.org/api/v8.html#v8_serialization_api

@stefanprobst
Copy link
Contributor Author

@DanielRuf Well it has been added in Node 8 and is e.g. used by jest, so I would not exactly call this unstable.

@DanielRuf
Copy link
Contributor

Well, Jest supports node >= 6 and uses fallbacks for good reasons.

https://github.com/facebook/jest/blob/master/packages/jest-serializer/src/index.ts

Still, I would not recommend this without proper fallbacks.

@stefanprobst
Copy link
Contributor Author

jest considers v8.serialize stable enough to use it by default when it is available (Node 8+), and falls back to JSON.stringify when it is not. This is also what this PR does. Anyway, this is more to gather some more real world data if using v8 would help with the oom issues some people are experiencing in saveState.

@wardpeet
Copy link
Contributor

wardpeet commented Mar 14, 2019

@rsstdd thanks for testing & @stefanprobst thanks for working on this! I'll ping the @gatsbyjs/core to see if we can get this merged.

@stefanprobst
Copy link
Contributor Author

@wardpeet Cool! One open issue is how to deal with nodes that contain functions, e.g. gatsby-plugin-feed which allows functions in plugin options, which will be part of its SitePlugin node.

@KyleAMathews
Copy link
Contributor

👍 Question is if this would be a breaking change, i.e. can we do this in v2?

Kinda. Since these already weren't being serialized, they were already broken.

@pieh pieh changed the title [rfc] Use V8.serialize [wip] feat(gatsby): use V8.serialize instead of JSON.stringify if available Apr 5, 2019
@pieh pieh changed the title [wip] feat(gatsby): use V8.serialize instead of JSON.stringify if available feat(gatsby): use V8.serialize instead of JSON.stringify if available Apr 7, 2019
@pieh pieh removed the status: WIP label Apr 7, 2019
@DSchau
Copy link
Contributor

DSchau commented Apr 9, 2019

Possible issues:

  • Symbol
  • Functions
  • etc.

(e.g. things that can't be serialized with v8)

What we need to do now is make sure that we're not breaking unrelated sites--it does appear to fix the issue though.

@pieh
Copy link
Contributor

pieh commented Apr 10, 2019

I just finished verifying v8.serialize changes - sites from site showcase that actually builds after fresh clone also builds with v8.serialize.
From 52 sites that I build:

  • 39 were just OK in both of them
  • 10 failed baseline build ( didn’t build with just cloning, installing deps and running gatsby build ) - so I skipped those.
  • 1 failed … because The engine "node" is incompatible with this module. Expected version "8.11.x". Got "10.9.0" during deps installation - skiped that one as well
  • 2 initially failed with v8.serialize, but passed baseline test correctly, but after investigation those were related to plugins breaking (one was with gatsby-remark-relative-images - 3rd party package, and the other one was gatsby-mdx that is fixed in rc channel - error UNHANDLED EXCEPTION Error ChristopherBiscardi/gatsby-mdx#342 ) - after fixing/adjusting deps for those, v8.serialize built as well.

So I'm feeling fairly confident in going forward with this change without doing any feature flags

Copy link
Contributor

@wardpeet wardpeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some bikeshed comments but this looks nice! @pieh lets ship this? or you rather want to do it under a tag first?

packages/gatsby/src/redux/index.js Outdated Show resolved Hide resolved
packages/gatsby/src/db/node-tracking.js Show resolved Hide resolved
packages/gatsby/src/db/node-tracking.js Outdated Show resolved Hide resolved
packages/gatsby/src/db/node-tracking.js Outdated Show resolved Hide resolved
@pieh
Copy link
Contributor

pieh commented Apr 11, 2019

I forgot to mention this before. I published this under dist-tag, so if anyone want to try - install gatsby@v8-serialize.

@pieh
Copy link
Contributor

pieh commented Apr 11, 2019

Here's also some chart showing memory usage improvements - https://lucid-galileo-3c3f54.netlify.com/

Things to check - in first graph (using JSON.stringify) there are huge memory spikes when we persist state (marked by green areas), and eventually the process crashes (around ~84s when memory usage "flatlines"). Second chart (when using those changes) memory barely spikes when persisting state

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants