Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regex based filter in asset-import-meta-url ignores actual code in some cases #7632

Closed
7 tasks done
jawadsh123 opened this issue Apr 6, 2022 · 11 comments Β· Fixed by #7650
Closed
7 tasks done

regex based filter in asset-import-meta-url ignores actual code in some cases #7632

jawadsh123 opened this issue Apr 6, 2022 · 11 comments Β· Fixed by #7650

Comments

@jawadsh123
Copy link

Describe the bug

Hey!

When using new URL(..., import.meta.url), for some cases vite does not transform the URLs to point to bundled assets (only when building though, works fine in dev πŸ§‘β€πŸ’»)

Reproduction

Go to repro and in the terminal do

npm run build
npm run preview

Browser console should show (notice the second URL isn't transformed)
image

Why?

It seems these regex filters are at fault

const noCommentsCode = code
.replace(multilineCommentsRE, blankReplacer)
.replace(singlelineCommentsRE, blankReplacer)
.replace(stringsRE, (m) => `'${'\0'.repeat(m.length - 2)}'`)

// before

import "./App.css";
import { jsx as _jsx } from "react/jsx-runtime";
import { Fragment as _Fragment } from "react/jsx-runtime";
console.log("1", new URL("./logo.svg", import.meta.url).href);
console.log("2 //", new URL("./logo.svg", import.meta.url).href);
function App() {
  return /* @__PURE__ */ _jsx(_Fragment, {
    children: /* @__PURE__ */ _jsx("img", {
      src: new URL("./logo.svg", import.meta.url).href
    })
  });
}
export default App;
// after filtering with regexes

import '';
import { jsx as _jsx } from '';
import { Fragment as _Fragment } from '';
console.log('', new URL('', import.meta.url).href);
console.log(''img''./logo.svg", import.meta.url).href
    })
  });
}
export default App;

More importantly, the order in which they are applied seems to be the problem
i.e. since singlelineCommentsRE is quite broad, removing line comments before strings messes the code in weird ways 🐞

How do we fix? πŸ”¨

AFAICT removing strings before line comments fixes such problems (shouldn't create new bugs either?)

 const noCommentsCode = code 
   .replace(multilineCommentsRE, blankReplacer) 
-  .replace(singlelineCommentsRE, blankReplacer)
-  .replace(stringsRE, (m) => `'${'\0'.repeat(m.length - 2)}'`)
+  .replace(stringsRE, (m) => `'${'\0'.repeat(m.length - 2)}'`)
+  .replace(singlelineCommentsRE, blankReplacer) 

PS: I stumbled across this when trying to consume Emscripten's release build (-Oz) from vite. In release emcc minifies all JS and puts it in single line, that line had // somewhere in the middle, and everything went kaboom πŸ’₯ 🀣

Reproduction

https://stackblitz.com/edit/vitejs-vite-vc6p2d?file=src%2FApp.tsx&terminal=dev

System Info

System:
    OS: macOS 12.1
    CPU: (10) arm64 Apple M1 Max
    Memory: 84.89 MB / 32.00 GB
    Shell: 5.8 - /bin/zsh
  Binaries:
    Node: 16.13.2 - ~/.asdf/shims/node
    Yarn: 1.22.17 - ~/.asdf/shims/yarn
    npm: 8.1.2 - ~/.asdf/shims/npm
  Browsers:
    Chrome: 99.0.4844.84
    Safari: 15.2
  npmPackages:
    @vitejs/plugin-react: ^1.0.7 => 1.1.4
    vite: ^2.7.2 => 2.7.10

Used Package Manager

npm

Logs

No response

Validations

@jawadsh123
Copy link
Author

AFAICT removing strings before line comments fixes such problems (shouldn't create new bugs either?)

 const noCommentsCode = code 
   .replace(multilineCommentsRE, blankReplacer) 
-  .replace(singlelineCommentsRE, blankReplacer)
-  .replace(stringsRE, (m) => `'${'\0'.repeat(m.length - 2)}'`)
+  .replace(stringsRE, (m) => `'${'\0'.repeat(m.length - 2)}'`)
+  .replace(singlelineCommentsRE, blankReplacer) 

scratch that, it'll mess stuff up even more 😞, stringsRE matches across lines
maybe singlelineCommentsRE could be toned down a bit, to not match // that are within quotes?

@poyoho
Copy link
Member

poyoho commented Apr 7, 2022

Thank you very much for your correction 😊

@bluwy
Copy link
Member

bluwy commented Apr 7, 2022

maybe singlelineCommentsRE could be toned down a bit, to not match // that are within quotes?

Could be worth a shot, but we had tried something similar (in a different area) in the past, and it opened up more edge cases. But would be nice if you can find a robust regex for it.

@jawadsh123
Copy link
Author

@bluwy on second thought, I do feel it will be quite finicky and hard to get right

another idea, maybe we can bank on the non-overlapping nature of regex matches and combine all our current filters into one giant regex, something like this -> https://regex101.com/r/I3D8t8/1
the idea is,

  1. go over the string left->right
  2. find the left-most match for any of the three filters
  3. process it, move ahead in string, and repeat from step 1 πŸ”

I feel this can work. I had the idea to do this manually, but regex engines seem to do this for ORed expressions anyways
@bluwy @poyoho do y'all see any problems with this approach? I can chalk up a PR if it feels alright

@poyoho
Copy link
Member

poyoho commented Apr 7, 2022

FYI, I think there is also a problem with this replacement. There will be a matching error if there are sub-strings of the same length in the same string. I also try to solve this problem normally, it's not just assetImportMetaUrl plugins that need to be changed. #7549

@poyoho
Copy link
Member

poyoho commented Apr 7, 2022

maybe singlelineCommentsRE could be toned down a bit, to not match // that are within quotes?

// "

const a = 1

// "

It also bad

@jawadsh123
Copy link
Author

I also try to solve this problem normally, it's not just assetImportMetaUrl plugins that need to be changed. #7549

@poyoho really nice idea of having general utils for regexes, should make the code lot nicer to read πŸš€

There will be a matching error if there are sub-strings of the same length in the same string.

hmm, I don't think I totally understand, can you pls share some example code where it can break?

maybe singlelineCommentsRE could be toned down a bit, to not match // that are within quotes?

// "

const a = 1

// "

It also bad

agreed, improving singlelineCommentsRE is very tricky
which is why was thinking of that second approach

@poyoho
Copy link
Member

poyoho commented Apr 8, 2022

I test my case in your provide regexp is well
https://regex101.com/r/VikP6f/1

@poyoho
Copy link
Member

poyoho commented Apr 8, 2022

there a error in nested string template https://regex101.com/r/0GKkgF/1 πŸ˜…

@poyoho
Copy link
Member

poyoho commented Apr 9, 2022

https://regex101.com/r/JSS4XV/1. but javascript regexp don't support nested syntax.

@jawadsh123
Copy link
Author

thanks @poyoho!

@github-actions github-actions bot locked and limited conversation to collaborators Apr 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants