Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RRFC] npmrc file improvements. #427

Open
EvanCarroll opened this issue Aug 6, 2021 · 6 comments
Open

[RRFC] npmrc file improvements. #427

EvanCarroll opened this issue Aug 6, 2021 · 6 comments

Comments

@EvanCarroll
Copy link

EvanCarroll commented Aug 6, 2021

There are two current problems that I see with npmrc files,

  1. They're very suseptable to mismatch as they follow a format of

    @scope=${URL}
    ${URL}/:_authToken=${TOKEN}
    

    This means you put in the URL twice, and it's not clear exactly that it's a URL to begin with. You can see this bug here where I just suffered from this problem. [BUG] npmrc errors related to lacking authToken are obscure cli#3618

  2. They do not differentiate between push and pull end-points which is need by CI tools like GitLab. Currently, GitLab hosts their npm registry on an unprivledged port 5050. When you pull from packages this is what you're supposed to pull from. When you push packages you're supposed to submit to their privileged V4 Package API (which itself is interfaced with like a registry). You can see in their official docs they create a .npmrc file with,

    @foo:registry=https://gitlab.example.com/api/v4/projects/${CI_PROJECT_ID}/packages/npm/
    //gitlab.example.com/api/v4/projects/${CI_PROJECT_ID}/packages/npm/:_authToken=${CI_JOB_TOKEN}
    

    I suspect strongly the reason why they do NOT use the unprivledged $CI_REGISTRY (and instead use $CI_SERVER_HOST in their examples) in the top is because if they did, it would expand with the port number to acme.net:5050 and thus mismatch with the auth line below. But ideally what they want is the ability for both people with and without publishing rights to pull from the unprivledged port 5050, and to push to the privledged https://gitlab.example.com/api/v4 (or $CI_API_V4_URL)

    What would be ideal here is .npmrc supported something more like this,

    "@scope": {
      "push": {
        repoUrl: $url,
        authToken: $token
      }
      "pull": {
        repoUrl: $url,
        authToken: $token
      }
    }
    

    Then you could never mismatch the token's connection to the scope, or the repo url. And GitLab could permit users internal and exteral to pull from the unprivledged port 5050 repo, regardless of whether or not they intend to publish later. And, without rewriting their npmrc after they pull to support pushing.

@darcyclarke darcyclarke added the Agenda will be discussed at the Open RFC call label Aug 11, 2021
@isaacs isaacs added the config label Aug 11, 2021
@isaacs
Copy link
Contributor

isaacs commented Aug 11, 2021

The problem with this proposal is that, while it does address the use case described, it is (a) a breaking change, and (b) not nearly as big a breaking change as we need.

We have been talking about a massive overhaul of npm's configuration management layer, and we can keep this open and tagged with the "config" label to ensure that it gets considered in that process. But what we really need is something much larger in scope to solve a number of other problems in npm today.

From today's rfc call:

  • @isaacs
    • have been considering improvements internally
    • this initial discussion/issue doesn't go as far as we'd like if we're going to make a breaking change to .npmrc/config
    • npm install --regsitry=foo <- typos don't throw (ie. infinite options/config)
    • lets eliminate nopt
  • @naugter
    • consider making this an ecosystem package first & make it optional before shipping it by default
  • @darcyclarke
    • what about something like --thrown-on-unkown to curb the infinite supported config problem we have today
      • the fact that this is misspelled here in the notes is :male-cook: 😚 perfect
  • Action: audit config / propose a more comprehensive RFC

@EvanCarroll
Copy link
Author

discussion/issue doesn't go as far as we'd like

It's not clear to me why the proposal is less aggressive then what you're looking for. I would prefer if you're going to do everything, to drop json and use toml instead. But I take it that's not what you're referring too. What more do you want to solve that this configuration wouldn't solve regardless of the serialization format..

"@scope": {
  "push": {
    repoUrl: $url,
    authToken: $token
  }
  "pull": {
    repoUrl: $url,
    authToken: $token
  }
}

@isaacs
Copy link
Contributor

isaacs commented Aug 12, 2021

So, what you're proposing here is making the "scope" field a top-level config item, which contains nested objects. This is possible (albeit a bit weird) to serialize using the existing ini format, like this:

[@scope]
[@scope.push]
repoUrl=$url
authToken=$token
[@scope.pull]
repoUrl=$url
authToken=$token

However, this opens a handful of questions:

  1. npm configs are nested. What do you do if @scope.push.repoUrl is defined in the project .npmrc, but @scope.push.authToken is defined in the user-level ~/.npmrc? So we have to decide whether objects clobber one another, or get merged together.
  2. How would you specify @scope.push.repoUrl in a command-line flag? In an environment variable?
  3. What other things do we put in there? Adding just one nested object-merging thingie is a recipe for a wart. If we're going to do it, let's do it, and tackle some of the bigger problems with configuration management.

Among those problems:

  • We right now accept any old string as a flag, so npm publish --regsitry=https://example.com publishes to the default registry.
  • We warn on invalid values, but then throw them out, so npm publish --access=restircted will just ignore the misspelled value, which is Very Bad if it results in making something public that shouldn't be.
  • People want to have different scope/registry settings per workspace, which there is right now no way to even begin to support.
  • We have a lot of booleans that should really be enums, and the only way we rationalize them right now is to squash them down, which means a lot of unfortunate side-effects that makes the code hard to reason about. There are multiple things you might guess that npm install foo --save-dev --save-peer --save-optional does, but most of those guesses are wrong. npm install foo --save-type=peerOptional is much clearer.
  • The parser (nopt) is super dated, basically a spike that was pushed out into the world before it was ready, and then frozen in amber by virtue of being a huge breaking change to ever modify. It's time to bite that bullet.

We are not likely to replace JSON with toml anywhere it's currently being used. Replacing ini is up for discussion. Having multiple different formats in play at once (eg, supporting both .npmrc and .npmrc.json) is not.

In npm v7, we did make some progress towards making it possible to refactor config in a major way, by consolidating a lot of the logic and definitions into one place, and passing around a plain-old-js object to npm/cli's many dependencies (so that they didn't have to read from npm's internal config.get(...) method and do this side-effect management in multiple places). However, we flinched away from actually changing too much in the config semantics, because one Big Change is the maximum for any semver major release, and we already had peerDep installation.

But getting to the specifics suggestion here:

What more do you want to solve that this configuration wouldn't solve regardless of the serialization format..

My main point of feedback is that authentication should be per-registry rather than per-scope. I think mapping the registry for a given scope separately for reads and writes is a good idea, but I'm not sure it makes sense for the auth token to be different per-scope.

Unless, I guess, if you wanted to publish all the packages in the @foo scope under one account, but all the packages in the @bar scope under another account, both to the same registry? That seems a bit weird to me, but it may be I'm just not being imaginative enough 😅

So, maybe a data structure something like this would make sense:

{
  "registry": {
    "janky.registry.mycompany.com/path/to/registry/endpoint": {
      "timeout": 60000, // this is a little slow sometimes
      "prefer-online": true, // always be refreshing the cache
      "push": { // settings to use when doing any PUT/DELETE/POST
        "token": $token,
        "retries": 999 // keep retrying
      },
      "pull": { // settings to use when doing any GET/HEAD
        "token": $readOnlyToken,
        "timeout": 6000 // overrides the value above
      }
    },
    "registry.npmjs.org": {
      // settings for the public registry
    },
  },
  "scopes": {
    "@mycompany": {
      "registry": "https://janky.registry.mycompany.com/path/to/registry/endpoint"
    },
    // ...
  }
}

Once we parse the config files, and the cli arguments, and the environment variables, we should be able to construct a plain object with everything we need in any given situation and pass to any of our deps, in a way that's much easier to reason about.

@isaacs
Copy link
Contributor

isaacs commented Aug 12, 2021

(Tbh, I think ini format is Just Fine. It's a perfectly adequate lowest-common-denominator of config definition formats, and in many ways more ergonomic than json or toml. The big downside of ini, that all values are strings, is actually not a big deal in our case, since we also read configs from the cli and environment, where all values are strings anyway.)

@EvanCarroll
Copy link
Author

EvanCarroll commented Aug 12, 2021

@isaacs Excellent response. I'll start with your second reply first.

I would not go with ini because it's unstandardized. I would move to TOML. It's well-speced and has a brilliant and vibrant ecosystem. It's not true that "all values are strings". On the contrary, TOML supports String, Integer, Float, Boolean, Date/Timestamp with and without TZ, Array, Inline Table. See the 1.0.0 spec for more information

What do you do if @scope.push.repoUrl is defined in the project .npmrc, but @scope.push.authToken is defined in the user-level ~/.npmrc? So we have to decide whether objects clobber one another, or get merged together.

Merging such that the option with more locality to the project is chosen seems like the best and most flexible bet, and it's also consistent with the Rust ecosystem. No strong preference though, it's just the behavior I would expect.

How would you specify @scope.push.repoUrl in a command-line flag?

I could see an program to set the config option whether globally or locally for the project's respective file through a subcommand like npm config. I probably wouldn't provide a command-line override for config values. That said, if that's what you want it's not clear to me why -c "@scope.push.repoUrl=URL" would not work.

In an environment variable?

As far as passing it through an environmental variable, I would audit the two systems that in are current play that offer package registries as part of CI: GitLab, GitHub. These two systems both supply the registry url and the token in the environment. The best way to address this is to have mode like npm publish --ci which (1) reads these values, (2) ignores the config, (3) alerts the user that the configs were found and ignored.

We right now accept any old string as a flag, so npm publish --regsitry=https://example.com publishes to the default registry.
We warn on invalid values, but then throw them out, so npm publish --access=restircted will just ignore the misspelled value, which is Very Bad if it results in making something public that shouldn't be.

We agree on an ideal here. This isn't unique to the CLI though values unsupported in the TOML should probably do the same thing for the same reason. After parsing they should be rejected for being invalid configuration files.

People want to have different scope/registry settings per workspace, which there is right now no way to even begin to support.

Again, I say we stand on the shoulders of giants (rust's docs on workspaces), mostly because they stood on NPM's shoulders first, and we should return the favor.

We have a lot of booleans that should really be enums, and the only way we rationalize them right now is to squash them down, which means a lot of unfortunate side-effects that makes the code hard to reason about. There are multiple things you might guess that npm install foo --save-dev --save-peer --save-optional does, but most of those guesses are wrong. npm install foo --save-type=peerOptional is much clearer.

That whole system is bloody complex and relatively hard to reason about, especially with the notion of workspaces. It's so complex Angular can't make heads and tells of it and there is no documentation how it's supposed to work. angular/angular#43058

Not saying it's without use, but it's actually too complex for me. I haven't fully wrapped my head around why there aren't dependencies for development and production, and why we have peer dependencies and how they're supposed to work with workspaces. In fact, totally honest -- I usually end up somewhat brute forcing this and it leaves me feeling like a button masher playing Street Fighter II Turbo for work.

The parser (nopt) is super dated, basically a spike that was pushed out into the world before it was ready, and then frozen in amber by virtue of being a huge breaking change to ever modify. It's time to bite that bullet. We are not likely to replace JSON with toml anywhere it's currently being used. Replacing ini is up for discussion. Having multiple different formats in play at once (eg, supporting both .npmrc and .npmrc.json) is not.

This doesn't make sense to me. TOML has support in most languages now. The reference implementation is solid. Moreover, there is no spec for ini -- everyone does it differently.

What more do you want to solve that this configuration wouldn't solve regardless of the serialization format..

The only things I want to solve are the push and pull semantics, and reduce the tendency to encounter errors.

My main point of feedback is that authentication should be per-registry rather than per-scope. I think mapping the registry for a given scope separately for reads and writes is a good idea, but I'm not sure it makes sense for the auth token to be different per-scope.

Actually when I think about that as an improvement, it sounds great too the only thing I would caveat that with is that these are frequently package registries in CI, and not instance registries. This means that for GitLab and the like, people WILL have a different registry for each package. The only reason why it works for scope for me (and probably why it seems awkward for you) is that my repository has only one package, and I only use one scope on the package registry. If I back up a little bit, none of this configuration should even be needed because it can all be inferred from the environment without any configuration if we were to supply a --ci flag that had repository specific logic.

Unless, I guess, if you wanted to publish all the packages in the @foo scope under one account, but all the packages in the @bar scope under another account, both to the same registry? That seems a bit weird to me, but it may be I'm just not being imaginative enough

No actually that's pretty close to what I want to do, I want to publish all the packages (though there is only one of them) in the @foo scope to GitLab, and I want to pull down all other scopes from my company's internal verdaccio cache of NPM.

So, maybe a data structure something like this would make sense:

Let's try again to sell TOML. ;)

[scope.@foo]
registry.push = "acme_publish"
registry.pull = "acme_public"

[scope.@bar]
registry.pull = "other"
# implicitly disabled push
private = true

[scope.@baz]
registry = "acme_publish"

[package.demo_app]
registry.pull = "other"

[registry.acme_publish]
url = "https://janky.registry.gitlab.acme.com:5050/path/to/registry/endpoint"
timeout = 60
retries = -1

[registry.acme_public]
url = "https://janky.registry.gitlab.acme.com/path/to/registry/endpoint"
timeout = 6000
retries = 3
token = $token

[registry.default]
url = "https://verdaccio.acme.com/path/to/registry/endpoint"

That moves the repos to be declared outside of the scope, and links them by name.

We could also add to the registry stansa "push_only" or "pull_only" which could stop the repo from being used in contexts that do not make sense.

@jcrben
Copy link

jcrben commented Feb 13, 2023

Should also consider multiple registries https://stackoverflow.com/questions/32633678/is-there-any-way-to-configure-multiple-registries-in-a-single-npmrc-file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants