Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: implement node --run <script-in-package-json> #46534

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

joyeecheung
Copy link
Member

@joyeecheung joyeecheung commented Feb 7, 2023

This stemmed from a twitter thread: https://twitter.com/matteocollina/status/1622514656878161920. I was just curious to find out how slow/fast a JSON.parse() and execSync would be compared to npm run, and whether the difference is significant enough that a pure C++ implementation (like what deno task does) would make sense. On my server, with this:

{
  "scripts": {
    "hello": "echo \"hello\""
  }
}

npm run hello runs for ~330ms, out/Release/node --run hello runs for ~35ms (~3-4ms overhead over starting up a no-op process), and a similar deno.json + deno task hello runs for ~5ms (that just does everything in native, no JS involved).

I don't feel very strongly about this patch (I personally feel it's more of the package managers'/user-land CLIs' job to simplify their workflow to reach this speed for this type of commands). So there are no tests and no proper error handling here (yet), and it probably needs a bunch more work to be closer to what people expect from these commands (e.g. handling env var or cross-platform support for shells). But since I already wrote this to find out about the numbers anyway, and it feels wrong to just ignore that 330ms -> 35ms difference and pretend that nothing happened, might as well open a PR even just as food for thought ;) A fully C++ implementation is possible too, though I am not sure a 35ms vs 5ms difference is really worth the effort. This is just trying to see what difference can be made with minimum effort.

EDIT: After more discussions I realized that this actually also makes sense even not for the sake of performance. To quote myself below:

If the user uses yarn to manage their package.json, it's best to use yarn run. If they use npm to manage their package.json, it's best to use npm run. But what if they don't use any package managers (yet?) and just have a hand-written package.json? node --run would be a natural choice.

And in this case, supporting any features that's not commonly shared by most package managers would be a non-goal. It would make sense to ensure that upgrading from node --run to npm run, yarn run, et.al results in as fewer breakages as possible (or do not incur breakages that would not happen when switching between package managers), but not the other way around. Just like switching from Node.js built-in modules (e.g. fs) to the "enhanced" third-party versions of them (e.g. fs-extra) would usually be non-breaking, but it definitely does not happen the other way around.

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Feb 7, 2023
@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Feb 7, 2023

I for one think this is pretty cool and we should explore it 😄 Thank you for trying this out!

EDIT: folding the discussions about parsing node options

This is related to #43973 and nodejs/loaders#98 and #43818 (comment), where we want to provide a way for users to specify what loader(s) Node should use via package.json. I think such an API should be a new package.json field nodeOptions that’s a JSON representation of the NODE_OPTIONS environment variable, supporting as many of that variable’s options as possible. This PR might show the way toward implementing such an API, if this package.json parsing happens (or can happen) early enough in the startup process to happen before NODE_OPTIONS is currently read and the CLI options are set. If what you have here isn’t early enough, this might be a reason (other than 5ms runtime) to justify implementing it in C++. cc @nodejs/loaders @aduh95

@arcanis
Copy link
Contributor

arcanis commented Feb 7, 2023

While I'm very interested in improving the run performances, I don't think this is the way, for reasons similar to what we discussed in nodejs/corepack#57. Quoting myself:

With we would be abstracting multiple package managers, each with their own different behaviours (for example yarn run build:foo will run the build:foo script regardless of the workspace that declares it, whereas npm run build:foo will run it exclusively in the current workspace). While the semantics are often similar, the details aren't, and I feel like the subtle differences in behaviour could be extremely confusing for our users and painful for us as maintainers.

Adding a feature to the Node CLI doesn't force anyone to use it though

I don't agree with this - the JS community is large, and many many many people would misinterpret the very existence of such an optional feature in Node as meaning "it's official, you don't need to specify the package manager name anymore" - then they would get pissed when it doesn't work as they expect from one project to the other, or because it doesn't work for all commands, or [...] - eventually blaming it on either Node or package manager authors.

I'd prefer to investigate what we do in more details and see whether we could add features in Node to unlock some performance boosts. For example, perhaps implementing #21664 would help? Perhaps a way to ship a whole application in a single file would reduce I/O? Perhaps it'd be possible for Node to natively integrate something like v8-compile-cache?

Keep in mind that if we compare yarn run true to yarn --version, the difference is ~50ms on my machine. Everything else (~300ms) is the usual startup. I strongly suspect improving this startup is where the core of the problem is.

@joyeecheung

This comment was marked as off-topic.

@joyeecheung
Copy link
Member Author

joyeecheung commented Feb 7, 2023

I'd prefer to investigate what we do in more details and see whether we could add features in Node to unlock some performance boosts.

@arcanis

There isn't much special here that cannot already be done in the user land, on my server this:

// Run it with `node run.js hello`

// Maybe the package CLIs do a bunch more here
const json = JSON.parse(require('fs').readFileSync('package.json', 'utf-8'));
// And here
require('child_process').execSync(json.scripts[process.argv[2]], { stdio: 'inherit' });

is already almost as fast as the implementation in this patch, with just a 3-4ms overhead by being a user-land script, but it's still under 40ms and nowhere near ~330ms. What the package CLIs can already do is to refactor their internals and have a simplified fast path for this workflow and do as little as possible extra work on top of what happens in the snippet above. Though I also suspect that maybe that refactoring would already be hard enough for a > 10 years-old, complex CLI like npm. Or maybe performance would not be their priority. But this seems a low-hanging enough fruit to implement in Node.js core, in comparison, and the original thread made a good point about Node.js core already reading package.json, so this doesn't feel that much a sacrilege towards the small-core philosophy if we keep the built-in feature minimalistic enough.

then they would get pissed when it doesn't work as they expect from one project to the other, or because it doesn't work for all commands

I suspect if we keep whatever built-in minimalistic enough and gives a fair warning about it not supporting too many things, people would not expect the built-in one to be as powerful as the user-land package managers, and would understand that package managers support something extra on top of the built-in one. But it's just a guess. Whatever happens in the user-land package managers cannot be more complicated than what happens in the snippet above, can they?

At the end of the day I am not that keen on pushing this into core, but I do want to see a good argument to use when someone asks "Why in the Node.js world starting a script takes 330ms with the built-in feature while other runtimes takes less than 10ms?". And at least a ~30ms level alternative doesn't seem as bad.

@arcanis
Copy link
Contributor

arcanis commented Feb 7, 2023

But this seems a low-hanging enough fruit to implement in Node.js core, in comparison (and the original thread made a good point about Node.js core already reading package.json)

Whatever happens in the user-land package managers cannot be more complicated than what happens in the snippet above, can they?

It's not a low-hanging fruit. Taking Yarn as an example, yarn run:

  • Takes into account the yarnPath setting to run the script through a pinned Yarn version
  • Runs hooks to allow users to track usage of their scripts fields
  • Creates a temporary env to expose the dependencies' bin entries depending on which workspace runs the script
  • Uses a builtin portable-like shell for portability across systems
  • Adds the --require and --loader flags to NODE_OPTIONS, if required
  • Knows how to run scripts from other workspaces, using the : syntax mentioned before
  • Can run both scripts from the package.json, or binaries from the dependencies

If node --run was implemented, it'd be npm-centric, kinda useless besides the most very basic cases, and it's likely we would recommend our users to never use it rather than always wonder "is it one of those rare case where it'd work?".

(Note that doing all the work above isn't what takes the longest - as I mentioned, compared to yarn --version, yarn run only takes an extra 50ms; the real problem is the program initialization, and perhaps the yarnPath indirection. We haven't found a good way to optimize it so far, although we're regularly looking.)

@bnoordhuis
Copy link
Member

I think Joyee is saying there's a common happy path where you can go brr: no frilly hooks, no extra setup, just run the script.

npm does some not-strictly-necessary leg work when you run it in the common happy path scenario and I bet so does yarn.

@joyeecheung
Copy link
Member Author

joyeecheung commented Feb 7, 2023

If node --run was implemented, it'd be npm-centric, kinda useless besides the most very basic cases

Why would it be npm-centric? I don't think it needs to support anything that's not supported by these package managers in common?

and it's likely we would recommend our users to never use it rather than always wonder "is it one of those rare case where it'd work?".

Yes, that's what I think we could recommend too. If the user uses yarn to manage their package.json, it's best to use yarn run. If they use npm to manage their package.json, it's best to use npm run. But what if they don't use any package managers and just have a hand-written package.json? node --run would be a natural choice (and it would be fair to expect that the built-in one only does very basic things and the users have to actually use yarn or npm to use any yarn- or npm-specific features).

It's not a low-hanging fruit.

What I meant is "implementing something that does not do any of these (no matter in yarn or in npm flavor) would be a low-hanging fruit". When I started using Node.js I actually wondered why I had to use npm run when I did not have anything in my dependencies and why it wasn't built-in to Node.js. It seems a bit scary to me that package mangers poke around the system before executing a simple command when what I want is just having shorthands for certain simple tasks and not having to type the entire command (of course in some projects that's what's necessary, but most package.json I've seen don't have scripts fields that would need these). In some cases a project may prefer to not have any third-party code or the only third-party code they need is not managed by a Node.js package manager (e.g. a C++ library). In these cases it would be nice to be able to keep things as simple as possible and use a very stright-forward node --run for development tasks, even not for performance's sake, or rather it would be odd to have to use a package manager at the risk of running more third-party code.

@joyeecheung joyeecheung changed the title cli: implement --run cli: implement node --run <script-in-package-json> Feb 7, 2023
@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Feb 7, 2023

I think if this is something that Deno already does, that’s a clear enough signal that Node could do it too. It’s okay if Node’s version is more basic than the equivalent offered by Yarn or npm; that’s no different than using Deno’s. And it’s okay if Yarn wants to advise their users to use Yarn’s version, to take advantage of Yarn workspaces or whatever.

EDIT: folding the discussions about parsing node options > I think if what you want is “reading package.json while ignoring what NODE_OPTIONS specifies”, that can already be done anyway? It’s one thing when to parse NODE_OPTIONS and a different thing when to make a decision based on it. And no this doesn’t happen after `NODE_OPTIONS` parsing as that happens in `InitializeNodeWithArgsInternal`, before JS is executed, though I think it’s certainly possible to refactor the option parsing and store the result of parsing `NODE_OPTIONS` separately so that you can ignore it if that’s what you are after.

I think we were assuming that NODE_OPTIONS would take priority over anything defined in the package.json; we just wanted a way to define Node settings in a more structured way, other than something like NODE_OPTIONS=--loader=ts-node or the like within an package.json "scripts" command. As in, when an error message or docs page tells people to use the TypeScript loader, it can be instructions along the lines of “Create a nodeOptions section in your package.json if you don’t have one already, and inside it add "loader": "ts-node".” And this would apply to all of the scripts for the project without the user needing to add NODE_OPTIONS= to each one.

@joyeecheung

This comment was marked as outdated.

@joyeecheung

This comment was marked as off-topic.

@GeoffreyBooth

This comment was marked as off-topic.

@joyeecheung

This comment was marked as off-topic.

@RDIL
Copy link

RDIL commented Feb 8, 2023

Personally, I'm -1 on this change. It would create many edge cases among different package managers, not to mention adding probably more confusion, as every guide in existence already says to use npm run XYZ or yarn XYZ, with the addition of this, it's just a third way, and that feels less than ideal in my mind.

Sure, the fact you need to run scripts via the package-manager even when you don't use packages isn't great, and this may improve startup time, but I just don't think that the other sacrifices would be worth it. Just my two cents.

@Qard

This comment was marked as off-topic.

@joyeecheung

This comment was marked as off-topic.

@Qard

This comment was marked as off-topic.

@joyeecheung

This comment was marked as off-topic.

@joyeecheung
Copy link
Member Author

I am going to fold the discussions about parsing the collection of node options from package.json, as I'm afraid this is shadowing the discussions about node --run at this point, which only cares about "scripts" fields. Feel free to open a separate issue or discussion if you think we should continue there. @GeoffreyBooth @Qard

@jasnell
Copy link
Member

jasnell commented Feb 9, 2023

I can understand the concern that @arcanis has here and can imagine no shortage of edge cases here. npm automatically putting node_modules/.bin into the PATH provided to scripts is an example. Would we do that here also? If not, there's no way we can guarantee that this will actually work for a large set of the modules on the registry that use scripts. I do like the basic idea, however. Perhaps there's another way? What if instead of running a script this ran the bin from package.json instead? (I've no idea if this is actually a Good Idea but it's an Idea)

{
  "bin": {
    "app": "./foo.js"
  }
}

> node --run app

@joyeecheung
Copy link
Member Author

joyeecheung commented Feb 9, 2023

npm automatically putting node_modules/.bin into the PATH provided to scripts is an example. Would we do that here also? I

Is that something done by all the package managers that corepack supports, for example? If not, I think we should not do it. What we should be careful about is that users can switch from node --run to npm run/yarn run/pnpm run etc. without breakages (or if they do, it's not a breakage that would not otherwise happen during a switch between package managers). But support of switching from npm run etc. to node --run should be a non-goal.

If not, there's no way we can guarantee that this will actually work for a large set of the modules on the registry that use scripts.

I think it would be okay to not work for existing modules on the registry, as it's very likely that they already manage their package.json with some package manager, and this then goes against what I suggested earlier:

If the user uses yarn to manage their package.json, it's best to use yarn run. If they use npm to manage their package.json, it's best to use npm run. But what if they don't use any package managers and just have a hand-written package.json? node --run would be a natural choice.

At least this would be useful enough for projects (not modules) that prefer not to run any third-party code, or at least not any third-party code managed by a Node.js package manager (like a C++ library).

From my POV there is a value in providing a very basic task runner for users who just want to do things in the simple way, but once this has to increase its complexity to support features supported package managers, especially ones that are not commonly found in most package managers, that value gets lost.

@Qard
Copy link
Member

Qard commented Feb 9, 2023

I don't think appending ./node_modules/.bin to the path would be an issue. Having it at the end would mean it would still run through all the other options for bin lookup before trying there. If it gets there then the script was already broken anyway. 🤷🏻

@MylesBorins
Copy link
Contributor

I have mixed feelings about this. On one hand a streamlined way of running scripts that is "very fast" and "streamlined" seems beneficial... but having "yet another way" of doing this seems rife with error + confusion.

I think it would be useful to ensure everyone is on the same page about the use case of node --run and what type of developer would actually be using it.

If the user uses yarn to manage their package.json, it's best to use yarn run. If they use npm to manage their package.json, it's best to use npm run. But what if they don't use any package managers (yet?) and just have a hand-written package.json? node --run would be a natural choice.

My intuition is that there is very few people using Node.js without a package manager, and even fewer who need to be running scripts for their projects that have 0 external dependencies. How many people do we know hand writing a package.json and then NOT using a package manager to install it. This scenario seems a bit contrived to me in all honesty. If someone is then using a package manager to install their dependencies they can use the same package manager to install a script

What I have heard folks complain about is the speed of npm run and how many things are getting loaded just to run some scripts. I've also heard folks complain about not being able to specify the shell used to run scripts and wanting more flexibility.

Script running gets quite complicated across the various package managers for reasons mentioned above including:

  • magic env vars to make running dep bins simpler
  • how argument passing is done
  • lifecycle scripts such a pre and post (default on by npm + yarn, default off by pnpm)
  • workspace support

Similar to what @arcanis had suggested I feel like we would benefit from working with package managers to figure out what features in Node.js could help improve package manager performance in this space.

@anonrig
Copy link
Member

anonrig commented Jan 15, 2024

Hey @joyeecheung, do you plan on working on this? If not, I'd like to continue and open a PR.

@arcanis
Copy link
Contributor

arcanis commented Jan 15, 2024

Last year's feedback is still relevant.

Also note that node -r package-name can already serve the --run purpose for third-party packages without any of the drawbacks, similar to how things work in the Python ecosystem (python -m SimpleHTTPServer). It just needs to be documented somewhere so that CLI tools know how to check whether they are loaded via -r or as API.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Jan 17, 2024

Also note that node -r package-name can already serve the --run purpose for third-party packages

-r is short for --require, so this only works for packages that can be required (so only CommonJS packages). node --import package-name should work for everything, CommonJS or ESM packages.

Regardless, the request is to run package.json scripts, not packages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants