Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Serialize walkAsync so isBinaryFile doesn't throw EMFILE: too many open files. #286

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

amiller-gh
Copy link

@amiller-gh amiller-gh commented Jun 26, 2023

Hey! Thanks for the great library – ran into an issue and have a small fix proposal.

I know this has already been raised here: #248

However, for #reasons we need to disable ASAR packaging for our product. This means that walkAsync has to crawl all of node_modules looking for binary files. Because getFilePathIfBinary actually opens and reads the first few thousand bytes of every file, the recursive Promise.all quickly throws with EMFILE: too many open files, no matter how high I set the open file limit via ulimit.

Conveniently, just putting things into a simple for loop fixes things by serializing the file reads, giving Node a chance to release each file before reading the next one. IIRC, this shouldn't impact performance much because of how Node's FS access works. Anecdotally, we also have this change deployed to our CI and it's performing very quickly without any errors!

@amiller-gh amiller-gh requested a review from a team as a code owner June 26, 2023 07:09
Copy link
Member

@MarshallOfSound MarshallOfSound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a performance regression, serialized fs access is Bad Practice ™️

I'd accept a PR that modifies this code to use the queue module or something similar that limits our old parallel behavior to a fixed concurrency (E.g. 100)

@amiller-gh
Copy link
Author

Happy to refactor (may be a day or so), but if I remember correctly, isn't libuv's default thread pool size only 4? And, even with more libuv threads, wouldn't everything buffer while waiting for the event loop to pull the data anyway?

My understanding was that, because of how Node handles IO, the "parallelism" you get from creating hundreds of IO-executing promises has quickly diminishing returns to the point where serializing the work is often comparable perf-wise, and in many cases – where the flood of promise objects quickly inundates the GC, sapping CPU time from the event loop, disk reads compete for system IO access, and libuv buffers begin to grow – serialized reads can frequently result in improved performance.

Genuinely curious if this has improved with recent versions of Node! Happy to update the PR regardless, but would enjoy learning something here if the prevailing wisdom has changed :)

mwakizaka added a commit to Magic-Pod/osx-sign that referenced this pull request Aug 5, 2023
mwakizaka added a commit to Magic-Pod/osx-sign that referenced this pull request Aug 5, 2023
@mwakizaka
Copy link

Hi, sorry for cutting in.

We also encountered an EMFILE error like #248 and confirmed that the problem got resolved by this PR. As far as I checked, this PR does not likely affect the performance. Although I'm not an expert in nodejs, it would be appreciated if osx-sign project would consider merging this PR.

@new-carrot
Copy link

This will be a performance regression, serialized fs access is Bad Practice ™️

I'd accept a PR that modifies this code to use the queue module or something similar that limits our old parallel behavior to a fixed concurrency (E.g. 100)

There is no way an error that causes the osx-sign fail to do the job is better than a performance decrease. If you try to run the script below on an app that have many files, which I copied from src/util.ts . The EMFILE error will happen, no matter how high you increase ulimit. Beside, performance decrease is not much.

const fs = require('fs-extra');
const path = require('path');
const a= require('isbinaryfile');

async function getFilePathIfBinary (filePath) {
  if (await a.isBinaryFile(filePath)) {
    return filePath;
  }
  return null;
}

async function walkAsync(dirPath) {

  async function _walkAsync(dirPath) {
    const children = await fs.readdir(dirPath);
    return await Promise.all(
      children.map(async (child) => {
        const filePath = path.resolve(dirPath, child);

        const stat = await fs.stat(filePath);
        if (stat.isFile()) {
          switch (path.extname(filePath)) {
            case '.cstemp': // Temporary file generated from past codesign
              await fs.remove(filePath);
              return null;
            default:
              return await getFilePathIfBinary(filePath);
          }
        } else if (stat.isDirectory() && !stat.isSymbolicLink()) {
          const walkResult = await _walkAsync(filePath);
          switch (path.extname(filePath)) {
            case '.app': // Application
            case '.framework': // Framework
              walkResult.push(filePath);
          }
          return walkResult;
        }
        return null;
      })
    );
  }

  const allPaths = await _walkAsync(dirPath);
  return compactFlattenedList(allPaths);
}
walkAsync('./AnExample.app').then((result) => {
  console.log(result.length);
}).catch((error) => {
  console.log('error', error);
})

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants