Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High level profiling / tracing #8588

Closed
marcins opened this issue Oct 31, 2022 · 4 comments
Closed

High level profiling / tracing #8588

marcins opened this issue Oct 31, 2022 · 4 comments
Labels
💬 RFC Request For Comments

Comments

@marcins
Copy link
Contributor

marcins commented Oct 31, 2022

💬 RFC

🔦 Context

Parcel currently provides a way to generate sampling CPU profiles using --profile, or using Ctrl-E during the build. These CPU sample based profiles can be used to identify bottlenecks and understand where Parcel is spending it's time at the code level, and are useful for these low-level purposes.

This proposal is to introduce the concept of a high-level trace, where Parcel can report on time spent at a higher level - for example, to be able to identify which transform is the most expensive in your build, or identify if there is a particular source file that causes additional overhead.

This is particularly useful for us internally when determining where to focus our efforts on build time improvements, in order to identify where time is being spent and where we should focus our optimisation efforts.

Prior art

When I initially spiked this, @mischnic mentioned this had been previously raised by him in the Parcel project management tool (issue T-1131).

This RFC builds upon work originally created by @thebriando in our internal Parcel fork - Brian's implementation only ran on the main thread, and measured time spent in transform/bundling/packaging at a high level, however when I tried to adapt this approach to a more broad high level tracing (e.g. per transform) I ran into the following issues: a) passing a Tracer around through the various layers would involve a lot of changes in a lot of places, b) it would not be able to cross thread/serialisation boundaries.

Spike approach

There is a very rough spike branch here: https://github.com/marcins/parcel/tree/mszczepanski/spike-system-tracing which at this stage is not comprehensive, nor particularly clean, but was created in order to help socialise the potential approach.

In this branch I have added additional "trace" functionality to the logger, mostly because I wanted to follow a similar approach - have a singleton SystemTracer that could be used to access the measurement functionality from anywhere. This SystemTracer lets you measure any code within Parcel with a pattern like:

 let measurement = SystemTracer.createMeasurement(transformer.name, {
  categories: ["transform"],
  args: {
    name: fromProjectPathRelative(initialAsset.value.filePath),
  },
});

// .. transformation happens here

measurement.end();

These trace events are caught by a TraceReporter which uses chrome-trace-event to produce an appropriate output.

This will produce a Chrome Trace event similar to:

{"ts":2726640,"pid":34523,"tid":3,"ph":"X","name":"@parcel/transformer-babel","cat":"transform","args":{"name":"node_modules/react/jsx-runtime.js"},"dur":288},

A trivial but complete trace is available here: https://gist.github.com/marcins/a1a0ceaf3f3fd0d4771e0e611ebc360f

Analysing results

While these traces can be loaded into Chrome DevTools Performance tab, like Parcel's other profiles, they are of limited value here due to the lack of analysis tools. However, where this data can really be useful is by loading it into https://ui.perfetto.dev/. Example screenshot from the trace above:

image

The visual timeline itself isn't necessarily useful except at a very high level, however the SQL queries built into Perfetto let us easily ask questions like "What is the most expensive transform?"

select 
  s.category, name, SUM(CAST(dur AS double)/1000/1000) as dur_ms
from 
  slice s 
left join 
  args using(arg_set_id)
where 
  args.flat_key = "args.name" AND
  s.category = "transform"
group by name
order by dur_ms desc
limit 100;
category name dur_ms
transform @parcel/transformer-babel 93.15300000000002
transform @parcel/transformer-js 68.70400000000001
transform @parcel/transformer-html 27.017
transform @parcel/transformer-css 4.612
transform @parcel/transformer-react-refresh-wrap 1.6889999999999998
transform @parcel/transformer-postcss 0.382
transform @parcel/transformer-posthtml 0.28

Using Perfetto's SQL you can come up with all sorts of other queries, whether it's by source file, category, etc.

Proposed implementation details

  • Create a new package @parcel/tracing to encapsulate tracing/profiling. The existing Trace and Profiler from packages/core/workers/src/Profiler.js could be moved here as well as they share a dependency on chrome-trace-event / deal with tracing.
  • Tracer / tracer should be analogous to Logger / logger (i.e. the class implementing the thing, a singleton instance of the thing exported for use in other code)
  • Follow the same pattern as for logging, a child will handle Tracer.onTrace events and re-emit them using bus.emit('traceEvent'), which get re-fired on the main thread by a listener.
  • Currently in the work Brian did there is a TraceReporter to handle creating the trace, but we should probably just handle this in core similar to how / where existing profiling traces are created?
  • Thought that needs more thinking.. --profile=application vs --profile=sampling (where sampling is the default current behaviour?)
  • need to measure the impact of sending measurement messages across workers when the result is unused - it would be easier to disable listening / writing to a file when the feature isn't enabled, rather than disabling at the sending end, but if there's a non-trivial impact then definitely work on the latter (maybe can just add an .enable()/.disable() to Tracer that can be called when config/CLI options are processed?)
  • measurement of most transforms is handled at the Transformer pipeline level, but for the BabelTransformer implement a wrapPluginVisitorMethod that will measure time spent in individual Babel plugins (marcins@51695a0#diff-8555dae9dfcfcf7ddcaab1c7cb7ebde8b3bcdecef9b54dc032c654bb0e2c5c21R69-R86)

Status / Questions / etc

Pending implementation.

@devongovett
Copy link
Member

This all makes sense to me, and it sounds useful!

In my spike I've just added a new trace method to Logger in order to let me easily send these trace events across the worker boundary.

You could do this using the bus from @parcel/workers. It's an event emitter, but it sends events in workers back to the main thread. The logger and reporters use it already.

Where should control of whether this feature is enabled be?

We could add a new CLI option like we have for --profile perhaps?

If we implement this, we should probably consider unifying it with profiling

We could make --profile take a value, like --profile=high-level (idk what to call it), but I don't think a separate --trace option would be too confusing either.

It would be useful to also capture time spent in Babel plugins - this might be possible to do in the Babel transform, as Babel config provides the ability to provide a wrapPluginVisitorMethod

Sounds like it would work.

@mischnic mischnic added the 💬 RFC Request For Comments label Oct 31, 2022
@marcins
Copy link
Contributor Author

marcins commented Oct 31, 2022

Thanks for the feedback Devon, I've removed the questions from the original post and added some proposed implementation details based on the answers and some more thinking.

@marcins marcins mentioned this issue Dec 16, 2022
5 tasks
@github-actions github-actions bot added the Stale Inactive issues label Apr 30, 2023
@devongovett devongovett removed the Stale Inactive issues label Apr 30, 2023
@parcel-bundler parcel-bundler deleted a comment from github-actions bot Apr 30, 2023
@github-actions github-actions bot added the Stale Inactive issues label Oct 27, 2023
@mischnic
Copy link
Member

@marcins is this completed with #8695?

@mischnic mischnic removed the Stale Inactive issues label Oct 27, 2023
@parcel-bundler parcel-bundler deleted a comment from github-actions bot Oct 27, 2023
@marcins
Copy link
Contributor Author

marcins commented Oct 27, 2023

Implemented in #8695

@marcins marcins closed this as completed Oct 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💬 RFC Request For Comments
Projects
None yet
Development

No branches or pull requests

3 participants