Pipeline 2.0 RFC #8645
marcinszkudlinski started this conversation in Ideas
Replies: 2 comments
- Does the implementation enforce an exact 1 ms period? If not, I would recommend replacing "1 ms" with "scheduling period", whose default is 1 ms.
  This statement conflicts with the previous one about processing data in 1 ms chunks. It may need an additional explanation of what is meant by "consume all data" (and when).
- Discussion 12/19/23
OBSOLETE - the newest version is here:
thesofproject/sof-docs#497
readable version: https://marcinszkudlinski.github.io/sof-docs/PAGES/architectures/firmware/sof-common/pipeline_2_0/pipeline2_0_discussion.html
DOC version: 0.2 12/18/2023
This document describes a set of architecture changes known as “pipeline 2.0”.
The purpose of the changes is to make the pipeline more flexible.
An initial heads-up is in #7261; some of the changes have already been implemented, but with a lot of workarounds, e.g. in DP processing.
1. Module scheduling
Low Latency (LL) scheduling type
All LL modules will be called every 1 ms, one by one, in the context of a single high-priority worker thread, starting from the first data producer and ending at the last data consumer. All LL modules working on a single core must finish processing within 1 ms and must process data in 1 ms chunks. The latency of a complete module chain (when all modules are on the same core) is 1 ms.
2.0 new: Each module should consume all available data from its input buffer(s); no data must be left in an input buffer. There are exceptions: if a module needs to keep data in its buffers between cycles (like the ASRC module), it may request that at binding time. In that case the binding code must create a buffer that fulfills this requirement.
2.0 new: Modules located on different cores may be freely bound and exchange data cross-core, even in the case of an LL-LL connection, but 1) each cross-core bind adds 1 ms of latency to processing and 2) there are certain limitations described later.
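To illustrate the LL scheduling model described above, here is a minimal sketch of what a single LL tick could look like. The names (ll_module, ll_tick, process) are illustrative assumptions, not the actual SOF scheduler code:

```c
/* Illustrative sketch only - not the actual SOF LL scheduler code. */
struct ll_module {
	struct ll_module *next;			/* next module in scheduling order */
	void (*process)(struct ll_module *m);	/* consume input, produce output */
};

/*
 * One LL tick, executed every scheduling period (1 ms by default) in a
 * single high-priority thread. Modules are walked in a fixed order, from
 * the first data producer to the last data consumer; the whole walk must
 * complete within the period, and each module drains its input completely.
 */
static void ll_tick(struct ll_module *head)
{
	for (struct ll_module *m = head; m; m = m->next)
		m->process(m);
}
```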
2. Data Processing (DP) scheduling type (partially 2.0 new)
Each module works in its own preemptible thread context, with lower priority than LL. All DP module threads have the same priority. If more than one module is ready, the scheduler will choose the module with the closest deadline, where a deadline is the last moment at which the following module will have to start processing.
Modules will be scheduled for processing when they have enough data at all inputs and enough space to store the results at all outputs.
Current limitation/simplification: a DP module must be surrounded by LL modules; there is no possibility to connect DP to DP. DP-to-DP binding and proper deadline calculation is a complex task and will be introduced later.
A DP module may be located on a different core than the modules bound to it; there is no difference in processing or latency.
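The deadline rule above is essentially earliest-deadline-first selection. A minimal sketch of the pick, with hypothetical names and a plain ready list, is shown below; the real scheduler also has to handle readiness checks, priorities and thread wake-ups:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch of earliest-deadline-first selection among ready DP modules. */
struct dp_module {
	struct dp_module *next;		/* next entry on the ready list */
	uint64_t deadline;		/* last moment the following module must start */
};

/* Pick the ready DP module with the closest deadline; NULL if none is ready. */
static struct dp_module *dp_pick_next(struct dp_module *ready_list)
{
	struct dp_module *best = NULL;

	for (struct dp_module *m = ready_list; m; m = m->next)
		if (!best || m->deadline < best->deadline)
			best = m;

	return best;
}
```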
3. Sink/src interface
The sink/source API is an interface for data flow between bound modules. It was introduced some time ago.
The main job of a module is to process audio data. To do that, a typical module needs a source to get data from and a sink to send the processed data to. In the current pipeline there is only one type of data flow, using the comp_buffer/audio_stream structures and the surrounding procedures.
Pipeline 2.0 introduces an API that allows flexible data flow – the sink/source API.
There are two sides of the sink/source API:
- An API provider, typically a buffer or (2.0 new) a module like a DAI. A provider of the sink/source API is an entity implementing all required API methods and providing ready-to-use pointers to data to be processed / buffer space to store processed data.
  It is up to the provider to take care of buffer management, avoiding conflicts, cache coherency (if needed), etc.
  Sink/source providers may have their own properties and limitations – e.g. a DAI may not be able to provide data to other cores. See the following chapters for details.
- An API user, typically a processing module. A user of the sink/source API is an entity that simply calls the API methods and gets ready-to-use pointers to data for processing / buffer space to store the results.
  An API user does not need to perform any extra operations on the data, such as taking care of cache coherency; it can simply use the provided pointers. It is up to the pipeline code to pick a proper API provider. See the following chapters for details.
Sink/source naming convention: always look from the API user's (not the API provider's) point of view.
(2.0 new) In the typical case the user of the API is a processing module and the provider is a buffer, but there are other possibilities. If a module – for any reason – needs to have an internal data buffer, it may optimize the flow by exposing that buffer to others, i.e. by providing the sink/src API itself. A typical example of such a module is a DAI, which needs an internal buffer for DMA and may provide data to the next module directly, without an additional buffer in the middle.
Currently there is an optimization in the code – a DAI may use the buffer between itself and the next module for its own purposes – but this is an optimization trick/hack. The sink/src API allows doing it in a natural and flexible way.
Another example of a module providing the API may be a mixout: it accepts one stream at its input, keeps the data in an internal buffer, and sends it out to several other modules (as identical copies) by providing several instances of the source API, exposing the same data buffer to several receivers. Also, a unique mixin-mixout pair may use the sink/src API to expose their internal buffers to each other.
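To make the user side concrete, below is a minimal sketch of the processing loop of an API user. The function names mirror SOF's sink/source headers in spirit, but the prototypes shown here are deliberately simplified assumptions (suffixed _simple) rather than the real signatures:

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of the sink/source API, for illustration only. The real
 * SOF interface has richer signatures (circular-buffer start/size pointers,
 * error codes, etc.).
 */
struct sof_source;	/* provides ready-to-read data */
struct sof_sink;	/* provides ready-to-write space */

const void *source_get_data_simple(struct sof_source *src, size_t size);
void source_release_data_simple(struct sof_source *src, size_t size);
void *sink_get_buffer_simple(struct sof_sink *snk, size_t size);
void sink_commit_buffer_simple(struct sof_sink *snk, size_t size);

/*
 * An API user just asks for pointers and uses them; buffer management and
 * cache coherency are the provider's responsibility.
 */
static int process_one_cycle(struct sof_source *src, struct sof_sink *snk,
			     size_t bytes)
{
	const void *in = source_get_data_simple(src, bytes);
	void *out = sink_get_buffer_simple(snk, bytes);

	if (!in || !out)
		return -1;		/* not enough data or space this cycle */

	memcpy(out, in, bytes);		/* a real module processes, not just copies */

	source_release_data_simple(src, bytes);	/* mark data as consumed */
	sink_commit_buffer_simple(snk, bytes);	/* mark data as produced */
	return 0;
}
```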
4. Module binding
There may be 3 kinds of bindings:
4.1 entity using sink/source to entity using sink/source
Typically, a module to a module. This is the most natural way of binding (in the current code, the only way) and requires a buffer in between:
Using a buffer provides a lot of flexibility, allowing cross-core binds, optional data linearization and LL-to-DP connections – just the proper type of buffer needs to be used. See the following chapter for details.
4.2 (2.0 new) direct connection entity exposing sink/source to entity using sink/source
Typically a DAI providing/accepting data to/from a module. There is no buffer in between, but binding a module to a module without a buffer implies some limitations:
In the rare situation when any of the above conditions is not met (e.g. a cross-core or DP module), a proper buffer must be used together with an additional sink_to_source copier:
4.3 (2.0 new) entity exposing sink/source to entity exposing sink/source
An extremely rare connection, like DAI to DAI. Both entities expose their internal buffers via sink/source. The connection requires a sink_to_source copier.
Again, the modules must:
In the rare situation when any of the above conditions is not met, a proper buffer must be used together with 2 sink_to_source copiers:
It looks complicated, but it will probably be a very rare case, like 2 DAIs on separate cores (!!) bound together. Maybe it is not worth implementing at all.
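As a summary of the three binding kinds, here is an illustrative decision helper. The structures and conditions are simplified assumptions (only "same core" and "no DP module" are modelled as the direct-connection conditions, as hinted above); the full rules, including linearization needs, are covered in the following chapters:

```c
#include <stdbool.h>

/* Simplified description of one endpoint of a bind. */
struct bind_endpoint {
	bool exposes_api;	/* provides its own sink/source (e.g. a DAI) */
	bool is_dp;		/* scheduled as a DP module */
	int core;		/* core the endpoint is bound to */
};

/* What the bind needs between the two endpoints. */
struct bind_plan {
	bool need_buffer;	/* a buffer must sit between the endpoints */
	int copiers_needed;	/* number of sink_to_source copiers */
};

static struct bind_plan plan_bind(const struct bind_endpoint *a,
				  const struct bind_endpoint *b)
{
	struct bind_plan plan = { .need_buffer = true, .copiers_needed = 0 };
	bool direct_ok = a->core == b->core && !a->is_dp && !b->is_dp;

	/* 4.1: user to user - always through a buffer, no copiers */
	if (!a->exposes_api && !b->exposes_api)
		return plan;

	/* 4.2: exposer to user - direct if the conditions are met,
	 * otherwise a buffer plus one copier on the exposer side
	 */
	if (a->exposes_api != b->exposes_api) {
		plan.need_buffer = !direct_ok;
		plan.copiers_needed = direct_ok ? 0 : 1;
		return plan;
	}

	/* 4.3: exposer to exposer - at least one copier; a buffer and a
	 * second copier when the direct conditions are not met
	 */
	plan.need_buffer = !direct_ok;
	plan.copiers_needed = direct_ok ? 1 : 2;
	return plan;
}
```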
5. Module binding – module bind requirements
A module should declare its needs on every input and output:
Currently all buffers are circular; if a module needs linear data, it performs the linearization by itself.
This is not optimal in many ways:
Special case: DP to LL bind (2.0 new)
This type of binding requires one adjustment. As stated before, LL modules need to process all available data from their inputs. A DP module, however, may and usually will work in cycles longer than 1 ms, providing several 1 ms data chunks for the LL module at once.
Solution: regardless of the real amount of data stored in the buffer, the LL module should be able to retrieve only the data portion required for its 1 ms processing (its IBS, input buffer size). This should be done at the source API level.
Optional: there may be a "fast mode", allowing LL modules to process all data from large buffers in a single step. Details TBD – there may be a lot of problems, including sudden CPU load spikes, problems with LL internal buffers, etc.
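A tiny sketch of the proposed source-level cap for the DP-to-LL case: even if the DP producer left several periods of data in the buffer, the source reports at most the LL consumer's IBS per cycle, so the "drain the input completely" rule still holds. The wrapper type and names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical view of a DP -> LL source that caps what the LL side sees. */
struct capped_source {
	size_t ibs;		/* LL consumer's input buffer size, bytes per 1 ms cycle */
	size_t available;	/* bytes currently stored by the DP producer */
};

/* Data offered to the LL consumer in one cycle: at most its IBS. */
static size_t capped_source_available(const struct capped_source *s)
{
	return s->available < s->ibs ? s->available : s->ibs;
}
```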
6. Module binding – types of buffers (2.0 new)
As stated before, the most common type of binding will be a "classic" connection of modules – users of the sink/src API with a buffer providing source/sink between them.
To fulfill all modules' requirements, several types of buffers need to be used (the current buffer implementation, "comp_buffer" and "audio_stream", will be removed):
6.1 shared buffer (2.0 new)
A connection between 2 LL modules in a chain
In the case of a typical LL pipeline, each of the modules processes a complete set of data on its input and produces a complete set of data on its output. That typically means 16–48 audio frames per LL cycle.
The requirement is that the input buffer(s) are always drained completely.
In the case of an LL chain of modules located on the same core:
A huge optimization may be made – each of the above buffers may share the same memory space.
Note that this does not apply to situations where there are several input pins:
The order of processing will be:
LL1, LL2, LL5, LL6, LL3, LL4
In the example above, 2 shared memory spaces should be used, marked above with red and green colors. A proper LL chain detector must be implemented.
The size of the memory space should be 2 * MAX(all OBSes, all IBSes).
Note that a shared buffer:
That means that a shared buffer cannot be used if any of the LL modules has special needs (like ASRC, which must keep some data in its buffers between cycles).
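The sizing rule above is simple enough to sketch directly; the chain_module structure below is an illustrative stand-in for however the LL chain detector ends up representing a chain:

```c
#include <stddef.h>

/* Illustrative representation of one module in a detected LL chain. */
struct chain_module {
	struct chain_module *next;
	size_t ibs;	/* input buffer size, bytes per cycle */
	size_t obs;	/* output buffer size, bytes per cycle */
};

/* Shared memory space for the chain: 2 * MAX over all IBS and OBS values. */
static size_t shared_space_size(const struct chain_module *chain)
{
	size_t max = 0;

	for (const struct chain_module *m = chain; m; m = m->next) {
		if (m->ibs > max)
			max = m->ibs;
		if (m->obs > max)
			max = m->obs;
	}

	return 2 * max;
}
```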
6.2 Lockless cross core data buffer
This kind of buffer is to be used everywhere a shared buffer cannot be used and the data flow does not need linearization (in the case of an LL-to-LL connection the data will be linear in a natural way).
This buffer can provide:
The buffer code is currently upstreamed as "dpQueue", as it was initially intended to work with DP modules. (2.0 new) This name should be changed.
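For illustration, the core idea behind a lockless cross-core buffer is a single-producer/single-consumer ring where each side only ever advances its own index. The sketch below shows just that idea; the real dpQueue additionally deals with cache coherency, wrap-around, IBS/OBS granularity, etc.:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Minimal single-producer/single-consumer lockless ring (conceptual sketch).
 * The producer core advances only write_idx, the consumer core advances only
 * read_idx, so no lock is needed; one byte of capacity is kept free to tell
 * "full" from "empty".
 */
struct xcore_ring {
	uint8_t *data;
	size_t size;			/* capacity in bytes */
	_Atomic size_t write_idx;	/* advanced only by the producer core */
	_Atomic size_t read_idx;	/* advanced only by the consumer core */
};

/* Bytes currently readable by the consumer. */
static size_t xcore_ring_bytes_available(struct xcore_ring *r)
{
	size_t w = atomic_load(&r->write_idx);
	size_t rd = atomic_load(&r->read_idx);

	return (w + r->size - rd) % r->size;
}
```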
6.3 Linearization cross core data buffer (2.0 new)
This buffer is the most sophisticated of all. It needs to combine all the features of the "lockless cross core data buffer" – unfortunately except the small overhead – while enforcing linear data on input/output.
This buffer should be used if the modules cannot be bound using a shared buffer, at least one of the modules is DP, and any of the modules requires linear data / linear buffer space.
Implementation details TBD; it will probably require some internal data copying/moving, etc. There is room for optimization, e.g. avoiding some data moves when only one of the modules requires linear data.
7. Binding pipelines to cores
Each module is bound to a single core at creation time and will never move to another core. Also at pipeline creation time, the driver should declare on which core it wants the pipeline to be created. All pipeline operations (mostly iterations through modules) will then be performed by the core the pipeline is bound to.
(partially 2.0 new) A module belonging to a pipeline does not need to be located on the same core as the pipeline, but in that case the pipeline will need to use time-consuming IDC calls to perform any operation on it (start/prepare/pause, etc.). The most optimal setup is therefore to locate the pipeline on the same core as most of the pipeline's modules.
8. Iteration through the modules
The most common operation on modules is iteration through all modules in the system or through a subset of modules – like all modules belonging to a pipeline, all LL modules, etc.
In the current code the order is determined by a sophisticated mechanism based on the way the modules are bound to each other. This is 1) way too complicated, 2) problematic for modules located on different cores, 3) problematic for DP modules, and 4) requires a "direction", which is not a part of IPC4.
(2.0 new) Fortunately, the modules always need to be iterated in the very same order, and this order is well known at topology design time. This order should be passed to the FW during module/pipeline creation and should not be modified later. Module iteration may then be based on a very simple list (one list per core), modified only when a module is created/deleted (the implementation should take extra care to avoid races when the list is modified). In the reference FW there is a requirement that modules are created in the order they need to be iterated (the module created first goes first), and it is up to the driver to create them in the right order. All module iterations – including the LL scheduling order – are then based on this. cSOF should use the same solution – that may require some modifications in the driver.
Backward compatibility: the legacy recursive, bind-graph-based algorithm should be kept for IPC3 and used to create the module iteration list.
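A sketch of the proposed per-core iteration list is shown below. The names are illustrative; the real implementation must also protect the list against concurrent modification:

```c
#include <stddef.h>

/* One entry per created module; creation order defines iteration order. */
struct module_entry {
	struct module_entry *next;
	void *module;		/* opaque handle to the module instance */
};

struct core_module_list {
	struct module_entry *head;	/* first created == first iterated */
	struct module_entry *tail;
};

/* Append at module creation time. */
static void core_list_append(struct core_module_list *list,
			     struct module_entry *e)
{
	e->next = NULL;
	if (list->tail)
		list->tail->next = e;
	else
		list->head = e;
	list->tail = e;
}

/* Iterate all modules bound to this core, e.g. for LL scheduling. */
static void core_list_for_each(struct core_module_list *list,
			       void (*fn)(void *module))
{
	for (struct module_entry *e = list->head; e; e = e->next)
		fn(e->module);
}
```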