-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-18026: KIP-1112, ProcessorWrapper API with PAPI and partial DSL implementation #17892
base: trunk
Are you sure you want to change the base?
Conversation
@@ -714,8 +714,10 @@ public org.apache.kafka.streams.processor.api.Processor<Object, Object, Object, | |||
public synchronized <KIn, VIn, KOut, VOut> Topology addProcessor(final String name, | |||
final ProcessorSupplier<KIn, VIn, KOut, VOut> supplier, | |||
final String... parentNames) { | |||
internalTopologyBuilder.addProcessor(name, supplier, parentNames); | |||
final Set<StoreBuilder<?>> stores = supplier.stores(); | |||
final ProcessorSupplier<KIn, VIn, KOut, VOut> wrapped = internalTopologyBuilder.wrapProcessorSupplier(name, supplier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the entire implementation for PAPI apps
dslStoreSuppliers = Utils.newInstance( | ||
config.getClass(StreamsConfig.DSL_STORE_SUPPLIERS_CLASS_CONFIG), | ||
DslStoreSuppliers.class); | ||
dslStoreSuppliers = config.getConfiguredInstance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side fix: we weren't configuring this with the app configs
for (final StoreBuilder<?> storeBuilder : processorSupplier.stores()) { | ||
ApiUtils.checkSupplier(processorSupplier); | ||
|
||
final ProcessorSupplier<KIn, VIn, KOut, VOut> wrapped = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is how the DSL implementation will work -- processors are wrapped when building the DSL into a Topology, specifically in this method where they are added to the InternalTopologyBuilder.
Note this PR is only a partial implementation since not all DSL operators go through this class, some just write their processors to the InternalTopologyBuilder directly. Followup PRs will tackle converting these operators and enforcing that any new ones added in the future will have to go through this method and be wrapped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious why we do this here instead of inside topologyBuilder.addProcessor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, good question! that would definitely be simpler. however, to maximize the utility of the new wrapper, we want to allow users to wrap not only the Processor returned by the ProcessorSupplier#get method, but also the StoreBuilders returned from ProcessorSupplier#stores. Unfortunately, the stores are currently extracted and connected outside of the #addProcessor method, and since they aren't attached (via #addStateStore) until after #addProcessor is called, we can't even look up and wrap the stores for a processor from within #addProcessor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately it might make sense to move the extraction of the stores from a ProcessorSupplier into a single InternalTopologyBuilder method (eg #addStatefulProcessor) that calls both #addProcessor and #addStateStore, and do the wrapping there. This will also help future-proof new DSL operators from being implemented incorrectly and missing the wrapper step.
But I think it makes sense to wait until we've completed all the followup PRs for the remaining DSL operators, in case there are any weird edge cases we haven't thought of/seen yet. Then we can do one final PR to clean up and future-proof the wrapping process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (non-binding, of course)
* topology builders via the {@link org.apache.kafka.streams.StreamsBuilder#StreamsBuilder(TopologyConfig) StreamsBuilder(TopologyConfig)} method | ||
* for DSL applications, or when creating a PAPI topology via the {@link Topology#Topology(TopologyConfig)} constructor. | ||
* <p> | ||
* Note that some configs that are only defined in the TopologyConfig and not in the StreamsConfig, such as the {@code processor.wrapper.class}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a possibility of having this be available (and respected) in StreamsConfig
? People tend to forget to properly pipe things through TopolgoyConfig, though having it only there makes it a bit more obvious where it needs to be passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, good catch -- this comment is actually out of date, and I did end up having to add the config to StreamsConfig as well.
I definitely share your concerns about people forgetting to properly pipe things in to the Topology/StreamsBuilder through TopologyConfig though. I recently filed KAFKA-18053 for exactly this problem, with one subtask covering a way to improve the situation without a KIP by detecting and notifying users of a misconfiguration. Ideally we can include this fix in 4.0 as well
topologyConfigs.originals() | ||
); | ||
} catch (final Exception e) { | ||
log.error("Unable to instantiate ProcessorWrapper from value of config {}. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: does it make sense to also include the error in the logged message in case the user swallows this up higher in the stack somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
for (final StoreBuilder<?> storeBuilder : processorSupplier.stores()) { | ||
ApiUtils.checkSupplier(processorSupplier); | ||
|
||
final ProcessorSupplier<KIn, VIn, KOut, VOut> wrapped = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious why we do this here instead of inside topologyBuilder.addProcessor
This PR includes the API for KIP-1112 and a partial implementation, which wraps any processors added through the PAPI and the DSL processors that are written to the topology through the ProcessorParameters#addProcessorTo method.
Further PRs will complete the implementation by converting the remaining DSL operators to using the #addProcessorTo method, and future-proof the processor writing mechanism to prevent new DSL operators from being implemented incorrectly/without the wrapper
Unit tests are WIP -- will be added to this PR shortly