Skip to content

Conversation

@ParidelPooya
Copy link
Contributor

@ParidelPooya ParidelPooya commented Dec 12, 2025

- Track finished runInChildContext in finishedAncestors
- Simplified hasFinishedAncestor that only checks finishedAncestors
- Enable map-min-successful and parallel-min-successful tests

- Add finishedAncestors parameter to CheckpointManager constructor
- Track completed operations (SUCCEED/FAIL) in finishedAncestors set
- Update all CheckpointManager instantiation sites
- Remove obsolete ancestor completion tests
- All tests passing (725/725)
- Build successful across all packages
- Removed pendingCompletions and ancestor completion methods
- Added finishedAncestors Set for tracking completed operations
- Implemented parent mapping for ancestor traversal
- Reduce expected InvocationCompleted events from 4 to 2
- Reflects new behavior where finishedAncestors prevents redundant operations
…rsing

    - Remove parentMapping Map from CheckpointManager to reduce memory usage
    - Add getParentId helper to parse hierarchical stepIds (e.g., '1-2-3' → '1-2')
    - Move finishedAncestors marking to run-in-child-context handler for proper scoping
    - Add markAncestorFinished method to Checkpoint interface for explicit control
    - Update all mock checkpoints to include new method for test compatibility
    - Temporarily disable parallel-wait assertion while investigating timing differences
…abstraction level

- Move operation names from step level to parallel branch level using NamedParallelBranch
- Move operation names from step level to map item level using itemNamer property
- This fixes timing issues where child operations completed before parent operations were marked as finished
- Checkpoint skipping now works correctly at the proper abstraction level where ancestor checking functions properly
…lity

- Test markAncestorFinished method for adding stepIds to finished ancestors set
- Test hasFinishedAncestor method for proper ancestor hierarchy checking
- Test checkpoint skipping behavior when ancestors are finished
- Test integration with complex nested hierarchies
- Verify that only true ancestors (not siblings) trigger checkpoint skipping
- All 13 tests passing with good coverage of the new functionality
@ParidelPooya ParidelPooya force-pushed the feat/finished-ancestors-clean branch 2 times, most recently from 305895f to 9ecc198 Compare December 12, 2025 17:56
- Remove 🧪 TESTING console logs from checkpoint handlers
- Clean up debug output that was still printing during tests
- Tests now run cleanly without verbose checkpoint logging
@ParidelPooya ParidelPooya force-pushed the feat/finished-ancestors-clean branch from 9ecc198 to 4414401 Compare December 12, 2025 18:06
async (event: any, context: DurableContext) => {
log("Starting parallel execution with minSuccessful: 2");

// Using ctx.step here will prevent us to check minSuccessful if we are trying
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also have tests that assert the behaviour of minSuccessful with steps and checkpoint latency, so that we show that the completed steps are the expected behaviour right now.

I would also expect that the success count in the result matches the number of extra succeeded branches too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the backend latency is not in our control. unless you only wants to run it locally. DEpends on checkpoints latency we could have more or less finished steps

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can create higher latency checkpoint by checkpointing large data. Something like this test: https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js-examples/src/examples/run-in-child-context/checkpoint-size-limit/run-in-child-context-checkpoint-size-limit.ts#L14

But if that isn't consistent we should at least have local-only tests since I think this would be something users can easily encounter


/**
* Checks if a step ID or any of its ancestors has a pending completion
* Mark an ancestor as finished (for run-in-child-context operations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just use markOperationState instead? And just add to this.finishedAncestors when the state is marked as OperationLifeCycle.COMPLETED? Then we don't need to have an extra function to use

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OperationLifeCycle needs a lot of extra meta data. That was my first iteration but complexity was much higher

Comment on lines -308 to -315
if (this.hasFinishedAncestor(nextItem.stepId, nextItem.data)) {
log("⚠️", "Checkpoint skipped - ancestor finished:", {
stepId: nextItem.stepId,
parentId: nextItem.data.ParentId,
});
skippedCount++;
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be checking hasFinishedAncestor inside processQueue, since queue processing is async, so it's possible that an ancestor finished before we got to this batch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I explained offline, branch3 and 2 will end up to the queue and them we need to either mark both as finished or ignore both

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that since we're using setImmediate, hasFinishedAncestor can return false before the queue is processed while it changes to true later when the queue starts processing. I can't think of an example, but I'm worried there's some edge case

Comment on lines 608 to 616
// Rule 5: Clean up operations whose ancestors are complete or pending completion
for (const op of allOps) {
if (
op.state === OperationLifecycleState.RETRY_WAITING ||
op.state === OperationLifecycleState.IDLE_NOT_AWAITED ||
op.state === OperationLifecycleState.IDLE_AWAITED
) {
if (this.hasPendingAncestorCompletion(op.stepId)) {
log(
"🧹",
`Cleaning up operation with completed ancestor: ${op.stepId}`,
);
this.cleanupOperation(op.stepId);
this.operations.delete(op.stepId);
}
// Note: Ancestor completion checking removed - operations will continue normally
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this logic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants