-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft ActivitySource Child Activity PropagationData solution #1
base: main
Are you sure you want to change the base?
Conversation
…ctivityContext to sampling logic.
CC @noahfalk Is it expected to add more stuff to the ParentActivityState in the future? I am asking to know is it worth it to add a new type for that or we can just pass the ActivityDataRequest as extra parameter to the call backs. |
by the way, thanks @CodeBlanch for the proposal :-) |
#2 would be an alternative, much bigger, approach to solving this (and other things). |
I wouldn't go that far with proposal #2 just to solve this scoped issue. Now looking at this proposal #1. Looking at the 4-bullets in the description, I think this is can be detected without any APIs. right?
You'll get the Activity context in the callback and you can assume the ActivityDataRequest.None at that time.
You'll get ActivityContext in the callback and you can check the ActivityTraceFlags and set ActivityDataRequest.AllDataAndRecorded.
You can request Activity.Current too and handle it. If the concern here is API usability, we can clarify the documentation and specifically list these actions there. Also, we can provide a sample code showing how you can do that. |
Essentially we're asking the user to understand the intricate details of the implementation and correctly craft this logic in their callback code: bool isDefaultContext = context == default;
Activity? currentActivity = Activity.Current;
ActivityContext initializedContext = !isDefaultContext
? context
: currentActivity?.Context ?? context;
ActivityDataRequest parentDataRequested = isDefaultContext
? ActivityDataRequest.None
: (initializedContext.TraceFlags & ActivityTraceFlags.Recorded) == ActivityTraceFlags.Recorded
? ActivityDataRequest.AllDataAndRecorded
: currentActivity?.IsAllDataRequested == true
? ActivityDataRequest.AllData
: ActivityDataRequest.PropagationData; Could they pull it off? Yes. I managed to :) But I think it would be nicer to give them an API that takes care of this heavy lifting. |
It corresponds to the OT spec option "RECORD". I am not sure how you would implement the sampling spec without it?
I probably missed some backstory. Assuming you had this value what does it get used for? The OT spec for sampling does not specify that samplers are intended to be provided with the sampling decision for the parent. |
@noahfalk You are thinking of AllDataAndRecorded. See, it's confusing :) We set Activity.ActivityTraceFlags.Recorded/Activity.Recorded for ActivityDataRequest.AllDataAndRecorded. We set Activity.AllDataRequested when ActivityDataRequest.AllData or ActivityDataRequest.AllDataAndRecorded. When we propagate context to another dependency we only really have 3 states. No propagation at all (ActivityDataRequest.None), propagation w/ ActivityTraceFlags.Recorded = yes (ActivityDataRequest.AllDataAndRecorded), or propagation w/ ActivityTraceFlags.Recorded = no (which could mean ActivityDataRequest.PropagationData or ActivityDataRequest.AllData, there's no way to know).
Ya there seems to be a lot of confusion here. But it's critical! Let me try to explain. Let's say we are writing an ASP.NET Core web service.
Help at all? |
I agree its confusing, but I still believe my original statement was correct : ) The OpenTelemetry spec outlines three values that a sampler can return:
The naming is aggravating, but OT defines Span.IsRecording() completely differently than Activity.IsRecorded. We can't change Activity.IsRecorded because that API already shipped so unless OT changes their spec it is an unavoidable naming collision:
Translating the OT spec into Activity terminology gives us: RECORD can not match AllDataAndRecorded because that option sets Activity.IsRecorded = true, and the spec says it must be false.
Thanks 👍 I think your objection may be with the specification itself? The specification details exactly what data should be made available to the sampler:
In particular the SpanContext of the parent includes the sampling bit, but the parent's Span.IsRecording() bit is not in the list. The .NET implementation matches the functionality of the spec albeit with the adjusted naming. The parent context's IsRecorded flag (the sampling bit) is available but Activity.IsAllDataRequested (the Span.IsRecording() function) is not. If that sounds correct then I think the best course of action is to first get the OpenTelemetry sig to change the spec, then we'll update the .NET implementation to match the updated spec. Or if I am still misunderstanding and it is an issue particular to the .NET implementation you could probably highlight that by showing how the task you want to do can be achieved in code when using some other reference implementation of the spec. Hope that helps a bit? : ) |
@noahfalk Ah got it. OT dotnet doesn't use the .NET:
OT:
Available states during propagation:
I don't know how to reconcile these things. If
So AllData/RECORD has meaning in the application that was instrumented but won't propagate to downstream dependencies. That is an issue with the spec, not .NET, you are correct about that. Anyway, @noahfalk before you reply, the comment about removing
More work for me you suck :D Here's a sampler implementation using the current API and the mappings from above: // Sampling by ParentId. Who knows what to do here? Bueller?
GetRequestedDataUsingParentId = (ref ActivityCreationOptions<string> options) => ActivityDataRequest.AllData,
// Sampling by context.
GetRequestedDataUsingContext = (ref ActivityCreationOptions<ActivityContext> options) =>
{
ActivityDataRequest parentActivityDataRequest = DetermineParentActivityDataRequest(options.Parent);
if (parentActivityDataRequest != ActivityDataRequest.None)
{
// If Parent had a sampling decision, restore it on the child.
return parentActivityDataRequest;
}
// If a link is marked as recorded, respect that.
foreach (var link in options.Links ?? Array.Empty<ActivityLink>())
{
if ((link.Context.TraceFlags & ActivityTraceFlags.Recorded) == ActivityTraceFlags.Recorded)
{
return ActivityDataRequest.AllDataAndRecorded;
}
}
// Sample this root span using some algorithm.
return !ExecuteSamplingAlgorithm(options)
? ActivityDataRequest.PropagationData
: ActivityDataRequest.AllDataAndRecorded;
// ActivityDataRequest.AllData not used, but it could be in the future.
// None never returned. Minimum level will be PropagationData.
static ActivityDataRequest DetermineParentActivityDataRequest(ActivityContext context)
{
bool isDefaultContext = context == default;
Activity currentActivity = Activity.Current;
ActivityContext initializedContext = !isDefaultContext
? context
: currentActivity?.Context ?? context;
return isDefaultContext
? ActivityDataRequest.None // If we had no context, there is no parent.
: (initializedContext.TraceFlags & ActivityTraceFlags.Recorded) == ActivityTraceFlags.Recorded
? ActivityDataRequest.AllDataAndRecorded // If context has ActivityTraceFlags.Recorded than Parent was AllDataAndRecorded.
: currentActivity?.IsAllDataRequested == true
? ActivityDataRequest.AllData // If we have Activity.Current for the parent (same-process) restore AllData if it was used.
: ActivityDataRequest.PropagationData; // Otherwise PropagationData.
}
}, I think that will work using the current implementation of everything. More or less, no changes needed by the .NET team. But look at the This PR and issue #2 are all about just trying to make the API nicer. I think we should pass in the parent ActivityDataRequest and encapsulate the PS: In that sampler code above I wanted to implement the Probability logic. But in OT it is a function of TraceId which is not available in the .NET API until after sampling. So a change will be needed there if we are to implement the spec as-is? |
Thanks for the sample, I think it may have highlighted the confusion! My understanding of the OT spec is that the sampling algorithm specified by the app developer (here represented by ExecuteSamplingAlgorithm) must be used for all spans, not only root spans. So I would expect the OpenTelemetry spec authors to say this sample code above is not a correct implementation of the spec and the SDK should not be trying to short-circuit invoking the sampler. @cijothomas are you able to confirm whether I am interpreting the spec correctly?
Continuing from above I don't believe the OT spec requires us to "propagate everything" in order to correctly implement the spec. If my understanding is wrong I am counting on @cijothomas or other OT folks to join in and say so : ) Based on my understanding of the OT spec I think this would be a simple valid implementation that doesn't make use of 'None':
And this is an optional more optimized implementation that avoids creating unnecessary Activities:
This does seem like an oversight, or something that got lost during refactoring. We should be providing the traceId in the GetRequestedData callback even when there is no parent. @tarekgh could you take a look at this and pull me in if I can be helpful? |
GetRequestedDataUsingContext is the sampler, which we always invoke, for every activity/span. Let me rewrite the code I had in the callback to simulate user's sampler of choice being plugged in. I'll show a few examples of how different OT spec samplers would look. You will see it doesn't change anything with regard to the complexity of determining the parent. // Sampling by ParentId. Who knows what to do here? Bueller?
GetRequestedDataUsingParentId = (ref ActivityCreationOptions<string> options) => ActivityDataRequest.AllData,
// Sampling by context.
GetRequestedDataUsingContext = (ref ActivityCreationOptions<ActivityContext> options) => _UserSampler?.Invoke(ref options) ?? ActivityDataRequest.None; OT AlwaysOn, _UserSampler = ExecuteAlwaysOnLogic private ActivityDataRequest ExecuteAlwaysOnLogic(ref ActivityCreationOptions<ActivityContext> options) => ActivityDataRequest.AllDataAndRecorded; OT AlwaysOff, _UserSampler = ExecuteAlwaysOffLogic private ActivityDataRequest ExecuteAlwaysOffLogic(ref ActivityCreationOptions<ActivityContext> options) => ActivityDataRequest.PropagationData; // Maybe None, maybe determined by ActivityKind. OT Probability, _UserSampler = ExecuteProbabilityLogic
private ActivityDataRequest ExecuteProbabilityLogic(ref ActivityCreationOptions<ActivityContext> options)
{
ActivityDataRequest parentActivityDataRequest = DetermineParentActivityDataRequest(options.Parent);
if (parentActivityDataRequest != ActivityDataRequest.None)
{
// If Parent had a sampling decision, restore it on the child.
return parentActivityDataRequest;
}
// If a link is marked as recorded, respect that.
foreach (var link in options.Links ?? Array.Empty<ActivityLink>())
{
if ((link.Context.TraceFlags & ActivityTraceFlags.Recorded) == ActivityTraceFlags.Recorded)
{
return ActivityDataRequest.AllDataAndRecorded;
}
}
// Sample this root span using TraceId probability algorithm. TraceId unavailable in .NET API.
return !ShouldSampleBasedOnTraceIdAlgorithm(ref options)
? ActivityDataRequest.PropagationData
: ActivityDataRequest.AllDataAndRecorded;
// ActivityDataRequest.AllData not used, but it could be in the future.
// None never returned. Minimum level will be PropagationData.
static ActivityDataRequest DetermineParentActivityDataRequest(ActivityContext context)
{
bool isDefaultContext = context == default;
Activity currentActivity = Activity.Current;
ActivityContext initializedContext = !isDefaultContext
? context
: currentActivity?.Context ?? context;
return isDefaultContext
? ActivityDataRequest.None // If we had no context, there is no parent.
: (initializedContext.TraceFlags & ActivityTraceFlags.Recorded) == ActivityTraceFlags.Recorded
? ActivityDataRequest.AllDataAndRecorded // If context has ActivityTraceFlags.Recorded than Parent was AllDataAndRecorded.
: currentActivity?.IsAllDataRequested == true
? ActivityDataRequest.AllData // If we have Activity.Current for the parent (same-process) restore AllData if it was used.
: ActivityDataRequest.PropagationData; // Otherwise PropagationData.
}
} OT ParentOrElse, _UserSampler = ExecuteParentOrElseLogic
private ActivityDataRequest ExecuteParentOrElseLogic(ref ActivityCreationOptions<ActivityContext> options)
{
ActivityDataRequest parentActivityDataRequest = DetermineParentActivityDataRequest(options.Parent);
if (parentActivityDataRequest != ActivityDataRequest.None)
{
// If Parent had a sampling decision, restore it on the child.
return parentActivityDataRequest;
}
// If a link is marked as recorded, respect that.
foreach (var link in options.Links ?? Array.Empty<ActivityLink>())
{
if ((link.Context.TraceFlags & ActivityTraceFlags.Recorded) == ActivityTraceFlags.Recorded)
{
return ActivityDataRequest.AllDataAndRecorded;
}
}
// Sample this root span using a delegate provided by the user.
return _UserDelegateSampler(ref options);
static ActivityDataRequest DetermineParentActivityDataRequest(ActivityContext context)
{
bool isDefaultContext = context == default;
Activity currentActivity = Activity.Current;
ActivityContext initializedContext = !isDefaultContext
? context
: currentActivity?.Context ?? context;
return isDefaultContext
? ActivityDataRequest.None // If we had no context, there is no parent.
: (initializedContext.TraceFlags & ActivityTraceFlags.Recorded) == ActivityTraceFlags.Recorded
? ActivityDataRequest.AllDataAndRecorded // If context has ActivityTraceFlags.Recorded than Parent was AllDataAndRecorded.
: currentActivity?.IsAllDataRequested == true
? ActivityDataRequest.AllData // If we have Activity.Current for the parent (same-process) restore AllData if it was used.
: ActivityDataRequest.PropagationData; // Otherwise PropagationData.
}
} ParentOrElse & Probability samplers both need to understand the parent's state. In order to do that, they need |
…et#53792) I have expanded the PerfMap format produced by Crossgen2 and R2RDump to produce metadata in form of pseudo-symbol records with high addresses. In this version I have implemented four metadata entries - output GUID, target OS, target architecture and perfmap format version number. I have verified for System.Private.CoreLib and for the composite framework that Crossgen2 and R2RDump produce identical metadata. To facilitate a smooth transition to the new perfmap format, in accordance with Juan's suggestion I have introduced a new command-line option to explicitly specify the perfmap format revision. As of today, 0 corresponds to the legacy Crossgen1-style output where the perfmap file name includes the {MVID} section, perfmap format #1 corresponds to current Crossgen2 with its new naming scheme. As of today there are no differences in the file content. Thanks Tomas
CodeQL flagged various places where we're dereferencing pointers that could be NULL, this PR systematically cleans some of them up via g_assert. * g_assert result of g_build_path calls * Allocation failure handling * mono_class_inflate_generic_class_checked can return NULL
…#102133) This generalizes the indir reordering optimization (that currently only triggers for loads) to kick in for GT_STOREIND nodes. The main complication with doing this is the fact that the data node of the second indirection needs its own reordering with the previous indirection. The existing logic works by reordering all nodes between the first and second indirection that are unrelated to the second indirection's computation to happen after it. Once that is done we know that there are no uses of the first indirection's result between it and the second indirection, so after doing the necessary interference checks we can safely move the previous indirection to happen after the data node of the second indirection. Example: ```csharp class Body { public double x, y, z, vx, vy, vz, mass; } static void Advance(double dt, Body[] bodies) { foreach (Body b in bodies) { b.x += dt * b.vx; b.y += dt * b.vy; b.z += dt * b.vz; } } ``` Diff: ```diff @@ -1,18 +1,17 @@ -G_M55007_IG04: ;; offset=0x001C +G_M55007_IG04: ;; offset=0x0020 ldr x3, [x0, w1, UXTW #3] ldp d16, d17, [x3, #0x08] ldp d18, d19, [x3, #0x20] fmul d18, d0, d18 fadd d16, d16, d18 - str d16, [x3, #0x08] - fmul d16, d0, d19 - fadd d16, d17, d16 - str d16, [x3, #0x10] + fmul d18, d0, d19 + fadd d17, d17, d18 + stp d16, d17, [x3, #0x08] ldr d16, [x3, #0x18] ldr d17, [x3, #0x30] fmul d17, d0, d17 fadd d16, d16, d17 str d16, [x3, #0x18] add w1, w1, #1 cmp w2, w1 bgt G_M55007_IG04 ```
* bug #1: don't allow for values out of the SerializationRecordType enum range * bug #2: throw SerializationException rather than KeyNotFoundException when the referenced record is missing or it points to a record of different type * bug #3: throw SerializationException rather than FormatException when it's being thrown by BinaryReader (or sth else that we use) * bug #4: document the fact that IOException can be thrown * bug #5: throw SerializationException rather than OverflowException when parsing the decimal fails * bug #6: 0 and 17 are illegal values for PrimitiveType enum * bug #7: throw SerializationException when a surrogate character is read (so far an ArgumentException was thrown)
* [NRBF] Don't use Unsafe.As when decoding DateTime(s) (dotnet#105749) * Add NrbfDecoder Fuzzer (dotnet#107385) * [NRBF] Fix bugs discovered by the fuzzer (dotnet#107368) * bug #1: don't allow for values out of the SerializationRecordType enum range * bug #2: throw SerializationException rather than KeyNotFoundException when the referenced record is missing or it points to a record of different type * bug #3: throw SerializationException rather than FormatException when it's being thrown by BinaryReader (or sth else that we use) * bug #4: document the fact that IOException can be thrown * bug #5: throw SerializationException rather than OverflowException when parsing the decimal fails * bug #6: 0 and 17 are illegal values for PrimitiveType enum * bug #7: throw SerializationException when a surrogate character is read (so far an ArgumentException was thrown) # Conflicts: # src/libraries/System.Formats.Nrbf/src/System/Formats/Nrbf/NrbfDecoder.cs * [NRBF] throw SerializationException when a surrogate character is read (dotnet#107532) (so far an ArgumentException was thrown) * [NRBF] Fuzzing non-seekable stream input (dotnet#107605) * [NRBF] More bug fixes (dotnet#107682) - Don't use `Debug.Fail` not followed by an exception (it may cause problems for apps deployed in Debug) - avoid Int32 overflow - throw for unexpected enum values just in case parsing has not rejected them - validate the number of chars read by BinaryReader.ReadChars - pass serialization record id to ex message - return false rather than throw EndOfStreamException when provided Stream has not enough data - don't restore the position in finally - limit max SZ and MD array length to Array.MaxLength, stop using LinkedList<T> as List<T> will be able to hold all elements now - remove internal enum values that were always illegal, but needed to be handled everywhere - Fix DebuggerDisplay * [NRBF] Comments and bug fixes from internal code review (dotnet#107735) * copy comments and asserts from Levis internal code review * apply Levis suggestion: don't store Array.MaxLength as a const, as it may change in the future * add missing and fix some of the existing comments * first bug fix: SerializationRecord.TypeNameMatches should throw ArgumentNullException for null Type argument * second bug fix: SerializationRecord.TypeNameMatches should know the difference between SZArray and single-dimension, non-zero offset arrays (example: int[] and int[*]) * third bug fix: don't cast bytes to booleans * fourth bug fix: don't cast bytes to DateTimes * add one test case that I've forgot in previous PR # Conflicts: # src/libraries/System.Formats.Nrbf/src/System/Formats/Nrbf/SerializationRecord.cs * [NRBF] Address issues discovered by Threat Model (dotnet#106629) * introduce ArrayRecord.FlattenedLength * do not include invalid Type or Assembly names in the exception messages, as it's most likely corrupted/tampered/malicious data and could be used as a vector of attack. * It is possible to have binary array records have an element type of array without being marked as jagged --------- Co-authored-by: Buyaa Namnan <[email protected]>
/cc @cijothomas @tarekgh
Here's a scratch solution for the issue we discussed earlier today wrt
ActivityDataRequest.PropagationData
being hard to determine when child Activity is created/sampled.What I tried to do was reconstruct the
ActivityDataRequest
used by the Parent so that it can be passed into the "sampler" logic when a child is created.Let me know what you guys think. I didn't want to go too far with it before testing the water.
The logic is basically...
When this is all happening in proc, there is enough data to do it. When that state is transmitted over the wire it's going to be hard to restore all the states. In that case you know you have a context and it is or isn't Recorded. If Recorded, you are ActivityDataRequest.AllDataAndRecorded. If not recorded you could be either ActivityDataRequest.PropagationData or ActivityDataRequest.AllData. I don't think we can solve that without expanding W3C ActivityTraceFlags. Or we could also just drop ActivityDataRequest.AllData? I don't really see the value it is adding to the mix.