Introduction of a Synthetic Attribute for Server Span Telemetry #1127
Comments
A few questions/thoughts:
It'd be awesome if you could send a PR with a specific proposal (considering the above).
Thank you for your feedback on this issue! Just a couple of questions regarding your first point:
I think we need to get some clarity on what a "synthetic source" is. For example, do we think it'll be a static list of client types (e.g. Agent header for HTTP) or a list that will be frequently updated? We do not want to add an explicit flag saying "this trace is a result of a synthetic request" and then notice "oops, we just realized that there are other traces from agent XYZ, and this agent is actually powered by AI/LLM, so the previously added synthetic flag should be fixed".
@reyang I think it'll be important to keep the list of known synthetic sources updated over time, as there's no way to predict how popular a certain bot might become. I'm a little confused by your example. Are you essentially saying that, in that scenario, we could miss synthetic traces created by newer technologies if we only maintained a static list?
Area(s)
area:browser
Is your change request related to a problem? Please describe.
I would like to be able to identify telemetry created by synthetic sources such as bots or crawlers. This issue proposes defining conventions for marking spans as originating from a synthetic source.
Describe the solution you'd like
I would like to introduce an attribute to the HTTP server span semantic conventions, as well as to metrics and logs, that takes a low-cardinality string value such as the following:
synthetic -> "not set" | "bot" | "synthetic test"
Here, setting the synthetic attribute to "not set" indicates telemetry that was not generated by a synthetic source. This convention will be helpful for scenarios where a user may want to mark telemetry generated by frequent synthetic tests or web crawlers separately from direct human engagement.
The determination of which of the three options a span falls into could be made by maintaining a list of known synthetic sources or by allowing this decision to be user configurable.
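To make the two determination options concrete, here is a minimal Python sketch, not a proposed implementation. It assumes the decision is based on substring matches against the HTTP User-Agent value; the function names `classify_synthetic` and `span_attributes`, the default marker lists, and the `ExampleSyntheticMonitor` agent are all hypothetical and chosen only for illustration. The marker lists are passed as parameters to show how the decision could be made user configurable.

```python
from typing import Mapping, Optional, Sequence

# Illustrative, non-exhaustive defaults; a maintained list would change over time.
DEFAULT_BOT_MARKERS = ("googlebot", "bingbot")
# Hypothetical marker for a synthetic monitoring tool (not a real product).
DEFAULT_TEST_MARKERS = ("examplesyntheticmonitor",)


def classify_synthetic(
    user_agent: Optional[str],
    bot_markers: Sequence[str] = DEFAULT_BOT_MARKERS,
    test_markers: Sequence[str] = DEFAULT_TEST_MARKERS,
) -> str:
    """Return one of the proposed values: "not set", "bot", or "synthetic test"."""
    if not user_agent:
        return "not set"
    ua = user_agent.lower()
    # Case-insensitive substring match against each configured marker.
    if any(marker.lower() in ua for marker in test_markers):
        return "synthetic test"
    if any(marker.lower() in ua for marker in bot_markers):
        return "bot"
    return "not set"


def span_attributes(headers: Mapping[str, str], **overrides) -> dict:
    """Build the attribute dict for a server span from lower-cased request headers."""
    return {"synthetic": classify_synthetic(headers.get("user-agent"), **overrides)}
```

A user could extend the defaults without changing the code, for example `span_attributes(headers, bot_markers=("mycrawler",))`, which is one way the "user configurable" option might look.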
Describe alternatives you've considered
While we could consider setting the synthetic attribute to a Boolean value, I believe the extra granularity of the low-cardinality string would be valuable.
Additional context
No response