Introduction of a Synthetic Attribute for Server Span Telemetry #1127

JacksonWeber · 2024-06-06T01:56:51Z

Area(s)

area:browser

Is your change request related to a problem? Please describe.

I would like to be able to identify telemetry created by synthetic sources such as bots or crawlers. This issue proposes defining conventions for marking spans as originating from a synthetic source.

Describe the solution you'd like

I would like to introduce an attribute to the HTTP server span semantic conventions, as well as to metrics and logs, that holds a low-cardinality string such as the following:

synthetic -> "not set" | "bot" | "synthetic test"

Where the synthetic attribute being set to "not set" represents telemetry that is not generated from a synthetic source. This convention will be helpful for scenarios where a user may want to mark telemetry generated from frequent synthetic tests or web crawlers separately from direct human engagement.

The determination of which of the three options a span falls into could be made by maintaining a list of known synthetic sources or allowing this decision to be user configurable.
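As a rough illustration of the "known list plus user configuration" idea, the sketch below classifies a request by its User-Agent string. The function name `classify_synthetic`, the `extra_patterns` parameter, and the `synthetic-monitor` pattern are hypothetical and not part of any proposal; the three return values are the ones listed above.

```python
import re
from typing import Iterable, Optional, Pattern, Tuple

# Hypothetical, deliberately tiny list of known synthetic sources.
# A real list would be much longer and would need regular updates.
KNOWN_SYNTHETIC_PATTERNS = [
    (re.compile(r"googlebot|bingbot", re.IGNORECASE), "bot"),
    (re.compile(r"synthetic-monitor", re.IGNORECASE), "synthetic test"),
]


def classify_synthetic(
    user_agent: Optional[str],
    extra_patterns: Iterable[Tuple[Pattern[str], str]] = (),
) -> str:
    """Return "bot", "synthetic test", or "not set" for a User-Agent string.

    extra_patterns is the assumed user-configurable part; it is checked
    before the built-in list so that users can override it.
    """
    if not user_agent:
        return "not set"
    for pattern, label in (*extra_patterns, *KNOWN_SYNTHETIC_PATTERNS):
        if pattern.search(user_agent):
            return label
    return "not set"
```

A user-supplied pattern such as `(re.compile("acme-checker"), "synthetic test")` could then be passed via `extra_patterns` without touching the built-in list.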

Describe alternatives you've considered

While we could consider setting the synthetic attribute to a Boolean value, I believe the extra granularity of the low-cardinality string would be valuable.

Additional context

No response

MSNev · 2024-06-06T19:16:00Z

#1230

lmolkova · 2024-10-07T16:58:31Z

A few questions/thoughts:

1. While non-HTTP usage would probably be low or non-existent, I don't think it belongs in the HTTP domain. So I'd consider adding some attribute like user_agent.type (probably needs a better name).
2. Is there some prior art in the industry to identify a synthetic source/bot user? Is there an attribute in ECS for it? Are there some non-telemetry user-agent conventions for it?
3. nit: let's just not set an attribute instead of using a not_set value.

It'd be awesome if you could send a PR with a specific proposal (considering the above).
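To make the nit about leaving the attribute unset concrete, here is a minimal sketch that builds a plain attribute dictionary and simply omits the key for non-synthetic traffic. The helper name `synthetic_attributes` is hypothetical, and `user_agent.type` is only the placeholder name floated above, not an agreed attribute.

```python
from typing import Dict, Optional


def synthetic_attributes(source_type: Optional[str]) -> Dict[str, str]:
    """Build span attributes for a request.

    source_type is e.g. "bot" or "synthetic test", or None for traffic
    that is not known to be synthetic.
    """
    if source_type is None:
        # Leave the attribute off entirely instead of emitting "not_set".
        return {}
    # Placeholder attribute name taken from the discussion; an assumption.
    return {"user_agent.type": source_type}
```

Consumers would then treat a missing key as "not synthetic" rather than matching on a sentinel string.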

JacksonWeber · 2024-10-09T18:01:14Z

Thank you for your feedback on this issue! Just a couple questions regarding your first point:

While I don't expect non-HTTP telemetry to need this synthetic source flag, I suppose it could be more generic and defined outside of HTTP specifically. However, I'm struggling to find a more relevant association for it. For example, if I want to define an attribute in spans.yaml, my options are http, rpc, faas, gen-ai, database, messaging, and cloud-events, none of which seems more relevant than http for something like a synthetic source. Maybe I'm missing something about the structure of the semantic conventions here.
I'm also curious about the idea of a user_agent.type field: what kind of data would a field with that name hold?

reyang · 2024-10-10T16:51:35Z

I think we need to get some clarity regarding what a "synthetic source" is. For example, do we think it'll be a static list of client types (e.g. the User-Agent header for HTTP) or a list that will be frequently updated?

For example, we do not want to add an explicit flag saying "this trace is a result of a synthetic request" and then later realize, "oops, there are other traces from agent XYZ, and this agent is actually powered by AI/LLM, so the previously added synthetic flag should be fixed".

JacksonWeber · 2024-10-10T20:13:24Z

@reyang I think it'll be important to keep the list of known synthetic sources updated over time as there's no way to predict how popular a certain bot might become.

I'm a little confused by your example. Are you essentially saying that, in that scenario, we could miss synthetic traces created by newer technologies if we only maintained a static list?

JacksonWeber added the enhancement and triage:needs-triage labels Jun 6, 2024

github-actions bot assigned reyang Jun 6, 2024

lmolkova added this to Python: HTTP Semantic Conventions Jun 17, 2024

lmolkova unassigned reyang Jun 17, 2024

lmolkova added the area:http label Jun 17, 2024

lmolkova removed this from Python: HTTP Semantic Conventions Jun 18, 2024

lmolkova added this to Spec: HTTP Semantic Conventions Jun 18, 2024

joaopgrassi removed the triage:needs-triage label Jul 9, 2024

JacksonWeber mentioned this issue Aug 28, 2024

Appending Synthetic Attribute via Collector open-telemetry/opentelemetry-collector#10999

Open

JacksonWeber linked a pull request Oct 28, 2024 that will close this issue

add http.request.synthetic attribute to server spans and metrics #1523

Open

3 tasks
