[v2][WIP] Span hash sanitizer and enhance span hash adjuster #6499

Draft: wants to merge 11 commits into base: main

Conversation

chahatsagarmain
Contributor

@chahatsagarmain chahatsagarmain commented Jan 7, 2025

Which problem is this PR solving?

Description of the changes

  • Add a hash to the span attributes
  • Make changes to components computing hash

How was this change tested?

Checklist

Signed-off-by: chahatsagarmain <[email protected]>
@chahatsagarmain chahatsagarmain requested a review from a team as a code owner January 7, 2025 18:42
@chahatsagarmain chahatsagarmain marked this pull request as draft January 7, 2025 18:42
@dosubot dosubot bot added the v2 label Jan 7, 2025
hashStr := hex.EncodeToString(buf.Bytes())
span.Tags = append(span.Tags, model.KeyValue{
Key: "span.hash",
VType: model.ValueType_STRING,
Member

hard to evaluate this PR without seeing how it will be used in the storage backends. For example, is it really a string we want, or a number?

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 72.72727% with 18 lines in your changes missing coverage. Please review.

Project coverage is 96.17%. Comparing base (08503ca) to head (c674268).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
cmd/collector/app/sanitizer/hash_sanitizer.go 50.00% 4 Missing and 2 partials ⚠️
cmd/query/app/querysvc/v2/adjuster/hash.go 70.00% 5 Missing and 1 partial ⚠️
model/span.go 68.42% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6499      +/-   ##
==========================================
- Coverage   96.27%   96.17%   -0.10%     
==========================================
  Files         372      373       +1     
  Lines       21282    21332      +50     
==========================================
+ Hits        20490    20517      +27     
- Misses        605      622      +17     
- Partials      187      193       +6     
Flag Coverage Δ
badger_v1 10.63% <0.00%> (-0.04%) ⬇️
badger_v2 2.77% <0.00%> (-0.01%) ⬇️
cassandra-4.x-v1-manual 16.66% <41.30%> (+0.08%) ⬆️
cassandra-4.x-v2-auto 2.70% <0.00%> (-0.01%) ⬇️
cassandra-4.x-v2-manual 2.70% <0.00%> (-0.01%) ⬇️
cassandra-5.x-v1-manual 16.66% <41.30%> (+0.08%) ⬆️
cassandra-5.x-v2-auto 2.70% <0.00%> (-0.01%) ⬇️
cassandra-5.x-v2-manual 2.70% <0.00%> (-0.01%) ⬇️
elasticsearch-6.x-v1 20.16% <0.00%> (-0.07%) ⬇️
elasticsearch-7.x-v1 20.24% <0.00%> (-0.08%) ⬇️
elasticsearch-8.x-v1 20.39% <0.00%> (-0.07%) ⬇️
elasticsearch-8.x-v2 2.76% <0.00%> (-0.01%) ⬇️
grpc_v1 12.27% <0.00%> (-0.05%) ⬇️
grpc_v2 9.06% <0.00%> (-0.04%) ⬇️
kafka-3.x-v1 10.59% <43.47%> (+0.24%) ⬆️
kafka-3.x-v2 2.77% <0.00%> (-0.01%) ⬇️
memory_v2 2.76% <0.00%> (-0.01%) ⬇️
opensearch-1.x-v1 20.28% <0.00%> (-0.07%) ⬇️
opensearch-2.x-v1 20.28% <0.00%> (-0.08%) ⬇️
opensearch-2.x-v2 2.76% <0.00%> (-0.02%) ⬇️
tailsampling-processor 0.51% <0.00%> (-0.01%) ⬇️
unittests 94.97% <46.96%> (-0.18%) ⬇️


Signed-off-by: chahatsagarmain <[email protected]>
model/span.go Outdated
Comment on lines 114 to 120
func (s *Span) HashHashTag() bool {
if _, ok := KeyValues(s.Tags).FindByKey(SpanHashKey); ok {
return true
}
return false
}

Member

Suggested change
func (s *Span) HashHashTag() bool {
if _, ok := KeyValues(s.Tags).FindByKey(SpanHashKey); ok {
return true
}
return false
}

redundant. Someone can always call _, ok := GetHashTag
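The point about redundancy can be illustrated with Go's comma-ok idiom. The sketch below uses simplified stand-in `Span` and `KeyValue` types (assumptions for illustration, not the real Jaeger model types) to show that a two-value getter already answers "is the tag present?", so a separate boolean method adds nothing.

```go
package main

import "fmt"

// KeyValue and Span are simplified stand-ins for illustration only;
// they are not the real Jaeger model definitions.
type KeyValue struct {
	Key    string
	VInt64 int64
}

type Span struct {
	Tags []KeyValue
}

// spanHashKey mirrors the "span.hash" key used in this PR.
const spanHashKey = "span.hash"

// GetHashTag returns the stored hash and whether the tag was present.
func (s *Span) GetHashTag() (int64, bool) {
	for _, kv := range s.Tags {
		if kv.Key == spanHashKey {
			return kv.VInt64, true
		}
	}
	return 0, false
}

func main() {
	s := &Span{Tags: []KeyValue{{Key: spanHashKey, VInt64: 42}}}
	// The presence check needs no dedicated "has" method.
	if _, ok := s.GetHashTag(); ok {
		fmt.Println("span already carries a hash tag")
	}
}
```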

return span
}

func uint64ToInt64Bits(value uint64) int64 {
Member

duplication. I would move this to Span, e.g. span.SetHashTag()

Signed-off-by: chahatsagarmain <[email protected]>
Signed-off-by: chahatsagarmain <[email protected]>
@chahatsagarmain chahatsagarmain marked this pull request as ready for review January 9, 2025 19:21
spanHash, _ := model.HashCode(span)
spanHash, found := span.GetHashTag()
if !found {
tempSpam := *span
Member

why do we need to make a copy?

Contributor Author

I wanted a review on this. Calling SetHashTag adds the span.hash tag to the span, and the unit tests then fail because of the unexpected value.

Member

yes, I would expect the tests to fail if we add a new tag, depending on how the tests are implemented. We need to work around that.

model/span.go Outdated
if tag, ok := KeyValues(s.Tags).FindByKey(SpanHashKey); ok {
return tag.GetVInt64(), true
}
return -1, false
Member

Suggested change
return -1, false
return 0, false

I don't think -1 conveys any more meaning than 0, so I would rather return the default value; this would match the behavior of a get from a standard map[string]int64
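The map behavior referred to here is easy to demonstrate. In this small sketch, `getHash` is a hypothetical wrapper (not part of the PR) that has the same `(value, found)` shape as the getter under review; a missing key yields the zero value together with `false`.

```go
package main

import "fmt"

// getHash mirrors the (value, found) shape of a plain map lookup.
// It is a hypothetical helper used only for this illustration.
func getHash(hashes map[string]int64, key string) (int64, bool) {
	v, ok := hashes[key]
	return v, ok
}

func main() {
	hashes := map[string]int64{"span-a": 7}
	v, ok := getHash(hashes, "span-b")
	fmt.Println(v, ok) // prints "0 false"
}
```

Callers are expected to branch on the boolean, so the numeric value for the not-found case carries no meaning either way.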

model/span.go Outdated
return -1, err
}
spanHash := uint64ToInt64Bits(hCode)
s.Tags = append(s.Tags, KeyValue{
Member

we have helper methods for creating typed instances of KeyValue, please use that instead of explicit creation.
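As a rough illustration of this suggestion, the sketch below builds the tag through a typed constructor instead of a struct literal. The `model.Int64(key, value)` call that appears elsewhere in this diff suggests the shape of such a helper; everything else here (the stand-in `Span`, `KeyValue`, and `ValueType` types, and hashing only the operation name with FNV-1a) is an assumption made to keep the example self-contained, not the actual Jaeger implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ValueType, KeyValue and Span are simplified stand-ins, not the real
// Jaeger model types.
type ValueType int

const (
	StringType ValueType = iota
	Int64Type
)

type KeyValue struct {
	Key    string
	VType  ValueType
	VInt64 int64
}

// Int64 builds an integer-typed KeyValue so callers never set VType by hand.
func Int64(key string, value int64) KeyValue {
	return KeyValue{Key: key, VType: Int64Type, VInt64: value}
}

type Span struct {
	OperationName string
	Tags          []KeyValue
}

const spanHashKey = "span.hash"

// SetHashTag computes a placeholder hash (FNV-1a over the operation name,
// standing in for the real span hash) and appends it as a typed tag.
func (s *Span) SetHashTag() (int64, error) {
	h := fnv.New64a()
	if _, err := h.Write([]byte(s.OperationName)); err != nil {
		return 0, err
	}
	spanHash := int64(h.Sum64())
	s.Tags = append(s.Tags, Int64(spanHashKey, spanHash))
	return spanHash, nil
}

func main() {
	s := &Span{OperationName: "GET /users"}
	hash, err := s.SetHashTag()
	if err != nil {
		panic(err)
	}
	fmt.Println(hash, len(s.Tags))
}
```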

@@ -16,6 +16,7 @@ type SanitizeSpan func(span *model.Span) *model.Span
func NewStandardSanitizers() []SanitizeSpan {
return []SanitizeSpan{
NewEmptyServiceNameSanitizer(),
NewHashingSanitizer(),
Member

@yurishkuro yurishkuro Jan 9, 2025

note: we also have v2 sanitizers, need to add this logic there as well.

./cmd/jaeger/internal/sanitizer

@chahatsagarmain chahatsagarmain marked this pull request as draft January 9, 2025 21:57
Signed-off-by: chahatsagarmain <[email protected]>
@@ -27,6 +27,7 @@ const (
BinaryType = ValueType_BINARY

SpanKindKey = "span.kind"
SpanHashKey = "span.hash"
Member

let's move it into a separate group and use value @jaeger@hash - see #6522

span.CopyTo(dedupedSpans.AppendEmpty())
continue

var spanHash int64
Member

this should be a helper function (or two) in jptrace package, similar to how you added them to model/

@yurishkuro
Member

@chahatsagarmain what's the status of this?

@chahatsagarmain
Contributor Author

@yurishkuro I was busy with CI issues (mostly due to the cache action). Now I can get back to these.

Signed-off-by: chahatsagarmain <[email protected]>
Signed-off-by: chahatsagarmain <[email protected]>
)

// NewHashingSanitizer creates a sanitizer to add hash field to spans
func NewHashingSanitizer() SanitizeSpan {
Member

Suggested change
func NewHashingSanitizer() SanitizeSpan {
func AddHashTag() SanitizeSpan {

and similarly file name add_hash_tag

Comment on lines 21 to 23
protoMarshaler := &ptrace.ProtoMarshaler{}
return &SpanHasher{
marshaler: protoMarshaler,
Member

Suggested change
protoMarshaler := &ptrace.ProtoMarshaler{}
return &SpanHasher{
marshaler: protoMarshaler,
return &SpanHasher{
marshaler: &ptrace.ProtoMarshaler{},

}
}

func (s *SpanHasher) SpanHash(trace ptrace.Traces) (int64, error) {
Member

ptrace.Traces is a collection of spans, it doesn't make sense to ask for a single hash. It should be taking ptrace.Span as input.

hash, _ := model.HashCode(span)
if _, ok := spansByHash[hash]; !ok {
spansByHash[hash] = span
spanHash, found := span.GetHashTag()
Member

is this approach going to work if the span undergoes multiple transformations? E.g. we start with sanitizers that can mutate it, and we can add the tag as the last step in sanitizers (so probably ok for the write path). But on the read path we also have adjusters that can change the trace, with this deduper somewhere in the middle; is it going to behave consistently? I suspect yes, because as a deduper its job is really to filter out identical spans as ingested.
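The reasoning above can be sketched as follows: a deduper that prefers the hash stored at ingestion, and only recomputes when none is present, keeps treating two copies of a span as identical even if one of them was modified later on the read path. The `Span` type, its `Hash`/`HasHash` fields, and `computeHash` are hypothetical stand-ins for illustration; the PR itself reads the hash from a span tag.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Span is a simplified stand-in; Hash and HasHash model a hash tag
// that may or may not have been stored when the span was ingested.
type Span struct {
	SpanID  string
	Name    string
	Hash    int64
	HasHash bool
}

// computeHash is a placeholder fallback hash over the span's fields.
func computeHash(s Span) int64 {
	h := fnv.New64a()
	h.Write([]byte(s.SpanID))
	h.Write([]byte{0}) // separator so field boundaries stay unambiguous
	h.Write([]byte(s.Name))
	return int64(h.Sum64())
}

// dedupe keeps the first span seen for each hash, preferring the hash
// stored at ingestion over one computed from the current contents.
func dedupe(spans []Span) []Span {
	seen := make(map[int64]struct{}, len(spans))
	out := make([]Span, 0, len(spans))
	for _, s := range spans {
		hash := s.Hash
		if !s.HasHash {
			hash = computeHash(s)
		}
		if _, dup := seen[hash]; dup {
			continue
		}
		seen[hash] = struct{}{}
		out = append(out, s)
	}
	return out
}

func main() {
	spans := []Span{
		{SpanID: "a", Name: "op", Hash: 1, HasHash: true},
		// Same ingested span, renamed by a later transformation.
		{SpanID: "a", Name: "op-renamed", Hash: 1, HasHash: true},
		{SpanID: "b", Name: "op"}, // no stored hash, falls back to computeHash
	}
	fmt.Println(len(dedupe(spans)))
}
```

Because the stored hash is not recomputed, the second span is dropped as a duplicate despite its changed name, which matches the "identical as ingested" behavior described above.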

@@ -82,6 +86,7 @@ func (c converter) toDomain(dbSpan *Span) (*model.Span, error) {
if err != nil {
return nil, err
}
tags = append(tags, model.Int64(jptrace.HashAttribute, dbSpan.SpanHash))
Member

wouldn't we expect hash tag to already be stored?

chahatsagarmain and others added 3 commits January 22, 2025 03:42
Signed-off-by: chahatsagarmain <[email protected]>
Signed-off-by: chahatsagarmain <[email protected]>
Successfully merging this pull request may close these issues.

[v2][adjuster] Enhance Span Hash Adjuster For Spans That Have Already Been Hashed
2 participants