-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Add Mysql2 and Trilogy db.collection.name
attribute
#1109
base: main
Are you sure you want to change the base?
Feat: Add Mysql2 and Trilogy db.collection.name
attribute
#1109
Conversation
d9e2fbd
to
8bd5818
Compare
db.collection.name
attributedb.collection.name
attribute
instrumentation/trilogy/lib/opentelemetry/instrumentation/trilogy/patches/client.rb
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small wording things on the trilogy test instructions you can keep or leave. Other than that, looks great! Thank you!
instrumentation/trilogy/test/opentelemetry/instrumentation/trilogy/instrumentation_test.rb
Outdated
Show resolved
Hide resolved
instrumentation/trilogy/test/opentelemetry/instrumentation/trilogy/instrumentation_test.rb
Outdated
Show resolved
Hide resolved
Co-authored-by: Kayla Reopelle <[email protected]>
@@ -83,9 +86,17 @@ def _otel_client_attributes | |||
|
|||
attributes[SemanticConventions::Trace::DB_NAME] = _otel_database_name | |||
attributes[SemanticConventions::Trace::PEER_SERVICE] = config[:peer_service] | |||
attributes['db.collection.name'] = collection_name(sql) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid adding additional overhead to the instrumentation if possible.
In cases where the SQL statement is omitted or not sanitized, there isn't any regexp scan that occurs.
Is there a way to optimize it so that there isn't additional processing?
I would also like to consider that this db.collection.name
matches newer versions of the OTel Schema and the instrumentations are still using pre-1.0 semantics.
If I am not mistaken that would have been db.table.name
. I think we should continue using pre-1.0 semantics until we have
OTEL_SEMCONV_STABILITY_OPT_IN
implemented
https://opentelemetry.io/docs/specs/semconv/database/database-spans/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @arielvalentin! It makes sense to keep consistent with pre-1.0 semantics. In this case, the attribute is db.sql.table
. I've made this update!
Re additional processing: I see your point about putting every SQL statement through a regexp scan. Because the table/collection name isn't available on the client, I'm not sure how else we'd be able to get this information. We could make table name an opt-in/opt-out attribute, but is that level of control something we want to provide? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean it's difficult to say. I feel like maybe this should only be done when the we also include the db.statement and I don't know if it would be possible to combine with the obfuscation code.
Maybe I'm making too much of it? I'm not sure.
Perhaps benchmarking will put me a bit at ease.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if only including the table name if we're recording the db.statement deviates from the convention (because as of now, the convention has it that recording the collection name isn't based on any condition, just availability).
The change to Regexp.last_match(1)
and an updated regex string improved speed. In the following benchmark example, I used the MySQL obfuscation SQL code as a baseline, and found running the extra db.collection.name
sql was 1.19x slower.
require 'benchmark/ipsa'
TABLE_NAME = /\b(?:(?:FROM|INTO|UPDATE)|(?:(?:CREATE|DROP|ALTER)\s+TABLE(?:\s+IF\s+(?:NOT\s+)?EXISTS)?))\s+["']?([\w.]+)["']?/i
MYSQL_OBFUSCATION = /(?-mix:'(?:[^']|'')*?(?:\\'.*|'(?!')))|(?-mix:"(?:[^"]|"")*?(?:\\".*|"(?!")))|(?-mix:(\$(?!\d)[^$]*?\$).*?(?:\1|$))|(?-mix:\{?(?:[0-9a-fA-F]\-*){32}\}?)|(?-mix:-?\b(?:[0-9]+\.)?[0-9]+([eE][+-]?[0-9]+)?\b)|(?i-mx:\b(?:true|false|null)\b)|(?-mix:0x[0-9a-fA-F]+)|(?i-mx:(?:#|--).*?(?=\r|\n|$))|(?m-ix:\/\*.*?\*\/)|(?-mix:q'\[.*?(?:\]'|$)|q'\{.*?(?:\}'|$)|q'\<.*?(?:\>'|$)|q'\(.*?(?:\)'|$))/
SQL = 'SELECT * FROM test_table'
def collection_name
shared_operation
Regexp.last_match(1) if SQL =~ TABLE_NAME
end
def no_collection_name
shared_operation
SQL
end
def shared_operation
SQL.gsub(MYSQL_OBFUSCATION, '?')
end
Benchmark.ipsa do |x|
x.report('collection_name') { collection_name }
x.report('no_collection_name') { no_collection_name }
x.compare!
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arielvalentin We chatted about this in the SIG today and decided the reporting of table/collection names can be put behind a feature flag. What do you think? If that works, the remaining consideration is if the default on or off - do you have any preference on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arielvalentin - I made a suggestion for the config name db_collection_name
and default value include
, with omit
as the other option: 8328668
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @arielvalentin! Wanted to check in and see if you had thoughts on the above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience. I'm going to add this to my list to review by EoD tomorrow
instrumentation/mysql2/lib/opentelemetry/instrumentation/mysql2/patches/client.rb
Outdated
Show resolved
Hide resolved
@hannahramadan Thank you for your patience waiting for my response. We discussed some of the details during the SIG on 2024-10-08 which I will share here: Default to
|
def db_sql_table_name(sql) | ||
Regexp.last_match(1) if sql =~ TABLE_NAME | ||
rescue StandardError | ||
nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an error occurs and the attribute is set to nil
, then the SDK will report an error. It would be best to avoid setting any invalid attributes.
Please ensure that the attribute is not set in cases where the table name could not be extracted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some changes! Will log an error if trouble getting the table name and won't report. Is the error logging here okay? 4af3e2c
instrumentation/mysql2/lib/opentelemetry/instrumentation/mysql2/patches/client.rb
Outdated
Show resolved
Hide resolved
Co-authored-by: Ariel Valentin <[email protected]>
…nnahramadan/opentelemetry-ruby-contrib into mysql_libs_db_collection_name
Thank you @arielvalentin and apologies for the delayed response here! To your point, I think it may make sense to nail down how we want the design to look and then add this attribute.
Perhaps we:
For table/collection name attribute:
|
The db.collection.name attribute is conditionally required for Database spans. This PR uses regex to detect the collection name and report it as the
db.collection.name
attribute for Mysql2 and Trilogy instrumentation.Closes #1100