add `origin_referrer_url` and `origin_url` to the file attribute #1430
Hi @trisch-me |
LGTM
there could be alternative solutions such as:
- reusing existing `url` attributes on the same telemetry item that describes a file
- defining specific security events which list all applicable metadata as event fields (not attributes)
The current solution seems to be very specific to a certain use-case (I download a file and capture its metadata, which will not be accessible later). These attributes will not be available or applicable to any other case.
It won't make sense to reuse those attributes in other conventions.
So please consider alternative solutions and please provide a use-case for these attributes.
Thank you all for your comments. Based on feedback from various sources, I have added However, since there also have been concerns raised about whether the fields we plan to add are even necessary, we are considering having @trisch-me (and @magermark ) lead a more in-depth discussion during the upcoming Otel Semantic Convention meeting. |
Hi @trisch-me @lmolkova @joaopgrassi |
@AsuNa-jp could you please fix conflicts? thanks |
had a discussion with @trask and @lmolkova about attributes and Events.
1. I'm comfortable now having these as attributes. It seems we're moving more in this direction with our guidance.
2. I'm still going to push back on a lack of "Event" definition in opentelemetry for these attributes.
Regarding (2), adding the attribute to the registry does NOT allow opentelemetry to use this attribute yet. There is still no defined signal (event, span or metric) which is defined to generate this data. You have a clear use case outlined in the PR. I suspect these attributes should be part of a `file.open` or `file.access` event.
I would like to see this event defined, but I won't block this PR if others feel it's ok as-is and approve these attributes without that detail.
model/file/registry.yaml
Outdated
@@ -137,3 +161,18 @@ groups:
This attribute is only applicable to symbolic links.
stability: experimental
examples: ['/usr/bin/python3']
- id: file.zone_identifier
Should this be `file.windows_zone.id`?
Given the generic nature of the word 'zone' I think this should be clarified. Additionally we should stick to a similar convention across otel for `id`, and we don't use `identifier` anywhere else that I'm aware of.
Hi @jsuereth (CC: @trisch-me ) |
@jsuereth could you check if your concerns are addressed and approve this pr? |
path, directory, size, extension, and metadata, including
file access time, file origin information and more. In addition,
it also includes information about the process that accessed the file.
attributes:
let's add note on who/how/when should emit this event.
E.g. OTel instrumentations are usually run in a certain process and would usually 1) monitor things this process does (not OS-wide things) 2) set `process` attributes as resource attributes, not as event attributes (because resource attributes are shared for the process lifetime and it's much more efficient).
If it's an external observer that monitors something on the OS level, we should call it out.
If it's either, then we should explain how and if process attributes should be recorded.
Hi @lmolkova (CC: @trisch-me, @jsuereth)
Thank you for the feedback!
> not OS-wide things
As you pointed out, it was my misunderstanding: I was imagining OS-level events. (Since ECS can also handle that information, I ended up confusing the two.)
> let's add note on who/how/when should emit this event.
> E.g. OTel instrumentations are usually run in a certain process
I'm not very familiar with the internal structure of OTel's instrumentation, so to be honest, it's difficult for me to provide an answer at this point. If possible, could you please share the specific technical documentation for the OTel instrumentation you're referring to? Additionally, if OS-level events are not expected, who is expected to send the `file.access` or `file.open` events? Could you share any assumptions or scenarios you had in mind?
@AsuNa-jp you can read more about otel instrumentations here https://opentelemetry.io/docs/concepts/instrumentation, or explore instrumentations in each language. They are usually in 'contrib' repositories. E.g. here's the python contrib repo
Having said this, it seems you're trying to define an event that an external observer (on the OS level) would record. Let's stick to this. The ask is to explicitly document this.
You might be interested to explore collector that already reports somewhat related things in filestats receiver, hostmetrics receiver, journald receiver, etc
Hi @lmolkova
Thank you for your detailed response to my question. I appreciate it. I will carefully go through the provided documents and, after understanding what Otel expects as event examples, proceed with revising this PR.
- ref: file.directory
- ref: file.size
- ref: file.extension
- ref: file.accessed
Should this be the time the file was last accessed before it was opened this time? Is that available? Otherwise, it'd be the same as the event timestamp and then it's not necessary.
- ref: process.user.name
brief: Process name of the process that accessed the file.
- ref: process.executable.name
brief: Executable file name of the process that accessed the file.
what if an attempt to open the file failed: do we want to record this event? If so, we should add the `error.type` attribute to it.
- id: event.file.open
stability: experimental
type: event
name: file.open
can we qualify what open means? Below you mention it's actually an access event (which could probably mean other things, like setting/getting metadata). Should it be called `file.access` then?
Does it intend to capture OS-level audit events like https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-10/security/threat-protection/auditing/event-4663 or https://github.com/linux-audit/audit-userspace?tab=readme-ov-file#events?
path, directory, size, extension, and metadata, including
file access time, file origin information and more. In addition,
it also includes information about the process that accessed the file.
attributes:
I can imagine a few scenarios for such events:
- audit (and then we should probably describe how OS events are mapped to this one)
- monitoring/observability - from within the process, I want to record an event every time file is opened and know IO operation details. For this one, we should record things like the file open mode, access permissions provided, and the error if it didn't happen.
- ref: file.name
- ref: file.path
- ref: file.directory
- ref: file.size
is it always known when file is opened?
…entions into file_originevents
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
@AsuNa-jp are you continuing to work on this PR? |
Hi @trisch-me , |
Changes
This PR adds the following attributes. (Updated 2024/Dec/05)
file.origin_referrer_url
file.origin_url
(Thanks @trisch-me for all the advice you gave me in creating this PR!)
Background: What are these fields for? (Updated 2024/Oct/21)
When downloading files from the internet (or network) using a web browser (such as Chrome or Edge) or a certain application, information about where the file came from is generally added to the file. This is a general behavior that can occur on all operating systems, and its primary use is to enhance security by providing context about the file’s source, allowing the system to assess potential risks and enforce appropriate security measures.
The details are explained below.
Windows
In Windows, it is known as the Mark of the Web (ref1, ref2), and is added to the file's NTFS alternate data stream.
For example, when you download an image file (image17.webp) from this webpage using a web browser, the download source URL is automatically added to the file's Alternate Data Stream (ADS) as follows.
image17.webp:Zone.Identifier:$DATA
This PR adds a field to store the URL of the file's origin, which is saved in the NTFS alternate data stream (ADS).
`ReferrerUrl` is intended to be stored in the `origin_referrer_url` field.
`HostUrl` is intended to be stored in the `origin_url` field.
Note - In the case of Windows, MotW can be used not only with NTFS but also with ReFS (8.1/2012 R2 or later).
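As a rough illustration of the Windows mapping described above, the sketch below parses the text of a `Zone.Identifier` stream and maps `HostUrl` and `ReferrerUrl` onto the attribute names proposed in this PR. The function names (`parse_zone_identifier`, `read_zone_identifier`), the mapping table, and the sample URLs are made up for this example. The reader function assumes a Windows host with a volume that supports alternate data streams, and the text encoding it uses is an assumption as well.

```python
from typing import Dict

# Hypothetical mapping from Zone.Identifier keys to the attributes proposed in this PR.
MOTW_KEY_TO_ATTRIBUTE = {
    "HostUrl": "file.origin_url",
    "ReferrerUrl": "file.origin_referrer_url",
}


def parse_zone_identifier(text: str) -> Dict[str, str]:
    """Extract origin attributes from the text of a Zone.Identifier stream."""
    attributes: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first "=" only, since URLs may contain "=" themselves.
        key, sep, value = line.strip().partition("=")
        if sep and key in MOTW_KEY_TO_ATTRIBUTE and value:
            attributes[MOTW_KEY_TO_ATTRIBUTE[key]] = value
    return attributes


def read_zone_identifier(path: str) -> Dict[str, str]:
    """Read the stream from disk (assumes Windows); returns {} if it cannot be opened."""
    try:
        # Assumption: the stream is readable as UTF-8; undecodable bytes are replaced.
        with open(path + ":Zone.Identifier", "r", encoding="utf-8", errors="replace") as stream:
            return parse_zone_identifier(stream.read())
    except OSError:
        return {}


# Illustrative stream content; the URLs are placeholders.
sample = (
    "[ZoneTransfer]\n"
    "ZoneId=3\n"
    "ReferrerUrl=https://example.com/page\n"
    "HostUrl=https://example.com/image17.webp?x=1\n"
)
print(parse_zone_identifier(sample))
```

Keeping the parser separate from the file access means the mapping can be exercised on any platform, while only `read_zone_identifier` depends on Windows.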
Linux
In Linux, some applications may store the file origin metadata in extended attributes (xattr) or the GNOME virtual filesystem (GVfs) to track the source of a file.
For example, when you download an image file (image17.webp) from this webpage using a web browser, the download source URL is automatically added to gvfs.
example of a file downloaded by using firefox
Additionally, by using Curl or Wget, the referrer URL (`user.xdg.referrer.url`) and origin URL (`user.xdg.origin.url`) can be attached to the file's extended attributes. (Google Chrome used to add `user.xdg.referrer.url` and `user.xdg.origin.url` as well, but this feature is currently turned off.)
example of a file downloaded by using curl
`user.xdg.referrer.url` is intended to be stored in the `origin_referrer_url` field.
`user.xdg.origin.url` is intended to be stored in the `origin_url` field.
Note - As written in this web page, all major Linux file systems including Ext4, Btrfs, ZFS, and XFS support extended attributes.
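The Linux mapping can be sketched in a similar way. The example below reads the two `user.xdg.*` extended attributes with `os.getxattr` (available in Python on Linux only) and translates them into the attribute names proposed in this PR. The helper names and the sample values are hypothetical, and the sketch does not cover metadata stored in GVfs.

```python
import os
from typing import Dict, Mapping

# Hypothetical mapping from xattr names to the attributes proposed in this PR.
XATTR_TO_ATTRIBUTE = {
    "user.xdg.origin.url": "file.origin_url",
    "user.xdg.referrer.url": "file.origin_referrer_url",
}


def map_origin_xattrs(xattrs: Mapping[str, bytes]) -> Dict[str, str]:
    """Translate raw xattr values into the proposed file.* attributes."""
    return {
        XATTR_TO_ATTRIBUTE[name]: value.decode("utf-8", errors="replace")
        for name, value in xattrs.items()
        if name in XATTR_TO_ATTRIBUTE
    }


def read_origin_xattrs(path: str) -> Dict[str, str]:
    """Read the origin xattrs from a file; missing attributes are skipped."""
    raw: Dict[str, bytes] = {}
    for name in XATTR_TO_ATTRIBUTE:
        try:
            raw[name] = os.getxattr(path, name)
        except (OSError, AttributeError):
            # Attribute not set, file missing, no xattr support, or non-Linux platform.
            continue
    return map_origin_xattrs(raw)


# Illustrative input; the values are placeholders, not real download metadata.
sample_xattrs = {
    "user.xdg.origin.url": b"https://example.com/image17.webp",
    "user.mime_type": b"image/webp",
}
print(map_origin_xattrs(sample_xattrs))
```

Unrelated attributes such as `user.mime_type` are ignored, so only the two origin fields end up on the telemetry item.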
macOS
(Since I don't have a Mac device, my investigation is based on information found on the internet.)
In macOS, some applications may store the file origin metadata in extended attributes to track the source of a file as follows. It seems that both the referrer and origin URL are being saved.
The image source is as follows:
https://stackoverflow.com/questions/70444996/obtaining-metadata-where-from-of-a-file-on-mac
The same thing is mentioned on another website as well. (https://exiftool.org/forum/index.php?topic=14991.0)
Background: the use cases. (Updated 2024/Oct/21)
(A). For example, in Elastic Security (Elastic Defend), a file open event may be generated when a file is opened. By including the file's origin information, such as the Origin URL and Referrer URL, the system can assess whether the file might be malware downloaded from a malicious website based on those URLs.
(B). Another example would be adding file origin information (such as the Origin URL and Referrer URL) to the file creation event when a file is downloaded from the internet. This would make it possible to detect if the file was downloaded from a website on a blocklist and take actions such as deleting the file.
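Use case (B) could look roughly like the sketch below: a consumer receives the attributes of a file event and checks the hosts of the origin and referrer URLs against a blocklist. The function names, the event shape (a plain dictionary), and the domains are assumptions for illustration only; this is not how Elastic Security or any OTel component implements such a check.

```python
from typing import Iterable, Mapping
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host of a URL, or "" if it has none or is malformed."""
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""


def is_blocklisted(attributes: Mapping[str, str], blocklist: Iterable[str]) -> bool:
    """True if the origin or referrer host is a blocklisted domain or a subdomain of one."""
    blocked = {domain.lower() for domain in blocklist}
    for key in ("file.origin_url", "file.origin_referrer_url"):
        host = host_of(attributes.get(key, ""))
        if any(host == domain or host.endswith("." + domain) for domain in blocked):
            return True
    return False


# Hypothetical file creation event carrying the attributes proposed in this PR.
event = {
    "file.name": "image17.webp",
    "file.origin_url": "https://cdn.malicious.example/image17.webp",
    "file.origin_referrer_url": "https://news.example.org/",
}
print(is_blocklisted(event, ["malicious.example"]))
```

Matching on the parsed host rather than on the raw URL string avoids false hits when a blocklisted domain merely appears in a path or query string.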
Merge requirement checklist
[chore]