Sidenote: we should make sure we're allowing for special characters in the variables. Maybe
-
2 cents:
-
I don't think that would be a good compromise: we would then have two ways of specifying the same information, which makes the config format harder to document, to understand, and to implement support for.
Agreed, we should try to be coherent when possible. So as it stands, I would support something like:

```json
"applyTransform": {
  "method": "regexp_extract",
  "args": [".*_(val|train)2014\\.json$"]
}
```

over the less generic alternative:

```json
"applyTransform": {
  "regex": ".*_(val|train)2014\\.json$"
}
```

Whichever option we pick, I'd prefer the group to decide sooner rather than later.
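A minimal Python sketch of why the generic `method`/`args` form scales: a consumer only needs a registry from method names to functions, so new transforms can be added without changing the config schema. The `TRANSFORMS` registry and `apply_transform` are hypothetical names, and the assumption that `regexp_extract` returns the first capture group (or `None` when nothing matches) is mine, not something the spec defines.

```python
import json
import re
from typing import Callable, Dict, Optional


def regexp_extract(value: str, pattern: str, group: int = 1) -> Optional[str]:
    """Return capture group `group` of the first match of `pattern` in `value`, or None."""
    match = re.search(pattern, value)
    return match.group(group) if match else None


# Hypothetical registry: each "method" name maps to a function that receives the
# value followed by the entries of "args". Adding a method needs no schema change.
TRANSFORMS: Dict[str, Callable[..., Optional[str]]] = {
    "regexp_extract": regexp_extract,
}


def apply_transform(value: str, transform: dict) -> Optional[str]:
    """Apply a generic {"method": ..., "args": [...]} transform to a single value."""
    method = TRANSFORMS.get(transform["method"])
    if method is None:
        raise ValueError(f"unknown transform method: {transform['method']!r}")
    return method(value, *transform.get("args", []))


# The "applyTransform" object from the first option; json.loads turns "\\." into "\.".
config = json.loads(r'{"method": "regexp_extract", "args": [".*_(val|train)2014\\.json$"]}')

print(apply_transform("annotations/instances_val2014.json", config))  # val
print(apply_transform("captions_train2014.json", config))             # train
print(apply_transform("instances_test2014.json", config))             # None
```

With the `"regex"` alternative, each new kind of transform would need its own dedicated key instead of one more registry entry.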
-
Cool, thanks for the discussion. I'll try to send a few PRs and then we can iron things out there.
-
Right now, references are expected to be of the form `#{[ref]}` (https://github.com/mlcommons/datasets_format/blob/main/docs/croissant-spec.md#reference).
The problems:

1. I don't see what benefits the `#{}` syntax brings us, aside from unnecessary complexity: references always appear in values which are expected to be of type Reference (`containedIn`, `source`, `key`, `references`, `data`), so we already know those are references and not plain strings.
2. It is ambiguous: `#{file.csv/filename}` can mean `filename`, the property filled in by Croissant, or `filename`, a CSV column.

Possible solution:
1. Drop the `#{}` syntax.
2. Use `csvfile/column:filename` to refer to the `filename` column (`column` is the accessor, `filename` is the parameter), and `csvfile/filename` to refer to the CSV filename (accessor with no parameter).

This way we can envision other types of accessors, e.g. `image-files/exif:author` to get the author from image EXIF metadata, `file.json/jsonpath:$.store.book[*].author` to get the list of authors from a JSON file, etc. What do you folks think?
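To make the proposal concrete, here is a minimal Python sketch of a parser for `source/accessor[:param]` references, with no `#{}` wrapper. The `Reference` and `parse_reference` names are hypothetical, and it assumes source names never contain `/`. Splitting only at the first `:` leaves room for special characters in parameters, such as the `$`, `[`, `*` and `.` in a JSONPath expression.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    source: str           # node being referred to, e.g. "csvfile" or "file.json"
    accessor: str         # how to read from it, e.g. "column", "exif", "jsonpath", "filename"
    param: Optional[str]  # accessor parameter, or None for accessors that take none


def parse_reference(ref: str) -> Reference:
    """Parse `source/accessor[:param]` (hypothetical grammar from this proposal)."""
    # Assumption: source names never contain "/", so the first "/" ends the source.
    source, slash, rest = ref.partition("/")
    if not slash or not source or not rest:
        raise ValueError(f"expected source/accessor[:param], got {ref!r}")
    # Split on the first ":" only, so parameters may themselves contain ":".
    accessor, colon, param = rest.partition(":")
    if not accessor or (colon and not param):
        raise ValueError(f"empty accessor or parameter in {ref!r}")
    return Reference(source, accessor, param if colon else None)


print(parse_reference("csvfile/column:filename"))
# Reference(source='csvfile', accessor='column', param='filename')
print(parse_reference("csvfile/filename"))
# Reference(source='csvfile', accessor='filename', param=None)
print(parse_reference("file.json/jsonpath:$.store.book[*].author"))
```

Because `column:filename` and `filename` parse to different accessors, the ambiguity between a CSV column and the file's own property goes away, and the field type alone tells the consumer the value is a reference.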