CSV dialect detection: implementation without third party libraries #2247
Labels: enhancement, WIP
Discussed in #2246
Originally posted by ws-garcia October 25, 2024
Problem overview
Currently, this project has no reliable way to detect the configuration (dialect) of a CSV file. An example is raised in #1719, where the utility fails to detect the configuration of the given files.
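To make "CSV file configuration" concrete: a dialect is the set of parsing settings (delimiter, quote character, and so on) that a reader needs before it can split a file into fields. The sketch below uses Python's standard-library csv.Sniffer only to illustrate the task. It is not this project's detector and not the approach proposed in this issue, and the sample data and the list of candidate delimiters are made up for the example.

```python
import csv
import io

# Hypothetical sample: a small semicolon-delimited file held in memory.
sample = (
    "name;age;city\n"
    "Alice;30;Paris\n"
    "Bob;25;Berlin\n"
    "Carol;41;Madrid\n"
)

# Ask the sniffer to pick a dialect, restricted to a few candidate delimiters.
# sniff() raises csv.Error if it cannot decide.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")

# The detected dialect can be passed straight to csv.reader.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter))
print(rows[1])
```

Detection succeeds here because every line has the same number of semicolons; files with irregular rows or ambiguous delimiters are where detectors of this kind tend to fail.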
Details
At the moment, @jqnatividad has begun digging into the problem and claiming
He pointed
The work planned so far is outlined in jqnatividad/qsv-sniffer#14. Currently, all of its tasks are under study and none is completed.
New path
Here I will discuss a new approach to implementing dialect detection in qsv using trivial elements:
With this approach, dialect detection is as reliable as CleverCSV's and can obtain results with greater certainty. The process is as follows:
A Python implementation of this exact approach is described in a GitHub repository. The evaluation of this method gives:
CSVsniffer
CleverCSV
csv.Sniffer
This sheds light on one point: the presented approach clearly outperforms csv.Sniffer and also CleverCSV on the research datasets.
Hoping this can help this wonderful project!
Edit:
A code snippet will be presented in the discussion.