An adaptation of Lemire's simdjson algorithm, applied to build an in-memory index of a CSV file. Uses byte "nibbles" to increase processing capacity when identifying structure in a CSV.
csv -> memory index
record number, index -> record
record, field idx, index -> field value
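The three mappings above can be illustrated with a minimal scalar sketch. This is not the project's implementation or API: `CsvIndex`, `build`, `record` and `field` are hypothetical names, the scan is one byte at a time rather than SIMD, and it assumes `\n` line endings and no quoted fields.

```rust
/// Hypothetical in-memory index (illustration only, not the crate's types).
pub struct CsvIndex {
    /// (start, end) byte offsets of every field, in file order.
    field_bounds: Vec<(usize, usize)>,
    /// Record r owns field_bounds[record_starts[r]..record_starts[r + 1]].
    record_starts: Vec<usize>,
}

impl CsvIndex {
    /// csv -> memory index. Assumes no quoting and `\n` record breaks.
    pub fn build(data: &[u8]) -> CsvIndex {
        let mut field_bounds = Vec::new();
        let mut record_starts = vec![0];
        let mut field_start = 0;
        for (i, &b) in data.iter().enumerate() {
            if b == b',' || b == b'\n' {
                field_bounds.push((field_start, i));
                field_start = i + 1;
                if b == b'\n' {
                    record_starts.push(field_bounds.len());
                }
            }
        }
        // Close a final record that has no trailing newline.
        if let Some(&last) = data.last() {
            if last != b'\n' {
                field_bounds.push((field_start, data.len()));
                record_starts.push(field_bounds.len());
            }
        }
        CsvIndex { field_bounds, record_starts }
    }

    pub fn record_count(&self) -> usize {
        self.record_starts.len() - 1
    }

    /// record number, index -> record (the raw bytes of the whole line).
    pub fn record<'a>(&self, data: &'a [u8], r: usize) -> Option<&'a [u8]> {
        let first = *self.record_starts.get(r)?;
        let next = *self.record_starts.get(r + 1)?;
        let (start, _) = self.field_bounds[first];
        let (_, end) = self.field_bounds[next - 1];
        Some(&data[start..end])
    }

    /// record, field idx, index -> field value.
    pub fn field<'a>(&self, data: &'a [u8], r: usize, f: usize) -> Option<&'a [u8]> {
        let first = *self.record_starts.get(r)?;
        let next = *self.record_starts.get(r + 1)?;
        if f >= next - first {
            return None;
        }
        let (start, end) = self.field_bounds[first + f];
        Some(&data[start..end])
    }
}
```

The index stores only offsets, so a lookup is a slice into the original buffer and copies nothing. Out-of-range record or field numbers return `None`.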
Post a comment and ask your questions here: #1
- set the open source licensing
- describe the project
- how to participate
- use the BurntSushi csv parser as a benchmark
- discuss and track whether I/O or CPU is the bottleneck
- Document the core concepts (vs the specifics of the API)
- Add a dependency on https://github.com/rust-lang/portable-simd
- Extend support to streams (the whole input is currently held in memory)
- Consider splitting work without first knowing record breaks (requires toggling interpretation if/when start in quoted text)
- Make compatible with the M1 (which supports NEON)
- Document the public api
- Document the active tests and coverage
- Take inventory of what is needed to improve compliance with the CSV standard, including escapes and commas within quoted text
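The items on splitting work without knowing record breaks and on commas within quoted text both come down to knowing, for each byte, whether it sits inside quotes. One way to sketch this, borrowing the quote-mask idea used in simdjson, is to build a bitmask of quote positions for a 64-byte block and take its prefix XOR. The code below is scalar Rust with hypothetical function names (`byte_mask`, `prefix_xor`, `structural_commas`); it is an assumption about how the toggling could work, not a description of the current code.

```rust
/// Bit i is set when byte i of the block equals `target`.
/// Only the first 64 bytes of `block` are considered.
fn byte_mask(block: &[u8], target: u8) -> u64 {
    block
        .iter()
        .take(64)
        .enumerate()
        .fold(0u64, |m, (i, &b)| if b == target { m | (1u64 << i) } else { m })
}

/// Prefix XOR: bit i of the result is the XOR of bits 0..=i of the input.
/// Applied to a quote mask, set bits mark bytes from an opening quote
/// up to (not including) its closing quote.
fn prefix_xor(mut bits: u64) -> u64 {
    bits ^= bits << 1;
    bits ^= bits << 2;
    bits ^= bits << 4;
    bits ^= bits << 8;
    bits ^= bits << 16;
    bits ^= bits << 32;
    bits
}

/// Returns the mask of commas that act as field separators, plus whether
/// the block ends inside quoted text. `starts_in_quotes` is the toggle:
/// when the block begins inside a quoted field, the interpretation flips.
fn structural_commas(block: &[u8], starts_in_quotes: bool) -> (u64, bool) {
    let quotes = byte_mask(block, b'"');
    let mut inside = prefix_xor(quotes);
    if starts_in_quotes {
        inside = !inside;
    }
    let commas = byte_mask(block, b',') & !inside;
    let ends_in_quotes = starts_in_quotes ^ (quotes.count_ones() % 2 == 1);
    (commas, ends_in_quotes)
}
```

A worker handed a block at an arbitrary offset could compute the result for one assumption and flip it once the true starting state is known, since the two interpretations differ only by the inversion of `inside`. A doubled quote (`""`) inside a quoted field toggles the state twice, so the bytes after it are classified the same as before it.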