csv-simd

An adaptation of Lemire's simdjson algorithm, applied to building an in-memory index of a csv. Uses byte "nibbles" to increase the throughput of identifying structure in a csv.
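To illustrate what "nibbles" can buy, here is a minimal scalar sketch of a simdjson-style classification: each byte is split into a high and a low nibble, each nibble indexes a 16-entry table, and the two results are ANDed. A SIMD version would do the same lookups for a whole register of bytes with a shuffle instruction. The names `classify`, `structural_mask`, `LO_TABLE` and `HI_TABLE` are hypothetical and are not taken from this repository, and the choice of structural bytes (comma, quote, LF, CR) is an assumption.

```rust
// Bit flags for the bytes assumed to be structural in a csv.
const COMMA: u8 = 1; // ','  = 0x2C
const QUOTE: u8 = 2; // '"'  = 0x22
const LF: u8 = 4;    // '\n' = 0x0A
const CR: u8 = 8;    // '\r' = 0x0D

/// Table indexed by the low nibble of a byte.
const LO_TABLE: [u8; 16] = {
    let mut t = [0u8; 16];
    t[0xC] = COMMA;
    t[0x2] = QUOTE;
    t[0xA] = LF;
    t[0xD] = CR;
    t
};

/// Table indexed by the high nibble of a byte.
const HI_TABLE: [u8; 16] = {
    let mut t = [0u8; 16];
    t[0x2] = COMMA | QUOTE; // 0x2C and 0x22 share the high nibble 2
    t[0x0] = LF | CR;       // 0x0A and 0x0D share the high nibble 0
    t
};

/// Returns the flag of the structural byte, or 0 for any other byte.
/// A flag survives the AND only if both nibbles agree on it.
pub fn classify(byte: u8) -> u8 {
    LO_TABLE[(byte & 0x0F) as usize] & HI_TABLE[(byte >> 4) as usize]
}

/// Builds a bitmask for a block of up to 64 bytes: bit i is set when
/// block[i] is structural. This is the scalar stand-in for the SIMD step.
pub fn structural_mask(block: &[u8]) -> u64 {
    assert!(block.len() <= 64, "one mask covers at most 64 bytes");
    let mut mask = 0u64;
    for (i, &b) in block.iter().enumerate() {
        if classify(b) != 0 {
            mask |= 1u64 << i;
        }
    }
    mask
}
```

With these tables, `structural_mask(b"a,b\n")` sets bits 1 and 3, one for the comma and one for the line feed.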

csv -> memory index
record number, index -> record
record, field idx, index -> field value
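The three mappings above can be sketched as a small data structure. This is a scalar illustration under stated assumptions, not the crate's implementation or API: `CsvIndex`, `build`, `record` and `field` are hypothetical names, the index stores byte ranges into the original buffer, and field values are returned raw (surrounding quotes are not stripped).

```rust
/// Hypothetical in-memory index over a csv held in a byte buffer.
pub struct CsvIndex {
    /// Byte range (start, end) of every field, in document order.
    fields: Vec<(usize, usize)>,
    /// record_starts[r] is the position in `fields` of record r's first field.
    record_starts: Vec<usize>,
}

impl CsvIndex {
    /// csv -> memory index
    pub fn build(data: &[u8]) -> Self {
        let mut fields = Vec::new();
        let mut record_starts = Vec::new();
        let mut in_quotes = false;
        let mut field_start = 0usize;
        let mut record_first_field = 0usize;
        for (i, &b) in data.iter().enumerate() {
            match b {
                // A doubled quote ("") toggles twice and leaves the state unchanged.
                b'"' => in_quotes = !in_quotes,
                b',' if !in_quotes => {
                    fields.push((field_start, i));
                    field_start = i + 1;
                }
                b'\n' if !in_quotes => {
                    // Drop the '\r' of a "\r\n" line ending from the field.
                    let end = if i > field_start && data[i - 1] == b'\r' { i - 1 } else { i };
                    fields.push((field_start, end));
                    record_starts.push(record_first_field);
                    record_first_field = fields.len();
                    field_start = i + 1;
                }
                _ => {}
            }
        }
        // Final record without a trailing newline.
        if field_start < data.len() || fields.len() > record_first_field {
            fields.push((field_start, data.len()));
            record_starts.push(record_first_field);
        }
        CsvIndex { fields, record_starts }
    }

    pub fn record_count(&self) -> usize {
        self.record_starts.len()
    }

    /// record number, index -> record (as the byte ranges of its fields)
    pub fn record(&self, r: usize) -> Option<&[(usize, usize)]> {
        let start = *self.record_starts.get(r)?;
        let end = self.record_starts.get(r + 1).copied().unwrap_or(self.fields.len());
        Some(&self.fields[start..end])
    }

    /// record, field idx, index -> field value
    pub fn field<'a>(&self, data: &'a [u8], r: usize, f: usize) -> Option<&'a [u8]> {
        let &(s, e) = self.record(r)?.get(f)?;
        Some(&data[s..e])
    }
}
```

Storing ranges rather than copied values keeps the index small and lets lookups return slices of the original buffer without allocating.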

Interested in leading a small (manageable) open source project?

Post a comment and ask your questions here: #1

Meta todos

set the open source licensing
describe the project
how to participate
use the BurntSushi csv parser as a benchmark
discuss and track whether io or cpu capacity is the bottleneck

Decisions

Documenting the core concepts (vs the specifics of the API)
Adding a dependency https://github.com/rust-lang/portable-simd
Extend the capability to streams (not all in memory as it is now)
Consider splitting work into chunks without first knowing where the record breaks are (requires toggling the interpretation if/when a chunk starts inside quoted text)
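The last decision can be sketched as follows. Each chunk is scanned without knowing whether it starts inside quoted text, so it records the newlines that would be record breaks under both interpretations, plus whether it contains an odd number of quotes. A cheap sequential pass then picks the right interpretation per chunk. This is one possible approach, not something the project has settled on; `ChunkScan`, `scan_chunk` and `record_breaks` are hypothetical names, and the chunk scans run sequentially here although they are independent and could run in parallel.

```rust
/// Result of scanning one chunk with an unknown starting quote state.
pub struct ChunkScan {
    /// Record breaks if the chunk starts outside quoted text.
    breaks_if_outside: Vec<usize>,
    /// Record breaks if the chunk starts inside quoted text.
    breaks_if_inside: Vec<usize>,
    /// True when the chunk holds an odd number of quotes, which
    /// flips the quote state handed to the next chunk.
    odd_quotes: bool,
}

/// Scans data[start..end]; offsets in the result are absolute.
pub fn scan_chunk(data: &[u8], start: usize, end: usize) -> ChunkScan {
    let mut outside = Vec::new();
    let mut inside = Vec::new();
    let mut toggled = false; // odd number of quotes seen so far in this chunk
    for i in start..end {
        match data[i] {
            b'"' => toggled = !toggled,
            // Not toggled: we are in the starting state, so the newline is a
            // break only if the chunk started outside quotes. Toggled: the
            // opposite holds.
            b'\n' => {
                if toggled {
                    inside.push(i)
                } else {
                    outside.push(i)
                }
            }
            _ => {}
        }
    }
    ChunkScan { breaks_if_outside: outside, breaks_if_inside: inside, odd_quotes: toggled }
}

/// Splits `data` into chunks, scans each one independently, then resolves
/// the interpretation of each chunk from the parity of the chunks before it.
pub fn record_breaks(data: &[u8], chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let scans: Vec<ChunkScan> = (0..data.len())
        .step_by(chunk_size)
        .map(|s| scan_chunk(data, s, (s + chunk_size).min(data.len())))
        .collect();
    let mut in_quotes = false; // the file is assumed to start outside quotes
    let mut breaks = Vec::new();
    for scan in scans {
        breaks.extend(if in_quotes { scan.breaks_if_inside } else { scan.breaks_if_outside });
        in_quotes ^= scan.odd_quotes;
    }
    breaks
}
```

The result does not depend on where the chunk boundaries fall, which is the property that makes the split safe.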

Code todos

Make compatible with Apple M1 (which supports NEON)
Document the public api
Document the active tests and coverage
Take inventory of what is needed to improve compliance with the csv standard, including escapes and commas within quoted text
