Commit 705d3a5

Performance improvements and new release (#20)
* massive speedup by removing pcall
* updated readme with benchmarks and getting ready for another release
1 parent 8bd8fbe commit 705d3a5

4 files changed

+69
-42
lines changed

README.md

+37 -7
@@ -1,9 +1,13 @@
 # ftcsv
 [![Build Status](https://travis-ci.org/FourierTransformer/ftcsv.svg?branch=master)](https://travis-ci.org/FourierTransformer/ftcsv) [![Coverage Status](https://coveralls.io/repos/github/FourierTransformer/ftcsv/badge.svg?branch=master)](https://coveralls.io/github/FourierTransformer/ftcsv?branch=master)
 
-ftcsv, a fairly fast csv library written in pure Lua. It's been tested with LuaJIT 2.0/2.1 and Lua 5.1, 5.2, and 5.3
+ftcsv is a fast pure lua csv library.
 
-It works well for CSVs that can easily be fully loaded into memory (easily up to a hundred MB). Currently, there isn't a "large" file mode with proper readers and writers for ingesting CSVs in bulk with a fixed amount of memory. It correctly handles both `\n` (LF), `\r` (CR) and `\r\n` (CRLF) line endings (ie it should work with Unix, Mac OS 9, and Windows line endings), and has UTF-8 support (it will strip out BOM if it exists).
+It works well for CSVs that can easily be fully loaded into memory (easily up to a hundred MB) and correctly handles `\n` (LF), `\r` (CR) and `\r\n` (CRLF) line endings. It has UTF-8 support, and will strip out the BOM if it exists. ftcsv can also parse headerless csv-like files and supports column remapping, file or string based loading, and more!
+
+Currently, there isn't a "large" file mode with proper readers for ingesting large CSVs using a fixed amount of memory, but that is in the works in [another branch!](https://github.com/FourierTransformer/ftcsv/tree/parseLineIterator)
+
+It's been tested with LuaJIT 2.0/2.1 and Lua 5.1, 5.2, and 5.3
 
 
 
@@ -88,7 +92,7 @@ ftcsv.parse("apple,banana,carrot", ",", {loadFromString=true, headers=false})
 
 In the above example, the first field becomes 'a', the second field becomes 'b' and so on.
 
-For all tested examples, take a look in /spec/feature_spec.lua
+For all tested examples, take a look in /spec/feature_spec.lua and /spec/dynamic_features_spec.lua
 
 
 ## Encoding
## Encoding
@@ -112,15 +116,41 @@ file:close()
 ```
 
 
+## Error Handling
+ftcsv returns a bunch of errors when passed a bad csv file or incorrect parameters. You can find a more detailed explanation of the more cryptic errors in [ERRORS.md](ERRORS.md)
+
 
-## Performance
-I did some basic testing and found that in lua, if you want to iterate over a string character-by-character and look for single chars, `string.byte` performs better than `string.sub`. As such, ftcsv iterates over the whole file and does byte compares to find quotes and delimiters and then generates a table from it. If you have thoughts on how to improve performance (either big picture or specifically within the code), create a GitHub issue - I'd love to hear about it!
+## Benchmarks
+We ran ftcsv against a few different csv parsers ([PIL](http://www.lua.org/pil/20.4.html)/[csvutils](http://lua-users.org/wiki/CsvUtils), [lua_csv](https://github.com/geoffleyland/lua-csv), and [lpeg_josh](http://lua-users.org/lists/lua-l/2009-08/msg00020.html)) for lua and here is what we found:
 
+### 20 MB file, every field is double quoted (ftcsv optimal lua case\*)
 
+| Parser | Lua | LuaJIT |
+| --------- | ------------------ | ------------------ |
+| PIL/csvutils | 3.939 +/- 0.565 SD | 1.429 +/- 0.175 SD |
+| lua_csv | 8.487 +/- 0.156 SD | 3.095 +/- 0.206 SD |
+| lpeg_josh | **1.350 +/- 0.191 SD** | 0.826 +/- 0.176 SD |
+| ftcsv | 3.101 +/- 0.152 SD | **0.499 +/- 0.133 SD** |
 
-## Error Handling
-ftcsv returns a litany of errors when passed a bad csv file or incorrect parameters. You can find a more detailed explanation of the more cryptic errors in [ERRORS.md](ERRORS.md)
+\* see Performance section below for an explanation
 
+### 12 MB file, some fields are double quoted
+
+| Parser | Lua | LuaJIT |
+| --------- | ------------------ | ------------------ |
+| PIL/csvutils | 2.868 +/- 0.101 SD | 1.244 +/- 0.129 SD |
+| lua_csv | 7.773 +/- 0.083 SD | 3.495 +/- 0.172 SD |
+| lpeg_josh | **1.146 +/- 0.191 SD** | 0.564 +/- 0.121 SD |
+| ftcsv | 3.401 +/- 0.109 SD | **0.441 +/- 0.124 SD** |
+
+[LuaCSV](http://lua-users.org/lists/lua-l/2009-08/msg00012.html) was also tried, but usually errored out at odd places during parsing.
+
+NOTE: times are measured using `os.clock()`, so they are in CPU seconds. Each test was run 30 times in a randomized order. The file was pre-loaded, and only the csv decoding time was measured.
+
+Benchmarks were run under ftcsv 1.1.6
+
+## Performance
+We did some basic testing and found that in lua, if you want to iterate over a string character-by-character and look for single chars, `string.byte` performs faster than `string.sub`. This is especially true for LuaJIT. As such, in LuaJIT, ftcsv iterates over the whole file and does byte compares to find quotes and delimiters. However, for pure lua, `string.find` is used to find quotes but `string.byte` is used everywhere else as the CSV format in its proper form will have quotes around fields. If you have thoughts on how to improve performance (either big picture or specifically within the code), create a GitHub issue - I'd love to hear about it!
 
 
 ## Contributing
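
Editor's note on the README diff above: the new Performance paragraph claims that `string.byte` scans faster than `string.sub`, and the Benchmarks note says timings come from `os.clock()`. A minimal sketch of how one might check that claim locally follows. The helper names (`count_with_byte`, `count_with_sub`, `time_it`) and the generated input are hypothetical and are not part of ftcsv; this is a single-run micro-benchmark, not the 30-run randomized procedure the README describes.

```lua
-- Hypothetical micro-benchmark; none of these helpers exist in ftcsv.
local COMMA = string.byte(",")

-- Count delimiters by comparing byte values (the approach the README favors).
local function count_with_byte(s)
  local n = 0
  for i = 1, #s do
    if string.byte(s, i) == COMMA then n = n + 1 end
  end
  return n
end

-- Count delimiters by extracting one-character substrings.
local function count_with_sub(s)
  local n = 0
  for i = 1, #s do
    if string.sub(s, i, i) == "," then n = n + 1 end
  end
  return n
end

-- Time a single call in CPU seconds using os.clock().
local function time_it(fn, input)
  local start = os.clock()
  local result = fn(input)
  return os.clock() - start, result
end

-- Synthetic input: 50000 rows with two commas each.
local input = string.rep("apple,banana,carrot\n", 50000)
local byte_time, byte_count = time_it(count_with_byte, input)
local sub_time, sub_count = time_it(count_with_sub, input)

print(string.format("string.byte: %.3f s, string.sub: %.3f s", byte_time, sub_time))
```

Both functions must return the same count; only the timings differ, and those will vary by interpreter and machine.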

ftcsv-1.1.5-1.rockspec

-33
This file was deleted.

ftcsv-1.1.6-1.rockspec

+30
@@ -0,0 +1,30 @@
+package = "ftcsv"
+version = "1.1.6-1"
+
+source = {
+  url = "git://github.com/FourierTransformer/ftcsv.git",
+  tag = "1.1.6"
+}
+
+description = {
+  summary = "A fast pure lua csv library (parser and encoder)",
+  detailed = [[
+ftcsv works well for CSVs that can easily be fully loaded into memory (easily up to a hundred MB) and correctly handles `\n` (LF), `\r` (CR) and `\r\n` (CRLF) line endings. It has UTF-8 support, and will strip out the BOM if it exists. ftcsv can also parse headerless csv-like files and supports column remapping, file or string based loading, and more!
+
+Note: Currently it cannot load CSV files where the file can't fit in memory.
+]],
+  homepage = "https://github.com/FourierTransformer/ftcsv",
+  maintainer = "Shakil Thakur <[email protected]>",
+  license = "MIT"
+}
+
+dependencies = {
+  "lua >= 5.1, <5.4",
+}
+
+build = {
+  type = "builtin",
+  modules = {
+    ["ftcsv"] = "ftcsv.lua"
+  },
+}

ftcsv.lua

+2 -2
@@ -151,9 +151,9 @@ local function parseString(inputString, inputLength, delimiter, i, headerField,
         outResults[1] = {}
         assignValue = function()
             emptyIdentified = false
-            if not pcall(function()
+            if headerField[fieldNum] then
                 outResults[lineNum][headerField[fieldNum]] = field
-            end) then
+            else
                 error('ftcsv: too many columns in row ' .. lineNum)
             end
         end
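
Editor's note on the hunk above: the old code relied on the assignment raising an error when `headerField[fieldNum]` was `nil` (Lua rejects a `nil` table key) and caught it with `pcall`, while the new code tests the header lookup directly. A minimal sketch of the two patterns side by side is shown below. The functions `assign_with_pcall` and `assign_with_check` and the two-column `headerField` are hypothetical stand-ins, not the library's code. The commit message attributes the speedup to removing `pcall`; a plausible but unverified reason is that the old version created a closure and made a protected call for every field.

```lua
-- Hypothetical stand-in for the parser's header table (two columns).
local headerField = { "a", "b" }

-- Old pattern: attempt the assignment and let pcall catch the
-- "table index is nil" error raised when fieldNum has no header.
local function assign_with_pcall(row, fieldNum, field)
  if not pcall(function()
    row[headerField[fieldNum]] = field
  end) then
    error("too many columns")
  end
end

-- New pattern: look the header up first and branch on the result,
-- so no closure or protected call is needed per field.
local function assign_with_check(row, fieldNum, field)
  local key = headerField[fieldNum]
  if key then
    row[key] = field
  else
    error("too many columns")
  end
end

-- Both store a value for a known column...
local row_old, row_new = {}, {}
assign_with_pcall(row_old, 2, "x")
assign_with_check(row_new, 2, "x")

-- ...and both raise an error for a column beyond the header.
local ok_old = pcall(assign_with_pcall, {}, 3, "y")
local ok_new = pcall(assign_with_check, {}, 3, "y")
```

Under these assumptions the two functions behave the same for string headers; they would differ only if a header value were `false`, which is not a case the diff addresses.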
