WIP: allow users to manage a user-defined state while parsing rows #77
I'm moving this to "Ready for review" just so we can discuss whether this is an OK idea.
Docs and tests using streams are still pending.
I am not sure this is how I would go. My suggestion is for the
I "rebranded" this PR to be more general than just counting. But counting can be a functionality baked in using this So, I'll leave this PR for the user state and open a new one for the count. For this PR, I'll give to the Thanks, @josevalim.
Generally speaking, I don't want to expose the parser state or its transformation. I am concerned both about performance and about the possibility that we will expose too much of the internals, making it impossible to refactor later without breaking user code.
It doesn't introduce much overhead (and it must be optional). It just enhances the state with a user state (a pair of This patch allows the package to stream each row, so the user can run aggregates and other operations as the parser streams lines. It can be an optimization in some cases. I'll have a look at this patch this weekend to finish the remaining things (docs, tests...).
The patch doesn't receive the parsed row, because it was written for the counter issue (#12). It must be generalized to perform not only counting, but any aggregating operation.
Hi, I understand the user is not allowed to modify the internal state, but the issue is that if we expose it, users are going to read it and potentially modify it. And I'm not comfortable going down this route. If this is essential to your solution, then I believe it is not a good fit for this library, which was designed on purpose to have a small API surface.
No problem, @josevalim. I understand your point. I'll close this for now and run a fork of this repo, but I'll pursue this line-streaming idea. Later on, if it makes sense, Thanks for taking the time to answer.
This PR allows users to define and control some state (different from the parse state) while parsing each line of the CSV.
Users can pass a state and a function that runs for each row.
New options are:
- `init_user_state` - the state that goes along with the parse state
- `state_transform_function` - a function with arity 2 that takes the state and the parse state (`:line` or `:header`) and should return the next state

To avoid a breaking change, for the moment, `parse_string_with_state` was introduced so that the previous `parse_*` functions keep running exactly as they were.
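
To make the proposal concrete, here is a minimal, language-agnostic sketch of the idea in Python. It is not this library's API or the code in this PR: it only borrows the option names from the description above, and the return shape (rows plus final user state), the use of strings for the parse state, and the header-skipping behaviour are all assumptions made for illustration.

```python
import csv
import io


def parse_string_with_state(text, init_user_state, state_transform_function):
    """Hypothetical illustration: parse CSV text while threading a user state.

    The transform function has arity 2 and receives the current user state
    and the parse state ("header" or "line"); it returns the next user state.
    """
    user_state = init_user_state
    rows = []
    for index, row in enumerate(csv.reader(io.StringIO(text))):
        # Assumption: the first row is the header and is not returned.
        parse_state = "header" if index == 0 else "line"
        user_state = state_transform_function(user_state, parse_state)
        if parse_state == "line":
            rows.append(row)
    return rows, user_state


def count_lines(count, parse_state):
    # Counting use case: only data lines increment the counter.
    return count + 1 if parse_state == "line" else count


rows, line_count = parse_string_with_state(
    "name,age\nana,31\nbob,27\n", 0, count_lines
)
```

In this sketch the transform function never sees the parser's internals or the parsed row, only the user state and a tag for the kind of line, which mirrors the limitation raised in the conversation.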