Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: allow users to manage a user-defined state while parsing rows #77

Conversation

diasbruno
Copy link

@diasbruno diasbruno commented Oct 7, 2023

This PR allows user to define and control some state (different from the parse state) while parsing each line from the CSV.

Users can pass a state and a function that runs for each row.

New options are:

  • init_user_state - the state that goes along with the parse state
  • state_transform_function - function with arity 2 state and parse state (:line or :header) and it should return the next state

To avoid breaking change, for the moment, parse_string_with_state was introduced to keep the previous parse_* functions to run exactly as they were.

@diasbruno diasbruno force-pushed the feature/able-to-control-parse-state branch from b8a55ae to 6b9c1eb Compare October 7, 2023 18:40
@diasbruno diasbruno force-pushed the feature/able-to-control-parse-state branch from 6b9c1eb to 1e87769 Compare October 7, 2023 18:41
@diasbruno
Copy link
Author

I'm moving this to "Ready for review" just so we can discuss if this is an ok idea.

@diasbruno diasbruno marked this pull request as ready for review October 7, 2023 18:49
@diasbruno
Copy link
Author

Doc is pending and tests using streams.

@josevalim
Copy link
Member

I am not sure this is how I would go. My suggestion is for the parse_stream function to return a NimbleCSV.Struct that implements the enumerable protocol, including the count function, and the count function itself is optimized. :)

@diasbruno diasbruno changed the title WIP: allow user to be able to control the parse state. WIP: allow users to be manage a user-defined state while parsing rows Oct 8, 2023
@diasbruno
Copy link
Author

"Rebranded" this PR to be more general than just count. But, counting can be a functionality baked in using this NimbleCSV.Struct you are suggesting.

So, I'll leave this PR for the user state and open a new one for the count.

For this PR, I'll give to the state_transform function the current parsed row, and, add the remaining tests and documentation.

Thanks, @josevalim.

@josevalim
Copy link
Member

Generally speaking, I don't want to expose the parser state or its transformation. I am concerned both about performance or the possibility we will expose too much of the internals, making it impossible to refactor later without breaking user code.

@diasbruno diasbruno changed the title WIP: allow users to be manage a user-defined state while parsing rows WIP: allow users to manage a user-defined state while parsing rows Oct 8, 2023
@diasbruno
Copy link
Author

diasbruno commented Feb 15, 2024

It doesn't introduces much of an overhead (and it must be optional).

It just enhances the state with a user state (a pair of {internal state, user state}), but user is not allowed to modify the internal state.

This patch allows the package to stream each row, so the user can run aggregates and other operations as parser is stream lines. I can be an optimization is some cases.

I'll have a look at this patch this weekend, to finish the remaining things (docs, tests...).

@diasbruno
Copy link
Author

diasbruno commented Feb 15, 2024

The patch doesn't receives the parsed row, because it was written for the counter issue (#12). It must be generalized to perform not only counting, but any aggregating operation.

@josevalim
Copy link
Member

Hi, I understand the user is not allowed to modify the internal state, but the issue is, if we expose it, users are going to read it, and potentially modify it. And I’m not comfortable going down this route. If this is essential to your solution, then I believe it is not a good fit for this library, which on purpose was designed to have a small API surface

@diasbruno
Copy link
Author

No problem, @josevalim. I understand your point.

I'll close this for now and run a fork of this repo, but I'll make this streaming line idea. Later on, if make sense,
we merge libraries (don't wanna keep a separare library just for this).

Thanks for taking time to answer.

@diasbruno diasbruno closed this Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants