Add 'data deconstructors' - unjoin() / unrbind() / uncbind() #16
Hi Nicola,
Apologies for the issue-less PR; your message yesterday reminded me of a concept I had for {messy}.

A common step in data manipulation is to join datasets together. In air quality, that might mean binding monitoring data together with meteorological data, or adding site metadata. There'll be equivalents in any other field, though: combining clinical results with patient data, combining demographic data with sales history, and so on.
I often have to do effectively the following when I teach:
Created on 2025-02-11 with reprex v2.1.1
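A minimal base-R sketch of the kind of manual split meant here. It assumes a hypothetical `readings` data frame in which site metadata is repeated on every row, and it is only an illustration, not the PR's reprex or the `unjoin()` implementation:

```r
# Hypothetical example data: site-level metadata (site, lat) repeated on every row
readings <- data.frame(
  site = c("A", "A", "B", "B", "C"),
  lat  = c(51.5, 51.5, 52.1, 52.1, 53.4),
  date = as.Date("2025-01-01") + c(0, 1, 0, 1, 0),
  no2  = c(21, 25, 30, 28, 17)
)

# "Unjoin": move the site-level columns into their own lookup table...
sites <- unique(readings[, c("site", "lat")])

# ...and keep only the key plus the row-level columns in the measurements
measurements <- readings[, c("site", "date", "no2")]

# Learners then rejoin the two tables on the shared key
rejoined <- merge(measurements, sites, by = "site")
```

`merge()` returns the key column first and sorts by it, so the rejoined table has the same content as `readings` but not necessarily the same column or row order.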
This PR adds three functions: the above `unjoin()`, as well as `unrbind()` and `uncbind()`. The latter two randomly chunk up your dataframe rowwise and colwise, respectively, based on user-defined sizes/proportions. This models data with similar structures coming from different sources, e.g., a monthly data report from a lab that needs binding into a single dataframe.

If I'm honest, I can't think of a purpose for `uncbind()` that isn't better achieved using `unjoin()`, but it made sense to complete the set!

Users can, of course, go ahead and use `messy()` or another function on each output dataset. This will make it even harder for learners to re-join them, as they'd have to ensure that column names match (for `rbind()`) or that their joining columns are aligned (for `merge()`/`left_join()`).
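To make the row/column chunking concrete, here is a minimal base-R sketch. `unrbind_sketch()` and `uncbind_sketch()` are hypothetical stand-ins, not the PR's functions, and they accept exact chunk sizes only (no proportions). Each one shuffles the rows (or column names) and then cuts them into consecutive chunks of the requested sizes:

```r
# Hypothetical sketch of a row-wise "un-rbind": shuffle rows, then cut them
# into consecutive chunks with the requested (exact) sizes.
unrbind_sketch <- function(data, sizes) {
  stopifnot(all(sizes > 0), sum(sizes) == nrow(data))
  shuffled <- data[sample(nrow(data)), , drop = FALSE]
  groups <- rep(seq_along(sizes), times = sizes)
  split(shuffled, groups)  # list of data frames, one per chunk
}

# Hypothetical sketch of a column-wise "un-cbind": shuffle column names,
# then give each chunk a subset of columns with the requested sizes.
uncbind_sketch <- function(data, sizes) {
  stopifnot(all(sizes > 0), sum(sizes) == ncol(data))
  cols <- sample(names(data))
  groups <- rep(seq_along(sizes), times = sizes)
  lapply(split(cols, groups), function(cc) data[, cc, drop = FALSE])
}

set.seed(2025)  # make the random split reproducible
df <- data.frame(id = 1:10, x = seq(0.1, 1, by = 0.1), y = letters[1:10])

monthly <- unrbind_sketch(df, sizes = c(5, 3, 2))  # three row chunks
parts   <- uncbind_sketch(df, sizes = c(2, 1))     # two column chunks

# What a learner would then do: stack/glue the chunks back together
rows_back <- do.call(rbind, unname(monthly))
cols_back <- do.call(cbind, unname(parts))
```

Because every chunk here keeps its original column names, `rbind()` and `cbind()` reassemble the pieces directly (up to row and column order). Once a chunk's column names have been altered, that no longer works without cleanup.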