Use of non-API entry points in data.table
#56
Add polars/duckdb post
Provide examples for other cases. More tweaks.
Also, add references to R Internals and restructure the (TRUE)LENGTH sections into related sub-sections.
Enumerate the problems with SETLENGTH
And other tweaks
Expand other solutions, tweak the rest of the text
Otherwise rendering the document would require first compiling R-devel on GNU/Linux with --enable-R-shlib.
data.table needs support for both S4 tables and S4 columns. shallow() has an API solution. Link between sections, both S4 -> ATTRIB and in other places. Other tweaks.
Also,
* expand the cases under "there's more"
* provide the bibliographic references
* finish all the other small TODOs
Dear @tdhock, Here's the preliminary version of the post. I think it contains or at least links to all the necessary information. If its form needs improvements or its content needs additions, I'll be glad to perform the necessary changes. Can we ask Michael Chirico for a reality check? I've done a deep dive into the R API, but I've never tuned a hash table manually. Also, my pre-R-3.1 history of both data.table and R may be very lacking. If we manage to get rid of all the NOTEs regarding non-API entry points, how about submitting an article to the R journal, presenting
thanks! these are the current categories on https://rdatatable-community.github.io/The-Raft/ ambassadors (2)
arbitrary integer values inside unused `SEXP` fields, `data.table` will
have to look up the `CHARSXP` values using the externally available
information. Performing $O(nk)$ direct pointer comparisons would scale
poorly, so for an $O(1)$ individual lookup `data.table` could build a
I like the big O notation here
we could use atime to verify the asymptotic performance if we make this change
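To make the trade-off quoted above concrete, here is a minimal sketch of a pointer-keyed hash table with open addressing, which gives an expected $O(1)$ lookup per pointer instead of comparing against every candidate. It is an illustration under stated assumptions, not data.table's implementation: the names `ptr_map`, `hash_ptr`, `ptr_map_init`, `ptr_map_set`, `ptr_map_get` and `ptr_map_free` are hypothetical, and plain `const void *` keys stand in for `CHARSXP` pointers so that the example compiles without R headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pointer-keyed map: keys are compared by address only,
 * the way CHARSXP pointers from R's string cache could be. */
typedef struct {
    const void **keys; /* NULL marks an empty slot */
    int *values;       /* value stored alongside each key */
    size_t mask;       /* capacity - 1; capacity is a power of two */
} ptr_map;

/* Mix the address bits so that aligned pointers spread over the table. */
static size_t hash_ptr(const void *p, size_t mask) {
    uintptr_t x = (uintptr_t)p;
    x ^= x >> 16;
    x *= (uintptr_t)0x9E3779B1u;
    x ^= x >> 13;
    return (size_t)x & mask;
}

/* Allocate room for at most n distinct keys at a load factor <= 0.5.
 * Returns 0 on success, -1 on allocation failure. */
static int ptr_map_init(ptr_map *m, size_t n) {
    size_t cap = 8;
    while (cap < 2 * n) cap *= 2;
    m->keys = calloc(cap, sizeof *m->keys);
    m->values = malloc(cap * sizeof *m->values);
    m->mask = cap - 1;
    if (!m->keys || !m->values) {
        free(m->keys);
        free(m->values);
        return -1;
    }
    return 0;
}

/* Insert or overwrite; the caller must not exceed the n given to init. */
static void ptr_map_set(ptr_map *m, const void *key, int value) {
    assert(key != NULL);
    size_t i = hash_ptr(key, m->mask);
    while (m->keys[i] != NULL && m->keys[i] != key)
        i = (i + 1) & m->mask; /* linear probing */
    m->keys[i] = key;
    m->values[i] = value;
}

/* Return the stored value, or `missing` if the key is absent. */
static int ptr_map_get(const ptr_map *m, const void *key, int missing) {
    size_t i = hash_ptr(key, m->mask);
    while (m->keys[i] != NULL) {
        if (m->keys[i] == key) return m->values[i];
        i = (i + 1) & m->mask;
    }
    return missing;
}

static void ptr_map_free(ptr_map *m) {
    free(m->keys);
    free(m->values);
}
```

Building the table costs one pass over the $k$ known pointers, after which each of the $n$ lookups probes only a few slots on average. Whether this actually scales as expected is an empirical question that benchmarking could settle.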
this is really excellent overall, thanks very much. I wonder if you can think of any easy/logical way to break it up into several different blog pages? I did not see one, so I think it is ok to keep it as one big one.
Impressive stuff!
Thank you very much for the comments! I think that a better R Journal article could result from describing the fixes after they are implemented, not only plans for them. Regarding a logical breakdown, the issues could be split by difficulty: easy (rename a function and possibly add a wrapper);
Can we use this for the post image? I apologise for my previous indecisiveness. Should I create a new PR for the changes?
This is not yet ready to be posted — the unfinished pieces are marked by
<!-- TODO: ... -->
in the document source — but if you have any suggestions for the overall shape of the post or any edits that can be applied right away, I'll be glad to hear them.
Edit: is there a list of potentially applicable categories?
For the image, I can't decide between a still from a certain MC Hammer video (source) and a more visual metaphor for something that one does fully knowing what they are doing, at their own peril, until one day it fails (source, CC-BY-SA-4.0).
Fixes: #55