Use of non-API entry points in data.table #56

Merged on Dec 2, 2024 (45 commits)

aitap commented Nov 19, 2024

This is not yet ready to be posted — the unfinished pieces are marked by <!-- TODO: ... --> in the document source — but if you have any suggestions for the overall shape of the post or any edits that can be applied right away, I'll be glad to hear them.

Edit: is there a list of potentially applicable categories?

For the image, I can't decide between a still from a certain MC Hammer video (source) and a more visual metaphor for something that one does fully knowing what they are doing, at their own peril, until one day it fails (source, CC-BY-SA-4.0).

Fixes: #55

aitap added 10 commits November 17, 2024 18:55
Expand other solutions, tweak the rest of the text
Otherwise rendering the document would require first compiling R-devel
on GNU/Linux with --enable-R-shlib.
data.table needs support for both S4 tables and S4 columns. shallow()
has an API solution. Link between sections, both S4 -> ATTRIB and in
other places. Other tweaks.
Also,
* expand the cases under "there's more"
* provide the bibliographic references
* finish all the other small TODOs

aitap commented Nov 27, 2024

Dear @tdhock,

Here's the preliminary version of the post. I think it contains or at least links to all the necessary information. If its form needs improvements or its content needs additions, I'll be glad to perform the necessary changes. Can we ask Michael Chirico for a reality check? I've done a deep dive into the R API, but I've never tuned a hash table manually. Also, my pre-R-3.1 history of both data.table and R may be very lacking.

If we manage to get rid of all the NOTEs regarding non-API entry points, how about submitting an article to the R journal, presenting data.table as a case study in API compliance, with everyone who did the planning and implementation as co-authors?

@aitap aitap marked this pull request as ready for review November 27, 2024 11:57

tdhock commented Nov 27, 2024

thanks! these are the current categories on https://rdatatable-community.github.io/The-Raft/

ambassadors (2)
announcements (7)
application package (1)
benchmarks (1)
bridge package (2)
community (3)
conferences (1)
developer (4)
documentation (2)
extension package (1)
funding opportunity (2)
governance (2)
grant (9)
guest post (2)
opinion (1)
partner package (1)
performance (1)
releases (1)
seal of approval (6)
testing (2)
tips (4)
translation (1)
travel (1)
tutorials (4

arbitrary integer values inside unused `SEXP` fields, `data.table` will
have to look up the `CHARSXP` values using the externally available
information. Performing $O(nk)$ direct pointer comparisons would scale
poorly, so for an $O(1)$ individual lookup `data.table` could build a

I like the big O notation here

we could use atime to verify the asymptotic performance if we make this change
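
For readers following the thread, the contrast in the quoted passage can be sketched in plain C. This is only an illustration, not code from data.table or the post: all names (`ptr_map`, `ptr_map_get`, `linear_find`, and so on) are hypothetical, the keys are generic `const void *` pointers standing in for `CHARSXP` pointers so that the block compiles without R headers, and the bit-mixing hash is an arbitrary assumption. A linear scan costs $O(k)$ per lookup, hence $O(nk)$ for $n$ lookups, whereas an open-addressing table keyed by pointer value gives expected $O(1)$ per lookup.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: maps pointer values to ints. Not data.table's API. */
typedef struct {
    const void **keys; /* NULL marks an empty slot */
    int *values;
    size_t mask;       /* capacity - 1; capacity is a power of two */
} ptr_map;

/* Arbitrary mixing function (an assumption); any decent pointer hash works. */
static size_t hash_ptr(const void *p, size_t mask) {
    uintptr_t x = (uintptr_t)p;
    x ^= x >> 16;
    x *= (uintptr_t)0x9E3779B1u;
    x ^= x >> 13;
    return (size_t)x & mask;
}

/* Allocate room for at most n keys at a load factor of 0.5 or less.
   Returns 0 on success, -1 on allocation failure. */
int ptr_map_init(ptr_map *m, size_t n) {
    size_t cap = 2;
    while (cap < 2 * n) cap *= 2;
    m->keys = malloc(cap * sizeof *m->keys);
    m->values = malloc(cap * sizeof *m->values);
    m->mask = cap - 1;
    if (!m->keys || !m->values) {
        free(m->keys);
        free(m->values);
        return -1;
    }
    for (size_t i = 0; i < cap; i++) m->keys[i] = NULL;
    return 0;
}

void ptr_map_free(ptr_map *m) {
    free(m->keys);
    free(m->values);
}

/* Insert or overwrite; the caller must not exceed the n given to init. */
void ptr_map_set(ptr_map *m, const void *key, int value) {
    size_t i = hash_ptr(key, m->mask);
    while (m->keys[i] && m->keys[i] != key) i = (i + 1) & m->mask;
    m->keys[i] = key;
    m->values[i] = value;
}

/* Expected O(1): probe linearly until the key or an empty slot is found. */
int ptr_map_get(const ptr_map *m, const void *key, int fallback) {
    size_t i = hash_ptr(key, m->mask);
    while (m->keys[i]) {
        if (m->keys[i] == key) return m->values[i];
        i = (i + 1) & m->mask;
    }
    return fallback;
}

/* The O(k) alternative: compare against every candidate pointer in turn. */
int linear_find(const void *const *table, size_t k, const void *key) {
    for (size_t i = 0; i < k; i++)
        if (table[i] == key) return (int)i;
    return -1;
}
```

Both lookups compare pointers only, never string contents. Timing `ptr_map_get` against `linear_find` for growing $k$ is the kind of comparison that an asymptotic benchmark could confirm.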

tdhock commented Nov 27, 2024

this is really excellent overall thanks very much.
I guess you could probably submit it for peer review @ R journal as well?

I wonder if you can think of any easy/logical way to break it up into several different blog pages? I did not see one, so I think it is ok to keep it as one big one.

@Anirban166 Anirban166 left a comment


Impressive stuff!

posts/2024-12-12-non-api-use/index.qmd (5 review threads, outdated and resolved)

aitap commented Nov 29, 2024

Thank you very much for the comments! I think that a better R Journal article could result from describing the fixes after they are implemented, not only plans for them.

Regarding a logical breakdown, the issues could be split by difficulty: easy (rename a function and possibly add a wrapper); TRUELENGTH and friends (ALTREP + pointer hashes, could be split into two); ATTRIB and friends (currently not clear how to replace).

@kbodwin kbodwin changed the base branch from main to dev December 2, 2024 06:02
@kbodwin kbodwin merged commit dc9a305 into rdatatable-community:dev Dec 2, 2024

aitap commented Dec 2, 2024

Can we use this for the post image? I apologise for my previous indecisiveness. Should I create a new PR for the changes?

Linked issue: blog about C API
4 participants