-
Notifications
You must be signed in to change notification settings - Fork 992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Rf_charIsASCII for IS_ASCII instead of testing LEVELS on R >= 4.5 #6422
base: master
Are you sure you want to change the base?
Conversation
Since data.table now depends on R >= 3.3, the backports are no longer needed. Moreover, MAYBE_SHARED is currently a function, while MAYBE_REFERENCED expands to !NO_REFERENCES (which is a function). In debugging output, show MAYBE_REFERENCED (NAMED > 0) instead of NAMED.
getCharCE appeared in R-2.7, making it possible to check for strings _marked_ as UTF-8 or Latin-1. There is no marking as ASCII, so fixing IS_ASCII will have to wait for R >= 4.5.
There's no explicit encoding code for ASCII, so use charIsASCII() ("eapi", expected to appear in R-4.5.0).
Generated via commit b409a40 Download link for the artifact containing the test results: ↓ atime-results.zip Time taken to finish the standard R installation steps: 11 minutes and 31 seconds Time taken to run |
Don't mind that, we maintainers will take care of it :) |
set #6420 as the target of this PR to make the chain clearer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it an issue that the pre 4.5.0 version returns 1 or 0 whereas the new version returns an Rboolean
?
Good catch! It's a bit worse: LEVELS(x) & 64 returns 0 or 64.
Here are the only users of IS_ASCII:
data.table.h:
#define NEED2UTF8(s) !(IS_ASCII(s) || (s)==NA_STRING || IS_UTF8(s))
fwriteR.c:
#define TO_NATIVE(s) (native && (s)!=NA_STRING && !IS_ASCII(s))
forder.c:
if (!anynotutf8 && /*...*/ !IS_ASCII(s)) {
They all use the result of the macro in boolean context (if / || / &&),
so any non-zero result is interpreted as true.
|
This part is split from #6420 (but includes #6420; can rebase if needed) because the R version containing
Rf_isCharASCII
is not currently released. The overall issue is #6180. Not yet sure how to changeNEWS.md
in case we're expectingdata.table
releases in the meantime.