From 40ad2e6978202ecc626db9eaae3a18ed5e4df769 Mon Sep 17 00:00:00 2001 From: Ivan K Date: Wed, 28 Aug 2024 16:36:35 +0300 Subject: [PATCH 1/5] Drop direct use of NAMED Since data.table now depends on R >= 3.3, the backports are no longer needed. Moreover, MAYBE_SHARED is currently a function, while MAYBE_REFERENCED expands to !NO_REFERENCES (which is a function). In debugging output, show MAYBE_REFERENCED (NAMED > 0) instead of NAMED. --- src/assign.c | 6 +++--- src/data.table.h | 8 -------- 2 files changed, 3 insertions(+), 11 deletions(-) diff --git a/src/assign.c b/src/assign.c index 27b115d94..dd1a1f75f 100644 --- a/src/assign.c +++ b/src/assign.c @@ -549,12 +549,12 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values) (TYPEOF(values)!=VECSXP && i>0) // assigning the same values to a second column. Have to ensure a copy #2540 ) { if (verbose) { - Rprintf(_("RHS for item %d has been duplicated because NAMED==%d MAYBE_SHARED==%d, but then is being plonked. length(values)==%d; length(cols)==%d)\n"), - i+1, NAMED(thisvalue), MAYBE_SHARED(thisvalue), length(values), length(cols)); + Rprintf(_("RHS for item %d has been duplicated because MAYBE_REFERENCED==%d MAYBE_SHARED==%d, but then is being plonked. length(values)==%d; length(cols)==%d)\n"), + i+1, MAYBE_REFERENCED(thisvalue), MAYBE_SHARED(thisvalue), length(values), length(cols)); } thisvalue = copyAsPlain(thisvalue); // PROTECT not needed as assigned as element to protected list below. } else { - if (verbose) Rprintf(_("Direct plonk of unnamed RHS, no copy. NAMED==%d, MAYBE_SHARED==%d\n"), NAMED(thisvalue), MAYBE_SHARED(thisvalue)); // e.g. DT[,a:=as.character(a)] as tested by 754.5 + if (verbose) Rprintf(_("Direct plonk of unnamed RHS, no copy. MAYBE_REFERENCED==%d, MAYBE_SHARED==%d\n"), MAYBE_REFERENCED(thisvalue), MAYBE_SHARED(thisvalue)); // e.g. DT[,a:=as.character(a)] as tested by 754.5 } SET_VECTOR_ELT(dt, coln, thisvalue); // plonk new column in as it's already the correct length setAttrib(thisvalue, R_NamesSymbol, R_NilValue); // clear names such as DT[,a:=mapvector[a]] diff --git a/src/data.table.h b/src/data.table.h index 8dee1f065..b36a1f1fa 100644 --- a/src/data.table.h +++ b/src/data.table.h @@ -60,14 +60,6 @@ // for use with CPLXSXP, no macro provided by R internals #define ISNAN_COMPLEX(x) (ISNAN((x).r) || ISNAN((x).i)) // TRUE if either real or imaginary component is NA or NaN -// Backport macros added to R in 2017 so we don't need to update dependency from R 3.0.0 -#ifndef MAYBE_SHARED -# define MAYBE_SHARED(x) (NAMED(x) > 1) -#endif -#ifndef MAYBE_REFERENCED -# define MAYBE_REFERENCED(x) ( NAMED(x) > 0 ) -#endif - // If we find a non-ASCII, non-NA, non-UTF8 encoding, we try to convert it to UTF8. That is, marked non-ascii/non-UTF8 encodings will // always be checked in UTF8 locale. This seems to be the best fix Arun could think of to put the encoding issues to rest. // Since the if-statement will fail with the first condition check in "normal" ASCII cases, there shouldn't be huge penalty issues in From 46dbfa93e72776c59dacb286de9831fa28c481b5 Mon Sep 17 00:00:00 2001 From: Ivan K Date: Wed, 28 Aug 2024 16:57:08 +0300 Subject: [PATCH 2/5] Almost drop direct use of LEVELS getCharCE appeared in R-2.7, making it possible to check for strings _marked_ as UTF-8 or Latin-1. There is no marking as ASCII, so fixing IS_ASCII will have to wait for R >= 4.5. --- src/bmerge.c | 3 --- src/data.table.h | 7 ++++--- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/src/bmerge.c b/src/bmerge.c index 6cf970ca2..4197c5f8d 100644 --- a/src/bmerge.c +++ b/src/bmerge.c @@ -17,9 +17,6 @@ Differences over standard binary search (e.g. bsearch in stdlib.h) : o non equi joins (no != yet) since 1.9.8 */ -#define ENC_KNOWN(x) (LEVELS(x) & 12) -// 12 = LATIN1_MASK (1<<2) | UTF8_MASK (1<<3) // Would use these definitions from Defn.h, but that appears to be private to R. Hence 12. - #define EQ 1 #define LE 2 #define LT 3 diff --git a/src/data.table.h b/src/data.table.h index b36a1f1fa..4abdb0542 100644 --- a/src/data.table.h +++ b/src/data.table.h @@ -33,9 +33,10 @@ // #include // the debugging machinery + breakpoint aidee // raise(SIGINT); -#define IS_UTF8(x) (LEVELS(x) & 8) -#define IS_ASCII(x) (LEVELS(x) & 64) -#define IS_LATIN(x) (LEVELS(x) & 4) +/* we mean the encoding bits, not CE_NATIVE in a UTF-8 locale */ +#define IS_UTF8(x) (getCharCE(x) == CE_UTF8) +#define IS_LATIN(x) (getCharCE(x) == CE_LATIN1) +#define IS_ASCII(x) (LEVELS(x) & 64) // API expected in R >= 4.5 #define IS_TRUE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]==TRUE) #define IS_FALSE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]==FALSE) #define IS_TRUE_OR_FALSE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]!=NA_LOGICAL) From a524d1445bcf1f9225dadb1bd51e5cd5af1a713b Mon Sep 17 00:00:00 2001 From: Ivan K Date: Wed, 28 Aug 2024 18:30:19 +0300 Subject: [PATCH 3/5] NEWS entry for NAMED, LEVELS reduction --- NEWS.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/NEWS.md b/NEWS.md index bbc2c6b6d..73e89d777 100644 --- a/NEWS.md +++ b/NEWS.md @@ -10,6 +10,8 @@ 1. Tests run again when some Suggests packages are missing, [#6411](https://github.com/Rdatatable/data.table/issues/6411). Thanks @aadler for the note and @MichaelChirico for the fix. +2. All direct use of `NAMED` and some direct use of `LEVELS` has been removed from the package, reducing the use of non-API functions, see [#6420](https://github.com/Rdatatable/data.table/pull/6420) by Ivan Krylov. The work is on the overall issue [#6180](https://github.com/Rdatatable/data.table/issues/6180) reported by Michael Chirico. + # data.table [v1.16.0](https://github.com/Rdatatable/data.table/milestone/30) (25 August 2024) ## BREAKING CHANGES From 72cbd170fd16844dd8094b8d049d2e56d0926d22 Mon Sep 17 00:00:00 2001 From: Ivan K Date: Wed, 28 Aug 2024 16:57:08 +0300 Subject: [PATCH 4/5] Drop direct use of LEVELS in R >= 4.5 There's no explicit encoding code for ASCII, so use charIsASCII() ("eapi", expected to appear in R-4.5.0). --- src/data.table.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/data.table.h b/src/data.table.h index 4abdb0542..ab52a1757 100644 --- a/src/data.table.h +++ b/src/data.table.h @@ -36,7 +36,11 @@ /* we mean the encoding bits, not CE_NATIVE in a UTF-8 locale */ #define IS_UTF8(x) (getCharCE(x) == CE_UTF8) #define IS_LATIN(x) (getCharCE(x) == CE_LATIN1) -#define IS_ASCII(x) (LEVELS(x) & 64) // API expected in R >= 4.5 +#if R_VERSION < R_Version(4, 5, 0) +# define IS_ASCII(x) (LEVELS(x) & 64) +#else +# define IS_ASCII(x) (Rf_charIsASCII(x)) // no CE_ASCII +#endif #define IS_TRUE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]==TRUE) #define IS_FALSE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]==FALSE) #define IS_TRUE_OR_FALSE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]!=NA_LOGICAL) From b409a408f57912ca1792649bb610bc53b8f733a3 Mon Sep 17 00:00:00 2001 From: Michael Chirico Date: Wed, 28 Aug 2024 09:37:40 -0700 Subject: [PATCH 5/5] fine-tune NEWS --- NEWS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index 73e89d777..5fbe86b76 100644 --- a/NEWS.md +++ b/NEWS.md @@ -10,7 +10,7 @@ 1. Tests run again when some Suggests packages are missing, [#6411](https://github.com/Rdatatable/data.table/issues/6411). Thanks @aadler for the note and @MichaelChirico for the fix. -2. All direct use of `NAMED` and some direct use of `LEVELS` has been removed from the package, reducing the use of non-API functions, see [#6420](https://github.com/Rdatatable/data.table/pull/6420) by Ivan Krylov. The work is on the overall issue [#6180](https://github.com/Rdatatable/data.table/issues/6180) reported by Michael Chirico. +2. Continued work to remove non-API C functions, [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks Ivan Krylov for the PR and for writing a clear and concise guide about the R API: https://aitap.codeberg.page/R-api/. # data.table [v1.16.0](https://github.com/Rdatatable/data.table/milestone/30) (25 August 2024)