forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
C-interface.rmd
676 lines (497 loc) · 25.6 KB
/
C-interface.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
---
title: R's C interface
layout: default
---
```{r, echo = FALSE}
library(inline, quietly = TRUE)
```
# R's C interface {#c-api}
## Introduction
Reading the source code of R is an extremely powerful technique for improving your R programming. However, at some point you will hit a brick wall: many R functions are implemented in C. This guide gives you a basic introduction to C and R's internal C api, giving you the basic knowledge needed to read the internals of R that are written in C.
If you want to write new high-performance code, we do not recommend using C, but instead strongly recommend using Rcpp to connect to C++. The Rcpp API protects you from many of the historical idiosyncracies of the R API, takes care of memory management for you, and provides many useful helper methods
The contents of this chapter are adapted from Section 5 ("System and foreign language interfaces") of [Writing R extensions](http://cran.r-project.org/doc/manuals/R-exts.html), focussing on best practices and modern tools. This means it does not cover:
* the `.C` interface
* the old api defined in `Rdefines.h`
* rarely used and esoteric language features
To understand existing C code, it's useful to generate simple examples of your own that you can experiment with. To that end, all examples in this chapter use the `inline` package, which makes it extremely easy to get up and running with C code. Make sure you have it installed and loaded with the following code:
```{r, eval = FALSE}
install.packages("inline")
library(inline)
```
You'll also (obviously) need a working C compiler. Windows users can use Duncan Murdoch's [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Mac users will need the [Xcode command line tools](http://developer.apple.com/). Most Linux distributions will come with the necessary compilers.
## Differences between R and C
Even if you've never used C before, you should be able to read C code because the basic structure is similar to R. If you want to learn it more formally [The C programming language](http://amzn.com/0131101633?tag=devtools-20) by Kernigan and Ritchie is a classic.
Important differences from R include:
* variables can only store specific types of object, and must be declared before use
* objects are modified in place, unless you specifically copy the object
* indices start at 0, not 1
* you must use a semi-colon at end of each expression
* you must have an explicit return statement
* assignment is done with `=`, not `<-`
## Calling C functions from R
Generally, calling C functions from R involves two parts: a C function and an R function that uses `.Call`. The simple function below adds two numbers together and illustrates some of the important features of coding in C (creating new R vectors, coercing input arguments to the appropriate type and dealing with garbage collection).
```c
// In C ----------------------------------------
#include <R.h>
#include <Rinternals.h>
SEXP add(SEXP a, SEXP b) {
SEXP result;
PROTECT(result = allocVector(REALSXP, 1));
REAL(result)[0] = asReal(a) + asReal(b);
UNPROTECT(1);
return(result);
}
```
```{r, eval = FALSE}
# In R ----------------------------------------
add <- function(a, b) {
.Call("add", a, b)
}
```
In this chapter we'll produce these two pieces in one step by using the `inline` package. This allows us to write:
```{r, cache = TRUE}
add <- cfunction(signature(a = "integer", b = "integer"), "
SEXP result;
PROTECT(result = allocVector(REALSXP, 1));
REAL(result)[0] = asReal(a) + asReal(b);
UNPROTECT(1);
return(result);
")
add(1, 5)
```
The C functions and macros that R provides for us to modify R data structures are all defined in the header file `Rinternals.h`. It's easiest to find and display this file from within R:
```{r, eval = FALSE}
rinternals <- file.path(R.home(), "include", "Rinternals.h")
file.show(rinternals)
```
Before we begin writing and reading C code, we need to know a little about the basic data structures.
## Basic data structures
At the C-level, all R objects are stored in a common datatype, the `SEXP`. (Technically, this is a pointer to a structure with typedef `SEXPREC`). A `SEXP` is a variant type, with subtypes for all R's data structures. The most important types are:
* `REALSXP`: numeric vectors
* `INTSXP`: integer vectors
* `LGLSXP`: logical vectors
* `STRSXP`: character vectors
* `VECSXP`: lists
* `CLOSXP`: functions (closures)
* `ENVSXP`: environments
__Beware:__ At the C level, R's lists are `VECSXP`s not `LISTSXP`s. This is because early implementations of R used Lisp-like linked lists (now known as "pairlists") before moving to the S-like generic vectors that we now know as lists.
There are also `SEXP`s for less common object types:
* `CPLXSXP`: complex vectors
* `LISTSXP`: "pair" lists. At the R level, you only need to care about the distinction lists and pairlists for function arguments, but internally they are used in many more places.
* `DOTSXP`: '...'
* `SYMSXP`: names/symbols
* `NILSXP`: `NULL`
And `SEXP`s for internal objects, objects that are usually only created and used by C functions, not R functions:
* `LANGSXP`: language constructs
* `CHARSXP`: "scalar" strings (see below)
* `PROMSXP`: promises, lazily evaluated function arguments
* `EXPRSXP`: expressions
There's no built-in R function to easily access these names, but we can write one: (This is adapted from code in R's `inspect.c`)
```{r, cache = TRUE}
sexp_type <- cfunction(c(x = "ANY"), '
switch (TYPEOF(x)) {
case NILSXP: return mkString("NILSXP");
case SYMSXP: return mkString("SYMSXP");
case LISTSXP: return mkString("LISTSXP");
case CLOSXP: return mkString("CLOSXP");
case ENVSXP: return mkString("ENVSXP");
case PROMSXP: return mkString("PROMSXP");
case LANGSXP: return mkString("LANGSXP");
case SPECIALSXP: return mkString("SPECIALSXP");
case BUILTINSXP: return mkString("BUILTINSXP");
case CHARSXP: return mkString("CHARSXP");
case LGLSXP: return mkString("LGLSXP");
case INTSXP: return mkString("INTSXP");
case REALSXP: return mkString("REALSXP");
case CPLXSXP: return mkString("CPLXSXP");
case STRSXP: return mkString("STRSXP");
case DOTSXP: return mkString("DOTSXP");
case ANYSXP: return mkString("ANYSXP");
case VECSXP: return mkString("VECSXP");
case EXPRSXP: return mkString("EXPRSXP");
case BCODESXP: return mkString("BCODESXP");
case EXTPTRSXP: return mkString("EXTPTRSXP");
case WEAKREFSXP: return mkString("WEAKREFSXP");
case S4SXP: return mkString("S4SXP");
case RAWSXP: return mkString("RAWSXP");
default: return mkString("<unknown>");
}')
sexp_type(10)
sexp_type(10L)
sexp_type("a")
sexp_type(T)
sexp_type(list(a = 1))
sexp_type(pairlist(a = 1))
```
### Character vectors
R character vectors are stored as `STRSXP`s, a vector type where every element is a `CHARSXP`. `CHARSXP`s are read-only objects and must never be modified. In particular, the C-style string contained in a `CHARSXP` should be treated as read-only; it's hard to do otherwise because the `CHAR` accessor function returns a `const char*`.
Strings have this more complicated design because individual `CHARSXP`'s (elements of a character vector) can be shared between multiple strings. This is an optimisation to reduce memory usage, and can result in unexpected behaviour:
```{r, eval = FALSE}
x <- "banana"
y <- rep(x, 1e6)
object.size(x)
# 32-bit: 64 bytes
# 64-bit: 96 bytes
object.size(y) / 1e6
# 32-bit: 4.000056 bytes
# 64-bit: 8.000088 bytes
```
In 32-bit R, factors occupy about the same amount of memory as strings: both pointers and integers are 4 bytes. In 64-bit R, pointers are 8 bytes, so factors take about twice as much memory as strings.
## Coercion and object creation
At the heart of every C function will be a set of conversions between R data structures and C data structures. Inputs and output will always be R data structures (`SEXP`s) and you will need to convert them to C data structures in order to do any work. An additional complication is the garbage collector: if you don't claim every R object you create, the garbage collector will think they are unused and delete them.
### Object creation and garbage collection
The simplest way to create an new R-level object is `allocVector`, which takes two arguments, the type of `SEXP` (or `SEXPTYPE`) to create, and the length of the vector. The following code creates a three element list containing a logical vector, a numeric vector and an integer vector:
```{r, cache = TRUE}
dummy <- cfunction(body = '
SEXP vec, real, lgl, ints;
PROTECT(real = allocVector(REALSXP, 2));
PROTECT(lgl = allocVector(LGLSXP, 10));
PROTECT(ints = allocVector(INTSXP, 10));
PROTECT(vec = allocVector(VECSXP, 3));
SET_VECTOR_ELT(vec, 0, real);
SET_VECTOR_ELT(vec, 1, lgl);
SET_VECTOR_ELT(vec, 2, ints);
UNPROTECT(4);
return(vec);
')
dummy()
```
You might wonder what all the `PROTECT` calls do. They tell R that we're currently using each object, and not to delete it if the garbage collector is activated. (We don't need to protect objects that R already knows about, like function arguments).
You also need to make sure that every protected object is unprotected. `UNPROTECT` takes a single integer argument, n, and unprotects the last n objects that were protected. If your calls don't match, R will warn about a "stack imbalance in .Call".
Other specialised forms of `PROTECT` and `UNPROTECT` are needed in some circumstances: `UNPROTECT_PTR(s)` unprotects the
object pointed to by the `SEXP` s, `PROTECT_WITH_INDEX` saves an index of the protection location that can be used to replace the protected value using `REPROTECT`. Consult the R externals section on [garbage collection](http://cran.r-project.org/doc/manuals/R-exts.html#Garbage-Collection) for more details.
If you run `dummy()` a few times, you'll notice the output is basically random. This is because `allocVector` assigns memory to each output, but it doesn't clean it out first. For real functions, you'll want to loop through each element in the vector and zero it out. The most efficient way to do that is to use `memset`:
```{r, cache = TRUE}
zeroes <- cfunction(c(n_ = "integer"), '
int n = asInteger(n_);
SEXP out;
PROTECT(out = allocVector(INTSXP, n));
memset(INTEGER(out), 0, n * sizeof(int));
UNPROTECT(1);
return out;
')
zeroes(10);
```
### Allocation shortcuts
There are a few shortcuts for allocating matrices and 3d arrays:
```c
allocMatrix(SEXPTYPE mode, int nrow, int ncol)
alloc3DArray(SEXPTYPE mode, int nrow, int ncol, int nface)
```
Beware `allocList` - it creates a pairlist, not a regular list.
The `mkNamed` function simplifies the creation of named vectors. The following code is equivalent to `list(a = NULL, b = NULL, c = NULL)`:
```c
const char *names[] = {"a", "b", "c", ""};
mkNamed(VECSXP, names);
```
### Extracting C vectors
There is a helper function for each atomic vector (apart from character, see following) that allows you to index into a `SEXP` and access the C-level data structure that lives at its heart.
The following example shows how to use the helper function `REAL` to inspect and modify a numeric vector:
```{r, cache = TRUE}
add_one <- cfunction(c(x = "numeric"), "
SEXP out;
int n = length(x);
PROTECT(out = allocVector(REALSXP, n));
for (int i = 0; i < n; i++) {
REAL(out)[i] = REAL(x)[i] + 1;
}
UNPROTECT(1);
return out;
")
add_one(as.numeric(1:10))
```
There are similar helpers for logical, `LOGICAL(x)`, integer, `INTEGER(x)`, complex `COMPLEX(x)` and raw vectors `RAW(x)`.
If you're working with long vectors, there's a performance advantage to using the helper function once and saving the result in a pointer:
```{r, cache = TRUE}
add_two <- cfunction(c(x = "numeric"), "
SEXP out;
int n = length(x);
double *px, *pout;
PROTECT(out = allocVector(REALSXP, n));
px = REAL(x);
pout = REAL(out);
for (int i = 0; i < n; i++) {
pout[i] = px[i] + 2;
}
UNPROTECT(1);
return out;
")
add_two(as.numeric(1:10))
library(microbenchmark)
x <- as.numeric(1:1e6)
microbenchmark(
add_one(x),
add_two(x)
)
```
On my computer, `add_two` is about twice as fast as `add_one` for a million element vector. This is a common idiom in the R source code.
Strings and lists are more complicated because the individual elements are `SEXP`s not C-level data structures. You can use `STRING_ELT(x, i)` and `VECTOR_ELT(x, i)` to extract individual components of strings and lists respectively. To get a single C string from a element in a R character vector, use `CHAR(STRING_ELT(x, i))`. Set values in a list or character vector with `SET_VECTOR_ELT` and `SET_STRING_ELT`.
### Modifying strings
String vectors are a little more complicated. As discussed earlier, a string vector is a vector made up of pointers to immutable `CHARSXP`s, and it's the `CHARSXP` that contains the C string (which can be extracted using `CHAR`). The following function shows how to create a vector of fixed values:
```{r, cache = TRUE}
abc <- cfunction(NULL, '
SEXP out;
PROTECT(out = allocVector(STRSXP, 3));
SET_STRING_ELT(out, 0, mkChar("a"));
SET_STRING_ELT(out, 1, mkChar("b"));
SET_STRING_ELT(out, 2, mkChar("c"));
UNPROTECT(1);
return out;
')
abc()
```
Things are a little harder if you want to modify the strings in the vector because you need to know a lot about string manipulation in C (which is hard, and harder to do right). For any problem that involves any kind of string modification, you're better off using Rcpp.
The following function just makes a copy of a string, so you can at least see how all the pieces work together.
```{r, cache = TRUE}
copy <- cfunction(c(x = "character"), '
SEXP out;
int n = length(x);
const char* letter;
PROTECT(out = allocVector(STRSXP, n));
for (int i = 0; i < n; i++) {
letter = CHAR(STRING_ELT(x, i));
SET_STRING_ELT(out, i, mkChar(letter));
}
UNPROTECT(1);
return out;
')
copy(letters)
```
One last useful function for operating with strings: we can use `STRING_PTR` to get a pointer to the `STRING_ELT`s in a vector, such that we can access the `STRING_ELT`s by de-referencing the pointer. Occasionally, this can be easier to work with. We'll show how this can make a simple example of reversing a vector of strings easier.
```{r, cache = TRUE}
reverse <- cfunction( signature(x="character"), '
SEXP out;
int len = length(x);
PROTECT( out = allocVector(STRSXP, len) );
SEXP* out_ptr = STRING_PTR(out);
SEXP* x_ptr = STRING_PTR(x);
for( int i=0; i < len; ++i ) {
out_ptr[i] = x_ptr[len-i-1];
}
UNPROTECT(1);
return out;
')
reverse(letters)
```
### Coercing scalars
There also a few helper functions if you want to turn the first element of an R vector into a C scalar:
* `asLogical(x): INTSXP -> int`
* `asInteger(x): INTSXP -> int`
* `asReal(x): REALSXP -> double`
* `CHAR(asChar(x)): STRSXP -> const char*`
And similarly it's easy to turn a C scalar into a length-one R vector:
* `ScalarLogical(x): int -> LGLSXP`
* `ScalarInteger(x): int -> INTSXP`
* `ScalarReal(x): double -> REALSXP`
* `mkString(x): const char* -> STRSXP`
These all create R-level objects, so need to be `PROTECT`ed.
## Modifying objects
You must be very careful when modifying an object that the user has passed into the function. The following function has some very unexpected behaviour:
```{r, cache = TRUE}
add_three <- cfunction(c(x = "numeric"), '
REAL(x)[0] = REAL(x)[0] + 3;
return(x);
')
x <- 1
y <- x
add_three(x)
x
y
```
Not only has it modified the value of `x`, but it has also modified `y`! This happens because of the way that R implements it's copy-on-modify philosophy. It does so lazily, so a complete copy only has to be made if you make a change: `x` and `y` point to the same object, and the object is only duplicated if you change either `x` or `y`.
To avoid problems like this, always `duplicate()` inputs before modifying them:
```{r, cache = TRUE}
add_four <- cfunction(c(x = "numeric"), '
SEXP x_copy;
PROTECT(x_copy = duplicate(x));
REAL(x_copy)[0] = REAL(x_copy)[0] + 4;
UNPROTECT(1);
return(x_copy);
')
x <- 1
y <- x
add_four(x)
x
y
```
You should also be careful when calling C functions. Apart from the basic constructor functions and duplicate, you should never assume it's ok to modify an object in place without first checking that `NAMED(x)` is 0. If you're doing this sort of check in many places, you may want to define the following macro:
```c
#define MAYBE_SHARED(x) (NAMED(x) > 0)
```
This will also protect you if the interface for checking if an object is shared elsewhere changes. (Thanks to Luke Tierney for suggesting this macro and providing more details about when you need to make a copy).
## Pairlists and symbols
R hides a few details of the underlying datastructures it uses. In some places in R code, it looks like you're working with a list (`VECSXP`), but behind the scenes R you're actually modifying a pairlist (`LISTSXP`). These include attributes, calls and `...`.
Pairlists differ from lists in the following ways:
* Pairlists are [linked lists](http://en.wikipedia.org/wiki/Linked_list), a data structure which does not have any easy way to get to an arbitrary element of the list
* Pairlists have `tags`, not `names`, and tags are symbols, not strings.
Because you can't easily index into a specified location in a pairlist, R provides a set of helper functions to moved along the linked list. The basic helpers are `CAR` which extracts the first element of the list, and `CDR` which extracts the rest. These can be composed to get `CAAR`, `CDAR`, `CADDR`, `CADDR`, `CADDDR`. As well as the getters, R also provides `SETCAR`, `SETCDR` etc.
```{r, cache = TRUE}
car <- cfunction(c(x = "ANY"), 'return(CAR(x));')
cdr <- cfunction(c(x = "ANY"), 'return(CDR(x));')
cadr <- cfunction(c(x = "ANY"), 'return(CADR(x));')
x <- quote(f(a = 1, b = 2))
car(x)
cdr(x)
car(cdr(x))
cadr(x)
```
You can make new pairlists with `CONS` or `LCONS` (if you want a call object). A pairlist is always terminated with `R_NilValue`.
```{r, cache = TRUE}
new_call <- cfunction(NULL, '
SEXP out;
out = LCONS(install("+"), LCONS(
ScalarReal(10), LCONS(
ScalarReal(5), R_NilValue)));
return out;
')
new_call();
```
Similarly, you can loop through all elements of a pairlist as follows:
```{r, cache = TRUE}
count <- cfunction(c(x = "ANY"), '
SEXP el, nxt;
int i = 0;
for(nxt = x; nxt != R_NilValue; el = CAR(nxt), nxt = CDR(nxt)) {
i++;
}
return(ScalarInteger(i));
')
count(quote(f(a, b, c)))
count(quote(f()))
```
`TAG` and `SET_TAG` allow you to get and set the tag (aka name) associated with an element of a pairlist. The tag should be a symbol. To create a symbol (the equivalent of `as.symbol` or `as.name` in R), use `install`.
Attributes are also pairlists behind the scenes, but come with the helper functions `setAttrib` and `getAttrib` to make access a little easier:
```{r, cache = TRUE}
set_attr <- cfunction(c(obj = "ANY", attr = "character", value = "ANY"), '
const char* attr_s = CHAR(asChar(attr));
duplicate(obj);
setAttrib(obj, install(attr_s), value);
return(obj);
')
x <- 1:10
set_attr(x, "a", 1)
```
There are some (confusingly named) shortcuts for common setting operations: `classgets`, `namesgets`, `dimgets` and `dimnamesgets` are the internal versions of the default methods of `class<-`, `names<-`, `dim<-` and `dimnames<-`.
```{r, cache = TRUE}
tags <- cfunction(c(x = "ANY"), '
SEXP el, nxt, out;
int i = 0;
for(nxt = CDR(x); nxt != R_NilValue; nxt = CDR(nxt)) {
i++;
}
PROTECT(out = allocVector(VECSXP, i));
for(nxt = CDR(x), i = 0; nxt != R_NilValue; i++, nxt = CDR(nxt)) {
SET_VECTOR_ELT(out, i, TAG(nxt));
}
UNPROTECT(1);
return out;
')
tags(quote(f(a = 1, b = 2, c = 3)))
tags(quote(f()))
```
## Missing and non-finite values
For floating point numbers, R's `NA` is a subtype of `NaN` so IEEE 754 arithmetic should handle it correctly. However, it is unwise to depend on such details, and is better to deal with missings explicitly:
* In `REALSXP`s, use the `ISNA` macro, `ISNAN`, or `R_FINITE` macros to check for missing, NaN or non-finite values. Use the constants `NA_REAL`, `R_NaN`, `R_PosInf` and `R_NegInf` to set those values
* In `INTSXP`s, compare/set values to `NA_INTEGER`
* In `LGLSXP`s, compare/set values to `NA_LOGICAL`
* In `STRSXP`s, compare/set `CHAR(STRING_ELT(x, i))` values to `NA_STRING`.
For example, a primitive implementation of `is.NA` might look like
```{r, cache = TRUE}
is_na <- cfunction(c(x = "ANY"), '
SEXP out;
int n = length(x);
PROTECT(out = allocVector(LGLSXP, n));
for (int i = 0; i < n; i++) {
switch(TYPEOF(x)) {
case LGLSXP:
LOGICAL(out)[i] = (LOGICAL(x)[i] == NA_LOGICAL);
break;
case INTSXP:
LOGICAL(out)[i] = (INTEGER(x)[i] == NA_INTEGER);
break;
case REALSXP:
LOGICAL(out)[i] = ISNA(REAL(x)[i]);
break;
case STRSXP:
LOGICAL(out)[i] = (STRING_ELT(x, i) == NA_STRING);
break;
default:
LOGICAL(out)[i] = NA_LOGICAL;
}
}
UNPROTECT(1);
return out;
')
is_na(c(NA, 1L))
is_na(c(NA, 1))
is_na(c(NA, "a"))
is_na(c(NA, TRUE))
```
It's worth noting that R's `base::is.na` returns `TRUE` for both `NA` and `NaN`s in a numeric vector, as opposed to the C level `ISNA` macro, which returns `TRUE` only for `NA_REAL`s.
There are a few other special values:
```{r, cache = TRUE, error = TRUE}
nil <- cfunction(NULL, 'return(R_NilValue);')
unbound <- cfunction(NULL, 'return(R_UnboundValue);')
missing_arg <- cfunction(NULL, 'return(R_MissingArg);')
x <- missing_arg()
x
```
## Checking types in C
If the user provides different input to your function to what you're expecting (e.g. provides a list instead of a numeric vector), it's very easy to crash R. For this reason, it's a good idea to write a wrapper function that checks arguments are of the correct type, or coerces them if necessary. It's usually easier to do this at the R level. For example, going back to our first example of C code, we might rename it to `add_` and then write a wrapper function to check the inputs are ok:
```{r, cache = TRUE}
add_ <- cfunction(signature(a = "integer", b = "integer"), "
SEXP result;
PROTECT(result = allocVector(REALSXP, 1));
REAL(result)[0] = asReal(a) + asReal(b);
UNPROTECT(1);
return(result);
")
add <- function(a, b) {
stopifnot(is.numeric(a), is.numeric(b), length(a) == 1, length(b) == 1)
add_(a, b)
}
```
Or if we wanted to be more accepting of diverse inputs:
```{r, cache = TRUE}
add <- function(a, b) {
a <- as.numeric(a)
b <- as.numeric(b)
if (length(a) > 1) warning("Only first element of a used")
if (length(a) > 1) warning("Only first element of b used")
add_(a, b)
}
```
To coerce objects at the C level, use `PROTECT(new = coerceVector(old, SEXPTYPE))`. This will return an error if the `SEXP` can not be converted to the desired type. Note that these coercion functions do not use S3 dispatch.
To check if an object is of a specified type, you can use `TYPEOF`, which returns a `SEXPTYPE`:
```{r, cache = TRUE}
is_numeric <- cfunction(c("x" = "ANY"), "
return(ScalarLogical(TYPEOF(x) == REALSXP));
")
is_numeric(7)
is_numeric("a")
```
Or you can use one of the many helper functions. They all return 0 for FALSE and 1 for TRUE:
* For atomic vectors: `isInteger`, `isReal`, `isComplex`, `isLogical`, `isString`.
* For combinations of atomic vectors: `isNumeric` (integer, logical, real), `isNumber` (integer, logical, real, complex), `isVectorAtomic` (logical, interger, numeric, complex, string, raw)
* Matrices (`isMatrix`) and arrays (`isArray`)
* For other more esoteric object: `isEnvironment`, `isExpression`, `isList` (a pair list), `isNewList` (a list), `isSymbol`, `isNull`, `isObject` (S4 objects), `isVector` (atomic vectors, lists, expressions)
Note that some of these functions behave differently to the R-level functions with similar names. For example `isVector` is true for any atomic vector type, lists and expression, where `is.vector` is returns `TRUE` only if its input has no attributes apart from names.
## Finding the C source code for a function
In many R functions you'll find code like `.Internal(mean(x))` or `.Primitive("sum")`. That means that most of the function is implemented at the C-level. There are two steps to finding the corresponding C source code:
* First, open [src/main/names.c](https://github.com/wch/r-source/blob/trunk/src/main/names.c) and search for the name of the function. You'll find an entry that tells you the name of the function (which always starts with `do_`)
* Next, search the R source code for the name of that function. To make it easier to find where it's defined (rather than everywhere it's used), you can add `(SEXP`. e.g. to find the source code for `findInterval`, search for `do_findinterval(SEXP`.
## `.External`
An alternative to using `.Call` is to use `.External`. It is used almost identically, except that the C function will recieve a single arugment containing a `LISTSXP`, a pairlist from which the arguments can be extracted. This makes it possible to write functions that take a variable number of arguments.
`inline` does not currently support `.External` functions.
## Using C code in a package
If you're putting your code in a package, it's generally a good idea to stop using `inline` and revert back to separate R and C functions. At a minimum, you'll need:
* R files in a `R/` directory
* C files in `src/` directory
* A `DESCRIPTION` in the main directory
* A `NAMESPACE` file containing `useDynLib(packagename)`, which can be generated using a `roxygen2` tag: `@useDynLib packagename`
Running `load_all(path/to/package)` will automatically compile and reload the code in your package.
Your C code will need these headers:
```C
#include <R.h>
#include <Rinternals.h>
```