Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency between fmean and fsum #628

Open
NicChr opened this issue Aug 26, 2024 · 1 comment
Open

Inconsistency between fmean and fsum #628

NicChr opened this issue Aug 26, 2024 · 1 comment

Comments

@NicChr
Copy link

NicChr commented Aug 26, 2024

Hello, firstly great package again as always!
Just wanted to highlight a few inconsistencies and possibly an erroneous result.

I noticed that when calling fmean() on an integer vector with NA values and groups, the result doesn't return NA as one might expect. I suspect it has something to do with the integer overflow issue mentioned in the documentation, but I'd at least expect a similar erroneous result with fsum which I'm not seeing there.

I also noticed that the length of the result returned by fsum() doesn't match that of fmean() in the case of integer vectors.

In any case I've attached some simple examples to demonstrate the behaviour I'm seeing.

library(collapse)
#> collapse 2.0.16, see ?`collapse-package` or ?`collapse-documentation`
#> 
#> Attaching package: 'collapse'
#> The following object is masked from 'package:stats':
#> 
#>     D
fsum(c(1L, NA), na.rm = F) #Ok
#> [1] NA
fmean(c(1L, NA), na.rm = F) #Ok
#> [1] NA
fsum(c(1L, NA), na.rm = F, g = c(1L, 1L)) #Ok
#>  1 
#> NA
fmean(c(1L, NA), na.rm = F, g = c(1L, 1L)) #Not ok
#>           1 
#> -1073741824
fsum(integer()) #Ok
#> integer(0)
fmean(integer()) #Not ok
#> [1] NA
fsum(numeric()) #Ok
#> numeric(0)
fmean(numeric()) #Ok
#> numeric(0)
fsum(integer(), na.rm = F, g = integer()) #Ok
#> named integer(0)
fmean(integer(), na.rm = F, g = integer()) #Not ok
#> <NA> 
#>   NA
fsum(numeric(), na.rm = F, g = integer()) #Ok
#> named numeric(0)
fmean(numeric(), na.rm = F, g = integer()) #Ok
#> named numeric(0)

Created on 2024-08-26 with reprex v2.1.0

@SebKrantz
Copy link
Owner

SebKrantz commented Aug 26, 2024

Thanks! So actually all more involved numerical statistics (fmean, fmedian, fvar, fsd, fquantile) give NA when passed integer(), which is consistent with base R. What is inconsistent is that fsum(),fmin, and fmax() return integer(), instead of 0, Inf, -Inf. I can think about this for a while. I guess I didn't go with base R here because I did not like these defaults.

And yeah, fmean() for integers with groups currently does not do any checks if na.rm = FALSE. which gives a negative number here because NA_integer_ is the smallest integer in C. I'll see if I can fix that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants