Feature request: convert UInt64 S2 IDs (as string) to S2 cell IDs #250
Comments
Is it too verbose for your use case to use |
Full example below (should have had this before! :) )

The signed/unsigned distinction is causing the problem, and I couldn't figure out how to address it entirely inside R. Some S2 implementations in other languages use unsigned Int64s (range 0 to 18,446,744,073,709,551,615) instead of signed Int64s (range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807). The S2 documentation often uses UInt64 too, e.g. here. Int64 and UInt64 values are equivalent in the underlying 64 bits, but when those bits are parsed as an integer type they come out as different numbers. R can only handle the signed Int64 form as a numeric (as far as I can tell! maybe wrong).

Here's a specific case. I have a dataset partitioned by S2 cell ID. One ID is 10710685813793882112. I want to map this UInt64 ID to R's s2_cell_id so I can pre-filter my data geographically.
I got to the conclusions above ^ by passing through some ChatGPT-generated C++, as below.

Step 1:

#include <Rcpp.h>
#include <bitset>
#include <cstdint> // For int64_t and uint64_t
#include <string>  // For std::string and std::stoull
// [[Rcpp::export]]
std::string stringToBitstring(std::string str) {
  // Convert string to unsigned Int64
  uint64_t unsignedValue = std::stoull(str);
  // Reinterpret the bits as signed Int64
  int64_t signedValue = reinterpret_cast<int64_t&>(unsignedValue);
  // Convert to a bitstring
  std::bitset<64> bits(signedValue);
  return bits.to_string();
}

Step 2:

Rcpp::sourceCpp("stringToBitstring.cpp")
(format_bitstring <- stringToBitstring("10710685813793882112"))
(format_int64 <- bit64::as.integer64.bitstring(format_bitstring))
(format_s2cell <- s2::as_s2_cell(format_int64))

The other approaches I tested and rejected:

All in R with bit64:
Conclusions

I can't see a way to handle UInt64 representations of the S2 integer ID in R directly, hence the C++ hack above. I think it'd be great to have a direct translation from UInt64 formatted as a string to an S2 cell ID, so users could handle them without leaving R themselves.
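For anyone who wants to check the arithmetic outside R, here is a minimal standalone C++ sketch of the same bit reinterpretation, skipping the bitstring step. The helper names `uint64_string_to_int64` and `int64_to_uint64_string` are hypothetical (they are not part of s2, bit64, or Rcpp), and the sketch assumes a platform where `unsigned long long` is 64 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not an s2 or bit64 function): parse a decimal UInt64
// string and return the signed Int64 that shares the same 64 bits.
std::int64_t uint64_string_to_int64(const std::string& text) {
    // std::stoull throws std::invalid_argument or std::out_of_range on bad input.
    const std::uint64_t unsigned_value = std::stoull(text);
    std::int64_t signed_value = 0;
    // memcpy copies the raw bits without any value conversion.
    std::memcpy(&signed_value, &unsigned_value, sizeof(signed_value));
    // Sanity check: converting back (modulo 2^64) recovers the original value.
    assert(static_cast<std::uint64_t>(signed_value) == unsigned_value);
    return signed_value;
}

// Hypothetical inverse: format a signed Int64 as the decimal UInt64 string
// with the same bit pattern.
std::string int64_to_uint64_string(std::int64_t signed_value) {
    std::uint64_t unsigned_value = 0;
    std::memcpy(&unsigned_value, &signed_value, sizeof(unsigned_value));
    return std::to_string(unsigned_value);
}
```

With the ID from the example, `uint64_string_to_int64("10710685813793882112")` gives -7736058259915669504, which is 10710685813793882112 minus 2^64. That value fits in the signed Int64 range, so it should be the number bit64 ends up holding after Step 2.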
A hopefully-small feature request. I'm seeing data in the wild that uses UInt64 representations of S2 cell IDs. An example here is the global rooftop dataset in Source Cooperative where UInt64 IDs show up in the hive partition path.
It's the same underlying bit representation as what you're doing in S2 cells or in the class conversion to bit64::integer64 (thanks again for that), but ... I haven't found a way to read UInt64 at all in R except as a string. I'm guessing the translation would have to be in Rcpp.
Thanks, as always!