
Add loops to break up search values for large batch searches #31

Open
madison-feshuk opened this issue Oct 23, 2024 · 0 comments

I'm trying to map DTXSIDs from the InChIKeys in a file that is too large to attach to this ticket. My initial thought was an API bug given the size of the request, but it seems more like a ctxR issue, since adjusting the rate_limit parameter sometimes makes the call succeed.

library(readr)
data <- read_csv("activities_with_inchikeys_to_filter_in_R_all.csv") #full data frame of 78104 records
length(unique(data$standard_inchi_key)) #47348 unique inchikeys
test <- chemical_equal_batch(word_list = unique(data$standard_inchi_key)) #returns empty data frame
## subset for testing
# the first 1000 rows work and return the mappings for the 780 unique inchikeys
data <- data[1:2000,]
# the first 2000 rows fail and return an empty data frame, but work if you add a rate_limit of .1
test <- chemical_equal_batch(word_list = unique(data$standard_inchi_key))

Looping through the full list of search terms in batches of 1000 resolved this, but having to do so is not expected behavior. In addition to adjusting rate limits, we could consider having the function break up the search list itself when it is large.

inchi_list <- unique(data$standard_inchi_key)
batches <- seq(1000,length(inchi_list), by = 1000)

library(ctxR)
test <- chemical_equal_batch(word_list = inchi_list[1:999], verbose = TRUE) #returns empty data frame 

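for(i in batches){
  end <- min(i + 999, length(inchi_list)) #cap the final batch so indexing does not run past the end and return NAs
  test <- rbind(test,chemical_equal_batch(word_list = inchi_list[i:end]))
  print(i)
}
#the capped loop already covers the remainder of the list, so no separate final call is needed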
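
As a possible starting point for the proposal, the chunking could be wrapped in a small helper. This is only a sketch: `batch_search`, `search_fn`, and `chunk_size` are hypothetical names, not part of ctxR, and the code assumes the search function returns a data frame with the same columns for every chunk.

```r
# Hypothetical helper (not part of ctxR): split a long vector of search terms
# into chunks, run a search function on each chunk, and row-bind the results.
batch_search <- function(word_list, search_fn, chunk_size = 1000) {
  if (length(word_list) == 0) {
    return(data.frame())
  }
  # Chunk id per element: the first chunk_size elements get 1, the next get 2, ...
  chunk_id <- ceiling(seq_along(word_list) / chunk_size)
  chunks <- split(word_list, chunk_id)
  results <- lapply(chunks, search_fn)
  # Drop empty results so rbind does not fail on mismatched columns
  results <- Filter(function(x) is.data.frame(x) && nrow(x) > 0, results)
  if (length(results) == 0) {
    return(data.frame())
  }
  out <- do.call(rbind, unname(results))
  rownames(out) <- NULL
  out
}

# Usage with the call from this issue (requires ctxR and an API key):
# test <- batch_search(inchi_list, function(x) chemical_equal_batch(word_list = x))
```

Because the last chunk is produced by `split()`, it is simply shorter than the others, so no index ever runs past the end of the list.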