expressed_regions() is not working with the new IDIES and AWS bigWig file locations #23

Open
lcolladotor opened this issue Feb 20, 2023 · 5 comments

@lcolladotor
Member

Hi,

Currently in BioC release (3.16) and devel (3.17), recount is failing. That's because neither the new IDIES location nor AWS is letting us read the BigWig files from the web. I manually edited a local clone of recount to try the IDIES location.

You can test this on AWS (through duffel) with:

regions <- expressed_regions("SRP002001", "chrY", cutoff = 5)

This is the type of warning we get:

2023-02-20 12:51:10 loadCoverage: loading BigWig file http://sciserver.org/public-data/recount2/data/SRP002001/bw/mean_SRP002001.bw
In addition: Warning messages:
1: In seqinfo(con) :
  No openssl available in netConnectHttps for sciserver.org : 443
2: In seqinfo(con) :
  No openssl available in netConnectHttps for sciserver.org : 443
3: In seqinfo(con) :
  No openssl available in netConnectHttps for sciserver.org : 443
> traceback()
8: stop(conditionMessage(output))
7: FUN(X[[i]], ...)
6: lapply(as.list(X), match.fun(FUN), ...)
5: lapply(as.list(X), match.fun(FUN), ...)
4: lapply(bList, .loadCoverageBigWig, range = which, chr = chr, 
       verbose = verbose)
3: lapply(bList, .loadCoverageBigWig, range = which, chr = chr, 
       verbose = verbose)
2: derfinder::loadCoverage(files = meanFile, chr = chr, chrlen = chrlen) at expressed_regions.R#121
1: expressed_regions("SRP002001", "chrY", cutoff = 5)

With AWS (through duffel):

2023-02-20 12:36:04 loadCoverage: loading BigWig file http://duffel.rail.bio/recount/SRP002001/bw/mean_SRP002001.bw
In addition: Warning messages:
1: In seqinfo(con) :
  No openssl available in netConnectHttps for recount-opendata.s3.amazonaws.com : 443
2: In seqinfo(con) :
  No openssl available in netConnectHttps for recount-opendata.s3.amazonaws.com : 443
3: In seqinfo(con) :
  No openssl available in netConnectHttps for recount-opendata.s3.amazonaws.com : 443

I'm not sure what to do @nellore @ChristopherWilks.

I can try to provide a smaller test by digging into .loadCoverageBigWig() (https://github.com/lcolladotor/derfinder/blob/5c1cbd412c5787bf2d2d778977e38dd6ae64976d/R/loadCoverage.R#L384) and, ultimately, rtracklayer.
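
A smaller test along those lines might look like the sketch below. It skips recount and derfinder and asks rtracklayer for the seqinfo of the remote BigWig directly, since the warnings above are raised in seqinfo(con). The helper names mean_bw_url() and try_remote_bw() are hypothetical, and the duffel URL pattern is inferred from the log output above rather than taken from the recount source.

```r
# Hypothetical helper: build the mean BigWig URL for a project, following the
# pattern seen in the loadCoverage log messages (an assumption, not recount API).
mean_bw_url <- function(project, base = "http://duffel.rail.bio/recount") {
    sprintf("%s/%s/bw/mean_%s.bw", base, project, project)
}

# Hypothetical helper: try to read the header of a remote BigWig file.
# Errors are returned as a message string instead of stopping, so both
# locations can be compared in one session. Warnings are still printed.
try_remote_bw <- function(url) {
    tryCatch(
        GenomeInfoDb::seqinfo(rtracklayer::BigWigFile(url)),
        error = function(e) conditionMessage(e)
    )
}

# The network calls only run in an interactive session.
if (interactive()) {
    # AWS (through duffel)
    print(try_remote_bw(mean_bw_url("SRP002001")))
    # IDIES, using the URL from the log above
    print(try_remote_bw(
        "http://sciserver.org/public-data/recount2/data/SRP002001/bw/mean_SRP002001.bw"
    ))
}
```

If both calls return an error message on BioC 3.16 or 3.17, that would suggest the problem can be reproduced with rtracklayer alone.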

Best,
Leo

@ChristopherWilks

A quick check on an older version of BioC (3.11) and rtracklayer (1.50.0) appears to work:

> project_info <- abstract_search("GSE32465")
> regions <- expressed_regions("SRP009615", "chrY",
+     cutoff = 5L,
+     maxClusterGap = 3000L
+ )
2023-02-20 19:23:16 loadCoverage: loading BigWig file http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
2023-02-20 19:23:18 loadCoverage: applying the cutoff to the merged data
2023-02-20 19:23:18 filterData: originally there were 57227415 rows, now there are 57227415 rows. Meaning that 0 percent was filtered.
2023-02-20 19:23:18 findRegions: identifying potential segments
2023-02-20 19:23:18 findRegions: segmenting information
2023-02-20 19:23:18 .getSegmentsRle: segmenting with cutoff(s) 5
2023-02-20 19:23:18 findRegions: identifying candidate regions
2023-02-20 19:23:19 findRegions: identifying region clusters
> head(regions)
GRanges object with 6 ranges and 6 metadata columns:
    seqnames          ranges strand |     value      area indexStart  indexEnd
       <Rle>       <IRanges>  <Rle> | <numeric> <numeric>  <integer> <integer>
  1     chrY 2929794-2929829      * |  14.72650   530.154    2929794   2929829
  2     chrY 2956678-2956701      * |  12.81063   307.455    2956678   2956701
  3     chrY 2977203-2977227      * |   5.34908   133.727    2977203   2977227
  4     chrY 2977957-2977994      * |   6.46977   245.851    2977957   2977994
  5     chrY 2978850-2978871      * |   5.79766   127.548    2978850   2978871
  6     chrY 2979004-2979033      * |   6.79941   203.982    2979004   2979033
    cluster clusterL
      <Rle>    <Rle>
  1       1       36
  2       2       24
  3       3     2750
  4       3     2750
  5       3     2750
  6       3     2750
  -------
  seqinfo: 1 sequence from an unspecified genome
> tools:::.BioC_version_associated_with_R_version()
[1] ‘3.11’
> sessionInfo(package = NULL)
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] recount_1.16.1              SummarizedExperiment_1.20.0
 [3] Biobase_2.50.0              GenomicRanges_1.42.0
 [5] GenomeInfoDb_1.26.2         IRanges_2.24.1
 [7] S4Vectors_0.28.1            BiocGenerics_0.36.0
 [9] MatrixGenerics_1.2.1        matrixStats_0.58.0

loaded via a namespace (and not attached):
  [1] colorspace_2.0-0         ellipsis_0.3.2           qvalue_2.15.0
  [4] htmlTable_2.1.0          XVector_0.30.0           base64enc_0.1-3
  [7] rstudioapi_0.13          bit64_4.0.5              AnnotationDbi_1.52.0
 [10] fansi_0.4.2              xml2_1.3.2               codetools_0.2-18
 [13] splines_4.0.2            cachem_1.0.1             knitr_1.31
 [16] jsonlite_1.7.2           Formula_1.2-4            Rsamtools_2.6.0
 [19] cluster_2.1.0            dbplyr_2.1.1             png_0.1-7
 [22] rentrez_1.2.3            readr_2.1.2              compiler_4.0.2
 [25] httr_1.4.2               backports_1.2.1          assertthat_0.2.1
 [28] Matrix_1.3-2             fastmap_1.1.0            limma_3.46.0
 [31] cli_3.0.1                htmltools_0.5.1.1        prettyunits_1.1.1
 [34] tools_4.0.2              gtable_0.3.0             glue_1.6.2
 [37] GenomeInfoDbData_1.2.4   reshape2_1.4.4           dplyr_1.0.8
 [40] rappdirs_0.3.3           doRNG_1.8.2              Rcpp_1.0.6
 [43] bumphunter_1.32.0        vctrs_0.3.8              Biostrings_2.58.0
 [46] rtracklayer_1.50.0       iterators_1.0.13         xfun_0.20
 [49] stringr_1.4.0            lifecycle_1.0.1          rngtools_1.5
 [52] XML_3.99-0.5             zlibbioc_1.36.0          scales_1.1.1
 [55] BSgenome_1.58.0          VariantAnnotation_1.36.0 hms_1.0.0
 [58] GEOquery_2.58.0          derfinderHelper_1.24.1   RColorBrewer_1.1-2
 [61] curl_4.3                 memoise_2.0.0            gridExtra_2.3
 [64] downloader_0.4           ggplot2_3.3.5            biomaRt_2.46.3
 [67] rpart_4.1-15             latticeExtra_0.6-29      stringi_1.5.3
 [70] RSQLite_2.2.3            foreach_1.5.1            checkmate_2.0.0
 [73] GenomicFeatures_1.42.2   BiocParallel_1.24.1      rlang_1.0.1
 [76] pkgconfig_2.0.3          GenomicFiles_1.26.0      bitops_1.0-6
 [79] lattice_0.20-41          purrr_0.3.4              GenomicAlignments_1.26.0
 [82] htmlwidgets_1.5.4        bit_4.0.4                tidyselect_1.1.1
 [85] plyr_1.8.6               magrittr_2.0.1           R6_2.5.0
 [88] generics_0.1.3           Hmisc_4.5-0              DelayedArray_0.16.1
 [91] DBI_1.1.1                pillar_1.6.2             foreign_0.8-81
 [94] survival_3.2-7           RCurl_1.98-1.2           nnet_7.3-15
 [97] tibble_3.0.6             crayon_1.4.0             derfinder_1.24.2
[100] utf8_1.1.4               BiocFileCache_1.14.0     tzdb_0.2.0
[103] jpeg_0.1-8.1             progress_1.2.2           locfit_1.5-9.4
[106] grid_4.0.2               data.table_1.14.0        blob_1.2.1
[109] digest_0.6.27            tidyr_1.2.0              openssl_2.0.2
[112] munsell_0.5.0            askpass_1.1

@ChristopherWilks

I'll need to set up a newer version of BioC with recount to test the later versions further, but I'm guessing this is an openssl<->rtracklayer interaction issue.

@lcolladotor
Member Author

Awesome, thanks for this info Chris! I'll create an issue for https://github.com/lawremi/rtracklayer

@lcolladotor
Member Author

I just wrote lawremi/rtracklayer#83. Let's see where that leads. Thanks again Chris!

lcolladotor added a commit that referenced this issue Feb 20, 2023
…lks/snaptron#17. Also #23. I tried insulating recount from these tests, so they'll be reported as warnings instead of errors on the BioC build machines for now.
@lcolladotor
Member Author

Note that I posted an update to lawremi/rtracklayer#83 (comment) today and updated the recount package to try to implement some workarounds. This is also related to #25.
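
For anyone hitting this in the meantime, one possible workaround (a sketch only, and not necessarily what recount itself implements) is to download the BigWig file first and read the local copy, which avoids remote BigWig access altogether. The helper fetch_bw() is hypothetical. The loadCoverage() call mirrors the one shown in the traceback above, with a local path in place of the URL.

```r
# Hypothetical helper: download a remote file once and reuse the local copy
# on later calls. Returns the path to the local file.
fetch_bw <- function(url, destdir = tempdir()) {
    dest <- file.path(destdir, basename(url))
    if (!file.exists(dest)) {
        # mode = "wb" keeps the binary BigWig file intact on Windows
        utils::download.file(url, destfile = dest, mode = "wb")
    }
    dest
}

# The download and the coverage loading only run in an interactive session.
if (interactive()) {
    local_bw <- fetch_bw(
        "http://duffel.rail.bio/recount/SRP002001/bw/mean_SRP002001.bw"
    )
    # Same function as in the traceback, pointed at the local file
    cov <- derfinder::loadCoverage(files = local_bw, chr = "chrY")
}
```

The trade-off is that the whole file is downloaded even when only one chromosome is needed.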
