l2p

A beta web version is available at https://ccbr.github.io/l2p/. You can paste the full text of a paper into it to see an instant pathway analysis. It is a single-page WebAssembly web application.

List-to-pathway, or l2p, is an R package for gene set enrichment analysis that is optimized for speed.

l2p can be used to determine whether a biological process or function is over-represented in a user-defined gene list. This can be a list of differentially expressed genes, or a list of differentially bound regions annotated with a tool such as uropa or homer.

l2psupp is the "l2p supplemental" package, which contains routines for converting gene symbols. l2psupp and l2p are independent of each other. It is recommended to use l2psupp's updategenes() function to convert old gene symbols into current HGNC gene symbols.

Installation

The latest l2p package can be downloaded directly from GitHub. Three installation methods are described below.

Option 1: Download and install latest R package, l2p_0.0-14.tar.gz, from command-line:

# Get l2p from GitHub
wget https://github.com/CCBR/l2p/raw/master/l2p_0.0-14.tar.gz
# Install as a site package
R CMD INSTALL l2p_0.0-14.tar.gz
# Get l2psupp ("l2p supplemental") from GitHub
wget https://github.com/CCBR/l2p/raw/master/l2psupp_0.0-14.tar.gz
# Install as a site package
R CMD INSTALL l2psupp_0.0-14.tar.gz

Option 2: Install l2p within an R console or RStudio session:

# Install from an R console or RStudio session
install.packages("https://github.com/CCBR/l2p/raw/master/l2p_0.0-14.tar.gz", repos=NULL) 
install.packages("https://github.com/CCBR/l2p/raw/master/l2psupp_0.0-14.tar.gz", repos=NULL) 

Option 3: Download and install l2p using conda:

# Download the 1) l2p and 2) l2psupp packages
wget https://github.com/CCBR/l2p/raw/master/r-l2p-0.0_14-r41h9bf148f_1.tar.bz2
wget https://github.com/CCBR/l2p/raw/master/r-l2psupp-0.0_14-r41_0.tar.bz2

# Install both packages into the active conda environment
conda install r-l2p-0.0_14-r41h9bf148f_1.tar.bz2
conda install r-l2psupp-0.0_14-r41_0.tar.bz2

Please Note: It is assumed R is installed on the target system.
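
After installing with any of the options above, one way to confirm that both packages can be loaded is a short check from R. This is a minimal sketch: `check_installed` is a hypothetical helper, not part of l2p, and it relies only on base R's `requireNamespace` and `utils::packageVersion`.

```R
# Hypothetical helper (not part of l2p): report which packages can be loaded.
check_installed <- function(pkgs) {
  vapply(pkgs, function(p) requireNamespace(p, quietly = TRUE), logical(1))
}

status <- check_installed(c("l2p", "l2psupp"))
print(status)   # named logical vector, TRUE where the package is available

# Print the installed version of each available package.
for (p in names(status)[status]) {
  message(p, " ", as.character(utils::packageVersion(p)))
}
```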

Usage

# Available functions in l2p
l2p(genelist)               # returns a data frame with probabilities that the argument (a list of genes) matches a pathway
l2pgetlongdesc(acc)         # get the full (possibly very long) description for a pathway accession identifier string
l2pgetgenes4acc(acc)        # get the list of all genes in a pathway, given its accession

# Convenience functions:
l2pu(list,universe)         # returns a data frame with probabilities for a list of genes and a user-specified universe
l2pwcats(list,categories)   # returns a data frame restricted to the specified categories
l2puwcats(list,universe,categories) # same as l2pwcats, but also with a universe
l2pver()                    # returns the l2p version


# Available functions in l2psupp
m2h(mousegenelist)          # returns a list of human genes for an input list of mouse gene names
a2a(genelist,fromspecies,tospecies) # takes a list of source-species genes and returns a list of orthologs for the destination species
updategenes(genelist , [trust=1] , [ legitonly=0 ] )   # update old gene names to current HGNC (HUGO) gene names
egid2hugos(entrez_gene_list) # get HGNC names for Entrez gene IDs
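
The two packages can be chained: update symbols with l2psupp first, then run l2p on the result. The sketch below is guarded so that it does nothing when either package is missing. `run_l2p_updated` is a hypothetical wrapper, and it assumes that `updategenes()` returns a character vector of symbols that `l2p()` accepts, which this README does not state explicitly.

```R
# Sketch: update gene symbols with l2psupp, then run l2p.
# Assumption: updategenes() returns a character vector of symbols that
# l2p() accepts; the return type is not documented here.
run_l2p_updated <- function(genes) {
  have_both <- requireNamespace("l2p", quietly = TRUE) &&
    requireNamespace("l2psupp", quietly = TRUE)
  if (!have_both) {
    message("l2p and/or l2psupp not installed; skipping")
    return(NULL)
  }
  current <- l2psupp::updategenes(genes)   # old symbols -> current HGNC symbols
  l2p::l2p(as.vector(current))             # data frame of pathway results
}

res <- run_l2p_updated(c("TP53", "PTEN", "APC"))
if (!is.null(res)) print(head(res, 4))
```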
 

The l2p function supports R-style named arguments. Each parameter is described below:

  • universe: list of gene names
  • categories: see categories below
  • custompathways: a list of vectors in gene matrix transposed (GMT) format. Each custom pathway vector has the form: pwname, desc, gene1, gene2...
  • customfile: a GMT file
  • universefile: list of genes one per line
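
To make the `custompathways` and `universefile` layouts concrete, here is a minimal sketch that writes a small GMT file and a universe file, then reads the GMT file back into the list-of-vectors shape described above. `read_gmt_as_list` is a hypothetical helper written for this example, not an l2p function. It assumes the usual tab-delimited GMT layout (name, description, then genes), and the pathway names are invented.

```R
# Hypothetical helper (not part of l2p): read a tab-delimited GMT file into a
# list of character vectors, each laid out as pwname, desc, gene1, gene2, ...
read_gmt_as_list <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]    # drop blank lines
  strsplit(lines, "\t", fixed = TRUE)      # one character vector per pathway
}

# Write a tiny two-pathway GMT file to a temporary location and read it back.
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c("demo_pw1\tfirst demo pathway\tTP53\tPTEN\tAPC",
             "demo_pw2\tsecond demo pathway\tBRCA1\tBRCA2"), gmt_path)
mylist <- read_gmt_as_list(gmt_path)
str(mylist)

# A universe file is simply one gene symbol per line.
universe_path <- tempfile(fileext = ".txt")
writeLines(c("TP53", "PTEN", "APC", "BRCA1", "BRCA2"), universe_path)
```

With l2p installed, `mylist` could then be passed as `custompathways=mylist`. Passing `gmt_path` as `customfile` and `universe_path` as `universefile` assumes those arguments take file paths.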

Available categories:

  • BIOCYC: organism-specific Pathway/Genome Databases (PGDBs) - https://biocyc.org/
  • GO: initiative to unify representation of gene and gene product attributes - http://geneontology.org
  • KEGG: databases dealing with genomes and biological pathways - https://www.kegg.jp/
  • PANTH: databases for protein analysis through evolutionary relationships - http://www.pantherdb.org/
  • PID: Pathway interaction database: legacy database from Carl Schaefer & buddies at NCI
  • REACTOME: curated database of biological pathways - https://reactome.org/
  • WikiPathways: community resource for biological pathways
  • C1: MSigDB positional gene sets for each human chromosome and cytogenetic band.
  • C2: MSigDB curated gene sets from online pathway databases, publications in PubMed, and experts.
  • C3: MSigDB motif gene sets based on conserved cis-regulatory motifs from comparative analysis
  • C4: MSigDB computational gene sets defined by mining large collections of cancer-oriented microarray data.
  • C6: MSigDB oncogenic gene sets defined directly from microarray data from cancer gene perturbations.
  • C7: MSigDB immunologic gene sets from microarray data from immunologic studies.
  • C8: MSigDB markers identified in single-cell sequencing studies of human tissues.

Please Note: MSigDB "ARCHIVED" pathways are not provided. MSigDB category "C5" is not included; use the "GO" category instead.
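
Several categories can be requested at once. The README's own example `l2pgetuniverse(categories="C2,C3")` suggests a single comma-separated string, so the sketch below builds such a string from a character vector and rejects names that are not in the list above. `make_categories` and `l2p_categories` are hypothetical names introduced for this example. The spellings are copied from the list above and may not match exactly what l2p expects.

```R
# Category names as listed above; C5 is intentionally absent.
l2p_categories <- c("BIOCYC", "GO", "KEGG", "PANTH", "PID", "REACTOME",
                    "WikiPathways", "C1", "C2", "C3", "C4", "C6", "C7", "C8")

# Hypothetical helper (not part of l2p): validate names, then join with commas.
make_categories <- function(cats) {
  unknown <- setdiff(cats, l2p_categories)
  if (length(unknown) > 0) {
    stop("unknown category: ", paste(unknown, collapse = ", "))
  }
  paste(cats, collapse = ",")
}

cat_string <- make_categories(c("C2", "C3"))
print(cat_string)   # "C2,C3"
```

With l2p installed, the result could be passed as `categories=cat_string`.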

Example function call

library(l2p)
genes<-c( "CNOT1","IFITM1","CYP27A1","MTSS1","MXD1","TMEM150B")
x = l2p(as.vector(genes))
head(x,4)

Output

The output is a data frame with the following fields:


1  pathway_name                  name of pathway
2  pval                          Fisher's exact test p-value
3  fdr                           false discovery rate (Benjamini-Hochberg)
4  enrichment_score              ((number_hits / (number_hits + number_misses)) - (number_user_genes / (number_user_genes + total_genes_minus_input))) * 100; same as in earlier versions, but multiplied by 100
5  percent_gene_hits_per_pathway number_hits / (number_hits + number_misses)
6  number_hits                   number of genes hit in pathway
7  number_misses                 number of genes in the pathway that were not hit
8  number_user_genes             total count of user genes (user input)
9  total_genes_minus_input       total number of unique genes in all pathways
10 pathway_id                    canonical accession (if available, otherwise assigned by us)
11 category                      KEGG, REACTOME, GO, PANT (=PANTHER), PID (=Pathway Interaction Database); *was "source"*
12 pathway_type                  functional_set, pathway, structural_complex, custom
13 genesinpathway                HUGO genes from user that hit the pathway
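
The arithmetic behind `enrichment_score`, `pval` and `fdr` can be worked through by hand with base R. The sketch below uses made-up counts. The 2x2 table layout and the one-sided alternative are assumptions chosen for illustration, since the exact contingency table l2p uses internally is not documented here, so the p-value may differ from what l2p reports.

```R
# Made-up counts for a single pathway (illustration only).
number_hits             <- 5       # user genes found in the pathway
number_misses           <- 45      # pathway genes not in the user list
number_user_genes       <- 100
total_genes_minus_input <- 19900

# Fields 5 and 4, following the formulas in the table above.
percent_gene_hits_per_pathway <- number_hits / (number_hits + number_misses)
background <- number_user_genes / (number_user_genes + total_genes_minus_input)
enrichment_score <- (percent_gene_hits_per_pathway - background) * 100
print(enrichment_score)   # approximately 9.5

# Assumed 2x2 layout for Fisher's exact test (not confirmed by the README):
#                   in pathway   not in pathway
#   in user list    hits         user - hits
#   not in list     misses       total_minus_input - misses
tab <- matrix(c(number_hits,
                number_misses,
                number_user_genes - number_hits,
                total_genes_minus_input - number_misses),
              nrow = 2)
pval <- fisher.test(tab, alternative = "greater")$p.value

# Benjamini-Hochberg adjustment across several pathways; the last two
# p-values are invented stand-ins for other pathways.
pvals <- c(pval, 0.02, 0.5)
fdr <- p.adjust(pvals, method = "BH")
print(data.frame(pval = pvals, fdr = fdr))
```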

```R
# Example usage for l2p
    
library(l2p)
genes <- c( "TP53", "PTEN", "APC" )
x = l2p(as.vector(genes))
options(max.print=1000000)
options(width=10000)
print(x)

# How to make a custom pathway
vec1 = c("lall_ad.2","all_ad","AARS","ABCA1","ABCC9","ACTA1","ACTA2","ACTB","ACTC1","ACTG1","ACTN2","ACTN4","ACVR2B","ACVRL1","ADAR","AFG3L2","AFP","AIP","AK1","AKAP9")
vec2 = c("ACMG_2_0.2","ACMG_2_0","BRCA1","BRCA2","TP53","STK11","MLH1","MSH2","MSH6","PMS2")
vec3 = c("berg_ad.2","berg_ad","AARS","ABCC9","ACTA2","ACTB","ACTC1","ACTG1","ACTN2","ACTN4","ACVR2B","ACVRL1","ADAR","AFG3L2","AIP","AK1","AKAP9","AKT2","AMPD1","ANG","ANK2","ANKH","APC","APOA2","APOA5","APOB","APP","ATL1","ATP1A2","ATP2A2","ATP2C1","ATXN1","ATXN10","ATXN2","ATXN3","ATXN7","AXIN2","BAG3","BCO1","BEST1")
mylist <- list(vec1, vec2,vec3)
genes <- c( "TP53", "PTEN", "APC" , "CENPF" , "DLAT", "TP53" , "NOTAGENE" ,"ABCA1","ABCC9","ACTA1", "ADH1A" ,"ATXN3", "BEST1")
x = l2p(as.vector(genes),custompathways=mylist)
print(length(x))
print(x)

# How to set a user universe
library(l2p)
options(width=10000)
options(max.print=999999)

fv<-c("ADH1A","CATSPERG","HLA-DQA2", "HINT2P1","MIR3150A","OR5BS1P","LINC02338","C4orf48","PARD3B","CX3CR1","RPL21P121","ARHGAP1","GAPDHP36","CNBD1","C8orf48","HTR3D","LINC00396","HIGD1AP5","C16orf90","RNU1-134P","CKAP2P1","AP5M1","FFAR3","LAD1","RNU6-524P","TJP3","JRKL","CRADD","RN7SL333P","CYP4F26P","CD1A","B3GNT5","TACC1P1","LINC02763","LOC100505664","TEX15","RPSAP18","CHP2","TRAV8-3","PFDN5","RPL7P8","SERPINA9","DNTTIP1","MELTF","HESX1","LINC02277","SFSWAP","SLC7A11","NAA16","FAM171B","GMNN","ZBTB2","WNT6","LINC02799","MRPL4","MTND1P37","HMGN2P40","NMD3P1","MIR195","LINC02785","DYM","TADA3","CEACAMP5","FAM198B-AS1","FZD8", "RN7SL470P","IQANK1","IGKV1OR9-1","RPL10AP3","BPI","RPL5P25","CARD16","LINC02415","UBE2Q2P10","MIR6761","RNU6-903P","LINC01559","ARL17A","MIR518F","BRAP","LINC01165","XPC","RNU6-505P","LRRIQ4","MIR192","CCL27","LAPTM4BP2","INVS","TMEM161B-AS1","FAM197Y6","HSPD1","UGT1A9","TOR3A","TAF15","MIR6726","TMEM87A","HMGB1","MEI4","NAGPA-AS1","MAPK6P5","HTRA2","HSPB1P1","DYRK1A","IFFO2","TACO1","PPP6C","OR5D14","RNU6-313P","LINC01940","BBS2","RN7SL435P","LINC02422","OR3B1P","ZZEF1","EARS2","LINC02558","LINC00265-2P","KCNH1","GSTP1P1","MIR8076","RNU6-370P","RNA5SP279","RN7SL752P","CXorf49B","ANKRD36P1","IDH3A","RNU6-644P","NUCB2","CHCHD4","FAM138C","MIR198","CDC23","BRCA1","LINC02681","TFB2M","PPIP5K2","MAP2K1","MTATP6P14","COX6B1P2","HDAC5","RAB11FIP2","VSIG4","RN7SL690P","DNAJC13","GOT2P1","GTF2H1","BIRC2","LOC100132202","GAGE4","MTRNR2L10","LINC02319","C8orf49","CCNG2","LINC01524","RN7SKP49","CLDN22","FXYD6","LINC00384","ZNF14","PCGF3","CCDC6","TM4SF20","PRPS1L1","PRORSD1P","SEPHS1P1","KCNA10","MGAT5","LINC02015","BSDC1","POTEM","PHAX","RNU4-65P","MTND1P16","GPRIN2","GALE","CALY","QTRT2","RNU2-18P","TNFRSF10A-AS1","NECTIN3","RNU7-84P","PCK2","BBS5","CEACAMP4","UBE2R2","ABCB9","INTS13","ZNF69","PLEKHM2","LDHA","PHKBP1","SLC9B2","HNRNPA3P9","ARGFXP1","IER5L","CAPRIN1","RNA5SP19","NOP9","COX6CP16")

genes<-c("ADH1A","CATSPERG","HLA-DQA2","HINT2P1","MIR3150A","OR5BS1P","LINC02338","C4orf48","PARD3B","CX3CR1","RPL21P121","ARHGAP1","GAPDHP36","CNBD1","C8orf48","HTR3D","LINC00396","HIGD1AP5")

# x=l2p(genes,categories="KEGG")
x=l2p(genes,universe=fv,categories="KEGG")
print(length(x))
x=l2pgetuniverse(categories="PID")
print(length(x))
x=l2pgetuniverse()
print(length(x))
x=l2pgetuniverse(categories="KEGG")
print(length(x))
x=l2pgetuniverse(categories="C2")
print(length(x))
x=l2pgetuniverse(categories="C3")
print(length(x))
x=l2pgetuniverse(categories="C2,C3")
print(length(x))
```

Testing

# Running l2p's QA test program
R --vanilla < test.R

Citation

Finney, R. & Nelson, G. (2020, July 13). List-to-Pathway: an ultrafast R package for gene set enrichment analysis (Version v0.0.3). Zenodo. http://doi.org/10.5281/zenodo.3942233

