A curated list of machine learning datasets and open-source models for biomedicine.
Parts of this list are adapted from beamandrew/medical-data.
- DLradiologyRSNA2016
- Susskind, Joshua, Volodymyr Mnih, and Geoffrey Hinton. "On deep generative models with applications to recognition." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
- Brosch, Tom, Roger Tam, and Alzheimer’s Disease Neuroimaging Initiative. "Manifold learning of brain MRIs by deep learning." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer Berlin Heidelberg, 2013.
- Wu, Zhirong, et al. "3d shapenets: A deep representation for volumetric shapes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
- Calandra, Roberto, et al. "Learning deep belief networks from non-stationary streams." International Conference on Artificial Neural Networks. Springer Berlin Heidelberg, 2012.
- Conv Nets: A Modular Perspective ( http://colah.github.io/posts/2014-07-Conv-Nets-Modular/ )
- A BRIEF REPORT OF THE HEURITECH DEEP LEARNING MEETUP #5 (https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritechdeep-learning-meetup-5/)
- Deep Learning Tutorial ICML, Atlanta 2013, Yann LeCun and Marc'Aurelio Ranzato
- Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European Conference on Computer Vision. Springer International Publishing, 2014.
- Ji, Shuiwang, et al. "3D convolutional neural networks for human action recognition." IEEE transactions on pattern analysis and machine intelligence 35.1 (2013): 221-231.
- Tran, Du, et al. "Learning spatiotemporal features with 3d convolutional networks." 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015.
- Deep Style: Inferring the Unknown to Predict the Future of Fashion, TJ TORRES, http://multithreaded.stitchfix.com/blog/2015/09/17/deepstyle/
- Recurrent Neural Networks Neural Computation : Lecture 12, John A. Bullinaria, 2015 http://www.cs.bham.ac.uk/~jxb/INC/l12.pdf
- LSTM Networks for Sentiment Analysis, http://deeplearning.net/tutorial/lstm.html
- Yan, Zhennan, et al. "Multi-Instance Deep Learning: Discover Discriminative Local Anatomies for Bodypart Recognition." IEEE transactions on medical imaging 35.5 (2016): 1332-1343.
- Roth, Holger R., et al. "Deeporgan: Multi-level deep convolutional networks for automated pancreas segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer International Publishing, 2015.
- Cha, Kenny H., et al. "Urinary bladder segmentation in CT urography using deep-learning convolutional neural network and level sets." Medical physics 43.4 (2016): 1882-1896.
- Miao, Shun, Z. Jane Wang, and Rui Liao. "A CNN Regression Approach for Real-Time 2D/3D Registration." IEEE transactions on medical imaging 35.5 (2016): 1352-1363.
- van Tulder, Gijs, and Marleen de Bruijne. "Combining Generative and Discriminative Representation Learning for Lung CT Analysis With Convolutional Restricted Boltzmann Machines." IEEE transactions on medical imaging 35.5 (2016): 1262-1272.
- Roth, Holger R., et al. "A new 2.5 D representation for lymph node detection using random sets of deep convolutional neural network observations." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer International Publishing, 2014.
- Setio, Arnaud Arindra Adiyoso, et al. "Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks." IEEE transactions on medical imaging 35.5 (2016): 1160-1169.
- Li, Wen, Fucang Jia, and Qingmao Hu. "Automatic Segmentation of Liver Tumor in CT Images with Deep Convolutional Neural Networks." Journal of Computer and Communications 3.11 (2015): 146.
- Guo, Yanrong, Yaozong Gao, and Dinggang Shen. "Deformable MR prostate segmentation via deep feature learning and sparse patch matching." IEEE transactions on medical imaging 35.4 (2016): 1077-1089.
- Avendi, M. R., Arash Kheradvar, and Hamid Jafarkhani. "A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI." Medical image analysis 30 (2016): 108-119.
- Moeskops, Pim, et al. "Automatic segmentation of MR brain images with a convolutional neural network." IEEE transactions on medical imaging 35.5 (2016): 1252-1261.
- Hosseini-Asl, Ehsan, Robert Keynton, and Ayman El-Baz. "Alzheimer's disease diagnostics by adaptation of 3D convolutional network." Image Processing (ICIP), 2016 IEEE International Conference on. IEEE, 2016.
- Zhen, Xiantong, et al. "Multi-scale deep networks and regression forests for direct bi-ventricular volume estimation." Medical image analysis 30 (2016): 120-129.
- Nie, Dong, et al. "3D Deep Learning for Multi-modal Imaging-Guided Survival Time Prediction of Brain Tumor Patients." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer International Publishing, 2016.
- Vasilakos, Athanasios V., Yu Tang, and Yuanzhe Yao. "Neural networks for computer-aided diagnosis in medicine: A review." Neurocomputing (2016).
- https://www.researchgate.net/profile/Yingju_Chen/publication/233806620/figure/fig4/AS:202659295436813@1425329149496/General-stepsinvolving-in-computer-aided-diagnosis-CAD-system-where-gray-boxes-may-be.png
- Tajbakhsh, Nima, et al. "Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?." IEEE transactions on medical imaging 35.5 (2016): 1299-1312.
- 5 Industries Being Most Affected By Artificial Intelligence https://www.fowcommunity.com/blog/future-work/5-industries-being-most-affectedartificial-intelligence
- Cheng, Jie-Zhi, et al. "Computer-Aided Diagnosis with Deep Learning Architecture: Applications to Breast Lesions in US Images and Pulmonary Nodules in CT Scans." Scientific reports 6 (2016).
- Artificial Intelligence & Machine Learning for Semantic Imaging, Imperial College London http://wp.doc.ic.ac.uk/bglocker/project/semanticimaging/
- Leung, M.K.K., Delong, A., Alipanahi, B. & Frey, B.J. Machine learning in genomic medicine: a review of computational problems and data sets. Proc. IEEE 104, 176–197 (2016).
- Mamoshina, P., Vieira, A., Putin, E. & Zhavoronkov, A. Applications of deep learning in biomedicine. Mol. Pharm. 13, 1445–1454 (2016).
- Gawehn, E., Hiss, J.A. & Schneider, G. Deep learning in drug discovery. Mol. Inform. 35, 3–14 (2016).
- Jurtz, V.I. et al. An introduction to deep learning on biological sequence data: examples and solutions. Bioinformatics 33, 3685–3690 (2017).
- Zou, J., et al., A primer on deep learning in genomics. Nat Genet, 2019. 51(1): p. 12-18.
- Eraslan, G., et al., Deep learning: new computational modelling techniques for genomics. Nat Rev Genet, 2019.
- Wainberg M, Merico D, Delong A, Frey BJ. Deep learning in biomedicine. Nat Biotechnol. 2018;36(9):829-838. doi:10.1038/nbt.4233
- Ching, T., et al., Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface, 2018. 15(141).
- Min, S., B. Lee, and S. Yoon, Deep learning in bioinformatics. Brief Bioinform, 2017. 18(5): p. 851-869.
- Jones, W., et al., Computational biology: deep learning. Emerging Topics in Life Sciences, 2017. 1(3): p. 257-274.
- Angermueller, C., et al., Deep learning for computational biology. Mol Syst Biol, 2016. 12(7): p. 878.
- Zhou, J., et al., Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet, 2018. 50(8): p. 1171-1179.
- Sundaram, L., et al., Predicting the clinical impact of human mutation with deep neural networks. Nat Genet, 2018.
- Libbrecht, M.W. and W.S. Noble, Machine learning applications in genetics and genomics. Nat Rev Genet, 2015. 16(6): p. 321-32.
- Camacho, D.M., et al., Next-Generation Machine Learning for Biological Networks. Cell, 2018. 173(7): p. 1581-1592.
- Baldi, P. Deep learning in biomedical data science. Annu. Rev. Biomed. Data Sci. 1, 181–205 (2018).
- Tan, J., Ung, M., Cheng, C. & Greene, C.S. Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders. Pac. Symp. Biocomput. 2015, 132–143 (2015).
- Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
- Visscher, P.M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).
- Boyle, E.A., Li, Y.I. & Pritchard, J.K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
- Alipanahi, B., Delong, A., Weirauch, M.T. & Frey, B.J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
- Kelley, D.R., Snoek, J. & Rinn, J.L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).
- Zhou, J. & Troyanskaya, O.G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
- Quang, D. & Xie, X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 44, e107 (2016).
- Angermueller, C., Lee, H.J., Reik, W. & Stegle, O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol. 18, 67 (2017).
- Zhang, S., Hu, H., Jiang, T., Zhang, L. & Zeng, J. TITER: predicting translation initiation sites by deep learning. Bioinformatics 33, i234–i242 (2017).
- Ma, J. et al. Using deep learning to model the hierarchical structure and function of a cell. Nat. Methods 15, 290–298 (2018).
- Duvenaud, D. et al. Convolutional networks on graphs for learning molecular fingerprints. Preprint at https://arxiv.org/abs/1509.09292 (2015)
- Ramsundar, B. et al. Massively multitask networks for drug discovery. Preprint at https://arxiv.org/abs/1502.02072 (2015).
- Wallach, I., Dzamba, M. & Heifets, A. AtomNet: a deep convolutional neural network for bioactivity prediction in structure-based drug discovery. Preprint at https://arxiv.org/abs/1510.02855 (2015).
- Liu, Y. et al. Detecting cancer metastases on gigapixel pathology images. Preprint at https://arxiv.org/abs/1703.02442 (2017).
- Wang, D., Khosla, A., Gargeya, R., Irshad, H. & Beck, A.H. Deep learning for identifying metastatic breast cancer. Preprint at https://arxiv.org/abs/1606.05718 (2016).
- Lakhani, P. & Sundaram, B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 284, 574–582 (2017)
- Kraus, O.Z. et al. Automated analysis of high-content microscopy data with deep learning. Mol. Syst. Biol. 13, 924 (2017).
- Carpenter, A.E. et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 7, R100 (2006).
- Bruno, M.A., Walker, E.A. & Abujudeh, H.H. Understanding and confronting our mistakes: the epidemiology of error in radiology and strategies for error reduction. Radiographics 35, 1668–1676 (2015).
- Leinonen, R., Sugawara, H. & Shumway, M. The Sequence Read Archive. Nucleic Acids Res. 39, D19–D21 (2011).
- Mamoshina P, Vieira A, Putin E, Zhavoronkov A. Applications of Deep Learning in Biomedicine. Mol Pharm. 2016;13(5):1445-1454. doi:10.1021/acs.molpharmaceut.5b00982
- Cao C, Liu F, Tan H, et al. Deep Learning and Its Applications in Biomedicine. Genomics Proteomics Bioinformatics. 2018;16(1):17-32. doi:10.1016/j.gpb.2017.07.003
- Nussinov, R. Advancements and Challenges in Computational Biology. PLoS Comput. Biol. 2015, 11 (1), e1004053.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.; Veness, J.; Bellemare, M.; Graves, A.; Riedmiller, M.; Fidjeland, A.; Ostrovski, G.; Petersen, S.; Beattie, C.; Sadik, A.; Antonoglou, I.; King, H.; Kumaran, D.; Wierstra, D.; Legg, S.; Hassabis, D. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518 (7540), 529−533.
- Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Networks 2015, 61, 85−117.
- Bakhtiar, R. Biomarkers in Drug Discovery and Development. J. Pharmacol. Toxicol. Methods 2008, 57 (2), 85−91.
- Lezhnina, K.; Kovalchuk, O.; Zhavoronkov, A. A.; Korzinkin, M. B.; Zabolotneva, A. A.; Shegay, P. V.; Sokov, D. G.; Gaifullin, N. M.; Rusakov, I. G.; Aliper, A. M.; Roumiantsev, S. A.; Alekseev, B. Y.; Borisov, N. M.; Buzdin, A. A. Novel Robust Biomarkers for Human Bladder Cancer Based on Activation of Intracellular Signaling Pathways. Oncotarget 2014, 5 (19), 9022−9032.
- Järvinen, A.-K.; Hautaniemi, S.; Edgren, H.; Auvinen, P.; Saarela, J.; Kallioniemi, O.-P.; Monni, O. Are Data from Different Gene Expression Microarray Platforms Comparable? Genomics 2004, 83 (6), 1164−1168.
- Hira, Z. M.; Gillies, D. F. A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data. Adv. Bioinf. 2015, 2015 (1), 198363.
- Buzdin, A. A.; Zhavoronkov, A. A.; Korzinkin, M. B.; Roumiantsev, S. A.; Aliper, A. M.; Venkova, L. S.; Smirnov, P. Y.; Borisov, N. M. The OncoFinder Algorithm for Minimizing the Errors Introduced by the High-Throughput Methods of Transcriptome Analysis. Front. Mol. Biosci. 2014, DOI: 10.3389/fmolb.2014.00008.
- Ibrahim, R.; Yousri, N. A.; Ismail, M. A.; El-Makky, N. M. MultiLevel gene/MiRNA Feature Selection Using Deep Belief Nets and Active Learning. Eng. Med. Biol. Soc. (EMBC), 2014 36th Annu. Int. Conf. IEEE 2014, 3957−3960.
- Fakoor, R.; Huber, M. Using Deep Learning to Enhance Cancer Diagnosis and Classification. In Proceedings of the 30th Int. Conf. Mach. Learn. Atlanta, GA, 2013, Vol. 28.
- Jones, A. L. Segmenting Microarrays with Deep Neural Networks 2015, DOI: 10.1101/020404.
- Zeng, T.; Li, R.; Mukkamala, R.; Ye, J.; Ji, S. Deep Convolutional Neural Networks for Annotating Gene Expression Patterns in the Mouse Brain. BMC Bioinf. 2015, 16 (1), 147.
- Xiong, H. Y.; Alipanahi, B.; Lee, L. J.; Bretschneider, H.; Merico, D.; Yuen, R. K. C.; Hua, Y.; Gueroussov, S.; Najafabadi, H. S.; Hughes, T. R.; Morris, Q.; Barash, Y.; Krainer, A. R.; Jojic, N.; Scherer, S. W.; Blencowe, B. J.; Frey, B. J. The human splicing code reveals new insights into the genetic determinants of disease. Science 2015, 347 (6218), 1254806.
- Leung, M. K. K.; Xiong, H. Y.; Lee, L. J.; Frey, B. J. Deep Learning of the Tissue-Regulated Splicing Code. Bioinformatics 2014, 30 (12), i121−i129.
- Cech, T. R.; Steitz, J. A. The Noncoding RNA Revolution-Trashing Old Rules to Forge New Ones. Cell 2014, 157 (1), 77−94.
- Fan, X.-N.; Zhang, S.-W. lncRNA-MFDL: Identification of Human Long Non-Coding RNAs by Fusing Multiple Features and Using Deep Learning. Mol. BioSyst. 2015, 11 (3), 892−897
- Witteveen, M. J. Identification and Elucidation of Expression Quantitative Trait Loci (eQTL) and Their Regulating Mechanisms Using Decodive Deep Learning; 2014; pp 1−17.
- Chen, L.; Cai, C.; Chen, V.; Lu, X. Trans-Species Learning of Cellular Signaling Systems with Bimodal Deep Belief Networks. Bioinformatics 2015, 31, 3008−3015.
- Spencer, M.; Eickholt, J.; Cheng, J. A Deep Learning Network Approach to ab Initio Protein Secondary Structure Prediction. IEEE/ACM Trans. Comput. Biol. Bioinf. 2015, 12 (1), 103−112.
- Di Lena, P.; Nagata, K.; Baldi, P. Deep Architectures for Protein Contact Map Prediction. Bioinformatics 2012, 28 (19), 2449−2457.
- Eickholt, J.; Cheng, J. DNdisorder: Predicting Protein Disorder Using Boosting and Deep Networks. BMC Bioinf. 2013, 14 (1), 88.
- Wang, S.; Weng, S.; Ma, J.; Tang, Q. DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields. Int. J. Mol. Sci. 2015, 16 (8), 17315−17330.
- Zhang, S.; Zhou, J.; Hu, H.; Gong, H.; Chen, L.; Cheng, C.; Zeng, J. A Deep Learning Framework for Modeling Structural Features of RNA-Binding Protein Targets. Nucleic Acids Res. 2016, 44 (4), e32
- Schirle, M.; Jenkins, J. L. Identifying Compound Efficacy Targets in Phenotypic Drug Discovery. Drug Discovery Today 2015, 21 (1), 82.
- Wang, C.; Liu, J.; Luo, F.; Tan, Y. Pairwise Input Neural Network for Target-Ligand Interaction Prediction. 2014 IEEE Int. Conf. Bioinf. Biomed. (BIBM) 2014, 67−70.
- Xu, Y.; Dai, Z.; Chen, F.; Gao, S.; Pei, J.; Lai, L. Deep Learning for Drug-Induced Liver Injury. J. Chem. Inf. Model. 2015, 55, 2085−2093.
- Hughes, T. B.; Miller, G. P.; Swamidass, S. J. Modeling Epoxidation of Drug-like Molecules with a Deep Machine Learning Network. ACS Cent. Sci. 2015, 1 (4), 168−180.
- Alipanahi, B.; Delong, A.; Weirauch, M. T.; Frey, B. J. Predicting the Sequence Specificities of DNA- and RNA-Binding Proteins by Deep Learning. Nat. Biotechnol. 2015, 33, 831−838.
- Zhou, J.; Troyanskaya, O. G. Predicting Effects of Noncoding Variants with Deep Learning−based Sequence Model. Nat. Methods 2015, 12 (10), 931−934.
- Kelley, D. R.; Snoek, J.; Rinn, J. Basset: Learning the Regulatory Code of the Accessible Genome with Deep Convolutional Neural Networks; 2015.
- Liang, M.; Li, Z.; Chen, T.; Zeng, J. Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach. IEEE/ACM Trans. Comput. Biol. Bioinf. 2015, 12 (4), 928−937.
- Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; Harris, M. A.; Hill, D. P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J. C.; Richardson, J. E.; Ringwald, M.; Rubin, G. M.; Sherlock, G. Gene Ontology: Tool for the Unification of Biology. Nat. Genet. 2000, 25 (1), 25−29.
- Papadatos, G.; Davies, M.; Dedman, N.; Chambers, J.; Gaulton, A.; Siddle, J.; Koks, R.; Irvine, S. A.; Pettersson, J.; Goncharoff, N.; Hersey, A.; Overington, J. P. SureChEMBL: A Large-Scale, Chemically Annotated Patent Document Database. Nucleic Acids Res. 2016, 44, D1220.
- Liu Y, Li C, Shen S, et al. Discovery of regulatory noncoding variants in individual cancer genomes by using cis-X [published online ahead of print, 2020 Jul 6]. Nat Genet. 2020;10.1038/s41588-020-0659-5. doi:10.1038/s41588-020-0659-5
- Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med. 2019;25(1):24-29. doi:10.1038/s41591-018-0316-z
- medical-data: a curated list of medical data for machine learning.
- grand-challenge.org: an overview of challenges that have been organised within the area of medical image analysis.
- EchoNet-Dynamic: A Large New Cardiac Motion Video Data Resource for Medical Machine Learning, from Stanford
- The National Library of Medicine presents MedPix: Database of 53,000 medical images from 13,000 patients with annotations. Requires registration.
- ABIDE: The Autism Brain Imaging Data Exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Paper: http://www.ncbi.nlm.nih.gov/pubmed/23774715; Information: http://fcon_1000.projects.nitrc.org/indi/abide/
- Alzheimer's Disease Neuroimaging Initiative (ADNI): MRI database on Alzheimer's patients and healthy controls. Also has clinical, genomic, and biomarker data. Requires registration. Paper: http://www.neurology.org/content/74/3/201.short
- CT Colonography for Colon Cancer (Cancer Imaging Archive): CT scans for diagnosing colon cancer. Includes data for patients without polyps, with 6-9 mm polyps, and with polyps greater than 10 mm.
- Digital Retinal Images for Vessel Extraction (DRIVE): The DRIVE database is for comparative studies on segmentation of blood vessels in retinal images. It consists of 40 photographs, of which 7 show signs of mild early diabetic retinopathy. Paper: https://ieeexplore.ieee.org/document/1282003
- OASIS: a project aimed at making MRI data sets of the brain freely available to the scientific community.
- ISIC Archive - Melanoma: This archive contains 23k images of classified skin lesions. It contains both malignant and benign examples.
- Sunnybrook Cardiac Data: The Sunnybrook Cardiac Data (SCD), also known as the 2009 Cardiac MR Left Ventricle Segmentation Challenge data, consist of 45 cine-MRI images from a mix of patients and pathologies: healthy, hypertrophy, heart failure with infarction, and heart failure without infarction.
- Lung Image Database Consortium (LIDC): Preliminary clinical studies have shown that spiral CT scanning of the lungs can improve early detection of lung cancer in high-risk individuals. Image processing algorithms have the potential to assist in lesion detection on spiral CT studies, and to assess the stability or change in lesion size on serial CT studies. The use of such computer-assisted algorithms could significantly enhance the sensitivity and specificity of spiral CT lung screening, as well as lower costs by reducing physician time needed for interpretation.
- TCIA Collections: Cancer imaging data sets across various cancer types (e.g. carcinoma, lung cancer, myeloma) and various imaging modalities.
- Belarus tuberculosis portal: Tuberculosis is a major problem for public health in Belarus. Recently the situation has been complicated by the emergence and development of MDR/XDR TB and HIV/TB, which require long-term treatment.
- DDSM: Digital Database for Screening Mammography: a resource for use by the mammographic image analysis research community.
- INbreast: Database for Digital Mammography: a mammographic database, with images acquired at a Breast Centre, located in a University Hospital (Hospital de São João, Breast Centre, Porto, Portugal).
- mini-MIAS: MIAS MiniMammographic Database: The Mammographic Image Analysis Society (MIAS) is an organisation of UK research groups interested in the understanding of mammograms, and it has generated a database of digital mammograms.
- Prostate: MRI Lesion Segmentation in Multiple Sclerosis Database; Emergency Tele-Orthopedics X-ray Digital Library; IMT Segmentation; Needle EMG MUAP Time Domain Features
- DICOM image sample sets: exclusively available for research and teaching. You are not authorized to redistribute or sell them, or use them for commercial purposes.
- SCR database: Segmentation in Chest Radiographs: The SCR database has been established to facilitate comparative studies on segmentation of the lung fields, the heart and the clavicles in standard posterior-anterior chest radiographs.
- Medical Image Databases & Libraries
- VIA Group Public Databases: Documented image databases are essential for the development of quantitative image analysis tools especially for tasks of computer-aided diagnosis (CAD). In collaboration with the I-ELCAP group we have established two public image databases that contain lung CT images in the DICOM format together with documentation of abnormalities by radiologists.
- CVonline: Image Databases
- The USC-SIPI Image Database: a collection of digitized images. It is maintained primarily to support research in image processing, image analysis, and machine vision.
- Histology dataset: image registration of differently stained slices: The dataset consists of 2D histological microscopy tissue slices, stained with different stains, and landmarks denoting key points in each slice. The task is image registration: align all slices in a particular set of images (consecutive stain cuts), for instance to the initial image plane. The main challenges for these images are very large image size, appearance differences, and a lack of distinctive objects. The dataset contains 108 image pairs and manually placed landmarks for evaluating registration quality.
- Challenges/Contest Data
- Visual Concept Extraction Challenge in Radiology: Manually annotated radiological data of several anatomical structures (e.g. kidney, lung, bladder, etc.) from several different imaging modalities (e.g. CT and MR). They also provide a cloud computing instance that anyone can use to develop and evaluate models against benchmarks.
- Grand Challenges in Biomedical Image Analysis: A collection of biomedical imaging challenges in order to facilitate better comparisons between new and existing solutions, by standardizing evaluation criteria.
- Dream Challenges: Challenges that pose fundamental questions about systems biology and translational medicine. Designed and run by a community of researchers from a variety of organizations, they invite participants to propose solutions, fostering collaboration and building communities in the process.
- Kaggle diabetic retinopathy: High-resolution retinal images that are annotated on a 0–4 severity scale by clinicians, for the detection of diabetic retinopathy. This data set is part of a completed Kaggle competition, which is generally a great source for publicly available data sets.
- Cervical Cancer Screening: In this Kaggle competition, you develop algorithms to correctly classify cervix types based on cervical images. The different types of cervix in the data set are all considered normal (not cancerous), but since the transformation zones aren't always visible, some of the patients require further testing while some don't.
- Multiple sclerosis lesion segmentation: challenge 2008. A collection of brain MRI scans to detect MS lesions.
- Multimodal Brain Tumor Segmentation Challenge: Large data set of brain tumor magnetic resonance scans. They’ve been extending this data set and challenge each year since 2012.
- Coding4Cancer: A new initiative by the Foundation for the National Institutes of Health and Sage Bionetworks to host a series of challenges to improve cancer screening. The first is for digital mammography readings. The second is for lung cancer detection. The challenges are not yet launched.
- EEG Challenge Datasets on Kaggle: Melbourne University AES/MathWorks/NIH Seizure Prediction (predict seizures in long-term human intracranial EEG recordings); American Epilepsy Society Seizure Prediction Challenge (predict seizures in intracranial EEG recordings); UPenn and Mayo Clinic's Seizure Detection Challenge (detect seizures in intracranial EEG recordings); Grasp-and-Lift EEG Detection (identify hand motions from EEG recordings)
- Challenges track in MICCAI Conference: The Medical Image Computing and Computer Assisted Intervention conference. Most of the challenges are already covered by websites like grand-challenges. You can still see all of them under the "Satellite Events" tab of the conference sites. 2019; 2018; 2017; 2016; 2015
- International Symposium on Biomedical Imaging (ISBI): The IEEE International Symposium on Biomedical Imaging (ISBI) is a scientific conference dedicated to mathematical, algorithmic, and computational aspects of biomedical imaging, across all scales of observation. Most of its challenges are listed in grand-challenges. You can still access them by visiting the "Challenges" tab under "Program" on each year's website. 2019; 2018; 2017; 2016
- Continuous Registration Challenge (CRC): A challenge for registration of lung and brain images, inspired by modern software development practices. Participants implement their algorithm using the open source SuperElastix C++ API. The challenge focuses on pairwise registration of lungs and brains, two problems frequently encountered in clinical settings. They have collected seven open-access data sets and one private data set (3+1 lung data sets, 4 brain data sets). The challenge results will be presented and discussed at the upcoming Workshop On Biomedical Image Registration (WBIR 2018).
- Automatic Non-rigid Histological Image Registration (ANHIR): The ANHIR challenge aims at the automatic nonlinear image registration of 2D whole slide imaging (WSI) microscopy images of histopathology tissue samples stained with different dyes. The task is difficult due to non-linear deformations affecting the tissue samples, the different appearance of each stain, repetitive texture, and the large size of the whole slide images. Benchmark; BIRL: Benchmark on Image Registration methods with Landmark validation
- Bone X-Ray Deep Learning Competition using MURA: MURA (musculoskeletal radiographs) is a large dataset of bone X-rays. The Stanford ML Group and AIMI Center are hosting a competition where algorithms are tasked with determining whether an X-ray study is normal or abnormal. The algorithms are evaluated on a test set of 207 musculoskeletal studies, where each study was individually retrospectively labeled as normal or abnormal by 6 board-certified radiologists. Three of these radiologists were used to create a gold standard, defined as the majority vote of the labels of the radiologists, and the other three were used to obtain the best radiologist performance, defined as the maximum score of the three radiologists with the gold standard as groundtruth. The challenge leaderboard is hosted publicly and updated every two weeks.
- 2019 Kidney and Kidney Tumor Segmentation Challenge (KiTS19): The KiTS19 challenge is on the semantic segmentation of kidneys and kidney tumors in contrast-enhanced CT scans. The dataset consists of 300 patients with preoperative arterial-phase abdominal CTs annotated by experts. 210 (70%) of these were released as a training set and the remaining 90 (30%) were held out as a test set. This challenge was held in conjunction with MICCAI 2019.
- Building the graph of medicine from millions of clinical narratives: Co-occurrence statistics for medical terms extracted from 14 million clinical notes and 260,000 patients. Paper: http://www.nature.com/articles/sdata201432
- Learning Low-Dimensional Representations of Medical Concepts: Low-dimensional embeddings of medical concepts constructed using claims data. Note that this paper utilizes data from Building the graph of medicine from millions of clinical narratives. Paper: http://cs.nyu.edu/~dsontag/papers/ChoiChiuSontag_AMIA_CRI16.pdf
- MIMIC-III, a freely accessible critical care database: Anonymized critical care EHR database on 38,597 patients and 53,423 ICU admissions. Requires registration. Paper: http://www.nature.com/articles/sdata201635
- Clinical Concept Embeddings Learned from Massive Sources of Medical Data: Embeddings for 108,477 medical concepts learned from 60 million patients, 1.7 million journal articles, and clinical notes of 20 million patients; Paper: https://arxiv.org/abs/1804.01486; Embeddings: https://figshare.com/s/00d69861786cd0156d81
- Evaluation of Embeddings of Laboratory Test Codes for Patients at a Cancer Center: 200 dimensional Word2Vec embeddings of 1098 laboratory test codes (LOINCs) trained from 8,280,238 lab orders for 79,081 patients at City of Hope National Medical Center (Los Angeles, CA). Paper: https://arxiv.org/abs/1907.09600
- National Healthcare Data
- Medicare Data: Data from the Centers for Medicare & Medicaid Services (CMS) on hospitals, nursing homes, physicians, home healthcare, dialysis, and device providers. Landing page: https://data.medicare.gov
- Texas Public Use Inpatient Data File: Data on 11 million inpatient visits with diagnosis, procedure codes and outcomes from Texas between 2006 and 2009.
- Dollars for Doctors: ProPublica investigation of money paid by pharmaceutical companies to doctors. Information: https://www.propublica.org/series/dollars-for-docs; Search tool: https://projects.propublica.org/docdollars/
- DocGraph: Physician interaction network obtained through a Freedom of Information Act request. Covers nearly 1 million entities. Main page: http://www.docgraph.com; Information: http://thehealthcareblog.com/blog/2012/11/05/tracking-the-social-doctor-opening-up-physician-referral-data-and-much-more/
- UCI Datasets: Liver Disorders Data Set (data on 345 patients with and without liver disease; features are 5 blood biomarkers thought to be involved with liver disease); Thyroid Disease Data Set; Breast Cancer Data Set; Heart Disease Data Set; Lymphography Data Set; Parkinsons Data Set; Parkinsons Telemonitoring Data Set; Parkinson Speech Dataset with Multiple Types of Sound Recordings Data Set; Parkinson's Disease Classification Data Set; Primary Tumor Data Set
- PMC Open Access Subset: Collection of all the full-text, open-access articles in PubMed Central. Information: http://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/
- PubMed 200k RCT: Collection of PubMed abstracts from randomized controlled trials (RCTs). Annotations for each sentence in the abstract are available. Paper: https://arxiv.org/abs/1710.06071
- Web API of PubMed Articles: NLM provides a web API for accessing biomedical literature in PubMed. It returns the title, abstract, and similar fields, not the full text. Instructions for getting PubMed articles: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PubMed/
- EBM NLP: Collection of PubMed abstracts from randomized controlled trials (RCTs). Annotations of Population, Intervention, and Outcomes (PICO elements) are available. Paper: https://arxiv.org/abs/1806.04185; Website: https://ebm-nlp.herokuapp.com/index
- Evidence Inference: A dataset for inferring the results of randomized controlled trials (RCTs), built from PubMed RCTs in the open access subset. Annotations of (intervention, comparison intervention, outcome, significance finding, evidence span) are available. Paper: https://arxiv.org/abs/1904.01606; Website: http://evidence-inference.ebm-nlp.com/
- PubMedQA: A dataset for biomedical research question answering. The task is to answer naturally occurring questions in PubMed titles with yes/no/maybe. Paper: https://arxiv.org/abs/1909.06146; Website: https://pubmedqa.github.io/
- TREC Precision Medicine / Clinical Decision Support Track: 2014; 2015; 2016; 2017
- The TORGO Database: Acoustic and articulatory speech from speakers with dysarthria: The TORGO database of dysarthric articulation consists of aligned acoustics and measured 3D articulatory features from speakers with either cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS), two of the most prevalent causes of speech disability (Kent and Rosen, 2004), and from matched controls. It is the result of a collaboration between the departments of Computer Science and Speech-Language Pathology at the University of Toronto and the Holland-Bloorview Kids Rehab hospital in Toronto. Paper: link
- NKI-CCRT Corpus: Speech Intelligibility Before and After Advanced Head and Neck Cancer Treated with Concomitant Chemoradiotherapy: The NKI-CCRT corpus, with individual listener judgements on the intelligibility of recordings of 55 speakers treated for cancer of the head and neck, will be made available for restricted scientific use. The corpus contains recordings and perceptual evaluations of speech intelligibility at three evaluation moments: before treatment, 10 weeks after treatment, and 12 months after treatment. Treatment was by means of concomitant chemoradiotherapy (CCRT).
- Atypical Affect Interspeech Sub-Challenge: Björn Schuller, Simone Hantke, and colleagues are providing the EMOTASS Corpus. This corpus is the first to give access to recordings of affective speech from disabled individuals, encompassing a broad variety of mental, neurological, and physical disabilities. It comprises recordings of 15 disabled adult individuals (ages 19 to 58 years, mean age 31.6 years), made in their everyday working environment. The task is to classify five emotions from their speech, given its atypical affect display. Overall, around 11k utterances and around nine hours of speech are included. Paper: http://emotion-research.net/sigs/speech-sig/is2018_compare.pdf
- Autism Sub-Challenge: The Autism Sub-Challenge is based upon the “Child Pathological Speech Database” (CPSD). It provides speech recorded in two university departments of child and adolescent psychiatry located in Paris, France (Universite Pierre et Marie Curie/Pitie Salpetiere Hospital and Universite Rene Descartes/Necker Hospital). The dataset used in the Sub-Challenge contains 2.5k instances of speech recordings from 99 children aged 6 to 18. Paper: http://emotion-research.net/sigs/speech-sig/is2013_compare.pdf
- MedicalNet: a PyTorch implementation of Med3D: Transfer Learning for 3D Medical Image Analysis.
- MONAI: a PyTorch-based, open-source framework for deep learning in healthcare imaging, part of the PyTorch Ecosystem.
- NiftyNet: a TensorFlow-based, open-source convolutional neural network (CNN) platform for research in medical image analysis and image-guided therapy.