This is the repository for the learning materials from the 3050571: Practical Clinical Data Science course taught by our group at the Faculty of Medicine, Chulalongkorn University in Bangkok, Thailand in Spring 2024.
Lectures were taught in Thai but the slides and assignments are in English.
- Location: Classes will be held in the conference room at Bhumisiri Building, 8th Floor, Zone C (look for Stem Cell Center)
- Time: See the schedule for more details
- Lecture and recitation on Tuesday and Thursday 1-2pm (except the first class on Jan 30 which will be 1-3pm)
- Python workshop on Friday 1-3pm (bring your computer; the March 8 session will be a recitation instead), plus an extra session on Wednesday, March 6
- Contact:
- Post your questions / comments in the Discussion
- Emails and phone numbers for myself and TAs will be given in the first class
- Lecture and recitation slides
- Assigned online video and readings
- Python notebooks and data files for the practical sessions
This is a 6-week course that mixes independent study of online content with in-class recitations and Python workshops. The assigned videos total 1-3 hours for each session. Assignments include Kaggle Python modules and analysis of public clinical datasets. Half of the course involves an active internship with the Data Team at King Chulalongkorn Memorial Hospital, where students are expected to participate in drafting solutions for real-world use cases.
- Computational thinking
- Python programming for data science
- Machine learning
- AI in healthcare and hospital management
- MIT 6.100L Introduction to CS and Programming using Python
- MIT 6.0001 Introduction to Python programming
- Some familiarity with the syntax of a programming language
- MIT 6.0002 Computational Thinking dedicates more time to problem-solving and computational thinking practice with Python.
- MIT 6.036 Introduction to Machine Learning is an advanced undergraduate course.
- MIT 6.S191 Deep Learning & AI provides deeper knowledge of deep learning and artificial intelligence beyond this course.
- University of Tubingen Intro to ML provides a more rigorous understanding of machine learning.
- MIT 6.S897 ML for Healthcare covers more real-world applications of AI in healthcare.
- StatQuest YouTube channel presents bite-size visual explanations of computational techniques.
- Pick a problem they want to tackle by Week 2
- Submit a draft proposal on how to solve the problem by Week 3 and a refined proposal by Week 4
- Work with the data team to test the proposed solution
- Present the findings and progress at the end of the course in Week 6
- What is computational thinking and how do you apply it to solve problems?
- How to systematically approach a problem?
- Computational Thinking video and reading
- A perspective on programming vs. coding (first 3 min)
- Three (3) things to do when starting out in Data Science
- Optimization problem from MIT 6.0002 Lecture 1 and Lecture 2
- Python code editors
- Kaggle Intro to programming and Python lessons
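As a small illustration of treating a task as an optimization problem, here is a minimal sketch of a knapsack-style problem solved two ways: a greedy heuristic and an exhaustive search. The item names, values, weights, and the capacity are invented for illustration and are not taken from the course materials.

```python
from itertools import combinations
from typing import NamedTuple


class Item(NamedTuple):
    name: str
    value: int
    weight: int


def greedy_knapsack(items, capacity):
    """Take items in order of value-to-weight ratio while they still fit."""
    chosen = []
    remaining = capacity
    for item in sorted(items, key=lambda i: i.value / i.weight, reverse=True):
        if item.weight <= remaining:
            chosen.append(item)
            remaining -= item.weight
    return chosen


def brute_force_knapsack(items, capacity):
    """Try every subset and keep the most valuable one that fits."""
    best = ()
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            fits = sum(i.weight for i in combo) <= capacity
            if fits and sum(i.value for i in combo) > sum(i.value for i in best):
                best = combo
    return list(best)


# Hypothetical toy items (value = how much we want it, weight = its cost).
items = [
    Item("apple", 50, 95),
    Item("pizza", 30, 258),
    Item("burger", 100, 354),
    Item("salad", 40, 80),
]

picked = greedy_knapsack(items, capacity=500)
total_value = sum(i.value for i in picked)
best = brute_force_knapsack(items, capacity=500)

print([i.name for i in picked], total_value)          # greedy: total value 120
print([i.name for i in best], sum(i.value for i in best))  # optimal: total value 150
```

The greedy answer is fast but not optimal here, while the exhaustive search is optimal but its cost doubles with every extra item. Weighing that trade-off is part of approaching a problem systematically.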
- What can (and can't) the data tell us?
- What are the right statistical & analytical techniques for your hypothesis?
- How to pick the right graphs to tell your story?
- MIT 6.0002 Stochastic Thinking and Sampling
- StatQuest Hypothesis Testing, P-value, and Maximum Likelihood clips
- Extra: MIT 18.05 summary notes on Probability and Statistics. You are not expected to understand all of it for this class, but you should aim to understand it for your future career.
- Exploratory data analysis with Python
- Basic visualization with Python
- Storytelling with graph design
- Kaggle Data Handling (for tabular data) and Visualization lessons
- Kaggle Titanic Dataset
- Explore the data. Develop and test some hypotheses regarding the passengers of the Titanic.
- Visualize the patterns to tell something interesting
- For example, what were the demographics of the passengers? Who were the survivors? Which factors do you think are predictive of survival? Does the data agree or disagree?
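To make the "develop and test a hypothesis" step concrete, here is a minimal sketch that compares survival rates across groups. The rows below are invented toy records, not the real Titanic data, and the field names (`sex`, `pclass`, `survived`) are assumptions. With pandas, the same summary would typically be a `groupby` followed by a mean.

```python
from collections import defaultdict

# Invented toy records for illustration only (NOT the Kaggle Titanic data).
passengers = [
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 2, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 1, "survived": 1},
    {"sex": "male", "pclass": 2, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 0},
]


def survival_rate_by(records, key):
    """Return {group: fraction survived} for the given grouping field."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for row in records:
        counts[row[key]][0] += row["survived"]
        counts[row[key]][1] += 1
    return {group: s / n for group, (s, n) in counts.items()}


rate_by_sex = survival_rate_by(passengers, "sex")
rate_by_class = survival_rate_by(passengers, "pclass")
print(rate_by_sex)
print(rate_by_class)
```

A gap between groups in a table like this is only a starting point. Checking whether it could arise by chance is where the hypothesis-testing material comes in.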
- How can we learn from unlabeled data (such as clinical data without diagnosis result)?
- How do dimensionality reduction and clustering techniques work? What are their pros and cons? How should the results be interpreted?
- The first 24 minutes of Andrew Ng's Dimensionality Reduction lecture
- StatQuest Principal Component Analysis (PCA)'s Concept and Practical Points
- Extra: PCA explanations from Steve Brunton and University of Tubingen
- The first 33 minutes of University of Tubingen's Manifold and t-SNE lecture
- StatQuest t-distributed Stochastic Neighbor Embedding
- MIT 6.0002 introduction to Clustering
- StatQuest k-Means, Hierarchical Clustering, and DBSCAN explanation
- More technical explanation of k-means from the first 23 minutes of University of Tubingen
- Extra: Introduction to network-based clustering from Stanford University [Lecture 24, 28, and 29]
- Kaggle Data Cleaning lesson
- Python for Health Data Science's exercises on Emergency Department, Stroke, and Readmission datasets
- Operating Room Utilization by a Kaggle User
- Summarize and visualize the activity of the operating rooms at this hospital.
- Extract knowledge. For example, which types of operations occupy the room the longest? Did booked time match well with actual usage?
- Extra: Colorectal Cancer Molecular Subtyping
- Get the clinical and gene expression data (n = 604, Synapse registration required)
- Use unsupervised learning techniques on gene expression data to identify patient subgroups
- Test your subgroups with clinical data
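To show how a clustering technique finds subgroups without labels, here is a minimal sketch of the k-means loop in plain Python. The 2-D points are invented toy data, not gene expression values, and the initialization (taking the first `k` points as centroids) is a simplifying assumption made for reproducibility. In practice a library implementation with a better initialization would normally be used.

```python
import math


def kmeans(points, k, n_iter=20):
    """Plain k-means: alternate between assigning points and updating centroids."""
    centroids = list(points[:k])  # naive init (assumption): first k points
    for _ in range(n_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    labels = [
        min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
    ]
    return centroids, labels


# Invented toy points forming two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The cluster labels themselves carry no meaning. Interpreting them requires comparing the groups against outside information, which is the purpose of testing subgroups against clinical data.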
- How does the computer learn to make predictions?
- What are the key parameters describing each model? How can we optimize them using our data?
- MIT 6.0002 Machine Learning and Classification
- StatQuest Ridge Regression, LASSO Regression, Logistic Regression Part 1, Logistic Regression Part 2, and Support Vector Machine
- StatQuest Bias vs Variance Tradeoff principle
- StatQuest Decision Tree, Random Forest, and Adaptive Boosting
- StatQuest XGBoost library in Python
- Kaggle Intro to ML and Intermediate ML lessons
- Predicting hospital admission using Emergency Department dataset from Hong, W.S. et al. PLOS ONE 2018
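The idea of model parameters being optimized from data can be sketched with a one-feature logistic regression trained by gradient descent. Everything here is an illustrative assumption: the feature is a hypothetical centered measurement, the labels are invented, and the learning rate and epoch count are arbitrary. This is not the model or data from the cited study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(x, w, b):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return int(sigmoid(w * x + b) >= 0.5)


# Invented toy data: a centered measurement and a binary outcome.
xs = [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
predictions = [predict(x, w, b) for x in xs]
print(w, b, predictions)
```

Here `w` and `b` are the parameters that describe the model, and the training loop is what "learning" means in this setting: repeatedly nudging them in the direction that reduces the prediction error on the data.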
- What are deep learning and artificial neural networks?
- How did modern AI emerge? Why is AI so powerful today?
- An introduction by COMPSCI188 at UC Berkeley
- Stanford webinar on How can AI improve healthcare?
- TEDx talk by A Doctor Who Code
- MIT 6.S191 Introduction to Deep Learning and Convolutional Neural Network
- Extra: MIT 6.S191 Recurrent Neural Network, Transformer, and Attention and Reinforcement Learning
- Kaggle Deep Learning and Computer Vision lessons
- Kaggle Digit MNIST Dataset
- Use an unsupervised learning technique to visualize the data
- Develop linear, tree, and artificial neural network models
- Extra: Chula's COVID-19 Home Isolation Dataset
- Predict whether a particular patient will develop pneumonia and be admitted to a hospital within 3 days
- How can we understand decisions made by the model?
- What should we be concerned about when developing a medical AI?
- Stanford webinar on Motivation, Explainability Techniques (first 30 min), and How to Evaluate (first 40 min)
- MIT 6.S191 Robustness and Trustworthiness
- Stanford symposium on Responsible Implementation of AI in Healthcare
- TensorFlow tech talk on Building AI Models for Healthcare and Best Practices for ML Product Decisions
- Kaggle Explainability lesson
- A CVPR 2021 CXR tutorial on Explainability by Dr. Alistair Johnson with both video and Google Colab notebook
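One common way to probe which inputs a model relies on is permutation importance: shuffle one feature and measure how much accuracy drops. The sketch below is a minimal plain-Python version under stated assumptions. The `model` is a hypothetical hand-written rule, and the rows, feature names (`temperature`, `shoe_size`), and labels are invented for illustration.

```python
import random


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, n_repeats=50, seed=0):
    """Average drop in accuracy when one feature's values are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row with the shuffled value swapped in for this feature.
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


def model(row):
    """Hypothetical stand-in for a trained model: flags fever from temperature."""
    return int(row["temperature"] >= 38.0)


# Invented toy data: one informative feature and one irrelevant feature.
rows = [
    {"temperature": 36.5, "shoe_size": 42},
    {"temperature": 36.8, "shoe_size": 38},
    {"temperature": 37.0, "shoe_size": 44},
    {"temperature": 37.2, "shoe_size": 40},
    {"temperature": 38.1, "shoe_size": 39},
    {"temperature": 38.5, "shoe_size": 43},
    {"temperature": 39.0, "shoe_size": 41},
    {"temperature": 39.4, "shoe_size": 37},
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

temp_importance = permutation_importance(model, rows, labels, "temperature")
shoe_importance = permutation_importance(model, rows, labels, "shoe_size")
print(temp_importance, shoe_importance)
```

Shuffling a feature the model ignores leaves accuracy unchanged, so its importance is zero, while shuffling a feature the model depends on degrades it. For a medical model, an unexpectedly important feature is a prompt to check for data leakage or bias.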