Welcome to 3050571 Practical Clinical Data Science

This is the repository for the learning materials from the 3050571: Practical Clinical Data Science course taught by our group at the Faculty of Medicine, Chulalongkorn University in Bangkok, Thailand in Spring 2024.

Lectures were taught in Thai but the slides and assignments are in English.

Announcements

Location: Classes will be held in the conference room at Bhumisiri Building, 8th Floor, Zone C (look for Stem Cell Center)
Time: See the schedule for more details
- Lecture and recitation on Tuesday and Thursday 1-2pm (except the first class on Jan 30 which will be 1-3pm)
- Python workshop on Friday 1-3pm, plus Wednesday March 6 (bring your computer); the Friday March 8 session will be a recitation instead
Contact:
- Post your questions / comments in the Discussion
- Emails and phone numbers for myself and TAs will be given in the first class

What can you find here?

Lecture and recitation slides
Assigned online video and readings
Python notebooks and data files for the practical sessions

Course structure

This is a 6-week course that mixes independent study of online content with in-class recitations and Python workshops. Assigned videos total 1 to 3 hours per session. Assignments include Kaggle Python modules and analysis of public clinical datasets. Half of the course involves an active internship with the Data Team at King Chulalongkorn Memorial Hospital, where students are expected to participate in drafting solutions for real-world use cases.

Key topics

Computational thinking
Python programming for data science
Machine learning
AI in healthcare and hospital management

Recommended prerequisites

MIT 6.100L Introduction to CS and Programming using Python
MIT 6.0001 Introduction to Python programming
Some familiarity with the syntax of a programming language

Recommended extra resources

MIT 6.0002 Computational Thinking dedicates more time to problem-solving and computational thinking practice with Python.
MIT 6.036 Introduction to Machine Learning is an advanced undergraduate course.
MIT 6.S191 Deep Learning & AI provides deeper knowledge of deep learning and artificial intelligence beyond this course.
University of Tubingen Intro to ML provides a more rigorous understanding of machine learning.
MIT 6.S897 ML for Healthcare covers more real-world applications of AI in healthcare.
StatQuest YouTube channel presents bite-size visual explanations of computational techniques.

Timeline for internship with hospital data team

Pick a problem they want to tackle by Week 2
Submit a draft proposal on how to solve the problem by Week 3 and a refined proposal by Week 4
Work with the data team to test the proposed solution
Present the findings and progress at the end of the course in Week 6

Week 1 - Computational Thinking

Key learning points

What is computational thinking and how do you apply it to solve problems?
How do you systematically approach a problem?

Assigned study

Computational Thinking video and reading
A perspective on programming vs coding (first 3 min)
Three (3) things to do when starting out in Data Science
Optimization problem from MIT 6.0002 Lecture 1 and Lecture 2
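
The knapsack problem is a classic optimization problem of the kind these lectures discuss. As a rough illustration only, the sketch below solves a tiny 0/1 knapsack instance with dynamic programming. The function name `knapsack` and the menu items, values, and weights are invented for this example and are not taken from the course materials.

```python
def knapsack(items, capacity):
    """Return (best_value, chosen_names) for a 0/1 knapsack.

    items: list of (name, value, weight) tuples with integer weights.
    """
    # best[w] holds (value, names) for the best choice with total weight <= w.
    best = [(0, [])] * (capacity + 1)
    for name, value, weight in items:
        # Walk capacities downward so each item is used at most once.
        for w in range(capacity, weight - 1, -1):
            candidate = best[w - weight][0] + value
            if candidate > best[w][0]:
                best[w] = (candidate, best[w - weight][1] + [name])
    return best[capacity]


# Hypothetical menu: (name, value, weight); all numbers are made up.
menu = [("apple", 50, 9), ("burger", 100, 35), ("pizza", 95, 26), ("salad", 40, 8)]
best_value, chosen = knapsack(menu, capacity=45)
print(best_value, chosen)
```

A greedy strategy that always takes the most valuable item first would pick the burger and miss the better combination, which is why an exhaustive or dynamic programming approach can be worth the extra computation.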

Assigned practice

Python code editors
Kaggle Intro to programming and Python lessons

Week 2 - Data Exploration and Storytelling

Key learning points

What can (and can't) the data tell us?
What are the right statistical & analytical techniques for your hypothesis?
How to pick the right graphs to tell your story?

Assigned study

Statistics and probability

MIT 6.0002 Stochastic Thinking and Sampling
StatQuest Hypothesis Testing, P-value, and Maximum Likelihood clips
Extra: MIT 18.05 summary notes on Probability and Statistics. You are not expected to understand everything in this class, but you should understand everything for your future career.

Data handling and visualization

Assigned homework

Kaggle Data Handling (for tabular data) and Visualization lessons
Kaggle Titanic Dataset
- Explore the data. Develop and test some hypotheses regarding the passengers of the Titanic.
- Visualize the patterns to tell something interesting
- For example, what were the demographics of the passengers? Who were the survivors? Which factors do you think are predictive of survival? Does the data agree or disagree?
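
One way to start testing a hypothesis such as "survival differed by sex or ticket class" is to compare survival rates between groups. The sketch below uses only the standard library and a handful of invented rows that borrow the `Sex`, `Pclass`, and `Survived` column names from the Kaggle Titanic dataset; the helper `survival_rate_by` is a hypothetical name, and the values are not real passenger records.

```python
from collections import defaultdict

# Invented rows for illustration; load the real Kaggle CSV for the homework.
passengers = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 2, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]


def survival_rate_by(rows, key):
    """Return {group: fraction survived} for the given column name."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for row in rows:
        counts[row[key]][0] += row["Survived"]
        counts[row[key]][1] += 1
    return {group: survived / total for group, (survived, total) in counts.items()}


rates_by_sex = survival_rate_by(passengers, "Sex")
rates_by_class = survival_rate_by(passengers, "Pclass")
print(rates_by_sex)
print(rates_by_class)
```

With pandas, the same summary can be written as `df.groupby("Sex")["Survived"].mean()`. A difference in rates on its own is only a starting point; whether it is larger than chance would explain still needs a statistical test.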

Week 3 - Unsupervised Learning

Key learning points

How can we learn from unlabeled data (such as clinical data without diagnosis results)?
How do dimensionality reduction and clustering techniques work? What are their pros and cons? How should the results be interpreted?

Assigned study

Dimensionality reduction

The first 24 minutes of Andrew Ng's Dimensionality Reduction lecture
StatQuest Principal Component Analysis (PCA)'s Concept and Practical Points
Extra: PCA explanations from Steve Brunton and University of Tubingen
The first 33 minutes of University of Tubingen's Manifold and t-SNE lecture
StatQuest t-distributed Stochastic Neighbor Embedding

Clustering

MIT 6.0002 introduction to Clustering
StatQuest k-Means, Hierarchical Clustering, and DBSCAN explanations
A more technical explanation of k-means from the first 23 minutes of the University of Tubingen lecture
Extra: Introduction to network-based clustering from Stanford University [Lecture 24, 28, and 29]

Assigned homework

Kaggle Data Cleaning lesson
Python for Health Data Science's exercises on Emergency Department, Stroke, and Readmission datasets
Operating Room Utilization by a Kaggle User
- Summarize and visualize operating room activity at this hospital.
- Extract knowledge. For example, which types of operations occupy the room the longest? Did the booked time match actual usage?
Extra: Colorectal Cancer Molecular Subtyping
- Get the clinical and gene expression data (n = 604, Synapse registration required)
- Use unsupervised learning techniques on the gene expression data to identify patient subgroups
- Test your subgroups with clinical data
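
As a minimal sketch of how subgroup discovery by clustering works, the code below implements k-means (Lloyd's algorithm) in plain Python on six invented two-dimensional points. The points stand in for two hypothetical expression features per patient and are not drawn from the colorectal cancer dataset; the starting centroids are also chosen by hand for reproducibility. For real work, a library implementation such as scikit-learn's `KMeans` is the more practical choice.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Invented toy data: two hypothetical expression features for six patients.
points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The resulting labels can then be cross-tabulated against a clinical variable to check whether the subgroups differ in a meaningful way.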

Week 4 - Supervised Learning

Key learning points

How does the computer learn to make predictions?
What are the key parameters describing each model? How can we optimize them using our data?
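
To make the idea of "optimizing parameters using data" concrete, here is a minimal sketch of logistic regression with one feature, fitted by gradient descent on the log loss. The model has two parameters, a weight `w` and an intercept `b`. The data are invented (a hypothetical standardized measurement `xs` and a binary outcome `ys`), and the function names, learning rate, and step count are arbitrary choices for illustration. In practice, scikit-learn's `LogisticRegression` would be the usual tool.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, n_steps=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Prediction error for each sample drives the gradient.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Invented data: a standardized feature and a binary outcome (not separable).
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(w, b)
print(sigmoid(w * 2.0 + b), sigmoid(w * -2.0 + b))
```

Each step nudges `w` and `b` in the direction that lowers the loss on the training data, which is the same principle that more complex models apply to many more parameters.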

Assigned study

Principles of machine learning

MIT 6.0002 Machine Learning and Classification
StatQuest Ridge Regression, LASSO Regression, Logistic Regression Part 1, Logistic Regression Part 2, and Support Vector Machine
StatQuest Bias vs Variance Tradeoff principle

Tree models

StatQuest Decision Tree, Random Forest, and Adaptive Boosting
StatQuest XGBoost library in Python

Assigned homework

Kaggle Intro to ML and Intermediate ML lessons
Predicting hospital admission using the Emergency Department dataset from Hong, W.S. et al. PLOS ONE 2018

Week 5 - Introduction to Deep Learning and AI

Key learning points

What are deep learning and artificial neural networks?
How did modern AI emerge? Why is AI so powerful today?

Assigned study

Artificial intelligence

An introduction by COMPSCI188 at UC Berkeley
Stanford webinar on How can AI improve healthcare?
TEDx talk by A Doctor Who Code

Deep learning

MIT 6.S191 Introduction to Deep Learning and Convolutional Neural Network
Extra: MIT 6.S191 Recurrent Neural Network, Transformer, and Attention and Reinforcement Learning

Assigned homework

Kaggle Deep Learning and Computer Vision lessons
Kaggle Digit MNIST Dataset
- Use an unsupervised learning technique to visualize the data
- Develop linear, tree, and artificial neural network models
Extra: Chula's COVID-19 Home Isolation Dataset
- Predict whether a particular patient will develop pneumonia and be admitted to a hospital within 3 days

Week 6 - AI Explainability and Pitfalls

Key learning points

How can we understand decisions made by the model?
What should we be concerned about when developing a medical AI?

Assigned study

Explainability

Stanford webinar on Motivation, Explainability Techniques (first 30 min), and How to Evaluate (first 40 min)
MIT 6.S191 Robustness and Trustworthiness

AI project design

Stanford symposium on Responsible Implementation of AI in Healthcare
TensorFlow tech talk on Building AI Models for Healthcare and Best Practices for ML Product Decisions

Assigned homework

Kaggle Explainability lesson
A CVPR 2021 CXR tutorial on Explainability by Dr. Alistair Johnson with both video and Google Colab notebook
