This repository gathers the course material of the "intermediate python" one-day courses of SIB Training:
- Data analysis and representation in python (DARPY).
- Optimizing Python Code for Better Performance (OPTPY).
- Interactive Visualization with Python (IVIPY).
These courses are addressed to life scientists, bioinformaticians and researchers who are familiar with writing Python code and core Python elements, and would like to explore it further in their daily data wrangling and exploration tasks.
Please note that the courses require participants to already be familiar with basic Python syntax, environment, and the most common commands.
Topics covered in these courses include:
Data analysis and representation in python:
- Parsing, transforming, and exploring data using Pandas.
- Performing statistical simulation and testing with Numpy/Scipy.
- Representing data in an efficient and impactful manner using Seaborn.
Optimizing Python Code for Better Performance:
- Assessing computational resource usage of your code.
- Speeding-up your Python code with Numba and more.
Interactive Visualization with Python:
- Creating simple interactive plots and tuning them to make them useful for scientific data exploration with Plotly.
- Enriching visualizations with interactive elements while keeping them easy to share as simple HTML files, with Plotly or WebAssembly.
- Developing web server-based data visualization applications with Plotly Dash.
Participants are expected to be familiar with basic Python syntax, concepts, and the most common commands, such as:
- Basic data types such as `list`, `tuple`, or `dict`, and their basic methods.
- Flow control such as loops (`for`, `while`) and `if ... else` structures.
- Using and writing functions.
- If you need a refresher, please go through the notebook 00_python_warmup.ipynb in this repository.
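As a quick self-check, the short sketch below combines the elements listed above. The function name `count_long_words` and the sample data are made up for illustration and are not taken from the warm-up notebook. If you can follow it without difficulty, you likely have the expected background.

```python
def count_long_words(words, min_length=5):
    """Count words of at least `min_length` characters.

    Returns a tuple: (dict mapping each long word to its count,
    number of words that were too short).
    """
    counts = {}
    skipped = 0
    for word in words:                  # for loop over a list
        if len(word) >= min_length:     # if ... else structure
            counts[word] = counts.get(word, 0) + 1
        else:
            skipped += 1
    return counts, skipped


words = ["gene", "protein", "genome", "protein", "rna"]   # list
limits = (4, 7)                                           # tuple

min_length = limits[0]
while min_length <= limits[1]:          # while loop
    counts, skipped = count_long_words(words, min_length)
    print(min_length, counts, skipped)
    min_length += 3
```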
These courses also rely on Jupyter Notebooks, a web-based notebook system for creating and sharing computational documents in an interactive manner.
The courses do not provide an introduction to Jupyter Notebooks, so if you are not familiar with them we recommend going through a short tutorial such as this one or that more in-depth one.
Please make sure you have installed all the required software by setting up your computer before the start of the course.
In addition, you should ensure you have the following libraries installed (installation can be done via `conda` or `pip`, for example):
Course 1 - Data analysis and representation in python:
Course 2 - Optimizing Python Code for Better Performance:
Course 3 - Interactive Visualization with Python:
- pandas
- plotly
- plotly-dash
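One possible way to verify that the libraries are available is to ask Python whether it can locate them, as in the minimal sketch below. The helper `missing_modules` is a hypothetical name, and `dash` is assumed here to be the import name corresponding to the "plotly-dash" entry; adapt the list to the course you are attending.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names for the Course 3 list above; `dash` is an assumption
# about the import name behind the "plotly-dash" entry.
course3_modules = ["pandas", "plotly", "dash"]

missing = missing_modules(course3_modules)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All listed libraries were found.")
```

Run it with the same Python interpreter (or conda environment) that you will use for the notebooks, since each environment has its own set of installed packages.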
The courses revolve around a series of Jupyter Notebooks which develop different aspects of data analysis with Python.
Each Jupyter notebook interleaves theory and code examples. We heartily recommend you execute and play around with these bits of code as you follow along.
- Prerequisite:
  - 00_python_warmup.ipynb: something to help you get in programming mood, and check your knowledge of basic python syntax.
- Course 1 - Data analysis and representation in python:
  - 01_data_manipulation.ipynb: an introduction and exploration of `pandas`.
  - 02_data_description_and_representation.ipynb: usage of `pandas`, `matplotlib`, and `seaborn` for tabular data exploration and plotting.
  - 03_statistics_with_python.ipynb: an overview of statistical testing with `scipy.stats` and linear models with `statsmodels`.
- Course 2 - Optimizing Python Code for Better Performance:
  - 01_resource_usage_measure_and_profiling.ipynb: code resource usage monitoring and profiling.
  - 02_faster_python.ipynb: making python code run faster, in particular with `numba` and `cython`.
  - 03_multiprocessing_multithreading_python.ipynb: multiprocessing and multithreading parallelization in python.
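
To give a flavour of the kind of measurement that resource usage monitoring involves, here is a minimal sketch using only the standard library (`time.perf_counter` and `tracemalloc`). It is not taken from the notebooks; the helper `measure` and the two toy functions are hypothetical names chosen for illustration.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func(*args) once; return (result, elapsed seconds, peak traced memory in bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def sum_squares_list(n):
    # Builds the full list in memory before summing.
    return sum([i * i for i in range(n)])


def sum_squares_generator(n):
    # Streams values one at a time, so peak memory stays small.
    return sum(i * i for i in range(n))


for func in (sum_squares_list, sum_squares_generator):
    result, elapsed, peak = measure(func, 100_000)
    print(f"{func.__name__}: result={result}, time={elapsed:.4f}s, peak={peak} bytes")
```

Note that tracing memory allocations slows execution down, so the timings reported by this sketch are inflated; when only run time matters, the standard `timeit` module is a better fit.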
The data used in the practicals can be found in the `data/` subdirectories of `course1/` or `course2/`.
Solutions to the exercises:
- For regular exercises, solutions can be loaded directly from the exercise notebooks. The actual files are located in the `solutions/` subdirectories of `course1/` or `course2/`.
- For micro-exercises, solutions can be found in the `solutions/micro_exercises` subdirectories of `course1/` or `course2/`.
We recorded the last edition of these courses in November 2023 and organized them into two playlists:
- Data analysis and representation in Python playlist
- Optimizing Python Code for Better Performance playlist
Please cite as: Wandrille Duchemin, Robin Engler. (2023, November 14). Material for the intermediate python SIB-training course. Zenodo. https://doi.org/10.5281/zenodo.10124583