
A curated list of free courses from reputable universities that meet the requirements of an undergraduate curriculum in Data Science, excluding general education. With projects, supporting materials in an organized structure.


marcoshsq/The_Self-taught_Data_Scientist




Advanced Data Science and Analytics Self Taught Program



Content Summary


Overview/About

The Self-taught Data Science Curriculum is a learning guide I developed to master data science concepts and skills for free. Upon realizing the vast amount of high-quality, free resources available online, I decided to compile and organize them into a coherent roadmap. This project is not only my personal journey into data science but also a guide for anyone who wishes to follow a similar path.

Initially, this curriculum was designed for my own learning, but you are welcome to clone it and explore the courses if they align with your goals. The material here covers a broad range of topics essential for a successful data science career, from programming to artificial intelligence. The sources I used can be found in the "References" section at the end of the README.

Learning Goals

The main objective is to follow a structured learning path inspired by the roadmap from the AI Expert team. The key skills and concepts I aim to master by the end of this curriculum include:

1. Proficiency in Programming

  • Python: The primary language for data manipulation, machine learning, and AI model development. Python will be heavily explored due to its versatility and wide adoption in data science.
  • R: A powerful language for statistical analysis, data visualization, and in-depth exploration of statistical data.

2. Databases, Big Data, and Data Warehousing

  • Databases: Focus on both relational (SQL) and non-relational (NoSQL) database systems for effective data management and retrieval.
  • Data Warehousing: Understanding the design and implementation of data warehouses for efficient storage and management of large datasets.

3. Artificial Intelligence and Machine Learning

  • Machine Learning: Learn how to build and apply machine learning models for tasks such as predictive analytics, classification, and pattern recognition.
  • Deep Learning: Dive into neural networks, with an emphasis on frameworks like TensorFlow and PyTorch, to explore architectures and advanced AI techniques.

How to Use This Curriculum

This curriculum is broken down into various modules that align with the core areas of data science. You can follow them sequentially or skip to specific areas based on your current knowledge and interests. I encourage you to adapt this guide to your own learning style, pace, and goals.

References

The "References" section at the end of this repository contains a comprehensive list of resources that I consulted while building this guide, including free online courses, tutorials, and learning platforms.




Section 01 - Fundamentals (~40h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
|---|---|---|---|---|
| Data – What It Is, What We Can Do With It | Johns Hopkins University | ~11h | Certificate of Completion | |
| What is Data Science? | IBM Skills Network | ~11h | Certificate of Completion | |
| The Data Scientist's Toolbox | Johns Hopkins University | ~18h | Certificate of Completion | |

Section 02 - Mathematics and Statistics for Data Science (~90h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
|---|---|---|---|---|
| Linear Algebra for Machine Learning and Data Science | DeepLearning.AI | ~34h | -- | -- |
| Calculus for Machine Learning and Data Science | DeepLearning.AI | ~25h | -- | -- |
| Probability and Statistics for Machine Learning and Data Science | DeepLearning.AI | ~33h | -- | -- |

Section 03 - Programming for Data Science

Section 03-A - Python Language for Data Analysis (~140h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
|---|---|---|---|---|
| Introduction to Data Science in Python | University of Michigan | ~34h | -- | -- |
| Applied Plotting, Charting & Data Representation in Python | University of Michigan | ~24h | -- | -- |
| Applied Machine Learning in Python | University of Michigan | ~31h | -- | -- |
| Applied Text Mining in Python | University of Michigan | ~25h | -- | -- |
| Applied Social Network Analysis in Python | University of Michigan | ~26h | -- | -- |

Section 03-B - R Language for Statistical Analysis and Modeling (~75h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
|---|---|---|---|---|
| R Programming | Johns Hopkins University | ~27h | -- | -- |
| Advanced R Programming | Johns Hopkins University | ~18h | -- | -- |
| Building R Packages | Johns Hopkins University | ~20h | -- | -- |
| Building Data Visualization Tools | Johns Hopkins University | ~12h | -- | -- |
| Mastering Software Development in R | Johns Hopkins University | ~3h | -- | -- |

Section 04 - Data Mining (~120h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
|---|---|---|---|---|
| Data Visualization | University of Illinois | ~15h | -- | -- |
| Text Retrieval and Search Engines | University of Illinois | ~30h | -- | -- |
| Text Mining and Analysis | University of Illinois | ~33h | -- | -- |
| Pattern Discovery in Data Mining | University of Illinois | ~17h | -- | -- |
| Cluster Analysis in Data Mining | University of Illinois | ~16h | -- | -- |

Section 05 - Databases and SQL (~80h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
|---|---|---|---|---|
| Relational Database Design | University of Colorado | ~34h | -- | -- |
| The Structured Query Language (SQL) | University of Colorado | ~26h | -- | -- |
| Advanced Topics and Future Trends in Database Technologies | University of Colorado | ~16h | -- | -- |

Section 06 - Big Data (~85h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
|---|---|---|---|---|
| Introduction to Big Data | University of California | ~17h | -- | -- |
| Big Data Modeling and Management Systems | University of California | ~13h | -- | -- |
| Big Data Integration and Processing | University of California | ~17h | -- | -- |
| Machine Learning with Big Data | University of California | ~23h | -- | -- |
| Graph Analytics for Big Data | University of California | ~13h | -- | -- |

Section 07 - Machine Learning (~120h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
|---|---|---|---|---|
| Supervised Machine Learning: Regression and Classification | DeepLearning.AI | ~33h | -- | -- |
| Advanced Machine Learning Algorithms | DeepLearning.AI | ~34h | -- | -- |
| Unsupervised Learning, Recommenders, Reinforcement Learning | DeepLearning.AI | ~37h | -- | -- |

Section 08 - Deep Learning (~125h)

| Course | Offered by | Effort | Certificate, if applicable | Status |
|---|---|---|---|---|
| Neural Networks and Deep Learning | DeepLearning.AI | ~24h | -- | -- |
| Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization | DeepLearning.AI | ~23h | -- | -- |
| Structuring Machine Learning Projects | DeepLearning.AI | ~6h | -- | -- |
| Convolutional Neural Networks | DeepLearning.AI | ~35h | -- | -- |
| Sequence Models | DeepLearning.AI | ~37h | -- | -- |

Extra Bibliography

Mathematics Books

Books, Articles, and Related Documentation

These resources cover a wide range of topics from foundational mathematics and statistical theory to advanced machine learning and artificial intelligence.

Notes and Clarifications

  • The course durations listed here are estimates provided by the platforms where the courses are offered.

  • I am still working through this program, so the tense of this README is inconsistent: sometimes past, sometimes future. As I progress, I will revise it to better reflect my experience.

  • Regarding the books, my university has partnerships with some platforms like O'Reilly, in addition to a very large library where I managed to find almost all of them. But if you don't have access... ahem... try to see if they fall off the truck... ahem... but if you can buy them, please do.

References

Sources consulted for the construction of this curriculum.

  • OSSU Data Science - OSSU offers a free, open-source curriculum in data science, perfect for those looking to study technology in a self-paced and flexible manner. I highly recommend OSSU and any initiative that aims to democratize education.

  • AI Expert Roadmap - A detailed roadmap to becoming an AI expert, developed by specialists in the field.

  • Python Developer - Roadmap SH provides comprehensive learning paths across various technology areas and tools. This link directs to the Python roadmap, but they offer many other paths.

  • PostgreSQL - PostgreSQL Database Administrator roadmap, also from Roadmap SH, outlining a specific learning path for professionals in the field.

  • USP Statistics Course - Curriculum for the Bachelor's Degree in Statistics at the University of São Paulo, used to guide the selection of courses and books in this list.

