MLZoomcamp is an educational initiative by DataTalksClub that teaches machine learning and data science. The program covers supervised and unsupervised learning, deep learning, natural language processing, and related topics. It combines lectures, coding exercises, and projects built around real-world applications, so that concepts are learned alongside their practical implementation.
Project Link | Area | Project Description | Libraries |
---|---|---|---|
Exploratory Data Analysis and Linear Regression | Data Analysis & Linear Regression | Demonstrated essential data analysis techniques and linear regression using Python in a Jupyter notebook. Emphasized Pandas and NumPy for data manipulation, exploratory analysis, and linear regression implementation. Calculated linear regression weights through matrix inversion. | Pandas, NumPy |
Machine Learning for Regression | Machine Learning & Regression Analysis | Built a regression model predicting housing prices using the California Housing Prices dataset. Utilized Pandas, NumPy, Matplotlib, and Seaborn for data handling, visualization, and regression analysis. Fit the regression weights by matrix inversion. | Pandas, NumPy, Matplotlib, Seaborn |
Classification with scikit-learn | Machine Learning & Classification | Focused on transforming a car price dataset into a classification problem, predicting whether a car's price is above its mean value ('above_average'). Utilized scikit-learn for classification modeling, and NumPy, Pandas, Matplotlib, and Seaborn for data handling, visualization, and preprocessing. | scikit-learn, NumPy, Pandas, Matplotlib, Seaborn |
Model Evaluation | Machine Learning & Evaluation | Focused on model evaluation techniques: ROC-AUC, precision, recall, and F1 score. Transformed a car price dataset into a binary classification problem, predicting whether a car's price is above its mean value ('above_average'). Utilized scikit-learn for data prep, exploratory analysis, logistic regression, cross-validation, and hyperparameter tuning. | scikit-learn, NumPy, Pandas, Matplotlib, Seaborn |
Model Deployment with Flask, Gunicorn, and Docker | Deployment & Machine Learning | Focused on deploying a pre-trained ML model using Flask and containerizing it with Docker. Created a web service incorporating a scikit-learn model for predicting credit probabilities for clients. Utilized Flask, Gunicorn, and Docker for deployment. | Pipenv, scikit-learn, Pickle, Flask, Gunicorn, Docker |
Decision Trees and Ensemble Learning | Machine Learning & Regression Analysis | Analyzed the California Housing Prices dataset from Kaggle, focusing on predicting 'median_house_value'. Explored Decision Trees, Random Forests, and XGBoost for ensemble-based regression analysis. Implemented hyperparameter tuning for improved model performance. | Pandas, NumPy, Matplotlib, Seaborn, scikit-learn, XGBoost |
Image Classification with TensorFlow and Keras | Deep Learning & Computer Vision | Built an image classification model to differentiate between bees and wasps using TensorFlow and Keras. Leveraged the "Bee or Wasp?" dataset from Kaggle and implemented Convolutional Neural Networks (CNNs) for the classification task. Explored data augmentation through image transformations. | NumPy, Pandas, Matplotlib, Seaborn, TensorFlow, Keras |
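
The first two projects in the table compute linear regression weights through matrix inversion. As a rough illustration of that idea, here is a minimal sketch using the normal equation with NumPy. The function name `fit_linear_regression` and the toy data are hypothetical and are not taken from the project notebooks, which may structure the computation differently.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Return (bias, weights) from the normal equation w = (X^T X)^-1 X^T y.

    Hypothetical helper for illustration; not the notebooks' actual code.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first weight acts as the bias term.
    X_b = np.column_stack([np.ones(X.shape[0]), X])
    # Gram matrix X^T X; it must be invertible for this approach to work.
    XTX = X_b.T @ X_b
    w = np.linalg.inv(XTX) @ X_b.T @ y
    return w[0], w[1:]


# Toy data (made up) generated from y = 1 + 2*x1 + 3*x2 with no noise.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

bias, weights = fit_linear_regression(X, y)
print(bias, weights)
```

On real data with correlated features, `X^T X` can be singular or close to it. Adding a small value to its diagonal (regularization) or solving with `np.linalg.lstsq` are common alternatives to explicit inversion.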