Data Science and Machine Learning projects
Description:
Creation of a relevance classifier for articles.
"Article_relevance_classification" directory contains the .ipynb file and data links (Google Drive)
Stack: Python, Jupyter Notebook, NumPy, Pandas, Scikit-learn, Matplotlib, nltk, pymorphy2, wordcloud
Description:
Implementation of the BERTopic framework to build a topic model for news.
"BERTopic_news_clf" directory contains the source files and links
Stack: Python, Jupyter Notebook, BERTopic, Pandas, NumPy, Matplotlib, SentenceTransformers, Scikit-learn
Description:
The model identifies a digit in an input image using a softmax output activation function. The accuracy of the CNN reaches about 99%.
"CNN_MNIST_classifier" directory contains the .ipynb file
MNIST is imported from tensorflow.keras.datasets
Stack: Python, Jupyter Notebook, NumPy, Matplotlib, TensorFlow, Keras
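As a rough illustration of the softmax output step mentioned for CNN_MNIST_classifier, the sketch below turns ten raw class scores into probabilities and picks the most likely digit. It is not taken from the notebook: the function names and the example scores are hypothetical, and it uses only the standard library rather than TensorFlow.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(logits):
    """Return the index (digit 0-9) with the highest softmax probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical scores for the ten digit classes (not real model output)
example_logits = [0.1, 0.3, 2.5, 0.0, -1.2, 0.4, 0.2, 4.1, 0.0, 0.5]
predicted = predict_digit(example_logits)
```

Because softmax preserves the ordering of the scores, the predicted digit is simply the index of the largest score; the probabilities matter mainly for the training loss and for reporting confidence.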
Description:
The CNN is designed as a fire-detection solution based on the analysis of thermograms captured at flight altitude.
The model classifies a thermogram as 0 (no fire) or 1 (fire). The classification accuracy of the network is about 75%.
"CNN_thermogram_classifier" directory contains the .ipynb file and link to the dataset
Stack: Python, Jupyter Notebook, NumPy, Matplotlib, TensorFlow, Keras
Description:
A Streamlit application for AI image processing, built as part of the GSOM SPbU Deep Learning for Business Applications course.
The app can classify images, extract text, organize photos by category, and perform other AI-based image-processing tasks.
"DLBA_final_project" directory contains the .ipynb file, the app sources and the project presentation
Stack: Python, Jupyter Notebook, Streamlit, Tesseract, OpenCV, Ultralytics, Transformers, DeepFace
Description:
Implementation of semantic search using cosine distance, GigaChatEmbeddings,
and the Weaviate vector database.
"Embeddings_and_Similarity" directory contains source files, configs and news dataset with 1000 docs
Stack: Python, Pandas, LangChain, GigaChat SDK (GigaChain), Weaviate
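The cosine-distance ranking behind the semantic search in Embeddings_and_Similarity can be sketched in a few lines. This is a minimal standalone illustration, not the project's code: the toy three-dimensional vectors stand in for real GigaChatEmbeddings output, the names `cosine_distance` and `search` are hypothetical, and a vector database such as Weaviate would normally perform this ranking itself.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query_vec, doc_vecs, top_k=2):
    """Rank document ids by ascending cosine distance to the query vector."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_distance(query_vec, item[1]),
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Toy embeddings (assumed values, far smaller than real embedding vectors)
toy_docs = {
    "sports": [0.9, 0.1, 0.0],
    "finance": [0.1, 0.9, 0.2],
    "weather": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]
results = search(toy_query, toy_docs)
```

Cosine distance ignores vector length and compares direction only, which is why it is a common choice for comparing text embeddings.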
Description:
A model that predicts whether a Titanic passenger will survive based on their passenger class, sex, age, number of siblings, number of children, and ticket price.
The solution uses the logistic regression implementation provided by the Scikit-learn library. The accuracy reaches up to 80%.
"Logistic_regression_Titanic_classifier" directory contains the .ipynb file and the dataset
Stack: Python, Jupyter Notebook, NumPy, Pandas, Scikit-learn
Description:
Implementation of the MapReduce algorithm to compute average title lengths
for specified news sources, based on an open news dataset.
"Map-Reduce" directory contains the source files and data links
Stack: Python, Pandas
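The MapReduce idea used in the Map-Reduce project can be sketched as three small phases: map, shuffle, and reduce. The example below is an assumption-laden illustration rather than the project's implementation: the record layout `(source, title)`, the function names, and the toy data are hypothetical, and title length is measured in characters.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, sources):
    """Emit (source, (title_length, 1)) for each record from a wanted source."""
    for source, title in records:
        if source in sources:
            yield source, (len(title), 1)


def shuffle(pairs):
    """Group the mapped values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum lengths and counts per source, then divide to get the average."""
    averages = {}
    for source, values in groups.items():
        total, count = reduce(
            lambda acc, v: (acc[0] + v[0], acc[1] + v[1]), values
        )
        averages[source] = total / count
    return averages


# Toy records (hypothetical sources and titles)
toy_records = [
    ("source_a", "Short"),
    ("source_a", "Longer title"),
    ("source_b", "Hi"),
    ("source_c", "Ignored"),
]
averages = reduce_phase(shuffle(map_phase(toy_records, {"source_a", "source_b"})))
```

Emitting `(length, 1)` pairs instead of plain lengths keeps the reduce step associative, so partial sums from different workers could be combined before the final division.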