Hit me up on any platform on my Linktree or the ones mentioned below if you'd like to chat!
- 🔭 I am a Data Scientist at Wolters Kluwer, where I work mostly on Document AI and LLMs, developing end-to-end MLOps pipelines for extracting information from PDFs.
- 🏅 I am an Ambassador for Weights & Biases
- 👯 I'm looking to collaborate on Kaggle Competitions
- 💪 I love contributing to open-source libraries.
- ⚡ Fun fact: I love participating in Machine Learning competitions; I'm a Kaggle Competitions Expert and highly active on the platform.
- Participated in the WSDM Cup - Multilingual Chatbot Arena (solo silver medal, 31st out of 950 teams): the challenge was to develop a reward model (as used in the RLHF stage) for multilingual human conversations from the Chatbot Arena (formerly LMSYS). I fine-tuned LLMs as reward models in a classification setting and used techniques such as multi-stage training (pre-training, then fine-tuning), pseudo-labelling, LoRA, QLoRA, efficient inference, and knowledge distillation; a minimal sketch of the LoRA fine-tuning recipe is shown below.
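
For context, here is a minimal sketch of that LoRA recipe: fine-tuning a decoder LLM with a classification head as a pairwise preference (reward) model. The base model name, label count, and hyperparameters are illustrative placeholders, not the exact competition setup.

```python
# Sketch: fine-tune an LLM as a reward model in a classification setting with LoRA.
# Base model and hyperparameters are placeholders, not the competition configuration.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_model = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder: any decoder LLM works here
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Two labels: which of the two chatbot responses the human preferred.
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id

# Attach LoRA adapters so only a small fraction of the weights is trained.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The wrapped model can now be trained with the standard 🤗 Trainer on
# (prompt, response_a, response_b) examples formatted as a single text sequence.
```
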
I was a speaker at the inaugural Weights & Biases MLOps conference (Fully Connected 2023). You can listen to my talk here. [Announcement LinkedIn Post]
Click on a competition name to go to its overview page, and click the solution/code links to see the approach and source code.
Competition | Placement | Organization | Code/Solution |
---|---|---|---|
DataSolve 2022 | 1st place | Wolters Kluwer | code |
GIA Winner 2024 | CEO's choice award winner | Wolters Kluwer | |
Amazon ML Challenge 2021 | 11th place (among 3200+ teams) | Amazon | solution + code |
WSDM Cup - Multilingual Chatbot Arena | Silver medal (31/950) | Kaggle | - |
U.S. Patent Phrase to Phrase Matching | Top 1%, silver medal (31/1889) | Kaggle | solution code |
Bristol-Myers Squibb – Molecular Translation | Silver medal (50/874) | Kaggle | solution + code |
Sartorius - Cell Instance Segmentation | Bronze medal (117/1505) | Kaggle | code |
Happywhale - Whale and Dolphin Identification | Bronze medal (132/1588) | Kaggle | - |
Kaggle - LLM Science Exam | Silver medal (123/2664) | Kaggle | - |
Click on a project name to go directly to its GitHub repository, and click the demo app link to see a live demo of the project.
- QA Bot for the Gradient Dissent podcast hosted by Weights & Biases: a question-answering bot built on top of OpenAI's LLMs and LangChain that provides a summary of an episode, suggests potential questions, and answers any question the user has about the podcast (a minimal sketch of the retrieval pattern follows this list). [demo app]
- Text to Image Synthesis using Attentional GANs - A PyTorch re-implementation of the paper AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. [demo app]
- Text Summarizer - a transformer-based extractive and abstractive text summarizer supporting a wide range of input formats. [demo app]
- Healthify - an ML-based website which predicts a disease from the symptoms entered by the patient and supports diagnosis of several diseases. [demo app]
- GPU vs TPU comparison for computer vision - compared GPUs and TPUs for computer vision workloads using Google Cloud TPUs and the TFRecords data format. [results]
- HARVESTIFY - an ML- and DL-based website which recommends the best crop to grow and the fertilizers to use, and identifies diseases affecting your crops. [demo app]
- JARVIS - a simple voice assistant made using Python [demo video]
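
As a rough illustration of how the QA bot above works, here is a minimal sketch of the retrieve-then-answer pattern, assuming the classic (pre-0.1) LangChain API and a podcast transcript already saved as plain text; file names and parameters are placeholders, not the project's exact code.

```python
# Sketch of the QA-bot pattern: embed the podcast transcript, store it in a
# vector index, and answer questions with retrieval-augmented generation.
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

transcript = open("gradient_dissent_episode.txt").read()  # placeholder transcript file

# Split the transcript into overlapping chunks and index them with embeddings.
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_text(transcript)
index = FAISS.from_texts(chunks, OpenAIEmbeddings())

# Retrieval-augmented QA chain on top of an OpenAI chat model.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=index.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("What are the main topics discussed in this episode?"))
```
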
- Fixed an example script in the Hugging Face 🤗 transformers repository for XLA devices - [PR]
- Made the experiment trackers launch only on the main process in distributed setups in the 🤗 Accelerate library - [PR]
- Fixed several examples and removed the main-process check after fixing the trackers' initialization across processes in the 🤗 Accelerate library - [PR]
- Updated several 🤗 transformers `no_trainer` scripts leveraging 🤗 Accelerate to remove the check for `is_main_process` when initiating trackers via `accelerator.init_trackers()`, as this issue was fixed by me in this PR - [PR]
- Contributed a report to Weights & Biases showcasing the integration of MONAI and W&B - [PR]
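
To illustrate what those Accelerate tracker fixes enable, here is a minimal sketch (project name and logged values are placeholders): after the change, `init_trackers()` handles the main-process logic internally, so user scripts no longer need an `is_main_process` guard around tracker calls.

```python
# Sketch: with the fix, Accelerate launches trackers only on the main process
# internally, so init_trackers() / log() can be called unconditionally.
from accelerate import Accelerator

accelerator = Accelerator(log_with="wandb")
accelerator.init_trackers(project_name="my-project", config={"lr": 3e-4})  # no is_main_process check needed

for step in range(10):
    loss = 0.1 * step  # placeholder training loss
    accelerator.log({"train/loss": loss}, step=step)

accelerator.end_training()
```
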
TODO: add more contributions
- RAG techniques: From naive to advanced - a comprehensive guide to RAG techniques, from the basics to advanced ones like metadata filtering, query transformation, reranking, HyDE, context selection, and much more.
- LLMs are machine learning classifiers - learn how to use LLMs like GPT for text classification. Explore prompting, fine-tuning, and when to choose LLMs over traditional machine learning classifiers.
- Building a Q&A Bot for Weights & Biases' Gradient Dissent Podcast using LangChain and OpenAI - in this article, we explore how to utilize OpenAI's ChatGPT and LangChain to build a Question-Answering bot for Weights & Biases' podcast series, Gradient Dissent.
- DeepMind Flamingo: A Visual Language Model for Few-Shot Learning - a Weights & Biases report explaining the paper Flamingo: A Visual Language Model for Few-Shot Learning by DeepMind. This report is also featured on the Two Minute Papers YouTube channel here.
- BLIP-2: A new Visual Language Model by Salesforce - this report explores BLIP-2, a new Vision Language Model by Salesforce which beats the previous state-of-the-art Flamingo on various multimodal tasks.
- SetFit: Efficient Few-Shot Learning Without Prompts - this report dives into HuggingFace, Intel Labs, and UKP Lab's recent paper on Efficient Few-Shot Learning Without Prompts. SetFit exceeds prior methods in accuracy and efficiency with just 8 labeled samples.
- Is PyTorch 2.0 Faster Than PyTorch 1.13? - this report compares PyTorch 1.13 and the newly announced PyTorch 2.0, whose highlight is the torch.compile mode, which provides significant speed gains (a minimal torch.compile example appears at the end of this section).
- 3D Segmentation with MONAI and PyTorch Supercharged by Weights & Biases - a W&B report showcasing the integration of MONAI and W&B for medical imaging. It shows how to efficiently track experiments, perform error analysis, version artifacts, and much more using the rich suite of W&B features.
- HuggingFace is all you need for NLP and beyond - a tutorial/blog on the Hugging Face ecosystem and how to use and customize 🤗 `datasets` and the 🤗 `Trainer` for all of your NLP problems.
- Hugging Face Accelerate Super Charged With Weights & Biases - a Weights & Biases report showing how you can supercharge your raw PyTorch code to train on distributed systems with Hugging Face Accelerate and seamlessly integrate Weights & Biases into your workflow.
- Managing and Tracking ML Experiments - an extensive blog where I share my experience and practical ways to track and manage machine learning experiments effectively for research projects and Kaggle competitions (with Weights & Biases and Hydra); see the sketch at the end of this section.
- AI and regulatory compliance in finance - explains what regulatory compliance means in simple terms, why it's so important, how it affects you and the financial world around us, and how the boom in AI has introduced new risks and opportunities.
- What is model risk management in finance? - explores Model Risk Management in finance, explaining how financial institutions identify, assess, and mitigate risks associated with increasingly complex ML models to prevent financial losses, regulatory penalties, and reputational damage.
- The importance of reproducibility in finance - talks about some of the challenges of reproducibility in finance, best practices for achieving it, the tools you should prioritize, and a whole lot more.
- Kaggle competitions and Open-Source :)
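
As a quick illustration of the torch.compile mode discussed in the PyTorch 2.0 report above, here is a minimal, self-contained sketch (the model and input shapes are arbitrary):

```python
# Minimal torch.compile example: wrap an existing model and let PyTorch 2.0
# JIT-compile it; the rest of the code stays unchanged.
import torch
import torchvision

model = torchvision.models.resnet18()
compiled_model = torch.compile(model)  # requires PyTorch >= 2.0

x = torch.randn(8, 3, 224, 224)
with torch.no_grad():
    out = compiled_model(x)  # first call triggers compilation; later calls run faster
print(out.shape)  # torch.Size([8, 1000])
```
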
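And as a companion to the experiment-tracking blog above, here is a minimal sketch of the Hydra + Weights & Biases pattern it describes; the config layout, project name, and metrics are placeholders, not the blog's exact code.

```python
# Sketch: Hydra composes the run config (from configs/config.yaml, a placeholder),
# and Weights & Biases logs that config alongside the metrics of each run.
import hydra
import wandb
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="configs", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    run = wandb.init(
        project="kaggle-experiments",                      # placeholder project name
        config=OmegaConf.to_container(cfg, resolve=True),  # log the full Hydra config
    )
    for epoch in range(cfg.train.epochs):                  # assumes a `train.epochs` entry in the config
        wandb.log({"epoch": epoch, "val/score": 0.5 + 0.01 * epoch})  # dummy metric
    run.finish()


if __name__ == "__main__":
    main()
```
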