Hit me up on any platform on my Linktree or the ones mentioned below if you'd like to chat!
- 🔭 I am a Data Scientist at Wolters Kluwer, where I work mostly on Document AI and LLMs, developing end-to-end MLOps pipelines for extracting information from PDFs.
- 🏅 I am an Ambassador for Weights & Biases
- 👯 I'm looking to collaborate on Kaggle Competitions
- 💪 I love contributing to open-source libraries.
- ⚡ Fun fact: I love participating in Machine Learning competitions; I'm a Kaggle Competitions Expert and highly active on the platform.
- Participated in the WSDM Cup - Multilingual Chatbot Arena (solo silver medal, 31st out of 950 teams): the challenge was to develop a reward model (as used in the RLHF stage) for multilingual human conversations from the Chatbot Arena (formerly LMSYS). I fine-tuned LLMs as reward models in a classification setting and used techniques such as multi-stage training (pre-training, then fine-tuning), pseudo-labelling, LoRA, QLoRA, efficient inference, and knowledge distillation; a minimal sketch of the LoRA fine-tuning recipe is shown below.
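
For context, here is a minimal sketch of that LoRA recipe: fine-tuning a decoder LLM with a classification head as a pairwise preference (reward) model. The base model name, label count, and hyperparameters are illustrative placeholders, not the exact competition setup.

```python
# Sketch: fine-tune an LLM as a reward model in a classification setting with LoRA.
# Base model and hyperparameters are placeholders, not the competition configuration.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_model = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder: any decoder LLM works here
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Two labels: which of the two chatbot responses the human preferred.
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id

# Attach LoRA adapters so only a small fraction of the weights is trained.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The wrapped model can now be trained with the standard 🤗 Trainer on
# (prompt, response_a, response_b) examples formatted as a single text sequence.
```
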
I was a speaker at the inaugural Weights & Biases MLOps conference (Fully Connected 2023). You can listen to my talk here. [Announcement LinkedIn Post]
Click on a competition name to go to its overview page, and click the solution/code links to see the approach and source code.
Competition | Placement | Organization | Code/Solution |
---|---|---|---|
DataSolve 2022 | 1st place | Wolters Kluwer | code |
GIA Winner 2024 | CEO's choice award winner | Wolters Kluwer | |
Amazon ML Challenge 2021 | 11th place (among 3200+ teams) | Amazon | solution + code |
WSDM Cup - Multilingual Chatbot Arena | Silver medal (31/950) | Kaggle | - |
U.S. Patent Phrase to Phrase Matching | Top 1%, silver medal (31/1889) | Kaggle | solution code |
Bristol-Myers Squibb – Molecular Translation | Silver medal (50/874) | Kaggle | solution + code |
Sartorius - Cell Instance Segmentation | Bronze medal (117/1505) | Kaggle | code |
Happywhale - Whale and Dolphin Identification | Bronze medal (132/1588) | Kaggle | - |
Kaggle - LLM Science Exam | Silver medal (123/2664) | Kaggle | - |
Click on a project name to go directly to its GitHub repository, and click the demo app link to see a live demo of the project.
- QA Bot for the Gradient Dissent podcast hosted by Weights & Biases: a question-answering bot built on top of OpenAI's LLMs and LangChain that provides a summary of an episode, suggests potential questions, and answers any question the user has about the podcast (a minimal sketch of the retrieval pattern follows this list). [demo app]
- Text to Image Synthesis using Attentional GANs - A PyTorch re-implementation of the paper AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. [demo app]
- Text Summarizer - a transformer-based extractive and abstractive text summarizer supporting a wide range of input formats. [demo app]
- Healthify - an ML-based website which predicts a disease from the symptoms entered by the patient and supports diagnosis of several diseases. [demo app]
- GPU vs TPU comparison for computer vision - compared GPUs and TPUs for computer vision workloads using Google Cloud TPUs and the TFRecords data format. [results]
- HARVESTIFY - an ML- and DL-based website which recommends the best crop to grow and the fertilizers to use, and identifies diseases affecting your crops. [demo app]
- JARVIS - a simple voice assistant made using Python [demo video]
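
As a rough illustration of how the QA bot above works, here is a minimal sketch of the retrieve-then-answer pattern, assuming the classic (pre-0.1) LangChain API and a podcast transcript already saved as plain text; file names and parameters are placeholders, not the project's exact code.

```python
# Sketch of the QA-bot pattern: embed the podcast transcript, store it in a
# vector index, and answer questions with retrieval-augmented generation.
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

transcript = open("gradient_dissent_episode.txt").read()  # placeholder transcript file

# Split the transcript into overlapping chunks and index them with embeddings.
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_text(transcript)
index = FAISS.from_texts(chunks, OpenAIEmbeddings())

# Retrieval-augmented QA chain on top of an OpenAI chat model.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=index.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("What are the main topics discussed in this episode?"))
```
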
- Fixed an example script in the Hugging Face 🤗 transformers repository for XLA devices - [PR]
- Made the experiment trackers launch only on the main process in distributed setups in the 🤗 Accelerate library - [PR]
- Fixed several examples and removed the main-process check after fixing the trackers' initialization across processes in the 🤗 Accelerate library - [PR]
- Updated several 🤗 transformers `no_trainer` scripts leveraging 🤗 Accelerate to remove the check for `is_main_process` when initiating trackers via `accelerator.init_trackers()`, as this issue was fixed by me in this PR - [PR]
- Contributed a report to Weights & Biases showcasing the integration of MONAI and W&B - [PR]
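
To illustrate what those Accelerate tracker fixes enable, here is a minimal sketch (project name and logged values are placeholders): after the change, `init_trackers()` handles the main-process logic internally, so user scripts no longer need an `is_main_process` guard around tracker calls.

```python
# Sketch: with the fix, Accelerate launches trackers only on the main process
# internally, so init_trackers() / log() can be called unconditionally.
from accelerate import Accelerator

accelerator = Accelerator(log_with="wandb")
accelerator.init_trackers(project_name="my-project", config={"lr": 3e-4})  # no is_main_process check needed

for step in range(10):
    loss = 0.1 * step  # placeholder training loss
    accelerator.log({"train/loss": loss}, step=step)

accelerator.end_training()
```
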
TODO: add more contributions
- RAG techniques: From naive to advanced - a comprehensive guide to RAG techniques, from the basics to advanced ones like metadata filtering, query transformation, reranking, HyDE, context selection, and much more.
- LLMs are machine learning classifiers - learn how to use LLMs like GPT for text classification. Explore prompting, fine-tuning, and when to choose LLMs over traditional machine learning classifiers.
- Building a Q&A Bot for Weights & Biases' Gradient Dissent Podcast using LangChain and OpenAI - in this article, we explore how to utilize OpenAI's ChatGPT and LangChain to build a Question-Answering bot for Weights & Biases' podcast series, Gradient Dissent.
- DeepMind Flamingo: A Visual Language Model for Few-Shot Learning - a Weights & Biases report explaining the paper Flamingo: A Visual Language Model for Few-Shot Learning by DeepMind. This report is also featured on the Two Minute Papers YouTube channel here.
- BLIP-2: A new Visual Language Model by Salesforce - this report explores BLIP-2, a new Vision Language Model by Salesforce which beats the previous state-of-the-art Flamingo on various multimodal tasks.
- SetFit: Efficient Few-Shot Learning Without Prompts - this report dives into HuggingFace, Intel Labs, and UKP Lab's recent paper on Efficient Few-Shot Learning Without Prompts. SetFit exceeds prior methods in accuracy and efficiency with just 8 labeled samples.
- Is PyTorch 2.0 Faster Than PyTorch 1.13? - this report compares PyTorch 1.13 and the newly announced PyTorch 2.0, whose highlight is the torch.compile mode, which provides significant speed gains (a minimal torch.compile example appears at the end of this section).
- 3D Segmentation with MONAI and PyTorch Supercharged by Weights & Biases - a W&B report showcasing the integration of MONAI and W&B for medical imaging. It shows how to efficiently track experiments, perform error analysis, version artifacts, and much more using the rich suite of W&B features.
- HuggingFace is all you need for NLP and beyond - a tutorial/blog on the Hugging Face ecosystem and how to use and customize 🤗 `datasets` and the 🤗 `Trainer` for all of your NLP problems.
- Hugging Face Accelerate Super Charged With Weights & Biases - a Weights & Biases report showing how you can supercharge your raw PyTorch code to train on distributed systems with Hugging Face Accelerate and seamlessly integrate Weights & Biases into your workflow.
- Managing and Tracking ML Experiments - an extensive blog where I share my experience and practical ways to track and manage machine learning experiments effectively for research projects and Kaggle competitions (with Weights & Biases and Hydra); see the sketch at the end of this section.
- AI and regulatory compliance in finance - explains what regulatory compliance means in simple terms, why it's so important, how it affects you and the financial world around us, and how the boom in AI has introduced new risks and opportunities.
- What is model risk management in finance? - explores Model Risk Management in finance, explaining how financial institutions identify, assess, and mitigate risks associated with increasingly complex ML models to prevent financial losses, regulatory penalties, and reputational damage.
- The importance of reproducibility in finance - talks about some of the challenges of reproducibility in finance, best practices for achieving it, the tools you should prioritize, and a whole lot more.
- Kaggle competitions and Open-Source :)
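
As a quick illustration of the torch.compile mode discussed in the PyTorch 2.0 report above, here is a minimal, self-contained sketch (the model and input shapes are arbitrary):

```python
# Minimal torch.compile example: wrap an existing model and let PyTorch 2.0
# JIT-compile it; the rest of the code stays unchanged.
import torch
import torchvision

model = torchvision.models.resnet18()
compiled_model = torch.compile(model)  # requires PyTorch >= 2.0

x = torch.randn(8, 3, 224, 224)
with torch.no_grad():
    out = compiled_model(x)  # first call triggers compilation; later calls run faster
print(out.shape)  # torch.Size([8, 1000])
```
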
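And as a companion to the experiment-tracking blog above, here is a minimal sketch of the Hydra + Weights & Biases pattern it describes; the config layout, project name, and metrics are placeholders, not the blog's exact code.

```python
# Sketch: Hydra composes the run config (from configs/config.yaml, a placeholder),
# and Weights & Biases logs that config alongside the metrics of each run.
import hydra
import wandb
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="configs", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    run = wandb.init(
        project="kaggle-experiments",                      # placeholder project name
        config=OmegaConf.to_container(cfg, resolve=True),  # log the full Hydra config
    )
    for epoch in range(cfg.train.epochs):                  # assumes a `train.epochs` entry in the config
        wandb.log({"epoch": epoch, "val/score": 0.5 + 0.01 * epoch})  # dummy metric
    run.finish()


if __name__ == "__main__":
    main()
```
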