BioInstruct Fine-tuning Experiment

Personal experiment fine-tuning Llama-3.2-1B-Instruct on the BioInstruct dataset. This was just me testing out LoRA fine-tuning and seeing how a small model performs on biomedical questions.

What's Here

Two Jupyter notebooks documenting my experiment:

Fine-tuning - Testing LoRA fine-tuning on biomedical data
Merging - Merging the LoRA adapters with the base model

What I Tried

Used Unsloth for faster training on Google Colab's free T4 GPU
LoRA fine-tuning (only training 2.83% of parameters)
Sequence packing (concatenating short samples into full 1024-token sequences to cut down on padding)
Trained on a 15K-sample subset in about 32 minutes
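To show what packing does, here is a minimal greedy sketch. It is a simplified illustration, not the exact algorithm Unsloth or TRL use internally: tokenized samples (each followed by an EOS token) are appended to the current sequence until the next one would overflow `max_len`, at which point a new sequence starts. The function name `pack_sequences` and the `eos_id` default are hypothetical, and real implementations also handle attention masking between packed samples, which this sketch ignores.

```python
from typing import List


def pack_sequences(samples: List[List[int]], max_len: int = 1024, eos_id: int = 0) -> List[List[int]]:
    """Greedily pack tokenized samples into sequences of at most max_len tokens.

    Hypothetical helper for illustration. Each sample gets an EOS token appended;
    samples longer than max_len are truncated so they fit in a single sequence.
    """
    packed: List[List[int]] = []
    current: List[int] = []
    for tokens in samples:
        seq = (tokens + [eos_id])[:max_len]
        # Start a new packed sequence if this sample would overflow the current one
        if len(current) + len(seq) > max_len:
            packed.append(current)
            current = []
        current.extend(seq)
    if current:
        packed.append(current)
    return packed


print(pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_len=8, eos_id=0))
# [[1, 2, 3, 0, 4, 5, 0], [6, 7, 8, 9, 0]]
```

Two short samples share one 8-token sequence instead of each being padded to full length, which is where the efficiency gain comes from.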

Experiment Details

1. BioInstruct_Finetuning_FINAL.ipynb

My main fine-tuning experiment on 15,000 samples from the BioInstruct dataset.

What I Used

Parameter	Value
Base Model	unsloth/Llama-3.2-1B-Instruct
Dataset	bio-nlp-umass/bioinstruct (15K subset)
Sequence Length	1024
Batch Size	4 (per device)
Gradient Accumulation	4
Epochs	1
Training Time	~32-40 minutes
Learning Rate	2e-4
LoRA r	32
LoRA alpha	64
LoRA dropout	0.05
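Two numbers follow directly from the table: the effective batch size (per-device batch times gradient accumulation steps) and the standard LoRA scaling factor alpha / r applied to the adapter update. The sketch below just collects the table values in a dataclass and derives both; the class name `FinetuneConfig` and the `num_devices = 1` field (assuming a single Colab T4) are my own additions, and the scaling assumes plain LoRA rather than rsLoRA (which uses alpha / sqrt(r) instead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    # Values copied from the table above
    max_seq_length: int = 1024
    per_device_batch_size: int = 4
    gradient_accumulation_steps: int = 4
    num_epochs: int = 1
    learning_rate: float = 2e-4
    lora_r: int = 32
    lora_alpha: int = 64
    lora_dropout: float = 0.05
    num_devices: int = 1  # assumption: one T4 GPU on Colab

    @property
    def effective_batch_size(self) -> int:
        # Number of sequences whose gradients are combined into one optimizer step
        return self.per_device_batch_size * self.gradient_accumulation_steps * self.num_devices

    @property
    def lora_scaling(self) -> float:
        # Standard (non-rsLoRA) scaling applied to the adapter product B @ A
        return self.lora_alpha / self.lora_r


cfg = FinetuneConfig()
print(cfg.effective_batch_size, cfg.lora_scaling)  # 16 2.0
```

So each optimizer step sees 16 packed sequences, and with alpha set to twice r, the adapter's contribution is scaled by 2.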

Results

Final Loss: 1.1254
Memory Usage: 9.7 GB
Trainable Parameters: 22,544,384 (2.83%)
Training Time: 32.3 minutes
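The trainable parameter count can be checked by hand. A LoRA adapter on a linear layer with `in_features` inputs and `out_features` outputs adds r × (in_features + out_features) parameters. Assuming the adapters targeted all seven attention and MLP projections (q, k, v, o, gate, up, down), which the notebook configuration would need to confirm, and using the layer sizes from Llama-3.2-1B's public config (hidden size 2048, MLP size 8192, 16 layers, 8 key/value heads of dimension 64), the sketch below reproduces the reported 22,544,384 exactly. It does not try to reproduce the 2.83% figure, since that depends on how the total parameter count was tallied.

```python
# Llama-3.2-1B layer sizes (from the model's config.json)
HIDDEN = 2048
INTERMEDIATE = 8192
NUM_LAYERS = 16
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN // NUM_HEADS      # 64
KV_DIM = NUM_KV_HEADS * HEAD_DIM    # 512, smaller because of grouped-query attention


def lora_params(in_features: int, out_features: int, r: int) -> int:
    # A has shape (r, in_features) and B has shape (out_features, r)
    return r * (in_features + out_features)


def count_lora_params(r: int) -> int:
    # Assumption: adapters on every attention and MLP projection in each layer
    shapes = {
        "q_proj": (HIDDEN, HIDDEN),
        "k_proj": (HIDDEN, KV_DIM),
        "v_proj": (HIDDEN, KV_DIM),
        "o_proj": (HIDDEN, HIDDEN),
        "gate_proj": (HIDDEN, INTERMEDIATE),
        "up_proj": (HIDDEN, INTERMEDIATE),
        "down_proj": (INTERMEDIATE, HIDDEN),
    }
    per_layer = sum(lora_params(i, o, r) for i, o in shapes.values())
    return per_layer * NUM_LAYERS


print(f"{count_lora_params(32):,}")  # 22,544,384
```

Because the count is linear in r, halving r to 16 would halve the trainable parameters.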

2. Merge_LoRA.ipynb

After training, I merged the LoRA adapters with the base model to create a standalone model.
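Merging works because a LoRA layer computes W x + (alpha / r) B A x, which equals (W + (alpha / r) B A) x, so the adapter can be folded into the base weight once and then dropped. The pure-Python sketch below uses tiny made-up matrices to check that the merged weight gives the same output as the base-plus-adapter path (dropout is inactive at inference, so it plays no role). Libraries do this for you; for example, PEFT provides `merge_and_unload()`, though the notebook may have used a different call.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * B @ A. W: (out, in), A: (r, in), B: (out, r)."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))] for i in range(len(W))]


# Toy example with out=3, in=2, r=1 (made-up numbers)
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = [[0.5, -1.0]]
B = [[2.0], [0.0], [1.0]]
alpha, r = 2.0, 1
x = [1.0, 2.0]

# Unmerged path: base output plus scaled adapter output
base = matvec(W, x)
adapter = [(alpha / r) * v for v in matvec(B, matvec(A, x))]
unmerged = [b + a for b, a in zip(base, adapter)]

# Merged path: a single matrix multiply
merged = matvec(merge_lora(W, A, B, alpha, r), x)
print(unmerged, merged)  # [-5.0, 2.0, 0.0] [-5.0, 2.0, 0.0]
```

The merged model has the same architecture and size as the base model, which is why it can be loaded without any LoRA tooling.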

Experiment Outputs

The models I created during this experiment:

LoRA Adapters: daffakautsar/bioinstruct-llama3.2-1b-lora
Merged Model: daffakautsar/bioinstruct-llama3.2-1b-merged

How It Performed

Some test outputs from the fine-tuned model:

Q: Explain how Metformin works in simple terms.

Metformin is a medication used to treat type 2 diabetes. It works by lowering the amount of sugar produced by your liver and increasing the sensitivity of your muscles to insulin, which helps your body use the insulin it makes to control blood sugar levels.

Q: Simplify this medical term for a patient: Myocardial infarction

Heart attack

Q: What is the difference between Type 1 and Type 2 diabetes?

Type 1 diabetes is an autoimmune condition where the body's immune system attacks and destroys insulin-producing cells in the pancreas. Type 2 diabetes is a metabolic disorder where the body becomes resistant to insulin or doesn't produce enough insulin to regulate blood sugar levels.

Dataset Used

I used the BioInstruct dataset: I randomly sampled 15,000 examples and split them into training and validation sets.
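A sample-then-split step like this can be sketched with the standard library. The 5% validation fraction and seed 42 below are hypothetical placeholders, not the values used in the notebook. With Hugging Face `datasets`, the equivalent would be roughly `Dataset.shuffle(seed=...)`, then `.select(range(15000))`, then `.train_test_split(test_size=...)`.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_and_split(rows: Sequence[T], n_samples: int = 15_000,
                     val_fraction: float = 0.05, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly sample n_samples rows without replacement, then split train/validation.

    val_fraction and seed are hypothetical defaults for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = rng.sample(list(rows), n_samples)  # already in random order
    n_val = round(n_samples * val_fraction)
    return subset[n_val:], subset[:n_val]


train, val = sample_and_split(range(100_000))
print(len(train), len(val))  # 14250 750
```

Since `random.sample` returns items in random order, slicing the subset is enough to get a random split with no overlap.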

Tools & Credits

Unsloth - Made training way faster
BioInstruct Dataset - The dataset I used
Meta AI - For the Llama 3.2 base model

Just a personal learning experiment by Daffa Kautsar - GitHub | Hugging Face
