GitHub - jina-ai/finetuner: 🎯 Task-oriented embedding tuning for BERT, CLIP, etc.

Task-oriented finetuning for better embeddings on neural search

Fine-tuning is an effective way to improve performance on neural search tasks. However, setting up and performing fine-tuning can be very time-consuming and resource-intensive.

Jina AI's Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure in the cloud. With Finetuner, you can easily enhance the performance of pre-trained models, making them production-ready without extensive labeling or expensive hardware.

🎏 Better embeddings: Create high-quality embeddings for semantic search, visual similarity search, cross-modal text<->image search, recommendation systems, clustering, duplicate detection, anomaly detection, or other uses.

⏰ Low budget, high expectations: Improve model performance considerably with as few as a few hundred training samples, and finish fine-tuning in as little as an hour.

📈 Performance promise: Enhance pre-trained models so that they deliver state-of-the-art performance on domain-specific applications.

🔱 Simple yet powerful: Easy access to 40+ mainstream loss functions, 10+ optimizers, layer pruning, weight freezing, dimensionality reduction, hard-negative mining, cross-modal models, and distributed training.

☁ All-in-cloud: Train using our GPU infrastructure and manage runs, experiments, and artifacts on Jina AI Cloud without worrying about resource availability, complex integration, or infrastructure costs.

Documentation

Pretrained Text Embedding Models

Name	Parameters	Dimension	Hugging Face
jina-embedding-t-en-v1	14m	312	link
jina-embedding-s-en-v1	35m	512	link
jina-embedding-b-en-v1	110m	768	link
jina-embedding-l-en-v1	330m	1024	link

Benchmarks

Model	Task	Metric	Pretrained	Finetuned	Delta
BERT	Quora Question Answering	mRR	0.835	0.967	15.8%
BERT	Quora Question Answering	Recall	0.915	0.963	5.3%
ResNet	Visual similarity search on TLL	mAP	0.110	0.196	78.2%
ResNet	Visual similarity search on TLL	Recall	0.249	0.460	84.7%
CLIP	Deep Fashion text-to-image search	mRR	0.575	0.676	17.4%
CLIP	Deep Fashion text-to-image search	Recall	0.473	0.564	19.2%
M-CLIP	Cross market product recommendation (German)	mRR	0.430	0.648	50.7%
M-CLIP	Cross market product recommendation (German)	Recall	0.247	0.340	37.7%
PointNet++	ModelNet40 3D Mesh Search	mRR	0.791	0.891	12.7%
PointNet++	ModelNet40 3D Mesh Search	Recall	0.154	0.242	57.1%

All metrics were evaluated at k = 20 after training for 5 epochs using the Adam optimizer, with learning rates of 1e-4 for ResNet, 1e-7 for CLIP, 1e-5 for the BERT models, and 5e-4 for PointNet++.
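
The Delta column appears to be the relative improvement of the fine-tuned score over the pre-trained one, that is, (finetuned - pretrained) / pretrained. The snippet below is a minimal sketch of that arithmetic, not code from Finetuner; the function name `relative_improvement` is hypothetical. Because the table shows rounded scores, a value recomputed this way can differ from the published one in the last digit.

```python
def relative_improvement(pretrained: float, finetuned: float) -> float:
    """Return the gain of `finetuned` over `pretrained`, in percent.

    Hypothetical helper for illustration; assumes Delta is a relative change.
    """
    if pretrained == 0:
        raise ValueError("pretrained score must be non-zero")
    return (finetuned - pretrained) / pretrained * 100.0


# Scores copied from the benchmark table (rounded to three decimals there).
bert_mrr_delta = relative_improvement(0.835, 0.967)
resnet_map_delta = relative_improvement(0.110, 0.196)

print(f"BERT mRR delta:   {bert_mrr_delta:.1f}%")
print(f"ResNet mAP delta: {resnet_map_delta:.1f}%")
```

Reading the column as a relative rather than absolute change explains why ResNet, which starts from a low baseline, shows the largest percentages.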

Install

Make sure you have Python 3.8+ installed. Finetuner can be installed via pip by executing:

pip install -U finetuner

To submit a fine-tuning job to the cloud, install the full package:

pip install "finetuner[full]"

⚠️ Starting with version 0.5.0, Finetuner computing is performed on Jina AI Cloud. The last local version is 0.4.1. This version is still available for installation via pip. See Finetuner git tags and releases.
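
To see which side of the 0.5.0 boundary an environment is on, the installed version can be read with the standard library. This is a minimal sketch using only `sys`, `re`, and `importlib.metadata`; the helper names (`parse_version`, `is_cloud_version`, `installed_finetuner_version`) are hypothetical and not part of Finetuner, and the parser assumes plain `X.Y.Z` version strings.

```python
import re
import sys
from importlib import metadata
from typing import Optional, Tuple

# Per the note above: from 0.5.0 on, computing happens on Jina AI Cloud.
FIRST_CLOUD_VERSION = (0, 5, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric parts of 'X.Y.Z', padding to three fields."""
    numbers = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)
        if match is None:
            break
        numbers.append(int(match.group()))
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_cloud_version(version: str) -> bool:
    """True if this version runs fine-tuning on Jina AI Cloud (0.5.0 or later)."""
    return parse_version(version) >= FIRST_CLOUD_VERSION


def installed_finetuner_version() -> Optional[str]:
    """Return the installed finetuner version, or None if it is not installed."""
    try:
        return metadata.version("finetuner")
    except metadata.PackageNotFoundError:
        return None


if sys.version_info < (3, 8):
    print("Python 3.8+ is required")

version = installed_finetuner_version()
if version is None:
    print("finetuner is not installed")
elif is_cloud_version(version):
    print(f"finetuner {version}: computing runs on Jina AI Cloud")
else:
    print(f"finetuner {version}: local version")
```

If the last local version is what you need, pinning it with pip's usual `==` syntax (`pip install finetuner==0.4.1`) should select it, assuming that release is still published.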

Articles about Finetuner

Check out our published blog posts and tutorials to see Finetuner in action!

If you find Jina Embeddings useful in your research, please cite the following paper:

@misc{günther2023jina,
      title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models}, 
      author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
      year={2023},
      eprint={2307.11224},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Support

Use Discussions to talk about your use cases, questions, and support queries.
Join our Discord community and chat with other community members about ideas.
Join our Engineering All Hands meet-up to discuss your use case and learn about Jina AI's new features.
- When? The second Tuesday of every month
- Where? Zoom (see our public events calendar/.ical) and live stream on YouTube
Subscribe to the latest video tutorials on our YouTube channel.

Join Us

Finetuner is backed by Jina AI and licensed under Apache-2.0.

We are actively hiring AI engineers and solution engineers to build the next generation of open-source AI ecosystems.
