InftyAI/Manta

A lightweight P2P-based cache system for model distributions on Kubernetes.

Name story: the name Manta is inspired by the Dota 2 item Manta Style, which creates two images of your hero, much like peers in a P2P network.

Architecture

Note: llmaz is just one possible integration; Manta can be deployed and used independently.

Features Overview

Model Hub Support: Models can be downloaded directly from model hubs (Huggingface etc.) or object storage, with no extra effort.
Model Preheat: Models can be preloaded to the whole cluster, or to specified nodes, to accelerate model serving.
Model Cache: Models are cached as chunks after downloading for faster model loading.
Model Lifecycle Management: The model lifecycle is managed automatically with different strategies, such as Retain or Delete.
Plugin Framework: Filter and Score plugins can be extended to pick the best candidates.
Memory Management (WIP): Manage the memory reserved for caching, together with an LRU algorithm for GC.
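
As a rough illustration of the Filter and Score idea, the sketch below first drops every candidate that any filter rejects, then picks the remaining candidate with the highest summed score. It is a minimal Python sketch and not Manta's implementation: `Candidate`, `pick_best`, `in_zone`, and `prefer_free_disk` are hypothetical names, and the zone and free-disk fields are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical peer node that could serve or receive model chunks."""
    node: str
    zone: str
    free_disk_gib: float


# A filter answers "is this candidate acceptable at all?".
FilterPlugin = Callable[[Candidate], bool]
# A scorer answers "how good is this candidate?" (higher is better).
ScorePlugin = Callable[[Candidate], float]


def in_zone(zone: str) -> FilterPlugin:
    """Build a filter that keeps only candidates in the given zone."""
    return lambda c: c.zone == zone


def prefer_free_disk(c: Candidate) -> float:
    """Score candidates by free disk space."""
    return c.free_disk_gib


def pick_best(
    candidates: Iterable[Candidate],
    filters: Sequence[FilterPlugin],
    scorers: Sequence[ScorePlugin],
) -> Optional[Candidate]:
    """Run all filters, then return the highest-scoring survivor (or None)."""
    feasible = [c for c in candidates if all(f(c) for f in filters)]
    if not feasible:
        return None
    return max(feasible, key=lambda c: sum(s(c) for s in scorers))
```

Keeping filters and scorers as separate, composable callables means a new rule can be added without touching the selection loop.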

You Should Know Before

Manta is not an all-in-one solution for model management. Instead, it offers a lightweight way to use idle bandwidth and cost-effective disk, helping you save money.
It requires no additional components like databases or storage systems, simplifying setup and reducing effort.
All models are stored under the host path /mnt/models/.
After all, it's just a cache system.

Quick Start

Installation

Read the Installation guide for instructions.

Preheat Models

A sample to preload the Qwen/Qwen2.5-0.5B-Instruct model:

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct

To preload the model to specific nodes only, set nodeSelector:

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  nodeSelector:
    zone: zone-a
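
Assuming nodeSelector follows the usual Kubernetes label-matching rule, a node qualifies only if its labels contain every key/value pair in the selector. The following minimal Python sketch shows that rule; the function name and sample labels are illustrative and not part of Manta.

```python
from typing import Mapping


def matches_node_selector(
    node_labels: Mapping[str, str],
    node_selector: Mapping[str, str],
) -> bool:
    """Return True if the node carries every label the selector asks for.

    An empty selector places no constraint, so it matches any node.
    """
    return all(node_labels.get(key) == value for key, value in node_selector.items())


if __name__ == "__main__":
    selector = {"zone": "zone-a"}
    print(matches_node_selector({"zone": "zone-a", "disk": "ssd"}, selector))
    print(matches_node_selector({"zone": "zone-b"}, selector))
```

With the sample above, only nodes labeled zone=zone-a would be preheated under this rule.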

Delete Models

To remove the model weights once the Torrent is deleted, set reclaimPolicy to Delete (the default is Retain):

apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  reclaimPolicy: Delete
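
The difference between the two policies can be pictured with a small Python sketch. This is only an assumption-laden illustration of "Retain keeps the cached files, Delete removes them": `reclaim_model_dir` and `make_fake_cache` are hypothetical helpers, and the directory used here is a throwaway temp directory rather than Manta's real on-disk layout.

```python
import shutil
import tempfile
from pathlib import Path


def reclaim_model_dir(model_dir: Path, reclaim_policy: str = "Retain") -> bool:
    """Apply a reclaim policy to a cached model directory.

    Returns True if the cached files were removed, False if they were kept.
    """
    if reclaim_policy == "Delete":
        shutil.rmtree(model_dir, ignore_errors=True)
        return True
    if reclaim_policy == "Retain":
        return False
    raise ValueError(f"unknown reclaim policy: {reclaim_policy!r}")


def make_fake_cache() -> Path:
    """Create a throwaway directory holding one fake chunk file (demo only)."""
    root = Path(tempfile.mkdtemp())
    (root / "chunk-0").write_bytes(b"weights")
    return root


if __name__ == "__main__":
    cache = make_fake_cache()
    reclaim_model_dir(cache)            # default Retain: files stay on disk
    print(cache.exists())
    reclaim_model_dir(cache, "Delete")  # Delete: files are removed
    print(cache.exists())
```

Defaulting to Retain mirrors the behavior described above: removing a Torrent does not discard downloaded weights unless you opt in.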

For more details, refer to the APIs.

Roadmap

In the long term, we hope to make Manta a unified cache system within MLOps.

Preloading datasets from model hubs
RDMA support for faster model loading
More integrations with MLOps systems, including training and serving

Community

Join us for more discussions:

Slack Channel: #manta

Contributions

All kinds of contributions are welcome! Please follow CONTRIBUTING.md.
