diff --git a/README.md b/README.md index c8221c7..e8b54b2 100644 --- a/README.md +++ b/README.md @@ -308,28 +308,38 @@ ### 画像+テキストからのテキスト生成 - -#### 汎用 +#### フルスクラッチ学習モデル + +**汎用** -| | アーキテクチャ / ベースのVLM | 学習画像/テキスト | 開発元 | ライセンス | +| | アーキテクチャ | 学習画像/テキスト | 開発元 | ライセンス | |:---|:---:|:---:|:---:|:---:| -| [AXCXEPT/EZO-InternVL2-26B](https://huggingface.co/AXCXEPT/EZO-InternVL2-26B) | InternVL2 | - |  Axcxept | MIT | | [llava-calm2-siglip](https://www.cyberagent.co.jp/news/detail/id=30344)
([llava-calm2-siglip](https://huggingface.co/cyberagent/llava-calm2-siglip)) | LLaVA-1.5 | MS-COCO と VisualGenome から生成された対話データ | サイバーエージェント | Apache 2.0 | -| [Llama-3-EvoVLM-JP-v2](https://sakana.ai/evovlm-jp/)
([v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2)) | - | - (Mantis-8B-SigLIP-Llama-3、Llama-3-ELYZA-JP-8B、Bunny-v1.1-Llama-3-8B-V をマージ) | Sakana AI | Llama 3 Community License | -| [AXCXEPT/Llama-3-EZO-VLM-1](https://huggingface.co/AXCXEPT/Llama-3-EZO-VLM-1) | Llama-3-EvoVLM-JP-v2 | - | Axcxept | Llama 3 Community License | -| [EvoVLM-JP](https://sakana.ai/evolutionary-model-merge-jp/)
([v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B)) | - | - (Shisa Gamma 7B (v1) と LLaVA-1.6-Mistral-7B をマージ) | Sakana AI | Apache 2.0 | | [Heron](https://github.com/turingmotors/heron/blob/main/docs/README_JP.md)
([blip-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v0), [blip-ja-stablelm-base-7b-v1](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1), [blip-ja-stablelm-base-7b-v1-llava-620k](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1-llava-620k), [git-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v0), [git-ELYZA-fast-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ELYZA-fast-7b-v0), [git-ja-stablelm-base-7b-v1](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v1)) | BLIP-2 または GIT | v1: LLaVA-Instruct-150K-JA または LLaVA-Instruct-620K-JA
v0: LLaVA-Instruct-150K-JA, Japanese STAIR Captions, Japanese Visual Genome VQA dataset | Turing | CC BY-NC 4.0 | | [Japanese Stable VLM](https://ja.stability.ai/blog/japanese-stable-vlm)
([japanese-stable-vlm](https://huggingface.co/stabilityai/japanese-stable-vlm)) | LLaVA-1.5 | Japanese CC12M, STAIR Captions, Japanese Visual Genome VQA dataset | Stability AI | STABILITY AI JAPANESE STABLE VLM COMMUNITY LICENSE | | [Japanese InstructBLIP Alpha](https://ja.stability.ai/blog/japanese-instructblip-alpha)
([japanese-instructblip-alpha](https://huggingface.co/stabilityai/japanese-instructblip-alpha)) | InstructBLIP | Japanese CC12M, STAIR Captions, Japanese Visual Genome VQA dataset | Stability AI | JAPANESE STABLELM RESEARCH LICENSE | | [rinna MiniGPT-4](https://rinna.co.jp/news/2023/07/20230731.html)
([bilingual-gpt-neox-4b-minigpt4](https://huggingface.co/rinna/bilingual-gpt-neox-4b-minigpt4)) | MiniGPT-4 | CC12M, COCO 2014, Visual Genome, STAIR Captions, Japanese Visual Genome VQA dataset | rinna | MIT | - -#### ドメイン特化型 +**ドメイン特化型** | | アーキテクチャ | ドメイン | 開発元 | ライセンス | |:---|:---:|:---:|:---:|:---:| | [watashiha/Watashiha-Llama-2-13B-Ogiri-sft-vlm](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft-vlm) | LLaVA | 大喜利 | わたしは | Llama 2 Community License | +#### 海外モデルに日本語で追加学習を行ったモデル + +| | ベースのVLM | 学習画像/テキスト | 開発元 | ライセンス | +|:---|:---:|:---:|:---:|:---:| +| [AXCXEPT/EZO-InternVL2-26B](https://huggingface.co/AXCXEPT/EZO-InternVL2-26B) | InternVL2 | - |  Axcxept | MIT | + +#### 複数のVLM・LLMをマージして作成されたモデル + +| | マージ元のLLM・VLM(太字は日本語LLM) | 開発元 | ライセンス | +|:---|:---:|:---:|:---:| +| [Llama-3-EvoVLM-JP-v2](https://sakana.ai/evovlm-jp/)
([v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2)) | Mantis-8B-SigLIP-Llama-3, **Llama-3-ELYZA-JP-8B**, Bunny-v1.1-Llama-3-8B-V | Sakana AI | Llama 3 Community License | +| [AXCXEPT/Llama-3-EZO-VLM-1](https://huggingface.co/AXCXEPT/Llama-3-EZO-VLM-1) | - (Llama-3-EvoVLM-JP-v2 に対して追加学習) | Axcxept | Llama 3 Community License | +| [EvoVLM-JP](https://sakana.ai/evolutionary-model-merge-jp/)
([v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B)) | **Shisa Gamma 7B (v1)**, LLaVA-1.6-Mistral-7B | Sakana AI | Apache 2.0 | + ### テキストからの画像生成 diff --git a/en/README.md b/en/README.md index 34d39bb..2844911 100644 --- a/en/README.md +++ b/en/README.md @@ -306,28 +306,38 @@ Please point out any errors on the [issues page](https://github.com/llm-jp/aweso ### Text+Image to Text - -#### General Purpose +#### Models built from scratch + +**General purpose** -| | Architecture / Base Model | Training Data | Developer | License | +| | Architecture | Training Data | Developer | License | |:---|:---:|:---:|:---:|:---:| -| [AXCXEPT/EZO-InternVL2-26B](https://huggingface.co/AXCXEPT/EZO-InternVL2-26B) | InternVL2 | - |  Axcxept | MIT | | [llava-calm2-siglip](https://www.cyberagent.co.jp/news/detail/id=30344)
([llava-calm2-siglip](https://huggingface.co/cyberagent/llava-calm2-siglip)) | LLaVA-1.5 | coversational data generated from MS-COCO and VisualGenome | CyberAgent | Apache 2.0 | -| [Llama-3-EvoVLM-JP-v2](https://sakana.ai/evovlm-jp/)
([v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2)) | - | - (merged from Mantis-8B-SigLIP-Llama-3, Llama-3-ELYZA-JP-8B, and Bunny-v1.1-Llama-3-8B-V) | Sakana AI | Llama 3 Community License | -| [AXCXEPT/Llama-3-EZO-VLM-1](https://huggingface.co/AXCXEPT/Llama-3-EZO-VLM-1) | Llama-3-EvoVLM-JP-v2 | - | Axcxept | Llama 3 Community License | -| [EvoVLM-JP](https://sakana.ai/evolutionary-model-merge)
([v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B)) | - | - (merged from Shisa Gamma 7B (v1) and LLaVA-1.6-Mistral-7B) | Sakana AI | Apache 2.0 | | [Heron](https://github.com/turingmotors/heron)
([blip-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v0), [blip-ja-stablelm-base-7b-v1](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1), [blip-ja-stablelm-base-7b-v1-llava-620k](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1-llava-620k), [git-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v0), [git-ELYZA-fast-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ELYZA-fast-7b-v0), [git-ja-stablelm-base-7b-v1](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v1)) | BLIP-2 / GIT | v1: LLaVA-Instruct-150K-JA or LLaVA-Instruct-620K-JA
v0: LLaVA-Instruct-150K-JA, Japanese STAIR Captions, Japanese Visual Genome VQA dataset | Turing | CC BY-NC 4.0 | | [Japanese Stable VLM](https://ja.stability.ai/blog/japanese-stable-vlm)
([japanese-stable-vlm](https://huggingface.co/stabilityai/japanese-stable-vlm)) | LLaVA-1.5 | Japanese CC12M, STAIR Captions, Japanese Visual Genome VQA dataset | Stability AI | STABILITY AI JAPANESE STABLE VLM COMMUNITY LICENSE | | [Japanese InstructBLIP Alpha](https://ja.stability.ai/blog/japanese-instructblip-alpha)
([japanese-instructblip-alpha](https://huggingface.co/stabilityai/japanese-instructblip-alpha)) | InstructBLIP | Japanese CC12M, STAIR Captions, Japanese Visual Genome VQA dataset | Stability AI | JAPANESE STABLELM RESEARCH LICENSE | | [rinna MiniGPT-4](https://rinna.co.jp/news/2023/07/20230731.html)
([bilingual-gpt-neox-4b-minigpt4](https://huggingface.co/rinna/bilingual-gpt-neox-4b-minigpt4)) | MiniGPT-4 | CC12M, COCO 2014, Visual Genome, STAIR Captions, Japanese Visual Genome VQA dataset | rinna | MIT | - -#### Domain Specific +**Domain Specific** | | Architecture | Domain | Developer | License | |:---|:---:|:---:|:---:|:---:| | [watashiha/Watashiha-Llama-2-13B-Ogiri-sft-vlm](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft-vlm) | LLaVA | [Oogiri](https://en.wikipedia.org/wiki/Glossary_of_owarai_terms#oogiri) | Watashiha | Llama 2 Community License | +#### Models built off non-Japanese VLMs + +| | Base Model | Training Data | Developer | License | +|:---|:---:|:---:|:---:|:---:| +| [AXCXEPT/EZO-InternVL2-26B](https://huggingface.co/AXCXEPT/EZO-InternVL2-26B) | InternVL2 | - |  Axcxept | MIT | + +#### Merged models + +| | Original Models (Japanese LLMs in bold) | Developer | License | +|:---|:---:|:---:|:---:| +| [Llama-3-EvoVLM-JP-v2](https://sakana.ai/evovlm-jp/)
([v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2)) | Mantis-8B-SigLIP-Llama-3, **Llama-3-ELYZA-JP-8B**, Bunny-v1.1-Llama-3-8B-V | Sakana AI | Llama 3 Community License | +| [AXCXEPT/Llama-3-EZO-VLM-1](https://huggingface.co/AXCXEPT/Llama-3-EZO-VLM-1) | - (trained from Llama-3-EvoVLM-JP-v2) | Axcxept | Llama 3 Community License | +| [EvoVLM-JP](https://sakana.ai/evolutionary-model-merge)
([v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B)) | **Shisa Gamma 7B (v1)**, LLaVA-1.6-Mistral-7B | Sakana AI | Apache 2.0 | + ### Text to Image diff --git a/fr/README.md b/fr/README.md index cb55a2f..f14af9c 100644 --- a/fr/README.md +++ b/fr/README.md @@ -306,28 +306,38 @@ N'hésitez pas à signaler les erreurs sur la page [issues](https://github.com/l ### Text+Image vers Text - -#### D'usage général +#### Modèles développés à partir de zéro + +**D'usage général** -| | Architecture / Base du Model | Données d'entraînement | Développeur | Licence | +| | Architecture | Données d'entraînement | Développeur | Licence | |:---|:---:|:---:|:---:|:---:| -| [AXCXEPT/EZO-InternVL2-26B](https://huggingface.co/AXCXEPT/EZO-InternVL2-26B) | InternVL2 | - |  Axcxept | MIT | -| [llava-calm2-siglip](https://www.cyberagent.co.jp/news/detail/id=30344)
([llava-calm2-siglip](https://huggingface.co/cyberagent/llava-calm2-siglip)) | LLaVA-1.5 | coversational data generated from MS-COCO and VisualGenome | CyberAgent | Apache 2.0 | -| [Llama-3-EvoVLM-JP-v2](https://sakana.ai/evovlm-jp/)
([v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2)) | - | - (merged from Mantis-8B-SigLIP-Llama-3, Llama-3-ELYZA-JP-8B, and Bunny-v1.1-Llama-3-8B-V) | Sakana AI | Llama 3 Community License | -| [AXCXEPT/Llama-3-EZO-VLM-1](https://huggingface.co/AXCXEPT/Llama-3-EZO-VLM-1) | Llama-3-EvoVLM-JP-v2 | - | Axcxept | Llama 3 Community License | -| [EvoVLM-JP](https://sakana.ai/evolutionary-model-merge)
([v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B)) | - | - (merged from Shisa Gamma 7B (v1) and LLaVA-1.6-Mistral-7B) | Sakana AI | Apache 2.0 | -| [Heron](https://github.com/turingmotors/heron)
([blip-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v0), [blip-ja-stablelm-base-7b-v1](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1), [blip-ja-stablelm-base-7b-v1-llava-620k](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1-llava-620k), [git-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v0), [git-ELYZA-fast-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ELYZA-fast-7b-v0), [git-ja-stablelm-base-7b-v1](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v1)) | BLIP-2 / GIT | v1: LLaVA-Instruct-150K-JA or LLaVA-Instruct-620K-JA
v0: LLaVA-Instruct-150K-JA, Japanese STAIR Captions, Japanese Visual Genome VQA dataset | Turing | CC BY-NC 4.0 | +| [llava-calm2-siglip](https://www.cyberagent.co.jp/news/detail/id=30344)
([llava-calm2-siglip](https://huggingface.co/cyberagent/llava-calm2-siglip)) | LLaVA-1.5 | conversational data generated from MS-COCO and VisualGenome | CyberAgent | Apache 2.0 | +| [Heron](https://github.com/turingmotors/heron)
([blip-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v0), [blip-ja-stablelm-base-7b-v1](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1), [blip-ja-stablelm-base-7b-v1-llava-620k](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1-llava-620k), [git-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v0), [git-ELYZA-fast-7b-v0](https://huggingface.co/turing-motors/heron-chat-git-ELYZA-fast-7b-v0), [git-ja-stablelm-base-7b-v1](https://huggingface.co/turing-motors/heron-chat-git-ja-stablelm-base-7b-v1)) | BLIP-2 / GIT | v1: LLaVA-Instruct-150K-JA or LLaVA-Instruct-620K-JA
v0: LLaVA-Instruct-150K-JA, Japanese STAIR Captions, Japanese Visual Genome VQA dataset | Turing | CC BY-NC 4.0 | | [Japanese Stable VLM](https://ja.stability.ai/blog/japanese-stable-vlm)
([japanese-stable-vlm](https://huggingface.co/stabilityai/japanese-stable-vlm)) | LLaVA-1.5 | Japanese CC12M, STAIR Captions, jeu de données Japanese Visual Genome VQA | Stability AI | STABILITY AI JAPANESE STABLE VLM COMMUNITY LICENSE | | [Japanese InstructBLIP Alpha](https://ja.stability.ai/blog/japanese-instructblip-alpha)
([japanese-instructblip-alpha](https://huggingface.co/stabilityai/japanese-instructblip-alpha)) | InstructBLIP | Japanese CC12M, STAIR Captions, jeu de données Japanese Visual Genome VQA | Stability AI | JAPANESE STABLELM RESEARCH LICENSE | | [rinna MiniGPT-4](https://rinna.co.jp/news/2023/07/20230731.html)
([bilingual-gpt-neox-4b-minigpt4](https://huggingface.co/rinna/bilingual-gpt-neox-4b-minigpt4)) | MiniGPT-4 | CC12M, COCO 2014, Visual Genome, STAIR Captions, Japanese Visual Genome VQA dataset | rinna | MIT | - -#### Spécifique à un domaine +**Spécifique à un domaine** | | Architecture | Domaine | Développeur | Licence | |:---|:---:|:---:|:---:|:---:| | [watashiha/Watashiha-Llama-2-13B-Ogiri-sft-vlm](https://huggingface.co/watashiha/Watashiha-Llama-2-13B-Ogiri-sft-vlm) | LLaVA | [Oogiri](https://en.wikipedia.org/wiki/Glossary_of_owarai_terms#oogiri) | Watashiha | Llama 2 Community License | +#### Modèles développés à partir de VLM non japonais + +| | Modèle de base | Données d'entraînement | Développeur | Licence | +|:---|:---:|:---:|:---:|:---:| +| [AXCXEPT/EZO-InternVL2-26B](https://huggingface.co/AXCXEPT/EZO-InternVL2-26B) | InternVL2 | - |  Axcxept | MIT | + +#### Modèles fusionnés + +| | Modèles originaux (LLM japonais en gras) | Développeur | Licence | +|:---|:---:|:---:|:---:| +| [Llama-3-EvoVLM-JP-v2](https://sakana.ai/evovlm-jp/)
([v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2)) | Mantis-8B-SigLIP-Llama-3, **Llama-3-ELYZA-JP-8B**, Bunny-v1.1-Llama-3-8B-V | Sakana AI | Llama 3 Community License | +| [AXCXEPT/Llama-3-EZO-VLM-1](https://huggingface.co/AXCXEPT/Llama-3-EZO-VLM-1) | - (trained from Llama-3-EvoVLM-JP-v2) | Axcxept | Llama 3 Community License | +| [EvoVLM-JP](https://sakana.ai/evolutionary-model-merge)
([v1-7B](https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B)) | **Shisa Gamma 7B (v1)**, LLaVA-1.6-Mistral-7B | Sakana AI | Apache 2.0 | + ### Text vers Image