add Sarashina2-8x70B
kaisugi committed Nov 10, 2024
1 parent 2b965dc commit 957192a
Showing 7 changed files with 23 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -36,6 +36,7 @@

| | Architecture | Max Context Length | Training Data | Developer | License / Terms of Use |
|:---|:---:|:---:|:---:|:---:|:---:|
| [Sarashina2-8x70B](https://www.sbintuitions.co.jp/news/press/20241108_01/) | Mixtral<br>([8x70b (**465b**)](https://huggingface.co/sbintuitions/sarashina2-8x70b)) | 8,192 | undisclosed | SB Intuitions | Sarashina Model NonCommercial License |
| [LLM-jp-3 172B beta1](https://www.nii.ac.jp/news/release/2024/0917.html) | Llama<br>([**172b**-beta1](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1), [**172b**-beta1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1-instruct)) | 4,096 | Pre-training: part of [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)<br>(**0.7T** tokens total)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | Research and Development Center for Large Language Models (LLMC) | LLM-jp-3 172B beta1 Terms of Use |
| [LLM-jp-3 172B alpha](https://llmc.nii.ac.jp/topics/llm-jp-3-172b-alpha1-alpha2/) | Llama<br>([**172b**-alpha1](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1), [**172b**-alpha1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1-instruct), [**172b**-alpha2](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2), [**172b**-alpha2-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2-instruct)) | 4,096 | Pre-training: part of [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)<br>(alpha1: **0.7T** tokens total, alpha2: **1.4T** tokens total)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | Research and Development Center for Large Language Models (LLMC) | Apache 2.0 |
| [Stockmark-100b](https://stockmark.co.jp/news/20240516) | Llama<br>([**100b**](https://huggingface.co/stockmark/stockmark-100b), [**100b**-instruct-v0.1](https://huggingface.co/stockmark/stockmark-100b-instruct-v0.1)) | 4,096 | Pre-training: RedPajama, Japanese Wikipedia, Japanese mC4, Japanese CommonCrawl, Japanese patents, Stockmark Web Corpus<br>(**910B** tokens total)<br>Instruction Tuning (LoRA): [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/) | Stockmark | MIT |
1 change: 1 addition & 0 deletions en/README.md
@@ -35,6 +35,7 @@ Please point out any errors on the [issues page](https://github.com/llm-jp/aweso

| | Architecture | Max Context Length | Training Data | Developer | License / Terms of Use |
|:---|:---:|:---:|:---:|:---:|:---:|
| [Sarashina2-8x70B](https://www.sbintuitions.co.jp/news/press/20241108_01/) | Mixtral<br>([8x70b (**465b**)](https://huggingface.co/sbintuitions/sarashina2-8x70b)) | 8,192 | undisclosed | SB Intuitions | Sarashina Model NonCommercial License |
| [LLM-jp-3 172B beta1](https://www.nii.ac.jp/en/news/release/2024/0917.html) | Llama<br>([**172b**-beta1](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1), [**172b**-beta1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1-instruct)) | 4,096 | Pre-training: part of [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)<br>(**0.7T** tokens)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | Research and Development Center for Large Language Models (LLMC) | LLM-jp-3 172B beta1 Terms of Use |
| [LLM-jp-3 172B alpha](https://llmc.nii.ac.jp/en/topics/llm-jp-3-172b-alpha1-alpha2/) | Llama<br>([**172b**-alpha1](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1), [**172b**-alpha1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1-instruct), [**172b**-alpha2](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2), [**172b**-alpha2-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2-instruct)) | 4,096 | Pre-training: part of [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)<br>(alpha1: **0.7T** tokens, alpha2: **1.4T** tokens)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | Research and Development Center for Large Language Models (LLMC) | Apache 2.0 |
| [Stockmark-100b](https://huggingface.co/stockmark/stockmark-100b) | Llama<br>([**100b**](https://huggingface.co/stockmark/stockmark-100b), [**100b**-instruct-v0.1](https://huggingface.co/stockmark/stockmark-100b-instruct-v0.1)) | 4,096 | Pre-training: RedPajama, Japanese Wikipedia, Japanese mC4, Japanese CommonCrawl, Japanese Patent, Stockmark Web Corpus<br>(**910B** tokens)<br>Instruction Tuning (LoRA): [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/) | Stockmark | MIT |
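The new row above links to the Sarashina2-8x70B checkpoint on Hugging Face. As a quick orientation for readers, here is a minimal, hypothetical loading sketch; it assumes the standard `transformers` `AutoModelForCausalLM` interface applies to this Mixtral-style checkpoint (not confirmed by this commit) and that enough GPU memory is available for a 465B-parameter model.

```python
# Hypothetical sketch: loading the Sarashina2-8x70B checkpoint listed above.
# Assumes the standard Hugging Face transformers interface works for this
# Mixtral-style MoE model; at 465B parameters it will not fit on one GPU,
# so device_map="auto" shards layers across all visible devices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sbintuitions/sarashina2-8x70b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to roughly halve memory use
    device_map="auto",           # shard across available GPUs (assumption)
)

# Sarashina2 is a base (non-instruct) model, so plain text completion applies.
# Prompt: "日本の首都は" ("The capital of Japan is")
inputs = tokenizer("日本の首都は", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```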
Binary file modified figures/parameter_size_overview_en.png
Binary file modified figures/parameter_size_overview_ja.png
19 changes: 19 additions & 0 deletions figures/scripts/parameter_size_overview.csv
@@ -1,5 +1,22 @@
Model,Lab,Parameters(B),Announced,Type
Hunyuan-Large,Tencent,389.0,2024/11/01,EN-available
SEA-LIONv3,AI Singapore,9.24,2024/11/01,EN-available
AMD OLMo,AMD,1.0,2024/11/01,EN-available
SmolLM2,Hugging Face,1.7,2024/11/01,EN-available
Aya-Expanse-32B,Cohere,32.0,2024/10/01,EN-available
Claude 3.5 Sonnet (new),Anthropic,,2024/10/01,EN-available
Granite 3.0 8B,IBM,8.0,2024/10/01,EN-available
Granite-3.0-3B-A800M-Instruct,IBM,3.0,2024/10/01,EN-available
aiXcoder-7B,aiXcoder,7.0,2024/10/01,EN-available
Llama-3.1-Nemotron-70B,NVIDIA,70.0,2024/10/01,EN-available
Ministral 8B,Mistral,8.0,2024/10/01,EN-available
Yi-Lightning,01-ai,200.0,2024/10/01,EN-available
Zamba2-7B,Zyphra,7.0,2024/10/01,EN-available
nGPT,NVIDIA,1.0,2024/10/01,EN-available
Inflection-3 Pi (3.0),Inflection AI,1200.0,2024/10/01,EN-available
Inflection-3 Productivity (3.0),Inflection AI,1200.0,2024/10/01,EN-available
LFM-40B,Liquid AI,40.0,2024/09/01,EN-available
Emu3,BAAI,8.0,2024/09/01,EN-available
NVLM 1.0,NVIDIA,72.0,2024/09/01,EN-available
TeleChat2-115B,China Telecom Artificial Intelligence Research Institute,115.0,2024/09/01,EN-available
AMD-Llama-135m,AMD,0.135,2024/09/01,EN-available
@@ -139,6 +156,7 @@ Reka Flash,Reka AI,21.0,2024/02/01,EN-available
Gemma,Google DeepMind,7.0,2024/02/01,EN-available
Gemini 1.5 Pro,Google DeepMind,1500.0,2024/02/01,EN-available
Qwen-1.5 72B,Alibaba,72.0,2024/02/01,EN-available
MobileLLM,Meta AI,1.0,2024/02/01,EN-available
GOODY-2,BRAIN,,2024/02/01,EN-available
Natural-SQL-7B,ChatDB,7.0,2024/02/01,EN-available
Sea-Lion,AI Singapore,7.5,2024/02/01,EN-available
@@ -319,6 +337,7 @@ ruGPT-3,Huawei/Sberbank,1.3,2021/02/01,EN-available
Switch,Google,1600.0,2021/01/01,EN-available
GPT-3,OpenAI,175.0,2020/05/01,EN-available
SFR-LLaMA-3.1-70B-Judge,Salesforce,70.0,2024/09/01,EN-unavailable
Unnamed 1T,China Telecom Artificial Intelligence Research Institute,1000.0,2024/09/01,EN-unavailable
LTM-2-mini,Magic,20.0,2024/08/01,EN-unavailable
SpreadsheetLLM,Microsoft,1760.0,2024/07/01,EN-unavailable
FLAMe,Google DeepMind,24.0,2024/07/01,EN-unavailable
1 change: 1 addition & 0 deletions figures/scripts/parameter_size_overview_ja.csv
@@ -1,5 +1,6 @@
Model,Lab,Parameters(B),Announced,Type,Source(JP)
日本語版 Gemma 2 2B,Google,2,2024/10/3,JP-available-CP,https://developers-jp.googleblog.com/2024/10/gemma-2-for-japan.html
Sarashina2-8x70B,SB Intuitions,465,2024/11/8,JP-available,https://www.sbintuitions.co.jp/news/press/20241108_01/
Sarashina2-70b,SB Intuitions,70,2024/8/7,JP-available,https://huggingface.co/sbintuitions/sarashina2-70b
Sarashina,SB Intuitions,65,2024/6/14,JP-available,https://www.sbintuitions.co.jp/news/press/20240614_01/
Takane,Fujitsu,104,2024/9/30,JP-unavailable,https://pr.fujitsu.com/jp/news/2024/09/30.html
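Both CSV files above feed the regenerated `parameter_size_overview` figures in this commit. The repository's actual plotting script is not part of this diff, so the following is only a hypothetical sketch of how the shared schema (`Model,Lab,Parameters(B),Announced,Type`; the Japanese file adds a `Source(JP)` column) could be turned into such a chart.

```python
# Hypothetical sketch: regenerating a parameter-size overview chart from the
# CSV schema shown above. Column handling is an assumption, since the
# repository's real plotting script is not included in this diff.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("figures/scripts/parameter_size_overview.csv")
df["Announced"] = pd.to_datetime(df["Announced"])
df = df.dropna(subset=["Parameters(B)"])  # some rows (e.g. Claude) omit size

fig, ax = plt.subplots(figsize=(12, 6))
for model_type, group in df.groupby("Type"):
    ax.scatter(group["Announced"], group["Parameters(B)"],
               label=model_type, s=12)

ax.set_yscale("log")  # sizes span 0.135B to 1600B, so a log axis is natural
ax.set_xlabel("Announcement date")
ax.set_ylabel("Parameters (billions)")
ax.legend()
fig.savefig("parameter_size_overview_sketch.png", dpi=150)
```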
1 change: 1 addition & 0 deletions fr/README.md
@@ -35,6 +35,7 @@ Please point out any errors on the [issues](https://github.com/l

| | Architecture | Max Context Length | Training Data | Developer | License / Terms of Use |
|:---|:---:|:---:|:---:|:---:|:---:|
| [Sarashina2-8x70B](https://www.sbintuitions.co.jp/news/press/20241108_01/) | Mixtral<br>([8x70b (**465b**)](https://huggingface.co/sbintuitions/sarashina2-8x70b)) | 8,192 | undisclosed | SB Intuitions | Sarashina Model NonCommercial License |
| [LLM-jp-3 172B beta1](https://www.nii.ac.jp/en/news/release/2024/0917.html) | Llama<br>([**172b**-beta1](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1), [**172b**-beta1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1-instruct)) | 4,096 | Pre-training: part of [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)<br>(**0.7T** tokens)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | Research and Development Center for Large Language Models (LLMC) | LLM-jp-3 172B beta1 Terms of Use |
| [LLM-jp-3 172B alpha](https://llmc.nii.ac.jp/en/topics/llm-jp-3-172b-alpha1-alpha2/) | Llama<br>([**172b**-alpha1](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1), [**172b**-alpha1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1-instruct), [**172b**-alpha2](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2), [**172b**-alpha2-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2-instruct)) | 4,096 | Pre-training: part of [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)<br>(alpha1: **0.7T** tokens, alpha2: **1.4T** tokens)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | Research and Development Center for Large Language Models (LLMC) | Apache 2.0 |
| [Stockmark-100b](https://huggingface.co/stockmark/stockmark-100b) | Llama<br>([**100b**](https://huggingface.co/stockmark/stockmark-100b), [**100b**-instruct-v0.1](https://huggingface.co/stockmark/stockmark-100b-instruct-v0.1)) | 4,096 | Pre-training: RedPajama, Japanese Wikipedia, Japanese mC4, Japanese CommonCrawl, Japanese Patent, Stockmark Web Corpus<br>(**910B** tokens)<br>Instruction Tuning (LoRA): [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/) | Stockmark | MIT |
