diff --git a/README.md b/README.md
index 8a017cb..4dc4e97 100644
--- a/README.md
+++ b/README.md
@@ -36,6 +36,7 @@
 | | アーキテクチャ | 入出力で扱えるトークン数 | 学習テキスト | 開発元 | ライセンス / 利用規約 |
 |:---|:---:|:---:|:---:|:---:|:---:|
+| [Sarashina2-8x70B](https://www.sbintuitions.co.jp/news/press/20241108_01/) | Mixtral<br>([8x70b (**465b**)](https://huggingface.co/sbintuitions/sarashina2-8x70b)) | 8,192 | 不明 | SB Intuitions | Sarashina Model NonCommercial License |
 | [LLM-jp-3 172B beta1](https://www.nii.ac.jp/news/release/2024/0917.html) | Llama<br>([**172b**-beta1](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1), [**172b**-beta1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1-instruct)) | 4,096 | 事前学習: [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)の一部<br>(計 **0.7T** トークン)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | 大規模言語モデル研究開発センター (LLMC) | LLM-jp-3 172B beta1 Terms of Use |
 | [LLM-jp-3 172B alpha](https://llmc.nii.ac.jp/topics/llm-jp-3-172b-alpha1-alpha2/) | Llama<br>([**172b**-alpha1](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1), [**172b**-alpha1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1-instruct), [**172b**-alpha2](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2), [**172b**-alpha2-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2-instruct)) | 4,096 | 事前学習: [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)の一部<br>(alpha1: 計 **0.7T** トークン, alpha2: 計 **1.4T** トークン)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | 大規模言語モデル研究開発センター (LLMC) | Apache 2.0 |
 | [Stockmark-100b](https://stockmark.co.jp/news/20240516) | Llama<br>([**100b**](https://huggingface.co/stockmark/stockmark-100b), [**100b**-instruct-v0.1](https://huggingface.co/stockmark/stockmark-100b-instruct-v0.1)) | 4,096 | 事前学習: RedPajama, 日本語 Wikipedia, Japanese mC4, Japanese CommonCrawl, 日本語特許, Stockmark Web Corpus<br>(計 **910B** トークン)<br>Instruction Tuning (LoRA): [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/) | ストックマーク | MIT |
diff --git a/en/README.md b/en/README.md
index e7fc7f0..7aa60f4 100644
--- a/en/README.md
+++ b/en/README.md
@@ -35,6 +35,7 @@ Please point out any errors on the [issues page](https://github.com/llm-jp/aweso
 | | Architecture | Max Context Length | Training Data | Developer | License / Terms of Use |
 |:---|:---:|:---:|:---:|:---:|:---:|
+| [Sarashina2-8x70B](https://www.sbintuitions.co.jp/news/press/20241108_01/) | Mixtral<br>([8x70b (**465b**)](https://huggingface.co/sbintuitions/sarashina2-8x70b)) | 8,192 | undisclosed | SB Intuitions | Sarashina Model NonCommercial License |
 | [LLM-jp-3 172B beta1](https://www.nii.ac.jp/en/news/release/2024/0917.html) | Llama<br>([**172b**-beta1](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1), [**172b**-beta1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1-instruct)) | 4,096 | Pre-training: part of [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)<br>(**0.7T** tokens)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | Research and Development Center for Large Language Models (LLMC) | LLM-jp-3 172B beta1 Terms of Use |
 | [LLM-jp-3 172B alpha](https://llmc.nii.ac.jp/en/topics/llm-jp-3-172b-alpha1-alpha2/) | Llama<br>([**172b**-alpha1](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1), [**172b**-alpha1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1-instruct), [**172b**-alpha2](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2), [**172b**-alpha2-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2-instruct)) | 4,096 | Pre-training: part of [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)<br>(alpha1: **0.7T** tokens, alpha2: **1.4T** tokens)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | Research and Development Center for Large Language Models (LLMC) | Apache 2.0 |
 | [Stockmark-100b](https://huggingface.co/stockmark/stockmark-100b) | Llama<br>([**100b**](https://huggingface.co/stockmark/stockmark-100b), [**100b**-instruct-v0.1](https://huggingface.co/stockmark/stockmark-100b-instruct-v0.1)) | 4,096 | Pre-training: RedPajama, Japanese Wikipedia, Japanese mC4, Japanese CommonCrawl, Japanese Patent, Stockmark Web Corpus<br>(**910B** tokens)<br>Instruction Tuning (LoRA): [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/) | Stockmark | MIT |
diff --git a/figures/parameter_size_overview_en.png b/figures/parameter_size_overview_en.png
index 454f9a2..bc0273d 100644
Binary files a/figures/parameter_size_overview_en.png and b/figures/parameter_size_overview_en.png differ
diff --git a/figures/parameter_size_overview_ja.png b/figures/parameter_size_overview_ja.png
index f3674de..b1726a0 100644
Binary files a/figures/parameter_size_overview_ja.png and b/figures/parameter_size_overview_ja.png differ
diff --git a/figures/scripts/parameter_size_overview.csv b/figures/scripts/parameter_size_overview.csv
index a1bf9a8..81db348 100644
--- a/figures/scripts/parameter_size_overview.csv
+++ b/figures/scripts/parameter_size_overview.csv
@@ -1,5 +1,22 @@
 Model,Lab,Parameters(B),Announced,Type
+Hunyuan-Large,Tencent,389.0,2024/11/01,EN-available
+SEA-LIONv3,AI Singapore,9.24,2024/11/01,EN-available
+AMD OLMo,AMD,1.0,2024/11/01,EN-available
+SmolLM2,Hugging Face,1.7,2024/11/01,EN-available
+Aya-Expanse-32B,Cohere,32.0,2024/10/01,EN-available
+Claude 3.5 Sonnet (new),Anthropic,,2024/10/01,EN-available
+Granite 3.0 8B,IBM,8.0,2024/10/01,EN-available
+Granite-3.0-3B-A800M-Instruct,IBM,3.0,2024/10/01,EN-available
+aiXcoder-7B,aiXcoder,7.0,2024/10/01,EN-available
+Llama-3.1-Nemotron-70B,NVIDIA,70.0,2024/10/01,EN-available
+Ministral 8B,Mistral,8.0,2024/10/01,EN-available
+Yi-Lightning,01-ai,200.0,2024/10/01,EN-available
+Zamba2-7B,Zyphra,7.0,2024/10/01,EN-available
+nGPT,NVIDIA,1.0,2024/10/01,EN-available
+Inflection-3 Pi (3.0),Inflection AI,1200.0,2024/10/01,EN-available
+Inflection-3 Productivity (3.0),Inflection AI,1200.0,2024/10/01,EN-available
 LFM-40B,Liquid AI,40.0,2024/09/01,EN-available
+Emu3,BAAI,8.0,2024/09/01,EN-available
 NLVM 1.0,NVIDIA,72.0,2024/09/01,EN-available
 TeleChat2-115B,China Telecom Artificial Intelligence Research Institute,115.0,2024/09/01,EN-available
 AMD-Llama-135m,AMD,0.135,2024/09/01,EN-available
@@ -139,6 +156,7 @@ Reka Flash,Reka AI,21.0,2024/02/01,EN-available
 Gemma,Google DeepMind,7.0,2024/02/01,EN-available
 Gemini 1.5 Pro,Google DeepMind,1500.0,2024/02/01,EN-available
 Qwen-1.5 72B,Alibaba,72.0,2024/02/01,EN-available
+MobileLLM,Meta AI,1.0,2024/02/01,EN-available
 GOODY-2,BRAIN,,2024/02/01,EN-available
 Natural-SQL-7B,ChatDB,7.0,2024/02/01,EN-available
 Sea-Lion,AI Singapore,7.5,2024/02/01,EN-available
@@ -319,6 +337,7 @@ ruGPT-3,Huawei/Sberbank,1.3,2021/02/01,EN-available
 Switch,Google,1600.0,2021/01/01,EN-available
 GPT-3,OpenAI,175.0,2020/05/01,EN-available
 SFR-LLaMA-3.1-70B-Judge,Salesforce,70.0,2024/09/01,EN-unavailable
+Unnamed 1T,China Telecom Artificial Intelligence Research Institute,1000.0,2024/09/01,EN-unavailable
 LTM-2-mini,Magic,20.0,2024/08/01,EN-unavailable
 SpreadsheetLLM,Microsoft,1760.0,2024/07/01,EN-unavailable
 FLAMe,Google DeepMind,24.0,2024/07/01,EN-unavailable
diff --git a/figures/scripts/parameter_size_overview_ja.csv b/figures/scripts/parameter_size_overview_ja.csv
index 434a1c8..6f4da61 100644
--- a/figures/scripts/parameter_size_overview_ja.csv
+++ b/figures/scripts/parameter_size_overview_ja.csv
@@ -1,5 +1,6 @@
 Model,Lab,Parameters(B),Announced,Type,Source(JP)
 日本語版 Gemma 2 2B,Google,2,2024/10/3,JP-available-CP,https://developers-jp.googleblog.com/2024/10/gemma-2-for-japan.html
+Sarashina2-8x70B,SB Intuitions,465,2024/11/8,JP-available,https://www.sbintuitions.co.jp/news/press/20241108_01/
 Sarashina2-70b,SB Intuitions,70,2024/8/7,JP-available,https://huggingface.co/sbintuitions/sarashina2-70b
 Sarashina,SB Intuitions,65,2024/6/14,JP-available,https://www.sbintuitions.co.jp/news/press/20240614_01/
 Takane,Fujitsu,104,2024/9/30,JP-unavailable,https://pr.fujitsu.com/jp/news/2024/09/30.html
diff --git a/fr/README.md b/fr/README.md
index eb5c788..320f56c 100644
--- a/fr/README.md
+++ b/fr/README.md
@@ -35,6 +35,7 @@ N'hésitez pas à signaler les erreurs sur la page [issues](https://github.com/l
 | | Architecture | Longueur Maximale du Contexte | Données d'entraînement | Développeur | Licence / Conditions d'utilisation |
 |:---|:---:|:---:|:---:|:---:|:---:|
+| [Sarashina2-8x70B](https://www.sbintuitions.co.jp/news/press/20241108_01/) | Mixtral<br>([8x70b (**465b**)](https://huggingface.co/sbintuitions/sarashina2-8x70b)) | 8,192 | undisclosed | SB Intuitions | Sarashina Model NonCommercial License |
 | [LLM-jp-3 172B beta1](https://www.nii.ac.jp/en/news/release/2024/0917.html) | Llama<br>([**172b**-beta1](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1), [**172b**-beta1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1-instruct)) | 4,096 | Pre-training: part of [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)<br>(**0.7T** tokens)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | Research and Development Center for Large Language Models (LLMC) | LLM-jp-3 172B beta1 Terms of Use |
 | [LLM-jp-3 172B alpha](https://llmc.nii.ac.jp/en/topics/llm-jp-3-172b-alpha1-alpha2/) | Llama<br>([**172b**-alpha1](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1), [**172b**-alpha1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha1-instruct), [**172b**-alpha2](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2), [**172b**-alpha2-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-alpha2-instruct)) | 4,096 | Pre-training: part of [llm-jp-corpus-v3](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)<br>(alpha1: **0.7T** tokens, alpha2: **1.4T** tokens)<br>Instruction Tuning: [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/), [answer-carefully](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/), Dolly Dataset, OASST1, OASST2, Aya Dataset, ichikara-instruction-format, Daring-Anteater, FLAN | Research and Development Center for Large Language Models (LLMC) | Apache 2.0 |
 | [Stockmark-100b](https://huggingface.co/stockmark/stockmark-100b) | Llama<br>([**100b**](https://huggingface.co/stockmark/stockmark-100b), [**100b**-instruct-v0.1](https://huggingface.co/stockmark/stockmark-100b-instruct-v0.1)) | 4,096 | Pre-training: RedPajama, Wikipedia en japonais, Japanese mC4, Japanese CommonCrawl, Japanese Patent, Stockmark Web Corpus<br>(**910B** tokens)<br>Instruction Tuning (LoRA): [ichikara-instruction](https://liat-aip.sakura.ne.jp/wp/llm%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%A9%E3%82%AF%E3%82%B7%E3%83%A7%E3%83%B3%E3%83%87%E3%83%BC%E3%82%BF%E4%BD%9C%E6%88%90/) | Stockmark | MIT |
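
The regenerated figures in this diff (figures/parameter_size_overview_en.png and _ja.png) are produced from the updated CSVs under figures/scripts/; the actual plotting script is not part of the change. A minimal sketch, assuming pandas and matplotlib, of how the new rows in parameter_size_overview.csv could be loaded and plotted by announcement date and parameter count. The column names (Model, Lab, Parameters(B), Announced, Type) come from the CSV header in this diff; the file path, date format, log scale, and output settings are illustrative assumptions, not the repository's script.

```python
# Illustrative sketch only (not the repository's actual figure script):
# read the updated parameter_size_overview.csv and scatter-plot model sizes over time.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("figures/scripts/parameter_size_overview.csv")

# Some rows (e.g. "Claude 3.5 Sonnet (new)") have no parameter count; drop them.
df = df.dropna(subset=["Parameters(B)"])
df["Announced"] = pd.to_datetime(df["Announced"], format="%Y/%m/%d")

fig, ax = plt.subplots(figsize=(12, 6))
for type_name, group in df.groupby("Type"):  # e.g. EN-available / EN-unavailable
    ax.scatter(group["Announced"], group["Parameters(B)"], label=type_name, alpha=0.7)

ax.set_yscale("log")  # sizes span ~0.1B to >1000B, so a log axis is assumed
ax.set_xlabel("Announced")
ax.set_ylabel("Parameters (B)")
ax.legend()
fig.savefig("parameter_size_overview_sketch.png", dpi=150)
```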