Research question: can we continually pre-train a fully open model (like OLMo) instead of the closed-data Mistral and match its performance?
Openness rankings can be found here.
The main candidate models are OLMo and Apertus, with OLMo preferred because it comes in the right size (13B).
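To make the starting point concrete, continual pre-training of OLMo can be prototyped with the standard Hugging Face Transformers APIs. This is only a minimal sketch, not the final LUMI setup: the checkpoint id and the Norwegian data file are assumptions, and hyperparameters are placeholders.

```python
# Minimal sketch of continual pre-training for OLMo with Hugging Face
# Transformers + Trainer. The checkpoint id and the Norwegian dataset
# path are assumptions, not a confirmed setup.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
)
from datasets import load_dataset

MODEL_ID = "allenai/OLMo-2-1124-13B"  # assumed 13B OLMo 2 checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Hypothetical Norwegian corpus in JSONL format with a "text" field.
raw = load_dataset("json", data_files="norwegian_corpus.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="olmo2-13b-norwegian",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    learning_rate=1e-5,   # a low LR is typical for continual pre-training
    num_train_epochs=1,
    bf16=True,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

On LUMI this would of course be wrapped in a proper distributed setup (FSDP or Megatron-style parallelism) rather than a single-process Trainer call.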
We start with Norwegian but aim to add 2-3 more languages in the end. For this, we have at most 250K GPU hours on LUMI.
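A rough back-of-envelope check of what 250K GPU hours buys per language; the per-GPU throughput is a placeholder assumption to be replaced with measured LUMI figures for the chosen model.

```python
# Back-of-envelope token budget for the LUMI allocation.
# All throughput numbers are placeholder assumptions, not measurements.
GPU_HOURS_TOTAL = 250_000
N_LANGUAGES = 3               # Norwegian plus 2 more, per the plan above
TOKENS_PER_GPU_SECOND = 400   # assumed per-GPU throughput for a 13B model

gpu_hours_per_language = GPU_HOURS_TOTAL / N_LANGUAGES
tokens_per_language = gpu_hours_per_language * 3600 * TOKENS_PER_GPU_SECOND

print(f"~{gpu_hours_per_language:,.0f} GPU hours per language")
print(f"~{tokens_per_language / 1e9:.0f}B tokens per language at the assumed throughput")
```

With these assumed numbers this works out to roughly 83K GPU hours and on the order of 100B tokens per language, which is one way to frame how large each mixture can realistically be.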
For Norwegian, the data mixture will probably include HPLT v3, FinePDFs, and MadLad, but this is still up for discussion.
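To give the mixture discussion something concrete to start from, a sampling-weight config could look like the sketch below. The weights are purely illustrative and the source keys are just shorthand for the candidate corpora listed above.

```python
# Illustrative sampling weights for the Norwegian mixture; the exact
# proportions are placeholders and up for discussion, as noted above.
NORWEGIAN_MIXTURE = {
    "hplt_v3": 0.60,   # large web-crawled corpus, assumed to dominate
    "finepdfs": 0.25,  # higher-quality PDF-derived text
    "madlad": 0.15,    # additional web data for coverage
}

assert abs(sum(NORWEGIAN_MIXTURE.values()) - 1.0) < 1e-9

def tokens_per_source(total_tokens: float) -> dict[str, float]:
    """Split a total token budget across sources according to the weights."""
    return {name: total_tokens * w for name, w in NORWEGIAN_MIXTURE.items()}

print(tokens_per_source(100e9))  # e.g. a ~100B-token budget as sketched above
```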