
Continued pre-training from a fully open model #9


Description

@akutuzov

Research question: can we continue pre-training a fully open model (like OLMo) instead of the closed-data Mistral and get the same performance?

Openness rankings can be found here

The main candidate models are OLMo and Apertus, with OLMo preferred because it comes in the right size (13B).
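
For concreteness, here is a minimal sketch of what the continued pre-training setup could look like with Hugging Face `transformers`. The OLMo 2 13B checkpoint id, the corpus path, and all hyperparameters below are placeholders, not decisions:

```python
# Minimal continued pre-training sketch with Hugging Face transformers.
# Checkpoint id, corpus path, and hyperparameters are assumptions / placeholders.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

MODEL_NAME = "allenai/OLMo-2-1124-13B"  # assumed OLMo 2 13B checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype="bfloat16")

# Placeholder corpus: any plain-text dataset with a "text" column.
corpus = load_dataset("text", data_files={"train": "norwegian_corpus/*.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="olmo2-13b-cpt-nob",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,
    learning_rate=1e-5,          # low LR to limit forgetting of the base model
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    bf16=True,
    logging_steps=50,
    save_steps=1000,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice the run on LUMI would go through a multi-node launcher (e.g. FSDP or Megatron-style sharding) rather than a single-process `Trainer`, but the ingredients are the same: base checkpoint, tokenized monolingual corpus, causal LM objective.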

We start with Norwegian, but aim at 2-3 more languages in the end. For this, we have a maximum of 250K GPU hours on LUMI.

For Norwegian, the data mixture will probably include HPLT v3, FinePDFs, and MadLad, but this is up for discussion.
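
As a starting point for that discussion, a rough sketch of how the mixture could be prototyped with the `datasets` library; all hub ids, language configs, and sampling weights below are placeholders, not the agreed sources:

```python
# Sketch of a weighted data mixture over the candidate Norwegian sources.
# Dataset ids, configs, splits, and probabilities are assumptions / placeholders.
from datasets import load_dataset, interleave_datasets

# Hypothetical hub ids for the three candidate sources (Norwegian subsets).
hplt = load_dataset("HPLT/hplt_v3", "nob_Latn", split="train", streaming=True)
finepdfs = load_dataset("HuggingFaceFW/finepdfs", "nob_Latn", split="train", streaming=True)
madlad = load_dataset("allenai/MADLAD-400", "no", split="clean", streaming=True)

# Interleave the streams with sampling probabilities (weights to be decided).
mixture = interleave_datasets(
    [hplt, finepdfs, madlad],
    probabilities=[0.5, 0.3, 0.2],
    seed=42,
    stopping_strategy="all_exhausted",
)

# Peek at a few examples from the streamed mixture.
for example in mixture.take(3):
    print(list(example))
```

The actual weights should probably be set from deduplicated token counts and quality filtering of each source rather than picked by hand.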
