Research question: can we continually pre-train a fully open model (like OLMo) instead of the closed-data Mistral and match its performance?
Openness rankings can be found here.
The main candidate models are OLMo and Apertus, with OLMo preferred because it comes in the right size (13B).
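To make the starting point concrete, continual pre-training of OLMo can be prototyped with the standard Hugging Face Transformers APIs. This is only a minimal sketch, not the final LUMI setup: the checkpoint id and the Norwegian data file are assumptions, and hyperparameters are placeholders.

```python
# Minimal sketch of continual pre-training for OLMo with Hugging Face
# Transformers + Trainer. The checkpoint id and the Norwegian dataset
# path are assumptions, not a confirmed setup.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
)
from datasets import load_dataset

MODEL_ID = "allenai/OLMo-2-1124-13B"  # assumed 13B OLMo 2 checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Hypothetical Norwegian corpus in JSONL format with a "text" field.
raw = load_dataset("json", data_files="norwegian_corpus.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="olmo2-13b-norwegian",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    learning_rate=1e-5,   # a low LR is typical for continual pre-training
    num_train_epochs=1,
    bf16=True,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

On LUMI this would of course be wrapped in a proper distributed setup (FSDP or Megatron-style parallelism) rather than a single-process Trainer call.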
We start with Norwegian but aim to add 2-3 more languages in the end. For this, we have at most 250K GPU hours on LUMI.
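A rough back-of-envelope check of what 250K GPU hours buys per language; the per-GPU throughput is a placeholder assumption to be replaced with measured LUMI figures for the chosen model.

```python
# Back-of-envelope token budget for the LUMI allocation.
# All throughput numbers are placeholder assumptions, not measurements.
GPU_HOURS_TOTAL = 250_000
N_LANGUAGES = 3               # Norwegian plus 2 more, per the plan above
TOKENS_PER_GPU_SECOND = 400   # assumed per-GPU throughput for a 13B model

gpu_hours_per_language = GPU_HOURS_TOTAL / N_LANGUAGES
tokens_per_language = gpu_hours_per_language * 3600 * TOKENS_PER_GPU_SECOND

print(f"~{gpu_hours_per_language:,.0f} GPU hours per language")
print(f"~{tokens_per_language / 1e9:.0f}B tokens per language at the assumed throughput")
```

With these assumed numbers this works out to roughly 83K GPU hours and on the order of 100B tokens per language, which is one way to frame how large each mixture can realistically be.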
For Norwegian, the data mixture will probably include HPLT v3, FinePDFs, and MadLad, but this is still up for discussion.
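To give the mixture discussion something concrete to start from, a sampling-weight config could look like the sketch below. The weights are purely illustrative and the source keys are just shorthand for the candidate corpora listed above.

```python
# Illustrative sampling weights for the Norwegian mixture; the exact
# proportions are placeholders and up for discussion, as noted above.
NORWEGIAN_MIXTURE = {
    "hplt_v3": 0.60,   # large web-crawled corpus, assumed to dominate
    "finepdfs": 0.25,  # higher-quality PDF-derived text
    "madlad": 0.15,    # additional web data for coverage
}

assert abs(sum(NORWEGIAN_MIXTURE.values()) - 1.0) < 1e-9

def tokens_per_source(total_tokens: float) -> dict[str, float]:
    """Split a total token budget across sources according to the weights."""
    return {name: total_tokens * w for name, w in NORWEGIAN_MIXTURE.items()}

print(tokens_per_source(100e9))  # e.g. a ~100B-token budget as sketched above
```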