basenji_read overflow #196

ElArquitectorgo · 2024-05-20T14:53:05Z

Hi,

I am trying to do a study similar to the Enformer study for my final thesis, and to do so I have downloaded 4505 ENCODE tracks.
When I ran the basenji_data script, I encountered the following warning numerous times:

/mnt2/fscratch/users/ac_aux/vguirado/preprocess/bin/basenji_data_read.py:307: RuntimeWarning: overflow encountered in cast
  cov = self.cov_open.values(chrm, start, end, numpy=True).astype('float16')

The code:

#SBATCH --job-name=preprocess
#SBATCH --time=0-30:0
#SBATCH --mem=50G
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
##SBATCH --ntasks-per-node=1
#SBATCH --constraint=cal

...more sbatch things...

time python bin/basenji_data.py -s .9 -g data/hg38_gaps.bed -l 196608 --local -o data/human -p 128 -v .1 -w 128 data/genome.fa data/human_data.txt

I would like to know whether this can affect the generation of the TFRecords, since during training I am seeing extremely strange behavior, as I show below:

These plots come from restoring a checkpoint at epoch 80 and training 50 more epochs up to 130 (first graph), restoring the checkpoint from epoch 130 and training to 180 (second), and restoring from 180 and training to 230 (right). Here I am using a small subsample, but the same happens with the whole dataset (with an even worse loss).

Apparently my training code is fine, because I tried loading the public Enformer checkpoint and modifying the output to train on the same subset, and there I do get results. That is, I keep the already trained trunk and add a single linear layer on top.

But when training from scratch, and also including 1019 tracks for mouse, the model is not able to learn anything. The R^2 values are 0 or negative no matter how many steps I train.

So it occurs to me that the problem is in the generation of the TFRecords, but this was the only warning I found.

Thank you for your time.

davek44 · 2024-05-28T04:33:24Z

Hi, it appears that the tracks you downloaded have values above the float16 max. You could change the code to use float32, or explicitly clip the values. All active development on this software is now here: https://github.com/calico/baskerville
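
For readers hitting the same warning, here is a minimal sketch of the clipping option, using only NumPy. It is not the actual basenji or baskerville code: the helper name `to_float16_clipped` and its `clip_max` parameter are hypothetical, and the sample values are made up for illustration. The largest finite float16 value is 65504, so anything above it becomes `inf` in a plain cast, which is what the RuntimeWarning reports.

```python
import numpy as np

# Largest finite value representable in float16 (65504.0).
F16_MAX = float(np.finfo(np.float16).max)


def to_float16_clipped(cov, clip_max=F16_MAX):
    """Clip coverage values to a finite range, then cast to float16.

    Hypothetical helper: `clip_max` defaults to the float16 maximum,
    but a lower, data-driven threshold could be passed instead.
    """
    cov = np.asarray(cov, dtype=np.float32)
    return np.clip(cov, -clip_max, clip_max).astype(np.float16)


# Made-up coverage values; the last one exceeds the float16 range.
raw = np.array([0.5, 1000.0, 70000.0], dtype=np.float32)

# Plain cast, as in the line that triggers the warning: 70000 overflows to inf.
with np.errstate(over="ignore"):
    naive = raw.astype(np.float16)

# Clipped cast: the large value saturates at 65504 and stays finite.
clipped = to_float16_clipped(raw)

print(naive)
print(clipped)
```

The other option mentioned above, float32, would avoid the overflow without discarding large values, at the cost of twice the memory per value; whether the rest of the pipeline accepts float32 at that point is something to check in the code itself. Either way, `inf` values in the targets are a plausible way for a loss to misbehave, so it may be worth confirming that the generated targets are all finite.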
