A new way to backprop through C[Xb] instead of a for loop #55

haduoken · 2024-06-13T09:13:12Z

@karpathy
While watching your zero_to_hero series on YouTube (which I think is awesome),
I came up with a new way to backprop through C[Xb]. (I have also posted this as a reply on YouTube.)

The original method looks like this:
    dC = torch.zeros_like(C)
    for k in range(Xb.shape[0]):
        for j in range(Xb.shape[1]):
            ix = Xb[k,j]
            dC[ix] += demb[k,j]

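My method looks like this (num_classes is passed explicitly so that dC has the same shape as C even when the batch does not contain the largest index):
    dC = (F.one_hot(Xb, num_classes=C.shape[0]).float().transpose(1, 2) @ demb).sum(0)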

I checked that the resulting gradient matches.

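This works because the index lookup C[Xb] is equivalent to a matrix multiplication one_hot(Xb) @ C. We can therefore apply the usual backprop rule for matrix multiplication: dC is the transposed one-hot matrix times demb, summed over the batch dimension.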
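
To make the equivalence concrete, here is a minimal sketch that compares the double loop with the one-hot version on random tensors. The sizes (vocab_size, n_embd, batch_size, block_size) and the random demb are assumptions chosen only for illustration; they are not taken from the lecture notebook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, for illustration only.
vocab_size, n_embd, batch_size, block_size = 27, 10, 32, 3

C = torch.randn(vocab_size, n_embd)                           # embedding table (V, E)
Xb = torch.randint(0, vocab_size, (batch_size, block_size))   # token indices (B, T)
demb = torch.randn(batch_size, block_size, n_embd)            # stand-in upstream gradient (B, T, E)

# Reference: accumulate each row of demb into the row of dC it was looked up from.
dC_loop = torch.zeros_like(C)
for k in range(Xb.shape[0]):
    for j in range(Xb.shape[1]):
        ix = Xb[k, j]
        dC_loop[ix] += demb[k, j]

# One-hot version: (B, V, T) @ (B, T, E) -> (B, V, E), then sum over the batch -> (V, E).
Xb_ohe = F.one_hot(Xb, num_classes=vocab_size).float()        # (B, T, V)
dC_onehot = (Xb_ohe.transpose(1, 2) @ demb).sum(0)

loop_matches_onehot = torch.allclose(dC_loop, dC_onehot, atol=1e-5)
print(loop_matches_onehot)
```

Note that the one-hot tensor has shape (B, T, V), so this trades the Python loop for extra memory that grows with the vocabulary size.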

afrozenator · 2024-07-19T03:20:59Z

+1, it can be written as:

# scatter the gradient via OHE
Xb_ohe = F.one_hot(Xb, num_classes=vocab_size).float()
dC = torch.einsum('ble,blv->ve', demb, Xb_ohe)
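
As a sanity check, the einsum form can be compared against the gradient that autograd computes for C[Xb]. This is only a sketch: the tensor sizes and the random demb below are assumptions made for the example, not values from the lecture.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, for illustration only.
vocab_size, n_embd, batch_size, block_size = 27, 10, 32, 3

C = torch.randn(vocab_size, n_embd, requires_grad=True)
Xb = torch.randint(0, vocab_size, (batch_size, block_size))

emb = C[Xb]                    # forward lookup, shape (B, T, E)
demb = torch.randn_like(emb)   # stand-in for the gradient flowing back into emb
emb.backward(demb)             # autograd fills C.grad

# scatter the gradient via OHE
Xb_ohe = F.one_hot(Xb, num_classes=vocab_size).float()
dC = torch.einsum('ble,blv->ve', demb, Xb_ohe)

einsum_matches_autograd = torch.allclose(dC, C.grad, atol=1e-5)
print(einsum_matches_autograd)
```

The einsum contracts over the batch (b) and position (l) axes in one step, so it needs no explicit transpose or sum.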
