Can a loss be negative? #1

Open
educob opened this issue May 2, 2019 · 5 comments


educob commented May 2, 2019

Hi.
Yesterday I adapted the net to run with PyTorch 1.1.
After 40 epochs its writing was not good at all, and the errors are not improving.

But what confused me most is that the loss is often negative. Can a loss be negative?

Thanks.

naba89 (Owner) commented May 2, 2019

Hi

Well, the code used to work on 0.4; I have not tested it on 1.1 yet, and some APIs may have changed.
The loss can be negative. What does your training graph look like? Does it generally follow the one I shared in the README?
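To see why a negative value is plausible: assuming PredictionLoss is the usual mixture-density negative log-likelihood over real-valued pen offsets (the --num_mixture option in train.py points that way), the loss is the negative log of a probability density, not of a probability. A density can exceed 1 when the predicted Gaussians are narrow, so its negative log can dip below zero. A minimal sketch with a single Gaussian component:

import torch

# Hypothetical single-component illustration (not the repo's loss_functions.py):
# when the predicted standard deviation is small, the density at the target
# exceeds 1, so the negative log-likelihood becomes negative.
mu = torch.tensor(0.0)
sigma = torch.tensor(0.05)
target = torch.tensor(0.01)

nll = -torch.distributions.Normal(mu, sigma).log_prob(target)
print(nll.item())  # about -2.06: a perfectly valid negative "loss"

So a training curve that dips below zero is not by itself a sign that something is broken.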

And what exactly do you mean by the writing not being good? Is it very small, not on a line, do some strokes run very long, etc.?

However, there is one thing I would like you to modify and check:

In line 63 of model.py

Change this line:
if self.hidden is not None:
to:
if self.hidden is not None and self.training:

Check if that works well while generating sequences.
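
For reference, here is a minimal sketch of where that condition might sit, assuming a forward method that caches the LSTM state between calls. The surrounding skeleton is an assumption for illustration only; just the changed condition comes from the suggestion above:

import torch.nn as nn

class RNNPredictNet(nn.Module):  # hypothetical skeleton, not the actual model.py
    def __init__(self, args):
        super().__init__()
        self.lstm = nn.LSTM(args.input_size, args.hidden_size,
                            num_layers=args.num_layers, batch_first=True)
        self.hidden = None

    def forward(self, x):
        # Reuse the cached state only in training mode (the suggested change);
        # after model.eval(), e.g. while generating sequences, start from a
        # fresh hidden state instead.
        if self.hidden is not None and self.training:
            output, self.hidden = self.lstm(x, self.hidden)
        else:
            output, self.hidden = self.lstm(x)
        # Detach so the cached state does not keep the previous graph alive
        # across backward() calls (part of this illustrative skeleton only).
        self.hidden = tuple(h.detach() for h in self.hidden)
        return output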

educob (Author) commented May 2, 2019

Hi.
The loss starts between 2.5 and 3.5 and goes down quite fast to 0.3-0.5 (positive and negative). I don't know how to start the TensorBoard thing, but in principle it looks similar to yours.
But the writing is just gibberish. It doesn't look like writing at all.

I made the change with self.training but the result looks the same as before.

Thanks for the code.

This is my modified train.py:

import argparse
import os
import pickle
import time

import numpy as np

import torch.optim as optim
import torch
from torch.autograd import Variable

from loss_functions import PredictionLoss
from model import RNNPredictNet
from utils import DataLoader
from sample import sample_stroke

from tensorboardX import SummaryWriter

writer = SummaryWriter()
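# Note: with no arguments, tensorboardX's SummaryWriter writes event files under
# ./runs/<timestamp> by default; running `tensorboard --logdir runs` and opening
# http://localhost:6006 should show the loss curves discussed above.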


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def main():
	parser = argparse.ArgumentParser()
	parser.add_argument('--input_size', type=int, default=3,
						help='input num features')
	parser.add_argument('--hidden_size', type=int, default=256,
						help='size of RNN hidden state')
	parser.add_argument('--num_layers', type=int, default=2,
						help='number of layers in the RNN')
	parser.add_argument('--bidirectional', type=bool, default=False,
						help='use BLSTM')
	parser.add_argument('--batch_size', type=int, default=50,
						help='batch size')
	parser.add_argument('--seq_length', type=int, default=300,
						help='RNN sequence length')
	parser.add_argument('--num_epochs', type=int, default=300,
						help='number of epochs')
	parser.add_argument('--save_every', type=int, default=100,
						help='save frequency')
	parser.add_argument('--model_dir', type=str, default='save',
						help='directory to save model to')
	parser.add_argument('--grad_clip', type=float, default=10.,
						help='clip gradients at this value')
	parser.add_argument('--learning_rate', type=float, default=0.005,
						help='learning rate')
	parser.add_argument('--decay_rate', type=float, default=0.95,
						help='decay rate for rmsprop')
	parser.add_argument('--num_mixture', type=int, default=20,
						help='number of gaussian mixtures')
	parser.add_argument('--data_scale', type=float, default=20,
						help='factor to scale raw data down by')
	parser.add_argument('--keep_prob', type=float, default=0.8,
						help='dropout keep probability')
	parser.add_argument('--validate_every', type=int, default=10,
						help='frequency of validation')
	args = parser.parse_args()
	train(args)


def train(args):
	data_loader = DataLoader(args.batch_size, args.seq_length, args.data_scale)

	if args.model_dir != '' and not os.path.exists(args.model_dir):
		os.makedirs(args.model_dir)

	with open(os.path.join(args.model_dir, 'config.pkl'), 'wb') as f:
		pickle.dump(args, f)

	model = RNNPredictNet(args).to(device)

	loss_fn = PredictionLoss(args.batch_size, args.seq_length)
	optimizer = optim.Adam(model.parameters(), lr=args.learning_rate)
	lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer=optimizer, gamma=args.decay_rate)

	#training_loss = []
	#validation_loss = []

	for e in range(args.num_epochs):
		data_loader.reset_batch_pointer()
		v_x, v_y = data_loader.validation_data()
		v_x = torch.FloatTensor(v_x).to(device)
		v_y = torch.FloatTensor(v_y).to(device)

		for b in range(data_loader.num_batches):
			model.train()
			train_step = e * data_loader.num_batches + b
			start = time.time()

			x, y = data_loader.next_batch()
			x = torch.FloatTensor(x).to(device)
			y = torch.FloatTensor(y).to(device)

			optimizer.zero_grad()
			output = model(x)

			train_loss = loss_fn(output, y)

			train_loss.backward()
			torch.nn.utils.clip_grad_norm_(model.parameters(), args.grad_clip)

			optimizer.step()

			#training_loss.append(train_loss.data[0])
			writer.add_scalar('Training Loss', train_loss.item(), train_step)

			model.eval()
			with torch.no_grad():
				output = model(v_x)
				val_loss = loss_fn(output, v_y)
			#validation_loss.append(val_loss.data[0])

			end = time.time()

			print(
				"{}/{} (epoch {}), train_loss = {:.3f}, valid_loss = {:.3f}, time/batch = {:.3f}"
					.format(
					train_step,
					args.num_epochs * data_loader.num_batches,
					e,
					train_loss.item(),
					val_loss.item(),
					end - start))

			if (train_step % args.save_every == 0) and (train_step > 0):

				checkpoint_path = os.path.join(args.model_dir, 'model.pth')
				torch.save({
					'model': model.state_dict(),
					'optimizer': optimizer.state_dict(),
					'epoch': e,
					'current_lr': args.learning_rate * (args.decay_rate ** e)
				},	checkpoint_path)

				_, img = sample_stroke() # error svg
				#print("model saved to {}".format(checkpoint_path))
		lr_scheduler.step()


if __name__ == '__main__':
	main()
	writer.close()

naba89 (Owner) commented May 4, 2019

Aah, so by gibberish you mean there are no words? That is expected, because the current version is unconditional: it just randomly generates strokes that look like handwriting. I did not get around to implementing the conditional version, where you can essentially give a word as input and the model will write that word in the handwriting.

educob (Author) commented May 4, 2019 via email

Weifeilong611 commented

Sorry, I don't know why the loss would be negative.
Following your computational formula, it should not be a negative value.
Can you give me some details about this situation?
Thanks!
