Fine-tune cross-lingual translator for text2text generation #27
Comments
I will be working on this.
Awesome. For question generation, one approach to get started is to use the SQuAD dataset and pre-process it into
Here is the link to the Colab notebook for the analysis of the question answering and question generation APIs: https://colab.research.google.com/drive/1WzO_TP9Nn98AeKmicCYaNTRWezAp9OLt?usp=sharing
Thanks very much for sharing the notebook. I can recommend two things to try:
Thanks, Art. Feedback/blocker: when using the text2text fine-tuning API to get a pre-trained translator, I run out of space. I have experienced this on both AWS and Colab (I get a 'no space left on device' error message). I would appreciate any help I can get. I have attached screenshots here.
@artitw Oh okay, got it.
Thanks, Art. The next step is for me to report the test-set accuracy so that we can establish a benchmark.
Reviewed the notebook. It looks like the fine-tuning was not performed on question-generation data; rather, it was done using the translation example. Could you try the following format? I updated the API in the repo to avoid confusion with the
Oh, I see.
Hi @artitw, I have gone back to redo the work, and the question generation API actually works on a pre-trained translator: https://colab.research.google.com/drive/1WzO_TP9Nn98AeKmicCYaNTRWezAp9OLt?usp=sharing What strategy do you recommend for benchmarking the test-set accuracy?
For benchmarking, we can start with lower-casing the text and then calculating the exact-match accuracy. For fine-tuning a pre-trained translator, we would have to use the translate API (not the question generation API) to generate the fine-tuned results.
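The lower-cased exact-match metric described above can be sketched in a few lines; the function name is an illustrative choice:

```python
def exact_match_accuracy(predictions, references):
    """Lower-case both sides and return the fraction of exact matches."""
    assert len(predictions) == len(references)
    matches = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return matches / len(predictions)
```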
In addition to exact-match accuracy, it would be good to calculate average BLEU scores over the answers as well. For reference, see https://en.wikipedia.org/wiki/BLEU
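A self-contained sketch of sentence-level BLEU (uniform weights up to 4-grams, single reference, with brevity penalty) averaged over the test set; in practice a library implementation such as NLTK's would normally be used, so treat this as an illustration of the metric rather than the project's code. Note that unsmoothed sentence BLEU is 0 whenever any n-gram order has zero overlap, which can make short or poor generations score 0:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, hypothesis, max_n=4):
    """Unsmoothed BLEU for one hypothesis against a single reference."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # any zero n-gram precision zeroes the geometric mean
        log_precisions.append(math.log(overlap / total))
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)

def average_bleu(predictions, references):
    """Mean sentence BLEU over a corpus of (prediction, reference) pairs."""
    return sum(sentence_bleu(r, p) for p, r in zip(predictions, references)) / len(predictions)
```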
Hi @artitw, after training on about 33 data points, the pre-trained translator is still just translating the payload.
Thanks for sharing the notebook. It looks like the right direction, but I would expect it to need much more training data (>10k examples). I would also recommend saving intermediate results to Google Drive so that you can pick up where you left off without starting over.
@artitw Oh okay, got it.
Hi @artitw, I would like to continue from where John stopped.
Great, I've assigned you to this issue. Please review what John has done and let us know of any questions here.
Noted. I have reviewed John's work and experimented with the notebooks he shared. It seems that my assignments are the following, in order:
Am I right? Do you have any suggestions on getting training data? Thank you.
What you describe sounds like the right track. I would recommend starting with the English SQuAD [1] dataset and then using XQuAD [2] after that is somewhat working.
[1] https://rajpurkar.github.io/SQuAD-explorer/
Hi @artitw, after trying different options that did not work out, I opted for Amazon SageMaker. The job has been running for some hours with the SQuAD [1] dataset as input. I will keep you updated.
Hi @lere01, what you suggest seems interesting. I would recommend testing your setup with a small dataset before running any heavy jobs.
Hi @artitw, I used a small dataset to test my setup as you suggested, and it worked fine. But the larger dataset took too long to run: I set the job to run for 5 days, and even that time frame was not enough. However, you can see a proof of concept at https://colab.research.google.com/drive/1Vvem1DqNJZQej4t2qAIkZN0DyCdUY_sM#scrollTo=RXf2UrMvSc25. This was just to show that the whole process works. I would like your suggestion on how to proceed.
Hi @lere01, thanks for sharing your work and the summary. It looks like a good start. The main issue I can see is that the notebook you shared uses the Answerer model, not the fine-tuned translator you fit. We would have to perform predictions using the translator model, because we are using it for an unintended purpose.
Hi @artitw, hope you have had a good day. Two things. 1. Before going further, I want to let you know that I am fine-tuning using
AND NOT
Am I on the right track?
2. I dug into the codebase and figured out a way to use the GPU by editing the Translator and doing this:
What do you think?
The second approach should work, as we want to generate questions that correspond to a context and an answer.
Nice find. I am referencing your pull request here: #31
Hi @artitw, the dataset we are using for fine-tuning has multiple questions attached to each context, as opposed to one question per context. Do you think this might be affecting the model's learning?
Yes, I would suggest concatenating the context with the answer for each target question. This ensures that each unique question is mapped to a unique input to the model.
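The concatenation idea above can be sketched as follows; the `[SEP]` delimiter and function name are illustrative assumptions, since different questions about the same context carry different answers, so joining context and answer yields a distinct source per target question:

```python
def build_examples(context, qas, sep=" [SEP] "):
    """Build one (source, target) example per question: the source
    concatenates the shared context with that question's answer, so
    each target question maps to a distinct model input."""
    examples = []
    for qa in qas:
        source = context + sep + qa["answer"]
        examples.append((source, qa["question"]))
    return examples
```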
Hi @artitw, I have been able to fine-tune on up to 50,000 of the training examples (SQuAD 1.0). At 10,000, 20,000, and 50,000 examples, I tried the model on the dev dataset but got a BLEU score of 0 in all cases. Is this expected? Would you be able to take a look at my code to verify that I am doing things right? I ran the code locally, but you can find it here: https://colab.research.google.com/drive/1z3YTjOF1dllxqSQPLgxDDeKOf9wJFfG3?usp=sharing
@lere01 Thanks for your efforts on this and for sharing the notebook. The code looks fine to me, so good job with that. Can you share the prediction results after 50k training examples? If those don't look promising, we might have to put this project on hold until we can figure out how to train it further.
Fine-tune a cross-lingual translator for text2text generation tasks (e.g. question generation, question answering, summarization) to demonstrate cross-lingual alignment, zero-shot generation, etc.
For example, can we demonstrate question generation or question answering using the existing API? If not, what needs to get fixed?
https://github.com/artitw/text2text#training--finetuning