Integrate UDify into AllenNLP #5

Open
14 tasks
Hyperparticle opened this issue Nov 2, 2019 · 0 comments

Hyperparticle commented Nov 2, 2019

It would be useful to integrate the UDify model directly into AllenNLP as a PR, since the code merely extends the library to handle a few extra features. Since the release of the UDify code, AllenNLP has also added a multilingual UD dataset reader and a multilingual dependency parser with a corresponding model, which should make the integration easier.

Here is a list of things that need to be done:

  • Add scripts to download and concatenate the UD data for training/evaluation. Also, add the CoNLL 2018 evaluation script.
  • Create a UDify conllu -> conllu predictor that can handle unseen tokens and multiword ids.
  • Add the sqrt-decay learning rate scheduler.
  • Add optional dropout to ScalarMix.
  • Modify the multilingual UD dataset reader to handle multiword ids.
  • Add lemmatizer edit script code.
  • Modify the BERT token embedder to be able to return multiple scalar mixes, one per task (or alternatively all the embeddings). Add optional args for internal BERT dropout.
  • Add generic dynamic masking functions.
  • Add the custom sequence tagger and biaffine dependency parser that handle a multi-task setup.
  • Add the UDify main model, wrapping the BERT, dynamic masking, scalar mix, sequence tagger, and dependency parser code. Provide custom metrics for TensorBoard.
  • Add utility code to optionally cache the vocab and grab UD treebank names from files.
  • Add helper script to evaluate conllu predictions and output them to json.
  • Add tests to verify the new UDify model and modules.
  • Add UDify config jsonnet file.
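
As a rough illustration of the "multiword ids" items above, here is a minimal sketch of separating syntactic-word rows from the other rows of a CoNLL-U sentence. It relies only on the CoNLL-U convention that word IDs are plain integers, multiword-token IDs are ranges such as `1-2`, and empty-node IDs are decimals such as `1.1`. The function name `split_conllu_rows` and the `SAMPLE` sentence are hypothetical and not part of UDify or AllenNLP.

```python
from typing import List, Tuple


def split_conllu_rows(sentence: str) -> Tuple[List[List[str]], List[List[str]]]:
    """Split one CoNLL-U sentence into (word rows, other rows).

    Hypothetical helper for illustration only. "Other rows" are
    multiword-token ranges (ID like "1-2") and empty nodes (ID like "1.1").
    """
    words: List[List[str]] = []
    others: List[List[str]] = []
    for line in sentence.splitlines():
        # Skip blank lines and sentence-level comments such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Only plain integer IDs denote syntactic words.
        if fields[0].isdigit():
            words.append(fields)
        else:
            others.append(fields)
    return words, others


# Assumed sample: a Spanish multiword token split into two syntactic words.
SAMPLE = "\n".join([
    "# text = vámonos",
    "1-2\tvámonos\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tvamos\tir\tVERB\t_\t_\t0\troot\t_\t_",
    "2\tnos\tnosotros\tPRON\t_\t_\t1\tobj\t_\t_",
])

words, others = split_conllu_rows(SAMPLE)
print([row[1] for row in words])
print([row[0] for row in others])
```

A conllu -> conllu predictor could, under this assumption, pass only the word rows to the model and re-insert the other rows unchanged when writing its output.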