Trimming down the package size #20

Open
Nagasaki45 opened this issue Sep 22, 2018 · 3 comments

Comments

@Nagasaki45
Contributor

The good news: deep_disfluency is on PyPI!

The bad news: it's a 59MB package, and the limit on PyPI is 60MB. Most of the data in the package currently sits in the experiments folder. Can we ship only part of it? As I understand it, using the package and replicating the study are two different things, so we might be able to provide the full functionality with the data files from a single experiment, couldn't we?
Alternatively, there's a way to apply for a larger package size limit on PyPI, and we could do that. Or we could just wait until we hit the limit and think of a solution then. It's up to you :-)

@wadkar

wadkar commented Oct 24, 2018

Maybe we can take a page from NLTK and ask users to download the experiment data on demand, e.g. http://www.nltk.org/data.html
How does that sound?

@davidschlangen

That makes a lot of sense. Just provide a get_data.py script, which could even point here (to GitHub, downloading the raw files).
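A minimal sketch of what such a get_data.py could look like, assuming the data stays in this GitHub repository and is fetched from the raw-file URL on first use. The base URL, file names, and directory layout here are illustrative, not the project's actual structure:

```python
# get_data.py -- hypothetical on-demand data fetcher (sketch).
# The repository URL and file paths below are placeholders.
import os
import urllib.request

RAW_BASE = "https://raw.githubusercontent.com/<org>/deep_disfluency/master/"


def data_path(name, data_dir="deep_disfluency_data"):
    """Local path where a downloaded file would be cached."""
    return os.path.join(data_dir, name)


def fetch(name, data_dir="deep_disfluency_data"):
    """Download `name` from the raw GitHub URL unless it is already cached."""
    target = data_path(name, data_dir)
    if not os.path.exists(target):
        os.makedirs(os.path.dirname(target) or data_dir, exist_ok=True)
        urllib.request.urlretrieve(RAW_BASE + name, target)
    return target
```

Users would then call `fetch("experiments/...")` once before instantiating the tagger, and subsequent runs would hit the local cache.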

@Nagasaki45
Contributor Author

Just one thing to note: there are files in experiments that are necessary for using the tagger. Also, the user currently selects the experiment and configuration file when instantiating the tagger. So we must keep at least part of this data, or better, move a "default" configuration somewhere else and load it whenever no configuration is passed to the tagger's __init__ method.
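The default-configuration idea could look roughly like this. The class and attribute names are illustrative, not the real deep_disfluency API:

```python
# Hypothetical sketch: fall back to a bundled default configuration when
# the caller does not supply one. Names and paths are placeholders.
import os

# A small default config shipped with the package, outside experiments/.
DEFAULT_CONFIG = os.path.join("deep_disfluency", "config", "default.ini")


class DisfluencyTagger:
    def __init__(self, config_file=None):
        # Use the bundled default unless the user passes their own config,
        # so the experiments data is only needed for replicating the study.
        self.config_file = config_file if config_file else DEFAULT_CONFIG
```

With this, `DisfluencyTagger()` works out of the box, while power users can still pass an experiment-specific configuration file.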
