Trimming down the package size #20

Open
Nagasaki45 opened this issue Sep 22, 2018 · 3 comments

Comments

@Nagasaki45
Contributor

The good news: deep_disfluency is on PyPI!

The bad news: it's a 59MB package, and the limit on PyPI is 60MB. Most of the data in the package currently sits in the experiments folder. Can we ship only part of it? As I understand it, using the package and replicating the study are two different things, so we might be able to provide the full functionality with the data files from a single experiment, couldn't we?
Alternatively, there's a way to apply for a larger package size limit on PyPI, and we could do that. Or we could just wait until we hit the limit and think of a solution then. It's up to you :-)

@wadkar

wadkar commented Oct 24, 2018

Maybe we can take a page from NLTK and ask users to download the experiment data on demand, e.g. http://www.nltk.org/data.html
How does that sound?

@davidschlangen

That makes a lot of sense. Just provide a get_data.py script, which could even point here (to GitHub, downloading the raw files).
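A minimal sketch of what such a get_data.py could look like, assuming the data stays in this GitHub repository and is fetched from the raw-file URL on first use. The base URL, file names, and directory layout here are illustrative, not the project's actual structure:

```python
# get_data.py -- hypothetical on-demand data fetcher (sketch).
# The repository URL and file paths below are placeholders.
import os
import urllib.request

RAW_BASE = "https://raw.githubusercontent.com/<org>/deep_disfluency/master/"


def data_path(name, data_dir="deep_disfluency_data"):
    """Local path where a downloaded file would be cached."""
    return os.path.join(data_dir, name)


def fetch(name, data_dir="deep_disfluency_data"):
    """Download `name` from the raw GitHub URL unless it is already cached."""
    target = data_path(name, data_dir)
    if not os.path.exists(target):
        os.makedirs(os.path.dirname(target) or data_dir, exist_ok=True)
        urllib.request.urlretrieve(RAW_BASE + name, target)
    return target
```

Users would then call `fetch("experiments/...")` once before instantiating the tagger, and subsequent runs would hit the local cache.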

@Nagasaki45
Contributor Author

Just one thing to note: there are files in experiments that are necessary for using the tagger. Also, the user currently selects the experiment and configuration file when instantiating the tagger. So we must keep at least part of this data, or better, move a "default" configuration somewhere else and load it whenever no configuration is passed to the tagger's __init__ method.
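The default-configuration idea could look roughly like this. The class and attribute names are illustrative, not the real deep_disfluency API:

```python
# Hypothetical sketch: fall back to a bundled default configuration when
# the caller does not supply one. Names and paths are placeholders.
import os

# A small default config shipped with the package, outside experiments/.
DEFAULT_CONFIG = os.path.join("deep_disfluency", "config", "default.ini")


class DisfluencyTagger:
    def __init__(self, config_file=None):
        # Use the bundled default unless the user passes their own config,
        # so the experiments data is only needed for replicating the study.
        self.config_file = config_file if config_file else DEFAULT_CONFIG
```

With this, `DisfluencyTagger()` works out of the box, while power users can still pass an experiment-specific configuration file.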
