HistText

Objective

A library dedicated to downloading, analyzing, and comparing well-known texts on Project Gutenberg using natural language processing.

Use

Install the library with pip:

pip install git+https://github.com/anlandu/histtext

Run scripts/GetGut.py to get the top 100 books on Project Gutenberg from the past 30 days as text files.

Run scripts/GetBookFromURL.py to get the text file of a specific book from Project Gutenberg. The script takes one command-line argument, a string holding the URL of the root directory of the book, e.g. http://www.gutenberg.org/ebooks/4300
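As a rough illustration of what fetching a book from its root URL can involve, here is a minimal sketch. It is not the code in scripts/GetBookFromURL.py: the function names are hypothetical, and the `.txt.utf-8` address pattern is an assumption about how Project Gutenberg exposes its plain-text files.

```python
import re
import urllib.request


def book_id_from_url(url):
    """Extract the numeric book ID from a URL such as .../ebooks/4300."""
    match = re.search(r"/ebooks/(\d+)", url)
    if match is None:
        raise ValueError(f"no book ID found in {url!r}")
    return int(match.group(1))


def plain_text_url(url):
    """Build the address of the book's plain-text file.

    Assumption: the plain-text version lives at /ebooks/<id>.txt.utf-8.
    """
    return f"https://www.gutenberg.org/ebooks/{book_id_from_url(url)}.txt.utf-8"


def download_book(url, destination):
    """Download the plain-text file and write it to `destination` as bytes."""
    with urllib.request.urlopen(plain_text_url(url)) as response:
        data = response.read()
    with open(destination, "wb") as handle:
        handle.write(data)
```

Calling `download_book("http://www.gutenberg.org/ebooks/4300", "resources/4300.txt")` would then save the book into the resources folder, assuming that folder exists and the address pattern above holds.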

Import src.AvgWordLen, src.AvgSentLen, or src.FreqPOS to use their methods. AvgWordLen lets you compute the average word length of a given file, analyze all files in the resources folder, and/or compare all stored word lengths to one you supply. AvgSentLen does the same for sentence lengths. FreqPOS counts the number of times each part of speech is used in a text.
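The three measurements can be sketched with the standard library alone. This is an illustration under stated assumptions, not the API of src.AvgWordLen, src.AvgSentLen, or src.FreqPOS: every function name below is hypothetical, tokenization uses simple regular expressions, and the part-of-speech counter expects (token, tag) pairs that some tagger has already produced.

```python
import re
from collections import Counter


def words(text):
    """Return runs of letters, keeping internal apostrophes ("don't")."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)


def sentences(text):
    """Naively split after '.', '!' or '?'; abbreviations will be mis-split."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if words(part)]


def average_word_length(text):
    """Mean number of characters per word, or 0.0 for text with no words."""
    tokens = words(text)
    return sum(len(token) for token in tokens) / len(tokens) if tokens else 0.0


def average_sentence_length(text):
    """Mean number of words per sentence, or 0.0 for text with no sentences."""
    sents = sentences(text)
    return sum(len(words(s)) for s in sents) / len(sents) if sents else 0.0


def closest_text(target, stored):
    """Return the name in `stored` (name -> value) whose value is nearest `target`."""
    return min(stored, key=lambda name: abs(stored[name] - target))


def pos_frequencies(tagged_tokens):
    """Count how often each tag occurs in an iterable of (token, tag) pairs."""
    return Counter(tag for _, tag in tagged_tokens)
```

For example, `average_word_length("The cat sat. It ran away!")` gives 3.0, since the six words total 18 letters, and `closest_text(3.0, {"a": 4.5, "b": 3.2})` picks "b".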

(Note that these three files assume the top 100 books are already downloaded using GetGut.py; alternatively, you can simply download your own texts of choice into the resources folder.)

Contributing

Contributions are more than welcome! Please see CONTRIBUTING.md to learn how to contribute, and check out the open issues for some starting points.

LICENSE

This project is licensed under the MIT License.
