Frequency transforms of text #132
(now that I've slept) I'm grateful to @aparrish for sharing her word vectors generated from Project Gutenberg; I wouldn't have had the time to pull this together without that resource. If I had more time, I'd go back and be more content-aware about tokenizing the source texts: I split on spaces and at each non-letter character, but the vector file contains entries for tokens like '--' and for contractions. Entertainingly enough, The Waves isn't in Project Gutenberg, so my lookup-error log was a nice list of the words Woolf coined in that book. For those, I greedily matched valid sub-words starting from the beginning of the word. I used JWave for the Haar and Daubechies transforms, and Annoy for the nearest-neighbor matching.
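A minimal sketch of the tokenizing and fallback steps described above, assuming the vector vocabulary is a plain Python set. Two details are assumptions, not taken from the linked scripts: non-letter characters are kept as single-character tokens, and the greedy match takes the *longest* valid prefix at each position, skipping any character that starts no known word. The names `tokenize` and `greedy_subwords` are hypothetical.

```python
import re

# Unicode letters ([^\W\d_]) form runs; every other character becomes its own token.
TOKEN_RE = re.compile(r"[^\W\d_]+|[\W\d_]")


def tokenize(text):
    """Split on whitespace, then break each chunk at every non-letter character."""
    tokens = []
    for chunk in text.split():
        tokens.extend(TOKEN_RE.findall(chunk))
    return tokens


def greedy_subwords(word, vocab):
    """Fallback for out-of-vocabulary words: scan left to right, taking the
    longest prefix found in vocab at each position (an assumed reading of
    "greedy"); characters that begin no known word are skipped."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            i += 1  # no vocabulary word starts here; skip one character
    return pieces


if __name__ == "__main__":
    print(tokenize("don't--"))
    print(greedy_subwords("sunlighted", {"sun", "sunlight", "ed"}))
```

With the vocabulary `{"sun", "sunlight", "ed"}`, the coined word "sunlighted" splits into `["sunlight", "ed"]`, because the longer prefix "sunlight" is preferred over "sun".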
🎈 Is the source available somewhere?
I'll put it up later today; I was mostly rushing to meet the deadline (which I now see was UTC, not local, so oh well).
Scripts are up at https://github.com/danuep/nanogenmo2017
I didn't even get the idea until a couple of days ago, and mostly I'm hoping I can get this uploaded before midnight...
Reading @aparrish in #23 talk about hoping to get a meaningful average novel got me thinking about the scales of variation in play, which led to wavelet transforms, which led to
Haar of Darkness
which is unfortunately 2,000 words short of the 50,000-word target, so in honor of a brilliant woman of letters and a brilliant woman of numbers:
The Wavelets, a Daubechies transform of The Waves, by Virginia Woolf
[edit: now with correct link to The Wavelets]
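The post doesn't spell out the pipeline, so what follows is only a hedged illustration of the "scales of variation" idea, not the author's code (which used JWave and Annoy; see the linked repo). It assumes each word is replaced by its vector, the transform runs along the sequence axis independently for each dimension, and coefficient vectors are snapped back to words by nearest neighbor. One level of the orthonormal Haar transform (the simplest Daubechies wavelet) turns each neighboring pair into a scaled average (the coarse scale) and a scaled difference (the fine detail). Brute-force cosine similarity stands in for Annoy's approximate angular search. All function names and the toy vectors are hypothetical.

```python
import math

S = 1 / math.sqrt(2)  # orthonormal Haar scaling factor


def haar_step(seq):
    """One level of the orthonormal Haar transform along the sequence axis.
    seq is a list of equal-length vectors; its length must be even."""
    if len(seq) % 2:
        raise ValueError("sequence length must be even")
    approx, detail = [], []
    for a, b in zip(seq[0::2], seq[1::2]):
        approx.append([S * (x + y) for x, y in zip(a, b)])  # coarse: scaled pair average
        detail.append([S * (x - y) for x, y in zip(a, b)])  # fine: scaled pair difference
    return approx, detail


def haar_inverse_step(approx, detail):
    """Exact inverse of haar_step: rebuilds the original pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append([S * (x + y) for x, y in zip(a, d)])
        out.append([S * (x - y) for x, y in zip(a, d)])
    return out


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_word(vec, vectors):
    """Exact nearest neighbor by cosine similarity (stand-in for Annoy)."""
    return max(vectors, key=lambda w: cosine(vec, vectors[w]))


if __name__ == "__main__":
    vectors = {"sea": [1.0, 0.0], "sky": [0.0, 1.0], "wave": [0.7, 0.7]}
    seq = [vectors[w] for w in ["sea", "sea", "sky", "sky"]]
    approx, detail = haar_step(seq)
    print([nearest_word(a, vectors) for a in approx])
```

Applying `haar_step` again to the approximation gives progressively coarser scales. A longer Daubechies filter works the same way, except that each coefficient mixes more than two neighboring words.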