| Folder Name | Description of Contents |
| --- | --- |
| active-search-engines | list of active general-purpose search engine names from https://wikipedia.org/wiki/Template:Web_search_engines |
| alexa-top1mil-sites | Alexa list of the top 1 million web sites |
| amazon-aws-namespaces | AWS namespaces (paths found in aws.amazon.com URLs) |
| amazon-macie-types | Amazon Macie data object content types via https://docs.aws.amazon.com/macie/latest/userguide/macie-classify-objects-content-type.html |
| censorship-test-urls | URL testing list intended for discovering web site censorship, from https://github.com/citizenlab/test-lists |
| content-access-guidelines | Web Content Accessibility Guidelines by W3C |
| free-web-hosts | list of free web hosting services from https://mirror1.malwaredomains.com/files/freewebhosts.txt |
| github-dmca-users | links to GitHub accounts that have received DMCA notices, from https://github.com/github/dmca |
| marketing-tech-landscape | top 5,000 marketing technology web sites |
| modern-web-history | A History of The Modern Web |
| phishtank-developers-database | PhishTank downloadable database in CSV format via https://phishtank.com/developer_info.php |
| piidox-search-sites | list of search engines for personally identifiable information |
| simpl-redir-shortcuts | shortcuts for redirection on simpl.info |
| sites-using-cloudflare | sites using CloudFlare WAF according to GitHub @pirate |
| subreddit-list-full | http://www.reddit.com/r/ListOfSubreddits/wiki/listofsubreddits |
| subreddit-list-nsfw | WARNING! NSFW. Same as above, but with "not-safe-for-work" subreddit materials |
| tls-scanner-urls | URLs to test TLS scanning on via Botan |
| top-sites-global | Top 1,000 Internet web sites across the globe by OWASP headers |
| url-shortener-sites | URL shortener sites taken from http://dns-bh.sagadc.org/url_shorteners.txt |