A data science project to analyze codegolf.stackexchange.com in a Wolfram Language EntityStore
Requirements: a Wolfram Engine product, version 12 or newer.
If you're interested in the details of how I processed the data, the rough steps I took are listed below in order. To save yourself some time and effort, though, you may want to skip to the exploration section and download the cleaned and processed store.
The latest XML snapshots of codegolf.stackexchange.com can be found here on archive.org, along with the other StackExchange network archives. Code and directions for converting these into a Wolfram Language EntityStore are not yet available, because the conversion utility is still being made ready for public release. For now, you can download a compressed MX file of a converted EntityStore here (~418 MB).
Follow along with the code in GatherMetadata.nb to collect submission information including programming languages, reported sizes, and code snippets.
Follow along with the code in ProcessMetadata.nb to further refine the metadata, merge duplicate language entities, and add properties to the EntityStore for fast and easy exploration.
There is a lot of interesting data to explore and extract; Explore.nb covers the parts I found most interesting. I've also uploaded a public version of this notebook to the Wolfram Cloud for easy viewing.
If you'd like to do some exploration on your own, you can download compressed MX files of the cleaned and processed EntityStores here:
- codegolf.stackexchange.com_Cleaned.mx.zip (~450 MB)
- CodeGolfProgrammingLanguage_Cleaned.mx.zip (~325 KB)
After extracting the files, you should be able to follow along with the code in Explore.nb and go from there.
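As a rough sketch of getting started after the download, the snippet below unpacks one archive, loads the store, and registers it for the current session. The paths are hypothetical placeholders, and it assumes the MX file was written with `Export`, so that `Import` returns the `EntityStore` directly. If the file was written with `DumpSave` instead, `Get` would be the call to use. The entity type names are read from the store rather than assumed.

```wolfram
(* EntityStore and EntityRegister need version 12 or newer. *)
If[$VersionNumber < 12, Print["This sketch needs Wolfram Language 12 or newer."]];

(* Hypothetical locations: adjust to wherever you saved the download. *)
archive = FileNameJoin[{$HomeDirectory, "Downloads",
    "codegolf.stackexchange.com_Cleaned.mx.zip"}];
targetDir = FileNameJoin[{$TemporaryDirectory, "codegolf"}];

(* Unpack the archive once; skip extraction if an MX file is already there. *)
If[! DirectoryQ[targetDir], CreateDirectory[targetDir]];
If[FileNames["*.mx", targetDir] === {}, ExtractArchive[archive, targetDir]];
mxFile = First[FileNames["*.mx", targetDir]];

(* Assumption: the file was written with Export, so Import returns the store.
   If it was written with DumpSave, use Get[mxFile] and look for the symbol
   it defines instead. *)
store = Import[mxFile];

(* Register the store for this session; the result is the list of
   entity type names it provides. *)
types = EntityRegister[store]

(* Look at the properties and a few entities of the first type. *)
EntityProperties[First[types]]
Take[EntityList[First[types]], UpTo[5]]
```

The same steps should apply to CodeGolfProgrammingLanguage_Cleaned.mx.zip. Registration only lasts for the current kernel session, so a fresh session needs the `EntityRegister` call again.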