Data Discovery Project

Pick a topic that you care about.
Find at least 20 datasets for that topic (use, for example, https://toolbox.google.com/datasetsearch). I, for one, collect open source git repositories, so I searched for "git urls".
For each of the 20 datasets you chose, determine whether the underlying data can be accessed (some of these datasets do not provide public access).
Create a MongoDB collection YourNetId within the database fdac19mp2, where you store metadata for each of the 20 datasets: YourTopic, title, license, description, and the url(s) where the data may be retrieved.

import pymongo
client = pymongo.MongoClient(host="da1.eecs.utk.edu")
db = client['fdac19mp2']
coll = db['YourNetId']
# for each dataset, store one document with its metadata
coll.insert_one({
    'topic': 'YourTopic',
    'title': 'Data title',
    'license': 'license',
    'description': 'Brief data description',
    'urls': ['url1', 'url2', ...],  # replace ... with any further urls
})
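
The accessibility check from the list above can be partly automated before a url is recorded. The following is a minimal sketch, not part of the assignment: the helper name is_accessible and the 10-second timeout are assumptions, it only handles http(s) urls, and a dataset that sits behind a login page or rejects HEAD requests still needs a manual look.

```python
import http.client
import urllib.request


def is_accessible(url, timeout=10):
    """Return True if a HEAD request to `url` gets a non-error HTTP response.

    Hypothetical helper for illustration; a True result only means the server
    answered, not that the data itself can be downloaded.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            return status is not None and 200 <= status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, and timeouts;
        # ValueError and HTTPException cover malformed urls.
        return False
```

One way to use it is to keep only the urls for which is_accessible(url) is true when filling the 'urls' list, and to note in the description when a dataset has no public access.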

To check what is recorded:

import pprint
import pymongo
client = pymongo.MongoClient(host="da1.eecs.utk.edu")
db = client['fdac19mp2']
coll = db['YourNetId']
pp = pprint.PrettyPrinter(indent=1, width=65)
for r in coll.find():
    print(pp.pformat(r))
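
Beyond printing the records, it may help to confirm that the collection meets the requirements stated at the top: at least 20 datasets, each with all of the metadata fields. The sketch below is an assumption about how such a check could look; the function check_records and the constant REQUIRED_FIELDS are hypothetical names, and the field names are taken from the insertion example above.

```python
# Field names used in the insertion example above.
REQUIRED_FIELDS = ("topic", "title", "license", "description", "urls")


def check_records(records, minimum=20):
    """Return a list of human-readable problems found in `records`.

    `records` is a list of dicts, e.g. list(coll.find()).
    An empty result means every check passed.
    """
    problems = []
    if len(records) < minimum:
        problems.append(
            f"only {len(records)} records, expected at least {minimum}"
        )
    for i, record in enumerate(records):
        # A field counts as missing if it is absent or empty.
        missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
        if missing:
            title = record.get("title") or "untitled"
            problems.append(f"record {i} ({title}): missing {', '.join(missing)}")
    return problems
```

With the connection from the snippet above, calling check_records(list(coll.find())) and printing each returned line gives a quick summary of what still needs to be filled in.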
