This class is a structured, collaborative study of advanced topics in Data Science. During the semester, students will apply the data analytics lifecycle to a research topic of their choosing. Students will select appropriate predictive analytical methods for their topics and evaluate the social and ethical implications of those topics. Individual work will complement peer collaboration as students explore issues of visualizing and communicating data to each other and to the public.
The class repository is maintained on GitHub/MoreDataScience and hosted at MoreDataScience.github.io/CSCI499-Spring2019/.
OCNL 220, 11am - 3pm, or by appointment
Topic | Activities | Due Date | Lead |
---|---|---|---|
Collaboration and Version Control with R projects | Create a GitHub repository for your project portfolio and configure it to host a public site that will include your code and blog. Review how to maintain version control and collaborate using Git. Join the class Slack channel and use it for all out-of-class communication, including questions and requests for assistance. As the first entry in your blog, identify a /r/dataisbeautiful post that interests you and write a short critique of it. As the first commit to your code, identify a research topic and edit your README.md to provide a brief explanation, including where you expect to find the source(s) of your data. Finally, submit a Pull Request to edit this document with the topic area you choose to lead and a link to your site. | February 8 | Kevin Buffardi |
Ethics and Data Science in Society | Resources: Weapons of Math Destruction chapters 3, 6; Podcast: "Science Vs - Gentrification: what's really happening". Add at least one blog entry to communicate your thoughts on the ethics and societal impact of data science and how they apply to your topic. | February 15 | Grant Esparza |
Data Analytics Lifecycle | Resources: R for Data Science chapters 4 and 8 | March 15 | ? |
Regression models and Classification | Resources: Introduction to Statistical Learning chapters 2-4 | March 29 | Eduardo Gomez |
Resampling and Tree-based methods | Resources: Introduction to Statistical Learning chapters 5, 8 | April 5 | Jerry Tucay |
Information Visualization | Resources: Edward Tufte keynote (video); The Shneiderman Information Visualization Mantra (video); Learning Data Visualization (via Lynda) chapter 5: Visual Display | April 19 | ? |
Peer Review and Replication | With a peer, perform a pull request review so that they can replicate, review, and critique your results | May 3 | Kevin Buffardi |
Student | Topic and Project Portfolio Link |
---|---|
Eduardo Gomez | Crime in the United States |
Grant Esparza | Public Perception of Tech Companies Following Security Leaks |
Jerry Tucay | Sales Forecasting (https://JerryTucay.github.io) |