A collection of code and resources to serve as a starting point for data science projects. For more explanation and material on R, visit my blog.
- Resources - Websites and references that I find helpful for data science projects
- Developing With R - Notes on R package development
- How to Git - Version control with Git
- How to Anaconda - Managing environments with Anaconda
- Visualization Cookbook (R) - A wide variety of data visualizations demonstrated.
- Geospatial Data Analysis (R) - Making maps with R.
- Modeling Fundamentals (R) - A primer on logistic and linear regression modeling with the classic Titanic dataset.
- Survival Analysis (R) - Survival analysis methods such as Cox proportional hazards models and Kaplan-Meier curves.
- Modeling Workflows (R) - Streamlined Tidyverse modeling workflows with the gapminder dataset.
- Multilevel Models (R) - Multilevel (a.k.a. mixed-effects) models
- Time Series Modeling (R) - Experimenting with time series modeling (tsibble, forecast libraries, prophet, etc.)
- Ordinal Regression (R) - Experimenting with ordinal (ranked categorical outcome) regression
- Presenting Regression Models (R) - Code for cleaning the outputs of regression models for presentations.
- Sklearn Modeling Workflows (Python) - Modeling workflows with sklearn (cross-validation, randomized search for optimizing hyperparameters, lift curves).
- Sklearn - Skopt Workflow (Python) - Modeling workflow with sklearn and scikit-optimize (Bayesian hyperparameter optimization).
- Machine Learning with Caret (R) - Using the caret package for machine learning.
- Parsnip (R) - Fitting models with the parsnip package (from tidymodels)
- Bayesian Basics (R) - Exploring a simple Bayesian multilevel model
- Bayesian Modeling (R) - Experimenting with Bayesian models using rstanarm
- Comparing Bayesian Packages (R) - Comparing rstanarm, brms, and rstan.
- k-means clustering (R) - Using the k-means algorithm to cluster data.
- Clustering (Python) - Agglomerative (Hierarchical) clustering, k-means clustering, and Gaussian mixture models
- Power Analysis (R) - Statistical power analysis
- Distribution Sampling and Hypothesis Testing (R)
- Hypothesis Testing (R)
- Document Embeddings (Python) - Using word embeddings to compare the similarity of State of the Union addresses.
- State of the Union Analysis (Python) - An exploration of State of the Union addresses with topic modeling and sentiment analysis.
- Sentiment Analysis (R) - Exploring sentiment analysis in R.
- LSTM Demo (Python) - An LSTM network for predicting whether a company review from Glassdoor is positive.
- R-Quickstart (R) - Minimal data analysis and visualization workflows. See the blog post "Data Science Essentials" for more details and explanation.
- Creating Formatted Spreadsheets (R) - How to create a custom formatted spreadsheet report with the openxlsx R package.
- Using Python and R Together - How to use Python and R code together in the same Jupyter notebook with the rpy2 Python package.
- R Quotation (R) - If you want to do certain things such as pass variable names as arguments to a function in R, you have to use quotation methods like `quo()` and `enquo()`. This notebook demonstrates how to do this. See my blog post on Tidy Evaluation for more details and explanation.
- SQL Databases (Python) - Code for creating and manipulating a SQL database.
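For a quick taste of what the SQL Databases entry covers, here is a minimal sketch of creating and manipulating a database from Python. It assumes SQLite via the standard-library `sqlite3` module; the notebook itself may use a different database or library, and the `passengers` table and the `build_demo_db` function are hypothetical names made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database, modify it, and return its rows."""
    # ":memory:" keeps the example self-contained (no file is written).
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Create a table (hypothetical schema, for illustration only).
    cur.execute(
        "CREATE TABLE passengers ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, age REAL)"
    )

    # Insert several rows using parameter placeholders.
    cur.executemany(
        "INSERT INTO passengers (name, age) VALUES (?, ?)",
        [("Ann", 29.0), ("Bob", 41.5), ("Cy", None)],
    )

    # Update one row and delete another.
    cur.execute("UPDATE passengers SET age = ? WHERE name = ?", (35.0, "Cy"))
    cur.execute("DELETE FROM passengers WHERE name = ?", ("Bob",))
    conn.commit()

    # Read back what is left, in insertion order.
    rows = cur.execute(
        "SELECT name, age FROM passengers ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, age in build_demo_db():
        print(name, age)
```

Using `?` placeholders rather than string formatting lets the driver handle quoting, which is the usual way to avoid SQL injection when values come from user input.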