Coursework for MIDS Scaling Up! Really Big Data

This is an index of coursework for the MIDS class "Scaling Up! Really Big Data". Please submit corrections if you find problems with the assignments. Submissions should be well-formed Git pull requests.

Week 2: Cloud Computing 101

Labs

  1. Salt States and Docker deployment of the ELK stack

Week 3: OpenStack Introduction

Labs

  1. Hadoop over OpenStack DevStack using Sahara

Week 4: Distributed Filesystems

Homework

This is a graded homework.

  1. Part 1: GPFS Setup
  2. Part 2: The Mumbler

Labs

There is no in-class lab this week.

Week 5: Distributed Filesystems

Homework

  1. Part 1: Hadoop v1 Setup
  2. Part 2: Hadoop v2 Setup

Labs

(Complete the following in order)

  1. Load Google 2-gram dataset into HDFS
  2. Preprocess 2-gram data for Mumbler

Week 6: Apache Spark

Homework

  1. Apache Spark Introduction

Labs

  1. Machine Learning with Spark and MLlib

Week 7: Object Storage

Homework

  1. Object Storage

Labs

(Complete the following in order)

  1. Data Transfer Performance
  2. Rsync Investigation

Week 8: NoSQL

Homework

  1. NoSQL

Week 9: Spark Streaming

Homework

  1. Streaming Tweet Processing

Labs

  1. Spark Streaming and Cassandra

Week 10: Scaling Up

Homework

  1. Orchestrate with Brooklyn

Labs

  1. Brooklyn labs

Week 11: Spark ML Round 2

(Homework-free week!)

Labs

  1. Streaming Analytics with AlchemyAPI

Week 12: Search

Homework

  1. Elasticsearch