clojurecup2014/cloujera


Cloujera

Cloujera lets you do a fine-grained search for spoken words in Coursera's videos. It does this by running full-text searches over the videos' transcripts.
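The README doesn't show how a search is actually issued. Purely as an illustration, a phrase search over transcript text in Elasticsearch could look like the sketch below. The index name `transcripts` and the field name `text` are hypothetical, not taken from cloujera's code, and the quoting is naive (double quotes in the phrase are not JSON-escaped).

```bash
#!/usr/bin/env bash
# Hypothetical sketch: the index ("transcripts") and field ("text") names are
# assumptions, not cloujera's real mapping.

# Build an Elasticsearch match_phrase query body for a spoken phrase.
# Naive: does not JSON-escape quotes or backslashes in the phrase.
transcript_query() {
  printf '{"query": {"match_phrase": {"text": "%s"}}}\n' "$1"
}

# Send the query to a local Elasticsearch node on its default port (9200).
search_transcripts() {
  curl -s -X POST 'http://localhost:9200/transcripts/_search' \
       -H 'Content-Type: application/json' \
       -d "$(transcript_query "$1")"
}
```

For example, `search_transcripts "gradient descent"` would return the matching documents as JSON. `match_phrase` only matches the words in that order, which is what "search for spoken words" suggests; a plain `match` query would also hit transcripts that contain the words separately.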

Local Setup

  1. Bring up Vagrant (Elasticsearch + Redis): vagrant up

  2. Compile the ClojureScript (requires Java >1.7): lein cljsbuild once

  3. Start the app: lein run

  4. On the first run, visit http://127.0.0.1:8080/burglar/go to seed the db. If you skip this, the app fails with an IndexMissingException from Elasticsearch.
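Step 2 depends on the Java version, and `java -version` reports it in two different formats (the legacy `1.8.0_292` style and the newer `11.0.2` style). A minimal helper for turning that output into a single major number might look like this; `java_major` is a hypothetical name, not one of the repo's scripts.

```bash
#!/usr/bin/env bash
# Print the major Java version found in the output of `java -version`,
# e.g. 'java version "1.8.0_292"' -> 8, 'openjdk version "11.0.2"' -> 11.
java_major() {
  local v
  # Extract the quoted version string (e.g. 1.8.0_292 or 11.0.2).
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  # Pre-Java-9 releases use the legacy "1.x" scheme: drop the "1." prefix.
  case "$v" in
    1.*) v=${v#1.} ;;
  esac
  # Keep only the leading number (strip everything from the first . _ or -).
  printf '%s\n' "${v%%[._-]*}"
}
```

Usage: `[ "$(java_major "$(java -version 2>&1)")" -gt 7 ] || echo "Java >1.7 required"`. The `-gt 7` mirrors the README's ">1.7" literally; switch to `-ge 7` if Java 1.7 itself turns out to be enough. Note that `java -version` writes to stderr, hence the `2>&1`.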

Testing dockerized cloujera inside Vagrant VM

$ vagrant ssh
$ cd /vagrant
$ ./scripts/deploy.sh

NOTE: from the host, the dockerized cloujera is reachable at http://127.0.0.1:8081 (see the Vagrantfile)
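A JVM app can take a while to start answering requests, so instead of refreshing the browser you could poll the forwarded port. This is a minimal sketch, assuming `curl` is installed on the host; `wait_for_http` is a hypothetical helper, not part of the repo's scripts.

```bash
#!/usr/bin/env bash
# Poll URL until it answers with any HTTP status, giving up after TRIES
# attempts (one per second, default 30).
# Usage: wait_for_http URL [TRIES]
wait_for_http() {
  local url=$1 tries=${2:-30} i
  for ((i = 1; i <= tries; i++)); do
    # -s hides progress, -o discards the body. Without --fail, curl exits 0
    # on any HTTP response, including 404/500, and non-zero if it can't connect.
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    # Don't sleep after the final attempt.
    (( i < tries )) && sleep 1
  done
  return 1
}
```

For example, `wait_for_http http://127.0.0.1:8081/ && echo "cloujera is up"`. The same helper works for the uberjar on port 8082.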

Testing uberjar inside Vagrant

$ vagrant ssh
$ cd /vagrant
$ source ./scripts/prod-env.sh
$ lein uberjar
$ java -jar ./target/uberjar/cloujera-*-standalone.jar

NOTE: the uberjarred cloujera listens on port 8080 inside the VM; from the host it is reachable at http://127.0.0.1:8082 (see the Vagrantfile)
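The `cloujera-*-standalone.jar` glob misbehaves quietly if `target/uberjar` holds zero or several jars. With no match, `java` receives the literal pattern and can't find the file. With several matches, `java` runs the first jar and passes the others to it as program arguments. A small guard might look like this; `find_uberjar` is a hypothetical bash helper, not one of the repo's scripts.

```bash
#!/usr/bin/env bash
# Print the path of the single cloujera standalone jar in DIR
# (default ./target/uberjar). Fail if there are zero or several matches,
# e.g. when a jar from an older version is still lying around.
find_uberjar() {
  local dir=${1:-./target/uberjar} restore
  local -a jars
  # Remember the current nullglob setting so it can be restored afterwards.
  restore=$(shopt -p nullglob || true)
  # With nullglob, an unmatched pattern expands to nothing, not to itself.
  shopt -s nullglob
  jars=("$dir"/cloujera-*-standalone.jar)
  eval "$restore"
  if (( ${#jars[@]} != 1 )); then
    echo "expected exactly one standalone jar in $dir, found ${#jars[@]}" >&2
    return 1
  fi
  printf '%s\n' "${jars[0]}"
}
```

Usage, inside the VM after `lein uberjar`: `java -jar "$(find_uberjar)"`.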

Scraping courses

Visiting http://cloujera.whatever/burglar/go scrapes about 10 courses to get you started.

To scrape another course, you need to:

  1. Visit the Coursera sessions API at https://api.coursera.org/api/catalog.v1/sessions and choose a course

  2. Manually sign up for the course and agree to the honour code as the [email protected] user

  3. Find the course's video lecture URL (videoLecturesURL)

  4. Send an HTTP POST to http://cloujera.whatever/burglar/raid with this JSON payload:

    { "url": videoLecturesURL }
    

    For example:

    { "url": "https://class.coursera.org/apcalcpart1-001/lecture" }
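Step 4 can be scripted with `curl`. This is a sketch assuming the endpoint reads a JSON request body (hence the `Content-Type` header); `raid_payload` and `raid_course` are hypothetical helpers, and the default base URL is the local dev address from Local Setup.

```bash
#!/usr/bin/env bash
# Build the /burglar/raid JSON payload for a videoLecturesURL.
# Naive quoting: assumes the URL contains no double quotes or backslashes.
raid_payload() {
  printf '{"url": "%s"}\n' "$1"
}

# POST a course's lecture URL to a cloujera instance.
# BASE defaults to the local dev server; pass a deployed host to override it.
raid_course() {
  local url=$1 base=${2:-http://127.0.0.1:8080}
  curl -s -X POST "$base/burglar/raid" \
       -H 'Content-Type: application/json' \
       -d "$(raid_payload "$url")"
}
```

For example: `raid_course https://class.coursera.org/apcalcpart1-001/lecture http://cloujera.whatever`. Remember that steps 1 to 3 (signing up and agreeing to the honour code) still have to be done by hand first.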
    

Deployment

Provisioning (the first time)

$ ssh user@cloudmachine
$ git clone https://github.com/vise890/cloujera
$ cd cloujera
$ sudo ./scripts/provision.sh

(Re-)Deploying cloujera

# in the cloujera directory...
$ ./scripts/deploy.sh

NOTE: deploy.sh pulls the most recent version of cloujera from the repo

Troubleshooting

Ensure that all the containers are running in the VM:

$ vagrant ssh
$ sudo docker ps -a

You should see the redis, elasticsearch and cloujera containers, each with an "Up" status
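To check this without reading the table by eye, the container names can be compared against the expected list. This sketch assumes the containers are literally named `redis`, `elasticsearch` and `cloujera` (only `cloujera` is confirmed by the `docker logs` command below) and that your Docker version supports `docker ps --format`. `missing_containers` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Read container names (one per line) on stdin and print each expected
# container that is absent. Returns non-zero if any are missing.
# Assumption: the containers are named redis, elasticsearch and cloujera.
missing_containers() {
  local names expected status=0
  names=$(cat)
  for expected in redis elasticsearch cloujera; do
    # -x matches whole lines only, -q suppresses grep's own output.
    if ! printf '%s\n' "$names" | grep -qx "$expected"; then
      echo "missing: $expected"
      status=1
    fi
  done
  return "$status"
}
```

Inside the VM: `sudo docker ps --format '{{.Names}}' | missing_containers`. Plain `docker ps` (without `-a`) lists only running containers, so a stopped container is reported as missing.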

Checking the cloujera logs

$ vagrant ssh
$ sudo docker logs cloujera

Checking Elasticsearch health

Visit http://localhost:9200/. The response should include status: 200
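The same check works from a terminal. This sketch pulls the numeric `"status"` field out of the root response; `es_status` is a hypothetical helper, and newer Elasticsearch releases may omit that field altogether.

```bash
#!/usr/bin/env bash
# Extract the numeric "status" field from an Elasticsearch root response
# read on stdin, e.g.  "status" : 200,  ->  200
# Prints nothing if the field is absent.
es_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Usage: `curl -s http://localhost:9200/ | es_status` should print `200`. For a broader view, `curl -s 'http://localhost:9200/_cluster/health?pretty'` reports the cluster status as green, yellow or red.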

Checking if Redis is running

redis-cli will drop you into a Redis shell. Some useful commands are: INFO, MONITOR, HELP, HELP @server.

NOTE: this works from the host as well as in the Vagrant VM
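For a non-interactive check (in a script, say), `redis-cli ping` prints `PONG` when the server is up. This is a minimal sketch, assuming Redis is on its default port 6379; `redis_up` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Succeed only if the Redis server at HOST:PORT answers PING with PONG.
# Defaults to 127.0.0.1:6379 (Redis's default port).
redis_up() {
  local host=${1:-127.0.0.1} port=${2:-6379}
  # Compare the reply instead of relying on redis-cli's exit code.
  [ "$(redis-cli -h "$host" -p "$port" ping 2>/dev/null)" = "PONG" ]
}
```

Usage: `redis_up && echo "redis is up"`.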

Dropping into a shell inside a container

$ vagrant ssh    # or, on the server: ssh user@cloudbox
$ sudo docker exec -i -t cloujera bash

BUGS

  • lein run doesn't give any output initially
  • lein run doesn't reload