amld2020-unsupervised

Repository for the Fraud Detection (outlier detection on mixed data) workshop, AMLD 2020

General instructions:

Cloning the repo gives you the notebooks and data files needed during the workshop.
To get a working environment for the code, you may either:
- Use the Colab link in the *_colab.ipynb notebooks (NB: a Google account is needed for this!)
- Use Docker (in that case, be sure to build the image from the Dockerfile beforehand, see below)
- Install the packages from requirements.txt

NB: Everything has been tested with Python 3.6.x only.

For the workshop challenge, you will be submitting to an API that is hosted on AWS. Internet access is therefore vital! (The data volumes are rather small, however.)
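
Since the material has been tested with Python 3.6.x only, it can help to check the interpreter before starting. The snippet below is a minimal sketch, not part of the repo; the helper name is_tested_python is hypothetical, and it assumes that "3.6.x" means any 3.6 patch release.

```python
import sys

# Assumption: "3.6.x" means any patch release of Python 3.6.
TESTED_VERSION = (3, 6)


def is_tested_python(version_info=sys.version_info):
    """Return True if major.minor of version_info equals the tested version."""
    return tuple(version_info[:2]) == TESTED_VERSION


if __name__ == "__main__":
    if is_tested_python():
        print("Running on the tested Python version.")
    else:
        current = ".".join(str(part) for part in sys.version_info[:3])
        print("Running Python " + current + "; the material was tested with 3.6.x only.")
```

A different version does not necessarily break the notebooks; it only means that combination has not been tested.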

Cloning the repo

To clone the repo:

mkdir <DIR>

cd <DIR>

git clone https://github.com/amld/workshop-unsupervised-fraud

cd workshop-unsupervised-fraud/

Running the exercises

During the workshop, we will work on two notebooks: exercises_1.ipynb (or exercises_1_colab.ipynb) and challenge.ipynb (or challenge_colab.ipynb).

Instructions will be given during the workshop.

Getting the right Python Environments

The notebooks in the directory /notebooks depend on packages like scikit-learn and pyod, which in turn have other dependencies. To guarantee a compatible environment, there are three options.

The first option is to use the *_colab.ipynb notebooks, which contain a Colab link. For Colab, a Google account is necessary.

The second option is to use the Dockerfile provided in the docker-python directory. Note that Docker needs to be installed and that the Docker image is large (almost 4 GB), so build it before the workshop.

The final option is to create a conda environment (or other virtual environment) with the packages in requirements.txt installed.
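
Whichever option you choose, a quick way to confirm that the environment is usable is to check that the main packages can be imported. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the list covers only the two packages named above rather than everything in requirements.txt.

```python
import importlib.util

# Import names, not pip names: scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["sklearn", "pyod"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules without importing them, so it runs quickly but does not detect version mismatches.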

Colab instructions (option 1)

Open one of the *_colab.ipynb notebooks and follow its Colab link. The link takes you to Google Colab.

Docker instructions (option 2)

The Docker image is based on the jupyter/tensorflow-notebook image; see https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-tensorflow-notebook for details. This image may also be useful for other workshops.

Installing Docker and downloading the tensorflow-notebook image requires roughly 6 GB of disk space.

To get Docker up and running:

1. Download and install Docker Desktop: https://www.docker.com/products/docker-desktop
2. In the base directory of the repo, build the image. This fetches the base image and additionally installs keras and pyod:

    $ docker build docker-python -t jupyter-outlieramld

3. Check that the image has been built; jupyter-outlieramld should be listed:

    $ docker images

4. Run the image. The following command publishes the notebook server on port 8888 and attaches a volume (a connection to the host's filesystem):

    $ docker run -it -p 8888:8888 -v $(pwd):/home/jovyan jupyter-outlieramld

Copy the link printed in the terminal (http://127.0.0.1:8888/?token=124a64...) and paste it into a browser.
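
If the page does not load, it may help to check whether anything is listening on the published port. The snippet below is a minimal sketch using only the standard library; the helper name is_port_open is hypothetical, and the defaults assume the host and port from the link above (127.0.0.1 and 8888).

```python
import socket


def is_port_open(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on 127.0.0.1:8888.")
    else:
        print("Nothing reachable on 127.0.0.1:8888; is the container running?")
```

An open port only shows that some process is listening; it does not confirm that the process is the notebook server or that the token is valid.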

Docker tips

- Stop the container by pressing Ctrl-C in the terminal where the notebook server is running (the normal way).
- $ docker ps -a shows all Docker containers, running and stopped.
- Open a terminal in a running Docker container with $ docker exec -it <container id> bash
- To remove all stopped Docker containers (this may free some disk space; when in doubt, there is no need to do so): $ docker container prune

Creating a conda or virtual environment (option 3)

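Create and activate a conda environment (or other virtual environment), then run pip install -r requirements.txt from the base directory of the repo.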
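
A common pitfall with virtual environments is that the pip on the PATH belongs to a different interpreter than the one running the notebooks. One way around this, sketched below under that assumption, is to invoke pip through the current interpreter. The helper name pip_install_command is hypothetical and not part of the repo.

```python
import sys


def pip_install_command(requirements_path="requirements.txt"):
    """Build a pip command bound to the interpreter that runs this code."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]


# Usage (not executed here, since it would modify the environment):
#   import subprocess
#   subprocess.run(pip_install_command(), check=True)
```

Running the command from inside the activated environment installs the packages into that environment, regardless of which pip executable comes first on the PATH.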
