Extreme-Classification

DS-GA 1003 (Machine Learning): group project on "Extreme Classification".

Dataset

  1. Download train.csv and dev.csv from here.

  2. Place both files in the data/raw/ directory at the root of the repository. (Important: the files must be in exactly this location.)

  3. From the root of the repository, run the following commands to convert the sparse data into dense ("normal") dataframes:

    cd code
    python construct_data.py
  4. The dataframes can then be loaded as follows:

    import pandas as pd
    
    NUM_FEATURES = 5000
    NUM_CLASSES = 3993
    
    # Assumes the working directory is a sub-directory of the repository root (e.g. code/ or notebooks/)
    features = pd.read_csv("../data/expanded/train_features.csv", names=range(NUM_FEATURES))
    labels = pd.read_csv("../data/expanded/train_labels.csv", names=range(NUM_CLASSES))
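This README does not describe the internals of construct_data.py, so the following is only a minimal sketch of what a sparse-to-dense conversion can look like. It assumes (hypothetically) that each raw row holds comma-separated label ids followed by space-separated `feature:value` pairs, e.g. `3,17 0:1.5 42:0.25`. The helper names `parse_sparse_line` and `sparse_to_dense` are illustrative and are not part of the repository.

```python
import numpy as np
import pandas as pd

NUM_FEATURES = 5000
NUM_CLASSES = 3993


def parse_sparse_line(line, num_features=NUM_FEATURES, num_classes=NUM_CLASSES):
    """Parse one raw row into a dense feature vector and a multi-hot label vector.

    ASSUMED (hypothetical) format: "l1,l2,... f1:v1 f2:v2 ...";
    the label token may be absent when a row has no labels.
    """
    x = np.zeros(num_features, dtype=np.float32)
    y = np.zeros(num_classes, dtype=np.int8)
    tokens = line.split()
    # The first token holds the labels unless it already looks like "index:value".
    if tokens and ":" not in tokens[0]:
        label_token, pairs = tokens[0], tokens[1:]
    else:
        label_token, pairs = "", tokens
    # Set a 1 in the column of every listed label (multi-hot encoding).
    for label in filter(None, label_token.split(",")):
        y[int(label)] = 1
    # Copy each non-zero feature value into its column; all others stay 0.
    for pair in pairs:
        index, value = pair.split(":")
        x[int(index)] = float(value)
    return x, y


def sparse_to_dense(lines, num_features=NUM_FEATURES, num_classes=NUM_CLASSES):
    """Convert an iterable of non-empty raw rows into (features, labels) DataFrames."""
    rows = [parse_sparse_line(line, num_features, num_classes)
            for line in lines if line.strip()]
    # Columns get a default integer index 0..n-1, matching names=range(...) on load.
    features = pd.DataFrame(np.vstack([x for x, _ in rows]))
    labels = pd.DataFrame(np.vstack([y for _, y in rows]))
    return features, labels


if __name__ == "__main__":
    sample = ["3,17 0:1.5 42:0.25", "8 7:2.0"]
    features, labels = sparse_to_dense(sample)
    print(features.shape, labels.shape)  # (2, 5000) (2, 3993)
```

Writing the two frames with `to_csv(path, header=False, index=False)` produces header-less files that the `read_csv(..., names=range(...))` calls in step 4 can read back. Dense storage is simple to work with but costs memory: every row carries all 5000 feature columns and 3993 label columns, even though most entries are zero.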

Tracking the project

Refer to the project tracker.

Requirements

  • numpy
  • pandas
  • scikit-learn
  • matplotlib
  • tqdm
  • PyTorch
  • PyTorch Lightning
  • TensorBoard (for visualizing results)
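A quick way to confirm that the packages above are installed is a check with the standard library's `importlib.util.find_spec`, which returns `None` for modules that cannot be found. This is a minimal sketch: the mapping from package names to import names (e.g. scikit-learn imports as `sklearn`, PyTorch as `torch`, and PyTorch Lightning is assumed to be the standalone `pytorch_lightning` package) is written out by hand, and `missing_requirements` is a hypothetical helper, not part of the repository.

```python
import importlib.util

# Package name -> import name, written out by hand (assumed, not read from the repo).
REQUIRED_MODULES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "tqdm": "tqdm",
    "PyTorch": "torch",
    "PyTorch Lightning": "pytorch_lightning",
    "TensorBoard": "tensorboard",
}


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the package names whose import name cannot be found."""
    return [package for package, module in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All requirements found.")
```

`find_spec` only locates each top-level module without importing it, so the check stays fast even for heavy packages such as PyTorch.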