Skip to content
forked from xiyanghu/OSDT

Optimal Sparse Decision Trees

Notifications You must be signed in to change notification settings

zhouyuan1119/OSDT

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Optimal Sparse Decision Trees (OSDT)

This accompanies the paper, "Optimal Sparse Decision Trees" by Xiyang Hu, Cynthia Rudin, and Margo Seltzer.

It appeared in the 2019 NeurIPS conference

Dependencies

  • gmp (GNU Multiple Precision Arithmetic Library)
  • mpfr (GNU MPFR Library for multiple-precision floating-point computations; depends on gmp)
  • libmpc (GNU MPC for arbitrarily high precision and correct rounding; depends on gmp and mpfr)
  • gmpy2 (GMP/MPIR, MPFR, and MPC interface to Python 2.6+ and 3.x)

Main function

The main function is the bbound() function in osdt.py.

Arguments

[x] The features of the training data.

[y] The labels of the training data.

[lamb] The regularization parameter lambda of the objective function.

[prior_metric] The scheduling policy.

  • Use curiosity to prioritize by curiosity (see Section 5 of our paper).
  • Use bound to prioritize by the lower bound.
  • Use objective to prioritize by the objective.
  • Use FIFO for first-in-first-out search.

[MAXDEPTH] Maximum depth of the tree. Default value is float('Inf').

[MAX_NLEAVES] Maximum number of leaves of the tree. Default value is float('Inf').

[niter] Maximum number of tree evaluations. Default value is float('Inf').

[logon] Record relevant trees and values during the execution. Default if False.

[support] Turn on Lower bound on leaf support. Default is True.

[incre_support] Turn on Lower bound on incremental classification accuracy. Default is True.

[accu_support] Turn on Lower bound on classification accuracy. Default is True.

[equiv_points] Turn on Equivalent points bound. Default is True.

[lookahead] Turn on Lookahead bound. Default is True.

[lenbound] Turn on Prefix-specific upper bound on number of leaves. Default is True.

[timelimit] Time limit on the running time. Default is True.

[init_cart] Initialize with CART. Default is True.

Example test code

We provide our test code in test_accuracy.py.

Dataset

See data/preprocessed/.

We used 7 datasets: Five of them are from the UCI Machine Learning Repository (tic-tac-toc, car evaluation, monk1, monk2, monk3). The other two datasets are the ProPublica recidivism data set and the Fair Isaac (FICO) credit risk datasets. We predict which individuals are arrested within two years of release ({N = 7,215}) on the recidivism data set and whether an individual will default on a loan for the FICO dataset.

About

Optimal Sparse Decision Trees

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%