Thien Q. Tran, Kazuto Fukuchi, Youhei Akimoto and Jun Sakuma
SPUR stands for [S]ignificant [P]attern Mining with [U]tility [R]elationship, a method that allows preference relation between patterns and aims to find out the most useful patterns under the constraint of statistical significance.
SPUR introduce a novel iterative multiple testing procedure which is able to alternately reject a hypothesis and safely ignore some other hypotheses. To be specific, after discovered a significant pattern, SPUR can safely ignore less useful patterns compared to the discovered pattern. As a result, SPUR can save the significance budget for discovering other useful patterns. Moreover, it can leverage the Tarone’s trick to deal with large amount of patterns.
No. Our testing procedure control the FWER under an assumption that the p-values obtained from the true patterns and the false patterns are independent. At the moment, we found that this assumption holds when the test to use is Fisher’s exact test, and the pattern set is designed to separate the dataset into non-overlap subsets. Please consider reading our paper for more details.
See requirements.txt for a list of python dependencies to run our code.
python synthesis_exp.py
python synthesis_exp_summarizer.py
Data preparation: We note that the used data is upload in this repository since the data from the source is updated regularly and the result would be different.
sh data_preprocess.sh
To run an experiment:
python real_data_exp.py [task] [alpha]
where task in [crash, adult, property, person, society] and alpha is the significant level.
To run all experiments:
sh real_data_run_all.sh
The full proof of Theorem 6.1 (FWER controlling) can be found in supplementary.pdf