[KDD2020] Statistically Significant Pattern Mining with Ordinal Utility (SPUR)

Thien Q. Tran, Kazuto Fukuchi, Youhei Akimoto and Jun Sakuma

What is SPUR?

SPUR stands for [S]ignificant [P]attern Mining with [U]tility [R]elationship, a method that allows preference relation between patterns and aims to find out the most useful patterns under the constraint of statistical significance.

What is special about SPUR compared to other methods?

SPUR introduce a novel iterative multiple testing procedure which is able to alternately reject a hypothesis and safely ignore some other hypotheses. To be specific, after discovered a significant pattern, SPUR can safely ignore less useful patterns compared to the discovered pattern. As a result, SPUR can save the significance budget for discovering other useful patterns. Moreover, it can leverage the Tarone’s trick to deal with large amount of patterns.

Is SPUR applicable to all Statistically Significant Pattern Mining (SSPM) problem?

No. Our testing procedure control the FWER under an assumption that the p-values obtained from the true patterns and the false patterns are independent. At the moment, we found that this assumption holds when the test to use is Fisher’s exact test, and the pattern set is designed to separate the dataset into non-overlap subsets. Please consider reading our paper for more details.

Requirements

See requirements.txt for a list of python dependencies to run our code.

How to reprocedure our experiments.

Synthesis experiment

python synthesis_exp.py
python synthesis_exp_summarizer.py

Real-world dataset experiment

Data preparation: We note that the used data is upload in this repository since the data from the source is updated regularly and the result would be different.

sh data_preprocess.sh

To run an experiment:

python real_data_exp.py [task] [alpha]

where task in [crash, adult, property, person, society] and alpha is the significant level.

To run all experiments:

sh real_data_run_all.sh

Supplementary material

The full proof of Theorem 6.1 (FWER controlling) can be found in supplementary.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

[KDD2020] Statistically Significant Pattern Mining with Ordinal Utility (SPUR)

What is SPUR?

What is special about SPUR compared to other methods?

Is SPUR applicable to all Statistically Significant Pattern Mining (SSPM) problem?

Requirements

How to reprocedure our experiments.

Synthesis experiment

Real-world dataset experiment

Supplementary material

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
data		data
func		func
LICENSE		LICENSE
README.org		README.org
data_preprocess.sh		data_preprocess.sh
real_data_exp.py		real_data_exp.py
real_data_run_all.sh		real_data_run_all.sh
requirements.txt		requirements.txt
supplementary.pdf		supplementary.pdf
synthesis_exp.py		synthesis_exp.py
synthesis_exp_summarizer.py		synthesis_exp_summarizer.py

License

dizzyvn/significant-pattern-mininig-with-utility

Folders and files

Latest commit

History

Repository files navigation

[KDD2020] Statistically Significant Pattern Mining with Ordinal Utility (SPUR)

What is SPUR?

What is special about SPUR compared to other methods?

Is SPUR applicable to all Statistically Significant Pattern Mining (SSPM) problem?

Requirements

How to reprocedure our experiments.

Synthesis experiment

Real-world dataset experiment

Supplementary material

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages