Azadkia-Chatterjee coefficient (#385)
* reproducible test for chatterjee coefficient too

* be explicit with keyword

* Azadkia-Chatterjee coefficient

* Fix tests

* Fix header

* Add missing reference

* Show method

* More tests based on the original paper
kahaaga authored Aug 4, 2024
1 parent a0014fc commit 27a2bb8
Showing 21 changed files with 434 additions and 13 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -2,7 +2,7 @@ name = "Associations"
uuid = "614afb3a-e278-4863-8805-9959372b9ec2"
authors = ["Kristian Agasøster Haaga <[email protected]>", "Tor Einar Møller <[email protected]>", "George Datseris <[email protected]>"]
repo = "https://github.com/kahaaga/Associations.jl.git"
-version = "4.1.0"
+version = "4.2.0"

[deps]
Accessors = "7d9f7c33-5ae7-4f3b-8dc6-eff91059b697"
4 changes: 4 additions & 0 deletions changelog.md
@@ -2,6 +2,10 @@

From version v4.0 onwards, this package has been renamed to Associations.jl.

# 4.2

- New association measure: `AzadkiaChatterjeeCoefficient`.

# 4.1

- New association measure: `ChatterjeeCorrelation`.
11 changes: 11 additions & 0 deletions docs/refs.bib
@@ -1322,4 +1322,15 @@ @article{Dette2013
pages={21--41},
year={2013},
publisher={Wiley Online Library}
}

@article{Azadkia2021,
title={A simple measure of conditional dependence},
author={Azadkia, Mona and Chatterjee, Sourav},
journal={The Annals of Statistics},
volume={49},
number={6},
pages={3070--3102},
year={2021},
publisher={Institute of Mathematical Statistics}
}
1 change: 1 addition & 0 deletions docs/src/associations.md
@@ -115,6 +115,7 @@ PearsonCorrelation
PartialCorrelation
DistanceCorrelation
ChatterjeeCorrelation
AzadkiaChatterjeeCoefficient
```

## [Cross-map measures](@id cross_map_api)
17 changes: 17 additions & 0 deletions docs/src/examples/examples_associations.md
@@ -1724,5 +1724,22 @@ z = rand(rng, 1:15, 120) .* sin.(w) # introduce some dependence
association(ChatterjeeCorrelation(handle_ties = true), w, z)
```

## [[`AzadkiaChatterjeeCoefficient`](@ref)](@id example_AzadkiaChatterjeeCoefficient)


```@example example_AzadkiaChatterjeeCoefficient
using Associations
using Random; rng = Xoshiro(1234);
x = rand(rng, 120)
y = rand(rng, 120) .* x
z = rand(rng, 120) .+ y
```

For the variables above, where `x → y → z`, we expect a stronger association between `x` and `y` than
between `x` and `z`. We also expect the strength of the association between `x` and `z` to drop when conditioning on `y`, because `y` is the variable that connects `x` and `z`.

```@example example_AzadkiaChatterjeeCoefficient
m = AzadkiaChatterjeeCoefficient(theiler = 0) # only exclude self-neighbors
association(m, x, y), association(m, x, z), association(m, x, z, y)
```

77 changes: 77 additions & 0 deletions docs/src/examples/examples_independence.md
@@ -138,6 +138,58 @@ independence(test, y, z)

The test clearly picks up on the functional dependence.


### [Azadkia-Chatterjee coefficient](@id example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient)

```@example example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient
using Associations
using Random; rng = Xoshiro(1234)
n = 1000
# Some categorical variables (we add a small amount of noise to avoid duplicate points
# during neighbor searches)
x = rand(rng, 1.0:50.0, n) .+ rand(n) .* 1e-8
y = rand(rng, 1.0:50.0, n) .+ rand(n) .* 1e-8
test = SurrogateAssociationTest(AzadkiaChatterjeeCoefficient(), nshuffles = 19)
independence(test, x, y)
```

As expected, the test indicates that we can't reject independence. What happens if we introduce
a third variable that depends on `y`?

```@example example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient
z = rand(rng, 1.0:20.0, n) .* y
independence(test, y, z)
```

The test clearly picks up on the functional dependence. But what about conditioning?
Let's define three variables where `x → y → z`. We then expect a significant association between `x` and `y`, possibly between `x` and `z` (depending on how strong the intermediate connection is), and
a non-significant association between `x` and `z` when conditioning on `y` (since `y` is the variable
connecting `x` and `z`). The Azadkia-Chatterjee coefficient should also be able to verify these
claims.

```@example example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient
x = rand(rng, 120)
y = rand(rng, 120) .* x
z = rand(rng, 120) .+ y
independence(test, x, y)
```

The direct association between `x` and `y` is detected.

```@example example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient
independence(test, x, z)
```

The indirect association between `x` and `z` is also detected.

```@example example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient
independence(test, x, z, y)
```

We can't reject independence between `x` and `z` when taking into consideration
`y`, as expected.


### [Distance correlation](@id example_SurrogateAssociationTest_DistanceCorrelation)

```@example
@@ -393,3 +445,28 @@ The same goes for variables one step up the chain:
```@example example_LocalPermutationTest
independence(test, y, w, z)
```


### [[`AzadkiaChatterjeeCoefficient`](@ref)](@id example_LocalPermutationTest_AzadkiaChatterjeeCoefficient)

```@example example_LocalPermutationTest_AzadkiaChatterjeeCoefficient
using Associations
using Random; rng = Xoshiro(1234)
n = 300
# Set up the test, then generate three continuous variables forming the chain x → y → z.
test = LocalPermutationTest(AzadkiaChatterjeeCoefficient(), nshuffles = 19)
x = rand(rng, n)
y = rand(rng, n) .* x
z = rand(rng, n) .+ y
```

The three variables above form the chain `x → y → z`. We expect a
non-significant association between `x` and `z` when conditioning on `y` (since `y` is the variable
connecting `x` and `z`).

```@example example_LocalPermutationTest_AzadkiaChatterjeeCoefficient
independence(test, x, z, y)
```

The test verifies our expectation.
3 changes: 2 additions & 1 deletion src/core.jl
@@ -38,6 +38,7 @@ an [`AssociationMeasureEstimator`](@ref) to compute.
| Correlation | [`PartialCorrelation`](@ref) | ✓ | ✓ |
| Correlation | [`DistanceCorrelation`](@ref) | ✓ | ✓ |
| Correlation | [`ChatterjeeCorrelation`](@ref) | ✓ | ✖ |
| Correlation | [`AzadkiaChatterjeeCoefficient`](@ref) | ✓ | ✓ |
| Closeness | [`SMeasure`](@ref) | ✓ | ✖ |
| Closeness | [`HMeasure`](@ref) | ✓ | ✖ |
| Closeness | [`MMeasure`](@ref) | ✓ | ✖ |
@@ -93,6 +94,7 @@ Concrete subtypes are given as input to [`association`](@ref).
| [`DistanceCorrelation`](@ref) | Not required |
| [`PartialCorrelation`](@ref) | Not required |
| [`ChatterjeeCorrelation`](@ref) | Not required |
| [`AzadkiaChatterjeeCoefficient`](@ref) | Not required |
| [`SMeasure`](@ref) | Not required |
| [`HMeasure`](@ref) | Not required |
| [`MMeasure`](@ref) | Not required |
@@ -125,7 +127,6 @@ Concrete subtypes are given as input to [`association`](@ref).
| [`KLDivergence`](@ref) | [`JointProbabilities`](@ref) |
| [`RenyiDivergence`](@ref) | [`JointProbabilities`](@ref) |
| [`VariationDistance`](@ref) | [`JointProbabilities`](@ref) |
"""
abstract type AssociationMeasureEstimator end

21 changes: 13 additions & 8 deletions src/independence_tests/local_permutation/LocalPermutationTest.jl
@@ -74,13 +74,15 @@ instead of `Z` and we `I(X; Y)` and `Iₖ(X̂; Y)` instead of `I(X; Y | Z)` and
## Compatible measures
| Measure | Pairwise | Conditional | Requires `est` | Note |
| ---------------------------------- | :------: | :---------: | :------------: | :-------------------------------------------------------------------------------------------------------------------------------: |
| [`PartialCorrelation`](@ref) | ✖ | ✓ | No | |
| [`DistanceCorrelation`](@ref) | ✖ | ✓ | No | |
| [`CMIShannon`](@ref) | ✖ | ✓ | Yes | |
| [`TEShannon`](@ref) | ✓ | ✓ | Yes | Pairwise tests not possible with `TransferEntropyEstimator`s, only lower-level estimators, e.g. `FPVP`, `GaussianMI` or `Kraskov` |
| [`PartialMutualInformation`](@ref) | ✖ | ✓ | Yes | |
| Measure | Pairwise | Conditional | Requires `est` | Note |
| -------------------------------------- | :------: | :---------: | :------------: | :-------------------------------------------------------------------------------------------------------------------------------: |
| [`PartialCorrelation`](@ref) | ✖ | ✓ | No | |
| [`DistanceCorrelation`](@ref) | ✖ | ✓ | No | |
| [`CMIShannon`](@ref) | ✖ | ✓ | Yes | |
| [`TEShannon`](@ref) | ✓ | ✓ | Yes | Pairwise tests not possible with `TransferEntropyEstimator`s, only lower-level estimators, e.g. `FPVP`, `GaussianMI` or `Kraskov` |
| [`PartialMutualInformation`](@ref) | ✖ | ✓ | Yes | |
| [`AzadkiaChatterjeeCoefficient`](@ref) | ✖ | ✓ | No | |
The `LocalPermutationTest` is only defined for conditional independence testing.
Exceptions are for measures like [`TEShannon`](@ref), which use conditional
@@ -96,6 +98,8 @@ The nearest-neighbor approach in Runge (2018) can be reproduced by using the
Conditional independence test using [`CMIShannon`](@ref)
- [Example 2](@ref example_LocalPermutationTest_TEShannon):
Conditional independence test using [`TEShannon`](@ref)
- [Example 3](@ref example_LocalPermutationTest_AzadkiaChatterjeeCoefficient):
Conditional independence test using [`AzadkiaChatterjeeCoefficient`](@ref)
"""
struct LocalPermutationTest{M, C, R} <: IndependenceTest{M}
est_or_measure::M
@@ -231,5 +235,6 @@ end
function LocalPermutationTest(m::MultivariateInformationMeasure; kwargs...)
throw(ArgumentError("You need to provide an estimator for the multivariate information measure $(typeof(m)), not only the definition."))
end
# TODO: fix this

include("transferentropy.jl")
include("azadkia_chatterjee_coefficient.jl")
@@ -0,0 +1,12 @@
function independence(test::LocalPermutationTest{<:AzadkiaChatterjeeCoefficient}, x::AbstractVector, y, z)
    est_or_measure, nshuffles = test.est_or_measure, test.nshuffles
    # Make sure that the measure is compatible with the input data.
    verify_number_of_inputs_vars(est_or_measure, 3)

    Y, Z = StateSpaceSet(y), StateSpaceSet(z)
    @assert length(x) == length(Y) == length(Z)
    Î = association(est_or_measure, x, Y, Z)
    Îs = permuted_Îs(x, Y, Z, est_or_measure, test)
    p = count(Î .<= Îs) / nshuffles
    return LocalPermutationTestResult(3, Î, Îs, p, nshuffles)
end
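
The method above delegates the shuffling to `permuted_Îs`, whose body is not part of this diff. As a rough illustration of the idea behind a local permutation, the sketch below shuffles `x` only among points whose conditioning value is similar. The function `local_shuffle` and its keyword `k` are hypothetical names introduced here, not part of Associations.jl, and the scheme samples neighbors with replacement, which is a simplification of the nearest-neighbor approach in Runge (2018).

```julia
using Random

# Hypothetical helper (not from Associations.jl): replace each x[i] by the x-value
# of a randomly chosen point among the k points closest to i in the conditioning
# variable z (the point i itself counts as one of them).
function local_shuffle(rng, x::AbstractVector, z::AbstractVector; k = 5)
    n = length(x)
    @assert length(z) == n && 1 <= k < n
    x_perm = similar(x)
    for i in 1:n
        # Indices ordered by distance to z[i] in the conditioning variable.
        order = sortperm(abs.(z .- z[i]))
        neighbors = order[1:k]
        x_perm[i] = x[rand(rng, neighbors)]
    end
    return x_perm
end

rng = Xoshiro(1234)
z = collect(1.0:10.0)
x = collect(11.0:20.0)
x_perm = local_shuffle(rng, x, z; k = 3)
```

Because each value is only swapped with values from nearby points in `z`, any dependence between `x` and `z` is approximately kept while dependence on a third variable given `z` is broken. The brute-force neighbor search is quadratic in the number of points and is meant for reading, not for performance.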
5 changes: 5 additions & 0 deletions src/independence_tests/surrogate/SurrogateAssociationTest.jl
@@ -63,6 +63,10 @@ For each shuffle, `est_or_measure` is recomputed and the results are stored.
[`CMIShannon`](@ref) test for conditional independence on categorical data.
- [Example 6](@ref example_independence_MCR): [`MCR`](@ref) test for
pairwise and conditional independence.
- [Example 7](@ref example_SurrogateAssociationTest_ChatterjeeCorrelation).
[`ChatterjeeCorrelation`](@ref) test for pairwise independence.
- [Example 8](@ref example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient).
[`AzadkiaChatterjeeCoefficient`](@ref) test for pairwise and conditional independence.
"""
struct SurrogateAssociationTest{E, R, S} <: IndependenceTest{E}
est_or_measure::E
@@ -152,6 +156,7 @@ include("transferentropy.jl")
include("crossmapping.jl")
include("hlms_measure.jl")
include("chatterjee_correlation.jl")
include("azadkia_chatterjee_coefficient.jl")

# Input checks
function SurrogateAssociationTest(measure::T) where T <: MultivariateInformationMeasure
19 changes: 19 additions & 0 deletions src/independence_tests/surrogate/azadkia_chatterjee_coefficient.jl
@@ -0,0 +1,19 @@
function independence(test::SurrogateAssociationTest{<:AzadkiaChatterjeeCoefficient},
        x::AbstractVector, y_and_possibly_z...)
    (; est_or_measure, rng, surrogate, nshuffles) = test

    # Create a new instance of the measure with pre-allocated values. We can then
    # re-use this struct to avoid excessive allocations.
    measure = AzadkiaChatterjeeCoefficient(x, y_and_possibly_z...;
        theiler = test.est_or_measure.theiler)

    Î = association(measure, x, y_and_possibly_z...)
    sx = surrogenerator(x, surrogate, rng)
    Îs = zeros(nshuffles)
    for b in 1:nshuffles
        Îs[b] = association(est_or_measure, sx(), y_and_possibly_z...)
    end
    p = count(Î .<= Îs) / nshuffles

    return SurrogateAssociationTestResult(1 + length(y_and_possibly_z), Î, Îs, p, nshuffles)
end
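
The p-value logic in the method above can be tried without the package. The following is a minimal sketch that uses only the `Random` and `Statistics` standard libraries; `toy_stat` (absolute Pearson correlation) and `shuffle_pvalue` are illustrative stand-ins chosen here, not the Azadkia-Chatterjee coefficient and not functions from Associations.jl.

```julia
using Random
using Statistics

# Stand-in association statistic (an assumption for illustration only).
toy_stat(x, y) = abs(cor(x, y))

# Surrogate-style p-value: the fraction of shuffled statistics that are at least
# as large as the statistic computed on the original data.
function shuffle_pvalue(stat, x, y; nshuffles = 19, rng = Random.default_rng())
    Î = stat(x, y)
    Îs = zeros(nshuffles)
    for b in 1:nshuffles
        # Shuffling x destroys any dependence between x and y.
        Îs[b] = stat(shuffle(rng, x), y)
    end
    p = count(Î .<= Îs) / nshuffles
    return Î, Îs, p
end

rng = Xoshiro(1234)
x = rand(rng, 200)
y = x .+ 0.1 .* rand(rng, 200)   # y depends strongly on x
Î, Îs, p = shuffle_pvalue(toy_stat, x, y; nshuffles = 19, rng = rng)
```

With this estimator the p-value is always a multiple of `1 / nshuffles`, so with `nshuffles = 19` as in the documentation examples the smallest nonzero value is `1/19`, and `p` is exactly zero whenever no shuffled statistic reaches the observed one.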

2 comments on commit 27a2bb8

@kahaaga kahaaga commented on 27a2bb8 Aug 4, 2024


@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/112378

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v4.2.0 -m "<description of version>" 27a2bb86b2f23abb0392aabf74f23743551b6cbe
git push origin v4.2.0
