
Fluctuation complexity, restrict possibilities to formally defined self-informations #413

Draft
wants to merge 17 commits into main
Conversation

kahaaga
Member

@kahaaga kahaaga commented Jun 10, 2024

What's this?

Here I address #410 and restrict the fluctuation complexity to information measures for which it is possible to define "self-information" in the following sense.

Given an information measure H, I define the "generalized" self-information as the functional I(p_i) that allows us to rewrite H as the probability-weighted sum H = sum_i p_i I(p_i) (a weighted average; since sum_i p_i = 1, the denominator of the weighted average doesn't appear explicitly).

Next, the fluctuation complexity is sqrt( sum_{i=1}^N p_i (I(p_i) - H)^2 ). Hence, using the formulation above, we can meaningfully speak of a fluctuation of local information around the mean information, regardless of which measure is chosen.
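For example, sketching the Shannon and Tsallis cases (natural logarithms for brevity; the full derivations are written up in the paper draft):

$$
H_S = -\sum_i p_i \log p_i = \sum_i p_i \,(-\log p_i) \quad \Rightarrow \quad I(p_i) = -\log p_i,
$$

$$
H_q = \frac{1 - \sum_i p_i^q}{q - 1} = \sum_i p_i \,\frac{1 - p_i^{q-1}}{q - 1} \quad \Rightarrow \quad I_q(p_i) = \frac{1 - p_i^{q-1}}{q - 1},
$$

which recovers the Shannon expression in the limit q → 1.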

I also require that the generalized self-information yields a fluctuation complexity with the same properties as the original Shannon-based fluctuation complexity:

  • that it is zero for the uniform distribution,
  • that it is zero for distributions where p_k = 1 and p_i = 0 for i != k.

Note that we don't involve the axioms which the Shannon self-information fulfills at all: we only demand that the generalized self-information is the functional with the properties above. I haven't been able, at least so far, to find any papers in the literature that deal with this concept for Tsallis or other generalized entropies, so I think it is safe to explore with this naming convention.

New design

  • I introduce a new API method self_information(measure::InformationMeasure, p_i, args...).
  • This method is called inside information(measure::FluctuationComplexity, args...) to compute the I(p_i) terms in the sum of the fluctuation complexity. Only measures that implement self_information are valid; otherwise an error is thrown (a sketch of the intended dispatch pattern follows below).
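Roughly, the design looks like this (an illustrative sketch only; the types and fields here are simplified stand-ins, not the actual PR code):

```julia
# Simplified stand-ins for the library types, just to illustrate the dispatch pattern.
abstract type InformationMeasure end

Base.@kwdef struct Shannon <: InformationMeasure
    base::Real = 2
end

struct FluctuationComplexity{M <: InformationMeasure}
    definition::M
end

# Generalized self-information I(p), defined per measure via dispatch.
self_information(m::Shannon, p::Real) = -log(m.base, p)

# Measures without a `self_information` method are rejected with an informative error.
self_information(m::InformationMeasure, p::Real) =
    error("`self_information` is not defined for $(typeof(m)).")

# Fluctuation complexity: sqrt(Σᵢ pᵢ (I(pᵢ) - H)²), where H = Σᵢ pᵢ I(pᵢ).
function information(fc::FluctuationComplexity, probs::AbstractVector{<:Real})
    ps = filter(>(0), probs)                        # zero probabilities are filtered out
    Is = [self_information(fc.definition, p) for p in ps]
    H = sum(ps .* Is)                               # the measure as a probability-weighted sum
    return sqrt(sum(ps .* (Is .- H) .^ 2))
end
```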

Progress

I've made the necessary derivations for the measures where calculations looked easiest: Shannon entropy/extropy, Tsallis entropy and Curado entropy. I'll fill in the gaps for the rest of the measures whenever I get some free time.

I'm writing this all up in a paper, where I also highlight ComplexityMeasures.jl and how easy it is to use the measure practically due to our discrete estimation API. I've essentially finished the intro and method, but the experimental part remains to be done. For that, I need functional code. So before I proceed, I'd like to get your input on this code proposal, @Datseris. Does this dispatch-based system make sense?

Pending the paper, I verify correctness by numerical comparison in the test suite: I rewrite the information measures as weighted sums involving self_information and check that we obtain the same value as when computing the measure using the traditional formulation.
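For example, a minimal version of such a comparison, re-using the stand-in Shannon from the sketch above, would look roughly like this:

```julia
using Test

# A random, normalized probability distribution.
probs = rand(100)
probs ./= sum(probs)

m = Shannon(base = 2)

# Weighted-sum formulation via self_information ...
H_selfinfo = sum(p * self_information(m, p) for p in probs)
# ... versus the traditional Shannon formula.
H_direct = -sum(p * log(2, p) for p in probs)

@test H_selfinfo ≈ H_direct
```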

@Datseris
Member

This all sounds good to me; however, I dislike the term self_information. I don't understand at all what the "self" refers to. The correct name for this quantity would be information, but we have already used that name for the average information in a whole signal. surprisal is okay; in our textbook we just call this surprise. But perhaps an overall better name is unit_information?

@Datseris
Member

Oh, just now I saw the wiki https://en.wikipedia.org/wiki/Information_content

Right, I would still stay away from self, but we can use information_content instead.

@kahaaga
Member Author

kahaaga commented Jun 18, 2024

> Oh, just now I saw the wiki https://en.wikipedia.org/wiki/Information_content
>
> Right, I would still stay away from self, but we can use information_content instead.

I'm fine with whatever term we use, as long as it is rooted in common literature usage. We can go for information_content.👍

@@ -279,6 +280,39 @@ function information(::InformationMeasure, ::DifferentialInfoEstimator, args...)
))
end

"""
self_information(measure::InformationMeasure, pᵢ)
Member

Suggested change
self_information(measure::InformationMeasure, pᵢ)
self_information(measure::InformationMeasure, p::Real)

I'd suggest that we use just p here and make it clear in the docstring that this is a number. We use probs for Probabilities in most places in the library.

Member

Another argument is that the subscript i is out of context here, and may confuse rather than clarify.

Member Author

Agreed, we can call this simply p. It is clear that p is from a distribution.

Compute the "self-information"/"surprisal" of a single probability `pᵢ` under the given
information measure.

This function assumes `pᵢ > 0`, so make sure to pre-filter your probabilities.
Member

Suggested change
This function assumes `pᵢ > 0`, so make sure to pre-filter your probabilities.
This function requires `pᵢ > 0`.

Just require it and throw an error in the function body.

Member

Ah, I see the problem here. You want to define all information content functions with their simple syntax, since we filter out zero probabilities anyway when we compute entropy.

Okay, let's say then: "This function requires `pᵢ > 0`; giving 0 will yield `Inf` or `NaN`."
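To illustrate, with a Shannon-style I(p) = -log(p):

```julia
julia> -log(0.0)          # self-information at p = 0
Inf

julia> 0.0 * (-log(0.0))  # the corresponding weighted term is NaN, not 0
NaN
```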

Member Author

Yep, we can say that.

@@ -57,3 +57,8 @@ function information_maximum(e::TsallisExtropy, L::Int)

return ((L - 1) * L^(q - 1) - (L - 1)^q) / ((q - 1) * L^(q - 1))
end

function self_information(e::TsallisExtropy, pᵢ, N) #must have N
Member

What is N here? This is not reflected in the function definition or call signature.

If N is either the length of probs or the total number of outcomes, then this quantity does not satisfy the definition of an information content, as it isn't exclusively a function of a single real number. You can explore options in the paper, but here I'd say we don't keep it in the software until it is more solid.

Member Author

I'll see if I can define a functional that doesn't depend on N; if not, we keep it out.

@kahaaga
Member Author

kahaaga commented Jun 18, 2024

@Datseris I'm also a bit hesitant about the name FluctuationComplexity. Yes, it is a complexity measure in the sense that it uses entropies and friends. However, I think a better, more general name is InformationFluctuation, which is more in line with the fact that we define information_content.

This goes a bit against the terminology used in the literature, but I think it is more precise. What do you think?

EDIT: or maybe just Fluctuation(measure::InformationMeasure). Then it is implicit that it is a fluctuation of information, since it takes an InformationMeasure as input.

@kahaaga
Member Author

kahaaga commented Jun 18, 2024

(screenshot of the paper's example syntax when using Fluctuation)

This is how it will look in the paper if using Fluctuation. I think this is nice and clean syntax.

@Datseris
Member

We can keep FluctuationComplexity and have it as a complexity measure, with a reference to the literature article; it dispatches to InformationFluctuation with Shannon. The name InformationFluctuation can do the generic thing you attempt in this PR and should cite your paper once you are done with it.
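Something like this, sketching the suggested layering with the stand-in names from earlier in the thread (so here FluctuationComplexity would become a thin Shannon-specific constructor rather than the generic struct from the sketch above):

```julia
# Generic measure proposed in this PR; should cite the new paper.
struct InformationFluctuation{M <: InformationMeasure}
    definition::M
end

# The literature name stays as a Shannon special case that simply forwards to
# the generic measure, keeping the reference to the original article.
FluctuationComplexity(; base = 2) = InformationFluctuation(Shannon(base = base))
```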

@Datseris
Member

I don't like the bare Fluctuation because it is too generic: fluctuation-dissipation theorems, for example.

Compute the "self-information"/"surprisal" of a single probability `pᵢ` under the given
information measure.
Compute the "self-information"/"surprisal" of a single probability `p` under the given
information measure, assuming that `p` is part of a length-`N` probability distribution.
Member

Unfortunately, I don't agree with the latest change of requiring N. It seems simpler, and more reasonable, to simply not allow Curado to be part of this interface. The opposite, defining the information unit as depending on N, doesn't make much sense, at least not with how Shannon introduced it.

Member Author

@Datseris Curado is not the only information measure whose surprisal/self-information depends explicitly on N when following the definition of an information measure as a probability-weighted average of the surprisal (as I do in the paper).

> The opposite, defining the information unit as depending on N, doesn't make much sense, at least not with how Shannon introduced it.

In the context of the Shannon information unit alone, I agree. But the point of this interface is to generalize the Shannon information unit, and this inevitably introduces N as a parameter.

Can we discuss the final API when I'm done writing up the paper? I'm not far from finishing it; I just need to generate a few example applications. Since I am using this PR for the paper analyses, it would be nice not to change anything in this draft PR until the paper is ready.

Member Author

The alternative is to have information_content/information_unit, which dispatches to a subset of InformationMeasures, and then generalized_information_content/generalized_information_unit, which dispatches to those InformationMeasures whose generalization of the information unit depends on N. But that kind of defeats the purpose of having an interface to begin with, since we're back to defining multiple functions with different names for things that are fundamentally identical (modulo the parameter N).

Member Author

PS: I sent you a link to the paper draft, @Datseris
