
Fluctuation complexity, restrict possibilities to formally defined self-informations #413

Draft
wants to merge 17 commits into main
Conversation

kahaaga
Member

@kahaaga kahaaga commented Jun 10, 2024

What's this?

Here I address #410 and restrict the fluctuation complexity to information measures for which it is possible to define "self-information" in the following sense.

Given an information measure H, I define the "generalized" self-information as the functional I(p_i) that allows us to rewrite H as the probability-weighted sum H = sum_i p_i I(p_i) (a weighted average; since sum_i p_i = 1, the denominator of the weighted average doesn't appear explicitly).

Next, the fluctuation complexity is sqrt( sum_{i=1}^N p_i (I(p_i) - H)^2 ). Hence, using the formulation above, we can meaningfully speak of a fluctuation of local information around the mean information, regardless of which measure is chosen.
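For example, sketching the Shannon and Tsallis cases (natural logarithms for brevity; the full derivations are written up in the paper draft):

$$
H_S = -\sum_i p_i \log p_i = \sum_i p_i \,(-\log p_i) \quad \Rightarrow \quad I(p_i) = -\log p_i,
$$

$$
H_q = \frac{1 - \sum_i p_i^q}{q - 1} = \sum_i p_i \,\frac{1 - p_i^{q-1}}{q - 1} \quad \Rightarrow \quad I_q(p_i) = \frac{1 - p_i^{q-1}}{q - 1},
$$

which recovers the Shannon expression in the limit q → 1.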

I also require that the generalized self-information yields a fluctuation complexity with the same properties as the original Shannon-based fluctuation complexity:

  • that it is zero for the uniform distribution,
  • that it is zero for distributions where p_k = 1 and p_i = 0 for i != k.

Note that we don't involve the axioms which the Shannon self-information fulfills at all: we only demand that the generalized self-information is the functional with the properties above. I haven't been able, at least so far, to find any papers in the literature that deal with this concept for Tsallis or other generalized entropies, so I think it is safe to explore with this naming convention.

New design

  • I introduce a new API method self_information(measure::InformationMeasure, p_i, args...).
  • This method is called inside information(measure::FluctuationComplexity, args...) to compute the I(p_i) terms in the sum of the fluctuation complexity. Only measures that implement self_information are valid; otherwise an error is thrown (a sketch of the intended dispatch pattern follows below).
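Roughly, the design looks like this (an illustrative sketch only; the types and fields here are simplified stand-ins, not the actual PR code):

```julia
# Simplified stand-ins for the library types, just to illustrate the dispatch pattern.
abstract type InformationMeasure end

Base.@kwdef struct Shannon <: InformationMeasure
    base::Real = 2
end

struct FluctuationComplexity{M <: InformationMeasure}
    definition::M
end

# Generalized self-information I(p), defined per measure via dispatch.
self_information(m::Shannon, p::Real) = -log(m.base, p)

# Measures without a `self_information` method are rejected with an informative error.
self_information(m::InformationMeasure, p::Real) =
    error("`self_information` is not defined for $(typeof(m)).")

# Fluctuation complexity: sqrt(Σᵢ pᵢ (I(pᵢ) - H)²), where H = Σᵢ pᵢ I(pᵢ).
function information(fc::FluctuationComplexity, probs::AbstractVector{<:Real})
    ps = filter(>(0), probs)                        # zero probabilities are filtered out
    Is = [self_information(fc.definition, p) for p in ps]
    H = sum(ps .* Is)                               # the measure as a probability-weighted sum
    return sqrt(sum(ps .* (Is .- H) .^ 2))
end
```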

Progress

I've made the necessary derivations for the measures where calculations looked easiest: Shannon entropy/extropy, Tsallis entropy and Curado entropy. I'll fill in the gaps for the rest of the measures whenever I get some free time.

I'm writing this all up in a paper, where I also highlight ComplexityMeasures.jl and how easy it is to use the measure practically due to our discrete estimation API. I've essentially finished the intro and method, but the experimental part remains to be done. For that, I need functional code. So before I proceed, I'd like to get your input on this code proposal, @Datseris. Does this dispatch-based system make sense?

Pending the paper, I verify correctness by numerical comparison in the test suite: I rewrite the information measures as weighted sums involving self_information and check that we obtain the same value as when computing the measure using the traditional formulation.
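For example, a minimal version of such a comparison, re-using the stand-in Shannon from the sketch above, would look roughly like this:

```julia
using Test

# A random, normalized probability distribution.
probs = rand(100)
probs ./= sum(probs)

m = Shannon(base = 2)

# Weighted-sum formulation via self_information ...
H_selfinfo = sum(p * self_information(m, p) for p in probs)
# ... versus the traditional Shannon formula.
H_direct = -sum(p * log(2, p) for p in probs)

@test H_selfinfo ≈ H_direct
```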

@Datseris
Member

This all sounds good to me; however, I dislike the term self_information. I don't understand at all what the "self" refers to. The correct name for this quantity would be information, but we have already used that name for the average information in a whole signal. surprisal is okay; in our textbook we just call this surprise. But perhaps an overall better name is unit_information?

@Datseris
Member

Oh, just now I saw the wiki https://en.wikipedia.org/wiki/Information_content

Right, I would still stay away from self, but we can use information_content instead.

@kahaaga
Member Author

kahaaga commented Jun 18, 2024

> Oh, just now I saw the wiki https://en.wikipedia.org/wiki/Information_content
>
> Right, I would still stay away from self, but we can use information_content instead.

I'm fine with whatever term we use, as long as it is rooted in common literature usage. We can go for information_content.👍

@@ -279,6 +280,39 @@ function information(::InformationMeasure, ::DifferentialInfoEstimator, args...)
))
end

"""
self_information(measure::InformationMeasure, pᵢ)
Member

Suggested change
self_information(measure::InformationMeasure, pᵢ)
self_information(measure::InformationMeasure, p::Real)

I'd suggest that we use just p here and make it clear in the docstring that this is a number. We use probs for Probabilities in most places in the library.

Member

Another argument is that the subscript i is out of context here, and may confuse rather than clarify.

Member Author

Agreed, we can call this simply p. It is clear that p is from a distribution.

Compute the "self-information"/"surprisal" of a single probability `pᵢ` under the given
information measure.

This function assumes `pᵢ > 0`, so make sure to pre-filter your probabilities.
Member

Suggested change
This function assumes `pᵢ > 0`, so make sure to pre-filter your probabilities.
This function requires `pᵢ > 0`.

Just require it and throw an error in the function body.

Member

Ah, I see the problem here. You want to define all information content functions with their simple syntax, since we filter out zero probabilities anyway when we compute entropy.

Okay, let's say then: "This function requires `pᵢ > 0`; giving 0 will yield `Inf` or `NaN`."
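To illustrate, with a Shannon-style I(p) = -log(p):

```julia
julia> -log(0.0)          # self-information at p = 0
Inf

julia> 0.0 * (-log(0.0))  # the corresponding weighted term is NaN, not 0
NaN
```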

Member Author

Yep, we can say that.

@@ -57,3 +57,8 @@ function information_maximum(e::TsallisExtropy, L::Int)

return ((L - 1) * L^(q - 1) - (L - 1)^q) / ((q - 1) * L^(q - 1))
end

function self_information(e::TsallisExtropy, pᵢ, N) #must have N
Member

What is N here? This is not reflected in the function definition or call signature.

If N is either the length of probs or the total number of outcomes, then this quantity does not satisfy the definition of an information content, as it isn't exclusively a function of a single real number. You can explore options in the paper, but here I'd say we don't keep it in the software until it is more solid.

Member Author

I'll see if I can define a functional that doesn't depend on N; if not, we keep it out.

@kahaaga
Member Author

kahaaga commented Jun 18, 2024

@Datseris I'm also a bit hesitant about the name FluctuationComplexity. Yes, it is a complexity measure in the sense that it uses entropies and friends. However, I think a better, more general name is InformationFluctuation, which is more in line with the fact that we define information_content.

This goes a bit against the terminology used in the literature, but I think it is more precise. What do you think?

EDIT: or maybe just Fluctuation(measure::InformationMeasure). Then it is implicit that it is a fluctuation of information, since it takes an InformationMeasure as input.

@kahaaga
Member Author

kahaaga commented Jun 18, 2024

(screenshot of the paper's example syntax when using Fluctuation)

This is how it will look in the paper if using Fluctuation. I think this is nice and clean syntax.

@Datseris
Member

We can keep FluctuationComplexity and have it as a complexity measure, with a reference to the literature article; it dispatches to InformationFluctuation with Shannon. The name InformationFluctuation can do the generic thing you attempt in this PR and should cite your paper once you are done with it.
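Something like this, sketching the suggested layering with the stand-in names from earlier in the thread (so here FluctuationComplexity would become a thin Shannon-specific constructor rather than the generic struct from the sketch above):

```julia
# Generic measure proposed in this PR; should cite the new paper.
struct InformationFluctuation{M <: InformationMeasure}
    definition::M
end

# The literature name stays as a Shannon special case that simply forwards to
# the generic measure, keeping the reference to the original article.
FluctuationComplexity(; base = 2) = InformationFluctuation(Shannon(base = base))
```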

@Datseris
Member

I don't like the bare Fluctuation because it is too generic: fluctuation-dissipation theorems, for example.

Compute the "self-information"/"surprisal" of a single probability `pᵢ` under the given
information measure.
Compute the "self-information"/"surprisal" of a single probability `p` under the given
information measure, assuming that `p` is part of a length-`N` probability distribution.
Member

Unfortunately, I don't agree with the latest change of requiring N. It seems simpler, and more reasonable, to simply not allow Curado to be part of this interface. The opposite, defining the information unit as depending on N, doesn't make much sense, at least not with how Shannon introduced it.

Member Author

@Datseris Curado is not the only information measure whose surprisal/self-information depends explicitly on N when following the definition of an information measure as a probability-weighted average of the surprisal (as I do in the paper).

> The opposite, defining the information unit as depending on N, doesn't make much sense, at least not with how Shannon introduced it.

In the context of the Shannon information unit alone, I agree. But the point of this interface is to generalize the Shannon information unit, and this inevitably introduces N as a parameter.

Can we discuss the final API when I'm done writing up the paper? I'm not far from finishing it; I just need to generate a few example applications. Since I am using this PR for the paper analyses, it would be nice not to change anything in this draft PR until the paper is ready.

Member Author

The alternative is to have information_content/information_unit, which dispatches to a subset of InformationMeasures, and then generalized_information_content/generalized_information_unit, which dispatches to those InformationMeasures whose generalization of the information unit depends on N. But that kind of defeats the purpose of having an interface to begin with, since we're back to defining multiple functions with different names for things that are fundamentally identical (modulo the parameter N).

Member Author

PS: I sent you a link to the paper draft, @Datseris
