Cohomology in algebra, geometry, physics and statistics

Information cohomology and Topological Information Data Analysis

Speaker’s name:

Pierre Baudot

Speaker’s affiliation:

Median Technologies

Place:

IM building, ground floor

Date:

Wednesday, 6 March 2019, 11:30 to 12:30

Abstract:

We establish methods that quantify the statistical interaction structure within a given data set, using the cohomological characterization of information theory by finite methods, and express them in terms of statistical physics and machine learning.

In the first part, we review the formalism of information cohomology, obtained with Daniel Bennequin and refined by Juan Pablo Vigneaux with an extension to Tsallis entropies [1,2]. It considers random variables as partitions of atomic probabilities, together with the poset given by their lattice. The basic cohomology is set up via the Hochschild coboundary, with a left action corresponding to information conditioning. The first-degree cocycle is the entropy chain rule, which allows one to derive the functional equation of information and hence to characterize entropy uniquely as the first cohomology group. Minus the odd multivariate mutual informations (MI, I_{2k+1}) appear as coboundaries in even degrees, and introducing a second coboundary with trivial (symmetric) action gives the even MI (I_{2k}) in odd degrees. If time permits, I will also present how this setting fits naturally into a topos, giving a constructive and multivalued probabilistic logic, and how related results emerged unexpectedly from motivic studies in the work of Cathelineau, Gangl and Elbaz-Vincent, leading to the conjecture that the higher groups are polylogarithmic forms à la Aomoto: a modern philosopher's stone.
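The chain-rule cocycle can be checked numerically. The following minimal Python sketch (my own illustration, not code from the references) verifies H(X,Y) = H(X) + H(Y|X) on an empirical joint distribution, the identity that the cohomological setting recasts as a 1-cocycle condition:

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def cond_entropy(pairs):
    """Conditional entropy H(Y|X) = sum_x p(x) H(Y | X=x), from (x, y) pairs."""
    n = len(pairs)
    by_x = {}
    for a, b in pairs:
        by_x.setdefault(a, []).append(b)
    return sum(len(ys) / n * entropy(ys) for ys in by_x.values())

# Toy joint samples of two binary variables X and Y (illustrative data).
xy = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1), (0, 0)]
x = [a for a, _ in xy]

# Chain rule: H(X,Y) = H(X) + H(Y|X).  In cohomological terms, with
# conditioning as the left action, this is the 1-cocycle equation for H.
lhs = entropy(xy)
rhs = entropy(x) + cond_entropy(xy)
assert abs(lhs - rhs) < 1e-9
```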

In the second part, we turn to the application of this formalism to real data, here gene expression, and to its interpretation in terms of statistical physics and machine learning [3,4]. Mutual statistical independence is equivalent to the vanishing of all k-MI (I_k = 0), leading to the conclusion that the I_k define refined measures of statistical dependence and that the cohomology quantifies the obstruction to statistical factorization. We develop the computationally tractable subcase on the simplicial (Boolean) sub-lattice, represented by entropy (H_k) and information (I_k) landscapes. The marginal component I_1 defines a self-internal energy functional U_k, and (-1)^k I_k for k > 1 define the contributions of the k-body interactions to the free energy functional G_k, given by the KL divergence between the marginals and the joint variable (the "total correlation"). The set of information paths in the simplicial structure is in bijection with the symmetric group and with random processes, and provides a trivial topological expression of the second law of thermodynamics. The slope of an I_k path is minus the conditional mutual information. The local minima of the longest I_k paths, a conditional mutual-independence criterion, characterize a complex corresponding to the minima of the free energy components. The application to gene expression and cell-type classification recovers the known differential expression and co-regulation of genetic modules, giving a topological version of Waddington's epigenetic landscapes that quantifies epigenetic information storage and learning beyond pairwise interactions. Negativity of the information detects clusters of differential gene expression, analogously to a first-order transition to a condensed phase. Finite sample-size effects severely constrain the computation of information topology on data, and we provide simple statistical tests for the undersampling bias and for the k-dependences.

[1] P. Baudot and D. Bennequin. The homological nature of entropy. Entropy, 17(5):3253–3318, 2015.

[2] J.P. Vigneaux. The structure of information: from probability to homology. arXiv:1709.07807, 2017.

[3] M. Tapia, P. Baudot, M. Dufour, C. Formizano-Treziny, S. Temporal, M. Lasserre, K. Kobayashi and J.M. Goaillard. Neurotransmitter identity and electrophysiological phenotype are genetically coupled in midbrain dopaminergic neurons. Scientific Reports, 2018. bioRxiv 168740.

[4] P. Baudot, M. Tapia and J.M. Goaillard. Topological Information Data Analysis: Poincare-Shannon Machine and Statistical Physic of Finite Heterogeneous Systems. Preprint, doi:10.20944/preprints201804.0157.v1 (submitted).