Difference between revisions of "Mathematical Background"
m |
|||
Line 22: | Line 22: | ||
=== Structure Theorem === | === Structure Theorem === | ||
− | Let | + | Let $$D$$ be a finite subset of $$X$$. Then the persistence module is a representation of a linear quiver |
=== Stability Theorem === | === Stability Theorem === |
Revision as of 15:06, 15 May 2019
Topological and geometrical data analysis are fundamentally built on ideas and results of algebraic topology. Main notions: simplicial complex, homology, ...
Data is always given by point clouds in [math] \mathbb{R}^n [/math]. In order to use these topological notions it is necessary to first translate the data into topological spaces. This step usually requires the notion of distance, i.e. a metric on [math] \mathbb{R}^n [/math] that is tailored to the system from which the data arises. thus enter geometry: metric spaces, ...
The distance of points in [math]\mathbb{R}^n[/math] is a notion that is usually not inherent to the data. Also there is no distinguished value of distance, so one should consider all values. The algebraic theory gives relations between different values of the parameter.
Therefore the applications of topological ideas to data analysis can almost always be divided into the following two steps
$$ \text{Data} \stackrel{geometry}{\longrightarrow} \text{simplicial complex} \stackrel{algebraic\;topology}{\longrightarrow} \text{invariants inherent to the data set} $$
Fundamental Definitions
Fundamental Theorems
There are three fundamental results that make the classical theory of persistent homology work
Nerve Theorem
Let [math]X[/math] be a paracompact space and [math]\mathcal U=\{U_i\}[/math] a good open cover, that is every nonempty intersection of finitely many sets in [math]\mathcal U[/math] is contractible.
Then the nerve [math]N\mathcal U[/math] is homotopy equivalent to [math]X[/math].
Structure Theorem
Let $$D$$ be a finite subset of $$X$$. Then the persistence module is a representation of a linear quiver
Stability Theorem
Explanation of Relevance
Let's assume that our initial point cloud data [math]D=\{p_i \in \mathbb{R}^n\}[/math] arises as a finite subset of a paracompact topological space [math]X[/math] that is embedded in [math]\mathbb{R}^n[/math]. In topological data analysis we want to determine the characteristic properties of the topological space [math]X[/math] from the point cloud [math]D[/math].
For any value of [math]r[/math] the open balls centered at the points are a good open cover [math]\mathcal{U}_r = \{B_r(p_i)\}[/math] of the compact space [math]Y_r=\cup B_r(p_i)[/math]. Recall that the Čech complex [math]{C}_r(D)[/math] is just the filtered complex of nerves [math]N(\mathcal U_r)[/math], so by the nerve theorem there is a homotopy equivalence [math]{C}_r(D) \simeq Y_r[/math] for each [math]r[/math].
For the moment assume that there is a value [math]r[/math], such that [math]X\cap\mathcal U_r[/math] is a good open cover of $X$. Then at this value of [math]r[/math] the Čech complex is actually homotopy equivalent to [math]X[/math] due to the nerve theorem. This motivates the standard approach to topological data analysis, where we assume that generically we are in a situation, where the above assumptions hold.
However, it is not clear at which value of [math]r[/math] this situation occurs, which means we need to consider the Čech complex of [math]D[/math] as a a whole. Then the topological invariants of this filtered complex will contain the characteristic properties of [math]X[/math]. The homology groups of a 1-parameter filtered complex actually have the structure of a persistence module. The structure theorem now tells us that we have a complete classification of these persistence modules by the birth and death values of homology classes.
Finally, the situation as described so far is not one we can expect to hold in real-life. The hypotheses of [math]X[/math] being embedded and there being a radius [math]r[/math] that admits a good open cover of the embedding of [math]X[/math] by balls with this radius are problematic. Furthermore, measurements are always endowed with uncertainties, such that the points of [math]D[/math] do not necessarily lie on the image of [math]X[/math] in [math]\mathbb{R}^n[/math].
This is where the stability theorem enters, which makes precise the following statement: a small variation of the metric results in only a small variation of the associated persistence module. This result does not fully invalidate the objections of the previous paragraph, though.