When working with high-dimensional data, we typically have to deal with the problem of defining a distance between points that takes into account the data's underlying structure. Many tasks in machine learning, such as clustering, classification, dimensionality reduction, and deep learning, require a notion of distance or similarity between points, and this distance has to capture as much relevant information as possible in order to give good results.
In this scenario, a good hypothesis is that the dataset consists of samples with an unknown density distribution supported on a lower-dimensional manifold. In some cases this can be corroborated: only a few degrees of freedom parametrize the points in the dataset. Moreover, there are theoretical results showing that, independently of the dimension of the ambient space, points lie near some lower-dimensional structure.
Typically, then, real data is modeled as a collection of points living on a manifold whose intrinsic dimension is lower than that of the ambient space.
Naturally, the number of points needed to fill a $D$-dimensional set scales exponentially with the dimension $D$ of the space, making it impossible for a finite set of points to do so. More technically, the Johnson–Lindenstrauss lemma states that, given a dataset of $n$ points, there always exists a projection of the points into $\mathbb{R}^k$, with $k = O(\varepsilon^{-2} \log n)$, that preserves the Euclidean distance between every pair of points up to a multiplicative error smaller than $\varepsilon$, independently of the dimension of the dataset.
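As an illustration, the Johnson–Lindenstrauss lemma can be checked numerically with a Gaussian random projection; the sample sizes and dimensions below are arbitrary choices, not from the original post:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

n, D, k = 50, 2000, 300          # n points in dimension D, projected down to dimension k
X = rng.normal(size=(n, D))

# Gaussian random projection, scaled so squared norms are preserved in expectation
P = rng.normal(size=(D, k)) / np.sqrt(k)
Y = X @ P

# Compare all pairwise Euclidean distances before and after projecting
orig, proj = pdist(X), pdist(Y)
max_distortion = np.max(np.abs(proj / orig - 1.0))
print(f"maximum relative distortion: {max_distortion:.3f}")
```

Even though $k \ll D$, the maximum relative distortion over all pairs stays small, as the lemma predicts.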
The Fermat distance
For the sake of the explanation, let's first assume that we know the density $f \colon \mathcal{M} \to \mathbb{R}_{>0}$, where $\mathcal{M}$ is a $d$-dimensional manifold embedded in $\mathbb{R}^D$. As previously mentioned, typically we have $d \ll D$, but this is not a required condition. On $\mathbb{R}^D$ we have some distance, for example the Euclidean distance $|\cdot|$. Given two points $x, y \in \mathcal{M}$ and a parameter $\beta \geq 0$, we define the Fermat distance between $x$ and $y$ as

$$\mathcal{D}_{f,\beta}(x, y) = \inf_{\gamma} \int_{\gamma} \frac{1}{f^{\beta}} \, ds,$$

where the minimization is carried over all rectifiable curves $\gamma \subset \mathcal{M}$ that start at $x$ and end at $y$.
- Given the input distance $|\cdot|$, $\mathcal{D}_{f,\beta}$ defines a (new) distance that takes into consideration both the manifold and the density.
- For $\beta = 0$, the Fermat distance is identical to the length of the geodesic on $\mathcal{M}$, recovering the Isomap distance.
- For higher values of $\beta$, paths that cross regions with low density are penalized.
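To make the effect of the density concrete, here is a small numerical sketch (the density and the paths are invented for illustration): we approximate the integral of $f^{-\beta}$ along two piecewise-linear curves between the same endpoints, one staying in a high-density corridor and one detouring through a low-density region.

```python
import numpy as np

def density(pts):
    """Toy (unnormalized) density: high near the horizontal axis, decaying with |y|."""
    return np.exp(-pts[:, 1] ** 2)

def fermat_cost(waypoints, beta=1.0, steps=200):
    """Approximate the integral of f^(-beta) ds along a piecewise-linear curve."""
    cost = 0.0
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        ts = np.linspace(0.0, 1.0, steps)[:, None]
        pts = (1 - ts) * a + ts * b                 # points sampled along the segment
        ds = np.linalg.norm(b - a) / (steps - 1)    # arc-length element
        cost += np.sum(density(pts) ** (-beta)) * ds
    return cost

x, y = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
direct = fermat_cost([x, y])                        # stays where f is large
detour = fermat_cost([x, np.array([0.0, 2.0]), y])  # crosses a low-density region
print(direct, detour)
```

The direct path through the high-density corridor is far cheaper than the low-density detour, which is exactly the behavior the definition is designed to produce.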
Let's consider the example of a bimodal distribution supported on a lower-dimensional manifold $\mathcal{M}$, with modes at points $m_1$ and $m_2$ and with different dispersions (see image below). If we consider a point $z$ on the manifold, both the Euclidean and the geodesic distance make $z$ nearer to $m_1$ than to $m_2$. On the contrary, since the density $f$ takes low values along every path $\gamma$ that connects $z$ with $m_1$, the Fermat distance $\mathcal{D}_{f,\beta}(z, m_1)$ will be larger than $\mathcal{D}_{f,\beta}(z, m_2)$ for a choice of $\beta$ large enough.

On the other hand, the optimal path $\gamma^{*}$ that realizes the Fermat distance between two points $x$ and $y$ is likely to cross regions with high density. In that way, we can think of $\gamma^{*}$ as the most likely continuous transformation of $x$ into $y$.
The Fermat distance is defined as an optimization problem that selects a path minimizing a functional. As its name suggests, this definition resembles a least-action principle from optics: the path followed by light in a medium is a minimizer of the optical path length, which is the path integral of the refractive index of the medium. This formulation of geometric optics is known as Fermat's Principle.
The refractive index is related to the speed of light in the medium. More specifically, if $c$ is the speed of light in vacuum and $v$ is the speed of light in the medium, the refractive index is defined as $n = c/v$. Then, the path integral

$$T(\gamma) = \frac{1}{c} \int_{\gamma} n \, ds$$

represents the time that a ray of light requires to traverse the path $\gamma$. Regions with a lower refractive index are regions where light moves faster. In this way, the role of the refractive index in the definition of the Fermat distance is played by the inverse power of the density, $f^{-\beta}$: light moves faster in regions with high density.
How to estimate the Fermat distance?
Even assuming that we have knowledge of the density $f$ and its support $\mathcal{M}$, finding the optimal path $\gamma^{*}$ that realizes the Fermat distance could be computationally expensive. Moreover, in a real scenario we only have access to $\mathcal{M}$ via some finite set of samples $Q_n = \{X_1, \dots, X_n\}$ with common density $f$. One approach could be to first estimate the density and then, in a second step, estimate the Fermat distance.
However, let's see how it is possible to estimate the Fermat distance directly, using mathematical results from percolation theory. Given two points $x$ and $y$ and a sequence $(q_1, \dots, q_K)$ of points such that $q_i \in Q_n$ for all $i = 1, \dots, K$, we say that $(q_1, \dots, q_K)$ is a path from $x$ to $y$ if $q_1$ and $q_K$ are the nearest points in $Q_n$ to $x$ and $y$, respectively. Given a parameter $\alpha \geq 1$, the Fermat distance estimator is defined as

$$D_{Q_n,\alpha}(x, y) = \min \left\{ \sum_{i=1}^{K-1} |q_{i+1} - q_i|^{\alpha} \,:\, (q_1, \dots, q_K) \text{ is a path from } x \text{ to } y \right\}.$$

Starting at $q_1$, this represents the minimum cost of reaching $q_K$ when the local cost of moving from one point $q$ to another $q'$ is $|q' - q|^{\alpha}$.
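On a small sample, this estimator can be computed directly by running a shortest-path algorithm on the complete graph with edge costs $|q' - q|^{\alpha}$. A sketch with SciPy, on a synthetic sample:

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from scipy.spatial.distance import pdist, squareform

def fermat_estimator(Q, alpha):
    """All-pairs sample Fermat distances: shortest paths with edge cost |q' - q|^alpha."""
    W = squareform(pdist(Q)) ** alpha      # complete graph of pairwise costs
    return dijkstra(W, directed=False)

rng = np.random.default_rng(1)
Q = rng.uniform(size=(60, 2))              # synthetic sample
D1 = fermat_estimator(Q, alpha=1)          # alpha = 1: the direct hop is optimal
D3 = fermat_estimator(Q, alpha=3)          # alpha = 3: optimal paths hug the sample
```

For $\alpha = 1$ the triangle inequality makes the direct edge optimal, so the estimator recovers the plain Euclidean distances; for larger $\alpha$ the optimal paths pass through many intermediate sample points and the cost drops below that of the direct edge.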
At first glance, the relationship between the Fermat distance and its estimator is not immediate. However, it can be proved that if we rescale adequately and take the limit as the number of samples $n$ goes to infinity, then

$$n^{\beta} \, D_{Q_n,\alpha}(x, y) \xrightarrow[n \to \infty]{} \mu \, \mathcal{D}_{f,\beta}(x, y),$$

where the parameters $\beta$ and $\alpha$ satisfy the relation $\beta = (\alpha - 1)/d$, with $d$ the intrinsic dimension of the manifold and $\mu$ some constant that depends on $\alpha$ and $d$.
To better understand this result, let's consider the simple case where $\alpha = 2$, so that the contribution of each step $|q_{i+1} - q_i|^2$ is proportional to the area of a circle with center $q_i$ and radius $|q_{i+1} - q_i|$. Then, the Fermat distance estimator is proportional to the total area of the shaded region in the figure above. Consequently, a path that goes through regions with high density will have lower cost, just as the Fermat distance does.
Algorithmic Implementation
The minimization over all possible paths that connect two given points can be treated as a shortest-path problem in the graph whose nodes are the sample points and where the weight of the edge between two points $q$ and $q'$ is $|q' - q|^{\alpha}$. The standard method to compute all the distances at once is the Floyd–Warshall algorithm, which requires $O(n^3)$ operations (a lot!). However, we can perform the two following relaxations of the problem.
Since the power $\alpha$ discourages large distances between consecutive points of the optimal path, we can restrict the search to paths where each $q_{i+1}$ is a $k$-nearest neighbour of $q_i$. Moreover, if we choose $k$ of order $\log n$, then with high probability the estimator of the Fermat distance doesn't change. In that case, running Dijkstra's algorithm from every node requires $O(n^2 \log n)$ operations.
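The $k$-nearest-neighbour relaxation can be sketched as follows (the value of $k$ and the sample are illustrative). Note that restricting the graph can only increase the estimated distances, and with $k$ large enough they coincide with the complete-graph values:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra
from scipy.spatial import cKDTree

def fermat_knn(Q, alpha, k):
    """Sample Fermat distances restricted to the k-nearest-neighbour graph."""
    n = len(Q)
    dist, idx = cKDTree(Q).query(Q, k=k + 1)   # first neighbour is the point itself
    rows = np.repeat(np.arange(n), k)
    graph = csr_matrix((dist[:, 1:].ravel() ** alpha,
                        (rows, idx[:, 1:].ravel())), shape=(n, n))
    return dijkstra(graph, directed=False)     # treats the k-NN graph as undirected

rng = np.random.default_rng(2)
Q = rng.uniform(size=(80, 2))
k = int(np.ceil(np.log(len(Q)))) + 3           # k of order log n
D_knn = fermat_knn(Q, alpha=3, k=k)
```

The sparse graph has only $nk$ edges instead of $n^2$, which is what brings the all-pairs cost down from $O(n^3)$ to roughly $O(n^2 \log n)$.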
Furthermore, we can consider a subset of points, which we will call the landmarks, selected by some rule, for example a random sample of size $s$ from $Q_n$. Then, we compute the shortest path from each of the $s$ landmarks to the rest of the $n$ points in $O(s \, n \log n)$ operations using Dijkstra's algorithm. Finally, it is possible to bound the Fermat distance between any pair of points $x$ and $y$ as

$$\max_{\ell} \, \big| D_{Q_n,\alpha}(x, \ell) - D_{Q_n,\alpha}(\ell, y) \big| \;\leq\; D_{Q_n,\alpha}(x, y) \;\leq\; \min_{\ell} \, \big( D_{Q_n,\alpha}(x, \ell) + D_{Q_n,\alpha}(\ell, y) \big),$$

where $\ell$ ranges over the landmarks. If the lower and upper bounds coincide, we have the exact value of $D_{Q_n,\alpha}(x, y)$; if not, we have an interval where the real value lies.
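These bounds follow from the triangle inequality for the estimated distance. A sketch, with invented sample and landmark sizes:

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(3)
Q = rng.uniform(size=(60, 2))
n, s, alpha = len(Q), 8, 3

W = squareform(pdist(Q)) ** alpha                 # complete graph of edge costs
landmarks = rng.choice(n, size=s, replace=False)  # random landmark selection

# Dijkstra only from the s landmarks: an (s, n) matrix of Fermat distances
D_land = dijkstra(W, directed=False, indices=landmarks)

# Triangle-inequality bounds for every pair (i, j)
lower = np.max(np.abs(D_land[:, :, None] - D_land[:, None, :]), axis=0)
upper = np.min(D_land[:, :, None] + D_land[:, None, :], axis=0)
```

Only $s$ runs of Dijkstra are needed instead of $n$, at the price of getting an interval rather than an exact value for most pairs.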
An implementation in Python of all these variants can be found at www.aristas.com.ar/fermat
References
Weighted Geodesic Distance Following Fermat’s Principle
F. Sapienza, P. Groisman, M. Jonckheere
6th International Conference on Learning Representations (ICLR), 2018.
Nonhomogeneous Euclidean first-passage percolation and distance learning
P. Groisman, M. Jonckheere, F. Sapienza