Synthetic maps for navigating high-dimensional data spaces

Tuesday, February 13, 2018

2.30 p.m.

ISI seminar room, 2nd floor

Alessandro Laio, International School for Advanced Studies (SISSA)

Data analysis in high-dimensional spaces aims to obtain a synthetic description of a data set, revealing its main structure and its salient features.
We will describe an approach for charting data spaces that provides a topography of the probability distribution from which the data are sampled. This topography includes information on the number and height of the probability peaks, the depth of the "valleys" separating them, the relative location of the peaks, and their hierarchical organization. The topography is reconstructed using an unsupervised variant of Density Peak clustering [1] that exploits a non-parametric density estimator [2], which automatically measures the density in the manifold containing the data [3]. Importantly, the density estimator also provides an estimate of its own error. This is a key feature, as it allows genuine probability peaks to be distinguished from density fluctuations due to finite sampling.
[1] Science, vol. 344, 1492 (2014)
[2] JCTC, in press (2018)
[3] Sci. Rep., vol. 7, 12140 (2017)
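
For readers unfamiliar with Density Peak clustering, the basic idea behind [1] can be sketched in a few lines of Python. This is only an illustrative sketch and not the method presented in the talk: it uses a simple Gaussian-kernel density with a fixed cutoff instead of the non-parametric estimator of [2], it has no error estimate, and the number of clusters is chosen by hand rather than found automatically. The function name and the parameters d_c and n_clusters are assumptions made for this example.

```python
import numpy as np

def density_peak_clustering(points, d_c, n_clusters):
    """Basic Density Peak clustering sketch.

    points     : (n, d) array of data points
    d_c        : kernel width for the density estimate (hypothetical parameter)
    n_clusters : number of peaks to keep, chosen by hand (hypothetical parameter)
    """
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Gaussian-kernel density; subtract 1 to remove each point's self-contribution.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    order = np.argsort(-rho)            # point indices by decreasing density
    delta = np.zeros(n)                 # distance to the nearest denser point
    nearest_denser = np.full(n, -1)     # index of that point (-1 for the densest)

    # By convention, the densest point gets the largest distance as its delta.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        nearest_denser[i] = j

    # Peaks are points with both high density and large delta.
    # Barring exact ties, the densest point has the largest rho * delta,
    # so it is always selected as a center.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)

    # Every other point inherits the label of its nearest denser neighbour,
    # visiting points in order of decreasing density.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, rho, delta


# Usage: two well-separated synthetic blobs in the plane.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.3, size=(50, 2))
blob_b = rng.normal(5.0, 0.3, size=(50, 2))
data = np.vstack([blob_a, blob_b])
labels, rho, delta = density_peak_clustering(data, d_c=0.5, n_clusters=2)
```

In this simplified version the "valleys" between peaks are not measured at all. Distinguishing genuine peaks from sampling noise, as described in the abstract, requires the error estimate on the density that this sketch lacks.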