Intrinsic dimensions, densities and the Information Imbalance: Remarkably simple yet effective tools for the data scientist

Location: ISI Foundation, Seminar Room, 2nd floor
Speaker(s): Dr. Aldo Glielmo - Applied Research Team, Directorate General for IT, Banca d'Italia
Data Science

ABSTRACT
Data increasingly comes in the form of very high-dimensional descriptors with hundreds or even thousands of coordinates, but these descriptors typically lie on manifolds of much lower dimensionality with a rich set of hidden properties. In my talk, I will give an overview of some simple yet very effective numerical techniques for analysing fundamental characteristics of data manifolds. Specifically, I will describe estimators of intrinsic dimension [Macocco et al., PRL (2023)] and manifold density [Carli et al., arXiv (2024)], as well as methods to find informative coordinates [Glielmo et al., PNAS Nexus (2022); Camboulin et al., UniReps@NeurIPS (2024)]. I will support the theoretical explanation of the methods with practical demonstrations on toy datasets using the DADApy package [Glielmo et al., Patterns (2022)], available at https://github.com/sissa-data-science/DADApy.

Published on Monday, 14 April 2025
