Unlabeled Principal Component Analysis and Matrix Completion(Manolis C. Tsakiris and collaborators)

Watch Video

10 24, 2024

We introduce robust principal component analysis from a data matrix in which the entries of its columns have been corrupted by permutations, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that UPCA is a well-defined algebraic problem since we prove that the only matrices of minimal rank that agree with the given data are row-permutations of the ground-truth matrix, arising as the unique solutions of a polynomial system of equations. Further, we propose an efficient two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data have been permuted. Stage-I employs outlier-robust PCA methods to estimate the ground-truth column-space. Equipped with the column-space, Stage-II applies recent methods for unlabeled sensing to restore the permuted data. Allowing for missing entries on top of permutations in UPCA leads to the problem of unlabeled matrix completion, for which we derive theory and algorithms of similar flavor. Experiments on synthetic data, face images, educational and medical records reveal the potential of our algorithms for applications such as data privatization and record linkage.

 

Publication:

Journal of Machine Learning Research 25 (2. 2024)

https://jmlr.org/papers/v25/22-0816.html

 

Author:

Yunzhen Yao

School of Computer and Communication SciencesEPFLCH-1015 Lausanne, Switzerland

Emailyunzhen.yao@epfl.ch

 

Liangzu Peng

Emaillpenn@seas.upenn.edu

 

Manolis C. Tsakiris

Key Laboratory for Mathematics MechanizationAcademy of Mathematics and Systems ScienceChinese Academy of SciencesBeijing, 100190, China

Emailmanolis@amss.ac.cn


Contacts:

E-mail:

Copyright@2008,All Rights Reserved, Academy of Mathematics and Systems Science,CAS
Tel:86-10-82541777 Fax: 86-10-82541972 E-mail: contact@amss.ac.cn
京ICP备05002806-1号 京公网安备110402500020号