A framework for the incremental update of the MCA solution

Authors

  • Angelos Markos Department of Primary Education, Democritus University of Thrace, Greece
  • Alfonso Iodice D’Enza Department of Economics and Law, University of Cassino and Southern Lazio, Italy

DOI:

https://doi.org/10.26398/IJAS.0029-011

Keywords:

Categorical data streams, Dimensionality reduction, Stochastic approximation, Incremental SVD

Abstract

In the era of data deluge, a major challenge is to handle large amounts of data that are produced at a high rate and whose association structures change over time. As modern applications grow in scale, efficient approaches are needed for dealing with high-dimensional categorical data. Multiple Correspondence Analysis (MCA) is a popular method for reducing the dimensionality of categorical data while preserving the most essential information. MCA is typically implemented via the eigenvalue decomposition (EVD) or the singular value decomposition (SVD) of a suitably transformed matrix. Because of the high computational and memory requirements of ordinary EVD and SVD, MCA is essentially infeasible for massive data sets or for data streams that change rapidly and have to be processed on the fly. We distinguish two main families of methods that can efficiently compute the dominant eigenvalues and eigenvectors of a covariance matrix incrementally: i) stochastic approximation and ii) heuristic incremental EVD/SVD. A general algorithmic framework is presented that embeds these methods in the MCA context and provides incremental dimension reduction of categorical data. The methods are compared on artificial data in terms of the similarity between ordinary and incremental MCA configurations. Results do not clearly support the superiority of one method over another; however, methods that allow for block-based updates outperform vector-based approaches. The method of choice may thus be decided on the basis of the most desirable balance between speed and accuracy.
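The abstract describes MCA as an SVD of a suitably transformed matrix and contrasts ordinary computation with block-based incremental updates. As an illustration only, the NumPy sketch below (a) computes ordinary MCA from the SVD of the standardized residual matrix S = D_r^{-1/2}(P - r c^T) D_c^{-1/2} of the indicator matrix, and (b) processes the rows in blocks, keeping a rank-k sketch of the raw indicator matrix plus running category counts, and rescales by the current margins only when a solution is requested. The helper names (`indicator`, `mca_batch`, `BlockMCA`) and the rank parameter `k` are hypothetical. This is a simple block update in the spirit of incremental SVD, not the authors' framework or any of the specific methods compared in the paper. It assumes the category sets are known in advance and that every category is observed at least once.

```python
import numpy as np


def indicator(X, n_levels):
    """One-hot (indicator) coding of an n x Q array of integer-coded categories."""
    return np.hstack([np.eye(L)[X[:, q]] for q, L in enumerate(n_levels)])


def mca_batch(Z):
    """Ordinary MCA: SVD of S = D_r^{-1/2} (P - r c^T) D_c^{-1/2}, with P = Z / sum(Z)."""
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, sv, Vt = np.linalg.svd(S, full_matrices=False)
    return sv**2, Vt  # principal inertias and category (right) singular vectors


class BlockMCA:
    """Hypothetical helper: keeps B (k x J) with B^T B ~= Z^T Z, plus exact counts."""

    def __init__(self, k):
        self.k = k
        self.B = None        # diag(sigma) V^T of the rows seen so far (truncated to k)
        self.counts = None   # running category frequencies (column sums of Z)
        self.n_rows = 0

    def update(self, Zb):
        # Stack the current sketch on top of the new block and re-compress with a thin SVD
        M = Zb if self.B is None else np.vstack([self.B, Zb])
        _, s, Vt = np.linalg.svd(M, full_matrices=False)
        self.B = s[: self.k, None] * Vt[: self.k]
        block_counts = Zb.sum(axis=0)
        self.counts = block_counts if self.counts is None else self.counts + block_counts
        self.n_rows += Zb.shape[0]

    def solution(self):
        # With Q variables, T = D_r^{-1/2} P D_c^{-1/2} = Z diag(counts)^{-1/2} / sqrt(Q),
        # so T^T T ~= C^T C with C = B diag(counts)^{-1/2} / sqrt(Q).
        Q = self.counts.sum() / self.n_rows
        C = self.B / np.sqrt(self.counts) / np.sqrt(Q)
        _, s, Vt = np.linalg.svd(C, full_matrices=False)
        # The leading singular value of T is the trivial one (equal to 1): drop it
        return s[1:] ** 2, Vt[1:]


# Usage on artificial data: 4 categorical variables, one pair associated
rng = np.random.default_rng(0)
n_levels = [3, 4, 2, 5]
n = 600
X = np.column_stack([rng.integers(0, L, size=n) for L in n_levels])
X[:, 1] = np.where(rng.random(n) < 0.7, X[:, 0], X[:, 1])
Z = indicator(X, n_levels)

inc = BlockMCA(k=Z.shape[1])          # k = J: no truncation
for start in range(0, n, 100):        # six blocks of 100 rows
    inc.update(Z[start:start + 100])

eig_inc, V_inc = inc.solution()
eig_batch, V_batch = mca_batch(Z)
print(np.round(eig_inc[:3], 4), np.round(eig_batch[:3], 4))
```

With `k` equal to the number of categories J, the stacked update satisfies B^T B = Z^T Z exactly, so the incremental principal inertias coincide with the ordinary ones up to rounding. A smaller `k` trades accuracy for memory and time. Working on the unscaled indicator matrix is a deliberate choice in this sketch: the category margins keep changing as blocks arrive, and they are applied only when a solution is requested.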


Published

2020-02-17

How to Cite

Markos, A., & Iodice D’Enza, A. (2020). A framework for the incremental update of the MCA solution. Statistica Applicata - Italian Journal of Applied Statistics, 29(2-3), 217–231. https://doi.org/10.26398/IJAS.0029-011


Section

Latest articles