A framework for the incremental update of the MCA solution
DOI: https://doi.org/10.26398/IJAS.0029-011

Keywords: Categorical data streams, Dimensionality reduction, Stochastic approximation, Incremental SVD

Abstract
In the era of data deluge, a major challenge is to handle large amounts of data that are produced at a high rate and are characterized by association structures changing over time. As modern applications become increasingly scalable, efficient approaches are needed for dealing with high-dimensional categorical data. Multiple Correspondence Analysis (MCA) is a popular method for reducing the dimensionality of categorical data while preserving the most essential information. MCA is typically implemented via the eigenvalue decomposition (EVD) or the singular value decomposition (SVD) of a suitably transformed matrix. Because of the high computational and memory requirements of ordinary EVD and SVD, MCA is essentially infeasible with massive data sets or data streams that change rapidly and have to be processed on the fly. We distinguish two main families of methods that can be efficiently used to incrementally compute the dominant eigenvalues and eigenvectors of a covariance matrix: i) stochastic approximation and ii) heuristic incremental EVD/SVD. A general algorithmic framework is presented to embed these methods in the MCA context and provide incremental dimension reduction of categorical data. The methods are compared on artificial data in terms of the similarity between ordinary and incremental MCA configurations. Results do not clearly support the superiority of one method over another. However, methods that allow for block-based updates outperform vector-based approaches. The method of choice may thus be decided on the basis of the most desirable balance of speed and accuracy.
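To fix ideas, the following is a minimal sketch of the ordinary (batch) MCA computation that the incremental methods aim to approximate: the SVD of the standardized residual matrix obtained from the indicator (complete disjunctive) matrix. It assumes the standard correspondence-analysis transformation; the function name, variable names, and toy data are illustrative only and are not taken from the paper.

```python
import numpy as np
import pandas as pd

def mca_svd(df, n_components=2):
    """Batch MCA via SVD of the transformed indicator matrix (sketch)."""
    # Indicator (complete disjunctive) matrix Z: n observations x J categories.
    Z = pd.get_dummies(df.astype(str)).to_numpy(dtype=float)
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal row coordinates on the leading dimensions: D_r^{-1/2} U Sigma
    F = (U[:, :n_components] * s[:n_components]) / np.sqrt(r)[:, None]
    return F, s[:n_components]

# Toy usage on a small categorical data set
df = pd.DataFrame({"color": ["red", "blue", "red", "green"],
                   "size":  ["S", "M", "M", "L"]})
coords, singvals = mca_svd(df)
```

The full SVD above requires all observations in memory at once, which is exactly the bottleneck that the stochastic approximation and heuristic incremental EVD/SVD schemes discussed in the paper are designed to avoid.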