About fragmented analysis of texts. Some inferential issues in text mining (variations on the “inaugural addresses corpus”)

Authors

  • Ludovic Lebart Télécom-ParisTech, Paris, France

DOI:

https://doi.org/10.26398/IJAS.0029-015

Keywords:

Statistical inference, Validation, Bootstrap, Textual data analysis

Abstract

After a brief reminder of the geometrical aspects of data analysis, we contrast the supervised approach (which leads to straightforward external validation) with the unsupervised approaches (which lead to several methods of internal validation based on resampling techniques). For a corpus of texts comprising several parts, fragmenting the text provides an unsupervised variant of the analysis of the global lexical table (parts × words). We then present, for the unsupervised case, some validation procedures that allow a critical use of the methods and thus provide an assessment of the results. These procedures can be described as variants of bootstrap techniques adapted to the complex nature of textual data. The application example concerns the corpus of Inaugural Addresses of US presidents.
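
As a rough illustration of the ideas named in the abstract (fragmentation, a parts × words lexical table, and bootstrap-style resampling), the following minimal sketch resamples fragments within each part and rebuilds that part's word counts. It is an assumption-laden toy, not the procedure of the article: the fragment size, the function names (`fragment`, `lexical_row`, `bootstrap_rows`), the toy corpus, and the choice to resample whole fragments with replacement are all hypothetical.

```python
import random
from collections import Counter

def fragment(tokens, size):
    """Split a token list into consecutive fragments of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def lexical_row(fragments):
    """Aggregate word counts over fragments: one row of a parts x words table."""
    counts = Counter()
    for frag in fragments:
        counts.update(frag)
    return counts

def bootstrap_rows(fragments, n_replicates, rng):
    """Resample fragments with replacement and rebuild the word counts each time."""
    replicates = []
    for _ in range(n_replicates):
        sample = [rng.choice(fragments) for _ in fragments]
        replicates.append(lexical_row(sample))
    return replicates

# Toy corpus invented for illustration (not the Inaugural Addresses corpus).
corpus = {
    "part_a": "we the people hold these truths we the people".split(),
    "part_b": "the nation will endure the nation will prosper".split(),
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
fragments = {name: fragment(tokens, 3) for name, tokens in corpus.items()}
observed = {name: lexical_row(frags) for name, frags in fragments.items()}
replicated = {name: bootstrap_rows(frags, 200, rng) for name, frags in fragments.items()}
```

Under these assumptions, the spread of a word's count across the entries of `replicated["part_a"]` gives a crude indication of how stable that cell of the lexical table is under resampling.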

Published

2020-02-18

How to Cite

Lebart, L. (2020). About fragmented analysis of texts. Some inferential issues in text mining (variations on the “inaugural addresses corpus”). Statistica Applicata - Italian Journal of Applied Statistics, 29(2-3), 273–291. https://doi.org/10.26398/IJAS.0029-015

Section

Latest articles