About fragmented analysis of texts. some inferential issues in text mining (variations on the “inaugural addresses corpus”)

Authors

Ludovic Lebart, Télécom-ParisTech, Paris, France

DOI:

https://doi.org/10.26398/IJAS.0029-015

Keywords:

Statistical inference, Validation, Bootstrap, Textual data analysis

Abstract

After a brief reminder of the geometrical aspects of data analysis, we contrast the supervised approach, which leads to straightforward external validation, with the unsupervised approaches, which call for several methods of internal validation based on resampling techniques. For a corpus of texts comprising several parts, fragmenting the text provides an unsupervised variant of the analysis of the global lexical table (parts × words). We then present, for the unsupervised case, validation procedures that allow a critical use of the methods and thus an assessment of the results. These procedures can be described as variants of bootstrap techniques adapted to the complex nature of textual data. The application example concerns the corpus of Inaugural Addresses of US presidents.

Italian Journal of Applied Statistics, vol. 29, no. 2-3, 2017

Published

2020-02-18

How to Cite

Lebart, L. (2020). About fragmented analysis of texts. some inferential issues in text mining (variations on the “inaugural addresses corpus”). Statistica Applicata - Italian Journal of Applied Statistics, 29(2-3), 273–291. https://doi.org/10.26398/IJAS.0029-015

Issue

No. 2-3 (2017): Vol. 29, Number 2-3
