About fragmented analysis of texts. Some inferential issues in text mining (variations on the “inaugural addresses corpus”)
DOI: https://doi.org/10.26398/IJAS.0029-015
Keywords: Statistical inference, Validation, Bootstrap, Textual data analysis
Abstract
After a brief reminder of the geometrical aspects of data analysis, we contrast the supervised approach, which leads to straightforward external validation, with unsupervised approaches, which lead to several methods of internal validation based on resampling techniques. For a corpus of texts comprising several parts, fragmenting the texts provides an unsupervised variant of the analysis of the global lexical table (parts × words). We then present, for the unsupervised case, validation procedures that allow a critical use of the methods and thus an assessment of the results. These procedures can be described as variants of bootstrap techniques adapted to the complex nature of textual data. The application example concerns the corpus of Inaugural Addresses of US presidents.
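To make the idea of resampling-based internal validation concrete, here is a minimal sketch, not the authors' procedure. It assumes that the fragments (rows of a fragments × words count table) are the resampling units and that the quantities to validate are the principal inertias (eigenvalues) of a correspondence analysis of that table. The function names `ca_eigenvalues` and `bootstrap_eigenvalues` and the toy data are hypothetical, introduced only for illustration; the paper's bootstrap variants may resample differently and assess other results (for example, the positions of words on the axes).

```python
import numpy as np


def ca_eigenvalues(table):
    """Principal inertias of a correspondence analysis of a count table,
    computed as squared singular values of the standardized residuals."""
    table = np.asarray(table, dtype=float)
    # Drop empty rows, then empty columns (a resample can leave some words unused).
    table = table[table.sum(axis=1) > 0]
    table = table[:, table.sum(axis=0) > 0]
    P = table / table.sum()
    r = P.sum(axis=1)  # row masses (fragments)
    c = P.sum(axis=0)  # column masses (words)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    return np.linalg.svd(S, compute_uv=False) ** 2


def bootstrap_eigenvalues(table, n_boot=200, n_axes=2, seed=0):
    """Assumed scheme: redraw whole fragments with replacement and recompute
    the first n_axes eigenvalues for each replicate."""
    rng = np.random.default_rng(seed)
    table = np.asarray(table, dtype=float)
    n_rows = table.shape[0]
    reps = np.empty((n_boot, n_axes))
    for b in range(n_boot):
        idx = rng.integers(0, n_rows, size=n_rows)  # fragment indices, with replacement
        ev = ca_eigenvalues(table[idx])[:n_axes]
        # Axes that a degenerate resample cannot produce carry zero inertia.
        reps[b] = np.pad(ev, (0, n_axes - ev.size))
    return reps


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical toy corpus: 40 fragments x 30 words, drawn from two
    # different word distributions, so the first axis should be stable.
    probs_a = rng.dirichlet(np.ones(30))
    probs_b = rng.dirichlet(np.ones(30))
    table = np.array([rng.multinomial(200, probs_a if i < 20 else probs_b)
                      for i in range(40)])
    observed = ca_eigenvalues(table)[:2]
    reps = bootstrap_eigenvalues(table, n_boot=200)
    low, high = np.percentile(reps, [2.5, 97.5], axis=0)
    print("observed eigenvalues:", observed)
    print("95% percentile intervals:", low, high)
```

A narrow replicate interval for an eigenvalue suggests that the corresponding axis does not hinge on a few particular fragments. Resampling whole fragments rather than individual word tokens is a design choice made here because it preserves the co-occurrence structure within each fragment.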