Matthew L. Jockers

in Macroanalysis

Published by University of Illinois Press

Published in print April 2013 | ISBN: 9780252037528
Published online April 2017 | e-ISBN: 9780252094767 | DOI:



This chapter demonstrates how big data and computation can be used to identify and track recurrent themes as the products of external influence. It first considers the limitations of the Google Ngram Viewer as a tool for tracing thematic trends over time, then turns to Douglas Biber's Corpus Linguistics: Investigating Language Structure and Use, a primer on the factors that complicate word-focused text analysis and any conclusions drawn from it about word meanings. It then discusses the results of the author's application of latent Dirichlet allocation (LDA) to a corpus of 3,346 nineteenth-century novels using MALLET (MAchine Learning for LanguagE Toolkit), an open-source software package for topic modeling. It also explains the analyses the author performed, including text segmentation, word chunking, and analyses of how themes relate to author nationality, author gender, and time. The thematic data from the LDA model reveal the degree to which author nationality, author gender, and date of publication can be predicted from the thematic signals expressed in the corpus.

Keywords: big data; computation; Google Ngram Viewer; latent Dirichlet allocation; nineteenth-century novels; MALLET; topic modeling; author nationality; author gender; thematic signals

Chapter. 12,017 words. Illustrated.

Subjects: Literary Theory and Cultural Studies

Full text: subscription required
