Computational history: using semantic models to measure changes in attitudes, values, and beliefs from language

Cook, Matthew
Language is not only a tool for communication, but a window into human nature and the mind. The way we talk about ourselves, others, and the world around us reveals our personalities, mental health, self- and group-serving biases, and more. Though many sources of language exist to understand human psychology, newspapers provide a unique opportunity for studying changes in attitudes, values, and beliefs. In this thesis, I analyzed 221 years (1789-2009) of American historical newspaper data from two historical corpora (Chronicling America and the Corpus of Historical American English [COHA]). The volume of data is more than any scholar or group of scholars could ever read. Therefore, I made use of a standard computational language model of distributed semantics called the Random Permutation (RP) model, which “reads” through a corpus of text and generates a mathematical representation of word meaning (i.e., a vector representation). I used the RP model to generate a vector representation for each word written in each decade of the newspapers. The result of this procedure is a 3000-dimensional vector space in which each word in each decade is represented as a point, and the space evolves through time. Similar words (e.g., dog and canine) occupy similar regions of the semantic vector space, whereas dissimilar words (e.g., dog and toolbox) occupy dissimilar regions. Having derived dozens of sets of vectors, I first conducted a series of four experiments using unambiguous ground truths to validate the semantic meaning embedded in the vector space. After validating the vectors, I used several methods, including machine learning methods, to measure long-term changes in attitudes, values, and beliefs through an analysis of language.
In addition to the computational work, I conducted an empirical experiment demonstrating that the methods I used to measure meaning also predict people’s behavioural bias in real-world consequential decisions (i.e., job hiring). The ultimate goal of the thesis is to advance computational methods for accurately predicting people’s emotions, thoughts, and behaviour from language.
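The vector representation described in the abstract can be illustrated with a minimal sketch of a random-permutation style model. This is not the thesis code. It assumes sparse ternary index vectors, a rotation as the position-dependent permutation, a two-word context window, and a three-sentence toy corpus; only the 3000 dimensions come from the abstract, and every name and parameter here is hypothetical.

```python
import math
import random
from collections import defaultdict

DIM = 3000     # dimensionality quoted in the abstract
NONZERO = 20   # assumed number of +1/-1 entries per index vector
WINDOW = 2     # assumed context window on each side of the target


def make_index_vector(rng):
    """Sparse ternary random vector: mostly zeros, a few +1/-1 entries."""
    vec = [0.0] * DIM
    for i in rng.sample(range(DIM), NONZERO):
        vec[i] = rng.choice((-1.0, 1.0))
    return vec


def permute(vec, offset):
    """Rotate vec by offset positions (a rotation is one simple permutation)."""
    k = offset % len(vec)
    return vec[-k:] + vec[:-k] if k else vec[:]


def train(sentences, seed=0):
    """Sum permuted index vectors of neighbours into each word's memory vector."""
    rng = random.Random(seed)
    index = {}
    for sentence in sentences:
        for word in sentence:
            if word not in index:
                index[word] = make_index_vector(rng)
    memory = defaultdict(lambda: [0.0] * DIM)
    for sentence in sentences:
        for pos, target in enumerate(sentence):
            mem = memory[target]
            for offset in range(-WINDOW, WINDOW + 1):
                ctx = pos + offset
                if offset == 0 or not 0 <= ctx < len(sentence):
                    continue
                # The permutation encodes where the neighbour sits
                # relative to the target word.
                for i, v in enumerate(permute(index[sentence[ctx]], offset)):
                    mem[i] += v
    return dict(memory)


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy corpus (invented for illustration, not newspaper data).
sentences = [
    "the dog chased the ball".split(),
    "the canine chased the ball".split(),
    "she opened her toolbox yesterday".split(),
]
vectors = train(sentences)
sim_close = cosine(vectors["dog"], vectors["canine"])
sim_far = cosine(vectors["dog"], vectors["toolbox"])
print(f"dog vs canine:  {sim_close:.3f}")
print(f"dog vs toolbox: {sim_far:.3f}")
```

In this toy setting, words that appear in the same contexts end up with nearly identical vectors, while words with unrelated contexts end up nearly orthogonal. Training one such set of vectors per decade and comparing a word's neighbours across decades would be one way to approach the kind of change over time the abstract describes.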
computational humanities, computational social sciences, distributed models of semantics, natural language processing, text classification