Computational history: using semantic models to measure changes in attitudes, values, and beliefs from language

dc.contributor.author: Cook, Matthew
dc.contributor.examiningcommittee: Jakobson, Lorna (Psychology) [en_US]
dc.contributor.examiningcommittee: Li, Johnson (Psychology) [en_US]
dc.contributor.examiningcommittee: Bunt, Andrea (Computer Science) [en_US]
dc.contributor.examiningcommittee: Westbury, Chris (University of Alberta) [en_US]
dc.contributor.supervisor: Jamieson, Randall
dc.date.accessioned: 2022-08-24T21:00:40Z
dc.date.available: 2022-08-24T21:00:40Z
dc.date.copyright: 2022-08-09
dc.date.issued: 2022-08-09
dc.date.submitted: 2022-08-09T20:49:56Z [en_US]
dc.degree.discipline: Psychology [en_US]
dc.degree.level: Doctor of Philosophy (Ph.D.) [en_US]
dc.description.abstract: Language is not only a tool for communication, but a window into human nature and the mind. The way we talk about ourselves, others, and the world around us reveals our personalities, mental health, self- and group-serving biases, and more. Though many sources of language exist to understand human psychology, newspapers provide a unique opportunity for studying changes in attitudes, values, and beliefs. In this thesis, I analyzed 221 years (1789-2009) of American historical newspaper data from two historical newspaper corpora (Chronicling America and the Corpus of Historical American English [COHA]). The volume of data is more than could ever be read by any scholar or group of scholars. Therefore, I made use of a standard computational language model of distributed semantics called the Random Permutation (RP) model that “reads” through a corpus of text and generates a mathematical representation of word meaning (i.e., a vector representation). I used the RP model to generate a vector representation for each word written in each decade of the newspapers. The result of this procedure is a 3000-dimensional vector space where each word in each decade is represented as a point in a space that evolves through time. Similar words (e.g., dog and canine) occupy similar regions in the semantic vector space, whereas dissimilar words (e.g., dog and toolbox) occupy dissimilar regions in the semantic space. Having derived dozens of sets of vectors, I first conducted a series of four experiments using unambiguous ground truths to validate the semantic meaning embedded within the vector space. After validating the vectors, I used several methods, including machine learning methods, to measure long-term changes in attitudes, values, and beliefs through an analysis of language.
In addition to the computational work, I conducted an empirical experiment that demonstrated that the methods I used to measure meaning also predict people’s behavioural bias in real-world consequential decisions (i.e., job hiring). The ultimate goal of the thesis is to advance computational methods for accurately predicting people’s emotions, thoughts, and behaviour from language. [en_US]
dc.description.note: October 2022 [en_US]
dc.identifier.uri: http://hdl.handle.net/1993/36754
dc.language.iso: eng [en_US]
dc.rights: open access [en_US]
dc.subject: computational humanities [en_US]
dc.subject: computational social sciences [en_US]
dc.subject: distributed models of semantics [en_US]
dc.subject: natural language processing [en_US]
dc.subject: text classification [en_US]
dc.title: Computational history: using semantic models to measure changes in attitudes, values, and beliefs from language [en_US]
dc.type: doctoral thesis [en_US]
local.subject.manitoba: no [en_US]
project.funder.name: Natural Sciences and Engineering Research Council of Canada [en_US]
Files
Original bundle
Name: Cook_Matthew.pdf
Size: 3.47 MB
Format: Adobe Portable Document Format
Description: PhD Dissertation
License bundle
Name: license.txt
Size: 2.2 KB
Format: Item-specific license agreed to upon submission
Description: