The core vocabulary of international English, a corpus approach

Peyawary, Ahmad Shah
This study investigates the vocabulary of the English language in order to determine whether there is a core vocabulary common to its three major dialects. Earlier work by Thorndike and West examined the frequency of vocabulary in samples of the English language. This study discusses the theory of corpus design in general and in language specifically; on the basis of this theory, it develops criteria for evaluating language corpora as representative samples of language that permit statistical analysis. These criteria, applied to the pioneering work of Thorndike and West, show the need for a new analysis. The ICAME database, containing million-word samples of American, British, and Indian English, does conform to these criteria. A methodology appropriate to the statistical analysis of the ICAME materials is elaborated. After the ICAME data have been converted from spelled forms to dictionary headwords, their frequency distribution patterns are analyzed using Spearman's correlation coefficient to determine the extent of similarity. Although analysis of the undifferentiated list produces inconclusive results, when the words are grouped by syntactic category it is possible to discern a boundary between core English and its three major dialects. The study concludes with word lists presenting the core vocabulary of International English.
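The comparison step described above can be illustrated with a minimal sketch. It is not the study's own procedure or code: the function names (`average_ranks`, `pearson`, `spearman`) and the toy headword counts are hypothetical, and the sketch assumes that a headword missing from one corpus counts as frequency 0 and that tied frequencies share the mean of their ranks. It computes Spearman's coefficient as the Pearson correlation of the two rank lists.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(freq_a, freq_b):
    """Spearman's rho between two headword -> frequency mappings.

    Assumption: a headword absent from one mapping has frequency 0 there.
    """
    headwords = sorted(set(freq_a) | set(freq_b))
    a = [freq_a.get(w, 0) for w in headwords]
    b = [freq_b.get(w, 0) for w in headwords]
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical toy counts, not ICAME data.
corpus_one = {"the": 900, "be": 700, "of": 500, "truck": 12}
corpus_two = {"the": 880, "be": 720, "of": 510, "lorry": 9}
rho = spearman(corpus_one, corpus_two)
```

To mirror the grouping by syntactic category, the same function could be called once per category on mappings restricted to, say, nouns or prepositions. The sketch raises `ZeroDivisionError` if every headword in one mapping has the same frequency, since the rank variance is then zero.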