Mixed Company: A Comparison between Word Senses and Usages based on Distributional Similarity
The issue of how words should be separated into senses is crucial for the problem of word sense disambiguation. Early work in computational linguistics was based on manual divisions that were made by individual researchers (e.g., Small 1982; Hirst 1987; Yarowsky 1992). More recently, work has focused on unsupervised learning and making distinctions on the basis of distributional similarity (Lee 1999; Weeds, Weir and McCarthy 2004; Dinu and Lapata 2010). These approaches are based on the idea that "a word is characterized by the company it keeps" (Firth 1957). The approaches use statistics about how words co-occur in order to create clusters. If occurrences of the same word appear in different clusters, they are ascribed to different senses or usages of the word.
This talk will describe an investigation into the way senses are distinguished, both lexicographically and by distributional similarity. A sample of words with two senses was assessed, in which those senses are either homonymous or related via metaphor or systematic polysemy. The sample was based on the senses that occur in the Longman Dictionary of Contemporary English (LDOCE). There was a cline of word sense individuation with regard to these distinctions. The senses that were related via systematic polysemy were distinguished less often across dictionaries, and were also individuated inconsistently across the set within a given dictionary. This cline was also found with distributional similarity. A greater percentage of the homonymous set was separated into different clusters than of the metaphor set, and an even smaller percentage of the systematic polysemy set was separated. These results are in accord with Panman (1980) and Lehrer (1974). They found that when word senses are homonymous, people will agree that they reflect different senses. When the senses are related in meaning, people will differ about whether they are two senses or one. This work shows that such differences apply to lexicographic judgment as well, and that statistical approaches also suffer from inconsistency across a set of words that have a systematic relationship between their senses. The talk concludes with suggestions for modifying current statistical approaches so that they can capture more linguistic generalizations.
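The general idea of clustering occurrences by the company they keep can be illustrated with a minimal sketch. It is not the method of any of the cited papers: the toy sentences, the stopword list, the similarity threshold, and the greedy clustering rule are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical stopword list and toy corpus, invented for this illustration.
STOPWORDS = {"the", "at", "from", "was", "she", "they"}
SENTENCES = [
    "she deposited money at the bank",
    "the bank approved the money loan",
    "they fished from the river bank",
    "the river bank was muddy",
]

def context_vector(sentence, target):
    """Count the content words that co-occur with `target` in one sentence."""
    words = sentence.lower().split()
    return Counter(w for w in words if w != target and w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_usages(sentences, target, threshold=0.2):
    """Greedily group occurrences of `target` whose contexts are similar."""
    clusters = []  # each item: (summed context counts, list of sentence indices)
    for i, sentence in enumerate(sentences):
        vec = context_vector(sentence, target)
        for centroid, members in clusters:
            if cosine(vec, centroid) >= threshold:
                centroid.update(vec)  # fold this context into the cluster
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))  # no similar cluster: start a new one
    return [members for _, members in clusters]

print(cluster_usages(SENTENCES, "bank"))
```

In this toy corpus the financial and riverside contexts share no content words, so the occurrences of "bank" fall into two groups, which a distributional approach would read as two senses or usages.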
Biography: Robert Krovetz, PhD, is President of Lexical Research, and a Senior Research Scientist at Right Answers. His work focuses on lexical semantics, morphology, and multiword expressions. He has published more than 30 papers, including one on morphology and information retrieval that has been cited by more than 800 other papers and was ranked by Citeseer as one of the 100 most-cited papers of the year in computer science.
Definitions and pictures in a dictionary: Competition or synergy?
This study looks into the relative importance of pictures and definitions in a monolingual entry for advanced learners of English, and how the two devices work together. Ten Polish advanced learners of English were presented with on-screen monolingual entries adapted from the digital version of the Longman Dictionary of Contemporary English. The entries depicted concrete objects, each being defined and also illustrated with a picture (the original pictorials from LDOCE were used). The entries were all low-frequency lexical items, so they would not be known to the participating students. The entries used were initially rated, using a separate but similar group of 20 learners, for (1) familiarity and (2) illustrability. In addition, half of the items depicted the designate in a natural context (e.g. banister shown next to a staircase), while the other half only showed the designate. In an experimental study using the Tobii T-60 eyetracker, participants were asked to supply verbally Polish equivalents for the 20 words defined and illustrated, based on the dictionary entries presented on the screen, and their eye gaze patterns as well as their responses were recorded. Following the eye tracking session, participants were also given a lexical recall test.
We hope that the data will reveal patterns of relative use of the verbal and pictorial elements of the entries, in particular the extent to which pictures and definitions compete for the users’ attention (see Figure 2), along with entry comprehension and vocabulary retention, and how these two outcome measures are affected by the contextuality of the pictorial illustration as well as the relative time spent on the different entry components.
Biography: Robert Lew is a professor at the Department of Lexicography and Lexicology at Adam Mickiewicz University in Poznań, Poland. His current research interests centre around dictionary use. He has worked as a practical lexicographer and is the Editor of the International Journal of Lexicography.
A multidialect model for monolingual dictionary data
The paper discusses a data model which enables multiple versions of a base dictionary, representing different dialects, to be stored and maintained in a single database. The model is being used for an English monolingual dictionary dataset created by merging two formerly discrete dictionaries of US and UK English. In the multidialect dictionary, entries are forked to provide different spellings, word choices, definitions, and examples only when this is required to accurately reflect dialect differences, while entry elements that do not involve dialect distinctions exist in only one version. The dictionary is published for end-users in separate, independent versions for each dialect, but lexicographers work in the multidialect environment of the base data. Although currently being used only for US and UK English, the data model could be extended to other languages or dialects.
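One way to picture such a model is the following minimal sketch. It is not the publisher's actual schema: the class names `Element` and `Entry`, the `resolve` and `publish` methods, the dialect codes, and the sample entry are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Element:
    """One entry element, such as a spelling, definition, or example.

    `shared` holds the single version used when the dialects agree;
    `by_dialect` holds forked versions, filled in only when they differ.
    """
    shared: Optional[str] = None
    by_dialect: Dict[str, str] = field(default_factory=dict)

    def resolve(self, dialect: str) -> Optional[str]:
        """Return the dialect-specific fork if one exists, else the shared text."""
        return self.by_dialect.get(dialect, self.shared)

@dataclass
class Entry:
    """A base-dictionary entry made of named elements."""
    elements: Dict[str, Element]

    def publish(self, dialect: str) -> Dict[str, Optional[str]]:
        """Produce the independent single-dialect view shown to end users."""
        return {name: el.resolve(dialect) for name, el in self.elements.items()}

# Hypothetical sample entry: only the spelling is forked.
entry = Entry({
    "headword": Element(by_dialect={"US": "color", "UK": "colour"}),
    "definition": Element(shared="the visual quality of an object, such as red or blue"),
})

print(entry.publish("US"))
print(entry.publish("UK"))
```

Storing the unforked definition once means an editorial correction to it reaches both published versions, while the forked headword is maintained separately for each dialect.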
Users of US and UK English prefer to consume dictionary content in their respective dialects, but in the global environment of the Internet they are more likely than ever before to encounter the distinctive usages of the opposite dialect. The multidialect dictionary model offers several advantages over the more traditional approach of maintaining parallel dictionaries. The back-end dictionary data eliminates unnecessary duplications and is far more efficient to maintain and update than separate dictionary texts, and the formerly separate dictionaries are easily expanded to include words and meanings from the other dialect. The multidialect model enables an editorial team consisting of native speakers of both dialects to work collaboratively, resulting in more nuanced and comprehensive labeling and documentation of dialect differences. And by documenting semantic distinctions between the two major dialects of English at a granular level, the US-UK multidialect dictionary comprises a detailed record that has the potential to serve as a valuable resource for analyzing lexical variation.
Biography: Katherine Connor Martin is Head of US Dictionaries at Oxford University Press, where she has been working as a lexicographer since 2003.
M. LYNNE MURPHY
Language lovers or linguistic authorities? Contrasts in British and American dictionary cultures
This paper examines the contrasting “dictionary cultures” of the UK and US and the ways in which dictionary publishers help to create, support, and maintain them. By considering marketing materials from the late 19th to early 21st centuries, we get a picture of American dictionary publishers exploiting linguistic insecurity, while British publishers ignore it. American dictionary marketing focuses on the dictionary as a tool for working people and students, while British dictionary marketing focuses on enjoyment of language. The comparative usage studies that are available seem to support the existence of these distinct cultures. The different dictionary cultures can be seen as reflecting different notions of who owns and has access to the language, with an American emphasis on the dictionary as a self-help book for the masses, and a British emphasis on the dictionary as part of a somewhat elite literary tradition. American dictionary culture embodies a number of deeply held beliefs: that anyone can learn to read and write well, that these skills can be taught, that the written word holds special status, and that a responsible citizenry requires an educated populace. This contrasts with the British culture, in which language is a key marker of social class, where literacy focuses on literature, and where rules are always meant to be a bit fuzzy. These values, combined with the (not unrelated) commercial competition in US dictionaries (and resultant ‘dictionary wars’), make descriptive–prescriptive conflict in lexicography more marked in US dictionary culture.
Biography: Lynne Murphy is Reader in Linguistics at the University of Sussex, where she researches in lexicology, metalexicography, pragmatics, and British/American Englishes. The work presented here is from a British Academy/Leverhulme-funded project on ‘British and American Dictionary Cultures’ and contributes to her forthcoming book on British–American linguistic relations (Penguin, 2018).
The Importance of Island-Specific Dialect Dictionaries in the Caribbean
Dialect dictionaries of a single locality or microarea have the ability to reveal the ways in which dialects both reflect and shape everyday life in underrepresented communities (Patrick 1995; Wolfram, Reaser, and Vaughn 2008). Moreover, they importantly function “as instruments of transmission of knowledge” which may otherwise be under threat of extinction (Barbato and Varvaro 2004). This presentation focuses on the compilation and publication of Johnson’s (2016) A Lee Chip, the first comprehensive dictionary of the English variety spoken on the island of Saba. Home to fewer than 2,100 native residents, Saba is a six-square-mile Dutch municipality located in the Eastern Caribbean. A Lee Chip features more than two thousand words, usages, meanings, and folk expressions that can be (or have been) heard on the island. The book includes chapters on the history of English on Saba, as well as chapters on local phonology and grammar. Highlighting the preservative and reflective functions of dialect dictionaries, this presentation addresses (1) the gathering and cataloging of words and phrases; (2) the inclusion of place names, flora, fauna, and fishing rocks; (3) the importance of the book for linguistics and lexicography; and (4) the importance of the book for non-academics. Specific focus is given to the ways in which dialect dictionaries serve as a manifestation of the principle of linguistic gratuity (Wolfram 1993) and engender a sense of cultural pride in the Caribbean.
Biography: Caroline Myrick is a PhD student studying sociolinguistics at North Carolina State University. Her fieldwork on the island of Saba, Dutch Caribbean, served as the basis for the first formal description and acoustic analysis of Saban English, as well as the publication of the first Saban English dictionary.
“It can be edited tenderly”: William Crooke and the revising of Hobson-Jobson
The Anglo-Indian dictionary Hobson-Jobson has existed in print for most of the 130 years since it was first published in 1886 (Yule & Burnell, 1886). Yet nearly everyone familiar today with this dictionary knows it not in its 1886 form, but as a reprint of its 1903 revised edition, edited by the anthropologist William Crooke 14 years after both of Hobson-Jobson’s authors had died (Yule & Burnell, 1903). Surprisingly, this second edition arose not from action by its publisher to keep a successful and highly esteemed work in print, nor from any wish by the heirs of the original authors, Henry Yule and Arthur Burnell, to secure their dictionary’s legacy. Instead, Crooke himself proposed the revision to the publisher John Murray in 1900 after noticing that Hobson-Jobson had recently gone out of print. His intention was to edit it “tenderly, preserving as far as possible [its] original form and style” (JMP, W. Crooke to J. Murray, 19 Feb. 1900, NLS Acc. 12604/1295). Crooke successfully convinced Murray and Yule’s daughter Amy of his qualifications but triggered alarm in 1902 when he revealed that he had retyped the entire dictionary rather than marking his edits on sheets of the printed first edition, a decision that Amy Yule felt unnecessarily added to Crooke’s work but that greatly concerned Murray, for it potentially introduced numerous errors into the second edition. Who was William Crooke and what may have motivated him, Amy Yule, and John Murray to pursue this revision? What types of revisions did Crooke make and how did they differ from those Yule might have made had he survived to complete the revision himself? And what were the effects of Crooke’s unfortunate decision to retype the first edition?
Using techniques of forensic dictionary analysis (Coleman & Ogilvie, 2009), this paper addresses these questions in unearthing the history of the second edition of Hobson-Jobson, through Yule’s own notes for an anticipated second edition (heretofore unexamined), Crooke’s work, reviewers’ assessments, and other archival records. The paper will reveal that Crooke worked without access to Yule’s own notes for a second edition (UAA, ms. 2860), which were in Amy’s possession but were apparently not shared with Crooke.
Biography: Traci Nagle is a linguist at Indiana University who specializes in sound systems and variations in how speakers acquire and pronounce words. She focuses her research on the languages of South Asia, including English as it was and is spoken there. Her previous work on the 1886/1903 Anglo-Indian dictionary Hobson-Jobson has been published in Dictionaries and in the International Journal of Lexicography.