Tone restoration in transcribed Kammu: decision-list word sense disambiguation for an unwritten language

Authors

Editors

  • Stephan Oepen
  • Kristin Hagen
  • Janne Bondi Johannessen

Summary, in English

The RWAAI (Repository and Workspace for Austroasiatic Intangible heritage) project aims to build a digital archive from existing legacy data on the Austroasiatic language family. One aspect of the project is the preservation of analogue legacy data. In this context, we have at hand a large number of mostly phonemic transcriptions of narrative monologues, often with accompanying sound recordings, in the unwritten Kammu language of northern Laos. Some of the transcriptions, however, lack tone marks, which makes them substantially less useful for a tonal language such as Kammu. The problem of restoring tones can be recast as one of word sense disambiguation or, more generally, lexical ambiguity resolution. We attack it with decision lists, along the lines of Yarowsky (1994), using the tone-marked part of the corpus (120k words) as training data. The performance ceiling for this corpus is uncertain: the stories were all annotated, primarily for human rather than machine consumption, by a single person over almost 40 years, with slowly emerging idiosyncratic conventions. Thus, both inter-annotator and intra-annotator agreement figures are unknown. Nevertheless, with the data from this one annotator as a gold standard, we improve from an already high baseline accuracy of 95.7% to 97.2% (by 10-fold cross-validation).
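
To make the decision-list idea concrete, here is a minimal sketch in Python of tone restoration treated as lexical ambiguity resolution. It is not the authors' implementation. The feature set (only the previous and next toneless word), the smoothing constant `alpha`, the function names, and the toy tokens such as `ka_H` and `ka_L` are all assumptions made for illustration; the tokens are invented placeholders, not Kammu words. Each rule maps a context feature to a tone-marked form and is ranked by a smoothed log-likelihood ratio. The first matching rule decides, and a word with no matching rule falls back to its most frequent tone-marked form, which corresponds to the baseline.

```python
import math
from collections import Counter, defaultdict


def context_features(bare_words, i):
    """Features for position i: the neighbouring toneless words (an assumed, minimal feature set)."""
    prev_w = bare_words[i - 1] if i > 0 else "<s>"
    next_w = bare_words[i + 1] if i + 1 < len(bare_words) else "</s>"
    return [("prev", prev_w), ("next", next_w)]


def train(sentences, alpha=0.1):
    """Build a default table and ranked rules from sentences of (toneless, tone-marked) pairs."""
    form_counts = defaultdict(Counter)                       # toneless -> Counter(tone-marked)
    feat_counts = defaultdict(lambda: defaultdict(Counter))  # toneless -> feature -> Counter(tone-marked)
    for sent in sentences:
        bare_words = [bare for bare, _ in sent]
        for i, (bare, toned) in enumerate(sent):
            form_counts[bare][toned] += 1
            for feat in context_features(bare_words, i):
                feat_counts[bare][feat][toned] += 1

    # Baseline: the most frequent tone-marked form of each toneless word.
    default = {bare: c.most_common(1)[0][0] for bare, c in form_counts.items()}

    rules = {}
    for bare, feats in feat_counts.items():
        if len(form_counts[bare]) < 2:
            continue  # only one tone-marked form seen, so nothing to disambiguate
        scored = []
        for feat, counts in feats.items():
            best, n_best = counts.most_common(1)[0]
            n_rest = sum(counts.values()) - n_best
            # Smoothed log-likelihood ratio of the best form against all others.
            score = math.log((n_best + alpha) / (n_rest + alpha))
            scored.append((score, feat, best))
        scored.sort(key=lambda rule: rule[0], reverse=True)
        rules[bare] = scored
    return default, rules


def restore(bare_words, default, rules):
    """Assign a tone-marked form to each toneless word using the first matching rule."""
    restored = []
    for i, bare in enumerate(bare_words):
        feats = set(context_features(bare_words, i))
        choice = default.get(bare, bare)  # unseen words are left unchanged
        for _score, feat, toned in rules.get(bare, []):
            if feat in feats:
                choice = toned
                break
        restored.append(choice)
    return restored


# Invented toy corpus: "ka" is ambiguous between the hypothetical forms "ka_H" and "ka_L".
toy_corpus = [
    [("mo", "mo_H"), ("ka", "ka_H"), ("ti", "ti_L")],
    [("mo", "mo_H"), ("ka", "ka_H"), ("su", "su_H")],
    [("re", "re_L"), ("ka", "ka_L"), ("ti", "ti_L")],
    [("re", "re_L"), ("ka", "ka_L"), ("su", "su_H")],
    [("mo", "mo_H"), ("ka", "ka_H"), ("ti", "ti_L")],
]
default, rules = train(toy_corpus)
print(restore(["re", "ka", "ti"], default, rules))
```

In this sketch the baseline alone would always pick `ka_H`, while the rule keyed on the preceding word can override it. A fuller version would likely add wider context windows and a minimum score below which the baseline is preferred; those choices are left open here.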

Publication year

2013

Language

English

Pages

399-410

Publication/Journal/Series

Linköping Electronic Conference Proceedings

Volume

85

Document type

Conference contribution

Subject

  • General Language Studies and Linguistics

Keywords

  • word sense disambiguation
  • Kammu
  • decision lists
  • lexical ambiguity resolution
  • tone restoration
  • legacy data

Conference name

Nodalida 2013

Conference date

2013-05-23

Status

Published

ISBN/ISSN/Other

  • ISSN: 1650-3686
  • ISSN: 1650-3740