Extended constituent-to-dependency conversion for English

Authors:
Editors:
  • Joakim Nivre
  • Heiki-Jaan Kalep
  • Kadri Muischnek
  • Mare Koit
Publication year: 2007
Language: English
Pages: 105-112
Publication/Journal/Series: NODALIDA 2007 Proceedings
Document type: Conference paper
Publisher: University of Tartu

Abstract

We describe a new method to convert English constituent trees using the Penn Treebank annotation style into dependency trees. The new format was inspired by annotation practices used in other dependency treebanks with the intention to produce a better interface to further semantic processing than existing methods. In particular, we used a richer set of edge labels and introduced links to handle long-distance phenomena such as wh-movement and topicalization.

The resulting trees generally have a more complex dependency structure. For example, 6% of the trees contain at least one nonprojective link, which is difficult for many parsing algorithms. As can be expected, the more complex structure and the enriched set of edge labels make the trees more difficult to predict, and we observed a decrease in parsing accuracy when applying two dependency parsers to the new corpus. However, the richer information contained in the new trees resulted in a 23% error reduction in a baseline FrameNet semantic role labeler that relied on dependency arc labels only.
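To make the notion of a nonprojective link concrete, here is a minimal sketch in Python. It is not the conversion procedure described in the paper. It only checks the standard projectivity condition on a tree that has already been built. The function names, the head-array encoding (token positions numbered from 1, with 0 as an artificial root) and the toy analysis of "What did you see ?" are all assumptions made for illustration.

```python
from typing import List, Tuple


def is_dominated_by(node: int, ancestor: int, heads: List[int]) -> bool:
    """Return True if `ancestor` is reachable from `node` by following head links.

    heads[i - 1] is the head of token i; 0 denotes the artificial root.
    The step limit guards against malformed input that contains a cycle.
    """
    for _ in range(len(heads) + 1):
        if node == 0:
            return False
        node = heads[node - 1]
        if node == ancestor:
            return True
    return False


def nonprojective_arcs(heads: List[int]) -> List[Tuple[int, int]]:
    """List the (head, dependent) arcs that violate projectivity.

    An arc is projective if every token strictly between the head and the
    dependent is dominated by the head.
    """
    violations = []
    for dependent, head in enumerate(heads, start=1):
        lo, hi = min(head, dependent), max(head, dependent)
        if any(not is_dominated_by(k, head, heads) for k in range(lo + 1, hi)):
            violations.append((head, dependent))
    return violations


# Hypothetical analysis of "What did you see ?" in which the fronted
# wh-word attaches to the verb "see" (token 4) and "did" (token 2) is the root.
wh_question = [4, 0, 2, 2, 2]
print(nonprojective_arcs(wh_question))

# "John saw Mary": a simple projective tree.
print(nonprojective_arcs([2, 0, 2]))
```

In the toy wh-question, the arc from "see" to "What" spans "did", which "see" does not dominate, so that arc is reported. A link of this kind is what a strictly projective parsing algorithm cannot produce.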

Keywords

  • Technology and Engineering
  • dependency syntax
  • treebanks
  • Natural language processing

Other

16th Nordic Conference of Computational Linguistics
2007-05-25/2007-05-26
Tartu, Estonia
Published
Yes
  • ISBN: 978-9985-4-0514-7
