Word length, sentence length and frequency: Zipf's law revisited

Authors

Summary, in English

This paper examines data from English, Swedish and German in order to find a theoretical distribution that describes the observed relation between word length and frequency. In Swedish and English, most word tokens consist of three letters only, while shorter or longer words occur less frequently. We found that the equation with the general form f_exp = a * L^b * c^L (a variant of the so-called gamma distribution) approximates the observed frequencies reasonably well. This formula incorporates both the fact that the number of possible words increases with word length, and the fact that longer words tend to be avoided, presumably because they are uneconomic. To our knowledge this formula has not been proposed to describe word frequency data. We examined frequency distributions of word length in Swedish and English, and explored different variants of the equation by systematically varying the a, b and c parameters. Subsequently, we also applied the formula to the frequency distribution of sentence length in English, and found an almost perfect fit for a corpus consisting of different text genres. Moreover, the data showed that the formula can be used to distinguish between different kinds of text genres.
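As an illustration of the formula in the summary, the following minimal sketch evaluates f_exp = a * L^b * c^L and recovers a, b and c from a frequency table. It is not the authors' procedure: the summary only says the parameters were varied systematically, so the log-linear least-squares fit used here is an assumption, as are the function names and the synthetic parameter values (0.1, 3.0, 0.4), which are made up and not taken from the paper. Taking logarithms gives ln f = ln a + b * ln L + L * ln c, which is linear in the unknowns ln a, b and ln c.

```python
import math

def expected_frequency(L, a, b, c):
    """Evaluate f_exp = a * L**b * c**L for a length L >= 1."""
    return a * L**b * c**L

def solve_3x3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_log_linear(lengths, freqs):
    """Least-squares fit of ln f = ln a + b*ln L + L*ln c; returns (a, b, c).

    All frequencies must be strictly positive, since their logarithm is taken.
    """
    rows = [(1.0, math.log(L), float(L)) for L in lengths]
    ys = [math.log(f) for f in freqs]
    # Normal equations (A^T A) x = A^T y for the three unknowns.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    ln_a, b, ln_c = solve_3x3(ata, aty)
    return math.exp(ln_a), b, math.exp(ln_c)

# Synthetic data generated from made-up parameters (not values from the paper).
lengths = list(range(1, 13))
true_a, true_b, true_c = 0.1, 3.0, 0.4
freqs = [expected_frequency(L, true_a, true_b, true_c) for L in lengths]

a, b, c = fit_log_linear(lengths, freqs)

# For b > 0 and 0 < c < 1 the curve peaks where d(ln f)/dL = b/L + ln c = 0.
mode = -b / math.log(c)
print(f"a={a:.4f}, b={b:.4f}, c={c:.4f}, peak near L={mode:.2f}")
```

With b > 0 and 0 < c < 1, the L^b factor makes the curve rise and the c^L factor makes it fall, giving a single peak at L = -b / ln c. For the made-up values above the peak lies a little above three letters, which is the kind of shape the summary describes. Fitting in log space weights relative rather than absolute errors, so a nonlinear fit on the raw frequencies could give somewhat different parameters on real data.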

Publication year

2004

Language

English

Pages

37-52

Publication/Journal/Series

Studia Linguistica

Volume

58

Issue

1

Document type

Journal article

Publisher

Wiley-Blackwell

Subject

  • General Language Studies and Linguistics

Keywords

  • length frequency

Status

Published

ISBN/ISSN/Other

  • ISSN: 1467-9582