Comparing and combining two approaches to automated subject classification of text

Authors

  • Koraljka Golub
  • Anders Ardö
  • Dunja Mladenic
  • Marko Grobelnik

Summary, in English

A machine-learning approach and a string-matching approach to automated subject classification of text were compared in terms of performance, advantages and downsides. The former was based on an SVM algorithm, while the latter comprised string-matching between a controlled vocabulary and words in the text to be classified. The data collection consisted of a subset of Compendex, classified into six different classes. SVM on average outperformed the string-matching approach; our hypothesis that SVM yields better recall and string-matching better precision was confirmed for only one of the classes. Because the two approaches are complementary, we investigated different combinations of the two based on combining their vocabularies. The results showed that the original approaches, i.e. the machine-learning approach without background knowledge from the controlled vocabulary and the string-matching approach based on the controlled vocabulary, outperform approaches in which combinations of automatically and manually obtained terms were used. The reasons for these results need further investigation, including a larger data collection and combining the two approaches using predictions.
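
As a rough illustration of the string-matching idea described in the summary, the following Python sketch counts how often terms from a controlled vocabulary occur in a text and ranks classes by that count. It is only a minimal sketch under stated assumptions: the vocabulary, the class labels, and the function names (`tokenize`, `count_term`, `classify`) are hypothetical and are not taken from the paper or from Compendex, and the paper's actual matching and weighting scheme is not specified here.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: class label -> list of terms.
# Illustrative only; not taken from the paper or from Compendex.
VOCABULARY = {
    "class_a": ["heat transfer", "thermodynamics"],
    "class_b": ["signal processing", "filter"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_term(tokens, term_tokens):
    """Count occurrences of a (possibly multi-word) term in a token list."""
    n = len(term_tokens)
    if n == 0:
        return 0
    return sum(
        1
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == term_tokens
    )


def classify(text, vocabulary=VOCABULARY):
    """Return (label, score) pairs with score > 0, highest score first.

    The score is the plain number of vocabulary-term matches; this
    unweighted count is an assumption made for the sketch.
    """
    tokens = tokenize(text)
    scores = Counter()
    for label, terms in vocabulary.items():
        for term in terms:
            scores[label] += count_term(tokens, tokenize(term))
    ranked = [(label, score) for label, score in scores.items() if score > 0]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    sample = "A digital filter for signal processing; the filter is adaptive."
    print(classify(sample))
```

A matcher of this kind only assigns a class when a vocabulary term literally appears in the text, which is the intuition behind the hypothesis that string-matching would favour precision while a trained SVM would favour recall.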

Publication year

2006

Language

English

Pages

467-470

Publication/Journal/Series

Research and Advanced Technology for Digital Libraries. Proceedings / Lecture Notes in Computer Science

Volume

4172

Document type

Conference paper

Publisher

Springer

Subject

  • Electrical Engineering, Electronic Engineering, Information Engineering

Conference name

10th European Conference, ECDL 2006

Conference date

2006-09-17 - 2006-09-22

Conference place

Alicante, Spain

Status

Published

ISBN/ISSN/Other

  • ISSN: 1611-3349
  • ISSN: 0302-9743