Importance of HTML structural elements and metadata in automated subject classification

  • Koraljka Golub
  • Anders Ardö
Publication year: 2005
Language: English
Pages: 368-378
Publication/Journal/Series: Research and advanced technology for digital libraries / Lecture Notes in Computer Science
Volume: 3652
Document type: Conference paper
Publisher: Springer


The aim of the study was to determine how significance indicators assigned to different Web page elements (internal metadata, title, headings, and main text) influence automated classification. The data collection that was used comprised 1000 Web pages in engineering, to which Engineering Information classes had been manually assigned. The significance indicators were derived using several different methods: (total and partial) precision and recall, semantic distance and multiple regression. It was shown that for best results all the elements have to be included in the classification process. The exact way of combining the significance indicators turned out not to be overly important: using the F1 measure, the best combination of significance indicators yielded no more than 3% higher performance results than the baseline.
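
As a rough illustration of the idea in the abstract, the sketch below scores a candidate class by a weighted sum of term matches per page element and evaluates assigned classes with the F1 measure. It is not the authors' implementation: the weight values, the element keys, the function names `class_score` and `f1`, and the class labels are all hypothetical, chosen only to make the example runnable.

```python
from typing import Dict, Set

# Hypothetical significance indicators (weights) per page element.
# The values are illustrative assumptions, not figures from the study.
WEIGHTS: Dict[str, float] = {
    "metadata": 2.0,
    "title": 4.0,
    "headings": 2.0,
    "text": 1.0,
}


def class_score(matches: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of the term matches found in each page element for one class."""
    return sum(weights.get(element, 0.0) * count for element, count in matches.items())


def f1(assigned: Set[str], correct: Set[str]) -> float:
    """F1 measure: harmonic mean of precision and recall over assigned classes."""
    hits = len(assigned & correct)
    if hits == 0:
        return 0.0
    precision = hits / len(assigned)
    recall = hits / len(correct)
    return 2 * precision * recall / (precision + recall)


# One match in the title and three in the main text for a hypothetical class.
score = class_score({"title": 1, "text": 3}, WEIGHTS)

# Automatically assigned classes compared with manually assigned ones.
quality = f1({"class_a", "class_b"}, {"class_a", "class_c"})
```

Changing the values in `WEIGHTS` and recomputing `f1` over a labelled collection is one simple way to compare combinations of significance indicators against a baseline in which all elements are weighted equally.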


  • Electrical Engineering, Electronic Engineering, Information Engineering


9th European Conference, ECDL 2005
  • ISSN: 1611-3349
  • ISSN: 0302-9743
  • ISBN: 3-540-28767-1
  • doi:10.1007/3-540-45747-X
