Can we trust Web-page metadata?

Author

  • Anders Ardö

Summary, in English

A statistical study of embedded metadata in a sample of more than 4 million HTML Web-pages is reported. The paper tries to determine and quantify the validity of this metadata. Of particular interest is to see if it is trustworthy enough for determining the topic of a Web-page. Datasets are collected by a Web crawler running both as a general and a focused crawler. Metadata fields 'title', 'author', 'keywords', 'description', and 'language' are analyzed in detail together with Dublin Core metadata. The study reveals problems with how metadata is created. Among the 75% of all Web-pages that have interesting metadata, the field 'language' is the most trustworthy. All other metadata fields show a high degree of duplication thus degrading their usefulness. The strict answer to the title question is 'No', however there is a lot of meaningful and useful information, but it must be interpreted and used with care. The study also provides statistics on the usage of metadata today and how it has changed over time.

Publication year

2010

Language

English

Pages

58-74

Publication/Journal/Series

Journal of Library Metadata

Volume

10

Issue

1

Document type

Journal article

Publisher

Routledge

Subject

  • Electrical Engineering, Electronic Engineering, Information Engineering

Keywords

  • Dublin Core
  • metadata analysis
  • metadata usage
  • metadata validity
  • Web metadata

Status

Published

Research group

  • KnowLib

ISBN/ISSN/Other

  • ISSN: 1938-6389