Low-Variance Multitaper MFCC Features: A Case Study in Robust Speaker Verification

Authors

  • Tomi Kinnunen
  • Rahim Saeidi
  • Filip Sedlak
  • Kong Aik Lee
  • Johan Sandberg
  • Maria Sandsten
  • Haizhou Li

Summary

In speech and audio applications, short-term signal spectrum is often represented using mel-frequency cepstral coefficients (MFCCs) computed from a windowed discrete Fourier transform (DFT). Windowing reduces spectral leakage but variance of the spectrum estimate remains high. An elegant extension to windowed DFT is the so-called multitaper method which uses multiple time-domain windows (tapers) with frequency-domain averaging. Multitapers have received little attention in speech processing even though they produce low-variance features. In this paper, we propose the multitaper method for MFCC extraction with a practical focus. We provide, first, detailed statistical analysis of MFCC bias and variance using autoregressive process simulations on the TIMIT corpus. For speaker verification experiments on the NIST 2002 and 2008 SRE corpora, we consider three Gaussian mixture model based classifiers with universal background model (GMM-UBM), support vector machine (GMM-SVM) and joint factor analysis (GMM-JFA). Multitapers improve MinDCF over the baseline windowed DFT by relative 20.4% (GMM-SVM) and 13.7% (GMM-JFA) on the interview-interview condition in NIST 2008. The GMM-JFA system further reduces MinDCF by 18.7% on the telephone data. With these improvements and generally noncritical parameter selection, multitaper MFCCs are a viable candidate for replacing the conventional MFCCs.
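The multitaper idea in the summary (several time-domain tapers, one power spectrum per taper, then an average in the frequency domain) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say which taper family or weights the paper uses, so the sketch assumes sine tapers with uniform weights, uses a naive DFT instead of an FFT to stay dependency-free, and all function names are hypothetical.

```python
import cmath
import math
import random


def sine_tapers(n, k):
    """Return k orthonormal sine tapers of length n (assumed taper family)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1))
             for t in range(n)]
            for j in range(k)]


def power_spectrum(frame):
    """Naive DFT power spectrum |X(f)|^2 for bins 0..n//2."""
    n = len(frame)
    spec = []
    for f in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2)
    return spec


def multitaper_spectrum(frame, k=6):
    """Average the k single-taper power spectra (uniform weights assumed)."""
    tapers = sine_tapers(len(frame), k)
    spectra = [power_spectrum([w * x for w, x in zip(taper, frame)])
               for taper in tapers]
    return [sum(col) / k for col in zip(*spectra)]


def hamming_spectrum(frame):
    """Baseline: one Hamming-windowed power spectrum, unit-energy window."""
    n = len(frame)
    win = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    norm = math.sqrt(sum(w * w for w in win))
    return power_spectrum([w / norm * x for w, x in zip(win, frame)])


# Example: one 32-sample frame of synthetic white noise (illustration only).
random.seed(0)
frame = [random.gauss(0.0, 1.0) for _ in range(32)]
smooth = multitaper_spectrum(frame, k=6)  # averaged over 6 tapers
rough = hamming_spectrum(frame)           # single-window estimate
```

In an MFCC front end, the averaged spectrum would presumably take the place of the single windowed spectrum before mel filtering, log compression and the cosine transform. For white noise, averaging over roughly independent tapers should lower the variance of each spectral bin compared with the single-window estimate.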

Department(s)

Publication year

2012

Language

English

Pages

1990-2001

Publication/Journal/Series

IEEE Transactions on Audio, Speech, and Language Processing

Volume

20

Issue

7

Document type

Journal article

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

Subject

  • Probability Theory and Statistics

Keywords

  • Mel-frequency cepstral coefficient (MFCC)
  • multitaper
  • small-variance estimation
  • speaker verification

Status

Published

Research group

  • Statistical Signal Processing Group

ISBN/ISSN/Other

  • ISSN: 1558-7924