Using likelihood L-statistics to measure confidence in audio-visual speech recognition
DSpace at IIT Bombay
Field | Value | |
Title |
Using likelihood L-statistics to measure confidence in audio-visual speech recognition
|
|
Creator |
GHOSH, A
VERMIA, A
SARKAR, A |
|
Description |
This paper describes recent work on decision fusion in audiovisual speech recognition. In this work, a novel approach is proposed for combining audio and video channel information in an audiovisual speech recognition scenario. We consider a frame-level phonetic classification problem using two single-stream Gaussian Mixture Models. The audio and video streams are adaptively weighted using the cumulative mean of the sample confidence values over past frames in addition to the confidence value of the present sample. The confidence values for the audio and video decisions are computed as an L-statistic (a linear combination of order statistics) of the log-likelihoods against the phone models. Experiments on a database of about 15000 sentences of large-vocabulary continuous speech show that the proposed approach yields better classification accuracy than other approaches.
|
|
Publisher |
IEEE
|
|
Date |
2011-10-25T01:05:22Z
2011-12-15T09:10:43Z
2001 |
|
Type |
Proceedings Paper
|
|
Identifier |
2001 IEEE FOURTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 27-32
0-7803-7025-2
http://dx.doi.org/10.1109/MMSP.2001.962707
http://dspace.library.iitb.ac.in/xmlui/handle/10054/15563
http://hdl.handle.net/100/1623 |
|
Source |
4th IEEE Workshop on Multimedia Signal Processing (MMSP 01), Cannes, France, Oct 03-05, 2001
|
|
Language |
English
|
|
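The weighting scheme summarized in the Description field can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not give the L-statistic coefficients, the rule for mixing present and past confidence, or the fusion formula. The coefficient choice below (best log-likelihood minus the mean of the rest), the mixing factor `alpha`, the normalized stream weight, and all function and class names are assumptions made only for illustration.

```python
def l_statistic(log_likelihoods, coeffs):
    """Linear combination of order statistics (values sorted in descending order)."""
    ordered = sorted(log_likelihoods, reverse=True)
    if len(coeffs) != len(ordered):
        raise ValueError("need one coefficient per order statistic")
    return sum(c * x for c, x in zip(coeffs, ordered))


def top_vs_rest_coeffs(n):
    """Assumed coefficients: +1 on the best score, -1/(n-1) on each of the others.

    The resulting L-statistic is the gap between the best log-likelihood and
    the mean of the remaining ones, so it is never negative.
    """
    if n < 2:
        raise ValueError("need at least two phone models")
    return [1.0] + [-1.0 / (n - 1)] * (n - 1)


class StreamConfidence:
    """Tracks one stream's confidence: present value mixed with the past mean."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha   # assumed mixing factor, not taken from the paper
        self.total = 0.0     # running sum of past per-frame confidences
        self.count = 0       # number of past frames seen

    def update(self, log_likelihoods):
        coeffs = top_vs_rest_coeffs(len(log_likelihoods))
        current = l_statistic(log_likelihoods, coeffs)
        # On the first frame there is no history, so fall back to the current value.
        past_mean = self.total / self.count if self.count else current
        self.total += current
        self.count += 1
        return self.alpha * current + (1.0 - self.alpha) * past_mean


def fuse_frame(audio_ll, video_ll, audio_conf, video_conf):
    """Classify one frame from per-phone log-likelihoods of both streams.

    Returns (index of the best phone, audio stream weight).
    """
    ca = audio_conf.update(audio_ll)
    cv = video_conf.update(video_ll)
    denom = ca + cv
    # Assumed normalization; equal weights when neither stream is confident.
    w_audio = ca / denom if denom > 0 else 0.5
    fused = [w_audio * a + (1.0 - w_audio) * v for a, v in zip(audio_ll, video_ll)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, w_audio
```

One `StreamConfidence` object would be kept per stream and `fuse_frame` called once per frame with the log-likelihoods of that frame against each phone model. A stream whose best phone stands out clearly from the rest, now and on average over earlier frames, receives the larger weight.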