Record Details

Compressed Domain Processing of MPEG Audio

Electronic Theses of Indian Institute of Science


Field	Value

Title	Compressed Domain Processing of MPEG Audio

Creator	Anantharaman, B

Subject	Electrical Communications MPEG Audio Coding Digital Technique Audio Signal Processing Least Significant Bit (LSB) Audio Signals Compression Wavelet Coefficients Time Scale Modifications Sinusoidal Model Compressed Domain Wavelet Based Pitch Extraction Audio Watermarking

Description	MPEG audio compression techniques significantly reduces the storage and transmission requirements for high quality digital audio. However, compression complicates the processing of audio in many applications. If a compressed audio signal is to be processed, a direct method would be to decode the compressed signal, process the decoded signal and re-encode it. This is computationally expensive due to the complexity of the MPEG filter bank. This thesis deals with processing of MPEG compressed audio. The main contributions of this thesis are a) Extracting wavelet coefficients in the MPEG compressed domain. b) Wavelet based pitch extraction in MPEG compressed domain. c) Time Scale Modifications of MPEG audio. d) Watermarking of MPEG audio. The research contributions starts with a technique for calculating several levels of wavelet coefficients from the output of the MPEG analysis filter bank. The technique exploits the toeplitz structure which arises when the MPEG and wavelet filter banks are represented in a matrix form, The computational complexity for extracting several levels of wavelet coefficients after decoding the compressed signal and directly from the output of the MPEG analysis filter bank are compared. The proposed technique is found to be computationally efficient for extracting higher levels of wavelet coefficients. Extracting pitch in the compressed domain becomes essential when large multimedia databases need to be indexed. For example one may be interested in listening to a particular speaker or to listen to male female audio segments in a multimedia document. For this application, pitch information is one of the very basic and important features required. Pitch is basically the time interval between two successive glottal closures. Glottal closures are accompanied by sharp transients in the speech signal which in turn gives rise to a local maxima in the wavelet coefficients. Pitch can be calculated by finding the time interval between two successive maxima in the wavelet coefficients. It is shown that the computational complexity for extracting pitch in the compressed domain is less than 7% of the uncompressed domain processing. An algorithm for extracting pitch in the compressed domain is proposed. The result of this algorithm for synthetic signals, and utterances of words by male/female is reported. In a number of important applications, one needs to modify an audio signal to render it more useful than its original. Typical applications include changing the time evolution of an audio signal (increase or decrease the rate of articulation of a speaker),or to adapt a given audio sequence to a given video sequence. In this thesis, time scale modifications are obtained in the subband domain such that when the modified subband signals are given to the MPEG synthesis filter bank, the desired time scale modification of the decoded signal is achieved. This is done by making use of sinusoidal modeling [I]. Here, each of the subband signal is modeled in terms of parameters such as amplitude phase and frequencies and are subsequently synthesised by using these parameters with Ls = k La where Ls is the length of the synthesis window , k is the time scale factor and La is the length of the analysis window. As the PCM version of the time scaled signal is not available, psychoacoustic model based bit allocation cannot be used. Hence a new bit allocation is done by using a subband coding algorithm. This method has been satisfactorily tested for time scale expansion and compression of speech and music signals. The recent growth of multimedia systems has increased the need for protecting digital media. Digital watermarking has been proposed as a method for protecting digital documents. The watermark needs to be added to the signal in such a way that it does not cause audible distortions. However the idea behind the lossy MPEC encoders is to remove or make insignificant those portions of the signal which does not affect human hearing. This renders the watermark insignificant and hence proving ownership of the signal becomes difficult when an audio signal is compressed. The existing compressed domain methods merely change the bits or the scale factors according to a key. Though simple, these methods are not robust to attacks. Further these methods require original signal to be available in the verification process. In this thesis we propose a watermarking method based on spread spectrum technique which does not require original signal during the verification process. It is also shown to be more robust than the existing methods. In our method the watermark is spread across many subband samples. Here two factors need to be considered, a) the watermark is to be embedded only in those subbands which will make the addition of the noise inaudible. b) The watermark should be added to those subbands which has sufficient bit allocation so that the watermark does not become insignificant due to lack of bit allocation. Embedding the watermark in the lower subbands would cause distortion and in the higher subbands would prove futile as the bit allocation in these subbands are practically zero. Considering a11 these factors, one can introduce noise to samples across many frames corresponding to subbands 4 to 8. In the verification process, it is sufficient to have the key/code and the possibly attacked signal. This method has been satisfactorily tested for robustness to scalefactor, LSB change and MPEG decoding and re-encoding.

Publisher	Indian Institute of Science

Contributor	Ramakrishnan, K R

Date	2005-02-11T06:36:18Z 2005-02-11T06:36:18Z 2005-02-11T06:36:18Z 2001-03

Type	Electronic Thesis and Dissertation

Format	2757785 bytes application/pdf

Identifier	http://etd.iisc.ernet.in/handle/2005/68 null

Language	en

Rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

ICAR Research Data Repository for Knowledge Management

Record Details

Compressed Domain Processing of MPEG Audio

Electronic Theses of Indian Institute of Science