Sparsity Motivated Auditory Wavelet Representation and Blind Deconvolution
Electronic Theses of Indian Institute of Science
View Archive InfoField | Value | |
Title |
Sparsity Motivated Auditory Wavelet Representation and Blind Deconvolution
|
|
Creator |
Adiga, Aniruddha
|
|
Subject |
Sparse Coding
Time Frequency Representation Gammatone Wavelet Transform Auditory Wavelet Representation Gammatone Wavelet Transform Auditory System Modeling Blind Deconvolution Voiced Speech Signals Sparse Blind Deconvolution (SBD) Gammatone Wavelet Cepstral Coe cients (GWCC) Electrical Engineering |
|
Description |
In many scenarios, events such as singularities and transients that carry important information about a signal undergo spreading during acquisition or transmission and it is important to localize the events. For example, edges in an image, point sources in a microscopy or astronomical image are blurred by the point-spread function (PSF) of the acquisition system, while in a speech signal, the epochs corresponding to glottal closure instants are shaped by the vocal tract response. Such events can be extracted with the help of techniques that promote sparsity, which enables separation of the smooth components from the transient ones. In this thesis, we consider development of such sparsity promoting techniques. The contributions of the thesis are three-fold: (i) an auditory-motivated continuous wavelet design and representation, which helps identify singularities; (ii) a sparsity-driven deconvolution technique; and (iii) a sparsity-driven deconvolution technique for reconstruction of nite-rate-of-innovation (FRI) signals. We use the speech signal for illustrating the performance of the techniques in the first two parts and super-resolution microscopy (2-D) for the third part. In the rst part, we develop a continuous wavelet transform (CWT) starting from an auditory motivation. Wavelet analysis provides good time and frequency localization, which has made it a popular tool for time-frequency analysis of signals. The CWT is a multiresolution analysis tool that involves decomposition of a signal using a constant-Q wavelet filterbank, akin to the time-frequency analysis performed by basilar membrane in the peripheral human auditory system. This connection motivated us to develop wavelets that possess auditory localization capabilities. Gammatone functions are extensively used in the modeling of the basilar membrane, but the non-zero average of the functions poses a hurdle. We construct bona de wavelets from the Gammatone function called Gammatone wavelets and analyze their properties such as admissibility, time-bandwidth product, vanishing moments, etc.. Of particular interest is the vanishing moments property, which enables the wavelet to suppress smooth regions in a signal leading to sparsi cation. We show how this property of the Gammatone wavelets coupled with multiresolution analysis could be employed for singularity and transient detection. Using these wavelets, we also construct equivalent lterbank models and obtain cepstral feature vectors out of such a representation. We show that the Gammatone wavelet cepstral coefficients (GWCC) are effective for robust speech recognition compared with mel-frequency cepstral coefficients (MFCC). In the second part, we consider the problem of sparse blind deconvolution (SBD) starting from a signal obtained as the convolution of an unknown PSF and a sparse excitation. The BD problem is ill-posed and the goal is to employ sparsity to come up with an accurate solution. We formulate the SBD problem within a Bayesian framework. The estimation of lter and excitation involves optimization of a cost function that consists of an `2 data- fidelity term and an `p-norm (p 2 [0; 1]) regularizer, as the sparsity promoting prior. Since the `p-norm is not differentiable at the origin, we consider a smoothed version of the `p-norm as a proxy in the optimization. Apart from the regularizer being non-convex, the data term is also non-convex in the filter and excitation as they are both unknown. We optimize the non-convex cost using an alternating minimization strategy, and develop an alternating `p `2 projections algorithm (ALPA). We demonstrate convergence of the iterative algorithm and analyze in detail the role of the pseudo-inverse solution as an initialization for the ALPA and provide probabilistic bounds on its accuracy considering the presence of noise and the condition number of the linear system of equations. We also consider the case of bounded noise and derive tight tail bounds using the Hoe ding inequality. As an application, we consider the problem of blind deconvolution of speech signals. In the linear model for speech production, voiced speech is assumed to be the result of a quasi-periodic impulse train exciting a vocal-tract lter. The locations of the impulses or epochs indicate the glottal closure instants and the spacing between them the pitch. Hence, the excitation in the case of voiced speech is sparse and its deconvolution from the vocal-tract filter is posed as a SBD problem. We employ ALPA for SBD and show that excitation obtained is sparser than the excitations obtained using sparse linear prediction, smoothed `1=`2 sparse blind deconvolution algorithm, and majorization-minimization-based sparse deconvolution techniques. We also consider the problem of epoch estimation and show that epochs estimated by ALPA in both clean and noisy conditions are closer to the instants indicated by the electroglottograph when with to the estimates provided by the zero-frequency ltering technique, which is the state-of-the-art epoch estimation technique. In the third part, we consider the problem of deconvolution of a specific class of continuous-time signals called nite-rate-of-innovation (FRI) signals, which are not bandlimited, but specified by a nite number of parameters over an observation interval. The signal is assumed to be a linear combination of delayed versions of a prototypical pulse. The reconstruction problem is posed as a 2-D SBD problem. The kernel is assumed to have a known form but with unknown parameters. Given the sampled version of the FRI signal, the delays quantized to the nearest point on the sampling grid are rst estimated using proximal-operator-based alternating `p `2 algorithm (ALPAprox), and then super-resolved to obtain o -grid (O. G.) estimates using gradient-descent optimization. The overall technique is termed OG-ALPAprox. We show application of OG-ALPAprox to a particular modality of super-resolution microscopy (SRM), called stochastic optical reconstruction microscopy (STORM). The resolution of the traditional optical microscope is limited by di raction and is termed as Abbe's limit. The goal of SRM is to engineer the optical imaging system to resolve structures in specimens, such as proteins, whose dimensions are smaller than the di raction limit. The specimen to be imaged is tagged or labeled with light-emitting or uorescent chemical compounds called uorophores. These compounds speci cally bind to proteins and exhibit uorescence upon excitation. The uorophores are assumed to be point sources and the light emitted by them undergo spreading due to di raction. STORM employs a sequential approach, wherein each step only a few uorophores are randomly excited and the image is captured by a sensor array. The obtained image is di raction-limited, however, the separation between the uorophores allows for localizing the point sources with high precision. The localization is performed using Gaussian peak- tting. This process of random excitation coupled with localization is performed sequentially and subsequently consolidated to obtain a high-resolution image. We pose the localization as a SBD problem and employ OG-ALPAprox to estimate the locations. We also report comparisons with the de facto standard Gaussian peak- tting algorithm and show that the statistical performance is superior. Experimental results on real data show that the reconstruction quality is on par with the Gaussian peak- tting. |
|
Contributor |
Seelamantula, Chandra Sekhar
|
|
Date |
2018-08-29T05:01:35Z
2018-08-29T05:01:35Z 2018-08-29 2017 |
|
Type |
Thesis
|
|
Identifier |
http://etd.iisc.ernet.in/2005/3987
http://etd.iisc.ernet.in/abstracts/4875/G28512-Abs.pdf |
|
Language |
en_US
|
|
Relation |
G28512
|
|