Record Details

Learning Compact Architectures for Deep Neural Networks

Electronic Theses of Indian Institute of Science

Field Value
 
Title Learning Compact Architectures for Deep Neural Networks
 
Creator Srinivas, Suraj
 
Subject Deep Neural Networks
Learning Compact Architectures
Machine Learning
Binary Neural Nets
Architecture Learning
Sparse Neural Networks
Bayesian Neural Networks
Neural Network Architectures
Computational and Data Sciences
 
Description Deep neural networks with millions of parameters are at the heart of many state-of-the-art computer vision models. However, recent works have shown that models with far fewer parameters can often perform just as well. A smaller model has the advantage of being faster to evaluate and easier to store - both of which are crucial for real-time and embedded applications. While prior work on compressing neural networks has looked at methods based on sparsity, quantization and factorization of neural network layers, we look at the alternative approach of pruning neurons.
Training neural networks is often described as a kind of ‘black magic’, as successful training requires setting the right hyper-parameter values (such as the number of neurons in a layer, the depth of the network, etc.). It is often not clear what these values should be, and these decisions usually end up being either ad hoc or driven by extensive experimentation. It would be desirable to automatically set some of these hyper-parameters for the user so as to minimize trial and error. Combining this objective with our earlier preference for smaller models, we ask the following question: for a given task, is it possible to come up with small neural network architectures automatically? In this thesis, we propose methods to achieve this.
The work is divided into four parts. First, given a neural network, we look at the problem of identifying important and unimportant neurons. We consider this problem in a data-free setting, i.e., assuming that the data the neural network was trained on is not available. We propose two rules for identifying wasteful neurons and show that these suffice in such a data-free setting. By removing neurons based on these rules, we are able to reduce model size without significantly affecting accuracy.
Second, we propose an automated learning procedure to remove neurons during the process of training. We call this procedure ‘Architecture-Learning’, as it automatically discovers the optimal width and depth of neural networks. We empirically show that this procedure is preferable to trial-and-error based Bayesian Optimization procedures for selecting neural network architectures.

Third, we connect ‘Architecture-Learning’ to a popular regularizer called ‘Dropout’, and propose a novel regularizer which we call ‘Generalized Dropout’. From a Bayesian viewpoint, this method corresponds to a hierarchical extension of the Dropout algorithm. Empirically, we observe that Generalized Dropout corresponds to a more flexible version of Dropout, and works in scenarios where Dropout fails.
Finally, we apply our procedure for removing neurons to the problem of removing weights in a neural network, and achieve state-of-the-art results in sparsifying neural networks.
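To make the notion of removing neurons concrete, the following is a minimal generic sketch in Python/NumPy. It is not the thesis's data-free method (its two rules are not stated in this abstract); it simply ranks the hidden neurons of a toy two-layer network by an assumed L2-norm saliency and drops the lowest-ranked ones, shrinking the model.

# Generic illustration of neuron pruning in a toy two-layer MLP.
# NOT the thesis's method: the L2-norm saliency below is an assumption
# made purely to show mechanically what "removing neurons" means.
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 64-dim input -> 100 hidden neurons -> 10 outputs
W1 = rng.normal(size=(64, 100))   # input-to-hidden weights
b1 = rng.normal(size=100)         # hidden biases
W2 = rng.normal(size=(100, 10))   # hidden-to-output weights

# Saliency of each hidden neuron: L2 norm of its outgoing weights.
saliency = np.linalg.norm(W2, axis=1)

# Keep the 50 most salient hidden neurons, drop the rest.
keep = np.argsort(saliency)[-50:]
W1_pruned, b1_pruned, W2_pruned = W1[:, keep], b1[keep], W2[keep, :]

print(W1.shape, "->", W1_pruned.shape)   # (64, 100) -> (64, 50)
print(W2.shape, "->", W2_pruned.shape)   # (100, 10) -> (50, 10)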
 
Contributor Venkatesh Babu, R
 
Date 2018-05-22T15:04:55Z
2018-05-22T15:04:55Z
2018-05-22
2017
 
Type Thesis
 
Identifier http://etd.iisc.ernet.in/2005/3581
http://etd.iisc.ernet.in/abstracts/4449/G28168-Abs.pdf
 
Language en_US
 
Relation G28168