Date: June 27, 2017. 12:00
Location: CCU Seminar Room
Title: Information Theory of Deep Neural Networks: Why and How Does Deep Learning Work So Well?
Affiliation: The Hebrew University, Jerusalem, Israel.
Despite their great success, the theoretical understanding of Deep Neural Networks (DNNs) and their inner organization is still lacking. We proposed to analyze DNNs in the Information Plane: the mutual information that each layer preserves on the input and on the label variables. We formulated the goal of the network as optimizing the Information Bottleneck (IB) tradeoff between compression and prediction, successively, for each layer.
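The IB tradeoff referred to here has a standard form, the Information Bottleneck Lagrangian of Tishby, Pereira, and Bialek; it is shown below as background and is not quoted verbatim from the talk:

```latex
% T is a layer's (stochastic) representation of the input X, Y is the label.
% beta is the tradeoff parameter between compression I(X;T) and prediction I(T;Y).
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;T) - \beta\, I(T;Y)
```

Small beta favors compressed layers; large beta favors layers that retain label information.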
In this talk I will present a theory of Deep Neural Networks that addresses many of the key questions about DNNs, including their self-regularization, optimality, and the benefit of many hidden layers.

Our analysis gives the following new results: (i) Stochastic Gradient Descent (SGD) optimization has two main phases. In the first, shorter phase the layers increase the information on the labels (fitting), while in the second, much longer phase the layers reduce the information on the input (the compression phase). We argue that the second phase amounts to a stochastic relaxation (diffusion) that maximizes the conditional entropy of the layers subject to the empirical error constraint. (ii) The converged layers lie on or very close to the IB theoretical bound, for different values of the tradeoff parameter, and the maps from the input to each layer (encoder) and from each layer to the output (decoder) satisfy the IB self-consistent optimality conditions. (iii) The main advantage of the hidden layers is computational, as they dramatically reduce the stochastic relaxation times. (iv) The hidden layers appear to lie close to critical points on the IB bound, which can be explained by critical slowing down of the stochastic relaxation process.
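The Information Plane coordinates tracked during training, I(X;T) and I(T;Y) for each layer's representation T, can be estimated empirically by discretizing layer activations. Below is a minimal sketch of that estimation; the function names and the uniform binning scheme are my own illustration, not code from the talk:

```python
import numpy as np

def mutual_information(x_ids, y_ids):
    """Plug-in estimate of discrete mutual information (in bits)."""
    n = len(x_ids)
    joint, px, py = {}, {}, {}
    for a, b in zip(x_ids, y_ids):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * np.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def layer_plane_point(inputs, labels, activations, n_bins=30):
    """Estimate (I(X;T), I(T;Y)) for one layer by binning its activations.

    Each row of `activations` is the layer's output for one sample; rows are
    discretized into n_bins uniform bins and hashed to a single symbol.
    """
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    binned = np.digitize(activations, edges)
    t_ids = [hash(row.tobytes()) for row in binned]
    x_ids = [hash(row.tobytes()) for row in np.asarray(inputs)]
    return mutual_information(x_ids, t_ids), mutual_information(t_ids, list(labels))
```

Plotting these two coordinates per layer over training epochs reproduces the fitting/compression trajectories described above; finer-grained estimators (e.g. kernel or k-NN based) are needed when activations are high-dimensional.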