Tuesday, 6th of October
(University of Oxford)
Initializing a neural network on the edge of chaos
How should the weights and biases of an infinitely wide and deep neural network be initialized? Glorot and Bengio (2010), then He et al. (2015), proposed a simple answer based on preserving the variance of the preactivations during the forward pass. Later, Poole et al. introduced the concept of the "Edge of Chaos" in the paper "Exponential expressivity in deep neural networks through transient chaos" (2016), which gives another definition of a "correct" initialization. Instead of looking at the variance of the preactivations, they considered how the correlation between two inputs evolves during the forward pass. This point of view led to finer results, such as evidence of a phase-transition-like phenomenon that depends on the initialization distribution. Moreover, it is now possible to predict the typical depth to which information can be propagated or backpropagated at initialization. Since the theoretical results on the Edge of Chaos rely on an infinite-width assumption, some links have been drawn with the Neural Tangent Kernel (NTK).
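As an illustration of the phase-transition-like behavior mentioned above, here is a minimal sketch of the mean-field quantities used in the Edge of Chaos analysis. It is not taken from the talk. It assumes a fully connected network with tanh activation, weights of variance sigma_w^2 / width and biases of variance sigma_b^2. The function names (gauss_mean, variance_fixed_point, chi_1) and the integration grid are hypothetical choices made for this example. The quantity chi_1 is the slope of the correlation map at correlation 1: values below 1 indicate the ordered phase, values above 1 the chaotic phase, and chi_1 = 1 the edge of chaos.

```python
import numpy as np

# Grid used to approximate Gaussian expectations E[f(z)], z ~ N(0, 1),
# with a simple Riemann sum (an assumption of this sketch, not of the theory).
Z = np.linspace(-10.0, 10.0, 20001)
W = np.exp(-Z**2 / 2.0) / np.sqrt(2.0 * np.pi) * (Z[1] - Z[0])


def gauss_mean(values):
    """Approximate E[f(z)] given f evaluated on the grid Z."""
    return float(np.sum(values * W))


def variance_fixed_point(sigma_w, sigma_b, q0=1.0, n_iter=200):
    """Iterate the variance map q -> sigma_w^2 E[tanh(sqrt(q) z)^2] + sigma_b^2."""
    q = q0
    for _ in range(n_iter):
        q = sigma_w**2 * gauss_mean(np.tanh(np.sqrt(q) * Z) ** 2) + sigma_b**2
    return q


def chi_1(sigma_w, sigma_b):
    """Slope of the correlation map at c = 1: sigma_w^2 E[tanh'(sqrt(q*) z)^2]."""
    q_star = variance_fixed_point(sigma_w, sigma_b)
    dphi = 1.0 - np.tanh(np.sqrt(q_star) * Z) ** 2  # derivative of tanh
    return sigma_w**2 * gauss_mean(dphi**2)


if __name__ == "__main__":
    for sigma_w, sigma_b in [(0.5, 0.0), (4.0, 0.3)]:
        value = chi_1(sigma_w, sigma_b)
        phase = "ordered" if value < 1.0 else "chaotic"
        print(f"sigma_w={sigma_w}, sigma_b={sigma_b}: chi_1={value:.3f} ({phase})")
```

Scanning chi_1 over a grid of (sigma_w, sigma_b) values and locating the curve where it crosses 1 gives a numerical picture of the boundary between the two phases under these assumptions.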