# Mean field neural network

Consider a neural network with the typical propagation

$y_k^l = \sum_i w_{ki}^l\,\phi(y_i^{l-1})+b_k^l$

with $\phi$ a non-linear activation function, depth $L$ and width $N_l$ at level $l$. If you want to compute the mean squared value of the units at level $l$

$q^l = \frac{1}{N_l}\sum_i (y_i^l)^2$

you end up in a typical recursive situation, much like the one that appears in spin models.
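Before doing the mean-field calculation, it helps to see this recursion empirically. Below is a minimal sketch, assuming a tanh activation, a constant width $N$, and the Gaussian weight/bias scaling introduced below; the parameter values are my own choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: constant width N, depth D, tanh activation,
# weights ~ N(0, sigma_w^2 / N), biases ~ N(0, sigma_b^2).
N, D = 2000, 20
sigma_w, sigma_b = 1.5, 0.1

y = rng.standard_normal(N)  # values at the input level
q = [np.mean(y**2)]
for _ in range(D):
    W = rng.normal(0.0, sigma_w / np.sqrt(N), size=(N, N))
    b = rng.normal(0.0, sigma_b, size=N)
    y = W @ np.tanh(y) + b
    q.append(np.mean(y**2))

print(q)  # q^l settles near a fixed value after a few levels
```

Running this for different widths shows the self-averaging at work: the layer-to-layer fluctuations of $q^l$ shrink as $N$ grows.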

Let’s assume that the network is wide, $N_l \gg 1$, and that the weights and biases are Gaussian,
$w^l_{mn} \sim N\!\left(0,\frac{\sigma^2_w}{N_{l-1}}\right)$
$b^l_m \sim N(0,\sigma^2_b)$
then the expectation of $q^l$ over the weights and biases is

$\mathbb{E}(q^l) = \frac{\sigma^2_w}{N_{l-1}}\sum_k \phi(y^{l-1}_k)^2 + \sigma^2_b$
which, as mentioned above, is recursive but can be handled with a mean-field approach. Instead of using the precise values at level $l-1$, we assume that the network is wide, that a kind of thermal equilibrium holds, and that the values at level $l-1$ behave like i.i.d. samples from a centered Gaussian with variance $q^{l-1}$, so the empirical average self-averages:

$\frac{1}{N_{l-1}}\sum_k \phi(y^{l-1}_k)^2 \approx \int \frac{dz}{\sqrt{2\pi q^{l-1}}}\;\phi(z)^2\;e^{-\frac{z^2}{2q^{l-1}}} = \int \;d\mu^{l-1}(z)\;\phi(z)^2$

with $\mu^{l-1}$ the Gaussian measure $N(0,q^{l-1})$. Taken together, this gives the recursion
$q^l = \sigma_w^2\; \int \;d\mu^{l-1}(z)\; \phi(z)^2 + \sigma^2_b$

with $\phi$ the non-linear activation function. As it stands, this recursion can now be analyzed like any statistical field theory: what are its fixed points, is there a phase transition, how does the activation function influence the result, and so on.
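The recursion is easy to iterate numerically. Here is a minimal sketch, assuming a tanh activation and using Gauss–Hermite quadrature for the Gaussian integral; the parameter values are my own choices for illustration:

```python
import numpy as np

def q_map(q, sigma_w, sigma_b, phi=np.tanh, n_quad=40):
    """One step of the mean-field recursion
    q^l = sigma_w^2 * E_{z ~ N(0, q)}[phi(z)^2] + sigma_b^2,
    with the Gaussian integral done by Gauss-Hermite quadrature."""
    # Probabilists' Hermite nodes/weights: sum(w * f(x)) ~ int f(x) exp(-x^2/2) dx
    x, w = np.polynomial.hermite_e.hermegauss(n_quad)
    z = np.sqrt(q) * x  # rescale so that z ~ N(0, q)
    integral = np.sum(w * phi(z)**2) / np.sqrt(2 * np.pi)
    return sigma_w**2 * integral + sigma_b**2

# Iterate to the fixed point q* for one choice of (sigma_w, sigma_b).
q = 1.0
for _ in range(100):
    q = q_map(q, sigma_w=1.5, sigma_b=0.1)
print(q)  # fixed point q* of the length map
```

Scanning `sigma_w` at fixed `sigma_b` and watching how the fixed point changes is one way to probe the phase structure mentioned above.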

### Why does it matter?

All of this might seem abstract, but this kind of research really makes a difference:

• when you initialize a neural network, which weight distribution should you pick? Does the choice matter, and if not, why not?
• when you train a neural network, are you sure it will converge? In which regimes do the gradients vanish or explode?
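The second point can be made concrete in the simplest possible setting: with a linear activation $\phi(z)=z$ and $\sigma_b=0$, the mean-field recursion reduces to $q^l=\sigma_w^2\,q^{l-1}$, a geometric sequence in depth. A toy sketch (values chosen by me for illustration):

```python
# With phi(z) = z and sigma_b = 0 the recursion is exactly
# q^l = sigma_w^2 * q^{l-1}, so depth turns the signal
# variance into a geometric sequence.
def q_after_depth(q0, sigma_w, depth):
    q = q0
    for _ in range(depth):
        q = sigma_w**2 * q
    return q

for sigma_w in (0.9, 1.0, 1.1):
    print(sigma_w, q_after_depth(1.0, sigma_w, depth=50))
# sigma_w < 1: the signal dies out; sigma_w > 1: it explodes;
# sigma_w = 1 is the critical point.
```

This is the mean-field picture behind variance-preserving initialization schemes: you want to sit at, or near, the critical point so that signals neither vanish nor blow up with depth.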

Furthermore, there are some interesting extrapolations to our own brain:

• information travels across a gigantic number of neurons, but what makes it stop? Why don’t we have deadlocks and persistent information flows?
• phase transitions and critical points occur in all large networks; what do they mean for our thinking? Why do they, or don’t they, happen?