Mean-field neural networks

Consider a neural network with the usual forward propagation

$y_k^l = \sum_i w_{ki}^l\,\phi(y_i^{l-1})+b_k^l$

with depth $L$, width $N_l$ at layer $l$, and non-linearity $\phi$. If you want to compute the mean squared activation at layer $l$,

$q^l = \frac{1}{N_l}\sum_i (y_i^l)^2$

you run into a layer-to-layer recursion, much like the recursions that appear in spin models.
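This recursion can be observed directly by simulation. The sketch below propagates a random wide network and records $q^l$ at each layer; the tanh non-linearity, the constant width, and the values of $\sigma_w$ and $\sigma_b$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: tanh activation, constant width, example sigmas.
width = 2000
depth = 10
sigma_w, sigma_b = 1.5, 0.1

y = rng.normal(size=width)                    # layer-0 activations
qs = []
for _ in range(depth):
    W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
    b = rng.normal(0.0, sigma_b, size=width)
    y = W @ np.tanh(y) + b                    # y^l_k = sum_i w^l_ki phi(y^{l-1}_i) + b^l_k
    qs.append(float(np.mean(y**2)))           # q^l = (1/N_l) sum_i (y^l_i)^2

print([round(q, 3) for q in qs])
```

At large width the printed sequence settles after a few layers, which is exactly the fixed-point behaviour the mean-field analysis below makes precise.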

Let's assume the network is wide, $N_l \gg 1$, and that the weights and biases are Gaussian,
$w^l_{mn} \sim \mathcal{N}\left(0,\frac{\sigma^2_w}{N_{l-1}}\right)$
$b^l_m \sim \mathcal{N}(0,\sigma^2_b)$
Then the expectation of $q^l$, conditioned on the previous layer, is

$\mathbb{E}(q^l) = \frac{\sigma^2_w}{N_{l-1}}\sum_k \phi(y^{l-1}_k)^2 + \sigma^2_b$
which, as noted above, is recursive but can be closed with a mean-field approach. Instead of using the exact values at layer $l-1$, we assume the network is wide enough that the sum self-averages: each node effectively sees a Gaussian input with variance $q^{l-1}$, so

$\frac{1}{N_{l-1}}\sum_k \phi(y^{l-1}_k)^2 \approx \int dz\;\frac{\phi(z)^2}{\sqrt{2\pi q^{l-1}}}\,\exp\left(-\frac{z^2}{2q^{l-1}}\right) = \int \;d\mu^{l-1}(z)\;\phi(z)^2$

with $\mu^{l-1}$ the Gaussian measure with variance $q^{l-1}$. Taken together, this gives the mean-field recursion
$q^l = \sigma_w^2\; \int \;d\mu^{l-1}(z)\; \phi(z)^2 + \sigma^2_b$

with $\phi$ the non-linear activation function. As it stands, this recursion can be analyzed like any statistical field theory: what are its fixed points, is there a phase transition, how does the choice of activation function influence the result, and so on.
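A minimal sketch of this analysis, assuming $\phi = \tanh$ and illustrative $(\sigma_w, \sigma_b)$ values: the Gaussian integral is evaluated with Gauss-Hermite quadrature, the recursion is iterated to its fixed point $q^*$, and a scan over $\sigma_w$ at $\sigma_b = 0$ exhibits the transition between a trivial and a non-trivial fixed point.

```python
import numpy as np

# Mean-field recursion q^l = sigma_w^2 * E[phi(z)^2] + sigma_b^2 with
# z ~ N(0, q^{l-1}), evaluated by Gauss-Hermite quadrature.
_NODES, _WEIGHTS = np.polynomial.hermite.hermgauss(60)

def q_next(q, sigma_w, sigma_b, phi=np.tanh):
    # Substituting z = sqrt(2 q) x turns the Gaussian measure d mu^{l-1}
    # into the Gauss-Hermite weight exp(-x^2), up to a factor 1/sqrt(pi).
    z = np.sqrt(2.0 * q) * _NODES
    return sigma_w**2 * np.sum(_WEIGHTS * phi(z)**2) / np.sqrt(np.pi) + sigma_b**2

def fixed_point(sigma_w, sigma_b, q0=1.0, steps=500):
    q = q0
    for _ in range(steps):
        q = q_next(q, sigma_w, sigma_b)
    return q

# With sigma_b = 0 and phi = tanh, the slope of the recursion at q = 0 is
# sigma_w^2: for sigma_w < 1 the iteration collapses to q* = 0, while for
# sigma_w > 1 a non-trivial fixed point q* > 0 appears -- a phase transition.
for sw in (0.5, 1.0, 1.5, 2.0):
    print(f"sigma_w = {sw}: q* = {fixed_point(sw, 0.0):.4f}")
```

The slope argument follows from $\tanh^2(z) \approx z^2$ near zero, so the recursion behaves like $q^l \approx \sigma_w^2\, q^{l-1} + \sigma_b^2$ for small $q$, which makes $\sigma_w = 1$ (at $\sigma_b = 0$) the critical point for this activation.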