Gist

Probabilistic programming is not just another way of thinking; it is as effective as any other machine learning approach. Applied to the iris dataset (the hello-world of ML), you get something like the following.

We’ll model the binary classification of the ‘setosa’ and ‘versicolor’ types using the sepal length x. So, we assume that the sigmoid

\theta = (1 + \exp(-(\alpha + \beta x)))^{-1}

can differentiate the two classes, with priors

\alpha \sim N(0, 10)
\beta \sim N(0, 10)

and the binary class label (success) is distributed as Bernoulli:

bin \sim \mathrm{Bernoulli}(\theta)
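To make the model description concrete, here is a small forward simulation of it in plain Python: draw \alpha and \beta from their priors, compute \theta for a few sepal lengths, and draw a Bernoulli label for each. This is only a sketch. It assumes the 10 in N(0, 10) is a standard deviation, and the names `sigmoid` and `simulate_labels` as well as the input values are made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_labels(xs, rng, prior_sd=10.0):
    """Draw one (alpha, beta) from the priors, then one Bernoulli label per x.

    Assumption: the 10 in N(0, 10) is read as a standard deviation.
    """
    alpha = rng.gauss(0.0, prior_sd)
    beta = rng.gauss(0.0, prior_sd)
    thetas = [sigmoid(alpha + beta * x) for x in xs]
    labels = [1 if rng.random() < t else 0 for t in thetas]
    return alpha, beta, thetas, labels


rng = random.Random(0)  # arbitrary seed, for repeatability only
alpha, beta, thetas, labels = simulate_labels([4.5, 5.5, 6.5], rng)
print(alpha, beta, thetas, labels)
```

Running this a few times with different seeds shows how wide the priors are: most draws give a \theta that is pinned near 0 or 1, which is what a vague prior on this scale looks like before any data is seen.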

The data has to be transformed into a binary problem first:
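A minimal sketch of this step, assuming the dataset is available as (sepal length, species) pairs and that ‘versicolor’ plays the role of the success class (label 1). The function name `to_binary_problem` and the handful of sample rows are illustrative stand-ins, not the full dataset.

```python
def to_binary_problem(rows, negative="setosa", positive="versicolor"):
    """Keep only the two species of interest and encode them as 0 / 1.

    rows: iterable of (sepal_length, species) pairs.
    Returns (x, y): sepal lengths and matching 0/1 labels.
    Assumption: 'versicolor' is the "success" class (label 1).
    """
    x, y = [], []
    for sepal_length, species in rows:
        if species == negative:
            x.append(float(sepal_length))
            y.append(0)
        elif species == positive:
            x.append(float(sepal_length))
            y.append(1)
        # Any other species (e.g. 'virginica') is dropped.
    return x, y


# A few illustrative rows standing in for the full dataset.
rows = [
    (5.1, "setosa"),
    (4.9, "setosa"),
    (7.0, "versicolor"),
    (6.4, "versicolor"),
    (6.3, "virginica"),
]
x, y = to_binary_problem(rows)
print(x, y)
```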

The translation of the model description amounts to:
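As a library-free sketch of what such a translation can look like, the block below writes the log posterior of the model by hand and samples it with a plain random-walk Metropolis sampler. This is a stand-in technique chosen to keep the example self-contained, not necessarily the inference method or library used for the results discussed here. Assumptions: N(0, 10) means standard deviation 10, the sepal lengths are centred on their mean to help the sampler mix, and the data values, step size and draw counts are illustrative.

```python
import math
import random


def softplus(t):
    """Stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def log_posterior(alpha, beta, x, y, prior_sd=10.0):
    """Unnormalised log posterior of the model.

    Assumption: N(0, 10) means standard deviation 10.
    """
    # Gaussian priors on alpha and beta (additive constants dropped).
    lp = -(alpha ** 2 + beta ** 2) / (2.0 * prior_sd ** 2)
    # Bernoulli likelihood with theta = sigmoid(alpha + beta * x).
    for xi, yi in zip(x, y):
        z = alpha + beta * xi
        lp += -softplus(-z) if yi == 1 else -softplus(z)
    return lp


def metropolis(x, y, n_draws=4000, burn_in=1000, step=1.0, seed=0):
    """Random-walk Metropolis over (alpha, beta); returns post-burn-in draws."""
    rng = random.Random(seed)
    alpha, beta = 0.0, 0.0
    current = log_posterior(alpha, beta, x, y)
    draws = []
    for i in range(n_draws + burn_in):
        a_new = alpha + rng.gauss(0.0, step)
        b_new = beta + rng.gauss(0.0, step)
        proposed = log_posterior(a_new, b_new, x, y)
        # Accept with probability min(1, exp(proposed - current)).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            alpha, beta, current = a_new, b_new, proposed
        if i >= burn_in:
            draws.append((alpha, beta))
    return draws


# Illustrative sepal lengths (0 = setosa, 1 = versicolor), centred on their mean.
x_raw = [5.1, 4.9, 4.7, 5.4, 7.0, 6.4, 5.5, 5.7]
y = [0, 0, 0, 0, 1, 1, 1, 1]
x_mean = sum(x_raw) / len(x_raw)
x = [xi - x_mean for xi in x_raw]

draws = metropolis(x, y)
beta_mean = sum(b for _, b in draws) / len(draws)
print(len(draws), beta_mean)
```

The tunable pieces here are the proposal `step`, the number of draws and the burn-in length. Because x was centred, \alpha and \beta from this sketch live on the centred scale, and any new sepal length has to be shifted by the same mean before it is plugged in.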

You are free to experiment with various optimizations here, much as one would with neural networks.

A nice visualization of the decision boundary can be obtained as follows:
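The numbers behind such a picture can be sketched without any plotting library: for every posterior draw of (\alpha, \beta) the sigmoid gives one curve, and the point where \theta = 0.5 (that is, \alpha + \beta x = 0) gives one boundary. The block below computes a band of \theta values over a grid and the per-draw boundary positions. The helper names and the three posterior draws are hypothetical placeholders, used only to show the shape of the computation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def quantile(sorted_vals, q):
    """Nearest-rank style quantile of an already sorted list."""
    idx = int(q * (len(sorted_vals) - 1) + 0.5)
    return sorted_vals[min(len(sorted_vals) - 1, max(0, idx))]


def boundary_band(draws, grid, lo=0.05, hi=0.95):
    """For each grid point, summarise theta across posterior draws.

    Returns a list of (x, lower, mean, upper) tuples.
    """
    band = []
    for x in grid:
        thetas = sorted(sigmoid(a + b * x) for a, b in draws)
        mean = sum(thetas) / len(thetas)
        band.append((x, quantile(thetas, lo), mean, quantile(thetas, hi)))
    return band


def boundary_points(draws):
    """x where theta = 0.5 for each draw, i.e. alpha + beta * x = 0."""
    return [-a / b for a, b in draws if b != 0.0]


# Hypothetical posterior draws (alpha, beta) on the raw sepal-length scale.
draws = [(-27.0, 5.0), (-22.0, 4.0), (-33.0, 6.0)]
band = boundary_band(draws, grid=[4.0, 5.5, 7.0])
points = boundary_points(draws)
print(band)
print(points)
```

The `band` tuples can be handed to any plotting tool as a mean curve with a shaded region, and the spread of `points` is the width of the fuzzy boundary.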

For me, this is where the difference from non-probabilistic approaches shows most clearly. The uncertainty encoded in the priors propagates through the model, so the predictions come with a spread rather than a single sharp boundary.

Similarly for the parameter distributions:
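A numeric counterpart to a plot of the parameter distributions is a short summary per parameter: mean, standard deviation and a 90% interval. The sketch below uses Python's `statistics` module; the function name `summarise` and the five posterior draws are hypothetical, and a real run would pass in thousands of draws.

```python
import statistics


def summarise(values):
    """Mean, standard deviation and 5% / 95% quantiles of a list of draws."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "q05": cuts[0],
        "q95": cuts[-1],
    }


# Hypothetical posterior draws, for illustration only.
alphas = [-27.0, -22.0, -33.0, -25.0, -30.0]
betas = [5.0, 4.0, 6.0, 4.5, 5.5]

alpha_summary = summarise(alphas)
beta_summary = summarise(betas)
print(alpha_summary)
print(beta_summary)
```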

Finally, to use this model to predict new cases you can assemble something like this:
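One way to do this, sketched under assumptions: average the sigmoid over all posterior draws of (\alpha, \beta) to get a predictive probability for the new sepal length, then threshold it. The names `predict_proba` and `predict`, the 0.5 threshold and the three draws are hypothetical, and label 1 is assumed to mean ‘versicolor’.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x_new, draws):
    """Posterior predictive P(class = 1 | x_new): theta averaged over draws."""
    thetas = [sigmoid(a + b * x_new) for a, b in draws]
    return sum(thetas) / len(thetas)


def predict(x_new, draws, threshold=0.5):
    """Hard 0 / 1 label obtained by thresholding the predictive probability."""
    return 1 if predict_proba(x_new, draws) >= threshold else 0


# Hypothetical posterior draws (alpha, beta) on the raw sepal-length scale.
draws = [(-27.0, 5.0), (-22.0, 4.0), (-33.0, 6.0)]

print(predict(4.8, draws))  # -> 0
print(predict(6.5, draws))  # -> 1
print(predict_proba(5.5, draws))
```

Keeping `predict_proba` separate from `predict` preserves the uncertainty: a sepal length near the boundary returns a probability close to 0.5 rather than a confident label.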