This is a straightforward example of applying Bayesian statistics to estimate the mean and variance of a distribution from a given series. It helps to understand the theory underneath this approach (Markov chains, Monte Carlo and so on), but you can also see it as just another ML technique in your toolbox.

There is a large philosophical debate around Bayesian vs. frequentist approaches, but personally I don’t understand the fuss and think of the two much like one thinks of classical vs. quantum mechanics. The two approaches don’t contradict each other but rather complement one another. Just as classical mechanics holds in the appropriate limit of quantum mechanics, the frequentist view is valid in the case of a continuum (i.e. large or infinite amounts of repeatable experiments).

There is no better intro to probabilistic programming than the excellent Bayesian Methods for Hackers book.

Context and assumptions

  • you have a dataset with a limited amount of samples taken from a distribution
  • the underlying parameters of the distribution are unknown

The crucial bit is the word ‘limited’, in the sense that with limited information comes uncertainty. Imagine you have to measure something once and would rather check twice; this leads to uncertainty in your measurement. While a frequentist presumably has limitless context and data, you have to make do with a limited context. The lack of data can, however, be augmented with context or knowledge you have about the data. Imagine you have a dataset (like below) and plot a histogram which shows that it looks like a normal distribution. This ‘looks like’ is information you can add to your data in the context of a probabilistic model. While it might not be true in an objective sense, it can help to make an estimate. Someone else might judge that the histogram has a Poisson shape and draw a different conclusion. This situation is typical of the Bayesian approach: you make assumptions (often called beliefs) and this is a subjective thing. Adding (business) context to data is subjective; domain knowledge is subjective.

So, let’s take some normal data and assume that the real mean/variance are God-given and hidden for eternity.
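As a minimal sketch, such data could be generated along these lines. The true mean of 5, standard deviation of 2 and sample size of 30 are assumptions made for illustration only (they are consistent with the estimates reported further down, but not given explicitly anywhere):

```python
import numpy as np
import matplotlib.pyplot as plt

# the "God-given" true parameters we pretend not to know (assumed for this sketch)
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=30)

# the histogram that suggests a normal shape
plt.hist(data, bins=10)
plt.show()
```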

[Figure: histogram of the sampled data, showing a roughly normal shape]

The histogram shows that the mean should be somewhere in the [2,8] interval. This is insight you add to the model. In a way it makes up for the abundance of data a frequentist would have at their disposal. Someone else might infer that the mean sits somewhere in [2,6] and that would be fine as well.

Probabilistic frameworks like PyMC help you to construct models of data and assumptions. They help to formalize observations like the one above. For example, the perception that the mean sits somewhere between 2 and 8 can be stated as:
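A sketch in recent PyMC syntax (older PyMC versions spell this slightly differently):

```python
import pymc as pm

with pm.Model():
    # a flat prior: the mean lies somewhere in [2, 8], no value preferred over another
    mean = pm.Uniform("mean", lower=2, upper=8)
```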

At the same time, the perception that the precision (precision = 1/(std dev)^2) is somewhere between 0.025 and 2 can be formulated as:
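This excerpt lives inside the same `pm.Model()` context as the prior on the mean (the full model is assembled further down):

```python
# a flat prior on the precision (1 / variance), somewhere in [0.025, 2]
precision = pm.Uniform("precision", lower=0.025, upper=2)
```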

Finally, the belief that the data above is normally distributed can be written as:
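Again an excerpt from the same model context, tying the priors to the observed data (a sketch; `data` is the array generated earlier):

```python
# the likelihood: the observed data is normally distributed,
# governed by the mean and precision defined above
obs = pm.Normal("obs", mu=mean, tau=precision, observed=data)
```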

At this point we have added all we know about the data. If you had additional parameters and additional domain expertise, you could add those as well.
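For reference, here are the pieces assembled into one model (a sketch, assuming recent PyMC syntax and the `data` array from before):

```python
import pymc as pm

with pm.Model() as model:
    # priors encoding our subjective knowledge about the data
    mean = pm.Uniform("mean", lower=2, upper=8)
    precision = pm.Uniform("precision", lower=0.025, upper=2)

    # likelihood: the observations are assumed to be normally distributed
    obs = pm.Normal("obs", mu=mean, tau=precision, observed=data)
```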

The next crucial bit is that a data series like the one above could be the outcome of various distributions. That’s the thing with probability: even if something is highly unlikely, it can still happen. So even a sample from a normal distribution with mean -10 and variance 45 could produce precisely the data above. Hence the mean/variance cannot be pinned down exactly; they are themselves distributed.

Probabilistic frameworks like PyMC make it easy to experiment with this fact: given a model, they return what those distributions are. That’s where Monte Carlo simulation comes in, and where more experiments mean more precision. So, in our case we perform 20K experiments with the model like so:
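A sketch of the sampling step, reusing the model defined above; the number of draws matches the 20K experiments mentioned, and the MCMC sampler PyMC picks by default may vary with the version:

```python
with model:
    # draw 20,000 posterior samples via MCMC
    trace = pm.sample(20000)
```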

What do you get out of this? There are two distributed parameters in our model (mean and precision) and this is precisely what the experimentation returns. For example, the histogram of the mean:

[Figure: posterior histogram of the mean]

It shows that, based on our initial data and the model we defined, the mean is somewhere around 5.0 or 5.1. The precision can be inspected in the same way:
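Both posteriors can be plotted or summarized directly from the trace; a sketch using ArviZ, the library PyMC hands its results to:

```python
import arviz as az

# posterior histograms for both distributed parameters
az.plot_posterior(trace, var_names=["mean", "precision"])

# or a numerical summary (posterior mean, credible interval, ...)
print(az.summary(trace, var_names=["mean", "precision"]))
```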

So, the model is only indicative and is based on your assumptions (the model you defined), the Monte Carlo experimentation and the amount of data you supplied. If you adorn the model with more knowledge or have a larger dataset, the sharpness of the mean/variance distributions will improve.

PyMC and MCMC offer much more than what we have shown here. For example, if the default precision parameter is too abstract for you, the standard deviation can be introduced in the model as follows.
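A sketch of that variant; the bounds on the standard deviation below are simply the earlier precision bounds translated (1/√2 ≈ 0.7 up to 1/√0.025 ≈ 6.3), which is an assumption on my part:

```python
with pm.Model() as model_sd:
    mean = pm.Uniform("mean", lower=2, upper=8)

    # model the standard deviation directly instead of the precision
    std = pm.Uniform("std", lower=0.7, upper=6.3)

    # pm.Normal accepts sigma directly, so no manual conversion is needed
    obs = pm.Normal("obs", mu=mean, sigma=std, observed=data)
```

Sampling then proceeds exactly as before, and the trace now contains a posterior for the standard deviation instead of the precision.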

Modeling things in this way is actually great fun, and if you read through the Bayesian Methods for Hackers book you will find practical, real-world examples showing that this approach allows you to ask interesting questions about data. It also backs up ‘scientifically’ what you think you know or see in the data. In this sense, probabilistic programming is the correct approach to model opinions, subjective views and domain knowledge.