Introduction:
Bayesian machine learning techniques allow us to obtain a posterior density for individual predictions instead of just the mean. This additional information allows us to understand and explore the uncertainty involved. However not all uncertainties are the same; one could be certain the mean of a predicted value is a certain value but there could be high variance or one can not be certain of a particular prediction because the training set didn’t include values like the predicted input. This post will explore these two types of uncertainty and see if:
- these methods can calculate the variance of the dataset itself and
- how this compares with uncertainty of a value it hasn’t seen before (low support regions in data)
Data Setup:
I generated a dataset with the variance of \(y\) as a function of \(x\) but with zero mean as follows.
\[ x = sequence(-10,10,.1)\] \[ y = normal(0, sin(x) +2) \]