So there’s some probability distribution over the events A1 through Ak, and maybe they’re independent, maybe not – no assumption on that. Then the probability of A1 or A2 or, dot, dot, dot, up to Ak – this union symbol is just set notation; for probabilities it just means “or” – so the probability of at least one of these events occurring, of A1 or A2 or up to Ak, this is less than or equal to the probability of A1, plus the probability of A2, plus dot, dot, dot, plus the probability of Ak. Okay? So the intuition behind this is just that – I’m not sure if you’ve seen Venn diagram depictions of probability before; if you haven’t, what I’m about to do may be a little cryptic, so just ignore what I’m about to do if you haven’t seen it before. But if you have seen it before, then this is really pretty natural – the probability of A1 union A2 union A3 is less than or equal to P of A1 plus P of A2 plus P of A3. Right? So the total mass in the union of these three sets is at most the sum of the masses in the three individual sets – it’s not very surprising.
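(For reference, here is the union bound from the paragraph above written out as one formula; this is just a LaTeX restatement of what was said, nothing new.)

\[
P(A_1 \cup A_2 \cup \cdots \cup A_k) \;\le\; P(A_1) + P(A_2) + \cdots + P(A_k)
\]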
It turns out that, depending on how you define your axioms of probability, this is actually one of the axioms of probability, so I won’t actually try to prove this; it’s usually written as an axiom. So sigma sub-additivity of probability measures is what this is sometimes called as well, but in learning theory it’s commonly called the union bound, so I’ll just call it that. The other lemma I need is called the Hoeffding inequality. And again, I won’t actually prove this, I’ll just state it, which is: let Z1 up to Zm be m IID Bernoulli random variables with mean Phi. So the probability of Zi equals one is equal to Phi. So let’s say you observe m IID Bernoulli random variables and you want to estimate their mean. So let me define Phi hat – and this is, again, that notation convention where the hat means it’s an estimate or a guess of something – so we define Phi hat to be one over m, sum over i equals one through m of Zi. Okay? So this is our attempt to estimate the mean of these Bernoulli random variables by taking their average. And let any gamma be fixed.
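(Again, just restating the setup above in LaTeX notation: Z_1, ..., Z_m are IID Bernoulli random variables with mean \phi, so P(Z_i = 1) = \phi, and the estimate of the mean is the sample average.)

\[
\hat{\phi} \;=\; \frac{1}{m} \sum_{i=1}^{m} Z_i
\]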
Then the Hoeffding inequality says that the probability your estimate Phi hat is more than gamma away from the true value of Phi is bounded by two e to the negative two gamma squared m. Okay? So just in pictures – so this theorem holds – this lemma, the Hoeffding inequality, is just a statement of fact; it just holds true. But let me now draw a cartoon to describe some of the intuition behind this, I guess. So let’s say this is the real number line from zero to one, and Phi is the mean of your Bernoulli random variables. You will remember from – you know, whatever – some undergraduate probability or statistics class, the central limit theorem, which says that when you average a lot of things together, you tend to get a Gaussian distribution. And so when you toss m coins with bias Phi – we observe these m Bernoulli random variables and we average them – then the probability distribution of Phi hat will roughly be a Gaussian, let’s say. Okay? It turns out, if you haven’t seen this before, that it’s actually the cumulative distribution function of Phi hat that will converge to that of the Gaussian. Technically, Phi hat can only take on a discrete set of values, because these are fractions – multiples of one over m – so it doesn’t really have a density, but just as a cartoon, think of it as converging roughly to a Gaussian.
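(Written out, the inequality stated above is the following; again this is just a LaTeX restatement of the verbal description.)

\[
P\bigl(\,|\hat{\phi} - \phi| > \gamma\,\bigr) \;\le\; 2\exp(-2\gamma^2 m)
\]

(And here is a minimal Python simulation sketch, not from the lecture, that checks this bound empirically; the particular values of phi, m, gamma, and the number of trials are illustrative assumptions.)

import math
import random

phi = 0.3        # true mean of the Bernoulli variables (illustrative choice)
m = 100          # number of IID draws averaged into phi_hat
gamma = 0.1      # fixed deviation threshold
trials = 100000  # number of repeated experiments

violations = 0
for _ in range(trials):
    # draw m IID Bernoulli(phi) variables and average them to get phi_hat
    phi_hat = sum(1 if random.random() < phi else 0 for _ in range(m)) / m
    if abs(phi_hat - phi) > gamma:
        violations += 1

empirical = violations / trials
bound = 2 * math.exp(-2 * gamma**2 * m)
print(f"empirical P(|phi_hat - phi| > gamma) ~ {empirical:.4f}")
print(f"Hoeffding bound 2*exp(-2*gamma^2*m)  = {bound:.4f}")

(Running this with the values above, the empirical frequency of large deviations should come out well below the Hoeffding bound, which is what the lemma guarantees.)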