We've talked about information as bits,
measuring information
we've talked about counting, so we can use
bits to count 00, 01, 10, 11,
counting from zero up to three in base two.
We've talked about bits as labeling,
that we can use barcodes which are just
bits to label things.
And finally we've talked about how bits are
physical,
that all bits that we have in computers, all
the bits of information that I'm conveying
via the vibrations of my vocal cords and
the vibrations of the air
are actually physical systems, physical
manifestations of information.
And then we also talked about a discovery
which is 150 years old,
that all physical systems carry information,
and that amount of information can be quantified.
So the number of bits is the logarithm to the base two
of the number of possibilities,
a result which ironically is inscribed on Boltzmann's
grave.
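
A minimal Python sketch of that counting rule, assuming all the possibilities are equally likely. The helper name `bits_for` is hypothetical, chosen only for illustration.

```python
import math

def bits_for(num_possibilities: int) -> float:
    """Number of bits = logarithm to the base two of the number of possibilities."""
    return math.log2(num_possibilities)

print(bits_for(2))     # one coin flip (heads or tails): 1.0
print(bits_for(4))     # counting from zero up to three: 2.0
print(bits_for(1024))  # 1024 possibilities: 10.0
```

Doubling the number of possibilities adds exactly one bit.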
So now I'd like to give you another aspect
of bits, and this is a very 20th century aspect
of bits of information.
And that is the relationship between information
and probability.
So probability is something that we're all
familiar with and all confused by,
and I'm always confused by probability.
Human beings are known to have a very bad
intuitive sense of probability,
we overestimate the probability of truly
awful events,
we underestimate the probability of fine,
nice, normal events.
Of course, from an evolutionary standpoint,
overestimating the probability of some event
like a sabretooth tiger dropping out of this
tree and sinking his teeth into your neck,
this is probably a good thing, which might
be why.
But there's a simple idea of probability,
and let me try to demonstrate it right here.
So let's take the example of heads and tails.
I have here a nice shiny, new nickel that's
been given to me by a member of the Santa Fe
Institute, she didn't ask me to give it back
either so I'm five cents ahead.
So it can either be heads or tails.
What do you think? What's the probability that
it's heads or that it's tails?
Well I claim it's fifty-fifty. But why?
Why is it one half? The probability that it's
heads or tails.
It was tails, I swear.
So there are two notions of probability for
heads and tails.
So one notion is - and I claim that this is the kind
of nicest, most intuitive notion - when I just
flip it like this, I wasn't watching it in the
air, I didn't know how hard I flipped it,
I didn't see it before I put it down there.
I have no reason for preferring heads over
tails.
Heads and tails, just a priori, have
equal weight.
Heads. It was heads by the way, now the
probability is one that it was heads,
that's the funny thing about probabilities.
First you don't know, and the probabilities you assign
at that point, before you look,
are called prior, or a priori, probabilities.
So probability of heads is equal to the
probability of tails is one-half,
because there is no reason to prefer heads
over tails. This is a good argument.
So these are the prior probabilities of heads
and tails: 50 percent each.
But there's another argument about why
the probability of heads and tails would be 50 percent.
So let me just try it like this, let me just
flip this coin a bunch of times.
Tails.
Heads.
Heads.
Heads.
Tails.
Heads.
Tails.
Heads.
Heads.
So I actually got seven heads and three tails
out of ten tosses.
That was kind of dull. This is the problem
with probability: it's dull and confusing,
and to figure out what's going on,
you have to do it many times.
Because I don't think that you're going to
agree that this shiny new United States nickel
really has a probability of seven out
of ten of coming up heads and three out of ten of
coming up tails.
It was just the luck of the draw,
or the luck of the toss.
It just so happens that there were seven
heads and three tails, which, if you're flipping
a coin ten times, is pretty reasonable.
So if I were to flip this coin a whole bunch
more times,
which I'm not going to do because I know it
will be dull, you would be very bored by this.
So if I flip a coin, and I should say a fair coin,
I should note that in my classes at MIT,
the students all start out seeming to
believe what I say, but after a few lectures,
they become very distrustful.
I don't know why this is, I seem like a
trustworthy person.
Anyway, I flip a fair coin m times and we
look at the number of heads and the number of tails
and the sum of the number of heads plus the
number of tails is equal to m.
I just flipped it ten times.
And we're going to define the frequency
of heads
as just the number of heads divided
by m.
So I flipped it ten times, I got seven heads,
frequency is 0.7.
Frequency of tails, as you may very well guess,
is the number of tails over m,
and that's equal to one minus the frequency of heads,
that is, one minus the number of heads divided by m.
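
These definitions can be sketched in a few lines of Python, using the counts from the ten tosses above (seven heads, three tails). The function name `frequencies` is hypothetical.

```python
def frequencies(num_heads: int, num_tails: int) -> tuple[float, float]:
    """Return (frequency of heads, frequency of tails) for m = heads + tails tosses."""
    m = num_heads + num_tails      # every toss is either heads or tails
    freq_heads = num_heads / m
    freq_tails = num_tails / m     # same as 1 - freq_heads, up to rounding
    return freq_heads, freq_tails

f_h, f_t = frequencies(7, 3)
print(f_h, f_t)  # 0.7 0.3
```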
Now what we expect, just from personal
experience, is that if we just keep flipping
the coin many, many, many times, the frequency
of heads gets closer and closer to one-half.
Well, if I flip it 100 times, I certainly don't
expect to get exactly 50 heads,
which would be a frequency of exactly 0.5,
matching the probability.
But I would expect to get something a little
closer to one-half than 0.7, seven-tenths.
That seems, you know, very unlikely, that
if I flip it a hundred times I'm going to get 70 heads.
It's perfectly possible, why not.
So I will just give you the formula for this.
So the expected number of heads, which is
also the expected number of tails, because
there's nothing to choose between them,
is equal to 50 percent of m, that is, m over two.
I flip it 100 times, for example, m is equal to 100.
Then m over two is equal to 50.
So I'd expect to get roughly 50, and then
I'm going to use this notation, plus or minus,
I'll explain what this is in a moment, plus or minus
one-half times the square root of m.
So actually what "expect" means is,
well, it's roughly in this interval.
I flip it 100 times, the square root of 100
is 10, and half of that is five.
I expect it to be roughly within five of 50, might
be a few more, might be seven or eight more
but I'd be really kind of surprised if there
were seventy heads and thirty tails.
I would think it'd be more likely, you know
60 heads, 40 tails, but probably more like
55 and 45.
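
To put rough numbers on this, here is a minimal Python sketch that counts sequences exactly, assuming a fair coin so that every one of the 2^m sequences is equally likely. The helper `prob_heads_between` is a hypothetical name, and the counting uses the standard binomial coefficient, which is an assumption at this point in the lecture.

```python
import math

def prob_heads_between(m: int, lo: int, hi: int) -> float:
    """Exact probability that m fair flips give between lo and hi heads, inclusive."""
    # math.comb(m, k) counts the sequences of length m with exactly k heads.
    favorable = sum(math.comb(m, k) for k in range(lo, hi + 1))
    return favorable / 2**m        # all 2**m sequences are equally likely

m = 100
center = m / 2                     # expected number of heads
spread = 0.5 * math.sqrt(m)        # the "plus or minus" term
print(center, spread)              # 50.0 5.0

print(prob_heads_between(m, 45, 55))   # within five of 50: the most likely outcome
print(prob_heads_between(m, 70, 100))  # 70 or more heads: possible but very rare
```

Landing within five of 50 is much more likely than not, while 70 or more heads happens far less than once in a thousand runs of 100 flips.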
And that's actually what you can do.
So let's actually ask why is this so.
So if I look at all different possible sequences
H H T T H H H T H H H T
you may notice that the first ten of these
are pretty much what I got for when
I was flipping the coin.
Dot dot dot, which is a way of meaning
et cetera.
Just keeps on going, and then we're going
to have m of these,
and we're going to count the number of
possible sequences
with exactly m_h heads and m_t tails.
Of course, because it's got to be heads or
tails,
at least unless it lands on its side, which
I don't think it's going to do,
this has got to add up to m.
So I'm going to count the number of possible
sequences with exactly m_h heads, m_t tails,
the two have to add up to m.
And what we're going to find out, well there's not so
many sequences which are heads heads heads...
tails.
So there's going to be a very small number
of sequences that have almost all heads and
a few tails.
There's similarly going to be a very small
number of sequences that have almost all
tails and a few heads, and there's going to
be a humongous number of sequences that have
roughly the same number of heads and of tails.
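
A minimal Python sketch of this count for m = 10, assuming the standard binomial coefficient ("m choose m_h") as the counting rule. The name `num_sequences` is hypothetical.

```python
import math

def num_sequences(m: int, m_h: int) -> int:
    """Number of length-m sequences with exactly m_h heads and m - m_h tails."""
    return math.comb(m, m_h)

m = 10
for m_h in range(m + 1):
    m_t = m - m_h                  # heads and tails have to add up to m
    print(m_h, m_t, num_sequences(m, m_h))

# All the counts together cover every possible sequence: 2**m of them.
print(sum(num_sequences(m, k) for k in range(m + 1)))  # 1024
```

For ten tosses there is only one all-heads sequence and one all-tails sequence, but 252 sequences with five heads and five tails.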
So you can see, to relate this to information
theory,
each sequence is like a sequence of zeros
and ones.
You can call heads zero and tails one,
this is just a long long long bit string.
And so we can relate ideas of information,
numbers of possible sequences with a
particular pattern,
in this case a particular number of heads and
of tails
to probability.