Okay, thank you for joining me.
We are going to talk today about what
is called MaxEnt, or
the Principle of Maximum Entropy, or
sometimes Maximum Entropy Methods.
MaxEnt breaks into two parts,
and we will cover those two parts
in the first and second halves
of this unit.
Maximum Entropy was invented by
a gentleman named E.T. Jaynes.
He was the first person to
really put this all into a single paper,
for Physical Review, and
Jaynes was after some really deep
philosophical, almost epistemological,
questions about the nature of
reality: why physical laws took
the shape that they did.
But in recent years, what we've
found is that MaxEnt, the principle of
maximum entropy, has found
an enormous amount
of use in machine learning, in the
modelling of real-world processes,
as opposed to, let's say, their
explanation and the understanding
of those processes.
So there are some people who are really
interested in prediction. For example,
they would like to learn what the
stock market looks like and predict
what it is going to look like tomorrow.
Or they want to learn about, let's
say, a particular patient's cancer;
they would like to model it in such
a way that the model is good enough
that tomorrow they can predict what's
going to happen next.
That's a huge goal, an intellectually
incredibly ambitious goal that people
in the artificial intelligence and machine
learning community have. And
MaxEnt is huge in that part of the
intellectual
world, and that is where we will begin.
In the second part of the
talk, I am going to try to draw connections
from what you have learned on the
prediction and machine learning side
and try to apply those to some really
exciting problems that we find in the
study of biological systems and in the
study of social systems. In
particular, I will try to get a little bit
at some of the deeper
philosophical questions that maximum
entropy raises for us, particularly when
it works as well as it does.
So what I will do is begin with the
prediction problem, and in particular
with the kind of problem at which
maximum entropy excels.
That is the prediction of high-dimensional
data. For high-dimensional
data, we'll give a kind
of working definition, something
along the following lines: a system is
high dimensional if the number of
configurations (which we call n) is
much greater than the amount of
data that we have (which we call k).
So k is the amount of data, and n is
the number of possible configurations:
the number of ways the system could look
is much greater than the number
of times you've actually observed it in
the real world.
Oftentimes we can talk about the
dimensions of a data set. So let's,
for example, take an image, a
black and white image. Let's say
that the image has 10,000 pixels.
Each pixel in your image
can take on, let's say, a +1
or -1 value: black, let's say, is +1, and
white, let's say, is -1. So any image here
can have any arbitrary combination of
pixel values. If there are 10,000 pixels,
and each pixel can be +1 or -1,
then the total number of images is 2^10,000.
Each pixel is a discrete dimension,
taking +1 or -1, and there are 10,000 of them.
So if you're trying to build a model
of, let's say, handwritten words, you
would like to model, for example, all the
different ways in which I can write the
letter 'e'.
There is almost no way that you are
going to acquire enough data; in fact,
it's probably possible to prove
that the universe will die a fearsome
heat death before you're able to gather enough
samples of my handwriting so
that the amount of data you have (k)
is anywhere comparable to 2^10,000.
Just to give you a sense, 2^10,000 is
sort of like 1,000^1,000, which is 10^3,000; that's
like a googol to the power of 30. So
in these cases here, what we would
like to do is talk about, or at least
assign probability to,
particular images drawn from
a data set whose size
is far smaller than the total
number of possible images. That's
where something like MaxEnt excels...
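The magnitudes quoted above can be checked directly. Here is a small sketch in Python (using the transcript's illustrative numbers: 10,000 binary pixels) that confirms the count of configurations and the rough size comparisons:

```python
# Number of distinct black-and-white images with 10,000 pixels,
# each pixel taking one of two values (+1 or -1).
n_pixels = 10_000
n_configurations = 2 ** n_pixels  # exact big integer in Python

# 2^10,000 has 3,011 decimal digits, i.e. it is about 10^3,010,
# in line with the rough estimate of 10^3,000 given above.
n_digits = len(str(n_configurations))
print(n_digits)  # 3011

# A googol is 10^100, so a googol to the 30th power is 10^3,000;
# 2^10,000 is indeed larger than that.
googol_to_the_30 = (10 ** 100) ** 30
print(n_configurations > googol_to_the_30)  # True
```

Any realistic sample of handwriting (k) is astronomically smaller than this n, which is exactly the n >> k regime the working definition describes.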