I'm now going to review all the steps we took, because we went on a long journey, and you've learned a huge number of things in pursuit of solving what turned out to be a rather simple problem.
The problem you wanted to solve was a parsimonious description of how long it takes to get a cab in New York City, and you wanted to induce, or learn, that parsimonious description from data. A description that isn't parsimonious would say that the probability of waiting n minutes is the number of times you saw yourself waiting n minutes for a cab. Those kinds of descriptions, we decided, were overfitting the data. So instead, what I said was: we're going to try to reproduce a limited number of features. We're not going to try to reproduce, for example, the exact number of times we waited six minutes, or the exact fraction of the time we waited six minutes. Instead, we're going to reproduce some of the overall gross characteristics of the data. In particular, I said the only thing I want to preserve is the average time it takes me to get a cab. That's it. Everything else, forget it.
Now, the problem is that there are many distributions that preserve that average. So what we decided to do was take the distribution that has maximum entropy subject to that constraint. The argument we made was that the maximum-entropy distribution leaves you maximally uncertain about the waiting time. It has no additional hidden theories; there's no way it implicitly assumes something else about the data that would reduce your uncertainty about what was going to happen. That was our intuitive justification for this step, maximizing the entropy. Once you believe that's a good thing to do, you can dive into the mathematics. In particular,
what I had to do was show you how the method of Lagrange multipliers works. This is a great mathematical tool, useful not only in the particular case of the MaxEnt problem; you see it all over the place, particularly in a subject like economics, where Lagrange multipliers are called "shadow prices." In a lot of systems, the goal is to maximize one quantity while being constrained by another set of forces. So I showed you the Lagrange multiplier trick: I gave you the one-constraint, two-dimensional problem, and I told you that the n-constraint problem works out in a very similar fashion.
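In symbols, the constrained problem we're about to solve can be sketched like this, with λ₀ enforcing normalization and λ₁ enforcing the average (these correspond to the Z and lambda that appear below):

```latex
% Maximize entropy subject to normalization and a fixed mean:
\mathcal{L} \;=\; -\sum_{n} p_n \ln p_n
  \;-\; \lambda_0 \Bigl(\sum_n p_n - 1\Bigr)
  \;-\; \lambda_1 \Bigl(\sum_n n\, p_n - \langle n \rangle\Bigr)
% Setting \partial\mathcal{L}/\partial p_n = 0 gives
% -\ln p_n - 1 - \lambda_0 - \lambda_1 n = 0, i.e.
p_n \;=\; e^{-1-\lambda_0}\, e^{-\lambda_1 n} \;=\; \frac{1}{Z}\, e^{-\lambda_1 n}
```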
And then I actually worked through the problem of maximizing entropy subject to constraints, and we found a particular functional form. But it was only a functional form, because lambda and Z, the hidden Lagrange multiplier terms, were terms I still had to set by hand. So I knew the functional form right away, but then I had to do the heavy lifting to actually figure out what lambda and Z should be. And so I did some infinite sums and played some nice mathematical games, I hope you had fun, and in the end what we found was that solving for these Lagrange multipliers came down to a single transcendental equation for lambda 1. While you weren't looking, I quickly plugged that into Mathematica and found the numerical value of lambda 1, which is about 0.22.
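For concreteness, here's a sketch of that last numerical step. The average waiting time from the data isn't restated in this passage; a mean of 4 minutes is my assumption, chosen because it reproduces the quoted lambda 1 of about 0.22. For a distribution over whole minutes, the mean constraint reduces to e^(-λ)/(1 - e^(-λ)) = ⟨n⟩, which a simple bisection solves:

```python
import math

def solve_lambda1(mean_wait, lo=1e-9, hi=50.0, tol=1e-12):
    """Solve e^{-lam} / (1 - e^{-lam}) = mean_wait for lam by bisection.

    This is the constraint equation that fixes the Lagrange multiplier
    lambda_1 once the average waiting time is pinned down.
    """
    def mean_of(lam):
        q = math.exp(-lam)
        return q / (1.0 - q)  # mean of p(n) = (1 - q) q^n, n = 0, 1, 2, ...

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # mean_of is decreasing in lam: larger lam means a shorter mean wait
        if mean_of(mid) > mean_wait:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed mean of 4 minutes; gives lambda_1 = ln(5/4), about 0.223
lam1 = solve_lambda1(4.0)
Z = 1.0 / (1.0 - math.exp(-lam1))  # normalization constant
print(lam1, Z)
```

In this whole-minute form the equation even has a closed-form solution, λ₁ = ln(1 + 1/⟨n⟩), which the bisection recovers.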
So, at the end of all of this: if this is 0 minutes, 1, 2, 3, 4, 5, 6, 7, this axis is your waiting time in minutes, and this axis is the probability of waiting that long. In the data, sometimes we waited 6 minutes, sometimes we waited 3 minutes, sometimes we waited 4 minutes, and there were a couple of times we waited 2. That was the distribution of the data we had measured; reproducing it exactly is what we decided would be an overfitting model. What we found instead was that the distribution actually looks something like this: it's an exponential distribution in x. So this here is, in some sense, the best-fitting model to the data if you are constrained only by fixing the average value of the waiting times. That's the only thing we've constrained. For this particular choice of the Lagrange multipliers, it gets the average right and nothing else. It's maximally uncertain.
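That claim of maximal uncertainty can be checked directly. A minimal sketch, again assuming a made-up average of 4 minutes: build the MaxEnt (geometric) distribution, build a rival distribution with the same mean, here a uniform over 0 through 8 minutes, and compare entropies.

```python
import math

def entropy(ps):
    """Shannon entropy in nats, ignoring zero-probability outcomes."""
    return -sum(p * math.log(p) for p in ps if p > 0)

N = 2000  # truncation point; the geometric tail beyond this is negligible

# MaxEnt solution with mean 4: geometric with per-minute success prob 1/(1+4)
p = 0.2
geom = [(1 - p) ** n * p for n in range(N)]

# A rival with the same mean: uniform on {0, 1, ..., 8}
unif = [1.0 / 9 if n <= 8 else 0.0 for n in range(N)]

mean_geom = sum(n * q for n, q in enumerate(geom))
mean_unif = sum(n * q for n, q in enumerate(unif))
print(mean_geom, mean_unif)          # both are 4, the shared constraint
print(entropy(geom), entropy(unif))  # the geometric entropy is larger
```

Any other distribution on the nonnegative integers with mean 4 comes out at or below the geometric's entropy; that's exactly what "maximum entropy subject to the mean constraint" means.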
It's not that this distribution doesn't have other properties; it does have, for example, a variance. But those are all dependent; they're fixed so that this distribution has the maximum entropy subject to a constraint only on the average. So,
let's think briefly about this model, which, by the way, is mechanistically agnostic. It has no theory about taxi cabs. Instead of modeling the waiting time for taxi cabs, we could have modeled the waiting time for your next United flight. We could have modeled the number of earthquakes in Japan of a certain magnitude over a year. We could have modeled the number of C-pluses you give to your students in a particular year. This method is totally agnostic about the actual underlying physics or cognitive science or sociology of the problem. But let's go and look and see whether there's any implicit mechanistic model that maximum entropy has quietly handed us. In particular, let's see if we can construct, and we'll be able to do this quite easily, an underlying mechanistic model for catching a cab in New York that produces the same probability distribution. And so what
I'm going to do is say that the chance of you getting a cab in New York is constant and independent of time. In particular, the chance of you getting a cab in any one-minute interval is some number p. So the chance of you getting a cab between 0 minutes and 1 minute is p. For the chance of getting a cab between 1 minute and 2 minutes, first there's a factor of 1 minus p, because you didn't get a cab in that first minute, you got unlucky; and then, having not gotten a cab in the first minute, the chance you get one in the second minute is just p. So p(0), the probability of getting a cab between 0 and 1 minute, is p. p(1) is (1 minus p) times p. And of course p(2) is just (1 minus p) squared times p: didn't get it in the first minute, didn't get it in the second, finally got it in the third.
And so this here is a mechanistic model, and it at least has some theory about taxi cabs in New York: it assumes they are sort of like raindrops, they kind of fall from the sky, independent of each other. And of course you can map this model, which in general looks like P(x) = (1 - p)^x p, onto the other one. If I define Z as one over p, and I define lambda 1 as negative log of (1 minus p), then I have an exact correspondence between these two models.
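That change of variables is easy to check numerically. A minimal sketch, with an arbitrary choice of p = 0.2: with Z = 1/p and lambda 1 = -log(1 - p), the raindrop model (1 - p)^x p and the MaxEnt form e^(-λ₁x)/Z agree term by term.

```python
import math

p = 0.2                    # per-minute chance of catching a cab (arbitrary choice)
Z = 1.0 / p                # normalization in the MaxEnt form
lam1 = -math.log(1.0 - p)  # Lagrange multiplier in the MaxEnt form

for x in range(20):
    raindrop = (1.0 - p) ** x * p     # mechanistic raindrop model
    maxent = math.exp(-lam1 * x) / Z  # maximum-entropy form
    assert abs(raindrop - maxent) < 1e-12
print("models agree on x = 0..19")
```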
So what we've just shown is that the maximum-entropy model, where the waiting time is constrained on average to be a particular value but the system is otherwise completely uncertain, is equivalent to this sort of random-raindrop taxi-cab arrival model. And what we'll do, on and off for the rest of this lecture, is talk a little about how this mechanism-agnostic story can be translated into some set of assumptions about the underlying scientific principles that might be at work. In this particular case it's a bit exalted to call it a scientific principle, but the story is essentially that privately owned transportation services in New York arrive in a fashion uncorrelated with each other and constant over time.
And you can see, of course, that if you wait too long, maybe the time of day changes, maybe some other feature of the system changes, so this p might change, in which case this model would no longer have the same functional form as the MaxEnt model. And you can see there how additional mechanistic phenomena might drive the system away from the simple constrained MaxEnt model.
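As a closing sketch of that last point, suppose the per-minute chance of a cab drops from p1 to p2 after 10 minutes (both values made up for illustration). For a pure geometric distribution, the ratio P(n+1)/P(n) is the same constant at every n; with a time-varying p, the ratio jumps at the changeover, so the distribution no longer has the single-exponential MaxEnt form.

```python
def wait_prob(n, p1=0.2, p2=0.05, switch=10):
    """P(first cab arrives in minute n) when the per-minute chance
    of a cab is p1 before `switch` minutes and p2 afterwards."""
    if n < switch:
        return (1 - p1) ** n * p1
    # survived the first `switch` minutes, then a p2-geometric wait
    return (1 - p1) ** switch * (1 - p2) ** (n - switch) * p2

ratios = [wait_prob(n + 1) / wait_prob(n) for n in range(20)]
# Well before the switch the ratio is 1 - p1 = 0.8; well after, 1 - p2 = 0.95
print(ratios[:3], ratios[-3:])
```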