1
00:00:01,018 --> 00:00:03,762
Just to remind youwhere we are,
we have two constraints,
2
00:00:03,857 --> 00:00:07,330
one is, the expectation value of x,
the average waiting time is four minutes,
3
00:00:07,509 --> 00:00:09,646
the other is that the probabilities
sum to 1.
4
00:00:09,759 --> 00:00:11,970
Those are our two constraints,
and then what we're going to do is
5
00:00:12,060 --> 00:00:16,334
maximize the entropy of a distribution,
subject to those constraints.
6
00:00:16,360 --> 00:00:19,549
Okay? So S is going to be the entropy
function,
7
00:00:19,549 --> 00:00:22,371
and we're not going to have 1 g,
but in fact 2 gs, okay?
8
00:00:23,067 --> 00:00:26,495
One g is this function,
one g is this function. Okay?
9
00:00:26,737 --> 00:00:31,817
So, how do you do maximization,
under constraints,
10
00:00:31,968 --> 00:00:34,618
for more than one constraint?
I gave you an intuitive picture
11
00:00:34,866 --> 00:00:36,946
about how you could do one
constraint at a time,
12
00:00:37,098 --> 00:00:38,770
but how could you do two constraints?
13
00:00:39,098 --> 00:00:41,220
I'm going to tell you the answer,
14
00:00:41,434 --> 00:00:46,477
because it's much harder to work through
the problem of multiple constraints,
15
00:00:46,819 --> 00:00:49,703
but it's an intuitive answer, one that is
worth remembering,
16
00:00:49,902 --> 00:00:52,031
and if you ever want to work through it,
17
00:00:52,031 --> 00:00:55,332
there's plenty of places to find the
answer.
18
00:00:56,116 --> 00:00:59,128
So, here's how to do lagrange multipliers,
19
00:00:59,267 --> 00:01:01,598
the method of lagrange multipliers,
you remember the lagrange multiplier
20
00:01:01,746 --> 00:01:05,269
is that lambda term, okay? And so,
that's where the method get its name from,
21
00:01:07,058 --> 00:01:09,403
you want to maximize the function f,
22
00:01:14,833 --> 00:01:20,059
subject to a set of constraints, now,
23
00:01:23,003 --> 00:01:26,833
and we'll number these constraints
g sub i, so g1, g2, all the way through
24
00:01:27,184 --> 00:01:28,983
your n constraints however
many you have,
25
00:01:29,244 --> 00:01:33,797
and the way you do that is you set the
gradient of the function equal to
26
00:01:33,953 --> 00:01:40,233
a linear combination of the gradients of
the constraints.
27
00:01:40,704 --> 00:01:43,084
So, here's the case where you have
n constraints.
28
00:01:43,243 --> 00:01:46,855
So, this is the general method of
lagrange multipliers,
29
00:01:46,933 --> 00:01:51,019
in order to maximize this function subject
to these n constraints,
30
00:01:51,235 --> 00:01:56,858
set the gradient of the function f equal
to some linear combination
31
00:01:57,098 --> 00:02:00,788
of gradient of the function g, and
then the problem
32
00:02:00,929 --> 00:02:06,277
is just how do you find g. What you know
is, you know your maximum point
33
00:02:06,433 --> 00:02:12,375
is such that you can add together all of
these gradients in such a way,
34
00:02:12,605 --> 00:02:17,217
with some weight so that you can reproduce
the original gradient of the contours.
35
00:02:17,609 --> 00:02:23,016
Okay? And so, now the problem is, what
are these Ls, or what are these lambdas,
36
00:02:26,334 --> 00:02:29,763
so what I'm going to do is, I'm going to
walk you through, now the problem of
37
00:02:29,928 --> 00:02:35,240
maximum entropy, using this formula here,
and if these seem mysterious right now,
38
00:02:35,471 --> 00:02:37,114
by the end, they hopefully should not be.
39
00:02:37,235 --> 00:02:40,717
What you're going to do is turn knobs,
and twiddle knobs around,
40
00:02:40,870 --> 00:02:43,148
until you get the lambda such that
41
00:02:43,400 --> 00:02:47,349
those lambdas satisfy the particular
constraint values you have in mind.
42
00:02:50,038 --> 00:02:55,703
So, we're going to maximize not arbitrary
function f, but in fact the entropy,
43
00:02:56,235 --> 00:02:59,481
and our constraints are going to be a
constraint on the average, and
44
00:02:59,618 --> 00:03:04,160
a constraint on the normalization. So,
we want the derivative of S
45
00:03:04,459 --> 00:03:08,432
with respect to p_i, we do this term by
term in the vector,
46
00:03:08,719 --> 00:03:15,120
we want that to be equal to lambda1,
times the derivative of g1 wrt p_i,
47
00:03:15,587 --> 00:03:22,261
plus lambda2 times the derivative of g2,
with respect to p_i. Okay?
48
00:03:24,014 --> 00:03:29,995
So this reminds you, S is the entropy
of the distribution,
49
00:03:30,172 --> 00:03:34,173
S is equal to minus the sum over all possible
waiting times.
50
00:03:34,339 --> 00:03:37,138
So again, just for convenience's sake,
I'm talking about the discrete case,
51
00:03:37,222 --> 00:03:41,173
you can take limits, if you have your
measures set up correctly,
52
00:03:41,318 --> 00:03:45,010
and you can turn this into integrals,
and turn this into integrals,
53
00:03:45,199 --> 00:03:47,008
and this as well into integral.
54
00:03:47,188 --> 00:03:50,253
But it's easier just conceptually, to talk
about the discrete case first.
55
00:03:50,508 --> 00:03:56,707
So, g1, remember, this is a function of p,
alright, p is a vector here, okay?
56
00:03:56,861 --> 00:04:05,351
g1 is just the sum i 0 to infinity, okay,
of p_i times i.
57
00:04:05,579 --> 00:04:08,280
and I'm using just, I'm using i now
instead of x,
58
00:04:08,441 --> 00:04:11,809
it's easier to write for me. Okay?
59
00:04:11,980 --> 00:04:17,066
So, this is the constraint function, okay,
that constraints the average value.
60
00:04:17,357 --> 00:04:22,239
And of course what we want in the end is
we want g1(p) to be equal to 4 minutes.
61
00:04:23,481 --> 00:04:33,889
g2(p) is just the normalization constraint
so the function just looks like summing
62
00:04:34,028 --> 00:04:38,234
over all values of p, and of course in the
end what we'll do is we'll take g2 = 1.
63
00:04:39,411 --> 00:04:42,217
And we previously defined the entropy
here.
64
00:04:42,439 --> 00:04:47,710
So, what if the derivative of the entropy,
with respect to a particular probability,
65
00:04:48,081 --> 00:04:53,270
right, a particular probability of a
particular configuration, okay? So,
66
00:04:55,241 --> 00:05:03,522
Alright move this out here, S is equal to
negative p_i log p_i i from 0 to infinity,
67
00:05:03,879 --> 00:05:11,425
okay? So, dS/d(p_i), equals, well the only
term that's going to survive is
68
00:05:11,575 --> 00:05:15,903
where you have the p_i in it, and then
we have the derivative of p_i log p_i,
69
00:05:16,099 --> 00:05:22,255
that has two terms: log p_i, and then the
other one is p_i times the derivative of
70
00:05:22,806 --> 00:05:27,877
log p_i. derivative of log p_i is 1/p_i,
so in fact, you have plus 1.
71
00:05:28,126 --> 00:05:32,783
So, this is the left hand side of your
lagrange multiplier equation.
72
00:05:32,917 --> 00:05:35,438
And just to remind you, we set the base
of the log to e.
73
00:05:35,438 --> 00:05:39,440
So, now we have to take the derivative of
g1 with respect to p_i,
74
00:05:39,942 --> 00:05:43,870
okay? So again, take the derivative of
this sum here with respect to p_i,
75
00:05:44,041 --> 00:05:52,627
and what of course you'll find, is that
dg1/d(p_i) = i,
76
00:05:52,966 --> 00:06:00,648
and finally, dg2/d(p_i) = 1.
77
00:06:00,648 --> 00:06:03,850
There's only 1 term in the sum that
doesn't get destroyed by the derivative.
78
00:06:04,167 --> 00:06:07,591
So, let's now put all this together,
79
00:06:13,417 --> 00:06:21,493
we have negative log p_i -1 is equal to
lambda1, times the derivative of g_1,
80
00:06:21,902 --> 00:06:28,730
with respect to p_i, which is i, plus the
derivative of g2 with respect to p_i,
81
00:06:28,882 --> 00:06:36,136
times lambda 2, so this is now our
equation, okay, that is satisfied
82
00:06:37,847 --> 00:06:43,536
when you try to maximize the entropy,
try to maximize this function,
83
00:06:43,680 --> 00:06:46,106
subject to these constraints, for some
value of the constraint.
84
00:06:46,226 --> 00:06:51,196
So let's solve for p_i. So, let's move
things around here,
85
00:06:51,396 --> 00:06:59,601
and we have negative 1 minus lambda1 i,
minus lambda 2, equals log p_i,
86
00:07:00,300 --> 00:07:02,952
we'll exponentiate both sides, flip them
around, and we have
87
00:07:03,073 --> 00:07:09,972
p_i is equal to e to the negative 1 minus
lambda1 i, minus lambda 2. Okay?
88
00:07:11,493 --> 00:07:15,175
And, we can write that somewhat more
succinctly in the following way,
89
00:07:15,422 --> 00:07:19,150
e to the lambda1 i divided by Z,
90
00:07:19,389 --> 00:07:26,624
where Z is equal to e to the 1
plus lambda 2.
91
00:07:29,134 --> 00:07:34,089
The probability of waiting for a certain
time i, is equal to e to the negative
92
00:07:34,416 --> 00:07:40,615
lambda1 times i, there's an exponential
distribution of waiting times.
93
00:07:40,935 --> 00:07:45,818
Now, all that remains is to figure out
what on earth lambda1 is,
94
00:07:45,942 --> 00:07:48,048
and what on earth Z is.
95
00:07:48,048 --> 00:07:50,896
And what we're going to do is we're going
to turn, we're going to figure out
96
00:07:51,017 --> 00:07:53,516
the value you have to set lambda1 to,
97
00:07:53,583 --> 00:07:57,065
in order to satisfy this particular value
of the constraint,
98
00:07:57,173 --> 00:08:01,036
and this particular value of the
constraint. So,
99
00:08:01,235 --> 00:08:04,640
we know the functional form of the
distribution,
100
00:08:04,752 --> 00:08:07,392
and now we just have to figure out the
parameters of that function.
101
00:08:07,517 --> 00:08:09,290
And there will be two parameters.
102
00:08:09,452 --> 00:08:11,789
So the first thing we know is of course,
103
00:08:14,674 --> 00:08:18,209
that the probability is normalized, and
that means in fact that,
104
00:08:27,669 --> 00:08:29,204
okay?
105
00:08:29,373 --> 00:08:32,816
plugging in this functional form, and so
now, already, we can solve for Z
106
00:08:33,187 --> 00:08:43,068
in terms of lambda1. So, eliminating the
first variable here, Z, is easy.
107
00:08:43,389 --> 00:08:50,828
We can just set Z equal to the sum from i
0 to infinity, e to the negative lambda1 i
108
00:08:50,828 --> 00:08:53,057
okay? So we already eliminated one
variable,
109
00:08:53,246 --> 00:08:56,826
and now, all we have to do is to solve
for the other constraint. Okay?
110
00:08:56,995 --> 00:08:59,723
In particular, just let me write this here
111
00:09:04,708 --> 00:09:12,909
In particular, we have the sum from i 0 to
infinity times e to the negative lambda1 i
112
00:09:13,233 --> 00:09:20,966
i, all over Z, that has to be equal to 4.