In the previous three units, what we covered
was, first, a mathematical account
of how to model the arrival time of
taxicabs in New York, and then I tried
to generalize and give a sense of how
maximum entropy methods are used in
the real world, in particular
how they are used or might be used
to describe the open-source ecosystem.
I did that by analogy to a set
of foundational work people have done
in the study of ecosystems
And I showed how, for example,
the maximum entropy model might be
in tension with a simpler mechanistic
model
and currently we don't have the ability
to distinguish between the two
functional forms
MaxEnt predicts one functional form;
the mechanistic, probabilistic accrual
of language adherents has a slightly
different functional form, and they
look too similar to us to
decide right now.
In this next part of the talk
or next part of this unit,
I'm going to try to show you
another kind of argument that gets made
about social systems and biological
systems (in this case, of course, a
social system), and I'm going to show you
how those arguments get made in a
MaxEnt form and the kinds of insights
you might be able to derive
So this is a story that focuses
on a really interesting part of
Americana, it's the Sears-Roebuck
catalogue --so the Sears-Roebuck
company invented this idea of selling
large amounts of consumer goods
not directly through a store, but through a
printed catalogue
that was then distributed all across the
country
So if you were a farmer in
the fall of 1909, you weren't able
necessarily to get to Chicago
to go buy the things you need to buy
to get by--needles and thread
and clothes pins and buggy whips
and Remington shavers
So what you did instead
you consulted the Sears-Roebuck & Co.
catalogue
and you were able to order, by mail,
all the things that you needed and this
revolutionized, of course,
consumer buying, sort of
the Amazon Prime or
Amazon.com of
the early Twentieth Century
In fact the Sears and Roebuck
catalogue ran all
the way from the 1800s
through to the end of
the Twentieth Century.
It may still exist in some form today, though
buying things through the mail
has declined somewhat.
So I'm going to talk in particular
about a paper that was written in
1981 by Elliott Montroll,
called "On the Entropy Function
of Socio-Technical Systems"
and it's interesting in part because
it's one of the first times that
somebody tried to build
an argument about social systems,
about living systems,
by using Maximum Entropy arguments
So here's what Montroll did
Montroll looked at the prices
of goods in the
Sears-Roebuck catalogue
(and in fact he took data from another
source) and what he
plots here is year by year.
This is 1916,
this is 1924,
this is 1974
And what he does--he plots
the distribution of prices
the probability that a "good"
in the Sears catalogue has
some cost c.
He plots this on a log scale
This is log price
and, in fact, he uses log base 2
and this ranges from -6
that is, 1/64 of a dollar,
to +6, 64 dollars,
in the 1916 case,
and he plots the distribution
of goods. So here, for example,
there's a 60 percent chance that if you pull
an item out of the Sears catalogue
in 1916 at random,
it costs roughly log2
dollars equals zero, or in other words
it costs about a dollar.
So sixty percent of all the goods
in the catalogue cost a dollar
and you can see that on the
extremes the distribution dies out.
There are very few goods that cost more
than 60 dollars and very few here that
cost on the order of a cent or two.
So the first thing he notes
is this distribution looks roughly
Gaussian, or normal.
And if you paid attention to the previous
unit, you'll notice that,
since this here is the log price, in fact,
the distribution
of prices in the Sears catalogue
is log-normal
In other words, if you take the
logarithm of the price
and plot the distribution, you get a
Gaussian.
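A quick way to see this numerically, as a minimal sketch rather than a reanalysis of the catalogue data: draw synthetic prices from a log-normal distribution and compare the skewness of the raw prices with the skewness of their base-2 logarithms. The parameters MU and SIGMA and the helper skewness are made up for illustration; none of them come from Montroll's paper.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical parameters (not fitted to any catalogue data).
# They are the mean and standard deviation of the natural log of the price.
MU, SIGMA = 0.0, 1.0

prices = [rng.lognormvariate(MU, SIGMA) for _ in range(50_000)]
log2_prices = [math.log2(p) for p in prices]  # same axis as the plot: log base 2

raw_skew = skewness(prices)
log_skew = skewness(log2_prices)
mean_log2 = statistics.fmean(log2_prices)

print(f"skewness of raw prices:  {raw_skew:.2f}")
print(f"skewness of log2 prices: {log_skew:.2f}")
print(f"mean of log2 prices:     {mean_log2:.2f}  (0 corresponds to about one dollar)")
```

The raw prices come out strongly right-skewed, while their logarithms are close to symmetric, which is what a Gaussian in log price should look like.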
So let's dig a little into the log-normal
distribution. With x as the log of the price, it looks like
P of x is proportional
to e to the negative of x minus
mu, squared, over
2 sigma squared.
I wrote mu as x-bar there, but mu
is the mean of the distribution
(We call it the mean)
And sigma squared is something we call the
variance.
Let's expand this a little bit more
I'm going to write this as e to
the negative x squared over
2 sigma squared
plus 2 x mu over 2 sigma squared
minus mu squared over
2 sigma squared.
All I have done is expand the x
minus mu squared over 2 sigma squared
term.
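Written out, with x standing for the log price, the expansion just described reads as follows. The normalizing constant is left out, as it is in the spoken version.

```latex
\begin{align*}
P(x) &\propto \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \\
     &= \exp\!\left[-\frac{x^2}{2\sigma^2} + \frac{2x\mu}{2\sigma^2} - \frac{\mu^2}{2\sigma^2}\right]
\end{align*}
```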
So, I'm going to rewrite this as e
to the negative lambda 1 X squared plus
lambda 2 X plus lambda 3
And when I write it in this form you
realize the log-normal distribution
is just a maximum entropy
distribution if we constrain two things
One: we constrain x squared
And the other: we constrain x
And of course, constraining these two
things, is equivalent to
constraining the variance, which is
the expectation value of x minus mu, squared,
and the mean.
Constraining these two is the equivalent
of constraining these two.
And of course you can just expand
this here... so fixing this
and this, to some set of values
is the same as fixing this to a value and
this to a value.
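To make the matching explicit, here is a sketch that assumes the spoken form is read as e to the (negative lambda 1 x squared, plus lambda 2 x, plus lambda 3), and that angle brackets denote expectation values. Lambda 3 is only fixed up to the normalizing constant, since the distribution is written as a proportionality.

```latex
\begin{align*}
P(x) &\propto \exp\!\left[-\lambda_1 x^2 + \lambda_2 x + \lambda_3\right],
\qquad
\lambda_1 = \frac{1}{2\sigma^2},\quad
\lambda_2 = \frac{\mu}{\sigma^2},\quad
\lambda_3 = -\frac{\mu^2}{2\sigma^2} \\
\mu &= \langle x \rangle,
\qquad
\sigma^2 = \langle (x-\mu)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2
\end{align*}
```

So fixing the expectation values of x and x squared fixes the mean and the variance, and the other way around.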
So the log-normal distribution is
secretly, and has been secretly
all along another
MaxEnt distribution.
Once you write it like this,
and realize that all these sort of
constants, here and here,
all these constants are really just
Lagrange multipliers that people
figured out the right answers for,
you realize that the log-normal
distribution constrains these two
quantities.
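As a numerical check of that reading, here is a minimal sketch. It assumes the sign convention e to the (negative lambda 1 x squared, plus lambda 2 x), with lambda 1 equal to 1 over 2 sigma squared and lambda 2 equal to mu over sigma squared; MU and SIGMA are hypothetical values for the log price, not numbers from the paper. It builds the distribution from the multipliers alone and recovers the mean and variance from it.

```python
import math

# Hypothetical mean and standard deviation of the log price (illustration only)
MU, SIGMA = 0.5, 1.5

# Multipliers read off the expanded exponent
lam1 = 1.0 / (2.0 * SIGMA ** 2)
lam2 = MU / SIGMA ** 2

# Evaluate the unnormalized density on a fine grid spanning 12 sigma either side.
# lambda 3 is a constant factor, so it cancels when we normalize.
dx = 0.001
n_points = int(24 * SIGMA / dx) + 1
xs = [MU - 12 * SIGMA + i * dx for i in range(n_points)]
weights = [math.exp(-lam1 * x * x + lam2 * x) for x in xs]

total = sum(weights)
mean = sum(x * w for x, w in zip(xs, weights)) / total
second_moment = sum(x * x * w for x, w in zip(xs, weights)) / total
variance = second_moment - mean ** 2

print(f"recovered mean:     {mean:.6f}  (target {MU})")
print(f"recovered variance: {variance:.6f}  (target {SIGMA ** 2})")
```

The two multipliers carry exactly the information in the constrained mean and variance; nothing else about the distribution had to be specified.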