Login

Join for Free!
16760 members

Kullback–Leibler divergence

Everything on bioinformatics, the science of information technology as applied to biological research.

Moderator: BioTeam

Kullback–Leibler divergence

Postby mith on Fri Oct 13, 2006 7:33 pm

Can someone give me a sample calculation using the Kullback–Leibler divergence?I'm a bit miffed by all the symbols...
Living one day at a time;
Enjoying one moment at a time;
Accepting hardships as the pathway to peace;
~Niebuhr
User avatar
mith
Inland Taipan
Inland Taipan
 
Posts: 4504
Joined: Thu Jan 20, 2005 8:14 pm
Location: Berkeley, CA

Postby G-Do on Wed Oct 18, 2006 3:06 am

Hi mith,

The context in which I learned KL-distance (or "relative entropy") is where you have a probabilistic model - a probability distribution - and you want to contrast it with the distribution suggested by your data. So for example, you could have a single six-sided loaded die, set up so that the following distribution describes how it rolls:

1: 30%
2: 15%
3: 5%
4: 5%
5: 10%
6: 35%

And if you knew nothing about this die, you might pick it up and assume it was a fair die, in which case you'd expect its distribution to be even:

1: 16.7%
2: 16.7%
3: 16.7%
4: 16.7%
5: 16.7%
6: 16.7%

The first distribution, which we shall denote P, is real - it's what you'd approximate if you began to sample by rolling the die a bunch of times. The second distribution, which we shall denote Q, is a model - a possibly incorrect estimate of the distribution made using all the information we have to date.

This is a toy example, but bear with me!

To determine "how good" the model is, you can look at its relative entropy (K-L divergence, or distance) to the real distribution. To do this, you would use the formula:

Distance = SUM[across all outcomes i, of the quantity P(i) * ln(P(i)/Q(i))]

In other words, the SUM decomposes into six terms, since there are six outcomes (each side of the die). The first term would be 0.3*ln(0.3/0.167), since that is P(1)*ln(P(1)/Q(1)). The whole equation can be written out as:

D = 0.3*ln(0.3/0.167)+0.15*ln(0.15/0.167)+0.05*ln(0.05/0.167)+
0.05*ln(0.05/0.167)+0.10*ln(0.10/0.167)+0.35*ln(0.35/0.167)

Within each ln(), you have something from the actual distribution P divided by something from the model distribution Q (and since every probability in Q is 16.7%, the divisor is always 0.167). And this sums up to 0.247, thereabouts.

Clear?
Vi veri veniversum vivus vici
User avatar
G-Do
Garter
Garter
 
Posts: 38
Joined: Mon Oct 02, 2006 12:23 pm
Location: Philadelphia, PA; USA

Postby mith on Wed Oct 18, 2006 3:42 am

Thank you very much! I figured out how to calculate it(it was due monday lol) but your explanation really helped me understand why I was doing it like that..

btw, we used log base 2, is there any significant difference if ln is used?
Living one day at a time;
Enjoying one moment at a time;
Accepting hardships as the pathway to peace;
~Niebuhr
User avatar
mith
Inland Taipan
Inland Taipan
 
Posts: 4504
Joined: Thu Jan 20, 2005 8:14 pm
Location: Berkeley, CA

Postby G-Do on Thu Oct 19, 2006 1:26 pm

Hi mith,

The KL divergence is a measure of how much information your model gives you about the data you are modeling. In its original conception, it was designed with log-base-2, because information (in information theory) is measured in bits. When the statistical biologists got their hands on it, they changed it to the natural log because it makes the math for certain statistics (which I can't remember now) easier.
Vi veri veniversum vivus vici
User avatar
G-Do
Garter
Garter
 
Posts: 38
Joined: Mon Oct 02, 2006 12:23 pm
Location: Philadelphia, PA; USA


Return to Bioinformatics

Who is online

Users browsing this forum: No registered users and 0 guests