Kullback–Leibler divergence

Everything on bioinformatics, the science of information technology as applied to biological research.

mith
Inland Taipan
Posts: 5345
Joined: Thu Jan 20, 2005 8:14 pm
Location: Nashville, TN
Contact:

Kullback–Leibler divergence

Can someone give me a sample calculation using the Kullback–Leibler divergence? I'm a bit mystified by all the symbols...
Living one day at a time;
Enjoying one moment at a time;
Accepting hardships as the pathway to peace;
~Niebuhr

G-Do
Garter
Posts: 38
Joined: Mon Oct 02, 2006 12:23 pm
Hi mith,

The context in which I learned KL-distance (or "relative entropy") is where you have a probabilistic model - a probability distribution - and you want to contrast it with the distribution suggested by your data. So for example, you could have a single six-sided loaded die, set up so that the following distribution describes how it rolls:

1: 30%
2: 15%
3: 5%
4: 5%
5: 10%
6: 35%

And if you knew nothing about this die, you might pick it up and assume it was a fair die, in which case you'd expect its distribution to be even:

1: 16.7%
2: 16.7%
3: 16.7%
4: 16.7%
5: 16.7%
6: 16.7%

The first distribution, which we shall denote P, is real - it's what you'd approximate if you began to sample by rolling the die a bunch of times. The second distribution, which we shall denote Q, is a model - a possibly incorrect estimate of the distribution made using all the information we have to date.

This is a toy example, but bear with me!

To determine "how good" the model is, you can look at its relative entropy (K-L divergence, or "distance" — though it isn't a true distance, since it's not symmetric) from the real distribution. To do this, you would use the formula:

Distance = SUM[across all outcomes i, of the quantity P(i) * ln(P(i)/Q(i))]

In other words, the SUM decomposes into six terms, since there are six outcomes (each side of the die). The first term would be 0.3*ln(0.3/0.167), since that is P(1)*ln(P(1)/Q(1)). The whole equation can be written out as:

D = 0.3*ln(0.3/0.167)+0.15*ln(0.15/0.167)+0.05*ln(0.05/0.167)+
0.05*ln(0.05/0.167)+0.10*ln(0.10/0.167)+0.35*ln(0.35/0.167)

Within each ln(), you have something from the actual distribution P divided by the corresponding something from the model distribution Q (and since every probability in Q is 16.7%, the divisor is always 0.167). This sums to about 0.247.
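The sum above is easy to check in a few lines of Python. A small sketch (note it uses exact sixths for Q rather than the rounded 0.167, so the result comes out slightly higher, about 0.249 instead of 0.247):

```python
import math

# Actual distribution P of the loaded die (sides 1 through 6)
p = [0.30, 0.15, 0.05, 0.05, 0.10, 0.35]

# Model distribution Q: the assumed-fair die, 1/6 per side
q = [1 / 6] * 6

# D(P || Q) = SUM over outcomes i of P(i) * ln(P(i) / Q(i))
d = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
print(round(d, 4))  # d ≈ 0.2487
```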

Clear?
Vi veri veniversum vivus vici

mith
Inland Taipan
Posts: 5345
Joined: Thu Jan 20, 2005 8:14 pm
Location: Nashville, TN
Contact:
Thank you very much! I figured out how to calculate it (it was due Monday, lol), but your explanation really helped me understand why I was doing it that way.

btw, we used log base 2; is there any significant difference if ln is used?
Living one day at a time;
Enjoying one moment at a time;
Accepting hardships as the pathway to peace;
~Niebuhr

G-Do
Garter
Posts: 38
Joined: Mon Oct 02, 2006 12:23 pm
Hi mith,

The KL divergence is a measure of how much information your model gives you about the data you are modeling. In its original conception, it was designed with log-base-2, because information (in information theory) is measured in bits. When the statistical biologists got their hands on it, they changed it to the natural log because it makes the math for certain statistics (which I can't remember now) easier.
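Concretely, switching the base only rescales the divergence by a constant factor, since log2(x) = ln(x) / ln(2); the value in bits is the value in nats divided by ln(2). A quick sketch using the die from the earlier post:

```python
import math

p = [0.30, 0.15, 0.05, 0.05, 0.10, 0.35]  # loaded die
q = [1 / 6] * 6                            # assumed-fair die

d_nats = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))   # natural log
d_bits = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))  # log base 2

# Same quantity, different units: bits = nats / ln(2)
print(round(d_nats, 4), round(d_bits, 4))
```

So the choice of base changes the number and its units (bits vs. nats) but not any comparison between models: whichever model is closer in one base is closer in the other.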
Vi veri veniversum vivus vici

baccts
Garter
Posts: 1
Joined: Thu Aug 06, 2009 6:15 pm

Re: Kullback–Leibler divergence

G-Do,

Thanks for the explanation. That is the easiest-to-understand explanation of KL div and thus IMHO the best.

I have a related question. Suppose the die outcomes are continuous, i.e. the outcome can be any number between 1 and 6. How do I calculate the probability in this case?

Thanks
