Log sum inequality

The log sum inequality is used for proving theorems in information theory.

Statement

Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be nonnegative numbers. Denote the sum of all $a_i$ by $a$ and the sum of all $b_i$ by $b$. The log sum inequality states that

$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \geq a \log \frac{a}{b},$$

with equality if and only if the ratios $\frac{a_i}{b_i}$ are equal for all $i$, in other words $a_i = c\,b_i$ for some constant $c$ and all $i$.[1]

(Take $a_i \log \frac{a_i}{b_i}$ to be $0$ if $a_i = 0$, and $\infty$ if $a_i > 0$ and $b_i = 0$. These are the limiting values obtained as the relevant number tends to $0$.)[1]
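
As a quick numerical illustration (not part of the original statement), the following Python sketch evaluates both sides of the inequality for an arbitrary pair of nonnegative vectors, using the $0$ and $\infty$ conventions above. The helper names log_sum_lhs and log_sum_rhs and the example values are made up for the demonstration.

    import math

    def log_sum_lhs(a, b):
        """Left-hand side: sum_i a_i * log(a_i / b_i), using the stated conventions."""
        total = 0.0
        for ai, bi in zip(a, b):
            if ai == 0:
                continue              # a_i log(a_i / b_i) is taken to be 0 when a_i = 0
            if bi == 0:
                return math.inf       # a_i > 0 and b_i = 0 gives +infinity
            total += ai * math.log(ai / bi)
        return total

    def log_sum_rhs(a, b):
        """Right-hand side: a * log(a / b), where a = sum(a_i) and b = sum(b_i)."""
        sa, sb = sum(a), sum(b)
        if sa == 0:
            return 0.0
        if sb == 0:
            return math.inf
        return sa * math.log(sa / sb)

    a = [1.0, 2.0, 3.0]                            # arbitrary nonnegative example values
    b = [2.0, 1.0, 4.0]
    print(log_sum_lhs(a, b) >= log_sum_rhs(a, b))  # True

    # Equality case: a_i = c * b_i for all i.
    c = 0.5
    a_eq = [c * bi for bi in b]
    print(math.isclose(log_sum_lhs(a_eq, b), log_sum_rhs(a_eq, b)))  # True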

Proof

Notice that after setting $f(x) = x \log x$ we have

$$\begin{aligned}
\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i}
  &= \sum_{i=1}^{n} b_i f\!\left(\frac{a_i}{b_i}\right)
   = b \sum_{i=1}^{n} \frac{b_i}{b}\, f\!\left(\frac{a_i}{b_i}\right) \\
  &\geq b\, f\!\left(\sum_{i=1}^{n} \frac{b_i}{b} \frac{a_i}{b_i}\right)
   = b\, f\!\left(\frac{1}{b} \sum_{i=1}^{n} a_i\right)
   = b\, f\!\left(\frac{a}{b}\right) \\
  &= a \log \frac{a}{b},
\end{aligned}$$

where the inequality follows from Jensen's inequality since $\frac{b_i}{b} \geq 0$, $\sum_{i=1}^{n} \frac{b_i}{b} = 1$, and $f$ is convex.[1]
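
The Jensen step is the heart of the proof. A minimal Python check of that single step, assuming strictly positive $b_i$ so that no limiting conventions are needed, might look as follows; the variable names and example values are illustrative only.

    import math

    def f(x):
        # f(x) = x log x is convex on [0, inf), with f(0) = 0 by the limiting convention
        return 0.0 if x == 0 else x * math.log(x)

    a = [1.0, 2.0, 3.0]        # illustrative values; all b_i > 0 here
    b = [2.0, 1.0, 4.0]
    B = sum(b)
    w = [bi / B for bi in b]   # nonnegative weights summing to 1

    # Jensen's inequality for convex f: sum_i w_i f(x_i) >= f(sum_i w_i x_i),
    # applied to x_i = a_i / b_i and then scaled by B = sum(b_i).
    lhs = B * sum(wi * f(ai / bi) for wi, ai, bi in zip(w, a, b))
    rhs = B * f(sum(wi * (ai / bi) for wi, ai, bi in zip(w, a, b)))
    print(lhs >= rhs)          # True; lhs equals sum_i a_i log(a_i/b_i), rhs equals a log(a/b)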

Generalizations

The inequality remains valid for $n = \infty$ provided that $a < \infty$ and $b < \infty$.[citation needed] The proof above holds for any function $g$ such that $f(x) = x g(x)$ is convex, such as all continuous non-decreasing functions. Generalizations to non-decreasing functions other than the logarithm are given in Csiszár & Shields (2004).

Another generalization is due to Dannan, Neff and Thiel, who showed that if $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are positive real numbers with $a_1 + a_2 + \cdots + a_n = a$ and $b_1 + b_2 + \cdots + b_n = b$, and $k \geq 0$, then
$$\sum_{i=1}^{n} a_i \log\left(\frac{a_i}{b_i} + k\right) \geq a \log\left(\frac{a}{b} + k\right).$$
[2]
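
A hedged numerical sketch of this generalization, for a few arbitrary values of $k$, could look as follows in Python; the helper names dnt_lhs and dnt_rhs and the example vectors are ad hoc choices, not taken from the paper.

    import math

    def dnt_lhs(a, b, k):
        # sum_i a_i * log(a_i / b_i + k)
        return sum(ai * math.log(ai / bi + k) for ai, bi in zip(a, b))

    def dnt_rhs(a, b, k):
        # a * log(a / b + k) with a = sum(a_i), b = sum(b_i)
        return sum(a) * math.log(sum(a) / sum(b) + k)

    a = [1.0, 2.0, 3.0]   # arbitrary positive values
    b = [2.0, 1.0, 4.0]
    for k in (0.0, 0.5, 1.0, 10.0):
        print(k, dnt_lhs(a, b, k) >= dnt_rhs(a, b, k))   # True for every k >= 0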

Applications

The log sum inequality can be used to prove inequalities in information theory. Gibbs' inequality states that the Kullback-Leibler divergence is non-negative, and equal to zero precisely if its arguments are equal.[3] One proof uses the log sum inequality.

Proof[1]
Let $P = (p_i)_{i \in \mathbb{N}}$ and $Q = (q_i)_{i \in \mathbb{N}}$ be pmfs. In the log sum inequality, substitute $n = \infty$, $a_i = p_i$ and $b_i = q_i$ to get
$$\mathbb{D}_{\mathrm{KL}}(P \| Q) \equiv \sum_{i} p_i \log_2 \frac{p_i}{q_i} \geq 1 \cdot \log \frac{1}{1} = 0$$

with equality if and only if $p_i = q_i$ for all $i$ (as both $P$ and $Q$ sum to 1).
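
To make the application concrete, here is a small Python sketch that computes the Kullback-Leibler divergence (in bits) of two short pmfs and checks Gibbs' inequality. The function kl_divergence and the example distributions are illustrative, not a reference implementation.

    import math

    def kl_divergence(p, q):
        """D_KL(P || Q) in bits, with the convention 0 * log(0 / q) = 0."""
        total = 0.0
        for pi, qi in zip(p, q):
            if pi == 0:
                continue
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
        return total

    p = [0.5, 0.25, 0.25]                 # arbitrary pmfs on three outcomes
    q = [0.25, 0.25, 0.5]
    print(kl_divergence(p, q) >= 0)       # True: Gibbs' inequality
    print(math.isclose(kl_divergence(p, p), 0.0))   # equality exactly when P = Q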

The log sum inequality can also be used to prove that the Kullback-Leibler divergence is convex in its arguments.[4]
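
As a purely illustrative check of this convexity claim (joint convexity in the pair $(P, Q)$), the following self-contained Python snippet compares the divergence of a mixture with the mixture of divergences for arbitrarily chosen distributions and mixture weight.

    import math

    def kl(p, q):
        # KL divergence in bits; assumes q_i > 0 wherever p_i > 0
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    p1, q1 = [0.5, 0.5], [0.9, 0.1]        # arbitrary pairs of pmfs
    p2, q2 = [0.2, 0.8], [0.4, 0.6]
    lam = 0.3                              # mixture weight

    p_mix = [lam * x + (1 - lam) * y for x, y in zip(p1, p2)]
    q_mix = [lam * x + (1 - lam) * y for x, y in zip(q1, q2)]

    # Joint convexity: D(p_mix || q_mix) <= lam * D(p1 || q1) + (1 - lam) * D(p2 || q2)
    print(kl(p_mix, q_mix) <= lam * kl(p1, q1) + (1 - lam) * kl(p2, q2))   # True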

Notes

  1. ^ a b c d Cover & Thomas (1991), p. 29.
  2. ^ Dannan, F. M.; Neff, P.; Thiel, C. (2016). "On the sum of squared logarithms inequality and related inequalities" (PDF). Journal of Mathematical Inequalities. 10 (1): 1–17. doi:10.7153/jmi-10-01. S2CID 23953925. Retrieved 12 January 2023.
  3. ^ MacKay (2003), p. 34.
  4. ^ Cover & Thomas (1991), p. 30.

References

  • Cover, Thomas M.; Thomas, Joy A. (1991). Elements of Information Theory. Hoboken, New Jersey: Wiley. ISBN 978-0-471-24195-9.
  • Csiszár, I.; Shields, P. (2004). "Information Theory and Statistics: A Tutorial" (PDF). Foundations and Trends in Communications and Information Theory. 1 (4): 417–528. doi:10.1561/0100000004. Retrieved 2009-06-14.
  • Han, T. S.; Kobayashi, K. (2001). Mathematics of Information and Coding. American Mathematical Society. ISBN 0-8218-0534-7.
  • Information Theory course materials, Utah State University. Retrieved on 2009-06-14.
  • MacKay, David J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1.