Sufficient statistic paradox

A sufficient statistic summarizes a set of data.

If the data come from a known distribution with an unknown parameter, then the sufficient statistic carries as much information about that parameter as the full data [0].

For example, given a set of n coin flips, the number of heads is a sufficient statistic.
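A minimal sketch of why this holds, assuming independent flips with an unknown probability p of heads: the likelihood of a particular sequence is p^k (1 - p)^(n - k), where k is the number of heads, so two sequences with the same k are equally likely whatever p is. The function name and the two sequences below are made up for illustration.

```python
from math import isclose

def bernoulli_likelihood(flips, p):
    """Probability of this exact sequence of flips (1 = heads) when P(heads) = p."""
    likelihood = 1.0
    for flip in flips:
        likelihood *= p if flip == 1 else 1.0 - p
    return likelihood

# Two hypothetical sequences of n = 8 flips, each with 5 heads, in different orders.
seq_a = [1, 1, 1, 1, 1, 0, 0, 0]
seq_b = [0, 1, 0, 1, 1, 0, 1, 1]

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    la = bernoulli_likelihood(seq_a, p)
    lb = bernoulli_likelihood(seq_b, p)
    # Both equal p**5 * (1 - p)**3 up to floating-point rounding.
    print(p, la, lb, isclose(la, lb))
```

Since the order of the flips never changes the likelihood, nothing about p is lost by recording only the count of heads (along with n).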


There is a theorem by Koopman, Pitman, and Darmois that essentially says that useful sufficient statistics exist only if the data come from a class of probability distributions known as exponential families [1].
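For reference, here is a sketch of what an exponential family looks like and why it has a sufficient statistic of fixed size. The notation (h, η, T, A) is the conventional textbook one, not something taken from the sources cited here.

```latex
f(x \mid \theta) = h(x)\, \exp\!\bigl( \eta(\theta)^{\top} T(x) - A(\theta) \bigr),
\qquad
\prod_{i=1}^{n} f(x_i \mid \theta)
  = \Bigl( \prod_{i=1}^{n} h(x_i) \Bigr)
    \exp\!\Bigl( \eta(\theta)^{\top} \sum_{i=1}^{n} T(x_i) - n A(\theta) \Bigr)
```

For n independent observations the parameter enters the likelihood only through the sum of the T(x_i), so that sum is sufficient, and its dimension stays the same no matter how large n gets.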

This leads to what Persi Diaconis [2] labeled a paradox:

“Exponential families are quite a restricted family of measures.

Modern statistics deals with far richer classes of probabilities.

This suggests a kind of paradox.

If statistics is to be of any real use it must provide ways of boiling down great masses of data to a few humanly interpretable numbers.

The Koopman-Pitman-Darmois theorem suggests this is impossible unless nature follows highly specialized laws which no one really believes.”

Diaconis suggests two ways around the paradox.

First, the theorem he refers to concerns sufficient statistics of a fixed size; it doesn’t apply if the summary size varies with the data size.

Second, and more importantly, the theorem says nothing about a summary containing approximately as much information as the full data.

***

[0] Students may naively take this to mean that all you need is sufficient statistics.

No, it says that if you know the distribution then all you need is the sufficient statistics.

You cannot test whether a model fits given sufficient statistics that assume the model.

For example, mean and variance are sufficient statistics assuming data come from a normal distribution.

But knowing only the mean and variance you can’t assess whether assuming a normal distribution makes sense.
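As a rough illustration, the sketch below builds two samples with the same sample mean and standard deviation, one drawn from a normal distribution and one from an exponential distribution. The helper names, seed, and sample size are arbitrary choices made for this example.

```python
import random
import statistics

def rescale(data, target_mean, target_sd):
    """Shift and scale data so its sample mean and standard deviation hit the targets."""
    m = statistics.fmean(data)
    s = statistics.stdev(data)
    return [target_mean + target_sd * (x - m) / s for x in data]

def sample_skewness(data):
    """Simple moment-based skewness; near 0 for symmetric data."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

rng = random.Random(0)  # fixed seed, chosen arbitrarily
normal_data = rescale([rng.gauss(0, 1) for _ in range(10_000)], 0.0, 1.0)
skewed_data = rescale([rng.expovariate(1.0) for _ in range(10_000)], 0.0, 1.0)

for name, data in (("normal", normal_data), ("exponential", skewed_data)):
    print(name,
          round(statistics.fmean(data), 6),
          round(statistics.stdev(data), 6),
          round(sample_skewness(data), 2))
```

Both samples report the same mean and standard deviation, yet the skewness, which needs the full data, should be near 0 for the first and clearly positive for the second. Someone handed only the two summary numbers could not tell the samples apart.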

[1] This does not mean the data have to come from the exponential distribution.

The exponential family of distributions includes the exponential distribution, but many others as well, such as the normal, binomial, Poisson, and gamma distributions.

[2] Persi Diaconis. Sufficiency as Statistical Symmetry. Proceedings of the AMS Centennial Symposium. August 8–12, 1988.
