I was at the Wikipedia site the other day, looking up properties of the chi-square distribution. I noticed that the formula for the median of the chi-square distribution with d degrees of freedom is given as ≈ d(1-2/(9d))^3. However, there is no mention of how well this formula approximates the true value of the median, nor is there a reference. (Please post a comment if you know where this formula is derived.)
The formula is a model for the value of the median. As such, it contains modeling error. By using the SAS/IML language or the SAS DATA step, you can compare this model to a direct numerical computation of the median.
As John D. Cook discussed, the error introduced by a model (the formula) can be orders of magnitude greater than the approximation error from a finite precision numerical computation. Let's compare the formula to a numerical computation of the median for a range of degrees of freedom. The median is the 0.5 quantile, so you can use the QUANTILE function to compute the "true" median. ("True" is in quotes because the computation is itself an approximation, but the approximation error should be small.) The following program computes the difference between the median values of the chi-square distribution as returned by the QUANTILE function and as computed by using the Wikipedia formula:
proc iml;
d = do(0.1, 0.3, 0.005) || do(0.4, 6, 0.1);  /* range of degrees of freedom */
q = quantile("chisquare", 0.5, d);            /* "true" computation */
formula = d#(1-2/(9*d))##3;                   /* approximation from the formula */
diff = q - formula;
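The same comparison can be sketched in the SAS DATA step, which was mentioned earlier as an alternative to SAS/IML. This is only a minimal sketch: the data set name MedianCompare, the evenly spaced grid, and the axis labels are illustrative choices, and the PROC SGPLOT call is one plausible way to draw a difference plot, not necessarily how the graphs in this post were produced.

```sas
/* Hedged sketch: DATA step version of the comparison.
   MedianCompare and the 0.05 grid spacing are illustrative assumptions. */
data MedianCompare;
   do d = 0.1 to 6 by 0.05;
      q       = quantile("chisquare", 0.5, d);  /* numerical ("true") median */
      formula = d * (1 - 2/(9*d))**3;           /* approximation from the formula */
      diff    = q - formula;                    /* same sign convention as the IML program */
      output;
   end;
run;

/* Plot the difference rather than the two curves themselves */
proc sgplot data=MedianCompare;
   series x=d y=diff;
   refline 0 / axis=y;                          /* zero line marks perfect agreement */
   xaxis label="Degrees of freedom (d)";
   yaxis label="QUANTILE median minus formula";
run;
```

The reference line at zero makes it easy to see where the formula overshoots or undershoots the numerical median.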
I've plotted the results. In the top graph, the "true" and approximate medians look identical except when the degree of freedom, d, is very small (for example, d < 0.125). However, there is a general rule to follow when you are trying to understand how two similar quantities differ: Do not plot the quantities themselves (the top graph), but rather plot the difference between the quantities (the bottom graph).
When you plot the difference, you get a better sense for the approximation. The formula appears to be biased for d > 1: over the plotted range, the formula is always greater than the true median, so the difference q - formula is negative.
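One value of d allows a check by hand, which is a useful sanity test for the plot. The chi-square distribution with d = 2 is an exponential distribution with mean 2, so its median has a closed form. The symbols below follow the program: q is the exact median and "formula" is the approximation.

```latex
\begin{align*}
F(x) &= 1 - e^{-x/2} = \tfrac{1}{2}
  \;\Longrightarrow\; q = 2\ln 2 \approx 1.3863, \\
\text{formula} &= 2\left(1 - \tfrac{2}{9\cdot 2}\right)^{3}
  = 2\left(\tfrac{8}{9}\right)^{3} = \tfrac{1024}{729} \approx 1.4047, \\
q - \text{formula} &\approx -0.0184 .
\end{align*}
```

So at d = 2 the formula is off by about 0.018, or roughly 1.3% of the true median.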
Is the formula of any value? Yes. It is a model, and models are good for showing us the big picture. The formula shows that the median increases roughly linearly with the degrees of freedom. The formula is also useful for understanding the asymptotic value of the median: For large degrees of freedom, d, the formula indicates that the median is approximately d - 2/3. (I've used the approximation (1-x)^3 ≈ 1-3x for small x.)
Incidentally, the formula is a Laurent polynomial, which means that it includes both positive and negative powers of d. Laurent polynomials can be useful in numerical analysis, but you don't see them as often as their better-known cousin, the Taylor polynomial. A Laurent polynomial can capture the behavior of a function near zero (where Taylor polynomials also succeed) and far from zero (where Taylor polynomials often fail).
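Expanding the cube makes both of the last two remarks concrete. With x = 2/(9d), the binomial expansion of (1-x)^3 gives the formula as an explicit sum of powers of d:

```latex
\begin{align*}
d\left(1 - \frac{2}{9d}\right)^{3}
  &= d\left(1 - 3\cdot\frac{2}{9d} + 3\cdot\frac{4}{81d^{2}} - \frac{8}{729d^{3}}\right) \\
  &= d - \frac{2}{3} + \frac{4}{27d} - \frac{8}{729d^{2}} .
\end{align*}
```

The first two terms are the large-d approximation d - 2/3, and the last two terms are the negative powers of d that make this a Laurent polynomial rather than an ordinary polynomial.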