Sample, Small
a statistical sample of such a small size n that the simple classical formulas, which are valid only asymptotically as n → ∞, cannot be applied to it. The distinctive features of statistical estimation of parameters from small samples are most easily understood using the example of a normal distribution, for which samples of size n ≤ 30 are usually considered small. Suppose it is necessary to estimate the unknown mean a of a normal population with unknown variance σ² using a sample x1, x2, … , xn from this population. We denote

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$

In estimating a, we proceed from the fact that the probability distribution of the variable

$$t = \frac{\sqrt{n}\,(\bar{x} - a)}{s}$$

is independent of a and σ.
The probability ω that the inequality −tω < t < tω holds, and hence that the equivalent inequalities

$$\bar{x} - t_\omega \frac{s}{\sqrt{n}} < a < \bar{x} + t_\omega \frac{s}{\sqrt{n}} \tag{1}$$

hold, is calculated using the formula

$$\omega = \int_{-t_\omega}^{t_\omega} s(t, n-1)\,dt \tag{2}$$

where s(t, n − 1) is the probability density of the Student distribution with n − 1 degrees of freedom. By determining the corresponding tω for given n and ω (0 < ω < 1), which may be done, for example, by using tables, we obtain rule (1) for finding the confidence limits for a with confidence level ω.
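Where tables are not at hand, equation (2) can also be solved for tω numerically. The following is a minimal sketch using only the Python standard library; the function names, the use of Simpson's rule, and the bisection search are illustrative choices, not part of the original article, and the sample at the end is made-up data.

```python
import math


def student_density(t, df):
    """Density s(t, df) of the Student distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def central_probability(t_limit, df, steps=4000):
    """Integral of s(t, df) over (-t_limit, t_limit) by Simpson's rule."""
    h = t_limit / steps  # steps must be even for Simpson's rule
    total = student_density(0.0, df) + student_density(t_limit, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * student_density(k * h, df)
    # Factor 2 because the density is symmetric about zero.
    return 2 * total * h / 3


def t_omega(omega, n):
    """Solve equation (2) for t_omega, given the sample size n."""
    df = n - 1
    lo, hi = 0.0, 1.0
    while central_probability(hi, df) < omega:  # bracket the root
        hi *= 2
    for _ in range(40):  # bisection
        mid = (lo + hi) / 2
        if central_probability(mid, df) < omega:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_limits(sample, omega):
    """Rule (1): limits for the mean a at confidence level omega."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    half_width = t_omega(omega, n) * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical sample of five observations, for illustration only.
sample = [4.9, 5.3, 5.1, 4.7, 5.0]
print(confidence_limits(sample, 0.99))
```

In practice a statistical library would normally supply the Student quantile directly; the explicit integration here is meant only to make the relation between ω and tω visible.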
For large n, equation (2), which relates ω and tω, can be replaced by the approximate formula

$$\omega \approx \frac{1}{\sqrt{2\pi}} \int_{-t_\omega}^{t_\omega} e^{-t^2/2}\,dt \tag{3}$$
This formula is sometimes incorrectly used to determine tω for small n, which leads to gross errors. For example, for ω = 0.99, formula (3) gives t0.99 = 2.58. The true values of t0.99 for small n are given in Table 1.
Table 1. Values of t0.99 for small n

n | 2 | 3 | 4 | 5 | 10 | 20 | 30 |
---|---|---|---|---|---|---|---|
t0.99 | 63.66 | 9.92 | 5.84 | 4.60 | 3.25 | 2.86 | 2.76 |
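The gap between the normal approximation and the tabulated values can be made explicit with a short calculation. The sketch below assumes Python's standard `statistics.NormalDist` for the normal quantile; the dictionary simply copies the entries of Table 1, and the variable names are illustrative.

```python
from statistics import NormalDist

omega = 0.99

# Under formula (3), t_omega is the (1 + omega) / 2 quantile
# of the standard normal distribution.
t_normal = NormalDist().inv_cdf((1 + omega) / 2)

# Values of t_0.99 copied from Table 1 (sample size n -> t_0.99).
table_1 = {2: 63.66, 3: 9.92, 4: 5.84, 5: 4.60, 10: 3.25, 20: 2.86, 30: 2.76}

for n, t_true in table_1.items():
    ratio = t_true / t_normal
    print(f"n = {n:2d}: t = {t_true:5.2f}, "
          f"normal approximation = {t_normal:.2f}, ratio = {ratio:.2f}")
```

The ratio shows by what factor the confidence interval would be too narrow if the normal value were used for a sample of the given size.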
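If formula (3) is used for n = 5, we may conclude that the inequality

$$\bar{x} - 2.58\,\frac{s}{\sqrt{5}} < a < \bar{x} + 2.58\,\frac{s}{\sqrt{5}}$$

is satisfied with probability 0.99. In fact, in the case of five observations, the probability of this inequality is only 0.94, while the inequality

$$\bar{x} - 4.60\,\frac{s}{\sqrt{5}} < a < \bar{x} + 4.60\,\frac{s}{\sqrt{5}}$$

has probability 0.99, as can be seen from Table 1.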
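The two probabilities quoted for n = 5 can be checked empirically by simulation. The following is a rough Monte Carlo sketch, not an exact calculation: the values a = 10 and σ = 3, the seed, and the number of trials are arbitrary assumptions, and since the distribution of t does not depend on a and σ, other choices should give similar fractions.

```python
import math
import random


def coverage(t_limit, n, a, sigma, trials, rng):
    """Fraction of simulated normal samples for which the interval
    x_bar +/- t_limit * s / sqrt(n) contains the true mean a."""
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(a, sigma) for _ in range(n)]
        mean = sum(xs) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
        if abs(mean - a) < t_limit * s / math.sqrt(n):
            hits += 1
    return hits / trials


rng = random.Random(12345)  # fixed seed so that the run is repeatable

# Arbitrary illustrative population parameters: a = 10, sigma = 3.
cov_normal_rule = coverage(2.58, 5, 10.0, 3.0, 50_000, rng)
cov_student_rule = coverage(4.60, 5, 10.0, 3.0, 50_000, rng)

print(f"limit 2.58: coverage about {cov_normal_rule:.3f}")
print(f"limit 4.60: coverage about {cov_student_rule:.3f}")
```

With this many trials the simulated fractions fluctuate by roughly a tenth of a percent, which is enough to distinguish 0.94 from 0.99.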
Similar methods of estimating the parameters of multivariate distributions (for example, correlation coefficients) using small samples have been developed.
REFERENCES
Cramér, H. Matematicheskie metody statistiki. Moscow, 1948. (Translated from English.)
Kolmogorov, A. N. “Opredelenie tsentra rasseivaniia i mery tochnosti po ogranichennomu chislu nabliudenii.” Izv. AN SSSR: Seriia matematicheskaia, 1942, vol. 6, nos. 1-2.
Bol’shev, L. N., and N. V. Smirnov. Tablitsy matematicheskoi statistiki. Moscow, 1965.
IU. V. PROKHOROV