
regression analysis


Thesaurus
Noun 1. regression analysis - the use of regression to make quantitative predictions of one variable from the values of another
  • statistics - a branch of applied mathematics concerned with the collection and interpretation of quantitative data and the use of probability theory to estimate population parameters
  • multivariate analysis - a generic term for any statistical technique used to analyze data from more than one variable
  • regression toward the mean, simple regression, statistical regression, regression - the relation between selected values of x and observed values of y (from which the most probable value of y can be predicted for any value of x)
  • regression equation, regression of y on x - the equation representing the relation between selected values of one variable (x) and observed values of the other (y); it permits the prediction of the most probable values of y
  • regression curve, regression line - a smooth curve fitted to the set of paired data in regression analysis; for linear regression the curve is a straight line

regression analysis

[ri′gresh·ən ə‚nal·ə·səs] (statistics) The description of the nature of the relationship between two or more variables; it is concerned with the problem of describing or estimating the value of the dependent variable on the basis of one or more independent variables.

Regression Analysis

 

the branch of mathematical statistics that encompasses practical methods of studying a regression relation between variables on the basis of statistical data. The purposes of regression analysis include the determination of the general form of a regression equation, the construction of estimates of unknown parameters occurring in a regression equation, and the testing of statistical regression hypotheses.

When the relationship between two variables is studied on the basis of the observed values (x1, y1),…, (xn, yn) in accordance with regression theory, it is assumed that one of the variables, Y, has a certain probability distribution when the value of the other variable is fixed as x. This probability distribution is such that

E(Y | x) = g(x, β)

D(Y | x) = σ²h²(x)

where β denotes the set of unknown parameters that determine the function g(x), and h(x) is a known function of x—for example, it may have the constant value one. The choice of a regression model is determined by the assumptions regarding the form of the dependence of g(x, β) on x and β. The most natural model, from the standpoint of a unified method of estimating the unknown parameters β, is the regression model that is linear in β:

g(x, β) = β₀g₀(x) + … + βₖgₖ(x)
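
As an illustration only (the basis functions g₀, g₁, g₂ and the simulated data below are invented for the example), a model that is linear in β can be fitted by least squares once a design matrix of basis-function values is formed:

import numpy as np

# Hypothetical basis functions for a model linear in beta:
# g(x, beta) = beta0*1 + beta1*x + beta2*sin(x)
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: np.sin(x)]

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + 1.5 * np.sin(x) + rng.normal(0.0, 0.3, x.size)  # simulated observations

# Design matrix: one column per basis function evaluated at the observed x values
G = np.column_stack([g(x) for g in basis])

# Least-squares estimate of beta (beta_hat minimizes ||y - G @ beta||^2)
beta_hat, *_ = np.linalg.lstsq(G, y, rcond=None)
print("estimated coefficients:", beta_hat)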

Different assumptions may be made regarding the values of the variable x, depending on the nature of the observations and the aims of the analysis. In order to establish the relationship between the variables in an experiment, a model is used that is based on simplified but plausible assumptions. These assumptions are that the variable x is a controllable variable, whose value is assigned during the design of the experiment, and that the observed values of y are expressed in the form

yᵢ = g(xᵢ, β) + εᵢ,   i = 1, …, n

where the quantities εᵢ describe the errors. The errors are assumed to be independent across different measurements and to be identically distributed with zero mean and constant variance σ². The case where x is an uncontrollable variable differs in that the observed values (x₁, y₁), …, (xₙ, yₙ) constitute a sample from a bivariate population. In either case, the regression analysis is performed in the same way; the interpretation of the results, however, differs substantially. If both variables are random, the relationship between them is studied by the methods of correlation analysis.

A preliminary idea of the nature of the relation between g(x) and x can be obtained by plotting the points (xᵢ, ȳ(xᵢ)) in a scatter diagram, which is also called a correlation field when both variables are random. Here ȳ(xᵢ) is the arithmetic mean of the values of y that correspond to the fixed value xᵢ. For example, if the points fall near a straight line, a linear regression can be used as the approximation.
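
A minimal sketch of this preliminary check, using invented paired observations: the conditional means ȳ(xᵢ) are computed for each distinct xᵢ and inspected to see whether they fall roughly on a straight line.

import numpy as np

# Invented paired observations (x_i, y_i); several y values per distinct x
x = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype=float)
y = np.array([2.1, 1.9, 4.2, 3.8, 5.9, 6.1, 8.2, 7.8, 9.9, 10.3])

# Arithmetic mean of y at each fixed value of x (the points of the scatter diagram)
for xi in np.unique(x):
    print(xi, y[x == xi].mean())
# If the printed points lie roughly on a straight line, a linear regression
# g(x) = beta0 + beta1*x is a plausible approximation.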

The standard method of estimating the regression line is based on the polynomial model (m ≥ 1):

g(x, β) = β₀ + β₁x + … + βₘxᵐ

One reason for the choice of this model is that every function continuous on some interval can be approximated by a polynomial to any desired degree of accuracy. The unknown regression coefficients β₀, …, βₘ and the unknown variance σ² are estimated by the method of least squares. The estimates β̂₀, …, β̂ₘ of the parameters β₀, …, βₘ obtained by this method are called the sample regression coefficients, and the equation

ŷ(x) = β̂₀ + β̂₁x + … + β̂ₘxᵐ

defines what is called the sample regression line. If the observed values are assumed to be normally distributed, this method leads to estimates of β₀, …, βₘ and of σ² that coincide with the estimates obtained by the maximum likelihood method. The estimates obtained by the least squares method are in some sense best estimates even when the distribution is not normal. Thus, if the linear regression hypothesis g(x, β) = β₀ + β₁x is to be tested, the least squares estimates are

β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²,   β̂₀ = ȳ − β̂₁x̄

where x̄ and ȳ are the arithmetic means of the xᵢ and yᵢ. The estimate ĝ(x) = β̂₀ + β̂₁x is an unbiased estimate of g(x), and its variance is less than the variance of any other linear estimate. Under the assumption that the yᵢ have a normal distribution, the accuracy of the constructed sample regression equation can be checked, and hypotheses on the parameters of the regression model tested, most effectively. In this case, the construction of confidence intervals for the true regression coefficients β₀, …, βₘ and the testing of the hypothesis that no regression relationship exists (βᵢ = 0, i = 1, …, m) are carried out by means of Student's distribution.
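
As an illustration of these formulas, the following sketch (simulated data; numpy and scipy are assumed to be available, and all variable names are chosen for the example) computes β̂₀ and β̂₁, an unbiased estimate of σ², and a Student's-t confidence interval and significance test for β₁:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 0.8 * x + rng.normal(0.0, 0.5, x.size)   # simulated y_i = beta0 + beta1*x_i + eps_i

n = x.size
x_bar, y_bar = x.mean(), y.mean()

# Least-squares (sample) regression coefficients
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# Unbiased estimate of sigma^2 from the residuals (n - 2 degrees of freedom)
resid = y - (b0 + b1 * x)
s2 = np.sum(resid ** 2) / (n - 2)

# Standard error of b1, Student's-t confidence interval, and test of beta1 = 0
se_b1 = np.sqrt(s2 / np.sum((x - x_bar) ** 2))
t_crit = stats.t.ppf(0.975, df=n - 2)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print("b0 =", b0, "b1 =", b1)
print("95% CI for beta1:", ci, " p-value for beta1 = 0:", p_value)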

In a more general situation, the observed values y1,…,yn are regarded as values of independent random variables with identical variances and the mathematical expectations

Eyᵢ = β₁x₁ᵢ + … + βₖxₖᵢ,   i = 1, …, n

where the values of the xⱼᵢ, j = 1, …, k, are assumed known. This form of the linear regression model is general in the sense that higher-order models in the variables x₁, …, xₖ reduce to it. Moreover, certain models that are nonlinear in β can also be reduced to this linear form by a suitable transformation.
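
For instance, a model such as y = a·exp(bx), which is nonlinear in its parameters, becomes linear after taking logarithms: ln y = ln a + bx. A brief sketch with made-up data:

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.5, 5.0, 40)
y = 2.0 * np.exp(0.7 * x) * rng.lognormal(0.0, 0.05, x.size)  # multiplicative noise keeps y > 0

# ln y = ln a + b*x is linear in (ln a, b), so ordinary least squares applies
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
a_hat, b_hat = np.exp(coef[0]), coef[1]
print("a =", a_hat, "b =", b_hat)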

Regression analysis is one of the most widespread methods of processing the results of observations made during the study of relationships in such fields as physics, biology, economics, and engineering. Such branches of mathematical statistics as analysis of variance and the design of experiments are also based on regression analysis. Regression analysis models are widely used in multivariate statistical analysis.


A. V. PROKHOROV

regression analysis

In statistics, a mathematical method of modeling the relationships among two or more variables. It is used to predict the value of one variable given the values of the others. For example, a model might estimate sales based on age and gender. A regression analysis yields an equation that expresses the relationship. See correlation.
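
A rough sketch of the sales example (all figures and the 0/1 gender coding are invented for illustration): the gender indicator is added alongside age and both coefficients are estimated together by least squares.

import numpy as np

# Invented records: (age, gender coded 0/1, observed sales)
age    = np.array([22, 35, 47, 29, 53, 41, 38, 60], dtype=float)
gender = np.array([0, 1, 0, 1, 1, 0, 1, 0], dtype=float)
sales  = np.array([120, 180, 150, 160, 210, 140, 190, 170], dtype=float)

# sales ~ intercept + b1*age + b2*gender, fitted by least squares
X = np.column_stack([np.ones_like(age), age, gender])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
print("intercept, age effect, gender effect:", coef)

# Predict sales for a hypothetical 30-year-old with gender code 1
print("prediction:", coef @ np.array([1.0, 30.0, 1.0]))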

regression analysis


re·gres·sion a·nal·y·sis

the statistical method of finding the "best" mathematical model to describe one variable as a function of another.

re·gres·sion a·nal·y·sis

(rĕ-gresh′ŭn ă-nal′i-sis) The statistical method of finding the "best" mathematical model to describe one variable as a function of another.

regression analysis

(in statistics) a test in which the size of one variable (the dependent variable) is dependent on another (the independent variable). Often the relationship is a linear one, enabling a line of best fit to be drawn on a SCATTER DIAGRAM.

Legal

See Regress.

regression analysis


Regression analysis

A statistical technique that can be used to estimate relationships between variables.

Regression Analysis

In statistics, the analysis of variables that are dependent on other variables. Regression analysis often uses regression equations, which show the value of a dependent variable as a function of an independent variable. For example, a regression could take the form:

y = a + bx

where y is the dependent variable and x is the independent variable. In this case, b is the slope and a is the intercept. When plotted on a graph, y is determined by the value of x. Regression equations are charted as a line and are important in analyzing economic data and stock prices.
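
A minimal sketch of fitting such an equation with arbitrary data (np.polyfit is one convenient way to obtain the slope b and intercept a):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b, a = np.polyfit(x, y, 1)      # degree-1 fit returns [slope, intercept]
print("y =", a, "+", b, "* x")
print("predicted y at x = 6:", a + b * 6.0)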

regression analysis

The measurement of change in one variable that is the result of changes in other variables. Regression analysis is used frequently in an attempt to identify the variables that affect a certain stock's price.

regression analysis

a statistical technique for estimating the equation which best fits sets of observations of dependent variables and independent variables, so generating the best estimate of the true underlying relationship between these variables. From this estimated equation it is then possible to forecast what the (unknown) dependent variable will be for a given value of the (known) independent variable.

Regression analysis is used in SALES FORECASTING to estimate trend lines in time series analysis and causal links with variables which affect sales; and in ECONOMETRICS to measure relationships between economic variables.
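
For example, a sales trend line can be estimated by regressing sales on a time index and extrapolating one period ahead; a small sketch with invented quarterly figures:

import numpy as np

sales = np.array([102, 108, 115, 119, 127, 131, 138, 145], dtype=float)  # invented quarterly sales
t = np.arange(1, sales.size + 1, dtype=float)                            # time index 1..8

slope, intercept = np.polyfit(t, sales, 1)   # linear trend: sales ~ intercept + slope*t
next_quarter = intercept + slope * (t[-1] + 1)
print("trend per quarter:", slope, " forecast for quarter 9:", next_quarter)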

Fig. 168 Regression analysis.

regression analysis

a statistical technique used in ECONOMETRICS for estimating the EQUATION that best fits sets of observations of DEPENDENT VARIABLES and INDEPENDENT VARIABLES, so generating the best estimate of the true underlying relationship between these variables. From this estimated equation, it is then possible to predict what the (unknown) dependent variable(s) will be for a given value of the (known) independent variable(s).

Taking the simplest example of a linear equation with just one independent variable and a dependent variable (disposable income and consumption expenditure), the problem is to fit a straight line to a set of data consisting of pairs of observations of income (Y) and consumption (C). Fig. 168 shows such a set of paired observations plotted in graph form, and we need to find the equation of the line that provides the best possible fit to our data, for this line will yield the best predictions of the dependent variable. The line of best fit to the data is chosen so that the sum of the squares of the vertical deviations (distances) between the points and the line is a minimum. This method of ordinary least squares is applied in most regressions. The goodness of fit of the regression line to the sample observations is measured by the CORRELATION COEFFICIENT.
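
A brief sketch of this criterion with invented income and consumption pairs, showing the sum of squared vertical deviations that ordinary least squares minimizes and the correlation coefficient as a measure of goodness of fit:

import numpy as np

income      = np.array([5, 8, 10, 12, 15, 18, 20, 25], dtype=float)  # invented data (thousands of pounds)
consumption = np.array([5.5, 8.1, 9.8, 11.5, 13.9, 16.8, 18.5, 23.0])

b, a = np.polyfit(income, consumption, 1)          # line minimizing the sum of squared vertical deviations
fitted = a + b * income
sum_sq_dev = np.sum((consumption - fitted) ** 2)   # the quantity ordinary least squares minimizes

r = np.corrcoef(income, consumption)[0, 1]         # correlation coefficient: goodness of fit of the line
print("a =", a, "b =", b, "sum of squared deviations =", sum_sq_dev, "r =", r)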

In arithmetic terms the line depicted in Fig. 168 is a linear equation of the form:

C = a + bY

where the coefficients of the equation, a and b, are estimates (based on the sample observations) of the true population parameters. These constants, a and b, obtained by the method of ordinary least squares, are called the estimated regression coefficients; once their numerical values have been determined, they can be used to predict values of the dependent variable (consumption, C) from values of the independent variable (income, Y). For example, if the estimated regression coefficients a and b were 1,000 and 0.9 respectively, then the regression equation would be C = 1,000 + 0.9Y, and we could predict that for a disposable income of £10,000, consumer expenditure would be:

C = 1,000 + 0.9 × 10,000 = £10,000

The regression coefficient of the slope of the linear regression, b, is particularly important in economics for it shows the change in the dependent variable (here consumption) associated with a unit change in the independent variable (here income). For example, in this case a b value of 0.9 suggests that consumers will spend 90% of any extra disposable income.

The regression equation will not provide an exact prediction of the dependent variable for any given value of the independent variable, because the regression coefficients estimated from sample observations are merely the best estimate of the true population parameters and are subject to chance variations.

To acknowledge the imperfections in any estimated regression equation based on a sample in depicting the true underlying relationship in the population as a whole, the regression equation is generally written as:

C = a + bY + e

with the addition of a residual or error term, e, to reflect the residual effect of chance variations and the effects of other independent variables (such as, say, interest rates on consumer credit) that influence consumption spending but are not explicitly included in the regression equation.

Where it is felt that more than one independent variable has a significant effect upon the dependent variable, the technique of multiple linear regression will be employed. The technique involves formulating a multiple linear regression equation involving two or more independent variables, such as:

C = a + bY + dI

where I is the interest rate on consumer credit and d is an additional regression coefficient attached to this extra independent variable. Estimation of this multiple linear regression equation by the method of ordinary least squares involves fitting a three-dimensional plane or surface to a set of sample observations of consumer spending, disposable income and interest rates in such a way as to minimize the squared deviations of the observations from the plane.

In arithmetic terms, the sample observations can be used to generate numerical estimates of the three regression coefficients (a, b and d) in the above equation. See FORECASTING.
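
A sketch of such an estimation (the sample figures below are invented): ordinary least squares fits a plane to the observations of consumption, income and the interest rate and returns numerical estimates of a, b and d.

import numpy as np

# Invented sample observations: disposable income Y, interest rate I, consumption C
Y = np.array([10, 12, 15, 18, 20, 22, 25, 28], dtype=float)   # thousands of pounds
I = np.array([8.0, 7.5, 7.0, 6.5, 6.0, 6.5, 5.5, 5.0])        # percent
C = np.array([9.8, 11.5, 14.2, 17.0, 19.1, 20.5, 23.8, 26.9])

# C = a + b*Y + d*I + e, estimated by ordinary least squares (fitting a plane to the observations)
X = np.column_stack([np.ones_like(Y), Y, I])
(a, b, d), *_ = np.linalg.lstsq(X, C, rcond=None)
print("a =", a, "b =", b, "d =", d)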

regression analysis



Words related to regression analysis

noun the use of regression to make quantitative predictions of one variable from the values of another

Related Words

  • statistics
  • multivariate analysis
  • regression toward the mean
  • simple regression
  • statistical regression
  • regression
  • regression equation
  • regression of y on x
  • regression curve
  • regression line