© 2000 John Petroff 

E- Sensitivity, elasticity and regression analysis

 

Company performance is dictated by economic and market conditions. Consequently, many accounting data series are correlated with outside events or trends. This is especially true of sales revenue, which depends on the number of customers and their ability to buy (i.e. their purchasing power or disposable income). Expense items, such as energy costs, prices of raw materials and availability of the work force, are also affected by economic patterns. In this analytical approach, the direct control of events by management decision is set aside. What is investigated is whether an accounting variable is sensitive to an outside variable. Establishing a correlation between an accounting data series and outside variable(s) can be very informative. Causality is usually not tested and would be very difficult to establish. Just learning that an accounting number changes at the same time as an economic trend is useful in itself. It is possible that the correlation exists because management makes decisions in light of recent changes in the outside variable. The knowledge of a correlation with an outside variable can later be combined with information about management strategy or internal events revealed by ratios. How the mathematical procedures described below are used is demonstrated in Chapter 9 Section E-2 and in Chapter 14 Section E.

Determining whether two series of numbers go up and down together (or in opposite directions) can be done informally on a graph, scatter diagram or historigram. But this is very imprecise. A sensitivity analysis is best performed with statistical methods which are part of econometrics, or more generally statistical inference. A regression analysis is the procedure that tests for the presence of a correlation between variables. It is obviously preferable to have as many empirical observations as possible: for an equation with one exogenous (or independent) variable, a regression may give poor results with just a few observations, say fewer than six, and meaningless results with fewer than four observations. What the results must show is whether or not the estimated equation is capable of generating estimated values of the endogenous variable (i.e. the variable that we want to explain) that are very close to the actual observations. In other words, the quality of the correlation depends on whether the errors between estimated and actual values are small or large.

There can be any number of variables in the equation, but the number of variables has to be smaller than the number of observations (minus one). Having two or more exogenous variables can cause a number of statistical estimation problems. Autocorrelation is the major problem of this procedure; identifying it and dealing with it will be briefly explained below.

A complete mathematical explanation of the derivation of regression estimates is beyond the purpose of this text. A brief outline of the underlying derivation of estimates is presented in Appendix 5A and the main formulas are presented below. In practice, there are many software packages that only require entering data and interpreting results. Links to a large assortment of these regression software packages, excellent econometrics reference textbooks and numerous articles can be found at The Econometrics Journal online at http://www.econ.vu.nl/econometriclinks/.


The calculation can also be done with a calculator or any spreadsheet. An example of a spreadsheet regression is presented in Table T-5.28 in the appendix. One can verify that all the formulas are mathematically correct, and one could copy and use regs.xls in one's own spreadsheet. The values for x, y and n can be substituted at will. To obtain correct results one must copy and paste formulas for as many columns as there are observations, and change the value of n. A very courageous person could even rework all the formulas for more than one exogenous variable. The following will refer to the example detailed in the appendix to illustrate how to interpret regression results.

See review questions Q-5E.1 through Q-5E.4.

1)- Ordinary least squares (OLS)

Let us say we want to test if sales y, the endogenous variable, can be explained by disposable income x, the exogenous variable. The relationship tested is

y_t = a + b x_t + e_t

where a = constant term
b = slope coefficient (referred to in this text as the coefficient of correlation)
e = error term
t = time

for all observations of y and x from t = 1 to t = n

The regression calculates the values of coefficients a and b that minimize the sum of the squared estimated error terms. In other words, error terms (or residuals) e_t are

e_t = y_t - a* - b* x_t

where a* and b* are the estimates of coefficients a and b for which sum(e_t^2) is minimum. The technique of the regression can be observed in Graph G-5.1 of the example below, where the estimated line or fitted line is such that the vertical distance (or deviation) of the fitted line to each observation is as small as possible. Note that the sum of error terms or deviations is null because positive deviations cancel out negative deviations. That is why it is the sum of squared error terms that must be minimized. A null sum of deviations is one of the necessary conditions for obtaining reliable estimates a* and b*.
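For readers who want to see where the null sum of residuals comes from, the following is a brief sketch of the first-order conditions of the minimization, written in LaTeX. The symbol S is introduced here only for illustration and denotes the sum of squared residuals; it does not appear elsewhere in this text.

```latex
\begin{align*}
S(a, b) &= \sum_{t=1}^{n} (y_t - a - b x_t)^2 \\
\frac{\partial S}{\partial a} = 0 &\;\Longrightarrow\;
  \sum_{t=1}^{n} (y_t - a^* - b^* x_t) = \sum_{t=1}^{n} e_t = 0 \\
\frac{\partial S}{\partial b} = 0 &\;\Longrightarrow\;
  \sum_{t=1}^{n} x_t (y_t - a^* - b^* x_t) = \sum_{t=1}^{n} x_t e_t = 0
\end{align*}
```

The first condition is the null sum of residuals; it holds whenever the equation includes the constant term a.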

The value of the estimated coefficient b* is given by

b* = sum(d(y_i) d(x_i)) / sum(d(x_i)^2)

where d(x_i) = x_i - E(x)
d(y_i) = y_i - E(y)
E(x) = sum(x_i)/n
E(y) = sum(y_i)/n

And the value of the coefficient estimate a* is given by

a* = E(y) - b*E(x)
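The two formulas above can be checked with a few lines of Python. The following is a minimal sketch, not the spreadsheet of Table T-5.28: it applies the deviation formulas to the Delta Airlines data of Table T-5.1, and the names ols_fit, gdp and revenue are chosen here for illustration only.

```python
# Data from Table T-5.1: US GDP ($ billions) and Delta revenues ($ millions), 1987-1999.
gdp = [4742.5, 5108.3, 5489.1, 5803.2, 5986.2, 6318.9, 6642.3,
       7054.3, 7400.3, 7813.2, 8300.8, 8759.9, 9256.1]
revenue = [5318, 6915, 8039, 8683, 9171, 10837, 11657,
           12077, 12194, 12455, 13594, 14138, 14711]


def ols_fit(x, y):
    """Return (a_star, b_star) for y = a + b x using the deviation formulas."""
    n = len(x)
    mean_x = sum(x) / n                      # E(x)
    mean_y = sum(y) / n                      # E(y)
    dx = [xi - mean_x for xi in x]           # d(x_i)
    dy = [yi - mean_y for yi in y]           # d(y_i)
    b_star = sum(p * q for p, q in zip(dx, dy)) / sum(p ** 2 for p in dx)
    a_star = mean_y - b_star * mean_x
    return a_star, b_star


a_star, b_star = ols_fit(gdp, revenue)

# Residuals e_t = y_t - a* - b* x_t; their sum should be null up to rounding.
residuals = [y - a_star - b_star * x for x, y in zip(gdp, revenue)]

print(f"a* = {a_star:.0f}, b* = {b_star:.2f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

The estimates should come out close to the values of a* and b* reported in Table T-5.2, and the sum of residuals should be zero up to floating-point rounding.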

To determine if there is a significant correlation between y and x, one must look at the standard deviations of the estimated coefficients (shown in Table T-5.28 as s_b and s_a) and compare them to the estimated coefficient values b* and a*. A rule of thumb is that the standard deviation should be smaller than the absolute value of the coefficient estimate. A more rigorous assessment is conducted with the t statistic, obtained by dividing the coefficient estimate by its standard deviation

t_b = b* / s_b

Tables of t statistic values appear in most econometrics textbooks and are arranged by level of significance and degrees of freedom. Degrees of freedom are calculated as the number of observations minus the number of estimated coefficients, constant term included (n - 2 for the equation above). The level of significance indicates the probability of making an error in believing that the true value of the coefficient is not zero. The smaller this probability (i.e. the more stringent the level of significance), the more confident we can be that we found an actual correlation, but the higher is also the necessary value of the t statistic. In addition, the fewer the degrees of freedom, the higher the required value of the t statistic. This confirms that a large number of observations is better for testing a correlation.
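The t test can be illustrated on the Delta Airlines data of Table T-5.1. The sketch below is an assumption-laden illustration: the formulas for the standard deviations of b* and a* are the usual textbook ones for a one-variable regression (they are not spelled out in this section), and the critical value 2.201 is the two-tailed 5% entry for 11 degrees of freedom found in standard t tables.

```python
import math

# Data from Table T-5.1: US GDP ($ billions) and Delta revenues ($ millions), 1987-1999.
gdp = [4742.5, 5108.3, 5489.1, 5803.2, 5986.2, 6318.9, 6642.3,
       7054.3, 7400.3, 7813.2, 8300.8, 8759.9, 9256.1]
revenue = [5318, 6915, 8039, 8683, 9171, 10837, 11657,
           12077, 12194, 12455, 13594, 14138, 14711]

n = len(gdp)
mean_x = sum(gdp) / n
mean_y = sum(revenue) / n
sxx = sum((x - mean_x) ** 2 for x in gdp)                     # sum(d(x_i)^2)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gdp, revenue))
b_star = sxy / sxx
a_star = mean_y - b_star * mean_x

# Residual variance, using n - 2 degrees of freedom (two estimated coefficients).
ssr = sum((y - a_star - b_star * x) ** 2 for x, y in zip(gdp, revenue))
dof = n - 2
residual_variance = ssr / dof

# Assumed standard formulas for the standard deviations of the estimates.
s_b = math.sqrt(residual_variance / sxx)
s_a = math.sqrt(residual_variance * (1 / n + mean_x ** 2 / sxx))

t_b = b_star / s_b
t_a = a_star / s_a

# Two-tailed 5% critical value for 11 degrees of freedom, from a standard t table.
T_CRIT = 2.201
b_is_significant = abs(t_b) > T_CRIT

print(f"s_b = {s_b:.2f}, t_b = {t_b:.2f}")
print(f"s_a = {s_a:.1f}, t_a = {t_a:.2f}")
print(f"b significant at the 5% level: {b_is_significant}")
```

The results should be close to the s_a, t_a, s_b and t_b values shown in Table T-5.2; small differences in the last digit can come from rounding.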

In general, the correlation of interest is indicated by a meaningful coefficient of correlation, which is the b coefficient. Coefficient a, which is known as the intercept or constant term, is usually less important. Only in rare cases where an estimated minimum value of y is needed is coefficient a studied with care. One method of judging the reliability of both estimates at once is to look at the statistic known as the coefficient of determination, or R^2, which is calculated as

R^2 = 1 - SSR/TD

where SSR = sum of squared residuals (i.e. error terms), sum(e_t^2)
TD = total sum of squared deviations of the y values from their mean E(y), i.e. sum(d(y_i)^2)

The maximum value of R^2 is 1 and the minimum is zero. Generally, R^2 values of less than 0.50 show that the correlation is not very strong. However, in the social sciences, R^2 values as low as 0.25 are sometimes accepted as an indication that some correlation does exist.
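As an illustration of the formula, the following sketch computes R^2 for the Delta Airlines data of Table T-5.1. It assumes that SSR is measured from the fitted line and TD from the mean of y, which is the usual definition; the variable names are chosen here for illustration.

```python
# Data from Table T-5.1: US GDP ($ billions) and Delta revenues ($ millions), 1987-1999.
gdp = [4742.5, 5108.3, 5489.1, 5803.2, 5986.2, 6318.9, 6642.3,
       7054.3, 7400.3, 7813.2, 8300.8, 8759.9, 9256.1]
revenue = [5318, 6915, 8039, 8683, 9171, 10837, 11657,
           12077, 12194, 12455, 13594, 14138, 14711]

n = len(gdp)
mean_x = sum(gdp) / n
mean_y = sum(revenue) / n
b_star = (sum((x - mean_x) * (y - mean_y) for x, y in zip(gdp, revenue))
          / sum((x - mean_x) ** 2 for x in gdp))
a_star = mean_y - b_star * mean_x

# Values of y generated by the estimated equation.
fitted = [a_star + b_star * x for x in gdp]

# SSR: squared distances of the observations from the fitted line.
ssr = sum((y - f) ** 2 for y, f in zip(revenue, fitted))
# TD: squared distances of the observations from their mean E(y).
td = sum((y - mean_y) ** 2 for y in revenue)

r_squared = 1 - ssr / td
print(f"R^2 = {r_squared:.3f}")
```

The value obtained should be close to the 0.93 reported in Table T-5.2. Because SSR can never exceed TD when the equation includes a constant term, R^2 stays between zero and 1.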

For the following example of regression analysis, the revenues of Delta Airlines are tested for any correlation with US GDP from 1987 to 1999. Table T-5.1 below presents the data, Graph G-5.1 under it gives a pictorial view of the relationship, and Table T-5.2 gives the results of an OLS regression. The derivation of the results is further described and clarified in Appendix 5A Section 2.

 Table T-5.1

Delta Airlines Revenues and US GDP
Year    Delta revenues (in $ millions)    US GDP (in $ billions)
1987     5318                             4742.5
1988     6915                             5108.3
1989     8039                             5489.1
1990     8683                             5803.2
1991     9171                             5986.2
1992    10837                             6318.9
1993    11657                             6642.3
1994    12077                             7054.3
1995    12194                             7400.3
1996    12455                             7813.2
1997    13594                             8300.8
1998    14138                             8759.9
1999    14711                             9256.1

Source: Delta Airline Annual Reports 1999 and prior, and Statistical Abstract of the United States 2000.

Graph G-5.1

Table T-5.2

Coefficient estimates of equation: (Revenue) = a + b (GDP)

a*        s_a       t_a      b*      s_b     t_b      R^2
-2707     1089.6    -2.48    1.97    0.16    12.59    0.93

The results of the regression show a strong relationship between Delta Airlines revenues and GDP: the t statistic of b* is 12.59 and R^2 is 0.93. This comes as no surprise since airlines are very sensitive to consumer confidence, which is closely linked to economic prosperity.

 

Ordinary least squares regression will give acceptable results in most sensitivity analyses with one, two or three exogenous variables. The evaluation of the estimated coefficients of several exogenous variables is conducted exactly as for just one variable. But with the introduction of each additional variable, there is more chance that problems may distort results and make one believe that a correlation is present where it is not or, occasionally, give erroneously poor results. The regression method discussed so far is known as ordinary least squares to distinguish it from several more complex techniques which are necessary when estimation problems arise and which will be briefly touched upon next.

See review questions Q-5E1.1 through Q-5E1.4.

See research assignment R-5.4

2)- More complex regression methods than OLS

Technically speaking, the presence of problems means that one or more of the conditions for obtaining best linear unbiased estimators (BLUE) of coefficients is violated, and ordinary least squares should not be used. One of the conditions (i.e. that the sum of residuals be null) was mentioned earlier, another is that residuals be independent from one another (i.e. uncorrelated). A complete discussion of the conditions for obtaining BLUE coefficient estimates is beyond the scope of this manual, and can be found in any econometrics textbook. But an analyst must be aware of the problems that may arise in order not to rely on defective results. In addition, suggestions will indicate what remedies are available if problems are present.

The problems encountered in regressions can arise
- when the variance of the error term is not constant across observations, typically because the magnitudes of the variables are not comparable (e.g. comparing consumption of food to both weights of elephants and weights of ants); this is called heteroscedasticity and requires separate investigations of the variables involved.
- when two of the variables are perfectly correlated (e.g. automobile fuel efficiency is compared to both distances traveled stated in miles and distances traveled stated in kilometers); it is obvious that the superfluous variable must be eliminated, but detecting that two variables are perfectly correlated is not always easy; this problem is known as multicollinearity.
- when one variable or the error term is not independent of its own prior years' values; this is called autocorrelation and is the most troublesome.
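The miles and kilometers case of multicollinearity can be made concrete with a short sketch. The trip distances below are hypothetical numbers invented for illustration, and centered_cross_sum is a helper name chosen here. The sketch relies on the standard result that, with two exogenous variables, the OLS coefficient formulas divide by a quantity that vanishes when the two variables are perfectly correlated.

```python
# Hypothetical trip distances in miles, used only for illustration.
miles = [12.0, 30.5, 7.2, 55.0, 21.3, 40.8]
KM_PER_MILE = 1.609344
km = [m * KM_PER_MILE for m in miles]   # the same distances stated in kilometers


def centered_cross_sum(u, v):
    """Sum of products of deviations from the means: sum(d(u_i) d(v_i))."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))


s11 = centered_cross_sum(miles, miles)
s22 = centered_cross_sum(km, km)
s12 = centered_cross_sum(miles, km)

# Correlation between the two exogenous variables.
correlation = s12 / (s11 * s22) ** 0.5

# With two exogenous variables, the OLS coefficient formulas divide by this
# quantity. It equals s11 * s22 * (1 - correlation**2), so it collapses to zero
# (up to rounding) when the variables are perfectly correlated.
denominator = s11 * s22 - s12 ** 2

print(f"correlation between miles and km: {correlation:.6f}")
print(f"relative size of the denominator: {denominator / (s11 * s22):.2e}")
```

A correlation of 1 between two exogenous variables is therefore a warning that separate coefficients cannot be estimated for them, and that one of the two must be dropped.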

There are also problems in specifying the equation to be estimated. If the variable that affects the endogenous variable most is omitted, the function will have poorly identified variables and the results will not be reliable. A somewhat similar problem stems from the choice of the endogenous variable. Take for instance the case of consumer demand. Is quantity purchased increasing because sellers lower prices, or is price lowered by sellers because purchasers buy larger quantities? Most would answer: probably both. It would not matter much if only these two variables are used. But if we include credit terms, then only one function is appropriate (i.e. purchaser decision on quantity purchased is determined by price and credit terms), the other is misspecified (i.e. seller's price is determined by quantity purchased, as well as credit terms to some extent, but putting the two together is like mixing dogs and cats).

Problems also stem from the empirical data used. One typical data problem is that one observation is totally out of the ordinary (possibly caused by some catastrophe); such an observation is considered an outlier or disturbance, and is usually removed from the series or replaced. It is common for the data set to have missing observations (which researchers accept in order to have as many degrees of freedom as possible because, as one will recall, this is necessary to obtain reliable results). If the missing data cannot be substituted, it may then be necessary to run regressions on separate sets of data. The data may in fact suggest that there isn't one linear trendline, but several separate ones or a nonlinear trendline. Regression is still possible in those cases, as will be outlined later.

Autocorrelation is by far the most common problem and possibly the most serious, because most financial and economic variables are determined by what took place in the past and are therefore not independent of prior years. It is just not possible to avoid it the way it is possible to avoid other problems. The result is that OLS results become unreliable: in particular, the estimated standard deviations of the coefficients, and therefore the t statistics, are biased. Fortunately, there is one method for detecting autocorrelation, and several techniques for overcoming it. Autocorrelation is detected with the Durbin-Watson statistic presented in the appendix. There are several regression techniques to deal with autocorrelation. They are
- generalized least squares where data series are transformed by removing the first order correlation between successive observations of the variables.
- two stage least squares, where the variable that causes the autocorrelation is purged of autocorrelation by substituting an instrumental variable.
- three stage least squares, which purges even more autocorrelation than two stage least squares.
- full information maximum likelihood estimation.
Most regression packages will have these procedures available. An analyst ought to know when they should be used, as mentioned above, by looking at the Durbin-Watson statistic.
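To give an idea of how the detection and the first remedy work, the sketch below computes the Durbin-Watson statistic in its usual textbook form (the version in the appendix may be laid out differently) on the residuals of the Delta Airlines regression of Table T-5.1. It then applies a simple first-order transformation of the data. The estimate of the first-order correlation as 1 - DW/2 is a common approximation assumed here, and the function names are chosen for illustration.

```python
# Data from Table T-5.1: US GDP ($ billions) and Delta revenues ($ millions), 1987-1999.
gdp = [4742.5, 5108.3, 5489.1, 5803.2, 5986.2, 6318.9, 6642.3,
       7054.3, 7400.3, 7813.2, 8300.8, 8759.9, 9256.1]
revenue = [5318, 6915, 8039, 8683, 9171, 10837, 11657,
           12077, 12194, 12455, 13594, 14138, 14711]


def ols_fit(x, y):
    """Return (a_star, b_star) for y = a + b x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    b_star = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
              / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b_star * mean_x, b_star


def durbin_watson(residuals):
    """Sum of squared successive differences of the residuals,
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def quasi_difference(series, rho):
    """Transform z_t into z_t - rho * z_(t-1); the first observation is lost."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


a_star, b_star = ols_fit(gdp, revenue)
residuals = [y - a_star - b_star * x for x, y in zip(gdp, revenue)]

dw = durbin_watson(residuals)
rho_hat = 1 - dw / 2   # assumed approximation of the first-order correlation

# Re-estimate on the transformed series. The slope is comparable to b*;
# the constant term of the transformed equation estimates a * (1 - rho).
x_star = quasi_difference(gdp, rho_hat)
y_star = quasi_difference(revenue, rho_hat)
a_transformed, b_transformed = ols_fit(x_star, y_star)

print(f"Durbin-Watson = {dw:.2f}, estimated rho = {rho_hat:.2f}")
print(f"slope after transformation = {b_transformed:.2f}")
```

The statistic always lies between 0 and 4. A value near 2 suggests no first-order autocorrelation, a value near 0 suggests positive autocorrelation, and a value near 4 suggests negative autocorrelation; the exact cut-off values depend on the number of observations and are read from Durbin-Watson tables.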

See review question Q-5E2.1.

See research assignments R-5.5 and R-5.6.

3)- Nonlinear models

There are many instances when a linear relation cannot be theoretically assumed to be present, or simply does not represent the observable pattern of data in a graph. Ordinary least squares regression (or one of the more complex procedures mentioned above if autocorrelation is detected) is still possible after transformation of the data of the variable(s). The relationship studied can be specified after the exogenous variable(s) have been transformed by
- raising them to some power,
- taking their logarithm,
- applying an exponential or fractional relation,
- differencing them (i.e. taking the change in value from one year to the next),
- lagging them, as in a distributed lag model,
- or a combination of the above.
After obtaining the linear estimates from the transformed data, the coefficients are recalculated to allow the fitted line to be applied to the original data.
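The logarithmic case can be sketched as follows, using the Delta Airlines data of Table T-5.1 purely as an illustration. Taking logarithms of both variables is a choice made here; in such a log-log specification the slope is usually read as an elasticity. The last step shows how the coefficients are recalculated so that the fitted curve applies to the original data. The function name predict_revenue is hypothetical.

```python
import math

# Data from Table T-5.1: US GDP ($ billions) and Delta revenues ($ millions), 1987-1999.
gdp = [4742.5, 5108.3, 5489.1, 5803.2, 5986.2, 6318.9, 6642.3,
       7054.3, 7400.3, 7813.2, 8300.8, 8759.9, 9256.1]
revenue = [5318, 6915, 8039, 8683, 9171, 10837, 11657,
           12077, 12194, 12455, 13594, 14138, 14711]

# Step 1: transform the data.
log_x = [math.log(x) for x in gdp]
log_y = [math.log(y) for y in revenue]

# Step 2: ordinary least squares on the transformed data.
n = len(log_x)
mean_lx = sum(log_x) / n
mean_ly = sum(log_y) / n
elasticity = (sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in zip(log_x, log_y))
              / sum((lx - mean_lx) ** 2 for lx in log_x))
log_intercept = mean_ly - elasticity * mean_lx

# Step 3: recalculate the coefficients for the original data.
# ln(y) = log_intercept + elasticity * ln(x)  is equivalent to
# y = exp(log_intercept) * x ** elasticity.
scale = math.exp(log_intercept)


def predict_revenue(gdp_value):
    """Fitted revenue ($ millions) for a GDP level ($ billions)."""
    return scale * gdp_value ** elasticity


print(f"estimated elasticity of revenue to GDP = {elasticity:.2f}")
print(f"fitted revenue for the 1999 GDP = {predict_revenue(gdp[-1]):.0f}")
```

The same three steps (transform, estimate, recalculate) apply to the other transformations of the list; only the first and last steps change.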

When the coefficients cannot be assumed to be constant over the whole range of the data, one of the procedures is to run regressions on separate intervals of the data.

See review question Q-5E3.1.

See research assignment R-5.7.

4)- Systems of equations and other extensions

Beyond sensitivity analysis, which studies relations of one variable with other variables, usually one variable at a time, there are many cases where it is necessary to look at a system of several equations because endogenous variables need to be included as right hand side variables. Or, putting it another way, some of the right hand side variables are determined by the system. Entire econometric models can be estimated using either ordinary least squares if autocorrelation is not too serious, or two stage least squares and three stage least squares.

Multiple regressions are also used on a single functional relation by adding one variable at a time in order to determine which of the variables contributes most explanatory power. This is known in the field as stepwise multiple regression.

A less rigorous statistical procedure is sometimes used when the data is too dispersed to lead to meaningful results, i.e. there is some degree of heteroscedasticity (which was discussed above). The method is to use an OLS regression not on the original data series, but on a series where each observation is assigned a rank, say from 1 to N (for N observations). This procedure is known as rank-order correlation, and is occasionally used in social sciences.
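The ranking procedure can be sketched as follows. This is an illustration under stated assumptions: tied observations receive the average of their ranks (a common convention, not specified in this text), the data are those of Table T-5.1, and applying the procedure to year-to-year changes as well as to levels is a choice made here to obtain a less trivial result.

```python
# Data from Table T-5.1: US GDP ($ billions) and Delta revenues ($ millions), 1987-1999.
gdp = [4742.5, 5108.3, 5489.1, 5803.2, 5986.2, 6318.9, 6642.3,
       7054.3, 7400.3, 7813.2, 8300.8, 8759.9, 9256.1]
revenue = [5318, 6915, 8039, 8683, 9171, 10837, 11657,
           12077, 12194, 12455, 13594, 14138, 14711]


def ranks(values):
    """Rank observations from 1 to N; ties get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def ols_fit(x, y):
    """Return (a_star, b_star) for y = a + b x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    b_star = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
              / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b_star * mean_x, b_star


# Levels: both series rise every year, so their ranks coincide.
level_a, level_b = ols_fit(ranks(gdp), ranks(revenue))

# Year-to-year changes: the ranks no longer coincide.
gdp_change = [later - earlier for earlier, later in zip(gdp, gdp[1:])]
revenue_change = [later - earlier for earlier, later in zip(revenue, revenue[1:])]
change_a, change_b = ols_fit(ranks(gdp_change), ranks(revenue_change))

print(f"slope on ranks of levels  = {level_b:.2f}")
print(f"slope on ranks of changes = {change_b:.2f}")
```

When there are no ties, the slope of a regression of ranks on ranks lies between -1 and 1 and can be read directly as the strength of the rank-order correlation. The example also shows what ranking discards: on the levels the fit is perfect simply because both series increase every year, whatever the size of the increases.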

See review questions Q-5E4.1 through Q-5E4.3

See research assignment R-5.8.


Last modified: Jun/01/01