How is econometrics different from statistics?


Table of Contents

  1. Tasks of econometrics
  2. Core areas of econometrics
  3. Linear regression
  4. Violations of assumptions in the linear regression model
  5. Final notes

Tasks of econometrics

The central task of econometrics is the derivation of econometric models from economic theories and their numerical concretization. Since econometric analyses can pursue different specific goals, considering these goals is another way to describe the scope of econometrics:

  • First, econometrics is used to quantify economic relationships. Empirical data are used to estimate concrete values for the initially abstract parameters of model equations that describe the relationship between economic variables. For example, in a model of investment demand, one can specify exactly how much demand changes if the long-term interest rate rises by one percentage point.
  • Second, econometrics offers opportunities for the empirical testing of hypotheses and models. Before one can say of a model that it approximately describes reality and can thus be used for the analysis of economic facts, its empirical validity must be checked after it has been estimated. Econometrics also offers test procedures that make it possible to discriminate between competing hypotheses.
  • If the empirical validity of the model is underpinned by appropriate tests, the third goal of the econometric analysis is to provide forecasts or to simulate changes that result, for example, from economic policy interventions. Models estimated from historical data are used by economists, for example, to make forecasts for future GDP growth or the inflation rate. The accuracy of such forecasts naturally depends on whether the past development of such variables can really provide information about their future development.

From these goals it can be seen that an econometric analysis must always be based on economic theories. In some cases, demands are placed on the theory that economists are not always used to, e.g. the specification of concrete functional forms (e.g. linear or quadratic functions) for theoretical relationships. Furthermore, suitable data are needed so that the model variables can be described by empirically observable quantities. For some questions it is difficult to operationalize the variables contained in the theoretical model empirically (e.g. willingness to take risks, intelligence, trustworthiness, motivation).

Core areas of econometrics

While cointegration and time series analysis, the generalized method of moments (GMM), Bayesian econometrics, methods for exploiting quasi-experiments, and micro and panel data models have become increasingly important in economics over the past 30 years, classical regression analysis is the oldest body of methods in econometrics and the starting point of the more modern methods and models just mentioned. It is a statistical method that tries to explain the variation in a so-called explained variable by means of the variation in a series of so-called explanatory variables by quantifying a single equation. Correlation analysis is closely related to, but conceptually very different from, regression analysis. Its primary goal is to measure the strength and direction of a linear relationship between variables (e.g., math and statistics grades). In addition, the two variables under consideration are treated symmetrically in correlation analysis, i.e. no distinction is made between explained and explanatory variables.

A regression can determine whether there is a quantitative relationship between the explanatory variables and the explained variable. A regression result alone cannot prove causality, despite statistical significance, since a statistical relationship never implies causality. In order to establish causality, one also needs theories and a priori knowledge from outside statistics. Nevertheless, in contrast to correlation analysis, regression analysis and other methods of econometrics are used to investigate causal relationships. For empirical regression analysis to be able to do this, however, strict assumptions (see below) must be fulfilled. Ultimately, these assumptions can only be checked in their entirety with the help of economic theories and a priori knowledge from outside statistics, because the statistical procedures for checking the assumptions are themselves based on certain assumptions that are justified with the help of economic theories. One therefore needs a priori knowledge from outside statistics to justify causal claims.
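The symmetry of correlation and the asymmetry of regression can be illustrated with a minimal NumPy sketch; the grade data below are purely invented for illustration:

```python
import numpy as np

# Hypothetical data: math grades (x) and statistics grades (y) for 6 students.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.3, 1.7, 2.0, 3.1, 3.9, 4.0])

# Correlation treats both variables symmetrically: corr(x, y) == corr(y, x).
r = np.corrcoef(x, y)[0, 1]

# Regression is asymmetric: the slope of y on x differs from the slope of x on y.
sxy = np.cov(x, y, ddof=1)[0, 1]
slope_yx = sxy / np.var(x, ddof=1)  # regressing y on x
slope_xy = sxy / np.var(y, ddof=1)  # regressing x on y
```

A well-known identity connects the two views: the product of the two regression slopes equals the squared correlation coefficient.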

Linear regression

In the simplest case, a regression model Y = β0 + β1X1 + … + βkXk + ε describes an endogenous variable Y through a linear relationship to one or more other variables X1, ..., Xk. Since in practice there will be no exact relationship between empirically observed quantities, a disturbance term ε captures all factors that, in addition to X1, ..., Xk, influence Y and are not directly observable. Obtaining estimates for the model parameters β0, ..., βk is of particular practical importance, since on their basis forecasts for Y given realizations of X1, ..., Xk are possible, provided the model has proven empirically suitable. The standard method for estimating the parameters of linear regression models is OLS estimation (Ordinary Least Squares). For it to be applied without problems, however, a number of assumptions must be fulfilled by the regression model. First, the regression model must be linear in its parameters, and the observations of each X variable must not all be identical, since otherwise no OLS estimation is possible. Second, the conditional expectation of the disturbance term must be zero, i.e. E(ε | X1, ..., Xk) = 0, which implies a covariance of zero between the X variables and ε, i.e. Cov(ε, X1) = 0, ..., Cov(ε, Xk) = 0. This assumption of the exogeneity of X1, ..., Xk is essential, since only then are ceteris paribus statements such as "a change in X1 by one unit leads to a change in Y by β1 units" possible. A violation of this assumption (e.g. due to measurement errors in the explanatory variables or the omission of central model variables) leads to biased and inconsistent parameter estimates. Third, the disturbance terms must be homoscedastic (constant variance) and mutually uncorrelated; if homoscedasticity and uncorrelatedness are not given, the standard errors of the parameter estimators are estimated with bias and the hypothesis tests belonging to the regression model are invalidated.
Fourth, there must be no perfect correlation between the explanatory variables, since in such a case of so-called perfect multicollinearity an OLS estimation is impossible. Imperfect multicollinearity, which is characterized by high correlations (different from one), is also problematic, since in this case OLS cannot precisely separate the influences of the individual variables and therefore delivers imprecise parameter estimates. Fifth, the disturbance terms should be at least approximately normally distributed, since this distribution property is of decisive importance for the hypothesis tests associated with regressions.
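As a minimal sketch of OLS under these assumptions, the following NumPy example simulates data from a known linear model (all parameter values are made up) and recovers the coefficients by least squares. By construction of the OLS solution, the residuals are orthogonal to the regressors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(size=n)                 # disturbance term, exogenous by construction
y = 2.0 + 0.5 * x1 - 1.0 * x2 + eps      # true parameters: beta0=2, beta1=0.5, beta2=-1

X = np.column_stack([np.ones(n), x1, x2])        # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimate of (beta0, beta1, beta2)
resid = y - X @ beta_hat                          # estimated disturbances
```

With the assumptions fulfilled, the estimates land close to the true values, and X'resid is numerically zero.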

Violations of assumptions in the linear regression model

A series of statistical tests can be used to obtain indications that assumptions have been violated. If problems are identified, then depending on their nature, the model specification can be revised, robust standard errors can be used (e.g. Newey-West standard errors), or alternative estimation methods (e.g. instrumental variable estimation) can be applied. If economic theory already indicates that the assumptions of the classical regression model are unrealistic, the more general methods are usually used directly.

If the assumption of parameter linearity is not fulfilled, a parameter-linear form can sometimes be produced by transforming the model (e.g. by taking logarithms). If this is not possible, no OLS estimation can be carried out; in this case, however, an NLS estimation (Nonlinear Least Squares), for example, can be used. Closely related to the assumption of parameter linearity is the fact that the linear regression model assumes time-invariant model parameters. To check this assumption, Chow, CUSUM and CUSUMQ tests are used in practice. If structural breaks are identified with these tests, this can be interpreted as an indication of a serious misspecification of the model. It is then advisable to revise the model specification, for example with regard to the selection of variables or the functional form. However, if structural breaks are expected on the basis of theoretical considerations, then instead of modifying the specification, the sample period can be split and a separate OLS estimation carried out for each sub-period.
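The idea behind the Chow test can be sketched in a few lines of NumPy: fit the model on the pooled sample and on the two sub-periods, then compare the residual sums of squares via an F statistic. The data, break point and parameter values below are invented for illustration:

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = np.where(np.arange(n) < 50,
             1.0 + 1.0 * x,        # regime before the assumed break point
             1.0 + 3.0 * x)        # regime after the break: slope changes
y = y + 0.5 * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
rss_pooled = ols_rss(X, y)                 # restricted: same parameters everywhere
rss_1 = ols_rss(X[:50], y[:50])            # unrestricted: separate fits
rss_2 = ols_rss(X[50:], y[50:])

# Chow F statistic: large values indicate a structural break at the split point.
F = ((rss_pooled - rss_1 - rss_2) / k) / ((rss_1 + rss_2) / (n - 2 * k))
```

Here F far exceeds the 5% critical value of roughly 3.1 for F(2, 96), so the (simulated) break is detected.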

A lack of exogeneity, i.e. endogeneity of the explanatory variables, can be detected with the so-called Hausman test. This violation of the assumptions can be countered with so-called instrumental variable estimation (IV estimation). This requires so-called instrumental variables that are highly correlated with the endogenous explanatory variables (instrument relevance) and at the same time uncorrelated with the disturbance term (instrument exogeneity). In contrast to OLS, the IV estimator delivers consistent parameter estimates given suitable instrument quality. The quality of the instruments can be checked by regressing the endogenous explanatory variable on all instruments including the exogenous variables (test of instrument relevance) and by the so-called Sargan test (test of instrument exogeneity).
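A minimal simulation sketch of the just-identified IV estimator in NumPy, with an artificially endogenous regressor (all numbers invented): OLS is pulled away from the true slope by the correlation between regressor and disturbance, while IV stays close to it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=n)            # instrument: relevant and exogenous by construction
u = rng.normal(size=n)
x = z + u                          # endogenous regressor: shares the component u with eps
eps = u + 0.5 * rng.normal(size=n)  # disturbance correlated with x -> OLS is inconsistent
y = 1.0 + 2.0 * x + eps            # true slope is 2

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # biased/inconsistent here
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)       # just-identified IV: (Z'X)^-1 Z'y
```

In this design the OLS slope converges to 2.5 rather than 2, while the IV slope converges to the true value.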

A problem frequently encountered particularly in cross-sectional regressions (data on different units at a single point in time) is heteroscedasticity, i.e. a non-constant conditional variance of the stochastic disturbance term. If heteroscedasticity is detected by means of a Breusch-Pagan or White test, there is the option in large samples of using heteroscedasticity-robust White standard errors instead of the standard errors then incorrectly estimated by OLS. Alternatively, the use of WLS (Weighted Least Squares) is also conceivable in large samples. Here the data are transformed on the basis of the heteroscedasticity structure revealed by special test procedures, so that a model results that can be estimated with OLS and no longer exhibits heteroscedasticity. This procedure yields not only different estimated standard errors but also more efficient estimates of the model parameters. What is critical about this method, however, is that estimates more efficient than OLS in the untransformed model can only arise if the heteroscedasticity structure is captured correctly, and it is precisely the detection of this structure that is usually problematic.
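The White (HC0) sandwich formula can be sketched directly in NumPy. In the simulated example below (all values invented), the disturbance variance grows with x², so the robust standard error of the slope exceeds the conventional one:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)
eps = np.sqrt(1.0 + x ** 2) * rng.normal(size=n)  # variance grows with x^2: heteroscedastic
y = 0.5 + 1.0 * x + eps

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Conventional OLS standard errors (valid only under homoscedasticity).
sigma2 = e @ e / (n - X.shape[1])
se_naive = np.sqrt(np.diag(sigma2 * XtX_inv))

# White (HC0) robust standard errors: (X'X)^-1 X' diag(e^2) X (X'X)^-1.
meat = (X * (e ** 2)[:, None]).T @ X
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```

With this design the naive slope standard error understates the true sampling variability, which the sandwich estimator corrects.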

In time series regressions (data on one unit at different points in time) one is often confronted with the problem of disturbance-term autocorrelation. If a Durbin-Watson test reveals first-order autocorrelation (the disturbance term depends only on its value from the previous period), or a Breusch-Godfrey test reveals autocorrelation of higher order (the disturbance term also depends on values further back), one has the option in large samples of countering this with autocorrelation-robust Newey-West standard errors. These are standard errors whose calculation formulas take into account autocorrelation of any order (and also heteroscedasticity). Alternatively, a generalization of WLS, namely GLS (Generalized Least Squares), can be used in large samples. This method provides correct standard errors and more efficient estimates of the model parameters, provided that the autocorrelation structure used for the model transformation has been captured correctly. In this context it is important that both heteroscedasticity and autocorrelation can be interpreted as indications of a misspecified model. The countermeasures described should therefore only be implemented once specification errors have been ruled out as far as possible.
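The Durbin-Watson statistic itself is easy to compute by hand: a value near 2 indicates no first-order autocorrelation, while values well below 2 indicate positive autocorrelation. A small NumPy sketch with simulated AR(1) disturbances (all parameters invented):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared first differences of the
    residuals divided by their sum of squares. Near 2 -> no first-order
    autocorrelation; well below 2 -> positive autocorrelation."""
    d = np.diff(resid)
    return float(d @ d / (resid @ resid))

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)

# AR(1) disturbances with rho = 0.8 (strong positive autocorrelation).
shocks = rng.normal(size=n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.8 * eps[t - 1] + shocks[t]

y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
dw = durbin_watson(y - X @ beta)   # far below 2 in this autocorrelated design
```

For comparison, the statistic computed on the white-noise shocks themselves comes out near 2.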

In connection with time series regressions it should also be noted that the so-called stationarity of the time series used plays an important role. Time series are generally called stationary if their basic properties, such as expected value and variance, do not change over time; in non-stationary time series such changes do occur. If non-stationary time series are used in regressions, this can lead to so-called spurious regressions: in most cases the estimates incorrectly suggest that the explanatory variable has a significant influence on the explained variable, even if there is no connection between the two variables at all. Stationarity can be tested, for example, with the so-called Dickey-Fuller test. If the test indicates non-stationarity, one can resort to estimating in first differences, since first differences are often stationary even when the original series are not. However, there are also cases in which non-stationary variables can be used in models. If the variables of a regression model are non-stationary, it can still happen that a linear combination of the variables is stationary. In such a case the variables are said to be cointegrated, i.e. to have a long-term (equilibrium) relationship between them. Whether two variables are cointegrated can be checked, for example, by subjecting the residuals of a simple OLS estimation to a Dickey-Fuller test, taking special critical values into account. With cointegrated variables it would be disadvantageous to estimate only in first differences, since this captures only the short-term and not the long-term relationship between the variables; estimating the model in levels, conversely, captures only the long-term relationship. In order to take both horizons into account, special methods are used to estimate the parameters of interest in models with cointegrated series. These include, for example, error correction models and dynamic OLS estimation.
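The spurious-regression danger and the effect of differencing can be sketched with two independent random walks in NumPy (purely simulated data): in levels the walks frequently appear strongly correlated, while their first differences, which are stationary by construction, are not.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
shocks_a = rng.normal(size=n)
shocks_b = rng.normal(size=n)

# Two independent random walks: non-stationary (their variance grows with t).
walk_a = np.cumsum(shocks_a)
walk_b = np.cumsum(shocks_b)

# Correlating the levels often yields a sizable |correlation| even though
# the series are completely unrelated (the spurious-regression problem).
r_levels = np.corrcoef(walk_a, walk_b)[0, 1]

# First differencing recovers the stationary increments exactly.
r_diffs = np.corrcoef(np.diff(walk_a), np.diff(walk_b))[0, 1]
print(round(r_levels, 3), round(r_diffs, 3))
```

The differenced series carry no spurious relationship, which is why differencing is the standard first remedy for non-stationarity.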

After this excursus into time series regression, let us return to the basic assumptions of the linear regression model. While perfect multicollinearity is immediately revealed by an error message from the econometric software, indications of imperfect multicollinearity can be obtained from high pairwise correlation coefficients and from high coefficients of determination in regressions of the explanatory variables on one another. Since imperfect multicollinearity does not necessarily inflate the variances of the parameter estimators excessively, a certain degree of it is usually tolerated in practice, as long as the variances do not become so large that they distort the hypothesis tests too strongly.
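A common way to quantify this is the variance inflation factor (VIF), computed as 1/(1 - R²) from exactly such a regression of one explanatory variable on the others; as a rule of thumb, values far above about 10 are read as a warning sign. A NumPy sketch with an artificially near-duplicated regressor (data invented):

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination of an OLS fit of y on X (with intercept)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # almost a copy of x1: imperfect multicollinearity

# VIF of x2: how strongly x2 is explained by the remaining regressors.
vif_x2 = 1.0 / (1.0 - r_squared(x1.reshape(-1, 1), x2))
```

In this construction the VIF is enormous, signaling that OLS can barely separate the two regressors' influences.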

The assumption of a normally distributed stochastic disturbance term is likewise usually not tested intensively in practice. Test procedures exist, such as the Jarque-Bera test, but in large samples it is usually assumed that, by virtue of the central limit theorem, the estimated parameters are at least approximately normally distributed.
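The Jarque-Bera statistic combines sample skewness and excess kurtosis; under normality it is asymptotically chi-squared with two degrees of freedom, so small values are consistent with normal disturbances. A NumPy sketch on simulated residuals (data invented):

```python
import numpy as np

def jarque_bera(resid):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4), where S is the
    sample skewness and K the sample kurtosis of the residuals."""
    n = len(resid)
    z = (resid - resid.mean()) / resid.std()
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4)
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

rng = np.random.default_rng(6)
jb_normal = jarque_bera(rng.normal(size=1000))       # small: normality not rejected
jb_skewed = jarque_bera(rng.exponential(size=1000))  # strongly skewed: very large
```

Comparing the value with the chi-squared(2) critical value (about 5.99 at the 5% level) gives the usual test decision.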

Final notes

The present article gives only a rough overview of the basics of traditional econometric approaches. For the practical implementation of regression analyses and other econometric methods, the use of professional software is advisable. EViews and Stata are the most widely used econometric programs in this regard, with Stata having its particular strengths in panel and microeconometrics, while EViews is geared more toward the analysis of time series.