Introduction
This module offers a brief introduction to some of the issues that arise in the analysis of time-series. Most of the topics covered are those that were attacked first by statisticians and economists. As such, they do not demand the more sophisticated tools used by more modern approaches to time-series. In spite of these shortcomings, they should give you some understanding of the issues that arise with the use of time-series in econometric analyses. One final note of explanation is necessary. These notes are designed to give you a brief introduction to how Stata handles time-series data. They are not a substitute for reading the Stata manual, completing a forecasting course, or reading standard texts on this rather complicated field.
Time-series analysis in Stata
Throughout this module we work with US macroeconomic data included in the MS Excel file Macro data.xls. The variables are real level of investments (RINV), real gross national product (RGNP), and real interest rate (RINTRATE). The real interest rate is approximated by the difference between the nominal interest rate and the rate of change of the price index from the previous year. The data are for the years 1963 to 1982. You can replicate the analysis done here by copying this data set into a Stata file.
The first step after entering the data set into Stata is to declare that the data set is a time-series. The command to do this is:
. tsset year
The data set can be broken into any number of time periods, including daily, weekly, monthly, quarterly, half-yearly, yearly, and generic.
Assume that we want to estimate the following regression:

$$RINV_t = \beta_0 + \beta_1 RGNP_t + \beta_2 RINTRATE_t + \varepsilon_t \qquad (1)$$

using the data set in the appendix. Figure 1 shows this regression command and the resultant output.
Figure 1. Stata regression command and output for Equation (1).
On the surface the estimates seem “reasonable” because the signs on the two explanatory variables are what theory predicts they should be and the parameter for real GNP is statistically different from zero. However, an examination of the residuals shown in Figure 2 suggests that the error terms might exhibit autocorrelation.
Figure 2. Residuals from the regression in Figure 1.
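There are several issues that arise here. First, what sort of models can we use to account for autocorrelation? Second, what sorts of tests exist for detecting the existence of autocorrelation? We begin with the first of these questions by introducing the concept of first-order autocorrelation. Consider the following model:

$$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t. \qquad (2)$$

We say that this model exhibits first-order autocorrelation if the error terms can be written as:

$$\varepsilon_t = \rho\,\varepsilon_{t-1} + \mu_t, \qquad (3)$$

where $-1 < \rho < 1$ and $\mu_t$ is an error term with mean zero and constant variance that is uncorrelated over time.

For the moment we will assume that Equations (2) and (3) are true representations of the world. What then can we do to estimate (2)? What we need to do is find a way to transform (2) so that the error term of whatever regression we estimate does not exhibit autocorrelation. In time period $t-1$, Equation (2) reads:

$$y_{t-1} = \beta_0 + \beta_1 x_{t-1} + \varepsilon_{t-1}. \qquad (4)$$

Multiply (4) by $\rho$ to get:

$$\rho\,y_{t-1} = \rho\beta_0 + \rho\beta_1 x_{t-1} + \rho\,\varepsilon_{t-1}. \qquad (5)$$

Now subtracting (5) from (2) gives:

$$y_t - \rho\,y_{t-1} = \beta_0 - \rho\beta_0 + \beta_1 x_t - \rho\beta_1 x_{t-1} + \varepsilon_t - \rho\,\varepsilon_{t-1},$$

or, equivalently,

$$y_t - \rho\,y_{t-1} = \beta_0(1-\rho) + \beta_1\,(x_t - \rho\,x_{t-1}) + (\varepsilon_t - \rho\,\varepsilon_{t-1}).$$

Let $y_t^* = y_t - \rho\,y_{t-1}$ and $x_t^* = x_t - \rho\,x_{t-1}$. Remember that (3) implies that $\varepsilon_t - \rho\,\varepsilon_{t-1} = \mu_t$, so the transformed model is

$$y_t^* = \beta_0^* + \beta_1 x_t^* + \mu_t, \qquad (6)$$

where $\beta_0^* = \beta_0(1-\rho)$. The error term of (6) does not exhibit autocorrelation.

Cochrane and Orcutt [1949] use this algebra to suggest one way to estimate (6). The estimation entails several steps. First, you use OLS to estimate (2). Second, you estimate (3) using the residuals from the first stage to approximate the unobserved error terms $\varepsilon_t$, which yields an estimate of $\rho$. Third, you use this estimate of $\rho$ to construct $y_t^*$ and $x_t^*$ and estimate (6) by OLS. The second and third steps are then repeated, using the residuals implied by the latest parameter estimates, until the estimate of $\rho$ converges.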
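The iterative procedure can be made concrete with a short sketch. The module itself relies on Stata; the code below is only an illustration in Python (standard library only) for a model with a single regressor and first-order autoregressive errors. The function names `ols_simple`, `cochrane_orcutt`, and `simulate`, the convergence tolerance, and the artificial data are all assumptions made for the example, not part of the original notes.

```python
import random


def ols_simple(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Alternate between estimating rho and the quasi-differenced regression."""
    b0, b1 = ols_simple(x, y)  # step 1: OLS on the untransformed model
    rho = 0.0
    for iteration in range(1, max_iter + 1):
        # Step 2: regress the residuals on their own first lag (no intercept).
        e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        new_rho = sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(
            e[t - 1] ** 2 for t in range(1, len(e)))
        # Step 3: quasi-difference the data and re-estimate by OLS.
        x_star = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        b0_star, b1 = ols_simple(x_star, y_star)
        b0 = b0_star / (1.0 - new_rho)  # recover the original intercept
        if abs(new_rho - rho) < tol:
            return b0, b1, new_rho, iteration
        rho = new_rho
    return b0, b1, rho, max_iter


def simulate(n, b0, b1, rho, seed):
    """Generate artificial data with AR(1) errors (for illustration only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e, y = 0.0, []
    for xi in x:
        e = rho * e + rng.gauss(0.0, 1.0)
        y.append(b0 + b1 * xi + e)
    return x, y


if __name__ == "__main__":
    x, y = simulate(n=2000, b0=2.0, b1=3.0, rho=0.7, seed=1)
    print(cochrane_orcutt(x, y))  # (intercept, slope, rho, iterations used)
```

Note that the first observation is lost when the data are quasi-differenced, which matters in a sample as small as the one used in this module.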
We now turn to the issue of detecting the existence of autocorrelation. In what follows we focus mainly on the detection of first-order autocorrelation as shown in Equation (3). We can use the Durbin-Watson test to see if our suspicions are correct. The Durbin-Watson statistic tests the null hypothesis that the autocorrelation parameter $\rho$ in Equation (3) is zero, $H_0: \rho = 0$, against the alternative that it is not, $H_1: \rho \neq 0$.
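Figure 3. Probability distribution function of the Durbin-Watson statistic and its two limiting distributions.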
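The details of the test statistic can be found in any econometrics textbook and need not detain us here. What you need to know about the DW-statistic is that (1) it has a mean value of approximately 2 when there is no autocorrelation, and (2) its distribution lies between two limiting distributions, so there are two critical values, one for each of the limiting distributions. Figure 3 illustrates the probability distribution function (pdf) for the Durbin-Watson statistic. The true pdf lies somewhere between the blue pdf and the red pdf. What is shown in the figure is the point below which, say, 5 percent of the distribution lies for each distribution. The true critical point lies somewhere between the lower critical value $d_L$ and the upper critical value $d_U$.
If the DW-statistic is below $d_L$, we reject the null hypothesis of no autocorrelation; if it lies between $d_U$ and $4 - d_U$, we cannot reject the null hypothesis; and if it lies between $d_L$ and $d_U$, the test is inconclusive. By symmetry, values between $4 - d_U$ and $4 - d_L$ are also inconclusive, and values above $4 - d_L$ lead us to reject the null hypothesis.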
Figure 4. Stata command for the Durbin-Watson test and the resulting statistic.
The command for the test and the resultant DW-statistic for the estimate of Equation (2) are shown in Figure 4. The 5 percent critical values for the Durbin-Watson statistic for a sample size of 19 with two explanatory variables (excluding the intercept) are 1.074 and 1.536. If the observed value of the DW-statistic is between 1.536 and 2.464, we cannot reject the null hypothesis that the residuals do not exhibit autocorrelation. Our value of 1.32 falls in the inconclusive region between 1.074 and 1.536, where we cannot say whether or not the null hypothesis should be rejected.
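To make the test concrete, the sketch below computes the statistic from a list of residuals, using the textbook definition (the sum of squared changes in the residuals divided by the sum of squared residuals), and applies the bounds rule to the numbers quoted above. It is written in Python purely for illustration; the function names `durbin_watson` and `dw_decision` are hypothetical and this is not how Stata reports the test.

```python
def durbin_watson(residuals):
    """Sum of squared changes in the residuals over their sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(dw, d_lower, d_upper):
    """Classify a DW statistic with the two-sided bounds test."""
    if dw < d_lower or dw > 4 - d_lower:
        return "reject no autocorrelation"
    if d_upper < dw < 4 - d_upper:
        return "do not reject no autocorrelation"
    return "inconclusive"


# Values quoted in the text: DW = 1.32 with bounds 1.074 and 1.536.
print(dw_decision(1.32, 1.074, 1.536))  # inconclusive
```

Residuals that alternate in sign push the statistic toward 4, while residuals that change slowly push it toward 0, which is why values well below 2 point to positive autocorrelation.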
At this point we can try the Cochrane-Orcutt estimator. Figure 5 reports the results of using the Cochrane-Orcutt estimation procedure. Notice that it took 7 iterations for the estimate of $\rho$ to converge.
Figure 5. Results of the Cochrane-Orcutt estimation.
Figure 6.
Using either the Cochrane-Orcutt or the Prais-Winsten estimator depends on the assumption that the error terms exhibit first-order autocorrelation. Unfortunately, there is no particular reason (from a theoretical viewpoint) to believe this assumption. Why, for instance, couldn't the error terms of Equation (2) exhibit second-order autocorrelation of the form

$$\varepsilon_t = \rho_1\,\varepsilon_{t-1} + \rho_2\,\varepsilon_{t-2} + \mu_t, \qquad (8)$$

where $\mu_t$ is an error term that is uncorrelated over time?
There is a more troubling possible explanation for the low Durbin-Watson statistic: the model may be misspecified. In particular, there may be important explanatory variables omitted from the regression. These omitted explanatory variables may exhibit autocorrelation and, thus, may be the source of autocorrelation in the error term. If the omitted explanatory variables are correlated with the included explanatory variables, then the parameter estimates are biased. The large difference between the OLS and Cochrane-Orcutt estimates of the parameter for the real interest rate is suggestive of model misspecification.
More modern time-series models
ARMA models
The model we described above is assumed to have first-order autoregressive error disturbances. Such a process is referred to as AR(1). The error structure in (8) is AR(2). If we apply this concept to a data series, we would call the following an AR(p) process:

$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_p y_{t-p} + \varepsilon_t.$$

Another approach available to us is to think of a data series as a weighted average of some error terms that are assumed to have a mean of zero, have a fixed variance, and be uncorrelated over time:

$$y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}.$$

A data series exhibiting this pattern is called a moving average process or MA(q). The error term is known in the literature as white noise. A data series that has both autoregressive and moving average characteristics is called an autoregressive moving average (ARMA) series; an ARMA(p, q) is:

$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}.$$
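A small sketch may help show how such a series is built up period by period. The Python function below is an illustration only: the name `simulate_arma`, the convention that the moving average terms enter with a positive sign, and the treatment of pre-sample values as zero are all assumptions of the example. Feeding it a single unit shock traces out how an ARMA(1, 1) series responds over time.

```python
import random


def simulate_arma(ar, ma, shocks, constant=0.0):
    """Build y_t = constant + sum(ar_i * y_{t-i}) + e_t + sum(ma_j * e_{t-j}).

    Values of y and e before the start of the sample are treated as zero.
    """
    y = []
    for t, e_t in enumerate(shocks):
        value = constant + e_t
        for i, a in enumerate(ar, start=1):
            if t - i >= 0:
                value += a * y[t - i]       # autoregressive part
        for j, b in enumerate(ma, start=1):
            if t - j >= 0:
                value += b * shocks[t - j]  # moving average part
        y.append(value)
    return y


# Response of an ARMA(1, 1) series to one unit shock in the first period.
response = simulate_arma(ar=[0.5], ma=[0.4], shocks=[1, 0, 0, 0])
print([round(v, 3) for v in response])  # [1.0, 0.9, 0.45, 0.225]

# A longer artificial series driven by random white-noise shocks.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(100)]
series = simulate_arma(ar=[0.5], ma=[0.4], shocks=noise)
```

With `ar=[]` the function produces a pure moving average series, and with `ma=[]` a pure autoregressive one.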
It may help to show two series constructed to have different ARMA patterns. Figure 7 shows one of the potential time series generated by the ARMA(2,1) process:
Figure 7. One time series generated by the ARMA(2,1) process.
Figure 8 shows one potential time series generated by the ARMA(1,1) process:
Figure 8. One time series generated by the ARMA(1,1) process.
Stationarity
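Consider the time-series $y_t$. The series is said to be (covariance) stationary if, for all $t$ and all lags $s$,

$$E(y_t) = \mu, \qquad \operatorname{Var}(y_t) = \sigma^2, \qquad \operatorname{Cov}(y_t, y_{t-s}) = \gamma_s.$$

The last term, $\gamma_s$, is the autocovariance between observations $s$ periods apart; stationarity requires that it depend only on the lag $s$ and not on the date $t$.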
Quite often you can create a stationary time-series from a non-stationary time-series by taking the first differences of the non-stationary series. If the first difference does not produce a stationary series, then you continue to difference the series until you obtain a stationary one. For instance, the time-series shown in Figure 7 appears to be non-stationary. The first differences of this series are shown in Figure 9. Judging by eye, which is an imperfect guide, it would appear that the first differences of (13) are stationary. However, we really cannot tell anything for sure from the graph of a data set. We need to use the restrictions on the parameters derived in advanced texts to determine if a data set is stationary.
Figure 9. First differences of the time series shown in Figure 7.
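Differencing is simple arithmetic, as the following Python sketch shows. The function name `difference` and the toy series are assumptions made for illustration; the toy series has a trend that grows over time, so one round of differencing still leaves a trend while a second round leaves a constant series.

```python
def difference(series, order=1):
    """Return the series differenced `order` times (y_t - y_{t-1} each time)."""
    for _ in range(order):
        series = [series[t] - series[t - 1] for t in range(1, len(series))]
    return series


trend = [1, 3, 6, 10, 15]
print(difference(trend))           # [2, 3, 4, 5]
print(difference(trend, order=2))  # [1, 1, 1]
```

Each round of differencing costs one observation, which is worth keeping in mind with short annual series.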
The autocorrelation function
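One of the major ways to identify the structure of a time series is to look at the autocorrelation function. The autocorrelation function, $\rho_s$, is the correlation between $y_t$ and $y_{t-s}$ viewed as a function of the lag $s$:

$$\rho_s = \frac{\gamma_s}{\gamma_0},$$

where $\gamma_s$ is the autocovariance at lag $s$ and $\gamma_0$ is the variance of the series.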
The researcher then has to compare the actual autocorrelation function with the theoretical autocorrelation function for comparable data series. To see how to use the autocorrelation function, consider the following five time series:
Each of these series has a theoretical autocorrelation function; graphs of these autocorrelation functions are shown in the left column of Figure 10.
Figure 10. Theoretical autocorrelation functions (left column) for the five time series.
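The comparison between an actual and a theoretical autocorrelation function can be sketched in a few lines of Python. The sample estimator below is the usual textbook one (lagged cross-products of deviations from the mean, divided by the sum of squared deviations); it is offered as an illustration and may differ in detail from what a given statistical package reports. The function names `sample_acf` and `ar1_acf` are hypothetical.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    gamma0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - s] for t in range(s, n)) / gamma0
            for s in range(1, max_lag + 1)]


def ar1_acf(a1, max_lag):
    """Theoretical autocorrelations of a stationary AR(1): a1 ** s."""
    return [a1 ** s for s in range(1, max_lag + 1)]


print([round(r, 3) for r in sample_acf([1, 2, 3, 4, 5], 2)])  # [0.4, -0.1]
print(ar1_acf(0.5, 3))  # [0.5, 0.25, 0.125]
```

In practice one would compute `sample_acf` for the data series and look for the theoretical pattern it most resembles.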
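There is an additional function we can use to help identify the nature of a time-series. Consider the following regressions:

$$\tilde{y}_t = \phi_{11}\,\tilde{y}_{t-1} + e_t$$

$$\tilde{y}_t = \phi_{21}\,\tilde{y}_{t-1} + \phi_{22}\,\tilde{y}_{t-2} + e_t$$

$$\vdots$$

$$\tilde{y}_t = \phi_{s1}\,\tilde{y}_{t-1} + \phi_{s2}\,\tilde{y}_{t-2} + \cdots + \phi_{ss}\,\tilde{y}_{t-s} + e_t$$

where $\tilde{y}_t = y_t - \mu$ is the deviation of the series from its mean and $e_t$ is an error term.
Our interpretation of the coefficient $\phi_{ss}$ is that it measures the correlation between $y_t$ and $y_{t-s}$ after the effects of the intervening lags have been removed. The sequence $\phi_{11}, \phi_{22}, \ldots$ is the partial autocorrelation function.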
Figure 11.
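Partial autocorrelations do not have to be obtained by running a separate regression for every lag. One standard shortcut, sketched below in Python as an illustration, is the Durbin-Levinson recursion, which converts a list of autocorrelations into the corresponding partial autocorrelations. The function name `pacf_from_acf` is hypothetical, and the recursion is named here as an alternative route to the same quantities rather than as the method used in the figures.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations from autocorrelations (Durbin-Levinson).

    rho[0] is the autocorrelation at lag 1, rho[1] at lag 2, and so on.
    """
    pacf = []
    prev = []  # coefficients of the regression on the previous number of lags
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            current = [phi_kk]
        else:
            num = rho[k - 1] - sum(prev[j] * rho[k - 2 - j]
                                   for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            current = [prev[j] - phi_kk * prev[k - 2 - j]
                       for j in range(k - 1)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# AR(1) pattern: one spike at lag 1, then zeros.
print(pacf_from_acf([0.5, 0.25, 0.125]))  # [0.5, 0.0, 0.0]

# MA(1) pattern with a positive spike in the autocorrelations at lag 1:
# the partial autocorrelations alternate in sign as they decay.
print(pacf_from_acf([0.4, 0.0, 0.0]))
```

The two calls illustrate the contrast used for identification: an autoregressive series cuts off in the partial autocorrelations, while a moving average series cuts off in the autocorrelations.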
You can generate prettier graphs of the autocorrelation functions using the . ac varname command. For instance, the command . ac rinv generates the graph shown in Figure 12. The . pac varname command generates a graph of the partial autocorrelations, as shown in Figure 13.
Figure 12. Autocorrelations of rinv produced by the ac command.
Figure 13. Partial autocorrelations produced by the pac command.
There are several generalizations one can use to help identify the process underlying a data series. Table 1 [Enders (1995): p. 85] offers a brief summary of these properties of the autocorrelation and partial autocorrelation functions.
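| Process | Autocorrelation function | Partial autocorrelation function |
|---|---|---|
| White-noise | All $\rho_s = 0$ ($s \neq 0$) | All $\phi_{ss} = 0$ |
| AR(1): $a_1 > 0$ | Direct exponential decay: $\rho_s = a_1^s$ | $\phi_{11} = \rho_1$; $\phi_{ss} = 0$ for $s \geq 2$ |
| AR(1): $a_1 < 0$ | Decays toward zero with oscillating coefficients: $\rho_s = a_1^s$ | $\phi_{11} = \rho_1$; $\phi_{ss} = 0$ for $s \geq 2$ |
| AR(p) | Decays toward zero; coefficients may oscillate | Spikes through lag p. All $\phi_{ss} = 0$ for $s > p$ |
| MA(1): $\theta < 0$ | Negative spike at lag 1. $\rho_s = 0$ for $s \geq 2$ | Decay: $\phi_{11} < 0$ |
| MA(1): $\theta > 0$ | Positive spike at lag 1. $\rho_s = 0$ for $s \geq 2$ | Oscillating decay: $\phi_{11} > 0$ |
| ARMA(1, 1): $a_1 > 0$ | Exponential decay beginning at lag 1. Sign $\rho_1$ = sign$(a_1 + \theta)$ | Oscillating decay beginning at lag 1. $\phi_{11} = \rho_1$ |
| ARMA(1, 1): $a_1 < 0$ | Oscillating decay beginning at lag 1. Sign $\rho_1$ = sign$(a_1 + \theta)$ | Exponential decay beginning at lag 1. $\phi_{11} = \rho_1$ |
| ARMA(p, q) | Decay (either direct or oscillatory) beginning at lag q | Decay (either direct or oscillatory) beginning at lag p |

In the table, $a_1$ is the first-order autoregressive coefficient, $\theta$ is the first-order moving average coefficient (with the moving average term entering with a positive sign), $\rho_s$ is the autocorrelation at lag $s$, and $\phi_{ss}$ is the partial autocorrelation at lag $s$.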
Estimation of ARMA models
The estimation of ARMA models is relatively easy in Stata. The basic command to estimate an ARMA model is: . arima depvar [varlist], ar(numlist) ma(numlist). The first thing to notice is that this command can apply either to a single variable or to an equation. If [varlist] is omitted, Stata will produce an estimate of the ARMA model for the dependent variable; if the list is included, it will estimate the regression with the disturbances allowed to have the ARMA structure specified in the command. Figure 14 reports the estimation of an ARMA model for real investment levels. Notice that we write ar(1/2) so that Stata knows to include both the first and second autoregressive terms. A command of ar(2) would include only the second autoregressive term. In Figure 15 we report the ARMA(2, 1) estimation of (1).
Figure 14. Estimation of an ARMA model for real investment.
Figure 15. ARMA(2, 1) estimation of Equation (1).
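The difference between including lags 1 and 2 and including lag 2 alone can be illustrated outside Stata. The Python sketch below fits a purely autoregressive model by ordinary least squares on a chosen list of lags. This is a deliberately simpler technique than the maximum likelihood estimation that the arima command performs, and it cannot handle moving average terms; the function names `solve` and `fit_ar` and the artificial noise-free series are assumptions made for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(series, lags):
    """Least-squares fit of y_t on a constant and the listed lags of y."""
    start = max(lags)
    rows = [[1.0] + [series[t - lag] for lag in lags]
            for t in range(start, len(series))]
    target = series[start:]
    k = len(lags) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)  # [constant, coefficient on each listed lag]


# Artificial noise-free series: y_t = 1 + 0.5*y_{t-1} - 0.3*y_{t-2}.
y = [1.0, 2.0]
for _ in range(10):
    y.append(1.0 + 0.5 * y[-1] - 0.3 * y[-2])

both_lags = fit_ar(y, lags=[1, 2])   # analogous in spirit to ar(1/2)
second_only = fit_ar(y, lags=[2])    # analogous in spirit to ar(2)
print([round(c, 4) for c in both_lags])
```

Because the artificial series contains no noise, the fit on both lags recovers the constant and the two coefficients used to build it, while the fit on lag 2 alone returns only a constant and a single coefficient.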
| | ARMA(1, 1) | ARMA(2, 1) | AR(1) | AR(2) | MA(1) |
|---|---|---|---|---|---|
| Intercept | 185.307 | 185.6556 | 184.8208 | 185.2092 | 189.373 |
| | (10.06) | (10.83) | (9.27) | (10.25) | (18.09) |
| AR (L1) | 0.70936 | 1.76342 | 0.80307 | 0.95257 | — |
| | (3.12) | (5.27) | (5.51) | (4.47) | — |
| AR (L2) | — | -0.81715 | — | -0.18963 | — |
| | | (-3.21) | | (-0.91) | |
| MA (L1) | 0.26236 | -0.99998 | — | — | 0.87262 |
| | (0.90) | (-0.00) | | | (2.97) |
| Log likelihood | -86.1791 | -85.8702 | -86.47780 | -86.21224 | -88.48713 |
| Wald χ2 | 26.96 | 422.60 | 30.36 | 31.65 | 8.81 |
| Probability > χ2 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Sample size | 19 | 19 | 19 | 19 | 19 |
| Sample period | 1964-1982 | 1964-1982 | 1964-1982 | 1964-1982 | 1964-1982 |
The interpretation of these results is not obvious. We check the sensitivity of these results by estimating some other models. The results of these estimations are reported in Table 2 and Table 3. Based purely on the maximum likelihood (ML) results, it would appear that the AR(1) model in Table 2 is as good as any of the models describing the ARMA structure of real investment. On the other hand, the results reported in Table 3 suggest that ARMA(2, 1) is the best model to assume for the disturbance term in the estimates of Equation (1).
| | AR(1) | ARMA(1, 1) | ARMA(2, 1) |
|---|---|---|---|
| Intercept | -14.49489 | -13.37455 | -16.89182 |
| | (-0.26) | (-0.23) | (-1.68) |
| Real GNP | 0.17006 | 0.16912 | 0.17253 |
| | (3.96) | (3.78) | (20.18) |
| Real interest rate | -0.82517 | -0.92007 | -0.33692 |
| | (-0.46) | (-0.33) | (-0.25) |
| AR (L1) | 0.27953 | -0.02028 | 0.85619 |
| | (0.60) | (-0.02) | (1.46) |
| AR (L2) | — | — | -0.70702 |
| | | | (-2.64) |
| MA (L1) | — | 0.41151 | -1.00000 |
| | | (0.42) | (-2.98) |
| Log likelihood | -78.7868 | -78.4279 | -72.94569 |
| Wald χ2 | 26.30 | 31.86 | 980.18 |
| Probability > χ2 | 0.0000 | 0.0000 | 0.0000 |
| Sample size | 19 | 19 | 19 |
| Sample period | 1964-1982 | 1964-1982 | 1964-1982 |
Other time-series concepts
There are a large number of additional time-series methods and issues that are not discussed in this module. These topics include, among others, ARCH and GARCH estimators, unit roots, the Dickey-Fuller test, and vector autoregression (VAR) models. There is no way to do justice to these topics in notes as short as these are. Moreover, it is necessary to discuss difference equations (the discrete version of differential equations) if one wants to understand many of these topics at anything more than an intuitive level. Those interested in these topics should enroll in the forecasting course (Economics 422) or, if they cannot, plan to read several textbooks on whatever econometric tool they need to understand.
Exercise
Exercise 1
This exercise is designed to be sure you know how to use Stata in analyzing time-series data sets; there is no economic content in the exercise. The MS Excel file Rabun County Temperature Data reports the morning temperature (MornTemp) observed in Rabun County, Georgia for every day from March 15, 2005 to November 2, 2008. The data set includes a variable “edate” that is the daily date in Stata notation. The data set also includes dummy variables for the season, the month, and the year of each observation (with the Winter, the December, and the 2008 dummy variables omitted).
a. Create a graph of (1) the data set morntemp, (2) the autocorrelations of morntemp, and (3) the partial autocorrelations of morntemp (you will have to set the matrix size to some number greater than 43 using the command . set matsize #).
b. Estimate the following models:
- ARMA(2,2) for morntemp.
- ARMA(2,2) for morntemp as a function of the season dummy variables.
- ARMA(2,2) for morntemp as a function of the monthly dummy variables.
- ARMA(2,2) for morntemp as a function of the monthly dummy variables and the annual dummy variables.
- ARMA(1,2) for morntemp as a function of the monthly dummy variables and the annual dummy variables.
- ARMA(1,1) for morntemp as a function of the monthly dummy variables and the annual dummy variables.
c. Arrange the parameter estimates in a table and comment on them. Include the results of estimating (6) using OLS; what is the DW-statistic for this regression?
References
Cochrane, D. and G. Orcutt (1949). Application of Least Squares Regression to Relationships Containing Autocorrelated Error Terms. Journal of the American Statistical Association 44: 32-61.
Enders, Walter (1995). Applied Econometric Time Series (New York: John Wiley & Sons, Inc.).
Greene, William H. (1990). Econometric Analysis (New York: Macmillan Publishing Company).
StataCorp (2003). Stata Statistical Software: Release 8.0: Stata Time-Series Reference Manual (College Station, TX: Stata Corporation).


