# Panel Data Models

Module by: Christopher Curran.

Summary: This module offers a short introduction to the techniques used to estimate panel data models; it is designed for use by advanced undergraduates.

## Notes on Panel Data Models

### Introduction

Panel data methods are appropriate when the researcher has observations that are both cross-sectional and time series. For example, one could form a panel data set with observations on the per capita consumption of tobacco for a set of OECD countries over the period 1960 to 2005. Usually the data are “stacked”: all of the observations for country A are listed together in order of year before the data for country B, and so on. It is also possible to stack the data by year: countries A to Z for 1960, countries A to Z for 1961, and so on through 2005.
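The two stacking orders can be illustrated with a minimal Python sketch. The country labels and consumption figures below are made up for illustration and do not come from any real data set.

```python
# Hypothetical toy panel: (country, year, per capita tobacco consumption).
# The numbers are invented purely to show the two orderings.
records = [
    ("B", 1961, 2.0),
    ("A", 1960, 3.1),
    ("B", 1960, 2.2),
    ("A", 1961, 3.0),
]

# Stack by country: all years for country A, then all years for country B.
by_country = sorted(records, key=lambda r: (r[0], r[1]))

# Stack by year: all countries for 1960, then all countries for 1961.
by_year = sorted(records, key=lambda r: (r[1], r[0]))

for row in by_country:
    print(row)
```

Either ordering contains the same information; the choice mainly affects how easily one can read off a single country's history or a single year's cross-section.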

Let $y_{it}$ be the per capita consumption of tobacco for country $i$ in year $t$. We wish to model the per capita consumption of tobacco as a function of a set of observable independent variables like the price of tobacco, income, restrictions on tobacco advertising, and restrictions on tobacco consumption. Of course there are several sources of unobserved heterogeneity in such a data set. In particular, we might expect that systematic differences in consumption patterns would exist due to differences in the customs and mores of the various countries in the sample. It also would be reasonable to assume that these country-level differences are relatively stable over time. Additionally, we might expect that there would be differences in the per capita consumption of tobacco over time due to changes in our understanding of the long-run health effects of tobacco consumption. These changes might affect both (1) the level of consumption and (2) the responsiveness of the consumption of tobacco to changes in the explanatory variables.

In these notes we describe some of the ways of modeling panel data sets and discuss some of the issues associated with the estimation of these models. We also discuss how to use Stata to analyze panel data sets. We begin by considering some of the types of panel data model specifications.

### Model specification

There are four general specifications of the panel data model available. The differences in these models reflect differing assumptions one might make and are listed below.

#### 1. Slope coefficients are constant and the intercept varies over the individuals:

$$y_{it} = \alpha_i + \sum_{j=1}^{k} \beta_j x_{jit} + \varepsilon_{it}, \quad i = 1, \ldots, N, \text{ and } t = 1, \ldots, T. \tag{1}$$

#### 2. Slope coefficients are constant and the intercept varies over the individuals and over time:

$$y_{it} = \alpha_{it} + \sum_{j=1}^{k} \beta_j x_{jit} + \varepsilon_{it}, \quad i = 1, \ldots, N, \text{ and } t = 1, \ldots, T. \tag{2}$$

#### 3. All coefficients vary over individuals:

$$y_{it} = \alpha_i + \sum_{j=1}^{k} \beta_{ji} x_{jit} + \varepsilon_{it}, \quad i = 1, \ldots, N, \text{ and } t = 1, \ldots, T. \tag{3}$$

#### 4. All coefficients vary over time and individuals:

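$$y_{it} = \alpha_{it} + \sum_{j=1}^{k} \beta_{jit} x_{jit} + \varepsilon_{it}, \quad i = 1, \ldots, N, \text{ and } t = 1, \ldots, T. \tag{4}$$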

These four models can be classified further, depending on whether the researcher assumes that the coefficients of the model are fixed or random. However, most research in economics is restricted to estimation of (1) and (2) because they strike a reasonable balance: they are general enough to capture unobserved heterogeneity without introducing unnecessary complications that can render estimation extremely difficult.
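One way to see why the more general specifications are hard to estimate is to count coefficients. The sketch below is only illustrative: it assumes that the intercept in specification 2 is the sum of an individual part and a time part (a fully general intercept for every individual-year pair would need $NT$ intercepts), and the panel dimensions are hypothetical.

```python
def parameter_counts(N, T, k):
    """Number of coefficients to estimate under each of the four specifications."""
    return {
        1: N + k,            # N intercepts, k common slopes
        2: N + (T - 1) + k,  # assumed additive individual and time intercept shifts
        3: N * (k + 1),      # each individual has its own intercept and slopes
        4: N * T * (k + 1),  # each (i, t) pair has its own intercept and slopes
    }

# Hypothetical panel: 20 countries, 46 years (1960 to 2005), 4 regressors.
N, T, k = 20, 46, 4
counts = parameter_counts(N, T, k)
observations = N * T

for spec, n_params in counts.items():
    print(spec, n_params, "estimable in principle" if n_params < observations else "too many")
```

Under these assumptions specification 4 has more coefficients than observations, so it cannot be estimated without further restrictions.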

### Estimation issues

Hsiao (2003: 27-30) discusses a convenient example of a panel data model that illustrates many of the important issues that arise with panel data. We make use of this example in what follows. Assume that we want to estimate a production function for farm production in order to determine if the farm industry exhibits increasing returns to scale. Assume the sample consists of observations for $N$ farms over $T$ years, giving a total sample size of $NT$. For simplicity, we assume that the Cobb-Douglas production function is an adequate description of the production process. The general form of the Cobb-Douglas production function is:

$$q = \alpha_0 I_1^{\beta_1} I_2^{\beta_2} \cdots I_k^{\beta_k}, \tag{5}$$

where $q$ is output and $I_j$ is the quantity of the $j$-th input (for example, land, machinery, labor, feed, and fertilizer). The parameter $\beta_j$ is the output elasticity of the $j$-th input; the farms exhibit constant returns to scale if the output elasticities sum to one and either increasing or decreasing returns to scale if they sum to a value greater than or less than one, respectively.
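The link between the sum of the elasticities and returns to scale can be checked numerically. The following sketch assumes the Cobb-Douglas form $q = \alpha_0 \prod_j I_j^{\beta_j}$; the elasticities and input quantities are hypothetical.

```python
def cobb_douglas(alpha0, betas, inputs):
    """Output q = alpha0 * product of I_j ** beta_j."""
    q = alpha0
    for beta, quantity in zip(betas, inputs):
        q *= quantity ** beta
    return q

def returns_to_scale(betas, tol=1e-9):
    """Classify returns to scale from the sum of the output elasticities."""
    total = sum(betas)
    if abs(total - 1.0) <= tol:
        return "constant"
    return "increasing" if total > 1.0 else "decreasing"

# Hypothetical elasticities for land, labor, and fertilizer.
betas = [0.5, 0.4, 0.3]
inputs = [10.0, 20.0, 5.0]

q1 = cobb_douglas(2.0, betas, inputs)
q2 = cobb_douglas(2.0, betas, [2 * x for x in inputs])

# Doubling every input multiplies output by 2 ** sum(betas).
ratio = q2 / q1
print(returns_to_scale(betas), ratio, 2 ** sum(betas))
```

Because the elasticities here sum to more than one, doubling all inputs more than doubles output, which is what increasing returns to scale means.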

Taking the natural logarithm of (5) gives $\ln q = \ln \alpha_0 + \beta_1 \ln I_1 + \cdots + \beta_k \ln I_k$. We can rewrite this equation (adding an error term, as well as farm and year subscripts) as:

$$y_{it} = \beta_0 + \beta_1 x_{1it} + \cdots + \beta_k x_{kit} + \varepsilon_{it}, \tag{6}$$

where $y_{it} = \ln q_{it}$, $\beta_0 = \ln \alpha_0$, $x_{jit} = \ln I_{jit}$ for $j = 1, \ldots, k$, and $\varepsilon_{it}$ is an error term.

One way to account for farm and year effects is to assume:

$$\varepsilon_{it} = \lambda F_i + \eta P_t + \upsilon_{it}, \tag{7}$$

where $F_i$ is a measure of the unobserved farm-specific effects on productivity, $P_t$ is a measure of the unobserved changes in productivity that are the same for all farms but vary annually, and $\upsilon_{it}$ is the remaining error term. Substitution of (7) into (6) gives $y_{it} = (\beta_0 + \lambda F_i + \eta P_t) + \sum_{j=1}^{k} \beta_j x_{jit} + \upsilon_{it}$, or

$$y_{it} = \alpha_{it} + \sum_{j=1}^{k} \beta_j x_{jit} + \upsilon_{it}, \tag{8}$$

where $\alpha_{it} = \beta_0 + \lambda F_i + \eta P_t$. Thus, (8) is equivalent to (2). Moreover, if we assume that $\eta = 0$, we get

$$y_{it} = \alpha_i + \sum_{j=1}^{k} \beta_j x_{jit} + \upsilon_{it}, \tag{9}$$

where $\alpha_i = \beta_0 + \lambda F_i$. Thus, (9) is equivalent to (1).

### Fixed-effects models

A natural way to make (9) operational is to introduce a dummy variable, $D_j$, for each farm $j$ so that the intercept term becomes:

$$\alpha_i = \alpha_1 + \alpha_2 D_2 + \cdots + \alpha_N D_N = \alpha_1 + \sum_{j=2}^{N} \alpha_j D_j, \tag{10}$$

where $D_j = 1$ if $j = i$ and 0 otherwise. This substitution is equivalent to replacing the intercept term with a dummy variable for each farm and letting the farm dummy variables “sweep out” the farm-specific effects. In this specification the slope terms are the same for every farm, while the intercept term for farm $j$ (for $j \geq 2$) is given by $\alpha_1 + \alpha_j$. Clearly, the intercept term for the first farm is equal to just $\alpha_1$. This specification is known as the fixed-effects model and is estimated using ordinary least squares (OLS). We can extend the fixed-effects model to fit (8) by including a dummy variable for each time period except one.

In sum, fixed-effects models assume either (or both) that the omitted effects that are specific to cross-sectional units are constant over time or that the effects specific to time are constant over the cross-sectional units. This method is equivalent to including a dummy variable for all but one of the cross-sectional units and/or a dummy variable for all but one of the time periods.
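The dummy-variable construction described above can be sketched in a few lines of Python. The helper name `unit_dummies` and the farm labels are hypothetical; the same function could be applied to a year variable to build the time dummies.

```python
def unit_dummies(unit_ids):
    """Return (labels, rows): one 0/1 column per unit except the first.

    The first unit (in sorted order) is the omitted reference category,
    so its intercept is captured by the constant term alone.
    """
    units = sorted(set(unit_ids))
    kept = units[1:]
    rows = [[1 if u == label else 0 for label in kept] for u in unit_ids]
    return kept, rows

# Hypothetical stacked panel: three farms observed for two years each.
farm = ["f1", "f1", "f2", "f2", "f3", "f3"]
labels, D = unit_dummies(farm)

for f, row in zip(farm, D):
    print(f, row)
```

Dropping one category avoids perfect collinearity between the full set of dummies and the constant term.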

### Random-effects models

An alternative to treating the $\alpha_i$ in (1) as fixed constants over time is to treat them as random variables. Returning to (1), where the intercepts vary due to individual-level differences, we have $y_{it} = \alpha_i + \sum_{j=1}^{k} \beta_j x_{jit} + \varepsilon_{it}$. Treating $\alpha_i$ as a random variable is equivalent to setting the model up as:

$$y_{it} = \sum_{j=1}^{k} \beta_j x_{jit} + (\alpha_i + \lambda_t + \varepsilon_{it}). \tag{11}$$

For simplicity we consider only the case when $\lambda_t = 0$. Thus, the error term for (11) is $(\alpha_i + \varepsilon_{it})$. We assume that

$$\begin{aligned}
&E(\alpha_i) = E(\varepsilon_{it}) = 0, \\
&E(\alpha_i \varepsilon_{it}) = 0, \\
&E(\alpha_i \alpha_j) = \begin{cases} \sigma_\alpha^2 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}, \text{ and} \\
&E(\varepsilon_{it} \varepsilon_{js}) = \begin{cases} \sigma_\varepsilon^2 & \text{if } i = j,\ t = s \\ 0 & \text{otherwise.} \end{cases}
\end{aligned} \tag{12}$$

We also assume that all of the elements of the error term are uncorrelated with the explanatory variables, $x_j$.

The key econometric issue is that the presence of $\alpha_i$ in the error term means that the correlation among the errors of the same cross-sectional unit is not zero; the error terms for one farm, for instance, are correlated with each other. Therefore, the error terms exhibit serial correlation within each unit (rather than heteroskedasticity in the usual sense), so OLS is inefficient and its conventional standard errors are incorrect. The appropriate estimation technique is generalized least squares (GLS), a technique that adjusts the parameter estimates (and their standard error estimates) for an error covariance matrix that departs from the classical assumptions because of heteroskedasticity or autocorrelation. Alternatively, one can assume that $\alpha_i$ and $\varepsilon_{it}$ are normally distributed and use a maximum likelihood (ML) estimator. Hsiao [2003: 35-41] and Cameron and Trivedi [2005: 699-716] offer greater detail on the estimation of the parameters of both the fixed-effects and the random-effects models. It is enough for our purposes to accept that econometricians have found a number of ways to estimate these parameters.
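The covariance structure implied by the assumptions in (12) can be worked out directly. In the sketch below, $w_{it} = \alpha_i + \varepsilon_{it}$ is notation introduced here only for illustration; it denotes the composite error.

```latex
\begin{aligned}
\operatorname{Var}(w_{it}) &= E(\alpha_i^2) + 2E(\alpha_i \varepsilon_{it}) + E(\varepsilon_{it}^2)
  = \sigma_\alpha^2 + \sigma_\varepsilon^2, \\
\operatorname{Cov}(w_{it}, w_{is}) &= E(\alpha_i^2) = \sigma_\alpha^2 \quad (t \neq s), \\
\operatorname{Cov}(w_{it}, w_{js}) &= 0 \quad (i \neq j), \\
\operatorname{Corr}(w_{it}, w_{is}) &= \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2} \quad (t \neq s).
\end{aligned}
```

The variance is the same for every observation, but any two errors belonging to the same unit share the common component $\alpha_i$ and are therefore correlated, no matter how far apart in time they are.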

### Random-effects or fixed-effects model?

Economists generally prefer to use fixed-effects models. The decision to use fixed effects or random effects matters little when T is large because the two methods yield essentially the same estimates of the parameters. When the number of individual categories (N) is large and the number of time periods (T) is small, the choice of which model to use becomes less clear. Hsiao summarized this somewhat arcane issue with the following observations:

If the effects of omitted variables can be appropriately summarized by a random variable and the individual (or time) effects represent the ignorance of the investigator, it does not seem reasonable to treat one source of ignorance ($\alpha_i$) as fixed and the other source of ignorance ($\varepsilon_{it}$) as random. It appears that one way to unify the fixed-effects and random-effects models is to assume from the outset that the effects are random. The fixed-effects model is viewed as one in which investigators make inferences conditional on the effects that are in the sample. The random-effects model is viewed as one in which investigators make unconditional or marginal inferences with respect to the population of all effects. There is really no distinction in the “nature (of the effect).” It is up to the investigator to decide whether to make inference with respect to population characteristics or only with respect to the effects that are in the sample. Hsiao [2003: 43]

Needless to say, Hsiao’s advice may well leave many researchers without any idea of whether to use a random-effects or a fixed-effects model. In your own research I suggest that you consult an econometrician for advice.

There is one problem that arises when using a fixed-effects model. Assume that you have a sample of observations for a large number of individuals over a period of years. If you use a fixed-effects model, you will not be able to find parameter estimates for any variable, like race or sex, that does not change over the time period of the sample. The reason for this limitation is that the time-constant variables are perfectly collinear with the dummy variables used for the fixed effects. A similar problem arises if the fixed effects are for years (rather than individuals): you cannot include a variable that is constant for all individuals in any given year. Quite often the individual-constant (or time-constant) variable is not of interest and nothing is lost by not having the parameter estimate. On the other hand, the random-effects model does not have this problem because the estimation makes use of differences amongst the individuals to estimate a parameter for the individual-constant variable.1 We discuss in the next section an example in which this “problem” arises.
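A small numerical sketch may make the limitation concrete. Including a dummy for each individual is numerically equivalent to measuring every variable as a deviation from that individual's own time average, so a variable that never changes within an individual becomes a column of zeros. The data and variable names below are hypothetical.

```python
def demean_within(ids, values):
    """Subtract each individual's own time average from its observations."""
    totals, counts = {}, {}
    for i, v in zip(ids, values):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for i, v in zip(ids, values)]

# Hypothetical panel: two individuals observed for three years each.
ids = [1, 1, 1, 2, 2, 2]
female = [1, 1, 1, 0, 0, 0]             # constant over time for each person
wage = [2.0, 2.5, 3.0, 1.0, 1.5, 2.0]   # varies over time

female_dm = demean_within(ids, female)
wage_dm = demean_within(ids, wage)

print(female_dm)  # no variation left, so no coefficient can be estimated
print(wage_dm)
```

Once the time-constant variable has no variation left, there is nothing for the regression to use in estimating its coefficient.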

It would be helpful to have a statistical test that allows us to decide whether the random-effects model is the appropriate model. The Hausman (specification) test is such a test. It exploits the fact that, if the random-effects model is correctly specified, its parameter estimates should not be statistically different from those found using a fixed-effects specification. If the chi-squared statistic exceeds the critical value, you can conclude that the parameter estimates for the random-effects model are statistically different from those for the fixed-effects model, and hence that the random-effects model is misspecified. Unfortunately, the misspecification could be due to the fact that the fixed-effects model is appropriate or it could be due to the unobserved error terms being correlated with the included explanatory variables. If the latter is the case, then one might consider augmenting the model with an appropriate measure of the part of the unobserved effect that is correlated with the explanatory variables. What we are describing is the same thing that happens when omitted variables are correlated with the included explanatory variables: the parameter estimates are biased. We include an example of how to use Stata to perform the Hausman specification test.
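For intuition, the Hausman statistic can be sketched for the special case of a single slope coefficient, where it reduces to a squared difference divided by a difference of variances. With several coefficients the statistic is a quadratic form in the vector of differences, which this sketch does not cover. The estimates below are hypothetical and are not taken from any real regression.

```python
def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one coefficient.

    H = (b_fe - b_re)^2 / (var_fe - var_re), compared with a
    chi-squared distribution with 1 degree of freedom.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    return (b_fe - b_re) ** 2 / diff_var

CHI2_1_CRITICAL_5PCT = 3.841  # 5% critical value, chi-squared with 1 d.f.

# Hypothetical fixed-effects and random-effects estimates of one slope.
H = hausman_scalar(b_fe=0.50, b_re=0.40, var_fe=0.0030, var_re=0.0010)
reject_random_effects = H > CHI2_1_CRITICAL_5PCT

print(H, reject_random_effects)
```

In this made-up case the statistic exceeds the critical value, so the random-effects specification would be rejected at the 5 percent level.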

### Estimation of panel data models in Stata

There are three commands that matter in setting up the panel data. The first two commands precede the regression command because they establish which variable denotes the cross-sectional unit and which variable denotes the time period. These commands are:

. iis [variable name]

. tis [variable name]

In Stata 10 and later, both declarations can be made with the single command xtset [panel variable] [time variable].

The command for estimating the fixed-effects model is:

. xtreg depvar [varlist], fe

The command for estimating the random-effects model is:

. xtreg depvar [varlist], re

If the part of the command with the comma and either re or fe is omitted, Stata will assume that you want to estimate the random-effects model.

#### Understanding Stata output

To understand the Stata output we need to return to the algebra of the model. Assume that we are fitting a model of the following form:

$$y_{it} = \alpha + \sum_{j=1}^{k} \beta_j x_{jit} + \nu_i + \varepsilon_{it}, \quad i = 1, \ldots, N, \text{ and } t = 1, \ldots, T. \tag{13}$$

We can sum (13) over t (holding the individual unit constant) and divide by T to get:

$$\bar{y}_i = \alpha + \sum_{j=1}^{k} \beta_j \bar{x}_{ji} + \nu_i + \bar{\varepsilon}_i, \tag{14}$$

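where $\bar{y}_i$, $\bar{x}_{ji}$, and $\bar{\varepsilon}_i$ are the averages over time for unit $i$. Subtracting (14) from (13) gives:

$$(y_{it} - \bar{y}_i) = \sum_{j=1}^{k} \beta_j (x_{jit} - \bar{x}_{ji}) + (\varepsilon_{it} - \bar{\varepsilon}_i). \tag{15}$$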

Equations (13), (14), and (15) are the basis of Stata's estimates of the parameters of the model. In particular, the command xtreg, fe uses OLS to estimate (15); this is known as the fixed-effects estimator (or the within estimator). The command xtreg, be uses OLS to estimate (14) and is known as the between estimator. The command xtreg, re (the random-effects estimator) is a weighted average of the between and within estimators, where the weight is a function of the variances of $\nu_i$ and $\varepsilon_{it}$ ($\sigma_\nu^2$ and $\sigma_\varepsilon^2$, respectively); see footnote 2.
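The link between the deviations-from-means form in (15) and xtreg, fe can be checked by hand. The following is a minimal sketch, assuming the nlswork data set that ships with Stata and using a single regressor (tenure) purely for illustration; the names mean_lnw, mean_ten, dm_lnw, and dm_ten are made up for this example.

```stata
* Sketch: reproduce the within (fixed-effects) slope by hand
webuse nlswork, clear
* Keep one common estimation sample for both fits
keep if !missing(ln_wage, tenure)
xtset idcode year

* Person-specific means, as in equation (14)
bysort idcode: egen mean_lnw = mean(ln_wage)
bysort idcode: egen mean_ten = mean(tenure)

* Deviations from person means, as in equation (15)
generate dm_lnw = ln_wage - mean_lnw
generate dm_ten = tenure - mean_ten

* OLS on the demeaned data
regress dm_lnw dm_ten, noconstant
* Built-in within estimator
xtreg ln_wage tenure, fe
```

The two slope estimates should agree. The standard errors will differ, because plain OLS on the demeaned data does not adjust the degrees of freedom for the person means that were estimated along the way.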

In general, you will not make use of the between estimator. However, these three equations do lie at the basis of the goodness-of-fit measures that Stata reports. In particular, Stata output reports three “R-squareds” (see footnote 3): the overall $R^2$, the between $R^2$, and the within $R^2$. Each of these R-squareds is derived using one of the three equations: the overall $R^2$ uses (13), the between $R^2$ uses (14), and the within $R^2$ uses (15).
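A compact way to write the three measures, following the description in the Stata documentation for xtreg, is as squared correlations between fitted and actual values. The hat and bar notation below is introduced here for illustration: $\hat{y}_{it} = \sum_{j} \hat{\beta}_j x_{jit}$ is the fitted index and $\bar{\hat{y}}_i$ is its average over time for unit $i$.

$$
\begin{aligned}
R^2_{\text{overall}} &= \operatorname{corr}^2\left( \hat{y}_{it},\; y_{it} \right) \\
R^2_{\text{between}} &= \operatorname{corr}^2\left( \bar{\hat{y}}_i,\; \bar{y}_i \right) \\
R^2_{\text{within}} &= \operatorname{corr}^2\left( \hat{y}_{it} - \bar{\hat{y}}_i,\; y_{it} - \bar{y}_i \right)
\end{aligned}
$$

Only the measure that corresponds to the equation actually fitted by OLS behaves like an ordinary R-squared, which is why the term is kept in quotes.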

#### Example 1: A panel data analysis using Stata

In this example we follow the example offered in the Stata manual and use a large data set from the National Longitudinal Survey containing 28,534 observations on the wages of women who were between 14 and 26 years of age in 1968. The women were surveyed in each of the 21 years between 1968 and 1988 except for the six years 1974, 1976, 1979, 1981, 1984, and 1986. The study is focused on the determinants of wage levels, as measured by the natural logarithm of real wages.

Figure 1 shows the commands used to put the data into Stata. The first command (set memory 5m) increases the size of the memory that the program uses; I did this because of the large sample size. The use command accesses the data from the Stata web site. The describe command calls up a description of the variables. Figure 2 presents a summary of the data using the command summarize.
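For readers without access to the figures, the loading steps can be sketched as follows. This is an approximation rather than the exact commands in Figure 1: it assumes a recent Stata release, where webuse fetches the example file nlswork and where set memory is no longer needed because memory is managed automatically.

```stata
* Load the NLS young women extract shipped with Stata
* (set memory is only needed in Stata 11 and earlier)
webuse nlswork, clear

* List variable names, storage types, and labels
describe

* Means, standard deviations, minima, and maxima
summarize
```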

Additionally, because race is a categorical variable that has three potential values—1 if white, 2 if black, and 3 otherwise—we have to create a dummy variable in order to use this variable. The transformations we use are shown in Figure 3.
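A minimal sketch of this transformation, assuming the variable names in the nlswork file (race coded 1, 2, 3 as described above). The name black is chosen to match the regressor used later in the text.

```stata
webuse nlswork, clear

* Dummy equal to 1 for black women and 0 otherwise
generate byte black = (race == 2)

* Cross-check the new dummy against the original coding
tabulate race black
```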

The last step before estimating the regressions is to identify the data set as panel data. Figure 4 shows the two commands that must be entered in order for Stata to know that idcode is the individual category and that year is the time series variable.
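In current versions of Stata the panel declaration is usually written as one command. The sketch below assumes such a version; the older two-command form (iis and tis) is shown only as a comment, since it is an assumption that this is what the figure contained.

```stata
webuse nlswork, clear

* Declare idcode as the panel identifier and year as the time variable
xtset idcode year

* Older releases used two commands instead:
*   iis idcode
*   tis year

* Summarize the participation pattern of the panel
xtdescribe
```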

We are now ready to estimate the model (the natural logarithm of wages as a function of various variables). We begin with the random-effects model. Figure 5 shows the command and the results of the estimation of the random-effects model. There are several things to note here. First, in the command we are able to refer to all variables that have age in them by using age*; the * tells Stata to use any variable that begins with the letters age. Second, we will need to use the estimation results in the Hausman test. Thus, we have stored these results in “random_effects” using the command estimates store random_effects.
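The estimation step might look like the sketch below. It is a reconstruction under stated assumptions, not a copy of Figure 5: the squared terms age2, ttl_exp2, and tenure2 are inferred from the wildcard usage and from the peak-age calculation in the text, and the regressor list follows the variables discussed in this example.

```stata
webuse nlswork, clear

* Assumed transformations (names chosen so that age*, ttl_exp*, tenure* pick them up)
generate byte black = (race == 2)
generate age2 = age^2
generate ttl_exp2 = ttl_exp^2
generate tenure2 = tenure^2

xtset idcode year

* Random-effects (GLS) estimates; the wildcards expand to each variable and its square
xtreg ln_wage grade age* ttl_exp* tenure* black not_smsa south, re

* Keep the results for the Hausman test
estimates store random_effects
```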

Notice that three R-squared values are reported in Figure 5. Also, wages reach a peak when the woman is $-\frac{0.036806}{2(-0.0007133)} = 25.7998$ years old and after 9.795857 years on the job. The interpretation of the other variables demands a bit of algebra. For instance, the fact that black is a dummy variable affects our interpretation; when an individual is black, her wage level is $\ln w_B = \beta_0 + \beta_1 + \cdots$. When she is nonblack, her wage level is $\ln w_{NB} = \beta_0 + \cdots$. Thus, we have $\ln w_B - \ln w_{NB} = \beta_1$ or $\frac{w_B}{w_{NB}} = e^{\beta_1} = e^{-0.0530532} = 0.94833$. Thus, the wage level of a black woman is, everything else held constant, 94.8 percent of the wage level of a nonblack woman.
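The arithmetic in this paragraph can be checked with Stata's display command. The coefficient values are the ones quoted in the text; nothing else is assumed.

```stata
* Age at which the quadratic age profile peaks: -b_age / (2 * b_age2)
display -0.036806 / (2 * (-0.0007133))

* Ratio of black to nonblack wages implied by the dummy coefficient
display exp(-0.0530532)

* The same gap expressed as a percentage difference
display 100 * (exp(-0.0530532) - 1)
```

The first two results should reproduce the 25.80 years and the 0.948 ratio reported above; the third restates the ratio as a wage gap of a little over 5 percent.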

If we assume that grade is a continuous variable (it really is not), we have the following interpretation of the parameter: $\ln w = \beta_0 + \beta_1\,\mathit{grade} + \cdots$ implies that $\frac{1}{w}\frac{\partial w}{\partial \mathit{grade}} = \beta_1$. Thus, in our case an increase of 1 year of schooling causes wages to increase by about 6.46 percent.
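Because grade actually moves in whole years, the derivative is only an approximation. A short worked comparison, taking $\beta_1 = 0.0646$ from the 6.46 percent figure quoted above, shows how close the approximation is to the exact discrete change:

$$
\frac{w(\mathit{grade} + 1)}{w(\mathit{grade})} = e^{\beta_1} = e^{0.0646} \approx 1.0667,
\qquad
e^{\beta_1} - 1 \approx 0.0667 .
$$

So the exact effect of one more year of schooling is roughly 6.7 percent, slightly above the 6.46 percent given by the derivative. The two agree closely whenever the coefficient is small.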

We can compare the results of using the re option with using the mle option (which directs Stata to use maximum likelihood techniques to estimate the parameters of the system). The mle parameter estimates, shown in Figure 6, are essentially the same as those generated using the re option. However, the estimates of the standard errors (and, thus, the z-values) are different.

The estimation of the fixed-effects model is straightforward and is shown in Figure 7. The command is the same as in the random-effects model but with the re replaced by fe. Notice from the results that the variables grade and black are dropped from the estimation results. They are dropped because the amount of schooling and the race of an individual are fixed over all observations. These two variables, thus, are perfectly collinear with the dummy variables that hold constant the individual-level characteristics. The effects of education and race differences are absorbed into the individual-specific effects.

The estimates of the parameter values for the fixed-effects model are very similar to those found for the random-effects model, with the exception of the parameters associated with not living in an SMSA (not_smsa) and with living in the South (south). The random-effects model suggests that the wage level for someone living outside of an SMSA is 87.6 percent of the wage level of someone living in an SMSA; in the fixed-effects model, the wage level outside the SMSA is estimated to be 91.5 percent of the wage level of a woman living in an SMSA. The random-effects model estimates that wages in the South are 91.6 percent of the level of wages outside the South; the fixed-effects model puts this ratio at 91.6 percent.

The final issue we discuss in this example is the Hausman specification test. If the model is correctly specified and if $\nu_i$ is uncorrelated with the explanatory variables, then the parameter estimates in the two models should not be statistically different. As shown in Figure 8, we first must save the results of the fixed-effects estimation using the command estimates store fixed_effects. The null hypothesis is that the difference in the parameter estimates is not systematic. The appropriate test statistic is distributed $\chi^2(8)$, where the degrees of freedom are equal to the number of parameters compared across the two models (8). The chi-squared statistic of 149.44 is greater than the critical value and we must reject the null hypothesis. The Stata manual offers this interpretation of this result:

What does this mean? We have an unpleasant choice: we can admit that our model is misspecified—that we have not parameterized it correctly—or we can hold to our specification
being correct, in which case the observed differences must be due to the zero correlation of $\nu_i$ and the $x_{it}$ assumption. [StataCorp: 202]
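The whole test can be sketched in a few lines. As before, this is a reconstruction under assumptions rather than the contents of Figure 8: the squared terms and the regressor list are inferred from the text, and the stored-result names follow the ones the text uses.

```stata
webuse nlswork, clear
generate byte black = (race == 2)
generate age2 = age^2
generate ttl_exp2 = ttl_exp^2
generate tenure2 = tenure^2
xtset idcode year

* Fixed-effects fit (consistent under both hypotheses)
quietly xtreg ln_wage grade age* ttl_exp* tenure* black not_smsa south, fe
estimates store fixed_effects

* Random-effects fit (efficient under the null hypothesis)
quietly xtreg ln_wage grade age* ttl_exp* tenure* black not_smsa south, re
estimates store random_effects

* Hausman test: consistent estimator first, efficient estimator second
hausman fixed_effects random_effects

* 5 percent critical value of the chi-squared distribution with 8 degrees of freedom
display invchi2tail(8, 0.05)
```

The 5 percent critical value for 8 degrees of freedom is about 15.51, so a statistic as large as the 149.44 reported above rejects the null hypothesis decisively.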

### Exercises

#### Exercise 1

Estimation of a Labor Supply Function. An important issue in labor economics is the responsiveness of the number of hours worked to wages. Because labor supply curves can, in theory, be backward-bending, the sign and size of the impact of wages on the amount of labor supplied is an empirical issue. In this project you are to estimate the labor supply curve for a panel of adult males.

#### The model to be estimated is:

$$y_{it} = \beta_0 + \beta_1 h_{it} + \beta_2 Age_{it} + \beta_3 Age_{it}^2 + \beta_4 NC_{it} + \beta_5 HI_{it} + \varepsilon_{it}$$
(16)

where:

$y_{it}$ = natural logarithm of total number of hours worked by individual i in year t,

$h_{it}$ = natural logarithm of individual i’s wage rate in year t,

$Age_{it}$ = age of individual i in year t,

$NC_{it}$ = number of children of individual i in year t, and

$HI_{it}$ = a dummy variable equal to 1 if individual i in year t has bad health and 0 otherwise.

The data are from Ziliak, James P. (1997) “Efficient Estimation with Panel Data When Instruments Are Predetermined: An Empirical Comparison of Moment-Condition Estimators,” Journal of Business & Economic Statistics 15(4): 419-431. Ziliak (p. 423) describes his data as follows:

The data used to estimate the life-cycle labor-supply parameters come from Waves XII-XXI (calendar years 1978-1987) of the PSID. The sample is selected on many dimensions and is similar to other research studying life-cycle models of labor supply. The sample is restricted to continuously married, continuously working, prime-age men aged 22-51 in 1978 from the Survey Research Center random subsample of the PSID. In addition the individual must either be paid an hourly wage rate or must be salaried, and he cannot be a piece-rate worker or self-employed. This selection process resulted in a balanced panel of 532 men over 10 years or 5,320 observations. The real wage rate, $w_{it}$, is the hourly wage reported by the panel participant rather than the average wage (annual earnings over annual hours) to minimize division bias (Borjas 1981).

The data are available in any of the three files MOM.dat, MOM.doc, and MOM.wks.

1. Provide scatter plots of the dependent variable (natural logarithm of hours) against each of the explanatory variables: natural logarithm of real wages, age, number of children, and health. (Label these Figures 1 to 4.)
2. Present a table of the summary statistics for all of the variables in this data set (except ID and Year).
3. Provide a histogram of each of the following variables: Natural logarithm of hours, Natural logarithm of real wages, Age, and Number of children. (Label these Figures 5 to 8).
4. Estimate Equation (16) using (1) OLS (sometimes called a “pooled model”), (2) a “between” model (where the observations in the regression are the averages over the 10 years of each variable for each individual), (3) a fixed effects model, (4) an MLE random effects model, and (5) a GLS random effects model. Present the results of your estimations in a single table and offer an interpretation for each parameter you estimate. Use Table 1 as shown below as a template for the table to present your results.
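
| | (1) Pooled | (2) Between | (3) Fixed Effects | (4) MLE Random Effects | (5) GLS Random Effects |
|---|---|---|---|---|---|
| Natural logarithm of real wages | ( ) | ( ) | ( ) | ( ) | ( ) |
| Age | ( ) | ( ) | ( ) | ( ) | ( ) |
| Age² | ( ) | ( ) | ( ) | ( ) | ( ) |
| Number of children | ( ) | ( ) | ( ) | ( ) | ( ) |
| Health indicator | ( ) | ( ) | ( ) | ( ) | ( ) |
| Intercept | ( ) | ( ) | ( ) | ( ) | ( ) |
| $R^2$ | | | | | |
| $\sigma_\mu$ | | | | | |
| $\sigma_\varepsilon$ | | | | | |
| Sample size | | | | | |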

#### Exercise 2

The Effectiveness of Advertising Bans on Smoking. Anti-smoking activists often push for a total ban on cigarette advertisements. Indeed, one of the basic assumptions of the groups pushing the 1996 proposed settlement with the tobacco companies is that the amount of tobacco consumed is positively affected by the amount of tobacco advertising. There are two mechanisms that might underlie such a relationship. The first mechanism suggests that advertising increases the number of cigarettes smoked by current smokers. Many economists doubt that tobacco advertising increases the consumption of current smokers, arguing that the total consumption of cigarettes is unresponsive to advertising. Instead, they argue that advertising is an effort by cigarette companies to affect the brand of cigarettes that current smokers consume. The second mechanism suggests that advertising is an effort by cigarette companies to induce non-smokers (especially children) to try cigarettes. The main reason that cigarette companies want non-smokers to try smoking, so the argument goes, is that some percentage of non-smokers who try cigarettes will become addicted and will form the future demand for cigarettes.

The effect of a total ban on advertising would be completely different if cigarette companies advertise with the hope of increasing the number of people addicted to cigarettes. In particular, the ban should have a small or negligible effect on current cigarette demand. Instead, the cigarette companies would face a steadily decreasing demand for their product. Such a decrease in demand would reduce future profits for these companies. If future profits fell enough, some of the companies might be forced out of business. Clearly, it is this result that anti-smoking activists have in mind with their proposals to ban cigarette advertisements.

Finally, if advertising only induces current smokers to increase the number of cigarettes they consume, then the total ban on advertising should cause a one-time reduction in cigarette consumption that will reduce the profits of cigarette companies. However, which of these three mechanisms (if any) is correct is an empirical question.

Six of the countries in the sample adopted a complete ban on cigarette advertising in the period after 1970. In this project we use annual data on tobacco consumption in 22 developed countries for the 27 years between 1964 and 1990 to test the effect of a complete advertising ban on cigarette demand (giving us 594 observations). Moreover, since we have no a priori reason to choose one model specification over another, we check the stability of the estimated impact of an advertising ban on cigarette demand under several alternative model specifications.

We estimate three types of specifications of the model: the linear model, the log-linear model, and the log-log model. In general, whether one uses a variable or the logarithm of the variable is the main difference among these three specifications. The linear model does not transform either the dependent or the independent variables. A variation on the linear model allows the use of the square and product of some of the independent variables in order to take care of any non-linearity in the data. The log-linear model takes the same form as the linear model except that the dependent variable is the logarithm of the variable under study. Finally, in the log-log model both the dependent and independent variables are, if possible, in logarithm form.

For example, for this problem the dependent variable in any of these specifications is either the per capita consumption of tobacco or the logarithm of the per capita consumption of tobacco. The independent variables might include (1) the real price of tobacco in each country for each year, (2) a measure of the per capita income level of the country for each year, (3) the unemployment rate of the country for each year, (4) a measure of the age distribution of the population to measure smoking intensity by age, (5) a trend variable to account for the rising awareness of the health costs of smoking, (6) a dummy variable equal to one for years that a country has a complete ban on cigarette advertising, and (7) a set of 21 dummy variables identifying the country. Let $T_{it}$ be the measure of per capita cigarette consumption in country i for year t; $P_{it}$, the price of tobacco; $I_{it}$, the measure of per capita income level; $U_{it}$, country i’s unemployment rate in year t; $A_{it}$, country i’s age distribution in year t; $Year_t$, a trend variable; $B_{it}$, the dummy variable for the ban; and $C_i$, the dummy variable for country i.

#### Examples of the three models are:

1. Linear: $T_{it} = \beta_0 + \beta_1 P_{it} + \beta_2 I_{it} + \beta_3 U_{it} + \beta_4 A_{it} + \beta_5 Year_t + \beta_6 B_{it} + \varepsilon_{it}$
2. Log-Linear: $\ln(T_{it}) = \beta_0 + \beta_1 P_{it} + \beta_2 I_{it} + \beta_3 U_{it} + \beta_4 A_{it} + \beta_5 Year_t + \beta_6 B_{it} + \varepsilon_{it}$
3. Log-Log: $\ln(T_{it}) = \beta_0 + \beta_1 \ln(P_{it}) + \beta_2 \ln(I_{it}) + \beta_3 U_{it} + \beta_4 A_{it} + \beta_5 Year_t + \beta_6 B_{it} + \varepsilon_{it}$

In models (1) and (2) it is possible to include additional explanatory variables that are the square of some of the currently included explanatory variables. In all three models it is possible to include as explanatory variables the product of the ban dummy and any of the currently included explanatory variables. Finally, in equation (3) we cannot take the logarithm of the unemployment rate because the data we have report zero levels of unemployment.
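As an aid to comparing the three specifications, the price coefficient $\beta_1$ has a different meaning in each. The summary below is a sketch using standard calculus for the basic specifications without squared or interaction terms; the symbol $\eta$ for the price elasticity of demand is introduced here and is not part of the original notation.

$$
\begin{aligned}
\text{Linear:} \quad & \frac{\partial T_{it}}{\partial P_{it}} = \beta_1, & \eta &= \beta_1 \frac{P_{it}}{T_{it}} \\
\text{Log-linear:} \quad & \frac{\partial \ln T_{it}}{\partial P_{it}} = \beta_1, & \eta &= \beta_1 P_{it} \\
\text{Log-log:} \quad & \frac{\partial \ln T_{it}}{\partial \ln P_{it}} = \beta_1, & \eta &= \beta_1
\end{aligned}
$$

In the two logarithmic specifications the ban dummy shifts consumption proportionally: holding the other variables fixed, consumption with a ban is $e^{\beta_6}$ times consumption without one.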

The data you will use in this project are in the MS Excel file Smkdata.xls. The variables included in the file are as follows:

| Column | Variable | Definition |
|---|---|---|
| A | Country | Name of country |
| B | Country ID | Integer from 1 to 22, each designating a country |
| C | Year | Year of observation (1964, …, 1990) |
| D | Tobacco | Total grams of tobacco sold per individual 15 years or older |
| E | Price | Real price of 20 grams of tobacco in 1990 US cents (= nominal price per 20 grams of tobacco divided by the Gross Domestic Product deflator) |
| F | Consump | Per capita private final consumption expenditures in 1990 US dollars |
| G | Unemp | Number of unemployed persons per 1000 members of the workforce |
| H | AgeDist | Age distribution. This variable attempts to measure the differences in intensity of smoking as a function of age. It is equal to the relative consumption rate of tobacco in the UK observed between 1966 and 1981 by age group times the percentage of the population in the country in that age group. |
| I | Ban | Dummy variable equal to 1 if the country has a complete ban on tobacco advertising. The six countries in the sample with a complete ban and the first year of the ban are: Iceland (1972), Norway (1976), Finland (1979), Portugal (1984), Italy (1984), and Canada (1989). |
| J | BanTime | The number of years since the ban was put in place (if the ban went into effect in 1972, then years 1964-1972 are equal to 0, year 1973 equals 1, year 1974 equals 2, etc.) |

(a) How do these variables match the ones suggested in the discussion of equations (1), (2), and (3)?

(b) Estimate the fixed effects models of the following versions of equations (1), (2), and (3):

1. Equations (1), (2), and (3) as specified above.
2. Equations (1) and (2) with squared terms for the price, income, unemployment rate, and the age distribution included. This regression is designed to test for non-linearity.
3. Equations (1) and (2) with the squared terms mentioned in 2 that are statistically significant plus the following new variables: Ban*Time, Ban*Price, and Ban*Consump. (You must create these variables.) This regression allows for an effect of the Ban on the slopes of the other explanatory variables.
4. Equation (3) with the following new variables: Ban*Log(Time), Ban*Log(Price), and Ban*Log(Consump).
5. Equations (1), (2), and (3) as estimated in 3 and 4 with a variable that counts the number of years that a total ban has been in effect (BanTime) and its square (BanTime²). This regression allows for a changing impact of a ban the longer it is in effect.

Report the results of your regressions in a table that allows you to comment on the stability of your estimation results over specifications.

(c) Do these results support any of the theories suggested above?

(d) What, if any, policy conclusions would you make given your estimations?

(e) Assume for the moment that you “believe” the results you got in (5). Sketch out a strategy you would follow to forecast the impact of a ban in a country that does not currently have a ban.

Note: The data in this problem are from Stewart, Michael J. (1993) “The Effect on Tobacco Consumption of Advertising Bans in OECD Countries,” International Journal of Advertising 12(2): 155-180. The data set can be downloaded from the author's website.

### Bibliography

Cameron, A. Colin and Pravin K. Trivedi (2005). Microeconometrics: Methods and Applications (New York: Cambridge University Press).

Greene, W. H. (2003). Econometric Analysis, 5th edition (Upper Saddle River, NJ: Prentice-Hall).

Hsiao, Cheng (2003). Analysis of Panel Data, 2nd Edition (New York: Cambridge University Press).

StataCorp (2003). Stata Statistical Software: Release 8.0 (College Station, TX: Stata Corporation).

Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data (Cambridge, MA: MIT Press).

## Footnotes

1. Another way to think about this point is to remember that, unlike the fixed-effects model, the random-effects model does not use dummy variables to summarize the unknown characteristics; thus, there is no problem with multicollinearity.
2. See Cameron and Trivedi (2005: 705) for a detailed discussion of the random-effects estimator.
3. R-squared is in quotes in this line because these R-squareds do not have all the properties of OLS R-squareds.
