Connexions

Logit and Probit Regressions

Module by: Christopher Curran.

Summary: This module reviews the basic concepts needed to estimate and understand logit and probit regressions using Stata. It is intended for advanced undergraduates.

Logit and Probit models

Introduction

Consider a model that “explains” whether a wife enters the work force. It is straightforward to think of potential explanatory variables: her potential wage rate, the income of her partner, the number of children under the age of 6 in the household, and the number of children in the household between the ages of 6 and 18 are all candidates to serve as independent variables explaining the wife’s decision to enter the labor force. The dependent variable, Y, however, is a dummy variable because the wife either enters the labor force ($Y=1$) or does not ($Y=0$). An OLS model of the form:

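$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + \varepsilon_i \tag{1}$$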

does not make sense. Figure 1 shows what the data of this model might look like when graphed against one of the explanatory variables, together with the regression line that an OLS estimation of (1) yields. One problem with this approach is easy to see: the predicted values of Y, which we would like to interpret as probabilities, can be greater than 1 or less than 0. In addition, the error term must be given special properties (because Y takes only the values 0 and 1, the error term is heteroskedastic and cannot be normally distributed), yet it is precisely the simple properties ascribed to the error term that make the OLS model so attractive.1
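The problem is easy to reproduce. The Python sketch below fits OLS to a small, made-up 0/1 data set. The numbers are purely illustrative and do not come from this module. The sketch shows that the fitted line leaves the unit interval at both ends of the data.

```python
# A minimal sketch: OLS on a binary outcome (a linear probability model).
# The data below are hypothetical and chosen only for illustration.

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # Y = 1 if the wife is in the labor force

b0, b1 = ols_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"fitted value at x=0: {fitted[0]:.3f}")   # below 0
print(f"fitted value at x=9: {fitted[-1]:.3f}")  # above 1
```

With these data the fitted values run from about -0.13 to about 1.13.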

The logit model

We can simplify our analysis by using a bit of algebra. In the logit model the probability that $Y=1$ is given by the logistic function $\Pr(z) = \frac{e^z}{1+e^z}$, where $z = \beta_0 + \beta_1 x + \varepsilon$. First, the complementary probability is $1-\Pr(z) = 1-\frac{e^z}{1+e^z} = \frac{1}{1+e^z}$. Thus,

$$\frac{\Pr(z)}{1-\Pr(z)} = \frac{\frac{e^z}{1+e^z}}{\frac{1}{1+e^z}} = e^z. \tag{2}$$

The left-hand side of (2) is the odds that $Y=1$. Taking the natural logarithm of both sides of (2) and substituting $z = \beta_0 + \beta_1 x + \varepsilon$ gives the log-odds, or logit, form of the model:

$$\ln\left(\frac{\Pr(x)}{1-\Pr(x)}\right) = \beta_0 + \beta_1 x + \varepsilon. \tag{3}$$
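As a quick numerical check of (2) and (3), the Python sketch below evaluates the logistic probability at a few arbitrary values of z. It confirms that the implied odds equal $e^z$ and that the log-odds recover z. The z values are illustrative assumptions.

```python
import math

def logistic(z):
    """Pr(z) = e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))

for z in (-2.0, -0.5, 0.0, 1.3, 3.0):
    p = logistic(z)
    odds = p / (1.0 - p)          # equation (2): should equal e^z
    log_odds = math.log(odds)     # equation (3): should recover z
    print(f"z={z:5.2f}  Pr={p:.4f}  odds={odds:.4f}  e^z={math.exp(z):.4f}  log-odds={log_odds:.4f}")
```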

We can estimate the parameters of this model using maximum likelihood methods. In the probit model the error term is assumed to be normally distributed with a mean of zero and a unit variance.3 In the logit model the error term is assumed to have a standard logistic distribution. This distribution has a mean of 0 and a variance of $\pi^2/3 \approx 3.29$, and its shape is very similar to that of a normal distribution, although its tails are somewhat thicker.4 The choice of which model to use is generally a matter of personal preference. Note, however, that the two error terms are scaled differently. As a result, the ratio of a logit parameter estimate to the corresponding probit parameter estimate (using the same data set) usually lies between 1.6 and 2.0; the ratio of the two error standard deviations, $\pi/\sqrt{3} \approx 1.81$, falls in this range. We focus on the logit model in the balance of this discussion.
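The scaling point can be checked numerically. The Python sketch below uses the standard library only. It computes the logistic standard deviation $\pi/\sqrt{3}$. It also finds the largest gap between the logistic CDF evaluated at $s \cdot z$ and the standard normal CDF at $z$, for a few scale factors $s$. The grid and the choice of scale factors are illustrative assumptions.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF, Lambda(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Standard deviation of the standard logistic distribution: sqrt(pi^2 / 3).
logistic_sd = math.pi / math.sqrt(3.0)

def max_cdf_gap(scale, lo=-5.0, hi=5.0, steps=2001):
    """Largest |Lambda(scale * z) - Phi(z)| over an evenly spaced grid of z."""
    gap = 0.0
    for i in range(steps):
        z = lo + (hi - lo) * i / (steps - 1)
        gap = max(gap, abs(logistic_cdf(scale * z) - normal_cdf(z)))
    return gap

print(f"logistic standard deviation = {logistic_sd:.4f}")
for scale in (1.6, logistic_sd, 2.0):
    print(f"scale {scale:.3f}: max |Lambda(scale*z) - Phi(z)| = {max_cdf_gap(scale):.4f}")
```

For scale factors of 1.6 and $\pi/\sqrt{3}$ the maximum gap is only about 0.02. This is why logit coefficients come out roughly 1.6 to 2 times larger than probit coefficients estimated on the same data.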

Interpretation of the logit model parameters

The economic meaning of the parameter values in a logit model is not very obvious.5 One simple, but not often used, interpretation comes from taking the first derivative of (3) with respect to x:

$$\ln(\text{odds}_{Y=1}) = \beta_0 + \beta_1 x + \varepsilon \quad\Rightarrow\quad \frac{\partial \ln(\text{odds}_{Y=1})}{\partial x} = \beta_1. \tag{4}$$

Thus, in the labor force participation model one interpretation is that $\beta_1$ is equal to the change in the natural logarithm of the odds that the wife is in the labor force due to a one-unit change in the independent variable x. This interpretation is both awkward and not really economically informative.

Stata offers two commands for estimating a logit regression: logit and logistic. The logit command returns the parameter estimates as shown in (3). The logistic command returns odds ratios rather than parameter estimates; the odds ratio for x is equal to $e^{\beta_1}$. Thus, one can go from the odds ratio reported by the logistic command to the parameter estimate merely by taking the natural logarithm of the odds ratio. The interpretation of the odds ratio is straightforward. For example, assume that $y=1$ means that the birth weight of an individual is less than 2,500 grams and $y=0$ means that the birth weight is 2,500 grams or more.
A logit parameter estimate of $-0.027$ is equivalent to an odds ratio of 0.97 (i.e., $e^{-0.027} \approx 0.97$). An odds ratio of 0.97 means that a one-unit increase in x multiplies the odds that a baby is underweight by 0.97, that is, lowers those odds by about 3 percent. To see why, rewrite (3) as:

$$\frac{\Pr(x)}{1-\Pr(x)} = e^{\beta_0 + \beta_1 x + \varepsilon}.$$

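A one-unit change in x implies that:

$$\frac{\Pr(x+1)}{1-\Pr(x+1)} = e^{\beta_0 + \beta_1 (x+1) + \varepsilon} = e^{\beta_0 + \beta_1 x + \varepsilon}\, e^{\beta_1},$$

or

$$\frac{\Pr(x+1)}{1-\Pr(x+1)} = \frac{\Pr(x)}{1-\Pr(x)}\, e^{\beta_1},$$

or

$$\frac{\Pr(x+1)/\left(1-\Pr(x+1)\right)}{\Pr(x)/\left(1-\Pr(x)\right)} = e^{\beta_1}.$$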

Thus, $e^{\beta_1}$ is the factor by which the odds that y equals 1 (a baby is born underweight) are multiplied when x increases by one unit. The corresponding percent change in the odds is $100(e^{\beta_1}-1)$. The logistic command reports $e^{\hat{\beta}_1}$, while the logit command reports $\hat{\beta}_1$. Because the odds ratio is easy to interpret, Stata argues that the logistic command is the proper one to use.
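The arithmetic linking the two Stata outputs can be sketched in Python. The intercept below is a hypothetical value chosen only for illustration; the slope of $-0.027$ mirrors the example above. The loop checks that the ratio of the odds at $x+1$ to the odds at $x$ equals $e^{\beta_1}$ wherever x is evaluated. The last line takes the logarithm of the age odds ratio reported in Table 2.

```python
import math

def logit_prob(x, b0, b1):
    """Pr(y = 1 | x) in the logit model, with the error term set to zero."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def odds(p):
    return p / (1.0 - p)

b0 = 1.0      # hypothetical intercept, for illustration only
b1 = -0.027   # logit coefficient, as in the example above

odds_ratio = math.exp(b1)   # what the logistic command would report
print(f"odds ratio e^b1 = {odds_ratio:.4f}")
print(f"percent change in the odds = {100 * (odds_ratio - 1):.2f}%")

# The ratio odds(x + 1) / odds(x) equals e^b1 at every x.
for x in (15, 25, 35):
    ratio = odds(logit_prob(x + 1, b0, b1)) / odds(logit_prob(x, b0, b1))
    print(f"x = {x}: odds(x+1)/odds(x) = {ratio:.6f}")

# Going the other way: ln(odds ratio) recovers the logit coefficient.
print(f"ln(0.9732636) = {math.log(0.9732636):.4f}")
```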

Elasticities

Another route is to look for something that can be interpreted as an elasticity. Elasticities are an important enough topic in economics for us to discuss them here in some detail. Economists find them attractive because they are unit-free and, thus, can be compared across different commodities. For instance, it is quite reasonable to compare the demand elasticity for apples with the demand elasticity for pearl necklaces even though apples and necklaces are measured in different units. There are a few important ways that elasticities appear in regressions.

Linear regression elasticities

In a linear regression of the form (ignoring the subscripts and the error term)

$$Y = \beta_0 + \beta_1 x,$$

we would calculate the elasticity of Y with respect to x to be

$$\eta_{Yx} = \frac{x}{Y}\frac{\partial Y}{\partial x} = \beta_1 \frac{x}{Y}.$$

Because this elasticity varies along the regression line, researchers need to choose the levels of Y and x at which to report it; it is traditional to calculate the elasticity at the means. Thus, economists typically report

$$\eta_{Yx} = \beta_1 \frac{\bar{x}}{\bar{Y}}.$$
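The elasticity at the means can be computed directly from an OLS fit. The Python sketch below uses a tiny hypothetical data set, not data from this module. It estimates $\beta_1$ and then evaluates $\beta_1 \bar{x}/\bar{Y}$.

```python
# Elasticity at the means for a simple linear regression Y = b0 + b1*x.
# The data are hypothetical and used only to illustrate the formula.

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
elasticity_at_means = b1 * x_bar / y_bar
print(f"b1 = {b1:.3f}, elasticity at means = {elasticity_at_means:.3f}")
```

With these data $\beta_1 = 0.6$, $\bar{x} = 3$ and $\bar{Y} = 4$, so the elasticity at the means is 0.45.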

Constant elasticities

Consider the following demand equation:

$$q = \alpha p^{-\beta} e^{\varepsilon}, \tag{5}$$

where q is the quantity demanded, p is the price at which the good is sold, $\alpha, \beta > 0$, and $\varepsilon$ is an error term. The price elasticity of demand is given by

$$\eta_{qp} = \frac{p}{q}\frac{\partial q}{\partial p} = \frac{p}{\alpha p^{-\beta} e^{\varepsilon}}\left(-\beta\alpha p^{-\beta-1} e^{\varepsilon}\right) = -\beta.$$

In other words, this demand curve has a constant price elasticity of demand equal to $-\beta$. Moreover, we can convert the estimation of this equation into a linear regression by taking the natural logarithm of both sides of (5) to get $\ln q = \ln\alpha - \beta \ln p + \varepsilon$. The coefficient on $\ln p$ in this regression is the elasticity itself.
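A small simulation illustrates the log-log trick. The Python sketch below draws prices, generates quantities from (5) with hypothetical values $\alpha = 10$ and $\beta = 1.5$ and a normal error with standard deviation 0.1, and regresses $\ln q$ on $\ln p$ by OLS. All parameter values and the seed are illustrative assumptions.

```python
import math
import random

# Simulate demand q = alpha * p^(-beta) * e^eps with hypothetical parameters,
# then regress ln q on ln p by OLS. The slope should be close to -beta.
random.seed(12345)
alpha, beta = 10.0, 1.5

ln_p, ln_q = [], []
for _ in range(2000):
    p = random.uniform(1.0, 10.0)
    eps = random.gauss(0.0, 0.1)
    q = alpha * p ** (-beta) * math.exp(eps)
    ln_p.append(math.log(p))
    ln_q.append(math.log(q))

n = len(ln_p)
mx, my = sum(ln_p) / n, sum(ln_q) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(ln_p, ln_q))
         / sum((a - mx) ** 2 for a in ln_p))
intercept = my - slope * mx
print(f"estimated elasticity (slope) = {slope:.3f}  (true value {-beta})")
print(f"estimated ln(alpha) = {intercept:.3f}  (true value {math.log(alpha):.3f})")
```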

The logit equation and the quasi-elasticity

It is not appropriate to use the usual formula for an elasticity with (3) because the dependent variable is a probability, which is itself a unit-free number between 0 and 1. It makes more sense to calculate the quasi-elasticity, which is defined as:

$$\eta(x) = x\,\frac{\partial \Pr(x)}{\partial x}. \tag{6}$$

Since, from (3),

$$\ln\left(\frac{\Pr(x_i)}{1-\Pr(x_i)}\right) = \beta_0 + \beta_1 x_i + \varepsilon,$$

we can calculate this elasticity as follows:

$$\frac{\partial}{\partial x}\left[\ln\left(\frac{\Pr(x_i)}{1-\Pr(x_i)}\right)\right] = \beta_1.$$

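Focusing on the left-hand side, we get:

$$\left(\frac{1}{\Pr(x_i)} + \frac{1}{1-\Pr(x_i)}\right)\frac{\partial \Pr(x_i)}{\partial x} = \beta_1,$$

or

$$\frac{1}{\Pr(x_i)\left(1-\Pr(x_i)\right)}\,\frac{\partial \Pr(x_i)}{\partial x} = \beta_1,$$

or

$$\frac{\partial \Pr(x_i)}{\partial x} = \beta_1 \Pr(x_i)\left(1-\Pr(x_i)\right). \tag{7}$$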

Thus, we see from (6) and (7) that the quasi-elasticity is given by:

$$\eta(x_i) = \beta_1 x_i \Pr(x_i)\left(1-\Pr(x_i)\right). \tag{8}$$

The quasi-elasticity measures the percentage-point change in the probability due to a 1 percent increase in x. Notice that it depends on the value of x at which it is evaluated. It is usual to evaluate (8) at the mean of x. Thus, the quasi-elasticity at the mean of x is:

$$\eta(\bar{x}) = \beta_1 \bar{x} \Pr(\bar{x})\left(1-\Pr(\bar{x})\right),$$

where

$$\Pr(\bar{x}) = \frac{e^{\beta_0 + \beta_1 \bar{x}}}{1 + e^{\beta_0 + \beta_1 \bar{x}}}.$$
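The quasi-elasticity at the mean can be computed and checked with a finite difference. The Python sketch below uses hypothetical coefficients ($\beta_0 = -1$, $\beta_1 = 0.05$) and a hypothetical mean $\bar{x} = 30$. It evaluates (8) and compares the result with the actual change in the probability when x rises by 1 percent, expressed in percentage points.

```python
import math

def pr(x, b0, b1):
    """Logit probability Pr(x) = e^(b0+b1*x) / (1 + e^(b0+b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def quasi_elasticity(x, b0, b1):
    """Equation (8): eta(x) = b1 * x * Pr(x) * (1 - Pr(x))."""
    p = pr(x, b0, b1)
    return b1 * x * p * (1.0 - p)

# Hypothetical coefficients and sample mean of x, for illustration only.
b0, b1, x_bar = -1.0, 0.05, 30.0

eta = quasi_elasticity(x_bar, b0, b1)

# Check: a 1 percent increase in x should change Pr by about eta percentage
# points, i.e. by about eta / 100 in probability units.
delta_p = pr(1.01 * x_bar, b0, b1) - pr(x_bar, b0, b1)
print(f"quasi-elasticity at the mean: {eta:.4f}")
print(f"change in Pr for a 1% rise in x: {100 * delta_p:.4f} percentage points")
```

The two printed numbers agree closely (both are about 0.35). The small remaining difference comes from the curvature of the logistic function.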

Hypothesis testing

The researcher using the logit model (or any model estimated by maximum likelihood) has three choices when constructing tests of hypotheses about the unknown parameters: (1) the Wald test, (2) the likelihood ratio test, or (3) the Lagrange multiplier test. We consider them in turn.

The Wald test

The Wald test is the most commonly used test in econometric models. Indeed, it is the one that most statistics students learn in their introductory courses. Consider the following hypothesis test:

$$H_0: \beta_1 = \beta \qquad H_A: \beta_1 \neq \beta. \tag{9}$$

Quite often in these tests researchers are interested in the case when $\beta = 0$, i.e., in testing whether the parameter on an independent variable is statistically different from zero. However, $\beta$ can be any value. Moreover, this test can be used to test multiple restrictions on the slope parameters of multiple independent variables. In the case of a hypothesis test on a single parameter, the t-ratio is the appropriate test statistic. The t-statistic is given by

$$t = \frac{\hat{\beta}_i - \beta}{\text{s.e.}(\hat{\beta}_i)} \sim t_{n-k-1},$$

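where n is the sample size and k is the number of independent variables in the model. Because the logit model is estimated by maximum likelihood, this distribution holds only approximately. The ratio is asymptotically standard normal, which is why Stata reports it as z. The F-statistic, or its large-sample chi-square counterpart, is the appropriate test statistic when the null hypothesis restricts multiple parameters. See Cameron and Trivedi (2005: 224-231) for more detail on this test. According to Hauck and Donner (1977), the Wald test may exhibit perverse behavior when the sample size is small. For this reason this test must be used with some care.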
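For a single coefficient, the large-sample version of this test takes only a few lines. The Python sketch below computes the ratio and a two-sided p-value from the standard normal distribution. It uses the identity $2(1-\Phi(|z|)) = \text{erfc}(|z|/\sqrt{2})$. The estimate and standard error are hypothetical values, not results from this module.

```python
import math

def wald_test(b_hat, b_null, se):
    """Wald (z) test of H0: beta = b_null for a single ML coefficient.
    Returns the statistic and a two-sided p-value from the standard normal."""
    z = (b_hat - b_null) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical estimate and standard error, for illustration only.
z, p = wald_test(b_hat=0.85, b_null=0.0, se=0.40)
print(f"z = {z:.3f}, two-sided p-value = {p:.4f}")
```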

The likelihood ratio test

The likelihood ratio test compares the maximized log-likelihood of the unrestricted model with the maximized log-likelihood of the model that imposes the restrictions implied by the null hypothesis. Consider the null hypothesis given in (9). Let $L(\beta)$ be the value of the likelihood function when $\beta_1$ is restricted to being equal to $\beta$. Let $L(\hat{\beta}_1)$ be the value of the likelihood function when there is no restriction on the value of $\beta_1$. Then the appropriate test statistic is

$$LR = -2\left[\ln L(\beta) - \ln L(\hat{\beta}_1)\right] = 2\left[\ln L(\hat{\beta}_1) - \ln L(\beta)\right].$$

Under the null hypothesis, the likelihood ratio statistic asymptotically has the chi-square distribution $\chi^2(r)$, where r is the number of restrictions. Thus, using a likelihood ratio test involves two estimations: one with no restrictions on the model and one with the restrictions implied by the null hypothesis. Since the likelihood ratio test does not appear to exhibit perverse behavior with small sample sizes, it is an attractive test. We will therefore run through an example of how to execute the test using Stata. The example is taken from the Stata manual, volume 2, pp. 353-355.

Example 1: Underweight births.

We estimate a model that explains the likelihood that a child will be born with a weight under 2,500 grams (low). The eight explanatory variables used in the model are listed in Table 1. The model to be estimated is:

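$$\ln\left(\frac{\Pr(\text{low}=1)}{1-\Pr(\text{low}=1)}\right) = \beta_0 + \beta_1 \text{Age} + \beta_2 \text{Lwt} + \beta_3 \text{RaceB} + \beta_4 \text{RaceO} + \beta_5 \text{Smoke} + \beta_6 \text{Ptl} + \beta_7 \text{Ht} + \beta_8 \text{Ui} + \varepsilon. \tag{10}$$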

Also, we want to test the null hypothesis that the coefficients on Age, Lwt, Ptl, and Ht are all zero. The first step is to estimate the unrestricted regression using the command:

. logistic low age lwt raceb raceo smoke ptl ht ui

Table 1: Variable definitions

Variable name   Definition
Age             Age of mother
Lwt             Weight at last menstrual period
RaceB           Dummy variable = 1 if mother is black; 0 otherwise
RaceO           Dummy variable = 1 if mother is neither white nor black; 0 otherwise
Smoke           Dummy variable = 1 if mother smoked during pregnancy; 0 otherwise
Ptl             Number of times mother had premature labor
Ht              Dummy variable = 1 if mother has a history of hypertension; 0 otherwise
Ui              Dummy variable = 1 if mother has uterine irritability; 0 otherwise
Ftv             Number of visits to physician during first trimester

The results of this estimation are shown in column 2 of Table 2. Next we save the results of this regression with the command:

. estimates store full

where “full” is the name we will use when we want to recall the estimation results from this regression. Now we estimate the logistic regression omitting the variables whose parameters the null hypothesis restricts to zero:

. logistic low raceb raceo smoke ui

The results of this estimation are reported in column 3 of Table 2. Finally we run the likelihood ratio test with the command:

. lrtest full .

Notice that we refer to the first regression by the name “full” and to the second regression by a period (.), which Stata uses to denote the most recent estimation results. The results of this command are as follows:

Likelihood-ratio test LR chi2(4) = 14.42

(Assumption: . nested in full) Prob > chi2 = 0.0061

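The interpretation of these results is that the omitted variables are jointly statistically significant. With a p-value of 0.0061, the null hypothesis that the coefficients on Age, Lwt, Ptl, and Ht are all zero can be rejected at any conventional significance level, including 1 percent.6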
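The Stata figures can be reproduced by hand from the log likelihoods reported in Table 2. The Python sketch below computes $LR = 2(\ln L_u - \ln L_r)$ and the chi-square upper-tail probability. It uses the closed form that holds for an even number of degrees of freedom, $P(\chi^2(r) > x) = e^{-x/2}\sum_{k=0}^{r/2-1} (x/2)^k/k!$. That shortcut covers $r = 4$ here but is not a general chi-square routine.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2(df) > x), valid for even df only.
    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

ll_unrestricted = -100.724    # log likelihood, unrestricted model (Table 2)
ll_restricted = -107.93404    # log likelihood, restricted model (Table 2)
r = 4                         # restrictions: Age, Lwt, Ptl, Ht = 0

lr = 2.0 * (ll_unrestricted - ll_restricted)
p_value = chi2_sf_even_df(lr, r)
print(f"LR chi2({r}) = {lr:.2f}, Prob > chi2 = {p_value:.4f}")
```

The computed statistic and p-value match the lrtest output above (14.42 and 0.0061).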

Table 2: Logistic regression results (odds ratios, with z-statistics in parentheses)

Explanatory variable                          Unrestricted model    Restricted model
Age of mother (Age)                           0.9732636 (-0.74)     omitted
Weight at last menstrual period (Lwt)         0.9849634 (-2.19)     omitted
Mother is black (RaceB)                       3.534767 (2.40)       3.052746 (2.27)
Mother is neither white nor black (RaceO)     2.368079 (1.96)       2.922593 (2.64)
Mother smoked during pregnancy (Smoke)        2.517698 (2.30)       2.945742 (2.89)
Number of premature labors (Ptl)              1.719161 (1.56)       omitted
History of hypertension (Ht)                  6.249602 (2.64)       omitted
Uterine irritability (Ui)                     2.1351 (1.65)         2.419131 (2.04)
Log likelihood                                -100.724              -107.93404
Number of observations                        189                   189
Pseudo-R2                                     0.1416                0.0801

The Lagrange multiplier test

The intuition behind the Lagrange multiplier (LM) test (or score test) is that the gradient of the log of the likelihood function is equal to zero at the maximum of the likelihood function.7 If the null hypothesis in (9) is correct, then imposing the constraint it specifies should cost little. The maximum of the log-likelihood subject to that constraint should then lie close to the unconstrained maximum. The LM test measures how close the Lagrange multipliers of this constrained maximization problem are to zero. Equivalently, it measures how close the gradient of the unrestricted log-likelihood, evaluated at the restricted estimates, is to zero. The farther the multipliers are from zero, the stronger the evidence against the null hypothesis.

Economists generally do not make use of the LM test because it is more complicated to compute and the LR test is a reasonable alternative. Thus, as a practical matter, the Wald test and the LR test are reasonable choices for testing most linear restrictions on the parameters. Moreover, since the calculations are relatively easy, it may make sense to calculate both test statistics to be sure they lead to consistent conclusions. However, when the sample size is small, the LR test probably is preferred.

Goodness-of-fit measures

The standard measure of goodness-of-fit in the linear OLS regression model is $R^2$. No exact counterpart exists for nonlinear models like the logit model. Several potential alternatives have been developed in the literature and are known collectively as pseudo-$R^2$ measures. Many of these measures are discussed in McFadden (1974), Amemiya (1981), and Maddala (1983). If a reader really cares about the pseudo-$R^2$, a practical approach is to report the value that the computer program reports. Stata's logit and logistic commands report McFadden's measure, $1 - \ln L/\ln L_0$, where $\ln L_0$ is the log likelihood of a model containing only a constant.

One additional measure of goodness-of-fit is the percentage correctly predicted, which can be computed in several ways. One way is to use the observed values of the independent variables to predict the probability that the dependent variable equals one. If the predicted probability is above some critical value, you assume that the predicted value of the dependent variable is one; if it is below this value, you assume that the predicted value is zero. You then construct a table that compares the predicted values of the dependent variable with its actual values, as shown in Table 3.

The percentage correctly predicted is equal to the sum of the diagonal elements, $n_{00} + n_{11}$, divided by the sample size. The main problem with this measure is that the choice of the cutoff point is arbitrary. Traditionally, the cutoff has been 0.5, but there is no reason why this cutoff is the appropriate one. Cramer (2003, 67) suggests that a more appropriate cutoff point is the sample frequency, that is, $\frac{n_{10} + n_{11}}{n_{00} + n_{01} + n_{10} + n_{11}}$. The bottom line is that uncertainty about the proper choice of cutoff point is a major problem with using the percentage correctly predicted as a measure of goodness-of-fit.
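The sensitivity to the cutoff is easy to see in a small example. The Python sketch below builds the classification table at a cutoff of 0.5 and at the sample frequency suggested by Cramer. It indexes the table as `n[actual][predicted]`, consistent with the sample-frequency formula above. The outcomes and predicted probabilities are hypothetical.

```python
# Percentage correctly predicted for a binary model, at two cutoffs.
# The predicted probabilities and outcomes are hypothetical.

def classification_table(y, p_hat, cutoff):
    """Return counts n[actual][predicted] for 0/1 outcomes."""
    n = [[0, 0], [0, 0]]
    for actual, p in zip(y, p_hat):
        predicted = 1 if p >= cutoff else 0
        n[actual][predicted] += 1
    return n

def percent_correct(n):
    """(n00 + n11) divided by the sample size, in percent."""
    total = n[0][0] + n[0][1] + n[1][0] + n[1][1]
    return 100.0 * (n[0][0] + n[1][1]) / total

y     = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
p_hat = [0.05, 0.10, 0.15, 0.20, 0.25, 0.28, 0.35, 0.32, 0.45, 0.60]

sample_freq = sum(y) / len(y)    # Cramer's suggested cutoff: (n10 + n11) / N
for cutoff in (0.5, sample_freq):
    n = classification_table(y, p_hat, cutoff)
    print(f"cutoff {cutoff:.2f}: table {n}, percent correct {percent_correct(n):.0f}%")
```

Here the 0.5 cutoff misses two of the three actual ones (80 percent correct). The sample-frequency cutoff of 0.30 catches all three at the cost of one false positive (90 percent correct).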

Additional notes on binary variable models

One of the key choices in binary outcome models is the cumulative distribution function. Table 4 shows the four commonly used binary outcome models along with their cumulative distribution functions:

The logit, probit, and complementary log-log models all restrict $0 \le p \le 1$. The logit and probit models are also symmetric around zero, whereas the complementary log-log model is asymmetric. The linear probability model imposes neither of these restrictions. The complementary log-log model is sometimes recommended when the sample is skewed, with a high proportion of either ones or zeros. In practice, economists use the logit or probit model most of the time. Interestingly, there is no need for robust standard errors in the logit and probit models if they are correctly specified. The vce(robust) option changes only the estimated standard errors, not the parameter estimates. If the robust standard errors differ substantially from the default ones, it is therefore likely that the model is misspecified. The linear probability model, by contrast, is inherently heteroskedastic: the variance of its error term, $p(1-p)$, changes with the explanatory variables. The vce(robust) option should therefore be used with it.
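The distinctions above can be checked numerically. The Python sketch below writes out the usual textbook forms of the four specifications. We assume these match Table 4, which is not reproduced in this section:

- the logistic CDF;
- the standard normal CDF (via `math.erf`);
- the complementary log-log CDF $1 - \exp(-e^{z})$;
- the identity function of the linear probability model.

A CDF that is symmetric around zero satisfies $F(-z) = 1 - F(z)$, and the loop checks that property at $z = 1.5$.

```python
import math


def logit_cdf(z):
    """Logistic CDF: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cloglog_cdf(z):
    """Complementary log-log CDF: 1 - exp(-exp(z))."""
    return 1.0 - math.exp(-math.exp(z))


def linear_p(z):
    """Linear probability model: p equals the index itself, with no bounds."""
    return z


z = 1.5
for name, F in (("logit", logit_cdf), ("probit", probit_cdf), ("cloglog", cloglog_cdf)):
    # Symmetry around zero requires F(-z) == 1 - F(z).
    print(f"{name:8s} F(-z) = {F(-z):.4f}   1 - F(z) = {1 - F(z):.4f}")

# The linear model can "predict" probabilities outside [0, 1].
print("linear   p at index 1.2 =", linear_p(1.2))
```

The logit and probit rows print matching pairs. The complementary log-log row prints about 0.20 against about 0.011, which shows its asymmetry. The last line shows the linear model returning a "probability" above one.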

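Although they are measured on different scales, the parameter estimates are roughly comparable across the first three models in Table 4. In particular, the following approximations are commonly used: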

1. $\hat{\beta}_{\text{Logit}} \approx 4\,\hat{\beta}_{\text{Linear}}$,
2. $\hat{\beta}_{\text{Probit}} \approx 2.5\,\hat{\beta}_{\text{Linear}}$, and
3. $\hat{\beta}_{\text{Logit}} \approx 1.6\,\hat{\beta}_{\text{Probit}}$.
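These rules of thumb can be checked against the Example 2 estimates in the comparison table later in this module. The first two rules together imply $\hat\beta_{\text{Logit}} \approx (4/2.5)\,\hat\beta_{\text{Probit}} = 1.6\,\hat\beta_{\text{Probit}}$. The short Python sketch below copies the non-robust slope coefficients from that table and computes the three ratios. The dictionary keys are our own shorthand for the variable names. Intercepts are left out because the simple proportional rules are not meant to apply to them.

```python
# Non-robust slope coefficients (logit, probit, OLS) from the comparison table.
# Intercepts are omitted: the proportional rules of thumb apply to slopes only.
coefs = {
    "retired":   (0.1969, 0.1184, 0.0409),
    "age":       (-0.0146, -0.0089, -0.0029),
    "health":    (0.3123, 0.1977, 0.0656),
    "income":    (0.0023, 0.0012, 0.0005),
    "education": (0.1143, 0.0707, 0.0234),
    "married":   (0.5786, 0.3623, 0.1235),
    "hispanic":  (-0.8103, -0.4731, -0.1210),
}

ratios = {}
for name, (b_logit, b_probit, b_ols) in coefs.items():
    # (logit/OLS, probit/OLS, logit/probit)
    ratios[name] = (b_logit / b_ols, b_probit / b_ols, b_logit / b_probit)
    print(f"{name:10s} logit/OLS={ratios[name][0]:5.2f}  "
          f"probit/OLS={ratios[name][1]:5.2f}  logit/probit={ratios[name][2]:5.2f}")
```

With these figures, the logit-to-probit ratios lie between about 1.58 and 1.71, close to the implied 1.6. Household income is the exception at about 1.92, but its coefficients are reported with only one or two significant digits. The ratios to the OLS slopes are mostly nearer 5 and 3 than 4 and 2.5, a reminder that these are rough approximations.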

Example 2: Supplementary health insurance coverage.

These data come from wave 5 (2002) of the Health and Retirement Study (HRS), a panel survey sponsored by the National Institute on Aging. The sample is restricted to Medicare beneficiaries and contains 3,206 observations. The elderly can obtain supplementary insurance coverage either by purchasing it themselves or by joining employer-sponsored plans. The data are in the file Example.xls, and the variables are listed in the table below.

Variable    Definition
Binary variables
ins         = 1 if individual has purchased supplementary insurance from any source
retire      = 1 if individual is retired
hstatusg    = 1 if individual assesses his or her health status as good, very good, or excellent
married     = 1 if married
hisp        = 1 if Hispanic
female      = 1 if female
white       = 1 if white
sretire     = 1 if a retired spouse is present in household
Continuous variables
age         Age of individual in years
hhincome    Household income
educyear    Years of education
chronic     Total number of chronic conditions
adl         Number of limitations on daily activity (up to 5)

Stata commands

Place the data into the Data Editor and then create a global macro that lists the independent variables:

. global xlist age hstatusg hhincome educyear married hisp

Now create a new variable equal to the log of household income:

. generate linc = ln(hhincome)

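[Notice that Stata reports 9 missing values generated. The log is undefined when household income is zero or negative, so these 9 observations drop out of any regression that uses linc.]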

Create a list of "extra" variables to shorten future commands:

. global extralist linc female white chronic adl sretire

Summarize the variables to check for obvious data-entry errors (output is suppressed):

. summarize ins retire $xlist $extralist

Estimate the logit regression (the output is shown in Figure 3):

. logit ins retire $xlist

Estimate and save the results from several models (the Stata prefix quietly suppresses a command's output):

. estimates store blogit

. quietly probit ins retire $xlist

. estimates store bprobit

. quietly regress ins retire $xlist

. estimates store bols

. quietly logit ins retire $xlist, vce(robust)

. estimates store blogitr

. quietly probit ins retire $xlist, vce(robust)

. estimates store bprobitr

. quietly regress ins retire $xlist, vce(robust)

. estimates store bolsr

We can create a table comparing the models (the output is not shown here):

. estimates table blogit blogitr bprobit bprobitr bols bolsr, t stats(N ll) b(%8.4f) stfmt(%8.2f)

We now test whether a squared age term and interactions of age with other regressors belong in the model:

. generate age2 = age*age

. generate agefem = age*female

. generate agewhite = age*white

. generate agechronic = age*chronic

. global intlist age2 agefem agewhite agechronic

. quietly logit ins retire $xlist $intlist

. test $intlist

 ( 1)  [ins]age2 = 0
 ( 2)  [ins]agefem = 0
 ( 3)  [ins]agewhite = 0
 ( 4)  [ins]agechronic = 0

           chi2(  4) =    7.45
         Prob > chi2 =    0.1141

The same hypothesis can also be tested with a likelihood-ratio test:

. quietly logit ins retire $xlist $intlist

. estimates store B

. quietly logit ins retire $xlist

. lrtest B

Likelihood-ratio test                                 LR chi2(4)  =      7.57
(Assumption: . nested in B)                           Prob > chi2 =    0.1088
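Both tests compare the same four restrictions against a $\chi^2(4)$ distribution. The likelihood-ratio statistic is $2(\ln L_{\text{unrestricted}} - \ln L_{\text{restricted}})$. For an even number of degrees of freedom, the chi-square upper-tail probability has the closed form $e^{-x/2}\sum_{i=0}^{k/2-1} (x/2)^i / i!$, so the reported p-values can be reproduced with Python's standard library alone. This is a minimal sketch, and the function name `chi2_sf_even_df` is our own. The statistics are the rounded values printed above, so the last digit of each p-value can differ slightly from Stata's.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of
    freedom: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    total = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * total


# Rounded statistics reported by Stata for the four restrictions (df = 4)
wald_stat = 7.45  # from: test $intlist
lr_stat = 7.57    # from: lrtest B

for label, stat in (("Wald", wald_stat), ("LR", lr_stat)):
    print(f"{label:4s} chi2(4) = {stat:.2f}, p-value = {chi2_sf_even_df(stat, 4):.4f}")
```

Both p-values are about 0.11. At the 10 percent level, neither test rejects the hypothesis that the coefficients on age2, agefem, agewhite, and agechronic are jointly zero.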

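For comparison, the logistic command fits the same model but reports odds ratios rather than coefficients:

. logistic ins retire $xlist

Because the model is nonlinear, its coefficients are not marginal effects. Marginal effects evaluated at the means of the regressors give more useful results:

. quietly logit ins retire $xlist

. mfx

(In Stata 11 and later, mfx has been superseded by the margins command; for example, margins, dydx(*) atmeans computes marginal effects at the means.)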
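For a continuous regressor $x_j$, the marginal effect at the means takes a different form in each model:

- logit: $\hat\beta_j\,\Lambda(\bar{x}'\hat\beta)\,[1 - \Lambda(\bar{x}'\hat\beta)]$, where $\Lambda$ is the logistic CDF;
- probit: $\hat\beta_j\,\phi(\bar{x}'\hat\beta)$, where $\phi$ is the standard normal pdf.

The Python sketch below applies these formulas to the retire coefficients in the comparison table below. The regressor means are not reported in this module, so the code assumes a hypothetical predicted probability of 0.40 at the means. The numbers illustrate the mechanics only and are not the mfx output. By default, mfx also reports a discrete change from 0 to 1 for dummy regressors such as retire rather than this derivative. The helper names `logit_me` and `probit_me` are our own.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def logit_me(beta_j, xb):
    """Logit marginal effect beta_j * L(xb) * (1 - L(xb)) at index xb."""
    lam = 1.0 / (1.0 + math.exp(-xb))
    return beta_j * lam * (1.0 - lam)


def probit_me(beta_j, xb):
    """Probit marginal effect beta_j * phi(xb) at index xb."""
    return beta_j * STD_NORMAL.pdf(xb)


# HYPOTHETICAL: predicted probability at the sample means (not reported in the text)
p_bar = 0.40
xb_logit = math.log(p_bar / (1.0 - p_bar))  # logit index that yields p_bar
xb_probit = STD_NORMAL.inv_cdf(p_bar)       # probit index that yields p_bar

# Coefficients on retire taken from the comparison table
print("logit  marginal effect:", round(logit_me(0.1969, xb_logit), 4))
print("probit marginal effect:", round(probit_me(0.1184, xb_probit), 4))
print("OLS slope:             ", 0.0409)
```

Under this hypothetical value, the logit and probit marginal effects (about 0.047 and 0.046) are close to each other and to the OLS slope. This holds even though the raw coefficients differ by a factor of roughly 1.7 (logit versus probit) and 4.8 (logit versus OLS). At an index of zero ($p = 0.5$), the scale factors are $\Lambda(0)[1-\Lambda(0)] = 0.25$ and $\phi(0) \approx 0.399$, roughly $1/4$ and $1/2.5$. This is one way to see where the rules of thumb $\hat\beta_{\text{Logit}} \approx 4\hat\beta_{\text{Linear}}$ and $\hat\beta_{\text{Probit}} \approx 2.5\hat\beta_{\text{Linear}}$ come from.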

The table below collects the parameter estimates from the six models in cleaned-up form (t statistics in parentheses):

                                   Logit     Logit    Probit    Probit       OLS       OLS
                                            robust              robust                robust
Individual retired                0.1969    0.1969    0.1184    0.1184    0.0409    0.0409
                                  (2.34)    (2.32)    (2.31)    (2.30)    (2.24)    (2.24)
Age of individual                -0.0146   -0.0146   -0.0089   -0.0089   -0.0029   -0.0029
                                 (-1.29)   (-1.29)   (-1.29)   (-1.32)   (-1.20)   (-1.25)
Health status                     0.3123    0.3123    0.1977    0.1977    0.0656    0.0656
                                  (3.41)    (3.40)    (3.56)    (3.57)    (3.37)    (3.45)
Household income                  0.0023    0.0023    0.0012    0.0012    0.0005    0.0005
                                  (3.02)    (2.01)    (3.19)    (2.21)    (3.58)    (2.63)
Years of education                0.1143    0.1143    0.0707    0.0707    0.0234    0.0234
                                  (8.05)    (7.96)    (8.34)    (8.33)    (8.15)    (8.63)
Individual married                0.5786    0.5786    0.3623    0.3623    0.1235    0.1235
                                  (6.20)    (6.15)    (6.47)    (6.16)    (6.38)    (6.62)
Individual is Hispanic           -0.8103   -0.8103   -0.4731   -0.4731   -0.1210   -0.1210
                                 (-4.14)   (-4.18)   (-4.28)   (-4.36)   (-3.59)   (-4.49)
Intercept                        -1.7156   -1.7156   -1.0693   -1.0693    0.1271    0.1271
                                 (-2.29)   (-2.36)   (-2.33)   (-2.40)    (0.79)    (0.83)
Sample size                        3,206     3,206     3,206     3,206     3,206     3,206
Log of the likelihood function  -1994.88  -1994.88  -1993.62  -1993.62  -2104.75  -2104.75

As a last exercise, use the following commands to graph the predicted probabilities from logit, probit, and OLS regressions of ins on household income:

. quietly logit ins hhincome

. predict plogit, pr

. quietly probit ins hhincome

. predict pprobit, pr

. quietly regress ins hhincome

. predict pols, xb

. summarize ins plogit pprobit pols

. sort hhincome

. twoway (scatter ins hhincome, msize(vsmall))
>     (line plogit hhincome, lcolor(blue) lpattern(solid))
>     (line pprobit hhincome, lcolor(red) lpattern(tight_dot))
>     (line pols hhincome, lcolor(green) lpattern(longdash_shortdash)),
>     ytitle(Predicted Probability) xtitle(Household income)

Note: save the graph as a .tif file if you want to insert it directly into a Word document.

Exercises

Exercise 1

The determinants of physician advice. Physicians are expected to give lifestyle advice as a part of their normal interaction with their patients. Sometimes doctors choose not to comment on a patient’s lifestyle because they do not have time for personal comments, they feel the advice will be unwelcome, they feel that lifestyle choices are not any business of the physician, they find the discussion of lifestyle issues to be embarrassing, or they are not aware of the patient’s actual lifestyle choices. In this project we are interested in understanding when physicians choose to give advice concerning the consumption of alcohol.

The MS Excel file ktdata contains the responses to the 1990 National Health Interview Survey core questionnaire and special supplements from 2,467 males who were current drinkers in 1990. Individuals who are lifetime abstainers or who are former drinkers who have not consumed any alcohol in the past year are excluded from the sample. Table 7 contains the names and definitions of the variables collected in the survey.

Variable    Definition
Drinks      Total number of drinks taken in the past two weeks
Advice      Did your physician give you advice about alcohol consumption? Yes = 1, No = 0
Income      Monthly income in $1,000 (there are 5 missing values denoted by a “.”)
Age30       Dummy variable equal to 1 if 30 < Age ≤ 40 and 0 otherwise
Age40       Dummy variable equal to 1 if 40 < Age ≤ 50 and 0 otherwise
Age50       Dummy variable equal to 1 if 50 < Age ≤ 60 and 0 otherwise
Age60       Dummy variable equal to 1 if 60 < Age ≤ 70 and 0 otherwise
AgeGT70     Dummy variable equal to 1 if the individual’s age is greater than 70 and 0 otherwise
Educ        Number of years of schooling (0 to 18)
Black       Dummy variable equal to 1 if the individual is black and 0 otherwise
Other       Dummy variable equal to 1 if the individual is neither white nor black and 0 otherwise
Married     Dummy variable equal to 1 if the individual is married and 0 otherwise
Widow       Dummy variable equal to 1 if the individual is widowed and 0 otherwise
DivSep      Dummy variable equal to 1 if the individual is either divorced or separated and 0 otherwise
Employed    Dummy variable equal to 1 if the individual is currently employed and 0 otherwise
Unemploy    Dummy variable equal to 1 if the individual is currently unemployed and 0 otherwise
NE          Dummy variable equal to 1 if the individual lives in the Northeast US and 0 otherwise
MW          Dummy variable equal to 1 if the individual lives in the Midwest US and 0 otherwise
South       Dummy variable equal to 1 if the individual lives in the South and 0 otherwise
Medicare    Dummy variable equal to 1 if the individual receives Medicare and 0 otherwise
Medicaid    Dummy variable equal to 1 if the individual receives Medicaid and 0 otherwise
Champus     Dummy variable equal to 1 if the individual has military insurance and 0 otherwise
HlthIns     Dummy variable equal to 1 if the individual has health insurance and 0 otherwise
RegMed      Dummy variable equal to 1 if the individual has a regular source of medical care and 0 otherwise
DRI         Dummy variable equal to 1 if the individual sees the same doctor and 0 otherwise
MajorLim    Dummy variable equal to 1 if the individual has limits on major daily activity and 0 otherwise
SomeLim     Dummy variable equal to 1 if the individual has limits on some daily activity and 0 otherwise
Diabetes    Dummy variable equal to 1 if the individual has diabetes and 0 otherwise
Heart       Dummy variable equal to 1 if the individual has a heart condition and 0 otherwise
Stroke      Dummy variable equal to 1 if the individual has had a stroke and 0 otherwise

You are to estimate a logit regression of the form:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \sum_i \beta_i x_i + \varepsilon,$$

where $p$ is the probability that a patient received advice about his level of alcohol consumption and the $x_i$ are the explanatory variables. Provide the following information:

1. Make a table of the means of all of the variables.
2. Offer an economic justification for the inclusion of each explanatory variable you use in your regression (including a prediction of its expected sign).
3. Make a table reporting the results of (1) an OLS linear estimation, (2) a probit estimation, and (3) a logit estimation. Also include a column with the ratio of each logit parameter to the corresponding probit parameter. Do not use the abbreviated names of the explanatory variables in the table.
4. Present a table of results of a logit model with all of the variables and of whatever other models you feel are suggested by your empirical results. Discuss the results of the estimation and what they tell you about how physicians decide whether to give advice on alcohol consumption to their male patients.

Exercise 2

The Supply of Married Women in the Workforce. We are interested in understanding the decision of married women to enter the labor force. Two data sets are available, one using data from the United States and the other using data from Portugal. You are to estimate a logit regression for married women for each of the two data sets.

Variable    Definition
Working     Dummy variable = 1 if a married woman works during the year
Fulltime    Dummy variable = 1 if a married woman works more than 1,000 hours in a year
Other       Other household income in $100 (not in $1,000)
Age         Age of the wife
Educ        Years of education of the wife
C0005       Number of children aged 0 to 5
C0613       Number of children aged 6 to 13
C1417       Number of children aged 14 to 17
NW          1 if non-white, and 0 otherwise
HOwn        1 if the home is owned by the household, and 0 otherwise
HMort       1 if the home is mortgaged, and 0 otherwise
Prof        1 if the husband is a manager or professional, and 0 otherwise
Sales       1 if the husband is a sales, clerical, or craft worker, and 0 otherwise
Farm        1 if the husband is a farm-related worker, and 0 otherwise
Unem        Local unemployment rate in %

Data Set 1: The data for this project are in the MS Excel file FLABOR. These data are observations on married women drawn from the 1987 wave of the Michigan Panel Study of Income Dynamics (PSID). The data set has observations for 3,382 individuals.

Data Set 2: These data are from Portugal. The data set is a sample from the Portuguese Employment Survey for the interview year 1991 and has been provided by the Portuguese National Institute of Statistics (INE). The data are in the Excel file Martins. This file is organized into seven columns, corresponding to seven variables, with 2,339 observations.

Variable                 Definition
Works                    Dummy variable equal to 1 if the woman works, 0 otherwise
Child18                  The number of children younger than 18 living in the family
Child03                  The number of children younger than 3 living in the family
Age                      The woman’s age
LogWomanWageRate         The log of the woman’s hourly wage rate (measured in escudos)
Education                The woman’s educational level, measured in years of schooling
LogHusbandMonthlyWages   The log of the husband’s monthly wage (measured in escudos)

1. What factors other than wage levels determine the number of hours that a wife will spend in the work force? Remember to use economic theory in answering this question.
2. Clearly, one of the major factors in determining if a wife will enter the labor force is the wage level she can earn. The US data set does not include the wife’s wage level. Is there any other variable in the data set that economic theory suggests will be a good proxy for wage levels?
3. The variable Age is a proxy for the work (or life) experience of a woman. We would expect its effect on the probability that a woman will enter the labor force to be nonlinear: its marginal impact should be positive and decreasing. This reasoning suggests that you should use Age and Age² (Age squared) as explanatory variables. Can the same reasoning be used with the variable Education? What are your expectations about the signs of the parameters of these two explanatory variables? The same reasoning can be used about the number of years of education.
4. Estimate and report in a table the following two logit regressions: (1) whether US women enter the labor force at all, and (2) whether US women who enter the labor force work at least 1,000 hours. In each case, compare your results to a linear model.
5. The Portuguese data set has a different problem. It reports the wage rate of women who are working, but no wage for women who are not working. We will get around this problem by first using the data for women who actually work to estimate the relationship between wage rates and the age and education of the women. We will then use this relationship to predict the wage rate for both women who work and women who do not. Finally, we will use this predicted wage rate as an independent variable in a logit model explaining the probability that a married woman will enter the labor force. When estimating the logit regression, be sure to separate the children in a family into those 3 and under and those between 4 and 18. Also, include the years of education in this regression to see whether a Portuguese married woman’s taste for participation in the labor force increases or decreases with the level of her education.
6. Is it reasonable to compare your results for the two countries?

References

Amemiya, T. (1981). Quantitative Response Models: A Survey. Journal of Economic Literature 19: 1483-1536.

Cameron, A. C. and P. K. Trivedi (2005). Microeconometrics: Methods and Applications (Cambridge: Cambridge University Press).

Cramer, J. S. (2003). Logit Models from Economics and Other Fields (Cambridge: Cambridge University Press).

Ladd, G. W. (1966). Linear Probability Functions and Discriminant Functions. Econometrica 34: 873-888.

Maddala, G. S. (1983). Limited-Dependent and Qualitative Variables in Econometrics (Cambridge: Cambridge University Press).

McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. In P. Zarembka (ed.) Frontiers in Econometrics (New York: Academic Press): 105-142.

Wald, A. (1943). Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large. Transactions of the American Mathematical Society 54: 426-482.

Footnotes

1. J. S. Cramer (2003) Logit Models from Economics and Other Fields (Cambridge: Cambridge University Press): 10.
2. For a full discussion of this model, see Ladd, G. W. (1966), “Linear Probability Functions and Discriminant Functions,” Econometrica 34: 873-888.
3. The assumption that the variance is equal to 1 is due to technical considerations. See [Cramer, 22].
4. The pdf of a logistic distribution is $f(x) = \frac{\lambda e^{-\lambda x}}{\left(1 + e^{-\lambda x}\right)^{2}}$, where $\lambda = \frac{\pi}{\sqrt{3}} \approx 1.814$. See Cramer, 24-26 for a fuller discussion of the logistic distribution.
5. See Stata Library, Categorical and Count Data Analysis Utilities for useful utilities and an excellent discussion of how to interpret categorical and count regression results at http://www.ats.ucla.edu/stat/stata/library/longutil.htm/ (accessed July 19, 2009).
6. The phrase “(Assumption: . nested in full)” tells you that the current model, denoted by “.”, is assumed to be nested in the unrestricted model stored under the name full. Stata also offers you a hyperlink to call this stored regression up on the screen.
7. The gradient is a vector of first derivatives. In this case, it is the vector of first derivatives of the log-likelihood function with respect to each parameter estimate (i.e., $\hat{\beta}_i$). To obtain the ML estimates, we set these first derivatives equal to zero.
