14.171: Software Engineering for Economists
9/5/2008 & 9/7/2008
MIT Department of Economics
Instructor: Matt Notowidigdo
Slide 2: Lecture 2, Intermediate Stata
Slide 3: Detailed Course Outline
Today
9am-11am: Lecture 1, Basic Stata
Quick tour of essential Stata (data management, common built-in commands, control flow, loops, macros, programs)
Programming "best practices"
Post-estimation programming
11am-noon: Exercise 1
1a: Preparing a data set, running some preliminary regressions, and outputting results
1b: More on finding delayed flights
1c: Using regular expressions to parse data
Noon-1pm: Lunch
1pm-3pm: Lecture 2, Intermediate Stata
Non-parametric estimation, quantile regression, NLLS, post-estimation tests, and other built-in commands
Dealing with large data sets
Bootstrapping and Monte Carlo simulations in Stata
Programs, ADO files, Stata matrix language
3pm-4pm: Exercise 2
2a: Monte Carlo test of OLS/GLS with serially correlated data
4pm-6pm: Lecture 3, Maximum Likelihood Estimation in Stata
MLE cookbook!
Sunday: Mata and GMM
Slide 4: Warm-up and review
Before getting into MLE, we should discuss the exercises ... exercise1a.do (privatization DD)
TMTOWTDI! You needed to get all of the different exp* variables into one variable. You had many different solutions, a good number of them very creative:
- Replace missing values with 0, and then add up all the exp* variables
- Use the rsum() function in egen (this treats missing values as zeroes, which is weird, but it works)
- "Hard-code" everything
The first two approaches are sketched below.
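A minimal sketch of those two approaches, assuming the exercise's exp* variables are loaded (the variable names exp_total and exp_total2 are mine, for illustration):

* approach 1: accumulate across all exp* variables, treating missing as zero
gen exp_total = 0
foreach v of varlist exp* {
    replace exp_total = exp_total + cond(missing(`v'), 0, `v')
}
* approach 2: one line with egen (rsum() treats missing values as zero)
egen exp_total2 = rsum(exp*)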
Slide 5: Intermediate Stata review slide
Quick tour through other built-in commands: non-parametric estimation, quantile regression, etc. If you're not sure it's in there, ask somebody. And then consult the manuals. And (maybe) email around. Don't re-invent the wheel! If it's not too hard to do, chances are that somebody has already done it.
Examples: proportional hazard models (streg, stcox), generalized linear models (glm), non-parametric estimation (kdensity), quantile regression (qreg), conditional fixed effects Poisson (xtpoisson), Arellano-Bond dynamic panel estimation (xtabond)
But sometimes newer commands don't (yet) have exactly what you need, and you have to implement it yourself, e.g. xtpoisson doesn't have clustered standard errors.
Dealing with large data sets: How does Stata manage memory? What is really going on in those "if", "xi", "in", "preserve" statements? When should I leave the comfort of Stata and use another language?
Monte Carlo simulations in Stata: You should be able to do this based on what you learned last lecture (you know how to set macros and use control structures). You just need some matrix syntax.
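As a taste, here is a minimal Monte Carlo skeleton (the program name mc_ols and the data-generating process are hypothetical, chosen only for illustration; simulate handles the replication loop):

capture program drop mc_ols
program define mc_ols, rclass
    * draw a fresh sample, run OLS, and return the slope estimate
    drop _all
    set obs 100
    gen x = invnormal(uniform())
    gen y = 1 + 2*x + invnormal(uniform())
    reg y x
    return scalar b_x = _b[x]
end
simulate b_x = r(b_x), reps(500) seed(14171): mc_ols
summarize b_x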
Slide 6: "Intermediate" Stata commands
Hazard models (streg), generalized linear models (glm), non-parametric estimation (kdensity), quantile regression (qreg), conditional fixed effects Poisson (xtpoisson), Arellano-Bond dynamic panel estimation (xtabond)
I have found these commands easy to use, but the econometrics behind them is not always straightforward. Make sure you understand what you are doing when you run them. It's easy to get results, but with many of these commands, the results are sometimes hard to interpret (see the hazard-model sketch below for how little code one of these takes).
But first, a quick review and an easy warm-up ...
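A quick aside before the warm-up: a hedged illustration of how little code these commands require, using Stata's bundled cancer data (the Weibull specification is arbitrary, picked only for illustration; easy to run, but the results are only as good as the proportional-hazards and Weibull assumptions):

webuse cancer, clear
stset studytime, failure(died)
streg age drug, distribution(weibull)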
Slide 7: Quick review, FE and IV
* DGP: group fixed effects (fe), a common unobserved component (spunk) that
* drives both schooling and ability, and an instrument z that shifts schooling only
clear
set obs 10000
gen id = floor((_n - 1)/2000)
bys id: gen fe = invnorm(uniform()) if _n == 1
by id: replace fe = fe[1]
gen spunk = invnorm(uniform())
gen z = invnorm(uniform())
gen schooling = invnorm(uniform()) + z + spunk + fe
gen ability = invnorm(uniform()) + spunk
gen e = invnorm(uniform())
gen y = schooling + ability + e + 5*fe
* OLS, then several equivalent ways of adding group fixed effects
reg y schooling
xtreg y schooling, i(id) fe
xi: reg y schooling i.id
xi i.id
reg y schooling _I*
areg y schooling, absorb(id)
* IV using z, with and without fixed effects
ivreg y (schooling = z) _I*
xtivreg y (schooling = z), i(id)
xtivreg y (schooling = z), i(id) fe
Slide 8: Data check (output not reproduced)
Slides 9-13: Results (output not reproduced)
Slide 14: Fixed effects in Stata
Many ways to do fixed effects in Stata. Which is best?
"xi: regress y x i.id" is almost always inefficient.
"xi i.id" creates the fixed effects as variables (as "_Iid0", "_Iid1", etc.), so assuming you have the space, this lets you re-use them in other commands (e.g. further estimation, tabulation, etc.).
"areg" is great for large data sets; it avoids creating the fixed-effect variables because it demeans the data by group (i.e. it is just a "within" estimator). However, it is not straightforward to recover the fixed-effect estimates themselves ("help areg postestimation").
"xtreg" is an improved version of areg. It should probably be used instead (although it requires the panel id variable to be an integer; it can't be a string).
What if you want state-by-year FE in a large data set?
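One common answer, sketched with hypothetical variable names (state, year, x, y): build a single group identifier for the interaction and absorb it, so no interaction dummies are ever created:

egen state_year = group(state year)
areg y x, absorb(state_year)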
Slide 15: Generalized linear models (glm)
E[y] = g(X*B) + e, where g() is known as the "link function". Stata's "glm" command supports log, logit, probit, log-log, power, and negative binomial link functions. You can also make the distribution of "e" non-Gaussian and impose a different parametric assumption on the error term (Bernoulli, binomial, Poisson, negative binomial, and gamma are supported). Note that not all combinations make sense (i.e. you can't have Gaussian errors with a probit link function). This is implemented in Stata's ML language (more on this next lecture). If the link function or error distribution you want isn't in there, it is easy to write in Stata's ML language (again, we will see more of this next lecture). See Finkelstein (QJE 2007) for an example and discussion of this technique.
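For instance, a log link with Poisson errors, the Poisson quasi-MLE often used for skewed, non-negative outcomes (the robust option here is my addition, not the slide's):

glm y x, family(poisson) link(log) robust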
Slide 16: glm digression
Manning (1998): "In many analyses of expenditures on health care, the expenditures for users are subject to a log transformation to reduce, if not eliminate, the skewness inherent in health expenditure data... In such cases, estimates based on logged models are often much more precise and robust than direct analysis of the unlogged original dependent variable. Although such estimates may be more precise and robust, no one is interested in log model results on the log scale per se. Congress does not appropriate log dollars. First Bank will not cash a check for log dollars. Instead, the log scale results must be retransformed to the original scale so that one can comment on the average or total response to a covariate x. There is an obvious danger that the log scale results may provide a very misleading, incomplete, and biased estimate of the impact of covariates on the untransformed scale, which is usually the scale of ultimate interest."
Slide 17: glm
clear
set obs 100
gen x = invnormal(uniform())
gen e = invnormal(uniform())
gen y = exp(x) + e
* note: log_y is missing whenever y <= 0
gen log_y = log(y)
reg y x
reg log_y x, robust
glm y x, link(log) family(gaussian)
Slide 18: glm, con't
Regression in levels produces a coefficient that is too large, while regression in logs produces a coefficient that is too small (which we expect, since the distribution of y is skewed).
Slide 19: Non-parametric estimation
Stata has built-in support for kernel densities. Often a useful descriptive tool to show "smoothed" distributions of data. Can also non-parametrically estimate probability density functions of interest.
Example: Guerre, Perrigne & Vuong (EMA, 2000) estimation of first-price auctions with risk-neutral bidders and iid private values:
Estimate the distribution of bids non-parametrically
Use the observed bids and this estimated distribution to construct the distribution of values
Assume values are distributed according to the CDF F(v) = 1 - exp(-v). Then you can derive the following bidding function for N = 3 bidders:
b(v) = v - [ ∫_0^v F(x)^(N-1) dx ] / F(v)^(N-1) = ( (v + 0.5)*exp(-2v) - 2*(1 + v)*exp(-v) + 1.5 ) / ( 1 - 2*exp(-v) + exp(-2v) )
QUESTION: Do bidders "shade" their bids for all values?
Slide 20: GPV with kdensity
clear
set mem 100m
set seed 14171
set obs 5000
local N = 3
* draw values from F(v) = 1 - exp(-v) and compute equilibrium bids
gen value = -log(1 - uniform())
gen bid = ( (value + 0.5)*exp(-2*value) - 2*(1 + value)*exp(-value) + 1.5 ) / (1 - 2*exp(-value) + exp(-2*value))
* empirical CDF and kernel density of bids, evaluated at each observed bid
sort bid
gen cdf_G = _n/_N
kdensity bid, width(0.2) generate(b pdf_g) at(bid)
** pseudo-values backed out from the bid distribution
gen pseudo_v = bid + (1/(`N' - 1))*cdf_G/pdf_g
twoway (kdensity value, width(0.2)) (kdensity pseudo_v, width(0.2)), ///
    title("Kernel densities of actual values and pseudo-values") ///
    scheme(s2mono) ylabel(, nogrid) graphregion(fcolor(white)) ///
    legend(region(style(none))) legend(label(1 "Actual values")) ///
    legend(label(2 "Pseudo-values")) legend(cols(1)) xtitle("valuation")
graph export gpv.eps, replace
Slide 21: GPV with kdensity (figure not reproduced)
Slide 22: Quantile regression (qreg)
qreg log_wage age female edhsg edclg black other _I*, quantile(.1)
matrix temp_betas = e(b)
matrix betas = (nullmat(betas) \ temp_betas)
qreg log_wage age female edhsg edclg black other _I*, quantile(.5)
matrix temp_betas = e(b)
matrix betas = (nullmat(betas) \ temp_betas)
qreg log_wage age female edhsg edclg black other _I*, quantile(.9)
matrix temp_betas = e(b)
matrix betas = (nullmat(betas) \ temp_betas)
QUESTIONS:
- What does it mean if the coefficient on "edclg" differs by quantile? What are we learning when the coefficients are different? (Hint: what does it tell us if the coefficient is about the same in each regression?)
- What can you do if education is endogenous?
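The three near-identical blocks above can be collapsed into a loop (same logic, just less hard-coding; this refactoring is mine, not the slide's):

capture matrix drop betas
foreach q in .1 .5 .9 {
    qreg log_wage age female edhsg edclg black other _I*, quantile(`q')
    matrix betas = (nullmat(betas) \ e(b))
}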
Slide 23: Non-linear least squares (NLLS)
clear
set obs 50
global alpha = 0.65
gen k = exp(invnormal(uniform()))
gen l = exp(invnormal(uniform()))
gen e = invnormal(uniform())
gen y = 2.0*(k^$alpha)*(l^(1 - $alpha)) + e
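The slide's code cuts off after the data generation. A hedged sketch of the estimation step, assuming the intent was to fit the Cobb-Douglas by NLLS with Stata's nl command (the parameter names A and alpha, and their starting values, are mine):

* estimate y = A * k^alpha * l^(1-alpha) by non-linear least squares
nl (y = {A=1}*(k^{alpha=0.5})*(l^(1 - {alpha})))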