
Papers must be stapled or paper-clipped on all assignments to receive full credit 

 

Assignment 1  Using Stata to Generate Regressions & Related Graphics (See Video for Assistance) 

(All of the commands below can be completed by using the mouse to select the appropriate menu item(s) in Stata)

​

1. Open the Stata example file auto.dta on your local computer
2. Create a local folder on your computer (or external storage device) and instruct Stata to write output to this new folder by using:  cd c:\Users\Wkuuser\statastuff

3. Save the file to your local folder

4. Generate a histogram of mpg with a normal curve overlay and a density plot and normal density overlay:
             histogram mpg, frequency normal kdensity

5. Generate scatter plots between mpg, weight, and price

6. Generate a scatter plot between mpg and weight that also includes the OLS regression line:     
          scatter mpg weight || lfit mpg weight

(save these graphs to your Stata folder)

7. Generate summary statistics for mpg, weight, and price

8. Generate a regression equation for mpg as the dependent variable with weight and price as independent variables
    with standardized coefficients also reported:     reg mpg weight price, beta

9. Transfer your results to a Word file (or another document processor)

Add the following titles:
Figure 1. Scatterplot of MPG, Weight, and Price
Figure 2. Scatterplot and Regression Line for MPG and Weight
Table 1. Summary Statistics
Table 2. Regression Results for MPG including Standardized Coefficients
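
For reference, the menu steps above can also be run from the command line or a do-file. The following is a minimal sketch, not an official solution: the folder path is the example path from step 2, the graph file names are hypothetical, and sysuse is assumed as the way to load the example file that ships with Stata.

```stata
* Sketch of Assignment 1 (folder path and graph file names are assumptions)
cd "c:\Users\Wkuuser\statastuff"

* Load the example data shipped with Stata and save a local copy
sysuse auto, clear
save auto, replace

* Histogram of mpg with normal and kernel density overlays
histogram mpg, frequency normal kdensity
graph export hist_mpg.png, replace

* Scatter plot matrix for mpg, weight, and price
graph matrix mpg weight price
graph export scatter_matrix.png, replace

* Scatter plot of mpg and weight with the OLS fitted line
scatter mpg weight || lfit mpg weight
graph export scatter_fit.png, replace

* Summary statistics and regression with standardized (beta) coefficients
summarize mpg weight price
regress mpg weight price, beta
```

Each graph export line saves whichever graph was drawn most recently, so it has to come right after the plotting command it belongs to.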

​

 

Questions useful for Quizzes & Exam 1.

Q1. What do the histogram and density plots indicate about mpg?

Q2. If the regression line were not included in the scatterplot of mpg and weight, could you figure out the (approximate) slope and intercept?

Q3: From the summary statistics, what is the average weight and mpg of a car in the sample? What are the units on weight and mpg? What is the standard deviation of weight and mpg? What are the max and min values for each?

Q4. From the regression results, how much does a 1 pound increase in weight decrease mpg? What about a 100 pound increase? What about a 1 standard deviation increase in weight? How does this relate to the standardized regression coefficients?

Q5. What is the meaning of the standard error on the weight coefficient? What about the p-value? What about the R-squared value? 
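
As a study aid for Q4, a standardized coefficient can be reproduced by hand: it is the ordinary coefficient rescaled by the ratio of the standard deviation of the regressor to the standard deviation of the dependent variable. A minimal sketch, assuming the auto.dta example file and the regression from step 8 (the scalar names are hypothetical):

```stata
sysuse auto, clear
quietly regress mpg weight price

* Store the unstandardized weight coefficient
scalar b_weight = _b[weight]

* Standard deviations over the estimation sample
quietly summarize weight if e(sample)
scalar sd_weight = r(sd)
quietly summarize mpg if e(sample)
scalar sd_mpg = r(sd)

* Standardized coefficient = b * sd(x) / sd(y)
display "standardized weight coefficient: " b_weight * sd_weight / sd_mpg

* Compare with the Beta column reported here
regress mpg weight price, beta
```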

​​

 

​

Assignment 2 Building Reliable Regressions  + Publishing Results in Report-Ready Formats

Part 1: Thinking about Causal Regressions Before Doing Them

Suppose you were investigating this research question: 
What is the impact of state tax on long-run state economic growth?

 

1. Write out a regression equation with long-run state economic growth as the dependent variable, state tax policy as an independent variable, and additional "control" variables that are likely to influence long-run state economic growth

2. Provide 2 ways to specifically measure (operationalize) long-run state economic growth using annual Gross State Product (GSP) as the starting point. 

3. Provide 2 alternative ways to specifically measure tax policy differences across states.

4. Express an equation in regression format that includes these specific measures and specific measures of the control variables.

5. What variables (both within and outside of the model) might not only be correlated with tax policy but cause differences in tax policy across states? 

6. (Give your best shot.) It is difficult to fully incorporate all influences on economic growth that may correlate with tax policy. Ideally, a researcher would like to run an experiment where some tax rates across states or over time were changed and others remained the same while holding all other influences constant. That isn't possible. However, is there some way to manipulate the data or design of the regression so that it approximates, at least in part, an experimental setup for examining the relationship between taxes and growth? 

​

​

Part 2: Reporting Regression-Related Results in Report-Level Formats  (Assignment 2 Video Tutorial)  

This part of the assignment uses data and some of the estimation methods described at a Princeton site:
https://dss.princeton.edu/training/Regression101.pdf  to produce tables & figures that are formatted in ways often seen in economic reports & articles. 

​

First, you need to download the state-level data on SAT scores and related variables from Dropbox and save it into the folder where you keep your Stata data.  states_sat    (see the Princeton tutorial p. 4 for explanations of variables).

 

This assignment also introduces the use of Stata "do files" (programs with Stata coded commands) to generate results.
You can use this do-file, assign 2 tutorial, for assistance along with this video tutorial

(You can check each command in Stata before putting it in the do-file, if you want to be sure that it works.)

 

Remember to install the esttab/estout package: ssc install estout

(Details on using esttab are available at  a UToronto website and at the SSC website )  

Remember to use the cd command so output is sent to your Stata folder

Open the do-file editor and instruct Stata to

 

1. Put the "clear" command first to clear anything in Stata memory

2. Utilize the "cd" command so Stata automatically looks at your Stata data folder. For me this is

            cd C:\Users\brn81119\Documents\statastuff\     

3. Open the states_sat file into Stata

4. Estimate a regression for SAT scores with per pupil expenditure (expense) and store the model's results with

5. Estimate a regression for SAT scores with expense and percent high school students taking the exam (percent) and store the results
6. Estimate a regression for SAT scores with expense, percent, and percent adults with college degrees (college) and store the results

7. Export the results to Word (or other document processor) using the esttab command: 
8. For the last model, generate residual values and graph them with a histogram that includes a density plot like we did in Assignment 1:  

9. Execute the do file (also print it and save it using a name of your choosing).

10. Copy & paste the graph into the regression output file: 
       Add titles to the regression table and to the graph:

              Table 1. Regression of SAT Scores Across States
                         Figure 1. Histogram & Density Plots of Residuals

                              Print the output along with the do-file
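
The do-file steps above might look roughly like the following. This is a sketch under stated assumptions, not the tutorial do-file itself: csat is an assumed name for the SAT score variable (check the actual name with describe), the model names m1 to m3 and the output file names are hypothetical, and eststo is used as one way of storing results from the estout package.

```stata
clear
* Replace with the path to your own Stata folder
cd "C:\Users\brn81119\Documents\statastuff"
use states_sat, clear

* csat is an assumed name for the SAT score variable; check with describe
eststo m1: regress csat expense
eststo m2: regress csat expense percent
eststo m3: regress csat expense percent college

* Residuals from the last model (m3), with a histogram and density overlays
predict resid3, residuals
histogram resid3, normal kdensity
graph export resid_hist.png, replace

* Write all three models to one table with standard errors and R-squared
esttab m1 m2 m3 using sat_results.rtf, replace se r2
```

The predict command is placed directly after the third regression so that the residuals clearly come from that model.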

​

Note: Instead of the esttab command, many people use the "outreg2" command for publishing regression tables.  It is described on the Princeton site on pp. 33-35.  It also must be installed (ssc install outreg2). Go to help and type outreg2. The Princeton

​

Questions Useful for Quizzes & Exam 1:

1. In Part 1, what are the consequences of leaving out a control variable that is correlated with state tax policy?

2. In Part 1, what are the consequences if state economic growth causes as well as is caused by state tax policy?

3. In Part 1, what would be the consequences of using Gross State Product as the dependent variable and Median Household Income for the state as an explanatory variable?

4. In Part 2, what is the impact on the coefficient for expenditure per pupil (expense) of adding the variable for percent of high school students taking the SAT (percent)?  Why is this happening?

5. In Part 2, how are the residuals computed ("behind the scenes" in Stata, not the Stata command)? What do the histogram and density plots indicate about this regression and how it relates to "iid" residuals?

​

 

Assignment 3  Non-linear & Qualitative Explanatory Regression Models  

This is a "flying solo" assignment with no tutorial, but it is very similar to Assignment #2. Please don't ask me for help on this.
The point is for you to figure it out. Asking other students for help is permissible. 

 

This assignment uses a dataset across cities on gasoline prices over several weeks for each city. The data is in a Stata file named gasprices.dta    Download this file from the Dropbox link and save it to your local computer or USB storage device.

​

1. Open Stata and open the do-file editor

2. Construct a do-file that will execute the following tasks:

     Remember: Start the do-file with clear; use the cd command to point to your Stata folder
                                 You can test any command by running it using the top menu or the command line before
                                 adding it to your do-file
a. Open the data file gasprices.dta.  In each regression below generate Robust Standard Errors

b. A regression with gasprice as the dependent variable and the Bowling Green dummy (bg) as the X-variable; store these results
c. A second regression which adds the following variables to the first regression as X-variables:

      crude oil price (crudeoilp), state gas tax (statetax), distance to pipeline (pipdistance) and its square (pipdistance2), 
      population density (popdensity) and its square (popd2), a dummy for Kentucky (KY), a dummy for whether a 
      city is on a border with a state with a higher gas tax (borderhigh), and a dummy for whether a state is on a border
      with a state with a lower gas tax (borderlow).  Store the results

d. A third regression which uses the natural log of gas prices (gaspriceln) and the natural log of crude oil prices (crudeoilpln)
    and the natural log of state gas taxes (statetaxln). Store the results 

e. Write these regression results to a single table (with R-squared and standard errors) in a format that can be edited in Word (or equivalent) and that is report-ready as in Assignment 2. 

f. Insert a title (with title number) for the table and print it along with the do-file. 

​

Remember to save your do-file. 

 

 

Questions Useful for Quizzes & Exam 1:

1. What do the coefficients on pipdistance and pipdistance2 indicate about the impact of pipeline distance on gas prices? What is the pipeline distance (in miles) at which its effect on gas prices switches signs? 

2. What does the coefficient on bg indicate about Bowling Green gasoline prices? How stable are the coefficients? What do the 
standard errors and significance stars seem to indicate?

3. What is the effect of a city being near a border of a state with a higher gas tax?  Draw the relationship between crude oil prices
and gas prices when borderhigh = 0  and then redraw it when it equals 1.

4. In the third regression, what do the coefficients on gas taxes and crude oil prices mean in economic terms? 
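
For question 1, the sign switch follows from setting the derivative of a quadratic to zero: if y = b0 + b1*x + b2*x^2, the effect of x changes sign at x = -b1/(2*b2). A minimal sketch of that calculation on the auto.dta example file rather than the gas price data (the variable weight2 is created here purely for illustration):

```stata
sysuse auto, clear
generate weight2 = weight^2

* Quadratic in weight with robust standard errors
regress price weight weight2, vce(robust)

* Value of weight at which the fitted effect changes sign
display "turning point: " -_b[weight] / (2 * _b[weight2])
```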


​

Assignment 4 Panel Data + Accessing Data

​1. Open the panel wage do file (this will automatically open an empty Stata data window)

The do file uses the Stata example file nlswork.dta (National Longitudinal Survey of approximately 5159 young women from 1968 to 1988; the Stata file is already defined as a panel data set using the Stata command: xtset idcode year).

 

You will need to adjust the change directory command at the top of the do file for the path that is appropriate for your local computer.  The variables used in the do file should be obvious from their names and labels in the data file.

​

2. Execute the do file which generates various regressions with log of wage (ln_wage) as the dependent variable.  Print the do file.

 

3. After executing the do file in Stata, the regression results will be written to a file called assign4.rtf in whatever folder you have entered in the path in the change directory command and formatted for publication.

​

4. In the assign4.rtf file, add a line for "Fixed Effects" that then designates in each column with a Y or N whether the estimates include fixed effects (for the individuals). Also, at the bottom, add a note that explains the estimation differences between each column. Print the results.

​

5. On the back of the do-file printout, print answers to the following questions (they are also useful for Quizzes & Exam 2)

Q1: What is the estimation method in model 1, and what is the effect of experience (tenure) on ln_wage?

Q2: How do the estimation methods in models 2 and 3 differ from model 1 and from each other? What difference does this make in the estimation results?

Q3: What estimation method is used in model 4, and what difference does this make for the coefficient on tenure?

Q4: What is similar and different between the results in model 4 and model 5?
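
For orientation, the contrast between a pooled regression and a fixed-effects regression on nlswork can be sketched as follows. The models in the panel wage do file are not reproduced here; the regressors (tenure, age), the stored names, and the fe flag are illustrative assumptions, and the estout package is assumed to be installed.

```stata
webuse nlswork, clear
xtset idcode year

* Pooled OLS, ignoring the panel structure
eststo pooled: regress ln_wage tenure age
estadd local fe "N"

* Fixed-effects (within) estimator for individuals
eststo fixed: xtreg ln_wage tenure age, fe
estadd local fe "Y"

* Table with a Fixed Effects row built from the stored fe flag
esttab pooled fixed, se stats(fe N, labels("Fixed Effects" "Observations"))
```

Storing the Y/N flag with estadd is one way to generate the "Fixed Effects" row automatically instead of typing it into the .rtf file by hand.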

 

​

 

 

 

 

 

 

Assignment 5 Instrumental Variable Regression

1. Save the data file Fish Market  to your local machine (or open and save it)

       ltotqty = daily quantity of fish sold in market

        lavgprc = avg daily price per pound of fish

       wave2 = avg of last 2 days max wave heights

       wave3 = avg of max wave heights for day 3 and day 4 before current day

For each regression below, use the method from Assignment 2 to store the regression results, and then use the esttab command to publish all of the results in a formatted table that can be opened by a document processor.  See the video tutorial for Assignment 2 if help is needed.

​2. Generate the regression: reg ltotqty lavgprc. 

3. Generate the regression: reg lavgprc wave2 wave3.

4. Save the predicted values for lavgprc:  predict lnpricepred, xb  (or use postestimation menu)

5. Generate the regression: reg ltotqty lnpricepred.

6. Generate a scatterplot with ltotqty on the y-axis and lavgprc on the x-axis. Copy and paste this to the document processor file
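
The two regressions in steps 3 to 5 amount to two-stage least squares done by hand. The sketch below repeats them, using the instruments wave2 and wave3 as defined in the variable list above, and then compares the result with Stata's built-in ivregress command. The file name fish.dta is a hypothetical stand-in for the Fish Market data.

```stata
* fish.dta is a hypothetical file name for the Fish Market data
use fish, clear

* First stage: price on the wave-height instruments
regress lavgprc wave2 wave3
predict lnpricepred, xb

* Second stage by hand: the coefficient is the 2SLS estimate,
* but these standard errors ignore that lnpricepred is itself estimated
regress ltotqty lnpricepred

* Built-in 2SLS: same coefficient on price, with corrected standard errors
ivregress 2sls ltotqty (lavgprc = wave2 wave3)
```

The price coefficient from the hand-built second stage should match the ivregress coefficient when both use the same observations; only the standard errors differ.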

 

Useful Questions for Quizzes & Exam 2:

Q1: Describe the "raw" relationship between quantity and price from the scatterplot. Given that the equation is estimated
        in log-log format, what is the meaning of the coefficient on lnprice?

Q2: In the regression for lnprice, what role does using wave heights play? Would wave height impact the supply of fish 
        on a given day, the demand for fish, or both?

Q3: What happens to the coefficient on price when the predicted price is used in place of the actual price?  How does
          this relate to question #2? 

 

 

​

Article Summary & PowerPoint Assignment

(Grad & Honors Students Only)

​

Provide a 2-3 page summary report of one of the regression-oriented articles listed on the Assignments page. You will make a 10- to 15-minute presentation to the class. The PPT should not just be bullets/text. You should cut/paste key graphics and tables.  Your reports & PPTs should focus on the empirical work and address these items:

     1) the overall question(s) that the data analysis is trying to answer;

     2) details about the data that are used (specifics on measurements of key variables, type of data (cross-section, panel, ...), level of data (individual, firm, national, ...); 

    3) a description and some detail about the regression methods/strategy used (IV, DID, Discontinuity, Quantile, ...) 

    4) Some details about the primary regression result(s);

    5) A brief overview of supportive results (checking for the sensitivity of results to various alternatives).  In the presentation, as you address these points, you should cut and paste graphics or tables to help illustrate your points. 

 

Your aim should be to explain the data analysis in a way that makes it clear to the undergrads in the class what the author(s) have done.  

​

I'll assign or have you select one of the following articles: (You may need to download the article from a campus computer if you run into access problems; If necessary, I can download the article and send you a pdf)

​

Reversal of Fortune (Acemoglu et al, NBER version of Quarterly Journal of Economics Article)

​

​​Economic Consequences of Hugo Chavez: Synthetic Controls (Grier & Maynard, JEBO) 

​

 Using Big Data to Estimate Consumer Surplus (Levitt et al, NBER) 

​

Impact of Crime Risk on Property Values from Megan's Laws  (American Economic Review)  

​

 

Can Losing Lead to Winning (Berger & Pope, Management Science)

​

Teachers & Match Effects (Jackson, Review of Economics & Statistics)

​

Why Do Movie Studios Produce R-Rated Films?  (Goff, Wilson, Zimmer) 

​

 

​

A Matched Pairs Analysis of State Growth Differences (Goff, Lebedinsky, & Lile)

​

​

​

​

​

​

Archived Assignments (Not in use this semester)

​

​

​

​

 

Assignment x  Regression with Binary Dependent Variables
1. Save the data file Army Vouchers (or open and save it). It contains data showing audited travel expenses for army personnel by individual.
amtclaimed = total expense filed
error1 = 1 if audit showed a problem
miles = number of miles to destination
days = number of days of trip
org = army unit

2. Generate summary statistics for all of the variables (use i.org to get summaries for groups). Snip and paste.

3. Generate a logistic regression using the command:  logistic error1 amtclaimed miles days i.org, vce(robust). Snip and paste.

4. Generate marginal effects at the mean of each variable using the command:  margins, dydx(*) atmeans. Snip and paste.

5. Generate a classification table for predictions using a cutoff value of 0.5: estat classification, cutoff(0.5). Snip and paste.

6. Generate a specificity-sensitivity graph using command:   lsens  . Snip and paste

(All of these results from the logistic regression can be generated by navigating to and through the Postestimation option.)
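
Put together in a do-file, the steps above might be sketched as follows. The file name armyvouchers.dta is a hypothetical stand-in for the Army Vouchers data, and the tabulate line is just one possible way of getting group summaries by org.

```stata
* armyvouchers.dta is a hypothetical file name for the Army Vouchers data
use armyvouchers, clear

* Overall summaries, then amount claimed summarized by army unit
summarize amtclaimed error1 miles days
tabulate org, summarize(amtclaimed)

* Logistic regression (odds ratios) with robust standard errors
logistic error1 amtclaimed miles days i.org, vce(robust)

* Marginal effects evaluated at the means of the covariates
margins, dydx(*) atmeans

* Classification table at a 0.5 cutoff and sensitivity/specificity graph
estat classification, cutoff(0.5)
lsens
```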

 

 

Questions

Q1. Explain the marginal effect of amount claimed. What would happen to this effect if a very small amount or a very large amount were used to compute the marginal effect instead of the mean?

Q2: What does a "cutoff value" of 0.5 mean in terms of computing the classification table?

Q3: How many errors are made in predicting error1=0 correctly and error1=1 correctly?  How do these relate to terms like "false positive" and "false negative" used in the table?

Q4: From the table and the graphic on specificity/sensitivity, what would happen to the predictions of 0s and 1s if the cutoff value were changed to 0.2? 

 

 

Assignment x VAR Basics (Grad + Honors Students Only) 

Generate a 10-minute PPT presentation (5-8 slides) explaining the basic idea of Vector Autoregressions with particular attention to the impulse response functions and variance decomposition graphs that are generated from them. The presentation should help the rest of the class understand these concepts.  Turn in a "handout" version of your PPT slides. You should address: 

1. What is a VAR + what is its purpose?

2. What is an impulse response function graph and how is one interpreted?

3. What is a (forecast error) variance decomposition graph and how is one interpreted?

 

 

 

 

Downloading .dta or .do files from Dropbox to Local Computer:

A frequent source of errors in using Stata files on assignments is getting the file from Dropbox or Blackboard into the desired local folder. Here are some tips:

 

1>Select the Dropbox link and click the Download button that appears

2>Right click on the downloaded item in your browser (in Chrome this is shown at the bottom of the screen); Select “show in folder”; (you can also do this from the “downloads” folder)

3>Copy the file and then paste to the folder where you keep your Stata files (for example, statastuff)

4>Once saved in your local stata folder, you can open a data file or do-file using the File menu.

 

Using Stata “.do” Files

Stata allows users to create and execute instructions through programs that Stata labels with the extension .do and that users call “do files.”   Any command that can be executed in Stata can be written into one of these programs. They also allow for loops and complex instructions to be generated that cannot be created with a simple one-line command. In addition, they also create a permanent record of instructions that allow for the user to easily replicate procedures.

 

A new .do file can be opened by using the menu icon, or an existing .do file can be opened from the File menu. 

 

I have constructed an example do file (assign 1 do template) to help students see how these files work. This file assumes that you have the Stata example file (auto.dta) saved to your statastuff folder.

 

1>A critical aspect of the do file is changing the directory so Stata knows where to look for the data. On my local computer the address for the directory is
C:\Users\WKUUSER\Documents\statastuff

 

With this I can change the directory by using
cd C:\Users\WKUUSER\Documents\statastuff

 

2>The exact address that you use depends on your computer and folder structure. On a PC:

>Go to the folder statastuff in Windows Explorer, right click on the address bar, and select copy address

>Paste this address in place of my C:\Users\... address and save the file.

 

Now you can execute the do file.  If correct, it will generate summary statistics in the main Stata window and a histogram in its own window.

 

This do file also creates a .txt output file with the summary stats (by using the log command). This file can be opened into Word or Notepad.  
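
The template itself is not reproduced on this page. A do file with the behavior described above could look roughly like the sketch below; the choice of mpg for the histogram and the log file name auto.txt are assumptions.

```stata
clear
* Replace with the address of your own statastuff folder
cd "C:\Users\WKUUSER\Documents\statastuff"

* Start a plain-text log that records everything shown in the Results window
log using auto.txt, text replace

* Load the example data saved in the folder and produce the output
use auto, clear
summarize
histogram mpg

* Close the log so the .txt file is complete and can be opened in Word or Notepad
log close
```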

 

 

 

 

 

 

 

 

 

 

 

 

Assignment x Time Series Regression Components

1. Save the file air passenger data to your local machine (or open it and save it)
It is an example file from Stata and contains data on monthly number of air passengers from 1949-1960. The file is already set up as a Stata time series data set with month as the unit. The variables are

 

air = number of passengers enplaned during month

airln = ln(air) ;  

t = time trend variable (uses Jan 1960 as month 0; you can change this if you desire)

month = month of year (1 ... 12)

 

2. Generate a time plot of air and airln:

tsline air

tsline airln    (copy and paste both to a Word file) 

3. Create a correlogram and partial correlogram  of airln with command below then snip and paste. 

corrgram airln, lags(20)

4. Estimate and snip/paste regression that accounts for time series components in the form:

reg airln t l.airln l2.airln i.month

5. Generate and plot residuals with

predict residmodel1, residuals

tsline residmodel1   (copy and paste this plot)

6. Generate the LM test for serial correlation using the command: estat bgodfrey  (snip/paste) 

Print your results

 

Questions:

Q1: What does the time plot appear to indicate about the behavior of air passengers? How does the time plot of ln(air) differ?

Q2: What does the correlogram indicate about monthly airline passengers?

Q3: Briefly explain each of the components in the regression.

Q4: What do the month dummies indicate about "seasonality" of air travel? What is the reference month?

Q5: What does the residual plot indicate?

 

Write down brief notes concerning these three things on the back of your handout. 

 

The Stata (dta) data file is var irates.  Download it and save it to your local Stata folder.  It contains monthly data on the effective Fed Funds rate, 3 month Treasury rate, and 10 year Treasury Rate from 1959-2016.

 

Also download the var irates do file.  It contains commands for generating two VARs with associated IRF and VD graphs for each VAR.  The first VAR estimates will save to a log file. The second estimates are estimated "quietly" by Stata, meaning that it does it in the background without generating output -- just the associated graphics.  The difference in the VARs is only in the order of the variables, which may impact the IRF and VD graphs.  You will need to adjust the cd (change directory) setting in the do file to the path in your local computer

 

 

Print the log file with the VAR estimates. The graphs will be in separate files.  Incorporate these into your presentation. 
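
The var irates do file is not shown here. For a sense of the commands involved, the following sketch runs a small VAR and draws IRF and variance decomposition graphs using lutkepohl2, an example data set available through webuse, instead of the interest rate data. The IRF name order1 and the file name myirfs are hypothetical.

```stata
webuse lutkepohl2, clear
tsset qtr

* Two-lag VAR in three variables; the order matters for orthogonalized IRFs
var dln_inv dln_inc dln_consump, lags(1/2)

* Store impulse responses and variance decompositions for 8 steps
irf create order1, set(myirfs, replace) step(8)

* Orthogonalized impulse response of consumption to an income shock
irf graph oirf, impulse(dln_inc) response(dln_consump)

* Forecast error variance decomposition for the same pair
irf graph fevd, impulse(dln_inc) response(dln_consump)
```

Re-estimating with the variables listed in a different order and creating a second IRF result is what allows the two sets of graphs to be compared.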

 

 

Assignment x  Building a Regression Model 

1. Write out a research question regarding a causal relationship of interest to you.

2. Express a preliminary equation (construct) indicating the dependent variable and causal variable of interest. 

3. Express this equation in regression format with a specific, measurable dependent variable, a specific, measurable causal variable, and specific, measurable control variables that may be important to take into account.

4. Provide a list of definitions for the variables that might be used in the actual regression equation.

5. Is the causal variable exogenous, likely to be endogenous, or is there likely simultaneity between the dependent and causal variable?

6. Besides the control variable(s), try to think of a way to set up a control sub-sample where the causal variable takes very different values from those in the causal ("experimental") sample.  Can you think of any kind of "natural" experiment that separates values of the causal variable for one part of the sample from another part while the two parts have similar control variable values? 

​

1. Save the file pop.dta to your local machine (or open it and then save it). It contains pop (U.S. pop in millions) and year. 

2. Regress pop on year. Snip this regression and paste into a Word file.

3. In postestimation (diagnostics and analytics plots), generate a residual-versus-predictor plot with year as the independent variable.  Copy and paste this graph into the Word file. 

4. Create a new variable =0 for years before 1950 and =1 from 1950 forward.  Name this variable d1950

5. Create a new variable that multiplies d1950 x year.  Name this variable iyear50.

6. Regress pop on year d1950 and iyear50.  Snip and paste into the Word file.
7. Open the Stata example data set lifeexp.dta containing data from 68 countries

8. Generate and print a scatterplot with lexp on the y-axis and gnppc on the x-axis. 
9. Generate the natural log of life expectancy and gnp per capita and the square of gnp per capita:

       gen lexpln=ln(lexp)
       gen gnppcln=ln(gnppc)
       gen gnppcsq=gnppc^2

10. Generate and print the following regressions:

      reg lexp gnppc

      reg lexp gnppc gnppcsq

      reg lexp gnppcln

      reg lexpln gnppcln
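
The population steps that create the dummy and interaction variables (steps 4 to 6) are given without commands. One way to write them is sketched below, assuming pop.dta is saved in the current folder.

```stata
use pop, clear
regress pop year

* Residual-versus-predictor plot with year on the x-axis
rvpplot year

* Dummy equal to 1 from 1950 forward, 0 before (missing years stay missing)
generate d1950 = (year >= 1950) if !missing(year)

* Interaction lets the slope on year differ after 1950
generate iyear50 = d1950 * year

regress pop year d1950 iyear50
```

In the last regression, the coefficient on d1950 shifts the intercept from 1950 forward and the coefficient on iyear50 shifts the slope.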

 

 

Questions useful for Exam 1.

Q1: How much does 1 year add to population in the first regression? What about 10 years?

Q2: What does the residual versus year plot indicate about this regression?

Q3: In the second regression, how much does pop go up per year before 1950? How much does pop increase per year after 1950?

Q4: Express the equation showing the predicted value of pop if year = 2030. 

Q5: Draw a graph that illustrates the relationship between pop and year in the second equation

Q6: What does the scatter plot indicate about the relationship between lexp and gnppc?

Q7: In the quadratic regression, what is the predicted lexp when gnppc=30,000? What is it in the linear version? Can you figure out the value of gnppc where the quadratic regression predictions would reach their maximum? 

Q8: Compare the R-squared values of the non-linear regressions. What does this indicate?

Q9: What is the elasticity of life expectancy relative to gnp per capita?

 

Assignment x Data Project 

Due at the beginning of class on Tuesday April 24 
Each day late is a 10% penalty/Last possible day to turn in is April 26

 

Choose one of the two following options. With either option, you must turn in a printed copy of the Stata do file that generates the results used in your report. (You can do preliminary work using the command line or a do file, but your final results must use a do file.)  Here is an example do file that writes the Stata output to a Word file in a "report-friendly" format.  You can adjust your results from this format, but it makes it easier:  do file with outreg2 command      

 

Option 1

Data file is world series viewers (on television) by game from 1984 through 2017. 
The file contains the following information:

viewers = game tv viewership in millions
gamenumber = game within the series (1...7)
year = year of series

alteam = name of American League team in series
nlteam = name of National League team in series

 

Develop a research question based on this data.  For example, what is the impact of Game 7 on World Series viewership?

 

Design a regression model to try to best isolate the impact in your research question. While this is the primary focus, you should also seek to uncover and express in your report the effects of other influences. Maximizing the R-squared is not an explicit goal, but you should be able to achieve a relatively high explanatory value.  You should use OLS in your report, but you may discuss other methods that might be appropriate. (I would use the i.variable approach to estimating fixed effects rather than the explicit fixed effects estimator.  Use year as a single variable, not as an i.variable.)

 

Write a report on the project (double-spaced, 12-point font, min 3 pages/max 6 pages not including Tables/Figures; you may place Tables & Figures at the back) that includes an introduction explaining the research question and provides a short background on the World Series, viewership, and how viewership varies over games.   You should present your model, define the variables, and present descriptive stats. Along the way, your report should include graphic(s) (histogram, time plot, or something similar) that help readers visualize interesting aspects of the data.  The placement is at your discretion. Then, present a results section that explains your estimation method, displays your results in a table format that is customary in economics, and summarizes relevant and interesting aspects of your results in some detail. This includes but is not limited to the main research question. The report should be written as if writing to explain your investigation to your classmates in Econ 465 (as if they had never heard of this project) and with the formality that you would use if presenting this as your senior assessment or graduate project.

 

You may discuss your projects with each other, but your work must be your own/team.

 

 

Option 2

The data file is  wdw crowds  with daily crowd levels at Walt Disney World parks from Jan 1, 2010 to Dec 31, 2016. 

The file contains the following information

crowdx = crowd level for all 4 theme parks on scale 1 to 10 (10 is highest crowd)

dateid = day

month (monthx) = month of year

dayofweek (dayofweekx) = day of week (Sun ... Sat)

foodwinex = 1 if food & wine festival at EPCOT is going on

yearx = year

notes1, notes2, notes3 = string variables noting special events/days

 

(The "x" in the variable name denotes a numeric variable; otherwise the variable is a string or date variable)

 

Develop a research question based on this data. For example, what is the impact of the Food & Wine Festival on crowd levels?

 

Design a regression model to try to best isolate the impact in your research question. While this is the primary focus, you should also seek to uncover and express in your report the effects of other influences. Maximizing the R-squared is not an explicit goal, but you should be able to achieve a relatively high explanatory value.  You should use OLS, but in your report explain why other estimation methods might be appropriate. This will involve including time series components (lags, trends).  Assume that crowd levels are stationary so no tests of it are required in this project. 

 

You may construct additional dummy variables (essentially, fixed effects) that capture other events or seasonal aspects if you wish. Do not estimate with Stata fixed effects.  Include any dummy variables in the regression.  

(Key notes on dummy variables:  you can include all months by using i.monthx in the regression command -- Stata will automatically drop a reference value.  Also, to construct dummies with this data, it will be easier to use the string version of variables and then convert to numeric.)  Dateid can be included as a trend variable or you could include a year variable.  

 

Example:

gen xmas = "notxmas"

replace xmas = "xmas" if month=="December" & dayofmonth=="25"

encode xmas, gen(xmasx)

 

Write a report on the project (double-spaced, 12-point font, min 3 pages/max 6 pages not including Tables/Figures; you may place Tables & Figures at the back) that includes an introduction explaining the research question and provides a short background on WDW parks and crowds.  You should then present your model, define the variables, and present descriptive stats.  You should provide graphic(s) on crowd levels (histogram, time plot, or something similar) that help readers visualize interesting or relevant aspects of your data.  The placement is at your discretion. Then, present a results section that explains your estimation method, displays your results in a table format that is customary in economics, and discusses the results in some detail. The report should be written as if writing to explain your investigation to your classmates in Econ 465 (as if they had never heard of this project) and with the formality that you would use if presenting this as your senior assessment project.

 

You may discuss your projects with each other, but your work must be your own/team. 

 

1. Refer to the handout (or click here) on downloading files from Dropbox and using do files.  Make sure the auto.dta data example file is saved to your local statastuff folder under the name auto.dta.

2. Download and save the do-file titled assign 1 do file template  to your local computer using the instructions on the handout.

3. Change the directory using your local computer address per the handout instructions.

4. Execute the do file.  Open the log file into Word or Notepad. The file will have the name auto.txt.  Print the log file, and print the histogram. 

 

Read the St. Louis Fed Review article on Human Capital Growth Across Metro Areas with special attention given to pp. 112-120 and

 

Student Basic Regression & Reporting Project

(Revision: Due Tuesday March 19 at beginning of class)  

 

​

​

Use gaspriceassign.dta   available in Dropbox. 
The data cover retail gasoline prices and related variables across 40 cities from September 2010 to January 2011.

 

 

There are two primary research questions: 1) What is the relationship between state gasoline taxes and gasoline prices? 2) What is the relationship between crude oil prices and gasoline prices? 

 

You should think about the overall model (control variables; alternative ways of accounting for other influences such as sub-samples ...)

 

Generate a report that reflects the structure and style (including graphs, tables) of the St. Louis Fed article that we reviewed. 

 

This is a formal report.   Tables/Regression output should look something close to common reporting formats in economic articles (it should not merely copy and paste Stata regression output or tables.  Scatterplots can be pasted from Stata). 

 

To receive an A, in addition to the main regression results, it must include at least 1 scatterplot and report at least one alternative means of controlling for other variables. It must also include some residual diagnostics.  

 

For B or higher, you must use and turn-in a do-file.  It is fine to run explorations using the command line, but you should use a do-file for generating your final output. 

 

Below, I provide a link to an example do-file using this data set that also generates tables using Stata's outreg command that are formatted nicely.  You can use this file as a template and adjust it for your specific purposes.
Example of Stata do-file with reg and outreg
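
The linked example file is not reproduced here. As a rough sketch of the general pattern, assuming the user-written outreg2 package (ssc install outreg2) and the auto.dta example data rather than the gas price data, two regressions can be written side by side into one Word-readable table. The output file name example_table.doc is hypothetical.

```stata
sysuse auto, clear

* First model starts a new table file
regress price mpg
outreg2 using example_table.doc, replace ctitle(Model 1)

* Second model is appended as a new column in the same file
regress price mpg weight
outreg2 using example_table.doc, append ctitle(Model 2)
```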

​

​

​
