Tuesday, May 8, 2012

Constructing the regression equation with actual coefficients/betas from the e(b) matrix from the ereturn list after running reg with xi and svy

I hope that the title of this post hits all the keywords. So here was my dilemma: after running the reg command to estimate regression coefficients (betas), I wanted to apply the fitted equation to a different set of data without having to copy and paste the actual beta hats.

So I have a dataset, hhsurvey.dta, and I estimate the following regression

y = b0 + b1*X1 + b2*X2 + ... + bn*Xn

and I get


With this, I want to take a different dataset, applicants.dta, with the same variables (but of course different values for these variables), and I want to predict y for the observations in applicants.dta:

y_hat_2 = b0_hat + b1_hat*X1 + b2_hat*X2 + ... + bn_hat*Xn

I could copy and paste the beta_hats from the regression output, but this is painful to do even once (the model has many variables, including categorical ones), and I suspect I will have to do it many times. My solution was to use the e(b) matrix, which has all the information necessary. After running the regression command:

xi: svy: reg y car i.roofmaterial i.fencematerial i.hhsize ...

you will find some great information stored in the ereturn value "e(b)"

matrix list e(b)
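If you want to see exactly what e(b) holds before building anything from it, here is a minimal sketch. It assumes a regression has just been run in the current session; the matrix name b and the local name names are just illustrative choices.

```stata
* Assumes a regression has just been run, so e(b) is populated.
matrix b = e(b)                  // copy the 1 x k coefficient row vector
local names : colnames b         // column names: the regressors plus _cons
display "columns: `names'"
display "number of coefficients: " colsof(b)
display "constant: " _b[_cons]   // _b[] is a synonym for _coef[]
```

The column names are what matter here: with xi, the categorical terms show up under their generated _I* names rather than the i.varname you typed.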

Anyway, to build an equation from the regression variables and beta_hats, try the following:

local varnames_rural : coln e(b) // Stores the column names (i.e. variable names) in a local macro.
local equation_rural "" // Will put the equation in this local macro
foreach varn of local varnames_rural { // Loop through all the column (variable) names
    local coef = _coef[`varn'] // This is the beta_hat corresponding to the variable name (inc. categorical vars)
    if ("`varn'" != "_cons") { // The constant in the regression shouldn't be multiplied by anything
        if (`coef' < 0) { // negative coefficients already carry a "-"; positive ones need a "+" in front
            local equation_rural "`equation_rural' `coef'*`varn'"
        }
        else {
            local equation_rural "`equation_rural' + `coef'*`varn'"
        }
    }
    else {
        if (`coef' < 0) {
            local equation_rural "`equation_rural' `coef'"
        }
        else {
            local equation_rural "`equation_rural' + `coef'"
        }
    }
}
di "equation: `equation_rural'"
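Before trusting the string, it may be worth checking it against Stata's own prediction on the estimation data. This is only a sketch: it assumes the loop above has just run in the same session, and y_check and y_xb are hypothetical variable names that must not already exist in your data.

```stata
* Sanity check: the string equation should reproduce predict's linear prediction.
gen double y_check = `equation_rural'    // evaluate the equation built above
predict double y_xb, xb                  // Stata's own linear prediction
assert reldif(y_check, y_xb) < 1e-6 if !missing(y_xb)
drop y_check y_xb
```

If the assert fails, or the gen line itself errors out, the equation string is not a faithful copy of the model and should not be applied to other data.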

How about if you want to save this to a file, so that you can load it into a macro in another do file? Try this:

tempname fh
file open `fh' using "myfile.txt", w replace all
file write `fh' "`equation_rural'" _n
file close `fh'

Now, in your new do file that has the applicants.dta dataset loaded, with the same variable names, you can use the following code to calculate y_hat_2:

// load the equation
tempname fh2
file open `fh2' using "myfile.txt", r t
file read `fh2' line1
file close `fh2'

di `"line1 = `line1'"'

gen y_hat_2 = `line1'
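One caveat for the gen step above: if the model was fit with xi, the saved equation refers to the _I* dummy variables that xi created, and those do not exist in a freshly loaded dataset. A minimal sketch of recreating them first, assuming applicants.dta has the same categorical variables with the same levels. The i. terms listed are only the ones named in the regression command earlier; any others in your model would need to be added.

```stata
* Run this before the gen line, not after it.
use applicants.dta, clear
xi i.roofmaterial i.fencematerial i.hhsize   // recreates the _I* dummies
describe _I*                                 // compare these names with those in the equation
```

By default xi omits the lowest level of each categorical variable, so if a level is missing from the new dataset the dummy names or the omitted category may not line up with the equation. In that case gen will either stop with a variable-not-found error or apply a coefficient to the wrong group, so the describe step is worth a look.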

This should work - leave a comment if it doesn't. Good luck!
