I hope the title of this post hits all the keywords. Here was
my dilemma: after running the reg command to estimate regression
coefficients (betas), I wanted to apply the fitted equation to a different
dataset without having to copy and paste the actual beta hats.

So I have a dataset, hhsurvey.dta, and I estimate the following regression:

y = b0 + b1*X1 + b2*X2 + ... + bn*Xn

and I get y_hat along with the estimated coefficients b0_hat, b1_hat, ..., bn_hat.

With this, I want to take a different dataset, applicants.dta, with the same variables (but of course different values for these variables), and I want to predict y for the observations in applicants.dta:

y_hat_2 = b0_hat + b1_hat*X1 + b2_hat*X2 + ... + bn_hat*Xn

I could copy and paste the beta_hats from the regression output, but this is painful to do even once (I am using many variables, including categorical ones), and I suspect I will have to do it many times. My solution was to use the e(b) matrix, which has all the information necessary. After running the regression command:

xi: svy: reg y car i.roofmaterial i.fencematerial i.hhsize ...

you will find the coefficient estimates stored in the ereturn value e(b):

matrix list e(b)
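To get a feel for what e(b) holds before building anything from it, here is a minimal sketch on Stata's bundled auto dataset. That dataset is my choice for illustration, not the survey data from this post, and price, mpg and weight are its variables. The sketch lists the coefficient vector, pulls out its column names, and looks up a single coefficient by name.

```stata
* Illustration only: auto.dta and its variables stand in for the survey data
sysuse auto, clear
regress price mpg weight

matrix list e(b)                  // 1 x 3 row vector with columns mpg, weight, _cons
local names : colnames e(b)       // the column names are the regressor names
display "`names'"
display _b[mpg]                   // coefficient on mpg; _coef[mpg] is a synonym
```

The column names are exactly what the loop further down iterates over, which is why the constant shows up there as _cons.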

Anyway, to build an equation from the regression variables and beta_hats, try the following:

    local varnames_rural : colnames e(b)   // column names of e(b), i.e. the variable names
    local equation_rural ""                // the equation is built up in this local macro
    foreach varn of local varnames_rural { // loop through all the column (variable) names
        local coef = _coef[`varn']         // beta_hat for this variable (incl. categorical vars)
        if ("`varn'" != "_cons") {         // the constant shouldn't be multiplied by anything
            if (`coef' < 0) {              // "+" goes before positive coefficients, not negative ones
                local equation_rural "`equation_rural' `coef'*`varn'"
            }
            else {
                local equation_rural "`equation_rural' + `coef'*`varn'"
            }
        }
        else {
            if (`coef' < 0) {
                local equation_rural "`equation_rural' `coef'"
            }
            else {
                local equation_rural "`equation_rural' + `coef'"
            }
        }
    }
    di "equation: `equation_rural'"
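One way to gain confidence in the generated equation is to evaluate it on the estimation data and compare it with what predict returns. The sketch below does that on the bundled auto dataset, which is an assumption for illustration; the names eq, term, yhat_eq and yhat_xb are made up here. It uses a slightly more compact variant of the loop: the equation starts at 0 and every term is wrapped in parentheses, so no sign handling is needed.

```stata
* Illustration only: auto.dta stands in for the survey data
sysuse auto, clear
regress price mpg weight

local names : colnames e(b)
local eq "0"                                   // start from 0 so every term is added with "+"
foreach v of local names {
    local b = _b[`v']
    if "`v'" == "_cons" local term "`b'"       // the constant stands alone
    else                local term "`b'*`v'"   // coefficient times variable
    local eq "`eq' + (`term')"                 // parentheses keep negative coefficients safe
}
display "equation: `eq'"

generate double yhat_eq = `eq'                 // prediction from the text equation
predict double yhat_xb, xb                     // Stata's own linear prediction
assert reldif(yhat_eq, yhat_xb) < 1e-6         // the two should agree up to rounding
```

If the assert passes, the text equation reproduces the fitted values, and the only differences left are in the last printed digits of each coefficient.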

How about if you want to save this to a file, so that you can load it into a macro in another do file? Try this:

    tempname fh
    file open `fh' using "myfile.txt", write replace all
    file write `fh' "`equation_rural'" _n   // the macro built by the loop above
    file close `fh'

Now, in your new do file that uses the applicants.dta dataset, which must have the same variable names (including the _I* indicator variables that xi creates), you can use the following code to calculate y_hat_2:

    // load the equation
    tempname fh2
    file open `fh2' using "myfile.txt", read text
    file read `fh2' line1
    file close `fh2'
    di `"line1 = `line1'"'
    gen y_hat_2 = `line1'
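The write-then-read round trip can also be tried in a single do file before splitting it across two. This sketch assumes the bundled auto dataset and a hand-written stand-in expression; the names eqfile, fout, fin, eq_out, eq_in and score are hypothetical. It writes the expression to a temporary file, reads it back into a local macro, and uses it in generate.

```stata
* Illustration only: a stand-in expression and auto.dta replace the real equation and data
sysuse auto, clear
local eq_out "100 + 2*mpg + 0.5*weight"

tempfile eqfile                   // temporary file, erased when the do file ends
tempname fout fin

file open `fout' using "`eqfile'", write text replace
file write `fout' "`eq_out'" _n   // one line holding the whole expression
file close `fout'

file open `fin' using "`eqfile'", read text
file read `fin' eq_in             // the first line lands in the local macro eq_in
file close `fin'

assert "`eq_in'" == "`eq_out'"    // the text survives the round trip unchanged
generate double score = `eq_in'
```

Because file read takes one line at a time, the equation has to stay on a single line in the file, which the single file write with _n takes care of.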

This should work - leave a comment if it doesn't. Good luck!

