Wednesday, November 28, 2012

Stata tip: collapse dataset while preserving variable and value labels

When I use the collapse command, I loose the variable and value labels associated with my variables. The following program does everything that the collapse command does, but preserves the variable and value labels. It also has an option to refrain from putting the stat in the variable label.

Usage:

collapseandpreserve hdff=zarpl time_id (last) obraz wouer=soc_st, by(hh_code resp) o

Program:

program define collapseandpreserve

    // Written by Shafique Jamal (shafique.jamal@gmail.com).
    // This will collapse the dataset and preserve the variable and value labels. The syntax for using this is just like with the collapse command.
    // There is one additional optional option: show stat. If you add this option to the command (collapseandperserve ... ,by(...) omitstatfromvarlabel
    //    then it will not show the statistic (i.e. (fist), (mean), (last), etc.) in the variable label
   
    syntax anything(id="variable and values" name=arguments equalok), by(string asis) [cw fast Omitstatfromvarlabel]
    version 9.1
   
    // save all the value labels
    tempfile tf
    label save using `"`tf'"', replace
   
    // get the list of variables to be collapse, and keep track of the value label - variable correspondence
    tempname precollapse_listofvars
    tempname postcollapse_listofvars
    tempname listofvaluelabels
    tempname valuelabelname
    tempname stat
    tempname oldvarname
    tempname newvarname
    local `stat' "(mean)"
    foreach a of local arguments {
        di `"word: `a'"'
        if (regexm(`"`a'"',"^\(.*\)$")) { // if there is something like (first), (mean), etc.
            local `stat' = `"`a'"'   
        }
        else { // This is a variable. Store the associated variable label and value label name
           
            // What if there is an = in the term? then need two list of variables: a precollapse list and a postcollapse list
            if (regexm(`"`a'"',"^(.*)=(.*)$")) {
                // di "Regex match!"
                local `oldvarname' = regexs(2)
                // di "oldvarname: ``oldvarname''"
                local `newvarname' = regexs(1)
                // di "newvarname: ``newvarname''"
            }
            else {
                // di "NO regex match!"
                local `oldvarname' `"`a'"'
                // di "oldvarname: ``oldvarname''"
                local `newvarname' `"`a'"'
                // di "newvarname: ``newvarname''"
            }
           
            local `precollapse_listofvars'   `"``precollapse_listofvars'' ``oldvarname''"'
            local `postcollapse_listofvars'   `"``postcollapse_listofvars'' ``newvarname''"'
            local `valuelabelname' : value label ``oldvarname''
            tempname vl_``newvarname''
            local `vl_``newvarname''' : variable label ``oldvarname''
            if (`"``vl_``newvarname''''"' == `""') {
                local `vl_``newvarname''' `"``newvarname''"'
            }
            di `"omitstatfromvarlabel = `omitstatfromvarlabel'"'
            if (`"`omitstatfromvarlabel'"'==`""') {
                local `vl_``newvarname''' `"``stat'' ``vl_``newvarname''''"'
                di "not omitting"
            }
            else {
                local `vl_``newvarname''' `"``vl_``newvarname''''"'
                di "omitting"
            }
           
            if (`"``valuelabelname''"' == `""') { // variable has no value label
                local `listofvaluelabels' `"``listofvaluelabels'' ."'
            }
            else {
                local `listofvaluelabels' `"``listofvaluelabels'' ``valuelabelname''"'
            }
        }
    }
   
    collapse `arguments', by(`by') `cw' `fast'
    // macro list
   
    // retrieve the valuelabels
    qui do `"`tf'"'
   
    // reapply the variable labels and the value labels
    tempname count
    local `count' = 0
    di "------------------------------------------------"
    foreach var of local `postcollapse_listofvars' {
        di `"var: `var'"'
        di `"its variable label: ``vl_`var'''"'
        // reapply the variable labels
        local `count' = ``count'' + 1
        label var `var' `"``vl_`var'''"'
       
        // reapply the value labels
        local `valuelabelname' : word ``count'' of ``listofvaluelabels''
        if (`"``valuelabelname''"' != `"."') {
            label values `var' ``valuelabelname''
        }
    }
end program

Tuesday, November 27, 2012

Stata tip: Plotting simillar graphs on the same graph

Suppose you want to make a bar graph of a variable, such as consumption, for two mutually exclusive groups such as males and females, represented by one categorical variable ("male"). This is easy enough: use the graph bar command with an over() option. What if you want to plot over two categorical variables, one within the other: for example you want to plot average consumption for males and females that are self-employed, and average consumption for males and females that are not self-employed. Easy enough, just include an extra over() option with the extra categorical variable. In this example, each over group is mutually exclusive: you are either male or female, but can't be both, and you are either self-employed or not self-employed, but can't be both. In this example, the variables in your dataset are:

consumption, male, self-employed

Suppose, however, that you want to essentially combine two separate graphs into one graph as follows: you want to plot average consumption over three categorical variables that are NOT mutually exclusive, so you don't want to plot one within the other. For example, imagine that you are considering three different policy options for awarding a social assistance benefits: the current policy ("currentpolicy"), alternative A ("alternative_a") and alternative B ("alternative_b"). Each mechanism divides the population into those who qualify and those who don't qualify for the social assistance benefit. Thus each policy option is represented in the data by a binary variable (a.k.a. a dummy variable, which is just a categorical variable with two levels: 0 for those who do not qualify and 1 for those who do qualify). Of the three policy options, the current policy is the least pro-poor, alternative A is more pro-poor (more of the benefits go to the poor), and alternative B is the most pro-poor.

Now, the three policy options are not mutually exclusive. It is possible to qualify under all three policy options, to be excluded under all three policy options, or to qualify under only one or two of the policy options. This is not true of the groups self-employed and not self-employed, and of the groups male and female. Essentially, suppose you want to combine the first three graphs below onto the the same graph, as shown in the fourth graph:






The program below will do this. Make sure you have label values defined for the categorical values. Also note that:

a) The value labels for each of the categories should have the same numbering (i.e. they should all be 0, 1, 2 or 0, 2, 5. It should not be the case that one has 0, 1, 3 and the other has 1, 3, 4).

b) The groups defined by the categorical variable will be plotted the in the order that you specify them. So in the fourth graph above, the order is: current policy, alternative a, alternative b, since that is what is given in the option catvarlist below: catvarlist("currentpolicy alternative_a alternative_b")

c) You need to specify new value labels with the same numbering as the original value labels for the categorical variables. This is necessary for the plot to turn out nice (the spacing gets mucked up if I don't force this). You can do this in the v() option below. So the original value labels could be:

current policy: 0 "non-exempt" 1 "exempt"
alternative a: 0 "does not receive" 1 "receives"
alternative b: 0 "Ineligible" 1 "eligible"

You might want to relabel this using the v() option as follows:

v(0 "non-beneficiary" 1 "beneficiary")

To generate the fourth graph:

. overlappingcatgraphmean pccd using "$WHO_KG_reports/eraseme.dta", gc(graph bar (asis)) catvarlist("currentpolicy alternative_a alternative_b")     v(0 "Non-beneficiary" 1 "Beneficiary") over2options(lab(angle(0) labs(vsmall))) replace go(note(`"Source: Some data source "') asy asc title("Average consumption of groups within population") subtitle("Simulation of policy options") ytitle("Per capita HH consumption (LCU)", margin(medium)) legend(size(small)) blabel(total, format(%9.0fc)))

To generate the first three graphs:

. graph bar pccd [aw=expfact], over(currentpolicy) asy title("Mean annual consumption comparison") subtitle("Current Policy") ytitle("Consumption")  note("Source: some data source") blabel(total, format(%9.0fc))

. graph bar pccd [aw=expfact], over(alternative_a) asy title("Mean annual consumption comparison") subtitle("Alternative A Policy") ytitle("Consumption") note("Source: some data source") blabel(total, format(%9.0fc))

. graph bar pccd [aw=expfact], over(alternative_b) asy title("Mean annual consumption comparison") subtitle("Alternative B Policy") ytitle("Consumption") note("Source: some data source") blabel(total, format(%9.0fc))

The program:

program define overlappingcatgraphmean

    // Written by Shafique Jamal (shafique.jamal@gmail.com). 27 Nov 2012
    // I want to plot the mean of a variable over categorical values on the same plot. Of course, these categorical variables will not be mutually exclusive between them (though the are within them)
    // "using" should specify a .dta file - this program will save a dataset
    // doesn't take weights - uses svy mean to calculate the mean
    //
    // You call it like this:
    //
    // overlappingcatgraphmean varname using "filename.dta", gc(graph bar (asis)) go(over(catvariablelabel, [over_subopts]) over(catvariablelevel, [over_subopts]) asc title("My Title") ...) catvarlist(categoricalvar1 categoricalvar2) replace
    //    
    // Note that:
    // 1. 'catvariablelabel', 'catvariablelevel_n' 'catvariablelevel' must be entered exactly as is (without the quotes) - these are names of variables that the program creates
    // 2. the order in which you enter the over() options is up to you.
    //
    // UPDATE 12-07-2012: Best way is to call it with a long dataset like this: graph bar v, over(eligible) over(avg). Also note that I haven't tested whether this works with "if"

    syntax varname using/ [if] [in], GCmd(string) GOptions(string asis) CATvarlist(varlist) Valuelabelsforlevels(string asis) [replace over1options(string asis) over2options(string asis) ]
    version 9.1
    marksample touse
    tempname tempmat
    tempname variablelabel
    local `variablelabel' : variable label `varlist'   
   
    // foreach category, find the mean
    foreach catvar of local catvarlist {
        tempfile tf_`catvar'
       
        // this is a pain: get the name of the variable's value label
        tempname tn_`catvarvaluelabel'
        local `tn_`catvarvaluelabel'' : value label `catvar'
        label save ``tn_`catvarvaluelabel''' using `"`tf_`catvar''"', replace
       
        // UPDATE: None of this is necessary. The user will pass a list of value labels, separated by spaces, and these will be assumed to be the same for all the categorical variables specified
        //    e.g. user can pass v(0 "Qualifies" 1 "Does not Qualify"), where the categorical variables and corresponding value lables are:
        //    exempt     : 0 "Exempt"     1 "Non-exempt"
        //    PMT        : 0 "Eligible"    1 "Non-eligible"
        //    MBPF    : 0 "Receives"    1 "Does not receive"

        di "cat = `catvar'"
        tempname catvarlabel_`catvar'
        local `catvarlabel_`catvar'': variable label `catvar'
        tempname levels_`catvar'
        levelsof `catvar', local(`levels_`catvar'')
        foreach level of local `levels_`catvar'' {
            svy: mean `varlist' if `catvar' == `level' & `touse'
            matrix `tempmat' = r(table)
            tempname mean_`catvar'_`level'
            local `mean_`catvar'_`level'' = `tempmat'[1,1]
            tempname vl`catvar'_`level'
            local `vl`catvar'_`level'' : label (`catvar') `level'
            // di "Mean of var: ``mean_`catvar'_`level'''"
        }
    }

    tempname valueslabels
    label define `valueslabels' `valuelabelsforlevels'
    tempfile tf_valuelabelsforlevels
    label save `valueslabels' using `"`tf_valuelabelsforlevels'"', replace
   
    // I'll now make a dataset out of this with the following variables: mean of the variable; category name; category level
    //     The latter two will be numeric, categorical variables with variable labels attached.
    preserve
    clear
    do `"`tf_valuelabelsforlevels'"'
    gen meanofvariable = .
    label var meanofvariable `"``variablelabel''"'
    gen catvariablelabel = ""
    gen catvariablelevel = ""
    gen catvariablelevel_n = .
    gen sortorder = .
   
    // create the sort order - it will be the order in which the categorical variables were specified
   
    tempname count sortcount
    local `count' = 0
    local `sortcount' = 0
    foreach catvar of local catvarlist {
        // di "cat = `catvar'"
        local `sortcount' = ``sortcount'' + 1
        foreach level of local `levels_`catvar'' {
            local `count' = ``count'' + 1
            set obs ``count''
            replace meanofvariable = ``mean_`catvar'_`level''' in ``count''
            replace catvariablelabel = `"``catvarlabel_`catvar'''"' in ``count''
            replace catvariablelevel_n = `level' in ``count''
            replace catvariablelevel = `"``vl`catvar'_`level'''"' in ``count''
            replace sortorder = ``sortcount'' in ``count''
           
            // di "Mean of var: ``mean_`catvar'_`level'''"
        }
    }
    label values catvariablelevel_n `valueslabels'
    save `"`using'"', `replace'
    `gcmd' meanofvariable,  over(catvariablelevel_n, sort(catvariablelevel_n) `over1options') over(catvariablelabel, sort(sortorder) `over2options') `goptions'
    restore

end program

Stata tip: fixing the legend on bar graphs to display variable labels instead of variable names

Check out the legends on these two graphs (the first one is the problem legend, the second one is the better legend):



For the first one, I used the command:

. graph bar (mean) appliedmed hospitalized [aw=expfact], over(exempt) ...

and in the legend, it used "mean of [variable name]" instead of using the variable label. If you use the option nolabel after the graph bar command, you would just get "[variable name]" in the legend. How do you get stata to use the variable labels in the legend instead of the variable names, like in the second graph above? (note that, in the second graph, my program makes the variable labels go over two lines when they are long, and makes the line break at a space, not in the middle of a word). Use the following code:

Usage:

/ local vlist appliedmed hospitalized
. makelegendlabelsfromvarlabels `vlist', local(relabellegend) c(30)
. graph bar (mean) `vlist' [aw=expfact], over(exempt) title(`"Share reporting applied for medical assistance in the past 30 days"') ytitle("Fraction of group", margin(medium)) blabel(total, format(%9.2fc)) subtitle("Average for each group") legend(size(vsmall) `relabellegend')


Where the program makelegendlabelsfromvarlabels is defined as below. In the above, the option c(30) tells stata that the first line should have only 30 characters, and that the rest of the value label should be placed on the line below.

program define makelegendlabelsfromvarlabels

    // Written by Shafique Jamal (shafique.jamal@gmail.com). 25 Nov 2012
    //
   
    // Wrote it to fix an annoyance with graph bar. I want graph bar to use variable labels, not variable names, in the legend, but it won't do this if I am using a "(stat)" rather than "(asis)"
    syntax varlist, local(name local) [c(integer 30)]
    version 9.1
   
    // local charlength = 30
   
    tempname count
    local `count' = 0
    tempname labeloptions
    tempname variablelabel
    foreach var of local varlist {
        local `count' = ``count'' + 1
        local `variablelabel' : variable label `var'
       
        // It would be great to break this up at a word boundary if the length is > 34 characters
        if (length(`"``variablelabel''"') > `c') {
            tempname variablelabel_part1
            tempname variablelabel_part2
            tempname variablelabel_tochange
            tempname positionofspace
            tempname positionofspace_prev
            tempname exitwhileloop
            local `exitwhileloop'   = 0
            local `positionofspace' = 0
            local `variablelabel_tochange' `"``variablelabel''"'
            while (``exitwhileloop'' == 0) {
           
                local `positionofspace' = strpos(`"``variablelabel_tochange''"', " ")
                if (``positionofspace'' >= `c' | ``positionofspace''==0) {
                    local `exitwhileloop'   = 1
                }
                else {
                    local `positionofspace_prev' = ``positionofspace''
                    local `variablelabel_tochange' = subinstr(`"``variablelabel_tochange''"'," ",".",1)
                }
           
            }
           
            local `variablelabel_part1' = substr(`"``variablelabel''"', 1, ``positionofspace_prev'')
            local `variablelabel_part2' = substr(`"``variablelabel''"', ``positionofspace_prev'' + 1, . )
            local `labeloptions' `"``labeloptions'' label(``count'' `"``variablelabel_part1''"' `"``variablelabel_part2''"') "'
        }
        else {
            local `labeloptions' `"``labeloptions'' label(``count'' `"``variablelabel''"') "'
        }
    }
   
    // di `"labeloptions: ``labeloptions''"'
    // need to return this in a local macro
    c_local `local' `"``labeloptions''"'
   

end program


Monday, November 26, 2012

Stata tip: Rename the value label associated with a variable, when renaming said variable

Suppose you have a datset with the variable a9, and the value label associated with this is a9 (or gobledygook, or whatever). You may want to change this variable name to something more telling, like maritalstatus. If you use

rename a9 maritalstatus

The value label remains a9. The following ado file will allow you to change both the variable name and the name of the variable label at the same time:

renamevarandvarlabel a9 maritalstatus

now both the variable name and the variable label are maritalstatus. Note that the original variable label can be named anything. For example, if the original variable label was gobledygook, it would still be changed to maritalstatus.

program define renamevarandvaluelabel

    // Written by Shafique Jamal (shafique.jamal@gmail.com). 25 Nov 2012

    // This program renames the variable and the value label. Usage:
    //     renamevarandvaluelabel originalvarname newvarname
    // What it does:
    //    rename originalvarname newvarname
    // and it changes the name of the value label of originalvarname to newvarname.
    // Just make sure that if there is already a value label named newvarname, you're ok with loosing it.

    // syntax anything(id="variable and values" name=arguments)
    syntax anything(id="original and new label name" name=labelnames)
    version 9.1
   
    // steps:
    //    1. drop the label with the new label name, if it exists
    //  2. create the new label from the old label
    //  3. apply this new label to variable
   
    // di "labelnames = `labelnames'"
   
    foreach item of local labelnames {
        // di `"item = `item'"'
    }
   
    tempname originallabelname
    tempname originalvarname
    local `originalvarname' : word 1 of `labelnames'
    tempname newvarandlabelname
    local `newvarandlabelname' : word 2 of `labelnames'
   
    // Step 1. drop the label with the new label name, if it exists. Wait, if it exists... what do we do? Quit the program
    cap label list ``newvarandlabelname''
    if (_rc == 0) {
        di "That label (``newvarandlabelname'') already exists. You can use the command "renamevaluelabel [oldlabelname] [newlabelname] written by Shafique Jamal (shafique.jamal@gmail.com) to change that value label name." Exiting"
        exit
    }
   
    // Step 2. create the new label from the old label. First need to get the name of the label value of the original variable name. Do this only if there is a value label attached
    local `originallabelname' : value label ``originalvarname''
    if ("``originallabelname''"~="" & "``originallabelname''"~=" ") {
   
        di "There is an existing label"
        label copy ``originallabelname'' ``newvarandlabelname''
   
        // Step 3. rename the variable, then attached the new variable label
        rename ``originalvarname'' ``newvarandlabelname''
        label values ``newvarandlabelname'' ``newvarandlabelname''
    }
    else { // Just rename the variable, forget about the value label, if there is no original value label
   
        di "No existing label"
        rename ``originalvarname'' ``newvarandlabelname''
    }
   
   
end



Stata tip: Easy and short way to generate household head variables for individual-level datasets

Suppose you have an individual-level dataset (so you have a dataset with data on multiple members in the household), and you want to generate an variable that says something about the household head (e.g. household head is male, or is unemployed, etc.). This program will allow you to do so with just one command:

program define genhhhcharacteristics

    // Written by Shafique Jamal (shafique.jamal@gmail.com).
    // For an individual level dataset (includes multiple household members, not just the household head), generates a variable indicating a characteristic of the household head
    // e.g. suppose you want to generate a new variable (hhh_male) indicating the gender of the household head, and the variable identifying the household head is "reltohead", with 1 being the head,
    // and you want to do it by hhid of course. You would use the following command:
    //   
    // genhhhcharacteristics male, b(hhid) gen(hhh_male) h(reltohead) id(1)
    //
    // The above would be the equivalent of doing the following:
    //    gen hhh_male_interm = 1 male if reltohead == 1
    //  bys hhid: egen hhh_male = max(hhh_male_interm)
    //    drop hhh_male_interm
    // And then copying the value label and a modified variable label over to the new household head variable
    //

    syntax varname, Byvariables(varlist) GENerate(name) Headvariable(varname) [IDofhead(integer 1) ]
    version 9.1

    tempvar intermediaryvariable
    gen `intermediaryvariable' = `varlist' if `headvariable' == `idofhead'
    bys `byvariables': egen `generate' = max(`intermediaryvariable')
   
    // Now copy the value label over, if there is one
    tempname valuelabel
    local `valuelabel' : value label `varlist'
    if ("``valuelabel''"~="" & "``valuelabel''"~=" ") {
   
        // di "There is an existing label"
        label values `generate' ``valuelabel''
    }
   
    // Copy over also the variable label
    tempname variablelabel
    local `variablelabel' : variable label `varlist'
    label var `generate' `"``variablelabel'' (For `headvariable' == `idofhead', by `byvariables')"'
   
end program



Friday, November 23, 2012

Stata tip: plotting the output of the tab function

UPDATE2: I updated this to allow for "if" and "in"

UPDATE: I updated this to preserve the value labels. So var2 (the second variable in your variable list) must have a value label attached to it.

Suppose you want to plot the output of the two-way tab function? Here is a program that will do it (see below). It is actually a wrapper for the tabout command. Some notes about the options:

using: put here the name of the filename that you want to save the tabout data to, in tab separated format. The graphs that this command produces will save graphs using the same filename but with different extension.

gc: this stands for graph command. You can use gc("graph bar"), gc("graph hbar")... and maybe others

go: this stands for graph options. These are the options that you would use for the graph command above (e.g. note, title, b1title, subtitle, etc)

ta: this stands for tabout options. These are the options you would use with the tabout command (e.g. c(), f(), etc.)

Usage:

taboutgraph var1 var2 [aw=weight] using "filename_to_savedatato.csv", gc("graph bar") ta(cells(col) f(2 2 2 2)) replace go( note("Source: XXX") b1title("Quintile") title(`"Composition of Population"') ytitle("Percent of population in the quntile"))

Code

program define taboutgraph

    // Written by Shafique Jamal (shafique.jamal@gmail.com)
    // This program requires that the second variable in varlist have a value label attached to it
    // It plots the column output of the tabout command

    syntax varlist(min=2 max=2) [if] [in] using/ [aweight], GCmd(string) GOptions(string asis) TAboutoptions(string asis) [replace overcategorysuboptions(string asis) overxsuboptions(string asis)]
    version 9.1
    marksample touse
    // di `"`0'"'
    cap drop _v*
    // cap ssc install lstrfun
  
    // first generate the table
    tabout `varlist' [`weight'`exp'] if `touse' using `using', `replace' `taboutoptions'
    di `"tabout [`weight'`exp'] `varlist' if `touse' using `using', `replace'"'
    local number_of_rows    = r(r)
    local number_of_columns = r(c)
    return list
  
    // get the filename
    di `"regexm:"'
    di regexm(`"`using'"',`"((.*)\.(.+))$"')
    if (regexm(`"`using'"',`"((.*)\.(.+))$"')) {
        local pathtofile_original            = regexs(1)
        local pathtofile_withoutextension    = regexs(2)
        local pathtofile_extension            = regexs(3)
    }
    di `"pathtofile_original:`pathtofile_original'"'
    di `"pathtofile_withoutextension:`pathtofile_withoutextension'"'
    di `"pathtofile_extension:`pathtofile_extension'"'
    // open the file and process it.
  
    local count = 0
    tempname fhr
    tempname fhw
    tempfile tf
    file open `fhr' using `"`pathtofile_original'"', r
  
    // ---------------------------
    // file open `fhw' using `"$WHO_KG_reports/tempfile.csv"', t write all replace
    file open `fhw' using `"`tf'"', t write all replace
  
    local count = `count' + 1

    // First line is variable label.
    file read `fhr' line
    return list
    local count = 1
    while r(eof)==0 {
        local count = `count' + 1
        // di `"count = `count'"'
        file read `fhr' line
       
        if (`count'~=3) { // This line is units - we can throw this away
            file write `fhw' `"`line'"' _n
            // di `"`line'"'
        }
    }
       
    file close `fhr'
    file close `fhw'
  
    // We should save the value labels. Check to make sure that the label exists
    tempfile tfvaluelabels
    tempname nameofvaluelabel
    tempname variablenamewithlabel
    local `variablenamewithlabel' : word 2 of `varlist'
    local `nameofvaluelabel' : value label ``variablenamewithlabel''
    label save ``nameofvaluelabel'' using `"`tfvaluelabels'"', replace
  
    preserve
    qui insheet using `"`tf'"', t clear names
  
    // I want to restore the value levels and value labels
    do `"`tfvaluelabels'"'
    // ssc install labellist
    // levelsof ``nameofvaluelabel'', local(levels)
    labellist ``nameofvaluelabel''
    local levels = r(``nameofvaluelabel''_values)
    local labels = r(``nameofvaluelabel''_labels)
  
    save `"`pathtofile_withoutextension'_short.dta"', replace

    drop total
    drop if _n == _N
  
    local count = 0
    local count_levels = 0
    foreach var of varlist * {
        local count = `count' + 1
       
        if (`count'==1) {
            qui rename `var' x
        }
        else {
            local count_levels = `count_levels' + 1
            local level : word `count_levels' of `levels'
            qui rename `var' _v`level'
            // qui rename `var' _v`count'
            local v`level'_labelforfilename = `"`var'"'           // used for the filename for saving graphs of individual variables
            local v`level'_varlabel : variable label _v`level'    // used for the subtitle in the plot of individual variables.
        }
    }
  
    // COME BACK TO THIS
    // graph each y var, then all y vars
  
    foreach level of local levels {
        `gcmd' (asis) _v`level', over(x, ) `goptions' subtitle(`"`v`level'_varlabel'"')
        graph export "`pathtofile_withoutextension'_`v`level'_labelforfilename'.pdf", replace
    }
    /*
    forv x = 2/`count' {
        `gcmd' (asis) _v`x', over(x) `goptions' subtitle(`"`v`x'_varlabel'"')
        // di `"subtitle: subtitle(`"`v`x'_varlabel'"'), `v`x'_varlabel', v`x'_varlabel"'
        graph export "`pathtofile_withoutextension'_`v`x'_labelforfilename'.pdf", replace
    }
    */
  
    // graph all yvars
    qui reshape long _v, i(x) j(category)
    // cap tostring category, replace
    label values category ``nameofvaluelabel''

    /*
    forv x = 2/`count' {
    qui replace category = `"`v`x'_varlabel'"' if category == `"`x'"'
    }
    */
    `gcmd' (asis) _v, over(category, `overcategorysuboptions') over(x, `overxsuboptions') asyvars `goptions'
    graph export "`pathtofile_withoutextension'_allvars.pdf", replace
    save `"`pathtofile_withoutextension'_long.dta"', replace
    restore
end program

MS Excel VBA script to translate worksheets using the google translate API


UPDATE: I've made and Excel Add-In, that you can download here. Add it in to your worksheet and type Control+Shift+T to start the macro. I'll try to make a youtube video to demonstrate.

UPDATE #2: Here is a YouTube video to show how to download and install the add-in.

A while ago I wrote some code in Perl to translate excel sheets using google translate while preserving the formatting. That way was long, unreliable, complicated, etc. Here is a better solution.

Put the following MS Excel VBA macro code into your personal workbook, and create a shortcut to it (I use Ctrl+shift+t). It uses the google translate API. It will translate all non-empty, non-numeric cells in the active worksheet, placing the translation into a new worksheet, with the original formatting. It will place the original of numeric cells (not translated) into the new worksheet. The new worksheet will be the name of the old worksheet, with an underscore and the two letter language code appended onto it. If a worksheet with that name already exists, it will be deleted.

You will have to specify the following in a dialog box that will pop up when you run the Macro (or just in the code - I don't know how to paste the code for the userform here):
1. your google API key. The google translate API is not free, right now it is $20 per 1M characters
2. two letter language code for the source language
3. two letter language code for the destination language

(for 2 and 3, you have to use the language codes that the google translate API supports. See https://developers.google.com/translate/)

Maybe I'll modify this one day to use autodetect for the language, so that you can translate multiple languages on the same worksheet.

Feedback is always appreciated. Good luck!

Sub TranslateWorsheet()

    ' I got the URL encoding function here: http://stackoverflow.com/questions/218181/how-can-i-url-encode-a-string-in-excel-vba
    ' To run this script, you need to add "Microsoft Script Control" as reference (Tools -> References in the VB Editor)

    ' Step 1: Create a new worksheet: existing worksheetname_2lettertargetlanguagecode
    ' Step 2: In the current sheet, loop through all non-empty cells
    '       a) send the REST request to API to translate the contents of the cell if it is non-numeric, otherwise paste the original cell contents
    '       b) put the translated contents in the corresponding cell of the new worksheet
    '       c) copy also the formatting of the cell

    Dim destinationWorksheetName As String
    Dim sourceWorksheetName As String
    Dim cellContent As String
    Dim cellAddress As String
    Dim sourceWorksheet As Worksheet
    Dim destinationWorksheet As Worksheet
    
    Dim ScriptEngine As ScriptControl
    Set ScriptEngine = New ScriptControl
    ScriptEngine.Language = "JScript"
    ScriptEngine.AddCode "function encode(str) {return encodeURIComponent(str);}"
    
    ' use regualr expression to get the translation
    Dim RE As Object
    Set RE = CreateObject("VBScript.RegExp")
    RE.Pattern = "\[\s*{\s*""translatedText"": ""(.*)""\s}*"
    RE.IgnoreCase = False
    RE.Global = False
    RE.MultiLine = True
    Dim testResult As Boolean
   
    ' send the translation request
    Dim REMatches As Object
    Dim translateD As String
    Dim sourceString As String
    Dim K As String
    Dim URL As String
    Dim encodedSourceString As String
    Dim sourceLanguage As String
    Dim destinationLanguage As String
    Set sourceWorksheet = ActiveSheet
    sourceWorksheetName = ActiveSheet.Name
   
    ' sourceString = "Hello World"
    destinationLanguage = "EN"
    sourceLanguage = "RU"
    K = InputBox(prompt:="Please enter your Google Translate API key", Title:="Google Translate API Key Required: For more info, see https://developers.google.com/translate/v2/getting_started")

    'obTranslateOptions.Show
    'sourceLanguage = obTranslateOptions.obSourceLanguage.Text
    'destinationLanguage = obTranslateOptions.obDestinationLanguage.Text
    'K = obTranslateOptions.obKey.Text

    'Debug.Print "K=" & K
    'Debug.Print "sourceLanguage=" & sourceLanguage
    'Debug.Print "destinationLanguage=" & destinationLanguage
   
    ' Unload obTranslateOptions
   
    ' If a worksheet of this name in this workbook already exist, then delete it
    destinationWorksheetName = sourceWorksheetName & "_" & destinationLanguage
    Application.DisplayAlerts = False
    On Error Resume Next
    Sheets(destinationWorksheetName).Delete
    Application.DisplayAlerts = True
    On Error GoTo 0
   
    ' Prepare to send the request
    Dim objHTTP As Variant
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    Dim responseT As String
      
    ' copy active worksheet, clear contents of the copy
    ActiveWorkbook.ActiveSheet.Copy after:=ActiveWorkbook.ActiveSheet
    ActiveSheet.Name = destinationWorksheetName
    ActiveSheet.Cells.ClearContents
    Set destinationWorksheet = ActiveSheet
   
    sourceWorksheet.Activate
    ' loop through all non-empty cells or all selected cells
    Dim cell As Range
    For Each cell In ActiveSheet.UsedRange.Cells
   
        'Debug.Print cell.Address
        cellAddress = cell.Address
        sourceString = cell.Value
        'Debug.Print "sourceString:" & sourceString
   
        ' do only for non-numeric cells
        If (IsNumeric(cell.Value) = False) Then
               
            ' encode the source text
            encodedSourceString = ScriptEngine.Run("encode", sourceString)
            ' prepare and send the request
            URL = "https://www.googleapis.com/language/translate/v2?key=" & K & "&source=" & sourceLanguage & "&target=" & destinationLanguage & "&q=" & encodedSourceString
            objHTTP.Open "GET", URL, False
            objHTTP.SetRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
            objHTTP.send ("")
            responseT = objHTTP.ResponseText
            ' Debug.Print "responseT:" & responseT
       
            ' pull the translation from the response to the request
            If (RE.Test(responseT) = True) Then
                'Debug.Print "re.test is true"
                Set REMatches = RE.Execute(responseT)
                translateD = REMatches.Item(0).SubMatches.Item(0)
                'Debug.Print "translateD:" & translateD
            Else
                'Debug.Print "re.test is false"
            End If
       
            destinationWorksheet.Range(cellAddress).Value = translateD
        Else
            destinationWorksheet.Range(cellAddress).Value = cell.Value
        End If
    Next
   
End Sub

Stata Tutorial 3 is now up on youtube.com

Stata Tutorial 3: insheet, append, use, sort, merge, outsheet. Here is the link to the youtube video:

https://www.youtube.com/watch?v=8JA5nZPdqIk&feature=plcp

The do file is available at:

https://docs.google.com/document/d/1RouCrQOhxc9CoDs5XryY3RgUPP5r_j39_LanPKmGu3Q/edit

and the log file is available at:

https://docs.google.com/document/d/1nkFknJ7fOzbNiejM7SZi2LcLDsaYdrnd4__ZdYrgF8A/edit

Enjoy!