Saturday, December 8, 2012

Stata tip: "Mega If" : a simple command to generate long if conditions

Suppose you need to type the following command that has many conditions in the if condition:

replace group = 1 if (benefitNumber == 1 | benefitNumber == 2 | benefitNumber == 3 | benefitNumber == 4 | benefitNumber == 5 | benefitNumber == 6 | benefitNumber == 11 | benefitNumber == 12 | benefitNumber == 17 | benefitNumber == 19 | benefitNumber == 21 | benefitNumber == 22 | benefitNumber == 23)

It's a bit much to type. The program megaif below will generate this long line from the much shorter command:

megaif 1 2 3 4 5 6 11 12 17 19 21 22 23, v(benefitNumber) c(replace group = 1)

The above has lots of "or equals", but you can also generate lots of "and does not equal", for example:

replace group = 1 if (benefitNumber != 1 & benefitNumber != 2 & benefitNumber != 3 & benefitNumber != 4 & benefitNumber != 5 & benefitNumber != 6 & benefitNumber != 11 & benefitNumber != 12 & benefitNumber != 17 & benefitNumber != 19 & benefitNumber != 21 & benefitNumber != 22 & benefitNumber != 23)

using the following command:

megaif 1 2 3 4 5 6 11 12 17 19 21 22 23, v(benefitNumber) c(replace group = 1) e(!=) s(&)

It works for numeric variables and for string variables too. Check this out:

megaif "a b" b "cc" d `"e"', v(benefit_stringvar) c(replace group = 1) e(!=) s(&)

executes the following command:

cmd to execute: replace group = 1 if (benefit_stringvar != "a b" & benefit_stringvar != "b" & benefit_stringvar != "cc" & benefit_stringvar != "d" & benefit_stringvar != "e")

As you can see, for string variables the quotes are optional unless you're checking for text that has a space in it. The program is below. Enjoy!

program define megaif

    // By Shafique Jamal
    // e.g.
    // sysuse auto, clear
    // megaif 0 1, v(foreign) c(drop) e(~=) // this will drop all the observations. Just for illustrative purposes to show how the command could be used
    // another e.g.
    // The command:
    //    megaif 14 15 16 17 18 19 20 21 22, c(gen priv1 = 1) var(income_type2)
    // would execute the following command:
    //     gen priv1 = 1 if (income_type2 == "14" | income_type2 == "15" | income_type2 == "16" | income_type2 == "17" | income_type2 == "18" | income_type2 == "19" | income_type2 == "20" | income_type2 == "21" | income_type2 == "22")


    syntax anything(id="variable and values" name=arguments), Var(varname) Cmd(string) [Equality(string) Separator(string)]
   
    // The default is equality
    if ("`equality'" == "") {
        local equality "=="
    }
   
    if ("`separator'" == "") {
        local separator " | "
    }
    else {
        local separator " `separator' "
    }
   
    cap confirm numeric variable `var'
    if (_rc == 0) { // variable is numeric
        local numericvar = 1
    }
    else {
        local numericvar = 0
    }
    // di "numericvar = `numericvar'"
   
    local count = 0
    local orcondition ""
    foreach w of local arguments {
        local count = `count' + 1

        // di `"w = `w'"'
        if (`numericvar' == 0) {
            local orcondition `"`orcondition'`orseparator'`var' `equality' "`w'""'
        }
        else {
            local orcondition `"`orcondition'`orseparator'`var' `equality' `w'"'
        }
        local orseparator "`separator'"
    }
   
    // di `"orcondition = `orcondition'"'
    di `"cmd to execute: `cmd' if (`orcondition') "'
    // set trace on
    // set traced 1
    `cmd' if (`orcondition')
    set trace off

end




Thursday, December 6, 2012

Stata tip: Quickly, and in one command, rename all variable labels of variables generated with the 'xi' command to reflect the value labels of the xi'd variable

When you use the xi command on categorical variables, even on those that have a value label associated with them, you get this:

. xi: svy, subpop(rural): reg logpccd rural_regressors refrigerator car livingroomsper numchild11 numchild11_sq i.oblast i.typeofdwell i.roof i.coldwatermeterinstalled i.soc_

. d

_Ioblast_3      byte   %8.0g                  oblast==3
_Ioblast_4      byte   %8.0g                  oblast==4
_Ioblast_5      byte   %8.0g                  oblast==5
_Ioblast_6      byte   %8.0g                  oblast==6
_Ioblast_7      byte   %8.0g                  oblast==7
_Ioblast_8      byte   %8.0g                  oblast==8
_Ioblast_11     byte   %8.0g                  oblast==11
_Itypeofdwe_2   byte   %8.0g                  typeofdwelling==2
_Itypeofdwe_3   byte   %8.0g                  typeofdwelling==3
_Itypeofdwe_4   byte   %8.0g                  typeofdwelling==4
_Itypeofdwe_5   byte   %8.0g                  typeofdwelling==5
_Itypeofdwe_6   byte   %8.0g                  typeofdwelling==6
_Itypeofdwe_7   byte   %8.0g                  typeofdwelling==7
_Itypeofdwe_8   byte   %8.0g                  typeofdwelling==8
_Itypeofdwe_9   byte   %8.0g                  typeofdwelling==9


The value labels are not much more informative than are the variable names. Below is a program that will automatically rename the variable label of these variables that result from the xi command so that they include the corresponding value label, as follows:

_Ioblast_3      byte   %8.0g                  oblast=Jalalabat
_Ioblast_4      byte   %8.0g                  oblast=Naryn
_Ioblast_5      byte   %8.0g                  oblast=Batken
_Ioblast_6      byte   %8.0g                  oblast=Osh
_Ioblast_7      byte   %8.0g                  oblast=City of Osh
_Ioblast_8      byte   %8.0g                  oblast=Chui
_Ioblast_11     byte   %8.0g                  oblast=City of Bishkek
_Itypeofdwe_2   byte   %8.0g                  typeofdwelling=Apartment or room in a residential hotel
_Itypeofdwe_3   byte   %8.0g                  typeofdwelling=Separate house
_Itypeofdwe_4   byte   %8.0g                  typeofdwelling=Part of a house
_Itypeofdwe_5   byte   %8.0g                  typeofdwelling=Dormitory
_Itypeofdwe_6   byte   %8.0g                  typeofdwelling=Lodge or a tied cottage (temporary tenure dwelling)
_Itypeofdwe_7   byte   %8.0g                  typeofdwelling=Other non-residential premises used for residence
_Itypeofdwe_8   byte   %8.0g                  typeofdwelling=Other residential premises
_Itypeofdwe_9   byte   %8.0g                  typeofdwelling=Barracks
 

There are actually two programs - mine is a wrapper for a program that Nicholas J. Cox wrote. Both of these are below.

Usage (run this after the xi command):

. varsformyrelabel

Programs:

program define varsformyrelabel

    // Written by Shafique Jamal (shafique.jamal@gmail.com), 12-07-2012
    // UPDATE 12-07-2012: Need to change how the variable name for the list of `allunxidvariables' is determined. Need to get it from the variable label, rather than the variable name
    //

    // Get list of variables that were xi'd
    local xivars "`_dta[__xi__Vars__To__Drop__]:'"
    // di `"xivars:`xivars'"'
   
    // Now just need to get list of un-xi'd variables from this list
    // Here is the first one
    local currentdummyvar : word 1 of `xivars'
    // di `"currentdummyvar:`currentdummyvar'"'
   
    // This will get the full variable name
    local currentunxidvar = regexr("`: variable label `currentdummyvar''","==.*$","")
    // di `"currentunxidvar:`currentunxidvar'"'
    local allunxidvars "`currentunxidvar'"
    // di `"allunxidvars:`allunxidvars'"'
   
    // This will get the _I`var' name, without the _# suffix - I need this for the first argument to the myrelabel routine. Variable name gets shortened
    local currentunxidvarwith_I = regexr("`currentdummyvar'","_[0-9]+$","")
    // di `"currentunxidvar:`currentunxidvarwith_I'"'
    local allunxidvarswith_I "`currentunxidvarwith_I'"
    // di `"allunxidvarswith_I:`allunxidvarswith_I'"'
   
    // Now loop through the rest
    local count = 0
    foreach var of local xivars {
        local count = `count' + 1
        if (`count' != 1) {
            local w : word `count' of `xivars'
            // di "w: `w'"
           
            // check whether the next xi'd var is related to the current one
            // if (regexm("`w'","^_I`currentunxidvar'_[0-9]+$")) { // yes, this is part of the same family as the current _I.... variable under consideration
            if (regexm("`: variable label `w''","^`currentunxidvar'==.*$")) { // yes, this is part of the same family as the current _I.... variable under consideration
                // di "skip"
            }
            else { // no, it is different. add to the list
                // this gets the full variable name
                local currentunxidvar = regexr("`: variable label `w''","==.*$","")
                // di `"currentunxidvar:`currentunxidvar'"'
                local allunxidvars "`allunxidvars' `currentunxidvar'"
                // di `"allunxidvars:`allunxidvars'"'
               
                // This gets the _Ivar name
                local currentunxidvarwith_I = regexr("`w'","_[0-9]+$","")
                // di `"currentunxidvar:`currentunxidvarwith_I'"'
                local allunxidvarswith_I "`allunxidvarswith_I'  `currentunxidvarwith_I'"
                // di `"allunxidvarswith_I:`allunxidvarswith_I'"'
            }       
        }
    }

    di "allunixidvars: `allunxidvars'"
    di `"allunxidvarswith_I:`allunxidvarswith_I'"'
    local count = 0   
    foreach var of local allunxidvars {
        local count = `count' + 1
        local varwith_I : word `count' of `allunxidvarswith_I'
        myrelabel `varwith_I'_* `var'
    }
   
end


program def myrelabel
*! NJC 1.0.0 15 July 2003
    version 7
    syntax varlist(numeric)

    tokenize `varlist'
    local nvars : word count `varlist'
    local last ``nvars''
    local vallabel : value label `last'
    if "`vallabel'" == "" {
        di as err "`last' not labelled"
        exit 498
    }

    local `nvars'
    local varlist "`*'"

    foreach v of local varlist {
        local varlabel : variable label `v'
        local eqs = index(`"`varlabel'"', "==")
        if `eqs' {
            local value = real(substr(`"`varlabel'"', `eqs' + 2, .))
            if `value' < . {
                local label : label `vallabel' `value'
                label var `v' `"`last'=`label'"'
            }
        }
    }

end


Monday, December 3, 2012

Stata tip: Using perl compatible regular expressions (PCRE) in Stata

UPDATE 12-07-2012: Thanks to Nicholas J Cox who the problem I was having with the -marksample- command. I replaced the code below with the new, fixed code.

Stata's regular expression engine is too limited for my needs. I asked the statalist about how to change Stata's regular expression engine, but apparently it is not possible. So I wrote a Stata program (.ado file) to call a perl script to run a regular expression on a variable.

Matching and substitution are supported. Named captures/groups are not, but non-named captures (e.g. $1, $2, etc.) ARE supported. I think quantifiers are supported. Anyways, if it works as I think it should based on my design and testing, it should be a decent improvement over Stata's built-in regular expression engine (I hope they update it soon).

You can download the perl script here. Download it and place it in any directory - just remember the directory because you will have to specify when you call the program.

The Stata program is here and below. Put this in your personal ado folder.

Usage:

Match only:
        pcre SOME_STRING_VARIABLE, re("/^(\d)(\w)/i") gen(NEW_VARIABLE_TO_BE_GENERATED) pa("/usr/local/ActivePerl-5.16/bin/")

Substitution:
        pcre SOME_STRING_VARIABLE, re("/^(\d)(\w)/gi") gen(NEW_VARIABLE_TO_BE_GENERATED) pa("/usr/local/ActivePerl-5.16/bin/") repl("firstone_$1_secondone_$2") 

Notes:

1. The arguement for re() should be a regular expression enclosed in double quotes. You can use only the forward slash for a delimiter. Named captures/groups don't work yet (I can't figure out why. Any ideas?)

2. The arguement for repl() should be the replacement part of s//THIS_PART/. It should be enclosed in double quotes. Do NOT include the forward slashes or any delimiters. Option modifiers do NOT go here. You can use backreferences $1, $2, etc. but NOT named groups/named captures (i.e. you can't use \g{1}, \g{name}, etc. The \g{} notation doesn't work at all).

3. You can specify the path to your perl installation in pa() (Be sure to include the trailing forward slash). If you don't, it will use whatever version of perl is accessible from the command line in a terminal in whatever path this is run from.

4. You should specify the path of the perl script that this program calls: stataregex.pl. You can download this from my blog: shafiquejamal.blogspot.com. The default is the /Applications/STATA12/ directory. Be sure to include the trailing forward slash.

5. This will generate a binary/dummy variable the match was a success, and variables prefixed by this same variable name with _1, _2, _3 ... , _16 appended to store the named captures/groups.

program define pcre

     // 30101990
    // Written by Shafique Jamal (shafique.jamal@gmail.com), 01 Dec 2012. Use at own risk :-p
    //
    // This program allows the user to use perl compatible regular expressions on a (single) string VARIABLE (not a scalar string) for matching, obtaining captures from memory parenthesis, and
    //    subsitutions. Its not perfect... I think it supports quantifiers, it does support options/option modifiers, but it does not support named captures/groups.
    //
    // Usage:
    //
    //    Match only:
    //        pcre SOME_STRING_VARIABLE, re("/^(\d)(\w)/i") gen(NEW_VARIABLE_TO_BE_GENERATED) pa("/usr/local/ActivePerl-5.16/bin/")
    //  Substitution:
    //        pcre SOME_STRING_VARIABLE, re("/^(\d)(\w)/gi") gen(NEW_VARIABLE_TO_BE_GENERATED) pa("/usr/local/ActivePerl-5.16/bin/") repl("firstone_$1_secondone_$2")
    //
    // Note:
    //
    //    1. The arguement for re() should be a regular expression enclosed in double quotes. You can use only the forward slash for a delimiter. Named captures/groups don't work yet (I can't
    //        figure out why. Any ideas?)
    //  2. The arguement for repl() should be the replacement part of s//THIS_PART/. It should be enclosed in double quotes. Do NOT include the forward slashes or any delimiters.
    //        Option modifiers do NOT go here. You can use backreferences $1, $2, etc. but NOT named groups/named captures (i.e. you can't use \g{1}, \g{name}, etc. The \g{} notation doesn't work at all). 
    //  3. You can specify the path to your perl installation in pa() (Be sure to include the trailing forward slash). If you don't, it will use whatever version of perl is accessible from the command line in a terminal in whatever path this
    //        is run from.
    //  4. You should specify the path of the perl script that this program calls: stataregex.pl. You can download this from my blog: shafiquejamal.blogspot.com
    //        The default is the /Applications/STATA12/ directory. Be sure to include the trailing forward slash. 
    //    5. This will generate a binary/dummy variable the match was a success, and variables prefixed by this same variable name with _1, _2, _3 ... , _16 appended to store the named captures/groups.
    //        It will also store (NEW_VAR_NAME)_s to store the new string with the substitution
    //
    // Steps:
    // 1. generate a merge variable based on _n. This is to make sure that the newly generated variable matches up by observations with the argument variable
    // 2. outsheet the merge variable and the argument variable into a csv file
    // 3. read the file into memory using perl
    // 4. perform the reg exp mach querry on each observation. Store result (0 or 1) in an array, whose index is the observation number as given in the merge variable
    // 5. save a new datafile, with the orignal merge var, and the match results variable, with the variable names in the headings
    // 6. merge this
    //
    // 02-12-2012: go ahead and pass the full regular expression with delimiters and options in the option REgularexpression(string asis)
    // Next step: detect whether a variable or string is the first arguement
    //
    //
    //
    // 1. generate a merge variable based on _n. This is to make sure that the newly generated variable matches up by observations with the argument variable
   
    syntax varname(string) [if], GENerate(name) REgularexpression(string asis) [Perlprogramdirwithfinalslash(string asis) PAthroperlwithfinalslash(string asis) REPLacement(string asis)]
    version 9.1
    marksample touse, strok
    // di `"`0'"'
   
    // 2. outsheet the merge variable and the argument variable into a csv file
    tempvar mergevar
    tempname _m
    // tempname touse2
    tempfile tfoutsheet
    tempfile tfinsheet
    tempfile tfinsheed_dta
    gen `mergevar' = _n
    // for some reason, marksample is not working
    // gen `touse2' = 0
    // qui replace `touse2' = 1 `if'
    cap drop `generate'
    // this is the variable that will hold the string with subsitutions
    cap drop `generate'_*
   
   
    // count if `touse'
    // count if `touse2'
    // di `"`if'"'
    // list hhid `mergevar' `touse'
   
    // qui outsheet `mergevar' `varlist' `touse' using "tfoutsheet.csv", c replace
    qui outsheet `mergevar' `varlist' `touse' using "`tfoutsheet.csv'", c replace
   
    // check options passed
    if (`"`optionmodifiers'"'==`""') {
        local optionmodifiers `""'
    }
   
    // check for perl program directory
    if (`"`perlprogramdirwithfinalslash'"'==`""') {
        local perlprogramdirwithfinalslash "/Applications/STATA12/"
    }
   
    // 3. Perl operations. Need to supply arguments in this order: inputfilename outputfilename nameofnewvariablegenerated regularexpressionpattern regularexpressionoptions
     // shell `pathroperlwithfinalslash'perl -v
     // di `"shell `pathroperlwithfinalslash'perl "`perlprogramdirwithfinalslash'stataregex.pl" "`tfoutsheet.csv'" "`tfinsheet.csv'" "`generate'" `regularexpression'"'
     qui shell `pathroperlwithfinalslash'perl "`perlprogramdirwithfinalslash'stataregex.pl" "`tfoutsheet.csv'" "`tfinsheet.csv'" "`generate'" `regularexpression' '`replacement''
   
    preserve
    qui insheet using "`tfinsheet.csv'", c clear
    sort `mergevar'
    qui save `"`tfinsheed_dta'"', replace
    restore
   
    sort `mergevar'
    qui merge 1:1 `mergevar' using `"`tfinsheed_dta'"', gen(`_m')
    qui drop `_m'
   
    foreach var of varlist `generate'* {
        cap confirm numeric var `var'
        if (_rc == 0) {
            qui replace `var' = . if `touse' == 0
        }
        else {
            qui replace `var' = "" if `touse' == 0
        }
    }   

end program

Wednesday, November 28, 2012

Stata tip: collapse dataset while preserving variable and value labels

When I use the collapse command, I loose the variable and value labels associated with my variables. The following program does everything that the collapse command does, but preserves the variable and value labels. It also has an option to refrain from putting the stat in the variable label.

Usage:

collapseandpreserve hdff=zarpl time_id (last) obraz wouer=soc_st, by(hh_code resp) o

Program:

program define collapseandpreserve

    // Written by Shafique Jamal (shafique.jamal@gmail.com).
    // This will collapse the dataset and preserve the variable and value labels. The syntax for using this is just like with the collapse command.
    // There is one additional optional option: show stat. If you add this option to the command (collapseandperserve ... ,by(...) omitstatfromvarlabel
    //    then it will not show the statistic (i.e. (fist), (mean), (last), etc.) in the variable label
   
    syntax anything(id="variable and values" name=arguments equalok), by(string asis) [cw fast Omitstatfromvarlabel]
    version 9.1
   
    // save all the value labels
    tempfile tf
    label save using `"`tf'"', replace
   
    // get the list of variables to be collapse, and keep track of the value label - variable correspondence
    tempname precollapse_listofvars
    tempname postcollapse_listofvars
    tempname listofvaluelabels
    tempname valuelabelname
    tempname stat
    tempname oldvarname
    tempname newvarname
    local `stat' "(mean)"
    foreach a of local arguments {
        di `"word: `a'"'
        if (regexm(`"`a'"',"^\(.*\)$")) { // if there is something like (first), (mean), etc.
            local `stat' = `"`a'"'   
        }
        else { // This is a variable. Store the associated variable label and value label name
           
            // What if there is an = in the term? then need two list of variables: a precollapse list and a postcollapse list
            if (regexm(`"`a'"',"^(.*)=(.*)$")) {
                // di "Regex match!"
                local `oldvarname' = regexs(2)
                // di "oldvarname: ``oldvarname''"
                local `newvarname' = regexs(1)
                // di "newvarname: ``newvarname''"
            }
            else {
                // di "NO regex match!"
                local `oldvarname' `"`a'"'
                // di "oldvarname: ``oldvarname''"
                local `newvarname' `"`a'"'
                // di "newvarname: ``newvarname''"
            }
           
            local `precollapse_listofvars'   `"``precollapse_listofvars'' ``oldvarname''"'
            local `postcollapse_listofvars'   `"``postcollapse_listofvars'' ``newvarname''"'
            local `valuelabelname' : value label ``oldvarname''
            tempname vl_``newvarname''
            local `vl_``newvarname''' : variable label ``oldvarname''
            if (`"``vl_``newvarname''''"' == `""') {
                local `vl_``newvarname''' `"``newvarname''"'
            }
            di `"omitstatfromvarlabel = `omitstatfromvarlabel'"'
            if (`"`omitstatfromvarlabel'"'==`""') {
                local `vl_``newvarname''' `"``stat'' ``vl_``newvarname''''"'
                di "not omitting"
            }
            else {
                local `vl_``newvarname''' `"``vl_``newvarname''''"'
                di "omitting"
            }
           
            if (`"``valuelabelname''"' == `""') { // variable has no value label
                local `listofvaluelabels' `"``listofvaluelabels'' ."'
            }
            else {
                local `listofvaluelabels' `"``listofvaluelabels'' ``valuelabelname''"'
            }
        }
    }
   
    collapse `arguments', by(`by') `cw' `fast'
    // macro list
   
    // retrieve the valuelabels
    qui do `"`tf'"'
   
    // reapply the variable labels and the value labels
    tempname count
    local `count' = 0
    di "------------------------------------------------"
    foreach var of local `postcollapse_listofvars' {
        di `"var: `var'"'
        di `"its variable label: ``vl_`var'''"'
        // reapply the variable labels
        local `count' = ``count'' + 1
        label var `var' `"``vl_`var'''"'
       
        // reapply the value labels
        local `valuelabelname' : word ``count'' of ``listofvaluelabels''
        if (`"``valuelabelname''"' != `"."') {
            label values `var' ``valuelabelname''
        }
    }
end program

Tuesday, November 27, 2012

Stata tip: Plotting simillar graphs on the same graph

Suppose you want to make a bar graph of a variable, such as consumption, for two mutually exclusive groups such as males and females, represented by one categorical variable ("male"). This is easy enough: use the graph bar command with an over() option. What if you want to plot over two categorical variables, one within the other: for example you want to plot average consumption for males and females that are self-employed, and average consumption for males and females that are not self-employed. Easy enough, just include an extra over() option with the extra categorical variable. In this example, each over group is mutually exclusive: you are either male or female, but can't be both, and you are either self-employed or not self-employed, but can't be both. In this example, the variables in your dataset are:

consumption, male, self-employed

Suppose, however, that you want to essentially combine two separate graphs into one graph as follows: you want to plot average consumption over three categorical variables that are NOT mutually exclusive, so you don't want to plot one within the other. For example, imagine that you are considering three different policy options for awarding a social assistance benefits: the current policy ("currentpolicy"), alternative A ("alternative_a") and alternative B ("alternative_b"). Each mechanism divides the population into those who qualify and those who don't qualify for the social assistance benefit. Thus each policy option is represented in the data by a binary variable (a.k.a. a dummy variable, which is just a categorical variable with two levels: 0 for those who do not qualify and 1 for those who do qualify). Of the three policy options, the current policy is the least pro-poor, alternative A is more pro-poor (more of the benefits go to the poor), and alternative B is the most pro-poor.

Now, the three policy options are not mutually exclusive. It is possible to qualify under all three policy options, to be excluded under all three policy options, or to qualify under only one or two of the policy options. This is not true of the groups self-employed and not self-employed, and of the groups male and female. Essentially, suppose you want to combine the first three graphs below onto the the same graph, as shown in the fourth graph:






The program below will do this. Make sure you have label values defined for the categorical values. Also note that:

a) The value labels for each of the categories should have the same numbering (i.e. they should all be 0, 1, 2 or 0, 2, 5. It should not be the case that one has 0, 1, 3 and the other has 1, 3, 4).

b) The groups defined by the categorical variable will be plotted the in the order that you specify them. So in the fourth graph above, the order is: current policy, alternative a, alternative b, since that is what is given in the option catvarlist below: catvarlist("currentpolicy alternative_a alternative_b")

c) You need to specify new value labels with the same numbering as the original value labels for the categorical variables. This is necessary for the plot to turn out nice (the spacing gets mucked up if I don't force this). You can do this in the v() option below. So the original value labels could be:

current policy: 0 "non-exempt" 1 "exempt"
alternative a: 0 "does not receive" 1 "receives"
alternative b: 0 "Ineligible" 1 "eligible"

You might want to relabel this using the v() option as follows:

v(0 "non-beneficiary" 1 "beneficiary")

To generate the fourth graph:

. overlappingcatgraphmean pccd using "$WHO_KG_reports/eraseme.dta", gc(graph bar (asis)) catvarlist("currentpolicy alternative_a alternative_b")     v(0 "Non-beneficiary" 1 "Beneficiary") over2options(lab(angle(0) labs(vsmall))) replace go(note(`"Source: Some data source "') asy asc title("Average consumption of groups within population") subtitle("Simulation of policy options") ytitle("Per capita HH consumption (LCU)", margin(medium)) legend(size(small)) blabel(total, format(%9.0fc)))

To generate the first three graphs:

. graph bar pccd [aw=expfact], over(currentpolicy) asy title("Mean annual consumption comparison") subtitle("Current Policy") ytitle("Consumption")  note("Source: some data source") blabel(total, format(%9.0fc))

. graph bar pccd [aw=expfact], over(alternative_a) asy title("Mean annual consumption comparison") subtitle("Alternative A Policy") ytitle("Consumption") note("Source: some data source") blabel(total, format(%9.0fc))

. graph bar pccd [aw=expfact], over(alternative_b) asy title("Mean annual consumption comparison") subtitle("Alternative B Policy") ytitle("Consumption") note("Source: some data source") blabel(total, format(%9.0fc))

The program:

program define overlappingcatgraphmean

    // Written by Shafique Jamal (shafique.jamal@gmail.com). 27 Nov 2012
    // I want to plot the mean of a variable over categorical values on the same plot. Of course, these categorical variables will not be mutually exclusive between them (though the are within them)
    // "using" should specify a .dta file - this program will save a dataset
    // doesn't take weights - uses svy mean to calculate the mean
    //
    // You call it like this:
    //
    // overlappingcatgraphmean varname using "filename.dta", gc(graph bar (asis)) go(over(catvariablelabel, [over_subopts]) over(catvariablelevel, [over_subopts]) asc title("My Title") ...) catvarlist(categoricalvar1 categoricalvar2) replace
    //    
    // Note that:
    // 1. 'catvariablelabel', 'catvariablelevel_n' 'catvariablelevel' must be entered exactly as is (without the quotes) - these are names of variables that the program creates
    // 2. the order in which you enter the over() options is up to you.
    //
    // UPDATE 12-07-2012: Best way is to call it with a long dataset like this: graph bar v, over(eligible) over(avg). Also note that I haven't tested whether this works with "if"

    syntax varname using/ [if] [in], GCmd(string) GOptions(string asis) CATvarlist(varlist) Valuelabelsforlevels(string asis) [replace over1options(string asis) over2options(string asis) ]
    version 9.1
    marksample touse
    tempname tempmat
    tempname variablelabel
    local `variablelabel' : variable label `varlist'   
   
    // foreach category, find the mean
    foreach catvar of local catvarlist {
        tempfile tf_`catvar'
       
        // this is a pain: get the name of the variable's value label
        tempname tn_`catvarvaluelabel'
        local `tn_`catvarvaluelabel'' : value label `catvar'
        label save ``tn_`catvarvaluelabel''' using `"`tf_`catvar''"', replace
       
        // UPDATE: None of this is necessary. The user will pass a list of value labels, separated by spaces, and these will be assumed to be the same for all the categorical variables specified
        //    e.g. user can pass v(0 "Qualifies" 1 "Does not Qualify"), where the categorical variables and corresponding value lables are:
        //    exempt     : 0 "Exempt"     1 "Non-exempt"
        //    PMT        : 0 "Eligible"    1 "Non-eligible"
        //    MBPF    : 0 "Receives"    1 "Does not receive"

        di "cat = `catvar'"
        tempname catvarlabel_`catvar'
        local `catvarlabel_`catvar'': variable label `catvar'
        tempname levels_`catvar'
        levelsof `catvar', local(`levels_`catvar'')
        foreach level of local `levels_`catvar'' {
            svy: mean `varlist' if `catvar' == `level' & `touse'
            matrix `tempmat' = r(table)
            tempname mean_`catvar'_`level'
            local `mean_`catvar'_`level'' = `tempmat'[1,1]
            tempname vl`catvar'_`level'
            local `vl`catvar'_`level'' : label (`catvar') `level'
            // di "Mean of var: ``mean_`catvar'_`level'''"
        }
    }

    tempname valueslabels
    label define `valueslabels' `valuelabelsforlevels'
    tempfile tf_valuelabelsforlevels
    label save `valueslabels' using `"`tf_valuelabelsforlevels'"', replace
   
    // I'll now make a dataset out of this with the following variables: mean of the variable; category name; category level
    //     The latter two will be numeric, categorical variables with variable labels attached.
    preserve
    clear
    do `"`tf_valuelabelsforlevels'"'
    gen meanofvariable = .
    label var meanofvariable `"``variablelabel''"'
    gen catvariablelabel = ""
    gen catvariablelevel = ""
    gen catvariablelevel_n = .
    gen sortorder = .
   
    // create the sort order - it will be the order in which the categorical variables were specified
   
    tempname count sortcount
    local `count' = 0
    local `sortcount' = 0
    foreach catvar of local catvarlist {
        // di "cat = `catvar'"
        local `sortcount' = ``sortcount'' + 1
        foreach level of local `levels_`catvar'' {
            local `count' = ``count'' + 1
            set obs ``count''
            replace meanofvariable = ``mean_`catvar'_`level''' in ``count''
            replace catvariablelabel = `"``catvarlabel_`catvar'''"' in ``count''
            replace catvariablelevel_n = `level' in ``count''
            replace catvariablelevel = `"``vl`catvar'_`level'''"' in ``count''
            replace sortorder = ``sortcount'' in ``count''
           
            // di "Mean of var: ``mean_`catvar'_`level'''"
        }
    }
    label values catvariablelevel_n `valueslabels'
    save `"`using'"', `replace'
    `gcmd' meanofvariable,  over(catvariablelevel_n, sort(catvariablelevel_n) `over1options') over(catvariablelabel, sort(sortorder) `over2options') `goptions'
    restore

end program

Stata tip: fixing the legend on bar graphs to display variable labels instead of variable names

Check out the legends on these two graphs (the first one is the problem legend, the second one is the better legend):



For the first one, I used the command:

. graph bar (mean) appliedmed hospitalized [aw=expfact], over(exempt) ...

and in the legend, it used "mean of [variable name]" instead of using the variable label. If you use the option nolabel after the graph bar command, you would just get "[variable name]" in the legend. How do you get stata to use the variable labels in the legend instead of the variable names, like in the second graph above? (note that, in the second graph, my program makes the variable labels go over two lines when they are long, and makes the line break at a space, not in the middle of a word). Use the following code:

Usage:

/ local vlist appliedmed hospitalized
. makelegendlabelsfromvarlabels `vlist', local(relabellegend) c(30)
. graph bar (mean) `vlist' [aw=expfact], over(exempt) title(`"Share reporting applied for medical assistance in the past 30 days"') ytitle("Fraction of group", margin(medium)) blabel(total, format(%9.2fc)) subtitle("Average for each group") legend(size(vsmall) `relabellegend')


Where the program makelegendlabelsfromvarlabels is defined as below. In the above, the option c(30) tells stata that the first line should have only 30 characters, and that the rest of the value label should be placed on the line below.

program define makelegendlabelsfromvarlabels

    // Written by Shafique Jamal (shafique.jamal@gmail.com). 25 Nov 2012
    //
   
    // Wrote it to fix an annoyance with graph bar. I want graph bar to use variable labels, not variable names, in the legend, but it won't do this if I am using a "(stat)" rather than "(asis)"
    syntax varlist, local(name local) [c(integer 30)]
    version 9.1
   
    // local charlength = 30
   
    tempname count
    local `count' = 0
    tempname labeloptions
    tempname variablelabel
    foreach var of local varlist {
        local `count' = ``count'' + 1
        local `variablelabel' : variable label `var'
       
        // It would be great to break this up at a word boundary if the length is > 34 characters
        if (length(`"``variablelabel''"') > `c') {
            tempname variablelabel_part1
            tempname variablelabel_part2
            tempname variablelabel_tochange
            tempname positionofspace
            tempname positionofspace_prev
            tempname exitwhileloop
            local `exitwhileloop'   = 0
            local `positionofspace' = 0
            local `variablelabel_tochange' `"``variablelabel''"'
            while (``exitwhileloop'' == 0) {
           
                local `positionofspace' = strpos(`"``variablelabel_tochange''"', " ")
                if (``positionofspace'' >= `c' | ``positionofspace''==0) {
                    local `exitwhileloop'   = 1
                }
                else {
                    local `positionofspace_prev' = ``positionofspace''
                    local `variablelabel_tochange' = subinstr(`"``variablelabel_tochange''"'," ",".",1)
                }
           
            }
           
            local `variablelabel_part1' = substr(`"``variablelabel''"', 1, ``positionofspace_prev'')
            local `variablelabel_part2' = substr(`"``variablelabel''"', ``positionofspace_prev'' + 1, . )
            local `labeloptions' `"``labeloptions'' label(``count'' `"``variablelabel_part1''"' `"``variablelabel_part2''"') "'
        }
        else {
            local `labeloptions' `"``labeloptions'' label(``count'' `"``variablelabel''"') "'
        }
    }
   
    // di `"labeloptions: ``labeloptions''"'
    // need to return this in a local macro
    c_local `local' `"``labeloptions''"'
   

end program


Monday, November 26, 2012

Stata tip: Rename the value label associated with a variable, when renaming said variable

Suppose you have a datset with the variable a9, and the value label associated with this is a9 (or gobledygook, or whatever). You may want to change this variable name to something more telling, like maritalstatus. If you use

rename a9 maritalstatus

The value label remains a9. The following ado file will allow you to change both the variable name and the name of the variable label at the same time:

renamevarandvarlabel a9 maritalstatus

now both the variable name and the variable label are maritalstatus. Note that the original variable label can be named anything. For example, if the original variable label was gobledygook, it would still be changed to maritalstatus.

program define renamevarandvaluelabel

    // Written by Shafique Jamal (shafique.jamal@gmail.com). 25 Nov 2012

    // This program renames the variable and the value label. Usage:
    //     renamevarandvaluelabel originalvarname newvarname
    // What it does:
    //    rename originalvarname newvarname
    // and it changes the name of the value label of originalvarname to newvarname.
    // Just make sure that if there is already a value label named newvarname, you're ok with loosing it.

    // syntax anything(id="variable and values" name=arguments)
    syntax anything(id="original and new label name" name=labelnames)
    version 9.1
   
    // steps:
    //    1. drop the label with the new label name, if it exists
    //  2. create the new label from the old label
    //  3. apply this new label to variable
   
    // di "labelnames = `labelnames'"
   
    foreach item of local labelnames {
        // di `"item = `item'"'
    }
   
    tempname originallabelname
    tempname originalvarname
    local `originalvarname' : word 1 of `labelnames'
    tempname newvarandlabelname
    local `newvarandlabelname' : word 2 of `labelnames'
   
    // Step 1. drop the label with the new label name, if it exists. Wait, if it exists... what do we do? Quit the program
    cap label list ``newvarandlabelname''
    if (_rc == 0) {
        di "That label (``newvarandlabelname'') already exists. You can use the command "renamevaluelabel [oldlabelname] [newlabelname] written by Shafique Jamal (shafique.jamal@gmail.com) to change that value label name." Exiting"
        exit
    }
   
    // Step 2. create the new label from the old label. First need to get the name of the label value of the original variable name. Do this only if there is a value label attached
    local `originallabelname' : value label ``originalvarname''
    if ("``originallabelname''"~="" & "``originallabelname''"~=" ") {
   
        di "There is an existing label"
        label copy ``originallabelname'' ``newvarandlabelname''
   
        // Step 3. rename the variable, then attached the new variable label
        rename ``originalvarname'' ``newvarandlabelname''
        label values ``newvarandlabelname'' ``newvarandlabelname''
    }
    else { // Just rename the variable, forget about the value label, if there is no original value label
   
        di "No existing label"
        rename ``originalvarname'' ``newvarandlabelname''
    }
   
   
end



Stata tip: Easy and short way to generate household head variables for individual-level datasets

Suppose you have an individual-level dataset (so you have a dataset with data on multiple members in the household), and you want to generate an variable that says something about the household head (e.g. household head is male, or is unemployed, etc.). This program will allow you to do so with just one command:

program define genhhhcharacteristics

    // Written by Shafique Jamal (shafique.jamal@gmail.com).
    // For an individual level dataset (includes multiple household members, not just the household head), generates a variable indicating a characteristic of the household head
    // e.g. suppose you want to generate a new variable (hhh_male) indicating the gender of the household head, and the variable identifying the household head is "reltohead", with 1 being the head,
    // and you want to do it by hhid of course. You would use the following command:
    //   
    // genhhhcharacteristics male, b(hhid) gen(hhh_male) h(reltohead) id(1)
    //
    // The above would be the equivalent of doing the following:
    //    gen hhh_male_interm = 1 male if reltohead == 1
    //  bys hhid: egen hhh_male = max(hhh_male_interm)
    //    drop hhh_male_interm
    // And then copying the value label and a modified variable label over to the new household head variable
    //

    syntax varname, Byvariables(varlist) GENerate(name) Headvariable(varname) [IDofhead(integer 1) ]
    version 9.1

    tempvar intermediaryvariable
    gen `intermediaryvariable' = `varlist' if `headvariable' == `idofhead'
    bys `byvariables': egen `generate' = max(`intermediaryvariable')
   
    // Now copy the value label over, if there is one
    tempname valuelabel
    local `valuelabel' : value label `varlist'
    if ("``valuelabel''"~="" & "``valuelabel''"~=" ") {
   
        // di "There is an existing label"
        label values `generate' ``valuelabel''
    }
   
    // Copy over also the variable label
    tempname variablelabel
    local `variablelabel' : variable label `varlist'
    label var `generate' `"``variablelabel'' (For `headvariable' == `idofhead', by `byvariables')"'
   
end program



Friday, November 23, 2012

Stata tip: plotting the output of the tab function

UPDATE2: I updated this to allow for "if" and "in"

UPDATE: I updated this to preserve the value labels. So var2 (the second variable in your variable list) must have a value label attached to it.

Suppose you want to plot the output of the two-way tab function? Here is a program that will do it (see below). It is actually a wrapper for the tabout command. Some notes about the options:

using: put here the name of the filename that you want to save the tabout data to, in tab separated format. The graphs that this command produces will save graphs using the same filename but with different extension.

gc: this stands for graph command. You can use gc("graph bar"), gc("graph hbar")... and maybe others

go: this stands for graph options. These are the options that you would use for the graph command above (e.g. note, title, b1title, subtitle, etc)

ta: this stands for tabout options. These are the options you would use with the tabout command (e.g. c(), f(), etc.)

Usage:

taboutgraph var1 var2 [aw=weight] using "filename_to_savedatato.csv", gc("graph bar") ta(cells(col) f(2 2 2 2)) replace go( note("Source: XXX") b1title("Quintile") title(`"Composition of Population"') ytitle("Percent of population in the quntile"))

Code

program define taboutgraph

    // Written by Shafique Jamal (shafique.jamal@gmail.com)
    // This program requires that the second variable in varlist have a value label attached to it
    // It plots the column output of the tabout command

    syntax varlist(min=2 max=2) [if] [in] using/ [aweight], GCmd(string) GOptions(string asis) TAboutoptions(string asis) [replace overcategorysuboptions(string asis) overxsuboptions(string asis)]
    version 9.1
    marksample touse
    // di `"`0'"'
    cap drop _v*
    // cap ssc install lstrfun
  
    // first generate the table
    tabout `varlist' [`weight'`exp'] if `touse' using `using', `replace' `taboutoptions'
    di `"tabout [`weight'`exp'] `varlist' if `touse' using `using', `replace'"'
    local number_of_rows    = r(r)
    local number_of_columns = r(c)
    return list
  
    // get the filename
    di `"regexm:"'
    di regexm(`"`using'"',`"((.*)\.(.+))$"')
    if (regexm(`"`using'"',`"((.*)\.(.+))$"')) {
        local pathtofile_original            = regexs(1)
        local pathtofile_withoutextension    = regexs(2)
        local pathtofile_extension            = regexs(3)
    }
    di `"pathtofile_original:`pathtofile_original'"'
    di `"pathtofile_withoutextension:`pathtofile_withoutextension'"'
    di `"pathtofile_extension:`pathtofile_extension'"'
    // open the file and process it.
  
    local count = 0
    tempname fhr
    tempname fhw
    tempfile tf
    file open `fhr' using `"`pathtofile_original'"', r
  
    // ---------------------------
    // file open `fhw' using `"$WHO_KG_reports/tempfile.csv"', t write all replace
    file open `fhw' using `"`tf'"', t write all replace
  
    local count = `count' + 1

    // First line is variable label.
    file read `fhr' line
    return list
    local count = 1
    while r(eof)==0 {
        local count = `count' + 1
        // di `"count = `count'"'
        file read `fhr' line
       
        if (`count'~=3) { // This line is units - we can throw this away
            file write `fhw' `"`line'"' _n
            // di `"`line'"'
        }
    }
       
    file close `fhr'
    file close `fhw'
  
    // We should save the value labels. Check to make sure that the label exists
    tempfile tfvaluelabels
    tempname nameofvaluelabel
    tempname variablenamewithlabel
    local `variablenamewithlabel' : word 2 of `varlist'
    local `nameofvaluelabel' : value label ``variablenamewithlabel''
    label save ``nameofvaluelabel'' using `"`tfvaluelabels'"', replace
  
    preserve
    qui insheet using `"`tf'"', t clear names
  
    // I want to restore the value levels and value labels
    do `"`tfvaluelabels'"'
    // ssc install labellist
    // levelsof ``nameofvaluelabel'', local(levels)
    labellist ``nameofvaluelabel''
    local levels = r(``nameofvaluelabel''_values)
    local labels = r(``nameofvaluelabel''_labels)
  
    save `"`pathtofile_withoutextension'_short.dta"', replace

    drop total
    drop if _n == _N
  
    local count = 0
    local count_levels = 0
    foreach var of varlist * {
        local count = `count' + 1
       
        if (`count'==1) {
            qui rename `var' x
        }
        else {
            local count_levels = `count_levels' + 1
            local level : word `count_levels' of `levels'
            qui rename `var' _v`level'
            // qui rename `var' _v`count'
            local v`level'_labelforfilename = `"`var'"'           // used for the filename for saving graphs of individual variables
            local v`level'_varlabel : variable label _v`level'    // used for the subtitle in the plot of individual variables.
        }
    }
  
    // COME BACK TO THIS
    // graph each y var, then all y vars
  
    foreach level of local levels {
        `gcmd' (asis) _v`level', over(x, ) `goptions' subtitle(`"`v`level'_varlabel'"')
        graph export "`pathtofile_withoutextension'_`v`level'_labelforfilename'.pdf", replace
    }
    /*
    forv x = 2/`count' {
        `gcmd' (asis) _v`x', over(x) `goptions' subtitle(`"`v`x'_varlabel'"')
        // di `"subtitle: subtitle(`"`v`x'_varlabel'"'), `v`x'_varlabel', v`x'_varlabel"'
        graph export "`pathtofile_withoutextension'_`v`x'_labelforfilename'.pdf", replace
    }
    */
  
    // graph all yvars
    qui reshape long _v, i(x) j(category)
    // cap tostring category, replace
    label values category ``nameofvaluelabel''

    /*
    forv x = 2/`count' {
    qui replace category = `"`v`x'_varlabel'"' if category == `"`x'"'
    }
    */
    `gcmd' (asis) _v, over(category, `overcategorysuboptions') over(x, `overxsuboptions') asyvars `goptions'
    graph export "`pathtofile_withoutextension'_allvars.pdf", replace
    save `"`pathtofile_withoutextension'_long.dta"', replace
    restore
end program

MS Excel VBA script to translate worksheets using the google translate API


UPDATE: I've made and Excel Add-In, that you can download here. Add it in to your worksheet and type Control+Shift+T to start the macro. I'll try to make a youtube video to demonstrate.

UPDATE #2: Here is a YouTube video to show how to download and install the add-in.

A while ago I wrote some code in Perl to translate excel sheets using google translate while preserving the formatting. That way was long, unreliable, complicated, etc. Here is a better solution.

Put the following MS Excel VBA macro code into your personal workbook, and create a shortcut to it (I use Ctrl+shift+t). It uses the google translate API. It will translate all non-empty, non-numeric cells in the active worksheet, placing the translation into a new worksheet, with the original formatting. It will place the original of numeric cells (not translated) into the new worksheet. The new worksheet will be the name of the old worksheet, with an underscore and the two letter language code appended onto it. If a worksheet with that name already exists, it will be deleted.

You will have to specify the following in a dialog box that will pop up when you run the Macro (or just in the code - I don't know how to paste the code for the userform here):
1. your google API key. The google translate API is not free, right now it is $20 per 1M characters
2. two letter language code for the source language
3. two letter language code for the destination language

(for 2 and 3, you have to use the language codes that the google translate API supports. See https://developers.google.com/translate/)

Maybe I'll modify this one day to use autodetect for the language, so that you can translate multiple languages on the same worksheet.

Feedback is always appreciated. Good luck!

Sub TranslateWorsheet()

    ' I got the URL encoding function here: http://stackoverflow.com/questions/218181/how-can-i-url-encode-a-string-in-excel-vba
    ' To run this script, you need to add "Microsoft Script Control" as reference (Tools -> References in the VB Editor)

    ' Step 1: Create a new worksheet: existing worksheetname_2lettertargetlanguagecode
    ' Step 2: In the current sheet, loop through all non-empty cells
    '       a) send the REST request to API to translate the contents of the cell if it is non-numeric, otherwise paste the original cell contents
    '       b) put the translated contents in the corresponding cell of the new worksheet
    '       c) copy also the formatting of the cell

    Dim destinationWorksheetName As String
    Dim sourceWorksheetName As String
    Dim cellContent As String
    Dim cellAddress As String
    Dim sourceWorksheet As Worksheet
    Dim destinationWorksheet As Worksheet
    
    Dim ScriptEngine As ScriptControl
    Set ScriptEngine = New ScriptControl
    ScriptEngine.Language = "JScript"
    ScriptEngine.AddCode "function encode(str) {return encodeURIComponent(str);}"
    
    ' use regualr expression to get the translation
    Dim RE As Object
    Set RE = CreateObject("VBScript.RegExp")
    RE.Pattern = "\[\s*{\s*""translatedText"": ""(.*)""\s}*"
    RE.IgnoreCase = False
    RE.Global = False
    RE.MultiLine = True
    Dim testResult As Boolean
   
    ' send the translation request
    Dim REMatches As Object
    Dim translateD As String
    Dim sourceString As String
    Dim K As String
    Dim URL As String
    Dim encodedSourceString As String
    Dim sourceLanguage As String
    Dim destinationLanguage As String
    Set sourceWorksheet = ActiveSheet
    sourceWorksheetName = ActiveSheet.Name
   
    ' sourceString = "Hello World"
    destinationLanguage = "EN"
    sourceLanguage = "RU"
    K = InputBox(prompt:="Please enter your Google Translate API key", Title:="Google Translate API Key Required: For more info, see https://developers.google.com/translate/v2/getting_started")

    'obTranslateOptions.Show
    'sourceLanguage = obTranslateOptions.obSourceLanguage.Text
    'destinationLanguage = obTranslateOptions.obDestinationLanguage.Text
    'K = obTranslateOptions.obKey.Text

    'Debug.Print "K=" & K
    'Debug.Print "sourceLanguage=" & sourceLanguage
    'Debug.Print "destinationLanguage=" & destinationLanguage
   
    ' Unload obTranslateOptions
   
    ' If a worksheet of this name in this workbook already exist, then delete it
    destinationWorksheetName = sourceWorksheetName & "_" & destinationLanguage
    Application.DisplayAlerts = False
    On Error Resume Next
    Sheets(destinationWorksheetName).Delete
    Application.DisplayAlerts = True
    On Error GoTo 0
   
    ' Prepare to send the request
    Dim objHTTP As Variant
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    Dim responseT As String
      
    ' copy active worksheet, clear contents of the copy
    ActiveWorkbook.ActiveSheet.Copy after:=ActiveWorkbook.ActiveSheet
    ActiveSheet.Name = destinationWorksheetName
    ActiveSheet.Cells.ClearContents
    Set destinationWorksheet = ActiveSheet
   
    sourceWorksheet.Activate
    ' loop through all non-empty cells or all selected cells
    Dim cell As Range
    For Each cell In ActiveSheet.UsedRange.Cells
   
        'Debug.Print cell.Address
        cellAddress = cell.Address
        sourceString = cell.Value
        'Debug.Print "sourceString:" & sourceString
   
        ' do only for non-numeric cells
        If (IsNumeric(cell.Value) = False) Then
               
            ' encode the source text
            encodedSourceString = ScriptEngine.Run("encode", sourceString)
            ' prepare and send the request
            URL = "https://www.googleapis.com/language/translate/v2?key=" & K & "&source=" & sourceLanguage & "&target=" & destinationLanguage & "&q=" & encodedSourceString
            objHTTP.Open "GET", URL, False
            objHTTP.SetRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
            objHTTP.send ("")
            responseT = objHTTP.ResponseText
            ' Debug.Print "responseT:" & responseT
       
            ' pull the translation from the response to the request
            If (RE.Test(responseT) = True) Then
                'Debug.Print "re.test is true"
                Set REMatches = RE.Execute(responseT)
                translateD = REMatches.Item(0).SubMatches.Item(0)
                'Debug.Print "translateD:" & translateD
            Else
                'Debug.Print "re.test is false"
            End If
       
            destinationWorksheet.Range(cellAddress).Value = translateD
        Else
            destinationWorksheet.Range(cellAddress).Value = cell.Value
        End If
    Next
   
End Sub

Stata Tutorial 3 is now up on youtube.com

Stata Tutorial 3: insheet, append, use, sort, merge, outsheet. Here is the link to the youtube video:

https://www.youtube.com/watch?v=8JA5nZPdqIk&feature=plcp

The do file is available at:

https://docs.google.com/document/d/1RouCrQOhxc9CoDs5XryY3RgUPP5r_j39_LanPKmGu3Q/edit

and the log file is available at:

https://docs.google.com/document/d/1nkFknJ7fOzbNiejM7SZi2LcLDsaYdrnd4__ZdYrgF8A/edit

Enjoy!

Friday, August 24, 2012

Stata tip: a wrapper for the outsheet command that can write variable lables instead of variable names

One limitation of Stata's outsheet command is that it does not give you the option of writing variable labels instead of variable names on the first line. To solve this, I wrote an ado file that is a wrapper for the outsheet command:

// This ado file is a wrapper for the outsheet stata command that allows one to put the variable labels instead of the variable names on the first line of the file.

program define outsheet_varlabels

    syntax [varlist] using/ [,Comma DELIMiter(string) NONames NOLabel NOQuote replace VARLabels] 
   
    // if no varlist, that means outsheet all variables
    if ("`varlist'"=="") {
        local varlist "*"
    }
    // Lets make sure that the delimiter is passed on to the outsheet command correctly. At the same time, I need the delimiter without quotes for the first line that I will write for the heading.
    if (`"`delimiter'"'~="") {
        local delimiterchar = `"`delimiter'"'
        local delimiter `"delimiter("`delimiter'")"'
    }
    else {
        local delimiterchar = `","'
    }
    // di `"new delimiter macro: `delimiter'"'
    // di `"delimiterchar = `delimiterchar'"'
    // Did the user say "noquote"? If not, then make sure the variable labels line below is double quoted
    if (`"`noquote'"'~="noquote") {
        local quote = `"""'
        // di `"use quotes: `quote'"'
    }
    if ("`varlabels'" == "") { // If user did not specify the variable labels option, then just call outsheet as is
        outsheet `varlist' using `"`using'"', `comma' `delimiter' `nonames' `nolabel' `noquote' `replace'
    }
    else { // Otherwise, write the variable lables instead of the variable names. Chose line1 to be variable labels
       
        tempfile tempoutsheetfile
        qui outsheet `varlist' using `"`tempoutsheetfile'"', `comma' `delimiter' `nonames' `nolabel' `noquotes' `replace'
       
        // Here, construct the first line
        local count = 0
        foreach var of varlist `varlist' {
            local varlabel : variable label `var'
            if (`"`varlabel'"'=="") {  // What if there no variable label for the label? Then use the variable name instead
                local varlabel `"`var'"'
            }
            // di "var: `var'"
            local count = `count' + 1
            if (`count'==1) { // Don't want a comma before the first item.
                local line1heading `"`quote'`varlabel'`quote'"'
                // di `"`quote'`varlabel'`quote'"'
            }
            else {
                local line1heading `"`line1heading'`delimiterchar'`quote'`varlabel'`quote'"'
                // di `"`line1heading'`delimiterchar'`quote'`varlabel'`quote'"'
            }
        }
        // di `"`line1heading'"'
        // di ""
       
        /* // This method does not work. It overwrites, rather than inserts
        tempname fht
        file open  `fht' using `"`using'"', read write t all
        file seek  `fht' tof
        file write `fht' _n `"`line1heading'"' _n
        file close `fht'
        */
       
        // Try open tempoutsheetfile as read, the final file as write with the line1heading as the first line
        // This is the final file
        tempname fh_write
        file open `fh_write' using `"`using'"', t write all replace
        file write `fh_write' `"`line1heading'"' _n
       
        // Read from this and put in the final file
        tempname fh_read
        file open `fh_read' using `"`tempoutsheetfile'"', t read        
       
        file read `fh_read' readfileline
        local count = 0
        while r(eof)==0 {
            local count = `count' + 1
            if (`count'~=1) {
                file write `fh_write' `"`readfileline'"' _n
            }
            file read `fh_read' readfileline
        }
       
        file close `fh_write'
        file close `fh_read'       
       
    }
   
   
    // di `"sytnax: `varlist' `using', `comma' `delimiter' `nonames' `nolabel' `noquotes' `replace'"'
   

end

To call this function so that it writes the variable labels instead of the variable names to the first line, call is just like you would the outsheet command, but with the varlabels option:

outsheet_varlabels using filename.csv, c replace varlabels