pre_limit_mult in synth_runner package does not work well - stata

This might be similar to this question (pre_limit_mult in synth_runner package stata does not work). However, I am asking again because I could not find any useful tips there.
I am trying to use the pre_limit_mult(real) option to limit the placebo effects in the pool for inference. The ultimate purpose is to test the validity of my estimate of the economic impact of the 1994 coup in Gambia.
However, even though I follow the practice the guidebook explains, the command does not work and produces an error message. Here is my code:
tsset country_id year
synth rgdpe pop rconna csh_i csh_x rgdpe(1971) rgdpe(1982) rgdpe(1993), trunit(21) trperiod(1994) keep("Gambia_outout") replace fig
local K = 2
synth_runner rgdpe pop rconna csh_i csh_x rgdpe(1971) rgdpe(1982) rgdpe(1993), trunit(21) trperiod(1994) keep("Gambia_outout") replace gen_vars pre_limit_mult(`K')
single_treatment_graphs, trlinediff(-1) raw_gname( rgdpe_raw) effects_gname(rgdpe_effects) effects_ylabels(-1500(750)1500) effects_ymax(2000) effects_ymin(-2000) do_color(bluishgray)
I received this message.
With -, gen_vars- the program needs to be able create the following variables: lead rgdpe_synth effect pre_rmspe post_rmspe.
Please make sure there are no such varaibles and that the dependent variable [rgdpe] has a short enough name that the generated vars are not too long (usually a max of 32 characters).
Please download the input data and do-file from the following link (I could not find a way to attach the files directly on Stack Overflow).
(https://drive.google.com/drive/folders/1VyP2GN3NfQT6jQ9enbCvE2VTVNxZdPgc?usp=sharing)
Does anyone know how to fix it?

I need help in designing my C++ Console application

I have a task to complete.
There are 4000+ CSV files of two types, and the two types are related to each other.
The two types are:
1. Country2.csv
2. Security_Name.csv
Contents of Country2.csv:
Company Name;Security Name;;;;Final NOS;Final FFR
Contents of Security_Name.csv:
Date;Close Price;Volume
There are multiple countries, and for each country there are multiple security files.
Now I need to READ them, do some CALCULATION and then WRITE the output to other files.
READ
Read both files, Country 2.csv and Security.csv, and extract all the data from them.
For example :
Read France 2.csv, extract Security_Name, Final NOS, Final FFR
Then Read Security.csv(which matches the Security_Name) and extract Date, Close Price, Volume
Calculation
Calculations are basically finding Median of the values extracted which is quite simple.
For Example:
Monthly Median Traded Values
Daily Traded Value of a Security ... and so on
Write
Based on the month, I need to sort the output into two different files with the following formats:
If Month % 3 = 0
Save It as MONTH_NAME.csv in following format:
Security name; 12-month indicator; 3-month indicator; FOT
Else
Save It as MONTH_NAME.csv in following format:
Security Name; Monthly Median Traded Value Ratio; Number of days Volume > 0
My question is how do I design my application in such a way that it is maintainable and the flow of data throughout the execution is seamless?
So first thing. Based on the kind of data you are looking to generate, I would probably be looking at moving this data to a SQL db if at all possible. This is "one SQL query" kind of stuff. And far more maintainable than C++ that generates CSV files from CSV files.
Barring that, I would probably look at using datamash and/or perl. On a Windows platform, you could do this through Cygwin or WSL. Probably less maintainable, but so much easier it's not too much of an issue.
That said, if you're looking for something moderately maintainable, C++ could work. The first thing I would do is design my input classes. Data-centric, but it can work. It sounds like you could have a Country class, a Security class, and a SecurityClose class...or something along those lines. You can think about whether a Security class should contain a collection of SecurityClose objects (data), or whether the data should just be "loose" and reference the Security it belongs to. Same with the Country->Security relationship.
Once you've decided how all that's going to look, you want something (likely a function) that can tokenize a CSV line. So "1,2,3" gets turned into a vector<string> with the contents "1" "2" "3". Then, each of your input classes should have a constructor or initializer that takes a vector<string> and populates itself. You might need to pass higher level data along too, like the filename if you want the security data to know which security it belongs to.
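A minimal sketch of the tokenizer and one input class described above, assuming C++11 or later. The names tokenize and SecurityClose and its members are illustrative only. The default ';' delimiter and the Date;Close Price;Volume column order are taken from the question. Quoted fields are not handled, and the date is kept as a plain string because its format is not given.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one CSV line on `delim`. Empty fields are kept.
// Assumption: fields are never quoted and never contain the delimiter.
std::vector<std::string> tokenize(const std::string& line, char delim = ';') {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    // std::getline drops an empty field after a trailing delimiter; restore it.
    if (!line.empty() && line.back() == delim) {
        fields.push_back("");
    }
    return fields;
}

// One row of a security file (Date;Close Price;Volume).
struct SecurityClose {
    std::string security;  // higher-level data passed along, e.g. taken from the filename
    std::string date;
    double closePrice;
    double volume;

    // Populate from a tokenized row. Throws std::out_of_range if the row has
    // fewer than three fields, and std::invalid_argument if a number is malformed.
    SecurityClose(const std::string& securityName, const std::vector<std::string>& f)
        : security(securityName),
          date(f.at(0)),
          closePrice(std::stod(f.at(1))),
          volume(std::stod(f.at(2))) {}
};
```

A row would then be loaded with something like SecurityClose row("ACME SA", tokenize(line)); where line comes from std::getline on the opened security file.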
That's basically most of the battle there. Once you've pulled your data into sensibly organized classes, the rest should come more easily. And if you run into bumps, hopefully you can ask specific design or implementation questions from there.

Stata: Esttab of xtreg with time fixed effects

I'm trying to save output from several hundred eststo's storing results of bivariate probability models into one Excel file using esttab. It works for xtlogit (both ,re and ,pa), xtprobit (both ,re and ,pa) and for the linear probability model xtreg (both standard and ,fe). However, when I use xtreg y x i.year, fe I get the error message too many base levels specified. Google doesn't help me much.
I've been trying for an hour to create a reproducible example but the Stata datasets all work fine. It does not seem to be due to the number of years or the fact that different specifications have data for different years. Still, the normal xtreg, fe works; the problem only appears with time dummies. The weirdest thing is that it works for all subsets of my variables but not for the whole list (again just the time fixed effects specifications).
Does anyone have an idea how to proceed? Using drop(*.year) works whenever the problem does not arise (so in specifications where it works, I get outputs without the year dummies) but does not prevent the too many base levels specified error; ,nobaselevels has no apparent effect either. Is there a way to remove the time fixed effects from eststo before I pass those on to esttab? Any workaround would be appreciated as well.
The problem you might be facing is that of Stata creating different base levels for the factor variable year, in different regressions.
Try fixing the factor variable base level beforehand with fvset:
fvset base <some_number> year
Check help fvset and the manual entry for details. Also, read the source given below, which contains more information.
Source: two posts from Statalist; one from Tim Wade and another by Jeff Pitblado.

Stata : generate/replace alternatives?

I have used Stata for several years now, along with other languages like R.
Stata is great, but there is one thing that annoys me: the generate/replace behaviour, and especially the "... already defined" error.
It means that if we want to run a piece of code twice and it defines a variable, the definition needs two lines:
capture drop foo
generate foo = ...
In other languages such as R, it takes just one line.
So is there another way to define variables that combines "generate" and "replace" in one command?
I am unaware of any way to do this directly. Further, as @Roberto's comment implies, there are reasons simply issuing a generate command will not overwrite (see: replace) the contents of a variable.
To be able to do this while maintaining data integrity, you would need to issue two separate commands as your question points out (explicitly dropping the existing variable before generating the new one) - I see this as a way in which Stata forces the user to be clear about his/her intentions.
It might be noted that Stata is not alone in this regard. SQL Server, for example, requires the user drop an existing table before creating a table with the same name (in the same database), does not allow multiple columns with the same name in a table, etc. and all for good reason.
However, if you are really set on being able to issue a one-liner in Stata to do what you desire, you could write a very simple program. The following should get you started:
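program mkvar
    version 13
    * Parse "name = expression" plus optional if/in qualifiers
    syntax anything=exp [if] [in]
    * Drop the variable first if it already exists
    capture confirm variable `anything'
    if !_rc {
        drop `anything'
    }
    generate `anything' `exp' `if' `in'
end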
You would then save the program to mkvar.ado in a directory where Stata will find it (e.g., C:\ado\personal\ on Windows; if you are unsure, type sysdir), and call it using:
mkvar newvar=expression [if] [in]
Now, I haven't tested the above code much so you may have to do a bit of debugging, but it has worked fine in the examples I've tried.
On a closing note, I'd advise you to exercise caution when doing this - certainly you will want to be vigilant with regard to altering your data, retain a copy of your raw data while a do file manipulates the data in memory, etc.

What is a regular expression that satisfies all valid options for a JOB card in JCL?

I'm working on a program that will need to remove a JOB card from a JCL member. I'm having a lot of trouble building something that satisfies all possible options and configurations.
Below is a good guide on the JOB statement:
http://www.tutorialspoint.com/jcl/jcl_job_statement.htm
Some issues though:
There may be multiple job cards in a member
There may be comments in the job card
There may be characters in columns 73-80
There may be a SYSAFF, SET or similar statement directly following the JOB statement that should be retained but may begin with slashes and spaces just like a job card
Any help would be appreciated. Currently I have the following regular expression:
//.*JOB.*\n(//\s{4,}[^\s]+(\s|\d)*\n)+
Ultimately I only need to change the JOB name to fit the restriction of the FTP JES reader, which, under the JESINTERFACELEVEL 1 used by our site, requires the job name to be the submitting USERID plus exactly one character. Changing only the job name would also be acceptable.
With the information from your comment on Joe's answer, your task becomes easier.
//JJJJJAAA JOB other-stuff
If the second word is JOB and the first two characters of the first word are // and the third character is not *, then you have a JOB card. Remove the first word, replacing it with //JJJJJx, where x is your additional single character. JJJJJ represents the user-id.
This does assume that the user-id of the existing JOBs will be the same as the user-id of the new JOBs, in which case the replacement JOB name is not going to cause the extension of the JOB card.
If this is not the case, that is, if the name on some or all of the original JOB cards is shorter than the new one (whether or not it is a user-id), then I'd recommend splitting the JOB card after the first comma (if present).
In the unlikely event that you have very long accounting information and nothing else, this may cause a JCL error when the above is true. If so, fix the accounting information or get around the user-id limit. This is an unlikely situation :-)
If there is no accounting information but there is a long comment, this may cause a JCL error by accidentally hitting column 72 with data (so it will think the next line is a Continuation). In the unlikely event of that happening, fix it.
Neither of these two is worth coding for. They are worth verifying for, though the simplest way to do that is to watch and pick them up if they fall over.
You do have one more thing to watch for, and this is whether any of your steps use DD * or DD DATA. If they do, then you have to discover if any use DLM=. If they do, you will have to switch off the search for the JOB card when encountering DLM=, and switch it on again when you reach the delimiter value starting in column one.
Your single character may cause you problems. You will have a limited number of jobnames possible per userid. Unless allowed, JOBs with the same name will not run at the same time.
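As a rough sketch of the rule above (starts with //, third character is not *, second word is JOB), the check and rename could look like the following in C++. The function names isJobCard and renameJobCard are made up for illustration. The sketch assumes words are separated by blanks only and adds one assumption not stated above: a name must immediately follow the //. It does not handle the DLM= case or the column-72 concerns mentioned above.

```cpp
#include <cstddef>
#include <string>

// True if `line` looks like a JOB card: it starts with "//", the third
// character is neither '*' (comment) nor a blank (no name field), and the
// second blank-delimited word is exactly "JOB".
// On success, nameEnd is the index just past the first word.
bool isJobCard(const std::string& line, std::size_t& nameEnd) {
    if (line.size() < 3 || line.compare(0, 2, "//") != 0) return false;
    if (line[2] == '*' || line[2] == ' ') return false;

    nameEnd = line.find(' ');
    if (nameEnd == std::string::npos) return false;  // only one word on the line

    const std::size_t opStart = line.find_first_not_of(' ', nameEnd);
    if (opStart == std::string::npos) return false;  // nothing after the name

    const std::size_t opEnd = line.find(' ', opStart);
    const std::size_t opLen =
        (opEnd == std::string::npos) ? std::string::npos : opEnd - opStart;
    return line.substr(opStart, opLen) == "JOB";
}

// Replace the first word of a JOB card with "//" + newName and leave every
// other line unchanged. If the name length changes, the rest of the card
// shifts left or right by the difference.
std::string renameJobCard(const std::string& line, const std::string& newName) {
    std::size_t nameEnd = 0;
    if (!isJobCard(line, nameEnd)) return line;
    return "//" + newName + line.substr(nameEnd);
}
```

Calling renameJobCard on every line of the member, with newName set to the user-id plus the extra character, leaves comments, EXEC, DD and SET statements untouched because none of them has JOB as the second word.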
You will need to account for the two positional parameters -- 142 bytes of accounting information and 30ish bytes for programmer's name. Also, you will have to account for the optional keyword parameters:
ADDRSPC= BYTES= CARDS= CLASS= COND=
GROUP= LINES= MEMLIMIT= MSGCLASS= MSGLEVEL=
NOTIFY= PAGES= PASSWORD= PERFORM= PRTY=
RD= REGION= RESTART= SECLABEL= SCHENV=
TIME= TYPRUN= USER=
Dealing with the JES commands like SYSAFF and other JCL commands like SET makes it very complicated.
You might want to approach it in steps -- regex to handle the "//" followed by up to 69 bytes and continued with a comma except in cases of comments where it starts with "//*".
It might help to know what you are trying to accomplish. You can ask JES to process the JCL for you and there are ways you can inspect the parsed JCL via macros, exits and control blocks.
In most cases it's the first card anyway. Or at least the first non-comment card.

How to post Stata program via Dropbox or private website?

Here is a sample program .do file, sampleprog.do:
program sampleprog
egen newVar = group (`1' `2')
end
How can I post it on my website (or dropbox), so that other people could install it to their Stata like this?
net from http://www.mywebsite.com/sampleprog.do
*** or maybe like this:
ssc install ...
I read the documentation about stata.toc...but I did not quite get it. What files should I upload and should it be one folder or what?
(PS: I definitely can simply email the .do file but this is not an option in my case.)
Here is a full explanation of how to share program or data files with others using your own website. I tried using Dropbox, but Stata 12 appears to have issues with https, which is the protocol for all Dropbox public links. If you want to use Dropbox, I recommend creating a shared folder that will sync on your collaborators' machines. The rest of this answer assumes you have a website serving pages over http or are using Stata 13, which supports https.
If this is a one-time thing, you can skip the rest of this answer by putting the file on your website and telling your collaborator to type:
. copy http://your-site.com/ado/program.ado program.ado
That will copy the ado file at the specified url into the user's current directory. If you want to provide information about your files, plan on sharing with multiple people and need to maintain/document a set of files, read on!
Step 1 Create a folder on your website to hold the programs. I will call mine ado/
Step 2 Add the program files, help files, and data files you want to share. For this example, I have created a simple ado file called unique.ado with the following contents:
********************************************** unique.ado
capture program drop unique
program define unique
*! Count and number observations within group defined by varlist
* Example: unique person_id, obs(prow) tobs(pcount) sortby(time)
* to count and number rows by a variable called person_id
syntax varlist, obs(name) tobs(name) [sortby(varlist)]
bys `varlist' (`sortby') : gen long `obs' = _n
bys `varlist' (`sortby') : gen long `tobs' = _N
la var `obs' "Number of this row within `varlist' group."
la var `tobs' "Total number of rows with identical `varlist' values."
end
Step 3 Create a file called stata.toc to describe the files you wish to share. Here is mine:
********************************************** stata.toc
v 3
d Program to count observations by group
p unique [The unique.ado program for counting observations by group]
These files can be complicated. There are many features I won't cover here, but you can read this documentation to learn more.
Step 4 Create a package file for each of the packages defined by the lines in stata.toc that start with the letter p. Here is my package file for the unique package defined above:
********************************************** unique.pkg
v 3
d unique
d Program to count observations by group
d Distribution-Date: 28 June 2012
f unique.ado
Your directory now looks like this:
ado/
stata.toc
unique.ado
unique.pkg
Step 5 Use the site! Here are the commands to enter.
. net from http://example.com/ado/
. net describe unique
. net install unique
Here is what you'll see after entering the first command:
-----------------------------------------------------------------------------------
http://www.example.com/ado/
Program to count observations by group
-----------------------------------------------------------------------------------
PACKAGES you could -net describe-:
unique [The unique.ado program for counting observations by group]
-----------------------------------------------------------------------------------
The second command, net describe unique, will tell you more about the package:
---------------------------------------------------------------------------------------
package unique from http://www.example.com/ado
---------------------------------------------------------------------------------------
TITLE
unique
DESCRIPTION/AUTHOR(S)
Program to count observations by group
Distribution-Date: 28 June 2012
INSTALLATION FILES (type net install unique)
unique.ado
---------------------------------------------------------------------------------------
The third command, net install unique, will install the package:
checking unique consistency and verifying not already installed...
installing into /Users/cpoliquin/Library/Application Support/Stata/ado/plus/...
installation complete.
EDIT
See Nick's comments in the answer below. I intended this example to be simple and I don't expect other people to use this program. If you plan on submitting things to Stata Journal or SSC then his comments certainly apply! I hope this answer can serve as a decent tutorial for those confused by the official documentation.
This will be too long for a comment, so it is going to be an extra answer.
Your example uses the program name unique. If you search unique, all (or in Stata 13, search unique) you will find that a user-written program with the same name has been installed on SSC since 1998. This will create a clash of names for your users if (and only if) they attempt to use your program and also that earlier program. The more general advice is to search to see if a program name is already in use to try to avoid these problems.
Specifically, although you may just be using your unique as an arbitrary example, note that it contains bugs. An int doesn't contain enough bits to hold observation numbers exactly for large datasets. Also, as a matter of style, unique can change the sort order of your data, which is widely considered to be poor data management style.
Your example concerns dissemination of a program file without an accompanying help file. Suffice it to say that the SSC site would never accept such a program and the Stata Journal would not even review a paper based on such a submission before a help file was written to accompany it. Including explanatory comments with the code may be sufficient for your personal practices, but it falls below general Stata standards.
Stata 13 now supports https. See http://www.stata.com/manuals13/u.pdf, Section 3.6.
In short, I appreciate that you are trying to explain how to do something, but it is already well documented, and explicitly and implicitly some of your recommendations are below community standards.