I was hoping someone could give me advice on how to easily change the values of various variables within a Fortran input file from a C++ application.
I have a model written in Fortran, and I am writing a C++ application that executes the model in a loop, changing the values of the model parameters after each execution.
Any advice would be appreciated.
Thanks!
.Pnd file Subbasin: 1 7/26/2012 12:00:00 AM ArcSWAT 2009.93.7
Pond inputs:
0.000 | PND_FR : Fraction of subbasin area that drains into ponds. The value for PND_FR should be between 0.0 and 1.0. If PND_FR = 1.0, the pond is at the outlet of the subbasin on the main channel
0.000 | PND_PSA: Surface area of ponds when filled to principal spillway [ha]
0.000 | PND_PVOL: Volume of water stored in ponds when filled to the principal spillway [104 m3]
0.000 | PND_ESA: Surface area of ponds when filled to emergency spillway [ha]
0.000 | PND_EVOL: Volume of water stored in ponds when filled to the emergency spillway [104 m3]
0.000 | PND_VOL: Initial volume of water in ponds [104 m3]
0.000 | PND_SED: Initial sediment concentration in pond water [mg/l]
0.000 | PND_NSED: Normal sediment concentration in pond water [mg/l]
0.000 | PND_K: Hydraulic conductivity through bottom of ponds [mm/hr].
0 | IFLOD1: Beginning month of non-flood season
0 | IFLOD2: Ending month of non-flood season
0.000 | NDTARG: Number of days needed to reach target storage from current pond storage
10.000 | PSETLP1: Phosphorus settling rate in pond for months IPND1 through IPND2 [m/year]
10.000 | PSETLP2: Phosphorus settling rate in pond for months other than IPND1-IPND2 [m/year]
5.500 | NSETLP1: Initial dissolved oxygen concentration in the reach [mg O2/l]
5.500 | NSETLP2: Initial dissolved oxygen concentration in the reach [mg O2/l]
1.000 | CHLAP: Chlorophyll a production coefficient for ponds [ ]
1.000 | SECCIP: Water clarity coefficient for ponds [m]
0.000 | PND_NO3: Initial concentration of NO3-N in pond [mg N/l]
0.000 | PND_SOLP: Initial concentration of soluble P in pond [mg P/L]
0.000 | PND_ORGN: Initial concentration of organic N in pond [mg N/l]
0.000 | PND_ORGP: Initial concentration of organic P in pond [mg P/l]
5.000 | PND_D50: Median particle diameter of sediment [um]
1 | IPND1: Beginning month of mid-year nutrient settling "season"
1 | IPND2: Ending month of mid-year nutrient settling "season"
Wetland inputs:
0.000 | WET_FR : Fraction of subbasin area that drains into wetlands
0.000 | WET_NSA: Surface area of wetlands at normal water level [ha]
0.000 | WET_NVOL: Volume of water stored in wetlands when filled to normal water level [104 m3]
0.000 | WET_MXSA: Surface area of wetlands at maximum water level [ha]
0.000 | WET_MXVOL: Volume of water stored in wetlands when filled to maximum water level [104 m3]
0.000 | WET_VOL: Initial volume of water in wetlands [104 m3]
0.000 | WET_SED: Initial sediment concentration in wetland water [mg/l]
0.000 | WET_NSED: Normal sediment concentration in wetland water [mg/l]
0.000 | WET_K: Hydraulic conductivity of bottom of wetlands [mm/hr]
0.000 | PSETLW1: Phosphorus settling rate in wetland for months IPND1 through IPND2 [m/year]
0.000 | PSETLW2: Phosphorus settling rate in wetlands for months other than IPND1-IPND2 [m/year]
0.000 | NSETLW1: Nitrogen settling rate in wetlands for months IPND1 through IPND2 [m/year]
0.000 | NSETLW2: Nitrogen settling rate in wetlands for months other than IPND1-IPND2 [m/year]
0.000 | CHLAW: Chlorophyll a production coefficient for wetlands [ ]
0.000 | SECCIW: Water clarity coefficient for wetlands [m]
0.000 | WET_NO3: Initial concentration of NO3-N in wetland [mg N/l]
0.000 | WET_SOLP: Initial concentration of soluble P in wetland [mg P/l]
0.000 | WET_ORGN: Initial concentration of organic N in wetland [mg N/l]
0.000 | WET_ORGP: Initial concentration of organic P in wetland [mg P/l]
0.000 | PNDEVCOEFF: Actual pond evaporation is equal to the potential evaporation times the pond evaporation coefficient
0.000 | WETEVCOEFF: Actual wetland evaporation is equal to the potential evaporation times the wetland evaporation coefficient.
Take a look at Fortran's namelist I/O, which may suit your purposes. The IBM XL Fortran documentation for this feature is here. Make sure you consult the documentation for your own compiler; each one seems to have its own quirky variations on the standard (or perhaps the standard isn't quite tight enough).
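For a flavor of what that would look like, here is a minimal sketch (the file name and the choice of two parameters are illustrative, and note this implies changing the model's input format, since namelist files don't look like the .Pnd file above):

! read two pond parameters via namelist I/O (Fortran 90+)
program read_params
    implicit none
    real :: pnd_fr, pnd_psa
    namelist /pond/ pnd_fr, pnd_psa
    open(unit=10, file='params.nml', status='old')
    read(10, nml=pond)
    close(10)
    print *, 'PND_FR =', pnd_fr, ' PND_PSA =', pnd_psa
end program read_params

The matching params.nml, which your C++ driver could rewrite between runs, would be:

&pond
 pnd_fr = 0.5
 pnd_psa = 1.2
/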
It looks like there is a number inside a 20-byte ASCII field, a '|' symbol, and a variable name terminated by a ':'. The lines are of variable length. So something like this:
// ./bin/bin/g++ -o hackpond hackpond.cpp
#include <fstream>
#include <string>
#include <cstddef>
#include <sstream>
#include <iomanip>

int
main()
{
    std::string variable_you_want = "PND_SOLP";
    std::string line;
    std::fstream pond("test.pnd");       // open for both reading and writing
    std::getline(pond, line);            // skip the header line
    while (pond)
    {
        std::size_t loc = pond.tellg();  // remember where this line starts
        if (! std::getline(pond, line))
            break;
        if (line == "Pond inputs:" || line == "Wetland inputs:")
            continue;
        std::string value = line.substr(0, 20);   // the fixed-width number field
        std::size_t colon = line.find(':', 20);
        if (colon == std::string::npos)
            continue;
        std::string variable = line.substr(22, colon - 22);
        if (variable == variable_you_want)
        {
            double new_value = 666.66;
            pond.seekp(loc);             // jump back to the start of the line
            std::ostringstream thing;
            thing << std::setw(20) << new_value;
            pond.write(thing.str().c_str(), thing.str().length());
            // a seek is required between a write and the next read;
            // this assumes '\n' line endings
            pond.seekg(loc + line.length() + 1);
        }
    }
}
The basic idea is that you:
1. note the starting location of the line you want to change (tellg);
2. create a 20-character string containing the new value;
3. go to the saved start position of that line (seekp) and write the new chunk.
I'm not an expert on Fortran, but I would look into using command-line arguments for each parameter that needs to change. This is only possible, however, if you are using Fortran 2003. I think Fortran 95 is still one of the most common Fortran standards in use, but I don't know that for sure. If your implementation of Fortran does support command-line arguments, see if this site can help you figure out how to use them: http://fortranwiki.org/fortran/show/Command-line+arguments
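For example, a minimal sketch using the Fortran 2003 intrinsic get_command_argument (the parameter name is illustrative):

! read one parameter passed on the command line
program cmdline_demo
    implicit none
    character(len=32) :: arg
    real :: pnd_fr
    call get_command_argument(1, arg)   ! fetch the first argument as text
    read(arg, *) pnd_fr                 ! convert it to a real
    print *, 'PND_FR =', pnd_fr
end program cmdline_demo

Your C++ loop would then launch the model as, say, model 0.25, passing a different value on each iteration.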
Otherwise, you could have the Fortran program read those values in from a text file. If you want to get really nasty, you could have your C++ program actually alter the Fortran source file and recompile it each time it needs to run. If you do that, back the source up first. Good luck.
As far as I understand your problem, you merely need some text-editing in C++.
I guess the easiest thing would be to have all the parameters in the C++ code and let that code write the complete text file for each run.
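A minimal sketch of that approach (the field width and line layout are my guesses from the sample file in the question):

// regenerate the whole input file with the current parameter values
#include <fstream>
#include <iomanip>

int main()
{
    double pnd_fr = 0.25;   // value supplied by the outer loop
    std::ofstream pond("test.pnd");
    pond << ".Pnd file Subbasin: 1\n"
         << "Pond inputs:\n"
         << std::fixed << std::setprecision(3) << std::setw(20) << pnd_fr
         << "| PND_FR : Fraction of subbasin area that drains into ponds\n";
    // ... and so on for the remaining lines
}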
If you don't want to do that, I'd recommend using a template toolbox, maybe something like cpptemplate or teng, though there are probably many more templating engines available for C++ that might suit your use case.
I know you asked for C++, but I have a very similar situation with my code and I use shell scripts.
I use sed s/var_name/value <template >input_file to change occurrences of the var_name in the template. My file format is set up to make this easy, but sed is a super flexible tool and I am sure it could do what you are asking. There is a tutorial here.
With that set up I write a script to loop through these sed commands and the application calls. This works like a charm. Besides, learning a little scripting will help you handle all kind of other tasks like sorting through the data generated from all those different runs.
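For instance, something along these lines (the template token and executable name are made up):

#!/bin/sh
# one model run per parameter value
for v in 0.1 0.2 0.3; do
    sed "s/@PND_FR@/$v/" template.pnd > test.pnd
    ./swat_model
done

where template.pnd is a copy of the input file with @PND_FR@ in place of the number you want to vary.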
I have run a regression of the type
reg foo i.year
and would like to plot the yearly effects. The regression result table looks like this:
foo | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
year |
2001 | .1253994 .0047826 26.22 0.000 .1160255 .1347734
2002 | .06168 .0045566 13.54 0.000 .052749 .0706109
2003 | .1324228 .005008 26.44 0.000 .122607 .1422385
2004 | .1177605 .0051766 22.75 0.000 .1076143 .1279066
2005 | .1007163 .005018 20.07 0.000 .090881 .1105516
2006 | .0792936 .0047979 16.53 0.000 .0698897 .0886974
Unfortunately, when I use coefplot, vert, the x-axis reads Survey year=2001, Survey year=2002, and so on, which consumes a lot of space. I understand that coeflabels allows me to relabel coefficients, but do I have to do that for every single one of them? What if I had 30 years? Is there a more generic way of relabeling them?
Sounds like a weird solution but it did work for me.
Simply add any value label to your survey year variable and coefplot should recognize the years as their values.
In case adding an arbitrary value label does not work, you can create a loop that defines a value label for each year using the year itself, as shown below.
levelsof year, local(years)
foreach lvl of local years {
    lab def year `lvl' "`lvl'", modify
}
lab val year year
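With the labels attached, re-running the regression and plotting (using the variables from your question) should show the bare years on the axis:

reg foo i.year
coefplot, vertical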
I'm running a regression in Stata for which I would like to use cluster2 (http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/se_programming.htm).
I encounter the following problem: Stata reports "factor variables and time-series operators not allowed". I am using a large vector of controls, extensively applying the methods Stata offers for interactions.
For example: state##c.wind_speed##L.c.relative_humidity. cluster2 and other Stata packages do not allow such expressions to be included as independent variables. Is there a productive way to create such a long vector of interaction variables myself?
I believe that one can trick ivreg2 by Baum, Schaffer, and Stillman into running OLS with two-way clustering and interactions:
. webuse nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. ivreg2 ln_w grade c.age##c.ttl_exp tenure, cluster(idcode year)
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on idcode and year
Number of clusters (idcode) = 4697 Number of obs = 28099
Number of clusters (year) = 15 F( 5, 14) = 674.29
Prob > F = 0.0000
Total (centered) SS = 6414.823933 Centered R2 = 0.3206
Total (uncentered) SS = 85448.21266 Uncentered R2 = 0.9490
Residual SS = 4357.997339 Root MSE = .3938
---------------------------------------------------------------------------------
| Robust
ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
grade | .0734785 .002644 27.79 0.000 .0682964 .0786606
age | -.0005405 .002259 -0.24 0.811 -.0049681 .0038871
ttl_exp | .0656393 .0068499 9.58 0.000 .0522138 .0790648
|
c.age#c.ttl_exp | -.0010539 .0002217 -4.75 0.000 -.0014885 -.0006194
|
tenure | .0197137 .0029555 6.67 0.000 .013921 .0255064
_cons | .5165052 .0529343 9.76 0.000 .4127559 .6202544
---------------------------------------------------------------------------------
Included instruments: grade age ttl_exp c.age#c.ttl_exp tenure
------------------------------------------------------------------------------
Just to be sure, compare that to the OLS coefficients:
. reg ln_w grade c.age##c.ttl_exp tenure
Source | SS df MS Number of obs = 28,099
-------------+---------------------------------- F(5, 28093) = 2651.79
Model | 2056.82659 5 411.365319 Prob > F = 0.0000
Residual | 4357.99734 28,093 .155127517 R-squared = 0.3206
-------------+---------------------------------- Adj R-squared = 0.3205
Total | 6414.82393 28,098 .228301798 Root MSE = .39386
---------------------------------------------------------------------------------
ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
grade | .0734785 .0010414 70.55 0.000 .0714373 .0755198
age | -.0005405 .000663 -0.82 0.415 -.0018401 .0007591
ttl_exp | .0656393 .0030809 21.31 0.000 .0596007 .0716779
|
c.age#c.ttl_exp | -.0010539 .0000856 -12.32 0.000 -.0012216 -.0008862
|
tenure | .0197137 .0008568 23.01 0.000 .0180344 .021393
_cons | .5165052 .0206744 24.98 0.000 .4759823 .557028
---------------------------------------------------------------------------------
You don't include a verifiable example. See https://stackoverflow.com/help/mcve for key advice.
At first sight, however, the problem is that cluster2 is an oldish program written in 2006/2007 whose syntax statement just doesn't allow factor variables.
You could try hacking a clone of the program to fix that; I have no idea whether that would be sufficient.
No specific comment is possible on the "other Stata packages" you imply have the same problem, except that it may well arise for the same reason. Factor variables were introduced in Stata 11 (see here for documentation) in 2009, and older programs won't allow them without modification.
In general, I would ask questions like this on Statalist. It's quite likely that this program has been superseded by some different program.
If you find a Stata program on the internet without a help file, as appears to be the case here, it is usually an indicator that the program was written ad hoc and is not being maintained. In this case, it is also evident that the program has not been updated in the 6 years since Stata 11.
You could also, as you imply, just create the interaction variables yourself. I don't think anyone has written a really general tool to automate that: there would be no point (since 2009) in a complicated alternative to factor variable notation.
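If you do go that route, a minimal sketch follows; note that the cluster2 option names are my assumption from the Petersen page, so check its syntax statement before relying on them:

* build the interaction by hand so a pre-Stata-11 program can use it
webuse nlswork, clear
generate age_x_exp = age * ttl_exp
* cluster2 ln_wage grade age ttl_exp age_x_exp, fcluster(idcode) tcluster(year)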
I am running a probit regression with an interaction between one continuous and one dummy variable. The coefficient is displayed in the regression output, but when I look at the marginal effects the interaction is missing.
How can I get the marginal effect of the interaction variable?
probit move_right c.real_income_change_percent##i.gender
Iteration 0: log likelihood = -345.57292
Iteration 1: log likelihood = -339.10962
Iteration 2: log likelihood = -339.10565
Iteration 3: log likelihood = -339.10565
Probit regression Number of obs = 958
LR chi2(3) = 12.93
Prob > chi2 = 0.0048
Log likelihood = -339.10565 Pseudo R2 = 0.0187
-----------------------------------------------------------------------------------------------------
move_right | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------------------------+----------------------------------------------------------------
real_income_change_percent | .0034604 .0010125 3.42 0.001 .001476 .0054448
|
gender |
Female | .0695646 .1139538 0.61 0.542 -.1537807 .2929099
|
gender#c.real_income_change_percent |
Female | -.0039908 .0015254 -2.62 0.009 -.0069805 -.0010011
|
_cons | -1.263463 .0798439 -15.82 0.000 -1.419954 -1.106972
-----------------------------------------------------------------------------------------------------
margins, dydx(*) post
Average marginal effects Number of obs = 958
Model VCE : OIM
Expression : Pr(move_right), predict()
dy/dx w.r.t. : real_income_change_percent 1.gender
--------------------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
---------------------------+----------------------------------------------------------------
real_income_change_percent | .0002846 .0001454 1.96 0.050 -4.15e-07 .0005697
|
gender |
Female | -.0102626 .0207666 -0.49 0.621 -.0509643 .0304392
--------------------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Your question seems strange to me. You asked about a dummy-dummy interaction, but your example involves a continuous-dummy interaction.
Here's how to do either one:
webuse union, clear
/* dummy-dummy interaction */
probit union i.south##i.black grade, nolog
margins r.south#r.black
/* continuous-dummy interaction */
probit union i.south##c.grade
margins r.south, dydx(grade)
You should try to reproduce these by "hand" (using differences of predicts) to understand what the margins command is doing behind the scenes.
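For instance, a rough sketch of that check for the continuous-dummy case (not part of the original answer):

* reproduce -margins r.south- by averaging changes in Pr(union)
webuse union, clear
probit union i.south##c.grade
preserve
replace south = 0
predict p0 if e(sample), pr    // Pr(union) with south forced to 0
replace south = 1
predict p1 if e(sample), pr    // Pr(union) with south forced to 1
generate diff = p1 - p0
summarize diff                 // the mean should match margins r.south
restore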
This sounds like a question specific to a piece of software (Stata), hence the close votes, but there is a statistical question lurking here: What would a marginal effect of an interaction effect look like?
Such marginal effects are not trivial and tend to depend strongly on the values of the other covariates; see this article. Often this marginal effect is so variable that it makes no sense to try to summarize it with one number. In my opinion, this is a major weakness, to the extent that in general I tend to prefer using logistic regression and interpreting the interaction term as a ratio of odds ratios; see this article.
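To illustrate that reading with the dataset used in the answer above (a sketch, not from the original post):

* the exponentiated interaction term is a ratio of odds ratios:
* how the odds ratio for south changes when black flips from 0 to 1
webuse union, clear
logit union i.south##i.black grade, or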
I was wondering if there's a way to include panel-specific or otherwise varying trends in a first-difference regression when clustering on the panel id and the time variable.
Here's an example with Stata:
. webuse nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. ivreg2 S1.(ln_wage tenure) , cluster(idcode year)
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on idcode and year
Number of clusters (idcode) = 3660 Number of obs = 10528
Number of clusters (year) = 8 F( 1, 7) = 2.81
Prob > F = 0.1378
Total (centered) SS = 1004.098948 Centered R2 = 0.0007
Total (uncentered) SS = 1035.845686 Uncentered R2 = 0.0314
Residual SS = 1003.36326 Root MSE = .3087
------------------------------------------------------------------------------
| Robust
S.ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tenure |
S1. | .0076418 .0042666 1.79 0.073 -.0007206 .0160043
|
_cons | .0501738 .0070986 7.07 0.000 .0362608 .0640868
------------------------------------------------------------------------------
Included instruments: S.tenure
------------------------------------------------------------------------------
. ivreg2 S1.(ln_wage tenure i.c_city), cluster(idcode year)
factor variables not allowed
r(101);
In the specification above, the constant corresponds to a common time trend. Putting the factor variable outside the seasonal difference operator errors as well.
I understand that the differencing operator does not play well with factor variables or interactions, but I feel there must be some hack to get around that.
The use of ivreg2 is a bit of a red herring: I am not doing IV estimation, I just want two-way clustering.
You get the same solution as @Metrics if you do
xi: ivreg2 S1.(ln_wage tenure) i.ind_code, cluster(idcode year)
My question relates to calculating the standard deviation (SD) of transition probabilities derived from coefficients estimated through Weibull regression in Stata.
The transition probabilities are being used to model disease progression of leukemia patients over 40 cycles of 90 days (about 10 years). I need the SDs of the probabilities (which change over the run of the Markov model) to create beta distributions whose parameters can be approximated using the corresponding Markov cycle probability and its SD. These distributions are then used for probabilistic sensitivity analysis, i.e., they are substituted for the simple probabilities (one for each cycle) and random draws from them can evaluate the robustness of the model's cost-effectiveness results.
Anyway, using time-to-event survival data, I've used regression analysis to estimate coefficients that can be plugged into an equation to generate transition probabilities. For example...
. streg, nohr dist(weibull)
failure _d: event
analysis time _t: time
Fitting constant-only model:
Iteration 0: log likelihood = -171.82384
Iteration 1: log likelihood = -158.78902
Iteration 2: log likelihood = -158.64499
Iteration 3: log likelihood = -158.64497
Iteration 4: log likelihood = -158.64497
Fitting full model:
Iteration 0: log likelihood = -158.64497
Weibull regression -- log relative-hazard form
No. of subjects = 93 Number of obs = 93
No. of failures = 62
Time at risk = 60250
LR chi2(0) = -0.00
Log likelihood = -158.64497 Prob > chi2 = .
------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -4.307123   .4483219    -9.61   0.000    -5.185818   -3.428429
-------------+----------------------------------------------------------------
       /ln_p |  -.4638212   .1020754    -4.54   0.000    -.6638854    -.263757
-------------+----------------------------------------------------------------
           p |    .628876   .0641928                      .5148471    .7681602
         1/p |   1.590139   .1623141                      1.301812    1.942324
------------------------------------------------------------------------------
We then create the probabilities with an equation () that uses p and _cons as well as t for time (i.e., Markov cycle number) and u for cycle length (usually a year, mine is 90 days since I’m working with leukemia patients who are very likely to have an event, i.e., relapse or die).
So where lambda = p, gamma = (exp(_cons))
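In symbols (matching the code just below), the per-cycle transition probability is
tp(t) = 1 - exp(lambda*(t - u)^gamma - lambda*t^gamma)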
gen result = exp(lambda*((t-u)^gamma) - lambda*(t^gamma))
gen transitions = 1 - result
Turning to the variability, I first calculate the standard errors for the coefficients:
. nlcom (exp(_b[_cons])) (exp(_b[/ln_p]))
_nl_1: exp(_b[_cons])
_nl_2: exp(_b[/ln_p])
------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   .0116539   .0044932     2.59   0.009     .0028474    .0204604
       _nl_2 |   .6153864    .054186    11.36   0.000     .5091838     .721589
------------------------------------------------------------------------------
But what I’m really after is the standard errors on the transitions values, e.g.,
nlcom (_b[transitions])
But this doesn’t work and the book I'm using doesn't give hints on getting at this extra info. Any feedback on how to get closer would be much appreciated.
Second Answer: 2014-03-23
Update 2014-03-26: I fixed the negative probabilities; I'd made an error in transcribing Emily's code. I also show results from nlcom, as suggested on Statalist by Austin Nichols (http://www.stata.com/statalist/archive/2014-03/msg00002.html). I made one correction to Austin's code.
Bootstrapping is still the key to the solution. The target quantities are probabilities calculated by a formula that is based on a nonlinear combination of estimated parameters from streg. As the estimates are not contained in the matrix e(b) returned after streg, nlcom will not estimate the standard errors. This is an ideal situation for bootstrapping. The standard approach is adopted: create a program myprog to estimate the parameters; then bootstrap that program.
In the example, transition probabilities pt for a range of t values are estimated. The user must set the minimum and maximum of the t range as well as a scalar u. Of interest, perhaps, is that, since the number of estimated parameters is variable, a forvalues loop is required inside myprog. Also, bootstrap requires an argument consisting of the list of estimates returned by myprog; this list, too, is constructed in a forvalues loop.
/* set u and minimum and maximum times here */
scalar u = 1
local tmin = 1
local tmax = 3
set linesize 80
capture program drop _all
program define myprog, rclass
    syntax anything
    streg, nohr dist(weibull)
    scalar lambda = exp(_b[ln_p:_cons])
    scalar gamma = exp(_b[_t:_cons])
    forvalues t = `1'/`2' {
        scalar p`t' = 1 - ///
            (exp((lambda*((`t'-u)^(gamma))) - (lambda*(`t'^(gamma)))))
        return scalar p`t' = p`t'
    }
end
webuse cancer, clear
stset studytime, fail(died)
set seed 450811
/* set up list of returned probabilities for bootstrap */
forvalues t = `tmin'/`tmax' {
    local p`t' = "p" + string(`t')
    local rp`t' = "`p`t''" + "=" + "(" + "r(" + "`p`t''" + "))"
    local rlist = `"`rlist' `rp`t''"'
}
bootstrap `rlist', nodots: myprog `tmin' `tmax'
forvalues t = `tmin'/`tmax' {
    qui streg, nohr dist(weibull)
    nlcom 1 - ///
        (exp((exp(_b[ln_p:_cons])*((`t'-u)^(exp(_b[_t:_cons])))) - ///
        (exp(_b[ln_p:_cons])*(`t'^(exp(_b[_t:_cons]))))))
}
Results:
Bootstrap results Number of obs = 48
Replications = 50
command: myprog 1 3
p1: r(p1)
p2: r(p2)
p3: r(p3)
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p1 | .7009447 .0503893 13.91 0.000 .6021834 .7997059
p2 | .0187127 .007727 2.42 0.015 .0035681 .0338573
p3 | .0111243 .0047095 2.36 0.018 .0018939 .0203548
------------------------------------------------------------------------------
/* results of nlcom */
------------------------------------------------------------------------------
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_nl_1 | .7009447 .0543671 12.89 0.000 .594387 .8075023
-------------+----------------------------------------------------------------
_nl_1 | .0187127 .0082077 2.28 0.023 .0026259 .0347995
-------------+----------------------------------------------------------------
_nl_1 | .0111243 .0049765 2.24 0.025 .0013706 .0208781
------------------------------------------------------------------------------
This answer is incorrect, but I keep it because it generated some useful comments.
sysuse auto, clear
gen u = 90 + rnormal()
set seed 1234
capture program drop _all
program define myprog, rclass
    tempvar result
    reg turn disp /* Here substitute your -streg- statement */
    gen `result' = _b[disp]*u
    sum `result'
    return scalar sd = r(sd)
end
bootstrap sdr = r(sd): myprog
estat bootstrap, bc percentile
Of note: in the bootstrapped program, the new variable (your result) must be defined as temporary; otherwise the gen statement will lead to an error because the variable is created anew for each bootstrap replicate.