Optimize in Stata / Mata - stata

I want to write a loop in Stata that calls the Mata function 'optimize'. The basic syntax (in a .do file) is:
mata: x=runiform(100,2)
mata: F=J(rows(x),1,3)
mata: X=J(1,2,48)
mata: I=J(rows(x),1,1)
mata:
void mysolver(todo, p, x, X, I, F, lnf, S, H)
{
factor = F :* (I + x*p')
factor_bis= factor , factor
Cuenta = x :* factor_bis
Final=I'*Cuenta
vvv = Final - X
lnf = (vvv*vvv')[1,1]
}
end
mata:
S = optimize_init()
optimize_init_evaluator(S, &mysolver())
optimize_init_evaluatortype(S, "v0")
optimize_init_params(S, J(1,2,0.01))
optimize_init_which(S, "min" )
optimize_init_argument(S, 1, x)
optimize_init_argument(S, 2, X)
optimize_init_argument(S, 3, I)
optimize_init_argument(S, 4, F)
optimize_init_tracelevel(S,"none")
optimize_init_conv_ptol(S, 1e-16)
optimize_init_conv_vtol(S, 1e-16)
xx=optimize(S)
st_matrix("param_estim",xx)
end
How can I write a procedure 'optimizo' that can be included in a loop:
forvalues i = 1(1)500 {
.....
optimizo
}
so that the optimization is repeated 500 times? (In my application, the matrices change in each cycle.)
Thank you.

In your main code, replace the line
xx=optimize(S)
with a loop written in Mata itself:
for (i = 1; i <= 500; i++) {
// Make your matrix changes here; call your optimize_init_argument() commands again if the arguments need to change
xx = optimize(S)
xx
}
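The same pattern (define the objective once, change its data on each pass, re-run the solver, and keep every result) can be sketched outside Mata. The Python below is only an illustration under stated assumptions: `golden_section_minimize`, `make_objective` and the one-dimensional quadratic objective are hypothetical stand-ins, not Mata's optimize() and not the objective from the question.

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search.
    (Hypothetical stand-in for a real optimizer.)"""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)   # left interior point
        d = a + inv_phi * (b - a)   # right interior point
        if f(c) < f(d):
            b = d                   # minimum lies in [a, d]
        else:
            a = c                   # minimum lies in [c, b]
    return (a + b) / 2

def make_objective(target):
    """Build the objective for one cycle; `target` plays the role of the
    matrices that change in each cycle."""
    return lambda p: (p - target) ** 2

results = []
for i in range(1, 6):               # 5 cycles here instead of 500
    target = 0.5 * i                # data changes on every pass
    results.append(golden_section_minimize(make_objective(target), -10.0, 10.0))
```

Collecting the estimates in a list (rather than overwriting a single variable) keeps the result of every cycle available after the loop ends.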

Related

Looping in Mata with OLS

I need help with looping in Mata. I have to write a code for Beta coefficients for OLS in Mata using a loop. I am not sure how to call for the variables and create the code. Here is what I have so far.
foreach j of local X {
if { //for X'X
matrix XX = [mata:XX = cross(X,1 , X,1)]
XX
}
else {
mata:Xy = cross(X,1 , y,0)
Xy
}
I am getting an error message "invalid syntax".
I'm not sure what you need the loop for. Perhaps you can provide more information about that. However the following example may help you implement OLS in mata.
Load example data from bcuse:
ssc install bcuse
clear
bcuse bwght
mata
x = st_data(., ("male", "parity","lfaminc","packs"))
cons = J(rows(x), 1, 1)
X = (x, cons)
y = st_data(., ("lbwght"))
beta_hat = (invsym(X'*X))*(X'*y)
e_hat = y - X * beta_hat
s2 = (1 / (rows(X) - cols(X))) * (e_hat' * e_hat)
B = J(cols(X), cols(X), 0)
n = rows(X)
for (i=1; i<=n; i++) {
B =B+(e_hat[i,1]*X[i,.])'*(e_hat[i,1]*X[i,.])
}
V_robust = (n/(n-cols(X)))*invsym(X'*X)*B*invsym(X'*X)
se_robust = sqrt(diagonal(V_robust))
V_ols = s2 * invsym(X'*X)
se_ols = sqrt(diagonal(V_ols))
beta_hat
se_robust
end
This is far from the only way to implement OLS using Mata. See the Stata Blog for another example that uses quadcross. I like my example because it preserves a little more of the matrix algebra in the code.
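For readers who want to check the normal-equations step beta_hat = (X'X)^(-1) X'y away from Stata, here is a minimal pure-Python sketch. The helper names (`transpose`, `matmul`, `solve`) and the four-observation data set are assumptions made for illustration; the constant is placed in the last column, as in the Mata example.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Made-up data that satisfy y = 1 + 2x exactly
x = [0.0, 1.0, 2.0, 3.0]
y = [[1.0], [3.0], [5.0], [7.0]]
X = [[xi, 1.0] for xi in x]                       # regressor, then constant

Xt = transpose(X)
beta_hat = solve(matmul(Xt, X), matmul(Xt, y))    # slope first, intercept second
e_hat = [yi[0] - sum(b[0] * v for b, v in zip(beta_hat, row))
         for yi, row in zip(y, X)]                # residuals y - X * beta_hat
```

Solving the linear system directly avoids forming the inverse explicitly, which is usually the numerically safer choice.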

Rcpp Create DataFrame with Variable Number of Columns

I am interested in using Rcpp to create a data frame with a variable number of columns. By that, I mean that the number of columns will be known only at runtime. Some of the columns will be standard, but others will be repeated n times where n is the number of features I am considering in a particular run.
I am aware that I can create a data frame as follows:
IntegerVector i1(3); i1[0]=4;i1[1]=2134;i1[2]=3453;
IntegerVector i2(3); i2[0]=4123;i2[1]=343;i2[2]=99123;
DataFrame df = DataFrame::create(Named("V1")=i1,Named("V2")=i2);
but in this case it is assumed that the number of columns is 2.
To simplify the explanation of what I need, assume that I would like to pass a SEXP variable specifying the number of columns to create in the variable part. Something like:
RcppExport SEXP myFunc(SEXP n, SEXP <other stuff>)
IntegerVector i1(3); <compute i1>
IntegerVector i2(3); <compute i2>
for(int i=0;i<n;i++){compute vi}
DataFrame df = DataFrame::create(Named("Num")=i1,Named("ID")=i2,...,other columns v1 to vn);
where n is passed as an argument. The final data frame in R would look like
Num ID V1 ... Vn
1 2 5 'aasda'
...
(In reality, the column names will not be of the form "Vx", but they will be known at runtime.) In other words, I cannot use a static list of
Named()=...
since the number will change.
I have tried skipping the "Named()" part of the constructor and then naming the columns at the end, but the results are junk.
Can this be done?
If I understand your question correctly, it seems like it would be easiest to take advantage of the DataFrame constructor that takes a List as an argument (since the size of a List can be specified directly), and set the names of your columns via .attr("names") and a CharacterVector:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::DataFrame myFunc(int n, Rcpp::List lst,
Rcpp::CharacterVector Names = Rcpp::CharacterVector::create()) {
Rcpp::List tmp(n + 2);
tmp[0] = Rcpp::IntegerVector(3);
tmp[1] = Rcpp::IntegerVector(3);
Rcpp::CharacterVector lnames = Names.size() < lst.size() ?
lst.attr("names") : Names;
Rcpp::CharacterVector names(n + 2);
names[0] = "Num";
names[1] = "ID";
for (std::size_t i = 0; i < n; i++) {
// tmp[i + 2] = do_something(lst[i]);
tmp[i + 2] = lst[i];
if (std::string(lnames[i]).compare("") != 0) {
names[i + 2] = lnames[i];
} else {
names[i + 2] = "V" + std::to_string(i);
}
}
Rcpp::DataFrame result(tmp);
result.attr("names") = names;
return result;
}
There's a little extra going on there to allow the Names vector to be optional - e.g. if you just use a named list you can omit the third argument.
lst1 <- list(1L:3L, 1:3 + .25, letters[1:3])
##
> myFunc(length(lst1), lst1, c("V1", "V2", "V3"))
# Num ID V1 V2 V3
#1 0 0 1 1.25 a
#2 0 0 2 2.25 b
#3 0 0 3 3.25 c
lst2 <- list(
Column1 = 1L:3L,
Column2 = 1:3 + .25,
Column3 = letters[1:3],
Column4 = LETTERS[1:3])
##
> myFunc(length(lst2), lst2)
# Num ID Column1 Column2 Column3 Column4
#1 0 0 1 1.25 a A
#2 0 0 2 2.25 b B
#3 0 0 3 3.25 c C
Just be aware of the 20-length limit for this signature of the DataFrame constructor, as pointed out by @hrbrmstr.
It's an old question, but I think more people are struggling with this, like me. Starting from the other answers here, I arrived at a solution that isn't limited by the 20 column limit of the DataFrame constructor:
// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
#include <string>
#include <sstream>
using namespace Rcpp;
// [[Rcpp::export]]
List variableColumnList(int numColumns=30) {
List retval;
for (int i=0; i<numColumns; i++) {
std::ostringstream colName;
colName << "V" << i+1;
retval.push_back( IntegerVector::create(100*i, 100*i + 1),colName.str());
}
return retval;
}
// [[Rcpp::export]]
DataFrame variableColumnListAsDF(int numColumns=30) {
Function asDF("as.data.frame");
return asDF(variableColumnList(numColumns));
}
// [[Rcpp::export]]
DataFrame variableColumnListAsTibble(int numColumns=30) {
Function asTibble("tbl_df");
return asTibble(variableColumnList(numColumns));
}
So build a C++ List first by pushing columns onto an empty List. (I generate the values and the column names on the fly here.) Then, either return that as an R list, or use one of two helper functions to convert them into a data.frame or tbl_df. One could do the latter from R, but I find this cleaner.

Print subset of matrix

I'm trying to create a code to run a simple perceptron in SAS base.
I'd like to print in each iteration (or store in a table) the result and the target, but I get an error when I try to print y[i,]:
proc iml;
use percept; read all var{x1 X2} into X;
read all var{Y} into Y;
W={0,0}; b=0; k=0; L=nrow(X); eta=.8; o=0;
print w b k L eta;
do step = 1 to 6;
mistakes=0;
do i=1 to L;
o=(X[i, ]*W + b);
if Y[i, ]*o <= 0 then do;
W = W + eta*(Y[i, ]-o)*X[i,]`;
b = b + eta*(Y[i, ]-o)*1;
k=k+1; mistakes=mistakes+1;
print o Y[i, ] W b k mistakes;
end;
end;
end;
I get the error:
Syntax error, expecting one of the following: C, COLNAME, F, FORMAT,
L, LABEL, R,
ROWNAME, ], |). The option or parameter is not recognized and will be ignored.
Is there another way to print the target?
Thanks a lot!
Per the documentation on PRINT, you need to wrap the expression in parentheses:
print (Y[i,]);
This is because PRINT overloads [ ] after a matrix name to specify formatting, rownames/colnames, etc. (the options listed in your error message), which is rather silly (but presumably imitates some other language). So you just need to enclose Y[i,] in parentheses so that it is treated as an expression.
Here's a silly example.
proc iml;
use sashelp.class;
read all var{name,sex} into class;
read all var{height,weight,age} into classN;
y = mean(classN[,2]);
print class;
print (class[1:2,]);
print y (class[1:2,]);
quit;

Stata- Is there a way to store data like Python's dictionary or a hash map?

Is there a way to store information in Stata similar to a dictionary in Python or a hash map in other languages?
I am iterating through variable lists that are appended with _1, _2, _3, _4, _5, _6, _7 ... _18 to delineate sections, and I want to sum the number of times the letters "DK" appear in each variable in each section. Right now I have 18 for loops, with each loop iterating through a different section, saving the 'sum' of the total number of DK's in a new variable called DK_1sum, DK_2sum, and then I later produce graphs of that data.
I'm wondering if there is a way to turn all this into a large For loop, and just append the data to a dictionary/array such that the data looks like:
{s1Sum, 25
s2Sum, 56 ...
s18Sum, 101}
Is this possible?
This could be stored in a Stata matrix, a Mata matrix or just ordinary Stata variables.
gen count = .
gen which = _n
qui forval j = 1/18 {
scalar found = 0
foreach v of var *_`j' {
count if strpos(`v', "DK")
scalar found = scalar(found) + r(N)
}
replace count = scalar(found) in `j'
}
list which count in 1/18
For variation, here is a Stata matrix approach.
matrix count = J(18,1,.)
qui forval j = 1/18 {
scalar found = 0
foreach v of var *_`j' {
count if strpos(`v', "DK")
scalar found = scalar(found) + r(N)
}
matrix count[`j', 1] = scalar(found)
}
matrix list count
If you are concerned about efficiency you could consider the associative array capabilities of Mata.
* associate Y with X
local yvalue "Y"
mata : H = asarray_create()
mata : asarray(H, "X", st_local("yvalue"))
* available in Mata
mata : asarray(H, "X")
* available in Stata
mata : st_local("xvalue", asarray(H, "X"))
di "`xvalue'"
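Since the question uses Python's dictionary as its reference point, this is roughly what the target structure looks like there. It is a sketch on made-up data: the variable names (`q1_1`, `q2_1`, ...) and their values are assumptions, and, like `count if strpos(...)` above, it counts observations that contain "DK" rather than every occurrence of the string.

```python
from collections import defaultdict

# Hypothetical data: variable name -> values, with the section as the suffix
data = {
    "q1_1": ["DK", "yes", "no"],
    "q2_1": ["no", "DK", "DK"],
    "q1_2": ["yes", "yes", "DK"],
    "q2_2": ["no", "no", "no"],
}

dk_counts = defaultdict(int)
for name, values in data.items():
    section = int(name.rsplit("_", 1)[1])             # "q1_2" -> 2
    dk_counts[section] += sum("DK" in v for v in values)

dk_counts = dict(dk_counts)                           # section -> number of DKs
```

One loop over all variables replaces the 18 separate loops, and the section number extracted from the suffix serves as the key.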

How can i find the sum of a list of functions in R?

I have a list of functions. I want to create a for loop that returns (obviously as a function) the sum of them.
To create the list of functions inside a for loop, I am using this code:
##CODE
f=dnorm
h=function(x){log(f(x))}
S=c(-3,-2,-1,0,1,2,3)
K=matrix(rep(1:length(S),2),ncol=2)
for(i in 1:length(S)){
K[i,]=c(S[i],h(S[i]))
}
funcs=list()
## LOOP TO DEFINE THE LINES
for(i in 1:6){
## Make function name
funcName <- paste( 'hl', i,i+1, sep = '' )
## Make function
func1 = paste('function(x){ (K[',i,'+1,2]-K[',i,',2])/(K[',i,'+1,1]-K[',i,',1])*x+
K[',i,'+1,2]-((K[',i,'+1,2]-K[',i,',2])/(K[',i,'+1,1]-K[',i,',1]))*K[',i,'+1,1]}',sep
= '')
funcs[[funcName]] = eval(parse(text=func1))
}
which creates a list of 6 functions. How can I get their sum? I tried using the apply commands but either my syntax is not correct or they do not work.
P.S. I am actually trying to write my own code for the ars command.
As Nick pointed out, "the sum of functions" doesn't make sense. I'm wildly guessing that you want to evaluate each function at some points (at S?) and then take the sum of those values. This should do the trick.
rowSums(sapply(funcs, function(f) f(S)))
Much of your code can be written more cleanly, and in a vectorised way.
f <- dnorm
h <- function(x) log(f(x))
S <- -3:3
K <- cbind(S, h(S)) #No need to define this twice; no need to use a loop
i <- seq_len(6)
funcNames <- paste('hl', i, i+1, sep = '') #paste is vectorised
#You can avoid using `paste`/`eval`/`parse` with this function to create the functions
#Can possibly be done even more cleanly by using local
makeFunc <- function(i)
{
eval(substitute(
function(x)
{
(K[i + 1, 2] - K[i, 2]) / (K[i + 1, 1] - K[i, 1]) * x +
K[i + 1, 2] -
((K[i + 1, 2] - K[i, 2]) / (K[i + 1, 1] - K[i, 1])) * K[i + 1, 1]
},
list(i = i)
))
}
funcs <- lapply(i, makeFunc)
names(funcs) <- funcNames
rowSums(sapply(funcs, function(f) f(S)))
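The same idea (build one line function per pair of adjacent points, then sum their values at given points) can be expressed with ordinary closures, with no string pasting or parsing. The Python sketch below is an illustration only: `make_line` and `sum_of_funcs` are hypothetical names, and `h` is the standard normal log-density written out directly.

```python
import math

def h(x):
    """Log of the standard normal density."""
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

S = [-3, -2, -1, 0, 1, 2, 3]

def make_line(x0, y0, x1, y1):
    """Return the line through (x0, y0) and (x1, y1) as a function of x.
    A factory function fixes the points at creation time, so each closure
    keeps its own pair instead of sharing the loop variable."""
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y1 + slope * (x - x1)

funcs = [make_line(S[i], h(S[i]), S[i + 1], h(S[i + 1]))
         for i in range(len(S) - 1)]

def sum_of_funcs(x):
    """Sum of all line functions evaluated at x; itself a function of x."""
    return sum(f(x) for f in funcs)

totals = [sum_of_funcs(x) for x in S]   # analogue of rowSums(sapply(...))
```

Here `sum_of_funcs` is the closest thing to "the sum of the functions": it can be called at any point, not only at the values in S.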