I would like to run several linear regressions using categorical exposure variables and output the results to an Excel sheet.
The code below works fine when the exposure is continuous. However, for categorical exposures it outputs only the first row of results rather than a row for every level of the exposure.
*Code which works for continuous exposures
sysuse auto.dta
describe
summ
postfile temp str40 exp str40 outcome adjusted N beta se lci uci pval using ///
"\test.dta", replace
foreach out in price weight {
    foreach exp in i.rep78 {
        foreach adjust in 1 {
            if `adjust'==1 local adjusted "mpg"
            xi: reg `out' `exp' `adjusted'
            local N = e(N)
            matrix table = r(table)
            local beta = table[1,1]
            local se = table[2,1]
            local lci = table[5,1]
            local uci = table[6,1]
            local pval = table[4,1]
            post temp ("`out'") ("`exp'") (`adjusted') (`N') (`beta') (`se') ///
                (`lci') (`uci') (`pval')
        }
    }
}
postclose temp
use "\test.dta", clear
export excel using "\test.xlsx", firstrow(variables)
The above code only produces one row with estimates for the first level of rep78 when it should produce 4 rows (rep78 is a 5-level categorical variable).
You need to modify your code to save the results from all the relevant columns of r(table):
. reg price i.rep78

. matrix list r(table)

r(table)[9,6]
                 1b.          2.          3.          4.          5.
               rep78       rep78       rep78       rep78       rep78       _cons
      b            0    1403.125   1864.7333        1507      1348.5      4564.5
     se            .   2356.0851   2176.4582   2221.3383   2290.9272   2107.3466
      t            .    .5955324   .85677426   .67841985   .58862629    2.165994
 pvalue            .   .55358783   .39476643   .49995129   .55818378   .03404352
     ll            .   -3303.696  -2483.2417  -2930.6334  -3228.1533    354.5913
     ul            .    6109.946   6212.7083   5944.6334   5925.1533   8774.4087
     df           64          64          64          64          64          64
   crit    1.9977297   1.9977297   1.9977297   1.9977297   1.9977297   1.9977297
  eform            0           0           0           0           0           0
So, in your code, after matrix table=r(table) you need to have something like:
forvalues i = 1 / `= colsof(r(table)) - 1' {
    local beta = table[1,`i']
    local se = table[2,`i']
    local lci = table[5,`i']
    local uci = table[6,`i']
    local pval = table[4,`i']
    post temp ("`out'") ("`exp'") (`adjusted') (`N') (`beta') (`se') ///
        (`lci') (`uci') (`pval')
}
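Note that the base level of a factor variable also occupies a column (with coefficient 0 and missing standard error, as the listing above shows), and so do any adjustment covariates such as mpg. If you want to post only the estimated exposure levels, a minimal sketch of the same loop that skips omitted columns by their missing standard error:

forvalues i = 1 / `= colsof(table) - 1' {
    if table[2,`i'] >= . continue // omitted base levels have missing se
    local beta = table[1,`i']
    local se = table[2,`i']
    local lci = table[5,`i']
    local uci = table[6,`i']
    local pval = table[4,`i']
    post temp ("`out'") ("`exp'") (`adjusted') (`N') (`beta') (`se') ///
        (`lci') (`uci') (`pval')
}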
The following works for me:
sysuse auto.dta, clear
describe
summ
postfile temp str40 exp str40 outcome adjusted N beta se lci uci pval using ///
"test.dta", replace
foreach out in price weight {
    foreach exp in i.rep78 {
        foreach adjust in 1 {
            if `adjust'==1 local adjusted "mpg"
            reg `out' `exp' `adjusted'
            local N = e(N)
            matrix table = r(table)
            forvalues i = 1 / `= colsof(r(table)) - 1' {
                local beta = table[1,`i']
                local se = table[2,`i']
                local lci = table[5,`i']
                local uci = table[6,`i']
                local pval = table[4,`i']
                post temp ("`out'") ("`exp'") (`adjusted') (`N') (`beta') ///
                    (`se') (`lci') (`uci') (`pval')
            }
        }
    }
}
postclose temp
use "test.dta", clear
I have been trying to run this simulation code in Stata version 15.1, but am having issues running it as indicated below.
local num_clus 3 6 9 18 36
local clussize 5 10 15 20 25
*Model specifications
local intercept 17.87
local timecoeff1 -5.42
local timecoeff2 -5.72
local timecoeff3 -7.03
local timecoeff4 -6.13
local timecoeff5 -9.13
local intrvcoeff 5.00
local sigma_u3 25.77
local sigma_u2 120.62
local sigma_error 38.35
local nrep 1000
local alpha 0.05
*Generate multi-level data
capture program drop swcrt
program define swcrt, rclass
    version 15.1
    preserve
    clear
    args num_clus clussize intercept intrvcoeff timecoeff1 timecoeff2 timecoeff3 timecoeff4 timecoeff5 sigma_u3 sigma_error alpha
    assert `num_clus' > 0 & `clussize' > 0 & `intercept' > 0 & `intrvcoeff' > 0 & `timecoeff1' < 0 & `timecoeff2' < 0 & `timecoeff3' < 0 & `timecoeff4' < 0 & `timecoeff5' < 0 & `sigma_u3' > 0 & `sigma_error' > 0 & `alpha' > 0
    /*Generate simulated multi-level data*/
    qui
    clear
    set obs `num_clus'
    qui gen cluster = _n
    qui gen group = 1+mod(_n-1,4)
    /*Generate cluster-level errors*/
    qui gen u_3 = rnormal(0,`sigma_u3')
    expand `clussize'
    bysort cluster: gen individual = _n
    /*Set up time*/
    expand 6
    bysort cluster individual: gen time = _n-1
    /*Set up intervention variable*/
    gen intrv = (time>=group)
    /*Generate residual errors*/
    qui gen error = rnormal(0,`sigma_error')
    /*Generate outcome y*/
    qui gen y = `intercept' + `intrvcoeff'*intrv + `timecoeff1'*1.time + `timecoeff2'*2.time + `timecoeff3'*3.time + `timecoeff4'*4.time + `timecoeff5'*5.time + u_3 + error
    /*Fit multi-level model to simulated dataset*/
    mixed y intrv i.time || cluster:, covariance(unstructured) reml dfmethod(kroger)
    /*Return estimated effect size, bias, p-value, and significance dichotomy*/
    tempname M
    matrix `M' = r(table)
    return scalar bias = _b[intrv] - `intrvcoeff'
    return scalar p = `M'[1,4]
    return scalar p_ = (`M'[1,4] < `alpha')
    exit
end
*Postfile to store results
tempname step
tempfile powerresults
capture postutil clear
postfile `step' num_clus clussize intrvcoeff p p_ bias using `powerresults', replace
ERROR: (note: file /var/folders/v4/j5kzzhc52q9fvh6w9pcx9fgm0000gn/T//S_00310.00000c not found)
*Loop over number of clusters
foreach c of local num_clus {
    display as text "Number of clusters" as result "`c'"
    foreach s of local clussize {
        display as text "Cluster size" as result "`s'"
        forvalues i = 1/`nrep' {
            display as text "Iterations" as result `nrep'
            quietly swcrt `num_clus' `clussize' `intercept' `intrvcoeff' `timecoeff1' `timecoeff2' `timecoeff3' `timecoeff4' `timecoeff5' `sigma_u3' `sigma_error' `alpha'
            post `step' (`c') (`s') (`intrvcoeff') (`r(p)') (`r(p_)') (`r(bias)')
        }
    }
}
postclose `step'
ERROR:
Number of clusters3
Cluster size5
Iterations1000
r(9);
*Open results, calculate power
use `powerresults', clear
levelsof num_clus, local(num_clus)
levelsof clussize, local(clussize)
matrix drop _all
*Loop over combinations of clusters
*Add power results to matrix
foreach c of local num_clus {
    foreach s of local clussize {
        quietly ci proportions p_ if num_clus == `c' & clussize = `s'
        local power `r(proportion)'
        local power_lb `r(lb)'
        local power_ub `r(ub)'
        quietly ci mean bias if num_clus == `c' & clussize = `s'
        local bias `r(mean)'
        matrix M = nullmat(M) \ (`c', `s', `intrvcoeff', `power', `power_lb', `power_ub', `bias')
    }
}
*Display the matrix
matrix colnames M = c s intrvcoeff power power_lb power_ub bias
ERROR:
matrix M not found
r(111);
matrix list M, noheader format(%3.2f)
ERROR:
matrix M not found
r(111);
There are a few things that seem to be amiss above.
I get a message after the postfile command saying that the file is not found. Nowhere in my code do I actually use that name so it seems to be generated by Stata.
After the loop and the post command I get error r(9).
Error message r(111) - says that the matrix is not found.
I have checked the following parts of the code to try and resolve the issue:
Specified local macros outside of the program and passed into it via the args statement of the program
Match between the variables in the call to swcrt and the args statement in the program
Match between the arguments in the assert statement of the program and the args command, and whether the alligator clips (the macro quote characters) are specified appropriately
Match between the number of variables in the post and postfile commands
I am not quite sure why I get these errors, considering that the code worked previously and the program iterated (even when I revert the changes the error persists). Does anyone know why this happens? If I had to guess, the matrix can't be found because of the error with the file not being found when I use postfile.
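A hedged diagnostic rather than a verified fix: inside the loops, swcrt is called with the full list locals, so `num_clus' expands to 3 6 9 18 36 and the program's args get assigned out of order; the assert then fails, and "assertion is false" is exactly what return code r(9) means. The call presumably intended the loop indices instead:

quietly swcrt `c' `s' `intercept' `intrvcoeff' `timecoeff1' `timecoeff2' `timecoeff3' `timecoeff4' `timecoeff5' `sigma_u3' `sigma_error' `alpha'

That would also explain the missing matrix: with nothing ever posted, the levelsof loops have nothing to iterate over, so M is never created, hence r(111). The (note: file ... not found) message after postfile is harmless; it is Stata noting that the replace option found no earlier version of the tempfile to overwrite.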
I run two regressions for which I would like to show the r-squared:
logit y c.x1 c.x2
quietly est store e1
local r1 = e(r2_p)
logit y c.x1 c.x2 c.x3
quietly est store e2
local r2 = e(r2_p)
I tried to create a matrix to fill it but was not successful:
mat t1=J(1,2,0) //Defining empty matrix with 2 columns 1 row
local rsq `r*' //Trying to store r1 and r2 as numeric
local a=1
forval i=1/2 {
    mat t1[`i'+1,`a']= `r*' // filling each row, one at a time, this fails
    loc ++a
}
mat li t1
Ultimately, I would like to export the results with the community-contributed Stata command outreg2:
outreg2 [e*] using "myfile", excel replace addstat(Adj. R^2:, `rsq')
The following works for me:
webuse lbw, clear
logit low age smoke
outreg2 using "myfile.txt", replace addstat(Adj. R^2, e(r2_p))
logit low age smoke ptl ht ui
outreg2 using "myfile.txt", addstat(Adj. R^2, e(r2_p)) append
type myfile.txt
                              (1)          (2)
VARIABLES                     low          low

age                       -0.0498      -0.0541
                         (0.0320)     (0.0339)
smoke                     0.692**        0.557
                          (0.322)      (0.339)
ptl                                    0.679**
                                       (0.344)
ht                                     1.408**
                                       (0.624)
ui                                      0.817*
                                       (0.451)
Constant                   0.0609       -0.168
                          (0.757)      (0.806)

Observations                  189          189
Adj. R^2                   0.0315       0.0882
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
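If you do want to route the values through a matrix as in the original attempt, two things need fixing: `r*' is not a valid macro reference (r1 and r2 must be referenced individually), and t1[`i'+1,`a'] writes to row 2 of a one-row matrix. A minimal sketch with valid indexing, reusing the question's models:

mat t1 = J(1,2,0) // 1 row, 2 columns
logit y c.x1 c.x2
mat t1[1,1] = e(r2_p)
logit y c.x1 c.x2 c.x3
mat t1[1,2] = e(r2_p)
mat li t1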
I am running an ordered probit with four levels (A lot, Somewhat, Little, Not at all) on a female variable and some controls:
* Baseline only
eststo, title ("OProbit1"): /*quietly*/ oprobit retincome_worry i.female $control_socio, vce(robust)
estimate store OProbit1
* Baseline + Health Controls
eststo, title ("OProbit3"): oprobit retincome_worry i.female $control_socio $control_health, vce(robust)
estimate store OProbit3
I am doing this for marginal effects of the female variable:
* TABLE BASELINE
estimate restore OProbit1
margins, dydx(i.female) predict (outcome(1)) atmeans post
outreg using results\Reg_margins\Reg2.tex, noautosumm replace rtitle(A lot) ctitle(Social Controls) title(Worry about Retirement Income)
estimate restore OProbit1
margins, dydx(i.female) predict (outcome(2)) atmeans post
outreg using results\Reg_margins\Reg2.tex, noautosumm append rtitle(Somewhat)
estimate restore OProbit1
margins, dydx(i.female) predict (outcome(3)) atmeans post
outreg using results\Reg_margins\Reg2.tex, noautosumm append rtitle(Little)
estimate restore OProbit1
margins, dydx(i.female) predict (outcome(4)) atmeans post
outreg using results\Reg_margins\Reg2.tex, noautosumm append rtitle(Not at all) tex
* TABLE BASELINE + HEALTH
estimate restore OProbit3
margins, dydx(i.female) predict (outcome(1)) atmeans post
outreg using results\Reg_margins\Reg3.tex, noautosumm replace rtitle(A lot) ctitle(Baseline and Health) title(Worry about Retirement Income)
estimate restore OProbit3
margins, dydx(i.female) predict (outcome(2)) atmeans post
outreg using results\Reg_margins\Reg3.tex, append noautosumm rtitle(Somewhat)
estimate restore OProbit3
margins, dydx(i.female) predict (outcome(3)) atmeans post
outreg using results\Reg_margins\Reg3.tex, append noautosumm rtitle(Little)
estimate restore OProbit3
margins, dydx(i.female) predict (outcome(4)) atmeans post
outreg using results\Reg_margins\Reg3.tex, append noautosumm rtitle(Not at all) tex
I currently have four such tables, each with a single column titled for the controls included in that model and four rows, one for each outcome level.
How can I have all of this in a single table, keeping the four rows and adding more columns?
You can get the desired output using the community-contributed command esttab.
First, define the program appendmodels (written by Ben Jann):
capt prog drop appendmodels
*! version 1.0.0 14aug2007 Ben Jann
program appendmodels, eclass
    // using first equation of model
    version 8
    syntax namelist
    tempname b V tmp
    foreach name of local namelist {
        qui est restore `name'
        mat `tmp' = e(b)
        local eq1: coleq `tmp'
        gettoken eq1 : eq1
        mat `tmp' = `tmp'[1,"`eq1':"]
        local cons = colnumb(`tmp',"_cons")
        if `cons'<. & `cons'>1 {
            mat `tmp' = `tmp'[1,1..`cons'-1]
        }
        mat `b' = nullmat(`b') , `tmp'
        mat `tmp' = e(V)
        mat `tmp' = `tmp'["`eq1':","`eq1':"]
        if `cons'<. & `cons'>1 {
            mat `tmp' = `tmp'[1..`cons'-1,1..`cons'-1]
        }
        capt confirm matrix `V'
        if _rc {
            mat `V' = `tmp'
        }
        else {
            mat `V' = ///
                ( `V' , J(rowsof(`V'),colsof(`tmp'),0) ) \ ///
                ( J(rowsof(`tmp'),colsof(`V'),0) , `tmp' )
        }
    }
    local names: colfullnames `b'
    mat coln `V' = `names'
    mat rown `V' = `names'
    eret post `b' `V'
    eret local cmd "whatever"
end
Next, run the following (here I use Stata's fullauto toy dataset for illustration):
webuse fullauto, clear
estimates clear
forvalues i = 1 / 4 {
    oprobit rep77 i.foreign
    margins, dydx(foreign) predict (outcome(`i')) atmeans post
    estimate store OProbit1`i'
}
appendmodels OProbit11 OProbit12 OProbit13 OProbit14
estimates store result1
forvalues i = 1 / 4 {
    oprobit rep77 i.foreign length mpg
    margins, dydx(foreign) predict (outcome(`i')) atmeans post
    estimate store OProbit2`i'
}
appendmodels OProbit21 OProbit22 OProbit23 OProbit24
estimates store result2
forvalues i = 1 / 4 {
    oprobit rep77 i.foreign trunk weight
    margins, dydx(foreign) predict (outcome(`i')) atmeans post
    estimate store OProbit3`i'
}
appendmodels OProbit31 OProbit32 OProbit33 OProbit34
estimates store result3
forvalues i = 1 / 4 {
    oprobit rep77 i.foreign price displ
    margins, dydx(foreign) predict (outcome(`i')) atmeans post
    estimate store OProbit4`i'
}
appendmodels OProbit41 OProbit42 OProbit43 OProbit44
estimates store result4
Finally, see the results:
esttab result1 result2 result3 result4, keep(1.foreign) varlab(1.foreign " ") ///
labcol2("A lot" "Somewhat" "A little" "Not at all") gaps noobs nomtitles
--------------------------------------------------------------------
                      (1)          (2)          (3)          (4)
--------------------------------------------------------------------
A lot             -0.0572      -0.0677      -0.0728      -0.0690
                  (-1.83)      (-1.67)      (-1.81)      (-1.67)

Somewhat          -0.144**     -0.247***    -0.188**     -0.175*
                  (-2.73)      (-3.54)      (-2.86)      (-2.47)

A little          -0.124       -0.290**     -0.290**     -0.163
                  (-1.86)      (-3.07)      (-3.07)      (-1.74)

Not at all         0.198**      0.351***     0.252**      0.237*
                   (2.64)       (3.82)       (2.95)       (2.55)
--------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
You can install esttab by typing the following in Stata's command prompt:
ssc install estout
I want to plot confidence intervals for some estimates after running a regression model.
As I'm working with a very big dataset, I need an efficient solution: in particular, a solution that does not require me to sort or save the dataset. In the following example, I plot estimates for b1 to b6:
reg y b1 b2 b3 b4 b5 b6
foreach i of numlist 1/6 {
    local mean `mean' `=_b[b`i']' `i'
    local ci `ci' ///
        (scatteri ///
        `=_b[b`i'] + 1.96 * _se[b`i']' `i' ///
        `=_b[b`i'] - 1.96 * _se[b`i']' `i' ///
        , lpattern(shortdash) lcolor(navy))
}
twoway `ci' (scatteri `mean', mcolor(navy)), legend(off) yline(0)
While scatteri efficiently plots the estimates, I can't get boundaries for the confidence interval similar to rcap.
Is there a better way to do this?
Here's token code for what you seem to want. The example is ridiculous. It's my personal view that refining this would be pointless given the very accomplished previous work behind coefplot. The multiplier of 1.96 only applies in very large samples.
sysuse auto, clear
set scheme s1color
reg mpg weight length displ
gen coeff = .
gen upper = .
gen lower = .
gen which = .
local i = 0
quietly foreach v in weight length displ {
    local ++i
    replace coeff = _b[`v'] in `i'
    replace upper = _b[`v'] + 1.96 * _se[`v'] in `i'
    replace lower = _b[`v'] - 1.96 * _se[`v'] in `i'
    replace which = `i' in `i'
    label def which `i' "`v'", modify
}
label val which which
twoway scatter coeff which, mcolor(navy) xsc(r(0.5, `i'.5)) xla(1/`i', val) ///
|| rcap upper lower which, lcolor(navy) xtitle("") legend(off)
I have a large dataset. I have to subset it (Big_data) using values stored in another dta file (Criteria_data). I will show the problem first:
**Big_data**             **Criteria_data**
====================     ================================================
lon       lat            4_digit_id   minlon   maxlon   minlat   maxlat
-76.22    44.27          0765         -78.44   -77.22   34.324   35.011
-67.55    33.19          6161         -66.11   -65.93   40.32    41.88
.......                  ........
(over 1 million obs)     (271 observations)
====================     ================================================
I have to subset the big data as follows:
use Big_data
preserve
keep if (-78.44<lon<-77.22) & (34.324<lat<35.011)
save data_0765, replace
restore
preserve
keep if (-66.11<lon<-65.93) & (40.32<lat<41.88)
save data_6161, replace
restore
....
(1) What would be an efficient way to program the subsetting in Stata? (2) Are the inequality expressions correctly written?
1) Subsetting data
With 400,000 observations in the main file and 300 in the reference file, it takes about 1.5 minutes. I can't test this with double the observations in the main file because the lack of RAM brings my computer to a crawl.
The strategy involves creating as many variables as needed to hold the reference latitudes and longitudes (271*4 = 1084 in the OP's case; Stata IC and up can handle this. See help limits). This requires some reshaping and appending. Then we check for those observations of the big data file that meet the conditions.
clear all
set more off
*----- create example databases -----
tempfile bigdata reference
input ///
lon lat
-76.22 44.27
-66.0 40.85 // meets conditions
-77.10 34.8 // meets conditions
-66.00 42.0
end
expand 100000
save "`bigdata'"
*list
clear all
input ///
str4 id minlon maxlon minlat maxlat
"0765" -78.44 -75.22 34.324 35.011
"6161" -66.11 -65.93 40.32 41.88
end
drop id
expand 150
gen id = _n
save "`reference'"
*list
*----- reshape original reference file -----
use "`reference'", clear
tempfile reference2
destring id, replace
levelsof id, local(lev)
gen i = 1
reshape wide minlon maxlon minlat maxlat, i(i) j(id)
gen lat = .
gen lon = .
save "`reference2'"
*----- create working database -----
use "`bigdata'"
timer on 1
quietly {
    forvalues num = 1/300 {
        gen minlon`num' = .
        gen maxlon`num' = .
        gen minlat`num' = .
        gen maxlat`num' = .
    }
}
timer off 1
timer on 2
append using "`reference2'"
drop i
timer off 2
*----- flag observations for which conditions are met -----
timer on 3
gen byte flag = 0
foreach le of local lev {
    quietly replace flag = 1 if inrange(lon, minlon`le'[_N], maxlon`le'[_N]) & inrange(lat, minlat`le'[_N], maxlat`le'[_N])
}
timer off 3
*keep if flag
*keep lon lat
*list
timer list
The inrange() function implies that the minimums and maximums must be adjusted beforehand to satisfy the OP's strict inequalities (the function tests <=, >=).
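If the strict inequalities really matter, one alternative (a sketch using the same variables as above) is to spell the comparisons out in the replace line instead of using inrange():

quietly replace flag = 1 if (minlon`le'[_N] < lon) & (lon < maxlon`le'[_N]) & (minlat`le'[_N] < lat) & (lat < maxlat`le'[_N])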
Some use of expand, correlatives, and by (so the data is in long form) could probably speed things up, but it's not totally clear to me right now. I'm sure there are better ways in plain Stata. Mata may be even better.
(joinby was also tested but again RAM was a problem.)
Edit
Doing the computations in chunks rather than on the complete database significantly reduces the RAM problem. Using a main file with 1.2 million observations and a reference file with 300 observations, the following code does all the work in about 1.5 minutes:
set more off
*----- create example big data -----
clear all
set obs 1200000
set seed 13056
gen lat = runiform()*100
gen lon = runiform()*100
local sizebd `=_N' // to be used in computations
tempfile bigdata
save "`bigdata'"
*----- create example reference data -----
clear all
set obs 300
set seed 97532
gen minlat = runiform()*100
gen maxlat = minlat + runiform()*5
gen minlon = runiform()*100
gen maxlon = minlon + runiform()*5
gen id = _n
tempfile reference
save "`reference'"
*----- reshape original reference file -----
use "`reference'", clear
destring id, replace
levelsof id, local(lev)
gen i = 1
reshape wide minlon maxlon minlat maxlat, i(i) j(id)
drop i
tempfile reference2
save "`reference2'"
*----- create file to save results -----
tempfile results
clear all
set obs 0
gen lon = .
gen lat = .
save "`results'"
*----- start computations -----
clear all
* local that controls # of observations in intermediate files
local step = 5000 // can't be larger than sizebd
timer clear
timer on 99
forvalues en = `step'(`step')`sizebd' {
    * load observations and join with references
    timer on 1
    local start = `en' - (`step' - 1)
    use in `start'/`en' using "`bigdata'", clear
    timer off 1
    timer on 2
    append using "`reference2'"
    timer off 2
    * flag observations that meet conditions
    timer on 3
    gen byte flag = 0
    foreach le of local lev {
        quietly replace flag = 1 if inrange(lon, minlon`le'[_N], maxlon`le'[_N]) & inrange(lat, minlat`le'[_N], maxlat`le'[_N])
    }
    timer off 3
    * append to result database
    timer on 4
    quietly {
        keep if flag
        keep lon lat
        append using "`results'"
        save "`results'", replace
    }
    timer off 4
}
timer off 99
timer list
display "total time is " `r(t99)'/60 " minutes"
use "`results'"
browse
2) Inequalities
You ask if your inequalities are correct. They are in fact legal, meaning that Stata will not complain, but the result is probably unexpected.
The following result may seem surprising:
. display (66.11 < 100 < 67.93)
1
How is it that the expression evaluates to true (i.e. 1)? Stata first evaluates 66.11 < 100, which is true (i.e. 1), and then sees 1 < 67.93, which is also true, of course.
The intended expression was (and Stata will now do what you want):
. display (66.11 < 100) & (100 < 67.93)
0
You can also rely on the function inrange().
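For example, the comparison above done safely (inrange(z,a,b) returns 1 if a <= z <= b, and 0 otherwise):

. display inrange(100, 66.11, 67.93)
0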
The following example is consistent with the previous explanation:
. display (66.11 < 100 < 0)
0
Stata sees 66.11 < 100 which is true (i.e. 1) and follows up with 1 < 0, which is false (i.e. 0).
This uses Roberto's data setup:
clear all
set obs 1200000
set seed 13056
gen lat = runiform()*100
gen lon = runiform()*100
local sizebd `=_N' // to be used in computations
tempfile bigdata
save "`bigdata'"
*----- create example reference data -----
clear all
set obs 300
set seed 97532
gen minlat = runiform()*100
gen maxlat = minlat + runiform()*5
gen minlon = runiform()*100
gen maxlon = minlon + runiform()*5
gen id = _n
tempfile reference
save "`reference'"
timer on 1
levelsof id, local(id_list)
foreach id of local id_list {
    sum minlat if id==`id', meanonly
    local minlat = r(min)
    sum maxlat if id==`id', meanonly
    local maxlat = r(max)
    sum minlon if id==`id', meanonly
    local minlon = r(min)
    sum maxlon if id==`id', meanonly
    local maxlon = r(max)
    preserve
    use if (inrange(lon,`minlon',`maxlon') & inrange(lat,`minlat',`maxlat')) using "`bigdata'", clear
    qui save data_`id', replace
    restore
}
timer off 1
I would try to avoid preserve-ing and restore-ing the "big" file, and doing so is possible, but at the expense of losing the Stata format.
Using the same set up as Roberto and Dimitriy did,
set more off
use `bigdata', clear
merge 1:1 _n using `reference'
* check for data consistency:
* minlat, maxlat, minlon, maxlon are either all defined or all missing
assert inlist( mi(minlat) + mi(maxlat) + mi(minlon) + mi(maxlon), 0, 4)
* this will come handy later
gen byte touse = 0
* set up and cycle over the reference data
count if !missing(minlat)
forvalues n=1/`=r(N)' {
    replace touse = inrange(lat,minlat[`n'],maxlat[`n']) & inrange(lon,minlon[`n'],maxlon[`n'])
    local thisid = id[`n']
    outfile lat lon if touse using data_`thisid'.csv, replace comma
}
Time it on your machine. You could avoid touse and thisid and only have the single outfile within the cycle, but it would be less readable.
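For the record, that more compact version might look something like this (a sketch; the filename is built inline from id):

forvalues n=1/`=r(N)' {
    outfile lat lon if inrange(lat,minlat[`n'],maxlat[`n']) & inrange(lon,minlon[`n'],maxlon[`n']) using data_`=id[`n']'.csv, replace comma
}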
You can then infile lat lon using data_###.csv, clear later. If you really need the Stata files proper, you can convert that swarm of CSV files with
clear
local allcsv : dir . files "*.csv"
foreach f of local allcsv {
    * change the filename
    local dtaname = subinstr(`"`f'"',".csv",".dta",.)
    infile lat lon using `"`f'"', clear
    if _N>0 save `"`dtaname'"', replace
}
Time it, too. I protected the save as some of the simulated data sets were empty. I think this was faster than 1.5 min on my machine, including the conversion.