Imputed values in Stata - stata

I am new to Stata and tried multiple imputation, but I really don't understand how to access the imputed data. I used the following commands:
mi set wide
mi register regular var1 var2 var3
mi register imputed var4 var5 var6
mi impute chained (pmm,knn(5)) var4 var5 var6 = var1 var2 var3, add(5) dots noisily
mi estimate: regress var1 var2 var3 var4 var5 var6
Adding ", saving(myfile, replace)" at the end of the mi estimate command gives me the following error:
"option saving() not allowed
an error occurred when mi estimate executed regress on m=1"

Related

How do I conditionally select variables in PROC SQL?

I have calculated a frequency table in a previous step. Excerpt below:
I want to automatically drop all variables from this table where the frequency is missing. In the excerpt above, that would mean the variables "Exkl_UtgUtl_Taxi_kvot" and "Exkl_UtgUtl_Driv_kvot" would need to be dropped.
I try the following step in PROC SQL (which ideally I will repeat for all variables in the table):
PROC SQL;
CREATE TABLE test3 as
SELECT (CASE WHEN Exkl_UtgUtl_Flyg_kvot!=. THEN Exkl_UtgUtl_Flyg_kvot ELSE NULL END)
FROM stickprovsstorlekar;
quit;
This fails, however, since SAS does not like NULL values. How do I do this?
I tried just writing:
PROC SQL;
CREATE TABLE test3 as
SELECT (CASE WHEN Exkl_UtgUtl_Flyg_kvot!=. THEN Exkl_UtgUtl_Flyg_kvot)
FROM stickprovsstorlekar;
quit;
But that just generates a variable with an automatically generated name (like DATA_007). I want all variables containing missing values to be totally excluded from the results.
Let's say you have 10 variables, where var1, var3, var5, var7, and var9 have missing values in the first observation. We want to select only the variables with no missing observations.
var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
. 8 . 9 . 6 . 1 . 4
5 1 2 7 2 7 2 9 7 7
5 9 7 7 6 8 5 6 4 9
...
First, let's find all variables that have missing observations:
proc means data=have noprint;
var _NUMERIC_;
output out=missing nmiss=;
run;
Then transpose this output table so it's easier to work with:
proc transpose data=missing out=missing_tpose;
run;
We now have a table that looks like this:
_NAME_ COL1
_TYPE_ 0
_FREQ_ 10
var1 1
var2 0
var3 1
var4 0
var5 1
var6 0
var7 1
var8 0
var9 1
var10 0
When COL1 is > 0 (and _NAME_ is not _TYPE_ or _FREQ_), the variable has missing values; when COL1 = 0, it has none. Let's extract the names of the complete variables (COL1 = 0) from _NAME_ into a comma-separated list.
proc sql noprint;
select _NAME_
into :vars separated by ','
from missing_tpose
where COL1 = 0 AND _NAME_ NOT IN('_TYPE_', '_FREQ_')
;
quit;
Run %put &vars; and you'll see all of the variables with no missing values, ready to be passed into SQL:
var2,var4,var6,var8,var10
Now we have a dynamic way to select variables with only non-missing values.
proc sql;
create table want as
select &vars
from have
;
quit;
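For readers who want to check the logic outside SAS, here is a minimal Python sketch of the same three steps: count the missing values per variable (proc means nmiss= plus proc transpose), keep the names whose count is zero (select ... into :vars), then select only those columns. It is not SAS code: None stands in for SAS's numeric missing value, the toy data is just the first three rows shown above, and the helper names complete_columns and select_columns are hypothetical.

```python
def complete_columns(rows):
    """Return the names of columns that have no missing (None) values."""
    names = list(rows[0].keys())
    # Missing-value count per column, like proc means nmiss= after transposing.
    nmiss = {name: sum(row[name] is None for row in rows) for name in names}
    # Keep only the columns with COL1 = 0, i.e. no missing values.
    return [name for name in names if nmiss[name] == 0]


def select_columns(rows, names):
    """Keep only the given columns, like 'select &vars from have'."""
    return [{name: row[name] for name in names} for row in rows]


# First three rows of the example table; None plays the role of '.'.
have = [
    {"var1": None, "var2": 8, "var3": None, "var4": 9, "var5": None,
     "var6": 6, "var7": None, "var8": 1, "var9": None, "var10": 4},
    {"var1": 5, "var2": 1, "var3": 2, "var4": 7, "var5": 2,
     "var6": 7, "var7": 2, "var8": 9, "var9": 7, "var10": 7},
    {"var1": 5, "var2": 9, "var3": 7, "var4": 7, "var5": 6,
     "var6": 8, "var7": 5, "var8": 6, "var9": 4, "var10": 9},
]

keep = complete_columns(have)
print(",".join(keep))  # var2,var4,var6,var8,var10
want = select_columns(have, keep)
```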

How to make a dataset where the last variables are in one column

I have a dataset:
1 300 apple pear onion
1 302 banana tomato cookie
2 302 bread meat tomato
How can I make a dataset where the last variables are in one column?
What I need:
Dataset
You need to look at the CATX function (or its siblings, CATS, CATT, CATQ, CAT).
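The first argument of CATX is the delimiter; the remaining arguments are the values to join. With your data, the text variables are var3-var5:
new_var = catx(' ', var3, var4, var5);
Or a couple of other options:
new_var = catx(' ', of var:);
new_var = catx(' ', of var3-var5);
The var: form works if the variables all start with the same pattern (with your variable names it would also pick up var1 and var2).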
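As a rough illustration of what CATX does (strip each value, skip missing or blank values, and join the rest with the delimiter given as the first argument), here is a small Python analogue. It is only a sketch of the behaviour, not SAS code, and the function name catx here is a hypothetical Python stand-in applied to the three rows from the question.

```python
def catx(delimiter, *values):
    """Rough Python analogue of SAS CATX: strip each value, skip missing or
    blank values, and join what is left with the delimiter (first argument)."""
    parts = []
    for value in values:
        if value is None:          # treat None as a missing value
            continue
        text = str(value).strip()  # remove leading and trailing blanks
        if text:                   # skip values that are blank after stripping
            parts.append(text)
    return delimiter.join(parts)


rows = [
    (1, 300, "apple", "pear", "onion"),
    (1, 302, "banana", "tomato", "cookie"),
    (2, 302, "bread", "meat", "tomato"),
]

# Equivalent in spirit to: new_var = catx(' ', of var3-var5);
new_var = [catx(" ", *row[2:5]) for row in rows]
print(new_var)  # ['apple pear onion', 'banana tomato cookie', 'bread meat tomato']
```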
Use proc transpose with your categories in the by statement and the variables to transpose in the var statement:
data have;
input var1 var2 var3 $ var4 $ var5 $;
datalines;
1 300 apple pear onion
1 302 banana tomato cookie
2 302 bread meat tomato
;
run;
proc transpose data=have out=want (drop=_name_ rename=(col1 = fruit));
by var1 var2;
var var3 var4 var5;
run;
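The reshape that proc transpose performs here can be sketched in Python to show the shape of the result: each input row yields one output row per VAR variable, carrying the BY values along and putting the value in a single column. This is an illustration under the assumption of one input row per BY group (as in the example data), not SAS code; transpose_long is a hypothetical helper name.

```python
def transpose_long(rows, by, var, value_name):
    """Sketch of proc transpose with BY and VAR statements, assuming one input
    row per BY group: one output row per (row, VAR variable)."""
    out = []
    for row in rows:
        for name in var:
            record = {key: row[key] for key in by}  # carry the BY values along
            record[value_name] = row[name]          # the value goes in one column
            out.append(record)
    return out


have = [
    {"var1": 1, "var2": 300, "var3": "apple", "var4": "pear", "var5": "onion"},
    {"var1": 1, "var2": 302, "var3": "banana", "var4": "tomato", "var5": "cookie"},
    {"var1": 2, "var2": 302, "var3": "bread", "var4": "meat", "var5": "tomato"},
]

want = transpose_long(have, by=["var1", "var2"],
                      var=["var3", "var4", "var5"], value_name="fruit")
for record in want[:3]:
    print(record)
```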

Local macro on subsample data using if statement in Stata

I want to use the local command in Stata to store several variables that I afterwards want to export as two subsamples. I separate the dataset by the grouping variable grouping_var, which is either 0 or 1. I tried:
if grouping_var==0 local vars_0 var1 var2 var3 var4
preserve
keep `vars_0'
saveold "data1", replace
restore
if grouping_var==1 local vars_1 var1 var2 var3 var4
preserve
keep `vars_1'
saveold "data2", replace
restore
However, the output is not as I expected and the data is not divided into two subsamples. The first list includes the whole dataset. Is there anything wrong in how I use the if statement here?
There is a bit of confusion between the "if qualifier" and the "if command" here. The syntax if (condition) (command) is the "if command", and generally does not provide the desired behavior when written using observation-level logical conditions.
In short, Stata evaluates if (condition) for the first observation, which is why your entire data set is being kept/saved in the first block (i.e., in your current sort order, grouping_var[1] == 0). See http://www.stata.com/support/faqs/programming/if-command-versus-if-qualifier/ for more information.
Assuming you want to keep different variables in each case, something like the code below should work:
local vars_0 var1 var2 var3 var4
local vars_1 var5 var6 var7 var8
forvalues g = 0/1 {
preserve
keep if grouping_var == `g'
keep `vars_`g''
save data`g' , replace
restore
}
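To make the difference concrete, here is a small Python sketch (not Stata) of the two behaviours on toy data: the if command looks only at the first observation's value of grouping_var, while the if qualifier in keep if tests every observation. The data, the column lists and the function names are hypothetical.

```python
rows = [
    {"grouping_var": 0, "var1": 1, "var5": 10},
    {"grouping_var": 1, "var1": 2, "var5": 20},
    {"grouping_var": 0, "var1": 3, "var5": 30},
]


def if_command_true(rows, value):
    """'if grouping_var == value <command>': only the first observation's
    value is tested, and the command either runs or it does not."""
    return rows[0]["grouping_var"] == value


def if_qualifier(rows, value):
    """'keep if grouping_var == value': the condition is tested row by row."""
    return [row for row in rows if row["grouping_var"] == value]


def split_and_keep(rows, columns_by_group):
    """Mirror of the forvalues loop: keep the matching observations,
    then keep that group's variables."""
    out = {}
    for g, columns in columns_by_group.items():
        subset = if_qualifier(rows, g)
        out[g] = [{c: row[c] for c in columns} for row in subset]
    return out


print(if_command_true(rows, 0), if_command_true(rows, 1))  # True False
parts = split_and_keep(rows, {0: ["var1"], 1: ["var5"]})
print(parts)
```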

How to match data in SAS

I have a dataset which contains three variables: var1, var2, and Price. Price is the price of var2. var1 is a subsample of var2. Now I want to find the price of each product in var1 by matching the name in var1 with var2.
The data looks like this (? marks a missing price). Can anyone help me solve this, please? Many thanks.
Var1     Var2     Price
apple             ?
         apple    2
banana            ?
         banana   2.1
apple             ?
orange            ?
         orange   4
banana            ?
         yoghurt  2
You could do this through SQL by merging your prices onto your dataset by var1/var2:
proc sql ;
create table output as
select a.var1, a.var2, b.price
from input a
left join (select distinct var2, price
from input
where not missing(var2)) as b
on (a.var1=b.var2
or a.var2=b.var2)
;quit ;
Try using a hash table:
data want;
if 0 then set have(keep=var2 price where=(not missing(var2)));
if _n_=1 then do;
declare hash h (dataset:'have(keep=var2 price where=(not missing(var2)))');
h.definekey('var2');
h.definedata('price');
h.definedone();
call missing(var2,price);
end;
set have;
rc=h.find(key:var1);
drop rc;
run;
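The hash-table idea is simply "load a key-to-price lookup once, then look each var1 up in it". A minimal Python sketch of that logic on the example data follows (a dict plays the role of the hash object; this is not SAS code, and fill_prices is a hypothetical name). It keeps the first price seen for each key and leaves rows untouched when var1 is missing or not found.

```python
def fill_prices(rows):
    # Build the lookup once, like h.definekey('var2') / h.definedata('price')
    # over the rows where var2 is not missing; keep the first price per key.
    price_of = {}
    for row in rows:
        if row["var2"] is not None:
            price_of.setdefault(row["var2"], row["price"])

    out = []
    for row in rows:
        filled = dict(row)
        # Like rc = h.find(key: var1): on a hit, copy the stored price.
        if row["var1"] is not None and row["var1"] in price_of:
            filled["price"] = price_of[row["var1"]]
        out.append(filled)
    return out


have = [
    {"var1": "apple",  "var2": None,      "price": None},
    {"var1": None,     "var2": "apple",   "price": 2},
    {"var1": "banana", "var2": None,      "price": None},
    {"var1": None,     "var2": "banana",  "price": 2.1},
    {"var1": "apple",  "var2": None,      "price": None},
    {"var1": "orange", "var2": None,      "price": None},
    {"var1": None,     "var2": "orange",  "price": 4},
    {"var1": "banana", "var2": None,      "price": None},
    {"var1": None,     "var2": "yoghurt", "price": 2},
]

want = fill_prices(have)
print([row["price"] for row in want])  # [2, 2, 2.1, 2.1, 2, 4, 4, 2.1, 2]
```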

SAS Pulling string variables from a csv file

SOLVED (per Neil Neyman's comment):
&var1 is not the same as var1.
DATA local.trow;
INFILE csvfile FIRSTOBS=&i OBS=&i;
INPUT var1 $ var2 $ var3 $ var4 $;
call symput('var1',var1); *Added line;
call symput('var2',var2); *Added line;
call symput('var3',var3); *Added line;
call symput('var4',var4); *Added line;
RUN;
Adding the lines marked with "*Added line;" solved the issue.
QUESTION
Disclaimer: I am very new to SAS and have been struggling with issues in this code for a while.
In a loop, I am trying to import string variables from a CSV file, one of which I then pass to a remote server (var1), but I'm running into an issue. If I include %let var1 = 'XXE'; at the top of the code and exclude the portion where I'm pulling the variables from my csv file, remote execution works fine and I get the output I would expect.
However, if I run the code as is, it appears not to treat the string variables as expected. For instance, the PROC PRINT statement produces the expected output (i.e., it shows the 4 variables), but the title does not show up properly: var1 seems to be skipped altogether, while i (with a value of 1) and m (with a value of 2007) are displayed. The title shows up as "Title - 1 2007". The log displays the following warning near the title line:
WARNING: Apparent symbolic reference VAR1 not resolved.
The remote submit does not work either; it produces the following error, highlighting &VAR1:
ERROR: Syntax error while parsing WHERE clause.
ERROR 22-322: Syntax error, expecting one of the following: a quoted string,
a numeric constant, a datetime constant, a missing value.
I'm really confused by this error because the PROC PRINT statement is able to print the variables (which do in fact visually appear to be strings). Is a "quoted string" a different type of variable?
If I explicitly declare var1 at the top of the code or manually enter 'XXE' into the WHERE clause, the remote query executes.
Could it be that I am handling the text file incorrectly? It looks like this:
XXE XXA XXB XXC
XXM XXN XXI XXP
...
My code:
LIBNAME local 'C:\...\Pulled Data\New\';
FILENAME csvfile 'C:\...\Pulled Data\New\indexes.txt';
%macro getthedata(nrows,ystart,yend); *nrows is the number of rows in the text file;
%GLOBAL var1 var2 var3 var4;
%do i=1 %to &nrows;
%do m=&ystart %to &yend;
DATA local.trow;
INFILE csvfile FIRSTOBS=&i OBS=&i;
INPUT var1 $ var2 $ var3 $ var4 $;
RUN;
PROC PRINT DATA = local.trow;
TITLE "Title - &i. &var1. &m";
var var1 var2 var3 var4;
RUN;
proc export data=local.trow
outfile="C:\...\Pulled Data\New\Indices_&i._&m..csv"
dbms=csv replace;
run;
signon username=_prompt_;
%syslput VAR1 = &var1;
rsubmit;
libname abc'server/sasdata';
data all2009;
set abc.file_2007:;
by index date time;
where index in (&VAR1) and time between '8:30:00't and '12:00:00't;
run;
endrsubmit;
%end;
%end;
%mend getthedata;
Options MPRINT;
%getthedata(1,2007,2007)
Short Answer:
&var1 is not the same as var1. Add the call symput() lines described below to assign the datastep values to the macro variable values.
DATA local.trow;
INFILE csvfile FIRSTOBS=&i OBS=&i;
INPUT var1 $ var2 $ var3 $ var4 $;
call symput('var1',var1);
call symput('var2',var2);
call symput('var3',var3);
call symput('var4',var4);
RUN;
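One detail worth checking: macro variables are substituted as plain text, and the version that worked used %let var1 = 'XXE'; with the quotes included. A value copied in by call symput('var1',var1) carries no quotes, so if index is a character variable the IN list would receive a bare XXE rather than a quoted string. In SAS, one way to add them is call symputx('var1', quote(strip(var1))); (QUOTE adds double quotes, which SAS also accepts as string literals). The Python sketch below is not SAS; it only illustrates what the substituted text needs to look like, and the helper in_list is hypothetical.

```python
def in_list(values):
    """Hypothetical helper: the text a macro variable would need to hold so
    that 'where index in (&VAR1)' compares against string literals."""
    quoted = []
    for value in values:
        text = value.strip()            # trim blanks (CALL SYMPUT, unlike SYMPUTX, does not)
        text = text.replace("'", "''")  # double any embedded single quotes
        quoted.append("'" + text + "'")
    return ", ".join(quoted)


row = "XXE XXA XXB XXC".split()  # one line of the space-delimited file
var1 = row[0]
print("where index in (" + in_list([var1]) + ")")  # where index in ('XXE')
```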
Other Notes
This seems a strange way to go about it, but since you said you are new to SAS, maybe I can give you some pointers:
Create the entire dataset at once outside the macro
data local.trows;
length var1 var2 var3 var4 $3; *assuming vars really are only 3 chars;
infile csvfile; *this is not really a csv file, it looks space-delimited.;
*confusing to name it as such;
input var1 var2 var3 var4;
run;
I don't see why there's a separate output csv file for each row. Is that really what you need?
Once you have your dataset your macro can do something like:
%macro getthedata(mdataset);
data _null_;
set &mdataset end=last; /* mdataset is the macro parameter; end= flags the last row */
/* automatically assigning nrows based on dataset; */
if last then call symputx('nrows',_n_);
run;
%do i=1 %to &nrows;
data _null_;
set &mdataset;
if &i=_n_ then do;
call symput('var1',var1);
call symput('var2',var2);
/*
etc... Doesn't seem like these really should be
globals since they change every iteration, and
don't seem needed outside of the macro?
*/
end;
run;
/** now you have your vars set for the current iteration
and proceed with your connect code **/
It seems you are just overwriting this dataset with every iteration. Is that what you want to do? Or is there some other code/macro variables you left out for this question?
libname abc'server/sasdata';
data all2009;
set abc.file_2007:;
/* by the way, the trailing colon in abc.file_2007: is a data set name prefix list: it reads every data set whose name starts with file_2007 */