How to count variables in a row in SAS?

I have a table of this form
id1|A|
id1| |var1
id1|B|var2
id2|C|
I would like to retrieve only the rows that have a value for every variable, i.e.:
id1|B|var2
To do this, I want to count the number of non-missing values in each row and keep only the rows that are fully populated:
id|name|age |cntrow
id1| A | |2
id1| |var1|2
id1| B |var2|3
id2| C | |2
Any idea how to perform this task?

You can use the CMISS function, which counts missing values across both character and numeric arguments. Something along the lines of:
Data nomissing missing;
Set input_dataset;
if CMISS(of _ALL_)=0 then output nomissing;
if CMISS(of _ALL_)>0 then output missing;
run;

The N function would work if all the variables were numeric. Since they are not, you can use CMISS, which handles both character and numeric variables, to find out how many are missing:
data have;
infile datalines dlm='|';
input
id $ charvar1 $ charvar2 $ numvar;
vars_missing = cmiss(of _all_)-1; *because vars_missing is also missing at this point!;
put _all_;
datalines;
id1|A| |3
id1| |var1|2
id1|B|var2|.
id2|C| |2
;;;;
run;
And then subtract that from the known number of variables. If you don't know it, you can create _CHARACTER_ and _NUMERIC_ arrays and use dim() for those to find out.
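As a rough sketch of the array approach just described (the dataset name input_dataset and the new variables n_vars, n_miss and cntrow are assumptions for illustration, and the input is assumed to contain at least one character and one numeric variable):

```sas
data want;
    set input_dataset;
    /* These arrays only pick up variables that exist at this point, */
    /* so the helper variables created below are not included.       */
    array chars {*} _character_;
    array nums  {*} _numeric_;
    n_vars = dim(chars) + dim(nums);               /* total variables per row  */
    n_miss = cmiss(of chars{*}, of nums{*});       /* how many are missing     */
    cntrow = n_vars - n_miss;                      /* how many are populated   */
    if n_miss = 0;                                 /* keep only complete rows  */
run;
```

Remove the subsetting IF if you want to keep every row and just inspect cntrow.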

Related

creating output from proc freq in SAS

I am running the following SAS code in SAS Enterprise Guide 6.1 to get some summary stats on null/not null for all the variables in a table. This is producing the desired info via the 'results' tab, which creates a separate table for each result showing null/not null frequencies and percentages.
What I'd like to do is put the results into an output dataset with all the variables and stats in a single table.
proc format;
value $missfmt ' '='Missing' other='Not Missing';
value missfmt . ='Missing' other='Not Missing';
run;
proc freq data=mydatatable;
format _CHAR_ $missfmt.;
tables _CHAR_ / out=work.out1 missing missprint nocum;
format _NUMERIC_ missfmt.;
tables _NUMERIC_ / out=work.out2 missing missprint nocum;
run;
out1 and out2 are being generated into tables like this:
FieldName | Count | Percent
Not Missing | Not Missing | Not Missing
But each is populated with only one variable, and the frequency counts are not shown.
The table I'm trying to create as output would be:
field | Missing | Not Missing | % Missing
FieldName1 | 100 | 100 | 50
FieldName2 | 3 | 97 | 3
The output options on a TABLES statement apply only to the last table requested. _CHAR_ resolves to the list of all character variables, but each variable is a separate one-way table, so you get output only for the last one.
You can get this in one of two ways: either use PROC TABULATE, which deals more readily with lists of variables, or use ODS OUTPUT to capture the PROC FREQ output. Either way, it will likely take some work to get the output into exactly the structure you want.
ods output onewayfreqs=myfreqs; *use `ODS TRACE` to find this name if you do not know it;
proc freq data=sashelp.class;
tables _character_;
tables _numeric_;
run;
ods output close;
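One possible way to reshape the ODS output into the one-row-per-field table from the question is sketched below. It assumes the $missfmt. and missfmt. formats from the question are already defined, that the OneWayFreqs dataset has its usual layout (a Table column holding "Table <varname>" plus one F_<varname> column of formatted values per variable), and that no original variable name starts with F_. The names long, want, field, status, n_missing, n_not_missing and pct_missing are made up for illustration.

```sas
ods output onewayfreqs=myfreqs;
proc freq data=mydatatable;
    format _character_ $missfmt. _numeric_ missfmt.;
    tables _all_ / missing;
run;

/* One row per variable and status (Missing / Not Missing) */
data long;
    length field $32 status $11;
    set myfreqs;
    field  = scan(Table, 2);              /* variable name from "Table <varname>" */
    status = strip(coalescec(of F_:));    /* the one populated formatted value    */
    keep field status Frequency;
run;

/* Collapse to one row per variable */
proc sql;
    create table want as
    select field,
           sum(case when status = 'Missing'     then Frequency else 0 end) as n_missing,
           sum(case when status = 'Not Missing' then Frequency else 0 end) as n_not_missing,
           100 * calculated n_missing
               / (calculated n_missing + calculated n_not_missing) as pct_missing
    from long
    group by field;
quit;
```

Running ODS TRACE ON before the PROC FREQ step is a quick way to confirm the table name and column layout in your SAS version.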

How to count every 1000 observation in data set SAS

I have a data set sorted by some variable.
I need to take the first 1000 observations and count how many of them have field1=1, then do the same for the next 1000 observations, and so on.
How can I do it?
I hope I understand correctly what you want.
You could try a datastep like this
data result (Keep=countob obnr);
retain obnr 1000;
retain countob 0;
set mydata;
if field1=1 then
countob=countob+1;
if mod(_n_,1000) = 0 then do;
output;
obnr=obnr+1000;
countob=0;
end;
run;
this would lead to a result like this:
obnr | countob
------------
1000 | 247
2000 | 325
3000 | 198
obnr is obviously optional...
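Note that the step above only writes a row when a full block of 1000 has been read, so a trailing partial block is dropped. A minimal variation that also outputs the last, shorter block could look like this (the END= variable name last is an arbitrary choice; obnr here is simply the observation number at which the block ends):

```sas
data result(keep=obnr countob);
    set mydata end=last;
    /* Sum statement: countob is retained, starts at 0, and the */
    /* comparison evaluates to 1 when true and 0 otherwise.     */
    countob + (field1 = 1);
    if mod(_n_, 1000) = 0 or last then do;
        obnr = _n_;
        output;
        countob = 0;    /* reset for the next block */
    end;
run;
```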
Another, slightly shorter way, using the CEIL function and PROC FREQ:
data want;
set have;
thousand=ceil(_N_/1000)*1000;
run;
proc freq data=want;
tables thousand / out=want;
where field1=1;
run;

SAS Find Top Combinations in Dataset

Hello everyone,
I have some sales data which looks like this:
data have;
input order_id item $;
cards;
1 A
1 B
2 A
2 C
3 B
4 A
4 B
;
run;
What I'm trying to find out is what are the most popular combinations of items ordered. For example in the above case, there were 2 orders that contained items A&B, 1 order of A&C, and 1 order of B. What would be the best way to output the different combinations along with the numbers of orders placed?
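Since the order of items within an order does not seem to matter, you can sort the items within each order so that each combination is always concatenated the same way, then count the combinations: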
proc sort data=have;
by order_id item;
run;
data temp;
set have;
by order_id;
retain comb;
length comb $4;
comb=cats(comb,item);
if last.order_id then do;
output;
call missing(comb);
end;
run;
proc freq data=temp;
table comb/norow nopercent nocol nocum;
run;
There are many possible approaches to this problem, and I would not presume to say which is the best. Here's a fairly simple method you could use:
Transpose your data so that you only have 1 row for each order, with an indicator variable for each product.
Feed the transposed dataset into proc corr to produce a correlation matrix for the indicator variables, and look for the strongest correlations.
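The two steps above might be sketched as follows. This assumes each item appears at most once per order (PROC TRANSPOSE with an ID statement fails on duplicates within a BY group) and that item values are valid SAS variable names; the dataset names flags, wide and wide01 are made up for illustration.

```sas
/* Add a constant to transpose into indicator columns */
data flags;
    set have;
    flag = 1;
run;

proc sort data=flags;
    by order_id item;
run;

/* One row per order, one column per item (1 if ordered, missing otherwise) */
proc transpose data=flags out=wide(drop=_name_);
    by order_id;
    id item;
    var flag;
run;

/* Turn the missing indicators into 0 */
data wide01;
    set wide;
    array items {*} _numeric_;    /* also contains order_id, which is never missing */
    do i = 1 to dim(items);
        if missing(items{i}) then items{i} = 0;
    end;
    drop i;
run;

/* Correlation matrix of the item indicators */
proc corr data=wide01(drop=order_id);
run;
```

Keep in mind that pairwise correlations only describe how pairs of items move together; they do not directly give the count of each full combination.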

Make a SAS data column into a Macro variable?

How can I convert the output of a SAS data column into a macro variable?
For example:
Var1 | Var2
-----------
A | 1
B | 2
C | 3
D | 4
E | 5
What if I want a macro variable containing all of the values in Var1 to use in a PROC REG or other procedure? How can I extract that column into a variable which can be used in other PROCS?
In other words, I would want to generate the equivalent statement:
%LET Var1 =
A
B
C
D
E
;
But I will have different results coming from a previous procedure so I can't just do a '%LET'. I have been exploring SYMPUT and SYMGET, but they seem to apply only to single observations.
Thank you.
proc sql;
select var1
into :varlist separated by ' '
from have;
quit;
This creates the macro variable &varlist., with the values separated by the specified separator. If you don't specify a separator, the macro variable holds the first row's value only.
There are a lot of other ways, but this is the simplest. CALL SYMPUTX for example will do the same thing, except it's complicated to get it to pull all rows into one.
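For comparison, a data step version using CALL SYMPUTX might look like the sketch below. The helper variable list and its length of 32767 are arbitrary choices; have and var1 are the dataset and column from the question.

```sas
data _null_;
    set have end=last;
    length list $32767;
    retain list;
    list = catx(' ', list, var1);                  /* append this row's value        */
    if last then call symputx('varlist', list);    /* store once, after the last row */
run;

%put varlist=&varlist;
```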
You can use it in a proc directly, no need for a macro variable. I used numeric values for your var1 for simplicity, but you get the idea.
data test;
input var1 var2 @@; /* trailing @@ reads several observations from one line */
datalines;
1 100 2 200 3 300 4 400 5 500
;
run;
proc reg data=TEST;
MODEL VAR1 = VAR2;
RUN;

How to create a new variable in SAS by extracting part of the value of an existing numeric variable?

I have two datasets in SAS that I would like to merge, but they have no common variables. One dataset has a "subject_id" variable, while the other has a "mom_subject_id" variable. Both of these variables are 9-digit codes that have just 3 digits in the middle of the code with common meaning, and that's what I need to match the two datasets on when I merge them.
What I'd like to do is create a new common variable in each dataset that is just the 3 digits from within the subject ID. Those 3 digits will always be in the same location within the 9-digit subject ID, so I'm wondering if there's a way to extract those 3 digits from the variable to make a new variable.
Thanks!
SQL (using the sample data from the Data Step code below):
proc sql;
create table want2 as
select a.subject_id, a.other, b.mom_subject_id, b.misc
from have1 a JOIN have2 b
on(substr(a.subject_id,4,3)=substr(b.mom_subject_id,4,3));
quit;
Data Step:
data have1;
length subject_id $9;
input subject_id $ other $;
datalines;
abc001def other1
abc002def other2
abc003def other3
abc004def other4
abc005def other5
;
data have2;
length mom_subject_id $9;
input mom_subject_id $ misc $;
datalines;
ghi001jkl misc1
ghi003jkl misc3
ghi005jkl misc5
;
data have1;
length id $3;
set have1;
id=substr(subject_id,4,3);
run;
data have2;
length id $3;
set have2;
id=substr(mom_subject_id,4,3);
run;
Proc sort data=have1;
by id;
run;
Proc sort data=have2;
by id;
run;
data work.want;
merge have1(in=a) have2(in=b);
by id;
if a and b; /* keep only ids present in both datasets, like the SQL inner join */
run;
An alternative, if you are comfortable with SQL, is to use PROC SQL with a join on substr(), as shown above.
If your "subject_id" variable is numeric, the substr function won't work as expected: SAS implicitly converts the number to a character string using the BEST12. format, which right-aligns the value and so pads it with leading spaces, shifting the digit positions.
You can use the modulus function mod(input, base) which returns the remainder when input is divided by base.
/*First get rid of the last 3 digits*/
temp_var = floor( subject_id / 1000);
/* then get the next three digits that we want*/
id = mod(temp_var ,1000);
Or in one line:
id = mod(floor(subject_id / 1000), 1000);
Then you can continue with sorting the new data sets by id and then merging.
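A small self-contained check of the numeric approach, assuming the three digits of interest sit in positions 4 to 6 of a 9-digit numeric ID (the dataset names nums and nums_keyed and the variable id_char are made up for illustration):

```sas
data nums;
    input subject_id;
    datalines;
123456789
987654321
;
run;

data nums_keyed;
    set nums;
    /* Arithmetic approach: drop the last 3 digits, then keep the next 3 */
    id = mod(floor(subject_id / 1000), 1000);
    /* Character approach: Z9. gives a fixed-width 9-character string,   */
    /* so substr positions are reliable                                  */
    id_char = substr(put(subject_id, z9.), 4, 3);
    put subject_id= id= id_char=;
run;
```

The two results differ in type: id is numeric, so leading zeros in the middle digits are lost, while id_char keeps them. Use the same type in both datasets before merging.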