Is there a way to delete 0's from a dataset? - sas

Suppose I want to apply proc means or the better means macro only to the non-zero entries in my dataset. Is there an easy option to do this? If I have a dataset:
A B C
0 1 2
2 2 0
2 0 1
How can I use proc means or the better means macro to ignore the 0 values?

You can create a view that converts the zeros to missing values on the fly; PROC MEANS ignores missing values. BETTERMEANS may have a way of handling this; not sure.
data have;
input A B C ;
datalines;
0 1 2
2 2 0
2 0 1
;;;;
run;
data have_z/view=have_z;
set have;
array num _numeric_;
* Recode every zero as missing so PROC MEANS skips it;
do _i = 1 to dim(num);
if num[_i]=0 then num[_i]=.;
end;
drop _i;
run;
proc means data=have_z;
var a b c;
run;
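As a possible alternative that avoids the view, the zeros can be turned into missing values inside PROC SQL itself. This is only a sketch, assuming plain means are all that is needed; the column names mean_a, mean_b and mean_c are made up for illustration.

```sas
proc sql;
  /* IFN returns missing when the value is 0, and MEAN skips missing values */
  select mean(ifn(a=0, ., a)) as mean_a,
         mean(ifn(b=0, ., b)) as mean_b,
         mean(ifn(c=0, ., c)) as mean_c
  from have;
quit;
```

For the three sample rows this should give 2, 1.5 and 1.5.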

How to recode values of a variable based on the maximum value in the variable, for hundreds of variables?

I want to recode each variable as 1 where it equals its maximum value and 0 where it does not. For each variable, there may be multiple observations with the max value. The max value for each variable is not fixed, i.e. from cycle to cycle it may change. And there are hundreds of variables, so I cannot hard-code anything.
The final product would have the same dimensions as the original table, i.e. equal number of rows and columns as a matrix of 0s and 1s.
This is within SAS. I attempted to calculate the max of each variable and then append these max as a new observation into the data. Then comparing down the column of each variable against the "max" observation... looking into examples of the following did not help:
SQL
Array in datastep
proc transpose
formatting
Any insight would be much appreciated.
Here is a version done with SQL:
The idea is to first calculate the maximum for each variable (the second subquery, b). We then join those maxima back to the original data, and the CASE expression in the outer select sets the flag to 1 or 0.
data begin;
input var value;
cards;
1 1
1 2
1 3
1 2.5
1 1.7
1 3
2 34
2 33
2 33
2 33.7
2 34
2 34
; run;
proc sql;
create table result as
select a.var, a.value, case when a.value = b.maximum then 1 else 0 end as is_max from
(select * from begin) a
left join
(select max(value) as maximum, var from begin group by var) b
on a.var = b.var
;
quit;
To avoid hard-coding you need to use some code generation.
First let's figure out what code you could use to solve the problem. Later we can look into ways to generate that code.
It is probably easiest to do this with PROC SQL code. SAS will allow you to reference the MAX() value of a variable. Also note that SAS evaluates boolean expressions to 1 (TRUE) or 0 (FALSE). So you just want to generate code like:
proc sql;
create table want as
select var1=max(var1) as var1
, var2=max(var2) as var2
from have
;
quit;
To generate the code you need a list of the variables in your source dataset. You can get those with PROC CONTENTS but also with the metadata table (view) DICTIONARY.COLUMNS (also accessible as SASHELP.VCOLUMN from outside PROC SQL).
If the list of variables is small then you could generate the code into a single macro variable.
proc sql noprint;
select catx(' ',cats(name,'=max(',name,')'),'as',name)
into :varlist separated by ','
from dictionary.columns
where libname='WORK' and memname='HAVE'
order by varnum
;
create table want as
select &varlist
from have
;
quit;
The maximum number of characters that will fit into a macro variable is 64K (65,534), which is long enough for about 2,000 variables with names of 8 characters each.
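If you are unsure whether the generated list fits, its length can be checked before it is used. A minimal sketch, assuming the macro variable VARLIST created by the query above; %SUPERQ keeps the commas in the list from being read as argument separators.

```sas
/* Write the length of the generated select list to the log */
%put NOTE: varlist holds %length(%superq(varlist)) characters.;
```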
Here is a slightly more complex way that uses PROC SUMMARY and a data step with a temporary array. It does not really need any code generation.
%let dsin=sashelp.class(obs=10);
%let dsout=want;
%let varlist=_numeric_;
proc summary data=&dsin nway ;
var &varlist;
output out=summary(drop=_type_ _freq_) max= ;
run;
data &dsout;
if 0 then set &dsin;
array vars &varlist;
array max [10000] _temporary_;
if _n_=1 then do;
set summary ;
do _n_=1 to dim(vars);
max[_n_]=vars[_n_];
end;
end;
set &dsin;
do _n_=1 to dim(vars);
vars[_n_]=vars[_n_]=max[_n_];
end;
run;
Results:
Obs Name Sex Age Height Weight
1 Alfred M 0 1 1
2 Alice F 0 0 0
3 Barbara F 0 0 0
4 Carol F 0 0 0
5 Henry M 0 0 0
6 James M 0 0 0
7 Jane F 0 0 0
8 Janet F 1 0 1
9 Jeffrey M 0 0 0
10 John M 0 0 0

Make a string in SAS from one column

I have a dataset in SAS and I want to convert one column into a string by product. I have attached an image of the input and the required output.
I need the column STRING in the output. Can anyone please help me?
I have coded a data step to create the input data:
data have;
input products $
dates
value
;
datalines;
a 1 0
a 2 0
a 3 1
a 4 0
a 5 1
a 6 1
b 1 0
b 2 1
b 3 1
b 4 1
b 5 0
b 6 0
c 1 1
c 2 0
c 3 1
c 4 1
c 5 0
c 6 1
;
Does the following suggested solution give you what you want?:
data want;
length string $ 20;
do until(last.products);
set have;
by products;
string = catx(',',string,value);
end;
do until(last.products);
set have;
by products;
output;
end;
run;
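To check the result, one row per product can be pulled out of WANT. This is a sketch only; the dataset name CHECK is an assumption.

```sas
/* Keep one row per product to inspect the concatenated string */
proc sort data=want out=check(keep=products string) nodupkey;
  by products;
run;

proc print data=check noobs;
run;
```

With the sample data, the string for product a should read 0,0,1,0,1,1.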
Here's my quick solution.
data temp;
length cat $20;
do until (last.products);
set have;
by products notsorted;
cat=catx(',',cat,value);
end;
drop value dates;
run;
proc sql;
create table want as
select have.*, cat as string
from have inner join temp
on have.products=temp.products;
quit;

SAS Adjacency Matrix Creation

I have created this table:
And from this I want to create an adjacency matrix which shows how many employee_id's the tables share. It would look like this (I think):
I'm not sure if I'm going about this the correct way. I think I may be doing it wrong. I know that this is probably easier if I have more SAS products but I only have the basic SAS enterprise guide to work with.
I really appreciate the help. Thank you.
Here's another way using PROC CORR that's still better than the solution above. And you don't need to filter: you simply list the variables you want in the VAR statement of PROC CORR.
data id;
input id:$4. human alien wizard;
cards;
1005 1 1 0
1018 0 0 1
1022 0 0 1
1024 1 0 0
1034 0 1 0
1069 0 1 0
1078 1 0 0
1247 1 1 1
;;;;
run;
ods output sscp=want;
proc corr data=id sscp ;
var human alien wizard;
run;
proc print data=want;
format _numeric_ 8.;
run;
Results are:
Obs Variable human alien wizard
1 human 4 2 1
2 alien 2 4 1
3 wizard 1 1 3
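The same counts can also be obtained as a matrix product: the transpose of the 0/1 indicator matrix times the matrix itself, since entry (i,j) then counts the rows in which both indicators are 1. A sketch, assuming SAS/IML is licensed (which the question suggests may not be the case); the matrix names x and adj are made up.

```sas
proc iml;
  use id;
  read all var {human alien wizard} into x[colname=names];
  close id;
  /* Diagonal: ids per table; off-diagonal: ids shared by two tables */
  adj = x` * x;
  print adj[rowname=names colname=names];
quit;
```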
I think this is what you want but it does not give the thing you show as answer.
data id;
input id:$4. human alien wizard;
cards;
1005 1 1 0
1018 0 0 1
1022 0 0 1
1024 1 0 0
1034 0 1 0
1069 0 1 0
1078 1 0 0
1247 1 1 1
;;;;
run;
proc corr noprint nocorr sscp out=sscp;
var human alien wizard;
run;
proc print;
run;
I was able to get the answer using this, although it does not include the last cell I wanted (human_alien_wizard):
proc transpose data=FULL_JOIN_ALL3 out=FULL_JOIN_ALL3_v2;
by employee_id;
var human_table alien_table wizard_table;
run;
proc sql;
create table FULL_JOIN_ALL3_v3 as
select distinct a._name_ as anm,b._name_ as bnm,
count(distinct case when a.col1=1 and b.col1=1 then a.employee_id else . end) as smalln
from FULL_JOIN_ALL3_v2 a, FULL_JOIN_ALL3_v2 b
where a.employee_id=b.employee_id
group by anm,bnm
;
quit;
proc tabulate data=FULL_JOIN_ALL3_v3;
class anm bnm;
var smalln;
table anm='',bnm=''*smalln=''*sum=''*f=best3. / rts=5;
run;

Count number of 0 values

Similar to here, I can count the number of missing observations:
data dataset;
input a b c;
cards;
1 2 3
0 1 0
0 0 0
7 6 .
. 3 0
0 0 .
;
run;
proc means data=dataset NMISS N;
run;
But how can I also count the number of observations that are 0?
If you want to count the number of observations that are 0, you'd want to use proc tabulate or proc freq, and do a frequency count.
If you have a lot of values and you just want "0/not 0", that's easy to do with a format.
data have;
input a b c;
cards;
1 2 3
0 1 0
0 0 0
7 6 .
. 3 0
0 0 .
;
run;
proc format;
value zerof
0='Zero'
.='Missing'
other='Not Zero';
run;
proc freq data=have;
format _numeric_ zerof.;
tables _numeric_/missing;
run;
Something along those lines. Obviously be careful about _numeric_ as that's all numeric variables and could get messy quickly if you have a lot of them...
I add this as an additional answer. It requires you to have PROC IML.
This uses matrix manipulation to do the count.
(ds=0) -- creates a matrix of 0/1 (false/true) flags marking the values equal to 0
[+,] -- sums down the rows of each column. With 0/1 values, this is the number of zeros in each column.
` -- the backtick is the transpose operator.
|| -- horizontal concatenation of matrices: {0} || {1} = {0 1}
Then we just print the values.
proc iml;
use dataset;
read all var _num_ into ds[colname=names];
close dataset;
ds2 = ((ds=0)[+,])`;
n = nrow(ds);
ds2 = ds2 || repeat(n,ncol(ds),1);
cnames = {"N = 0", "Count"};
mattrib ds2 rowname=names colname=cnames;
print ds2;
quit;
Easiest to use PROC SQL. You will have to use a UNION to replicate the MEANS output.
Each subquery in the first FROM clause counts the 0 values for one variable, and UNION stacks them up.
The last subquery just counts the number of observations in DATASET.
proc sql;
select n0.Variable,
n0.N_0 label="Number 0",
n.count as N
from (
select "A" as Variable,
count(a) as N_0
from dataset
where a=0
UNION
select "B" as Variable,
count(b) as N_0
from dataset
where b=0
UNION
select "C" as Variable,
count(c) as N_0
from dataset
where c=0
) as n0,
(
select count(*) as count
from dataset
) as n;
quit;
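Because SAS evaluates a comparison to 1 or 0, a shorter query is possible when a single row of counts is acceptable. A sketch; the column names a_zero, b_zero, c_zero and n_obs are assumptions.

```sas
proc sql;
  /* (a=0) is 1 for a zero and 0 otherwise, including for missing values */
  select sum(a=0) as a_zero,
         sum(b=0) as b_zero,
         sum(c=0) as c_zero,
         count(*) as n_obs
  from dataset;
quit;
```

For the sample data this should return 3, 2, 3 and 6.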
There is also the NLEVELS option in PROC FREQ that you could use.
proc freq data=dataset nlevels;
table _numeric_;
run;

How to easily reformat dataset in SAS

Suppose the data are as follows:
A B C
1 3 2
1 4 9
2 6 0
2 7 3
where A B and C are the variable names.
Is there a way to transform the table to
A 1
A 1
A 2
A 2
B 3
B 4
B 6
B 7
C 2
C 9
C 0
C 3
Expanding on the advice from @donPablo, here's how you would code it. Create an array to read across the variables, then output one row for each element of that array, so the result has rows * columns as many rows as the original dataset. The VNAME function enables you to store the variable name (A, B, C) as a value in a separate variable.
data have;
input A B C;
datalines;
1 3 2
1 4 9
2 6 0
2 7 3
;
run;
data want;
set have;
length var1 $10;
array vars{*} _numeric_;
do i=1 to dim(vars);
var1=vname(vars{i});
var2=vars{i};
keep var1 var2;
output;
end;
run;
proc sort data=want;
by var1;
run;
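PROC TRANSPOSE can do the same reshaping without an array. This is a sketch under the assumption that a helper row identifier is acceptable; the names HAVE_ID, LONG and ROW are made up for illustration.

```sas
/* Add a row identifier so PROC TRANSPOSE can process one row at a time */
data have_id;
  set have;
  row = _n_;
run;

proc transpose data=have_id
               out=long(drop=row rename=(_name_=var1 col1=var2));
  by row;
  var A B C;
run;

/* Group the rows by variable name, as in the requested layout */
proc sort data=long;
  by var1;
run;
```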
The least amount of (expensive) development time might be --
Read and store the first row
For each subsequent row
Read the row
Create three records
Until end
Sort
How many times will this be run? Per day/ per year?
What number of rows are there?
Might we save 1 hr / month? 1 min / year? Something will need to read the entire file. Optimize last. Make it work first.
tkx
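It should work correctly:
* Build one dataset per variable, with the variable name and its value;
DATA A(keep=new_var value);
SET your_data(rename=(A=value));
new_var = 'A';
RUN;
DATA B(keep=new_var value);
SET your_data(rename=(B=value));
new_var = 'B';
RUN;
DATA C(keep=new_var value);
SET your_data(rename=(C=value));
new_var = 'C';
RUN;
* Stack B and C underneath A;
PROC APPEND base=A data=B;
RUN;
PROC APPEND base=A data=C;
RUN;
Dataset A is the result.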