Let's say I have data that look like this:
DATA temp;
INPUT id a1 b2 d1 f8;
DATALINES;
1 2.3 2.1 4.2 1.2
2 5.3 2.3 1.5 3.2
3 1.2 5.4 6.6 6.6
;
run;
What I want to do is use the data and set statements to say that if the values in a1 and f8 are less than the means of a1 and f8 (respectively), then those values are missing. So the resulting dataset would look like:
id a1 b2 d1 f8
1 . 2.1 4.2 .
2 5.3 2.3 1.5 .
3 . 5.4 6.6 6.6
Any tips for how I would start on this? I'm new to SAS and the examples in the manuals have not been very helpful. I had been thinking of something like this (but it doesn't work):
DATA temp2;
SET temp;
IF a1 < mean(a1) THEN a1=.;
IF f8 < mean(f8) THEN f8=.;
RUN;
The SAS implementation of SQL can automatically remerge group-wise or table-wise aggregates back onto each row of a result set.
Proc SQL;
  /* SAS remerges mean(a1) and mean(f8) onto every row (the log notes the remerge) */
  create table want as
  select
    case when a1 < mean(a1) then . else a1 end as a1,
    b2,
    d1,
    case when f8 < mean(f8) then . else f8 end as f8
  from have;
quit;
A solution that uses DATA step will need to precompute data set statistics, commonly with a procedure such as MEANS, SUMMARY or UNIVARIATE.
proc means noprint data=have;
output out=have_means mean(a1 f8)= / autoname;
run;
data want;
if _n_ = 1 then do;
set have_means(keep=a1_mean f8_mean);
end;
set have;
if a1 < a1_mean then a1 = .;
if f8 < f8_mean then f8 = .;
drop a1_mean f8_mean;
run;
Other techniques can update a data set in place, using SQL UPDATE or DATA step MODIFY.
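As a minimal sketch of the in-place route with SQL UPDATE (MODIFY would be the DATA step counterpart), assuming the table is the HAVE data set used above and that its means are first stored in two macro variables whose names, MEAN_A1 and MEAN_F8, are made up for this example. FORMAT=BEST32. is an assumed choice to keep more digits than the default numeric-to-text conversion:

```sas
/* Sample data from the question, named HAVE as in the answer above */
data have;
  input id a1 b2 d1 f8;
  datalines;
1 2.3 2.1 4.2 1.2
2 5.3 2.3 1.5 3.2
3 1.2 5.4 6.6 6.6
;
run;

proc sql noprint;
  /* Step 1: store the column means as text in (hypothetical) macro variables */
  select mean(a1) format=best32., mean(f8) format=best32.
    into :mean_a1 trimmed, :mean_f8 trimmed
    from have;

  /* Step 2: overwrite below-mean values in place; no new table is created */
  update have
    set a1 = case when a1 < &mean_a1 then . else a1 end,
        f8 = case when f8 < &mean_f8 then . else f8 end;
quit;
```

PROC SQL runs each statement as soon as it is read, so the macro variables are already set when the UPDATE statement is processed. Because the means pass through text, values extremely close to the mean could compare differently than with the remerge query.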
In a DATA step, the MEAN function is applied across its arguments within a row, not down a column, which is why you are not getting the results you expect: with a single argument, mean(a1) just returns a1, so a1 < mean(a1) is never true. @Richard's answer is perfect. To get a column mean in a DATA step, you need a DOW loop and then combine its result with the main dataset. It is much easier to use PROC SUMMARY, as @Richard explains.
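data temp2_intial(keep= mean_a1 mean_f8);
  /* DOW loop: read every observation of TEMP in a single iteration */
  do until(eof);
    set temp end=eof;
    tot_a1 = sum(tot_a1, a1);
    /* count only nonmissing values, as the MEAN function does */
    if not missing(a1) then cnt_a1 = sum(cnt_a1, 1);
    tot_f8 = sum(tot_f8, f8);
    if not missing(f8) then cnt_f8 = sum(cnt_f8, 1);
  end;
  /* compute the means once, after the last observation */
  mean_a1 = tot_a1 / cnt_a1;
  mean_f8 = tot_f8 / cnt_f8;
run;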
data temp2(drop= mean_a1 mean_f8);
set temp ;
if _n_ =1 then set temp2_intial;
IF a1 < mean_a1 THEN a1=. ;
IF f8 < mean_f8 THEN f8=.;
run;
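To see the row-wise behaviour described at the start of this answer, here is a small DATA _NULL_ step. The four values passed to MEAN are the a1, b2, d1 and f8 values of id=1 from the question; ROW_MEAN and SAME are names chosen just for illustration:

```sas
data _null_;
  /* MEAN averages its arguments, i.e. values within one observation */
  row_mean = mean(2.3, 2.1, 4.2, 1.2);
  /* With a single argument it simply returns that argument,
     so IF a1 < mean(a1) compares a1 with itself and is never true */
  a1 = 2.3;
  same = mean(a1);
  put row_mean= same=;
run;
```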
I have a table that contains one key ID and 100 variables (x1, x2, x3, ..., x100), and I need to check every variable for any values stored as -9999, -8888, -7777 or -6666.
For one variable I use:
proc sql;
select keyid, x1
from mytable
where x1 in(-9999,-8888,-7777,-6666);
quit;
This gives me the data I am trying to get, but only for one variable.
I do not have time to copy and paste all 100 variables into this basic query.
I have searched the forum, but the answers I have found are a bit far from what I actually need,
and since I am new to SAS I cannot write a macro.
Can you help me, please?
Thanks.
Try this. Just made up some sample data that resembles what you describe :-)
data have;
do key = 1 to 1e5;
array x x1 - x100;
do over x;
x = rand('integer', -10000, -5000);
end;
output;
end;
run;
data want;
set have;
array x x1 - x100;
do over x;
if x in (-9999, -8888, -7777, -6666) then do;
output;
leave;
end;
end;
run;
Don't use SQL. Instead use normal SAS code so you can take advantage of SAS syntax like ARRAYs and variable lists.
So make an array containing the variables you want to look at. Then loop over the array. There is no need to keep looking once you find a match.
data want;
set mytable;
array list var1 varb another_var x1-x10 Z: ;
found=0;
do index=1 to dim(list) until (found);
found = ( list[index] in (-9999 -8888 -7777 -6666) );
end;
if found;
run;
And if you want to search all of the numeric variables you can even use the special variable list _NUMERIC_ when defining the array:
array list _numeric_;
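Putting the pieces together, here is a sketch that also records which variable matched, using VNAME(). The MYTABLE rows and the FOUND_VAR column are made up for illustration; note that a numeric key such as KEYID is also part of _NUMERIC_, so it is searched too:

```sas
/* Hypothetical sample data: a key plus five variables */
data mytable;
  input keyid x1-x5;
  datalines;
1 3 -9999 4 5 6
2 1 2 3 4 5
3 -6666 0 0 0 -7777
;
run;

data want;
  set mytable;
  /* defined before FOUND and INDEX exist, so it holds KEYID and X1-X5 only */
  array list _numeric_;
  length found_var $32;
  found = 0;
  do index = 1 to dim(list) until (found);
    found = (list[index] in (-9999, -8888, -7777, -6666));
    /* remember the name of the first variable holding a special code */
    if found then found_var = vname(list[index]);
  end;
  if found;
  drop index;
run;
```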
Thank you for your help. I have found a solution and wanted to share it with you.
It has some points that need to be evaluated, but it is fine for me for now (it gets the job done).
%LET LIB = 'LIBRARY';
%LET MEM = 'GIVENTABLE';
%PUT &LIB &MEM;
PROC SQL;
SELECT
NAME INTO :VARLIST SEPARATED BY ' '
FROM DICTIONARY.COLUMNS
WHERE
LIBNAME=&LIB
AND
MEMNAME=&MEM
AND
TYPE='num';
QUIT;
%PUT &VARLIST;
%MACRO COUNTS(INPUT);
%LOCAL i NEXT_VAR;
%DO i=1 %TO %SYSFUNC(COUNTW(&VARLIST));
%LET NEXT_VAR = %SCAN(&VARLIST, &i);
PROC SQL;
CREATE TABLE &NEXT_VAR AS
SELECT
COUNT(ID) AS NUMBEROFDESIREDVALUES
FROM &INPUT
WHERE
&NEXT_VAR IN (6666, 7777, 8888, 9999)
GROUP BY
&NEXT_VAR;
QUIT;
%END;
%MEND;
%COUNTS(GIVENTABLE);
The answer you provided to your own question gives more insight into what you really wanted. However, the solution you offered, while it works, is not very efficient. The SQL statement runs 100 times, once for each variable in the source data, which means the source table is read 100 times. Another problem is that it creates 100 output tables. Why?
A better solution is to create 1 table that contains the counts for each of the 100 variables. Even better is to do it in 1 pass of the source data instead of 100.
data sum;
set have end=eof;
array x(*) x:;
array csum(100) _temporary_;
do i = 1 to dim(x);
x(i) = (x(i) in (-9999, -8888, -7777, -6666)); * flag (0 or 1) those meeting criteria;
csum(i) + x(i); * cumulative count;
if eof then do;
x(i) = csum(i); * move the final total to the orig variable;
end;
end;
if eof then output; * only output the final obs which has the totals;
drop key i;
run;
Partial result:
x1 x2 x3 x4 x5 x6 x7 x8 ...
90 84 88 85 81 83 59 71 ...
You can keep it in that form or you can transpose it.
proc transpose data=sum out=want (rename=(col1=counts))
name=variable;
run;
Partial result:
variable counts
x1 90
x2 84
x3 88
x4 85
x5 81
... ...
I just started using SAS and I'm trying to combine columns.
I've got a table mainData:
A1 A2 A3 A4
1 4 7 10
2 5 8 11
3 6 9 12
I want to create a new table rearrangedData
Type Value
A1 1
A1 2
A1 3
A2 4
A2 5
A2 6
A3 7
A3 8
A3 9
A4 10
A4 11
A4 12
There must be a simple solution to this, but I just can't figure it out. I'm thinking of writing a DO loop, but what if I don't know the size of the table or the number of rows in a specific column? I can't figure out how I would get that information in SAS.
This somewhat unusual transformation can be done via a transpose and some array logic:
data have;
input A1 A2 A3 A4;
cards;
1 4 7 10
2 5 8 11
3 6 9 12
;
run;
proc transpose data = have out = tr name=type prefix = r;
run;
data want;
set tr;
array r{*} r:;
do i = 1 to dim(r);
value = r[i];
output;
end;
drop i r:;
run;
Also, this preserves the original order without requiring a sort.
Make a dummy variable, then transpose the data.
data have;
set have;
id=_n_;
run;
proc transpose data=have out=temp;
by id;
var A1-A4;
run;
proc sort data=temp out=want(rename=(_name_=type col1=value) drop=id);
by _name_;
run;
If you want to preserve the original order then you could use the POINT= option on the SET statement to loop over the data set once per variable (column).
So this data step reads the first observation just to get the variables defined. It then defines the array VALUES so that DIM(VALUES) gives the number of columns. It uses the POINT= and NOBS= options on the SET statement to control the inner loop over the rows, and the VNAME() function to find the name of the current variable in the array.
data want ;
set have ;
array values _numeric_;
do col=1 to dim(values);
length type $32 value 8;
type=vname(values(col));
do row=1 to nobs ;
set have point=row nobs=nobs ;
value=values(col);
output;
keep type value;
end;
end;
stop;
run;
For example, I have a data set like this (the values a1, a2, a3, b1, b2, b3 are numeric):
A B
a1 b1
a2 b2
a3 b3
I want to compare the means of the two classes A and B using PROC TTEST. But it seems that I have to change my data set in order to use this procedure. I have read lots of tutorials about PROC TTEST, and all of them use data sets in the form below:
class value
A a1
A a2
A a3
B b1
B b2
B b3
So my question is: is there a way to run PROC TTEST without changing my data set?
Thank you, and sorry for my bad English :D
The short answer is no: you can't run a two-sample (unpaired) t-test in SAS that compares the two columns directly. PROC TTEST, when used for two samples, relies on the variable in the CLASS statement to define the groups. Only one variable can be entered and it must have exactly two levels, so the structure of your data is not compatible with this.
You will therefore need to change the data layout, although you could do this in a view so that you don't create a new physical dataset. Here's one way to do that.
/* create dummy data */
data have;
input A B;
datalines;
10 11
15 14
20 21
25 24
;
run;
/* create a view that turns vars A and B into a single variable */
data have_trans / view=have_trans;
set have;
array vals{2} A B;
length grouping $2;
do i = 1 to 2;
grouping = vname(vals{i}); /* extracts the current variable name (A or B) */
value = vals{i}; /* extracts the current value */
output;
end;
drop A B i; /* drop unwanted variables */
run;
/* perform ttest */
proc ttest data=have_trans;
class grouping;
var value;
run;
data a1
col1 col2 flag
a 2 .
b 3 .
a 4 .
c 1 .
For data a1, flag is always missing. I want to update multiple rows using a2.
data a2
col1 flag
a 1
Ideal output:
col1 col2 flag
a 2 1
b 3 .
a 4 1
c 1 .
But this doesn't update all the records that match on the BY variable:
data a1;
modify a1 a2;
by col1;
run;
Edit:
a1 is actually a very large data set on a server, so I'd prefer to modify it in place (if possible) instead of creating a new one. Otherwise I have to drop the previous a1 first and copy a new a1 from local to the server, which takes much more time.
If you want to do this with MODIFY, you have to loop over the MODIFY data set in some fashion, or it will only replace the first matching row. Normally this behaves like a merge: once it finds a match, it advances to the next transaction record, and the other data set then runs out of records. Here's one option; there are others.
data a1(index=(col1));
input col1 $ col2 flag;
datalines;
a 2 .
b 3 .
a 4 .
c 1 .
;;;;
run;
data a2(index=(col1));
col1='a';
flag=1;
run;
data a1;
set a2(rename=flag=flag2);
do _n_ = 1 to nobs_a1;
modify a1 key=col1 nobs=nobs_a1;
if _iorc_=0 then do;
flag=flag2;
replace;
end;
end;
if _iorc_=%sysrc(_DSENOM) then _error_=0;
run;
If you're avoiding the MERGE statement because of the sorting it requires, you can simply change your merging approach.
If flag in A1 is always missing, you can drop it; otherwise, you should temporarily rename it so you don't lose that information.
Here I merge A1 and A2 using a hash object; this approach doesn't require any prior sorting of the datasets.
data final_merged(drop = finder);
length flag 8.; /*please change length with the real one, use $ if char*/
if _N_ = 1 then do;
declare hash merger(dataset:'A2');
merger.definekey('col1');
merger.DefineData ('flag');
merger.definedone();
end;
set A1(drop=flag);
finder = merger.find();
if finder ne 0 then flag = .;
/*then flag='' or then flag='unknown' as you want if flag is a character var*/
run;
Please let me know if this helps.
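For the case where flag in A1 is not always missing, here is a sketch of the rename approach mentioned above: A1's own flag is kept as FLAG_ORIG (a name made up for this example) and used whenever A2 has no matching key. The sample data are hypothetical, with an existing flag of 5 on the b row:

```sas
data a1;
  input col1 $ col2 flag;
  datalines;
a 2 .
b 3 5
a 4 .
c 1 .
;
run;

data a2;
  input col1 $ flag;
  datalines;
a 1
;
run;

data final_merged(drop=finder flag_orig);
  length flag 8;                 /* defines FLAG in the PDV before the hash is declared */
  if _n_ = 1 then do;
    declare hash merger(dataset:'a2');
    merger.definekey('col1');
    merger.definedata('flag');
    merger.definedone();
  end;
  set a1(rename=(flag=flag_orig)); /* keep A1's own flag under another name */
  finder = merger.find();
  /* no match in A2: fall back to the value A1 already had */
  if finder ne 0 then flag = flag_orig;
run;
```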
You could do the following, but SQL may reorder the observations, so I'm not sure how useful this would be for you. (You could always preprocess with ordvar=_n_; and then add an ORDER BY on it to the SQL statement if that helps.)
Data:
data a1 ;
input col1 $ col2 flag ;
cards ;
a 2 .
b 3 .
a 4 .
c 1 .
;run ;
data a2 ;
input col1 $ flag ;
cards ;
a 1
;run ;
Merge:
proc sql ;
create table output as
select a.col1, a.col2, b.flag
from a1 a
left join
a2 b
on a.col1=b.col1
;quit ;
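The ordvar=_n_ idea from the note above could look like this sketch; ORDVAR and the intermediate table A1_ORD are hypothetical names. SAS accepts ORDER BY on a column that is not in the SELECT list (it writes a NOTE to the log), so the helper column does not end up in OUTPUT:

```sas
data a1;
  input col1 $ col2 flag;
  datalines;
a 2 .
b 3 .
a 4 .
c 1 .
;
run;

data a2;
  input col1 $ flag;
  datalines;
a 1
;
run;

/* remember each row's original position */
data a1_ord;
  set a1;
  ordvar = _n_;
run;

proc sql;
  create table output as
  select a.col1, a.col2, b.flag
  from a1_ord a
  left join a2 b
  on a.col1 = b.col1
  order by a.ordvar;   /* restores the original row order */
quit;
```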
To try to do it in one pass, how about creating two macro variables containing the mapping from a2?
proc sql ;
select distinct col1, flag
into :colvals separated by '', :flagvals separated by ''
from a2
;quit ;
Then set flag to the character of &flagvals at the position where col1 is found in &colvals (this only works because every col1 and flag value is a single character):
data a1 ;
set a1 ;
if findc("&colvals",col1) then
flag=input(substr("&flagvals", findc("&colvals",col1),1),8.) ;
run ;
We can make macro variables via the SAS SQL Procedure, using the syntax
select var into :mvar
But I wonder whether there's a way to do the same in a DATA step.
I have a dataset.
A B
=== ===
a1 b1
a2 b2
a3 b3
I can make a macro variable called MA with the below statement.
proc sql noprint;
select "'"||A||"'" into :MA separated by ","
from dataset;
quit;
How to do it in Data-step?
Firstly, creating your sample dataset:
data dataset;
infile datalines;
input A $ B $;
datalines;
a1 b1
a2 b2
a3 b3
;
run;
The step below almost does what your PROC SQL does (it does not add the single quotes around each value), using CALL SYMPUTX to create a macro variable called MA:
data _NULL_;
retain amac;
length amac $35;
set dataset;
if _N_ = 1 then amac=a; else amac=cats(amac,',',a);
put amac=;
call symputx('MA',amac);
run;
%put &MA;
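To reproduce the PROC SQL result exactly, each value also needs its single quotes. A minimal sketch, assuming 200 characters (an arbitrary choice) are enough for the whole list, which calls CALL SYMPUTX only once, on the last observation:

```sas
data dataset;
  input A $ B $;
  datalines;
a1 b1
a2 b2
a3 b3
;
run;

data _null_;
  set dataset end=eof;
  length amac $200;   /* assumed long enough for the full list */
  retain amac;
  /* CATX skips the blank AMAC on the first row, so there is no leading comma */
  amac = catx(',', amac, cats("'", A, "'"));
  if eof then call symputx('MA', amac);
run;

%put &MA;
```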