My question is related with below topic:
SAS check field by field.
I'm searching method which set (put) to string/variable name of columns if field has value of 0.
Is there an elegant method for the?
Best regards!
The VNAME function will return the name of the variable corresponding to an array reference.
data Have;
input REFERENCE_DATE
L_CONTRACT
L_CONTRACT_ACTIVITY
L_LFC
L_CONTRACT_CO_CUSTOMER
L_CONTRACT_OBJECT
L_CUSTOMER
L_CUSTOMER_RETAIL
L_DPD
L_GL_ACCOUNT
L_GL_AMOUNT
L_EXTRA_COST
L_PRODUCT;
datalines;
450 1 9 8 6 0 4 3 0 0 0 0 0
;
data want;
length vars_with_zero $1000;
set have;
array L L_CONTRACT -- L_CUSTOMER_RETAIL;
* accumulate the names of the variables that have a zero value;
do _n_ = 1 to dim(L);
if L(_n_) = 0 then vars_with_zero = catx(' ', vars_with_zero, vname(L(_n_)));
end;
run;
Related
In the Data Step of SAS, you get value of a Column by directly using its name, for example, like this,
name = col1;
But for some reason, I want to get value of a column where column is represented by a string. For example, like this,
name = get_value_of_column(cats("col", i))
Is this possible? And if so, how?
The DATA Step functions VVALUE and VVALUEX will return the formatted value of a variable.
VVALUE(<variable-name>) static, a step compilation time interaction
VVALUEX(<expression>) dynamic, a runtime expression resolving to a variable name
The actual value of the variable can be dynamically obtained via a _type_ array scan
Array Scan
data have;
input name $ x y z (s t u) ($) date: yymmdd10.;
format s t u $upcase. date yymmdd10.;
datalines;
x 1 2 3 a b c 2020-10-01
y 2 3 4 b c d 2020-10-02
z 3 4 5 c d e 2020-10-03
s 4 5 6 hi ho silver 2020-10-04
t 5 6 7 aa bb cc 2020-10-05
u 6 7 8 -- ** !! 2020-10-06
date 7 8 9 ppp qqq rrr 2020-10-07
;
data want;
set have;
length u_vvalue name_vvaluex $20.;
u_vvalue = vvalue(u);
name_vvaluex = vvaluex(name);
array nums _numeric_;
array chars _character_;
/* NOTE:
* variable based arrays cause automatic variable _i_ to be in the PDV
* and _i_ will be automatically dropped from output data sets
*/
do _i_ = 1 to dim(nums);
if upcase(name) = upcase(vname(nums(_i_))) then do;
name_numeric_raw = nums(_i_);
leave;
end;
end;
do _i_ = 1 to dim(chars);
if upcase(name) = upcase(vname(chars(_i_))) then do;
name_character_raw = chars(_i_);
leave;
end;
end;
run;
If you perform an 'excessive' amount of dynamic value lookup in your DATA Step a transposition could possibly lead to simpler processing.
I want to recode the max value of a variable as 1 and 0 when it is not. For each variable, there may be multiple observations with the max value. The max value for each value is not fixed, i.e. from cycle to cycle the max value for each variable may change. And there are hundreds of variables, cannot "hard-code" anything.
The final product would have the same dimensions as the original table, i.e. equal number of rows and columns as a matrix of 0s and 1s.
This is within SAS. I attempted to calculate the max of each variable and then append these max as a new observation into the data. Then comparing down the column of each variable against the "max" observation... looking into examples of the following did not help:
SQL
Array in datastep
proc transpose
formatting
Any insight would be much appreciated.
Here is a version done with SQL:
The idea is that we first calculate the maximum. The Latter select. Then we join the data to original and the outer the case-select specifies if the flag is set up or not.
data begin;
input var value;
cards;
1 1
1 2
1 3
1 2.5
1 1.7
1 3
2 34
2 33
2 33
2 33.7
2 34
2 34
; run;
proc sql;
create table result as
select a.var, a.value, case when a.value = b.maximum then 1 else 0 end as is_max from
(select * from begin) a
left join
(select max(value) as maximum, var from begin group by var) b
on a.var = b.var
;
quit;
To avoid "hard-code" you need to use some code generation.
First let's figure out what code you could use to solve the problem. Later we can look into ways to generate that code.
It is probably easiest to do this with PROC SQL code. SAS will allow you to reference the MAX() value of a variable. Also note that SAS evaluates boolean expressions to 1 (TRUE) or 0 (FALSE). So you just want to generate code like:
proc sql;
create table want as
select var1=max(var1) as var1
, var2=max(var2) as var2
from have
;
quit;
To generate the code you need a list of the variables in your source dataset. You can get those with PROC CONTENTS but also with the metadata table (view) DICTIONARY.COLUMNS (also accessible as SASHELP.VCOLUMN from outside PROC SQL).
If the list of variables is small then you could generate the code into a single macro variable.
proc sql noprint;
select catx(' ',cats(name,'=max(',name,')'),'as',name)
into :varlist separated by ','
from dictionary.columns
where libname='WORK' and memname='HAVE'
order by varnum
;
create table want as
select &varlist
from have
;
quit;
The maximum number of characters that will fit into a macro variable is 64K. So long enough for about 2,000 variables with names of 8 characters each.
Here is little more complex way that uses PROC SUMMARY and a data step with a temporary array. It does not really need any code generation.
%let dsin=sashelp.class(obs=10);
%let dsout=want;
%let varlist=_numeric_;
proc summary data=&dsin nway ;
var &varlist;
output out=summary(drop=_type_ _freq_) max= ;
run;
data &dsout;
if 0 then set &dsin;
array vars &varlist;
array max [10000] _temporary_;
if _n_=1 then do;
set summary ;
do _n_=1 to dim(vars);
max[_n_]=vars[_n_];
end;
end;
set &dsin;
do _n_=1 to dim(vars);
vars[_n_]=vars[_n_]=max[_n_];
end;
run;
Results:
Obs Name Sex Age Height Weight
1 Alfred M 0 1 1
2 Alice F 0 0 0
3 Barbara F 0 0 0
4 Carol F 0 0 0
5 Henry M 0 0 0
6 James M 0 0 0
7 Jane F 0 0 0
8 Janet F 1 0 1
9 Jeffrey M 0 0 0
10 John M 0 0 0
I have a dataset like this(type is an indicator):
datetime type
...
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
...
I was trying to extract data with type 1 and also the previous and next one. Just try to extract (-1,+1) window based on type 1.
datetime type
...
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
...
I found a similar post here. I copied and pasted the code, but I am not quite sure what does 'x' mean in his code. SAS gives me 'File WORK.x does not exist'.
Can someone help me out? Thx.
The X data set in the other post is the same source table you are filtering, so the logical order of the code is:
Check every row in the table 'Have', _N_ holds the current row number,
If Type = 1 then Set Have Point=_N_ goes to row _N_ in the 'Have' table and outputs that row to the new table 'want', then continues to the next row. The _N_ can be the pointer to the current, previous or next row. ( The two IF statements handles the cases of first row and last row; where there is no Previous or no Next)
Full Working Code:
data have;
length datetime $23.;
input datetime $ type ;
datalines;
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
;
run;
data want;
set have nobs=nobs;
if type = 1 then do;
current = _N_;
prev = current - 1;
next = current + 1;
if prev > 0 then do;
set have point = prev;
output;
end;
set have point = current;
output;
if next <= nobs then do;
set have point = next;
output;
end;
end;
run;
proc sort data=want noduprecs;
by _all_ ; Run;
Note: I added an extra step proc sort to remove duplicate rows.
Output:
datetime=ddmmyy:10:31:00 type=0
datetime=ddmmyy:10:32:00 type=1
datetime=ddmmyy:10:33:00 type=0
datetime=ddmmyy:10:34:00 type=1
datetime=ddmmyy:10:35:00 type=0
For you example data that does not have any id or group variable it should be pretty straight forward. Instead of thinking about moving back and forth in the file, just create new variables that contain the previous (LAG_TYPE) and next (LEAD_TYPE) value for TYPE. Then your requirement to keep the observations before the one with TYPE=1 is translated to keeping the observations where LEAD_TYPE=1.
Let's convert your sample data into a dataset.
data have ;
input datetime :$15. type ;
cards;
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
;
Rather than actually keeping the required observations I will make a new variable KEEP that will be true for records that meet your criteria.
data want ;
recno+1;
set have end=eof;
lag_type=lag(type);
if not eof then set have(firstobs=2 keep=type rename=(type=lead_type));
else lead_type=.;
keep= (type=1 or lag_type=1 or lead_type=1) ;
run;
Here is the result.
recno datetime type lag_type lead_type keep
1 ddmmyy:10:30:00 0 . 0 0
2 ddmmyy:10:31:00 0 0 1 1
3 ddmmyy:10:32:00 1 0 0 1
4 ddmmyy:10:33:00 0 1 1 1
5 ddmmyy:10:34:00 1 0 0 1
6 ddmmyy:10:35:00 0 1 . 1
Move next observation up and compare two observations at the same row or use lag to compare the current observation and previous observation.
data have;
length datetime $23.;
input datetime $ type ;
datalines;
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
;
run;
data want;
merge have have(firstobs=2 keep=type rename=(type=_type));
if max(type,_type) or max(type,lag(type)) ;
drop _type;
run;
I have a dataset in SAS and I want to Convert one column into string by the Product. I have attached the image of input and output required.
I need the Colomn STRING in the outut. can anyone please help me ?
I have coded a data step to create the input data:
data have;
input products $
dates
value
;
datalines;
a 1 0
a 2 0
a 3 1
a 4 0
a 5 1
a 6 1
b 1 0
b 2 1
b 3 1
b 4 1
b 5 0
b 6 0
c 1 1
c 2 0
c 3 1
c 4 1
c 5 0
c 6 1
;
Does the following suggested solution give you what you want?:
data want;
length string $ 20;
do until(last.products);
set have;
by products;
string = catx(',',string,value);
end;
do until(last.products);
set have;
by products;
output;
end;
run;
Here's my quick solution.
data temp;
length cat $20.;
do until (last.prod);
set have;
by prod notsorted;
cat=catx(',',cat,value);
end;
drop value date;
run;
proc sql;
create table want as
select have.*, cat as string
from have inner join temp
on have.prod=temp.prod;
quit;
I have a large dataset, with hundreds of variables and hundreds of observations, coming from a clinical trial. Variable V1 is a yes/no variable, indicating some condition. V2 is numeric, and representing a dose. T is a time variable. The dataset is "long" shaped, every subject has few observations, one for each time point. For every subject, I want to create a new yes/no variable (can be in a new dataset), which is yes if: V1 is "yes" in at least one time point, OR, V2 is above 0 in at least one time point. How do I do that? Thank you.
Try the following:
data ds;
set ds;
if V1="yes" or V2>0 then do;
flag="yes;
end;
else do;
flag= "no";
end;
summarize the dataset to ID level:
proc sql;
create table summary as
select ID, count(flag) as flag_cnt
from ds
where flag="yes"
group by ID;
quit;
These are the IDs which satisfy the condition
You can submit the code on the example below to verify.
Here (V1="yes" or V2>0) gives a dummy variable for eauch row. When we sum it we have the number of rows satisfying the condition you mentioned for each ID.
To have a flag, we compare the sum to 0 and put it between () to create a 0/1 variable that you want to have.
hope it helps !
MK
data have;
input ID V1 $ V2;
cards;
1 yes 0
1 no 0
1 no 0
2 no 0
2 no 0
2 no 0
3 no 1
3 no 0
4 yes 0
4 yes 0
5 yes 1
5 no 1
5 yes 0
;
run;
proc sql;
select ID
, (sum((V1="yes")or(V2>0))>0) as new_flag
from have
group by ID;
quit;
data want (keep=id flag);
flag = 'no ';
do until (last.id);
set have;
by id;
if v1 = 'yes' or v2 > 0 then flag = 'yes';
end;
output;
run;