In a SAS DATA step, you get the value of a column by using its name directly, for example:
name = col1;
But I want to get the value of a column whose name is held in a string, for example:
name = get_value_of_column(cats("col", i));
Is this possible? And if so, how?
The DATA Step functions VVALUE and VVALUEX will return the formatted value of a variable.
VVALUE(<variable-name>): static; the variable is bound when the step is compiled
VVALUEX(<expression>): dynamic; the expression is evaluated at run time and must resolve to a variable name
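Applied to the question's col1, col2, ... naming, the dynamic form might look like the sketch below. The data set names cols and picked and the variables picked_text and picked_num are made up for illustration; the INPUT conversion assumes the columns carry no format that would produce non-numeric text.

```sas
/* Hypothetical sample data: i says which of col1-col3 to pick on each row */
data cols;
  input i col1 col2 col3;
  datalines;
1 10 20 30
3 10 20 30
;

data picked;
  set cols;
  length picked_text $32;
  /* vvaluex resolves the variable name at run time and returns the formatted value as text */
  picked_text = vvaluex(cats('col', i));
  /* convert the text back to a number; assumes the formatted value is a plain numeral */
  picked_num = input(picked_text, 32.);
run;
```

Because VVALUEX returns the formatted value, a date or currency format on the column would need a matching informat in the INPUT call.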
The actual (unformatted) value of the variable can be obtained dynamically by scanning a type-based array (_numeric_ or _character_) for the matching name:
Array Scan
data have;
input name $ x y z (s t u) ($) date: yymmdd10.;
format s t u $upcase. date yymmdd10.;
datalines;
x 1 2 3 a b c 2020-10-01
y 2 3 4 b c d 2020-10-02
z 3 4 5 c d e 2020-10-03
s 4 5 6 hi ho silver 2020-10-04
t 5 6 7 aa bb cc 2020-10-05
u 6 7 8 -- ** !! 2020-10-06
date 7 8 9 ppp qqq rrr 2020-10-07
;
data want;
set have;
length u_vvalue name_vvaluex $20.;
u_vvalue = vvalue(u);
name_vvaluex = vvaluex(name);
array nums _numeric_;
array chars _character_;
/* NOTE:
* variable based arrays cause automatic variable _i_ to be in the PDV
* and _i_ will be automatically dropped from output data sets
*/
do _i_ = 1 to dim(nums);
if upcase(name) = upcase(vname(nums(_i_))) then do;
name_numeric_raw = nums(_i_);
leave;
end;
end;
do _i_ = 1 to dim(chars);
if upcase(name) = upcase(vname(chars(_i_))) then do;
name_character_raw = chars(_i_);
leave;
end;
end;
run;
If your DATA step does a lot of dynamic value lookup, transposing the data first (one row per variable name and value) could lead to simpler processing.
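A minimal sketch of the transposition idea, restricted to numeric variables. The data set names wide, long and picked and the variables id, pick, varname and value are assumptions for illustration, not part of the original data.

```sas
/* Hypothetical data: pick names the variable whose value is wanted on each row */
data wide;
  input id pick $ x y z;
  datalines;
1 x 1 2 3
2 z 4 5 6
;

/* One output row per (id, variable); the input must be sorted by the BY variables */
proc transpose data=wide out=long(rename=(_name_=varname col1=value));
  by id pick;
  var x y z;
run;

/* The dynamic lookup becomes a plain comparison of two columns */
data picked;
  set long;
  where upcase(pick) = upcase(varname);
run;
```

After the transpose, the lookup is an ordinary filter, so no array scan or VVALUEX call is needed.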
I have a parametrization table that states whether the (i,j)th element of "matrix 1" is zero, is the residual of its row sum, or has to be read from the data table. I also have a data table with all the values for different segments. How do I construct the matrix?
For example, let's say "param_table" is the parametrization table:
data param_table;
infile datalines dsd;
length FieldName $20 FieldSourceTable $20;
input Matrix_Id Column_Order Row_Order IsZero IsRowResidual IsColumnResidual FieldName FieldSourceTable;
datalines;
1, 1, 1, 0, 1, 0, ., .
1, 1, 2, 0, 0, 0, xyz, table1
1, 1, 3, 0, 0, 0, abc, table1
1, 2, 1, 1, 0, 0, ., .
1, 2, 2, 0, 0, 0, pqr, table1
1, 2, 3, 0, 0, 0, mno, table1
1, 3, 1, 0, 0, 0, ab, table1
1, 3, 2, 0, 0, 0, pq, table1
1, 3, 3, 0, 1, 0, ., .
;
"table1" is the actual data, containing the values for the fields referenced in the parametrization table:
data table1;
input Year (country method Segment) ( : $12.)
ABC XYZ PQR MNO AB PQ;
datalines;
2017 France ABC Retail 0.2 0.5 0.4 0.3 0.6 0.1
2017 France XYZ Corporate 0.1 0.5 0.4 0.2 0.6 0.2
;
run;
How do I create matrices with these rules for each row (each key set) in table 1? For example, matrix for row 1 of "table1" would be:
(1-ab) 0 ab
xyz pqr pq
abc mno (1-abc-mno)
The (1,1)th and (3,3)th elements are row residuals, so each is (1 - sum of the rest of the row), whereas the (1,2)th element is 0:
0.4 0 0.6
0.5 0.4 0.1
0.2 0.3 0.5
I have added the data steps for file "param_table" which contains the references (the column names) and if it is zero or row residual. Also added the "table1" file which contains the actual values. For each row of "table1" we should have a matrix based on the rules mentioned in param_table.
Thanks!
Each matrix defined in param_table corresponds to a 2-D array that is populated for each row of table1. Suppose you have a macro, matrixfier, that generates the source code statements needed to map the table1 data into a specified array (i.e. matrix).
%macro matrixfier (matrix_id=1, arrayName=x, out=);
%local rowCount colCount source z i;
%local s1 s2 s3 s4 s addr;
The macro first has to examine the parameter data to check that its settings are consistent enough for code generation: each is* flag must be 0 or 1, and at most one flag may be set per cell.
proc sql noprint;
select *
from PARAM_TABLE where matrix_id = &matrix_id
and ( iszero not in (0,1) or
isrowresidual not in (0,1) or
iscolumnresidual not in (0,1) or
sum(iszero,isrowresidual,iscolumnresidual) not in (0,1)
);
%if &sqlobs %then %do;
%put ERROR: Parameters for matrix_id=&matrix_id. rejected for is* settings.;
%abort cancel;
%end;
select max(z) as z into :z from
( select column_order, sum(iscolumnresidual) as z
from PARAM_TABLE where matrix_id = &matrix_id
and iscolumnresidual
group by column_order
);
%if &z > 1 %then %do;
%put ERROR: Parameters for matrix_id=&matrix_id. rejected for iscolumnresidual settings.;
%abort cancel;
%end;
Determine how large the target array needs to be. Also, presume there is only one source table per matrix definition.
select max(column_order), max(row_order), max(fieldsourcetable)
into :colCount, :rowCount, :source
from PARAM_TABLE where matrix_id = &matrix_id
;
Code generate DATA Step statements for assigning a value directly.
select cats("&arrayName.(",row_order,',',column_order,')=', fieldname)
into :s1 separated by ';'
from PARAM_TABLE where matrix_id = &matrix_id
and iszero=0 and isrowresidual=0 and iscolumnresidual=0
order by row_order, column_order
;
Code generate DATA Step statements for assigning a zero value.
select cats("&arrayName.(",row_order,',',column_order,')=0')
into :s2 separated by ';'
from PARAM_TABLE where matrix_id = &matrix_id
and iszero
order by row_order, column_order
;
Code generate DATA Step statements for computing row residuals.
%do i = 1 %to &rowCount;
select
cats(B.row_order,',',B.column_order),
'-' || A.fieldname
into
:addr,
:s separated by ','
from PARAM_TABLE A
join PARAM_TABLE B
on A.matrix_id = B.matrix_id
and A.row_order = B.row_order
where
A.matrix_id = &matrix_id and A.row_order=&i
and A.isrowresidual=0 and A.iszero=0 and A.iscolumnresidual=0
and B.isrowresidual=1
;
%if &sqlobs > 0 %then %let s3=&s3&arrayName.(&addr)=sum(1,&s)%str(;);
%end;
Code generate DATA Step statements for computing column residuals.
%do i = 1 %to &colCount;
select
cats(B.row_order,',',B.column_order),
'-' || A.fieldname
into
:addr,
:s separated by ','
from PARAM_TABLE A
join PARAM_TABLE B
on A.matrix_id = B.matrix_id
and A.column_order = B.column_order
where
A.matrix_id = &matrix_id and A.column_order=&i
and A.isrowresidual=0 and A.iszero=0 and A.iscolumnresidual=0
and B.iscolumnresidual=1
;
%if &sqlobs > 0 %then %let s4=&s4&arrayName.(&addr)=sum(1,&s)%str(;);
%end;
quit;
Assemble the statements in a DATA Step.
data &out;
set &source;
array &arrayName.(&rowCount,&colCount);
call missing (of &arrayName.(*));
* assignments;
&s1;
* zeroes;
&s2;
* row residuals;
&s3;
* column residuals;
&s4;
* log the matrix for this row;
do _i = 1 to dim(&arrayName.,1);
do _j = 1 to dim(&arrayName.,2);
putlog &arrayName(_i,_j) 6.2 @;
end;
putlog;
end;
putlog;
run;
%mend;
Invoke the macro to apply the parameters to the data:
options mprint;
%matrixfier(matrix_id=1, arrayName=x, out=each);
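For matrix_id=1 and the sample param_table, the generated DATA step should be roughly equivalent to the hand-written step below. This is derived by reading the macro, not copied from an MPRINT log, so the statement order and the order of terms inside sum() may differ in practice. The logging loop is left out.

```sas
/* table1 as given in the question */
data table1;
  input Year (country method Segment) (: $12.) ABC XYZ PQR MNO AB PQ;
  datalines;
2017 France ABC Retail 0.2 0.5 0.4 0.3 0.6 0.1
2017 France XYZ Corporate 0.1 0.5 0.4 0.2 0.6 0.2
;

data each;
  set table1;
  array x(3,3);               /* creates x1-x9, stored row by row */
  call missing(of x(*));
  /* direct assignments (s1) */
  x(1,3)=ab; x(2,1)=xyz; x(2,2)=pqr; x(2,3)=pq; x(3,1)=abc; x(3,2)=mno;
  /* zeroes (s2) */
  x(1,2)=0;
  /* row residuals (s3) */
  x(1,1)=sum(1,-ab);
  x(3,3)=sum(1,-abc,-mno);
  /* no column residuals are defined for this matrix, so s4 is empty */
run;
```

For the first row of table1 this gives 0.4 0 0.6 / 0.5 0.4 0.1 / 0.2 0.3 0.5.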
Assume I have a data-set D1 as follows:
ID ATR1 ATR2 ATR3
1 A R W
2 B T X
1 A S Y
2 C T E
3 D U I
1 T R W
2 C X X
I want to create a data-set D2 from this as follows
ID ATR1 ATR2 ATR3
1 A R W
2 C T X
3 D U I
In other words, Data-set D2 consists of unique IDs from D1. For each ID in D2, the values of ATR1-ATR3 are selected as the most frequent (of the respective variable) among the records in D1 with the same ID. For example ID = 1 in D2 has ATR1 = A (most frequent).
I have one solution, but it is very clumsy. I sort copies of the data set D1 three times (by ID and ATR1, for example) and remove duplicates, then merge the three data sets to get what I want. However, I think there might be a more elegant way to do this. I have about 20 such variables in the original data set.
Thanks
/*
read and restructure so we end up with:
id attr_id value
1 1 A
1 2 R
1 3 W
etc.
*/
data a(keep=id attr_id value);
length value $1;
array attrs_{*} $ 1 attr_1 - attr_3;
infile cards;
input id attr_1 - attr_3;
do attr_id=1 to dim(attrs_);
value = attrs_{attr_id};
output;
end;
cards;
1 A R W
2 B T X
1 A S Y
2 C T E
3 D U I
1 T R W
2 C X X
;
run;
/* calculate frequencies of values per id and attr_id */
proc freq data=a noprint;
tables id*attr_id*value / out=freqs(keep=id attr_id value count);
run;
/* sort so the most frequent value per id and attr_id ends up at the bottom of the group.
if there are ties then it's a matter of luck which value we get */
proc sort data = freqs;
by id attr_id count;
run;
/* read and recreate the original structure. */
data b(keep=id attr_1 - attr_3);
retain attr_1 - attr_3;
array attrs_{*} $ 1 attr_1 - attr_3;
set freqs;
by id attr_id;
if first.id then do;
do i=1 to dim(attrs_);
attrs_{i} = ' ';
end;
end;
if last.attr_id then do;
attrs_{attr_id} = value;
end;
if last.id then do;
output;
end;
run;
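If ties should not be left to luck, one option is to add a tie-breaker to the sort. The sketch below is an assumption about what a reasonable rule could be (alphabetically first value wins), not part of the answer above; the small freqs data set is made up so the step can run on its own.

```sas
/* Hypothetical frequency table with a tie for id=1, attr_id=1 */
data freqs;
  input id attr_id value $ count;
  datalines;
1 1 A 2
1 1 T 2
1 2 R 3
;

/* Within equal counts, sort values in descending order so the
   alphabetically first value ends up last in each group */
proc sort data=freqs;
  by id attr_id count descending value;
run;

data top;
  set freqs;
  by id attr_id;
  if last.attr_id;   /* highest count; ties resolved in favour of the first value alphabetically */
run;
```

The same extra sort key could be added to the PROC SORT step above without changing the final DATA step.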
Maybe a stupid question...
I got following dataset:
id count
x 1
y 2
z 3
a 1
b 2
c 3
etc.
And I want this:
id count group
x 1 1
y 2 1
z 3 1
a 1 2
b 2 2
c 3 2
etc.
Here is what I tried:
data macro_1; set vix.macro_spy; where macro=1;
count+1;
if count>3 then do;
count=1;
end;
group=0;
if count=1 then group+1;
run;
But it is not working. How can I increase 'group' by one each time 'count' goes back to 1?
Thanks.
Even simpler:
data want;
set vix.macro_spy;
group+(count=1);
run;
I'm not sure I understand what you need. So you have this dataset ordered so that values of variable count always go 1, 2, 3, 1, 2, 3, 1, 2, 3...
Now, you want to generate variable group so that value increments every time variable count passes over 3?
If so, you could do something like this:
data group;
set vix.macro_spy;
retain group;
if _N_ = 1 then group = 0;
if count = 1 then group + 1;
run;
This is the general pattern that I use.
The if _N_ = 1 part is executed only once; this is where you initialize your variables.
The retain statement ensures that the variable keeps its value from one iteration of the DATA step to the next. (The sum statement group + 1 already implies this, so the explicit retain mainly documents the intent.)
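Both answers rely on the sum statement, which retains its accumulator and starts it at 0. A small self-contained sketch, using a made-up data set counts in place of vix.macro_spy:

```sas
/* Hypothetical stand-in for vix.macro_spy */
data counts;
  input id $ count;
  datalines;
x 1
y 2
z 3
a 1
b 2
c 3
;

data grouped;
  set counts;
  /* (count = 1) evaluates to 1 when true and 0 when false,
     so group goes up by one each time count restarts at 1 */
  group + (count = 1);
run;
```

This assumes the rows are already in the order in which the groups should be numbered.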
I have a dataset like this(sp is an indicator):
datetime sp
ddmmyy:10:30:00 N
ddmmyy:10:31:00 N
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N
ddmmyy:10:34:00 N
And I would like to extract observations with "Y" and also the previous and next one:
ID sp
ddmmyy:10:31:00 N
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N
I tried to use "lag" and successfully extracted the observations with "Y" and the next one, but I still have no idea how to extract the previous one.
Here is my try:
data surprise_6_step3; set surprise_6_step2;
length lag_sp $1;
lag_sp=lag(sp);
if sp='N' and lag(sp)='N' then delete;
run;
and the result is:
ID sp
ddmmyy:10:32:00 Y
ddmmyy:10:33:00 N
Any methods to extract the previous observation also?
Thx for any help.
Try using the point option in set statement in data step.
Like this:
data extract;
  set surprise_6_step2 nobs=nobs;
  if sp = 'Y' then do;
    current = _N_;
    prev = current - 1;
    next = current + 1;
    /* re-read the neighbouring rows from the same data set by observation number */
    if prev > 0 then do;
      set surprise_6_step2 point = prev;
      output;
    end;
    set surprise_6_step2 point = current;
    output;
    if next <= nobs then do;
      set surprise_6_step2 point = next;
      output;
    end;
  end;
run;
There is an implicit loop through the data set when you use it in a set statement.
_N_ is an automatic variable that counts the iterations of that implicit loop (starting from 1); since each iteration here reads one row, it matches the observation number. When you find your value, you store _N_ in the variable current so you know on which row you found it. nobs is the total number of observations in the data set.
Checking that prev is greater than 0 and that next is not greater than nobs avoids an error when your row is the first in the data set (there is no previous row) or the last (there is no next row). Note that if two "Y" rows are adjacent or one row apart, the rows they share are output more than once.
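A different technique that avoids the duplicate rows is a look-ahead merge combined with lag. This is a swapped-in alternative, not the method described above; the data set names have and want and the helper variables prev_sp and next_sp are assumptions for illustration.

```sas
/* Hypothetical sample in the shape of the question */
data have;
  input dt sp $;
  datalines;
1 N
2 N
3 Y
4 N
5 N
;

data want(drop=prev_sp next_sp);
  /* one-to-one merge (no BY): the second copy starts at row 2,
     so next_sp holds sp from the following row */
  merge have
        have(firstobs=2 keep=sp rename=(sp=next_sp));
  /* lag runs on every row, before the subsetting IF */
  prev_sp = lag(sp);
  if sp = 'Y' or prev_sp = 'Y' or next_sp = 'Y';
run;
```

Each input row is visited once, so a row is kept at most once even when several "Y" rows are close together.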
/* generate test data */
data test;
do dt = 1 to 100;
sp = ifc( rand("uniform") > 0.75, "Y", "N" );
output;
end;
run;
proc sql;
create table test2 as
select *,
monotonic() as _n
from test
;
create table test3 ( drop= _n ) as
select a.*
from test2 as a
left join test2 as b
on a._n = b._n + 1
left join test2 as c
on a._n = c._n - 1
where a.sp = "Y"
or b.sp = "Y"
or c.sp = "Y"
;
quit;