Split value of a variable to multiple variables - sas

I have a variable with the values (14 12 13 15 15 14). I need to create new variables and assign each value accordingly. Example: value 14 goes in var14 and value 12 in var12.

data a;
input a1 @@;
cards;
14 12 13 15 15 14
;
run;
%macro m;
proc sql noprint;
select distinct a1 into :k separated by '#' from WORK.a;
select count(distinct a1) into :c from WORK.a;
quit;
%do i=1 %to &c;
%let var%scan(&k, &i, '#')=%scan(&k, &i, '#');
%put var%scan(&k, &i, '#');
%end;
%mend m;
%m;
**************Log Results***************
var12
var13
var14
var15
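The macro above creates macro variables named var12 to var15, not data set variables. If data set variables are the goal, one possible sketch, assuming the data set WORK.a from the question, is to de-duplicate the values and transpose them. The names a_distinct and wide are illustrative only.

```sas
/* Keep one observation per distinct value of a1 */
proc sort data=a out=a_distinct nodupkey;
  by a1;
run;

/* One output row: each distinct value becomes a variable named var<value> */
proc transpose data=a_distinct out=wide(drop=_name_) prefix=var;
  id a1;
  var a1;
run;
```

With the sample data this should leave a single observation in wide whose variables var12, var13, var14 and var15 each hold their own value.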

Related

Add new empty rows to a SAS table with names from another table

Assume I have table foo which contains a (dynamic) list of new rows which I want to add to another table have, so that it yields a table want looking e.g. like this:
x y p_14 p_15
1 2 2 99
2 4 7 24
Example data for foo:
id row_name
14 p_14
15 p_15
Example data for have:
x y p Z
1 2 14 2
1 2 15 99
1 2 16 59
2 4 14 7
2 4 15 24
2 4 16 58
What I have so far is the following which is not yet in macro shape:
proc sql;
create table want as
select old.*, t1.p_14, t2.p_15 /* choosing non-duplicate rows */
from (select x, y from have) old
left join (select x, y, z as p_14 from have where p=14) t1
on old.x=t1.x and old.y=t1.y
left join (select x, y, z as p_15 from have where p=15) t2
on old.x=t2.x and old.y=t2.y
;
quit;
Ideally, I am aiming for a macro which takes foo as input and automatically creates all the joins from above. Also, the solution should not spit out any warnings in the console. My challenge is how to dynamically choose the correct (non-duplicate) rows.
PS: This is a follow-up question of Populate SAS macro-variable using a SQL statement within another SQL statement? The important bit is that it is not a full transpose, I guess.
You can go from HAVE to WANT with PROC TRANSPOSE.
proc transpose data=have out=want(drop=_name_) prefix=p_ ;
by x y ;
id p ;
var z;
run;
To limit it to the values of P that occur in FOO you could use a macro variable (as long as the number of observations in FOO is small enough).
proc sql noprint ;
select id into :idlist separated by ' ' from foo ;
quit;
proc transpose data=have out=want(drop=_name_) prefix=p_ ;
where p in (&idlist) ;
by x y ;
id p ;
var z;
run;
If the issue is you want variable P_17 to be in the result even if 17 does not appear in HAVE then add a little more complexity. For example add another data step that will force the creation of the empty variables. You can generate the list of variable names from the list of id's in FOO.
proc sql noprint ;
select id , cats('p_',id)
into :idlist separated by ' '
, :varlist separated by ' '
from foo
;
quit;
proc transpose data=have out=want(drop=_name_) prefix=p_ ;
where p in (&idlist) ;
by x y ;
id p ;
var z;
run;
data want ;
set want (keep=x y);
array all &varlist ;
set want ;
run;
Results:
Obs x y p_14 p_15 p_17
1 1 2 2 99 .
2 2 4 7 24 .
If the number of values is too large to store in a single macro variable (limit 64K bytes) you could generate the WHERE statement with a data step to a file and use %INCLUDE to add the WHERE statement into the code.
filename where temp;
data _null_;
set foo end=eof;
file where ;
if _n_=1 then put 'where p in (' @;
put id @ ;
if eof then put ');' ;
run;
proc transpose ... ;
%include where / source2;
...
Use macro program:
data have;
input x y p Z;
cards;
1 2 14 2
1 2 15 99
1 2 16 59
2 4 14 7
2 4 15 24
2 4 16 58
;
data foo;
input id row_name $;
cards;
14 p_14
15 p_15
;
%macro test(dsn);
proc sql;
select count(*) into:n trimmed from &dsn;
select id into: value separated by ' ' from &dsn;
create table want as
select distinct a.x,a.y,
%do i=1 %to &n;
%let cur=%scan(&value,&i);
t&i..p_&cur
%if &i<&n %then ,;
%else ;
%end;
from have a
%do i=1 %to &n;
%let cur=%scan(&value,&i);
left join have (where=(p=&cur) rename=(z=p_&cur.)) t&i.
on a.x=t&i..x and a.y=t&i..y
%end;
;
quit;
%mend;
%test(foo);

how can I build a loop for macro in SAS?

I want to do a simulation based on macro in SAS. I can build a function named 'fine()', the code is as follows
DATA CLASS;
INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
CARDS;
ALFRED M 14 69.0 112.5
ALICE F 13 56.5 84.0
BARBARA F 13 65.3 98.0
CAROL F 14 62.8 102.5
HENRY M 14 63.5 102.5
RUN;
PROC PRINT;
TITLE 'DATA';
RUN;
proc print data=CLASS;run;
PROC FCMP OUTLIB = work.functions.func;
function populationCalc(HEIGHT,WEIGHT,thres);
pop=HEIGHT-WEIGHT-thres;
return (pop);
ENDSUB;
options cmplib=(work.functions);
%macro fine(i);
data ex;
set CLASS;
thres=&i;
pop = populationCalc(HEIGHT,WEIGHT,thres);
if (pop>50) then score=1;
else score=0;
run;
proc iml;
USE ex;
READ all var _ALL_ into ma[colname=varNames];
CLOSE ex;
nn=nrow(ma);
total_score=sum(ma[,'thres']);
avg_score=sum(ma[,'thres'])/nn;
print total_score avg_score;
%mend fine;
%fine(10);
%fine(100);
%fine(150);
I want to build a loop for the function 'fine()' and also use a macro, but the result is not what I expect. How can I fix this?
%macro ct(n);
data data_want;
%do i=1 %to &n;
x=%fine(&i);
output x;
%end;
run;
%mend ct;
%ct(10);
%fine does not generate any text that can be used in the context of a right hand side (RHS) of a DATA Step variable assignment statement.
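For contrast, a macro can sit on the right-hand side of an assignment when it expands to nothing but an expression. A minimal sketch, where %twice is a hypothetical macro that is not part of the question:

```sas
/* Hypothetical macro that expands to an expression, not to steps */
%macro twice(x);
  (2 * &x)
%mend twice;

data _null_;
  y = %twice(5);   /* the statement becomes: y = (2 * 5); */
  put y=;
run;
```

%fine, by comparison, expands to a complete DATA step followed by a PROC IML step, so it has to be invoked between steps rather than inside an assignment.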
You seem to perhaps want this data set as a result of invoking %ct
i total_score average_score
- ----------- -------------
1 5 1
2 10 2
3 15 3
etc...
Step 1. Save IML result
Add this to the bottom of IML code in %fine, replacing the print
create fine_out var {total_score avg_score};
append;
close fine_out;
quit;
Step 2. Rewrite ct macro
Invoke %fine outside a DATA step context so the DATA and IML steps can run. Append the IML output to a results data set.
%macro ct(n,out=result);
%local i;
%do i=1 %to &n;
%fine(&i)
%if &i = 1 %then %do;
data &out; set fine_out; run;
%end;
%else %do;
proc append base=&out data=fine_out; run;
%end;
%end;
%mend;
options mprint;
%ct(10)
This should be the output WORK.RESULT based on your data

Extract variable number of columns from beginning and end from a SAS dataset

I have a SAS dataset where I want to keep, let's say, the first 2 columns and the last 4 columns. In other words, only columns from the beginning and from the end.
data test;
input a b c d e f g h i j;
cards;
1 2 3 4 5 6 7 8 9 10
;
Initial output:
What I want is the following -
Output desired:
I checked on the net; people are trying something with varnum, as shown here, but I can't figure it out. I don't want to use keep/drop with hard-coded names; rather, I want an automated way to solve this issue.
%DOSUBL can run code in a separate stream and be part of a code generation scheme at code submit (pre-run) time.
Suppose the requirement is to slice the columns out of a data set based on metadata column position as indicated by varnum (i.e. places), and the syntax for places is:
p:q to select the range of columns whose varnum position is between p and q
multiple ranges can be specified, separated by spaces
a single column position, p, can be specified
negative values select the position downward from the highest position.
also, the process should honor all incoming data set options specified, i.e. keep= drop=
All the complex logic for implementing the requirements could be done in pure macro code using only %sysfunc and data functions such as open, varnum, varname, etc... That code would be pretty unwieldy.
The selection of names from meta data can be cleaner using SAS features such as Proc CONTENTS and Proc SQL executed within DOSUBL.
Example:
Macro logic is used to construct (or map) the filtering criteria statement based on varnum. Metadata retrieval and processing done with Procs.
%macro columns_slice (data=, places=);
%local varlist temp index p token part1 part2 filter joiner;
%let temp = __&sysmacroname._%sysfunc(monotonic());
%do index = 1 %to %sysfunc(countw(&places,%str( )));
%let token = %scan(&places,&index,%str( ));
%if NOT %sysfunc(prxmatch(/^(-?\d+:)?-?\d+$/,&token)) %then %do;
%put ERROR: &sysmacroname, invalid places=&places;
%return;
%end;
%let part1 = %scan (%superq(token),1,:);
%let part2 = %scan (%superq(token),2,:);
%if %qsubstr(&part1,1,1) = %str(-) %then
%let part1 = max(varnum) + 1 &part1;
%if %length(&part2) %then %do;
%if %qsubstr(&part2,1,1) = %str(-) %then
%let part2 = max(varnum) + 1 &part2;
%end;
%else
%let part2 = &part1;
%let filter=&filter &joiner (varnum between &part1. and &part2.) ;
%let joiner = OR;
%end;
%put NOTE: &=filter;
%if 0 eq %sysfunc(dosubl(%nrstr(
options nonotes;
proc contents noprint data=&data out=&temp(keep=name varnum);
proc sql noprint;
select name
into :varlist separated by ' '
from &temp
having &filter
order by varnum
;
drop table &temp;
quit;
)))
%then %do;&varlist.%end;
%else
%put ERROR: &sysmacroname;
%mend;
Using the slicer
* create sample table for demonstration;
data lotsa_columns(label='A silly 1:1 merge');
if _n_ > 10 then stop;
merge
sashelp.class
sashelp.cars
;
run;
%put %columns_slice (data=lotsa_columns, places=1:3);
%put %columns_slice (data=lotsa_columns, places=-1:-5);
%put %columns_slice (data=lotsa_columns, places=2:4 -2:-4 6 7 8);
1848 %put %columns_slice (data=lotsa_columns, places=1:3);
NOTE: FILTER=(varnum between 1 and 3)
Name Sex Age
1849 %put %columns_slice (data=lotsa_columns, places=-1:-5);
NOTE: FILTER=(varnum between max(varnum) + 1 -1 and max(varnum) + 1 -5)
Horsepower MPG_City MPG_Highway Wheelbase Length
1850 %put %columns_slice (data=lotsa_columns, places=2:4 -2:-4 6 7 8);
NOTE: FILTER=(varnum between 2 and 4) OR (varnum between max(varnum) + 1 -2 and max(varnum) + 1
-4) OR (varnum between 6 and 6) OR (varnum between 7 and 7) OR (varnum between 8 and 8)
Sex Age Height Make Model Type MPG_City MPG_Highway Wheelbase
Honoring options
data have;
array x(100);
array y(100);
array z(100);
run;
%put %columns_slice (data=have(keep=x:), places=2:4 8:10 -2:-4 -25:-27 -42);
1858 %put %columns_slice (data=have(keep=x:), places=2:4 8:10 -2:-4 -25:-27 -42);
NOTE: FILTER=(varnum between 2 and 4) OR (varnum between 8 and 10) OR (varnum between max(varnum)
+ 1 -2 and max(varnum) + 1 -4) OR (varnum between max(varnum) + 1 -25 and max(varnum) + 1 -27) OR
(varnum between max(varnum) + 1 -42 and max(varnum) + 1 -42)
x2 x3 x4 x8 x9 x10 x59 x74 x75 x76 x97 x98 x99
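Because the macro resolves to a plain list of names, it can presumably be dropped straight into a KEEP= option. A sketch for the original question (first 2 and last 4 columns), assuming the macro compiles as shown and the ten-column data set test from the question exists; want is an illustrative output name:

```sas
/* places=1:2 selects positions 1-2; -4:-1 selects the last four positions */
data want;
  set test(keep=%columns_slice(data=test, places=1:2 -4:-1));
run;
```

For a data set with ten columns, -4:-1 corresponds to positions 7 through 10.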
If you don't know the number of variables, you can use this macro. You specify the number of leading variables and the number of trailing variables to keep in the data set, plus the libname and the name of the dataset:
%macro drop_vars(num_first_vars,num_end_vars,lib,dataset); %macro d;%mend d;
proc sql noprint;
select sum(num_character,num_numeric) into:ncolumns
from dictionary.tables
where libname=upcase("&lib") and memname=upcase("&dataset");
select name into: vars_to_drop separated by ','
from dictionary.columns
where libname=upcase("&lib") and
memname=upcase("&dataset") and
varnum between %eval(&num_first_vars.+1) and %eval(&ncolumns-&num_end_vars);
alter table &lib..&dataset
drop &vars_to_drop;
quit;
%mend drop_vars;
%drop_vars(2,3,work,test);
Dataset before macro execution:
+---+---+---+---+---+---+---+---+---+----+
| a | b | c | d | e | f | g | h | i | j |
+---+---+---+---+---+---+---+---+---+----+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
+---+---+---+---+---+---+---+---+---+----+
Dataset after macro execution:
+---+---+---+---+----+
| a | b | h | i | j |
+---+---+---+---+----+
| 1 | 2 | 8 | 9 | 10 |
+---+---+---+---+----+
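Note that ALTER TABLE modifies the data set in place. If the original should stay untouched, a possible variation (a sketch, not part of the answer above) builds a KEEP list from dictionary.columns instead. The names keep_vars and want are illustrative, and the counts 2 and 4 are taken from the question.

```sas
proc sql noprint;
  /* Names of the first 2 and last 4 columns, in positional order */
  select name into :keep_vars separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'TEST'
    having varnum <= 2 or varnum > max(varnum) - 4
    order by varnum;
quit;

/* Copy only the selected columns; WORK.TEST itself is not changed */
data want;
  set test(keep=&keep_vars);
run;
```

The HAVING clause compares each row against max(varnum), so PROC SQL writes a note about remerging summary statistics to the log.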
If the names follow a pattern just generate the list using the pattern. So if the names look like month names you just need to know one month to generate the other.
%let last_month = '01JAN2019'd ;
%let first_var = %sysfunc(intnx(month,&last_month,-12),monyy7.);
%let last_var = %sysfunc(intnx(month,&last_month,-0),monyy7.);
data want;
set have(keep= id1 id2 &first_var -- &last_var);
run;
If you cannot find a SAS function or format that generates the names in the style your variables use then write your own logic.
data _null_;
array month_abbr [12] $3 _temporary_ ('JAN' 'FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OKT' 'NOV' 'DEK' );
last_month=today();
first_month=intnx('month',last_month,-12);
call symputx('first_var',catx('_',month_abbr[month(first_month)],year(first_month)));
call symputx('last_var',catx('_',month_abbr[month(last_month)],year(last_month)));
run;

Concatenate duplicate values

I have a table with some variables, say var1 and var2 and an identifier, and for some reasons, some identifiers have 2 observations.
I would like to know if there is a simple way to put back the second observation of the same identifier into the first one, that is
instead of having two observations, each with var1 var2 variables for the same identifier value
ID var1 var2
------------------
A1 12 13
A1 43 53
having just one, but with something like var1 var2 var1_2 var2_2.
ID var1 var2 var1_2 var2_2
--------------------------------------
A1 12 13 43 53
I can probably do that with renaming all my variables, then merging the table with the renamed one and dropping duplicates, but I assume there must be a simpler version.
Actually, your suggestion of merging the values back is probably the best.
This works if you have, at most, 1 duplicate for any given ID.
data first dups;
set have;
by id;
if first.id then output first;
else output dups;
run;
proc sql noprint;
create table want as
select a.id,
a.var1,
a.var2,
b.var1 as var1_2,
b.var2 as var2_2
from first as a
left join
dups as b
on a.id=b.id;
quit;
Another method makes use of PROC TRANSPOSE and a data-step merge:
/* You can experiment by adding more data to this datalines step */
data tab;
infile datalines;
input ID : $2. var1 var2;
datalines;
A1 12 13
A1 43 53
;
run;
/* This step puts the var1 values onto one line */
proc transpose data=tab out=new1 (drop=_NAME_) prefix=var1_;
by id;
var var1;
run;
/* This does the same for the var2 values */
proc transpose data=tab out=new2 (drop=_NAME_) prefix=var2_;
by id;
var var2;
run;
/* The two transposed datasets are then merged together to give one line */
data want;
merge new1 new2;
by id;
run;
As an example:
data tab;
infile datalines;
input ID : $2. var1 var2;
datalines;
A1 12 13
A1 43 53
A2 199 342
A2 1132 111
A2 91913 199191
B1 1212 43214
;
run;
Gives:
ID var1_1 var1_2 var1_3 var2_1 var2_2 var2_3
---------------------------------------------------
A1 12 43 . 13 53 .
A2 199 1132 91913 342 111 199191
B1 1212 . . 43214 . .
There's a very simple way of doing this, using the IDGROUP option of the OUTPUT statement in PROC SUMMARY.
data have;
input ID $ var1 $ var2 $;
datalines;
A1 12 13
A1 43 53
;
run;
proc summary data=have nway;
class id;
output out=want (drop=_:)
idgroup(out[2] (var1 var2)=);
run;
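The out[2] above hard-codes at most two observations per ID. If the maximum is not known in advance, it could be computed first. A sketch under that assumption, with maxdup as an illustrative macro variable name:

```sas
/* Largest number of observations sharing one ID */
proc sql noprint;
  select max(n) into :maxdup trimmed
    from (select count(*) as n from have group by id);
quit;

/* Same PROC SUMMARY call, with the group size supplied dynamically */
proc summary data=have nway;
  class id;
  output out=want(drop=_:)
         idgroup(out[&maxdup] (var1 var2)=);
run;
```

The output variables are suffixed with _1, _2 and so on (var1_1, var1_2, var2_1, var2_2 for the sample data).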

Split SAS dataset

I have a SAS dataset that looks like this:
id | dept | ...
1 A
2 A
3 A
4 A
5 A
6 A
7 A
8 A
9 B
10 B
11 B
12 B
13 B
Each observation represents a person.
I would like to split the dataset into "team" datasets, each dataset can have a maximum of 3 observations.
For the example above this would mean creating 3 datasets for dept A (2 of these datasets would contain 3 observations and the third dataset would contain 2 observations). And 2 datasets for dept B (1 containing 3 observations and the other containing 2 observations).
Like so:
First dataset (deptA1):
id | dept | ...
1 A
2 A
3 A
Second dataset (deptA2)
id | dept | ...
4 A
5 A
6 A
Third dataset (deptA3)
id | dept | ...
7 A
8 A
Fourth dataset (deptB1)
id | dept | ...
9 B
10 B
11 B
Fifth dataset (deptB2)
id | dept | ...
12 B
13 B
The full dataset I'm using contains thousands of observations with over 50 depts. I can work out how many datasets per dept are required and I think a macro is the best way to go as the number of datasets required is dynamic. But I can't figure out the logic to create the datasets so that they have a maximum of 3 observations. Any help appreciated.
Another version.
Compared to DavB's version, it reads the input data only once and splits it into several tables in a single data step.
Also, if a more complex splitting rule is required, it can be implemented in the data step view WORK.SOURCE_PREP.
data WORK.SOURCE;
infile cards;
length ID 8 dept $1;
input ID dept;
cards;
1 A
2 A
3 A
4 A
5 A
6 A
7 A
8 A
9 B
10 B
11 B
12 B
13 B
14 C
15 C
16 C
17 C
18 C
19 C
20 C
;
run;
proc sort data=WORK.SOURCE;
by dept ID;
run;
data WORK.SOURCE_PREP / view=WORK.SOURCE_PREP;
set WORK.SOURCE;
by dept;
length table_name $32;
if first.dept then do;
count = 1;
table = 1;
end;
else count + 1;
if count > 3 then do;
count = 1;
table + 1;
end;
/* variable TABLE_NAME to hold table name */
TABLE_NAME = catt('WORK.', dept, put(table, 3. -L));
run;
/* prepare list of tables */
proc sql noprint;
create table table_list as
select distinct TABLE_NAME from WORK.SOURCE_PREP where not missing(table_name)
;
%let table_cnt=&sqlobs;
select table_name into :table_list separated by ' ' from table_list;
select table_name into :tab1 - :tab&table_cnt from table_list;
quit;
%put &table_list;
%macro loop_when(cnt, var);
%do i=1 %to &cnt;
when ("&&&var.&i") output &&&var.&i;
%end;
%mend;
data &table_list;
set WORK.SOURCE_PREP;
select (TABLE_NAME);
/* generate OUTPUT statements */
%loop_when(&table_cnt, tab)
end;
run;
You could try this:
%macro split(inds=,maxobs=);
proc sql noprint;
select distinct dept into :dept1-:dept9999
from &inds.
order by dept;
select ceil(count(*)/&maxobs.) into :numds1-:numds9999
from &inds.
group by dept
order by dept;
quit;
%let numdept=&sqlobs;
data %do i=1 %to &numdept.;
%do j=1 %to &&numds&i;
dept&&dept&i&&j.
%end;
%end;;
set &inds.;
by dept;
if first.dept then counter=0;
counter+1;
%do i=1 %to &numdept.;
%if &i.=1 %then %do;
if
%end;
%else %do;
else if
%end;
dept="&&dept&i" then do;
%do k=1 %to &&numds&i.;
%if &k.=1 %then %do;
if
%end;
%else %do;
else if
%end;
counter<=&maxobs.*&k. then output dept&&dept&i&&k.;
%end;
end;
%end;
run;
%mend split;
%split(inds=YOUR_DATASET,maxobs=3);
Just replace the INDS parameter value in the %SPLIT macro call with the name of your input data set. The input must be sorted by dept, because the data step uses BY dept.
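To check the result of %split, the created data sets and their observation counts can be listed from dictionary.tables. A small sketch, assuming the output landed in WORK and follows the dept<value><n> naming that the macro generates:

```sas
proc sql;
  /* Each listed data set should have at most &maxobs observations */
  select memname, nobs
    from dictionary.tables
    where libname = 'WORK' and memname like 'DEPT%'
    order by memname;
quit;
```

Any other WORK data set whose name starts with DEPT would also appear in this listing.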