My case involves looking up a value within the same table: for each row, I need the Value of the row whose Variable matches that row's Variable2. Can you help me? I need SAS code for this case.
I tried to solve it this way:
data example2;
input Variable $ Value Variable2 $;
datalines;
V1 3 V2
V2 6 V1
V3 4 V5
V4 1 V1
V5 5 V2
;
proc sort data=example2;
by Variable;
run;
data example19;
set example2;
merge example2 example2 (keep=Value Variable2 rename=(Value=new));
run;
The code below should work for your scenario. I have checked a couple of edge cases and it works as expected. Please check once more whether it fails on any edge case other than these.
data have;
input variable $ value variable2 $;
datalines;
V1 3 V2
V2 6 V1
V3 4 V5
V4 1 V1
V5 5 V2
;
proc sql;
create table want as
select a.variable ,a.value, a.variable2 , b.value as value2
from have a
left join
have b
on a.variable2 =b.variable
order by variable;
quit;
proc sql;
select * from want;
quit;
/* One edge case: when variable2 has no matching variable, the left join
still works as expected and returns a missing value. */
data have1;
input variable $ value variable2 $;
datalines;
V1 3 V2
V2 6 V1
V3 4 V5
V4 1 V1
V5 5 V2
V9 8 V7
;
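The edge-case dataset above is defined but never queried. A minimal sketch of the same self-join run against have1 follows; the output name want1 is only an illustrative choice. The V9 row should come back with a missing value2 because no row has variable equal to V7.

```sas
proc sql;
  /* same left self-join as before, now on the edge-case data */
  create table want1 as
  select a.variable, a.value, a.variable2, b.value as value2
  from have1 a
  left join have1 b
    on a.variable2 = b.variable
  order by a.variable;
quit;
```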
So you want to take the value of VARIABLE2 and use it to find the row where VARIABLE has the same value? So to do that with a MERGE statement you will want to merge by the value of VARIABLE2.
So first sort the data by VARIABLE2.
proc sort data=example2;
by Variable2;
run;
Then make a version that has just the first two columns, renamed so that the variable-name column matches VARIABLE2 and the value column's name doesn't conflict with the original.
proc sort data=example2 (keep=Value Variable rename=(Value=New Variable=Variable2))
out=example2b
;
by variable2;
run;
Now you can just merge the two tables. But you only want to keep the original set of rows so use the IN= dataset option.
data want ;
merge example2(in=in1) example2b;
by variable2;
if in1;
run;
If you want a more efficient method you could look into using a data step HASH object. You could load the Variable/Value pairs into a hash and then use the FIND() method to look up the value associated with Variable2. If it is found, copy the value into a new variable. Note that you then need to re-find the value associated with VARIABLE, since the previous FIND() will have overwritten VALUE.
data want ;
set example2;
if _n_=1 then do;
declare hash h(dataset: 'example2');
h.definekey('Variable');
h.definedata('Value');
h.definedone();
end;
if not h.find(key: Variable2) then new=value;
h.find();
run;
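A possible variation, sketched here under the assumption that example2 holds Variable and Variable2 as character variables: load the hash with Value already renamed, so the lookup fills a separate variable and the second FIND() is not needed. The names want_alt and New are illustrative only.

```sas
data want_alt;
  set example2;
  length New 8;  /* host variable for the hash data item */
  if _n_ = 1 then do;
    /* load Variable/Value pairs, with Value renamed to New on the way in */
    declare hash h(dataset: 'example2(keep=Variable Value rename=(Value=New))');
    h.defineKey('Variable');
    h.defineData('New');
    h.defineDone();
  end;
  /* look up the row whose Variable equals this row's Variable2 */
  if h.find(key: Variable2) ne 0 then call missing(New);
run;
```

Because the hash writes into New rather than Value, the Value read by the SET statement is left untouched.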
I have a dataset looking something like this:
var1 var2 count
cat1 no 1
cat1 yes 4
cat1 unkown 3
cat2 no 7
cat2 yes 3
cat2 unkown 5
cat3 no 2
cat3 yes 9
cat3 unkown 0
What I want to do is combine var1 and var2 into a new variable, where the first row of each group comes from var1 and the others from var2. So it is supposed to look like:
comb count
cat1
no 1
yes 4
unkown 3
cat2
no 7
yes 3
unkown 5
cat3
no 2
yes 9
unkown 0
Any help would be highly appreciated!
It's quite simple.
Here is the solution:
1) Create the source dataset:
data testa;
infile datalines dsd dlm=',';
input var1 : $200. var2 : $200. count : 8. ;
datalines;
cat1,no,1,
cat1,yes,4,
cat1,unkown,3,
cat2,no,7,
cat2,yes,3,
cat2,unkown,5,
cat3,no,2,
cat3,yes,9,
cat3,unkown,0,
;
run;
2) Select the list of var1 values: cat1|cat2|cat3
proc sql;
select distinct(var1) into: list_var separated by '|' from testa;
quit;
3) Process the var list one by one
%macro processListVar(list_var);
data want;
stop; /* start from an empty dataset; without STOP, want would begin with one blank row */
run;
%let k=1;
%do %while (%qscan(&list_var, &k,|) ne );
%let var = %scan(&list_var, &k,|);
data testb(drop=var1 rename=(var2=comb));
set testa;
N=_N_+1+&k;
where var1="&var";
run;
data testc;
N=1+&k;
comb="&var";
count=.;
run;
data tmp;
set testb testc;
run;
proc sort data=tmp out=teste;
by N;
run;
data want;
set want teste;
run;
%put var=&var;
%let k = %eval(&k + 1);
%end;
%mend processListVar;
%processListVar(&list_var);
4) At the end you get the result in the dataset want.
Finally, drop the N column like this:
data want_cleaned (drop=N);
set want;
run;
5) More explanation of the code.
a. The key problem was preserving the order of cat1, cat2, cat3.
b. So I split the problem into one sub-dataset per category (cat1, cat2, ...) and used a %do %while loop to go through the categories.
c. We use the column N to number the rows (like an index), and then we can sort on this column to keep the order.
d. For example, for the first category, cat1: we select the rows where var1 is cat1, rename var2 to comb, and drop var1. This creates the testb dataset.
The testb step also creates the index (column N), and testc holds the first line of the sub-dataset (N=1+&k), which sorts ahead of the testb rows. &k offsets the index in every sub-dataset. We concatenate testb and testc into tmp, which contains all the information needed for cat1, and sort it by N. Then we append each sorted sub-dataset to the dataset want.
To summarize: we loop over the categories, sort each sub-dataset on the column N so the header line comes first, and append the sub-datasets to want in the order you wanted.
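For comparison, here is a minimal sketch of a single data step that relies on BY-group processing instead of a macro loop. It assumes testa is already grouped by var1, as in the sample data; the names want_by and cnt are illustrative only.

```sas
data want_by (keep=comb count);
  set testa (rename=(count=cnt));
  by var1 notsorted;       /* rows are grouped by var1, not necessarily sorted */
  length comb $200;
  if first.var1 then do;
    comb = var1;           /* header row for the group, with no count */
    count = .;
    output;
  end;
  comb = var2;             /* detail row */
  count = cnt;
  output;
run;
```

This writes one extra header row at the start of each var1 group, so no index column or sorting is needed.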
I'm not very familiar with Do Loops in SAS and was hoping to get some help. I have data that looks like this:
Product A: 1
Product A: 2
Product A: 4
I'd like to transpose (easy) and flag that Product A: 3 is missing, but I need to do this iteratively to the i-th degree since the number of products is large.
If I run the transpose part in SAS, my first column will be 1, second column will be 2, and third column will be 4 - but I'd really like the third column to be missing and the fourth column to be 4.
Any thoughts? Thanks.
Get some sample data:
proc sort data=sashelp.iris out=sorted;
by species;
run;
Determine the largest column we will need to transpose to. Depending on your situation you may just want to hardcode this value using a %let max=somevalue; statement:
proc sql noprint;
select cats(max(sepallength)) into :max from sorted;
quit;
%put &=max;
Transpose the data using a data step:
data want;
set sorted;
by species;
retain _1-_&max;
array a[1:&max] _1-_&max;
if first.species then do;
do cnt = lbound(a) to hbound(a);
a[cnt] = .;
end;
end;
a[sepallength] = sepallength;
if last.species then do;
output;
end;
keep species _1-_&max;
run;
Notice we are defining an array of columns: _1,_2,_3,..._max. This happens in our array statement.
We then use by-group processing to populate these newly created columns for a single species at a time. For each species, on the first record, we clear the array. For each record of the species, we populate the appropriate element of the array. On the final record for the species, we output the array contents.
You need a way to tell SAS that you have 4 products and the values are 1-4. In this example I create dummy ID with the needed information then transpose using ID statement to name new variables using the value of product.
data product;
input id product @@;
cards;
1 1 1 2 1 4
2 2 2 3
;;;;
run;
proc print;
run;
data productspace;
if 0 then set product;
do product = 1 to 4;
output;
end;
stop;
run;
data productV / view=productV;
set productspace product;
run;
proc transpose data=productV out=wide(where=(not missing(id))) prefix=P;
by id;
var product;
id product;
run;
proc print;
run;
I want to use proc compare to update dataset on a daily basis.
work.HAVE1
Date Key Var1 Var2
01Aug2013 K1 a 2
01Aug2013 K2 a 3
02Aug2013 K1 b 4
work.HAVE2
Date Key Var1 Var2
01Aug2013 K1 a 3
01Aug2013 K2 a 3
02Aug2013 K1 b 4
03Aug2013 K2 c 1
Date and Key together uniquely determine one record.
How can I use the above two tables to construct the following
work.WANT
Date Key Var1 Var2
01Aug2013 K1 a 3
01Aug2013 K2 a 3
02Aug2013 K1 b 4
03Aug2013 K2 c 1
I don't want to delete the previous data and then rebuild it. I want to modify it by appending new records at the bottom and adjusting the values in VAR1 or VAR2.
I'm struggling with proc compare but it just doesn't return what I want.
proc compare base=work.HAVE1 compare=work.HAVE2 out=WORK.DIFF outnoequal outcomp;
id Date Key;
run;
This will give you new and changed (unequal records) in single dataset WORK.DIFF. You'll have to distinguish new vs changed yourself.
However, what you want to achieve is actually a MERGE - inserts new, overwrites existing, though maybe due to performance reasons etc. you don't want to re-create the full table.
data work.WANT;
merge work.HAVE1 work.HAVE2;
by Date Key;
run;
Edit1:
/* outdiff option will produce records with _type_ = 'DIF' for matched keys */
proc compare base=work.HAVE1 compare=work.HAVE2 out=WORK.RESULT outnoequal outcomp outdiff;
id Date Key;
run;
data WORK.DIFF_KEYS; /* keys of changed records */
set WORK.RESULT;
where _type_ = 'DIF';
keep Date Key;
run;
/* split NEW and CHANGED */
data
WORK.NEW
WORK.CHANGED
;
merge
WORK.RESULT (where=( _type_ ne 'DIF'))
WORK.DIFF_KEYS (in = d)
;
by Date Key;
if d then output WORK.CHANGED;
else output WORK.NEW;
run;
Edit2:
Now you can just APPEND the WORK.NEW to target table.
For WORK.CHANGED - either use MODIFY or UPDATE statement to update the records.
Depending on the size of the changes, you can also think about PROC SQL; DELETE to delete old records and PROC APPEND to add new values.
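As a rough sketch of the MODIFY route, the step below updates work.HAVE1 in place. It is written under the assumption that work.HAVE2 is used directly as the transaction dataset instead of the split WORK.NEW and WORK.CHANGED tables; matched keys are overwritten and unmatched keys are appended.

```sas
data work.HAVE1;
  modify work.HAVE1 work.HAVE2;
  by Date Key;
  select (_iorc_);
    when (%sysrc(_sok)) replace;      /* key found in HAVE1: overwrite in place */
    when (%sysrc(_dsenmr)) do;        /* key not found: append as a new record */
      _error_ = 0;
      output;
    end;
    otherwise do;
      put 'Unexpected I/O condition: ' _iorc_=;
      stop;
    end;
  end;
run;
```

Note that with the default UPDATEMODE=MISSINGCHECK, missing values in the transaction dataset do not overwrite existing values in the master.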
All PROC COMPARE will do is tell you the differences between two datasets. To achieve your goal you need to use an UPDATE statement in a data step. This way, values in HAVE1 are updated with those from HAVE2 where the date and key match, and a new record is inserted where there is no match.
data have1;
input Date :date9. Key $ Var1 $ Var2;
format date date9.;
datalines;
01Aug2013 K1 a 2
01Aug2013 K2 a 3
02Aug2013 K1 b 4
;
run;
data have2;
input Date :date9. Key $ Var1 $ Var2;
format date date9.;
datalines;
01Aug2013 K1 a 3
01Aug2013 K2 a 3
02Aug2013 K1 b 4
03Aug2013 K2 c 1
;
run;
data want;
update have1 have2;
by date key;
run;
I have 6 identical SAS data sets. They only differ in terms of the values of the observations.
How can I create one output data, which finds the maximum value across all the 6 data sets for each cell?
The update statement seems a good candidate, but it cannot set a condition.
data1
v1 v2 v3
1 1 1
1 2 3
data2
v1 v2 v3
1 2 3
1 1 1
Result
v1 v2 v3
1 2 3
1 2 3
If need be the following could be automated by "PUT" statements or variable arrays.
***ASSUMES DATA SETS ARE SORTED BY ID;
Data test;
do until(last.id);
set a b c;
by id;
if v1 > updv1 then updv1 = v1;
if v2 > updv2 then updv2 = v2;
if v3 > updv3 then updv3 = v3;
end;
drop v1-v3;
rename updv1-updv3 = v1-v3;
run;
To provide a more complete solution to Rico's question (assuming 6 datasets, e.g. d1-d6), one could do it this way:
Data test;
array v(*) v1-v3;
array updv(*) updv1-updv3;
do until(last.id);
set d1-d6;
by id;
do i = 1 to dim(v);
if v(i) > updv(i) then updv(i) = v(i);
end;
end;
drop v1-v3 i;
rename updv1-updv3 = v1-v3;
run;
proc print;
var id v1-v3;
run;
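Another option, sketched here under the same assumption of six datasets d1-d6 that share an id variable, is to stack the datasets and let PROC MEANS compute the maximum per id. The names stacked and max_by_id are illustrative only.

```sas
/* stack all six datasets without copying them */
data stacked / view=stacked;
  set d1-d6;
run;

/* one output row per id, holding the maximum of each variable */
proc means data=stacked nway noprint;
  class id;
  var v1-v3;
  output out=max_by_id (drop=_type_ _freq_) max=;
run;
```

Because CLASS is used rather than BY, the input datasets do not need to be sorted by id.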
See below. This might be too complex for a SAS beginner, but I hope the comments explain it a bit.
/* macro rename_cols_opt to generate cols_opt&n variables
- cols_opt&n contains generated code for dataset RENAME option for a given (&n) dataset
*/
%macro rename_cols_opt(n);
%global cols_opt&n max&n;
proc sql noprint;
select catt(name, '=', name, "&n") into: cols_opt&n separated by ' '
from dictionary.columns
where libname='WORK' and memname='DATA1'
and upcase(name) ne 'MY_ID_COLUMN'
;
quit;
%mend;
/* prepare macro variables = pre-generate the code */
%rename_cols_opt(1)
%rename_cols_opt(2)
%rename_cols_opt(3)
%rename_cols_opt(4)
%rename_cols_opt(5)
%rename_cols_opt(6)
/* create macro variable keep_list containing names of output variables to keep (based on DATA1 structure; the code expects those variables in the other tables as well) */
proc sql noprint;
select trim(name) into: keep_list separated by ' '
from dictionary.columns
where libname='WORK' and memname='DATA1'
;
quit;
%put &keep_list;
/* macro variable maxcode contains generated code for calculating all MAX values */
proc sql noprint;
select cat(trim(name), ' = max(of ', trim(name), ":)") into: maxcode separated by '; '
from dictionary.columns
where libname='WORK' and memname='DATA1'
and upcase(name) ne 'MY_ID_COLUMN'
;
quit;
%put "&maxcode";
data result1 / view =result1;
merge
data1 (in=a rename=(&cols_opt1))
data2 (in=b rename=(&cols_opt2))
data3 (in=b rename=(&cols_opt3))
data4 (in=b rename=(&cols_opt4))
data5 (in=b rename=(&cols_opt5))
data6 (in=b rename=(&cols_opt6))
;
by MY_ID_COLUMN;
&maxcode;
keep &keep_list;
run;
/* created a datastep view, now "describing" it to see the generated code */
data view=result1;
describe;
run;
Here's another attempt that scales to any number of datasets and variables. I've added an ID variable this time as well. Like the answer from @vasja, it uses some advanced techniques. The two solutions are in fact very similar; I've used 'call execute' instead of a macro to create the view. My solution also requires the dataset names to be stored in a dataset.
/* create dataset of required dataset names */
data datasets;
input ds_name $;
cards;
data1
data2
;
run;
/* dummy data */
data data1;
input id v1 v2 v3;
cards;
10 1 1 1
20 1 2 3
;
run;
data data2;
input id v1 v2 v3;
cards;
10 1 2 3
20 1 1 1
;
run;
/* create dataset, macro list and count of variables names */
proc sql noprint;
create table variables as
select name as v_name from dictionary.columns
where libname='WORK' and upcase(memname)='DATA1' and upcase(name) ne 'ID';
select name, count(*) into :keepvar separated by ' ',
:numvar
from dictionary.columns
where libname='WORK' and upcase(memname)='DATA1' and upcase(name) ne 'ID';
quit;
/* create view that joins all datasets, renames variables and calculates maximum value per id */
data _null_;
set datasets end=last;
if _n_=1 then call execute('data data_all / view=data_all; merge');
call execute (trim(ds_name)|| '(rename=(');
do i=1 to &numvar.;
set variables point=i;
call execute(trim(v_name)||'='||catx('_',v_name,_n_));
end;
call execute('))');
if last then do;
call execute('; by id;');
do i=1 to &numvar.;
set variables point=i;
call execute(trim(v_name)||'='||'max(of '||trim(v_name)||':);');
end;
call execute('run;');
end;
run;
/* create dataset of maximum values per id per variable */
data result (keep=id &keepvar.);
set data_all;
run;
This is a newbie SAS question. I have a dataset with numerical variables v1-v120 and V, and a categorical variable Z (with, say, three possible values). For each possible value of Z, I would like to get another set of variables w1-w120, where w{i}=sum(v{i})/V and the sum is taken over the rows with a given value of Z. Thus I am looking for a 3*120 matrix in this case. I can do this in a data step, but would like to do it with Proc SQL or Proc MEANS, as the number of categorical variables in the actual dataset is moderately large. Thanks in advance.
Here's a solution using proc sql. You could probably also do something similar with proc means using an output dataset and a 'by' statement.
data t1;
input z v1 v2 v3;
datalines;
1 2 3 4
2 3 4 5
3 4 5 6
1 7 8 9
2 4 7 9
3 2 2 2
;
run;
%macro listForSQL(varstem1, varstem2, numvars);
%local numWithCommas;
%let numWithCommas = %eval(&numvars - 1);
%local i;
%do i = 1 %to &numWithCommas;
mean(&varstem1.&i) as &varstem2.&i,
%end;
mean(&varstem1.&numvars) as &varstem2.&numvars
%mend listForSQL;
proc sql;
create table t2 as
select
z,
%listForSQL(v, z, 3)
from t1
group by z
;
quit;
It's easy to do this with proc means. Using the t1 data set from Louisa Grey's answer:
proc means data=t1 nway noprint;
class z;
var v1-v3;
output out=t3 mean=w1-w3;
run;
This creates a table of results that match the SQL results.