Suppose I have two files named "test" and "lookup".
The file "test" contains the following information:
COL1 COL2
az ab
fc ll
gc ms
cc ds
And the file "lookup" has:
VAR
ll
dd
cc
ab
ds
I want to find those observations, which are in "test" but not in "lookup" and to replace them with missing values. Here is my code:
data want; set test;
array COL[2] COL1 COL2;
do n=1 to 2;
if COL[n] in lookup.VAR then COL[n]=COL[n];
else COL[n]=.;
end;
run;
When I run the above code, I get the error "Expecting an relational or arithmetic operator".
My question is: how do I refer to a variable from another file?
First, grab the %create_hash() macro from this post.
You need to use a hash object to achieve what you are looking for.
The return code from a hash lookup is zero when found and non-zero when not found.
Character missing values are not . but "".
data want;
set test;
if _n_ = 1 then do;
%create_hash(lu,var,var,"lookup");
end;
array COL[2] COL1 COL2;
do n=1 to 2;
var = col[n];
rc = lu.find();
if rc then
col[n] = "";
end;
drop rc var n;
run;
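If the %create_hash() macro is not at hand, the same lookup can be written with the hash object's own methods. This is only a minimal sketch, assuming that VAR in "lookup" is a character variable and that the datasets are named test and lookup as in the question. It uses check(), which tests whether a key exists without retrieving any data.

```sas
data want;
  if _n_ = 1 then do;
    /* Never executed; only puts VAR into the PDV so the hash knows the key's type */
    if 0 then set lookup;
    declare hash lu(dataset: "lookup");
    lu.defineKey("var");
    lu.defineDone();
  end;
  set test;
  array col[2] col1 col2;
  do n = 1 to 2;
    /* check() returns 0 when the key is found, non-zero otherwise */
    if lu.check(key: col[n]) ne 0 then col[n] = "";
  end;
  drop n var;
run;
```

With the sample data, only ll, ab, cc and ds should survive; every other value becomes a blank.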
Here is an alternative approach using proc sql:
proc sql;
create table want as
select case when col1 in (select var from lookup) then col1 else '' end as col1,
case when col2 in (select var from lookup) then col2 else '' end as col2
from test;
quit;
I have many monthly datasets that share the same name apart from a suffix identifying the month. For instance, the datasets whose names I build with this code:
TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
are called "TEMPCAAD.LIFT_MODEL_V1_202021", "TEMPCAAD.LIFT_MODEL_V1_202022" and so on...
I am trying to append all of the datasets, but some of them don't exist, so when I run the following code I get the error
Dataset "TEMPCAAD.LIFT_MODEL_V1_202022" does not exist.
%let currentmonth = &anomes_scores;
%let previousyearmonth = &anomes_x12;
data _null_;
length string $1000;
cur_month = input("&previousyearmonth.01",yymmdd8.);
do until (cur_month > input("&currentmonth.01",yymmdd8.));
string = catx(' ',trim(string),'TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
cur_month = intnx('month',cur_month,1,'b');
end;
call symput('mydatasets',trim(string));
%put &mydatasets;
run;
data WORK.LIFTS_U6M;
set &mydatasets.;
run;
How can I append only existing datasets?
Instead of looping over every file to check whether it exists, why don't you just extract all the dataset names from dictionary.tables?
libname TEMPCAAD "/home/kermit/TEMPCAAD";
data tempcaad.lift_model_v1_202110 tempcaad.lift_model_v1_202111 tempcaad.lift_model_v1_202112;
id = 1;
output tempcaad.lift_model_v1_202110;
id = 2;
output tempcaad.lift_model_v1_202111;
id = 3;
output tempcaad.lift_model_v1_202112;
run;
%let nome_modelo = MODEL;
%let versao_modelo = V1;
proc sql;
select strip("TEMPCAAD."||memname) into :dataset separated by " "
from dictionary.tables
where libname="TEMPCAAD" and memname like "LIFT_&NOME_MODELO._&VERSAO_MODELO.%";
quit;
data want;
set &dataset.;
run;
You can easily tweak the where clause to extract only the datasets that you wish to append. Just remember to use double quotes if you reference a macro variable in it, since macro variables do not resolve inside single quotes.
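If you would rather keep the month loop from the question, the EXIST function can filter the names as they are built. The sketch below assumes the same libref and macro variables as above. The two month values are made-up examples, and dsname is a helper variable introduced for illustration.

```sas
%let nome_modelo = MODEL;
%let versao_modelo = V1;
%let previousyearmonth = 202110;  /* assumed example value */
%let currentmonth = 202112;       /* assumed example value */

data _null_;
  length string $1000 dsname $41;
  cur_month = input("&previousyearmonth.01", yymmdd8.);
  do while (cur_month <= input("&currentmonth.01", yymmdd8.));
    dsname = cats("TEMPCAAD.LIFT_&nome_modelo._&versao_modelo._", put(cur_month, yymmn6.));
    /* exist() returns 1 when the dataset is present, 0 otherwise */
    if exist(dsname) then string = catx(' ', string, dsname);
    cur_month = intnx('month', cur_month, 1, 'b');
  end;
  call symputx('mydatasets', string);
run;

%put &=mydatasets;
```

If none of the datasets exist, &mydatasets ends up empty, so it is worth checking its value before using it in a set statement.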
I stumbled upon the following code snippet in which the variable top3 has to be filled from a table have rather than from an array of numbers.
%let top3 = 14 15 42; /* This should be made obsolete.. */
%let no = 3;
proc sql;
create table want as
select *
from (select x, y from foo) a
%do i = 1 %to &no.;
%let current = %scan(&top3.,&i.); /* What do I need to put here? */
left join (select x, y from bar where z=&current.) row_&current.
on a.x = row_&current..x
%end;
;
quit;
The table have contains the xs from the string and looks as follows:
i x
1 14
2 15
3 42
I am now wondering how I should modify the %let current = ... line such that current is populated from the table have. I know how to populate a macro variable using proc sql with select .. into, but I am afraid that the way I am going right now is fully against SAS philosophy.
It looks like you're more or less transposing something. If that's the case, this is doable in macro/sql pretty easily.
First, here's the simple version - no macro.
proc sql;
create table class_t as
select * from (
select name from sashelp.class ) class
left join (
select name, age as age_Alfred
from sashelp.class
where name='Alfred') Alfred
on class.name = Alfred.name
;
quit;
We grab the value of age from the Alfred row and put it on the main join. This isn't exactly what you're doing, but it seems similar. (I'm just using one table, but you can of course use two here.)
Now, how do we extend this to be table-driven and not handwritten? Macros!
First, here's the macro - just taking the Alfred bit and making it generic.
%macro joiner(name=);
left join (
select name, age as age_&name.
from sashelp.class
where name="&name.") &name.
on class.name = &name..name
%mend joiner;
Second, we look at this and see two things we need to put into macro lists: the SELECT variable list (we'll get one new variable for each call), and the JOIN list.
proc sql;
select cats('%joiner(name=',name,')')
into :joinlist separated by ' '
from sashelp.class;
select cats(name,'.age_',name)
into :selectlist separated by ','
from sashelp.class;
quit;
And then, we just call it!
proc sql;
create table class_t as
select class.name,&selectlist. from (
select name from sashelp.class) class
&joinlist.
;
quit;
In your case, the dataset that drives the macro lists is presumably the three-row dataset shown above ("have"), the dataset that supplies the joined-in values is a second one ("bar"), and the dataset you join onto is a third ("foo"). Here I use a single dataset for simplicity, but the concept is the same with different sources.
When the lookup data is in a table you can perform a three way join without any need for SAS Macro. You don't provide any data so the example will mock some.
Example:
Suppose a master record has several associated detail records, and the detail records contain a z value used for selection into a result set per a wanted z lookup table.
data masters;
call streaminit(2020);
do id = 1 to 100;
do x = 1 to 100;
m_rownum + 1;
code = rand('integer', 10,45);
output;
end;
end;
run;
data details;
call streaminit(2020);
do date = 1 to 20;
do x = 1 to 100;
do rep = 1 to 5;
d_rownum + 1;
amount = rand('integer', 100,200);
z = rand('integer', 10,45);
output;
end;
end;
end;
run;
data zs;
input z @@; datalines;
14 15 42
;
proc sql;
create table want as
select
m_rownum
, d_rownum
, masters.id
, masters.x
, masters.code
, details.z
, details.date
, details.amount
from
masters
left join
details
on
details.x = masters.x
inner join
zs
on
zs.z = details.z
order by
masters.id, masters.x, details.z, details.date
;
quit;
I have a SAS dataset whose column layout is like this:
Col1 Col2 Col3
A_jan2018 A_feb2018 A_mar2018
B_jan2018 B_feb2018 B_mar2018
C_jan2018 C_feb2018 C_mar2018
I need to re-order the columns that start with A or B or C in such a format --
Col1 Col2 Col3
A_Jan2018 B_Jan2018 C_Jan2018
A_Feb2018 B_Feb2018 C_Feb2018
A_Mar2018 B_Mar2018 C_Mar2018
The A,B,C prefixes need not be in any sorting order (meaning they can start with anything), but my requirement is to re-order them based on the month-year (meaning B_Jan2018 A_Feb2018 C_2018 is okay).
Is there any way of achieving this in SAS?
Change the data structure so that you have a long data set instead of a wide data set. (DATA STEP)
Separate the prefix from the date portion. (DATA STEP)
Sort into your desired order. (PROC SORT)
Transpose to the desired format. (PROC TRANSPOSE)
%*create sample data;
data have;
informat col1 col2 col3 $10.;
input Col1 $ Col2 $ Col3 $;
cards;
A_jan2018 A_feb2018 A_mar2018
B_jan2018 B_feb2018 B_mar2018
C_jan2018 C_feb2018 C_mar2018
;
run;
%*Make it a long table;
data _long;
set have;
array _col(3) col1-col3;
do i=1 to 3;
prefix=scan(_col(i), 1, "_");
date=input(scan(_col(i), 2, "_"), anydtdte.);
value=catx('_', prefix, put(date, monyy7.));
output;
end;
format date date9.;
run;
%*Sort by desired output;
proc sort data=_long;
by date prefix;
run;
%* transpose to the desired format;
proc transpose data=_long out=want1;
by i;
var value;
run;
If your data is exactly as posted and the output is an exact transpose of it, then this also works, but it's entirely reliant on the source data being as specified and sorted correctly.
proc transpose data=have out=want2;
var col1-col3;
run;
I'm working in SAS, reading values from datasets and saving them into macro variables.
Sample data:
table
RK | ID | column_1 | column_2
1 | one| value_1 |
2 | two| value_1 | value_2
proc sql noprint;
select column_1
into: variable_1
from table
where RK = 1;
select column_2
into: variable_2
from table
where RK = 1;
quit;
Now I want to use those variables in my report, and if a variable received no data I want to print a blank space. For example:
%put &variable_1;
%put &variable_2;
Result
value_1
&variable_2
If there is no value in a macro variable, I want it to print nothing but a blank space to my log or report.
How can I do this?
Expected result
value_1
(A blank space)
%let variable_1=;
%let variable_2=;
proc sql noprint;
select column_1
into: variable_1
from table
where RK = 1;
select column_2
into: variable_2
from table
where RK = 1;
quit;
If the select statement does not return any rows (empty source table or no rows match where condition) then the macro variable(s) named in the into clause are not created. Just use a %let statement to set the default value before running the select statement.
proc sql noprint ;
%let infant_list=;
select name
into :infant_list separated by ' '
from sashelp.class
where age < 5
;
quit;
%put &=infant_list;
If you really want a macro variable to contain a single space instead of nothing then you will need to use macro quoting.
%let infant_list=%str( );
Using coalescec:
proc sql noprint;
select coalescec(column_1," ")
into: variable_1
from table
where RK = 1;
select coalescec(column_2," ")
into: variable_2
from table
where RK = 1;
quit;
Try this out:
proc sql noprint;
select case when column_1 is null then " " else column_1 end
into: variable_1
from table
where RK = 1;
select case when column_2 is null then " " else column_2 end
into: variable_2
from table
where RK = 1;
quit;
%put &variable_1;
%put &variable_2;
LOG
Named macro value logging is a shortcut syntax
%put &=variable_1; /* is almost the same as */
%put variable_1=&variable_1;
If variable_1 contains unquoted semicolons or other text that could be mistaken for code, it is safer to log it using %superq. Macro variables can also be shown more clearly in the log by bracketing the resolved value, which lets you see leading and trailing spaces.
%put NOTE: variable_1=[%superq(variable_1)];
INTO
You can specify more than one variable in the INTO clause
select a, b
into :a, :b
The basic INTO form does not trim values. The length of the target macro variable's value is based on the source variable's length, the length of the computed expression, or the length given by the select item's length= option.
select a length=50, substr(b,1,2)
into :a_50, :b_2
/* The length of macro variable 'variable_1' will be the same as the length of column_1,
regardless of the " ". If the string literal were longer than column_1,
the computation length would be the string literal's length.
*/
select case when column_1 is null then " " else column_1 end
into: variable_1
There is additional syntax and keywords for INTO targets
trimmed value into a macro variable
select A
INTO :A trimmed
trimmed value from multiple rows into multiple (range of) macro variables
select A
INTO :A1-A99 /* populates range of &SQLOBS macro variables if <99 */
trimmed value from multiple rows into single macro variable
select A
INTO :A_csv separated by ','
NOTE: A trimmed blank value transferred to macro becomes a zero-length string.
NOTE: A character null in SAS data set is a blank value, so you don't necessarily need a CASE or coalesce
The blank situation for variable_1 could be
select column_1 into: variable_1
select column_1 into: variable_1 trimmed
--- LOG ---
NOTE: variable_1=[ ];
NOTE: variable_1=[];
I speculate that the length of the macro variable value is determined during the SQL statement compilation/planning time, and can not be changed during execution time (meaning the target length won't change according to values found)
Case 1 - no rows selected
When the where selects no rows, there will be no cause for the INTO clause to operate, and thus no macro variables will be created. If the macro variables already existed before the query, the values will remain unchanged. Thus you should initialize each macro variable listed in the INTO clause prior to the query (per Azeem112).
%let variable_1=;
%let variable_2=;
proc sql noprint;
If you really need a single space instead of nothing at all, initialize thusly
%let variable_1=%str( );
%let variable_2=%str( );
proc sql noprint;
Case 2 - rows selected
The value of the selected item is moved into the macro variable. Without TRIMMED, the macro value has the full untrimmed length of the item. With TRIMMED, it has the length of the trimmed value, which is 0 if the item is blank. If you need a single space in the macro variable for the blank value case you could do
select column_1 into: variable_1 trimmed
...;
%let variable_1 = %qsysfunc(ifc(%length(%superq(variable_1)),%superq(variable_1),%str( )));
Got the following example
I'm trying to find out whether any part of the string in the column nomvar of table tata exists in col1 of table toto and, if so, to get its definition from col2.
For I2010,RT,IS-IPI,F_CC11_X_CCXBA, I would have in the column intitule "yes,toto,tata,well"
I thought about using a proc sql with an insert and a select but I have two tables and I would need to do a join.
At the same time, I thought about putting everything in one table, but I'm unsure whether that is a good idea.
Any suggestions are welcome, as I'm deeply stuck.
The SAS data step hash object is a nice way to do this. It allows you to read the Toto table into memory and it becomes a lookup table for you. Then you just walk the string from the Tata table using the scan function, tokenize, and lookup the col2 value. Here is the code.
By the way, turning table Tata into a structure like Toto and performing join is a perfectly rational way to do this, too.
/*Create sample data*/
data toto;
length col1 col2 $ 100;
col1='I2010';
col2='yes';
output;
col1='RT';
col2='toto';
output;
col1='IS-IPI';
col2='tata';
output;
col1='F_CC11_X_CCXBA';
col2='well';
output;
run;
data tata;
length nomvar intitule $ 100;
nomvar='I2010,RT,IS-IPI,F_CC11_X_CCXBA';
run;
/*Now for the solution*/
/*You can do this lookup easily with a data step hash object*/
data tata;
set tata;
length col1 col2 token $ 100;
drop col1 col2 token i sepchar rc;
/*slurp the data in from the Toto data set into the hash*/
if (_n_ = 1) then do;
declare hash toto_hash(dataset: 'work.toto');
rc = toto_hash.definekey('col1');
rc = toto_hash.definedata('col2');
toto_hash.definedone();
end;
/*now walk the tokens in data set tata and perform the lookup to get each value*/
i = 1;
sepchar = ''; /*this will be a comma after the first iteration of the loop*/
intitule = '';
do until (token = '');
/*grab nth item in the comma-separated list*/
token = scan(nomvar, i, ',');
/*lookup the col2 value from the toto data set*/
rc = toto_hash.find(key:token);
if (rc = 0) then do;
/*lookup successful so tack the value on*/
intitule = strip(intitule) || sepchar || col2;
sepchar = ',';
end;
i = i + 1;
end;
run;
Assuming your data is all structured like this (you're looking at the different strings in between . characters), I would think the easiest way is to normalize TATA (splitting by .), do a straight join, and then (if you need to) transpose back. (It might be better to leave it vertical; you would very likely find that a more useful structure for analysis.)
data tata_v;
set tata;
call scan(nomvar,1,position,length,'.');
do _i = 1 by 1 while (position gt 0);
nomvar_out = substr(nomvar,position,length);
output;
call scan(nomvar,_i+1,position,length,'.');
end;
run;
Now you can join on nomvar_out and then (if needed) recombine things.
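The join and the optional recombination could look roughly like the following. This is a sketch, not a tested solution. It assumes tata_v was built as above (with whichever delimiter matches your data) and that toto has col1 and col2 as in the question. The names tata_joined and tata_recombined are made up for illustration.

```sas
/* Join each token to its definition in toto */
proc sql;
  create table tata_joined as
  select v.nomvar, v._i, v.nomvar_out, t.col2
  from tata_v as v
  left join toto as t
    on v.nomvar_out = t.col1
  order by v.nomvar, v._i;
quit;

/* Optional: collapse back to one row per nomvar */
data tata_recombined;
  length intitule $100;
  do until (last.nomvar);
    set tata_joined;
    by nomvar;
    /* catx skips blank values, so unmatched tokens add nothing */
    intitule = catx(',', intitule, col2);
  end;
  keep nomvar intitule;
run;
```

Ordering by _i keeps the definitions in the same order as the tokens in nomvar.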