Check if a value exists in a dataset or column

Check if a value exists in a dataset or column - sas

I have a macro program with a loop (for i in 1 to n). With each i i have a table with many columns - variables. In these columns, we have one named var (who has 3 possible values: a b and c).
So for each table i, I want to check his column var if it exists the value "c". If yes, I want to export this table into a sheet of excel. Otherwise, I will concatenate this table with others.
Can you please tell me how can I do it?

Ok, in your macro at step i you have to do something like this
proc sql;
select min(sum(case when var = 'c' then 1 else 0 end),1) into :trigger from table_i;
quit;
then, you will get macro variable trigger equal 1 if you have to do export, and 0 if you have to do concatenetion. Next, you have to code something like this
%if &trigger = 1 %then %do;
proc export data = table_i blah-blah-blah;
run;
%end;
%else %do;
data concate_data;
set concate_data table_i;
run;
%end;

Without knowing the whole nine yard of your problem, I am at risk to say that you may not need Macro at all, if you don't mind exporting to .CSV instead of native xls or xlsx. IMHO, if you do 'Proc Export', meaning you can't embed fancy formats anyway, you 'd better off just use .CSV in most of the settings. If you need to include column headings, you need to tap into metadata (dictionary tables) and add a few lines.
filename outcsv '/share/test/'; /*define the destination for CSV, change it to fit your real settings*/
/*This is to Cat all of the tables first, use VIEW to save space if you must*/
data want1;
set table: indsname=_dsn;
dsn=_dsn;
run;
/*Belowing is a classic 2XDOW implementation*/
data want;
file outcsv(test.csv) dsd; /*This is your output CSV file, comma delimed with quotes*/
do until (last.dsn);
set want1;
by dsn notsorted; /*Use this as long as your group is clustered*/
if var='c' then _flag=1; /*_flag value will be carried on to the next DOW, only reset when back to top*/
end;
do until (last.dsn);
set want1;
by dsn notsorted;
if _flag=1 then put (_all_) (~); /*if condition meets, output to CSV file*/
else output; /*Otherwise remaining in the Cat*/
end;
drop dsn _flag;
run;

Related

SAS - Loop through rows and calculate MD5

I want to sweep each table in a libname and calculate a hash over each row.
For that purpose, i have already a table with libname, memname, concatenated columns with ',' and number of observations
libname
memname
columns
num_obs
lib_A
table_A
col1a,col2a...colna
1
lib_A
table_B
col1b,col2b...colnb
2
lib_B
table_C
col1c,col2c...colnc
1
I first get all data into ranged macro variables (i think its easier to work, but could be wrong, ofc)
proc sql;
select libname, memname, columns, num_obs
into :lib1-, :tab1-, :column1-, :sqlobs1-
from have
where libname="&sel_livraria"; /*macro var = prompt from user*/
quit;
Just for developing guideline i made the code just to check one specific table without getting the row number of it since with a simple counter doesn't work (i get the order of the rows mess up each time i run) and it works for that purpose
%let lib=lib_A;
%let tab=table_B;
%let columns=col1b,col2b,colnb;
data want;
length check $32.;
format check $hex32.;
set &lib..&tab;
libname="&lib";
memname="&tab";
check = md5(cats(&columns));
hash = put(check,$hex32.);
keep libname memname hash;
put hash;
put _all_;
run;
So, what’s the best approach for getting a MD5 from each row (same order as tables) of all tables in a libname? I saw problems i couldn’t overcame using data steps, procs or macros.
The result i wanted if lib_A was selected in prompt were something like:
libname
memname
obs_row
hash
lib_A
table_A
1
64A29CCA15F53C83A9583841294A26AA
lib_A
table_B
1
80DAC7B9854CF71A67F9C00A7EC4D9EF
lib_A
table_B
2
0AC44CD79DAB2E33C93BB2312D3A9A40
Need some help.
Tks in advance.

You're pretty close. This is how I would approach it. We'll create a macro with three parameters: data, lib, and out. data is the dataset you have with the column information. lib is the library you want to pull from your dataset, and out is the output dataset that you want to have.
We'll read each column into an individual macro variable:
memname1
memname2
memname3
libname1
libname2
libname3
etc.
From here, we simply need to loop over all of the macro variables and apply them where appropriate. We can easily count how many there are in a data step. All we need to do is add double-ampersands to resolve them correctly. For more information on why this is, check out this MWSUG paper.
%macro get_md5(data=, lib=, out=);
/* Save all variables into macro variables:
memname1 memname2 ...
columns1 columns2 ...
*/
data _null_;
set &data.;
where upcase(libname)=upcase("&lib.");
call symputx(cats('memname', _N_), memname);
call symputx(cats('columns', _N_), columns);
call symputx(cats('obs', _N_), obs);
call symputx('n_datasets', _N_);
run;
/* Loop through all the datasets and access each macro variable */
%do i = 1 %to &n_datasets.;
/* Double ampersand needed:
First, resolve &i. to get &memname1
Then resolve &mename1 to get the value stored in the macro variable memname1
*/
%let memname = &&memname&i.;
%let columns = &&columns&i.;
%let obs = &&obs&i.;
/* Calculate md5 in a temporary dataset */
data _tmp_;
length lib $8.
memname $32.
obs_row 8.
hash $32.
;
set &lib..&memname.(obs=&obs.);
lib = "&lib.";
memname = "&memname.";
obs_row = _N_;
hash = put(md5(cats(&columns.)), $hex32.);
keep libname memname obs_row hash;
run;
/* Overwrite the dataset so we don't keep appending */
%if(&i. = 1) %then %do;
data &out.;
set _tmp_;
run;
%end;
%else %do;
proc append base=&out. data=_tmp_;
run;
%end;
%end;
/* Remove temporary data */
proc datasets lib=work nolist;
delete _tmp_;
quit;
%mend;
Example:
data have;
length libname memname columns $15.;
input libname$ memname$ columns$ obs;
datalines;
sashelp cars make,model,msrp 1
sashelp class age,height,name 2
sashelp comet dose,length,sample 1
;
run;
%get_md5(data=have, lib=sashelp, out=want);
Output:
libname memname obs_row hash
sashelp cars 1 258DADA4843E7068ABAF95667E881B7F
sashelp class 1 29E8F4F03AD2275C0F191FE3DAA03778
sashelp class 2 DB664382B88BE7E445418B1A1C8CE13B
sashelp comet 1 210394B77E7696506FDEFD78890A8AB9

I would make a macro that takes as input the four values in your metadata dataset. Note that commas are anathema to SAS programs, especially macro code, so make the macro so it can accept space delimited variable lists (like normal SAS program statements do).
To reduce the risk of name conflict I will name the variable using triple underscores and then rename them back to human friendly names when the dataset is written.
%macro next_ds(libname,memname,num_obs,varlist);
data next_ds;
length ___1 $8 ___2 $32 ___3 8 ___4 $32 ;
___1 = "&libname";
___2 = "&memname";
___3 + 1;
set &libname..&memname(obs=&num_obs keep=&varlist);
___4 = put(md5(cats(of &varlist)),$hex32.);
keep ___1-___4 ;
rename ___1=libname ___2=memname ___3=obs_row ___4=hash;
run;
%mend next_ds;
Let's make some test metadata that reference datasets everyone should have.
data have;
infile cards truncover ;
input libname :$8. memname :$32. num_obs columns $200.;
cards;
sashelp class 3 name,sex,age
sashelp cars 2 make,model
;
And make sure the target dataset does not already exists.
%if %sysfunc(exist(want)) %then %do;
proc delete data=want; run;
%end;
Now you can call that macro once for each observation in your source metadata dataset. There is no need to generated oodles of macro variables. Instead you can use CALL EXECUTE() to generate the macro calls directly from the dataset.
We can replace the commas in the column lists when making the macro call. You can add in a PROC APPEND step after each macro call to aggregate the results into a single dataset.
data _null_;
set have;
call execute(cats(
'%nrstr(%next_ds)(',libname,',',memname,',',num_obs
,',',translate(columns,' ',','),')'
));
call execute('proc append data=next_ds base=want force; run;');
run;
Notice that wrapping the macro call in %NRSTR() makes the SAS log easier to read.
1 + %next_ds(sashelp,class,3,name sex age)
2 + proc append data=next_ds base=want force; run;
3 + %next_ds(sashelp,cars,2,make model)
4 + proc append data=next_ds base=want force; run;
Results:
Obs libname memname obs_row hash
1 sashelp class 1 5425E9CEDA1DDEB71B2692A3C7050A8A
2 sashelp class 2 C532D227D358A3764C2D225DC8C02D18
3 sashelp class 3 13AD5F1517E0C4494780773B6DC15211
4 sashelp cars 1 777C60693BF5E16F38706C89301CD0A8
5 sashelp cars 2 07080C9321145395D1A2BCC10FBE6B83
Note that CATS() might not be the best method for generating the string to pass to the MD5() function. That can generate the same string for different combinations of the source variables. For example 'AB' || 'CD' is the same as 'A' || 'BCD'. Perhaps just use CAT() instead.

Stu's approach is nice, and will work most of the time but will fall over when you have wiiiide variables, a large number of variables, variables with large precision, and other edge cases.
So for the actual hashing part, you might consider this macro, which is extensively tested within Data Controller for SAS:
https://core.sasjs.io/mp__md5_8sas.html
Usage:
data _null_;
set sashelp.class;
hashvar=%mp_md5(cvars=name sex, nvars=age height weight);
put hashvar=;
run;

SAS-Creating Panel by several datasets

Suppose there are ten datasets with same structure: date and price, particularly they have same time period but different price
date price
20140604 5
20140605 7
20140607 9
I want to combine them and create a panel dataset. Since there is no name in each datasets, I attempt to add a new variable name into each data and then combine them.
The following codes are used to add name variable into each dataset
%macro name(sourcelib=,from=,going=);
proc sql noprint; /*read datasets in a library*/
create table mytables as
select *
from dictionary.tables
where libname = &sourcelib
order by memname ;
select count(memname)
into:obs
from mytables;
%let obs=&obs.;
select memname
into : memname1-:memname&obs.
from mytables;
quit;
%do i=1 %to &obs.;
data
&going.&&memname&i;
set
&from.&&memname&i;
name=&&memname&i;
run;
%end;
%mend;
So, is this strategy correct? Whether are there a different way to creating a panel data?

There are really two ways to setup repeated measures data. You can use the TALL method that your code will create. That is generally the most flexible. The other would be a wide format with each PRICE being stored in a different variable. That is usually less flexible, but can be easier for some analyses.
You probably do not need to use macro code or even code generation to combine 10 datasets. You might find that it is easier to just type the 10 dataset names than to write complex code to pull the names from metadata. So a data step like this will let you list any number of datasets in the SET statement and use the membername as the value for the new PANEL variable that distinguishes the source dataset.
data want ;
length dsn $41 panel $32 ;
set in1.panel1 in1.panela in1.panelb indsname=dsn ;
panel = scan(dsn,-1,'.') ;
run;
And if your dataset names follow a pattern that can be used as a member list in the SET statement then the code is even easier to write. So you could have a list of names that have a numeric suffix.
set in1.panel1-in1.panel10 indsname=dsn ;
or perhaps names that all start with a particular prefix.
set in1.panel: indsname=dsn ;
If the different panels are for the same dates then perhaps the wide format is easier? You could then merge the datasets by DATE and rename the individual PRICE variables. That is generate a data step that looks like this:
data want ;
merge in1.panel1 (rename=(price=price1))
in1.panel2 (rename=(price=price2))
...
;
by date;
run;
Or perhaps it would be easier to add a BY statement to the data set that makes the TALL dataset and then transpose it into the WIDE format.
data tall;
length dsn $41 panel $32 ;
set in1.panel1 in1.panela in1.panelb indsname=dsn ;
by date ;
panel = scan(dsn,-1,'.') ;
run;
proc transpose data=tall out=want ;
by date;
id panel;
var price ;
run;

I can't comment on the SQL code but the strategy is correct. Add a name to each data set and then panel on the name with the PANELBY statement.

That is a valid way to achieve what you are looking for.
You are going to need 2 . in between the macros for library.data syntax. The first . is used to concatenate. The second shows up as a ..
I assume you will want to append all of these data sets together. You can add
data &going..want;
set
%do i=1 %to &obs;
&from..&&memname&i
%end;
;
run;
You can combine your loop that adds the names and that data step like this:
data &going..want;
set
%do i=1 %to &obs;
&from..&&memname&i (in=d&i)
%end;
;
%do i=1 %to &obs;
if d&i then
name = &&memname&i;
%end;
run;

Insert text into all cells of first column in a sas dataset

I've output 'Moments' from Proc Univariate to datasets. Many.
Example: Moments_001.sas7bdat through to Moments_237.sas7bdat
For the first column of each dataset (new added first column, and probably new dataset, as opposed to the original) I would like to have a particular text in every cell going down to bottom row.
The exact text would be the name of the respective dataset file: say, "Moments_001".
I do not have to 'grab' the filename, per se, if that's not possible. As I know what the names are already, I can put that text into the procedure. However, grabbing the filenames, if possible, would be easier from my standpoint.
I'd greatly appreciate any help anyone could provide to accomplish this.
Thanks,
Nicholas Kormanik

Are you looking for the INDSNAME option of the SET statement? You need to define two variables because the one generated by the option is automatically dropped.
data want;
length moment dsn $41 ;
set Moments_001 - Moments_237 indsname=dsn ;
moment=dsn;
run;

I think something along these lines should be what you're after. Assuming you have a list of moments, you can loop through it and add a new variable as the first column of each dataset.
%let list_of_moments = moments_001 moments_002 ... moments_237;
%macro your_macro;
%do i = 1 %to %sysfunc(countw(&list_of_moments.));
%let this_moment = %scan(&list_of_moments., &i.);
data &this_moment._v2;
retain new_variable;
set &this_moment.;
new_variable = "&this_moment.";
run;
%end;
%mend your_macro;
%your_macro;

The brute force entering of text into column 1 looks like this:
data moments_001;
length text $ 16;
set moments_001;
text="Moments_001";
run;
You could also write a macro that would loop through all 237 data sets and insert the text.
UNTESTED CODE
%macro do_all;
%do i=1 %to 237;
%let num = %sysfunc(putn(&i,z3.));
data moments_&num;
length text & 16;
set moments_&num;
text="Moments_&num";
run;
%end;
%mend
%do_all
It seems to me (not knowing your problem) that if you use PROC UNIVARIATE with the BY option, then you wouldn't need 237 different data sets, all of your output would be in one data set and the BY variable would also be in the data set. Does that solve your problem?

Set the labels of a SAS Dataset equal to their variable name

I'm working with a rather large several dataset that are provided to me as a CSV files. When I attempt to import one of the files the data will come in fine but, the number of variables in the file is too large for SAS, so it stops reading the variable names and starts assigning them sequential numbers. In order to maintain the variable names off of the data set I read in the file with the data row starting on 1 so it did not read the first row as variable names -
proc import file="X:\xxx\xxx\xxx\Extract\Live\Live.xlsx" out=raw_names dbms=xlsx replace;
SHEET="live";
GETNAMES=no;
DATAROW=1;
run;
I then run a macro to start breaking down the dataset and rename the variables based on the first observations in each variable -
%macro raw_sas_datasets(lib,output,start,end);
data raw_names2;
raw_names;
if _n_ ne 1 then delete;
keep A -- E &start. -- &end.;
run;
proc transpose data=raw_names2 out=raw_names2;
var A -- &end.;
run;
data raw_names2;
set raw_names2;
col1=compress(col1);
run;
data raw_values;
set raw;
keep A -- E &start. -- &end.;
run;
%macro rename(old,new);
data raw_values;
set raw_values;
rename &old.=&new.;
run;
%mend rename;
data _null_;
set raw_names2;
call execute('%rename('||_name_||","||col1||")");
run;
%macro freq(var);
proc freq data=raw_values noprint;
tables &var. / out=&var.;
run;
%mend freq;
data raw_names3;
set raw_names2;
if _n_ < 6 then delete;
run;
data _null_;
set raw_names3;
call execute('%freq('||col1||")");
run;
proc sort data=raw_values;
by StudySubjectID;
run;
data &lib..&output.;
set raw_values;
run;
%mend raw_sas_datasets;
The problem I'm running into is that the variable names are now all set properly and the data is lined up correctly, but the labels are still the original SAS assigned sequential numbers. Is there any way to set all of the labels equal to the variable names?

If you just want to remove the variable labels (at which point they default to the variable name), that's easy. From the SAS Documentation:
proc datasets lib=&lib.;
modify &output.;
attrib _all_ label=' ';
run;
I suspect you have a simpler solution than the above, though.
The actual renaming step needs to be done differently. Right now it's rewriting the entire dataset over and over again - for a lot of variables that is a terrible idea. Get your rename statements all into one datastep, or into a PROC DATASETS, or something else. Look up 'list processing SAS' for details on how to do that; on this site or on google you will find lots of solutions.
You likely can get SAS to read in the whole first line. The number of variables isn't the problem; it is probably the length of the line. There's another question that I'll find if I can on this site from a few months ago that deals with this exact problem.
My preferred option is not to use PROC IMPORT for CSVs anyway; I would suggest writing a metadata table that stores the variable names and the length/types for the variables, then using that to write import code. A little more work at first, but only has to be done once per study and you guarantee PROC IMPORT isn't making silly decisions for you.

In the library sashelp is a table vcolumn. vcolumn contains all the names of your variables for each library by table. You could write a macro that puts all your variable names into macro variables and then from there set the label.
Here's some code that I put together (not very pretty) but it does what you're looking for:
data test.label_var;
x=1;
y=1;
label x = 'xx';
label y = 'yy';
run;
proc sql noprint;
select count(*) into: cnt
from sashelp.vcolumn
where memname = 'LABEL_VAR';quit;
%let cnt = &cnt;
proc sql noprint;
select name into: name1 - :name&cnt
from sashelp.vcolumn
where memname = 'LABEL_VAR';quit;
%macro test;
%do i = 1 %to &cnt;
proc datasets library=test nolist;
modify label_var;
label &&name&i=&&name&i;
quit;
%end;
%mend test;
%test;

SAS: Drop column in a if statement

I have a dataset called have with one entry with multiple variables that look like this:
message reference time qty price
x 101 35000 100 .
the above dataset changes every time in a loop where message can be ="A". If the message="X" then this means to remove 100 qty from the MASTER set where the reference number equals the reference number in the MASTER database. The price=. is because it is already in the MASTER database under reference=101. The MASTER database aggregates all the available orders at some price with quantity available. If in the next loop message="A" then the have dataset would look like this:
message reference time qty price
A 102 35010 150 500
then this mean to add a new reference number to the MASTER database. In other words, to append the line to the MASTER.
I have the following code in my loop to update the quantity in my MASTER database when there is a message X:
data b.master;
modify b.master have(where=(message="X")) updatemode=nomissingcheck;
by order_reference_number;
if _iorc_ = %sysrc(_SOK) then do;
replace;
end;
else if _iorc_ = %sysrc(_DSENMR) then do;
output;
_error_ = 0;
end;
else if _iorc_ = %sysrc(_DSEMTR) then do;
_error_ = 0;
end;
else if _iorc_ = %sysrc(_DSENOM) then do;
_error_ = 0;
end;
run;
I use the replace to update the quantity. But since my entry for price=. when message is X, the above code sets the price='.' where reference=101 in the MASTER via the replace statement...which I don't want. Hence, I prefer to delete the price column is message=X in the have dataset. But I don't want to delete column price when message=A since I use this code
proc append base=MASTER data=have(where=(msg_type="A")) force;
run;
Hence, I have this code price to my Modify statement:
data have(drop=price_alt);
set have; if message="X" then do;
output;end;
else do; /*I WANT TO MAKE NO CHANGE*/
end;run;
but it doesn't do what I want. If the message is not equal X then I don't want to drop the column. If it is equal X, I want to drop the column. How can I adapt the code above to make it work?

Its a bit of a strange request to be honest, such that it raises questions about whether what you're doing is the best way of doing it. However, in the spirit of answering the question...
The answer by DomPazz gives the option of splitting the data into two possible sets, but if you want code down the line to always refer to a specific data set, this creates its own complications.
You also can't, in the one data step, tell SAS to output to the "same" data set where one instance has a column and one instance doesn't. So what you'd like, therefor, is for the code itself to be dynamic, so that the data step that exists is either one that does drop the column, or one that does not drop the column, depending on whether message=x. The answer to this, dynamic code, like many things in SAS, resolves to the creative use of macros. And it looks something like this:
/* Just making your input data set */
data have;
message='x';
time=35000;
qty=1000;
price=10.05;
price_alt=10.6;
run;
/* Writing the macro */
%macro solution;
%local id rc1 rc2;
%let id=%sysfunc(open(work.have));
%syscall set(id);
%let rc1=%sysfunc(fetchobs(&id, 1));
%let rc2=%sysfunc(close(&id));
%IF &message=x %THEN %DO;
data have(drop=price_alt);
set have;
run;
%END;
%ELSE %DO;
data have;
set have;
run;
%END;
%mend solution;
/* Running the macro */
%solution;

Try this:
data outX(drop=price_alt) outNoX;
set have;
if message = "X" then
output outX;
else
output outNoX;
run;
As #sasfrog says in the comments, a table either has a column or it does not. If you want to subset things where MESSAGE="X" then you can use something like this to create 2 data sets.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Check if a value exists in a dataset or column - sas

Related

SAS - Loop through rows and calculate MD5

SAS-Creating Panel by several datasets

Insert text into all cells of first column in a sas dataset

Set the labels of a SAS Dataset equal to their variable name

SAS: Drop column in a if statement

Categories

Resources