I have a SAS macro that, when given different arguments, creates several tables. It looks something like this:
%macro create_tables(key, value);
data WORK.TABLE_&key.;
set WORK.MAIN_TABLE;
where col = &value.;
col_&key. = 1;
drop col;
run;
%mend create_tables;
The key parameter/macro variable is injected into the table name. I call this macro several times with different keys and values.
I want to convert this piece of code into Teradata syntax. I can create multiple tables for every key and value, but I have 30+ keys and values. What would be the best way to achieve this in Teradata? Would creating multiple tables be more efficient? The number of rows for each table created will be between 1 million and 2 million, and the MAIN_TABLE has 30+ million rows.
I agree with Tom that it is not obvious why you create a lot of new tables that just contain a certain part of your original data.
So, if you want to have these extra columns (one column per value of column col) then imho the following code seems to be the best solution in SAS.
Here, a view is created by a SAS macro where you can specify one or more values via the macro parameter KEY. For each value there will be a new column with the values 1 and 0 in the resulting view.
If you want only the rows with one certain value then you can do that based on this view.
%macro create_colkeys (KEY=);
%let NUMWORDS = %sysfunc(countw(&KEY.));
%do i=1 %to &NUMWORDS.;
%let KEY_&i.=%scan(&KEY,&i.);
%end;
proc sql;
create view col_flags as
select
%do i=1 %to &NUMWORDS.;
case when col="&&KEY_&i." then 1
else 0
end as col_&&KEY_&i..,
%end;
*
from main_table;
quit;
%mend;
%create_colkeys(KEY = abc def xyz);
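As a minimal sketch of the "only the rows with one certain value" step mentioned above: assuming the macro call just shown has created the view col_flags with a flag column col_abc, the subset could be pulled like this. The output name rows_abc is hypothetical.

```sas
/* Assumes the view COL_FLAGS and its flag column COL_ABC were created by
   %create_colkeys(KEY = abc def xyz). ROWS_ABC is a hypothetical name. */
data rows_abc;
  set col_flags;
  where col_abc = 1;   /* keep only the rows where col = "abc" */
run;
```

Because col_flags is a view, nothing is materialized until a step like this reads it, so the 30+ subsets do not all have to exist as tables at the same time.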
I'm new to programming in SAS and I would like to write two macros. The first one is done. It takes three parameters: the name of the input table, the name of the column, and the name of the output table. The macro replaces special or accented characters in the specified column of the table you pass in.
The code for this macro is:
%macro translate_column(table,column,name_output);
/* %LET table = TEST_MACRO_TRNSLT; */
/* %let column = marca; */
/* %let name_output = COSAS; */
PROC SQL;
CREATE TABLE TEST AS
SELECT *
FROM &table.;
QUIT;
data &NAME_OUTPUT;
set TEST;
&column.=tranwrd(&column., "Á", "A");
run;
%mend;
%translate_column(TEST_MACRO_TRNSLT,marca,COSAS);
The problem comes with the second macro. I want it to do what the first one does, but for any number of columns instead of just one. For example, if a data set has 4 columns containing special characters, it should replace the special characters in all 4 of them. I don't know whether I have to pass the first macro in as a parameter and then write some kind of loop inside the new macro.
Another idea is to create some kind of array (I have no experience with this) and put the column names to iterate over in a list or in a macro variable. Passing that list as a macro parameter might work.
Could someone give me a hand with this? I would be very grateful.
Either use an ARRAY or a %DO loop.
In either case use a space delimited list of variable names as the value of the COLUMN input parameter to your macro.
%translate_column
(table=TEST_MACRO_TRNSLT
,column=var1 varA var2 varB
,name_output=COSAS
);
Here is an ARRAY-based version:
%macro translate_column(table,column,name_output);
data &NAME_OUTPUT;
set &table.;
array __column &column ;
do over __column;
__column=ktranslate(__column, "A", "Á");
end;
run;
%mend;
Here is a %DO loop-based version:
%macro translate_column(table,column,name_output);
%local index name ;
data &NAME_OUTPUT;
set &table.;
%do index=1 %to %sysfunc(countw(&column,%str( )));
%let name=%scan(&column,&index,%str( ));
&name = ktranslate(&name, "A", "Á");
%end;
run;
%mend;
Notice I switched to using KTRANSLATE() instead of TRANWRD(). That means you could adjust the macro to handle multiple character replacements at once:
&name = ktranslate(&name,'AO','ÁÓ');
The advantage of the ARRAY version is you could do it without having to create a macro. The advantage of the %DO loop version is that it does not require that you find a name to use for the array that does not conflict with any existing variable name in the dataset.
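To illustrate the point about not needing a macro, here is a minimal sketch of the ARRAY approach written as a plain data step. It reuses the table and variable names from the example call above. The array name _chars and the index _i are hypothetical and must not clash with variables already in the data set, and every listed variable is assumed to be character.

```sas
/* Plain data step, no macro. Table and variable names are taken from the
   example call above; _chars and _i are hypothetical helper names. */
data COSAS;
  set TEST_MACRO_TRNSLT;
  array _chars {*} var1 varA var2 varB;   /* all assumed to be character */
  do _i = 1 to dim(_chars);
    /* replace each accented character with its unaccented counterpart */
    _chars{_i} = ktranslate(_chars{_i}, 'AO', 'ÁÓ');
  end;
  drop _i;
run;
```

This uses an explicit array index rather than DO OVER. Both do the same thing here, and the explicit form is the one the current documentation describes.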
I have a few data sets in SAS which I am trying to collate into one larger set that I will filter later. They're all called something like table_201802. My problem is that a few months are missing (i.e. table_201802 exists, and table_201804 and up, but not table_201803).
I'm fairly new to SAS. What I've tried so far is to create a new data set called output_testing and run a macro loop that iterates over the names (they go from 201802 to 201903, and they're monthly data, so anything from 813 to 900 won't exist).
%macro append;
data output_testing;
set
%do i=802 %to 812;
LIBRARY.table_201&i
%end;
;
run;
%mend append;
I want the code to ignore missing tables and just look for ones that do exist and then append them to the new output_testing table.
If the table name prefix is distinct, and you are confident the data structures of the tables are consistent (variable names, types and lengths are the same), then the tables can be stacked using table name prefix lists (:).
For a specific known range of table names you can also use numbered range lists (-).
data have190101 have190102 have190103;
x =1;
run;
data want_version1_stack; /* any table name that starts with have */
set have:;
run;
data want_version1b_stack; /* 2019 and 2020 */
set have19: have20:;
run;
options nodsnferr;
data want_version2_stack; /* any table names in the iterated numeric range */
set have190101-have191231;
run;
options dsnferr;
From the SAS help:
Using Data Set Lists with SET
You can use data set lists with the SET statement. Data set lists provide a quick way to reference existing groups of data sets. These data set lists must either be name prefix lists or numbered range lists.
Name prefix lists refer to all data sets that begin with a specified character string. For example, set SALES1:; tells SAS to read all data sets that start with "SALES1" such as SALES1, SALES10, SALES11, and SALES12.
Numbered range lists require you to have a series of data sets with the same name, except for the last character or characters, which are consecutive numbers. In a numbered range list, you can begin with any number and end with any number. For example, these lists refer to the same data sets:
sales1 sales2 sales3 sales4
sales1-sales4
Some macro code with proc append should solve the problem.
%let n = 10;
%macro get_list_table;
%do i = 1 %to &n;
%let dsn = data&i;
%if %sysfunc(exist(&dsn)) %then %do;
proc append data = &dsn base = appended_data force;
run;
%end;
%end;
%mend;
You can use shortcuts:
data output_testing;
set LIBRARY.table_201:
;
run;
but in this case the SET statement will read every table whose name starts with "table_201".
For example:
LIBRARY.table_201tablesss LIBRARY.table_201ed56
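The answers above could be combined into something like the following. It is a minimal sketch, assuming the tables live in a libref called LIBRARY and are named table_YYYYMM as in the question. It keeps the question's %DO loop but emits a table name only when EXIST() reports that the table is there. The macro name append_existing and the helper variables are hypothetical.

```sas
%macro append_existing;
  %local year month yyyymm;
  data output_testing;
    set
    %do year = 2018 %to 2019;
      %do month = 1 %to 12;
        /* build YYYYMM with a zero-padded month */
        %let yyyymm = &year.%sysfunc(putn(&month., z2.));
        /* stay inside the range from the question: 201802 to 201903 */
        %if &yyyymm. >= 201802 and &yyyymm. <= 201903 %then %do;
          /* emit the name only if the table actually exists */
          %if %sysfunc(exist(LIBRARY.table_&yyyymm.)) %then %do;
            LIBRARY.table_&yyyymm.
          %end;
        %end;
      %end;
    %end;
    ;
  run;
%mend append_existing;
%append_existing
```

Unlike the prefix shortcut, this reads only names of the exact form table_YYYYMM. One caveat: if none of the tables exist, the SET statement ends up empty and SAS falls back to the most recently created data set, so a guard may be needed for that case.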
I've output 'Moments' from Proc Univariate to datasets. Many.
Example: Moments_001.sas7bdat through to Moments_237.sas7bdat
For the first column of each dataset (a newly added first column, probably in a new dataset rather than the original), I would like to have a particular text in every cell, all the way down to the bottom row.
The exact text would be the name of the respective dataset file: say, "Moments_001".
I do not have to 'grab' the filename, per se, if that's not possible. As I know what the names are already, I can put that text into the procedure. However, grabbing the filenames, if possible, would be easier from my standpoint.
I'd greatly appreciate any help anyone could provide to accomplish this.
Thanks,
Nicholas Kormanik
Are you looking for the INDSNAME option of the SET statement? You need to define two variables because the one generated by the option is automatically dropped.
data want;
length moment dsn $41 ;
set Moments_001 - Moments_237 indsname=dsn ;
moment=dsn;
run;
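One detail worth noting: the value that INDSNAME stores includes the libref (for example WORK.MOMENTS_001), which is why the length above is $41. If only the member name is wanted, a small variation could look like the following sketch, where SCAN() keeps the part after the dot.

```sas
/* Variation of the step above: keep only the member name, without the libref.
   DSN holds values such as WORK.MOMENTS_001. */
data want;
  length moment $32 dsn $41;
  set Moments_001 - Moments_237 indsname=dsn;
  moment = scan(dsn, 2, '.');   /* e.g. MOMENTS_001 */
run;
```

Declaring moment in the LENGTH statement before the SET statement is also what makes it the first column of the output.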
I think something along these lines should be what you're after. Assuming you have a list of moments, you can loop through it and add a new variable as the first column of each dataset.
%let list_of_moments = moments_001 moments_002 ... moments_237;
%macro your_macro;
%do i = 1 %to %sysfunc(countw(&list_of_moments.));
%let this_moment = %scan(&list_of_moments., &i.);
data &this_moment._v2;
retain new_variable;
set &this_moment.;
new_variable = "&this_moment.";
run;
%end;
%mend your_macro;
%your_macro;
The brute force entering of text into column 1 looks like this:
data moments_001;
length text $ 16;
set moments_001;
text="Moments_001";
run;
You could also write a macro that would loop through all 237 data sets and insert the text.
UNTESTED CODE
%macro do_all;
%do i=1 %to 237;
%let num = %sysfunc(putn(&i,z3.));
data moments_&num;
length text $ 16;
set moments_&num;
text="Moments_&num";
run;
%end;
%mend;
%do_all
It seems to me (not knowing your problem) that if you use PROC UNIVARIATE with a BY statement, then you wouldn't need 237 different data sets: all of your output would be in one data set, and the BY variable would also be in that data set. Does that solve your problem?
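A minimal sketch of that idea, under loud assumptions: the input data set have, the grouping variable group and the analysis variable x are all hypothetical, since the original data is not shown. The Moments table is captured with ODS OUTPUT.

```sas
/* HAVE, GROUP and X are hypothetical names. BY-group processing
   needs the input sorted by the BY variable. */
proc sort data=have;
  by group;
run;

/* capture the Moments table for every BY group in a single data set */
ods output Moments=all_moments;
proc univariate data=have;
  by group;
  var x;
run;
```

The resulting all_moments data set carries the group column, so each row already identifies which subset it came from.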
UPDATE: I've been told this isn't possible using arrays because of the way they are stored. This changes my question a bit, but the gist is still the same. How can I most efficiently generate the tables I need from a given vector of values (e.g. day, week, month, year) without just repeating the code multiple times? Is there any way to simply substitute the given date value into INTNX in a loop?
OK, this is my last question on this subject, I promise. After some good advice, I'm using the INTNX function. However, I'd like to loop through the categories I select and create a table for each. I tried this, but to no avail.
data;
array period [*] $ day week month year;
run;
%MACRO sqlloop;
proc sql;
%DO k = 1 %TO dim(&period); /* in case i decide to drop/add from array later */
%LET bucket = &period[&k];
CREATE TABLE output.t_&bucket AS (
SELECT INTX( "&bucket.", date_field, O, 'E') AS test FROM table);
%END
quit;
%MEND
%sqlloop
Sadly this doesn't work because I'm fouling up the array reference somehow. If I can get this step I'll be in good shape.
You could replace your array with a macro variable string:
%let period=day week month year;
In your macro then, you loop over the words in the macro variable:
%MACRO sqlloop;
proc sql;
%DO k = 1 %TO %sysfunc(countw(&period.)); /*fixed extra s*/
%LET bucket = %scan(&period.,&k.);
CREATE TABLE output.t_&bucket AS (
SELECT INTNX( "&bucket.", date_field, 0, 'E') AS test FROM table);
%END;
quit;
%MEND;
%sqlloop
Edit: you apparently forgot some semicolons. :p
I have a data set with multiple attributes, and each attribute has 10-15 rows in the master table. I wish to use a do loop on the data set that would let me extract the output for each attribute separately. My concern is how to automate the selection of the next attribute in the do loop once the previous attribute's output has been extracted.
thanks in advance.
I'm not completely sure what you're asking to do, but I can hopefully show the basic ideas of a do loop.
%macro YOUR_MACRO();
%let YOUR_VARIABLE = 1 2 3 ...; /*This could be whatever you want to split up from your master table*/
%let NUM_VAR = 3; /*Change this to the number of YOUR_VARIABLEs listed*/
%do i = 1 %to &NUM_VAR. %by 1;
%let LOOP_VAR = %scan(&YOUR_VARIABLE., &i.);
/*This do i = 1 starts your loop at 1 and goes up by 1 until your NUM_VAR is reached*/
proc sql;
create table TABLE_&LOOP_VAR. as /*Creates a specific table for each variable*/
select *
from MASTER_TABLE
where COLUMN_NAME = &LOOP_VAR. /*Splits up your table by a certain attribute equaling the loop variable*/
;
quit;
%end;
%mend;
%YOUR_MACRO(); /*Runs your loop*/
This is the basic structure and should give a little help. You can also scan your master table for the distinct values and split it by those, without having to type each one out.
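A minimal sketch of that last suggestion, reusing the placeholder names MASTER_TABLE and COLUMN_NAME from the macro above: PROC SQL can collect the distinct values into a macro variable, and the automatic variable SQLOBS gives the count. This assumes the values contain no spaces and are valid as part of a table name.

```sas
/* Build the value list from the data instead of typing it by hand.
   MASTER_TABLE and COLUMN_NAME are the placeholder names used above. */
proc sql noprint;
  select distinct COLUMN_NAME
    into :YOUR_VARIABLE separated by ' '
    from MASTER_TABLE;
quit;

/* SQLOBS holds the number of rows the SELECT returned */
%let NUM_VAR = &sqlobs.;
%put NOTE: &NUM_VAR. values found: &YOUR_VARIABLE.;
```

With these two macro variables set in open code, the two %let statements at the top of the macro could be dropped and the loop left unchanged.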