I have a dataset with 20 variables, all starting with date_, that hold the date of a particular procedure. I want to use an array and a DO loop to create 20 new variables with the same names, except that date_ is replaced with time_, each equal to the number of days between the date_ variable and a reference date, ref_date.
So far, I can perform the very simple operation of replacing each element in place:
data morbidity;
set morbidity;
array dates date_:;
do over dates;
dates = dates - ref_date;
end;
format date_: 11.;
run;
But I can't figure out how to specify a new array times with the same suffix as dates:
data morbidity;
set morbidity;
array dates date_:;
array times /* Some array here */
do over dates;
times= dates - ref_date;
end;
run;
For example, from date_amputation I want to create time_amputation, equal to the difference between date_amputation and ref_date.
You need to know how many variables you want to create.
So code like this will work for up to 20 new variables. If you do not list the variable names in the ARRAY statement then they will be named TIMES1 to TIMES20.
data morbidity;
set morbidity;
array dates date_:;
array times [20] ;
do index=1 to min(dim(dates),dim(times));
times[index]= dates[index] - ref_date;
end;
drop index;
run;
If you want the new variable names to be based on the old variable names then you will need to use some code generation.
proc sql noprint;
select name,'time'||substr(name,5)
into :dates separated by ' '
, :times separated by ' '
from dictionary.columns
where libname='WORK' and memname='MORBIDITY'
and upcase(name) like 'DATE^_%' escape '^'
;
quit;
data morbidity;
set morbidity;
array dates &dates;
array times &times;
do index=1 to dim(dates);
times[index]= dates[index] - ref_date;
end;
drop index;
run;
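To see the code generation end to end, here is a minimal self-contained run. The MORBIDITY table, its variables date_amputation and date_biopsy, and the dates are hypothetical sample data; the %PUT lines write the two generated lists to the log so you can check that the date_ and time_ names line up.

```sas
/* Hypothetical sample data: two procedure dates plus the reference date */
data morbidity;
  input id ref_date :date9. date_amputation :date9. date_biopsy :date9.;
  format ref_date date_: date9.;
  datalines;
1 01JAN2020 11JAN2020 31DEC2019
2 01MAR2020 01MAR2020 15MAR2020
;
run;

/* Build matching name lists from the same rows of dictionary.columns */
proc sql noprint;
  select name, 'time'||substr(name, 5)
    into :dates separated by ' ',
         :times separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'MORBIDITY'
      and upcase(name) like 'DATE^_%' escape '^';
quit;

%put NOTE: dates=&dates;
%put NOTE: times=&times;

data morbidity;
  set morbidity;
  array dates &dates;
  array times &times;
  do index = 1 to dim(dates);
    times[index] = dates[index] - ref_date;  /* days between the procedure and ref_date */
  end;
  drop index;
run;
```

With this sample, time_amputation is 10 and 0 and time_biopsy is -1 and 14 for the two rows.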
Related
I'm calculating an MD5 hash row by row over an entire table, one hash per row.
The table is selected by the user with a prompt (&sel_lib for the selected lib and &sel_tab for the table).
First, I get my columns from sashelp.vcolumn and concatenate them all into a variable called lista:
data contents;
do until (last.memname);
set sashelp.vcolumn;
where upcase(libname)="&sel_lib"
and upcase(memname)="&sel_tab";
by libname memname varnum;
length lista $&max_len ; /* sum(len col) + number of col */
lista=catx(',',lista,name);
end;
keep libname memname varnum lista;
run;
Second, I write that value to another table, because I need it for other operations not related to this issue:
proc sql;
update md5_table
set nom_colunas=(
select lista
from contents)
where libname="&sel_lib"
and memname="&sel_tab";
quit;
Third, I pass the concatenated column list to the macro variable scolunas:
proc sql;
select nom_colunas
into :scolunas
from md5_table
where libname="&sel_lib"
and memname="&sel_tab";
quit;
Then I use it to compute the hash in my data step:
data want;
livraria="&sel_lib";
tabela="&sel_tab";
length check $32;
format check $hex32.;
set &sel_lib..&sel_tab.;
check = md5(cats(&scolunas));
hash = put(check,$hex32.);
keep livraria tabela hash;
put hash;
put _all_;
run;
My problem is that I need to compare the output with the output for the same table on another server (platform migration), so I need a key to match the two.
I would prefer to do that by adding the row number of the source table (&sel_lib &sel_tab) to my dataset. Is there any way to do that?
A more complex option would be adding the concatenated PK values to it.
Thanks in advance.
STRIP will remove leading and trailing spaces of only one expression (i.e. variable) and thus accepts only a single argument.
You want your comma separated variable list concatenated as input to MD5.
CATS() will concatenate variables and implicitly strip.
...
check = md5(cats(&scolunas));
...
If you think that, in some edge cases, different data values may produce the same concatenated string, use CATX(<sep>, ...) to explicitly separate the fields when concatenating.
Example:
A    B    C    CATS()   CATX('|',A,B,C)
--   ---  --   ------   ---------------
PP   QQ   RR   PPQQRR   PP|QQ|RR
P    PQQ  RR   PPQQRR   P|PQQ|RR
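A minimal sketch combining both points: the automatic variable _N_ gives a running row number, and CATX with a separator keeps the two example rows from hashing identically. The dataset HAVE and its variables a, b, c are hypothetical and only reproduce the table above. Note that _N_ counts DATA step iterations, so it matches the source observation number only when the step reads one dataset with a single SET and no WHERE filtering, and it is only a usable comparison key if both platforms return the rows in the same order.

```sas
/* Hypothetical rows reproducing the collision shown in the table above */
data have;
  input a $ b $ c $;
  datalines;
PP QQ RR
P PQQ RR
;
run;

data want;
  set have;
  length hash_cats hash_catx $32;
  row_num = _n_;                                      /* running row number */
  hash_cats = put(md5(cats(a, b, c)), $hex32.);       /* both rows hash PPQQRR */
  hash_catx = put(md5(catx('|', a, b, c)), $hex32.);  /* separator keeps them apart */
run;
```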
I have a table like the first table in the picture.
It contains banks' daily deals on the FX market (buy minus sell). I would like to calculate cumulative results like those in the second table. The number of banks, their names, and the dates are not fixed. I'm new to SAS and tried to find solutions, but didn't find anything useful. I would be glad for any help.
When data such as this is in a wide format, it can be more difficult to process in SAS compared to a long format. Long data formats have numerous benefits in the form of by-group processing, indexing, filtering, etc. Many SAS procedures are designed around this concept.
For more information on the examples below, check out SAS's example on the Program Data Vector and by-group processing. Mastering these concepts will help you with data step programming.
Here are two ways you can solve it:
1. Use a sum statement and by-group processing.
In this example, we will:
Convert the data from wide to long in order to convert the bank name to a character variable
Perform a cumulative sum on each bank
Convert back to wide again
By converting the bank name into a character variable, we can use by-group processing on it.
/* Convert from wide to long */
proc transpose data=raw
out=raw_transposed
name=bank
;
by date;
run;
proc sort data=raw_transposed;
by bank date;
run;
/* Use by-group processing to get cumulative values by date for each bank */
data cumulative_long;
set raw_transposed;
by bank date;
/* Reset the cumulative sum for each bank */
if(first.bank) then call missing(cumulative);
cumulative+COL1;
run;
proc sort data=cumulative_long;
by date bank;
run;
/* Convert from long to wide */
proc transpose data=cumulative_long
out=want(drop=_NAME_)
;
by date;
id bank;
var cumulative;
run;
The sum statement can be used as a shortcut for the following code:
data cumulative_long;
set raw_transposed;
by bank date;
retain cumulative;
if(first.bank) then cumulative = 0;
cumulative = cumulative + COL1;
run;
cumulative does not exist in the dataset: we are creating it here. This value will become missing whenever SAS moves on to read a new row. We want SAS to carry the last value forward. retain tells SAS to carry its last value forward until we change it.
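The two forms are not quite identical when COL1 can be missing: the sum statement treats a missing value as 0, while the + operator propagates it, so the retained total stays missing from then on. A tiny hypothetical dataset shows this, with x standing in for COL1:

```sas
data demo;
  input x;
  sum_stmt + x;            /* sum statement: implicitly retained, starts at 0, ignores missing x */
  retain plus_op 0;
  plus_op = plus_op + x;   /* + operator: a missing x makes the total missing, and it stays missing */
  datalines;
1
.
2
;
run;
```

Here sum_stmt ends at 3 while plus_op is missing from the second row on. If you prefer the explicit RETAIN form and missing values are possible, cumulative = sum(cumulative, COL1) mirrors the sum statement's behavior.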
2. Use macro variables and dictionary tables
A second option would be to read all of the bank names from a dictionary table to avoid transposing. We will:
Read the names of the banks from the special table dictionary.columns into a macro variable using PROC SQL
Use arrays to perform cumulative sums
This assumes the bank names always start with "Bank". If they do not follow a regular pattern, you can instead exclude all other variables in the initial SQL query.
proc sql noprint;
select name
, cats(name, '_cume')
into :banks separated by ' '
, :banks_cume separated by ' '
from dictionary.columns
where memname = 'RAW'
AND libname = 'WORK'
AND upcase(name) LIKE 'BANK%'
;
quit;
data want;
set raw;
array banks[*] &banks.;
array banks_cume[*] &banks_cume.;
do i = 1 to dim(banks);
banks_cume[i]+banks[i];
end;
drop i;
run;
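To try method 2 end to end, here is a self-contained run on a small hypothetical RAW table (the bank names BankA and BankB and all values are invented for illustration). With these inputs the running totals should be 10, 7, 11 for BankA and -5, 2, 4 for BankB.

```sas
/* Hypothetical sample: daily net FX deals (buy minus sell), one column per bank */
data raw;
  input date :date9. BankA BankB;
  format date date9.;
  datalines;
01JAN2020 10 -5
02JAN2020 -3 7
03JAN2020 4 2
;
run;

/* Collect the bank columns and the matching _cume names */
proc sql noprint;
  select name, cats(name, '_cume')
    into :banks separated by ' ',
         :banks_cume separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'RAW' and upcase(name) like 'BANK%';
quit;

data want;
  set raw;
  array banks[*] &banks.;
  array banks_cume[*] &banks_cume.;
  do i = 1 to dim(banks);
    banks_cume[i] + banks[i];   /* sum statement: retained running total per bank */
  end;
  drop i;
run;
```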
I have a huge dataset with a column (variable) that has 100 distinct values. I want to split this dataset into 100 smaller datasets, one per distinct value of that column, and do it in a loop (iteratively). Someone suggested using a macro, but I'm unable to make it work.
Solution using only macro variables: build the list of output datasets and the matching WHEN/OUTPUT statements in PROC SQL with an INTO clause:
proc sql;
select distinct 'WORK.cars_'|| origin
, 'when ("'|| trim(origin) ||'") output cars_'|| origin
into :cars_data separated by ' '
, :cars_when separated by '; '
from sashelp.cars;
quit;
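Leave out the double quotes in the WHEN clause if the variable is numeric with integer values.
For a numeric variable with non-integer values, also replace the decimal point with an underscore in the dataset name, since a dot is not allowed there.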
data &cars_data.;
set sashelp.cars;
select (origin);
&cars_when.;
end;
run;
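Since a macro was suggested, the same technique can be wrapped in a small macro that works for other datasets and variables. This is a sketch under assumptions: the %split_by name and its parameters are hypothetical, the split variable is character, and every prefix plus value forms a valid SAS dataset name. Like the code above, it reads the input only once, unlike a %DO loop that runs one DATA step per value.

```sas
%macro split_by(ds=, var=, prefix=);
  %local outlist whenlist;

  /* One output dataset name and one WHEN clause per distinct value */
  proc sql noprint;
    select distinct
           cats("&prefix", &var),
           catx(' ', 'when (', quote(trim(&var)), ') output', cats("&prefix", &var))
      into :outlist separated by ' ',
           :whenlist separated by '; '
      from &ds;
  quit;

  data &outlist;
    set &ds;
    select (&var);
      &whenlist;
      otherwise;   /* safety net so SELECT never fails on an unmatched value */
    end;
  run;
%mend split_by;

%split_by(ds=sashelp.cars, var=origin, prefix=cars_)
```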
I have written SAS code that generates many SAS datasets. Now I want to append all of them to a single Excel file. First I want to put the column headers of each dataset in as its first observation, then leave a blank observation between datasets. How can I do it?
One way to do this would be to use dictionary.columns:
proc sql;
create table Attribute as
select * from dictionary.columns;
quit;
Read through the table and check which attributes you are interested in. In your case you are probably interested in the column NAME, which contains the names of all columns.
Restrict the table by adding a WHERE clause to the PROC SQL that identifies the columns you want (which library, what type of member, which member name), e.g. where upcase(libname) = "WORK"
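data attribute;
array column[20] $32; /* 20 (= n) and $32 (= length) are placeholders: adjust to your data */
do i=1 to dim(column) until (eof); /* stop early if there are fewer rows than n */
set attribute(keep=name) end=eof;
column[i] = name;
end;
drop i;
run;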
Then I would proceed with the DATA step. You could use a macro variable to store the column names with SELECT ... INTO :, but either way you still need to hardcode the array size (n), or use some other method that stores the values in one observation. Also remember to define the length and type of the array accordingly. You can name the variables in the resulting Attribute dataset by adding a list such as var1-varn after the length in the ARRAY statement.
For simplicity I use a SET statement to read the observations one at a time and store each value of the NAME column (the official column name from dictionary.columns) in the array.
Note that a non-temporary array creates variables.
If you want to add the blank row, add:
data younameit ;
merge attribute attribute(firstobs=2 keep=name rename=(name=_name));
output;
if name ne _name then do;
call missing(of _all_);
output;
end;
run;
Because the look-ahead copy of the dataset starts at observation 2 and column names are unique within a dataset, NAME and _NAME always differ, so every row written by the first OUTPUT statement is followed by an empty row produced by CALL MISSING(OF _ALL_) and the second OUTPUT.
Sounds like you just want to combine the datasets and write the results to the Excel file. Do you really need the extra empty row?
libname out xlsx 'myfile.xlsx';
data out.report ;
set ds1 ds2 ...;
run;
Ensure that all your columns are character (CALL LABEL writes the label into a character variable, so this does not work for numeric columns), then in your data step use:
array names{*} _character_;
do i=1 to dim(names);
call label(names{i}, names{i});
end;
output;
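A fuller sketch of that idea for one dataset, under the stated assumption that every column is character: it writes a header row holding each column's label (CALL LABEL falls back to the variable name when there is no label), then the data rows, then one blank row so the next dataset starts after a gap. HAVE and its contents are hypothetical; to build the workbook you would run this per dataset and list the results in the SET statement of the xlsx libname step above.

```sas
/* Hypothetical all-character input with labels */
data have;
  length name $10 city $10;
  input name $ city $;
  label name = 'Name' city = 'City';
  datalines;
Alice Paris
Bob Lisbon
;
run;

data with_header;
  if 0 then set have;              /* define HAVE's variables at compile time */
  array names{*} _character_;      /* every character variable from HAVE */
  if _n_ = 1 then do;
    do i = 1 to dim(names);
      call label(names{i}, names{i});  /* value becomes the variable's label */
    end;
    output;                        /* header row */
  end;
  set have end = last;
  output;                          /* data rows */
  if last then do;
    call missing(of _all_);        /* blank separator row */
    output;
  end;
  drop i;
run;
```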
I have two datasets in SAS that I would like to merge, but they have no common variables. One dataset has a "subject_id" variable, while the other has a "mom_subject_id" variable. Both of these variables are 9-digit codes that have just 3 digits in the middle of the code with common meaning, and that's what I need to match the two datasets on when I merge them.
What I'd like to do is create a new common variable in each dataset that is just the 3 digits from within the subject ID. Those 3 digits will always be in the same location within the 9-digit subject ID, so I'm wondering if there's a way to extract those 3 digits from the variable to make a new variable.
Thanks!
SQL (using the sample data from the data step code below):
proc sql;
create table want2 as
select a.subject_id, a.other, b.mom_subject_id, b.misc
from have1 a JOIN have2 b
on(substr(a.subject_id,4,3)=substr(b.mom_subject_id,4,3));
quit;
Data Step:
data have1;
length subject_id $9;
input subject_id $ other $;
datalines;
abc001def other1
abc002def other2
abc003def other3
abc004def other4
abc005def other5
;
data have2;
length mom_subject_id $9;
input mom_subject_id $ misc $;
datalines;
ghi001jkl misc1
ghi003jkl misc3
ghi005jkl misc5
;
data have1;
length id $3;
set have1;
id=substr(subject_id,4,3);
run;
data have2;
length id $3;
set have2;
id=substr(mom_subject_id,4,3);
run;
Proc sort data=have1;
by id;
run;
Proc sort data=have2;
by id;
run;
data work.want;
merge have1(in=a) have2(in=b);
by id;
if a and b; /* keep only ids found in both datasets, like the SQL inner join */
run;
An alternative, if you are comfortable with SQL, is to use PROC SQL with a join on substr(), as shown above.
If your subject_id variable is numeric, the substr function won't work as expected: SAS first converts the number to a string using the BEST12. format, which right-aligns it and pads it with leading spaces, so the character positions shift.
You can use the modulus function mod(input, base) which returns the remainder when input is divided by base.
/*First get rid of the last 3 digits*/
temp_var = floor( subject_id / 1000);
/* then get the next three digits that we want*/
id = mod(temp_var ,1000);
Or in one line:
id = mod(floor(subject_id / 1000), 1000);
Then you can continue with sorting the new data sets by id and then merging.
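A quick check on two hypothetical numeric IDs. It computes the MOD version, a character alternative that first writes the number with the Z9. format (so leading zeros are kept), and, for contrast, the naive substr on a BEST12. conversion, which is the format SAS uses when it converts a number to character automatically. For 123456789 the naive version picks up 123 instead of 456.

```sas
/* Hypothetical numeric 9-digit IDs; 001234567 is read as the number 1234567 */
data have_num;
  input subject_id;
  datalines;
123456789
001234567
;
run;

data ids;
  set have_num;
  length id_char id_naive $3;
  id_num   = mod(floor(subject_id / 1000), 1000);  /* numeric middle three digits */
  id_char  = substr(put(subject_id, z9.), 4, 3);   /* character, keeps leading zeros */
  id_naive = substr(put(subject_id, best12.), 4, 3);  /* shifted by the leading blanks */
run;
```

Whichever version you use, both datasets need the new id in the same type (both numeric or both character) before the merge.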