I have a column with name total transaction. I want to add a date 4 days back from now in its name .
For example if today is 20161220 so I want my variable to be renamed as total_transaction_20161216.
Please suggest me a way out of my problem.
Just create a macro variable that stores the required date format and then use that in a rename statement within proc datasets.
%let datevar = %sysfunc(intnx(day,%sysfunc(today()),-4),yymmddn8.);
%put &=datevar.;
data have;
total_transaction=1;
run;
proc datasets lib=work nolist nodetails;
modify have;
rename total_transaction = total_transaction_&datevar.;
quit;
Related
Suppose there are ten datasets with same structure: date and price, particularly they have same time period but different price
date price
20140604 5
20140605 7
20140607 9
I want to combine them and create a panel dataset. Since there is no name in each datasets, I attempt to add a new variable name into each data and then combine them.
The following codes are used to add name variable into each dataset
%macro name(sourcelib=,from=,going=);
proc sql noprint; /*read datasets in a library*/
create table mytables as
select *
from dictionary.tables
where libname = &sourcelib
order by memname ;
select count(memname)
into:obs
from mytables;
%let obs=&obs.;
select memname
into : memname1-:memname&obs.
from mytables;
quit;
%do i=1 %to &obs.;
data
&going.&&memname&i;
set
&from.&&memname&i;
name=&&memname&i;
run;
%end;
%mend;
So, is this strategy correct? Whether are there a different way to creating a panel data?
There are really two ways to setup repeated measures data. You can use the TALL method that your code will create. That is generally the most flexible. The other would be a wide format with each PRICE being stored in a different variable. That is usually less flexible, but can be easier for some analyses.
You probably do not need to use macro code or even code generation to combine 10 datasets. You might find that it is easier to just type the 10 dataset names than to write complex code to pull the names from metadata. So a data step like this will let you list any number of datasets in the SET statement and use the membername as the value for the new PANEL variable that distinguishes the source dataset.
data want ;
length dsn $41 panel $32 ;
set in1.panel1 in1.panela in1.panelb indsname=dsn ;
panel = scan(dsn,-1,'.') ;
run;
And if your dataset names follow a pattern that can be used as a member list in the SET statement then the code is even easier to write. So you could have a list of names that have a numeric suffix.
set in1.panel1-in1.panel10 indsname=dsn ;
or perhaps names that all start with a particular prefix.
set in1.panel: indsname=dsn ;
If the different panels are for the same dates then perhaps the wide format is easier? You could then merge the datasets by DATE and rename the individual PRICE variables. That is generate a data step that looks like this:
data want ;
merge in1.panel1 (rename=(price=price1))
in1.panel2 (rename=(price=price2))
...
;
by date;
run;
Or perhaps it would be easier to add a BY statement to the data set that makes the TALL dataset and then transpose it into the WIDE format.
data tall;
length dsn $41 panel $32 ;
set in1.panel1 in1.panela in1.panelb indsname=dsn ;
by date ;
panel = scan(dsn,-1,'.') ;
run;
proc transpose data=tall out=want ;
by date;
id panel;
var price ;
run;
I can't comment on the SQL code but the strategy is correct. Add a name to each data set and then panel on the name with the PANELBY statement.
That is a valid way to achieve what you are looking for.
You are going to need 2 . in between the macros for library.data syntax. The first . is used to concatenate. The second shows up as a ..
I assume you will want to append all of these data sets together. You can add
data &going..want;
set
%do i=1 %to &obs;
&from..&&memname&i
%end;
;
run;
You can combine your loop that adds the names and that data step like this:
data &going..want;
set
%do i=1 %to &obs;
&from..&&memname&i (in=d&i)
%end;
;
%do i=1 %to &obs;
if d&i then
name = &&memname&i;
%end;
run;
I need to export a data set from SAS to Excel 2013 as a .csv file. However, I need the file name to be dynamic. In this instance, I need it to appear as:
in_C000000_013117_65201.csv
where the string, "in_C000000_" will remain constant, the string "013117_" will be the current day's date, and the string "65201" will be the row count of the data set itself.
Any help that you can provide would be much appreciated!
Thanks!
Here's a modified macro I wrote in the past that does almost exactly what you're asking for. If you want to replace sysdate with a date in your desired format, that's easy to do as well:
%let path = [[desired destination]];
%macro exporter(dataset);
proc sql noprint;
select count(*) into: obs
from &dataset.;
quit;
data temp;
format date mmddyy6.;
date = today();
run;
proc sql noprint;
select date format mmddyy6. into: date_formatted
from temp;
quit;
proc export data = &dataset.
file = "&path.in_C000000_&date_formatted._%sysfunc(compress(&obs.)).csv"
dbms = csv replace;
run;
%mend exporter;
%exporter(your_dataset_here);
Produces datasets in the format: in_C000000_020117_50000.csv
I have a dataset where I have several variables with suffixes that correspond to given dates. I want to replace the suffixes with the dates to make my output tables more user friendly.
Here is a sample of my code
the fields in my sales dataset are
product number_of_sales_1 number_of_sales_2 number_of_sales_3 revenue_1 revenue_2 revenue_3 tax_1 tax_2 tax_3
The suffixes 1,2,3 correspond to dates which are held in a second dataset with the following format
dates
id date
1 01Apr
2 01May
3 01Jun
I want to bulk replace the suffixes with the dates so my fields in sales become
product number_of_sales_01Apr number_of_sales_01May number_of_sales_01Jun revenue_01Apr revenue_01May revenue_01Jun tax_01Apr tax_01May tax_01Jun
Both the number of dates and the numberof metrics in sales are dynamic so I can't just hardcode in the the code.
I assume your datasets look like below:
data sales;
product="abc";number_of_sales_1=1;number_of_sales_2=2;number_of_sales_3=3;
revenue_1=1000;revenue_2=2000;revenue_3=3000;tax_1=100;tax_2=200;tax_3=300;
run;
data dates;
id=1;date="01Apr";output;id=2;date="01May";output;id=3;date="01Jun";output;
run;
1st Step - Finding out the dates variables which needs to be renamed
proc contents data=sales out=sales_temp(keep=name) noprint; run;
data sales_temp1;
length check_date_vars $1. id 8.;
set sales_temp;
check_date_vars=compress(substr(name,length(name)));
temp=notdigit(check_date_vars);
if temp=0 then id=check_date_vars;
run;
2nd step - Merging the above dataset with the datset which contains the formats, to create a mapping between old names and new names and creating macro variables out of it
proc sort data=sales_temp1; by id; run;
proc sort data=dates; by id; run;
data sales_temp_date;
merge sales_temp1(in=a) dates(in=b);
by id;
if a and b;
new_name=substr(name,1,length(name)-1)||date;
run;
proc sql noprint;
select count(*) into :num_vars separated by " " from sales_temp_date;
quit;
proc sql noprint;
select name into:old_name1 - :old_name&num_vars. from sales_temp_date;
select new_name into:new_name1 - :new_name&num_vars. from sales_temp_date;
quit;
3rd Step - Renaming the variables
%macro rename();
proc datasets library=work nolist;
modify sales;
rename
%do i=1 %to &num_vars.;
&&old_name&i.= &&new_name&i.
%end;
;
run;
%mend;
%rename;
I'm working with a rather large several dataset that are provided to me as a CSV files. When I attempt to import one of the files the data will come in fine but, the number of variables in the file is too large for SAS, so it stops reading the variable names and starts assigning them sequential numbers. In order to maintain the variable names off of the data set I read in the file with the data row starting on 1 so it did not read the first row as variable names -
proc import file="X:\xxx\xxx\xxx\Extract\Live\Live.xlsx" out=raw_names dbms=xlsx replace;
SHEET="live";
GETNAMES=no;
DATAROW=1;
run;
I then run a macro to start breaking down the dataset and rename the variables based on the first observations in each variable -
%macro raw_sas_datasets(lib,output,start,end);
data raw_names2;
raw_names;
if _n_ ne 1 then delete;
keep A -- E &start. -- &end.;
run;
proc transpose data=raw_names2 out=raw_names2;
var A -- &end.;
run;
data raw_names2;
set raw_names2;
col1=compress(col1);
run;
data raw_values;
set raw;
keep A -- E &start. -- &end.;
run;
%macro rename(old,new);
data raw_values;
set raw_values;
rename &old.=&new.;
run;
%mend rename;
data _null_;
set raw_names2;
call execute('%rename('||_name_||","||col1||")");
run;
%macro freq(var);
proc freq data=raw_values noprint;
tables &var. / out=&var.;
run;
%mend freq;
data raw_names3;
set raw_names2;
if _n_ < 6 then delete;
run;
data _null_;
set raw_names3;
call execute('%freq('||col1||")");
run;
proc sort data=raw_values;
by StudySubjectID;
run;
data &lib..&output.;
set raw_values;
run;
%mend raw_sas_datasets;
The problem I'm running into is that the variable names are now all set properly and the data is lined up correctly, but the labels are still the original SAS assigned sequential numbers. Is there any way to set all of the labels equal to the variable names?
If you just want to remove the variable labels (at which point they default to the variable name), that's easy. From the SAS Documentation:
proc datasets lib=&lib.;
modify &output.;
attrib _all_ label=' ';
run;
I suspect you have a simpler solution than the above, though.
The actual renaming step needs to be done differently. Right now it's rewriting the entire dataset over and over again - for a lot of variables that is a terrible idea. Get your rename statements all into one datastep, or into a PROC DATASETS, or something else. Look up 'list processing SAS' for details on how to do that; on this site or on google you will find lots of solutions.
You likely can get SAS to read in the whole first line. The number of variables isn't the problem; it is probably the length of the line. There's another question that I'll find if I can on this site from a few months ago that deals with this exact problem.
My preferred option is not to use PROC IMPORT for CSVs anyway; I would suggest writing a metadata table that stores the variable names and the length/types for the variables, then using that to write import code. A little more work at first, but only has to be done once per study and you guarantee PROC IMPORT isn't making silly decisions for you.
In the library sashelp is a table vcolumn. vcolumn contains all the names of your variables for each library by table. You could write a macro that puts all your variable names into macro variables and then from there set the label.
Here's some code that I put together (not very pretty) but it does what you're looking for:
data test.label_var;
x=1;
y=1;
label x = 'xx';
label y = 'yy';
run;
proc sql noprint;
select count(*) into: cnt
from sashelp.vcolumn
where memname = 'LABEL_VAR';quit;
%let cnt = &cnt;
proc sql noprint;
select name into: name1 - :name&cnt
from sashelp.vcolumn
where memname = 'LABEL_VAR';quit;
%macro test;
%do i = 1 %to &cnt;
proc datasets library=test nolist;
modify label_var;
label &&name&i=&&name&i;
quit;
%end;
%mend test;
%test;
I currently have a dataset with 200 variables. From those variables, I created 100 new variables. Now I would like to drop the original 200 variables. How can I do that?
Slightly better would be, how I can drop variables 3-200 in the new dataset.
sorry if I was vague in my question but basically I figured out I need to use --.
If my first variable is called first and my last variable is called last, I can drop all the variables inbetween with (drop= first--last);
Thanks for all the responses.
As with most SAS tasks, there are several alternatives. The easiest and safest way to drop variables from a SAS data set is with PROC SQL. Just list the variables by name, separated by a comma:
proc sql;
alter table MYSASDATA
drop name, age, address;
quit;
Altering the table with PROC SQL removes the variables from the data set in place.
Another technique is to recreate the data set using a DROP option:
data have;
set have(drop=name age address);
run;
And yet another way is using a DROP statement:
data have;
set have;
drop name age address;
run;
Lots of options - some 'safer', some less safe but easier to code. Let's imagine you have a dataset with variables ID, PLNT, and x1-x200 to start with.
data have;
id=0;
plnt=0;
array x[200];
do _t = 1 to dim(x);
x[_t]=0;
end;
run;
data want;
set have;
*... create new 100 variables ... ;
*option 1:
drop x1-x200; *this works when x1-x200 are numerically consecutive;
*option 2:
drop x1--x200; *this works when they are physically in order on the dataset -
only the first and last matter;
run;
*Or, do it this way. This would also work with SQL ALTER TABLE. This is
the safest way to do it.;
proc sql;
select name into :droplist separated by ' ' from dictionary.columns
where libname='WORK' and memname='HAVE' and name not in ('ID','PRNT');
quit;
proc datasets lib=work;
modify want;
drop &droplist.;
quit;
If all of the variables you want to drop are named so they all start the same (like old_var_1, old_var_2, ..., old_var_n), you could do this (note the colon in drop option):
data have;
set have(drop= old_var:);
run;
data want;
set have;
drop VAR1--VARx;
run;
Would love to know if you can do this by position.
Definitely works with variable names separated by double dash (--).
I have some macros that would allow this here
You could run that whole set of macros, or just run list_vars(), is_blank(), num_words, find_word, remove_word, remove_words , nth_word().
Using these it would be:
%let keep_vars = keep_this and_this also_this;
%let drop_vars = %list_vars(old_dataset);
%let drop_vars = %remove_words(&drop_vars , &keep_vars);
data new_dataset (drop = &drop_vars );
set old_dataset;
/*stuff happens*/
run;
This will keep the three variables keep_this and_this also_this but drop everything else in the old dataset.