I need to store macro variable values in the SAS dataset like this:
PROC SQL;
SELECT SUM(Invoice)/1000 FORMAT COMMAX18.2 INTO :val01 FROM SASHELP.CARS;
QUIT;
%PUT &val01.;
PROC SQL;
SELECT COUNT(Model) FORMAT COMMAX12. INTO :val02 FROM SASHELP.CARS;
QUIT;
%PUT &val02.;
PROC SQL;
SELECT SUM(MSRP)/1000000 FORMAT COMMAX18.2 INTO :val03 FROM SASHELP.CARS;
QUIT;
%PUT &val03.;
data WORK.teste;
input col01 $ col02 $ col03 $;
datalines;
Invoice &val01. k
Model &val02. un
MSRP &val03. mi
;
proc print data=WORK.teste;
run;
The desired result would be:
Obs col01 col02 col03
1 Invoice 12.846,29 k
2 Model 428 un
3 MSRP 14,03 mi
However, macro variable references are not being interpreted by SAS as macro variable values.
And the result is:
Obs col01 col02 col03
1 Invoice &val01. k
2 Model &val02. un
3 MSRP &val03. mi
How to make the SAS understand &val01. as a macro variable name, rather than a string.
Anyone would like a tip, please.
You can could use the RESOLVE() function, as long as the lines of data are shorter than the maximum length allowed for a character variable.
%let val01= 12.846,29;
%let val02= 428;
%let val03= 14,03 ;
data test;
input #;
_infile_=resolve(_infile_);
input col01 $ col02 $ col03 $;
datalines;
Invoice &val01. k
Model &val02. un
MSRP &val03. mi
;
Or if it is only COL02 that needs to be replaced by the value of a macro variable then have the source data for that column just be the NAME of the macro variable, not an actual macro expression. Then use the SYMGET() function to retrieve the value of the macro variable.
data test;
length col01-col03 $10;
input col01 mvar :$32. col03 ;
col02 = symget(mvar);
drop mvar;
datalines;
Invoice val01 k
Model val02 un
MSRP val03 mi
;
Macro facility does not accept CARDS or PARMCARDS statements. See https://support.sas.com/kb/43/902.html
And if you really want to do this, here are examples using different ways(quote() function or proc stream): https://support.sas.com/kb/43/295.html
How to use macro variable in datalines?
On the other hand, using cards statement is not valid in macro definition. See
https://technologytales.com/2008/10/28/sas-macro-and-datalinecards-statements-in-data-step/
The anwser about this question is from the first link, I copied it and write here:
Most of the SAS Language and all of the SAS Macro Language are 'free format' giving the ability to format and indent code for better readability. When compiling a macro, the SAS Macro Facility strips out any formatting (spaces, line breaks, tabs, etc.) in the model text to minimize the size of the stored text.
DATALINES, CARDS, and PARMCARDS statements are not free format. When the SAS Macro Facility compiles these statements, it removes the formatting, which would very possibly move data into incorrect lines and/or columns. This can cause incorrect results and is the fundamental reason for the restriction.
Related
I have an existing process that imports data from a flat file with no headers. There are hundreds of columns. The provider of the file has added several hundred more columns at different points within the existing columns. I have a list of the old and new column names and SAS code that properly sets the data types for the old columns but not the new ones. I'd rather not have to go through my existing import code and manually write column headers and data formats but I'm not sure how to use these parts to get new import code for the new headers.
data raw_file;
infile "flatfile.csv" delimiter="|" missover dsd firstobs=1;
informat oldcol1 best32.;
informat oldcol2 mmddyy10.;
informat oldcolN $60.;
format oldcol1 best32.;
format oldcol2 mmddyy10.;
format oldcolN $60.;
input
oldcol1
oldcol2
oldcolN $;
run;
I have the header information in an Excel file right now.
old K010H K010I K010J K020A
new K010H K010I K010J K010L K010M K010N K020A
Based on your description, I presume you either know or will find out the informats for the new columns also. If that is the case, why don't you auto generate the code to read the file?
Since you have the header information, assuming you can modify it to the following format and save as a CSV:
var infmt
K010H best32.
K010I mmddyy10.
K010J $60.
K010L best32.
K010M mmddyy10.
K010N $60.
K020A best32.
Then something like this would automatically generate the code and read the data for you:
proc import datafile="cols.csv" out=cols replace;
run;
proc sql;
select var into :cols separated by ' ' from cols ;
select infmt into :infmts separated by ' ' from cols ;
quit;
%macro gen_code;
data raw_file;
infile "flatfile.csv" delimiter="|" missover dsd firstobs=1;
%let ii = 1;
%do %while (%scan(&cols, &ii, %str( )) ~= %str());
%let col = %scan(&cols, &ii, %str( ));
%let infmt = %scan(&infmts, &ii, %str( ));
informat &col &infmt ;
%let ii = %eval(&ii + 1);
%end;
input
%let ii = 1;
%do %while (%scan(&cols, &ii, %str( )) NE %str());
%let col = %scan(&cols, &ii, %str( ));
&col
%let ii = %eval(&ii + 1);
%end;
;
run;
%mend;
%gen_code;
In the future, you could make modifications to your header CSV file and the rest will be taken care by the code itself.
If you have a machine readable data dictionary then you can generate the code from that. Otherwise you will need to just edit your data step. While you are at it you can clean it up so that it is easier to maintain.
First thing is to use LENGTH or ATTRIB to define the variables, instead of forcing SAS to guess. Second only attach informats or formats to variables that need them. For example there is no need to attach informats to normal strings or numbers. No need to attach $xx format to character variables. Do you really need to attach BEST32. format to numbers instead of letting SAS go ahead and display the numeric variables without formats attached using the default BEST12. format?
Second if you define the variables in the order they appear then you can use a positional variable list in the INPUT statement. Then you only have to change the INPUT statement if the first or last variable changes.
So for your example you might create a data step like this instead.
data raw_file;
infile "flatfile.csv" dlm="|" truncover dsd firstobs=1;
length
oldcol1 8
oldcol2 8
oldcolN $60
;
informat oldcol2 mmddyy10.;
format oldcol2 mmddyy10.;
input oldcol1 -- oldcolN ;
run;
Then adding new variables is as simple as inserting them into right place in the LENGTH statement and when needed adding them to the INFORMAT and/or FORMAT statements. If you don't know what the variables contain then make them as character strings and look at the resulting values and decide later if you need to define them differently.
I am having two data sets. The first data set has airport codes (JFK, LGA, EWR) in a variable 'airport'. The second dataset has the list of all major airports in the world. This dataset has two variables 'faa' holding the FAA Code (like JFG, LGA, EWR) and 'name' holding the actual name of the airport (John. F Kennedy, Le Guardia etc.).
My requirement is to create value labels for in the first data set, so that instead of airport code, the actual name of the airport comes up. I know I can use custom formats to achieve this. But can I write SAS code which can read the unique airport codes, then get the names from another data set and create a value label automatically?
PS: Other wise, the only option I see is to use MS Excel to get the unique list of FAA codes in dataset 1, and then use VLOOKUP to get the names of the airports. And then create one custom format by listing each unique FAA code and the airport name.
I think "value label" is SPSS terminology. Looks like you want to create a format. Just use your lookup table to create an input dataset for PROC FORMAT.
So if your second table looks like this:
data table2;
length FAA $4 Name $40 ;
input FAA Name $40. ;
cards;
JFK John F. Kennedy (NYC)
LGA Laguardia (NYC)
EWR Newark (NJ)
;
You can use this code to convert it into a dataset that PROC FORMAT can use to create a format.
data fmt ;
fmtname='$FAA';
hlo=' ';
set table2 (rename=(faa=start name=label));
run;
proc format cntlin=fmt lib=work.formats;
run;
Now you can use that format with your other data.
proc freq data=table1 ;
tables airport ;
format airport faa. ;
run;
Firstly, consider if it is really a format what is needed. For example, you may just do a left join to retrieve the column (airport) name from table2 (FAA-Name table).
Anyway, I believe the following macro does the trick:
Create auxiliary tables:
data have1;
input airport $;
datalines;
a
d
e
;
run;
data have2;
input faa $ name $;
datalines;
a aaaa
b bbbb
c cccc
d dddd
;
run;
Macro to create Format:
%macro create_format;
*count number of faa;
proc sql noprint;
select distinct count(faa) into:n
from have2;
quit;
*create macro variables for each faa and name;
proc sql noprint;
select faa, name
into:faa1-:faa%left(&n),:name1-:name%left(&n)
from have2;
quit;
*create format;
proc format;
value $airport
%do i=1 %to &n;
"&faa%left(&i)" = "&name%left(&i)"
%end;
other = "Unknown FAA code";
run;
%mend create_format;
%create_format;
Apply format:
data want;
set have1;
format airport $airport.;
run;
I have a dataset with X number of categorical variables for a given record. I would like to somehow turn this dataset into a new dataset with dummy variables, but I want to have one command / macro that will take the dataset and make the dummy variables for all variables in the dataset.
I also dont want to specify the name of each variable, because I could have a dataset with 50 variables so it would be too cumbersome to have to specify each variable name.
Lets say I have a table like this, and I want the resulting table, with the above conditions that I want a single command or single macro without specifying each individual variable:
You can use PROC GLMSELECT to generate the design matrix, which is what you are asking for.
data test;
input id v1 $ v2 $ v3 $ ;
datalines;
1 A A A
2 B B B
3 C C C
4 A B C
5 B A A
6 C B A
;
proc glmselect data=test outdesign(fullmodel)=test_design noprint ;
class v1 -- v3;
model id = v1 -- v3 /selection=none noint;
run;
You can use the -- to specify all variables between the first and last. Notice I don't have to type v2. So if you know first and the last, you can get want you want easily.
I prefer GLMMOD myself. One note, if you can, CLASS variables are usually a better way to go, but not supported by all PROCS.
/*Run model within PROC GLMMOD for it to create design matrix
Include all variables that might be in the model*/
proc glmmod data=sashelp.class outdesign=want outparm=p;
class sex age;
model weight=sex age height;
run;
/*Create rename statement automatically
THIS WILL NOT WORK IF YOUR VARIABLE NAMES WILL END UP OVER 32 CHARS*/
data p;
set p;
if _n_=1 and effname='Intercept' then
var='Col1=Intercept';
else
var=catt("Col", _colnum_, "=", catx("_", effname, vvaluex(effname)));
run;
proc sql ;
select var into :rename_list separated by " " from p;
quit;
/*Rename variables*/
proc datasets library=work nodetails nolist;
modify want;
rename &rename_list;
run;
quit;
proc print data=want;
run;
Originally from here and the post has links to several other methods.
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-dummy-variables-Categorical-Variables/ta-p/308484
Here is a worked example using your simple three observation dataset and a modified version of the PROC GLMMOD method posted by #Reeza
First let's make a sample dataset with a long character ID variable. We will introduce a numeric ROW variable that we can later use to merge the design matrix back with the input data.
data have;
input id :$21. education_lvl $ income_lvl $ ;
row+1;
datalines;
1 A A
2 B B
3 C C
;
You could set the list of variables into a macro variable since we will need to use it in multiple places.
%let varlist=education_lvl income_lvl;
Use PROC GLMMOD to generate the design matrix and the parameter list that we will later use to generate user friendly variable names.
proc glmmod data=have outdesign=design outparm=parm noprint;
class &varlist;
model row=&varlist / noint ;
run;
Now let's use the parameter list to generate rename statement to a temporary text file.
filename code temp;
data _null_;
set parm end=eof;
length rename $65 ;
rename = catx('=',cats('col',_colnum_),catx('_',effname,of &varlist));
file code ;
if _n_=1 then put 'rename ' ;
put #3 rename ;
if eof then put ';' ;
run;
Now let's merge back with the input data and rename the variables in the design matrix.
data want;
merge have design;
by row ;
%inc code / source2;
run;
What i want to do: I need to create a new variables for each value labels of a variable and do some recoding. I have all the value labels output from a SPSS file (see sample).
Sample:
proc format; library = library ;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
value ... (many more with different amount of levels)
The new variable name would be the actual one without F and with underscore+level (example: FUMERT1F level 0 would become FUMERT1_0).
After that i need to recode the variables on this pattern:
data ds; set ds;
FUMERT1_0=0;
if FUMERT1=0 then FUMERT1_0=1;
FUMERT1_1=0;
if FUMERT1=1 then FUMERT1_1=1;
FUMERT1_2=0;
if FUMERT1=2 then FUMERT1_2=1;
FUMERT1_3=0;
if FUMERT1=3 then FUMERT1_3=1;
run;
Any help will be appreciated :)
EDIT: Both answers from Joe and the one of data_null_ are working but stackoverflow won't let me pin more than one right answer.
Update to add an _ underscore to the end of each name. It looks like there is not option for PROC TRANSREG to put an underscore between the variable name and the value of the class variable so we can just do a temporary rename. Create rename name=newname pairs to rename class variable to end in underscore and to rename them back. CAT functions and SQL into macro variables.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
%let class=sex fumert1;
proc transpose data=have(obs=0) out=vnames;
var &class;
run;
proc print;
run;
proc sql noprint;
select catx('=',_name_,cats(_name_,'_')), catx('=',cats(_name_,'_'),_name_), cats(_name_,'_')
into :rename1 separated by ' ', :rename2 separated by ' ', :class2 separated by ' '
from vnames;
quit;
%put NOTE: &=rename1;
%put NOTE: &=rename2;
%put NOTE: &=class2;
proc transreg data=have(rename=(&rename1));
model class(&class2 / zero=none);
id caseid;
output out=design(drop=_: inter: rename=(&rename2)) design;
run;
%put NOTE: _TRGIND(&_trgindn)=&_trgind;
First try:
Looking at the code you supplied and the output from Joe's I don't really understand the need for the formats. It looks to me like you just want to create dummies for a list of class variables. That can be done with TRANSREG.
data have;
call streaminit(1234);
do caseID = 1 to 1e4;
fumert1 = rand('table',.2,.2,.2) - 1;
sex = first(substrn('MF',rand('table',.5),1));
output;
end;
stop;
run;
proc transreg data=have;
model class(sex fumert1 / zero=none);
id caseid;
output out=design(drop=_: inter:) design;
run;
proc contents;
run;
proc print data=design(obs=40);
run;
One good alternative to your code is to use proc transpose. It won't get you 0's in the non-1 cells, but those are easy enough to get. It does have the disadvantage that it makes it harder to get your variables in a particular order.
Basically, transpose once to vertical, then transpose back using the old variable name concatenated to the variable value as the new variable name. Hat tip to Data null for showing this feature in a recent SAS-L post. If your version of SAS doesn't support concatenation in PROC TRANSPOSE, do it in the data step beforehand.
I show using PROC EXPAND to then set the missings to 0, but you can do this in a data step as well if you don't have ETS or if PROC EXPAND is too slow. There are other ways to do this - including setting up the dataset with 0s pre-proc-transpose - and if you have a complicated scenario where that would be needed, this might make a good separate question.
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
proc transpose data=have out=want_pre;
by caseID;
var fumert1 sex;
copy fumert1 sex;
run;
data want_pre_t;
set want_pre;
x=1; *dummy variable;
run;
proc transpose data=want_pre_t out=want delim=_;
by caseID;
var x;
id _name_ col1;
copy fumert1 sex;
run;
proc expand data=want out=want_e method=none;
convert _numeric_ /transformin=(setmiss 0);
run;
For this method, you need to use two concepts: the cntlout dataset from proc format, and code generation. This method will likely be faster than the other option I presented (as it passes through the data only once), but it does rely on the variable name <-> format relationship being straightforward. If it's not, a slightly more complex variation will be required; you should post to that effect, and this can be modified.
First, the cntlout option in proc format makes a dataset of the contents of the format catalog. This is not the only way to do this, but it's a very easy one. Specify the appropriate libname as you would when you create a format, but instead of making one, it will dump the dataset out, and you can use it for other purposes.
Second, we create a macro that performs your action one time (creating a variable with the name_value name and then assigning it to the appropriate value) and then use proc sql to make a bunch of calls to that macro, once for each row in your cntlout dataset. Note - you may need a where clause here, or some other modifications, if your format library includes formats for variables that aren't in your dataset - or if it doesn't have the nice neat relationship your example does. Then we just make those calls in a data step.
*Set up formats and dataset;
proc format;
value SEXF
1 = 'Homme'
2 = 'Femme' ;
value FUMERT1F
0 = 'Non'
1 = 'Oui , occasionnellement'
2 = 'Oui , régulièrement'
3 = 'Non mais j''ai déjà fumé' ;
quit;
data have;
do caseID = 1 to 1e4;
fumert1 = rand('Binomial',.3,3);
sex = rand('Binomial',.5,1)+1;
output;
end;
run;
*Dump formats into table;
proc format cntlout=formats;
quit;
*Macro that does the above assignment once;
%macro spread_var(var=, val=);
&var._&val.= (&var.=&val.); *result of boolean expression is 1 or 0 (T=1 F=0);
%mend spread_var;
*make the list. May want NOPRINT option here as it will make a lot of calls in your output window otherwise, but I like to see them as output.;
proc sql;
select cats('%spread_var(var=',substr(fmtname,1,length(Fmtname)-1),',val=',start,')')
into :spreadlist separated by ' '
from formats;
quit;
*Actually use the macro call list generated above;
data want;
set have;
&spreadlist.;
run;
I have a problem that seems pretty simple (probably is...) but I can't get it to work.
The variable 'name' in the dataset 'list' has a length of 20. I wish to conditionally select values into a macro variable, but often the desired value is less than the assigned length. This leaves trailing blanks at the end, which I cannot have as they disrupt future calls of the macro variable.
I've tried trim, compress, btrim, left(trim, and other solutions but nothing seems to give me what I want (which is 'Joe' with no blanks). This seems like it should be easier than it is..... Help.
data list;
length id 8 name $20;
input id name $;
cards;
1 reallylongname
2 Joe
;
run;
proc sql;
select trim(name) into :nameselected
from list
where id=2;
run;
%put ....&nameselected....;
Actually, there is an option, TRIMMED, to do what you want.
proc sql noprint;
select name into :nameselected TRIMMED
from list
where id=2;
quit;
Also, end PROC SQL with QUIT;, not RUN;.
It works if you specify a separator:
proc sql;
select trim(name) into :nameselected separated by ''
from list
where id=2;
run;