Why does my macro variable resolve with a leading space in SAS?

I am submitting the following SAS code:
proc format;
picture mysdt
low-high = '%Y%0m%0d%0H%0M' (datatype =datetime);
run;
DATA _NULL_;
call symput("Today", Put(datetime(),mysdt.));
run;
%put t_&today;
The resulting log shows 2 spaces before the datetime:
t_  201504240150
The problem is that when my macro variable is resolved, it contains leading spaces. Why is it creating spaces?
My output should be:
t_201504240150
I know the solution but just wanted to know the reason.
DATA _NULL_;
call symput("Today", strip(Put(datetime(),mysdt.)));
run;

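The reason for this is that your format has a default length of 14. Without DEFAULT=, a picture's default length is the length of its longest picture, and '%Y%0m%0d%0H%0M' is 14 characters long, while the value it produces (YYYYMMDDHHMM) is only 12 characters. So PUT returns a 14-character string padded with two leading blanks, and CALL SYMPUT stores those blanks in &today as part of the value. From the SAS documentation: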
DEFAULT=length
specifies the default length of the picture. The value for DEFAULT= becomes the length of picture if you do not give a specific length when you associate the format with a variable.
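To see the padding directly, a minimal check like the following measures the string PUT returns when no width is given. STAMP, TOTAL, DIGITS and LEAD are illustrative variable names, not from the question; with the picture above, the log should report a stored length of 14, of which 12 are digits.

```sas
proc format;
  picture mysdt
    low-high = '%Y%0m%0d%0H%0M' (datatype=datetime);
run;

data _null_;
  stamp  = put(datetime(), mysdt.);  /* no width given, so the default width is used */
  total  = lengthc(stamp);           /* stored length, blanks included               */
  digits = lengthn(strip(stamp));    /* characters the picture actually produced     */
  lead   = verify(stamp, ' ') - 1;   /* leading blanks padding the value             */
  put total= digits= lead=;
  call symputx('total', total);
  call symputx('lead', lead);
run;
```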
So, there are a number of options:
Set the default format length to match the expected length of your datetime value (using DEFAULT=12):
proc format;
picture mysdt (default=12)
low-high = '%Y%0m%0d%0H%0M' (datatype =datetime);
run;
Specify the width in your format:
DATA _NULL_;
call symput("Today", Put(datetime(),mysdt12.)); /* width 12 matches the value, so no blanks are added */
run;
Or, as previously answered, use call symputx to trim whitespace:
DATA _NULL_;
call symputx("Today", Put(datetime(),mysdt.)); /* SYMPUTX strips the blanks itself */
run;
Personally, I'd fix the format to default to 12, then you won't need to remember to specify width or use call symputx each time.

CALL SYMPUTX removes leading and trailing blanks, so STRIP() is not needed:
DATA _NULL_;
call symputx("Today", Put(datetime(),mysdt.));
run;
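A side-by-side sketch may make the difference between the two routines concrete. It stores the same PUT result with CALL SYMPUT and with CALL SYMPUTX; the macro variable names TODAY_SYMPUT and TODAY_SYMPUTX are made up for the comparison. The brackets in the %PUT show where each stored value starts and ends.

```sas
proc format;
  picture mysdt
    low-high = '%Y%0m%0d%0H%0M' (datatype=datetime);
run;

data _null_;
  stamp = put(datetime(), mysdt.);       /* 14 characters: 2 blanks + 12 digits */
  call symput('today_symput', stamp);    /* keeps the leading blanks            */
  call symputx('today_symputx', stamp);  /* strips leading and trailing blanks  */
run;

%put [&today_symput] [&today_symputx];
```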

Related

Combine one column's values into a single string

This might sound awkward, but I do have a requirement to be able to concatenate all the values of a char column from a dataset into one single string. For example:
data person;
input attribute_name $ dept $;
datalines;
John Sales
Mary Acctng
skrill Bish
;
run;
Desired result: test_conct = "JohnMaryskrill"
The number of rows in the input dataset can vary.
So, I tried the code below, but it errors out when the length of the combined string (test_conct) exceeds 32K.
DATA RECKEYS(KEEP=test_conct);
length test_conct $32767;
do until(eod);
SET person END=EOD;
if lengthn(attribute_name) > 0 then do;
test_conct = catt(test_conct, strip(attribute_name));
end;
end;
output; stop;
run;
Can anyone suggest a better way to do this? Maybe break the column down into chunks of 32K-length macro variables?
Regards
It would very much help if you indicated what you're trying to do, but a quick method is to use SQL:
proc sql NOPRINT;
select name into :name_list separated by ""
from sashelp.class;
quit;
%put &name_list.;
As you've indicated, macro variables do have a size limit (65,534 characters in current SAS 9 releases). Depending on what you're doing, a better method may be to build a macro that dynamically puts the entire list wherever it needs to go, but you would need to explain the usage for anyone to suggest that option. This answers your question as posted.
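The chunking idea from the question can be sketched as a DATA step that starts a new macro variable whenever the next value would push the current chunk past a limit. This is a minimal sketch, not code from either answer: the names CHUNK1, CHUNK2, ... and NCHUNKS, and the 32,767-character limit (the longest DATA step character variable), are assumptions. It re-creates PERSON from the question so it runs on its own.

```sas
data person;
  input attribute_name $ dept $;
  datalines;
John Sales
Mary Acctng
skrill Bish
;
run;

%let maxlen = 32767;  /* assumed chunk limit */

data _null_;
  length chunk $&maxlen;
  retain chunk '' n 0;
  set person end=eod;
  /* start a new chunk if this value would not fit in the current one */
  if lengthn(chunk) + lengthn(attribute_name) > &maxlen then do;
    n + 1;
    call symputx(cats('chunk', n), chunk);
    chunk = '';
  end;
  chunk = cats(chunk, attribute_name);
  /* write the last, partially filled chunk and the number of chunks */
  if eod then do;
    n + 1;
    call symputx(cats('chunk', n), chunk);
    call symputx('nchunks', n);
  end;
run;

%put &=nchunks &=chunk1;
```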
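Try this, using a VARCHAR variable. VARCHAR variables in the DATA step are available only in SAS Viya, so this will not work in SAS 9.4.
data _null_;
set sashelp.class(keep = name) end=eof;
/* VARCHAR lengths are in characters and can exceed the 32,767-byte limit of CHAR */
length long_var varchar(1000000);
length want $16; /* MD5 returns a 16-byte binary digest */
retain long_var;
long_var = catt(long_var, name);
if eof then do;
want = md5(long_var);
put want $hex32.; /* print the digest as readable hex */
end;
run;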

This range is repeated or overlapped

Now the bigger problem I have is that I am getting "this range is repeated or overlapped". To be specific, my label values repeat; that is, my format has repeated values, something like a=aa b=aa c=as. How do I resolve this error? When I use HLO='M' as the multilabel option, it gives double the data.
I am mapping like below.
Santhan=Santhan
Chintu=Santhan
Please suggest a solution.
To convert data to a FORMAT use the CNTLIN= option on PROC FORMAT. But first make sure the data describes a valid format. So read the data from the file.
data myfmt ;
infile 'myfile.txt' dsd truncover ;
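length fmtname $32 start $100 label $200 ;
fmtname = '$MYFMT';
input start label ;
run;
Make sure to set the lengths of START and LABEL to be long enough for any actual values your source file might have. CNTLIN= expects the decoded value in a variable named LABEL, which is also the name the later steps use.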
Then make sure it is sorted and you do not have duplicate codes (START values).
proc sort data=myfmt out=myfmt_clean nodupkey ;
by start;
run;
The SAS log will show if any observations were deleted because of duplicate START values.
If you do have duplicate values, then examine the dataset or original text file to understand why and determine how you want to handle the duplicates. The PROC SORT step above will keep just one of the duplicates. You might just have exact duplicates, in which case keeping only one is fine. Or you might want to collapse the duplicate observations into a single observation and concatenate the multiple decodes into one long decode.
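Collapsing duplicates can be sketched with BY-group processing. This is one possible way to combine decodes, not the only option: the sample CODES data set is hypothetical, and joining the labels with '/' is an arbitrary choice.

```sas
data codes;
  length fmtname $32 start $100 label $200;
  fmtname = '$MYFMT';
  input start label;
  datalines;
a aa
a ba
b aa
;
run;

proc sort data=codes;
  by start;
run;

/* one row per START, with all of its labels joined by '/' */
data codes_collapsed;
  set codes(rename=(label=_label));
  by start;
  length label $200;
  retain label;
  if first.start then label = _label;
  else label = catx('/', label, _label);
  if last.start then output;
  drop _label;
run;
```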
If you want, you can add a record that will add the functionality of the OTHER keyword of the VALUE statement in PROC FORMAT. You can use that to set a default value, like 'Value not found', to decode any value you might encounter that was not in your original source file.
data myfmt_final;
set myfmt_clean end=eof;
output;
if eof then do;
start = ' ';
label = 'Value not found';
hlo = 'O' ;
output;
end;
run;
Then use PROC FORMAT to make the format from the cleaned up data file.
proc format cntlin = myfmt_final;
run;
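As an end-to-end sketch, the steps above can be run on the mapping from the question, with DATALINES standing in for the text file (an assumption made so the example is self-contained; the data set name MYFMT_DEMO is hypothetical). Because Santhan and Chintu are different START values, mapping both to the same label should not by itself trigger the overlap error.

```sas
data myfmt_demo;
  length fmtname $32 start $100 label $200;
  fmtname = '$MYFMT';
  infile datalines dsd truncover;
  input start label;
  datalines;
Santhan,Santhan
Chintu,Santhan
;
run;

/* drop repeated START values before building the format */
proc sort data=myfmt_demo nodupkey;
  by start;
run;

proc format cntlin=myfmt_demo;
run;

data _null_;
  decoded = put('Chintu', $myfmt.);
  put decoded=;
run;
```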
To convert a FORMAT to a dataset use the CNTLOUT= option on PROC FORMAT.
For example if you had created this format previously.
proc format ;
value $myfmt 'ABC'='ABC' 'BCD'='BCD' 'BCD1'='BCD' 'BCD2'='BCD' ;
run;
then you can use another PROC FORMAT step to make a dataset. Use the SELECT statement if your format catalog has more than one format defined and you just want one (or some) of them.
proc format cntlout=myfmt ;
select $myfmt ;
run;
Then you can use that dataset to easily make a text file. For example a comma delimited file.
data _null_;
set myfmt ;
file 'myfmt.txt' dsd ;
put start label;
run;
The result would be a text file that looks like this:
ABC,ABC
BCD,BCD
BCD1,BCD
BCD2,BCD
You get this error because you have the same code that maps to two different categories. I'm going to guess you likely did not import your data correctly from your text file and ended up getting some values truncated but without the full process it's an educated guess.
This will work fine:
proc format;
value $ test
'a'='aa' 'b'='aa' 'c'='as'
;
run;
This version will not work, because a is mapped to two different values, so SAS will not know which one to use.
proc format;
value $ badtest
'a'='aa'
'a' = 'ba'
'b' = 'aa'
'c' = 'as';
run;
This generates the error regarding overlaps in your data.
The way to fix this is to find the duplicates and determine which code they should actually map to. PROC SORT can be used to get your duplicate records.
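One way to pull out the duplicates is the DUPOUT= option of PROC SORT, which writes the observations removed by NODUPKEY to a separate data set. The sketch below rebuilds the BADTEST mapping as CNTLIN-style data; the data set names BADFMT, BADFMT_CLEAN and BADFMT_DUPS are hypothetical.

```sas
data badfmt;
  length fmtname $32 start $100 label $200;
  fmtname = '$BADTEST';
  input start label;
  datalines;
a aa
a ba
b aa
c as
;
run;

/* the first row per START is kept; later repeats go to BADFMT_DUPS */
proc sort data=badfmt out=badfmt_clean nodupkey dupout=badfmt_dups;
  by start;
run;

proc print data=badfmt_dups;
run;
```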

file statement in data step to export comma delimited text file

Problem: suppose I do not know the variable names or the number of variables, or there are too many variables for me to write them all in the PUT statement.
In the following case I knew there were 3 variables:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put region mtg sendmail;
run;
I tried using put _all_;
And the output is:
region=N mtg=24NOV1999 sendmail=10OCT1999 _ERROR_=0 _N_=1
region=S mtg=28DEC1999 sendmail=13NOV1999 _ERROR_=0 _N_=2
region=E mtg=03DEC1999 sendmail=19OCT1999 _ERROR_=0 _N_=3
region=W mtg=04OCT1999 sendmail=20AUG1999 _ERROR_=0 _N_=4
It does not give comma-delimited output; it gives named output (variable=value) instead.
My desired output would be
N,24NOV1999,10OCT1999
S,28DEC1999,13NOV1999
E,03DEC1999,19OCT1999
W,04OCT1999,20AUG1999
This is the right one:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (~);
run;
This one should help you.
The reason you have so many answers that seem to work, but have different characters, is that the important thing is changing _all_ to (_all_). The arguments after that are not important.
Explained in some detail here, you actually have two entirely different things going on when you write
put _all_;
and
put (_all_) (:);
Programmers familiar with the concept of an overloaded function will find that the simplest way to think of this. If put sees _all_, it calls one version of put. If it sees (_all_) (or any list of variables with ( ) around it), it calls another (expanding _all_ to its variable list). Notice that if you try
put (_all_);
It fails, and it fails with errors suggesting it is trying to use formatted output (i.e., it asks you why you don't have another ( there, which would be the normal thing in formatted output after a variable list in parentheses).
By itself, _all_ is an argument to put that specifically tells it to use named output to output all variables in the dataset. Hence the variable=value format of the output. So in the first example, _all_ is a constant - an argument - nothing more.
In the second example, though, (_all_) is a variable list, which contains all variables as if they were typed in, space delimited. So
put (_all_) (:);
is equivalent to
put (name sex age height weight) (:);
if used with SASHELP.CLASS. Adding anything - a colon, a tilde, an ampersand, etc. - that is legal in the context of formatted output will cause that to be used.
Note that
put _all_ #;
Does not cause that to happen - apparently # (or ## or / or //) are all legal arguments to put _all_.
Interestingly, _numeric_ and _character_ do not have an analogous shortcut - clearly this is an explicit, special case just for _all_. They cannot be used without parens. put _numeric_; gives an error that _numeric_ is not a legal variable name. But, put (_numeric_) (:); is perfectly legal.
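To illustrate the last point, here is a minimal sketch that writes only the numeric variables of SASHELP.CLASS as comma-delimited lines. The fileref NUMOUT is an illustrative temporary file, used so the output can be inspected rather than printed to the log.

```sas
filename numout temp;

data _null_;
  set sashelp.class;
  file numout dlm=',';
  put (_numeric_) (:);  /* parentheses turn _NUMERIC_ into a variable list */
run;
```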
Try the colon modifier option.
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (:);
run;
Another option is to read the names from the SASHELP.VCOLUMN table, create a macro variable that lists the columns and include that in your put statement.
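That approach might look like this sketch, which pulls the column names from DICTIONARY.COLUMNS (the PROC SQL view of the same metadata as SASHELP.VCOLUMN) into a macro variable. The macro variable VARLIST, the fileref CSVOUT, and the two-row MEETING data set are assumptions for illustration; LIBNAME and MEMNAME values must be given in uppercase.

```sas
data meeting;
  length region $1 mtg sendmail $9;
  input region mtg sendmail;
  datalines;
N 24NOV1999 10OCT1999
S 28DEC1999 13NOV1999
;
run;

/* build "region mtg sendmail" in column order */
proc sql noprint;
  select name into :varlist separated by ' '
  from dictionary.columns
  where libname = 'WORK' and memname = 'MEETING'
  order by varnum;
quit;

filename csvout temp;

data _null_;
  set meeting;
  file csvout dsd dlm=',';
  put &varlist;
run;
```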
The documentation is a bit scarce:
https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000176623.htm
:
enables you to specify a format that the PUT statement uses to write the variable value. All leading and trailing blanks are deleted, and each value is followed by a single blank.
~
enables you to specify a format that the PUT statement uses to write the variable value. SAS displays the formatted value in quotation marks even if the formatted value does not contain the delimiter. SAS deletes all leading and trailing blanks, and each value is followed by a single blank. Missing values for character variables are written as a blank (" ") and, by default, missing values for numeric variables are written as a period (".").
It is easiest to just use a variable list followed by a format list. Syntax is:
(<variable list>) (<format list>)
The values in the format list are repeated until the variables in the variable list are exhausted. The format list can include format modifiers like :, &, ~ or = and cursor movement commands like /, +n, or #n.
Also you should add the DSD option to your FILE statement so that missing values are properly represented in the CSV file as having nothing between the delimiters.
So your program reduces to:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' DSD dlm=',';
put (_all_) (:) ;
run;
The problem you had with PUT _ALL_; is that when _ALL_ is used by itself it is treated differently than when it is part of a variable list inside of (). As a variable list it does not include system generated variables such as _N_ or FIRST. or LAST. variables generated by BY statements.
Note that if you want to use _ALL_ in a variable list and still get named output you can use the = format modifier in the format list.
put (_all_) (=) ;
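A small sketch of that, with a hypothetical two-row MEETING data set and an illustrative temporary fileref NAMEDOUT. Unlike PUT _ALL_;, the lines written this way should not contain _N_= or _ERROR_=.

```sas
data meeting;
  length region $1 mtg sendmail $9;
  input region mtg sendmail;
  datalines;
N 24NOV1999 10OCT1999
S 28DEC1999 13NOV1999
;
run;

filename namedout temp;

data _null_;
  set meeting;
  file namedout;
  put (_all_) (=);  /* named output, but only for the data set variables */
run;
```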
No, I'm Spartacus!
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (&);
run;
data meeting;
input region $ mtg :$9. sendmail :$9.; /* $9. so the dates are not truncated to the default length of 8 */
cards;
N 24NOV1999 10OCT1999
S 28DEC1999 13NOV1999
E 03DEC1999 19OCT1999
W 04OCT1999 20AUG1999
;
run;
proc export data=meeting
outfile='c:\input\meeting.txt'
dbms=dlm replace;
delimiter=',';
run;
Hope this is helpful for any number of variables. Note that PROC EXPORT writes the variable names as a header row by default.

Compress Newline character for dynamic variables

Dataset: Have
F1 F2
Student Section
Name No
The HAVE dataset contains newline characters.
I need to remove the newline characters from the data.
I want to do this dynamically, as sometimes the HAVE dataset may contain new variables like F3, F4, F5, etc.
I have written a macro to do this. However, it is not working as expected.
When I execute the code below for the first time, I get an error about an invalid reference to newcnt. If I execute it a second time in the same session, I do not get the error.
Please find my code below:
%macro update_2(newcnt);
data HAVE;
set HAVE;
%do i= 1 %to &newcnt;
%let colname = F&i;
&colname=compress(&colname,,'c');
%end;
run;
%mend update_2;
%macro update_1();
proc sql noprint;
select count(*) into :cnt from dictionary.columns where libname="WORK" and memname="HAVE";
quit;
%update_2(&cnt)
%mend update_1;
Note: All the variables have name as F1,F2,F3,F4.,
Please tell me what is going wrong.
If there are other procedures that would work, please suggest them.
In your macro %update_1 you're creating a macro variable called &cnt, but when you call %update_2 you refer to another macro variable, &colcnt. Try fixing this reference and see if your code behaves as expected.
We created our own function to clean unwanted characters from strings using proc fcmp. In this case, our function cleans tab characters, line feeds, and carriage returns.
proc fcmp outlib=common.funcs.funcs; /* REPLACE TARGET DESTINATION AS NECESSARY */
function clean(iField $) $200;
length cleaned $200;
bad_char_list = byte(10) || byte(9) || byte(13);
cleaned = translate(iField," ",bad_char_list);
return (cleaned );
endsub;
run;
/* make the compiled clean() function visible to later DATA steps */
options cmplib=common.funcs;
Create some test data with a new line character in the middle of it, then export it and view the results. You can see the string has been split across lines:
data x;
length employer $200;
employer = cats("blah",byte(10),"diblah");
run;
proc export data=x outfile="%sysfunc(pathname(work))\x.csv" dbms=csv replace;
run;
Run our newly created clean() function against the string and export it again. You can see it is now on a single line as desired:
data y;
set x;
employer = clean(employer);
run;
proc export data=y outfile="%sysfunc(pathname(work))\y.csv" dbms=csv replace;
run;
Now to apply this method to all character variables in our desired dataset. No need for macros, just define an array referencing all the character variables, and iterate over them applying the clean() function as we go:
data cleaned;
set x;
array a[*] _char_;
do cnt=lbound(a) to hbound(a);
a[cnt] = clean(a[cnt]);
end;
drop cnt; /* the loop counter is not part of the data */
run;
EDIT: Also note that FCMP functions can carry a performance cost. If you are working with very large amounts of data, other solutions may perform better.
EDIT 6/15/2020: Corrected missing length statement that could result in truncated responses.
Here's an example of Robert Penridge's function, as a call routine with an array as an argument. This probably only works in 9.4+ or possibly later updates of 9.3, when permanent arrays began being allowed to be used as arguments in this way.
I'm not sure if this could be done flexibly with an array as a function; without using macros (which require recompilation of the function constantly) I don't know how one could make the right size of array be returned without doing it as a call routine.
I added 'Z' to the list of characters to replace (bad_char_list) so it's obvious that it works.
options cmplib=work.funcs;
proc fcmp outlib=work.funcs.funcs;
sub clean(iField[*] $);
outargs iField;
bad_char_list = byte(11)|| byte(10) || byte(9) || byte(13)||"Z";
do _i = 1 to dim(iField);
iField[_i] = translate(iField[_i],trimn(" "),bad_char_list);
end;
endsub;
quit;
data y;
length employer1-employer5 $20;
array employer[4] $;
do _i = 1 to dim(employer);
employer[_i] = "Hello"||byte(32)||"Z"||"Goodbye";
end;
employer5 = "Hello"||byte(32)||"Z"||"Goodbye";
call clean(employer);
run;
proc print data=y;
run;
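Here is another alternative. If newlines are the only thing you want to remove, only character variables are involved, so you can leverage an implicit array and DO OVER. Note that the 'c' modifier of COMPRESS removes all control characters, not only newlines: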
data want;
set have;
array chr _character_;
do over chr;
chr=compress(chr,,'c');
end;
run;
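If you want to remove only line breaks rather than every control character, a variant with an explicit array and the hexadecimal constant '0D0A'x (carriage return and line feed) might look like this sketch. The HAVE data set here is made up for illustration, and _I is just a loop counter.

```sas
data have;
  length f1 f2 $20;
  f1 = 'Student' || '0A'x || 'Name';    /* embedded line feed       */
  f2 = 'Section' || '0D0A'x || 'No';    /* embedded CR + line feed  */
run;

data want;
  set have;
  array chr[*] _character_;             /* every character variable */
  do _i = 1 to dim(chr);
    chr[_i] = compress(chr[_i], '0D0A'x);  /* drop only CR and LF   */
  end;
  drop _i;
run;
```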

Using informats when creating a dataset from another dataset

I've got a dataset that's full of data all in character format.
Now I want to create another dataset from this one, but put everything in its correct numeric, date, or character format.
Here's what I'm trying.
data work.testout;
attrib account_open_date informat = mmddyy10.;
do i = 1 to nobs;
set braw.accounts point = i nobs = nobs;
output;
end;
stop;
run;
this gives me:
Variable 'account_open_date' from data set braw.accounts (at line 7 column 21) has a different type (character) to the variable type on the data vector (numeric)
What's the best way of doing this?
You cannot use an informat to convert a variable directly from character to numeric. At least in SAS proper, you cannot convert a variable from character to numeric, period, without using an intermediary. You must do something along the lines of the following:
data want;
set have(rename=(varwant=temp));
varwant=input(temp,MMDDYY10.);
drop temp;
run;
There you rename the (character) variable to a temporary name, then convert it to numeric using INPUT.
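Applied to the question's variable, the same pattern might look like the sketch below. WORK.ACCOUNTS and its values are hypothetical stand-ins for BRAW.ACCOUNTS, and the FORMAT statement is an addition so the new numeric value displays as a date.

```sas
data accounts;
  length account_id $5 account_open_date $10;
  input account_id account_open_date;
  datalines;
A0001 01/15/2015
A0002 12/31/2014
;
run;

data testout;
  set accounts(rename=(account_open_date=_aod_char));
  account_open_date = input(_aod_char, mmddyy10.);  /* character -> SAS date value */
  format account_open_date mmddyy10.;               /* display it as a date        */
  drop _aod_char;
run;
```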