I am trying to make a character informat from the range values given in a dataset.
Dataset : Grade
Start End Label Fmtname Type
0 20 A $grad I
21 40 B $grad I
41 60 C $grad I
61 80 D $grad I
81 100 E $grad I
And here is the code I wrote to create the informat:
proc format cntlin = grade;
run;
And now the code to create a temp dataset using the new informat
data temp;
input grade : $grad. @@ ;
datalines;
21 30 0 45 10
;
The output I wanted was a dataset Temp with values :
Grade
A
B
A
..
Whereas the dataset Temp has values :
Grade
21
30
0
...
SAS Log Entry :
1146 proc format cntlin = grade;
NOTE: Informat $GRAD has been output.
1147 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: There were 5 observations read from the data set WORK.GRADE.
1148
1149
1150 data temp;
1151 input grade : $grad. @@ ;
1152
1153 datalines;
NOTE: SAS went to a new line when INPUT statement reached past the end of a
line.
NOTE: The data set WORK.TEMP has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
I am not able to understand why the informat is not working. Can anyone please
explain where I am making my mistake?
Informats convert characters to either characters or numbers. A character informat compares its input as text, so you can't use START/END ranges the way you are doing, since ranges like these only work with numbers.
See the following:
proc format;
invalue $grade
'0'-'20'="A"
'21'-'40'="B"
'41'-'60'="C"
'61'-'80'="D"
'81'-'100'="E";
quit;
proc format;
invalue $grade
'21'='A';
quit;
The latter works, the former gives you an error. So, you could write a dataset with all 101 values (each on a line with START), or just write a format and do it in a second step (read in as a number and then PUT to the format).
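The second option (read the value as a number, then PUT it through a format) can be sketched as follows. This is only a minimal illustration: the format name gradefmt and the variable name score are made up here, and the ranges are copied from the Grade dataset in the question.

```sas
/* Numeric format: numeric ranges are allowed here. */
/* The name gradefmt is an assumption, not from the question. */
proc format;
  value gradefmt
    0  - 20  = 'A'
    21 - 40  = 'B'
    41 - 60  = 'C'
    61 - 80  = 'D'
    81 - 100 = 'E';
run;

data temp;
  /* Step 1: read the raw value as a number. */
  input score @@;
  /* Step 2: convert it to the grade letter with PUT. */
  grade = put(score, gradefmt.);
  datalines;
21 30 0 45 10
;
```

With these ranges, 21 and 30 map to B, 0 and 10 map to A, and 45 maps to C.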
Related
I have a numeric parameter given to my macro and would like to convert it to date, set to end of month and apply a format.
The following code works for many dates, but not for March; it throws 'Literal contains unmatched quote'.
proc format;
picture mydatep
low-high = "'%0d-%0b-%0Y'" (datatype = date);
run;
%macro test(cycle=);
%let enddate = %SYSFUNC(intnx(month, %SYSFUNC(inputn(&cycle., yymmn6.)), 0, e), mydatep.);
%put &enddate.;
%mend;
%test(cycle=201602); /* works --> 29-Feb-2016*/
%test(cycle=201603); /* does not work */
%test(cycle=201604); /* works again --> 30-Apr-2016*/
%test(cycle=201402); /* works --> 28-Feb-2014*/
%test(cycle=201403); /* does not work */
%test(cycle=201404); /* works again --> 30-Apr-2014*/
I have been using the code for some years now, and never had trouble with it. I am using SAS Analytics Pro 9.4
Solution: I was starting the SAS session via SAS (Unicode). Switching to SAS (Deutsch) [engl: SAS (German)], solved the issue.
I don't know why, though.
@Kenji: "Switching to SAS (Deutsch) [engl: SAS (German)], solved the issue. I don't know why, though."
The explanation is quite simple: the month abbreviations used by German and English date formats differ in a few cases:
German      English     equal?
------------------------------
01Jan2022   01Jan2022   yes
01Feb2022   01Feb2022   yes
01Mär2022   01Mar2022   NO
01Apr2022   01Apr2022   yes
...
01Okt2022   01Oct2022   NO
01Nov2022   01Nov2022   yes
01Dez2022   01Dec2022   NO
So in a German environment, it is a common observation that your code might work in most cases but not for March, October and December (May differs as well: Mai vs. May).
Any non-ASCII character in the generated string needs more than one byte of storage in a UTF-8 session. That applies to the ä in Mär, and it would equally apply to an en dash or em dash typed in place of the hyphens in your PICTURE text. The quoted string is 13 bytes when it is plain ASCII, so with such a character it grows to 14 bytes or more.
The format's default width is 13, so the value is truncated, which removes the closing quote and possibly the last digit of the year as well.
Tell PROC FORMAT that the default width for the format should be 14 and not 13.
proc format;
picture mydatep (default=14)
low-high = "'%0d-%0b-%0Y'" (datatype = date);
run;
Example:
23 proc options option=ENCODING option=LOCALE option=DATESTYLE option=DFLANG ;
24 run;
SAS (r) Proprietary Software Release 9.4 TS1M5
ENCODING=UTF-8 Specifies the default character-set encoding for the SAS session.
LOCALE=EN_US Specifies a set of attributes in a SAS session that reflect the language, local conventions, and culture for a
geographical region.
DATESTYLE=MDY Specifies the sequence of month, day, and year when ANYDTDTE, ANYDTDTM, or ANYDTTME informat data is ambiguous.
DFLANG=GERMAN Specifies the language for international date informats and formats.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
25
26 options dflang=german locale=de_DE ;
27
28 data test;
29 do month=1 to 12;
30 length string $20 ;
31 string=put(mdy(month,1,2000),mydatep.);
32 put month= string=;
33 end;
34 run;
month=1 string='01-Jan-2000'
month=2 string='01-Feb-2000'
month=3 string='01-Mär-2000
month=4 string='01-Apr-2000'
month=5 string='01-Mai-2000'
month=6 string='01-Jun-2000'
month=7 string='01-Jul-2000'
month=8 string='01-Aug-2000'
month=9 string='01-Sep-2000'
month=10 string='01-Okt-2000'
month=11 string='01-Nov-2000'
month=12 string='01-Dez-2000'
NOTE: The data set WORK.TEST has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
35 proc format;
36 picture mydatep (default=14)
37 low-high = "'%0d-%0b-%0Y'" (datatype = date);
NOTE: Format MYDATEP is already on the library WORK.FORMATS.
NOTE: Format MYDATEP has been output.
38 run;
NOTE: PROZEDUR FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
39
40 data test;
41 do month=1 to 12;
42 length string $20 ;
43 string=put(mdy(month,1,2000),mydatep.);
44 put month= string=;
45 end;
46 run;
month=1 string='01-Jan-2000'
month=2 string='01-Feb-2000'
month=3 string='01-Mär-2000'
month=4 string='01-Apr-2000'
month=5 string='01-Mai-2000'
month=6 string='01-Jun-2000'
month=7 string='01-Jul-2000'
month=8 string='01-Aug-2000'
month=9 string='01-Sep-2000'
month=10 string='01-Okt-2000'
month=11 string='01-Nov-2000'
month=12 string='01-Dez-2000'
NOTE: The data set WORK.TEST has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
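The byte arithmetic behind the truncation can be checked directly. The following is a small sketch that assumes a UTF-8 session (ENCODING=UTF-8, as in the log above) and a program file saved as UTF-8; the variable names are made up for illustration. LENGTH counts bytes, while KLENGTH counts characters.

```sas
/* Assumes a UTF-8 session; in a single-byte session the two counts would match. */
data _null_;
  s = "'01-Mär-2000'";
  bytes = length(s);    /* storage length in bytes */
  chars = klength(s);   /* number of characters */
  put bytes= chars=;
run;
```

Under those assumptions the byte count is one higher than the character count, because ä occupies two bytes in UTF-8. That extra byte is what pushes the value past a width of 13.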
I want to store an instance of a data step variable in a macro-variable using call symput, then use that macro-variable in the same data step to populate a new field, assigning it a new value every 36 records.
I tried the following code:
data a;
set a;
if MOB = 1 then do;
MOB1_accounts = accounts;
call symput('MOB1_acct', MOB1_accounts);
end;
else if MOB > 1 then MOB1_accounts = &MOB1_acct.;
run;
I have a series of repeating MOB's (1-36). I want to create a field called MOB1_Accts, set it equal to the # of accounts for that cohort where MOB = 1, and keep that value when MOB = 2, 3, 4 etc. I basically want to "drag down" the MOB 1 value every 36 records.
For some reason this macro-variable is returning "1" instead of the correct # accounts. I think it might be a char/numeric issue but unsure. I've tried every possible permutation of single quotes, double quotes, symget, etc... no luck.
Thanks for the help!
You are misusing the macro system.
The ampersand (&) introducer in source code tells SAS to resolve the following symbol and place it into the code submission stream. Thus, the resolved &MOB1_acct. cannot be changed in the running DATA step. In other words, a running step cannot change its source code. The resolved macro variable will be the same for all implicit iterations of the step because its value became part of the source code of the step.
You can use SYMPUT() and SYMGET() functions to move strings out of and into a DATA Step. But that is still the wrong approach for your problem.
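The difference between compile-time resolution and run-time lookup can be seen in a small sketch. The variable names fixed and live and the starting value 1 are made up for illustration; CALL SYMPUTX and SYMGETN are the standard run-time interfaces to the macro symbol table.

```sas
/* Value that exists before the step is compiled. */
%let mob1_acct = 1;

data _null_;
  do i = 1 to 3;
    /* Updates the macro variable while the step is running. */
    call symputx('mob1_acct', i * 100);
    /* &mob1_acct. was replaced by the text 1 before the step started. */
    fixed = &mob1_acct.;
    /* SYMGETN looks the value up at run time. */
    live = symgetn('mob1_acct');
    put i= fixed= live=;
  end;
run;
```

Here fixed stays at 1 on every iteration, while live follows the updates (100, 200, 300). A stale value left over from earlier in the session would explain a constant result such as the "1" described in the question.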
The most straightforward technique could be:
- use of a retained variable
- a mod(_n_, 36) computation to determine every 36th row (_n_ is a proxy for row number in a simple step with a single SET)
Example:
data a;
set a;
retain mob1_accounts;
* every 36 rows change the value, otherwise the value is retained;
if mod(_n_,36) = 1 then mob1_accounts = accounts;
run;
You didn't show any data, so the actual program statements you need might be slightly different.
Contrasting SYMPUT/SYMGET with RETAIN
As stated, SYMPUT/SYMGET is a possible way to retain values by storing them off in the macro symbol table. There is a penalty, though: each SYM* call incurs function-call overhead, whatever internal work is needed to store or retrieve the symbol value, and possibly additional conversions between character and numeric.
Example:
1,000,000 rows read. DATA _null_ steps to avoid writing overhead as part of contrast.
data have;
do rownum = 1 to 1e6;
mob + 1;
accounts = sum(accounts, rand('integer', 1,50) - 10);
if mob > 36 then mob = 1;
output;
end;
run;
data _null_;
set have;
if mob = 1 then call symput ('mob1_accounts', cats(accounts));
mob1_accounts = symgetn('mob1_accounts');
run;
data _null_;
set have;
retain mob1_accounts;
if mob = 1 then mob1_accounts = accounts;
run;
On my system, the log shows:
142 data _null_;
143 set have;
144
145 if mob = 1 then call symput ('mob1_accounts', cats(accounts));
146
147 mob1_accounts = symgetn('mob1_accounts');
148 run;
NOTE: There were 1000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.34 seconds
cpu time 0.34 seconds
149
150 data _null_;
151 set have;
152 retain mob1_accounts;
153
154 if mob = 1 then mob1_accounts = accounts;
155 run;
NOTE: There were 1000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
Or, in summary:
way             real   cpu
-------------   ----   ----
SYMPUT/SYMGET   0.34   0.34
RETAIN          0.04   0.03
I have a variable for counting days. I'm trying to use the day count to divide by total days.
How do I create a macro that stores the most recent day and allows me to quote it later?
This is what I have so far (I've cut out code that's not relevant)
DATA scotland;
input day deathsscotland casesscotland;
cards;
1 1 85
2 1 121
3 1 153
4 1 171
5 2 195
6 3 227
7 6 266
8 6 322
9 7 373
10 10 416
11 14 499
12 16 584
13 22 719
14 25 894
;
run;
proc sort data=scotland out=scotlandsort;
by day;
run;
Data _null_;
keep day;
set scotlandsort end=eof;
if eof then output;
run;
%let daycountscot = day
Data ratio;
set cdratio;
SCOTLANDAVERAGE = (SCOTLANDRATIO/&daycountscot)*1000;
run;
Using your own code, you can create the macro variable like this:
Data _null_;
keep day;
set scotlandsort end=eof;
if eof then call symputx('daycountscot', day);
run;
%put &daycountscot.;
The data _null_ is not doing anything. You can eliminate the sort and data steps by selecting the max day value directly into a macro variable.
proc sql noprint;
select max(day) into :daycountscot trimmed
from scotland
;
quit;
There is no need to use macro code for this; it is better to keep values in variables anyway. To convert the value into text to store it as a macro variable, SAS will have to round the number.
You could make a dataset with the maximum DAY value and then combine it with the dataset where you want to do the division.
data last_day;
set scotlandsort end=eof;
if eof then output;
keep day;
rename day=last_day;
run;
data ratio;
set cdratio;
if _n_=1 then set last_day;
SCOTLANDAVERAGE = (SCOTLANDRATIO/last_day)*1000;
run;
Probably easier in SQL code:
proc sql;
create table ratio as
select a.*, (SCOTLANDRATIO/last_day)*1000 as SCOTLANDAVERAGE
from cdratio a
, (select max(day) as last_day from scotland)
;
quit;
I am trying to apply a format to a variable in a data set, but after running the data step, I am still only seeing the raw values (e.g. -1) instead of formatted values (e.g. -1 = INAPPLICABLE). A minimal reproducible code example is below. Any help at all is greatly appreciated.
proc format library=PUFLIB;
'-1' = '-1 INAPPLICABLE'
'1' = '1 YES'
'2' = '2 NO'
'3' = '3 DOES NOT WORK'
;
run;
data example_ds;
FORMAT ACCDNWRK $ACCDNWRK_FMT.;
input accdnwrk $;
datalines;
1
2
3
-1
;
Please make sure to review your log.
It shows that the error is right after your PROC FORMAT statement. In this case you're missing the statement keyword that tells SAS whether you are defining a format (VALUE) or an informat (INVALUE), and the format name.
70
71 proc format library=PUFLIB;
72 '-1' = '-1 INAPPLICABLE'
____
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
73 '1' = '1 YES'
74 '2' = '2 NO'
75 '3' = '3 DOES NOT WORK'
76 ;
NOTE: The previous statement has been deleted.
77 run;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
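Adding VALUE to indicate it's a format, followed by the format name, is what's needed. Because your values are character strings, the name needs a leading $ and must match the name used in your FORMAT statement, $ACCDNWRK_FMT.
proc format library=puflib;
value $accdnwrk_fmt
*rest of your code*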
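Put together, a character version of the example might look like the sketch below. It stores the format in WORK rather than PUFLIB, which is an assumption made here to keep the example self-contained. If the format is kept in PUFLIB, SAS also has to be told to search that library, for example with options fmtsearch=(puflib work);.

```sas
/* Character format: the name starts with $ and the values are quoted. */
proc format;
  value $accdnwrk_fmt
    '-1' = '-1 INAPPLICABLE'
    '1'  = '1 YES'
    '2'  = '2 NO'
    '3'  = '3 DOES NOT WORK'
  ;
run;

data example_ds;
  input accdnwrk $;
  /* The name here must match the name defined in PROC FORMAT. */
  format accdnwrk $accdnwrk_fmt.;
  datalines;
1
2
3
-1
;

proc print data=example_ds;
run;
```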
You seem to want a numeric format. Hope this helps:
proc format;
value ACCDNWRK_FMT
-1 = '-1 INAPPLICABLE'
1 = '1 YES'
2 = '2 NO'
3 = '3 DOES NOT WORK'
;
run;
data example_ds;
FORMAT ACCDNWRK ACCDNWRK_FMT.;
input accdnwrk;
datalines;
1
2
3
-1
;
I am extracting data from a database that has all values posted in strings, in the format +000000xx.xxx or -00000xx.xxx . I need to convert these to numeric to operate on.
data want;
set have;
numeric_var = string_var*1;
run;
works fine, but, to save compute time and resources on the final running, which will be over a much larger dataset, and in the interest of doing things properly I'd rather do that with a format or informat statement.
data want;
set have;
numeric_var = input(string_var, best8.);
run;
seems to output wrong values and to round everything to 0.
Any ideas?
Using best8. tells SAS to read only the first 8 characters of the string, which cuts off most of the digits, so that's never going to work. Use a width that covers the whole string, such as best32. (32 is the maximum width).
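The effect of the width can be seen on a single value. This is a minimal sketch; the sample string and the variable names too_narrow and full_width are made up, following the +000000xx.xxx layout described in the question.

```sas
data _null_;
  string_var = '+00000012.345';
  /* Width 8: only the first 8 characters, '+0000001', are read. */
  too_narrow = input(string_var, best8.);
  /* Width 32: the whole 13-character string is read. */
  full_width = input(string_var, best32.);
  put too_narrow= full_width=;
run;
```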
However, make sure you run some benchmarks before changing your current simple solution. SAS is already doing a character-to-numeric conversion as part of the numeric_var = string_var*1; statement, and is apparently doing it correctly; changing the code to use an informat will not automatically be any faster.
It would be cool if you benchmarked both methods and reported the results back here.
EDIT:
I did some benchmarking on this, out of curiosity. The code and log are below but TL;DR - the informat seems to be very slightly but consistently faster - 7.58 seconds vs 7.83 seconds in the run below on a 50 million observation data set. So the informat method is the way to go, but the 3% performance gain wouldn't be worth refactoring a large program, particularly if you don't have good test coverage to be sure of avoiding regressions.
483 * Set small for testing, big for benchmarking;
484 %let obs = 50000000;
485
486 * Generate test data;
487 data testdata;
488 do i = 1 to &obs;
489 numeric = round(ranuni(0)*100, 0.001);
490 char = '+' || put(numeric, z12.3-L);
491 output;
492 end;
493 run;
NOTE: The data set WORK.TESTDATA has 50000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 12.55 seconds
user cpu time 11.41 seconds
system cpu time 0.84 seconds
memory 4375.18k
OS Memory 20784.00k
Timestamp 12/10/2019 10:36:11 AM
Step Count 51 Switch Count 0
494
495 %macro charToNum(in=, method=, obs=);
496
497 * Convert back to numeric;
498 data converted;
499 set &in;
500 %if "&method" = "MULT-BY-ONE" %then %do;
501 converted = char * 1;
502 %end; %else %if "&method" = "INFORMAT" %then %do;
503 converted = input(char, 32.);
504 %end;
505 if converted ne numeric then do;
506 put "ERROR: Conversion failed: " numeric= char= converted=;
507 end;
508 run;
509
510 %mend;
511
512 %charToNum(in = testdata, method = MULT-BY-ONE, obs = &obs);
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
3:20
NOTE: There were 50000000 observations read from the data set WORK.TESTDATA.
NOTE: The data set WORK.CONVERTED has 50000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 7.83 seconds
user cpu time 5.92 seconds
system cpu time 1.88 seconds
memory 14642.84k
OS Memory 31036.00k
Timestamp 12/10/2019 10:36:18 AM
Step Count 52 Switch Count 0
513 %charToNum(in = testdata, method = INFORMAT, obs = &obs);
NOTE: There were 50000000 observations read from the data set WORK.TESTDATA.
NOTE: The data set WORK.CONVERTED has 50000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 7.58 seconds
user cpu time 5.36 seconds
system cpu time 2.15 seconds
memory 14646.18k
OS Memory 31036.00k
Timestamp 12/10/2019 10:36:26 AM
Step Count 53 Switch Count 0
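If you want to keep only the numeric characters, use the code below.
Using compress this way, only the listed characters in the string are kept.
The first parameter is the name of the variable. The second is optional; in this case it lists the characters to be kept. The third is "k", which means keep. The sign and the decimal point are included in the list so that the value itself is not changed, and the informat width is wide enough to read the whole string.
data want;
set have;
numeric_var = input(compress(string_var,"0123456789+-.","k"), best32.);
run;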