SAS, PROC FORMAT change string to numeric - sas

I want to create a format on a string variable (Monday, Tuesday, Wednesd, Thrusda, Friday) so that the values display as 1 to 5 and I can sort the data. I tried something like:
proc format;
value days
'Monday'=1
'Tuesday'=2
'Wednesd'=3
'Thrusda'=4
'Friday'=5
run;
In the log file, errors like these appear:
ERROR: The quoted string 'Monday' is not acceptable to a numeric format or informat.
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a format name.
ERROR 200-322: The symbol is not recognized and will be ignored.
Additional INFO
After creating the format, I will apply this in the plot, something like below:
PROC GLM data=Newspaper;
class Day Newspaper;
model ad_effect = Day|Newspaper;
format Day days.;
title 'Analyze the effects of Day & Newspaper';
title2 'Including Interaction';
run;
quit;
title;
Using the format, the markers in the scatter plot can be shown in order from Monday to Friday. Otherwise, the markers are shown in alphabetical order.
Please share your idea.

You can use an INFORMAT to create a new variable by reading the day name as a number. For example:
proc format;
invalue days
'Monday'=1
'Tuesday'=2
'Wednesd'=3
'Thrusda'=4
'Friday'=5;
run;
data days;
input day:days.;
cards;
Monday
Tuesday
Wednesd
;;;;
run;
proc print;
run;
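To connect this back to the plot in the question, here is a minimal sketch, assuming the Newspaper data set holds Day as a character variable with exactly the five spellings listed. The names dayin, dayout, DayNum and Newspaper2 are made up for illustration. It reads the day name into a number, then attaches an ordinary numeric format so the output still shows the day names. ORDER=INTERNAL is included because PROC GLM otherwise orders class levels by their formatted values, which would bring back the alphabetical order.

```sas
proc format;
  /* informat: day name -> number (hypothetical name dayin) */
  invalue dayin
    'Monday'=1 'Tuesday'=2 'Wednesd'=3 'Thrusda'=4 'Friday'=5;
  /* format: number -> day name for display (hypothetical name dayout) */
  value dayout
    1='Monday' 2='Tuesday' 3='Wednesd' 4='Thrusda' 5='Friday';
run;

data Newspaper2;
  set Newspaper;
  DayNum = input(Day, dayin.);   /* numeric copy of Day */
  format DayNum dayout.;         /* display it as the day name */
run;

/* order=internal sorts class levels by the stored numbers 1 to 5 */
proc glm data=Newspaper2 order=internal;
  class DayNum Newspaper;
  model ad_effect = DayNum|Newspaper;
run;
quit;
```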

Related

Macro variable (date) not working as expected in query

I've several SAS (PROC SQL) queries using a MIN(startdate) and MAX(enddate).
To avoid having to calculate these every time I want to do this once at the beginning and store it in a macro variable but I get an error every time.
What is going wrong or how to achieve this ?
Thanks in advance for the help !
This works:
WHERE DATE BETWEEN
(SELECT MIN(startdate) format yymmddn8. FROM work.mydata)
AND (SELECT MAX(enddate) format yymmddn8. FROM work.mydata)
The DATE column's format is YYMMDDN8. and its length is 8.
Creating macro variables:
PROC SQL;
SELECT MIN(startdate), MAX(enddate)
INTO :start_date, :end_date
FROM work.mydata;
QUIT;
/*Formatting the macro variable:*/
%macro format(value,format);
%if %datatyp(&value)=CHAR
%THEN %SYSFUNC(PUTC(&value, &format));
%ELSE %LEFT(%QSYSFUNC(PUTN(&value,&format)));
%MEND format;
Tried:
WHERE DATE BETWEEN "%format(&start_date, yymmddn8.)" AND "%format(&end_date, yymmddn8.)"
Error message:
ERROR: Expression using equals (=) has components that are of different data types
First, you are missing the d suffix that turns a quoted string into a date literal for the BETWEEN operator.
WHERE DATE BETWEEN "%format(&start_date, yymmddn8.)"d AND "%format(&end_date, yymmddn8.)"d
But keep in mind that the string inside a date literal must be in date9. style, not yymmddn8.
"4NOV2022"d
Second, you don't need to format the date for this WHERE condition. A date is numeric, and the plain numeric value works fine.
WHERE DATE BETWEEN &start_date AND &end_date
If you really want to have the date formatted, you can format it directly inside PROC SQL:
PROC SQL;
SELECT
MIN(startdate) format=date9.,
MAX(enddate) format=date9.
INTO
:start_date,
:end_date
FROM
work.mydata;
QUIT;
and then
WHERE DATE BETWEEN "&start_date"d AND "&end_date"d
Note that in a PROC SQL query the format attached to a variable does not carry over to the result of aggregate functions, like MIN() and MAX(), performed on the variable. For numeric variables PROC SQL will use the BEST8. format when converting the number into a string to store into the macro variable. You can remove the extra spaces this causes by adding the TRIMMED keyword.
proc sql noprint;
select min(startdate), max(enddate)
into :start_date trimmed
, :end_date trimmed
from work.mydata
;
quit;
Do not add quotes around the values generated by expanding the macro variables. That would generate a string literal and not a numeric literal.
where date between &start_date and &end_date
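To see what actually lands in the macro variables, a small check along these lines may help. It is only a sketch: the one-row work.mydata and the macro variable names padded and trimmed_val are invented for the demonstration. The brackets in the %PUT make any leading blanks visible.

```sas
/* hypothetical one-row table standing in for work.mydata */
data work.mydata;
  startdate = '01JAN2022'd;
  enddate   = '04NOV2022'd;
  format startdate enddate yymmddn8.;
run;

proc sql noprint;
  /* same value stored twice: once as-is, once with TRIMMED */
  select min(startdate), min(startdate)
    into :padded, :trimmed_val trimmed
    from work.mydata;
quit;

/* both should hold the raw day number, not a yymmddn8. string; */
/* only the first is expected to carry padding blanks           */
%put NOTE: padded=[&padded] trimmed=[&trimmed_val];
```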
If you want the values put into the macro variables by the into syntax to be formatted in some other way you need to attach the format as part of the query.
For example if you wanted the value to be something that could be used to generate a date literal, that is a string that the DATE informat understands, then use the DATE format. Make sure the width used is long enough to include all four digits of the year.
proc sql noprint;
select min(startdate) format=date9.
, max(enddate) format=date9.
into :start_date trimmed
, :end_date trimmed
from work.mydata
;
quit;
...
where date between "&start_date"d and "&end_date"d

Converting type from character to numerical in SAS

First I had to format all of the categories to numbers and I used the code below to do that and was successful. I need to convert the type from character to numerical so that I can run analysis. I have tried the input function but that has not worked either. Any help would be greatly appreciated.
Proc Format;
Value $gender_num 'Male'=0 'Female'=1;
Value $att 'Yes'=0 'No'=1;
Value $bustrav 'Non-Travel'=1 'Travel_Frequently'=2 'Travel_Rarely'=3;
Value $dpt 'Research & Development'=1 'Human Resources'=2 'Sales'=3;
Value $edfd 'Life Sciences'=1 'Human Resources'=2 'Marketing'=3 'Medical'=4 'Technical Degree'=5 'Other'=6;
Value $ot 'Yes'=0 'No'=1;
Value $ms 'Divorced'=1 'Married'=2 'Single'=3;
Value $jr 'Healthcare Representative'=1 'Human Resources'=2 'Laboratory Technician'=3 'Manager'=4
'Manufacturing Director'=5 'Research Director'=6 'Research Scientist'=7 'Sales Executive'=8
'Sales Representative'=9;
Run;
Proc Print data=work.empatt;
format gender $gender_num.;
format attrition $att.;
format businesstravel $bustrav.;
format department $dpt.;
format educationfield $edfd.;
format overtime $ot.;
format maritalstatus $ms.;
format jobrole $jr.;
Run;
You're mixing Formats with Informats.
Format: "How do I display a number on the screen for you?"
Informat: "How do I convert text to a number?"
Your code in the first step above should be invalues. Then you use input to translate. You also need to assign that to a new variable - you can't just associate the informat with the variable and magically get a numeric.
proc format;
invalue sexi
'Male'=1
'Female'=2
;
quit;
data want;
set have;
sex_n = input(sex,sexi.);
run;
You can, if you want, keep the same variable name; I'll show that in the next step, also adding a format so the value "looks" right.
proc format;
invalue sexi
'Male'=1
'Female'=2
;
value sexf
1 = 'Male'
2 = 'Female'
;
quit;
data want;
set have;
sex_n = input(sex,sexi.);
format sex_n sexf.;
drop sex;
rename sex_n = sex;
run;
You drop the original one, then rename the new one to the original name. I use the _n suffix to make it clear what I'm doing, but it's not required; nor are the 'i' and 'f' suffixes in the format/informat (and in fact you could use the identical name if you wanted to), again just a pattern I use to make it easier to distinguish.
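Applied to the data in the question, the same pattern might look like the sketch below. It assumes work.empatt has character variables gender and attrition spelled exactly as in the question's formats. The informat names genderi and atti, the new variable names, and the output data set empatt_num are hypothetical. OTHER=. sends any unexpected text to missing, so unmapped values are easy to spot.

```sas
proc format;
  /* informats: text -> number, using the codes from the question */
  invalue genderi 'Male'=0 'Female'=1 other=.;
  invalue atti    'Yes'=0  'No'=1     other=.;
run;

data work.empatt_num;
  set work.empatt;
  gender_n    = input(gender, genderi.);    /* numeric 0/1 */
  attrition_n = input(attrition, atti.);    /* numeric 0/1 */
run;

/* quick check that the new variables are numeric */
proc contents data=work.empatt_num;
run;
```

The remaining variables (businesstravel, department, and so on) would follow the same pattern with one INVALUE each.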

Select a date and format output

Below is my code of 5 lines. When I run the first 3 lines I get a date output of 21042 and would like it displayed/formatted as 8/11/2017. I am having trouble with the format part (line 4) and need help with it. My code is:
PROC SQL;
select max (Load_DT) as max_date
from in.db_tb
Format max_date yymmdd10.;
quit;
You need to put the format in the select part of the query, attached to the column, rather than in a separate FORMAT statement.
data db_tb;
load_dt = today();
run;
PROC SQL;
select max (Load_DT) as max_date format yymmdd10.
from db_tb ;
quit;
Note your stated preference (8/11/2017) does not match the format you use in your code (2017-08-11). MMDDYY10. is the format that you'd want for that.
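A minimal sketch of the MMDDYY10. version, using a one-row stand-in table built from the value 21042 given in the question (the table itself is made up for the demonstration):

```sas
/* stand-in for in.db_tb, holding the date value from the question */
data db_tb;
  load_dt = 21042;
run;

proc sql;
  /* the format is attached to the column in the SELECT clause */
  select max(load_dt) as max_date format=mmddyy10.
  from db_tb;
quit;
```

This should display 08/11/2017. Note that MMDDYY10. pads the month and day with leading zeros.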

SAS: PROC MEANS Grouping in Class Variable

I have the following sample data and 'proc means' command.
data have;
input measure country $;
datalines;
250 UK
800 Ireland
500 Finland
250 Slovakia
3888 Slovenia
34 Portugal
44 Netherlands
4666 Austria
run;
PROC PRINT data=have; RUN;
The following PROC MEANS command prints out a listing for each country above. How can I group some of those countries (i.e. UK & Ireland as one group, Slovakia/Slovenia as Central Europe) in the PROC MEANS step, rather than adding another data step with a 'case when' etc.?
proc means data=have sum maxdec=2 order=freq STACKODS;
var measure;
class country;
run;
Thanks for any help at all on this. I understand there are various things you can do in the PROC MEANS command itself, like limiting the countries with a WHERE= data set option:
proc means data=have(WHERE=(country not in ('Finland', 'UK')));
I'd like to do the grouping in the PROC MEANS command for brevity.
Thanks.
This is very easy with a format for any PROC that takes a CLASS statement.
Simply build a format, either with code or from data; then apply the format with a FORMAT statement in the PROC MEANS step.
proc format lib=work;
value $countrygroup
"UK"="British Isles"
"Ireland"="British Isles"
"Slovakia","Slovenia"="Central Europe"
;
quit;
proc means data=have;
class country;
var measure;
format country $countrygroup.;
run;
It's usually better to have numeric codes for country and then format those to be whichever set of names is needed at any one time, particularly as capitalization/etc. is pretty irritating, but this works well enough even here.
The CNTLIN= option in PROC FORMAT allows you to make a format from a dataset, with FMTNAME as the format name, START as the value to be labelled, and LABEL as the label. (END is the end of the range if numeric.) There are other options as well; the documentation goes into more detail.
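As a rough sketch of the CNTLIN= route, the control data set below rebuilds the same $countrygroup format from data. The data set name countrygroup_ctl is hypothetical. TYPE='C' marks it as a character format, and the & modifier lets the label contain single spaces, so each label must be separated from the start value by at least two spaces.

```sas
data countrygroup_ctl;
  length fmtname $32 start $20 label $20;
  retain fmtname 'countrygroup' type 'C';  /* character format $countrygroup */
  input start $ label $ &;                 /* & reads label up to two blanks or end of line */
  datalines;
UK  British Isles
Ireland  British Isles
Slovakia  Central Europe
Slovenia  Central Europe
;
run;

/* build the format from the control data set */
proc format lib=work cntlin=countrygroup_ctl;
run;

proc means data=have sum maxdec=2;
  class country;
  var measure;
  format country $countrygroup.;
run;
```

Countries that do not appear in the control data set keep their own names as separate class levels.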

SAS Dates - SEMIYEAR in Proc SQL

The following works fine for me, as I want to select the YYYYQ value for a column to show the year and quarter:
proc sql;
select YYQ(year(datepart(betdate))
, QTR(datepart(betdate))) FORMAT=YYQN. as yearquarter
, QTR(datepart(betdate)) as semiyear
from &dsn;
quit;
How can I calculate the 'SEMIYEAR' instead of QTR? I can find references to it in the SAS documentation, but can't seem to get it to work. I want to show YYYYS, i.e. the year and the year 'half'.
Thanks.
There's not exactly a format or function for that, unfortunately. Do you need the date part of the value to persist, or just the value "20131"? (In your YYQ, for example, the underlying value is an actual date, which corresponds to the first date in the period of 2013Q1, so Jan 1; it's just displayed as 20131).
If you just want to display the value, you can do something pretty simple, like this:
proc sql;
select YYQ(year(datepart(betdate))
, QTR(datepart(betdate))) FORMAT=YYQN. as yearquarter
, ceil(QTR(datepart(betdate))/2) as semiyear
from test;
quit;
And append on the year if you want. However that does not maintain the actual first-day-of-the-period value. If you want to do that, you should use INTNX:
proc sql;
select YYQ(year(datepart(betdate))
, QTR(datepart(betdate))) FORMAT=YYQN. as yearquarter
, intnx('SEMIYEAR',datepart(betdate),0,'b') FORMAT=DATE9. as semiyear
from test;
quit;
That doesn't format it neatly, of course, so you would have to write your own format, unless I'm missing one that exists already; that's pretty easy though.
proc format;
value SEMIYEAR
'01JAN2013'd-'30JUN2013'd = '20131'
'01JUL2013'd-'31DEC2013'd = '20132'
;
quit;
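If a plain number is enough, the YYYYS value can also be computed directly instead of listing every range. This is a small sketch with made-up test dates: ceil(qtr/2) maps quarters 1 and 2 to half 1 and quarters 3 and 4 to half 2, and the year is prepended by multiplying by 10.

```sas
data _null_;
  /* one hypothetical date per quarter of 2013 */
  do d = '15JAN2013'd, '15APR2013'd, '15JUL2013'd, '15OCT2013'd;
    semi  = ceil(qtr(d)/2);      /* 1 for Q1-Q2, 2 for Q3-Q4 */
    yyyys = year(d)*10 + semi;   /* e.g. 20131 or 20132 */
    put d= date9. semi= yyyys=;
  end;
run;
```

The log should show 20131 for the first two dates and 20132 for the last two.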
Sadly you can't use picture formats to do this as far as I know - the documentation at least doesn't offer an option to display semiannual period. You can either do like I did above and just explicitly specify the time periods in the range, or you can write a function format; see http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n1eyyzaux0ze5ln1k03gl338avbj.htm for examples of how to do that.
Edit: Here's an example that mostly works.
proc fcmp outlib=work.functions.smd;
function sfmt(date) $;
length snum $5;
snum = put(year(date)*10+ceil(QTR(date)/2),5.);
return(snum);
endsub;
run;
options cmplib=(work.functions);
proc format;
value semiyear
other=[sfmt()];
quit;
data test2;
set test;
x=put(datepart(betdate),semiyear.);
put x=;
run;
proc sql;
select YYQ(year(datepart(betdate))
, QTR(datepart(betdate))) FORMAT=YYQN. as yearquarter
, intnx('SEMIYEAR',datepart(betdate),0,'b') FORMAT=SEMIYEAR5. as semiyear
from test;
quit;
However, for some odd reason, in my session at least, the PROC SQL step returns goofy characters instead of 20131. The data step returns the correct answer in the log. I'm not sure whether this is a bug or whether I'm doing something very slightly wrong.