I have an INVALUE informat to populate VISITNUM for records where its value is missing, using AVISITN, which is populated for all records, as a reference. Both are numeric variables. So if AVISITN = 10, I want the missing VISITNUM to be 1, and so on.
proc format;
invalue dummy_visnum
10 = 1
20 = 2
30 = 4
40 = 5
50 = 6
60 = 7
70 = 8
80 = 9
100= 10;
quit;
data have;
infile datalines dlm = '|' dsd; /* DSD keeps the empty leading field as a missing VISITNUM */
input VISITNUM :8. AVISITN :8.;
datalines;
1|10|
2|20|
4|30|
5|40|
6|50|
7|60|
8|70|
1|10|
2|20|
4|30|
5|40|
6|50|
|60|
|70|
|80|
1|10|
2|20|
4|30|
5|40|
|50|
|60|
|70|
|80|
1|10|
2|20|
|30|
|40|
|50|
|60|
|70|
|80|
;
RUN;
data want;
infile datalines dlm = '|';
input VISITNUM :8. AVISITN :8.;
datalines;
1|10|
2|20|
4|30|
5|40|
6|50|
7|60|
8|70|
1|10|
2|20|
4|30|
5|40|
6|50|
7|60|
8|70|
9|80|
1|10|
2|20|
4|30|
5|40|
6|50|
7|60|
8|70|
9|80|
1|10|
2|20|
4|30|
5|40|
6|50|
7|60|
8|70|
9|80|
;
RUN;
However, when I run the following code it works as intended, but I get a warning in my log: "Numeric values have been converted to character values at the places given by: (Line):(Column)."
visitnum = input(avisitn, dummy_visnum.);
"The VALUE statement in PROC FORMAT is used to define a FORMAT. The INVALUE statement is used to define an INFORMAT. In SAS you use a FORMAT to convert values into text and an INFORMAT to convert text into values." With an INFORMAT, you are telling SAS that AVISITN is text, but it is actually a number. Hence, SAS converts AVISITN into text. So I know this approach is not clean, because AVISITN is converted to text, which causes the warning in the log. I could try something like this as an alternative:
if missing(visitnum) then visitnum = avisitn / 10;
or
visitnum = input(cats(mod(avisitn,10)),dummy_visnum.);
However, the data is slightly irregular: there is no simple one-to-one conversion, as VISITNUM = 3 is AVISITN = 40, and the same goes for VISITNUM 4 to 9. It is only for 1, 2 and 10 that this would work. Does anyone have any alternative suggestions? I have seen a hash object used in a DATA _NULL_ step with find(), but I am not sure this would work here.
... and you can use an INFORMAT to convert text into numbers.
So create a FORMAT that converts your numbers into strings that LOOK LIKE the numbers you want. You can then use a call to PUT() to convert the numbers into strings and pass the result to INPUT() to convert those strings into numbers.
proc format ;
value num2char
10 ='1'
20 ='2'
30 ='4'
40 ='5'
50 ='6'
60 ='7'
70 ='8'
80 ='9'
;
run;
data want;
set have;
/* fill VISITNUM from AVISITN only where it is missing */
if missing(visitnum) then visitnum = input(put(avisitn,num2char.),32.);
run;
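As a variation on the same idea, the INVALUE informat from the question can be kept if the number is turned into text explicitly before it reaches INPUT(). The sketch below assumes a small test dataset and uses a hypothetical informat name (avisit2vis) with quoted input strings. CATS() converts its numeric argument to text itself, so the automatic-conversion message should not appear in the log.

```sas
proc format;
  invalue avisit2vis            /* hypothetical informat name */
    '10' = 1
    '20' = 2
    '30' = 4
    '40' = 5
    '50' = 6
    '60' = 7
    '70' = 8
    '80' = 9
    other = .;                  /* anything unmapped stays missing */
run;

data have;                      /* small assumed test dataset */
  infile datalines dlm='|' dsd truncover;
  input visitnum avisitn;
  datalines;
1|10
|60
|80
;
run;

data want;
  set have;
  /* CATS makes the number-to-text step explicit, so INPUT gets a character argument */
  if missing(visitnum) then visitnum = input(cats(avisitn), avisit2vis.);
run;
```

With this data, the two records with a missing VISITNUM are filled from the lookup, and the first record is left unchanged.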
I want to create a variable that resolves to the character before a specified character (*) in a string. However, if this specified character appears several times in a string (as in the example below), how can I retrieve one variable that concatenates all the preceding characters, separated by commas?
Example:
data have;
infile datalines delimiter=",";
input string :$20.;
datalines;
ABC*EDE*,
EFCC*W*d*
;
run;
Code:
data want;
set have;
cnt = count(string, "*");
_startpos = 0;
do i=0 to cnt until(_startpos=0);
before = catx(",",substr(string, find(string, "*", _startpos+1)-1,1));
end;
drop i _startpos;
run;
That code outputs before=C for both the first and the second observation. However, I want it to be before=C,E for the first one and before=C,W,d for the second observation.
You can use a Perl regular expression replacement pattern to transform the original string.
Example:
data have;
infile datalines delimiter=",";
input string :$20.;
datalines;
ABC*EDE*,
EFCC*W*d*
;
data want;
set have;
csl = prxchange('s/([^*]*?)([^*])\*/$2,/',-1,string); /* comma separated letters */
csl = prxchange('s/, *$//',1,csl); /* remove trailing comma */
run;
Make sure to increment _STARTPOS so your loop will finish. You can use CATX() to add the commas. Simplify selecting the character by using CHAR() instead of SUBSTR(). Also make sure to TELL the data step how to define the new variable instead of forcing it to guess. I also include a test to handle the situation where * is in the first position.
data have;
input string $20.;
datalines;
ABC*EDE*
EFCC*W*d*
*XXXX*
asdf
;
data want;
set have;
length before $20 ;
_startpos = 0;
do cnt=0 to length(string) until(_startpos=0);
_startpos = find(string,'*',_startpos+1);
if _startpos>1 then before = catx(',',before,char(string,_startpos-1));
end;
cnt=cnt-(string=:'*');
drop _startpos;
run;
Results:
Obs string before cnt
1 ABC*EDE* C,E 2
2 EFCC*W*d* C,W,d 3
3 *XXXX* X 1
4 asdf 0
CALL SCAN is also a good choice for getting the position of each *.
data have;
infile datalines delimiter=",";
input string :$20.;
datalines;
ABC*EDE*,
EFCC*W*d*
****
asdf
;
data want;
length before $20.;
set have;
do i = 1 to count(string,'*');
call scan(string,i,pos,len,'*');
before = catx(',',before,substrn(string,pos+len-1,1));
end;
put _n_ = +7 before=;
run;
Result:
_N_=1 before=C,E
_N_=2 before=C,W,d
_N_=3 before=
_N_=4 before=
I am trying to format a particular field as follows:
If the value is 60.00, it has to be displayed as 60.
If the value is 14.32, it has to be displayed as 1432.
If it is 0.00, the output should be 0.
That is, no decimal point should be displayed.
Below are the dataset and the option I have tried.
data input_dataset;
input srno $ bill_amt $10.;
datalines;
1 60.00
2 0.00
3 14.32
;
run;
data test;
set input_dataset;
format mc062 $10.;
mc062 = put((bill_amt *100 ),10.);
run;
expected Results are:
mc062:
60
0
1432
How about something like:
data input_dataset;
input srno $ bill_amt ;
datalines;
1 60.00
2 0.00
3 14.32
;
run;
data output_dataset;
set input_dataset;
if (bill_amt NE int(bill_amt))then bill_amt = bill_amt * 100;
run;
A custom PICTURE format will scale a data value prior to rendering the digits according to the picture template.
This example has a template for values up to 12 digits after scaling.
proc format;
picture nodecimals
0 = '9'
other='000000000000' /* 12 digit selectors */
(mult=100) /* value scaling multiplier */
;
run;
data have;
input srno $ bill_amt ; * get some NUMERIC billing amounts;
format bill_amt 10.2; * format to be used when showing value;
* copy amount into bill_raw (no format) just for learning about
* the effect of a format (or not) in output produced for viewing;
bill_raw = bill_amt;
* copy amount in mc062 variable that will custom PICTURE format applied
* when output (for viewing) is produced;
mc062 = bill_amt;
format mc062 nodecimals.;
datalines;
1 60.00
2 0.00
3 14.32
4 123.456
;
run;
* produce some output for viewing;
proc print data=have;
run;
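Note that MULT=100 scales every value, so with the picture above 60.00 would render as 6000 rather than 60. If the exact output listed in the question (60, 0, 1432) is wanted as a character variable, one possible sketch is to render the number with BEST12. and strip the decimal point and blanks with COMPRESS(). This is only an illustration under the assumption that bill_amt is read as numeric. Trailing zeros are lost along the way, so 14.30 would come out as 143.

```sas
data have;
  input srno $ bill_amt;        /* bill_amt read as numeric (assumption) */
  datalines;
1 60.00
2 0.00
3 14.32
;
run;

data want;
  set have;
  length mc062 $10;
  /* BEST12. gives '60', '0', '14.32'; COMPRESS removes the '.' and the padding blanks */
  mc062 = compress(put(bill_amt, best12.), '. ');
run;

proc print data=want;
run;
```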
I have to pass a date in the WHERE clause of a PROC SQL query.
The date looks like "SAT Mar 17 01:29:17 IST 2018" (a string column of length 28).
I have tried input(Date,datetime18.) and some other date functions, but all of them give me errors. Below is my query:
proc sql;
Select input(Date,datetime18.) from table;
quit;
How can I convert this date into a simple date like "17-03-2018", so that I can use it in the PROC SQL query?
A date is numeric, so you should not compare it with a string value. Convert your value to a date as well and compare it with a date literal. Greater-than and less-than comparisons on strings generally serve no purpose here and can lead to erroneous results; they make sense when you compare numeric values.
data have;
b = "SAT Mar 17 01:29:17 IST 2018";
output;
b= "SAT Mar 19 01:29:17 IST 2018";
output;
b= "SAT Jun 20 01:29:17 IST 2018";
output;
b= "SAT Mar 25 01:29:17 IST 2018";
output;
run;
proc sql;
select * from have
where input(cats(scan(b,3),scan(b,2), scan(b, -1)),date9.) > "19Mar2018"d;
quit;
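If the converted value should also be shown as 17-03-2018, as asked in the question, one possible sketch is to attach a display format to it in the SELECT list. The column alias event_date is a hypothetical name. DDMMYYD10. is the DDMMYY format variant that uses a dash as separator, and CALCULATED lets the WHERE clause reuse the computed column.

```sas
data have;
  length b $28;
  b = "SAT Mar 17 01:29:17 IST 2018"; output;
  b = "SAT Jun 20 01:29:17 IST 2018"; output;
run;

proc sql;
  select b,
         /* rebuild ddMONyyyy from the string parts and read it as a SAS date */
         input(cats(scan(b,3), scan(b,2), scan(b,-1)), date9.)
           as event_date format=ddmmyyd10.
    from have
    where calculated event_date > "19Mar2018"d;
quit;
```

The stored value stays a numeric SAS date; only its display changes.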
The ANYDTDTM informat can be a go-to informat in many cases; however, as you point out (in comments), it does not work for the datetime string presented in the question.
The string can be rearranged into a datetime representation that SAS can read, using cats and scan:
data have;
date_string = "SAT Mar 17 01:29:17 IST 2018";
run;
data want;
set have;
dt_wrong = input(date_string, anydtdtm.);
dt_right = input ( cats
( scan(date_string,3),
scan(date_string,2),
scan(date_string,6),':',
scan(date_string,4)
), datetime20.);
put date_string= /;
put dt_right= datetime20. " from input of cats of string parts";
put dt_wrong= datetime20. " from anydtdtm ";
run;
* sample macro that can be applied in data step or sql;
%macro dts_to_datetime(dts);
input ( cats
( scan( &dts , 3),
scan( &dts , 2),
scan( &dts , 6), ':',
scan( &dts , 4)
)
, datetime20.)
%mend;
data want2;
set have;
dt_righton = %dts_to_datetime(date_string);
format dt_righton datetime20.;
put dt_righton=;
run;
The macro can also be used in where statements such as
where '18-Mar-2018:0:0'DT <= %dts_to_datetime (date_string)
Or SQL
, %dts_to_datetime (date_string) as event_dtm format=datetime20.
I used SUBSTR. I created a new column NEW_DATE in the dataset. As mentioned above, the date format is Sat Mar 17 01:01:01 IST 2018. Below is the data step that extracts the date as DD-MMM-YYYY:
data new;
set old;
new_date = substr(date,9,2)||"-"||substr(date,5,3)||"-"||substr(date,25,4);
new_date_updated = input(new_date,date11.);
format new_date_updated date11.;
run;
Then I use the new column in PROC SQL:
proc sql;
select * from new where new_date_updated>'17-Mar-2018'd;
quit;
and it worked for me.
Thanks
Checked above approach with all scenarios, working fine with all
Most of my data is read in in a fixed width format, such as fixedwidth.txt:
00012000ABC
0044500DEFG
345340000HI
00234000JKL
06453MNOPQR
Where the first 5 characters are colA and the next six are colB. The code to read this in looks something like:
infile "&path.fixedwidth.txt" lrecl = 397 missover;
input colA $5.
colB $6.
;
label colA = 'column A '
colB = 'column B '
;
run;
However some of my data is coming from elsewhere and is formatted as a csv without the leading zeroes, i.e. example.csv:
colA,colB
12,ABC
445,DEFG
34534,HI
234,JKL
6453,MNOPQR
As the csv data is being added to the existing data read in from the fixed width file, I want to match the formatting exactly.
The code I've got so far for reading in example.csv is:
data work.example;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile "&path./example.csv" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat colA $5.;
informat colB $6.;
format colA z5.; *;
format colB z6.; *;
input
colA $
colB $
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
But the formats z5. & z6. only work on columns formatted as numeric so this isn't working and gives this output:
ColA colB
12 ABC
445 DEFG
34534 HI
234 JKL
6453 MNOPQR
When I want:
ColA colB
00012 000ABC
00445 00DEFG
34534 0000HI
00234 000JKL
06453 MNOPQR
With both columns formatted as characters.
Ideally I'd like to find a way to get the output I need using only formats & informats to keep the code easy to follow (I have a lot of columns to keep track of!).
Grateful for any suggestions!
You can use cats to force the csv columns to character, without knowing what types the csv import determined they were. Right justify the resultant to the expected or needed variable length and translate the filled in spaces to zeroes.
For example
data have;
length a 8 b $7; * dang csv data, someone entered 7 chars for colB;
a = 12; b = "MNQ"; output;
a = 123456; b = "ABCDEFG"; output;
run;
data want;
set have (rename=(a=csvA b=csvB));
length a $5 b $6;
* may transfer, truncate or convert, based on length and type of csv variables;
* substr used to prevent blank results when cats (number) is too long;
* instead, the number will be truncated;
a = substr(cats(csvA),1);
b = substr(cats(csvB),1);
a = translate(right(a),'0',' ');
b = translate(right(b),'0',' ');
run;
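Another option is SUBSTR on the left side of the assignment: preload each variable with zeroes, then overwrite its rightmost characters with the value that was read.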
data test;
infile cards firstobs=2 dsd;
length cola $5 colb $6;
cola = '00000';
colb = '000000';
input (a b)($);
substr(cola,vlength(cola)-length(a)+1)=a;
substr(colb,vlength(colb)-length(b)+1)=b;
cards;
colA,colB
12,ABC
445,DEFG
34534,HI
234,JKL
6453,MNOPQR
;;;;
run;
proc print;
run;
I have to retrieve the X-Axis and Y-Axis pos from ADDITIONAL_DETAILS field which is more than 300 bytes in length.
Somewhere in this string, I get the location details in a form such as RETLOCID=2312.4892.
I am trying to use PERL REGEX in SAS.
Problem: I am able to get the starting position into postn1 from call prxsubstr(MATCH_PATTERN1, ADDITIONAL_DETAILS, postn1,length1); but the length is always returned as 8 even though it is more than that.
TRANSACTION_ID = substrn(ADDITIONAL_DETAILS, postn1, length1); This is not giving me proper value when I am restricting length to 8. Any help is appreciated. Below is the code:
DATA WORK.LOCATION;
INFILE DATALINES;
INPUT ADDITIONAL_DETAILS $50.;
datalines;
afdsf RFTXNID=121.5435 xx
fdsg RFTXNID=7821.5487 xx fdsg
gfdgf
;
RUN;
data WORK.POSITION;
set WORK.POSITION;
if _N_ = 1 then do;
MATCH_PATTERN1 = PRXPARSE("/(RETLOCID=)/");
MATCH_PATTERN2 = PRXPARSE("/([0-9]{1,}\.[0-9]{1,})/");
end;
retain MATCH_PATTERN1 MATCH_PATTERN2;
call prxsubstr(MATCH_PATTERN1, ADDITIONAL_DETAILS, postn1,length1);
call prxsubstr(MATCH_PATTERN2, ADDITIONAL_DETAILS, postn2,length2);
if postn1 > 0 and not missing(ADDITIONAL_DETAILS) then
TRANSACTION_ID = substrn(ADDITIONAL_DETAILS, postn1 + 8, length1);
RUN;
data work.POSITION;
set work.POSITION;
drop MATCH_PATTERN1 postn1 length1;
run;
I need to pull 121.5435 and 7821.5487
Try this:
DATA WORK.LOCATION;
INPUT ADDITIONAL_DETAILS $50.;
string=prxchange('s/[a-z=_]+//i',-1,ADDITIONAL_DETAILS);
datalines;
afdsf RFTXNID=121.5435 xx
fdsg RFTXNID=7821.5487 xx fdsg
DISTR_QUOTE=66.92
gfdgf
;
run;
Or
DATA WORK.LOCATION;
INPUT ADDITIONAL_DETAILS $50.;
length string $20.;
if prxmatch('/\=/',ADDITIONAL_DETAILS)=0 then string='';
else string=prxchange('s/.*(?<=\=)([^a-z]+).*/$1/i',-1,ADDITIONAL_DETAILS);
datalines;
afdsf RFTXNID=121.5435 xx
fdsg RFTXNID=7821.5487 xx fdsg
gfdgf
DISTR_QUOTE=66.92
;
proc print;
run;
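Another possible approach, sketched here under assumptions, is to put the number in a capture group and read it back with PRXPOSN(), so no position or length arithmetic is needed. The key name RFTXNID is taken from the sample data lines (the question text also mentions RETLOCID), and the variable names rx, value_txt and value_num are hypothetical.

```sas
data work.location;
  input additional_details $50.;
  length value_txt $20;
  retain rx;
  /* compile the pattern once; group 1 captures digits, a dot, digits */
  if _n_ = 1 then rx = prxparse('/RFTXNID=(\d+\.\d+)/');
  if prxmatch(rx, additional_details) then do;
    value_txt = prxposn(rx, 1, additional_details);  /* text of capture group 1 */
    value_num = input(value_txt, 32.);               /* numeric version of the same value */
  end;
  drop rx;
  datalines;
afdsf RFTXNID=121.5435 xx
fdsg RFTXNID=7821.5487 xx fdsg
gfdgf
;
run;

proc print data=work.location;
run;
```

Rows without a match keep value_txt blank and value_num missing.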