Converting a 16-digit account number from character to numeric in SAS

Hi all, I am trying to convert a sixteen-digit account number, which is stored as character, to numeric in SAS.
Account_char = 123456789982635
So I used input(account_char, 16.)
The converted numeric value is displayed as 3.743554E14.
Can someone help me see where I am going wrong?
I need the sixteen-digit account number as a numeric value.

Assuming that none of your values is so large that it cannot be stored exactly as a number in SAS, just use the INPUT() function to convert from string to number and the PUT() function for the reverse. Attach a format other than the default BEST12. to the numeric variable so that all of the digits are displayed.
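To check where that size limit sits, here is a minimal sketch using CONSTANT('EXACTINT'), which returns the bound up to which every integer is stored exactly in an 8-byte numeric. The 16-digit test value is made up for illustration and is deliberately above that bound.

```sas
data _null_;
  /* bound up to which every integer is represented exactly */
  limit = constant('exactint');
  put limit= comma25.;

  /* hypothetical 16-digit value above the bound: may not survive a round trip */
  account_char = '9999999999999999';
  account_num  = input(account_char, 16.);
  back = put(account_num, 16.);
  same = (back = account_char);   /* 1 if the digits came back unchanged */
  put account_char= back= same=;
run;
```

If any real account numbers exceed the bound, keeping them as character is probably the safer choice.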
SAS Code Examples:
data table_w_num;
set table_w_char;
account_num = input(account_char,16.);
format account_num 16.;
run;
data table_w_char;
set table_w_num;
account_char = left(put(account_num,16.));
run;
SQL examples using macro variables:
select account_char into :mvar1 from table_w_char;
select * from table_w_num where account_num = &mvar1;
select quote(strip(put(account_num,16.))) into :mvar2 from table_w_num;
select * from table_w_char where account_char = &mvar2;
SQL examples using joins
select * from table_w_num n inner join table_w_char c
on n.account_num = input(c.account_char,16.)
;
select * from table_w_num n inner join table_w_char c
on c.account_char = strip(put(n.account_num,16.))
;
If the character field has leading zeros, use the Z16. format instead of the 16. format when converting back to character. The informat does not change.
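A small sketch of the leading-zero case, assuming a made-up account number that starts with two zeros. The same 16. informat reads it, and only the format used in PUT() differs.

```sas
data _null_;
  account_char = '0012345678998263';        /* hypothetical value with leading zeros */
  account_num  = input(account_char, 16.);  /* informat is unchanged */
  plain  = put(account_num, 16.);           /* right-aligned, no leading zeros */
  zeroed = put(account_num, z16.);          /* padded with zeros to 16 digits */
  put plain= zeroed=;
run;
```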

Your posted number has 15 digits, not 16.
Otherwise you are fine; it is just the formatting:
data _null_;
Account_char = "123456789982635";
Account_num = input(Account_char, 15.);
put Account_num= 15.;
run;

Related

SAS Proc Format - Format a character based on a substring

I want to know if it is possible to create a custom format that will work on a range of character values by taking a substring and formatting it based on that.
e.g. A001, B001, C001 are all formatted to 'JAN'
A002, B002, C002 are all formatted to 'FEB'
Is this possible?
value $custom_format 'A001', 'B001', 'C001' = 'JAN'
'A002', 'B002', 'C002' = 'FEB';
etc.
I know you can just list the specific character values you want formatted (as above), but I would like this to work for any letter at the beginning and only read the last 3 characters to determine the format, without having to list all possible combinations.
Use substr to only pull the last 3 values and create a format based on that.
proc format;
value $custom_format
'001' = 'JAN'
'002' = 'FEB'
;
run;
/* substr(var, 2) will start at the second character and read until the end */
data test;
var = 'A001';
month = put(substr(var, 2), $custom_format.);
output;
var = 'B002';
month = put(substr(var, 2), $custom_format.);
output;
run;
Output:
var month
A001 JAN
B002 FEB
If you need something more complex, PROC FORMAT also accepts Perl regular expressions when defining informats (the REGEXP option on the INVALUE statement).
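Note that substr(var, 2) only works when the prefix is exactly one character. If the prefix length can vary, a sketch like the following takes the last 3 non-blank characters instead. The data set name test_suffix and the value 'XY002' are invented for illustration, and every value is assumed to be at least 3 characters long.

```sas
proc format;
  value $custom_format
    '001' = 'JAN'
    '002' = 'FEB'
  ;
run;

data test_suffix;
  length var $8 suffix $3 month $3;
  do var = 'A001', 'XY002';   /* hypothetical values with different prefix lengths */
    /* LENGTH() ignores trailing blanks, so this starts 3 characters before the end */
    suffix = substr(var, length(var) - 2, 3);
    month  = put(suffix, $custom_format.);
    output;
  end;
run;
```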

SAS: Converting numeric to character values

I am trying to convert a datetime20. value from numeric to character.
Currently I have numeric values like this: 01Jan2020:00:00:00, and I need to convert them to character values like: 2020-01-01 00:00:00.0
What format and informat should be used for the above?
I have tried using the PUT function to convert numeric to character with many options, each time getting a different format. Should I also use the DHMS function before PUT?
There is no native format that produces that string exactly. But it is not hard to build it in steps using existing formats. Or you could use the PICTURE statement in PROC FORMAT to build your own format.
If you don't really care about the time-of-day part of the datetime value, then this is an easy and clearly understandable way to convert the numeric variable DT (a number of seconds) into a new character variable in that style. Use DATEPART() to get the date (a number of days) from the datetime value, use the YYMMDD format to generate the 10-character string for the date, and then append a constant string of formatted zeros.
length dt_string $21.;
dt_string = put(datepart(dt),yymmdd10.)||' 00:00:00.0';
If you need the time of day part then you could also use the TOD format.
dt_string = put(datepart(dt),yymmdd10.)||put(dt,tod11.1);
Or you could use the format E8601DT21.1 and then change the letter T between the date and time to a space instead.
dt_string = translate(put(dt,E8601DT21.1),' ','T');
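To compare the three expressions side by side, here is a small self-contained sketch. The sample datetime value and the variable names s1 to s3 are assumptions for illustration; the log shows what each approach produces.

```sas
data _null_;
  dt = '01Jan2020:01:02:03'dt;   /* sample value, assumed for illustration */
  length s1 s2 s3 $21;
  s1 = put(datepart(dt), yymmdd10.) || ' 00:00:00.0';      /* date only, time zeroed */
  s2 = put(datepart(dt), yymmdd10.) || put(dt, tod11.1);   /* date plus time of day */
  s3 = translate(put(dt, e8601dt21.1), ' ', 'T');          /* ISO 8601 with T replaced */
  put s1= / s2= / s3=;
run;
```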
If you want to find out which formats exist for datetime values and what the formatted results look like, you could run a little program that pulls the formats from the metadata and applies them to a specific datetime value.
data datetime_formats;
length format $50 string $80 ;
set sashelp.vformat;
where fmttype='F';
where also fmtinfo(fmtname,'cat')='datetime';
keep format string fmtname maxw minw maxd ;
format=cats(fmtname,maxw,'.','-L');
string=putn('01Jan2020:01:02:03'dt,format);
run;
A custom format can be defined to return the result of a user-defined function.
proc format;
value <format-name> (default=<width>)
other = [<function-name>()]
;
run;
Example:
options cmplib=(sasuser.functions);
proc fcmp outlib=sasuser.functions.temporal;
function E8601DTS (datetime) $21;
return (
translate (putn(datetime,'E8601DT21.1'),' ','T')
);
endsub;
run;
proc format;
value E8601DTS (default=21)
other = [E8601DTS()]
;
run;
data have;
do dt = '01jan2020:0:0'dt to '10jan2020:0:0'dt by '60:00't;
output;
end;
format dt datetime16.;
run;
ods html file='function-based-format.html';
proc print data=have(obs=4); title 'stock E8601DT';
proc print data=have(obs=4); title 'custom E8601DTS';
format dt E8601DTS.;
run;
ods html close;

Converting to date5. format in SAS

I have a column which has mixed month and day values (it is stored as character, $5).
date
7/23
5/23
23MAR
7/19
I want the data to come as uniform date5. format like this
date
23MAR
23MAY
23MAR
19JUL.
Here is the code that I'm using
data DAte_check4again;
set Date_2test;
format check_dt date5.;
check_dt=datepart(date);
run;
SAS stores DATE, TIME and DATETIME values as numbers. The DATEPART() function you are trying to use is for converting DATETIME values to DATE values. But your source variable is character with a length of 5. (FORMATs are just instructions for how to display values).
So your first problem will be to convert the string into a DATE value. You can then take the first 5 characters of the DATE. format and store that into either your original variable or some other variable. Assuming that the month/day values are for the current year and you only have those two styles of strings here is one method to generate a date and also the 5 character string.
data want;
set have ;
if index(date,'/') then date_ck = input(cats(date,'/',year(today())),mmddyy10.);
else date_ck = input(cats(date,year(today())),date9.);
format date_ck date9.;
new_date = substr(put(date_ck,date9.),1,5);
run;
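A self-contained version to try, as a sketch: the input data set have is rebuilt here from the sample values in the question, and the DATE5. format is applied directly in PUT() rather than taking a substring of DATE9. output. The current year is still assumed for every value.

```sas
data have;
  input date $5.;
  datalines;
7/23
5/23
23MAR
7/19
;
run;

data want;
  set have;
  /* month/day strings: append the current year and read as mm/dd/yyyy */
  if index(date, '/') then date_ck = input(cats(date, '/', year(today())), mmddyy10.);
  /* ddMON strings: append the current year and read as ddMONyyyy */
  else date_ck = input(cats(date, year(today())), date9.);
  format date_ck date9.;
  new_date = put(date_ck, date5.);   /* 5-character ddMON string */
run;
```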

How do I do a cluster analysis on table with both character and numeric variables in SAS?

Account_id <- c("00qwerf1", "00uiowe3", "11heooiue", "11heooihe",
"00sdffrg3", "03vthjygjj", "11mpouhhu", "1poihbusw")
Postcode <- c("EN8 7WD", "EN7 9BB", "EN6 8YQ", "EN8 7TT", "EN7 9BC",
"EN6 8YQ", "EN8 7WD", "EN7 7WB")
Age <- c("30", "35", "40", "50", "60", "32", "34", "45")
DF <- data.frame(Account_id, Postcode, Age)
I want to do cluster analysis on my dataframe in SAS. I understand that technically a dataframe is not used in SAS, however I have just used this format for illustration purposes. Account_id and Postcode are both character variables and Age is a numeric variable.
Below is the code that I used after the data step:
Proc fastclus data=DF maxc=8 maxiter=10 seed=5 out=clus;
Run;
The cluster analysis does not work because Account_id and Postcode are character variables. Is there a way to change these variables into numeric variables, or is there a clustering method that works with both character and numeric variables?
Before you can do clustering you need to define a metric that can be used to calculate the distance between observations. By default proc fastclus uses the Euclidean metric. This requires that all input variables are numeric and works best if they are all rescaled to have the same mean and variance, so that they are all equally important when growing clusters.
You could use postcode in a by statement if you wanted to perform a separate cluster analysis for each postcode, but if you want to use postcode itself as a clustering variable you will need to convert it to a numeric form. Replacing postcode with two variables for the latitude and longitude of postcode centroid might be a good option.
It's less obvious what would be a good option for your account ID variable, as this doesn't appear to be a measurement of anything. I would try to get hold of something else like account creation date or last activity date, which can be converted to a numeric value in a more obvious way.
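As a rough sketch of the rescaling step, PROC STDIZE with METHOD=STD can put the numeric inputs on a common mean and variance before clustering. The data set accounts, the variable tenure_days and its values are hypothetical; only the ages come from the question, and three clusters is an arbitrary choice for such a small sample.

```sas
data accounts;   /* hypothetical numeric inputs; tenure_days is invented */
  input age tenure_days;
  datalines;
30 120
35 400
40 950
50 30
60 2200
32 75
34 610
45 1500
;
run;

/* rescale each variable to mean 0 and standard deviation 1 */
proc stdize data=accounts method=std out=accounts_std;
  var age tenure_days;
run;

proc fastclus data=accounts_std maxclusters=3 maxiter=10 out=clus;
  var age tenure_days;
run;
```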
You can determine the unique values of each variable and then assign the ordinal position of the original value as its numeric representation for the purpose of fastclus.
Sample code
Note: The FASTCLUS seed= option is a data set specifier, not a simple number (as is used with random number generators)
* hacky tweak to place your R coded data values in a SAS data set;
data have;
array _Account_id(8) $20 _temporary_ ("00qwerf1", "00uiowe3", "11heooiue" , "11heooihe" ,
"00sdffrg3", "03vthjygjj", "11mpouhhu" , "1poihbusw");
array _postcode(8) $7 _temporary_ ("EN8 7WD", "EN7 9BB", "EN6 8YQ", "EN8 7TT", "EN7 9BC",
"EN6 8YQ", "EN8 7WD", "EN7 7WB");
array _age (8) $3 _temporary_ ("30", "35", "40", "50", "60", "32", "34", "45");
do _n_ = 1 to dim (_account_id);
Account_id = _account_id(_n_);
Postcode = _postcode(_n_);
Age = _age(_n_);
output;
end;
run;
* get lists of distinct values for each variable;
proc means noprint data=have;
class _all_;
ways 1;
output out=have_freq;
run;
* compute the ordinal of each variable's original value;
data have_freq2;
set have_freq;
if not missing(Account_id) then unum_Account_id + 1;
if not missing(Postcode) then unum_Postcode + 1;
if not missing(Age) then unum_Age + 1;
run;
* merge back by original value to obtain ordinal values;
proc sql;
create table have_unumified as
select
Account_id, Postcode, Age
, (select unum_Account_id from have_freq2 where have_freq2.Account_id = have.Account_id) as unum_Account_id
, (select unum_Postcode from have_freq2 where have_freq2.Postcode = have.Postcode) as unum_Postcode
, (select unum_Age from have_freq2 where have_freq2.Age = have.Age) as unum_Age
from have
;
quit;
* fastclus on the ordinal values (seed= not specified);
Proc fastclus data=have_unumified maxc=8 maxiter=10 out=clus_on_unum;
var unum_:;
Run;

Adding a new Column to existing SAS dataset

I have a SAS dataset that I created by reading in a .txt file. It has about 20-25 rows, and I'd like to add a new column that assigns a letter of the alphabet, in order, to each row.
Row 1 A
Row 2 B
Row 3 C
.......
It sounds like a really basic question and one that should have an easy solution, but unfortunately, I'm unable to find this anywhere. I get solutions for adding new calculated columns and so on, but in my case, I just want to add a new column to my existing datatable - there is no other relation between the variables.
This is kind of ugly, and if you have more than 26 rows it will run past 'Z' into whatever characters follow it in the ASCII table. But it does solve the problem as defined by the question.
Test data:
data have;
do row = 1 to 26;
output;
end;
run;
Explanation:
On my computer, the letter 'A' is at position 65 in the ASCII table (YMMV). We can determine this by using this code:
data _null_;
pos = rank('A');
put pos=;
run;
The ASCII table positions the letters of the alphabet sequentially, so if A is at position 65, B will be at 66, and so on.
The byte() function returns a character from the ASCII table at a certain position. We can take advantage of this by using the position of ASCII character A as an offset, subtracting 1, then adding the row number (_n_) to it.
Final Solution:
data want;
set have;
alphabet = byte(rank('A')-1 + _n_);
run;
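If there could be more than 26 rows, one possible variation (not part of the original answer) is to wrap around with MOD() so that row 27 starts again at 'A'. This sketch assumes the letters A-Z are contiguous in the session's character set, as they are in ASCII; the 30-row test data set is made up.

```sas
data have;
  do row = 1 to 30;   /* hypothetical test data with more than 26 rows */
    output;
  end;
run;

data want;
  set have;
  /* mod(_n_ - 1, 26) cycles through 0..25, so the letters repeat after 'Z' */
  alphabet = byte(rank('A') + mod(_n_ - 1, 26));
run;
```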
Not better than Tom's but a brute force alternative essentially. Create the string of Alpha and then use CHAR() to identify character of interest.
data want;
set sashelp.class;
retain string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
letter = char(string, _n_);
run;