SAS/PROC SQL: Remove initial zeros from an alphanumeric field - sas

I need to remove some initial zeros from a field (it appears as an alphanumeric one in the DB) like this:
cod_acometida
000000000003391901
000000000008271401
000000000007696901
000000000005504701
000000000002298401
000000000000332701
000000000013942801
The number of leading zeros varies, but they are always at the beginning of the string. I'm new to SAS and not sure whether RegEx is applicable.
I'm using Enterprise Guide 7.15.
Thanks in advance.

Try this
data have;
input cod_acometida :$20.;
datalines;
000000000003391901
000000000008271401
000000000007696901
000000000005504701
000000000002298401
000000000000332701
000000000013942801
;
data want;
set have;
cod_acometida = substr(cod_acometida, verify(cod_acometida, '0'));
run;
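Since the question asks whether RegEx applies, here is a minimal sketch using PRXCHANGE, assuming the have dataset created above. The names want_rx, want_sql and cod_sin_ceros are made up for illustration. The pattern ^0+ matches only the run of zeros at the start of the value.

```sas
/* Data step version: strip the leading run of zeros with a regular expression */
data want_rx;
  set have;
  /* 's/^0+//' replaces the zeros anchored at the start with nothing; 1 = replace once */
  cod_acometida = prxchange('s/^0+//', 1, cod_acometida);
run;

/* PROC SQL version of the same idea (cod_sin_ceros is a hypothetical column name) */
proc sql;
  create table want_sql as
  select prxchange('s/^0+//', 1, cod_acometida) as cod_sin_ceros length=20
  from have;
quit;
```

Note that a value consisting only of zeros would become blank with this pattern, so that case may need separate handling.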

Another way
data have;
input cod_acometida :$18.;
cards;
000000000003391901
000000000008271401
000000000007696901
000000000005504701
000000000002298401
000000000000332701
000000000013942801
;
data want;
set have;
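/* -L left-aligns the result; without it the zeros are replaced by leading blanks. */
/* Values longer than about 15 digits can lose precision in the numeric conversion. */
cod_acometida = put(cod_acometida*1, best18. -L);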
run;

Related

pulling all columns with certain values

I am working with a huge dataset in SAS using proc sql, and I need help setting up a LIKE condition. I'm trying to extract all the columns that have 'eco' in the name.
I'm getting an error in the where clause because it does not accept the second *.
Any help?
proc sql
select *
from cfy19e8
where * LIKE %eco%;
You could concatenate all of your columns with catx() and find any one that has the word eco.
data have;
input col1$ col2$ col3$;
datalines;
sadfeco kdoa wrfs
asdf asdf sadf
mfecosa mawoeco mfzeco
;
run;
data want;
set have;
where catx('|', col1, col2, col3) LIKE '%eco%';
run;
If you have a lot of character columns, you could use the shortcut _CHARACTER_ to concatenate all variables, then use find() within an if statement in a data step.
data want;
set have;
if(find(catx('|', of _CHARACTER_), 'eco') );
run;
Perhaps
proc contents noprint data=cfy19e8 out=eco_columns(where=(upcase(name) like '%ECO%'));
run;
title 'Columns with ECO in their name';
proc print data=eco_columns;
var name;
run;
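If the goal is to select only the columns whose names contain 'eco', one possible sketch uses the DICTIONARY.COLUMNS table to build the column list. It assumes cfy19e8 lives in the WORK library; the macro variable name eco_cols is made up for illustration.

```sas
/* Build a comma-separated list of column names containing ECO */
proc sql noprint;
  select name into :eco_cols separated by ', '
  from dictionary.columns
  where libname = 'WORK'        /* assumption: the table is in WORK; must be uppercase */
    and memname = 'CFY19E8'     /* table name, also uppercase */
    and upcase(name) like '%ECO%';
quit;

/* Select only those columns from the original table */
proc sql;
  select &eco_cols
  from cfy19e8;
quit;
```

If no column name matches, eco_cols is never created and the second step fails, so a check on &sqlobs after the first step may be worthwhile.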

Combine one column's values into a single string

This might sound awkward but I do have a requirement to be able to concatenate all the values of a char column from a dataset, into one single string. For example:
data person;
input attribute_name $ dept $;
datalines;
John Sales
Mary Acctng
skrill Bish
;
run;
Result : test_conct = "JohnMarySkrill"
The column could vary in number of rows in the input dataset.
So, I tried the code below but it errors out when the length of the combined string (samplkey) exceeds 32K in length.
DATA RECKEYS(KEEP=test_conct);
length samplkey $32767;
do until(eod);
SET person END=EOD;
if lengthn(attribute_name) > 0 then do;
test_conct = catt(test_conct, strip(attribute_name));
end;
end;
output; stop;
run;
Can anyone suggest a better way to do this, maybe by breaking the column down into macro variables of 32K length each?
Regards
It would help a lot if you indicated what you're trying to do, but a quick method is to use SQL:
proc sql NOPRINT;
select name into :name_list separated by ""
from sashelp.class;
quit;
%put &name_list.;
As you've indicated, macro variables do have a size limit (64K characters in most installations now). Depending on what you're doing, a better method may be to build a macro that puts the entire list wherever it needs to go dynamically, but you would need to explain the usage before anyone could suggest that option. This answers your question as posted.
Try this, using the VARCHAR data type. If you're on an older version of SAS this may not work.
data _null_;
set sashelp.class(keep = name) end=eof;
length long_var varchar(1000000);
length want $256.;
retain long_var;
long_var = catt(long_var, name);
if eof then do;
want = md5(long_var);
put want;
end;
run;

SAS lowercase SHA256 string

Currently testing a SHA256 function in order to prepare a variable for use in another application.
The user has requested that the SHA256 result be in lower case. I created a quick record to make sure I can convert the string:
data have;
input first $ last $ dob $ 10. sex $;
cards;
test person 1955-07-31 1
;
run;
Seems it will not allow a lower case string once passed through the SHA function.
Is there a workaround for this? The below attempt did not yield desirable results.
data have2;
set have;
source = catt(first,last,dob,sex);
encryp = lowcase(sha256(source));
format encryp $hex64.;
run;
The issue is not with the sha256 function, but with the $HEX64 format.
Applying lowcase directly to the SHA256 result actually harms it. You are not altering the hexadecimal representation; you are altering the raw bytes themselves, so the result is no longer accurate. You then display those bytes with $HEX64., which always shows capital letters for the hexadecimal characters.
What you presumably want instead is to store the lower-case version of the $HEX64.-formatted value. You can do that with put:
data want;
set have;
source = catt(first,last,dob,sex);
encryp = sha256(source);
lower = lowcase(put(encryp,$HEX64.));
run;
Note what encryp looks like - something totally different, and probably not particularly useful. You can of course skip that step if you want.
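To see why lowcase on the raw digest changes the value, here is a small sketch with a single byte. It assumes an ASCII-based session; the variable names raw and damaged are made up for illustration.

```sas
data _null_;
  raw = '4A'x;              /* one raw byte that happens to be the ASCII letter J */
  damaged = lowcase(raw);   /* lowcase turns the byte into the letter j, i.e. a different byte */
  /* On an ASCII-based session this should print raw=4A damaged=6A */
  put raw= $hex2. damaged= $hex2.;
run;
```

Any digest byte that happens to be an upper-case letter is changed in the same way, which is why the hex output differs from the true hash in some positions only.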
Below will do it using a put statement with the format inside:
data have2;
set have;
encryp = lowcase(put(sha256(catt(first,last,dob,sex)),$hex64.));
run;
It produces an entirely different hash string than your method, but the result is consistent for identical input.
data have;
input first $ last $ dob $ 10. sex $;
cards;
test person 1955-07-31 1
test person 1955-07-31 1
test2 person 1977-08-11 2
test3 person 1945-12-22 1
;
run;
data have2;
set have;
new_encryp = lowcase(put(sha256(catt(first,last,dob,sex)),$hex64.)); /* new method */
encryp = lowcase(sha256(catt(first,last,dob,sex))); /* what you tried */
format encryp $hex64.;
run;
/* output */
first last dob sex new_encryp encryp
test person 7/31/1955 1 038a855a47f40edf54094adc4366e3e79c1a931346d7968e96d2cb930b01e7bc 039A857A67F40EDF74096AFC6366E3E79C1A931366D7969E96F2EB930B01E7BC
test person 7/31/1955 1 038a855a47f40edf54094adc4366e3e79c1a931346d7968e96d2cb930b01e7bc 039A857A67F40EDF74096AFC6366E3E79C1A931366D7969E96F2EB930B01E7BC
test2 person 8/11/1977 2 1117ab614f48a7edfbe9d615f12acad9d564b457b0f31bb2619f7eb9b10f1e58 1117AB616F68A7EDFBE9F615F12AEAF9F564B477B0F31BB261FF7EB9B10F1E78
test3 person 12/22/1945 1 d1cb00ebe044c0553039f99592dc7bd4804eac2c13da8208fd82459c3a37efd1 F1EB00EBE064E0753039F99592FC7BF4806EAC2C13FA8208FD82659C3A37EFF1
*convert case;
%let txt="SEQ_CLAIM_ID, SEQ_MEMB_ID, EFFECTIVE_DATE";
%macro lower_case(txt);
data;
text=lowcase(&txt);
run;
%mend;
%lower_case(&txt);

Character date 31.03.2001 to numerical, SAS

I have a character variable whose values were all entered in the form 31.01.2002. I would like to convert it to a numeric date with the date9. format.
I have tried the below:
date=input(oldway, 10.);
date=input(oldway, date9.);
put date=ddmmyy10.;
date=input(compress(oldway,'.'),10.);
date = INPUT(compress(oldway),date9.);
format date date9.;
run;
I have also tried combinations of the above and to no avail.
Any ideas for forward motion?
Kind Regards!!
You can't input your date using the date9. informat as your string variable isn't in that format. You can use ddmmyy10., though, and that also takes care of the . characters.
data have;
input old $10.;
cards;
31.01.2014
28.02.2014
01.01.2015
;
run;
data want;
set have;
new = input(old, ddmmyy10.);
format new date9.;
run;
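If some of the strings might not be valid dates, a possible extension is the ?? modifier on the informat, which suppresses the invalid-data messages and leaves the result missing. This is only a sketch; the dataset name check and the variable bad_date are made up for illustration.

```sas
data check;
  input old $10.;
  /* ?? suppresses the invalid-data note and keeps _ERROR_ at 0 */
  new = input(old, ?? ddmmyy10.);
  /* flag rows that could not be converted */
  bad_date = missing(new);
  format new date9.;
  datalines;
31.01.2014
31.02.2014
not a date
;
run;
```

Rows with bad_date = 1 can then be reviewed separately instead of filling the log with notes.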
try this:
data _null_;
date ="31.01.2014";
date=compress(date,".");
new_date=input(date,ddmmyy8.);
format new_date date9.;
put new_date;
run;

How to create a new variable in SAS by extracting part of the value of an existing numeric variable?

I have two datasets in SAS that I would like to merge, but they have no common variables. One dataset has a "subject_id" variable, while the other has a "mom_subject_id" variable. Both of these variables are 9-digit codes that have just 3 digits in the middle of the code with common meaning, and that's what I need to match the two datasets on when I merge them.
What I'd like to do is create a new common variable in each dataset that is just the 3 digits from within the subject ID. Those 3 digits will always be in the same location within the 9-digit subject ID, so I'm wondering if there's a way to extract those 3 digits from the variable to make a new variable.
Thanks!
SQL (using the sample data from the Data Step code below):
proc sql;
create table want2 as
select a.subject_id, a.other, b.mom_subject_id, b.misc
from have1 a JOIN have2 b
on(substr(a.subject_id,4,3)=substr(b.mom_subject_id,4,3));
quit;
Data Step:
data have1;
length subject_id $9;
input subject_id $ other $;
datalines;
abc001def other1
abc002def other2
abc003def other3
abc004def other4
abc005def other5
;
data have2;
length mom_subject_id $9;
input mom_subject_id $ misc $;
datalines;
ghi001jkl misc1
ghi003jkl misc3
ghi005jkl misc5
;
data have1;
length id $3;
set have1;
id=substr(subject_id,4,3);
run;
data have2;
length id $3;
set have2;
id=substr(mom_subject_id,4,3);
run;
Proc sort data=have1;
by id;
run;
Proc sort data=have2;
by id;
run;
data work.want;
merge have1(in=a) have2(in=b);
by id;
run;
An alternative would be to use proc sql with a join on substr(), just as explained above, if you are comfortable with SQL.
Assuming that your "subject_id" variable is numeric, the substr function won't work as expected, because SAS first converts the number to a string. By default that conversion right-aligns the value and pads it with spaces on the left, so the character positions shift.
You can use the modulus function mod(input, base) which returns the remainder when input is divided by base.
/*First get rid of the last 3 digits*/
temp_var = floor( subject_id / 1000);
/* then get the next three digits that we want*/
id = mod(temp_var ,1000);
Or in one line:
id = mod(floor(subject_id / 1000), 1000);
Then you can continue with sorting the new data sets by id and then merging.
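A minimal sketch of the numeric case, using made-up sample values and the hypothetical names have_num, want_num, id_num and id_chr. It shows the arithmetic approach next to a character alternative that formats the number with Z9. first, so that substr sees a fixed 9-character string.

```sas
/* Hypothetical sample data with a numeric 9-digit ID */
data have_num;
  input subject_id other $;
  datalines;
123001456 other1
123002456 other2
;

data want_num;
  set have_num;
  /* arithmetic: drop the last 3 digits, then keep the next 3 */
  id_num = mod(floor(subject_id / 1000), 1000);
  /* character alternative: Z9. writes all 9 digits with leading zeros, so positions are fixed */
  length id_chr $3;
  id_chr = substr(put(subject_id, z9.), 4, 3);
run;
```

For the first row, id_num is the number 1 while id_chr is the string 001, so whichever form is chosen should be used in both datasets before merging.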