Character conversion from SAS to Teradata

I have recently started a new job and I am using tools that I am not very familiar with, so I was wondering if the Stack Overflow family could help me out.
I have this concatenation in SAS, but I am not able to reproduce it in Teradata:
t1.COD_CZ||PUT(INPUT(t1.CODTC,5.),z4.)||PUT(t1.PROGOPE,z8.) as CODIGO_MCT
I have written something like this, but the length of the resulting string does not match the result in SAS:
t1.COD_CZ|| cast(cast(t1.CODTC as int) as char(4))|| cast(t1.PROGOPE as char(8)) as CODIGO_MCT
Can you kindly enlighten me? Thanks in advance.

You must apply a FORMAT to get leading zeroes (concatenating trims trailing spaces):
t1.COD_CZ||Cast(t1.CODTC AS FORMAT '9(4)')||Cast(t1.PROGOPE AS FORMAT '9(8)')
The result has a fixed length, but it's still a VarChar(17). If you need fixed length, e.g. for export:
CAST(t1.COD_CZ||Cast(t1.CODTC AS FORMAT '9(4)')||Cast(t1.PROGOPE AS FORMAT '9(8)') AS CHAR(17))
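To check the Teradata output against SAS, it can help to see what the original expression produces for a known row. The following is a minimal sketch with made-up values; the variable types and lengths (COD_CZ as $2, CODTC as character, PROGOPE as numeric) are assumptions, not taken from the question.

```sas
/* Hypothetical sample values; real column types and lengths may differ. */
data _null_;
  length cod_cz $2 codtc $5;
  cod_cz  = 'AB';
  codtc   = '12';
  progope = 345;
  /* Z4. and Z8. pad with leading zeros to a fixed width */
  codigo_mct = cod_cz || put(input(codtc, 5.), z4.) || put(progope, z8.);
  len = length(codigo_mct);
  put codigo_mct= len=;
run;
```

With these values the log should show codigo_mct=AB001200000345 and len=14, which is the string the Teradata FORMAT '9(4)' and '9(8)' casts need to reproduce.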

Related

Permanently Reformat Variable Values in SAS

I am trying to reformat my variables in SAS using the put statement and a user defined format. However, I can't seem to get it to work. I want to make the value "S0001-001" convert to "S0001-002". However, when I use this code:
put("S0001-001",$format.)
it returns "S0001-001". I double-checked my format and it is mapped correctly. I import it from Excel, convert it to a SAS table, and convert the SAS table to a SAS format.
Am I misunderstanding what the put statement is supposed to be doing?
Thanks for the help.
Assuming that you tried something like this, it should work as you intended.
proc format ;
value $format 'S0001-001' = 'S0001-002' ;
run;
data want ;
old= 'S0001-001';
new=put(old,$format.);
put (old new) (=:$quote.);
run;
Make sure that you do not have leading spaces or other invisible characters in either the variable value or the START value of your format. Similarly make sure that your hyphens are actual hyphens and not em-dash characters.
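One way to look for such invisible characters is to print the value in hexadecimal and to compare the lookup with and without STRIP(). This is only a sketch: the format name $fixid and the sample value with a leading blank are hypothetical.

```sas
proc format;
  value $fixid 'S0001-001' = 'S0001-002';  /* hypothetical format name */
run;

data _null_;
  old   = ' S0001-001';               /* sample value with a leading blank */
  raw   = put(old, $fixid.);          /* no match: the blank is part of the value */
  clean = put(strip(old), $fixid.);   /* matches once the blank is removed */
  put old= $hex20.;                   /* a blank shows up as 20, a hyphen as 2D */
  put (raw clean) (=:$quote.);
run;
```

If the hex dump shows bytes other than 2D where you expect a hyphen, the value and the START value of the format do not actually match.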

Reading SAS date values

My data set has one variable whose values look like MMMYYYY, but it is stored as character.
I used the INPUT function to convert it to numeric, but it didn't work.
The code I used is:
newdate=input(chardate, date9.);
but it is not working. Please tell me what is wrong with this code.
Thanks,
Ravi
Your date isn't in DATE9. form; it matches the MONYY informat:
https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000199690.htm
newdate=input(chardate, monyy7.);
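A quick way to confirm the conversion is to read one literal value and print it with a date format. The sample value below is made up.

```sas
data _null_;
  chardate = 'JAN2015';                 /* hypothetical sample value */
  newdate  = input(chardate, monyy7.);  /* MONYY assumes day 1 of the month */
  put newdate= date9.;
run;
```

The log should show newdate=01JAN2015. Without a format the variable prints as a plain number, because SAS stores dates as days since 01JAN1960.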

Extract left part of the string in SAS?

Is there a function in SAS PROC SQL that I can use to extract the left part of a string? It should be something similar to the LEFT function in SQL Server. In SQL I have left(11111111, 4) * 9 = 9999, and I would like to do something similar in SAS PROC SQL. Any help will be appreciated.
I had the impression that you might want to repeat the substring instead of multiplying it, so I'm adding the REPEAT function just for curiosity.
proc sql;
select
  /* source is character: take 4 chars, convert to number, multiply */
  INPUT(SUBSTR('11111111', 1, 4), 4.) * 9
  /* source is numeric: left-align with PUT first, then as above */
  , INPUT(SUBSTR(PUT(11111111, 16. -L), 1, 4), 4.) * 9
  /* repeat instead of multiply; note REPEAT(s, n) returns n+1 copies */
  , REPEAT(SUBSTR(PUT(11111111, 16. -L), 1, 4), 9)
FROM SASHELP.CLASS (obs=1)
;
quit;
substr("some text",1,4) will give you "some". This function works the same way in a lot of SQL implementations.
Also, note that this is a string function, but in your example you're applying it to a number. SAS will let you do this, but in general it's wise to control your conversions between strings and numbers with the put() and input() functions, to keep your log clean and to be sure that you're only converting where you actually intend to.
You might be looking for the SUBSTRN function:
SUBSTRN(string, position <, length>)
Arguments
string specifies a character or numeric constant, variable,
or expression.
If string is numeric, then it is converted to a character value that
uses the BEST32. format. Leading and trailing blanks are removed, and
no message is sent to the SAS log.
position is an integer that specifies the position of the first
character in the substring.
length is an integer that specifies the length of the substring. If
you do not specify length, the SUBSTRN function returns the substring
that extends from the position that you specify to the end of the
string.
As others have pointed out, substr() is the function you are looking for, although I feel that a more useful answer would also 'teach you how to fish'.
A great way to find out about SAS functions is to google sas functions by category which at the time of writing this post will direct you here:
SAS Functions and CALL Routines by Category
It's worth scanning through this list at least once just to get an idea of all of the functions available.
If you're after a specific version, you may want to include the SAS version number in your search. Note that the link above is for 9.2.
If you have scanned through all the functions and still can't find what you are looking for, then your next option may be to write your own SAS function using proc fcmp. If you ever need assistance with doing this, then I suggest posting a new question.
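As a rough illustration of the proc fcmp route, here is a sketch of a LEFT-style function. The function name leftstr, the library path work.funcs.str and the return length of 200 are all hypothetical choices for this example.

```sas
/* Define a user-written function; stored in a hypothetical WORK library */
proc fcmp outlib=work.funcs.str;
  function leftstr(s $, n) $ 200;
    return (substr(s, 1, n));   /* first n characters of s */
  endsub;
run;

/* Tell SAS where to look for compiled functions */
options cmplib=work.funcs;

data _null_;
  x = leftstr('some text', 4);
  put x=;
run;
```

For this particular task the built-in substr() is simpler; the sketch only shows the mechanics of defining and calling your own function.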

How do I use numeric functions to correct date typos?

I know it's easy enough to do manual corrections on date typos, but I want to automate such corrections using one or more SAS functions, given that my dataset is large and typos are frequent.
For instance, it seems that whoever created the dataset I am cleaning often transposed digits in the year of someone's birthdate (e.g., '2102' rather than '2012', '2110' instead of '2010', etc.). I'm aware of string functions such as INDEX() that find certain character values or strings and then allow for the replacement of said characters in the same position (i.e., replace "ABCD" with "ABBB", regardless of the string's location in a value). Can the same process be replicated with numeric (and specifically date) values?
I don't think SAS has any functions that would check numeric values for digit patterns. I often do data cleaning and address this issue by making a character variable out of the numeric date variable, then using character functions and Perl regex to clean the character values, and then storing the cleaned values as numeric date.
For specifically date values, you could try using SAS date functions (e.g. DAY(), MONTH(), YEAR(), MDY(), etc.) to extract parts of the date value, error-check them, and put them all back together into a date value. This could be a good quick solution if you expect a limited set of typos and you roughly know what they are. For a more thorough error check, converting the numeric values to character and using char or regex functions would give you more options.
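The character-and-regex route described above could look roughly like this. It is a sketch under assumptions: the data set have, the variable datevar and the sample dates are made up, and the rule only handles one typo pattern (the second and third digits of the year swapped, as in '2102' for '2012').

```sas
/* Hypothetical input data with one mistyped year */
data have;
  input datevar :yymmdd10.;
  format datevar yymmdd10.;
  datalines;
2102-03-15
2012-07-04
;
run;

data want;
  set have;
  length datestr $10;
  if year(datevar) > 2100 then do;
    datestr = put(datevar, yymmdd10.);                       /* date -> text */
    datestr = prxchange('s/^2(\d)(\d)/2$2$1/', 1, datestr);  /* 2102 -> 2012 */
    datevar = input(datestr, ?? yymmdd10.);                  /* text -> date */
  end;
  drop datestr;
run;
```

The ?? modifier suppresses log errors if the rewritten text is not a valid date; in that case datevar becomes missing, which makes such rows easy to review by hand.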
The only really concise suggestion I can imagine is using mdy (assuming these are date, not datetime, variables).
For example:
data want;
set have;
if year(datevar) > 2100 then
datevar = mdy(month(datevar),day(datevar),year(datevar)-90);
run;
would correct any '2104' to '2014'. That's a very simple correction (and may well do as much harm as good, since '2114' is also a possible typo), but things along those lines - break the date up into its pieces, verify the pieces, reconstruct using mdy.

Is there a limit to the value levels in a proc format statement?

proc format;
value $STNAME 'AL'='Alabama'
'AK'='Alaska
'AR'='Arkansas'
'AZ'='Arizona'
'CA'='California'
'CO'='Colorado'
'CT'='Connecticut'
'DC'='DistrictOfColumbia'
'DE'='Deleware'
'FL'='Florida'
'GA'='Georgia'
'HI'='Hawaii'
'IA'='Iowa'
'ID'='Idaho'
'IL'='Illinois'
'IN'='Indiana'
'KS'='Kansas'
'KY'='Knetucky'
'LA'='Louisiana'
'MA'='Massachusetts'
'MD'='Maryland'
'ME'='Maine'
'MI'='Michigan'
'MN'='Minnesota'
'MO'='Missouri'
'MS'='Mississippi'
'MT'='Montana'
'NC'='North Carolina'
'ND'='North Dakota'
'NE'='Nebraska'
'NH'='New Hampshire'
'NJ'='New Jersey'
'NM'='New Mexico'
'NY'='New York'
'NV'='Nevada'
'OH'='Ohio'
'OK'='Oklahoma'
'OR'='Oregon'
'PA'='Pennsylvania'
'RI'='Rhode Island'
'SC'='South Carolina'
'SD'='South Dakota'
'TN'='Tennessee'
'TX'='Texas'
'UT'='Utah'
'VA'='Virginia'
'VT'='Vermont'
'WA'='Washington'
'WI'='Wisconsin'
'WV'='West Virginia'
'WY'='Wyoming';
run;
It freezes up in the middle of the proc format step. If I split or shorten it, it runs fine.
Anyone aware how to get around this?
You are missing a closing quote on Alaska. I placed the code in my IDE and I could tell from the highlighting.
As long as your hard drive can hold the SAS program file, there is no limit on the number of unique values inside a proc format or on the amount of memory needed to load it. As @Carolina has suggested, you are missing the closing quote for Alaska. Without the closing quote, the states after Alaska are shown in a different color. After you add it, the highlighting after Alaska should change to a uniform color.
It might be better to use more conventional spacing for better readability.
Also, you might want spaces in 'DistrictOfColumbia', and two labels are misspelt: 'Knetucky' should be 'Kentucky' and 'Deleware' should be 'Delaware'.
Hope this helps.