SAS Date (numeric) to Character when missing (.) - sas

Most likely a silly question, but I must be overlooking something.
I have a date field in which the date is sometimes missing (.). I have to create a file from this data set, but the requirements for loading it into a DB2 environment call for a blank string instead of the native SAS missing numeric value (.).
This should be a simple task: first convert the variable to character, using the appropriate format:
LAST_ATTEMPT = PUT(ATTMPT1,YYMMDDS10.);
When a proc contents is run on the data set, it confirms that this has been converted to a character variable.
The issue is that when I look at the data set, it still shows (.) for the missing values. When I attempt to convert the missing date (.) to a blank string, every value of the variable gets blanked out...
What am I missing here?

Options MISSING=' ';
With this option set, PUT returns a blank instead of a period for a missing value when your assignment executes.
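A minimal sketch of how this might look in practice. The data set names have and want are placeholders, and the sketch assumes the session was using the default MISSING='.' setting, which it restores afterwards so later steps are unaffected:

```sas
options missing=' ';                 /* print missing numerics as a blank */

data want;
    set have;                        /* 'have' is a hypothetical data set name */
    LAST_ATTEMPT = put(ATTMPT1, yymmdds10.);
run;

options missing='.';                 /* restore the SAS default */
```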

One way is to use Options MISSING=' ';, but this might have an unwanted impact on other parts of your program.
Another, safer way is to add a test to the original program:
IF ATTMPT1~=. THEN LAST_ATTEMPT = PUT(ATTMPT1,YYMMDDS10.);
ELSE LAST_ATTEMPT = "";
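The same test can be sketched as a complete step. The data set names have and want are placeholders, and MISSING() is swapped in for the ~=. comparison because it also catches special missing values such as .A:

```sas
data want;
    set have;                        /* 'have' is a hypothetical data set name */
    length LAST_ATTEMPT $10;         /* fix the length before the first assignment */
    if not missing(ATTMPT1) then LAST_ATTEMPT = put(ATTMPT1, yymmdds10.);
    else LAST_ATTEMPT = ' ';         /* blank string for missing dates */
run;
```

Declaring the length up front keeps the variable at 10 characters regardless of which branch the compiler sees first.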

Related

Programming if else in Notepad++?

How can I use regex to insert a character at the same position after a certain character in the first column, depending on the type of data in the first column?
I was using the expression ^(.{5})\s* and changing the number in the braces {}, but that is too manual for a large data set. I tried an if/else condition, but I could not figure out how to use it.
Before
after
Thank you,

Remove or replace '�' character in Informatica

We have a requirement wherein we need to replace or remove the '�' character (an unrecognizable, undefined character) present in our source. My workflow runs successfully, but when I check the records in the target they are not committed. I get the following error in Informatica:
Error executing query for record 37: 6706: The string contains an untranslatable character.
I tried functions like replace_chr, reg_replace, replace_str etc., but none seems to be working. Kindly advise on how to get rid of this. Any reply is greatly appreciated.
In your schema definitions you need to use charset => utf8_unicode_ci,
but for now you can do:
UPDATE tablename
SET columnToCheck = REPLACE(CONVERT(columnToCheck USING ascii), '?', '')
WHERE ...
or
update tablename
set columnToCheck = replace(columnToCheck , char(146), '');
Replace NonASCII Characters in MYSQL
You can replace the special characters in an expression transformation.
REPLACESTR(1,Column_Name,'?',NULL)
REPLACESTR - Function
1 - CaseFlag (a nonzero value makes the match case-sensitive)
Column_Name - Column name which has a special character
? - Special character
NULL - Replacing character
You need to fetch rows with the appropriate character set defined on your connection. What is the connection you're using, ODBC or native? What's the DB?
Special characters are a challenge. Having checked the Informatica Network, I can see there is a kludge involving replace_str: first set a variable to the string with all non-special characters, then use the resulting variable in a replace_str so that the final value contains only the allowed characters: https://network.informatica.com/thread/20642 (an awesome workaround by nico, so long as you can positively identify every character that should be allowed).
As an alternate kludge, I would also attempt something using an XML transformation somewhere within the mapping, as Informatica conveniently converts special characters to encoded values (decimal or hex, I can't remember). So long as you can live with these encoded values appearing in your target text, you should be fine (and build some extra space into your strings to accommodate any bloat from the extra characters).

Permanently Reformat Variable Values in SAS

I am trying to reformat my variables in SAS using the PUT function and a user-defined format. However, I can't seem to get it to work. I want the value "S0001-001" to convert to "S0001-002". However, when I use this code:
put("S0001-001",$format.)
it returns "S0001-001". I double-checked my format and it is mapped correctly. I import it from Excel, convert it to a SAS table, and convert the SAS table to a SAS format.
Am I misunderstanding what the PUT function is supposed to be doing?
Thanks for the help.
Assuming that you tried something like this it should work as you intended.
proc format ;
value $format 'S0001-001' = 'S0001-002' ;
run;
data want ;
old= 'S0001-001';
new=put(old,$format.);
put (old new) (=:$quote.);
run;
Make sure that you do not have leading spaces or other invisible characters in either the variable value or the START value of your format. Similarly make sure that your hyphens are actual hyphens and not em-dash characters.
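One way to check for such invisible characters is to print the value in hexadecimal. A small sketch, assuming an ASCII-based session, where a plain hyphen shows up as 2D and a blank as 20 (the literal value here stands in for your own variable):

```sas
data _null_;
    old = 'S0001-001';               /* stand-in for the value being looked up */
    len = length(old);               /* length without trailing blanks */
    put old= $hex18. len=;           /* two hex digits per character */
run;
```

Any byte other than 2D where the hyphen should be, or a leading 20, would explain why the format does not match.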

ERROR: P does not have a numeric suffix (SAS, RENAME)

After having worked out a bunch of other errors I'm left with the following
ERROR: P does not have a numeric suffix.
From all the info I've been able to find this happens a lot when using PROC TRANSPOSE, however I'm not using that here (and don't anywhere else in this code).
Data Spillover_HE (rename=(F1=FY F2=BN F3=employeeID F4=grade_subject_ID
F5=AsmtID_agg F6=linkB F7=subgroupID F8=w F9=MGP_SE F10=Residual_SE
F11=Residual_Var F12=mgp_var F13=student_n F14=calcID F15=sumwt F16=MGP
F17=ave_prescore F18=p_imp F19=p_postImp F20=p_sped F21=p_sped_rs
F22=p_sped_se_ss F23=p_sped_st F24=p_sped_tt F25=P-ell F26=p_ed
F27=p_hispanic F28=p_black F29=p_white F30=p_asian F31=p_other
F32=p_blahispmale F33=p_overaundcred F34=p_retained F35=p_transfer
F36=p_top10 F37=p_top5 F38=p_top1 F39=p_bot10 F40=p_bot5 F41=p_bot1
F42=target_population F43=mean_residual_var F44=P_0_5)); run;
Obviously I have a bunch of variables that start with "p". None of them are underlined in the log. I'm using SAS Base, and got the same error in SAS Enterprise Guide.
Not sure what my next move should be. Thanks.
A dash is not a valid character in a variable name.
Replace F25=P-ell with F25=P_ell.
You can use a dash to specify a range of variables, e.g. rename=(x1-x100=y1-y100). This renames the 100 variables x1 through x100 to y1 through y100.
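Both points can be sketched together. The data set have and its variables are made up for illustration; only the underscore in p_ell and the x1-x3 range syntax are the point:

```sas
/* tiny stand-in for the imported data; variable names are illustrative */
data have;
    F24 = 1; F25 = 2; x1 = 10; x2 = 20; x3 = 30;
run;

data want;
    set have(rename=(F24=p_sped_tt F25=p_ell     /* underscore, not dash */
                     x1-x3=y1-y3));              /* dash used as a numbered range */
run;
```

With P-ell in the list, SAS presumably reads the dash as the start of such a numbered range, which would explain why it complains that P has no numeric suffix.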

How do I use numeric functions to correct date typos?

I know it's easy enough to do manual corrections on date typos, but I want to automate such corrections using one or more SAS functions, given that my dataset is large and typos are frequent.
For instance, it seems that whoever created the dataset I am cleaning often transposed digits in the year of someone's birthdate (e.g., '2102' rather than '2012', '2110' instead of '2010', etc.). I'm aware of string functions such as INDEX() that find certain character values or strings and then allow for the replacement of said characters in the same position (i.e., replace "ABCD" with "ABBB", regardless of the string's location in a value). Can the same process be replicated with numeric (and specifically date) values?
I don't think SAS has any functions that would check numeric values for digit patterns. I often do data cleaning and address this issue by making a character variable out of the numeric date variable, then using character functions and Perl regex to clean the character values, and then storing the cleaned values as numeric date.
For specifically date values, you could try using SAS date functions (e.g. DAY(), MONTH(), YEAR(), MDY(), etc.) to extract parts of the date value, error-check them, and put them all back together into a date value. This could be a good quick solution if you expect a limited set of typos and you roughly know what they are. For a more thorough error check, converting the numeric values to character and using char or regex functions would give you more options.
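A rough sketch of the character-and-regex route, not a general solution. It assumes a hypothetical data set have with a numeric date variable datevar, and it assumes that every year of the form 210x or 211x is a typo for 201x; the variable names datechar and fixed are likewise made up:

```sas
data want;
    set have;                                  /* hypothetical input data set */
    length datechar $10;
    datechar = put(datevar, yymmdd10.);        /* e.g. 2102-03-15 */
    /* assumption: years 210x and 211x are typos for 201x */
    datechar = prxchange('s/^21[01](\d)/201$1/', 1, datechar);
    fixed = input(datechar, ?? yymmdd10.);     /* ?? suppresses notes for invalid dates */
    format fixed yymmdd10.;
    drop datechar;
run;
```

Writing the result to a new variable keeps the original value available for review, and a date that becomes invalid after the change (February 29 moved into a non-leap year, for example) comes back as missing rather than stopping the step.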
The only really concise suggestion I can think of is using mdy (assuming these are date, not datetime, variables).
For example:
data want;
set have;
if year(datevar) > 2100 then
datevar = mdy(month(datevar),day(datevar),year(datevar)-90);
run;
would correct any '2104' to '2014'. That's a very simple correction (and may well do as much harm as good, since '2114' is also a possible typo), but things along those lines - break the date up into its pieces, verify the pieces, reconstruct using mdy.