I have a text field called Emp_number in the format of A1234567. I want to get the right 7 characters and then cast those 7 characters to Int.
However I sometimes get "dirty" data eg F1234G99 then I don't want to cast it.
How can I write my where clause to do this in SQL? I'm a beginner so please be very specific.
Thanks.
Related
I am attempting to extract dates from a free-text field (because our process is awesome like that :\ ) and keep hitting Teradata error 6706. The regex I'm using is: REGEXP_SUBSTR(original_field,'(\d{2})\/(\d{2})\/(\d{4})',1) AS new_field. I'm unsure of the field's type HELP TABLE has a blank in the Type column for the field.
I've already tried converting using TRANSLATE(col USING LATIN_TO_UNICODE), as well as UNICODE_TO_LATIN, those both actually cause the error by themselves. A straight CAST(original_field AS VARCHAR(255)) doesn't fix the issue, though that cast does work. I've also tried stripping various special characters (new-line, carriage return, etc.) from the field before letting the REGEXP_SUBSTR take a crack at it, both by itself and with the CAST & TRANSLATEs I already mentioned.
At this point I'm not sure what the issue could be, and could use some guidance on additional options to try.
The final version that worked ended up being
, CASE
WHEN TRANSLATE_CHK(field USING LATIN_TO_UNICODE) = 0 THEN
REGEXP_SUBSTR(TRANSLATE(field USING LATIN_TO_UNICODE),'(\d{2})\/(\d{2})\/(\d{4})',1)
ELSE NULL
END AS Ref_Date
For whatever reason, using a TRIM inside the TRANSLATE seems to cause an issue. Only once I striped any and all functions from inside the TRANSLATE did the TRANSLATE, and thus the REGEXP_SUBSTR, work.
Our database predates our database software having good unicode support, and in its place has a psuedo-base64 encoding which it uses to store UTF16 characters in an ascii field. I am writing a function to convert this type of field into straight UTF8 within SAS.
The function loops through the string converting each set of three ascii characters into a unicode character and placing it in an array. When experimenting with code in a data step I had used cat(of final{*}) to convert the array into a string, but the same code does not appear to be valid within a function.
I am currently collating the string in the loop with collate = trim(collate)!!trim(final{i}) and an arbitrary length collate string, but I would like to produce this directly from the array or at least set the size of the collate string based on the length of the input string.
I've included a pastebin of the data and function here.
Edit: The version of SAS I was using is 9.3
The same code is valid in a function in SAS 9.4 TS1M3; it may not be in earlier versions (significant changes were made to how arrays were handled in FCMP in 9.4 and in maintenance releases TS1M2 and 3).
However, this doesn't really solve your arbitrary length problem; when I run your function with
outtext = cat(of final{*});
return (outtext);
I get... 1 character! And when I run
return(cats(of final{*}));
output:
Obs text_enc finaltext
1 ABCABlABjABhAB1ABzABlAAgABVABUABGAA4AAgABpABzAAgABoABhAByABk BecauseU
2 ABTABpABtABwABsABlAByAAgABsABpABrABlAAgAB0ABoABpABz Simplerl
3 ABJABvAAgABJABvAAgABCAByABvABtABpABvABz IoIoBrom
which is a bit better (cats trims for you), I still only get 8 characters. That's because 8 characters is the default length in SAS for an undeclared character variable. Expand the length (using a length statement for outtext) and you get:
Obs text_enc finaltext
1 ABCABlABjABhAB1ABzABlAAgABVABUABGAA4AAgABpABzAAgABoABhAByABk BecauseUTF8ishard
2 ABTABpABtABwABsABlAByAAgABsABpABrABlAAgAB0ABoABpABz Simplerlikethis
3 ABJABvAAgABJABvAAgABCAByABvABtABpABvABz IoIoBromios
You'll still need to define whatever length you need, then. FCMP doesn't, as far as I know, allow for a way to have an undefined-length string; you need to define the default (and maximum) length for the string you're going to return. The user is welcome to define a shorter length, and should, when it's appropriate.
Is there a function SAS proc SQL which i can use to extract left part of the string.it is something similar to LEFT function sql server. in SQL I have left(11111111, 4) * 9 = 9999, I would like to something similar in SAS proc SQL. Any help will be appreciated.
Had an impression you want to repeat the substring instead of multiply, so I'm adding REPEAT function just for the curiosity.
proc sql;
select
INPUT(SUBSTR('11111111', 1, 4), 4.) * 9 /* if source is char */
, INPUT(SUBSTR(PUT(11111111, 16. -L), 1, 4), 4.) * 9 /* if source is number */
, REPEAT(SUBSTR(PUT(11111111, 16. -L), 1, 4), 9) /* repeat instead of multiply */
FROM SASHELP.CLASS (obs=1)
;
quit;
substr("some text",1,4) will give you "some". This function works the same way in a lot of SQL implementations.
Also, note that this is a string function, but in your example you're applying it to a number. SAS will let you do this, but in general it's wise to control you conversion between strings and numbers with put() and input() functions to keep your log clean and be sure that you're only converting where you actually intend to.
You might be looking for SUBSTRN function..
SUBSTRN(string, position <, length>)
Arguments
string specifies a character or numeric constant, variable,
or expression.
If string is numeric, then it is converted to a character value that
uses the BEST32. format. Leading and trailing blanks are removed, and
no message is sent to the SAS log.
position is an integer that specifies the position of the first
character in the substring.
length is an integer that specifies the length of the substring. If
you do not specify length, the SUBSTRN function returns the substring
that extends from the position that you specify to the end of the
string.
As others have pointed out, substr() is the function you are looking for, although I feel that a more useful answer would also 'teach you how to fish'.
A great way to find out about SAS functions is to google sas functions by category which at the time of writing this post will direct you here:
SAS Functions and CALL Routines by Category
It's worth scanning through this list at least once just to get an idea of all of the functions available.
If you're after a specific version, you may want to include the SAS version number in your search. Note that the link above is for 9.2.
If you have scanned through all the functions, and still can't find what you are looking for, then your next option may be to write your own SAS function using proc fcmp. If you ever need assistance with doing this than I suggest posting a new question.
I know it's easy enough to do manual corrections on date typos, but I want to automate such corrections using one or more SAS functions, given that my dataset is large and typos are frequent.
For instance, it seems that whomever created the dataset I am cleaning often transposed digits in the year of someone's birthdate (e.g., '2102' rather than '2012', '2110' instead of '2010', etc). I'm aware of string functions such as INDEX() that find certain character values or strings and then allow for the replacement of said characters in the same position (i.e., replace "ABCD" with "ABBB", regardless of the string's location in a value). Can the same process be replicated with numeric (and specifically date) values?
I don't think SAS has any functions that would check numeric values for digit patterns. I often do data cleaning and address this issue by making a character variable out of the numeric date variable, then using character functions and Perl regex to clean the character values, and then storing the cleaned values as numeric date.
For specifically date values, you could try using SAS date functions (e.g. DAY(), MONTH(), YEAR(), MDY(), etc.) to extract parts of the date value, error-check them, and put them all back together into a date value. This could be a good quick solution if you expect a limited set of typos and you roughly know what they are. For a more thorough error check, converting the numeric values to character and using char or regex functions would give you more options.
The only really concise suggestion I can imagine is using mdy (Assuming this is date, not datetime variables).
For example:
data want;
set have;
if year(datevar) > 2100 then
datevar = mdy(month(datevar),day(datevar),year(datevar)-90);
run;
would correct any '2104' to '2014'. That's a very simple correction (and may well do as much harm as good, since '2114' is also a possible typo), but things along those lines - break the date up into its pieces, verify the pieces, reconstruct using mdy.
I have strings in a table that contain hex values such as \ffffffc4. An example is the following:
Urz\ffffffc4\ffffff85dzenie zgodne ze standardem High Definition Audio
The following code can convert the hex into UTF8:
select chr(x'c4'::int)
which returns Ä but when I try to use a regexp_replace I get into problems. I have tried the following:
select regexp_replace(sal_input, E'\\f{6}(..)',convert(E'\\1','xyz','UTF8'),'g')
where XYZ are the various source encodings offered in 8.2 but all I get back is the hex value.
Any idea on how I could use the chr function inside regexp_replace?
Version used: PostgreSQL 8.2.15 (Greenplum Database 4.1.1.1 build 1) on x86_64-unknown-linux-gnu
Thanks in advance for the help
You are misunderstanding the order of evaluation. The 2nd argument to regexp_replace isn't a callback invoked for every substitution of '\1'.
What happens is that your convert call is evaluated first, on the literal value \1, and that result is passed to regexp_replace.
In any case, the SQL doesn't even evaluate on a modern PostgreSQL because of stricter casting rules, as '\1' isn't a valid bytea literal.
In a less ancient Pg version it might be possible to do something with regexp_split_to_table, chr and string_agg. In 8.2, I think you're going to be using a PL. I'd load PL/Perl and write a simple Perl function to do it. It's likely possible to implement in PL/PgSQL, but I suspect any implementation with the functionality available in 8.2 will be verbose and slow. I'd love to be proved wrong.