postgres convert char to datetime - regex

OK guys, I'm trying to convert a char value taken from a text filename into a date. The date part is extracted using regex. It works perfectly until it meets a date like August21st.
select to_char(
to_date(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
regexp_replace(
substr('UserAndMasterPlanPerAugust21st2015.txt',21),
'.txt',''),
'rd','-'),
'th','-'),
'nd','-'),
'st','-'),
'MonthDD-YYYY'),
'YYYYMMDD')::integer
That code produces this error:
ERROR: invalid value "Augu-21st" for "Month"
DETAIL: The given value did not match any of the allowed values for this field.
********** Error **********
ERROR: invalid value "Augu-21st" for "Month"
SQL state: 22007
Detail: The given value did not match any of the allowed values for this field.
I expect the result for this date to be
20150821
I already know the problem is the 'st' replacement, because 'st' occurs twice (once in 'August' and once in '21st'). I'm just looking for the best way to solve this.
Thanks

I think you could use a different regular expression. I suggest trying this one, which removes the ordinal suffix only when it directly follows a digit:
(?<=[0-9])(?:st|nd|rd|th)
You only need to escape it for the Postgres dialect...
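For illustration, here is a minimal Python sketch of the same idea. Python's re module stands in for the Postgres regex engine, and the function name and the 20-character prefix length are assumptions made just for this example. The lookbehind only matches a suffix that follows a digit, so the 'st' in 'August' is left alone:

```python
import re
from datetime import datetime

# Ordinal suffix that directly follows a digit (same pattern as suggested above).
ORDINAL = re.compile(r"(?<=[0-9])(?:st|nd|rd|th)")


def filename_to_yyyymmdd(filename: str, prefix_len: int = 20) -> int:
    """Hypothetical helper: pull the date out of the filename and return it as YYYYMMDD."""
    stem = filename[prefix_len:]            # drop 'UserAndMasterPlanPer'
    if stem.endswith(".txt"):
        stem = stem[:-4]                    # drop the extension literally, not via regex
    cleaned = ORDINAL.sub("-", stem)        # 'August21st2015' -> 'August21-2015'
    parsed = datetime.strptime(cleaned, "%B%d-%Y")
    return int(parsed.strftime("%Y%m%d"))


print(filename_to_yyyymmdd("UserAndMasterPlanPerAugust21st2015.txt"))  # 20150821
```

In Postgres the same pattern would replace the four nested regexp_replace calls with a single one. Lookbehind is not available in every Postgres version, so it is worth testing the pattern on your server first.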

I stripped the date out of the string, then cast it to a date using the format it is currently in. You can then convert it to char in the format that you want. Cheers!
select to_char(
to_date(
substring('UserAndMasterPlanPerAugust21st2015.txt'
from 21 for length('UserAndMasterPlanPerAugust21st2015.txt')-24
), 'MonthDDthYYYY'), 'YYYYMMDD');
The -24 is there because I only want to take as many characters as the date occupies. If the total length is n and I start at position 21, there are n-20 characters left. I also want to drop the 4 characters of .txt, so the substring length is n-24. For this filename n is 38, so 14 characters are taken: August21st2015.
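As a quick check of that arithmetic, here is a small Python sketch. Python slices are 0-based, so SQL position 21 becomes index 20; the variable names are only for this example:

```python
filename = "UserAndMasterPlanPerAugust21st2015.txt"

start = 21                     # 1-based start position, as in the SQL substring call
length = len(filename) - 24    # drop the 20 leading characters and the 4 of '.txt'

# Convert the 1-based SQL position to a 0-based Python slice.
date_part = filename[start - 1:start - 1 + length]
print(length, date_part)       # 14 August21st2015
```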

Related

How do I regextract the second date in a string?

I am trying to extract the second date in this string, but my formula keeps extracting only the first date in Google Sheets:
String: BOT +1 1/1 CUSTOM IWM 100 12 SEP 22/7 SEP 22 184/184 PUT/CALL #6.13
This is my code: =REGEXEXTRACT(A3,"(\d{1,2}\s+[A-Za-z]+\s\d{2,4})")
my result: 12 SEP 22
Desired result should be: 7 SEP 22
Appreciate the help, thanks in advance!
Since you already have a working pattern for detecting dates, you can repeat the same pattern outside the parentheses, in front of the capturing group. The formula then matches the first date, .+ consumes the characters in between, and your working pattern inside the parentheses matches the second date. Only that last, parenthesized part is extracted:
=REGEXEXTRACT(A3,"\d{1,2}\s+[A-Za-z]+\s\d{2,4}.+(\d{1,2}\s+[A-Za-z]+\s\d{2,4})")
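To see the idea outside of Sheets, here is a rough Python sketch that collects every match of the same date pattern and picks the one you want. The helper name nth_date is made up for this example, and Python's re engine is not identical to the one Sheets uses, so treat it as an illustration only:

```python
import re
from typing import Optional

# Same date pattern as in the formula: day, month name, 2-4 digit year.
DATE = r"\d{1,2}\s+[A-Za-z]+\s\d{2,4}"


def nth_date(text: str, n: int) -> Optional[str]:
    """Return the n-th date-like match (1-based), or None if there are fewer than n."""
    matches = re.findall(DATE, text)
    return matches[n - 1] if len(matches) >= n else None


row = "BOT +1 1/1 CUSTOM IWM 100 12 SEP 22/7 SEP 22 184/184 PUT/CALL #6.13"
print(nth_date(row, 1))  # 12 SEP 22
print(nth_date(row, 2))  # 7 SEP 22
```

One thing to watch with the greedy .+ approach: if the second date has a two-digit day (say 17 SEP 22), the .+ can swallow the first digit and the group captures only 7 SEP 22. Collecting all matches, as in this sketch, avoids that.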
Here's one approach to dynamically extract N number of dates within your string OR extract the 2nd or 3rd date pattern as per the requirement.
=index(if(len(A:A),lambda(y,regexextract(y,lambda(z,regexreplace(y,"(?i)("&z&")","($1)"))("\d{1,2}\s"&JOIN("\s\d{2}|\d{1,2}\s",INDEX(TEXT(SEQUENCE(12,1,DATE(2022,1,1),31),"MMM")))&"\s\d{2}")))(regexreplace(A:A,"[\(\)/+]","")),))
If you need to pick a specific match, wrap the formula in INDEX with the match number:
=index(formula,,pattern number)
To extract just the second date, you can modify the code as follows:
=REGEXEXTRACT(A3,"\d{1,2}\s+[A-Za-z]+\s\d{2,4}.*(\d{1,2}\s+[A-Za-z]+\s\d{2,4})")
This regular expression \d{1,2}\s+[A-Za-z]+\s\d{2,4}.*(\d{1,2}\s+[A-Za-z]+\s\d{2,4}) will match the first date and the second date in the string, and then extract just the second date.

Numeric YYYYMM SAS field... Want to convert to an actual date field YYYYMM

I currently have a dataset that has a field that contains YYYYMM in a numeric format... How can I convert this to an actual date field in the same layout?
Here is the expression I'm trying to use:
Input(put(t1.LOAN_MONTH_YR_NR,f8.0),yymmdd.)
t1.Loan_Month_YR_NR is the field that has 201707 as a number.
Your numbers only have 6 digits, not 8, so you are using the wrong FORMAT to convert to character and the wrong INFORMAT to convert to date. You should also attach a format if you want them to look the same when printed (even though the actual value will be different).
select input(put(t1.LOAN_MONTH_YR_NR,z6.),yymmn6.) format=yymmn6. as LOAN_MONTH_YR_DT
from t1
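The same conversion can be sketched in Python to make the logic explicit: split the 6-digit number into year and month, build a real date, then format it back as YYYYMM for display. The function name is an assumption for this example, and picking the first day of the month is a choice made here because a date needs some day:

```python
from datetime import date


def yyyymm_to_date(value: int) -> date:
    """Hypothetical helper: turn a number like 201707 into a date (first day of that month)."""
    year, month = divmod(value, 100)   # 201707 -> (2017, 7)
    return date(year, month, 1)


loan_month = yyyymm_to_date(201707)
print(loan_month)                      # 2017-07-01
print(loan_month.strftime("%Y%m"))     # 201707
```

As in the SAS answer, the stored value is now a real date and the YYYYMM look comes only from the format applied when printing.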

How to replace a single number from a comma separated string in oracle using regular expressions?

I have the following set of data where I need to replace the number 41 with another number.
column1
41,46
31,41,48,55,58,121,122
31,60,41
41
We can see four conditions here
41,
41
,41,
,41
I have written the following query
REGEXP_replace(column1,'^41$|^41,|,41,|,41$','xx')
where xx is the number to be replaced.
This query replaces the comma as well, which is not expected.
Example: 41,46 is replaced as xx46. Here the expected output is xx,46. Please note that there are no spaces between the commas and the numbers.
Can somebody help me with the right regex?
Assuming the string is comma separated, you can pad it with a comma on each side, then use replace and trim to do the replacement. No regex is needed, and a regex-based solution is likely to be slower.
with t (column1) as (
select '41,46' from dual union all
select '31,41,48,55,58,121,122' from dual union all
select '31,60,41' from dual union all
select '41' from dual
)
-- Test data setup. Actual solution is below: --
select
column1,
trim(',' from replace(','||column1||',', ',41,', ',17,')) as replaced
from t;
Output:
COLUMN1                 REPLACED
41,46                   17,46
31,41,48,55,58,121,122  31,17,48,55,58,121,122
31,60,41                31,60,17
41                      17
4 rows selected.
Also, it's worth noting that storing comma-separated strings in a column is not the right way to store data. Normalization is your friend.
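Here is a small Python sketch of the padding trick, next to a split-based alternative that is not part of the answer above. Both function names are made up for this example:

```python
def replace_padded(csv: str, old: str, new: str) -> str:
    """Mirror of the SQL answer: pad with commas, replace ',old,', trim the padding."""
    padded = "," + csv + ","
    return padded.replace("," + old + ",", "," + new + ",").strip(",")


def replace_split(csv: str, old: str, new: str) -> str:
    """Alternative: split on commas and compare whole tokens."""
    return ",".join(new if token == old else token for token in csv.split(","))


for row in ["41,46", "31,41,48,55,58,121,122", "31,60,41", "41"]:
    print(row, "->", replace_padded(row, "41", "17"))
```

One edge case: when the number appears twice in a row (41,41), the padded replace in this sketch only changes the first one, because the shared comma is consumed by the first match. The split version does not have that issue. It may be worth checking whether the SQL version behaves the same way on your data.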

Removing quotes and spaces in SAS dataset

I am working in SAS EG and DI, facing a very peculiar problem.
When I look into a column of a dataset in SAS DI Studio or EG, it is appearing fine. But when I paste the data into notepad, some quotes and spaces are appearing.
The data which I am seeing in EG:
But when the same data is copied into Notepad, extra quotes and spaces appear like this (in the 6th row):
I found this problem while using this field as a key in a join: the other related column values for the 6th row do not reach the output, because the match fails for that record.
I tried many things like tranwrd, dequote and compress, but none of them changes my result.
Can someone please help me understand what the problem is and how it can be solved?
Take a look at what is in the column so that you can decide how to handle it. This query will show you both the character string and the Hexadecimal representation of the string.
proc sql;
select postcode,put(trim(postcode),$hex.) as hexcode,count(*) as nobs
from x
group by 1,2
;
quit;
So if you see hex characters like 0A, 0D, A0, 08 or other non-printable codes then you can figure out what is happening.
So you might see that you have POSTCODE='LS5 3BT' with HEXCODE='4C533520334254' for most of the records. But you may have some records that look like POSTCODE='LS5 3BT' while the value of HEXCODE is something like '0A4C533520334254', which would mean that there is a linefeed character at the beginning of the string. Or perhaps instead of a space ('20'X) you have a tab ('09'X) in the middle of the string.
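To make the hex comparison concrete, here is a short Python sketch that prints the hex codes for a clean postcode and for two variants with hidden characters. The function names are assumptions for this example, and clean_key is only one possible cleanup once you know which characters are in the data:

```python
import re


def hex_dump(value: str) -> str:
    """Return the uppercase hex codes of a string, two digits per byte (Latin-1)."""
    return value.encode("latin-1").hex().upper()


def clean_key(value: str) -> str:
    """Hypothetical cleanup: turn control characters and non-breaking spaces into spaces, then trim."""
    return re.sub(r"[\x00-\x1f\xa0]", " ", value).strip()


print(hex_dump("LS5 3BT"))     # 4C533520334254
print(hex_dump("\nLS5 3BT"))   # 0A4C533520334254 (leading linefeed)
print(hex_dump("LS5\t3BT"))    # 4C533509334254 (tab instead of space)
print(clean_key("\nLS5 3BT") == "LS5 3BT")
```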

Using regexp_extract in Hive

I am trying to find the rows from a hive table where a particular column does not contain null values or \N values or STX character '\002'. The objective is to find which rows contain some characters other than these three.
I tried this hive query:
select column1,length(regexp_replace(column1,'\N|\002|NULL','')) as value
FROM table1 LIMIT 10;
I was expecting zero in the following cases but I am getting the following:
column1            value
NULL               NULL
                   0
NULL               NULL
                   0
\N\N\N\N\N\N\N\N   8
NULL               NULL
\N\N\N\N\N\N\N\N   8
NULL               NULL
NULL               NULL
\N\N\N             3
Could someone please help me on the correct regex for the above case?
Thank you.
Ravi
It looks like Hive uses Java's regular expression engine, so the problem seems to be in the regex itself, more specifically in the escape sequences.
Try the following, and please let me know if it doesn't work:
(?:(?:\\\\N)+|\002|NULL)
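The escaping issue can be illustrated with a short Python sketch. Python's re module is used here as a stand-in, not Java's engine, and the function name and the choice to count a missing value as 0 are assumptions for this example. The key point is that the backslash in \N has to be escaped so the pattern matches a literal backslash followed by N:

```python
import re
from typing import Optional

# Literal backslash + N, the STX control character, or the text NULL.
PLACEHOLDERS = re.compile(r"(?:\\N|\x02|NULL)")


def residual_length(value: Optional[str]) -> int:
    """Count the characters left after removing the three placeholder forms."""
    if value is None:          # assumption: a missing value counts as "nothing left"
        return 0
    return len(PLACEHOLDERS.sub("", value))


print(residual_length("\\N" * 8))   # 0
print(residual_length("\x02"))      # 0
print(residual_length("abc\\N"))    # 3
```

The 8 returned for the eight-times \N row in the question would be consistent with only the N being removed and the backslashes left behind, though that is a guess from the output alone.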