BigQuery Regex for Date ETL

I have data with date values imported into BigQuery in formats such as 2/13/2016, 3/4/2012, etc.
I want to convert them into a zero-padded date format like 02-13-2016 and 03-04-2012.
I want to use a query to create a new column, and to use regex to do it.
I know the regex to match the first part (2) of 2/4/2012 will be something like
^(\d{1})(/|-)
The regex to match the second part, together with the surrounding /, would be
(/)(\d{1})(/)
I am wondering how to use these two regexes with REGEXP_EXTRACT and REGEXP_REPLACE to create a new column with these dates in the correct format.

It might be easiest just to convert to a DATE type column. For example:
#standardSQL
SELECT
  PARSE_DATE('%m/%d/%Y', date_string) AS date
FROM (
  SELECT '2/13/2016' AS date_string UNION ALL
  SELECT '3/4/2012' AS date_string
);
Another option, if you want to keep the dates as strings, is to use REPLACE. Note that this only swaps the separator and does not zero-pad the month or day:
#standardSQL
SELECT
  REPLACE(date_string, '/', '-') AS date
FROM (
  SELECT '2/13/2016' AS date_string UNION ALL
  SELECT '3/4/2012' AS date_string
);
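The question asks for zero-padded strings such as 02-13-2016, which a plain separator swap does not produce. The following is a minimal sketch of the padding logic in Python, used here only to illustrate the regex; it is not BigQuery syntax, and the helper name `pad_and_dash` is hypothetical. In BigQuery itself, wrapping the PARSE_DATE call above in FORMAT_DATE('%m-%d-%Y', ...) should yield the same strings.

```python
import re

# Matches M/D/YYYY where month and day may have one or two digits.
DATE_RE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")


def pad_and_dash(date_string: str) -> str:
    """Turn '2/13/2016' into '02-13-2016' (hypothetical helper)."""
    match = DATE_RE.fullmatch(date_string.strip())
    if match is None:
        raise ValueError(f"not an M/D/YYYY date: {date_string!r}")
    month, day, year = match.groups()
    # zfill left-pads single-digit parts with a zero.
    return f"{month.zfill(2)}-{day.zfill(2)}-{year}"


for s in ("2/13/2016", "3/4/2012"):
    print(s, "->", pad_and_dash(s))
```

Capturing the three parts in one pattern avoids chaining separate regexes for the month and the day.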

Related

BigQuery regexp replace character between quotes

I'm trying to use the BigQuery function regexp_replace for the following scenario:
Given a string field with comma as a delimiter, I need to remove only the commas within double quotes.
I found a regex that works on regex101 (linked below), but it seems that the BigQuery function doesn't support lookahead groups. Could you please help me find an equivalent expression that BigQuery's regexp_replace does support?
https://regex101.com/r/nxkqtb/3
BigQuery example code (not supported):
WITH tbl AS (
  SELECT 'LINE_NR="1",TXT_FIELD="Some text",CID="0"' as text
  UNION ALL
  SELECT 'LINE_NR="2",TXT_FIELD=",,Some text",CID="0"' as text
  UNION ALL
  SELECT 'LINE_NR="3",TXT_FIELD="Some text ,",CID="0"' as text
  UNION ALL
  SELECT 'LINE_NR="4",TXT_FIELD=",Some ,text,",CID="0"' as text
)
SELECT
  REGEXP_REPLACE(text, r'(?m),(?=[^"]*"(?:[^"\r\n]*"[^"]*")*[^"\r\n]*$)', "")
FROM tbl;
Thank you
Consider the approach below (assuming you know the keys within the text field in advance):
select text,
  ( select string_agg(replace(kv, ',', ''), ',' order by offset)
    from unnest(regexp_extract_all(text, r'((?:LINE_NR|TXT_FIELD|CID)=".*?")')) kv with offset
  ) corrected_text
from tbl;
Applied to the sample data in your question, this removes the commas inside the quoted values and keeps the commas that separate the key="value" pairs.
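The extract, clean, and re-join idea can be checked outside BigQuery. Below is a minimal Python sketch of the same three steps, assuming the same known keys; the function name `strip_quoted_commas` is hypothetical, and Python's `re` module stands in for BigQuery's regex engine purely for illustration.

```python
import re

# Keys are assumed to be known in advance, as in the query above.
KEYS = ("LINE_NR", "TXT_FIELD", "CID")
PAIR_RE = re.compile(r'((?:%s)=".*?")' % "|".join(KEYS))


def strip_quoted_commas(text: str) -> str:
    """Drop commas inside each key="value" pair, then re-join with commas."""
    pairs = PAIR_RE.findall(text)  # pairs come back in left-to-right order
    return ",".join(pair.replace(",", "") for pair in pairs)


rows = [
    'LINE_NR="1",TXT_FIELD="Some text",CID="0"',
    'LINE_NR="2",TXT_FIELD=",,Some text",CID="0"',
    'LINE_NR="3",TXT_FIELD="Some text ,",CID="0"',
    'LINE_NR="4",TXT_FIELD=",Some ,text,",CID="0"',
]
for row in rows:
    print(strip_quoted_commas(row))
```

Because the delimiter commas sit between the extracted pairs, they are never passed to the replace step; they are added back by the join.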

Regular Expression in redshift

I have data that is being fed in the format below:
2016-006-011 04:58:22.058
This is an incorrect date/timestamp format, and I need to convert it to the correct one:
2016-06-11 04:58:22.058
I'm trying to achieve this using regex in Redshift. Is there a way to remove the additional zero (0) in the month and day portions using regex? I need something generic, not tailored to this example alone, as the date will vary.
The function regexp_replace() (see documentation) should do the trick:
select
regexp_replace(
'2016-006-011 04:58:22.058' -- use your date column here instead
, '\-0([0-9]{2}\-)0([0-9]{2})' -- matches "-006-011", captures "06-" in $1, "11" in $2
, '-$1$2' -- inserts $1 and $2 to give "-06-11"
)
;
And so the result is, as required:
regexp_replace
-------------------------
2016-06-11 04:58:22.058
(1 row)
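To see what the pattern captures, here is a minimal sketch of the same substitution in Python, for illustration only. Python's `re.sub` writes backreferences as `\1` and `\2` rather than `$1` and `$2`, and the function name `fix_timestamp` is hypothetical.

```python
import re

# "-0" + two digits and a dash (group 1) + "0" + two digits (group 2)
EXTRA_ZERO_RE = re.compile(r"-0([0-9]{2}-)0([0-9]{2})")


def fix_timestamp(raw: str) -> str:
    """Drop the extra leading zero from the month and day parts."""
    # count=1 limits the rewrite to the first (date) occurrence.
    return EXTRA_ZERO_RE.sub(r"-\1\2", raw, count=1)


print(fix_timestamp("2016-006-011 04:58:22.058"))
```

The pattern is not tied to the sample values: any three-digit month and day that start with 0 are rewritten the same way.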

Hive - regexp_replace function for multiple strings

I am using Hive 0.13! I want to find multiple tokens like "hip hop" and "rock music" in my data and replace them with "hiphop" and "rockmusic", that is, the same phrases without the whitespace. I have used the regexp_replace function in Hive. Below is my query; it works great for the two examples above.
drop table vp_hiphop;
create table vp_hiphop as
select userid, ntext,
regexp_replace(regexp_replace(ntext, 'hip hop', 'hiphop'), 'rock music', 'rockmusic') as ntext1
from vp_nlp_protext_males
;
But I have 100 such bigrams/n-grams and want to do the replacement efficiently by just removing the whitespace. I can pattern-match the phrases (hip hop and rock music), but in the replacement I simply want to strip the whitespace. Below is what I tried. I also tried using trim with regexp_replace, but regexp_replace requires a third argument.
drop table vp_hiphop;
create table vp_hiphop as
select userid, ntext,
regexp_replace(ntext, '(hip hop)|(rock music)') as ntext1
from vp_nlp_protext_males
;
You can strip every occurrence of a character from a string by using the TRANSLATE function to replace it with the empty string. Note that this removes all spaces in ntext, not only the ones inside the listed phrases. For your query it would become this:
drop table vp_hiphop;
create table vp_hiphop as
select userid, ntext,
translate(ntext, ' ', '') as ntext1
from vp_nlp_protext_males
;
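If only the spaces inside the listed phrases should be removed, the logic looks roughly like the Python sketch below. This is an illustration of the idea under stated assumptions, not Hive code: the phrase list and the function name `join_phrases` are hypothetical, and Hive's regexp_replace has no callback form like the one used here.

```python
import re

# Hypothetical phrase list; in practice this would hold all ~100 n-grams.
PHRASES = ["hip hop", "rock music"]

# Longest phrases first, so a longer n-gram wins over a shorter prefix.
PHRASE_RE = re.compile(
    "|".join(re.escape(p) for p in sorted(PHRASES, key=len, reverse=True))
)


def join_phrases(text: str) -> str:
    """Remove spaces only inside the listed phrases."""
    return PHRASE_RE.sub(lambda m: m.group(0).replace(" ", ""), text)


print(join_phrases("i like hip hop and rock music a lot"))
```

A single alternation keeps the work to one pass over the text instead of one nested replace call per phrase.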

Oracle SQL regexp date formatting

I'm new to Oracle and am trying to select a badly formatted date in a cleaned-up form.
For example,
my field is: 12.05.2010 dfsafs()F(Gf, 12:45
Can I select it as 12.05.2010 12:45 with regexp or something else?
Thanks
Use the regex below to match the date and time parts.
[0-9]{2}\.[0-9]{2}\.[0-9]{4}|[0-9]{2}:[0-9]{2}
In Oracle the curly braces do not need to be escaped, so the pattern can be used as written.
Something like this should work:
select regexp_substr(dat,'.*(\d{2}\.\d{2}\.\d{4}).*',1,1,'i',1) ||' '||
regexp_substr(dat,'.*(\d{2}:\d{2}).*',1,1,'i',1) datetime
from
(select '12.05.2010 dfsafs()F(Gf, 12:45' dat from dual);
Note that I extract the date and the time separately with regexp_substr and then concatenate the two values.
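The extract-and-concatenate idea can be sketched in Python as follows, for illustration only; the function name `clean_datetime` is hypothetical. One difference from the Oracle query: because of its leading greedy `.*`, the query picks the last match in the string, while `search` here picks the first. For the sample value both give the same result.

```python
import re
from typing import Optional

DATE_RE = re.compile(r"\d{2}\.\d{2}\.\d{4}")
TIME_RE = re.compile(r"\d{2}:\d{2}")


def clean_datetime(raw: str) -> Optional[str]:
    """Return 'DD.MM.YYYY HH:MM', or None if either part is missing."""
    date = DATE_RE.search(raw)
    time = TIME_RE.search(raw)
    if date is None or time is None:
        return None
    return f"{date.group(0)} {time.group(0)}"


print(clean_datetime("12.05.2010 dfsafs()F(Gf, 12:45"))
```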

regular expression search in notepad++ or text pad

I want to be able to search for a pattern like 'CREATE TABLE ' followed by any text, including newlines, and ending with );
So it should be able to select the following two create table statements, one after the other.
create table tab1 ( col1 number,
col2 date);
create table tab3 ( col1 number,
col2 date,
col3 number);
I tried create table .* but I was not able to include the newlines.
Thanks.
This should do it:
create table [^;]*;
Check the ". matches newline" checkbox.
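To check the pattern outside the editor, here is a minimal Python sketch run against the two sample statements. It is for illustration only; the variable names are hypothetical. The negated class `[^;]` also matches newline characters, which is why each match can span several lines.

```python
import re

SQL = """create table tab1 ( col1 number,
col2 date);
create table tab3 ( col1 number,
col2 date,
col3 number);
"""

# IGNORECASE lets the same pattern find 'CREATE TABLE' as well.
CREATE_RE = re.compile(r"create table [^;]*;", re.IGNORECASE)

statements = CREATE_RE.findall(SQL)
for number, statement in enumerate(statements, start=1):
    print(f"--- statement {number} ---")
    print(statement)
```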