Requirement
There is one requirement: regexp_replace has to repeat the substring once more for every ',,' it contains.
For example, (,xyz,,12,).* has to be replaced with (,xyz,,12,).*(,xyz,,12,).*
Example:
Input : (.*(,ELF,,NLF,).*)#(.*(,ABC,,CDF,,SDE,).*)
Output : (.*(,ELF,,NLF,).*(,ELF,,NLF,).*)#(.*(,ABC,,CDF,,SDE,).*(,ABC,,CDF,,SDE,).*(,ABC,,CDF,,SDE,).*)
Please help. Can this be done using regexp_replace?
Yes it can. Try this:
select regexp_replace('(.*(,ELF,,NLF,).*)#(.*(,ABC,,CDF,,SDE,).*)'
,'(\.\*[A-Z,\(\)]+)\.\*\)#\((\.\*[A-Z,\(\)]+)\.\*\)'
,'\1\1.*)#(\2\2\2.*)')
from dual;
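This does exactly what you posted in your example. Note that the repeat counts are hard-coded in the replacement (\1\1 and \2\2\2), so it fits only inputs shaped like this one.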
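To check the rule outside the database, here is a minimal sketch in Python (not Oracle SQL) that derives the repeat count from the text itself. The function name duplicate_units is hypothetical, and the sketch assumes every unit looks like .*(,A,,B,) as in the example from the question.

```python
import re

# A "unit" is a literal ".*" followed by a parenthesised list such as "(,ELF,,NLF,)".
UNIT = re.compile(r"\.\*\(,[^()]*,\)")


def duplicate_units(text: str) -> str:
    """Repeat each unit once more for every ',,' it contains."""

    def repeat(match: re.Match) -> str:
        unit = match.group(0)
        # One ',,' gives two copies, two ',,' give three copies, and so on.
        return unit * (unit.count(",,") + 1)

    return UNIT.sub(repeat, text)


print(duplicate_units("(.*(,ELF,,NLF,).*)#(.*(,ABC,,CDF,,SDE,).*)"))
```

A callback is used because a plain replacement string cannot express a variable number of repetitions.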
Related
We have ranges like this:
"0,5, 0,5"
"0,112, 0,118"
and want to split at the second comma.
Any ideas?
You can change the regex you split on to a comma followed by a space.
select regexp_split_to_array('0,112, 0,118', ', ')
Assuming there is always at least one space after the second comma and none after the others, you could use this as the split regex:
SELECT
regexp_split_to_array(ranges, ',\s+')
FROM
t
This returns an array like {"0,5","0,5"}.
You can split both ranges into columns using a subquery:
SELECT
r[1],
r[2]
FROM (
SELECT
regexp_split_to_array(ranges, ',\s+') as r
FROM
t
) s
Edit:
The OP wants to get everything after the second comma. So you need a splitting regex that finds the nth (here n = 2) occurrence of a comma:
(?:(^.*?,.*)),
This can be used to query the required data:
SELECT
(regexp_split_to_array(ranges, '(?:(^.*?,.*)),'))[2]
FROM
t
Use regexp_replace:
select regexp_replace('0,112, 0,118', '.*,\s+', '') as foo;
Output:
foo
-------
0,118
(1 row)
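The two ideas above (splitting on a comma followed by whitespace, and deleting everything up to that comma) can be checked outside the database with a small Python sketch. This is only an illustration with Python's re module, not PostgreSQL, and it assumes the sample value from the question.

```python
import re

value = "0,112, 0,118"

# Split on a comma followed by whitespace; the decimal commas have no space after them.
parts = re.split(r",\s+", value)

# Remove everything up to and including the last comma that is followed by whitespace.
second = re.sub(r".*,\s+", "", value)

print(parts)
print(second)
```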
Thank you all for the quick answers. It finally worked using this:
(regexp_matches(your_string_value, '\d+[,|.]\d+|\d+','g'))[1]
This helped me get rid of all unnecessary characters within the values and returned the second value in the range.
I've been trying to join two tables 'A' and 'B' on a column, say 'Col1'. The problem I'm facing is that the data in the two columns is formatted differently. For example: 'A - Air' comes in as 'A-Air', 'B - Air' comes in as 'B-Air', etc.
Therefore, I'm trying to remove the white space from the data in Col1 of A, but I'm not able to remove it using any function given in the AWS documentation. I've tried TRIM and REPLACE, but they won't work in this case. This might be achievable using regular expressions, but I can't work out how. Below is a snippet of how I tried using a regex, which didn't work.
select Col1, regexp_replace( Col1, '#.*\\.( )$')
from A
WHERE
date = TO_DATE('2020/08/01', 'YYYY/MM/DD')
limit 5
Please let me know how I can remove the spaces from a string using regular expressions or any other means in Redshift.
Col1, regexp_replace( Col1,'\\s','')
This worked for me.
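As a quick check of what the \s pattern does, here is a minimal Python sketch (not Redshift SQL). The helper name strip_whitespace is hypothetical, and the sample values are the ones from the question.

```python
import re


def strip_whitespace(value: str) -> str:
    # \s matches any whitespace character; every occurrence is replaced with nothing.
    return re.sub(r"\s", "", value)


print(strip_whitespace("A - Air"))
print(strip_whitespace("B - Air"))
```

Once both sides are normalized this way, the values from A and B compare equal.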
I have some URLs in my cas_fnd_dwd_det table:
casi_imp_urls cas_code
----------------------------------- -----------
www.casiac.net/fnds/CASI/qnxp.pdf
www.casiac.net/fnds/casi/as.pdf
www.casiac.net/fnds/casi/vindq.pdf
www.casiac.net/fnds/CASI/mnip.pdf
How do I copy the letters between the last '/' and '.pdf' to another column?
Expected outcome:
casi_imp_urls cas_code
----------------------------------- -----------
www.casiac.net/fnds/CASI/qnxp.pdf qnxp
www.casiac.net/fnds/casi/as.pdf as
www.casiac.net/fnds/casi/vindq.pdf vindq
www.casiac.net/fnds/CASI/mnip.pdf mnip
The URL prefixes below are static:
www.casiac.net/fnds/CASI/
www.casiac.net/fnds/casi/
Please advise: how do I select the codes between the last '/' and '.pdf'?
I recommend taking a look at REGEXP_SUBSTR, which lets you apply a regular expression. Db2 has string processing functions, but the regex function may be the easiest solution. See SO question on regex and URI parts for different ways of writing the expression. The following returns the last slash, the file name and the extension:
SELECT REGEXP_SUBSTR('http://fobar.com/one/two/abc.pdf','\/(\w)*\.pdf' ,1,1)
FROM sysibm.sysdummy1
/abc.pdf
The following uses REGEXP_REPLACE; the pattern is from this SO question, with the pdf file extension added. It splits the string into three parts: everything up to the last slash (a non-capturing group), then the file name (group 1), then ".pdf" (group 2). The '$1' returns group 1; group 0 is the whole match.
SELECT REGEXP_REPLACE('http://fobar.com/one/two/abc.pdf','(?:.+\/)(.+)(\.pdf)','$1' ,1,1)
FROM sysibm.sysdummy1
abc
You could apply LENGTH and SUBSTR to extract the relevant part or try to build that into the regex.
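To see the extraction on the URLs from the question, here is a minimal Python sketch (not Db2 SQL). The helper name cas_code and the choice to ignore case on the extension are assumptions made for illustration.

```python
import re
from typing import Optional

# Capture what sits between the last '/' and a trailing ".pdf" (any letter case).
FILE_CODE = re.compile(r"/([^/]+)\.pdf$", re.IGNORECASE)


def cas_code(url: str) -> Optional[str]:
    match = FILE_CODE.search(url)
    # Return None when the URL does not end in a file name plus ".pdf".
    return match.group(1) if match else None


for url in (
    "www.casiac.net/fnds/CASI/qnxp.pdf",
    "www.casiac.net/fnds/casi/as.pdf",
    "www.casiac.net/fnds/casi/",
):
    print(url, cas_code(url))
```

Using [^/]+ rather than .+ keeps the capture from reaching back across an earlier slash.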
For Db2 versions older than 11.1: I'm not sure whether this works on 9.5, but it should work from 9.7 onward.
Try this as is.
with cas_fnd_dwd_det (casi_imp_urls) as (values
'www.casiac.net/fnds/CASI/qnxp.pdf'
, 'www.casiac.net/fnds/casi/as.pdf'
, 'www.casiac.net/fnds/casi/vindq.pdf'
, 'www.casiac.net/fnds/CASI/mnip.PDF'
)
select
casi_imp_urls
, xmlcast(xmlquery('fn:replace($s, ".*/(.*)\.pdf", "$1", "i")' passing casi_imp_urls as "s") as varchar(50)) cas_code
from cas_fnd_dwd_det
I am using BigQuery on Google Cloud Platform to extract data from GDELT. This uses an SQL syntax and regular expressions.
I have a column of data (called V2Tone), in which each cell looks like this:
1.55763239875389,2.80373831775701,1.24610591900312,4.04984423676012,26.4797507788162,2.49221183800623,299
To select only the first number (i.e., the number before the first comma) using regular expressions, we use this:
regexp_replace(V2Tone, r',.*', '')
How can we select only the second number (i.e., the number between the first and second commas)?
How about the third number (i.e., the number between the second and third commas)?
I understand that re2 syntax (https://github.com/google/re2/wiki/Syntax) is used here, but my understanding of how to put that all together is limited.
If anything is unclear, please let me know. Thank you for your help as I learn to use regular expressions.
The example below is for BigQuery Standard SQL and uses a very simple SPLIT approach:
#standardSQL
SELECT
SPLIT(V2Tone)[SAFE_OFFSET(0)] first_number,
SPLIT(V2Tone)[SAFE_OFFSET(1)] second_number,
SPLIT(V2Tone)[SAFE_OFFSET(2)] third_number
FROM `project.dataset.table`
If for some reason you need or want to use a regexp here, use the following:
#standardSQL
SELECT
REGEXP_EXTRACT(V2Tone, r'^(.*?),') first_number,
REGEXP_EXTRACT(V2Tone, r'^(?:(?:.*?),)(.*?),') second_number,
REGEXP_EXTRACT(V2Tone, r'^(?:(?:.*?),){2}(.*?),') third_number,
REGEXP_EXTRACT(V2Tone, r'^(?:(?:.*?),){4}(.*?),') fifth_number
FROM `project.dataset.table`
Note the use of REGEXP_EXTRACT instead of REGEXP_REPLACE.
You can test the options above with the sample string from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '1.55763239875389,2.80373831775701,1.24610591900312,4.04984423676012,26.4797507788162,2.49221183800623,299' V2Tone
)
SELECT
SPLIT(V2Tone)[SAFE_OFFSET(0)] first_number,
SPLIT(V2Tone)[SAFE_OFFSET(1)] second_number,
SPLIT(V2Tone)[SAFE_OFFSET(2)] third_number,
REGEXP_EXTRACT(V2Tone, r'^(.*?),') first_number_re,
REGEXP_EXTRACT(V2Tone, r'^(?:(?:.*?),)(.*?),') second_number_re,
REGEXP_EXTRACT(V2Tone, r'^(?:(?:.*?),){2}(.*?),') third_number_re,
REGEXP_EXTRACT(V2Tone, r'^(?:(?:.*?),){4}(.*?),') fifth_number_re
FROM `project.dataset.table`
with output:
first_number second_number third_number first_number_re second_number_re third_number_re fifth_number_re
1.55763239875389 2.80373831775701 1.24610591900312 1.55763239875389 2.80373831775701 1.24610591900312 26.4797507788162
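The same "skip n-1 fields, then capture one" idea can be tried outside BigQuery with a small Python sketch. This is an illustration only: the helper name nth_field is hypothetical, and the pattern is a variant that uses [^,]* instead of .*? so that it also returns the last field, which has no trailing comma.

```python
import re
from typing import Optional

V2TONE = ("1.55763239875389,2.80373831775701,1.24610591900312,"
          "4.04984423676012,26.4797507788162,2.49221183800623,299")


def nth_field(csv_text: str, n: int) -> Optional[str]:
    """Return the n-th comma-separated field (1-based), or None if it is missing."""
    # Skip n-1 "field plus comma" groups, then capture the next field.
    match = re.match(r"^(?:[^,]*,){%d}([^,]*)" % (n - 1), csv_text)
    return match.group(1) if match else None


print(nth_field(V2TONE, 2))
print(nth_field(V2TONE, 3))
# The split-based equivalent for the third field:
print(V2TONE.split(",")[2])
```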
I don't know of a single regex replace that could isolate a single number in your CSV string because, in general, we need to remove things on both sides of the match. But we can chain together two calls to regexp_replace. For example, if you wanted to target the third number in the CSV string, you could try this:
regexp_replace(regexp_replace(V2Tone, r'^(?:(?:\d+(?:\.\d+)?),){2}', ''),
r',.*', '')
The pattern I am using to strip off the first n numbers is this:
^(?:(?:\d+(?:\.\d+)?),){n}
This just removes a number, followed by a comma, n times, from the beginning of the string.
Here is a solution with a single regex replace:
^([^,]+(?:,|$)){2}([^,]+(?:,|$))*|^.*$
Demo
\n is added to the negated character class in the demo to avoid matching across lines in multiline (m) mode.
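Usage (BigQuery's REGEXP_REPLACE refers to captured groups as \1 in the replacement, whereas many regex testers write $1):
regexp_replace(V2Tone, r'^([^,]+(?:,|$)){2}([^,]+(?:,|$))*|^.*$', r'\1')
Explanation:
([^,]+(?:,|$)){n} matches everything up to the next comma or the end of the string, n times; the group keeps the last (nth) repetition
([^,]+(?:,|$))* matches the rest, 0 or more times
^.*$ matches everything if we cannot match n times
And then, finally, we can reinsert the nth match using \1. Note that the captured text includes its trailing comma.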
I am trying to replace part of a string using regexp_replace in PL/SQL and am not getting the desired output. I am new to this. Please advise where I am going wrong.
names := 'table_200_file1_record1.column1 table_200_file2_record2.column2'
SELECT REGEXP_REPLACE(names,'([table_200]*[.]*){1,}','') FROM DUAL;
Desired output (I want to remove everything before the . operator that starts with table_200):
column1 column2
You need to replace everything that's not a dot after table_200, up to the first dot you find, i.e.:
SELECT REGEXP_REPLACE('table_200_file1_record1.column1 table_200_file2_record2.column2','table_200[^\.]+(\.)','') FROM DUAL
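The same replacement can be tried outside Oracle with a minimal Python sketch. It is an illustration with Python's re module rather than PL/SQL, and it assumes every table_200 prefix is followed by a dot, as in the sample string.

```python
import re

names = "table_200_file1_record1.column1 table_200_file2_record2.column2"

# Remove "table_200", the run of non-dot characters after it, and the dot itself.
cleaned = re.sub(r"table_200[^.]+\.", "", names)

print(cleaned)
```

The negated class [^.]+ stops at the first dot, so each prefix is removed separately and the column names are kept.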