Finding strings that have a repeated pattern in Snowflake - regex

I am trying to find the strings in a Snowflake table that contain a repeated pattern, and I would like to do it with a regex.
Example:
Strings: 'abc', 'abcabc', 'snowsnowflake'
The query should return only 'abcabc' and 'snowsnowflake', because they contain a repeated pattern.
Thank you.

I couldn't make it work with plain regex in SQL, but I was able to create a JavaScript UDF that returns the desired results:
create or replace function find_repeated("x" string)
returns string
language javascript
as
$$
// (.+)\1 matches any substring that is immediately followed by a copy of itself;
// match() returns null when no such substring exists
return x.match(/(.+)\1/g)
$$;
select x.value
, find_repeated(x.value)
, find_repeated(x.value) is not null has_repeated
from table(split_to_table('abc,abcabc,snowsnowflake', ',')) x;
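To make the backreference idea concrete outside Snowflake, here is a minimal Python sketch of the same `(.+)\1` test. The helper name `has_repeated` is hypothetical, and Python's `re` module only stands in for the JavaScript regex engine that the UDF uses.

```python
import re

# (.+) captures one or more characters; \1 requires the same text again right after it.
REPEATED = re.compile(r"(.+)\1")

def has_repeated(value: str) -> bool:
    """Return True if some substring is immediately followed by a copy of itself."""
    return REPEATED.search(value) is not None

values = ["abc", "abcabc", "snowsnowflake"]
matches = [v for v in values if has_repeated(v)]
print(matches)
```

Note that a doubled letter, such as the "oo" in "book", also counts as a repeated pattern here. If single-character repeats should be ignored, a stricter pattern such as `(.{2,})\1` may be closer to what is wanted.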

Adding nulls to dataframe output with regexp replace in Spark 2.4

I am trying to use regexp_replace to add the string "null" to the output. I am using Spark 2.4 with Scala in AWS Glue. What is the best approach for this problem?
I create a DataFrame with a select and then go through the columns that need "null" added:
var select_df = raw_df.select(
col("example_column_1"),
col("example_column_2"),
col("example_column_3")
)
Input of example_column_1
#;#;Runner#;#;bob
Desired Output of example_column_1
null#;null#;Runner#;null#;bob
Attempt:
select_df.withColumn("example_column_1", regexp_replace(col("example_column_1"), "", "null"))
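The task can be split into two parts:
replace the # at the beginning of the string
replace all occurrences of ;#
// The 'example_column_1 symbol syntax needs the session implicits in scope,
// e.g. import spark.implicits._ (assuming the SparkSession is named spark)
select_df
.withColumn("example_column_1", regexp_replace('example_column_1, "^#", "null#"))
.withColumn("example_column_1", regexp_replace('example_column_1, ";#", ";null#"))
.show(false)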
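The two replacement steps can be checked on the sample value without a Spark cluster. The following is a small Python sketch of the same logic, not Spark code; the function name `add_nulls` is made up for illustration.

```python
import re

def add_nulls(value: str) -> str:
    """Insert the text 'null' in front of every '#' that marks an empty field."""
    # Step 1: a '#' at the very start of the string means the first field is empty.
    value = re.sub(r"^#", "null#", value)
    # Step 2: a '#' directly after a ';' separator means that field is empty.
    return re.sub(r";#", ";null#", value)

result = add_nulls("#;#;Runner#;#;bob")
print(result)
```

Fields that already have a value, such as Runner, are left alone because their '#' follows neither the start of the string nor a ';'.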

Select the next line of the matched pattern in clob column using oracle regular expression

I have a CLOB column "details" in table xxx. I want to select the line that follows a matched pattern using a regex.
The input text (CLOB data) looks like this, with each item on its own line:
MODEL_DATA 1
TEST1:
NONE
TEST2:
NONE
INFO:
SERVICES,VALUED-YES
TYPE:
NONE
I tried to use INFO as the pattern and retrieve the line after it, but I was not able to do it with the regular expression functions. Please help me resolve this.
Output:
SERVICES,VALUED-YES
You can use the following to get the details (ltrim strips the line break that is left over once 'INFO:' has been removed):
select ltrim(replace(regexp_substr(details,'INFO:'||chr(10)||'.+'),'INFO:'),chr(10))
from your_table;
To be independent of the operating system's line endings, you can also try:
select ltrim(replace(regexp_substr(details,'INFO:('||chr(10)||'|'||chr(13)||chr(10)||').+'),'INFO:'),chr(13)||chr(10))
from your_table;
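The same "line after the label" idea can be sketched in Python to see how the optional carriage return is handled. This is only an illustration of the regex, not Oracle code; `line_after` is a hypothetical helper name.

```python
import re

def line_after(label: str, text: str):
    """Return the line that follows `label`, or None if the label is absent."""
    # \r?\n accepts both Unix (LF) and Windows (CRLF) line endings;
    # [^\r\n]+ captures the next line without its line break.
    found = re.search(re.escape(label) + r"\r?\n([^\r\n]+)", text)
    return found.group(1) if found else None

details = "MODEL_DATA 1\nTEST1:\nNONE\nTEST2:\nNONE\nINFO:\nSERVICES,VALUED-YES\nTYPE:\nNONE"
print(line_after("INFO:", details))
```

Using a capture group means the label and the line break never end up in the result, so no follow-up replace or trim is needed.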

How to split a string in db2?

I have some URLs in my cas_fnd_dwd_det table:
casi_imp_urls cas_code
----------------------------------- -----------
www.casiac.net/fnds/CASI/qnxp.pdf
www.casiac.net/fnds/casi/as.pdf
www.casiac.net/fnds/casi/vindq.pdf
www.casiac.net/fnds/CASI/mnip.pdf
How do I copy the letters between the last '/' and '.pdf' to another column?
Expected outcome:
casi_imp_urls cas_code
----------------------------------- -----------
www.casiac.net/fnds/CASI/qnxp.pdf qnxp
www.casiac.net/fnds/casi/as.pdf as
www.casiac.net/fnds/casi/vindq.pdf vindq
www.casiac.net/fnds/CASI/mnip.pdf mnip
The URL prefixes below are static:
www.casiac.net/fnds/CASI/
www.casiac.net/fnds/casi/
Please advise: how do I select the codes between the last '/' and '.pdf'?
I would recommend taking a look at REGEXP_SUBSTR, which lets you apply a regular expression. Db2 has other string processing functions, but the regex function may be the easiest solution. See the SO question on regex and URI parts for different ways of writing the expression. The following returns the last slash, the file name and the extension:
SELECT REGEXP_SUBSTR('http://fobar.com/one/two/abc.pdf','\/(\w)*\.pdf' ,1,1)
FROM sysibm.sysdummy1
/abc.pdf
The following uses REGEXP_REPLACE; the pattern is from this SO question with the pdf file extension added. It splits the string into three parts: everything up to the last slash (a non-capturing group), then the file name, then the ".pdf". '$1' returns group 1, the file name (group 0 is the whole match). Group 2 would be the ".pdf".
SELECT REGEXP_REPLACE('http://fobar.com/one/two/abc.pdf','(?:.+\/)(.+)(\.pdf)','$1' ,1,1)
FROM sysibm.sysdummy1
abc
You could also apply LENGTH and SUBSTR to extract the relevant part, or try to build that into the regex.
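As a quick way to experiment with the capture-group approach on the question's URLs, here is a Python sketch. It is not Db2 SQL, and the name `extract_code` is hypothetical; the pattern is a slightly tighter variant that anchors on the last slash and ignores the letter case of the extension.

```python
import re

# Capture everything between the last '/' and a trailing '.pdf' (any letter case).
CODE = re.compile(r"/([^/]+)\.pdf$", re.IGNORECASE)

def extract_code(url: str):
    """Return the file name without its .pdf extension, or None if there is none."""
    found = CODE.search(url)
    return found.group(1) if found else None

urls = [
    "www.casiac.net/fnds/CASI/qnxp.pdf",
    "www.casiac.net/fnds/casi/as.pdf",
    "www.casiac.net/fnds/casi/vindq.pdf",
    "www.casiac.net/fnds/CASI/mnip.pdf",
]
codes = [extract_code(u) for u in urls]
print(codes)
```

Because `[^/]+` cannot cross a slash and `$` pins the match to the end of the string, only the final path segment can be captured.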
For Db2 versions older than 11.1: I'm not sure whether this works on 9.5, but it should definitely work from 9.7 onward.
Try this as is:
with cas_fnd_dwd_det (casi_imp_urls) as (values
'www.casiac.net/fnds/CASI/qnxp.pdf'
, 'www.casiac.net/fnds/casi/as.pdf'
, 'www.casiac.net/fnds/casi/vindq.pdf'
, 'www.casiac.net/fnds/CASI/mnip.PDF'
)
select
casi_imp_urls
, xmlcast(xmlquery('fn:replace($s, ".*/(.*)\.pdf", "$1", "i")' passing casi_imp_urls as "s") as varchar(50)) cas_code
from cas_fnd_dwd_det

Replace pair of % in oracle

I have these texts in an Oracle table (as 2 records):
"Sample text with replace parameter %1%"
"You reached 90% of your limit"
I need to replace %1% with a specific text taken from an input parameter of an Oracle function. In fact, there can be more than one replace parameter, and I also have a record like "Replace this %12% with real value".
This is the functionality I have programmed:
IF poc > 0 THEN
FOR i in 1 .. poc LOOP
p := get_param(mString => mbody);
mbody := replace(mbody,
'%' || p || '%', parameters(to_number(p, '99')));
END LOOP;
END IF;
But in this case I have a problem with text number 2. The code tries to replace "90%" as well, and then I get this error:
ORA-06502: PL/SQL: numeric or value error: NULL index table key value
Is it possible to avoid trying to replace "90%"? Many thanks for any advice.
Best regards
PS: Oracle version: 10g (OCI Version: 10.2)
Regular expressions can work here. Try the following and build them into your script.
SELECT REGEXP_REPLACE( 'Sample text with replace parameter %1%',
'\%[0-9]+\%',
'db_size' )
FROM DUAL
and
SELECT REGEXP_REPLACE( 'Sample text with replace parameter 1%',
'\%[0-9]+\%',
'db_size' )
FROM DUAL
The pattern is pretty simple: it looks for a '%' followed by one or more digits followed by a '%'. The string in the second query contains no such pattern, so it is returned unchanged.
The only issue arises if you have more than one replacement to make in a string and each replacement is different. In that case you will need to loop over the string, replacing the next parameter each time. To do this, add the position and occurrence parameters to REGEXP_REPLACE after the replacement string, e.g.
REGEXP_REPLACE( 'Sample text with replace parameter %88888888888%','\%[0-9]+\%','db_size',1,1 )
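The "different replacement per parameter" case can be sketched in Python to show the intended behaviour on the question's three texts. This is an illustration of the matching logic rather than PL/SQL; the function `fill_placeholders` and the contents of the `parameters` dictionary are assumptions made up for the example.

```python
import re

def fill_placeholders(text: str, parameters: dict) -> str:
    """Replace each %n% with parameters[n]; unknown numbers are left untouched."""
    def substitute(match):
        key = int(match.group(1))
        # Fall back to the original placeholder when no value is registered.
        return parameters.get(key, match.group(0))
    # A placeholder is a '%', one or more digits, and a closing '%'.
    return re.sub(r"%([0-9]+)%", substitute, text)

parameters = {1: "db_size", 12: "real value"}  # hypothetical sample values
print(fill_placeholders("Sample text with replace parameter %1%", parameters))
print(fill_placeholders("You reached 90% of your limit", parameters))
```

A lone "90%" has no opening '%' in front of the digits, so it never matches and the second text passes through unchanged.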
You are getting the error at parameters(to_number(p, '99')): "NULL index table key value" means the index expression evaluated to NULL. Can you please check the value of p?
Also, if p = 90, then REPLACE will not try to replace "90%". It will replace "%90%". How can you be sure that it is trying to replace "90%"?

Oracle: Extract number from String

I've reviewed this question and I'm wondering why my output seems to be a little skewed.
From my understanding, the REGEXP_REPLACE function takes the string to search, followed by a pattern to match, and then anything that does not match that pattern is replaced with the substitution parameter.
I've written the following function to extract a distance from a text field; a spatial query will then be performed on the result.
CREATE OR REPLACE FUNCTION extract_distance
(
p_search_string VARCHAR2
)
RETURN VARCHAR2
IS
l_distance VARCHAR2(25);
BEGIN
SELECT REGEXP_REPLACE(UPPER(p_search_string), '(([0-9]{0,4}) ?MILES)', '')
INTO l_distance FROM SYS.DUAL;
RETURN l_distance;
END extract_distance;
When I run this in a block to test:
DECLARE
l_output VARCHAR2(25);
BEGIN
l_output := extract_distance('Stores selling COD4 in 400 Miles');
DBMS_OUTPUT.PUT_LINE(l_output);
END;
I'd expect the output 400 miles, but in fact I get Stores selling COD4 in. Where have I gone wrong?
"REGEXP_REPLACE extends the functionality of the REPLACE function by letting you search a string for a regular expression pattern. By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string." (from the Oracle documentation). In other words, the part that matches the pattern is what gets replaced, so your call removes the distance and keeps the rest.
You could use, e.g.,
SELECT REGEXP_REPLACE('Stores selling COD4 in 400 Miles', '^.*?(\d+ ?MILES).*$', '\1', 1, 0, 'i') FROM DUAL;
or alternatively
SELECT REGEXP_SUBSTR('Stores selling COD4 in 400 Miles', '(\d+ ?MILES)', 1, 1, 'i') FROM DUAL;
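The difference between replacing a match and extracting it can be seen side by side in a short Python sketch. It mirrors the two Oracle calls only loosely: `re.sub` plays the role of REGEXP_REPLACE with an empty replacement, and `re.search` plays the role of REGEXP_SUBSTR.

```python
import re

text = "Stores selling COD4 in 400 Miles"
# One or more digits, an optional space, then MILES in any letter case.
pattern = re.compile(r"\d+ ?MILES", re.IGNORECASE)

# Replacing removes the matched distance and keeps everything else.
replaced = pattern.sub("", text)

# Searching returns only the matched distance.
found = pattern.search(text)
extracted = found.group(0) if found else None

print(repr(replaced))
print(repr(extracted))
```

The "4" in COD4 is not picked up because it is not followed by the word MILES.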
You'll want to use REGEXP_SUBSTR, which returns the substring that matches the regular expression.