Wildcard expression in SQL Server - regex

I know that, the following query returns the rows, which are contain the exact 5 characters between the A and G
select *
from
(select 'prefixABBBBBGsuffix' code /*this will be returned. */
union
select 'prefixABBBBGsuffix') rex
where
code like '%A_____G%'
But I want 17 character between A and G, then like condition must have 17 underscores. So I search little in google I found [] will be used in like. Then I tried so for.
select *
from
(select 'AprefixABBBBBGsuffixG' code
union
select 'AprefixABBBBGsuffixG') rex
where
code like '%A[_]^17G%' /*As per my understanding, '[]' makes a set. And
'^17' would be power of the set (like Mathematics).*/
Then it returns the NULL set. How can I search rows which has certain number of character in the set []?
Note:
I'm using SQL Server 2012.

I would use REPLICATE to generate desired number of '_':
select * from (
select 'prefixABBBBBGsuffix' code
union
select 'prefixABBBBGsuffix'
) rex
where code like '%A' + REPLICATE('_',17) + 'G%';

same answer as previously but corrected. 17 wasn't the number, it was 18 and 19 for strings, also put in the len(textbetweenA and G) to show.
select rex.*
from (
select len('prefixABBBBBGsuffix') leng, 'AprefixABBBBBGsuffixG' code
union
select len('prefixABBBBGsuffix'), 'AprefixABBBBGsuffixG'
union
select 0, 'A___________________G'
) rex
where
rex.code like '%A' + replicate('_',19) + 'G%'
--and with [] the set would be [A-Za-z]. Notice this set does not match the A___________________G string.
select rex.*
from (
select len('prefixABBBBBGsuffix') leng, 'AprefixABBBBBGsuffixG' code
union
select len('prefixABBBBGsuffix'), 'AprefixABBBBGsuffixG'
union
select 0, 'A___________________G'
) rex
where
rex.code like '%A' + replicate('[A-Za-z]',19) + 'G%'
[A-Za-z0-9] matches one character within the scope of alphabet (both cases) or a number 0 through 9
I can't find any working information about another way to handle a number of chars like that, replicate is just a way to ease parameterization and typing.

Related

matching patterns for regexp_like in Oracle to include a group of character conditionally

I tried to look up for a good documentation for matching pattern for use of regexp_like in Oracle. I have found some and followed their instructions but looks like I have missed something or the instruction is not comprehensive.
Let's look at this example:
SELECT * FROM
(
SELECT 'ABC' T FROM DUAL
UNION
SELECT 'WZY' T FROM DUAL
UNION
SELECT 'WZY_' T FROM DUAL
UNION
SELECT 'WZYEFG' T FROM DUAL
UNION
SELECT 'WZY_EFG' T FROM DUAL
) C
WHERE regexp_like(T, '(^WZY)+[_]{0,1}+[A-Z]{0,6}')
What I expect to receive are WZY and WZY_EFG. But what I got was:
What I would like to have is the "_" could be present or not but if there are character after the first group, it is mandatory that it be present only once.
Is there a clean way to do this?
Use a subexpression grouping to make sure the _ character appears only with Capitalized Alphabetical Characters
Yes, your pattern does not address the conditional logic you need (only see the _ when capitalized alphabetical characters follow).
Placing the _ character in with a capitalized alphabetical character list into a subexpression grouping forces this logic.
Finally, placing the end of line anchor addresses the zero match scenarios.
SCOTT#DB>SELECT
2 *
3 FROM
4 (
5 SELECT 'ABC' t FROM dual
6 UNION ALL
7 SELECT 'WZY' t FROM dual
8 UNION ALL
9 SELECT 'WZY_' t FROM dual
10 UNION ALL
11 SELECT 'WZYEFG' t FROM dual
12 UNION ALL
13 SELECT 'WZY_EFG' t FROM dual
14 ) c
15 WHERE
16 REGEXP_LIKE ( t, '^(WZY)+([_][A-Z]{1,6}){0,1}$' );
T
__________
WZY
WZY_EFG

POSIX ERE Regular expression to find repeated substring

I have a set of strings containing a minimum of 1 and a maximum of 3 values in a format like this:
123;456;789
123;123;456
123;123;123
123;456;456
123;456;123
I'm trying to write a regular expression so I can find values repeated on the same string, so if you have 123;456;789 it would return null but if you had 123;456;456 it would return 456 and for 123;456;123 return 123
I managed to write this expression:
(.*?);?([0-9]+);?(.*?)\2
It works in the sense that it returns null when there are no duplicate values but it doesn't return exactly the value I need, eg: for the string 123;456;456 it returns 123;456;456and for the string 123;123;123 it returns 123;123
What I need is to return only the value for the ([0-9]+) portion of the expression, from what I've read this would normally be done using non-capturing groups. But either I'm doing it wrong or Oracle SQL doesn't support this as if I try using the ?: syntax the result is not what I expect.
Any suggestions on how you would go about this on oracle sql? The purpose of this expression is to use it on a query.
SELECT REGEXP_SUBSTR(column, "expression") FROM DUAL;
EDIT:
Actually according to https://docs.oracle.com/cd/B12037_01/appdev.101/b10795/adfns_re.htm
Oracle Database implements regular expression support compliant with the POSIX Extended Regular Expression (ERE) specification.
Which according to https://www.regular-expressions.info/refcapture.html
Non-capturing group is not supported by POSIX ERE
This answer describes how to select a matching group from a regex. So using that,
SELECT regexp_substr(column, '(\d{3}).*\1', 1, 1, NULL, 1) from dual;
# ^ Select group 1
Working demo of the regex (courtesy: OP).
If you only have three substrings, then you can use a brute force method. It is not particularly pretty, but it should do the job:
select (case when val1 in (val2, val3) then val1
when val2 = val3 then val2
end) as repeated
from (select t.*,
regexp_substr(col, '[^;]+', 1, 1) as val1,
regexp_substr(col, '[^;]+', 1, 2) as val2,
regexp_substr(col, '[^;]+', 1, 3) as val3
from t
) t
where val1 in (val2, val3) or val2 = val3;
Please bear with me and think of this different approach. Look at the problem a little differently and break it down in a way that gives you more flexibility in how you you are able look at the data. It may or may not apply to your situation, but hopefully should be interesting to keep in mind that there are always different ways to approach a problem.
What if you turned the strings into rows so you could do standard SQL against them? That way you could not only count elements that repeat but perhaps apply aggregate functions to look for patterns across sets or something.
Consider this then. The first Common Table Expression (CTE) builds the original data set. The second one, tbl_split, turns that data into a row for each element in the list. Uncomment the select that immediately follows to see. The last query selects from the split data, showing the count of how often the element occurs in the id's data. Uncomment the HAVING line to restrict the output to those elements that appear more than one time for the data you are after.
With the data in rows you can see how other aggregate functions could be applied to slice and dice to reveal patterns, etc.
SQL> with tbl_orig(id, str) as (
select 1, '123;456;789' from dual union all
select 2, '123;123;456' from dual union all
select 3, '123;123;123' from dual union all
select 4, '123;456;456' from dual union all
select 5, '123;456;123' from dual
),
tbl_split(id, element) as (
select id,
regexp_substr(str, '(.*?)(;|$)', 1, level, NULL, 1) element
from tbl_orig
connect by level <= regexp_count(str, ';')+1
and prior id = id
and prior sys_guid() is not null
)
--select * from tbl_split;
select distinct id, element, count(element)
from tbl_split
group by id, element
--having count(element) > 1
order by id;
ID ELEMENT COUNT(ELEMENT)
---------- ----------- --------------
1 123 1
1 456 1
1 789 1
2 123 2
2 456 1
3 123 3
4 123 1
4 456 2
5 123 2
5 456 1
10 rows selected.
SQL>

POSTGRESQL at least 8 characters in name with LIKE or REGEX

SELECT name
FROM players
WHERE name ~ '(.*){8,}'
It is really simple but I cannot seem to get it.
I have a list with names and I have to filter out the ones with at least 8 characters... But I still get the full list.
What am I doing wrong?
Thanks! :)
A (.*){8,} regex means match any zero or more chars 8 or more times.
If you want to match any 8 or more chars, you would use .{8,}.
However, using character_lenth is more appropriate for this task:
char_length(string) or character_length(string) int Number of characters in string
CREATE TABLE table1
(s character varying)
;
INSERT INTO table1
(s)
VALUES
('abc'),
('abc45678'),
('abc45678910')
;
SELECT * from table1 WHERE character_length(s) >= 8;
See the online demo

How to replace a single number from a comma separated string in oracle using regular expressions?

I have the following set of data where I need to replace the number 41 with another number.
column1
41,46
31,41,48,55,58,121,122
31,60,41
41
We can see four conditions here
41,
41
,41,
41,
I have written the following query
REGEXP_replace(column1,'^41$|^41,|,41,|,41$','xx')
where xx is the number to be replaced.
This query will replace the comma as well which is not expected.
Example : 41,46 is replaced as xx46. Here the expected output is xx,46. Please note that there are no spaced between the comma and numbers.
Can somebody help out how to use the regex?
Assuming the string is comma separated, You can use comma concatenation with replace and trim to do the replacement. No regex needed. You should avoid regex as the solution is likely to be slow.
with t (column1) as (
select '41,46' from dual union all
select '31,41,48,55,58,121,122' from dual union all
select '31,60,41' from dual union all
select '41' from dual
)
-- Test data setup. Actual solution is below: --
select
column1,
trim(',' from replace(','||column1||',', ',41,', ',17,')) as replaced
from t;
Output:
COLUMN1 REPLACED
41,46 17,46
31,41,48,55,58,121,122 31,17,48,55,58,121,122
31,60,41 31,60,17
41 17
4 rows selected.
Also, it's worth noting here that the comma separated strings is not the right way of storing data. Normalization is your friend.

Query for regular expression - oracle DB

I am trying to get all the name which has special char and numbers in it. I want to know if there is a better way of writing this query. Thanks
select Name from TABLE1
WHERE NAME LIKE '%-%'
OR NAME LIKE '%$%'
OR NAME LIKE '%4%' etc
Try this:
SELECT Name FROM Test WHERE REGEXP_LIKE(Name,'(-|$|4)');
The | (OR or Alternation) operator allows the expression to return true if the value contains an '-' OR an '$' OR an '4' etc.
Use square brackets to define a character class. Also a full regex to anchor the pattern to the beginning of the string, any number of any characters before a character class consisting of a dash or a 4 or a dollar sign then any number of any characters anchored to the end of the string:
SQL> with tbl(name) as (
2 select 'efs' from dual
3 union
4 select 'abc-' from dual
5 union
6 select 'a4bd' from dual
7 union
8 select 'gh$dll' from dual
9 union
10 select 'xy5zzy' from dual
11 )
12 select name from tbl
13 where regexp_like(name, '^.*[-|$|4].*$');
NAME
------
a4bd
abc-
gh$dll
SQL>