Select only letters which are followed by a number - regex

I am trying to select some codes from a PostgreSQl table.
I only want the codes that have numbers in them e.g
GD123
GD564
I don't want to pick any codes like `GDTG GDCNB
Here's my query so far:
select regexp_matches(no_, '[a-zA-Z0-9]*$')
from myschema.mytable
which of course doesn't work.
Any help appreciated.

The pattern to match a string that has at least 1 letter followed by at least 1 number is '[A-Za-z]+[0-9]+'.
Now, if the valid patterns had to start with two letters, and then have 3 digits after as your examples show, then replace the + with {2} & {4} respectively, and enclose the pattern in ^$, like this: '^[A-Za-z]{2}[0-9]{3}$'
The regex match operator is ~ which you can use in the where clause:
SELECT no_
FROM myschema.mytable
WHERE no_ ~ '[A-Za-z]+[0-9]+'

You may use
CREATE TABLE tb1
(s character varying)
;
INSERT INTO tb1
(s)
VALUES
('GD123'),
('12345'),
('GDFGH')
;
SELECT * FROM tb1 WHERE s ~ '^(?![A-Za-z]+$)[a-zA-Z0-9]+$';
Result:
Details
^ - start of string
(?![A-Za-z]+$) - a negative lookahead that fails the match if there are only letters to the end of the string
[a-zA-Z0-9]+ - 1 or more alphanumeric chars
$ - end of string.
If you want to avoid matching 12345, use
'^(?![A-Za-z]+$)(?![0-9]+$)[a-zA-Z0-9]+$'
Here, (?![0-9]+$) will similarly fail the match if, from the string start, all chars up to the end of the string are digits. Result:

smth like:
so=# with c(v) as (values('GD123'),('12345'),('GD ERT'))
select v ~ '[A-Z]{1,}[0-9]+', v from c;
?column? | v
----------+--------
t | GD123
f | 12345
f | GD ERT
(3 rows)
?..

If the format of the data you want to obtain is a set of characters follewd by a set of digits (i.e., GD123) you can use the regex:
[a-zA-Z0-9]+[0-9]

This captures every digit and letter which is in front of the digits:
([A-z]+\d+)

Related

SQL Regex Pattern, How to match only a specific variable between two characters? (see Sample Output)

I have this inputs:
John/Bean/4000-M100
John/4000-M100
John/4000
How can I get just the 4000 but note that the 4000 there will be change from time to time it can be 3000 or 2000 how can I treat that using regex pattern?
Here's my output so far, it statisfies John/400-M100 and John/4000 but the double slash doesnt suffice the match requirements in the regex I have:
REGEXP_REPLACE(REGEXP_SUBSTR(a.demand,'/(.*)-|/(.*)',1,1),'-|/','')
You can use this query to get the results you want:
select regexp_replace(data, '^.*/(\d{4})[^/]*$', '\1')
from test
The regex looks for a set of 4 digits following a / and then not followed by another / before the end of the line and replaces the entire content of the string with those 4 digits.
Demo on dbfiddle
This would also work, unless you need any digit followed by three zeros. See it in action here, for as long as it lives, http://sqlfiddle.com/#!4/23656/5
create table test_table
( data varchar2(200))
insert into test_table values('John/Bean/4000-M100')
insert into test_table values('John/4000-M100')
insert into test_table values('John/4000')
select a.*,
replace(REGEXP_SUBSTR(a.data,'/\d{4}'), '/', '')
from test_table a
The following will match any multiple of 1000 less than 10000 when its preceded by a slash:
\/[1-9]0{3}
To match any four-digit number preceded by a slash, not followed by another digit, such as 4031 in—
Sal_AS_180763852/4200009751_S5_154552/4031
—try:
\/\d{3}(?:(?:\d[^\d])|(?:\d$))
https://regex101.com/r/Am34WO/1

Why is this regex performing partial matches?

I have the following raw data:
1.1.2.2.4.4.4.5.5.9.11.15.16.16.19 ...
I'm using this regex to remove duplicates:
([^.]+)(.[ ]*\1)+
which results in the following:
1.2.4.5.9.115.16.19 ...
The problem is how the regex handles 1.1 in the substring .11.15. What should be 9.11.15.16 becomes 9.115.16. How do I fix this?
The raw values are sorted in numeric order to accommodate the regex used for processing the duplicate values.
The regex is being used within Oracle's REGEXP_REPLACE
The decimal is a delimiter. I've tried commas and pipes but that doesn't fix the problem.
Oracle's REGEX does not work the way you intended. You could split the string and find distinct rows using the general method Splitting string into multiple rows in Oracle. Another option is to use XMLTABLE , which works for numbers and also strings with proper quoting.
SELECT LISTAGG(n, '.') WITHIN
GROUP (
ORDER BY n
) AS n
FROM (
SELECT DISTINCT TO_NUMBER(column_value) AS n
FROM XMLTABLE(replace('1.1.2.2.4.4.4.5.5.9.11.15.16.16.19', '.', ','))
);
Demo
Unfortunately Oracle doesn't provide a token to match a word boundary position. Neither familiar \b token nor ancient [[:<:]] or [[:>:]].
But on this specific set you can use:
(\d+\.)(\1)+
Note: You forgot to escape dot.
Your regex caught:
a 1 - the second digit in 11,
then a dot,
and finally 1 - the first digit in 15.
So your regex failed to catch the whole sequence of digits.
The most natural way to write a regex catching the whole sequence
of digits would be to use:
a loobehind for either the start of the string or a dot,
then catch a sequence of digits,
and finally a lookahead for a dot.
But as I am not sure whether Oracle supports lookarounds, I wrote
the regex another way:
(^|\.)(\d+)(\.(\2))+
Details:
(^|\.) - Either start of the string or a dot (group 1), instead of
the loobehind.
(\d+) - A sequence of digits (group 2).
( - Start of group 3, containing:
\.(\2) - A dot and the same sequence of digits which caught group 2.
)+ - End of group 3, it may occur multiple times.
Group the repeating pattern and remove it
As revo has indicated, a big source of your difficulties came with not escaping the period. In addition, the resulting string having a 115 included can be explained as follows (Valdi_Bo made a similar observation earlier):
([^.]+)(.[ ]*\1)+ will match 11.15 as follow:
SCOTT#DB>SELECT
2 '11.15' val,
3 regexp_replace('11.15','([^.]+)(\.[ ]*\1)+','\1') deduplicated
4 FROM
5 dual;
VAL DEDUPLICATED
11.15 115
Here is a similar approach to address those problems:
matching pattern composition
-Look for a non-period matching list of length 0 to N (subexpression is referenced by \1).
'19' which matches ([^.]*)
-Look for the repeats which form our second matching list associated with subexression 2, referenced by \2.
'19.19.19' which matches ([^.]*)([.]\1)+
-Look for either a period or end of string. This is matching list referenced by \3. This fixes the match of '11.15' by '115'.
([.]|$)
replacement string
I replace the match pattern with a replacement string composed of the first instance of the non-period matching list.
\1\3
Solution
regexp_replace(val,'([^.]*)([.]\1)+([.]|$)','\1\3')
Here is an example using some permutations of your examples:
SCOTT#db>WITH tst AS (
2 SELECT
3 '1.1.2.2.4.4.4.5.5.9.11.15.16.16.19' val
4 FROM
5 dual
6 UNION ALL
7 SELECT
8 '1.1.1.1.2.2.4.4.4.4.4.5.5.9.11.11.11.15.16.16.19' val
9 FROM
10 dual
11 UNION ALL
12 SELECT
13 '1.1.2.2.4.4.4.5.5.9.11.15.16.16.19.19.19' val
14 FROM
15 dual
16 ) SELECT
17 val,
18 regexp_replace(val,'([^.]*)([.]\1)+([.]|$)','\1\3') deduplicate
19 FROM
20 tst;
VAL DEDUPLICATE
------------------------------------------------------------------------
1.1.2.2.4.4.4.5.5.9.11.15.16.16.19 1.2.4.5.9.11.15.16.19
1.1.1.1.2.2.4.4.4.4.4.5.5.9.11.11.11.15.16.16.19 1.2.4.5.9.11.15.16.19
1.1.2.2.4.4.4.5.5.9.11.15.16.16.19.19.19 1.2.4.5.9.11.15.16.19
My approach does not address possible spaces in the string. One could just remove them separately (e.g. through a separate replace statement).

REGEX_LIKE not selecting correct result

I am trying to select numbers starting with one + character and having only digits afterward from a varchar column. I have used the regex_like operator but it also selects special character in the result.
Expected Correct value:
+369
+6589445
+5896552
Wrong:
693
+4534dfgfgf#
+3435435*%
I tried,
SELECT Column FROM Table WHERE REGEXP_LIKE(Column , '^[+][0-9]');
To select values starting with + and then 1 or more digits, use
^[+][0-9]+$
^^
The $ will force the end-of-string boundary and + will allow matching 1 or more occurrences of the construct the plus quantifies (the [0-9] character class).
Here is a demo showing how this regex works.

Remove last 4 digits from string if pattern matches

I'm trying to remove the last 4 digits from a string in Postgres if and only if they match a certain pattern: [0][1-9][0][1-9].
Example:
1031610101 -> 103161
1234 -> 1234
123456 -> 123456
123405 -> 123405
I've tried a few approaches using substring, but somehow can't get this to work.
The length of the string is variable.
So far I've tried:
substring(value from '([\d](3,6}[0][1-9][0][1-9])') as "Result"
Easier with regexp_replace():
SELECT regexp_replace(col, '0[1-9]0[1-9]$', '')
FROM tbl;
$ .. end of string
SELECT SUBSTR('ABCDEFGHIJKLMNOP', 1, LENGTH('ABCDEFGHIJKLMNOP') - 4);
Syntax: SUBSTR('string',from_postion,length)

Regex mask with varying ignored characters

I have a series of strings which look something like this:
foobar | ABC Some text 123
barfoo | DEF Some te 456
And I want to mask it such that I get the results
ABC123
DEF456
respectively. The text in between will always be a substring Some text which could potentially contain numbers (e.g. S0m3 t3xt or S0m3 t3). It will always be a substring starting from the left, so never me te.
So clearly I need to start the Regex with something like
(?<=| )[A-Z]{3}
which gets me ABC and DEF but I am at a loss of how to effectively concatenate the numbers at the end of the string.
Is there any way to do this with a single expression?
See http://regexr.com?375u8
(?<=| )([A-Z]{3}).*(\d{3})
This will give you three characters in the range of A-Z and three numbers in two capturing groups, allowing you to use these groups to concatenate both to your desired output: $1$2
This will even work if your Some text contains three numbers inbetween.
In case you want to replace everything with both of your capturing groups, add .* in front of the regex:
.*(?<=| )([A-Z]{3}).*?(\d{3})
Another javascript version
[
'foobar | ABC Some text 123',
'barfoo | DEF Some te 456'
].map(function(v) {
return v.replace(/^.*\| ([A-Z]{3}) .* (\d{3})$/, '$1$2');
})
Gives
["ABC123", "DEF456"]