I have values in a column like "07960/WR", "27163/WR", etc. I need to extract the number from each value, so I wrote this SQL:
select CAST (regexp_replace(object_index, '\D', '', 'g') as integer) as number from ...
It works, but sometimes somebody enters an extra number and a slash in front of the value,
for example: "99/27163/WR"
Then my query doesn't work.
How can I use regexp_replace ONLY on the last 5 digits in the value?
I don't know PostgreSQL, but with a little help from RegexBuddy, I pieced something together that hopefully works:
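select CAST (REGEXP_REPLACE(object_index, '(?p)^.*(\d{5})\D*$', '\1', 'g') as integer) as number from ...
The idea of this regex is to match and capture the last five digits \d{5} in the string (i.e. those that are followed only by non-digits: \D*$) and remove everything around them. The pattern is written as an ordinary single-quoted string, which assumes standard_conforming_strings is on (the default since PostgreSQL 9.1), so the backslashes reach the regex engine unchanged.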
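To see what the pattern does outside the database, here is a minimal Python sketch. It assumes Python's re module treats this particular pattern the same way PostgreSQL does; the helper name last_five_digits is made up for illustration, and the PostgreSQL-specific (?p) option is left out.

```python
import re

# Same idea as the SQL above: keep only the last run of five digits
# that is followed by nothing but non-digits.
LAST_FIVE = re.compile(r'^.*(\d{5})\D*$')

def last_five_digits(value: str) -> int:
    """Hypothetical helper: return the last five digits of value as an integer."""
    return int(LAST_FIVE.sub(r'\1', value))

print(last_five_digits("07960/WR"))     # 7960 (the leading zero is lost in the cast)
print(last_five_digits("99/27163/WR"))  # 27163
```

If a value contains no run of five digits, the substitution leaves it unchanged and the integer conversion fails, which is the same kind of failure the CAST would produce.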
I'm trying to read specific values from a TestString using a Perl-style regex and can't seem to get to where I need to.
I'm trying to read the value that starts the string and stops two digits to the left of the decimal point, and save it to value1. It has to run from the start of the string up to two digits before the decimal point, since there may be 4, 3, or 2 leading digits (e.g. 123420.78616 or 3320.78616).
So with the example below, I'm looking to save "133" to value1 using RegExMatch in AutoHotkey.
For the second RegExMatch, I need to save the other portion of the number to value2. Value2 would start two digits to the left of the decimal point and run to the end of the string, so I need "20.78616" to be saved as value2.
Below, I can only capture the full number with the regex used, and I've been trying combinations for hours on regex101.com to no avail.
Hoping someone could help me.
TestString := "13320.78616"
RegExMatch (TestString, "(([\w\.]+)$)", value1)
RegExMatch (TestString, "(([\w\.]+)$)", value2)
msgbox, %value1%
msgbox, %value2%
I suggest the following regex:
(\d+)(\d\d\.\d*)
Three things to note:
use \d instead of \w if you want to capture just digits and not letters;
the (\d+) captures a leading string of at least one digit, and ends two digits before the decimal because of the next part:
the (\d\d\.\d*) captures exactly two digits, the decimal point, and any following digits.
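The two capture groups can be tried out with a short Python sketch. This is only an illustration under the assumption that Python's re engine backtracks the same way as the PCRE engine behind RegExMatch for this pattern; split_number is a hypothetical helper name.

```python
import re
from typing import Tuple

# Group 1: the leading digits; group 2: two digits, the decimal point, and the rest.
SPLIT = re.compile(r'(\d+)(\d\d\.\d*)')

def split_number(text: str) -> Tuple[str, str]:
    """Hypothetical helper: return (value1, value2) as described in the question."""
    match = SPLIT.search(text)
    if match is None:
        raise ValueError(f"no match in {text!r}")
    return match.group(1), match.group(2)

print(split_number("13320.78616"))   # ('133', '20.78616')
print(split_number("123420.78616"))  # ('1234', '20.78616')
```

The greedy (\d+) first grabs every digit before the decimal point and then gives back two of them so that the second group can match.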
I have these inputs:
John/Bean/4000-M100
John/4000-M100
John/4000
How can I get just the 4000? Note that the 4000 will change from time to time; it can be 3000 or 2000. How can I handle that with a regex pattern?
Here's my attempt so far. It satisfies John/4000-M100 and John/4000, but the input with two slashes doesn't meet the match requirements of the regex I have:
REGEXP_REPLACE(REGEXP_SUBSTR(a.demand,'/(.*)-|/(.*)',1,1),'-|/','')
You can use this query to get the results you want:
select regexp_replace(data, '^.*/(\d{4})[^/]*$', '\1')
from test
The regex looks for a set of 4 digits following a / and then not followed by another / before the end of the line and replaces the entire content of the string with those 4 digits.
Demo on dbfiddle
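For readers who want to check the pattern without a database, here is a small Python sketch of the same replacement. It assumes Python's re module behaves like the SQL regex engine for this pattern; last_segment_number is a made-up helper name.

```python
import re

# Four digits right after the last slash, followed only by non-slash characters.
PATTERN = re.compile(r'^.*/(\d{4})[^/]*$')

def last_segment_number(value: str) -> str:
    """Hypothetical helper: replace the whole string with the captured four digits."""
    return PATTERN.sub(r'\1', value)

for sample in ["John/Bean/4000-M100", "John/4000-M100", "John/4000"]:
    print(sample, "->", last_segment_number(sample))  # each line ends in 4000
```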
This would also work, unless you need any digit followed by three zeros. See it in action here, for as long as it lives, http://sqlfiddle.com/#!4/23656/5
create table test_table
( data varchar2(200));
insert into test_table values('John/Bean/4000-M100');
insert into test_table values('John/4000-M100');
insert into test_table values('John/4000');
select a.*,
replace(REGEXP_SUBSTR(a.data,'/\d{4}'), '/', '') as num
from test_table a;
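The substring-then-strip idea in that query can be sketched in Python as follows. This is an illustration only, assuming re.search plays the role of REGEXP_SUBSTR; first_slash_number is a hypothetical name.

```python
import re
from typing import Optional

def first_slash_number(value: str) -> Optional[str]:
    """Hypothetical helper: first slash followed by four digits, with the slash removed."""
    # Mirrors REGEXP_SUBSTR(data, '/\d{4}') followed by replace(..., '/', '').
    match = re.search(r'/\d{4}', value)
    return match.group(0).replace('/', '') if match else None

for sample in ["John/Bean/4000-M100", "John/4000-M100", "John/4000"]:
    print(sample, "->", first_slash_number(sample))  # each line ends in 4000
```

Because "/Bean" does not start with a digit, the search skips it and lands on "/4000" in the first input.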
The following will match any multiple of 1000 less than 10000 when it's preceded by a slash:
\/[1-9]0{3}
To match any four-digit number that is preceded by a slash and not followed by another digit, such as 4031 in
Sal_AS_180763852/4200009751_S5_154552/4031
try:
\/\d{3}(?:(?:\d[^\d])|(?:\d$))
https://regex101.com/r/Am34WO/1
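Both patterns can be checked with a short Python sketch, assuming Python's re module accepts them unchanged (the variable names are made up for illustration):

```python
import re

# A slash followed by a multiple of 1000 below 10000.
THOUSANDS = re.compile(r'\/[1-9]0{3}')

# A slash followed by exactly four digits, then a non-digit or the end of the string.
FOUR_DIGITS = re.compile(r'\/\d{3}(?:(?:\d[^\d])|(?:\d$))')

sample = "Sal_AS_180763852/4200009751_S5_154552/4031"
match = FOUR_DIGITS.search(sample)
print(match.group(0))                        # /4031
print(bool(THOUSANDS.search("John/4000")))   # True
print(bool(THOUSANDS.search(sample)))        # False
```

Note that when the four digits are followed by a non-digit, that character is part of the match, so the result may need trimming.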
I have the following raw data:
1.1.2.2.4.4.4.5.5.9.11.15.16.16.19 ...
I'm using this regex to remove duplicates:
([^.]+)(.[ ]*\1)+
which results in the following:
1.2.4.5.9.115.16.19 ...
The problem is how the regex handles 1.1 in the substring .11.15. What should be 9.11.15.16 becomes 9.115.16. How do I fix this?
The raw values are sorted in numeric order to accommodate the regex used for processing the duplicate values.
The regex is being used within Oracle's REGEXP_REPLACE
The decimal is a delimiter. I've tried commas and pipes but that doesn't fix the problem.
Oracle's regex engine does not work the way you intended. You could split the string and find the distinct rows using the general method from "Splitting string into multiple rows in Oracle". Another option is to use XMLTABLE, which works for numbers and also for strings with proper quoting.
SELECT LISTAGG(n, '.') WITHIN GROUP (ORDER BY n) AS n
FROM (
    SELECT DISTINCT TO_NUMBER(column_value) AS n
    FROM XMLTABLE(replace('1.1.2.2.4.4.4.5.5.9.11.15.16.16.19', '.', ','))
);
Demo
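The split, de-duplicate, sort and re-join steps of that query look like this in a minimal Python sketch. It is not Oracle code, just the same logic under the assumption that every token is an integer; dedupe_sorted is a hypothetical name.

```python
def dedupe_sorted(values: str, sep: str = ".") -> str:
    """Hypothetical helper: split on sep, keep distinct numbers, sort numerically, re-join."""
    numbers = sorted({int(token) for token in values.split(sep)})
    return sep.join(str(n) for n in numbers)

print(dedupe_sorted("1.1.2.2.4.4.4.5.5.9.11.15.16.16.19"))
# 1.2.4.5.9.11.15.16.19
```

Working on whole tokens avoids the partial-number matches that cause trouble in the regex approach.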
Unfortunately, Oracle doesn't provide a token to match a word boundary position: neither the familiar \b token nor the ancient [[:<:]] or [[:>:]].
But on this specific set you can use:
(\d+\.)(\1)+
Note: You forgot to escape the dot.
Your regex caught:
a 1 - the second digit in 11,
then a dot,
and finally 1 - the first digit in 15.
So your regex failed to catch the whole sequence of digits.
The most natural way to write a regex that catches the whole sequence of digits would be to use:
a lookbehind for either the start of the string or a dot,
then catch a sequence of digits,
and finally a lookahead for a dot.
But as I am not sure whether Oracle supports lookarounds, I wrote the regex another way:
(^|\.)(\d+)(\.(\2))+
Details:
(^|\.) - Either the start of the string or a dot (group 1), instead of the lookbehind.
(\d+) - A sequence of digits (group 2).
( - Start of group 3, containing:
\.(\2) - A dot and the same sequence of digits which caught group 2.
)+ - End of group 3, it may occur multiple times.
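The failure and the rewritten pattern can both be reproduced in a short Python sketch. It assumes Python's re module behaves like Oracle's engine for these patterns, and the replacement string \1\2 (the leading delimiter plus one copy of the number) is an assumption, since no replacement is given above.

```python
import re

raw = "1.1.2.2.4.4.4.5.5.9.11.15.16.16.19"

# Original pattern: it can start in the middle of "11" and match "1.1" inside "11.15".
broken = re.sub(r'([^.]+)(.[ ]*\1)+', r'\1', raw)

# Rewritten pattern: group 1 forces the match to start at the string start or at a dot.
fixed = re.sub(r'(^|\.)(\d+)(\.(\2))+', r'\1\2', raw)

print(broken)  # 1.2.4.5.9.115.16.19
print(fixed)   # 1.2.4.5.9.11.15.16.19
```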
Group the repeating pattern and remove it
As revo has indicated, a big source of your difficulties came with not escaping the period. In addition, the resulting string having a 115 included can be explained as follows (Valdi_Bo made a similar observation earlier):
([^.]+)(.[ ]*\1)+ will match 11.15 as follows:
SCOTT#DB>SELECT
    '11.15' val,
    regexp_replace('11.15','([^.]+)(\.[ ]*\1)+','\1') deduplicated
FROM
    dual;
VAL   DEDUPLICATED
11.15 115
Here is a similar approach to address those problems:
matching pattern composition
-Look for a non-period matching list of length 0 to N (subexpression is referenced by \1).
'19' which matches ([^.]*)
-Look for the repeats which form our second matching list associated with subexpression 2, referenced by \2.
'19.19.19' which matches ([^.]*)([.]\1)+
-Look for either a period or the end of the string. This is the matching list referenced by \3. It stops '11.15' from being collapsed into '115'.
([.]|$)
replacement string
I replace the match pattern with a replacement string composed of the first instance of the non-period matching list.
\1\3
Solution
regexp_replace(val,'([^.]*)([.]\1)+([.]|$)','\1\3')
Here is an example using some permutations of your examples:
SCOTT#db>WITH tst AS (
    SELECT
        '1.1.2.2.4.4.4.5.5.9.11.15.16.16.19' val
    FROM
        dual
    UNION ALL
    SELECT
        '1.1.1.1.2.2.4.4.4.4.4.5.5.9.11.11.11.15.16.16.19' val
    FROM
        dual
    UNION ALL
    SELECT
        '1.1.2.2.4.4.4.5.5.9.11.15.16.16.19.19.19' val
    FROM
        dual
) SELECT
    val,
    regexp_replace(val,'([^.]*)([.]\1)+([.]|$)','\1\3') deduplicate
FROM
    tst;
VAL DEDUPLICATE
------------------------------------------------------------------------
1.1.2.2.4.4.4.5.5.9.11.15.16.16.19 1.2.4.5.9.11.15.16.19
1.1.1.1.2.2.4.4.4.4.4.5.5.9.11.11.11.15.16.16.19 1.2.4.5.9.11.15.16.19
1.1.2.2.4.4.4.5.5.9.11.15.16.16.19.19.19 1.2.4.5.9.11.15.16.19
My approach does not address possible spaces in the string. One could just remove them separately (e.g. through a separate replace statement).
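The same pattern and replacement can be exercised in Python as a rough cross-check. This sketch assumes Python's re module handles the backreference and the empty $ alternative the same way Oracle does; deduplicate is a hypothetical helper name.

```python
import re

# One value, one or more ".value" repeats, then a closing period or the end of the string.
DEDUP = re.compile(r'([^.]*)([.]\1)+([.]|$)')

def deduplicate(val: str) -> str:
    """Hypothetical helper: keep the first value and the closing delimiter."""
    return DEDUP.sub(r'\1\3', val)

print(deduplicate("1.1.2.2.4.4.4.5.5.9.11.15.16.16.19"))
# 1.2.4.5.9.11.15.16.19
print(deduplicate("11.15"))
# 11.15
```

The third group is what makes the difference: a partial match of "1.1" inside "11.15" is rejected because the next character is neither a period nor the end of the string.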
SELECT name
FROM players
WHERE name ~ '(.*){8,}'
It is really simple but I cannot seem to get it.
I have a list with names and I have to filter out the ones with at least 8 characters... But I still get the full list.
What am I doing wrong?
Thanks! :)
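A (.*){8,} regex means: match any zero or more chars, 8 or more times. Because .* can match the empty string, eight empty matches already satisfy it, so every name passes the filter.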
If you want to match any 8 or more chars, you would use .{8,}.
However, using character_length is more appropriate for this task:
char_length(string) or character_length(string): returns int, the number of characters in string
CREATE TABLE table1
(s character varying)
;
INSERT INTO table1
(s)
VALUES
('abc'),
('abc45678'),
('abc45678910')
;
SELECT * from table1 WHERE character_length(s) >= 8;
See the online demo
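The two filters can be compared side by side in a small Python sketch. It uses the same three sample strings as the table above and assumes Python's re module as a stand-in for the PostgreSQL ~ operator; the variable names are made up.

```python
import re

names = ["abc", "abc45678", "abc45678910"]

# .{8,} needs at least eight characters in a row somewhere in the string.
by_regex = [n for n in names if re.search(r'.{8,}', n)]

# Comparing the length directly, like character_length(s) >= 8.
by_length = [n for n in names if len(n) >= 8]

print(by_regex)   # ['abc45678', 'abc45678910']
print(by_length)  # ['abc45678', 'abc45678910']
```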
I am looking for a regex string to match a set of numbers:
9.50 (numbers without spaces, that have 2 to 4 decimal places)
1 9 . 5 0 (numbers with spaces that have 2 to 4 decimal places)
10 (numbers without spaces and without a decimal point)
So far I have come up with the regex string [0-9\s\.]+, but this is not doing what I want. Any cleaner solutions out there?
Many Thanks
Try this:
[\d\s]+(?:\.(?:\s*\d){2,4})?
This makes the decimal point and the digits/spaces after it optional. If there are digits after, it checks that there are 2-4 of them with {2,4}
DEMO
If this should only match the whole string, you can anchor it.
^[\d\s]+(?:\.(?:\s*\d){2,4})?\s*$
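A quick way to try the anchored version is a Python sketch like the one below. The question does not say which regex flavor is in use, so Python's re module is an assumption here, and the last two sample strings are added only to show rejections.

```python
import re

# Digits and spaces, then optionally a point followed by 2 to 4 digits (spaces allowed).
NUMBER = re.compile(r'^[\d\s]+(?:\.(?:\s*\d){2,4})?\s*$')

samples = ["9.50", "1 9 . 5 0", "10", "127.0.0.1", "9.5"]
results = {text: bool(NUMBER.match(text)) for text in samples}

for text, ok in results.items():
    print(repr(text), ok)
# The first three print True; '127.0.0.1' and '9.5' print False.
```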
The problem with your regex is that it will match 127.0.0.1 as well, which is an IPv4 address, not a number.
The following regex should do the trick:
[0-9]+[0-9\s]*(\.(\s*[0-9]){2,4})?
Assumption I've made: there must be at least one digit before the decimal point.
regex101 demo.
(\d+[\d\s]*\.((\s*\d){2,4})?|\d+)
I was still getting "trailing spaces" selected with the third example of 10
This eliminated them.
Wouldn't this work as well: '[^. 0-9]'?
My full PostgreSQL query looks like this:
split_part(regexp_replace(columnyoudoregexon , '[^. 0-9]', '', 'g'), ' ', 1)
and it does the following:
In each value in the column, everything except digits, spaces, and the point (for the decimal) is replaced with an empty string.
The resulting string is split with split_part(), and you choose which element of the result you want.
I was stuck on this for a while. I hope it helps.
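The two steps of that query translate to Python roughly as follows. This is a sketch, not PostgreSQL code: re.sub stands in for regexp_replace with the 'g' flag, str.split(' ')[0] stands in for split_part(..., ' ', 1), and both the helper name first_number and the sample input are made up.

```python
import re

def first_number(value: str) -> str:
    """Hypothetical helper mirroring the regexp_replace + split_part query."""
    # Drop everything except points, spaces and digits.
    cleaned = re.sub(r'[^. 0-9]', '', value)
    # Take the first space-separated piece.
    return cleaned.split(' ')[0]

print(first_number("9.50 USD"))  # 9.50
```

One thing to watch: if the text before the number is removed but a space remains in front of it, the first piece is an empty string, so a different element index would be needed.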