Trimming leading zeros with regex - regex

I'm struggling to get a regex to work.
I need to allow the following transforms based on the existence of leading zeros:
001234 -> 1234
1234 -> 1234
00AbcD -> AbcD
001234.1234 -> 1234.1234
001234.000002 -> 1234.2
001234/000002 -> 1234.2
I've found that the expression below works well for transforms 1, 2 and 3, but I'm not sure how to match the (optional) second section demonstrated in 4, 5 and 6.
^0*([0-9A-Za-z]*$)

You can match the zeros with the following regex:
/(^|[./])0+/g
and replace each match with the first group (\1), which holds either the start of the string or the separator that precedes the zeros.
For example, in Python you can do the following:
>>> import re
>>> s="""001234
... 1234
... 00AbcD
... 001234.1234
... 001234.000002
... 001234/000002"""
>>> [re.sub(r'(^|[./])0+',r'\1',i) for i in s.split()]
['1234', '1234', 'AbcD', '1234.1234', '1234.2', '1234/2']

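^(0+)(.+)
Group 2 holds the result. Note that this only strips zeros at the start of the string and does not match at all when there are no leading zeros.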

Related

return nth match from string using regex

I am using Tableau to create a visualization and need to apply Regex to string values in my data set. I'm trying to use Regex to return the nth match of this string of data: b29f3b2f2b2f3b3f1r2f3+b3x#. The data will always be in one line and I need to break the data out into substrings each time the characters b,s,f, or d are encountered and I need to match the nth occurrence returned. For example, when identifying which number match to return the following will match:
n=1 matches b29
n=2 matches f3
n=3 matches b2
n=4 matches f2
n=5 matches b2
n=6 matches f3
n=7 matches b3
n=8 matches f1r2
n=9 matches f3+
n=10 matches b3x#
I can get the n=1 match to return the proper value using bfsd(?=[bfsd]) and have tried to get the subsequent values to return using lookahead, but can't find a regex which works. Any help is appreciated.
Your item pattern is [bfsd][^bfsd]*.
You may use ^(?:.*?([bfsd][^bfsd]*)){n} to get what you need, just update the n variable with the number you need to get.
This pattern will get you the second value:
^(?:.*?([bfsd][^bfsd]*)){2}
Details
^ - start of string
(?:.*?([bfsd][^bfsd]*)){2} - two occurrences of
.*? - any 0+ chars, as few as possible
([bfsd][^bfsd]*) - b, f, s or d followed by 0+ chars other than b, f, s and d.
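As a rough illustration in Python (the answer itself is not tied to a language, and the helper name nth_item is made up for this sketch), the counted pattern can be built for any n:

```python
import re

def nth_item(text, n):
    """Return the n-th (1-based) item, or None if there are fewer than n items."""
    # The non-capturing group is repeated n times; group 1 keeps only
    # the capture from the last repetition, i.e. the n-th item.
    pattern = r'^(?:.*?([bfsd][^bfsd]*)){%d}' % n
    match = re.match(pattern, text)
    return match.group(1) if match else None

data = "b29f3b2f2b2f3b3f1r2f3+b3x#"
print(nth_item(data, 2))   # f3
print(nth_item(data, 8))   # f1r2
```

Asking for more items than the string contains simply yields no match, which this sketch reports as None.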
You can use this regex:
[bsfd][^bsfd]*
Use the 'global' flag.
This will create matches that start with one of the four letters, followed by any number of other characters.
The result will be an array with all the matches. Note the Array will start with index 0 (not 1).
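A minimal Python sketch of this approach, assuming re.findall as the stand-in for the 'global' flag (other languages expose it differently):

```python
import re

data = "b29f3b2f2b2f3b3f1r2f3+b3x#"

# findall returns every non-overlapping match, like a global search.
items = re.findall(r'[bsfd][^bsfd]*', data)
print(items)

n = 8
# List indices start at 0, so the n-th match lives at index n - 1.
print(items[n - 1])
```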
If you have gawk, FPAT will partition the input into fields as per your spec:
$ awk -v FPAT='[a-f][0-9rx#+]+' '{$1=$1}1'
$ echo "b29f3b2f2b2f3b3f1r2f3+b3x#" |
awk -v FPAT='[a-f][0-9rx#+]+' '{for(i=1;i<=NF;i++) print i " -> " $i}'
1 -> b29
2 -> f3
3 -> b2
4 -> f2
5 -> b2
6 -> f3
7 -> b3
8 -> f1r2
9 -> f3+
10 -> b3x#

Regex number pipe

I got this regular expression:
192\.168\.[1|2|5|20]\.[0-9]{1,3}
192.168.2.123 -> OK
192.168.5.123 -> OK
192.168.20.123 -> Error
I want to accept only the values 1, 2, 5 and 20 for X in 192.168.X.122
(The rest of the regular expression is correct; the problem only appears when I try to match the value 20.)
I can't reproduce your observations, but here is a pattern which should meet your requirements:
192\.168\.(?:1|2|5|20)\.122
It looks like you were confusing a character class, which matches one character out of those listed inside square brackets, with an alternation, which lists different patterns of text, one of which needs to match.
This
[1|2|5|20]
actually says to match a single character that is 0, 1, 2, 5 or a pipe. If you want to match any of these numbers, then use an alternation:
(1|2|5|20)
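The difference can be checked with a short Python sketch. It keeps the asker's [0-9]{1,3} for the last octet; the variable names are only illustrative:

```python
import re

# Character class: exactly ONE character out of 0, 1, 2, 5 or '|'.
char_class = re.compile(r'192\.168\.[1|2|5|20]\.[0-9]{1,3}')
# Alternation: one of the whole numbers 1, 2, 5 or 20.
alternation = re.compile(r'192\.168\.(?:1|2|5|20)\.[0-9]{1,3}')

samples = ["192.168.2.123", "192.168.20.123", "192.168.0.123", "192.168.|.123"]
for ip in samples:
    print(ip, bool(char_class.fullmatch(ip)), bool(alternation.fullmatch(ip)))
```

The character class rejects 20 (two characters) but accepts 0 and a literal pipe, which is the opposite of what was intended.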

Convert a regex expression to erlang's re syntax?

I am having a hard time trying to convert the following regular expression into Erlang syntax.
What I have is a test string like this:
1,2 ==> 3 #SUP: 1 #CONF: 1.0
And the regex that I created with regex101 is this (see below):
([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)
But I am getting weird match results if I convert it to erlang - here is my attempt:
{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
Also, I get more than four matches. What am I doing wrong?
Here is the regex101 version:
https://regex101.com/r/xJ9fP2/1
I don't know much about erlang, but I will try to explain. With your regex
>{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
>re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
{match,[{0, 28},{0,3},{8,1},{16,1},{25,3}]}
Each tuple is {Start, Length}: the first number is the starting index of the match, and the second is the total number of matched characters from that starting index.
Reason for more than four groups
The first match always represents the entire string matched by the complete regex; the rest are the four captured groups you want. So there are 5 groups in total.
([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)
([\\d,]+) - first group
(\\d+) - second group
(\\d) - third group
(\\d+.\\d+) - fourth group
The whole regex matches the entire string; that is the first match you are getting (the zeroth group).
How to find desired answer
Here we want everything except the first group (which is the entire match of the regex), so we can use all_but_first to skip it:
> re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M, [{capture, all_but_first, list}]).
{match,["1,2","3","1","1.0"]}
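The same "whole match plus captures" layout exists in most regex engines. As a hedged cross-check in Python (not Erlang; Python reports start and end offsets, so the lengths are computed here to mirror the Erlang tuples):

```python
import re

pattern = re.compile(r'([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)')
match = pattern.search("1,2 ==> 3 #SUP: 1 #CONF: 1.0")

# Group 0 is the whole match; groups 1..4 are the parenthesised captures.
pairs = [(match.start(i), match.end(i) - match.start(i)) for i in range(5)]
print(pairs)            # (start, length) pairs, whole match first
print(match.group(0))   # the entire matched string
print(match.groups())   # only the four captures, like all_but_first
```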
More information is available in the documentation for the capture option of the re module.
If you are in doubt about the content of the string, you can print it and check:
1> RE = "([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)".
"([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)"
2> io:format("RE: /~s/~n", [RE]).
RE: /([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)/
For the rest of the issue, there is a great answer by rock321987.

Select text in regex between 2 strings

I have the following line :
3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W OVOIDE AI E27 SON PIA PLUS
I'd like to get the string : EI AMANDINE MRV SHP 70 W. So I decided to select the strings between 1 (can also be 2, 3 or 99) and 0 (can also be 1, 2, 3, 4 or 5).
I tried :
(0|1|2|3|99)(.*)(0|1|2|3|4|5)
But I have this result :
EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W OVOIDE AI E
that is not what I want to obtain.
Do you have an idea in regex to make that selection work ?
Thanks !
You were pretty close! Try this:
\b(?:0|1|2|3|99) ([^0|1|2|3|99].*?) (?:0|1|2|3|4|5)\b
I think that you want to match "words" 4 to 9?
Your desired match will be in group 2, with a trailing space (group 1 only holds the last of the three skipped words):
^(\S+\s){3}((\S+\s){6})
Enable the multiline option if you have a whole file of subject strings.
You can try:
\s(?:[0-3]|99)\s([A-Z].*?)\b(?:[0-5])\b
and get the string from group 1. Or, if your language supports lookarounds, try:
(?<=\s[0-3]\s|99)[A-Z].+?(?=\s[0-5]\s)
to get the match directly.
Another solution that is based on matching all initial space + digit sequences:
\b(?:(?:[0-3]|99)\b\s*)+(.*?)\s*\b(?:[0-5])\b
The result is in Group 1.
With \b(?:(?:[0-3]|99)\b\s*)+ the rightmost number from the allowed leading set is picked.
You can use the following regex:
(?:(?:[0-3]|99)\s)+(.*?)\s(?:[0-5])\s
See demo https://regex101.com/r/iX6oE1/6
Also note that to match a range of digits you can use a character class instead of multiple alternatives.
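To see the last pattern at work, here is a small Python sketch run against the sample line from the question (the variable names are only illustrative, and other engines may need different escaping):

```python
import re

line = ("3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 "
        "SHP 70W OVOIDE AI E27 SON PIA PLUS")

# One or more leading "digit + whitespace" tokens, then a lazy capture
# that stops at the first standalone digit in the range 0-5.
pattern = re.compile(r'(?:(?:[0-3]|99)\s)+(.*?)\s(?:[0-5])\s')

match = pattern.search(line)
label = match.group(1) if match else None
print(label)  # EI AMANDINE MRV SHP 70 W
```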

Remove last 4 digits from string if pattern matches

I'm trying to remove the last 4 digits from a string in Postgres if and only if they match a certain pattern: [0][1-9][0][1-9].
Example:
1031610101 -> 103161
1234 -> 1234
123456 -> 123456
123405 -> 123405
I've tried a few approaches using substring, but somehow can't get this to work.
The length of the string is variable.
So far I've tried:
substring(value from '([\d](3,6}[0][1-9][0][1-9])') as "Result"
Easier with regexp_replace():
SELECT regexp_replace(col, '0[1-9]0[1-9]$', '')
FROM tbl;
$ anchors the match at the end of the string, so only a trailing 0[1-9]0[1-9] is removed.
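The same anchored replacement can be tried outside the database. This is only a Python mirror of the regex, not Postgres code, and strip_suffix is a hypothetical helper name:

```python
import re

def strip_suffix(value):
    # Remove the last four digits only when they look like 0X0Y, X and Y in 1-9.
    return re.sub(r'0[1-9]0[1-9]$', '', value)

for value in ["1031610101", "1234", "123456", "123405"]:
    print(value, '->', strip_suffix(value))
```

Values whose last four digits do not fit the pattern come back unchanged, which matches the examples in the question.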
SELECT SUBSTR('ABCDEFGHIJKLMNOP', 1, LENGTH('ABCDEFGHIJKLMNOP') - 4);
Syntax: SUBSTR('string', from_position, length)
Note that this drops the last four characters unconditionally, without checking the pattern.