How to find numbers and exclude any in parentheses using regex - regex

I'm trying to write a regex pattern that will find numbers with two leading 00's in it in a string and replace it with a single 0. The problem is that I want to ignore numbers in parentheses and I can't figure out how to do this.
For example, with the string:
Somewhere 001 (2009)
I want to return:
Somewhere 01 (2009)
I can search by using [00] to find the first 00, and replace with 0 but the problem is that (2009) becomes (209) which I don't want. I thought of just doing a replace on (209) with (2009) but the strings I'm trying to fix could have a valid (209) in it already.
Any help would be appreciated!

Search one non digit (or start of line) followed by two zeros followed by one or more digits.
([^0-9]|^)00[0-9]+
What if the number has three leading zeros? How many zeros do you want it to have after the replacement? If you want to catch all leading zeros and replace them with just one:
([^0-9]|^)00+[0-9]+

Ideally, you'd use negative look behind, but your regex engine may not support it. Here is what I would do in JavaScript:
string.replace(/(^|[^(\d])00+/g,"$10");
That will replace any string of zeros that is not preceded by parenthesis or another digit. Change the character class to [^(\d.] if you're also working with decimal numbers.

?Regex.Replace("Somewhere 001 (2009)", " 00([0-9]+) ", " 0$1 ")
"Somewhere 01 (2009)"

Related

Regular Expression Extracting Text from a group

I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break down the name into groups which are separated by a underscore. Which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so go.
Now I need to extract characters from one of the group for example in group 2 I need the first 3 and 8 decimal ( keep mind they could be characters too ).
So I had try something like this :
(.*?)_([38]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn’t work but if I do this:
(.*?)_([PH]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It will pull the PH into a group but not the 38 ? So I’m lost at this point.
Any help would be great
Try the below Regex to match any first 3 char/decimal and one decimal
(.?)_([A-Z0-9]{3}[0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
Try the below Regex to match any first 3 char/decimal and one decimal/char
(.?)_([A-Z0-9]{3}[A-Z0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
It will match any 3 letters/digits followed by 1 letter/digit.
If your first two letter is a constant like "PH" then try the below
(.?)_([PH]+[0-9A-Z]{2})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
I am assuming that you are trying to match group2 starting with numbers. If that is the case then you have change the source string such as
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1
Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't help to show how in your language, but most languages employ a contains or indexOf method that can be used to determine whether or not a substring exists in a string.
Using regex alone, however, this can be accomplished using the following regular expression.
See regex in use here
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)

Regex to match digits with decimals that have spaces as well as no spaces

I am looking for a regex string to match a set of numbers:
9.50 (numbers without spaces, that have 2 to 4 decimal points)
1 9 . 5 0 (numbers with spaces that have 2 to 4 decimals points)
10 (numbers without spaces and without decimal points)
So far I have come up regex string [0-9\s\.]+, but this not doing what I want. Any cleaner solutions out there?
Many Thanks
Try this:
[\d\s]+(?:\.(?:\s*\d){2,4})?
This makes the decimal point and the digits/spaces after it optional. If there are digits after, it checks that there are 2-4 of them with {2,4}
DEMO
If this should only match the whole string, you can anchor it.
^[\d\s]+(?:\.(?:\s*\d){2,4})?\s*$
The problem with your regex is that it will match 127.0.0.1 as well, which is an IP4 address, not a number.
The following regex should do the trick:
[0-9]+[0-9\s]*(\.(\s*[0-9]){2,4})?
Assumption I've made: You need to place at least one digit (before the comma).
regex101 demo.
(\d+[\d\s]*\.((\s*\d){2,4})?|\d+)
I was still getting "trailing spaces" selected with the third example of 10
This eliminated them.
wouldn't this work as well - '[^. 0-9]' ?
my full postgresql query looks like this:
split_part(regexp_replace(columnyoudoregexon , '[^. 0-9]', '', 'g'), ' ', 1)
and its doing the following:
values in the column get everything except numbers, spaces and point(for decimal) replaced with empty string.
split this new char string with split_part() and call which element in the resulting list you want.
was stuck on this for a while. i hope it helps.

How Can I Create a RegEx Pattern that will Get N Words Using Custom Word Boundary?

I need a RegEx pattern that will return the first N words using a custom word boundary that is the normal RegEx white space (\s) plus punctuation like .,;:!?-*_
EDIT #1: Thanks for all your comments.
To be clear:
I'd like to set the characters that would be the word delimiters
Lets call this the "Delimiter Set", or strDelimiters
strDelimiters = ".,;:!?-*_"
nNumWordsToFind = 5
A word is defined as any contiguous text that does NOT contain any character in strDelimiters
The RegEx word boundary is any contiguous text that contains one or more of the characters in strDelimiters
I'd like to build the RegEx pattern to get/return the first nNumWordsToFind using the strDelimiters.
EDIT #2: Sat, Aug 8, 2015 at 12:49 AM US CT
#maraca definitely answered my question as originally stated.
But what I actually need is to return the number of words ≤ nNumWordsToFind.
So if the source text has only 3 words, but my RegEx asks for 4 words, I need it to return the 3 words. The answer provided by maraca fails if nNumWordsToFind > number of actual words in the source text.
For example:
one,two;three-four_five.six:seven eight nine! ten
It would see this as 10 words.
If I want the first 5 words, it would return:
one,two;three-four_five.
I have this pattern using the normal \s whitespace, which works, but NOT exactly what I need:
([\w]+\s+){<NumWordsOut>}
where <NumWordsOut> is the number of words to return.
I have also found this word boundary pattern, but I don't know how to use it:
a "real word boundary" that detects the edge between an ASCII letter
and a non-letter.
(?i)(?<=^|[^a-z])(?=[a-z])|(?<=[a-z])(?=$|[^a-z])
However, I would want my words to allow numbers as well.
IAC, I have not been able how to use the above custom word boundary pattern to return the first N words of my text.
BTW, I will be using this in a Keyboard Maestro macro.
Can anyone help?
TIA.
All you have to do is to adapt your pattern ([\w]+\s+){<NumWordsOut>} to, including some special cases:
^[\s.,;:!?*_-]*([^\s.,;:!?*_-]+([\s.,;:!?*_-]+|$)){<NumWordsOut>}
1. 2. 3. 4. 5.
Match any amount of delimiters before the first word
Match a word (= at least one non-delimiter)
The word has to be followed by at least one delimiter
Or it can be at the end of the string (in case no delimiter follows at the end)
Repeat 2. to 4. <NumWordsOut> times
Note how I changed the order of the -, it has to be at the start or end, otherwise it needs to be escaped: \-.
Thanks to #maraca for providing the complete answer to my question.
I just wanted to post the Keyboard Maestro macro that I have built using #maraca's RegEx pattern for anyone interested in the complete solution.
See KM Forum Macro: Get a Max of N Words in String Using RegEx

Parse Number from string with regex

I have thousands of article descriptions containing numbers.
they look like:
ca.2760h3x1000.5DIN345x1500e34
the resulting numbers should be:
2760
1000.5
1500
h3 or 3 shall not be a result of the parsing, since h3 is a tolerance only
same for e34
DIN345 is a norm an needs to be excluded (every number with a trailing DIN or BN)
My current REGEX is:
[^hHeE]([-+]?([0-9]+\.[0-9]+|[0-9]+))
This solves everything BUT the norm. How can I get this "DIN" and "BN" treated the same way as a single character ?
Thanx, TomE
Try using this regular expression:
(?<=x)[+-]?0*[0-9]+(?:\.[0-9]+)?|[+-]?0*[0-9]+(?:\.[0-9]+)?(?=h|e)
It looks like every number in your testcase you want to match exept the first number is starting with x.This is what the first part of the regex matches. (?<=x)[+-]?0*[0-9]+(?:\.[0-9]+)?The second part of the regex matches the number until h or e. [+-]?0*[0-9]+(?:\.[0-9]+)?(?=h|e)
The two parts [+-]?0*[0-9]+(?:\.[0-9]+)? in the regex is to match the number.
If we can assume that the numbers are always going to be four digits long, you can use the regex:
(\d{4}\.\d+|\d{4})
DEMO
Depending on the language you might need to replace \d with [0-9].

How to match a one of a set of numbers?

I am trying to match a group of numbers in regex that consist of one of the following:
1,2,3,4,5,6,7,8,9,10,11
But I am having trouble figuring out the regex.
For single digits this pattern worked fine "0|1|2|3|4|5|6|7|8|9" but it fails on double digit numbers. For example 12 passes as ok due to the regex finding the 1 in 12.
You can use begin and end anchors to force the whole string to be matched:
^(0|1|2|3|4|5|6|7|8|9|10|11)$
Which can be shortened to:
^(\d|10|11)$
This will work if you want to check if just one number is between 0 and 11.
^[0-9]$|^1?[0-1]$
If you want to match a string like:
1,2,3,12,32,5,1,6,8, 11
and match 0-11 then you can use the following:
(?<=,|^)([0-9]|1?[0-1])(?=,|$)
use this regex ^(0|1|2|3|4|5|6|7|8|9|(10)|(11))$