Regex: Find a number between parentheses and a specific string

I would like to find lines that contain a number between parentheses, followed by the string BAC.
For example:
ABABBAB (87490), BAC ===> OK
BLABLABLA (65688), BIC ===> Not OK
ABABBAB (75664), EEE ===> Not OK
I have found an answer for getting the number between parentheses:
^.*?\([^\d]*(\d+)[^\d]*\).*$
Now I would like to add the condition that the BAC string must also match.

Something like this should work:
^.*?\([^\d]*(\d+)[^\d]*\),\s+BAC\s*$
, — direct match
\s+ — one or more spaces
BAC — direct match
\s* — zero or more spaces
If you'd like to match and report an arbitrary word after the comma, this should work:
^.*?\([^\d]*(\d+)[^\d]*\),\s+(\S+).*$
\S+ — one or more non-space characters
To match BAC followed by anything:
^.*?\([^\d]*(\d+)[^\d]*\),\s+BAC.*$
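As a rough check, the first pattern above can be run against the three sample lines from the question. This is a minimal sketch in Python; the choice of Python's re module and the names PATTERN, lines and matched_numbers are assumptions made for illustration only.

```python
import re

# First pattern from the answer: a number in parentheses, a comma,
# whitespace, then BAC at the end of the line.
PATTERN = re.compile(r"^.*?\([^\d]*(\d+)[^\d]*\),\s+BAC\s*$")

lines = [
    "ABABBAB (87490), BAC",
    "BLABLABLA (65688), BIC",
    "ABABBAB (75664), EEE",
]

# Keep the captured number (group 1) for every line that matches.
matched_numbers = [m.group(1) for line in lines if (m := PATTERN.match(line))]
print(matched_numbers)
```

Only the first sample line should contribute a number, since the other two end in BIC and EEE.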

You could avoid using a capture group with the following regex:
(?<=\()\d+(?=\))(?=.*\bBAC\b)
Each string of one or more digits surrounded by parentheses and followed by the word BAC (but not BACK or ABAC, for example) is matched.
This regex works with the PCRE (PHP), Python, JavaScript, and Onigmo regex engines, and with others that support fixed-length positive lookbehinds and positive lookaheads.
The regex engine performs the following operations.
(?<=\() # match '(' in a positive lookbehind
\d+ # match 1+ digits
(?=\)) # match ')' in a positive lookahead
(?=.*\bBAC\b) # match 0+ chars followed by `BAC` with word breaks fore and aft
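The lookaround version can be tried in the same way. The sketch below assumes Python's re module (which accepts the fixed-width lookbehind used here); the helper name numbers_before_bac is hypothetical.

```python
import re

# Lookaround-only pattern: the match itself is just the digits, so no
# capture group is needed.
PATTERN = re.compile(r"(?<=\()\d+(?=\))(?=.*\bBAC\b)")

def numbers_before_bac(line):
    """Return every parenthesized number that is followed by the word BAC."""
    return PATTERN.findall(line)

print(numbers_before_bac("ABABBAB (87490), BAC"))
print(numbers_before_bac("BLABLABLA (65688), BIC"))
print(numbers_before_bac("ABABBAB (75664), BACK"))
```

The last call illustrates the word-boundary point: BACK contains BAC, but \bBAC\b does not accept it.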

Related

How to use a negative lookahead to prevent my regular expression from matching?

I'm using this regular expression: ^(\d+(?:\.\d+)?) that will match any decimal or integer numeric value that is followed by any character and will capture only the numeric part of it. For example this regex will match the following values and capture the numeric part of them:
10.5
10.5 Inches
10 Inches
However, it seems like my regex will also match the following value: 6" + 1.5". I want to update my regex so that it doesn't match these types of values. So it shouldn't match if there are multiple numeric values.
I tried doing a negative lookahead like this ^(\d+(?:\.\d+)?)(?!\d), but it doesn't seem to be working.
Converting my comment to answer so that solution is easy to find for future visitors.
You may use this regex:
^(\d+(?:\.\d+)?)\b(?!.*\d)
RegEx Breakdown:
^: Line start
(: Start a capture group
\d+: Match 1+ digits
(?:\.\d+)?: Optionally match dot and 1+ digits
): End capture group
\b: Word boundary
(?!.*\d): Negative lookahead to assert that there is no digit ahead after this match
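To see the breakdown in action, here is a small Python sketch that applies the pattern to the values from the question. The function name leading_number is an assumption for illustration.

```python
import re

# Leading integer or decimal, a word boundary, then a negative lookahead
# that rejects the line if any further digit appears later on.
PATTERN = re.compile(r"^(\d+(?:\.\d+)?)\b(?!.*\d)")

def leading_number(text):
    """Return the leading number, or None if the text holds several numbers."""
    m = PATTERN.match(text)
    return m.group(1) if m else None

for value in ["10.5", "10.5 Inches", "10 Inches", '6" + 1.5"']:
    print(repr(value), "->", leading_number(value))
```

The first three values should yield their numeric part, while the last one is rejected because a second number follows the first.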

Regex TRYING to search with multiple criteria or backwards

Appreciating regex but still beginning.
I tried many workarounds but can't figure how to solve my problem.
String A : 4 x 120glgt
String B : 120glgt
I'd like a regex that returns 120, the number after "x".
But sometimes there won't be an "x". So I'm looking for a single approach that handles both [A] and [B].
I tried:
to start the search from the END
to start right after the "x"
I clearly have some syntax issues and didn't quite get the logic of (?=)
(?=[^x])(?=[0-9]+)
So I'm looking forward to learning with your help.
As you tagged pcre, you could optionally match the leading digits followed by x and use \K to clear the match buffer to only match the digits after it.
^(?:\d+\h*x\h*)?\K\d+
The pattern matches:
^ Start of string
(?:\d+\h*x\h*)? Optionally match 1+ digits followed by x between optional spaces
\K Forget what is matched so far
\d+ Match 1+ digits
If you want to use a lookahead variant, you might use
\d+(?=[^\r\n\dx]*$)
This pattern matches:
\d+ Match 1+ digits
(?= Positive lookahead, assert what is to the right is
[^\r\n\dx]*$ Match optional repetitions of any char except a digit, x or a newline, up to the end of the string
) Close the lookahead
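For readers outside PCRE, here is a hedged Python sketch of both ideas. Python's re module does not support \K or \h, so the first pattern is approximated with a capture group and [ \t] (which covers only space and tab, a narrower set than \h); that substitution is my assumption, not part of the answer. The lookahead variant is used unchanged. The function names are hypothetical.

```python
import re

# Stand-in for the \K pattern: the capture group plays the role of \K,
# and [ \t] approximates \h (horizontal whitespace).
CAPTURE = re.compile(r"^(?:\d+[ \t]*x[ \t]*)?(\d+)")

# Lookahead variant from the answer: digits followed only by characters
# that are neither digits nor x, up to the end of the string.
LOOKAHEAD = re.compile(r"\d+(?=[^\r\n\dx]*$)")

def number_after_x(text):
    m = CAPTURE.match(text)
    return m.group(1) if m else None

def last_number(text):
    m = LOOKAHEAD.search(text)
    return m.group(0) if m else None

for s in ["4 x 120glgt", "120glgt"]:
    print(s, "->", number_after_x(s), last_number(s))
```

Both helpers should return 120 for string A and string B alike.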

Regex to find string with only numbers, but match only when preceded with # or \s and followed by space

I am attempting to find a regex that will find a string of numbers and only match if they are preceded by whitespace or a pound sign and followed by either whitespace or a line break. For example, the following would match:
#1234
#001234
000123
1234
But the following would not:
123-456
#1234
123kok
Using one of those online regex sandboxes, I tried to use a negative look behind:
\d*(?<=#|\s)\d{1,10} but I can't get the following to work. So out of these:
123-456
#1234
123kok
456 would match
(?<=...) is a lookbehind (preceded by ...), (?<!...) is a negative lookbehind (not preceded by ...). Writing \d*(?<=#|\s) doesn't make sense and behaves like (?<=#|\s) alone, since the character before a given position can't be a digit and a # or a whitespace at the same time. But that isn't the problem. All you need is an assertion for the condition after the digits: a lookahead (negative here).
(?<![^\s#])\d+(?!\S)
The double negation: not preceded by a character that is not a whitespace or a #, is useful to include the start of the string. Same thing for the negative lookahead (not followed by a character that is not a whitespace) to include the end of the string.
Obviously:
(?<=^|\s|#)\d+(?=\s|$)
is correct too but longer.
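The double-negation pattern can be checked with a short Python sketch. The name standalone_numbers is hypothetical. As far as I know, Python's re module rejects the longer (?<=^|\s|#) form because its lookbehind alternatives differ in width, which is one more reason to prefer the negated character class there.

```python
import re

# Not preceded by a character other than whitespace or '#',
# and not followed by a non-whitespace character.
STANDALONE = re.compile(r"(?<![^\s#])\d+(?!\S)")

def standalone_numbers(text):
    """Return digit runs delimited by whitespace, '#', or the string edges."""
    return STANDALONE.findall(text)

for s in ["#1234", "#001234", "000123", "1234", "123-456", "123kok"]:
    print(repr(s), "->", standalone_numbers(s))
```

In particular, 456 in 123-456 is no longer matched, because the hyphen before it is neither whitespace nor a pound sign.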

Regex to match numbers followed by a specific character

I am so sorry, I know this is a simple question, which is not appropriate here, but I am terrible at regex.
I use preg_match with a pattern of the form (number A), and I want to make the following replacements:
2A -> <i>2A</i>
100 A -> <i>100 A</i>
84.55A -> <i>84.55A</i>
92.1 A -> <i>92.1 A</i>
The numbers can be separated from the character or not
The numbers can be decimal
The letter should not be the beginning of a word (so 4 All should not match; in fact, A should be followed by a space, a period or a line break)
My problem is applying OR conditions for a character that may or may not be present, so that a single match can be replaced with
$str = preg_replace($pattern, '<i>$1</i>', $str);
I can suggest
'~\b(?<![\d.])\d*\.?\d+\s*A\b~'
Replace with '<i>$0</i>', where $0 is the backreference to the whole match.
Details:
\b - leading word boundary
(?<![\d.]) - a negative lookbehind that fails the match if there is a dot or digit before the current location (NOTE: this is added to avoid matching 33.333.4444 A like strings, just remove if not necessary)
\d*\.?\d+ - a usual simplified float/int value regex (0+ digits, an optional . and 1+ digits) (NOTE: if you need a more sophisticated regex for this, see Matching Floating Point Numbers with a Regular Expression)
\s* - 0+ whitespaces
A\b - a whole word A (here, \b is a trailing word boundary).
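The same pattern can be exercised outside PHP. The sketch below is a Python approximation, not the PHP code from the answer: it drops the ~ delimiters and uses \g<0> in place of $0 to refer to the whole match. The function name italicize is hypothetical.

```python
import re

# Same pattern as the PHP answer, without the ~ delimiters.
NUMBER_A = re.compile(r"\b(?<![\d.])\d*\.?\d+\s*A\b")

def italicize(text):
    """Wrap every 'number A' occurrence in <i>...</i> tags."""
    # \g<0> is Python's reference to the whole match (PHP's $0).
    return NUMBER_A.sub(r"<i>\g<0></i>", text)

for s in ["2A", "100 A", "84.55A", "92.1 A", "4 All"]:
    print(s, "->", italicize(s))
```

The last input stays unchanged because the trailing \b fails between A and l.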

Regex numbers from string

I am trying to write a regex that can find only numbers from given string. What I mean is:
Input: My number is +12 345 678. I have galaxy s3, its symbol 34abc.
Output: 345 and 678 (but not +12, 3 from word s3 or 34 from 34abc)
I tried just numbers (\d+) and combinations with whitespace and word characters. The closest was ^\d$ but that doesn't work as my numbers are part of a bigger string, not whole strings themselves. Can you give me a hint?
------- EDIT
Looks like I just don't know how to check a character without actually getting it into result. Like "digit that follow space character (without this space)".
In general case, you can make use of lookbehind and lookahead:
(?<=^|\s)\d+(?=$|\s)
The part which makes it into the captured output is \d+.
Lookbehind and lookahead are not included in the match.
I just included spaces as delimiters in the regex, but you may replace \s with any character class, as defined by your requirements. For example, to allow dots as separators (both in front and after the digits), use the following regex:
(?<=^|[\s.])\d+(?=$|[\s.])
The (?<=^|\s) should be read as follows:
(?<= ... ) defines the lookbehind group.
The expression which must precede the \d+ is ^|\s, meaning "either start of the line (^) or whitespace".
Similarly, (?=$|\s) defines the lookahead group (it must follow the captured digits), which is either end of the line ($) or whitespace.
A note on \b mentioned in other answers: it is a nice feature, means "word boundary", but the "word characters" are not customizable. This means that, for example, the "+" character is considered to be a separator and you can't change this if you use \b. With lookaround, you can customize the separators to your needs.
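Here is a hedged Python sketch of the customizable-separator idea. Python's re module does not accept (?<=^|\s), since the alternatives in the lookbehind differ in width, so the sketch swaps in negated forms that I believe are equivalent: (?<!\S) for "start or whitespace before" and (?!\S) for "end or whitespace after". That rewrite, and the constant names, are my assumptions.

```python
import re

text = "My number is +12 345 678. I have galaxy s3, its symbol 34abc."

# Whitespace (or string edge) on both sides.
WHITESPACE_DELIMITED = re.compile(r"(?<!\S)\d+(?!\S)")

# Whitespace, a dot, or a string edge on both sides.
DOT_OR_SPACE_DELIMITED = re.compile(r"(?<![^\s.])\d+(?![^\s.])")

print(WHITESPACE_DELIMITED.findall(text))
print(DOT_OR_SPACE_DELIMITED.findall(text))
```

With whitespace only, 678 is skipped because a dot follows it; once the dot is allowed as a separator, both 345 and 678 are found, while +12, s3 and 34abc are still rejected.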
What you seem to want is a sequence of digits (\d+) that is preceded by a whitespace (\s) or the start of the string (^), and followed by a whitespace or punctuation character ([\s.,:;!?]) or the end of the string ($), but the preceding/following whitespace or punctuation character should not be included in the match, so you need positive lookahead ((?=xxx)) and lookbehind ((?<=xxx)).
(?<=^|\s)\d+(?=[\s.,:;!?]|$)
Remember to double the backslashes in a Java literal.
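For comparison, a Python sketch of the same idea follows. A raw string (r"...") avoids the backslash doubling that a Java literal needs. The lookbehind is written as (?<!\S) rather than (?<=^|\s), because Python's re module requires fixed-width lookbehinds; that substitution and the name plain_numbers are assumptions on my part.

```python
import re

# Digits preceded by whitespace or the start of the string, and followed
# by whitespace, one of the listed punctuation marks, or the end.
NUMBER = re.compile(r"(?<!\S)\d+(?=[\s.,:;!?]|$)")

def plain_numbers(text):
    return NUMBER.findall(text)

sample = "My number is +12 345 678. I have galaxy s3, its symbol 34abc."
print(plain_numbers(sample))
```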
Safer RegEx
Try this:
(?<=\s|^)\d+(?=\s|\b)
How it works:
(?<=\s|^) # Start of String OR Whitespace (will not select +)
# Positive Lookbehind ensures the data is not included in the match
\d+ # Digit(s)
(?=\s|\b) # Whitespace OR Word Boundary
# Positive Lookahead ensures the data is not included in the match
Lookarounds do not consume any characters in the match, so they can be used in place of Capture Groups. For example:
# Regex /.*barbaz/
barbaz # Matched Data Result: barbaz
foobarbaz # Matched Data Result: foobarbaz
# Regex (with Positive Lookahead) /.*bar(?=baz)/
barbaz # Matched Data Result: bar
foobarbaz # Matched Data Result: foobar
As you can see with the second RegEx, baz is never included in the matched data result; however, it was required in the string for the RegEx to match. The RegEx above works on the same principle.
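The barbaz comparison can be reproduced with a few lines of Python. This is only an illustrative sketch; the constant names are made up.

```python
import re

WITHOUT_LOOKAHEAD = re.compile(r".*barbaz")
WITH_LOOKAHEAD = re.compile(r".*bar(?=baz)")

for sample in ("barbaz", "foobarbaz"):
    plain = WITHOUT_LOOKAHEAD.search(sample).group()
    looked = WITH_LOOKAHEAD.search(sample).group()
    # The lookahead requires "baz" to be present but leaves it out of the match.
    print(sample, "->", plain, "|", looked)
```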
Not as Safe (Old) RegEx
You can try this RegEx:
\b\d+\b
\b is a Word Boundary. This will, however, select 12 from +12.
You can change the RegEx to this to stop 12 from being selected:
(?<!\+)\b\d+\b
This uses a Negative Lookbehind and will fail if there is a + before the digits.
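A quick Python sketch shows the difference between the two word-boundary patterns on the sentence from the question. The constant names are hypothetical.

```python
import re

text = "My number is +12 345 678. I have galaxy s3, its symbol 34abc."

WORD_BOUNDARY = re.compile(r"\b\d+\b")
NO_LEADING_PLUS = re.compile(r"(?<!\+)\b\d+\b")

print(WORD_BOUNDARY.findall(text))    # still picks up the 12 from +12
print(NO_LEADING_PLUS.findall(text))  # the negative lookbehind filters it out
```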