I appreciate regex but am still a beginner.
I tried many workarounds but can't figure out how to solve my problem.
String A : 4 x 120glgt
String B : 120glgt
I'd like the proper regex to return 120 as the number after "x".
But sometimes there won't be an "x". So, whether it is [A] or [B], I'm looking for one single approach.
I tried :
to start the search from the END
Start right after the "x"
I clearly have some syntax issues and didn't quite get the logic of (?=)
(?=[^x])(?=[0-9]+)
So I'm looking forward to learning with your help.
As you tagged pcre, you could optionally match the leading digits followed by x and use \K to clear the match buffer to only match the digits after it.
^(?:\d+\h*x\h*)?\K\d+
The pattern matches:
^ Start of string
(?:\d+\h*x\h*)? Optionally match 1+ digits followed by x between optional spaces
\K Forget what is matched so far
\d+ Match 1+ digits
See a regex demo.
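For flavors without \K or \h, such as Python's built-in re module, the same idea can be sketched with a capture group. This is an adaptation rather than the PCRE pattern above: [ \t] stands in for \h, group 1 replaces \K, and the helper name is made up for illustration.

```python
import re

# Adaptation for Python's re module, which has neither \K nor \h:
# the optional "<digits> x" prefix is consumed, and only group 1 is returned.
NUMBER_AFTER_X = re.compile(r"^(?:\d+[ \t]*x[ \t]*)?(\d+)")

def number_after_x(text):  # hypothetical helper name
    match = NUMBER_AFTER_X.search(text)
    return match.group(1) if match else None

print(number_after_x("4 x 120glgt"))
print(number_after_x("120glgt"))
```

Both sample strings from the question yield 120 with this sketch.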
If you want to use a lookahead variant, you might use
\d+(?=[^\r\n\dx]*$)
This pattern matches:
\d+ Match 1+ digits
(?= Positive lookahead, assert what is to the right is
[^\r\n\dx]*$ Match optional repetitions of any char except a digit, x or a newline, up to the end of the string
) Close the lookahead
See another regex demo.
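The lookahead variant uses no PCRE-only syntax, so it can be tried unchanged in other flavors. A minimal sketch in Python, where the constant name is an assumption for illustration:

```python
import re

# Same pattern as above: digits that are followed, up to the end of the
# string, only by characters that are neither digits, "x", nor newlines.
LAST_NUMBER = re.compile(r"\d+(?=[^\r\n\dx]*$)")

for sample in ("4 x 120glgt", "120glgt"):
    found = LAST_NUMBER.search(sample)
    print(sample, "->", found.group() if found else None)
```

The leading 4 is skipped because an x sits between it and the end of the string, so the lookahead fails there.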
Related
I've got 2 strings in the format:
Some_thing_here_1234 Match Me 1 & 1234 Match Me 1_1
In both cases I want the resultant match to be 1234 Match Me 1
So far I've got (?<=^|_)\d{4}\s.+ which works but in the case of string 2 also captures the _1 at the end. I thought I could use a lookahead at the end with an optional such as (?<=^|_)\d{4}\s.+(?=_\d{1}$|$) but it always seems to revert to the second option and so the _1 gets through.
Any help would be great
You can use
(?<=^|_)\d{4}\s[^_]+
See the regex demo.
Details:
(?<=^|_) - a positive lookbehind that matches a location that is immediately preceded with either start of string or a _ char (equal to (?<![^_]))
\d{4} - four digits
\s - a whitespace
[^_]+ - one or more chars other than _.
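A quick check of this pattern in Python, as a sketch only. Python's re module requires a fixed-width lookbehind, so this uses the equivalent (?<![^_]) form mentioned above instead of (?<=^|_); the variable names are assumptions.

```python
import re

# (?<![^_]) means "not preceded by a character other than _",
# i.e. the position is the start of the string or directly follows _.
NO_UNDERSCORE = re.compile(r"(?<![^_])\d{4}\s[^_]+")

samples = ["Some_thing_here_1234 Match Me 1", "1234 Match Me 1_1"]
results = [NO_UNDERSCORE.search(s).group() for s in samples]
print(results)
```

Both strings give 1234 Match Me 1, because [^_]+ stops at the first underscore.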
In your second pattern (?<=^|_)\d{4}\s.+(?=_\d{1}$|$), the .+ is greedy, so it first runs to the end of the string, where the second alternative $ matches; you therefore keep matching the whole line.
Note that you can omit {1}.
If you want to use an optional part in the lookahead, you can make the match non-greedy and optionally match _\d in the lookahead, followed by the end of the string.
(?<=^|_)\d{4}\s.+?(?=(?:_\d)?$)
See a regex demo.
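The lazy variant can be sketched in Python the same way, again with (?<![^_]) substituted for (?<=^|_) because Python's re needs a fixed-width lookbehind. The third sample string is invented to show where this variant differs from a [^_]+ based pattern.

```python
import re

# .+? grows one character at a time until the lookahead sees either
# "_<digit>" followed by the end of the string, or the end itself.
LAZY = re.compile(r"(?<![^_])\d{4}\s.+?(?=(?:_\d)?$)")

lazy_samples = [
    "Some_thing_here_1234 Match Me 1",
    "1234 Match Me 1_1",
    "1234 Match_Me 1_1",  # invented sample with an underscore inside the text
]
lazy_results = [LAZY.search(s).group() for s in lazy_samples]
print(lazy_results)
```

Unlike a negated character class, this version keeps underscores inside the text and drops only a trailing _ plus one digit.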
I need to match a character to split a big string, let's say -, but not if it's between two digits
In a-b it should match -
In a-4 it should match -
In 3-a it should match -
In 3-4 it should not match
I've tried negative lookahead and lookbehind, but I've only been able to come up with this (?<=\D)-(?=\D)|(?<=\d)-(?=\D)|(?<=\D)-(?=\d)
Is there a simpler way to specify this pattern?
Edit: using regex conditionals I think I can use (?(?<=\D)-|-(?=\D))
The following will work for this scenario. Be sure that your Regex flavor of choice has conditionals, otherwise this will not work:
-(?(?=\d)(?<=\D-))
- // match a dash
(? // If
(?=\d) // the next character is a digit
(?<= // then start a lookbehind (assert preceding characters are)
\D- // a non-digit then the dash we matched
) // end lookbehind
) // end conditional
Use an empty string as the substitution, since the dash is the only character matched.
Another option is to use an alternation to match a - when on the left is not a digit or match a - when on the right is not a digit:
(?<!\d)-|-(?!\d)
(?<!\d)- Negative lookbehind, assert what is on the left is not a digit and match -
| or
-(?!\d) Match - and assert what is on the right is not a digit using a negative lookahead
Regex demo
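The alternation needs no conditionals, so it also runs in flavors that lack them. A small sketch with Python's re.split, using the four cases from the question plus one invented longer string:

```python
import re

# Split on "-" unless it has a digit on both sides.
SPLIT_DASH = re.compile(r"(?<!\d)-|-(?!\d)")

for sample in ("a-b", "a-4", "3-a", "3-4", "a-b-3-4-c"):
    print(sample, SPLIT_DASH.split(sample))
```

Only the dash in 3-4 survives, since neither alternative can match it.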
I want to match
abc_def_ghi,
abc_abc_ghi,
abc_a2a_ghi,
abc_999_ghi
but not abc_xxx_ghi (with xxx in center).
I came up with a lookahead followed by manually consuming the characters (abc_(?!xxx)..._ghi), but I wonder whether there is another way that avoids manually specifying the number of characters to skip.
The original question was about numbers; I updated it for the string case.
If you don't want to specify exactly how many characters to skip, perhaps you could use a quantifier like + in the negative lookahead and use a negated character class to match not an underscore.
\babc_(?!x+_)[^_]+_ghi\b
Explanation
\babc_ Word boundary, match abc_
(?! Negative lookahead, assert what is directly on the right is not
x+_ Match 1+ times x followed by an underscore
) Close lookahead
[^_]+_ Negated character class, match 1+ times any char except _, then match _
ghi\b Match ghi and word boundary
Regex demo
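To see which of the question's inputs pass, the pattern can be run over them in a short Python sketch (the list and variable names are assumptions):

```python
import re

NOT_XXX = re.compile(r"\babc_(?!x+_)[^_]+_ghi\b")

candidates = ["abc_def_ghi", "abc_abc_ghi", "abc_a2a_ghi",
              "abc_999_ghi", "abc_xxx_ghi"]
matched = [c for c in candidates if NOT_XXX.search(c)]
print(matched)
```

Note that x+ rejects a middle part made of any number of x characters, so abc_x_ghi is excluded as well as abc_xxx_ghi.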
You can use this
123_(?:(?!000)\d){3}_789
Regex demo
If you don't wish to use look-arounds, this expression might be an option:
(?:abc_xxx_ghi)|(abc_.{3}_ghi)
Other than that I can't think of anything else.
DEMO
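The alternation above works by matching the unwanted string without capturing it and capturing everything else in group 1, so the caller has to keep only the matches where group 1 is set. A sketch of that filtering step in Python, with an invented sample text:

```python
import re

SKIP_OR_CAPTURE = re.compile(r"(?:abc_xxx_ghi)|(abc_.{3}_ghi)")

text = "abc_def_ghi abc_xxx_ghi abc_999_ghi"  # invented sample input
# Group 1 is None whenever the first (unwanted) alternative matched.
wanted = [m.group(1) for m in SKIP_OR_CAPTURE.finditer(text)
          if m.group(1) is not None]
print(wanted)
```

The order of the alternatives matters: the unwanted literal must come first so that it wins at positions where both could match.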
There are a thousand regular expression questions on SO, so I apologize if this is already covered. I did look first.
I have string:
Name Subname 11X22 88X620 AB33(20) YA5619 77,66
I need to capture this string: YA5619
What I am doing is just finding AB33(20) and after this I am capturing until first white space. But AB33(20) can be AB-33(20) or AB33(-20) or AB33(-1).
My preg_match regex is: (?<=\bAB\d{2}\(\d{2}\)\s).+?(?=\s)
Why am I getting an error when I change \d{2} to \d+?
For the final result I thought this regex would work, but it doesn't:
(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)
Any ideas what I am doing wrong?
With most regex flavors, lookbehind needs to evaluate to a fixed-length sequence, so you can't use variable quantifiers like * or + or even {1,2}.
Instead of using lookaround, you can simply match your marker pattern and then forget it with \K.
AB-?\d+(?:\(-?\d+\))? \K[^ ]+
demo: https://regex101.com/r/8XXngH/1
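Python's re module is one of the flavors that rejects a variable-length lookbehind, which makes it handy for reproducing the error. The sketch below first shows the failure, then a capture-group adaptation of the \K pattern (re has no \K, so group 1 plays its role). The variable names are assumptions.

```python
import re

text = "Name Subname 11X22 88X620 AB33(20) YA5619 77,66"

# The asker's final pattern: \d+ and -? make the lookbehind variable-length.
try:
    re.compile(r"(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Match the marker normally and capture what follows the space.
AFTER_MARKER = re.compile(r"AB-?\d+(?:\(-?\d+\))? ([^ ]+)")
print(lookbehind_ok, AFTER_MARKER.search(text).group(1))
```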
It depends on the language. In .NET, for example, the pattern matches because that flavor supports variable-length lookbehind.
Another solution might be to use a character class listing the characters you allow. Then match a whitespace character, reset the match with \K, and match \S+, which matches 1+ non-whitespace characters.
\bAB[()\d-]+\s\K\S+
Explanation
\bAB Match AB literally, preceded by a word boundary to prevent AB from being part of a longer word.
[()\d-]+ Match 1+ times any of the listed character in the character class
\s Match a whitespace char (or \s+ to match 1 or more)
\K Reset the starting point of the reported match (forget what was matched so far)
\S+ Match 1+ non-whitespace characters
Regex demo | Php demo
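The character-class version can be checked against the variants from the question. This Python sketch swaps \K for a capture group, since Python's re does not support \K; the sample list is assembled from the forms the asker mentions.

```python
import re

# [()\d-]+ accepts any mix of parentheses, digits and hyphens after AB.
TOKEN_AFTER_AB = re.compile(r"\bAB[()\d-]+\s(\S+)")

variants = ["AB33(20) YA5619 77,66", "AB-33(20) YA5619 77,66",
            "AB33(-20) YA5619 77,66", "AB33(-1) YA5619 77,66"]
tokens = [TOKEN_AFTER_AB.search(v).group(1) for v in variants]
print(tokens)
```

The class is looser than the explicit AB-?\d+(...) form: it would also accept markers such as AB)(, which may or may not matter for your data.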
I am so sorry, I know this is a simple question, which is not appropriate here, but I am terrible at regex.
I use preg_match with a pattern of the form (number A) and want the matched substrings replaced as follows:
2A -> <i>2A</i>
100 A -> <i>100 A</i>
84.55A -> <i>84.55A</i>
92.1 A -> <i>92.1 A</i>
The numbers can be separated from the character or not
The numbers can be decimal
The letter should not be the beginning of a word (so 4 All should not match); in fact, A should be followed by a space, a period or a line break
My problem is applying OR conditions for a character that may or may not be present, so that I get a single match to be replaced as
$str = preg_replace($pattern, '<i>$1</i>', $str);
I can suggest
'~\b(?<![\d.])\d*\.?\d+\s*A\b~'
See the regex demo. Replace with '<i>$0</i>' where the $0 is the backreference to the whole match.
Details:
\b - leading word boundary
(?<![\d.]) - a negative lookbehind that fails the match if there is a dot or digit before the current location (NOTE: this is added to avoid matching 33.333.4444 A like strings, just remove if not necessary)
\d*\.?\d+ - a usual simplified float/int value regex (0+ digits, an optional . and 1+ digits) (NOTE: if you need a more sophisticated regex for this, see Matching Floating Point Numbers with a Regular Expression)
\s* - 0+ whitespaces
A\b - a whole word A (here, \b is a trailing word boundary).
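The pattern has no PHP-specific syntax once the ~ delimiters are dropped, so the replacement can be tried in Python too. In this sketch \g<0> is Python's way of referring to the whole match (the role $0 plays in PHP), and the helper name is made up for illustration.

```python
import re

AMPS = re.compile(r"\b(?<![\d.])\d*\.?\d+\s*A\b")

def italicize(text):  # hypothetical helper name
    # Wrap every "<number> A" occurrence in <i>...</i>.
    return AMPS.sub(r"<i>\g<0></i>", text)

for sample in ("2A", "100 A", "84.55A", "92.1 A", "4 All"):
    print(sample, "->", italicize(sample))
```

The last sample stays unchanged because the trailing \b fails between A and l.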