I'm having trouble getting a regex to work for this. I basically want to recognize just the second half of something like firsthalf.secondhalf(): as a function, so that in this example only .secondhalf(): is recognized and shown in a different color from firsthalf.
I've tried, but to no avail:
<regex>(\w*()\b)</regex>
Try the following:
(\w*\(\):)
\w*\(\)\:$
\w* match any word character [a-zA-Z0-9_]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\( matches the character ( literally
\) matches the character ) literally
\: matches the character : literally
$ assert position at end of the string
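A quick way to check what the pattern actually matches is a sketch with Python's standard re module (Python is an assumption here; the question doesn't say which editor or engine is involved). Note that the answer's pattern matches secondhalf(): without the leading dot. The second pattern, \.\w+\(\):, is a hypothetical variation not taken from the answer that also includes the dot.

```python
import re

text = "firsthalf.secondhalf():"

# Pattern from the answer: word characters followed by a literal "():".
answer = re.search(r"\w*\(\):", text)
print(answer.group())    # secondhalf():

# Hypothetical variation: require a literal dot before the function name.
with_dot = re.search(r"\.\w+\(\):", text)
print(with_dot.group())  # .secondhalf():
```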
There are a thousand regular expression questions on SO, so I apologize if this is already covered. I did look first.
I have this string:
Name Subname 11X22 88X620 AB33(20) YA5619 77,66
I need to capture this part: YA5619
What I'm doing is finding AB33(20) and then capturing everything up to the first whitespace. But AB33(20) can also be AB-33(20), AB33(-20) or AB33(-1).
My preg_match regex is: (?<=\bAB\d{2}\(\d{2}\)\s).+?(?=\s)
Why do I get an error when I change \d{2} to \d+?
For the final result I thought this regex would work, but it doesn't:
(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)
Any ideas what I am doing wrong?
With most regex flavors, lookbehind needs to evaluate to a fixed-length sequence, so you can't use variable quantifiers like * or + or even {1,2}.
Instead of using lookaround, you can simply match your marker pattern and then forget it with \K.
AB-?\d+(?:\(-?\d+\))? \K[^ ]+
demo: https://regex101.com/r/8XXngH/1
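Python's re module also requires fixed-width lookbehind, so it can illustrate both points in a small sketch (this assumes Python 3, not PHP). Python's re has no \K, so the sketch swaps in a capture group around the part after the marker; that is a substitute technique, not the answer's exact pattern.

```python
import re

# Variable-length lookbehind (\d+ inside (?<=...)) is rejected when the pattern compiles.
try:
    re.compile(r"(?<=\bAB\d+\(\d+\)\s).+?(?=\s)")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)
print(lookbehind_error)

# The answer's marker pattern, with a capture group standing in for \K.
marker = re.compile(r"AB-?\d+(?:\(-?\d+\))? ([^ ]+)")

inputs = [
    "Name Subname 11X22 88X620 AB33(20) YA5619 77,66",
    "Name Subname 11X22 88X620 AB-33(20) YA5619 77,66",
    "Name Subname 11X22 88X620 AB33(-20) YA5619 77,66",
    "Name Subname 11X22 88X620 AB33(-1) YA5619 77,66",
]
results = [marker.search(s).group(1) for s in inputs]
print(results)  # ['YA5619', 'YA5619', 'YA5619', 'YA5619']
```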
It depends on the language. In .NET, for example, your pattern does match, because .NET supports variable-length lookbehind.
Another solution is to use a character class containing the characters you want to allow. Then match a whitespace character, reset the start of the reported match with \K, and match \S+, which matches one or more non-whitespace characters.
\bAB[()\d-]+\s\K\S+
Explanation
\bAB Match AB literally, preceded by a word boundary so that AB is not part of a larger word
[()\d-]+ Match 1+ times any of the characters listed in the character class
\s Match a whitespace char (or \s+ to match 1 or more)
\K Reset the starting point of the reported match (forget what was matched so far)
\S+ Match 1+ times any character that is not whitespace; this is the reported match
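The character-class approach can be sketched the same way in Python (an assumption; the answer targets PHP). Again, Python's re has no \K, so a capture group around \S+ takes its place.

```python
import re

# Character-class variant from the answer; a capture group replaces \K.
pattern = re.compile(r"\bAB[()\d-]+\s(\S+)")

inputs = [
    "Name Subname 11X22 88X620 AB33(20) YA5619 77,66",
    "Name Subname 11X22 88X620 AB-33(20) YA5619 77,66",
    "Name Subname 11X22 88X620 AB33(-1) YA5619 77,66",
]
results = [pattern.search(s).group(1) for s in inputs]
print(results)  # ['YA5619', 'YA5619', 'YA5619']
```

The trade-off is that [()\d-]+ is looser than the previous pattern: it accepts any mix of digits, parentheses and hyphens after AB, not just the AB-33(-20) shape.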
I need a regular expression to match the pattern below:
Word1 OR Word2 OR Word3 OR......
Basically, this is a string that contains words separated by OR.
You can do:
(\w+)(?=(?:\s+OR)|(?:\s*$))
The following will match based on what you've given:
^\w+(?: OR \w+)*$
^ assert position at start of the string
\w+ match any word character [a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
(?: OR \w+)* Non-capturing group
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
" OR " matches the characters " OR " literally, i.e. a space, OR and a space (case sensitive)
\w+ match any word character [a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
$ assert position at end of the string
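The two answers above do different jobs: ^\w+(?: OR \w+)*$ validates the whole string, while (\w+)(?=(?:\s+OR)|(?:\s*$)) extracts the individual words. A small Python sketch (the language and the sample strings are assumptions) shows both:

```python
import re

validate = re.compile(r"^\w+(?: OR \w+)*$")
extract = re.compile(r"(\w+)(?=(?:\s+OR)|(?:\s*$))")

ok = bool(validate.match("Word1 OR Word2 OR Word3"))
bad = bool(validate.match("Word1 OR Word2 AND Word3"))
words = extract.findall("Word1 OR Word2 OR Word3")

print(ok)     # True
print(bad)    # False
print(words)  # ['Word1', 'Word2', 'Word3']
```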
Do something like this and access the second group using \2:
((\w+)\sOR)*
Try playing with the link below, read the explanation and you'll understand how it works:
https://regex101.com/r/vY6mA7/1
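In Python (an assumption; the question names no language), the second group is read with m.group(2) rather than \2. A sketch with finditer: each iteration ends right after OR, and the space before the next word is not consumed, so every match holds a single "word OR" pair, and the last word, which has no trailing OR, is not captured. Because * also allows empty matches, those are filtered out.

```python
import re

repeated = re.compile(r"((\w+)\sOR)*")
text = "Word1 OR Word2 OR Word3"

# Skip the empty matches, where group 2 did not participate.
words = [m.group(2) for m in repeated.finditer(text) if m.group(2) is not None]
print(words)  # ['Word1', 'Word2']
```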
I noticed some interesting behaviour with some regex work I am doing, and I'd like some insight.
From what I understand, the word character class \w should match [a-zA-Z_0-9].
Given this input,
0000000060399301+0000000042456971+0000000
What should this regex
(\d+)\w
capture?
I would expect it to capture 0000000060399301, but it actually captures 000000006039930.
Is there something I am missing? Why is the 1 dropped from the end?
I noticed if I changed the regex to
(\d+\w)
It captures correctly i.e. including the 1
Anyone care to explain? Thanks
You require the regex to match a trailing word character - that would be the 1.
It cannot be another character, because
+ is not a word character
+ is not a digit
matching is greedy
\d+ - matches one or more digit characters.
\w+ - matches one or more word characters. [A-Za-z\d_]
So with the string 0000000060399301+, \d+ in the (\d+)\w regex first matches all the digits, including the 1 before the +. Since the next token is \w and + is not a word character, the regex engine backtracks one character to the left, so that \w matches the last digit before the +. Now the captured group contains 000000006039930, and the last 1 is matched by \w.
The 1 is being dropped because \w isn't in the capture group.
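The backtracking described above can be reproduced with Python's re module (Python is an assumption; the question names no language). Group 1 loses the final 1, while the whole match still includes it:

```python
import re

text = "0000000060399301+0000000042456971+0000000"

m = re.search(r"(\d+)\w", text)
print(m.group(1))   # 000000006039930   (\d+ gave back the final 1)
print(m.group(0))   # 0000000060399301  (the 1 was matched by \w, outside the group)

m2 = re.search(r"(\d+\w)", text)
print(m2.group(1))  # 0000000060399301  (\w is now inside the group)
```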
Is there any simple way to transform:
"<A[hello|home]>"
to:
"hello|home"
Thanks!
Apart from the clever advice in the comments to simply remove certain characters: if you can't remove these characters because they are present elsewhere in the text, and you do want to match that format, here is a way to do it with a regex:
Search: <\w+\[([^|]*\|[^\]]*)\]>
Replace: \1 or $1, depending on your editor or regex engine.
See the Substitution pane at the bottom of the demo.
Explanation
<\w+\[([^|]*\|[^\]]*)\]>
Match the character “<” literally <
Match a single character that is a “word character” (Unicode; any letter or ideograph, digit, connector punctuation) \w+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the character “[” literally \[
Match the regex below and capture its match into backreference number 1 ([^|]*\|[^\]]*)
Match any character that is NOT a “|” [^|]*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “|” literally \|
Match any character that is NOT a “]” [^\]]*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “]” literally \]
Match the character “>” literally >
\1
Insert the text that was last matched by capturing group number 1 \1
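Here is the same search and replace as a short Python sketch (Python's re.sub uses the \1 form in the replacement string; the surrounding "before"/"after" text is made up to show that only the bracketed construct is rewritten):

```python
import re

pattern = r"<\w+\[([^|]*\|[^\]]*)\]>"

result = re.sub(pattern, r"\1", "<A[hello|home]>")
in_text = re.sub(pattern, r"\1", "before <A[hello|home]> after")

print(result)   # hello|home
print(in_text)  # before hello|home after
```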
I am building a RegEx that needs to find lines that have either:
DateTime.Now
or
Date.Now
But cannot have the literal "SystemDateTime" on the same line.
I started with this (DateTime\.Now|Date\.Now) but now I am stuck with where to put the "SystemDateTime"
Use this, assuming you are not using the /s modifier (DOTALL), which makes the dot (.) match newline characters too:
(?!.*SystemDateTime)(DateTime\.Now|Date\.Now)
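(?!.*SystemDateTime) asserts that SystemDateTime does not appear anywhere after the current position on the line. It does not look at text before the match, so a SystemDateTime earlier on the line is not excluded.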
You could use negative lookahead like this:
(?!.*SystemDateTime)\bDate(?:Time)?\.Now\b
/(?!.*SystemDateTime)Date(?:Time)?\.Now/
EXPLANATION:
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*SystemDateTime)»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the characters “SystemDateTime” literally «SystemDateTime»
Match the characters “Date” literally «Date»
Match the regular expression below «(?:Time)?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the characters “Time” literally «Time»
Match the character “.” literally «\.»
Match the characters “Now” literally «Now»
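Because the lookahead only inspects the rest of the line, the unanchored pattern still matches lines where SystemDateTime comes first, including SystemDateTime.Now itself (the engine starts matching at the inner DateTime). A Python sketch (the language and the sample lines are assumptions) compares it with an anchored variant that is not from either answer, ^(?!.*SystemDateTime).*\bDate(?:Time)?\.Now\b, where the lookahead runs from the start of the line:

```python
import re

# Pattern from the first answer: lookahead checks only text after the match position.
unanchored = re.compile(r"(?!.*SystemDateTime)(DateTime\.Now|Date\.Now)")

# Hypothetical anchored variant: lookahead is evaluated once, at the start of the line.
anchored = re.compile(r"^(?!.*SystemDateTime).*\bDate(?:Time)?\.Now\b")

lines = [
    "var a = DateTime.Now;",
    "var b = Date.Now;",
    "var c = SystemDateTime.Now;",
    "var d = DateTime.Now; // not SystemDateTime",
    "SystemDateTime x; var e = DateTime.Now;",
]

def flags(pattern):
    """Return whether the pattern finds a match on each sample line."""
    return [bool(pattern.search(line)) for line in lines]

print(flags(unanchored))  # [True, True, True, False, True]
print(flags(anchored))    # [True, True, False, False, False]
```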