Regex to find combination of lines not containing a string - regex

I'm trying to find the correct regex that, within this input:
#Tag-1234
Scenario:
Blabla
Scenario:
Blabla
#Tag-1234
Scenario:
Blabla
will select only the second Scenario (the one without a tag).
So far I have tried something like (?!#Tag-\d{4})\nScenario, but it's not doing the trick.
Can anyone shed some light on this?
I'm doing my tests on regex101 -> https://regex101.com/r/msDHKf/1
Thanks

It seems I was missing the \n before the lookahead, so the regex would be
\n(?!#Tag-\d{4})\nScenario
So close!

In the pattern (?!#Tag-\d{4})\nScenario the lookahead can be removed: it is directly followed by \nScenario, so the next character has to be a newline and the assertion is always true.
If Scenario should be preceded by an empty line rather than a tag, you can just match two newlines and then Scenario:
\n\nScenario\b
See a regex demo.
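To illustrate the point about the redundant lookahead, here is a minimal Python sketch. It assumes the scenarios in the input are separated by blank lines (which the two-newline pattern requires); the variable names are made up for the example.

```python
import re

# Assumed input: the three scenarios from the question, separated by blank lines.
text = (
    "#Tag-1234\nScenario:\nBlabla\n"
    "\n"
    "Scenario:\nBlabla\n"
    "\n"
    "#Tag-1234\nScenario:\nBlabla\n"
)

with_lookahead = re.compile(r"\n(?!#Tag-\d{4})\nScenario")
without_lookahead = re.compile(r"\n\nScenario\b")

# Start offsets of every match for each pattern.
starts_a = [m.start() for m in with_lookahead.finditer(text)]
starts_b = [m.start() for m in without_lookahead.finditer(text)]

print(starts_a == starts_b)  # the lookahead does not change where matches occur
print(len(starts_b))         # how many untagged scenarios were found
```

Only the Scenario that follows an empty line is matched; the tagged ones follow a #Tag line directly, so two consecutive newlines never precede them.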

Related

How can I write a regex that will match a nested [quote] BB tag?

As part of a forum that uses BBCode to store posts, I'm trying to write a way to detect mentions and quotes, in order to notify the users.
I have it working for all cases except nested quotes.
This is my regex so far (Python 2.7):
regex = r'\[url=.*?\/users\/(.*?)\/\]#.*?\[\/url\]|\[quote="(.*?)"\].*?\[\/quote\]'
These are my test cases:
# This works fine, I get the `user1` group.
Hello [url=/users/user1/]#Foo Bar[/url]
# This works fine, I get the `user2` and `user3` groups.
[quote="user2"]Test message[/quote] OK [quote="user3"]Test message[/quote]
# This doesn't work as I'd like. I only get the `user4` group, but not `user5`.
[quote="user4"][quote="user5"]Test message[/quote][/quote]
How can I modify the regular expression to match also the third test with the nested [quote] block?
Here's a link to regex101 for your convenience: https://regex101.com/r/Ov5SI1/1
Thank you!
A minor change in the original regex will solve your problem. Here is the original regex:
\[url=.*?\/users\/(.*?)\/\]#.*?\[\/url\]|\[quote="(.*?)"\].*?\[\/quote\]
Error
Consider the input string:
[quote="user4"][quote="user5"]Test message[/quote][/quote]
The last alternative tries to match it and succeeds. However, the first match is
[quote="user4"][quote="user5"]Test message[/quote]
Now the next match starts after the [/quote]. It will not start anywhere before since all the previous text is already part of a successful match.
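The behaviour described above can be reproduced with a short Python sketch (written for Python 3, although the question mentions 2.7; the variable names are illustrative):

```python
import re

# The original pattern from the question.
original = re.compile(
    r'\[url=.*?\/users\/(.*?)\/\]#.*?\[\/url\]|\[quote="(.*?)"\].*?\[\/quote\]'
)
nested = '[quote="user4"][quote="user5"]Test message[/quote][/quote]'

matches = list(original.finditer(nested))
for m in matches:
    # Show how much of the input each match consumed.
    print(m.span(), m.group(0), m.group(2))
```

The single match swallows the inner [quote="user5"] tag, so the engine never gets a chance to start a second match there.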
Correction
Solution 1:
Changing this part .*?\[\/quote\] in the original regex to a lookahead will result in a successful match of both user4 and user5.
\[quote=\"(.*?)\"\](?=.*?\[\/quote\])
final regex: \[url=.*?\/users\/(.*?)\/\]#.*?\[\/url\]|\[quote=\"(.*?)\"\](?=.*?\[\/quote\])
Solution 2:
Focusing on just the right part of the alternation: \[quote="(.*?)"\].*?\[\/quote\]
Here only \[quote="(.*?)"\] is necessary if you want to find any pattern of the form [quote="..."]. The remaining portion is unnecessary.
Here is the final regex:
\[url=.*?\/users\/(.*?)\/\]#.*?\[\/url\]|\[quote=\"(.*?)\"\]
Please do remember that the regex must be applied globally to find all the matches.
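As a rough check, both corrected patterns can be run against the three test cases from the question. This is a sketch in Python 3; mentioned_users is a hypothetical helper, and re.findall plays the role of applying the regex globally.

```python
import re

with_lookahead = re.compile(
    r'\[url=.*?\/users\/(.*?)\/\]#.*?\[\/url\]|\[quote=\"(.*?)\"\](?=.*?\[\/quote\])'
)
opening_tag_only = re.compile(
    r'\[url=.*?\/users\/(.*?)\/\]#.*?\[\/url\]|\[quote=\"(.*?)\"\]'
)

def mentioned_users(pattern, text):
    """Collect every non-empty capture across all matches in the text."""
    return [name for groups in pattern.findall(text) for name in groups if name]

tests = [
    'Hello [url=/users/user1/]#Foo Bar[/url]',
    '[quote="user2"]Test message[/quote] OK [quote="user3"]Test message[/quote]',
    '[quote="user4"][quote="user5"]Test message[/quote][/quote]',
]

for text in tests:
    print(mentioned_users(with_lookahead, text), mentioned_users(opening_tag_only, text))
```

The difference between the two is that Solution 1 still requires a closing [/quote] somewhere after the opening tag, while Solution 2 accepts any opening tag on its own.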

Regex to remove a whole phrase from the match

I am trying to remove a whole phrase from my regex (PCRE) matches
if given the following strings
test:test2:test3:test4:test5:1.0.department
test:test2:test3:test4:test5:1.0.foo.0.bar
user.0.display
"test:test2:test3:test4:test5:1.0".division
I want to write regex that will return:
.department
.foo.0.bar
user.0.display
.division
Now I thought a good way to do this would be to match everything and then remove test:test2:test3:test4:test5:1.0 and "test:test2:test3:test4:test5:1.0" but I am struggling to do this
I tried the following
\b(?!(test:test2:test3:test4:test5:1\.0)|("test:test2:test3:test4:test5:1\.0"))\b.*
but this seems to just remove the first test from each string, and that's all. Could anyone point out where I am going wrong, or suggest a better approach?
I suggest searching for the following pattern:
"?test:test2:test3:test4:test5:1\.0"?
and replacing it with an empty string.
The quotation marks on both ends are made optional with a ? (0 or 1 times) quantifier.
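The question targets PCRE, but the same pattern behaves the same way in Python's re module, so a minimal sketch of the replacement might look like this (the variable names are illustrative):

```python
import re

# Optional quote, the literal prefix, optional quote.
prefix = re.compile(r'"?test:test2:test3:test4:test5:1\.0"?')

lines = [
    'test:test2:test3:test4:test5:1.0.department',
    'test:test2:test3:test4:test5:1.0.foo.0.bar',
    'user.0.display',
    '"test:test2:test3:test4:test5:1.0".division',
]

# Replace the prefix with an empty string; lines without it are left untouched.
cleaned = [prefix.sub("", line) for line in lines]
for line in cleaned:
    print(line)
```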

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches have what I can describe as positioners. These are shown as a question mark followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work from is the linked image. The creators can't even confirm which flavour of regex it is. They believe it's Python, but I'm unsure.
Regex Image
Update
Unfortunately, none of the patterns below have worked so far. I tested each one in the live system (rather than only on a regex tester, in case it behaves differently), but they still didn't work.
There is no replace feature, and as mentioned before, I'm not sure whether it is Python. I appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
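A minimal sketch of these two operations, assuming the environment is Python and a replace step is available (the question says this may not be the case); find_id is a hypothetical helper name:

```python
import re

def find_id(line):
    """Strip positioners such as ?21, then look for T followed by 7 digits."""
    stripped = re.sub(r"\?[0-9]{2}", "", line)
    match = re.search(r"T[0-9]{7}", stripped)
    return match.group(0) if match else None

for sample in ["T123?214567", "T?211234567", "T1234567"]:
    print(sample, "->", find_id(sample))
```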
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
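The concatenation step could be sketched in Python as follows. It assumes at most one positioner per line, as stated above, and join_groups is a hypothetical helper name:

```python
import re

pattern = re.compile(r"(T.*?)\?\d{2}(.*)")

def join_groups(line):
    """Glue together the text before and after the positioner, if one is present."""
    m = pattern.search(line)
    # Lines without a positioner do not match and are returned unchanged.
    return m.group(1) + m.group(2) if m else line

for sample in ["T123?214567", "T?211234567", "T1234567"]:
    print(sample, "->", join_groups(sample))
```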
First, match the ?21 and replace it with a distinctive character, such as #:
\?21
Then you may try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
The target string is captured in group 1 (or \1).
Finally, replace # with ?21 or something you like.
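A Python script may look like this:
import re
ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
# Step 1: pattern for the positioner, which gets replaced by the placeholder '#'.
rexpre = re.compile(r'\?21')
# Step 2: T plus 7 digits, or T plus 8 characters of digits/'#', followed by whitespace.
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
# Print the matches with the placeholder still in place.
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Step 3: print the matches again with the positioner restored.
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))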
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement you could use group 1 followed by group 2: \1\2.
Using word boundaries \b (or the anchors ^ and $ for the start and the end of the line), this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
Example Python
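The linked example is not reproduced here. As a stand-in (not the answerer's original demo), a minimal sketch of the substitution in Python might look like this:

```python
import re

# Group 1: T plus leading digits; optional positioner; group 2: remaining digits.
pattern = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

samples = ["T0000001", "T123?214567", "T?211234567"]

# Keep only the two captured groups, which drops the positioner.
cleaned = [pattern.sub(r"\1\2", s) for s in samples]
for before, after in zip(samples, cleaned):
    print(before, "->", after)
```

When there is no positioner, the engine backtracks so that group 2 receives the final digit, and the replacement reproduces the input unchanged.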
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

RegEx substract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have figured something like that:
^[^#][^\(]+
But unfortunately without a positive result. Do you have any idea why it's not working?
The regex you tried, ^[^#][^\(]+, works as follows:
^[^#] matches one character at the beginning of the string that is not a #.
[^\(]+ then matches until it encounters an opening parenthesis (you don't have to escape the parenthesis in a character class).
So this would match *DataFromAdHoc, including the *, because * is not a #.
What you could do is capture this part [^\(]+ in a group, like ([^(]+)
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
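The question does not say which language is used, so the following is only a sketch assuming Python's re module. It shows the difference between the whole match and group 1:

```python
import re

m = re.match(r"^[^#]([^(]+)", "*DataFromAdHoc(cbgv)")

whole = m.group(0)  # the full match, including the leading character
name = m.group(1)   # only the captured part after it

print(whole)
print(name)
```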
Use ^\*(\w+)\(\w+\)$
It captures the word characters between the * and the opening parenthesis, and requires the rest of the string to be word characters in parentheses.
Your answer may depend on which language you're running your regex in, please include that in your question.
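Assuming Python for illustration, the stricter pattern could be tried like this. The second input is a made-up counterexample showing that the pattern rejects strings that do not have the exact *name(args) shape:

```python
import re

strict = re.compile(r"^\*(\w+)\(\w+\)$")

good = strict.match("*DataFromAdHoc(cbgv)")
bad = strict.match("*DataFromAdHoc(cb gv)")  # hypothetical input with a space inside

print(good.group(1) if good else None)
print(bad)
```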

RegEx: capture entire group content

I am writing a parser for some Oracle commands, like
LOAD DATA
INFILE /DD/DATEN
TRUNCATE
PRESERVE BLANKS
INTO TABLE aaa.bbb
( some parameters... )
I already created a regex to match the entire command. I am now looking for a way to capture the name of the input file ("/DD/DATEN" for instance here).
My problem is that using the following regex will only return the last character of the first group ("N").
^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$
Debuggex Demo
Any ideas?
Many thanks in advance
EDIT: following #HamZa 's question, here would be the entire regex to parse Oracle LOAD DATA INFILE command (simplified though):
^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*((?:TRUNCATE|PRESERVE BLANKS)\s*){0,2}\s*INTO TABLE\s*((?:\w|\.)+)\s*\(\s*((\w+)\s*POSITION\s*\(\s*\d+\s*\:\s*\d+\s*\)\s*((DATE\s*\(\s*(\d+)\s*\)\s*\"YYYY-MM-DD\")|(INTEGER EXTERNAL)|(CHAR\s*\(\s*(\d+)\s*\)))\s*\,{0,1}\s*)+\)\s*$
Debuggex Demo
Let's point out the wrongdoer in your regex: (\w|\\|/)+. What happens here?
You're matching either a word character or a backslash/forward slash and putting it in group 1 with (\w|\\|/); after that you're telling the regex engine to repeat the group one or more times with +. Each repetition overwrites the capture, so group 1 ends up holding only the last character. What you actually want is to repeat the character match inside the group. So you might use a non-capturing group (?:) for the repetition: ((?:\w|\\|/)+).
You might notice that you could just use a character class after all: ([\w\\/]+). Hence, your regex could look like
^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$
On a side note: that end anchor $ will cause your regex to fail against the full command if you're not using multiline mode. Or is it that you intentionally didn't post the full regex :) ?
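The question does not name the regex flavour, so here is a sketch in Python, whose re module shows the same capture behaviour. It compares the repeated group with the character class and also shows the effect of multiline mode on the $ anchor; the variable names are illustrative.

```python
import re

command = "\n".join([
    "LOAD DATA",
    "INFILE /DD/DATEN",
    "TRUNCATE",
    "PRESERVE BLANKS",
    "INTO TABLE aaa.bbb",
    "( some parameters... )",
])

repeated_group = r"^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$"
char_class = r"^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$"

# A repeated capturing group keeps only its final iteration.
last_char = re.search(repeated_group, command, re.MULTILINE).group(1)
# Repeating inside the group captures the whole file name.
filename = re.search(char_class, command, re.MULTILINE).group(1)
# Without MULTILINE, $ can only match at the very end of the command.
no_multiline = re.search(char_class, command)

print(last_char)
print(filename)
print(no_multiline)
```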
Not tested but...
^\s*LOAD DATA\s*INFILE\s*(\S+)\s*$