Regex to exclude a string from anywhere, but match another expression - regex

I'm quite new to regex. Tried to look at other questions but still can't workout how to resolve my scenario. I want to match string that starts with "AB" but not ABC,or string contains DE but not DEF. For example sDEN23, DET or DE should be matches, AB3 should be matches. I've tried the below so far but it doesn't work as expected. Could someone please help? Many thanks.
Edited: How can this be achieved without using lookahead and lookbehind, as these are not support by Impala?
.*AB?[^C].*|.*DE?[^DEF]

You can use a negative lookahead pattern to avoid matches followed by certain characters:
^(?:AB(?!C)|(?!.*DEF).*DE).*
Demo: https://regex101.com/r/DH1WTf/3
EDIT: Since you've updated the question by replacing the python tag with impala, whose regex engine does not support lookarounds, you can instead use multiple LIKE operators to achieve what you want:
SELECT * FROM table_name WHERE (col LIKE 'AB%' OR col LIKE '%DE%') AND NOT (col LIKE 'ABC%' OR col LIKE '%DEF%')

Try with
(AB)([^C^\n\r])+|(DE)([^F^\n\r])+

Use a negative look ahead for your restrictions, and a match for either of your targets:
^(?!ABC|DEF)(AB|DE).*
See live demo.

Related

Split complex string into mutliple parts using regex

I've tried a lot to split this string into something i can work with, however my experience isn't enough to reach the goal. Tried first 3 pages on google, which helped but still didn't give me an idea how to properly do this:
I have a string which looks like this:
My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#
The result should be:
MyDogs
213,229
Gallery
635,210
Screenshot
219,530
Good Morning
412,408
Anyone have an idea how to use regex to split the string like shown above?
Given the shared patterns, it seems you're looking for a regex like the following:
[A-Za-z ]+|\d+,\d+
It matches two patterns:
[A-Za-z ]+: any combination of letters and spaces
\d+,\d+: any combination of digits + a comma + any combination of digits
Check the demo here.
If you want a more strict regex, you can include the previous pattern between a lookbehind and a lookahead, so that you're sure that every match is preceeded by either a comma, a # or a start/end of string character.
(?<=^|,|#)([A-Za-z ]+|\d+,\d+)(?=,|#|$)
Check the demo here.

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches has what I can describe as positioners. These are shown as a question mark, followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work off is the linked image. The creators cant even confirm the flavour of Regex it is - they believe its Python but I'm unsure.
Regex Image
Update
Unfortunately none of the codes below have worked so far. I thought to test each code in live (Rather than via regex thinking may work different but unfortunately still didn't work)
There is no replace feature, and as mentioned before I'm not sure if it is Python. Appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
At first, match the ?21 and repace it with a distinctive character, #, etc
\?21
Demo
and you may try this regex to find what you want
(T(?:\d{7}|[\#\d]{8}))\s
Demo,,, in which target string is captured to group 1 (or \1).
Finally, replace # with ?21 or something you like.
Python script may be like this
ss="""T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
rexpre= re.compile(r'\?21')
regx= re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
for m in regx.findall(rexpre.sub('#',ss)):
print(m)
print()
for m in regx.findall(rexpre.sub('#',ss)):
print(re.sub('#',r'?21', m))
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement you could use group1group2 \1\2.
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
Example Python
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2
Usage Example:

RegEx substract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have figured something like that:
^[^#][^\(]+
But Unfortunately without positive result. Do you have maybe any idea why it's not working?
The regex you tried ^[^#][^\(]+ would match:
From the beginning of the string, it should not be a # ^[^#]
Then match until you encounter a parenthesis (I think you don't have to escape the parenthesis in a character class) [^\(]+
So this would match *DataFromAdHoc, including the *, because it is not a #.
What you could do, it capture this part [^\(]+ in a group like ([^(]+)
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
Use ^\*(\w+)\(\w+\)$
It just gets everything between the * and the stuff in brackets.
Your answer may depend on which language you're running your regex in, please include that in your question.

regex to select all characters enclosed by parenthesis provided equals sign is present

I am using the following regex to select any string enclosed by parentheses:
/\(([^()]+)\)/g
However, I would like to select only if there is an equals sign present within the string, e.g.:
(28%) - NOT selected
(A = 28%) - selected
Is this possible?
It is possible by Look-Head assertion that works like if in programming language
\(.*(?==).*?\)
The look-head is (?=) and you can insert what you want after = sign. It is a part of the regex and not a regular equal sign. So you need (?==) in you code. That's it.
demo
As #Wiktor mentioned /\(([^()]*=[^()]*)\)/g can solve your problem,
https://regex101.com/r/0lmEcb/1
To add to Wiktor's comment and RaR's answer, * will match cases like these also: (=28%), (A=) which you might not desire. You can use + to make sure there are characters on either side of =.
\([^(]+=[^)]+\)
Demo: https://regex101.com/r/0lmEcb/2

Regular Expression - Want two matches get only one

I'm working wih a regular expression and have some lines in javascript. My expression should deliver two matches but recognizes only one and I don't know whats the problem.
The Lines in javascript look like this:
if(mode==1) var adresse = "?APPNAME=CampusNet&PRGNAME=ACTION&ARGUMENTS=-A7uh6sBXerQwOCd8VxEMp6x0STE.YaNZDsBnBOto8YWsmwbh7FmWgYGPUHysiL9u0.jUsPVdYQAlvwCsiktBzUaCohVBnkyistIjCR77awL5xoM3WTHYox0AQs65SoHAhMXDJVr7="; else var adresse = "?APPNAME=CampusNet&PRGNAME=ACTION&ARGUMENTS=-AHMqmg-jXIDdylCjFLuixe..udPC2hjn6Kiioq7O41HsnnaP6ylFkQLhaUkaWKINEj4l2JqL2eBSzOpmG.b5Av2AvvUxEinUhMBTt5awdgAL4SkBEgYXGejTGUxcgPE-MfiQjefc=";
My expression looks like this:
(?<Popup>(popUp\(')|(adresse...")).*\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
I want to have two matches with APPNAME...... as Parameters.
[UPDATE] Like Tim Pietzcker wrote i used the greedy version and should have used the lazy version. while he wrote that i solved it myself by using .? instead of . in the middle so the expression looks like this:
(?<Popup>(popUp\(')|(adresse...")).*?\\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
That worked. Thanks to Tim Pietzcker
Your regex matches too much - from the very first adresse until the very last " because it uses a greedy quantifier .*.
If you make that quantifier lazy, i. e.
(?<Popup>(popUp\(')|(adresse...")).*?\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
you get two matches.
Alternatively, if your data allows this, use a different quantifier that only matches non-space characters. This will match faster (but will fail of course if the text you're trying to match could possibly contain spaces):
(?<Popup>(popUp\(')|(adresse..."))\S*\?((?<Parameters>APPNAME=CampusNet[^>"']*["']))
Usually you must apply the regex with the "global" flag to find all matches. I can't really say more until I see the complete code sample you are working with.