Negative lookbehind not working correctly [duplicate] - regex

This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 2 years ago.
Regex pattern (JavaScript and PCRE):
<a href="(?<!#).*".*>(.*)<\/a>
This pattern should not match any HTML anchor tag whose href attribute starts with a # symbol, but it matches the following code:
Team
What am I doing wrong here?

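Look-arounds are zero-width: they assert something about the current position without consuming any characters. Here (?<!#) is evaluated immediately after the opening quote, so the character behind that position is always ", never #. The assertion therefore always succeeds, and the first .* goes on to match #team.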
The way to write what you want is "[^#].*". This means the first character in the quotes must not be #. One caveat here is that it will not match empty strings, but that's easy enough to add like so: "([^#].*)?".
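To see the difference, here is a minimal Python sketch comparing the question's lookbehind pattern with the character-class version from this answer. The two sample anchors are assumed inputs (the question's exact snippet is not shown), and the named group "text" is added only so the helper can read the link text.

```python
import re

# Hypothetical sample inputs; the exact HTML from the question is not shown.
HASH_LINK = '<a href="#team">Team</a>'
PAGE_LINK = '<a href="/team">Team</a>'

# Pattern from the question: the lookbehind sits right after the opening quote,
# so the character behind it is always '"' and the assertion never fails.
lookbehind = re.compile(r'<a href="(?<!#).*".*>(?P<text>.*)</a>')

# Pattern following the answer: the first character of the value must not be
# '#', and the whole value is optional so that href="" still matches.
char_class = re.compile(r'<a href="([^#].*)?".*>(?P<text>.*)</a>')

def link_text(pattern, html):
    """Return the anchor text if the pattern matches, else None."""
    m = pattern.search(html)
    return m.group("text") if m else None

print(link_text(lookbehind, HASH_LINK))  # still matches the '#' link
print(link_text(char_class, HASH_LINK))  # rejected
print(link_text(char_class, PAGE_LINK))  # accepted
```

The greedy .* parts are kept as in the original patterns, so this is only meant for single-tag inputs like the ones above.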

Related

Regex for excluding files with specific pattern [duplicate]

This question already has answers here:
Regex: match everything but a specific pattern
(6 answers)
Closed 2 years ago.
Hey I have a list of files
B123245.xml
B123245-ext.xml
1234W01.xml
1234W01-ext.xml
Now I need a regular expression to filter only the files without -ext in the name.
I tried already this ^.+(?!-ext)\.xml$
but it is not working.
What am I doing wrong?
Not sure about your exact needs, but if you want to exclude those files where "-ext" comes right before the .xml extension, I think you could use:
^.+(?<!-ext)\.xml$
^ - Start string anchor.
.+ - One or more characters other than newline.
(?<!-ext) - A negative lookbehind to assert position isn't preceded by "-ext".
\.xml - Match a literal dot and "xml".
$ - End string anchor.
With the help of user 'The fourth bird' I found out the correct structure.
Here is the correct result
^(?!.*-ext).+\.xml$
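A quick way to compare the three patterns from this thread is to run them over the file list. The sketch below uses Python's re module; the dictionary labels and the extra name a-ext-b.xml are made up for illustration.

```python
import re

files = ["B123245.xml", "B123245-ext.xml", "1234W01.xml", "1234W01-ext.xml"]

patterns = {
    # Original attempt: the lookahead inspects what FOLLOWS the position,
    # which is always ".xml" here, so it never rejects anything.
    "lookahead_before_ext": r"^.+(?!-ext)\.xml$",
    # Lookbehind: rejects names where "-ext" sits directly before ".xml".
    "lookbehind": r"^.+(?<!-ext)\.xml$",
    # Anchored lookahead: rejects names containing "-ext" anywhere.
    "anchored_lookahead": r"^(?!.*-ext).+\.xml$",
}

def keep(pattern, names):
    """Return the names that the pattern matches."""
    return [n for n in names if re.search(pattern, n)]

results = {label: keep(p, files) for label, p in patterns.items()}
for label, kept in results.items():
    print(label, kept)
```

The last two patterns agree on this list but are not equivalent: a hypothetical name such as a-ext-b.xml passes the lookbehind version and is rejected by the anchored lookahead.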

Find DATE match starting from end of string [duplicate]

This question already has answers here:
Regex Last occurrence?
(7 answers)
Closed 3 years ago.
I have the following RegEx syntax that will match the first date found.
([0-9]+)/([0-9]+)/([0-9]+)
However, I would like to start from the end of the content and search backwards. In other words, in the below example, my syntax will always match the first date, but I want it to match the last instead.
Some Text here
01/02/15
Some additional
text here.
10/04/14
Ending text
here
I believe this is possible by using a negative lookahead, but all my attempts failed at this because I don't understand RegEx enough. Help would be appreciated.
Note: my application uses PCRE regular expressions.
You could make the dot match a newline, for example with the inline modifier (?s), and let .* match greedily to the end of the string.
The engine then backtracks until it reaches the last occurrence of the date-like pattern. The word boundary before the first digit stops it from settling on only the tail of a number (0/04/14 instead of 10/04/14).
Use \K to discard what was matched so far, so that only the date-like pattern is returned.
^(?s).*\b\K[0-9]+/[0-9]+/[0-9]+
Note that the pattern is a very broad match and does not validate a date itself.
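If you need the same idea outside PCRE, here is a rough Python sketch. Python's built-in re module has no \K, so a capture group stands in for it; the function name find_last_date is made up for this example.

```python
import re

text = """Some Text here
01/02/15
Some additional
text here.
10/04/14
Ending text
here"""

# Greedy .* (with DOTALL so it crosses newlines) runs to the end of the string
# and backtracks to the last place where the date-like pattern fits.
# The capture group plays the role that \K has in the PCRE pattern.
last_date = re.compile(r".*\b([0-9]+/[0-9]+/[0-9]+)", re.DOTALL)

def find_last_date(s):
    """Return the last date-like substring in s, or None if there is none."""
    m = last_date.match(s)
    return m.group(1) if m else None

print(find_last_date(text))
```

As with the PCRE version, this only finds something shaped like a date; it does not check that the numbers form a valid one.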

Regular Expression - Need Help Matching Everything Except For A Certain String [duplicate]

This question already has answers here:
Regex: match everything but a specific pattern
(6 answers)
Closed 4 years ago.
After countless hours of trying to get this regex to work (including looking all over StackOverflow), I thought I'd reach out for help here, as I have not been successful.
I have tried creating a regex to match everything and to not match any parameters that look like this:
text=3242ffs3F34
The data after the = sign can be random (it's a mixture of numeric and string characters) and is never the same. So far I have created the regex below, which is close to what I am after, but it does not work.
\b(?!text=.*)\b\S+
Assistance is much appreciated!
EDIT:
I will be using the regex to match everything in a file but to filter out all parameters that look like this:
text=3242ffs3F34
Below is an example of what the config file will look like:
This is a test
test=asda
test2=22rr2
text=3242ffs3F34
test5=hello
To match everything except strings containing LAST_DOMINO_TIME= as a substring, you can use the expression:
(?!.*\bLAST_DOMINO_TIME=.*$)^.*$
(?! Negative lookahead.
.* Match anything.
\b Word boundary.
LAST_DOMINO_TIME= Literal substring.
.*$ Anything up to end of string.
) Close lookahead.
^.*$ Assert position beginning of line, match anything up to end of line.
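Applied to the sample config from the question, the same structure might look like this in Python. Swapping text= in for LAST_DOMINO_TIME= is my adaptation, and multi-line mode is assumed so that ^ and $ work per line.

```python
import re

config = """This is a test
test=asda
test2=22rr2
text=3242ffs3F34
test5=hello"""

# Same structure as the expression above, with "text=" swapped in for
# "LAST_DOMINO_TIME=" (an adaptation for the question's sample data).
# re.MULTILINE makes ^ and $ match at the start and end of every line.
not_text_param = re.compile(r"(?!.*\btext=.*$)^.*$", re.MULTILINE)

kept_lines = not_text_param.findall(config)
print(kept_lines)
```

Because of the \b, only a parameter literally named text is filtered; a key that merely ends in text (say, context=) would be kept.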

REGEX - find a string that has the same match? [duplicate]

This question already has answers here:
Regex plus vs star difference? [duplicate]
(9 answers)
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 4 years ago.
I am trying to match the string "menu-item" when it is followed by digits.
<li id="menu-item-578" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-578">
I can use this regex:
menu-item-[0-9]*
However, it matches every menu-item string. I only want to match the "menu-item-578" in the class attribute, not the one in id="menu-item-578".
How can I do it?
Thank you.
You should avoid menu-item-[0-9]*, not only because it matches more occurrences than you want, but because the * also lets it match menu-item- with no digits at all, as in menu-item-one.
Besides replacing the quantifier with +, you have to assert that the preceding character is not a non-whitespace character:
(?<!\S)menu-item-[0-9]+(?=["' ])
or, if your regex flavor doesn't support lookarounds, you could use this, although it may not be precise either:
[ ]menu-item-[0-9]+
You may also constrain the character that follows by using a stricter pattern:
[ ]menu-item-[0-9]+["' ]
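Here is a small Python sketch that runs the original pattern and the lookaround version against the tag from the question. The variable names are arbitrary.

```python
import re

html = ('<li id="menu-item-578" class="menu-item menu-item-type-post_type '
        'menu-item-object-page menu-item-578">')

# Original pattern: * allows zero digits, and nothing constrains the context.
star = re.compile(r"menu-item-[0-9]*")

# Lookaround version: must not be preceded by a non-whitespace character,
# needs at least one digit, and must be followed by a quote or a space.
strict = re.compile(r"""(?<!\S)menu-item-[0-9]+(?=["' ])""")

star_hits = star.findall(html)
strict_hits = [(m.start(), m.group()) for m in strict.finditer(html)]

print(star_hits)
print(strict_hits)
```

The star pattern also picks up the id value and the bare menu-item- prefixes of menu-item-type-post_type and menu-item-object-page, while the lookaround version reports only the class entry at the end of the tag.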
This works too:
(\s)(menu-item-)\d+
https://regex101.com/
\s Any whitespace character
Use a space before, like this:
\ menu-item-[0-9]*
The first occurrence has a " right before it, while the second one has a space.
EDIT: use an online regex editor (like Regex tester) to try these things.

Regex for string containing one string, but not another [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
Have regex in our project that matches any url that contains the string
"/pdf/":
(.+)/pdf/.+
Need to modify it so that it won't match urls that also contain "help"
Example:
Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"
If lookarounds are supported, this is very easy to achieve:
(?=.*/pdf/)(?!.*help)(.+)
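A minimal check in Python against the two example URLs from the question (the variable names are arbitrary):

```python
import re

pdf_not_help = re.compile(r"(?=.*/pdf/)(?!.*help)(.+)")

urls = [
    "/dealer/help/us/en/pdf/simple.pdf",  # shouldn't match
    "/dealer/us/en/pdf/simple.pdf",       # should match
]

# re.match only tries the pattern at the start of the string, which is what
# the two lookaheads need in order to see the whole URL.
matching = [u for u in urls if pdf_not_help.match(u)]
print(matching)
```

Note that the pattern has no ^ of its own, so it depends on being applied at the start of the URL. With an unanchored search, the engine can start just past the h of help and still find a match in the first URL.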
(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)
First, match either a whitespace character or the start of a line:
(?:^|\s)
Then we match anything that is not a space or an h, OR any h that is not followed by elp, one or more times (+), until we find /pdf/. After that we match non-space characters (\S) any number of times (*).
((?:[^h ]|h(?!elp))+\/pdf\/\S*)
If we also want to reject help after the /pdf/, we can repeat the same sub-pattern after it:
((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
Finally, we match a whitespace character or the end of the line/string ($):
(?:$|\s)
The full match will include leading/trailing spaces, and should be stripped. If you use capture group 1, you don't need to strip the ends.
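As a rough illustration in Python, here is the full pattern pulling URLs out of a longer string. The sample sentence is made up, and findall returns capture group 1, so no stripping is needed.

```python
import re

# Full pattern from this answer; \/ is just an escaped slash.
no_help_pdf = re.compile(r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)")

# Hypothetical input with both example URLs from the question.
text = "see /dealer/us/en/pdf/simple.pdf and /dealer/help/us/en/pdf/simple.pdf"

# With exactly one capture group, findall returns the group contents.
found = no_help_pdf.findall(text)
print(found)
```

In this form only help before /pdf/ is rejected; something like /a/pdf/help.pdf still matches unless the variant with the repeated sub-pattern after /pdf/ is used.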