Negative lookahead to match server directories not properly working - regex

Given the following 3 example paths representing server paths i am trying to create a skiplist for my FTP client via PCRE regular expressions but can't seem to get the wished result.
/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1
/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2
/subdir-level-1/subdir-level-2/.../Author3_-_Title3-4951-publisher3
I want to skip all folders (not paths) that do not end with
-Publisher1
I am trying to create a working pattern with the help of this online help and and this regex tester but don't get any further than to this negative lookahead pattern
.*-(?!Publisher1)
But with this pattern all lines match because with all of them the substrings up to the pattern do all not contain the pattern.
/subdir/subdir/.../Author1_-_Title1-(1234) -Publisher1
/subdir/subdir/.../Author2_-_Title2_(5678) -PUBLiSHER2
/subdir/subdir/.../Author3_-_Title3-4951 -publisher3
What is my mistake and how would the correct pattern be just to match only the second and third line as line to be skipped but keep the first line?
EDIT to make it clearer what to highlight and what not.
Everything from the beginning of the path to the last slash must be ignored (allowed).
Everything after the last slash that matches the defined regex must be skipped.
EDIT to present an advanced pattern matching only the red part
[^/]*(?<!-Publisher2)$
Debuggex Demo

The regex which you have used is:
.*-(?!Publisher1)
I will tell you whats the fault in it.
According to this regex it will match those lines which dont have a - followed by Publisher1. Okay, do you notice the - there in between on yur text, yes. between author and title or after title. So all the strings satisfy this condition. Instead if you search with a negative lookahead in such a way that hiphen is with Publisher1 then your match should work.
So you plan on moving the hiphen inside the parenthesis so that it matches and make your regex like this :
^.*(?!-Publisher1)
but this will also not work, because here .* matches everything, so when we do a lookahead, we are not able to find a single character to match . Thus we will use a negative lookbehind. <.
.*(?<!-Publisher1)
what now ? . I have done everything but still I cannot get it to work. why is it so ?
because a negative lookbehind will lookback and tell if it is not followed by -Publisher1.
this is complex, just bear with me :
suppose your string
/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1
we do a negative lookbehind for -Publisher1. From the postition after 1 . i.e. at the end of the string -Publisher1 is visible when we lookback. BUT our condition is negative lookbehind. So it will move one character left to reach a position where it will no more be able to lookback and say that "Hey I can see -Publisher1 from here" because from here we are able to see "-Publisher" only. Our condtin satisfies but the regex still matches the rest of the string.
So it is essential to bind the lookbehind to the end of the string so that it doesnot move one character to the left to search for its match.
final regex:
.*(?<!-Publisher1)$
demo here : http://regex101.com/r/lE1vW2

This should suit your needs:
^.*(?<!-Publisher1)$
Debuggex Demo

I want to skip all folders that do not end with -Publisher1
You can use this negative lookahead based regex:
^(?!.*?-Publisher1$).+$
Working Demo

You could use the following regex in order to exclude lines containing Publisher1:
^((?!Publisher1).)*$
Online demo: http://regex101.com/r/gD8jK0

Related

Select Northings from a 1 Line String

I have the following string;
Start: 738392E, 6726376N
I extracted 738392 ok using (?<=.art\:\s)([0-9A-Z]*). This gave me a one group match allowing me to extract it as a column value
.
I want to extract 6726376 the same way. Have only one group appear because I am parsing that to a column value.
Not sure why is (?=(art\:\s\s*))(?=[,])*(.*[0-9]*) giving me the entire line after S.
Helping me get it right with an explanation will go along way.
Because you used positive lookaheads. Those just make some assertions, but don't "move the head along".
(?=(art\:\s\s*)) makes sure you're before "art: ...". The next thing is another positive lookahead that you quantify with a star to make it optional. Finally you match anything, so you get the rest of the line in your capture group.
I propose a simpler regex:
(?<=(art\:\s))(\d+)\D+(\d+)
Demo
First we make a positive lookback that makes sure we're after "art: ", then we match two numbers, seperated by non-numbers.
There is no need for you to make it this complicated. Just use something like
Start: (\d+)E, (\d+)N
or
\b\d+(?=[EN]\b)
if you need to match each bit separately.
Your expression (?=(art\:\s\s*))(?=[,])*(.*[0-9]*) has several problems besides the ones already mentioned: 1) your first and second lookahead match at different locations, 2) your second lookahead is quantified, which, in 25 years, I have never seen someone do, so kudos. ;), 3) your capturing group matches about anything, including any line or the empty string.
You match the whole part after it because you use .* which will match until the end of the line.
Note that this part [0-9]* at the end of the pattern does not match because it is optional and the preceding .* already matches until the end of the string.
You could get the match without any lookarounds:
(art:\s)(\d+)[^,]+,\s(\d+)
Regex demo
If you want the matches only, you could make use of the PyPi regex module
(?<=\bStart:(?:\s+\d+[A-Z],)* )\d+(?=[A-Z])
Regex demo (For example only, using a different engine) | Python demo

PCRE Regex Match /x... but not /y/x

When configuring redirections, it's common to run into multiple pages that include some of the same path strings. We've ran into this instance multiple times where we need to redirect:
https://example.com/x...
But not:
https://example.com/y/x...
To match the /x... we use PCRE regex of:
/x.*
We've been struggling to get the exclude to match correctly; we apologize in advance as our regex is a bit weak, here's our pseudo code:
Match all /x... except /y/x...
Here is what we thought that looked like:
^\/(?!y\/).x.*
In our mind that reads:
Any query starting with /x..., except starting with /y/x...
Thank you in advance, and please feel free to suggest better formatting, we are not stack overflow pros.
Your regex matches from the start of the string a forward slash and then uses a negative lookahead to check what follows is not y/. If that is true, then match any character followed by x and 0+ character. That will match for example //x///
Without taking matching the url part into account, one way could be to use a negative lookahead (?! to check if what is on the right side does not contain /y/x and then match any character:
^(?!.*/y/x).+
Regex demo
You may use a negative lookbehind assertion:
~(?<!/y)/x~
RegEx Demo
(?<!/y) is a negative lookbehind assertnion that will fail the match if /y appears before matching /x.

Regex for selecting words ending in 'ing' unless

I want to select words ending in with a regular expression, but I want exclude words that end in thing. For example:
everything
running
catching
nothing
Of these words, running and catching should be selected, everything and nothing should be excluded.
I've tried the following:
.+ing$
But that selects everything. I'm thinking look aheads/look arounds could be the solution, but I haven't been able to get one that works.
Solutions that work in Python or R would be helpful.
In python you can use negative lookbehind assertion as this:
^.*(?<!th)ing$
RegEx Demo
(?<!th) is negative lookbehind expression that will fail the match if th comes before ing at the end of string.
Note that if you are matching words that are not on separate lines then instead of anchors use word boundaries as:
\w+(?<!th)ing\b
Something like \b\w+(?<!th)ing\b maybe.
You might also use a negative lookahead (?! to assert that what is on the right is not 0+ times a word character followed by thing and a word boundary:
\b(?!\w*thing\b)\w*ing\b
Regex demo | Python demo

Regex negation?

I'm playing Regex Golf (http://regex.alf.nu/) and I'm doing the Abba hole. I have the following regex that matches the wrong side entirely (which is what I was trying to do):
(([\w])([\w])\3\2)
However, I'm trying to negate it now so it matches the other side. I can't seem to figure that part out. I tried:
(?!([\w])([\w])\3\2)
But that didn't work. Any tips from the regex masters?
You can make it much shorter (and get more points) by simply using . and removing unnecessary parens:
^(?!.*(.)(.)\2\1)
It just makes sure that there's no "abba" ("abba" here means 4 letters in that particular order we don't want to match) in any part of the string without having to match the whole word.
Using the explanation here: https://stackoverflow.com/a/406408/584663
I came up with: ^((?!((\w)(\w)\4\3)).)*$
The key here turns out to be the leading caret, ^, and the .*
(?! ...) is a look-ahead construct, and so does not advance the regex processing engine.
/(?! ...)/ on its own will correctly return a negative result for items matching the expression within; but for items which do not match (...) the regex engine continues processing. However if your regex only contains the (?! ) there is nothing left to process, and the regex processing position never advances. (See this great answer).
Apparently since the remaining regex is empty, it matches any zero-width segment of a string, i.e. it matches any string.
[begin SWAG]
With the caret ^ present, the regex engine is able to recognize that you are looking for a real answer and that you do not want it to tell you the string contains zero-width components.
[end SWAG]
Thus it is able to correctly fail to match when the (?! ) succeeds.

Matching double symbol

Hey guys I've been working with this one for a little while. I can't seem to get it.
Here is what I have so far
(#[^{2,}+)([^(\s\W\d{2}]+)(\b)
http://rubular.com/r/zlx3j00Wjl
Although this is not excepting periods in the match.
I basically need to match this.
#function.name(param)
I just need to match function.name. This does that.
http://rubular.com/r/hWMB72LsWT
I don't want to match this
##function.name(param)
hello##test.com`
Didn't know if anyone has any ideas. Thanks for the help.
You can use a negative lookahead: #(?!#) matches a # not followed by another #.
Here is my go at it (here it is on Rubular):
(?<!#)#(\w+(?:\.\w+)*)\([^)]*\)
Explained:
(?<!#)# # an '#' not preceded by an '#'
(\w+(?:\.\w+)*) # any number of xxx.xxx.xxx, captured into a group
\([^)]*\) # brackets, containing anything that isn't a closing bracket
Since this is Ruby, you might not care about matching parentheses. In that case you can just remove the last section.
Try this:
(?:^|\s)#+([^(]+)
You will have function.match and function.name in the first group, will not match hello##test.com. Rubular:
http://rubular.com/r/b8gy1LcVGz
Try this
(?!.*##)^#([^()\s]+)\b
See it here on Rubular
I removed some brackets from your expression
I removed the Quantifier from the leading #
(?!.*##) is a negative lookahead assertion. It will fail if it finds anywhere in the string two # characters in a row.
I am not sure about your requirements, if there is all the time a set of brackets at the end, then you don't need your word boundary. If there can be similar strings without brackets that you don't want to match, then I would add another lookahead to ensure this assertion:
?!.*##)^#([^()\s]+)(?=\()
See it here on Rubular