JavaScript regex: exclude matches that contain keywords

I'm trying to write a single JavaScript regex that matches email addresses starting with lcp_ but ignores any match that also contains the word auto at any position.
I've tried a few things with no luck:
/^lcp[._-](?!auto)/gi
The goal is the following:
lcp_land#blah.com - match
lcp_land_auto#blah.com - no match
Thanks

You can use a "tempered greedy token", which means applying a negative lookahead at each repetition of the sub-pattern so that the forbidden string is excluded at every position. For example:
\blcp_(?:(?!auto)\S)+(?=\s)
https://regex101.com/r/I0xisv/2
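A minimal sketch of how that pattern behaves in JavaScript. The sample text is made up for illustration, and each address is followed by a newline because the trailing (?=\s) in the pattern needs whitespace after the match.

```javascript
// Tempered greedy token: (?!auto) is re-checked before every \S that is consumed,
// so the match can never run across the substring "auto".
const pattern = /\blcp_(?:(?!auto)\S)+(?=\s)/g;

// Hypothetical sample input; every address is followed by whitespace.
const text = "lcp_land#blah.com\nlcp_land_auto#blah.com\n";

const matches = text.match(pattern) ?? [];
console.log(matches); // [ 'lcp_land#blah.com' ]
```

If the last address in the input has no trailing whitespace, it will not match as written; replacing (?=\s) with (?=\s|$) is one possible adjustment.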

Related

PCRE Regex Match /x... but not /y/x

When configuring redirections, it's common to run into multiple pages that include some of the same path strings. We've run into this situation multiple times, where we need to redirect:
https://example.com/x...
But not:
https://example.com/y/x...
To match the /x... we use PCRE regex of:
/x.*
We've been struggling to get the exclusion to match correctly. We apologize in advance, as our regex is a bit weak. Here's our pseudocode:
Match all /x... except /y/x...
Here is what we thought that looked like:
^\/(?!y\/).x.*
In our mind that reads:
Any query starting with /x..., except starting with /y/x...
Thank you in advance, and please feel free to suggest better formatting; we are not Stack Overflow pros.
Your regex matches a forward slash at the start of the string, then uses a negative lookahead to check that what follows is not y/. If that holds, it matches any single character, then x, then zero or more characters. It will therefore match, for example, //x///
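To see this concretely, here is a small sketch that runs the attempted pattern in JavaScript (the lookahead syntax is the same as in PCRE). The sample paths other than //x/// are hypothetical.

```javascript
// The pattern from the question: "/" at the start, not followed by "y/",
// then ANY one character, then a literal "x", then the rest.
const attempt = /^\/(?!y\/).x.*/;

// Hypothetical sample paths.
const samples = ["/x/page", "/y/x/page", "//x///", "/ax"];

const results = samples.map((s) => attempt.test(s));
console.log(results); // [ false, false, true, true ]
```

The stray . before x is what makes /x/page fail: it consumes the x, and the next character is then / rather than x.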
Leaving aside how the rest of the URL is matched, one option is to use a negative lookahead (?! to check that what is on the right does not contain /y/x, and then match any characters:
^(?!.*/y/x).+
You may use a negative lookbehind assertion:
~(?<!/y)/x~
(?<!/y) is a negative lookbehind assertion that will fail the match if /y appears immediately before /x.
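Both suggestions can be tried side by side. The sketch below uses JavaScript rather than PCRE (so the ~ delimiters are dropped and / is escaped), and the two paths are hypothetical stand-ins for /x... and /y/x...

```javascript
// Hypothetical paths standing in for "/x..." and "/y/x...".
const paths = ["/x/page", "/y/x/page"];

// Lookahead version: reject the whole string if "/y/x" occurs anywhere in it.
const viaLookahead = /^(?!.*\/y\/x).+/;

// Lookbehind version: find a "/x" that is not immediately preceded by "/y".
const viaLookbehind = /(?<!\/y)\/x/;

const lookaheadResults = paths.map((p) => viaLookahead.test(p));
const lookbehindResults = paths.map((p) => viaLookbehind.test(p));

console.log(lookaheadResults);  // [ true, false ]
console.log(lookbehindResults); // [ true, false ]
```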

Match Latin words which are not in brackets

I'm trying to filter words which are not inside "[ ]".
Why is this not working?
[^\[][\u0000-\u024F]+[^\]]
The reason your expression is not working is that it matches all text inside brackets as well as outside.
This is the best I've been able to do:
/(?:^|])[^[]+/g
It includes the ]s in the match because look-behind is not allowed:
http://regexr.com/3c515
If look-behind were allowed, this would be the ticket:
/(?:^|(?<=]))[^[]+/g
https://regex101.com/r/lK9tS7/3
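A short sketch comparing the two patterns above. It assumes a JavaScript engine with lookbehind support (ES2018 and later, e.g. current Node.js), and the sample string is hypothetical.

```javascript
const text = "foo [bar] baz"; // hypothetical sample input

// Without lookbehind: the closing bracket is consumed and ends up in the match.
const withBracket = text.match(/(?:^|])[^[]+/g) ?? [];

// With lookbehind: the bracket is only asserted, not consumed.
const withoutBracket = text.match(/(?:^|(?<=]))[^[]+/g) ?? [];

console.log(withBracket);    // [ 'foo ', '] baz' ]
console.log(withoutBracket); // [ 'foo ', ' baz' ]
```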
It fails because the pattern matches [\u0000-\u024F]+ plus two more characters, matched by [^\[] and [^\]]. If you want the regex engine to match the whole pattern, you need start and end anchors in your regex:
/^[^\[][\u0000-\u024F]+[^\]]$/m
But this only works if your string contains one word per line, which is not a proper solution.
A better way is to use negative lookarounds:
(?<!\[)[\u0000-\u024F]+(?!\])

Regex get all matches including smaller submatches

I have following input string
Testing <B><I>bold italic</I></B> text.
and following regex :
<([A-Z][A-Z0-9]*)\b[^>]*>.*</\1>
This regex only gives following larger match
<B><I>bold italic</I></B>
How can I use a regex to get the smaller match?
<I>bold italic</I>
I tried using non-greedy operators, but that didn't work either.
Is it also possible to get both as match groups, for example using Java or C# match groups or match collections?
Try the regex below, which uses a positive lookbehind:
(?<=>)<([A-Z][A-Z0-9]*)\b[^>]*>.*<\/\1>
It looks for a tag that starts immediately after a > symbol.
Explanation:
(?<=>) A positive lookbehind, which places the match position immediately after a > symbol.
< A literal < symbol.
([A-Z][A-Z0-9]*) Captures the tag name; \b[^>]*> then matches up to the next > symbol.
.* Matches any character except \n zero or more times.
<\/\1> Matches the literal </, followed by the first captured group, followed by >
As you probably know, many people prefer using a DOM parser to parse HTML. But looking at your existing regex, to fix it, I would suggest this:
<([A-Z][A-Z0-9]*)\b[^<>]*>[^<]*</\1>
Explanation
Inside the tags, instead of the .* that matches too many chars, we use [^<]*, which matches any chars that are not the start of a tag. That way we won't run into another tag.
Likewise, I changed your [^>]* to [^<>]* so we don't start another tag.
I assume you will make this case-insensitive.
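A sketch of the [^<]* pattern in JavaScript, using the input from the question. The second part is not from the answers above: it is one common way to collect overlapping matches, by wrapping the original pattern in a capturing lookahead so that each match is zero-width and the scan can continue inside an earlier match. Treat it as an illustration rather than a recommendation over a DOM parser.

```javascript
const input = "Testing <B><I>bold italic</I></B> text.";

// Innermost element only: [^<]* refuses to cross into another tag.
const inner = /<([A-Z][A-Z0-9]*)\b[^<>]*>[^<]*<\/\1>/gi;
const innerMatches = input.match(inner) ?? [];
console.log(innerMatches); // [ '<I>bold italic</I>' ]

// Assumption: overlapping matches via a capturing lookahead.
// Group 1 holds the whole element, group 2 the tag name.
const overlapping = /(?=(<([A-Z][A-Z0-9]*)\b[^>]*>.*<\/\2>))/gi;
const allMatches = [...input.matchAll(overlapping)].map((m) => m[1]);
console.log(allMatches);
// [ '<B><I>bold italic</I></B>', '<I>bold italic</I>' ]
```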

Negative lookahead to match server directories not properly working

Given the following 3 example paths representing server paths, I am trying to create a skiplist for my FTP client via PCRE regular expressions, but I can't seem to get the desired result.
/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1
/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2
/subdir-level-1/subdir-level-2/.../Author3_-_Title3-4951-publisher3
I want to skip all folders (not paths) that do not end with
-Publisher1
I am trying to create a working pattern with the help of this online help and this regex tester, but I don't get any further than this negative lookahead pattern:
.*-(?!Publisher1)
But with this pattern all lines match, because in every line the substring leading up to the pattern does not contain the pattern.
/subdir/subdir/.../Author1_-_Title1-(1234) -Publisher1
/subdir/subdir/.../Author2_-_Title2_(5678) -PUBLiSHER2
/subdir/subdir/.../Author3_-_Title3-4951 -publisher3
What is my mistake, and what would the correct pattern be to match only the second and third lines as lines to be skipped, while keeping the first line?
EDIT to make it clearer what to highlight and what not.
Everything from the beginning of the path to the last slash must be ignored (allowed).
Everything after the last slash that matches the defined regex must be skipped.
EDIT to present an advanced pattern matching only the red part
[^/]*(?<!-Publisher2)$
The regex which you have used is:
.*-(?!Publisher1)
Here is what is wrong with it.
This regex matches any line that contains a - not followed by Publisher1. Notice the other hyphens in your text, between author and title or after the title. Each of them is a - not followed by Publisher1, so all the strings satisfy this condition. If the lookahead instead includes the hyphen together with Publisher1, the match has a chance of working.
So you move the hyphen inside the parentheses and write your regex like this:
^.*(?!-Publisher1)
But this will not work either: .* consumes everything, so by the time the lookahead runs there are no characters left for it to inspect, and the negative lookahead succeeds trivially. So we use a negative lookbehind instead:
.*(?<!-Publisher1)
What now? This still does not work. Why?
Because a negative lookbehind looks back and checks that the current position is not preceded by -Publisher1.
This is subtle, so bear with me.
Suppose your string is
/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1
We do a negative lookbehind for -Publisher1. From the position after the final 1, i.e. at the end of the string, -Publisher1 is visible when we look back, so the negative lookbehind fails there. The engine then backtracks one character to the left, to a position where looking back shows only "-Publisher". There the condition is satisfied, so the regex still matches, just without the last character of the string.
So it is essential to tie the lookbehind to the end of the string, so that the engine cannot move one character to the left to find a match.
Final regex:
.*(?<!-Publisher1)$
demo here : http://regex101.com/r/lE1vW2
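The effect of the $ anchor can be checked with a small sketch. It uses JavaScript instead of PCRE (the lookbehind syntax is the same, assuming an ES2018+ engine) and the three paths from the question.

```javascript
const paths = [
  "/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1",
  "/subdir/subdir/.../Author2_-_Title2_(5678)-PUBLiSHER2",
  "/subdir/subdir/.../Author3_-_Title3-4951-publisher3",
];

// Without $: the engine backtracks one character and the lookbehind passes.
const unanchored = /.*(?<!-Publisher1)/;

// With $: the lookbehind must hold at the very end of the string.
const anchored = /.*(?<!-Publisher1)$/;

const unanchoredHits = paths.filter((p) => unanchored.test(p));
const skipped = paths.filter((p) => anchored.test(p));

console.log(unanchoredHits.length); // 3 (every path matches)
console.log(skipped.length);        // 2 (only the PUBLiSHER2 and publisher3 paths)
```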
This should suit your needs:
^.*(?<!-Publisher1)$
I want to skip all folders that do not end with -Publisher1
You can use this negative lookahead based regex:
^(?!.*?-Publisher1$).+$
You could use the following regex in order to exclude lines containing Publisher1:
^((?!Publisher1).)*$
Online demo: http://regex101.com/r/gD8jK0
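The last two patterns are not equivalent: one rejects strings that end with -Publisher1, the other rejects strings that contain Publisher1 anywhere. A rough JavaScript sketch of the difference follows; the folder name Publisher1-backup is a hypothetical example added for illustration, not one of the paths from the question.

```javascript
// Rejects only strings that END with "-Publisher1".
const endsWithCheck = /^(?!.*?-Publisher1$).+$/;

// Rejects strings that CONTAIN "Publisher1" at any position.
const containsCheck = /^((?!Publisher1).)*$/;

const names = [
  "Author1_-_Title1-(1234)-Publisher1",
  "Author2_-_Title2_(5678)-PUBLiSHER2",
  "Publisher1-backup", // hypothetical name, for illustration only
];

const endsWithResults = names.map((n) => endsWithCheck.test(n));
const containsResults = names.map((n) => containsCheck.test(n));

console.log(endsWithResults); // [ false, true, true ]
console.log(containsResults); // [ false, true, false ]
```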

Regex to match string between two characters in email

I'm trying to match a single string out of an email using regex. The email pattern looks like:
name.name.someid#mail.domain.com
And I would like to grab the 'someid' section, meaning I need to match everything before the '#' and after the last period that precedes it.
I can match everything before the '#' with (^[^#]+), but I can't effectively combine it in the regex statement to evaluate only after the last period (I can only get it to match after the first period).
Any pointers would be great, thanks!
Use a positive lookahead:
/[^.]+(?=#)/
Here's a demo: http://regex101.com/r/sW7sR3
/\.([^.#]+)#/
Without using lookarounds, this matches anything that's not a # or . that comes after a . and before #, and captures it in group 1.
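Both answers can be tried in JavaScript with the address from the question; a minimal sketch:

```javascript
const email = "name.name.someid#mail.domain.com";

// Lookahead version: the whole match is the id.
const viaLookahead = email.match(/[^.]+(?=#)/)?.[0];

// Capture-group version: the id is in group 1.
const viaCapture = email.match(/\.([^.#]+)#/)?.[1];

console.log(viaLookahead); // someid
console.log(viaCapture);   // someid
```

The capture-group version requires a period before the id, so it will not match an address with no period in the part before the #.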