Matching Everything That Does Not Contain Anything in an Array with Regex - regex

I have the following regex query I'm trying to use to exclude assets from being cached:
^((?!(\.css|\.js|\.|\.json|\.xml|\.svg|\.ico|\.png|\.mp3|\.jpg|\.svg|\.woff|\.woff2|\.eot|\.ttf|\/api\/play\/add|\/api\/favorite|\/Listen\/channel|getAccountInfo)).)*$
Except it doesn't match https://exampl.com/home for some reason. Does anyone know how I can fix this? Also, is there any way I can make the regex better?

Your regex contains a |\.| part (after |\.js). That alternative makes your regex fail the match with any string containing a dot. You need to remove that alternative:
^((?!(\.css|\.js|\.json|\.xml|\.svg|\.ico|\.png|\.mp3|\.jpg|\.svg|\.woff|\.woff2|\.eot|\.ttf|\/api\/play\/add|\/api\/favorite|\/Listen\/channel|getAccountInfo)).)*$
See the regex demo
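A quick way to check the difference, sketched in Python (the question doesn't name a language, so that choice is an assumption, as are the helper name passes_filter and the two asset URLs). The alternatives are copied from the corrected pattern, with the duplicate \.svg dropped:

```python
import re

# Alternatives copied from the corrected pattern (duplicate \.svg dropped).
EXCLUDED = (
    r"\.css|\.js|\.json|\.xml|\.svg|\.ico|\.png|\.mp3|\.jpg"
    r"|\.woff|\.woff2|\.eot|\.ttf"
    r"|/api/play/add|/api/favorite|/Listen/channel|getAccountInfo"
)

# Corrected pattern: no position in the string may start an excluded token.
FIXED = re.compile(rf"^((?!({EXCLUDED})).)*$")

# Original pattern, with the stray bare-dot alternative put back in.
BROKEN = re.compile(rf"^((?!({EXCLUDED}|\.)).)*$")


def passes_filter(url: str) -> bool:
    """Hypothetical helper: True when the URL contains none of the excluded tokens."""
    return FIXED.match(url) is not None


print(passes_filter("https://exampl.com/home"))            # True
print(BROKEN.match("https://exampl.com/home") is None)     # True: the dot in ".com" kills the match
print(passes_filter("https://exampl.com/static/app.js"))   # False
```

Note that the tokens are matched anywhere in the string, so \.js also rejects a URL that merely contains ".js" in the middle of a path.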

Related

Regular expression to exclude tag groups or match only (.*) in between tags

I am struggling with this regex for a while now.
I need to match the text which is in between the <ns3:OutputData> data</ns3:OutputData>.
Note: after ns there could be 1 or 2 digits
Note: the data is in one line just as in the example
Note: the ... preceding and ending is just to mention there are more tags nested
My regex so far: (ns\d\d?:OutputData>)\b(.*)(\/\1)
Sample text:
...<ns3:OutputData>foo bar</ns3:OutputData>...
I have tried (?:(ns\d\d?:OutputData>)\b)(.*)(?:(\/\1)) in an attempt to exclude group 1 and 3.
I want to exclude the matched tags themselves (the start tag and the end tag) from the result.
Any help is much appreciated.
EDIT
There might be some regex interpretation issue with Grep Console for IntelliJ, in which I intend to use the regex.
Your regex is almost there. All you need to do is to make the inside-matcher non-greedy. I.e. instead of (.*) you can write (.*?).
Another, xml-specific alternative is the negated character-class: ([^<]*).
So, this is the regex: (ns\d\d?:OutputData>)\b(.*?)(\/\1) You can experiment with it here.
Update
To make sure that the only capturing group is the one that matches the text, you have to make it work without backreferences: (?:ns\d\d?:OutputData>)\b(.*?)<
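To see how the greedy, non-greedy and single-group variants behave, here is a small sketch in Python (Grep Console runs on Java, so treat this only as an illustration of the general behaviour). The input line with two tag pairs is made up for the comparison:

```python
import re

# Made-up line with two tag pairs, to make the greedy/lazy difference visible.
text = ("...<ns3:OutputData>foo bar</ns3:OutputData>"
        "...<ns3:OutputData>baz</ns3:OutputData>...")

greedy = re.search(r"(ns\d\d?:OutputData>)\b(.*)(/\1)", text)
lazy = re.search(r"(ns\d\d?:OutputData>)\b(.*?)(/\1)", text)
single = re.search(r"(?:ns\d\d?:OutputData>)\b(.*?)<", text)

print(greedy.group(2))  # runs on to the LAST closing tag
print(lazy.group(2))    # foo bar<
print(single.group(1))  # foo bar
```

In the backreference versions, group 2 still ends with the < of the closing tag, because the third group starts at the slash. The single-group version consumes that < outside the group, so group 1 holds only the text.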
Update 2
It's possible to match only the required parts, using lookbehind:
(?<=ns\d:OutputData>)\b([^<]*)|(?<=ns\d\d:OutputData>)\b([^<]*)
Explanation:
The two alternatives are almost identical. The only difference is the number of digits. This is important because some flavors support only fixed-length lookbehinds.
Looking at alternative one, we put the starting tag into a lookbehind (?<=...) so it won't be included in the full match.
Then we match every character that is not < greedily: [^<]*. This stops matching at the first closing tag.
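As a sanity check, the two-alternative lookbehind can be tried in Python, whose re module is one of the flavors that accepts only fixed-width lookbehinds. This is a sketch; the helper name extract and the ns12 sample are made up:

```python
import re

# Two fixed-width lookbehinds, one per digit count, exactly as in the answer.
PATTERN = re.compile(
    r"(?<=ns\d:OutputData>)\b([^<]*)|(?<=ns\d\d:OutputData>)\b([^<]*)"
)


def extract(line: str):
    """Hypothetical helper: return the first tag content, or None."""
    m = PATTERN.search(line)
    # group(0) is the full match, whichever alternative succeeded.
    return m.group(0) if m else None


print(extract("...<ns3:OutputData>foo bar</ns3:OutputData>..."))  # foo bar
print(extract("<ns12:OutputData>baz</ns12:OutputData>"))          # baz
```

Only the first match is used here. The lookbehind does not check for the leading <, so text following a closing tag would also qualify if you iterated over all matches.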
Essentially, you need a look behind and a look ahead with a back reference to match just the content, but variable length look behinds are not allowed. Fortunately, you have only 2 variations, so an alternation deals with that:
(?<=<(ns\d:OutputData>)).*?(?=<\/\1)|(?<=<(ns\d\d:OutputData>)).*?(?=<\/\2)
The entire match is the target content between the tags, which may contain anything (including left angle brackets etc).
Note also the reluctant quantifier .*?, so the match stops at the next matching end tag, rather than greedy .* that would match all the way to the last matching end tag.
See live demo.
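A sketch of the same pattern in Python, to show that the full match is just the content even when it contains a left angle bracket. The helper name tag_content and the sample lines are made up; the escaped slash is written as a plain / because Python needs no escape there:

```python
import re

# One alternative per digit count; each lookbehind is fixed-width and
# captures the tag name so the lookahead can demand the same closing tag.
CONTENT = re.compile(
    r"(?<=<(ns\d:OutputData>)).*?(?=</\1)"
    r"|(?<=<(ns\d\d:OutputData>)).*?(?=</\2)"
)


def tag_content(line: str):
    """Hypothetical helper: return the text between the tags, or None."""
    m = CONTENT.search(line)
    return m.group(0) if m else None


print(tag_content("...<ns3:OutputData>a < b</ns3:OutputData>..."))  # a < b
print(tag_content("<ns12:OutputData>foo bar</ns12:OutputData>"))    # foo bar
```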
This was the answer in my case:
(?<=(ns\d:OutputData)>)(.*?)(?=<\/\1)
The answer is based on the 3 solutions given by @WiktorStribiżew (in comments).
The last one worked, and I have made a slight modification to it.
Thanks all for the effort, and especially @WiktorStribiżew!
EDIT
Ok, yes @Bohemian, it does not match 2 digits; I forgot to update:
(?<=(ns\d{0,2}:OutputData)>)(.*?)(?=<\/\1)

Regex matching character within

I'm looking to match WS-810-REFERENCE-1, where the string must have hyphens within it,
and I can't think of a pattern that works perfectly.
[a-zA-Z0-9\-]+
That will match but will also match words that do not have the - character
Thought of maybe this ([a-zA-Z0-9\-]+\-)+
But that will match WS-810-REFERENCE- missing the final segment.
Thoughts?
Used a modified version of the second attempt just to grab that extra missing section
((?:[a-zA-Z0-9]+\-)+[a-zA-Z0-9]+)
I believe you're looking for lookahead to make sure hyphen is present in the string. You can use:
\b(?=\w*?-)[a-zA-Z0-9-]+(?= |$)
Online Demo: http://regex101.com/r/pZ6hV6
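The two suggestions above can be compared side by side. A minimal sketch in Python (the sample sentences are made up):

```python
import re

# Segment-based pattern: one or more "word-" parts followed by a final word.
SEGMENTS = re.compile(r"(?:[a-zA-Z0-9]+-)+[a-zA-Z0-9]+")

# Lookahead-based pattern: require a hyphen somewhere after the leading word characters.
LOOKAHEAD = re.compile(r"\b(?=\w*?-)[a-zA-Z0-9-]+(?= |$)")

text = "order WS-810-REFERENCE-1 shipped"
print(SEGMENTS.findall(text))   # ['WS-810-REFERENCE-1']
print(LOOKAHEAD.findall(text))  # ['WS-810-REFERENCE-1']

# The patterns differ on a token that ends with a hyphen.
print(SEGMENTS.findall("trailing- x"))   # []
print(LOOKAHEAD.findall("trailing- x"))  # ['trailing-']
```

Words without a hyphen are skipped by both. The segment-based version is stricter, since it insists on a word after every hyphen.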

.hgignore a folder except some subfolders

I want to ignore a folder but preserve some of its subfolders.
I tried regexp matching like this
syntax: regexp
^site/customer/\b(?!.*/data/.*).*
Unfortunately this doesn't work.
I read in this answer that python only does fixed-width negative lookups.
Is my desired ignoring impossible?
Python regex is cool
Python does support negative lookaheads of arbitrary length, such as (?!.*foo). But it doesn't support arbitrary-length lookbehinds, such as (?<!foo.*). A lookbehind needs to be fixed-width, as in (?<!foo..).
Which means it's definitely possible to solve your problem.
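This is easy to confirm with the standard re module. A small sketch, where the helper name compiles is made up:

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if the re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r"(?!.*foo)bar"))   # True: variable-length negative lookahead
print(compiles(r"(?<!foo..)bar"))  # True: fixed-width negative lookbehind
print(compiles(r"(?<!foo.*)bar"))  # False: variable-width lookbehind is rejected
```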
The problem
You've got the following regex: /customer/(?!.*/data/.*).*.
Let's take the input /customer/data/name as an example. It matches, and here is why:
/customer/data/name
^^^^^^^^^^ -> /customer/ match !
          ^ (?!.*/data/.*) Let's check if there is no /data/ ahead
            The problem is here, we've already matched "/"
            so the regex only finds "data/name" instead of "/data/name"
          ^^^^^^^^^ .* match !
Fixing your regex
Basically, we just need to remove that trailing forward slash. We also add an anchor ^ to make sure the match starts at the beginning of the string, and \b to make sure we match exactly customer: ^/customer\b(?!.*/data/).*.
Online demo
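Both patterns can be checked against a few paths with Python's re module. This is a sketch; the paths other than /customer/data/name are made up:

```python
import re

# Pattern as analysed above: the slash is consumed before the lookahead runs.
BROKEN = re.compile(r"/customer/(?!.*/data/.*).*")

# Fixed pattern: the lookahead starts right after "customer", so it sees "/data/".
FIXED = re.compile(r"^/customer\b(?!.*/data/).*")

for path in ["/customer/data/name", "/customer/other/name", "/customer/a/data/b"]:
    print(path, bool(BROKEN.search(path)), bool(FIXED.search(path)))
# /customer/data/name True False
# /customer/other/name True True
# /customer/a/data/b False False
```

A True means the pattern matches, that is, the path would be ignored.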

Parse with Regex without trailing characters

How can I parse the text below so that I get just
To: User <test#test.com>
and
To: <test#test.com>
When I try to parse the text below with
/To:.*<[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>/mi
It grabs
Message-ID <CC2E81A5.6B9%test#test.com>,
which I don't want in my result.
I have tried using $ and \z, and neither works. What am I doing wrong?
Information to parse
To: User <test#test.com> Message-ID <CC2E81A5.6B9%test#test.com>
To:
<test#test.com>
This is my parsing information in Rubular http://rubular.com/r/DQMQC4TQLV
Since you haven't specified exactly what your tool/language is, assumptions must be made.
In general, regex quantifiers are greedy: they match the longest possible string. Your pattern has .* right after To:, which means that you're going to match the longest possible string that ENDS WITH the remainder of your pattern, <[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>, which was matched with <CC2E81A5.6B9%test#test.com> from the Message-ID.
Both Apalala's and nhahtdh's comments give you something to try. Avoid the all-inclusive .* at the start and use something that's a bit more specific: match leading spaces, or match anything EXCEPT the first part of what you're really interested in.
You need to make the wildcard match non greedy by adding a question mark after it:
To:.*?<[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>
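A sketch of the greedy versus non-greedy difference in Python. The header line with a second address in a Cc: field is made up for the illustration (it is not the question's data), and the address part is written with # exactly as in the patterns above:

```python
import re

# Address part copied from the patterns above.
ADDRESS = r"<[A-Z0-9._+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}>"

# Made-up line containing two bracketed addresses.
line = "To: User <test#test.com> Cc: <other#test.com>"

greedy = re.search(r"To:.*" + ADDRESS, line, re.I)
lazy = re.search(r"To:.*?" + ADDRESS, line, re.I)

print(greedy.group(0))  # To: User <test#test.com> Cc: <other#test.com>
print(lazy.group(0))    # To: User <test#test.com>
```

The greedy version runs to the last bracketed address on the line, while the non-greedy version stops at the first one.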

How to match a string with an optional part?

We have a string that we need to parse using regex, the string could be either:
There was a problem at XXXX
There was a problem at XXXX, previous failures were YYY
XXXX could contain any character (e.g. ".")
How can we make regex that will match:
XXXX
", previous failures were YYY" (remember could be optional)
Every regex that I tried either captures everything in the first group (because it is greedy) or too little (because it is not greedy).
I know this is advanced, but maybe someone has already done it.
^There was a problem at (.*?)(?:, previous failures were (.*))?$
(.*?) means match everything, but as little as possible while still letting the whole pattern match. The ^ and $ anchors force the regex to span the entire line, so the non-greedy group has to capture all of XXXX.
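For illustration, here is how that pattern behaves in Python (the question doesn't name a language, so this is just one possibility; the helper name parse is made up):

```python
import re

PATTERN = re.compile(
    r"^There was a problem at (.*?)(?:, previous failures were (.*))?$"
)


def parse(message: str):
    """Hypothetical helper: return (XXXX, YYY-or-None), or None if no match."""
    m = PATTERN.match(message)
    return m.groups() if m else None


print(parse("There was a problem at XXXX"))
# ('XXXX', None)
print(parse("There was a problem at XXXX, previous failures were YYY"))
# ('XXXX', 'YYY')
```

When the optional part is absent, its group is simply None.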
EDIT: If you really want the surrounding error text, and not just "XXX" and "YYY", then use the following regex instead:
^There was a problem at (.*?)(, previous failures were .*)?$
EDIT 2: Depending on the format of XXX, you may be able to get away with the following, but only if there are no commas in "XXX". Unfortunately, apart from this case, you need at least the $ anchor to make sure the non-greedy match will match something. As you noted in your question, using a greedy match isn't an option at all (at least while using .).
There was a problem at ([^,]*)(, previous failures were .*)?
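The need for the $ anchor is easy to see by dropping it. A short sketch in Python, comparing the non-greedy pattern without any anchor against the comma-excluding one:

```python
import re

msg = "There was a problem at XXXX, previous failures were YYY"

# Non-greedy group with no $ anchor: nothing forces it to grow, so it stays empty.
no_anchor = re.match(
    r"There was a problem at (.*?)(, previous failures were .*)?", msg
)

# Negated character class: grabs everything up to the first comma, no anchor needed.
negated = re.match(
    r"There was a problem at ([^,]*)(, previous failures were .*)?", msg
)

print(no_anchor.groups())  # ('', None)
print(negated.groups())    # ('XXXX', ', previous failures were YYY')
```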
A Perl, Java, Python, .NET, JavaScript etc. compatible regex could be
^There was a problem at (.*?)(, previous failures were .*)?$
if I understand your question correctly. If you need a code sample, please provide more details.