Filter by regex example - regex

Could anyone provide an example of a regex filter for the Google Chrome Developer toolbar?
I especially need exclusion. I've tried many regexes, but somehow they don't seem to work:

It turned out that Google Chrome actually didn't support this until early 2015, see Google Code issue. With newer versions it works great, for example excluding everything that contains banners:
/^(?!.*?banners)/

It's possible -- at least in Chrome 58 Dev. You just need to wrap your regex with forward-slashes: /my-regex-string/
For example, this is one I'm currently using: /^(.(?!fallback font))+$/
It successfully filters out any messages that contain the substring "fallback font".
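As a rough check of the two exclusion patterns above, here is a minimal sketch in Python. Its `re` module is not the JavaScript engine DevTools uses, though lookaheads behave the same way for these patterns. The sample messages are invented for illustration.

```python
import re

# Hypothetical console messages, invented for this sketch.
messages = [
    "app.js:10 Loaded banners/top.png",
    "app.js:11 User logged in",
    "fonts.js:3 Using fallback font Arial",
]

# Lookahead at the start: reject the line if "banners" occurs anywhere in it.
no_banners = re.compile(r"^(?!.*?banners)")

# Per-character lookahead: no consumed character may be immediately
# followed by "fallback font".
no_fallback = re.compile(r"^(.(?!fallback font))+$")

kept_a = [m for m in messages if no_banners.search(m)]
kept_b = [m for m in messages if no_fallback.search(m)]

# Edge case: the lookahead is only tested after the first character has been
# consumed, so a string that *starts* with the phrase is not rejected.
starts_with_phrase = no_fallback.search("fallback font missing") is not None
```

The edge case rarely matters in the console, since each line there begins with the file name, but the `^(?!.*?...)` form avoids it altogether.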
EDIT
Something else to note is that if you want to use the ^ (caret) symbol to search from the start of the log message, you have to first match the "fileName.js?someUrlParam:lineNumber " part of the string.
That is to say, the regex is matching against not just the log message, but also the stack-entry for the line which made the log.
So this is the regex I use to match all log messages where the actual message starts with "Dog":
/^.+?:[0-9]+ Dog/
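A small Python sketch of the same idea, assuming each line has the "file:line message" shape described above. The file names and messages are made up.

```python
import re

# Hypothetical lines in the shape "fileName.js?someUrlParam:lineNumber message".
lines = [
    "main.js?v=2:120 Dog barked",
    "main.js?v=2:121 Hot Dog stand",
    "util.js:7 Dog",
]

# Skip the "file:line " prefix lazily, then require the message to start with "Dog".
starts_with_dog = re.compile(r"^.+?:[0-9]+ Dog")

hits = [s for s in lines if starts_with_dog.search(s)]
```

Only the first and third lines are kept. The second has "Dog" in the message, but not right after the line number.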

The negative or exclusion case is much easier to write and think about when using the DevTool's native syntax. To provide the exclusion logic you need, simply use this:
-/app/ -/some\sother\sregex/
The "-" prior to the regex makes the result negative.

Your expression should not contain the forward slashes and /s, these are not needed for crafting a filter.
I believe your regex should finally read:
!(appl)
Depending on what exactly you want to filter.
The regex above will filter out all lines without the string "appl" in them.
edit: apparently exclusion is not supported?

Related

Trying to regex YouTube ads with pihole

EDIT:
As far as I know, Pihole does not block YouTube ads.
Original Post:
Trying to regex urls like:
r4---sn-vgqsrnez.googlevideo.com
r1---sn-vgqsknlz.googlevideo.com
r5---sn-vgqskn7e.googlevideo.com
r3---sn-vgqsknez.googlevideo.com
r6---sn-vgqs7ney.googlevideo.com
r4---sn-vgqskne6.googlevideo.com
r4---sn-vgqsrnez.googlevideo.com
r5---sn-vgqskn76.googlevideo.com
r6---sn-vgqs7ns7.googlevideo.com
r1---sn-vgqsener.googlevideo.com
r1---sn-vgqskn7z.googlevideo.com
r1---sn-vgqsknek.googlevideo.com
r6---sn-vgqsener.googlevideo.com
r3---sn-vgqs7nly.googlevideo.com
r1---sn-vgqsknes.googlevideo.com
r4---sn-vgqsrnes.googlevideo.com
r6---sn-vgqskn76.googlevideo.com
I've tried:
(^|\.)r[0-100]---sn-vgqs?n??\.googlevideo\.com$
(^|\.)r[0-100]?*\.googlevideo\.com$
^r[0-100]---sn-vgqs(?:.*)n(?:.*)(?:.*).googlevideo.com$
^r[0-100]---sn-vgqs(?:.*)n(?:.*).googlevideo.com$
but nothing works
I am probably using regex wrong because I don't have much experience with it but looking online some people have said it could be a thing with Pihole.
I'm guessing that you'd like to have restricted boundaries, if not though, this expression might be somewhat close to what you have in mind:
^r\d+---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$
Demo 1
You can add more boundaries, if necessary, such as:
^r(?:100|[1-9]\d|\d)---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$
Demo 2
or:
^r(?:100|[1-9]\d|\d)---sn-vgqs(?:rne(?:s|z)|kne(?:s|z)|knlz|kn7e|7ney|kne6|kn76|7ns7|ener|kn7z|knek|7nly)\.googlevideo\.com$
Demo 3
which I'm just guessing.
If you wish to explore/simplify/modify the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
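To see why the attempts in the question match nothing, here is a hedged sketch in Python. Python's `re` is not the engine Pi-hole uses, and `\d` is a Perl-style shorthand, so this only illustrates the logic. The hostnames are taken from the question.

```python
import re

hosts = [
    "r4---sn-vgqsrnez.googlevideo.com",
    "r1---sn-vgqsknlz.googlevideo.com",
    "r6---sn-vgqs7ney.googlevideo.com",
]

# "[0-100]" is a character class: the range 0-1 plus two literal zeros.
# It matches exactly one character, "0" or "1", not the numbers 0 to 100.
digit_class = re.compile(r"[0-100]")
single = [s for s in ["0", "1", "4", "100"] if digit_class.fullmatch(s)]

# First attempt from the question, and the first suggestion from the answer.
attempt = re.compile(r"(^|\.)r[0-100]---sn-vgqs?n??\.googlevideo\.com$")
suggested = re.compile(r"^r\d+---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$")

attempt_hits = [h for h in hosts if attempt.search(h)]
suggested_hits = [h for h in hosts if suggested.search(h)]
```

In the attempt, `?` makes the preceding character optional rather than standing for "any character", so the four characters after `vgqs` are never consumed.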
The following regex matches every hostname that starts with "r", followed by one or more characters, then "sn", then one or more characters, and ends with ".googlevideo.com". The expression is anchored with ^ and $.
I tried it on my Pi-hole with great success but had to remove it later. Every r....sn...googlevideo.com query was blocked in the query list, but it also broke the YouTube app on my smart TV: it would not play any video at all until I removed the regex from Pi-hole. Use it at your own risk.
^r.+sn.+(\.googlevideo\.com)$
This post is a bit older, but since I experimented with regexes myself, I just want to point out that your regexes can't work because of one "little" detail.
Pi-hole uses the POSIX ERE (POSIX Extended Regular Expressions) standard.
So there are no lazy quantifiers or shorthand character classes.
It also does not support non-capturing groups like in your third and fourth line.
You can check such regexes in tools like RegexBuddy. Other free tools may also be able to check them and help convert them.
My current regex is:
^r[[:digit:]]+---sn-4g5e[a-z0-9]{4}\.googlevideo\.com$
It correctly blocks all ads BUT also videos.
If you use it, you have to do the following.
Open a YouTube video and check whether the video loads.
If not, go to the query log in your Pi-hole dashboard.
For your device you will see two DNS queries, for example
r5---sn-4g5e6nze.googlevideo.com
and
r5---sn-4g5ednse.googlevideo.com
The most recent one (the upper entry) in the query log is the video, so whitelist that domain. You will have to repeat this from time to time.
Greetings

Extract only the text field needed

I am at the beginning of learning regex, and I use every opportunity to understand how it works. Currently I am trying to extract dates from a text file (which is in fact a vnt file from my mobile phone). It looks like the following:
BEGIN:VNOTE
VERSION:1.1
BODY;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:18.07.=0A14.08.=0A15.09.=0A15.10.=
=0A13.11.=0A13.12.=0A12.01.=0A03.02. Grippe=0A06.03.=0A04.04.2015=0A0=
5.05.2015=0A03.06.2015=0A03.07.2015=0A02.08.2015=0A30.08.2015=0A28.09=
17.11.2017=0A
DCREATED:20171118T095601
X-IRMC-LUID:150
END:VNOTE
I want to extract all dates, so that the final list is like that:
18.07.
14.08.
15.09.
15.10.
and so on. If the date has also a year, it should also be displayed.
I almost found out how to detect the dates by the following regex:
.+(\d\d\.\d\d\.(2015|2016|2017)?).+
But it only detects very few of the dates. The result is this:
BEGIN:VNOTE
VERSION:1.1
15.10.
04.04.2015
30.08.2015
24.01.2016
DCREATED:20171118T075601
X-IRMC-LUID:150
END:VNOTE
Then I tried to add a question mark which makes the .+ not greedy, as far as I read in tutorials. Then the regex looks like:
.+?(\d\d\.\d\d\.(2015|2016|2017)?).+?
But the result is still not what I am looking for:
BEGIN:VNOTE
VERSION:1.1
21.03.20.04.18.05.18.06.18.07.14.08.15.09.15.10.
13.11.13.12.12.01.03.02.06.03.04.04.20150A0=
03.06.201503.07.201502.08.201530.08.20150A28.09=
28.10.201525.11.201528.12.201524.01.20160A
DCREATED:20171118T075601
X-IRMC-LUID:150
END:VNOTE
For someone who is familiar with regex I am pretty sure this is very easy to solve, but I don't get it. It's very confusing when you are new to regex. I tried to find a hint in some tutorials or stackoverflow posts, but all I found is this: Notepad++ how to extract only the text field which is needed?
But it doesn't work for me. I assume it might have something to do with the fact that my text file is not one single line.
I have my example on regex101 too.
I would be very thankful if maybe someone can give me a hint what else I can try.
Edit: I would like to detect the dates with the regex and as a result have a list with only the dates (maybe it is called substitute?)
Edit 2: Sorry for not mentioning it earlier: I just want to use the regex in e.g. Notepad++ or an online regex test website, just to get the dates and save the result in a new txt file. I don't want to use the regex in a programming language. My apologies for not being precise before.
Edit 3: The result should be a list with the dates, and each date in a new line:
I want to extract all dates, so that the final list is like that:
18.07.
14.08.
15.09.
15.10.
I suggest this pattern:
(?:.*?|\G)(\d\d\.\d\d\.(?:\d{4})?)
This makes use of the \G anchor, which in this case lets each match start exactly where the previous one ended, so that no character in the text is left unmatched. That makes it possible to remove everything except what's wanted.
If you want to remove the extra matches as well, add |.* at the end:
(?:.*?|\G)(\d\d\.\d\d\.(?:\d{4})?)|.*
regex101 demo
In N++, make sure the options underlined are selected, and that the cursor is at the beginning. In the picture below, I replaced then undid the replacement, only to show that matches were identified (16 replacements).
You can try using the following pattern:
\d{2}\.\d{2}\.(?:\d{4})?
This will match day.month dates of the form 18.07., but it also allows such a date to be followed by a four-digit year, e.g. 18.07.2017. While it would be nice to make the pattern more restrictive, to avoid false positives, I do not see anything obvious which can be added to the above pattern. Follow the demo link below to see the pattern in action.
Demo
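The question asks for an editor-only solution, so the following is only an aside: a minimal Python sketch that applies the same pattern after decoding the quoted-printable body. Decoding first matters because the soft line breaks (a trailing "=") split some dates across two lines, as in "0=" followed by "5.05.2015". The `body` value is a shortened copy of the note from the question.

```python
import quopri
import re

# Shortened copy of the BODY value from the note, including the soft line
# breaks ("=" at end of line) that cut some dates in two.
body = (
    b"18.07.=0A14.08.=0A15.09.=0A15.10.=\n"
    b"=0A13.11.=0A03.02. Grippe=0A04.04.2015=0A0=\n"
    b"5.05.2015=0A03.06.2015=0A"
)

# Decoding turns "=0A" into a newline and removes the soft line breaks.
text = quopri.decodestring(body).decode("utf-8")

# Day.month. with an optional four-digit year, as in the answer above.
dates = re.findall(r"\d{2}\.\d{2}\.(?:\d{4})?", text)

print("\n".join(dates))
```

Without the decoding step, the date split by the soft line break would be missed by any single-line pattern.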

Chrome dev tools: any way to exclude requests whose URL matches a regex?

Unfortunately, in the latest versions of Chrome the negative network filter doesn't work anymore. I used this filter to exclude every HTTP call containing a particular string. I asked for a solution in the Chrome DevTools forum, but so far nobody has answered.
So I would like to know if there is a way to resolve this problem (and exclude for example each call containing the string 'loadMess') with regex syntax.
Update (2018):
This is an update to my old answer to clarify that both bugs have been fixed for some time now.
Negate or exclude filtering is working as expected now. That means you can filter request paths with my.com/path (show requests matching this), or -my.com/path (show requests not matching this).
The regex solution also works after my PR fix made it in production. That means you can also filter with /my.com.path/ and /^((?!my.com/path).)*$/, which will achieve the same result.
I have left the old answer here for reference, and it also explains the negative lookahead solution.
The pre-defined negative filters do work, but it doesn't currently allow you to do NOT filters on the names in Chrome stable, only CONTAINS. This is a bug that has been fixed in Chrome Canary.
Once the change has been pushed to Chrome stable, you should be able to do loadMess to filter only for that name, and -loadMess to filter out that name and leave the rest, as it was previously.
Workaround: Regex for matching a string not containing a string
^((?!YOUR_STRING).)*$
Example:
^((?!loadMess).)*$
Explanation:
^ - Start of string
(?!loadMess) - Negative lookahead (at this cursor, do not match the next bit, without capturing)
. - Match any character (except line breaks)
()* - 0 or more of the preceding group
$ - End of string
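The breakdown above can be tried out with a short Python sketch. Python's `re` treats this lookahead the same way a JavaScript engine does. The URLs are hypothetical.

```python
import re

# Every position in the string must not be the start of "loadMess".
not_loadmess = re.compile(r"^((?!loadMess).)*$")

# Hypothetical request URLs for illustration.
urls = [
    "https://example.test/api/loadMessages?id=1",
    "https://example.test/api/users",
    "",
]

kept = [u for u in urls if not_loadmess.search(u)]
```

The empty string is kept as well, because the group may repeat zero times.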
Update (2016):
I discovered that there is actually a bug with how DevTools deals with Regex in the Network panel. This means the workaround above doesn't work, despite it being valid.
The Network panel filters on Name and Path (as discovered from the source code), but it does two tests that are OR'ed. In the case above, if you have loadMess in the Name, but not in the Path (e.g. not the domain or directory), the negative regex fails on the Name but still passes on the Path, so the request is shown anyway. To clarify, true || false === true, which means it will only filter out loadMess if it's found in both the Name and Path.
I have created an issue in Chromium and pushed a fix for review, which has since been merged.
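The effect of OR'ing two tests can be sketched in Python. The function names `shown_or` and `shown_joined` are hypothetical, and neither is Chromium's actual code. The second function only shows one conceivable alternative, which is testing a single combined string.

```python
import re

exclude = re.compile(r"^((?!loadMess).)*$")

def shown_or(name: str, path: str) -> bool:
    # Two independent tests OR'ed together, as described above.
    return bool(exclude.search(name)) or bool(exclude.search(path))

def shown_joined(name: str, path: str) -> bool:
    # Hypothetical alternative: one test over the combined string.
    return bool(exclude.search(path + "/" + name))

# "loadMess" is in the Name only: the OR'ed version still shows the request.
name_only_or = shown_or("loadMess?id=1", "my.com/api")
name_only_joined = shown_joined("loadMess?id=1", "my.com/api")

# "loadMess" is in both Name and Path: only now does the OR'ed version hide it.
both_or = shown_or("loadMess?id=1", "my.com/loadMess")
```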
This is answered here - for latest Chrome 58.0.3029.110 (Official Build) (64-bit)
https://stackoverflow.com/a/27770139/4772631
E.g.: If I want to exclude all gifs then just type -gif
Negative lookahead is recommended everywhere, but it does not work.
Instead, "-myregex" does work for me. Like this: -/(Violation|HMR)/.
Chrome browser dev tools do not support regex filters very well.
When I want to hide some requests, the approaches shown above do not work. But you can use -hide1 -hide2 to hide the requests you want.
Just leave a space between the conditions. This is not regex matching; I guess it uses plain string matching rather than regex internally.
Filtering multiple different urls
You can use the negation symbol (-) to filter out network calls.
Eg: -lab.com would filter out lab.com urls.
But to filter out multiple urls you can use the | symbol in the regex
Eg: -/lab.com|mini.com/ This will filter out both lab.com and mini.com. You can use it to filter out many different websites or urls.
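A rough Python model of such a negated alternation, where "negated" is simulated by keeping only the URLs the regex does not match. The URLs are invented. The sketch also shows that an unescaped dot matches any character, which may or may not matter for your filter.

```python
import re

# Hypothetical request URLs.
urls = [
    "https://lab.com/a",
    "https://mini.com/b",
    "https://other.org/c",
    "https://labXcom.net/d",
]

# Pattern as written in the filter: "." matches any character.
loose = re.compile(r"lab.com|mini.com")
# Same pattern with the dots escaped so they only match a literal ".".
strict = re.compile(r"lab\.com|mini\.com")

# A leading "-" in the filter box means "hide what matches".
kept_loose = [u for u in urls if not loose.search(u)]
kept_strict = [u for u in urls if not strict.search(u)]
```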
You can use "Invert" option to exclude the APIs matching a string in the Filter text box.
On the latest Chrome version (62) you have to use:
-mime-type:image/gif

Looking for a regex to match more than one reference string in TortoiseSVN

We used two different methods to reference external documents and Bugzilla bug numbers.
I'm now looking for a regular expression that matches both of these reference formats for convenient display and linking in the TortoiseSVN 1.6.16 log screen. The first is a Bugzilla entry of the form [BZ#123]; the second is [some text and numbers], which must not be converted into a URL.
This can be matched with
\[BZ#\d+\]
and
\[.*?\]
My problem now is to concatenate those two match strings together. Usually this would be done by the regex (first|second), and I've done it this way:
(\[.*?\]|\[BZ#\d+\])
Unfortunately, in this case TortoiseSVN seems to catch it all as the bug number because of the round braces. Even if I add a second expression, which (according to the documentation) is meant to be used to extract the issue number itself, this second expression seems to be ignored:
(\[.*?\]|\[BZ#\d+\])
\[BZ#(\d+)\]
In this case TortoiseSVN displays the bug and document references correctly in the separate column, but uses them completely for the bugtracker url, which is of course not working:
https://mybugzillaserver/show_bug.cgi?id=[BZ#949]
BTW, Mercurial uses a better way by using {1}, {2}, ... as the placeholder in URLs.
Has anybody an idea how to solve this problem?
EDIT
In short: We have used [BZ#123] as bug number references and [anytext] as references to other (partly non-electronic) documents. We would like to have both patterns listed in TortoiseSVN's extra column, but only the bug number from the first part should be used as %BUGID% in the URL string.
EDIT 2
Supposedly TortoiseSVN cannot handle nested regex groups (round braces), so this question doesn't have any satisfactory answer at the moment.
I'm not familiar with TortoiseSVN regex, but it looks like the problem is that the first part of the regex (\[.*?\]) would always match, so you would never even get to evaluating the second part, \[BZ#(\d+)\]
Try this one instead:
((?<=\[BZ#)\d+(?=\])|\[.*?\])
Explanation:
( #Opening group.
(?<=\[BZ#) #Look behind for a bugzilla placeholder.
\d+ #Capture just the digits.
(?=\]) #Look ahead for the closing bracket (probably not necessary.)
| #Or, if that fails,
\[.*?\] #Find all other placeholders.
) #Closing the group.
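It may be worth testing this expression before relying on it. In Python's `re`, which is not necessarily the engine TortoiseSVN uses, the lookbehind branch cannot match at the opening bracket, so the second branch matches the whole "[BZ#949]" first. The log message below is invented.

```python
import re

pattern = re.compile(r"((?<=\[BZ#)\d+(?=\])|\[.*?\])")

# Hypothetical log message containing both kinds of reference.
message = "Fix crash [BZ#949], see [spec 12]"

# With a single capturing group, findall returns that group for each match.
found = pattern.findall(message)
```

In this engine the result is the two bracketed references, not the bare number. The leftmost match wins, and the bracket branch starts four characters before the digits do.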
Edit: I've just looked at TortoiseSVN docs. You could also try to keep the Message part expression the same, but change the Bug-ID expression to:
(?<=\[BZ#)(\d+)(?=\])
Edit: ?<= represents a zero-width lookbehind. See http://www.regular-expressions.info/lookaround.html. It is possible that TortoiseSVN doesn't support lookbehinds.
What happens if you just use (\d+) for your Bug-ID expression?
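For reference, here is how the two-expression form is meant to behave according to the description in the question: the first expression picks out the reference strings, and the second extracts the bare number. This Python sketch only models that intent. The helper `split_references` is hypothetical, and the code does not reproduce TortoiseSVN's engine or the behaviour reported above.

```python
import re

message_part = re.compile(r"\[.*?\]")   # first expression: every reference
bug_id = re.compile(r"\[BZ#(\d+)\]")    # second expression: bare bug number

def split_references(log_message):
    """Return (all bracketed references, bug numbers found among them)."""
    refs = message_part.findall(log_message)
    ids = []
    for ref in refs:
        m = bug_id.fullmatch(ref)
        if m:
            ids.append(m.group(1))
    return refs, ids

# Hypothetical log message; the URL template is the one from the question.
refs, ids = split_references("Fix crash [BZ#949], see [spec 12]")
url = "https://mybugzillaserver/show_bug.cgi?id=%BUGID%".replace("%BUGID%", ids[0])
```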

Removing everything between a tag (including the tag itself) using Regex / Eclipse

I'm fairly new to figuring out how Regex works, but this one is just frustrating.
I have a massive XML document with a lot of <description>blahblahblah</description> tags. I want to basically remove any and all instances of <description></description>.
I'm using Eclipse and have tried a few examples of Regex I've found online, but nothing works.
<description>(.*?)</description>
Shouldn't that work?
EDIT:
Here is the actual code.
<description><![CDATA[<center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"><th>ID</th><td>308</td></tr></table></center>]]></description>
I'm not familiar with Eclipse, but I would expect its regex search facility to use Java's built-in regex flavor. You probably just need to check a box labeled "DOTALL" or "single-line" or something similar, or you can add the corresponding inline modifier to the regex:
(?s)<description>(.*?)</description>
That will allow the . to match newlines, which it doesn't by default.
EDIT: This is assuming there are newlines within the <description> element, which is the only reason I can think of why your regex wouldn't work. I'm also assuming you really are doing a regex search; is that automatic in Eclipse, or do you have to choose between regex and literal searching?
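The effect of the (?s) modifier can be sketched in Python, whose `re` module accepts the same inline flag as Java's regex flavor. The XML fragment is a shortened, made-up variant of the one in the question, with line breaks inside the element.

```python
import re

# Made-up fragment with newlines inside <description>.
xml = (
    "<item><name>A</name><description><![CDATA[<center>\n"
    "<table><tr><td>308</td></tr></table>\n"
    "</center>]]></description></item>"
)

# Without DOTALL, "." stops at the first newline, so nothing matches.
without_flag = re.sub(r"<description>(.*?)</description>", "", xml)

# With (?s), "." also matches newlines and the whole element is removed.
with_flag = re.sub(r"(?s)<description>(.*?)</description>", "", xml)
```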