How to use regex for URL-targeting

As a disclaimer, my experience with regular expressions is very limited. I am using Optimizely for A/B testing and have run into a problem. I only want my experiment to run on one page; however, this page's URL structure is somewhat complicated. The URL structure of the page where I want to run my experiment looks like this:
https://mywebsite.co/term/public_id/edit/pricing
The problem is that the public_id part changes dynamically whenever a new user goes through the signup flow. How can I use regex to target this page exclusively? I have been trying to figure it out for the past few days without any luck. Optimizely's regex docs can be found here. I can't just use a simple match because /term/ appears in the URL of several pages on my site.

You could use this regular expression:
mywebsite\.co/term/.*?/edit/pricing
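The .* part means any character (except a newline) can occur here any number of times, including zero. The additional ? makes the quantifier lazy: it matches as few characters as possible, so the rest of the regular expression gets to match as soon as possible.
Note that a literal . needs to be escaped with a backslash, like \., because an unescaped . matches any character.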
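A quick way to sanity-check the pattern before pasting it into Optimizely is to run it against a few sample URLs. The sketch below uses Python's re module and assumes the pattern is searched for anywhere in the full URL (modeled with re.search); how Optimizely applies it exactly is worth confirming in its docs. The sample URLs are made up, and the ONE_SEGMENT variant is not part of the answer: it swaps .*? for [^/]+ so the dynamic part cannot span several path segments.

```python
import re

# The answer's pattern with the question's /term/ path segment.
# Assumption: the targeting tool searches for the pattern anywhere in the
# full URL, which re.search models here.
LAZY = re.compile(r"mywebsite\.co/term/.*?/edit/pricing")

# Variant (not from the answer): [^/]+ limits the dynamic part to exactly
# one path segment, so URLs with extra segments are rejected.
ONE_SEGMENT = re.compile(r"mywebsite\.co/term/[^/]+/edit/pricing")

# Hypothetical sample URLs.
urls = {
    "target": "https://mywebsite.co/term/a1b2c3/edit/pricing",
    "other page": "https://mywebsite.co/term/a1b2c3/edit",
    "extra segment": "https://mywebsite.co/term/a1b2c3/x/edit/pricing",
}

for label, url in urls.items():
    print(label, bool(LAZY.search(url)), bool(ONE_SEGMENT.search(url)))
```

Both patterns accept the target page and reject the other page; they differ only when an extra path segment sits between /term/ and /edit/pricing, which .*? happily spans.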


Trying to regex YouTube ads with Pi-hole

EDIT:
As far as I know, Pi-hole does not block YouTube ads.
Original post:
I'm trying to write a regex that matches hostnames like:
r4---sn-vgqsrnez.googlevideo.com
r1---sn-vgqsknlz.googlevideo.com
r5---sn-vgqskn7e.googlevideo.com
r3---sn-vgqsknez.googlevideo.com
r6---sn-vgqs7ney.googlevideo.com
r4---sn-vgqskne6.googlevideo.com
r4---sn-vgqsrnez.googlevideo.com
r5---sn-vgqskn76.googlevideo.com
r6---sn-vgqs7ns7.googlevideo.com
r1---sn-vgqsener.googlevideo.com
r1---sn-vgqskn7z.googlevideo.com
r1---sn-vgqsknek.googlevideo.com
r6---sn-vgqsener.googlevideo.com
r3---sn-vgqs7nly.googlevideo.com
r1---sn-vgqsknes.googlevideo.com
r4---sn-vgqsrnes.googlevideo.com
r6---sn-vgqskn76.googlevideo.com
I've tried:
(^|\.)r[0-100]---sn-vgqs?n??\.googlevideo\.com$
(^|\.)r[0-100]?*\.googlevideo\.com$
^r[0-100]---sn-vgqs(?:.*)n(?:.*)(?:.*).googlevideo.com$
^r[0-100]---sn-vgqs(?:.*)n(?:.*).googlevideo.com$
but none of them work.
I am probably using regex wrong, because I don't have much experience with it, but some people online have said it could be an issue with Pi-hole.
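First, note that [0-100] in your attempts is a character class, not a numeric range: it matches only a single 0 or 1, so hosts like r4 or r5 can never match. I'm guessing that you'd like restricted boundaries; if not, this expression might be close to what you have in mind: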
^r\d+---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$
You can add more boundaries, if necessary, such as:
^r(?:100|[1-9]\d|\d)---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$
or:
^r(?:100|[1-9]\d|\d)---sn-vgqs(?:rne(?:s|z)|kne(?:s|z)|knlz|kn7e|7ney|kne6|kn76|7ns7|ener|kn7z|knek|7nly)\.googlevideo\.com$
The suffix list in that last pattern is just my guess based on your samples.
If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against some sample inputs.
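To check the three patterns offline, here is a small Python sketch that runs them against some hostnames from the question plus a few extra inputs. Python's re supports \d and (?:...), which Pi-hole's engine may not (see the POSIX ERE answer below), so this verifies only the matching logic, not Pi-hole compatibility. Apart from one hostname borrowed from a later answer, the extra inputs are illustrative.

```python
import re

# The three patterns from the answer above, unchanged.
PATTERNS = {
    "unbounded": r"^r\d+---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$",
    "r0-r100": r"^r(?:100|[1-9]\d|\d)---sn-vgqs[a-z0-9]{4}\.googlevideo\.com$",
    "suffix list": (
        r"^r(?:100|[1-9]\d|\d)---sn-vgqs"
        r"(?:rne(?:s|z)|kne(?:s|z)|knlz|kn7e|7ney|kne6|kn76|7ns7|ener|kn7z|knek|7nly)"
        r"\.googlevideo\.com$"
    ),
}

# Hostnames taken from the question; every pattern should accept these.
FROM_QUESTION = [
    "r4---sn-vgqsrnez.googlevideo.com",
    "r1---sn-vgqsknlz.googlevideo.com",
    "r6---sn-vgqs7ns7.googlevideo.com",
    "r4---sn-vgqsrnes.googlevideo.com",
]

# Extra inputs that show where the patterns differ.
EXTRA = [
    "r101---sn-vgqsrnez.googlevideo.com",            # above r100: only "unbounded" accepts it
    "r5---sn-4g5e6nze.googlevideo.com",              # different sn- prefix (from a later answer)
    "r4---sn-vgqsrnez.googlevideo.com.example.net",  # trailing text, rejected by $
]

def matches(name: str, host: str) -> bool:
    """Return True if the named pattern matches the whole hostname."""
    return re.search(PATTERNS[name], host) is not None

for name in PATTERNS:
    print(f"{name:12}", [matches(name, h) for h in FROM_QUESTION + EXTRA])
```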
The following regex matches every hostname that starts with "r", followed by one or more characters of any kind, then "sn", then one or more characters, and ends with ".googlevideo.com". The expression is anchored with ^ and $.
I tried it on my Pi-hole with great success but had to remove it later. All r....sn...googlevideo.com queries were blocked in the query log, but it also broke the YouTube app on my smart TV: it would not play any video at all until I removed the regex from Pi-hole. Use it at your own risk.
^r.+sn.+(\.googlevideo\.com)$
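One plausible reason playback broke is how broad the pattern is: it only looks for a leading r, an sn somewhere, and the googlevideo.com suffix, so it cannot tell the hosts in the question apart from the ones a later answer sees in its query log (one of which that answer identifies as the video). A Python sketch of the same pattern; the last hostname is just an illustrative name without "sn":

```python
import re

# The broad pattern from the answer: "r", one or more characters, "sn",
# one or more characters, then ".googlevideo.com" at the very end.
BROAD = re.compile(r"^r.+sn.+(\.googlevideo\.com)$")

hosts = [
    "r4---sn-vgqsrnez.googlevideo.com",  # from the question
    "r5---sn-4g5e6nze.googlevideo.com",  # from a later answer's query log
    "r5---sn-4g5ednse.googlevideo.com",  # also from that query log
    "redirector.googlevideo.com",        # illustrative name without "sn"
]

blocked = [h for h in hosts if BROAD.search(h)]
print(blocked)  # the first three hosts; only the name without "sn" is not blocked
```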
The post is a bit old, but since I have experimented with these regexes myself, I want to point out that yours can't work because of one "little" detail.
Pi-hole uses the POSIX ERE (POSIX Extended Regular Expressions) standard.
So there are no lazy quantifiers (such as *?) or shorthand character classes (such as \d).
It also does not support non-capturing groups such as (?:...), which your third and fourth attempts use.
You can check such regexes in tools like RegexBuddy. Other free tools may be able to check them too and help convert them.
My current regex is:
^r[[:digit:]]+---sn-4g5e[a-z0-9]{4}\.googlevideo\.com$
It correctly blocks all ads BUT also videos.
If you use it, you have to do the following:
Open a YouTube video and check whether it loads.
If it doesn't, go to the query log in your Pi-hole dashboard.
For your device you will see two DNS queries, such as
r5---sn-4g5e6nze.googlevideo.com
and
r5---sn-4g5ednse.googlevideo.com
The latest one (the upper one in the query log) is the video, so whitelist that domain. You will have to do this from time to time.
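Python's re module cannot check a POSIX ERE pattern directly, because it does not understand bracket classes like [[:digit:]]. As a rough sketch, the pattern below is translated by hand ([[:digit:]] becomes [0-9], equivalent for ASCII digits) and run against the two queries from the steps above. Both match, which is why the answer has to whitelist the video domain manually.

```python
import re

# The answer's POSIX ERE pattern, as it would be entered in Pi-hole.
POSIX_PATTERN = r"^r[[:digit:]]+---sn-4g5e[a-z0-9]{4}\.googlevideo\.com$"

# Python's re has no POSIX bracket classes, so translate the one used here.
# [0-9] is equivalent to [[:digit:]] for ASCII digits.
PY_PATTERN = re.compile(POSIX_PATTERN.replace("[[:digit:]]", "[0-9]"))

# The two queries mentioned in the answer; per the answer, one of them is
# the actual video, yet the pattern matches both.
queries = [
    "r5---sn-4g5e6nze.googlevideo.com",
    "r5---sn-4g5ednse.googlevideo.com",
]

for q in queries:
    print(q, PY_PATTERN.search(q) is not None)
```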

How to exclude the last part of a variable string using regex

I am currently making a bunch of landing pages that use a similar URL structure, but each URL has a different number of words.
So it's something like:
http://landingpage.xyz/page-number-five
http://landingpage.xyz/page-number-fifty-four
http://landingpage.xyz/page-for-a-different-topic
For the sent page, I just append -sent, like this. I am not adding it as /sent because of the way the platform I am using handles URLs.
http://landingpage.xyz/page-number-five-sent
http://landingpage.xyz/page-number-fifty-four-sent
http://landingpage.xyz/page-for-a-different-topic-sent
Making a regular expression that identifies all the sent pages was easy, for example:
\/([a-z0-9\-]*)-sent
The thing is, I am not sure how to identify the pages that are not sent. I tried a similar regular expression, something like this, but it's not working as expected:
\/([a-z0-9\-]*)(?!-sent)
What's the best way to design the regex for this? Or am I approaching it the wrong way?
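A lookahead only does useful work at a position where there are characters left to match, so one at the end of your regex doesn't look at anything: the greedy [a-z0-9\-]* has already consumed the whole slug, including -sent, so (?!-sent) always succeeds. Since I'm not sure whether your environment supports lookbehinds, this should work as a workaround, with the lookahead placed right after the slash: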
\/(?!.*-sent\b)([a-z0-9\-]*)
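Here is a small Python sketch comparing the question's regex with the workaround on paths from the question. It matches against the URL path only (extracted with urllib.parse), which is an assumption about how the regex is applied on the landing-page platform.

```python
import re
from urllib.parse import urlparse

# Regexes from the question and the answer.
BROKEN = re.compile(r"\/([a-z0-9\-]*)(?!-sent)")     # question's attempt
FIXED = re.compile(r"\/(?!.*-sent\b)([a-z0-9\-]*)")  # answer's workaround

urls = [
    "http://landingpage.xyz/page-number-five",
    "http://landingpage.xyz/page-number-five-sent",
    "http://landingpage.xyz/page-for-a-different-topic-sent",
]

for url in urls:
    path = urlparse(url).path  # e.g. "/page-number-five"
    fixed = FIXED.search(path)
    # BROKEN matches every path; FIXED only matches non-sent pages.
    print(path, bool(BROKEN.search(path)), fixed.group(1) if fixed else None)
```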

Filter by regex example

Could anyone provide an example of a regex filter for the Google Chrome Developer toolbar?
I especially need exclusion. I've tried many regexes, but somehow they don't seem to work:
It turned out that Google Chrome actually didn't support this until early 2015 (see the Google Code issue). In newer versions it works well; for example, this excludes everything that contains banners:
/^(?!.*?banners)/
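The filter works because ^(?!.*?banners) is an empty match at the start of the text that succeeds only when banners does not appear anywhere after it. Below is a Python sketch of the same logic. DevTools uses JavaScript regexes, but this pattern behaves the same in both engines; the URLs are made-up examples.

```python
import re

# Same pattern body as the DevTools filter /^(?!.*?banners)/.
# It matches the empty string at position 0 only if "banners" does not
# occur anywhere in the entry, so every other entry is kept.
KEEP = re.compile(r"^(?!.*?banners)")

entries = [
    "https://example.com/js/app.js",
    "https://example.com/img/banners/top.png",
    "https://cdn.example.com/banners.css",
]

kept = [e for e in entries if KEEP.search(e)]
print(kept)  # ['https://example.com/js/app.js']
```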
It's possible, at least in Chrome 58 Dev. You just need to wrap your regex in forward slashes: /my-regex-string/
For example, this is one I'm currently using: /^(.(?!fallback font))+$/
It successfully filters out any messages that contain the substring "fallback font".
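One subtlety: in this pattern the lookahead runs after each consumed character, so a text that begins with fallback font is not filtered out. The common "tempered dot" form ^((?!fallback font).)+$ runs the check before each character instead. In practice the difference rarely matters here, since (as the edit below explains) the filtered text starts with the file name. A Python sketch comparing the two on made-up messages:

```python
import re

# The answer's pattern: lookahead AFTER each consumed character.
ANSWER = re.compile(r"^(.(?!fallback font))+$")
# Tempered-dot variant: lookahead BEFORE each consumed character.
TEMPERED = re.compile(r"^((?!fallback font).)+$")

messages = [
    "app.js:12 fallback font used",  # phrase in the middle
    "app.js:13 layout done",         # phrase absent
    "fallback font used",            # phrase at position 0
]

for m in messages:
    # True means the message is kept (shown) by the filter.
    print(f"{m!r}: answer={bool(ANSWER.search(m))} tempered={bool(TEMPERED.search(m))}")
```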
EDIT
Something else to note is that if you want to use the ^ (caret) symbol to search from the start of the log message, you have to first match the "fileName.js?someUrlParam:lineNumber " part of the string.
That is to say, the regex is matching against not just the log message, but also the stack-entry for the line which made the log.
So this is the regex I use to match all log messages where the actual message starts with "Dog":
/^.+?:[0-9]+ Dog/
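To see why the .+?:[0-9]+ prefix is needed, here is a Python sketch run against lines shaped like fileName.js?someUrlParam:lineNumber message, as described in the edit above. The sample lines are invented, and the exact text DevTools filters against may differ between versions.

```python
import re

# The answer's pattern: skip the "file:line " prefix lazily, then require
# the message itself to start with "Dog".
DOG = re.compile(r"^.+?:[0-9]+ Dog")

lines = [
    "app.js?v=2:17 Dog barked",     # message starts with "Dog"
    "app.js?v=2:18 My Dog barked",  # "Dog" appears later in the message
    "app.js?v=2:19 Cat meowed",     # no "Dog" at all
]

print([bool(DOG.search(line)) for line in lines])  # [True, False, False]
```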
The negative or exclusion case is much easier to write and think about when using DevTools' native syntax. To get the exclusion logic you need, simply use this:
-/app/ -/some\sother\sregex/
A "-" in front of a regex negates it, so entries that match that regex are hidden.
Your expression should not contain the forward slashes and /s; these are not needed for crafting a filter.
I believe your regex should finally read as follows, depending on what exactly you want to filter:
!(appl)
The regex above will filter out all lines without the string "appl" in them.
Edit: apparently exclusion is not supported?

Regular Expression for finding JavaScript accessing custom attributes

I'm fixing our web application to be compatible with Internet Explorer 10 (non-compatibility mode) and have run into a couple of issues. There is a lot of JavaScript that accesses a custom attribute of an element directly, which does not work in Internet Explorer 10 (regular mode). I've fixed most cases by using element.getAttribute("customattribute"). The problem is that there is quite a bit of JavaScript, and I do not know all the places where a custom attribute is being read. I've been trying to find all occurrences with a regular expression. Basically, I want to find any word, followed by a dot (.), followed by any word except attributes like id, name, checked, etc., followed by a space or an equals sign. This is what I've come up with so far:
(\w)\.(?!attr|index|all|id|value|className)(\w)([ \t]|=)
The words attr, index, all, id, value and className are all being returned though. Is there a better way (or correct way) to achieve this?
I used the following modification to obtain the things you are asking for:
(\w*)\.(?!attr|index|all|id|value|className|getElementById)(\w*)
However, a lot of unwanted dotted expressions are caught as well (e.g. "document.getElementById", "xmlhttp.open"). So whitelisting the objects you do want may be helpful too:
style\.(?!attr|index|all|id|value|className|getElementById)(\w*)
Tested at http://gskinner.com/RegExr/ with a sample of JavaScript code. Without more information about the JavaScript code itself, I can't tell which names to exclude, or, if there are too many custom attributes to list, which ones to whitelist instead.
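A Python sketch of the modified pattern on a made-up snippet shows how findall returns (object, property) pairs. It also shows a side effect of the lookahead: (?!id) rejects any property that merely starts with id, such as idx. Adding \b after the alternation (a variant, not part of the answer) restricts the exclusion to whole words.

```python
import re

EXCLUDED = "attr|index|all|id|value|className|getElementById"

# The answer's pattern: the lookahead rejects any property that STARTS
# with one of the excluded words.
PREFIX = re.compile(r"(\w*)\.(?!" + EXCLUDED + r")(\w*)")
# Variant with \b: only rejects properties that ARE one of those words.
WHOLE_WORD = re.compile(r"(\w*)\.(?!(?:" + EXCLUDED + r")\b)(\w*)")

# Hypothetical JavaScript to scan.
js = """
el.customattribute = 1;
el.id = "x";
el.value = y;
row.idx = 3;
"""

print(PREFIX.findall(js))      # [('el', 'customattribute')]
print(WHOLE_WORD.findall(js))  # [('el', 'customattribute'), ('row', 'idx')]
```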

Regex for URL routing - match alphanumeric and dashes except words in this list

I'm using CodeIgniter to write an app where users can register an account and are assigned a URL (URL slug) of their choosing (e.g. domain.com/user-name). CodeIgniter has a URL routing feature that supports regular expressions.
Users are only allowed to register URLs that contain alphanumeric characters, dashes (-), and underscores (_). This is the regex I'm using to verify the validity of the URL slug: ^[A-Za-z0-9][A-Za-z0-9_-]{2,254}$
I am using the URL routing feature to route a few URLs to features on my site (e.g. /home -> /pages/index, /activity -> /user/activity), so those particular URLs obviously cannot be registered by a user.
I'm largely inexperienced with regular expressions but have attempted to write an expression that would match any URL slug made of alphanumerics, dashes, and underscores, except for the following:
default_controller
404_override
home
activity
Here is the code I'm using to try to match slugs with those criteria:
$route['(?!default_controller|404_override|home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254}'] = 'view/slug/$1';
but it isn't routing properly. Can someone help? (Side question: is it necessary to have ^ or $ in the regex when matching URLs?)
Alright, let's pick this apart.
Ignore CodeIgniter's reserved routes.
The default_controller and 404_override portions of your route are unnecessary. Routes are compared to the requested URI to see if there's a match. It is highly unlikely that those two items will ever be in your URI, since they are special reserved routes for CodeIgniter. So let's forget about them.
$route['(?!home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254}'] = 'view/slug/$1';
Capture everything!
With regular expressions, a group is created using parentheses (). This group can then be retrieved with a back reference: in our case, the $1, $2, etc. in the second part of the route. You only had a group around the first set of items you were trying to exclude, so it would not capture the entire wildcard. You found this out yourself already and added a group around the entire item (good!).
$route['((?!home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254})'] = 'view/slug/$1';
Look-ahead?!
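On that subject, the first group around home|activity is not actually a traditional group, due to the use of ?! at the beginning. This is called a negative lookahead, and it's a complicated regular expression feature. And it's being used incorrectly here: because CodeIgniter anchors the route at the start, (?!home|activity) rejects every slug that merely begins with home or activity (such as homestead), not just those two words. As a general rule: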
Negative lookahead is indispensable if you want to match something not followed by something else.
There's a LOT more I could go into with this, but basically we don't really want or need it in the first place, so I'll let you explore if you'd like.
In order to make your life easier, I'd suggest separating the home, activity, and other existing controllers in the routes. CodeIgniter will look through the list of routes from top to bottom, and once something matches, it stops checking. So if you specify your existing controllers before the wild card, they will match, and your wild card regular expression can be greatly simplified.
$route['home'] = 'pages';
$route['activity'] = 'user/activity';
$route['([A-Za-z0-9][A-Za-z0-9_-]{2,254})'] = 'view/slug/$1';
Remember to list your routes in order from most specific to least. Wild card matches are less specific than exact matches (like home and activity), so they should come after (below).
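The ordering idea can be modeled in a few lines of Python. This is a simplified, hypothetical model, not CodeIgniter's actual router: routes are tried in order, each key must match the whole URI (mirroring the ^ and $ CodeIgniter adds, as noted further down), and $1 is filled from the first capture group. For comparison, it also runs the single lookahead route, which rejects homestead because it merely starts with home.

```python
import re

def resolve(uri, routes):
    """Hypothetical model of ordered routing: try routes top to bottom,
    require each key to match the whole URI (re.fullmatch), and return the
    first target with $1, $2, ... replaced by the captured groups."""
    for key, target in routes:
        m = re.fullmatch(key, uri)
        if m:
            return re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), target)
    return None  # no route matched

# Routes as suggested above: exact matches first, wildcard last.
ORDERED = [
    ("home", "pages"),
    ("activity", "user/activity"),
    (r"([A-Za-z0-9][A-Za-z0-9_-]{2,254})", "view/slug/$1"),
]

# The single lookahead route from earlier in this answer, for comparison.
LOOKAHEAD = [
    (r"((?!home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254})", "view/slug/$1"),
]

for uri in ["home", "activity", "jane_doe", "homestead"]:
    print(uri, resolve(uri, ORDERED), resolve(uri, LOOKAHEAD))
```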
Now, that's all the complicated stuff. A little more FYI.
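Remember that a dash - has a special meaning inside [] brackets: between two characters it defines a range (as in A-Z), and it is only literal when it comes first or last. Escaping it as \- makes a literal dash explicit wherever it appears.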
$route['([A-Za-z0-9][A-Za-z0-9_\-]{2,254})'] = 'view/slug/$1';
Note that your character repetition min/max {2,254} only applies to the second set of characters, so your user names must be 3 characters at minimum, and 255 at maximum. Just an FYI if you didn't realize that already.
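The 3-to-255 range follows from one leading character plus {2,254} more. A quick Python check of the slug pattern, anchored at both ends with re.fullmatch:

```python
import re

# One leading alphanumeric, then 2 to 254 more allowed characters.
SLUG = re.compile(r"[A-Za-z0-9][A-Za-z0-9_\-]{2,254}")

# Try slugs of every length from 1 to 260 and keep the lengths that pass.
valid_lengths = [n for n in range(1, 261) if SLUG.fullmatch("a" * n)]
print(min(valid_lengths), max(valid_lengths))  # 3 255
```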
I saw your own answer to this problem, and it's just ugly. Sorry. The ^ and $ symbols are used improperly throughout the lookahead (which still shouldn't be there in the first place). It may "work" for a few use cases that you're testing it with, but it will just give you problems and headaches in the future.
Hopefully now you know more about regular expressions and how they're matched in the routing process.
And to answer your question: no, you should not use ^ and $ at the beginning and end of your regex, because CodeIgniter adds them for you.
Use the 404, Luke...
At this point your routes are improved and should be functional. I will throw it out there, though, that you might want to consider using the controller/method defined as the 404_override to handle your wild cards. The main benefit of this is that you don't need ANY routes to direct a wild card, or to prevent your wild card from goofing up existing controllers. You only need:
$route['404_override'] = 'view/slug';
Then, your View::slug() method would check the URI, and see if it's a valid pattern, then check if it exists as a user (same as your slug method does now, no doubt). If it does, then you're good to go. If it doesn't, then you throw a 404 error.
It may not seem that graceful, but it works great. Give it a shot if it sounds better for you.
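A rough sketch of that slug handler's logic, written in Python rather than PHP for brevity. Everything here is hypothetical: REGISTERED_USERS stands in for the real user lookup, and the (status, body) return value stands in for rendering a view or showing the 404 page.

```python
import re

# Same slug rule as the routes above.
SLUG = re.compile(r"[A-Za-z0-9][A-Za-z0-9_\-]{2,254}")

# Hypothetical stand-in for a database lookup of registered slugs.
REGISTERED_USERS = {"user-name", "jane_doe"}

def slug(uri):
    """Handle a URI that no controller claimed (the 404_override case).

    Returns (status, body); a 404 status marks where the real handler
    would show the 404 page.
    """
    name = uri.strip("/")
    if SLUG.fullmatch(name) is None:
        return 404, None  # not a valid slug pattern
    if name not in REGISTERED_USERS:
        return 404, None  # valid pattern, but no such user
    return 200, f"profile page for {name}"

print(slug("/jane_doe"), slug("/ab"), slug("/nobody-here"))
```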
I'm not familiar with CodeIgniter specifically, but routing in most frameworks is based on precedence. In other words, the default controller, 404, etc. routes should be defined first. Then you can simplify your regex to only match the slugs.
OK, answering my own question.
I seem to have come up with a different expression that works:
$route['(^(?!default_controller$|404_override$|home$|activity$)[A-Za-z0-9][A-Za-z0-9_-]{2,254}$)'] = 'view/slug/$1';
I added parentheses around the whole expression (I think that's what CodeIgniter matches with $1 on the right), plus a start-of-line anchor ^ and a bunch of end-of-line anchors $.
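A quick Python check of that route key (re.fullmatch stands in for the ^ and $ that, per the earlier answer, CodeIgniter adds around it) shows that the $ inside the lookahead is what makes it reject only the exact reserved words, so slugs such as homestead are still accepted:

```python
import re

# The route key from this answer, copied as-is (split only for line length).
KEY = (r"(^(?!default_controller$|404_override$|home$|activity$)"
       r"[A-Za-z0-9][A-Za-z0-9_-]{2,254}$)")

def is_user_slug(uri):
    """True if the URI would be routed to view/slug/$1."""
    return re.fullmatch(KEY, uri) is not None

for uri in ["home", "activity", "homestead", "jane_doe"]:
    print(uri, is_user_slug(uri))
```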
Hope this helps someone who may run into this problem later.