Regex in Google Analytics for segment creation

I'm trying to trap URLs of the following structure:
/resources/state-name/city-name
given that there are URLs of the following type
/resources/other-words
/resources/state-name
/resources/state-name/city-name/other-words
I have tried to trap using
include/matches regex:
\/resources\/.*\/.*
exclude/matches regex:
\/resources\/.*\/.*\/.*
but this still lets the other-words and state-name-only URLs slip through.

Try this regex: \/resources\/[^\/\r\n]*(?:$|(?:\/.*\/.*$)) (I assumed the end of the URL is also the end of the line). It matches all of them except /resources/state-name/city-name.
To match only /resources/state-name/city-name, use this one: \/resources\/[^\/\r\n]*\/[^\/\r\n]*$
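A quick way to check both patterns is to run them against the four sample paths from the question. The sketch below uses Python's re module purely as a test harness; Google Analytics has its own regex engine, so treat the result as an approximation. The names URLS, NOT_CITY and CITY_ONLY are made up for illustration.

```python
import re

# Sample paths taken from the question.
URLS = [
    "/resources/other-words",
    "/resources/state-name",
    "/resources/state-name/city-name",
    "/resources/state-name/city-name/other-words",
]

# First pattern: everything except the two-segment state/city form.
NOT_CITY = re.compile(r"\/resources\/[^\/\r\n]*(?:$|(?:\/.*\/.*$))")
# Second pattern: exactly two segments after /resources/.
CITY_ONLY = re.compile(r"\/resources\/[^\/\r\n]*\/[^\/\r\n]*$")

not_city_hits = [u for u in URLS if NOT_CITY.search(u)]
city_only_hits = [u for u in URLS if CITY_ONLY.search(u)]

print(not_city_hits)
print(city_only_hits)
```

Only the state/city path should end up in city_only_hits, and the other three in not_city_hits.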

Something like this: /(\/resources\/)([\w+%-]+)\/([\w+%-]+)/g
The class [\w+%-] matches any letter, digit or underscore, plus +, % and - (I included those because spaces in URLs are replaced with -, + or %20).
Also, with the () groups you can access each part as $1 to $3.
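To see what the three groups hold, here is a small Python sketch. The sample path /resources/new-york/albany is invented for illustration, and Python exposes the groups through m.groups() rather than $1 to $3. It also shows that the pattern, being unanchored, still matches a longer path; appending $ is one possible way to require exactly two segments.

```python
import re

PATTERN = re.compile(r"(\/resources\/)([\w+%-]+)\/([\w+%-]+)")

# Hypothetical sample path, not from the question.
m = PATTERN.search("/resources/new-york/albany")
groups = m.groups()  # the three captured parts

# Without an end anchor the pattern also matches a longer path...
loose = PATTERN.search("/resources/new-york/albany/other-words") is not None

# ...so adding $ is one way to reject it (an assumption about the intent).
ANCHORED = re.compile(PATTERN.pattern + r"$")
strict = ANCHORED.search("/resources/new-york/albany/other-words") is not None

print(groups, loose, strict)
```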

Related

How can I write an opposite regex to this regex?

This is a regex from a proxy configuration. If I add this to my proxy:
(.*\.|)(abc|google)\.(org|net)
my proxy will not transmit the traffic for abc.org, abc.net, google.org or google.net.
How can I write a regex that is the opposite of this one? I mean one that only transmits the traffic for abc.org, abc.net, google.org and google.net.
EDIT-01
My thought is that I just want to transmit abc.org or www.abc.org. How can I do that?
Try this:
^(?!(www\.)?(?:abc|google)\.(?:net|org)).*
Demo: https://regex101.com/r/WOnFx8/3/
I used ?! to reverse the matching of your regex. This way, it will match any domain except these specific 4 domains.
Another way to do it is by using this code to include anything before the desired domains:
^(?!(.*\.|)(?:abc|google)\.(?:net|org)).*
demo: https://regex101.com/r/WOnFx8/4/
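The behaviour of the inverted pattern can be checked with a short Python sketch. The host names below are invented test data, and how the proxy applies its regex (full match or search) is unknown, so this only illustrates the lookahead itself. TIGHTER is a hypothetical variant with $ inside the lookahead, added because the pattern without it also rejects hosts that merely start with a blocked name.

```python
import re

# Second pattern from the answer: any prefix before the blocked domains.
INVERTED = re.compile(r"^(?!(.*\.|)(?:abc|google)\.(?:net|org)).*")

# Hypothetical variant: $ inside the lookahead, so that only hosts ending
# exactly in one of the four domains are rejected.
TIGHTER = re.compile(r"^(?!(.*\.|)(?:abc|google)\.(?:net|org)$).*")

hosts = ["abc.org", "www.google.net", "yahoo.com", "test.example.org"]
inverted_hits = [h for h in hosts if INVERTED.match(h)]

print(inverted_hits)
print(INVERTED.match("abc.organic.com"))  # rejected by the looser pattern
print(TIGHTER.match("abc.organic.com"))   # accepted by the tighter one
```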
The regex you wrote
(.*\.|)(abc|google)\.(org|net)
matches any string that is one of abc.org, google.org, abc.net or google.net, with an optional prefix that ends with a dot (.),
such as test.google.org, sub.abc.net, ...
I think you want to match strings like test.yahoo.com but not test.google.org. If you can use negative lookahead, this is the answer:
^(.*\.|)(?!(abc|google)\.(org|net))\w+\.\w+$
Explanation:
^ and $ make sure the match covers the entire URL string.
The negative lookahead checks that the URL is not something like abc.org, abc.net, google.org or google.net.
\w+\.\w+ checks that the remaining string looks like a domain (something like yahoo.com, etc.).
I'm going to assume you have lookaheads; if so, you can simply use:
(^.*?\.(?!(abc|google))\w+\.(?:org|net)$)
Demo - https://regex101.com/r/5eC41R/3
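What this does:
Looks for the start of the URL (up to the first .)
Checks that the next part is not abc or google
Looks for the next section (up to the next .)
Looks for a closing org or net
Note that the lookahead adds some extra work for the regex engine, so this may be slower than a pattern without one. Also, as written, it only matches hosts that end in .org or .net and have at least one label before the domain name.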

Regex: start with something and ends with whatever except something

I need a regex for the URL Rewrite module to validate URLs as follows:
1) spa/ - match
2) spa/some/url - match
3) spa/some-url - match
4) spa/some.js - no match
5) spa/some.css - no match
So, it should match, if url
a) starts with "spa"
b) ends with whatever except ".js" or ".css"
What I tried to test is ^(spa/)((?!.js)|(?!.css))$
but it's not working.
Thank you and sorry if it's duplicated.
Try this regex:
^spa\/((.+)\/)*.*(?<!\.js|\.css)$
with g and m flags set.
Please note that this regex allows several characters that urls are not supposed to have. I have tried to keep it simple. So, you might want to tune it a bit before using it.
You need negative-lookbehind for this.
Try this (you may need to modify it slightly)
^spa.*(?<!(\.js|\.css))$
^spa : string beginning with spa
.* : followed by any character(s)
(?<!(\.js|\.css))$ : not ending with .js or .css
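Not every engine accepts a lookbehind whose alternatives have different lengths; Python's re module, for example, rejects (?<!(\.js|\.css)) because it requires fixed-width lookbehinds. A minimal sketch of one workaround, assuming such an engine, is to chain two fixed-width lookbehinds. The sample URLs are the five from the question; PATTERN and samples are illustrative names.

```python
import re

# Two fixed-width lookbehinds stand in for (?<!(\.js|\.css)).
PATTERN = re.compile(r"^spa.*(?<!\.js)(?<!\.css)$")

# Expected outcome for each URL, as listed in the question.
samples = {
    "spa/": True,
    "spa/some/url": True,
    "spa/some-url": True,
    "spa/some.js": False,
    "spa/some.css": False,
}

results = {url: bool(PATTERN.search(url)) for url in samples}
print(results)
```

Whether the URL rewrite module in question needs this workaround depends on its regex engine, which the thread does not specify.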

Django Url pattern regex for tokens

I need to pass tokens like b'//x0eaa#abc.com//x00//xf0//x7f//xff//xff//xfd//x00' in my Django URL pattern. I am not able to find a matching regex, which results in a Page not found error.
My URL will look like /api/users/0/"b'//x0eaa#abc.com//x00//xf0//x7f//xff//xff//xfd//x00'"/
I have tried the following regex:
url(r'^api/users/(?P<username>[\w\-]+)/(?P<paging_state>[\w.%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4})/$', views.getUserPagination),
Pass the token in the request header or body instead, and read it from there in your view.
Your URL has some static, predictable elements:
api/users/
/" before b
"/ at the end, after '
So I can see the URL being matched in either of the two ways below, with the regexes given accordingly:
1) api/users/(set of word characters, digits or hyphens)/"(any character except newline)"/
REGEX: ^api\/users\/([\w\d\-]+)\/"(.*)"\/$
URL: url(r'^api\/users\/([\w\d\-]+)\/"(.*)"\/$', views.getUserPagination),
2) api/users/(set of word characters, digits or hyphens)/"(the character b)'//(any number of word characters or digits)#(any number of word characters or digits).(any number of word characters or digits)(any number of word characters, digits or forward slashes)'"/
REGEX: ^api\/users\/([\w\d\-]+)\/"(b'\/\/[\w\d]*#[\w\d]*\.[\w\d]*[\/\w\d]*')"\/$
URL: url(r'''^api\/users\/([\w\d\-]+)\/"(b'\/\/[\w\d]*#[\w\d]*\.[\w\d]*[\/\w\d]*')"\/$''', views.getUserPagination),
You should be able to use either of the above two. There can be multiple ways to match the token part in your url. So unless it is a big security concern, you can do with the simplest approach as mentioned in point 1.
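The simplest pattern can be tried outside Django with plain re. This is only a sketch: it assumes the path reaches the URL resolver exactly as written in the question (no percent-encoding, nothing cut off at the #) and without the leading slash, which Django strips before matching. SIMPLE, token and path are illustrative names.

```python
import re

# Simplest pattern from the answer: anything between the double quotes.
SIMPLE = re.compile(r'^api\/users\/([\w\d\-]+)\/"(.*)"\/$')

# Token and path as given in the question, minus the leading slash.
token = "b'//x0eaa#abc.com//x00//xf0//x7f//xff//xff//xfd//x00'"
path = 'api/users/0/"' + token + '"/'

m = SIMPLE.match(path)
username, paging_state = m.groups()
print(username)
print(paging_state)
```

The two unnamed groups would be passed to the view as positional arguments, in this order.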

Matching URLs with other characters around

I need a regex pattern to match URLs in a complicated environment.
A URL would appear in this position:
[url=http://www.php.net/manual/en/function.preg-replace.php:32p0eixu]TEST[/url:32p0eixu]
(That's just a sample URL)
I need to match the URL up to the colon; the colon and the code after it should be ignored. There are so many URLs out there, and I'm not experienced enough to create a pattern that matches everything from http:// to :
As I said, everything else should be ignored, left away, except the URL which I need to store in a variable.
Could someone help me create such a pattern? My tries were matching the URL above, but when I put in more complicated URLs, they wouldn't match.
This is the pattern I've created. It works with simple URLs, but not with the complicated ones:
http(s)?://[A-Za-z0-9.,/_-]+
I'm not very good at regex, I'm still learning.
Thank you.
This regex should do it for you.
\[url=(.*?):[a-zA-Z0-9]*\]
Run against your test data:
[url=http://www.php.net/manual/en/function.preg-replace.php:32p0eixu]TEST[/url:32p0eixu]
This will return the URL in capture group 1.
Assuming PHP (since your test URL is for the PHP manual), you'd use this with preg_match like this:
$value = "[url=http://www.php.net/manual/en/function.preg-replace.php:32p0eixu]TEST[/url:32p0eixu]";
$pattern = "/\[url=(.*?):[a-zA-Z0-9]*\]/";
preg_match($pattern, $value, $matches);
echo $matches[1];
Output:
http://www.php.net/manual/en/function.preg-replace.php
This will also work against URLs which contain colons in them, such as:
http://www.php.net:8080/manual/en/function.preg-replace.php
http://www.php.net/manual/us:en/function.preg-replace.php
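For readers not using PHP, the same pattern can be exercised in Python. This is a sketch of the same idea, not part of the original answer; extract_url is a made-up helper name, and the second and third inputs wrap the two colon-containing URLs above in the same [url=...] markup as the sample.

```python
import re

# Same pattern as in the PHP example, minus the / delimiters.
PATTERN = re.compile(r"\[url=(.*?):[a-zA-Z0-9]*\]")

def extract_url(bbcode):
    """Return the URL from a [url=...:code] tag, or None if absent."""
    m = PATTERN.search(bbcode)
    return m.group(1) if m else None

plain = extract_url(
    "[url=http://www.php.net/manual/en/function.preg-replace.php:32p0eixu]"
    "TEST[/url:32p0eixu]"
)
with_port = extract_url(
    "[url=http://www.php.net:8080/manual/en/function.preg-replace.php:32p0eixu]"
    "TEST[/url:32p0eixu]"
)
with_colon = extract_url(
    "[url=http://www.php.net/manual/us:en/function.preg-replace.php:32p0eixu]"
    "TEST[/url:32p0eixu]"
)
print(plain, with_port, with_colon, sep="\n")
```

The lazy .*? keeps extending until a colon is followed only by letters or digits and then the closing ], which is why earlier colons in the URL are skipped.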
How about this:
^(http(s)?:\/\/)?[^]^(^)^ ]+
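The regex below will give you the URL part before the colon (it stops at the first colon after the scheme, so it will cut off URLs that contain a port or another colon):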
\[url=((http|https)?://)?[^\:]+

Regex to match anything after /

I basically don't have a clue about regex, but I need a regex statement that will recognise anything after the / in a URL.
I'm developing a site for someone, and a page's URL (a local URL, of course) is, say, (http://)localhost/sweettemptations/available-sweets. This page is filled with custom post types (it's a WordPress site) which have URLs of the form (http://)localhost/sweettemptations/sweets/sweet-name.
What I want to do is redirect the URL (http://)localhost/sweettemptations/sweets back to (http://)localhost/sweettemptations/available-sweets which is easy to do, but I also need to redirect any type of sweet back to (http://)localhost/sweettemptations/available-sweets. So say I need to redirect (http://)localhost/sweettemptations/sweets/* back to (http://)localhost/sweettemptations/available-sweets.
If anyone could help by telling me how to write a proper regex statement to match everything after sweets/ in the URL, it would be hugely appreciated.
To do what you ask, you need to use groups. In regular expressions, groups allow you to isolate parts of the whole match.
For example:
input string: aaaaaaaabbbbcccc
regex: a*(b*)
The parentheses mark a group; in this case it will be group 1, since it is the first in the pattern.
Note: group 0 is implicit and is the complete match.
So the matches in my above case will be:
group 0: aaaaaaaabbbb
group 1: bbbb
To achieve what you want with the sweets pattern above, you just need to put a group around the end.
Possible solution: /sweets/(.*)
The more precise you are with the pattern before the group, the less likely you are to get a false positive.
If what you really want is to match anything after the last /, you can take another approach:
Possible other solution: /([^/]*)$
The pattern above finds a / followed by a run of characters that are NOT another /, up to the end of the string, and keeps that run in group 1. The issue here is that you could match things that do not have sweets in the URL.
Note: if you do not mind the / at the beginning, just remove the ( and ) and you do not have to worry about groups.
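The group numbering described above can be checked with a few lines of Python. This is just an illustrative sketch; the sweet-name URL comes from the question, and the last-segment pattern is written with an end anchor ($), which is an assumption made here so that the search lands on the final / rather than the first one.

```python
import re

# Group 0 is the whole match, group 1 is the parenthesised part.
m = re.search(r"a*(b*)", "aaaaaaaabbbbcccc")
whole, first = m.group(0), m.group(1)

url = "http://localhost/sweettemptations/sweets/sweet-name"

# Everything after sweets/ ends up in group 1.
after_sweets = re.search(r"/sweets/(.*)", url).group(1)

# Everything after the last / (note the $ anchor).
last_segment = re.search(r"/([^/]*)$", url).group(1)

print(whole, first, after_sweets, last_segment)
```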
I like to use http://regexpal.com/ to test my regex.. It will mark in different colors the different matches.
Hope this helps.
I may have misunderstood your requirement in my original post.
If you just want to change any string that matches
(http://)localhost/sweettemptations/sweets/*
into the other one you provided (without adding the part matched by your * at the end), I would use a regular expression to match the pattern in the URL but then just blindly replace the whole string with the desired one:
(http://)localhost/sweettemptations/available-sweets
So if you want the URL:
http://localhost/sweettemptations/sweets/somethingmore.html
to turn into:
http://localhost/sweettemptations/available-sweets
and not into:
localhost/sweettemptations/available-sweets/somethingmore.html
Then the solution is simpler, no groups required :).
When doing this, I would make sure the pattern does not hard-code the "localhost" part. Also, I am assuming (http://) really means an optional http:// in front, as (http://) is not a valid protocol prefix.
So if that is what you want, this should match the pattern:
(http://)?[^/]+/sweettemptations/sweets/.*
This regular expression optionally matches the http:// part, followed by any host (be it localhost, an IP or a host name). You could omit the .* at the end if you want.
If that pattern matches, just replace the whole URL with the one you want to redirect to.
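The match-then-replace idea can be sketched in Python as follows. On a real WordPress site the redirect would be configured elsewhere (for example in the web server or a plugin), so this only illustrates the matching logic; redirect_target and TARGET are hypothetical names.

```python
import re

# Pattern from the answer: optional http://, any host, then the sweets path.
SWEETS = re.compile(r"(http://)?[^/]+/sweettemptations/sweets/.*")

# Redirect destination taken from the question.
TARGET = "http://localhost/sweettemptations/available-sweets"

def redirect_target(url):
    """Return TARGET for any sweets/... URL, otherwise the URL unchanged."""
    return TARGET if SWEETS.fullmatch(url) else url

print(redirect_target("http://localhost/sweettemptations/sweets/somethingmore.html"))
print(redirect_target("http://localhost/sweettemptations/available-sweets"))
print(redirect_target("http://localhost/sweettemptations/sweets"))
```

Because the pattern requires a / after sweets, the bare .../sweets URL is returned unchanged here and would need its own rule.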
Use this regular expression, which matches everything after the ://: (?<=://).+