Firebase Dynamic Link Url patterns - regex

I need to add URL patterns for a domain that has multiple subdomain levels in the URL.
For example:
https://demo.site1.mybrand.company/ Where .company is the top-level domain and mybrand is the domain.
The problem is that the demo subdomain can change based on the environment in the app: it could be demo, test, or anything else. I would like to make sure that any subdomain of site1.mybrand.company can access the Dynamic Links API and generate links for that URL.
What I have tried:
The Firebase docs say that patterns like these are too permissive, and I am not sure whether Firebase supports multi-level domains such as this.
^https://.*.company/.*$
^https://.*.mybrand.company/.*$
^https://.*.site1.mybrand.company/.*$
Has anyone run into this situation before, or does anyone know whether this particular scenario is supported?
References:
https://support.google.com/firebase/answer/9021429
https://github.com/google/re2/wiki/Syntax

You could use a more specific pattern that matches either demo or test with an alternation, and extend it to all the allowed names.
^https://(?:demo|test)\.site1\.mybrand\.company/\S*$
The pattern matches:
^ Start of string
https:// Match literally
(?:demo|test) Match either demo or test
\.site1\.mybrand\.company/ Match .site1.mybrand.company/ (note that the dots are escaped)
\S* Match zero or more non-whitespace characters
$ End of string
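To sanity-check the pattern before pasting it into the Firebase console, a minimal sketch like the following may help. It uses Python's re module rather than RE2, which is an assumption of convenience: the constructs in this pattern (non-capturing group, \S, anchors) are ones both engines support. The helper name is_allowed and the sample URLs are hypothetical.

```python
import re

# Pattern from the answer above; "demo" and "test" are the allowed
# environment names and can be extended with further alternatives.
ALLOWED = re.compile(r"^https://(?:demo|test)\.site1\.mybrand\.company/\S*$")


def is_allowed(url: str) -> bool:
    """Return True if the URL matches the allow-list pattern."""
    return ALLOWED.match(url) is not None


# Hypothetical sample URLs used only for illustration.
samples = [
    "https://demo.site1.mybrand.company/",
    "https://test.site1.mybrand.company/path?x=1",
    "https://prod.site1.mybrand.company/",   # not in the alternation
    "https://demo.site1xmybrand.company/",   # escaped dot rejects this
]

for url in samples:
    print(url, is_allowed(url))
```

The last sample shows why the dots are escaped: with an unescaped dot, any character in that position would be accepted.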

Related

Regex: how can exclude all TLD except my own domain

I have an ASP.NET website on which I'm implementing some basic anti-spam stuff via the validation controls.
One such regex is: "^(?!.*(//|[.](com|net|info|uk|etc))).*$"
It pretty much does what it needs to as far as blocking goes — it doesn't need to be too sophisticated. However, I want to include the option to whitelist my own domain.
So, I want to block all .uk domains, except mydomain.co.uk.
This is a regex step beyond me — can anyone help?
You may nest a negative lookbehind inside the existing negative lookahead, placed just before uk, so that uk is not matched when it is preceded by mydomain.co.:
^(?!.*(//|[.](com|net|info|(?<!mydomain\.co\.)uk))).*$
Take note of (?<!mydomain\.co\.), which is a negative lookbehind that prevents uk from matching if it is preceded by mydomain.co..
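The behaviour of the nested lookbehind can be checked with a short sketch. This uses Python's re module purely as an illustration (the question is about ASP.NET validators, whose regex engine also supports lookbehind); the function name passes_filter and the sample inputs are hypothetical.

```python
import re

# Pattern from the answer above: reject input containing "//" or one of
# the listed TLDs, except ".uk" when preceded by "mydomain.co.".
FILTER = re.compile(r"^(?!.*(//|[.](com|net|info|(?<!mydomain\.co\.)uk))).*$")


def passes_filter(text: str) -> bool:
    """Return True if the text is accepted (i.e. not flagged as spam)."""
    return FILTER.match(text) is not None


# Hypothetical inputs for illustration.
for text in [
    "see mydomain.co.uk for details",  # whitelisted domain
    "see spam.co.uk for details",      # other .uk domain
    "buy now at spam.com",             # blocked TLD
    "just a plain comment",            # nothing suspicious
]:
    print(repr(text), passes_filter(text))
```

One caveat worth knowing: because the lookbehind only inspects the characters immediately before uk, a host such as notmydomain.co.uk would also pass this filter.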

Regex for both website url versions with wildcard

I'm trying to add allowed URLs to a WatchGuard Firebox WebBlocker list using regular expressions. I'm trying to keep my list short by having one entry apply to both the www and non-www versions of a site, including subdomains. I'm currently using the following:
(www\.)?ups\.com/*
This works great for both versions plus subdomains, but it also lets through other sites whose domain ends with ups.com, such as jobs-ups.com.
How can I make the regular expression require that, when there is no subdomain, the URL is exactly ups.com with no other letters before the u, so that it blocks sites like jobs-ups.com?
You can use the caret ^ to accomplish this:
^(?:www\.)?ups\.com\/
The caret forces the check at the start of the string. This means it will not match mid-string, which is what you want.
I'm not familiar with Firebox at all, but generally you should escape your periods and forward slashes, and you would generally use a non-capturing group as well. If it only accepts simple regex, you can still keep your original formatting:
^(www.)?ups.com/*
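The effect of the caret can be compared side by side in a small sketch. It assumes, as the question's pattern suggests, that the string being matched is host plus path without a scheme; that is an assumption about Firebox, not something the answer confirms. The third pattern, with_subdomains, is a hypothetical variant that is not part of the answer, and the host tracking.ups.com is only an illustrative name.

```python
import re

unanchored = re.compile(r"(www\.)?ups\.com/*")   # pattern from the question
anchored = re.compile(r"^(?:www\.)?ups\.com/")   # pattern from the answer
# Hypothetical variant: allow any number of dot-separated subdomain labels.
with_subdomains = re.compile(r"^(?:[\w-]+\.)*ups\.com/")

samples = ["ups.com/", "www.ups.com/", "jobs-ups.com/", "tracking.ups.com/"]

for s in samples:
    print(
        s,
        bool(unanchored.search(s)),
        bool(anchored.search(s)),
        bool(with_subdomains.search(s)),
    )
```

In this sketch the anchored pattern rejects jobs-ups.com/ but also rejects subdomains other than www, so if other subdomains must still be allowed, something along the lines of the hypothetical third pattern may be needed.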

Regex on domain and negation against language subfolders

Let's say my domains are:
www.test.com
www.test.com/en-gb
www.test.com/cn-cn
These are language sites, the first is the main US English site. In Google Analytics I want to set up a filter to only show me traffic of the first (US) domain. I could do this, I think:
^\/(en-gb|cn-cn).*$
If I EXCLUDE my Request URI with that filter pattern, then I should get a view for the en-US domain. However, I'm interested in understanding regex better so here is some test data and code which I am trying out on http://www.regextester.com/
Regular expression:
^\/(en-gb|cn-cn).*$
Test String
/cn-cn/about
/cn-cn/about/
/cn-cn
/cn-cn/about/test
/en-gb/
/en-gb
/en-gb-test/
/en-gb/aboutus/
/en-gb?q=1
/en-gb/?q=1
/about-us
/test?q=1
/aword/me/
/three
/about/en-gb/
/about/en-gb-test/
/test-yes/
/test/me/
/hello/world/
My questions:
If you try this out, you'll notice that /en-gb-test/ is actually matched with the Regex. How do I avoid this?
Also, let's say I wanted to have a rule to NEGATE this whole option. So rather than telling Google Analytics to "exclude", I am curious how I could write the opposite of this same rule. So basically, catch all URLs that are not in /en-gb and /cn-cn sub-folders.
Thanks in advance!
You can stop the regex from matching en-gb-test by requiring that the folder name is followed by /, ?, or the end of the string:
^\/(en-gb|cn-cn)([\/?]|$)
If you really need to match the rest of the string, add .* after [\/?]: ^\/(en-gb|cn-cn)([\/?].*|$).
Details:
^ - start of string
\/ - a / (note that you do not need to escape / in GA regex)
(en-gb|cn-cn) - a capturing group with 2 alternatives, either en-gb or cn-cn
([\/?]|$) - a capturing group with two alternatives: a ? or / OR the end of the string.
As for negating the rule: RE2 does not support lookaheads, which are what you would need to match everything except a given pattern. In an engine that supports them it would look like ^(?!\/(en-gb|cn-cn)([\/?]|$)).*, but that is not possible with RE2.
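The pattern can be tried against a few of the test strings from the question with a sketch like this one. It uses Python's re module as a stand-in for the GA engine, which is an assumption; the pattern itself uses no constructs that RE2 lacks. The negation is done in code here rather than with a lookahead, and the function name is hypothetical.

```python
import re

# Pattern from the answer above (the slash needs no escaping in Python).
LANG_FOLDER = re.compile(r"^/(en-gb|cn-cn)([/?]|$)")


def is_language_subfolder(uri: str) -> bool:
    """Return True if the request URI is under /en-gb or /cn-cn."""
    return LANG_FOLDER.search(uri) is not None


# A subset of the test strings from the question.
uris = [
    "/cn-cn/about",
    "/cn-cn",
    "/en-gb/",
    "/en-gb?q=1",
    "/en-gb-test/",
    "/about/en-gb/",
    "/about-us",
]

# "Negating" the rule without a lookahead: filter in code instead.
us_only = [u for u in uris if not is_language_subfolder(u)]
print(us_only)
```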

Regular expression to match a domain

I want a regular expression for Google Analytics that matches a whole domain, including its subdomains.
Say we have to match a domain name called xyz.com.
I want to match every URL that has xyz.com in it.
Example
abcd.xyz.com, abc1232.xyz, www.xyz.com, www.xyz.com/abc
Can anyone help me with that?
My purpose is to exclude the traffic coming from these sites from my Google Analytics reports.
In general, the regular expression to match those domains would be something like .*\.xyz\.com$. The backslashes escape the dots (which are normally wildcard characters), and the dollar sign represents the end of the string.
There are different regex implementations, so you might have to tweak this for your regex engine.
To exclude the domain and its subdomains as described above, you can use a GA filter ([Exclude] [Hostname] [Matching RegEx]) with the regular expression (xyz\.com)|(.*\.xyz\.com).
This regex covers both the main domain and its subdomains.
You could try this regex:
(.*\.)?xyz\.com
This matches all your required formats for the URL.
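A quick way to see what this pattern accepts is the sketch below, written with Python's re module as an illustration rather than the GA engine. It assumes the value being tested is a bare hostname; the function name and the hostnames other than those in the question are hypothetical.

```python
import re

HOST = re.compile(r"(.*\.)?xyz\.com")  # pattern from the answer above


def is_xyz_host(hostname: str) -> bool:
    """Return True if the whole hostname is xyz.com or a subdomain of it."""
    return HOST.fullmatch(hostname) is not None


for host in ["abcd.xyz.com", "www.xyz.com", "xyz.com", "notxyz.com"]:
    print(host, is_xyz_host(host))

# With partial (unanchored) matching, the same pattern also finds a hit
# inside an unrelated host, because the optional group can be skipped.
print(bool(HOST.search("notxyz.com")))
```

If the engine you are using matches partially, adding ^ and $ around the pattern gives the same effect as fullmatch does in this sketch.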

Regex for excluding URL

I am working with an email company that has a feature where they spider your site in order to provide custom content. I have the ability to have the spider ignore URLs based on the regex patterns I provide.
For this system a pattern starts and ends with a "/".
What I'm trying to do is ignore http://www.website.com/2011/10 BUT allow http://www.website.com/2011/10/title-of-page.html
I would have thought the pattern below would work since it does not have a trailing slash but no luck.
Any ideas?
/http:\/\/www\.website\.com\/[0-9][0-9][0-9][0-9]\/[0-9][0-9]/
Your regex matches a part of the URL, so you need to tell it not to allow a slash to follow it:
/http:\/\/www\.website\.com\/[0-9]{4}\/[0-9][0-9](?!\/)/
If you want to also avoid other partial matches like in http://www.website.com/2011/100, then an additional word boundary might help:
/http:\/\/www\.website\.com\/[0-9]{4}\/[0-9][0-9]\b(?!\/)/
It depends on the regexp engine, but you can probably either use $ (if the URL is tokenised beforehand) or match on whitespace and delimiters.
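The lookahead version can be exercised with a sketch like the following. It uses Python's re module, which supports lookaheads; whether the spider's engine does is unknown, so treat this as an illustration only. The surrounding / delimiters from the spider's syntax are dropped, and the function name should_ignore is hypothetical.

```python
import re

# Pattern from the answer above, without the spider's "/" delimiters.
IGNORE = re.compile(r"http://www\.website\.com/[0-9]{4}/[0-9][0-9]\b(?!/)")


def should_ignore(url: str) -> bool:
    """Return True if the spider should skip this URL."""
    return IGNORE.search(url) is not None


for url in [
    "http://www.website.com/2011/10",                     # archive page
    "http://www.website.com/2011/10/title-of-page.html",  # article
    "http://www.website.com/2011/100",                    # partial match
]:
    print(url, should_ignore(url))
```

Here \b rejects the /2011/100 case and (?!/) rejects anything that continues with another path segment.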