Regex to search group name after Active Directory canonical path

I'm new to regex and not a coder. I am using a group filter in the OKTA SSO application that uses regex for filtering. The filter below works for groups without the full canonical path, but it doesn't find groups that are given in canonical-name format. I want to match only the group name after the path.
Example: "(?i)^aws_\S+_(?{{role}}[\w-]+)_(?{{accountid}}\d+)$"
will find this: "AWS_Alias_AdministratorAccess_000000"
but it will not find this: "llc.domainname.loc/IT/Security Groups/AWS_Alias_AdministratorAccess_12345678"
OKTA Documentation:
https://saml-doc.okta.com/SAML_Docs/How-to-Configure-SAML-2.0-for-Amazon-Web-Service.html#setup-step3

If you mean that the group name must be preceded by one or more characters followed by a / character, you can use
(?i)^.+/aws_\S+_(?{{role}}[\w-]+)_(?{{accountid}}\d+)$
Here,
.+ - one or more characters (any except line breaks), as many as possible
/ - a literal / character.
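To see how the added .+/ prefix behaves, here is a minimal Python sketch. Python's re module does not understand Okta's (?{{role}}...) placeholders, so standard named groups stand in for them. That substitution, the helper name parse_group, and the optional-prefix variant (?:.+/)? are assumptions made for illustration, not Okta syntax.

```python
import re

# The answer's pattern, with Python named groups standing in for Okta's
# (?{{role}}...) and (?{{accountid}}...) placeholders (an assumption).
WITH_PATH = re.compile(r"(?i)^.+/aws_\S+_(?P<role>[\w-]+)_(?P<accountid>\d+)$")

# Hypothetical variant: the path prefix is optional, so bare names match too.
OPTIONAL_PATH = re.compile(
    r"(?i)^(?:.+/)?aws_\S+_(?P<role>[\w-]+)_(?P<accountid>\d+)$"
)


def parse_group(pattern, name):
    """Return (role, accountid) if the group name matches, else None."""
    m = pattern.match(name)
    return (m.group("role"), m.group("accountid")) if m else None


bare = "AWS_Alias_AdministratorAccess_000000"
canonical = (
    "llc.domainname.loc/IT/Security Groups/AWS_Alias_AdministratorAccess_12345678"
)

print(parse_group(WITH_PATH, canonical))  # path prefix present
print(parse_group(WITH_PATH, bare))       # no "/" in the name
print(parse_group(OPTIONAL_PATH, bare))   # prefix made optional
```

Because the pattern with .+/ requires at least one character and a slash before the group name, bare names stop matching; making the prefix optional is one way to accept both forms, if the filter needs that.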

Related

How can I extract all characters between the first / and second / using REGEXP_EXTRACT in Google Data Studio?

I am trying to use REGEXP_EXTRACT in Google Data Studio for extracting a part of the URL.
Input:
URLs
/media/news/royals/meghan-markle-prince-harry-archie-new-photo
/marketplace/deals/best-selling-orthotic-friendly-sneakers/
Output:
URLs
media
marketplace
How can I draft an expression that will allow me to extract it?
You can use a regex with a capture group that matches the start of the string, one slash, a run of non-slash characters (the captured part), and then another slash. In Python, the regex below works. You can test your regex at regex101.com.
import re

strings = ['/media/news/royals/meghan-markle-prince-harry-archie-new-photo', '/marketplace/deals/best-selling-orthotic-friendly-sneakers/']
for s in strings:
    # Replace the whole string with the text between the first two slashes
    good_part = re.sub(r'\A/([^/]*)/.*', r'\1', s)
    print(good_part)
prints:
media
marketplace
You can achieve this with the following expression: ^/([^/]+).
It matches a string that starts (^) with /, and captures 1 or more characters that are not a / after that (([^/]+)).
Example:
WITH URLS AS (
SELECT '/media/news/royals/meghan-markle-prince-harry-archie-new-photo' url
UNION ALL
SELECT '/marketplace/deals/best-selling-orthotic-friendly-sneakers/' url
)
SELECT url, REGEXP_EXTRACT(url, '^/([^/]+)') path
FROM URLS
See https://support.google.com/datastudio/answer/7050487?hl=en
It can be achieved using the REGEXP_EXTRACT Calculated Field below which extracts all characters between the first / and the next / (if there is no second /, all characters will be captured till the end of the string):
REGEXP_EXTRACT(URLs, "^/([^/]+)")
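As a rough check of that behaviour outside Data Studio, the sketch below applies the same pattern with Python's re module. The helper name first_segment is hypothetical, and it is assumed that Python handles this simple pattern the same way REGEXP_EXTRACT does.

```python
import re


def first_segment(url):
    """Return the text between the first / and the next / (or end of string)."""
    m = re.match(r"^/([^/]+)", url)
    return m.group(1) if m else None


print(first_segment("/media/news/royals/meghan-markle-prince-harry-archie-new-photo"))
print(first_segment("/marketplace/deals/best-selling-orthotic-friendly-sneakers/"))
print(first_segment("/media"))  # no second slash: captured up to the end
```

By contrast, the re.sub pattern from the Python answer requires a second slash, so it would leave an input such as /media unchanged.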

Required regex to match word example - "active users"

Could you please help me create a regex for the scenario below?
I'm trying to create a regex that matches the file name active users, wildcard-style.
I tried the basic one, but it doesn't work: (?i)^active users$
Examples:
The regex should match the words (active users) case-insensitively:
AcTiVE Users
active users
ACTIVE USERS
The regex should also match the words (active users) when other characters are present as a prefix or suffix:
Active users_test
Test_ACTIVE USERS
Instead of using the anchors, you can use lookarounds that disallow a word character other than an underscore (that is, a letter or digit) immediately to the left and to the right of the match.
(?i)(?<![^\W_])active users(?![^\W_])
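A minimal Python sketch of how this pattern treats the examples from the question. The last two names in the list are hypothetical counter-examples added here to show what the lookarounds reject.

```python
import re

# Lookarounds reject a letter or digit next to the phrase but allow "_",
# other punctuation, or the start/end of the string.
PATTERN = re.compile(r"(?i)(?<![^\W_])active users(?![^\W_])")

names = [
    "AcTiVE Users",
    "active users",
    "ACTIVE USERS",
    "Active users_test",
    "Test_ACTIVE USERS",
    "inactive users",  # hypothetical: letter directly before the phrase
    "active users2",   # hypothetical: digit directly after the phrase
]

matched = [n for n in names if PATTERN.search(n)]
print(matched)
```

The five names from the question end up in matched, while the two counter-examples do not.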

RegEx to filter E-Mail Addresses from URLs in Google Analytics

I want to use a Google Analytics filter to remove email addresses from incoming URIs. I am using the custom advanced filter, filtering field A on a RegEx for the Request URI and replacing the respective part later. However, my RegEx does not seem to work correctly. It should find email addresses, not only if an '#' is used, but also if '(at)', '%40', or '$0040' are used to represent the '#'.
My latest RegEx version (see below) still allows '$0040' to go through undetected. Can someone advise me what to change?
^(.*)=([A-Z0-9._%+-]+[#|[\(at\)]|[\$0040]|[\%40]][A-Z0-9.-]+\.[A-Z]{2,4})(.*)$
I suggest using
([A-Za-z0-9._%+-]+(#|\(at\)|[$]0040|\%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,4})
If you need to match the whole string, you may keep that pattern enclosed with your ^(.*) and (.*)$.
Details
([A-Za-z0-9._%+-]+(#|\(at\)|[$]0040|\%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,4}) - Group 1, capturing:
[A-Za-z0-9._%+-]+ - 1 or more ASCII letters/digits, ., _, %, +, or -
(#|\(at\)|[$]0040|\%40) - one of the alternatives: #, (at), $0040 or %40
[A-Za-z0-9.-]+ - 1 or more ASCII letters/digits, . or -
\. - a dot
[A-Za-z]{2,4} - 2 to 4 ASCII letters.
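The original attempt placed the alternatives inside square brackets, where something like [\$0040] matches a single character out of $, 0 and 4 rather than the sequence $0040. The parenthesized group with | treats each alternative as a whole sequence. Below is a minimal Python sketch of the suggested pattern applied as a replacement; the sample URIs, the redact helper and the placeholder text are hypothetical, and Google Analytics filter fields are not modelled here.

```python
import re

# The suggested pattern; "#" is the separator as written in the question.
EMAIL = re.compile(
    r"([A-Za-z0-9._%+-]+(#|\(at\)|[$]0040|\%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,4})"
)


def redact(uri, placeholder="[redacted]"):
    """Replace every address-like substring in the URI with a placeholder."""
    return EMAIL.sub(placeholder, uri)


# Hypothetical request URIs, one per separator variant.
uris = [
    "/signup?email=jane.doe#example.com&step=2",
    "/signup?email=jane.doe(at)example.com&step=2",
    "/signup?email=jane.doe$0040example.com&step=2",
    "/signup?email=jane.doe%40example.com&step=2",
]

for uri in uris:
    print(redact(uri))
```

All four variants, including the $0040 form that slipped through the original expression, are reduced to /signup?email=[redacted]&step=2.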

Google analytics filter RegEXp assistance

I have a filter that only shows me data with /documents/ in the URL. I need to modify the filter so I can get both /documents/ and /getattachment/.
For the Filter Pattern, could I use "/documents/|/getattachment/", assuming the pipe is an OR?
When you have repeating patterns, you may consider shortening the final pattern.
I'd recommend using a grouping construct with alternation:
/(documents|getattachment)/
Now, the pattern will mean:
/ - a slash
(documents|getattachment) - either one word or the other
/ - a slash
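A small Python sketch comparing the grouped pattern with the spelled-out alternation from the question. The sample paths are hypothetical, and Python's re module is assumed to treat these simple patterns the same way the Google Analytics filter does.

```python
import re

GROUPED = re.compile(r"/(documents|getattachment)/")
SPELLED_OUT = re.compile(r"/documents/|/getattachment/")

# Hypothetical request paths.
paths = [
    "/site/documents/report.pdf",
    "/getattachment/1234/file.docx",
    "/site/docs/report.pdf",
]

# For each path, record whether each pattern finds a match anywhere in it.
results = [(bool(GROUPED.search(p)), bool(SPELLED_OUT.search(p))) for p in paths]

for path, (grouped, spelled_out) in zip(paths, results):
    print(path, grouped, spelled_out)
```

Both patterns agree on every path here: the first two match and the third does not, so the grouped form is only a shorter way of writing the same alternation.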

Regex for Page Filtering in Google Analytics

I'm trying to use GA to filter out certain URL pages. I need to distinguish between pages like this:
www.example.com/hotel/hotelfoofoo
and this:
www.example.com/hotel/hotelfoofoo/various-options-go-here?lots-of-other-stuff-follows
I'm new to regex, so I know very little, but am basically trying to capture URL pages that begin with /hotel/ but do not include any other forward slashes. Is there a way to write that code?
Two possible solutions:
1) Assuming only alphanumeric characters and '-' are allowed in the hotel name:
/hotel/([-\w]+)(?![-\/\w])
Note: the hotel name is captured in the first group. The idea is to capture a run of digits, letters, underscores, and - symbols that is not followed by a slash.
2) Assuming a whitespace character is required to mark the end of the URL:
/hotel/([^\s/]+)(?=\s)
Note: depending on your regex flavor, some characters may need to be escaped. For JS, every "/" should be escaped, e.g. "\/".
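The two options can be tried against the URLs from the question with a short Python sketch. The helper name hotel_name is hypothetical, a trailing space is appended by hand to exercise option 2, and the sketch only checks the patterns under Python's re module; whether a given Google Analytics filter field accepts lookaheads is not verified here.

```python
import re

# Option 1: name made of letters, digits, "_" or "-", not followed by more
# of those characters or by a slash.
NO_SUBPATH = re.compile(r"/hotel/([-\w]+)(?![-\/\w])")
# Option 2: name must be followed by a whitespace character.
BEFORE_SPACE = re.compile(r"/hotel/([^\s/]+)(?=\s)")


def hotel_name(pattern, text):
    """Return the captured hotel name, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


short_url = "www.example.com/hotel/hotelfoofoo"
long_url = (
    "www.example.com/hotel/hotelfoofoo/"
    "various-options-go-here?lots-of-other-stuff-follows"
)

print(hotel_name(NO_SUBPATH, short_url))
print(hotel_name(NO_SUBPATH, long_url))
print(hotel_name(BEFORE_SPACE, short_url + " "))
print(hotel_name(BEFORE_SPACE, long_url + " "))
```

In both options only the short URL yields hotelfoofoo; the longer URL with an extra slash gives no match.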