I want to set up a search-and-replace filter on the URI in my Google Analytics account to remove some of the "/"-separated values from the URI.
This is a sample list of URIs:
/clients/1282/buildings/4490
/clients/1362/buildings/8915/systems
/clients/1362/buildings/8915/systems/manage-rules/configure-rules
/clients/1282/buildings/4490/insights/rule-templates
/clients/1167/buildings/4126/insights/4126.10100.PG1-Program_Data
This is the regex I apply at the moment, but it captures only the first instance and doesn't work for the last URI (a mix of text and numbers):
(\/)\d+
The current result is:
/clients/1282/buildings/4490
/clients/1362/buildings/8915/systems
/clients/1362/buildings/8915/systems/manage-rules/configure-rules
/clients/1282/buildings/4490/insights/rule-templates
/clients/1167/buildings/4126/insights/4126.10100.PG1-Program_Data
Expected result:
/clients/buildings/
/clients/buildings/systems
/clients/buildings/systems/manage-rules/configure-rules
/clients/buildings/insights/rule-templates
/clients/buildings/insights/
Create one Search and Replace for the first group and another for the second group.
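As a rough illustration of the intended rewrite (not the Google Analytics filter itself), the sketch below drops every path segment that starts with a digit. It is plain JavaScript; the function name stripNumericSegments is made up for this example, the lookbehind assumes a recent JavaScript engine, and whether the same pattern is accepted by the GA filter engine has not been checked here.

```javascript
// Hypothetical helper: remove each path segment that begins with a digit,
// together with the slash that follows it (if there is one).
function stripNumericSegments(uri) {
  // (?<=\/)   the segment must start right after a slash
  // \d[^\/]*  a digit followed by the rest of the segment
  // (?:\/|$)  the trailing slash, or the end of the string
  return uri.replace(/(?<=\/)\d[^\/]*(?:\/|$)/g, "");
}

// Sample URIs taken from the question.
const sampleUris = [
  "/clients/1282/buildings/4490",
  "/clients/1362/buildings/8915/systems",
  "/clients/1362/buildings/8915/systems/manage-rules/configure-rules",
  "/clients/1282/buildings/4490/insights/rule-templates",
  "/clients/1167/buildings/4126/insights/4126.10100.PG1-Program_Data",
];

sampleUris.forEach((uri) => console.log(stripNumericSegments(uri)));
```

Anchoring on the preceding slash keeps digits inside a textual segment (such as a name ending in a number) untouched.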
Related
I'm performing regex extraction for parsing logs for our SIEM. I'm working with PCRE2.
In those logs, I have this problem: I have to extract a field that can be preceded by multiple options, and I want to use only one group name.
Let me be clearer with an example.
The SSH connection can appear in our log with this form:
UserType=SSH,
And I know that a simple regex expression to catch this is:
UserType=(?<app>.*?),
But, at the same time, SSH can appear with another "prefix":
ACCESS TYPE:SSH;
that can be captured with:
ACCESS\sTYPE:(?<app>.*?);
Now, because the logical field is the same (the SSH protocol) and I want to map it in every case to the group name "app", is there a way to OR the previous prefixes together and use the same group name?
The desired final result is something like:
(UserType=) OR (ACCESS TYPE:) <field_value_here>
You can use
(?:UserType=|ACCESS\sTYPE:)(?<app>[^,;]+)
Details:
(?:UserType=|ACCESS\sTYPE:) - either UserType= or ACCESS + whitespace + TYPE:
(?<app>[^,;]+) - Group "app": one or more chars other than , and ;.
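To see the single named group at work on both prefixes, here is a small JavaScript sketch. The question targets PCRE2, so this only illustrates the pattern's behaviour in an engine that shares the (?<name>...) syntax; the helper name extractApp is invented for the example.

```javascript
// Same pattern as above: a non-capturing alternation of the two prefixes,
// followed by one named group that stops at the first "," or ";".
const appPattern = /(?:UserType=|ACCESS\sTYPE:)(?<app>[^,;]+)/;

// Hypothetical helper: return the captured "app" value, or null if no prefix matched.
function extractApp(line) {
  const match = appPattern.exec(line);
  return match ? match.groups.app : null;
}

console.log(extractApp("UserType=SSH,"));    // SSH
console.log(extractApp("ACCESS TYPE:SSH;")); // SSH
console.log(extractApp("no prefix here"));   // null
```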
I'm looking to use CloudWatch Logs Insights to group logs by a request url field, however the url can contain 0-2 unique numerical identifiers that I'd like to be ignored when doing the grouping.
Some examples of urls:
/dev/user
/dev/user/123
/dev/user/123/inventory/4
/dev/server/3/statistics
The groups would look something like:
/dev/user
/dev/user/
/dev/user//inventory/
/dev/server//statistics
I have something quite close to what I need: it extracts the section of the url in front of the first optional identifier and the section between the first and second identifiers, and concatenates the two, but it isn't totally reliable. This is where I'm at currently; @message is valid JSON which contains an 'endpoint' field that looks like one of the urls above:
fields @message | parse endpoint /(\bdev)\/(?<@prefix>[^0-9]+)(?:[0-9]+)(?<@suffix>[^0-9]+)/ | stats count(*) by @prefix
While this query will work with endpoints like '/dev/accounts/1' it ignores endpoints like '/dev/accounts' as it doesn't have all of the components the regex is looking for, which means I'm missing a lot of results.
If there are 0-2 numerical identifiers that you want to remove, you could match the first number, optionally match the second, and use 2 capturing groups to capture the parts you want to keep.
In the replacement, use the 2 capturing groups: $1$2
^(.*?\/)\d+(?:(.*?\/)\d+\b)?
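The effect of this pattern with the $1$2 replacement can be checked with a short JavaScript sketch. This only demonstrates the regex; it is not CloudWatch Logs Insights syntax, and the function name normalizeEndpoint is made up for the example.

```javascript
// Group 1 keeps the text up to the first identifier; the optional part keeps
// the text between the first and second identifier in group 2.
const idPattern = /^(.*?\/)\d+(?:(.*?\/)\d+\b)?/;

// Hypothetical helper: drop up to two numeric identifiers from an endpoint.
// In JavaScript an unmatched optional group is substituted as an empty string.
function normalizeEndpoint(endpoint) {
  return endpoint.replace(idPattern, "$1$2");
}

// Sample urls taken from the question.
[
  "/dev/user",
  "/dev/user/123",
  "/dev/user/123/inventory/4",
  "/dev/server/3/statistics",
].forEach((endpoint) => console.log(normalizeEndpoint(endpoint)));
```

An endpoint without any identifier does not match at all, so it is returned unchanged rather than being dropped.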
Looks like I can use question marks outside of capture groups to mark those groups as optional, which has resolved the last issue I was having.
I am trying to capture multiple occurrences of utm tags in a URL and append them when rewriting the URL. However, I only want the utm keys and values and want to skip the others.
This is a sample URL
https://example.com/dl/?screen=page&title=SABC&page_id=4063&myvalue=Noidea&utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19&utm_term=termTest19&test=value&utm_content=contentTest19
I tried this:
(\?.*)(page_id=([^&]*))(\?|&)(.*[&?]utm_[a-z]+=([^&]+).*)
and unfortunately, it doesn't produce the result I expect.
I need to capture both the page ID and the utm tags. I do not want test=value or myvalue=Noidea; the only query-string parameters I want to keep are the utm tags.
Expected Result is the URL below:
https://example.com/dl/page_id/4063?utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19&utm_term=termTest19&utm_content=contentTest19
one group with page_id=<some number/text>
one group with all utm tags with key and value
Help will be appreciated.
You can use a regex like this to capture each wanted key in a group:
(?:(page_id|utm_[a-z]+)=[A-Za-z0-9]+)(?:^\&)?
You can instead replace any parameter that does not match the desired ones with the empty string. The pattern for this is
(?:[?&](?!(?:page_id|utm_[^=&]++)=)[^&]*+)++$|(?<=[?&])(?!(?:page_id|utm_[^=&]++)=)[^&]*+(?:&|$)
Here's a working proof: https://regex101.com/r/L5xcl4/2. It has an extra \s only so that it works on the multiline input in the tester; you shouldn't need it, as you'll be working on a string that contains only a URL without whitespace.
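A capture-based alternative to removing the unwanted parameters is to collect the wanted ones and rebuild the URL. The JavaScript sketch below is one possible way to do that, not the method of either answer: the function name rewriteUrl is invented, and it assumes the part before "?" ends with a slash, as in the sample URL.

```javascript
// Hypothetical helper: keep page_id (moved into the path) and all utm_* pairs.
function rewriteUrl(url) {
  const queryStart = url.indexOf("?");
  if (queryStart === -1) return url; // nothing to rewrite

  const base = url.slice(0, queryStart); // assumed to end with "/"
  const query = url.slice(queryStart + 1);

  let pageId = null;
  const utmPairs = [];
  // Each match is one key=value pair whose key is page_id or starts with utm_.
  for (const m of query.matchAll(/(?:^|&)(page_id|utm_[a-z]+)=([^&]*)/g)) {
    if (m[1] === "page_id") pageId = m[2];
    else utmPairs.push(`${m[1]}=${m[2]}`);
  }

  const path = pageId === null ? base : `${base}page_id/${pageId}`;
  return utmPairs.length > 0 ? `${path}?${utmPairs.join("&")}` : path;
}

// Sample URL taken from the question.
const sampleUrl =
  "https://example.com/dl/?screen=page&title=SABC&page_id=4063&myvalue=Noidea" +
  "&utm_source=sourceTest19&utm_medium=mediumTest19&utm_campaign=campaignTest19" +
  "&utm_term=termTest19&test=value&utm_content=contentTest19";

console.log(rewriteUrl(sampleUrl));
```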
1) www.mysite.site/product/brand?card_type=all
2) www.mysite.site/product/brand?card_type=all&cp=randomID&keyword=randomKeyword&network=randomNetwork&v3=sometype&v4=MM
So I have these 2 types of URLs being reported in my Analytics, corresponding to:
Traffic that reached that page organically
Traffic that reached that page via paid traffic
I basically need to find all the links that have a "&" followed by (cp|keyword|v1|v2|v3|v5) after the value for “card_type” and replace that part with “ppc-traffic”, so ideally I would have:
www.mysite.site/product/brand?card_type=all
www.mysite.site/product/brand/ppctraffic or just mysite.site/ppctraffic
What I attempted:
Search String
Request URI
^(https?:\/\/\S+\/[^?]*)(.*?)&(cp|keyword|v1|v2|v3|v5)
Replace String:
/ppctraffic
(I’ve also tried $1/ppctraffic and $2/ppctraffic)
When testing the regex online it seems to work, so I'm not sure what I'm doing wrong.
Any help is deeply appreciated.
One way is to capture in a group everything up to and including /brand by matching any character except a question mark with [^?]+, then match ?card_type=all& followed by one of the paid-traffic keys and any characters until the end of the string.
As your links do not start with https:// you could make that part optional (?:https?:\/\/)?.
^((?:https?:\/\/)?www\.[^?]+)\?card_type=all&(?:cp|keyword|v[1235]).*$
Then in the replacement use $1/ppctraffic
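// Group 1 captures everything before the query string; the rest only matches
// when card_type=all is followed by one of the paid-traffic parameters.
const pattern = /^((?:https?:\/\/)?www\.[^?]+)\?card_type=all&(?:cp|keyword|v[1235]).*$/;
[
  // Paid traffic (cp follows card_type): rewritten to .../brand/ppctraffic
  "www.mysite.site/product/brand?card_type=all&cp=randomID&v1=randomIDv2=productName&v3=sometype&v4=MM&fbclid=randomID",
  // Organic traffic: no match, left unchanged
  "www.mysite.site/product/brand?card_type=all",
  // Unlisted parameter (aa) right after card_type: no match, left unchanged
  "www.mysite.site/product/brand?card_type=all&aa=randomID&v1=randomIDv2=productName&v3=sometype&v4=MM&fbclid=randomID"
].forEach(s => console.log(s.replace(pattern, "$1/ppctraffic")));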
I have 2 landing pages:
/aa/index.php/aa/index/[sessionID]/alpha
/bb/index.php/bb/index/[sessionID]/bravo
Because the sessionID is unique, every visit to a landing page will be tracked as a different page. Therefore, I need a filter to remove the sessionID. This is what I want to track:
/aa/index.php/aa/index/alpha
/bb/index.php/bb/index/bravo
I created the Search and Replace Custom Filter on the Request URI:
Search String: /(aa|bb)/index\.php/(aa|bb)/index/(.*)
Replace String: /$1/index.php/$2/index/$3
But I get the literal /$1/index.php/$2/index/$3 reported on the dashboard the next day. So I tried /\1/index.php/\2/index/\3, but I got very strange results: //aa/index.php/aa/index/alpha/index.php/aa/index/aa.
Does anyone know how to reference the grouped patterns in the replace string?
My Solution:
I managed to solve it using an Advanced Filter:
Field A => Request URI: /(aa|bb)/index\.php/(aa|bb)/index/(.*)/(.*)
Field B => -
Output to => Request URI: /$A1/index.php/$A2/index/$A4
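The grouping in this Advanced Filter can be simulated outside Google Analytics. The JavaScript sketch below applies the same Field A pattern and maps $A1, $A2 and $A4 to JavaScript's $1, $2 and $4; it does not run the GA filter engine, and the function name removeSessionId and the session ID values are made up for the example.

```javascript
// Same expression as Field A. Group 3 is the sessionID, group 4 the page name.
const fieldA = new RegExp("/(aa|bb)/index\\.php/(aa|bb)/index/(.*)/(.*)");

// Hypothetical helper: rebuild the URI from groups 1, 2 and 4, skipping group 3.
function removeSessionId(requestUri) {
  return requestUri.replace(fieldA, "/$1/index.php/$2/index/$4");
}

// The session IDs below are invented placeholders.
console.log(removeSessionId("/aa/index.php/aa/index/abc123/alpha"));
console.log(removeSessionId("/bb/index.php/bb/index/xyz789/bravo"));
```

Because the first (.*) is greedy, it backtracks to the last slash, so the final group holds only the trailing page name.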
I haven't used the Google Analytics regex engine, but it appears to me that \1 is referencing the entire match (which in other regex implementations is called \0), while \2 is the first group, \3 is the second group, and so on.
Your initial regex, however, looks incomplete--I think it should look as follows:
Search String: /(aa|bb)/index\.php/(aa|bb)/index(/.*)/(alpha|bravo)
Replace String: /\2/index.php/\3/index/\5
(Note that I'm not sure whether ? is supported in this regex implementation as the non-greedy modifier, but if it is, the above search string pattern might run a little faster if you change /.* to /.*?.)