Replacing Full Referrer using regex in Google Data Studio

I'm using Google Data Studio to create a report analyzing specific referral sites. My data source is my site Google Analytics.
I want to replace the Full Referrer (e.g. of the format webaddress.com/page-name-one) with a text-only value (e.g. Page name one), so that it's clearer in my charts and tables which page is which.
I've used the formulae below in calculated fields, but none of them seem to change Full Referrer to what I need. Data Studio accepts them all as valid formulae.
I've anonymised my examples, but the principle is the same. I've tried:
REGEXP_REPLACE(Full Referrer,"[webaddress\\.com\\/page\\-name\\-one].*","Page name one")
REGEXP_REPLACE(Full Referrer, 'webaddress.com/page-name-one', 'Page name one')
REGEXP_REPLACE(Full Referrer, 'webaddress\\.com\\/page\\-name\\-one', 'Page name one')
REGEXP_REPLACE(Full Referrer, 'name', 'Page name one')
REGEXP_REPLACE(Full Referrer, 'page-name-one', 'Page name one')
REGEXP_REPLACE(Full Referrer, 'page\\-name\\-one', 'Page name one')

In testing this on one of my own GA data sources, I was able to achieve this using one of your patterns:
REGEXP_REPLACE(Full Referrer,'webaddress.com/page-name-one','Page name one')
It should be noted, however, that the . should be properly escaped (either by \ or wrapping it in a character class like [.]; see re2 syntax for details). Because you have to double-backslash, I also prefer to use something Data Studio borrowed from BigQuery (sort of an undocumented feature), which is the regular expression string type (r"" or r''). When using this, you only have to single-backslash (unless you want a literal backslash):
REGEXP_REPLACE(Full Referrer,r'webaddress\.com/page-name-one','Page name one')
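To see why the unescaped dot matters, here is a small sketch in Python. It uses Python's re module rather than Data Studio's RE2 engine, on the assumption that the dot behaves the same way in both for this simple case. The "lookalike" referrer value is made up for illustration.

```python
import re

unescaped = r'webaddress.com/page-name-one'   # "." matches any character
escaped = r'webaddress\.com/page-name-one'    # "\." matches a literal dot only

# Hypothetical referrer values for illustration.
real = 'webaddress.com/page-name-one'
lookalike = 'webaddressXcom/page-name-one'

unescaped_hits = [bool(re.search(unescaped, s)) for s in (real, lookalike)]
escaped_hits = [bool(re.search(escaped, s)) for s in (real, lookalike)]

print(unescaped_hits)  # [True, True]
print(escaped_hits)    # [True, False]
```

The unescaped pattern still matches the real value, which is why it appears to work, but it also accepts values it should not.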
Because you're using REGEXP_REPLACE, anything before or after your match string will still exist after the replacement. For a Full Referrer of "m.facebook.com/l", for example, REGEXP_REPLACE(Full Referrer,r'facebook\.com','FB') would return "m.FB/l".
So your pattern above will match the value anywhere in the string, which likely isn't what you want. To anchor it to the beginning, use the ^ (start of string) assertion:
REGEXP_REPLACE(Full Referrer,r'^webaddress\.com/page-name-one','Page name one')
If you want to only match that exact value of Full Referrer (i.e. not including any additional path levels), make sure to use the $ (end of string) assertion as well:
REGEXP_REPLACE(Full Referrer,r'^webaddress\.com/page-name-one$','Page name one')
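The effect of the anchors can be sketched in Python. This is only an analogy: regexp_replace below is a hypothetical helper built on re.sub, not the Data Studio function, and the referrer values with a subdomain or extra path level are invented for the example.

```python
import re

def regexp_replace(value, pattern, replacement):
    """Rough stand-in for REGEXP_REPLACE: swap every match, keep the rest."""
    return re.sub(pattern, replacement, value)

# Unanchored: text before and after the match survives.
fb = regexp_replace('m.facebook.com/l', r'facebook\.com', 'FB')

# Hypothetical referrer with a subdomain and an extra path level.
messy = 'www.webaddress.com/page-name-one/extra'
unanchored = regexp_replace(messy, r'webaddress\.com/page-name-one', 'Page name one')
start_only = regexp_replace(messy, r'^webaddress\.com/page-name-one', 'Page name one')

# Fully anchored: only the exact value is rewritten.
exact = regexp_replace('webaddress.com/page-name-one',
                       r'^webaddress\.com/page-name-one$', 'Page name one')
longer = regexp_replace('webaddress.com/page-name-one/extra',
                        r'^webaddress\.com/page-name-one$', 'Page name one')

print(fb)          # m.FB/l
print(unanchored)  # www.Page name one/extra
print(start_only)  # www.webaddress.com/page-name-one/extra (no match, unchanged)
print(exact)       # Page name one
print(longer)      # webaddress.com/page-name-one/extra (no match, unchanged)
```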
Keep in mind that if you're doing this in the data source as a calculated field, you aren't changing the original Full Referrer dimension; you're creating a new field based on it. In your charts and tables, use the calculated field (whatever you named it) in place of Full Referrer.
Often you're wanting to do this for a bunch of sites or pages, so you can use CASE and REGEXP_MATCH to handle all this logic in a single field:
CASE
WHEN REGEXP_MATCH(Full Referrer,r'^webaddress\.com/page-name-one$') THEN 'Page name one'
WHEN REGEXP_MATCH(Full Referrer,r'^site2\.com/example$') THEN 'S2 Example'
ELSE Full Referrer
END
These matches are done in order, so you can match a specific page or pages and still provide a different value for anything else on that domain. REGEXP_MATCH tests the whole value (REGEXP_CONTAINS is the partial-match variant), so the catch-all pattern ends in .* to cover the rest of the path:
CASE
WHEN REGEXP_MATCH(Full Referrer,r'^site\.com/$') THEN 'Site - Home'
WHEN REGEXP_MATCH(Full Referrer,r'^site\.com/about$') THEN 'Site - About'
WHEN REGEXP_MATCH(Full Referrer,r'^site\.com/.*') THEN 'Site - (other)'
ELSE Full Referrer
END
You can also use the ELSE if you want to bucket all of the unmatched values into an "other" grouping instead of just leaving the original value.
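The first-match-wins behaviour of CASE can be sketched in Python as an ordered list of rules. This is an illustration, not Data Studio code: label_referrer and RULES are hypothetical names, and re.fullmatch is used on the assumption that the rule must cover the whole value, which is why the catch-all rule here ends in .*.

```python
import re

# (pattern, label) pairs checked top to bottom, like WHEN clauses in a CASE.
RULES = [
    (r'^site\.com/$', 'Site - Home'),
    (r'^site\.com/about$', 'Site - About'),
    (r'^site\.com/.*', 'Site - (other)'),
]

def label_referrer(full_referrer, default=None):
    """Return the label of the first matching rule.

    With default=None the original value is kept (ELSE Full Referrer);
    pass a string such as 'Other' to bucket unmatched values instead.
    """
    for pattern, label in RULES:
        if re.fullmatch(pattern, full_referrer):
            return label
    return full_referrer if default is None else default

print(label_referrer('site.com/'))             # Site - Home
print(label_referrer('site.com/about'))        # Site - About
print(label_referrer('site.com/blog/post-1'))  # Site - (other)
print(label_referrer('example.org/x'))         # example.org/x
print(label_referrer('example.org/x', default='Other'))  # Other
```

Moving the catch-all rule to the top would shadow the two specific rules, which is the reason the order of the WHEN clauses matters.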
Another thing to remember is that due to shared fields in GA, things like Source (utm_source) also show up in Full Referrer, so you could be seeing values there that you wouldn't normally expect. Often you can get rid of these by also filtering to only the Default Channel Grouping of "Referral".
If your patterns still aren't matching, please update the question with some additional details such as what the output actually is, whether there's an error message, etc.—and also whether you're doing this as a calculated field in the data source or the "Create Field" button on a single chart.

Fail2Ban regex for Drupal log not matching

I am trying to match and ban certain patterns in my drupal logs (drupal 9).
I have taken the base drupal-auth regex, created a new conf and tried to amend it to my requirements but I seem to be failing at the first hurdle. This is the code that will give me anything that has the type 'user' and this is filtered by the user\ in the code below, just before the <HOST> block:
failregex = ^%(__prefix_line)s(https?:\/\/)([\da-z\.-]+)\.([a-z\.]{2,6})(\/[\w\.-]+)*\|\d{10}\|user\|<HOST>\|.+\|.+\|\d\|.*\|.+\.$
If I want to search exactly the same pattern, but with say 'page not found' or 'access denied' instead of 'user' what do I need? I cannot seem to get it to match the moment the type has a space in it. It seems such a simple thing to do!
I am using fail2ban-regex --print-all-matched to test.

RegEx for filtering in Azure using Terraform

The Terraform azurerm_image data source lets you use a RegEx to identify a machine image whose ID matches the regular expression.
What RegEx should be used to retrieve an image that includes the string MyImageName and that takes the complete form /subscriptions/abc-123-def-456-ghi-789-jkl/resourceGroups/MyResourceGroupName/providers/Microsoft.Compute/images/MyImageName1618954096 ?
The following version of the RegEx is throwing an error because it will not accept two * characters. However, when we only used the trailing *, the image was not retrieved.
data "azurerm_image" "search" {
name_regex = "*MyImageName*"
resource_group_name = var.resourceGroupName
}
Note that the results only return a single image so you do not need to worry about multiple images being returned. There is a flag that can be set to specify either ascending or descending sorting to retrieve the oldest or the newest match.
The precise error we are getting is:
Error: "name_regex": error parsing regexp: missing argument to repetition operator: `*`
Nick's Suggestion
Per @Nick's suggestion, we tried:
data "azurerm_image" "search" {
name_regex = "/MyImageName[^/]+$"
resource_group_name = var.resourceGroupName
}
But the result is:
Error: No Images were found for Resource Group "MyResourceGroupName"
We checked in the Azure Portal and there is an image that includes MyImageName in its name within the resource group named MyResourceGroupName. We also confirmed that Terraform is running as the subscription owner, so we imagine that the subscription owner has sufficient authorization to filter image names.
What else can we try?
In my testing, name_regex works when the pattern has only a trailing *. A leading * produces the error above, because * is a repetition operator and needs a preceding character or group to repeat.
For example, I have an image name rrr-image-20210421150018 in my resource group.
The following works:
r*
-*
8*
rrr*
image*
2021*
The following does not work:
*r
*-
*8
*image*
*rrr*
Also, verify if you have the latest azurerm provider.
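The lists above are consistent with name_regex being a regular expression rather than a shell-style glob. The sketch below illustrates this with Python's re module. The provider itself uses Go's regexp package, so this is only an analogy, and it assumes the pattern is applied as an unanchored search against the image name (not the full resource ID). try_pattern is a hypothetical helper.

```python
import re

image_name = 'rrr-image-20210421150018'  # image name from the answer above

def try_pattern(pattern, text):
    """Return True/False for an unanchored search, or 'invalid' if the
    pattern does not compile."""
    try:
        return re.search(pattern, text) is not None
    except re.error:
        return 'invalid'

patterns = ['rrr*', 'image*', '*image*', '*r', 'image', '^rrr-image-']
results = {p: try_pattern(p, image_name) for p in patterns}

for pattern, outcome in results.items():
    print(f'{pattern!r}: {outcome}')
```

Under that assumption, a glob like *MyImageName* would be written simply as MyImageName, since an unanchored regex already matches anywhere in the name. It would also explain why a pattern starting with / finds nothing: the image name itself contains no slash.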

How can I use regex to construct an API call in my Jekyll plugin?

I'm trying to write my own Jekyll plugin to construct an api query from a custom tag. I've gotten as far as creating the basic plugin and tag, but I've run into the limits of my programming skills so looking to you for help.
Here's my custom tag for reference:
{% card "Arbor Elf | M13" %}
Here's the progress on my plugin:
module Jekyll
class Scryfall < Liquid::Tag
def initialize(tag_name, text, tokens)
super
@text = text
end
def render(context)
# Store the name of the card, ie "Arbor Elf"
@card_name =
# Store the name of the set, ie "M13"
@card_set =
# Build the query
@query = "https://api.scryfall.com/cards/named?exact=#{@card_name}&set=#{@card_set}"
# Store a specific JSON property
@card_art =
# Finally we render out the result
"<img src='#{@card_art}' title='#{@card_name}' />"
end
end
end
Liquid::Template.register_tag('cards', Jekyll::Scryfall)
For reference, here's an example query using the above details (paste it into your browser to see the response you get back)
https://api.scryfall.com/cards/named?exact=arbor+elf&set=m13
My initial attempt after Googling around was to use a regex to split @text at the |, like so:
@card_name = "#{@text}".split(/| */)
This didn't quite work, instead it output this:
["A", "r", "b", "o", "r", " ", "E", "l", "f", " ", "|", " ", "M", "1", "3", " "]
I'm also then not sure how to access and store specific properties within the JSON response. Ideally, I can do something like this:
@card_art = JSONRESPONSE.image_uri.large
I'm well aware I'm asking a lot here, but I'd love to try and get this working and learn from it.
Thanks for reading.
Actually, your split should work. You just need to give it the correct regex (and you can call it on @text directly). The pipe has to be escaped, because an unescaped | means alternation in a regex: /| */ reads as "empty string or zero or more spaces", which matches between every character. You can use rubular.com to experiment with regexes.
parts = @text.split(/\|/)
# => ["Arbor Elf ", " M13"]
Note that the parts also contain some extra whitespace, which you can remove with strip.
@card_name = parts.first.strip
@card_set = parts.last.strip
This might also be a good time to answer questions like: what happens if the user inserts multiple pipes? What if they insert none? Will your code give them a helpful error message for this?
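One way to answer those questions is to validate the split before using it. The sketch below shows the idea in Python rather than Ruby, purely as an illustration. parse_card_tag is a hypothetical helper, and the "exactly two non-empty parts" rule is an assumption about what the tag should accept.

```python
def parse_card_tag(text):
    """Split 'Card Name | SET' into (name, set_code).

    Raises ValueError with a readable message when the input has no
    pipe, more than one pipe, or an empty part.
    """
    parts = [part.strip() for part in text.split('|')]
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'Card Name | SET', got {text!r}")
    return parts[0], parts[1]

print(parse_card_tag('Arbor Elf | M13 '))  # ('Arbor Elf', 'M13')
```

The same check in the plugin could raise an error that names the offending tag text, so a typo in a post fails the build with a clear message instead of producing a broken image.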
You'll also need to escape these values in your URL. What if one of your users adds a card containing a & character? Your URL will break:
https://api.scryfall.com/cards/named?exact=Sword of Dungeons & Dragons&set=und
That looks like a URL with three parameters, exact, set and Dragons. You need to encode the user input to be included in a URL:
require 'cgi'
query = "https://api.scryfall.com/cards/named?exact=#{CGI.escape(@card_name)}&set=#{CGI.escape(@card_set)}"
# => "https://api.scryfall.com/cards/named?exact=Sword+of+Dungeons+%26+Dragons&set=und"
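For comparison, the same encoding step looks like this in Python, where urllib.parse.urlencode plays the role of CGI.escape. build_query is a hypothetical helper; the endpoint and parameter names are the ones from the question.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.scryfall.com/cards/named'  # endpoint from the question

def build_query(card_name, card_set):
    """Build the request URL with both values percent-encoded."""
    params = urlencode({'exact': card_name, 'set': card_set})
    return f'{BASE_URL}?{params}'

url = build_query('Sword of Dungeons & Dragons', 'und')
print(url)
# https://api.scryfall.com/cards/named?exact=Sword+of+Dungeons+%26+Dragons&set=und
```

Encoding each value separately, and only then joining them with &, is what keeps an ampersand inside a card name from being read as a parameter separator.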
What comes after that is a little less clear, because you haven't written the code yet. Try making the call with the Net::HTTP module and then parsing the response with the JSON module. If you have trouble, come back here and ask a new question.

How to configure Fiddler's Autoresponder to "map" a host to a folder?

I'm already using Fiddler to intercept requests for specific remote files while I'm working on them (so I can tweak them locally without touching the published contents).
i.e. I use many rules like this
match: regex:(?insx).+/some_file([?a-z0-9-=&]+\.)*
respond: c:\somepath\some_file
This works perfectly.
What I'd like to do now is taking this a step further, with something like this
match: regex:http://some_dummy_domain/(anything)?(anything)
respond: c:\somepath\(anything)?(anything)
or, in plain text,
Intercept any http request to 'some_dummy_domain', go inside 'c:\somepath' and grab the file with the same path and name that was requested originally. Query string should pass through.
Some scenarios to further clarify:
http://some_domain/somefile --> c:\somepath\somefile
http://some_domain/path1/somefile --> c:\somepath\path1\somefile
http://some_domain/path1/somefile?querystring --> c:\somepath\path1\somefile?querystring
I tried to leverage what I already had:
match: regex:(?insx).+//some_dummy_domain/([?a-z0-9-=&]+\.)*
respond: ...
Basically, I'm looking for //some_dummy_domain/ in requests. This seems to match correctly when testing, but I'm missing how to respond.
Can Fiddler use matches in responses, and how could I set this up properly ?
I tried to respond c:\somepath\$1 but Fiddler seems to treat it verbatim:
match: regex:(?insx).+//some_domain/([?a-z0-9-=&]+\.)*
respond: c:\somepath\$1
request: http://some_domain/index.html
response: c:\somepath\$1html <-----------
The problem is your use of insx at the front of your expression; the n means that you want to require explicitly-named capture groups, meaning that a group $1 isn't automatically created. You can either omit the n or explicitly name the capture group.
From the Fiddler Book:
Use RegEx Replacements in Action Text
Fiddler’s AutoResponder permits you to use regular expression group replacements to map text from the Match Condition into the Action Text. For instance, the rule:
Match Text: REGEX:.+/assets/(.*)
Action Text: http://example.com/mockup/$1
...maps a request for http://example.com/assets/Test1.gif to http://example.com/mockup/Test1.gif.
The following rule:
Match Text: REGEX:.+example\.com.*
Action Text: http://proxy.webdbg.com/p.cgi?url=$0
...rewrites the inbound URL so that all URLs containing example.com are passed as a URL parameter to a page on proxy.webdbg.com.
Match Text: REGEX:(?insx).+/assets/(?'fname'[^?]*).*
Action Text: C:\src\${fname}
...maps a request for http://example.com/assets/img/1.png?bunnies to C:\src\img\1.png.
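The named-group mapping in the last rule can be imitated in Python to see what the group captures. This is an analogy only: Fiddler uses .NET regular expressions, where (?'fname'...) is valid, while Python spells the same group (?P<fname>...). map_to_local is a hypothetical helper, and the slash-to-backslash conversion is done by hand here to mirror the output shown in the book's example.

```python
import re

# Case-insensitive, dot matches newlines; the named group stops at the "?"
# so the query string is left out of the captured file path.
RULE = re.compile(r'(?is).+/assets/(?P<fname>[^?]*).*')

def map_to_local(url, root='C:\\src\\'):
    """Return the local path for a matching URL, or None if the rule
    does not apply."""
    match = RULE.fullmatch(url)
    if match is None:
        return None
    return root + match.group('fname').replace('/', '\\')

local = map_to_local('http://example.com/assets/img/1.png?bunnies')
print(local)  # C:\src\img\1.png
```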

Making a Regex Django URL Token Optional

You have a URL which accepts a first_name and last_name in Django:
('^(?P<first_name>[a-zA-Z]+)/(?P<last_name>[a-zA-Z]+)/$','some_method'),
How would you include the OPTIONAL URL token of title, without creating any new lines. What I mean by this is, in an ideal scenario:
#A regex constant
OP_REGEX = r'THIS IS OPTIONAL<title>[a-z]'
#Ideal URL
('^(?P<first_name>[a-zA-Z]+)/(?P<last_name>[a-zA-Z]+)/OP_REGEX/$','some_method'),
Is this possible without creating a new line i.e.
('^(?P<first_name>[a-zA-Z]+)/(?P<last_name>[a-zA-Z]+)/(?P<title>[a-zA-Z]+)/$','some_method'),
('^(?P<first_name>[a-zA-Z]+)/(?P<last_name>[a-zA-Z]+)(?:/(?P<title>[a-zA-Z]+))?/$','some_method'),
Don't forget to give title a default value in the view.
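The optional group and the default value can be checked without a running Django project. The sketch below applies the same pattern with Python's re module and calls a stand-in view. dispatch is a hypothetical helper, and dropping unmatched groups before the call is an assumption about how the arguments reach the view; the default on title keeps the view safe either way.

```python
import re

URL_PATTERN = re.compile(
    r'^(?P<first_name>[a-zA-Z]+)/(?P<last_name>[a-zA-Z]+)'
    r'(?:/(?P<title>[a-zA-Z]+))?/$'
)

def some_method(request, first_name, last_name, title=None):
    """Stand-in for the view; title is None when the URL omits it."""
    prefix = f'{title} ' if title else ''
    return f'{prefix}{first_name} {last_name}'

def dispatch(path):
    """Match the path and call the view with the captured groups."""
    match = URL_PATTERN.match(path)
    if match is None:
        return None
    # Drop groups that did not take part in the match.
    kwargs = {k: v for k, v in match.groupdict().items() if v is not None}
    return some_method(None, **kwargs)

print(dispatch('john/smith/'))     # john smith
print(dispatch('john/smith/dr/'))  # dr john smith
print(dispatch('john/'))           # None
```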
In case you are looking for multiple optional arguments, without any required ones, just omit "/" at the beginning, such as:
re_path(r'^view(?:/(?P<dummy1>[a-zA-Z]+))?(?:/(?P<dummy2>[a-zA-Z]+))?(?:/(?P<dummy3>[a-zA-Z]+))?/$', views.MyView.as_view(), name='myname'),
Each optional group is a path segment, so the values go in the path rather than the query string. You can browse it at, for example:
http://localhost:8000/view/value1/value2/value3/