I've got a client with a real estate Website. Based on comparing goal completions to unique page views, it's pretty clear that some visitors inquire about multiple properties during a single session (more unique pages matching the REGEX string than completed goals). As goal tracking is set-up now, only one conversion per session is being completed (which I know is standard for GA).
I am using the destination as the confirmation of goal completion.
"Thank you" pages for conversions look like this:
/Enquiry/Thank-You/?subject=B30794&market=sale
/Enquiry/Thank-You/?subject=B36930&market=rental
My goal tracking REGEX is \Q/Enquiry/Thank-You/\E
I'm pretty sure that I can get Google Analytics to count more than one "thank you" as a completed goal during a session if I can account for the variable in the middle (B30794, B36930, etc.) plus the sale or rental on the end of the URL
I've asked around, including Google's GA community but can't get an string that works to handle the variable in the middle. These are two tags recommended to me that have not worked:
This string returns zero results:
^/Enquiry/Thank-You/\?subject=.*&market=(sale|rental)$
This string returns more than simply the thank you URLs:
^/Enquiry/Thank-You/\?subject=.*&market=sale|rental$
Try this one, the issue for the empty one was that the characters were not escaped:
^\/Enquiry\/Thank\-You\/\?subject=.*&market=(sale|rental)$
Related
I am working with the Google Ads team in my company on a Shopify store and they asked me for some regular expressions for the several steps of the checkout process. I created them and everything was running fine, until the guys noticed that sometimes Analytics added a _ga paremeter to the URL query parameters.
My original expressions are:
1. When in cart - no problem here
\/cart
2. First step of checkout - Contact Information - In several lines for easier reading
(
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)$
|
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\?step=contact_information
)
In this part I added the step=contact_information as an OR option. It isn't normally there except for when you go back to contact information it is added to the URL as step. I know this is not the ideal way, but I am far from fluent in regex.
3. Shipping information
(
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\?step=shipping_method
|
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)?(.*)&step=shipping_method
)
In this part it always has step=shipping_method but it can also have previous_step=contact_information. This is also not ideal, but I am not sure how to do it.
4. Payment information
(
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\?step=payment_method
|
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)?(.*)&step=payment_method
)
The same as step 3, in this case it always has step=payment_method but it can also have previous_step=shipping_method. As points 2 and 3, not ideal.
5. Processing - this part works fine, because I am not interested in the query parameters
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\/processing
6. Thank you page - this also works fine, because I am not interested in the query parameters
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\/thank_you
Issue with _ga parameter
Those regular expressions work fine with the regular URLs, but when I add the _ga parameter to the URL they don't match. I think there was a way to match query parameters, but I am not sure how to match certain and exclude others.
The _ga parameter normally persists on the next steps
The list of all the possible matches for points 2., 3. and 4.:
Contact information without and with _ga
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=contact_information
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=contact_information
Shipping method without and with _ga
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=shipping_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=shipping_method&previous_step=contact_information
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=shipping_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=shipping_method&previous_step=contact_information
Payment method without and with _ga
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=payment_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=payment_method&previous_step=shipping_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=payment_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=payment_method&previous_step=shipping_method
Any ideas how I could solve this? I am pretty sure it's simple, but my brain just doesn't get around more complex regular expressions :)
UPDATE
Just to clear this up a bit more, what I need to achieve with the regular expressions is to identify specifically the step of the funnel.
The Google Ads guys from my team are creating a funnel in Analytics and they add the corresponding steps from the checkout as stages of the funnel.
So basically I just need my regexes to be able to work with or without the _ga query, BUT always detecting a specific step.
UPDATE 2
I added all the possible matches. I need to be able to identify the specific step through the regular expression. So basically I need one regular expression for contact information, one for shipping method and one for payment method, each identifying only the specific step with or without _ga in the URL.
I believe for the checkout url you can simply use this regex:
/([0-9]+)/checkouts/([a-z0-9-]+)(?:.*step=([a-z0-9_-]+))?
no matter if the url is with/without _ga parameter.
Basically it will provide you three groups in a match, the third group will contain step parameter value, e.g.: contact_information
Example:
https://regex101.com/r/C1GuDY/1
I am using Google Datastudio to make a CASE statement to take a multi-words string and split it out into categories. I was asked to use REGEXP_MATCH (nothing else, I know contains function would be easier).
I need a solution to match the following words:
HouseBrochure
home brochure
HomeBrochure
house brochure
Bathroom brochure
Bathroombrochure
FloorBrochure
floor brochure
To complicate matters, these words come in via a website request system, meaning people can request a house, bathroom and floor brochure in one request. When such requests reach my server, it compiles into a list(string) which looks like this:
# (with the pipes included)
HouseBrochure|Bathroom brochure|floor brochure
This is just an example of 1 request, there are many variations and multiple requests that come through (I've also only included a few of these brochures, there are many more)
I need to separate out all the house brochures, all the bathroom brochures and all the floor brochures etc, so I can count how many requests have been made for each brochure.
Being new to Regex, I have a basic understanding but nowhere near advanced.
My current attempt in Data studio looks like this:
CASE
WHEN REGEXP_MATCH(Event Label,'^.*(HouseBrochure.*|home brochure.*|HomeBrochure.*|house brochure.*).*$') THEN 'Home Brochure'
END
This is just for the home brochure, yet it's not working, can someone help?
Also, as an FYI Datastudio uses REG2
My approach would be:
convert everything to lower case (avoid messing with upper/lower case differences)
Use regex to replace variations with base form:
e.g.
(house|home)\s*brochure
replace with
HomeBrochure
Test here.
Do some counting as needed, using just the base keywords.
I need to create a filter on Google Analytics to include only a set of pages, for example, the view will have a filter to collect data only from
www.example.com/page1.html
www.example.com/page2.html
www.example.com/page3.html
I am trying to achieve this by using a Custom Filter to Include - > Request URI and using a Regex on the Filter Pattern.
My problem is that the Regex exceeds the 255 character limitation, even after I tried to optimice the regex to be a small as possible.
Creating more than one Include Filter does not work because this way no data would be collected, so I am wondering how could I achieve this? Thank you
This is the original regex
/es/investigacion/lace\.html|/en/research/lace\.html|/es/investigacion/lace/programa-maestrias\.html|/es/investigacion/lace/alumni/latin-american-forum-entrepreneurs\.html|/es/investigacion/lace/alumni/fondo-angel-investment\.html|/es/investigacion/lace/alumni\.html|/es/investigacion/lace/fondo-inversion\.html|/es/investigacion/lace/investigacion\.html|/es/investigacion/lace/acerca-del-centro\.html|/es/investigacion/lace/alumni/estudiantes-del-pais\.html|/en/research/investigation\.html|/en/research/about-the-center\.html|/es/investigacion/lace/alumni/mentoring\.html|/es/investigacion/lace/alumni/reatu-entrepreneur-award\.html|/en/research/lace/master-program\.html|/en/research/lace/alumni\.html|/en/research/investment-fund\.html
Edit: first try to compress the regex
/es/investigacion/lace/(programa-maestrias|alumni|investigacion|programa-maestrias|alumni/latin-american-forum-entrepreneurs|alumni/fondo-angel-investment|fondo-inversion|investigacion|acerca-del-centro|alumni/estudiantes-del-pais|alumni/mentoring|alumni/incae-entrepreneur-award)\.html|
Edit: the reason for this is because I need to create a new user profile on GA, and this new profile will have access to the information of a set of URLs only; so what occurred to me is create a new View that only captures the information of this set of URLs, and then assign the profile to this view with "Read/Analyze" permissons.
There are definitely more ways to optimise the regex. For example, since all string options end with .html, you could do something like this:
/(es/investigacion/lace|en/research/lace)\.html
by taking the .html out.
You could also take out
/es/investigacion/lace
and weave in the variable part of that using |'s, eg.
/es/investigacion/lace/(programa-maestrias|alumni|investigacion)\.html
But try a few of those optimisation techniques and you should be able to fit more in.
I have 500 or so spambots and about 5 actual registered users on my wiki. I have used nuke to delete their pages but they just keep reposting. I have spambot registration under control using reCaptcha. Now, I just need a way to delete/block/merge about 500 users at once.
You could just delete the accounts from the user table manually, or at least disable their authentication info with a query such as:
UPDATE /*_*/user SET
user_password = '',
user_newpassword = '',
user_email = '',
user_token = ''
WHERE
/* condition to select the users you want to nuke */
(Replace /*_*/ with your $wgDBprefix, if any. Oh, and do make a backup first.)
Wiping out the user_password and user_newpassword fields prevents the user from logging in. Also wiping out user_email prevents them from requesting a new password via email, and wiping out user_token drops any active sessions they may have.
Update: Since I first posted this, I've had further experience of cleaning up large numbers of spam users and content from a MediaWiki installation. I've documented the method I used (which basically involves first deleting the users from the database, then wiping out up all the now-orphaned revisions, and finally running rebuildall.php to fix the link tables) in this answer on Webmasters Stack Exchange.
Alternatively, you might also find Extension:RegexBlock useful:
"RegexBlock is an extension that adds special page with the interface for blocking, viewing and unblocking user names and IP addresses using regular expressions."
There are risks involved in applying the solution in the accepted answer. The approach may damage your database! It incompletely removes users, doing nothing to preserve referential integrity, and will almost certainly cause display errors.
Here a much better solution is presented (a prerequisite is that you have installed the User merge extension):
I have a little awkward way to accomplish the bulk merge through a
work-around. Hope someone would find it useful! (Must have a little
string concatenation skills in spreadsheets; or one may use a python
or similar script; or use a text editor with bulk replacement
features)
Prepare a list of all SPAMuserIDs, store them in a spreadsheet or textfile. The list may be
prepared from the user creation logs. If you do have the
dB access, the Wiki_user table can be imported into a local list.
The post method used for submitting the Merge & Delete User form (by clicking the button) should be converted to a get method. This
will get us a long URL. See the second comment (by Matthew Simoneau)
dated 13/Jan/2009) at
http://www.mathworks.com/matlabcentral/newsreader/view_thread/242300
for the method.
The resulting URL string should be something like below:
http: //(Your Wiki domain)/Special:UserMerge?olduser=(OldUserNameHere)&newuser=(NewUserNameHere)&deleteuser=1&token=0d30d8b4033a9a523b9574ccf73abad8%2B\
Now, divide this URL into four sections:
A: http: //(Your Wiki domain)/Special:UserMerge?olduser=
B: (OldUserNameHere)
C: &newuser=(NewUserNameHere)&deleteuser=1
D: &token=0d30d8b4033a9a523b9574ccf73abad8%2B\
Now using a text editor or spreadsheet, prefix each spam userIDs with part A and Suffix each with Part C and D. Part C will include the
NewUser(which is a specially created single dummy userID). The Part D,
the Token string is a session-dependent token that will be changed per
user per session. So you will need to get a new token every time a new
session/batch of work is required.
With the above step, you should get a long list of URLs, each good to do a Merge&Delete operation for one user. We can now create a
simple HTML file, view it and use a batch downloader like DownThemAll
in Firefox.
Add two more pieces " Linktext" to each line at
beginning and end. Also add at top and at
bottom and save the file as (for eg:) userlist.html
Open the file in Firefox, use DownThemAll add-on and download all the files! Effectively, you are visiting the Merge&Delete page for
each user and clicking the button!
Although this might look a lengthy and tricky job at first, once you
follow this method, you can remove tens of thousands of users without
much manual efforts.
You can verify if the operation is going well by opening some of the
downloaded html files (or by looking through the recent changes in
another window).
One advantage is that it does not directly edit the
MySQL pages. Nor does it require direct database access.
I did a bit of rewriting to the quoted text, since the original text contains some flaws.
Alright, i've played around with this for over a week now and I can't get it to work. Using a regular expression match:
My GOAL URL:
category=thanks
This is not tracking correctly
My only goal step:
/s.nl\?c=1025622&n=5&sc=[0-9]+&ext=T&add=[0-9]&whence=
This is tracking correctly but it is saying everybody exits on this step and does not go to my goal URL
Upon looking at pages that contain category=thanks, I found the following tracked URLs
/s.nl?c=1025622&sc=44&category=thanks&whence=&n=5
/s.nl?c=1025622&sc=44&category=thanks&n=5
/s.nl?c=1025622&sc=44&category=thanks&whence=&n=5&redirect_count=1&did_javascript_redirect=T
/s.nl?c=1025622&n=5&sc=44&category=thanks&it=A&login=T
along with a bunch of other containing category=thanks. As I obviously can't compensate for all these changing URL, I figured just having "category=thanks" would work, but apparently not?
Your category=thanks statement is not a regular expression that matches the URLs you mentioned. It would apply for filters and segments where there's a 'URI contains' option, but not a full match.
I beleive you have 2 options:
Go to Content report in GA, choose a larger time period (like a year) and add a category=thanks filter. You'll get all possible goal URLs. For them you'll have to write a regular expression. From what you describe, your URL structure is a mess (too many parametrs), so take a look at option 2.
Add a small script to your goal page that would redirect your visitors to a page with a clean url. Something like a conditional statement saying if (URL contains category=thanks) {redirect to /thankyou.html}, then use this thankyou.html as a goal in GA.