Shopify regular expressions for checkout process - regex

I am working with the Google Ads team in my company on a Shopify store and they asked me for some regular expressions for the several steps of the checkout process. I created them and everything was running fine, until the guys noticed that sometimes Analytics added a _ga paremeter to the URL query parameters.
My original expressions are:
1. When in cart - no problem here
\/cart
2. First step of checkout - Contact Information - In several lines for easier reading
(
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)$
|
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\?step=contact_information
)
In this part I added the step=contact_information as an OR option. It isn't normally there except for when you go back to contact information it is added to the URL as step. I know this is not the ideal way, but I am far from fluent in regex.
3. Shipping information
(
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\?step=shipping_method
|
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)?(.*)&step=shipping_method
)
In this part it always has step=shipping_method but it can also have previous_step=contact_information. This is also not ideal, but I am not sure how to do it.
4. Payment information
(
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\?step=payment_method
|
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)?(.*)&step=payment_method
)
The same as step 3, in this case it always has step=payment_method but it can also have previous_step=shipping_method. As points 2 and 3, not ideal.
5. Processing - this part works fine, because I am not interested in the query parameters
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\/processing
6. Thank you page - this also works fine, because I am not interested in the query parameters
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\/thank_you
Issue with _ga parameter
Those regular expressions work fine with the regular URLs, but when I add the _ga parameter to the URL they don't match. I think there was a way to match query parameters, but I am not sure how to match certain and exclude others.
The _ga parameter normally persists on the next steps
The list of all the possible matches for points 2., 3. and 4.:
Contact information without and with _ga
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=contact_information
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=contact_information
Shipping method without and with _ga
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=shipping_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=shipping_method&previous_step=contact_information
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=shipping_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=shipping_method&previous_step=contact_information
Payment method without and with _ga
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=payment_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=payment_method&previous_step=shipping_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=payment_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=payment_method&previous_step=shipping_method
Any ideas how I could solve this? I am pretty sure it's simple, but my brain just doesn't get around more complex regular expressions :)
UPDATE
Just to clear this up a bit more, what I need to achieve with the regular expressions is to identify specifically the step of the funnel.
The Google Ads guys from my team are creating a funnel in Analytics and they add the corresponding steps from the checkout as stages of the funnel.
So basically I just need my regexes to be able to work with or without the _ga query, BUT always detecting a specific step.
UPDATE 2
I added all the possible matches. I need to be able to identify the specific step through the regular expression. So basically I need one regular expression for contact information, one for shipping method and one for payment method, each identifying only the specific step with or without _ga in the URL.

I believe for the checkout url you can simply use this regex:
/([0-9]+)/checkouts/([a-z0-9-]+)(?:.*step=([a-z0-9_-]+))?
no matter if the url is with/without _ga parameter.
Basically it will provide you three groups in a match, the third group will contain step parameter value, e.g.: contact_information
Example:
https://regex101.com/r/C1GuDY/1

Related

Use REGEX to generate multiple conversions per session in Google Analytics?

I've got a client with a real estate Website. Based on comparing goal completions to unique page views, it's pretty clear that some visitors inquire about multiple properties during a single session (more unique pages matching the REGEX string than completed goals). As goal tracking is set-up now, only one conversion per session is being completed (which I know is standard for GA).
I am using the destination as the confirmation of goal completion.
"Thank you" pages for conversions look like this:
/Enquiry/Thank-You/?subject=B30794&market=sale
/Enquiry/Thank-You/?subject=B36930&market=rental
My goal tracking REGEX is \Q/Enquiry/Thank-You/\E
I'm pretty sure that I can get Google Analytics to count more than one "thank you" as a completed goal during a session if I can account for the variable in the middle (B30794, B36930, etc.) plus the sale or rental on the end of the URL
I've asked around, including Google's GA community but can't get an string that works to handle the variable in the middle. These are two tags recommended to me that have not worked:
This string returns zero results:
^/Enquiry/Thank-You/\?subject=.*&market=(sale|rental)$
This string returns more than simply the thank you URLs:
^/Enquiry/Thank-You/\?subject=.*&market=sale|rental$
Try this one, the issue for the empty one was that the characters were not escaped:
^\/Enquiry\/Thank\-You\/\?subject=.*&market=(sale|rental)$

Google Analytics Filter Set of Pages

I need to create a filter on Google Analytics to include only a set of pages, for example, the view will have a filter to collect data only from
www.example.com/page1.html
www.example.com/page2.html
www.example.com/page3.html
I am trying to achieve this by using a Custom Filter to Include - > Request URI and using a Regex on the Filter Pattern.
My problem is that the Regex exceeds the 255 character limitation, even after I tried to optimice the regex to be a small as possible.
Creating more than one Include Filter does not work because this way no data would be collected, so I am wondering how could I achieve this? Thank you
This is the original regex
/es/investigacion/lace\.html|/en/research/lace\.html|/es/investigacion/lace/programa-maestrias\.html|/es/investigacion/lace/alumni/latin-american-forum-entrepreneurs\.html|/es/investigacion/lace/alumni/fondo-angel-investment\.html|/es/investigacion/lace/alumni\.html|/es/investigacion/lace/fondo-inversion\.html|/es/investigacion/lace/investigacion\.html|/es/investigacion/lace/acerca-del-centro\.html|/es/investigacion/lace/alumni/estudiantes-del-pais\.html|/en/research/investigation\.html|/en/research/about-the-center\.html|/es/investigacion/lace/alumni/mentoring\.html|/es/investigacion/lace/alumni/reatu-entrepreneur-award\.html|/en/research/lace/master-program\.html|/en/research/lace/alumni\.html|/en/research/investment-fund\.html
Edit: first try to compress the regex
/es/investigacion/lace/(programa-maestrias|alumni|investigacion|programa-maestrias|alumni/latin-american-forum-entrepreneurs|alumni/fondo-angel-investment|fondo-inversion|investigacion|acerca-del-centro|alumni/estudiantes-del-pais|alumni/mentoring|alumni/incae-entrepreneur-award)\.html|
Edit: the reason for this is because I need to create a new user profile on GA, and this new profile will have access to the information of a set of URLs only; so what occurred to me is create a new View that only captures the information of this set of URLs, and then assign the profile to this view with "Read/Analyze" permissons.
There are definitely more ways to optimise the regex. For example, since all string options end with .html, you could do something like this:
/(es/investigacion/lace|en/research/lace)\.html
by taking the .html out.
You could also take out
/es/investigacion/lace
and weave in the variable part of that using |'s, eg.
/es/investigacion/lace/(programa-maestrias|alumni|investigacion)\.html
But try a few of those optimisation techniques and you should be able to fit more in.

Google Analytics: View Filter by Request URI - does this work?

I think this should be pretty straight forward but if I have a url(s) with a request URIs such as:
/en/my-hometown/92-winston
/en/my-hometown/92-winston/backyard
and I want to set up a GA view which only includes this page and any subpages, then does the following Filter work in Analytics?
Will that basically filter against and URI which specifically contains 92-winston or do I need to wrap it in a fancy regex? I'm not that great with RegEx.
Thanks! Apologies in advance if this is ridiculously easy.
You may want to try this regex, if you know exactly that the URI will start with "/en/my-hometown/92-winston":
^\/en\/my\-hometown\/92\-winston.*
This captures URI strings that start exactly with "/en/my-hometown/92-winston" and also include anything that may come after that, for example
/en/my-hometown/92-winston
/en/my-hometown/92-winston/backyard
/en/my-hometown/92-winston/some_other_page
Just beware that if you have more than one "Include" filter in GA, then you need to make sure you aren't actually excluding more data, as multiple include filters are "and"ed. Always test your new filters in your test view as well.

Google Analytics Reg Ex isn't tracking goal URL

Alright, i've played around with this for over a week now and I can't get it to work. Using a regular expression match:
My GOAL URL:
category=thanks
This is not tracking correctly
My only goal step:
/s.nl\?c=1025622&n=5&sc=[0-9]+&ext=T&add=[0-9]&whence=
This is tracking correctly but it is saying everybody exits on this step and does not go to my goal URL
Upon looking at pages that contain category=thanks, I found the following tracked URLs
/s.nl?c=1025622&sc=44&category=thanks&whence=&n=5
/s.nl?c=1025622&sc=44&category=thanks&n=5
/s.nl?c=1025622&sc=44&category=thanks&whence=&n=5&redirect_count=1&did_javascript_redirect=T
/s.nl?c=1025622&n=5&sc=44&category=thanks&it=A&login=T
along with a bunch of other containing category=thanks. As I obviously can't compensate for all these changing URL, I figured just having "category=thanks" would work, but apparently not?
Your category=thanks statement is not a regular expression that matches the URLs you mentioned. It would apply for filters and segments where there's a 'URI contains' option, but not a full match.
I beleive you have 2 options:
Go to Content report in GA, choose a larger time period (like a year) and add a category=thanks filter. You'll get all possible goal URLs. For them you'll have to write a regular expression. From what you describe, your URL structure is a mess (too many parametrs), so take a look at option 2.
Add a small script to your goal page that would redirect your visitors to a page with a clean url. Something like a conditional statement saying if (URL contains category=thanks) {redirect to /thankyou.html}, then use this thankyou.html as a goal in GA.

Match all characters in group except for first and last occurrence

Say I request
parent/child/child/page-name
in my browser. I want to extract the parent, children as well as page name. Here are the regular expressions I am currently using. There should be no limit as to how many children there are in the url request. For the time being, the page name will always be at the end and never be omitted.
^([\w-]{1,}){1} -> Match parent (returns 'parent')
(/(?:(?!/).)*[a-z]){1,}/ -> Match children (returns /child/child/)
[\w-]{1,}(?!.*[\w-]{1,}) -> Match page name (returns 'page-name')
The more I play with this, the more I feel how clunky this solution is. This is for a small CMS I am developing in ASP Classic (:(). It is sort of like the MVC routing paths. But instead of calling controllers and functions based on the URL request. I would be travelling down the hierarchy and finding the appropriate page in the database. The database is using the nested set model and is linked by a unique page name for each child.
I have tried using the split function to split with a / delimiter however I found I was nested so many split statements together it became very unreadable.
All said, I need an efficient way to parse out the parent, children as well as page name from a string. Could someone please provide an alternative solution?
To be honest, I'm not even sure if a regular expression is the best solution to my problem.
Thank you.
You could try using:
^([\w-]+)(/.*/)([\w-]+)$
And then access the three matching groups created using Match.SubMatches. See here for more details.
EDIT
Actually, assuming that you know that [\w-] is all that is used in the names of the parts, you can use ^([\w-]+)(.*)([\w-]+)$ instead and it will handle the no-child case fine by itself as well.