Google Analytics Regex Code - regex

I'm having trouble figuring out the last part of my regex code for Google Analytics. I want to be able to grab any URL from my site that fits the following pattern:
www.site.com/hotel/[any text]/rooms?[any text]
So the URLs will always begin with /hotel/ and will always end with /rooms? followed by any text string, with any text between "hotel/" and "/rooms?".
I have this much: ^/hotel/([^/])+/rooms([^\?])
But I'm not sure how to finish this so that it will only capture URLs that have text after the "?"

This works. You may want to tighten up the allowed text in the path segment and in the query string. Note the escaped dots, so that "." matches only a literal dot.
^www\.site\.com/hotel/[^/]+/rooms\?.+$
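For anyone who wants to check the pattern outside of Google Analytics, here is a minimal Python sketch. It assumes the tool matches against the request path plus query string without the hostname, so the www.site.com prefix is dropped; the helper name and the sample paths are made up for illustration.

```python
import re

# Assumption: the input is the path plus query string, without the hostname.
HOTEL_ROOMS = re.compile(r"^/hotel/[^/]+/rooms\?.+$")

def is_hotel_rooms_url(path: str) -> bool:
    """True for /hotel/<one segment>/rooms?<non-empty query string>."""
    return HOTEL_ROOMS.search(path) is not None

# Hypothetical sample paths.
samples = [
    "/hotel/grand-plaza/rooms?checkin=2024-01-01",  # text after "?"
    "/hotel/grand-plaza/rooms?",                    # nothing after "?"
    "/hotel/grand-plaza/rooms",                     # no query string at all
    "/hotel/a/b/rooms?x=1",                         # two segments before /rooms
]
for s in samples:
    print(s, is_hotel_rooms_url(s))
```

Only the first sample should be accepted: `.+` requires at least one character after the "?", and `[^/]+` allows exactly one path segment between /hotel/ and /rooms.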

Related

How to dismiss the end of the url parameters with regex?

I have a script that is supposed to trigger when a certain page path is open.
The issue: the page path contains multiple parameters including the parameter "returnUrl", returning the previous page visited.
Here is the url I want to check :
/cxsSearchApply?positionId=a0w0X000004IceYQAS&lang=en&returnUrl=https://example.com/cxsrec__cxsSearchDetail?id=a0w0X000004IceYQAS&lang=en&returnUrl=https://example.com/cxsrec__cxsSearch&lang=en
I initially used this regex code to get triggered on this page :
(cxsSearchApply.*)
But I have other regexes, like:
(cxsSearchSearchDetail.*)
And they also trigger, because that page path is included in the URL as part of returnUrl.
What regex should I use to match the first part of the URL but nothing after "returnUrl"?
So you want to match cxsSearchApply on the text before &returnUrl. You could use a lookahead:
(cxsSearchApply.*)(?=returnUrl=)
However, what you really want is to match everything before the first &returnUrl. So you need a non-greedy operator:
(cxsSearchApply.*?)(?=returnUrl=)
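Be careful with your other search, though. A page name that appears inside the returnUrl value is itself followed by a second returnUrl=, so an unanchored pattern of this form can still match there:
(cxsSearchSearchDetail.*?)(?=returnUrl=)
To look only at the first part of the URL, anchor the pattern to the start and stay in front of the query string, for example ^[^?]*cxsSearchApply.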
I believe that will get you what you want.
Nothing after "returnUrl"
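If this is literally what you want, you can simply do (.*?)(&returnUrl=.*) and take the first capture group as your result. The first group has to be non-greedy; with a greedy (.*) it would run up to the last &returnUrl= instead of the first.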
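Here is a small Python sketch of the "take everything before the first &returnUrl=" idea, using the URL from the question. The function name is hypothetical, and it assumes returnUrl is never the first parameter (i.e. it is always introduced by "&").

```python
import re

url = ("/cxsSearchApply?positionId=a0w0X000004IceYQAS&lang=en"
       "&returnUrl=https://example.com/cxsrec__cxsSearchDetail?id=a0w0X000004IceYQAS"
       "&lang=en&returnUrl=https://example.com/cxsrec__cxsSearch&lang=en")

def before_return_url(page_path: str) -> str:
    """Return the text before the first '&returnUrl=' (the whole string if absent)."""
    m = re.match(r"(.*?)&returnUrl=", page_path)  # lazy, so it stops at the first one
    return m.group(1) if m else page_path

head = before_return_url(url)
print(head)
print("cxsSearchApply" in head)   # the page we want is still there
print("cxsSearchDetail" in head)  # the page named in returnUrl is gone
```

Running your page checks against `head` rather than the full URL keeps anything inside returnUrl from triggering them.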

Remove a character from the middle of a string with regex

I have no programming experience and thought this would be simple, but I have searched for days without luck. I am using a program to strip content from a web page. The program uses regex filters to display what you want from the stripped content. The stripped content can be any letters and is in the form USD/SEK. I want to display USDSEK, without the "/".
Thanks
To elaborate further - I am using a program called Data toolbar for chrome, which makes it easy to strip content from web pages. After it strips the content, it provides a regex filter to display what part of the content is displayed. But I have to know the regex command to remove the / from USD/SEK, to display just USDSEK. I've tried [A-Z.,]+ but that only displays USD. I need the regex command to grab the first 3 and last 3 characters only, or to omit the / from the string.
Try adding parentheses around the groups which you wish to capture:
([a-zA-Z]{3})\/([a-zA-Z]{3})
or
([a-zA-Z]{3})\/((?1))
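Depending on the functionality of the program you are using, you can then reference these captured groups as $1 and $2 (or \1 and \2, depending on the flavor). The second form uses (?1), which re-runs the first group's pattern and is only available in some flavors such as PCRE.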
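To show what the two capture groups buy you, here is a minimal sketch in Python (used only for illustration; whether Data Toolbar offers a replacement field is an assumption I can't confirm). The function name is made up.

```python
import re

# Two groups of three letters separated by a slash.
PAIR = re.compile(r"([a-zA-Z]{3})/([a-zA-Z]{3})")

def join_pair(text: str) -> str:
    """Replace 'AAA/BBB' with 'AAABBB' by emitting group 1 followed by group 2."""
    return PAIR.sub(r"\1\2", text)

print(join_pair("USD/SEK"))
```

The replacement `\1\2` is the Python spelling of `$1$2`; the slash sits outside both groups, so it is simply dropped.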

Regular Expression for retrieving File Extension in HTTP url

I am working on the ELK stack and, as part of Logstash data transformation, I am transforming data in Apache access logs.
One of the metrics needed is a stat on the different content types (aspx, php, gif, etc.).
From the log file I am trying to retrieve the request URL and then deduce the file type. For example, /c/dataservices/online.jsp?callBack is the request and I would get .jsp using the regular expression
\.\w{3,4}.
My regular expression won't work for a request such as /etc/designs/design/libs.min.1253.css; it returns .min as the extension.
I am trying to get the last extension but it is not working. Please suggest other approaches.
You need to anchor the match to the end of the string or to the start of the query string (the ?). Try:
\.\w{3,4}($|\?)
Play with it here: https://regex101.com/r/iV3iM1/1
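A quick way to sanity-check the anchored pattern is a few lines of Python (just a sketch; in Logstash you would put the pattern in your filter instead). I wrapped the extension in its own group so the trailing "?" is not part of the result; the function name is hypothetical.

```python
import re

# Same idea as above: an extension of 3-4 word characters that is followed
# by the end of the string or by the "?" that starts the query string.
EXT = re.compile(r"(\.\w{3,4})($|\?)")

def extension(request: str):
    """Return the extension including the dot, or None if nothing matches."""
    m = EXT.search(request)
    return m.group(1) if m else None

print(extension("/c/dataservices/online.jsp?callBack"))
print(extension("/etc/designs/design/libs.min.1253.css"))
```

Keep in mind that `{3,4}` will skip shorter or longer extensions such as .js or .woff2, so you may need to widen it for your logs.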
You're going to need a much fancier Regex.
Try this one.
([/.\w]+)([.][\w]+)([?][\w./=]+)?
This uses three capture groups. The first ([/.\w]+) matches your path up to the last .
The second ([.][\w]+) matches the final extension, and you can use the capture group to read it out.
The third ([?][\w./=]+)? matches the query string, which is optional.
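Here is how the three groups come out in practice, as a small Python sketch (the function name is made up). It assumes the path only contains word characters, dots and slashes, since that is all the first character class allows.

```python
import re

THREE_PART = re.compile(r"([/.\w]+)([.][\w]+)([?][\w./=]+)?")

def split_request(request: str):
    """Return (path_without_extension, extension, query); query may be None."""
    m = THREE_PART.match(request)
    return m.groups() if m else None

print(split_request("/etc/designs/design/libs.min.1253.css"))
print(split_request("/c/dataservices/online.jsp?callBack"))
```

The first group is greedy, so it gives back only as much as is needed for the second group to match, which is why the second group ends up holding the last extension.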

Grabbing specific query string parameters from URL with regex

We have an implementation of Liferay portal and I'm just getting started with using Google Analytics with it. I'm noticing a lot of duplicate entries in GA, mainly because of the query strings in the URI, for example:
/web/home-community/search-and-help?p_p_id=mytcdirectory_WAR_mytcdirectory&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&p_p_col_id=column-3&p_p_col_count=4&_mytcdirectory_WAR_mytcdirectory_action=getResults
I'm playing around with the Search and Replace filters in GA (using regex) and my goal is to try to pull out the ?p_p_id and &*_action parameters from the URI, and disregard the rest. I'm getting close with the following regex:
^([^\?]+)([\?\&]p_p_id=[^\&]+)?.*(\&[^\&]+_action=[^\&]+)?.*$
But that last grouping isn't working correctly. If I remove the ? from the end of the last grouping it matches, but the problem with that approach is that not all URIs contain that query string so it needs to be optional. But if I keep it in, it won't grab that last parameter. My regex fiddle is located here:
http://regex101.com/r/qQ2dE4/13
Thank you all in advance for any help.
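The behaviour described above is consistent with the greedy .* in front of the last group: it swallows the rest of the URI, and because the group is optional the regex is happy to let it match nothing. One possible way around that, sketched in Python, is to move a lazy .*? inside each optional group. This is only a sketch: whether the GA filter field accepts lazy quantifiers and non-capturing groups is an assumption you would need to verify, and the function name is made up.

```python
import re

PATTERN = re.compile(
    r"^([^?]+)"                       # path before the query string
    r"(?:.*?([?&]p_p_id=[^&]+))?"     # optional p_p_id parameter
    r"(?:.*?(&[^&]+_action=[^&]+))?"  # optional *_action parameter
    r".*$"                            # whatever is left is discarded
)

def simplify(uri: str) -> str:
    """Keep the path plus the p_p_id and *_action parameters, if present."""
    m = PATTERN.match(uri)
    if m is None:
        return uri
    return "".join(g for g in m.groups() if g)

uri = ("/web/home-community/search-and-help?p_p_id=mytcdirectory_WAR_mytcdirectory"
       "&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&p_p_col_id=column-3"
       "&p_p_col_count=4&_mytcdirectory_WAR_mytcdirectory_action=getResults")
print(simplify(uri))
```

Because each optional group is tried before it is skipped, the lazy .*? gets a chance to walk forward to the parameter; if the parameter is missing, the group is skipped and its capture is empty.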

How can I use regular expression to match urls starting with https and ending with #?

I'm very much a newbie with regex and having a hard time figuring this one out. I have an HTML document and I want to clear out a ton of URLs that are inside of it. All of the URLs begin with https:// and they all end with a pound sign (#).
Any help would be greatly appreciated. I'm using Sublime Text as my editor, in case that matters.
A basic way to do it:
\bhttps://[^\s#]+#
free-spaced:
\b //word start
https://
[^\s#]+ //followed by anything but whitespace and '#'
#
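If you want to try the pattern outside the editor first, here is a minimal Python sketch that removes every such URL from a string. The sample HTML and the function name are made up for illustration.

```python
import re

# https:// followed by anything except whitespace and '#', then the closing '#'.
URL_TO_HASH = re.compile(r"\bhttps://[^\s#]+#")

def strip_urls(html: str) -> str:
    """Delete every https://...# run from the text."""
    return URL_TO_HASH.sub("", html)

sample = '<a href="https://example.com/page#">link</a> and https://example.org/a/b# end'
print(strip_urls(sample))
```

In Sublime Text the equivalent is to enable regex mode in Find and Replace, paste the pattern, and leave the replacement field empty.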
If you truly want to clear everything from https:// through the final #, and the URL is the only thing on the line, then you can use:
^(https)+(.)*(#)+$
But you may want to be more specific about what you are filtering out. If this comes from a database query you should be OK, since you can assume the URL will be the whole content of the field(s) returned, and you will be running the regex over them in a loop of some kind.
BTW you can hone your scripts using something like http://regexpal.com/