Regular Expression for Google Analytics to determine page

I'm looking specifically for a regular expression that will grab the last term of a URL. It is not always a file name and may not end in .html or .php, so the regular expression has to grab the last term of the URL regardless of what it looks like.
Example:
Given www.mydomain.com/anything_can_be_here/thankyoupage
I need to extract "thankyoupage", even though any term can precede it in the URL.
Also note that the thankyoupage URL segment has no file extension.

This should do it:
/^(?:http:\/\/)?(?:[^\/]+)\/.*?\/([^\/]+)(?:\?.*)?$/
For example, the result of this:
m = 'http://example.com/where/is?the=pancakes/house'.match(/^(?:http:\/\/)?(?:[^\/]+)\/.*?\/([^\/]+)(?:\?.*)?$/);
is this array:
["http://example.com/where/is?the=pancakes/house", "is"]
And this:
m = 'http://example.com/where/is'.match(/^(?:http:\/\/)?(?:[^\/]+)\/.*?\/([^\/]+)(?:\?.*)?$/)
Results in:
["http://example.com/where/is", "is"]
And this:
m = 'http://example.com/'.match(/^(?:http:\/\/)?(?:[^\/]+)\/.*?\/([^\/]+)(?:\?.*)?$/)
Results in null.
The component you want is in m[1], captured by ([^\/]+). The (?:[^\/]+) group takes care of the hostname (and the userinfo, if present), and the (?:\?.*)?$ part takes care of any trailing query string (CGI arguments).
If all of your URLs start with http://, you could replace ^(?:http:\/\/)? with ^http:\/\/.
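To try the pattern above on the URL from the question, it can be wrapped in a small helper. This is a sketch; the helper name lastSegment is hypothetical, and the comments show the results for the question's URL plus two of the examples above.

```javascript
// Pattern from the answer above: optional "http://", a host, any path,
// then the last path segment captured in group 1, then an optional "?query".
const LAST_SEGMENT = /^(?:http:\/\/)?(?:[^\/]+)\/.*?\/([^\/]+)(?:\?.*)?$/;

// Hypothetical helper: returns the captured segment, or null when nothing matches.
function lastSegment(url) {
  const m = url.match(LAST_SEGMENT);
  return m ? m[1] : null;
}

console.log(lastSegment("www.mydomain.com/anything_can_be_here/thankyoupage")); // thankyoupage
console.log(lastSegment("http://example.com/where/is?the=pancakes/house"));    // is
console.log(lastSegment("http://example.com/"));                              // null
```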

If you are only feeding it URLs, something as simple as .*/(.*) should work.
That assumes there is a '/' after the .com/.org/whatever;
otherwise you'll get everything after the http://.
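The greedy behaviour this answer relies on, including the no-path case it warns about, can be checked in JavaScript. The third URL is an extra case added here to show that a slash inside a query string also counts as "the last /".

```javascript
// Greedy .* runs to the end and backtracks to the LAST "/" in the string,
// so group 1 is whatever follows that slash.
const SIMPLE = /.*\/(.*)/;

const cases = [
  "www.mydomain.com/anything_can_be_here/thankyoupage", // "thankyoupage"
  "http://example.com",                                 // "example.com" (everything after "http://")
  "http://example.com/thanks?ref=a/b",                  // "b" (a "/" in the query string wins)
];
for (const url of cases) {
  console.log(JSON.stringify(url.match(SIMPLE)[1]));
}
```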

What you need is the path name, which you can access with:
window.location.pathname;
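window.location only exists in a browser and only describes the current page. For an arbitrary URL string, the standard URL class exposes the same pathname property. A minimal sketch; the ?utm_source=x query string is made up for illustration.

```javascript
// The URL class (built into browsers and Node.js 10+) parses an absolute URL;
// pathname excludes the host and the query string.
const url = new URL("http://www.mydomain.com/anything_can_be_here/thankyoupage?utm_source=x");
console.log(url.pathname); // /anything_can_be_here/thankyoupage

// The constructor requires a scheme: a bare "www.mydomain.com/..." throws a TypeError.
try {
  new URL("www.mydomain.com/anything_can_be_here/thankyoupage");
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```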

Try this regex:
^http:\/\/.*/(.+)$
It looks for a string starting with http://, consumes everything up to the last /, and stores everything after that last / in $1 (capture group 1).
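In a JavaScript regex literal the same pattern needs its middle "/" escaped as well. This sketch shows both reading group 1 directly and using $1 in a replacement; it also shows that the required http:// prefix rejects https URLs.

```javascript
// Same pattern as above, with every "/" escaped for a JavaScript regex literal.
const RE = /^http:\/\/.*\/(.+)$/;

const href = "http://www.mydomain.com/anything_can_be_here/thankyoupage";
console.log(href.match(RE)[1]);      // thankyoupage
console.log(href.replace(RE, "$1")); // thankyoupage ("$1" is group 1 in a replacement string)

// Because the pattern starts with ^http:\/\/, https URLs do not match.
console.log("https://www.mydomain.com/a/thankyoupage".match(RE)); // null
```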

The regexp:
/(\/([^\/]+))+/g
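Take the 3rd element (index 2) of the resulting array. Because a repeated capture group keeps only its last repetition, that element is the last path segment: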
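var a = 'http://www.host.com/aaa/bbb/ccc/dd.pp';
var regexp = /(\/([^\/]+))+/g;
var result = regexp.exec(a);
// exec() returns null when nothing matches, so check that before reading result.length.
if (result !== null && result.length == 3) {
    document.write('<p>' + result[2] + '</p>');
} else {
    document.write('<p>Fail</p>');
}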
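The same repeated-group idea can be packaged without document.write so it runs outside a browser. A sketch; the helper name lastByRepetition is hypothetical, and the g flag is dropped because only a single match is needed. The last case shows that [^\/]+ does not stop at "?".

```javascript
// A capture group inside a repeated group keeps only its LAST repetition,
// so group 2 ends up holding the final "/..." segment.
const SEGMENTS = /(\/([^\/]+))+/;

// Hypothetical helper: null when the string contains no "/segment" at all.
function lastByRepetition(str) {
  const m = SEGMENTS.exec(str);
  return m ? m[2] : null;
}

console.log(lastByRepetition("http://www.host.com/aaa/bbb/ccc/dd.pp"));              // dd.pp
console.log(lastByRepetition("www.mydomain.com/anything_can_be_here/thankyoupage")); // thankyoupage
// A query string stays attached to the last segment:
console.log(lastByRepetition("http://example.com/where/is?x=1"));                     // is?x=1
```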

Try this:
var str = "www.mydomain.com/other/other/this";
var path = /(?:https?:\/\/)?(?:www\.)?.*\/([^\/]+)/.exec(str)[1]; //this
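The one-liner above indexes [1] straight off exec(), which throws a TypeError when exec() returns null (for example, a string with no "/" in it). A guarded version, as a sketch with a hypothetical helper name:

```javascript
// Same pattern as the answer; group 1 is the text after the last "/".
const TERM = /(?:https?:\/\/)?(?:www\.)?.*\/([^\/]+)/;

// Hypothetical helper: check for null before reading index 1.
function lastTerm(str) {
  const m = TERM.exec(str);
  return m ? m[1] : null;
}

console.log(lastTerm("www.mydomain.com/other/other/this")); // this
console.log(lastTerm("thankyoupage"));                      // null
```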

Hope this is what you want:
console.log(window.location.pathname.split('/').reverse()[0]);
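The same split-on-"/" approach works for any absolute URL if new URL(...) stands in for window.location. This sketch (hypothetical helper name) uses pop(), which returns the last element without reversing the array, and shows that a trailing slash leaves an empty last piece.

```javascript
// Split the path on "/" and keep the last piece.
function lastPieceOfPath(href) {
  return new URL(href).pathname.split("/").pop();
}

console.log(lastPieceOfPath("http://www.mydomain.com/anything_can_be_here/thankyoupage"));  // thankyoupage
// A trailing slash produces an empty string as the last piece:
console.log(JSON.stringify(lastPieceOfPath("http://www.mydomain.com/anything_can_be_here/thankyoupage/"))); // ""
```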

Alright, figured it out myself, thanks anyway guys:
/\/*\/thanks/
will match /thanks
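Since \/* allows zero or more slashes, this pattern behaves like /\/thanks/, and because it is not anchored it also matches longer segments that merely start with "thanks". The anchored variant below is one possible tightening, based on the assumption that only a final "thanks" segment should count; the thanksgiving-sale URL is made up.

```javascript
// The pattern from the answer: matches "/thanks" anywhere in the string.
const asked = /\/*\/thanks/;
console.log(asked.test("www.mydomain.com/anything_can_be_here/thanks")); // true
console.log(asked.test("www.mydomain.com/thanksgiving-sale"));          // true

// Assumed tightening: "thanks" must be the final segment, optional trailing slash.
const anchored = /\/thanks\/?$/;
console.log(anchored.test("www.mydomain.com/anything_can_be_here/thanks")); // true
console.log(anchored.test("www.mydomain.com/thanksgiving-sale"));          // false
```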

Related

Regex ignore first 12 characters from string

I'm trying to create a custom filter in Google Analytics to remove the parts of the URL query that I don't want to see. The URL has the following structure:
[domain]/?p=899:2000:15018702722302::NO:::
I would like to create a regex that skips the first 12 characters (that is, up to /?p=899:2000) and replaces whatever comes after them with nothing.
So I made this one: https://regex101.com/r/Xgbfqz/1 (which could be simplified to .{0,12}), but I would actually like to skip those characters and have the regex match only whatever comes after them, so that I can tell Google Analytics to replace it with "".
The part of the URL that is always the same is
?p=[3numbers]:[0-4numbers]
Thank you
Your regular expression:
\/\?p=\d{3}\:\d{0,4}(.*)
Tested with Golang (RE2) and on Regex101.
It searches for /?p=###: followed by 0 to 4 digits, and captures the rest of the string to the right.
(extra) JavaScript:
var paragraf = '[domain]/?p=899:2000:15018702722302::NO:::';
var regex= /\/\?p=\d{3}\:\d{0,4}(.*)/;
var match = regex.exec(paragraf);
alert('The rest of the right side of the string: ' + match[1]);
Or, more simply: "[domain]/?p=899:2000:15018702722302::NO:::".substr(12)
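substr(12) counts from the start of the whole string, so with the literal "[domain]" placeholder, 8 of the 12 characters are used up by the domain itself. Counting from "/?p=" instead avoids that; this sketch still assumes the second number always has exactly 4 digits, which the regex answers do not need to assume.

```javascript
const s = "[domain]/?p=899:2000:15018702722302::NO:::";

// Counting from the start of the string: "[domain]/?p=" is exactly 12 characters.
console.log(s.substr(12)); // 899:2000:15018702722302::NO:::

// Counting the 12 characters from "/?p=" keeps the domain's length out of it.
const start = s.indexOf("/?p=");
const rest = start === -1 ? "" : s.slice(start + 12);
console.log(rest); // :15018702722302::NO:::
```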
You can try this:
/\?p\=\d{3}:\d{0,4}
It matches only ?p=[3numbers]:[0-4numbers].
Not sure about the replacing part, though.
https://regex101.com/r/Xgbfqz/1
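For the replacing part, one option is to capture the fixed-shape prefix, let .* consume everything after it, and put only the prefix back with $1. This is a JavaScript sketch; whether a particular Google Analytics filter accepts the same $1 syntax is not established in this thread.

```javascript
const hit = "[domain]/?p=899:2000:15018702722302::NO:::";

// Group 1 captures ?p=<3 digits>:<0-4 digits>; .* swallows everything after it.
// Replacing the whole match with "$1" keeps the prefix and drops the rest.
const cleaned = hit.replace(/(\?p=\d{3}:\d{0,4}).*/, "$1");
console.log(cleaned); // [domain]/?p=899:2000
```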

Remove a directory from a URL with regex

I have a site with the following URL structure in places:
www.sitename.com/folder/sub_folder/item
What I need to do is remove the sub_folder part so that it displays as:
www.sitename.com/folder/item
Is there a regex expression for that?
Not sure if there's a regex for that, but you should be able to parse it: split on '/', drop the second-to-last piece (sub_folder), then recombine the rest with '/'.
So, in C#, if string s = "www.sitename.com/folder/sub_folder/item";
string[] parts = s.Split('/');
string newUrl = string.Join("/", parts, 0, parts.Length - 2) + "/" + parts[parts.Length - 1];
or something along those lines.
In Python, you could do:
import re
str1 = "www.sitename.com/folder/sub_folder/item"
str2 = re.sub(r'/\w*(/\w*)$', r'\1', str1)  # 'www.sitename.com/folder/item'
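The Python substitution carries over directly to JavaScript's replace(). This sketch swaps \w* for [^\/]* (a change made here, not in the answer), because \w does not match "-", so a segment such as "sub-folder" would otherwise leave the URL untouched.

```javascript
// Matches "/<segment>/<last segment>" at the end of the string; group 1 is "/<last segment>".
// [^\/]* is used instead of \w* so segments containing "-" or "." also match.
const DROP_PARENT = /\/[^\/]*(\/[^\/]*)$/;

const s = "www.sitename.com/folder/sub_folder/item";
console.log(s.replace(DROP_PARENT, "$1")); // www.sitename.com/folder/item

// A hyphenated segment is handled too:
console.log("www.sitename.com/folder/sub-folder/item".replace(DROP_PARENT, "$1")); // www.sitename.com/folder/item
```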

Regular Expression for String without a "?" character to redirect to string with "?" character

On our website we occasionally experience an error where dynamic links aren't building correctly.
URLs like this
https://www.test.url.edu/collections/&edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22&edan_fq[]=p.edanmdm.descriptivenonrepeating.record_id:item_*
Should actually be this:
https://www.test.url.edu/collections/search?edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22&edan_fq[]=p.edanmdm.descriptivenonrepeating.record_id:item_*
We want to create a regular expression to redirect
/collections/&edan_fq[]=
to
/collections/search?edan_fq[]=
But everything after "edan_fq[]=" can change dynamically; there are thousands of permutations of the string after that point.
Does anyone know how this would be done?
If you use \& without the global (g) flag, replace() substitutes only the first match. I've used JavaScript; please check this:
var data = "https://www.test.url.edu/collections/&edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22&edan_fq[]=p.edanmdm.descriptivenonrepeating.record_id:item_*";
var regex = /\&/;
data = data.replace(regex, "search?");
console.log(data);
Please check the substitution example on Regex101.
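Replacing the first "&" fixes a broken link, but if the same rule ever runs on an already-correct link, it rewrites the first "&" inside the query string instead. Anchoring on the broken "/collections/&" sequence avoids that. A sketch; fixCollectionsLink is a hypothetical name and a/b are placeholder filter values.

```javascript
// Only the malformed "/collections/&" sequence is rewritten.
const BROKEN = /\/collections\/&/;

function fixCollectionsLink(url) {
  return url.replace(BROKEN, "/collections/search?");
}

const broken = "https://www.test.url.edu/collections/&edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22";
const correct = "https://www.test.url.edu/collections/search?edan_fq[]=a&edan_fq[]=b";

console.log(fixCollectionsLink(broken));
// https://www.test.url.edu/collections/search?edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22
console.log(fixCollectionsLink(correct) === correct); // true: correct links are left alone

// The unanchored /\&/ would damage the correct link:
console.log(correct.replace(/\&/, "search?")); // ...search?edan_fq[]=asearch?edan_fq[]=b
```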

301 Redirect Regex Pattern - Sitecore Redirect module

I apologize if this seems like a rudimentary question, but I'm trying to set up a redirect pattern for the 301 module in Sitecore and am having a hard time getting the pattern right.
I need to have the following path:
http://www.example.com/some-path/videos/2014/08/08/15/20/some-item-title
converted to:
http://www.example.com/some-path/videos/some-item-title
Basically, I want to strip out the numerical folders. How can I do that while preserving the beginning of the path and the item name at the end?
An https-safe version would be appreciated. Thanks.
EDIT:
Note that the numerical folders always follow the format above: a 4-digit folder followed by four 2-digit folders.
some-path/whatever/4444/22/22/22/22/item-name
This will fit your "less global" requirement.
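using System.Text.RegularExpressions;
// myurl is the incoming URL string.
var pattern = @"^(https?://[^/]*/some-path/videos/)\d{4}/\d{2}/\d{2}/\d{2}/\d{2}/(.*)$";
var match = Regex.Match(myurl, pattern, RegexOptions.IgnoreCase);
var rewrittenUrl = string.Empty;
if (match.Success)
{
    // Keep the prefix up to /videos/ and the item name; drop the dated folders.
    rewrittenUrl = match.Groups[1].Value + match.Groups[2].Value;
}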
Note that I chose to ignore case. This is probably the correct behavior, given that you are dealing with URLs. I edited the original post so that the pattern now matches any host.
Your regex could look like this:
(https?:\/\/.*)\/\d{4}\/(\d{2}\/){4}(.*)
And your substitution:
$1/$3
It preserves https because s? is equivalent to s{0,1} here, and that part is inside the first parenthesized group.
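JavaScript's replace() understands the same $1/$3 substitution syntax, so the pattern can be checked quickly there. Whether the Sitecore 301 module applies the substitution exactly this way is an assumption of this sketch.

```javascript
// Pattern and "$1/$3" substitution from the answer above.
// Group 2 is the repeated (\d{2}\/) group, which is why the item name is $3.
const DATED = /(https?:\/\/.*)\/\d{4}\/(\d{2}\/){4}(.*)/;

for (const url of [
  "http://www.example.com/some-path/videos/2014/08/08/15/20/some-item-title",
  "https://www.example.com/some-path/videos/2014/08/08/15/20/some-item-title",
]) {
  console.log(url.replace(DATED, "$1/$3"));
}
// http://www.example.com/some-path/videos/some-item-title
// https://www.example.com/some-path/videos/some-item-title
```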

How do I remove "http://" from a string in actionscript?

This may seem basic, but I don't know how to do it. Does anybody know?
I have a string that looks like this:
private var url:String = "http://subdomain";
What regex do I need so I can do this:
url.replace(regex,"");
and wind up with this?
trace(url); // subdomain
Or is there an even better way to do it?
Try this:
url = url.replace("http://", "");
Like bedwyr said. :)
This will match only at the beginning of the string and will catch https as well, provided the pattern is a RegExp rather than a string:
url = url.replace(/^https?:\/\//, "");
ActionScript does indeed support a much richer regex repertoire than bedwyr concluded. You just need to pass an actual RegExp, not a string, as the pattern (first) parameter. :-)
var url:String;
url = "https://foo.bar.bz/asd/asdasd?asdasd.fd";
url = url.replace(/^https?:\/\//, "");
To make this perhaps even clearer:
var url:String;
var pattern:RegExp = /^https?:\/\//;
url = "https://foo.bar.bz/asd/asdasd?asdasd.fd";
url = url.replace(pattern, "");
RegExp is a first-class ActionScript type.
Note that you can also use the $ anchor for the end of the string (or of each line, with the m flag) and ( ) to capture substrings for later reuse. Plenty of power there!
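The difference between a string pattern and a RegExp pattern can be seen in JavaScript, whose String replace() follows the ECMAScript rules that ActionScript 3's String.replace is modeled on; treat that equivalence as an assumption and verify it in Flash/AIR if it matters. The last lines illustrate the capture groups and $ anchor mentioned above.

```javascript
const url = "https://foo.bar.bz/asd/asdasd?asdasd.fd";

// A string pattern is matched literally: this searches for the text "^https?://",
// finds nothing, and returns the string unchanged.
console.log(url.replace("^https?://", "")); // https://foo.bar.bz/asd/asdasd?asdasd.fd

// A RegExp pattern is interpreted: ^ anchors at the start and s? makes the "s" optional.
// replace() returns a new string and leaves url unchanged, so the result is stored separately.
const stripped = url.replace(/^https?:\/\//, "");
console.log(stripped); // foo.bar.bz/asd/asdasd?asdasd.fd

// Parentheses capture substrings; $ anchors at the end of the string.
const m = url.match(/^https?:\/\/([^\/]+)(.*)$/);
console.log(m[1]); // foo.bar.bz
console.log(m[2]); // /asd/asdasd?asdasd.fd
```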