RegEx to cut out URL


I'm trying to get a URL from a string of the following format:
RANDOMRUBBISHhttps://www.my-url.com/randomfirstname_randomlastnameRANDOMRUBBISH
I already tried a few things, especially lookbehind/lookahead, which I've used successfully before on another URL format (starting with https... and ending in .html; that one worked).
But it seems I'm too stupid to figure out the regex for the kind of string above. I just want the URL part, from https... to the end of the random last name. Is this even possible?
Any Ideas?

If you can guarantee that randomfirstname_randomlastname is all lowercase and RANDOMRUBBISH is all uppercase, you can use the character classes [a-z] and [A-Z] to tell them apart. The underscore between the names has to be allowed explicitly, since it is not in [a-z]. How you use these depends on the language the regex is for.
This example works in JavaScript:
const str = "RANDOMRUBBISHhttps://www.my-url.com/randomfirstname_randomlastnameRANDOMRUBBISH";
const match = /https:\/\/www\.my-url\.com\/[a-z_]*/.exec(str);
console.log(match[0]); // https://www.my-url.com/randomfirstname_randomlastname
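A slightly more general sketch, in case the domain shouldn't be hard-coded. It keeps the same assumption as above (the URL, host included, consists of lowercase letters, digits, dots, hyphens and underscores, while the rubbish around it is uppercase); `extractUrl` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: pull the first lowercase https URL out of a string.
// Assumption: the surrounding "rubbish" is uppercase, the URL is lowercase.
function extractUrl(str) {
  // https:// + host (lowercase letters, digits, dots, hyphens)
  // + "/" + path (lowercase letters and underscores, e.g. first_last)
  const pattern = /https:\/\/[a-z0-9.-]+\/[a-z_]+/;
  const match = pattern.exec(str);
  // exec() returns null when nothing matches; match[0] is the whole match
  return match ? match[0] : null;
}

const input =
  "RANDOMRUBBISHhttps://www.my-url.com/randomfirstname_randomlastnameRANDOMRUBBISH";
console.log(extractUrl(input));
// https://www.my-url.com/randomfirstname_randomlastname
```

The case assumption is what lets the regex find where the last name ends; if the rubbish could also start with lowercase letters, no pattern could tell the two apart without some other delimiter.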

Related

Using RegEx to extract a string in a URL

I've tried to search for this and I'm sure versions of this question have been asked, but I haven't been able to apply other answers to my case.
I need to use RegEx to extract a random string of characters and symbols that appears in the URL when an advertiser sends traffic to me.
The referring URL looks something like this, with the part I want to extract in bold:
https://adclick.g.doubleclick.net/pcs/click%**long-string-of-characters-and-symbols**https://www.mywebsite.com
That long string of characters and symbols (the hash) contains multiple % signs, so I need the entire string after the first % sign but before my website's URL.
I've been pulling my hair out on this and any help would be appreciated!

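You can use:
(?<=%).*(?=https)
How it works:
(?<=%) Positive lookbehind: the match must start at a position preceded by %. Because the regex engine returns the leftmost match, that is right after the first %.
.* matches everything in between.
(?=https) Positive lookahead: the match must be followed by https. Because .* is greedy, the match ends before the last https in the string (here, the start of your own URL). Use .*? instead if you want to stop at the first https.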
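A quick way to try this in JavaScript, which has supported lookbehind since ES2018. The hash in `referrer` is made up for illustration; the second string shows how the greedy and lazy versions differ when https occurs more than once after the %.

```javascript
// Made-up referrer; the real hash is whatever the ad server sends.
const referrer =
  "https://adclick.g.doubleclick.net/pcs/click%C7%AB%2Fxyz%3Fhttps://www.mywebsite.com";

// Greedy: starts right after the first %, ends before the LAST "https".
const greedy = /(?<=%).*(?=https)/;
// Lazy: starts right after the first %, ends before the FIRST "https".
const lazy = /(?<=%).*?(?=https)/;

console.log(referrer.match(greedy)[0]); // C7%AB%2Fxyz%3F

const twoHttps = "click%AAhttpsBBhttps://site";
console.log(twoHttps.match(greedy)[0]); // AAhttpsBB
console.log(twoHttps.match(lazy)[0]); // AA
```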

Regex ignore first 12 characters from string

I'm trying to create a custom filter in Google Analytics to remove the parts of the URL query that I don't want to see. The URL has the following structure:
[domain]/?p=899:2000:15018702722302::NO:::
I would like to create a regex that skips the first 12 characters (that is, up to /?p=899:2000) and replaces whatever comes after that with nothing.
So I made this one: https://regex101.com/r/Xgbfqz/1 (which could be simplified to .{0,12}), but I'd actually like to skip those characters and have the regex match only what comes after them, so that I can tell Google Analytics to replace it with "".
The part of the URL that always has the same shape is:
?p=[3numbers]:[0-4numbers]
Thank you

Your regular expression:
\/\?p=\d{3}\:\d{0,4}(.*)
Tested with Golang (RE2) and regex101.
It searches for /?p=, three digits, a colon and up to four more digits, and captures everything to the right of that in group 1.
(Extra) JavaScript:
const paragraf = '[domain]/?p=899:2000:15018702722302::NO:::';
const regex = /\/\?p=\d{3}\:\d{0,4}(.*)/;
const match = regex.exec(paragraf);
console.log('The rest of the right side of the string: ' + match[1]);
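Because the goal is to keep /?p=899:2000 and drop the rest, another option is to capture the part to keep and write it back with $1 in the replacement. This is a JavaScript sketch with a hypothetical `stripTail` helper; the back-reference syntax a Google Analytics filter field expects may differ and is not covered here.

```javascript
// Hypothetical helper: keep "/?p=<3 digits>:<0-4 digits>" and drop the rest.
function stripTail(url) {
  // Group 1 is the part to keep; the trailing .* is the part thrown away.
  return url.replace(/(\/\?p=\d{3}:\d{0,4}).*/, "$1");
}

const sample = "[domain]/?p=899:2000:15018702722302::NO:::";
console.log(stripTail(sample)); // [domain]/?p=899:2000

// Pitfall: with a lookbehind, \d{0,4} may match zero digits, so the earliest
// match starts right after "899:" and "2000" is removed as well.
console.log(sample.replace(/(?<=\/\?p=\d{3}:\d{0,4}).*/, "")); // [domain]/?p=899:
```

This is why matching only the tail with a lookbehind is riskier here than capturing the prefix.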

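Or, without a regex: "[domain]/?p=899:2000:15018702722302::NO:::".slice(12) returns everything from index 12 on. Note that the index counts from the start of the whole string, domain included, so this only works if the domain always has the same length.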

You can try this:
/\?p\=\d{3}:\d{0,4}
This matches only ?p=[3numbers]:[0-4numbers].
I'm not sure about the replacing part, though.
https://regex101.com/r/Xgbfqz/1

Regular Expression for String without a "?" character to redirect to string with "?" character

On our website we occasionally experience an error where dynamic links aren't building correctly.
URLs like this one:
https://www.test.url.edu/collections/&edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22&edan_fq[]=p.edanmdm.descriptivenonrepeating.record_id:item_*
should actually be:
https://www.test.url.edu/collections/search?edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22&edan_fq[]=p.edanmdm.descriptivenonrepeating.record_id:item_*
We want to create a regular expression to redirect
/collections/&edan_fq[]=
to
/collections/search?edan_fq[]=
But everything after "edan_fq[]=" can change dynamically; there are thousands of permutations of the string after that point.
Does anyone know how this would be done?

Without the global (g) flag, replace() only replaces the first match, so a regex for & changes just the first & in the URL. I've used JavaScript; please check this:
const data = "https://www.test.url.edu/collections/&edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22&edan_fq[]=p.edanmdm.descriptivenonrepeating.record_id:item_*";
const regex = /&/;
const fixed = data.replace(regex, "search?");
console.log(fixed);
You can also try the substitution on regex101.
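The /&/ approach works for the sample URL because its first & is the broken one, but it would rewrite the wrong character if an & ever appeared earlier in the URL. Here is a sketch that anchors on the literal /collections/&edan_fq[]= prefix instead. `fixCollectionsUrl` is a hypothetical name, and turning this into an actual server redirect rule depends on your web server and is not shown.

```javascript
// Hypothetical helper: rewrite ".../collections/&edan_fq[]=..." into
// ".../collections/search?edan_fq[]=...", leaving everything after it intact.
function fixCollectionsUrl(url) {
  // Anchoring on the literal prefix leaves every other "&" untouched.
  // No g flag: only the first occurrence is rewritten.
  return url.replace(/\/collections\/&edan_fq\[\]=/, "/collections/search?edan_fq[]=");
}

const broken =
  "https://www.test.url.edu/collections/&edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22&edan_fq[]=p.edanmdm.descriptivenonrepeating.record_id:item_*";
console.log(fixCollectionsUrl(broken));
```

Already-correct URLs contain no /collections/& and pass through unchanged, so the rewrite is safe to apply more than once.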

Regex URI portion: Remove hyphens

I have to split URIs on the second portion:
/directory/this-part/blah
The issue I'm facing is that I have 2 URIs which logically need to be one
/directory/house-&-home/blah
/directory/house-%26-home/blah
This comes back as:
house-&-home and house-%26-home
So logically I need a regex to retrieve the second portion but also remove everything between the hyphens.
I have this, so far:
/[^(/;\?)]*/([^(/;\?)]*).*

(?<=directory\/)(.+?)(?=\/)
Does this solve your issue? This returns:
house-&-home and house-%26-home
If you want to get the result:
house--home
for both URIs, add a replace step. Since I'm not sure which language you are using, here's an example in Java:
import java.util.regex.*;
Matcher m = Pattern.compile("(?<=directory/)(.+?)(?=/)").matcher("/directory/house-&-home/blah");
// Take the second segment, then replace the hyphen-delimited middle ("-&-" or "-%26-") with "--"
String portion = m.find() ? m.group(1).replaceAll("-[^-]*-", "--") : null;
The replaceAll call removes whatever sits between the two hyphens (& or %26), so both URIs produce house--home.
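The same idea as a JavaScript sketch, with both URIs from the question. `segmentKey` and `decodedSegment` are hypothetical helper names; the second one is an alternative technique (percent-decoding the segment rather than deleting the middle), included in case keeping the & is preferable.

```javascript
// Hypothetical helper: return the second path segment ("this-part" in
// /directory/this-part/blah) with the text between the hyphens removed.
function segmentKey(uri) {
  const m = uri.match(/^\/[^\/]+\/([^\/]+)/);
  if (!m) return null;
  // "house-&-home" and "house-%26-home" both become "house--home"
  return m[1].replace(/-[^-]*-/, "--");
}

// Alternative: percent-decode instead, so both become "house-&-home".
// decodeURIComponent throws a URIError on malformed sequences such as "%zz".
function decodedSegment(uri) {
  const m = uri.match(/^\/[^\/]+\/([^\/]+)/);
  return m ? decodeURIComponent(m[1]) : null;
}

console.log(segmentKey("/directory/house-&-home/blah")); // house--home
console.log(segmentKey("/directory/house-%26-home/blah")); // house--home
console.log(decodedSegment("/directory/house-%26-home/blah")); // house-&-home
```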

django url matching non characters and characters

Suppose I have this URL pattern:
url(r'^delete_group/(\w+)/', 'delete_group_view',name='delete_group')
In a template,
{% url 'delete_group' 'mwas' %} works, but
{% url 'delete_group' 'mwas 45' %} does not. Is there any way to modify the URL pattern to accept both mwas and mwas 45?

The issue might be your regex. The URL example you're showing has a space in it, and \w doesn't match spaces. Try this instead: r'^delete_group/([\w\s]+)/', which allows one or more word characters or whitespace characters.
However, note that spaces aren't valid in URLs and will likely be percent-encoded (e.g. as %20). It's common practice to use hyphens where you would otherwise put a space.
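To see why \w+ rejects mwas 45, here is a quick check of three candidate patterns. It uses JavaScript regexes, whose \w and \s behave like Python's re for plain ASCII input like this; Django itself is not involved. The hyphen pattern and the mwas-45 path are hypothetical, following the hyphen advice above.

```javascript
const patterns = {
  original: /^delete_group\/(\w+)\//, // word characters only
  withSpaces: /^delete_group\/([\w\s]+)\//, // word characters or whitespace
  slug: /^delete_group\/([\w-]+)\//, // word characters or hyphens
};

const paths = ["delete_group/mwas/", "delete_group/mwas 45/", "delete_group/mwas-45/"];

for (const [name, re] of Object.entries(patterns)) {
  // One true/false per path, in the order listed above
  console.log(name, paths.map((p) => re.test(p)));
}
```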
