How can I get the path from a URL with regex? - regex

Can somebody help me with this regex?
.*\:\/\/(?:www.)?([^\/]+)(\/.+")
I need to get the path from each URL. I tried, but I can't match only the path without the trailing quotation mark.
https://regex101.com/r/J6nILD/6

You can get the path using a JSR223 Sampler with Groovy code.
Declare or retrieve the URL variable.
Parse that URL to get the protocol, host, port and path: add a JSR223 Sampler and paste the following code into the Script area.
URL url1 = new URL(vars.get('url'));
vars.put('protocol', url1.getProtocol());
vars.put('host', url1.getHost());
vars.put('port', url1.getPort() as String); // getPort() returns -1 when the URL has no explicit port
vars.put('path', url1.getPath());
vars.put('query', url1.getQuery());
Use those variables anywhere in the script with the ${...} syntax, for example ${path}.

If you have to first scan for a URL:
var regex = /\b((https?|ftp):\/\/[^\/]+)?(\/?[^?#\s]*)\b/i;
var s = 'http://example.com/a/b?x=1';
var result = regex.exec(s);
console.log(result[3]); // "/a/b"
This is a deliberately simplified regex that might work in your context, but you may have to modify it to take additional context into account. For example, x is a valid path and this regex will recognize it as such. But if you are looking for the path in a string such as <img src="x">, it will also recognize img as a valid URL path. In that case, you would perhaps want:
/<img\s+src="((https?|ftp):\/\/[^\/]+)?(\/?[^?#\s"]*)/i
If the protocol and host portion of the URL are always present, it becomes easier to distinguish URLs in just about any context by making the protocol and host non-optional:
/\b((https?|ftp):\/\/[^\/]+)(\/?[^?#\s]*)\b/i;
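Since the question asks for all paths, the non-optional variant can be combined with the global flag to scan a whole block of text. The following is only a sketch: the function name extractPaths, the sample string, and the extra excluded characters (quotes and angle brackets, so that a match stops at the end of an HTML attribute) are assumptions for illustration, not part of the original answer.

```javascript
// Collect the path of every absolute http(s)/ftp URL found in a block of text.
function extractPaths(text) {
  // Host: everything up to the first "/", whitespace, quote or angle bracket.
  // Group 1 (optional): the path, stopping before "?", "#", whitespace or a quote.
  const pattern = /\b(?:https?|ftp):\/\/[^\/\s"'<>]+(\/[^?#\s"'<>]*)?/gi;
  const paths = [];
  for (const match of text.matchAll(pattern)) {
    // A URL without any path component is reported as "/".
    paths.push(match[1] || "/");
  }
  return paths;
}

const sample =
  '<a href="https://www.example.com/a/b?x=1">x</a> <img src="http://example.org/img/logo.png">';
console.log(extractPaths(sample)); // [ '/a/b', '/img/logo.png' ]
```

Because the character classes exclude quotes, the closing quotation mark of an attribute is never part of the captured path.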

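You could go for something like the following. The backslashes appear to be doubled for use inside a string literal, so use single backslashes if you enter the pattern directly. The path ends up in capture group 3: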
(?:([^:\\/?#]+):)?(?:\\/\\/([^\\/?#]*))?([^?#]*)(?:\\?([^#]*))?(?:#(.*))?
More information:
JMeter: Regular Expressions
Using RegEx (Regular Expression Extractor) with JMeter
Perl 5 Regex Cheat sheet
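The generic pattern from this answer can be tried outside JMeter with a few lines of JavaScript. This is a sketch under assumptions: the helper name splitUri and the sample URL with its query and fragment are made up for illustration, the backslashes are written singly because this is a regex literal, and the ^ and $ anchors are added here so the whole input must match.

```javascript
// Generic URI-splitting pattern: scheme, authority, path, query, fragment.
const URI_PARTS =
  /^(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$/;

function splitUri(uri) {
  const m = URI_PARTS.exec(uri);
  if (!m) return null; // e.g. a fragment containing a line break
  return {
    scheme: m[1],
    authority: m[2],
    path: m[3],
    query: m[4],
    fragment: m[5],
  };
}

const parts = splitUri("https://regex101.com/r/J6nILD/6?x=1#top");
console.log(parts.path); // "/r/J6nILD/6"
```

Every component is optional in this pattern, so it also accepts relative references such as /a/b, for which scheme and authority come back undefined.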

Related

How to replace part of a URL with regex

I need to remove part of a URL with a regex.
Everything from http or https up to and including .com should be removed.
This can occur several times in one string.
Can anyone help me with this?
For example, given the string:
"The request is:https://stackoverflow.com/questions"
After the removal: "The request is:/questions"
The regex that performed the deletion perfectly is @"\w+://[^/$]*", replacing each match with "".
Something like this:
var regex = new Regex(@"\w+:\/\/[^\/$]*");
url = regex.Replace(url, ""); // Replace returns a new string, so assign the result
If you're working with Python, you can use the re.sub() function from the re module. Alternatively, you can use urlparse (from urllib.parse) to extract the different parts of the URL and concatenate the ones you need to the prefix you want.
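The same removal can be sketched in JavaScript, where the g flag makes the replacement apply to every URL in the string. The function name stripOrigins is hypothetical, and the pattern is narrowed to http and https and stops at whitespace, which is an assumption about the input rather than something stated in the question.

```javascript
// Remove the scheme and host of every http(s) URL in a string, keeping the path.
function stripOrigins(text) {
  // g flag: replace every occurrence, not just the first one.
  return text.replace(/https?:\/\/[^\/\s]*/g, "");
}

console.log(stripOrigins("The request is:https://stackoverflow.com/questions"));
// The request is:/questions
```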

Regular expressions (RegEx) to filter string from URLs in Google Analytics

I want to filter a string from the URLs in Google Analytics. This can be done using the Views > Filter > Exclude using RegEx, but I have been unable to get it to work.
An outline of how these filters are set up can be found here. However, I cannot work out how to isolate the string using RegEx. I believe it will need one filter per URL type.
The URLs follow this format:
/software/11F372288FA/pagename
/software/13F412C5FA/pagename/summary
/software/XIL1P0BFXCKM81/pagename2
I need to exclude this part of the URL:
/11F372288FA/
So that the URL data (e.g. Session time) is recorded against:
/software/pagename
/software/pagename/summary
/software/pagename2
I have worked out that I can isolate the string using the following RegEx:
^\/validate\/(..........)\/accounts\/summary$
It is not very elegant and would require a filter for every URL type.
Thanks for the help!
I'm not certain this will work in your exact case, but instead of using regex it might be easier to build a new string from the leading "/software/" and everything from "pagename" to the end. In Java this might look something like:
String newString = oldString.substring(0, 10) + oldString.substring(oldString.indexOf("pagename")); // 10 = length of "/software/"
Note that this only works if the "/software/" prefix is always the same length and you are only ever excluding what lies between "/software/" and "pagename".
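If the page names vary, a single regex with a capture group can express the same idea without relying on "pagename". The sketch below is plain JavaScript, not Google Analytics filter syntax. It assumes the ID segment consists only of uppercase letters and digits, as in the three sample URLs, and stripIdSegment is a hypothetical name.

```javascript
// Drop the ID segment that directly follows "/software/".
function stripIdSegment(path) {
  // Group 1 keeps the "/software/" prefix; the ID and its trailing slash are removed.
  // Assumption: IDs contain only A-Z and 0-9, so lowercase page names are left alone.
  return path.replace(/^(\/software\/)[A-Z0-9]+\//, "$1");
}

console.log(stripIdSegment("/software/11F372288FA/pagename"));
console.log(stripIdSegment("/software/13F412C5FA/pagename/summary"));
console.log(stripIdSegment("/software/XIL1P0BFXCKM81/pagename2"));
```

A path that has no ID segment, such as /software/pagename, is returned unchanged.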

Regex for this URL, http://www.chip.de and this domain chip.de

I am trying to create a regex that matches URLs and domains like the ones below:
*chip.de
http://www.chip.de*
I tried the regular expression
http?:\/\/([\w\.-]+)([\/\w \.-]*)
It did not capture the URL.
I used https://www.regextester.com/99497 to test it, and it failed.
What am I missing?
I need two rules, one for the domain and one for the URL.
Thank you
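If you're simply looking for a regex that matches URLs which include chip.de, please try the pattern below and let me know if it is sufficient. Note that http? in your pattern makes the p optional rather than the s, so it can never match https:// URLs; https? is what you want: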
https?\:\/\/www\.chip\.de.*
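The question asks for two rules, one for the URL and one for the bare domain. A possible pair is sketched below in JavaScript. The constant names are hypothetical, and allowing any subdomain (not only www) is an assumption about what "similar" means here.

```javascript
// Rule 1: an http(s) URL whose host is chip.de or any subdomain of it.
const CHIP_URL = /^https?:\/\/(?:[a-z0-9-]+\.)*chip\.de(?:[:\/?#].*)?$/i;

// Rule 2: the bare domain chip.de or any subdomain of it.
const CHIP_DOMAIN = /^(?:[a-z0-9-]+\.)*chip\.de$/i;

console.log(CHIP_URL.test("http://www.chip.de"));     // true
console.log(CHIP_URL.test("http://notchip.de"));      // false
console.log(CHIP_DOMAIN.test("chip.de"));             // true
console.log(CHIP_DOMAIN.test("chip.de.example.com")); // false
```

Anchoring both patterns and requiring a delimiter or the end of input after chip.de keeps look-alike hosts such as notchip.de or chip.de.example.com from matching.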

Regular Expression for String without a "?" character to redirect to string with "?" character

On our website we occasionally experience an error where dynamic links aren't building correctly.
URLs like this
https://www.test.url.edu/collections/&edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22&edan_fq[]=p.edanmdm.descriptivenonrepeating.record_id:item_*
Should actually be this:
https://www.test.url.edu/collections/search?edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22&edan_fq[]=p.edanmdm.descriptivenonrepeating.record_id:item_*
We want to create a regular expression to redirect
/collections/&edan_fq[]=
to
/collections/search?edan_fq[]=
But everything after "edan_fq[]=" can change dynamically--there are thousands of permutations of the string after that point.
Does anyone know how this would be done?
If you use \& without the global flag, only the first match is replaced. Here is an example in JavaScript:
var data = "https://www.test.url.edu/collections/&edan_fq[]=p.edanmdm.indexedstructured.object_type:%22Financial+records%22&edan_fq[]=p.edanmdm.descriptivenonrepeating.record_id:item_*";
var regex = /\&/; // no g flag, so only the first & is replaced
data = data.replace(regex,"search?");
console.log(data);
Please check the Substitution example on Regex101.
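Replacing the first & anywhere in the URL works for the sample, but it would also rewrite URLs that are already correct. A more tightly anchored variant is sketched below. The function name fixCollectionsUrl and the shortened sample values are assumptions, and the actual redirect would still have to be configured in whatever server or CMS handles the site, which the question does not specify.

```javascript
// Rewrite "/collections/&edan_fq[]=" to "/collections/search?edan_fq[]=".
function fixCollectionsUrl(url) {
  // The lookahead checks that "edan_fq[]=" follows without consuming it,
  // so everything after that point is left untouched.
  return url.replace(/\/collections\/&(?=edan_fq\[\]=)/, "/collections/search?");
}

const broken = "https://www.test.url.edu/collections/&edan_fq[]=a&edan_fq[]=b";
console.log(fixCollectionsUrl(broken));
// https://www.test.url.edu/collections/search?edan_fq[]=a&edan_fq[]=b
```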

Replacing Anchor in JSTL

In my JSP I receive some data from the database, for example something like this:
Google is the greatest search engine ever http://www.google.com
What I want to do is simple: wrap this link in an anchor tag using JSTL, something like:
Google is the greatest search engine ever <a href="http://www.google.com">http://www.google.com</a>
That's all!
Note that the URLs are not constant; I don't know in advance exactly what they will be, and I only mention Google here as an example.
Follow this SO question to create a replaceAll function for JSTL, and then use the following pattern to turn each URL into an HTML link:
String pattern = "(https?:[A-Za-z0-9._/~%-]+)"; // A-Za-z instead of A-z, which also matches characters such as [ ] ^
String str = "Google is the http://www.test.com greatest search engine ever http://www.google.com";
String replaced = str.replaceAll(pattern, "<a href='$1'>$1</a>");