How to replace part of a URL with regex


I need to remove part of a URL with a regex: everything from http or https up to and including .com.
This can occur several times in one string.
Can anyone help me with this?
For example, given the string:
"The request is:https://stackoverflow.com/questions"
the result after the removal should be: "The request is:/questions"

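The regex that performed the deletion perfectly is @"\w+://[^/$]*", replacing each match with "".
Something like this:
using System.Text.RegularExpressions;
var regex = new Regex(@"\w+:\/\/[^\/$]*");
// Replace returns a new string; the input string is not modified in place.
var result = regex.Replace(url, "");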
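For comparison, here is a minimal JavaScript sketch of the same idea, assuming only http and https URLs need stripping, as the question describes. The function name stripOrigin is made up for illustration, and the character class also stops at whitespace, which is a small change from the pattern above.

```javascript
// Hypothetical helper: remove "http(s)://host" from every URL in a string.
function stripOrigin(text) {
  // The g flag makes replace() handle every occurrence, not just the first.
  return text.replace(/https?:\/\/[^\/\s]*/g, '');
}

console.log(stripOrigin('The request is:https://stackoverflow.com/questions'));
// prints "The request is:/questions"
```

Without the g flag only the first URL in the string would be stripped.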

You can use the re.sub() function from Python's re module. Alternatively, if you're working with Python, you can use urlparse (urllib.parse in Python 3, the urlparse module in Python 2) to extract the different parts of the URL and concatenate the parts you need onto the prefix you want.
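The parsing approach can also be sketched in JavaScript, using the built-in URL class in place of Python's urlparse. This is only an illustration: swapOrigin is a hypothetical name, and the input is assumed to be a single absolute URL rather than free text.

```javascript
// Hypothetical helper: keep path, query and fragment, but swap the origin for a new prefix.
function swapOrigin(urlString, prefix) {
  const url = new URL(urlString); // throws a TypeError if urlString is not an absolute URL
  return prefix + url.pathname + url.search + url.hash;
}

console.log(swapOrigin('https://stackoverflow.com/questions?tab=newest', ''));
// prints "/questions?tab=newest"
```

Passing an empty prefix simply drops the scheme and host, which is what the question asks for.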

Related

How i can get PATH from URL with Regex?

Maybe somebody can help me with this regex?
.*\:\/\/(?:www.)?([^\/]+)(\/.+")
I need to get all paths from URLs. I tried, but I can't match only the path without the trailing quotation mark.
https://regex101.com/r/J6nILD/6

You can get the path using a JSR223 Sampler with Groovy code.
Declare or get the URL variable.
Parse that URL to get the protocol, host, port and path. Add a JSR223 Sampler and paste the following code into the Script area:
URL url1 = new URL(vars.get('url'));
vars.put('protocol', url1.getProtocol());
vars.put('host', url1.getHost());
vars.put('port', url1.getPort() as String);
vars.put('path', url1.getPath());
vars.put('query', url1.getQuery());
Use those variables anywhere in the script using ${}

If you have to first scan for a URL:
I've attempted to provide a simple (overly simplified) regex that might work in your context, but you might have to modify it to account for additional context. For example, x is a valid path, and this regex will recognize it as such. But if you are looking for the path in a string such as <img src="x">, it will also recognize img as a valid URL path. In that case, you would perhaps want:
/<img\s+src="((https?|ftp):\/\/[^\/]+)?(\/?[^?#\s"]*)/i
var regex = /\b((https?|ftp):\/\/[^\/]+)?(\/?[^?#\s]*)\b/i;
var s = 'http://example.com/a/b?x=1';
var result = regex.exec(s);
console.log(result[3]);
If the protocol and host portion of the URL are always present, it becomes easier to distinguish URLs in just about any context by making the protocol and host mandatory:
/\b((https?|ftp):\/\/[^\/]+)(\/?[^?#\s]*)\b/i;
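As a rough sketch of how the mandatory-host variant could pull every path out of a longer text, the snippet below adds the g flag and uses matchAll. The helper name extractPaths and the sample text are assumptions, and the host class additionally excludes whitespace so that a URL without a path does not swallow the words that follow it.

```javascript
// Protocol and host are mandatory; group 3 is the path.
const urlPattern = /\b((https?|ftp):\/\/[^\/\s]+)(\/?[^?#\s]*)\b/gi;

// Hypothetical helper: return the path of every URL found in the text.
function extractPaths(text) {
  return Array.from(text.matchAll(urlPattern), (m) => m[3]);
}

console.log(extractPaths('see http://example.com/a/b?x=1 and https://foo.org/c/d.html now'));
// prints [ '/a/b', '/c/d.html' ]
```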

You could go for something like:
(?:([^:\\/?#]+):)?(?:\\/\\/([^\\/?#]*))?([^?#]*)(?:\\?([^#]*))?(?:#(.*))?
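The pattern above closely resembles the URI-splitting regex given in RFC 3986, Appendix B, with the outer groups made non-capturing. A minimal JavaScript sketch of using it follows; the doubled backslashes become single ones in a regex literal, the leading ^ anchor is an addition, and the names URI_PARTS and splitUri are hypothetical.

```javascript
// Same pattern as above, written as a JavaScript regex literal and anchored at the start.
// Groups: 1 = scheme, 2 = authority (host), 3 = path, 4 = query, 5 = fragment.
const URI_PARTS = /^(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?/;

// Hypothetical helper: split a URI string into its five components.
function splitUri(uri) {
  const [, scheme, authority, path, query, fragment] = URI_PARTS.exec(uri);
  return { scheme, authority, path, query, fragment };
}

console.log(splitUri('http://example.com/a/b?x=1#top').path);
// prints "/a/b"
```

Every part of the pattern is optional, so it matches any string; components that are absent come back as undefined.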
More information:
JMeter: Regular Expressions
Using RegEx (Regular Expression Extractor) with JMeter
Perl 5 Regex Cheat sheet

Regex ignore first 12 characters from string

I'm trying to create a custom filter in Google Analytics to remove the query parts of the URL which I don't want to see. The URL has the following structure:
[domain]/?p=899:2000:15018702722302::NO:::
I would like to create a regex which skips the first 12 characters (that is, up to and including /?p=899:2000) and replaces whatever comes after that with nothing.
So I made this one: https://regex101.com/r/Xgbfqz/1 (which could be simplified to .{0,12}), but I would actually like to skip those characters and have the regex match only whatever comes after them, so that I can tell Google Analytics to replace it with "".
The part in the url that is always the same is
?p=[3numbers]:[0-4numbers]
Thank you

Your regular expression:
\/\?p=\d{3}\:\d{0,4}(.*)
Tested in Golang RegEx 2 and RegEx101.
It searches for /?p=###:[0-4 digits] and captures the rest of the string to the right.
(extra) JavaScript:
var paragraf = '[domain]/?p=899:2000:15018702722302::NO:::';
var regex = /\/\?p=\d{3}\:\d{0,4}(.*)/;
var match = regex.exec(paragraf);
alert('The rest of the right side of the string: ' + match[1]);

You can simply use "[domain]/?p=899:2000:15018702722302::NO:::".substr(12)

You can try this:
/\?p\=\d{3}:\d{0,4}
Which matches just this: ?p=[3numbers]:[0-4numbers]
Not sure about replacing though.
https://regex101.com/r/Xgbfqz/1
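On the replacing part, one common approach is to capture the prefix you want to keep and write it back with a backreference, so that only the tail disappears. The sketch below shows this in plain JavaScript; trimQuery is a hypothetical name, and whether a Google Analytics filter accepts the same pattern and replacement syntax is an assumption to verify there.

```javascript
// Hypothetical helper: keep everything up to "/?p=<3 digits>:<0-4 digits>" and drop the rest.
function trimQuery(url) {
  // Group 1 is the part to keep; the trailing .* consumes everything after it.
  return url.replace(/(\/\?p=\d{3}:\d{0,4}).*/, '$1');
}

console.log(trimQuery('[domain]/?p=899:2000:15018702722302::NO:::'));
// prints "[domain]/?p=899:2000"
```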

How to extract FirstName and LastName from html tags with regex?

I have a response body which contains
"<h3 class="panel-title">Welcome
First Last </h3>"
I want to fetch 'First Last' as output.
The regular expressions I have tried are
"Welcome(\s*([A-Za-z]+))(\s*([A-Za-z]+))"
"Welcome \s*([A-Za-z]+)\s*([A-Za-z]+)"
but I am not able to get the result. If I remove the newline and take it as
"<h3 class="panel-title">Welcome First Last </h3>", it is detected in an online regex tester.

I suspect your problem is the line break between "Welcome" and the user name. The "single-line mode" flag (?s) makes . match newline characters as well; \s already matches newlines with or without it. Try these:
(?s)Welcome(\s*([A-Za-z]+))(\s*([A-Za-z]+))
(?s)Welcome \s*([A-Za-z]+)\s*([A-Za-z]+)
(This works in JMeter and any other Java- or PHP-based regex engine, but not in JavaScript. In the comments on the question you say you're using JavaScript and also JMeter. If this is a JMeter question, this will help; if it is JavaScript, try one of the other answers.)

Well, usually I don't recommend regex for this kind of work; DOM parsing is better suited to it.
But you can use the following regex to pull out the text:
/(?:<h3.*?>)([^<]+)(?:<\/h3>)/i
See the demo at https://regex101.com/r/wA2sZ9/1
This will extract the first and last names, including the surrounding whitespace, which you can trim afterwards.

In the JMeter Regular Expression Extractor you can use:
<h3 class="panel-title">Welcome(.*?)</h3>
Then take the value using $1$.
In the data you have shown, Welcome is followed by a line break. If that is actually part of the response, then you have to use \n:
<h3 class="panel-title">Welcome\n(.*?)</h3>
Otherwise the first one is enough.
First verify this in JMeter using the regular expression tester on the response body.

Welcome([\s\S]+?)<
Try this, it will definitely work.

Regular expressions are greedy by default. Try this:
Welcome\s*([A-Za-z]+)\s*([A-Za-z]+)
Groups 1 and 2 contain your data.
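A quick way to check that pattern against the response body from the question is a small JavaScript sketch like the one below. The names body and extractName are illustrative, and the line break is written as \n on the assumption that the response contains a plain newline.

```javascript
// Sample body from the question, with the line break after "Welcome".
const body = '<h3 class="panel-title">Welcome\nFirst Last </h3>';
const namePattern = /Welcome\s*([A-Za-z]+)\s*([A-Za-z]+)/;

// Hypothetical helper: return the two captured names, or null when nothing matches.
function extractName(html) {
  const m = namePattern.exec(html);
  return m ? { first: m[1], last: m[2] } : null;
}

const found = extractName(body);
console.log(found.first + ' ' + found.last);
// prints "First Last"
```

Because the separator between the two groups is \s*, a single word after Welcome would be split across both groups; \s+ would be stricter.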

RegEx to cut out URL

I am trying to get a URL from a string of the following format:
RANDOMRUBBISHhttps://www.my-url.com/randomfirstname_randomlastnameRANDOMRUBBISH
I already tried some things, especially lookahead and lookbehind, which I had used successfully before on another URL format (starting with https... and ending with .html).
But it seems I'm too stupid to figure out the regex for the kind of string mentioned above. I just want the URL part, from https... to the end of the random last name. Is this even possible?
Any ideas?

If you can guarantee that randomfirstname_randomlastname is all lowercase and RANDOMRUBBISH is all uppercase, you can use the character classes [a-z] and [A-Z]. The language the regex is for will determine how to use these.
This example works in JavaScript (note the underscore in the character class, since the name part contains one):
var str = "RANDOMRUBBISHhttps://www.my-url.com/randomfirstname_randomlastnameRANDOMRUBBISH";
var match = /https:\/\/www\.my-url\.com\/[a-z_]*/.exec(str);
console.log(match[0]); // https://www.my-url.com/randomfirstname_randomlastname

How can I make this regex for a URL more specific?

I have the following regex that attempts to match URLs:
/((http|https):(([A-Za-z0-9$_.+!*(),;/?:#&~=-])|%[A-Fa-f0-9]{2}){2,}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*(),;/?:#&~=%-]*))?([A-Za-z0-9$_+!*();/?:~-]))/g
How can I modify this regex to only match URLs of a single domain?
For example, I only want to match URLs that begin with http://www.google.com?
This should simplify my regex, but I'm too much of a regex noob to get it working (after all these years...)

Did you write that RegEx? I don't know what it's trying to do, but it certainly doesn't match URLs correctly. Here's something it matches:
http:###9#?~
which I'm pretty sure isn't a valid URL.
You shouldn't be using RegEx to match URLs like this. You haven't said what language you're working in, but use whatever its equivalent of urlparse is.
Here's a relevant question: How do you validate a URL with a regular expression in Python?
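If the language turns out to be JavaScript, the built-in URL class is one equivalent of urlparse. The sketch below restricts matches to a single host by parsing rather than pattern matching; isOnHost is a hypothetical name, and treating only http and https as acceptable schemes is an assumption based on the question.

```javascript
// Hypothetical helper: true when the string parses as an http(s) URL on the expected host.
function isOnHost(candidate, expectedHost) {
  let url;
  try {
    url = new URL(candidate);
  } catch (err) {
    return false; // not a parseable absolute URL
  }
  const isHttp = url.protocol === 'http:' || url.protocol === 'https:';
  return isHttp && url.hostname === expectedHost;
}

console.log(isOnHost('http://www.google.com/search?q=regex', 'www.google.com')); // prints true
console.log(isOnHost('http:###9#?~', 'www.google.com')); // prints false
```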
