Parse This or This Using Regex

I simply can't figure this out and have been trying for a while. I need a regex that will parse data in the following manner:
Let's say I've got input in the following format:
www.google.com
www.google.com/
www.google.com/something
I need a regex that will parse the above three URLs (individually) to a final result of:
www.google.com
www.google.com
www.google.com
However, it needs to match them based on the following rules:
Parse and return everything to the left of a "/" if one exists in the line
Parse and return the entire line if no "/" exists in the line
I'm new to regex, so while this may be simple, I can't figure it out.

Try the following regex:
[^/]*

Substitute /.* with nothing. It doesn't matter if there isn't a / at all, since in that case the regex will not match and the line is left unchanged.
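To compare the two suggestions side by side, here is a minimal sketch in Python (the question doesn't name a language, so Python and the helper names host_by_match and host_by_substitution are assumptions for illustration only):

```python
import re

def host_by_match(line: str) -> str:
    # [^/]* matches the longest run of non-slash characters from the start of the line.
    return re.match(r"[^/]*", line).group(0)

def host_by_substitution(line: str) -> str:
    # Delete the first "/" and everything after it; with no "/" nothing is replaced.
    return re.sub(r"/.*", "", line)

for line in ["www.google.com", "www.google.com/", "www.google.com/something"]:
    print(host_by_match(line), host_by_substitution(line))
```

Both approaches assume the input has no scheme such as https://, because the slashes in the scheme would be treated as the first "/".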

Related

Multiple slash in URL replacement through regex

I am trying to create a regex in PCRE that is going to sanitize URLs with multiple slashes like the following:
https://www.domin.com/test1/////test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142/////
https://www.domin.com/test1/test2///somemoretests_67142
so that I can replace it with https://\2\4 and the resulting link looks like: https://www.domin.com/test1/test2/somemoretests_67142
I have been struggling with it for the past couple of days, so any regex guru help is more than welcome :)
I have tried the following and more:
(http|https):\/\/(.*)(\/\/+)(.*)
(http|https):\/\/(.*)(\/\/){2,}(.*)
(http|https):\/\/(.*)(\/\/{2})(.*)
I am going to use these in Akamai to sanitize our URLs through a cloudlet.

You can try:
(?<!https:\/)(?<!http:\/)(\/+$|(?<=\/)\/+)
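Substitute each match (the first group) with an empty string. If several URLs sit on separate lines of one input, enable multiline mode so that $ matches at the end of each line.
This will produce this output: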
https://www.domin.com/test1/test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142
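As a quick check of the pattern outside of Akamai, here is a small sketch using Python's re module (an assumption; the question targets PCRE, but the lookbehinds used here are fixed-width and behave the same way in Python). The name collapse_slashes is hypothetical.

```python
import re

# Pattern from the answer. The two negative lookbehinds protect the "//" after the
# scheme; the group matches trailing slashes or any extra slash that follows a slash.
EXTRA_SLASHES = re.compile(r"(?<!https:\/)(?<!http:\/)(\/+$|(?<=\/)\/+)")

def collapse_slashes(url: str) -> str:
    # Replace every match with an empty string, one URL at a time.
    return EXTRA_SLASHES.sub("", url)

urls = [
    "https://www.domin.com/test1/////test2/somemoretests_67142",
    "https://www.domin.com/test1/test2/somemoretests_67142/////",
    "https://www.domin.com/test1/test2///somemoretests_67142",
]
for url in urls:
    print(collapse_slashes(url))
```

Note that the trailing-slash branch removes every slash at the end of the URL, so a path that is supposed to end in a single "/" would lose it too.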

How to replace part of a URL with regex

I need to remove part of a URL with a regex.
That is, everything from http or https up to and including .com.
It can occur several times in one string.
Can anyone help me with this?
For example a string:
"The request is:https://stackoverflow.com/questions"
After the removal - "The request is:/questions"

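The regex that performed the deletion perfectly is: @"\w+://[^/$]*"
with an empty replacement string.
Something like this:
// requires: using System.Text.RegularExpressions;
var regex = new Regex(@"\w+:\/\/[^\/$]*");
url = regex.Replace(url, ""); // Replace returns a new string, so keep the result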

In Python, you can use the re.sub() function from the re module. Alternatively, you can use urllib.parse (the urlparse module in Python 2) to extract the different parts of the URL and concatenate the part you need onto the prefix you want.
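A minimal sketch of both Python options, assuming Python 3 and reusing the pattern from the C# answer above (the function name strip_scheme_and_host is made up for this example):

```python
import re
from urllib.parse import urlsplit

def strip_scheme_and_host(text: str) -> str:
    # Remove "scheme://host" wherever it occurs; everything from the first "/"
    # of the path onwards is kept.
    return re.sub(r"\w+://[^/$]*", "", text)

print(strip_scheme_and_host("The request is:https://stackoverflow.com/questions"))

# When the string is a bare URL, urlsplit exposes the path directly.
print(urlsplit("https://stackoverflow.com/questions").path)
```

re.sub handles URLs embedded in surrounding text and replaces every occurrence, while urlsplit only makes sense when the whole string is a single URL.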

How can I write an opposite regex to this regex?

This is a regex for a proxy. If I add this to my proxy:
(.*\.|)(abc|google)\.(org|net)
my proxy will not transmit traffic for abc.org, abc.net, google.org, or google.net.
How can I write the opposite of this regex? I mean one that transmits only the traffic for abc.org, abc.net, google.org, and google.net.
EDIT-01
What I really want is to transmit only abc.org or www.abc.org. How can I do that?

Try this:
^(?!(www\.)?(?:abc|google)\.(?:net|org)).*
Demo: https://regex101.com/r/WOnFx8/3/
I used a negative lookahead, (?!...), to invert the match of your regex. This way, it will match any domain except these four specific domains.
Another way to do it is to allow any subdomain prefix before the desired domains, as in your original regex:
^(?!(.*\.|)(?:abc|google)\.(?:net|org)).*
demo: https://regex101.com/r/WOnFx8/4/
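To see how the two patterns differ, here is a small sketch in Python (an assumption; the proxy's regex engine isn't named in the question). A host that matches is one the proxy would not transmit; the helper name is_matched is hypothetical.

```python
import re

# First pattern: only an optional "www." prefix is recognised inside the lookahead.
WWW_ONLY = re.compile(r"^(?!(www\.)?(?:abc|google)\.(?:net|org)).*")
# Second pattern: any prefix ending in a dot is recognised inside the lookahead.
ANY_PREFIX = re.compile(r"^(?!(.*\.|)(?:abc|google)\.(?:net|org)).*")

def is_matched(pattern: re.Pattern, host: str) -> bool:
    # True when the pattern matches, i.e. the host is NOT one of the four domains.
    return pattern.match(host) is not None

for host in ["abc.org", "www.abc.org", "sub.abc.org", "yahoo.com"]:
    print(host, is_matched(WWW_ONLY, host), is_matched(ANY_PREFIX, host))
```

With the first pattern, sub.abc.org still matches because only www. is allowed as a prefix. Neither lookahead is anchored at the end, so a host such as abc.organic.com is also treated like abc.org; adding $ at the end of the lookahead would be one way to tighten that.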

The regex you wrote,
(.*\.|)(abc|google)\.(org|net)
matches any string that is one of abc.org, google.org, abc.net, or google.net, with an optional prefix that ends with a dot (.),
such as test.google.org, sub.abc.net, ...
I think you want to match strings like test.yahoo.com, but not test.google.org. If you can use a negative lookahead, this is the answer:
^(.*\.|)(?!(abc|google)\.(org|net))\w+\.\w+$
Explanation:
^ and $ make sure the match covers the entire URL string.
The negative lookahead checks that the URL is not something like abc.org, abc.net, google.org, or google.net.
\w+\.\w+ checks that the remaining string looks like a domain (something like yahoo.com, etc.).
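A quick way to try this pattern is a sketch like the following, written in Python purely for illustration (the sample hostnames other than those in the answer are assumptions):

```python
import re

# Pattern from this answer: optional dotted prefix, a lookahead that rejects the
# four domains, then "name.tld" made of word characters up to the end.
NOT_LISTED = re.compile(r"^(.*\.|)(?!(abc|google)\.(org|net))\w+\.\w+$")

samples = ["test.yahoo.com", "yahoo.com", "test.google.org", "abc.org", "www.abc.net"]
results = {host: bool(NOT_LISTED.match(host)) for host in samples}
for host, matched in results.items():
    print(host, matched)
```

Because \w does not include "-", hostnames containing hyphens would not match this pattern as written.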

I'm going to assume you have lookaheads. If so, you can simply use:
(^.*?\.(?!(abc|google))\w+\.(?:org|net)$)
Demo - https://regex101.com/r/5eC41R/3
What this does:
Looks for the start of the URL (up to the first .)
Checks that the next part does not start with abc or google
Looks for the next section (up to the next .)
Looks for a closing org or net
Note that the lookahead adds backtracking work, so this may be somewhat slower than a plain match.

How to stop regex at the end of line

I'm trying to figure out the following case.
I have a txt file with parameters:
environment=trank
Browser=iexplore
id=1988
Url=www.google.com
maautomate=no
When I parse this txt file with a regex pattern like
/environment=([^\s]+)/
I get "trankBrow" as the result, and with
/Url=([^\s]+)/
I get www.google.commaautomate=no
So why is the following parameter appended? And how do I get only "trank"?

environment=([^\\s]+)
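You need to use this. In your string literal, \s is treated as an escaped s, so the character class becomes [^s] ("anything except the letter s"). The match then runs across the line break until the next s, which is why the output is trankBrow: the s of Browser is where it stops.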
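The effect can be reproduced with a short sketch. The question doesn't say which language is in use, so Python is an assumption here, and the "broken" pattern is written out explicitly as [^s] to mimic what remains once the backslash is lost:

```python
import re

text = "environment=trank\nBrowser=iexplore\nid=1988\nUrl=www.google.com\nmaautomate=no"

# What the pattern effectively becomes when the backslash is dropped:
# the class excludes only the letter "s", so newlines are matched too.
broken = re.search("environment=([^s]+)", text).group(1)

# With the backslash preserved (a raw string here), \s excludes all whitespace,
# including the newline, so the match stops at the end of the line.
fixed = re.search(r"environment=([^\s]+)", text).group(1)
url = re.search(r"Url=([^\s]+)", text).group(1)

print(repr(broken))  # 'trank\nBrow'
print(fixed)         # trank
print(url)           # www.google.com
```

In languages where the pattern lives in an ordinary string literal, doubling the backslash as in the answer has the same effect as the raw string used here.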

mod rewrite to match number at end of string

I'm trying to write a bit of mod_rewrite regex and I'm getting stuck.
I want to parse the following URL structure to pass variables to my script:
dogs/staffordshire-brown-hampshire-32
(32 is the variable I want to pass)..
index.php?type=dogs&id=$matches[1]
I have this so far, which doesn't seem to work:
dogs/([^-]*$)
Any ideas? Thanks

That will match the entire string after dogs/ only if it doesn't contain a hyphen. Try:
dogs/.*?([0-9]+)$
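Both patterns can be checked against the sample path with a short sketch. Python's re module is used here only as a stand-in for the rewrite engine, which is an assumption; the patterns themselves are copied from the question and the answer.

```python
import re

path = "dogs/staffordshire-brown-hampshire-32"

# Original attempt: [^-]* cannot cross a hyphen, so it never reaches the end of the string.
original = re.search(r"dogs/([^-]*$)", path)

# Suggested fix: lazily skip ahead, then capture the digits that end the string.
fixed = re.search(r"dogs/.*?([0-9]+)$", path)

print(original)        # None
print(fixed.group(1))  # 32
```

The captured group is what would be passed on as id in index.php?type=dogs&id=$matches[1].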
