Get a path segment from a website link with regex

If I have a website URL like www.google.com/en/my-page/anotherpage,
how can I use a regex to get /en/my-page? I am using this regex in IIS.
So far I have tried something similar to this:
^(?:\\.|[^/\\])*/((?:\\.|[^/\\])*)/
but it returns /en/my-page/ and I want it to return /en/my-page

In grep, your regex returns the string "www.google.com/en/". If a positive lookbehind is not mandatory, you can simply use the following regex:
(/[^/]+)+

You could use a look-ahead assertion to get rid of the last slash:
/\/.*(?=\/)/
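A minimal sketch of the lookahead idea, using Python's re module as a stand-in for whatever engine IIS uses (an assumption; engine details may differ). The surrounding slashes in /\/.*(?=\/)/ are JavaScript-style literal delimiters, so the Python pattern drops them.

```python
import re

url = "www.google.com/en/my-page/anotherpage"

# Greedy ".*" runs to the end of the string, then backtracks until the
# lookahead (?=/) sees a slash. The lookahead does not consume that slash,
# so the trailing "/" is left out of the match.
match = re.search(r"/.*(?=/)", url)
path = match.group(0) if match else None
print(path)  # /en/my-page
```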

This one should suit your needs:
^[^/]+(/.*)/[^/]+$
The output you're looking for is in the first capturing group.
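A short sketch of reading the first capturing group, again using Python's re as a stand-in engine. The helper name middle_path is hypothetical.

```python
import re

# ^[^/]+     host part: everything before the first slash
# (/.*)      group 1: greedy, so it extends to the last slash that still
#            lets the remainder match
# /[^/]+$    the final path segment, which is excluded from group 1
PATTERN = re.compile(r"^[^/]+(/.*)/[^/]+$")

def middle_path(url):
    m = PATTERN.match(url)
    return m.group(1) if m else None

print(middle_path("www.google.com/en/my-page/anotherpage"))  # /en/my-page
print(middle_path("www.google.com"))                         # None (no path)
```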

Related

Multiple slashes in URL: replacement through regex

I am trying to create a regex in PCRE that will sanitize URLs with multiple slashes, like the following:
https://www.domin.com/test1/////test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142/////
https://www.domin.com/test1/test2///somemoretests_67142
so that I can replace them with https://\2\4 and the resulting link looks like: https://www.domin.com/test1/test2/somemoretests_67142
I have been struggling with this for the past couple of days, so any help from a regex guru is more than welcome :)
I have tried the following and more:
(http|https):\/\/(.*)(\/\/+)(.*)
(http|https):\/\/(.*)(\/\/){2,}(.*)
(http|https):\/\/(.*)(\/\/{2})(.*)
I am going to use these in Akamai to sanitize our URLs through a cloudlet.
You can try:
(?<!https:\/)(?<!http:\/)(\/+$|(?<=\/)\/+)
And replace the first group with an empty string.
This will produce this output:
https://www.domin.com/test1/test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142
https://www.domin.com/test1/test2/somemoretests_67142
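The substitution can be checked outside Akamai with Python's re, which, like PCRE, accepts these fixed-width lookbehinds. This is only a sketch of the regex behavior; the cloudlet configuration itself is not shown, and collapse_slashes is a hypothetical helper name.

```python
import re

# The two negative lookbehinds protect the "//" right after "http:" or "https:".
# The alternation then removes either a trailing run of slashes (\/+$) or the
# extra slashes that directly follow another slash ((?<=\/)\/+).
EXTRA_SLASHES = re.compile(r"(?<!https:\/)(?<!http:\/)(\/+$|(?<=\/)\/+)")

def collapse_slashes(url):
    return EXTRA_SLASHES.sub("", url)

urls = [
    "https://www.domin.com/test1/////test2/somemoretests_67142",
    "https://www.domin.com/test1/test2/somemoretests_67142/////",
    "https://www.domin.com/test1/test2///somemoretests_67142",
]
for u in urls:
    print(collapse_slashes(u))  # https://www.domin.com/test1/test2/somemoretests_67142
```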

Regex to remove everything after -i- (with -i-)

I was trying to find a solution to my problem.
Input: prd-abcd-efgh-i-0dflnk55f5d45df
Output: prd-abcd-efgh
Splunk query I tried: index=aws-* (host=prd-abcd-efgh*) | rex field=host "^(?<host>[^.]+)"| dedup host | stats count by host,methodPath
I want to remove everything that comes after "-i-" using a simple regex. I tried the regex "^(?<host>[^.]+)" listed here:
https://answers.splunk.com/answers/77101/extracting-selected-hosts-with-regex-regex-hosts-with-exceptions.html
Please help me to solve it.
replace(host, "(?<=-i-).*", "")
Example here: https://regex101.com/r/blcCcQ/2
(?<=-i-) is a lookbehind: it requires "-i-" just before the match but does not consume it, so "-i-" itself is kept.
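Splunk's replace() is not reproduced here; Python's re.sub is used as a stand-in to show what the lookbehind pattern does to the sample host. Note that the result still ends in "-i-".

```python
import re

host = "prd-abcd-efgh-i-0dflnk55f5d45df"

# (?<=-i-) only checks that "-i-" precedes the match; it does not consume it,
# so ".*" matches just the text after "-i-", and "-i-" survives the replace.
trimmed = re.sub(r"(?<=-i-).*", "", host)
print(trimmed)  # prd-abcd-efgh-i-
```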
I have no knowledge of Splunk, but the usual way to do this is to match the part you don't want and replace it with an empty string.
The regex for doing that could be:
-i-.*
Then replace the match with an empty string.
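A sketch of the match-and-replace approach in Python (standing in for Splunk, whose own syntax is not shown). Because "-i-" is part of the match here, it is removed along with the suffix.

```python
import re

host = "prd-abcd-efgh-i-0dflnk55f5d45df"

# Match "-i-" together with everything after it, then replace that match with "".
trimmed = re.sub(r"-i-.*", "", host)
print(trimmed)  # prd-abcd-efgh
```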
Something simple like this should work:
([a-z-]+)-i-.+
The first capture group will return only the part preceding -i-.
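The capture-group variant, sketched with Python's re.match. As written, [a-z-]+ only covers lowercase letters and hyphens, so a prefix containing digits or uppercase letters would not be captured in full.

```python
import re

host = "prd-abcd-efgh-i-0dflnk55f5d45df"

# ([a-z-]+) greedily grabs letters and hyphens, then backtracks until
# "-i-" followed by at least one character can match.
m = re.match(r"([a-z-]+)-i-.+", host)
prefix = m.group(1) if m else None
print(prefix)  # prd-abcd-efgh
```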

Conditional regex to match a URL

I am trying to write an if/then condition to match the URL, but I can't seem to get it to work. I am trying to match URLs and then capture the non-optional group. So if a URL comes in like this:
/en/testing.aspx
I want to capture /testing.aspx
if the url comes in like this:
/testing.aspx
I want to capture /testing.aspx
Is there an easy way to do this using regex?
EDIT:
The URL can be a multi-part URL, like /en/sub1/sub2/testing.aspx. I essentially want everything after "/en/".
Use the regex \/en(\/.+)$
Check this out:
https://regex101.com/r/lwowhi/6
If there is no "/en/" in the URL and you still want to capture /testing.aspx, here is an edit: (?:\/en)*(\/.+)$
https://regex101.com/r/lwowhi/8
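A sketch comparing the two patterns from this answer, using Python's re.search as a stand-in engine. The helper names are hypothetical.

```python
import re

def after_en_required(url):
    # \/en(\/.+)$ : "/en" must be present; group 1 is the rest of the path.
    m = re.search(r"\/en(\/.+)$", url)
    return m.group(1) if m else None

def after_en_optional(url):
    # (?:\/en)*(\/.+)$ : the "/en" prefix is optional, so plain paths match too.
    m = re.search(r"(?:\/en)*(\/.+)$", url)
    return m.group(1) if m else None

print(after_en_required("/en/sub1/sub2/testing.aspx"))  # /sub1/sub2/testing.aspx
print(after_en_required("/testing.aspx"))               # None
print(after_en_optional("/testing.aspx"))               # /testing.aspx
print(after_en_optional("/en/testing.aspx"))            # /testing.aspx
```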
You can use a lazy prefix (.*?) followed by an optional /en group, and then capture everything from the next forward slash to the end of the string.
^.*?(?:\/en)?(\/.*)$
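A quick check of ^.*?(?:\/en)?(\/.*)$ against the URLs from the question, with Python's re standing in for the target engine. The helper name page_path is hypothetical.

```python
import re

# ^.*?        lazy prefix: consumes as little as possible
# (?:\/en)?   optional "/en", tried first because "?" is greedy
# (\/.*)$     group 1: from the next slash to the end of the string
PATTERN = re.compile(r"^.*?(?:\/en)?(\/.*)$")

def page_path(url):
    m = PATTERN.match(url)
    return m.group(1) if m else None

for u in ["/en/testing.aspx", "/testing.aspx", "/en/sub1/sub2/testing.aspx"]:
    print(page_path(u))
```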
Assuming all pages are .aspx, use a capture group:
regex: .*(/.*\.aspx)
This will capture "/testing.aspx" in all of the samples below:
/testing.aspx
/en/testing.aspx
www.abc.com/en-us/testing.aspx
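A sketch of the .aspx capture in Python. It assumes the pattern is .*(/.*\.aspx), with the dot before "aspx" escaped so it only matches a literal ".".

```python
import re

# The outer greedy ".*" backtracks only as far as the LAST slash, so
# group 1 is the final path segment ending in ".aspx".
ASPX = re.compile(r".*(/.*\.aspx)")

samples = ["/testing.aspx", "/en/testing.aspx", "www.abc.com/en-us/testing.aspx"]
pages = [ASPX.match(s).group(1) for s in samples]
print(pages)  # ['/testing.aspx', '/testing.aspx', '/testing.aspx']
```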

Regex to allow certain input values

I want to allow input values such as A+, B+, A-, B-, or decimal values like 100.00 and 90.0.
How do I write a regex for this input? Simply put, I want to allow grades (A+, A-, B+, B-) and decimal values (10.05, 20.00).
The regex below should help:
[AB][+-]|\d{2}\.\d{2}
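For input validation the whole value has to match, so this sketch uses Python's re.fullmatch with the pattern above (the choice of fullmatch is an assumption about how the regex would be applied). Because \d{2} allows exactly two digits before the point, 100.00 is rejected as written.

```python
import re

GRADE_OR_DECIMAL = re.compile(r"[AB][+-]|\d{2}\.\d{2}")

def is_valid(value):
    # fullmatch succeeds only if the entire string matches one alternative.
    return GRADE_OR_DECIMAL.fullmatch(value) is not None

for v in ["A+", "B-", "10.05", "20.00", "100.00", "C+"]:
    print(v, is_valid(v))
```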
From what I can see, I would use this regex (I bet you can optimize it):
^([A-GOa-go][+-])|((\d{1,2}(?!\d)\.\d{2}|100\.00),(\d{1,2}(?!\d)\.\d{2}|100\.00))$
Try this:
([AB][+-]|(100|\d{2})\.\d{2})
In my opinion, this will work for what you are expecting.
Online test: RegExr.com
EDIT:
Based on what you are expecting, I suggest this regex:
^([AB][+-]|(100|\d{2})\.\d{2})$
It matches only if the entire string matches, so it no longer returns a 02.00 match for 102.00, for example.
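A sketch of the difference the anchors make, using Python's re.search. One Python-specific detail: $ also matches just before a trailing newline, so re.fullmatch is the stricter choice if that matters.

```python
import re

loose = re.compile(r"([AB][+-]|(100|\d{2})\.\d{2})")
strict = re.compile(r"^([AB][+-]|(100|\d{2})\.\d{2})$")

# Unanchored: the search finds "02.00" inside "102.00".
print(loose.search("102.00").group(0))   # 02.00
# Anchored: the whole string must match, so "102.00" is rejected.
print(strict.search("102.00"))           # None
print(strict.search("100.00").group(0))  # 100.00
print(strict.search("B+").group(0))      # B+
```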

Regular expression: get superscripted text

I would like to get the superscripted text from the following HTML string:
testing to <sup>supers</sup>cript o<sup>n</sup>e
The result I would like to get is:
supers
n
This is what I tried right now
But the result is not what I want.
<sup>supers
<sup>n
Could anyone give me a suggestion, please?
You can use a lookbehind in your regex:
(?<=<sup>)[^<]*
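A minimal check of the lookbehind pattern with Python's re.findall, which supports this fixed-width lookbehind.

```python
import re

html = "testing to <sup>supers</sup>cript o<sup>n</sup>e"

# The lookbehind requires "<sup>" just before the match without including it;
# [^<]* then stops at the next "<", i.e. at "</sup>".
found = re.findall(r"(?<=<sup>)[^<]*", html)
print(found)  # ['supers', 'n']
```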
Use this if there may be other HTML tags between <sup> and </sup>:
(?<=<sup>)(.*?)(?=<\/sup>)
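A sketch showing why the lazy (.*?) version tolerates other tags inside <sup>. The second input string is a made-up example, not from the question.

```python
import re

PATTERN = r"(?<=<sup>)(.*?)(?=<\/sup>)"

# With one group, findall returns the group's text for each match.
print(re.findall(PATTERN, "testing to <sup>supers</sup>cript o<sup>n</sup>e"))
# ['supers', 'n']

# Hypothetical input with a nested tag: the lazy .*? stops at the first
# "</sup>", not at "</b>", so the inner markup stays in the capture.
nested = "x <sup>a<b>b</b>c</sup> y"
print(re.findall(PATTERN, nested))  # ['a<b>b</b>c']
```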
You were close; you just weren't capturing your match.
Updated regex:
(?:<sup>)([^<]*)
I just added a capture group around your match.
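A sketch of reading the capture group in Python: "<sup>" is still part of the match, but re.findall returns only group 1 when the pattern has exactly one group.

```python
import re

html = "testing to <sup>supers</sup>cript o<sup>n</sup>e"

# "<sup>" is consumed by the non-capturing group; only ([^<]*) is returned.
found = re.findall(r"(?:<sup>)([^<]*)", html)
print(found)  # ['supers', 'n']
```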
(?<=<sup>)([^<]*?)(?=<\/)
This should work.
See demo.
http://regex101.com/r/sA7pZ0/13