Regex: get string after first occurrence of a character (including it) - regex

I'm trying to redirect some old links on my site to the new ones with a 301 redirect. What I need to do is remove the first part of the path, up to and including the first hyphen.
Example:
http://example.com/19731-la-preservacion-de-la-biodiversidad-es-crucial-para-frenar-la-desertificacion-en-zonas-aridas
or
http://example.com/633-afecta-la-crisis-alimentaria-ya-a-miles-de-personas
Should output to:
http://example.com/la-preservacion-de-la-biodiversidad-es-crucial-para-frenar-la-desertificacion-en-zonas-aridas
http://example.com/afecta-la-crisis-alimentaria-ya-a-miles-de-personas
I have tried so far with
RewriteRule ^[^-|-](.*)$ $1 and
RewriteRule ^([^-]*-)(.*)$ $1 but I can't seem to get it to work.
Thanks!

To get the substring after the first occurrence of some character, including that character, use a negated character class that matches any char(s) other than that character, then open a capturing group, place the char as the first atom in it, and add .*) after:
^[^-]*(-.*)$
Here, ^[^-]*(-.*)$ matches the whole string, and the first - together with all the chars after it lands in Group 1 ($1 in the RewriteRule replacement).
Details
^ - start of string
[^-]* - zero or more chars other than - (negated character class)
(-.*) - Group 1 ($1): - and then any 0+ chars
$ - end of string.
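The pattern above can be tried outside Apache with Python's re module. This is only a sketch: the sample path is the one from the question with the host removed (an assumption about what the rule is matched against), and the second pattern, with the hyphen moved outside the group, is a hypothetical variation that is not part of the answer, shown for the case where the hyphen itself should be dropped.

```python
import re

# Pattern from the answer: skip chars up to the first "-", capture from "-" onward.
KEEP_HYPHEN = re.compile(r"^[^-]*(-.*)$")
# Hypothetical variation (not in the answer): hyphen outside the group, so it is dropped.
DROP_HYPHEN = re.compile(r"^[^-]*-(.*)$")

# Sample path taken from the question, host part removed (assumption).
path = "633-afecta-la-crisis-alimentaria-ya-a-miles-de-personas"

with_hyphen = KEEP_HYPHEN.match(path).group(1)
without_hyphen = DROP_HYPHEN.match(path).group(1)

print(with_hyphen)     # -afecta-la-crisis-alimentaria-ya-a-miles-de-personas
print(without_hyphen)  # afecta-la-crisis-alimentaria-ya-a-miles-de-personas
```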

Try:
(.*?\/)\d+-(.*)
Replace:
$1$2
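A minimal Python sketch of this find-and-replace, assuming the full URL is the input. The helper name strip_numeric_prefix is made up for illustration, and \1\2 is Python's spelling of the $1$2 replacement.

```python
import re

# Group 1: shortest prefix ending in "/" that is followed by digits and "-".
# Group 2: everything after the numeric prefix.
PATTERN = re.compile(r"(.*?/)\d+-(.*)")

def strip_numeric_prefix(url: str) -> str:
    """Return the URL without the leading "<digits>-" of its path (hypothetical helper)."""
    # count=1 stops after the first match; non-matching URLs come back unchanged.
    return PATTERN.sub(r"\1\2", url, count=1)

old = "http://example.com/633-afecta-la-crisis-alimentaria-ya-a-miles-de-personas"
new = strip_numeric_prefix(old)
print(new)
```

A URL without a numeric prefix after a slash does not match and is returned as it was.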

Related

Regex to pick a value from url

I am having difficulty building a regex that can extract a value from a URL. The condition is to get the value between the last "/" and ".html". Please help.
Sample URL1 - https://www.example.com/fgf/sdf/sdf/as/dwe/we/bingo.html - The value I want to extract is bingo
Sample URL2 - www.example.com/we/b345g.html - The value I want to extract is b345g
I tried to build a regex and was able to get "bingo.html" and "b345g.html" using [^\/]+$, but was not able to remove or skip ".html".
Here you are:
\/([^\/]+?)(?>\..+)?$
Explanation:
\/ - literal character '/'
([^\/]+?) - first group: at least one character that is not a '/', matched lazily (as few characters as possible)
[^\/] - any character that is not a '/'
+ - at least one occurrence
? - lazy modifier (makes the preceding + match as few characters as possible)
(?>\..+)? - second optional group: '.' + any characters (like '.html' or '.exe' or '.png')
?> - atomic, non-capturing group (its content is excluded from the captured result; it is not a lookahead)
\. - literal character '.'
. - any character (except line terminators)
+ - at least one occurrence
? - optionality (note that this one is outside the parenthesis)
$ - end of the string
If you also want to exclude query strings, you can expand it like this:
\/([^\/]+?)(?>\..+)?(?>\?.*)?$
If you also need to remove the protocol part of the url you can use this:
(?<!\/)\/([^\/]+?)(?>\..+)?(?>\?.*)?$
Here (?<!\/) is a negative lookbehind that checks that there is no '/' immediately before the start of the match.
You are only matching using [^\/]+$ but not differentiating between the part before and after the dot.
To differentiate them, you could use, for example, a capture group to get the part after the last slash and before the first dot.
\S*\/([^\/\s.]+)\.[^\/\s]+$
\S*\/ Match optional non-whitespace chars up to the last occurrence of /
([^\/\s.]+) Capture group 1: match 1+ times any char except a /, a whitespace char, or .
\. Match a dot
[^\/\s]+ Match 1+ times any char except a / or a whitespace char
$ End of string
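As a rough check of the capture-group pattern, here is a Python sketch run against the two sample URLs from the question. The function name last_segment_name is an assumption made for this example.

```python
import re

# Capture the last path segment up to its first dot; require an extension after it.
PATTERN = re.compile(r"\S*/([^/\s.]+)\.[^/\s]+$")

def last_segment_name(url: str):
    """Return the file name without its extension, or None if nothing matches."""
    m = PATTERN.search(url)
    return m.group(1) if m else None

name1 = last_segment_name("https://www.example.com/fgf/sdf/sdf/as/dwe/we/bingo.html")
name2 = last_segment_name("www.example.com/we/b345g.html")
print(name1)  # bingo
print(name2)  # b345g
```

A URL that ends in a slash has no final segment with an extension, so the function returns None for it.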

How do I make this regular expression not match anything after forward slash /

I have this regular expression:
/^www\.example\.(com|co(\.(in|uk))?|net|us|me)\/?(.*)?[^\/]$/g
It matches:
www.example.com/example1/something
But doesn't match
www.example.com/example1/something/
But the problem is that it also matches the following, which I do not want:
www.example.com/example1/something/otherstuff
I just want it to stop when a slash is encountered after "something". If there is no slash after "something", it should continue matching any character, except line breaks.
I am new to regex, so I get confused easily by those characters.
You may use this regex:
^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)(?:\/[^\/]+){2}$
This will match the following URL:
www.example.co.uk/example1/something
You can use
^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)\/([^\/]+)\/([^\/]+)$
The (.*)? part in your pattern matches any zero or more chars, so it won't stop even after encountering two slashes. The \/([^\/]+)\/([^\/]+) part in the new pattern will match two parts after slash, and capture each part into a separate group (in case you need to access those values).
Details:
^ - start of string
www\.example\. - www.example. string
(?:com|co(?:\.(?:in|uk))?|net|us|me) - com, co.in, co.uk, co, net, us, me strings
\/ - a / char
([^\/]+) - Group 1: one or more chars other than /
\/ - a / char
([^\/]+) - Group 2: one or more chars other than /
$ - end of string.
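The two-group pattern can be exercised in Python against the URLs from the question. This is a sketch only, and split_two_segments is a name invented for the example.

```python
import re

# Domain alternatives, then exactly two non-empty path segments, each captured.
PATTERN = re.compile(
    r"^www\.example\.(?:com|co(?:\.(?:in|uk))?|net|us|me)/([^/]+)/([^/]+)$"
)

def split_two_segments(url: str):
    """Return (segment1, segment2) when the URL has exactly two segments, else None."""
    m = PATTERN.match(url)
    return m.groups() if m else None

ok = split_two_segments("www.example.com/example1/something")
trailing = split_two_segments("www.example.com/example1/something/")
extra = split_two_segments("www.example.com/example1/something/otherstuff")
print(ok, trailing, extra)
```

Because [^/]+ cannot cross a slash, both the trailing-slash URL and the one with a third segment are rejected.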

Regex to properly match urls with a particular domain and also if there is a subdomain added

I have the following regex:
(^|^[^:]+:\/\/|[^\.]+\.)hello\.net
This seems to work for most cases, such as these:
http://hello.net
https://hello.net
http://www.hello.net
https://www.hello.net
http://domain.hello.net
https://solutions.hello.net
hello.net
www.hello.net
However, it still matches this, which it should not:
hello.net.domain.com
You can see it here:
https://regex101.com/r/fBH112/1
I am basically trying to check whether a URL is part of hello.net, so hello.net and any subdomains such as sub.hello.net should all match.
It should also match hello.net/bye, so anything after hello.net is irrelevant.
You may fix your pattern by adding (?:\/.*)?$ at the end:
(^|^[^:]+:\/\/|[^.]+\.)hello\.net(?:\/.*)?$
The (?:\/.*)?$ part matches an optional sequence of / and any 0 or more chars, and then the end of string.
You might consider a "cleaner" pattern like
^(?:\w+:\/\/)?(?:[^\/.]+\.)?hello\.net(?:\/.*)?$
Details:
^ - start of string
(?:\w+:\/\/)? - an optional occurrence of 1+ word chars, and then a :// char sequence
(?:[^\/.]+\.)? - an optional occurrence of any 1 or more chars other than / and . and then .
hello\.net - hello.net
(?:\/.*)?$ - an optional occurrence of / and then any 0+ chars and then end of string
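To see how the "cleaner" pattern behaves on the URLs listed in the question, here is a small Python sketch. The helper name is_hello_net is an assumption for illustration.

```python
import re

# Optional scheme, optional single subdomain label, hello.net, optional path.
PATTERN = re.compile(r"^(?:\w+://)?(?:[^/.]+\.)?hello\.net(?:/.*)?$")

def is_hello_net(url: str) -> bool:
    """True when the URL is hello.net or one subdomain level of it."""
    return PATTERN.match(url) is not None

should_match = [
    "http://hello.net", "https://hello.net",
    "http://www.hello.net", "https://www.hello.net",
    "http://domain.hello.net", "https://solutions.hello.net",
    "hello.net", "www.hello.net", "hello.net/bye",
]
results = [is_hello_net(u) for u in should_match]
rejected = is_hello_net("hello.net.domain.com")
print(results, rejected)
```

Note that (?:[^\/.]+\.)? accepts a single subdomain label, so a deeper host such as a.b.hello.net would not match this particular pattern.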

regex check if url has trailing slash and params

I need help with a regex. In particular, these paths:
/it/categoria/diritti-e-ugualianza
/it/categoria/diritti-e-ugualianza/
/it/categoria/diritti-e-ugualianza?i=1
/it/categoria/diritti-e-ugualianza/?i=1
must all be matched by a single rule.
I tried this:
/it/categoria/diritti-e-ugualianza(?:/(.*))?$
but it works only with
/it/categoria/diritti-e-ugualianza
/it/categoria/diritti-e-ugualianza/
Is there a way to also ignore the params?
Thank you.
You may replace / with a character class [/?] that matches either ? or /:
/it/categoria/diritti-e-ugualianza(?:[?/](.*))?$
                                     ^^^^
Details
/it/categoria/diritti-e-ugualianza - a literal substring
(?:[?/](.*))? - an optional group matching 1 or 0 occurrences of
[?/] - a ? or /
(.*) - Capturing group 1: any 0+ chars to the end of the line
$ - end of string.
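A short Python sketch that runs the four paths from the question through this pattern and shows what Group 1 holds in each case. The variable names are chosen for the example only.

```python
import re

# Literal path, then optionally "?" or "/" followed by anything (Group 1).
PATTERN = re.compile(r"/it/categoria/diritti-e-ugualianza(?:[?/](.*))?$")

paths = [
    "/it/categoria/diritti-e-ugualianza",
    "/it/categoria/diritti-e-ugualianza/",
    "/it/categoria/diritti-e-ugualianza?i=1",
    "/it/categoria/diritti-e-ugualianza/?i=1",
]

# Group 1 is None when the optional group did not take part in the match.
tails = [PATTERN.match(p).group(1) for p in paths]
print(tails)  # [None, '', 'i=1', '?i=1']
```

A path that continues with any other character, for example a hyphen, is not matched.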

Get the first occurrence of a string in a variable REGEX

I have the following variable in a database: PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT and I want to split it into two variables, the first being PSC-CAMPO-GRANDE-I08 and the second V00-C09-H09-IPRMKT.
I tried the regex .*(\-I).*(\-V), but it doesn't work. Then I tried .*(\-I), but it matches up to the last -I, the one in -IPRMKT.
So my question is: is there a way to split the string PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT at the first occurrence of -I?
This should do the trick:
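regex = r"(.*?-I[\d]{2})-(.*)"
Here is a test script in Python:
import re

# Lazily match up to the first -I followed by two digits,
# skip one hyphen, then capture the rest.
regex = r"(.*?-I[\d]{2})-(.*)"
match = re.search(regex, "PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT")
if match:
    print("yep")
    print(match.group(1))  # PSC-CAMPO-GRANDE-I08
    print(match.group(2))  # V00-C09-H09-IPRMKT
else:
    print("nope")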
The regex grabs everything up to the first -I followed by two digits, then matches a - without capturing it, then captures the rest. I can help tweak it if you have more logic that you are trying to implement.
You may use
^(.*?-I[^-]*)-(.*)
Details:
^ - start of a string
(.*?-I[^-]*) - Group 1:
.*? - any 0+ chars other than line break chars, up to the first occurrence of the subpattern that follows (because *? is a lazy quantifier)
-I - a literal substring -I
[^-]* - any 0+ chars other than a hyphen (your pattern was missing it)
- - a hyphen
(.*) - Group 2: any 0+ chars other than line break chars up to the end of a line.
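The same split can be sketched in Python with this pattern. The function name split_at_first_i is hypothetical, and the input is the sample value from the question.

```python
import re

# Group 1: everything up to the first "-I" plus the non-hyphen chars after it.
# Group 2: everything after the hyphen that follows Group 1.
PATTERN = re.compile(r"^(.*?-I[^-]*)-(.*)")

def split_at_first_i(value: str):
    """Return (first, second) split after the first -I token, or None if absent."""
    m = PATTERN.match(value)
    return m.groups() if m else None

first, second = split_at_first_i("PSC-CAMPO-GRANDE-I08-V00-C09-H09-IPRMKT")
print(first)   # PSC-CAMPO-GRANDE-I08
print(second)  # V00-C09-H09-IPRMKT
```

Unlike the [\d]{2} version, [^-]* does not assume that exactly two digits follow -I.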