regex - exclude strings that contain 2 or more "/" after a word

I have a list of the following strings:
/fajwe/conv_1/routing/apwfe/afjwepfj
/fajwe/conv_2/routing/apwfe
/fajwe/conv_2/routing
/fajwe/conv_3/routing/apwfe/afjwepfj/awef
/fajwe/conv_4/routing/apwfe/afjwepfj/awef/0o09
I want a regex that only matches strings containing no more than one / after the word routing, namely /fajwe/conv_2/routing/apwfe and /fajwe/conv_2/routing.
Currently I use the regex ^((?!rou\w+(\/\w+){2,}).)*$ but it matches nothing. How can I write a regex that excludes strings containing two or more / after the word routing?
I would love to learn how to achieve this using a negative lookbehind. Many thanks!

Something like this?
^.*\/routing(\/[^\/]*){0,1}$

routing(\/[^\/]*)?$
there you go
https://regex101.com/r/KjE8ed/1/

Your regex already matches what you are looking for once the multiline flag m is enabled, as @revo pointed out.
^((?!rou\w+(\/\w+){2,}).)*$
You could also try it like this:
^\/fajwe\/conv_\d\/routing(?:\/[^\/]+)?$
Depending on your language or context, you may need to escape the forward slash as \/.
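To make the point about the multiline flag concrete, here is a minimal sketch in Python (an assumption, since the question does not name a language). It runs the question's pattern over the whole list as one multi-line string with re.MULTILINE, and the anchored alternative over each string separately. The variable names are made up for illustration.

```python
import re

paths = [
    "/fajwe/conv_1/routing/apwfe/afjwepfj",
    "/fajwe/conv_2/routing/apwfe",
    "/fajwe/conv_2/routing",
    "/fajwe/conv_3/routing/apwfe/afjwepfj/awef",
    "/fajwe/conv_4/routing/apwfe/afjwepfj/awef/0o09",
]

# The question's pattern, applied to the list joined into one string.
# re.MULTILINE lets ^ and $ anchor at every line, not only at the string ends.
tempered = re.compile(r"^((?!rou\w+(/\w+){2,}).)*$", re.MULTILINE)
multiline_hits = [m.group(0) for m in tempered.finditer("\n".join(paths))]

# The anchored alternative, applied to one string at a time.
anchored = re.compile(r"^.*/routing(/[^/]*)?$")
per_line_hits = [p for p in paths if anchored.search(p)]

print(multiline_hits)
print(per_line_hits)
```

Both lists should contain only the two conv_2 paths. In Python the forward slash needs no escaping, so the \/ from the original patterns is written as a plain /.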

Related

Split complex string into multiple parts using regex

I've tried a lot to split this string into something I can work with, but my experience isn't enough to reach the goal. I went through the first 3 pages of Google results, which helped but still didn't show me how to do this properly:
I have a string which looks like this:
My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#
The result should be:
My Dogs
213,220
Gallery
635,210
Screenshot
219,530
Good Morning
412,408
Does anyone have an idea how to use a regex to split the string as shown above?
Given the shared patterns, it seems you're looking for a regex like the following:
[A-Za-z ]+|\d+,\d+
It matches two patterns:
[A-Za-z ]+: any combination of letters and spaces
\d+,\d+: any combination of digits + a comma + any combination of digits
If you want a stricter regex, you can wrap the previous pattern between a lookbehind and a lookahead, so that every match is preceded by either a comma, a # or the start of the string, and followed by a comma, a # or the end of the string.
(?<=^|,|#)([A-Za-z ]+|\d+,\d+)(?=,|#|$)
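As a rough illustration of both patterns, here is a sketch in Python (an assumption, since the question does not name a language). Python's re module only accepts fixed-width lookbehinds, so (?<=^|,|#) is rewritten here as (?:^|(?<=[,#])), which is intended to be equivalent. The variable names are hypothetical.

```python
import re

text = "My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#"

# Simple version: letters and spaces, or two numbers joined by a comma.
parts = re.findall(r"[A-Za-z ]+|\d+,\d+", text)

# Stricter version: each match must sit between separators (or string ends).
strict = re.compile(r"(?:^|(?<=[,#]))(?:[A-Za-z ]+|\d+,\d+)(?=[,#]|$)")
strict_parts = strict.findall(text)

for part in parts:
    print(part)
```

On this input both patterns should return the same eight pieces, alternating between a name and its pair of numbers.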

Regex for internal URL

I'm trying to create a regex that matches with internal URLs (the ones that don't include the domain or http) that I can find in a file like this one:
category/subcategory/sub-subcategory/item-1
For that I'm using:
/\w+\/.+\/[\w\-]+/
But some URLs are like this:
category/subcategory
And I need a regular expression that also catches those. Do I have to create a different one, or is it possible to create one that matches both examples? It is for a Bash script, but if you have an idea for another engine, that is fine too.
Thank you!!
Update: I forgot the context. Each line of the file is like this:
"11","category/subcategory/sub-subcategory/item-1","index.php?option=com_trombinoscopeextended&Itemid=125&lang=es&view=trombinoscope","251","0","0000-00-00","","","","","","","0"
Or like this:
"4","category/subcategory","index.php?option=com_trombinoscopeextended&Itemid=121&lang=es","0","1","0000-00-00","","","","","","","0"
I need to extract the examples for each line.
Thanks.
You may use
/\w+(\/[\w-]+)+/
Details
\w+ - 1+ word chars
(\/[\w-]+)+ - 1 or more consecutive sequences of:
  \/ - a / char
  [\w-]+ - 1+ word or - chars.
A hint: you might read in your string with a kind of a CSV parser using your preferred language, and then only return fields that match ^\w+(\/[\w-]+)+$ pattern (here, ^ matches the start of the string and $ matches the end of the string).
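The CSV-parser hint could look roughly like this in Python, using the standard csv module. This is only a sketch: the two sample lines are the ones from the question, and the names lines, internal_url and urls are made up for illustration.

```python
import csv
import re

lines = [
    '"11","category/subcategory/sub-subcategory/item-1","index.php?option=com_trombinoscopeextended&Itemid=125&lang=es&view=trombinoscope","251","0","0000-00-00","","","","","","","0"',
    '"4","category/subcategory","index.php?option=com_trombinoscopeextended&Itemid=121&lang=es","0","1","0000-00-00","","","","","","","0"',
]

# Anchored version of the pattern: the whole field must be a slash-separated path.
internal_url = re.compile(r"^\w+(/[\w-]+)+$")

# csv.reader handles the quoting; keep only the fields that look like internal URLs.
urls = [
    field
    for row in csv.reader(lines)
    for field in row
    if internal_url.match(field)
]

for url in urls:
    print(url)
```

Parsing the fields first avoids accidental matches inside other columns, such as the index.php query string.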
That is pretty specific. I came up with this one after some testing. We have subdomains we need to check for as well.
(?!https?:)/?[^/][^/].*|(https?:)?//([^.]*\.)?yourdomain\.com(/.*)?
Someone can probably make it better, but this works for me.

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches have what I can only describe as positioners. These are shown as a question mark followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work off is the linked image. The creators can't even confirm which flavour of regex it is; they believe it's Python, but I'm unsure.
Regex Image
Update
Unfortunately, none of the patterns below have worked so far. I tested each one in the live system (rather than only in a regex tester, in case it behaves differently there), but they still didn't work.
There is no replace feature, and as mentioned before I'm not sure if it is Python. Appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
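The two operations above might be sketched like this in Python, assuming a replace function is available (the question's update suggests it may not be in the asker's tool). The helper name find_code and the sample list are hypothetical.

```python
import re

lines = ["T123?214567", "T?211234567", "T0000001"]

def find_code(line):
    # Step 1: remove every positioner (a question mark followed by 2 digits).
    cleaned = re.sub(r"\?[0-9]{2}", "", line)
    # Step 2: look for T followed by exactly 7 digits in the cleaned text.
    match = re.search(r"T[0-9]{7}", cleaned)
    return match.group(0) if match else None

codes = [find_code(line) for line in lines]
print(codes)
```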
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
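Concatenating the two capture groups could be done as in the following Python sketch. It assumes at most one positioner per line, as stated above, and the function name strip_positioner is made up for illustration.

```python
import re

def strip_positioner(line):
    # Group 1: everything from T up to the positioner (lazy).
    # Group 2: everything after the positioner.
    m = re.fullmatch(r"(T.*?)\?\d{2}(.*)", line)
    if m is None:
        # No positioner found, so return the line unchanged.
        return line
    return m.group(1) + m.group(2)

print(strip_positioner("T123?214567"))
print(strip_positioner("T?211234567"))
```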
First, match the ?21 and replace it with a distinctive character, such as #:
\?21
Then you may try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
The target string is captured in group 1 (or \1).
Finally, replace # with ?21 or whatever you like.
A Python script might look like this:
import re

ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
rexpre = re.compile(r'\?21')
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
# Print the matches with each positioner replaced by '#'
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Print the same matches with '#' restored to '?21'
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement you could use group1group2 \1\2.
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
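In Python, the replacement described above could be sketched as follows. The sample text is hypothetical and only covers a positioner at the start or in the middle of the digits.

```python
import re

# Group 1: T plus any digits before the optional positioner.
# Group 2: the digits after it.
pattern = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

text = "T0000001 T123?214567 T?211234567"

# Rebuild each match from group 1 and group 2, dropping the positioner.
cleaned = pattern.sub(r"\1\2", text)
print(cleaned)
```

Because group 2 needs at least one digit after the optional part, a positioner at the very end of a value (as in T1234434?21) is left untouched by this pattern.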
Is the below what you want?
Use RegExReplace with the multiline flag (m) and enable replacing all occurrences.
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

RegEx extract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have come up with something like this:
^[^#][^\(]+
But unfortunately without success. Do you have any idea why it's not working?
The regex you tried, ^[^#][^\(]+, works as follows:
^[^#] - at the beginning of the string, match one character that is not a #
[^\(]+ - then match until you encounter an opening parenthesis (you don't have to escape the parenthesis inside a character class)
So this would match *DataFromAdHoc, including the *, because it is not a #.
What you could do is capture the [^\(]+ part in a group, like ([^(]+).
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
Use ^\*(\w+)\(\w+\)$
It just gets everything between the * and the stuff in brackets.
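For comparison, here is a small Python sketch (the question does not state its language, so Python is an assumption) that applies the original pattern and both suggested fixes to the example string. The variable names are made up for illustration.

```python
import re

s = "*DataFromAdHoc(cbgv)"

# Original attempt: the leading * is part of the match because it is not a #.
original = re.match(r"^[^#][^\(]+", s).group(0)

# Fix 1: same pattern, but capture only the part after the first character.
with_group = re.match(r"^[^#]([^(]+)", s).group(1)

# Fix 2: require a literal *, capture the word, then require the bracketed part.
explicit = re.match(r"^\*(\w+)\(\w+\)$", s).group(1)

print(original)
print(with_group)
print(explicit)
```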
Your answer may depend on which language you're running your regex in, please include that in your question.

Regular expression to match line containing some strings and not others

I have lines like this:
example.com/p/stuff/...
example.com/page/thing/...
example.com/page/stuff/...
example.com/page/other-stuff/...
etc
where the dots represent continuing URL paths. I want to select URLs that contain /page/ where it is NOT followed by thing/. So from the above list we would select:
example.com/page/stuff/...
example.com/page/other-stuff/...
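.*?\/page\/[^(thing)].*
This is meant to match a string which has /page/ not followed by thing. Note, however, that [^(thing)] is a character class: it rejects any single one of the characters (, t, h, i, n, g and ), not the word thing, so it would also reject a path such as /page/games/.
The lazy quantifier is suggested because it advances one character at a time, which may perform better.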
You need to use negative lookahead:
example\.com\/page\/(?!thing\/).*
Use the following regex pattern:
.*?\/page\/(?!thing\/).*
https://regex101.com/r/19wh1w/2
(?!thing\/) - negative lookahead assertion ensures that page/ section is not followed by thing/
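A short Python sketch of the negative lookahead, run against the lines from the question. The extra URL example.com/page/games/... is a hypothetical addition, included only to show how a character class such as [^(thing)] behaves differently from the lookahead.

```python
import re

urls = [
    "example.com/p/stuff/...",
    "example.com/page/thing/...",
    "example.com/page/stuff/...",
    "example.com/page/other-stuff/...",
]

# Negative lookahead: /page/ must not be immediately followed by thing/.
lookahead = re.compile(r".*?/page/(?!thing/).*")
selected = [u for u in urls if lookahead.match(u)]

# Character class: rejects any one of the characters ( t h i n g ) after /page/.
char_class = re.compile(r".*?/page/[^(thing)].*")
extra = "example.com/page/games/..."  # hypothetical URL, not from the question
lookahead_keeps_extra = lookahead.match(extra) is not None
char_class_keeps_extra = char_class.match(extra) is not None

print(selected)
print(lookahead_keeps_extra, char_class_keeps_extra)
```

The lookahead keeps the hypothetical games URL, while the character class drops it because the segment starts with g.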