Can't get REGEX to work ((?=(<\/))(.*?)(?:\>)) - regex

I have the following regex and it finds a partial solution for me:
((?=(<\/))(.*?)(?:\>))
Given the following line:
</EventSubType><OrgUnitNm>###</OrgUnitNm></test:CommonAttributes><ProductArrangementBasic><ProductId>####</ProductId></part:Asking>
The regex will give me:
</EventSubType></OrgUnitNm></test:CommonAttributes></ProductId></part:Asking>
I want just:
</test: </part:
Any help would be great.
Thank you.

NOTE: if you are parsing HTML, you'd better use a dedicated library for that.
I suspect you are using the regex with the PCRE /U (ungreedy) flag, like /((?=(<\/))(.*?)(?:\>))/U.
If you want to obtain just </test: and </part:, you may use a much simpler regex:
/<\/\w+:/
See the regex demo
Details:
< - a literal <
\/ - a literal /
\w+ - 1 or more chars from [a-zA-Z0-9_] ranges
: - a literal :.
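To check the simpler pattern against the sample line from the question, here is a minimal sketch in Python (the question does not say which language is in use, so the choice of Python and the variable names are assumptions). The forward slash needs no escaping here because Python patterns are not wrapped in / delimiters.

```python
import re

# Sample line copied from the question.
line = (
    "</EventSubType><OrgUnitNm>###</OrgUnitNm></test:CommonAttributes>"
    "<ProductArrangementBasic><ProductId>####</ProductId></part:Asking>"
)

# "</", then one or more word characters, then a literal ":".
closing_prefix = re.compile(r"</\w+:")

# Only closing tags that carry a namespace prefix contain the ":".
prefixes = closing_prefix.findall(line)
print(prefixes)
```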

Related

Regex for internal URL

I'm trying to create a regex that matches with internal URLs (the ones that don't include the domain or http) that I can find in a file like this one:
category/subcategory/sub-subcategory/item-1
For that I'm using:
/\w+\/.+\/[\w\-]+/
But some URLs are like this:
category/subcategory
And I need a regular expression that also catches those. Do I have to create a different one, or is it possible to create one that matches both examples? It is for a Bash script, but if you have an idea for another engine, that is fine too.
Thank you!!
Update: I forgot the context. Each line of the file is like this:
"11","category/subcategory/sub-subcategory/item-1","index.php?option=com_trombinoscopeextended&Itemid=125&lang=es&view=trombinoscope","251","0","0000-00-00","","","","","","","0"
Or like this:
"4","category/subcategory","index.php?option=com_trombinoscopeextended&Itemid=121&lang=es","0","1","0000-00-00","","","","","","","0"
I need to extract the examples for each line.
Thanks.
You may use
/\w+(\/[\w-]+)+/
See the regex demo.
Details
\w+ - 1+ word chars
(\/[\w-]+)+ - 1 or more consecutive sequences of
\/ - a / char
[\w-]+ - 1+ word or - chars.
A hint: you might read in your string with a kind of a CSV parser using your preferred language, and then only return fields that match ^\w+(\/[\w-]+)+$ pattern (here, ^ matches the start of the string and $ matches the end of the string).
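The CSV hint could look roughly like this in Python. This is only a sketch: the question asks about Bash, so Python's csv module stands in for "a kind of a CSV parser", and the function name internal_urls is made up for the example.

```python
import csv
import io
import re

# The two sample lines from the question.
data = (
    '"11","category/subcategory/sub-subcategory/item-1",'
    '"index.php?option=com_trombinoscopeextended&Itemid=125&lang=es&view=trombinoscope",'
    '"251","0","0000-00-00","","","","","","","0"\n'
    '"4","category/subcategory",'
    '"index.php?option=com_trombinoscopeextended&Itemid=121&lang=es",'
    '"0","1","0000-00-00","","","","","","","0"\n'
)

# Anchored version of the pattern: the whole field must be an internal URL.
internal_url = re.compile(r"^\w+(/[\w-]+)+$")


def internal_urls(text):
    """Return every CSV field in text that looks like an internal URL."""
    found = []
    for row in csv.reader(io.StringIO(text)):
        found.extend(field for field in row if internal_url.match(field))
    return found


print(internal_urls(data))
```

Parsing the fields first means the index.php?... and date columns are tested as whole values, so the anchors keep them from matching partially.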
That is pretty specific. I came up with this one after some testing. We have subdomains we need to check for as well.
(?!https?:)/?[^/][^/].*|(https?:)?//([^.]*\.)?yourdomain\.com(/.*)?
Someone can probably make it better, but this works for me.

regex - exclude substring contains more than 2 "/"

I have a list of the following strings:
/fajwe/conv_1/routing/apwfe/afjwepfj
/fajwe/conv_2/routing/apwfe
/fajwe/conv_2/routing
/fajwe/conv_3/routing/apwfe/afjwepfj/awef
/fajwe/conv_4/routing/apwfe/afjwepfj/awef/0o09
I want a regex that only matches strings containing no more than one / after the word routing, namely /fajwe/conv_2/routing/apwfe and /fajwe/conv_2/routing.
Currently I use the regex ^((?!rou\w+(\/\w+){2,}).)*$ but it matches nothing. How can I write a regex to exclude strings that contain two or more / after the word routing?
I would love to learn how to achieve this using a negative lookbehind. Many thanks!
Something like this?
^.*\/routing(\/[^\/]*){0,1}$
routing(\/[^\/]*)?$
there you go
https://regex101.com/r/KjE8ed/1/
Your regex matches what you are looking for with the multiline flag m, as @revo pointed out.
^((?!rou\w+(\/\w+){2,}).)*$
You could also try it like this:
^\/fajwe\/conv_\d\/routing(?:\/[^\/]+)?$
Depending on your language and context, you may need to escape the forward slash as \/.
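A quick way to compare the approach with the sample list is a small Python check. This is a sketch under the assumption that each string is tested on its own; it uses the first suggested pattern with the slashes left unescaped, which Python allows.

```python
import re

# The sample strings from the question.
paths = [
    "/fajwe/conv_1/routing/apwfe/afjwepfj",
    "/fajwe/conv_2/routing/apwfe",
    "/fajwe/conv_2/routing",
    "/fajwe/conv_3/routing/apwfe/afjwepfj/awef",
    "/fajwe/conv_4/routing/apwfe/afjwepfj/awef/0o09",
]

# "routing", then at most one further "/segment", then end of string.
at_most_one_segment = re.compile(r"^.*/routing(?:/[^/]*)?$")

kept = [p for p in paths if at_most_one_segment.match(p)]
print(kept)
```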

Not able to Create regular expression for a Voucher number generated in this format "484,0116/BRD/0000267" and I have to use only 0116/BRD/0000267

I have tried many combinations to extract 0116/BRD/0000267 from the number "484,0116/KMO/0000267" but could not extract it. JMeter shows: ERROR - jmeter.extractor.RegexExtractor: Error in pattern: [^,](*[0-9]|/|*[A-Z]|/|*[0-9]+?)"
Please help if anybody has an answer for this situation.
Thanks in advance.
I think this should be enough to solve your problem.
^\d+,(\d+\/\w+\/\d+)
For a better explanation:
https://regex101.com/r/oI0nP6/1
The main problem with this regex is that you inserted pipes (alternation operator in regex) where you really did not intend to use alternation, but continuation. The * quantifier cannot be applied to the alternation operator.
Use
[0-9]*/[A-Z]*/[0-9]+
or (if the substring is always at the end of the string):
[0-9]*/[A-Z]*/[0-9]+$
See regex demo
Explanation:
[0-9]* - matches 0 or more digits (perhaps, * can be replaced with + to match 1 or more occurrences)
/ - a literal forward slash
[A-Z]* - 0 or more uppercase ASCII letters (again, perhaps, * can be replaced with + to match 1 or more occurrences)
/ - a literal forward slash
[0-9]+ - 1 or more digits.
The $ asserts the position at the end of the string.
That should work with $0$ variable. You can also make your fixed pattern work with
[^,]*,([0-9]*/[A-Z]*/[0-9]+)
Use it with $1$. If the string pattern is always digits+,+digits+/+uppercase letters+/+digits, you can just use
^\d+,(\d+/[A-Z]+/\d+)$
Again, with $1$.
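To see both points outside JMeter, here is a small Python sketch. Python's re module is used only for illustration and is not the engine JMeter runs, so the exact error text will differ; extract_voucher is a made-up helper name. Group 1 here plays the role of the $1$ template.

```python
import re


def extract_voucher(text):
    """Return the part after the comma, e.g. 0116/BRD/0000267, or None."""
    match = re.search(r"[^,]*,([0-9]*/[A-Z]*/[0-9]+)", text)
    return match.group(1) if match else None


voucher = extract_voucher('"484,0116/BRD/0000267"')
print(voucher)

# The original pattern puts * where there is nothing to repeat
# (right after "(" and after "|"), so compiling it fails.
try:
    re.compile(r"[^,](*[0-9]|/|*[A-Z]|/|*[0-9]+?)")
    broken_pattern_compiles = True
except re.error:
    broken_pattern_compiles = False
print(broken_pattern_compiles)
```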
After a very long try I have got the answer. Thank you guys for your support. The expression to extract 0116/BRD/0000267 from "484,0116/BRD/0000267" is ,(.+)?\"
This worked for me in JMeter. My reasoning for each part:
, - because I want the string after the comma
(.+) - for the whole string of letters and numbers
? - to stop once I got the result (strictly speaking, ? after the group makes the group optional; a lazy match would be written (.+?))
\" - to stop before the " (inverted comma)
I used the template $1$ and set the field to check to Body.
Thank you :)
We have shared these details at
http://www.knowledgeworldforyou.com/?p=276

Regex split and concatenate path base and pattern with filename deleting part of path between them

I have a URL like this:
a) <a href=\"http://example.com/path-pattern-to-match/subPath/onemoreSubpath/arbitrary-number-of-subpaths/someArticle1\">
or:
b) <a href=\"http://example.com/path-pattern-to-match/someArticle2\">
I need to keep the start of the <a> tag, the base URL and the path pattern, and concatenate them with the someArticle part. Everything in between needs to be deleted.
Case 'b' remains untouched. Case 'a' needs to become:
<a href=\"http://example.com/path-pattern-to-match/someArticle1\">
Please answer with a regex; that is what I need. Other solutions using Perl or a Bash script could be interesting if well explained, but please avoid suggesting a parsing module or function only to say that regex is not the best solution, without offering a real solution.
PS: I need to parse a non multiline file.
someArticle is variable.
If you have look-behind support, use
(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/)(?:[^\/]+\/)*([^\/>"]*)(?=\\">)
See demo
EXPLANATION
(?<=<a href=\\"http:\/\/example\.com\/path-pattern-to-match\/) - a fixed width lookbehind making sure we have <a href=\"http://example.com/path-pattern-to-match/ literal text in front of...
(?:[^\/]+\/)* - 0 or more sequences of 1 or more characters other than / ([^\/]+) followed with a literal / (i.e. subpaths)
([^\/>"]*) - A capturing group that matches our keyword "someArticle" (0 or more characters other than ", >, or /).
(?=\\">) - A positive lookahead checking if there is a \"> right after the preceding subpattern.
Using the $1 replacement string, you can remove the subpaths and keep the "someArticle" part.
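As a rough illustration of the replacement step, here is the same pattern in Python. This is a sketch, not the tool the question uses; Python writes the backreference as \1 rather than $1, and its re module accepts this lookbehind because it has a fixed width. The names shorten and case_a/case_b are invented for the example.

```python
import re

# Same pattern as above; "/" needs no escaping in Python.
subpath_pattern = re.compile(
    r'(?<=<a href=\\"http://example\.com/path-pattern-to-match/)'
    r'(?:[^/]+/)*([^/>"]*)(?=\\">)'
)


def shorten(tag):
    """Drop the sub-paths between the path pattern and the article name."""
    return subpath_pattern.sub(r"\1", tag)


# The inputs contain a literal backslash before each quote, as in the question.
case_a = (
    r'<a href=\"http://example.com/path-pattern-to-match/subPath/'
    r'onemoreSubpath/arbitrary-number-of-subpaths/someArticle1\">'
)
case_b = r'<a href=\"http://example.com/path-pattern-to-match/someArticle2\">'

print(shorten(case_a))
print(shorten(case_b))
```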

Regex to match a string not followed by anything

I am trying to figure out a regex that will match the first item in the list below but not the others; {Some-Folder} is variable.
http://www.url.com/{Some-Folder}/
http://www.url.com/{Some-Folder}/thing/key/
http://www.url.com/{Some-Folder}/thing/119487302/
http://www.url.com/{Some-Folder}/{something-else}
Essentially I want to be able to detect anything that is of the form:
http://www.url.com/{Some-Folder}/
or
http://www.url.com/{Some-Folder}
but not
http://www.url.com/{Some-Folder}/{something-else}
So far I have
http://www.url.com/[A-Z,-]*\/^.
but this doesn't match anything
http://www.url.com/[^/]+/?$
Or, in the few parsers that use \Z as end of text,
http://www.url.com/[^/]+/?\Z
I customized a regex I've used for URL parsing before. It's not perfect, and it will need even more work once new gTLDs become more widely used. Anyway, here it is:
\bhttps?:\/\/[a-z0-9.-]+\.(?:[a-z]{2,4}|museum|travel)\/[^\/\s]+(?:\/\b)?
You may want to add case insensitive flag, for whichever language you're using.
Demo: http://rubular.com/r/HyVXU30Hvp
You may use the following regex:
(?m)http:\/\/www\.example\.com\/[^\/]+\/?$
Explanation:
(?m) : Set the m modifier which makes ^ and $ match start and end of line respectively
http:\/\/www\.example\.com\/ : match http://www.example.com/
[^\/]+ : match anything except / one or more times
\/? : optionally match /
$ : declare end of line
Online demo
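The multiline approach can be tried in Python along these lines. It is a sketch only: the host is switched back to www.url.com from the question, the folder names are placeholders, and re.MULTILINE is passed as a flag instead of the inline (?m).

```python
import re

# Placeholder URLs modelled on the list in the question.
text = "\n".join([
    "http://www.url.com/some-folder/",
    "http://www.url.com/some-folder",
    "http://www.url.com/some-folder/thing/key/",
    "http://www.url.com/some-folder/thing/119487302/",
    "http://www.url.com/some-folder/something-else",
])

# One path segment, an optional trailing slash, then end of line.
top_level = re.compile(r"http://www\.url\.com/[^/]+/?$", re.MULTILINE)

matches = top_level.findall(text)
print(matches)
```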
I've been looking for an answer to this exact problem. aaaaaa123456789's answer almost worked for me. But the $ and \Z didn't work. My solution is:
http://www.url.com/[^/]+/?.{0}