Regex to handle a dynamic set of delimiters

I'm writing a parser and need to handle escaped characters via regex, if possible.
Given a sample string with an escape character of '\' and a delimiter of '&':
TestSection1&TestSection2\\&TestSection3\&TestSection4
I would like to split only on a valid '&', that is, an '&' that is not escaped. So the above example would come out something like this:
TestSection1
TestSection2\
TestSection3\&TestSection4
I've tried quite a few regexes that I've muddled together, but with no luck. Does anyone have any insight into how to accomplish this, or whether it's even possible?
Thanks

You can use this regex, which is based on a double lookbehind:
(.+?)(?:(?<!(?<!\\)\\)&|$)
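(?:(?<!(?<!\\)\\)&|$) means: match either an & that is not preceded by a single \ (a \ that is not itself preceded by another \), or the end of the string. The lazy (.+?) in front of it captures the text of each section.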
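The following is a quick way to try the answer's pattern, sketched in Python. Python's re module is an assumption, since the question doesn't name a flavor, but it does accept this nested fixed-width lookbehind. re.findall returns the captured sections. The sketch also shows a limitation: the lookbehind only looks two characters back, so an escaped backslash followed by an escaped & (\\\&) is split incorrectly. The second function uses a different technique, not the answer's: it matches whole fields built from escape pairs (\ plus any character) or ordinary characters, which copes with any number of backslashes. The helper names are hypothetical.

```python
import re

# The answer's pattern: lazily capture a section, then stop at an '&' that is
# not preceded by a lone '\' (the double lookbehind), or at the end of input.
LOOKBEHIND_RE = re.compile(r"(.+?)(?:(?<!(?<!\\)\\)&|$)")

# Alternative technique: describe a field instead of the delimiter. A field is
# one or more escape pairs (\ + any char) or chars other than '\' and '&'.
# Note: unlike a full parser, this skips empty fields such as the one in "a&&b".
FIELD_RE = re.compile(r"(?:\\.|[^\\&])+")


def split_lookbehind(text):
    """Split on unescaped '&' using the double-lookbehind regex."""
    return LOOKBEHIND_RE.findall(text)


def split_fields(text):
    """Split on unescaped '&' by matching whole fields."""
    return FIELD_RE.findall(text)


sample = r"TestSection1&TestSection2\\&TestSection3\&TestSection4"
print(split_lookbehind(sample))
print(split_fields(sample))

# An escaped backslash followed by an escaped '&': one field, not two.
tricky = r"a\\\&b"
print(split_lookbehind(tricky))  # splits after the backslashes (wrong)
print(split_fields(tricky))      # keeps a single field
```

Both functions agree on the sample from the question. They only differ once more than two backslashes precede an &.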

Related

Brackets within a Regex string

I'm trying to use a regular expression to match a string. Brackets are special characters in regex, and I'm unsure how I'd go about including them in my regex.
To provide more context, I want to find a string such as test[test].
My regex currently looks like this: ^*test[test]. My expression is much more built out than this, but this example is enough to understand the problem.
How can I search for brackets in my string without triggering a character class? I need to use a regex, so please don't recommend switching to something else.
You can escape a character with a backslash, so \[ matches a literal [.
I can highly recommend https://regex101.com/ to test your regex without having to code it.
Try: ^.*test\[test\] - this means {start of line}, {anything}, "test[test]".
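Here is a small sketch in Python's re module. Python is an assumption, since the question doesn't say which flavor is in use. It contrasts the escaped and unescaped patterns: when unescaped, [test] is a character class that matches a single t, e or s. It also uses re.escape(), which escapes a literal string for you. Finally, it shows that Python rejects the original ^*, because after the ^ anchor the * has nothing repeatable to apply to.

```python
import re

# Escaped: \[ and \] match literal brackets.
escaped = re.compile(r"^.*test\[test\]")

# Unescaped: [test] is a character class matching ONE of 't', 'e' or 's'.
unescaped = re.compile(r"^.*test[test]")

print(bool(escaped.search("prefix test[test]")))    # True
print(bool(escaped.search("prefix testt")))         # False
print(bool(unescaped.search("prefix testt")))       # True: 'testt' fits test[test]

# re.escape() escapes every regex metacharacter in a literal string.
literal = re.escape("test[test]")
print(literal)                                      # test\[test\]


def compiles(pattern):
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# '^*' tries to repeat the ^ anchor, which Python reports as "nothing to repeat".
print(compiles(r"^*test\[test\]"))                  # False
print(compiles(r"^.*test\[test\]"))                 # True
```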

How to do a negative lookbehind within a %r<…>-delimited regexp in Ruby?

I like the %r<…> delimiters because they make it really easy to spot the beginning and end of the regex, and I don't have to escape any /. But it seems that they have an insurmountable limitation that other delimiters don't have?
Every other delimiter imaginable works fine:
/(?<!foo)/
%r{(?<!foo)}
%r[(?<!foo)]
%r|(?<!foo)|
%r/(?<!foo)/
But when I try to do this:
%r<(?<!foo)>
it gives this syntax error:
unterminated regexp meets end of file
Okay, it probably doesn't like that the < isn't part of a balanced pair, but how do you escape it so that Ruby accepts it?
Does something need to be escaped?
According to wikibooks.org:
Any single non-alpha-numeric character can be used as the delimiter,
%[including these], %?or these?, %~or even these things~.
By using this notation, the usual string delimiters " and ' can appear
in the string unescaped, but of course the new delimiter you've chosen
does need to be escaped.
Indeed, escaping is needed in these examples:
%r!(?<\!foo)!
%r?(\?<!foo)?
But if that were the only problem, then I should be able to escape it like this and have it work:
%r<(?\<!foo)>
But that yields this error:
undefined group option: /(?\<!foo)/
So maybe escaping is not needed/allowed? wikibooks.org does list %<pointy brackets> as one of the exceptions:
However, if you use
%(parentheses), %[square brackets], %{curly brackets} or
%<pointy brackets> as delimiters then those same delimiters
can appear unescaped in the string as long as they are in balanced
pairs
Is it a problem with balanced pairs?
Balanced pairs are no problem as long as you are doing something in the Regexp that requires them, like...
%r{(?<!foo{1})} # repetition quantifier
%r[(?<![foo])] # character class
%r<(?<name>foo)> # named capture group
But what if you need to insert a left-side delimiter ({, [, or <) inside the regex? Just escape it, right? Ruby seems to have no problem with escaped unbalanced delimiters most of the time...
%r{(?<!foo\{)}
%r[(?<!\[foo)]
%r<\<foo>
It's only when you try to do it in the middle of the "group options" (which I guess is what the <! characters are classified as here), right after a (?, that Ruby doesn't like it:
%r<(?\<!foo)>
# undefined group option: /(?\<!foo)/
So how do you do that then and make Ruby happy? (without changing the delimiters)
Conclusion
The workaround is easy: I'll just change this particular regex to use a different delimiter, such as %r{…}.
But the questions remain...
Is there really no way to escape the < here?
Are there really some regular expression that are simply impossible to write using certain delimiters like %r<…>?
Is %r<…> the only regular expression delimiter pair that has this problem (where some regular expressions are impossible to write with it)? If you know of a similar example with %r{…} or %r[…], do share!
Version info
Not that it should matter, since this syntax probably hasn't changed, but I'm using:
⟫ ruby -v
ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-linux]
Reference:
https://ruby-doc.org/core-2.6.3/Regexp.html
% Notation
As others have mentioned, this seems like an oversight, given how this character differs from the other paired delimiters.
As for "Is there really no way to escape the < here?": there is a way... but you're not going to like it:
%r<(?#{'<'}!foo)> == %r((?<!foo))
Using interpolation to insert the < character seems to work. But given that there are much better options, I would avoid it unless you were planning on splitting the regex into sections anyway...

Regex Match between brackets (...)

I'm trying to grab 2 items from a simple line.
[Title](Description)
EDIT: It's actually a URL I'm looking to display. I called it Description because I want it displayed, not actually parsed.
[Trivium](https://www.youtube.com/user/trivium)
Grabbing the text between the brackets (...) doesn't seem to work at all for me. I've googled and found several variations, with no luck. Thanks in advance :)
EDIT:
Tried the following:
[(.+?)]\((.*)\)
[(.+?)]\([^\(\r\n]*\)
[(.+?)]((.+?))
and a couple more I can't find again
The first regex you listed almost has it right. Try using this regex instead:
\[(.+?)\]\((.*)\)
As @PM 77-1 pointed out, you need to escape the square brackets by placing a backslash in front of them. This is because brackets are special regex metacharacters, that is, characters with a special meaning: square brackets tell the regex engine to match any one of the characters listed inside them (a character class).
Your original regex [(.+?)]\((.*)\) is actually doing this:
[(.+?)] matches a single character that is one of (, ., +, ? or ) (inside a character class these symbols lose their special meaning)
\((.*)\) matches (anything), i.e. anything contained in parentheses
So this regex would match something like .(stuff), but would not match [Title](Description), which is what you really want.
You can test out the working regex on Regex 101 (https://regex101.com/).
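The following sketch, in Python, shows the escaped pattern with both the title and the URL captured. It also checks what the unescaped original actually matches. Python is an assumption, since the question doesn't name a language, and parse_link is a hypothetical helper name.

```python
import re

# \[ \] and \( \) are escaped; the two groups capture the title and the URL.
LINK_RE = re.compile(r"\[(.+?)\]\((.*)\)")

# The original attempt: [(.+?)] is a one-character class, not a group.
ORIGINAL_RE = re.compile(r"[(.+?)]\((.*)\)")


def parse_link(line):
    """Return (title, url) for a [Title](url) line, or None if there is no match."""
    m = LINK_RE.search(line)
    return (m.group(1), m.group(2)) if m else None


print(parse_link("[Trivium](https://www.youtube.com/user/trivium)"))
print(ORIGINAL_RE.search("[Title](Description)"))  # None
print(ORIGINAL_RE.search(".(stuff)").group(0))     # .(stuff)
```

Because (.*) is greedy, it runs to the last ) on the line. If other parentheses can follow the link, a narrower group such as ([^)]*) is safer.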

Regex expression to match all chars inside

I'm trying to mass-update a web app, and I need to create a regex that matches:
lang::id(ALLCHARACTERS]
Can someone assist me with this? I'm not good with regex. I'm pretty sure it can start like:
lang\:\:\(WHAT GOES HERE\]
Something like this would work:
lang::id\([^]]*]
This will match the literal text lang::id(, followed by zero or more of any character other than ], followed by a literal ].
Note that the only character that really needs to be escaped is the open parenthesis. (In JavaScript, though, [^] means "any character", so write [^\]] there.)
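Here is a quick check of the negated-class version in Python. Python's re is an assumption, and the sample line and its IDs are made up for illustration. Because [^]]* cannot cross a ], each lang::id(…] call is matched separately, even when several share a line.

```python
import re

# lang::id( then any run of non-']' characters, then the closing ']'.
# In Python, a ']' right after '[^' is taken literally.
LANG_ID_RE = re.compile(r"lang::id\([^]]*]")

line = "title = lang::id(page.title] + lang::id(page.subtitle]"
matches = LANG_ID_RE.findall(line)
for m in matches:
    print(m)
```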
lang::id\(.*]
The . means any single character (other than a newline, by default), and * repeats it zero or more times. Make sure to escape the (, since it is a special character in regex; without the \, the regex will probably complain about an unbalanced parenthesis.
If you don't want it to match every character, you can put a more specific sub-pattern in place of the .*. That way you can break the regex down into smaller chunks, which makes complex rules easier to understand and develop.
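The sketch below illustrates both points in Python, again assuming Python's re and using a made-up sample line. Because .* is greedy, a line with two calls produces one long match that runs to the last ]. Swapping in a smaller sub-pattern keeps each call separate. The sub-pattern here encodes a hypothetical rule that IDs contain only letters, digits, _ and .

```python
import re

line = "title = lang::id(page.title] + lang::id(page.subtitle]"

# Greedy .*: one match spanning from the first lang::id( to the LAST ']'.
greedy = re.findall(r"lang::id\(.*]", line)

# Narrower sub-pattern in place of .* (hypothetical ID rule: [A-Za-z0-9_.]*).
narrow = re.findall(r"lang::id\([A-Za-z0-9_.]*]", line)

print(greedy)
print(narrow)
```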

Regex to allow certain special characters

Currently I have the following regex, which I use to validate the name of a company/industry, and it's working fine:
/(?=[a-zA-Z0-9-]{5,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$/
The above regex doesn't support special characters like &, -, . and _, which are valid in my case.
I came up with this, but it isn't working as expected:
/(?=[a-zA-Z0-9-\&\_\.]{5,25}$)^[a-zA-Z0-9\&\_\.]+(-[a-zA-Z0-9\&\_\.]+)*$/
Can someone point out where my regex above goes wrong? A short explanation of the regex would also be greatly appreciated.
Thanks
I don't think you have to escape & as \&, and the same goes for _:
/(?=[a-zA-Z0-9-&_\.]{5,25}$)^[a-zA-Z0-9&_\.]+(-[a-zA-Z0-9&_\.]+)*$/
If I'm not wrong, inside a character class you don't have to put a backslash before every special character; only the backslash itself, ], a leading ^, and a - that would otherwise form a range need escaping. So your regular expression would be
/(?=[a-zA-Z0-9-&_.]{5,25}$)^[a-zA-Z0-9&_.]+(-[a-zA-Z0-9&_.]+)*$/
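Since the question also asked for a short explanation, here is the second answer's pattern rewritten in Python and split into commented pieces. Python is an assumption; the /…/ literal suggests JavaScript or a similar flavor. is_valid_name and the sample names are hypothetical. The lookahead checks the overall length and the allowed characters. The rest of the pattern forbids a leading, trailing or doubled -.

```python
import re

NAME_RE = re.compile(
    # Lookahead: the whole string is 5 to 25 chars from the allowed set.
    # '-' right after the range 0-9 is a literal hyphen, not a new range.
    r"(?=[a-zA-Z0-9-&_.]{5,25}$)"
    # First chunk: one or more allowed chars, hyphen excluded.
    r"^[a-zA-Z0-9&_.]+"
    # Then zero or more "-chunk" groups, so a '-' always sits between two
    # chunks: no leading, trailing or doubled hyphens.
    r"(-[a-zA-Z0-9&_.]+)*$"
)


def is_valid_name(name):
    """Return True if name passes the company/industry name rule."""
    return NAME_RE.search(name) is not None


for name in ["AT&T-Corp", "my_company.io", "abc", "-abcde", "abc--def", "abc def"]:
    print(name, is_valid_name(name))
```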