Regular expression to find content within square brackets, but with some exceptions

I want to create a regular expression to find content within square brackets, but with some exceptions:
For example:
[abc] -> it should match
['abc'] -> it should not match
[$abc] -> it should not match
[integer], e.g. [0] -> it should not match
I have used this regular expression:
\[((?!')[^]]*)\]
It works for the first two conditions but not for the other two.

This regex could do the job:
\[([^'$\d]+?)\]
Explanation:
\[ Matches a literal [ symbol.
() Capturing group
[^'$\d]+? Matches one or more characters that are not ', $ or a digit. The ? after + makes the match reluctant (non-greedy).
\] Matches a literal ] symbol.
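To check this pattern against the question's examples, here is a minimal Ruby sketch (Ruby is the language used in one of the later answers on this page). The helper name `bracket_content` is hypothetical and not part of the answer.

```ruby
# The answer's pattern: a [, then one or more characters that are not
# ', $ or a digit (matched lazily), then a ]. Group 1 is the content.
BRACKET_RE = /\[([^'$\d]+?)\]/

# Hypothetical helper: returns the content of the first match, or nil.
def bracket_content(text)
  m = BRACKET_RE.match(text)
  m && m[1]
end

["[abc]", "['abc']", "[$abc]", "[0]"].each do |text|
  puts "#{text} -> #{bracket_content(text).inspect}"
end
# [abc] -> "abc"
# ['abc'] -> nil
# [$abc] -> nil
# [0] -> nil
```

Because \d is excluded from the whole class, bracketed text that merely contains a digit, such as [a1], is rejected as well.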

You could add a $ to your negative lookahead assertion and also assert that the content is not just an integer (digits followed directly by the closing bracket):
\[((?!['$]|\d+\])[^]]*)\]
Explanation:
\[ # Match [
( # Capture in group 1:
(?! # unless the following matches here: Either...
['$] # one of the characters ' or $
| # or
\d+\] # an integer (one or more digits), followed by ]
) # End of lookahead assertion
[^]]* # Match any number of characters except closing brackets
) # End of group 1
\] # Match ]
Test it live on regex101.com.
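A minimal Ruby sketch of the lookahead version, run on the question's examples plus two extra inputs chosen here for illustration ([42] and [a1]). The ] inside the character class is escaped as \], the portable spelling; `bracket_content` is a hypothetical helper.

```ruby
# The answer's pattern, with [^]] written as [^\]].
LOOKAHEAD_RE = /\[((?!['$]|\d+\])[^\]]*)\]/

# Hypothetical helper: returns group 1 of the first match, or nil.
def bracket_content(text)
  m = LOOKAHEAD_RE.match(text)
  m && m[1]
end

["[abc]", "['abc']", "[$abc]", "[0]", "[42]", "[a1]"].each do |text|
  puts "#{text} -> #{bracket_content(text).inspect}"
end
# [abc] -> "abc"
# ['abc'] -> nil
# [$abc] -> nil
# [0] -> nil
# [42] -> nil
# [a1] -> "a1"
```

[42] is rejected because the lookahead sees digits immediately followed by ], while [a1] is still accepted because its content is not only digits.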

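You might be able to avoid the negative lookahead altogether, if it's acceptable to reject any content that contains a ', a $ or a digit anywhere (and to accept an empty []):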
\[[^]'$\d]*\]
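The trade-offs of the simpler pattern can be seen in a short Ruby sketch; the inputs [a1] and [] are added here for illustration, and the ] in the class is escaped for portability. The pattern has no capture group, so the match includes the brackets.

```ruby
# The answer's lookahead-free pattern, with [^]... written as [^\]...
SIMPLE_RE = /\[[^\]'$\d]*\]/

["[abc]", "['abc']", "[$abc]", "[0]", "[a1]", "[]"].each do |text|
  # String#[] with a regex returns the matched text, or nil.
  puts "#{text} -> #{text[SIMPLE_RE].inspect}"
end
# [abc] -> "[abc]"
# ['abc'] -> nil
# [$abc] -> nil
# [0] -> nil
# [a1] -> nil
# [] -> "[]"
```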

Related

Regex: Match after A to the last B but only before C

I would like to match everything after \t(\fn, but only up to (and excluding) the last bracket, which belongs to the \t(.
For these inputs:
\t(\fnJester (BIG))
\t(\fnJester (BIG))\i1
\t(\fnJester (BIG)
\t(\fnJester))))\fnArial (BOLD)\
I want to match:
Jester (BIG)
Jester (BIG)
Jester (BIG
Jester)))) and Arial (BOLD
I'm almost there with this pattern:
(?<=\\t\(.*?\\fn).*(?=\)|\\)
But since .* is greedy, it matches everything, even past the \
https://regex101.com/r/YzY2KN/3
You can match the regular expression
(?<=fn)[^\\]*(?=\)|\\fn)
The expression can be broken down as follows.
(?<= # begin a positive lookbehind
fn # match 'fn'
) # end positive lookbehind
[^\\]* # match zero or more chars other than '\'
(?= # begin a positive lookahead
\) # match ')'
| # or
\\fn # match '\fn'
) # end positive lookahead
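A minimal Ruby sketch applying this pattern to the question's four inputs. Single-quoted Ruby strings keep \t and \f as a literal backslash followed by a letter, which is what the question's text contains; the trailing \\ in the last input is a single backslash.

```ruby
# The answer's pattern.
FN_RE = /(?<=fn)[^\\]*(?=\)|\\fn)/

inputs = [
  '\t(\fnJester (BIG))',
  '\t(\fnJester (BIG))\i1',
  '\t(\fnJester (BIG)',
  '\t(\fnJester))))\fnArial (BOLD)\\' # '\\' is one backslash in single quotes
]

# String#scan returns every non-overlapping match.
inputs.each { |s| p s.scan(FN_RE) }
# ["Jester (BIG)"]
# ["Jester (BIG)"]
# ["Jester (BIG"]
# ["Jester))))", "Arial (BOLD"]
```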
You may use this regex:
(?<=\\fn).*?(?=\\fn|\)(?:\\(?!fn)|$))
(?=\\fn|\)(?:\\(?!fn)|$)) is the lookahead condition that makes sure the match ends right before \fn, or right before a ) that is followed either by a \ that doesn't start another \fn or by the end of the string.
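The same four inputs can be run through the lazy pattern in a short Ruby sketch. One assumption to note: in Ruby, $ means end of line rather than strictly end of string, which makes no difference for these single-line inputs.

```ruby
# The answer's pattern: match lazily after \fn until the lookahead sees either
# another \fn, or a ) followed by a \ (not starting \fn) or by the end of line.
FN_LAZY_RE = /(?<=\\fn).*?(?=\\fn|\)(?:\\(?!fn)|$))/

inputs = [
  '\t(\fnJester (BIG))',
  '\t(\fnJester (BIG))\i1',
  '\t(\fnJester (BIG)',
  '\t(\fnJester))))\fnArial (BOLD)\\' # '\\' is one backslash in single quotes
]

inputs.each { |s| p s.scan(FN_LAZY_RE) }
# ["Jester (BIG)"]
# ["Jester (BIG)"]
# ["Jester (BIG"]
# ["Jester))))", "Arial (BOLD"]
```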

Parenthesis content after a specific word

I'm trying to get UNIX group names using a regex (I can't use groups, because I can only get the process's uid, so I'm using id <process_id> to get the groups).
The input looks like this:
uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n
I'd like to capture kawsay, sudo, video and gpio
The only pieces I've got are:
a positive lookbehind to start capturing after groups: /(?<=groups)/
capture the parenthesis content: /\((\w+)\)/
Using PCRE's \G you may use this regex:
(?:\bgroups=|(?<!^)\G)[^(]*\(([^)]+)\)
Your intended matches are available in capture group #1
RegEx Details:
(?:: Start non-capture group
\bgroups=: Match word groups followed by a =
|: OR
(?<!^)\G: Start at the end position of the previous match ((?<!^) stops \G from also matching at the start of the string)
): End non-capture group
[^(]*: Match 0 or more of any character that is not (
\(: Match opening (
([^)]+): Use capture group #1 to match 1+ of any non-) characters
\): Match closing )
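A minimal Ruby sketch of this approach (Ruby's regex engine also supports \G). One deliberate change, labeled here as an assumption: (?<!^)\G is written as \G(?!\A), because in Ruby ^ always means start of line, while \A means start of string.

```ruby
str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"

# The answer's pattern, with (?<!^)\G rewritten as \G(?!\A) for Ruby.
GROUPS_RE = /(?:\bgroups=|\G(?!\A))[^(]*\(([^)]+)\)/

# With a capture group present, String#scan returns one array of groups per
# match; flatten turns that into a plain list of group-1 values.
names = str.scan(GROUPS_RE).flatten
p names # => ["kawsay", "sudo", "video", "gpio"]
```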
You can use
(?:\G(?!\A)\),|\bgroups=)\d+\(\K\w+
Details:
(?:\G(?!\A)\),|\bgroups=) - either of
\G(?!\A)\), - the end of the previous match (the \G operator matches either at the start of the string or at the end of the previous match, so (?!\A) is needed to exclude the start-of-string position), followed by the ), substring
| - or
\bgroups= - a whole word groups (\b is a word boundary) and then a = char
\d+\( - one or more digits and a (
\K - match reset operator that makes the regex engine "forget" the text matched so far
\w+ - one or more word chars.
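A minimal Ruby sketch of the \K version; Ruby's regex engine supports both \G and \K, so the pattern is used unchanged. Since the pattern has no capture group, String#scan returns the matched text, which starts after \K.

```ruby
str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"

# The answer's pattern. \K resets the start of the reported match, so each
# match is just the group name, while \G still chains from the end of the
# previous match.
GROUPS_K_RE = /(?:\G(?!\A)\),|\bgroups=)\d+\(\K\w+/

names = str.scan(GROUPS_K_RE)
p names
```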
Here are two more ways to extract the strings of interest. Both return matches and do not employ capture groups. My preference is for the second one.
str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
Match substrings between parentheses that are not followed later in the string by "groups="
Match the regular expression
rgx = /(?<=\()(?!.*\bgroups=).*?(?=\))/
str.scan(rgx)
#=> ["kawsay", "sudo", "video", "gpio"]
See String#scan.
This expression can be broken down as follows.
(?<=\() # positive lookbehind asserts previous character is '('
(?! # begin negative lookahead
.* # match zero or more characters
\bgroups= # match 'groups=' preceded by a word boundary
) # end negative lookahead
.*? # match zero or more characters lazily
(?=\)) # positive lookahead asserts next character is ')'
This may not be as efficient as expressions that employ \G (because of the need to determine if 'groups=' appears in the string after each left parenthesis), but that may not matter.
Extract the portion of the string following "groups=" and then match substrings between parentheses
First, obtain the portion of the string that follows "groups=":
rgx1 = /(?<=\bgroups=).*/
s = str[rgx1]
#=> "1001(kawsay),27(sudo),44(video),997(gpio)"
See String#[].
Then match the regular expression
rgx2 = /(?<=\()[^\)\r\n]+/
against s:
s.scan(rgx2)
#=> ["kawsay", "sudo", "video", "gpio"]
The regular expression rgx1 can be broken down as follows:
(?<=\bgroups=) # Positive lookbehind asserts that the current
# position in the string is preceded by 'groups=',
# which is preceded by a word boundary
.* # match zero or more characters other than line
# terminators (to end of line)
rgx2 can be broken down as follows:
(?<=\() # Use a positive lookbehind to assert that the
# following character is preceded by '('
[^\)\r\n]+ # Match one or more characters other than
# ')', '\r' and '\n'
Note:
The operations can of course be chained: str[/(?<=\bgroups=).*/].scan(/(?<=\()[^\)\r\n]+/); and
rgx2 could alternatively be written /(?<=\().+?(?=\))/, where ? makes the match of one or more characters lazy and (?=\)) is a positive lookahead that asserts that the match is followed by a right parenthesis.
This would probably be the fastest solution of those offered and certainly the easiest to test.
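A runnable version of the note above, using the same str. It checks that the chained form and the lazy alternative for rgx2 return the same array.

```ruby
str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"

# rgx1 then rgx2, chained as in the note.
chained = str[/(?<=\bgroups=).*/].scan(/(?<=\()[^\)\r\n]+/)

# rgx1 then the lazy alternative for rgx2.
lazy = str[/(?<=\bgroups=).*/].scan(/(?<=\().+?(?=\))/)

p chained # => ["kawsay", "sudo", "video", "gpio"]
p lazy    # => ["kawsay", "sudo", "video", "gpio"]
```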

Regex match text after last '-'

I am really stuck with the following regex problem:
I want to remove the last piece of a string, but only if '-' occurs more than once in the string.
Example:
BOL-83846-M/L -> Should match -M/L and remove it
B0L-026O1 -> Should not match
D&F-176954 -> Should not match
BOL-04134-58/60 -> Should match -58/60 and remove it
BOL-5068-4 - 6 jaar -> Should match -4 - 6 jaar and remove it (maybe in multiple search/replace steps)
It would be no problem if the regex needs two (or more) steps to remove it.
Right now I have:
[^-]*$
But in Sublime it also matches B0L-026O1 and D&F-176954.
I need your help, please.
You can match the first - (with the text around it) in a capture group, and then match from the second - to the end of the string to remove it.
In the replacement, use capture group 1.
^([^-\n]*-[^-\n]*)-.*$
^ Start of string
( Capture group 1
[^-\n]*-[^-\n]* Match the first - between chars other than - (or a newline if you don't want to cross lines)
) Close capture group 1
-.*$ Match the second - and the rest of the line
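A minimal Ruby sketch of this search/replace on the question's examples; `trim_after_second_hyphen` is a hypothetical helper name, and String#sub with '\1' stands in for Sublime's replace-with-group-1.

```ruby
# The answer's pattern: group 1 holds everything before the second hyphen;
# the second hyphen and the rest of the line are replaced away.
TRIM_RE = /^([^-\n]*-[^-\n]*)-.*$/

# Hypothetical helper: replace the match with capture group 1.
def trim_after_second_hyphen(s)
  s.sub(TRIM_RE, '\1')
end

["BOL-83846-M/L", "B0L-026O1", "D&F-176954", "BOL-04134-58/60", "BOL-5068-4 - 6 jaar"].each do |s|
  puts "#{s} -> #{trim_after_second_hyphen(s)}"
end
# BOL-83846-M/L -> BOL-83846
# B0L-026O1 -> B0L-026O1
# D&F-176954 -> D&F-176954
# BOL-04134-58/60 -> BOL-04134
# BOL-5068-4 - 6 jaar -> BOL-5068
```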
You can match the following regular expression.
^[^-\r\n]*(?:$|-[^-\r\n]*(?=-|$))
If the string contains two or more hyphens, this returns the beginning of the string up to, but not including, the second hyphen; otherwise it returns the entire string.
The regular expression can be broken down as follows.
^ # match the beginning of the string
[^-\r\n]* # match zero or more characters other than hyphens,
# carriage returns and linefeeds
(?: # begin a non-capture group
$ # match the end of the string
| # or
- # match a hyphen
[^-\r\n]* # match zero or more characters other than hyphens,
# carriage returns and linefeeds
(?= # begin a positive lookahead
- # match a hyphen
| # or
$ # match the end of the string
) # end positive lookahead
) # end non-capture group
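Because this pattern matches the part of the string to keep rather than the part to remove, a Ruby sketch can simply take the match; `keep_prefix` is a hypothetical helper name.

```ruby
# The answer's pattern: everything up to (not including) the second hyphen,
# or the whole line if there is at most one hyphen.
KEEP_RE = /^[^-\r\n]*(?:$|-[^-\r\n]*(?=-|$))/

# Hypothetical helper: String#[] with a regex returns the matched text.
def keep_prefix(s)
  s[KEEP_RE]
end

["BOL-83846-M/L", "B0L-026O1", "D&F-176954", "BOL-04134-58/60", "BOL-5068-4 - 6 jaar"].each do |s|
  puts "#{s} -> #{keep_prefix(s)}"
end
# BOL-83846-M/L -> BOL-83846
# B0L-026O1 -> B0L-026O1
# D&F-176954 -> D&F-176954
# BOL-04134-58/60 -> BOL-04134
# BOL-5068-4 - 6 jaar -> BOL-5068
```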

Modify Go regex so it doesn't pick up the last character

I have this regex, which works as shown at this link: https://regex101.com/r/HVKfYU/1
This is my regex string: (\d+[-–]\(?\d+([+\-*/^]\d+ ?[+\-*/^] ?\d+)?\)?)
These are my test strings:
(0–(2^63 - 1))
(1-(2^16 - 2))
(1-29999984)
(3-32)
This is what the regex matches in the first two cases:
0–(2^63 - 1)
1-(2^16 - 2)
// works, it doesn't match the first pair of brackets
And this is what it matches in the last two:
1-29999984)
3-32)
// doesn't work, it matches the closing bracket
I'd like it to not match the last closing bracket in any of the test strings. At the moment I'm stripping the bracket if necessary, but I would like to avoid that. How could I modify the regex, so it works as I would like?
Try (\d+[-–](?:\d+|\(\d+([+\-*/^]\d+[ ]?[+\-*/^][ ]?\d+)?\)))
It matches either plain digits or a block in parentheses.
Some explanation:
(
\d+ [-–]
(?: # non capture for alternation
\d+ # dd-dd form
| # or
\( \d+ # dd-(dd + dd) form
(
[+\-*/^]
\d+
[ ]?
[+\-*/^]
[ ]?
\d+
)?
\)
)
)
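A quick check of the suggested pattern on the question's test strings, written in Ruby only to keep all examples on this page in one language. The pattern uses only non-capturing groups, character classes and ordinary quantifiers, so it should also be accepted by Go's regexp package (RE2 syntax), but the Go side is not exercised here.

```ruby
# The answer's pattern, written with %r{} so the / inside the character
# classes needs no escaping. [-–] accepts a hyphen or an en dash.
RANGE_RE = %r{(\d+[-–](?:\d+|\(\d+([+\-*/^]\d+[ ]?[+\-*/^][ ]?\d+)?\)))}

["(0–(2^63 - 1))", "(1-(2^16 - 2))", "(1-29999984)", "(3-32)"].each do |s|
  puts s[RANGE_RE] # String#[] returns the whole match
end
# 0–(2^63 - 1)
# 1-(2^16 - 2)
# 1-29999984
# 3-32
```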

Negative lookbehind and square brackets

I'd like to create a regex that matches unmatched right square brackets. Examples:
]ichael ==> match ]
[my name is Michael] ==> no match
No nested pairs of square brackets occur in my text.
I tried to use negative lookbehind for that, more specifically I use this regex: (?<!\[(.)+)\] but it doesn't seem to do the trick.
Any suggestions?
Unless you are using .NET, lookbehinds generally have to be of fixed length. Since you just want to detect whether there are any unmatched closing brackets, you don't actually need a lookbehind, though:
^[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*\]
If this matches, you have an unmatched closing bracket.
It's a bit easier to understand if you realise that [^\[\]] is a negated character class that matches anything but square brackets, and if you lay it out in free-spacing mode:
^ # start from the beginning of the string
[^\[\]]* # match non-bracket characters
(?: # this group matches matched brackets and what follows them
\[ # match [
[^\[\]]* # match non-bracket characters
\] # match ]
[^\[\]]* # match non-bracket characters
)* # repeat 0 or more times
\] # match ]
So this tries to find a ] after matching 0 or more well-matched pairs of brackets.
Note that the part between ^ and ] is functionally equivalent to Tim Pietzcker's solution (which is a bit easier to understand conceptually, I think). What I have done is an optimization technique called "unrolling the loop". If your flavor provides possessive quantifiers, you can turn all * into *+ to increase efficiency even further.
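A minimal Ruby sketch of the unrolled pattern, together with the possessive variant the answer suggests (every * turned into *+, which Ruby supports). The helper `unmatched_close?` and the extra input [abc]def] are illustrative additions.

```ruby
# The answer's unrolled-loop pattern.
UNMATCHED_RE = /^[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*\]/

# The same pattern with every * made possessive (*+).
UNMATCHED_POSSESSIVE_RE = /^[^\[\]]*+(?:\[[^\[\]]*+\][^\[\]]*+)*+\]/

# Hypothetical helper: true if the line has a ] that closes no earlier [.
def unmatched_close?(s, re = UNMATCHED_RE)
  re.match?(s)
end

["]ichael", "[my name is Michael]", "[abc]def]"].each do |s|
  puts "#{s}: #{unmatched_close?(s)} / #{unmatched_close?(s, UNMATCHED_POSSESSIVE_RE)}"
end
```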
About your attempt
Even if you are using .NET, the problem with your pattern is that . lets the lookbehind reach past other brackets. Hence, you'd get no match in
[abc]def]
because both the first and the second ] have a [ somewhere in front of them. If you are using .NET, the simplest solution is
(?<!\[[^\[\]]*)\]
Here we use non-bracket characters in the repetition, so that we don't look past the first [ or ] we encounter to the left.
You don't need lookaround at all (and it would be difficult to use here, since most languages don't allow unlimited-length lookbehind assertions):
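^((?:\[[^\[\]]*]|[^\[\]]*)*+)\]
will match any text that ends in a closing bracket unless there's a corresponding opening bracket before it. The ^ anchor matters: without it, a match could start just after an opening [, so [my name is Michael] would be reported as containing an unmatched ]. The pattern does not (and according to your question doesn't need to) handle nested brackets.
The part before the ] can be found in $1 so you can reuse it later.
Explanation:
^ # Start at the beginning of the string
( # Match and capture in group number 1: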
(?: # the following regex (start of non-capturing group):
\[ # Either a [
[^\[\]]* # followed by non-brackets
\] # followed by ]
| # or
[^\[\]]* # Any number of non-bracket characters
)*+ # repeat as needed, match possessively to avoid backtracking
) # End of capturing group
\] # Match ]
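A minimal Ruby sketch of this approach, anchored with ^ (an assumption stated plainly: without the anchor, a match can start right after the opening [ of [my name is Michael]). Ruby supports the possessive *+; `text_before_unmatched` is a hypothetical helper returning $1.

```ruby
# The answer's pattern with a ^ anchor; the ] after the first bracket
# class is escaped for clarity.
PREFIX_RE = /^((?:\[[^\[\]]*\]|[^\[\]]*)*+)\]/

# Hypothetical helper: the text before the first unmatched ] (group 1),
# or nil when every ] closes an earlier [.
def text_before_unmatched(s)
  m = PREFIX_RE.match(s)
  m && m[1]
end

["]ichael", "[my name is Michael]", "[abc]def]"].each do |s|
  puts "#{s} -> #{text_before_unmatched(s).inspect}"
end
```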
This should do it:
'^[^\[]*\]'
Basically, it picks out any closing square bracket that has no opening square bracket between it and the beginning of the line.
\](.*)
This matches, and captures, everything after the ]:
]ichael -> ichael
[my name is Michael] ->