PCRE Regex - Match only brackets excluding enclosed content

I'm trying to match a pair of special characters, while excluding the enclosed content from the match. For example, ~some enclosed content~ should match only the pair of ~ and leave out some enclosed content entirely. I can only use vanilla PCRE, and capture groups aren't an option.
My match criteria for the entire string is ~([^\s].*?(?<!\s))~. Matching the first and second ~ separately would also be acceptable.

Looking at your pattern, you want a non-whitespace char right after the opening ~ and a non-whitespace char right before the closing ~.
Since ~ is the delimiter, the non-whitespace char should also not be ~ itself, so you might use:
~(?=[^~\s](?:[^~\r\n]*[^\s~])?~)|(?<=~)[^\s~](?:[^~\r\n]*[^\s~])?\K~
Explanation
~ Match literally
(?= Positive lookahead, assert that to the right is
[^~\s] Match a non whitespace char except for ~
(?: Non capture group
[^~\r\n]*[^\s~] Match repeating any char other than a newline or ~ followed by a non whitespace char except for ~
)? Close non capture group and make it optional (to also match a single char ~a~)
~ Match literally
) Close the lookahead
| Or
(?<=~) Positive lookbehind, assert ~ to the left
[^\s~] Match a non whitespace char except for ~
(?:[^~\r\n]*[^\s~])? Same optional pattern as in the lookahead
\K Forget what is matched so far
~ Match literally
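One way to sanity-check the pattern is to run it in an engine that supports both lookbehind and \K. The sketch below uses Ruby rather than PCRE itself, which is an assumption about what you have at hand; the sample strings and the constant name are made up for illustration.

```ruby
# Pattern from the answer above, unchanged.
TILDE_PAIR = /~(?=[^~\s](?:[^~\r\n]*[^\s~])?~)|(?<=~)[^\s~](?:[^~\r\n]*[^\s~])?\K~/

sample = "a ~some enclosed content~ b"

# Each match is a single "~"; the enclosed text is never part of a match.
matches = sample.scan(TILDE_PAIR)
p matches

# Replacing the matches shows exactly which characters were hit.
replaced = sample.gsub(TILDE_PAIR, "*")
p replaced

# Whitespace directly inside the delimiters disqualifies the pair.
padded = "~ padded ~".scan(TILDE_PAIR)
p padded
```

The first two calls should report only the two delimiters, and the padded sample should produce no matches at all.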

Related

Parenthesis content after a specific word

I'm trying to get UNIX group names using a regex (can't use groups because I can only get the process uid, so I'm using id <process_id> to get groups)
input looks like this
uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n
I'd like to capture kawsay, sudo, video and gpio
The only pieces I've got are:
a positive lookbehind to start capturing after groups: /(?<=groups)/
capture the parenthesis content: /\((\w+)\)/
Using PCRE's \G you may use this regex:
(?:\bgroups=|(?<!^)\G)[^(]*\(([^)]+)\)
Your intended matches are available in capture group #1
RegEx Details:
(?:: Start non-capture group
\bgroups=: Match word groups followed by a =
|: OR
(?<!^)\G: Start from the end position of the previous match (the lookbehind excludes the start of the string, where \G also matches)
): End non-capture group
[^(]*: Match 0 or more of any character that is not (
\(: Match opening (
([^)]+): Use capture group #1 to match 1+ of any non-) characters
\): Match closing )
You can use
(?:\G(?!\A)\),|\bgroups=)\d+\(\K\w+
Details:
(?:\G(?!\A)\),|\bgroups=) - either of
\G(?!\A)\), - end of the previous match (\G operator matches either start of string or end of the previous match, so the (?!\A) is necessary to exclude the start of string location) and then ), substring
| - or
\bgroups= - a whole word groups (\b is a word boundary) and then a = char
\d+\( - one or more digits and a (
\K - match reset operator that makes the regex engine "forget" the text matched so far
\w+ - one or more word chars.
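Both \G-based patterns can be tried out in Ruby, whose regex engine also supports \G and \K. This is only a sketch under that assumption, not a PCRE run. In the capture-group variant, \G(?!\A) is substituted for (?<!^)\G; both are meant to rule out the start of the string. The variable names are illustrative.

```ruby
id_output = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"

# Capture-group variant: the names land in group 1, so scan returns
# one-element arrays that are flattened afterwards.
# \G(?!\A) stands in for (?<!^)\G from the first answer.
with_group = /(?:\bgroups=|\G(?!\A))[^(]*\(([^)]+)\)/
names_a = id_output.scan(with_group).flatten

# \K variant from the second answer, unchanged: the match itself is the name.
with_keep = /(?:\G(?!\A)\),|\bgroups=)\d+\(\K\w+/
names_b = id_output.scan(with_keep)

p names_a
p names_b
```

Neither pattern picks up the kawsay entries after uid= and gid=, because \G can only continue from a match that began at groups=.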
Here are two more ways to extract the strings of interest. Both return matches and do not employ capture groups. My preference is for the second one.
str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
Match substrings between parentheses that are not followed later in the string by "groups="
Match the regular expression
rgx = /(?<=\()(?!.*\bgroups=).*?(?=\))/
str.scan(rgx)
#=> ["kawsay", "sudo", "video", "gpio"]
See String#scan.
This expression can be broken down as follows.
(?<=\() # positive lookbehind asserts previous character is '('
(?! # begin negative lookahead
.* # match zero or more characters
\bgroups= # match 'groups=' preceded by a word boundary
) # end negative lookahead
.*? # match zero or more characters lazily
(?=\)) # positive lookahead asserts next character is ')'
This may not be as efficient as expressions that employ \G (because of the need to determine if 'groups=' appears in the string after each left parenthesis), but that may not matter.
Extract the portion of the string following "groups=" and then match substrings between parentheses
First, obtain the portion of the string that follows "groups=":
rgx1 = /(?<=\bgroups=).*/
s = str[rgx1]
#=> "1001(kawsay),27(sudo),44(video),997(gpio)\n"
See String#[].
Then match the regular expression
rgx2 = /(?<=\()[^\)\r\n]+/
against s:
s.scan(rgx2)
#=> ["kawsay", "sudo", "video", "gpio"]
The regular expression rgx1 can be broken down as follows:
(?<=\bgroups=) # Positive lookbehind asserts that the current
# position in the string is preceded by 'groups=',
# which is preceded by a word boundary
.* # match zero or more characters other than line
# terminators (to end of line)
rgx2 can be broken down as follows:
(?<=\() # Use a positive lookbehind to assert that the
# following character is preceded by '('
[^\)\r\n]+ # Match one or more characters other than
# ')', '\r' and '\n'
Note:
The operations can of course be chained: str[/(?<=\bgroups=).*/].scan(/(?<=\()[^\)\r\n]+/); and
rgx2 could alternatively be written /(?<=\().+?(?=\))/, where ? makes the match of one or more characters lazy and (?=\)) is a positive lookahead that asserts that the match is followed by a right parenthesis.
This would probably be the fastest solution of those offered and certainly the easiest to test.
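One thing to watch with the chained form: str[rgx1] returns nil when the input has no "groups=" part, so calling scan on the result would raise. A minimal sketch that guards against this is shown below; the helper name group_names is hypothetical and not part of the answer.

```ruby
# Hypothetical helper wrapping the two-step approach above.
def group_names(id_output)
  # Step 1: everything after "groups=" up to the end of the line,
  # or nil when "groups=" does not occur.
  tail = id_output[/(?<=\bgroups=).*/]
  return [] if tail.nil?

  # Step 2: the text following each "(" up to the closing ")".
  tail.scan(/(?<=\()[^)\r\n]+/)
end

str = "uid=1001(kawsay) gid=1001(kawsay) groups=1001(kawsay),27(sudo),44(video),997(gpio)\n"
p group_names(str)
p group_names("uid=0(root) gid=0(root)")
```

Returning an empty array for the missing case is a design choice made for this sketch; raising an error may suit you better if absent group data should be treated as a failure.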

regex pattern to highlight all the matches for the punctuation in VBA

I need an expression that allows only the pattern below:
end word(dot)(space)start word [e.g. end. start]
In other words:
no space before a colon, semicolon or dot
one space after a colon, semicolon or dot
All other patterns should be matched so they can be identified, such as:
end.start || end . start || end .start
I used
"([\s{0,}][\.]|[\.][\s{2,}a-z]|[\.][\s{0,}a-z])"
but it is not working as I expected. I need your support, please.
You could match 1+ word characters using \w+, then an optional space ( ?), then either a colon or a semicolon using the character class [;:], then another optional space.
After that, match again 1+ word characters.
\w+ ?[;:] ?\w+
To match the dot followed by a single space variant, you don't need a character class but you could match the dot only using \.
\w+\. \w+
Edit
To highlight all the matches for wrongly spaced punctuation:
(?: [.:;]|[.:;] {2,}|(?<=\S)[;:.](?=\S))
Explanation
(?: Non capture group
[.:;] match a space followed by either . : or ;
| Or
[.:;] {2,} Match one of the listed followed by 2 or more spaces
| Or
(?<=\S)[;:.](?=\S) Match one of the listed surrounded by non whitespace chars
) Close group
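The three alternatives can be checked against the spacing variants from the question. The sketch below uses Ruby purely to exercise the pattern logic; the sample list is made up. As far as I know, the VBScript.RegExp object commonly used from VBA does not support lookbehind, so the (?<=\S) part may need to be reworked there.

```ruby
# Pattern from the edit above: flags punctuation with wrong spacing.
BAD_SPACING = /(?: [.:;]|[.:;] {2,}|(?<=\S)[;:.](?=\S))/

samples = ["end.start", "end . start", "end .start", "end.  start", "end. start"]

# Map each sample to the list of offending fragments found in it.
results = samples.map { |text| [text, text.scan(BAD_SPACING)] }.to_h

results.each { |text, hits| puts "#{text.inspect} -> #{hits.inspect}" }
```

Only the correctly spaced "end. start" should come back with an empty list; every other sample yields the fragment that breaks the rule.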

Exclude curly brace matches

I have the following strings:
logger.debug('123', 123)
logger.debug(`123`,123)
logger.debug('1bc','test')
logger.debug('1bc', `test`)
logger.debug('1bc', test)
logger.debug('1bc', {})
logger.debug('1bc',{})
logger.debug('1bc',{test})
logger.debug('1bc',{ test })
logger.debug('1bc',{ test})
logger.debug('1bc',{test })
Instead of debug there can be other calls like warn, fatal etc.
All quote pairs can be "", '' or ``.
I need to create a regular expression which matches cases 1 - 5 but not 6 - 11.
That's what I've come up with:
logger.*\(['`].*['`],\s*.([^{.*}])
This also matches 8 - 11, so I suspect this part is wrong: ([^{.*}]), but I don't understand why.
You can try this
logger\.[^(]+\((?:"(?:\\"|[^"])*"|'(?:\\'|[^'])*'|`(?:\\`|[^`])*`),[^{}]*?\)
P.S. This pattern can be shortened if we are sure there won't be any mismatched quotes and there won't be any escaped quotes inside the strings.
If there are no escaped quotes:
logger\.[^(]+\((?:"[^"]*"|'[^']*'|`[^`]*`),[^{}]*?\)
If there are no quotes inside the strings, i.e. no strings like "mr's jhon":
logger\.[^(]+\(([`"'])[^"'`]*\1,[^{}]*?\)
If there are no quotes between the quoted parts, you could make use of a capturing group to match one of the quote types (['`"]) and use a backreference \1 to match the closing quote type.
The \r\n in the negated character class is to not cross newline boundaries.
The pattern will match either the quoted parts or 1+ times a word character for the first part.
The second part matches any char except { or } or ) using a negated character class.
logger\.[^(\r\n]+\((?:(['`"])[^'`"]+\1|\w+),[^{})\r\n]+\)
That will match
logger\. Match logger.
[^(\r\n]+ Match 1+ times any char except ( or a newline
\( Match (
(?: Non capture group
(['`"]) Capture group 1
[^'`"]+\1 Match 1+ times any char except the quote types, backreference to the captured
| or
\w+ Match 1+ word chars
), Close non capture group and match ,
[^{})\r\n]+ Match 1+ times any char except { } ) or a newline
\) Match )
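To confirm the split between cases 1 - 5 and 6 - 11, the eleven lines from the question can be run through the last pattern. This is a sketch in Ruby, not the asker's environment; the constant and variable names are illustrative.

```ruby
# Last pattern from the answer above, unchanged.
LOGGER_CALL = /logger\.[^(\r\n]+\((?:(['`"])[^'`"]+\1|\w+),[^{})\r\n]+\)/

lines = [
  "logger.debug('123', 123)",
  "logger.debug(`123`,123)",
  "logger.debug('1bc','test')",
  "logger.debug('1bc', `test`)",
  "logger.debug('1bc', test)",
  "logger.debug('1bc', {})",
  "logger.debug('1bc',{})",
  "logger.debug('1bc',{test})",
  "logger.debug('1bc',{ test })",
  "logger.debug('1bc',{ test})",
  "logger.debug('1bc',{test })"
]

# true where the line matches, false where it does not.
matched = lines.map { |line| line.match?(LOGGER_CALL) }

lines.each_with_index do |line, i|
  puts format("%2d %-5s %s", i + 1, matched[i], line)
end
```

The first five lines should be reported as matches and the remaining six as non-matches, since any { or } after the comma stops the negated character class before it can reach the closing parenthesis.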

Search / and replace it with ; in xml tag with sublime text 3

I am working on an .xml file with this tag
<Categories><![CDATA[Test/Test1-Test2-Test3|Test4/Test5-Test6|Test7/Test8]]></Categories>
and I am trying to replace / with ; by using regular expressions in Sublime Text 3.
The output should be
<Categories><![CDATA[Test;Test1-Test2-Test3|Test4;Test5-Test6|Test7;Test8]]></Categories>
When I use this (<Categories>\S+)\/(.+</Categories>) it matches all the line and of course if I use this \/ it matches all / everywhere inside the .xml file.
Could you please help?
For your example string, you could use \G to assert the position at the end of the previous match, use \K to forget what has already been matched, and then match a forward slash.
In the replacement use a ;
Use a positive lookahead to assert what is on the right is ]]></Categories>
(?:<Categories><!\[CDATA\[|\G(?!^))[^/]*\K/(?=[^][]*]]></Categories>)
Explanation
(?: Non capturing group
<Categories><!\[CDATA\[ Match <Categories><![CDATA[
| Or
\G(?!^) Assert the position at the end of the previous match, not at the start
) Close non capturing group
[^/]* Match 0+ times not / using a negated character class
\K/ Forget what was matched, then match /
(?= Positive lookahead, assert what is on the right is
[^][]*]]></Categories> Match 0+ times not [ or ], then match ]]></Categories>
) Close positive lookahead
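Outside Sublime Text, the same replacement can be reproduced in Ruby, which also supports \G and \K. This is a sketch under a few assumptions: \A replaces ^ in the \G guard because ^ matches at every line start in Ruby, the brackets in the lookahead are escaped for Ruby's parser, and the extra Name line is made up to show that slashes elsewhere are left alone.

```ruby
# Adapted from the pattern above: \A instead of ^, brackets escaped.
SLASH_IN_CATEGORIES =
  %r{(?:<Categories><!\[CDATA\[|\G(?!\A))[^/]*\K/(?=[^\]\[]*\]\]></Categories>)}

xml = "<Name>a/b</Name>\n" \
      "<Categories><![CDATA[Test/Test1-Test2-Test3|Test4/Test5-Test6|Test7/Test8]]></Categories>"

# Only the "/" after \K is part of each match, so only it is replaced.
fixed = xml.gsub(SLASH_IN_CATEGORIES, ";")
puts fixed
```

The slash in the closing Categories tag survives because the lookahead cannot find ]]></Categories> after it.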

Regex Replacement

I have a string:
users/554983490\/Another+Test+/Question????\/+dhjkfsdf/
How would I write a RegExp that would match all of the forward slashes NOT preceded by a backslash?
EDIT: Is there a way to do it without using negative lookbehinds?
If your regular expressions support negative lookbehinds:
/(?<!\\)\//
Otherwise, you will need to match the character before the / as well:
/(^|[^\\])\//
This matches either the start of a string (^), or (|) anything other than a \ ([^\\]) as capture group #1 (). Then it matches the literal / after. Whatever character was before the / will be stored in the capture group $1 so you can put it back in if you are doing a replace.
Example (JavaScript):
// The g flag replaces every unescaped slash, not just the first one.
'st/ri\\/ng'.replace(/(^|[^\\])\//g, "$1\\/");
// returns "st\/ri\/ng"
You can use this :
/(?<!\\)\//
This is called a negative lookbehind.
I used / as the delimiters
(?<! <-- Start of the negative lookbehind (means that it should NOT be preceded by the following pattern)
\\ <-- The \ character (escaped)
) <-- End of the negative lookbehind
\/ <-- The / character (escaped)
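Both approaches can be compared side by side on the string from the question. The sketch below is in Ruby rather than JavaScript, which is an assumption about tooling; the constant names and the "|" replacement character are illustrative. It also shows one case where the capture-group workaround behaves differently: with two slashes in a row, the first match consumes the character the second slash would need in front of it.

```ruby
path = 'users/554983490\/Another+Test+/Question????\/+dhjkfsdf/'

# Negative lookbehind: a "/" that is not preceded by a backslash.
UNESCAPED_SLASH = %r{(?<!\\)/}
count = path.scan(UNESCAPED_SLASH).length
with_lookbehind = path.gsub(UNESCAPED_SLASH, "|")

# Workaround without lookbehind: capture the preceding character and
# put it back in the replacement. \A is used for "start of string".
NO_LOOKBEHIND = %r{(\A|[^\\])/}
with_capture = path.gsub(NO_LOOKBEHIND, '\1|')

p count
p with_lookbehind
p with_capture

# Adjacent slashes: the two approaches disagree here.
p "a//b".gsub(UNESCAPED_SLASH, "|")
p "a//b".gsub(NO_LOOKBEHIND, '\1|')
```

On the question's string both replacements give the same result; on "a//b" only the lookbehind version replaces both slashes.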