Why is ex (Vim) matching the pattern to the end of the line? - regex

The file contains the following line:
[assembly: AssemblyVersion("1.0.0.0")]
A Bash script replaces one version with another:
echo "%s/AssemblyVersion\s*\(.*\)/AssemblyVersion(\"$newVersionNumber\")]/g
w
q
" | ex $filePath
The question is: why does this match the whole line to the end, so that I have to add ] at the end of the replacement string?

The problem arises because .* matches all the chars to the end of the line, and \( and \) create a capturing group (unlike most NFA regex engines, Vim regex in its default "magic" mode matches a literal ( char with an unescaped ( and a literal ) with an unescaped ) in the pattern).
You may use
%s/AssemblyVersion\s*([^()]*)/AssemblyVersion(\"$newVersionNumber\")/g
Here, AssemblyVersion will match the word, then \s* will match any 0+ whitespace chars, ( will match a literal (, [^()]* will match 0+ chars other than ( and ), and ) will match a literal ).
Another regex substitution command you may use is
:%s/AssemblyVersion\s*(\zs[^()]*\ze)/\"$newVersionNumber\"/g
Here, AssemblyVersion\s*( will match AssemblyVersion, 0+ whitespaces and ( and \zs will omit that part from the match, then 0+ chars other than ( and ) will get matched, and then \ze) will check if there is ) to the right of the current location, but won't add it to the match.
\zs sets the next character to be the first character of the match. Any text before the \zs pattern will not be included into the match.
\ze sets the end of the match. Anything after the \ze pattern will not be part of the match.
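The difference between the greedy group and the negated character class can be reproduced outside Vim. The following is a minimal Python sketch, not the ex command itself: Python's re syntax swaps the roles of ( and \( relative to Vim's default mode, and the names line, new_version and replacement are illustrative assumptions.

```python
import re

line = '[assembly: AssemblyVersion("1.0.0.0")]'
new_version = "2.0.0.0"  # hypothetical value standing in for $newVersionNumber
replacement = f'AssemblyVersion("{new_version}")'

# Vim's \(.*\) is a capturing group, which Python spells (.*):
# the greedy .* runs to the end of the line and swallows the trailing ].
greedy = re.sub(r'AssemblyVersion\s*(.*)', lambda m: replacement, line)

# Literal parentheses around [^()]* stop the match at the closing ).
bounded = re.sub(r'AssemblyVersion\s*\([^()]*\)', lambda m: replacement, line)

print(greedy)   # [assembly: AssemblyVersion("2.0.0.0")
print(bounded)  # [assembly: AssemblyVersion("2.0.0.0")]
```

The first result has lost its closing ], which is why the original command needed ] in the replacement string.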

Related

Capture function and its value using Regex

I have a text that contains the following function calls:
set_name(value:"this is a test");
set_attribute(name:"description", value:"Some
Multi
Line
Value");
And I am trying to capture its data so that I get back:
'name'
or
'attribute'
The value just after "set_"
As well as the inside content:
value:"this is a test"
And
name:"description", value:"Some
Multi
Line
Value"
Respectively
I tried using this regex:
set_([A-Za-z_]+)\s*\(([\S\s]*?)\)
but it will fail if this is the set_attribute value:
set_attribute(name:"description", value:"Some
Multi
(Line)
Value");
because the lazy group stops at the first ) it finds.
I am looking for a regex that would return "attribute" and the content via two group captures:
name:"description", value:"Some
Multi
(Line)
Value"
The desired strings can be extracted with the following regular expression, with the single-line (DOTALL) flag set so that the dot matches line terminators.
(?<=^set_)\w+(?=\()|(?<=\().*?(?=\);$)
The first match is the substring between set_ and (; the second match is the substring between ( and ).
In Ruby, for example, this regex could be used as follows.
str = 'set_name(value:"this is a test");'
r = /(?<=^set_)\w+(?=\()|(?<=\().*?(?=\);$)/m
after_set, inside_parens = str.scan(r)
after_set #=> "name"
inside_parens #=> "value:\"this is a test\""
Note that in Ruby single-line or DOTALL mode (dot matches line terminators) is denoted /m.
The regex engine performs the following operations.
/
(?<=^set_)  : positive lookbehind asserts match is preceded by `set_` at
              the beginning of the string
\w+         : match 1+ word characters
(?=\()      : positive lookahead asserts following character is '('
|           : or
(?<=\()     : positive lookbehind asserts match is preceded by '('
.*?         : match 0+ characters, as few as possible
(?=\);$)    : positive lookahead asserts match is followed by ');' at
              the end of the line
/m          : flag to cause '.' to match line terminators
Each call ends with a semicolon, so you could add that character to the regex after the closing ):
set_([A-Za-z_]+)\s*\(([\S\s]*?)\);
Demo
You may use
(?ms)^set_(\w+)\((.*?)\);$
See the regex demo.
Details
(?ms) - multiline (^ and $ match start/end of the line now) and dotall (. matches line break chars) modes are ON
^ - start of a line
set_ - a literal string
(\w+) - Group 1: one or more word chars
\( - a ( char
(.*?) - Group 2: any 0 or more chars, as few as possible
\); - ); substring...
$ - at the end of the line.
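As a rough check of the pattern above, here is a minimal Python sketch run against the two calls from the question, including the troublesome (Line) value. The variable names text, pattern and calls are assumptions made for illustration.

```python
import re

text = (
    'set_name(value:"this is a test");\n'
    'set_attribute(name:"description", value:"Some\n'
    'Multi\n'
    '(Line)\n'
    'Value");'
)

# (?ms): ^ and $ anchor at line boundaries, and . also matches newlines.
pattern = re.compile(r'(?ms)^set_(\w+)\((.*?)\);$')

# findall returns one (group 1, group 2) tuple per call.
calls = pattern.findall(text)
for name, body in calls:
    print(name, repr(body))
```

Because the lazy group may only stop at a ); that ends a line, the ) inside (Line) no longer terminates the match.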
Another way to get the values using the 2 capturing groups is to repeatedly match the key:values pairs between the opening and the closing parenthesis in group 2.
^set_([A-Za-z_]+)\s*\((\w+:"[^"]+"(?:, ?\w+:"[^"]+")*)\);
Explanation
^set_ Match set_ from the start of the string
( Capture group 1
[A-Za-z_]+ Match 1+ times any of the listed
) Close group 1
\s*\( Match 0+ whitespace chars and opening (
( Capture group 2
\w+:"[^"]+" Match 1+ word chars, then from opening " till closing "
(?:, ?\w+:"[^"]+")* Optionally repeat the previous pattern preceded by a comma and optional space
) Close group 2
\); Match the closing ) followed by ;
Regex demo
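The key:value variant can be tried the same way. In this Python sketch the second step, which splits group 2 into individual pairs with a helper pattern named pair_re, is an assumption added for illustration and is not part of the answer above.

```python
import re

text = (
    'set_name(value:"this is a test");\n'
    'set_attribute(name:"description", value:"Some\n'
    'Multi\n'
    '(Line)\n'
    'Value");'
)

# re.M lets ^ match at the start of every line.
call_re = re.compile(
    r'^set_([A-Za-z_]+)\s*\((\w+:"[^"]+"(?:, ?\w+:"[^"]+")*)\);', re.M
)
# Hypothetical helper: pull each key:"value" pair out of group 2.
pair_re = re.compile(r'(\w+):"([^"]+)"')

parsed = {
    m.group(1): dict(pair_re.findall(m.group(2)))
    for m in call_re.finditer(text)
}
print(parsed)
```

Since [^"]+ matches anything except a double quote, newlines and parentheses inside a quoted value are accepted without any DOTALL flag.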

Regex lazy matching

I have this string
(Mozilla/5.0 \(X11; Linux x86_64\) AppleWebKit/537.36 \(KHTML, like Gecko\) Chrome/data Safari/data2) /Producer (Skia/PDF m80) /CreationDate (D:20200420090009+00'00') /ModDate (D:20200420090009+00'00')
I want to get the first occurrence of () where there isn't any \ before ( or ). In that case I would get
(Mozilla/5.0 \(X11; Linux x86_64\) AppleWebKit/537.36 \(KHTML, like Gecko\) Chrome/data Safari/data2)
I'm using this regex expression
\([\s\S]*[^\\]{1}\)?
However I get the whole string
Your regex can be broken down like so.
[The spaces and newlines are for clarity]
\( match a literal (
[\s\S]* match 0 or more of whitespace or not-whitespace (anything)
[^\\]{1} match 1 thing which is not \
\)? optionally match a literal )
regex101 demo
It's that [\s\S]* which winds up slurping in everything.
The ? on the end doesn't mean lazy, it makes matching the ) optional. To be lazy, ? must be put directly after an open-ended quantifier, as in *? or +? or {3,}? or {1,5}?.
To match just the first set of parentheses, we want to lazily match anything between unescaped parens. Lazy matching anything is easy: .*?.
Matching unescaped parens is a little harder. We could match [^\\]\(, but that requires a character to match. This won't work if the opening paren is at the beginning of the string because there's no character before the (. We can solve this by also matching the beginning of the string: (?:[^\\]|^)\(.
(?: non-capturing group
[^\\] match a non \
| or
^ the beginning of the string
)
\( match a literal (
.*? lazy match 0 or more of anything
[^\\] match a non \
\) match a literal )
regex101 demo
But this will be foiled by (). It will match all of ()(foo).
(?:[^\\]|^) matches the beginning of the string. \( matches the first (. That leaves .*?[^\\]\) looking at )(foo). The first ) does not match because there is no leading character left for [^\\]; the ( before it was already consumed. So .*? gobbles up characters until it hits o), which matches [^\\]\).
The boundary problem is better solved by negative lookbehinds. (?<!\\) says the preceding character must not be a \, which also holds when there is no preceding character at all. Lookarounds don't consume what they match, so they can be used to peek behind and ahead. Most, but not all, regex engines support them.
(?<!\\) \( match a literal ( which is not after a \
.*? lazy match 0 or more of anything
(?<!\\) \) match a literal ) which is not after a \
regex101 demo
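The three patterns discussed in this answer can be compared side by side. This is a minimal Python sketch, assuming a shortened copy of the string from the question; the variable names are illustrative.

```python
import re

# Shortened version of the question's string (raw strings keep the backslashes).
text = (r"(Mozilla/5.0 \(X11; Linux x86_64\) AppleWebKit/537.36 "
        r"\(KHTML, like Gecko\) Chrome/data Safari/data2) /Producer (Skia/PDF m80)")

# The asker's pattern: greedy [\s\S]* runs to the end, and the final ? only
# makes the closing paren optional.
greedy = re.search(r'\([\s\S]*[^\\]{1}\)?', text).group()

# Consuming the neighbouring characters breaks on an empty pair.
consuming = re.search(r'(?:[^\\]|^)\(.*?[^\\]\)', '()(foo)').group()

# Negative lookbehinds check for a backslash without consuming anything.
lookbehind = re.compile(r'(?<!\\)\(.*?(?<!\\)\)')
first = lookbehind.search(text).group()
empty_pair = lookbehind.search('()(foo)').group()

print(first)
print(empty_pair)  # ()
```

Only the lookbehind version stops at the first unescaped ) in both inputs.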
However, there are libraries to parse User-Agents. ua-parser has libraries for many languages.

regex pattern to highlight all the matches for the punctuation in VBA

I need an expression that allows only the pattern below:
end word(dot)(space)start word [e.g. end. start]
In other words:
no space before a colon, semicolon, or dot;
one space after a colon, semicolon, or dot.
All other patterns need to be captured so they can be identified, such as
end.start || end . start || end .start
I used
"([\s{0,}][\.]|[\.][\s{2,}a-z]|[\.][\s{0,}a-z])"
but it is not working as I expected. I need your support, please.
You could match 1+ word characters using \w+ and match either a colon or semicolon using a character class [;:] between optional spaces ( ?).
After that, match again 1+ word characters.
\w+ ?[;:] ?\w+
Regex demo
To match the dot followed by a single space variant, you don't need a character class but you could match the dot only using \.
\w+\. \w+
Regex demo
Edit
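To highlight all the matches for the punctuation (note that this pattern uses a lookbehind, which the VBScript RegExp engine commonly used from VBA does not support):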
(?: [.:;]|[.:;] {2,}|(?<=\S)[;:.](?=\S))
Explanation
(?: Non capture group
 [.:;] Match a space followed by either . : or ;
| Or
[.:;] {2,} Match one of the listed followed by 2 or more spaces
| Or
(?<=\S)[;:.](?=\S) Match one of the listed surrounded by non whitespace chars
) Close group
Regex demo
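To see which spacing variants the highlighting pattern flags, here is a minimal Python sketch. The function name violations is a hypothetical helper introduced only for this illustration.

```python
import re

# Same three alternatives as above: space before the mark, two or more
# spaces after it, or no whitespace on either side.
spacing_error = re.compile(r'(?: [.:;]|[.:;] {2,}|(?<=\S)[;:.](?=\S))')

def violations(text):
    """Return every substring that breaks the spacing rule."""
    return [m.group() for m in spacing_error.finditer(text)]

for sample in ["end. start", "end.start", "end . start", "end .start", "end.  start"]:
    print(repr(sample), violations(sample))
```

Only the well-formed "end. start" comes back with an empty list.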

Exclude curly brace matches

I have the following strings:
logger.debug('123', 123)
logger.debug(`123`,123)
logger.debug('1bc','test')
logger.debug('1bc', `test`)
logger.debug('1bc', test)
logger.debug('1bc', {})
logger.debug('1bc',{})
logger.debug('1bc',{test})
logger.debug('1bc',{ test })
logger.debug('1bc',{ test})
logger.debug('1bc',{test })
Instead of debug there can be other calls like warn, fatal etc.
All quote pairs can be "", '' or ``.
I need to create a regular expression which matches cases 1 - 5 but not 6 - 11.
That's what I've come up with:
logger.*\(['`].*['`],\s*.([^{.*}])
This also matches 8 - 11, so I suspect the part ([^{.*}]) is wrong, but I don't get why.
You can try this
logger\.[^(]+\((?:"(?:\\"|[^"])*"|'(?:\\'|[^'])*'|`(?:\\`|[^`])*`),[^{}]*?\)
Regex Demo
P.S.: This pattern can be shortened if we are sure there won't be any mismatched quotes, and if there won't be any escaped quotes inside the strings.
If there are no escaped quotes:
logger\.[^(]+\((?:"[^"]*"|'[^']*'|`[^`]*`),[^{}]*?\)
If there are no quotes inside the strings, i.e. no strings like "mr's jhon":
logger\.[^(]+\(([`"'])[^"'`]*\1,[^{}]*?\)
If there are no quotes between the quoted parts, you could make use of a capturing group to match one of the quote types (['`"]) and use a backreference \1 to match the closing quote type.
The \r\n in the negated character class is to not cross newline boundaries.
The pattern will match either the quoted parts or 1+ times a word character for the first part.
The second part matches any char except { or } or ) using a negated character class.
logger\.[^(\r\n]+\((?:(['`"])[^'`"]+\1|\w+),[^{})\r\n]+\)
That will match
logger\. Match logger.
[^(\r\n]+ Match 1+ times any char except ( or a newline
\( Match (
(?: Non capture group
(['`"]) Capture group 1
[^'`"]+\1 Match 1+ times any char except the quote types, then a backreference to the captured quote
| or
\w+ Match 1+ word chars
), Close non capture group and match ,
[^{})\r\n]+ Match 1+ times any char except { } ) or a newline
\) Match )
Regex demo
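The last pattern can be checked against all eleven lines from the question. This is a minimal Python sketch; the names calls, pattern and matched are assumptions for illustration.

```python
import re

calls = """logger.debug('123', 123)
logger.debug(`123`,123)
logger.debug('1bc','test')
logger.debug('1bc', `test`)
logger.debug('1bc', test)
logger.debug('1bc', {})
logger.debug('1bc',{})
logger.debug('1bc',{test})
logger.debug('1bc',{ test })
logger.debug('1bc',{ test})
logger.debug('1bc',{test })
"""

pattern = re.compile(
    r"""logger\.[^(\r\n]+\((?:(['`"])[^'`"]+\1|\w+),[^{})\r\n]+\)"""
)

# The pattern has a capturing group, so use finditer and take the whole match.
matched = [m.group() for m in pattern.finditer(calls)]
print(len(matched))  # 5
```

Because the second argument must consist of 1+ chars other than { } ) or a newline, every call whose second argument contains a brace is skipped.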

regex match after word

I would like to know how to capture text only if the beginning of the line matches a certain string, but I don't want to capture the beginning string itself.
for example if i have the text:
BEGIN_TAG: Text To Capture
WRONG_TAG: Text Not to Capture
i want to capture:
Text To Capture
from the line that begins with BEGIN_TAG:, not the line that begins with WRONG_TAG:
I know how to select the line that begins with the desired text: ^BEGIN_TAG:\W?(.*)
but this selects the text "BEGIN_TAG:" as well. I only want the text after "BEGIN_TAG:".
I am using PCRE regex.
Instead of a positive lookbehind that does not allow unknown width patterns, you may use a match reset operator \K:
^BEGIN_TAG:\W?\K.*
See the regex demo
Details:
^ - start of a line (in Sublime Text; in other PCRE hosts this may require the multiline flag, (?m))
BEGIN_TAG: - a string of literal chars
\W? - 1 or 0 non-word chars
\K - the match reset operator that discards all text matched so far
.* - any 0+ chars other than linebreak characters (the rest of the line); these are the only chars that will be kept in the matched text.
You can use lookbehind. Then, the text in the lookbehind group isn't part of the whole match. You can see it as an anchor like \b, ^, etc.
You then get:
(?<=^BEGIN_TAG:\W)(\w.*)$
Explained:
(?<= # Positive lookbehind group
^ # Start of line / string
BEGIN_TAG: # Literal
\W # A non-word character ([^a-zA-Z0-9_])
)
( # First and only matching group (probably not needed)
\w # A word character ([a-zA-Z0-9_])
.* # Any character, any number of times
)
$ # End of line / string
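Python's built-in re module has no \K operator, so the sketch below swaps in a plain capturing group (the asker's own pattern, reading group 1 instead of the whole match) to show the intended result. The names text and captured are illustrative assumptions.

```python
import re

text = "BEGIN_TAG: Text To Capture\nWRONG_TAG: Text Not to Capture"

# re.M makes ^ match at the start of each line; with one capturing group,
# findall returns only the text after the tag.
captured = re.findall(r'^BEGIN_TAG:\W?(.*)', text, flags=re.M)
print(captured)  # ['Text To Capture']
```

In an engine that supports \K or lookbehind, the same text would be the whole match rather than group 1.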