Match but don't select with regex

I'm trying to capture the term ISomething only when it isn't immediately preceded by a full stop (.). However, my search also captures the preceding letter or space. I'm using JavaScript-style regex (VS Code search, to be exact).
My aim is to replace ISomething with Namespace.ISomething without touching existing namespaces.
Live example
My search sample
Api.Resources.Things.Bits.ISomething //doesn't match
something : ISomething
List<ISomething>
something:ISomething
isomething //doesn't match
My regex
[^\.](ISomething)
My matches: the first captures the whitespace, the second the angle bracket, the third the colon.
ISomething
<ISomething
:ISomething
How (and why) can I just get the word ISomething in all of the above?

UPDATE
You can use infinite-width lookahead and lookbehind without any constraint beginning with Visual Studio Code v.1.31.0 release, and you do not need to set any options for that now.
So, the solution can look like
Find what: \b(?<!\.)ISomething\b
Replace with: Namespace.$&
The (?<!\.) is a negative lookbehind that matches a position that is not immediately preceded by a literal dot. It should come after \b for better performance, so that the lookbehind check is not performed at every position in the string. The $& in the replacement is a placeholder for the whole match value.
With older versions you may use
Find what: (^|[^.])\b(ISomething)\b
Replace with: $1Namespace.$2
See the regex demo and the VSCode settings below:
NOTE that the Aa (case sensitivity) and .* (regex mode) options must be ON.
After clicking Replace all, the results are:
Regex details
(^|[^.]) - Group 1: either the start of the line/string or any char other than .
\b - a word boundary
(ISomething) - Group 2: the word ISomething
\b - a word boundary
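As a rough illustration outside the editor, the two find/replace pairs above can be tried in Python. This is only a sketch: Python's re module is not the engine VS Code uses, \g<0> stands in for $&, and the SAMPLE text is simply the search sample from the question.

```python
import re

# The search sample from the question, one case per line.
SAMPLE = """Api.Resources.Things.Bits.ISomething
something : ISomething
List<ISomething>
something:ISomething
isomething"""

# Lookbehind variant: only the word itself is matched, so the whole
# match (\g<0>, Python's counterpart of $&) is reused in the replacement.
with_lookbehind = re.sub(r"\b(?<!\.)ISomething\b", r"Namespace.\g<0>", SAMPLE)

# Group variant: the preceding character is consumed into group 1 and
# has to be written back by the replacement.
with_groups = re.sub(
    r"(^|[^.])\b(ISomething)\b",
    r"\1Namespace.\2",
    SAMPLE,
    flags=re.MULTILINE,  # let ^ match at the start of every line
)

print(with_lookbehind)
```

Both variants leave the already-qualified first line and the lowercase isomething untouched; they differ only in whether the preceding character is part of the match.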

If supported, \K may be what you are looking for:
[^\.]\KISomething

You could use a negative lookbehind:
(?<!\.)ISomething
Basically, this will match any ISomething that is not preceded by a literal dot (.).

Related

Find multiple occurrences of a character after another character

I need to find and replace multiple occurrences of a character after another character.
My file looks like this:
b
a
b
b
And I need to replace all b after a with c:
b
a
c
c
I came up with this: a((\n|.)*)b as the find expression and a$1c as the replace option, however it only replaces the last match instead of all of them.
I am using VSCode's global search and replace option.
I found a dirty way to achieve what I want: I add a ? lazy quantifier after .* so that it matches only once, and I apply the replacement. Then I can do it again and it will replace the next match. I repeat this until all occurrences are replaced.
However, this would not be usable if there were thousands of matches, and I would be very interested to know if there is a proper way to do it with only one find.
How can I match all b after a?
You can use
(?<=a[\w\W]*?)b
Replace with c. Details:
(?<=a[\w\W]*?) - a positive lookbehind that matches a location that is immediately preceded with a and then any zero or more chars (as few as possible)
b - a b.
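For comparison, here is a small Python sketch of the same effect. Python's built-in re module only accepts fixed-width lookbehinds, so this swaps the regex for plain string methods: everything after the first a has its b characters replaced. The function name and its parameters are hypothetical, chosen for illustration only.

```python
def replace_after_marker(text: str, marker: str = "a", old: str = "b", new: str = "c") -> str:
    """Replace every `old` that occurs after the first `marker`."""
    # partition() splits at the first occurrence of marker only.
    head, found, tail = text.partition(marker)
    if not found:
        # No marker at all: nothing is "after a", so leave the text alone.
        return text
    return head + found + tail.replace(old, new)


print(replace_after_marker("b\na\nb\nb"))
```

The b before the first a is kept, which matches what the lookbehind asserts: a replaced b must have an a somewhere to its left.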
Also, see Multi-line regular expressions in Visual Studio Code for more ways to match any char across lines.
Demo:
After replacing:
If you need to do this kind of replacement across multiple files, note that the Rust regex engine used by the VSCode file search and replace feature is much less powerful and supports neither \K, nor \G, nor infinite-width lookbehinds. I suggest using the Notepad++ Replace in Files feature:
The (?:\G(?!\A(?<!(?s:.)))|a)[^b]*\Kb pattern matches
(?:\G(?!\A(?<!(?s:.)))|a) - either of the two options:
\G(?!\A(?<!(?s:.))) - the end of the previous successful match ((?!\A(?<!(?s:.))) is necessary to exclude the start of file position from \G)
| - or
a - an a
[^b]* - any zero or more occurrences of chars other than b
\K - omit the matched text
b - a b char.
It's probably not the prettiest, but when tried and tested the following worked for me:
(?:^a\n|\G(?<!\A))\n*\Kb$
See the online demo. I don't know VSCode, but a quick search led me to believe it should follow Perl-based PCRE2 syntax, as per the linked demo.
(?: - Open non-capture group:
^a\n - Start line anchor followed by "a" and a newline character.
| - Or:
\G(?<!\A) - Meta escape, assert position at the end of the previous match or the start of the string. The negative lookbehind prevents the start-of-string position from being matched.
) - Close non-capture group.
\n* - 0+ new-line characters.
\K - Meta escape, reset starting point of reported match.
b$ - Match a literal "b", followed by an end-line anchor.

Regex - How to prevent any string that starts with "de" but cannot use lookahead or lookbehind?

I have a regex
[a-zA-Z][a-z]
I have to change this regex so that it does not accept strings that start with "de", "DE", "dE", or "De". I cannot use lookbehind or lookahead because my system does not support them.
There's a solution without a lookahead or lookbehind, but you need to be able to use groups.
The idea there is to create a sort of "honeypot" that will match your negative results and keep only the results that do interest you.
In your case, that would write:
[dD][eE].*|(<your-regex>)
If the proposition is de<anything> (case insensitive here), it will match, but group(1) will be null.
On the other hand, diZ for instance would not match what is before the | and would therefore fall into group(1).
Finally, if the proposition doesn't start with de and doesn't match your regex, well, there will be no groups to get at all.
If you need to be sure that your proposition will match the whole provided string, you can update the regex thus:
^(?:[dD][eE].*|(<your-regex>))$
Note that ?: is not a lookahead of any kind, it serves to mark the group as non-capturing, so that <your-regex> will still be captured by group(1) (would become group(2) otherwise and the capture of a group is not always a transparent operation, performance-wise).
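A minimal Python sketch of this "honeypot" idea, assuming <your-regex> is the question's [a-zA-Z][a-z] and using the anchored form. The classify helper and its three labels are made up for illustration.

```python
import re

# Left branch: the honeypot that swallows anything starting with de/DE/dE/De.
# Right branch: the real pattern, captured in group 1.
HONEYPOT = re.compile(r"^(?:[dD][eE].*|([a-zA-Z][a-z]))$")


def classify(candidate: str) -> str:
    match = HONEYPOT.match(candidate)
    if match is None:
        return "no match"   # neither branch matched
    if match.group(1) is None:
        return "rejected"   # the honeypot branch matched
    return "accepted"       # the real pattern matched


for word in ("de", "DEf", "di", "d1"):
    print(word, classify(word))
```

The caller has to check group 1 rather than just whether a match exists; that check is what replaces the lookahead.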
Simply ignore those characters:
[a-ce-z][a-df-z][a-gi-kwxyzWZXZ]
Make sure the flag is set to case insensitive. Also, [a-gi-kwxyzWZXZ] can then be modified to [a-gi-kwxyz].
EDIT:
As pointed out in this comment, the regex here won't support other words that start with d but are not followed by e. In this case, negative lookahead is a possible solution:
^(?!de)[a-z]+
This matches anything not starting with "DE" (case insensitive, without look arounds, allowing leading whitespace):
^ *+(?:[^Dd].|.[^Ee])<your regex for rest of input>
See live demo.
The possessive quantifier *+ used for whitespace prevents [^Dd] from being allowed to match a space via backtracking, making this regex hardened against leading spaces.
You can use an alternation excluding matching the d and D from the first character, or exclude matching the e as the second character.
Note that the pattern [a-zA-Z][a-z] matches at least 2 characters, so will the following pattern:
^(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z]).*
^ Start of string
(?: Non capture group
[abce-zABCE-Z][a-z] Match a char a-zA-Z without d and D followed by a lowercase char a-z
| or
[a-zA-Z][a-df-z] Match a char a-zA-Z followed by a lowercase char a-z without e
) Close non capture group
.* Match 0+ times any char except a newline
Regex demo
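A quick way to check the alternation above is to run it against a few inputs. This is a sketch in Python; the accepts helper is a hypothetical name used only here.

```python
import re

# Either the first char is not d/D, or the second char is not e.
NOT_DE = re.compile(r"^(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z]).*")


def accepts(candidate: str) -> bool:
    return NOT_DE.match(candidate) is not None


for word in ("data", "ed", "Dx", "de", "De", "dE", "DE"):
    print(word, accepts(word))
```

Note that dE and DE are rejected for a second reason as well: like the original [a-zA-Z][a-z], the pattern requires a lowercase second character.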
Another option is to use word boundaries \b instead of an anchor ^
\b(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z])[a-zA-Z]*\b
Regex demo

RegEx: Excluding a pattern from the match

I know some basics of regex, but I am not a pro and am still learning. Currently, I am using the following very simple regex to match any digit in a given sentence.
\d
Now I want to match all digits except those that are part of certain patterns, such as e074663, e123444, or e7736. So for the following input,
Edit 398e997979 the Expression 9798729889 & T900980980098ext to see e081815 matches. Roll over matches or e081815 the expression e081815 for details.e081815 PCRE & JavaScript flavors of RegEx are e081815 supported. Validate your expression with Tests mode e081815.
Only the bold digits should be matched, and not any e081815. I tried the following without success.
(^[e\d])(\d)
Also, going forward, more patterns need to be added for exclusion, for example cg636553 or cg(any digits). Any help in this regard will be much appreciated. Thanks!
Try this:
(?<!\be)(?<!\d)\d+
Test it live on regex101.com.
Explanation:
(?<!\be) # make sure we're not right after a word boundary and "e"
(?<!\d) # make sure we're not right after a digit
\d+ # match one or more digits
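The pattern can be tried in Python, whose re module accepts these fixed-width lookbehinds. The text below is a shortened version of the question's input. The second pattern, with an extra (?<!\bcg), is only an assumption about how the cg exclusion mentioned in the question might be added; it is not part of the answer above.

```python
import re

DIGITS = re.compile(r"(?<!\be)(?<!\d)\d+")

text = (
    "Edit 398e997979 the Expression 9798729889 & T900980980098ext "
    "to see e081815 matches or details.e081815"
)
found = DIGITS.findall(text)
print(found)

# Assumed extension: one more lookbehind per excluded prefix.
DIGITS_NO_CG = re.compile(r"(?<!\be)(?<!\bcg)(?<!\d)\d+")
found_no_cg = DIGITS_NO_CG.findall("cg636553 e7736 x12 42")
print(found_no_cg)
```

The (?<!\d) is what stops the engine from matching 81815 inside e081815 after the first digit has been refused.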
If you want to match individual digits, you can achieve that using the \G anchor that matches at the position after a successful match:
(?:(?<!\be)(?<=\D)|\G)\d
Test it here
Another option is to use a capturing group with lookarounds
(?:\b(?!e|cg)|(?<=\d)\D)[A-Za-z]?(\d+)
(?: Non capture group
\b(?!e|cg) Word boundary, assert what is directly to the right is not e or cg
| Or
(?<=\d)\D Match any char except a digit, asserting what is directly on the left is a digit
) Close group
[A-Za-z]? Match an optional char a-zA-Z
(\d+) Capture 1 or more digits in group 1
Regex demo

Regex: don't match if the pattern start with /

My regex (PCRE):
\b([\w-.]*error)\b(?:[^-\/.]|\.\W|\.$|$)
is a match (the actual match is surrounded by stars) :
**this.is.an.error**
**this.IsAnerror**
**this.is.an.error**.
**this.is.an.error**(
bla **this_is-an-error**
**this.is.an.error**:
this is an (**error**)
not a match:
this.is.an.error.but.dont.match
this.is.an.error-but.dont.match
this.is.an.error/but.dont.match
this.is.an.error/
/this.is.an.error
for this sample: /this.is.an.error
I can't manage to have a condition that will reject the whole match if it starts with the character /.
every combination I've tried resulted in some partial catch (which is not the desired).
Is there any simple or fancy way to do that?
You can try to add lookbehinds at the beginning instead of a word boundary:
(?<!\/)(?<=[^\w-.])([\w-.]*error)\b(?:[^-\/.]|\.\W|\.$|$)
Explanation:
(?<!\/) - negative lookbehind assuring there is no / before the first character;
(?<=[^\w-.]) - word boundary implementation taking into account your extended definition of characters accepted for a word [\w-.];
Demo
Prepend your regex with \/.*|:
\/.*|\b([\w-.]*error)\b(?=[^-\/.]|(?:\.\W?)?$)
Now just like before the first capturing group holds the desired part.
See live demo here
Note: I made some modifications to your regex to remove unnecessary alternations.
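Here is a hedged Python sketch of how the capturing group might be read when this pattern is used. Two adaptations are assumptions of this sketch, not part of the answer: the class is written [\w.-] because Python's re rejects [\w-.] as a bad range, and the forward slashes are left unescaped. The extract helper is hypothetical.

```python
import re

# Left branch swallows everything from a "/" to the end of the line,
# so text after a slash can never reach the capturing group.
PATTERN = re.compile(r"/.*|\b([\w.-]*error)\b(?=[^-/.]|(?:\.\W?)?$)")


def extract(line: str):
    """Return the first group-1 capture in the line, or None."""
    for match in PATTERN.finditer(line):
        if match.group(1) is not None:
            return match.group(1)
    return None


for line in ("this.is.an.error.", "this is an (error)", "/this.is.an.error"):
    print(repr(line), "->", extract(line))
```

A match with an empty group 1 means the slash branch fired, so the caller treats it the same as no match.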

how to get sub-string using regex if I specify start and end, without start characters?

I have string like this:
12abcc?p_auth=123ABC&ABC&s
The substring starts after "p_auth=" and ends at the first "&" symbol.
P.S. The '&' symbol and 'p_auth=' must not be included.
I have written this regex:
(p_auth).+?(?=&)
OK, that works well; it gets this substring:
p_auth=123ABC
But how do I get the string without 'p_auth='?
Use look-arounds:
(?<=p_auth=).*?(?=&)
See regex demo
The look-behind (?<=p_auth=) and the look-ahead (?=&) do not consume characters as they are zero-width assertions. They just check for the substring presence either before or after a certain subpattern.
A couple more words about (?<=p_auth=). It is a positive look-behind. Positive because it requires the pattern inside it to appear on the left, before the "main" subpattern. If the look-behind subpattern is found, the result is just "true" and the regex goes on checking the rest of the subpatterns. If not, the match fails and the engine goes on looking for another match at the next index.
Here is some description from regular-expressions.info:
It [the look-behind] tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a "b" that is not preceded by an "a", using negative lookbehind. It doesn't match cab, but matches the b (and only the b) in bed or debt. (?<=a)b (positive lookbehind) matches the b (and only the b) in cab, but does not match bed or debt.
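The behaviour described above can be checked with a short Python sketch. Python's re module is used here only as a convenient engine that supports these fixed-width look-behinds; URL is the string from the question.

```python
import re

URL = "12abcc?p_auth=123ABC&ABC&s"

# Both look-arounds are zero-width, so only the text between them is matched.
token = re.search(r"(?<=p_auth=).*?(?=&)", URL)
print(token.group(0))

# The examples from the quoted description.
negative = re.compile(r"(?<!a)b")  # b not preceded by a
positive = re.compile(r"(?<=a)b")  # b preceded by a
for word in ("cab", "bed", "debt"):
    print(word, bool(negative.search(word)), bool(positive.search(word)))
```

In each case the reported match is just the b (or just the token), never the character that the look-behind inspected.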
In most cases, you do not really need look-arounds. In this case, you could just use
p_auth=(.*?)&
And get the first capturing group value.
The .*? pattern will look for any number of characters other than a newline, but as few as possible that are required to find a match. It is called lazy dot matching, because the ? symbol makes the * quantifier stop before the first symbol that is matched by the subsequent subpattern in the regular expression.
The .*& would match all the substring until the last & because * quantifier is greedy - it will consume as many characters it can match as possible.
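The difference between the lazy and the greedy quantifier can be seen side by side in a small Python sketch. The = is kept outside the group here so that it is not part of the captured value; URL is the string from the question.

```python
import re

URL = "12abcc?p_auth=123ABC&ABC&s"

# Lazy: stop at the first & that allows a match.
lazy = re.search(r"p_auth=(.*?)&", URL).group(1)

# Greedy: run to the end, then backtrack to the last &.
greedy = re.search(r"p_auth=(.*)&", URL).group(1)

print(lazy)
print(greedy)
```

Only the lazy version returns the value up to the first &, which is what the question asks for.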
See more on the "Repetition with Star and Plus" page at regular-expressions.info.
p_auth=(.+?)(?=&)
Simply use this and grab group 1 (capture 1).