I have the following regex, which keeps "c" and the delimiter sign out of the replacement:
(?<=c[:=\s]|:=).+
The problem is that when there are spaces after the delimiter, it replaces them as well:
c= test1
will be replaced, for example, with:
c=test
How can I preserve the space after the delimiter so that it is not replaced:
c= test
I have tried the following:
(?<=c[:=\s]\s).+
But it fails to match, and so fails to replace, strings that do not contain a space after the delimiter:
c=test1
You could match c followed by : or = and zero or more whitespace characters \s*, capture that prefix in a group (c[:=]\s*), and then match one or more characters with .+. Start with a word boundary \b to make sure c is not part of a longer word.
As replacement you could use the first capturing group \1 followed by your replacement text.
Match
\b(c[:=]\s*).+
Replace with
\1test
Demo Python
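A minimal Python sketch of the match/replace pair above. The function name replace_value and the default replacement "test" are placeholders for this example only. A function is used as the replacement so that the new value is inserted literally.

```python
import re

# Group 1 keeps "c", the delimiter, and any whitespace that follows it.
PATTERN = re.compile(r"\b(c[:=]\s*).+")

def replace_value(line, new_value="test"):
    # Rebuild the line from the preserved prefix plus the new value.
    return PATTERN.sub(lambda m: m.group(1) + new_value, line)

print(replace_value("c= test1"))  # c= test
print(replace_value("c=test1"))   # c=test
```

Because of the leading \b, a line such as abc=1 is left untouched.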
I am trying to capture a string that can contain any character but must always be followed by ';'
I want to capture it and trim the white space around it. I've tried using positive lookahead but that does not seem to exclude the whitespace.
Example:
this is a match ;
this is not a match
regex:
.+(?=\s*;)
result:
"this is a match " gets captured with trailing white space behind.
expected result:
"this is a match" (without whitespace)
You have to make sure the first and the last characters of your match are not spaces. Thus we use the non-whitespace character match (\S) before and after the all character match (.*). As spaces might be optional, the any character match (.) must be optional, thus we use * instead of +.
\S.*\S(?=\s*;)
If the string can start with space use .*\S(?=\s*;).
Demonstration
Thanks to @CarySwoveland for improving the answer.
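A quick way to check \S.*\S(?=\s*;) in Python (the helper name capture is made up for this sketch). One thing worth noting: since the pattern has two \S, it appears to need at least two non-whitespace characters, so a one-character value like a; is not captured.

```python
import re

TRIMMED = re.compile(r"\S.*\S(?=\s*;)")

def capture(text):
    # Return the trimmed text before ';', or None when nothing matches.
    m = TRIMMED.search(text)
    return m.group(0) if m else None

print(repr(capture("this is a match ;")))    # 'this is a match'
print(repr(capture("this is not a match")))  # None
```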
You can match
.*(?<!\s)(?=\s*;)
provided the regex engine supports negative lookbehinds.
Demo
Note that this returns an empty string if the string is " ;".
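For reference, Python's re module supports this kind of fixed-width negative lookbehind, so the pattern can be tried as below. This is only a sketch; capture_lb is a hypothetical helper name.

```python
import re

LOOKBEHIND = re.compile(r".*(?<!\s)(?=\s*;)")

def capture_lb(text):
    # The match ends where the previous char is not whitespace and
    # only optional whitespace separates it from the ';'.
    m = LOOKBEHIND.search(text)
    return m.group(0) if m else None

print(repr(capture_lb("this is a match ;")))  # 'this is a match'
print(repr(capture_lb(" ;")))                 # ''
```

Leading whitespace is not trimmed by this pattern, only trailing whitespace.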
You can make the dot non greedy and start the match with a non whitespace character:
\S.*?(?=\s*;)
Regex demo
If the non whitespace character itself should also not be a semicolon:
[^\s;].*?(?=\s*;)
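To see the difference between the two lazy patterns, here is a small Python comparison (the names LAZY, NO_SEMI and first are chosen just for this example):

```python
import re

LAZY = re.compile(r"\S.*?(?=\s*;)")
NO_SEMI = re.compile(r"[^\s;].*?(?=\s*;)")

def first(pattern, text):
    # Return the first match of a compiled pattern, or None.
    m = pattern.search(text)
    return m.group(0) if m else None

print(repr(first(LAZY, "  this is a match  ;")))  # 'this is a match'
print(repr(first(LAZY, "; a ;")))                 # '; a'
print(repr(first(NO_SEMI, "; a ;")))              # 'a'
```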
I cannot make a regex that only captures a trailing space or N of spaces, followed by a single letter s.
((\s)+(s){1,1})
Works but breaks when you start to stress test it, for example it greedily captures words beginning with s.
word s word s
word s
word suffering
word spaces
word s some ss spaces
there's something wrong
words S s
If you want a single letter s to be captured, as opposed to an s at the beginning of a longer word, you need to specify a word break \b after s:
\s+s\b
Demo on regex101
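If it helps, the test lines from the question can be run through \s+s\b in Python like this (a sketch; standalone_s is a made-up helper name):

```python
import re

SINGLE_S = re.compile(r"\s+s\b")

def standalone_s(text):
    # Return every run of whitespace followed by a lone lowercase "s".
    return SINGLE_S.findall(text)

for line in ["word s word s", "word suffering", "word s some ss spaces",
             "there's something wrong", "words S s"]:
    print(repr(line), standalone_s(line))
```

Note the match is case sensitive, so the capital S in the last line is skipped.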
If you for example do not want to match in s# you can also assert a whitespace boundary to the right.
Note that for a match only, you can omit all the capture groups, and using (s){1,1} is the same as (s){1} which by itself can be omitted and would leave just s
\s+s(?!\S)
Regex demo
As \s can also match a newline, if you want to match spaces without newlines:
[^\S\n]+s(?!\S)
Regex demo
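A short Python sketch contrasting the two whitespace-boundary patterns above (the constant names are only for illustration):

```python
import re

# "s" must be followed by whitespace or the end of the string.
WS_BOUNDARY = re.compile(r"\s+s(?!\S)")
# Same idea, but the leading whitespace may not include a newline.
NO_NEWLINE = re.compile(r"[^\S\n]+s(?!\S)")

print(WS_BOUNDARY.findall("word s#"))   # []
print(WS_BOUNDARY.findall("word s"))    # [' s']
print(WS_BOUNDARY.findall("word\ns"))   # ['\ns']
print(NO_NEWLINE.findall("word\ns"))    # []
```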
I want to regex match the last word in a string where the string ends in ... The match should be the word preceding the ...
Example: "Do not match this. This sentence ends in the last word..."
The match would be word. This gets close: \b\s+([^.]*). However, I don't know how to make it work with only matching ... at the end.
This should NOT match: "Do not match this. This sentence ends in the last word."
If you use \s+ it means there must be at least a single whitespace char preceding so in that case it will not match word... only.
If you want to use the negated character class, you could also use
([^\s.]+)\.{3}$
( Capture group 1
[^\s.]+ Match 1+ times any char except a whitespace char or dot
) Close group
\.{3} Match 3 dots
$ End of string
Regex demo
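The pattern can be tried in Python roughly like this (last_word is a hypothetical helper name, not part of any library):

```python
import re

ELLIPSIS_WORD = re.compile(r"([^\s.]+)\.{3}$")

def last_word(text):
    # Group 1 is the word directly before a trailing "...".
    m = ELLIPSIS_WORD.search(text)
    return m.group(1) if m else None

print(last_word("Do not match this. This sentence ends in the last word..."))  # word
print(last_word("Do not match this. This sentence ends in the last word."))    # None
```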
You can anchor your regex to the end with $. To match a literal period you will need to escape it as it otherwise is a meta-character:
(\S+)\.\.\.$
\S matches everything but space-like characters. What exactly it matches depends on your regex flavor, but usually it excludes spaces, tabs, newlines and a set of Unicode spaces.
You can play around with it here:
https://regex101.com/r/xKOYa4/1
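The same check in Python, as a rough sketch (last_token is a made-up name). Since \S also matches dots, the captured group can contain inner dots, where the negated class [^\s.] would stop at them.

```python
import re

ELLIPSIS_TOKEN = re.compile(r"(\S+)\.\.\.$")

def last_token(text):
    # Group 1 is the run of non-whitespace before the final "...".
    m = ELLIPSIS_TOKEN.search(text)
    return m.group(1) if m else None

print(last_token("This sentence ends in the last word..."))  # word
print(last_token("see a.b..."))                              # a.b
```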
I'm attempting to match the last character in a WORD.
A WORD is a sequence of non-whitespace characters
'[^\n\r\t\f ]', or an empty line matching ^$.
The expression I made to do this is:
"[^ \n\t\r\f]\(?:[ \$\n\t\r\f]\)"
The regex matches a non-whitespace character that follows a whitespace character or the end of the line.
But I don't know how to stop it from excluding the following whitespace character from the result and why it doesn't seem to capture a character preceding the end of the line.
Using the string "Hi World!", I would expect: the "i" and "!" to be captured.
Instead I get: "i ".
What steps can I take to solve this problem?
"Word" that is a sequence of non-whitespace characters scenario
Note that a non-capturing group (?:...) in [^ \n\t\r\f](?:[ \$\n\t\r\f]) still matches (consumes) the whitespace char (thus, it becomes a part of the match) and it does not match at the end of the string as the $ symbol is not a string end anchor inside a character class, it is parsed as a literal $ symbol.
You may use
\S(?!\S)
See the regex demo
The \S matches a non-whitespace char that is not followed with a non-whitespace char (due to the (?!\S) negative lookahead).
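A quick check in Python with the string from the question (word_ends is just an illustrative name):

```python
import re

def word_ends(text):
    # Last character of every run of non-whitespace characters.
    return re.findall(r"\S(?!\S)", text)

print(word_ends("Hi World!"))  # ['i', '!']
```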
General "word" case
If a word consists of just letters, digits and underscores, that is, if it is matched with \w+, you may simply use
\w\b
Here, \w matches a "word" char, and the word boundary asserts there is no word char right after.
See another regex demo.
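For comparison, the \w\b variant in Python (a sketch; the function name is made up). With the question's string it returns the d rather than the !, because ! is not a word character.

```python
import re

def word_char_ends(text):
    # Last character of every run of letters, digits, or underscores.
    return re.findall(r"\w\b", text)

print(word_char_ends("Hi World!"))  # ['i', 'd']
```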
In Word text, if I want to highlight the last a in para. I search for all the words that have [space][para][space] to make sure I only have the word I want, then when it is found it should be highlighted.
Next, I search for the last [a ] space added, in the selection and I will get only the last [a] and I will highlight it or color it differently.
I am trying to capture every word in a string except for 'and'. I also want to capture words that are surrounded by asterisks like *this*. The regex command I am using mostly works, but when it captures a word with asterisks, it will leave out the first one (so *this* would only have this* captured). Here is the regex I'm using:
/((?!and\b)\b[\w*]+)/gi
When I remove the last word boundary, it will capture all of *this* but won't leave out any of the 'and' s.
The problem is that * is not treated as a word character, so \b doesn't match at the position before it. I think you can replace it with:
^(?!and\b)([\w*]+)|((?!and\b)(?<=\W)[\w*]+)
The \b was replaced with a lookbehind for \W (a non-word character) so that a leading * is matched as well. However, the first word in the string would then not match, because it is not preceded by a non-word character. This is why I added the first alternative, anchored with ^.
DEMO
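The question uses JavaScript-style /.../gi flags; as an assumption for testing, here is the same pattern in Python with re.IGNORECASE (words_except_and is a made-up helper). The pattern has two capture groups, so the sketch reads the whole match instead of the groups.

```python
import re

WORDS = re.compile(
    r"^(?!and\b)([\w*]+)|((?!and\b)(?<=\W)[\w*]+)",
    re.IGNORECASE,
)

def words_except_and(text):
    # group(0) is the full match, whichever alternative produced it.
    return [m.group(0) for m in WORDS.finditer(text)]

print(words_except_and("cats and *dogs* and birds"))
# ['cats', '*dogs*', 'birds']
```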