I'm looking for a regular expression that allows me to select words surrounded by whitespaces. The obvious pattern /\s(\w*)\s/g does not work because end of line is considered as whitespace.
I need something like this:
not-match match not-match
not-match match not-match
Yes, \s also matches the non-printing whitespace characters \n and \r. If you only want to match the characters that actually leave visible space, you have to list them explicitly. Also, use zero-width lookbehind and lookahead so that a match does not consume the space that a neighboring word's match needs:
/(?<=[ \t])(\w+)(?=[ \t])/g
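To make the difference concrete, here is a small sketch in Python. The choice of Python's re module is an assumption (the question does not name a language), findall stands in for the g flag, and the sample strings are made up for the demo.

```python
import re

# Original attempt: \s consumes the whitespace and also accepts \n and \r.
naive = re.compile(r"\s(\w*)\s")
# Suggested pattern: a space or tab on each side, checked without consuming it.
strict = re.compile(r"(?<=[ \t])(\w+)(?=[ \t])")

two_lines = "alpha beta\ngamma delta epsilon"
naive_hits = naive.findall(two_lines)    # beta is accepted because \n counts as \s
strict_hits = strict.findall(two_lines)  # only delta has a space on both sides

one_line = "a b c d"
naive_adjacent = naive.findall(one_line)    # the space after b is consumed, so c is skipped
strict_adjacent = strict.findall(one_line)  # lookarounds leave that space available for c

print(naive_hits, strict_hits)
print(naive_adjacent, strict_adjacent)
```

The second pair of results shows why the lookarounds matter even on a single line: without them, every other word is lost.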
I cannot make a regex that only captures a trailing space or N of spaces, followed by a single letter s.
((\s)+(s){1,1})
This works, but it breaks under stress testing: for example, it also matches the s at the start of words beginning with s.
word s word s
word s
word suffering
word spaces
word s some ss spaces
there's something wrong
words S s
If you want a single letter s to be captured, as opposed to an s at the beginning of a longer word, you need to specify a word break \b after s:
\s+s\b
Demo on regex101
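A quick way to check \s+s\b against the stress-test lines from the question is sketched below. Python is assumed here, and the names single_s, samples and counts are only illustrative.

```python
import re

# Whitespace, then a lone "s" that must be followed by a word boundary.
single_s = re.compile(r"\s+s\b")

samples = [
    "word s word s",
    "word s",
    "word suffering",
    "word spaces",
    "word s some ss spaces",
    "there's something wrong",
    "words S s",
]

# Count the matches per line; lines with only longer s-words should give 0.
counts = {line: len(single_s.findall(line)) for line in samples}
for line, n in counts.items():
    print(f"{n}  {line}")
```

Note that the pattern is case-sensitive, so the capital S in the last line is not counted.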
Note that if you only need the match, you can omit all the capture groups. Also, (s){1,1} is the same as (s){1}, and a {1} quantifier is redundant, which leaves just s.
If, for example, you do not want to match the s in s#, you can also assert a whitespace boundary to the right:
\s+s(?!\S)
Regex demo
Since \s can also match a newline, use a negated character class if you want to match whitespace without newlines:
[^\S\n]+s(?!\S)
Regex demo
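The two whitespace-boundary variants can be compared with a short Python sketch (Python and the made-up sample string are assumptions, not part of the answer):

```python
import re

# Any whitespace before the s, whitespace or end of string after it.
any_ws = re.compile(r"\s+s(?!\S)")
# Same, but the leading whitespace may not be a newline.
no_newline = re.compile(r"[^\S\n]+s(?!\S)")

text = "word\ns end s# tail s"

any_ws_hits = any_ws.findall(text)          # includes the s that follows the newline
no_newline_hits = no_newline.findall(text)  # only the final " s"

print(any_ws_hits)
print(no_newline_hits)
```

In both cases the s in s# is rejected, because # is a non-whitespace character directly to the right.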
I would like to make a regex to match a word, but don't match it if there are special characters on its sides.
I tried to use a word boundary (\b) on both sides but it doesn't seem to exclude special characters...
For example, this should work:
text word-to-match more-text
But this should not:
text word-to-match-more-text
Because there is a - between the word to match and more text.
What I have now is this:
(?<=[^-\[\]{}()+?.,\\^$|#])\bword-to-match\b(?=[^-\[\]{}()+?.,\\^$|#])
I would like to know if there is a more elegant way than using [^-\[\]{}()+?.,\\^$|#] on both sides of the word.
Thanks in advance!
You may use lookahead and lookbehind on both sides to fail the match if there is a non-whitespace character on either side:
(?<!\S)word-to-match(?!\S)
RegEx Demo
(?<!\S): Fail if previous character is a non-whitespace
(?!\S): Fail if next character is a non-whitespace
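As a minimal sketch of how this behaves on the two example strings, assuming Python (the variable names are hypothetical, and re.escape is used only so that the literal word can be swapped for any other):

```python
import re

word = "word-to-match"

# Whitespace boundaries: no non-whitespace character may touch the word.
bounded = re.compile(rf"(?<!\S){re.escape(word)}(?!\S)")
# Plain word boundaries, as tried in the question, for comparison.
word_boundary = re.compile(rf"\b{re.escape(word)}\b")

ok = "text word-to-match more-text"
glued = "text word-to-match-more-text"

print(bool(bounded.search(ok)))           # standalone word
print(bool(bounded.search(glued)))        # followed by "-"
print(bool(word_boundary.search(glued)))  # \b sees a boundary before "-"
```

The \b version still matches the glued string because a hyphen is a non-word character, so a word boundary exists between h and -.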
I want to regex match the last word in a string where the string ends in ... The match should be the word preceding the ...
Example: "Do not match this. This sentence ends in the last word..."
The match would be word. This gets close: \b\s+([^.]*). However, I don't know how to make it work with only matching ... at the end.
This should NOT match: "Do not match this. This sentence ends in the last word."
Using \s+ means that at least one whitespace character must precede the word, so the pattern will not match a string that consists of word... only.
If you want to use the negated character class, you could also use
([^\s.]+)\.{3}$
( Capture group 1
[^\s.]+ Match 1+ times any char except a whitespace char or dot
) Close group
\.{3} Match 3 dots
$ End of string
Regex demo
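The breakdown above can be tried out with a small helper. This is only a sketch in Python, and word_before_ellipsis is a made-up name.

```python
import re

# Group 1: one or more characters that are neither whitespace nor a dot,
# followed by exactly three dots at the end of the string.
last_word = re.compile(r"([^\s.]+)\.{3}$")

def word_before_ellipsis(text):
    """Return the word directly before a trailing '...', or None."""
    m = last_word.search(text)
    return m.group(1) if m else None

print(word_before_ellipsis("Do not match this. This sentence ends in the last word..."))
print(word_before_ellipsis("Do not match this. This sentence ends in the last word."))
```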
You can anchor your regex to the end with $. To match a literal period you will need to escape it as it otherwise is a meta-character:
(\S+)\.\.\.$
\S matches everything but whitespace characters. What exactly counts as whitespace depends on your regex flavor, but it usually includes spaces, tabs, newlines and a set of Unicode spaces.
You can play around with it here:
https://regex101.com/r/xKOYa4/1
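One detail worth checking in your own flavor: \S also matches dots, so (\S+) can capture dots that belong to the word, while the negated class [^\s.] cannot. A rough Python comparison, with made-up input strings:

```python
import re

with_nonspace = re.compile(r"(\S+)\.\.\.$")
with_negated = re.compile(r"([^\s.]+)\.{3}$")

sentence = "This sentence ends in the last word..."
dotted = "see section a.b..."

# Both patterns agree on the plain sentence.
plain_nonspace = with_nonspace.search(sentence).group(1)
plain_negated = with_negated.search(sentence).group(1)

# They differ when the last token itself contains a dot.
dotted_nonspace = with_nonspace.search(dotted).group(1)
dotted_negated = with_negated.search(dotted).group(1)

print(plain_nonspace, plain_negated)
print(dotted_nonspace, dotted_negated)
```

Which behavior is preferable depends on whether tokens such as a.b should count as a single word.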
I need to match only those words that don't have special characters like # and :.
For example:
git#github.com shouldn't match
list should return a valid match
show should also return a valid match
I tried it using a negative lookahead \w+(?![#:])
But it matches gi out of git#github.com, which it shouldn't match either.
You may add \w to the lookahead:
\w+(?![\w#:])
The equivalent is using a word boundary:
\w+\b(?![#:])
Besides, you may consider adding a left-hand boundary to avoid matching words inside non-word non-whitespace chunks of text:
^\w+(?![\w#:])
Or
(?<!\S)\w+(?![\w#:])
The ^ will match the word only at the start of the string, and (?<!\S) will match only if the word is preceded by whitespace or the start of the string.
See the regex demo.
Why not (?<!\S)\w+(?!\S), the whitespace boundaries? Since you are building a lexer, you most probably have to deal with natural-language sentences where words are likely to be followed by punctuation, and the (?!\S) negative lookahead would make \w+ match only when it is followed by whitespace or the end of the string.
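The variants discussed above can be compared side by side. The following is a sketch in Python (an assumption, since the question does not name a language); the punctuated sample string is made up.

```python
import re

text = "git#github.com list show"

# Question's attempt: backtracks to "gi", because "t" is not # or :.
original = re.findall(r"\w+(?![#:])", text)
# Adding \w to the lookahead stops the match from ending inside a word.
with_word_char = re.findall(r"\w+(?![\w#:])", text)
# The word-boundary spelling behaves the same way.
with_boundary = re.findall(r"\w+\b(?![#:])", text)
# A left-hand whitespace boundary also drops "github" and "com".
left_anchored = re.findall(r"(?<!\S)\w+(?![\w#:])", text)

# With trailing punctuation, a right-hand (?!\S) would reject every word.
punctuated = "list, show."
lenient = re.findall(r"(?<!\S)\w+(?![\w#:])", punctuated)
strict = re.findall(r"(?<!\S)\w+(?!\S)", punctuated)

print(original, with_word_char, with_boundary, left_anchored)
print(lenient, strict)
```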
You can use negative lookbehind and negative lookahead patterns around a word pattern to make sure that the word is not preceded or followed by a non-space character, or in other words, to make sure that it is surrounded by either a space or a string boundary:
(?<!\S)\w+(?!\S)
Demo: https://regex101.com/r/cjhUUM/2
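For the inputs in the question, the whitespace-boundary pattern could be exercised like this in Python (a sketch; the second sample string is invented to show the : case):

```python
import re

# A run of word characters with whitespace or a string edge on both sides.
standalone_word = re.compile(r"(?<!\S)\w+(?!\S)")

words = standalone_word.findall("git#github.com list show")
colon_case = standalone_word.findall("user:name ok")

print(words)
print(colon_case)
```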
How can I regex-match words that have digits or special characters inside them, but not when the digits or special characters (\/°†#*()'\s+&;±|-\^) are at the end of the word? I need to match dAS2a but not dASI6. I could not adapt the "Regex to match string not ending with pattern" solution.
dA/Sa
dAS2a
dASI/
dASI6
http://regex101.com/r/qM4dV7/1 failed.
This should work just fine (if you use the gmi modifiers):
^.*[a-z]$
Demo
You said each word is on a new line. Using the m modifier, we can anchor each expression to the beginning and end of a line with the ^ and $ anchors (without the modifier, they mean the beginning and end of the string). Then you said a word can essentially be anything (.*) as long as it ends in a character that is neither a digit nor a special character (I took that to mean a "letter", [a-z] with the i modifier).
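In Python terms (an assumption; the answer is written with regex101-style modifiers), g roughly corresponds to findall, m to re.MULTILINE and i to re.IGNORECASE. A minimal sketch with the four words from the question:

```python
import re

# m -> MULTILINE (^ and $ work per line), i -> IGNORECASE ([a-z] also matches A-Z).
ends_in_letter = re.compile(r"^.*[a-z]$", re.MULTILINE | re.IGNORECASE)

words = "dA/Sa\ndAS2a\ndASI/\ndASI6"

matched = ends_in_letter.findall(words)

# Without the m modifier, ^ and $ only anchor to the whole string.
without_m = re.findall(r"^.*[a-z]$", words, re.IGNORECASE)

print(matched)
print(without_m)
```

Note that this pattern accepts any line ending in a letter, including lines with no digit or special character inside.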