Regex boundary to also exclude special characters - regex

I would like to make a regex to match a word, but don't match it if there are special characters on its sides.
I tried to use a word boundary (\b) on both sides but it doesn't seem to exclude special characters...
For example, this should work:
text word-to-match more-text
But this should not:
text word-to-match-more-text
Because there is a - between the word to match and more text.
What I have now is this:
(?<=[^-\[\]{}()+?.,\\^$|#])\bword-to-match\b(?=[^-\[\]{}()+?.,\\^$|#])
I would like to know if there is a more elegant way than using [^-\[\]{}()+?.,\\^$|#] on both sides of the word.
Thanks in advance!

You may use lookahead and lookbehind on both sides to fail the match if there is a non-whitespace character on either side:
(?<!\S)word-to-match(?!\S)
(?<!\S): Fail if the previous character is non-whitespace
(?!\S): Fail if the next character is non-whitespace
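To see the whitespace boundaries in action, here is a minimal sketch in Python (my choice of language; the question does not name a flavor). The sample strings come from the question, plus one bare-word case I added to show that the start and end of the string also count as valid boundaries.

```python
import re

# Whitespace boundaries: no non-whitespace character may touch the word.
pattern = re.compile(r"(?<!\S)word-to-match(?!\S)")

samples = [
    "text word-to-match more-text",   # word surrounded by spaces
    "text word-to-match-more-text",   # word followed by "-"
    "word-to-match",                  # word at start and end of string
]
results = [bool(pattern.search(s)) for s in samples]
print(results)  # [True, False, True]
```

Note that, unlike the (?<=[^...]) version in the question, the negative lookarounds do not require a character to exist on either side.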

Related

Regex for allowing apostrophe and period

I have the regex below, which is used for removing punctuation from a string. What I need is to allow only apostrophes and periods in between words, such as “Zipf’s” and “e.g”.
[^\w\s]
One idea is to use non-word boundaries (\B), so that the specified characters are removed only where no word character touches them.
\B matches at any position between two word characters as well as at any position between two non-word characters ...
[^\w\s.’']|\B[.’']\B
See this demo at regex101
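A quick way to try this out, sketched in Python (an assumption on my part, since the question does not say which engine is used). The sample sentences are made up for illustration.

```python
import re

# Pattern from the answer: remove any character that is not a word character,
# whitespace, period, or apostrophe; also remove a period or apostrophe when
# no word character touches it on either side.
PUNCT = re.compile(r"[^\w\s.’']|\B[.’']\B")

text = "Zipf’s law, e.g: don't stop!"
cleaned = PUNCT.sub("", text)
print(cleaned)  # Zipf’s law e.g don't stop

# A period standing alone between spaces is removed as well.
standalone = PUNCT.sub("", "wait . what")
print("." in standalone)  # False
```

Keep in mind that a period or apostrophe touching a word on only one side (for example a sentence-final period) is not matched by \B[.’']\B, so it stays in the output.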

Unmatch complete words if a negative lookahead is satisfied

I need to match only those words which don't have special characters like # and :.
For example:
git#github.com shouldn't match
list should return a valid match
show should also return a valid match
I tried it using a negative lookahead: \w+(?![#:])
But it matches gi out of git#github.com, which it shouldn't match either.
You may add \w to the lookahead:
\w+(?![\w#:])
The equivalent is using a word boundary:
\w+\b(?![#:])
Besides, you may consider adding a left-hand boundary to avoid matching words inside non-word non-whitespace chunks of text:
^\w+(?![\w#:])
Or
(?<!\S)\w+(?![\w#:])
The ^ will match the word only at the start of the string, and (?<!\S) will match only if the word is preceded by whitespace or the start of the string.
See the regex demo.
Why not (?<!\S)\w+(?!\S), the whitespace boundaries? Since you are building a lexer, you most probably have to deal with natural language sentences where words are likely to be followed by punctuation, and the (?!\S) negative lookahead would make \w+ match only when it is followed by whitespace or is at the end of the string.
You can use negative lookbehind and negative lookahead patterns around a word pattern to make sure that the word is not preceded or followed by a non-space character, or in other words, to make sure that it is surrounded by either a space or a string boundary:
(?<!\S)\w+(?!\S)
Demo: https://regex101.com/r/cjhUUM/2
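The difference between the variants in the two answers is easier to see side by side. Below is a small Python sketch (Python is my assumption, and the sample text is made up from the words in the question plus a trailing comma).

```python
import re

text = "git#github.com list show, done"

# No left-hand boundary: pieces inside "git#github.com" still match.
no_left = re.findall(r"\w+(?![\w#:])", "git#github.com")

# Left-hand whitespace boundary, right side only forbids \w, # and :
with_left = re.findall(r"(?<!\S)\w+(?![\w#:])", text)

# Whitespace boundaries on both sides: "show," is rejected because of the comma.
ws_only = re.findall(r"(?<!\S)\w+(?!\S)", text)

print(no_left)    # ['github', 'com']
print(with_left)  # ['list', 'show', 'done']
print(ws_only)    # ['list', 'done']
```

Which variant fits depends on whether a word followed by ordinary punctuation should still count as a match.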

How to consume lookaround in regex?

I want to match
abc_def_ghi,
abc_abc_ghi,
abc_a2a_ghi,
abc_999_ghi
but not abc_xxx_ghi (with xxx in center).
I came up with manually consuming the lookahead (abc_(?!xxx)..._ghi), but I wonder whether there is another way that doesn't require manually specifying the number of characters to skip.
The original question used numbers; it was updated for the string case.
If you don't want to specify exactly how many characters to skip, perhaps you could use a quantifier like + in the negative lookahead and use a negated character class to match not an underscore.
\babc_(?!x+_)[^_]+_ghi\b
Explanation
\babc_ Word boundary, match abc_
(?! Negative lookahead, assert what is directly on the right is not
x+_ Match 1+ times x followed by an underscore
) Close lookahead
[^_]+_ Negated character class, match 1+ times any char except _, then match an underscore
ghi\b Match ghi and word boundary
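Here is a short check of that pattern against the strings from the question, written in Python as an assumption about the engine. The last two inputs (abc_xx_ghi and abc_xxy_ghi) are extra cases I added to show how x+ behaves.

```python
import re

pattern = re.compile(r"\babc_(?!x+_)[^_]+_ghi\b")

samples = [
    "abc_def_ghi",
    "abc_abc_ghi",
    "abc_a2a_ghi",
    "abc_999_ghi",
    "abc_xxx_ghi",  # the one that must be rejected
    "abc_xx_ghi",   # extra case: any run of only x's is rejected too
    "abc_xxy_ghi",  # extra case: x's followed by something else is fine
]
matched = [s for s in samples if pattern.fullmatch(s)]
print(matched)
```

Because of the x+ quantifier, the middle part is rejected whenever it consists only of x characters, whatever its length. If only exactly xxx should be excluded, (?!xxx_) would be the narrower choice.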
You can use this (written for the original, numeric version of the question):
123_(?:(?!000)\d){3}_789
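A small Python sketch of how this behaves (Python and the sample values are my assumptions; the original numeric inputs are not shown in the question). The lookahead is re-checked before every digit, so 000 is rejected while other digit triples pass.

```python
import re

# Check (?!000) before each of the three digits.
tempered = re.compile(r"123_(?:(?!000)\d){3}_789")

samples = ["123_456_789", "123_000_789", "123_100_789"]
results = {s: bool(tempered.fullmatch(s)) for s in samples}
print(results)
# {'123_456_789': True, '123_000_789': False, '123_100_789': True}

# For these samples a single lookahead in front gives the same answers.
single = re.compile(r"123_(?!000)\d{3}_789")
same = all(bool(single.fullmatch(s)) == ok for s, ok in results.items())
print(same)  # True
```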
If you don't wish to use look-arounds, this expression might be an option:
(?:abc_xxx_ghi)|(abc_.{3}_ghi)
Other than that I can't think of anything else.
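With this alternation, the unwanted string is matched by the first branch without a capture, and only the wanted strings end up in group 1, so the caller has to look at the group rather than the whole match. A minimal Python sketch of that idea (Python and the sample text are my assumptions):

```python
import re

pattern = re.compile(r"(?:abc_xxx_ghi)|(abc_.{3}_ghi)")

text = "abc_def_ghi abc_xxx_ghi abc_999_ghi"

# Keep only matches where the capturing branch took part.
wanted = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(wanted)  # ['abc_def_ghi', 'abc_999_ghi']
```

Note that .{3} still fixes the number of characters in the middle, so this does not remove the need to specify the length.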

Regular expression to select only the words surrounded by whitespaces

I'm looking for a regular expression that allows me to select words surrounded by whitespaces. The obvious pattern /\s(\w*)\s/g does not work because end of line is considered as whitespace.
I need something like this:
not-match match not-match
not-match match not-match
The \s will match the non-printing white"space" characters \n and \r as well, yes. If you only want to match the characters that actually leave "space", you have to specify them. Also, use zero-width lookahead and lookbehind so as not to "consume" the space needed by the neighboring words' matches:
/(?<=[ \t])(\w+)(?=[ \t])/g
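The /.../g syntax suggests JavaScript, but the same pattern can be tried in Python, which I use here as an assumption. The two-line sample text is made up to show both problems with \s(\w*)\s: the newline counts as whitespace, and each match consumes the space the next word needs.

```python
import re

text = "start match end\nnext match last"

# Naive version: \s also matches "\n", and the spaces are consumed.
naive = re.findall(r"\s(\w*)\s", text)

# Lookaround version: only space or tab may surround the word,
# and neither is consumed by the match.
strict = re.findall(r"(?<=[ \t])(\w+)(?=[ \t])", text)

print(naive)   # ['match', 'next']
print(strict)  # ['match', 'match']
```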

How can I use regex for all words beginning with : punctuation?

How can I use regex for all words beginning with : punctuation?
This gets all words beginning with a:
\ba\w*\b
The minute I change the letter a to :, the whole thing fails. Am I supposed to escape the colon, and if so, how?
\b matches between a word character (letter, digit, or underscore) and a non-word character, so if you place it before :, it only matches if there is a word character right before the colon.
So you either need to drop the \b here or specify what exactly constitutes a boundary in this situation, for example:
(?<!\w):\w*\b
That would ensure that there is no letter/digit/underscore right before the :. Of course this presumes a regex flavor that supports lookbehind assertions.
The problem is that \b won't match the start of a word when the word starts with a colon :, because colon is not a word character. Try this:
(?<=:)\w*\b
This uses a (non-capturing) look-behind to assert that the previous character is a colon, so the colon itself is not part of the match.
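To compare the three patterns discussed in this thread, here is a small Python sketch (the engine is my assumption, and the sample sentence is invented for illustration).

```python
import re

text = "take :alpha and b:eta then :gamma"

# \b before ":" needs a word character right before the colon.
with_b = re.findall(r"\b:\w*\b", text)

# No word character may precede the colon; the colon is part of the match.
no_word_before = re.findall(r"(?<!\w):\w*\b", text)

# Look-behind for the colon; only the word after it is returned.
after_colon = re.findall(r"(?<=:)\w*\b", text)

print(with_b)          # [':eta']
print(no_word_before)  # [':alpha', ':gamma']
print(after_colon)     # ['alpha', 'eta', 'gamma']
```

The second pattern keeps the colon and skips b:eta, while the third drops the colon and accepts a colon anywhere, so the right choice depends on what should count as "beginning with :".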