Escape brackets in a regex with alternation - regex

I am trying to write a regular expression to match any entry from a list of words, but I am having trouble with entries that contain brackets.
This is the regular expression I have so far:
^\b(?:Civil Services|Assets Management|Engineering Works (EW)|EW Maintenance|Ferry|Road Maintenance|Infrastructure Planning (IP)|Project Management Office (PMO)|Resource Recovery (RR)|Waste)\b$
Words without brackets such as Civil Services are matched, but words with brackets such as Engineering Works (EW) are not.
I have tried single escaping with \ and double escaping (\\), but neither option returns a match when testing words with brackets in them.
How can I also match words with brackets?

The problem is that \b can't match a word boundary the way you want when it's preceded by a ). A word boundary is a position where a word character is adjacent to a non-word character or to the start or end of the string. A word character is a letter, digit, or underscore; notably, ) is not a word character. That means that )\b won't match a parenthesis followed by a space, nor a parenthesis at the end of the string.
The easiest fix is to remove the \bs. You don't actually need them since you've already got ^ and $ anchors:
^(?:Orange|Banana|Apple \(Red\)|Apple \(Green\)|Plum|Mango)$
Alternatively, if you want to search in a larger string, you could use a lookahead to look for a non-word character or end-of-string. This is essentially what \b does, except that it only looks ahead, not behind.
\b(?:Orange|Banana|Apple \(Red\)|Apple \(Green\)|Plum|Mango)(?=\W|$)
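A minimal Python sketch of the three variants discussed above, assuming Python's re module and the placeholder fruit list from this answer (the names WORDS, with_boundary, anchored and searching are made up for illustration). It builds the alternation with re.escape so the parentheses are matched literally.

```python
import re

# Placeholder word list taken from the examples in this answer.
WORDS = ["Orange", "Banana", "Apple (Red)", "Apple (Green)", "Plum", "Mango"]

# re.escape backslash-escapes the parentheses so they match literally.
alternation = "|".join(re.escape(w) for w in WORDS)

# Original shape: \b after ")" can never match at end-of-string.
with_boundary = re.compile(r"^\b(?:" + alternation + r")\b$")
# Fix 1: rely on the ^ and $ anchors only.
anchored = re.compile(r"^(?:" + alternation + r")$")
# Fix 2: search inside a larger string, using a lookahead instead of a trailing \b.
searching = re.compile(r"\b(?:" + alternation + r")(?=\W|$)")

print(bool(with_boundary.search("Apple (Red)")))  # False
print(bool(anchored.search("Apple (Red)")))       # True
print(searching.findall("I ate an Apple (Green) and a Plum."))  # ['Apple (Green)', 'Plum']
```

Entries without brackets, such as Plum, still match the original \b version, which is why only the bracketed entries appeared to fail.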

Related

regex last character of a WORD

I'm attempting to match the last character in a WORD.
A WORD is a sequence of non-whitespace characters
'[^\n\r\t\f ]', or an empty line matching ^$.
The expression I made to do this is:
"[^ \n\t\r\f]\(?:[ \$\n\t\r\f]\)"
The regex matches a non-whitespace character that is followed by a whitespace character or the end of the line.
But I don't know how to stop it from including the following whitespace character in the result, or why it doesn't seem to capture a character preceding the end of the line.
Using the string "Hi World!", I would expect: the "i" and "!" to be captured.
Instead I get: "i ".
What steps can I take to solve this problem?
"Word" that is a sequence of non-whitespace characters scenario
Note that the non-capturing group (?:...) in [^ \n\t\r\f](?:[ \$\n\t\r\f]) still matches (consumes) the whitespace char, so it becomes part of the match. It also does not match at the end of the string, because inside a character class the $ symbol is not an end-of-string anchor; it is parsed as a literal $ symbol.
You may use
\S(?!\S)
See the regex demo
The \S matches a non-whitespace char, and the (?!\S) negative lookahead requires that it is not followed by another non-whitespace char.
General "word" case
If a word consists of just letters, digits and underscores, that is, if it is matched with \w+, you may simply use
\w\b
Here, \w matches a "word" char, and the word boundary asserts there is no word char right after.
See another regex demo.
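As a quick check of both patterns, here is a small Python sketch (Python's re module is an assumption; the question does not name a flavor) run against the sample string from the question.

```python
import re

text = "Hi World!"

# Last character of each run of non-whitespace characters.
print(re.findall(r"\S(?!\S)", text))  # ['i', '!']

# Last character of each run of letters, digits, and underscores.
print(re.findall(r"\w\b", text))      # ['i', 'd']
```

The two results differ on "World!" because ! is a non-whitespace character but not a word character.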
In Word text, if I want to highlight the last a in para. I search for all the words that have [space][para][space] to make sure I only have the word I want, then when it is found it should be highlighted.
Next, I search for the last [a ] space added, in the selection and I will get only the last [a] and I will highlight it or color it differently.

Regular expression to match alphanumeric, hyphen, underscore and space string

I'm trying to match a string that contains alphanumeric, hyphen, underscore and space.
Hyphen, underscore, space and numbers are optional, but the first and last characters must be letters.
For example, these should all match:
abc
abc def
abc123
ab_cd
ab-cd
I tried this:
^[a-zA-Z0-9-_ ]+$
but it matches with space, underscore or hyphen at the start/end, but it should only allow in between.
Use a simple character class wrapped with letter chars:
^[a-zA-Z]([\w -]*[a-zA-Z])?$
This matches input that starts and ends with a letter, including just a single letter.
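There is a potential bug in your regex: the hyphen sits in the middle of the character class, where it can be read as a range operator, i.e. [9-_] means "every char between 9 and _ inclusive". Many flavors treat a hyphen that directly follows a complete range such as 0-9 as a literal, but relying on that is fragile.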
If you want a literal dash in a character class, put it first or last or escape it.
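To make the range-versus-literal distinction concrete, here is a small Python sketch (Python's re module is an assumption; other flavors may differ in edge cases). The names as_range, as_literal and rows are made up for illustration.

```python
import re

as_range = re.compile(r"[9-_]")     # range from '9' (0x39) to '_' (0x5F)
as_literal = re.compile(r"[9\-_]")  # exactly three characters: 9, -, _

# Each row: (character, matched by the range class, matched by the literal class)
rows = [
    (ch, bool(as_range.fullmatch(ch)), bool(as_literal.fullmatch(ch)))
    for ch in ["@", "A", "-", "9", "_"]
]
for row in rows:
    print(row)
```

The range version accepts characters such as @ and A that happen to fall between 9 and _ in ASCII, and it rejects the hyphen itself.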
Also, prefer the use of \w "word character", which is all letters and numbers and the underscore in preference to [a-zA-Z0-9_] - it's easier to type and read.
Check this working in fiddle http://refiddle.com/refiddles/56a07cec75622d3ff7c10000
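A rough Python check of the letters-at-both-ends pattern from this answer, run against the question's examples plus a few made-up negative cases. The re.ASCII flag is an assumption added here so that \w means exactly [a-zA-Z0-9_], as the answer describes.

```python
import re

# Pattern from the answer above; re.ASCII restricts \w to [a-zA-Z0-9_].
letters_at_ends = re.compile(r"^[a-zA-Z]([\w -]*[a-zA-Z])?$", re.ASCII)

samples = ["abc", "abc def", "abc123", "ab_cd", "ab-cd", "a", "-abc", "abc_", " abc"]
results = {s: bool(letters_at_ends.search(s)) for s in samples}

for s, ok in results.items():
    print(repr(s), ok)
```

Note that abc123 is rejected: it ends in a digit, which agrees with the stated rule that the last character must be a letter, but not with the question's list of examples.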
This will fix the issue
^[a-zA-Z]+[a-zA-Z0-9-_ ]*[a-zA-Z0-9]$
I tried using following regex:
/^\w+([\s-_]\w+)*$/
This allows alphanumeric, underscore, space and dash.
More details
As per your requirement to include space, hyphen, underscore and alphanumeric characters, you can use the \w shorthand character class for [a-zA-Z0-9_]. Escape the hyphen as \-, since inside a character class it is usually used to denote a range.
To negate the space and hyphen at the beginning and end I have used [^\s\-].
So complete regex becomes [^\s\-][\w \-]+[^\s\-]
Here is the working demo.
You can use this regex:
^[a-zA-Z0-9]+(?:[\w -]*[a-zA-Z0-9]+)*$
This will only allow alphanumerics at start and end.

Regex negative lookahead and word boundary removes first character from capture group

I am trying to capture every word in a string except for 'and'. I also want to capture words that are surrounded by asterisks like *this*. The regex command I am using mostly works, but when it captures a word with asterisks, it will leave out the first one (so *this* would only have this* captured). Here is the regex I'm using:
/((?!and\b)\b[\w*]+)/gi
When I remove the last word boundary, it will capture all of *this* but won't leave out any of the 'and' s.
The problem is that * is not treated as a word character, so \b doesn't match at the position before it. I think you can replace it with:
^(?!and\b)([\w*]+)|((?!and\b)(?<=\W)[\w*]+)
The \b was replaced with a (?<=\W) lookbehind (preceded by a non-word character) so that * is matched too. However, the first word in the string would then not match, because it is not preceded by a non-word character. This is why I added the ^ alternative.
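A small Python sketch comparing the question's pattern with the replacement, assuming Python's re module (the question uses JavaScript-style /.../gi delimiters, so re.IGNORECASE and finditer stand in for the i and g flags). The sample sentence and the helper name all_matches are made up for illustration.

```python
import re

original = re.compile(r"((?!and\b)\b[\w*]+)", re.IGNORECASE)
replacement = re.compile(r"^(?!and\b)([\w*]+)|((?!and\b)(?<=\W)[\w*]+)", re.IGNORECASE)

def all_matches(pattern, text):
    # group(0) is the whole match, whichever alternative or group produced it.
    return [m.group(0) for m in pattern.finditer(text)]

text = "*this* and that and *other*"
print(all_matches(original, text))     # ['this*', 'that', 'other*']
print(all_matches(replacement, text))  # ['*this*', 'that', '*other*']
```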

Find slash that are NOT followed by non word character

I am trying to write a regex for finding slashes only that are not followed by special characters.
For example, if the string is,
/PErs/#loc/g/2, then the regex should find the slashes (/) that are before P, g and 2. It should not return the slash before #, as # is a special character.
I could write \/\w, but that returns /P, /g and /2.
Simplest one by using word boundary \b.
\/\b
\b matches between a word character and a non-word character.
You want to use the lookahead operator.
Positive lookahead or detect if something is present after (ahead)
Try this regex instead:
\/(?=\w)
We use here the positive lookahead operator (?=). It will "detect" the position of a given expression but won't match the expression.
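To illustrate the difference between consuming the next character and only looking at it, here is a short Python sketch (Python's re module is an assumption; a forward slash needs no escaping there) using the string from the question.

```python
import re

text = "/PErs/#loc/g/2"

# Consuming the word character makes it part of the match.
print(re.findall(r"/\w", text))      # ['/P', '/g', '/2']

# The lookahead only checks the next character; the match is the slash alone.
print(re.findall(r"/(?=\w)", text))  # ['/', '/', '/']

# Positions of the matched slashes in the string.
print([m.start() for m in re.finditer(r"/(?=\w)", text)])  # [0, 10, 12]
```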
Negative lookahead or detect if something is NOT present after (ahead)
Alternatively, you can also use the negative look ahead operator (?!).
\/(?![#])
This will match any / NOT followed by #.
Negative lookahead with multiple special characters
If you have more special characters, simply add them to the character class.
For example, if # and % were special characters, the regular expression above would become:
\/(?![#%])
Matching slashes NOT followed by a NON-word character is not the same as matching slashes followed by a word character.
Have a try with:
/(?!\W)
This matches slashes NOT followed by a NON-word character.
Unlike a lookahead for a word character, it also matches the final slash in a string such as PErs/
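The end-of-string difference can be seen in a short Python sketch (Python's re module is an assumption; the helper name slash_positions is made up for illustration). It reports the index of every slash each pattern matches.

```python
import re

def slash_positions(pattern, text):
    # Start index of every match of `pattern` in `text`.
    return [m.start() for m in re.finditer(pattern, text)]

# Both patterns agree when every slash has a following character.
print(slash_positions(r"/(?=\w)", "/PErs/#loc/g/2"))  # [0, 10, 12]
print(slash_positions(r"/(?!\W)", "/PErs/#loc/g/2"))  # [0, 10, 12]

# They differ for a trailing slash: nothing follows it, so the positive
# lookahead fails while the negative lookahead succeeds.
print(slash_positions(r"/(?=\w)", "PErs/"))  # []
print(slash_positions(r"/(?!\W)", "PErs/"))  # [4]
```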

How can I use regex for all words beginning with : punctuation?

This gets all words beginning with a:
\ba\w*\b
The minute I change the letter a to :, the whole thing fails. Am I supposed to escape the colon, and if so, how?
\b matches between a non-alphanumeric and an alphanumeric character, so if you place it before :, it only matches if there is a letter/digit right before the colon.
So you either need to drop the \b here or specify what exactly constitutes a boundary in this situation, for example:
(?<!\w):\w*\b
That would ensure that there is no letter/digit/underscore right before the :. Of course this presumes a regex flavor that supports lookbehind assertions.
The problem is that \b won't match the start of a word when the word starts with a colon :, because colon is not a word character. Try this:
(?<=:)\w*\b
This uses a (non-capturing) look-behind to assert that the previous character is a colon.
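A short Python sketch comparing the failing pattern with the two suggestions above, assuming Python's re module (which supports lookbehind) and a made-up sample sentence.

```python
import re

text = "see :foo and a:bar plus :baz"

# \b before ":" only matches when a word character precedes the colon.
print(re.findall(r"\b:\w*\b", text))       # [':bar']

# Negative lookbehind: the colon must not be preceded by a word character.
print(re.findall(r"(?<!\w):\w*\b", text))  # [':foo', ':baz']

# Positive lookbehind: the colon is required but not part of the match.
print(re.findall(r"(?<=:)\w*\b", text))    # ['foo', 'bar', 'baz']
```

The two suggestions are not equivalent: the first keeps the colon in the match and skips a:bar, while the second drops the colon and accepts any word that directly follows one.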