Escape brackets in a regex with alternation

Escape brackets in a regex with alternation - regex

I am trying to write a Reg Expression to match any word from a list of words but am having trouble with words with brackets.
This is the reg expression I have so far:
^\b(?:Civil Services|Assets Management|Engineering Works (EW)|EW Maintenance|Ferry|Road Maintenance|Infrastructure Planning (IP)|Project Management Office (PMO)|Resource Recovery (RR)|Waste)\b$
Words with brackets such as Civil Services are matched but not words with brackets such as Engineering Works (EW).
I have tried single escaping with \ and double escaping (\) but neither option seems to return a match when testing words with brackets in them.
How can I also match words with brackets?

The problem is that \b can't match a word boundary the way you want when it's preceded by a ). A word boundary is a word character adjacent to a non-word character or end-of-string. A word character is a letter, digit, or underscore; notably, ) is not a word character. That means that )\b won't match a parenthesis followed by a space, nor a parenthesis at the end of the string.
The easiest fix is to remove the \bs. You don't actually need them since you've already got ^ and $ anchors:
^(?:Orange|Banana|Apple \(Red\)| Apple \(Green\)|Plum|Mango)$
Alternatively, if you want to search in a larger string you could use a lookahead to look a non-word character or end-of-string. This is essentially what \b does except we only look ahead, not behind.
\b(?:Orange|Banana|Apple \(Red\)| Apple \(Green\)|Plum|Mango)(?=\W|$)

Related

regex last character of a WORD

I'm attempting to match the last character in a WORD.
A WORD is a sequence of non-whitespace characters
'[^\n\r\t\f ]', or an empty line matching ^$.
The expression I made to do this is:
"[^ \n\t\r\f]\(?:[ \$\n\t\r\f]\)"
The regex matches a non-whitespace character that follows a whitespace character or the end of the line.
But I don't know how to stop it from excluding the following whitespace character from the result and why it doesn't seem to capture a character preceding the end of the line.
Using the string "Hi World!", I would expect: the "i" and "!" to be captured.
Instead I get: "i ".
What steps can I take to solve this problem?

"Word" that is a sequence of non-whitespace characters scenario
Note that a non-capturing group (?:...) in [^ \n\t\r\f](?:[ \$\n\t\r\f]) still matches (consumes) the whitespace char (thus, it becomes a part of the match) and it does not match at the end of the string as the $ symbol is not a string end anchor inside a character class, it is parsed as a literal $ symbol.
You may use
\S(?!\S)
See the regex demo
The \S matches a non-whitespace char that is not followed with a non-whitespace char (due to the (?!\S) negative lookahead).
General "word" case
If a word consists of just letters, digits and underscores, that is, if it is matched with \w+, you may simply use
\w\b
Here, \w matches a "word" char, and the word boundary asserts there is no word char right after.
See another regex demo.

In Word text, if I want to highlight the last a in para. I search for all the words that have [space][para][space] to make sure I only have the word I want, then when it is found it should be highlighted.
Next, I search for the last [a ] space added, in the selection and I will get only the last [a] and I will highlight it or color it differently.

Regular expression to match alphanumeric, hyphen, underscore and space string

I'm trying to match a string that contains alphanumeric, hyphen, underscore and space.
Hyphen, underscore, space and numbers are optional, but the first and last characters must be letters.
For example, these should all match:
abc
abc def
abc123
ab_cd
ab-cd
I tried this:
^[a-zA-Z0-9-_ ]+$
but it matches with space, underscore or hyphen at the start/end, but it should only allow in between.

Use a simple character class wrapped with letter chars:
^[a-zA-Z]([\w -]*[a-zA-Z])?$
This matches input that starts and ends with a letter, including just a single letter.
There is a bug in your regex: You have the hyphen in the middle of your characters, which makes it a character range. ie [9-_] means "every char between 9 and _ inclusive.
If you want a literal dash in a character class, put it first or last or escape it.
Also, prefer the use of \w "word character", which is all letters and numbers and the underscore in preference to [a-zA-Z0-9_] - it's easier to type and read.

Check this working in fiddle http://refiddle.com/refiddles/56a07cec75622d3ff7c10000
This will fix the issue
^[a-zA-Z]+[a-zA-Z0-9-_ ]*[a-zA-Z0-9]$

I tried using following regex:
/^\w+([\s-_]\w+)*$/
This allows alphanumeric, underscore, space and dash.
More details

As per your requirement of including space, hyphen, underscore and alphanumeric characters you can use \w shorthand character set for [a-zA-Z0-9_]. Escape the hyphen using \- as it usually used for character range inside character set.
To negate the space and hyphen at the beginning and end I have used [^\s\-].
So complete regex becomes [^\s\-][\w \-]+[^\s\-]
Here is the working demo.

You can use this regex:
^[a-zA-Z0-9]+(?:[\w -]*[a-zA-Z0-9]+)*$
RegEx Demo
This will only allow alphanumerics at start and end.

Regex negative lookahead and word boundary removes first character from capture group

I am trying to capture every word in a string except for 'and'. I also want to capture words that are surrounded by asterisks like *this*. The regex command I am using mostly works, but when it captures a word with asterisks, it will leave out the first one (so *this* would only have this* captured). Here is the regex I'm using:
/((?!and\b)\b[\w*]+)/gi
When I remove the last word boundary, it will capture all of *this* but won't leave out any of the 'and' s.

The problem is that * is not treated as a word character, so \b don't match a position before it. I think you can replace it with:
^(?!and\b)([\w*]+)|((?!and\b)(?<=\W)[\w*]+)
The \b was repleced with \W (non-word character) to match also *, however then the first word in string will not match because is not precedeed by non-word character. This is why I added alternative.
DEMO

Find slash that are NOT followed by non word character

I am trying to write a regex for finding slashes only that are not followed by special characters.
For example, if the string is,
/PErs/#loc/g/2, then I regex should find slashes (/) that are before P, g and 2. It should not return slash before # as # is a special character.
I could write \/\w but it is returning me /P, /g and /2.

Simplest one by using word boundary \b.
\/\b
\b matches between a word character and a non-word character.
DEMO

You want to use the lookahead operator.
Positive lookahead or detect if something is present after (ahead)
Try this regex instead:
\/(?=\w)
DEMO
We use here the positive lookahead operator (?=). It will "detect" the position of a given expression but won't match the expression.
Negative lookahead or detect if something is NOT present after (ahead)
Alternatively, you can also use the negative look ahead operator (?!).
\/(?![#])
DEMO
Negative lookahead with multiple special characters
This will match any / NOT followed by #. If you have more special characters, simply add them to the character class.
For example, if # and % were special characters, the regular expression above would become:
\/(?![##%])
DEMO

Matching slashes NOT followed by NON word character is not the same than followed by word character.
Have a try with:
/(?!\W)
This matches slashes NOT followed by NON word character
It matches the final slash in string: PErs/

How can I use regex for all words beginning with : punctuation?

How can I use regex for all words beginning with : punctuation?
This gets all words beginning with a:
\ba\w*\b
The minute I change the letter a to :, the whole thing fails. Am I supposed to escape the colon, and if so, how?

\b matches between a non-alphanumeric and an alphanumeric character, so if you place it before :, it only matches if there is a letter/digit right before the colon.
So you either need to drop the \b here or specify what exactly constitutes a boundary in this situation, for example:
(?<!\w):\w*\b
That would ensure that there is no letter/digit/underscore right before the :. Of course this presumes a regex flavor that supports lookbehind assertions.

The problem is that \b won't match the start of a word when the word starts with a colon :, because colon is not a word character. Try this:
(?<=:)\w*\b
This uses a (non-capturing) look-behind to assert that the previous character is a colon.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Escape brackets in a regex with alternation - regex

Related

regex last character of a WORD

Regular expression to match alphanumeric, hyphen, underscore and space string

Regex negative lookahead and word boundary removes first character from capture group

Find slash that are NOT followed by non word character

How can I use regex for all words beginning with : punctuation?

Categories

Resources