Matching Regex with # character

I am trying to match the following string:
style #
My regex is as follows:
^\s*\b(style #)\b\s*$
This is not matching my string.
If I try this regex:
^\s*\b(style n)\b\s*$
It matches the following string:
style n
This leads me to think that I am using the # character incorrectly.
What am I doing wrong?

The problem is that \b means a word boundary: a position with a word character (letter, digit or underscore) on exactly one side. # is not a word character, and in your string it is followed by the end of the string (or whitespace), so there is no word boundary after the #. Just drop that part.
^\s*\b(style #)\s*$
(And you actually don't need the first \b, either, since the context guarantees there'll be a word boundary there.)
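A quick check of this explanation, sketched in Python's re module. The assumption is that \b behaves here as it does in the other common flavors, which it does for ASCII text like this:

```python
import re

text = "style #"

# The trailing \b needs a word character on exactly one side.
# After "#" there is only the end of the string, so it never matches.
original = re.search(r"^\s*\b(style #)\b\s*$", text)

# Dropping the trailing \b lets the pattern match.
fixed = re.search(r"^\s*\b(style #)\s*$", text)

# The leading \b is redundant as well: "s" is a word character and
# it is preceded by the start of the string or by whitespace.
minimal = re.search(r"^\s*(style #)\s*$", text)

print(original)          # None
print(fixed.group(1))    # style #
print(minimal.group(1))  # style #
```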

Related

How to overcome multiple matches within same sentence (regex)

I am trying to write a regex that matches any word not followed by a :, and ignores the match if the word is followed by one. I decided to use a negative lookahead for it.
/([a-zA-Z]+)(?!:)/gm
string: lame:joker
Since I am using a character range, only the last character before the : is ignored: lame still gives a match for lam.
How do I ignore the entire match in this case?
Link to regex101: https://regex101.com/r/DlEmC9/1
The issue is related to backtracking. Once your [a-zA-Z]+ reaches a :, the engine steps back from the failing position and re-checks the lookahead. Whenever there are at least two letters before a colon, it finds a match and returns the part that is not immediately followed by :. See your regex demo: c in c:real is not matched because there is no position to backtrack to, while rea in real:c is matched because a is not immediately followed by :.
Adding an implicit requirement to the negative lookahead
Since you only need to match a sequence of letters not followed by a colon, you can explicitly add one more condition that is implied: the letters must not be followed by another letter either:
[A-Za-z]+(?![A-Za-z]|:)
[A-Za-z]+(?![A-Za-z:])
See the regex demo. Since both [A-Za-z] and : match a single character, it makes sense to put them into a single character class, so [A-Za-z]+(?![A-Za-z:]) is better.
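Here is a minimal Python sketch of the backtracking effect and the fix on the question's input. It assumes re.findall behaves like the global (/g) search in the regex101 demo, which holds for patterns without overlapping matches such as these:

```python
import re

text = "lame:joker"

# Original attempt: the engine backtracks from "lame" to "lam",
# which is followed by "e" rather than ":", so "lam" is returned.
naive = re.findall(r"[a-zA-Z]+(?!:)", text)

# Fixed: the lookahead also forbids a following letter,
# so a shortened word can no longer satisfy it.
fixed = re.findall(r"[A-Za-z]+(?![A-Za-z:])", text)

print(naive)  # ['lam', 'joker']
print(fixed)  # ['joker']
```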
Preventing backtracking into a word-like pattern by using a word boundary
As @scnerd suggests, word boundaries can also help in these situations, but there is always a catch: the meaning of a word boundary is context-dependent (note the number of "if"s in the word boundary explanation).
[A-Za-z]+\b(?!:)
is a valid solution here, because the input implies that the words end with non-word characters (i.e. the end of the string, or characters other than letters, digits and underscores). See the regex demo.
When does a word boundary fail?
\b will not be the right choice when the main consuming pattern is supposed to match even if glued to other word chars. The most common example is matching numbers:
\d+\b(?!:) matches 12 in 12, but not in 12:, 12c or 12_
\d+(?![\d:]) matches 12 in 12, and also in 12c and 12_; only 12: is rejected.
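The contrast above can be checked with a short Python sketch. It covers the word-boundary fix on the letters example and the two number patterns on each sample string. The sample strings come from the list above, and the assumption is that Python's \b uses the same letter/digit/underscore definition as the demo, which holds for ASCII input:

```python
import re

# Letters: the \b after + stops the engine from backtracking to "lam".
words = re.findall(r"[A-Za-z]+\b(?!:)", "lame:joker")
print(words)  # ['joker']

# Numbers: \b treats letters and "_" after the digits as part of the
# same word, so 12c and 12_ are rejected; the character-class
# lookahead only rejects a following digit or colon.
samples = ["12,", "12:", "12c", "12_"]
results = {
    s: (re.findall(r"\d+\b(?!:)", s), re.findall(r"\d+(?![\d:])", s))
    for s in samples
}
for s, (with_boundary, with_class) in results.items():
    print(s, "boundary:", with_boundary, "class:", with_class)
```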
Put a word boundary check \b after the + to force the match to run to the end of the word:
([a-zA-Z]+\b)(?!:)
Here's an example run.

Can't match character "#" within word boundary regex

I can't match the character "#" at the end of a word with regex
/\b(C#)\b/i
I'm working on some MongoDB queries. The subject of the search is programming languages on a given text field of my collection.
The regex I'm using, which almost always works, is
/\b(java|php)\b/i
(for a concrete case where I'm looking for Java and PHP).
The word boundaries are needed to match whole words only (javascript must not match java).
The problem is, as I said, that when I look for "C#" the regex just fails and returns no results.
The regex works if I remove the last boundary, but then the java/javascript example fails.
I've been stuck on this for a couple of days now; any help would be appreciated.
Per https://stackoverflow.com/a/3241901/2191572:
A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one.
You need to create your own definition of word boundary:
\b(java|php|c#)(?=[^a-z0-9_]|$)
https://regex101.com/r/LoOADH/1
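Note: If you needed to match something like #b because it's the latest programming craze, you would also have to replace the leading \b, since there is no word boundary between a space (or the start of the string) and #. Use a lookbehind that rejects a preceding word character. A lookahead in that position would test the first character of the match itself, so java would then only be found at the very start of the string:
(?<![a-z0-9_])(java|php|c#|#b)(?=[^a-z0-9_]|$)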
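Here is a Python sketch of the custom boundary, with re.IGNORECASE standing in for /i. The sample sentences are made up for illustration. It assumes MongoDB's regex engine treats \b, lookaheads and lookbehinds the way Python's re does for these ASCII patterns:

```python
import re

text = "I know C# and JavaScript, but not Java."
flags = re.IGNORECASE  # stands in for the /i flag

# A trailing \b after "#" needs a word character next, so C# is missed.
plain = re.findall(r"\b(java|php|c#)\b", text, flags)

# Custom end-of-word check from the answer.
custom = re.findall(r"\b(java|php|c#)(?=[^a-z0-9_]|$)", text, flags)

# For names that start with "#", check the preceding character with a
# lookbehind; a leading lookahead tests the first matched character.
langs = r"(java|php|c#|#b)"
other = "Try #b or Java, not JavaScript"
behind = re.findall(r"(?<![a-z0-9_])" + langs + r"(?=[^a-z0-9_]|$)", other, flags)
ahead = re.findall(r"(?=[^a-z0-9_]|^)" + langs + r"(?=[^a-z0-9_]|$)", other, flags)

print(plain)   # ['Java']
print(custom)  # ['C#', 'Java']
print(behind)  # ['#b', 'Java']
print(ahead)   # ['#b']
```

The last line shows why the leading check has to look backwards: with a lookahead, Java in the middle of the string is never found.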

Exclude any word followed by # from regex

I'm trying to use the Regex.Replace function in VB.NET, and I want to exclude any word that has a # symbol after it. At the moment, the pattern I'm using is "\b" & Term & "\b" (where Term is whatever word I want to replace).
Thanks.
You may try this:
\b(?<!#)[^#\s]+(?!#)\b
Regex Demo
Explanation
[^#\s]+ excludes any word that has a # within it or just after it. A character class that starts with ^, as in [^...], matches any character not listed inside it; ^ inside [] does not mean start of string.
In most flavors, # is not a word character, so \b matches at the edge between a word and an adjacent #. You therefore need to make sure that a # next to the word is not accepted as its boundary, which is why the lookbehind and lookahead have been added here.
The first part, \b(?<!#), requires a word boundary that is not preceded by #.
The last part, (?!#)\b, requires a word boundary that is not followed by #.
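Here is a small Python sketch of the answer's pattern, plus a hypothetical replace_term helper that adapts it to the question's per-term replacement. The helper and the sample strings are illustrative assumptions, not part of the answer. .NET's Regex.Replace should accept the same lookarounds, but only Python's re was used here:

```python
import re

# Pattern from the answer above.
pattern = r"\b(?<!#)[^#\s]+(?!#)\b"
words = re.findall(pattern, "keep this# and #that too")
print(words)  # ['keep', 'and', 'too']

# Hypothetical helper: replace one term only where no '#' is glued to
# it, mirroring "\b" & Term & "\b" plus the answer's lookarounds.
def replace_term(text, term, replacement):
    term_pattern = r"\b(?<!#)" + re.escape(term) + r"(?!#)\b"
    return re.sub(term_pattern, replacement, text)

print(replace_term("cat cat# #cat", "cat", "dog"))  # dog cat# #cat
```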

Regex matching on word boundary OR non-digit

I'm trying to use a regex pattern (in Java) to find a sequence of 3 digits, and only 3 digits, in a row: 4 digits shouldn't match, and neither should 2 digits.
The obvious pattern to me was:
"\b(\d{3})\b"
That matches against many source string cases, such as:
">123<"
" 123-"
"123"
But it won't match against a source string of "abc123def", because the c/1 and 3/d transitions don't count as the word boundary that \b expects.
I would have expected the solution to be adding a character class that includes both non-Digit (\D) and the word boundary (\b). But that appears to be illegal syntax.
"[\b\D](\d{3})[\b\D]"
Does anybody know what I could use as an expression that would extract "123" for a source string situation like:
"abc123def"
I'd appreciate any help. And yes, I realize that in Java one must double-escape codes like \b as \\b, but that's not my issue and I didn't want to limit this to Java folks.
You should use lookarounds for those cases:
(?<!\d)(\d{3})(?!\d)
This matches 3 digits that are neither preceded nor followed by a digit.
Working Demo
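Here is a Python sketch of the lookaround version on the question's sample inputs combined into one string. In a Java string literal, the same pattern would need its backslashes doubled, as the question notes: "(?<!\\d)(\\d{3})(?!\\d)".

```python
import re

# Exactly three digits: no digit immediately before or after.
pattern = r"(?<!\d)(\d{3})(?!\d)"
found = re.findall(pattern, "abc123def 1234 12 >456<")
print(found)  # ['123', '456']
```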
Lookarounds can solve this problem, but I personally try to avoid them because not all regex engines fully support them. Additionally, I wouldn't say this issue is complicated enough to merit the use of lookarounds in the first place.
You could match this: (?:\b|\D)(\d{3})(?:\b|\D)
Then return: \1
Or if you're performing a replacement and need to match the entire string: (?:\b|\D)+(\d{3})(?:\b|\D)+
Then replace with: \1
As a side note, \b wasn't working as part of a character class because, in many flavors, [\b] has a completely different meaning within brackets: it refers to a backspace, not a word boundary.
Here's a Working Demo.
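Here is a Python sketch comparing the lookaround-free pattern with the lookaround one. The input "123d456" is an illustrative assumption. It shows one behavioral difference: \D consumes the separator, so a second run that shares that separator is missed. The last line checks the backspace meaning of [\b] in Python's re:

```python
import re

no_lookaround = r"(?:\b|\D)(\d{3})(?:\b|\D)"
lookaround = r"(?<!\d)(\d{3})(?!\d)"

print(re.search(no_lookaround, "abc123def").group(1))  # 123
print(re.search(no_lookaround, "1234"))                 # None

# \D consumes the "d", so the second run has no boundary left to match.
print(re.findall(no_lookaround, "123d456"))  # ['123']
print(re.findall(lookaround, "123d456"))     # ['123', '456']

# Inside a character class, \b is a backspace in Python's re.
print(re.fullmatch(r"[\b]", "\x08") is not None)  # True
```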

Regex - how to exclude single word?

I am using http://www.position-absolute.com/articles/jquery-form-validator-because-form-validation-is-a-mess/ for validation. Validation rules are defined in the following way:
"onlyLetterSp": {
"regex": /^[a-zA-Z\ \']+$/,
"alertText": "* Only letters"
}
I would like to add a new rule that excludes one single word. I have read some similar questions on StackOverflow and tried to declare it with something like this:
"regex": /(?!exclude_word)\^[a-zA-Z\ \']+$/,
But it didn't work. Can you give me some advice on how to do it?
This is a good time to use word boundary assertions, as @FailedDev indicated, but care needs to be taken to avoid rejecting certain not-TOO-special cases, such as wordy, wordsmith or even less obvious cases like sword or foreword.
I believe this will work pretty well:
\b(?!\bword\b)\w+\b
This is the expression broken down:
\b # assert at a word boundary
(?! # look ahead and assert that what follows IS NOT...
\b # a word boundary
word # followed by the exact characters `word`
\b # followed by a word boundary
) # end look-ahead assertion
\w+ # match one or more word characters: `[a-zA-Z0-9_]`
\b # then a word boundary
The expression in the original question, however, matches more than word characters: [a-zA-Z\ \']+ also matches spaces (to support multiple words in the input) and single quotes (for apostrophes?). If you need to allow words with apostrophes in them, use the following expression:
\b(?!\bword\b)[a-zA-Z']+\b
\b(?:(?!word)\w)+\b
This will not match "word". Note, however, that it also rejects any word that merely contains word, such as sword or wordy.
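Here is a Python sketch contrasting the two answers' patterns on a made-up sample sentence, plus the apostrophe variant. Python's re is assumed to treat \b and \w like the JavaScript engine the validator uses, which holds for ASCII input like this:

```python
import re

text = "hello word wordy sword foreword"

# Rejects only the whole word "word".
answer_one = r"\b(?!\bword\b)\w+\b"
# Rejects every word that contains "word" anywhere.
answer_two = r"\b(?:(?!word)\w)+\b"
# Apostrophe-friendly variant of the first answer.
with_apostrophes = r"\b(?!\bword\b)[a-zA-Z']+\b"

first = re.findall(answer_one, text)
second = re.findall(answer_two, text)
third = re.findall(with_apostrophes, "don't word")

print(first)   # ['hello', 'wordy', 'sword', 'foreword']
print(second)  # ['hello']
print(third)   # ["don't"]
```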
It's unclear from your question what you want, but I've interpreted it as "not matching input that contains a particular word". The regex for this is:
^(?!.*\bexclude_word\b)
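Here is a Python sketch that combines this whole-input check with the letters/spaces/apostrophes rule from the question. The word admin is a hypothetical stand-in for exclude_word, chosen because the question's character class does not allow an underscore. In the validator, the equivalent JavaScript literal would presumably be /^(?!.*\badmin\b)[a-zA-Z ']+$/, though only Python's re was used here:

```python
import re

# Hypothetical rule: letters, spaces and apostrophes only,
# and the whole word "admin" (stand-in for exclude_word) is refused.
rule = re.compile(r"^(?!.*\badmin\b)[a-zA-Z ']+$")

checks = {value: bool(rule.match(value))
          for value in ["hello there", "hello admin", "administrator", "it's fine"]}
for value, ok in checks.items():
    print(value, ok)
```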