Regex to exclude alpha-numeric characters

I thought [^0-9a-zA-Z]* excludes all alpha-numeric letters, but allows for special characters, spaces, etc.
With the search string [^0-9a-zA-Z]*ELL[^0-9A-Z]* I expect outputs such as
ELL
ELLs
The ELL
Which ELLs
However, I also get the following outputs:
Ellis Island
Bellis
How can I correct this?

You may use
(?:\b|_)ELLs?(?=\b|_)
See the regex demo.
It will find ELL or ELLs if it is surrounded with _ or non-word chars, or at the start/end of the string.
Details:
(?:\b|_) - a non-capturing alternation group matching a word boundary position (\b) or (|) a _
ELLs? - matches ELL or ELLs since s? matches 1 or 0 s chars
(?=\b|_) - a positive lookahead that requires the presence of a word boundary or _ immediately to the right of the current location.
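As a quick check, here is a minimal Python sketch that runs the pattern above against the sample strings from the question. The helper name has_ell and the re.IGNORECASE flag are assumptions made for illustration; the flag is used because the unwanted results in the question (Ellis, Bellis) suggest the search was case-insensitive.

```python
import re

# Pattern from the answer; IGNORECASE is an assumption (see the note above).
ELL_PATTERN = re.compile(r'(?:\b|_)ELLs?(?=\b|_)', re.IGNORECASE)

def has_ell(text):
    """Return True if text contains ELL or ELLs as a standalone token."""
    return ELL_PATTERN.search(text) is not None

samples = ["ELL", "ELLs", "The ELL", "Which ELLs", "foo_ELL_bar",
           "Ellis Island", "Bellis"]
for sample in samples:
    print(f"{sample!r}: {has_ell(sample)}")
```

Only the last two samples should print False, since in both the letters ELL are followed directly by another word character.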

Change the * to +.
A * means any number of repetitions, including none. A + means one or more. What you probably want, though, is a word boundary:
\bELL\b
A word boundary is a position between \w and \W (non-word char), or at the beginning or end of a string if it begins or ends (respectively) with a word character ([0-9A-Za-z_]). More here about that:
What is a word boundary in regexes?
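To make the difference concrete, the following Python sketch compares the pattern from the question with a word-boundary version. The variable names and the re.IGNORECASE flag are assumptions, and the optional s? is an addition in this sketch so that ELLs still matches, because \bELL\b on its own would reject it.

```python
import re

# Pattern from the question: both * quantifiers may match nothing,
# so ELL is also found in the middle of a longer word.
loose = re.compile(r'[^0-9a-zA-Z]*ELL[^0-9A-Z]*', re.IGNORECASE)

# Word-boundary version; the optional "s" is an addition for this sketch.
bounded = re.compile(r'\bELLs?\b', re.IGNORECASE)

for sample in ["The ELL", "Which ELLs", "Ellis Island", "Bellis"]:
    print(sample, bool(loose.search(sample)), bool(bounded.search(sample)))
```

The loose pattern reports a hit for all four strings, while the bounded one accepts only the first two.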

Related

Regex 4 characters and 1 space minimum anywhere position

I've tried this
(?!\sa-zA-Z){4,}\s{1,}
I want 4 or more characters and at least 1 space anywhere in the string, but it doesn't work.
The space can be at any position except the start.
I've also tried the approach from the question below, but that did not work either:
Regex: Allow minimum alphanumeric, dot and - characters. Asterisk allowed anywhere?
EDIT: I would like these results:
aa aa..., aaa a..., a aaa..., aaaa ...
You can use
\b[a-zA-Z](?=[a-zA-Z ]{3})[a-zA-Z]* +[a-zA-Z]*
Explanation
\b A word boundary to prevent a partial word match
[a-zA-Z] Match a single char a-zA-Z
(?=[a-zA-Z ]{3}) Positive lookahead, assert 3 of the listed chars in the character class to the right of the current position
[a-zA-Z]* +[a-zA-Z]* Match optional chars a-zA-Z, then match 1+ spaces and again optional chars a-zA-Z
See a regex demo.
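A small Python sketch of the pattern above, tried on the four shapes from the question plus two inputs that should be rejected. The helper name first_match and the extra negative samples are assumptions for illustration.

```python
import re

# Pattern from the answer above.
pattern = re.compile(r'\b[a-zA-Z](?=[a-zA-Z ]{3})[a-zA-Z]* +[a-zA-Z]*')

def first_match(text):
    """Return the first matching substring, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

for sample in ["aa aa", "aaa a", "a aaa", "aaaa ", "aaaa", "a a"]:
    print(repr(sample), "->", repr(first_match(sample)))
```

The input "aaaa" fails because no space follows the letters, and "a a" fails because the lookahead cannot find three more characters after the first letter.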

Regex match pattern, space and character

^([a-zA-Z0-9_-]+)$ matches:
BAP-78810
BAP-148080
But does not match:
B8241066 C
Q2111999 A
Q2111999 B
How can I modify regex pattern to match any space and/or special character?
For the example data, you can write the pattern as:
^[a-zA-Z0-9_-]+(?: [A-Z])?$
^ Start of string
[a-zA-Z0-9_-]+ Match 1+ chars listed in the character class
(?: [A-Z])? Optionally match a space and a char A-Z
$ End of string
Regex demo
Or a more exact match:
^[A-Z]+-?\d+(?: [A-Z])?$
^ Start of string
[A-Z]+-? Match 1+ chars A-Z and optional -
\d+(?: [A-Z])? Match 1+ digits and optional space and char A-Z
$ End of string
Regex demo
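Both patterns can be checked against the example data with a short Python sketch. The variable names and the two negative inputs are assumptions added for illustration.

```python
import re

# The two patterns from the answer above.
general = re.compile(r'^[a-zA-Z0-9_-]+(?: [A-Z])?$')
exact = re.compile(r'^[A-Z]+-?\d+(?: [A-Z])?$')

samples = ["BAP-78810", "BAP-148080", "B8241066 C", "Q2111999 A", "Q2111999 B"]
for s in samples:
    print(s, bool(general.match(s)), bool(exact.match(s)))

# Hypothetical inputs that should be rejected.
print(general.match("B8241066 CC"))  # two trailing letters
print(exact.match("bap-78810"))      # lowercase prefix
```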
Whenever you want to match something that can be either a space or a special character, you can use the dot symbol (.). Your regex pattern would then be modified to:
^([a-zA-Z0-9_-])+.$
This will match a space or any other single character at that position. If you want to match the example provided, where exactly one alphanumeric character follows the space, you could include \w such that:
^([a-zA-Z0-9_-])+.\w$
Note that \w is equivalent to [A-Za-z0-9_]
Further, be careful when you use . as it makes your pattern less specific and therefore more likely to produce false positives.
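The caution about . can be illustrated with a short Python sketch. The sample "B8241066#C" is a hypothetical input chosen to show a likely false positive.

```python
import re

# Dot-based pattern from the answer above.
dot_pattern = re.compile(r'^([a-zA-Z0-9_-])+.\w$')

for s in ["B8241066 C", "B8241066#C", "BAP-78810"]:
    print(s, bool(dot_pattern.match(s)))
```

All three inputs match. "BAP-78810" matches even though it has no space, because the + gives back characters until . can consume the 1 and \w the final 0.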
I suggest using this approach
^[A-Z][A-Z\d -]{6,}$
The first character must be an uppercase letter, followed by at least 6 uppercase letters, digits, spaces or -.
I removed the group because there was only one group and it was the entire regex.
You can also use \w, which includes A-Z, a-z and 0-9, as well as _ (underscore). To make the pattern case-insensitive without explicitly adding a-z or using \w, you can use a flag, often i.
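As a sketch of the flag approach in Python, where the i flag is spelled re.IGNORECASE, the same pattern can be compiled with and without it. The variable names and the lowercase and short samples are assumptions for illustration.

```python
import re

# Pattern from the answer, with and without the case-insensitive flag.
upper_only = re.compile(r'^[A-Z][A-Z\d -]{6,}$')
any_case = re.compile(r'^[A-Z][A-Z\d -]{6,}$', re.IGNORECASE)

for s in ["BAP-78810", "B8241066 C", "bap-78810", "BAP-78"]:
    print(s, bool(upper_only.match(s)), bool(any_case.match(s)))
```

"bap-78810" is accepted only with the flag, and "BAP-78" is rejected by both because fewer than 6 characters follow the first letter.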

Regex: matching up to the first occurrence of word with character 'a' in it

I need a regular expression to match the first word with character 'a' in it for each line. For example my test string is this:
bbsc abcd aaaagdhskss
dsaa asdd aaaagdfhdghd
wwer wwww awww wwwd
Only the first word containing 'a' on each line should be matched (these were shown in bold in the original post: abcd, dsaa and awww). How can I do that? I can match all the words with 'a' in them, but can't figure out how to match only the first occurrence.
Under the assumption that the only characters being used are word characters, i.e. \w characters, and white space then use:
/^(?:[^a ]+ +)*([^a ]*a\w*)\b/gm
^ Matches the start of the line
(?:[^a ]+ +)* Matches 0 or more occurrences of words composed of any character other than an a followed by one or more spaces in a non-capturing group.
([^a ]*a\w*)\b Matches a word ending on a word boundary (it is already guaranteed to begin on a word boundary) that contains an a. The word-boundary constraint allows for the word to be at the end of the line.
The first word with an a in it will be in group #1.
See demo
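In Python, the g and m flags of the pattern above correspond roughly to re.findall combined with re.MULTILINE. A minimal sketch using the test string from the question (variable names are assumptions):

```python
import re

# Pattern from the answer; MULTILINE lets ^ match at the start of every line.
pattern = re.compile(r'^(?:[^a ]+ +)*([^a ]*a\w*)\b', re.MULTILINE)

text = (
    "bbsc abcd aaaagdhskss\n"
    "dsaa asdd aaaagdfhdghd\n"
    "wwer wwww awww wwwd"
)

# With a single capturing group, findall returns the contents of group 1.
first_words = pattern.findall(text)
print(first_words)  # ['abcd', 'dsaa', 'awww']
```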
If we cannot assume that only word (\w) and white space characters are present, then use:
^(?:[^a ]+ +)*(\w*a\w*)\b
The difference is in scanning the first word with an a in it, (\w*a\w*), where we are guaranteed that we are scanning a string composed of only word characters.
What tool are you using? Many programs let you limit the number of matches. If yours does, use \b[b-z]*a[a-z]* with a limit of 1.
If that is not possible, capture the word in a group and match the rest of the line after it: ([b-z]*a[a-z]*).*
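In Python, a limit of 1 corresponds to re.search, which returns only the first match. A sketch under the assumption that each line is processed separately (variable names are illustrative):

```python
import re

# Pattern from the answer: a word made of lowercase letters with its first "a".
first_a_word = re.compile(r'\b[b-z]*a[a-z]*')

lines = [
    "bbsc abcd aaaagdhskss",
    "dsaa asdd aaaagdfhdghd",
    "wwer wwww awww wwwd",
]

results = []
for line in lines:
    m = first_a_word.search(line)  # search stops at the first match
    results.append(m.group(0) if m else None)

print(results)  # ['abcd', 'dsaa', 'awww']
```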
Try:
^(?:[^a ]+ )*(\w*a\w*) .*$
Basically what it says is: skip a run of words that are composed of anything but the letter a (or <space>), then capture a word that must include the letter a.
Group 1 should hold the first word with a.

How can I fix this negative lookahead to make it work

I have a string for example as follows:
ABCD17; ABC18; ABCEF19; XYZ19; ABCDE
Within the MusicBee application, I'm attempting to use a Regex replace function to swap MATCHED items for blanks and thus transform the above string into
ABCEF19; XYZ19
i.e. ONLY retain the items ending in "19"
The elements can be any length and they may or may not end in a number.
The following expression correctly matches the items Ending in 19
[^|;].*(?=19).{3}
However, I obviously need the opposite of this (since the matched items are then replaced with empty strings) which is NOT (surprisingly to me)
[^|;].*(?!19).{3}
If you only want to keep items that end in 19, one option is to use word boundaries (\b) and start the match with 1+ uppercase chars A-Z.
Then optionally match the digits at the end, using the negative lookahead (?!19\b) to reject items whose trailing digits are exactly 19.
\b[A-Z]+(?!19\b)\d*\b;?
\b Word boundary
[A-Z]+ Match 1+ uppercase chars A-Z (or use [^\W\d] to match word chars without a digit)
(?!19\b) Negative lookahead, assert what is directly on the right is not 19 followed by a word boundary
\d* Match 0+ digits
\b;? Word boundary and optionally match ;
Regex demo
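The replacement can be tried outside MusicBee with a Python sketch. Here re.sub stands in for the application's replace function, which is an assumption, and the final clean-up step that splits on the separators is an addition, since the pattern leaves some whitespace and a trailing ; behind.

```python
import re

# Pattern from the answer: matches every item that does NOT end in 19.
pattern = re.compile(r'\b[A-Z]+(?!19\b)\d*\b;?')

source = "ABCD17; ABC18; ABCEF19; XYZ19; ABCDE"

# Replace matched items with empty strings.
replaced = pattern.sub("", source)

# Clean-up (not part of the answer): split on the separators and drop empties.
kept = [item for item in re.split(r';\s*', replaced.strip()) if item]
print(kept)  # ['ABCEF19', 'XYZ19']
```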

regex nonconsecutive match

I'm trying to match a word that has 2 vowels in it (they don't have to be consecutive), but the regex I've come up with either matches nothing or not enough. This is the last iteration (Dart).
final vowelRegex = new RegExp(r'[aeiouy]{2}');
Here's an example sentence being parsed; it should match one, shoulder, their, and over. It's only matching shoulder and their. I understand why, because that's the expression I defined. How can the expression be defined to match on 2 vowels, regardless of position in the word?
one shoulder their the which over
The expression only needs to be tested on one word at a time so hopefully this simplifies things.
You can use :
new RegExp(r'(\w*[aeiouy]\w*){2}');
Both of the previous two answers are incorrect.
(\S*[aeiouy]\S*){2} can match substrings of non-whitespace characters even if they contain non-word characters (proof).
\S*[aeiouy]\S*[aeiouy]\S* has the same problem (proof).
Correct solution:
\b([^\Waeiou]*[aeiou]){2}\w*\b
And if you want only whitespace to count as the word boundary (rather than any non-word character), then use the following regex where the target word is in capture group \2.
(\s|^)(([^\Waeiou]*[aeiou]){2}\w*)(\s|$)
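The difference between the \S-based and the \b-based patterns can be checked with a short Python sketch. The capturing group is changed to a non-capturing one here so that findall returns whole words, and the input "the-which" is a hypothetical example of a string containing a non-word character; both are assumptions of this sketch.

```python
import re

# \b-based pattern from above, with a non-capturing group for findall.
word_vowels = re.compile(r'\b(?:[^\Waeiou]*[aeiou]){2}\w*\b')

# \S-based pattern, which treats any run of non-whitespace as one word.
loose = re.compile(r'\S*[aeiouy]\S*[aeiouy]\S*')

sentence = "one shoulder their the which over"
print(word_vowels.findall(sentence))  # ['one', 'shoulder', 'their', 'over']

# Two one-vowel words joined by a hyphen: only the \S version matches.
print(loose.findall("the-which"))        # ['the-which']
print(word_vowels.findall("the-which"))  # []
```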
You can try this:
\S*[aeiouy]\S*[aeiouy]\S*
Explanation
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
* Quantifier: matches between zero and unlimited times
[aeiouy] Match a single character present in the list [aeiouy]
For input string : one shoulder their the which over
it will match four words: one shoulder their over
I'd do:
\b(?:\w*[aeiouy]+\w*){2,}\b
Explanation:
\b : word boundary
(?: : start non-capture group
\w* : 0 or more word characters
[aeiouy]+ : 1 or more vowels
\w* : 0 or more word characters
){2,} : end group repeated at least twice
\b : word boundary
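Since the question tests one word at a time, a Python sketch can compare the original expression with the one above on each word of the example sentence. The question uses Dart; Python is used here only for illustration, and the variable names are assumptions.

```python
import re

# Expression from the question: two vowels in a row.
original = re.compile(r'[aeiouy]{2}')

# Expression from this answer: at least two vowels anywhere in the word.
two_vowels = re.compile(r'\b(?:\w*[aeiouy]+\w*){2,}\b')

words = "one shoulder their the which over".split()

consecutive = [w for w in words if original.search(w)]
anywhere = [w for w in words if two_vowels.search(w)]

print(consecutive)  # ['shoulder', 'their']
print(anywhere)     # ['one', 'shoulder', 'their', 'over']
```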