Regex capture required and optional characters in any position only - regex

I would like to match a word that consists only of characters from a given set, in any order, where one of those letters is required.
Example:
Optional letters: yujkfec
Required letter: d
Matches: duck dey feed yudekk dude jude dedededy jejeyyyjd
No matches (do not contain required): yuck feck
No matches (contain letters outside of set): sucked shock blah food bard
I've tried ^[d]+[yujkfec]*$ but this only matches when the required letter is in the front. I've tried positive lookaheads but this didn't do much.

You can use
\b[yujkfec]*d[dyujkfec]*\b
Note that the d is also included in the second character class.
Details:
\b - word boundary
[yujkfec]* - zero or more occurrences of y, u, j, k, f, e or c
d - a d char
[dyujkfec]* - zero or more occurrences of y, u, j, k, f, e, c or d.
\b - a word boundary.
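To check the pattern against the words from the question, here is a minimal Python sketch. The helper name matching_words and the use of Python's re module are assumptions made for illustration; the question does not say which regex engine is in use.

```python
import re

# Pattern from the answer: optional letters, one required d, then any allowed letters.
WORD = re.compile(r"\b[yujkfec]*d[dyujkfec]*\b")

def matching_words(text):
    """Return the whitespace-separated words that the pattern matches in full."""
    return [w for w in text.split() if WORD.fullmatch(w)]

words = ("duck dey feed yudekk dude jude dedededy jejeyyyjd "
         "yuck feck sucked shock blah food bard")
print(matching_words(words))
```

Words without a d (yuck, feck) and words with letters outside the set (sucked, shock, blah, food, bard) are filtered out.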

Related

Regex that does not accept sub strings of more than two 'b'

I need a regex that accepts all the strings consisting only of characters a and b, except those with two or more 'b' in a row.
For example, these should not match:
abb
ababbb
bba
bbbaa
bbb
bb
I came up with this, but it's not working
[a-b]+b{2,}[a-b]*
Here is my code:
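#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
    string input;
    // Raw string literal: in a normal literal, "\b" is a backspace character,
    // not the regex word boundary.
    regex validator_regex(R"(\b(?:b(?:a+b?)*|(?:a+b?)+)\b)");
    cout << "Hello, " << endl;
    while (!regex_match(input, validator_regex)) {
        cout << "please enter your choice of regEx :" << endl;
        if (!(cin >> input)) break;  // stop when input ends
        if (!regex_match(input, validator_regex))
            cout << input + " is not a valid input" << endl;
        else
            cout << input + " is valid " << endl;
    }
}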
Your pattern [a-b]+b{2,}[a-b]* matches 1 or more a or b chars and then requires bb, which is exactly what you don't want. Also note that it can only match strings of at least 3 characters because of the [a-b]+b{2,} part.
To reject strings with 2 b chars in a row, you can use a negative lookahead that scans over optional a or b chars and fails the match if it reaches bb.
Note that [a-b] is the same as [ab]
\b(?![ab]*?bb)[ab]+\b
\b A word boundary
(?![ab]*?bb) Negative lookahead, assert not 0+ times a or b followed by bb to the right
[ab]+ Match 1+ occurrences of a or b
\b A word boundary
Without using lookarounds, you can match the strings that you don't want by matching a string that contains bb, and capture in group 1 the strings that you want to keep:
\b[ab]*bb[ab]*\b|\b([ab]+)\b
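A minimal Python sketch of this "match what you don't want, capture what you want" idea. The helper name keep_valid and the sample text are made up for illustration; only matches where group 1 took part are kept.

```python
import re

# First alternative consumes words containing bb; second captures the rest.
REJECT_OR_KEEP = re.compile(r"\b[ab]*bb[ab]*\b|\b([ab]+)\b")

def keep_valid(text):
    """Return the words captured in group 1, i.e. those without bb."""
    return [m.group(1) for m in REJECT_OR_KEEP.finditer(text)
            if m.group(1) is not None]

print(keep_valid("abb aba bba ab a bbb baba"))
```

The unwanted words are still matched, so the caller has to check the group rather than the overall match.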
Or use an alternation that matches either a b followed by optional repetitions of 1+ a chars and an optional b, or 1+ repetitions of 1+ a chars followed by an optional b
\b(?:b(?:a+b?)*|(?:a+b?)+)\b
The simplest regex is:
^(?!.*bb)[ab]+$
This regex works by adding a negative look ahead (anchored to start) for bb appearing anywhere within input consisting of a or b.
If zero length input should match, change [ab]+ to [ab]*.
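One way to gain confidence that the lookahead, alternation and anchored patterns accept the same strings is to compare them on every a/b string up to some length. This is a sketch in Python; the names PATTERNS, is_valid and disagreements, and the length limit of 8, are assumptions for illustration.

```python
import re
from itertools import product

PATTERNS = {
    "lookahead": re.compile(r"\b(?![ab]*?bb)[ab]+\b"),
    "alternation": re.compile(r"\b(?:b(?:a+b?)*|(?:a+b?)+)\b"),
    "anchored": re.compile(r"^(?!.*bb)[ab]+$"),
}

def is_valid(s):
    """Reference check without regex: non-empty, only a/b, and no 'bb'."""
    return bool(s) and set(s) <= {"a", "b"} and "bb" not in s

def disagreements(max_len=8):
    """List (pattern name, string) pairs where a pattern and the reference differ."""
    bad = []
    for n in range(1, max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            for name, rx in PATTERNS.items():
                if bool(rx.fullmatch(s)) != is_valid(s):
                    bad.append((name, s))
    return bad

print(disagreements())
```

An empty list means the three patterns agree with the plain-string check for all tested inputs; it is not a proof for longer strings.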

command line grep finding words with exactly one vowel

How do you list all the lines that contain words with exactly one vowel?
I have tried
egrep -i '\<.*[aeiou]{1}.*\>' f3.txt
but I'm stuck and can't figure it out
You may use
grep -i '\<[^[:digit:][:punct:][:space:]aeiou]*[aeiou][^[:digit:][:punct:][:space:]aeiou]*\>' f3.txt
Details
\< - start of a word
[^[:digit:][:punct:][:space:]aeiou]* - 0 or more chars other than digits, punctuation, whitespace, a, e, i, o, u
[aeiou] - 1 occurrence of a, e, i, o or u
[^[:digit:][:punct:][:space:]aeiou]* - 0 or more chars other than digits, punctuation, whitespace, a, e, i, o, u
\> - end of a word.
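For readers without grep at hand, the same idea can be sketched in Python. Python's re module has no POSIX classes such as [:digit:], so this version swaps in [^\W\d_aeiou] ("a letter that is not one of a, e, i, o, u"), which is an approximation rather than an exact translation. The function name is hypothetical.

```python
import re

# Non-vowel letters, exactly one vowel, non-vowel letters, all inside word boundaries.
ONE_VOWEL_WORD = re.compile(r"\b[^\W\d_aeiou]*[aeiou][^\W\d_aeiou]*\b")

def lines_with_one_vowel_word(lines):
    """Return the lines that contain at least one word with exactly one vowel."""
    # Lower-casing the line plays the role of grep's -i flag.
    return [line for line in lines if ONE_VOWEL_WORD.search(line.lower())]

print(lines_with_one_vowel_word(["sky fly", "tree house", "cat", "a1 b2", "Big tree"]))
```

As in the grep version, y is treated as a consonant, and a word such as a1 does not count because the digit prevents a word boundary after the vowel.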

Remove all numbers + symbols from line in Notepad++

Is it possible to remove every line in Notepad++ that contains anything other than
a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
, . '
Like that :
Remove Non-ascii
.*[^\x00-\x7F]+.*
Remove Numbers
.*[0-9]+.*
Text :
example
example'
example,
example.
example123
éxample è
[example/+
example'/é,
example,*
exa'mple--
example#
example"
You may use
^(?![a-zA-Z,.']+$).+$\R?
The regex matches any non-empty line (.+) that does not only consist of ASCII letters, ,, . or '. \R? at the end matches an optional line break.
Details:
^ - start of a line
(?![a-zA-Z,.']+$) - a negative lookahead that fails the match if its pattern is matched: [a-zA-Z,.']+ - 1 or more ASCII letters, commas, periods or single quotes up to the end of the line ($)
.+ - 1+ chars other than line break char
$ - end of a line
\R? - an optional line break char (sequence)
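The same lookahead can be tried outside Notepad++ with a short Python sketch. Two substitutions are assumptions on my part: Python's re module does not support \R, so \n? stands in for it, and re.MULTILINE is needed so that ^ and $ work per line. The function name and sample text are made up for illustration.

```python
import re

# \n? replaces \R?; MULTILINE makes ^ and $ match at line boundaries.
DROP_LINE = re.compile(r"^(?![a-zA-Z,.']+$).+$\n?", re.MULTILINE)

def keep_clean_lines(text):
    """Delete every non-empty line that is not made only of ASCII letters , . '"""
    return DROP_LINE.sub("", text)

sample = ("example\nexample'\nexample,\nexample.\n"
          "example123\néxample è\n[example/+\nexample,*\nexample#\n")
print(keep_clean_lines(sample))
```

Empty lines are left alone because .+ needs at least one character.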
You can remove them like this:
Find what: ^.*[^a-zA-Z.,'\r\n].*$
Replace with: (leave empty)
Explanation:
.* for any text
the negated character class [^...] for any unwanted character; \r\n is listed inside it so that the class cannot match a line break and run into the next line
then .* again for any remaining text
You need to wrap it in ^...$ to match the whole line
If you also want to delete the linefeed characters, you can use \r?\n instead of the $ sign, i.e.: ^.*[^a-zA-Z.,'\r\n].*\r?\n
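One thing worth checking with this approach is that a negated character class can also match a line break, so it may swallow a good line together with the bad line after it. The sketch below shows the effect in Python's re module; that Notepad++ behaves the same way is my assumption, so test it there before relying on it. The sample text is made up.

```python
import re

text = "good\nbad1\nfine\n"

# Without \r\n in the class, [^...] can match the newline after "good".
naive = re.sub(r"^.*[^a-zA-Z.,'].*\r?\n", "", text, flags=re.MULTILINE)

# With \r\n excluded, the match stays on a single line.
safe = re.sub(r"^.*[^a-zA-Z.,'\r\n].*\r?\n", "", text, flags=re.MULTILINE)

print(repr(naive))
print(repr(safe))
```

In the first result the valid line good is gone as well, while the second keeps it.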
Try replacing every match of this pattern:
^.+?[^a-zA-Z,.'\r\n]+(.|\r?\n)

Regex logic for a 3 digits code with specific 1 digit Alphabetic and Numbers

I need help on this to write logic for regex expression for the following conditions. The user keyed code should have
3 Bytes max
1st byte can have alpha (specifically A, B, P) or all 3 numbers
2nd & 3rd bytes must be numeric
No special characters allowed.
Examples,
A23 - match
B45 - match
P71 - match
A3 - match
418 - match
91 - match
C23 - not match
AC2 - not match
D3 - not match
I tried the expression, but no luck. The logic is
alphaNumericRegExp =/[A,B,P][0-9]{3}/
Matcher matcher = mask.matcher(service.getRacprCd1());
Matcher matcher1=digitPattern.matcher(service.getRacprCd1());
if (!matcher.matches()) {
vectErrMsgs.add("Pr code is not valid. " );
}
You may use
alphaNumericRegExp =/[ABP0-9]?[0-9]{1,2}/
With matcher.matches(), a full string match is required, so there is no need to add ^ and $ anchors. It matches:
[ABP0-9]? - an optional A, B, P, or digit
[0-9]{1,2} - 1 or 2 digits
Note that a , (or a |) inside a character class is matched literally, so [A,B,P] also matches a comma.
Split it into logical pieces. The first char can be A, B, P, a number, or (if I understand correctly) nothing. Therefore:
[ABP\d]?
Then there needs to be 1 or 2 digits.
\d{1,2}
So all together,
^[ABP\d]?\d{1,2}$
One gotcha, this allows a single digit. I can't tell from your question if that is allowed. If the code has to be at least 2 chars long, remove the ?
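The examples from the question can be run against both variants with a small Python sketch. The names CODE, STRICT and is_valid_code are hypothetical, and fullmatch plays the role of Java's matcher.matches(). [0-9] is used instead of \d because \d in Python 3 also matches non-ASCII digits.

```python
import re

CODE = re.compile(r"[ABP0-9]?[0-9]{1,2}")    # first char optional
STRICT = re.compile(r"[ABP0-9][0-9]{1,2}")   # first char required: 2 or 3 chars

def is_valid_code(code, pattern=CODE):
    """True if the whole string matches the given pattern."""
    return pattern.fullmatch(code) is not None

for code in ["A23", "B45", "P71", "A3", "418", "91", "C23", "AC2", "D3", "7"]:
    print(code, is_valid_code(code), is_valid_code(code, STRICT))

# The comma in [A,B,P] is a literal member of the class.
print(re.fullmatch(r"[A,B,P][0-9]{2}", ",23") is not None)
```

The only input on which the two variants differ here is the single digit 7.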

Contextual Regular Expression

I have a list of comma separated words that I want to remove the comma from and replace with a space:
elements-(a,b,c,d)
becomes:
elements-(a b c d)
The question is how can I do this using a regular expression if and only if that list is within a specific context, e.g. only when it is prefixed by elements-(.
The following:
There are a number of elements-(a,b,c,d) and a number of other elements-(e,f,g,h)
should become:
There are a number of elements-(a b c d) and a number of other elements-(e f g h)
What would be the correct way to do this with regex?
For contextual regular expressions, you can use zero-width look-around assertions. Look-around assertions are used to assert that something must be true in order for the match to succeed, but they do not consume any characters (hence "zero-width").
In your case, you want to use positive look-behind and look-ahead assertions. In C#, you can do the following:
static string Replace(string text)
{
    return Regex.Replace(
        text,
        @"(?<=elements\-\((\w+,)*)(\w+),(?=(\w+,)*\w+\))",
        "$2 "
    );
}
There are three basic parts to the pattern here (in order):
(?<=elements\-\((\w+,)*) - this is the positive look-behind assertion. It says that the pattern will only match if it is preceded by the text elements-( and zero-or-more comma-separated strings.
(\w+), - this is the actual match. It's the text that's being replaced.
(?=(\w+,)*\w+\)) - this is the positive look-ahead assertion. It says that the pattern will only match if it is followed by one-or-more comma-separated strings.
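Not every engine accepts a look-behind of variable length like the one above; Python's built-in re module, for example, only allows fixed-width look-behind. As a hedged sketch for such engines, the whole parenthesised list can be matched after a fixed-width look-behind and rewritten in a callback. The names LIST_IN_CONTEXT and replace_commas are made up for illustration.

```python
import re

# Fixed-width look-behind for the context, then the whole comma-separated list.
LIST_IN_CONTEXT = re.compile(r"(?<=elements-)\(((?:\w+,)+\w+)\)")

def replace_commas(text):
    """Replace commas with spaces, but only inside elements-(...) lists."""
    return LIST_IN_CONTEXT.sub(
        lambda m: "(" + m.group(1).replace(",", " ") + ")", text)

print(replace_commas("(x,y,z) elements-(a,b) (m,m,m) elements-(c,d,e,f,g,h)"))
```

Lists that are not preceded by elements- are left untouched, as is a list with a single item.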
In C#, for matching the inner comma-separated contents, you can alternatively do the following:
static string Replace(string text)
{
    return Regex.Replace(
        text,
        @"(?<=elements\-)\(((\w+,)+\w+)\)",
        m => string.Format("({0})", m.Groups[1].Value.Replace(',', ' '))
    );
}
The basic approach with the positive look-behind assertion is still the same.
Example output:
"(x,y,z) elements-(a,b) (m,m,m) elements-(c,d,e,f,g,h)"
...becomes...
"(x,y,z) elements-(a b) (m,m,m) elements-(c d e f g h)"