command line grep finding words with exactly one vowel - regex

How do you list all the lines that contain words with exactly one vowel?
I have tried:
egrep -i '\<.*[aeiou]{1}.*\>' f3.txt
but I'm stuck and can't figure it out.

You may use
grep -i '\<[^[:digit:][:punct:][:space:]aeiou]*[aeiou][^[:digit:][:punct:][:space:]aeiou]*\>' f3.txt
Details
\< - start of a word
[^[:digit:][:punct:][:space:]aeiou]* - 0 or more chars other than digits, punctuation, whitespace, a, e, i, o, u
[aeiou] - 1 occurrence of a, e, i, o or u
[^[:digit:][:punct:][:space:]aeiou]* - 0 or more chars other than digits, punctuation, whitespace, a, e, i, o, u
\> - end of a word.
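Below is a rough C++ translation of the grep answer using std::regex. The function names has_one_vowel_word and grep_one_vowel are made up for illustration. This is a sketch under two assumptions. First, ECMAScript regexes in C++ have no \< and \>, so \b stands in for both. Second, std::regex::icase plays the role of grep -i. It should behave like the grep command on plain ASCII text, but word boundaries around non-ASCII letters may be handled differently.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch: \b replaces grep's \< and \> (assumption), icase replaces grep -i.
bool has_one_vowel_word(const std::string& line) {
    static const std::regex one_vowel_word(
        R"(\b[^[:digit:][:punct:][:space:]aeiou]*[aeiou][^[:digit:][:punct:][:space:]aeiou]*\b)",
        std::regex::ECMAScript | std::regex::icase);
    // regex_search, like grep, only needs the pattern to occur somewhere in the line
    return std::regex_search(line, one_vowel_word);
}

// Rough equivalent of `grep -i PATTERN file`: keep the lines that contain such a word.
std::vector<std::string> grep_one_vowel(const std::vector<std::string>& lines) {
    std::vector<std::string> out;
    for (const auto& line : lines)
        if (has_one_vowel_word(line)) out.push_back(line);
    return out;
}
```

For example, grep_one_vowel({"hello queue", "big idea", "rhythm"}) should keep only "big idea".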

Related

Regex capture required and optional characters in any position only

I would like to match words that consist only of a given set of characters, in any order, where one of those letters is required.
Example:
Optional letters: yujkfec
Required letter: d
Matches: duck dey feed yudekk dude jude dedededy jejeyyyjd
No matches (do not contain required): yuck feck
No matches (contain letters outside of set): sucked shock blah food bard
I've tried ^[d]+[yujkfec]*$, but this only matches when the required letter is at the front. I've also tried positive lookaheads, but they didn't help much.
You can use
\b[yujkfec]*d[dyujkfec]*\b
Note that d is also included in the second character class, so the word may contain more d chars after the first one.
Details:
\b - word boundary
[yujkfec]* - zero or more occurrences of y, u, j, k, f, e or c
d - a d char
[dyujkfec]* - zero or more occurrences of y, u, j, k, f, e, c or d.
\b - a word boundary.
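Here is a small C++ sketch that runs the answer's pattern over a text. The helper name words_with_required_d is hypothetical. It uses std::sregex_iterator, which is one way to collect every non-overlapping match:

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every word built only from y, u, j, k, f, e, c, d that contains at least one d.
std::vector<std::string> words_with_required_d(const std::string& text) {
    static const std::regex word(R"(\b[yujkfec]*d[dyujkfec]*\b)");
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), word), end; it != end; ++it)
        found.push_back(it->str());  // the whole match is one accepted word
    return found;
}
```

On the question's sample words, it should return duck, dey, feed, yudekk, dude, jude, dedededy and jejeyyyjd, and skip all the others.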

Regex that does not accept substrings of more than one 'b' in a row

I need a regex that accepts all strings consisting only of the characters a and b, except those with two or more 'b's in a row.
For example, these should not match:
abb
ababbb
bba
bbbaa
bbb
bb
I came up with this, but it's not working:
[a-b]+b{2,}[a-b]*
Here is my code:
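#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
    string input;
    // Raw string literal: in a plain "..." literal, \b is a backspace character, not a word boundary
    regex validator_regex(R"(\b(?:b(?:a+b?)*|(?:a+b?)+)\b)");
    cout << "Hello, " << endl;
    while (regex_match(input, validator_regex) == false) {
        cout << "please enter your choice of regEx :" << endl;
        // Stop on end of input instead of looping forever
        if (!(cin >> input))
            return 1;
        if (regex_match(input, validator_regex) == false)
            cout << input + " is not a valid input" << endl;
        else
            cout << input + " is valid " << endl;
    }
}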
Your pattern [a-b]+b{2,}[a-b]* matches 1 or more a or b chars followed by 2 or more b chars, which is exactly what you don't want. Also note that, because of the [a-b]+b{2,} part, it can only match strings that are at least 3 characters long.
To avoid matching 2 b chars in a row, you can exclude such strings with a negative lookahead that matches optional a or b chars until it encounters bb.
Note that [a-b] is the same as [ab]
\b(?![ab]*?bb)[ab]+\b
\b A word boundary
(?![ab]*?bb) Negative lookahead, assert that 0+ a or b chars followed by bb do not occur to the right
[ab]+ Match 1+ occurrences of a or b
\b A word boundary
Without using lookarounds, you can match the strings that you don't want (those containing bb) and capture the strings that you want to keep in group 1:
\b[ab]*bb[ab]*\b|\b([ab]+)\b
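In C++, the group-1 technique could look like the following sketch. The name words_without_bb is made up. The idea is to iterate over all matches and keep a word only when group 1 took part in the match:

```cpp
#include <regex>
#include <string>
#include <vector>

// Words containing bb are consumed by the first alternative, which has no group;
// only the second alternative captures, so group 1 holds exactly the words to keep.
std::vector<std::string> words_without_bb(const std::string& text) {
    static const std::regex re(R"(\b[ab]*bb[ab]*\b|\b([ab]+)\b)");
    std::vector<std::string> kept;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        if ((*it)[1].matched) kept.push_back((*it)[1].str());
    return kept;
}
```

For the input "ab abb ababbb bba bbbaa bbb bb a b abab", it keeps ab, a, b and abab.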
Or use an alternation that matches either b followed by zero or more groups of 1+ a chars plus an optional b, or one or more such groups:
\b(?:b(?:a+b?)*|(?:a+b?)+)\b
The simplest regex is:
^(?!.*bb)[ab]+$
This regex works by adding a negative lookahead (anchored to the start) that rejects bb anywhere in the input, and then requires the input to consist only of a and b chars.
If zero length input should match, change [ab]+ to [ab]*.
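A brute-force comparison against a plain string rule is a cheap way to check that the lookahead, alternation and anchored patterns accept the same strings on short inputs. The helper names in this sketch are hypothetical. Because std::regex_match requires the whole string to match, the \b and ^...$ anchors are redundant there, but they do no harm.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Reference rule: non-empty, only 'a'/'b', and no "bb" anywhere.
bool reference_no_bb(const std::string& s) {
    return !s.empty() && s.find_first_not_of("ab") == std::string::npos &&
           s.find("bb") == std::string::npos;
}

// Checks the three whole-string patterns from the answers against the reference rule
// for every string over {a, b} of length 0..max_len.
bool all_patterns_agree(std::size_t max_len) {
    static const std::regex patterns[] = {
        std::regex(R"(\b(?![ab]*?bb)[ab]+\b)"),
        std::regex(R"(\b(?:b(?:a+b?)*|(?:a+b?)+)\b)"),
        std::regex(R"(^(?!.*bb)[ab]+$)"),
    };
    for (std::size_t len = 0; len <= max_len; ++len) {
        for (std::size_t mask = 0; mask < (std::size_t{1} << len); ++mask) {
            // Bit i of mask decides whether character i is 'a' or 'b'
            std::string s;
            for (std::size_t i = 0; i < len; ++i) s += ((mask >> i) & 1) ? 'b' : 'a';
            for (const auto& re : patterns)
                if (std::regex_match(s, re) != reference_no_bb(s)) return false;
        }
    }
    return true;
}
```

An exhaustive check only covers lengths up to max_len, so it is evidence rather than a proof. For lengths up to 10 it amounts to 2047 strings.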

Regex 'OR' seems to not behave as expected

Hello, I am trying to build a regex for a string with the following constraints:
should only contain 'X', 'O', 'T', '_', ';'
'T' and 'O' must each occur exactly once and can be anywhere in the string
'X', '_', ';' may occur zero to n times
Here are a few valid examples:
"X__;O_T;___"
"T__;_XX_;_XO"
"T__;OX_;_X_"
"OT"
This is the regex I have right now:
/^([X;_]*T[X;_]*O)|([X;_]*O[X;_]*T);$ */i
The above accepts the following input as valid:
T__;_X__OO; //which is not valid
Thanks for your time.
If you can use a lookahead you may use
^(?=[^O]*O[^O]*$)(?=[^T]*T[^T]*$)[TOX;_]*$
Details
^ - start of string
(?=[^O]*O[^O]*$) - there must be any 0+ chars other than O, then O, and then any 0+ chars other than O up to the end of the string
(?=[^T]*T[^T]*$) - there must be any 0+ chars other than T, then T, and then any 0+ chars other than T up to the end of the string
[TOX;_]* - 0+ T, O, X, ;, _ chars
$ - end of string.
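The original pattern fails because of operator precedence. | has the lowest precedence, so ^ only anchors the first alternative and $ only anchors the second. The first alternative, ^[X;_]*T[X;_]*O, therefore matches the prefix T__;_X__O, and the rest of the input is never checked. The C++ sketch below demonstrates this. It assumes that the question's /.../i literal behaves like an unanchored, case-insensitive search, modeled here with std::regex_search and std::regex::icase. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// The lookahead pattern from the answer, applied to the whole string.
bool is_valid_lookahead(const std::string& s) {
    static const std::regex re(R"(^(?=[^O]*O[^O]*$)(?=[^T]*T[^T]*$)[TOX;_]*$)");
    return std::regex_match(s, re);
}

// The question's original pattern. Assumption: /.../i means an unanchored,
// case-insensitive search, so regex_search is used instead of regex_match.
bool original_pattern_finds_match(const std::string& s) {
    static const std::regex re(R"(^([X;_]*T[X;_]*O)|([X;_]*O[X;_]*T);$ *)",
                               std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(s, re);
}
```

is_valid_lookahead accepts all four valid examples and rejects T__;_X__OO;, while original_pattern_finds_match reports a match for that same string.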
A non-lookaround approach based on alternation is also possible:
^[X;_]*(?:T[X;_]*O|O[X;_]*T)[X;_]*$
Details
^ - string start
[X;_]* - 0+ X, ;, _ chars
(?:T[X;_]*O|O[X;_]*T) - either of the two alternatives:
T[X;_]*O - T, any 0+ X, ;, _ chars, O
| - or
O[X;_]*T - O, any 0+ X, ;, _ chars, T
[X;_]* - 0+ X, ;, _ chars
$ - string end.
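Here is a brute-force sketch that compares the lookahead pattern with the alternation pattern. The function name is hypothetical. It tests every string over T, O, X, ; and _ up to a given length. If both patterns encode the same constraints, they should never disagree:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns true if both patterns accept exactly the same strings among all strings
// of length 0..max_len over the alphabet {T, O, X, ;, _}.
bool patterns_agree(std::size_t max_len) {
    static const std::regex lookahead(R"(^(?=[^O]*O[^O]*$)(?=[^T]*T[^T]*$)[TOX;_]*$)");
    static const std::regex alternation(R"(^[X;_]*(?:T[X;_]*O|O[X;_]*T)[X;_]*$)");
    const std::string alphabet = "TOX;_";
    for (std::size_t len = 0; len <= max_len; ++len) {
        std::vector<std::size_t> digits(len, 0);  // base-5 counter, one digit per character
        while (true) {
            std::string s;
            for (std::size_t d : digits) s += alphabet[d];
            if (std::regex_match(s, lookahead) != std::regex_match(s, alternation)) return false;
            // Advance the counter; stop once every combination of this length was tried
            std::size_t i = 0;
            while (i < len && ++digits[i] == alphabet.size()) digits[i++] = 0;
            if (i == len) break;
        }
    }
    return true;
}
```

With max_len = 5 this covers 3906 strings, including the question's four-character and shorter cases.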

Remove all numbers + symbols from line in Notepad++

Is it possible in Notepad++ to remove every line that contains any character other than these:
a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
, . '
Similar to these:
Remove non-ASCII:
.*[^\x00-\x7F]+.*
Remove numbers:
.*[0-9]+.*
Sample text:
example
example'
example,
example.
example123
éxample è
[example/+
example'/é,
example,*
exa'mple--
example#
example"
You may use
^(?![a-zA-Z,.']+$).+$\R?
The regex matches any non-empty line (.+) that does not consist only of ASCII letters, commas, periods or single quotes. \R? at the end matches an optional line break.
Details:
^ - start of a line
(?![a-zA-Z,.']+$) - a negative lookahead that fails the match if its pattern matches: [a-zA-Z,.']+ - 1 or more ASCII letters, commas, periods or single quotes up to the end of the line ($)
.+ - 1+ chars other than line break char
$ - end of a line
\R? - an optional line break char (sequence)
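std::regex has no \R and no multiline mode, so the following C++ sketch only emulates the Notepad++ replacement. It splits the text into lines and drops any trailing \r. It then removes every line that the lookahead pattern matches and keeps the rest, including empty lines, which .+ never matches. The name keep_allowed_lines is hypothetical.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Emulates Replace All of ^(?![a-zA-Z,.']+$).+$\R? with nothing, one line at a time.
std::vector<std::string> keep_allowed_lines(const std::string& text) {
    static const std::regex unwanted(R"((?![a-zA-Z,.']+$).+)");
    std::vector<std::string> kept;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF input
        // regex_match anchors the pattern to the whole line, standing in for ^ ... $
        if (!std::regex_match(line, unwanted)) kept.push_back(line);
    }
    return kept;
}
```

On the question's sample text, only the first four lines ("example", "example'", "example," and "example.") should survive.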
You can remove them like this:
Find what: ^.*[^a-zA-Z.,'].*$
Replace with: ``
Explanation:
.* for any text
the negated character class [^...] for any unwanted character
then .* again for any further text
Wrap it in ^...$ to match the whole line.
If you also want to delete the line break characters, use \r?\n instead of the $ sign, i.e.: ^.*[^a-zA-Z.,'].*\r?\n
Try replacing all matches of this pattern with nothing:
^.+?[^a-zA-Z,.'\r\n]+(.|\r?\n)

Regex for replacing multiple spaces and dashes with or without spaces

I can do this with two separate regex passes, but this is already slow and doing two doesn't help, so I want to be able to do it in one pass.
I want to:
replace multiple spaces with one space
replace a dash (hyphen) with a space
However, if the dash has a space on either side of it, then the dash and any spaces on either side should be replaced with just one space.
As an example:
a - b c-d e -f g- h i - j k - l m - n
must end up like
a b c d e f g h i j k l m n
I have tried things like this:
\s+| - | -|- |-
but that doesn't work:
a b c d e f g h i j k l m n
Use the following regex to match runs of spaces and/or dashes:
[\s-]+
Replace with a single space.
Use [\s-]+ with the global 'g' modifier and replace each match with a single space.
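In C++, std::regex_replace replaces every match by default, which corresponds to the global g modifier. Here is a minimal sketch; the function name is made up:

```cpp
#include <regex>
#include <string>

// Replaces every run of whitespace and/or dashes with a single space, in one pass.
std::string collapse_spaces_and_dashes(const std::string& s) {
    static const std::regex runs(R"([\s-]+)");
    return std::regex_replace(s, runs, " ");
}
```

Applied to the question's example line, it should produce "a b c d e f g h i j k l m n".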
Regex:
(?:\s*-\s*)+|\s{2,}
Replacement string:
<space>
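Here is the same idea for this pattern, again as a sketch with a hypothetical function name. There is one behavioral difference from [\s-]+. Because \s{2,} needs at least two whitespace chars, a single tab between words is left unchanged here, whereas [\s-]+ would turn it into a space.

```cpp
#include <regex>
#include <string>

// A dash with optional surrounding whitespace (possibly repeated), or a run of
// two or more whitespace chars, becomes one space; single whitespace chars are kept.
std::string collapse_dash_or_multi_space(const std::string& s) {
    static const std::regex re(R"((?:\s*-\s*)+|\s{2,})");
    return std::regex_replace(s, re, " ");
}
```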