Find words that do not end with certain letters using a regular expression - regex

I am trying to find any word that ends with the letter 'k', where the 'k' must not come directly after any of the letters 'a', 'e', 'o'.
The regex should find these:
'stack'
'kick'
'kiik'
'kimk'
'gesk'
and should not find these:
'book'
'beak'
'aiok'
For this I use the following regular expression:
(?![aeo]+k)^.*?$
But it does not work.

^.*(?<![aeo])k$
You can use this, as all your words end with k. See the demo. The lookbehind filters out the words that have a, e or o just before the final k.
https://regex101.com/r/cD5jK1/3
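As a quick check, the lookbehind pattern can be run against the sample words from the question. This is only a minimal Python sketch; the helper name ends_with_allowed_k is made up for illustration.

```python
import re

# Pattern from the answer: the final k must not be preceded by a, e or o.
PATTERN = re.compile(r"^.*(?<![aeo])k$")

def ends_with_allowed_k(word: str) -> bool:
    """Return True if the word ends in k and that k does not follow a, e or o."""
    return PATTERN.search(word) is not None

# Sample words taken from the question.
should_match = ["stack", "kick", "kiik", "kimk", "gesk"]
should_not_match = ["book", "beak", "aiok"]

for word in should_match + should_not_match:
    print(word, ends_with_allowed_k(word))
```

Every word in the first list should print True and every word in the second list False.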

You can use this negation based regex:
^.*[^aeo]k$
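The negated character class and the lookbehind pattern from the earlier answer behave the same on the sample words. One difference worth knowing, shown in this small Python sketch, is that [^aeo] has to consume a character, so a lone "k" matches only the lookbehind version. The test words here are chosen for illustration.

```python
import re

lookbehind = re.compile(r"^.*(?<![aeo])k$")
negated_class = re.compile(r"^.*[^aeo]k$")

# "k" on its own has no preceding character at all.
for word in ["stack", "book", "k"]:
    print(word, bool(lookbehind.search(word)), bool(negated_class.search(word)))
```

Which behaviour is preferable depends on whether a single-letter "k" should count as a word.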

You may not have provided enough information, but I don't see why any sort of lookaround is warranted here. You should be able to simply use:
\b[A-Za-z]*[aeo]k\b
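Word boundaries ( \b ) will help you limit this pattern to whole words. If you need to account for hyphens, then you could adjust the first range to include a hyphen as well. Note that this pattern follows the literal wording of the question and matches words in which a, e or o comes right before the final k, such as 'book'; going by the listed examples, the class before the k would have to be negated.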

Related

Multiline PCRE, multiple conditions

Just starting out with regex and have hit a stumbling block. Hoping someone might be able to explain the workaround.
I am trying to carry out a multi-line search. I wish to use "*" as the 'flag', so to speak: if a line contains an asterisk, it should match. The digits at the start of the line should be output, and so should the word "Match" in the linked example, excluding the asterisk itself.
I assume my use of "|" is dividing the regex into two conditions, when it actually needs to satisfy both to match.
https://regex101.com/r/Pu56bi/2
(?m)(^\d+)|(?<=\*).*$
Any help kindly appreciated.
You could use a positive lookahead, as in:
^(?=.*?\*)(\d+).+?(Match)$
See your modified example on regex101.com.
If Match is always at the end of the string, you could match the digits at the start of the string, then match an * and Match at the end of the string.
Use a word boundary \b to prevent the word of digits being part of a longer word.
^(\d+)\b.*\*.*\b(Match)$
If there can be text after the word Match, you can assert the * using a positive lookahead.
^(?=.*\*)(\d+)\b.*\b(Match)\b.*$
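To see the lookahead version work line by line, here is a short Python sketch. The sample lines are invented for illustration (the input in the linked demo may differ), and re.MULTILINE stands in for the (?m) flag used in the question.

```python
import re

# Hypothetical sample input; the lines in the linked demo may differ.
text = "\n".join([
    "123 foo * Match",
    "456 bar Match",
    "789 * baz Match trailing",
])

# The lookahead requires an asterisk somewhere on the line; the two groups
# capture the leading digits and the word "Match".
pattern = re.compile(r"^(?=.*\*)(\d+)\b.*\b(Match)\b.*$", re.MULTILINE)

results = pattern.findall(text)
print(results)
```

Only the first and third lines contain an asterisk, so only their digits and the word Match are returned.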

Regex to match different characters at same position in string

Let's say I have the text a123456. I want a string of b123456 to match. So essentially, 'match if all characters are the same except for the first character'. Am I asking for the impossible with regex?
Use the dot (.) to match any character. So, a possible Regex would be:
/^.123456$/
If you want to use a zero-length assertion, you can take a lookbehind approach as follows:
(?<=\w)your_value$ // your_value is the text you want to check
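The two suggestions are not quite equivalent, which a small Python sketch makes visible. The candidate strings are made up for illustration; 123456 plays the role of your_value.

```python
import re

# Exactly one arbitrary character, then the fixed tail, anchored at both ends.
dot_pattern = re.compile(r"^.123456$")
# The fixed tail at the end of the string, preceded by at least one word character.
lookbehind_pattern = re.compile(r"(?<=\w)123456$")

for candidate in ["a123456", "b123456", "123456", "xb123456"]:
    print(
        candidate,
        bool(dot_pattern.search(candidate)),
        bool(lookbehind_pattern.search(candidate)),
    )
```

The dot version insists on exactly one leading character, while the lookbehind version is not anchored at the start and therefore also accepts longer prefixes such as "xb123456".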
I think you can figure it out on your own. This ain't tough, just needs some understanding between you and Regex. Why don't you go through the following links and try to make a regex on your own.
https://www.talentcookie.com/2015/07/regular-expressions/
https://www.talentcookie.com/2015/07/lets-practice-regular-expression/
https://www.talentcookie.com/2016/01/some-useful-regular-expression-terminologies/

How to exclude specific words using regex?

I have a problem here. I have the following string:
#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺
Then I use this regex pattern to select all the words:
\w+
However, I need to select everything except the words prefixed with #, like #Novriiiiii or #nencor; in other words, the #word ones have to be excluded.
How do I do that?
PS: I am using regexpal to test the regex, and I want to apply the pattern in Yahoo Pipes regex. Thank you.
You can use a negative lookbehind so that if a word is preceded by # it is excluded. You also need a word boundary before the word or else the lookbehind will only affect the first character.
(?<!#)\b\w+
http://rubular.com/r/ONEl70Am5Q
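A short Python sketch of the lookbehind approach, applied to the string from the question. Python's re module is used here only for illustration; the asker's target tools may behave differently.

```python
import re

text = "#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺"

# \b pins the match to the start of a word; the lookbehind then rejects
# words whose first character directly follows a '#'.
words = re.findall(r"(?<!#)\b\w+", text)
print(words)
```

Note that the apostrophe splits wa'alaikumsalam into two matches, because \w does not include it.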
Does this suit your needs?
http://rubular.com/r/uuXvNrUiGJ
[^#\w+]\w+
This would solve your problem indeed:
[^#\w+][\w.]+
Check this link: http://regexr.com?34tq7
If you cannot use a negative lookbehind as other answers have already suggested, here's a workaround.
\w already doesn't match the # character, but the character before the word must also not be a word character, or the match could simply start one letter into a #word. So you'd want something like this:
[^#\w]\w+
But this will (a) not work at the beginning of the string, and (b) include the character before the word in the match. To fix (a), we can do:
(^|[^#\w])\w+
To fix (b), we parenthesize the part we want:
(^|[^#\w])(\w+)
Then use $2 or \2 (depending on regex dialect) to refer to the matched word.
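Here is a sketch of the capture-group workaround in Python, reading the word from group 2. It uses [^#\w] for the preceding character; the second pattern, with plain [^#], is included only to show how a match can otherwise begin one letter into a #word.

```python
import re

text = "#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺"

# Group 1: start of string, or one character that is neither '#' nor a word
# character. Group 2: the word itself.
strict = re.compile(r"(^|[^#\w])(\w+)")
words = [m.group(2) for m in strict.finditer(text)]
print(words)

# With plain [^#], the first letter of a #word can serve as the "preceding
# character", so the rest of the #word is captured.
loose = re.compile(r"(^|[^#])(\w+)")
loose_words = [m.group(2) for m in loose.finditer(text)]
print(loose_words)
```

The first list contains only the words without a # prefix; the second also contains fragments such as "ovriiiiii" and "encor".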
Another option is to include the # symbol in the word:
[\w#]+
And then add another step in your Pipe to filter out all words that start with an #.
One way to do that is to remove the words that you don't want. Example:
find: #\w+
replace: empty string
You obtain the text without the #abcdef words.
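The find-and-replace idea can be sketched in Python with re.sub, followed by the original \w+ search on the cleaned text. This only illustrates the two steps; it is not Yahoo Pipes syntax.

```python
import re

text = "#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺"

# Step 1: remove every #word (find: #\w+, replace: empty string).
without_tags = re.sub(r"#\w+", "", text)

# Step 2: the plain \w+ pattern from the question now finds only untagged words.
words = re.findall(r"\w+", without_tags)
print(words)
```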

Perl regular expression for English word

I need a regular expression that will find anything that looks like an English word. In particular, I want the expression to match when a string has:
1) only letters; and
2) at least two different letters. (I am purposely excluding one-letter words.)
So I'm looking for something that would match the and abracadabra but not aaa.
Any help is much appreciated.
Perhaps \b(\w*(\w)\w*(?!\2)\w+)\b works for you. It handles the examples you give.
It matches a letter \w in a group, then looks for something other than that letter using a backreference and a negative lookahead (?!\2). We match at least one character at the end, which is necessary to make the negative lookahead force at least one distinct character. Then we place additional \w*'s around to allow additional letters. \b ensures the ends of the matches are at word boundaries.
http://www.rubular.com/r/pwjGi9eLf5
Please note that this is no super duper regular expression that matches English-only words. For that, you want to compare against a dictionary. But that doesn't seem to be what you're looking to do here.
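The pattern can be tried out quickly in Python, which also supports backreferences inside lookaheads. Since \w also matches digits and underscores, the sketch adds a letters-only variant; that variant is an assumption about what "only letters" should mean and is not part of the answer above.

```python
import re

# Pattern as posted in the answer.
as_posted = re.compile(r"\b(\w*(\w)\w*(?!\2)\w+)\b")

# Assumed letters-only variant: \w replaced by [A-Za-z] throughout.
letters_only = re.compile(r"\b([A-Za-z]*([A-Za-z])[A-Za-z]*(?!\2)[A-Za-z]+)\b")

for s in ["the", "abracadabra", "aaa", "a1"]:
    print(s, bool(as_posted.search(s)), bool(letters_only.search(s)))
```

Both patterns accept "the" and "abracadabra" and reject "aaa"; they differ on "a1", which only the posted pattern accepts.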
Check out Lingua::EN::Splitter:
use strict; use warnings;
use Lingua::EN::Splitter qw(words);
my @words = words $input_text;
print @words;

Regex negation - word parsing

I am trying to parse a phrase and exclude common words.
For instance in the phrase "as the world turns", I want to exclude the common words "as" and "the" and return only "world" and "turns".
(\w+(?!the|as))
Doesn't work. Feedback appreciated.
The lookahead should come first:
(\b(?!(the|as)\b)\w+\b)
I have also added word boundaries to ensure that it only matches whole words; otherwise it would fail to match the complete word "as" but would still match the letter "s" of that word.
You might also want to consider what \w matches and if that meets your needs. If you are looking for words in English you probably are interested in letters but not digits and you may wish to include some punctuation characters that are excluded by \w, such as apostrophes. You could try something like this instead (Rubular):
/(\b(?!(?:the|as)\b)[a-z'-]+\b)/i
To match words more accurately in a human language you could consider using a natural language parsing library instead of regular expressions.
You should use word boundaries to only match whole words. Either with a look-ahead assertion:
(\b(?!(?:the|as)\b)\w+\b)
Or with a look-behind assertion:
(\b\w+\b(?<!\b(?:the|as)))
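A brief Python sketch of the look-ahead form, using the phrase from the question plus a second, made-up phrase to show that longer words beginning with a stop word survive. Python's re module rejects the look-behind form because its alternatives differ in length, so the sketch only checks that it fails to compile; other regex engines may accept it.

```python
import re

# Look-ahead form: skip a word only if it is exactly "the" or "as".
pattern = re.compile(r"(\b(?!(?:the|as)\b)\w+\b)")

kept = pattern.findall("as the world turns")
print(kept)

# Words that merely start with a stop word are kept.
kept_longer = pattern.findall("ask the astronomer")
print(kept_longer)

# The look-behind form needs variable-width look-behind, which Python's
# re module does not support.
try:
    re.compile(r"(\b\w+\b(?<!\b(?:the|as)))")
    lookbehind_compiles = True
except re.error:
    lookbehind_compiles = False
print(lookbehind_compiles)
```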