How to exclude specific words using regex? - regex

I have a problem: I have the following string
#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺
and I use this regex pattern to select all the words in it:
\w+
However, I need to select all the words except those prefixed with #, like #Novriiiiii or #nencor. In other words, the #word ones have to be excluded.
How do I do that?
PS: I am using RegexPal to test the regex, and I want to apply the pattern in Yahoo Pipes regex. Thank you.

You can use a negative lookbehind so that if a word is preceded by # it is excluded. You also need a word boundary before the word or else the lookbehind will only affect the first character.
(?<!#)\b\w+
http://rubular.com/r/ONEl70Am5Q
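For anyone who wants to try this outside an online tester, here is a minimal Python sketch of the lookbehind pattern applied to the string from the question. Using Python's re module is an assumption (the question targets Yahoo Pipes), and the variable names are made up for illustration.

```python
import re

text = "#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺"

# (?<!#) rejects a word that directly follows '#';
# \b forces the match to begin at the start of a word.
words = re.findall(r"(?<!#)\b\w+", text)
print(words)
# ['yauda', 'busana', 'muslim', 'haha', 'wa', 'alaikumsalam', 'noperi']
```

Without the \b, the engine would skip only the first letter after # and still return ovriiiiii and encor.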

Does this suit your needs?
http://rubular.com/r/uuXvNrUiGJ
[^#\w+]\w+

This would indeed solve your problem:
[^#\w+][\w.]+
Check this link: http://regexr.com?34tq7

If you cannot use a negative lookbehind as other answers have already suggested, here's a workaround.
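\w already doesn't match the # character, so you only need to check the character before the word. That character must be neither # nor a word character (otherwise the match could simply start one letter into a #word), so you'd want something like this:
[^#\w]\w+
But this will (a) not work at the beginning of the string, and (b) include the character before the word in the match. To fix (a), we can do:
(^|[^#\w])\w+
To fix (b), we parenthesize the part we want:
(^|[^#\w])(\w+)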
Then use $2 or \2 (depending on regex dialect) to refer to the matched word.
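A small Python sketch of this capture-group workaround (Python is used only for illustration, and the names are hypothetical). One assumption to flag: the class in front of the word is written as [^#\w] here, so that the preceding character can be neither # nor a letter of the same word. With a plain [^#], the N of #Novriiiiii could act as the preceding character and ovriiiiii would be captured.

```python
import re

text = "#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺"

# Group 1: start of string, or one character that is neither '#' nor a word character.
# Group 2: the word itself (what $2 or \2 would refer to).
pattern = re.compile(r"(^|[^#\w])(\w+)")

captured = [m.group(2) for m in pattern.finditer(text)]
print(captured)
# ['yauda', 'busana', 'muslim', 'haha', 'wa', 'alaikumsalam', 'noperi']
```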

Another option is to include the # symbol in the word:
[\w#]+
Then add another step in your Pipe to filter out all words that start with a #.
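Sketched in Python rather than as a Pipe (an assumption, since the filter step would really be a separate Pipes module), the two steps could look roughly like this:

```python
import re

text = "#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺"

# Step 1: let '#' count as part of a word, so '#nencor' stays one token.
tokens = re.findall(r"[\w#]+", text)

# Step 2: drop every token that starts with '#'.
kept = [t for t in tokens if not t.startswith("#")]
print(kept)
# ['yauda', 'busana', 'muslim', 'haha', 'wa', 'alaikumsalam', 'noperi']
```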

Another way is to remove the words you don't want. Example:
find: #\w+
replace: empty string
You obtain the text without the #abcdef words.
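The find/replace idea translates directly to a substitution call. A minimal Python sketch, assuming re.sub stands in for whatever replace step your tool offers:

```python
import re

text = "#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺"

# find: #\w+   replace: empty string
stripped = re.sub(r"#\w+", "", text)

# The plain \w+ from the question now only sees the remaining words.
remaining = re.findall(r"\w+", stripped)
print(remaining)
# ['yauda', 'busana', 'muslim', 'haha', 'wa', 'alaikumsalam', 'noperi']
```

The removed words leave their surrounding spaces behind, which does not matter if \w+ is applied afterwards.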

Related

Regex to find if all the characters in a word are the same specific character

I have a set of words coming in one by one, like aa, ##, ???, ~~~, ?~ etc.
I need a regex to find whether any of these words contains only ? or only ~.
Of the above input examples, ??? and ~~~ should match but not the others.
I tried ^[\s?]*$ and ^[\s~]*$ separately and they work; now I am trying to combine them.
^[\s?||~]*$ doesn't work, as it also recognizes ?~ as valid.
Any help?
You can use this regex, which looks for a string starting with a ~ or a ?, and then asserts that every other character in the string is the same as the first one using a backreference (\1):
^([~?])\1+$
Demo on regex101
You need to use a backreference to achieve your desired result.
If you want only ~ or ?, use
^([~?])\1+$
If you want any repeated character, use
^(.)\1+$
Explanation: (.) or ([~?]) captures the first character.
Then \1+ checks for that same character, one or more times (backreferencing).
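As a quick check of the backreference pattern against the sample words from the question, here is a small Python sketch. It uses re.fullmatch in place of the ^ and $ anchors, which is a choice made for this example only.

```python
import re

# ([~?]) captures the first character, \1+ demands one or more copies of it.
same_char = re.compile(r"([~?])\1+")

words = ["aa", "##", "???", "~~~", "?~"]
results = {w: bool(same_char.fullmatch(w)) for w in words}
print(results)
# {'aa': False, '##': False, '???': True, '~~~': True, '?~': False}
```

Because \1+ needs at least one repeat, a single ? or ~ does not match; \1* would accept those as well.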
You want to match lines that consist entirely of any number of tildes or question marks. That would be ^\(~\|?\)*$. The parentheses that make a group and the vertical bar that does the 'or' need to be backslash-escaped. Note that this also accepts mixed strings such as ?~.

Match Latin words which are not in brackets

I'm trying to filter words which are not inside "[ ]".
Why is this not working?
[^\[][\u0000-\u024F]+[^\]]
The reason your expression is not working is that it matches all text inside brackets as well as outside.
This is the best I've been able to do:
/(?:^|])[^[]+/g
It includes the ]s in the match because look-behind is not allowed:
http://regexr.com/3c515
If look-behind were allowed, this would be the ticket:
/(?:^|(?<=]))[^[]+/g
https://regex101.com/r/lK9tS7/3
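In an engine that does allow look-behind, the second pattern can be tried directly. Below is a minimal Python sketch; the sample string is hypothetical (the question does not give one), and the brackets inside the pattern are escaped to keep Python's parser unambiguous.

```python
import re

sample = "abc [def] ghi [jkl] mno"  # hypothetical input, not from the question

# Start at the beginning of the string or right after a ']',
# then take everything up to the next '['.
outside = re.findall(r"(?:^|(?<=\]))[^\[]+", sample)
print(outside)
# ['abc ', ' ghi ', ' mno']

# Trim the surrounding spaces if only the words are wanted.
print([part.strip() for part in outside])
# ['abc', 'ghi', 'mno']
```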
Because this will match [\u0000-\u024F]+ plus two more characters, one matched by [^\[] and one by [^\]]. If you want your regex engine to match the whole pattern against the whole input, you need to use start and end anchors in your regex:
/^[^\[][\u0000-\u024F]+[^\]]$/m
But this only works if your string contains one word per line, which is not a proper solution.
A better way is to use negative lookarounds:
(?<!\[)[\u0000-\u024F]+(?!\])

Find words that do not end with a given letter sequence using regexp

I am trying to find any word which ends with the letter 'k', where the 'k' must not come directly after one of the letters 'a', 'e', 'o'.
Regex should find this:
'stack'
'kick'
'kiik'
'kimk'
'gesk'
and should not find these:
'book'
'beak'
'aiok'
For this I use this regular expression:
(?![aeo]+k)^.*?$
But it does not work.
^.*(?<![aeo])k$
You can use this, as all your words end with k. See the demo. The lookbehind filters out the words that have a, e or o just before the last k.
https://regex101.com/r/cD5jK1/3
You can use this negation based regex:
^.*[^aeo]k$
RegEx Demo
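Both patterns above (the lookbehind and the negated class) can be checked against the word lists from the question. A minimal Python sketch, with each word tested on its own as the anchors ^ and $ suggest:

```python
import re

should_match = ["stack", "kick", "kiik", "kimk", "gesk"]
should_not_match = ["book", "beak", "aiok"]

lookbehind = re.compile(r"^.*(?<![aeo])k$")  # the k is not preceded by a, e or o
negated = re.compile(r"^.*[^aeo]k$")         # some character other than a, e, o, then k

for name, pattern in (("lookbehind", lookbehind), ("negated", negated)):
    hits = [w for w in should_match + should_not_match if pattern.search(w)]
    print(name, hits)
# lookbehind ['stack', 'kick', 'kiik', 'kimk', 'gesk']
# negated ['stack', 'kick', 'kiik', 'kimk', 'gesk']
```

The two differ on the one-letter word k: the lookbehind version accepts it, while the negated class needs a character in front of the k.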
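You may not have provided enough information, but I don't see why any sort of lookaround is warranted here. Read literally (a k that follows a, e or o), you should be able to simply use:
\b[A-Za-z]*[aeo]k\b
Word boundaries ( \b ) will help you limit this pattern to only words. For the opposite reading, which is what the example lists show, swap [aeo] for a class that leaves those letters out, e.g. [b-df-np-z]. If you need to account for hyphens, then you could adjust the first range to include a hyphen as well.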

Regex for deleting characters before a certain character?

I'm very new at regex, and to be completely honest it confounds me. I need to grab the string after a certain character is reached in said string. I figured the easiest way to do this would be using regex, however like I said I'm very new to it. Can anyone help me with this or point me in the right direction?
For instance:
I need to check the string "23444:thisstring" and save "thisstring" to a new string.
If this is your string:
I'm very new at regex, and to be completely honest it confounds me
and you want to grab everything after the first "c", then this regular expression will work:
/c(.*)/s
It will return this match in the first matched group:
"ompletely honest it confounds me"
Try it at the regex tester here: regex tester
Explanation:
The c is the character you are looking for
.* (in combination with /s) matches everything that is left, including newlines
(.*) captures what .* matched, making it available in $1 and returned in list context.
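The same capture-group idea, sketched in Python as an illustration (the answer above uses Perl syntax; re.DOTALL plays the role of /s here):

```python
import re

sentence = "I'm very new at regex, and to be completely honest it confounds me"

# Everything after the first 'c' ends up in group 1.
after_c = re.search(r"c(.*)", sentence, re.DOTALL).group(1)
print(after_c)  # ompletely honest it confounds me

# Applied to the string from the question, with ':' as the character.
after_colon = re.search(r":(.*)", "23444:thisstring", re.DOTALL).group(1)
print(after_colon)  # thisstring
```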
Regex for deleting characters before a certain character!
You can use lookahead like this
.*(?=x)
where x is a particular character, word or string. (Characters like ., $, ^, * and + have special meaning in regex, so don't forget to escape them when using them within x.)
EDIT
For your sample string it would be
.*(?=thisstring)
.* matches zero or more characters up to thisstring
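To see what the lookahead version matches and how it can be used for deleting, here is a short Python sketch. The second input string and the use of re.escape for the escaping step are assumptions added for illustration.

```python
import re

s = "23444:thisstring"

# The match is everything in front of 'thisstring'; the lookahead consumes nothing.
prefix = re.search(r".*(?=thisstring)", s).group(0)
print(prefix)  # 23444:

# Deleting that match leaves the part we wanted to keep.
rest = re.sub(r".*(?=thisstring)", "", s, count=1)
print(rest)  # thisstring

# If x contains special characters, re.escape takes care of the escaping.
x = "1+1"  # hypothetical search string
before_x = re.search(".*(?=" + re.escape(x) + ")", "result: 1+1=2").group(0)
print(repr(before_x))  # 'result: '
```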
Here is a one-line solution for matching everything after "before"
print $1."\n" if "beforeafter" =~ m/before(.*)/;
Edit:
While using lookbehind is possible, it's not required. Grouping provides an easier solution.
To get the string after the : in your example, you have to use [^:][^:]*:\(.*\). Notice that you need at least one [^:], followed by any number of [^:]s, followed by an actual :, the character you are searching for. The group \(.*\) then holds everything after it.

Antimatch with Regex

I am looking for a regex pattern which should match everything except a certain group.
The following regex pattern basically works:
index\.php\?page=(?:.*)&tagID=([0-9]+)$
But the .* should not match TaggedObjects.
Thanks for any advice.
(?:.*) is unnecessary - you're not grouping anything, so .* means exactly the same. But that's not the answer to your question.
To match any string that does not contain another predefined string (say TaggedObjects), use
(?:(?!TaggedObjects).)*
In your example,
index\.php\?page=(?:(?!TaggedObjects).)*&tagID=([0-9]+)$
will match
index.php?page=blahblah&tagID=1234
and will not match
index.php?page=blahTaggedObjectsblah&tagID=1234
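A quick way to confirm this behaviour is to run the pattern against both URLs. A minimal Python sketch (Python is an assumption; the question does not name its regex engine):

```python
import re

# (?:(?!TaggedObjects).)* advances one character at a time,
# but only where 'TaggedObjects' does not start at that position.
pattern = re.compile(r"index\.php\?page=(?:(?!TaggedObjects).)*&tagID=([0-9]+)$")

ok = pattern.search("index.php?page=blahblah&tagID=1234")
bad = pattern.search("index.php?page=blahTaggedObjectsblah&tagID=1234")

print(ok.group(1))  # 1234
print(bad)          # None
```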
If you do want to allow that match and only exclude the exact string TaggedObjects, then use
index\.php\?page=(?!TaggedObjects&tagID=([0-9]+)$).*&tagID=([0-9]+)$
Try this. I think you mean you want to fail the match if the string contains an occurrence of 'TaggedObjects':
index\.php\?page=(?!.*TaggedObjects).*&tagID=([0-9]+)$
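The difference between rejecting any occurrence of TaggedObjects and rejecting only the exact page name shows up when both patterns are run side by side. A small Python sketch; the third URL is a hypothetical test case that does not appear in the answers above.

```python
import re

urls = [
    "index.php?page=blahblah&tagID=1234",
    "index.php?page=blahTaggedObjectsblah&tagID=1234",
    "index.php?page=TaggedObjects&tagID=1234",  # hypothetical: page is exactly TaggedObjects
]

# Fails if TaggedObjects occurs anywhere after page=.
no_occurrence = re.compile(r"index\.php\?page=(?!.*TaggedObjects).*&tagID=([0-9]+)$")
# Fails only if the page value is exactly TaggedObjects (the tag ID is group 2 here).
not_exact = re.compile(r"index\.php\?page=(?!TaggedObjects&tagID=([0-9]+)$).*&tagID=([0-9]+)$")

results = {u: (bool(no_occurrence.search(u)), bool(not_exact.search(u))) for u in urls}
for url, flags in results.items():
    print(flags, url)
# (True, True) index.php?page=blahblah&tagID=1234
# (False, True) index.php?page=blahTaggedObjectsblah&tagID=1234
# (False, False) index.php?page=TaggedObjects&tagID=1234
```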