Match string not containing a certain phrase - regex

I need to find all instances of the word "confidential" in a message, except when it is used in the phrase "confidential and proprietary", in which case it is OK and I don't need to pick it up through regex.
Thanks all in advance!
-P

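You can use a negative lookahead (http://www.regular-expressions.info/lookaround.html)
This regex will match if your engine supports lookaround: (confidential) (?!and proprietary)
Note that it requires a literal space after "confidential", so it will not match the word directly before punctuation or at the end of the string.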
demo: http://regexr.com?36itq

Using word boundaries \b is also an option here.
\bconfidential\b(?! and proprietary\b)
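As a quick check of the word-boundary version, here is a minimal Python sketch. The function name find_flagged and the re.IGNORECASE flag are my own assumptions for illustration; the original question does not say whether matching should be case-sensitive.

```python
import re

# Pattern from the answer above: the word "confidential", unless it is
# immediately followed by " and proprietary".
# re.IGNORECASE is an assumption, not part of the original answer.
PATTERN = re.compile(r"\bconfidential\b(?! and proprietary\b)", re.IGNORECASE)

def find_flagged(message):
    """Return the start offset of every 'confidential' that should be flagged."""
    return [m.start() for m in PATTERN.finditer(message)]

sample = "This is confidential. This is confidential and proprietary."
print(find_flagged(sample))  # -> [8]
```

Only the first occurrence is reported; the second is skipped because the lookahead sees " and proprietary" right after it.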

Related

Regex to match different characters at same position in string

Let's say I have the text a123456. I want a string of b123456 to match. So essentially, 'match if all characters are the same except for the first character'. Am I asking for the impossible with regex?
Use the dot (.) to match any character. So, a possible Regex would be:
/^.123456$/
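If the reference text is not known in advance, the same idea can be built at runtime. This is a rough Python sketch; the helper name same_except_first is hypothetical, and it assumes a non-empty reference string.

```python
import re

def same_except_first(reference, candidate):
    """True if candidate equals reference everywhere except (possibly) the first character."""
    # "." stands in for the first character; the rest is escaped so that
    # characters such as "." or "$" in the reference are matched literally.
    pattern = "." + re.escape(reference[1:])
    return re.fullmatch(pattern, candidate) is not None

print(same_except_first("a123456", "b123456"))  # -> True
print(same_except_first("a123456", "b123457"))  # -> False
```

Note that the dot also matches the original first character, so "a123456" matches itself.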
If you want to use a zero-length assertion, you can take a lookbehind approach as follows:
(?<=\w)your_value$ // your_value is the text you want to check
I think you can figure it out on your own. This isn't hard; it just needs some understanding of regex. Why not go through the following links and try to write the regex yourself?
https://www.talentcookie.com/2015/07/regular-expressions/
https://www.talentcookie.com/2015/07/lets-practice-regular-expression/
https://www.talentcookie.com/2016/01/some-useful-regular-expression-terminologies/

Regex matching groups by prefix

I have the following string: CL_6x CL_5c CL_234 CL_ERB14 1D CL_6y
I need a regex to extract groups like this:
CL_6x
CL_5c
CL_234
CL_ERB14 1D
CL_6y
As you can see they're all prefixed with CL_
Any ideas how to achieve this?
You need to use a positive lookahead based regex.
\bCL_.*?(?=\s*CL_|$)
This should match until the next CL_ or end of the line.
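A small Python sketch of the lookahead pattern applied to the string from the question. The function name split_groups is only illustrative.

```python
import re

# Lazy match from "CL_" up to (but not including) the whitespace before
# the next "CL_", or up to the end of the string.
GROUP = re.compile(r"\bCL_.*?(?=\s*CL_|$)")

def split_groups(text):
    return GROUP.findall(text)

print(split_groups("CL_6x CL_5c CL_234 CL_ERB14 1D CL_6y"))
# -> ['CL_6x', 'CL_5c', 'CL_234', 'CL_ERB14 1D', 'CL_6y']
```

Because the lookahead does not consume the next "CL_", each match can start right where the previous one stopped looking.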
CL_.+?\b
Try this. See demo. \b is a word boundary.
https://regex101.com/r/uF4oY4/86
EDIT:
For test cases like CL_ERB14 1D, use:
CL_\S+(?:\s*(?!CL_)\S+)
See demo.
https://regex101.com/r/uF4oY4/87
You can use the following regex.
^CL_.+\b
Explanation
^: Start of the string
CL_: Matches literal CL_
.+: Matches any character one or more times
\b: Word boundary

Match a word without consecutive repeated characters in it

I am new to Java regex. Can you please give a matching pattern for the following requirement?
Example:
"Apple" or "Google" should not be matched since they contain the same character twice in a row.
But, words like "Hard2Crack" or "Sometimes" should match.
Your help is appreciated.
You can use this regex:
^(?:(.)(?!\1))*$
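The question is about Java, but the pattern uses only features that Python's re module shares, so here is a rough Python sketch for trying it out. The function name has_no_adjacent_repeats is my own.

```python
import re

# Each character is captured as group 1, and the negative lookahead then
# rejects the match if the very next character is the same one.
NO_REPEATS = re.compile(r"^(?:(.)(?!\1))*$")

def has_no_adjacent_repeats(word):
    return NO_REPEATS.match(word) is not None

for word in ["Apple", "Google", "Hard2Crack", "Sometimes"]:
    print(word, has_no_adjacent_repeats(word))
# -> Apple False, Google False, Hard2Crack True, Sometimes True
```

The comparison is case-sensitive, so a pair like "Aa" is not treated as a repeat; the empty string also matches.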
(?!.*?([a-z])\1)^.*$
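You can try this. It uses a negative lookahead. Note that [a-z] only checks lowercase letters, so repeated uppercase letters or digits are not detected unless you widen the class or add a case-insensitive flag.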
See demo.
http://regex101.com/r/oC3nN4/13

How to exclude a specific word using regex?

I have a problem here. I have the following string:
#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺
Then I use this regex pattern to select all the words:
\w+
However, I need to select all the words except those prefixed with #, like #Novriiiiii or #nencor. In other words, the #word ones have to be excluded.
How do I do that?
P.S. I am using regexpal to test the regex, and I want to apply the pattern in Yahoo Pipes regex. Thank you.
You can use a negative lookbehind so that if a word is preceded by # it is excluded. You also need a word boundary before the word or else the lookbehind will only affect the first character.
(?<!#)\b\w+
http://rubular.com/r/ONEl70Am5Q
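To see the lookbehind in action outside of Yahoo Pipes, here is a minimal Python sketch using the string from the question. The function name plain_words is hypothetical.

```python
import re

# \b anchors the match at the start of a word; the lookbehind then
# rejects the word if the character just before it is '#'.
PLAIN_WORD = re.compile(r"(?<!#)\b\w+")

def plain_words(text):
    return PLAIN_WORD.findall(text)

sample = "#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺"
print(plain_words(sample))
# -> ['yauda', 'busana', 'muslim', 'haha', 'wa', 'alaikumsalam', 'noperi']
```

Without the \b, the engine would simply skip the first letter after '#' and match the rest of the word.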
Does this suit your needs?
http://rubular.com/r/uuXvNrUiGJ
[^#\w+]\w+
This would solve your problem:
[^#\w+][\w.]+
Check this link: http://regexr.com?34tq7
If you cannot use a negative lookbehind as other answers have already suggested, here's a workaround.
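\w already doesn't match the # character, so you'd want to require that the word is preceded by a character that is neither # nor a word character (excluding word characters keeps the match from starting in the middle of a #word):
[^#\w]\w+
But this will (a) not work at the beginning of the string, and (b) include the character before the word in the match. To fix (a), we can do:
(^|[^#\w])\w+
To fix (b), we parenthesize the part we want:
(^|[^#\w])(\w+)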
Then use $2 or \2 (depending on regex dialect) to refer to the matched word.
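A Python sketch of the capture-group workaround, for engines without lookbehind. It uses the class [^#\w] for the preceding character, which is an assumption on my part: with a plain [^#], the first letter after the '#' could itself serve as the "preceding character" and the rest of the #word would still be captured. The function name words_via_group is hypothetical.

```python
import re

# Group 1: start of string, or one character that is neither '#' nor a
# word character. Group 2: the word we actually want.
WORD = re.compile(r"(^|[^#\w])(\w+)")

def words_via_group(text):
    # Keep only group 2, discarding the separator captured in group 1.
    return [m.group(2) for m in WORD.finditer(text)]

sample = "#Novriiiiii yauda busana muslim #nencor haha. wa'alaikumsalam noperi☺"
print(words_via_group(sample))
# -> ['yauda', 'busana', 'muslim', 'haha', 'wa', 'alaikumsalam', 'noperi']
```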
Another option is to include the # symbol in the word:
[\w#]+
And then add another step in your Pipe to filter out all words that start with an #.
A way to do that is to remove words that you don't want. Example:
find: #\w+
replace: empty string
You obtain the text without the #abcdef words.
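The find-and-replace approach can be sketched in Python like this. The function name strip_hash_words is hypothetical, and splitting on whitespace afterwards is my own addition to tidy up the leftover spaces.

```python
import re

def strip_hash_words(text):
    # Replace every '#' followed by word characters with the empty string,
    # then split on whitespace to get the remaining tokens.
    without_tags = re.sub(r"#\w+", "", text)
    return without_tags.split()

print(strip_hash_words("#Novriiiiii yauda busana muslim #nencor haha."))
# -> ['yauda', 'busana', 'muslim', 'haha.']
```

Unlike the \w+ based answers, this keeps punctuation attached to the remaining tokens.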

How to match comments starting with # but not hexadecimal colors

I can't figure out how to match comments but not HTML hex in regex. For example I want the script to match
#I'm a comment, yes I am
but not
#FF33AF
You could use negative lookahead. From the python documentation:
(?!...)
Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.
To do the job right you need a parser not a regular expression matcher. For example, is "#decade" a comment or a color name? You can't know without a little context.
Well, the obvious regex is going to be something like:
(?m-:^\s*#(?![0-9A-Fa-f]{6}).*$)
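This matches every line that starts with a '#', unless the '#' is followed by six hex digits. Your post isn't very specific, but I think that is what you're looking for.
Updated:
Corrected to exclude only lines where exactly six hex digits (and nothing else) follow the '#':
(?m-:^\s*#(?![0-9A-Fa-f]{6}\s*$).*$)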
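Here is a rough Python adaptation of that idea. Python's re module does not accept the (?m-:...) group syntax, so this sketch passes re.MULTILINE instead; it also uses [ \t]* rather than \s* so that a match cannot spill across line breaks, and [0-9A-Fa-f] for the hex digits. Those substitutions and the function name find_comments are my own assumptions.

```python
import re

# A line is a comment if it starts with optional indentation and '#',
# unless the '#' is followed by exactly six hex digits and nothing else.
COMMENT = re.compile(r"^[ \t]*#(?![0-9A-Fa-f]{6}[ \t]*$).*$", re.MULTILINE)

def find_comments(source):
    return [m.group().strip() for m in COMMENT.finditer(source)]

text = "#I'm a comment, yes I am\n#FF33AF\n  # indented comment\ncolor: #FF33AF;\n#decade"
print(find_comments(text))
# -> ["#I'm a comment, yes I am", '# indented comment']
```

Note that "#decade" is skipped because all six of its letters happen to be hex digits, which is exactly the ambiguity that the parser remark above warns about.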