Ignore one word with regex

I know several similar questions have already been asked, but I can't fix this issue with regex.
I have a sentence like
Lorem Ipsum is http://stack.com text of the http://stack.com/wp-admin
printing and typesetting industry.
I want to catch the word "stack.com" but not stack.com/wp-admin.
I have tried a few regexes, but none of them work.
^(?!stack.com$).*

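The ^(?!stack.com$).* regex matches any string (even an empty one) that is not exactly equal to stack.com, because the $ inside the lookahead ties the check to the end of the string. The unescaped . also matches any character, not only a literal dot.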
To match stack.com but not inside stack.com/wp-admin, you need a negative lookahead:
/stack\.com(?!\/wp-admin)/
           ^^^^^^^^^^^^^^
Or better, with word boundaries to only match whole words:
/\bstack\.com\b(?!\/wp-admin)/
See the regex demo
Details:
\b - a leading word boundary
stack\.com - a literal string stack.com (a dot must be escaped)
\b - a trailing word boundary
(?!\/wp-admin) - a negative lookahead that fails the match if there is /wp-admin immediately to the right of the current location.
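A minimal sketch of the word-boundary pattern above, run against the sentence from the question. It assumes a JavaScript engine (the regex literals in this answer use that syntax); the variable names are illustrative only.

```javascript
// Sample sentence from the question, joined onto one line.
const text =
  "Lorem Ipsum is http://stack.com text of the http://stack.com/wp-admin printing and typesetting industry.";

// \b...\b keeps stack.com a whole word; (?!\/wp-admin) rejects the
// occurrence that is immediately followed by /wp-admin.
const pattern = /\bstack\.com\b(?!\/wp-admin)/g;

// Collect the start index of every match (matchAll requires the g flag).
const matchIndexes = [...text.matchAll(pattern)].map((m) => m.index);

// Only the first stack.com should be reported.
console.log(matchIndexes.length, matchIndexes[0] === text.indexOf("stack.com"));
```

The second occurrence is skipped because the lookahead sees /wp-admin directly after it; nothing is consumed by the lookahead itself.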

Related

Regex - match first word before hyphen and digit and file extension

I am trying to write a regex expression that will capture the first word up until a hyphen, and then ALSO match a digit and file extension.
Here is an example string:
lorem-ipsum-dolor-15.jpg
I know that [\d]+(.jpg) will match 15.jpg
I also know that ^[^-]*[^ -] will match the first word up until the hyphen, in this case lorem but it won't match anything beyond that.
How can I reconcile these 2 expressions into one? I'll also settle for a reverse match of what I'm asking for.
Thanks!
Assuming this is JS, you should be able to just join the two patterns with .*? (that is, match any character, any number of times, in a non-greedy fashion).
So something like:
/^([^-]*[^ -]).*?(\d+\.jpg)$/
The first capture group will be the leading lorem, and the second will be 15.jpg.
Check out regex101 for more detail.
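As a rough illustration of the combined pattern, assuming JavaScript as the answer does, the two capture groups can be read back like this (variable names are made up for the example):

```javascript
// Example string from the question.
const fileName = "lorem-ipsum-dolor-15.jpg";

// Group 1: first word up to the hyphen. Group 2: digits plus ".jpg" at the end.
// The lazy .*? skips everything in between.
const pattern = /^([^-]*[^ -]).*?(\d+\.jpg)$/;

const match = fileName.match(pattern);

// match is null when the string does not fit the pattern.
const firstWord = match ? match[1] : null;
const numberAndExtension = match ? match[2] : null;

console.log(firstWord, numberAndExtension); // lorem 15.jpg
```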

Regex: capturing capital word with nothing in front of it

I'm trying to match all proper nouns in some given text.
So far I've got (?<![.?!]\s|^)(?<!\“)[A-Z][a-z]+ which ignores capitalized words preceded by one of .?! and a space, as well as words that directly follow an opening quotation mark.
But it doesn't catch capital words at the beginning of sentences. So given the text:
Alec, Prince, so Genoa and Lucca are now just family estates of the “What”. He said no. He, being the Prince.
It successfully catches Prince, Genoa, Lucca but not Alec.
So I'd like some help modifying it, if possible, to also match a capitalized word with nothing before it. (I'm not sure how to define "nothing".)
You can put the “ as the second alternative in the lookbehind instead of ^ which asserts the start of the string.
Then you can omit (?<!\“)
(?<![.?!]\s|“)[A-Z][a-z]+
Explanation
(?<! Negative lookbehind, assert that what is directly to the left of the current position is not
[.?!]\s Match any of . ? ! followed by a whitespace char
| Or
“ Match literally
) Close lookbehind
[A-Z][a-z]+ Match an uppercase char A-Z and 1+ chars a-z
See a regex demo.
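A small sketch of this lookbehind pattern applied to the sample text from the question. It assumes a JavaScript engine with lookbehind support (ES2018 or later), which allows alternatives of different lengths inside the lookbehind; other flavors may reject that. The variable names are illustrative.

```javascript
// Sample text from the question.
const text =
  "Alec, Prince, so Genoa and Lucca are now just family estates of the “What”. He said no. He, being the Prince.";

// Reject a capitalized word when it follows sentence punctuation plus
// whitespace, or when it directly follows an opening curly quote.
const pattern = /(?<![.?!]\s|“)[A-Z][a-z]+/g;

// match() with the g flag returns all matches, or null when there are none.
const properNouns = text.match(pattern) || [];

console.log(properNouns); // ["Alec", "Prince", "Genoa", "Lucca", "Prince"]
```

Alec is now included because nothing precedes it, so neither lookbehind alternative can match at the start of the string.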
The thing you're looking for is called a "word boundary", which is denoted as \b in a lot of regex languages.
Try \b[A-Z][a-z]*\b.

regex grab a word

I'm trying to grab a value from a source with a regex, but only the name for this type.
"name":"HELP-PERP","posOnly":false,"price":40.3,"priceIncrement":0.01,"quote":null,"quoteV":73851918.483,"restricted":false,"sizeIncrement":0.01,"type":"future",
So far I have \b(\w*-PERP\w*)\b
This grabs the word HELP-PERP but duplicates it, so I'm trying to grab only the name whose type is future.
That is, grab the HELP-PERP that is on the same line as type":"future".
I'm a total noob at this; I've tried several things on regex101 and can't come up with anything :(
Thank you
You can use
/\w*-PERP\w*\b(?=.*type":"future")/g
See the regex demo.
Details
\w*-PERP\w* - zero or more word chars, -PERP, and again zero or more word chars
\b - a word boundary
(?=.*type":"future") - a positive lookahead that matches a location in string that is immediately followed with any zero or more chars other than line break chars as many as possible (.*) and then a type":"future" string.
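To see the lookahead at work, here is a hedged sketch in JavaScript. The HELP-PERP line is a shortened copy of the one in the question; the ABC-PERP line with type "spot" is a hypothetical entry added only for contrast.

```javascript
// First line: hypothetical entry with a different type, added for contrast.
// Second line: shortened version of the entry from the question.
const source = [
  '"name":"ABC-PERP","price":1.5,"type":"spot",',
  '"name":"HELP-PERP","posOnly":false,"price":40.3,"type":"future",',
].join("\n");

// The lookahead only succeeds when type":"future" appears later on the
// same line, because . does not match line breaks by default.
const pattern = /\w*-PERP\w*\b(?=.*type":"future")/g;

const futureNames = source.match(pattern) || [];

console.log(futureNames); // ["HELP-PERP"]
```

Since the pattern has no capture group, each name is reported once as the full match.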

Regex for blacklist and whitelist words

I'm trying to set up regex for a blacklist and whitelist, flagging blacklisted words and ignoring whitelisted words. Here are the rules:
I want to see if a word or phrase on the blacklist exists in the input string.
The blacklist words should be matched regardless of where they appear (full word or as substring).
The whitelist words (i.e. words that are known to be okay even though they contain blacklisted words) are not to be matched, but only when they appear as full words.
Blacklist words I want to search for and match if found: BUNNY, GARDEN, HOLE
Whitelist words that are clean and can be ignored even though they contain blacklisted words: WHOLE, GARDENER
I made the following regex using negative lookbehind:
(BUNNY|GARDEN|HOLE)(?<!\bWHOLE\b|\bGARDENER\b)
My silly example string:
This whole hole is a wholey mistake in the gardener agardener.
I would expect only the following be matched:
"hole"
"wholey"
"agardener"
It mostly works, since "whole" doesn't match but "wholey" does and "agardener" is also a match. However, "gardener" matches even though it's in the whitelist. What am I missing?
You can use
\w*(?:BUNNY|GARDEN|HOLE)\w*\b(?<!\bWHOLE|\bGARDENER)
See the regex demo.
A variation without a lookbehind, but with a lookahead:
\b(?!(?:WHOLE|GARDENER)\b)\w*(?:BUNNY|GARDEN|HOLE)\w*\b
See this regex demo.
Details:
\w* - zero or more word chars
(?:BUNNY|GARDEN|HOLE) - one of the required word parts
\w* - zero or more word chars
\b - a word boundary
(?<!\bWHOLE|\bGARDENER) - a negative lookbehind that fails the match if the whole word immediately to the left is WHOLE or GARDENER.
The \b(?!(?:WHOLE|GARDENER)\b)\w*(?:BUNNY|GARDEN|HOLE)\w*\b pattern matches a word boundary first, then fails the match if the next chars form the whole word WHOLE or GARDENER, and then matches a word with a BUNNY, GARDEN or HOLE substring in it.
Replace \w with [a-zA-Z] or \p{L} (or [[:alpha:]]) if supported and you need to only match letter words.
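Both variants can be compared on the example string from the question. This is a sketch in JavaScript; the i flag is an assumption made here because the patterns are written in upper case while the sample text is lower case, and the variable names are illustrative.

```javascript
// Example string from the question.
const input = "This whole hole is a wholey mistake in the gardener agardener.";

// Variant 1: match the word first, then reject it with a lookbehind.
const withLookbehind = /\w*(?:BUNNY|GARDEN|HOLE)\w*\b(?<!\bWHOLE|\bGARDENER)/gi;

// Variant 2: reject whitelisted whole words up front with a lookahead.
const withLookahead = /\b(?!(?:WHOLE|GARDENER)\b)\w*(?:BUNNY|GARDEN|HOLE)\w*\b/gi;

const flaggedByLookbehind = input.match(withLookbehind) || [];
const flaggedByLookahead = input.match(withLookahead) || [];

console.log(flaggedByLookbehind); // ["hole", "wholey", "agardener"]
console.log(flaggedByLookahead);  // ["hole", "wholey", "agardener"]
```

The lookahead variant may be the more portable choice, since some regex flavors do not accept lookbehinds whose alternatives differ in length.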

Regex to match a letter, but not if that letter starts the word

I've been trying to figure this one out at regex101, but no luck yet. I want to match the second 's' in system, for instance, but not an s at the start of a word. So I want to match the "s" in mos or answer, but I don't want to match the s in space.
This is what I have tried so far:
s*(?<!\W)
with couple variations, but no luck yet.
A negative lookbehind is the way to go. I think you need this:
(?<!\b)s
Details
(?<!...) - negative lookbehind
\b - word boundary
s - the letter s to be matched (will not be matched if there is a word boundary before)
Regex101 demo.
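A quick way to check this is to upper-case every matched s, as in the JavaScript sketch below (the sample words come from the question; the variable names are illustrative). Note that (?<!\b) is equivalent to the non-word-boundary assertion \B, so \Bs is a shorter spelling that also works where lookbehind is unavailable.

```javascript
// Words taken from the question.
const sample = "system space mos answer";

// Match an s only when there is no word boundary right before it.
const pattern = /(?<!\b)s/g;

// Upper-case the matched letters so they are easy to spot.
const marked = sample.replace(pattern, "S");

// Same result using \B, which asserts "not at a word boundary".
const markedWithNonBoundary = sample.replace(/\Bs/g, "S");

console.log(marked); // "syStem space moS anSwer"
```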