Regex - returning a match without a period - regex

I'm using the below regex string to match the word "kohls" which is located in a group of other words.
\W*((?i)kohls(?-i))\W*
It works great when the word is alone, but if the word is in a url, the match includes a period on both sides.
See the below examples:
Thank you for shopping at Kohls - returns a match for kohls.
https://www.kohls.com - returns a match for .kohls.
Edit. https://www.KohlsAndMichaels.com - doesn't return any match for kohls.
I want it to only extract the exact match for kohls without periods or any other symbols/text in front or behind it. Can you tell me what I'm doing wrong?

In cases like this you can use a site like regex101.com, which explains the regular expression and highlights the matches in color. Testing your current regular expression there shows where it goes wrong.
The problem with the dots is the \W*, which matches any run of non-word characters, including the periods on either side of the word. To fix this, you can use the following regular expression:
\b((?i)kohls(?-i))\b
The \b (before and after the word you want to match) asserts the position at a word boundary without consuming any characters, so the surrounding periods are no longer part of the match.
If you still have questions, look at the explanation of the regular expression that the website provides. It is worth reading.
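The difference between the two patterns can also be checked outside regex101. The following is a minimal Python sketch, not part of the original answer. It assumes Python's re module, which does not accept the (?-i) toggle used above, so case-insensitivity is passed as re.IGNORECASE instead. The helper name whole_match is made up for illustration.

```python
import re

# Assumption: re.IGNORECASE stands in for the (?i)...(?-i) toggles.
with_non_word = re.compile(r"\W*(kohls)\W*", re.IGNORECASE)
with_boundary = re.compile(r"\b(kohls)\b", re.IGNORECASE)


def whole_match(pattern, text):
    """Return the full text consumed by the first match, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


url = "https://www.kohls.com"
print(repr(whole_match(with_non_word, url)))  # '.kohls.'
print(repr(whole_match(with_boundary, url)))  # 'kohls'
print(whole_match(with_boundary, "https://www.KohlsAndMichaels.com"))  # None
```

With \b the third URL gives no match because Kohls is immediately followed by another word character. If that case should match too, drop the trailing \b.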

The \W metacharacter matches non-word characters, so adding a star operator matches zero or more of these non-word characters (such as periods). Did you mean to use a word boundary instead?
\b(?i)kohls(?-i)\b

Replace both \W* with [\W,\.\-]* etc.
Should be enough.

Related

A regular expression to find a word and also exclude another word/string

I have a regular expression as follows:
te\b"[^Haste]"
I want to find all words ending with "te" in each segment but need to exclude the word "Haste" and possibly few other words as they are sometimes flooding the list of errors as false positives.
Any help would be gratefully appreciated :-)
I tried to look it up here and there with no success. Also, many tries on regex101 with no success.
Try this:
\b(?!(?:Haste|AAAte)\b)\w*te\b
\b word boundary.
(?!(?:Haste|AAAte)\b) negative lookahead: the word starting here must not be Haste or AAAte.
\w* zero or more word characters.
te the string te.
\b word boundary.
See regex demo
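A minimal Python sketch of the negative-lookahead approach, offered as an illustration rather than as part of the original answer. The sample sentence and the variable names are invented. Note that the exclusion is case-sensitive as written.

```python
import re

# Words ending in "te", except the excluded words Haste and AAAte.
pattern = re.compile(r"\b(?!(?:Haste|AAAte)\b)\w*te\b")

# Hypothetical sample text.
text = "Haste makes waste, so write the note late."

# No capturing groups are used, so findall returns the whole matches.
matches = pattern.findall(text)
print(matches)  # ['waste', 'write', 'note', 'late']
```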
One way is to match, but not capture, what you don't want and capture what you do want. Suppose we wanted to skip over "haste" and "paste". We could then use the following regular expression.
\b(?:haste|paste|(\w*te))\b
Suppose the string were as follows.
"In the surgeon's haste to amputate he removed the wrong leg."
The string pointer maintained by the regex engine would move from left to right one character at a time until it matched a word in the sentence ending in "te". The first would be "haste". That would be matched but not captured. We therefore pay no attention to that match.
Next, "amputate" is matched by
(\w*te)
As it is captured as well we find that "amputate" is a valid match.
Demo.
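The match-but-don't-capture idea can be sketched in Python as follows. This is an illustration under the assumption that Python's re module is used. The variable names are made up, and the sentence is the one from the answer.

```python
import re

# "haste" and "paste" are matched but not captured; any other word
# ending in "te" lands in capture group 1.
pattern = re.compile(r"\b(?:haste|paste|(\w*te))\b")

text = "In the surgeon's haste to amputate he removed the wrong leg."

# Keep only the matches where group 1 took part.
wanted = [m.group(1) for m in pattern.finditer(text) if m.group(1)]
print(wanted)  # ['amputate']
```

The filtering step is what discards the uncaptured matches. The regular expression on its own still matches "haste".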

Multiline PCRE, multiple conditions

I'm just starting out with regex and have hit a stumbling block. I'm hoping someone might be able to explain the workaround.
Trying to carry out a multi-line search. I wish to use "*" as the 'flag', so to speak: if a line contains an asterisk it should match. The digits at the start of the line should be output, so should the word "Match" in the linked example, excluding the asterisk itself.
I assume my use of "|" is dividing the regex into two conditions, when it actually needs to satisfy both to match.
https://regex101.com/r/Pu56bi/2
(?m)(^\d+)|(?<=\*).*$
Any help kindly appreciated.
You could use a positive lookahead, as in
^(?=.*?\*)(\d+).+?(Match)$
See your modified example on regex101.com.
If Match is always at the end of the string, you could match the digits at the start of the string, then match an * and Match at the end of the string.
Use a word boundary \b to prevent the word of digits being part of a longer word.
^(\d+)\b.*\*.*\b(Match)$
Regex demo
If there can be text after the word Match, you can assert the presence of * using a positive lookahead.
^(?=.*\*)(\d+)\b.*\b(Match)\b.*$
Regex demo
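A short Python sketch of the lookahead version. The sample lines are invented because the linked regex101 content is not reproduced here, and re.MULTILINE takes the place of the (?m) flag from the question.

```python
import re

# Lookahead requires an asterisk somewhere on the line; the two
# groups capture the leading digits and the word Match.
pattern = re.compile(r"^(?=.*\*)(\d+)\b.*\b(Match)\b.*$", re.MULTILINE)

# Hypothetical input: only the lines containing '*' should match.
text = "123 foo * Match\n456 bar Match\n789 * baz Match trailing"

results = pattern.findall(text)
print(results)  # [('123', 'Match'), ('789', 'Match')]
```

Because . does not match a newline by default, the lookahead only inspects the current line, which is why the second line is skipped.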

How to find a particular string

I'm using Visual Studio 2017, and in a very long text file I'm searching for a particular function call but am unable to find it.
Here's the regex I'm using:
c\.CreateMap\<(\w)+\,\s+Address\>
and I want to match these:
c.CreateMap<ClientAddress, Address>()
c.CreateMap<Responses.SiteAddress, Data.Address>()
and so on.
As soon as I add "Address" in the regex it stops matching any.
what am I doing wrong?
You can try this
c\.CreateMap\<\w+\.?\w+?\,\s*\w*?\.?Address\>
Explanation
c\.CreateMap\< - Matches the literal text c.CreateMap<.
\w+ - Matches one or more word characters.
\.? - Matches '.' zero or one time.
\w+? - Matches one or more word characters, as few as possible.
\, - Matches ','.
\s* - Matches zero or more whitespace characters.
\w*? - Matches zero or more word characters, as few as possible.
\.? - Matches '.' zero or one time.
Address\> - Matches the literal text Address>.
Demo
P.S- In case you also want to match something like this.
c.CreateMap<Responses.SiteAddress.abc, Data.Address.xyz>()
You can use this.
c\.CreateMap\<(\w+\.?\w+?)*\,\s*(?:\w*?\.?)*Address(\.\w*)?\>
Demo
Here is a more general regex I can suggest:
c\.CreateMap\<[\w.]+,\s+(?:[\w.]+\.)?Address\>\s*\(\s*\)
This will match any term made of dots or word characters in the first position in the diamond. In the second position, it will match Address on its own, or one or more parent class names followed by a dot separator and then Address.
Demo
Note that I also include the empty function call parentheses in the regex. I also allow for whitespace that may appear after the diamond or between the parentheses.
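As a rough check of the general pattern, here is a Python sketch. It is an illustration only: the question is about Visual Studio's search, and Python's re module is assumed here instead. The < and > are left unescaped, and the third sample line is a made-up negative case.

```python
import re

pattern = re.compile(
    r"c\.CreateMap<[\w.]+,\s+(?:[\w.]+\.)?Address>\s*\(\s*\)"
)

lines = [
    "c.CreateMap<ClientAddress, Address>()",
    "c.CreateMap<Responses.SiteAddress, Data.Address>()",
    "c.CreateMap<Client, MailingAddress>()",  # hypothetical non-match
]

# True where the second type argument is Address, optionally qualified.
hits = [bool(pattern.search(line)) for line in lines]
print(hits)  # [True, True, False]
```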
In your second example, you have an extra dot which is not handled. Your regex needs only a small modification. Also, you don't need to escape <, > or a comma. Use this:
c\.CreateMap<([\w.])+,\s+[\w.]*Address>
Demo
To match any of the functions in your question, you can use:
c\.CreateMap[^)]+\)
Regex Demo

trying to find the correct regular expression

I have the following cases that should match with a regular expression, I've tried several combinations and have read a lot of answers but still no clue on how to solve it.
the rule is, find any combination of . inside a quoted string, atm I have the following regexp
\"\w*((..)|(.))\w*\"
that covers most of the cases:
mmmas"A.F"asdaAA
196.34.45.."asd."#
".add"
sss"a.aa"sss
".."
"a.."
"a..a"
"..A"
but still having problems with this one:
"WERA.HJJ..J"
I've been testing the regexp on the http://regexr.com/ site
I will really appreciate any help on this
Change your regex to
\"\w*(\.+\w*)+\"
Update: escape . to match the dot and not any character
demo
From the question, it seems that you need to find every occurrence of one or more dot (along with optional word characters) inside a pair of quotes. The following regex would do this:
\"\w*(\.+\w*)+\"
In "WERA.HJJ..J", you have some word characters followed by a dot, then another sequence of word characters, then two more dots and a word character. Your regex only matches a single run of one or two dots with an optional block of word characters on either side.
The dots in the regex are escaped so that they match a literal dot rather than any character, since . is a metacharacter.
Check here.
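The corrected pattern can be tried against a few of the inputs from the question with this Python sketch. It is an illustration that assumes Python's re module. The helper name quoted_dots and the dot-free input "abc" are made up.

```python
import re

# A quoted run of word characters containing at least one dot.
pattern = re.compile(r'"\w*(\.+\w*)+"')


def quoted_dots(text):
    """Return the first quoted segment containing a dot, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(quoted_dots('mmmas"A.F"asdaAA'))    # "A.F"
print(quoted_dots('196.34.45.."asd."#'))  # "asd."
print(quoted_dots('"WERA.HJJ..J"'))       # "WERA.HJJ..J"
print(quoted_dots('"abc"'))               # None
```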

Extract url based on specific keyword

I am crawling data from certain websites and I am looking to extract data from specific URLs. One such case is, let's say, a URL containing *devicehelp.optus.com.au/web/*. Please find my regex below:
/[^]*devicehelp\.optus\.com\.au\/web\/[^.]*/
This regex doesn't give me the exact match I am looking for. Could someone please let me know what I am missing here?
Test urls -
*devicehelp.optus.com.au/web/*
http://www.top.abc.something.optus.devicehelp.optus.com.au/web/web/web/
This regex works when I test it on http://regexr.com/ but doesn't on https://regex101.com/
In most regex flavors, [^] is an invalid regex construct, while on the site you tested (regexr.com), this will be parsed as any character (since the regexr regex flavor is JavaScript).
To match any character but a newline zero or more times, you may use .*.
.*\bdevicehelp\.optus\.com\.au\/web\/.*
The \b is a word boundary, so as to match devicehelp as a whole word (if you do not intend to match it as a whole word, you may remove it). Dots should be escaped to match literal dots.
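Python's re module is one of the flavors that rejects [^]. The sketch below, an illustration rather than part of the original answer, shows that and then applies the suggested pattern to the test URL from the question. The second URL is a made-up case where the word boundary prevents a match.

```python
import re

# [^] is not a valid character class in Python's regex flavor.
try:
    re.compile(r"[^]")
    empty_negated_class_ok = True
except re.error:
    empty_negated_class_ok = False
print(empty_negated_class_ok)  # False

pattern = re.compile(r".*\bdevicehelp\.optus\.com\.au\/web\/.*")

url = "http://www.top.abc.something.optus.devicehelp.optus.com.au/web/web/web/"
m = pattern.search(url)
print(m.group(0) == url)  # True

# Hypothetical URL: "devicehelp" is part of a longer word here.
other = pattern.search("http://mydevicehelp.optus.com.au/web/")
print(other)  # None
```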