Pattern Matching Multiple Word Patterns - regex

I'm hoping someone can help with a regular expression. I'm trying to find instances where two patterns both exist in a string (I think I'm saying that right).
Here is my test string:
{"eventid": 2121, "username":"FRED", "starttime": "1550243080", "newprocessname": "C:\\Windows\\System32\\wbem\\WmiPrvSE.exe", "parentprocessname": "C:\\Windows\\System32\\svchost.exe"}
I want to be able to search based on one or more criteria. The problem I have is when multiple criteria are provided. For example, the following matches if either username or newprocessname is good, whereas I want it to match only if both are good.
("username"\s*:\s*"(.*?)FRED(.*?)")|("newprocessname"\s*:\s*"(.*?)WINDOWS(.*?)")
I think my patterns are right, but how do I return a result only if both patterns match?
I hope I'm explaining this correctly.
Thank you!
Jon

If you're using a regex library that supports lookaheads, you can use this regex:
^(?=.*"username"\s*:\s*"([^"]*)FRED([^"]*)")(?=.*"newprocessname"\s*:\s*"([^"]*)WINDOWS([^"]*)")
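It checks for both the username and newprocessname tags from the same starting position: lookaheads don't consume any characters, so both must succeed at the start of the string. Note that you need to change .*? in your regex to [^"]*, otherwise the match can run past the closing quote into text beyond the value associated with the tag name. Also note that your test string contains Windows rather than WINDOWS, so the match needs to be case-insensitive.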
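As a rough illustration in Python (an assumption, since the question names no language), the sketch below runs the lookahead regex against the test string and compares it with a .*? variant. The names TEST, LEAKY, STRICT, LOOSE and both_match are made up for this example, LEAKY is a hypothetical record, and re.IGNORECASE is added because the test string contains Windows rather than WINDOWS.

```python
import re

# The asker's test string, kept raw so the doubled backslashes stay literal.
TEST = r'{"eventid": 2121, "username":"FRED", "starttime": "1550243080", "newprocessname": "C:\\Windows\\System32\\wbem\\WmiPrvSE.exe", "parentprocessname": "C:\\Windows\\System32\\svchost.exe"}'

# Hypothetical record: the username is BOB, but "fred" appears later in a path.
LEAKY = r'{"username":"BOB", "newprocessname": "C:\\Windows\\fred.exe"}'

# Regex from the answer: each value is confined by [^"]*.
STRICT = re.compile(
    r'^(?=.*"username"\s*:\s*"([^"]*)FRED([^"]*)")'
    r'(?=.*"newprocessname"\s*:\s*"([^"]*)WINDOWS([^"]*)")',
    re.IGNORECASE,
)

# Same structure with the question's .*?, which can cross a closing quote.
LOOSE = re.compile(
    r'^(?=.*"username"\s*:\s*"(.*?)FRED(.*?)")'
    r'(?=.*"newprocessname"\s*:\s*"(.*?)WINDOWS(.*?)")',
    re.IGNORECASE,
)


def both_match(pattern, text):
    """Return True only if both lookaheads succeed at the start of text."""
    return pattern.match(text) is not None


print(both_match(STRICT, TEST))   # both criteria hold
print(both_match(STRICT, LEAKY))  # username value does not contain FRED
print(both_match(LOOSE, LEAKY))   # .*? leaks into the next value
```

With the .*? variant, the username check is satisfied by text from a different field, which is the failure the [^"]* change is meant to prevent.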

Related

Find/Replace Pascal Case EXCEPT when prepended by particular field (e.g. data-control=")

I have an existing RegEx to match PascalCase in certain scenarios:
([^\w<\->\s])([A-Z][a-z]+)((?:[A-Z][a-z]+)*)
I'm looking for a way to NOT match when these groups are prepended by data-control= or name= or id=.
Example Input:
<input data-control="Phone2" ng-model="UserInformation.Phone2">
Desired Output:
I want to match on "UserInformation", "Phone2" in ng-model, but DON'T match on "Phone2" in data-control.
Thanks to @0x263A for the suggestion in the comments to use a negative lookbehind (I had only been looking into lookaheads, and was trying to engineer the wrong tool for this).
With their help, I came up with a solution that disqualifies certain matches through a list of lookbehinds (in this case, don't match if preceded by data-control=, id=, or name=):
(?<!data-control=)(?<!id=)(?<!name=)([^\w<\->\s])([A-Z][a-z]+)((?:[A-Z][a-z]+)*)
If I find a match of something I want to ignore additionally, I'll just add it to the list. Thanks all!
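A minimal sketch of how the chained lookbehinds behave, assuming Python's re module (the question does not say which engine is used; Python accepts these lookbehinds because each has a fixed width). HTML, ORIGINAL, GUARDED and pascal_words are names invented for this example.

```python
import re

# Example input from the question.
HTML = '<input data-control="Phone2" ng-model="UserInformation.Phone2">'

# The original PascalCase pattern, and the same pattern behind three lookbehinds.
ORIGINAL = r'([^\w<\->\s])([A-Z][a-z]+)((?:[A-Z][a-z]+)*)'
GUARDED = r'(?<!data-control=)(?<!id=)(?<!name=)' + ORIGINAL


def pascal_words(pattern, text):
    """Join groups 2 and 3 of every match to get the PascalCase word."""
    return [m.group(2) + m.group(3) for m in re.finditer(pattern, text)]


print(pascal_words(ORIGINAL, HTML))  # includes the data-control value
print(pascal_words(GUARDED, HTML))   # data-control value is skipped
```

The lookbehinds are evaluated at the position of the leading delimiter (group 1), so the quote that follows data-control= is rejected before the word is even tried. The pattern only matches letters, so the trailing 2 of Phone2 is not part of any match.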

Regex in a Tweet: match #reply WITHOUT matching #mention

I am working with Tweets.
I am looking for a regular expression that matches all the replies #reply that are present in the beginning of a Tweet WITHOUT matching the mentions #mention.
For example, I have this Tweet:
#AlexWassabi #laurDIY now i type my text and i mention #mention
I would like to match #AlexWassabi and #laurDIY WITHOUT matching #mention.
Please note that in this example there are 2 replies (i.e. #AlexWassabi and #laurDIY), but in reality there can be more #replies following one another, then the text, and then a mention.
Could you please suggest a regular expression that does this job?
Thank you.
^(#\w+\s)+ matches only the run of "#reply" tokens at the beginning of the string.
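One thing to watch for: a repeated capturing group keeps only its last repetition, so the group alone will not return every reply. A possible workaround, sketched in Python (an assumption about the language; leading_replies and TWEET are names made up here), is to match the whole prefix first and then split it:

```python
import re

TWEET = "#AlexWassabi #laurDIY now i type my text and i mention #mention"


def leading_replies(tweet):
    """Return the #reply tokens that open the tweet, in order."""
    # Step 1: grab the whole run of replies at the start of the string.
    prefix = re.match(r"^(?:#\w+\s)+", tweet)
    if prefix is None:
        return []
    # Step 2: pull the individual tokens out of that prefix only.
    return re.findall(r"#\w+", prefix.group(0))


print(leading_replies(TWEET))
print(leading_replies("no replies here, only a #mention"))
```

Because the second step only sees the matched prefix, a later #mention in the body is never considered.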

can a regex match cn.cn. or ti.ti. but not vv.pp. or aa.bb.?

Is it possible with a regex to match a particular sequence repeating itself, rather than a number of letters? I would like to be able to match cn.cn. or ti.ti. or xft.xft. but not vv.pp. or aa.bb., and I do not seem to be able to do that with (\w\w.)+ as opposed to \w+.\w+. In the first case I in fact want to keep only one occurrence, like cn. or ti.; in the second I want to keep v.p. or a.b.
Thanks for any help.
Depending on your flavor of regex, you can use backreferences in your regex to match an earlier group. Your question title and question body disagree, however, on what exactly is supposed to be matched. I'll answer in Python as that's the flavor I'm most familiar with.
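import re

some_text = "cn.cn."  # sample input; substitute your own string
# matches vv.pp. (each letter doubled), does not match cn.cn.
re.match(r"(\w)\1\.(\w)\2\.", some_text)
# matches cn.cn. (two-letter group repeated), does not match vv.pp.
re.match(r"(\w{2})\.\1\.", some_text)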
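The \w{2} version handles cn.cn. and ti.ti. but not the xft.xft. case from the question. A possible generalization, offered as a sketch rather than as what the answerer intended, is to let the group be any length with \w+ and to use re.sub to keep a single occurrence. REPEATED, is_repeated and collapse are hypothetical names.

```python
import re

# A word, a dot, the same word again, and a dot.
REPEATED = re.compile(r"(\w+)\.\1\.")


def is_repeated(token):
    """True if the whole token is one word repeated twice, e.g. 'cn.cn.'."""
    return REPEATED.fullmatch(token) is not None


def collapse(text):
    """Replace each repeated pair such as 'cn.cn.' with a single 'cn.'."""
    return re.sub(r"\b(\w+)\.\1\.", r"\1.", text)


for token in ["cn.cn.", "ti.ti.", "xft.xft.", "vv.pp.", "aa.bb."]:
    print(token, is_repeated(token))

print(collapse("cn.cn. and vv.pp."))
```

The \b in collapse keeps the match from starting in the middle of a longer word.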

Perl regex to match only if not followed by both patterns

I am trying to write a pattern match to only match when a string is not followed by both following patterns. Right now I have a pattern that I've tried to manipulate but I can't seem to get it to match correctly.
Current pattern:
/(address|alias|parents|members|notes|host|name)(?!(\t{5}|\S+))/
I am trying to match when a string is not spaced correctly but not if it is part of a larger word.
For example I want it to match,
host \t{4} something
but not,
hostgroup \t{5} something
In the above example it will match hostgroup and end up separating it into 2 separate words "host" and "group"
Match:
notes \t{4} something
but not,
notes_url \t{5} something
Using my pattern it ends up turning into:
notes \t{5} _url
Hopefully that makes a bit more sense.
I'm not at all clear what you want, but word boundaries will probably do what you ask.
Does this work for you?
/\b(address|alias|parents|members|notes|host|name)\b(?!\t{5})/
Update
Having understood your problem better, does this do what you want?
/\b(address|alias|parents|members|notes|host|name)\b(?!\t{5}(?!\t))/
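The updated pattern uses no Perl-only syntax, so its behavior can be checked in Python as well. The sketch below assumes the keyword is followed directly by the tabs (the question's examples are ambiguous about surrounding spaces); KEYWORD and misaligned are names invented for this illustration.

```python
import re

# Whole-word keyword that is NOT followed by exactly five tabs.
KEYWORD = re.compile(
    r"\b(address|alias|parents|members|notes|host|name)\b(?!\t{5}(?!\t))"
)


def misaligned(line):
    """Return the keyword if its spacing is wrong, otherwise None."""
    m = KEYWORD.search(line)
    return m.group(1) if m else None


print(misaligned("host" + "\t" * 4 + "something"))       # too few tabs
print(misaligned("host" + "\t" * 5 + "something"))       # correctly spaced
print(misaligned("host" + "\t" * 6 + "something"))       # too many tabs
print(misaligned("hostgroup" + "\t" * 5 + "something"))  # part of a longer word
print(misaligned("notes_url" + "\t" * 5 + "something"))  # underscore is a word character
```

The second \b is what stops host from matching inside hostgroup and notes from matching inside notes_url, since g and _ are both word characters. The nested (?!\t) makes six or more tabs count as misaligned too.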

How to match data inside a tag, but not other similar tags

Regexr link for the lazy: http://regexr.com?33udv
Test string:
<li><strong>Start</strong></li><li>End</li>
I want to match when I search for "Start"
<li><strong>Start</strong></li>
My pattern is this:
<li>(?!<li>)*Start.*?</li>
My issue is that it's matching both list children, when I only want to match the one that contains "Start".
Note: This is a very predictable html string that will always look the same. I know Regex shouldn't parse html, but the question is more about understanding of Negative Lookaheads.
Solution:
<li>((?!<li>).)*Start.*?</li>
The expression you posted is different from the one in the link. I will focus on the one from the link.
.* is greedy: it will try to find the longest match. You want it to be lazy:
<li>.*?Start.*?</li>
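The lazy version and the negative-lookahead version from the question's "Solution" agree on the given test string but can differ on other inputs. A small Python sketch of the difference follows (Python is an assumption; REVERSED is a hypothetical input that is not in the question, and the capturing group is written as non-capturing here):

```python
import re

LAZY = r"<li>.*?Start.*?</li>"
# Each character is consumed only if a new <li> does not begin there.
TEMPERED = r"<li>(?:(?!<li>).)*Start.*?</li>"

GIVEN = "<li><strong>Start</strong></li><li>End</li>"
REVERSED = "<li>End</li><li>Start</li>"  # hypothetical: target item comes second


def first_match(pattern, text):
    """Return the text of the first match, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


print(first_match(LAZY, GIVEN))
print(first_match(TEMPERED, GIVEN))
print(first_match(LAZY, REVERSED))      # starts at the first <li>, spans both items
print(first_match(TEMPERED, REVERSED))  # only the item that contains Start
```

Laziness only makes the match end as early as possible; it does not move the starting point forward. The lookahead is what prevents the match from crossing into a second list item.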