Match all occurrences with a single or regex

I need a regex that matches both:
;hostname:MytestHello;
;message:#Hellowtestworld;
In this value:
;hostname:MytestHello;severity:major;message:#Hellowtestworld;
Here is my attempt:
(hostname:|message:).*?(test).*?\;
But I only get the first occurrence:
hostname:nimsofttest22;
What can I do to get BOTH results?

While the multiple matching part is easy to solve with a global modifier or the correct language function/method that returns multiple matches, your pattern contains a flaw: it may return unwanted results if message or hostname with no test after them appear before another occurrence with test. See this regex demo to understand what I mean.
So, the correct way is to replace . with a negated character class that matches any char but ; (which acts as a delimiter in your string):
/(?:hostname|message):[^;]*?test[^;]*;/g
See this regex demo.
Note: you should adapt the pattern to whichever language method or function you choose later in the code.
Details
(?:hostname|message) - either of the 2 substrings
: - a colon
[^;]*? - any 0+ chars other than ;, as few as possible
test - test
[^;]* - any 0+ chars other than ; as many as possible
; - a semi-colon.
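As a rough illustration, here is how the pattern above could be applied in Python, where re.findall plays the role of the global modifier. The choice of Python and the `tricky` input are assumptions made for this sketch; only `value` comes from the question.

```python
import re

# Sample value taken from the question.
value = ";hostname:MytestHello;severity:major;message:#Hellowtestworld;"

# [^;] keeps every match inside a single ;-delimited field.
pattern = re.compile(r"(?:hostname|message):[^;]*?test[^;]*;")

# findall returns every non-overlapping match, like the /g modifier.
matches = pattern.findall(value)
print(matches)

# Hypothetical input (not from the question): hostname has no "test" in it.
tricky = ";hostname:abc;severity:major;message:#Hellowtestworld;"

# The original dot-based pattern runs across field boundaries here.
flawed = re.compile(r"(?:hostname:|message:).*?test.*?;")
flawed_matches = flawed.findall(tricky)
fixed_matches = pattern.findall(tricky)
print(flawed_matches)
print(fixed_matches)
```

On the `tricky` input the dot-based pattern swallows three fields in one match, while the negated character class only returns the message field.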

Related

How to match characters between two occurrences of the same but random string

Base string looks like:
repeatedRandomStr ABCXYZ /an/arbitrary/##-~/sequence/of_characters=I+WANT+TO+MATCH/repeatedRandomStr/the/rest/of/strings.etc
The things I know about this base string are:
ABCXYZ is constant and always present.
repeatedRandomStr is random, but its first occurrence is always at the beginning and before ABCXYZ
So far I looked at regex context matching, recursion and subroutines but couldn't come up with a solution myself.
My currently working solution is to first determine what repeatedRandomStr is with:
^(.*)\sABCXYZ
and then use:
repeatedRandomStr\sABCXYZ\s(.*)\srepeatedRandomStr
to match what I want in $1. But this requires two separate regex queries. I want to know if this can be done in a single execution.

In Go, where the RE2 library is used, there is no way other than yours: first extract the value before ABCXYZ, and then use a second regex to match the string between the two occurrences, as RE2 does not and won't support backreferences.
If the regex flavor can be switched to PCRE or a compatible one, you can use either of the following (the first one matches up to the last repetition of the Group 1 value, the second one up to the first repetition):
^(.*?)\s+ABCXYZ\s(.*)\1
^(.*?)\s+ABCXYZ\s(.*?)\1
See the regex demo.
Details:
^ - start of string
(.*?) - Group 1: zero or more chars other than line break chars as few as possible
\s+ - one or more whitespaces
ABCXYZ - some constant string
\s - a whitespace
(.*) - Group 2: zero or more chars other than line break chars as many as possible
\1 - the same value as in Group 1.
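A minimal sketch of the single-pass approach, assuming a switch to Python is acceptable: its re module is a backtracking engine that supports backreferences such as \1. The lazy variant of the pattern is used, and the variable names are only illustrative.

```python
import re

# Base string taken from the question.
base = (
    "repeatedRandomStr ABCXYZ /an/arbitrary/##-~/sequence/of_characters"
    "=I+WANT+TO+MATCH/repeatedRandomStr/the/rest/of/strings.etc"
)

# Group 1 captures the random prefix; \1 requires the same text to repeat.
pattern = re.compile(r"^(.*?)\s+ABCXYZ\s(.*?)\1")

m = pattern.search(base)
if m:
    print("prefix:", m.group(1))
    print("between:", m.group(2))
```

Group 2 holds everything between the whitespace after ABCXYZ and the next occurrence of the Group 1 text, including the slash right before it.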

Regex for URL path and query. (CodeScan warns of 'Polynomial regular expression')

I'm writing a regex to validate that some inputs are known link formats which I use on my site; an example would be /section/my-article-1?test=b
The requirements are
leading slash
the path just contains alphanumerics, dashes and slashes
query params are allowed
My regex is
/^((\/)[\dA-Za-z-]+)*(\/)?([&?=\dA-Za-z-])*$/;
This kinda works but it's not optimized.
Github CodeScan shows the warning 'Polynomial regular expression'
https://codeql.github.com/codeql-query-help/java/java-polynomial-redos/
I assume that's because the groups [\dA-Za-z-] and [&?=\dA-Za-z-] potentially could overlap and cause slowness. But I'm unsure of how to improve it while still allowing queryparams.
How would I optimize the regex?
Here's some testdata I've used
SHOULD MATCH
/
/section
/section/article-1
/section/article-1/
/section/article-1?x=y&hello=world
SHOULD NOT MATCH
section/article-1
/section/!$*
/x(1)
PS: my current regex does allow multiple slashes after each other, which is undesirable, so preventing that would also be a bonus.

See the Recommendation:
Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.
Change the pattern so that no two adjacent parts can match at the same location:
^(?:/(?:[0-9A-Za-z-]+/)*[&?=0-9A-Za-z-]*)?$
See the regex demo.
Here,
^ - start of string
(?:/(?:[0-9A-Za-z-]+/)*[&?=0-9A-Za-z-]*)? - an optional non-capturing group:
/ - a slash
(?:[0-9A-Za-z-]+/)* - zero or more repetitions of one or more alphanumeric or hyphen chars and then a / char
[&?=0-9A-Za-z-]* - zero or more alphanumeric, hyphen, =, ? or & chars
$ - end of string.
Just to show why it is efficient:
^ - matches start of string
(?:/(?:[0-9A-Za-z-]+/)*[&?=0-9A-Za-z-]*)? - either starts matching with / that is at the start of string, or matches an empty string
(?:[0-9A-Za-z-]+/)* - requires alphanumeric/hyphen to appear after a / ([0-9A-Za-z-] does not match /), and ends matching with a / if matched
[&?=0-9A-Za-z-]* - only matches the chars in the class if preceded with a / char that is missing from the character class (i.e. no overlapping here, either)
$ - matches end of string.
Now, if the warning is still there, you can safely ignore it.
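To check the pattern against the test data from the question, something like the following Python sketch could be used. The language choice and the extra `/section//article` case (for the consecutive-slash requirement in the PS) are assumptions added here, not part of the original test data.

```python
import re

# The pattern from the answer, unchanged.
pattern = re.compile(r"^(?:/(?:[0-9A-Za-z-]+/)*[&?=0-9A-Za-z-]*)?$")

should_match = [
    "/",
    "/section",
    "/section/article-1",
    "/section/article-1/",
    "/section/article-1?x=y&hello=world",
]
should_not_match = [
    "section/article-1",
    "/section/!$*",
    "/x(1)",
    "/section//article",  # hypothetical extra case: consecutive slashes
]

# Map every input to True (matched) or False (not matched).
results = {s: bool(pattern.search(s)) for s in should_match + should_not_match}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

Note that in Python `$` also matches just before a trailing newline, so input that may end with a line break would need `\Z` or re.fullmatch instead.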

How to find a particular string

I'm using Visual Studio 2017, and in a very long text file I'm searching for a particular function but am unable to find it.
Here's the regex I'm using:
c\.CreateMap\<(\w)+\,\s+Address\>
and I want to match these:
c.CreateMap<ClientAddress, Address>()
c.CreateMap<Responses.SiteAddress, Data.Address>()
and so on.
As soon as I add "Address" to the regex, it stops matching anything.
What am I doing wrong?

You can try this
c\.CreateMap\<\w+\.?\w+?\,\s*\w*?\.?Address\>
Explanation
c\.CreateMap\< - Matches the literal text c.CreateMap<.
\w+ - Matches a word character one or more times.
\.? - Matches '.' zero or one time.
\w+? - Matches a word character one or more times, as few as possible.
\, - Matches ','.
\s* - Matches whitespace zero or more times.
\w*? - Matches a word character zero or more times, as few as possible.
\.? - Matches '.' zero or one time.
Address\> - Matches the literal text Address>.
Demo
P.S- In case you also want to match something like this.
c.CreateMap<Responses.SiteAddress.abc, Data.Address.xyz>()
You can use this.
c\.CreateMap\<(\w+\.?\w+?)*\,\s*(?:\w*?\.?)*Address(\.\w*)?\>
Demo

Here is general regex I can suggest:
c\.CreateMap\<[\w.]+,\s+(?:[\w.]+\.)?Address\>\s*\(\s*\)
This will match any term with dots or word characters in the first position in the diamond. In the second position, it will match Address, or some parent class names, followed by a dot separator, followed by Address.
Demo
Note that I also include the empty function call parentheses in the regex. I also allow for flexibility in the whitespace that may appear after the diamond or between the parentheses.
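A quick way to try the general pattern outside Visual Studio is a small Python sketch like the one below. Python's re is assumed to behave like the .NET engine for this particular pattern, and the `c.CreateMap<Client, Order>()` line is a hypothetical non-matching example added for contrast.

```python
import re

# The general pattern from the answer, unchanged.
pattern = re.compile(
    r"c\.CreateMap\<[\w.]+,\s+(?:[\w.]+\.)?Address\>\s*\(\s*\)"
)

lines = [
    "c.CreateMap<ClientAddress, Address>()",                # from the question
    "c.CreateMap<Responses.SiteAddress, Data.Address>()",   # from the question
    "c.CreateMap<Client, Order>()",                         # hypothetical non-match
]

# Keep only the lines in which the pattern is found.
found = [line for line in lines if pattern.search(line)]
print(found)
```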

In your second example, there is an extra dot that your regex does not handle, so it needs a small modification. Also, you don't need to escape <, > or the comma. Use this:
c\.CreateMap<([\w.])+,\s+[\w.]*Address>
Demo

To match any of the functions in your question, you can use:
c\.CreateMap[^)]+\)
Regex Demo

Regex Validation to exclude a group

I have two scenarios where I need two regexes.
/vendors?(\-[a-z]*)*/
/vendor-staff?(\-[a-z]*)*/
My problem is that the first one interferes with the second one:
With the first one I need to capture cases like: vendor, vendor-add, vendor-edit, vendor-list;
The second one needs to capture only cases like: vendor-staff-add, vendor-staff-edit;
How can I do that? I tried several options without success.
I tried to validate those here: https://regexr.com/3uddc
Thank you

You may add a negative lookahead (?!-staf) after vendor in the first regex:
vendor(?!-staf)s?(-[a-z]*)*
To prevent consecutive hyphens, you need to replace the [a-z]* with [a-z]+ pattern:
vendor(?!-staf)s?(-[a-z]+)*
See the regex demo
Details
vendor - a literal substring
(?!-staf) - no -staf substring allowed right after vendor
s? - an optional s
(-[a-z]+)* - 0 or more occurrences of - and then 1+ lowercase ASCII letters.
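The effect of the lookahead can be checked with a short Python sketch. Using re.fullmatch to test whole names is an assumption of this example; the sample names come from the question.

```python
import re

# First regex with the negative lookahead added after "vendor".
vendor_re = re.compile(r"vendor(?!-staf)s?(-[a-z]+)*")

vendor_names = ["vendor", "vendor-add", "vendor-edit", "vendor-list"]
staff_names = ["vendor-staff-add", "vendor-staff-edit"]

# fullmatch requires the whole name to be consumed by the pattern.
vendor_hits = [n for n in vendor_names + staff_names if vendor_re.fullmatch(n)]
print(vendor_hits)
```

The vendor-staff names are rejected because -staf follows vendor directly, so they remain available for the second regex.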

Mixing Lookahead and Lookbehind in 1 Regexp

I'm trying to match first occurrence of window.location.replace("http://stackoverflow.com") in some HTML string.
Especially I want to capture the URL of the first window.location.replace entry in whole HTML string.
So for capturing the URL I formulated these 2 rules:
it should be after this string: window.location.redirect("
it should be before this string ")
To achieve it I think I need to use lookbehind (for 1st rule) and lookahead (for 2nd rule).
I ended up with this regex:
.+(?<=window\.location\.redirect\(\"?=\"\))
It doesn't work. I'm not even sure that it is legal to mix both rules like I did.
Can you please help me translate my rules into a regex? Other ways of doing this (without lookahead or lookbehind) are also appreciated.

The pattern you wrote is really not the one you need as it matches something very different from what you expect: text window.location.redirect("=") in text window.location.redirect("=") something. And it will only work in PCRE/Python if you remove the ? from before \" (as lookbehinds should be fixed-width in PCRE). It will work with ? in .NET regex.
If it is JS, be careful with lookbehind: it is only available in engines that implement ECMAScript 2018 or later, and older engines do not support it.
Instead, use a capturing group around the unknown part you want to get:
/window\.location\.redirect\("([^"]*)"\)/
or
/window\.location\.redirect\("(.*?)"\)/
See the regex demo
Omitting the /g modifier makes the regex match only the first occurrence. Access the value you need inside Group 1.
The ([^"]*) captures 0+ characters other than a double quote (URLs you need should not have it). If these URLs you have contain a ", you should use the second approach as (.*?) will match any 0+ characters other than a newline up to the first ").
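The capturing-group approach carries over to other languages as well. Below is a minimal Python sketch, assuming a made-up `html` string with two redirect calls; re.search stops at the first match, much like a JS regex without /g.

```python
import re

# Hypothetical HTML fragment with two redirect calls (not from the question).
html = (
    '<script>window.location.redirect("http://stackoverflow.com");'
    ' window.location.redirect("http://example.com");</script>'
)

# Group 1 captures everything between the quotes.
pattern = re.compile(r'window\.location\.redirect\("([^"]*)"\)')

# search returns only the first occurrence.
m = pattern.search(html)
first_url = m.group(1) if m else None
print(first_url)
```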
