Get regex to match multiple instances of the same pattern - regex

So I have this regex - regex101:
\[shortcode ([^ ]*)(?:[ ]?([^ ]*)="([^"]*)")*\]
Trying to match on this string
[shortcode contact param1="test 2" param2="test1"]
Right now, the regex matches this:
[contact, param2, test1]
I would like it to match this:
[contact, param1, test 2, param2, test1]
How can I get the regex to capture every instance of the parameter pattern, rather than just the last one?

You may use
'~(?:\G(?!^)\s+|\[shortcode\s+(\S+)\s+)([^\s=]+)="([^"]*)"~'
See the regex demo
Details
(?:\G(?!^)\s+|\[shortcode\s+(\S+)\s+) - either the end of the previous match and 1+ whitespaces right after (\G(?!^)\s+) or (|)
\[shortcode - literal string
\s+ - 1+ whitespaces
(\S+) - Group 1: one or more non-whitespace chars
\s+ - 1+ whitespaces
([^\s=]+) - Group 2: 1+ chars other than whitespace and =
=" - a literal substring
([^"]*) - Group 3: any 0+ chars other than "
" - a " char.
PHP demo
$re = '~(?:\G(?!^)\s+|\[shortcode\s+(\S+)\s+)([^\s=]+)="([^"]*)"~';
$str = '[shortcode contact param1="test 2" param2="test1"]';
$res = [];
if (preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0)) {
    foreach ($matches as $m) {
        // Drop the full match and keep only the capture groups.
        array_shift($m);
        // array_filter() skips the empty Group 1 of the follow-up matches.
        $res = array_merge($res, array_filter($m));
    }
}
print_r($res);
// => Array( [0] => contact [1] => param1 [2] => test 2 [3] => param2 [4] => test1 )
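For comparison, here is a minimal Python sketch of the same idea without \G (Python's standard re module does not support it). It swaps in a two-step technique: match the whole shortcode first, then scan the parameter part for every key="value" pair. The function name parse_shortcode is hypothetical, and the sketch assumes values never contain a " character.

```python
import re

# The pattern from the question: a repeated capture group keeps only its last iteration.
ORIGINAL = r'\[shortcode ([^ ]*)(?:[ ]?([^ ]*)="([^"]*)")*\]'

def parse_shortcode(text):
    """Return [name, key1, value1, key2, value2, ...], or [] if no shortcode is found."""
    # Step 1: locate the shortcode; capture its name and the raw parameter string.
    m = re.search(r'\[shortcode\s+(\S+)((?:\s+[^\s=]+="[^"]*")*)\s*\]', text)
    if m is None:
        return []
    result = [m.group(1)]
    # Step 2: walk over every key="value" pair inside the parameter string.
    for key, value in re.findall(r'([^\s=]+)="([^"]*)"', m.group(2)):
        result.extend([key, value])
    return result

sample = '[shortcode contact param1="test 2" param2="test1"]'
last_only = re.search(ORIGINAL, sample).groups()
print(last_only)                 # only the last parameter pair survives
print(parse_shortcode(sample))   # every parameter pair is returned
```

The first print shows why the original pattern loses param1: each repetition of the group overwrites the previous capture.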

Try using the below regex.
regex101
Below is your use case,
var testString = '[shortcode contact param1="test 2" param2="test1"]';
var regex = /[\w\s]+(?=[\="]|\")/gm;
var found = testString.match(regex);
If you log found you will see the result as
["shortcode contact param1", "test 2", " param2", "test1"]
The regex matches runs of word characters (letters, digits and the underscore) and whitespace, but only when they are followed by = or ".
I hope this helps.

Related

Regex to match a pattern within quotes

I'm trying to match and substitute a pattern.
Test String: {1-Emp Name: "John", "2-Emp pat" : 1123,"3-Emp lwd" : 20}, "4-Emp Pat" : 1234}
I'm trying to match the pattern with the word "pat" from the test string and substitute
Expected Result: {1-Emp Name: "John", "matched Pattern" : 1123,"3-Emp lwd" : 20}, "matched Pattern" : 1234}
My regex: ".+?(?i)Pat.+?(?=:)
You can use
Regex pattern: (?i)"[^"]* Pat\b[^"]*("\s*:)
Replacement pattern: "matched pattern$1
See the regex demo. Details:
(?i) - case insensitive inline modifier
" - a " char
[^"]* - zero or more chars other than "
(space)Pat - a literal space followed by Pat
\b - word boundary
[^"]* - zero or more chars other than "
("\s*:) - Group 1 ($1): ", zero or more whitespaces, :.
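As a rough illustration, the same pattern and replacement can be tried in Python. This is only a sketch: the function name replace_pat_keys is hypothetical, the inline (?i) is expressed as re.IGNORECASE, and $1 becomes \1 in Python's replacement syntax.

```python
import re

# Same pattern as above; re.IGNORECASE stands in for the inline (?i) modifier.
PAT_KEY = re.compile(r'"[^"]* Pat\b[^"]*("\s*:)', re.IGNORECASE)

def replace_pat_keys(text):
    """Replace every quoted key containing the word 'Pat' with "matched pattern"."""
    # \1 re-inserts Group 1: the closing quote, optional whitespace and the colon.
    return PAT_KEY.sub(r'"matched pattern\1', text)

test_string = '{1-Emp Name: "John", "2-Emp pat" : 1123,"3-Emp lwd" : 20}, "4-Emp Pat" : 1234}'
print(replace_pat_keys(test_string))
```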

Match every thing between "****" or [****]

I have a regex that look like this:
(?="(test)"\s*:\s*(".*?"|\[.*?]))
to match the value between "..." or [...]
Input
"test":"value0"
"test":["value1", "value2"]
Output
Group1 Group2
test value0
test "value1", "value2" // or - value1", "value2
Is there any trick to ignore the "" and [] and stick with two groups, Group 1 and Group 2?
I tried (?="(test)"\s*:\s*(?="(.*?)"|\[(.*?)])) but this gives me 4 groups, which is not good for me.
You may use this regex in PHP with a branch reset group:
"(test)"\h*:\h*(?|"([^"]*)"|\[([^]]*)])
This gives you two capture groups for both inputs, whether the value is enclosed in "..." or [...].
RegEx Demo
RegEx Details:
(?|..) is a branch reset group. Subpatterns declared within each alternative of this construct start over from the same index.
(?|"([^"]*)"|\[([^]]*)]) is a branch reset group with two alternatives: "([^"]*)" is tried first, otherwise \[([^]]*)] is used. Either way, the value ends up in Group 2.
You can use a pattern like
"(test)"\s*:\s*\K(?|"\K([^"]*)|\[\K([^]]*))
See the regex demo.
Details:
" - a " char
(test) - Group 1: test word
" - a " char
\s*:\s* - a colon enclosed with zero or more whitespaces
\K - match reset operator that clears the current overall match memory buffer (group value is still kept intact)
(?|"\K([^"]*)|\[\K([^]]*)) - a branch reset group:
"\K([^"]*) - matches a ", then discards it, and then captures into Group 2 zero or more chars other than "
| - or
\[\K([^]]*) - matches a [, then discards it, and then captures into Group 2 zero or more chars other than ]
In Java, you can't use \K or (?|...), so use separate capturing groups:
String s = "\"test\":[\"value1\", \"value2\"]";
Pattern pattern = Pattern.compile("\"(test)\"\\s*:\\s*(?:\"([^\"]*)|\\[([^\\]]*))");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
    System.out.println("Key: " + matcher.group(1));
    if (matcher.group(2) != null) {
        System.out.println("Value: " + matcher.group(2));
    } else {
        System.out.println("Value: " + matcher.group(3));
    }
}
See a Java demo.
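Python's standard re module also lacks \K and branch reset groups, so the same fallback applies there. The sketch below is an assumption-laden illustration rather than a definitive solution: key_and_value is a hypothetical helper, and it assumes values contain neither " nor ].

```python
import re

# Two separate value groups: Group 2 for "..." values, Group 3 for [...] values.
PAIR = re.compile(r'"(test)"\s*:\s*(?:"([^"]*)"|\[([^\]]*)\])')

def key_and_value(text):
    """Return (key, value) for the first match, or None when nothing matches."""
    m = PAIR.search(text)
    if m is None:
        return None
    # Exactly one of the two value groups participates in a match.
    value = m.group(2) if m.group(2) is not None else m.group(3)
    return m.group(1), value

print(key_and_value('"test":"value0"'))
print(key_and_value('"test":["value1", "value2"]'))
```

Checking for None rather than truthiness keeps an empty value such as "test":"" from falling through to the wrong group.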

Scala regex : capture between group

In the regex below I need "test" as the output, but it returns the complete string that matches the regex. How can I capture the string between two groups?
val pattern = """\{outer.*\}""".r
println(pattern.findAllIn(s"try {outer.test}").matchData.map(step => step.group(0)).toList.mkString)
Input : "try {outer.test}"
expected Output : test
current output : {outer.test}
You may capture that part using:
val pattern = """\{outer\.([^{}]*)\}""".r.unanchored
val s = "try {outer.test}"
val result = s match {
  case pattern(i) => i
  case _ => ""
}
println(result)
The pattern matches
\{outer\. - a literal {outer. substring
([^{}]*) - Capturing group 1: zero or more (*) chars other than { and } (see [^{}] negated character class)
\} - a } char.
NOTE: if your regex must match the whole string, remove the .unanchored I added to also allow partial matches inside a string.
See the Scala demo online.
Or, you may change the pattern so that the first part is no longer a consuming pattern (it matches a string of fixed length, so a lookbehind is possible):
val pattern = """(?<=\{outer\.)[^{}]*""".r
val s = "try {outer.test}"
println(pattern.findFirstIn(s).getOrElse(""))
// => test
See this Scala demo.
Here, (?<=\{outer\.), a positive lookbehind, matches {outer. but does not put it into the match value.
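The two approaches (capture group versus lookbehind) behave the same way in other regex flavors. A small Python sketch, with variable names chosen only for illustration:

```python
import re

text = "try {outer.test}"

# Option 1: consume the delimiters, but read only capture group 1.
captured = re.search(r"\{outer\.([^{}]*)\}", text)
inner_from_group = captured.group(1) if captured else ""

# Option 2: a fixed-width lookbehind keeps "{outer." out of the match value.
looked_behind = re.search(r"(?<=\{outer\.)[^{}]*", text)
inner_from_lookbehind = looked_behind.group(0) if looked_behind else ""

print(inner_from_group, inner_from_lookbehind)
```

Note that the lookbehind variant does not check for the closing }, so the capture-group variant is the stricter of the two.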

Extract groups separated by space

I've got following string (example):
Loader[data-prop data-attr="value"]
There can be 1 to n attributes. I want to extract every attribute (data-prop, data-attr="value"). I tried it in many different ways, for example with \[(?:(\S+)\s)*\], but I didn't get it right. The expression should be written in PREG style.
I suggest grabbing all the key-value pairs with a regex:
'~(?:([^][]*)\b\[|(?!^)\G)\s*(\w+(?:-\w+)*(?:=(["\'])?[^\]]*?\3)?)~'
(see regex demo) and then merging the captured groups into one array. See the IDEONE demo:
$re = '~(?:([^][]*)\b\[|(?!^)\G)\s*(\w+(?:-\w+)*(?:=(["\'])?[^\]]*?\3)?)~';
$str = "Loader[data-prop data-attr=\"value\" more-here='data' and-one-more=\"\"]";
preg_match_all($re, $str, $matches);
$arr = array();
for ($i = 0; $i < count($matches); $i++) {
    if ($i != 0) {
        $arr = array_merge(array_filter($matches[$i]),$arr);
    }
}
print_r(preg_grep('~\A(?![\'"]\z)~', $arr));
Output:
Array
(
    [3] => data-prop
    [4] => data-attr="value"
    [5] => more-here='data'
    [6] => and-one-more=""
    [7] => Loader
)
Notes on the regex (it only looks too complex):
(?:([^][]*)\b\[|(?!^)\G) - a boundary: we only start matching at a [ that is preceded with a word (a-zA-Z0-9_) character (with \b\[), or right after a successful match (with (?!^)\G). Also, ([^][]*) will capture into Group 1 the part before the [.
\s* - matches zero or more whitespace symbols
(\w+(?:-\w+)*) - captures into Group 2 "words" like "word1" or "word1-word2"..."word1-wordn"
(?:=(["\'])?[^\]]*?\3)? - optional group (due to (?:...)?) matching
= - an equal sign
(["\'])? - Group 3 (auxiliary group to check for the value delimiter) capturing either ", ' or nothing
[^\]]*? - (value) zero or more characters other than ] as few as possible
\3 - the closing ' or " (the same value captured in Group 3).
Since we cannot get rid of capturing ' or ", we can preg_grep all the elements that we are not interested in with preg_grep('~\A(?![\'"]\z)~', $arr) where \A(?![\'"]\z) matches any string that is not equal to ' or ".
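If \G is not available (Python's standard re module lacks it, for instance), a two-step approach gives a similar result: grab the bracket contents first, then list the attributes inside. This is a sketch of a swapped-in technique, not the answer's method. The names ATTRIBUTE and extract_attributes are hypothetical, and the attribute grammar is an assumption that breaks if a quoted value contains a ] character.

```python
import re

# One attribute: a name made of word chars and hyphens, optionally followed by
# ="...", ='...' or an unquoted value.
ATTRIBUTE = re.compile(r"""[\w-]+(?:=(?:"[^"]*"|'[^']*'|[^\s\]]+))?""")

def extract_attributes(text):
    """Return (name, [attributes]) for the first Name[...] group, or None."""
    # Step 1: capture the name before "[" and everything up to the next "]".
    outer = re.search(r"(\w+)\[([^\]]*)\]", text)
    if outer is None:
        return None
    # Step 2: list every attribute inside the brackets (no capture groups,
    # so findall returns the full text of each attribute).
    return outer.group(1), ATTRIBUTE.findall(outer.group(2))

sample = 'Loader[data-prop data-attr="value" more-here=\'data\' and-one-more=""]'
name, attrs = extract_attributes(sample)
print(name)
print(attrs)
```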
How about something like [\s\[]([^\s\]]+(="[^"]+)*)+
It gives:
MATCH 1: data-prop
MATCH 2: data-attr="value"

Why won't Groovy honor "OR" instances in my regex?

It is well established that "|" in a regex is the "OR" operator. So when I run this:
static void main(String[] args) {
    String permission = "[fizz]:[index]"
    if((permission =~ /\[fizz|buzz]:\[.*]/).matches()) {
        println "We match!"
    } else {
        println "We don't match!"
    }
}
...then why does it print "We don't match!"???
The regex \[fizz|buzz]:\[.*] matches:
\[fizz - literal [ followed by fizz
| - OR operator....
buzz]:\[ - matches literal buzz]:[
.* - any character but a newline, as many times as possible, greedy
] - a literal ].
I think you need to re-group the alternatives:
if((permission =~ /\[(?:fizz|buzz)]:\[[^\]]*]/).matches()) {
Here, \[(?:fizz|buzz)]:\[[^\]]*] will match a [, then either fizz or buzz without capturing the words, then ]:[, [^\]]* will match 0 or more any characters but a ] and then ].
Check the regex101 demo.
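The effect of the missing group can also be reproduced outside Groovy. In this Python sketch, re.fullmatch plays the role of matches(), since both require the whole string to match; the variable names are illustrative only.

```python
import re

permission = "[fizz]:[index]"

# Without grouping, | splits the WHOLE pattern into "\[fizz" and "buzz]:\[.*]".
ungrouped = re.fullmatch(r"\[fizz|buzz]:\[.*]", permission)

# With a non-capturing group, | only chooses between fizz and buzz.
grouped = re.fullmatch(r"\[(?:fizz|buzz)]:\[[^\]]*]", permission)

print(ungrouped is None)     # neither alternative covers the full string
print(grouped is not None)   # the regrouped pattern matches
```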