Regex to match a pattern within quotes

I'm trying to match and substitute a pattern.
Test String: {1-Emp Name: "John", "2-Emp pat" : 1123,"3-Emp lwd" : 20}, "4-Emp Pat" : 1234}
I'm trying to match the quoted keys that contain the word "pat" in the test string and substitute them.
Expected Result: {1-Emp Name: "John", "matched Pattern" : 1123,"3-Emp lwd" : 20}, "matched Pattern" : 1234}
My regex: ".+?(?i)Pat.+?(?=:)

You can use
Regex pattern: (?i)"[^"]* Pat\b[^"]*("\s*:)
Replacement pattern: "matched Pattern$1
See the regex demo. Details:
(?i) - case insensitive inline modifier
" - a " char
[^"]* - zero or more chars other than "
Pat - a space followed by the word Pat (matched case-insensitively because of (?i))
\b - word boundary
[^"]* - zero or more chars other than "
("\s*:) - Group 1 ($1): ", zero or more whitespaces, :.
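Here is a minimal C++ sketch of the same substitution with std::regex, assuming C++11 or later. The default ECMAScript grammar has no inline (?i) modifier, so the sketch passes the std::regex::icase flag instead. The replacement uses the "matched Pattern" capitalization from the expected result. The function name replacePatKeys is only illustrative.

```cpp
#include <regex>
#include <string>

// Replaces every quoted key containing the word "Pat" (any case),
// e.g. "2-Emp pat" or "4-Emp Pat", with "matched Pattern".
// std::regex::icase stands in for the inline (?i) modifier, which the
// default ECMAScript grammar of std::regex does not support.
std::string replacePatKeys(const std::string& input) {
    static const std::regex re(R"re("[^"]* Pat\b[^"]*("\s*:))re",
                               std::regex::ECMAScript | std::regex::icase);
    // $1 re-inserts Group 1: the closing quote, optional whitespace and the colon.
    return std::regex_replace(input, re, R"("matched Pattern$1)");
}
```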

Related

Regex expression to remove only the first two quotes in every line of text

I need to remove only the first two quotes on every line:
"key0": "some very long text",
"key1": "some very long text",
into
key0: "some very long text",
key1: "some very long text",
I tried [^"]{2} "{2}
You can use
Find What: ^([^"]*)"([^"]*)"
Or, to avoid overmatching when some lines contain no quotes (otherwise [^"]* can run across line breaks into the following lines):
Find What: ^([^"\r\n]*)"([^"\r\n]*)"
Replace With: $1$2
See the regex demo.
Details:
^ - start of a line
([^"]*) - Group 1 ($1): zero or more chars other than a " char ([^"\r\n]* matches any zero or more chars other than ", CR and LF)
" - a " char
([^"]*) - Group 2 ($2): zero or more chars other than a " char
" - a " char
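Here is a minimal C++ sketch of the same replacement with std::regex. In the default ECMAScript grammar, ^ only matches at the very start of the input (unless the C++17 multiline flag is used). For that reason, this version applies ^([^"]*)"([^"]*)" to one line at a time. Working line by line also means [^"]* can never cross a line break, which gives the same effect as the [^"\r\n]* variant. It assumes \n line endings, and stripFirstTwoQuotes is a hypothetical name.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Removes the first two double quotes on each line of `text`.
// Lines are processed one at a time, so ^ means "start of line" and
// [^"]* cannot run past a line break. Assumes '\n' line endings;
// a trailing newline at the very end of `text` is not preserved.
std::string stripFirstTwoQuotes(const std::string& text) {
    static const std::regex re(R"re(^([^"]*)"([^"]*)")re");
    std::istringstream in(text);
    std::string line;
    std::string out;
    bool first = true;
    while (std::getline(in, line)) {
        if (!first) out += '\n';
        // Keep Group 1 and Group 2; the two matched quotes are dropped.
        out += std::regex_replace(line, re, "$1$2");
        first = false;
    }
    return out;
}
```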

regex to match all whitespace except those between words and surrounding hyphens?

I'd like to sanitize a string so that all whitespace is removed, except whitespace between words and around hyphens:
1234 - Text | OneWord , Multiple Words | Another Text , 456 -> 1234 - Text|OneWord,Multiple Words|Another Text,456
std::regex regex(R"(\B\s+|\s+\B)"); // replace whitespace that is not between two words with *
auto newStr = std::regex_replace(str, regex, "*");
newStr = std::regex_replace(newStr, std::regex(R"(\*-\*)"), " - ");
newStr = std::regex_replace(newStr, std::regex(R"(\*)"), "");
This is what I currently use, but it is rather ugly, and I'm wondering if there is a regex I can use to do this in one go.
You can use
(\s+-\s+|\b\s+\b)|\s+
Replace with $1, the backreference to the value captured in Group 1 (it is empty when the second alternative, \s+, matched). See the regex demo. Details:
(\s+-\s+|\b\s+\b) - Group 1: a - with one or more whitespaces on both sides, or one or more whitespaces in between word boundaries
| - or
\s+ - one or more whitespaces.
See the C++ demo:
#include <iostream>
#include <regex>
#include <string>
int main() {
    std::string s("1234 - Text | OneWord , Multiple Words | Another Text , 456");
    std::regex reg(R"((\s+-\s+|\b\s+\b)|\s+)");
    std::cout << std::regex_replace(s, reg, "$1") << std::endl; // => 1234 - Text|OneWord,Multiple Words|Another Text,456
}

Regex: Match a pattern within quoted texts

The text is:
lemma A:
"
abx K() bc
"
// comment lemma B
lemma B:
"
abx bc sdsf
"
lemma C:
"
abfdfx K() bc
"
lemma D:
"
abxsf bc
"
I want to find the lemmas whose following quoted text contains K(). I have tried the Perl regex (?s)^[ ]*lemma.*?"(?!").*?K\( but a single match spans two lemmas. The output should be: lemma A: "..." and lemma C: "...".
If the opening double quote is at the start of the line right after the lemma line, you can match a newline and then the double quote.
Then match any char except the double quote until you match K(
^[ ]*lemma\b.*\R"[^"]*K\(
^ Start of a line (this requires multiline mode, e.g. the m flag)
[ ]*lemma\b Match optional spaces and lemma
.*\R Match the rest of the line and a newline
"[^"]* Match " followed by optional chars other than "
K\( Match K(
Regex demo
You could use:
(?s)^[ ]*lemma[^"]*"[^"]*?K\(
[^"] means any character except "
See a demo here
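std::regex supports neither \R nor Perl's inline (?s)/(?m) modifiers, so this C++ sketch adapts the first answer's idea. (?:^|\n) stands in for a line-anchored ^, and \n stands in for \R, so it assumes \n line endings. It collects the header lines of the lemmas whose quoted block contains K(. The name lemmasWithK is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the header line (e.g. "lemma A:") of every lemma whose quoted
// block, opening on the line right after the header, contains K(.
// (?:^|\n) replaces a multiline ^ and \n replaces \R, because the
// ECMAScript grammar of std::regex has no \R and its ^ only matches at
// the start of the input unless the C++17 multiline flag is set.
std::vector<std::string> lemmasWithK(const std::string& text) {
    static const std::regex re(R"re((?:^|\n)[ ]*(lemma\b[^\n]*)\n"[^"]*K\()re");
    std::vector<std::string> headers;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        // Group 1 holds the rest of the lemma header line.
        headers.push_back((*it)[1].str());
    }
    return headers;
}
```

Because [^"]* cannot pass a closing quote, a match can never run from one lemma's block into the next one, which was the overlap problem in the question.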

Match everything between "****" or [****]

I have a regex that looks like this:
(?="(test)"\s*:\s*(".*?"|\[.*?]))
to match the value between "..." or [...]
Input
"test":"value0"
"test":["value1", "value2"]
Output
Group1 Group2
test value0
test "value1", "value2" // or - value1", "value2
Is there any trick to ignore the "" and [] and stick with two groups, group1 and group2?
I tried (?="(test)"\s*:\s*(?="(.*?)"|\[(.*?)])) but this gives me 4 groups, which is not good for me.
You may use this regex with a branch reset group in PHP:
"(test)"\h*:\h*(?|"([^"]*)"|\[([^]]*)])
This gives you exactly two capture groups for both inputs, whether the value is enclosed in "..." or [...].
RegEx Demo
RegEx Details:
(?|..) is a branch reset group: subpatterns declared within each alternative of this construct start numbering from the same index
(?|"([^"]*)"|\[([^]]*)]) matches either a double-quoted value, "([^"]*)", or a bracketed value, \[([^]]*)]; thanks to the branch reset, the value is captured into Group 2 in both cases
You can use a pattern like
"(test)"\s*:\s*\K(?|"\K([^"]*)|\[\K([^]]*))
See the regex demo.
Details:
" - a " char
(test) - Group 1: test word
" - a " char
\s*:\s* - a colon enclosed with zero or more whitespaces
\K - match reset operator that clears the current overall match memory buffer (group value is still kept intact)
(?|"\K([^"]*)|\[\K([^]]*)) - a branch reset group:
"\K([^"]*) - matches a ", then discards it, and then captures into Group 2 zero or more chars other than "
| - or
\[\K([^]]*) - matches a [, then discards it, and then captures into Group 2 zero or more chars other than ]
Java supports neither \K nor branch reset groups ((?|...)), so use separate capturing groups and check which one participated in the match:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "\"test\":[\"value1\", \"value2\"]";
        Pattern pattern = Pattern.compile("\"(test)\"\\s*:\\s*(?:\"([^\"]*)|\\[([^\\]]*))");
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            System.out.println("Key: " + matcher.group(1));
            if (matcher.group(2) != null) {
                System.out.println("Value: " + matcher.group(2));
            } else {
                System.out.println("Value: " + matcher.group(3));
            }
        }
    }
}
See a Java demo.
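The Java approach carries over to C++ std::regex, which likewise has no \K and no branch reset groups. This is a minimal sketch that assumes one "test" entry per input string. parseTestEntry is a hypothetical name, and it returns a pair of empty strings when nothing matches.

```cpp
#include <regex>
#include <string>
#include <utility>

// Returns {key, value} for inputs like "test":"value0" or "test":["a", "b"].
// The quoted value lands in Group 2 and the bracketed value in Group 3;
// whichever group participated in the match is returned as the value.
std::pair<std::string, std::string> parseTestEntry(const std::string& s) {
    static const std::regex re(R"re("(test)"\s*:\s*(?:"([^"]*)|\[([^\]]*)))re");
    std::smatch m;
    if (!std::regex_search(s, m, re)) return {"", ""};
    const std::string value = m[2].matched ? m[2].str() : m[3].str();
    return std::make_pair(m[1].str(), value);
}
```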

Get regex to match multiple instances of the same pattern

So I have this regex - regex101:
\[shortcode ([^ ]*)(?:[ ]?([^ ]*)="([^"]*)")*\]
Trying to match on this string
[shortcode contact param1="test 2" param2="test1"]
Right now, the regex matches this:
[contact, param2, test1]
I would like it to match this:
[contact, param1, test 2, param2, test1]
How can I get regex to match the first instance of the parameters pattern, rather than just the last?
You may use
'~(?:\G(?!^)\s+|\[shortcode\s+(\S+)\s+)([^\s=]+)="([^"]*)"~'
See the regex demo
Details
(?:\G(?!^)\s+|\[shortcode\s+(\S+)\s+) - either the end of the previous match followed by 1+ whitespaces (\G(?!^)\s+), or (|) the following sequence:
\[shortcode - literal string
\s+ - 1+ whitespaces
(\S+) - Group 1: one or more non-whitespace chars
\s+ - 1+ whitespaces
([^\s=]+) - Group 2: 1+ chars other than whitespace and =
=" - a literal substring
([^"]*) - Group 3: any 0+ chars other than "
" - a " char.
PHP demo
$re = '~(?:\G(?!^)\s+|\[shortcode\s+(\S+)\s+)([^\s=]+)="([^"]*)"~';
$str = '[shortcode contact param1="test 2" param2="test1"]';
$res = [];
if (preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0)) {
    foreach ($matches as $m) {
        array_shift($m);
        $res = array_merge($res, array_filter($m));
    }
}
print_r($res);
// => Array( [0] => contact [1] => param1 [2] => test 2 [3] => param2 [4] => test1 )
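std::regex has no \G, but the std::regex_constants::match_continuous flag can play the same role. This flag only lets a search succeed if the match begins at the start of the searched range, so restarting each search by hand from the end of the previous match reproduces \G. Here is a minimal C++ sketch of that idea, with the hypothetical name parseShortcode. It returns the shortcode name followed by each parameter name and value.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects [name, param1, value1, param2, value2, ...] from a shortcode.
// Note: (\S+) stops only at whitespace, so for "[shortcode contact]"
// with no parameters it would capture "contact]".
std::vector<std::string> parseShortcode(const std::string& s) {
    static const std::regex head(R"re(\[shortcode\s+(\S+))re");
    static const std::regex param(R"re(\s+([^\s=]+)="([^"]*)")re");
    std::vector<std::string> out;
    std::smatch m;
    if (!std::regex_search(s, m, head)) return out;
    out.push_back(m[1].str());
    auto it = m[0].second;  // end of the previous match, like \G
    // match_continuous: each parameter must start exactly at `it`.
    while (std::regex_search(it, s.cend(), m, param,
                             std::regex_constants::match_continuous)) {
        out.push_back(m[1].str());
        out.push_back(m[2].str());
        it = m[0].second;
    }
    return out;
}
```

Unlike the array_filter call in the PHP demo, this keeps an empty parameter value such as param3="" as an empty string.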
Try using the following regex:
regex101
Here it is applied to your string:
var testString = '[shortcode contact param1="test 2" param2="test1"]';
var regex = /[\w\s]+(?=[\="]|\")/gm;
var found = testString.match(regex);
If you log found you will see the result as
["shortcode contact param1", "test 2", " param2", "test1"]
The regex matches every run of word characters (letters, digits and the underscore) and whitespace, but only if it is immediately followed by = or ".
I hope this helps.