RegEx - grouping a string - regex

Can't seem to figure out an expression which handles this line of text:
'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'
and splits it into these groupings:
SOME_TEXT
EVEN_MORE_TEXT
EXPRESSION IS IN ('YES', 'NO')
I'd rather have a nifty regex than solve this with string functions like indexOf(), etc.

The regex '([^']|'')+' matches the parts you're interested in (the possessive form '([^']|'')++' gives the same result here), as this demo shows:
$text = "'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'";
preg_match_all("/'([^']|'')+'/", $text, $matches);
print_r($matches[0]);
which prints:
Array
(
[0] => 'SOME_TEXT'
[1] => 'EVEN_MORE_TEXT'
[2] => 'EXPRESSION IS IN (''YES'',''NO'')'
)
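The matches above still carry their outer quotes and the doubled inner quotes. A minimal Python sketch of one way to get the unquoted groupings asked for in the question, using the same pattern with a capturing group around the field body. The helper name split_quoted is made up for illustration.

```python
import re

def split_quoted(text):
    """Return each single-quoted field without its outer quotes, with '' unescaped."""
    # Same idea as the pattern above: a field body is any run of
    # non-quote characters or doubled quotes. The outer group captures the body.
    fields = re.findall(r"'((?:[^']|'')+)'", text)
    # Collapse each escaped quote ('') into a single quote.
    return [field.replace("''", "'") for field in fields]

text = "'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'"
for field in split_quoted(text):
    print(field)
```

The third field comes out as EXPRESSION IS IN ('YES','NO').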

Related

php separate strings with delimiters

I have a string, for example:
$stringExample = "(({FAPAGE15}+500)/{GOGA:V18})";
// extract the content between { and }
I need the result to be something like this:
$response = array("FAPAGE15", "GOGA:V18");
I assume it must be something with preg_split or preg_match.
Here's the regex you need:
\{(.*?)\}
Regex example:
http://regex101.com/r/qU8eB0
PHP:
$str = "(({FAPAGE15}+500)/{GOGA:V18})";
preg_match_all("/\{(.*?)\}/", $str, $matches);
print_r($matches[1]);
Output:
Array
(
[0] => FAPAGE15
[1] => GOGA:V18
)
Working Example:
https://eval.in/92516
You can use a negated character class: [^}] (anything that is not a })
preg_match_all('~(?<={)[^}]++(?=})~', $str, $matches);
$result = $matches[0];
pattern details
~ # pattern delimiter
(?<={) # preceded by {
[^}]++ # all that is not a } one or more times (possessive)
(?=}) # followed by }
~ # pattern delimiter
Note: the possessive quantifier ++ is not essential for getting the right result and can be replaced by +.
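For comparison, a rough Python sketch of the same negated-character-class idea. It uses a capturing group instead of lookarounds, and a plain + because Python's re module only supports possessive quantifiers from version 3.11 on. The helper name extract_braced is hypothetical.

```python
import re

def extract_braced(text):
    """Return the contents of every {...} placeholder in text."""
    # [^}]+ consumes everything up to, but not including, the closing brace.
    return re.findall(r"\{([^}]+)\}", text)

print(extract_braced("(({FAPAGE15}+500)/{GOGA:V18})"))
```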

Optionally prevent a string at the end of a wildcard from being matched

I have the following string:
12345 This could be anythingREMOVE
I need to match 12345 and This could be anything. Unfortunately, the format I need to parse also has a string at the end of the line that isn't always present (REMOVE in this example). How can I match what I'm looking for without REMOVE? I've tried the following pattern:
^(\d{5}) (.*)(?:REMOVE|$)
Unfortunately, REMOVE is picked up by the wildcard:
(
[0] => Array
(
[0] => 12345 This could be anythingREMOVE
)
[1] => Array
(
[0] => 12345
)
[2] => Array
(
[0] => This could be anythingREMOVE
)
)
If the trailing string REMOVE is optional, why not use this regex:
"/^(\d{5}) /"
However, if you really want to keep REMOVE out of the captured text, use this:
$s = '12345 This could be anythingREMOVE';
if (preg_match("/^(\d{5}) (.*?)(?:REMOVE|)$/", $s, $arr)) {
    var_dump($arr);
}
Output:
array(3) {
[0]=>
string(34) "12345 This could be anythingREMOVE"
[1]=>
string(5) "12345"
[2]=>
string(22) "This could be anything"
}
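The pattern works because the lazy (.*?) gives up characters as early as it can, while the $ anchor forces the rest of the line to be either REMOVE or nothing. A small Python sketch of the same idea, checking both the line with the suffix and the line without it. The names LINE and parse_line are assumptions for illustration.

```python
import re

# Lazy wildcard, optional literal suffix, anchored at the end of the line.
LINE = re.compile(r"^(\d{5}) (.*?)(?:REMOVE)?$")

def parse_line(line):
    """Return (digits, text) with any trailing REMOVE left out, or None."""
    m = LINE.match(line)
    return (m.group(1), m.group(2)) if m else None

print(parse_line("12345 This could be anythingREMOVE"))
print(parse_line("12345 This could be anything"))
```

Both calls return the same pair, ('12345', 'This could be anything').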
You can try this regex (note the literal space after the digits, so the second group does not start with a space):
^(\d{5}) ((?:.(?!REMOVE))+.)
How It Works
^(\d{5}) -- Matches the start of the string, followed by five digits [0-9]. The parentheses capture the matched text.
((?:.(?!REMOVE))+ -- Matches any character that is not immediately followed by the sequence REMOVE, one or more times. It stops at the n in anything; it can't match the g because the g is followed by REMOVE.
.) -- Allows the g to match.
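A short Python sketch of this lookahead technique, written with a literal space after the digits so the second group does not begin with one. TEMPERED and parse_tempered are hypothetical names.

```python
import re

# Each character is consumed only if REMOVE does not start right after it;
# the final "." then picks up the one character the loop had to leave behind.
TEMPERED = re.compile(r"^(\d{5}) ((?:.(?!REMOVE))+.)")

def parse_tempered(line):
    """Return (digits, text) stopping before REMOVE, or None if no match."""
    m = TEMPERED.match(line)
    return m.groups() if m else None

print(parse_tempered("12345 This could be anythingREMOVE"))
print(parse_tempered("12345 This could be anything"))
```

Because of the trailing ., this pattern needs at least two characters after the space; a one-character text does not match.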

RegExp pattern to capture around two-characters delimiter

I have a string which is something like:
prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::
I want to retrieve the value associated to a key (say, key1). The following pattern:
::key1==([^:]*)
...will work only if there is no ':' character in the value, so I want to make sure the match stops only at the substring ::, but I can't find out how to do that, as most examples I see are about single-character matching.
How do I modify the regexp pattern to match all characters between "::key1==" and the next "::" ?
Thanks!
Can you do something like this: ::key1==(.*?):: ? Assuming your regex flavor supports the lazy quantifier ?, this should work.
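A minimal Python sketch of the lazy-quantifier suggestion, on a value that contains a single ':'. The helper name get_value and the sample data are assumptions for illustration.

```python
import re

def get_value(data, key):
    """Return the text between '::<key>==' and the next '::', or None."""
    # re.escape guards against regex metacharacters in the key;
    # (.*?) is lazy, so it stops at the first following "::".
    m = re.search("::" + re.escape(key) + "==(.*?)::", data)
    return m.group(1) if m else None

data = "prefix::key0==value0::key1==va:lue1::key2==value2::"
print(get_value(data, "key1"))
```

Here the single ':' inside va:lue1 no longer ends the match; only the two-character delimiter does.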
As mentioned in my comment to your question, if the entirety of your string is
prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::
I would suggest exploding/splitting the string at :: instead of using a regex, as it will usually be faster. You didn't specify a language, but here is a PHP example:
// string
$string = "prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::";
// explode using :: as delimiter
$parts = explode('::', $string);
// collect the key==value pairs here
$matches = array();
// for each element...
foreach ($parts as $value) {
    // keep it only if it has == in it
    if (strpos($value, '==') !== false) $matches[] = $value;
}
// output
echo "<pre>"; print_r($matches);
output:
Array
(
[0] => key0==value0
[1] => key1==value1
[2] => key2==value2
[3] => key3==value3
[4] => key4==value4
)
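The same split-based idea, sketched in Python and taken one step further so the value for a given key can be looked up directly. The function name parse_pairs is made up for illustration, and, like the version above, it assumes no value contains ::.

```python
def parse_pairs(data):
    """Split on '::' and return a dict of the key==value chunks."""
    pairs = {}
    for chunk in data.split("::"):
        # partition returns an empty separator when "==" is absent,
        # which filters out "prefix" and the empty trailing chunk.
        key, sep, value = chunk.partition("==")
        if sep:
            pairs[key] = value
    return pairs

data = "prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::"
print(parse_pairs(data)["key1"])
```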
However, if you insist on the regex approach, here is a negative look-ahead alternative:
::((?:(?!::).)+)
php example
// string
$string = "prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::";
preg_match_all('~::((?:(?!::).)+)~',$string,$matches);
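// output: $matches[1] holds the captured groups ($matches[0] holds the full matches, including the leading ::)
echo "<pre>";print_r($matches[1]);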
output
Array
(
[0] => key0==value0
[1] => key1==value1
[2] => key2==value2
[3] => key3==value3
[4] => key4==value4
)
I think you're looking for a positive look-ahead:
::key0==(.*?)(?=::\w+==)
With the following:
prefix::key0==val::ue0::key1==value1::key2==value2::key3==value3::key4==value4::
It correctly finds val::ue0. This assumes the keys conform to \w ([0-9A-Za-z_]). Note that the last pair (key4) is not followed by another key, so the look-ahead as written fails for it.
A positive look-ahead may be a bit of overkill, but it also works if the value contains ::.
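A Python sketch that extends this look-ahead idea to collect every pair at once. The extra alternative ::$ in the look-ahead is an addition of this sketch, not part of the pattern above; it lets the last pair match when the string ends in ::. PAIR and parse_with_lookahead are hypothetical names.

```python
import re

# A value ends where "::" is followed by another "key==" or by the end of the string.
PAIR = re.compile(r"::(\w+)==(.*?)(?=::\w+==|::$)")

def parse_with_lookahead(data):
    """Return a dict of key/value pairs; values may themselves contain '::'."""
    return dict(PAIR.findall(data))

data = "prefix::key0==val::ue0::key1==value1::key2==value2::key3==value3::key4==value4::"
print(parse_with_lookahead(data))
```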

Regex: How to "step back"

I am having some trouble cooking up a regex that produces this result:
Mike1, misha1,2, miguel1,2,3,4,5,6,7,18, and Michea2,3
How does one step back in regex and discard the last match? That is, I need a comma before a space to not match. This is what I came up with...
\d+(,|\r)
Mike1, misha1,2, miguel1,2,3,4,5,6,7,18, and Micheal2,3
The regex feature you're asking about is a lookaround assertion (in this case a negative lookahead). But in your case, I don't think you need it. Try this:
\d+(?:,\d+)*
In your example, this will match the comma delimited lists of numbers and exclude the names and trailing commas and whitespace.
Here is a short bit of test code written in PHP that verifies it on your input:
<?php
$input = "Mike1, misha1,2, miguel1,2,3,4,5,6,7,18, and Micheal2,3";
$matches = array();
preg_match_all('/\d+(?:,\d+)*/', $input, $matches);
print_r($matches[0]);
?>
outputs:
Array
(
[0] => 1
[1] => 1,2
[2] => 1,2,3,4,5,6,7,18
[3] => 2,3
)
I believe \d+,(?!\s) will do what you want. The ?! is a negative lookahead, which only matches if what follows the ?! does not appear at this position in the search string.
>>> re.findall(r'\d+,(?!\s)', 'Mike1, misha1,2, miguel1,2,3,4,5,6,7,18, and Michea2,3')
['1,', '1,', '2,', '3,', '4,', '5,', '6,', '7,', '2,']
Or if you want to match the comma-separated list of numbers excluding the final comma use \d+(?:,\d+)*.
>>> re.findall(r'\d+(?:,\d+)*', 'Mike1, misha1,2, miguel1,2,3,4,5,6,7,18, and Michea2,3')
['1', '1,2', '1,2,3,4,5,6,7,18', '2,3']

PHP/Javascript RegExp - Non-capturing group

I have three variations of a string:
1. view=(edit:29,30)
2. view=(edit:29,30;)
3. view=(edit:29,30;x:100;y:200)
I need a RegExp that:
capture up to and including ",30"
capture "x:100;y:200" - whenever there's a semicolon after the first match;
WILL NOT include leftmost semicolon in any of the groups;
entire string on the right of the first semicolon and up to ')' can/should be in the same group.
I came up with:
$pat = '/view=\((\w+)(:)([\d,]+)((;[^)]+){0,}|;)\)/';
Applied to 'view=(edit:29,30;x:100;y:200)' it yields:
Array
(
[0] => view=(edit:29,30;x:100;y:200)
[1] => edit
[2] => :
[3] => 29,30
[4] => ;x:100;y:200
[5] => ;x:100;y:200
)
THE QUESTION. How do I remove ';' from matches [4] and [5]?
IMPORTANT. The same RegExp should work with a string when no semicolons are present, as: 'view=(edit:29,30)'.
$pat = '/view=\((\w+)(:)([\d,]+)((;[^)]+){0,}|;)\)/';
$str = 'view=(edit:29,30;x:100;y:200)';
preg_match($pat, $str, $m);
print_r($m);
Thanks!
You don’t need to group everything. Try this regular expression:
/view=\((\w+):([\d,]+)(?:;([^)]+)?)?\)/
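To check this pattern against all three variations from the question, here is a small Python sketch (the pattern itself is unchanged; VIEW and parse_view are made-up names). The optional third group comes back as None when nothing follows the semicolon or when there is no semicolon at all.

```python
import re

VIEW = re.compile(r"view=\((\w+):([\d,]+)(?:;([^)]+)?)?\)")

def parse_view(text):
    """Return (name, numbers, rest) where rest excludes the leading ';'."""
    m = VIEW.search(text)
    return m.groups() if m else None

for sample in ("view=(edit:29,30)",
               "view=(edit:29,30;)",
               "view=(edit:29,30;x:100;y:200)"):
    print(sample, "->", parse_view(sample))
```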
I guess you want something like this:
$pattern = '/view=\\((\\w+):(\\d+,\\d+)(?:;((?:\\w+:\\d+;?)*))?\\)/';
Should return
[0] view=(edit:29,30;x:100;y:200)
[1] edit
[2] 29,30
[3] x:100;y:200