PHP/JavaScript RegExp - Non-capturing group - regex

I have three variations of a string:
1. view=(edit:29,30)
2. view=(edit:29,30;)
3. view=(edit:29,30;x:100;y:200)
I need a RegExp that:
captures up to and including ",30";
captures "x:100;y:200" whenever there's a semicolon after the first match;
does NOT include the leftmost semicolon in any of the groups;
keeps the entire string to the right of the first semicolon, up to ')', in the same group.
I came up with:
$pat = '/view=\((\w+)(:)([\d,]+)((;[^)]+){0,}|;)\)/';
Applied to 'view=(edit:29,30;x:100;y:200)' it yields:
Array
(
[0] => view=(edit:29,30;x:100;y:200)
[1] => edit
[2] => :
[3] => 29,30
[4] => ;x:100;y:200
[5] => ;x:100;y:200
)
THE QUESTION. How do I remove ';' from matches [4] and [5]?
IMPORTANT. The same RegExp should also work with a string that contains no semicolons, such as 'view=(edit:29,30)'.
$pat = '/view=\((\w+)(:)([\d,]+)((;[^)]+){0,}|;)\)/';
$str = 'view=(edit:29,30;x:100;y:200)';
preg_match($pat, $str, $m);
print_r($m);
Thanks!

You don’t need to group everything. Try this regular expression:
/view=\((\w+):([\d,]+)(?:;([^)]+)?)?\)/
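The question's title mentions JavaScript as well as PHP, and this pattern needs no changes between the two. Below is a minimal JavaScript sketch that runs it on all three variations. The helper parseView and its field names are hypothetical:

```javascript
// The answer's pattern as a JavaScript regex literal; unchanged from the PHP version.
const viewPattern = /view=\((\w+):([\d,]+)(?:;([^)]+)?)?\)/;

// Hypothetical helper: returns the captured groups, or null if the string does not match.
function parseView(str) {
  const m = str.match(viewPattern);
  if (m === null) return null;
  // m[3] is undefined when there is no semicolon or nothing follows it.
  return { action: m[1], ids: m[2], rest: m[3] };
}

for (const s of ['view=(edit:29,30)', 'view=(edit:29,30;)', 'view=(edit:29,30;x:100;y:200)']) {
  console.log(s, parseView(s));
}
```

The non-capturing (?:;...)? consumes the semicolon without putting it in any group. Only the text after it lands in group 3, which is what the question asks for.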

I guess you want something like this:
$pattern = '/view=\\((\\w+):(\\d+,\\d+)(?:;((?:\\w+:\\d+;?)*))?\\)/';
Should return
[0] view=(edit:29,30;x:100;y:200)
[1] edit
[2] 29,30
[3] x:100;y:200
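In a PHP single-quoted string, \\ stands for one backslash, so the pattern above is equivalent to the JavaScript literal below. This sketch runs it on the question's strings plus one hypothetical input with three numbers. Unlike the previous answer, group 2 here accepts exactly two comma-separated numbers. Group 3 is also an empty string, not undefined, in the trailing-semicolon case:

```javascript
// PHP '/view=\\((\\w+)...' unescaped: each \\ becomes a single backslash.
const strictPattern = /view=\((\w+):(\d+,\d+)(?:;((?:\w+:\d+;?)*))?\)/;

const samples = [
  'view=(edit:29,30)',
  'view=(edit:29,30;)',
  'view=(edit:29,30;x:100;y:200)',
  'view=(edit:29,30,31)', // hypothetical: three numbers do not fit (\d+,\d+)
];

for (const s of samples) {
  const m = s.match(strictPattern);
  console.log(s, m ? [m[1], m[2], m[3]] : null);
}
```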

Related

Optionally prevent a string at the end of a wildcard from being matched

I have the following string:
12345 This could be anythingREMOVE
I need to match 12345 and This could be anything. Unfortunately, the format I need to parse also has a string at the end of the line that isn't always present (REMOVE in this example). How can I match what I'm looking for without REMOVE? I've tried the following pattern:
^(\d{5}) (.*)(?:REMOVE|$)
Unfortunately, REMOVE is picked up by the wildcard:
(
[0] => Array
(
[0] => 12345 This could be anythingREMOVE
)
[1] => Array
(
[0] => 12345
)
[2] => Array
(
[0] => This could be anythingREMOVE
)
)
If the trailing REMOVE is optional, why not just use this regex:
"/^(\d{5}) /"
However, if you really want to keep REMOVE out of the match, use this:
$s = '12345 This could be anythingREMOVE';
if (preg_match("/^(\d{5}) (.*?)(?:REMOVE|)$/", $s, $arr))
var_dump($arr);
Output:
array(3) {
[0]=>
string(34) "12345 This could be anythingREMOVE"
[1]=>
string(5) "12345"
[2]=>
string(22) "This could be anything"
}
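Here is the same idea in JavaScript as a minimal sketch. The helper name splitLine is hypothetical. Because (.*?) is lazy, the engine extends it one character at a time. It stops at the first point where either REMOVE or nothing can be followed by the end of the string, so the suffix stays out of group 2 whether or not it is present:

```javascript
// Lazy (.*?) grows only as far as needed for (?:REMOVE|)$ to match at the end.
const removePattern = /^(\d{5}) (.*?)(?:REMOVE|)$/;

// Hypothetical helper: returns [code, text] or null.
function splitLine(line) {
  const m = line.match(removePattern);
  return m ? [m[1], m[2]] : null;
}

console.log(splitLine('12345 This could be anythingREMOVE'));
console.log(splitLine('12345 This could be anything'));
```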
You can try this regex:
^(\d{5})((?:.(?!REMOVE))+.)
How It Works
^(\d{5}) -- Matches the start of the string, followed by five digits [0-9]. The parentheses capture the matched digits.
((?:.(?!REMOVE))+ -- Matches, one or more times, any character that is not immediately followed by the sequence REMOVE. It stops at the n in anything; it can't match the g because the g is followed by REMOVE.
.) -- Then matches the g and closes group 2. Since the pattern has no space after (\d{5}), group 2 starts with that space.
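Here is a JavaScript sketch of this tempered-lookahead pattern (the variable name is hypothetical). It also works when REMOVE is absent: the greedy loop runs to the end of the string, and the final . takes back its last character. Trim group 2 if the leading space is unwanted:

```javascript
// (?:.(?!REMOVE))+ consumes characters that are not immediately followed by REMOVE;
// the trailing . then picks up the character right before REMOVE (or the last one).
const tempered = /^(\d{5})((?:.(?!REMOVE))+.)/;

for (const s of ['12345 This could be anythingREMOVE', '12345 This could be anything']) {
  const m = s.match(tempered);
  console.log(JSON.stringify(m[1]), JSON.stringify(m[2])); // group 2 keeps the leading space
}
```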

VB2010 Extract all recurring strings with A -B as start-stop of each string from source

I'm looking for a way to extract "abc" from a source where it will always start with "X" and end with "Y".
At this moment I'm using:
Dim myString As String = source3RTB.Text
Dim finalString As String = myString.Substring((myString.IndexOf("X")), (myString.IndexOf("Y") - myString.IndexOf("X")) + 1)
source2RTB.Text = finalString
sourceRTB.Text = myString.Trim(finalString)
But there is a problem: the code above only selects the first X and the first Y...
The source is a complicated set of lines (xxxx), and regex is not working well for it either: (?<=X)(.*?)(?=Y) only works on a small piece of the source. When I try it on the whole source it fails (not sure if it is because of new lines, or...).
Any idea ?
Description
This regex will capture the inner text between an X and Y
(?:^|\s*?)\b(x(\w*)y)\b(?=\s*|$)
The \w* can be replaced with whatever search you're looking for. If you want .*? to match across new lines, use the s (dotall) option with your regex so that . also matches newline characters; the m option only changes how ^ and $ behave. In .NET (the question's VB code) the equivalent is RegexOptions.Singleline.
I would have customized this solution further if the OP had included sample text.
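For the original question (every X...Y pair, possibly spanning lines), here is a minimal JavaScript sketch. The lazy .*? stops at the nearest Y. The g flag finds all pairs instead of only the first, and the s (dotAll) flag lets . match newline characters. The sample text is hypothetical, since the question did not include its source:

```javascript
// Hypothetical sample: the question's real source text was not shown.
const source = 'junk X first\npart Y more junk X second Y end';

// g: every X...Y pair; s (dotAll): . also matches \n; .*? stops at the nearest Y.
const pairs = [...source.matchAll(/X(.*?)Y/gs)].map((m) => m[1]);

// Without the s flag, the pair that spans a line break is skipped.
const sameLineOnly = [...source.matchAll(/X(.*?)Y/g)].map((m) => m[1]);

console.log(pairs);
console.log(sameLineOnly);
```

Here pairs holds ' first\npart ' and ' second ', while sameLineOnly holds only ' second '.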
Groups
Group 0 gets the entire matching string, including preceding spaces
Group 1 gets the full string, including the open and close tags (x and y)
Group 2 gets the inner string of word characters
PHP Code Example
<?php
$sourcestring="I'am xlikelyy xupvotey photos of xkittensy on the internet.";
preg_match_all('/(?:^|\s*?)\b(x(\w*)y)\b(?=\s*|$)/i',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>
$matches Array:
(
[0] => Array
(
[0] => xlikelyy
[1] => xupvotey
[2] => xkittensy
)
[1] => Array
(
[0] => xlikelyy
[1] => xupvotey
[2] => xkittensy
)
[2] => Array
(
[0] => likely
[1] => upvote
[2] => kittens
)
)

How do I do multiple backtracked words if only with periods or single characters in parentheses in regex?

I have an entry like so:
Beetle as creator. So. Am. Indian (Lengua): Métraux BBAE CXLIII (1) 367 (Guaranyi):
which I am trying to parse out to find the names of tribes. I want to be able to get So. Am. Indian, Lengua and Guaranyi, but avoid the (1).
I've gotten this so far:
\w+.[A-Za-z0-9_.()]+:| \(.*?\)
which gives me Indian, Lengua and Guaranyi, but also the 367, which isn't correct. I'm not great at regex and I've just spent three hours on this, so I was hoping someone might give me a pointer. Thanks!
Your criteria for distinguishing tribes from common words are not really clear to me, but here is an attempt in Perl:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use utf8;
my $str = 'Beetle as creator. So. Am. Indian (Lengua): Métraux BBAE CXLIII (1) 367 (Guaranyi): ';
my @tribes = $str =~ /((?:\p{Lu}\p{Ll}+\.?\s?)+)/g;
print Dumper \@tribes;
explanation:
/ : Regex delimiter
( : Begin capture group 1
(?: : Begin non-capture group
\p{Lu} : An uppercase letter
\p{Ll}+ : One or more lowercase letters
\.? : A dot 0 or 1 time
\s? : A whitespace character 0 or 1 time
)+ : End non-capture group, repeated 1 or more times
) : End of capture group
/ : Regex delimiter
g : Global search
output:
$VAR1 = [
'Beetle ',
'So. Am. Indian ',
'Lengua',
'Métraux',
'Guaranyi'
];
As you can see, it also captures Beetle and Métraux.
The same code in PHP:
$str = 'Beetle as creator. So. Am. Indian (Lengua): Métraux BBAE CXLIII (1) 367 (Guaranyi): ';
preg_match_all('/((?:\p{Lu}\p{Ll}+\.?\s?)+)/u', $str, $tribes);
print_r($tribes);
output:
Array
(
[0] => Array
(
[0] => Beetle
[1] => So. Am. Indian
[2] => Lengua
[3] => Métraux
[4] => Guaranyi
)
[1] => Array
(
[0] => Beetle
[1] => So. Am. Indian
[2] => Lengua
[3] => Métraux
[4] => Guaranyi
)
)
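The same pattern also works in JavaScript, where \p{Lu} and \p{Ll} require the u flag. This minimal sketch trims the trailing spaces the pattern picks up. Like the Perl and PHP versions, it still returns Beetle and Métraux, so telling tribes apart from other capitalized words needs an extra rule:

```javascript
const entry = 'Beetle as creator. So. Am. Indian (Lengua): Métraux BBAE CXLIII (1) 367 (Guaranyi): ';

// One or more "capitalized word, optional dot, optional whitespace" runs; u enables \p{...}.
const capitalizedRun = /(?:\p{Lu}\p{Ll}+\.?\s?)+/gu;

// With the g flag, String.prototype.match returns every full match (or null).
const candidates = (entry.match(capitalizedRun) || []).map((s) => s.trim());

console.log(candidates);
```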

RegEx - grouping a string

Can't seem to figure out an expression which handles this line of text:
'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'
into these groups:
SOME_TEXT
EVEN_MORE_TEXT
EXPRESSION IS IN ('YES', 'NO')
...I'd rather have a nifty regex than solve this with string functions like indexOf(), etc.
The regex '([^']|'')+' (or its possessive PCRE form '([^']|'')++') will match the parts you're interested in, as this demo shows:
$text = "'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'";
preg_match_all("/'([^']|'')+'/", $text, $matches);
print_r($matches[0]);
which prints:
Array
(
[0] => 'SOME_TEXT'
[1] => 'EVEN_MORE_TEXT'
[2] => 'EXPRESSION IS IN (''YES'',''NO'')'
)
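The matches above still include the outer quotes and the doubled '' escapes. To get the three values from the question, this JavaScript sketch captures only the inside of each quoted field, then replaces each '' with a single quote. The pattern uses * instead of +, assuming an empty field '' should also be accepted. The result is ('YES','NO') without the space shown in the question, since the input has none:

```javascript
const line = "'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'";

// Group 1 is the field body: any non-quote character or an escaped quote ''.
const quotedField = /'((?:[^']|'')*)'/g;

// Capture each body, then turn every '' back into a single quote.
const fields = [...line.matchAll(quotedField)].map((m) => m[1].replace(/''/g, "'"));

console.log(fields);
```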

RegExp pattern to capture around a two-character delimiter

I have a string which is something like:
prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::
I want to retrieve the value associated to a key (say, key1). The following pattern:
::key1==([^:]*)
...will work only if there are no ':' characters in the value. I want the pattern matching to stop only at the substring ::, but I can't find how to do that, as most examples I see are about matching a single character.
How do I modify the regexp pattern to match all characters between "::key1==" and the next "::" ?
Thanks!
Can you do something like ::key1==(.*?):: instead? Assuming the language supports the lazy quantifier *?, this should work.
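A quick JavaScript check of the lazy version against the asker's original pattern. The sample value a:b, which contains a single colon, is hypothetical:

```javascript
// Hypothetical input: key1's value contains a single ':'.
const record = 'prefix::key0==value0::key1==a:b::key2==value2::';

// Asker's pattern: [^:]* stops at the first ':' inside the value.
const singleColon = record.match(/::key1==([^:]*)/);

// Lazy .*? expands only until the first '::' after the key.
const lazy = record.match(/::key1==(.*?)::/);

console.log(singleColon[1]); // a
console.log(lazy[1]); // a:b
```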
As mentioned in my comment to your question, if the entirety of your string is
prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::
I would suggest exploding/splitting the string at :: instead of using regex, as it will usually be faster. You didn't specify a language, but here is a PHP example:
// string
$string = "prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::";
// explode using :: as delimiter
$parts = explode('::', $string);
$matches = array();
// for each element...
foreach ($parts as $value) {
    // keep it only if it contains ==
    if (strpos($value, '==') !== false) $matches[] = $value;
}
// output
echo "<pre>"; print_r($matches);
output:
Array
(
[0] => key0==value0
[1] => key1==value1
[2] => key2==value2
[3] => key3==value3
[4] => key4==value4
)
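Here is the same split-first approach in JavaScript, extended to answer the original question of looking up one key. It is a sketch: parsePairs is a hypothetical helper, and it assumes values never contain :: themselves, because splitting at :: would cut such a value in two:

```javascript
// Hypothetical helper: split at '::', keep parts containing '==', map key -> value.
function parsePairs(str) {
  const pairs = new Map();
  for (const part of str.split('::')) {
    const eq = part.indexOf('==');
    if (eq !== -1) {
      pairs.set(part.slice(0, eq), part.slice(eq + 2));
    }
  }
  return pairs;
}

const data = 'prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::';
const byKey = parsePairs(data);

console.log(byKey.get('key1')); // value1
console.log(byKey.size); // 5
```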
However, if you insist on the regex approach, here is a negative look-ahead alternative:
::((?:(?!::).)+)
PHP example:
// string
$string = "prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::";
preg_match_all('~::((?:(?!::).)+)~',$string,$matches);
//output
echo "<pre>"; print_r($matches[1]); // group 1 holds the key==value pairs
output
Array
(
[0] => key0==value0
[1] => key1==value1
[2] => key2==value2
[3] => key3==value3
[4] => key4==value4
)
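In JavaScript, matchAll exposes the same two levels: m[0] is the full match including the leading ::, and m[1] is the capture group. The tempered token (?:(?!::).)+ advances one character at a time but refuses to start a :: sequence, so single colons inside a value are allowed. A minimal sketch, where the key4 value a:b is hypothetical and added to show that:

```javascript
// key4's value contains a single ':' (hypothetical) to show it is not a stop point.
const text = 'prefix::key0==value0::key1==value1::key4==a:b::';

// Each step of (?:(?!::).)+ first checks that '::' does not start at the current position.
const tokens = [...text.matchAll(/::((?:(?!::).)+)/g)];

console.log(tokens.map((m) => m[0])); // full matches, each starting with '::'
console.log(tokens.map((m) => m[1])); // capture group 1 only
```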
I think you're looking for a positive look-ahead:
::key0==(.*?)(?=::\w+==)
With the following:
prefix::key0==val::ue0::key1==value1::key2==value2::key3==value3::key4==value4::
It correctly finds val::ue0. This also assumes the keys conform to \w ([0-9A-Za-z_]).
Also, a positive look-ahead may be a bit of overkill, but it will work even if the value contains ::.
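Here is a JavaScript check of this look-ahead on the example string. One limitation worth knowing: the last pair (key4) is not followed by ::\w+==, so the pattern as written does not find it. The relaxed variant below adds |::$ to the look-ahead so the final :: can end the value too. That variant is an adjustment made here, not part of the answer:

```javascript
const s = 'prefix::key0==val::ue0::key1==value1::key2==value2::key3==value3::key4==value4::';

// The answer's pattern: the value ends where '::' plus another key and '==' begins.
const key0 = s.match(/::key0==(.*?)(?=::\w+==)/);

// For the last key, nothing of the form ::key== follows, so the strict form fails.
const key4Strict = s.match(/::key4==(.*?)(?=::\w+==)/);
// Relaxed variant (not from the answer): also accept a final '::' at the end of the string.
const key4Relaxed = s.match(/::key4==(.*?)(?=::\w+==|::$)/);

console.log(key0[1]); // val::ue0
console.log(key4Strict); // null
console.log(key4Relaxed[1]); // value4
```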