I have three variations of a string:
1. view=(edit:29,30)
2. view=(edit:29,30;)
3. view=(edit:29,30;x:100;y:200)
I need a RegExp that:
captures everything up to and including ",30";
captures "x:100;y:200" whenever there's a semicolon after the first match;
does NOT include the leftmost semicolon in any of the groups;
keeps the entire string to the right of the first semicolon and up to ')' in the same group.
I came up with:
$pat = '/view=\((\w+)(:)([\d,]+)((;[^)]+){0,}|;)\)/';
Applied to 'view=(edit:29,30;x:100;y:200)' it yields:
Array
(
[0] => view=(edit:29,30;x:100;y:200)
[1] => edit
[2] => :
[3] => 29,30
[4] => ;x:100;y:200
[5] => ;x:100;y:200
)
THE QUESTION. How do I remove ';' from matches [4] and [5]?
IMPORTANT. The same RegExp should work with a string when no semicolons are present, as: 'view=(edit:29,30)'.
$pat = '/view=\((\w+)(:)([\d,]+)((;[^)]+){0,}|;)\)/';
$str = 'view=(edit:29,30;x:100;y:200)';
preg_match($pat, $str, $m);
print_r($m);
Thanks!
You don’t need to group everything. Try this regular expression:
/view=\((\w+):([\d,]+)(?:;([^)]+)?)?\)/
I guess you want something like this:
$pattern = '/view=\\((\\w+):(\\d+,\\d+)(?:;((?:\\w+:\\d+;?)*))?\\)/';
Should return
[0] view=(edit:29,30;x:100;y:200)
[1] edit
[2] 29,30
[3] x:100;y:200
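As a quick cross-check of the first suggested pattern against all three variations, here is a minimal sketch using Python's re module instead of PHP's preg_match. The helper name parse_view is made up for illustration; only the pattern itself comes from the answer above.

```python
import re

# Pattern from the first answer, written as a Python raw string.
# Group 3 is optional, so it is None when nothing follows the semicolon.
PATTERN = re.compile(r"view=\((\w+):([\d,]+)(?:;([^)]+)?)?\)")

def parse_view(text):
    """Return (name, numbers, rest) or None if the text does not match."""
    m = PATTERN.search(text)
    return m.groups() if m else None

samples = [
    "view=(edit:29,30)",
    "view=(edit:29,30;)",
    "view=(edit:29,30;x:100;y:200)",
]
for sample in samples:
    print(sample, "->", parse_view(sample))
```

The leading semicolon sits inside the non-capturing (?:...) wrapper, which is why it never shows up in a group.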
I have the following string:
12345 This could be anythingREMOVE
I need to match 12345 and This could be anything. Unfortunately, the format I need to parse also has a string at the end of the line that isn't always present (REMOVE in this example). How can I match what I'm looking for without REMOVE? I've tried the following pattern:
^(\d{5}) (.*)(?:REMOVE|$)
Unfortunately, REMOVE is picked up by the wildcard:
(
[0] => Array
(
[0] => 12345 This could be anythingREMOVE
)
[1] => Array
(
[0] => 12345
)
[2] => Array
(
[0] => This could be anythingREMOVE
)
)
If the trailing string REMOVE is optional, then why can't you use this regex:
"/^(\d{5}) /"
However if you really want to avoid REMOVE in matching pattern then use this:
$s = '12345 This could be anythingREMOVE';
if (preg_match("/^(\d{5}) (.*?)(?:REMOVE|)$/", $s, $arr))
var_dump($arr);
Output:
array(3) {
[0]=>
string(34) "12345 This could be anythingREMOVE"
[1]=>
string(5) "12345"
[2]=>
string(22) "This could be anything"
}
You can try this regex:
^(\d{5})((?:.(?!REMOVE))+.)
How It Works
^(\d{5}) -- Matches the start of the string followed by five digits [0-9]. The parentheses capture the matched text.
((?:.(?!REMOVE))+ -- Matches any character that is not immediately followed by the sequence REMOVE, one or more times. It stops at the n in anything; it can't match the g because the g is followed by REMOVE.
.) -- Allows the g to match.
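To compare the two approaches from the answers side by side, here is a small sketch in Python's re module (an assumption on my part, the answers themselves use PHP). The function names are hypothetical. Note that the look-ahead version keeps the leading space inside group 2, because the pattern has no literal space after the digits.

```python
import re

# Lazy version: (.*?) grows only as far as needed before an optional REMOVE at the end.
LAZY = re.compile(r"^(\d{5}) (.*?)(?:REMOVE)?$")

# Look-ahead version from the last answer: every character except the final one
# must not be immediately followed by REMOVE.
TEMPERED = re.compile(r"^(\d{5})((?:.(?!REMOVE))+.)")

def split_lazy(line):
    m = LAZY.match(line)
    return m.groups() if m else None

def split_tempered(line):
    m = TEMPERED.match(line)
    return m.groups() if m else None

for line in ("12345 This could be anythingREMOVE", "12345 This could be anything"):
    print(split_lazy(line), split_tempered(line))
```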
I'm looking for a way to extract "abc" from a source where "abc" will always start with "X" and end with "Y".
At this moment I'm using:
Dim myString As String = source3RTB.Text
Dim finalString As String = myString.Substring((myString.IndexOf("X")), (myString.IndexOf("Y") - myString.IndexOf("X")) + 1)
source2RTB.Text = finalString
sourceRTB.Text = myString.Trim(finalString)
But there is a problem: the code above only selects the first X and the first Y...
The source is a complicated set of lines (xxxx), and regex is not working too well for it: (?<=X)(.*?)(?=Y) only works on a small piece of the source. When I try it on the whole source it does not work (I'm not sure whether that is because of the new lines, or...).
Any ideas?
Description
This regex will capture the inner text between an X and Y
(?:^|\s*?)\b(x(\w*)y)\b(?=\s*|$)
The \w* can be replaced with whatever search you're looking for. If you want .*? to match across new lines, use the s (dot-all) option with your regex command so that . also matches new line characters; the m option only changes where ^ and $ match.
I would have customized this solution further if the OP had included sample text.
Groups
Group 0 gets the entire matching string including preceding spaces
Group 1 gets the full string including open and close tags
Group 2 gets the inner string of word characters (letters, digits and underscore)
PHP Code Example
<?php
$sourcestring="I'am xlikelyy xupvotey photos of xkittensy on the internet.";
preg_match_all('/(?:^|\s*?)\b(x(\w*)y)\b(?=\s*|$)/i',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>
$matches Array:
(
[0] => Array
(
[0] => xlikelyy
[1] => xupvotey
[2] => xkittensy
)
[1] => Array
(
[0] => xlikelyy
[1] => xupvotey
[2] => xkittensy
)
[2] => Array
(
[0] => likely
[1] => upvote
[2] => kittens
)
)
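Since the question mentions that the look-around pattern stops working on the whole multi-line source, here is a hedged sketch of the same pattern with dot-all matching switched on. It uses Python's re.DOTALL rather than the asker's VB.NET, and the sample text is invented for illustration.

```python
import re

# (?<=X)(.*?)(?=Y) from the question; re.DOTALL lets "." match newlines too,
# so a block that spans several lines is still found.
BETWEEN = re.compile(r"(?<=X)(.*?)(?=Y)", re.DOTALL)

# Invented sample: the first X...Y block contains a line break.
source = "aaa Xfirst\nblockY bbb XsecondY ccc"

blocks = BETWEEN.findall(source)
print(blocks)
```

findall returns every X...Y block rather than only the first one, which also addresses the "only the first X and first Y" problem in the original Substring code.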
I have an entry like so:
Beetle as creator. So. Am. Indian (Lengua): Métraux BBAE CXLIII (1) 367 (Guaranyi):
which I am trying to parse out to find the names of tribes. I want to be able to get So. Am. Indian, Lengua and Guaranyi, but avoid the (1).
I've gotten this so far:
\w+.[A-Za-z0-9_.()]+:| \(.*?\)
which gives me Indian, Lengua and Guaranyi but also the 367 which isn't correct. I'm not great at regex and I've just spent three hours on this so I was hoping someone might give me a pointer. thanks!
Your criteria for distinguishing tribes from common words are not really clear to me, but here is an attempt in Perl:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use utf8;
my $str = 'Beetle as creator. So. Am. Indian (Lengua): Métraux BBAE CXLIII (1) 367 (Guaranyi): ';
my @tribes = $str =~ /((?:\p{Lu}\p{Ll}+\.?\s?)+)/g;
print Dumper\@tribes;
explanation:
/ : Regex delimiter
( : Begin capture group 1
(?: : Begin non capture group
\p{Lu} : An uppercase letter
\p{Ll}+ : One or more lowercase letters
\.? : A dot 0 or 1 time
\s? : A space 0 or 1 time
)+ : End non capture group repeated 1 or more times
) : End of capture group 1
/ : Regex delimiter
g : Global search
output:
$VAR1 = [
'Beetle ',
'So. Am. Indian ',
'Lengua',
'Métraux',
'Guaranyi'
];
As you can see, it also captures Beetle and Métraux.
Same code in php:
$str = 'Beetle as creator. So. Am. Indian (Lengua): Métraux BBAE CXLIII (1) 367 (Guaranyi): ';
preg_match_all('/((?:\p{Lu}\p{Ll}+\.?\s?)+)/u', $str, $tribes);
print_r($tribes);
output:
Array
(
[0] => Array
(
[0] => Beetle
[1] => So. Am. Indian
[2] => Lengua
[3] => Métraux
[4] => Guaranyi
)
[1] => Array
(
[0] => Beetle
[1] => So. Am. Indian
[2] => Lengua
[3] => Métraux
[4] => Guaranyi
)
)
Can't seem to figure out an expression which handles this line of text:
'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'
into these groupings:
SOME_TEXT
EVEN_MORE_TEXT
EXPRESSION IS IN ('YES', 'NO')
I'd rather have a nifty regex than solve this with string functions like indexOf(), etc.
The regex '([^']|'')++' will match the parts you're interested in, as this demo shows:
$text = "'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'";
preg_match_all("/'([^']|'')+'/", $text, $matches);
print_r($matches[0]);
which prints:
Array
(
[0] => 'SOME_TEXT'
[1] => 'EVEN_MORE_TEXT'
[2] => 'EXPRESSION IS IN (''YES'',''NO'')'
)
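The demo above returns the fields with their surrounding quotes and with the doubled quotes still in place. If the goal is the unquoted groupings from the question, one possible follow-up step is sketched below in Python (not PHP, and the helper name split_fields is hypothetical): capture the inside of each quoted field, then collapse '' to '.

```python
import re

# Same idea as the answer's pattern, with a capture group around the field body.
# Inside a field, a quote is only allowed as a doubled pair ('').
FIELD = re.compile(r"'((?:[^']|'')+)'")

def split_fields(line):
    """Return the unquoted fields, with doubled quotes collapsed to single ones."""
    return [body.replace("''", "'") for body in FIELD.findall(line)]

line = "'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'"
for field in split_fields(line):
    print(field)
```

This sketch assumes no field is empty; an empty field ('') would need the + changed to * and some extra care, since '' would then be ambiguous.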
I have a string which is something like:
prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::
I want to retrieve the value associated to a key (say, key1). The following pattern:
::key1==([^:]*)
...will work only if there is no ':' character in the value, so I want to make sure the match stops only at the substring ::, but I can't find how to do that, as most examples I see are about single character matching.
How do I modify the regexp pattern to match all characters between "::key1==" and the next "::" ?
Thanks!
Can you do something like this : ::key1==(.*?)::? Assuming the language supports the lazy ? operator, this should work.
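A minimal sketch of that lazy match, written in Python since the question does not name a language. The function name value_for and the sample value containing a single ':' are made up for illustration.

```python
import re

def value_for(key, text):
    """Return the text between '::<key>==' and the next '::', or None."""
    # re.escape guards against keys that contain regex metacharacters.
    m = re.search("::" + re.escape(key) + r"==(.*?)::", text)
    return m.group(1) if m else None

# Invented sample where one value contains a single ':' character.
text = "prefix::key0==value0::key1==val:ue1::key2==value2::"
print(value_for("key1", text))
```

A single ':' inside the value is fine here because the lazy group only stops at the two-character sequence ::.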
As mentioned in my comment to your question, if the entirety of your string is
prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::
I would suggest exploding/splitting the string at :: instead of using regex, as it will usually be faster. You didn't specify a language, but here is a PHP example:
// string
$string = "prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::";
// explode using :: as delimiter
$string = explode('::',$string);
// start with an empty result list
$matches = [];
// for each element...
foreach ($string as $value) {
// check if it has == in it
if (strpos($value,'==')!==false) $matches[] = $value;
}
// output
echo "<pre>";print_r($matches);
output:
Array
(
[0] => key0==value0
[1] => key1==value1
[2] => key2==value2
[3] => key3==value3
[4] => key4==value4
)
However, if you insist on the regex approach, here is a negative look-ahead alternative:
::((?:(?!::).)+)
php example
// string
$string = "prefix::key0==value0::key1==value1::key2==value2::key3==value3::key4==value4::";
preg_match_all('~::((?:(?!::).)+)~',$string,$matches);
// output: group 1 holds the text captured after each ::
echo "<pre>";print_r($matches[1]);
output
Array
(
[0] => key0==value0
[1] => key1==value1
[2] => key2==value2
[3] => key3==value3
[4] => key4==value4
)
I think you're looking for a positive look-ahead:
::key0==(.*?)(?=::\w+==)
With the following:
prefix::key0==val::ue0::key1==value1::key2==value2::key3==value3::key4==value4::
It correctly finds val::ue0. This also assumes the keys conform to \w ([0-9A-Za-z_])
Also, a positive look-ahead may be a bit of overkill, but it will still work if the value itself contains ::.
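Here is a rough Python sketch of the look-ahead idea. One thing to be aware of: as written, (?=::\w+==) needs another key to follow, so it cannot match the last key in the string. The extra |::$ alternative below is my own assumption to cover that case, and the function name is hypothetical.

```python
import re

def value_before_next_key(key, text):
    """Return the value for key, stopping at '::' followed by the next 'name=='."""
    # The '::$' alternative is an addition to the pattern above so the
    # final key (which has no following key) can still be matched.
    pattern = "::" + re.escape(key) + r"==(.*?)(?=::\w+==|::$)"
    m = re.search(pattern, text)
    return m.group(1) if m else None

text = "prefix::key0==val::ue0::key1==value1::key4==value4::"
print(value_before_next_key("key0", text))
print(value_before_next_key("key4", text))
```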