I have these 2 lines:
What is P(output1|cause1=2, cause2=2)
What is P(output2|cause3=2)
I would like to change them to:
method_to_use(model, {"cause1": 2, "cause2": 2}, "output1")
method_to_use(model, {"cause3": 2}, "output2")
This is my regex:
.*P[(]([a-z1-9]+)[|](([a-z1-9]+)=([1-9]),?)+[)]
and I try to replace it like this:
method_to_use(model, {"$3": $4}, "$1")
but I only get the last match of the repeated group:
method_to_use(model, {"cause2": 2}, "output1")
Is it possible to do some kind of "loop" and replace every match along the way?
You could match the string with the following regular expression.
^.*P\(([^|]+)\|([^=]+)=(\d+)(?:, +([^=]+)=(\d+))?\)$
If capture group 4 is non-empty, replace the (whole-string) match with
method_to_use(model, {"$2": $3, "$4": $5}, "$1")
This causes the string
What is P(output1|cause1=2, cause2=2)
to be replaced with
method_to_use(model, {"cause1": 2, "cause2": 2}, "output1")
Demo 1
If capture group 4 is empty, replace the match with
method_to_use(model, {"$2": $3}, "$1")
This causes the string
What is P(output2|cause3=2)
to be replaced with
method_to_use(model, {"cause3": 2}, "output2")
Demo 2
Note that the regular expressions at the two links are equivalent; the only difference is that at Demo 1 I expressed the regex in free-spacing mode, which lets it be self-documenting.
Instead of replacing the entire string, one could of course simply build the new string from the values of the capture groups. If that is done, ^.*P at the beginning of the regex could be changed to simply P.
The regex engine performs the following operations.
^ # match beginning of line
.*P\( # match 0+ chars then 'P('
([^|]+) # save 1+ chars except '|' in cap grp 1 (output)
\| # match '|'
([^=]+) # save 1+ chars except '=' in cap grp 2 (causeA)
= # match '='
(\d+) # save 1+ digits in cap grp 3 (causeAval)
(?: # begin non-cap grp
,\ + # match ',' then 1+ spaces
([^=]+) # save 1+ chars except '=' in cap grp 4 (causeB)
= # match '='
(\d+) # save 1+ digits in cap grp 5 (causeBval)
)? # end non-cap grp and make it optional
\) # match ')'
$ # match end of line
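For reference, here is a minimal JavaScript sketch of this approach (JavaScript is an assumption, chosen because it is used elsewhere in this thread). Instead of picking one of the two replacement strings by hand, it passes a replacement callback that checks whether group 4 took part in the match. The function name toMethodCall is hypothetical, and like the regex it only handles one or two cause=value pairs.

```javascript
// Same regex as above, written on one line (no free-spacing mode in JS)
const RE = /^.*P\(([^|]+)\|([^=]+)=(\d+)(?:, +([^=]+)=(\d+))?\)$/;

function toMethodCall(line) {
  return line.replace(RE, (whole, out, c1, v1, c2, v2) =>
    // Group 4 (c2) is undefined when the optional second pair is absent
    c2 === undefined
      ? `method_to_use(model, {"${c1}": ${v1}}, "${out}")`
      : `method_to_use(model, {"${c1}": ${v1}, "${c2}": ${v2}}, "${out}")`
  );
}

console.log(toMethodCall("What is P(output1|cause1=2, cause2=2)"));
console.log(toMethodCall("What is P(output2|cause3=2)"));
```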
One thing is certain: you can't do that with a single regex.
You may use a 3-step approach:
Replace .*P\( with method_to_use(
Replace \(\K(\w+)\|([^()]+) with model, {$2}, "$1"
Replace (\w+)=(\w+) with "$1": $2
Note that:
.*P\( matches any 0 or more chars other than line break chars, as many as possible, and then P(
\(\K(\w+)\|([^()]+) matches (, then discards it from the match value with \K; then 1+ word chars are captured into Group 1 ($1 in the replacement pattern), | is matched, and then 1+ chars other than ( and ) are captured into Group 2 ($2)
(\w+)=(\w+) - 1+ word chars are captured into Group 1 ($1 in the replacement pattern), = is matched, and then 1+ word chars are captured into Group 2 ($2).
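A minimal JavaScript sketch of the three steps follows. JavaScript regexes do not support \K, so step 2 swaps in a lookbehind, (?<=\(), which likewise keeps the ( out of the match. The function name rewriteInSteps is hypothetical. Because step 3 uses the g flag, this handles any number of cause=value pairs, which is the "loop" the question asks about.

```javascript
// Sketch of the 3-step replacement. JS has no \K, so step 2 uses the
// lookbehind (?<=\() to keep "(" out of the match instead.
function rewriteInSteps(line) {
  return line
    // Step 1: everything up to and including "P(" becomes "method_to_use("
    .replace(/.*P\(/, "method_to_use(")
    // Step 2: "output|pairs" becomes 'model, {pairs}, "output"'
    .replace(/(?<=\()(\w+)\|([^()]+)/, 'model, {$2}, "$1"')
    // Step 3 (global): every key=value becomes "key": value
    .replace(/(\w+)=(\w+)/g, '"$1": $2');
}

console.log(rewriteInSteps("What is P(output1|cause1=2, cause2=2)"));
console.log(rewriteInSteps("What is P(output2|cause3=2)"));
```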
I have the following regular expression, which extracts everything after the first two letters:
^([A-Za-z]{2})(\w+)($) $2
Now I want to extract nothing if the data doesn't start with letters.
Example:
AA123 -> 123
123 -> ""
Can this be accomplished by regex?
Add an alternative that matches one or more of any chars from the start to the end of the string when your original pattern does not match:
^(?:([A-Za-z]{2})(\w+)|.+)$
See the regex demo. Details:
^ - start of string
(?: - start of a container non-capturing group:
([A-Za-z]{2})(\w+) - Group 1: two ASCII letters, Group 2: one or more word chars
| - or
.+ - one or more chars other than line break chars, as many as possible (use [\w\W]+ to match any chars including line break chars)
) - end of a container non-capturing group
$ - end of string.
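A quick check of this pattern in JavaScript (an assumption; the question names no language), using $2 as the replacement. When the .+ branch matches, group 2 does not participate, and JavaScript substitutes an empty string for $2. The helper name stripTwoLetters is hypothetical.

```javascript
// Alternation: two letters + rest in groups 1 and 2, or anything else via .+
const RE = /^(?:([A-Za-z]{2})(\w+)|.+)$/;

function stripTwoLetters(s) {
  // If the .+ branch matched, group 2 did not participate and "$2" becomes ""
  return s.replace(RE, "$2");
}

console.log(stripTwoLetters("AA123")); // "123"
console.log(stripTwoLetters("123"));   // ""
```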
Your pattern already captures 1 or more word characters after matching 2 ASCII letters. The $ does not have to be in a group, and the $2 should not be in the pattern.
^([A-Za-z]{2})(\w+)$
See a regex demo.
Another option could be a pattern with a conditional, capturing data in group 2 only if group 1 exists.
^([A-Z]{2})?(?(1)(\w+)|.+)$
^ Start of string
([A-Z]{2})? Capture 2 uppercase chars in optional group 1
(? Conditional
(1)(\w+) If we have group 1, capture 1+ word chars in group 2
| Or
.+ Match the whole line with at least 1 char to not match an empty string
) Close conditional
$ End of string
Regex demo
For a match only, you could use other variations: using \K, like ^[A-Za-z]{2}\K\w+$, or a lookbehind assertion, (?<=^[A-Za-z]{2})\w+$
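The lookbehind variation can be sketched in JavaScript, which supports lookbehind in modern engines but has no \K. A string without the two-letter prefix yields no match, which the helper turns into "". The helper name matchAfterTwoLetters is hypothetical.

```javascript
// Match word chars to the end only if two letters precede them at the start
const AFTER_TWO_LETTERS = /(?<=^[A-Za-z]{2})\w+$/;

function matchAfterTwoLetters(s) {
  const m = s.match(AFTER_TWO_LETTERS);
  return m ? m[0] : ""; // no match -> extract nothing
}

console.log(matchAfterTwoLetters("AA123")); // "123"
console.log(matchAfterTwoLetters("123"));   // ""
```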
I have a pattern, (^([-]?\d+([.]\d+)?,){6}([10],)([-]?\d+([.]\d+)?)$), which matches "26.9841,300.007666,4,1,0,15,1,0". This is what I want; however, my pattern does not match the following strings:
1. "26 . 9841,300 . 007666,4,1,0,15,1,0"
2. "26.9841\n,300.007666\n,4,\n1,0,15,1,0"
3. "2 6 . 9 8 4 1 ,\n 3 0 0 .0 0 7 6 6 6 , 4 \n, 1 , 0 , 1 5 , 1 , 0"
These are the exact same string, just with random spaces and newlines thrown in.
I could match those with the following pattern:
(^([-]?\s*?\n*?[0-9 ]+\s*?\n*?(\s*?\n*?[.]\s*?\n*?[0-9 ]+\s*?\n*?)?\s*?\n*?,\s*?\n*?){6}([10]\s*?\n*?,)(\s*?\n*?[-]?\s*?\n*?[0-9 ]+\s*?\n*?([.]\s*?\n*?[0-9 ]+\s*?\n*?)?)$)
It matches 1, 2, and 3; however, this pattern is absurd, can most likely be simplified, and doesn't match all newlines (it won't match a newline occurring inside the [0-9 ]+ chunks). It also just slaps "\s*?\n*?" wherever it can.
Question
What I want to know is whether there is a way to match through those characters, ignoring their occurrence, so that a string matches as long as the pattern would match it if they weren't there.
Note:
The input string should match: ((Decimal|Int),){6}(1|0),(Decimal|Int)
If newline characters appear at the end of the input, assume that no more input follows.
I cannot remove those characters from the input string, as I need to know that they were there.
I do not care about leading or trailing spaces/newlines.
The input will always start with "-" or [0-9] (yes, 0 can be the first char).
The input will always end with [0-9].
Edit
This regex works and passes my test suite: (^(-?\s*[0-9]\s*[\s.0-9]*,){6}(\s*[10]\s*,)(\s*-?\s*[0-9][\s.0-9]*?)$)
If you want a test that does a little more validation,
this would be more appropriate.
However, any time you intersperse whitespace (\s) between alike clusters,
as in (?:\s*\d)+, and that describes most of your data,
there is a risk that there are no waypoints at which the search can end.
This particular regex might work, though.
^\s*((?:\s*[-]?(?:\s*\d)+(?:\s*[.](?:\s*\d)+)?\s*,){6}\s*[10]\s*,\s*[-]?(?:\s*\d)+?(?:\s*[.](?:\s*\d)+?)?)$
https://regex101.com/r/YmKJgW/1
Capture group 1 is a convenience that strips leading whitespace from the match.
^
\s*
( # (1 start)
(?:
\s* [-]?
(?: \s* \d )+
(?:
\s* [.]
(?: \s* \d )+
)?
\s* ,
){6}
\s*
[10] \s* , \s*
[-]?
(?: \s* \d )+?
(?:
\s* [.]
(?: \s* \d )+?
)?
) # (1 end)
$
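To try the answer's regex against the strings from the question, here is a small JavaScript harness (the language choice and the extra failing sample are assumptions, not part of the answer). In JavaScript, \s also matches \n, and without the m flag ^ and $ anchor to the whole input, so the multi-line samples are checked as single strings.

```javascript
// The answer's validation regex, unchanged, as a JS literal
const ROW = /^\s*((?:\s*[-]?(?:\s*\d)+(?:\s*[.](?:\s*\d)+)?\s*,){6}\s*[10]\s*,\s*[-]?(?:\s*\d)+?(?:\s*[.](?:\s*\d)+?)?)$/;

const samples = [
  "26.9841,300.007666,4,1,0,15,1,0",
  "26 . 9841,300 . 007666,4,1,0,15,1,0",
  "26.9841\n,300.007666\n,4,\n1,0,15,1,0",
  "2 6 . 9 8 4 1 ,\n 3 0 0 .0 0 7 6 6 6 , 4 \n, 1 , 0 , 1 5 , 1 , 0",
  "26.9841,300.007666,4,1,0,15,2,0", // hypothetical bad row: 7th field is not 1 or 0
];

for (const s of samples) {
  console.log(JSON.stringify(s), ROW.test(s));
}
```

Group 1 starts at the first non-whitespace char, so match[1] gives the row without its leading whitespace.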
I'm trying to use a regex to match all fractions or 'evs' and the strings (string1, string2) in the following string. The strings may contain any number of spaces ('String 1', 'The String 1', 'The String Number 1').
10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1
The following regex works in Javascript but not in PHP. No errors are returned, just 0 results.
(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs).*?(.+) v (.+).*?(\d{1,3}\/\d{1,3}|evs)
Here's the expected result, other than groups 6 and 7 (run using JavaScript):
If I add a ? to the first (.+) so that it becomes (.+?), I get the desired result but with the first string not captured:
As soon as I remove the ? to capture the whole string, there are no results returned. Can somebody work out what's going on here?
In PCRE/PHP, you may use
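$text = '10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1';
// preg_* functions need a delimited pattern; ~ is used as the delimiter here
$regex = '~(\d{1,3}\/\d{1,3}|evs)\s+(\S+)\s+((?1))\s+(\S+)\s+((?1))\s+(.+?)\s+v\s+(\S+)\s+((?1))\s+(\S+)\s+v\s+(\S+)\s+((?1))~';
if (preg_match_all($regex, $text, $matches)) {
    print_r($matches[0]);
}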
See the regex demo
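The point is that you can't overuse .*? / .+ in the middle of the pattern; that leads to catastrophic backtracking. Most likely PCRE then hits its backtracking limit, and preg_match_all returns false instead of raising an error, which is why you silently get 0 results (preg_last_error() tells you the cause).
You need to use precise patterns to match whitespace and non-whitespace fields, and only use .*? / .+? where a field can contain any mix of whitespace and non-whitespace chars.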
Details
(\d{1,3}\/\d{1,3}|evs) - Group 1 (its pattern can later be reused via the (?1) subroutine call): one to three digits, / and then one to three digits, or evs
\s+(\S+)\s+ - 1+ whitespaces, Group 2 matching 1+ non-whitespace chars, 1+ whitespaces
((?1)) - Group 3 that matches the same way Group 1 pattern does
\s+(\S+)\s+((?1))\s+ - 1+ whitespaces, Group 4 matching 1+ non-whitespaces, 1+ whitespaces, Group 5 with the Group 1 pattern, 1+ whitespaces
(.+?) - Group 6: any 1 or more chars other than line break chars, as few as possible
\s+v\s+ - v enclosed with 1+ whitespaces
(\S+) - Group 7: 1+ non-whitespaces
\s+((?1))\s+ - 1+ whitespaces, Group 8 with Group 1 pattern, 1+ whitespaces
(\S+) - Group 9: 1+ non-whitespaces
\s+v\s+ - v enclosed with 1+ whitespaces
(\S+)\s+((?1)) - Group 10: 1+ non-whitespaces, then 1+ whitespaces and Group 11 with Group 1 pattern.
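JavaScript has no (?1) subroutine calls, so the PHP pattern cannot be reused there verbatim. The sketch below is one way to port it, assuming it is acceptable to repeat the Group 1 sub-pattern: it is stored once in a string and spliced in wherever the PHP version uses (?1). The names ODDS and parseLine are hypothetical.

```javascript
// Group 1 sub-pattern, reused in place of PCRE's (?1)
const ODDS = String.raw`\d{1,3}\/\d{1,3}|evs`;

const PATTERN = new RegExp(
  String.raw`(${ODDS})\s+(\S+)\s+(${ODDS})\s+(\S+)\s+(${ODDS})` +
  String.raw`\s+(.+?)\s+v\s+(\S+)\s+(${ODDS})` +
  String.raw`\s+(\S+)\s+v\s+(\S+)\s+(${ODDS})`
);

function parseLine(text) {
  const m = text.match(PATTERN);
  // Groups 1..11 as described in the Details list, or null if no match
  return m ? m.slice(1) : null;
}

const text =
  "10/3 string1 evs string2 8/5 mon 19:45 string1 v string2 1/1 string1 v string2 1/1";
console.log(parseLine(text));
```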
I am trying to split this string into key-value pairs using a regex:
val x = "title=MyTitle, active=true, title2=MyTitle, Subtitle, new=false, title3=My Title#subtitle1"
I tried this pattern:
([\w]+)=(.*?)([\w]+)
The output is:
title=MyTitle
active=true
title2=MyTitle
new=false
title3=My
Any clue how to modify the regex so that the output becomes:
title=MyTitle
active=true
title2=MyTitle, Subtitle
new=false
title3=My Title#subtitle1
A lookahead works pretty well:
(\w+)=(.*?(?=,\s*\w+=|\s*$))
Breakdown
( # group 1 (key)
\w+ # word characters
) # end group 1
= # equals sign
( # group 2 (value)
.*? # anything, non-greedy
(?= # look-ahead ("followed by")...
,\s* # comma and spaces
\w+= # word characters and an equals sign (the next key)
| # or
\s*$ # spaces and the end of string
) # end look-ahead
) # end group 2
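A minimal JavaScript sketch of the lookahead approach (JavaScript is an assumption; the question's val x suggests Kotlin or Scala). It collects every key=value match with matchAll; the helper splitPairs is hypothetical.

```javascript
const x =
  "title=MyTitle, active=true, title2=MyTitle, Subtitle, new=false, title3=My Title#subtitle1";

// Lazy value, stopped by a lookahead for ", nextKey=" or trailing whitespace + end
const KV = /(\w+)=(.*?(?=,\s*\w+=|\s*$))/g;

function splitPairs(s) {
  // matchAll requires the g flag; each match is [whole, key, value]
  return [...s.matchAll(KV)].map((m) => `${m[1]}=${m[2]}`);
}

console.log(splitPairs(x).join("\n"));
```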
You might use 2 capturing groups.
In the first group, capture 1+ word chars. In the second group, capture the words while asserting that what is on the right is not a whitespace char followed by a word and an equals sign.
(\w+)=([\w+,]+(?:(?!\s\w+=)\s[#\w]+)*)(?:,|$)
(\w+) Capture group 1, match 1+ word chars
= Match =
( Capture group 2
[\w+,]+ Match 1+ word chars or ,
(?: Non capturing group
(?!\s\w+=) Assert what is on the right is not a whitespace, 1+ word chars and =
\s[#\w]+ Match a whitespace and 1+ word chars or #
)* Close group and repeat 0+ times
) Close group 2
(?:,|$) Match either , or assert the end of the string
Regex demo
Or you might use a broader match for the value part using a negated character class instead of matching the specified characters.
(\w+)=([^=\s]+(?:(?!\s\w+=)\s[^=\s,]+)*)(?:,|$)
Regex demo
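Both two-group patterns can be compared side by side in a short JavaScript sketch (the language and the helper name keyValues are assumptions). The trailing (?:,|$) consumes the separator comma, so it never ends up in group 2.

```javascript
const input =
  "title=MyTitle, active=true, title2=MyTitle, Subtitle, new=false, title3=My Title#subtitle1";

// The answer's two patterns, unchanged: explicit classes vs negated classes
const EXPLICIT = /(\w+)=([\w+,]+(?:(?!\s\w+=)\s[#\w]+)*)(?:,|$)/g;
const NEGATED = /(\w+)=([^=\s]+(?:(?!\s\w+=)\s[^=\s,]+)*)(?:,|$)/g;

function keyValues(re, s) {
  // Group 1 is the key, group 2 the value
  return [...s.matchAll(re)].map((m) => [m[1], m[2]]);
}

console.log(keyValues(EXPLICIT, input));
console.log(keyValues(NEGATED, input));
```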
I have the following string:
a125A##THISSTRING##.test123
I need to find THISSTRING. There are many strings that are nearly the same, so I'd like to check whether there is a digit or letter before the ## and also whether there is a dot (.) after the ##.
I have tried something like:
([a-zA-Z0-9]+##?)(.+?)(.##)
But I am unable to get it working.
You can use a lookbehind and a lookahead:
(?<=[a-zA-Z0-9]##).*?(?=##\.)
https://regex101.com/r/i3RzFJ/2
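In JavaScript (an assumption; ES2018+ engines support lookbehind), the lookaround version returns the inner text as the whole match, with no capture group needed. The helper findInner is hypothetical.

```javascript
// Lookbehind: a letter/digit then "##" must precede; lookahead: "##." must follow
const INNER = /(?<=[a-zA-Z0-9]##).*?(?=##\.)/;

function findInner(s) {
  const m = s.match(INNER);
  return m ? m[0] : null;
}

console.log(findInner("a125A##THISSTRING##.test123")); // "THISSTRING"
```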
"But I am unable to get it working."
Let's deconstruct what your regex ([a-zA-Z0-9]+##?)(.+?)(.##) says.
([a-zA-Z0-9]+##?) matches as many [a-zA-Z0-9] as possible, followed by a # and an optional second #.
(.+?) matches one or more of any character, as few as possible.
(.##) matches any character followed by two #. Here . consumes the final G, and then ## is matched. Hence THISSTRING is not completely captured in group 2.
Lookaround assertions are great but can be a little expensive.
You can easily search for such patterns by matching both the wanted and unwanted parts and capturing only the wanted part in a capturing group.
Regex: (?:[a-zA-Z0-9]##)([^#]+)(?:##\.)
Explanation:
(?:[a-zA-Z0-9]##) Non-capturing group matching ## preceded by a letter or digit.
([^#]+) Capturing as many characters other than #. Stops before a # is met.
(?:##\.) Non-capturing group matching ##. literally.
Regex101 Demo
JavaScript example
var myString = "a125A##THISSTRING##.test123";
var myRegexp = /(?:[a-zA-Z0-9]##)([^#]+)(?:##\.)/g;
var match = myRegexp.exec(myString);
console.log(match[1]);
You wrote:
check if there is a digit or letter before the ##
I assume you mean a digit / letter before the first ## and that you want to
check for a dot after the second ## (as in your example).
You can use the following regex:
[a-z0-9]+ # Chars before "##", except the last
(?: # Last char before "##"
(\d) # either a digit - group 1
| # or
([a-z]) # a letter - group 2
)
\#\#? # 1 or 2 "#" chars (escaped: in x mode an unescaped # starts a comment)
([^#]+) # "Central" part - group 3
\#\#? # 1 or 2 "#" chars
(?: # Check for a dot
(\.) # Captured - group 4
| # or nothing captured
)
[a-z0-9]+ # The last part
# Flags:
# i - case insensitive
# x - ignore blanks and comments
How it works:
Group 1 or 2 captures the last char before the first ##
(either group 1 captures a digit or group 2 captures a letter).
Group 3 catches the "central" part (THISSTRING,
a sequence of chars other than #).
Group 4 catches a dot, if any.
You can test it at https://regex101.com/r/ATjprp/1
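JavaScript has no free-spacing (x) flag, so a port of this regex has to drop the whitespace and comments while keeping the i flag. The sketch below is that one-line form, assuming a JavaScript environment; the helper inspectParts and its field names are hypothetical. Outside x mode # is literal, so ##? needs no escaping here.

```javascript
// One-line form of the free-spacing regex; i lets [a-z] match uppercase too
const PARTS = /[a-z0-9]+(?:(\d)|([a-z]))##?([^#]+)##?(?:(\.)|)[a-z0-9]+/i;

function inspectParts(s) {
  const m = s.match(PARTS);
  if (!m) return null;
  return {
    lastCharIsDigit: m[1] !== undefined,  // group 1
    lastCharIsLetter: m[2] !== undefined, // group 2
    central: m[3],                        // group 3
    hasDot: m[4] !== undefined,           // group 4
  };
}

console.log(inspectParts("a125A##THISSTRING##.test123"));
```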
Your regex has an error: an unescaped dot matches any char.
If you want to check for a literal dot, you must escape it
with a backslash (compare with group 4 in my solution).