regular expression cannot match var*num and num*var - regex

I want to match the expressions var * var, var * num, num * var and num * num separately, i.e. using four different regular expressions.
My var could be s1,s2,...,S1,S2,...,v1,v2,...,V1,V2,...
My num could be any float number.
for var*var, I use:
[vVsS][0-9]+\s*[*/]\s*[vVsS][0-9]+
and it works well
for var*num and num*var, I use:
[vVsS][0-9]+\s*[*/]\s*[0-9]+[.]?[0-9]*
and
[0-9]+[.]?[0-9]*\s*[*/]\s*(vVsS)[0-9]+
but it returns nothing when I try the input:
2*4 + s1* 7 + v3 * 2 + s3 * V2 + 5*v1
UPDATE: I can do that now.
For example, for the case of var * num
[vVsS][0-9]+\s*[*/]\s*[0-9]+(?:[.][0-9]+)? works well, as Wiktor Stribiżew suggested in a comment.
But I couldn't find an explanation of the use of (?:) online. Does anyone have an idea about that?

You may use
[vVsS][0-9]+\s*[*/]\s*[0-9]+(?:[.][0-9]+)?
The pattern matches:
[vVsS][0-9]+ - a letter from the character class (either v, V, s or S) followed with one or more digits
\s*[*/]\s* - a / or * enclosed with zero or more whitespaces
[0-9]+ - one or more digits
(?:[.][0-9]+)? - an optional non-capturing group matching a dot and one or more digits.
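To illustrate what (?:...) changes, here is a minimal Python sketch (Python's re module is an assumption; the question does not name a language, and the input is the asker's string with 7 changed to 7.5). A non-capturing group only groups a sub-pattern so that a quantifier such as ? applies to all of it; it does not create a numbered capture. In Python the difference shows up in re.findall, which returns the group contents instead of the whole match as soon as the pattern contains a capturing group.

```python
import re

text = "2*4 + s1* 7.5 + v3 * 2 + s3 * V2 + 5*v1"

# The same var*num pattern twice; only the group type around the decimal part differs.
capturing = r"[vVsS][0-9]+\s*[*/]\s*[0-9]+([.][0-9]+)?"
non_capturing = r"[vVsS][0-9]+\s*[*/]\s*[0-9]+(?:[.][0-9]+)?"

# With a capturing group, findall reports only what the group captured.
captured_parts = re.findall(capturing, text)

# With a non-capturing group, findall reports the whole match.
whole_matches = re.findall(non_capturing, text)

print(captured_parts)
print(whole_matches)
```

One thing to watch for: the patterns that start with a number can also match inside a variable name, for example the 3 * V2 inside s3 * V2. Prefixing them with \b is one way to prevent that, though this is an addition and not part of the original answer.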

Related

Valid regex for number(a,b) format

How can I express number(a,b) in regex? Example:
number(5,2) can be 123.45 but also 2.44
The best I got is: ([0-9]{1,5}.[0-9]{1,2}) but it isn't enough because it wrongly accepts 12345.22.
I thought about doing multiple OR (|) but that can be too long in case of a long format such as number(15,5)
You might use
(?<!\S)(?!(?:[0-9.]*[0-9]){6})[0-9]{1,5}(?:\.[0-9]{1,2})?(?!\S)
Explanation
(?<!\S) Negative lookbehind, assert what is on the left is not a non whitespace char
(?! Negative lookahead, assert what is on the right is not
(?:[0-9.]*[0-9]){6} Match 6 digits, allowing further digits or dots in between
) Close lookahead
[0-9]{1,5} Match 1 - 5 times a digit 0-9
(?:\.[0-9]{1,2})? Optionally match a dot and 1 - 2 digits
(?!\S) Negative lookahead, assert what is on the right is not a non whitespace char
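The pattern above hard-codes number(5,2). A possible way to build it for other values is sketched below in Python; the function name number_pattern and the generalisation of {6} to a + 1 are assumptions made for illustration, and the sketch keeps the answer's reading that a is the maximum total number of digits and b the maximum number of digits after the dot.

```python
import re

def number_pattern(a: int, b: int) -> re.Pattern:
    """Build the pattern for at most `a` digits in total and at most `b` after the dot."""
    return re.compile(
        r"(?<!\S)"                                    # not preceded by a non-whitespace char
        r"(?!(?:[0-9.]*[0-9]){" + str(a + 1) + r"})"  # reject a + 1 or more digits overall
        r"[0-9]{1," + str(a) + r"}"                   # integer part
        r"(?:\.[0-9]{1," + str(b) + r"})?"            # optional fractional part
        r"(?!\S)"                                     # not followed by a non-whitespace char
    )

rx = number_pattern(5, 2)
samples = ["123.45", "2.44", "12345", "12345.22", "1.234"]
results = {s: bool(rx.search(s)) for s in samples}
print(results)
```

For a = 5 and b = 2 this produces exactly the expression shown in the answer, so 12345.22 is rejected by the digit-count lookahead and 1.234 by the limit on fractional digits.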
I don't know Scala, but you would need to input those numbers when building your regular expression.
val a = 5
val b = 2
val regex = (raw"\((?=\d{1," + a + raw"}(?:\.0+)?|(?:(?=.{1," + (a + 1) + raw"}0*)(?:\d+\.\d{1," + b + raw"})))‌.+\)").r
For the scenario above, this checks that the value has either at most 5 digits in total, or at most 6 characters including the decimal point with at most 2 digits after it. Because a and b are set in code, the same expression adapts to other values.

Regex for set of 6 digits from 1-49

I have a problem defining a regular expression correctly. I want to check sets of numbers, e.g.: 1,2,14,15,16,17 or 12,13,14,15,16,17 or 1,2,3,6,7,8. Every set contains 6 numbers from 1 to 49. I check it with the input's pattern field.
I wrote a regex, but it only works for sets of 2-digit numbers.
([1-9]|[1-4][0-9],){5}([1-9]|[1-4][0-9])
Thanks for all answers :)
You forgot to group the number alternatives inside the quantified group, before the comma, and to add the anchors that make the regex engine match the full input string:
^(?:(?:[1-9]|[1-4][0-9]),){5}(?:[1-9]|[1-4][0-9])$
^   ^^^                ^                         ^
Details
^ - start of string
(?:(?:[1-9]|[1-4][0-9]),){5} - five occurrences of:
(?:[1-9]|[1-4][0-9]) - either a digit from 1 to 9 or a number from 10 to 49
, - a comma
(?:[1-9]|[1-4][0-9]) - the sixth number, matched by the same pattern without a trailing comma
$ - end of string.
JS demo:
var strs = ['1,2,14,15,16,17','12,13,14,15,16,17', '1,2,3,6,7,8', '1,2,3,6,7,8,'];
var rng = '(?:[1-9]|[1-4][0-9])';
var rx = new RegExp("^(?:" + rng + ",){5}" + rng + "$");
for (var s of strs) {
  console.log(s, '=>', rx.test(s));
}
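To make the cause of the original failure concrete, the following Python sketch compares the asker's pattern with the corrected one. It assumes re.fullmatch as a stand-in for the ^ and $ anchors; the last sample, containing 50, is an extra case added here. In the original pattern the comma belongs only to the second alternative, so a one-digit number can never be followed by a comma.

```python
import re

original = r"([1-9]|[1-4][0-9],){5}([1-9]|[1-4][0-9])"
fixed = r"(?:(?:[1-9]|[1-4][0-9]),){5}(?:[1-9]|[1-4][0-9])"

samples = [
    "1,2,14,15,16,17",
    "12,13,14,15,16,17",
    "1,2,3,6,7,8",
    "1,2,3,6,7,8,",
    "1,2,3,6,7,50",
]

# re.fullmatch requires the whole string to match, like ^...$ does.
original_ok = [bool(re.fullmatch(original, s)) for s in samples]
fixed_ok = [bool(re.fullmatch(fixed, s)) for s in samples]

for s, before, after in zip(samples, original_ok, fixed_ok):
    print(s, "original:", before, "fixed:", after)
```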

RegEx for matching the first {N} chars and last {M} chars

I'm having an issue filtering tags in Grafana with an InfluxDB backend. I'm trying to filter out the first 8 characters and last 2 of the tag but I'm running into a really weird issue.
Here are some of the names...
GYPSKSVLMP2L1HBS135WH
GYPSKSVLMP2L2HBS135WH
RSHLKSVLMP1L1HBS045RD
RSHLKSVLMP35L1HBS135WH
RSHLKSVLMP35L2HBS135WH
I only want to return something like this:
MP8L1HBS225
MP24L2HBS045
I first started off using this expression:
[MP].*
But it only returns the following out of 148:
PAYNKSVLMP27L1HBS045RD
PAYNKSVLMP27L1HBS135WH
PAYNKSVLMP27L1HBS225BL
PAYNKSVLMP27L1HBS315BR
The pattern [MP].* matches either an M or a P and then matches any character until the end of the string, without taking into account which characters or digits come afterwards.
If you want to match from MP, and the value does not end in a digit but the last character of the match should be a digit, you could use:
MP[A-Z0-9]+[0-9]
If lookaheads are supported you might also use:
MP[A-Z0-9]+(?=[A-Z0-9]{2}$)
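A quick way to check both patterns against some of the tag names from the question is sketched below. Python is used only for illustration and is an assumption; whether lookaheads are available in the actual query depends on the regex engine behind it, and Go-style (RE2) engines do not support them.

```python
import re

names = [
    "GYPSKSVLMP2L1HBS135WH",
    "RSHLKSVLMP1L1HBS045RD",
    "RSHLKSVLMP35L2HBS135WH",
]

# Greedy match that backtracks until the last character is a digit.
ends_on_digit = re.compile(r"MP[A-Z0-9]+[0-9]")
# Match that stops exactly two characters before the end of the string.
stops_before_last_two = re.compile(r"MP[A-Z0-9]+(?=[A-Z0-9]{2}$)")

by_digit = [ends_on_digit.search(n).group(0) for n in names]
by_lookahead = [stops_before_last_two.search(n).group(0) for n in names]

print(by_digit)
print(by_lookahead)
```

For these names both patterns return the same substring. They would differ for a tag whose last two characters include a digit, where only the lookahead version reliably drops exactly two characters.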
You may not even need to match MP. You can simply define a left and a right boundary, just as your question asks, and capture everything in between, which might be faster, with an expression similar to:
(\w{8})(.*)(\w{2})
You can then refer to the middle part as $2, the second capturing group, which makes it easy to use in a replacement.
Performance
This JavaScript snippet shows the performance of this expression using a simple 1-million times for loop.
var repeat = 1000000;
var start = Date.now();
var match;
// Run the replacement exactly `repeat` times.
for (var i = 0; i < repeat; i++) {
  var string = "RSHLKSVLMP35L2HBS135WH";
  var regex = /^(\w{8})(.*)(\w{2})$/g;
  match = string.replace(regex, "$2");
}
var end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
Try Regex: (?<=\w{8})\w+(?=\w{2})
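The capturing-group approach and the lookaround approach can be compared side by side with a short Python sketch (Python is an assumption here; the fixed-width lookbehind (?<=\w{8}) is accepted by its re module, but not every engine supports lookbehinds).

```python
import re

name = "RSHLKSVLMP35L2HBS135WH"

# Capturing groups: keep only the second group in the replacement.
middle_by_group = re.sub(r"^(\w{8})(.*)(\w{2})$", r"\2", name)

# Lookarounds: match only the middle part, with 8 characters before and 2 after.
found = re.search(r"(?<=\w{8})\w+(?=\w{2})", name)
middle_by_lookaround = found.group(0) if found else None

print(middle_by_group)
print(middle_by_lookaround)
```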

String Replacing in Regex

I am trying to replace text in a string using regex. I accomplished it in C# using the same pattern, but in Swift it's not working as needed.
Here is my code:
var pattern = "\\d(\\()*[x]"
let oldString = "2x + 3 + x2 +2(x)"
let newString = oldString.stringByReplacingOccurrencesOfString(pattern, withString:"*" as String, options:NSStringCompareOptions.RegularExpressionSearch, range:nil)
print(newString)
What I want after replacement is :
"2*x + 3 +x2 + 2*(x)"
What I am getting is :
"* + 3 + x2 +*)"
Try this:
(?<=\d)(?=x)|(?<=\d)(?=\()
This pattern does not match any characters in the given string, but zero-width positions between characters.
For example, (?<=\d)(?=x) matches the position between a digit and 'x'.
(?<= is a lookbehind assertion and (?= is a lookahead assertion.
(?<=\d)(?=\() matches the position between a digit and '('.
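So the pattern before escaping:
(?<=\d)(?=x)|(?<=\d)(?=\()
Pattern after escaping for a Swift string literal (only the backslashes are doubled; the parentheses of the lookarounds must stay unescaped, otherwise they would match literal parentheses):
(?<=\\d)(?=x)|(?<=\\d)(?=\\()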
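The effect of replacing zero-width positions can be checked with a small Python sketch (Python instead of Swift is an assumption made to keep the example short; the shorter pattern with a character class is an equivalent variant, not something from the original answer). Because the matches are empty, the replacement inserts the * without consuming the digit, the x or the parenthesis.

```python
import re

old_string = "2x + 3 + x2 +2(x)"

# Position between a digit and 'x', or between a digit and '('.
positions = re.compile(r"(?<=\d)(?=x)|(?<=\d)(?=\()")
# Equivalent, shorter form: one lookahead with a character class.
positions_short = re.compile(r"(?<=\d)(?=[x(])")

new_string = positions.sub("*", old_string)
new_string_short = positions_short.sub("*", old_string)

print(new_string)
```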

regex with all components optionals, how to avoid empty matches

I have to process a comma-separated string which contains triplets of values and translate them to runtime types; the input looks like:
"1x2y3z,80r160g255b,48h30m50s,1x3z,255b,1h,..."
So each substring should be transformed this way:
"1x2y3z" should become Vector3 with x = 1, y = 2, z = 3
"80r160g255b" should become Color with r = 80, g = 160, b = 255
"48h30m50s" should become Time with h = 48, m = 30, s = 50
The problem I'm facing is that all the components are optional (but they preserve order) so the following strings are also valid Vector3, Color and Time values:
"1x3z" Vector3 x = 1, y = 0, z = 3
"255b" Color r = 0, g = 0, b = 255
"1h" Time h = 1, m = 0, s = 0
What I have tried so far?
All components optional
((?:\d+A)?(?:\d+B)?(?:\d+C)?)
The A, B and C are replaced with the correct letter for each case, the expression works almost well but it gives twice the expected results (one match for the string and another match for an empty string just after the first match), for example:
"1h1m1s" two matches [1]: "1h1m1s" [2]: ""
"11x50z" two matches [1]: "11x50z" [2]: ""
"11111h" two matches [1]: "11111h" [2]: ""
This isn't unexpected... after all an empty string matches the expression when ALL of the components are empty; so in order to fix this issue I've tried the following:
1 to 3 quantifier
((?:\d+[ABC]){1,3})
But now, the expression matches strings with wrong ordering or even repeated components!:
"1s1m1h" one match, should not match at all! (wrong order)
"11z50z" one match, should not match at all! (repeated components)
"1r1r1b" one match, should not match at all! (repeated components)
As for my last attempt, I've tried this variant of my first expression:
Match from begin ^ to the end $
^((?:\d+A)?(?:\d+B)?(?:\d+C)?)$
It works better than the first version, but it still matches the empty string. I would also have to tokenize the input first and pass each token to the expression so that the test string can match the begin (^) and end ($) anchors.
EDIT: Lookahead attempt (thanks to Casimir et Hippolyte)
After reading about and (trying to) understand the regex lookahead concept, and with the help of Casimir et Hippolyte's answer, I've tried the suggested expression:
\b(?=[^,])(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b
Against the following test string:
"48h30m50s,1h,1h1m1s,11111h,1s1m1h,1h1h1h,1s,1m,1443s,adfank,12322134445688,48h"
And the results were amazing! It detects complete valid matches flawlessly (other expressions gave me 3 matches on "1s1m1h" or "1h1h1h", which weren't intended to be matched at all). Unfortunately, it captures an empty match every time an invalid token is found, so a "" is detected just before "1s1m1h", "1h1h1h", "adfank" and "12322134445688". I therefore modified the lookahead condition to get the expression below:
\b(?=(?:\d+[ABC]){1,3})(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b
It gets rid of the empty matches in any string which doesn't match (?:\d+[ABC]){1,3}, so the empty matches just before "adfank" and "12322134445688" are gone, but the ones just before "1s1m1h" and "1h1h1h" are still detected.
So the question is: is there a regular expression which matches a triplet of values in a given order, where every component is optional but at least one must be present, and which doesn't match empty strings?
The regex tool I'm using is the C++11 one.
Yes, you can add a lookahead at the beginning to ensure there is at least one character:
^(?=.)((?:\d+A)?(?:\d+B)?(?:\d+C)?)$
If you need to find this kind of substring in a larger string (that is, without tokenizing first), you can remove the anchors and use a more explicit subpattern in a lookahead:
(?=\d+[ABC])((?:\d+A)?(?:\d+B)?(?:\d+C)?)
In this case, to avoid false positives (since you are looking for very small strings that can be part of something else), you can add word boundaries to the pattern:
\b(?=\d+[ABC])((?:\d+A)?(?:\d+B)?(?:\d+C)?)\b
Note: in a comma delimited string: (?=\d+[ABC]) can be replaced by (?=[^,])
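For a comma-delimited input, one possible variation is sketched below in Python. It is not the expression from this answer: it swaps the word boundaries for lookarounds that tie each match to the start and end of a token, and the name find_triplets is hypothetical. Requiring a digit at the start and a comma (or the end of the string) right after the group means an empty match cannot satisfy both conditions. Note that C++11 std::regex in its default ECMAScript grammar has no lookbehind, so there the leading (?<![^,]) would need to be replaced, for example by (?:^|,).

```python
import re

def find_triplets(text, letters):
    """Return the tokens of `text` made of optional, ordered components a, b, c."""
    a, b, c = letters
    pattern = (
        r"(?<![^,])"   # token starts at the beginning of the string or right after a comma
        r"(?=\d)"      # it must begin with a digit, which rules out empty matches
        rf"((?:\d+{a})?(?:\d+{b})?(?:\d+{c})?)"
        r"(?![^,])"    # token ends at the end of the string or right before a comma
    )
    return re.findall(pattern, text)

test = "48h30m50s,1h,1h1m1s,11111h,1s1m1h,1h1h1h,1s,1m,1443s,adfank,12322134445688,48h"
times = find_triplets(test, "hms")
print(times)
```

With the asker's test string, the wrongly ordered "1s1m1h" and the repeated "1h1h1h" produce no match at all, not even an empty one.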
I think this might do the trick.
I am keying on either the beginning of the string (^) or the comma separator (,) to fix the start of each match: (?:^|,).
Example:
#include <iostream>
#include <regex>
#include <string>
// Each match starts at the beginning of the string or right after a comma.
const std::regex r(R"~((?:^|,)((?:\d+[xrh])?(?:\d+[ygm])?(?:\d+[zbs])?))~");
int main()
{
    std::string test = "1x2y3z,80r160g255b,48h30m50s,1x3z,255b";
    std::sregex_iterator iter(test.begin(), test.end(), r);
    std::sregex_iterator end_iter;
    // Print capture group 1, i.e. the token without the leading comma.
    for (; iter != end_iter; ++iter)
        std::cout << iter->str(1) << '\n';
}
Output:
1x2y3z
80r160g255b
48h30m50s
1x3z
255b
Is that what you are after?
EDIT:
If you really want to go to town and make empty expressions unmatched, then as far as I can tell you have to spell out every ordered combination of components, like this:
const std::string A = "(?:\\d+[xrh])";
const std::string B = "(?:\\d+[ygm])";
const std::string C = "(?:\\d+[zbs])";
const std::regex r("(?:^|,)(" + A + B + C + "|" + A + B + "|" + A + C + "|" + B + C + "|" + A + "|" + B + "|" + C + ")");
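Writing the alternatives by hand gets tedious as the number of components grows. The sketch below, in Python and with the hypothetical helper name ordered_alternation, generates the same list with itertools.combinations, which keeps the components in their original order. Longer combinations come first so that the alternation prefers the most complete match.

```python
import re
from itertools import combinations

def ordered_alternation(parts):
    """Join every non-empty, order-preserving combination of `parts` with '|'."""
    alternatives = []
    for size in range(len(parts), 0, -1):       # longest combinations first
        for combo in combinations(parts, size):
            alternatives.append("".join(combo))
    return "|".join(alternatives)

A = r"(?:\d+[xrh])"
B = r"(?:\d+[ygm])"
C = r"(?:\d+[zbs])"

alternation = ordered_alternation([A, B, C])
pattern = re.compile(r"(?:^|,)(" + alternation + r")")

# The empty token between the two consecutive commas is skipped.
matches = pattern.findall("1x2y3z,80r160g255b,,1x3z,255b")
print(matches)
```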