Regex for matching a string with optional groups and a separator

I have three string groups, a, b, and c, each of which is optional, and a separator, the comma (,). Examples:
a
b
c
a,b
a,c
a,b,c
How can I write a regex that matches the strings above? At least one of a, b, or c will be present, and a, b, c always appear in a fixed order.
The constraint is that a, b, and c are long strings, so we can't hard-code every combination:
/a|b|c|a,b|a,c|a,b,c/ -> we can't do this
I don't have a solution yet, but below is what I have tried (not working, of course):
/a?,?b?,?c?/
...
Extension to the question above:
Can we make this solution more general? E.g.:
More groups: d, e, f
The separator is customized between each group, like: a abc b, b, c, a; c
a, b, c appear in any order
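One possible approach to the fixed-order case (a sketch, not taken from the original post) is to generate the pattern from a list of groups instead of writing it by hand. For each group that could come first, every later group becomes an optional ",group" suffix, so the pattern grows quadratically rather than exponentially. The helper name build_pattern is hypothetical, the sketch assumes a single separator shared by all groups, and it assumes b,c is also acceptable even though the question does not list it. It does not handle groups in arbitrary order.

```python
import re

def build_pattern(groups, sep=","):
    """Compile a regex for any non-empty, in-order selection of groups joined by sep."""
    escaped = [re.escape(g) for g in groups]
    sep_re = re.escape(sep)
    alternatives = []
    for i, first in enumerate(escaped):
        # Each later group is optional and, when present, preceded by the separator.
        tail = "".join(f"(?:{sep_re}{g})?" for g in escaped[i + 1:])
        alternatives.append(first + tail)
    return re.compile("|".join(alternatives))

# Short placeholders stand in for the long strings a, b, c.
pattern = build_pattern(["a", "b", "c"])

candidates = ["a", "b", "c", "a,b", "a,c", "b,c", "a,b,c", "", ",b", "a,", "c,a", "ab", "a,,c"]
accepted = [s for s in candidates if pattern.fullmatch(s)]
rejected = [s for s in candidates if not pattern.fullmatch(s)]
```

Using fullmatch keeps the pattern free of explicit anchors; adding more groups only means passing a longer list.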

Related

Replacing Arithmetic operations in C program with ternary operations and return the modified program

This is my first post on a community forum like stackoverflow so please forgive any mistakes.
I am working in the field of program repair and am unable to get past this roadblock.
For every arithmetic operation in a program, the goal is to introduce another arithmetic operation (the inverse of the current operator) between the same operands, so that the choice between the original operation and the new one can be made conditionally at run time.
For Example : Consider the snippet
d = a + b;
Here, the statement should be replaced with
d = b1 ? a + b : a - b;
b1 is a boolean variable introduced to make the decision.
This has to be done for every arithmetic operation in a given C program.
Consider the following C program
int function(int a, int b, int c) {
int d = a + b * c;
return d;
}
The output should be
int function(int a, int b, int c, int b1, int b2) {
int d = (b2 ? a + (b1 ? b * c : b / c) : a - (b1 ? b * c : b / c)) ;
return d;
}
Here again, b1 and b2 are boolean variables introduced to make the decision.
I am still not sure what the best way to do this is. I tried designing a parser but, being new to the field, was unable to do so because, unlike plain parsing, new statements have to be added here.
I am stuck on this for days and any help would be appreciated.
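One way to think about the task is as a rewrite over the expression tree: parse each expression respecting precedence, and whenever a binary operator is reduced, emit a ternary that selects between the operator and its inverse. The following is only a sketch of that idea for a single expression containing identifiers, integers, parentheses, and + - * /. The names Rewriter, rewrite, and tokenize are hypothetical, and handling a whole C program (declarations, function signatures, unary operators, division by zero) would need a real C front end.

```python
import re

INVERSE = {"+": "-", "-": "+", "*": "/", "/": "*"}

def tokenize(source):
    # Identifiers, integer literals, operators, and parentheses only.
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", source)

class Rewriter:
    """Recursive-descent parser that emits rewritten text instead of a tree."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.flags = []  # names of the introduced decision variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def wrap(self, left, op, right):
        # Allocate a fresh flag per operator, innermost operators first.
        flag = f"b{len(self.flags) + 1}"
        self.flags.append(flag)
        return f"({flag} ? {left} {op} {right} : {left} {INVERSE[op]} {right})"

    def expr(self):  # expr := term (('+' | '-') term)*
        left = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            left = self.wrap(left, op, self.term())
        return left

    def term(self):  # term := atom (('*' | '/') atom)*
        left = self.atom()
        while self.peek() in ("*", "/"):
            op = self.take()
            left = self.wrap(left, op, self.atom())
        return left

    def atom(self):  # atom := name | number | '(' expr ')'
        token = self.take()
        if token == "(":
            inner = self.expr()
            self.take()  # consume ')'
            return f"({inner})"
        return token

def rewrite(expression):
    rewriter = Rewriter(tokenize(expression))
    return rewriter.expr(), rewriter.flags

new_expr, new_params = rewrite("a + b * c")
```

For a + b * c this yields the same expression as in the question, with b1 guarding the multiplication and b2 the addition; the returned flag names are what would be appended to the function's parameter list.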

Regex to capture string outside of parentheses and remove comma

I have a string like below
A (1), B (2), C (1), D (3)
from which I would like to extract only A, B, C, D, removing all parentheses, commas, and whitespace. I came up with (.*?)\s?\(.*?\),* but the match for the second element (B) still has leading whitespace.
Expected output is a list
A
B
C
D
I use https://regex101.com/ to verify.
You can try this:
console.log('A (1), B (2), C (1), D (3)'.replace(/[^A-Z]/g, ''));
You can try it on regexr
Update:
console.log('A (1), B (2), C (1), D (3)'.replace(/[^A-Z]/g, '').replace(/([A-Z])/g, '$1\n'));
Based on @AvinashRaj's comment:
console.log('A (1), B (2), C (1), D (3)'.replace(/[^A-Z]+/g, '\n'));
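Since the expected output is a list, an alternative is to capture each name directly rather than deleting everything else. The sketch below uses Python rather than JavaScript and assumes every name is a run of characters without whitespace, commas, or parentheses, followed by optional whitespace and an opening parenthesis. Unlike the [^A-Z] approach, it would also work for names longer than one letter.

```python
import re

text = "A (1), B (2), C (1), D (3)"

# Capture the token that sits immediately before each "(...)" group.
names = re.findall(r"([^\s,()]+)\s*\(", text)
```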

Formal language theory (regular expressions and regular languages) - concept of "OR"

Okay, so in programming the logical OR symbol (typically ||) when applied to operands a and b, that is, a || b, means that either a or b can be true, OR both can be true. If you want only one to be true, you use XOR (sometimes, the ^ symbol.)
However, in formal language theory, the concept of OR (typically the + symbol) seems to imply exclusive-or (xor) instead of regular OR. For example, if we describe a language L with a regular expression aa + bb + ab, a valid string (word) from the language would be one of those (aa, bb, or ab), not some concatenation of them. To do that, you must use the Kleene closure, as in (aa + bb + ab)*, right?
Perhaps I'm just thinking of + as being defined in a peculiar way, or perhaps it's that the operands are no longer Boolean?
I'm just looking for verification if I seem to be understanding that + (OR) has a seemingly different meaning in formal language / computational modeling than it does in programming languages. Thanks!
The formal language OR is an inclusive ("regular") OR. E.g., the regular language ab* + a*b includes strings that are in both ab* and a*b (i.e., the string ab).
The problem is not with the operator - the + in regular expressions really does mean the same thing as union of sets - the problem is with your understanding of the operands. Specifically, in your regular expression, aa + bb + ab, aa does not represent a string over your alphabet, but a sub-regular expression. Regular expressions describe sets of strings; so the regular expression aa describes the set of strings {aa}. So, the regular expression aa + bb + ab describes the set of strings {aa} union {bb} union {ab} = {aa, bb, ab}. The exclusive-or of set theory, symmetric difference, does not have an operator in the regular expression syntax. We can recursively define the language of a regular expression, written L(r) for regular expression r, as follows:
L(r) = {r}, if r is a string over the alphabet;
L(r) = L(s)L(t) if r = st;
L(r) = L(s)* if r = s*;
L(r) = L(s) union L(t) if r = s + t.
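The union reading can be checked on a small scale. The sketch below enumerates all strings over {a, b} up to a length bound (an assumption made only to keep the sets finite) and uses Python's re syntax, where the formal + is written as a set union of the two languages. The helper name language is hypothetical.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=3):
    """Strings over `alphabet`, up to `max_len` long, that fully match `pattern`."""
    words = ("".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n))
    return {w for w in words if re.fullmatch(pattern, w)}

first = language(r"ab*")    # L(ab*) restricted to length <= 3
second = language(r"a*b")   # L(a*b) restricted to length <= 3

union = first | second      # what the formal "+" denotes
in_both = first & second    # strings belonging to both operands
exclusive = first ^ second  # symmetric difference, i.e. set-level XOR
```

The string ab lies in both operand languages and is still in the union, whereas it would be dropped by the symmetric difference.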

regular expression evaluation in string matching

I am reading the regular expressions section of the book Algorithms by Robert Sedgewick.
The book gives the following regular expression:
A* | (A*BA*BA*)*
The author says it matches AAA, BBAABB, and BABAAA,
and does not match ABA, BBB, or BABBAAA.
My question is how BBAABB matches, and likewise how BABAAA matches. Kindly explain.
In general, I am looking for how to evaluate the | and * operators in regular expressions.
Also, in the example below, how can we get b alone in the set if we have a+, since + says we must have at least one a?
(a+b)* = (λ, a, b, aa, ab, ba, bb, aaa, ...)
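As quantifiers, * and + differ in one way: the element followed by * may occur zero or more times, whereas the element followed by + must occur at least once. In (a+b)*, however, + is the formal-language notation for alternation (a or b), not a quantifier, which is why b alone is in the set.
In A* | (A*BA*BA*)*, BBAABB matches the second alternative, (A*BA*BA*)*, as follows:
The leading A* matches no A
The first B is followed by an A* that matches no A
The second B is followed by an A* that matches 2 A's, so the first repetition is BBAA
The * at the end of (A*BA*BA*)* means the group can repeat, so the second repetition matches BB
These are the reasons BBAABB is valid. BABAAA matches in a single repetition: no A, then B, one A, B, and three A's.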
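The six strings from the question can also be checked mechanically. This is a sketch in Python's re syntax, which uses the same | and * operators; the spaces around | in the book's notation are dropped because they would otherwise be matched literally.

```python
import re

# A* | (A*BA*BA*)* : either all A's, or blocks that each contain exactly two B's.
pattern = re.compile(r"A*|(A*BA*BA*)*")

samples = ["AAA", "BBAABB", "BABAAA", "ABA", "BBB", "BABBAAA"]
results = {s: bool(pattern.fullmatch(s)) for s in samples}
```

Each repetition of the group consumes exactly two B's, so the second alternative accepts only strings with an even number of B's, which is why ABA (one B) and BBB and BABBAAA (three B's each) are rejected.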

How to get the difference of two variables, when there are missing values?

I have two variables, A and B, and I want to compute A - B as a new variable called C. For that I used generate C = A - B, but it gives missing values in C whenever either A or B is missing.
For example, if A is 5000 while B is missing, C is missing, even though I want C to be 5000.
So I want to treat those missing values as zeros and get the answer. How can I do this in Stata?
gen C = cond(missing(A, B), min(A, B), A - B)
which is short-hand for
gen C = A - B
replace C = min(A, B) if missing(A, B)
which is short for
gen C = A - B
replace C = B if missing(A)
replace C = A if missing(B)
For a tutorial on cond() see http://www.stata-journal.com/article.html?article=pr0016
The result of min(A, B) is always the single non-missing value when there is one. (The same is true of max(A, B), in fact.)
You didn't spell out what you want if both are missing; the code here returns missing as the difference.
If your missings really are to be thought of as zeros, see help mvencode.
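The two readings differ when A is the missing one. The sketch below models them in plain Python, with None standing in for a Stata missing value; both function names are hypothetical. The first function mirrors the replace statements above (C takes whichever value is present), while the second treats a missing value as zero before subtracting.

```python
def diff_keep_nonmissing(a, b):
    """Mirror of the replace logic above: fall back to the value that is present."""
    if a is None and b is None:
        return None  # both missing: the difference stays missing
    if a is None:
        return b
    if b is None:
        return a
    return a - b

def diff_missing_as_zero(a, b):
    """Alternative reading: a missing value counts as 0."""
    return (0 if a is None else a) - (0 if b is None else b)

rows = [(5000, None), (None, 3), (7, 2), (None, None)]
kept = [diff_keep_nonmissing(a, b) for a, b in rows]
zeroed = [diff_missing_as_zero(a, b) for a, b in rows]
```

With A = 5000 and B missing, both give 5000. With A missing and B = 3, the first gives 3 and the second gives -3, so the choice depends on whether missings really are meant as zeros.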