Regex for parentheses (JavaScript)

This is the regexp I created so far:
\((.+?)\)
This is my test string: (2+2) + (2+3*(2+3))
The matches I get are:
(2+2)
And
(2+3*(2+3)
I want my matches to be:
(2+2)
And
(2+3*(2+3))
How should I modify my regular expression?

You cannot parse parenthesized expressions with regular expressions.
There is a mathematical proof that regular expressions can't do this.
Parenthesized expressions form a context-free language, so they can be recognized by pushdown automata (stack machines).
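For reference, the usual proof uses the pumping lemma for regular languages. Here is a sketch that looks only at the parentheses of an expression. L is the language of balanced parenthesis strings, and p is the pumping length the lemma would guarantee if L were regular:

```latex
\begin{aligned}
&L = \{\, w \in \{\mathtt{(}, \mathtt{)}\}^{*} : w \text{ is balanced} \,\},\ \text{assumed regular with pumping length } p \\
&w = \mathtt{(}^{p}\,\mathtt{)}^{p} \in L,\quad w = xyz,\quad |xy| \le p,\quad |y| \ge 1 \\
&\Rightarrow\ y = \mathtt{(}^{k}\ \text{for some } k \ge 1 \\
&\Rightarrow\ xy^{2}z = \mathtt{(}^{p+k}\,\mathtt{)}^{p} \notin L \quad (\text{contradiction})
\end{aligned}
```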
You can, however, define a regular expression that works on any expression whose parentheses are nested at most N levels deep, for an arbitrary finite N (even though the expression quickly gets complex).
You just need to acknowledge that your parentheses might contain an arbitrary number of other parentheses.
\(([^()]*(\([^()]*\)[^()]*)*)\)
It works like this:
\(([^()]* matches an open parenthesis, followed by any number of characters that are not parentheses;
(\([^()]*\)[^()]*)* optionally, there may be another group, formed by an open parenthesis, some non-parenthesis characters, and a matching closing parenthesis. Some other non-parenthesis characters may follow. This can be repeated any number of times. Finally, there must be
)\) a closing parenthesis, which matches the first one.
This works for nesting depth 2. If you want nesting depth 3, you have to recurse further, allowing each of the nested groups described in the second step to contain a parenthesized group of its own.
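The "recurse one more level" step can be automated. Below is a sketch in JavaScript. The name nestedParenPattern is made up. For each extra level, the function wraps the depth-(d-1) pattern inside a new pair of parentheses, so depth 2 reproduces the regex above with non-capturing groups. It assumes you know an upper bound on the nesting depth of your input:

```javascript
// Hypothetical helper: builds a regex that matches parenthesized groups
// nested at most `depth` levels deep (depth >= 1).
function nestedParenPattern(depth) {
  // Depth 1: "(", any non-parenthesis characters, ")"
  let pattern = "\\([^()]*\\)";
  for (let d = 2; d <= depth; d++) {
    // Depth d: "(", non-parenthesis characters optionally interleaved
    // with depth-(d-1) groups, then ")"
    pattern = "\\([^()]*(?:" + pattern + "[^()]*)*\\)";
  }
  return new RegExp(pattern, "g");
}

const expr = "(2+2) + (2+3*(2+3))";
console.log(JSON.stringify(expr.match(nestedParenPattern(2)))); // ["(2+2)","(2+3*(2+3))"]
console.log(JSON.stringify(expr.match(nestedParenPattern(1)))); // ["(2+2)","(2+3)"]
```

With depth 1, the inner (2+3) is matched instead of the whole outer group, so the chosen depth must be at least the deepest nesting in the input.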
Things get much easier if you use a stack. For example:
// Returns every outermost parenthesized group in `input`.
function findParenGroups(input) {
  const foundMatches = [];
  const mStack = [];     // indices of "(" that have not been closed yet
  const paren = /[()]/g; // finds the next parenthesis of either kind
  let m;
  while ((m = paren.exec(input)) !== null) {
    if (m[0] === "(") {
      mStack.push(m.index);
    } else if (mStack.length > 0) {
      // ")" closes the most recently opened "("
      const startIdx = mStack.pop();
      // When the stack is empty again, an outermost group is complete
      if (mStack.length === 0) {
        foundMatches.push(input.substring(startIdx, m.index + 1));
      }
    }
    // A ")" with nothing on the stack is unbalanced and is ignored
  }
  return foundMatches;
}
// findParenGroups("(2+2) + (2+3*(2+3))") returns ["(2+2)", "(2+3*(2+3))"]

How about you parse it yourself using a loop without the help of regex?
Here is one simple way:
You would need a variable, say "level", that keeps track of how many parentheses are currently open (initialize it to 0).
You would also need a string buffer to hold each of your matches (e.g. (2+2) or (2+3*(2+3))).
Finally, you would need somewhere to dump the contents of your buffer whenever you finish reading a match.
As you read the string character by character, increment level by 1 when you come across "(" and decrement it by 1 when you come across ")". Whenever you are inside a group, including its opening "(" and closing ")", put the character into the buffer; text between groups, such as " + ", is not buffered.
When you come across ")" and the level drops back to 0, you know you have a match. This is when you dump the contents of the buffer, clear it, and continue.
This method assumes that every "(" has a corresponding ")" in the input string. It handles an arbitrary number of parentheses and any nesting depth.
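The steps above can be sketched in JavaScript as follows. The name splitTopLevelGroups is illustrative. The sketch takes one small liberty: a stray ")" while level is 0 is ignored instead of driving level negative, and a group that is never closed is simply dropped:

```javascript
// Sketch of the level/buffer approach; splitTopLevelGroups is a made-up name.
function splitTopLevelGroups(input) {
  const matches = []; // where finished matches are dumped
  let level = 0;      // number of "(" that are currently open
  let buffer = "";    // characters of the match being read
  for (const ch of input) {
    if (ch === "(") level++;
    // Only buffer characters inside a group, so text like " + " is skipped
    if (level > 0) buffer += ch;
    if (ch === ")" && level > 0) {
      level--;
      if (level === 0) {
        // The outermost group just closed: dump the buffer and start over
        matches.push(buffer);
        buffer = "";
      }
    }
  }
  return matches;
}

console.log(JSON.stringify(splitTopLevelGroups("(2+2) + (2+3*(2+3))"))); // ["(2+2)","(2+3*(2+3))"]
```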

Related

Regex: Find a word that consists of certain characters

I have a list of dictionary words, and I would like to find any word that consists of (some or all of) the characters of a source word, in any order.
For example:
Characters (source word) to look for: stainless
Found words: stainless, stain, net, ten, less, sail, sale, tale, tales, ants, etc.
Also, if a letter occurs only once in the source word, it can't be repeated in the found word.
Unacceptable words to find: tent (t is repeated), tall (l is repeated), etc.
Acceptable words to find: less (s is already repeated in the source word), etc.
You could take this approach:
Match any sequence of characters that are in the search word, requiring that the match is a word (word-boundaries)
Prohibit that a certain character occurs more often than it is present in the search word, using a negative look-ahead. Do this for every character that is in the search word.
For the given example the regular expression would be:
(?!(\S*s){4}|(\S*t){2}|(\S*a){2}|(\S*i){2}|(\S*n){2}|(\S*l){2}|(\S*e){2})\b[stainless]+\b
The biggest part of the pattern deals with the negative look-ahead. For example:
(\S*s){4} matches a word that contains four 's' characters.
(?! | ) places these patterns as alternatives in a negative look-ahead, so the match fails if any one of them matches.
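To see the pattern in action, here is a quick JavaScript check against a made-up, whitespace-separated word list. The g flag is added so that match() returns every matching word; the pattern itself is unchanged:

```javascript
// The pattern from above, plus the "g" flag so match() returns every hit
const pattern = /(?!(\S*s){4}|(\S*t){2}|(\S*a){2}|(\S*i){2}|(\S*n){2}|(\S*l){2}|(\S*e){2})\b[stainless]+\b/g;

// A small word list used only for illustration
const words = "stainless stain net ten less sail tent tall";

console.log(words.match(pattern).join(" ")); // stainless stain net ten less sail
```

The (\S*t){2} look-ahead rejects tent, and (\S*l){2} rejects tall. The word less passes because the source word contains three s characters.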
Automation
It is clear that making such a regular expression for a given word needs some work, so that is where you could use some automation. Notepad++ cannot help with that, but in a programming environment it is possible. Here is a little snippet in JavaScript that will give you the regular expression that corresponds to a given search word:
function regClassEscape(s) {
// Escape "\", "]", "^" and "-" so they are literal inside a character class:
return s.replace(/[\\\]^-]/g, "\\$&");
}
function buildRegex(searchWord) {
// get frequency of each letter:
let freq = {};
for (let ch of searchWord) {
ch = regClassEscape(ch);
freq[ch] = (freq[ch] ?? 0) + 1;
}
// Produce negative options (too many occurrences)
const forbidden = Object.entries(freq).map(([ch, count]) =>
"(\\S*[" + ch + "]){" + (count + 1) + "}"
).join("|");
// Produce character set
const allowed = Object.keys(freq).join("");
return "(?!" + forbidden + ")\\b[" + allowed + "]+\\b";
}
// I/O management
const [input, output] = document.querySelectorAll("input,div");
input.addEventListener("input", refresh);
function refresh() {
if (/\s/.test(input.value)) {
output.textContent = "Input should have no white space!";
} else {
output.textContent = buildRegex(input.value);
}
}
refresh();
input { width: 100% }
Search word:<br>
<input value="stainless">
Regular expression:
<div></div>

How to split expression containing brackets correctly

I am trying to write an expression handler that correctly splits brackets. Until today it has worked very well, but I've now encountered a problem I hadn't thought of.
I split the expression by the contents of brackets first. Once these are evaluated, I replace the original content with the results and repeat until there are no brackets remaining.
The expression may contain macros/variables. Macros are denoted by text wrapped in dollar signs, e.g. $macro$.
A typical expression:
($exampleA$ * 3) + ($exampleB$ / 2)
Macros are replaced before the expression is evaluated. The above works fine because the process is as follows:
Split expression by brackets, this results in two expressions:
$exampleA$ * 3
$exampleB$ / 2
Each expression is then evaluated; if exampleA = 3 and exampleB = 6:
$exampleA$ * 3 = 3 * 3 = 9
$exampleB$ / 2 = 6 / 2 = 3
The expression is then rebuilt using the results:
9 + 3
The final expression without any brackets is then evaluated to:
12
This works fine until an expression with nested brackets is used:
((($exampleA$ * 3) + ($exampleB$ / 2) * 2) - 1)
This breaks completely because the regular expression I'm using:
regex("(?<=\\()[^)]*(?=\\))");
Results in:
($exampleA$ * 3
$exampleB$ / 2
So how can I decode this correctly? I want the above to be broken down to:
$exampleA$ * 3
$exampleB$ / 2
I am not exactly sure what you are trying to do. If you want to match the innermost expressions, wouldn't this help?:
regex("(?<=\\()[^()]*(?=\\))");
By the way, are the parentheses in your example unbalanced on purpose?
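Matching only innermost groups fits the asker's "evaluate, substitute, repeat" loop. Below is a sketch of that loop in JavaScript, with macros expanded first. The macro values are the ones from the question. The evaluate callback stands in for your own expression evaluator. toyEvaluate uses the Function constructor purely for illustration and must never be used on untrusted input:

```javascript
// Repeatedly evaluates an innermost "( ... )" group (one with no parentheses
// inside it) and splices the result back in, until none remain.
function reduceInnermost(expr, evaluate) {
  const innermost = /\(([^()]*)\)/; // same idea as [^()]* in the answer above
  let m;
  while ((m = innermost.exec(expr)) !== null) {
    expr = expr.slice(0, m.index) + evaluate(m[1]) + expr.slice(m.index + m[0].length);
  }
  return evaluate(expr);
}

// Illustration only: lets JavaScript itself do the arithmetic.
const toyEvaluate = (s) => String(Function(`return (${s});`)());

// Macro values from the question: exampleA = 3, exampleB = 6
const macros = { exampleA: 3, exampleB: 6 };
const raw = "((($exampleA$ * 3) + ($exampleB$ / 2) * 2) - 1)";
const expanded = raw.replace(/\$(\w+)\$/g, (_, name) => String(macros[name]));

console.log(reduceInnermost(expanded, toyEvaluate)); // 14
```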
Traditional regex cannot handle recursive structures like nested brackets.
Depending on which regex flavor you are using, you may be able to use regex recursion. Otherwise, you will probably need a new method for parsing the groups. I think the traditional way is to represent the expression as a stack: start with an empty stack, push when you find a '(', pop when you find a ')'.
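A sketch of that stack approach in JavaScript (the function name is illustrative). Each "(" pushes its index, and the matching ")" pops it and records the text in between. Because an inner group always closes before the group around it, the groups come out innermost-first, which is the order they need to be evaluated in. Unbalanced input raises an error instead of being split incorrectly:

```javascript
// Returns the contents of every parenthesized group, innermost groups first.
function groupsInnermostFirst(expr) {
  const stack = [];  // indices of "(" that are still open
  const groups = []; // contents of each group, without its own parentheses
  for (let i = 0; i < expr.length; i++) {
    if (expr[i] === "(") {
      stack.push(i);
    } else if (expr[i] === ")") {
      if (stack.length === 0) throw new Error(`Unmatched ")" at index ${i}`);
      const start = stack.pop();
      groups.push(expr.slice(start + 1, i));
    }
  }
  if (stack.length > 0) throw new Error(`Unmatched "(" at index ${stack.pop()}`);
  return groups;
}

const found = groupsInnermostFirst("((($exampleA$ * 3) + ($exampleB$ / 2) * 2) - 1)");
console.log(JSON.stringify(found.slice(0, 2))); // ["$exampleA$ * 3","$exampleB$ / 2"]
```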
You can't really do this with regex. You really need a recursive method, like this:
using System;
using System.Data;
using System.Globalization;
public class Program
{
    public static void Main()
    {
        Console.WriteLine(EvaluateExpression("(1 + 2) * 7")); // prints 21
    }
    public static double EvaluateExpression(string expression)
    {
        // Recursively evaluate parentheses as sub-expressions
        var expr = expression;
        while (expr.Contains("("))
        {
            // Find the first opening bracket
            var count = 1;
            var pStart = expr.IndexOf('(');
            var pos = pStart + 1;
            // Find the matching closing bracket
            while (pos < expr.Length && count > 0)
            {
                if (expr[pos] == '(') count++;
                if (expr[pos] == ')') count--;
                pos++;
            }
            // Error if there is no matching closing bracket
            if (count > 0) throw new InvalidOperationException("Closing parenthesis not found.");
            // Divide the expression into the part before, the sub-expression, and the part after
            var pre = expr.Substring(0, pStart);
            var subexpr = expr.Substring(pStart + 1, pos - pStart - 2);
            var post = expr.Substring(pos);
            // Recursively evaluate the sub-expression; InvariantCulture keeps "." as the decimal separator
            expr = string.Format(CultureInfo.InvariantCulture, "{0} {1} {2}", pre, EvaluateExpression(subexpr), post);
        }
        // Replace this line with your own logic to evaluate 'expr', an expression with all brackets removed.
        // Convert.ToDouble avoids an invalid unboxing cast when Compute returns a double or decimal.
        return Convert.ToDouble(new DataTable().Compute(expr, null), CultureInfo.InvariantCulture);
    }
}
I'm assuming you're using C# here, but you should get the idea and be able to translate it into whatever language you use.
If you use the following regex, you can capture the contents as group(1); group(0) will include the parentheses.
"\\(((?:\"\\(|\\)\"|[^()])+)\\)"
Hope it helps!

How to use regular expressions to extract 3-tuple values from a string

I am trying to extract n 3-tuples (Si, Pi, Vi) from a string.
The string contains at least one such 3-tuple.
Pi and Vi are not mandatory.
SomeTextxyz#S1((property(P1)val(V1))#S2((property(P2)val(V2))#S3
|----------1-------------|----------2-------------|-- n
The desired output would be:
Si,Pi,Vi.
So for n occurrences in the string the output should look like this:
[S1,P1,V1] [S2,P2,V2] ... [Sn-1,Pn-1,Vn-1] (without the brackets)
Example
The input string could be something like this:
MyCarGarage#Mustang((property(PS)val(500))#Porsche((property(PS)val(425)).
Once processed the output should be:
Mustang,PS,500 Porsche,PS,425
Is there an efficient way to extract those 3-tuples using a regular expression (e.g. using C++ and std::regex), and what would it look like?
#(.*?)\(\(property\((.*?)\)val\((.*?)\)\) should do the trick.
Example at http://regex101.com/r/bD1rY2
# # Matches the # symbol
(.*?) # Captures everything until it encounters the next part (ungreedy wildcard)
\(\(property\( # Matches the string "((property(", the backslashes escape the parentheses
(.*?) # Same as the one above
\)val\( # Matches the string ")val("
(.*?) # Same as the one above
\)\) # Matches the string "))"
How you should implement this in C++ I don't know, but that is the easy part :)
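The regex can be tried out in JavaScript before porting it. In C++, std::sregex_iterator plays the role that matchAll plays here. This sketch reproduces the expected output from the question. Because this pattern requires Pi and Vi, a tuple without them is skipped. If another tuple follows it, the lazy (.*?) swallows it into the next name instead:

```javascript
// The pattern from the answer above; matchAll() requires the "g" flag
const tuple = /#(.*?)\(\(property\((.*?)\)val\((.*?)\)\)/g;
const garage = "MyCarGarage#Mustang((property(PS)val(500))#Porsche((property(PS)val(425)).";

// Groups 1, 2 and 3 hold Si, Pi and Vi
const output = [...garage.matchAll(tuple)].map((m) => `${m[1]},${m[2]},${m[3]}`).join(" ");
console.log(output); // Mustang,PS,500 Porsche,PS,425
```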
http://ideone.com/S7UQpA
I used C's <regex.h> instead of std::regex because std::regex wasn't implemented in the version of g++ that IDEONE used at the time (libstdc++ only gained a working std::regex in GCC 4.9). The regular expression I used:
" In C(++)? regexes are strings.
# Literal match
([^(#]+) As many non-#, non-( characters as possible. This is group 1
( Start another group (group 2)
\\(\\(property\\( Yet more literal matching
([^)]+) As many non-) characters as possible. Group 3.
\\)val\\( Literal again
([^)]+) As many non-) characters as possible. Group 4.
\\)\\) Literal parentheses
) Close group 2
? Group 2 optional
" Close Regex
And some C++ (it only uses C's <regex.h>, so it also compiles as C):
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* The original answer defines `item` elsewhere; this layout is an assumption. */
typedef struct { char *name; char *property; char *value; } item;

/* The pattern annotated above, written as a C string literal. */
#define REGEX "#([^(#]+)(\\(\\(property\\(([^)]+)\\)val\\(([^)]+)\\)\\))?"

/* Copy one submatch out of haystack into a newly allocated, NUL-terminated string. */
static char *copyMatch(const char *haystack, regmatch_t m) {
    size_t len = (size_t)(m.rm_eo - m.rm_so);
    char *s = (char *) malloc(len + 1);
    memcpy(s, haystack + m.rm_so, len);
    s[len] = '\0';
    return s;
}

int getMatches(const char *haystack, item **items) {
    /* First, count the '#' characters: that is the maximum number of matches. */
    int ats = 0;
    for (const char *p = haystack; *p; p++)
        if (*p == '#')
            ats++;

    /* malloc a large enough array (at least one element, to avoid malloc(0)). */
    *items = (item *) malloc((ats > 0 ? ats : 1) * sizeof(item));
    item *arr = *items;

    /* Compile the regex needle. */
    regex_t needle;
    if (regcomp(&needle, REGEX, REG_ICASE | REG_EXTENDED) != 0) {
        free(*items);
        *items = NULL;
        return -1;
    }
    regmatch_t match[5];

    /* Loop over matches. regexec returns 0 for "found a match" and REG_NOMATCH when
       there are no more; other error codes exist and you may want to catch them here.
       x counts the matches found so far. */
    int x = 0;
    while (regexec(&needle, haystack, 5, match, 0) == 0) {
        /* Get the name from match[1]. */
        arr[x].name = copyMatch(haystack, match[1]);

        /* rm_so is -1 for groups that did not participate, so this checks that
           the property (match[3]) and the value (match[4]) were found. */
        if (match[3].rm_so != -1 && match[4].rm_so != -1) {
            arr[x].property = copyMatch(haystack, match[3]);
            arr[x].value = copyMatch(haystack, match[4]);
        } else {
            /* Otherwise, set both property and value to NULL. */
            arr[x].property = NULL;
            arr[x].value = NULL;
        }
        x++;

        /* Move the haystack past the match. */
        haystack += match[0].rm_eo;
    }
    regfree(&needle);

    /* Return the number of matches. */
    return x;
}
Hope this helps. Though it occurs to me now that you never answered kind of a vital question: What have you tried?
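The optional-group pattern from this answer can be checked in JavaScript before you write the C. In this sketch (extractTuples is a made-up name), the i flag mirrors REG_ICASE. Groups 3 and 4 come back undefined when the optional group 2 does not participate, which corresponds to the check for unmatched groups in the C code. Here, a trailing #S3 without property and value is still reported:

```javascript
// The same pattern as the C string, as a JavaScript regex literal;
// "g" is needed by matchAll(), "i" mirrors REG_ICASE
const optionalTuple = /#([^(#]+)(\(\(property\(([^)]+)\)val\(([^)]+)\)\))?/gi;

function extractTuples(input) {
  // Groups 3 and 4 are undefined when the optional group 2 did not match
  return [...input.matchAll(optionalTuple)].map((m) => [m[1], m[3] ?? null, m[4] ?? null]);
}

console.log(JSON.stringify(extractTuples("SomeTextxyz#S1((property(P1)val(V1))#S2((property(P2)val(V2))#S3")));
// [["S1","P1","V1"],["S2","P2","V2"],["S3",null,null]]
```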

how to solve adding with REGEX

I want to evaluate a math expression that contains only addition, such as 1+2+3, and return its result. I have the following code, and the problem is that it doesn't handle doubles (I can't write 2.2+3.4).
I tried to change the regex to ([\+-]?\d+.\d+)([\+-])(-?(\d+.\d+)), and now it doesn't handle integers (I can't write 2+4). What should the correct regex be to handle both doubles and integers? Thanks
The code:
regEx = new Regex(@"([\+-]?\d+)([\+-])(-?(\d+))");
m = regEx.Match(Expression, 0);
while (m.Success)
{
double result;
switch (m.Groups[2].Value)
{
case "+":
result = Convert.ToDouble(m.Groups[1].Value) + Convert.ToDouble(m.Groups[3].Value);
if ((result < 0) || (m.Index == 0)) Expression = regEx.Replace(Expression, DoubleToString(result), 1);
else Expression = regEx.Replace(Expression, "+" + result, 1);
m = regEx.Match(Expression);
continue;
case "-":
result = Convert.ToDouble(m.Groups[1].Value) - Convert.ToDouble(m.Groups[3].Value);
if ((result < 0) || (m.Index == 0)) Expression = regEx.Replace(Expression, DoubleToString(result), 1);
else Expression = regEx.Replace(Expression, "+" + result, 1);
m = regEx.Match(Expression);
continue;
}
}
if (Expression.StartsWith("--")) Expression = Expression.Substring(2);
return Expression;
}
As the comments have stated, RegEx is not a good solution to this problem. You would be much better off with either a simple split statement (if you only want to support the + and - operators), or an actual parser (if you want to support actual mathematical expressions).
But, for the sake of explaining some RegEx, your problem is that \d+.\d+ matches "one or more digits, followed by any character, followed by one or more digits." If you gave it an integer greater than 99, it would work, since you're matching . (any character) and not \. (specifically the dot character).
A simpler version would be [\d\.]+, which matches one-or-more digits-or-dots. The problem is that it allows multiple dots, so 8.8.8.8 is a valid match. So what you really want is \d+\.?\d*, which matches one-or-more digits, one-or-zero dots, and zero-or-more digits. Thus 2, 2., and 2.05 are all valid matches.
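Here is a minimal sketch of the "simple split" idea in JavaScript rather than the asker's C#. sumTerms is a hypothetical name. It pulls out each signed number using [+-]? followed by the \d+\.?\d* pattern and adds them up, so it handles only + and -:

```javascript
// Hypothetical helper: adds up every signed number in an expression
// that uses only + and -. [+-]? is an optional sign; \d+\.?\d* is the number.
function sumTerms(expression) {
  const terms = expression.replace(/\s+/g, "").match(/[+-]?\d+\.?\d*/g) ?? [];
  return terms.reduce((total, term) => total + parseFloat(term), 0);
}

console.log(sumTerms("2+4"));     // 6
console.log(sumTerms("10-2.5"));  // 7.5
console.log(sumTerms("2.2+3.4")); // approximately 5.6 (binary floating point)
```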

I want to check a string against many different regular expressions at once

I have a string that the user has entered, and I have regular expressions stored in my database. Checking the input string against those regular expressions works fine.
But now I need to add another column to my database that will hold another regular expression. I want to use the same for loop to check the input string against this new regular expression as well, but only at the end of my first loop, and against the same string.
i.e.:
\\D\\W\\D <-- first expression
\\d <-- second expression which I want to use after the first expression is over
Using the regular expressions from the database against the input string works.
Adding the new regular expression, incorporating it into the same loop, and checking it against the same string is not working.
My code is as follows:
std::string errorMessages [2][2] = {
{
"Correct .R\n",
},
{
"Free text characters out of bounds\n",
}
};
for(int i = 0; i < el.size(); i++)
{
if(el[i].substr(0,3) == ".R/")
{
DCS_LOG_DEBUG("--------------- Validating .R/ ---------------");
output.push_back("\n--------------- Validating .R/ ---------------\n");
str = el[i].substr(3);
split(st,str,boost::is_any_of("/"));
DCS_LOG_DEBUG("main loop done");
for (int split_id = 0 ; split_id < splitMask.size() ; split_id++ )
{
boost::regex const string_matcher_id(splitMask[split_id]);
if(boost::regex_match(st[split_id],string_matcher_id))
{
a = errorMessages[0][split_id];
DCS_LOG_DEBUG("" << a );
}
else
{
a = errorMessages[1][split_id];
DCS_LOG_DEBUG("" << a);
}
output.push_back(a);
}
DCS_LOG_DEBUG("Out of the loop 2");
}
}
How can I retrieve my regular expression from the database and, after this loop has finished, use the new regex against the same string?
The string is: shamari
The regular expression I want to add: "\\d"
Ask me any questions if you do not understand.
I'm not sure I understand you entirely, but if you're asking "How do I combine two separate regexes into a single regex", then you need to do
combinedRegex = "(?:" + firstRegex + ")|(?:" + secondRegex + ")"
if you want an "or" comparison (either one of the parts must match).
For an "and" comparison it's a bit more complicated, depending on whether these regexes match the entire string or only a substring.
Be aware that if the second regex uses numbered backreferences, this won't work since the indexes will change: (\w+)\1 and (\d+)\1 would have to become (?:(\w+)\1)|(?:(\d+)\2), for example.
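Here is a sketch of both combinations in JavaScript. anyOf and allOf are made-up names. The "and" version puts each pattern in its own look-ahead anchored at the start of the string, so every pattern must match somewhere in it. RegExp.test looks for a match anywhere, whereas the question's boost::regex_match requires the whole string to match. For that behaviour you would anchor each part instead (for example (?:p)$ inside the look-ahead). Numbered backreferences still need renumbering as described above:

```javascript
// Both helpers test whether each pattern matches somewhere in the string.

// "or": either pattern may match
function anyOf(...patterns) {
  return new RegExp(patterns.map((p) => `(?:${p})`).join("|"));
}

// "and": every pattern must match; each sits in its own look-ahead from the start
function allOf(...patterns) {
  return new RegExp("^" + patterns.map((p) => `(?=[\\s\\S]*(?:${p}))`).join(""));
}

const first = "\\D\\W\\D"; // the first expression from the question
const second = "\\d";      // the second expression

console.log(anyOf(first, second).test("shamari")); // false
console.log(anyOf(first, second).test("a-b"));     // true
console.log(allOf(first, second).test("a-b"));     // false
console.log(allOf(first, second).test("a-b 7"));   // true
```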