How to match variables in formula with regular expressions - regex

I have a formula similar to this: a + b - c * (exp(a*b) ) / 3
I want to match only the variables (a, b, c). For me, [a-zA-Z]+ does the job, but I do not want to match the exp function. How can I achieve this with regular expressions? I use JavaScript.

([a-zA-Z]+)\b(?!\s*\()
A more common notion of acceptable variable names would be
\b([a-zA-Z_]\w*)\b(?!\s*\()
With dots in function names it becomes
(?:[^.]|^)\b([a-zA-Z]+)\b(?!(\.|\s*\())
(the variable will be in the first capturing group)
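As a rough check of the second pattern above, here is a minimal sketch in Python. Python is an assumption made for illustration (the question is about JavaScript, where the same \b and negative lookahead syntax is available), and the helper name find_variables is hypothetical.

```python
import re

# An identifier that is NOT followed by optional whitespace and "(",
# i.e. a name that is not being called as a function.
VARIABLE_RE = re.compile(r'\b([a-zA-Z_]\w*)\b(?!\s*\()')

def find_variables(formula):
    """Return variable names in order of first appearance, without duplicates."""
    seen = []
    for name in VARIABLE_RE.findall(formula):
        if name not in seen:
            seen.append(name)
    return seen

print(find_variables('a + b - c * (exp(a*b) ) / 3'))  # ['a', 'b', 'c']
```

The trailing \b matters: without it the engine could backtrack and report part of a function name such as ex.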

Changing it to \b[a-zA-Z]\b will match single letters that do not stand next to other word characters. (Avoid the range [A-z]: it also includes the characters [, \, ], ^, _ and `.)

Related

Combining 2 regular expressions

I have 2 strings and I would like to get a result that gives me everything before the first '\n\n'.
'1. melléklet a 37/2018. (XI. 13.) MNB rendelethez\n\nÁltalános kitöltési előírások\nI.\nA felügyeleti jelentésre vonatkozó általános szabályok\n\n1.
'12. melléklet a 40/2018. (XI. 14.) MNB rendelethez\n\nÁltalános kitöltési előírások\n\nKapcsolódó jogszabályok\naz Önkéntes Kölcsönös Biztosító Pénztárakról szóló 1993. évi XCVI. törvény (a továbbiakban: Öpt.);\na személyi jövedelemadóról szóló 1995. évi CXVII.
I have been trying to combine 2 regular expressions to solve my problem; however, I may be on the wrong track. Maybe a function would be easier, I do not know.
Here is one that I believe finds the character 'z':
extended regex : [\z+$]
I guess finding the first number is: [^0-9.].+
My problem is how to combine these two expressions to get the string in between them.
Is there a more efficient way to do this?
You may use
re.findall(r'^(\d.*?)(?:\n\n|$)', s, re.S)
Or with re.search, since it seems that only one match is expected:
m = re.search(r'^(\d.*?)(?:\n\n|$)', s, re.S)
if m:
    print(m.group(1))
See the Python demo.
Pattern details
^ - start of a string
(\d.*?) - Capturing group 1: a digit and then any 0+ chars, as few as possible
(?:\n\n|$) - a non-capturing group matching either two newlines or end of string.
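Put together, the pattern can be tried with a short self-contained sketch. The sample string is a shortened version of the first string in the question, and the helper name first_block is hypothetical.

```python
import re

def first_block(s):
    """Return the text from the leading digit up to the first blank line, or None."""
    # re.S lets "." match newlines; the lazy ".*?" stops at the first "\n\n".
    m = re.search(r'^(\d.*?)(?:\n\n|$)', s, re.S)
    return m.group(1) if m else None

sample = ('1. melléklet a 37/2018. (XI. 13.) MNB rendelethez\n\n'
          'Általános kitöltési előírások\nI.')
print(first_block(sample))  # 1. melléklet a 37/2018. (XI. 13.) MNB rendelethez
```

If the requirement that the string starts with a digit is not needed, s.split('\n\n', 1)[0] gives the same prefix without a regex.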

How to restrict the expression in QLineEdit

I need a QLineEdit that represents a range.
For example, (1,2]. For this representation I want to set a validator so that the user cannot type other symbols.
In this case I have char + int + char + int + char, as shown in the example above.
Does Qt have any feature to handle this?
Thanks in advance.
You can use Qt's Input Validator feature to achieve this goal.
The following snippet will restrict the input on a line edit as you specified.
QRegExp re("^[[,(]{1,1}(0|[1-9]{1,1}[0-9]{0,9})[,]{1,1}(0|[1-9]{1,1}[0-9]{0,9})[],)]{1,1}$");
QRegExpValidator *validator = new QRegExpValidator(re, this);
ui->lineEdit->setValidator(validator);
Edit
Updated the regex
QRegExp expr("^[[,(]{1,1}(0|[1-9]{1,1}[0-9]{0,9})[,]{1,1}(0|[1-9]{1,1}[0-9]{0,9})[],)]{1,1}$");
This is what I wanted! I must allow more than one leading zero.
It is not possible to write a regexp accepting only valid ranges. The reason is that a regexp can check the syntax but not the numeric value (unless the regexp engine has some extensions). The difference between
[1234,5678)
and
[5678,1234)
is not in the syntax (what regexps are about), but in the semantics (where regexps are not that powerful).
For checking just the syntax a regexp could be
\[\d+,\d+\)
or, if you also allow other types of interval boundary conditions:
[\[(]\d+,\d+[\])]
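The split between syntax and semantics can be sketched as follows. This is an illustration in Python rather than Qt code, and the function name parse_range is hypothetical: the regexp checks only the shape of the input, and an ordinary integer comparison checks that the bounds are in order.

```python
import re

# Syntax only: an opening "[" or "(", two integers, a closing "]" or ")".
RANGE_RE = re.compile(r'([\[(])(\d+),(\d+)([\])])')

def parse_range(text):
    """Return (open, low, high, close) for a valid range, otherwise None."""
    m = RANGE_RE.fullmatch(text)
    if m is None:
        return None              # syntactically wrong
    low, high = int(m.group(2)), int(m.group(3))
    if low > high:
        return None              # well-formed, but semantically wrong
    return m.group(1), low, high, m.group(4)

print(parse_range('[1234,5678)'))  # ('[', 1234, 5678, ')')
print(parse_range('[5678,1234)'))  # None
```

In a Qt application the same two-step idea would presumably live in a validator for the syntax plus a separate check once editing is finished; that arrangement is an assumption, not something the answers above spell out.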
I would recommend not allowing all chars but only the needed ones. Example:
QRegExp("[\\\\\\(\\)\\{\\}]\\d[\\\\\\(\\)\\{\\}]\\d[\\\\\\(\\)\\{\\}]");
I'll explain:
[] these contain the matching characters for your char: \\ (this actually matches the \ sign, as you need to escape it once for the regular expression and once more for the Qt string, which makes it \\), \( is for the opening parenthesis, and so on. You can add all the chars you would like to be matched. The Regular Expression Cheat Sheet is a good help for this.
\d is matching a single digit, if you want to have more than one digit you could use \d+ for at least one or \d{3} for exactly 3 digits. (+ 1 or more, ? 0 or 1, * 0 or more)
Another example would be:
QRegExp("[\\\\\\(\\)\\{\\}]\\d[,\\.]\\d[\\\\\\(\\)\\{\\}]");
for having the center character to be a . or a , sign.

(V)C++ (2010) regular expressions, "recursive captures"

I want to match and capture the operators and operands of an expression like:
1
x
1 + x
x + y + 3 + 10
etc...
So on regexpal,
(\w+)(\s*([+])\s*(\w+))*
appears to do it, but how do I obtain the matched captures? Notice that [+] and (\w+) are already inside one capturing group.
Unfortunately this is not possible (at least in any regex flavor that I know of). If one capturing group is used multiple times, the capture will always be filled with the last thing it captured. Simple example: ([a-z])* applied to abc will give you only c.
I recommend that you use the regex just to check for a valid format. Then you can split the string at the matches of \s*\b\s*. This should then result in an array containing x, +, y, +, 3, +, 10 for your last example.
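A minimal sketch of the "validate first, then pull out the pieces" idea, written in Python rather than C++ for brevity. Instead of splitting on \s*\b\s*, it swaps in a second, simpler token pattern, because how empty matches are treated when splitting differs between regex libraries. The names EXPR_RE, TOKEN_RE and tokenize are hypothetical.

```python
import re

# A repeated group only remembers its final iteration.
print(re.match(r'([a-z])*', 'abc').group(1))  # c

EXPR_RE = re.compile(r'(\w+)(\s*([+])\s*(\w+))*')   # format check only
TOKEN_RE = re.compile(r'\w+|[+]')                    # one operand or one operator

def tokenize(expr):
    """Validate the whole expression, then return its operands and operators."""
    if EXPR_RE.fullmatch(expr) is None:
        raise ValueError('not a valid expression: %r' % expr)
    return TOKEN_RE.findall(expr)

print(tokenize('x + y + 3 + 10'))  # ['x', '+', 'y', '+', '3', '+', '10']
```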
Here is some example code that shows how to use regexes to split strings, using boost::regex.
Maybe this would be a better job for System.CodeDom.Compiler than for Regexes.
If boost is an option for you, then you can use boost::regex with the boost::match_extra flag; match_results::captures and sub_match::captures then contain a list of all captured items.

Extract numbers between brackets within a string [duplicate]

I imported data from Excel, and one column consists of long strings that contain numbers and letters. Is there a way to extract only the numbers from each string and store them in a new variable? Unfortunately, some of the entries have two sets of brackets, and I only want the second one. Could I use grep for that?
The strings look more or less like this, although their length varies:
"East Kootenay C (5901035) RDA 01011"
or like this:
"Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020"
All I want from this is 5901035 and 5933039
Any hints and help would be greatly appreciated.
There are many possible regular expressions to do this. Here is one:
x=c("East Kootenay C (5901035) RDA 01011","Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020")
> gsub('.+\\(([0-9]+)\\).+?$', '\\1', x)
[1] "5901035" "5933039"
Let's break down the syntax of that first expression '.+\\(([0-9]+)\\).+'
.+ one or more of anything
\\( parentheses are special characters in a regular expression, so if I want to represent the actual thing ( I need to escape it with a \. I have to escape it again for R (hence the two \s).
([0-9]+) I mentioned special characters; here I use two. The first is the parentheses, which indicate a group I want to keep. The second, [ and ], surrounds a set of characters to match (here the digits 0 to 9). See ?regex for more information.
?$ The final piece makes the trailing .+ lazy and anchors the match at the end of the string. Because the leading .+ is greedy, the pattern grabs the LAST set of numbers in parens, as noted in the comments.
I could also use * instead of + (that is, .* rather than .+), which would mean 0 or more rather than one or more, in case your paren string comes at the beginning or end of a string.
The second piece of the gsub is what I am replacing the first portion with. I used: \\1. This says use group 1 (the stuff inside the ( ) from above). I need to escape it twice again, once for the regex and once for R.
Clear as mud to be sure! Enjoy your data munging project!
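For comparison, the same pattern can be tried outside R. The following is a sketch in Python (an assumption for illustration, since the question is about R); a raw string removes the extra level of backslash escaping that R needs.

```python
import re

strings = [
    'East Kootenay C (5901035) RDA 01011',
    'Thompson-Nicola J (Copper Desert Country) (5933039) RDA 02020',
]

# Same pattern as in the gsub call: the greedy leading ".+" pushes the match
# to the last "(digits)" group, and "\1" keeps only those digits.
PATTERN = re.compile(r'.+\(([0-9]+)\).+?$')

codes = [PATTERN.sub(r'\1', s) for s in strings]
print(codes)  # ['5901035', '5933039']
```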
Here is a gsubfn solution:
library(gsubfn)
strapplyc(x, "[(](\\d+)[)]", simplify = TRUE)
[(] matches an open paren, (\\d+) matches a string of digits creating a back-reference owing to the parens around it and finally [)] matches a close paren. The back-reference is returned.

Difference between * and + regex

Can anybody tell me the difference between the * and + operators in the example below:
[<>]+ [<>]*
Both are quantifiers. The star quantifier (*) means that the preceding expression can match zero or more times, like {0,}, while the plus quantifier (+) means that the preceding expression MUST match at least once, the same as {1,}.
So to recap:
a* ---> a{0,} ---> Match a or aa or aaaaa or an empty string
a+ ---> a{1,} ---> Match a or aa or aaaa but not an empty string
* means zero-or-more, and + means one-or-more. So the difference is that the empty string would match the second expression but not the first.
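A quick way to see this with the two expressions from the question is a small Python sketch (the helper name matches is hypothetical). re.fullmatch is used so that the whole string has to match.

```python
import re

def matches(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

for text in ['', '<', '<<>>']:
    print(repr(text), matches(r'[<>]*', text), matches(r'[<>]+', text))
# ''     True False
# '<'    True True
# '<<>>' True True
```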
+ means one or more of the previous atom. ({1,})
* means zero or more. This can match nothing, in addition to the characters specified in your square-bracket expression. ({0,})
Note that + is available in Extended and Perl-Compatible Regular Expressions, and is not available in Basic RE. * is available in all three RE dialects. Which dialect you're using most likely depends on the language you're in.
Pretty much, the only things in modern operating systems that still default to BRE are grep and sed (both of which have ERE capability as an option) and non-vim vi.
* means zero or more of the previous expression.
In other words, the expression is optional.
You might define an integer like this:
-*[0-9]+
In other words, an optional negative sign followed by one or more digits. (Strictly, -* allows any number of minus signs; use -? to allow at most one.)
They are quantifiers.
+ means 1 or many (at least one occurrence for the match to succeed)
* means 0 or many (the match succeeds regardless of the presence of the search string)
[<>]+ is the same as [<>][<>]*
Here is an example to extend the answers above. Suppose we have the text:
100test10
test10
test
If we write \d+test\d+, the expression matches only 100test10, because it requires at least one digit on each side of test, but \d*test\d* matches all three lines.
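The three sample lines can be checked directly. The following Python sketch tests each line as a whole against both patterns; the variable names are hypothetical.

```python
import re

lines = ['100test10', 'test10', 'test']

# "+" demands at least one digit on each side of "test"; "*" allows none.
plus_hits = [s for s in lines if re.fullmatch(r'\d+test\d+', s)]
star_hits = [s for s in lines if re.fullmatch(r'\d*test\d*', s)]

print(plus_hits)  # ['100test10']
print(star_hits)  # ['100test10', 'test10', 'test']
```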