I want a regex to match complex mathematical expressions.
However, I will ask for a simpler regex, because it covers the simplest case.
Example input:
1+2+3+4
I want to separate each char:
[('1', '+', '2', '+', '3', '+', '4')]
With one restriction: there has to be at least one operation (e.g. 1+2).
My regex: ([0-9]+)([+])([0-9]+)(([+])([0-9]+))*
or (\d+)(\+)(\d+)((\+)(\d+))*
Output for re.findall(r'(\d+)(\+)(\d+)((\+)(\d+))*', "1+2+3+4"):
[('1', '+', '2', '+4', '+', '4')]
Why is this not working? Is Python the problem?
You could take a validate-then-extract route:
check whether the input is valid using re.match,
then get the tokens with re.findall.
Python code
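import re

expr = "1+2+3+4"
# Validate: a number, at least one "+number", and nothing else
if re.match(r"^\d+\+\d+(?:\+\d+)*$", expr):
    print("Matched")
    # Extract: each "+" or each run of digits becomes one token
    print(re.findall(r"\+|\d+", expr))
else:
    print("Not valid")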
Output
Matched
['1', '+', '2', '+', '3', '+', '4']
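You need ([0-9]+)([+])([0-9]+)(?:([+])([0-9]+))*
You get the '+4' because the outer group (([+])([0-9]+)), which wraps the last two subexpressions, is itself a capturing group.
The ?: tells Python not to capture that group, so its text does not appear in the output. Note that a repeated group only keeps its last repetition, so this still returns [('1', '+', '2', '+', '4')]; the '+3' is not included.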
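A quick way to see both behaviours side by side is the short Python sketch below, using the question's input 1+2+3+4. The first call uses the non-capturing version of the pattern; the second validates the whole string (re.fullmatch is used here in place of re.match with ^...$) and then tokenizes, which is the route the first answer takes.

```python
import re

expr = "1+2+3+4"

# Outer group made non-capturing: the extra '+4' element disappears, but the
# inner groups still remember only their LAST repetition, so '+3' is lost.
last_only = re.findall(r"([0-9]+)([+])([0-9]+)(?:([+])([0-9]+))*", expr)
print(last_only)  # → [('1', '+', '2', '+', '4')]

# Validate the whole string first, then tokenize it to get every operand and operator.
if re.fullmatch(r"\d+\+\d+(?:\+\d+)*", expr):
    tokens = re.findall(r"\d+|\+", expr)
else:
    tokens = None
print(tokens)  # → ['1', '+', '2', '+', '3', '+', '4']
```

A single regex match cannot return a variable number of groups, which is why splitting the job into "validate" and "tokenize" is the simpler design here.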
Related
I have a mathematical expression (formula) as a string. I'm trying to split that string into an array containing all of its operators and words. I'm doing this by passing a regex to the split() function (I'm new to regex, so I tried to write one that gives my desired result). With this expression I get an array separated by operators, digits and words, but somehow I also get an extra empty element between the elements. Have a look below to see exactly what I mean.
My mathematical expression (formula):
1+2-(0.67*2)/2%2=O_AnnualSalary
Regex that I'm using to split it into an array:
this.createdFormula.split(/([?=+-*/%,()])/)
The array I expect to get:
['1', '+', '2', '-', '(', '0', '.', '6', '7', '*', '2', ')', '/', '2', '%', '2', '=', 'O_AnnualSalary']
This is what I'm getting:
['', '1', '', '+', '', '2', '', '-', '', '(', '', '0', '', '.', '', '6', '', '7', '', '*', '', '2', '', ')', '', '/', '', '2', '', '%', '', '2', '', '=', 'O_AnnualSalary']
So far I've tried these expressions from various posts on SO:
this.createdFormula.split(/([?=+-\\*\\/%,()])/)
this.createdFormula.split(/([?=\\\W++-\\*\\/%,()])/)
this.createdFormula.split(/([?=//\s++-\\*\\/%,()])/)
this.createdFormula.split(/([?=+-\\*\\/%,()])(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)/)
this.createdFormula.split(/([?=+-\\*\\/%0-9,()])/)
Can anyone help me fix this expression to get the desired result? If you need any more information, please feel free to ask.
Any help is really appreciated.
Thanks
Assuming you have string match() available, we can use:
var input = "1+2-(0.67*2)/2%2=O_AnnualSalary,HwTotalDays";
var parts = input.match(/(?:[0-9.,%()=/*+-]|\w+)/g);
console.log(parts);
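For comparison, here is a Python sketch of the same idea (the question itself is JavaScript; Python is used here only to match the other examples, and formula is just the question's example string). It shows two options: collecting tokens with findall, as the match() answer does, and keeping split() but discarding the empty strings it produces whenever two captured delimiters are adjacent, which is most likely where the blank elements come from.

```python
import re

formula = "1+2-(0.67*2)/2%2=O_AnnualSalary"

# Option 1 (mirrors the match() answer): find every token instead of splitting.
# A single char from the class is tried first; otherwise a run of word chars.
tokens = re.findall(r"[0-9.,%()=/*+-]|\w+", formula)

# Option 2: keep split(), but drop the empty strings left between adjacent
# delimiters. Digits are in the class, so each digit becomes its own token.
parts = [p for p in re.split(r"([0-9.,%()=/*+-])", formula) if p]

print(tokens)
print(parts)
```

Both lists come out as the expected array from the question. Option 1 is usually easier to extend, because each alternative describes a token rather than a separator.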
I am trying to implement a tokenizer that splits a string into words.
The special conditions I have are: split the punctuation marks . , ! ? into separate strings,
and split on spaces, keeping any other run of characters together, e.g. I have a dog!'-4# -> 'I', 'have', 'a', 'dog', '!', "'-4#"
Something like this.
I don't plan on using the nltk package, and I have looked at re.split and re.findall, but in both cases:
re.split: I don't know how to split off punctuation attached to a word, such as 'Dog,'
re.findall: it returns all the matched strings, but what about the unmatched ones?
If you have any suggestions, I'd be very happy to try them.
Are you trying to split on a delimiter (punctuation) while keeping it in the final results? One way of doing that is this:
import re

sent = "I have a dog!'-4#"
# The capturing group makes re.split keep each delimiter (punctuation or space) in the result
print(re.split(r"([.,;:!^ ])", sent))
This is the result I get.
['I', ' ', 'have', ' ', 'a', ' ', 'dog', '!', "'-4#"]
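The split above keeps the spaces as tokens, and two adjacent delimiters (such as the ", " after 'Dog') would also leave an empty string between them. A minimal sketch that drops both, assuming whitespace-only pieces are never wanted:

```python
import re

sent = "Dog, I have a dog!'-4#"

# Same capturing split as above; spaces come back as tokens, and the adjacent
# delimiters ", " leave an empty string between them.
pieces = re.split(r"([.,;:!^ ])", sent)

# Keep only pieces that contain something other than whitespace.
tokens = [p for p in pieces if p.strip()]
print(tokens)  # → ['Dog', ',', 'I', 'have', 'a', 'dog', '!', "'-4#"]
```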
Try:
re.findall(r'[a-z]+|[.!?]|(?:(?![.!?])\S)+', txt, re.I)
Alternatives in the regex:
[a-z]+ - a non-empty sequence of letters (case-insensitive, because of re.I),
[.!?] - any single char from your list (note that inside brackets neither a dot nor a '?' needs to be escaped),
(?:(?![.!?])\S)+ - a non-empty sequence of non-whitespace characters other than those in your list.
E.g. for text containing I have a dog!'-4#?. the result is:
['I', 'have', 'a', 'dog', '!', "'-4#", '?', '.']
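The question also lists the comma as punctuation, which the pattern above leaves out. A sketch with the comma added to both character classes, wrapped in a small helper (tokenize_words is a name chosen here for illustration):

```python
import re

def tokenize_words(text):
    """Letters, single marks from . , ! ?, or any other run of non-space
    characters that contains none of those marks (hypothetical helper)."""
    return re.findall(r"[a-z]+|[.,!?]|(?:(?![.,!?])\S)+", text, re.I)

print(tokenize_words("Dog, I have a dog!'-4#?."))
# → ['Dog', ',', 'I', 'have', 'a', 'dog', '!', "'-4#", '?', '.']
```

Because findall skips anything that none of the alternatives match, whitespace simply disappears, while every other character ends up in some token.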
num = re.findall(r'[-+]?\d*\.*\d+', str(table))
Hi all, I have this regular expression and it prints the values I want. However, numbers containing a comma are split into separate values.
For example:
['7', '336.82', '-3.89', '-0.05', '7', '351.60', '7', '322.86', '7', '340.71']
is what it prints
But I want it to print:
['7,336.82', '-3.89', '-0.05', '7,351.60', '7,322.86', '7,340.71']
Please could someone help?
Thanks in advance.
It looks like you want to capture numbers that may contain commas as thousands separators. You can use:
r'[-+]?(?:\d+[\d,]*)?\.?\d+'
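If you need to validate that each comma is followed by exactly three digits:
[-+]?\d{1,3}(\,\d{3})*(\.\d+)?
With this pattern, the input 1,000,00.0 is matched as two numbers: 1,000 and 00.0. (In Python, write the groups as (?:...) when using re.findall; with capturing groups, findall returns the groups instead of the whole match.)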
Demo: https://regex101.com/r/8nYbaQ/2
If 01,123 should be rejected (because of the leading 0 digit):
(\+?[1-9]|\-\d)\d{0,2}(\,\d{3})*(\.\d+)?
Demo: https://regex101.com/r/8nYbaQ/3
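A Python sketch comparing the patterns on a sample string. The sample is hypothetical, built from the values in the question, since the contents of table are not shown. It also shows why the groups should be non-capturing when the pattern is used with re.findall:

```python
import re

# Hypothetical sample text built from the question's values
text = "7,336.82 -3.89 -0.05 7,351.60 7,322.86 7,340.71"

# Strict pattern with NON-capturing groups, so findall returns whole matches
grouped = re.findall(r"[-+]?\d{1,3}(?:,\d{3})*(?:\.\d+)?", text)

# The looser pattern from the first answer (commas allowed anywhere between digits)
loose = re.findall(r"[-+]?(?:\d+[\d,]*)?\.?\d+", text)

# Same strict pattern with capturing groups: findall returns tuples of groups
captured = re.findall(r"[-+]?\d{1,3}(\,\d{3})*(\.\d+)?", text)

print(grouped)      # → ['7,336.82', '-3.89', '-0.05', '7,351.60', '7,322.86', '7,340.71']
print(captured[0])  # → (',336', '.82')
```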
I'm trying to capture tokens from a pseudo-programming-language script, but operators such as + - * / are not captured.
I tried this:
[a-z_]\w*|"([^"\r\n]+|"")*"|\d*\.?\d*|\+|\*|\/|\(|\)|&|-|=|,|!
For example, I have this code:
for i = 1 to 10
test_123 = 3.55 + i- -10 * .5
next
msg "this is a ""string"" with quotes in it..."
In this code, the regular expression has to match:
valid variable names,
strings enclosed in quotes,
operators like (),+-*/!
numbers like 0.1 123 .5 10.
The result of the regular expression should be:
'for',
'i',
'=',
'1',
'to',
'10',
'test_123',
'=',
'3.55',
'+'
etc....
The problem is that the operators are not matched when I use this regular expression.
We don't know your exact requirements, but it seems that your regex only captures a few kinds of tokens.
Try something like this, grouping the tokens you want to capture:
([a-z_]+)|([\.\d]+)|([\+\-\*\/])|(\=)|([\(\)\[\]\{\}])|(['":,;])
EDIT: Based on the new information in your question, I adjusted the regex and tested it with Python (I don't know VBScript).
import re

test_string = r'''for i = 1 to 10:
test_123 = 3.55 + i- -10 * .5
next
msg "this is a 'string' with quotes in it..."'''
# One outer group around all alternatives, so findall returns each whole token
pattern = r'''([\da-z_^\.]+|[\.\d]+|[\+\-\*\/]|\=|[\(\)\[\]\{\}]|[:,;]|".*[^"]"|'.*[^']')'''
print(re.findall(pattern, test_string, re.MULTILINE))
And this is the list with the matches:
['for', 'i', '=', '1', 'to', '10', ':', 'test_123', '=', '3.55', '+', 'i', '-', '-', '10', '*', '.5', 'next', 'msg', '"this is a \'string\' with quotes in it..."']
I think it captures all you need.
This fits my needs, I think:
"([^"]+|"")*"|[\-+*/&|!()=,]|[a-z_]\w*|(\d*\.)?\d*
However, only whitespace should be left over, so I need a way to also capture anything that is not whitespace, even if it doesn't match any of the other options in my regular expression.
Characters like "$%µ°" are ignored even when I add "|." to the end of my regular expression. :(
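One likely cause of both problems is the number alternative: \d*\.?\d* (and (\d*\.)?\d*) can match the empty string, and since alternatives are tried left to right, an empty match there can win at an operator or "$" position before the later alternatives (including a trailing |.) are tried. The sketch below is a Python rewrite under that assumption, not tested against VBScript: every alternative consumes at least one character, and a final \S alternative catches anything else that is not whitespace. The last line of the sample input is made up to exercise the catch-all.

```python
import re

# Alternatives are tried left to right; none of them can match the empty string.
TOKEN_RE = re.compile(r'''
      "(?:[^"\r\n]|"")*"     # string literal; "" inside means an escaped quote
    | [a-z_]\w*              # identifier or keyword
    | \d+(?:\.\d*)?          # number such as 1, 10. or 3.55
    | \.\d+                  # number such as .5
    | [-+*/&|!()=,]          # single-character operators
    | \S                     # catch-all: any other non-whitespace char
''', re.IGNORECASE | re.VERBOSE)

def tokenize(source):
    """Return every token in source, in order; whitespace is skipped."""
    return TOKEN_RE.findall(source)

source = '''for i = 1 to 10
test_123 = 3.55 + i- -10 * .5
next
msg "this is a ""string"" with quotes in it..."
total = 5 $ 3 % 2'''

for token in tokenize(source):
    print(token)
```

Putting the string alternative first keeps quoted text (including "") in one token, and because the pattern has only non-capturing groups, findall returns whole matches.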
I have a text file with this format:
('1', '2', '3', '4', '5');
('a', 'b', 'c', 'd', 'e');
etc...
From each line, I want the third and fourth quoted entries.
My text file has 125k lines, so it is fairly big.
Thank you
^.*?,.*?,(.*?),(.*?),.*
will capture the third and fourth fields in \1 and \2 (assuming no commas appear inside the quotes that you would not want treated as delimiters, or anything like that).
When run on your example, replacing with \1,\2, the end result is:
'3', '4'
'c', 'd'
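Since the file has 125k lines, it can be processed one line at a time instead of being loaded as a whole. A minimal Python sketch using the same pattern (third_and_fourth is a hypothetical helper name, and the sample lines are the two from the question):

```python
import re

# Same pattern as above: lazy fields up to the 3rd and 4th commas
FIELDS_RE = re.compile(r"^.*?,.*?,(.*?),(.*?),.*")

def third_and_fourth(lines):
    """Yield (third, fourth) fields from each matching line (hypothetical helper).

    Assumes, like the regex, that no commas appear inside the quotes.
    """
    for line in lines:
        m = FIELDS_RE.match(line)
        if m:
            # strip() drops the space that follows each comma in the input
            yield m.group(1).strip(), m.group(2).strip()

sample = [
    "('1', '2', '3', '4', '5');",
    "('a', 'b', 'c', 'd', 'e');",
]
for third, fourth in third_and_fourth(sample):
    print(f"{third}, {fourth}")
```

For a real file, open it in a with block and pass the file object itself; iterating over it reads one line at a time, so memory use stays flat regardless of file size.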