regular expression: extract number in an expression - regex

Suppose I have the following expression:
"1+3x+52-9-45x+x"
My goal is to extract all the constants:
[1,+52,-9]
I have tried using Python:
re.findall("[+-]?\d+","1+3x+52-9-45x+x")
Result is:
['1', '+3', '+52', '-9', '-45']
These are not correct, because the coefficients of x are also extracted.
I also tried:
re.findall("[+-]?\d+[+-]?","1+3x+52-9-45x+x")
But it is still not working.

Try this Regex:
(?:[+-])?\b\d+\b
OR
(?:[+-])?\d+(?=[\s+-]|$)
Explanation (1st regex):
(?:[+-]) - matches either a + or a - (add more operators if you want)
? - makes the + or - optional
\b\d+\b - matches 1 or more digits with a word boundary on each side, so digits attached to x (the coefficients, such as the 3 in 3x) are not matched
Explanation (2nd regex):
(?:[+-]) - matches either a + or a - (add more operators if you want)
? - makes the + or - optional
\d+(?=[\s+-]|$) - matches 1 or more digits (greedy) that are immediately followed by a +, a -, a whitespace character, or the end of the string. You can add more operators to the character class if you want.
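A quick way to sanity-check both patterns is to run them through Python's re.findall, as in the question. This is a minimal sketch using only the standard re module; the constant names are illustrative. The last line shows why the 2nd regex needs the |$ alternative: a constant at the very end of the string has nothing after it.

```python
import re

expr = "1+3x+52-9-45x+x"

# 1st regex: optional sign, then digits with a word boundary on each side,
# so digits glued to "x" (as in 3x or 45x) never match.
WORD_BOUNDED = r"(?:[+-])?\b\d+\b"

# 2nd regex: optional sign, then digits that must be followed by
# whitespace, "+", "-" or the end of the string.
LOOKAHEAD = r"(?:[+-])?\d+(?=[\s+-]|$)"

first = re.findall(WORD_BOUNDED, expr)
second = re.findall(LOOKAHEAD, expr)
print(first)   # ['1', '+52', '-9']
print(second)  # ['1', '+52', '-9']

# A trailing constant is only followed by the end of the string.
print(re.findall(LOOKAHEAD, "2x+7"))  # ['+7']
```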

Related

Vim: Adding * symbol between numbers and the left parenthesis

I have thousands of lines in a file in the following format:
x1(t) = 1.568(1-t) + 5.145(1-t)**2 + ... (other terms)
x2(t) = 3.347(1-t) + 1.304(1-t)**2 + ...
x3(t) = 7.016(1-t) + 1.901(1-t)**2 + ...
x4(t) = 0.843(1-t) + 5.335(1-t)**2 + ...
....
As you can see, there is no * sign between the numbers and the left parenthesis. I could record a macro to fix that, but for some reason I prefer to use the :substitute command with regular expressions instead.
I've tried the following:
:%s/[0-9]([0-9]/*(/g
But that also replaces the digits before and after the left parenthesis. I don't know how to match the parenthesis alone without matching the numbers before and after it.
I appreciate your help.
You may use
:%s/[0-9]\zs(\ze[0-9]/*(/g
This is roughly equivalent to a [0-9]\K\((?=[0-9]) PCRE regex and matches:
[0-9] - a digit
\zs - start the match here, so the digit matched so far is not part of the replaced text
( - a ( char
\ze - end the match here; whatever follows is only checked as context
[0-9] - a digit must appear after the ( (the digit is just context, not part of the match).
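For comparison outside Vim, here is a rough Python translation of the same substitution. It is a sketch assuming Python's standard re module; since re does not support \K, a fixed-width lookbehind stands in for the part before \zs and a lookahead for the part after \ze. The helper name add_star is hypothetical.

```python
import re

# Lookbehind/lookahead version of :%s/[0-9]\zs(\ze[0-9]/*(/g.
# The digit before "(" and the digit after it are only checked, not consumed,
# so only the "(" itself is replaced by "*(".
INSERT_STAR = re.compile(r"(?<=[0-9])\((?=[0-9])")


def add_star(line: str) -> str:
    """Insert '*' between a digit and a '(' that is followed by a digit."""
    return INSERT_STAR.sub("*(", line)


line = "x1(t) = 1.568(1-t) + 5.145(1-t)**2"
# "x1(t)" is left alone because "t" after "(" is not a digit.
print(add_star(line))  # x1(t) = 1.568*(1-t) + 5.145*(1-t)**2
```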

Matlab: How to replace dynamic part of string with regexprep

I have strings like
#(foo) 5 + foo.^2
#(bar) bar(1,:) + bar(4,:)
and want the expression in the first group of parentheses (which could be anything) to be replaced by x in the whole string
#(x) 5 + x.^2
#(x) x(1,:) + x(4,:)
I thought this would be possible in one step with regexprep somehow, but after reading the documentation and fiddling around for quite a while, I have not found a working solution yet.
I know one could use two commands: first grab the string to be matched with regexp, then use it with regexprep to replace all occurrences.
However, I have the gut feeling this should be somehow possible with the functionality of dynamic expressions and tokens or the like.
Without the support of an infinite-width lookbehind, you cannot do that in one step with a single call to regexprep.
Use the first idea: extract the first word, then replace it with x wherever it occurs as a whole word:
s = '#(bar) bar(1,:) + bar(4,:)';
% 'once' makes regexp return the tokens of the first match directly
tok = regexp(s, '^#\((\w+)\)', 'tokens', 'once');
word = tok{1};
s = regexprep(s, strcat('\<',word,'\>'), 'x');
Output: #(x) x(1,:) + x(4,:)
The ^#\((\w+)\) regex matches #( at the start of the string, captures one or more alphanumeric or _ characters into Group 1, and then matches a ). The 'tokens' option gives access to the captured substring, and strcat('\<',word,'\>') builds a whole-word regex (\< and \> are the word-start and word-end anchors) for the regexprep call.
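The same two-step idea carries over to other regex engines. Below is a rough Python analogue (not MATLAB code), assuming the standard re module: re.match captures the name, and \b plays the role of MATLAB's \< and \>. The function name rename_argument and its new_name parameter are hypothetical, and returning the string unchanged when there is no leading #(name) is an added assumption.

```python
import re


def rename_argument(s: str, new_name: str = "x") -> str:
    # Step 1: capture the name inside the leading "#(...)".
    m = re.match(r"#\((\w+)\)", s)
    if m is None:
        return s  # no leading "#(name)": leave the string unchanged
    word = m.group(1)
    # Step 2: replace whole-word occurrences only, so e.g. "barx" is untouched.
    return re.sub(r"\b" + re.escape(word) + r"\b", new_name, s)


print(rename_argument("#(foo) 5 + foo.^2"))           # #(x) 5 + x.^2
print(rename_argument("#(bar) bar(1,:) + bar(4,:)"))  # #(x) x(1,:) + x(4,:)
```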

Combining 2 regular expressions

I have two strings, and for each one I would like to get everything before the first '\n\n'.
'1. melléklet a 37/2018. (XI. 13.) MNB rendelethez\n\nÁltalános kitöltési előírások\nI.\nA felügyeleti jelentésre vonatkozó általános szabályok\n\n1.
'12. melléklet a 40/2018. (XI. 14.) MNB rendelethez\n\nÁltalános kitöltési előírások\n\nKapcsolódó jogszabályok\naz Önkéntes Kölcsönös Biztosító Pénztárakról szóló 1993. évi XCVI. törvény (a továbbiakban: Öpt.);\na személyi jövedelemadóról szóló 1995. évi CXVII.
I have been trying to combine two regular expressions to solve my problem; however, I may be on the wrong track. Maybe a function would be easier, I do not know.
Here is one that I believe finds the character 'z':
extended regex: [\z+$]
I guess the one for finding the first number is: [^0-9.].+
My problem is: how do I combine these two expressions to get the string in between them?
Is there a more efficient way to do this?
You may use
import re
re.findall(r'^(\d.*?)(?:\n\n|$)', s, re.S)
Or with re.search, since it seems that only one match is expected:
m = re.search(r'^(\d.*?)(?:\n\n|$)', s, re.S)
if m:
    print(m.group(1))
Pattern details
^ - start of a string
(\d.*?) - Capturing group 1: a digit and then any 0+ chars (newlines included, because of re.S), as few as possible
(?:\n\n|$) - a non-capturing group matching either two newlines or end of string.
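Here is a small runnable check of the pattern on shortened copies of the two sample strings (truncated here for brevity). On the "more efficient way" question, it also shows a swapped-in non-regex alternative, str.partition, which simply splits once on the first "\n\n"; treat it as an option to compare, not as part of the answer above. The helper names are hypothetical.

```python
import re

# Shortened copies of the two sample strings from the question.
samples = [
    "1. melléklet a 37/2018. (XI. 13.) MNB rendelethez\n\nÁltalános kitöltési előírások\nI.",
    "12. melléklet a 40/2018. (XI. 14.) MNB rendelethez\n\nÁltalános kitöltési előírások\n\nKapcsolódó jogszabályok",
]

# Same pattern as above: from a leading digit, lazily up to the first "\n\n" or the end.
FIRST_BLOCK = re.compile(r"^(\d.*?)(?:\n\n|$)", re.S)


def first_block_regex(s):
    m = FIRST_BLOCK.search(s)
    return m.group(1) if m else None


def first_block_partition(s):
    # Non-regex alternative: keep everything left of the first "\n\n".
    # Unlike the regex, this does not require the text to start with a digit.
    return s.partition("\n\n")[0]


regex_results = [first_block_regex(s) for s in samples]
partition_results = [first_block_partition(s) for s in samples]
print(regex_results)
```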

Python Regex - How to extract the third portion?

My input has this format: (xxx)yyyy(zz)(eee)fff, where x, y, z, e and f are all digits. The fff part is optional, though.
Input: x = (123)4567(89)(660)
Expected output: only the eee part, i.e. the number inside the third "()", which is 660 in my example.
I am able to achieve this so far:
re.search("\((\d*)\)", x).group()
Output: (123)
Expected: (660)
I am surely missing something fundamental. Please advise.
Edit 1: Just added fff to the input data format.
You could use findall to get every match enclosed in round brackets () and print the third one:
import re
n = "(123)4567(89)(660)999"
r = re.findall(r"\(\d*\)", n)  # every "(digits)" group, in order
print(r[2])
Output:
(660)
The (eee) part is identical to the (xxx) part in your regex. If you don't provide an anchor, or some sequencing requirement, then an unanchored search will match the first thing it finds, which is (xxx) in your case.
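If you know the (eee) part always appears at the end of the string (that is, when the optional fff part is absent), you could append an end-of-string anchor ($) to force the match at the end. Or perhaps you could append a character that always follows (eee), like a space or a comma.
Otherwise, you might do well to match the other parts of the pattern without capturing them. This assumes each part has exactly the width shown in your format, so that (xxx)yyyy(zz) is always 13 characters: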
pattern = r'[0-9()]{13}\((\d{3})\)'
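The two suggestions above behave differently once the optional fff part is present. This small check (standard re only; the sample strings follow the format from the question) shows that the $ anchor stops matching when fff follows (eee), while the fixed-width pattern still works as long as the widths match the format exactly.

```python
import re

without_fff = "(123)4567(89)(660)"
with_fff = "(123)4567(89)(660)999"

# End-of-string anchor: only finds (eee) when nothing follows it.
AT_END = r"\((\d+)\)$"
anchored_plain = re.search(AT_END, without_fff)
anchored_fff = re.search(AT_END, with_fff)
print(anchored_plain.group(1))  # 660
print(anchored_fff)             # None, because fff follows (eee)

# Fixed-width prefix: assumes (xxx)yyyy(zz) is always exactly 13 characters.
FIXED = r"[0-9()]{13}\((\d{3})\)"
fixed_plain = re.search(FIXED, without_fff).group(1)
fixed_fff = re.search(FIXED, with_fff).group(1)
print(fixed_plain, fixed_fff)  # 660 660
```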
To get the third group of numbers in brackets, you need to skip the first two groups. You can do that with a repeated non-capturing group that matches a set of digits enclosed in () followed by any number of non-( characters:
import re
x = '(123)4567(89)(660)'
print(re.search(r"(?:\(\d+\)[^(]*){2}(\(\d+\))", x).group(1))
Output:
(660)
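Both general approaches (findall plus indexing, and skipping two groups with a repeated non-capturing group) can be wrapped in small helpers and checked against the input with and without the optional fff part. The function names are hypothetical, and returning None when there are fewer than three bracketed groups is an added assumption; the snippets above would raise an exception instead.

```python
import re


def third_bracketed_findall(s):
    # All "(digits)" groups, in order; the third one is (eee).
    groups = re.findall(r"\(\d*\)", s)
    return groups[2] if len(groups) >= 3 else None


def third_bracketed_skip(s):
    # Skip two "(digits)" groups plus any non-"(" text after each,
    # then capture the next "(digits)" group.
    m = re.search(r"(?:\(\d+\)[^(]*){2}(\(\d+\))", s)
    return m.group(1) if m else None


for s in ["(123)4567(89)(660)", "(123)4567(89)(660)999"]:
    print(third_bracketed_findall(s), third_bracketed_skip(s))  # (660) (660)
```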

How to select a part of a string OR another with REGEXP in MATLAB

I've been trying to solve this problem for the last few days with no success. I have the following string:
comment = '#disabled, Fc = 200Hz'
What I need to do is: if there's the string 'disabled' it needs to be matched. Otherwise I need to match the number that comes before 'Hz'.
The closest solution I found so far was:
regexpi(comment,'\<#disabled\>|\w*Hz\>','match') ;
It matches the word '#disabled' or anything that comes before 'Hz'. The problem is that when it finds '#disabled', it also returns '200Hz'.
So I'm getting:
ans = '#disabled' '200Hz'
Summing up, I need to select only the 'disabled' part of a string if there is one, otherwise I need to get the number before 'Hz'.
Can someone give me a hand?
Suppose your input is:
comment = {'#disabled, Fc = 200Hz';
'Fc = 300Hz'}
The regular expression (match disabled if it follows a # at the start, otherwise match digits if they are followed by Hz):
regexp(comment, '(?<=^#)disabled|\d+(?=Hz)','match','once')
Explaining it:
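^# - match # at the beginning of the string
(?<=expr)disabled - match disabled only if it is preceded by expr
expr1|expr2 - match expr1 or expr2; with 'once', only the leftmost match is returned, so disabled wins whenever the string starts with #disabled
\d+ - match 1 or more digits, equivalently [0-9]+
expr(?=Hz) - match expr only if it is followed by 'Hz'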
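To see how the alternation and 'once' interact, here is a rough Python analogue (not MATLAB): re.search, like regexp with 'once', returns only the leftmost match, and Python accepts the same fixed-width lookbehind (?<=^#). The helper first_setting is hypothetical, and returning an empty string when nothing matches is an assumption meant to roughly mirror MATLAB's empty result.

```python
import re

# Same pattern as the MATLAB call: "disabled" right after a leading "#",
# or else a run of digits immediately followed by "Hz".
PATTERN = re.compile(r"(?<=^#)disabled|\d+(?=Hz)")


def first_setting(comment):
    # Leftmost match only, like regexp(..., 'match', 'once').
    m = PATTERN.search(comment)
    return m.group() if m else ""


comments = ["#disabled, Fc = 200Hz", "Fc = 300Hz"]
print([first_setting(c) for c in comments])  # ['disabled', '300']
```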