Regex Capture Middle Value - regex

I would like to ask for your help...
I have this string from which I need to extract the 4.75. I've tried many regex expressions but could not get any of them to work, and I've browsed lots of examples as well.
Loan Amount Interest Rate
$336,550 4.75 %
So far, below is my current expression
1. (?<=Interest Rate\s*\n*)([^\s]+).+(?=%)
I'm getting the $336,550 4.75
2. ([^\s]+).(?=%)
This resulted in multiple matches. My full text, which I can't share, also contains other values followed by %.
I am only after the 4.75. I know I could just select the first match in code (I guess), but for now that is not an option.
Thanks in advance!

You just need to extract "4.75 %" ?
Try this:
(?<=Interest Rate\n\n\$\d{3},\d{3}\s)(\d{1,5}\.\d{1,5}\s%)

Since your regex with variable length patterns inside lookbehind works, you can use the following .NET compliant regex:
(?<=Interest Rate\s+\S+\s+)(\S+)(?=\s*%)
See the regex demo.
Details:
(?<=Interest Rate\s+\S+\s+) - a positive lookbehind that requires Interest Rate, one or more whitespaces, one or more non-whitespaces and again one or more whitespaces immediately to the left of the current location
(\S+) - Group 1: one or more non-whitespace chars
(?=\s*%) - a positive lookahead that requires zero or more whitespaces and then a % char immediately to the right of the current location.
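If the engine turns out not to support variable-length lookbehind (Python's built-in re module, for one, rejects it), the same idea can be expressed with a plain capture group. A minimal sketch, assuming the sample text from the question plus one made-up extra line; the name extract_rate is hypothetical:

```python
import re

# Same structure as the lookbehind answer, but the prefix is matched
# normally and only the rate is captured (group 1).
RATE_RE = re.compile(r"Interest Rate\s+\S+\s+(\S+)\s*%")

def extract_rate(text):
    """Return the value between the loan amount and '%', or None."""
    m = RATE_RE.search(text)
    return m.group(1) if m else None

# The "Other fee" line is invented to show that unrelated % values are skipped.
sample = "Loan Amount Interest Rate\n$336,550 4.75 %\nOther fee 1.25 %"
print(extract_rate(sample))  # 4.75
```

The trade-off is that the value is read from group 1 rather than from the whole match, which only helps if the tool exposes capture groups.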

Please try this (note the escaped dot, so it only matches a literal decimal point):
[0-9]+\.[0-9]+


Capture groups in 1 line with fixed delimiters

I'm a beginner at regex and still don't understand a lot. I apologize in advance for any wrong notation or missing information :(
I need to extract groups from an e-mail subject where I have to use each value further on in a process to use as a folder or document name.
Example: 123456/TEXT/567890/01Moretext
I need to get the following pieces of text:
123456
TEXT
567890
01Moretext
in separate regex commands.
So far I have:
^\d{6}, which gives me 123456
(?<=/)[^/]*, which gives me TEXT
I can't figure out how to extract the third group, 567890
[^/]*$, which gives me 01Moretext
Would appreciate any help that can prevent my head from exploding!
You can use
[^/]+(?=/[^/]*$)
See the regex demo. Details:
[^/]+ - one or more chars other than /
(?=/[^/]*$) - a positive lookahead that requires a / and then zero or more chars other than / till the end of string.
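To check the pattern against the example subject, here is a small sketch in Python (an assumption, since the question does not name the tool). The split alternative at the end is only relevant if the process allows plain string operations:

```python
import re

subject = "123456/TEXT/567890/01Moretext"

# Second-to-last field: non-slash chars followed by "/" and a final field.
third = re.search(r"[^/]+(?=/[^/]*$)", subject).group(0)
print(third)  # 567890

# Alternative if the tool allows plain string splitting instead of regex.
parts = subject.split("/")
print(parts)  # ['123456', 'TEXT', '567890', '01Moretext']
```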

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches have what I can describe as positioners. These are shown as a question mark followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work off is the linked image. The creators can't even confirm the flavour of regex it is. They believe it's Python, but I'm unsure.
Update
Unfortunately, none of the patterns below have worked so far. I tested each one in the live system (rather than only in a regex tester, in case it behaves differently), but they still didn't work.
There is no replace feature, and as mentioned before I'm not sure if it is Python. Appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
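The two steps above can be sketched as follows. This assumes Python and that a replace step is available, which the question says may not hold for the tool in use; the names POSITIONER, TARGET and find_code are made up for illustration:

```python
import re

POSITIONER = re.compile(r"\?[0-9]{2}")   # e.g. ?21
TARGET = re.compile(r"T[0-9]{7}")

def find_code(line):
    """Strip positioners, then look for T followed by seven digits."""
    cleaned = POSITIONER.sub("", line)
    m = TARGET.search(cleaned)
    return m.group(0) if m else None

for line in ["T123?214567", "T?211234567", "T0000001"]:
    print(find_code(line))
```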
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
First, match the ?21 and replace it with a distinctive character such as #:
\?21
Then you can try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
The target string is captured in group 1 (or \1).
Finally, replace # with ?21 or something you like.
A Python script might look like this:
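import re

ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
rexpre = re.compile(r'\?21')  # the positioner to mask
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
# Mask ?21 as #, then print each match. The last line has no trailing
# whitespace, so the \s at the end of the pattern keeps it out of the results.
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Same matches, with # restored to ?21
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))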
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement you could use group 1 followed by group 2: \1\2.
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
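A short sketch of that replacement, assuming Python's re module (the question could not confirm the flavour) and the sample lines from the question:

```python
import re

PATTERN = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

text = "T123?214567\nT?211234567\nT0000001"
# \1\2 glues the digits before and after the optional positioner.
result = PATTERN.sub(r"\1\2", text)
print(result)
```

Lines without a positioner pass through unchanged, because the middle part of the pattern is optional.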
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

Get all the characters until a new date/hour is found

I have to parse a lot of content with a regular expression.
The content might, for example, be:
14-08-2015 14:18 : Example : Hello =) How are you?
What are you doing?
14-08-2015 14:19: Example2 : I'm fine thanks!
I have this regular expression that will of course return 2 matches, with the groups that I need (date, hour, name, multi-line message):
(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):([^\d]+)
The problem is that if a number is written inside the message this will not be OK, because the regex will stop getting more characters.
For example in this case this will not work:
14-08-2015 14:18 : Example : Hello =) How are you?
What are you 2 doing?
14-08-2015 14:19: Example2 : I'm fine thanks!
How do I get all the characters until a new date/hour is found?
The problem is with your final capturing group ([^\d]+).
Instead you can use ((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)
The outer parentheses, ( ... ), form the capturing group.
The next set of parentheses, (?: ... )+, is a non-capturing group that we match one or more times.
Inside it is a negative lookahead, (?!\d{2}-\d{2}-\d{4}). It says that the character we are about to match cannot be the start of a date.
What we actually consume is [\s\S], which matches any character, including a newline.
The entire regex that works looks like this:
(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)
https://regex101.com/r/wH5xR2/2
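To see the full pattern at work on the failing example, here is a minimal sketch in Python (the question does not say which language is used, so that is an assumption). The groups are stripped afterwards because the pattern captures the padding spaces around each field:

```python
import re

LOG_RE = re.compile(
    r"(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):"
    r"((?:(?!\d{2}-\d{2}-\d{4})[\s\S])+)"
)

chat = (
    "14-08-2015 14:18 : Example : Hello =) How are you?\n"
    "What are you 2 doing?\n"
    "14-08-2015 14:19: Example2 : I'm fine thanks!"
)

# Strip the padding that the groups capture around each field.
messages = [tuple(g.strip() for g in m.groups()) for m in LOG_RE.finditer(chat)]
for date, hour, name, body in messages:
    print(date, hour, name, repr(body))
```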
Use a lookahead for dates and get everything up to that.
/^(\d{2}-\d{2}-\d{4})\s?(\d{2}:\d{2})\s?:([^:]+):\s?((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)/sm
I've edited your regex in two ways:
Added ^ to the front, ensuring you only start from timestamps on their own line, which should filter out most issues with people posting timestamps
Replaced the last capturing group with ((?:(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}).)*)
(?!^\d{2}-\d{2}-\d{4}\s?\d{2}:\d{2}) is a negative lookahead, with date
(?:(lookahead).)* matches any number of characters, stopping just before a timestamp anchored to the start of a line.
((?:(lookahead).)*) Just captures the group for you.
It's not that efficient, but it works. Note the s flag for dotall (dot matches newlines) and m flag that lets ^ match at the start of line. ^ is necessary in the lookahead so that you don't stop the match in case someone posts a timestamp, and in the start to make sure you only match dates from the start of a line.
DEMO: https://regex101.com/r/rX8eH0/3
DEMO with flags in regex: https://regex101.com/r/rX8eH0/4

How to extract a numeric substring from a string but only if the previous string part matches a target

So I am trying to extract defect numbers from changeset comments in TFS. However, there are several ways people have entered the numbers:
"Defect 1321: blah blah blah"
"Fixes HPQC 1427. Logic modified"
"- Bug 976 - Customer"
I am not great with regexes, so any help would be appreciated. I prepare the string ahead of time by lowercasing it and stripping out the # and . characters, so I can be sure I am looking for something that starts with (defect|hpqc|bug), has an optional space (\s), then a number (\d), then ends with a space (\s). But this didn't work:
(defect|hpqc|bug)\s\d\s
I only want to find the first match.
I want to extract the numeric component but only if the previous word is a match.
I am sure this is a result of my trivial knowledge of regex creation.
Case matters (usually), you want more than one digit (\d+), and there is an optional number sign too, so something like this should work, depending on your system:
(Defect|HPQC|Bug)\s*#?\s*(\d+)
This allows spaces and # or neither before the digits, and captures the digits. It would help to know if you are using python or something else (tag your question).
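As a rough illustration of the pattern above, here is a sketch in Python (an assumption; the question does not say which language is in use). It matches case-insensitively instead of lowercasing first, and the leading \b and the name first_defect_number are additions for the example:

```python
import re

DEFECT_RE = re.compile(r"\b(?:defect|hpqc|bug)\s*#?\s*(\d+)", re.IGNORECASE)

def first_defect_number(comment):
    """Return the first defect number as a string, or None if absent."""
    m = DEFECT_RE.search(comment)
    return m.group(1) if m else None

for c in ["Defect 1321: blah blah blah",
          "Fixes HPQC 1427. Logic modified",
          "- Bug 976 - Customer",
          "No ticket here 42"]:
    print(first_defect_number(c))
```

Using search rather than findall returns only the first match, which is what the question asks for.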
I believe this regex should work for you:
(?:defect|hpqc|bug)\s+(\d+)\s+
Defect/Bug # is available in matched group #1
If you are looking only for the number after the keyword, here is a regex that might help (it relies on variable-length lookbehind, which not every engine supports):
(?<=(Defect|HPQC|Bug)\s*#?\s*)\d+
Good Luck!
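To refine Beroe's answer:
(?:Defect|HPQC|Bug)\s*\#?\s*(\d+)
(?:Defect|HPQC|Bug) : matches the keyword without capturing it
\# : the backslash escapes the #, so it is not read as the start of a comment (relevant when the pattern is compiled in free-spacing mode)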
It works for me on Expresso

TextMate: Regex replacing $1 with following 0

I'm trying to fix a file full of 1- and 2-digit numbers to make them all 2 digits long.
The file is of the form:
10,5,2
2,4,5
7,7,12
...
I've managed to match the problem numbers with:
(^|,)(\d)(,|$)
All I want to do now is replace the offending string with:
${1}0$2$3
but TextMate gives me:
10${1}05,2
Any ideas?
Thanks in advance,
Ross
According to this, TextMate supports word boundary anchors, so you could also search for \b\d\b and replace all with 0$0. (Thanks to Peter Boughton for the suggestion!)
This has the advantage of catching all the numbers in one go - your solution will have to be applied at least twice because the regex engine has already consumed the comma before the next number after a successful replace.
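The consumed-comma effect can be reproduced outside TextMate. The sketch below uses Python's re module as a stand-in (an assumption, since the question is about TextMate), with the sample data from the question; \g<1>0 is Python's unambiguous way to write "group 1 followed by a literal 0":

```python
import re

text = "10,5,2\n2,4,5\n7,7,12"
pattern = re.compile(r"(^|,)(\d)(,|$)", re.MULTILINE)

# Each match consumes the trailing comma, so a digit right after a
# replaced one is skipped and needs a second pass.
once = pattern.sub(r"\g<1>0\2\3", text)
twice = pattern.sub(r"\g<1>0\2\3", once)

print(once)   # some single digits are still unpadded after one pass
print(twice)
```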
Note: Tim's solution is simpler and solves this problem, but I'll leave this here for reference, in case someone has a similar but more complex problem that lookarounds can handle.
A simpler way than your expression is to replace:
(?<!\d)\d(?!\d)
With:
0$0
Which is "replace all single digits with 0 then itself".
The regex is:
Negative lookbehind to not find a digit (?<!\d)
A single digit: \d
Negative lookahead to not find a digit (?!\d)
Since this is a positional match (not a character match), it caters for both comma and start/end positions.
The $0 part says "entire match" - since the lookbehind/ahead match positions, this will contain the single digit that was matched.
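The lookaround version can be tried in Python as a rough equivalent (TextMate's own replace syntax is $0; \g<0> is the Python counterpart for "entire match"). A minimal sketch on the question's sample data:

```python
import re

text = "10,5,2\n2,4,5\n7,7,12"

# Lookarounds consume nothing, so every lone digit is padded in one pass.
padded = re.sub(r"(?<!\d)\d(?!\d)", r"0\g<0>", text)
print(padded)
```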
To anyone coming here: as @Amarghosh suggested, it's a bug, or at least intentional behavior that leads to problems.
I just had this problem and had to use the following workaround: If you set up another capture group, and then use a conditional insertion, it will work. For example, I had a string like <WebObject name=Frage01 and wanted to replace the 01 with 02, so I captured the main string in $1 and the end number in $2, which gave me a regex that looked like (<WebObject name=(Frage|Antwort))(01).
Then the replace was $1(?2:02).
The (?2:02) is the conditional insertion, and in this instance will always find something, but it was necessary in order to work around the odd conundrum of appending a number to the end of $n. Hope that helps someone. There is documentation on the conditional insertion here
In TextMate 1.5.11 (1635) ${1} does not work (like the OP described).
I appreciate the many suggestions re altering the query string, however there is a much simpler solution, if you want to break between a capture group and a number: \u.
It is a TextMate specific replacement syntax, that converts the following character to uppercase. As there is no uppercase for numbers, it does nothing and moves on. It is described in the link from Tim Pietzcker's answer.
In my case I had to clean up a csv file, where box measurements were given in cm x cm x mm. Thus I had to add a zero to the first two numbers.
Text: "80 x 40 x 5 mm"
Desired text: "800 x 400 x 5 mm"
Find: (\d+) x (\d+) x (\d+)
Replace: $1\u0 x $2\u0 x $3 mm
Regarding the support of more than 10 capture groups, I do not know if this is a bug. But as the OP and @rossmcf wrote, $10 is replaced with null.
You don't need ${1}: replacement strings support at most nine groups, so $1 followed by 0 won't be mistaken for $10.
Replace with $10$2$3