I'm stumped on getting the following regex to work (VB.NET).
Input:
+1.41 DS +0.93 DC x 3* #12.5 mm (4.00 Rx Calc)
Expected Output:
+0.93
I've gotten as far as the following expression:
DS[ \t]*[+\-][ \t]*\d{1,2}\.\d{2}
This returns a result of
DS +0.93
I need to return only +0.93 (without any leading whitespace). When I modify the regex to:
(?DS[ \t]*)([+\-][ \t]*\d{1,2}\.\d{2})
I get the error "unrecognized grouping construct", and I don't understand why. I think my non-matching group is incorrect, but I can't find why or where.
You may use a positive lookbehind here:
(?<=DS[ \t]*)[+-][\t ]*\d{1,2}\.\d{2}
See the regex demo
To make sure you match the number and DS as whole words (with no letters, digits or _ in front and at the back) use word boundaries:
(?<=\bDS[ \t]*)[+-][\t ]*\d{1,2}\.\d{2}\b
Or a negative lookahead (?!\d) after \d{2}:
(?<=\bDS[ \t]*)[+-][\t ]*\d{1,2}\.\d{2}(?!\d)
See another regex demo.
Details
(?<=\bDS[ \t]*) - a positive lookbehind that matches a location in string that is immediately preceded with DS as a whole word followed with 0+ spaces or tabs
[+-] - a + or -
[\t ]* - 0+ spaces or tabs
\d{1,2} - 1 or 2 digits
\. - a dot
\d{2} - 2 digits
(?!\d) - no digit allowed immediately to the right of the current location.
VB.NET demo:
Dim my_rx As Regex = New Regex("(?<=\bDS[ \t]*)[+-][\t ]*\d{1,2}\.\d{2}(?!\d)")
Dim my_result As Match = my_rx.Match(" +1.41 DS +0.93 DC x 3* #12.5 mm (4.00 Rx Calc)")
If my_result.Success Then
Console.WriteLine(my_result.Value) ' => +0.93
End If
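The lookbehind above relies on .NET accepting a variable-width pattern inside (?<=...). As a minimal sketch for engines that do not accept it, here is the same extraction in Python with a capturing group instead; the helper name value_after_ds is made up for illustration. It also hints at why the (?DS...) attempt failed: the non-capturing syntax is (?:...), and a non-capturing group still contributes its text to the overall match, so only a capture group or a lookbehind keeps DS out of the result.

```python
import re

# Same pattern as above, but "DS" is matched normally and the number is
# captured in group 1 (Python's re rejects variable-width lookbehinds).
DS_VALUE = re.compile(r"\bDS[ \t]*([+-][ \t]*\d{1,2}\.\d{2})(?!\d)")

def value_after_ds(text):
    """Return the signed number that follows DS, or None (hypothetical helper)."""
    m = DS_VALUE.search(text)
    return m.group(1) if m else None

print(value_after_ds("+1.41 DS +0.93 DC x 3* #12.5 mm (4.00 Rx Calc)"))  # +0.93
```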
Lines=["1x+1y+0","1x-1y+0","-1ax+0y-3","0x+1y-0.5"]
I am trying to find the intercept, say for equation no. 3, i.e. "-1ax+0y-3":
re.findall('[+-][\w]*[^XxYy]',Lines[2])
but it gives me
['-1ax+', '-3']
I was expecting only -3
[+-]?\w*?[^XxYy](?=\+|-|$) will give you the expected result.
[+-]? makes the sign optional, so you can also match a positive value at the start of your string,
*? makes the preceding \w quantifier ungreedy (lazy), and
(?=\+|-|$) is a lookahead that checks that either +, - or the end of the string follows your value.
If you just want to match numbers: [+-]?[0-9.]+(?=\+|-|$)
[0-9.]+ matches a run of digits and decimal points, so both -3 and -0.5 are returned whole.
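A quick way to check the \w-based pattern from this answer is to run it over all four sample lines. This is only a sketch using Python's re module, as in the question; the names pattern and results are illustrative. The last result shows that a decimal intercept is cut at the dot, because \w does not match ".", which is the case a digits-and-dot character class is meant to cover.

```python
import re

lines = ["1x+1y+0", "1x-1y+0", "-1ax+0y-3", "0x+1y-0.5"]
pattern = re.compile(r"[+-]?\w*?[^XxYy](?=\+|-|$)")

results = [pattern.findall(line) for line in lines]
for line, found in zip(lines, results):
    print(line, found)
# 1x+1y+0 ['+0']
# 1x-1y+0 ['+0']
# -1ax+0y-3 ['-3']
# 0x+1y-0.5 ['5']
```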
You could use
(?<=[xyXY])[+-]?\d+(?:\.\d+)?$
Explanation
(?<=[xyXY]) Positive lookbehind, assert x or y at the left
[+-]? Optionally match + or -
\d+(?:\.\d+)? Match digits with an optional decimal part
$ End of string
Regex demo
import re
Lines = ["1x+1y+0","1x-1y+0","-1ax+0y-3","0x+1y-0.5"]
print(re.findall(r'(?<=[xyXY])[+-]?\d+(?:\.\d+)?$', Lines[2]))  # raw string keeps \d and \. intact
Output
['-3']
You can use
re.findall(r'-?\b\d+(?:\.\d+)?\b', Lines[2])
See the regex demo. Details:
-? - an optional -
\b - a word boundary, no glued letters allowed
\d+ - one or more digits
(?:\.\d+)? - an optional fractional part
\b - a word boundary.
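As a rough check of the word-boundary pattern, the sketch below runs it over every sample line (the variable names are illustrative). Note that the pattern only allows an optional "-", so a positive intercept comes back without its "+" sign.

```python
import re

lines = ["1x+1y+0", "1x-1y+0", "-1ax+0y-3", "0x+1y-0.5"]
number = re.compile(r"-?\b\d+(?:\.\d+)?\b")

# Coefficients such as "1x" are skipped: there is no word boundary
# between a digit and the letter glued to it.
intercepts = [number.findall(line) for line in lines]
print(intercepts)  # [['0'], ['0'], ['-3'], ['-0.5']]
```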
Problem:
How can I create a regex to extract a name like "DISNAY LAND 2.0 GCP" from an array of lines in Scala, such as this one:
DE1ALAT0002 32.4756 -86.4393 106.1 ZQ DISNAY LAND 2.0 GCP 23456
//For using in code:
val regex = """(?:[\d\.\d]){2}\s*(?:[\d.\d])\s*(ZQ)\s*([A-Z])""".r // my attempt
val getName = row match {
case regex(name) => name
case _ =>
}
The only things I'm sure of:
1) there is a varying number of spaces between values
2) the useful value "DISNAY LAND 2.0 GCP" comes after a decimal number and the letters "ZQ"
3) the words of the name are separated by one space, and the name may consist of one or many words
4) the name ends with two or more spaces
Sorry if this repeats an existing question, but after a long search I did not find the right solution.
Many thanks for any answers.
You may use an .unanchored pattern like
\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)
See the regex demo. Details
\d\.\d+ - 1 digit, . and then 1+ digits
\s+ - 1+ whitespaces
ZQ - ZQ substring
\s+ - 1+ whitespaces (here, the left-hand side context definition ends, now, starting to capture the value we need to return)
(\S+(?:\s\S+)*) - Capturing group 1:
\S+ - 1 or more non-whitespace chars
(?:\s\S+)* - a non-capturing group that matches 0 or more sequences of a single whitespace (\s) and then 1+ non-whitespace chars (so, up to the double whitespace or end of string).
Scala demo:
val regex = """\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)""".r.unanchored
// The name is followed by two or more spaces, so the capture stops before 23456
val row = "DE1ALAT0002 32.4756 -86.4393 106.1 ZQ DISNAY LAND 2.0 GCP  23456"
val getName = row match {
  case regex(name) => name
  case _ => "" // no match
}
print(getName)
Output: DISNAY LAND 2.0 GCP
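The same idea can be sketched in Python, where re.search plays the role of .unanchored. The row layout below (columns separated by two spaces, words inside the name by one) is an assumption based on points 3 and 4 of the question. The second call shows why that double space matters: once whitespace is collapsed, the capture runs on into the next column.

```python
import re

NAME = re.compile(r"\d\.\d+\s+ZQ\s+(\S+(?:\s\S+)*)")

# Assumed layout: two spaces between columns, one space inside the name.
row = "DE1ALAT0002  32.4756  -86.4393  106.1  ZQ  DISNAY LAND 2.0 GCP  23456"

name = NAME.search(row).group(1)
print(name)  # DISNAY LAND 2.0 GCP

# With every run of whitespace collapsed to one space the name no longer
# has a terminator, so the trailing column is captured as well.
collapsed = re.sub(r"\s+", " ", row)
greedy_name = NAME.search(collapsed).group(1)
print(greedy_name)  # DISNAY LAND 2.0 GCP 23456
```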
I want to extract a number that comes before one of a list of specific characters. I want to extract volume, price and more from different websites.
For example, I want to extract the volume from here:
<td class="data">Single Malt Scotch Whisky der Marke Speyburn 10 Years 40% 0,7l Flasche</td>
or
<td class="data">Irish Whiskey der Marke Bushmills the Original 40% 1,0l Flasche</td>
I tried the following code:
re.findall("[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*?(?=l|L|Liter| Liter| l| L|ml)", string)
And this is the result:
First String = ['7'] and Second String = ['0']
How do I get the complete number (0,7 and 1,0)?
For the volume I tried converting the comma into a dot. This works fine for the volume but not for the price.
if ',' in string:
string= string.replace(',', '.')
If possible, I want to use the regex for the price as well. The difficulty here is the different number formats.
Following types are available:
10.00€
10,00€
1,234.56€
1.234,56€
You may use
[-+]?\.?\d+(?:[.,]\d+)*(?= ?[mM]?[lL])
See the regex demo. To match the measurement units as whole words, add \b word boundary at the end of the lookahead pattern, (?= ?(?:[mM]?[lL]|[Ll]iter)\b).
Details
[-+]? - an optional - or +
\.? - an optional .
\d+ - 1+ digits
(?:[.,]\d+)* - 0 or more occurrences of a dot or comma and then 1+ digits
(?= ?[mM]?[lL]) - a positive lookahead that matches a location that is immediately followed with
 ? - an optional space (you may use \s? here to match any whitespace)
[mM]? - an optional m or M
[lL] - l or L.
Note that you do not need the Liter alternative in the lookahead if you use (?= ?[mM]?[lL]), but if you use a word boundary, you will need to add a Liter alternative.
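Here is a small sketch of the word-boundary variant applied to the two product strings from the question, using Python's re as the question does. The names VOLUME, samples and as_floats are illustrative, and the comma-to-dot conversion assumes the volume always uses a single decimal separator.

```python
import re

# Number immediately followed by an optional space and a unit (l, L, ml, Liter, ...).
VOLUME = re.compile(r"[-+]?\.?\d+(?:[.,]\d+)*(?= ?(?:[mM]?[lL]|[Ll]iter)\b)")

samples = [
    "Single Malt Scotch Whisky der Marke Speyburn 10 Years 40% 0,7l Flasche",
    "Irish Whiskey der Marke Bushmills the Original 40% 1,0l Flasche",
]

volumes = [VOLUME.findall(text) for text in samples]
print(volumes)  # [['0,7'], ['1,0']]

# Assumption: one decimal separator only, so a plain replace is enough here.
as_floats = [float(v.replace(",", ".")) for found in volumes for v in found]
print(as_floats)  # [0.7, 1.0]
```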
I have lines of text as follows. I only want the first date after Examination date so that the expected output is 10.08.2017
Examination Date
date: 10.08.2017
423432
tert
g
534534
Examination Date: 04-07-2017
so far I have tried:
Examination Date.*?\d{2}.?{2}?.\d{4}
but the match runs all the way to 04-07-2017.
Fix the pattern by adding \d before the {2}?, removing the unnecessary ?s, and capturing the value you need:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Examination Date \n\ndate: 10.08.2017 \n423432\n\ntert\n\ng\n\n534534\n\nExamination Date: 04-07-2017";
// DOTALL lets . cross line breaks; group 1 holds the date
Pattern pattern = Pattern.compile("Examination Date.*?\\b(\\d{2}\\W\\d{2}\\W\\d{4})\\b", Pattern.DOTALL);
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
    System.out.println(matcher.group(1)); // => 10.08.2017
}
See the Java demo and the regex demo. In the code, you only get the first match as if is used, not while, and the . matches line breaks thanks to the Pattern.DOTALL modifier.
Details
Examination Date - a literal substring
.*? - any 0+ chars, as few as possible
\\b - a word boundary (if you do not care about matching the date as a "whole" word, remove the \\b)
(\\d{2}\\W\\d{2}\\W\\d{4}) - Group 1:
\\d{2} - 2 digits
\\W - any non-word char (punctuation, space, symbol)
\\d{2}\\W - as above
\\d{4} - 4 digits
\\b - a trailing word boundary.
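To see what the DOTALL modifier contributes, here is a rough Python equivalent (re.DOTALL is Python's counterpart of Pattern.DOTALL; the variable names are illustrative). Without the flag, . cannot cross the line break after the first "Examination Date", so the search falls through to the last line instead.

```python
import re

text = ("Examination Date \n\ndate: 10.08.2017 \n423432\n\ntert\n\ng\n\n534534"
        "\n\nExamination Date: 04-07-2017")
DATE = r"Examination Date.*?\b(\d{2}\W\d{2}\W\d{4})\b"

with_dotall = re.search(DATE, text, re.DOTALL).group(1)
without_dotall = re.search(DATE, text).group(1)

print(with_dotall)     # 10.08.2017
print(without_dotall)  # 04-07-2017
```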
I need to match these values:
(First approach to a regex that roughly does what I want)
\d+([.,]\d{3})*[.,]\d{2}
like
24,56
24.56
1.234,56
1,234.56
1234,56
1234.56
but I need to not match
1.234.56
1,234,56
So somehow I need to check the last occurrence of "." or "," to not be the same as the previous "." or ",".
Background: Amounts shall be matched in English and German format with (optional) 1000-Separators.
But even with help of regex101 I completely fail at coming up with a correctly working look-behind. Any suggestions are highly appreciated.
UPDATE
Based on the answers I got so far, I came up with this (demo):
\d{1,3}(?:([\.,'])?\d{3})*(?!\1)[\.,\s]\d{2}
But it matches for example 1234.567,23 which is not desirable.
You may capture the digit grouping symbol and use a negative lookahead with a backreference to restrict the decimal separator:
^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$
See the regex demo
Group 1 will contain the last value of the digit grouping symbol and (?!\1)[.,] will match the other symbol.
Details:
^ - start of string
(?:\d+|\d{1,3}(?:([.,])\d{3})*) - either of the two alternatives:
\d+ - 1+ digits
| - or
\d{1,3} - 1 to 3 digits,
(?:([.,])\d{3})* - zero or more sequences of:
([.,]) - Group 1 capturing . or ,
\d{3} - 3 digits
(?!\1)[.,] - a . or , but not equal to what was last captured with ([.,]) pattern above
\d{2} - 2 digits
$ - end of string.
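The pattern can be checked against the values listed in the question with a short sketch like the one below. It is written in Python rather than the asker's unspecified environment, so it assumes Python's rule that a backreference to a group that did not participate fails to match, which makes (?!\1) pass when no grouping symbol was seen. The list names are illustrative.

```python
import re

AMOUNT = re.compile(r"^(?:\d+|\d{1,3}(?:([.,])\d{3})*)(?!\1)[.,]\d{2}$")

should_match = ["24,56", "24.56", "1.234,56", "1,234.56", "1234,56", "1234.56"]
should_fail = ["1.234.56", "1,234,56", "1234.567,23"]

accepted = [s for s in should_match + should_fail if AMOUNT.match(s)]
print(accepted)
# ['24,56', '24.56', '1.234,56', '1,234.56', '1234,56', '1234.56']
```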
You can use
^\d+(([.,])\d{3})*(?!\2)[.,]\d{2}$
live demo
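A brief sketch of how this shorter pattern behaves, again in Python and with illustrative names: because the leading \d+ is not limited to 1 to 3 digits, it still accepts 1234.567,23, the input the question's update calls undesirable.

```python
import re

SHORT = re.compile(r"^\d+(([.,])\d{3})*(?!\2)[.,]\d{2}$")

checks = {s: bool(SHORT.match(s))
          for s in ["1.234,56", "1234.56", "1.234.56", "1234.567,23"]}
print(checks)
# {'1.234,56': True, '1234.56': True, '1.234.56': False, '1234.567,23': True}
```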