I have the following sentence:
When the Forward Sensing Camera (FSC) detects a vehicle ahead or pedestrian and determines
that a collision with the object is unavoidable while the vehicle is driven at a vehicle
speed of about 4 to 80 km/h (2 to 50 mph) if the object is a vehicle ahead and 2 to 50km/hr if the object is pedestrian
My goal is to extract all the speed ranges. Currently, I am using this regex:
\d+ to \d+\s?(km\/hr|km\/h| mph)
The only issue is that I have hard-coded 'to' in the regex. The speed could also be specified as 5 - 25 kmph.
I am lost as to what generic character sequence would cater to anything between two numbers.
You can make the k optional and use an alternation:
\b\d+ (?:-|to) \d+\s?(?:km\/hr?| k?mph)\b
The pattern matches:
\b A word boundary
\d+ Match 1+ digits followed by a space
(?:-|to) Match either - or to, followed by a space
\d+\s? Match 1+ digits with an optional whitespace char
(?: Non-capturing group for the alternatives
km\/hr?| k?mph Match km/h, km/hr, or a space followed by mph or kmph
) Close the group
\b A word boundary
See a regex101 demo
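A quick way to try the pattern outside regex101 is Python's re module. This is only a sketch: the sentence is the one from the question, SPEED_RANGE is a name chosen here, and \/ is written as a plain / because Python does not need the slash escaped. Since the pattern contains only non-capturing groups, findall returns the whole matches.

```python
import re

# Pattern from the answer; "/" needs no escaping in a Python regex
SPEED_RANGE = re.compile(r"\b\d+ (?:-|to) \d+\s?(?:km/hr?| k?mph)\b")

sentence = (
    "When the Forward Sensing Camera (FSC) detects a vehicle ahead or pedestrian "
    "and determines that a collision with the object is unavoidable while the "
    "vehicle is driven at a vehicle speed of about 4 to 80 km/h (2 to 50 mph) if "
    "the object is a vehicle ahead and 2 to 50km/hr if the object is pedestrian"
)

# No capturing groups, so findall returns the full matched text
print(SPEED_RANGE.findall(sentence))
# ['4 to 80 km/h', '2 to 50 mph', '2 to 50km/hr']
print(SPEED_RANGE.findall("The speed could also be specified as 5 - 25 kmph."))
# ['5 - 25 kmph']
```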
Note that there is also a space before k?mph, so together with the preceding \s? the pattern can match two spaces before mph or kmph.
If you don't want to allow 2 spaces, you could write it as:
\b\d+ (?:-|to) \d+(?: ?km\/hr?| k?mph)\b
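To see the difference between the two variants, this sketch runs both against a made-up input with two spaces before the unit; only the first variant accepts it, while both still accept a single space.

```python
import re

# First variant: \s? plus the space inside " k?mph" can add up to two spaces
with_two_spaces = re.compile(r"\b\d+ (?:-|to) \d+\s?(?:km/hr?| k?mph)\b")
# Second variant: at most one space before any unit
single_space = re.compile(r"\b\d+ (?:-|to) \d+(?: ?km/hr?| k?mph)\b")

sample = "5 - 25  kmph"  # hypothetical input: two spaces before the unit

print(with_two_spaces.findall(sample))       # ['5 - 25  kmph']
print(single_space.findall(sample))          # []
print(single_space.findall("5 - 25 kmph"))   # ['5 - 25 kmph']
```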
I am trying to analyse my source code (written in C) for mismatched timer variable comparisons/assignments. I have a range of timers with different timebases (2-250 milliseconds). Every timer variable contains its granularity in milliseconds in its name (e.g. timer10ms), as does every timer-photo and define (e.g. fooTimer10ms, DOO_TIMEOUT_100MS).
Here are some example lines:
fooTimer10ms = timer10ms;
baaTimer20ms = timer10ms;
if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)
if (DIFF_100MS(dooTimer10ms) < DOO_TIMEOUT_100MS)
I want to match the lines where the timebases do not correspond (in this case the second, third and fourth lines). So far I have this regex:
(\d{1,3}(?i)ms(?-i)).*[^\d](\d{1,3}(?i)ms(?-i))
which finds every line that contains two of those granularities, so instead of just lines 2, 3 and 4 it matches all of them. The only idea I had to narrow it down is to add a negative lookbehind with a backreference, like so:
(\d{1,3}(?i)ms(?-i)).*[^\d](\d{1,3}(?i)ms(?-i))(?<!\1)
but this is not allowed because a negative lookbehind has to have a fixed length.
I found these two questions (one, two), but the first does not require both capture groups to be of the same kind, and the second looks for equal instances of the capture group.
If what I want can be achieved much more easily with something other than regex, I would be happy to know. My mind is just stuck on the belief that regex is capable of this and I am just not creative enough to use it properly.
One option is to match the timer part followed by the digits and use a negative lookahead with a backreference to assert that it does not occur at the right.
For the example data, a more specific pattern that limits the timebase to the range 2-250 might be:
.*?(timer(?:2[0-4]\d|250|1?\d\d|[2-9])ms)\b\S*[^\S\r\n]*[<>]?=[^\S\r\n]*\b(?!\S*\1)\S+
The pattern matches:
.*? Match any char except a newline, as few as possible (non-greedy)
( Capture group 1
timer Match literally (run the pattern case-insensitively, since the example variables are spelled Timer)
(?:2[0-4]\d|250|1?\d\d|[2-9]) Match a number in the range 2-250
ms Match literally
)\b Close group and a word boundary
\S*[^\S\r\n]* Match optional non whitespace chars and optional spaces without newlines
[<>]?= Match an optional < or > and =
[^\S\r\n]*\b Match optional whitespace chars without a newline and a word boundary
(?!\S*\1) Negative lookahead, assert no occurrence of what is captured in group 1 in the value
\S+ Match 1+ non whitespace chars
Regex demo
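A minimal Python sketch of the negative lookahead with a backreference, applied line by line to the example lines from the question. Two assumptions, both made here rather than in the answer: re.IGNORECASE is used (the variables are spelled Timer, and Python then also compares the backreference \1 case-insensitively), and [<>]?= is swapped for (?:[<>]=?|=), because [<>]?= always needs an = and so cannot match the bare < in the fourth example line.

```python
import re

MISMATCH = re.compile(
    r".*?(timer(?:2[0-4]\d|250|1?\d\d|[2-9])ms)\b"  # group 1, e.g. Timer10ms
    r"\S*[^\S\r\n]*(?:[<>]=?|=)"                     # rest of lhs and operator
    r"[^\S\r\n]*\b(?!\S*\1)\S+",                     # rhs must not contain group 1
    re.IGNORECASE,
)

lines = [
    "fooTimer10ms = timer10ms;",
    "baaTimer20ms = timer10ms;",
    "if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)",
    "if (DIFF_100MS(dooTimer10ms) < DOO_TIMEOUT_100MS)",
]

# Keep only the lines whose right-hand side does not repeat the captured timer
mismatched = [line for line in lines if MISMATCH.search(line)]
for line in mismatched:
    print(line)
```

With the answer's original [<>]?= in place of the operator group, only the second and third lines would be reported.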
Or perhaps a broader pattern matching 1-3 digits and optional whitespace chars which might also match a newline:
.*?(timer\d{1,3}ms\b)\S*\s*[<>]?=\s*\b(?!.*\1)\S+
Regex demo
Note that the quantifier is written {1,3}, not {1-3}, and that \d{1,3} can also match 999, which is outside the 2-250 range.
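Since the question also asks about approaches other than a single regex, here is a hypothetical alternative (not from the answers above): collect every timebase on a line and flag the line when they are not all the same. It assumes that any 1-3 digit number directly followed by ms or MS is a timebase, which also counts macros such as DIFF_100MS; has_mixed_timebases is a name made up for this sketch.

```python
import re

# Hypothetical helper pattern: 1-3 digits, not preceded by another digit,
# followed by "ms" in any case and then a non-word character or the end.
TIMEBASE = re.compile(r"(?<!\d)(\d{1,3})ms\b", re.IGNORECASE)

def has_mixed_timebases(line: str) -> bool:
    """Return True if the line mentions more than one distinct timebase."""
    return len(set(TIMEBASE.findall(line))) > 1

lines = [
    "fooTimer10ms = timer10ms;",
    "baaTimer20ms = timer10ms;",
    "if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)",
    "if (DIFF_100MS(dooTimer10ms) < DOO_TIMEOUT_100MS)",
]

for line in lines:
    if has_mixed_timebases(line):
        print(line)
```

Comparing sets sidesteps the ordering and lookbehind problems entirely, at the cost of treating every NNms token on the line alike.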
I need to extract any number of 4-10 digits that follows directly after 'PO#' or 'PO# ' (with a whitespace). I do not want to include the PO# in the extracted value, but I do need it as a criterion to target the value within a string. If there are fewer than 4 or more than 10 digits, I do not want to capture the value and would like to ignore it.
A sample string would look like this:
PO#12445 for Vendor Enterprise
or
Invoice# 21412556 for Vendor Enterprise for PO# 12445
My current regex captures PO# including the '#', and I use additional logic afterwards to remove the '#'. However, my expression also captures Invoice# and Inv#, which I don't want; I'd like it to target only PO#.
Current Expression: [P][O][#]\s*[0-9]{3,9}\d+\w
Any help would be greatly appreciated!
If you need only the digits, you can use (?<=\bPO#)\s?(\d{4,10})\b, with:
(?<=\bPO#): positive lookbehind, asserting that PO# (starting at a word boundary) is present directly before the needed pattern
\s?: 0 or 1 whitespace
(\d{4,10}): between 4 and 10 digits
\b: word boundaries, so that neither the first 10 digits of an 11-digit number nor something like 'SPO#' can match
Edit: Alexander Mashin is right about the lookbehind having to be fixed width, so the optional \s? stays outside the lookbehind, and (?<=\bPO#)\s?(\d{4,10})\b is better https://regex101.com/r/1KBQd1/5
Edit: added word boundaries
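A Python sketch of the lookbehind approach. It first shows that Python's re also refuses a variable-width lookbehind, then uses (?<=\bPO#)\s?(\d{4,10})\b, where the word boundary is checked before the P of PO#; a \b placed directly after the # cannot match when the next character is a space, as in PO# 12445. PO_DIGITS and the extra samples are made up for the illustration.

```python
import re

# A variable-width lookbehind is rejected when the pattern is compiled
try:
    re.compile(r"(?<=PO#\s?)\d{4,10}")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("rejected:", exc)

# Fixed-width lookbehind (\b has zero width); the optional space is matched
# outside it, and findall returns only capture group 1 (the digits).
PO_DIGITS = re.compile(r"(?<=\bPO#)\s?(\d{4,10})\b")

samples = [
    "PO#12445 for Vendor Enterprise",
    "Invoice# 21412556 for Vendor Enterprise for PO# 12445",
    "SPO#12445",        # no word boundary before the P
    "PO#123",           # fewer than 4 digits
    "PO#12345678901",   # more than 10 digits
]
for s in samples:
    print(s, "->", PO_DIGITS.findall(s))
```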
You can use a capturing group and repeat matching the digits 4-10 times using [0-9]{4,10}.
Note that [P][O][#] is the same as PO#
\bPO#\s*([0-9]{4,10})\b
\bPO#\s* Match PO# preceded by a word boundary and match 0+ whitespace chars
( Capture group 1
[0-9]{4,10} Match 4 - 10 digits
)\b Close group followed by a word boundary to prevent the match being part of a larger word
Regex demo
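The capturing-group version ports directly to Python: search finds the first match and group(1) holds only the digits, so no post-processing is needed to drop the '#'. extract_po is a hypothetical helper name used for illustration.

```python
import re
from typing import Optional

# Pattern from the answer; the digits are in capture group 1
PO_NUMBER = re.compile(r"\bPO#\s*([0-9]{4,10})\b")

def extract_po(text: str) -> Optional[str]:
    """Return the 4-10 digit number after PO#, or None if there is none."""
    match = PO_NUMBER.search(text)
    return match.group(1) if match else None

print(extract_po("PO#12445 for Vendor Enterprise"))                         # 12445
print(extract_po("Invoice# 21412556 for Vendor Enterprise for PO# 12445"))  # 12445
print(extract_po("Inv# 123456"))                                            # None
```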
If PCRE is available, how about:
PO#\s*\K\d{4,10}(?=\D|$)
PO#\s* matches the leading substring "PO#" followed by 0 or more whitespaces.
\K resets the starting position of the match and works as a positive (zero length) lookbehind.
\d{4,10} matches a sequence of digits of 4 <= length <= 10.
(?=\D|$) is the positive lookahead to match a non-digit character or the end of the string.
I want to extract a number that comes before a list of specific characters. I want to extract the volume, price and more from different websites.
For example, I want to extract the volume from here:
<td class="data">Single Malt Scotch Whisky der Marke Speyburn 10 Years 40% 0,7l Flasche</td>
or
<td class="data">Irish Whiskey der Marke Bushmills the Original 40% 1,0l Flasche</td>
I tried the following code:
re.findall("[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*?(?=l|L|Liter| Liter| l| L|ml)", string)
And this is the result:
First String = ['7'] and Second String = ['0']
How do I get the complete number (0,7 and 1,0)?
For the volume I tried converting the comma into a dot. This works fine for the volume but not for the price.
if ',' in string:
    string = string.replace(',', '.')
If possible, I want to use the regex for the price as well. The difficulty here is the different number formats.
The following formats occur:
10.00€
10,00€
1,234.56€
1.234,56€
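For the four price formats listed above, one option is to normalise the number after extracting it. The sketch below is a hypothetical helper, not something from the thread, and it rests on an assumption: a decimal part, if present, is always exactly two digits, so the last . or , followed by exactly two digits at the end is the decimal separator and every other separator is a thousands separator. Under that assumption 1.234€ is read as 1234. Decimal is used instead of float to avoid binary rounding with money.

```python
import re
from decimal import Decimal

# Number directly followed by "€" (optionally after one whitespace character)
PRICE = re.compile(r"[-+]?\d+(?:[.,]\d+)*(?=\s?€)")
# Assumed decimal part: a separator plus exactly two digits at the very end
DECIMAL_PART = re.compile(r"[.,](\d{2})$")

def parse_price(text):
    """Return the first euro price in text as a Decimal, or None."""
    m = PRICE.search(text)
    if m is None:
        return None
    number = m.group()
    dec = DECIMAL_PART.search(number)
    if dec:
        whole, frac = number[:dec.start()], dec.group(1)
    else:
        whole, frac = number, "00"
    # Every remaining "." or "," is treated as a thousands separator
    whole = whole.replace(".", "").replace(",", "")
    return Decimal(f"{whole}.{frac}")

for sample in ["10.00€", "10,00€", "1,234.56€", "1.234,56€"]:
    print(sample, "->", parse_price(sample))
```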
You may use
[-+]?\.?\d+(?:[.,]\d+)*(?= ?[mM]?[lL])
See the regex demo. To match the measurement units as whole words, add a \b word boundary at the end of the lookahead pattern: (?= ?(?:[mM]?[lL]|[Ll]iter)\b).
Details
[-+]? - an optional - or +
\.? - an optional .
\d+ - 1+ digits
(?:[.,]\d+)* - 0 or more occurrences of a dot or comma and then 1+ digits
(?= ?[mM]?[lL]) - a positive lookahead that matches a location that is immediately followed with
A space followed by ? - an optional space (you may use \s? here to match any whitespace)
[mM]? - an optional m or M
[lL] - l or L.
Note that you do not need a Liter alternative in the lookahead if you use (?= ?[mM]?[lL]), because [lL] already matches the L of Liter; but if you add a word boundary, you will need the [Ll]iter alternative.
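A Python check of the answer's pattern on the two cells from the question, using the word-boundary form of the lookahead. Replacing the comma with a dot afterwards, as the question already does, turns each result into a float; that last step assumes the comma is always a decimal separator in volumes.

```python
import re

# Answer's pattern with the word-boundary variant of the lookahead
VOLUME = re.compile(r"[-+]?\.?\d+(?:[.,]\d+)*(?= ?(?:[mM]?[lL]|[Ll]iter)\b)")

cells = [
    '<td class="data">Single Malt Scotch Whisky der Marke Speyburn 10 Years 40% 0,7l Flasche</td>',
    '<td class="data">Irish Whiskey der Marke Bushmills the Original 40% 1,0l Flasche</td>',
]

for cell in cells:
    volumes = VOLUME.findall(cell)
    # Comma is the decimal separator here, so swap it for a dot before float()
    litres = [float(v.replace(",", ".")) for v in volumes]
    print(volumes, litres)
# ['0,7'] [0.7]
# ['1,0'] [1.0]
```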
I am attempting to pick apart data from the following string using a regex:
Ethane, C2 11.7310 3.1530 13.9982 HV, Dry # Base P,T 1432.00
The ultimate goal is to pull out the middle three data points as individual values: 11.7310, 3.1530 and 13.9982.
The expression I am currently working with is:
(?<=C2 )(\d*\.?\d+)
This yields a full match of 11.7310 and a Group 1 match of 11.7310, but I can't figure out how to match the other two data points.
I am using PCRE (PHP) to create my expression.
You may use
(?:\G(?!^)|\bC2)\s+\K\d*\.?\d+
See the regex demo.
Details
(?:\G(?!^)|\bC2) - either the end of the previous successful match or C2 whole word
\s+ - 1+ whitespaces
\K - match reset operator discarding all the text matched so far in the match memory buffer
\d* - 0+ digits
\.? - an optional dot
\d+ - 1+ digits.
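Python's built-in re module supports neither \G nor \K, so the PCRE pattern cannot be used there as is. A different technique that yields the same three values for this line is to capture the whole run of numbers after C2 in one group and split it afterwards; a minimal sketch, assuming the values are separated by whitespace:

```python
import re

line = "Ethane, C2 11.7310 3.1530 13.9982 HV, Dry # Base P,T 1432.00"

# Group 1 grabs the run of whitespace-separated numbers right after "C2";
# the run stops at "HV", so 1432.00 further along is not included.
run = re.search(r"\bC2((?:\s+\d*\.?\d+)+)", line)
values = run.group(1).split() if run else []
print(values)  # ['11.7310', '3.1530', '13.9982']
```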
I'm using this regex in Excel to find the desired text in a paragraph:
=RegexExtract(B2,"(bot|vehicle|scrape)")
This successfully returns any of the 3 words if it is found in a paragraph. As an extra, I would like the regex to return the matched word in bold, along with a few words before it and 3 words after it.
Example of text:
A car (or automobile) is a wheeled motor vehicle used for transportation.
Most definitions of car say they run primarily on roads, seat one to eight people,
have four tires, and mainly transport people rather than goods.
Example output:
a wheeled motor **vehicle** used for transportation
I want a portion of the surrounding text to appear so that the receiver can pinpoint the location of the text more easily.
Any alternative approach is much appreciated.
You may use
=RegexExtract(B2,"(?:\w+\W+(?:\w+\W+){0,2})?(?:bot|vehicle|scrape)(?:\W+\w+(?:\W+\w+){0,2})?")
See the regex demo and the Regulex graph:
Details: REGEXEXTRACT extracts the string that matches the following pattern:
(?:\w+\W+(?:\w+\W+){0,2})? - an optional sequence of a word followed with non-word chars that is followed with zero, one or two repetitions of 1+ word chars and then 1+ non-word chars
(?:bot|vehicle|scrape) - a bot, vehicle or scrape word
(?:\W+\w+(?:\W+\w+){0,2})? - an optional sequence of 1+ non-word chars and then 1+ word chars followed with zero, one or two repetitions of 1+ non-word chars and then 1+ word chars.
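The same pattern can be tried in Python, which also makes it easy to add the bold markers asked for in the question. In this sketch the keyword is wrapped in a capturing group (a change from the formula above) so that its position inside the snippet is known; snippet_with_bold is a hypothetical helper name, and **...** stands in for bold text, as in the example output. Like the original pattern, it does not check word boundaries around the keyword, so bot would also match inside robot.

```python
import re

# Answer's structure, with the keyword moved into capture group 1
CONTEXT = re.compile(
    r"(?:\w+\W+(?:\w+\W+){0,2})?(bot|vehicle|scrape)(?:\W+\w+(?:\W+\w+){0,2})?"
)

def snippet_with_bold(text):
    """Return up to 3 words before/after the first keyword, keyword in **bold**."""
    m = CONTEXT.search(text)
    if m is None:
        return None
    before = text[m.start():m.start(1)]
    after = text[m.end(1):m.end()]
    return before + "**" + m.group(1) + "**" + after

paragraph = (
    "A car (or automobile) is a wheeled motor vehicle used for transportation. "
    "Most definitions of car say they run primarily on roads, seat one to eight people, "
    "have four tires, and mainly transport people rather than goods."
)
print(snippet_with_bold(paragraph))
# a wheeled motor **vehicle** used for transportation
```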
Google Spreadsheets test: