Regex - Multiple matches, colon and empty spaces

I'm trying to get a regex working, but it doesn't quite handle my specific use case.
Given the following string, for example:
Device-1: P0_Node0_Channel0_Dimm0 size: 32 GB speed: 2133 MHz type: DDR4
I want to extract information from this string, preferably like this:
Device-1: P0_Node0_Channel0_Dimm0
size: 32 GB
speed: 2133 MHz
type: DDR4
So I experimented a bit and tested some expressions:
(.*?):\s
This works to some extent: it catches the first parameter name properly, but after that the spaces throw it off.
:\s(.*?)\s\w*?:!?
Although this catches the space in the third parameter value, it only gives me the first and the third values, and no parameter names.
Does anyone have an idea how I could achieve the expected behaviour?
Note: I'm doing this in Excel VBA, and I'm not sure whether all regex features are supported there.
Thanks

You may use
([^\s:]+):\s*(.*?)\s*(?=[^\s:]+:|$)
See the regex demo
Details
([^\s:]+) - Group 1: one or more chars other than whitespace and :
: - a colon
\s* - zero or more whitespaces
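(.*?) - Group 2: any zero or more chars other than line break chars, as few as possible, up to the first occurrence of...
\s* - zero or more whitespaces that are followed by...
(?=[^\s:]+:|$) - a positive lookahead that requires either one or more chars other than whitespace and : followed by a colon (the next parameter name), or the end of the string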
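To see what the pattern above captures, here is a minimal Python sketch. Python is used only for illustration, since the question targets Excel VBA; the names `PAIR_RE` and `parse_pairs` are made up for this example.

```python
import re

# Pattern from the answer above: a name, a colon, then a lazily matched value
# that stops before the next "name:" token or at the end of the string.
PAIR_RE = re.compile(r'([^\s:]+):\s*(.*?)\s*(?=[^\s:]+:|$)')


def parse_pairs(line):
    """Return a list of (name, value) tuples found in line."""
    return PAIR_RE.findall(line)


sample = 'Device-1: P0_Node0_Channel0_Dimm0 size: 32 GB speed: 2133 MHz type: DDR4'
for name, value in parse_pairs(sample):
    print(f'{name}: {value}')
```

Each tuple holds Group 1 (the parameter name) and Group 2 (its value), so values containing spaces such as `32 GB` stay in one piece.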

This is an AutoIt example:
#include <Array.au3>
Local $str = 'Device-1: P0_Node0_Channel0_Dimm0 size: 32 GB speed: 2133 MHz type: DDR4'
$r_A = StringRegExp($str, '([\w-]*\:)\s*(\w*\s|.*)', 3)
ConsoleWrite(_ArrayToString($r_A, @CR) & @CRLF)

Related

How to create a matching regex pattern for "greater than 10-000-000 and lower than 150-000-000"?

I'm trying to make
09-546-943
fail in the regex pattern below.
^[0-9]{2,3}[- ]{0,1}[0-9]{3}[- ]{0,1}[0-9]{3}$
Passing criteria is
greater than 10-000-000 or 010-000-000 and
less than 150-000-000
The tried example "09-546-943" passes. This should be a fail.
Any idea how to create a regex that makes this example a fail instead of a pass?
You may use
^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$
See the regex demo.
The pattern was partially generated with an online number range regex generator: I set the min number to 10 and the max to 150, then merged the branches that match 1-8 and 9 (the tool does a poor job here), added 0? before the two-digit numbers to match an optional leading 0, and appended -[0-9]{3}-[0-9]{3} for the 10-149 part and -000-000 for 150.
Details
^ - start of string
(?: - start of a container non-capturing group making the anchors apply to both alternatives:
(?:0?[1-9][0-9]|1[0-4][0-9]) - an optional 0 and then a number from 10 to 99 or 1 followed with a digit from 0 to 4 and then any digit (100 to 149)
-[0-9]{3}-[0-9]{3} - a hyphen and three digits repeated twice (=(?:-[0-9]{3}){2})
| - or
150-000-000 - a 150-000-000 value
) - end of the non-capturing group
$ - end of string.
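The breakdown above can be checked against a few boundary values. This is a small Python sketch, not part of the original answer; `RANGE_RE` and `in_range` are hypothetical names chosen for the example.

```python
import re

# Pattern from the answer: prefixes 10-99 (optionally written 010-099) or
# 100-149 followed by two 3-digit groups, or exactly 150-000-000.
RANGE_RE = re.compile(
    r'^(?:(?:0?[1-9][0-9]|1[0-4][0-9])-[0-9]{3}-[0-9]{3}|150-000-000)$'
)


def in_range(value):
    """Return True if value matches the range pattern."""
    return RANGE_RE.match(value) is not None


for candidate in ('09-546-943', '10-000-000', '010-000-000',
                  '149-999-999', '150-000-000', '150-000-001'):
    print(candidate, in_range(candidate))
```

Note that this pattern accepts both endpoints (10-000-000 and 150-000-000). If the limits must be strictly exclusive, those two values would need to be rejected separately.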
This expression, or a slightly modified version of it, might work:
^[1][0-4][0-9]-[0-9]{3}-[0-9]{3}$|^[1][0]-[0-9]{3}-[0-9]{2}[1-9]$
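It rejects both 10-000-000 and 150-000-000. As written, it only covers the 10 and 100-149 prefixes, so values starting with 11 to 99 would need an additional branch.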
In this demo, the expression is explained, if you might be interested.
This pattern:
((0?[1-9])|(1[0-4]))[0-9]-[0-9]{3}-[0-9]{3}
matches the range from (0)10-000-000 to 149-999-999 inclusive. To keep the regex simple, you may need to handle the extremes ((0)10-000-000 and 150-000-000) separately, depending on whether you need them included or excluded.
Test here.
This regex:
((0?[1-9])|(1[0-4]))[0-9][- ]?[0-9]{3}[- ]?[0-9]{3}
also accepts a space, or nothing at all, in place of each -.
Test here.

Regex - Alteryx - Parse - How to find an expression starting by the end of the string

I need to parse the following expression:
Fertilizer abc 7-15-15 5KG BOX 250 KG
in 3 fields:
The product description: Fertilizer abc 7-15-15
Size: 250
Size unit: KG
I don't know how to proceed. Any help and explanation would be appreciated.
Try this in the Alteryx REGEX Tool with Parse selected as the Method:
([A-Za-z ]* [\d-]{6,8}) ([A-Z\d]{2,6}) (.{1,5}?) (\d*) ([A-Z]*)
You can test it at Regexpal to see the breakdown of each group. Essentially, the first set of brackets gets you the product description (text and spaces, followed by 6-8 characters made up of digits and dashes), the 2nd and 3rd groups absorb the extra info that you don't want, the 4th group is just digits, and the 5th group is any uppercase text afterwards.
Note that this will change dramatically if your data has digits where there are currently letters, and so on.
You can always break it up into even smaller groups and then concatenate them back together as well.
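Outside Alteryx, the same grouping can be tried in a few lines of Python. This is only a sketch: `PARSE_RE` and `parse_product` are hypothetical names, and the letter class is written as `[A-Za-z ]` so that it matches letters and spaces only.

```python
import re

# Five space-separated groups; only groups 1, 4 and 5 are of interest.
PARSE_RE = re.compile(
    r'([A-Za-z ]* [\d-]{6,8}) ([A-Z\d]{2,6}) (.{1,5}?) (\d*) ([A-Z]*)'
)


def parse_product(text):
    """Split a product line into description, size and size unit."""
    m = PARSE_RE.match(text)
    if m is None:
        return None
    return {'description': m.group(1), 'size': m.group(4), 'unit': m.group(5)}


print(parse_product('Fertilizer abc 7-15-15 5KG BOX 250 KG'))
```

Lines that do not follow the same layout return `None`, which makes it easy to spot rows that need a different pattern.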

Regex to obtain two values from string

I have a string like the one below and I need to pull out two numeric values: 197 from 197kJ and 47 from 47kcal. Can someone help me with this? It is driving me crazy :)
My regular expression:
((<|>)?\d+((\.|,)\d+)?kj\s?\/\s?)?(<|>)?(\d+((\.|,)\d+)?)kcal
String to search in:
Per 250ml serving (10 servings per pack): Energy 197kJ (2% ADH)/47kcal
(2% ADH), Fat 0.3g (of which Saturated Fat 0.1g), Carbohydrate 7.8g
(3% ADH) (of which Sugars 3.9g (4% ADH)), Fibres 1.6g, Protein 2.2g
(4% ADH), Salt 1.6g (27% ADH)
Just do:
\d+(?:kcal|kJ)
# require at least one number
# followed by either kcal or kJ
See a demo on regex101.com (or yours: https://regex101.com/r/uS3mE4/3)
Your regex pattern looks overcomplicated, probably because it serves a more complex job than the one described.
But your task (getting the numeric values of kJ and kcal) can be done with a pattern like:
(\d+(?:[.,]\d+)?)(?:kJ|kcal)
Here is my proposal:
(\d+)kJ.*?(\d+)kcal
It extracts the kJ and the kcal numbers into two different capturing groups.
It simply captures all numeric chars before the "kJ" or "kcal" substring.
It uses the lazy quantifier *? to avoid consuming the first chars of the kcal number.
https://regex101.com/r/uS3mE4/4
You may use alternation and optional (0 or more) spaces between the number and measurement unit:
[<>]?(\d+(?:[.,]\d+)?)\s*k(?:cal|j)
To be used with the i case insensitive modifier.
See the regex demo.
Details:
[<>]? - an optional < or >
(\d+(?:[.,]\d+)?) - Group 1 capturing 1 or more digits, and then an optional sequence: a . or , and 1+ digits
\s* - zero or more whitespaces
k - literal k
(?:cal|j) - either a cal or j.
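Applied to the label text from the question, the pattern described above could be used roughly as follows. This is a Python sketch for illustration; `ENERGY_RE`, `label` and `values` are names chosen for this example.

```python
import re

# Case-insensitive so that both "kJ" and "kcal" match k(?:cal|j).
ENERGY_RE = re.compile(r'[<>]?(\d+(?:[.,]\d+)?)\s*k(?:cal|j)', re.IGNORECASE)

label = (
    'Per 250ml serving (10 servings per pack): Energy 197kJ (2% ADH)/47kcal '
    '(2% ADH), Fat 0.3g (of which Saturated Fat 0.1g), Carbohydrate 7.8g '
    '(3% ADH) (of which Sugars 3.9g (4% ADH)), Fibres 1.6g, Protein 2.2g '
    '(4% ADH), Salt 1.6g (27% ADH)'
)

# findall returns only Group 1 (the number) for each match.
values = ENERGY_RE.findall(label)
print(values)
```

The first entry is the kJ value and the second the kcal value, in the order they appear in the text.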

re.search( ): (\d+) matches only a single digit

I want to parse the value 387 KB/s from the string:
str1 = '2015-07-02 02:05:02 (387 KB/s)'
The regular expression I have written for it is this:
mbps = re.search('\d+-\d+-\d+ \d+:\d+:\d+ .*(\d+) (.*/s)',str1)
var = mbps.group(1)
Printing var gives me only 7 instead of 387, i.e. it matches only a single digit.
How can I get the complete number, i.e. 387?
Thanks.
The problem is that .* is greedy (matching as much as it can) and it can also match digits, so it matches (38, leaving only 7 for the \d+ (which, since it has successfully matched, sees no reason to expand its match).
One possible solution would be to make the quantifier lazy:
mbps = re.search(r'\d+-\d+-\d+ \d+:\d+:\d+ .*?(\d+) (.*/s)',str1)
A better solution would be more specific, for example disallowing digits:
mbps = re.search(r'\d+-\d+-\d+ \d+:\d+:\d+ [^\d]*(\d+) (.*/s)',str1)
Also, always use raw strings with regexes.
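The three variants can be compared side by side. The sketch below reuses the string from the question; `PREFIX`, `greedy`, `lazy` and `no_digits` are names introduced for this example only.

```python
import re

str1 = '2015-07-02 02:05:02 (387 KB/s)'

# Shared date/time prefix, kept in a raw string.
PREFIX = r'\d+-\d+-\d+ \d+:\d+:\d+ '

# Greedy .* swallows "(38" and leaves a single digit for the group.
greedy = re.search(PREFIX + r'.*(\d+) (.*/s)', str1)
# Lazy .*? stops as early as possible, so the group gets all digits.
lazy = re.search(PREFIX + r'.*?(\d+) (.*/s)', str1)
# [^\d]* cannot consume digits at all.
no_digits = re.search(PREFIX + r'[^\d]*(\d+) (.*/s)', str1)

print(greedy.group(1))     # 7
print(lazy.group(1))       # 387
print(no_digits.group(1))  # 387
```

In all three cases group 2 holds the unit, `KB/s`.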

Regular Expressions in R

I found somewhat similar questions
R - Select string text between two values, regex for n characters or at least m characters,
but I'm still having trouble.
Say I have a string in R:
testing_String <- "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"
I need to be able to pull everything between the first element in the string, which contains 2 characters (AK), and the pair PADK, ADK. PADK and ADK will change from string to string but will always be 4 and 3 characters long respectively.
So I would need to pull
ADAK NAS
I came up with this, but it's picking up everything from AK to ADK:
^[A-Za-z0_9_]{2}(.*?) +[A-Za-z0_9_]{4}|[A-Za-z0_9_]{3,}
If I understood your question correctly, this should do the trick:
\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b
Demo
You'll have to switch on the perl = TRUE option (to use a decent regex engine).
\b means word boundary. So this pattern looks for a match starting with a 2-letter word and ending with a 4 letter word followed by a 3 letter word. Your value will be in the first group.
Alternatively, you can write the following to avoid using the capturing group:
\b[A-Z]{2}\s+\K.+?(?=\s+[A-Z]{4}\s+[A-Z]{3}\b)
But I'd prefer the first method because it's easier to read.
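For comparison, the capturing-group version of the pattern can be tried in Python. This is a port for illustration only, since the question is about R; Python's built-in `re` module does not support `\K`, so only the first method is shown, and `NAME_RE` and `station_name` are made-up names.

```python
import re

# 2-letter word, the text of interest, then a 4-letter and a 3-letter word.
NAME_RE = re.compile(r'\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b')

testing_string = 'AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7'

m = NAME_RE.search(testing_string)
# Group 1 holds the text between the 2-letter code and the 4/3-letter pair.
station_name = m.group(1) if m else None
print(station_name)
```

The lazy `.+?` keeps the captured text as short as possible, so the match stops at the first 4-letter word that is followed by a 3-letter word.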
Lookbehind is supported for perl=TRUE, so this regex will do what you want:
(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{3})