I'm having difficulties extracting irregular data using Regex. I attempted to use Lookheads however when the value doesn't exist the entire match returns false. The data set is consistent all the way until I reach the characters starting with RXX. The RXX are unique identifiers (groups) and the numeric values in between each set of Rxx's is what I would like to capture and assigned them to group names.
The Rxx values are random from R01 to R15 and 1 to all 15 could exist in the string.
The string values could vary from
12*000000000**S304JB01811*8*0*8*4*4*34R0332R152~~~
12*000000000**S304JB01811*9*0*4*3*4*224R023R032R10234R1325~~~
I'm able to extract the values and assign a group name until I reach the Rxx
My attempt are extracting the values are as follow
S304JB0...(?<Total1>[\d]+).(?<Total2>[\d]+).(?<Total3>[\d]+).(?<Total4>[\d]+).(?<Total5>[\d]+).(?<Total6>[\d]+).(?<Total7>[\d]+)
Which gives me what I want below
Total1 `1`
Total2 `8`
Total3 `0`
Total4 `8`
Total5 `4`
Total6 `4`
Total7 `34`
Capturing the R03 value and assigning it to Row is achieved below but if the value R03 doesn't exist in the string then the entire match returns false
(?<Row3>(R03)[\d]+)
Looking how I can make these regex statements optional allowing me to return the following
Total1 `1`
Total2 `8`
Total3 `0`
Total4 `8`
Total5 `4`
Total6 `4`
Total7 `34`
Row1 `32`
Row15 `2`
S304JB0...(?<Total1>[\d]+).(?<Total2>[\d]+).(?<Total3>[\d]+).(?<Total4>[\d]+).(?<Total5>[\d]+).(?<Total6>[\d]+).(?<Total7>[\d]+)(?<Row3>(R03)[\d]+)(?<Row4>(R04)[\d]+) ------> (?<Row15>(R15)[\d]+)
Thanks for your help
-Edited
Thanks for the quick reply Jorge
The input data will be
12*000000000**S304JB01811*8*0*8*4*4*34R0332R152~~~
The output will be 9 captured groups results
Group | Result
Total1 = 1
Total2 = 8
Total3 = 0
Total4 = 8
Total5 = 4
Total6 = 4
Total7 = 34
Row1 = 32
Row15 = 2
My example is shared below with input and
https://regex101.com/r/wG3aM3/68
Hopefully this helped to clarify things
D.
I'm certain this would be easier parsing char by char and storing each value.
As for the regex question, basically what you want to do is create all the groups, just like you've already tried, but you also want to make them optional, because not all groups might be there.
You can make the group optional with a construct like:
(?:R01(?<Row1>\d+))?
So you should add one of each to get the values in different capture groups. Notice I used the construct (?:non-capturing) which is exactly the same as a group, but it doesn't create a backreference. You can read about it here.
Edit: One more thing. You're using a . to allow any delimiter. However, performance-wise it would be better to use something like \D (anything except digits). In case of failure, it saves the regex engine quite a few backtracking steps.
This would be the whole expression, assuming the Rxx groups are always ordered.
S304JB0...(?<Total1>\d+)\D(?<Total2>\d+)\D(?<Total3>\d+)\D(?<Total4>\d+)\D(?<Total5>\d+)\D(?<Total6>\d+)\D(?<Total7>\d+)(?:R01(?<Row1>\d+))?(?:R02(?<Row2>\d+))?(?:R03(?<Row3>\d+))?(?:R04(?<Row4>\d+))?(?:R05(?<Row5>\d+))?(?:R06(?<Row6>\d+))?(?:R07(?<Row7>\d+))?(?:R08(?<Row8>\d+))?(?:R09(?<Row9>\d+))?(?:R10(?<Row10>\d+))?(?:R11(?<Row11>\d+))?(?:R12(?<Row12>\d+))?(?:R13(?<Row13>\d+))?(?:R14(?<Row14>\d+))?(?:R15(?<Row15>\d+))?
DEMO
Related
I have a small issue, I am trying to get specific characters from a long string using regex but I am having trouble.
Workflow
Prometheus --> Grafana --> Variable (using regex)
I can't use anything other than Regex expressions to achieve this result
I am currently using this expression to grab the long string from some json output:
.*channel_id="(.*?)".*
FROM THIS
{account_id="XXXXXXX-xxxx-xxxx-xxxx-xxxxxxxxxx",account_name="testalpha",channel_id="s0022110430col0901241usa",channel_abbr="s0022109430col}
This returns a string that's ALWAYS 24 characters long:
s0022110430col0901241usa
PROBLEM:
I need to grab the 3 letters 'col' and 'usa' as they are the two teams that are playing, ideally I would be able to pipe the results from the first regex to get these values (the position is key, since the first value will ALWAYS be the 12-14th characters and the second value is the last 3 characters) if I could output these values in uppercase with the string "vs" in between to create a string such as:
COL vs USA
or
ARG vs BRA
I am open to any and every suggestion anyone may have
Thank you!
PS - The uppercase thing is 'nice to have' BUT not needed
I'm still learning RegEx, so this is all I could come up with:
For the col (first team):
(?<=(channel_id=".{11}))\w{3}
For the usa (second team):
(?<=(channel_id=".{21}))\w{3}
Can you define the channel_id?
It begins with 's' and then there are many numbers. If they are always numbers, you can use this regex:
channel_id=".[0-9]+([a-z]+)[0-9]+([a-z]+)
You will get 2 groups, one with "col" and the other with "usa".
Edit:
Or if you just know, that you have always the same size, you can use something like:
channel_id=".{11}([a-z]+).{7}([a-z]+)
I've created one text field which accepts the product code.
I have tried many ways and got disappointed.
The product code is having some validations like follows,
Product code :315299AZ
1.First 2 digits ranges from[01-31].,should not contain 00.
2.Second 2 digits ranges from [01-52]., should not contain 00.
3.Third 2 digits ranges from [00-99].
4.Last 2 are optional. But should accept only alphabets. Should not accepts numbers.
Please someone help me to get out of it.
You can use the following regex :
(?!00)(([0-2][0-9])|31|30)(?!00)(([0-4][0-9])|51|50|52)(\d{2})([a-zA-Z]{2})?
(?!00) is a negative look-ahead that doesn't allows 00.
Debuggex Demo
There you go:
((0[1-9])|([1-2]\d)|(3[0-1]))((0[1-9])|([1-4]\d)|(5[0-2]))\d{2}([a-zA-Z]{2})?
If you don't like look-aheads.
I know it's not the spirit, but any sensible language supporting regular expressions should allow you to access groups, hence do something along these lines (pseudocode follows):
if product_code matches /^(\d\d)(\d\d)\d\d([a-zA-Z]{2})?$/ {
assert 1 <= int($1) <= 31 // validate first group
assert 1 <= int($2) <= 52 // validate second group
}
Bonus: you can actually read it.
(This is assuming the last optional group contains either two or zero characters. If one character is acceptable, you can replace it with [a-zA-Z]{0,2})
I'm parsing a bunch of line items on an inventory list and while each line describes something similar, the text format was not standardized. I'm been working on a regex pattern for the past few days but I'm not having much luck with getting a pattern that can match all of my test scenarios. I hoping that someone with a lot more regex experience might be able to point out a few errors in the the pattern
Pattern To Match the palette number: \([Pp]alette [No\.\s]?#?(.*?)\),
1. Warehouse A, (Palette #91L41)
# Match Result Correct: 91L41
2. Warehouse B Palette No. 214
# Match Result Incorrect: no match
3. Warehouse Lot Storage C (Palette No. 9),
# Match Result Incorrect: o. 9 //I don't quite understand why it matches the o
4. Store Location D of Palette (Palette #1),
# Match Result Correct: 1
5. Store Location E of Palette, Empty, lot #45,
# Match Result Incorrect: no match
I've also tried to make the parenthesis optional so that it will match examples 2 and 5 but it's too greedy and included the previously mentioned lot word
Anything in brackets causes the engine to look for ONE of the provided characters. Your pattern successfully matches, for example, strings like: Palette Nabcdefg
To indicate one of different options, you'll need to use paranthesis. What you're actually looking for should look something like this: [Pp]alette (No\.?\s?|#)?(\d+?)
Though it seems highly ineffective to not standardize the pattern. Your last case for example could be completely incompatible since it seems to be capable of containing possibly any kind of input.
A little bit of explanation on matching your patterns with regular expressions. You really don't need to look for and match your parentheses ( .. ) in this case.
Let's say we want to just find any string with the word Palette that is followed with whitespace and the # symbol and capture the Palette sequence from it.
You could simply just use the following:
[Pp]alette\s+#([A-Z0-9]+)
This will result in capturing 91L41 and 1 from the matched patterns
1. Warehouse A, (Palette #91L41)
4. Store Location D of Palette (Palette #1)
Now say we want to find any string that has Palette, followed by whitespace and either a # symbol or No.
We can use a Non-capturing group for this. Non-capturing parentheses group the regex so you can apply regex operators, but do not capture anything.
So we could do something like:
[Pp]alette\s+(?:No[ .]+|#)([A-Z0-9]+)
Now this results in matching the following strings and capturing 91L41, 214, 9 and 1
1. Warehouse A, (Palette #91L41)
2. Warehouse B Palette No. 214
3. Warehouse Lot Storage C (Palette No. 9)
4. Store Location D of Palette (Palette #1)
And last if you want to match all the following strings and capture the Palette sequence.
[Pp]alette[\w, ]+(?:No[ .]+|#)([A-Z0-9]+)
See working demo and an explanation on this regular expression.
Everyone has a different way of using regular expressions, this is just one of many ways you can simply understand and accomplish this.
This should work for your case:
[Pp]alette.*?(?:No\.?|#)\s*(\w+)
This will search following types of patterns:
[Pp]alette{any_characters}No.{optonal_spaces}(alphanumeric)
[Pp]alette{any_characters}No{optonal_spaces}(alphanumeric)
[Pp]alette{any_characters}#{optonal_spaces}(alphanumeric)
Check it in action here
MATCH 1
1. [26-31] `91L41`
MATCH 2
1. [60-63] `214`
MATCH 3
1. [104-105] `9`
MATCH 4
1. [148-149] `1`
MATCH 5
1. [195-197] `45`
I want to extract the number sandwiched between two specific letters.
e.g. string: x23y4z90
I specify x and y , I get 23
I specify y and z , I get 4
I specify z and x , I get 90 (the string pattern loops)
x\dy yields x23y, but I don't want the letters included.
*note: This is to read sensor values serially in LabVIEW.
One possibility is to use groups:
x(\d+)y
Now, the second group will contain only the number. The first group will be the whole match.
Another possibility is to use positive lookahead and positive lookbehind:
(?<=x)\d+(?=y)
Please note the + I added. This is necessary to match numbers with multiple digits.
Check it here for x and y and here for y and z.
You need to use lookarounds or groups
(?<=x)\d+(?=y)
----- ----
| |->only checks if y is after a digit(lookahead)
|->only checks if x is before a digit(lookbehind)
I'm parsing out flight info.
Here's the sample data:
E0.777 7 3:09
E0.319 N 1:43
E0.735 8 1:45
E0.735 N 1:48
E0.M80 9 3:21
E0.733 1:48
I need to populate fields like this:
Equipment: 735
On Time: N
Duration: 1:48
Problem I'm having is capturing the Y or N character but ignoring the single digit, then capturing the duration.
This is the expression I have tried:
#"^.{3}(.{3})\s?([N|Y]?)?(?:[0-9]\s+)?(\w{4})"
Edit: I updated the sample data to clarify my question. Equipment is not always three digits, it could be a character and two digits. The data between the equipment and the duration could be a boolean N or Y, a single digit, or white space. Only the boolean should be captured.
Firstly, you mix up the concepts of alternation and character classes [Y|N] would match 3 different characters: Y or | or N. Either use (...) or leave out the pipe.
Secondly your double ? after the character class does not really do anything. Thirdly, at the end you only match consecutive spaces if a digit was found. But if there is no digit, the last ? will ignore the subpattern, thus not allowing spaces either.
Lastly, \w does not match :.
Try this:
#"^.{3}(\d{3})\s?(?:([NY])|\d)\s+(\d:\d\d)"
You should also think about restricting the repeated . at the beginning to a more precise character class (i.e \w{2}\., but I don't know the possibilities there).
#"^..\.(\d{3})\s(?:([YN])|\d)\s*(\S{4})"
Changed .{3} to ..\. which is a bit more specific about there being a literal . for character 3.
(?:([YN])|\d) matches either Y/N or a digit, but only captures a Y or N. Notice that it's [YN] not [Y|N].
Changed \w{4} to \S{4} since \w doesn't match colons :.
This will do it...
^\w\d\.(\d{3})\s(?:([YN])|\d)\s*(\d:\d{2})$
I made some other changes to your regex because it was easier for me to just rewrite it based off your data then to try to modify what you had.
This will capture the Y or N or it won't capture anything in that group. I also tried to be more specific with your duration regex.
Update: This works with your new requirements...
^\w\d\.(\w{3})\s(?:([YN])|\d|\s)\s*(\d:\d{2})$
You can see it working on your data here... http://regexr.com?32j1b
(hover over each line to see the matched groups)
This captures all lines with Y or N and ignores everything else:
^...(\d{3})\s*([YN])\s*(\d+:\d+)