I want to extract the number sandwiched between two specific letters.
e.g. string: x23y4z90
I specify x and y , I get 23
I specify y and z , I get 4
I specify z and x , I get 90 (the string pattern loops)
x\dy yields x23y, but I don't want the letters included.
*note: This is to read sensor values serially in LabVIEW.
One possibility is to use groups:
x(\d+)y
Now group 1 will contain only the number; group 0 (the whole match) still includes the letters.
Another possibility is to use positive lookahead and positive lookbehind:
(?<=x)\d+(?=y)
Please note the + I added. This is necessary to match numbers with multiple digits.
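A minimal sketch of both approaches in Python. The question targets LabVIEW, so the use of Python's `re` module and the helper name `between` are assumptions made purely for illustration:

```python
import re

def between(text, left, right):
    """Return the digits sandwiched between `left` and `right`, or None."""
    # Capture-group approach: the letters are matched, only the digits are captured.
    match = re.search(re.escape(left) + r"(\d+)" + re.escape(right), text)
    return match.group(1) if match else None

sample = "x23y4z90"
print(between(sample, "x", "y"))   # 23
print(between(sample, "y", "z"))   # 4

# Lookaround approach: the whole match is already just the digits.
print(re.search(r"(?<=x)\d+(?=y)", sample).group(0))   # 23

# z..x only matches once the next cycle has arrived, because the
# pattern needs the x that starts the following reading.
print(between(sample, "z", "x"))            # None
print(between(sample + sample, "z", "x"))   # 90
```

For the looping case (z and x), the match only succeeds once the buffer already holds the start of the next cycle.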
You need to use lookarounds or groups
(?<=x)\d+(?=y)
------   -----
   |       |-> only checks that y comes after the digits (lookahead)
   |-> only checks that x comes before the digits (lookbehind)
Related
I'm using a string to store a quadratic equation and then resolve it via quadratic formula.
So I need a regex for removing elements like x and/or x[^2], but I really don't know how to write one, short of calling replace for each symbol, letter, or number I want to remove.
Ex:
Input: 2x^2+4x+6
Output: 2 4 6
Here is a regex that matches a quadratic polynomial:
^(?:([+-]?\d+)x\^2)?(?:([+-]?\d+)x)?([+-]?\d+)?$
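Group 1 is the coefficient of x^2, Group 2 is the coefficient of x, and Group 3 is the constant. Because each coefficient is written as \d+, a group that does not participate in the match means that term is absent (coefficient 0); to accept an implicit coefficient of 1, as in x^2, change \d+ to \d* and treat an empty or sign-only capture as 1 or -1. You just get those groups and plug them into the quadratic formula. Remember to remove all whitespace from the string before using the regex.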
Note that this regex matches things like 6x3 and interprets it as 6x+3.
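A rough Python sketch of how the three groups could feed the quadratic formula. The function names `coefficients` and `roots` are hypothetical, and treating a group that did not participate as 0 is an assumption of this sketch:

```python
import cmath
import re

QUADRATIC = re.compile(r"^(?:([+-]?\d+)x\^2)?(?:([+-]?\d+)x)?([+-]?\d+)?$")

def coefficients(expression):
    """Return (a, b, c) as ints; terms missing from the input count as 0."""
    compact = "".join(expression.split())  # strip all whitespace first
    match = QUADRATIC.match(compact)
    if match is None:
        raise ValueError(f"not a quadratic polynomial: {expression!r}")
    return tuple(int(group) if group else 0 for group in match.groups())

def roots(a, b, c):
    """Quadratic formula; cmath.sqrt keeps negative discriminants usable."""
    if a == 0:
        raise ValueError("not quadratic: the x^2 coefficient is 0")
    root = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)

print(coefficients("2x^2+4x+6"))            # (2, 4, 6)
print(coefficients("6x3"))                  # (0, 6, 3)
print(roots(*coefficients("1x^2-5x+6")))    # ((3+0j), (2+0j))
```

The `6x3` line shows the caveat above in practice: the input is accepted and read as 6x+3.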
In a pattern X-Y-Z where the delimiter is "-", I want to check whether Y has size 8 without repetitions.
Y could be a subset like Y = (A-B-C) but Y just has a value 1 if there's no
1 - num-12345678-num -> In this case I want that Y has a value.
2 - num-12345678-234-213-num -> Since Y is a subset (12345678-234-213) Y should have a different value.
The regex I'm using is '-([0-9]*)-'; it works for the 1st case but returns the same value for the second. Could anyone help me?
Thanks in advance
You may add a hyphen to the character class:
-([0-9-]*)-
      ^
If you put it at the end of the char class, you do not need to escape it.
Details:
- - a hyphen
([0-9-]*) - Group 1 capturing zero or more (due to the * quantifier) digits or/and hyphens
- - a literal hyphen again.
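To see the difference between the two cases, here is a small Python sketch. The helper names `middle_field` and `is_single_8_digit_value` are made up for illustration, and the follow-up length check is only one possible way to tell the cases apart:

```python
import re

MIDDLE = re.compile(r"-([0-9-]*)-")

def middle_field(text):
    """Return the run of digits and hyphens captured between the outer hyphens."""
    match = MIDDLE.search(text)
    return match.group(1) if match else None

def is_single_8_digit_value(field):
    # Exactly eight digits and no inner hyphen means Y is not a subset.
    return re.fullmatch(r"[0-9]{8}", field) is not None

single = middle_field("num-12345678-num")
subset = middle_field("num-12345678-234-213-num")
print(single, is_single_8_digit_value(single))   # 12345678 True
print(subset, is_single_8_digit_value(subset))   # 12345678-234-213 False
```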
I'm having difficulty extracting irregular data using regex. I tried lookaheads, but when a value doesn't exist the entire match fails. The data set is consistent all the way until I reach the characters starting with Rxx. The Rxx values are unique identifiers (groups), and the numeric values between each pair of Rxx's are what I would like to capture and assign to group names.
The Rxx values are random from R01 to R15 and 1 to all 15 could exist in the string.
The string values could vary from
12*000000000**S304JB01811*8*0*8*4*4*34R0332R152~~~
12*000000000**S304JB01811*9*0*4*3*4*224R023R032R10234R1325~~~
I'm able to extract the values and assign a group name until I reach the Rxx
My attempt at extracting the values is as follows:
S304JB0...(?<Total1>[\d]+).(?<Total2>[\d]+).(?<Total3>[\d]+).(?<Total4>[\d]+).(?<Total5>[\d]+).(?<Total6>[\d]+).(?<Total7>[\d]+)
Which gives me what I want below
Total1 `1`
Total2 `8`
Total3 `0`
Total4 `8`
Total5 `4`
Total6 `4`
Total7 `34`
Capturing the R03 value and assigning it to Row is achieved below but if the value R03 doesn't exist in the string then the entire match returns false
(?<Row3>(R03)[\d]+)
Looking how I can make these regex statements optional allowing me to return the following
Total1 `1`
Total2 `8`
Total3 `0`
Total4 `8`
Total5 `4`
Total6 `4`
Total7 `34`
Row1 `32`
Row15 `2`
S304JB0...(?<Total1>[\d]+).(?<Total2>[\d]+).(?<Total3>[\d]+).(?<Total4>[\d]+).(?<Total5>[\d]+).(?<Total6>[\d]+).(?<Total7>[\d]+)(?<Row3>(R03)[\d]+)(?<Row4>(R04)[\d]+) ------> (?<Row15>(R15)[\d]+)
Thanks for your help
-Edited
Thanks for the quick reply Jorge
The input data will be
12*000000000**S304JB01811*8*0*8*4*4*34R0332R152~~~
The output will be 9 captured groups results
Group | Result
Total1 = 1
Total2 = 8
Total3 = 0
Total4 = 8
Total5 = 4
Total6 = 4
Total7 = 34
Row1 = 32
Row15 = 2
My example is shared below with input and
https://regex101.com/r/wG3aM3/68
Hopefully this helped to clarify things
D.
I'm certain this would be easier parsing char by char and storing each value.
As for the regex question, basically what you want to do is create all the groups, just like you've already tried, but you also want to make them optional, because not all groups might be there.
You can make the group optional with a construct like:
(?:R01(?<Row1>\d+))?
So you should add one of these per Rxx to get the values in different capture groups. Notice I used the construct (?:non-capturing), which behaves exactly like a group but doesn't create a backreference.
Edit: One more thing. You're using a . to allow any delimiter. However, performance-wise it would be better to use something like \D (anything except digits). In case of failure, it saves the regex engine quite a few backtracking steps.
This would be the whole expression, assuming the Rxx groups are always ordered.
S304JB0...(?<Total1>\d+)\D(?<Total2>\d+)\D(?<Total3>\d+)\D(?<Total4>\d+)\D(?<Total5>\d+)\D(?<Total6>\d+)\D(?<Total7>\d+)(?:R01(?<Row1>\d+))?(?:R02(?<Row2>\d+))?(?:R03(?<Row3>\d+))?(?:R04(?<Row4>\d+))?(?:R05(?<Row5>\d+))?(?:R06(?<Row6>\d+))?(?:R07(?<Row7>\d+))?(?:R08(?<Row8>\d+))?(?:R09(?<Row9>\d+))?(?:R10(?<Row10>\d+))?(?:R11(?<Row11>\d+))?(?:R12(?<Row12>\d+))?(?:R13(?<Row13>\d+))?(?:R14(?<Row14>\d+))?(?:R15(?<Row15>\d+))?
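The same expression can be sketched in Python, building the repetitive parts in a loop instead of typing them out. Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`; the helper name `parse` and the choice to drop groups that did not participate are assumptions of this sketch. Like the expression above, it assumes the Rxx groups appear in ascending order:

```python
import re

# Seven totals separated by a single non-digit delimiter.
totals = r"\D".join(rf"(?P<Total{i}>\d+)" for i in range(1, 8))
# Fifteen optional Rxx blocks, R01 through R15, each with its own named group.
rows = "".join(rf"(?:R{i:02d}(?P<Row{i}>\d+))?" for i in range(1, 16))
PATTERN = re.compile(r"S304JB0..." + totals + rows)

def parse(line):
    """Return a dict of the named groups that actually matched, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items()
            if value is not None}

parsed = parse("12*000000000**S304JB01811*8*0*8*4*4*34R0332R152~~~")
print(parsed["Total1"], parsed["Total7"])   # 1 34
print(parsed["Row3"], parsed["Row15"])      # 32 2
print("Row4" in parsed)                     # False
```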
I'm parsing out flight info.
Here's the sample data:
E0.777 7 3:09
E0.319 N 1:43
E0.735 8 1:45
E0.735 N 1:48
E0.M80 9 3:21
E0.733 1:48
I need to populate fields like this:
Equipment: 735
On Time: N
Duration: 1:48
Problem I'm having is capturing the Y or N character but ignoring the single digit, then capturing the duration.
This is the expression I have tried:
#"^.{3}(.{3})\s?([N|Y]?)?(?:[0-9]\s+)?(\w{4})"
Edit: I updated the sample data to clarify my question. Equipment is not always three digits, it could be a character and two digits. The data between the equipment and the duration could be a boolean N or Y, a single digit, or white space. Only the boolean should be captured.
Firstly, you mix up the concepts of alternation and character classes. [N|Y] would match 3 different characters: N or | or Y. Either use a group, (N|Y), or leave out the pipe.
Secondly your double ? after the character class does not really do anything. Thirdly, at the end you only match consecutive spaces if a digit was found. But if there is no digit, the last ? will ignore the subpattern, thus not allowing spaces either.
Lastly, \w does not match :.
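Two of these points are easy to check directly. The snippet below uses Python's `re` module purely as an illustration (the question itself appears to use a different host language):

```python
import re

# Inside [...] the pipe is a literal character, not alternation.
print(re.findall(r"[N|Y]", "N|Y"))    # ['N', '|', 'Y']
print(re.findall(r"[NY]", "N|Y"))     # ['N', 'Y']

# \w covers letters, digits and underscore, so it cannot consume the colon.
print(re.fullmatch(r"\w{4}", "1:48"))                  # None
print(re.fullmatch(r"\d:\d\d", "1:48") is not None)    # True
```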
Try this:
#"^.{3}(\d{3})\s?(?:([NY])|\d)\s+(\d:\d\d)"
You should also think about restricting the repeated . at the beginning to a more precise pattern (e.g. \w{2}\., but I don't know the possibilities there).
#"^..\.(\d{3})\s(?:([YN])|\d)\s*(\S{4})"
Changed .{3} to ..\. which is a bit more specific about there being a literal . for character 3.
(?:([YN])|\d) matches either Y/N or a digit, but only captures a Y or N. Notice that it's [YN] not [Y|N].
Changed \w{4} to \S{4} since \w doesn't match colons :.
This will do it...
^\w\d\.(\d{3})\s(?:([YN])|\d)\s*(\d:\d{2})$
I made some other changes to your regex because it was easier for me to just rewrite it based on your data than to try to modify what you had.
This will capture the Y or N or it won't capture anything in that group. I also tried to be more specific with your duration regex.
Update: This works with your new requirements...
^\w\d\.(\w{3})\s(?:([YN])|\d|\s)\s*(\d:\d{2})$
You can see it working on your data here... http://regexr.com?32j1b
(hover over each line to see the matched groups)
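A sketch of how the captured groups could populate the three fields, written in Python for illustration. It uses a variant of the updated pattern in which the middle field is optional (`(?:([YN])|\d)?` after `\s+`); that change is an assumption of this sketch, made so that a line with only a single space between equipment and duration also matches. The name `parse_flight` is hypothetical:

```python
import re

FLIGHT = re.compile(r"^\w\d\.(\w{3})\s+(?:([YN])|\d)?\s*(\d:\d{2})$")

def parse_flight(line):
    """Return the three fields as a dict, or None if the line does not match."""
    match = FLIGHT.match(line.strip())
    if match is None:
        return None
    equipment, on_time, duration = match.groups()
    # on_time is None when the middle field is a digit or missing.
    return {"Equipment": equipment, "On Time": on_time, "Duration": duration}

for line in ["E0.777 7 3:09", "E0.735 N 1:48", "E0.M80 9 3:21", "E0.733 1:48"]:
    print(parse_flight(line))
```

Only the second line yields an On Time value (`N`); the others leave it as `None`.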
This captures all lines with Y or N and ignores everything else:
^...(\d{3})\s*([YN])\s*(\d+:\d+)
Better explained with examples:
1. HHH
2. HHHH
3. HHHBBHHH
4. HHHBH
5. BB
6. HHBH
I need to come up with a regexp that matches only 3 H's or a multiple of 3 H's (so 6, 9, 12, ... H's are ok as well) and 5 H's are not ok. And if possible I don't want to use Perl regexps.
So for the input above the regexp would match (1), (3) and (6) only.
I'm just starting with regular expressions here so I don't exactly know how I'm supposed to approach this.
edit
Just to clear something up: an H can only be in one group of 3 H's. The group of 3 H's might be HHH or HHBH.
That's why in example 2 above it is not a match because the last H is not in a group of 3 H's. And you can't take the last 3 H's in a group because the middle 2 H's have already been inside a group before.
You can use the following regular expression:
^([^H]*H[^H]*H[^H]*H[^H]*)+$
It matches any string which contains in total 3 H or any multiple of 3. In between there might be any other character.
Explanation:
^         begin of string
(         start of group
[^H]*H    any string of characters (or none) not including 'H' plus a single 'H'
[^H]*H    any string of characters (or none) not including 'H' plus a single 'H'
[^H]*H    any string of characters (or none) not including 'H' plus a single 'H'
[^H]*     any string of characters (or none) which is not 'H'
)+        containing the group once or twice or ...
$         end of string
By repeating the subpattern [^H]*H three times we make sure that there are indeed 3 H included, [^H]* allows any separating characters.
Note: use either egrep or run grep with additional argument -E.
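As a quick check against the six examples from the question, here is the same pattern run through Python's `re` module (an assumption for illustration; the answer itself targets grep):

```python
import re

MULTIPLE_OF_3_H = re.compile(r"^([^H]*H[^H]*H[^H]*H[^H]*)+$")

examples = ["HHH", "HHHH", "HHHBBHHH", "HHHBH", "BB", "HHBH"]
# Collect the 1-based positions of the examples that match.
matching = [number for number, text in enumerate(examples, start=1)
            if MULTIPLE_OF_3_H.match(text)]
print(matching)   # [1, 3, 6]
```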
Use this to match a multiple of 3 H's:
(H{3})+
Here is a complete regex for your examples:
^(H{3})+B*(H{3})*$
Edit: It looks like you need to count non-consecutive H's. In that case:
^(([^H]*H){3})+[^H]*$
That should match any string with a multiple of 3 H's.
Given the requirement that H's can be arbitrarily interleaved with non-H's, but that the total number of H's must be a non-zero multiple of 3 (so XXX, containing no H's, is not a match), then the total regular expression is anything but trivial. This is not a beginner's regular expression.
I'm going to assume that the dialect of regular expression treats {} and () as metacharacters for counting and grouping, and includes + for one-or-more. If you're using a regular expression system that has a different requirement (\{\}, for example) then adjust accordingly.
You need the regex to match the whole string, so there are no stray H's allowed. So, it must start with ^ and end with $. You need to allow an arbitrary number of non-H's at front and back. The H's may be separated by an arbitrary number of non-H's. That leads to:
^([^H]*H[^H]*H[^H]*H)+[^H]*$
Ouch; that is hard to read! It says the line must consist of 1 or more (+) groups of an arbitrary number of non-H's followed by an H, an arbitrary number of non-H's, another H, an arbitrary number of non-H's and a third H; all of which can be followed by an arbitrary number of non-H's.
Using the {} for counting:
^(([^H]*H){3})+[^H]*$
That's still hard to read. Note that my description said "arbitrary number of non-H's at front and back", but I only use the [^H]* at the back; that's because the repeating pattern allows an arbitrary number of non-H's at the front anyway so there's no need to repeat that fragment.
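One way to gain confidence that the expanded and the counted forms describe the same set of strings is to compare both against a plain count of H's over every short string. The sketch below does that in Python; the two-letter alphabet "HB", the length limit of 7, and the helper names are arbitrary choices made for this illustration:

```python
import re
from itertools import product

EXPANDED = re.compile(r"^([^H]*H[^H]*H[^H]*H)+[^H]*$")
COUNTED = re.compile(r"^(([^H]*H){3})+[^H]*$")

def by_counting(text):
    """Reference check: a non-zero multiple of 3 H's."""
    count = text.count("H")
    return count > 0 and count % 3 == 0

def all_strings(alphabet, max_length):
    """Yield every string over `alphabet` with length 0..max_length."""
    for length in range(max_length + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

disagreements = [
    text for text in all_strings("HB", 7)
    if not (bool(EXPANDED.match(text)) == bool(COUNTED.match(text)) == by_counting(text))
]
print(disagreements)   # []
```

An exhaustive check over short strings is not a proof, but it catches the usual off-by-one mistakes in patterns like these.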