regex capturing number format of 84.1 OR 95 - regex

This is my regex
my data:
1 2017-12 A 155749 131033 84.1;
2 2017-12 B 24869 23627 95;
3 2017-12 C 117618 117185 99.6;
my regex:
(?<serial>\d)\s+(?<date>\d+-\d+)\s+(?<type>\w)\s+(?<attempts>\d+)\s+(?<successfullAttempts>\d+)\s+(?<sr>\d+\.\d)
I am having trouble with the (?<sr>\d+\.\d) part it does not capture the 95. It captures the 99.6 and the 84.1.
I was trying to use an OR |
(?<sr>\d+\.\d|\d+)
How do i write this part so I can capture 95?

The data resembles some log file, and that means the data type is known, so you do not have to worry about pre-validation and the pattern can look simpler:
(?<serial>\d+)\s+(?<date>\d+-\d+)\s+(?<type>\w+)\s+(?<attempts>\d+)\s+(?<successfullAttempts>\d+)\s+(?<sr>\d[.\d]*)
See the regex demo
The last (?<sr>\d[.\d]*) capturing group matches a digit, and then 0+ digits or . chars.

Your regex could catch any form of number if you just use (?<sr>.+); at the end. Of course it captures anything else as well, but on the given example data it works well.

Related

Is it possible to negate a group in a regular expression?

Let's say that we have this text:
2020-09-29
2020-09-30
2020-10-01
2020-10-02
2020-10-12
2020-10-16
2020-11-12
2020-11-23
2020-11-15
2020-12-01
2020-12-11
2020-12-30
I want to do something like this:
\d\d\d\d-(NOT10)-(30)
So i want to get all dates of any year, but not of the 10th month and it is important, that the day is 30.
I tried a lot to do this using negative lookahead asserations but i did not come up with any working regexes.
You can use negative lookaheads:
\d\d\d\d-(?!10)\d\d-30
The Part (?!10) ensures that no 10 follows at the point where it is inserted into the regex. Notice that you still need to match the following digits afterwards, thus the \d\d part.
Generally speaking you can not (to my knowledge) negate a part that then also matches parts of the string. But with negative lookaheads you can simulate this as I did above. The generalized idea looks something like:
(?!<special-exclusion-pattern>)<general-inclusion-pattern>
Where the special-exclusion-pattern matches a subset of the general-inclusion-pattern. In the above case the general inclusion pattern is \d\d and the special exclusion pattern ins 10.
Try :
/20\d{2}-(?:0[1-9]|1[12])-30/
Explanation :
20\d{2} it will match 20XX
(?:0[1-9]|1[12]) it will match 0X or 11, 12
30 it will match 30
Demo :https://regex101.com/r/O2F1eV/1
It's easiest to simply convert the substring (if present) that matches /^\d{4}-10-30$/ to an empty string, then split the resulting string on one or more newlines.
If your string were
2020-10-16
2020-10-30
2020-11-12
2020-11-23
and was held by the variable str, then in Ruby, for example,
str.sub(/^\d{4}-10-30$/,'')
#=> "2020-10-16\n\n2020-11-12\n2020-11-23\n"
so
str.sub(/^\d{4}-10-30$/,'').split
#=> ["2020-10-16", "2020-11-12", "2020-11-23"]
Whatever language you are using undoubtedly has similar methods.

Regular Expression Extracting Text from a group

I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break down the name into groups which are separated by a underscore. Which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so go.
Now I need to extract characters from one of the group for example in group 2 I need the first 3 and 8 decimal ( keep mind they could be characters too ).
So I had try something like this :
(.*?)_([38]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn’t work but if I do this:
(.*?)_([PH]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It will pull the PH into a group but not the 38 ? So I’m lost at this point.
Any help would be great
Try the below Regex to match any first 3 char/decimal and one decimal
(.?)_([A-Z0-9]{3}[0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
Try the below Regex to match any first 3 char/decimal and one decimal/char
(.?)_([A-Z0-9]{3}[A-Z0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
It will match any 3 letters/digits followed by 1 letter/digit.
If your first two letter is a constant like "PH" then try the below
(.?)_([PH]+[0-9A-Z]{2})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
I am assuming that you are trying to match group2 starting with numbers. If that is the case then you have change the source string such as
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1
Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't help to show how in your language, but most languages employ a contains or indexOf method that can be used to determine whether or not a substring exists in a string.
Using regex alone, however, this can be accomplished using the following regular expression.
See regex in use here
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)

Regular Expressions in R

I found somewhat similar questions
R - Select string text between two values, regex for n characters or at least m characters,
but I'm still having trouble
say I have a string in r
testing_String <- "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7"
And I need to be able to pull anything between the first element in the string that contains 2 characters (AK) and PADK,ADK. PADK and ADK will change in character but will always be 4 and 3 characters in length respectively.
So I would need to pull
ADAK NAS
I came up with this but its picking up everything from AK to ADK
^[A-Za-z0_9_]{2}(.*?) +[A-Za-z0_9_]{4}|[A-Za-z0_9_]{3,}
If I understood your question correctly, this should do the trick:
\b[A-Z]{2}\s+(.+?)\s+[A-Z]{4}\s+[A-Z]{3}\b
Demo
You'll have to switch the perl = TRUE option (to use a decent regex engine).
\b means word boundary. So this pattern looks for a match starting with a 2-letter word and ending with a 4 letter word followed by a 3 letter word. Your value will be in the first group.
Alternatively, you can write the following to avoid using the capturing group:
\b[A-Z]{2}\s+\K.+?(?=\s+[A-Z]{4}\s+[A-Z]{3}\b)
But I'd prefer the first method because it's easier to read.
Lookbehind is supported for perl=TRUE, so this regex will do what you want:
(?<=\w{2}\s).*?(?=\s+[^\s]{4}\s[^\s]{2})

Need to capture single character, but ignore digit

I'm parsing out flight info.
Here's the sample data:
E0.777 7 3:09
E0.319 N 1:43
E0.735 8 1:45
E0.735 N 1:48
E0.M80 9 3:21
E0.733 1:48
I need to populate fields like this:
Equipment: 735
On Time: N
Duration: 1:48
Problem I'm having is capturing the Y or N character but ignoring the single digit, then capturing the duration.
This is the expression I have tried:
#"^.{3}(.{3})\s?([N|Y]?)?(?:[0-9]\s+)?(\w{4})"
Edit: I updated the sample data to clarify my question. Equipment is not always three digits, it could be a character and two digits. The data between the equipment and the duration could be a boolean N or Y, a single digit, or white space. Only the boolean should be captured.
Firstly, you mix up the concepts of alternation and character classes [Y|N] would match 3 different characters: Y or | or N. Either use (...) or leave out the pipe.
Secondly your double ? after the character class does not really do anything. Thirdly, at the end you only match consecutive spaces if a digit was found. But if there is no digit, the last ? will ignore the subpattern, thus not allowing spaces either.
Lastly, \w does not match :.
Try this:
#"^.{3}(\d{3})\s?(?:([NY])|\d)\s+(\d:\d\d)"
You should also think about restricting the repeated . at the beginning to a more precise character class (i.e \w{2}\., but I don't know the possibilities there).
#"^..\.(\d{3})\s(?:([YN])|\d)\s*(\S{4})"
Changed .{3} to ..\. which is a bit more specific about there being a literal . for character 3.
(?:([YN])|\d) matches either Y/N or a digit, but only captures a Y or N. Notice that it's [YN] not [Y|N].
Changed \w{4} to \S{4} since \w doesn't match colons :.
This will do it...
^\w\d\.(\d{3})\s(?:([YN])|\d)\s*(\d:\d{2})$
I made some other changes to your regex because it was easier for me to just rewrite it based off your data then to try to modify what you had.
This will capture the Y or N or it won't capture anything in that group. I also tried to be more specific with your duration regex.
Update: This works with your new requirements...
^\w\d\.(\w{3})\s(?:([YN])|\d|\s)\s*(\d:\d{2})$
You can see it working on your data here... http://regexr.com?32j1b
(hover over each line to see the matched groups)
This captures all lines with Y or N and ignores everything else:
^...(\d{3})\s*([YN])\s*(\d+:\d+)

What is wrong with this Regular Expression?

I am beginner and have some problems with regexp.
Input text is : something idUser=123654; nick="Tom" something
I need extract value of idUser -> 123456
I try this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, #"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but on output i get idUser=123654, I need only number
The second problem is with nick="Tom", how can I get only text Tom from this expresion.
you don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you want to have only what is in the parentheses.
.*?idUser=([0-9]+).*?
That regex should work for you :o)
Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is User=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
API links
Match.Groups property
This is how you get what individual groups captured in a match
Use look-around
(?<=idUser=)\d{1,8}(?=(;|$))
To fix length of digits to 6, use (?<=idUser=)\d{6}(?=($|;))