Extracting multi values with regex ( Only values, Not Fieldname ) - regex

Can someone help me with this regex?
I would like to extract either 1. or 2.
1.
(2624594000) 303 days, 18:32:20.00 <-- Timeticks
.1.3.6.1.4.1.14179.2.6.3.39. <-- OID
Hex-STRING: 54 4A 00 C8 73 70 <-- Hex-STRING (need "Hex-STRING" ifself too)
0 <--INTEGER
"NJTHAP027" <- STRING
OR
2.
Timeticks: (2624594000) 303 days, 18:32:20.00
OID: .1.3.6.1.4.1.14179.2.6.3.39
Hex-STRING: 54 4A 00 C8 73 70
INTEGER: 0
STRING: "NJTHAP027"
This filedname and value will return different data each time. (The data will be variable.)
I don't need to get the field names and only want to get the values in order from the top (multi value)
(?s)[^=]+\s=\s(?<value_v2c>([^=]+)-)
https://regex101.com/r/lsKeEM/2
-> I can't extract the last STRING: "NJTHAP027" at all!

The named group value_v2c is already a group, so you can omit the inner capture group.
Currently the - char should always be matched in the pattern, but you can either match it or assert the end of the string.
As you are using negated character classes and [^=]+ and \s, you can omit the inline modifier (?s) as both already match newlines.
To match the 2. variation, you can update the pattern to:
[^=]+\s=\s(?<value_v2c>[^=]+)(?:-|$)
Regex demo
To get the 1. version, you can match all before the colon as long as it is not Hex-String.
Then in the group optionally match it.
[^=]+\s=\s(?:(?!Hex-STRING:)[^:])*:?\s*(?<value_v2c>(?:Hex-STRING: )?[^=]+?)(?: -|$)
Regex demo

Related

Regex pattern to match "AA BB CC DD"

I have a hexadecimal string with space separator for each byte.
eg., A1 B2 C3 D4 E5 FF 00 11 22 33 44 ...
I would like to use a regex validator to verify the user input is correct or not?
How could I write the regular expression to achieve this goal?
Something like this:
^[A-F0-9]{2}( [A-F0-9]{2})*$
Explanation:
^ - anchor: string start
[A-F0-9]{2} - two symbols in either 0..9 or A..F range
( [A-F0-9]{2})* - followed by space and two 0..9 or A..F symbols zero or more times
$ - anchor: string end
If you allow a..f as valid hexadecimal symbols
^[A-Fa-f0-9]{2}( [A-Fa-f0-9]{2})*$
I would like to propose a solution based on DRY principle
(Don't Repeat Yourself).
Instead of writing the same pattern (as Dmitry proposed), you can:
Write the pattern for 2 hex digits as a capturing group - ([A-F0-9]{2}).
"Call" it again using (?1).
So the whole pattern can be ^([A-F0-9]{2})( (?1))*$.
There are also other variants of "calling" a capturing group, e.g.
(?-1) - call the preceding group or
(?&name) - call a named group.
For details see https://www.regular-expressions.info/subroutine.html

Regular expression to validate 2 character hex string

I have a source of data that was converted from an oracle database and loaded into a hadoop storage point. One of the columns was a BLOB and therefore had lots of control characters and unreadable/undetectable ascii characters outside of the available codeset. I am using Impala to write regex replace function to parse some of the unicode characters that the regex library cannot understand. I would like to remove the offending 2 character hex codes BEFORE I use the unhex query function so that I can do the rest of the regex parsing with a "clean" string.
Here's the code I've used so far, which doesn't quite work:
'[2-7]{1}([A-Fa-f]|[0-9]{1})'
I've determined that I only need to capture \u0020-\u007f - or represented in the two bit hex - 20-7f
If my string looks like this:
010A000000153020405C00000000143020405CBC000000F53320405C4C010000E12F204058540100002D01
I would like to be able to capture 2 characters at a time (e.g. 01,0A,00) evaluate whether or not that fits the acceptable range of 2 byte hex I mentioned above and return only what is acceptable.
The correct output should be:
30 20 40 5C 30 20 40 5C 33 20 40 5C 4C 2F 20 40 58 and 54
However, my expression finds the first acceptable number in my first range (5) and starts the capture from there which returns the position or indexing wrong for the rest of the string... and this is the return from my expression -
010A0000001**53**0**20****40****5C**000000001**43**0**20****40****5C**BC000000F**53****32**0**40****5C****4C**010000E1**2F****20****40****58****54**010000**2D**01
I just don't know how to evaluate only two characters at a time in a mixed-length string. And, if they don't fit the expression, iterate to the next two characters. But only in two character increments.
My example: https://regex101.com/r/BZL7t0/1
I have added a Positieve Lookbehind to it. Which starts at the beginning of the string and then matches 2 characters at the time. This ensures that the group you're matching always has groups of 2 characters before it.
Positieve Lookbehind:
(?<=^(..)*)
Updated regex:
(?<=^(..)*)([2-7]{1}[A-Fa-f0-9]{1})
Preview:
Regex101

Regex Validate French mobile number

I'm trying to validate french mobile numbers:
I have already removed all non numeric character and the eventual 00 at beginning, and rules are:
start with 06 or 07 or 09
is 10 digit long:
thus :
/^0(6|7|9)\d{8}$/
but (seems) that if countrycode (33) is present, the leading zero has to be avoided, but at this point I cannot create the right regex, since with number:
33614444444
/^(33|0)?(6|7|9)\d{8}$/
it works, but works also with
614444444
while it should not
can suggest solution?
you can do it using the regex
^(33|0)(6|7|9)\d{8}$
see the regex101 demo
Why don't you simply use /^(33|0)(6|7|9)\d{8}$/ ?
I do not think you need the quantifier ?.
When you add ? after (33|0). It implies either none of them is present or one of 33 or 0 is present. It would match all the following -
614444444 // none present
0614444444 // 0 present
33614444444 // 33 present

Find all substrings with at least one group

I try to find in a string all substring that meet the condition.
Let's say we've got string:
s = 'some text 1a 2a 3 xx sometext 1b yyy some text 2b.'
I need to apply search pattern {(one (group of words), two (another group of words), three (another group of words)), word}. First three positions are optional, but there should be at least one of them. If so, I need a word after them.
Output should be:
2a 1a 3 xx
1b yyy
2b
I wrote this expression:
find_it = re.compile(r"((?P<one>\b1a\s|\b1b\s)|" +
r"(?P<two>\b2a\s|\b2b\s)|" +
r"(?P<three>\b3\s|\b3b\s))+" +
r"(?P<word>\w+)?")
Every group contain set or different words (not 1a, 1b). And I can't mix them into one group. It should be None if group is empty. Obviously the result is wrong.
find_it.findall(s)
> 2a 1a 2a 3 xx
> 1b 1b yyy
I am grateful for your help!
You can use following regex :
>>> reg=re.compile('((?:(?:[12][ab]|3b?)\s?)+(?:\w+|\.))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b.']
Here I just concise your regex by using character class and modifier ?.The following regex is contain 2 part :
[12][ab]|3b?
[12][ab] will match 1a,1b,2a,2b and 3b? will match 3b and 3.
And if you don't want the dot at the end of 2b you can use following regex using a positive look ahead that is more general than preceding regex (because making \s optional is not a good idea in first group):
>>> reg=re.compile('((?:(?:[12][ab]|3b?)\s)+\w+|(?:(?:[12][ab]|3b?))+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '2b']
Also if your numbers and example substrings are just instances you can use [0-9][a-z] as a general regex :
>>> reg=re.compile('((?:[0-9][a-z]?\s)+\w+|(?:[0-9][a-z]?)+(?=\.|$))')
>>> reg.findall(s)
['1a 2a 3 xx', '1b yyy', '5h 9 7y examole', '2b']

Haskell Regex non capture group

I'm using Text.Regex.TDFA on Lazy ByteString for extract some infomation from a file.
I have to extract each byte from this string:
27 FB D9 59 50 56 6C 8A
Here is what i've tried (my string begins with space):
(\\ ([0-9A-Fa-f]{2}))+
but i have 2 problems:
Only last match is returned [[" 27 FB D9 59 50 56 6C 8A"," 8A","8A"]]
I want to make the outer group non caputing one (like ?: in other engines)
Here is my minimal code:
import System.IO ()
import Data.ByteString.Lazy.Char8 as L
import Text.Regex.TDFA
main::IO()
main = do
let input = L.pack " 27 FB D9 59 50 56 6C 8A"
let entries = input =~ "(\\ ([0-9A-Fa-f]{2}))+" :: [[L.ByteString]]
print entries
When you attach a multiplier to a capture group, the engine returns only the last match. See rexegg.com/regex-capture.html#groupnumbers for a good explanation.
On the first pass, use this regex, similar to what you were already using (using a case-insensitive option):
^([\dA-F]+) +([\dA-F]+) +(\d+) +([\dA-F]+)(( [\dA-F]{2})+)
You'll get the following matching groups:
Use the 5th one as the target of a second pass, to extract each individual byte (using a "global" option):
([0-9A-Fa-f]{2})
Then each match will be returned separately.
Note: you don't need to escape the spaces, as you had in your original regex.