Regex Match Hexadecimal in Groups of 2-8 - regex

I am working on a regular expression to match against a hexadecimal string and am having some trouble near the end. I am specifically looking for groups of 2 bytes that do not contain 00 and that are between 2 and 8 bytes long. I have it all working, except that when there are fewer than 8 bytes, it sometimes allows an extra 00 in the match.
https://regex101.com/r/jq3QpP/1/
(?!(00)+)([0-9a-fA-F]{2,8})?(?!(00)+) // This on the below text gives the following matches
C86B0200554E0200C86B02000000000000000000270000008109000000000000EC6A050079750
18881000000410000280100000000000000000001000002010400000000000000000000000000
0000000000000000000000F65FA45900000000FF0000002F0000000000000049000000403C9F5
A000000000000000000000000FFFF330000000000000F06EAE8333536
Match 1
Full match 0-8 `C86B0200`
Group 2. 0-8 `C86B0200`
Match 2
Full match 8-16 `554E0200`
Group 2. 8-16 `554E0200`
Match 3
Full match 16-21 `C86B0`
Group 2. 16-21 `C86B0`
Match 4
Full match 21-21 ``
Match 5
Full match 39-47 `02700000`
Group 2. 39-47 `02700000`
In matches 1, 2, and 5 there are extra 00s; in match 3, it missed the 20 for some reason. If you have an idea what I missed, please let me know.

You can avoid matching 00 by allowing at most one 0 in each pair of digits instead:
(?:[A-F1-9][A-F0-9]|[A-F0-9][A-F1-9]){1,4}(?=(?:..)*$)
Demo: https://regex101.com/r/2hebvf/2
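A minimal Python sketch of the pattern above, run on a shortened sample (the first 16 hex digits of the question's data). It assumes uppercase input, since the character classes only list A-F; the names PAIR_RUN and sample are illustrative, not from the original post.

```python
import re

# Each two-digit pair must contain at least one non-zero digit, so "00" never matches.
# The lookahead requires an even number of characters up to the end of the string,
# which keeps every match aligned to byte boundaries.
PAIR_RUN = re.compile(r"(?:[A-F1-9][A-F0-9]|[A-F0-9][A-F1-9]){1,4}(?=(?:..)*$)")

sample = "C86B0200554E0200"  # first 16 hex digits of the question's text
matches = PAIR_RUN.findall(sample)
print(matches)
```

On this sample the trailing 00 pairs are left out of both matches.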

Related

Write regex patterns for matching single digit or double digit where tens place value is 2 or 4

Below is my regex for matching a 2-digit number where the tens place value is 2 or 4, and it is working fine.
^(?=[2,4])\d{1,2}$
As soon as I add the single-digit case to the above regex, it starts matching single digits as well as every 2-digit number.
^(?=\d|[2,4])\d{1,2}$
I want the sample input below to be matched.
0
1
2
3
24
44
48
29
28
The input below should not be matched.
99
11
33
55
77
It would also be a great help to know why my regex is not working.
You get a difference in matches because a positive lookahead asserts that what you specify must be present to the right of the current position. In the first pattern that is either 2, 4, or a comma; in the second pattern it is just any single digit.
You don't have a comma in your example data, so in that case you can match an optional 2 or 4 using just [24]? followed by a digit without any lookarounds.
^[24]?\d$
See a regex demo.
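As a quick check, here is a small Python sketch that runs ^[24]?\d$ against the sample inputs from the question. The variable names are illustrative only.

```python
import re

# Optional tens digit (2 or 4) followed by exactly one units digit.
TENS_2_OR_4 = re.compile(r"^[24]?\d$")

should_match = ["0", "1", "2", "3", "24", "44", "48", "29", "28"]
should_not_match = ["99", "11", "33", "55", "77"]

# Keep only the inputs the pattern accepts.
accepted = [s for s in should_match + should_not_match if TENS_2_OR_4.match(s)]
print(accepted)
```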
Try this: ^(\d|[2,4]\d)$
Test regex here: https://regex101.com/r/aZo7fK/1
^(\d|[2,4]\d)$
^ matches the start of string
(\d|[2,4]\d) matches either a single digit(0-9) or a two digit number which starts with either 2 or 4
$ matches the end of the string
This matches either a single digit(0-9) number or a two digit number which starts with either 2 or 4.
I suggest
^[2,4]?[0-9]$
pattern; where
^ - anchor, start of the text
[2,4]? - optional 2 or 4 digit for tens
[0-9] - mandatory digit 0..9 for units
$ - anchor, end of the text
Edit: Now, let's have a look at your current patterns; the first is
^(?=[2,4])\d{1,2}$
Here
(?=[2,4]) - look ahead for 2 or 4
\d{1,2} - one or two digits
as we can see 3 doesn't match: look ahead fails to find 2 or 4. As for your second attempt
^(?=\d|[2,4])\d{1,2}$
pattern, where
(?=\d|[2,4]) - look ahead for ANY digit (note, that |[2,4] is redundant)
\d{1,2} - one or two digits
the pattern matches too many texts; technically it matches any one or two digit numbers, e.g. for:
79
we have
(?=\d|[2,4]) - look ahead - succeeds with 7
\d{1,2} - one or two digits - succeeds with 79
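The analysis of the two original patterns can be reproduced with a short Python sketch. The test strings 3, 24, and 79 are chosen for illustration.

```python
import re

# First attempt: the lookahead only succeeds if the next character is 2, a comma, or 4.
first = re.compile(r"^(?=[2,4])\d{1,2}$")
# Second attempt: the lookahead succeeds on any digit, so it no longer restricts anything.
second = re.compile(r"^(?=\d|[2,4])\d{1,2}$")

# Map each input to (matched by first pattern, matched by second pattern).
results = {s: (bool(first.match(s)), bool(second.match(s))) for s in ["3", "24", "79"]}
print(results)
```

The first pattern rejects 3 because the lookahead fails, while the second accepts 79 because the lookahead is satisfied by the leading 7.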

How would I find values in a file, but only on lines that don't start with #?

I've got a document that looks something like this:
# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)
In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.
I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.
I've read over answers such as
Regular Expressions: Is there an AND operator?
Regex: Find a character anywhere in a document but only on lines that begin with a specific word
and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works (the best I can get is a lengthy version of ^[^#].*).
So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?
Your regex ^[^#].* uses a negated character class that matches any single character other than # at the start of the string (^), and after that matches any character zero or more times.
This would, for example, also match a line such as t test
What you might do is use an alternation that either matches a whole line starting with a # (^#.*$) or captures one or more digits in a group (\d+)
Your digits are then captured in group 1. You could change the (\d+) to, for example, a character class ([\w+.]+) to match more than just digits.
(?:^#.*$|(\d+))
Details
(?: Non capturing group
^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the line $ (this needs multiline mode)
| Or
(\d+) capture one or more digits in a group
) Close non capturing group
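A minimal Python sketch of this alternation, assuming multiline mode so that ^ and $ work per line. Comment lines are consumed by the first branch and leave group 1 empty, so the empty results are filtered out.

```python
import re

doc = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6"""

# Either swallow a whole comment line, or capture a run of digits in group 1.
pattern = re.compile(r"(?:^#.*$|(\d+))", re.MULTILINE)

# findall returns group 1 for every match; it is "" when the comment branch matched.
numbers = [g for g in pattern.findall(doc) if g]
print(numbers)
```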
I think a much simpler method would be to first replace the comment lines with "" using this regex:
^#.*
And then you can just match all the numbers with this:
-?\d+ (-? is for negative)
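The two-step approach could look like this in Python (a sketch, again assuming multiline mode for the first step):

```python
import re

doc = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6"""

# Step 1: blank out every line that starts with '#'.
without_comments = re.sub(r"^#.*", "", doc, flags=re.MULTILINE)

# Step 2: collect the remaining (optionally negative) integers.
values = [int(n) for n in re.findall(r"-?\d+", without_comments)]
print(values)
```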

Regex is possible to match?

I have files with these filename:
ZATR0008_2018.pdf
ZATR0018_2018.pdf
ZATR0218_2018.pdf
The 4 digits after ZATR are the issue number of the magazine.
With this regex:
([1-9][0-9]*)(?=_\d)
I can extract 8, 18 or 218, but I would like to keep a minimum of 2 digits and a maximum of 3 digits, so the result should be 08, 18 and 218.
How is it possible to do that?
You may use
0*(\d{2,3})_\d
and grab Group 1 value. See the regex demo.
Details
0* - zero or more 0 chars
(\d{2,3}) - Group 1: two or three digits
_\d - a _ followed with a digit.
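A short Python sketch of the Group 1 approach on the three filenames from the question. The name ISSUE is illustrative.

```python
import re

# Skip leading zeros, then capture the last two or three digits before "_<digit>".
ISSUE = re.compile(r"0*(\d{2,3})_\d")

filenames = ["ZATR0008_2018.pdf", "ZATR0018_2018.pdf", "ZATR0218_2018.pdf"]
issues = [ISSUE.search(name).group(1) for name in filenames]
print(issues)
```

The greedy 0* gives back one zero for 0008 so that the group can still take two digits.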
Here is a PCRE variation that grabs the value you need into a whole match:
0*\K\d{2,3}(?=_\d)
See another regex demo
Here, \K makes the regex engine omit the text matched so far (zeros) and then matches 2 to 3 digits that are followed with _ and a digit.
(?:[1-9][0-9]?)?[0-9]{2}(?=_[0-9])
or perhaps:
(?:[1-9][0-9]+|[0-9]{2})(?=_[0-9])
(https://www.freeformatter.com/regex-tester.html, which claims to use the XRegExp library and which you mention in another answer, doesn't seem to backtrack into the (?:)? in my first suggestion where necessary. That makes it very different from any regex engine I've encountered before: it prefers to match just the 18 of 218, even though that match starts later in the string. It does work with my second suggestion.)
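For comparison, here is a sketch of both lookahead-only suggestions in Python's re module, which is a backtracking engine. It only shows what Python does and says nothing about the online tester mentioned above.

```python
import re

# Suggestion 1: optional non-zero prefix of one or two digits, then exactly two digits.
first = re.compile(r"(?:[1-9][0-9]?)?[0-9]{2}(?=_[0-9])")
# Suggestion 2: either a number starting with a non-zero digit, or exactly two digits.
second = re.compile(r"(?:[1-9][0-9]+|[0-9]{2})(?=_[0-9])")

filenames = ["ZATR0008_2018.pdf", "ZATR0018_2018.pdf", "ZATR0218_2018.pdf"]
first_results = [first.search(name).group(0) for name in filenames]
second_results = [second.search(name).group(0) for name in filenames]
print(first_results, second_results)
```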
([1-9]\d{2,3})(?=_\d)
{x,y} will match from x to y times the previous pattern, in this case \d
Edit: from your own regex it looked as if you wanted the part of the number that starts with a non-zero digit. However, since your examples include leading 0s, maybe you really wanted:
(\d{2,3})(?=_\d)
Which will give you the last 3 digits before underscore unless there are only 2 digits.
I propose:
^ZATR0*(\d{2,3})_\d+\.pdf$
demo code here. Result:
Match 1 Full match 0-17 ZATR0008_2018.pdf Group 1. 6-8 08
Match 2 Full match 18-35 ZATR0018_2018.pdf Group 1. 24-26 18
Match 3 Full match 36-53 ZATR0218_2018.pdf Group 1. 41-44 218

How can I replace this expression in chain regex (notepad++)?

I have this text:
14 two 25 three 12 four 40 five 10
I want to obtain "14 two 14 25 three 14 25 12 four 14 25 12 40 five 14 25 12 40 10"
For example, when I replace (14 two ) with (14 two 14 ), the next search starts after the inserted 14; I can't make it start right after two.
Is there any alternative?
For example, using a group that is not included in the match (a group before the match) in the replacement?
Please help me.
This should do the trick for you:
Regex: ((?:\s?\d+\s?)+)((?:[a-zA-Z](?![^a-zA-Z]+\1))+)
Replacement: $1$2 $1
You will need to click the "Replace All" button repeatedly for this to work: it cannot be done in one shot and has to be repeated for as long as a match can be found. (Online PHP example)
Explanation:
\s: Match a single space character
?: the previous expression must be matched 0 or 1 time.
\s?: Match a space character 0 or 1 time.
\d: Match a digit character (the equivalent of [0-9]).
+: The previous expression must be matched at least one time (up to infinity).
\d+: Match as many digit characters as possible (but at least one).
(): Capture group
(?:): Non-capturing group
((?:\s?\d+\s?)+): Match an optional space character followed by one or more digit characters followed by an optional space character. The expression is surrounded by a non-capturing group followed by a plus. That means the regex will try to match as many combinations of spaces and digits as it can (so you can end up with something like '14 25 12 40').
The capture group is meant to keep the value so it can be reused in the replacement. You cannot simply add the plus at the end of the capture group without the non-capturing group inside it, because it would then only remember the last digits captured ('12' instead of the whole '14 25 12' used to build '14 25 12 40').
[a-zA-Z]: Match any English letters in any case (lower, upper).
\1: Reference to what has been captured in the first group.
(?!): Negative lookahead.
[^]: Negated character class, so [^a-zA-Z] means match anything that is not an English letter.
((?:[a-zA-Z](?![^a-zA-Z]+\1))+): The negative lookahead is meant to make sure that we don't always end up matching the first "14 two" in the input text. Without it, we would end up in an infinite loop giving results such as "14 two 14 14 14 14 14 14 25 three 12 four 40 five 10" (the "14" before "25" being repeated until you reach the timeout).
Basically, for every English letter we match, we look ahead to assert that the content of the first capture group (for example "14") is not present in the digit sequence that follows.
For the replacement, $1$2 $1 means put the content of the capture group 1 and 2, add a space and put the content of the capture group 1 once more.

PCI Compliance regex detect pattern with spaces

I have to generate a regular expression to detect text patterns that involve credit card numbers. I have a regular expression, but it fails when the text is altered with simple spaces between the digits, for example (not a valid credit card number):
4320 7589 9456 0123
The regex is:
4\d{3}(\s+|-)?\d{4}(\s+|-)?\d{4}(\s+|-)?\d{4}
This regex matches that easily, but if someone alters the text with a space between any of the digits, like this:
4 320 7589 9456 0123
it does not match. I need a regex that detects any possible variation with spaces, special symbols, or letters. Some examples:
43 20 75 89 94 56 01 23
4 3 2 0 7 5 8 9 9 4 5 6 0 1 2 3
4320a7589b9456c0123
4320$7589$9456$0123
4320_7589_9456_0123
I don't know if I can strip spaces and symbols from the text before analyzing it?
I am posting because you actually asked for help with a pattern that matches any number of non-digits between the first 4 and the 15 digits that follow.
The pattern is
^4(?:\D*\d){15}$
See demo
Regex breakdown:
^ - start of string
4 - literal 4
(?:\D*\d){15} - 15 occurrences of sequences of...
\D* - 0 or more non-digit symbols before..
\d - a digit
$ - end of string
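A minimal Python sketch that runs this pattern over the question's examples, plus two counterexamples added here for illustration (one with only 15 digits, one not starting with 4):

```python
import re

# A literal 4, then 15 more digits, each optionally preceded by any non-digit "junk".
CARD_LIKE = re.compile(r"^4(?:\D*\d){15}$")

obfuscated = [
    "4320 7589 9456 0123",
    "4 320 7589 9456 0123",
    "43 20 75 89 94 56 01 23",
    "4 3 2 0 7 5 8 9 9 4 5 6 0 1 2 3",
    "4320a7589b9456c0123",
    "4320$7589$9456$0123",
    "4320_7589_9456_0123",
]
# Hypothetical counterexamples, not from the original question.
rejected = ["4320 7589 9456 012", "5320 7589 9456 0123"]

hits = [bool(CARD_LIKE.match(s)) for s in obfuscated]
misses = [bool(CARD_LIKE.match(s)) for s in rejected]
print(hits, misses)
```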
If you need to capture, you can capture (like ^4((?:\D*\d){3})((?:\D*\d){4})((?:\D*\d){4})((?:\D*\d){4})$), but the submatches will still contain the "junk" in-between digits.
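To illustrate the point about "junk" in the submatches, here is a hedged Python sketch of the capturing variant, followed by one possible cleanup step (removing non-digits with \D). The cleanup is an addition for illustration, not part of the original answer.

```python
import re

GROUPED = re.compile(
    r"^4((?:\D*\d){3})((?:\D*\d){4})((?:\D*\d){4})((?:\D*\d){4})$"
)

m = GROUPED.match("4320a7589b9456c0123")
raw_groups = m.groups()  # submatches still include the separators

# Assumed cleanup: drop every non-digit character from each captured group.
clean_groups = [re.sub(r"\D", "", g) for g in raw_groups]
print(raw_groups, clean_groups)
```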