Regular expression sequence matching - regex

Is it possible to create a regular expression to find an incrementing sequence of hex numbers? I am trying to find number sequences (4 numbers long) inside seemingly random hex number strings.
... 59 fd 25 bf b1 b2 b3 b4 39 ca ...
... 35 c1 55 c4 c5 c6 c7 74 92 e1 ...
I was hoping to find the pattern b1 b2 b3 b4 in line 1 and c4 c5 c6 c7 in line 2.
Group matching will find same number sequences... /(\w\w)\1{3}/ will find c4 c4 c4 c4 but I haven't found a way to match the incrementing sequence.
Any ideas?

Regex is good at matching patterns that repeat, not patterns that increment: a backreference can only match the same text again, it cannot do arithmetic on what it captured.
You are better off parsing the values with your own parser.
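A minimal sketch of such a parser in Python, assuming the input is whitespace-separated two-digit hex bytes and that "incrementing" means each byte is exactly one more than the previous one (no wraparound from ff to 00). The function name find_incrementing_runs and its run_length parameter are made up for this illustration.

```python
def find_incrementing_runs(hex_string, run_length=4):
    """Return every window of run_length bytes in which each byte is the previous one plus 1."""
    values = [int(token, 16) for token in hex_string.split()]
    runs = []
    for start in range(len(values) - run_length + 1):
        window = values[start:start + run_length]
        # Compare each byte with its successor inside the window
        if all(b - a == 1 for a, b in zip(window, window[1:])):
            runs.append(" ".join(f"{v:02x}" for v in window))
    return runs


line1 = "59 fd 25 bf b1 b2 b3 b4 39 ca"
line2 = "35 c1 55 c4 c5 c6 c7 74 92 e1"
print(find_incrementing_runs(line1))  # ['b1 b2 b3 b4']
print(find_incrementing_runs(line2))  # ['c4 c5 c6 c7']
```

A sequence longer than run_length is reported as several overlapping windows; merge them afterwards if only the longest run is of interest.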

Related

Regex to match part of a hex

I need to use a regex to match part of a hexadecimal string, but that part is random. Let me try to explain more:
I have this hex data:
70 75 62 71 00 7e 00 01 4c 00 06 72 61 6e 64 6f 6d 74 00 1c 4c 6a 2f 73 2f 6e 64 6f 6d 3b 78 70 77 25 00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0 78
I need to match only the f2 in that case. But that is not always the case. Each data will be different. The only thing that is always the same is the '00 00 00' part and the '78' at the end. All the rest is random.
I managed to make the following regex:
/(?=00 00 00).+?(?=78)/
The output is:
00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0
But I don't know how to build a regex that takes only the 'f2' (reminder: it is not always going to be f2).
Any thoughts?
Given the explanation in this comment, the regex that you need is:
(?<=00 00 00 [0-9a-f]{2} )[0-9a-f]{2}
Providing the first input string from the question, this regex matches f2 (no spaces around it).
How it works:
(?<=            # start of a positive lookbehind
  00 00 00      # match the exact string ("00 00 00 ")
  [0-9a-f]      # match one hex digit (lowercase only)
  {2}           # match the previous twice (i.e. two hex digits)
                # there is a space before ")"
)               # end of the lookbehind
[0-9a-f]{2}     # match two hex digits
The positive lookbehind works like a non-capturing group but it is not part of the match. Basically it says that the matching part ([0-9a-f]{2}) matches only if it is preceded by a match of the lookbehind expression.
The matching part of the expression is [0-9a-f]{2} (i.e. two hex digits).
Add the i flag, or whatever flag your regex engine uses for case-insensitive matching, so that the a-f part of the regex also matches A-F. If you cannot (or do not want to) set this flag, you can put [0-9A-Fa-f] everywhere and it works.
If your regex engine does not support lookbehind you can get the same result using capturing groups:
00 00 00 [0-9a-f]{2} ([0-9a-f]{2})
Applied on the same input, this regex matches 00 00 00 20 f2 and its first (and only) capturing group matches f2.
Update
If it is important that the input string contains 78 somewhere after the matching part, then add (?=(?: [0-9a-f]{2})* 78) to the first regex:
(?<=00 00 00 [0-9a-f]{2} )[0-9a-f]{2}(?=(?: [0-9a-f]{2})* 78)
(?= introduces a positive lookahead. It behaves like a lookbehind, but it must come after the matching part of the regex and it is verified against the part of the string located after the match.
(?: starts a non-capturing group.
The [0-9a-f]{2} followed or preceded by a space in the lookahead and lookbehind ensures that the entire matching string is composed only of two-digit hex numbers separated by spaces. You can use .* instead, but that will match anything, even text that does not follow the format of two-digit hex numbers.
For the version without lookaheads or lookbehinds, add (?: [0-9a-f]{2})* 78 at the end of the regex:
00 00 00 [0-9a-f]{2} ([0-9a-f]{2})(?: [0-9a-f]{2})* 78
The regex matches the entire string starting with 00 00 00 and ending with 78 and the first capturing group matches the second number after 00 00 00 (your target).
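To try these patterns out, here is a small Python sketch that runs the lookbehind, the capturing-group, and the trailing-78 variants against the string from the question. Python's re module accepts this lookbehind because it has a fixed width; the variable names are only for illustration.

```python
import re

s = '70 75 62 71 00 7e 00 01 4c 00 06 72 61 6e 64 6f 6d 74 00 1c 4c 6a 2f 73 2f 6e 64 6f 6d 3b 78 70 77 25 00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0 78'

# Lookbehind version: the match itself is only the target byte
lookbehind = re.compile(r"(?<=00 00 00 [0-9a-f]{2} )[0-9a-f]{2}", re.IGNORECASE)

# Capturing-group version: the target byte is group 1
capturing = re.compile(r"00 00 00 [0-9a-f]{2} ([0-9a-f]{2})", re.IGNORECASE)

# Lookbehind plus a lookahead that requires a later " 78"
needs_78 = re.compile(
    r"(?<=00 00 00 [0-9a-f]{2} )[0-9a-f]{2}(?=(?: [0-9a-f]{2})* 78)",
    re.IGNORECASE,
)

print(lookbehind.search(s).group(0))         # f2
print(capturing.search(s).group(1))          # f2
print(needs_78.search(s).group(0))           # f2
print(needs_78.search("00 00 00 20 f2 90"))  # None, there is no 78 after the target
```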
Is the f2 surrounded by asterisks?
Without asterisks:
00 00 00 [a-f0-9]+ (?<hexits>[a-f0-9]+).+78
With asterisks:
\*(?<hexits>[a-f0-9]+)\*
You can use the following regex to match the second hexadecimal value after "00 00 00": /00 00 00 [0-9A-Fa-f]{2} ([0-9A-Fa-f]{2})/. The first [0-9A-Fa-f]{2} skips the byte right after the zeros (20 in the example), and the value you want is in the first capturing group.
Here is a demo:
import re
s = '70 75 62 71 00 7e 00 01 4c 00 06 72 61 6e 64 6f 6d 74 00 1c 4c 6a 2f 73 2f 6e 64 6f 6d 3b 78 70 77 25 00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0 78'
match = re.search(r'00 00 00 [0-9A-Fa-f]{2} ([0-9A-Fa-f]{2})', s)
if match:
    print(match.group(1))
The output will be:
f2
You don't really need a regex for that. Get the offset of three zero bytes in a row and take the byte 4 positions past that offset:
s = '70 75 62 71 00 7e 00 01 4c 00 06 72 61 6e 64 6f 6d 74 00 1c 4c 6a 2f 73 2f 6e 64 6f 6d 3b 78 70 77 25 00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0 78'
s2 = '01 02 03 00 00 00 05 06 07'
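def locate(s):
    data = bytes.fromhex(s)
    # Find the first run of three zero bytes
    offset = data.find(bytes([0, 0, 0]))
    if offset < 0:
        raise ValueError('no 00 00 00 marker found')
    # offset + 3 is the byte right after the marker, offset + 4 is the target
    return data[offset + 4]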
print(f'{locate(s):02X}')
print(f'{locate(s2):02X}')
Output:
F2
06
You could also extract the "f2" string directly from the string:
offset = s.index('00 00 00')
print(s[offset + 12 : offset + 14]) # 'f2'

Grouping lines with same leading word using Javascript Regex Engine

Suppose you have the following multi-line string:
C1 10
C2 20
C3 30
C2 40
C4 50
C3 60
And you want to match only those lines which have the same leading word, so as to build the following result:
C1 10
C2 20 40
C3 30 60
C4 50
I am trying to figure out a solution with pure Regex, but I am stuck. Any help?
I tried the regex that follows, but it didn't work...
Regex: /(^\w+\b)(.*$)([\s\S]*?\n)(\1)(.*$)/gm
Substitution:$1$2$5$3
Result:
C1 10
C2 20 40
C3 30
C4 50
C3 60
As you can see, it only works with the first occurrence, despite the fact that I have used a lazy quantifier in the third capturing group.
Any help?
You could also accomplish this using reduce()
const data = `C1 10
C2 20
C3 30
C2 40
C4 50
C3 60`;
const result = data.split("\n").reduce((acc, val) => {
  const vals = val.split(" ");
  if (!acc[vals[0]]) acc[vals[0]] = vals[1];
  else acc[vals[0]] += ` ${vals[1]}`;
  return acc;
}, {});
console.log(result);

print and save matrix in fortran

Hello everyone, I am new to Fortran and I am facing a problem. Let's assume I have a matrix a(5,50):
a1 a2 a3 a4 a5 a6 a7 etc
b1 b2 b3 b4 b5 b6 b7 etc
c1 c2 c3 c4 c5 c6 c7 etc
d1 d2 d3 d4 d5 d6 d7 etc
e1 e2 e3 e4 e5 e6 e7 etc
Is there a way to save it to a file and print the matrix like the following
a1 b1 c1 d1 e1
a2 b2 c2 d2 e2
a3 b3 c3 d3 e3
etc
without saving it to another matrix? I know I can always loop, copy it into a new matrix, and then save that to a file and print it. I have also created a subroutine to print my matrix in the correct order so that it is presentable.
Sure.
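You could loop over the second index and write one whole column per line:
do ii = 1, 50
    write(unit, '(5(I7))') a(:, ii)
end do
Or, because Fortran stores arrays in column-major order, you could write the whole array at once and let the format start a new line after every 5 values:
write(unit, '(5(I7))') a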
(I'm assuming that a is an integer array and that all values can be written with 6 or fewer digits (including sign). Change the format if that's not the case.)
This computer doesn't have a fortran compiler, so I haven't tested it, but it should work.
Cheers

c++ replacing order of items in byte array

I'm writing code for Arduino (C++).
I have a byte array with hex byte values, for example:
20 32 36 20 E0 EC 20 F9 F0 E9 E9 E3 F8 5C 70 5C 70 5C 73 20 E3 E2 EC 20 F8 E0 E5 E1 EF 20 39 31 5C
There are four ASCII digits in these bytes:
HEX 0x32 is number 2 in ascii code
HEX 0x35 is number 5 in ascii code
HEX 0x39 is number 9 in ascii code
and so on....
https://www.ascii-codes.com/cp862.html
So the hex values 32, 36 represent the number 26, and 39, 31 represent 91.
I want to find these numbers and reverse each group, so that (in this example) 62 and 19 are represented instead of 26 and 91.
The output would thus have to look like this:
20 36 32 20 E0 EC 20 F9 F0 E9 E9 E3 F8 5C 70 5C 70 5C 73 20 E3 E2 EC 20 F8 E0 E5 E1 EF 20 31 39 5C
The numbers don't have to be two digits but could be anything in 0-1000
I also know that each group of such numbers is preceded by the hex value 20, if that helps.
I have done this in C# (with some help of Stack overflow users :-) ):
string result = Regex.Replace(HexMessage1,
    @"(?<=20\-)3[0-9](\-3[0-9])*(?=\-20)",
    match => string.Join("-", Transform(match.Value.Split('-'))));
private static IEnumerable<string> Transform(string[] items)
{
    // Either terse Linq:
    // return items.Reverse();
    // Or good old for loop:
    string[] result = new string[items.Length];
    for (int i = 0; i < items.Length; ++i)
        result[i] = items[items.Length - i - 1];
    return result;
}
Can someone help me make it work on C++?
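Loop over the array, element by element, looking for 0x32 or 0x39. If found, check the next byte (if within bounds) to see if it matches 0x36 or 0x31 (respectively). If it does, swap the current and the next byte, then continue the loop past both bytes. That covers the two groups in the example; since the numbers can have any length, the general version is to find each run of digit bytes (0x30 to 0x39) that follows a 0x20 and reverse that run in place.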
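Because the question says the numbers are not always two digits, here is a minimal sketch of the loop for runs of any length. It is written in Python only to show the index logic (it is not Arduino code), and it assumes a run is any sequence of bytes in 0x30 to 0x39 that directly follows a 0x20. The function name reverse_digit_runs is made up for this illustration.

```python
def reverse_digit_runs(data: bytes) -> bytes:
    """Reverse every run of ASCII digit bytes that directly follows a 0x20 byte."""
    out = bytearray(data)
    i = 0
    while i < len(out):
        starts_run = 0x30 <= out[i] <= 0x39 and i > 0 and out[i - 1] == 0x20
        if starts_run:
            # Advance j to the first byte after the run of digits
            j = i
            while j < len(out) and 0x30 <= out[j] <= 0x39:
                j += 1
            out[i:j] = out[i:j][::-1]  # reverse the run in place
            i = j
        else:
            i += 1
    return bytes(out)


message = bytes.fromhex(
    "20 32 36 20 E0 EC 20 F9 F0 E9 E9 E3 F8 5C 70 5C 70 5C 73 "
    "20 E3 E2 EC 20 F8 E0 E5 E1 EF 20 39 31 5C"
)
# bytes.hex with a separator needs Python 3.8 or later
print(reverse_digit_runs(message).hex(" ").upper())
```

The same two nested loops carry over to a C++ byte array: the outer index finds the start of a run, the inner one finds its end, and the bytes in between are swapped pairwise from both ends.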

C++ Matrix horizontal concat

I have two matrices, for example:
     a1 a2 a3 a4        a5 a6 a7 a8
M1 = b1 b2 b3 b4   M2 = b5 b6 b7 b8
     c1 c2 c3 c4        c5 c6 c7 c8
What I want is a concatenated matrix like this:
     a1 a2 a3 a4 a5 a6 a7 a8
Mr = b1 b2 b3 b4 b5 b6 b7 b8
     c1 c2 c3 c4 c5 c6 c7 c8
It has to be as fast as possible, because my whole program is built on this concatenation running at 50 MHz (sound acquisition).
It is actually needed to read a single line fast (each line is a microphone stream).
If you store your matrix as a std::vector<std::vector<double>>, where each inner vector is one of your rows, you can use std::vector::insert to append each row of the second matrix to the matching row of the first. Here vector1 and vector2 are two matching rows:
vector1.insert( vector1.end(), vector2.begin(), vector2.end() );
You might also find a library such as Armadillo useful. It has a function join_rows( A, B ) that does what you are asking for, and it may well perform better than what you would write yourself.