regular expression for a stream with no order


I want to use a regular expression (PCRE) to match a particular stream.
The stream I want to match is 3e followed by any number of 20s, 09s or 0as, ending with 3c; the whole match should then be replaced by just '3e3c'.
3e2020203c to be replaced by 3e3c
3e0920200a3c to be replaced by 3e3c
The thing is, the 20, 09 and 0a values between 3e and 3c can occur any number of times and in no particular order. The stream always starts with 3e and ends with 3c, however.

This should work for PHP.
$string = preg_replace('!3e(20|09|0a)+3c!','3e3c',$string);
In Perl
s/3e(20|09|0a)+3c/3e3c/g
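For comparison, here is a minimal Python sketch of the same substitution. The function name collapse_gap is made up for illustration, and the sketch assumes the stream is available as a lowercase hex string like the samples in the question. Like the PHP and Perl versions, it does not check that a match starts on a byte boundary.

```python
import re

# Same pattern as the PHP/Perl answers; (?:...) is a non-capturing group,
# since the captured text is not needed for the replacement.
GAP = re.compile(r"3e(?:20|09|0a)+3c")

def collapse_gap(hex_text):
    """Replace 3e + any mix of 20/09/0a + 3c with just 3e3c (hypothetical helper)."""
    return GAP.sub("3e3c", hex_text)

if __name__ == "__main__":
    print(collapse_gap("3e2020203c"))
    print(collapse_gap("3e0920200a3c"))
```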

Related

Vb.net regex finding a pattern after 5 spaces and 2 characters

I am trying to work on this regex pattern and match the examples below. There are 5 spaces after Rx, which I have tried to match with " *", but with no luck.
("RX," *",\w\w(\w\w\w\w))
1) 18468.0 Rx 1CEBF900 8 02 00 00 80 00 01 01 FF - ' should match EBF9
2) 18468.6 Rx 18FD4000 8 FF FF 00 FF FF FF FF FF - 'should match FD40
ETC . . .

This expression seems to work:
Rx\s{5}\S{2}(.{4})
Function GetValue(line As String) As String
Dim regex As New Regex("Rx {5}\S{2}(.{4})")
Dim match As Match = regex.Match(line)
If match.Success Then Return match.Groups(1).Value
Return Nothing
End Function
See it here:
https://dotnetfiddle.net/yY3xXX

Here is a pattern that seems to extract the specific data you're seeking. It was generated and tested via RegExr.
Search Pattern: /(Rx {5}[0-9A-F]{2})([0-9A-F]{4})/g;
List/Replace Pattern: $2
Description: the first capture group specifies "Rx", five spaces, and two hexadecimal range characters; the second capture group specifies the next four hexadecimal range characters.
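As a rough cross-check outside VB.NET, the same idea can be sketched in Python. The helper name get_value is hypothetical, and the sample lines below are written with exactly five spaces after Rx, which is an assumption based on the question's description (the spacing in the pasted samples may not have survived).

```python
import re

# "Rx", five spaces, two hex characters to skip, then four hex characters to capture.
PATTERN = re.compile(r"Rx {5}[0-9A-F]{2}([0-9A-F]{4})")

def get_value(line):
    """Return the four captured hex characters, or None if the line does not match."""
    m = PATTERN.search(line)
    return m.group(1) if m else None

if __name__ == "__main__":
    # Assumed spacing: exactly five spaces between "Rx" and the ID.
    print(get_value("18468.0 Rx     1CEBF900 8 02 00 00 80 00 01 01 FF"))
    print(get_value("18468.6 Rx     18FD4000 8 FF FF 00 FF FF FF FF FF"))
```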

With your shown samples and attempts, please try the following regex and VB.NET code. It outputs EBF9 and FD40 for the two sample lines.
The regex used in this solution is: (?<=\s+Rx\s{5}.{2})\S+(?=\d{2})
Imports System.Text.RegularExpressions
Module Module1
Sub Main()
Dim regex As Regex = New Regex("(?<=\s+Rx\s{5}.{2})\S+(?=\d{2})")
Dim match As Match = regex.Match("18468.0 Rx 1CEBF900 8 02 00 00 80 00 01 01 FF")
If match.Success Then
Console.WriteLine("RESULT: [{0}]", match.Value)
End If
Dim match1 As Match = regex.Match("18468.6 Rx 18FD4000 8 FF FF 00 FF FF FF FF FF")
If match1.Success Then
Console.WriteLine("RESULT: [{0}]", match1.Value)
End If
End Sub
End Module
Explanation of regex:
(?<=\s+Rx\s{5}.{2}) ##Positive lookbehind: whitespace, then Rx, then 5 whitespace
##characters and any 2 characters must come right before the match.
\S+ ##Match non-whitespace characters.
(?=\d{2}) ##Positive lookahead: the match must be followed by 2 digits.
I have matched your two sample lines separately just to show the output for each.

Extracting multi values with regex ( Only values, Not Fieldname )

Can someone help me with this regex?
I would like to extract either 1. or 2.
1.
(2624594000) 303 days, 18:32:20.00 <-- Timeticks
.1.3.6.1.4.1.14179.2.6.3.39. <-- OID
Hex-STRING: 54 4A 00 C8 73 70 <-- Hex-STRING (need "Hex-STRING" itself too)
0 <--INTEGER
"NJTHAP027" <- STRING
OR
2.
Timeticks: (2624594000) 303 days, 18:32:20.00
OID: .1.3.6.1.4.1.14179.2.6.3.39
Hex-STRING: 54 4A 00 C8 73 70
INTEGER: 0
STRING: "NJTHAP027"
The field names and values differ each time (the data is variable).
I don't need the field names; I only want the values, in order from the top (multiple values).
(?s)[^=]+\s=\s(?<value_v2c>([^=]+)-)
https://regex101.com/r/lsKeEM/2
-> I can't extract the last STRING: "NJTHAP027" at all!

The named group value_v2c is already a group, so you can omit the inner capture group.
Currently the pattern always requires a - character after the value, but you can either match it or assert the end of the string.
As [^=]+ is a negated character class, and both it and \s already match newlines, you can omit the inline modifier (?s).
To match the 2. variation, you can update the pattern to:
[^=]+\s=\s(?<value_v2c>[^=]+)(?:-|$)
Regex demo
To get the 1. variation, you can match everything before the colon as long as it is not Hex-STRING.
The group then optionally matches the Hex-STRING: prefix itself.
[^=]+\s=\s(?:(?!Hex-STRING:)[^:])*:?\s*(?<value_v2c>(?:Hex-STRING: )?[^=]+?)(?: -|$)
Regex demo
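A small Python sketch of the 2. variation follows. The input layout (name = TYPE: value pairs separated by " - ") and the names v1, v2 are assumptions made for illustration, since the real input is only visible in the linked demo. Python spells the named group as (?P<name>...) rather than (?<name>...).

```python
import re

# Same structure as the pattern above: field name, " = ", the value,
# then either a "-" separator or the end of the string.
VALUE = re.compile(r"[^=]+\s=\s(?P<value_v2c>[^=]+)(?:-|$)")

def extract_values(line):
    """Return every value (with its type prefix) in order, whitespace-trimmed."""
    return [m.group("value_v2c").strip() for m in VALUE.finditer(line)]

if __name__ == "__main__":
    # Hypothetical input line; the field names v1 and v2 are invented.
    print(extract_values('v1 = INTEGER: 0 - v2 = STRING: "NJTHAP027"'))
```

Allowing the end of the string as an alternative to the - is what lets the last value be captured.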

Haskell Regex non capture group

I'm using Text.Regex.TDFA on a lazy ByteString to extract some information from a file.
I have to extract each byte from this string:
27 FB D9 59 50 56 6C 8A
Here is what I've tried (my string begins with a space):
(\\ ([0-9A-Fa-f]{2}))+
but I have 2 problems:
Only the last match is returned: [[" 27 FB D9 59 50 56 6C 8A"," 8A","8A"]]
I want to make the outer group a non-capturing one (like ?: in other engines)
Here is my minimal code:
import System.IO ()
import Data.ByteString.Lazy.Char8 as L
import Text.Regex.TDFA
main::IO()
main = do
let input = L.pack " 27 FB D9 59 50 56 6C 8A"
let entries = input =~ "(\\ ([0-9A-Fa-f]{2}))+" :: [[L.ByteString]]
print entries

When you attach a quantifier to a capture group, the engine keeps only the last iteration's match for that group. See rexegg.com/regex-capture.html#groupnumbers for a good explanation.
On the first pass, use this regex, similar to what you were already using (using a case-insensitive option):
^([\dA-F]+) +([\dA-F]+) +(\d+) +([\dA-F]+)(( [\dA-F]{2})+)
You'll get the following matching groups:
Use the 5th one as the target of a second pass, to extract each individual byte (using a "global" option):
([0-9A-Fa-f]{2})
Then each match will be returned separately.
Note: you don't need to escape the spaces, as you had in your original regex.
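The difference between the two behaviours can be sketched in Python (used here instead of Haskell only for illustration; the function names are hypothetical). A quantified capture group reports just its last iteration, while a "global" search with the single-byte pattern returns every byte.

```python
import re

def last_group_only(text):
    """Quantified capture group: only the final iteration is kept in the groups."""
    m = re.search(r"( ([0-9A-Fa-f]{2}))+", text)
    return m.groups() if m else None

def each_byte(text):
    """Global search with the single-byte pattern: one result per byte."""
    return re.findall(r"[0-9A-Fa-f]{2}", text)

if __name__ == "__main__":
    sample = " 27 FB D9 59 50 56 6C 8A"
    print(last_group_only(sample))
    print(each_byte(sample))
```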

Is it possible for C/C++ PCRE to match 2 or more UTF-8 codepoints which are far apart from each other in a UTF-8 String?

Good afternoon. We are using the latest C/C++ version of PCRE on Windows with Visual Studio 8.0 and 9.0, with PCRE_CASELESS, PCRE_UTF8 and PCRE_UCP. When we use the PCRE regex [\x{00E4}]{1} we are able to match the Latin code point U+00E4 in the string DAS tausendschöne Jungfräulein, which in UTF-8 hex is 44 41 53 20 74 61 75 73 65 6E 64 73 63 68 C3 B6 6E 65 20 4A 75 6E 67 66 72 C3 A4 75 6C 65 69 6E.
Now we would like to match both code points, U+00E4 (i.e. C3 A4) and U+00F6 (i.e. C3 B6), so we can implement a simple prototype C/C++ search-and-replace operation with $1 $2. Is this possible to do? Thank you.
We are now using the PCRE regex [\x{00F6}\x{00E4}]{1,} with the following C++ function:
void cInternational::RegExSearchReplace(cOrderedList *RegExList_,char **Input_) {
const char *replacement;
char substitution[dMaxRegExSubstitution];
int subString;
cPCRE *regEx;
unsigned char* Buffer;
Buffer = new unsigned char[1024];
if (*Input_[0]!='\x0' && RegExList_->ResetIterator()) {
do {
regEx=new cPCRE();
regEx->SetOptions(PCRE_CASELESS);
if (regEx->Compile(RegExList_->GetCharacterField(1))) {
// Search for Search RegEx:
while (regEx->Execute((char *)Buffer)>0) {
// Found it, get Replacement expression:
replacement=RegExList_->GetCharacterField(2);
int subLen=0;
// Build substitution string by finding each $# in replacement and replacing
// them with the appropriate found substring. Other characters in replacment
// are sent through, untouched.
for (int i=0;replacement[i]!='\x0';i++) {
if (replacement[i]=='$' && isdigit(replacement[i+1])) {
subString=atoi(replacement+i+1);
if (regEx->HasSubString(subString)) {
strncpy(substitution+subLen,
*Input_+regEx->GetMatchStart(),
regEx->GetMatchEnd() - regEx->GetMatchStart());
subLen+=(regEx->GetMatchEnd() - regEx->GetMatchStart());
}
i++;
} else {
substitution[subLen++]=replacement[i];
}
}
substitution[subLen]='\x0';
// Adjust the size of Input_ accordingly:
int sizeDiff=strlen(substitution)-(regEx->GetMatchEnd()-regEx->GetMatchStart());
if (sizeDiff>0) {
char *newInput=new char[strlen(*Input_)+sizeDiff+1];
strcpy(newInput,*Input_);
delete[] *Input_;
*Input_=newInput;
}
memmove(*Input_ + regEx->GetMatchStart() + 1,
*Input_+regEx->GetMatchEnd() + 1,
regEx->GetMatchEnd()- regEx->GetMatchStart());
strncpy(*Input_,substitution,strlen(substitution));
(*Input_)[strlen(substitution)] = '\x0';
Buffer = Buffer + regEx->GetMatchEnd();
}
}
delete regEx;
} while (RegExList_->Next());
}
}

Using PCRE, the regex you would use to match those appearing anywhere in a string is the following: \x{00E4}.*\x{00F6}
Explanation:
\x{00E4} matches the first unicode character you want to find.
. matches any character.
* modifies the previous period to match 0 or more times. This will allow the second unicode character to be any number of characters away.
\x{00F6} matches the second unicode character you want to find.
This will match if they appear at all. Let me know how it works, if you need it to do something else, etc. (For example: this doesn't seem all that useful for a search and replace operation. It's just going to tell you if those characters exist in the string at all. You'd need to modify the regex to do a substitution.)

I sent an email to the developer of PCRE, Philip Hazel, last night. Mr. Hazel believes that it is possible to implement order-insensitive PCRE regexes such as
\x{00f6}.*?\x{00e4} | \x{00e4}.*?\x{00f6}
The explanation is shown below. Thank you for your help, Damon. Regards, Frank
From: Philip Hazel
Date: Tue, Jun 26, 2012 at 8:55 AM
To: Frank Chang
Cc: pcre-dev@exim.org
On Mon, 25 Jun 2012, Frank Chang wrote:
Good evening, We are using C/C++ PCRE 8.30 with PCRE_UTF8 | PCRE_UCP |
PCRE_COLLATE.Here's an order-insensitive
regex: '(?=.*\x{00F6})(?=.*\x{00E4})'. It uses ?=, or positive
lookahead to make sure both UTF-8 code points are matched in either order.
PCRE_compile() returns OK and PCRE_execute() returns OK on the string DAS
tausendschöne Jungfräulein . In hex, it is 44 41 53 20 74 61 75 73 65 6E
64 73 63 68 C3 B6 6E 65 20 4A 75 6E 67 66 72 C3 A4 75 6C 65 69 6E.
However, GetMatchStart() returns 0 and GetMatchEnd() returns 0
instead of GetMatchStart() = 14 and GetMatchEnd() = 27 which we obtain when
we use the PCRE '\x{00F6}.*\x{00E4}' regex.
Please advise us if it is possible to do order insensitive matching of
multiple UTF-8 code points in a PCRE regex. Thank you.
I have run your regex through the basic pcretest program, and
it matches. This confirms your finding with PCRE_compile()
and PCRE_execute().
Since your regex consists entirely of assertions, the actual matched
string is empty (as pcretest shows). You need to modify your regex to
actually match something if you want a match start and end to be given
to you. If what you want is the string between these two code points, in
either order, something simple like
\x{00f6}.*?\x{00e4} | \x{00e4}.*?\x{00f6}
(ignore white space) should do what you want.
I realize that this example may be a simplification of your real
application, and my simple suggestion does not scale very well. But the
main point stands: if you want to extract strings, your regex must do
some actual matching, not just assertions.
Philip
--
Philip Hazel

We wrote a PCRE order insensitive regex.
(?=.+(\x{00F6})){1}(?=.+(\x{00E4})){1}
That appears to function correctly.
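To illustrate the point from Philip Hazel's reply, here is a small sketch using Python's re module rather than PCRE (so details such as PCRE_UTF8 do not apply, and offsets are counted in characters, not bytes). The helper name matched_text is made up. A pattern built only from lookaheads succeeds but matches the empty string, while an alternation that consumes text returns the span between the two code points in either order.

```python
import re

TEXT = "DAS tausendsch\u00f6ne Jungfr\u00e4ulein"

# Assertions only: confirms both code points exist, but consumes nothing.
ASSERT_ONLY = re.compile("(?=.*\u00f6)(?=.*\u00e4)")

# Alternation that actually consumes text, covering both orders.
EITHER_ORDER = re.compile("\u00f6.*?\u00e4|\u00e4.*?\u00f6")

def matched_text(pattern, text):
    """Return the matched substring, or None when there is no match."""
    m = pattern.search(text)
    return None if m is None else m.group(0)

if __name__ == "__main__":
    print(repr(matched_text(ASSERT_ONLY, TEXT)))
    print(repr(matched_text(EITHER_ORDER, TEXT)))
```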

How to search through a hex dump using regexes in Vim (or elsewhere)?

I’m looking for a way to search for the text representation of a series of hexadecimal numbers in the hex dump of a binary file that looks like so:
0x000001A0: 36 5B 09 76 99 31 55 09 78 99 34 51 49 BF E0 03
0x000001B0: 28 0B 0A 03 0B E0 07 28 0B 0A 03 0B 49 58 09 35
The issue is that the pattern may roll over onto the next line. For instance, in the above two lines, I wouldn’t be able to immediately search for 03 28 0B because it spans two lines.
I have been told from recent posting that regex is the way to go, but I’m unfamiliar with it and do not know what to use: Notepad++, Vim, Word, or anything else.
Edit 1: The text file that shows the above was derived from a binary file, and I can use Notepad++.
Edit 2: To give an example, say I'm trying to get as close to 11:45:00 (military time) as possible. 03 28 0B 0A 03 0B scattered over the two lines above, can be read as “3 seconds, 40 minutes, 11 hours on the 10th day of March 2011”. I'm looking for a way to go through this file to find how close I can get to 11:45:00.

Let me propose the following mappings, which take a sequence of hex bytes
from user input or from the visual selection, create an appropriate pattern, and
start a search for it.
nnoremap <silent> <expr> <leader>x/ SearchHexBytes('/', 0)
nnoremap <silent> <expr> <leader>x? SearchHexBytes('?', 0)
vnoremap <silent> <leader>x/ :call SearchHexBytes('/', 1)<cr>/<cr>
vnoremap <silent> <leader>x? :call SearchHexBytes('?', 1)<cr>?<cr>
function! SearchHexBytes(dir, vis)
if a:vis
let [qr, qt] = [getreg('"'), getregtype('"')]
norm! gvy
let s = @"
call setreg('"', qr, qt)
else
call inputsave()
let s = input(a:dir)
call inputrestore()
endif
if s =~ "[^ \t0-9A-Fa-f]"
echohl Error | echomsg 'Invalid hex digits' | echohl None
return
endif
let @/ = join(split(s, '\s\+'), '\%(\s*\|\n0x\x\+:\s*\)')
return a:dir . "\r"
endfunction
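Outside Vim, the same pattern-building idea can be sketched in Python. The function name hex_bytes_pattern is hypothetical, and the sketch assumes the dump layout shown in the question (an 0x…: address at the start of each line). Between two bytes it allows either plain spaces or a line break followed by the next line's address.

```python
import re

# Separator between two bytes: spaces on the same line, or a newline
# followed by the next line's "0x...:" address prefix.
SEPARATOR = r"(?:[ \t]+|[ \t]*\n0x[0-9A-Fa-f]+:[ \t]*)"

def hex_bytes_pattern(query):
    """Build a regex for a space-separated byte sequence such as '03 28 0B'."""
    parts = query.split()
    if not parts or any(not re.fullmatch(r"[0-9A-Fa-f]{2}", p) for p in parts):
        raise ValueError("Invalid hex bytes")
    return re.compile(SEPARATOR.join(parts), re.IGNORECASE)

if __name__ == "__main__":
    dump = (
        "0x000001A0: 36 5B 09 76 99 31 55 09 78 99 34 51 49 BF E0 03\n"
        "0x000001B0: 28 0B 0A 03 0B E0 07 28 0B 0A 03 0B 49 58 09 35"
    )
    m = hex_bytes_pattern("03 28 0B").search(dump)
    print(repr(m.group(0)) if m else None)
```

Note that a very short query could also match digits inside an address, so this is a starting point rather than a complete search tool.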

Well it seems none of the more elegant solutions have worked for you so here:
\v03(\n[^:]+:)? 28(\n[^:]+:)? 0B(\n[^:]+:)?
Yeah, it's copy pasted and super brute forcy but it'd look so much better if I could get friggin backreferences to work.
Just type '/' then copy that pattern in and hit enter, replace 03 28 0B with whatever you need followed by space, new value, then the parenthetical statement. There's roughly a 100% chance there's something better, but I can't think of it.
This will match the memory location as well, but that shouldn't matter if all you want to do is take a peek.
Edit: Forgot about \v

You can use PSPad which has a built-in HEX Editor and HEX search. Just open your original binary file, switch to HEX Editor and search for your sequence.
