Find assembly instruction by C++ regex - regex

I have a text of disassembly and I want to find out all the instructions and their addresses.
The assembly has contents like
00000000: ae ee move $1, $2
00000004: c1 add $2, $2, $3
00000006: a2 aa e3 addi $1, $2, $4
Each line can be divided into three groups:
the address, an 8-digit hexadecimal number followed by a colon ([a-f0-9]{8}:)
the machine code, a series of 2-digit hexadecimal numbers ([a-f0-9]\s+)
the instruction ((.*))
Is it possible to match "a series of 2-digit hexadecimal numbers", so that I can pick the instructions out of the third group with a pattern like
([a-f0-9]{8}):\s+(blabla)(.*)
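One way to express "a series of 2-digit hexadecimal numbers" is a repeated non-capturing group, (?:[a-f0-9]{2}\s+)+, wrapped in one capturing group. The sketch below tries the idea in Python rather than C++; the names LINE_RE and parse_line are made up for illustration. The pattern only uses constructs that the default ECMAScript grammar of std::regex should also accept, but that has not been checked here.

```python
import re

# Group 1: 8-digit address; group 2: the run of hex bytes; group 3: the instruction.
LINE_RE = re.compile(r"^([a-f0-9]{8}):\s+((?:[a-f0-9]{2}\s+)+)(.*)$")


def parse_line(line):
    """Return (address, [bytes], instruction), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    address, raw_bytes, instruction = m.groups()
    return address, raw_bytes.split(), instruction


if __name__ == "__main__":
    for text in ("00000000: ae ee move $1, $2",
                 "00000004: c1 add $2, $2, $3",
                 "00000006: a2 aa e3 addi $1, $2, $4"):
        print(parse_line(text))
```

A caveat of this approach: a mnemonic that itself consists of exactly two hex digits followed by whitespace would be absorbed into the byte group, so the pattern assumes no such mnemonic exists in the disassembly.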

Related

How to delete selected duplicated line from file using perl script

Let's say I have a file a.txt with the following content.
aa
aa
bb
cc
dd
ee
ff
ee
gg
I want the following output:
aa
aa
bb
cc
dd
ee
ff
gg
Note that I want to delete the duplicates of one particular line only: ee.
How can I do that? I tried the following one-liner,
perl -ne 'print unless $a{$_}++' a.txt
but it deletes all duplicated lines.
To remove duplicates of only specific ("target") lines, add that condition:
perl -lne'$tgt = "ee"; print unless $h{$_}++ and $_ eq $tgt' file
If there may be multiple such targets, check whether the current line matches any one of them. A nice tool for that is any from List::Util:
perl -MList::Util=any -lne'
$line=$_;
@tgt = qw(ee aa);
print unless $h{$line}++ and any { $_ eq $line } @tgt
' file
Target(s) can be read from the command line as arguments, if there is a benefit to not have them hardcoded.
Note: In older versions any is in List::MoreUtils, and not in List::Util.
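For comparison, the same "seen before and is a target" condition can be sketched in Python. The function name drop_target_duplicates and its parameters are illustrative only and are not part of the Perl answer.

```python
def drop_target_duplicates(lines, targets):
    """Keep every line, except repeats of lines that are listed in targets."""
    seen = set()
    kept = []
    for line in lines:
        # Skip only when the line was already seen AND it is one of the targets.
        if line in seen and line in targets:
            continue
        seen.add(line)
        kept.append(line)
    return kept


if __name__ == "__main__":
    content = ["aa", "aa", "bb", "cc", "dd", "ee", "ff", "ee", "gg"]
    print("\n".join(drop_target_duplicates(content, {"ee"})))
```

As in the one-liner, the first occurrence of a target line is always kept, and non-target lines such as aa may repeat freely.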

PowerShell RegEx to split MAC address

I need to verify a MAC address in raw format using a regex and split it into an array of 6 values of 2 characters each.
When I use the following pattern, I get the content of only the last iteration of the capture group:
PS C:\Windows\System32> "708BCDBC8A0D" -match "^([0-9a-z]{2}){6}$"
True
PS C:\Windows\System32> $Matches
Name Value
---- -----
1 0D
0 708BCDBC8A0D
PS C:\Windows\System32>
With what pattern can I capture all the groups?
I need this result:
0 = 708BCDBC8A0D
1 = 70
2 = 8B
3 = CD
4 = BC
5 = 8A
6 = 0D
You cannot capture multiple values with a single group definition; a repeated group keeps only its last iteration.
Avoid regex where it is unnecessary, as it costs a lot of CPU time. That matters when processing millions of records.
For MACs you can use the dedicated PhysicalAddress class:
[System.Net.NetworkInformation.PhysicalAddress]::Parse('708BCDBC8A0D')
.NET 5 (which, I think, PowerShell Core is based on) adds a TryParse method, but .NET Framework 4.5 has no TryParse method.
To check which .NET runtime PowerShell is running on, use [System.Reflection.Assembly]::GetExecutingAssembly().ImageRuntimeVersion
'708BCDBC8A0D' -match "^$('([A-F0-9]{2})' * 6)$"; $Matches
'708BCDBC8A0D' -match '^([A-F0-9]{2})([A-F0-9]{2})([A-F0-9]{2})([A-F0-9]{2})([A-F0-9]{2})([A-F0-9]{2})$'; $Matches
@(0..5) | ForEach-Object {'708BCDBC8A0D'.Substring($_ * 2, 2)}
@(
[String]::new('708BCDBC8A0D'[0..1]),
[String]::new('708BCDBC8A0D'[2..3]),
[String]::new('708BCDBC8A0D'[4..5]),
[String]::new('708BCDBC8A0D'[6..7]),
[String]::new('708BCDBC8A0D'[8..9]),
[String]::new('708BCDBC8A0D'[10..11])
)
As you've observed, the automatic $Matches variable, which reflects the result of the most recent (scalar-input[1]) regular-expression-based match operation, only ever contains the last instance of what an embedded capture group ((...)) captured.
Generally, -match only ever looks for at most ONE match in the input.
GitHub issue #7867 proposes introducing a new -matchall operator that would find all matches and return them as an array.
Direct use of the [regex] class (System.Text.RegularExpressions.Regex) that underlies PowerShell's regex functionality already provides that ability, namely in the form of the ::Matches() method, in which case capture groups aren't even needed.
# Note: Inline option (?i) makes the regex case-INsensitive
# (which PowerShell's operators are BY DEFAULT).
PS> [regex]::Matches('708BCDBC8A0D', '(?i)[0-9a-f]{2}').Value
70
8B
CD
BC
8A
0D
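The same "validate first, then collect every match" idea can be sketched outside PowerShell. The Python version below is only an illustration (split_mac is a made-up name): re.fullmatch performs the validation and re.findall plays the role of [regex]::Matches().

```python
import re

HEX_PAIR = r"[0-9A-Fa-f]{2}"


def split_mac(raw):
    """Return the six 2-character groups of a raw MAC address, or None if invalid."""
    # Validate the whole string: exactly six hex pairs and nothing else.
    if re.fullmatch(rf"(?:{HEX_PAIR}){{6}}", raw) is None:
        return None
    # findall returns every non-overlapping match, so no capture groups are needed.
    return re.findall(HEX_PAIR, raw)


if __name__ == "__main__":
    print(split_mac("708BCDBC8A0D"))
```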
However, with a bit of trickery, you can also use -split, the string splitting operator:
# Note: No inline option needed: -split - like -match and -replace -
# is case-INsensitive by default.
PS> '708BCDBC8A0D' -split '([0-9a-f]{2})' -ne ''
70
8B
CD
BC
8A
0D
If you can assume that all character pairs in the input strings are hex byte values, you can simplify to:
'708BCDBC8A0D' -split '(..)' -ne ''
Note:
The regex is of necessity enclosed in (...), a capturing group, to explicitly instruct -split to include what it matches in the results; since the regex normally describes the separators between the substrings of interest, its matches are normally not included.
In this case it is only the "separators" we care about, whereas the substrings between them are empty strings here, so we filter them out with -ne ''.
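The capturing-group behaviour described in this note is not specific to PowerShell. As a rough illustration, Python's re.split also includes the text matched by a capturing group in its result, so the same trick carries over; split_pairs is a hypothetical helper name.

```python
import re


def split_pairs(text):
    """Split text into 2-character chunks by treating each pair as a captured separator."""
    # The parentheses make re.split return the separators themselves;
    # the pieces between them are empty strings, which are filtered out.
    return [part for part in re.split(r"(..)", text) if part != ""]


if __name__ == "__main__":
    print(split_pairs("708BCDBC8A0D"))
```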
[1] If the LHS of a -match operation is an array (a collection), matching occurs against each element, and the sub-array of matching elements (rather than a single Boolean) is returned. In this case, $Matches is not populated.

Match certain text, but omit it in the output

In Notepad++'s find and replace regex feature, is there any way to match certain text, but not include it in the replacement? For instance: ([ab][cd] )* for matching strings such as ac ad bc bc ad, and replacing it with $0, except not including the [ab] part, or in the case of the string above, c d c c d. While only answers for Notepad++'s regex dialect will be useful, if anyone knows a solution in some other dialect, I'd be curious to see them, and they might apply to this dialect anyway.
EDIT:
The pattern is easy to match, the part I don't know how to do is get the replacement to do what I want. For the example expression I gave, the pattern (?:[ab]([cd]))* actually works, with $1 in the replace box, but that said, it doesn't work for my actual use case because the [ab][cd] is a sub-expression of the result (note that I didn't think that it would make a difference, else I would have posted this in the original question, my apologies); a better example would be where I want strings like f(ac ad bc bc ad): replaced with f(ac ad bc bc ad): f'(c d c c d) (so, really I want a regular addition). I tried using the regex ([a-z])\((?:[ab]([cd] ?))*\):, with the replacement being $0$1'($2), but that results in the value of $2 being whatever it last matched (i.e., f(ac ad bc bc ad): f'(d)).
Notepad++'s find and replace functionality doesn't provide a feature that solves this specific problem. As I see it, you need to match a substring and replace parts of it without affecting similar patterns elsewhere in the text, and I assume the solution should be general enough to extend.
if anyone knows a solution in some other dialect...
awk to the rescue
You have to use a programming language or a more powerful text-processing tool. If you have an awk implementation within your environment you are able to achieve what you desire in a second:
awk '{
sepRe = "[ab]"
regex = "(" sepRe "[cd] )+"
while(match($0, regex)) {
str = substr($0, RSTART, RLENGTH);
current = str
gsub(sepRe, "", current)
sub(str, current, $0)
}
print;
}' file
$ cat file:
ac ad bc bc ad
ac same ab ad af
Running awk outputs:
c d c c ad
c same ab d af
Note that the last ad on the first line is left unchanged, because there is no space after it and the regex requires one.
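The question's second example, f(ac ad bc bc ad): becoming f(ac ad bc bc ad): f'(c d c c d), can be handled in any dialect that allows a replacement callback. The following is a minimal Python sketch under that assumption; CALL_RE and add_derivative are names invented for the illustration, and this is a different technique from the awk loop above.

```python
import re

# Group 1: the function letter; group 2: the whole argument run, captured once
# so that no repeated capture group is needed.
CALL_RE = re.compile(r"([a-z])\(((?:[ab][cd] ?)*)\):")


def add_derivative(text):
    """Append f'(...) after each f(...): with the leading a/b letters removed."""
    def repl(m):
        name, args = m.group(1), m.group(2)
        # Second pass over the captured run: drop the [ab] prefix of every pair.
        stripped = re.sub(r"[ab]([cd])", r"\1", args)
        return f"{m.group(0)} {name}'({stripped})"
    return CALL_RE.sub(repl, text)


if __name__ == "__main__":
    print(add_derivative("f(ac ad bc bc ad):"))
```

Capturing the whole run once and post-processing it in the callback sidesteps the problem that a repeated group only remembers its last iteration.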

find an import in code with regular expression

I have an import error somewhere in my code but can't find it. I can use the search function in my editor (sublime text) to search for a regular expression in all the files inside the project. So I would like to search for the terms 'import' and 'views' with anything in between or before / after, I just want to match any line that contains these two words.
I'm not familiar with 'Sublime Text', but the following regex works well in 'Notepad++':
(.*)(import)(.*)(views)(.*)
Now, in the line matched, $1, $2, $3, $4 and $5 may be used to refer to the text before "import", "import", the text between "import" and "views", "views" and the text after "views" respectively.
EDIT - 1
It works well with 'Sublime Text' as well.
For example, for the text,
asdf1234 ..import.fghj4567 views...hjkl7890
Find What (as above):
(.*)(import)(.*)(views)(.*)
Replace With:
$2, $4, $5, $3, $1
Result:
import, views, ...hjkl7890, .fghj4567 , asdf1234 ..
For the text,
asdf1234 views fghj4567 import hjkl7890
Find What (as above):
(.*)(views)(.*)(import)(.*)
Replace With:
$2, $4, $5, $3, $1
Result:
views, import, hjkl7890, fghj4567 , asdf1234
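If the order of the two words is not known in advance, two lookaheads anchored at the start of the line avoid having to write one pattern per order. The sketch below checks that idea in Python; whether a given editor's regex engine supports lookaheads is an assumption to verify, and lines_with_both is an illustrative name.

```python
import re

# Each lookahead asserts that its word occurs somewhere on the line,
# so the two words may appear in either order.
BOTH_WORDS = re.compile(r"^(?=.*import)(?=.*views).*$")


def lines_with_both(lines):
    """Return only the lines that contain both 'import' and 'views'."""
    return [line for line in lines if BOTH_WORDS.search(line)]


if __name__ == "__main__":
    sample = [
        "asdf1234 ..import.fghj4567 views...hjkl7890",
        "asdf1234 views fghj4567 import hjkl7890",
        "import os",
    ]
    print(lines_with_both(sample))
```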
EDIT - 2
It seems to be working well for me, though.

In Perl, how can I use the regex substitution operator to replace non-ASCII characters in a substring?

How to use this command:
perl -pi -e 's/[^[:ascii:]]/#/g' file
to change only characters at offset A to offset B of each line?
Alternatively to rubber boots' answer, you can operate on a substring instead of the whole string to begin with:
perl -pi -e 'substr($_, 5, 5) =~ s/[^[:ascii:]]/#/g' file
To illustrate:
perl -e 'print "\xff" x 16' | \
perl -p -e 'substr($_, 5, 5) =~ s/[^[:ascii:]]/#/g' | \
hd
will print
ff ff ff ff ff 23 23 23 23 23 ff ff ff ff ff ff
In this code, the first offset is 0-based, and you have to use the length instead of the second offset, so it will be
substr($_, A-1, B-A).
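The substring-only substitution can be mirrored in Python by slicing the line, substituting inside the slice, and joining the pieces again. This is only a sketch: mask_non_ascii and its parameters are hypothetical, and it works on decoded text, whereas the Perl one-liner above works on raw bytes.

```python
import re


def mask_non_ascii(line, start, length, mask="#"):
    """Replace non-ASCII characters only within line[start:start + length]."""
    end = start + length
    head, middle, tail = line[:start], line[start:end], line[end:]
    # Only the middle slice is passed through the substitution.
    return head + re.sub(r"[^\x00-\x7f]", mask, middle) + tail


if __name__ == "__main__":
    print(mask_non_ascii("\xff" * 16, 5, 5).encode("latin-1").hex(" "))
```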
With the reservation that I may not have understood your question correctly: if the offsets A and B are 5 and 10, then it should look like this:
perl -pi -e 's/(?<=.{5})(?<!.{10})[^[:ascii:]]/#/g' file
Explanation:
[^[:ascii:]] <- the character which is looked for
(?<=.{5}) <- if at least 5 chars were before (offset 5)
(?<!.{10}) <- but no more than 10 characters before (offset 10)
The constructs
(?<= ...) and (?<! ...)
are called positive and negative lookbehinds; they are zero-width assertions.
(You can google them, or see the section Look-Around Assertions in perlre.)
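To see which positions the two lookbehinds actually select, the same pattern can be tried in Python, whose re module accepts fixed-width lookbehinds like these. The snippet is an illustration only; mask_window is a made-up name, and [^\x00-\x7f] stands in for Perl's [^[:ascii:]].

```python
import re

# A character qualifies when at least 5 characters precede it on the line
# and it is NOT the case that 10 characters precede it.
WINDOW_RE = re.compile(r"(?<=.{5})(?<!.{10})[^\x00-\x7f]")


def mask_window(line, mask="#"):
    """Replace non-ASCII characters located at 0-based positions 5 through 9."""
    return WINDOW_RE.sub(mask, line)


if __name__ == "__main__":
    print(mask_window("\xff" * 16).encode("latin-1").hex(" "))
```

With these two assertions the affected window is 0-based positions 5 through 9, that is, five characters.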
Addendum 1
You mentioned substr() in your title, which I overlooked first. This would work, of course, too:
perl -pi -e 'substr($_,5,10)=~s/[^[:ascii:]]/#/g' file
The description of substr EXPR,OFFSET,LENGTH can be found in perldoc.
This example nicely illustrates the use of substr() as a left-value.
Addendum 2
When updating this post, Grrrr added the same solution as an answer, but his came first by a minute! (so he deserves the booty)
Regards
rbo