using a partial variable as part of regular expression string

using a partial variable as part of regular expression string - regex

I'm trying to loop through an array of alarm codes and use a regular expression to look it up in cpp code. I know my regex works when I hard code a value in and use double quotes for my regex, but I need to pass in a variable because it's a list of about 100 to look up with separate definitions. Below is what I want to use in general. How do I fix it so it works with $lookupItem instead of hard-coding "OTHER-ERROR" for example in the Get-EpxAlarm function? I tried single quotes and double quotes around $lookupItem in the $fullregex definition, and it returns nothing.
Function Get-EpxAlarm{
[cmdletbinding()]
Param ( [string]$fileContentsToParse, [string]$lookupItem)
Process
{
$lookupItem = "OTHER_ERROR"
Write-Host "In epx Alarm" -ForegroundColor Cyan
# construct regex
$fullregex = [regex]'$lookupItem', # Start of error message########variable needed
":[\s\Sa-zA-Z]*?=", # match anything, non-greedy
"(?<epxAlarm>[\sa-zA-Z_0-9]*)", # Capture epxAlarm Num
'' -join ''
# run the regex
$Values = $fileContentsToParse | Select-String -Pattern $fullregex -AllMatches
# Convert Name-Value pairs to object properties
$result = $Values.Matches
Write-Host $result
#Write-Host "result:" $result -ForegroundColor Green
return $result
}#process
}#function
#main code
...
Get-EpxAlarm -fileContentsToParse $epxContents -lookupItem $item
...
where $fileContentsToParse is
case OTHER_ERROR:
bstrEpxErrorNum = FATAL_ERROR;
break;
case RI_FAILED:
case FILE_FAILED:
case COMMUNICATION_FAILURE:
bstrEpxErrorNum = RENDERING_ERROR;
break;
So if I look for OTHER_ERROR, it should return FATAL_ERROR.
I tested my regular expression in regex editor and it works with the hard-coded value. How can I define my regex so that I use the parameter and it returns the same thing as hard-coding the parameter value?

I wouldn't recommend trying to construct a single regular expression to do complex source code parsing - it gets quite unreadable really quickly.
Instead, write a small error mapping parser that just reads the source code line by line and constructs the error mapping table as it goes along:
function Get-EpxErrorMapping {
param([string]$EPXFileContents)
# create hashtable to hold the final mappings
$errorMap = #{}
# create array to collect keys that are grouped together
$keys = #()
switch -Regex ($EPXFileContents -split '\r?\n') {
'case (\w+):' {
# add relevant key to key collection
$keys += $Matches[1] }
'bstrEpxErrorNum = (\w+);' {
# we've reached the relevant error, set it for all relevant keys
foreach($key in $keys){
$errorMap[$key] = $Matches[1]
}
}
'break' {
# reset/clear key collection
$keys = #()
}
}
return $errorMap
}
Now all you need to do is call this function and use the resulting table to resolve the $lookupItem value:
Function Get-EpxAlarm{
[CmdletBinding()]
param(
[string]$fileContentsToParse,
[string]$lookupItem
)
$errorMap = Get-EpxErrorMapping $fileContentsToParse
return $errorMap[$lookupItem]
}
Now we can get the corresponding error code:
$epxContents = #'
case OTHER_ERROR:
bstrEpxErrorNum = FATAL_ERROR;
break;
case RI_FAILED:
case FILE_FAILED:
case COMMUNICATION_FAILURE:
bstrEpxErrorNum = RENDERING_ERROR;
break;
'#
# this will now return the string "FATAL_ERROR"
Get-EpxAlarm -fileContentsToParse $epxContents -lookupItem OTHER_ERROR

Related

PowerShell regex to get just hex part in strings

I'm working on a function that gets the map of string key and it's hex value. I got the string key part working, but I'm having trouble getting the hex part to work. This is my function so far:
function Get-Contents4_h{
[cmdletbinding()]
Param ([string]$fileContent)
#define Error_Failed_To_Do_A 0x81A0 /* random comments */
#define Error_Failed_To_Do_B 0x810A
# create an ordered hashtable to store the results
$errorMap = [ordered]#{}
# process the lines one-by-one
switch -Regex ($fileContent -split '\r?\n') {
'define ([\w]*)' { # Error_Failed_To_Do_ #this works fine
$key = ($matches[1]).Trim()
}
'([0x\w]*)' { # 0x04A etc #this does not work
$errorMap[$key] = ($matches[1]).Trim()
}
}
# output the completed data as object
#[PsCustomObject]$errorMap
return $errorMap
}
I'm going to be looping through the returned map and matching the hex value with the key in another object.
This is what the string parameter to the function looks like:
#define Error_Failed_To_Do_A 0x81A0 /* random comments */
#define Error_Failed_To_Do_B 0x810A
For some reason my
0x\w
regex is not returning anything in regex101.com. I've had luck with that with other hex numbers but not this time.
I've tried this and other variations as well: ^[\s\S]*?#[\w]*[\s\S]+([0x\w]*)
This is with powershell 5.1 and VS Code.

You need to remove the [...] range construct around 0x\w - the 0x occurs exactly once in the input string, and the following characters appears at least once - but the expression [0x\w]* could be satisfied by an empty string (thanks to the *, 0-or-more quantifier).
I'd suggest matching the whole line at once with a single pattern instead:
switch -Regex ($fileContent -split '\r?\n') {
'^\s*#define\s+(\w+)\s+(0x\w+)' {
$key,$value = $Matches[1,2] |ForEach-Object Trim
$errorMap[$key] = $value
}
}

This works for me. The square brackets match any one character inside them at a time. The pattern with the square brackets has 18 matches in this line, the first match being empty string ''. Regex101.com says the same thing (null). https://regex101.com/r/PZ8Y8C/1 This would work 0x[\w]*, but then you might as well drop the brackets. I made an example data file and then a script on how I would do it.
'#define Error_Failed_To_Do_A 0x81A0 /* random comments */' |
select-string [0x\w]* -AllMatches | % matches | measure | % count
18
'#define Error_Failed_To_Do_A 0x81A0 /* random comments */
#define Error_Failed_To_Do_B 0x810A' |
set-content file.txt
# Get-Contents4_h.ps1
Param ($file)
switch -Regex -File $file {
'define (\w+).*(0x\w+)' {
[pscustomobject]#{
Error = $matches[1]
Hex = $matches[2]
}
}
}
.\Get-Contents4_h file.txt
Error Hex
----- ---
Error_Failed_To_Do_A 0x81A0
Error_Failed_To_Do_B 0x810A

Matching Something Against Array List Using Where Object

I've found multiple examples of what I'm trying here, but for some reason it's not working.
I have a list of regular expressions that I'm checking against a single value and I can't seem to get a match.
I'm attempting to match domains. e.g. gmail.com, yahoo.com, live.com, etc.
I am importing a csv to get the domains and have debugged this code to make sure the values are what I expect. e.g. "gmail.com"
Regular expression examples AKA $FinalWhiteListArray
(?i)gmail\.com
(?i)yahoo\.com
(?i)live\.com
Code
Function CheckDirectoryForCSVFilesToSearch {
$global:CSVFiles = Get-ChildItem $Global:Directory -recurse -Include *.csv | % {$_.FullName} #removed -recurse
}
Function ImportCSVReports {
Foreach ($CurrentChangeReport in $global:CSVFiles) {
$global:ImportedChangeReport = Import-csv $CurrentChangeReport
}
}
Function CreateWhiteListArrayNOREGEX {
$Global:FinalWhiteListArray = New-Object System.Collections.ArrayList
$WhiteListPath = $Global:ScriptRootDir + "\" + "WhiteList.txt"
$Global:FinalWhiteListArray= Get-Content $WhiteListPath
}
$Global:ScriptRootDir = Split-Path -Path $psISE.CurrentFile.FullPath
$Global:Directory = $Global:ScriptRootDir + "\" + "Reports to Search" + "\" #Where to search for CSV files
CheckDirectoryForCSVFilesToSearch
ImportCSVReports
CreateWhiteListArrayNOREGEX
Foreach ($Global:Change in $global:ImportedChangeReport){
If (-not ([string]::IsNullOrEmpty($Global:Change.Previous_Provider_Contact_Email))){
$pos = $Global:Change.Provider_Contact_Email.IndexOf("#")
$leftPart = $Global:Change.Provider_Contact_Email.Substring(0, $pos)
$Global:Domain = $Global:Change.Provider_Contact_Email.Substring($pos+1)
$results = $Global:FinalWhiteListArray | Where-Object { $_ -match $global:Domain}
}
}
Thanks in advance for any help with this.

the problem with your current code is that you put the regex on the left side of the -match operator. [grin] swap that and your code otta work.
taking into account what LotPings pointed out about case sensitivity and using a regex OR symbol to make one test per URL, here's a demo of some of that. the \b is for word boundaries, the | is the regex OR symbol. the $RegexURL_WhiteList section builds that regex pattern from the 1st array. if i haven't made something clear, please ask ...
$URL_WhiteList = #(
'gmail.com'
'yahoo.com'
'live.com'
)
$RegexURL_WhiteList = -join #('\b' ,(#($URL_WhiteList |
ForEach-Object {
[regex]::Escape($_)
}) -join '|\b'))
$NeedFiltering = #(
'example.com/this/that'
'GMail.com'
'gmailstuff.org/NothingElse'
'NotReallyYahoo.com'
'www.yahoo.com'
'SomewhereFarAway.net/maybe/not/yet'
'live.net'
'Live.com/other/another'
)
foreach ($NF_Item in $NeedFiltering)
{
if ($NF_Item -match $RegexURL_WhiteList)
{
'[ {0} ] matched one of the test URLs.' -f $NF_Item
}
}
output ...
[ GMail.com ] matched one of the test URLs.
[ www.yahoo.com ] matched one of the test URLs.
[ Live.com/other/another ] matched one of the test URLs.

Get index of regex in filename in powershell

I'm trying to get the starting position for a regexmatch in a folder name.
dir c:\test | where {$_.fullname.psiscontainer} | foreach {
$indexx = $_.fullname.Indexofany("[Ss]+[0-9]+[0-9]+[Ee]+[0-9]+[0-9]")
$thingsbeforeregexmatch.substring(0,$indexx)
}
Ideally, this should work but since indexofany doesn't handle regex like that I'm stuck.

You can use the Regex.Match() method to perform a regex match. It'll return a MatchInfo object that has an Index property you can use:
Get-ChildItem c:\test | Where-Object {$_.PSIsContainer} | ForEach-Object {
# Test if folder's Name matches pattern
$match = [regex]::Match($_.Name, '[Ss]+[0-9]+[0-9]+[Ee]+[0-9]+[0-9]')
if($match.Success)
{
# Grab Index from the [regex]::Match() result
$Index = $Match.Index
# Substring using the index we obtained above
$ThingsBeforeMatch = $_.Name.Substring(0, $Index)
Write-Host $ThingsBeforeMatch
}
}
Alternatively, use the -match operator and the $Matches variable to grab the matched string and use that as an argument to IndexOf() (using RedLaser's sweet regex optimization):
if($_.Name -match 's+\d{2,}e+\d{2,}')
{
$Index = $_.Name.IndexOf($Matches[0])
$ThingsBeforeMatch = $_.Name.Substring(0,$Index)
}

You can use the Index property of the Match object. Example:
# Used regEx fom #RedLaser's comment
$regEx = [regex]'(?i)[s]+\d{2}[e]+\d{2}'
$testString = 'abcS00E00b'
$match = $regEx.Match($testString)
if ($match.Success)
{
$startingIndex = $match.Index
Write-Host "Match. Start index = $startingIndex"
}
else
{
Write-Host 'No match found'
}

How can I match these function calls, and extract the nmae of the function and the first argument?

I am trying to parse an array of elements. Those who match a pattern like the following:
Jim("jjanson", Customer.SALES);
I want to create a hash table like Jim => "jjanson"
How can I do this?
I can not match the lines using:
if($line =~ /\s*[A-Za-z]*"(.*),Customer.*\s*/)

You're not matching either the '(' after the name, nor the ' ' after the comma, before "Customer.".
I can get 'jjanson"' using this expression:
/\s*[A-Za-z]\(*"(.*), Customer.*\s*/
But I assume you don't want jjanson", so we need to modify it like so. (I tend to include the negative character class when I'm looking for simply-delimited stuff. So, in this case I'll make the expression "[^"]*"
/\s*[A-Za-z]\(*"([^"]+)", Customer.*\s*/
Also, I try not to depend upon whitespace, presence or number, I'm going to replace the space with \s*. That you didn't notice that you skipped the whitespace is a good illustration of the need to say "ignore a bunch of whitespace".
/\s*[A-Za-z]\(*"([^"]+)",\s*Customer.*\s*/
Now it's only looking for the sequence ',' + 'Customer' in the significant characters. Functionally, the same, if more flexible.
But since you only do one capture, I can't see what you'd map to what. So I'll do my own mapping:
my %records;
while ( my $line = $source->()) { # simply feed for a source of lines.
my ( $first, $user, $tag )
= $line = m/\s*(\p{Alpha}+)\s*\(\s*"([^"]+)",\s*Customer\.(\S+?)\)\/
;
$records{ $user }
= { first => $first
, username => $user
, tag => $tag
};
}
This is much more than you would tend to need in a one-off, quick solution. But I like to store as much of my input as seems significant.

Note that Jim("jjanson", Customer.SALES); matches the syntax of a function call with two arguments. You can thus abuse string eval:
#!/usr/bin/env perl
use strict;
use warnings;
use YAML::XS;
my $info = extract_first_arg(q{ Jim("jjanson", Customer.SALES);} );
print Dump $info;
sub extract_first_arg {
my $call = shift;
my ($name) = ($call =~ m{ \A \s* (\w+) }x);
unless ($name) {
warn "Failed to find function name in '$call'";
return;
}
my $username = eval sprintf(q{
package My::DangerZone;
no strict;
local *{ %s } = sub { $_[0] };
%s
}, $name, $call);
return { $name => $username };
}
Output:
---
Jim: jjanson
Or, you can abuse autoloading:
our $AUTOLOAD;
print Dump eval 'no strict;' . q{ Jim("jjanson", Customer.SALES); };
sub AUTOLOAD {
my ($fn) = ($AUTOLOAD =~ /::(\w+)\z/);
return { $fn => $_[0] };
}
I would not necessarily recommend using these methods, especially on input that is not in your control, and in a situation where this script has access to sensitive facilities.
On the other hand, I have, in the right circumstances, utilized this kind of thing in transforming one given set of information into something that can be used elsewhere.

Try this:
$line = 'Jim("jjanson", Customer.SALES)';
my %hashStore = (); #Jim("jjanson"
if($line=~m/^\s*([^\(\)]*)\(\"([^\"]*)\"/g) { $hashStore{$1} = $2; }
use Data::Dumper;
print Dumper \%hashStore;
Output:
$VAR1 = {
'Jim' => 'jjanson'
};

regular expression code

I need to find match between two tab delimited files files like this:
File 1:
ID1 1 65383896 65383896 G C PCNXL3
ID1 2 56788990 55678900 T A ACT1
ID1 1 56788990 55678900 T A PRO55
File 2
ID2 34 65383896 65383896 G C MET5
ID2 2 56788990 55678900 T A ACT1
ID2 2 56788990 55678900 T A HLA
what I would like to do is to retrive the matching line between the two file. What I would like to match is everyting after the gene ID
So far I have written this code but unfortunately perl keeps giving me the error:
use of "Use of uninitialized value in pattern match (m//)"
Could you please help me figure out where i am doing it wrong?
Thank you in advance!
use strict;
open (INA, $ARGV[0]) || die "cannot to open gene file";
open (INB, $ARGV[1]) || die "cannot to open coding_annotated.var files";
my #sample1 = <INA>;
my #sample2 = <INB>;
foreach my $line (#sample1) {
my #tab = split (/\t/, $line);
my $chr = $tab[1];
my $start = $tab[2];
my $end = $tab[3];
my $ref = $tab[4];
my $alt = $tab[5];
my $name = $tab[6];
foreach my $item (#sample2){
my #fields = split (/\t/,$item);
if ( $fields[1] =~ m/$chr(.*)/
&& $fields[2] =~ m/$start(.*)/
&& $fields[4] =~ m/$ref(.*)/
&& $fields[5] =~ m/$alt(.*)/
&& $fields[6] =~ m/$name(.*)/
) {
print $line, "\n", $item;
}
}
}

On its surface your code seems to be fine (although I didn't debug it). If you don't have an error I cannot spot, could be that the input data has RE special character, which will confuse the regular expression engine when you put it as is (e.g. if any of the variable has the '$' character). Could also be that instead of tab you have spaces some where, in which case you'll indeed get an error, because your split will fail.
In any case, you'll be better off composing just one regular expression that contains all the fields. My code below is a little bit more Perl Idiomatic. I like using the implicit $_ which in my opinion makes the code more readable. I just tested it with your input files and it does the job.
use strict;
open (INA, $ARGV[0]) or die "cannot open file 1";
open (INB, $ARGV[1]) or die "cannot open file 2";
my #sample1 = <INA>;
my #sample2 = <INB>;
foreach (#sample1) {
(my $id, my $chr, my $start, my $end, my $ref, my $alt, my $name) =
m/^(ID\d+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+(\w+)/;
my $rex = "^ID\\d+\\s+$chr\\s+$start\\s+$end\\s+$ref\\s+$alt\\s+$name\\s+";
#print "$rex\n";
foreach (#sample2) {
if( m/$rex/ ) {
print "$id - $_";
}
}
}
Also, how regular is the input data? Do you have exactly one tab between the fields? If that is the case, there is no point to split the lines into 7 different fields - you only need two: the ID portion of the line, and the rest. The first regex would be
(my $id, my $restOfLine) = m/^(ID\d+)\s+(.*)$/;
And you are searching $restOfLine within the second file in a similar technique as above.
If your files are huge and performance is an issue, you should consider putting the first regular expressions (or strings) in a map. That will give you O(n*log(m)) where n and m are the number of lines in each file.
Finally, I have a similar challenge when I need to compare logs. The logs are supposed to be identical, with the exception of a time mark at the beginning of each line. But more importantly: most lines are the same and in order. If this is what you have, and it make sense for you, you can:
First remove the IDxxx from each line: perl -pe "s/ID\d+ +//" file >cleanfile
Then use BeyondCompare or Windiff to compare the files.

I played a bit with your code. What you wrote there was actually three loops:
one over the lines of the first file,
one over the lines of the second file, and
one over all fields in these lines. You manually unrolled this loop.
The rest of this answer assumes that the files are strictly tab-seperated and that any other whitespace matters (even at the end of fields and lines).
Here is a condensed version of the code (assumes open filehandles $file1, $file2, and use strict):
my #sample2 = <$file2>;
SAMPLE_1:
foreach my $s1 (<$file1>) {
my (undef, #fields1) = split /\t/, $s1;
my #regexens = map qr{\Q$_\E(.*)}, #fields1;
SAMPLE_2:
foreach my $s2 (#sample2) {
my (undef, #fields2) = split /\t/, $s2;
for my $i (0 .. $#regexens) {
$fields2[$i] =~ $regexens[$i] or next SAMPLE_2;
}
# only gets here if all regexes matched
print $s1, $s2;
}
}
I did some optimisations: precompiling the various regexes and storing them in an array, quoting the contents of the fields etc. However, this algorithm is O(n²), which is bad.
Here is an elegant variant of that algorithm that knows that only the first field is different — the rest of the line has to be the same character for character:
my #sample2 = <$file2>;
foreach my $s1 (<$file1>) {
foreach my $s2 (#sample2) {
print $s1, $s2 if (split /\t/, $s1, 2)[1] eq (split /\t/, $s2, 2)[1];
}
}
I just test for string equality of the rest of the line. While this algorithm is still O(n²), it outperforms the first solution roughly by an order of magnitude simply by avoiding braindead regexes here.
Finally, here is an O(n) solution. It is a variant of the previous one, but executes the loops after each other, not inside each other, therefore finishing in linear time. We use hashes:
# first loop via map
my %seen = map {reverse(split /\t/, $_, 2)}
# map {/\S/ ? $_ : () } # uncomment this line to handle empty lines
<$file1>;
# 2nd loop
foreach my $line (<$file2>) {
my ($id2, $key) = split /\t/, $line, 2;
if (defined (my $id1 = $seen{$key})) {
print "$id1\t$key";
print "$id2\t$key";
}
}
%seen is a hash that has the rest of the line as a key and the first field as a value. In the second loop, we retrieve the rest of the line again. If this line was present in the first file, we reconstruct the whole line and print it out. This solution is better than the others and scales well up- and downwards, because of its linear complexity

How about:
#!/usr/bin/perl
use File::Slurp;
use strict;
my ($ina, $inb) = #ARGV;
my #lines_a = File::Slurp::read_file($ina);
my #lines_b = File::Slurp::read_file($inb);
my $table_b = {};
my $ln = 0;
# Store all lines in second file in a hash with every different value as a hash key
# If there are several identical ones we store them also, so the hash values are lists containing the id and line number
foreach (#lines_b) {
chomp; # strip newlines
$ln++; # count current line number
my ($id, $rest) = split(m{[\t\s]+}, $_, 2); # split on whitespaces, could be too many tabs or spaces instead
if (exists $table_b->{$rest}) {
push #{ $table_b->{$rest} }, [$id, $ln]; # push to existing list if we already found an entry that is the same
} else {
$table_b->{$rest} = [ [$id, $ln] ]; # create new entry if this is the first one
}
}
# Go thru first file and print out all matches we might have
$ln = 0;
foreach (#lines_a) {
chomp;
$ln++;
my ($id, $rest) = split(m{[\t\s]+}, $_, 2);
if (exists $table_b->{$rest}) { # if we have this entry print where it is found
print "$ina:$ln:\t\t'$id\t$rest'\n " . (join '\n ', map { "$inb:$_->[1]:\t\t'$_->[0]\t$rest'" } #{ $table_b->{$rest} }) . "\n";
}
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

using a partial variable as part of regular expression string - regex

Related

PowerShell regex to get just hex part in strings

Matching Something Against Array List Using Where Object

Get index of regex in filename in powershell

How can I match these function calls, and extract the nmae of the function and the first argument?

regular expression code

Categories

Resources