Getting first two strings between slashes - regex

I have a string, alpha/beta/charlie/delta
I'm trying to extract out the string alpha/beta including the forward slash.
I'm able to accomplish this with split and joining the first and second result, but I feel like a regex might be better suited.
Depending on how many slashes there are as well will determine how many strings I need to grab, e.g. if there's 4 slashes get the first two strings, if there's 5, then grab first three. Again, my problem is extracting the slash with the string.

As Mathias already noticed - Split+Join is a perfectly valid solution:
$StringArray = #(
'alpha/beta/charlie/delta',
'alpha/beta/charlie/delta/omega'
'alpha/beta/charlie/gamma/delta/omega'
)
foreach ($String in $StringArray) {
$StringSplit = $String -split '/'
($StringSplit | Select-Object -First ($StringSplit.Count - 2) ) -join '/'
}

A little long, but I did it without regex:
$string = 'alpha/beta/charlie/delta/gamma'
# Count number of '/'
$count = 0
for( $i = 0; $i -lt $string.Length; $i++ ) {
if( $string[ $i ] -eq '/' ) {
$count = $count + 1
}
}
# Depending on the number of '/' you can create a mathematical equation, or simply do an if-else ladder.
# In this case, if count of '/' = 3, get first 2 strings, if count = 4, get first 3 strings.
function parse-strings {
Param (
$number_of_slashes,
$string
)
$all_slash = $number_of_slashes
$to_get = $number_of_slashes - 1
$counter = 0
for( $j = 0; $j -lt $string.Length; $j++ ) {
if( $string[ $j ] -eq '/' ) {
$counter = $counter + 1
}
if( $counter -eq $to_get ) {
( $string[ 0 .. ( $j - 1 ) ] -join "" )
break
}
}
}
parse-strings -number_of_slashes $count -string $string

You can try the .split() .net method where you define in parentheses where to split (on which character).
Then use the join operator “-join” to join your elements from the array
For your matter of concern use it like this:
$string = 'alpha/beta/charlie/delta/gamma'
$string = $string.split('/')
$string = "$($string[0])" + "/" + "$($string[1])"
$string
And so on...

Related

Powershell regex replacement expressions

I've followed the excellent solution in this article:
PowerShell multiple string replacement efficiency
to try and normalize telephone numbers imported from Active Directory. Here is an example:
$telephoneNumbers = #(
'+61 2 90237534',
'04 2356 3713'
'(02) 4275 7954'
'61 (0) 3 9635 7899'
'+65 6535 1943'
)
# Build hashtable of search and replace values.
$replacements = #{
' ' = ''
'(0)' = ''
'+61' = '0'
'(02)' = '02'
'+65' = '001165'
'61 (0)' = '0'
}
# Join all (escaped) keys from the hashtable into one regular expression.
[regex]$r = #($replacements.Keys | foreach { [regex]::Escape( $_ ) }) -join '|'
[scriptblock]$matchEval = { param( [Text.RegularExpressions.Match]$matchInfo )
# Return replacement value for each matched value.
$matchedValue = $matchInfo.Groups[0].Value
$replacements[$matchedValue]
}
# Perform replace over every line in the file and append to log.
$telephoneNumbers |
foreach {$r.Replace($_,$matchEval)}
I'm having problems with the formatting of the match expressions in the $replacements hashtable. For example, I would like to match all +61 numbers and replace with 0, and match all other + numbers and replace with 0011.
I've tried the following regular expressions but they don't seem to match:
'^+61'
'^+[^61]'
What am I doing wrong? I've tried using \ as an escape character.
I've done some re-arrangement of this, I'm not sure if it works for your whole situation but it gives the right results for the example.
I think the key is not to try and create one big regex from the hashtable, but rather to loop over it and check the values in it against the telephone numbers.
The only other change I made was moving the ' ','' replacement from the hash into the code that prints the replacement phone number, as you want this to run in every scenario.
Code is below:
$telephoneNumbers = #(
'+61 2 90237534',
'04 2356 3713'
'(02) 4275 7954'
'61 (0) 3 9635 7899'
'+65 6535 1943'
)
$replacements = #{
'(0)' = ''
'+61' = '0'
'(02)' = '02'
'+65' = '001165'
}
foreach ($t in $telephoneNumbers) {
$m = $false
foreach($r in $replacements.getEnumerator()) {
if ( $t -match [regex]::Escape($r.key) ) {
$m = $true
$t -replace [regex]::Escape($r.key), $r.value -replace ' ', '' | write-output
}
}
if (!$m) { $t -replace ' ', '' | write-output }
}
Gives:
0290237534
0423563713
0242757954
61396357899
00116565351943

Character match count between strings in Perl

I have a string (say string 1) that needs to be matched to another string (string2). Both the strings will have the same length and are case in-sensitive.
I want to print the number of character matches between both the strings.
E.g.: String 1: stranger
String 2: strangem
Match count = 7
I tried this:
$string1 = "stranger";
$string2 = "strangem";
my $count = $string1 =~ m/string2/ig;
print "$count\n";
How can I fix this?
Exclusive or, then count the null characters (where the strings were the same):
my $string1 = "stranger";
my $string2 = "strangem";
my $count = ( lc $string1 ^ lc $string2 ) =~ tr/\0//;
print "$count\n";
I missed the "case in-sensitive" bit.
You can use substr for that:
#!/usr/bin/perl
use warnings;
use strict;
my $string1=lc('stranger');
my $string2=lc('strangem');
my $count=0;
for (0..length($string1)-1) {
$count++ if substr($string1,$_,1) eq substr($string2,$_,1);
}
print $count; #prints 7
Or you can use split to get all characters as an array, and loop:
#!/usr/bin/perl
use warnings;
use strict;
my $string1=lc('stranger');
my $string2=lc('strangem');
my $count=0;
my #chars1=split//,$string1;
my #chars2=split//,$string2;
for (0..$#chars1) {
$count++ if $chars1[$_] eq $chars2[$_];
}
print $count; #prints 7
(fc gives more accurate results than lc, but I went for backwards compatibility.)
Not tested
sub cm
{
my #a = shift;
my #b = shift;
# First match prefix of string:
my $n = 0;
while ($n < $#a && $n < $#b && $a[$n] eq $b[$n]) {
++$n;
}
# Then skip one char on either side, and recurse.
if ($n < $#a && $n < $#b) {
# Match rest by skipping one place:
my $n2best = 0;
my $n2a = cm(splice(#a, $n), splice(#b, $n + 1));
$n2best = $n2a;
my $n2b = cm(splice(#a, $n + 1), splice(#b, $n));
$n2best = $n2b if $n2b > $n2best;
my $n2c = cm(splice(#a, $n + 1), splice(#b, $n + 1));
$n2best = $n2c if $n2c > $n2best;
$n += $n2best;
}
return $n;
}
sub count_matches
{
my $a = shift;
my $b = shift;
my #a_chars = split //, $a;
my #b_chars = split //, $b;
return cm(#a_chars, #b_chars);
}
print count_matches('stranger', 'strangem')

Use of uninitialized value in perl

So I have this code:
use warnings;
use strict;
my #arr = ("stuff (06:13)", "more stuff (02:59)", "extra stuff (00:00)");
my #new_arr = map { /\((\d+:\d+)\)/ ; $1 } #arr;
my ( $sum, $hrs, $mins );
$sum = 0;
for my $t (#new_arr) {
my ( $h, $m ) = split m/:/, $t;
my $hm = $h * 3600;
my $tm = $m * 60;
$sum = $sum + $hm + $tm;
}
$mins = sprintf( "%02d", ( $sum % 3600 ) / 60 );
$hrs = int( $sum / 3600 );
print "$hrs:$mins\n";
and I got uninitialized value error
Use of uninitialized value $t in split at DR/Hello World/test.pl line 14.
Use of uninitialized value $h in multiplication (*) at DR/test.pl line 16.
Use of uninitialized value $m in multiplication (*) at DR/test.pl line 17.
so how can I fix that?
stuff (3+06:13) doesn't match /\((\d+:\d+)\)/, so $1 is left untouched, so $1 contains undef, so undef ends up in #arr.
It's unwise to use $1 without making sure the pattern matches. Either adjust the pattern to make sure it always matches,
/\(([\d+]+:\d+)\)/
Or filter out the results that don't match.
my #new_arr = map { /\((\d+:\d+)\)/ ? $1 : () } #arr;
-or-
my #new_arr = map { /\((\d+:\d+)\)/ } #arr;
You have a similar problem with map { /\((\d+\++)/ ; $1 }.
You're missing a capture in two of your regexes.
Your first one:
my #new_arr = map { /\((\d+:\d+)\)/ ; $1 } #arr;
Misses a capture in the first instance:
$VAR2 = [
undef,
'02:59',
'00:00'
];
Which can be corrected (see below).
Your second capture also fails to capture anything:
my #x = map {/\((\d+\++)/ ; $1 } #arr;
See:
$VAR3 = [
'3+',
undef,
undef
];
This is because your asking it to find a digit \d followed by a literal + one or more times, which only occurs in $arr[0]. Below i've adjusted to capture 0 if no capture is found:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my #arr = ("stuff (3+06:13)", "more stuff (02:59)", "extra stuff (00:00)");
my #new_arr = map { /\(.*?(\d+:\d+)\)/ ; $1 } #arr;
my #x = map {/\((\d+\+)|(0)/ ; $1 // $2 } #arr;
my ( $sum, $hrs, $mins );
$sum = 0;
for my $t (#new_arr) {
my ( $h, $m ) = split m/:/, $t;
my $hm = $h * 3600;
my $tm = $m * 60;
$sum = $sum + $hm + $tm;
}
$mins = sprintf( "%02d", ( $sum % 3600 ) / 60 );
$hrs = int( $sum / 3600 );
print "$hrs:$mins\n";
print Dumper (\#arr, \#new_arr, \#x);
$VAR1 = [
'stuff (3+06:13)',
'more stuff (02:59)',
'extra stuff (00:00)'
];
$VAR2 = [
'06:13',
'02:59',
'00:00'
];
$VAR3 = [
'3+',
'0',
'0'
];
Output:
9:12

regex for a number, number incremented by one and two

Give a number x, I wonder if there is any regex that matches for x and x+1 and x+2.
Thanks,
The best approach would probably be to do something like:
my $x = 3;
my $regex = join "|", $x, $x+1, $x+2;
for (0 .. 10) {
print "$_\n" if /$regex/;
}
But if you want, you can use interpolation directly within the regexp:
my $x = 3;
for (0 .. 10) {
print "$_\n" if /$x|${\($x+1)}|${\($x+2)}/;
}
Output for both:
3
4
5
I personally think the latter is a lot less readable though.
String contains:
my $pat = join '|', $x, $x+1, $x+2;
$s =~ /(?<![0-9])(?:$pat)(?![0-9])/ # Assumes non-negative integers
Exact match:
my $pat = join '|', $x, $x+1, $x+2;
$y =~ /^(?:$pat)\z/
$y == $x || $y == $x+1 || $y == $x+2 # Most straightfoward
$x <= $y && $y <= $x+2 # Possibly clearest
Exact match (More exotic):
grep $y == $x + $_, 0..2
$y ~~ [ map $x_+$_, 0..2 ]
You could use (??{...}):
use re qw'eval';
/^ (?: $x | (??{ $x+1 }) | (??{ $x+2 }) ) $/x;
I would like to say that it make more sense to use $":
local $" #" # fix highlighting
= '|';
/^#{[ $x, $x+1, $x+2 ]}$/;
/^#{[ $x .. $x+2 ]}$/;
my #match = ( $x, $x+1, $x+2 );
/^#match$/;
I first thought of using index like so:
index( $source, $x + 2 );
But then, for $x=1, it just looks for a '3' anywhere in the string, matching 31, 23. So it appears that you might want to use a regex to make sure that it's an isolated string of digits.
/(?<!\d)${\( $x + 2 )}(?!\d)/
Given your other variable--let's call it $y--why not just check whether or not $y==$x, $y-$x==1, or $y-$x==2?
As ghoti pointed out in his/her comment, regular expressions aren't the tool for this.
For X = 10, regex should be \b(?:11|12)\b

How to automagically create pattern based on real data?

I have many vendors in database, they all differ in some aspect of their data. I'd like to make data validation rule which is based on previous data.
Example:
A: XZ-4, XZ-23, XZ-217
B: 1276, 1899, 22711
C: 12-4, 12-75, 12
Goal: if user inputs string 'XZ-217' for vendor B, algorithm should compare previous data and say: this string is not similar to vendor B previous data.
Is there some good way/tools to achieve such comparison? Answer could be some generic algoritm or Perl module.
Edit:
The "similarity" is hard to define, i agree. But i'd like to catch to algorithm, which could analyze previous ca 100 samples and then compare the outcome of analyze with new data. Similarity may based on length, on use of characters/numbers, string creation patterns, similar beginning/end/middle, having some separators in.
I feel it is not easy task, but on other hand, i think it has very wide use. So i hoped, there is already some hints.
You may want to peruse:
http://en.wikipedia.org/wiki/String_metric and http://search.cpan.org/dist/Text-Levenshtein/Levenshtein.pm (for instance)
Joel and I came up with similar ideas. The code below differentiates 3 types of zones.
one or more non-word characters
alphanumeric cluster
a cluster of digits
It creates a profile of the string and a regex to match input. In addition, it also contains logic to expand existing profiles. At the end, in the task sub, it contains some pseudo logic which indicates how this might be integrated into a larger application.
use strict;
use warnings;
use List::Util qw<max min>;
sub compile_search_expr {
shift;
#_ = #{ shift() } if #_ == 1;
my $str
= join( '|'
, map { join( ''
, grep { defined; }
map {
$_ eq 'P' ? quotemeta;
: $_ eq 'W' ? "\\w{$_->[1],$_->[2]}"
: $_ eq 'D' ? "\\d{$_->[1],$_->[2]}"
: undef
;
} #$_
)
} #_ == 1 ? #{ shift } : #_
);
return qr/^(?:$str)$/;
}
sub merge_profiles {
shift;
my ( $profile_list, $new_profile ) = #_;
my $found = 0;
PROFILE:
for my $profile ( #$profile_list ) {
my $profile_length = #$profile;
# it's not the same profile.
next PROFILE unless $profile_length == #$new_profile;
my #merged;
for ( my $i = 0; $i < $profile_length; $i++ ) {
my $old = $profile->[$i];
my $new = $new_profile->[$i];
next PROFILE unless $old->[0] eq $new->[0];
push( #merged
, [ $old->[0]
, min( $old->[1], $new->[1] )
, max( $old->[2], $new->[2] )
]);
}
#$profile = #merged;
$found = 1;
last PROFILE;
}
push #$profile_list, $new_profile unless $found;
return;
}
sub compute_info_profile {
shift;
my #profile_chunks
= map {
/\W/ ? [ P => $_ ]
: /\D/ ? [ W => length, length ]
: [ D => length, length ]
}
grep { length; } split /(\W+)/, shift
;
}
# Psuedo-Perl
sub process_input_task {
my ( $application, $input ) = #_;
my $patterns = $application->get_patterns_for_current_customer;
my $regex = $application->compile_search_expr( $patterns );
if ( $input =~ /$regex/ ) {}
elsif ( $application->approve_divergeance( $input )) {
$application->merge_profiles( $patterns, compute_info_profile( $input ));
}
else {
$application->escalate(
Incident->new( issue => INVALID_FORMAT
, input => $input
, customer => $customer
));
}
return $application->process_approved_input( $input );
}
Here is my implementation and a loop over your test cases. Basically you give a list of good values to the function and it tries to build a regex for it.
output:
A: (?^:\w{2,2}(?:\-){1}\d{1,3})
B: (?^:\d{4,5})
C: (?^:\d{2,2}(?:\-)?\d{0,2})
code:
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils qw'uniq each_arrayref';
my %examples = (
A => [qw/ XZ-4 XZ-23 XZ-217 /],
B => [qw/ 1276 1899 22711 /],
C => [qw/ 12-4 12-75 12 /],
);
foreach my $example (sort keys %examples) {
print "$example: ", gen_regex(#{ $examples{$example} }) || "Generate failed!", "\n";
}
sub gen_regex {
my #cases = #_;
my %exploded;
# ex. $case may be XZ-217
foreach my $case (#cases) {
my #parts =
grep { defined and length }
split( /(\d+|\w+)/, $case );
# #parts are ( XZ, -, 217 )
foreach (#parts) {
if (/\d/) {
# 217 becomes ['\d' => 3]
push #{ $exploded{$case} }, ['\d' => length];
} elsif (/\w/) {
#XZ becomes ['\w' => 2]
push #{ $exploded{$case} }, ['\w' => length];
} else {
# - becomes ['lit' => '-']
push #{ $exploded{$case} }, ['lit' => $_ ];
}
}
}
my $pattern = '';
# iterate over nth element (part) of each case
my $ea = each_arrayref(values %exploded);
while (my #parts = $ea->()) {
# remove undefined (i.e. optional) parts
my #def_parts = grep { defined } #parts;
# check that all (defined) parts are the same type
my #part_types = uniq map {$_->[0]} #def_parts;
if (#part_types > 1) {
warn "Parts not aligned\n";
return;
}
my $type = $part_types[0]; #same so make scalar
# were there optional parts?
my $required = (#parts == #def_parts);
# keep the values of each part
# these are either a repitition or lit strings
my #values = sort uniq map { $_->[1] } #def_parts;
# these are for non-literal quantifiers
my $min = $required ? $values[0] : 0;
my $max = $values[-1];
# write the specific pattern for each type
if ($type eq '\d') {
$pattern .= '\d' . "{$min,$max}";
} elsif ($type eq '\w') {
$pattern .= '\w' . "{$min,$max}";
} elsif ($type eq 'lit') {
# quote special characters, - becomes \-
my #uniq = map { quotemeta } uniq #values;
# join with alternations, surround by non-capture grouup, add quantifier
$pattern .= '(?:' . join('|', #uniq) . ')' . ($required ? '{1}' : '?');
}
}
# build the qr regex from pattern
my $regex = qr/$pattern/;
# test that all original patterns match (#fail should be empty)
my #fail = grep { $_ !~ $regex } #cases;
if (#fail) {
warn "Some cases fail for generated pattern $regex: (#fail)\n";
return '';
} else {
return $regex;
}
}
To simplify the work of finding the pattern, optional parts may come at the end, but no required parts may come after optional ones. This could probably be overcome but it might be hard.
If there was a Tie::StringApproxHash module, it would fit the bill here.
I think you're looking for something that combines the fuzzy-logic functionality of String::Approx and the hash interface of Tie::RegexpHash.
The former is more important; the latter would make light work of coding.