How to capture string before the symbol - regex

Morning all,
I want to capture the string before the colon,
then compare it with the values after the colon
and remove any values that are equal to the one before the colon.
For example:
aaa:aaa-bbb-ccc
Output:
aaa others:bbb,ccc
my code below
$string = "aaa:aaa-bbb-ccc";
$first =~ /(:.*\)/; # get aaa before the colon
$others =$string=~ s/$first//; # remove the same values after the colon
Can you please help me ?? Thank you..

This works:
use strict;
use warnings;
my $s='aaa:aaa-bbb-ccc';
if ($s=~/^([^:]+):/) {
my $first=$1;
$s=~s/\Q$1\E://;
print "$first others:" . join ',', grep { $_ ne $first } split '-', $s;
}
# aaa others:bbb,ccc

Matching a variable in a string in Perl from the end

I want to match a variable character in a given string, but from the end.
Ideas on how to do this action?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ # I want to match the char, searching from the end; $ doesn't work
print "success";
}
}
Thank you for your assistance.
There is no regex modifier that would force Perl regex engine to parse the string from right to left. Thus, the most convenient way to achieve that is via a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead fails the match if another $char is found after any zero or more characters other than line break characters (add the s modifier if the input string may span several lines).
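A minimal sketch of the lookahead approach, reporting where the last occurrence sits. The helper name last_offset is hypothetical, and the \Q...\E quoting is an addition that is not in the answer above; it is there so that characters such as . or $ in $char are treated literally.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based offset of the last occurrence of $char
# in $string, or -1 if $char does not occur at all.
sub last_offset {
    my ( $string, $char ) = @_;

    # \Q...\E quotes regex metacharacters in $char.
    # /s lets . in the lookahead cross newlines as well.
    if ( $string =~ m/\Q$char\E(?!.*\Q$char\E)/s ) {
        return $-[0];    # start offset of the successful match
    }
    return -1;
}

print last_offset( 'abcabc', 'b' ), "\n";    # 4
print last_offset( 'a.b.c',  '.' ), "\n";    # 3
print last_offset( 'abc',    'x' ), "\n";    # -1
```

The first b at offset 1 is rejected because another b follows it; only the b at offset 4 has no later b, so that is the one the pattern accepts.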
The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K marks the position of the match result beginning.
Other ways:
You can also reverse the string and use your previous pattern.
You can search for all occurrences and take the last item in the list.
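The two other ways mentioned above could look roughly like this. Both helper names are hypothetical, and both assume that $char is a non-empty literal string.

```perl
use strict;
use warnings;

# Collect the offset of every occurrence and keep the last one.
sub last_offset_by_scan {
    my ( $string, $char ) = @_;
    my @offsets;

    # In scalar context, /g resumes after the previous match each time.
    while ( $string =~ /\Q$char\E/g ) {
        push @offsets, $-[0];
    }
    return @offsets ? $offsets[-1] : -1;
}

# Reverse both strings; the first match in the reversed string
# corresponds to the last match in the original.
sub last_offset_by_reverse {
    my ( $string, $char ) = @_;
    my $rev   = reverse $string;    # scalar context reverses the characters
    my $rchar = reverse $char;

    return -1 unless $rev =~ /\Q$rchar\E/;
    return length($string) - $+[0];    # map the end offset back
}

print last_offset_by_scan( 'abcabc', 'b' ),    "\n";    # 4
print last_offset_by_reverse( 'abcabc', 'b' ), "\n";    # 4
```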
I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string.
You can do that like this:
sub removeCharFromEnd {
my ( $string, $char ) = @_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
If you just want to modify the string in place then you should write
$_[0] =~ s/$char\z//
using whichever substitution you choose. Then you can do this
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output
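A sketch of the in-place variant, relying on the fact that the elements of @_ are aliases for the caller's arguments. The sub name remove_char_in_place is hypothetical, and the \Q...\E quoting is an addition to the substitution shown above.

```perl
use strict;
use warnings;

# Hypothetical in-place version: edits the caller's variable directly.
sub remove_char_in_place {
    my $char = $_[1];

    # $_[0] is an alias for the first argument, not a copy of it.
    # s/// returns the number of substitutions made (false if none).
    return $_[0] =~ s/\Q$char\E\z//;
}

my $s = 'abc';
remove_char_in_place( $s, 'c' );
print "$s\n";    # ab
```

Because the first argument is modified through the alias, it has to be a variable; passing a literal string there dies with a "Modification of a read-only value attempted" error.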
To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift @_;
my $char = quotemeta reverse shift @_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";

perl regex square brackets and single quotes

Have this string:
ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722
The data is repeated.
I need to remove the []' characters from the data so it looks like this:
ABC,-0.5,10Y,10Y,TEST,ABC.1000145721ABC,-0.5,20Y,10Y,TEST,ABC.1000145722
I'm also trying to split the data to assign it to variables as seen below:
my($currency, $strike, $tenor, $tenor2,$ado_symbol) = split /,/, $_;
This works for everything but the ['TEST'] section. Should I remove the []' characters first then keep my split the same or is there an easier way to do this?
Thanks
It's useful to know that split takes a regex. (It will even let you capture, but the captured text is then inserted into the returned list, which is why I've used (?: ... ) for non-capturing groups.)
I observe your data only has [' right next to the delimiter - so how about:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
while ( <DATA> ) {
chomp;
my @fields = split /(?:\'])?,(?:\[\')?/;
print Dumper \@fields;
}
__DATA__
ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722
Output:
$VAR1 = [
'ABC',
'-0.5',
'10Y',
'10Y',
'TEST',
'ABC.1000145721ABC',
'-0.5',
'20Y',
'10Y',
'TEST',
'ABC.1000145722'
];
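To illustrate the remark that capturing groups in a split pattern end up in the returned list, here is a small sketch on a made-up input string (the string and variable names are only for illustration):

```perl
use strict;
use warnings;

my $line = 'a,b;c';

my @plain    = split /[,;]/,    $line;    # no groups
my @captured = split /([,;])/,  $line;    # capturing group
my @grouped  = split /(?:,|;)/, $line;    # non-capturing group

print join( '|', @plain ),    "\n";    # a|b|c
print join( '|', @captured ), "\n";    # a|,|b|;|c
print join( '|', @grouped ),  "\n";    # a|b|c
```

With the capturing group the separators themselves come back as extra list elements, which is rarely what you want when assigning the fields to variables.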
my $str = "ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722";
$str =~ s/\['|'\]//g;
print $str;
output is
ABC,-0.5,10Y,10Y,TEST,ABC.1000145721ABC,-0.5,20Y,10Y,TEST,ABC.1000145722
Now you can split.
Clean up $ado_symbol after split:
$ado_symbol =~ s/^\['//;
$ado_symbol =~ s/'\]$//;
You can use a global regex match to find all substrings that are not a comma, a single quote, or a square bracket
Like this
use strict;
use warnings 'all';
my $s = q{ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722};
my @data = $s =~ /[^,'\[\]]+/g;
my ( $currency, $strike, $tenor, $tenor2, $ado_symbol ) = @data;
print "\$currency = $currency\n";
print "\$strike = $strike\n";
print "\$tenor = $tenor\n";
print "\$tenor2 = $tenor2\n";
print "\$ado_symbol = $ado_symbol\n";
output
$currency = ABC
$strike = -0.5
$tenor = 10Y
$tenor2 = 10Y
$ado_symbol = TEST
Another alternative
my $str = "ABC,-0.5,10Y,10Y,['TEST'],ABC.1000145721ABC,-0.5,20Y,10Y,['TEST'],ABC.1000145722";
my ($currency, $strike, $tenor, $tenor2,$ado_symbol) = map{ s/[^A-Z0-9\.-]//g; $_} split ',',$str;
print "$currency, $strike, $tenor, $tenor2, $ado_symbol",$/;
Output is:
ABC, -0.5, 10Y, 10Y, TEST

Array splitting with regex

I have an array that holds only one string; it is the value of a hash key:
%hash = ("key"=>["Value1 unit(1), Value2 unit(2), Value3 unit"])
How can I strip the " unit" parts from the hash value and save the result to an array?
The new array should look like this:
@array=("Value1", "Value2", "Value3")
I've tried it this way:
@array=split(/\s\w\(\w\)\,/, $hash{key});
Split the string on comma, then strip off the unit at the end.
map(s/\s.*$//, @array = split(/,\s*/, $hash{'key'}[0]));
Here's another option:
use strict;
use warnings;
use Data::Dumper;
my %hash = ("key"=>["1567 I(u), 2070 I(m), 2.456e-2 V(m), 417 ---, 12 R(k),"]);
my @array = $hash{'key'}->[0] =~ /(\S+)\s+\S+,?/g;
print Dumper \@array;
Output:
$VAR1 = [
'1567',
'2070',
'2.456e-2',
'417',
'12'
];
And another option:
my %hash = ("key"=>["Value1 unit(1), Value2 unit(2), Value3 unit"]);
my $i;
my @array = grep { ++$i % 2 } split /,?\s/, $hash{key}->[0];
$, = "\n";
print @array;
You can do it with a single relatively straightforward regex:
@array = $hash{'key'}[0] =~ m/\s*(\S+)\s+\S+,?/g;
The important thing to know is that, in list context with the /g flag, the match operator returns every captured group ($1, $2, etc.) from all the matches it finds in the string. This regex has one such group.
My assumptions are:
The "Value1", "Value2" parts are just chunks of non-whitespace
The "unit(1)", "unit(2)", parts are also just chunks of non-whitespace
If these aren't valid assumptions, you can replace the two \S+ parts of the regex with something more specific that matches your data.
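A short sketch of how the number of capture groups shapes the list that a /g match returns in list context. The second pattern and the %unit_for hash are illustrative additions, not part of the answer above.

```perl
use strict;
use warnings;

my $text = 'Value1 unit(1), Value2 unit(2), Value3 unit';

# One group: one element per match.
my @values = $text =~ /(\S+)\s+\S+,?/g;

# Two groups: two elements per match, which can be read as
# key/value pairs and assigned straight to a hash.
my %unit_for = $text =~ /([^\s,]+)\s+([^\s,]+)/g;

print "@values\n";                 # Value1 Value2 Value3
print "$unit_for{Value2}\n";       # unit(2)
```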

pattern matching in regular expression (Perl)

Make a pattern that will match three consecutive copies of whatever is currently contained in $what. That is, if $what is fred, your pattern should match fredfredfred. If $what is fred|barney, your pattern should match fredfredbarney, barneyfredfred, barneybarneybarney, or many other variations. (Hint: You should set $what at the top of the pattern test program with a statement like my $what = 'fred|barney';)
But my solution to this is just too easy, so I'm assuming it's wrong. My solution is:
#!/usr/bin/perl
use warnings;
use strict;
while (<>){
chomp;
if (/fred|barney/ig) {
print "pattern found! \n";
}
}
It displays what I want, and I didn't even have to save the pattern in a variable. Can someone help me through this, or enlighten me if I'm understanding the problem wrong?
This example should clear up what was wrong with your solution:
my @tests = qw(xxxfooxx oofoobar bar bax rrrbarrrrr);
my $str = 'foo|bar';
for my $test (@tests) {
my $match = $test =~ /$str/ig ? 'match' : 'not match';
print "$test did $match\n";
}
OUTPUT
xxxfooxx did match
oofoobar did match
bar did match
bax did not match
rrrbarrrrr did match
SOLUTION
#!/usr/bin/perl
use warnings;
use strict;
# notice the example has the `|`. Meaning
# match "fred" or "barney" 3 times.
my $str = 'fred|barney';
my @tests = qw(fred fredfredfred barney barneybarneybarny barneyfredbarney);
for my $test (@tests) {
if( $test =~ /^($str){3}$/ ) {
print "$test matched!\n";
} else {
print "$test did not match!\n";
}
}
OUTPUT
$ ./test.pl
fred did not match!
fredfredfred matched!
barney did not match!
barneybarneybarny did not match!
barneyfredbarney matched!
use strict;
use warnings;
my $s="barney/fred";
my @ra=split("/", $s);
my $test="barneybarneyfred"; #etc, this will work on all permutations
if ($test =~ /^(?:$ra[0]|$ra[1]){3}$/)
{
print "Valid\n";
}
else
{
print "Invalid\n";
}
split breaks the string apart on "/". (?:$ra[0]|$ra[1]) groups "barney" or "fred" without capturing, and {3} requires exactly three copies. Add an i after the closing "/" if case doesn't matter. The ^ anchors the match at the start of the string and the $ anchors it at the end.
EDIT:
If you need the format to be barney\fred, use this:
my $s="barney\\fred";
my @ra=split(/\\/, $s);
If you know that the matching will always be on fred and barney, then you can just replace $ra[0], $ra[1] with fred and barney.
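Since the exercise keeps the alternation in $what itself, a sketch that uses the variable directly might look like this. The sub name three_copies is hypothetical, and the pattern is left unanchored so that the three copies may appear anywhere in the input.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $test contains three consecutive
# copies of whatever pattern is stored in $what.
sub three_copies {
    my ( $what, $test ) = @_;

    # The (?: ) group matters: without it the pattern would read
    # fred|barney{3}, where {3} binds only to the final "y".
    return $test =~ /(?:$what){3}/ ? 1 : 0;
}

my $what = 'fred|barney';
for my $test (qw(fredfredbarney barneyfredfred fredbarney xxbarneybarneybarneyxx)) {
    printf "%-24s %s\n", $test,
        three_copies( $what, $test ) ? 'matches' : 'does not match';
}
```

Here fredbarney is the only input that does not match, because it holds just two copies.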

regex pattern match and extraction

I am trying to write a Perl regex to extract the words longer than 2 letters after the colon. For example, if the input is subject:I am about to write a regex, I need $variable to hold only the words of more than 2 letters, i.e., $variable = "subject:about write regex".
Here is my program where the regex and pattern matching is done but when I print, my variable is empty. What am I doing wrong?
#!/usr/bin/perl
while (<STDIN>) {
foreach my $query_part (split(/\s+/, $_)) {
my($query_part_subject) = $query_part =~ /([^\w\#\.]+)?((?:\w{3,}|[\$\#()+.])+)(?::(\w{3,}.+))?/ ;
print "query_part : $query_part_subject \n";
}
}
exit(0);
Try doing this :
#!/usr/bin/perl
use strict; use warnings;
while (<DATA>) {
s/.*?://;
print join "\n", grep { length($_) > 2 } split;
}
__DATA__
subject:a bb ccc dddd fffff
OUTPUT
ccc
dddd
fffff
NOTE
As I understand your question, this prints only the words longer than 2 characters that follow the : character.
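If the goal is the single string from the question, "subject:about write regex", a variation along these lines may be closer. The sub name keep_long_words is hypothetical, and the sketch assumes that everything before the first colon is the prefix to keep.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the text before the first colon and
# drops every word of 2 characters or fewer after it.
sub keep_long_words {
    my ($line) = @_;

    # Limit of 2: split only on the first colon.
    my ( $prefix, $rest ) = split /:/, $line, 2;
    return $line unless defined $rest;    # no colon: return unchanged

    my @long = grep { length($_) > 2 } split ' ', $rest;
    return "${prefix}:" . join ' ', @long;
}

print keep_long_words('subject:I am about to write a regex'), "\n";
# subject:about write regex
```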
It isn't clear from your question. Is this what you are looking for??
$txt='I am about to create regex';
$x='(I)';
$y='.*?';
$z='(am)';
$re1=$x.$y.$z;
if ($txt =~ m/$re1/is)
{
$var1=$1;
$word1=$2;
print "($var1) ($word1) \n";
}