perl insert and or replace string variable into a string - regex

My requirement is to check a string and, based on a particular set of characters, either insert a prefix string or replace part of the string with it.
$prefix = "DV1";
Following are my source $input strings:
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
The characters before the first . can be of any length.
After the first . the characters ABC are constant and are followed by any one character; these four characters will always be present in my input string.
After these four characters, the input string may have two alphanumeric characters (T2 in this case).
I need to check whether $input has "T2" (it can be any two alphanumeric characters) and, if it does, replace those two characters with D1 (any two characters from $prefix).
If $input does not have "T2", then insert $prefix.

This can be done quite straightforwardly with a single substitution, as this program demonstrates.
The pattern looks for the sequence .ABC followed by any non-dot character. The \K protects that part of the pattern from being changed. Then there may be two optional non-dot characters, followed by a dot. The replacement string is D1 if the two optional characters were present, or the value of $prefix if not.
use strict;
use warnings;
my $prefix = 'DV1';
while (<DATA>) {
s/\.ABC[^.]\K([^.]{2})?(?=\.)/$1 ? 'D1' : $prefix/e;
print;
}
__DATA__
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
output
SS7.ABCWD1.RSND.LTE1.QR
IT4.ABCED1.VCE2.QR
Y88.ABCND1.MIM.EDR2.QR
9C5.ABCSDV1.MIM.EDR2.QR

Here's some code you can try.
I am assuming that T2 can be any two alphanumeric characters; it could be A4 or 5B, for example.
#!/perl/bin
use v5.14;
use warnings;
my $str = "9C5.ABCS.MIM.EDR2.QR";
my $str1 = "SS7.ABCWT2.RSND.LTE1.QR";
my $prefix = "DV1";
my $file = 'D:\Programming\Perl\Learning Perl\chapter_1\demo.txt';
open my $fh, '<', $file or die $!;
foreach (<$fh>) {
if (m/(^.*\.ABC\w)\w{2}\./g) {
s/(^.*\.ABC\w)\w{2}\./$1D1\./;
} else {
s/(^.*\.ABC\w)\./$1$prefix\./;
}
say; # Takes current line as default($_). We don't need to specify it.
}
Input File: -
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCEX4.VCE2.QR
Y88.ABCN5W.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
Output: -
SS7.ABCWD1.RSND.LTE1.QR # Replace T2
IT4.ABCED1.VCE2.QR # Replace X4
Y88.ABCND1.MIM.EDR2.QR # Replace 5W
9C5.ABCSDV1.MIM.EDR2.QR # Does not contain T2. Add DV1

Try the following code, and tell me if it fits your needs :
#!/usr/bin/perl -l
use strict;
use warnings;
my $text =<<EOF;
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
EOF
my $prefix = "DV1";
for (split "\n", $text) {
s/^(\w+\.ABC\w)\w{2}(?=\.)/$1D1/ || s/^(\w+\.ABC\w)(?=\.)/$1$prefix/;
print;
}
OUTPUT
SS7.ABCWD1.RSND.LTE1.QR
IT4.ABCED1.VCE2.QR
Y88.ABCND1.MIM.EDR2.QR
9C5.ABCSDV1.MIM.EDR2.QR

Related

How to verify if a variable value contains a character and ends with a number using Perl

I am trying to check if a variable contains a character "C" and ends with a number, in minor version. I have :
my $str1 = "1.0.99.10C9";
my $str2 = "1.0.99.10C10";
my $str3 = "1.0.999.101C9";
my $str4 = "1.0.995.511";
my $str5 = "1.0.995.AC";
I would like a regex that prints a message if the variable has C in the 4th field and ends with a number, so for str1, str2 and str3 it should print "matches". I am trying the regexes below, but none of them work. Can you help me correct them?
my $str1 = "1.0.99.10C9";
if ( $str1 =~ /\D+\d+$/ ) {
print "Candy match1\n";
}
if ( $str1 =~ /\D+C\d+$/ ) {
print "Candy match2\n";
}
if ($str1 =~ /\D+"C"+\d+$/) {
print "candy match3";
}
if ($str1 =~ /\D+[Cc]+\d+$/) {
print "candy match4";
}
if ($str1 =~ /\D+\\C\d+$/) {
print "candy match5";
}
if ($str1 =~ /C[^.]*\d$/)
C matches the letter C.
[^.]* matches any number of characters that aren't .. This ensures that the match won't go across multiple fields of the version number, it will only match the last field.
\d matches a digit.
$ matches the end of the string. So the digit has to be at the end.
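As a quick check of the pattern described above, here is a minimal sketch that runs it against the five sample strings from the question. The helper name has_c_build is hypothetical and only used for illustration.

```perl
use strict;
use warnings;

# Sample version strings from the question.
my @versions = qw(1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC);

# has_c_build() is a hypothetical helper name: true when the last
# dot-separated field contains a C and the string ends in a digit.
sub has_c_build {
    my ($version) = @_;
    return $version =~ /C[^.]*\d$/ ? 1 : 0;
}

for my $version (@versions) {
    printf "%-14s %s\n", $version, has_c_build($version) ? 'matches' : 'no match';
}
```

Note that the pattern only guarantees that C sits in the last field; it does not count the fields, so a string with fewer than four fields can also match.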
I found it really helpful to use https://www.regextester.com/109925 to test and analyse my regex strings.
Let me know if this regex works for you:
((.*\.){3}(.*C\d{1}))
Following your format, this regex assumes three . with characters between them, and then after the third . it checks whether the rest of the string contains a C followed by a digit.
EDIT:
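If you want to make sure the string ends in a digit, and don't want to use it to check longer strings containing the formula, anchor it at both ends. Use \d+ rather than \d{1} here, so that a multi-digit suffix such as C10 still matches:
^((.*\.){3}(.*C\d+))$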
Let's look at what the regex should look like:
start{digit}.{digit}.{2-3 digits}.{2-3 digits}C{1-2 digits}end
very very strict qr/^1\.0\.9{2,3}\.101?C\d+\z/ - must start with 1.0.99[9]?.
very strict qr/^1\.0\.\d{2,3}\.\d{2,3}C\d{1,2}\z/ - must start with 1.0.
strict qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d{1,2}\z/
relaxed qr/^\d\.\d\.\d+\.\d+C\d+\z/
very relaxed qr/\.\d+C\d+\z/
use strict;
use warnings;
use feature 'say';
my @data = qw/1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC/;
#my $re = qr/^\d\.\d\.\d+\.\d+C\d+\z/;
my $re = qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d+\z/;
say '--- Input Data ---';
say for @data;
say '--- Matching -----';
for( @data ) {
say 'match ' . $_ if /$re/;
}
Output
--- Input Data ---
1.0.99.10C9
1.0.99.10C10
1.0.999.101C9
1.0.995.511
1.0.995.AC
--- Matching -----
match 1.0.99.10C9
match 1.0.99.10C10
match 1.0.999.101C9

Extract words between begin and end, \G \K

This pattern does the work
(?:\G(?!\A)|begin).*?\K(keyword)(?=.*end)
String:
begin
keyword
keyword
end
I get what I want (keyword keyword) in just one capture group, but if the string has this:
begin
keyword
keyword
end
keyword
end
I get three matches. How can I stop at the first end?
Can this pattern be improved or optimized?
I would hate to run across such a regex in code. Any small change and it's broken.
I'd open a filehandle on a reference to the string then read its lines. Skip everything until you run into the starting line, then read everything up to the ending line:
use v5.26;
my $string =<<~'HERE';
begin
keyworda
keywordb
end
keywordc
end
HERE
open my $fh, '<', \$string;
while( <$fh> ) { last if /\Abegin/ }
my @keywords;
while( <$fh> ) {
last if /^end/;
chomp;
push @keywords, $_;
}
say join "\n", @keywords;
This outputs:
keyworda
keywordb
Or, break it up into two regexes. One sets the starting position, then you repeatedly match as long as the line isn't the ending line. This is a bit cleaner, but some people may be confused by the global matching in scalar context:
use v5.26;
my $string =<<~'HERE';
begin
keyworda
keywordb
end
keywordc
end
HERE
my @keywords;
if( $string =~ / ^ begin \R /gmx ) {
while( $string =~ /\G (?!end \R) (\N+) \R /gx ) {
push @keywords, $1;
}
}
say join "\n", @keywords;
Use a regular expression and store the matches in an array:
my @result = $data =~ /begin\n(.*?)\nend/sg;
then print them to the console:
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $data = do { local $/; <DATA> };
my @result = $data =~ /begin\n(.*?)\nend/sg;
say '-' x 35 . "\n" . $_ for @result;
__DATA__
begin
keyword 1
keyword 2
end
keyword
end
keyword
begin
keyword 3
keyword 4
end
keyword
keyword
Output
-----------------------------------
keyword 1
keyword 2
-----------------------------------
keyword 3
keyword 4
You can use a negative lookahead inside the group to capture the text from begin to end without running past another begin or end.
my @keyws = ($data=~/begin((?:(?!begin|end).)*)end/sg);
use Data::Dumper;
print Dumper @keyws;
This is how I do it when processing LaTeX.
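To make the lookahead-in-a-group pattern above concrete, here is a minimal sketch on made-up sample data. The variable names and the sample text are assumptions for illustration only.

```perl
use strict;
use warnings;

# Sample text; the second "end" must not extend the first block.
my $data = "begin\nkeyword 1\nkeyword 2\nend\nkeyword\nend\nbegin\nkeyword 3\nend\n";

# Each captured block is the text between a "begin" and the nearest "end".
# The lookahead stops the dot from consuming the start of "begin" or "end".
my @blocks = $data =~ /begin((?:(?!begin|end).)*)end/sg;

# Split each block into its non-empty lines.
my @keywords;
for my $block (@blocks) {
    push @keywords, grep { length } split /\n/, $block;
}

print "$_\n" for @keywords;
```

One caveat of this approach: the lookahead rejects the letters "end" anywhere, so a keyword such as "pending" would terminate the block early. Anchoring the delimiters to whole lines avoids that.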

Perl pattern match not working as expected

I'm trying to match values, which may be comma separated, using a regex. Basically, I want to return true if any value in the string does NOT have 3g or 3k starting in the 3rd position.
My test code is as follows:
my @a = ('in3g123456,dh3k123456,dhec110101','dhec110101,dhec123456','in3g123456,dh3k123456', 'c3kasdf', 'usdfusdufs3gsdf' );
foreach (@a) {
print $_;
say $_ =~ /(?:^|,)\w{2}[^(?:3G|3K)]/i ? " true" : " false";
}
This returns
in3g123456,dh3k123456,dhec110101 true
dhec110101,dhec123456 true
in3g123456,dh3k123456 false
c3kasdf false <- whaaaaaaaat?
usdfusdufs3gsdf true
I don't understand why the 4th one is not true. Any help would be appreciated.
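[^(?:3G|3K)] is a character class, so it reads as "any single character other than (, ?, :, 3, G, |, K or )". With /i that also excludes k, so after \w{2} matches c3 the class fails on k:
          failed
             v
        c3   kasdf
/(?:^|,)\w{2}[^(?:3G|3K)]/i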
Use this:
/(?:^|,)\w{2}(?!3G|3K)/i
Demo: https://regex101.com/r/P2XsgN/1.
How about /\b\w{2}(?!3g|3k)/i.
\b matches the empty string at the beginning or end of a word. Slightly simpler equivalent to (^|,) in this situation.
(?!foo) is a zero-width negative lookahead assertion. So, matches the empty string as long as it's not followed by a substring that matches foo.
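As a rough check of the negative lookahead described above, the following sketch wraps the (?:^|,) variant in a hypothetical helper, has_other_value, and runs it on three of the question's strings.

```perl
use strict;
use warnings;

# has_other_value() is a hypothetical helper name: true when at least one
# comma-separated value does not have 3g or 3k in its 3rd and 4th characters.
sub has_other_value {
    my ($list) = @_;
    return $list =~ /(?:^|,)\w{2}(?!3g|3k)/i ? 1 : 0;
}

for my $list ('in3g123456,dh3k123456,dhec110101', 'in3g123456,dh3k123456', 'c3kasdf') {
    print "$list: ", (has_other_value($list) ? 'true' : 'false'), "\n";
}
```

The lookahead consumes nothing, so c3kasdf is now reported as true: after \w{2} matches c3, the next two characters are ka, not 3k.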
You can also split the string first, instead of parsing everything with a regex. That is far more flexible and maintainable, and easier.
When processing the list of the extracted "values" you can match any character twice then your pattern, /^..$patt/. The module List::MoreUtils is useful (and fast) for list manipulations, and its notall function is tailor-made for your condition.
use warnings 'all';
use strict;
use List::MoreUtils qw(notall);
my $file = '...';
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
chomp;
my $res = notall { /^..(?:3k|3g)/ } split /,/;
print "$_: " . ($res ? 'true' : 'false'), "\n";
}
I presume that you read from a file. If not, replace while (<$fh>) with for (@strings).
The notall function returns true if any element of the list fails the condition.
The split by default uses $_ so we only need the pattern. Here it is simply , but the pattern takes a regex so one can match separators flexibly. For example, this /[,\s]+/ splits on any amount of , and/or whitespace. So ,, , in a string is matched as a separator, as well as , or space(s).
When applied to the array with your strings the above prints
in3g123456,dh3k123456,dhec110101: true
dhec110101,dhec123456: true
in3g123456,dh3k123456: false
c3kasdf: true
usdfusdufs3gsdf: true
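The flexible-separator split mentioned in this answer can be sketched as follows. The sample line is made up for illustration, and the count uses plain grep rather than List::MoreUtils so the snippet has no non-core dependencies.

```perl
use strict;
use warnings;

# A line with mixed separators: repeated commas, spaces, and a single space.
my $line = 'in3g123456,, ,dh3k123456 dhec110101';

# Split on any run of commas and/or whitespace.
my @values = split /[,\s]+/, $line;

# Count the values that do not have 3g or 3k as their 3rd and 4th characters.
my $others = grep { !/^..(?:3k|3g)/ } @values;

print scalar(@values), " values, $others without 3g/3k\n";
```

Here the run ",, ," is treated as one separator, so the line yields three values, one of which (dhec110101) lacks 3g/3k.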
You could use substr to get data at 3rd and 4th position and then compare it with (3g|3k).
substr $_,2,2
#!/usr/bin/perl
use strict;
use warnings;
my @a = ('in3g123456,dh3k123456,dhec110101','dhec110101,dhec123456','in3g123456,dh3k123456', 'c3kasdf', 'usdfusdufs3gsdf' );
foreach (@a) {
my @inputs = split /,/,$_;
my $flag = 0;
foreach (@inputs){
$flag = 1 unless ((substr $_,2,2) =~ /(3g|3k)/);
}
$flag ? print "$_: True\n" : print "$_: False\n";
}
Output:
in3g123456,dh3k123456,dhec110101: True
dhec110101,dhec123456: True
in3g123456,dh3k123456: False
c3kasdf: True
usdfusdufs3gsdf: True

Matching a variable in a string in Perl from the end

I want to match a variable character in a given string, but from the end.
Ideas on how to do this action?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ # I want to match the char, searching from the end; $ doesn't work
print "success";
}
}
Thank you for your assistance.
There is no regex modifier that would force Perl regex engine to parse the string from right to left. Thus, the most convenient way to achieve that is via a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead will require the absence (=will fail the match if found) of a $char after any 0+ chars other than linebreak chars (use s modifier if you are running the regex against a multiline string input).
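A minimal sketch of the lookahead approach above, reporting where the last occurrence starts. The helper name last_index_of is hypothetical, and the \Q...\E quoting is an addition not present in the pattern above, assumed here so that metacharacters in $char are matched literally.

```perl
use strict;
use warnings;

# last_index_of() is a hypothetical helper name: offset of the last
# occurrence of $char in $string, or -1 when it does not occur.
sub last_index_of {
    my ($string, $char) = @_;
    # Match a $char that is not followed, anywhere later, by another $char.
    return $string =~ /\Q$char\E(?!.*\Q$char\E)/s ? $-[0] : -1;
}

print last_index_of('abcabc', 'b'), "\n";    # prints 4
print last_index_of('a.b.c',  '.'), "\n";    # prints 3
print last_index_of('abc',    'x'), "\n";    # prints -1
```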
The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K marks the position of the match result beginning.
Other ways:
you can also reverse the string and use your previous pattern.
you can search all occurrences and take the last item in the list
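The three options in this answer can be compared side by side. This is only a sketch: the sample string and variable names are assumptions, and \Q...\E is added so that $char is matched literally.

```perl
use strict;
use warnings;

my ($string, $char) = ('abcabc', 'b');

# Greedy .* runs to the end of the string, then backtracks to the last $char.
# \K resets the match start, so $-[0] is the offset of that last occurrence.
my $greedy_pos = $string =~ /.*\K\Q$char\E/s ? $-[0] : -1;

# Collect the offset of every occurrence and keep the last one.
my @positions;
while ($string =~ /\Q$char\E/g) {
    push @positions, $-[0];
}
my $last_pos = @positions ? $positions[-1] : -1;

# Search the reversed string, then convert the offset back.
my $rev_index   = index(scalar reverse($string), scalar reverse($char));
my $reverse_pos = $rev_index < 0 ? -1 : length($string) - $rev_index - length($char);

print "$greedy_pos $last_pos $reverse_pos\n";
```

All three should report the same offset for the last b in abcabc. The greedy form is usually the shortest to write; the list form is handy when the earlier positions are needed too.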
I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string
You can do that like this
sub removeCharFromEnd {
my ( $string, $char ) = @_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
If you just want to modify the string in place then you should write
$_[0] =~ s/$char\z//
using whichever substitution you choose. Then you can do this
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output
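For the in-place variant, a minimal sketch might look like this. It relies on the elements of @_ being aliases to the caller's variables; the \Q...\E quoting is an addition assumed here so that $char is taken literally.

```perl
use strict;
use warnings;

# Elements of @_ are aliases to the caller's variables, so a substitution
# on $_[0] changes the caller's string directly.
sub removeCharFromEnd {
    my $char = $_[1];
    # Returns a true value when a trailing $char was removed.
    return $_[0] =~ s/\Q$char\E\z//;
}

my $s = 'abc';
removeCharFromEnd($s, 'c');
print "$s\n";    # prints ab
```

Because the first argument is modified through its alias, it has to be a variable; passing a string literal would fail with a "Modification of a read-only value" error when the substitution matches.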
To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift @_;
my $char = quotemeta reverse shift @_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";

Perl regex digit

Consider my regex in this code section:
use strict;
my @list = ("1", "2", "123");
&chk(@list);
sub chk {
my @num = split (" ", "@_");
foreach my $chk (@num) {
chomp $chk;
if ($chk =~ m/\d{1,2}?/) {
print "$chk\n";
}
}
}
The \d{4} will print nothing. The \d{3} will print only 123. But if I change to \d{1,2}? it will print all. I thought, according to all the sources I read so far, that {1,2} mean: one digit but no more than two. So it should have printed only 1 and 2, correct?
What do I need to extract items that contains only one to two digits?
\d{1,2} succeeds if it finds 1 or 2 digits anywhere in the string provided. Additional string content does not cause the match to fail. If you want to match only when the string consists of exactly 1 or 2 digits, do this: ^\d{1,2}$
You should anchor your regular expression for the desired effect. The built-in function grep suits better here since it is a selection from an array that is to be done:
#!/usr/bin/env perl
use strict;
use warnings;
my @list = ( 1, 2, 123 );
print join "\n", grep /^\d{1,2}$/, @list;
It appears to be working perfectly!
Here's a hint: Use the Perl variables $`, $&, and $'. These variables are special regular expression variables that show the part of the string before the match, what was matched, and the post matched string.
Here's a sample program:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use Scalar::Util;
my @list = ("1", "2", "123");
foreach my $string (@list) {
if ($string =~ /\d{1,2}?/) {
say qq(We have a match for "$string"!);
say qq("$`" "$&" "$'");
}
else {
say "No match makes David Sad";
}
}
The output will be:
We have a match for "1"!
"" "1" ""
We have a match for "2"!
"" "2" ""
We have a match for "123"!
"" "1" "23"
What this does is divide up the string into three sections: The section of the string before the regular expression match, the section of the string that matches the regular expression, and the section of the string after the regular expression match.
In each case, there was no pre-match because the regular expression matches from the start of the string. We also see that \d{1,2}? matches a single digit in each case even though 123 could have matched two digits. Why? Because the question mark on the end of the match specifier tells the regular expression not to be greedy. In this case, we tell the regular expression to match either one or two characters. Fine, it matches on one. Remove the question mark, and the last line would have looked like this:
We have a match for "123"!
"" "12" "3"
If you want to match on one or two digits, but not three or more digits, you'll have to specify the part of your string before and after the one or two digits. Something like this:
/\D\d{1,2}\D/
This would match your string foo12bar, but not foo123bar. But what if the string is 12? In that case, we want to say that either we have the beginning of the string, or a non-digit before our one or two character match, and we either have a non-digit or the end of the string at the end of our one or two character match:
/(\D|^)\d{1,2}(\D|$)/
A quick explanation:
(\D|^): A non-digit or the beginning of the string (The ^ anchor)
\d{1,2}: One or two digits
(\D|$): A non-digit or the end of the string (The $ anchor)
Now, this will match 12, but not 123, and it will match foo12 and foo12bar, but not foo123 or foo123bar.
Just looking for a one or two digit number, we can simply specify the anchors:
/^\d{1,2}$/;
Now, that will match 1, 12, but not foo12 or 123.
The main thing is to use the $`, $&, and $' variables in order to help see exactly what your regular expression is matching on and what's before and after your match.
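The claims above about which strings each pattern accepts can be checked with a short table. This is a sketch only; the variable names $delimited and $anchored are made up for illustration.

```perl
use strict;
use warnings;

my $delimited = qr/(\D|^)\d{1,2}(\D|$)/;   # one or two digits, not part of a longer run
my $anchored  = qr/^\d{1,2}$/;             # the whole string is one or two digits

for my $string (qw(12 123 foo12 foo12bar foo123 foo123bar)) {
    printf "%-10s delimited=%-3s anchored=%s\n",
        $string,
        ($string =~ $delimited ? 'yes' : 'no'),
        ($string =~ $anchored  ? 'yes' : 'no');
}
```

Only 12, foo12 and foo12bar satisfy the delimited pattern, and only 12 satisfies the anchored one.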
No, because while the regex only matches two digits, $chk still contains 123. If you want to only print the part that is matched, use
if ($chk =~ m/(\d{1,2})/) {
print "$1\n";
}
Note the parentheses and the $1. This causes it to print only that which is in the parentheses.
Also, this code doesn't make much sense:
sub chk {
my @num = split (" ", "@_");
Because @_ already is an array it makes no sense to make it into a string and then split it. Simply do:
sub chk {
foreach my $chk (@_) {
You also do not need to use chomp for data that is not coming from user input, as it is intended to remove the trailing newline. There is no newline in any of this data.
#!/usr/bin/perl
use strict;
my @list = ("1", "2", "123");
&chk(\@list);
sub chk {
foreach my $chk (@{$_[0]}) {
print "$chk\n" if $chk =~ m/^\d{1,2}$/ ;
}
}
#!/usr/bin/perl
use strict;
use warnings;
my @list = ("1", "2", "123");
&chk(@list);
sub chk {
my @num = split (" ", "@_");
foreach my $chk (@num) {
chomp $chk;
if ($chk =~ m/\d{1,2}/ && length($chk) <= 2) {
print "$chk\n";
}
}
}