Regex to read the id

I have a log file with the following content:
(2947:_dRW00T3WEeSkhZ9pqkt5dQ) ---$ ABC XY "Share" 16-Sep-2014 03:22 PM
(2948:_3nFSwz3TEeSkhZ9pqkt5dQ) ---$ ABC XY "Share" 16-Sep-2014 03:05 PM
(2949:_voeYED3AEeSkhZ9pqkt5dQ) ---$ ABC XY "Initial for Re,oved" 16-Sep-2014 12:44 PM
I want to read the unique ID (for example _dRW00T3WEeSkhZ9pqkt5dQ) from each line and store it in an array.
My current code is:
while(<$fh>) {
    if ($_ =~ /\((.*?)\)/) {
        push @cs_ids, $1;
    }
}

Try this:
while(<$fh>) {
    if ($_ =~ /\(\d+:(.+?)\)/) {
        push @cs_ids, $1;
    }
}
The regex matches text that starts with (, followed by one or more digits and a colon, then one or more characters (which are captured in $1), up to the first ).
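A self-contained sketch of the same loop, for trying it without the real log file. It assumes the three sample lines from the question as input, read through an in-memory filehandle, and uses [^)]+ in place of .+? so the capture stops at the first ) without a non-greedy quantifier:

```perl
use strict;
use warnings;

# The sample lines from the question, kept in memory so the sketch runs
# without the real log file.
my $log = <<'END';
(2947:_dRW00T3WEeSkhZ9pqkt5dQ) ---$ ABC XY "Share" 16-Sep-2014 03:22 PM
(2948:_3nFSwz3TEeSkhZ9pqkt5dQ) ---$ ABC XY "Share" 16-Sep-2014 03:05 PM
(2949:_voeYED3AEeSkhZ9pqkt5dQ) ---$ ABC XY "Initial for Re,oved" 16-Sep-2014 12:44 PM
END

# Opening a reference to a scalar gives an ordinary read filehandle.
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my @cs_ids;
while (my $line = <$fh>) {
    # "(", digits, ":", then capture everything up to the next ")".
    push @cs_ids, $1 if $line =~ /\(\d+:([^)]+)\)/;
}
close $fh;

print "$_\n" for @cs_ids;
```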

You were almost there:
perl -e '$string = "(2947:_dRW00T3WEeSkhZ9pqkt5dQ)"; if ($string =~ /^\((\d+:)(.*?)\)$/) { die $2; }'
_dRW00T3WEeSkhZ9pqkt5dQ at -e line 1.
Change your regular expression condition to:
/^\((\d+:)(.*?)\)$/
This captures the digits and the colon in the special variable $1, and the ID you want in $2.
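One caveat, worth checking on a full log line rather than only the bracketed part: with the trailing $ anchor, the ) must be the last character of the string, so the lines in the question (which continue with the user name and date) do not match. A minimal sketch comparing both forms on the first sample line:

```perl
use strict;
use warnings;

my $line = '(2947:_dRW00T3WEeSkhZ9pqkt5dQ) ---$ ABC XY "Share" 16-Sep-2014 03:22 PM';

# Anchored with $: the ")" would have to be the last character of the line.
my $anchored = $line =~ /^\((\d+:)(.*?)\)$/ ? $2 : undef;

# Without the trailing $, the lazy .*? stops at the first ")".
my $unanchored = $line =~ /^\((\d+:)(.*?)\)/ ? $2 : undef;

print 'anchored:   ', (defined $anchored ? $anchored : 'no match'), "\n";
print 'unanchored: ', (defined $unanchored ? $unanchored : 'no match'), "\n";
```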

If every line of the log file is guaranteed to have an ID string, then you can write just
while (<$fh>) {
    /:(\w+)/ and push @cs_ids, $1;
}
The \w ("word") character class matches alphanumeric characters or underscore, and this regex just snags the first sequence of word characters that follow a colon. It is best to avoid the non-greedy modifier if possible as it is a sloppy specification and can be much slower than a simple multiple character match.
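The same extraction can also be written with map, since a match in list context returns its captures, or an empty list when it fails. A sketch using two of the sample lines plus one made-up line without an ID, to show that non-matching lines are simply skipped:

```perl
use strict;
use warnings;

my @lines = (
    '(2947:_dRW00T3WEeSkhZ9pqkt5dQ) ---$ ABC XY "Share" 16-Sep-2014 03:22 PM',
    '(2948:_3nFSwz3TEeSkhZ9pqkt5dQ) ---$ ABC XY "Share" 16-Sep-2014 03:05 PM',
    'a made-up line with no id',
);

# \w+ stops at the first character that is not a letter, digit or "_"
# (here the ")" after the id). In list context the match returns ($1),
# or an empty list for a line without a colon-prefixed word.
my @cs_ids = map { /:(\w+)/ } @lines;

print "$_\n" for @cs_ids;
```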

Related

How to verify if a variable value contains a character and ends with a number using Perl

I am trying to check if a variable contains the character "C" and ends with a number in the minor version. I have:
my $str1 = "1.0.99.10C9";
my $str2 = "1.0.99.10C10";
my $str3 = "1.0.999.101C9";
my $str4 = "1.0.995.511";
my $str5 = "1.0.995.AC";
I would like a regex that prints a message if the variable has a C in the 4th field and ends with a number, so for $str1, $str2 and $str3 it should print "matches". I am trying the regexes below, but none of them work. Can you help me correct them?
my $str1 = "1.0.99.10C9";
if ( $str1 =~ /\D+\d+$/ ) {
print "Candy match1\n";
}
if ( $str1 =~ /\D+C\d+$/ ) {
print "Candy match2\n";
}
if ($str1 =~ /\D+"C"+\d+$/) {
print "candy match3";
}
if ($str1 =~ /\D+[Cc]+\d+$/) {
print "candy match4";
}
if ($str1 =~ /\D+\\C\d+$/) {
print "candy match5";
}
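Running each attempt against the strings shows what goes wrong. In 1.0.99.10C9 the character before the C is the digit 0, so every pattern that requires \D+ immediately before a C cannot match. Only the first pattern matches, because \D+ is free to match the C itself; for the same reason it also matches 1.0.995.511, which has no C at all. A quick sketch:

```perl
use strict;
use warnings;

# The five attempts from the question, in order.
my @patterns = (
    qr/\D+\d+$/,
    qr/\D+C\d+$/,
    qr/\D+"C"+\d+$/,
    qr/\D+[Cc]+\d+$/,
    qr/\D+\\C\d+$/,
);

# For each string, list which attempts (numbered from 1) match it.
my %hits;
for my $str ('1.0.99.10C9', '1.0.995.511') {
    $hits{$str} = [ grep { $str =~ $patterns[$_ - 1] } 1 .. scalar(@patterns) ];
    print "$str matched by: @{ $hits{$str} }\n";
}
```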
if ($str1 =~ /C[^.]*\d$/) { print "matches\n"; }
C matches the letter C.
[^.]* matches any number of characters that aren't a dot (.). This ensures that the match won't span multiple fields of the version number; it will only match within the last field.
\d matches a digit.
$ matches the end of the string. So the digit has to be at the end.
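A quick check of this pattern against the five strings from the question, as a sketch:

```perl
use strict;
use warnings;

my @versions = qw(1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC);

# A C, then only non-dot characters, then a digit at the very end.
my @matching = grep { /C[^.]*\d$/ } @versions;

print "matches: $_\n" for @matching;
```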
I found it really helpful to use https://www.regextester.com/109925 to test and analyse my regex strings.
Let me know if this regex works for you:
((.*\.){3}(.*C\d{1}))
Following your format, this regex assumes three dots with characters between them, and after the third dot it checks whether the rest of the string contains a C followed by a digit.
EDIT:
If you want to make sure the string ends in a digit, and don't want to use it to check longer strings containing the formula, use:
^((.*\.){3}(.*C\d{1}))$
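One thing to watch with \d{1}: together with the $ anchor it allows exactly one digit after the C, so 1.0.99.10C10 is rejected. A sketch comparing \d{1} with \d+ on the question's inputs:

```perl
use strict;
use warnings;

my @versions = qw(1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC);

# Exactly one digit after the C, then end of string.
my @one_digit = grep { /^((.*\.){3}(.*C\d{1}))$/ } @versions;

# One or more digits after the C.
my @any_digits = grep { /^((.*\.){3}(.*C\d+))$/ } @versions;

print "\\d{1}: @one_digit\n";
print "\\d+:   @any_digits\n";
```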
Let's look at what the regex should look like:
start{digit}.{digit}.{2-3 digits}.{2-3 digits}C{1-2 digits}end
very very strict qr/^1\.0\.9{2,3}\.101?C\d+\z/ - must start with 1.0.99[9]?.
very strict qr/^1\.0\.\d{2,3}\.\d{2,3}C\d{1,2}\z/ - must start with 1.0.
strict qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d{1,2}\z/
relaxed qr/^\d\.\d\.\d+\.\d+C\d+\z/
very relaxed qr/\.\d+C\d+\z/
use strict;
use warnings;
use feature 'say';
my @data = qw/1.0.99.10C9 1.0.99.10C10 1.0.999.101C9 1.0.995.511 1.0.995.AC/;
#my $re = qr/^\d\.\d\.\d+\.\d+C\d+\z/;
my $re = qr/^\d\.\d\.\d{2,3}\.\d{2,3}C\d+\z/;
say '--- Input Data ---';
say for @data;
say '--- Matching -----';
for( @data ) {
    say 'match ' . $_ if /$re/;
}
Output
--- Input Data ---
1.0.99.10C9
1.0.99.10C10
1.0.999.101C9
1.0.995.511
1.0.995.AC
--- Matching -----
match 1.0.99.10C9
match 1.0.99.10C10
match 1.0.999.101C9

How to delete all characters after a certain character in each line in perl?

I have a file that I am reading in and I am trying to delete everything after specific characters such as "[". I have listed the code I have below:
while($line = <INFILE>) {
print "$line \n";
}
Some lines will have "[blah blah blah blah]" and I need to delete everything after the first bracket including the first bracket per line, any help would be greatly appreciated!
To print up to the first occurrence of a specific string $delim:
while (<INFILE>) {
printf "%s\n", substr($_, 0, index($_, $delim));
}
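This finds the index of the first occurrence of the string and prints from the first character (0) up to, but excluding, that index. If $delim does not occur in a line, index returns -1, and substr($_, 0, -1) drops the line's last character, which is normally the trailing newline that printf then adds back.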
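If some lines contain no $delim at all, it may be clearer to test index's result explicitly (it returns -1 when the string is not found) than to rely on how substr treats a negative length. A minimal sketch, with a hypothetical helper before_delim and made-up sample lines:

```perl
use strict;
use warnings;

# Hypothetical helper: the part of $line before the first $delim, or the
# whole line when $delim does not occur (index returns -1 in that case).
sub before_delim {
    my ($line, $delim) = @_;
    my $pos = index($line, $delim);
    return $pos < 0 ? $line : substr($line, 0, $pos);
}

my @lines   = ("keep this [drop this]", "no bracket here", "[all gone]");
my @trimmed = map { before_delim($_, '[') } @lines;

print "<$_>\n" for @trimmed;
```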
Another option is to use a regex:
while (<$fh>) {
s/\Q$delim\E.*$//m;
print;
}
Note the \Q and \E delimiters to prevent the regex engine from interpreting e.g. [ as a regex metacharacter.
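To see why the quoting matters: with $delim set to [ and no \Q...\E, the interpolated pattern opens a character class that is never closed, so it fails to compile at run time. A sketch that catches that error with eval:

```perl
use strict;
use warnings;

my $delim = '[';
my $line  = "value [comment]";

# Without \Q...\E the "[" opens a character class that is never closed,
# so the pattern fails to compile when the substitution runs.
my $compiled = eval { (my $copy = $line) =~ s/$delim.*$//; 1 };
if ($compiled) { print "compiled\n" } else { print "error: $@" }

# With \Q...\E the "[" is matched literally.
(my $clean = $line) =~ s/\Q$delim\E.*$//;
print "<$clean>\n";
```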
Since you mentioned you wanted to store the lines in an array, do the following:
my @arr = ();
while(<INFILE>){
    chomp;
    push @arr, $_;
}
foreach my $item (@arr){
    #process lines as you see fit here
}
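Combining the two steps, a sketch that strips everything from the first [ onward and stores each result in the array. It reads from an in-memory string instead of INFILE (an assumption so it runs on its own):

```perl
use strict;
use warnings;

# In-memory stand-in for the input file.
my $text = "first line [blah blah]\nno brackets\nthird [a] [b]\n";
open my $fh, '<', \$text or die "Cannot open: $!";

my @arr;
while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/\[.*//;   # remove the first "[" and everything after it
    push @arr, $line;
}
close $fh;

print "$_\n" for @arr;
```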

perl insert and or replace string variable into a string

The requirement I have is to check a string and, based on a particular set of characters, either insert a prefix string or replace characters with it:
$prefix = "DV1";
Following are my source $input strings:
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
The characters before the first . can be of any length.
After the first ., the characters ABC are constant and are followed by exactly one more character; these four characters will always be present in my input string.
After these four characters, the input string may have two alphanumeric characters (T2 in this case).
What needs to be done is to check whether $input has "T2" (which can be any two alphanumeric characters) and, if it does, replace those two characters with D1 (any two characters from $prefix).
If $input does not have "T2", then insert $prefix.
This can be done quite straightforwardly with a single substitution. The program below demonstrates it.
The pattern looks for the sequence .ABC followed by any non-dot character. The \K keeps that part of the match out of the text being replaced. Then there may be two optional non-dot characters, followed by a dot (checked with a lookahead, so the dot itself is not replaced). The replacement string is D1 if the two optional characters were present, or the value of $prefix if not.
use strict;
use warnings;
my $prefix = 'DV1';
while (<DATA>) {
s/\.ABC[^.]\K([^.]{2})?(?=\.)/$1 ? 'D1' : $prefix/e;
print;
}
__DATA__
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
output
SS7.ABCWD1.RSND.LTE1.QR
IT4.ABCED1.VCE2.QR
Y88.ABCND1.MIM.EDR2.QR
9C5.ABCSDV1.MIM.EDR2.QR
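\K requires Perl 5.10 or later. On an older perl, one option is to capture the fixed part and put it back in the replacement. A sketch of that variant (rewrite is a hypothetical helper name), run on the four sample strings:

```perl
use strict;
use warnings;

my $prefix = 'DV1';

# Hypothetical helper. Group 1 holds ".ABC" plus one character and is put
# back unchanged; group 2 holds the optional two characters to replace.
sub rewrite {
    my ($s) = @_;
    $s =~ s/(\.ABC[^.])([^.]{2})?(?=\.)/$1 . (defined $2 ? 'D1' : $prefix)/e;
    return $s;
}

my @out = map { rewrite($_) }
    qw(SS7.ABCWT2.RSND.LTE1.QR IT4.ABCET2.VCE2.QR Y88.ABCNT2.MIM.EDR2.QR 9C5.ABCS.MIM.EDR2.QR);

print "$_\n" for @out;
```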
Here's the code you can try.
I am assuming that T2 can be any two alphanumeric characters, such as A4 or 5B.
#!/usr/bin/perl
use v5.14;
use warnings;
my $str = "9C5.ABCS.MIM.EDR2.QR";
my $str1 = "SS7.ABCWT2.RSND.LTE1.QR";
my $prefix = "DV1";
my $file = 'D:\Programming\Perl\Learning Perl\chapter_1\demo.txt';
open my $fh, '<', $file or die $!;
foreach (<$fh>) {
if (m/(^.*\.ABC\w)\w{2}\./g) {
s/(^.*\.ABC\w)\w{2}\./$1D1\./;
} else {
s/(^.*\.ABC\w)\./$1$prefix\./;
}
say; # Takes current line as default($_). We don't need to specify it.
}
Input File: -
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCEX4.VCE2.QR
Y88.ABCN5W.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
Output: -
SS7.ABCWD1.RSND.LTE1.QR # Replace T2
IT4.ABCED1.VCE2.QR # Replace X4
Y88.ABCND1.MIM.EDR2.QR # Replace 5W
9C5.ABCSDV1.MIM.EDR2.QR # No two characters after ABCS. Add DV1
Try the following code, and tell me if it fits your needs:
#!/usr/bin/perl -l
use strict;
use warnings;
my $text =<<EOF;
SS7.ABCWT2.RSND.LTE1.QR
IT4.ABCET2.VCE2.QR
Y88.ABCNT2.MIM.EDR2.QR
9C5.ABCS.MIM.EDR2.QR
EOF
my $prefix = "DV1";
for (split "\n", $text) {
s/^(\w+\.ABC\w)T2/$1D1/ || s/^/$prefix/;
print;
}
OUTPUT
SS7.ABCWD1.RSND.LTE1.QR
IT4.ABCED1.VCE2.QR
Y88.ABCND1.MIM.EDR2.QR
DV19C5.ABCS.MIM.EDR2.QR
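The last output line shows DV1 added at the very start of the string, whereas the question asks for it after the character that follows ABC (9C5.ABCSDV1...). If that is the intent, the fallback substitution can reuse the same leading pattern. A sketch under that assumption:

```perl
use strict;
use warnings;

my $prefix = 'DV1';
my @out;

for my $orig (qw(SS7.ABCWT2.RSND.LTE1.QR 9C5.ABCS.MIM.EDR2.QR)) {
    my $s = $orig;   # copy: the qw() values themselves are read-only
    # Replace T2 after ".ABC" + one character, or else insert $prefix
    # right after that character (just before the next dot).
    $s =~ s/^(\w+\.ABC\w)T2/$1D1/
        or $s =~ s/^(\w+\.ABC\w)(?=\.)/$1$prefix/;
    push @out, $s;
}

print "$_\n" for @out;
```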

Regular expressions to match protected separated values

I'd like a regular expression to match separated values, where some protected values can contain the separator character.
For instance:
"A,B,{C,D,E},F"
would give:
"A"
"B"
"{C,D,E}"
"F"
Please note the protected values can be nested, as follows:
"A,B,{C,D,{E,F}},G"
would give:
"A"
"B"
"{C,D,{E,F}}"
"G"
I have already implemented this feature by iterating over the characters, as follows:
sub Parse
{
    my @item;
    my $curly;
    my $string;
    foreach(split //)
    {
        $_ eq "{" and ++$curly;
        $_ eq "}" and --$curly;
        if(!$curly && /[,:]/)
        {
            push @item, $string;
            undef $string;
            next;
        }
        $string .= $_;
    }
    push @item, $string;
    return @item;
}
But it would definitely be much nicer with a regex.
A regex that supports nesting would look as follows:
my @items;
push @items, $1 while
   /
      (?: ^ | \G , )
      (
         (?: [^,{}]+
         |   (
                \{
                (?: [^{}]
                |   (?2)
                )*
                \}
             )
         |   # Empty
         )
      )
   /xg;
$ perl -E'$_ = shift; ... say for @items;' 'A,B,{C,D,{E,F}},G'
A
B
{C,D,{E,F}}
G
Assumes valid input since it can't extract and validate at the same time. (Well, not without making things really messy.)
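The same regex wrapped in a sub so it can be run on both example strings. (?2) recurses into the brace group (capture group 2), which is what allows nesting; pattern recursion like this needs Perl 5.10 or later. The name parse_protected is just for illustration:

```perl
use strict;
use warnings;

# Collect the top-level comma-separated items, keeping {...} groups whole.
sub parse_protected {
    my ($str) = @_;
    my @items;
    push @items, $1 while $str =~ /
        (?: ^ | \G , )          # start of string, or the comma after the previous item
        (
            (?: [^,{}]+         # a plain value
              | (               # group 2: a brace group
                    \{
                    (?: [^{}] | (?2) )*   # non-brace characters or a nested group
                    \}
                )
              |                 # or an empty value
            )
        )
    /xg;
    return @items;
}

print join('|', parse_protected('A,B,{C,D,E},F')), "\n";
print join('|', parse_protected('A,B,{C,D,{E,F}},G')), "\n";
```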
Improved from nhahtdh's answer.
$_ = "A,B,{C,D,E},F";
while ( m/(\{.*?\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "[$&]\n";
}
Improved it again. Please look at this one!
$_ = "A,B,{C,D,{E,F}},G";
while ( m/(\{.*\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "$&\n";
}
It will get:
A
B
{C,D,{E,F}}
G
$a = "A,B,{C,D,E},F";
while ($a =~ s/(\{[\{\}\w,]+\}|\w)//) {
    push (@res, $1);
}
print "\@res: @res\n";
Result:
@res: A B {C,D,E} F
Explanation: we try to match either the protected block \{[\{\}\w,]+\} or just a single character \w successively in a loop, deleting it from the original string when there is a match. Every time there is a match, we store it (that is, $1) in the array, et voilà!
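Because the fallback branch is a single \w, a value longer than one character (say AB) would come out as separate A and B entries. If multi-character values are possible, \w+ is one way to keep them together; a sketch on a made-up input:

```perl
use strict;
use warnings;

my $s = 'AB,C,{D,E},F';   # made-up input with a two-letter first value
my @res;

# The loop from this answer, but with \w+ so "AB" stays one value.
while ($s =~ s/(\{[\{\}\w,]+\}|\w+)//) {
    push @res, $1;
}

print "@res\n";
```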
Here is a regex in bash:
chronos@localhost / $ echo "A,B,{C,D,E},F" | grep -oE "(\{[^\}]*\}|[A-Z])"
A
B
{C,D,E}
F
Try this regex. Use the regex to match and extract the token.
/(\{.*?\}|(?<=,|^).*?(?=,|$))/
I have not tested this code in Perl.
I am making an assumption about how the regex engine works here: that it tries to match the first alternative \{.*?\} before the second one. I also assume that there are no nested curly brackets and no badly paired ones.
$s = "A,B,{C,D,E},F";
@t = split /,(?=.*\{)|,(?!.*\})/, $s;
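This split works for the flat example: a comma is a separator if a { appears somewhere after it (so it comes before the protected block) or if no } appears after it (so it comes after the block). With nesting, a comma inside the outer braces can still see the inner {, so the outer block is split incorrectly. A sketch showing both cases:

```perl
use strict;
use warnings;

# Split on a comma if a "{" follows somewhere later, or if no "}" follows.
my $sep = qr/,(?=.*\{)|,(?!.*\})/;

my @flat   = split $sep, 'A,B,{C,D,E},F';
my @nested = split $sep, 'A,B,{C,D,{E,F}},G';

print join('|', @flat), "\n";     # the protected block stays whole
print join('|', @nested), "\n";   # the outer block is broken apart
```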

Perl regex digit

Consider my regex in this code section:
use strict;
my @list = ("1", "2", "123");
&chk(@list);
sub chk {
    my @num = split (" ", "@_");
    foreach my $chk (@num) {
        chomp $chk;
        if ($chk =~ m/\d{1,2}?/) {
            print "$chk\n";
        }
    }
}
The \d{4} will print nothing. The \d{3} will print only 123. But if I change it to \d{1,2}?, it prints all of them. I thought, according to all the sources I've read so far, that {1,2} means one digit but no more than two. So it should have printed only 1 and 2, correct?
What do I need to extract items that contain only one or two digits?
\d{1,2} succeeds if it finds 1 or 2 digits anywhere in the string provided. Additional content in the string does not cause the match to fail. If you want to match only when the string consists of exactly 1 or 2 digits, do this: ^\d{1,2}$
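A quick comparison, including a string with a trailing newline: $ also matches just before a final newline, while \z matches only at the very end of the string. The sample list extends the question's with two made-up entries:

```perl
use strict;
use warnings;

my @list = ("1", "2", "123", "a12b", "12\n");

my @unanchored = grep { /\d{1,2}/ } @list;      # any string containing a digit
my @dollar     = grep { /^\d{1,2}$/ } @list;    # $ also allows a trailing "\n"
my @strict     = grep { /^\d{1,2}\z/ } @list;   # \z: nothing at all after the digits

printf "%-10s %d\n", 'unanchored', scalar @unanchored;
printf "%-10s %d\n", 'dollar',     scalar @dollar;
printf "%-10s %d\n", 'strict',     scalar @strict;
```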
You should anchor your regular expression to get the desired effect. The built-in function grep suits this better, since you are selecting elements from an array:
#!/usr/bin/env perl
use strict;
use warnings;
my @list = ( 1, 2, 123 );
print join "\n", grep /^\d{1,2}$/, @list;
It appears to be working perfectly!
Here's a hint: Use the Perl variables $`, $&, and $'. These variables are special regular expression variables that show the part of the string before the match, what was matched, and the post matched string.
Here's a sample program:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use Scalar::Util;
my @list = ("1", "2", "123");
foreach my $string (@list) {
    if ($string =~ /\d{1,2}?/) {
        say qq(We have a match for "$string"!);
        say qq("$`" "$&" "$'");
    }
    else {
        say "No match makes David Sad";
    }
}
The output will be:
We have a match for "1"!
"" "1" ""
We have a match for "2"!
"" "2" ""
We have a match for "123"!
"" "1" "23"
What this does is divide up the string into three sections: The section of the string before the regular expression match, the section of the string that matches the regular expression, and the section of the string after the regular expression match.
In each case, there was no pre-match because the regular expression matches from the start of the string. We also see that \d{1,2}? matches a single digit in each case, even though 123 could have matched two digits. Why? Because the question mark at the end of the quantifier tells the regular expression not to be greedy. Here, we tell the regular expression to match either one or two digits. Fine, it matches one. Remove the question mark, and the last line would have looked like this:
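The difference between the lazy and greedy forms can also be seen without the match variables, by capturing in list context:

```perl
use strict;
use warnings;

# In list context a successful match returns the captured groups.
my ($lazy)   = "123" =~ /(\d{1,2}?)/;   # lazy: stops after one digit
my ($greedy) = "123" =~ /(\d{1,2})/;    # greedy: takes two digits

print "lazy: $lazy, greedy: $greedy\n";
```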
We have a match for "123"!
"" "12" "3"
If you want to match on one or two digits, but not three or more digits, you'll have to specify the part of your string before and after the one or two digits. Something like this:
/\D\d{1,2}\D/
This would match your string foo12bar, but not foo123bar. But what if the string is 12? In that case, we want to say that either we have the beginning of the string, or a non-digit before our one or two character match, and we either have a non-digit or the end of the string at the end of our one or two character match:
/(\D|^)\d{1,2}(\D|$)/
A quick explanation:
(\D|^): A non-digit or the beginning of the string (The ^ anchor)
\d{1,2}: One or two digits
(\D|$): A non-digit or the end of the string (The $ anchor)
Now, this will match 12, but not 123, and it will match foo12 and foo12bar, but not foo123 or foo123bar.
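A sketch checking the corrected pattern against the examples in this answer. The second pattern, using lookbehind and lookahead, is an alternative not mentioned in the answer; it expresses "no digit on either side" without consuming the neighboring characters:

```perl
use strict;
use warnings;

my @samples = qw(12 123 foo12 foo12bar foo123 foo123bar);

# The answer's pattern: a non-digit or start, 1-2 digits, a non-digit or end.
my @alternation = grep { /(\D|^)\d{1,2}(\D|$)/ } @samples;

# Same condition with lookarounds: no digit immediately before or after.
my @lookaround = grep { /(?<!\d)\d{1,2}(?!\d)/ } @samples;

print "alternation: @alternation\n";
print "lookaround:  @lookaround\n";
```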
Just looking for a one or two digit number, we can simply specify the anchors:
/^\d{1,2}$/;
Now, that will match 1, 12, but not foo12 or 123.
The main thing is to use the $`, $&, and $' variables in order to help see exactly what your regular expression is matching on and what's before and after your match.
No, because while the regex only matches two digits, $chk still contains 123. If you want to only print the part that is matched, use
if ($chk =~ m/(\d{1,2})/) {
print "$1\n";
}
Note the parentheses and the $1. This causes it to print only that which is in the parentheses.
Also, this code doesn't make much sense:
sub chk {
    my @num = split (" ", "@_");
Because @_ is already an array, it makes no sense to turn it into a string and then split it. Simply do:
sub chk {
    foreach my $chk (@_) {
You also don't need chomp for this data: chomp removes a trailing newline, such as the one on a line read from a file or from user input, and none of these values has one.
#!/usr/bin/perl
use strict;
my @list = ("1", "2", "123");
&chk(\@list);
sub chk {
    foreach my $chk (@{$_[0]}) {
        print "$chk\n" if $chk =~ m/^\d{1,2}$/ ;
    }
}
#!/usr/bin/perl
use strict;
use warnings;
my @list = ("1", "2", "123");
&chk(@list);
sub chk {
    my @num = split (" ", "@_");
    foreach my $chk (@num) {
        chomp $chk;
        if ($chk =~ m/\d{1,2}/ && length($chk) <= 2) {
            print "$chk\n";
        }
    }
}