perl regex match and store specific character in scalar variable - regex

Now suppose say i have this line in a file:
my %address = (
or any such similar line in which i have defined the hash.
I want to find the character "(" in the line and store "address" in say $hash_name. How do I do it?
Basic idea is to capture the name of the hash defined in the files.
I am trying to do is,
foreach $line <MYFILE> {
if($line =~ /($/ {
How do I proceed further?

Not sure if I understood your problem, but, how about:
my %hash;
while (my $line = <MYFILE>) {
if ($line =~ /\%(\w+)\s*=\s*\($/) {
$hash{$1} = 1;
}
}

open (F1,"inputfile.txt") or die("unable to open inputfile.txt");
my $hash_name
while (<F1>) {
if (/%(\w+) *= *\(/) {
$hash_name = $1;
print $hash_name;
}
}

Related

Check if multiple lines exist in text file

Using Perl I would like to check if the two lines highlighted below exist in a text file . Each line is preceded by a tab.
CF=CFU-ALL-PROV-NONE-YES-NO-NONE-YES;
CF=CFB-ALL-PROV-NONE-YES-YES-NONE-YES;
***CF=CFU-TS10-ACT-NONE-YES-NO-NONE-YES;***
CF=CFNRY-ALL-PROV-NONE-YES-YES-NONE-YES;
CF=CFNRC-ALL-PROV-NONE-YES-NO-NONE-YES;
***CF=CFB-TS10-ACT-NONE-YES-NO-NONE-YES;***
CF=CFD-TS10-REG-9124445544-YES-YES;
I am using the following if statement but it is not matched
if (/\t*CF=(CFU-TS10-ACT-(NONE|\d+))/ && /\t*CF=(CFB-TS10-ACT-(NONE|\d+))/)
{
say "this case is found here .....";
}
What am I doing wrong ?
Edited
This is the program I wrote :-
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $HSSIN='D:\testproject\HSS-export-test-run-small.txt';
my $ofile = 'D:\testproject\HSS-output.txt';
open (INFILE, $HSSIN) or die "Can't open input file";
open (OUTFILE,"> $ofile" ) or die "Cant open file";
my $add;
my $MSISDN;
my $line;
sub callForwardingsCF()
{
if (/\t*CF=(CFU-TS10-ACT-(NONE|\d+))/ && /\t*CF=(CFB-TS10-ACT-(NONE|+\d+))/)
{
say "this case is found here .....";
}
} # end sub callForwardingsCFD
while (<INFILE>)
{
if (/<SUBEND/)
{
say "SUBEND found";
#$line = $1 if /^\s*MSISDN=(\d+);/;
print OUTFILE "processSingle UpdateCommand GSUB MKEY $line";
print OUTFILE "\n";
}
if ($_ =~ /^\t*MSISDN=(\d+);/)
{ #find MSISDN in file global search
say "STARTER MSISDN is $1";
$MSISDN = $1;
$add = $1;
$line = "$1"; #group 1
}
callForwardingsCF(); #callForwardings
}
close INFILE;
close OUTFILE;
Example of a record in the input file
<BEGINFILE>
<SUBBEGIN
IMSI=232191400029053;
MSISDN=4369050064401;
DEFCALL=TS11;
CURRENTNAM=BOTH;
CAT=COMMON;
TBS=TS11&TS12&TS21&TS22;
VLRLIST=10;
SGSNLIST=10;
SMDP=MSC;
CB=BAOC-ALL-PROV;
CB=BOIC-ALL-PROV;
CB=BOICEXHC-ALL-PROV;
CB=BICROAM-ALL-PROV;
CW=CW-ALL-PROV;
CF=CFU-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFB-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFU-TS10-ACT-NONE-YES-NO-NONE-YES-65535-YES-YES-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFNRY-ALL-PROV-NONE-YES-YES-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFNRC-ALL-PROV-NONE-YES-NO-NONE-YES-65535-NO-NO-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFB-TS10-ACT-NONE-YES-NO-NONE-YES-65535-YES-YES-NO-NO-NO-NO-NO-NO-NO-NO;
CF=CFD-TS10-REG-91436903000-YES-YES-25-YES-65535-YES-YES-NO-NO-NO-YES-YES-YES-YES-NO;
TCSISTATE=YES;
OCSISTATE=YES;
CONTROL=SUB;
WPA=0;
GS=HOLD&MPTY&ECT&CLIR&CLIP;
CLIRES=TEMPALLOW;
CLIPOC=NO;
OCSI=10;
CFSMS=ACT-10-914366488325207-YES-YES-NO-NO-NO;
ARD=PROV;
SUBRES=ALLPLMN;
IST_ALERT_TIMER=120;
IST_ALERT_RESPONSE=2;
SUB_AGE=0;
MIMSI=240076400029053-ONELIVE-2-2-1-0-0;
MIMSI=232191400029053-ONELIVE-1-1-1-0-0;
SID=2805158185721065;
MCSISTATE=YES;
CLRBSG=CLIP-YES-NO-NO-NO-NO;
UPLCSLCK=NO;
UPLPSLCK=NO;
DEFOFAID=10;
EPS_PROFILE_ID=1;
TGPPAMBRMAXUL=50000000;
TGPPAMBRMAXDL=150000000;
ARD_EXT=NULL-NULL-NULL-N3GPPNOTALLOWED;
FRAUDTPL_ID=10;
HLR_INDEX=1;
LTEAUTOPROV=NO;
PSSER=1-1-10-1-NONE-DYNAMIC-00000000;
EPSSER=1-10-10-1-NONE-DYNAMIC-00000000-1;
MPS=NO;
<SUBEND
Thanks,
Graham
Per default regexes match linewise.
So if you were trying to match an input that contains multiple lines, you would have to use one of the modifiers that allows the regex to match the entire string.
See the the perl regex documentation - the chapter "Modifiers".
Then you should add the s modifiler and change your if statement to:
if ( /\t*CF=(CFB-TS10-ACT-(NONE|\d+))/s &&
/\t*CF=(CFU-TS10-ACT-(NONE|\d+))/s ) {
say "found";
}
If you read linewise you will never have both of your regexes match for the same line, so you would need to do your regexes seperately as already suggested by the other answer.
#$/ = ""; #without paragraph mode
open my $file, '<', 'data_file';
binmode $file;
while(<$file>){
print $_ if ( $_ =~ /\s+CF=CFU-TS10-ACT-NONE-YES-NO-NONE-YES-\d+-YES-YES-NO-NO-NO-NO-NO-NO-NO-NO;/ ||
$_ =~ /\s+CF=CFB-TS10-ACT-NONE-YES-NO-NONE-YES-\d+-YES-YES-NO-NO-NO-NO-NO-NO-NO-NO;/ );
}
EDIT:
OR, you can do it in paragraph mode if conditions allow it.
$/ = "";
open my $file, '<', 'data_file';
binmode $file;
while(<$file>){
(undef, $first) = split (/\s+(CF=CFU-TS10-ACT-NONE-YES-NO-NONE-YES-\d+-YES-YES-NO-NO-NO-NO-NO-NO-NO-NO;)/, $_);
(undef, $second) = split(/\s+(CF=CFB-TS10-ACT-NONE-YES-NO-NONE-YES-\d+-YES-YES-NO-NO-NO-NO-NO-NO-NO-NO;)/, $_ );
print $first . "\n" . $second;
}
Code is tested and seems to work fine with supplied data.
Also, those are not tabs "\t" ... those are spaces "\s+" preceding those lines. Best thing is to learn your data set before you try to parse it ;)
Typically perl processes file "line by line".
Try something like sample script below:
my($line1,$line2);
while(<STDIN>) {
$line1=$_ if /\t*CF=(CFU-TS10-ACT-(NONE|\d+))/
$line2=$_ if /\t*CF=(CFB-TS10-ACT-(NONE|\d+))/
if( $line1 and $line2 ) {
say "this case is found here .....";
last; # skip processing remaning lines
}
}
Alternatively you may "slurp" whole file into one scalar variable.

perl read line by line with global variable

i have a perl script-
#!/usr/bin/perl;
my $email = '[a-zA-Z0-9._]+#[a-zA-Z0-9._]+.[a-zA-Z0-9._]{2,4}';
open(FILE,'emails');
while (<FILE>) {
my $emails_not_found = 1;
if ( m/$email/ ) {
print($_);
my $emails_not_found = 0;
}
if ( $emails_not_found ) {
print "no emails\n";
}
}
close FILE;
the file emails is:
sdfasd#asd
asdf
so, as you can see, the script will not match regex to any of the lines. however, it outputs this-
no emails
no emails
i want it to output 'no emails' ONCE if it doesn't match the regex pattern at all. If it only matches the regex pattern just once, it will print that line and output 'no emails' for the other
line :( I just want it to output either JUST the lines with the emails, or output 1 line that says 'no emails'. Thanks in advance.
Consider using a module, such as Regexp::Common::Email::Address, "...to match email addresses as defined by RFC 2822":
use strict;
use warnings;
use Regexp::Common qw/Email::Address/;
my $emailFound = 0;
open my $fh, '<', 'emails' or die $!;
while (<$fh>) {
if (/$RE{Email}{Address}/) {
print;
$emailFound = 1;
}
}
close $fh;
print "no emails\n" if !$emailFound;
Hope this helps!
Try this, I changed the emails file name in order to test on my machine:
#!/usr/bin/perl
#output either JUST the lines with emails
# or
#1 line that says 'no emails'
use strict;
use warnings;
my $email = '[a-z0-9\._]+#[a-z0-9\._]+\.[a-z0-9\._]{2,4}';
open(FILE,'./email.txt');
my $emails_not_found = 1;
while (<FILE>) {
if ( m/$email/i ) {
print($_);
$emails_not_found = 0;
}
}
if ( $emails_not_found == 1) {
print "no emails\n";
}
close FILE;
Test file
sdfasd#asd
asdf
aaa#aaa.com
AAA#AAA.COM
Output
aaa#aaa.com
AAA#AAA.COM

Regex across multiple lines

I was able to successfully extract everything with your suggestions. My issue came as expected, with the regex not properly recognizing something... thanks so much!! Here is my end code... hope it helps someone!
if($_=~/(Research Interests)/){
$research = "Research Interest";
if($_=~m/<h2>Research Interests<\/h2>(.*?)<p>(.*?)<\/p>/gs){
#researchInterests = split(/,+/, $2);
$count = 1;
foreach(#researchInterests){
print "$research $count:";
print $_. "\n";
$count++;
}
}
}
The problem is that you've only read in one line at a time. Why don't you read in the entire file and match against that.
my $file;
{
local $/;
$file = <FILE>;
}
You can simply go get more lines at that point:
while (<FILE>) {
if (m/Research Interests/) {
while (<FILE>) {
if (m/<p>(.*)<p>/) {
print "Research Interests: $1";
last;
}
}
}
}
I don't know whether your file is huge or not, but it's worth learning techniques that don't require reading the whole file at once so that you can deal with arbitrarily large files, or with streams.
If you absolutely have to do this, you could try setting the newline separator to undef:
#!/usr/bin/perl
use warnings;
use strict;
my $infile = 'in.txt';
open my $input, '<', $infile or die "Can't open to $infile: $!";
my $reserch_interests;
$/=undef;
while(<$input>){
if($_ =~ /(Research Interests)/){
$reserch_interests = $1;
if($_=~ m/<p>(.*)<\/p>/){
print "Title: $reserch_interests\nInterests: $1\n";
}
}
}
Prints:
Title: Research Interests
Interests: Data mining, databases, information retrieval

Pull regular expressions from file and compare to each line in a file

I found something that I could use on perlmonks.org (http://www.perlmonks.org/?node_id=870806) but I can't get it to work.
I can read the file without issue and build an array. Then, I'd like to compare each index of the array (each regex) to each line of a file, printing out the line before and the line after the matched line.
My code:
# List of regex's. If this file doesn't exist, we can't continue
open ( $fh, "<", $DEF_FILE ) || die ("Can't open regex file: $DEF_FILE");
while (<$fh>) {
chomp;
push (#bad_strings, $_);
}
close $fh || die "Cannot close regex file: $DEF_FILE: $!";
$file = '/tmp/mydirectory/myfile.txt';
eval { open ( $fh, "<", $file ); };
if ($#) {
# If there was an error opening the file, just move on
print "Error opening file: $file.\n";
} else {
# If no error, process the file
foreach $bad_string (#bad_strings) {
$this_line = "";
$do_next = 0;
seek($fh, 0, 0); # move pointer to 0 each time through
while(<$fh>) {
$last_line = $this_line;
$this_line = $_;
my $rege = eval "sub{ \$_[0] =~ $bad_string }"; # Real-time regex
if ($rege->( $this_line )) { # Line 82
print $last_line unless $do_next;
print $this_line;
$do_next = 1;
} else {
print $this_line if $do_next;
$last_line = "";
$do_next = 0;
}
}
}
} # End "if error opening file" check
This was working before when I had just a string per line in the file and performed a simple test such as if ($this_line =~ /$string_to_search_for/i ) but when I switched to regex in the file and a "real-time" eval statement, I now get Can't use string ("") as a subroutine ref while "strict refs" in use at scrub_file.pl line 82 and line 82 is if ($rege->($this_line)) {.
Prior to that error message, I'm receiving: Use of uninitialized value in subroutine entry at scrub_hhsysdump_file.pl line 82, <$fh> I have some understanding of that error message but can't seem to make the perl engine happy with my code thus far.
Still new to perl and always looking for pointers. Thanks in advance.
I fail to see the reason for those eval statements - all they seem to do is make the code a lot more complicated and difficult to debug.
But $rege is undef because eval "sub{ \$_[0] =~ $bad_string }" isn't working, due to the string having a syntax error. I don't know what's in $DEF_FILE, but unless it has properly-delimited regular expressions then you need to add the delimiters in the eval string.
my $rege = eval "sub{ \$_[0] =~ /$bad_string/ }"
may work, but you may need /\Q$bad_string/ instead if the strings in $DEF_FILE contain regex metacharacters and you want them to be treated as literal characters.
I suggest this version of your program which seems to do what you need without the fuss of the eval calls.
use strict;
use warnings;
use Fcntl ':seek';
my $DEF_FILE = 'myfile';
my #bad_strings = do {
open my $fh, '<', $DEF_FILE or die qq(Can't open regex file "$DEF_FILE": $!);
<$fh>;
};
chomp #bad_strings;
my $file = '/tmp/mydirectory/myfile.txt';
open my $fh, '<', $file or die qq(Unable to open "$file" for input: $!);
for my $bad_string (#bad_strings) {
my $regex = qr/$bad_string/;
my ($last_line, $this_line, $do_next) = ('', '', 0);
seek $fh, 0, SEEK_SET;
while (<$fh>) {
($last_line, $this_line) = ($this_line, $_);
if ($this_line =~ $regex) {
print $last_line unless $do_next;
print $this_line;
$do_next = 1;
}
else {
print $this_line if $do_next;
$do_next = 0;
}
}
}

Finding the last "}" of a subroutine

Supposed I have a file with Perl-code: does somebody know, if there is a module which could find the closing "}" of a certain subroutine in that file.
For example:
#!/usr/bin/env perl
use warnings;
use 5.012;
routine_one( '{°^°}' );
routine_two();
sub routine_one {
my $arg = shift;
if ( $arg =~ /}\z/ ) {
say "Hello my }";
}
}
sub routine_two {
say '...' for 0 .. 10
}
The module should be able to remove the whole routine_one or it should can tell me the line-number of the closing "}" from that routine.
You want to use PPI if you are going to be parsing Perl code.
#!/usr/bin/env perl
use warnings;
use 5.012;
use PPI;
my $file = 'Example.pm';
my $doc = PPI::Document->new( $file );
$doc->prune( 'PPI::Token::Pod' );
$doc->prune( 'PPI::Token::Comment' );
my $subs = $doc->find( sub { $_[1]->isa('PPI::Statement::Sub') and $_[1]->name eq 'layout' } );
die if #$subs != 1;
my $new = PPI::Document->new( \qq(sub layout {\n say "my new layout_code";\n}) );
my $subs_new = $new->find( sub { $_[1]->isa('PPI::Statement::Sub') and $_[1]->name eq 'layout' } );
$subs->[0]->block->insert_before( $subs_new->[0]->block ) or die $!;
$subs->[0]->block->remove or die $!;
# $subs->[0]->replace( $subs_new->[0] );
# The ->replace method has not yet been implemented at /usr/local/lib/perl5/site_perl/5.12.2/PPI/Element.pm line 743.
$doc->save( $file ) or die $!;
The following will work in case your subroutines don't contain any blank lines, like the one in your example:
#!/usr/bin/perl -w
use strict;
$^I = ".bkp"; # to create a backup file
{
local $/ = ""; # one paragraph constitutes one record
while (<>) {
unless (/^sub routine_one \{.+\}\s+$/s) { # 's' => '.' will also match "\n"
print;
}
}
}