I want to read 12h format time from file and replace it with 24 hour
example
this is due at 3:15am -> this is due 15:15
I tried saving variables in regex and manupilate it later but didnt work, I also tried using substitution "/s" but because it is variable I couldnt figure it out
Here is my code:
while (<>) {
my $line = $_;
print ("this is text before: $line \n");
if ($line =~ m/\d:\d{2}pm/g){
print "It is PM! \n";}
elsif ($line =~ m/(\d):(\d\d)am/g){
print "this is try: $line \n";
print "Its AM! \n";}
$line =~ s/($regexp)/<French>$lexicon{$1}<\/French>/g;
print "sample after : $line\n";
}
A simple script can do the work for you
$str="this is due at 3:15pm";
$str=~m/\D+(\d+):\d+(.*)$/;
$hour=($2 eq "am")? ( ($1 == 12 )? 0 : $1 ) : ($1 == 12 ) ? $1 :$1+12;
$min=$2;
$str=~s/at.*/$hour:$min/g;
print "$str\n";
Gives output as
this is due 15:15
What it does??
$str=~m/\D+(\d+):(\d+)(.*)$/; Tries to match the string with the regex
\D+ matches anything other than digits. Here it matches this is due at
(\d+) matches any number of digits. Here it matches 3. Captured in group 1 , $1 which is the hours
: matches :
(\d+) matches any number of digits. Here it matches 15, which is the minutes
(.*) matches anything follwed, here am . Captures in group 2, `$2
$ anchors the regex at end of
$hour=($2 eq "am")? ( ($1 == 12 )? 0 : $1 ) : ($1 == 12 ) ? $1 :$1+12; Converts to 24 hour clock. If $2 is pm adds 12 unless it is 12. Also if the time is am and 12 then the hour is 0
$str=~s/at.*/$hour:$min/g; substitutes anything from at to end of string with $hour:$min, which is the time obtained from the ternary operation performed before
#!/usr/bin/env perl
use strict;
use warnings;
my $values = time_12h_to_24h("11:00 PM");
sub time_12h_to_24h
{
my($t12) = #_;
my($hh,$mm,$ampm) = $t12 =~ m/^(\d\d?):(\d\d?)\s*([AP]M?)/i;
$hh = ($hh % 12) + (($ampm =~ m/AM?/i) ? 0 : 12);
return sprintf("%.2d:%.2d", $hh, $mm);
}
I found this code in the bleow link. Please check:
Is my pseudo code for changing time format correct?
Try this it give what you expect
my #data = <DATA>;
foreach my $sm(#data){
if($sm =~/(12)\.\d+(pm)/g){
print "$&\n";
}
elsif($sm =~m/(\d+(\.)?\d+)(pm)/g )
{
print $1+12,"\n";
}
}
__DATA__
Time 3.15am
Time 3.16pm
Time 5.17pm
Time 1.11am
Time 1.01pm
Time 12.11pm
Related
rencently I have met a strange bug when use a dynamic regular expressions in perl for Nesting brackets' match. The origin string is " {...test{...}...} ", I want to grep the pair brace begain with test, "test{...}". actually there are probably many pairs of brace before and end this group , I don't really know the deepth of them.
Following is my match scripts: nesting_parser.pl
#! /usr/bin/env perl
use Getopt::Long;
use Data::Dumper;
my %args = #ARGV;
if(exists$args{'-help'}) {printhelp();}
unless ($args{'-file'}) {printhelp();}
unless ($args{'-regex'}) {printhelp();}
my $OpenParents;
my $counts;
my $NestedGuts = qr {
(?{$OpenParents = 0})
(?>
(?:
[^{}]+
| \{ (?{$OpenParents++;$counts++; print "\nLeft:".$OpenParents." ;"})
| \} (?(?{$OpenParents ne 0; $counts++}) (?{$OpenParents--;print "Right: ".$OpenParents." ;"})) (?(?{$OpenParents eq 0}) (?!))
)*
)
}x;
my $string = `cat $args{'-file'}`;
my $partten = $args{'-regex'} ;
print "####################################################\n";
print "Grep [$partten\{...\}] from $args{'-file'}\n";
print "####################################################\n";
while ($string =~ /($partten$NestedGuts)/xmgs){
print $1."}\n";
print $2."####\n";
}
print "Regex has seen $counts brackts\n";
sub printhelp{
print "Usage:\n";
print "\t./nesting_parser.pl -file [file] -regex '[regex expression]'\n";
print "\t[file] : file path\n";
print "\t[regex] : regex string\n";
exit;
}
Actually my regex is:
our $OpenParents;
our $NestedGuts = qr {
(?{$OpenParents = 0})
(?>
(?:
[^{}]+
| \{ (?{$OpenParents++;})
| \} (?(?{$OpenParents ne 0}) (?{$OpenParents--})) (?(?{$OpenParents eq 0} (?!))
)*
)
}x;
I have add brace counts in nesting_parser.pl
I also write a string generator for debug: gen_nesting.pl
#! /usr/bin/env perl
use strict;
my $buffer = "{{{test{";
unless ($ARGV[0]) {print "Please specify the nest pair number!\n"; exit}
for (1..$ARGV[0]){
$buffer.= "\n\{\{\{\{$_\}\}\}\}";
#$buffer.= "\n\{\{\{\{\{\{\{\{\{$_\}\}\}\}\}\}\}\}\}";
}
$buffer .= "\n\}}}}";
open TEXT, ">log_$ARGV[0]";
print TEXT $buffer;
close TEXT;
You can generate a test file by
./gen_nesting.pl 1000
It will create a log file named log_1000, which include 1000 lines brace pairs
Now we test our match scripts:
./nesting_parser.pl -file log_1000 -regex "test" > debug_1000
debug_1000 looks like a great perfect result, matched successfully! But when I gen a 4000 lines test log file and match it again, it seem crashed:
./gen_nesting.pl 4000
./nesting_parser.pl -file log_4000 -regex "test" > debug_4000
The end of debug_4000 shows
{{{{3277}
####
Regex has seen 26213 brackts
I don't know what's wrong with the regex expresions, mostly it works well for paired brackets, untill recently I found it crashed when I try to match a text file more than 600,000 lines.
I'm really confused by this problems,
I really hope to solve this problem.
thank you all!
First for matching nested brackets I normally use Regexp::Common.
Next, I'm guessing that your problem is that Perl's regular expression engine breaks after matching 32767 groups. You can verify this by turning on warnings and looking for a message like Complex regular subexpression recursion limit (32766) exceeded.
If so, you can rewrite your code using /g and \G and pos. The idea being that you match the brackets in a loop like this untested code:
my $start = pos($string);
my $open_brackets = 0;
my $failed;
while (0 < $open_brackets or $start == pos($string)) {
if ($string =~ m/\G[^{}]*(\{|\})/g) {
if ($1 eq '{') {
$open_brackets++;
}
else {
$open_brackets--;
}
}
else {
$failed = 1;
break; # WE FAILED TO MATCH
}
}
if (not $failed and 0 == $open_brackets) {
my $matched = substr($string, $start, pos($string));
}
I am trying to extract some patterns out of a log file but I am unable to print them properly.
Examples of log strings :
1) sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666
2) sequence_history/buckets/FPJ.INV_DOM_16_PRD.41987.9616
I want to extract 3 things :
A = "FPJ.INV_DOM_16_PRD" B = "47269" C = 9616 or 2644666 (if the line
has endid then C = 2644666 else it's 9616)
log line can either be of type 1 or 2. I am able to extract A and B but I am stuck with C as I need a conditional statement for it and I am not able to extract it properly. I am pasting my code :
my $string='/sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666';
if ($string =~ /sequence_history\/buckets\/(.*)/){
my $line = $1;
print "$line\n";
if($line =~ /(FPJ.*PRD)\.(\d*)\./){
my $topic_type_string = $1;
my $topic_id = $2;
print "$1\n$2\n";
}
if($string =~ /(?(?=endid=)\d*$)/){
# how to print match pattern here?
print "match\n";
}
Thanks in advance!
This will do the job:
use Modern::Perl;
use Data::Dumper;
my $re = qr/(FPJ.+?PRD)\.(\d+)\..*?(\d+)$/;
while(<DATA>) {
chomp;
my (#l) = $_ =~ /$re/g;
say Dumper\#l;
}
__DATA__
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666
sequence_history/buckets/FPJ.INV_DOM_16_PRD.41987.9616
Output:
$VAR1 = [
'FPJ.INV_DOM_16_PRD',
'47269',
'2644666'
];
$VAR1 = [
'FPJ.INV_DOM_16_PRD',
'41987',
'9616'
];
Explanation:
( : start group 1
FPJ : literally FPJ
.+? : 1 or more any character but newline, not greedy
PRD : literally PRD
) : end group 1
\. : a dot
( : start group 2
\d+ : 1 or more digit
) : end group 2
\. : a dot
.*? : 0 or more any character not greedy
( : start group 3
\d+ : 1 or more digit
) : end group 3
$ : end of string
If you are trying to fetch some entries in log file, then you can use file handles in perl. In below code i'm trying to fetch the entries from a log file named as test.log
Entries of the log are as below.
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666
sequence_history/buckets/FPJ.INV_DOM_16_PRD.41987.9616
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.69886?startid=2644000&endid=26765849
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.24465?startid=2644000&endid=836783741
Below is the perl script to fetch required data.
#!/usr/bin/perl
use strict;
use warnings;
open (FH, "test.log") || die "Not able to open test.log $!";
my ($a,$b,$c);
while (my $line=<FH>)
{
if ($line =~ /sequence_history\/buckets\/.*endid=(\d*)/)
{
$c= $1;
if ($line =~ /(FPJ.*PRD)\.(\d*)\.(\d*)\?/)
{
$a=$1;
$b=$2;
}
}
else
{
if ($line =~ /sequence_history\/buckets\/(FPJ.*PRD)\.(\d*)\.(\d*)/)
{
$a=$1;
$b=$2;
$c=$3;
}
}
print "\n \$a=$a\n \$b=$b\n \$c=$c \n";
}
Output:
$a=FPJ.INV_DOM_16_PRD
$b=47269
$c=2644666
$a=FPJ.INV_DOM_16_PRD
$b=41987
$c=9616
$a=FPJ.INV_DOM_16_PRD
$b=47269
$c=26765849
$a=FPJ.INV_DOM_16_PRD
$b=47269
$c=836783741
You can use the above code by replacing "test.log" by log file name (along with its path) from which you want to fetch data as shown below.
open (FH, "/path/to/log/file/test.log") || die "Not able to open test.log $!";
Working from an example found else where on stackoverflow.com I am attempting to replace on the Nth instance of a regex match on a string in Perl. My code is as follows:
#!/usr/bin/env perl
use strict;
use warnings;
my $num_args = $#ARGV +1;
if($num_args != 3) {
print "\nUsage: replace_integer.pl occurance replacement to_replace";
print "\nE.g. `./replace_integer.pl 1 \"INTEGER_PLACEHOLDER\" \"method(0 , 1, 6);\"`";
print "\nWould output: \"method(INTEGER_PLACEMENT , 1, 6);\"\n";
exit;
}
my $string =$ARGV[2];
my $cont =0;
sub replacen {
my ($index,$original,$replacement) = #_;
$cont++;
return $cont == $index ? $replacement: $original;
}
sub replace_quoted {
my ($string, $index,$replacement) = #_;
$cont = 0; # initialize match counter
$string =~ s/[0-9]+/replacen($index,$1,$replacement)/eg;
return $string;
}
my $result = replace_quoted ( $string, $ARGV[0] ,$ARGV[1]);
print "RESULT: $result\n";
For
./replace_integer.pl 3 "INTEGER_PLACEHOLDER" "method(0, 1 ,6);"
I'd expect an output of
RESULT: method(0, 1 ,INTEGER_PLACEHOLDER);
Unfortunately I get an output of
RESULT: method(, ,INTEGER_PLACEHOLDER);
With these warnings/errors
Use of uninitialized value in substitution iterator at ./replace_integer.pl line 26.
Use of uninitialized value in substitution iterator at ./replace_integer.pl line 26.
Line 26 is the following line:
$string =~ s/[0-9]+/replacen($index,$1,$replacement)/eg;
I have determined this is due to $1 being uninitialised. To my understanding $1 should have the value of the last match. Given my very simple regex ([0-9]+) I see no reason why it should be uninitialised.
I am aware there are easier ways to find and replace the Nth instance in sed but I will require Perl's back and forward references once this hurdle is overcome (not supported by sed)
Does anyone know the cause of this error and how to fix it?
I am using Perl v5.18.2 , built for x86_64-linux-gnu-thread-multi
Thank you for your time.
$1 is only set after you capture a pattern, for example:
$foo =~ /([0-9]+)/;
# $1 equals whatever was matched between the parens above
Try wrapping your matching in parens to capture it to $1
I would write it like this
The while loop iterates over occurrences of the \d+ pattern in the string. When the Nth occurrence is found the last match is replaced using a call to substr using the values in built-in arrays #- (the start of the last match) and #+ (the end of the last match)
#!/usr/bin/env perl
use strict;
use warnings;
#ARGV = ( 3, 'INTEGER_PLACEHOLDER', 'method(0, 1, 6);' );
if ( #ARGV != 3 ) {
print qq{\nUsage: replace_integer.pl occurrence replacement to_replace};
print qq{\nE.g. `./replace_integer.pl 1 "INTEGER_PLACEHOLDER" "method(0 , 1, 6);"`};
print qq{\nWould output: "method(INTEGER_PLACEMENT , 1, 6);"\n};
exit;
}
my ( $occurrence, $replacement, $string ) = #ARGV;
my $n;
while ( $string =~ /\d+/g ) {
next unless ++$n == $occurrence;
substr $string, $-[0], $+[0]-$-[0], $replacement;
last;
}
print "RESULT: $string\n";
output
$ replace_integer.pl 3 INTEGER_PLACEHOLDER 'method(0, 1, 6);'
RESULT: method(0, 1, INTEGER_PLACEHOLDER);
$ replace_integer.pl 2 INTEGER_PLACEHOLDER 'method(0, 1, 6);'
RESULT: method(0, INTEGER_PLACEHOLDER, 6);
$ replace_integer.pl 1 INTEGER_PLACEHOLDER 'method(0, 1, 6);'
RESULT: method(INTEGER_PLACEHOLDER, 1, 6);
This is a part of my input Genbank file:
LOCUS AC_000005 34125 bp DNA linear VRL 03-OCT-2005
DEFINITION Human adenovirus type 12, complete genome.
ACCESSION AC_000005 BK000405
VERSION AC_000005.1 GI:56160436
KEYWORDS .
SOURCE Human adenovirus type 12
ORGANISM Human adenovirus type 12
Viruses; dsDNA viruses, no RNA stage; Adenoviridae; Mastadenovirus.
REFERENCE 1 (bases 1 to 34125)
AUTHORS Davison,A.J., Benko,M. and Harrach,B.
TITLE Genetic content and evolution of adenoviruses
JOURNAL J. Gen. Virol. 84 (Pt 11), 2895-2908 (2003)
PUBMED 14573794
And I want to extract the journal title for example J. Gen. Virol. (not including the issue number and pages)
This is my code and it doesn't give any result so I am wondering what goes wrong. I did use parentheses for $1, $2 etc... And though it worked my tutor told me to try without using that method, use substr instead.
foreach my $line (#lines) {
if ( $line =~ m/JOURNAL/g ) {
$journal_line = $line;
$character = substr( $line, $index, 2 );
if ( $character =~ m/\s\d/ ) {
print substr( $line, 12, $index - 13 );
print "\n";
}
$index++;
}
}
Another way to do this, is to take advantage of BioPerl, which can parse GenBank files:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
my $io=Bio::SeqIO->new(-file=>'AC_000005.1.gb', -format=>'genbank');
my $seq=$io->next_seq;
foreach my $annotation ($seq->annotation->get_Annotations('reference')) {
print $annotation->location . "\n";
}
If you run this script with AC_000005.1 saved in a file called AC_000005.1.gb, you get:
J. Gen. Virol. 84 (PT 11), 2895-2908 (2003)
J. Virol. 68 (1), 379-389 (1994)
J. Virol. 67 (2), 682-693 (1993)
J. Virol. 63 (8), 3535-3540 (1989)
Nucleic Acids Res. 9 (23), 6571-6589 (1981)
Submitted (03-MAY-2002) MRC Virology Unit, Church Street, Glasgow G11 5JR, U.K.
Rather than matching and using substr, it is much easier to use a single regex to capture the whole JOURNAL line and use brackets to capture the text representing the journal information:
foreach my $line (#lines) {
if ($line =~ /JOURNAL\s+(.+)/) {
print "Journal information: $1\n";
}
}
The regular expression looks for JOURNAL followed by one or more whitespace characters, and (.+) captures the rest of the characters in the line.
To get the text without using $1, I think you're trying to do something like this:
if ($line =~ /JOURNAL/) {
my $ix = length('JOURNAL');
# variable containing the journal name
my $j_name;
# while the journal name is not defined...
while (! $j_name) {
# starting with $ix = the length of the word JOURNAL, get character $ix in the string
if (substr($line, $ix, 1) =~ /\s/) {
# if it is whitespace, increase $ix by one
$ix++;
}
else {
# if it isn't whitespace, we've found the text!!!!!
$j_name = substr($line, $ix);
}
}
If you already know how many characters there are in the left-hand column, you can just do substr($line, 12) (or whatever) to retrieve a substring of $line starting at character 12:
foreach my $line (#lines) {
if ($line =~ /JOURNAL/) {
print "Journal information: " . substr($line, 12) . "\n";
}
}
You can combine the two techniques to eliminate the issue number and dates from the journal data:
if ($line =~ /JOURNAL/) {
my $j_name;
my $digit;
my $indent = 12; # the width of the left-hand column
my $ix = $indent; # we'll use this to track the characters in our loop
while (! $digit) {
# starting with $ix = the length of the indent,
# get character $ix in the string
if (substr($line, $ix, 1) =~ /\d/) {
# if it is a digit, we've found the number of the journal
# we can stop looping now. Whew!
$digit = $ix;
# set j_name
# get a substring of $line starting at $indent going to $digit
# (i.e. of length $digit - $indent)
$j_name = substr($line, $indent, $digit-$indent);
}
$ix++;
}
print "Journal information: $j_name\n";
}
I think it would have been easier just to get the data from the Pubmed API! ;)
Have a strange issue with my regex.
My regex is truncating the last character , in the example below it should return the value 32 but it is instead returning 3.
Note that the value could be up to 10 digits!!!
$word = "thisisit=";
$line = "hello thisisit=32 byefornow ";
if ($line =~ m/$word(.*?)\d /)
{
print $1; #returns 3 instead of 32
}
Thanks.
You can do:
if ($line =~ /$word(\d+)/) # This will capture all numbers after your $word
{
print $1;
}
You can also refine to:
if ($line =~ /$word\s*(\d+)/) # In case you're having like "thisisit= 32 byefornow"
Or, to capture everything and stop after first white space:
if ($line =~ /$word(.+?)\s/)
{
print $1;
}
You should ask for it to return zero or any number of digits:
($line =~ m/$word(.*?)\d* /)
At least one digit: \d+
Two digits: \d{2}
I'm not sure what you are looking for here, in terms of the specs.