Regex to fetch last week's data using Perl

I have a Perl script that writes its output to a CSV file. The script selects data on a monthly basis and runs every Thursday. This is the regular expression I put into the Perl script:
'{"Date":{"\$regex":/^(0[1-9]|[12][0-9]|3[01])-10-2017/i}}'
But once this output is generated, I need to copy only last week's data (e.g. 12th Oct to 18th Oct) and send it to members. So I want a regular expression that selects only the most recent week of data, counted back from the day the script runs.

It would be possible to generate a regex something like the following:
use strict;
use warnings;
use POSIX qw/strftime/;
my $now = time;
my @dates;
# One dd-mm-YYYY string for today and for each of the previous six days.
push @dates, strftime "%d-%m-%Y", localtime($now - $_ * 86_400 ) for (0..6);
my $regex_string = '(' . join( '|', @dates) . ')';
Hope that works for you.
I am aware of a potential problem with this: around a Daylight Saving Time change a day is not exactly 86,400 seconds long, so a date could be skipped or repeated. If that is really a problem, iterate over a for loop 13 times in steps of 43,200 seconds (half a day) instead, and drop the duplicate dates.
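For comparison, here is a minimal Python sketch of the same idea. It does calendar arithmetic on date objects instead of subtracting seconds, which sidesteps the Daylight Saving Time concern. The function name `last_week_pattern` and the `^(?:...)` wrapping are my own assumptions, not part of the original script.

```python
from datetime import date, timedelta
import re

def last_week_pattern(today, days=7):
    """Return a regex source string matching dd-mm-YYYY for the last `days` dates."""
    # Calendar arithmetic on date objects, so a DST change cannot skip or repeat a day.
    dates = [(today - timedelta(days=n)).strftime("%d-%m-%Y") for n in range(days)]
    # The strings hold only digits and hyphens, so no regex escaping is needed here.
    return "^(?:" + "|".join(dates) + ")"

pattern = last_week_pattern(date(2017, 10, 18))
print(pattern)
# True for a date inside the window, False for the day before it.
print(bool(re.match(pattern, "12-10-2017")), bool(re.match(pattern, "11-10-2017")))
```

In a real run you would pass `date.today()` rather than a fixed date.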

Related

Use RegEx to find dates and increment year by a value

I have a large number of files that contain dates. I would like to use a Regular Expression to find the dates and if possible increment the year of the date by 10.
The files can contain multiple date formats, for example:
04/22/78
06-OCT-14
How would one write a regular expression that could find, increment, and replace the dates, or even just the year of the dates?
I plan to use a text editor like TextPad, UltraEdit, or Notepad++ to search the files.
Assuming each date has the form date.month.year, where . can be any field separator, and the year is the last thing on the line,
you can use a simple Perl one-liner to do this:
perl -ne 's/(\d+)$/($1+10)/e && print' filename
This will add 10 to the year, and print the date.
Output for this is:
04/22/88
06-OCT-24
I just wrote this Python snippet to get it done.
import re
def add_ten_years(date):
    # Groups: whole date, first field, separator, middle field, separator, two-digit year.
    reg = r"((\d{2})(.)(\w{2,4})(.)(\d{2}))"
    mat = re.search(reg, date)
    if mat:
        mat = mat.groups()
        return ''.join(mat[1:5]) + str(int(mat[5]) + 10)
print(add_ten_years("04/22/78"))
print(add_ten_years("06-OCT-14"))
You can adjust the regex pattern to generalize it further, and it can easily be translated to other languages. Hope it helped!
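If the dates are embedded in longer text, a substitution with a callback may be more convenient than rebuilding each date by hand. The following is only a sketch under my own assumptions: the name `bump_years`, the separators `/` and `-`, and the choice to wrap two-digit years modulo 100 (so 95 becomes 05) are not from the question.

```python
import re

# Two digits, a separator, two digits or a three-letter month, a separator, two-digit year.
DATE_RE = re.compile(r"\b(\d{2}[/-](?:\d{2}|[A-Za-z]{3})[/-])(\d{2})\b")

def bump_years(text, years=10):
    """Add `years` to every two-digit year found in `text`, leaving the rest untouched."""
    def repl(match):
        # Assumption: two-digit years wrap around at 100 and stay zero-padded.
        new_year = (int(match.group(2)) + years) % 100
        return match.group(1) + "%02d" % new_year
    return DATE_RE.sub(repl, text)

print(bump_years("shipped 04/22/78, returned 06-OCT-14"))
```

Whether wrapping at the century is what you want depends on the data; with four-digit years the problem disappears.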

Perl, regular expression, matching exactly 2 spaces does not work

I am working on a parser for STA/SSTA timing reports. The following cases of "Arrival Time" occurrence are possible:
Arrival Time 3373.000
- Arrival Time 638.700 | 100.404
Arrival Time Report
The goal is to match the 1st and 2nd cases, but ignore the 3rd case.
I tried two matching patterns in my Perl code:
1) if (m/^-?\s{1,2}Arrival\sTime/) { ($STA_DATA{$file}{$path}{Arrival_Time}) = m/\sArrival\sTime\s+(.*)\s+$/ }
2) if (m/^-\sArrival\sTime/ || m/^\s{1,2}Arrival\sTime/) { ($STA_DATA{$file}{$path}{Arrival_Time}) = m/\sArrival\sTime\s+(.*)\s+$/ }
Both of them pick up the 3rd case as well. I do not understand why.
I specified exactly one or two whitespace characters with \s{1,2}, no more than that. As the 3rd line contains more than two leading whitespace characters, it should not match the pattern. How is this possible?
The data you have published is not the same as you used in your test.
This program checks both of the regex patterns against the data copied directly from an edit of your original post. Neither pattern matches any of the lines in your data.
use strict;
use warnings;
use 5.010;
my (%STA_DATA, $file, $path);
while ( <DATA> ) {
    if ( /^-?\s{1,2}Arrival\sTime/ ) {
        say 'match1';
        # List assignment, as in the question, so the captured text is stored.
        ($STA_DATA{$file}{$path}{Arrival_Time}) = m/\sArrival\sTime\s+(.*)\s+$/;
    }
    if ( /^-\sArrival\sTime/ or m/^\s{1,2}Arrival\sTime/ ) {
        say 'match2';
        ($STA_DATA{$file}{$path}{Arrival_Time}) = m/\sArrival\sTime\s+(.*)\s+$/;
    }
}
__DATA__
Arrival Time 3373.000
- Arrival Time 638.700 | 100.404
Arrival Time Report
Here is a possible workaround you can try:
if (m/^-?\s{1,2}Arrival\sTime\s{2,}/) { ($STA_DATA{$file}{$path}{Arrival_Time}) = m/\sArrival\sTime\s+(.*)\s+$/ }
You can match the string "Arrival Time " with two or more spaces after it, ruling out the string "Arrival Time Report".
Can you confirm that your regex is inside a loop reading the input line by line?
If $_ contains the whole text, your observation would be expected, because you anchored the extracting regex to the end of the text by using a $.
It should help to replace the spaces in your data with Unicode U+2423 OPEN BOX, which is commonly used to show a space as a visible character.
␣␣␣␣␣␣Arrival␣Time␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣3373.000
␣␣␣␣-␣Arrival␣Time␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣638.700␣|␣100.404
␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣Arrival␣Time␣Report␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣
As rightfully requested by Borodin, for the benefit of others I am going to explain the mistake I made and show the solution.
The mistake I made is the following:
I wrongly assumed that my matching pattern was being applied to the text as seen in the .rpt file.
The three cases (relevant for my matching pattern) that can occur in such a file are the following:
Arrival Time 3373.000
- Arrival Time 638.700 | 100.404
Arrival Time Report
But I had forgotten that somewhere in the code I had implemented the following line:
s/->//g; s/\s\S+\s[v\^]\s//g; s/\s+/ /g;
It is the last substitution in this series that changes the original text into:
Arrival Time 3373.000
- Arrival Time 638.700 | 100.404
Arrival Time Report
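Therefore my matching patterns (presented in the question above) did not work as intended: after the substitution every run of whitespace is a single space, so \s{1,2} matches the 3rd case as well.
Knowing this, the solution is simple. I adjusted the matching pattern as follows: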
if (m/^\-?\sArrival\sTime\s\d+/) { ($STA_DATA{$file}{$path}{Arrival_Time}) = m/\sArrival\sTime\s(.*)\s?$/ }
I appreciate all the help and feedback received, and I am truly sorry for wasting everyone's time with this ill-defined problem.
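The effect is easy to reproduce. Below is a small Python sketch; the sample lines and their exact spacing are assumed for illustration (the original report spacing is not recoverable from the post), and `collapse` mirrors only the last substitution, s/\s+/ /g.

```python
import re

# Hypothetical report lines; the exact spacing is an assumption.
lines = [
    "  Arrival Time        3373.000",
    "- Arrival Time        638.700 | 100.404",
    "          Arrival Time Report      ",
]

original = re.compile(r"^-?\s{1,2}Arrival\sTime")     # pattern from the question
adjusted = re.compile(r"^-?\sArrival\sTime\s\d+")     # pattern from the final fix

def collapse(line):
    """Mirror s/\\s+/ /g: squeeze every whitespace run into one space."""
    return re.sub(r"\s+", " ", line)

for line in lines:
    squeezed = collapse(line)
    print(repr(squeezed),
          bool(original.search(line)),       # on the raw text
          bool(original.search(squeezed)),   # after the substitution
          bool(adjusted.search(squeezed)))   # adjusted pattern, after the substitution
```

On the raw text the third line is rejected, but once the whitespace is collapsed it starts with a single space and the original pattern accepts it. The adjusted pattern rejects it because it requires a digit after "Arrival Time".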

Regex matching after a new line in Perl

I am trying to use a regex in Perl to match different parts of a text which are not on the same line.
I have a file of approximately 200 MB in which all cases are similar to the following example:
rewfww
vfresrgt
rter
*** BLOCK 049 Aeee/Ed "ewewew"U 141202 0206
BLAH1
BLAH2
END
I want to extract everything on the same line after the "***" into $1, BLAH1 into $2 and BLAH2 into $3.
I have tried the following without success:
open(archive, "C:/Users/g/Desktop/blahs.txt") or die "die\n";
while(<archive>){
    if($_ =~ /^\*\*\*(.*)\n(.*)/s){
        print $1;
        print $2;
    }
}
One more complication: I don't know how many BLAHs are in each case. One case may have only BLAH1, another BLAH1, BLAH2 and BLAH3, etc. The only thing that is certain is the final "END", which separates the cases.
Regards
\*\*\*([^\n]*)\n|(?!^)\G\s*(?!\bEND\b)([^\n]+)
Try this. See demo.
https://regex101.com/r/vN3sH3/17
How about:
#!/usr/bin/perl
use strict;
use warnings;
open(my $archive, '<', "C:/Users/g/Desktop/blahs.txt") or die "die: $!";
while(<$archive>){
    if (/^\*{3}/ .. /END/) {
        s/^\*{3}//;
        print unless /END/;
    }
}
As far as I understand your question, the following works for me. Please update or provide feedback if you are looking for something more or less strict (or spot any mistakes!).
^(\*{3}.*\n{2})(([a-zA-Z])*([0-9]*)\n{2})*(END)$
^(\*{3}.*\n{2}) - Finds a line starting with three *s, followed by the rest of that line and two newlines. You could repeat this by adding * after the closing parenthesis if you want/need to check for a "false" start. It looks like you may have data in the file before this, but this is the start of the data you actually care about and want to capture.
(([a-zA-Z])*([0-9]*)\n{2})* - The desired word characters followed by a number (or numbers if your BLAH count is greater than 9), followed by two trailing newlines. The * at the end denotes that this group can repeat zero or more times, which accounts for the case where you have no data. If you want a failure when there is no data, use + instead of * to require one or more repetitions. This segment assumes you want to check for data in the format word+number. If that is not the case, this part can easily be modified to accept a wider range of data; let me know if you want/need a more or less strict case.
(END)$ - The regex ends with the sequence "END". If it is permissible for the data to continue and you just want to stop capturing at this point, do not include the $.
I don't have permission to post pictures yet, but a great site for checking your regex and seeing a visual representation of it is, in my opinion, https://www.debuggex.com/
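Since the number of BLAH lines varies, it may be simpler to collect them in a list than to force them into numbered capture groups. Here is a minimal line-by-line Python sketch of that approach; the function name `parse_blocks` and the dictionary layout are my own assumptions, and it reads one line at a time, so the file size should not matter.

```python
def parse_blocks(lines):
    """Collect each '***' header line and the lines that follow it up to END."""
    blocks = []
    current = None                      # None while we are outside a block
    for raw in lines:
        line = raw.strip()
        if line.startswith("***"):
            # Start a new block; keep the text after the three asterisks.
            current = {"header": line[3:].strip(), "items": []}
        elif current is not None:
            if line == "END":
                blocks.append(current)
                current = None
            elif line:                  # skip blank lines inside a block
                current["items"].append(line)
    return blocks

sample = [
    "rewfww\n", "vfresrgt\n", "rter\n",
    '*** BLOCK 049 Aeee/Ed "ewewew"U 141202 0206\n',
    "BLAH1\n", "BLAH2\n", "END\n",
]
for block in parse_blocks(sample):
    print(block["header"], block["items"])
```

To run it over the real file, pass an open file handle instead of the `sample` list.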

Use Perl to replace Unix timestamp within text string

OK, I have a data file with two columns of data: RecordNumber and Notes. They are separated by pipes and look like this:
Record1|1234567890 username notes notes notes notes 1254184921 username notes notes notes notes|
... This goes on for thousands of records.
Using a Perl script (and possibly some regex) I need to take the Notes column and parse it into 3 new pipe-separated columns to load into a table. The columns need to be Note_Date|Note_Username|Note_Text.
The 10-digit string of numbers that recurs throughout the Notes column is a Unix timestamp. My second task is to take this and convert it to a regular timestamp. Please, any help would be appreciated.
Thanks.
You may need to modify this for your needs:
use strict;
use warnings;
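while (<>) {
    # Column 0 is the record number, column 1 holds the notes.
    my @a = split(/\|/);
    # Each note is: timestamp digits, a username, then text up to the next digit.
    while ($a[1]=~/\s*(\d+)\s+(\w+)\s+([^0-9]*)/g) {
        my ($t, $u, $n) = ($1, $2, $3);
        # In scalar context localtime returns a readable date string.
        $t = localtime($t);
        print $a[0], "|$t $u $n|\n";
    }
}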
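Here is a rough Python sketch of the same parsing that writes the three requested columns separated by pipes. Several things in it are my assumptions: the name `split_notes`, keeping the record number as a leading column, formatting the time in UTC, and treating any 10-digit run as the start of a new note.

```python
import re
from datetime import datetime, timezone

# A note is: 10-digit timestamp, username, then text up to the next timestamp or the end.
NOTE_RE = re.compile(r"(\d{10})\s+(\S+)\s+(.*?)(?=\s+\d{10}\s|\s*$)")

def split_notes(line):
    """Turn 'Record|ts user text ts user text|' into one pipe-separated row per note."""
    record, notes = line.rstrip("\n").split("|")[:2]
    rows = []
    for ts, user, text in NOTE_RE.findall(notes):
        # Assumption: render the Unix timestamp in UTC so the output is reproducible.
        when = datetime.fromtimestamp(int(ts), tz=timezone.utc)
        rows.append("|".join([record, when.strftime("%Y-%m-%d %H:%M:%S"), user, text]))
    return rows

line = "Record1|1234567890 username notes notes notes notes 1254184921 username notes notes notes notes|"
for row in split_notes(line):
    print(row)
```

If the note text itself can contain 10-digit numbers, the pattern would need a stricter way to recognise where a note starts.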

Use Regex to modify specific column in a CSV

I'm looking to convert some strings in a CSV which are in 0000-2400 hour format to 00-24 hour format. e.g.
2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00
2011-01-02,"AA",12478,31703,12892,32575,"0908",-2.00,"1236",1.00,2475.00
2011-01-03,"AA",12478,31703,12892,32575,"0907",-3.00,"1239",4.00,2475.00
The 7th and 9th columns are departure and arrival times, respectively. Preferably the lines should look like this when I'm done:
2011-01-01,"AA",12478,31703,12892,32575,"09",-4.00,"12",-26.00,2475.00
The whole csv will eventually be imported into R and I want to try and handle some of the processing beforehand because it will be kinda large. I initially attempted to do this with Perl but I'm having trouble picking out multiple digits w/ a regex. I can get a single digit before a given comma with a lookbehind expression, but not more than one.
I'm also open to being told that doing this in Perl is needlessly silly and I should stick to R. :)
I may as well offer my own solution to this, which is
s/"(\d\d)\d\d"/"$1"/g
Like I mentioned in the comments, using a CSV module like Text::CSV is a safe option. This is a quick sample script showing how it's used. You'll notice that it does not preserve the quotes in the output, even though I set keep_meta_info. If that's important to you, I'm sure there's a way to fix it.
use strict;
use warnings;
use Data::Dumper;
use Text::CSV;
my $csv = Text::CSV->new({
    binary => 1,
    eol => $/,
    keep_meta_info => 1,
});
while (my $row = $csv->getline(*DATA)) {
    for ($row->[6], $row->[8]) {
        s/\d\d\K\d\d//;
    }
    $csv->print(*STDOUT, $row);
}
__DATA__
2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00
2011-01-02,"AA",12478,31703,12892,32575,"0908",-2.00,"1236",1.00,2475.00
2011-01-03,"AA",12478,31703,12892,32575,"0907",-3.00,"1239",4.00,2475.00
Output:
2011-01-01,AA,12478,31703,12892,32575,09,-4.00,12,-26.00,2475.00
2011-01-02,AA,12478,31703,12892,32575,09,-2.00,12,1.00,2475.00
2011-01-03,AA,12478,31703,12892,32575,09,-3.00,12,4.00,2475.00
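If keeping the quotes matters and the fields never contain embedded commas (an assumption that holds for the sample rows but may not hold for the whole file), a plain split on commas that touches only the 7th and 9th columns is enough. A minimal Python sketch, with `shorten_times` as a made-up name:

```python
def shorten_times(line, columns=(6, 8)):
    """Cut the HHMM values in the given zero-based columns down to HH, keeping the quotes."""
    # Assumption: no field contains a comma, so a plain split is safe.
    fields = line.rstrip("\n").split(",")
    for i in columns:
        value = fields[i].strip('"')
        fields[i] = '"%s"' % value[:2]   # keep only the hour digits, re-add the quotes
    return ",".join(fields)

row = '2011-01-01,"AA",12478,31703,12892,32575,"0906",-4.00,"1209",-26.00,2475.00'
print(shorten_times(row))
```

For data with quoted commas, a real CSV parser remains the safer choice.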