Use RegEx to find dates and increment year by a value

I have a large number of files that contain dates. I would like to use a regular expression to find the dates and, if possible, increment the year of each date by 10.
The files can contain multiple date formats, for example:
04/22/78
06-OCT-14
How would one write a regular expression that could find, increment, and replace the dates, or even just the year of the dates?
I plan to use a text editor like TextPad, UltraEdit, or Notepad++ to search the files.

Assuming each date has the form field.field.year, where . can be any separator and the two-digit year is the last thing on the line,
you can use a simple Perl one-liner to do this:
perl -ne 's/(\d+)$/($1+10)/e && print' filename
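The /e modifier evaluates the replacement ($1+10) as Perl code, so this adds 10 to the year and prints each line on which a substitution was made. Note that years of 90 and above become three digits (95 becomes 105).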
Output for this is:
04/22/88
06-OCT-24

I just wrote this Python snippet to get it done:
import re

def add_ten_years(date):
    # mat.groups(): [0] whole date, [1] first field, [2] separator,
    # [3] middle field, [4] separator, [5] two-digit year
    reg = r"((\d{2})(.)(\w{2,4})(.)(\d{2}))"
    mat = re.search(reg, date)
    if mat:
        mat = mat.groups()
        return ''.join(mat[1:5]) + str(int(mat[5]) + 10)

print(add_ten_years("04/22/78"))
print(add_ten_years("06-OCT-14"))
You can adjust the regex pattern to generalize it further, and the approach translates easily to other languages. Hope it helps!
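The answers above each rewrite at most one date per line or string, and the Python version returns only the shifted date rather than the surrounding text. Since the question asks to find, increment, and replace dates throughout files, here is a minimal Python sketch using re.sub with a replacement function. It assumes the separator is / or -, that the same separator appears on both sides, and that two-digit years should wrap around (95 + 10 gives 05); DATE_RE and shift_years are hypothetical names, not part of either answer.

```python
import re

# Assumed shape: two digits, a / or - separator, a 2-4 character middle
# field (e.g. "22" or "OCT"), the same separator again, then a two-digit year.
DATE_RE = re.compile(r"\b(\d{2})([/-])(\w{2,4})\2(\d{2})\b")


def shift_years(text, years=10):
    """Return text with `years` added to the year of every matched date."""
    def bump(m):
        # Assumption: keep the year two digits wide by wrapping (95 + 10 -> 05).
        new_year = (int(m.group(4)) + years) % 100
        return f"{m.group(1)}{m.group(2)}{m.group(3)}{m.group(2)}{new_year:02d}"
    return DATE_RE.sub(bump, text)


print(shift_years("born 04/22/78, hired 06-OCT-14, left 12/31/95"))
# born 04/22/88, hired 06-OCT-24, left 12/31/05
```

To process a file you could read its contents, pass them through shift_years, and write the result back. The wrap-around is a design choice; drop the % 100 if three-digit results are acceptable.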

Related

regex expression to fetch last week data using perl

I have a Perl script which generates its output in one CSV file. The script produces output on a monthly basis and runs every Thursday. Here is the regular expression I put into the Perl script:
'{"Date":{"\$regex":/^(0[1-9]|[12][0-9]|3[01])-10-2017/i}}'
But once this output is generated, I need to copy last week's data (e.g. 12th Oct to 18th Oct) and send it to members. So I want a regular expression that selects only last week's output (the most recent week).
You could generate such a regex with something like the following:
use strict;
use warnings;
use POSIX qw/strftime/;
my $now = time;
my @dates;
push @dates, strftime "%d-%m-%Y", localtime($now - $_ * 86_400 ) for (0..6);
my $regex_string = '(' . join( '|', @dates) . ')';
Hope that works for you.
I am aware of a potential problem with this just after Daylight Saving Time starts. If that is really a problem, loop 13 times instead, in steps of 43,200 seconds (half-day decrements).
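For comparison, here is a Python sketch of the same idea. It steps back whole calendar days with date and timedelta instead of subtracting 86,400 seconds from a timestamp, which sidesteps the Daylight Saving Time issue without the half-day loop. The function name last_week_regex, the DD-MM-YYYY format, and the leading ^ anchor (borrowed from the question's pattern) are assumptions.

```python
from datetime import date, timedelta


def last_week_regex(today=None, days=7):
    """Return a regex matching any of the last `days` dates as DD-MM-YYYY."""
    if today is None:
        today = date.today()
    # Calendar arithmetic on date objects never lands on the wrong day
    # because of a 23- or 25-hour DST day.
    dates = [(today - timedelta(days=i)).strftime("%d-%m-%Y") for i in range(days)]
    # Digits and '-' are not special outside a character class, so no escaping.
    return "^(" + "|".join(dates) + ")"


print(last_week_regex(date(2017, 10, 18)))
# ^(18-10-2017|17-10-2017|16-10-2017|15-10-2017|14-10-2017|13-10-2017|12-10-2017)
```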

Get part of a string based on conditions using regex

For the life of me, I can't figure out which combination of regular expression characters will extract the part of the string I want. The string comes from a for loop over 400 thousand lines (in no particular order); I find each string by matching a unique number passed in by a for loop over an array.
For every string I'm trying to get a date number (such as 20151212 below).
Given the following example strings (pulled from a CSV file with 400k+ lines):
String1:
314513,,Jr.,John,Doe,652622,U51523144,,20151212,A,,,,,,,
String2:
365422,johnd@blankity.com,John,Doe.,Jr,987235,U23481,z725432,20160221,,,,,,,,
String3:
6231,,,,31248,U51523144,,,CB,,,,,,,
There are several complications here:
Some names contain a ",", so a line can have more than 15 commas.
We don't know the value of the date, only that it has a format such as (get-date).tostring("yyyyMMdd").
For those who can think of a better way...
We are given two CSV files to match. Algorithmic steps:
Look in CSV file 1 for the ID number (in the 2nd column).
** No ID numbers are blank in CSV file 1.
Look in CSV file 2 for the ID number from CSV file 1 and, on that same line, get the date. Once I have the date, add it in the 5th column of CSV file 1, on the same row as the ID number.
** Note: CSV file 2 will have $null for some of the values in the ID number column.
I'm open to suggestions (including using the Import-Csv cmdlet, although I am not yet familiar with its parameters or with the syntax for looping over the values it returns).
You could try something like this:
,(19|20)[0-9]{2}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01]),
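This will match all dates in the given format from 1900 to 2099. It is also specific enough to rule out most other random numbers, although without a larger sample of data it's impossible to say for sure. It does not check that the day exists in that month, so a value such as 20150231 would still match.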
Then in PowerShell:
gc data.csv | where { $_ -match ",((19|20)[0-9]{2}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])),"} | % { $matches[1] }
In the PowerShell version we added capturing parentheses around the part we want and reference that group by its number in the $matches index.
If you are only interested in the line for a particular ID that precedes the date, you could use a lookbehind. For example:
$id=314513; # Or maybe U23481
gc c:\temp\reg.txt | where { $_ -match "(?<=$id.*),((19|20)[0-9]{2}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])),"} | % { $matches[1] }
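Python's built-in re module rejects variable-width lookbehind such as (?<=$id.*), so a port of the PowerShell approach has to consume the text between the ID and the date instead. A minimal sketch, assuming the ID appears as a whole comma-separated field somewhere before the date on the same line; find_date and date_for_id are hypothetical helper names.

```python
import re

# The answer's date pattern (YYYYMMDD, 1900-2099) as a single capturing group.
DATE = r"((?:19|20)[0-9]{2}(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))"


def find_date(line):
    """Return the first comma-delimited YYYYMMDD date on the line, or None."""
    m = re.search("," + DATE + ",", line)
    return m.group(1) if m else None


def date_for_id(lines, record_id):
    """Return the date from the first line where record_id is a field before it."""
    # (?:^|,)ID, pins the ID to a whole field; (?:.*,)? skips any fields between.
    pattern = re.compile(r"(?:^|,)" + re.escape(record_id) + r",(?:.*,)?" + DATE + ",")
    for line in lines:
        m = pattern.search(line)
        if m:
            return m.group(1)
    return None


rows = [
    "314513,,Jr.,John,Doe,652622,U51523144,,20151212,A,,,,,,,",
    "365422,johnd@blankity.com,John,Doe.,Jr,987235,U23481,z725432,20160221,,,,,,,,",
    "6231,,,,31248,U51523144,,,CB,,,,,,,",
]
print(date_for_id(rows, "U23481"))  # 20160221
```

Building a dictionary from one file and looking IDs up in it would avoid rescanning all 400k lines per ID; this sketch keeps the one-ID-at-a-time shape of the answer.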

Find and Replace Regular Expression (Notepad++)

I'm looking to do a global search and replace in Notepad++ that will look for the following text:
date and %c
Examples would be:
date +%m-%d%H:%M:%S%c
date +%m-%d%H%c:%M:%S
date +%c-%m-%d%H:%M:%S
etc.
I can find the string by searching for
date (.)+%c
but I can't think for the life of me what the replacement would be. I want to replace %c with %z (or something else later on) and keep the remaining text.
Thanks for any help.
Search for (date.*?)%c
and replace with: $1%z
You can search for
date (.*?)%c(.*)
and replace it by
date $1%z$2
Output for your examples:
date +%m-%d%H:%M:%S%z
date +%m-%d%H%z:%M:%S
date +%z-%m-%d%H:%M:%S
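The same lazy-capture substitution can be checked outside Notepad++. Here is a small Python sketch (swap_c_for_z is a hypothetical name); note that Python's re.sub writes the backreference as \1 rather than $1.

```python
import re


def swap_c_for_z(line):
    # (date.*?) lazily captures everything from "date" up to the first %c,
    # and \1 puts that captured text back in front of the new %z.
    return re.sub(r"(date.*?)%c", r"\1%z", line)


for example in ["date +%m-%d%H:%M:%S%c", "date +%m-%d%H%c:%M:%S", "date +%c-%m-%d%H:%M:%S"]:
    print(swap_c_for_z(example))
# date +%m-%d%H:%M:%S%z
# date +%m-%d%H%z:%M:%S
# date +%z-%m-%d%H:%M:%S
```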

Detect quoted strings in sql query

I am writing a bash script to detect certain classes of strings in a SQL query (all upper-case, all lower-case, all numeric characters, etc.). Before doing the classification, I want to extract all quoted strings. I am having trouble getting a regex that will properly extract the quoted strings from the query string. For example, take this query from the TPC-H benchmark:
select
o_year,
sum(case
when nation = 'JAPAN' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'ASIA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'MEDIUM BRUSHED BRASS'
) as all_nations
group by
o_year
order by
o_year;
It's a complex query, but that is beside the point. I need to be able to extract all of the single-quoted strings from this file and print each one on its own line, i.e.:
'JAPAN'
'ASIA'
'1995-01-01'
'1996-12-31'
'MEDIUM BRUSHED BRASS'
Right now (since I'm not very familiar with regex), all I have is:
printf '%s\n' $SQL_FILE_VARIABLE | grep -E "'*'"
But this doesn't support strings with spaces, and it doesn't work when multiple strings are on the same line of the file. Ideally, I can get this to work in my bash script, so preferably the solution will be grep/sed/perl. I have done some googling and have found solutions to similar problems, but I have not been able to get them to work for this in particular.
Any ideas how I can achieve this? Thanks.
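You want something like this:
printf '%s\n' "$SQL_FILE_VARIABLE" | grep -oE "'[^']*'"
[^']* matches any run of non-quote characters, so each match stops at the next closing quote. The -o flag makes grep print each match on its own line, so several strings on one line come out separately, and quoting "$SQL_FILE_VARIABLE" stops the shell from splitting the query at spaces, which is what broke strings like 'MEDIUM BRUSHED BRASS'.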
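Why not try /'(.*?)'/g
This matches everything between a pair of quotes and captures it. The lazy .*? stops at the first closing quote; a greedy .* would run from the first quote on a line to the last one, merging '1995-01-01' and '1996-12-31' into a single match.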
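The key idea in the grep answer, [^']* instead of .*, can also be sketched in Python. This version additionally assumes that a quote inside a literal is escaped by doubling it ('O''Brien'), as in standard SQL; quotes inside SQL comments are not handled. quoted_strings is a hypothetical helper name.

```python
import re

# A single-quoted SQL literal: a quote, then any mix of non-quote characters
# and doubled quotes (''), then the closing quote.
SQL_STRING = re.compile(r"'(?:[^']|'')*'")


def quoted_strings(sql):
    """Return every single-quoted literal in sql, in order, quotes included."""
    return SQL_STRING.findall(sql)


query = "where r_name = 'ASIA' and o_orderdate between date '1995-01-01' and date '1996-12-31'"
for s in quoted_strings(query):
    print(s)
# prints 'ASIA', '1995-01-01' and '1996-12-31', one per line
```

The same pattern should carry over to grep -oE "'([^']|'')*'" if escaped quotes matter in your queries.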

Use Perl to replace Unix timestamp within text string

OK, I have a data file with two columns of data: RecordNumber and Notes. They are separated by pipes and look like this:
Record1|1234567890 username notes notes notes notes 1254184921 username notes notes notes notes|
... This goes on for thousands of records.
Using a Perl script (and possibly some regex), I need to parse the notes column into 3 new pipe-separated columns to load into a table. The columns need to be Note_Date|Note_Username|Note_Text.
Each 10-digit number in the notes column is a Unix timestamp. My second task is to convert it to a regular timestamp. Any help would be appreciated.
Thanks.
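You may need to modify this for your needs:
use strict;
use warnings;
while (<>) {
    chomp;
    my @a = split(/\|/);
    # Each note: a timestamp, a username, then text up to the next digit
    while ($a[1] =~ /\s*(\d+)\s+(\w+)\s+([^0-9]*)/g) {
        my ($t, $u, $n) = ($1, $2, $3);
        $n =~ s/\s+$//;        # trim the space before the next timestamp
        $t = localtime($t);    # scalar localtime gives a readable date string
        # Record|Note_Date|Note_Username|Note_Text
        print join('|', $a[0], $t, $u, $n), "\n";
    }
}
Note that ([^0-9]*) stops at the first digit, so note text that contains numbers will be cut short.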
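A Python sketch of the same parsing, assuming each note begins with exactly 10 digits, that usernames contain no whitespace, and that note text never contains a 10-digit number followed by whitespace. Unlike [^0-9]* in the Perl answer, the lookahead lets note text contain other digits. Times are formatted in UTC so the result does not depend on the machine's time zone; switch to local time if that is what the table expects. split_notes is a hypothetical name, and the record number is kept as the first column so rows can be joined back.

```python
import re
from datetime import datetime, timezone

# Assumption: a note is a 10-digit Unix timestamp, a username without spaces,
# then free text running up to the next "10 digits + whitespace" or the end.
NOTE_RE = re.compile(r"(\d{10})\s+(\S+)\s*(.*?)\s*(?=\d{10}\s|$)")


def split_notes(line):
    """Turn 'Record|notes|' into 'Record|Note_Date|Note_Username|Note_Text' rows."""
    record, notes = line.rstrip("\n").split("|")[:2]
    rows = []
    for m in NOTE_RE.finditer(notes):
        # UTC keeps the output independent of the machine's time zone.
        when = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
        rows.append("|".join([record, when.strftime("%Y-%m-%d %H:%M:%S"),
                              m.group(2), m.group(3)]))
    return rows


line = "Record1|1234567890 username notes notes notes notes 1254184921 username notes notes notes notes|"
for row in split_notes(line):
    print(row)
# Record1|2009-02-13 23:31:30|username|notes notes notes notes
# Record1|2009-09-29 00:42:01|username|notes notes notes notes
```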