Finding unique file names from an html file - regex

$ cat downloaded_file.html
1373 STDMON11202010_company.txt<br> Monday, November 22, 2010 1:31 AM
How do I search an HTML file from my shell script and select the unique file names that start with STDMON and end with _company.txt?

If you have only digits between STDMON and _company.txt you can do:
grep -o 'STDMON[0-9]*_company\.txt' input.txt | sort -u
And if anything can appear between them, you can use a non-greedy match (the -P option needs GNU grep):
grep -oP 'STDMON.*?_company\.txt' input.txt | sort -u
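For comparison, the same extraction can be sketched in Python. This is only a minimal sketch and not part of the original answer: the function name `unique_filenames` is made up for illustration, and it assumes the digits-only pattern from the first grep command.

```python
import re

# Same pattern as the first grep command: STDMON, digits, then _company.txt.
PATTERN = re.compile(r"STDMON[0-9]*_company\.txt")


def unique_filenames(html_text):
    """Return matching file names in first-seen order, without duplicates."""
    # dict.fromkeys keeps insertion order and drops repeats.
    return list(dict.fromkeys(PATTERN.findall(html_text)))


if __name__ == "__main__":
    sample = (
        "1373 STDMON11202010_company.txt<br> Monday, November 22, 2010 1:31 AM\n"
        "1373 STDMON14959440_company.txt<br> Monday, November 22, 2010 1:31 AM\n"
        "1373 STDMON11202010_company.txt<br> Monday, November 22, 2010 1:31 AM\n"
    )
    for name in unique_filenames(sample):
        print(name)
```

Unlike `sort -u`, this keeps the names in the order they first appear rather than sorting them.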

awk -F'>|<' '$3 ~ /STDMON[0-9]+_company\.txt/ && !a[$0=$3]++' downloaded_file.html
Input
$ cat downloaded_file.html
1373 STDMON11202010_company.txt<br> Monday, November 22, 2010 1:31 AM
1373 STDMON11202010_company.txt<br> Monday, November 22, 2010 1:31 AM
1373 STDMON14959440_company.txt<br> Monday, November 22, 2010 1:31 AM
1373 STDMON11202010_company.txt<br> Monday, November 22, 2010 1:31 AM
1373 STDMON14959440_company.txt<br> Monday, November 22, 2010 1:31 AM
1373 STDMON11202010_company.txt<br> Monday, November 22, 2010 1:31 AM
1373 STDMON12342440_company.txt<br> Monday, November 22, 2010 1:31 AM
Output
$ awk -F'>|<' '$3 ~ /STDMON[0-9]+_company\.txt/ && !a[$0=$3]++' downloaded_file.html
STDMON11202010_company.txt
STDMON14959440_company.txt
STDMON12342440_company.txt


How to identify a list of dates with a specific day value between 2 dates in powershell

So I have this code (thanks to #reeeky2001), which answers my other post, "Trying to find a way to list all Friday dates between 2 dates in PowerShell":
$FiscalStart = [datetime]'2019-03-31'
$date2 = Get-Date -Hour 0 -Minute 0 -Second 0
$EvaluateDates = 1..($date2 - $FiscalStart).Days | % {$($FiscalStart).AddDays($_)} | ? {$_.DayOfWeek -eq 'monday'}
$EvaluateDates
Say today is Feb 09, 2020. If I run this code, the last Monday returned is Feb 03, 2020. How could I exclude that last date, given that the next Monday (Feb 10, 2020) is still in the future? More generally, how could I exclude any date whose next Monday would be in the future, assuming this code could be run on any day of the week?
In the end I added "-and $_ -lt (get-date)" to the filter:
$FiscalStart = [datetime]'2019-12-31'
$date2 = Get-Date -Hour 0 -Minute 0 -Second 0
1..($date2 - $FiscalStart).Days | % {($FiscalStart).AddDays($_)} |
? { $_.DayOfWeek -eq 'monday' -and $_ -lt (get-date)}
Monday, January 6, 2020 12:00:00 AM
Monday, January 13, 2020 12:00:00 AM
Monday, January 20, 2020 12:00:00 AM
Monday, January 27, 2020 12:00:00 AM
Monday, February 3, 2020 12:00:00 AM
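The same filter can be sketched in Python for readers who want to check the date arithmetic outside PowerShell. This is a rough equivalent under stated assumptions, not the answer's own code: the function name `weekdays_between` is hypothetical, and "today" is pinned to Feb 09, 2020 so that the run matches the output above.

```python
import datetime


def weekdays_between(start, end, weekday=0):
    """List dates after `start`, up to and including `end`, that fall on `weekday`.

    `weekday` follows date.weekday(): Monday is 0, Sunday is 6.
    """
    total_days = (end - start).days
    candidates = (
        start + datetime.timedelta(days=n) for n in range(1, total_days + 1)
    )
    return [d for d in candidates if d.weekday() == weekday]


if __name__ == "__main__":
    fiscal_start = datetime.date(2019, 12, 31)
    today = datetime.date(2020, 2, 9)  # fixed "today" so the run is repeatable
    for d in weekdays_between(fiscal_start, today):
        print(d.isoformat())
```

Because the range stops at `end`, no date in the future can appear in the result.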

FILTER non-working days from range/list of dates

I'm trying to create a list of dates between a start date and an end date (done). But now, I want to FILTER weekends out of that list.
The start date is defined, but the end date is based on a number of working days after the start date. The problem is, when I create the list using the following formula, all dates in between are included and I've made numerous attempts to FILTER said dates using WORKDAY.INTL and REGEXMATCH without success. Is it possible to modify this particular formula or do I need to start over with something different?
=ArrayFormula(TO_DATE(row(indirect("A"&A2):indirect("A"&B2))))
Here is an example of what I've done.
This is what I'm getting:
Friday, October 4, 2019
Saturday, October 5, 2019
Sunday, October 6, 2019
Monday, October 7, 2019
Tuesday, October 8, 2019
Wednesday, October 9, 2019
Thursday, October 10, 2019
Friday, October 11, 2019
Saturday, October 12, 2019
Sunday, October 13, 2019
This is what I'm after:
Friday, October 4, 2019
Monday, October 7, 2019
Tuesday, October 8, 2019
Wednesday, October 9, 2019
Thursday, October 10, 2019
Friday, October 11, 2019
Monday, October 14, 2019
Tuesday, October 15, 2019
Wednesday, October 16, 2019
Thursday, October 17, 2019
See if this works:
=query(ArrayFormula(TO_DATE(row(indirect("A"&A2):indirect("A"&B2)))), "where dayOfWeek(Col1) <> 7 and dayOfWeek(Col1) <> 1")
You can do it like this:
=ARRAYFORMULA(FILTER(ROW(INDIRECT("A"&A2&":A"&B2)),
REGEXMATCH(TEXT(ROW(INDIRECT("A"&A2&":A"&B2)), "ddd"), "[^(Sat|Sun)]")))
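To make the intended result concrete, here is a small Python sketch of the same idea: walk forward from the start date and keep only Monday to Friday until the requested number of working days is collected. The function name `working_days` is made up for illustration, and public holidays are deliberately ignored.

```python
import datetime


def working_days(start, count):
    """Return `count` dates from `start` onward, skipping Saturdays and Sundays."""
    days = []
    current = start
    while len(days) < count:
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(current)
        current += datetime.timedelta(days=1)
    return days


if __name__ == "__main__":
    # Start date taken from the question's example; 10 working days requested.
    for d in working_days(datetime.date(2019, 10, 4), 10):
        print(f"{d:%A, %B} {d.day}, {d.year}")
```

With a start of Friday, October 4, 2019 and ten working days, the list ends on Thursday, October 17, 2019, as in the "what I'm after" list.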

display each day's date python from today

I was able to display the week that starts every Saturday by:
today = now().date()
sat_offset = (today.weekday() - 5) % 7
week_start = today - datetime.timedelta(days=sat_offset)
This displays the week starting from last Saturday, but how would I show the date of each following day as well? So if the week of Oct. 27, 2018 is displayed, it should say:
Saturday : Oct. 27, 2018
Sunday: Oct. 28, 2018
Monday: Oct. 29, 2018
Tuesday: Oct. 30, 2018
Wednesday: Oct. 31, 2018
Thursday: Nov. 01, 2018
Friday: Nov. 02, 2018
Thank you for your help.
You can iterate through the days of the week using range and timedelta like so:
for i in range(7):
    day = week_start + datetime.timedelta(days=i)
    print(day.strftime("%A: %b. %d, %Y"))
For the week starting Oct. 27, 2018, this will produce dates like:
Saturday: Oct. 27, 2018
Sunday: Oct. 28, 2018
Monday: Oct. 29, 2018
Tuesday: Oct. 30, 2018
Wednesday: Oct. 31, 2018
Thursday: Nov. 01, 2018
Friday: Nov. 02, 2018
You can format the string however you want. Here is some info on dates in Python.

regular expression in R with nth occurrence

I am reading a text file in R and want to replace every 3rd occurrence of '|' with '\n'. Here is my code and input data:
**Input Data**
======================
'Monday, November 2, 2015|10:21:27|17:58:12|Tuesday, November 3, 2015|10:13:09|18:52:44|Wednesday, November 4, 2015|10:11:52|18:40:36|Thursday, November 5, 2015|10:31:42|18:16:57|Friday, November 6, 2015|10:13:13|--|Saturday, November 7, 2015|--|--|Sunday, November 8, 2015|--|--|Monday, November 9, 2015|--|--|Tuesday, November 10, 2015|10:03:20|18:07:52|Wednesday, November 11, 2015|09:40:20|18:42:20|Thursday, November 12, 2015|10:38:56|18:37:20|Friday, November 13, 2015|10:45:26|18:09:54|Saturday, November 14, 2015|--|--|Sunday, November 15, 2015|--|--|Monday, November 16, 2015|--|--|Tuesday, November 17, 2015|10:11:43|18:36:15|Wednesday, November 18, 2015|--|--|Thursday, November 19, 2015|--|--|Friday, November 20, 2015|12:14:25|20:25:08|Saturday, November 21, 2015|--|--|Sunday, November 22, 2015|--|--|Monday, November 23, 2015|10:08:08|17:57:35|Tuesday, November 24, 2015|14:30:32|--|'
**My R-Code**
====================
emp <- readChar(FileDir, (file.info(FileDir)$size-172))
emp <- gsub("\r\n","|",emp)
empTMP <- gsub('([^|]*|[^|]*|[^|]*)|',"\1\n",emp)
**output**
====================
"\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n|\001\n"
**Required output**
====================
Monday, November 2, 2015|10:21:27|17:58:12
Tuesday, November 3, 2015|10:13:09|18:52:44
Wednesday, November 4, 2015|10:11:52|18:40:36
Thursday, November 5, 2015|10:31:42|18:16:57
Friday, November 6, 2015|10:13:13|--
Saturday, November 7, 2015|--|--
Sunday, November 8, 2015|--|--
Monday, November 9, 2015|--|--
Tuesday, November 10, 2015|10:03:20|18:07:52
Wednesday, November 11, 2015|09:40:20|18:42:20
Thursday, November 12, 2015|10:38:56|18:37:20
Friday, November 13, 2015|10:45:26|18:09:54
Saturday, November 14, 2015|--|--
Sunday, November 15, 2015|--|--
Monday, November 16, 2015|--|--
Tuesday, November 17, 2015|10:11:43|18:36:15
Wednesday, November 18, 2015|--|--
Thursday, November 19, 2015|--|--
Friday, November 20, 2015|12:14:25|20:25:08
Saturday, November 21, 2015|--|--
Sunday, November 22, 2015|--|--
Monday, November 23, 2015|10:08:08|17:57:35
Tuesday, November 24, 2015|14:30:32|--
Kindly help me see what I am doing wrong. I checked the above regular expression in a text editor and it works perfectly fine; however, in R it is not producing the correct result.
The following works:
#input <- #your input string
x <- strsplit(input, split = "|", fixed = TRUE)[[1L]]
idx <- seq(3L, length(x), by = 3L)
x[idx] <- paste0(x[idx], "\n")
x[-idx] <- paste0(x[-idx], "|")
paste(x, collapse = "")
Or in one command:
paste(paste0(x <- strsplit(input, split = "|", fixed = TRUE)[[1L]],
             rep_len(c("|", "|", "\n"), length(x))), collapse = "")
And if you wanted to stick with gsub, this works as well:
gsub("([^|]*\\|[^|]*\\|[^|]*)\\|", "\\1\n", input)
Broken down (regex101 colored version):
gsub(paste0("(",        #start capturing group 1
            "[^|]*",    #matching anything but | 0 or more times
            "\\|",      #match | (must escape because it's reserved for OR)
            "[^|]*\\|", #again
            "[^|]*",    #again matching anything but |
            ")",        #end captured group
            "\\|"),     #captured group is followed by a third |
     "\\1\n", input)    #replace match with captured group followed by \n
                        # (instead of |)
(I just noticed your original attempt is very close; you just forgot to escape things properly: "\\1", not "\1", and "|" is reserved, so we have to escape that as well. Also, #CAFEBABE is right that this seems better suited to awk...)
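The same pattern carries over to other regex engines. As a cross-check, here is a minimal Python sketch (Python rather than R, and with a made-up function name, `break_every_third_pipe`) that applies the escaped pattern to a short sample.

```python
import re


def break_every_third_pipe(text):
    """Replace every third '|' in `text` with a newline."""
    # Group 1 holds two pipe-terminated fields plus a third field;
    # the pipe that follows the group is the one swapped for "\n".
    return re.sub(r"([^|]*\|[^|]*\|[^|]*)\|", r"\1\n", text)


if __name__ == "__main__":
    sample = (
        "Friday, November 6, 2015|10:13:13|--|"
        "Saturday, November 7, 2015|--|--|"
    )
    print(break_every_third_pipe(sample), end="")
```

A raw string is used for the pattern, so a single backslash is enough to escape each `|`. In an ordinary R string the backslash itself must be doubled, as in the answer above.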
empTMP <- gsub('([^|]*|[^|]*|[^|]*)|',"\1\n",emp)
This is the line that, IMHO, causes the trouble.
It should be
empTMP <- gsub('([^|]*\\|[^|]*\\|[^|]*)\\|',"\\1\n",emp)
(note the \\1 and the escaped \\| separators)
On the side: why do you want to use R for this task? It looks like something for standard shell scripting.
On the side 2: why Teradata? On which TD box do you use R?

How do I get coupon payment dates for a simple fixed bond using quantlib, quantlib-swig and python

I am trying to learn QuantLib (1.3) and its Python bindings using QuantLib-SWIG (1.2) on Ubuntu 13.04. As a starter, I am trying to determine the payment dates for a very simple bond, as given below, using the 30/360 European day counter:
from QuantLib import *
faceValue = 100.0
doi = Date(31, August, 2000)
dom = Date(31, August, 2008)
coupons = [0.05]
dayCounter = Thirty360(Thirty360.European)
schedule = Schedule(doi, dom, Period(Semiannual),
                    India(),
                    Unadjusted, Unadjusted,
                    DateGeneration.Backward, False)
Following are my questions:
Which method of schedule object will give me the payment dates?
Where do I need to specify the dayCounter object so that the dates are appropriately calculated?
Using Dimitri Reiswich's presentation, I tried mimicking the C++ code, but schedule.dates() raises an error saying there is no such method.
The payment dates for this fixed-rate bond (obtained using oocalc) are:
Feb 28, 2001; Aug 31, 2001
Feb 28, 2002; Aug 31, 2002
Feb 28, 2003; Aug 31, 2003
Feb 29, 2004; Aug 31, 2004
Feb 28, 2005; Aug 31, 2005
Feb 28, 2006; Aug 31, 2006
Feb 28, 2007; Aug 31, 2007
Feb 29, 2008; Aug 31, 2008
How do I get the payment dates for this simple bond using python & quantlib? Can someone please help?
If you want to look at the schedule you just generated, you can iterate over it:
>>> for d in schedule: print d
...
August 31st, 2000
February 28th, 2001
August 31st, 2001
February 28th, 2002
August 31st, 2002
February 28th, 2003
August 31st, 2003
February 29th, 2004
August 31st, 2004
February 28th, 2005
August 31st, 2005
February 28th, 2006
August 31st, 2006
February 28th, 2007
August 31st, 2007
February 29th, 2008
August 31st, 2008
or call list(schedule) if you want to store them. However, are you sure that those are the payment dates? They are the start and end date for accrual calculation; but some of these fall on a Saturday or a Sunday, and the bond will be paying on the next business day. You can see the effect if you instantiate the bond and retrieve the coupons:
>>> settlement_days = 3
>>> bond = FixedRateBond(settlement_days, faceValue, schedule, coupons, dayCounter)
>>> for c in bond.cashflows():
...     print c.date()
...
February 28th, 2001
August 31st, 2001
February 28th, 2002
September 2nd, 2002
February 28th, 2003
September 1st, 2003
March 1st, 2004
August 31st, 2004
February 28th, 2005
August 31st, 2005
February 28th, 2006
August 31st, 2006
February 28th, 2007
August 31st, 2007
February 29th, 2008
September 1st, 2008
September 1st, 2008
(that is, unless Saturdays and Sundays shouldn't be holidays for the Indian calendar. If you think they shouldn't, file a bug report with QuantLib).
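The shift from accrual dates to payment dates seen above can be reproduced with the standard library alone. The sketch below is not QuantLib's implementation: it assumes only Saturdays and Sundays are non-business days (it ignores the public holidays a real calendar would include), and the function name `roll_to_following_weekday` is made up for illustration.

```python
import datetime


def roll_to_following_weekday(d):
    """Move a Saturday or Sunday forward to the next Monday; leave other days alone."""
    # weekday(): Monday=0 ... Saturday=5, Sunday=6
    days_past_friday = d.weekday() - 4
    if days_past_friday > 0:
        return d + datetime.timedelta(days=3 - days_past_friday)
    return d


if __name__ == "__main__":
    # A few unadjusted schedule dates from the output above.
    for unadjusted in (
        datetime.date(2002, 8, 31),  # Saturday
        datetime.date(2004, 2, 29),  # Sunday
        datetime.date(2005, 2, 28),  # Monday
    ):
        print(unadjusted.isoformat(), "->", roll_to_following_weekday(unadjusted).isoformat())
```

Under that weekend-only assumption, August 31st, 2002 rolls to September 2nd, 2002 and February 29th, 2004 rolls to March 1st, 2004, which agrees with the cash-flow dates printed by the bond.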