JavaScript regular expression: how do I find a date without a year, or a date with a year before 2010?

I need to find dates without a year, or dates with a year before 2010.
Basically, these should be accepted:
Feb 15
Feb 20
Feb 20, 2009
Feb 20, 1995
and these should be rejected:
Feb 20, 2010
Feb 20, 2011
How do I do it?
Thanks,
Cheng

Try this:
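^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{1,2}(,\s(1[0-9]{3}|200[0-9]))?$
Note: The ", year" part is optional, so "Feb 15" matches; when a year is present it must be in the range 1000-2009. The ^ and $ anchors matter: without them, "Feb 20, 2010" would still produce a match on its leading "Feb 20".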
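To check the anchored pattern against the examples from the question, here is a small test written in Python. Python's re module treats this particular pattern the same way JavaScript's RegExp does, so in JavaScript the same pattern can be used as /^(...)$/.test(s). The helper name is_wanted is hypothetical.

```python
import re

# Same pattern as above: month, day, then an optional ", year" limited to 1000-2009.
DATE_RE = re.compile(
    r"^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{1,2}"
    r"(,\s(1[0-9]{3}|200[0-9]))?$"
)


def is_wanted(s):
    """True for a date with no year, or with a year from 1000 to 2009."""
    return DATE_RE.match(s) is not None


accepted = ["Feb 15", "Feb 20", "Feb 20, 2009", "Feb 20, 1995"]
rejected = ["Feb 20, 2010", "Feb 20, 2011"]

for s in accepted + rejected:
    print(f"{s!r}: {is_wanted(s)}")
```

If the dates are embedded in longer text instead of being separate strings, the anchors no longer apply and the pattern needs a different ending to keep "Feb 20, 2010" from matching as "Feb 20".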


Power Query/DAX to calculate monthly raw sales figure

Dear stackoverflow, please help!
I'm hoping for some assistance with data processing in Power BI, either using Power Query or DAX. At this point I am really stuck and can't figure out how to solve this problem.
The table below lists sales by Product, Month, and Year. The problem with my data is that the sales values are cumulative (year to date) rather than the raw number of sales for that month. In other words, each figure is the number of sales for that month (for that Year and Product combination) plus the cumulative figure for the preceding month. As you can see in the table below, the numbers get progressively larger in each category as the year progresses. The true number of sales for TVs in Feb 2021, for example, is the sales figure of 3 minus the corresponding figure for TVs in Jan 2021 (1), i.e. 2.
I really would appreciate if anyone knows of a solution to this problem. In reality, my table has hundreds of thousands of rows, so I cannot do the calculations manually.
Is there a way to use Power Query or DAX to create a calculated column with the Raw Sales figure for each month? Something that would check if Product and Year are equal, then subtract the Jan figure from the Feb figure and so on?
Any help will be very much appreciated,
Sales Table
Product  Sales (YTD)  Month  Year
TV       1            Jan    2021
Radio    4            Jan    2021
Cooker   5            Jan    2021
TV       3            Feb    2021
Radio    5            Feb    2021
Cooker   6            Feb    2021
TV       3            Mar    2021
Radio    6            Mar    2021
Cooker   8            Mar    2021
TV       5            Apr    2021
Radio    7            Apr    2021
Cooker   8            Apr    2021
TV       7            May    2021
Radio    8            May    2021
Cooker   8            May    2021
TV       9            Jun    2021
Radio    10           Jun    2021
Cooker   10           Jun    2021
TV       10           Jul    2021
Radio    10           Jul    2021
Cooker   10           Jul    2021
TV       11           Aug    2021
Radio    13           Aug    2021
Cooker   12           Aug    2021
TV       11           Sep    2021
Radio    13           Sep    2021
Cooker   12           Sep    2021
TV       12           Oct    2021
Radio    14           Oct    2021
Cooker   13           Oct    2021
TV       17           Nov    2021
Radio    19           Nov    2021
Cooker   17           Nov    2021
TV       19           Dec    2021
Radio    20           Dec    2021
Cooker   20           Dec    2021
TV       4            Jan    2022
Radio    2            Jan    2022
Cooker   3            Jan    2022
TV       5            Feb    2022
Radio    3            Feb    2022
Cooker   5            Feb    2022
Thanks, Jim
Give this a try in Power Query / M. It groups on Product and Year, sorts each group by calendar month, and subtracts the previous month's YTD figure from each row to get the period amount (the first month of each year keeps its YTD value).
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    MonthOrder = {"Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"},
    #"Grouped Rows" = Table.Group(Source, {"Product", "Year"}, {
        {"data", each
            let
                // Sort the group by calendar month first, then number the rows 0, 1, 2, ...
                r = Table.AddIndexColumn(Table.Sort(_, {each List.PositionOf(MonthOrder, [Month])}), "Index", 0, 1),
                // The first month keeps its YTD value; every later month subtracts the previous row's YTD value
                x = Table.AddColumn(r, "Period Sales", each if [Index] = 0 then [#"Sales (YTD)"] else [#"Sales (YTD)"] - r{[Index] - 1}[#"Sales (YTD)"])
            in
                x,
        type table}
    }),
    #"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Sales (YTD)", "Month", "Period Sales"}, {"Sales (YTD)", "Month", "Period Sales"})
in
    #"Expanded data"
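The same idea (sort each Product/Year group by calendar month, then subtract the previous row's YTD value) can be sketched in plain Python to check the expected numbers. This is an illustration, not Power Query code: the function name period_sales and the (product, ytd, month, year) tuple layout are assumptions, and it assumes no month is missing within a year.

```python
from itertools import groupby

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def period_sales(rows):
    """Turn (product, ytd, month, year) rows into per-month sales.

    Returns (product, year, month, ytd, period) tuples. Assumes every month
    is present; a gap would make one difference span several months.
    """
    # Sort by product, year, then calendar month (not alphabetical month name)
    ordered = sorted(rows, key=lambda r: (r[0], r[3], MONTHS.index(r[2])))
    result = []
    for (product, year), group in groupby(ordered, key=lambda r: (r[0], r[3])):
        previous_ytd = 0  # the YTD total restarts every year
        for _, ytd, month, _ in group:
            result.append((product, year, month, ytd, ytd - previous_ytd))
            previous_ytd = ytd
    return result


# A few rows from the table, deliberately out of order
sales = [
    ("TV", 3, "Feb", 2021), ("TV", 1, "Jan", 2021),
    ("Radio", 4, "Jan", 2021), ("Radio", 5, "Feb", 2021),
    ("TV", 4, "Jan", 2022), ("TV", 5, "Feb", 2022),
]
for row in period_sales(sales):
    print(row)
```

Sorting on the month's position in MONTHS rather than on the month name is what keeps "Apr" from sorting before "Jan".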

How do I use an f-string with a regex in Python?

This code works if I use a raw string only. However, as soon as I add the f prefix to r, it stops working.
Is there a way to make f-strings work with raw strings for re?
import re
lines = '''
04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009
Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009
Feb 2009; Sep 2009; Oct 2010
6/2008; 12/2009
2009; 2010
'''
rmonth = 'a'
regex = fr'(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})'
date_found = re.findall(regex, lines)
date_found
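f-strings treat curly braces as replacement fields, so {1,2} is evaluated as a Python expression (the tuple (1, 2)) instead of being left in the pattern as a regex quantifier. You can escape the braces you want to keep literally by doubling them: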
regex = fr'(\d{{1,2}})/(\d{{1,2}})/(\d{{4}}|\d{{2}})'
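A short sketch of doubled braces next to a real replacement field. The variable sep is hypothetical, added only to show an f-string interpolation inside the pattern; re.escape protects the pattern in case the interpolated text contains regex metacharacters.

```python
import re

lines = """
04/20/2009; 04/20/09; 4/20/09; 4/3/09
6/2008; 12/2009
"""

sep = "/"  # hypothetical value interpolated into the pattern
# {{ and }} become literal braces for the regex quantifiers;
# {re.escape(sep)} is the only real f-string replacement field.
regex = fr"(\d{{1,2}}){re.escape(sep)}(\d{{1,2}}){re.escape(sep)}(\d{{4}}|\d{{2}})"

date_found = re.findall(regex, lines)
print(date_found)
```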

All CSV values in column 0 are strings

For some reason, a CSV file I wrote with Python (on Windows 7) has all the values in column 0 as a single string, so I cannot perform any operations on them.
It has no labels.
The format is as follows (I would like to keep the last value, the date, in a date format):
"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
EDIT - When I read it with the csv module it prints it out like:
['Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0," date: Feb 04, 2016\t\t\t"']
What is the best way to convert the strings into comma separated values like this?
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date:, Feb 04, 2016
Thanks a lot.
s="Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
print(s)
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date: Feb 04, 2016
To add a comma after "date:" you need some extra logic (for example, replace "date:" with "date:,", or insert a comma after the first word of the last field).
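If regenerating the file is not an option, one way to recover the fields from the existing lines is to let csv.reader undo the outer layer of quoting and then parse the resulting string a second time. This is a sketch: the field positions (3, 5, 7, 9 for the numbers, 10 for the date) are assumptions based on the single sample line, and %b assumes English month abbreviations (the default C locale).

```python
import csv
import io
from datetime import datetime

# The line exactly as stored in the file (whole record quoted a second time)
raw = '"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """\n'

# First pass: csv sees one quoted field and turns the doubled quotes back into single ones
(inner,) = next(csv.reader(io.StringIO(raw)))
# Second pass: split that string into the real fields
fields = next(csv.reader(io.StringIO(inner)))

ave, max_, min_, disp = (float(fields[i]) for i in (3, 5, 7, 9))
# Last field looks like ' date: Feb 04, 2016 '; drop the label and parse the rest
date_text = fields[10].split(":", 1)[1].strip()
when = datetime.strptime(date_text, "%b %d, %Y").date()

print(fields)
print(ave, max_, min_, disp, when)
```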
First, your date field is quoted, which is ok (and needed) because there is a comma inside:
" date: Feb 04, 2016 "
But then the whole line also gets quoted (and thus seen as a single field). And because there are already quotes around the date field, those get escaped with another quote:
"Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0,"" date: Feb 04, 2016 """
So, if you remove that last quoting, everything should be fine (but you might want to trim the date field):
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0," date: Feb 04, 2016 "
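The question does not show the writing code, so this is an assumption, but one common way to end up with the extra quoting is to join the fields into a string yourself and then hand that string to csv.writer as a single field. The sketch below writes the same data both ways to show the difference.

```python
import csv
import io

fields = ["Rob", "Avanti", "Ave", "12.83", "Max", "4.0", "Min", "-21.9",
          "analist disp:", "-1.0", " date: Feb 04, 2016 "]

# Correct: pass the fields as a list; only the date field (it contains a comma) gets quoted
good = io.StringIO()
csv.writer(good, lineterminator="\n").writerow(fields)

# Hypothetical mistake: write an already-built line as one single field,
# so csv quotes the whole line and doubles the quotes inside it
bad = io.StringIO()
csv.writer(bad, lineterminator="\n").writerow([good.getvalue().rstrip("\n")])

print(good.getvalue(), end="")
print(bad.getvalue(), end="")
```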
If you want it exactly as in your question, you need another comma after "date:":
Rob,Avanti,Ave,12.83,Max,4.0,Min,-21.9,analist disp:,-1.0, date:,"Feb 04, 2016"
That said, it would be better to use a header row instead:
Name,Name2,Ave,Max,Min,analist disp,date
Rob,Avanti,12.83,4.0,-21.9,-1.0,"Feb 04, 2016"
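With a header row, csv.DictReader can address each column by name, which makes the type conversion straightforward. A minimal sketch, assuming the file contains exactly the two lines above and that the month abbreviation is English:

```python
import csv
import io
from datetime import datetime

# File contents written with a header row, as suggested above
data = (
    "Name,Name2,Ave,Max,Min,analist disp,date\n"
    'Rob,Avanti,12.83,4.0,-21.9,-1.0,"Feb 04, 2016"\n'
)

records = []
for row in csv.DictReader(io.StringIO(data)):
    # Every value arrives as a string; convert the numeric columns and the date by name
    for col in ("Ave", "Max", "Min", "analist disp"):
        row[col] = float(row[col])
    row["date"] = datetime.strptime(row["date"], "%b %d, %Y").date()
    records.append(row)

print(records[0]["Name"], records[0]["Ave"], records[0]["date"])
```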

How do I get coupon payment dates for a simple fixed bond using quantlib, quantlib-swig and python

I am trying to learn QuantLib (1.3) and its Python bindings using QuantLib-SWIG (1.2) on Ubuntu 13.04. As a starter, I am trying to determine the payment dates for the very simple bond below, using the 30/360 European day counter.
from QuantLib import *
faceValue = 100.0
doi = Date(31, August, 2000)
dom = Date(31, August, 2008)
coupons = [0.05]
dayCounter = Thirty360(Thirty360.European)
schedule = Schedule(doi, dom, Period(Semiannual),
India(),
Unadjusted, Unadjusted,
DateGeneration.Backward, False)
Following are my questions:
Which method of schedule object will give me the payment dates?
Where do I need to specify the dayCounter object so that the dates are appropriately calculated?
Using Dimitri Reiswich's presentation, I tried mimicking the C++ code, but schedule.dates() returns an error saying there is no such method.
The payment dates for this Fixed Rate bond are, (obtained by using oocalc)
Feb 28, 2001; Aug 31, 2001
Feb 28, 2002; Aug 31, 2002
Feb 28, 2003; Aug 31, 2003
Feb 29, 2004; Aug 31, 2004
Feb 28, 2005; Aug 31, 2005
Feb 28, 2006; Aug 31, 2006
Feb 28, 2007; Aug 31, 2007
Feb 29, 2008; Aug 31, 2008
How do I get the payment dates for this simple bond using python & quantlib? Can someone please help?
regards
K
If you want to look at the schedule you just generated, you can iterate over it:
>>> for d in schedule: print(d)
...
August 31st, 2000
February 28th, 2001
August 31st, 2001
February 28th, 2002
August 31st, 2002
February 28th, 2003
August 31st, 2003
February 29th, 2004
August 31st, 2004
February 28th, 2005
August 31st, 2005
February 28th, 2006
August 31st, 2006
February 28th, 2007
August 31st, 2007
February 29th, 2008
August 31st, 2008
Or call list(schedule) if you want to store them. However, are you sure those are the payment dates? They are the start and end dates of the accrual periods, but some of them fall on a Saturday or a Sunday, and the bond will pay on the next business day instead. You can see the effect if you instantiate the bond and retrieve its cash flows:
>>> settlement_days = 3
>>> bond = FixedRateBond(settlement_days, faceValue, schedule, coupons, dayCounter)
>>> for c in bond.cashflows():
...     print(c.date())
...
February 28th, 2001
August 31st, 2001
February 28th, 2002
September 2nd, 2002
February 28th, 2003
September 1st, 2003
March 1st, 2004
August 31st, 2004
February 28th, 2005
August 31st, 2005
February 28th, 2006
August 31st, 2006
February 28th, 2007
August 31st, 2007
February 29th, 2008
September 1st, 2008
September 1st, 2008
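(That is, unless Saturdays and Sundays shouldn't be holidays for the Indian calendar. If you think they shouldn't, file a bug report with QuantLib.) The last date appears twice because the final coupon and the redemption of the face value are both paid on that day.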
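To see where the shifted dates come from without QuantLib, here is a stdlib-only sketch. It is not QuantLib's implementation: it generates the unadjusted semiannual dates backward from maturity by stepping whole months back (clipping the day to the length of the target month, which gives the Feb 28/Feb 29 dates) and then rolls Saturdays and Sundays forward to Monday. The helper names are hypothetical, and it assumes weekends are the only non-business days; Indian public holidays, which QuantLib's India() calendar would also skip, are ignored, and the separate redemption flow is not modelled.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift d by whole months, clipping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def backward_schedule(issue, maturity, step_months=6):
    """Unadjusted schedule dates generated backward from maturity."""
    dates = [maturity]
    n = 1
    while True:
        d = add_months(maturity, -step_months * n)
        if d <= issue:
            break
        dates.append(d)
        n += 1
    dates.append(issue)
    return sorted(dates)


def roll_following(d):
    """Move a Saturday or Sunday to the next Monday; no holiday calendar is applied."""
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d


schedule = backward_schedule(date(2000, 8, 31), date(2008, 8, 31))
payments = [roll_following(d) for d in schedule[1:]]  # the first date is the issue date
for accrual_end, paid in zip(schedule[1:], payments):
    print(accrual_end, "->", paid)
```

For this particular bond the weekend-only adjustment already reproduces the shifts listed above (Sep 2, 2002; Sep 1, 2003; Mar 1, 2004; Sep 1, 2008).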

get mean of values that fit a specific criteria (pattern matching)

I asked this question before and got a reply that solved it for me. I have a dataframe that looks like this:
id weekdays halflife
241732222300860000 Friday, Aug 31, 2012, 22 0.4166666667
241689170123309000 Friday, Aug 31, 2012, 19 0.3833333333
241686878137512000 Friday, Aug 31, 2012, 19 0.4
241651117396738000 Friday, Aug 31, 2012, 16 1.5666666667
241635163505820000 Friday, Aug 31, 2012, 15 0.95
241633401382265000 Friday, Aug 31, 2012, 15 2.3666666667
And I would like to get average half life of items that were created on Monday, then on Tuesday...etc. (My date range spans over 6 months).
To get the date values I used strptime and difftime. Also, I found the maximum halflife with max(df$halflife); how can I find which id it corresponds to?
Reproducible code:
structure(list(id = c(241732222300860416, 241689170123309056,
241686878137511936, 241651117396738048, 241635163505819648, 241633401382264832
), weekdays = c("Friday, Aug 31, 2012, 22", "Friday, Aug 31, 2012, 19",
"Friday, Aug 31, 2012, 19", "Friday, Aug 31, 2012, 16", "Friday, Aug 31, 2012, 15",
"Friday, Aug 31, 2012, 15"), halflife = structure(c(0.416666666666667,
0.383333333333333, 0.4, 1.56666666666667, 0.95, 2.36666666666667
), class = "difftime", units = "mins")), .Names = c("id",
"weekdays", "halflife"), row.names = c(NA, 6L), class = "data.frame")
So now I have an average half-life value for all Mondays, Tuesdays, etc. How can I get the average value for each hour within those weekdays, i.e. the average half-life of all items created on Mondays at 9am, then 10am, then 11am, etc., and then on Tuesdays at 9am, 10am, 11am, etc.? The dates in the weekdays column are formatted so that the last number after the comma is the hour the item was created. I am really bad with regular expressions and pattern matching, which is why I am asking this follow-up question.
Using only base R, you can do the following.
> mydf
id weekdays halflife
1 2.417322e+17 Friday, Aug 31, 2012, 22 0.4166667 mins
2 2.416892e+17 Friday, Aug 31, 2012, 19 0.3833333 mins
3 2.416869e+17 Friday, Aug 31, 2012, 19 0.4000000 mins
4 2.416511e+17 Friday, Aug 31, 2012, 16 1.5666667 mins
5 2.416352e+17 Friday, Aug 31, 2012, 15 0.9500000 mins
6 2.416334e+17 Friday, Aug 31, 2012, 15 2.3666667 mins
Instead of using a regex, we can just use strsplit on each element of weekdays, unlist the result, reshape it into a 4-column matrix, and cbind it back onto mydf.
> mydf2 <- cbind(mydf, matrix(unlist(sapply(mydf$weekdays, strsplit, split=',')), byrow=TRUE, ncol=4, dimnames=list(1:nrow(mydf), c('Weekday', 'Day', 'Year', 'Hour'))))
> mydf2
id weekdays halflife Weekday Day Year Hour
1 2.417322e+17 Friday, Aug 31, 2012, 22 0.4166667 mins Friday Aug 31 2012 22
2 2.416892e+17 Friday, Aug 31, 2012, 19 0.3833333 mins Friday Aug 31 2012 19
3 2.416869e+17 Friday, Aug 31, 2012, 19 0.4000000 mins Friday Aug 31 2012 19
4 2.416511e+17 Friday, Aug 31, 2012, 16 1.5666667 mins Friday Aug 31 2012 16
5 2.416352e+17 Friday, Aug 31, 2012, 15 0.9500000 mins Friday Aug 31 2012 15
6 2.416334e+17 Friday, Aug 31, 2012, 15 2.3666667 mins Friday Aug 31 2012 15
Now that the weekdays column is split, we can use the aggregate function to calculate the mean over the desired grouping columns.
> aggregate(halflife ~ Weekday, data=mydf2, FUN = mean)
Weekday halflife
1 Friday 1.013889
If you want to group by both Weekday and Hour:
> aggregate(halflife ~ Weekday + Hour, data=mydf2, FUN = mean)
Weekday Hour halflife
1 Friday 15 1.6583333
2 Friday 16 1.5666667
3 Friday 19 0.3916667
4 Friday 22 0.4166667
The first argument of aggregate here is a formula object, which supports one ~ one, one ~ many, many ~ one, and many ~ many relationships. See the examples in ?aggregate to learn how to use it.
Here is a brief example of a many ~ many relationship:
> set.seed(12345)
> mydf2 <- cbind(mydf2, newvar = rnorm(nrow(mydf2)))
> mydf2
id weekdays halflife Weekday Day Year Hour newvar
1 2.417322e+17 Friday, Aug 31, 2012, 22 0.4166667 mins Friday Aug 31 2012 22 0.5855288
2 2.416892e+17 Friday, Aug 31, 2012, 19 0.3833333 mins Friday Aug 31 2012 19 0.7094660
3 2.416869e+17 Friday, Aug 31, 2012, 19 0.4000000 mins Friday Aug 31 2012 19 -0.1093033
4 2.416511e+17 Friday, Aug 31, 2012, 16 1.5666667 mins Friday Aug 31 2012 16 -0.4534972
5 2.416352e+17 Friday, Aug 31, 2012, 15 0.9500000 mins Friday Aug 31 2012 15 0.6058875
6 2.416334e+17 Friday, Aug 31, 2012, 15 2.3666667 mins Friday Aug 31 2012 15 -1.8179560
> aggregate(cbind(newvar,halflife) ~ Weekday + Hour, data=mydf2, FUN = mean)
Weekday Hour newvar halflife
1 Friday 15 -0.6060343 1.6583333
2 Friday 16 -0.4534972 1.5666667
3 Friday 19 0.3000814 0.3916667
4 Friday 22 0.5855288 0.4166667
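For comparison, the same weekday/hour grouping can be sketched in Python with only the standard library; this is an illustration, not a replacement for the R code above. It uses the six sample rows from the question, splits each weekdays value on ", ", and converts the hour to an int so that 9 would sort before 10. It also covers the side question of which id has the largest half-life.

```python
from collections import defaultdict
from statistics import mean

# Sample rows from the question: (id, weekdays string, half-life in minutes)
rows = [
    (241732222300860000, "Friday, Aug 31, 2012, 22", 0.4166666667),
    (241689170123309000, "Friday, Aug 31, 2012, 19", 0.3833333333),
    (241686878137512000, "Friday, Aug 31, 2012, 19", 0.4),
    (241651117396738000, "Friday, Aug 31, 2012, 16", 1.5666666667),
    (241635163505820000, "Friday, Aug 31, 2012, 15", 0.95),
    (241633401382265000, "Friday, Aug 31, 2012, 15", 2.3666666667),
]

# Group half-lives by (weekday, hour); "Friday, Aug 31, 2012, 22" splits into 4 parts
groups = defaultdict(list)
for _id, stamp, halflife in rows:
    weekday, _month_day, _year, hour = stamp.split(", ")
    groups[(weekday, int(hour))].append(halflife)

# Keys sorted by weekday name, then numeric hour
averages = {key: mean(values) for key, values in sorted(groups.items())}

# The id belonging to the largest half-life
max_id = max(rows, key=lambda row: row[2])[0]

for (weekday, hour), avg in averages.items():
    print(weekday, hour, round(avg, 7))
print("max half-life id:", max_id)
```

In R itself, mydf$id[which.max(mydf$halflife)] should return the id of the largest half-life directly.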