I'm trying to create a list of dates between a start date and an end date (done). But now, I want to FILTER weekends out of that list.
The start date is fixed, but the end date is derived from a number of working days after the start date. The problem is that when I create the list with the following formula, every date in between is included, and my numerous attempts to FILTER those dates using WORKDAY.INTL and REGEXMATCH have failed. Is it possible to modify this particular formula, or do I need to start over with something different?
=ArrayFormula(TO_DATE(row(indirect("A"&A2):indirect("A"&B2))))
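The formula relies on Google Sheets storing each date as a serial day count (serial 0 is 30 December 1899): "A"&A2 turns the start date into a reference such as A43742, ROW returns the row number of every cell in the range, and TO_DATE turns those numbers back into dates. A minimal Python sketch of the same arithmetic, assuming the standard Sheets epoch; date_to_serial and serial_to_date are hypothetical helper names:

```python
import datetime

# Google Sheets counts days from this epoch: serial 0 = 1899-12-30.
SHEETS_EPOCH = datetime.date(1899, 12, 30)


def date_to_serial(d):
    """Day count Sheets would store for date d (hypothetical helper)."""
    return (d - SHEETS_EPOCH).days


def serial_to_date(serial):
    """Inverse of date_to_serial, roughly what TO_DATE does to a number."""
    return SHEETS_EPOCH + datetime.timedelta(days=serial)


# ROW(INDIRECT("A"&A2):INDIRECT("A"&B2)) yields every serial from A2 to B2.
start, end = datetime.date(2019, 10, 4), datetime.date(2019, 10, 13)
serials = range(date_to_serial(start), date_to_serial(end) + 1)
dates = [serial_to_date(s) for s in serials]
print(serials[0], dates[0], len(dates))
```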
Here is an example of what I've done.
This is what I'm getting:
Friday, October 4, 2019
Saturday, October 5, 2019
Sunday, October 6, 2019
Monday, October 7, 2019
Tuesday, October 8, 2019
Wednesday, October 9, 2019
Thursday, October 10, 2019
Friday, October 11, 2019
Saturday, October 12, 2019
Sunday, October 13, 2019
This is what I'm after:
Friday, October 4, 2019
Monday, October 7, 2019
Tuesday, October 8, 2019
Wednesday, October 9, 2019
Thursday, October 10, 2019
Friday, October 11, 2019
Monday, October 14, 2019
Tuesday, October 15, 2019
Wednesday, October 16, 2019
Thursday, October 17, 2019
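Because the end date here is defined by a number of working days, the list can also be built by counting working days instead of filtering a fixed date range. A Python sketch of that logic, assuming weekends are Saturday and Sunday and ignoring holidays; working_days is a hypothetical name:

```python
import datetime


def working_days(start, count):
    """Return `count` Monday-Friday dates, starting at `start` (or at the next
    weekday if `start` falls on a weekend). Holidays are not handled."""
    days = []
    current = start
    while len(days) < count:
        if current.weekday() < 5:  # Monday=0 ... Friday=4; 5 and 6 are Sat/Sun
            days.append(current)
        current += datetime.timedelta(days=1)
    return days


for d in working_days(datetime.date(2019, 10, 4), 10):
    # Build the day number separately to avoid %d's zero padding ("4", not "04").
    print(f"{d:%A, %B} {d.day}, {d.year}")
```

With a start of Friday, October 4, 2019 and a count of 10, this prints the ten dates from the "what I'm after" list, ending on Thursday, October 17, 2019.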
See if this works:
=query(ArrayFormula(TO_DATE(row(indirect("A"&A2):indirect("A"&B2)))), "where dayOfWeek(Col1) <> 7 and dayOfWeek(Col1) <> 1")
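In the QUERY language, dayOfWeek() numbers days from 1 (Sunday) to 7 (Saturday), which is why the where clause excludes 7 and 1. Python's isoweekday() instead runs from 1 (Monday) to 7 (Sunday). A small sketch mapping one onto the other and applying the same condition; query_day_of_week is a hypothetical name:

```python
import datetime


def query_day_of_week(d):
    """Map isoweekday() (Mon=1 ... Sun=7) to QUERY's dayOfWeek (Sun=1 ... Sat=7)."""
    return d.isoweekday() % 7 + 1


start = datetime.date(2019, 10, 4)
dates = [start + datetime.timedelta(days=i) for i in range(10)]
# Same condition as: where dayOfWeek(Col1) <> 7 and dayOfWeek(Col1) <> 1
kept = [d for d in dates if query_day_of_week(d) not in (1, 7)]
print(kept)
```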
You can do it like this (ROW returns date serial numbers, so format the result column as a date):
=ARRAYFORMULA(FILTER(ROW(INDIRECT("A"&A2&":A"&B2)),
 NOT(REGEXMATCH(TEXT(ROW(INDIRECT("A"&A2&":A"&B2)), "ddd"), "Sat|Sun"))))
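A pattern such as [^(Sat|Sun)] looks like it means "neither Sat nor Sun", but inside square brackets it is a negated character class: "any single character other than (, S, a, t, |, u, n or )". It happens to give the right answer for three-letter English abbreviations, but not for full day names, so negating a match on the alternation Sat|Sun is the safer test. A quick check with Python's re module, assuming REGEXMATCH's partial-match behaviour (which re.search mirrors for patterns this simple); the helper names are hypothetical:

```python
import re

# Negated character class: matches ONE character that is not in "(Sat|Sun)".
CHAR_CLASS = re.compile(r"[^(Sat|Sun)]")
# Alternation: matches the text "Sat" or "Sun" anywhere in the string.
WEEKEND = re.compile(r"Sat|Sun")


def keep_by_char_class(name):
    """Keep the day if it contains any character outside the class."""
    return CHAR_CLASS.search(name) is not None


def keep_by_alternation(name):
    """Keep the day unless it contains "Sat" or "Sun"."""
    return WEEKEND.search(name) is None


for name in ["Mon", "Tue", "Sat", "Sun", "Saturday", "Sunday"]:
    print(name, keep_by_char_class(name), keep_by_alternation(name))
```

For "Saturday" and "Sunday" the character-class version wrongly keeps the day, because letters such as r, d and y are outside the class.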
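I was able to get the start of the current week, with weeks starting on Saturday, like this:
import datetime

today = datetime.date.today()
# weekday() is Monday=0 ... Sunday=6, so Saturday is 5
sat_offset = (today.weekday() - 5) % 7
week_start = today - datetime.timedelta(days=sat_offset)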
This gives me the start of the week (last Saturday), but how would I show the date of each following day as well? For example, for the week of Oct. 27, 2018, it should say:
Saturday: Oct. 27, 2018
Sunday: Oct. 28, 2018
Monday: Oct. 29, 2018
Tuesday: Oct. 30, 2018
Wednesday: Oct. 31, 2018
Thursday: Nov. 01, 2018
Friday: Nov. 02, 2018
Thank you for your help.
You can iterate through the days of the week using range and timedelta like so:
for i in range(7):
    day = week_start + datetime.timedelta(days=i)  # i = 0 is the Saturday itself
    print(day.strftime("%A: %b. %d, %Y"))
For the week starting Saturday, Oct. 27, 2018, this prints:
Saturday: Oct. 27, 2018
Sunday: Oct. 28, 2018
Monday: Oct. 29, 2018
Tuesday: Oct. 30, 2018
Wednesday: Oct. 31, 2018
Thursday: Nov. 01, 2018
Friday: Nov. 02, 2018
You can format the string however you want; the available strftime format codes are listed in the Python datetime documentation.
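The offset calculation from the question generalises to any first day of the week. A sketch combining it with the loop; week_dates and first_weekday are hypothetical names, and first_weekday uses Python's weekday() numbering:

```python
import datetime


def week_dates(day, first_weekday=5):
    """Return the 7 dates of the week containing `day`, where the week starts
    on `first_weekday` (Monday=0 ... Sunday=6; the default 5 is Saturday)."""
    # Python's % never returns a negative number, so this is the number of days
    # since the most recent `first_weekday` (0 if `day` is that weekday).
    offset = (day.weekday() - first_weekday) % 7
    start = day - datetime.timedelta(days=offset)
    return [start + datetime.timedelta(days=i) for i in range(7)]


for d in week_dates(datetime.date(2018, 10, 31)):
    print(d.strftime("%A: %b. %d, %Y"))
```

Passing first_weekday=0 would give Monday-to-Sunday weeks instead.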
I have a CSV file with a column that contains multiple date formats. I need to parse them and get every value in the same format.
Wednesday 12 August 2015
Wednesday 12 August 2015
Friday April 1 2016
Friday April 1 2016
5/12/2016
5/12/2016
That is the column, and I want every value in mm/dd/yy format. My code is as follows:
import pandas as pd

df = pd.read_csv('test3.csv')
newDate = []  # cleaned-up value for each row
for item in df['serverDatePrettyFirstAction']:
    if '/' in item:
        # already numeric, e.g. 5/12/2016
        newDate.append(item)
    else:
        # drop the leading weekday name: "Wednesday 12 August 2015" -> "12 August 2015"
        item = item.split(' ', 1)[1]
        newDate.append(item)
df['newDate'] = newDate
df.to_csv('D:/Python/10.36.202.64/newfile.csv', index = False)
And this is what I get:
And this is what i get:
serverDatePrettyFirstAction newDate
Wednesday 12 August 2015 12-Aug-15
Wednesday 12 August 2015 12-Aug-15
Friday April 1 2016 April 1 2016
Friday April 1 2016 April 1 2016
5/12/2016 5/12/2016
5/12/2016 5/12/2016
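Also, is there a way to overwrite the values in the same column itself?
A faster approach is pandas' to_datetime() method (on pandas 2.0 or later, pass format='mixed' when one column mixes several formats; otherwise the format inferred from the first value is applied to every row and parsing fails):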
In [2]: df
Out[2]:
Date
0 Wednesday 12 August 2015
1 Wednesday 12 August 2015
2 Friday April 1 2016
3 Friday April 1 2016
4 5/12/2016
5 5/12/2016
In [6]: df['newDate'] = pd.to_datetime(df['Date'])
Result:
In [7]: df
Out[7]:
Date newDate
0 Wednesday 12 August 2015 2015-08-12
1 Wednesday 12 August 2015 2015-08-12
2 Friday April 1 2016 2016-04-01
3 Friday April 1 2016 2016-04-01
4 5/12/2016 2016-05-12
5 5/12/2016 2016-05-12
You can use the third-party dateutil library as long as your data is not too big (it has to guess the format of every value):
import pandas as pd
from dateutil import parser
df = pd.read_csv('test3.csv')
df['newDate'] = df['serverDatePrettyFirstAction'].apply(parser.parse)
df.to_csv('newfile.csv', index=False, date_format='%m/%d/%y')  # mm/dd/yy, as asked
To overwrite the values in the same column, use:
df['serverDatePrettyFirstAction'] = df['serverDatePrettyFirstAction'].apply(parser.parse)
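When the set of formats is known in advance, as with the three in the question, an alternative to guessing is to try each format explicitly with datetime.strptime and write the result as mm/dd/yy. A standard-library sketch; KNOWN_FORMATS and to_mmddyy are hypothetical names, the format list is an assumption based on the sample rows, and English day and month names (the default C locale) are assumed:

```python
import datetime

# Formats seen in the sample column (an assumption; extend as needed).
KNOWN_FORMATS = [
    "%A %d %B %Y",  # Wednesday 12 August 2015
    "%A %B %d %Y",  # Friday April 1 2016
    "%m/%d/%Y",     # 5/12/2016 (month first)
]


def to_mmddyy(text):
    """Parse `text` with the first format that fits and return it as mm/dd/yy."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # not this format, try the next one
        return parsed.strftime("%m/%d/%y")
    raise ValueError(f"unrecognised date format: {text!r}")


for value in ["Wednesday 12 August 2015", "Friday April 1 2016", "5/12/2016"]:
    print(value, "->", to_mmddyy(value))
```

With pandas, df['serverDatePrettyFirstAction'] = df['serverDatePrettyFirstAction'].map(to_mmddyy) would overwrite the column with the converted strings. Unlike guessing, an unexpected format raises an error instead of being silently misread.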
I asked this question before and got a reply that solved it for me. I have a dataframe that looks like this:
id weekdays halflife
241732222300860000 Friday, Aug 31, 2012, 22 0.4166666667
241689170123309000 Friday, Aug 31, 2012, 19 0.3833333333
241686878137512000 Friday, Aug 31, 2012, 19 0.4
241651117396738000 Friday, Aug 31, 2012, 16 1.5666666667
241635163505820000 Friday, Aug 31, 2012, 15 0.95
241633401382265000 Friday, Aug 31, 2012, 15 2.3666666667
I would like to get the average half-life of items created on Monday, then on Tuesday, etc. (my date range spans over 6 months).
To get the date values I used strptime and difftime. Also, I found the maximum half-life with max(df$halflife); how can I find which id it corresponds to?
Reproducible code:
structure(list(id = c(241732222300860416, 241689170123309056,
241686878137511936, 241651117396738048, 241635163505819648, 241633401382264832
), weekdays = c("Friday, Aug 31, 2012, 22", "Friday, Aug 31, 2012, 19",
"Friday, Aug 31, 2012, 19", "Friday, Aug 31, 2012, 16", "Friday, Aug 31, 2012, 15",
"Friday, Aug 31, 2012, 15"), halflife = structure(c(0.416666666666667,
0.383333333333333, 0.4, 1.56666666666667, 0.95, 2.36666666666667
), class = "difftime", units = "mins")), .Names = c("id",
"weekdays", "halflife"), row.names = c(NA, 6L), class = "data.frame")
So now I have an average half-life value for all Mondays, Tuesdays, etc. How can I get the average for each hour within those weekdays, i.e. the average half-life of all items created on any Monday at 9am, then 10am, then 11am, etc., and then Tuesday at 9am, 10am, 11am, etc.? The values in the weekdays column are formatted so that the last number after the final comma is the hour the item was created. I am really bad with regular expressions and pattern matching, which is why I am asking this follow-up question.
With base R you can do the following.
> mydf
id weekdays halflife
1 2.417322e+17 Friday, Aug 31, 2012, 22 0.4166667 mins
2 2.416892e+17 Friday, Aug 31, 2012, 19 0.3833333 mins
3 2.416869e+17 Friday, Aug 31, 2012, 19 0.4000000 mins
4 2.416511e+17 Friday, Aug 31, 2012, 16 1.5666667 mins
5 2.416352e+17 Friday, Aug 31, 2012, 15 0.9500000 mins
6 2.416334e+17 Friday, Aug 31, 2012, 15 2.3666667 mins
Instead of a regex, we can use strsplit on each element of weekdays, unlist the result, reshape it into a 4-column matrix, and cbind that back onto mydf. Splitting on ', ' rather than ',' keeps leading spaces out of the new columns.
> mydf2 <- cbind(mydf, matrix(unlist(sapply(mydf$weekdays, strsplit, split=', ')), byrow=TRUE, ncol=4, dimnames=list(1:nrow(mydf), c('Weekday', 'Day', 'Year', 'Hour'))))
> mydf2
id weekdays halflife Weekday Day Year Hour
1 2.417322e+17 Friday, Aug 31, 2012, 22 0.4166667 mins Friday Aug 31 2012 22
2 2.416892e+17 Friday, Aug 31, 2012, 19 0.3833333 mins Friday Aug 31 2012 19
3 2.416869e+17 Friday, Aug 31, 2012, 19 0.4000000 mins Friday Aug 31 2012 19
4 2.416511e+17 Friday, Aug 31, 2012, 16 1.5666667 mins Friday Aug 31 2012 16
5 2.416352e+17 Friday, Aug 31, 2012, 15 0.9500000 mins Friday Aug 31 2012 15
6 2.416334e+17 Friday, Aug 31, 2012, 15 2.3666667 mins Friday Aug 31 2012 15
Now that the weekdays column is split, we can use the aggregate function to compute the mean over the desired grouping columns.
> aggregate(halflife ~ Weekday, data=mydf2, FUN = mean)
Weekday halflife
1 Friday 1.013889
If you want to group by both Weekday and Hour:
> aggregate(halflife ~ Weekday + Hour, data=mydf2, FUN = mean)
Weekday Hour halflife
1 Friday 15 1.6583333
2 Friday 16 1.5666667
3 Friday 19 0.3916667
4 Friday 22 0.4166667
The first argument to aggregate here is a formula object, which supports one ~ one, one ~ many, many ~ one, and many ~ many relationships. See the examples in ?aggregate to understand how to use it.
Here is a brief example of a many ~ many relationship.
> set.seed(12345)
> mydf2 <- cbind(mydf2, newvar = rnorm(nrow(mydf2)))
> mydf2
id weekdays halflife Weekday Day Year Hour newvar
1 2.417322e+17 Friday, Aug 31, 2012, 22 0.4166667 mins Friday Aug 31 2012 22 0.5855288
2 2.416892e+17 Friday, Aug 31, 2012, 19 0.3833333 mins Friday Aug 31 2012 19 0.7094660
3 2.416869e+17 Friday, Aug 31, 2012, 19 0.4000000 mins Friday Aug 31 2012 19 -0.1093033
4 2.416511e+17 Friday, Aug 31, 2012, 16 1.5666667 mins Friday Aug 31 2012 16 -0.4534972
5 2.416352e+17 Friday, Aug 31, 2012, 15 0.9500000 mins Friday Aug 31 2012 15 0.6058875
6 2.416334e+17 Friday, Aug 31, 2012, 15 2.3666667 mins Friday Aug 31 2012 15 -1.8179560
> aggregate(cbind(newvar,halflife) ~ Weekday + Hour, data=mydf2, FUN = mean)
Weekday Hour newvar halflife
1 Friday 15 -0.6060343 1.6583333
2 Friday 16 -0.4534972 1.5666667
3 Friday 19 0.3000814 0.3916667
4 Friday 22 0.5855288 0.4166667
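For comparison, the same split-then-average steps can be sketched in Python with only the standard library. Here strptime parses "Friday, Aug 31, 2012, 22" directly with the format "%A, %b %d, %Y, %H", which avoids splitting by hand. The data are the six sample rows from the question, and mean_halflife_by is a hypothetical name. The last step also covers the side question about which id has the largest half-life (in R, mydf$id[which.max(mydf$halflife)] should do the same).

```python
import datetime
from collections import defaultdict
from statistics import mean

# (id, weekdays, halflife in minutes): the sample rows from the question.
ROWS = [
    (241732222300860416, "Friday, Aug 31, 2012, 22", 0.416666666666667),
    (241689170123309056, "Friday, Aug 31, 2012, 19", 0.383333333333333),
    (241686878137511936, "Friday, Aug 31, 2012, 19", 0.4),
    (241651117396738048, "Friday, Aug 31, 2012, 16", 1.56666666666667),
    (241635163505819648, "Friday, Aug 31, 2012, 15", 0.95),
    (241633401382264832, "Friday, Aug 31, 2012, 15", 2.36666666666667),
]


def mean_halflife_by(rows, key):
    """Group half-lives by key(parsed datetime) and average each group."""
    groups = defaultdict(list)
    for _id, stamp, halflife in rows:
        # The trailing number is the hour of creation, so %H picks it up.
        created = datetime.datetime.strptime(stamp, "%A, %b %d, %Y, %H")
        groups[key(created)].append(halflife)
    return {k: mean(v) for k, v in sorted(groups.items())}


by_weekday = mean_halflife_by(ROWS, lambda dt: dt.strftime("%A"))
by_weekday_hour = mean_halflife_by(ROWS, lambda dt: (dt.strftime("%A"), dt.hour))
max_id = max(ROWS, key=lambda row: row[2])[0]  # id of the largest half-life

print(by_weekday)
print(by_weekday_hour)
print(max_id)
```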