Google's OR-Tools - single event planning - linear-programming

Google's OR-Tools provides some example code demonstrating how to solve the nurse-scheduling problem. I'm trying to adapt it to solve an interview-planning scenario wherein a single candidate will attend 2 meetings. Each meeting has the following requirement:
1. Each meeting must have 2 attendees
Solving Requirement (1) is pretty simple:
from ortools.sat.python import cp_model

meetings = ["phone_screen", "in_person"]
users = ["alice", "bob", "carl", "donna"]
days = ["Mon", "Tue", "Wed"]
times = ["morning", "afternoon"]
model = cp_model.CpModel()
# Build a boolean variable for every possible meeting time attendee
data = {}
for meeting in meetings:
    for day in days:
        for time in times:
            for user in users:
                var_name = 'meeting={},day={},time={},user={}'.format(meeting, day, time, user)
                data[(meeting, day, time, user)] = model.NewBoolVar(var_name)
## Requirement 1: Ensure 2 attendees for each time slot
for meeting in meetings:
    for day in days:
        for time in times:
            per_time_data = []
            for user in users:
                per_time_data.append(data[(meeting, day, time, user)])
            model.Add(sum(per_time_data) == 2)
# Solve and print solutions
solver = cp_model.CpSolver()
solver.Solve(model)
for day in days:
    for time in times:
        for meeting in meetings:
            string = '{} {}\t| {}\t| '.format(day, time, meeting)
            for user in users:
                if solver.Value(data[(meeting, day, time, user)]) == 1:
                    string += user + '\t'
            print(string)
This works as expected, printing out a single solution wherein each meeting slot has a (basically) random selection of 2 attendees:
Mon morning | phone_screen | bob carl
Mon morning | in_person | alice bob
Mon afternoon | phone_screen | alice bob
Mon afternoon | in_person | alice bob
Tue morning | phone_screen | alice bob
Tue morning | in_person | alice bob
Tue afternoon | phone_screen | alice bob
Tue afternoon | in_person | alice bob
Wed morning | phone_screen | alice bob
Wed morning | in_person | alice bob
Wed afternoon | phone_screen | alice bob
Wed afternoon | in_person | alice bob
But this isn't really what I want. The interview candidate only needs to attend two total meetings (one phone_screen and one in_person) whereas the above "solution" shows 12. I want to end up with something like this:
Mon morning | phone_screen | bob carl
Mon afternoon | in_person | alice bob
Therefore, we have Requirement (2):
2. Each meeting type should only occur once
Solving Requirement (2) is trickier for some reason, even though it seems like I should be able to follow a very similar strategy.
## Requirement 2: Ensure only 1 of each type of meeting exists
for meeting in meetings:
    per_meeting_data = []
    for user in users:
        for day in days:
            for time in times:
                per_meeting_data.append(data[(meeting, day, time, user)])
    # Ensure the number of attendees for this meeting type on all days is 2
    model.Add(sum(per_meeting_data) == 2)
Adding the above code makes the solver report the model as INFEASIBLE. Where am I going wrong?
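A quick count shows why the two requirements, as written, cannot hold together. This is a plain-Python sketch rather than OR-Tools code; it reuses the lists from the question, and the variable names are illustrative.

```python
# Plain-Python counting check (no solver involved); names are illustrative.
meetings = ["phone_screen", "in_person"]
users = ["alice", "bob", "carl", "donna"]
days = ["Mon", "Tue", "Wed"]
times = ["morning", "afternoon"]

slots_per_meeting = len(days) * len(times)  # 3 days * 2 times = 6 slots

# Requirement 1 as written: every (meeting, day, time) slot has exactly 2
# attendees, so each meeting type accumulates 2 attendees in each of its slots.
total_forced_by_req1 = 2 * slots_per_meeting

# Requirement 2 as written: all variables of one meeting type sum to exactly 2.
total_demanded_by_req2 = 2

compatible = total_forced_by_req1 == total_demanded_by_req2
print(slots_per_meeting, total_forced_by_req1, total_demanded_by_req2, compatible)
```

Both sums range over the same variables of one meeting type, so the model has no solution unless Requirement 1 is relaxed to let unused slots stay empty.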

Shouldn't your first requirement be this?
for meeting in meetings:
    for day in days:
        for time in times:
            per_time_data = []
            for user in users:
                per_time_data.append(data[(meeting, day, time, user)])
            # notice the indentation and <=
            model.Add(sum(per_time_data) <= 2)
I ask because it looks like you will have fewer meetings than slots.
As for your second requirement, I don't quite understand it. Do you want each (user, meeting) pair to occur only once?
If so, you can just do:
for meeting in meetings:
    for user in users:
        per_meeting_data = []
        for day in days:
            for time in times:
                per_meeting_data.append(data[(meeting, day, time, user)])
        model.Add(sum(per_meeting_data) == 1)


PowerBI - Counting the attributes of a record based on date

I have two simple tables. I need to be able to determine who is 'new' as of a particular date (say January) and then count only those attributes. There's a 1:M relationship on name. I basically need to answer the following questions with the below data:
1. What is the total number of FamilyMembers based on log-in for the month? (Done using custom measure)
2. Out of the total of #1, how many have logged in for the first time?
3. Out of the total of #2, how many were children? How many were adults?
Log In Table
ID      | Name  | Date
--------+-------+-----
login1  | Sam   | Jan
login2  | Sam   | Jan
login3  | Dave  | Jan
login4  | Dave  | Jan
login5  | Jack  | Jan
login6  | Sam   | Jan
login7  | James | Feb
login8  | James | Feb
login9  | James | Feb
login10 | Sam   | Feb
login11 | Sam   | Feb
login12 | Steve | Feb
Contact Table
Name  | FamilyMembers | Child | Adult
------+---------------+-------+------
Sam   | 3             | 1     | 2
James | 2             | 1     | 1
Dave  | 4             | 2     | 2
Jack  | 1             | 0     | 1
Steve | 6             | 1     | 5
Using this data, filtered on February we would see Steve never signed in prior to that date, so that makes him 'new'. James is also new.
My closest attempt is the custom 'Count of New Individuals' measure:
VAR currentUsers = VALUES('Log-Ins'[Name])
VAR currentDate = MIN('Log-Ins'[Date])
VAR pastUsers =
    CALCULATETABLE(
        VALUES('Log-Ins'[Name]),
        ALL('Log-Ins'[Date].[Month], 'Log-Ins'[Date].[MonthNo], 'Log-Ins'[Date].[Year]),
        'Log-Ins'[Date] < currentDate
    )
VAR newUsers = EXCEPT(currentUsers, pastUsers)
RETURN COUNTROWS(newUsers)
As you can see, this gives me the count of new individuals, but I want to count their attributes so I can say: out of the 11 total family members, 8 were new. Out of those 8, 6 were adults and 2 were children.
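The target numbers can be checked outside Power BI. The following is a plain-Python sketch of the same set difference over the sample tables, not DAX; the month ordering and the helper name `new_member_counts` are assumptions made for illustration.

```python
# Sample rows copied from the Log In and Contact tables in the question.
logins = [
    ("login1", "Sam", "Jan"), ("login2", "Sam", "Jan"), ("login3", "Dave", "Jan"),
    ("login4", "Dave", "Jan"), ("login5", "Jack", "Jan"), ("login6", "Sam", "Jan"),
    ("login7", "James", "Feb"), ("login8", "James", "Feb"), ("login9", "James", "Feb"),
    ("login10", "Sam", "Feb"), ("login11", "Sam", "Feb"), ("login12", "Steve", "Feb"),
]
# name -> (FamilyMembers, Child, Adult)
contacts = {
    "Sam": (3, 1, 2), "James": (2, 1, 1), "Dave": (4, 2, 2),
    "Jack": (1, 0, 1), "Steve": (6, 1, 5),
}
MONTH_ORDER = {"Jan": 1, "Feb": 2}  # assumption: only the two months in the sample


def new_member_counts(month):
    """Return family totals for everyone who logged in during `month` and for first-timers."""
    current = {name for _, name, m in logins if m == month}
    past = {name for _, name, m in logins if MONTH_ORDER[m] < MONTH_ORDER[month]}
    new = current - past  # same idea as EXCEPT(currentUsers, pastUsers)
    return {
        "total_family": sum(contacts[n][0] for n in current),
        "new_family": sum(contacts[n][0] for n in new),
        "new_children": sum(contacts[n][1] for n in new),
        "new_adults": sum(contacts[n][2] for n in new),
    }


print(new_member_counts("Feb"))
```

For February this reproduces the figures quoted in the question: 11 family members in total, 8 of them new, split into 6 adults and 2 children.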
I may be getting lost in the translation, but I don't understand how exactly you want to display the information.
#ContactsWhoLoggedIN :=
CALCULATE(COUNTROWS(Contacts),FILTER(Contacts,CALCULATE(COUNTROWS(LogIN)>0)))
#NewCWhoLoggedIN :=
CALCULATE(COUNTROWS(Contacts),
    FILTER(Contacts,
        // Logged in during the current date context
        CALCULATE(COUNTROWS(LogIN))>0
        &&
        // Never logged in before the current date context
        CALCULATE(COUNTROWS(LogIN),FILTER(ALL(Dates),Dates[Date]<MIN(Dates[Date])))=0
    )
)
#CWhoLoggedBackIN := [#ContactsWhoLoggedIN]-[#NewCWhoLoggedIN]
#FM_NewCWLI :=
CALCULATE(SUM(Contacts[FamilyMembers]),
    FILTER(Contacts,
        // Logged in during the current date context
        CALCULATE(COUNTROWS(LogIN))>0
        &&
        // Never logged in before the current date context
        CALCULATE(COUNTROWS(LogIN),FILTER(ALL(Dates),Dates[Date]<MIN(Dates[Date])))=0
    )
)
I remember this pattern from "Microsoft Excel 2013: Building Data Models with PowerPivot"

How to append current and previous sessions side by side filtered by two independent slicers

Objective: I would like to obtain the difference between current and previous sessions based on date slicers.
I want the output to be 4 columns as such:
Date
Current Sessions (see measure below)
Previous Sessions (see measure below)
Difference (no measure calculated yet).
Situation:
I currently have two measures
Current Sessions: SUM(Sales[Sessions])
Previous Sessions (thanks to @Alexis Olson):
VAR datediffs = DATEDIFF(
    CALCULATE (MAX ( 'Date'[Date] ) ),
    CALCULATE (MAX ('Previous Date'[Date])),
    DAY
)
RETURN
CALCULATE(SUM(Sales[Sessions]),
    USERELATIONSHIP('Previous Date'[Date],'Date'[Date]),
    DATEADD('Date'[Date],datediffs,DAY)
)
I have three tables.
Sales
Date
Previous Date (carbon copy of Date table)
My Previous Date table has a 1:1 inactive relationship with the Date table. The Date table has a one-to-many active relationship with my Sales table.
I have two slicers at all times, comparing the same number of days from different time periods (e.g. Jan 1st to Jan 7th 2019 vs Dec 25th to Dec 31st 2019).
If I put current sessions, previous sessions and a date column from any of the three tables into a table, this is the kind of output I want:
+----------+------------------+-------------------+------------+
| date | current sessions | previous sessions | difference |
+----------+------------------+-------------------+------------+
| Jan 8th | 10000 | 7000 | 3000 |
| Jan 9th | 20000 | 10000 | 10000 |
| Jan 10th | 15000 | 16000 | -1000 |
| Jan 11th | 14000 | 12000 | 2000 |
| Jan 12th | 12000 | 14000 | -2000 |
| Jan 13th | 11000 | 16000 | -5000 |
| Jan 14th | 15000 | 18000 | -3000 |
+----------+------------------+-------------------+------------+
When I put the Sessions date on the table along with sessions and previous sessions, I get the right session amounts for each day, but the previous session amounts don't calculate correctly, I assume because they are being filtered by the date rows.
How can I override that table filter and force it to get the exact previous session amounts? Basically, I want both results appended to each other. My problem is that the previous session value is the same on each day and is basically the amount of dec 31st jan 2018, because the max date is different for each row, but I want it to be based on the slicer.
The mistake was in the first part of the datediffs variable within the Previous Sessions formula. It should be:
CALCULATE(LASTDATE('Date'[Date]),ALLSELECTED('Date'))
This forces the measure to always use the last day of the slicer selection, overriding the date value of each row.
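The effect of computing the offset from the whole selection can be illustrated outside DAX. Below is a plain-Python sketch, assuming both slicers select ranges of equal length: the day offset is computed once from the last date of each selection and then applied to every row. The dates and session counts are made up for illustration only.

```python
from datetime import date, timedelta

# Hypothetical slicer selections and session counts, for illustration only.
current_range = [date(2019, 1, 8) + timedelta(days=i) for i in range(3)]
previous_range = [date(2019, 1, 1) + timedelta(days=i) for i in range(3)]
sessions = {
    date(2019, 1, 1): 70, date(2019, 1, 2): 100, date(2019, 1, 3): 160,
    date(2019, 1, 8): 100, date(2019, 1, 9): 200, date(2019, 1, 10): 150,
}

# Computed once from the whole selection (the role ALLSELECTED plays above),
# not from the single date of the row being evaluated.
offset = max(current_range) - max(previous_range)

rows = []
for day in current_range:
    current = sessions[day]
    previous = sessions[day - offset]  # shift back by the fixed offset
    rows.append((day.isoformat(), current, previous, current - previous))

for row in rows:
    print(row)
```

If the offset were recomputed from each row's own date instead, it would change from row to row, so every row would be shifted onto the same previous date. That matches the symptom described in the question.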

Django ORM group by, and find latest item of each group (window functions)

Say we have a model as below
from django.db import models

class Cake(models.Model):
    baked_on = models.DateTimeField(auto_now_add=True)
    cake_name = models.CharField(max_length=20)
Now, there are multiple Cakes baked on the same day, and I need a query that will return me a monthly cake report which consists of each day of the month, and the names of the first and last cakes baked on that day.
For example, if the data is something like this:
baked_on cake_name
11 Jan 12:30 Vanilla
11 Jan 14:30 Strawberry
11 Jan 20:45 Avocado
12 Jan 09:05 Raspberry
12 Jan 16:30 Sprinkles
12 Jan 20:11 Chocolate
My query's output should look like
date first last
11 Jan Vanilla Avocado
12 Jan Raspberry Chocolate
How should I go about doing this in a single ORM call?
Django 2.0 introduced window functions, which are made for this kind of query. A simple answer to your question would be:
from django.db.models import F, Window
from django.db.models.functions import FirstValue, TruncDate

Cake.objects.annotate(
    first_cake=Window(
        expression=FirstValue('cake_name'),
        partition_by=[TruncDate('baked_on')],
        order_by=F('baked_on').asc(),
    ),
    last_cake=Window(
        expression=FirstValue('cake_name'),
        partition_by=[TruncDate('baked_on')],
        order_by=F('baked_on').desc(),
    ),
    day=TruncDate('baked_on'),
).distinct().values_list('day', 'first_cake', 'last_cake')
Why FirstValue in last_cake? That's because a window query by default only considers rows from the start of the partition up to the current row and won't look ahead, so for every row the last row is the current row. Using FirstValue together with descending ordering fixes that. Alternatively, you can define the frame over which the window should work:
from django.db.models import F, ValueRange, Window
from django.db.models.functions import FirstValue, LastValue, TruncDate

Cake.objects.annotate(
    first_cake=Window(
        expression=FirstValue('cake_name'),
        partition_by=[TruncDate('baked_on')],
        order_by=F('baked_on').asc(),
    ),
    last_cake=Window(
        expression=LastValue('cake_name'),
        partition_by=[TruncDate('baked_on')],
        order_by=F('baked_on').asc(),
        frame=ValueRange(),
    ),
    day=TruncDate('baked_on'),
).distinct().values_list('day', 'first_cake', 'last_cake')
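To see why the frame matters, here is a plain-Python emulation of the two window behaviours on the sample cakes. It does not use Django; `running_last`, `partition_last` and `report` are hypothetical helper names, and the emulation assumes rows are already sorted by `baked_on` with distinct timestamps.

```python
from itertools import groupby

# (day, time, cake_name) rows from the example, already ordered by baked_on.
cakes = [
    ("11 Jan", "12:30", "Vanilla"), ("11 Jan", "14:30", "Strawberry"),
    ("11 Jan", "20:45", "Avocado"), ("12 Jan", "09:05", "Raspberry"),
    ("12 Jan", "16:30", "Sprinkles"), ("12 Jan", "20:11", "Chocolate"),
]


def running_last(rows):
    """Default frame: each row only sees rows up to itself, so 'last' is the row itself."""
    return [(day, name, name) for day, _, name in rows]


def partition_last(rows):
    """Unbounded frame: each row sees its whole partition (the day)."""
    out = []
    for day, group in groupby(rows, key=lambda r: r[0]):
        names = [name for _, _, name in group]
        out.extend((day, name, names[-1]) for name in names)
    return out


def report(rows):
    """One entry per day: (day, first cake, last cake)."""
    result = []
    for day, group in groupby(rows, key=lambda r: r[0]):
        names = [name for _, _, name in group]
        result.append((day, names[0], names[-1]))
    return result


print(running_last(cakes)[0])    # last value equals the current row
print(partition_last(cakes)[0])  # last value is the final cake of the day
print(report(cakes))
```

With the running frame every row reports itself as the last cake, which is why the first query orders descending and takes the first value instead.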

Pandas - Best way to deal with this data

I hopefully have a very simple question. My company uses an employee shift management software known as Humanity, which produces hour reports in a really unusable format.
I need to clean it up so I can use it in the rest of my analysis, but I am at a loss as to the best way to do this. The data starts out looking like this:
Name | Total | Start (Sep 1, 2017) | End (Sep 1, 2017) | Hrs (Sep 1, 2017)
User 1 | 12 | 06:00 | 18:30 | 13
User 2 | 0 | | |
There are obviously many more users and many more dates but it repeats across the columns for additional dates. Here is what I have done to clean it up so far:
import re
import pandas as pd

data = pd.read_csv("TestReport.csv")
del data["Total"]
# Drop the "Hrs (...)" columns
cols = [c for c in data.columns if c.lower()[:3] != 'hrs']
data = data[cols]
# Strip "Start (", "End (" and ")" so only the date remains in each column name
data.rename(columns=lambda x: re.sub(r'Start \(', '', x), inplace=True)
data.rename(columns=lambda x: re.sub(r'End \(', '', x), inplace=True)
data.rename(columns=lambda x: re.sub(r'\)', '', x), inplace=True)
data.fillna(0, inplace=True)
My end need is to create date fields for start and finish times for each day for each user. With my data now having the column names as a pure month, day, year I think the best way would be just to iterate over each row and add the row value + column name, convert to date time and that will work.
However, I am not positive the best way to go about doing this, or if this would even be the best way.
The most important thing for me is that each user has a combined start and finish date time to be able to use to further analyze their efficiency during their shift on different records.
Let me know if I can provide any more details,
Thank you!
Andy McMaster
*******************Edited to show example*********************
Ideally, the end goal is to create a series of date ranges for each user. I need to be able to compare these series to my dataframe, which holds records for all employees' work, then assign each record to the user (team lead) who managed that record.
End would ideally be
Name | Total | Start (Sep 1, 2017) | End (Sep 1, 2017) | Hrs (Sep 1, 2017)
User 1 | 12 | 06:00 Sep 1, 2017 | 18:30 Sep 1, 2017 | 13
User 2 | 0 | | |
All - I discovered at least in my opinion the best way to solve this issue. I stick with the same data cleaning but end up with a little piece like this to create a workable list to add hours and dates together.
month_list = data.columns.tolist()
month_list.remove('Name')
new_list = []
for i in month_list:
    if i not in new_list:
        new_list.append(i)
for i in new_list:
    data[i] = i + " " + data[i].astype(str)
This produces data that looks like:
Name Sep 1, 2017 Sep 1, 2017 Sep 2, 2017 \
0 User 1 Sep 1, 2017 6:00 Sep 1, 2017 18:30 Sep 2, 2017 6:00
1 User 2 Sep 1, 2017 0 Sep 1, 2017 0 Sep 2, 2017 0
2 User 3 Sep 1, 2017 0 Sep 1, 2017 0 Sep 2, 2017 0
3 User 4 Sep 1, 2017 0 Sep 1, 2017 0 Sep 2, 2017 0
4 User 5 Sep 1, 2017 6:00 Sep 1, 2017 12:00 Sep 2, 2017 6:00
Next steps will involve reworking my code to remove all zero times, or creating a filter down the road so that as I go through each user, I only use the times they actually worked.
Hopefully this will help someone if they ever have a poorly designed timesheet they need to work with.
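For the final conversion into real datetimes, here is a minimal standard-library sketch. It assumes header labels of the form `Start (Sep 1, 2017)` and times like `06:00`, as in the original report; `combine` is a hypothetical helper, and parsing with `%b` assumes an English locale.

```python
import re
from datetime import datetime

# Matches the original report headers, e.g. "Start (Sep 1, 2017)".
HEADER = re.compile(r"^(Start|End) \((.+)\)$")


def combine(header, value):
    """Turn ('Start (Sep 1, 2017)', '06:00') into ('Start', datetime); None if unusable."""
    match = HEADER.match(header)
    if match is None or not value or value == "0":
        return None  # not a Start/End column, or no shift recorded that day
    kind, day = match.groups()
    return kind, datetime.strptime(f"{day} {value}", "%b %d, %Y %H:%M")


row = {"Name": "User 1", "Start (Sep 1, 2017)": "06:00", "End (Sep 1, 2017)": "18:30"}
# Keep only the columns that produced a (kind, datetime) pair.
shift = dict(filter(None, (combine(h, v) for h, v in row.items())))
print(shift["Start"], shift["End"])
```

Skipping empty and zero values at this step avoids having to filter out placeholder times later.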

Django ORM: Joining a table to itself and aggregating

I have a table in my Django app, UserMonthScores, where every user has a "score" for every month. So, it looks like
userid | month | year | score
-------+-------+------+------
sil | 9 | 2014 | 20
sil | 8 | 2014 | 20
sil | 7 | 2014 | 20
other | 9 | 2014 | 100
other | 8 | 2014 | 1
I'd like to work out which position a specific user was in, for each month, in the ranking table. So in the above, if I ask for monthly ranking positions for user "sil", per month, I should get a response which looks like
month | year | rank
------+------+-----
9     | 2014 | 2    # in second position behind user "other" who scored 100
8     | 2014 | 1    # in first position ahead of user "other" who scored 1
7     | 2014 | 1    # in first position because no-one else scored anything!
The way I'd do this in SQL is to join the table to itself on month/year, and select rows where the second table was for the specific user and the first table had a larger score than the second table, group by month/year, and select the count of rows per month/year. That is:
select u1.month,u1.year,count(*) from UserMonthScores u1
inner join UserMonthScores u2
on u1.month=u2.month and u1.year=u2.year
and u2.userid = 'sil' and u1.score >= u2.score
group by u1.year, u1.month;
That works excellently. However, I do not understand how to do this query using the Django ORM. There are other questions about joining a table to itself, but they don't seem to cover this use case.
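The expected ranks can be confirmed by running the SQL against the sample rows. This sketch uses Python's built-in sqlite3 module rather than the Django ORM, so it checks the query itself and is not an ORM translation. The column types and the added ORDER BY are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserMonthScores (userid TEXT, month INTEGER, year INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO UserMonthScores VALUES (?, ?, ?, ?)",
    [("sil", 9, 2014, 20), ("sil", 8, 2014, 20), ("sil", 7, 2014, 20),
     ("other", 9, 2014, 100), ("other", 8, 2014, 1)],
)

# Same self-join as in the question; ORDER BY is added only to make the output stable.
query = """
    SELECT u1.month, u1.year, COUNT(*)
    FROM UserMonthScores u1
    INNER JOIN UserMonthScores u2
        ON u1.month = u2.month AND u1.year = u2.year
        AND u2.userid = ? AND u1.score >= u2.score
    GROUP BY u1.year, u1.month
    ORDER BY u1.year DESC, u1.month DESC
"""
ranks = conn.execute(query, ("sil",)).fetchall()
print(ranks)
```

Because the comparison is `>=`, a user whose score ties with the target user also counts towards the rank.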