Understanding "Real world modelling" program - c++

For a few days now I've been working on a new project: a "real world modelling" program.
Here's the assignment:
A visit to a psychologist (use a queue). Experts provide psychological advice. Some of them (n) form therapeutic groups of k people (GrT is the duration of group therapy in hours); other experts (m) take individual patients (InT is the duration of individual therapy in hours). Each newly arrived patient (a new patient appears with probability p1; recurring patients come back after a period of time h) can choose to go to a psychologist providing individual therapy or to group therapy. If a group therapy session is full, patients who wish to participate in group sessions must wait. Recurring patients who wish to go to group sessions can start a session with a smaller group, but can't attend the same session as newly arrived patients. It has been observed that patients who take individual therapy recover faster than those who choose group sessions (they need fewer sessions), but there are exceptions: due to the social interaction factor, some patients (probability p2) recover h percent faster than those who choose individual therapy. An individual session costs InC, a group session GrC. You need to assess which therapeutic approach a patient should choose given their resources, and how many and which specialists a health care facility should hire.
Here's my approach to this problem:
Read a text file containing names, surnames, and the money each patient is willing to spend, and place everything in a queue structure.
Find which kind of therapy is better for each patient: generate a random number and compare it with the probability p2 to decide whether the patient recovers faster in individual or group therapy. IMO the order of factors here is: money (can the patient afford individual therapy sessions?) > p2 (should the patient take group sessions if that's better for them?).
By looking at how many patients are in the queue, we can find how many psychologists we'll need. (Are there any other factors here? What if we are short of experts?)
Problems that I can't understand: how do I implement the probability p1 of a new patient appearing if I write every patient into a text file and put them in a queue? How many therapy sessions does it take for a patient to recover (a static number?)?
Am I missing something? Basically it's an open question and there may be no single right answer. If anyone has suggestions for how to build this program better, I'd be glad to take them!
Programming language I'm using: C++

If you want to break up a task, analyse it and prepare it for coding, you could:
First, make a block diagram representing the program's control flow.
Then follow it with a pseudocode implementation.
P.S. Update the question following the above; when you reach the "code stage", there will definitely be more help.
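On the p1 question specifically: one common way to model an arrival probability is a discrete-time loop in which every tick is an independent trial, and a patient is moved from the list read from the file into the waiting queue only when the trial succeeds. Below is a minimal sketch of that idea, written in Python for brevity (the same loop ports to C++ with `std::queue` and `<random>`). The function name, the choice of one trial per tick, and the `seed` parameter are assumptions for illustration, not part of the assignment.

```python
import random
from collections import deque

def simulate_arrivals(pool, p1, ticks, seed=None):
    """Move patients from `pool` into the waiting queue, one Bernoulli trial per tick."""
    rng = random.Random(seed)   # seeded generator so a run can be repeated
    pool = deque(pool)          # patients read from the text file, not yet arrived
    waiting = deque()           # patients who have arrived and are queued
    for tick in range(ticks):
        # A new patient appears during this tick with probability p1.
        if pool and rng.random() < p1:
            waiting.append((tick, pool.popleft()))
    return waiting

arrived = simulate_arrivals(["Ann", "Bob", "Cid"], p1=0.3, ticks=24, seed=1)
print(list(arrived))   # (arrival tick, patient) pairs in arrival order
```

The text file then only supplies who the patients are; when they show up is decided by the simulation.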


Which network architecture will work best in my setup? (DQN)

I am trying to distribute x jobs among y persons using reinforcement learning (DQN).
Every person can take a specific number of tasks, and every task can only be done once.
I mask out all the impossible tasks for each person; for example, if a task has already been chosen it is simply masked out (so the output size stays the same).
I preprocess my data by combining the features of the person with the features of the task. For example, I subtract the timeslots: if a person has 4 timeslots left and the task needs 2, the resulting feature is 2. I do this for every person and every task, resulting in one big matrix where #rows = #persons and #columns = #tasks * #features.
Now I want to give my network as much information as possible, meaning the whole matrix, but I am unsure how to do it.
One possible idea would be to flatten everything into one big array, but the problems are that the number of persons can change, and that I can only choose one task at a time for one person, so I would need to tell the network which person is the active one.
Another approach would be something like "Hey, I have a sequence, let's use an RNN", but I am not sure how to teach the network which person is the current one. I also think this would lead the network to give me the best task over all persons, whereas it should learn something like "If the task is better for another person, don't choose it for the active one".
The outputs of my network are the actions (tasks), from which I choose the maximum.
Maybe some smart person has an idea. Thanks for your help.
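For reference, the preprocessing and masking described above can be sketched in plain Python. This is only an illustration with a single feature per task (remaining timeslots); the function names are hypothetical, and masking tasks the active person has too few timeslots for is an assumption added here, not something the question states.

```python
import math

def build_feature_matrix(person_slots, task_slots):
    """Rows = persons, columns = tasks; each entry is slots left minus slots needed."""
    return [[p - t for t in task_slots] for p in person_slots]

def masked_argmax(q_values, allowed):
    """Index of the largest Q-value among allowed actions, or None if none is allowed."""
    best_index, best_q = None, -math.inf
    for index, (q, ok) in enumerate(zip(q_values, allowed)):
        if ok and q > best_q:
            best_index, best_q = index, q
    return best_index

features = build_feature_matrix([4, 1], [2, 3])   # [[2, 1], [-1, -2]]
taken = [True, False]                              # task 0 is already chosen
# Active person is row 0: allow a task only if it is free and fits (assumption).
allowed = [not t and f >= 0 for t, f in zip(taken, features[0])]
print(masked_argmax([0.9, 0.4], allowed))          # prints 1
```

Because the mask is applied after the network produces its Q-values, the output size stays fixed at the number of tasks, as described in the question.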

How can I restrict the output of an Amazon Machine Learning model? (Predicting cricket team results)

I am trying to predict the match winner based on the historical data set shown below.
The data set consists of IPL seasons, and Team_Name_id and Opponent Team are the team names in IPL. I have set the match id as the row id and created the model. When running real-time testing, the result is not as expected (shown below).
Target is set as Match_winner_id.
Am I missing any configurations? Please help
The model is working correctly. There are just two problems:
Your input data is not very good
There's no way for the model to know that only one of those two teams should win
Data Quality
A predictive model needs good quality input data on which to reverse-engineer a model that explains a given result. This input data should contain information that can be used to predict a result given a different set of input data.
For example, when predicting house prices, it would need to know the suburb (category), number of bedrooms/bathrooms/parking spaces, age of the building and selling price. It could then predict the selling price for other houses with a slightly different mix of variables.
However, based on your screenshot, you are giving the following information (and probably more) on which to make your prediction:
Teams: Not great, because you are separating Column C and Column D. The model will assume they are unrelated information. It doesn't realise that those two values could be swapped.
Match date: Useless information unless the outcome varies in proportion to time (eg a team continually gets better)
Season: As with Match Date, this is probably useless because you're always predicting the future -- you won't be predicting for a past season
Venue: Only relevant if a particular team always wins at a given venue
Toss Decision: Would this really influence the outcome? Also, it's only known once the game begins, so not great for predicting a future game.
Win Type: You won't know the win type until a game is over, so it's not suitable for predicting a future game.
Score: Again, not known until the actual game, so no good for future predictions.
Man of the Match: Not known for future games.
Umpire: How does an umpire influence the result of a game?
City: Yes, given that home teams often have an advantage.
You have provided very little information that could be used to predict a future game. There is really only the teams and the venue. Everything else is either part of the game itself or irrelevant.
Picking only one of the two teams
When the ML model looks at your data and tries to make a prediction, it will look at all the data you have provided. For example, it might notice that for a given venue and season, Team 8 has a higher propensity to win. Therefore, given that venue and season, it will favour a win by Team 8. The model has no concept that the only possible outcome is one of the two teams given in columns C and D.
You are predicting for two given teams and you are listing the teams in either Column C or Column D and this makes no sense -- the result is the same if you swapped the teams between columns, but the model has no concept of this. Also, information about Team 1 vs Team 2 is totally irrelevant for Team 3 vs Team 4.
What you should do is create one dataset per team, listing all their matches, plus a column that shows the outcome -- either a boolean (Win/Lose) or a value that represents the number of runs by which they won (where negative is a loss). You would then ask the model to predict the result for that team, given the input data, which would be win/lose or the number of points above/below the other team.
But at the core, I think that your input data doesn't have enough rich content to be able to make a sensible prediction. Just ask yourself: "What data would I like to know if I were to guess which team would win?" It would probably be past results, weather conditions, which players were on each team, how many matches they played in the last week, etc. None of this information is being provided as input on each line of your input data.
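The "one dataset per team" restructuring can be sketched as follows. The field names (`team_a`, `team_b`, `venue`, `winner`) are hypothetical stand-ins for the columns in the screenshot, and only the features the answer considers usable before a match are kept.

```python
from collections import defaultdict

def per_team_datasets(matches):
    """Turn match rows into one list of rows per team, each with a boolean outcome."""
    datasets = defaultdict(list)
    for m in matches:
        # Emit the match twice, once from each team's point of view,
        # so swapping the two team columns no longer changes the data.
        for team, opponent in ((m["team_a"], m["team_b"]), (m["team_b"], m["team_a"])):
            datasets[team].append({
                "opponent": opponent,
                "venue": m["venue"],
                "won": m["winner"] == team,
            })
    return dict(datasets)

matches = [
    {"team_a": 1, "team_b": 2, "venue": "Mumbai", "winner": 1},
    {"team_a": 3, "team_b": 1, "venue": "Delhi", "winner": 3},
]
print(per_team_datasets(matches)[1])
```

With a boolean `won` target per team, the model can no longer pick a third team as the winner; it can only say whether the team in question wins or loses.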

Modelling EVERY day in Django

I have a booking system for something where the price can change based on the day. The admins for the site can make these changes. If a booking crosses the boundary of a daily rate, they pay pro-rata for the rates they used.
I'm losing confidence in how this is implemented. There are at least two ways:
Having Rates that specify their validity (start, end fields) and then working out which of those apply. But which overlapping ones take priority? Etc. Nasty. This is what we're trying to do and cannot currently answer sufficiently well.
The same except that there is some form of unique quality to date so that no two rates can overlap. The problem here is we'd need to split existing Rates on insert and rejoin two on delete/edit, etc if they had the same value. We'd need to make sure there were no gaps. It requires some heavy ORM overriding.
Keeping a DayRate table with every day defined. This means keeping a load of extra data around but most bookings are for tens of days, not thousands so I'm not worried about the database bandwidth requirements here. Date would be primary-unique and I'd just do a range filter for grabbing which ones I need to factor in.
The problem is generating these dates ahead of time. I know that as soon as I implement this, somebody will make a booking for 2032. Is there a good way around this or should we limit them?
None of these answers seems great and I have to imagine that I'm not the first guy with a booking system. Is there a better way of keeping track of a rate over a contiguous (possibly infinite) amount of time?
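For what it's worth, the DayRate idea can be sketched without the ORM. The version below is plain Python rather than Django; the dictionary stands in for the DayRate table, and the `default_rate` fallback for days that have no row yet (the "booking for 2032" case) is an assumption added for illustration, not something the system currently has.

```python
from datetime import date, timedelta

def booking_cost(day_rates, start, end, default_rate):
    """Sum the daily rate for every day in the half-open range [start, end)."""
    total = 0
    day = start
    while day < end:
        # Use the stored rate if that day is defined, else fall back to the default.
        total += day_rates.get(day, default_rate)
        day += timedelta(days=1)
    return total

day_rates = {date(2024, 1, 1): 10, date(2024, 1, 2): 20}
print(booking_cost(day_rates, date(2024, 1, 1), date(2024, 1, 4), default_rate=5))  # prints 35
```

With a fallback like this, only the days whose rate differs from the default need to be stored, so nothing has to be generated ahead of time.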

Collaborative Filtering: Ways to determine implicit scores for products for each user?

Having implemented an algorithm to recommend products with some success, I'm now looking at ways to calculate the initial input data for this algorithm.
My objective is to calculate a score for each product that a user has some sort of history with.
The data I am currently collecting:
User order history
Product pageview history for both anonymous and registered users
All of this data is timestamped.
What I'm looking for
There are a couple of things I'm looking for suggestions on, and ideally this question should be treated more for discussion rather than aiming for a single 'right' answer.
Any additional data I can collect for a user that can directly imply an interest in a product
Algorithms/equations for turning this data into scores for each product
What I'm NOT looking for
Just to avoid this question being derailed with the wrong kind of answers, here is what I'm doing once I have this data for each user:
Generating a number of user clusters (21 at the moment) using the k-means clustering algorithm, using Pearson's coefficient for the distance score
For each user (on demand) calculating a graph of similar users by looking for their most and least similar users within their cluster, and repeating for an arbitrary depth.
Calculating a score for each product based on the preferences of other users within the user's graph
Sorting the scores to return a list of recommendations
Basically, I'm not looking for ideas on what to do once I have the input data (I may need further help with that later, but it's not the point of this question), just for ideas on how to generate this input data in the first place
Here's a haymaker of a response:
time spent looking at a product
semantic interpretation of comments left about the product
make a discussion page about a product, brand, or product category and semantically interpret the comments
if they Shared a product page (email, del.icio.us, etc.)
browser (mobile might make them spend less time on the page vis-à-vis laptop while indicating great interest) and connection speed (affects amt. of time spent on the page)
facebook profile similarity
heatmap data (e.g. à la kissmetrics)
What kind of products are you selling? That might help us answer you better. (Since this is an old question, I am addressing both @Andrew Ingram and anyone else who has the same question and found this thread through search.)
You can allow users to explicitly state their preferences, the way netflix allows users to assign stars.
You can assign a positive numeric value for all the stuff they bought, since you say you do have their purchase history. Assign zero for stuff they didn't buy.
You could do some sort of weighted value for stuff they bought, adjusted for what's popular. (If nearly everybody bought a product, it doesn't tell you much about a person that they also bought it.) See "term frequency–inverse document frequency".
You could also assign some lesser numeric value for items that users looked at but did not buy.
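Putting the last three suggestions together, a rough sketch might look like this. The smoothed inverse-popularity weight `log(1 + users / buyers)` is in the spirit of inverse document frequency but is just one possible choice, and the `view_weight` of 0.2 is an arbitrary placeholder; both are assumptions to tune against real data.

```python
import math

def implicit_scores(purchases, views, view_weight=0.2):
    """Score each product per user: popularity-adjusted for purchases, small flat value for views."""
    n_users = len(purchases)
    buyers = {}
    for items in purchases.values():
        for item in items:
            buyers[item] = buyers.get(item, 0) + 1
    scores = {}
    for user, bought in purchases.items():
        user_scores = {}
        # Lesser value for products the user looked at but did not buy.
        for item in views.get(user, set()):
            user_scores[item] = view_weight
        # Purchases overwrite views; products nearly everyone bought get a lower weight.
        for item in bought:
            user_scores[item] = math.log(1 + n_users / buyers[item])
        scores[user] = user_scores
    return scores

purchases = {"u1": {"a", "b"}, "u2": {"a"}}
views = {"u2": {"a", "b"}}
print(implicit_scores(purchases, views))
```

Products a user never interacted with are simply absent from their dictionary, which corresponds to the "assign zero" suggestion.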

Explaining race conditions to a non-technical audience [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Recently, I found myself having to write up some concerns I have about race conditions in an application that is in development (not by me). This will likely be brought to the attention of stakeholders who are non-technical and with whom I do not have a direct line of communication, so my explanation needs to be in written form.
I have already made an attempt at this write-up. I gloss over the technical specifics as best I can, give an example of how a race condition would occur in the application, and describe its impact. I feel I did pretty well, but it's far from perfect.
The problem is, as much as I try to shield the reader from computer science, I have still found it difficult to eliminate phrases like "threads of execution" and "mutual exclusion" without losing correctness and substance. The risk is that, with too much hand-waving, these concerns could be dismissed as a made-up boogeyman.
Anyway, my question to you is this: How would you explain race conditions to a non-technical audience? Would you dare to explain CPU scheduling? Would you invoke the dining philosophers?
You don't have to work within the constraints of my situation (but it would be awesomely helpful if you did).
Company X has $1,000 in the bank. X pays rent of $2,000 and receives a payment of $10,000 for services rendered to company Y. However, due to a race condition, X is in deficit by $1,000 and is now filing for bankruptcy. =(
You might want to explain how the bank handles company X's account in this way: Bank staff A takes the current value of $1,000 and adds $10,000 to it. Bank staff B takes the current value of $1,000 and subtracts $2,000 from it. Bank staff A updates the value to $11,000. Bank staff B updates the value to -$1,000.
I think bank transactions might be a good example, both because it's easy to see that an incorrect result is bad and because race conditions are easy to create in such an environment.
I have $500 on my account.
Someone transfers $200 to me at the same time that I withdraw $50.
Now, if the bank doesn't handle race conditions properly, they will do the following (assuming the transactions are handled manually, of course)
Clerk A will see the request to add $200 to my balance, and note that my balance is currently $500.
Clerk B will see the request to subtract $50 from my balance, and note that my balance is currently $500 (clerk A hasn't yet transferred the money).
Clerk A finishes the paperwork and sets my account balance to $700 (500 + the 200 he was supposed to add).
And then, a minute later (because clerk B just had to grab a cup of coffee), clerk B finishes up the other transaction and sets my balance to $450 (the 500 I had when he checked, minus the 50 he was meant to subtract).
My balance is now $450, when it should have been $650, because of a race condition. The outcome depended on the order in which different parts of the two transactions were performed.
That's the general description of how race conditions are bad. Now say that instead of clerks, we have our application processing two separate tasks at the same time (that's your 'threads of execution'), and just like above, they both read a value, modify the value that they read, and then write it back. One of the modifications may then be lost if this happens in the order shown above.
That should relate it to the specific problems in your app.
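For the technical appendix of such a write-up, the clerk story can be replayed in a few lines. The sketch below first reproduces the bad ordering deterministically (no real threads, so the result is the same every run), then shows the same two updates done by real threads under a lock. The function and class names are made up for illustration.

```python
import threading

def unsafe_interleaving(balance, deposit, withdrawal):
    """Replay the clerk story: both read first, then both write back."""
    seen_by_a = balance                 # clerk A notes the balance
    seen_by_b = balance                 # clerk B notes the same, now stale, balance
    balance = seen_by_a + deposit       # clerk A writes 700
    balance = seen_by_b - withdrawal    # clerk B overwrites it with 450
    return balance

class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def apply(self, delta):
        # Read, modify and write happen as one step while the lock is held.
        with self._lock:
            self.balance += delta

def safe_concurrent(balance, deposit, withdrawal):
    account = Account(balance)
    threads = [
        threading.Thread(target=account.apply, args=(deposit,)),
        threading.Thread(target=account.apply, args=(-withdrawal,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance

print(unsafe_interleaving(500, 200, 50))  # prints 450
print(safe_concurrent(500, 200, 50))      # prints 650
```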
I would go for a Dining Philosophers-esque approach, but depending on my audience, I would try to adapt the analogy to their context. Are you speaking to business executives? Then relate it to something like allocating a meeting room or a corporate car, or booking a hotel room, or whatever. Are you talking to average people? Then the dining philosophers example is fine, or you can think up a similar situation involving caring for farm animals or sitting in chairs or whatever.
Whether you hijack the dining philosophers example or make up your own, definitely use a metaphor.
If you are writing to a non technical audience, you'll want to simplify your explanations and relate it to something they can understand. One explanation taken from the paper Analogies for teaching parallel computing to inexperienced programmers (http://portal.acm.org/citation.cfm?doid=1189136.1189172) explains it in terms of a pen game:
We’re going to play a game called the Pen Game. The rules are simple: I’m going to hold a pen in my hand, and then I’ll say “One, two, three, go.” When I say “go,” take the pen from my hand. Whoever gets the pen wins. Ready? One, two, three, go.
You then ask if the outcome of this game can be predicted in advance. If it can't be predicted, can we guarantee a correct outcome? This should lead to the realization that it's possible to get incorrect results for simultaneous writes to the same piece of memory.
I was going to recommend the dining philosophers, but I see you have already found that one. So, as an alternative, how about using gridlock as an analogy?
Imagine normal traffic driving along the four streets next to a single city block (North ave, South ave, East street and West street). When there are only one or two cars on the road, everything moves smoothly. When there is steady traffic, some cars will have to stop and wait for other cars to move past, but this is a manageable problem. One car stops to wait for another car to go by, and then continues on its merry way.
Now, picture rush-hour traffic at the same location. Let's say that one car driving South on West street can't make it all the way through the intersection at the NorthWest corner of our city block. That car now blocks all of the Westbound cross traffic on North ave. It doesn't take long before a Westbound car tries to make it through the NorthEast corner intersection and gets stuck, blocking all of the Northbound traffic on East street. When this situation makes it all the way around the four intersections, no cars can move! Each one is waiting for the cars in front of it to move ahead, but there is no way for the gridlock to be relieved without pulling cars out backwards.
The comparison to computing should be straightforward. Cars are threads or processes, streets and avenues are processors, buffers, or cores. The concept of blocking can be described using traffic lights or stop signs, and the whole thing starts to make intuitive sense, even to non-programmers.
Write a program:
Wait for salary.
Go to shop.
Buy food.
Turn on the plate.
Put food on the plate.
Keep plate for 20 minutes.
Eat.
Go to bed.
Now try to have two threads (you, wife) execute it without synchronization.
You: Wait for salary.
Wife: Go to the shop without money, crash
You: Turn on the plate.
You: Keep plate for 20 minutes.
You: Go to bed.
Wife: Eat at someone else's place.
Wife: Go to bed.
Peter wants to pull out of his driveway. He checks that nothing is in the way of his car, then gets in. His son Frank then hides behind the car. Peter cannot see him and runs him over.
The important thing here is that for a computer, "inspect" and "modify" tend to be two separate actions, so an example where you can't check something when you modify it is a good one.
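The inspect-then-modify gap can be shown with a small, hypothetical meeting-room example. The racy version is replayed step by step so the outcome is deterministic; the `book_if_free` method is a sketch of how the check and the change can be made one indivisible step with a lock.

```python
import threading

class Room:
    def __init__(self):
        self.booked_by = None
        self._lock = threading.Lock()

    def is_free(self):
        return self.booked_by is None

    def book(self, who):
        self.booked_by = who

    def book_if_free(self, who):
        # Inspect and modify under one lock, so nobody can slip in between.
        with self._lock:
            if self.booked_by is None:
                self.booked_by = who
                return True
            return False

def racy_bookings():
    """Both callers inspect first, then both act on what they saw."""
    room = Room()
    a_saw_free = room.is_free()
    b_saw_free = room.is_free()     # still free: A has not booked yet
    confirmed = []
    if a_saw_free:
        room.book("A")
        confirmed.append("A")
    if b_saw_free:
        room.book("B")              # silently overwrites A's booking
        confirmed.append("B")
    return confirmed, room.booked_by

print(racy_bookings())              # prints (['A', 'B'], 'B')
```

Both callers were told the booking succeeded, but only one of them actually holds the room.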
How about the plain obvious?
A race condition is literally a race between two people.
A company is bidding on a project. Two employees working independently on bids submit them to the customer, but one of the employees has outdated information. Neither employee knows that the other is in the process of submitting a bid, so depending on who is faster, the first bid may be replaced by the slower employee's bid. This will cause confusion, as the bid may have changed over time.
There needs to be communication between the two employees to either work together or stop one of them.
One difficulty in explaining the general concept is that race conditions manifest themselves in a wide variety of situations. If your goal is give your non-technical audience the sense that this is a generic problem type, you should try to offer more than one example.
A picture is worth a thousand words. It's true. If you draw a timeline, put some entity on it, and show its state changes as time progresses, you can demonstrate a race condition pretty easily in one diagram. It may take a few redos to get the picture just right, but I've always found that drawing it out gets my point across much faster than describing it.
I think it's hard to explain this in a simple way, because thinking about concurrency is inherently hard. The basic idea of a financial transaction might be a good place to start, since people will have some familiarity with them from real life.
In any kind of transaction, you need to make simultaneous entries in two places - debits and credits. If the transaction gets interrupted in the middle by someone else trying to perform another transaction, they will see the wrong balance in one or the other of the accounts.
There's a great example in Structured Concurrent Programming With Operating Systems Applications (as I recall)
In the impoverished country of Bezerkistan, two lines merge onto a single track in a tunnel. There have been collisions and the ruling junta needs a solution.
The issue is that it's mountainous and the engineers are blind. There's very little advance warning of two trains about to collide in the tunnel.
Here's the plan.
Put a big bowl at the juncture.
Give each engineer a little brass monkey.
When you're about to enter the tunnel, you stop your train. You pat around in the bowl to see if a brass monkey is in the bowl.
If there's a monkey, someone else is using the tunnel, so you have to wait until their train is entirely in the tunnel, at which time the conductor gets out of the caboose and grabs the monkey from the bowl.
If there's no monkey, no one else is using the tunnel. So, you can grab your monkey from the engine compartment, put it in the bowl and drive through the tunnel, knowing you have acquired exclusive access to the track. Of course, you stop briefly to allow the conductor to retrieve the brass monkey.
Guess what?
They still had collisions!
Why? What's the situation or sequence of actions that causes this to fail?
That's a race condition.
In a written document, you can explain how the race condition leads to an accident.
In a presentation, you can coach the audience through reasoning about concurrency and locking.
I would use a shared-memory bank account example of a data race condition.
Explain that the computer does something like: load balance; add 1; store balance;. Consider two threads that are modifying your bank account balance (you and your wife are both depositing one dollar at the same time).
If both threads get interrupted after the load balance step and then resume, you can lose one dollar.
see: http://wasp.cs.washington.edu/atomeclipse/handouts.pdf
As you mentioned, you often need to introduce other concepts (mutual exclusion, threads of execution) to accurately describe race conditions, even in a metaphor. So try defining these terms (or at least getting the idea across) first, using metaphor.
As a simple example, let's use a 4-way intersection (set in a country where you drive on the right). Divide the intersection into 4 quadrants: North-West, North-East, South-East, and South-West. Now call each quadrant a resource, and call each car a thread of execution. These cars only respect traffic systems, and since there are no stop signs or traffic lights at this intersection, the cars barrel right on through without slowing or considering traffic.
You can easily show that simultaneous use of one of these quadrants by more than one car is bad, and results in a car crash. One obvious solution is to install a traffic system. The system ensures that no more than one car is passing through a quadrant at the same time. It can do this intricately, without tying up all the resources. For example, letting cars coming from the South make a left turn to head West (using south-east and north-west quadrants), while letting cars coming from the West make a right turn to head South (using the south-west quadrant). The traffic system is providing mutual exclusion, or preventing simultaneous use (by multiple cars) of a common resource (the quadrant of road in the intersection).
This at least provides the ideas behind these definitions, the idea that simultaneously accessing shared resources can be bad, and that mutual exclusion can solve this problem. After this is established, you'll need to map these to a more appropriate metaphor to show what a race condition is and how it's one of those bad things that results from lack of mutual exclusion for a common resource.
It takes a bit longer, but it grants some familiarity with terms and the big picture before drilling down into a more complex metaphor.
Talking about money to your stakeholders might send them into panic mode especially if they assume they are losing actual money because of this, which is not exactly ideal if the problem does not specifically result in a net loss of profits, so here's a less financially oriented story on how you can explain a race condition to anyone.
This story does not address the concept of deadlock, but the more traditional race condition scenario and consequences.
STORY STARTS HERE:
The Setting: There are 3 cities connected by a railway network. The trains do not have any signs on them indicating which city they are coming from and which city they are going to because they are being used between all 3 cities and the railway network didn't want to deal with the hassle of changing signs all the time. Since the network is small there is no concrete schedule on when trains arrive and leave. The station overseers just get a call from the other city station overseers when a train departs, the overseer takes a note of the time when it has left and since all trains are the same models they drive at the same speed, so when the overseer receives a call from the other cities they announce to the people in the station that: "The next train will be heading to city C". So the people who wish to travel to city C await the train, hop on and merrily ride to city C.
The Problem: But one day, as a train was planning its route from A to B to C, it broke down half-way between A and B. Luckily the technicians are very skilled and would be able to repair the train in a short while. However, that same day another train was also planning a different route from C to B to A. The overseer at station B received a call from A that a train was coming, and shortly after received another call from C that another train was also coming. The station overseer then announced to the passengers waiting in the station: "The first train arriving will be heading to station C, and the train shortly after that will be heading to station A." The passengers gathered their luggage and went to their respective platforms. The overseer saw a train coming and switched the rails to the platform where people were planning to head to city C. Little did they know that the train was actually going to city A instead. The other train, after having fixed its mechanical problems, also arrived at the station, and the overseer happily directed it to the platform containing passengers wishing to go to city A. Needless to say, none of the passengers arrived where they planned to, all because the overseer assumed that the trains would arrive in order as usual.
The problem with race conditions and many many computer science constructs is that people are not computers. Every time I explain an algorithm to my students they say "but it doesn't make sense to do it that way", to which I reply "computers don't have common sense, all they have are instructions". That aside, you should explain a race condition as a race, and it makes most sense to let people actually try the race, if they can. That way they can see how things go wrong. But... they are not allowed to use common sense.
So let's assume we have a game where 2 persons fill up stacks of colored blocks in order Red, Orange, Yellow. They have many red, orange and yellow blocks. All stacks need to be exactly three blocks high.
In the first game both try to do this as fast as possible, but they only work on their own stacks.
In the second game they try to work together by allowing themselves to also stack blocks on each other's stacks. However they are not allowed to change the block they have in their hand, and they have to place a planned block.
You can imagine a situation like this occurs in stack 1:
player 1 grabs a red block
player 1 places red block - player 2 grabs an orange block
player 1 grabs an orange block - player 2 places an orange block
player 1 places an orange block
So now we have a stack with two orange blocks. It's obvious that with a human game this would never happen, because people have common sense: they see that the orange block is already placed, and revert their decision to also place an orange block.
Also you can show them this video: https://www.youtube.com/watch?v=TcGwNdbsAbc
Let's use a whiteboard to do a trivial accounting task. We've got $100 on hand - write it on the whiteboard.
Alice has dozens of invoices that add up to $100, so she's going to note that $100, go and add up her list and come back in 5 minutes and write $200 on the board.
Bob's been shopping. He's going to take that number from the whiteboard and go and subtract $50 worth of purchases, and then he's going to write $50 on the board.
If Bob gets back first, we'll see $200 after Alice writes her result. If Alice gets back first we'll see $50, also wrong. What we want to see is $150, and we need to add some precautions somewhere to make that happen.
That should be enough to scaffold a discussion of technical solutions with reasonable intuitions.
For example, a mutex means you lock the door to the room with the whiteboard in it, and make them do their work in there. An optimistic solution means you get them both to check and start over if the number changed while they were away. If you want to talk about deadlocks, you can laugh about Bob calling Alice from inside the locked room to ask her to hurry up.
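The "check and start over" idea can be sketched with a version counter on the whiteboard. This is a minimal illustration, not a production pattern; the class and method names are invented, and the scenario is replayed in a fixed order so the output is the same on every run.

```python
import threading

class Whiteboard:
    def __init__(self, value):
        self.value = value
        self.version = 0                 # bumped on every successful write
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version

    def write_if_unchanged(self, seen_version, new_value):
        # Accept the write only if nobody has written since `seen_version` was read.
        with self._lock:
            if self.version != seen_version:
                return False
            self.value = new_value
            self.version += 1
            return True

def update(board, delta):
    """Optimistic update: redo the work whenever the number changed in the meantime."""
    while True:
        value, version = board.read()
        if board.write_if_unchanged(version, value + delta):
            return

board = Whiteboard(100)
_, alice_version = board.read()          # Alice notes $100 and walks away
update(board, -50)                       # Bob finishes first: the board shows 50
stale = board.write_if_unchanged(alice_version, 200)  # Alice's stale result is rejected
update(board, +100)                      # Alice starts over from the current number
print(stale, board.value)                # prints: False 150
```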
Send them to Race Condition on Wikipedia.
The first part will make some sense, and the rest (not shown below) will make you look smart since they will assume you understand it.
"A race condition or race hazard is a flaw in a system or process whereby the output and/or result of the process is unexpectedly and critically dependent on the sequence or timing of other events. The term originates with the idea of two signals racing each other to influence the output first."
I think the key point to get across is that it's most frequently a timing issue, which can be unpredictable because the time something takes differs from one run to the next.