This is a question from the Australian Informatics Olympiad
The question is:
Have you ever heard of Melodramia, my friend? It is a land of forbidden forests and boundless swamps, of sprinting heroes and dashing heroines. And it is home to two dragons, Rose and Scarlet, who, despite their competitive streak, are the best of friends.
Rose and Scarlet love playing Binary Snap, a game for two players. The game is played with a deck of cards, each with a numeric label from 1 to N. There are two cards with each possible label, making 2N cards in total. The game goes as follows:
Rose shuffles the cards and places them face down in front of Scarlet.
Scarlet then chooses either the top card, or the second-from-top card from the deck and reveals it.
Scarlet continues to do this until the deck is empty. If at any point the card she reveals has the same label as the previous card she revealed, the cards are a Dragon Pair, and whichever dragon shouts `Snap!' first gains a point.
After many millennia of playing, the dragons noticed that having more possible Dragon Pairs would often lead to a more exciting game. It is for this reason they have summoned you, the village computermancer, to write a program that reads in the order of cards in the shuffled deck and outputs the maximum number of Dragon Pairs that the dragons can find.
I'm not sure how to solve this. I thought of something which is wrong (choosing the maximum over all cards, when compared with its previous occurrence for each card).
Here's my code as of now:
#include <iostream>
#include <fstream>
using namespace std;
int main() {
    ifstream fin("snapin.txt");
    ofstream fout("snapout.txt");
    int n;
    fin>>n;
    int arr[(2*n)+1];
    for(int i=0;i<2*n;i++){
        fin>>arr[i];
    }
    int dp[(2*n) +1];
    int maxi = 0;
    int pos[n+1];
    for(int i=0;i<n+1;i++){
        pos[i] = -1;
    }
    int count = 0;
    for(int i=2;i<(2*n)-2;i++){
        if(pos[arr[i]] == -1){
            pos[arr[i]] = i;
        }else{
            dp[i] = pos[arr[i]]+1;
            maxi = max(dp[i],maxi);
        }
        dp[i] = max(dp[i],maxi);
    }
    fout<<dp[2*n -1];
}
Ok, let's get some basic measurements of the problem out of the way first:
There are 2N cards. 1 card is drawn at a time, without replacement. Therefore there are 2N draws, taking the deck from size 2N (before the first draw) to size 0 (after the last draw).
The final draw takes place from a deck of size 1, and must take the last remaining card.
The 2N-1 preceding draws have deck size 2N, ... 3, 2. For each of these you have a choice between the top two cards. 2N-1 decisions, each with 2 possibilities.
The brute force search space is therefore 2^(2N-1).
That is exponential growth, every optimization scientist's favorite sort of challenge.
If N is small, say 20, the brute force method needs to search "only" a trillion possibilities, which you can get done in a few thousand seconds on a readily available PC that does a few billion operations per second (each solution takes more than one CPU instruction to check).
If N is not quite as small, perhaps 100, the brute force method is akin to breaking the encryption on government secrets.
Not happy with the brute force approach then? I'm not either.
Before we get to the optimal solution, let’s take a break to explore what the Markov assumption is and what it means for us. It shows up in different fields using different verbiage, but I’ll just paraphrase it in a way that is particularly useful for this problem involving gameplay choices:
Markov Assumption
A process is Markov if and only if the choices available to you in the future depend only on what you have now, and not how you got it.
A bad but often used real-world example is the stock market. Not only do taxation differences between short-term and long-term capital gains make history important in a small way, but investors do trend analysis and remember what stocks have done before, which affects future behavior in a big way.
A better example, especially for StackOverflow, is that of Turing machines and computer processors. What your program does next depends on the current instruction pointer and the contents of memory, but not the history of memory that’s since been overwritten. But there are many more. As we’ll see shortly, the Binary Snap problem can be formulated as Markov.
Now let’s talk about what makes the Markov assumption so important. For that, we’ll use the Travelling Salesman Problem. No, the Travelling International Salesman Problem. Still too messy. Let’s try the “Travelling International Salesman with a Single-Entry Visa Problem”. But we’ll go through all three of them briefly:
Travelling Salesman Problem
A salesman has to visit potential buyers in N cities. Plan an itinerary for the salesman which minimizes the total cost of visiting all N cities (variations: at least once / exactly once), given a matrix a_{j,k} which is the cost of travel from city j to city k.
Another variation is whether the starting city is predetermined or not.
Travelling International Salesman Problem
The cities the salesman needs to visit are split between two (or more) nations. A subset of the cities have border crossings and have travel options to all cities. The other cities can only reach cities which are either in the same country or are border-equipped.
Alternatively, instead of cities along the border, use cities with international airports. Won’t make a difference in the end.
The cost matrix for this problem looks rather like the flag of the Dominican Republic. Travel between interior cities of country A is permitted, as is travel between interior cities of country B (blue fields). Border cities connect with interior and border cities in both countries (white cross). And direct travel between an interior city of country A and one of country B is impossible (red areas).
Travelling International Salesman with a Single-Entry Visa
Now not only does the salesman need to visit cities in both countries, but he can only cross the border once.
(For travel fanatics, assume he starts in a third country and has single-entry visas for both countries, so he can’t visit some of A, all of B, then return to A for the rest).
Let’s look at an extremely simple case first: Only one border city. We’ll use one additional trick, the one from proof by induction: We assume that all problems smaller than the current one can be solved.
It should be fairly obvious that the Markov assumption holds when the salesman reaches the border city. No matter what path he took through country A, he has exactly the same choice of paths through country B.
But there’s a really important point here: Any path through country A ending at the border and any path through country B starting at the border, can be combined into a feasible full itinerary. If we have two full itineraries x and y, and x spent more money in country A than y did, then even if x has a lower total cost than the total cost of y, we can plan a path better than both, using the portion of y in country A and the portion of x in country B. I’m going to call that “splicing”. The Markov assumption lets us do it, by making all roads leading to the border interchangeable!
In fact, we can look just at the cities of country A, pick the best of all routes to the border, and forget about all the other options as soon as (in our plan) the salesman steps across into B.
This means instead of having factorial(N_A) * factorial(N_B) routes to look at, there’s only factorial(N_A) + factorial(N_B). Which is pretty much factorial(N_A) times better. Wow, is this Markov thing helpful or what?
Ok, that was too easy. Let’s mess it all up by having N_AB border cities instead of just one. Now if I have a path x which costs less in country B and a path y which costs less in country A, but they cross the border in different cities, I can’t just splice them together. So I have to keep track of all the paths through all the cities again, right?
Not exactly. What if, instead of throwing away all the paths through country A except the best y path, I actually keep one path ending in each border city (the lowest cost of all paths ending in the same border city)? Now, for any path x I look at in country B, I have a path y_endpt(x) that uses the same border city, to splice it with. So I have to solve the country A and country B partitions each N_AB times to find the best splice of a complete itinerary, for total work of N_AB * factorial(N_A) + N_AB * factorial(N_B), which is still way better than factorial(N_A) * factorial(N_B).
Enough development of tools. Let’s get back to the dragons, since they are subtle and quick to anger and I don’t want to be eaten or burnt to a crisp.
I claim that at any step T of the Binary Snap game, if we consider our “location” a pair of (card just drawn, card on top of deck), the Markov assumption will hold. These are the only things that determine our future options. All the cards below the top one in the deck must be in the same order no matter what we did before. And for knowing whether to count a Snap! with the next card, we need to know the last one taken. And that’s it!
Furthermore, there are N possible labels on the card last drawn, and N possible for the top card on the deck, for a total of N^2 “border cities”. As we start playing the game, there are two choices on the first turn, two on the second, two on the third, so we start out with 2^T possible game states (and a count of Snap!s for each). But by the pigeonhole principle, when 2^T > N^2, some of these plays must end in exactly the same game state (“border city”) as each other, and when that happens, we only need to keep the "dominating" one that got the best score on the way there.
Final complexity bound: 2*N timesteps, from no more than N^2 game states, with 2 draw choices at each, equals an upper limit of 4*N^3 simulated draws.
And that means the same trillion calculations that allowed us to do N=20 with the brute force method now permit N in the thousands: around N=8000 if you count N cubed steps, or roughly N=5000 to 6000 against the upper limit above.
That makes the dragons happy, which makes us alive and well.
Implementation note: Since the challenge didn’t ask for the order of draws, but just the highest attainable number of snaps, all the data you need to keep track of in addition to the initial ordering of the cards is the time, T, and a 2-dimensional array (N rows, N columns) of the best score you can have and reach that state at time T.
Real world applications: If you take this approach and apply it to a digital radio (fixed uniform bit timing, discrete signal levels) receiving a signal using a convolutional error-correcting code, you have the Viterbi decoder. If you apply it to acquired medical data, with variable timing intervals and continuous signal levels, and add some other gnarly math, you get my doctoral project.
I am building an experiment with the psychopy builder and I want to do an "evaluative conditioning" experiment. That is: first, different stimuli are rated according to their valence. Later, neutral stimuli will be paired with negative or positive stimuli.
Here is the structure, I have done so far:
Experiment structure
Specifically, in a first step, participants rate different items (pictures, shapes, ciphers). So far, no problem.
In the next trials, the problem starts.
Trial structure in the builder
In a next routine, the pictures should be arranged according to their rating.
That is positive, neutral, negative pictures (1-3 = negative, 4-6 = neutral, 7-9 = positive). They are called UCStim.
First, I need to know whether at least 5 pictures in the rating routine were rated < 3. If yes, the experiment can continue. If not, it has to be stopped.
Second step: select the 5 most negatively rated pictures. (But what if 8 were rated 1? Then a random selection among the images rated 1 should take place.)
Then select 5 pictures with values between 4 and 6 (neutral pictures).
Third step: combine the shapes (NeutralStim) with neutral Pictures and the ciphers (also NeutralStim) with negative pictures. The definition of shape vs. cipher is done in the excel-file of the loop (called condLoop.xlsx, column is called "kind"). The shapes should now be presented before negatively rated pictures and the ciphers before positively rated pictures.
My first problem is: how do I get the rated values? Do I have to import the csv-file? Or can I directly import them (the responses of the ratings are called rating.response_raw in the data file)?
I've hit a snag in continuing my work in a C++ program, I'm not sure what the best way to approach my problem is. Here is the situation in non-programming terms: I have a list of children and each child has a specific weight, age, and happiness. I have a way that people can visually view the bones of the child that is specific to these characteristics. (Think of an MMO character customization where there are sliders for each characteristic and when you slide the weight slider to heavy, the walk cycle looks like the character is heavier).
Before, my system had a set walk cycle for each end of the spectrum for each characteristic. For example, there is one specific walk cycle for the heaviest walk, one for the lightest walk, one for the youngest walk, etc. There was no middle input: the heaviest walk cycle and the lightest walk cycle were averaged by a specific percentage, given by the position of the slider on the scale.
Now to the problem: I have a large library of preset walk cycles, and each walk cycle has a specific weight, age, and happiness. So, Joe has a weight of 4, an age of 7, and a happiness level of 8, and Sally has 2, 3, 5. Suppose the sliders move to specific values (weight 5, age 8, happiness 7). I want to find in my library the child that is closest to all three of these values; here Joe is the closest. However, only one slider can be moved at a time, and the slider that was moved last is the most important characteristic to find the closest match to.
I was told to check out using a 3 dimensional array, but I would rather use an array of child objects and do multiple searches on that array. I am a rookie and I know the search will take a bit of computing time, but I keep leaning towards using the single array. I could also use a two dimensional array, but I'm not sure. What data structure would be the best to search for three values in?
Thank you for any help!
How many different values can each slider take? If there are say ten values for each slider this would mean there are 10*10*10=1000 different possible character classes. If your library has less than 1000 walk cycles just reading through them all looking for the nearest match is probably gonna be fast enough.
Of course if there are 100 values for each slider then you may want something more clever. My point is there are some things that don't have to be optimized.
Also is your library of walk cycles fixed once and for all? If so perhaps you could pre compute the walk cycle for each setting of the sliders and write that to a static array.
I agree with Wilf that the number of walk cycles is critical, as even if there are say 100,000 cycles you could easily use a brute-force find-the-minimum over...
weight_factor * diff(candidate.weight, target.weight) +
age_factor * diff(candidate.age, target.age) +
happiness_factor * diff(candidate.happiness, target.happiness)
...where the factor for the last-moved slider was higher than the others.
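As a rough sketch of that brute-force scan: the `Cycle` struct, the `Slider` enum, the function name `closestCycle`, and the default factor of 3 for the last-moved slider are all assumptions made for illustration, and `diff` is taken to be the absolute difference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <vector>

struct Cycle {
    int weight;
    int age;
    int happiness;
};

enum class Slider { Weight, Age, Happiness };

// Returns the index of the cycle with the lowest weighted difference from
// `target`, or library.size() if the library is empty. The slider moved last
// gets `lastMovedFactor`; the other two get a factor of 1.
std::size_t closestCycle(const std::vector<Cycle>& library, const Cycle& target,
                         Slider lastMoved, int lastMovedFactor = 3) {
    assert(lastMovedFactor >= 1);
    const long weightFactor = (lastMoved == Slider::Weight) ? lastMovedFactor : 1;
    const long ageFactor = (lastMoved == Slider::Age) ? lastMovedFactor : 1;
    const long happinessFactor = (lastMoved == Slider::Happiness) ? lastMovedFactor : 1;

    std::size_t bestIndex = library.size();
    long bestCost = std::numeric_limits<long>::max();
    for (std::size_t i = 0; i < library.size(); ++i) {
        const Cycle& candidate = library[i];
        const long cost =
            weightFactor * std::abs(candidate.weight - target.weight) +
            ageFactor * std::abs(candidate.age - target.age) +
            happinessFactor * std::abs(candidate.happiness - target.happiness);
        if (cost < bestCost) {  // ties keep the earliest entry
            bestCost = cost;
            bestIndex = i;
        }
    }
    return bestIndex;
}
```

With Joe (4, 7, 8) and Sally (2, 3, 5) in the library and a target of (5, 8, 7), Joe has the lower cost whichever slider moved last.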
For more cycles than that you'd want to limit the search space somewhat, and some indices would be useful, e.g.:
map<int, map<int, map<int, vector<Cycle*>>>> cycles_by_weight_age_happiness;
You'd populate that adding a pointer to each walk cycle - characterised by { weight, age, happiness } - to cycles_by_weight_age_happiness[rw(weight)][ra(age)][rh(happiness)], where each of rw, ra and rh rounded the parameters by whatever granularity you liked (e.g. round weight down to nearest 5kgs, group ages by integer part of log base 1.5 of age, leave happiness alone). Then to search you evaluate the entries "around" your target { rw(weight), ra(age), rh(happiness) } indices... the further from there you deviate (especially on the last-slider-moved parameter), the less likely you are to find a better fit than you've already seen, so tune to taste.
The above indexing is a refinement of what I think Wilf intended - just using functions to decouple the mapping from value space into vectors in the index.
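A small sketch of such a bucketed index, under some loudly stated assumptions: it uses a single map keyed on a (weight, age, happiness) bucket tuple instead of three nested maps, the bucket sizes in `rw`, `ra` and `rh` are arbitrary placeholders (in particular `ra` uses plain division rather than the logarithmic grouping mentioned above), values are assumed non-negative, and the `CycleIndex` class and its method names are invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

struct Cycle {
    int weight;
    int age;
    int happiness;
};

// Bucketing functions; the granularities are placeholders to tune.
int rw(int weight) { return weight / 5; }    // buckets of 5 weight units
int ra(int age) { return age / 2; }          // buckets of 2 years
int rh(int happiness) { return happiness; }  // left alone

class CycleIndex {
public:
    // Files a pointer to the cycle under its bucket; the index does not own it.
    void add(const Cycle* cycle) {
        buckets_[std::make_tuple(rw(cycle->weight), ra(cycle->age),
                                 rh(cycle->happiness))].push_back(cycle);
    }

    // Collects every cycle whose bucket is within `radius` buckets of the
    // target's bucket along each of the three axes.
    std::vector<const Cycle*> candidatesNear(const Cycle& target, int radius) const {
        assert(radius >= 0);
        std::vector<const Cycle*> found;
        for (int dw = -radius; dw <= radius; ++dw) {
            for (int da = -radius; da <= radius; ++da) {
                for (int dh = -radius; dh <= radius; ++dh) {
                    const auto it = buckets_.find(std::make_tuple(
                        rw(target.weight) + dw, ra(target.age) + da,
                        rh(target.happiness) + dh));
                    if (it != buckets_.end()) {
                        found.insert(found.end(), it->second.begin(), it->second.end());
                    }
                }
            }
        }
        return found;
    }

private:
    std::map<std::tuple<int, int, int>, std::vector<const Cycle*>> buckets_;
};
```

The candidates returned here would still be ranked with the weighted difference; the index only shrinks the set that has to be scored, and the radius can be widened if nothing is found nearby.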
For a course in my Computer Science studies, I have to come up with a set of constraints and a score-definition to find a tiling for frequent itemset mining. The matrix with the data consists of ones and zeroes.
My task is to come up with a set of constraints for the tiling (having a fixed amount of tiles), and a score-function that needs to be maximized. Since I started working out a solution that allows overlapping tiles, I tried to find a score-function to calculate the total "area" of all tiles. Bear in mind that the score function has to be evaluated for every possible solution, so I can't simply go over the total matrix (which contains about 100k elements) and see if it is part of a tile.
However, I only took into account overlap between only 2 tiles, and came up with the following:
TotalArea = Sum_a_in_Tiles(Area(a)) - Sum_a/b_in_tiles(Overlap(a,b))
Silly me, I didn't consider a possible overlap between 3 tiles. My question is the following:
Is it possible to come up with a generic score-function for n tiles, considering only area per tile and area per overlap between 2 (or more) tiles, and if so, how would I program it?
I could provide some code, but then again it has to be programmed in some obscure language called Comet :(
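One standard way to extend the two-tile formula is the inclusion-exclusion principle: add the areas of single tiles, subtract the overlaps of pairs, add back the overlaps of triples, and so on with alternating sign. The sketch below is in C++ rather than Comet and rests on an assumption about the setup: a tile is taken to be a set of rows combined with a set of columns, so the overlap of several tiles is again a tile (intersect the row sets, intersect the column sets). The names `Tile`, `intersect` and `coveredArea` are made up for illustration. The number of terms doubles with every extra tile, so this only makes sense for a small, fixed number of tiles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

struct Tile {
    std::set<int> rows;
    std::set<int> cols;
};

std::set<int> intersect(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.begin()));
    return out;
}

// Number of matrix cells covered by at least one tile, computed from tile and
// overlap areas only (no pass over the matrix itself).
long long coveredArea(const std::vector<Tile>& tiles) {
    const std::size_t k = tiles.size();
    assert(k < 20);  // 2^k - 1 subsets are enumerated below
    long long total = 0;
    for (unsigned mask = 1; mask < (1u << k); ++mask) {
        std::set<int> rows, cols;
        int chosen = 0;
        for (std::size_t i = 0; i < k; ++i) {
            if ((mask & (1u << i)) == 0) continue;
            if (chosen == 0) {
                rows = tiles[i].rows;
                cols = tiles[i].cols;
            } else {
                rows = intersect(rows, tiles[i].rows);
                cols = intersect(cols, tiles[i].cols);
            }
            ++chosen;
        }
        const long long area =
            static_cast<long long>(rows.size()) * static_cast<long long>(cols.size());
        // Odd-sized subsets are added, even-sized subsets are subtracted.
        total += (chosen % 2 == 1) ? area : -area;
    }
    return total;
}
```

With two tiles this reduces to Area(a) + Area(b) - Overlap(a,b), which matches the TotalArea formula in the question.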
I've a problem statement like :
Zombies have placed themselves at every junction in New York. Each junction 'i' initially has a presence of a_i zombies. At every timestep each zombie randomly chooses one of its neighboring junctions and walks towards it. Each neighboring junction is chosen by the zombie with an equal probability. In order to safeguard the citizens of New York we need to find out the number of zombies at every junction after 'k' timesteps.
The network of New York is given as an edge list.
I've the option to input all the nodes and all the edges and k. Now I need the number of zombies in the five most populated nodes. Now my question is: why does this set always need to be the same?
I mean, when I run the program the first time, suppose I get the output set {5,5,5,4,4}. But why does this output always need to be the same?
Thanks in advance, and I'm new to simulation so I'm sorry if I've asked something absurd. Actually it's an Interviewstreet question and I'm not asking for the solution code.
The zombies move randomly so it won't be the same every time. It will be somewhat random. You need to simulate this random movement of zombies.
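A minimal sketch of such a simulation, with several assumptions of mine: the edge list has already been turned into an adjacency list `adj` (one list of neighbors per junction, same length as the zombie counts), a zombie at a junction with no neighbors stays put, and the function name `simulateZombies` and the `seed` parameter are made up for illustration. Different seeds give different outcomes; reusing a seed reproduces a run exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Moves every zombie to a uniformly chosen neighboring junction, k times,
// and returns the zombie count at each junction afterwards.
std::vector<long long> simulateZombies(const std::vector<std::vector<int>>& adj,
                                       std::vector<long long> zombies,
                                       int k, unsigned seed) {
    assert(zombies.size() == adj.size());
    std::mt19937 rng(seed);
    for (int step = 0; step < k; ++step) {
        std::vector<long long> next(zombies.size(), 0);
        for (std::size_t node = 0; node < adj.size(); ++node) {
            if (adj[node].empty()) {
                next[node] += zombies[node];  // isolated junction: nowhere to go
                continue;
            }
            std::uniform_int_distribution<std::size_t> pick(0, adj[node].size() - 1);
            for (long long z = 0; z < zombies[node]; ++z) {
                ++next[adj[node][pick(rng)]];
            }
        }
        zombies.swap(next);
    }
    return zombies;
}
```

Sorting the returned counts in descending order and taking the first five gives the most populated junctions for that particular run.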