This is a question from the Australian Informatics Olympiad
The question is:
Have you ever heard of Melodramia, my friend? It is a land of forbidden forests and boundless swamps, of sprinting heroes and dashing heroines. And it is home to two dragons, Rose and Scarlet, who, despite their competitive streak, are the best of friends.
Rose and Scarlet love playing Binary Snap, a game for two players. The game is played with a deck of cards, each with a numeric label from 1 to N. There are two cards with each possible label, making 2N cards in total. The game goes as follows:
Rose shuffles the cards and places them face down in front of Scarlet.
Scarlet then chooses either the top card, or the second-from-top card from the deck and reveals it.
Scarlet continues to do this until the deck is empty. If at any point the card she reveals has the same label as the previous card she revealed, the cards are a Dragon Pair, and whichever dragon shouts `Snap!' first gains a point.
After many millennia of playing, the dragons noticed that having more possible Dragon Pairs would often lead to a more exciting game. It is for this reason they have summoned you, the village computermancer, to write a program that reads in the order of cards in the shuffled deck and outputs the maximum number of Dragon Pairs that the dragons can find.
I'm not sure how to solve this. I thought of an approach, which is wrong: for each card, compare it with its previous occurrence and take the maximum over all cards.
Here's my code as of now:
#include <iostream>
#include <fstream>
using namespace std;
int main() {
    ifstream fin("snapin.txt");
    ofstream fout("snapout.txt");
    int n;
    fin>>n;
    int arr[(2*n)+1];
    for(int i=0;i<2*n;i++){
        fin>>arr[i];
    }
    int dp[(2*n) +1];
    int maxi = 0;
    int pos[n+1];
    for(int i=0;i<n+1;i++){
        pos[i] = -1;
    }
    int count = 0;
    for(int i=2;i<(2*n)-2;i++){
        if(pos[arr[i]] == -1){
            pos[arr[i]] = i;
        }else{
            dp[i] = pos[arr[i]]+1;
            maxi = max(dp[i],maxi);
        }
        dp[i] = max(dp[i],maxi);
    }
    fout<<dp[2*n -1];
}
Ok, let's get some basic measurements of the problem out of the way first:
There are 2N cards. 1 card is drawn at a time, without replacement. Therefore there are 2N draws, taking the deck from size 2N (before the first draw) to size 0 (after the last draw).
The final draw takes place from a deck of size 1, and must take the last remaining card.
The 2N-1 preceding draws have deck size 2N, ... 3, 2. For each of these you have a choice between the top two cards. 2N-1 decisions, each with 2 possibilities.
The brute force search space is therefore 2^(2N-1).
That is exponential growth, every optimization scientist's favorite sort of challenge.
If N is small, say 20, the brute force method needs to search "only" a trillion possibilities, which you can get done in a few thousand seconds on a readily available PC that does a few billion operations per second (each solution takes more than one CPU instruction to check).
If N is not quite as small, perhaps 100, the brute force method is akin to breaking the encryption on government secrets.
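To make the size of that search concrete, here is a minimal brute-force sketch. The function names and the convention that label 0 means "nothing revealed yet" are my own assumptions, not part of the problem statement. It tries both choices at each of the 2N-1 decisions, so it is only usable for very small N, mainly as a cross-check for a faster solution.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Tries every sequence of choices. `top` is the card currently on top of the
// deck, `next` is the index of the card underneath it, and `last` is the
// label revealed most recently (0 = nothing revealed yet; labels start at 1).
int bruteForceFrom(const std::vector<int>& deck, std::size_t next, int top, int last) {
    if (next == deck.size()) {
        // Only one card is left, so it must be revealed.
        return top == last ? 1 : 0;
    }
    const int second = deck[next];
    // Choice 1: reveal the top card; the second card becomes the new top.
    const int takeTop = (top == last ? 1 : 0) + bruteForceFrom(deck, next + 1, second, top);
    // Choice 2: reveal the second card; the top card stays where it is.
    const int takeSecond = (second == last ? 1 : 0) + bruteForceFrom(deck, next + 1, top, second);
    return std::max(takeTop, takeSecond);
}

int maxDragonPairsBruteForce(const std::vector<int>& deck) {
    if (deck.empty()) return 0;
    return bruteForceFrom(deck, 1, deck[0], 0);
}
```

For the deck 1 2 1 2, revealing the top 1, then the second-from-top 1, then both 2s gives two Dragon Pairs, which is what the function returns.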
Not happy with the brute force approach then? I'm not either.
Before we get to the optimal solution, let’s take a break to explore what the Markov assumption is and what it means for us. It shows up in different fields using different verbiage, but I’ll just paraphrase it in a way that is particularly useful for this problem involving gameplay choices:
Markov Assumption
A process is Markov if and only if the choices available to you in the future depend only on what you have now, and not on how you got it.
A bad but often used real-world example is the stock market. Not only do taxation differences between short-term and long-term capital gains make history important in a small way, but investors do trend analysis and remember what stocks have done before, which affects future behavior in a big way.
A better example, especially for StackOverflow, is that of Turing machines and computer processors. What your program does next depends on the current instruction pointer and the contents of memory, but not the history of memory that’s since been overwritten. But there are many more. As we’ll see shortly, the Binary Snap problem can be formulated as Markov.
Now let’s talk about what makes the Markov assumption so important. For that, we’ll use the Travelling Salesman Problem. No, the Travelling International Salesman Problem. Still too messy. Let’s try the “Travelling International Salesman with a Single-Entry Visa Problem”. But we’ll go through all three of them briefly:
Travelling Salesman Problem
A salesman has to visit potential buyers in N cities. Plan an itinerary for the salesman which minimizes the total cost of visiting all N cities (variations: at least once / exactly once), given a matrix a_{j,k} which is the cost of travel from city j to city k.
Another variation is whether the starting city is predetermined or not.
Travelling International Salesman Problem
The cities the salesman needs to visit are split between two (or more) nations. A subset of the cities have border crossings and have travel options to all cities. The other cities can only reach cities which are either in the same country or are border-equipped.
Alternatively, instead of cities along the border, use cities with international airports. Won’t make a difference in the end.
The cost matrix for this problem looks rather like the flag of the Dominican Republic. Travel between interior cities of country A is permitted, as is travel between interior cities of country B (blue fields). Border cities connect with interior and border cities in both countries (white cross). And direct travel between an interior city of country A and one of country B is impossible (red areas).
Travelling International Salesman with a Single-Entry Visa
Now not only does the salesman need to visit cities in both countries, but he can only cross the border once.
(For travel fanatics, assume he starts in a third country and has single-entry visas for both countries, so he can’t visit some of A, all of B, then return to A for the rest).
Let’s look at an extremely simple case first: Only one border city. We’ll use one additional trick, the one from proof by induction: We assume that all problems smaller than the current one can be solved.
It should be fairly obvious that the Markov assumption holds when the salesman reaches the border city. No matter what path he took through country A, he has exactly the same choice of paths through country B.
But there’s a really important point here: Any path through country A ending at the border and any path through country B starting at the border, can be combined into a feasible full itinerary. If we have two full itineraries x and y, and x spent more money in country A than y did, then even if x has a lower total cost than the total cost of y, we can plan a path better than both, using the portion of y in country A and the portion of x in country B. I’m going to call that “splicing”. The Markov assumption lets us do it, by making all roads leading to the border interchangeable!
In fact, we can look just at the cities of country A, pick the best of all routes to the border, and forget about all the other options as soon as (in our plan) the salesman steps across into B.
This means instead of having factorial(N_A) * factorial(N_B) routes to look at, there’s only factorial(N_A) + factorial(N_B). Which is pretty much factorial(N_A) times better. Wow, is this Markov thing helpful or what?
Ok, that was too easy. Let’s mess it all up by having N_AB border cities instead of just one. Now if I have a path x which costs less in country B and a path y which costs less in country A, but they cross the border in different cities, I can’t just splice them together. So I have to keep track of all the paths through all the cities again, right?
Not exactly. What if, instead of throwing away all the paths through country A except the best y path, I actually keep one path ending in each border city (the lowest cost of all paths ending in the same border city)? Now, for any path x I look at in country B, I have a path y_endpt(x) that uses the same border city, to splice it with. So I have to solve the country A and country B partitions each N_AB times to find the best splice of a complete itinerary, for total work of N_AB * factorial(N_A) + N_AB * factorial(N_B), which is still way better than factorial(N_A) * factorial(N_B).
Enough development of tools. Let’s get back to the dragons, since they are subtle and quick to anger and I don’t want to be eaten or burnt to a crisp.
I claim that at any step T of the Binary Snap game, if we consider our “location” a pair of (card just drawn, card on top of deck), the Markov assumption will hold. These are the only things that determine our future options. All the cards below the top one in the deck must be in the same order no matter what we did before. And for knowing whether to count a Snap! with the next card, we need to know the last one taken. And that’s it!
Furthermore, there are N possible labels on the card last drawn, and N possible for the top card on the deck, for a total of N^2 “border cities”. As we start playing the game, there are two choices on the first turn, two on the second, two on the third, so we start out with 2^T possible game states (and a count of Snap!s for each). But by the pigeonhole principle, when 2^T > N^2, some of these plays must end in exactly the same game state (“border city”) as each other, and when that happens, we only need to keep the "dominating" one that got the best score on the way there.
Final complexity bound: 2*N timesteps, from no more than N^2 game states, with 2 draw choices at each, equals an upper limit of 4*N^3 simulated draws.
And that means the same trillion calculations that allowed us to do N=20 with the brute force method now permit N of roughly 6000 (4*N^3 reaches a trillion at about N=6300).
That makes the dragons happy, which makes us alive and well.
Implementation note: Since the challenge didn’t ask for the order of draws, but just the highest attainable number of snaps, all the data you need to keep track of in addition to the initial ordering of the cards is the time, T, and a 2-dimensional array (N rows, N columns) of the best score you can have and reach that state at time T.
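A minimal sketch of that table-filling, under a few assumptions of mine: the function name maxDragonPairs, passing the deck as a vector, and an extra row 0 meaning "nothing revealed yet" are not from the problem statement. The state is (last card revealed, card currently on top of the deck), and the index of the next unseen card plays the role of the time T.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// deck holds the 2N labels from top to bottom; labels run from 1 to n.
int maxDragonPairs(const std::vector<int>& deck, int n) {
    if (deck.empty()) return 0;
    const int UNREACHED = -1;
    using Table = std::vector<std::vector<int>>;
    // best[last][top] = most pairs found so far in that state.
    // Row last == 0 is reserved for "no card revealed yet".
    Table best(n + 1, std::vector<int>(n + 1, UNREACHED));
    best[0][deck[0]] = 0;
    for (std::size_t i = 1; i < deck.size(); ++i) {
        const int second = deck[i];  // the card directly under the top card
        Table next(n + 1, std::vector<int>(n + 1, UNREACHED));
        for (int last = 0; last <= n; ++last) {
            for (int top = 1; top <= n; ++top) {
                const int score = best[last][top];
                if (score == UNREACHED) continue;
                // Reveal the top card; the second card becomes the new top.
                int& viaTop = next[top][second];
                viaTop = std::max(viaTop, score + (top == last ? 1 : 0));
                // Reveal the second card; the top card stays in place.
                int& viaSecond = next[second][top];
                viaSecond = std::max(viaSecond, score + (second == last ? 1 : 0));
            }
        }
        best.swap(next);
    }
    // One card is left in every surviving state and it must be revealed.
    int answer = 0;
    for (int last = 0; last <= n; ++last) {
        for (int top = 1; top <= n; ++top) {
            if (best[last][top] == UNREACHED) continue;
            answer = std::max(answer, best[last][top] + (top == last ? 1 : 0));
        }
    }
    return answer;
}
```

Only two N-by-N tables are alive at any time (the current time step and the next one), which matches the note that the time T and one 2-dimensional array are all you need to carry along.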
Real world applications: If you take this approach and apply it to a digital radio (fixed uniform bit timing, discrete signal levels) receiving a signal using a convolutional error-correcting code, you have the Viterbi decoder. If you apply it to acquired medical data, with variable timing intervals and continuous signal levels, and add some other gnarly math, you get my doctoral project.
Related
I am attempting to use Q-learning to learn minesweeping behavior on a discrete version of Mat Buckland's smart sweepers, the original available here http://www.ai-junkie.com/ann/evolved/nnt1.html, for an assignment. The assignment limits us to 50 iterations of 2000 moves on a grid that is effectively 40x40, with the mines resetting and the agent being spawned in a random location each iteration.
I've attempted performing Q-learning with penalties for moving, rewards for sweeping mines and penalties for not hitting a mine. The sweeper agent seems unable to learn how to sweep mines effectively within the 50 iterations because it learns that going to a specific cell is good, but after the mine is gone it is no longer rewarded for going to that cell, only penalized by the movement cost.
I wanted to attempt providing rewards only when all the mines were cleared, in an attempt to make the environment static, as there would only be a state of not all mines collected, or all mines collected. I am struggling to implement this, however: with only 2000 moves per iteration and the ability to backtrack, the agent never manages to sweep all the mines in an iteration within the limit, with or without rewards for collecting mines.
Another idea I had was to have an effectively new Q matrix for each mine, so once a mine is collected, the sweeper transitions to that matrix and operates off that where the current mine is excluded from consideration.
Are there any better approaches that I can take with this, or perhaps more practical tweaks to my own approach that I can try?
A more explicit explanation of the rules:
The map edges wrap around, so moving off the right edge of the map will cause the bot to appear on the left edge etc.
The sweeper bot can move up down, left or right from any map tile.
When the bot collides with a mine, the mine is considered swept and then removed.
The aim is for the bot to learn to sweep all mines on the map from any starting position.
Given that the sweeper can always see the nearest mine, this should be pretty easy. From your question I assume your only problem is finding a good reward function and representation for your agent state.
Defining a state
Absolute positions are rarely useful in a random environment, especially if the environment is infinite like in your example (since the bot can drive over the borders and respawn at the other side). This means that the size of the environment isn't needed for the agent to operate (we will actually need it to simulate the infinite space, though).
A reward function calculates its return value based on the current state of the agent compared to its previous state. But how do we define a state? Let's see what we actually need in order to operate the agent like we want it to.
The position of the agent.
The position of the nearest mine.
That is all we need. Now I said earlier that absolute positions are bad. This is because they make the Q table (you call it Q matrix) static and very fragile to randomness. So let's try to completely eliminate absolute positions from the reward function and replace them with relative positions. Luckily, this is very simple in your case: instead of using the absolute positions, we use the relative position between the nearest mine and the agent.
Now we don't deal with coordinates anymore, but vectors. Let's calculate the vector between our points: v = pos_mine - pos_agent. This vector gives us two very important pieces of information:
the direction in which the nearest mine is, and
the distance to the nearest mine.
And these are all we need to make our agent operational. Therefore, an agent state can be defined as
State: Direction x Distance
of which distance is a floating point value and direction either a float that describes the angle or a normalized vector.
Defining a reward function
Given our newly defined state, the only thing we care about in our reward function is the distance. Since all we want is to move the agent towards mines, the distance is all that matters. Here are a few guesses how the reward function could work:
If the agent sweeps a mine (distance == 0), return a huge reward (ex. 100).
If the agent moves towards a mine (distance is shrinking), return a neutral (or small) reward (ex. 0).
If the agent moves away from a mine (distance is increasing), return a negative reward (ex. -1).
Theoretically, since we penalize moving away from a mine, we don't even need rule 1 here.
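The three rules above could be sketched roughly as follows. This is only an illustration under my own assumptions: the names wrapDelta, relativeOffset, gridDistance and reward are made up, the relative vector is taken the short way round the wrapping map, and the distance is measured in grid steps (Manhattan distance) because the bot only moves up, down, left or right.

```cpp
#include <cstdlib>

struct Offset {
    int dx;
    int dy;
};

// Shortest signed displacement from `from` to `to` on an axis of length
// `size` whose edges wrap around.
int wrapDelta(int from, int to, int size) {
    int d = (to - from) % size;
    if (d < 0) d += size;         // now 0 <= d < size
    if (d > size / 2) d -= size;  // going the other way round is shorter
    return d;
}

// v = pos_mine - pos_agent, taking the wrap-around into account.
Offset relativeOffset(int agentX, int agentY, int mineX, int mineY, int width, int height) {
    return Offset{wrapDelta(agentX, mineX, width), wrapDelta(agentY, mineY, height)};
}

// Number of up/down/left/right moves needed to reach the mine.
int gridDistance(Offset v) {
    return std::abs(v.dx) + std::abs(v.dy);
}

// Rule 1: sweeping a mine, rule 2: getting closer, rule 3: anything else.
double reward(int previousDistance, int currentDistance) {
    if (currentDistance == 0) return 100.0;
    if (currentDistance < previousDistance) return 0.0;
    return -1.0;
}
```

On a 40x40 map, an agent at x = 39 with the nearest mine at x = 0 gets dx = 1, so stepping off the right edge counts as moving towards the mine.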
Conclusion
The only thing left is determining a good learning rate and discount so that your agent performs well after 50 iterations. But, given the simplicity of the environment, this shouldn't even matter that much. Experiment.
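For reference, this is where the learning rate and discount enter. The sketch below is the standard tabular Q-learning update; the State and QTable types, the choice of the relative offset (dx, dy) as the state, and the action numbering are my assumptions for illustration.

```cpp
#include <algorithm>
#include <array>
#include <map>
#include <utility>

using State = std::pair<int, int>;           // relative offset (dx, dy) to the nearest mine
using ActionValues = std::array<double, 4>;  // one value per move: up, down, left, right
using QTable = std::map<State, ActionValues>;

// Q(s, a) += alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a))
// alpha is the learning rate, gamma the discount.
void qUpdate(QTable& q, const State& s, int action, double r, const State& next,
             double alpha, double gamma) {
    // States that were never visited start with all action values at 0.
    const ActionValues& nextValues = q[next];
    const double bestNext = *std::max_element(nextValues.begin(), nextValues.end());
    double& cell = q[s][action];
    cell += alpha * (r + gamma * bestNext - cell);
}
```

Because the state is relative to the nearest mine, the same table entries keep being reused after a mine is swept, which is what makes 50 iterations plausible.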
I've hit a snag in continuing my work on a C++ program, and I'm not sure what the best way to approach my problem is. Here is the situation in non-programming terms: I have a list of children and each child has a specific weight, age, and happiness. I have a way that people can visually view the bones of the child that is specific to these characteristics. (Think of an MMO character customization where there are sliders for each characteristic and when you slide the weight slider to heavy, the walk cycle looks like the character is heavier).
Before, my system had a set walk cycle for each end of the spectrum for each characteristic. For example, there is one specific walk cycle for the heaviest walk, one for the lightest walk, one for the youngest walk, etc. There was no middle input: the output was determined by the position of the slider on the scale, with the heaviest walk cycle and the lightest walk cycle averaged by a specific percentage given by that position.
Now to the problem: I have a large library of preset walk cycles and each walk cycle has a specific weight, age, and happiness. So, Joe has a weight of 4, an age of 7, and a happiness level of 8, and Sally has 2, 3, 5. Suppose the sliders move to specific values (weight 5, age 8, happiness 7). Only one slider can be moved at a time, and the slider that was moved last is the most important characteristic to find the closest match to. I want to find in my library the child that is closest to all three of these values; here Joe would be the closest.
I was told to check out using a 3-dimensional array, but I would rather use an array of child objects and do multiple searches on that array. I am a rookie and I know the search will take a bit of computing time, but I keep leaning towards using the single array. I could also use a two-dimensional array, but I'm not sure. What data structure would be the best to search for three values in?
Thank you for any help!
How many different values can each slider take? If there are say ten values for each slider this would mean there are 10*10*10=1000 different possible character classes. If your library has less than 1000 walk cycles just reading through them all looking for the nearest match is probably gonna be fast enough.
Of course if there are 100 values for each slider then you may want something more clever. My point is there are some things that don't have to be optimized.
Also is your library of walk cycles fixed once and for all? If so perhaps you could pre compute the walk cycle for each setting of the sliders and write that to a static array.
I agree with Wilf that the number of walk cycles is critical, as even if there are say 100,000 cycles you could easily use a brute-force find-the-minimum over...
weight_factor * diff(candidate.weight, target.weight) +
age_factor * diff(candidate.age, target.age) +
happiness_factor * diff(candidate.happiness, target.happiness)
...where the factor for the last-moved slider was higher than the others.
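A rough sketch of that linear scan, with several assumptions of mine: the Cycle struct, the Slider enum, the function name closestCycle, using the absolute difference for diff, and a default emphasis of 3 for the last-moved slider are all made up for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Cycle {
    double weight;
    double age;
    double happiness;
};

enum class Slider { Weight, Age, Happiness };

// Returns the index of the library entry with the smallest weighted
// difference from `target`, or library.size() if the library is empty.
// The slider moved last gets factor `emphasis`, the other two get 1.
std::size_t closestCycle(const std::vector<Cycle>& library, const Cycle& target,
                         Slider lastMoved, double emphasis = 3.0) {
    const double weightFactor = lastMoved == Slider::Weight ? emphasis : 1.0;
    const double ageFactor = lastMoved == Slider::Age ? emphasis : 1.0;
    const double happinessFactor = lastMoved == Slider::Happiness ? emphasis : 1.0;

    std::size_t bestIndex = library.size();
    double bestCost = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < library.size(); ++i) {
        const Cycle& candidate = library[i];
        const double cost =
            weightFactor * std::fabs(candidate.weight - target.weight) +
            ageFactor * std::fabs(candidate.age - target.age) +
            happinessFactor * std::fabs(candidate.happiness - target.happiness);
        if (cost < bestCost) {
            bestCost = cost;
            bestIndex = i;
        }
    }
    return bestIndex;
}
```

With Joe (4, 7, 8) and Sally (2, 3, 5) in the library and the sliders at (5, 8, 7), Joe wins whichever slider was moved last.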
For more cycles than that you'd want to limit the search space somewhat, and some indices would be useful, e.g.:
map<int, map<int, map<int, vector<Cycle*>>>> cycles_by_weight_age_happiness;
You'd populate that by adding a pointer to each walk cycle - characterised by { weight, age, happiness } - to cycles_by_weight_age_happiness[rw(weight)][ra(age)][rh(happiness)], where each of rw, ra and rh rounds its parameter by whatever granularity you like (e.g. round weight down to the nearest 5 kg, group ages by the integer part of log base 1.5 of age, leave happiness alone). Then to search you evaluate the entries "around" your target { rw(weight), ra(age), rh(happiness) } indices... the further from there you deviate (especially on the last-slider-moved parameter), the less likely you are to find a better fit than you've already seen, so tune to taste.
The above indexing is a refinement of what I think Wilf intended - just using functions to decouple the mapping from value space into vectors in the index.
I have to find the shortest path from point D to R. These are fixed points.
This is an example of a situation:
The box also contains walls, which you cannot pass through unless you break them. Each wall break costs you, let's say, "a" points, where "a" is a positive integer.
Each move which doesn't involve a wall, costs you 1 point.
The mission is to find out of all the paths of minimum cost, the one with the least number of broken walls.
Since, the width of the box can go up to 100 cells, it's irrelevant to use backtracking. It's too slow. The only solution I came up is this one:
Go east or south if there are no walls
If south has a wall, check if west has a wall. If west has a wall, break the south wall. If west doesn't have a wall, go west until you find a south cell without a wall. Repeat this process with south and east, in this order, until you exceed the cost of a broken wall. If the path from the west leads to the same place as if I had broken the south wall and costs the same as or less than "a" points, then use this path, else break the south wall.
If none of the above applies, break a south or east wall, depending on the box boundary.
Repeat steps 1, 2, 3 till the "passenger" arrives in point R. Between these 3 steps, there are "else-if" relations.
Can you come up with a better algorithm for this problem? I program in C++.
Use Dijkstra, but for costs give it 1 for a move that doesn't break a wall, and (a+0.00001) for breaking a wall. Then Dijkstra will give you what you want, the path that breaks the fewest walls among all minimal-cost paths.
Conceptually, imagine a traveler who can jump over walls -- while keeping track of the cost -- and can also split into two identical travelers when faced with a choice of two paths, so as to take them both (take that, Robert Frost!). Only one traveler moves at a time, the one who has incurred the lowest cost so far. That one moves, and writes on the floor "I reached here at a cost of only x". If I find such a note already there, if I got there more cheaply I erase the old note and write my own; if that other traveler got there more cheaply I commit suicide.
The two-part "cost first, then broken walls" can be represented as a pair (c, w) that is compared lexicographically. c is the cost, w is the number of broken walls. That makes it a "single thing" again (in some sense), so it's a thing that you can put into algorithms and so on that expect simply "a cost" (as an abstract thing that it may add another cost to or compare to another cost).
So we can just use A*, with a Manhattan Distance heuristic (perhaps there's something smarter that doesn't ignore walls completely, but this will work - underestimating the distance is admissible). The movement cost will, of course, not ignore walls. Neighbours will be all adjacent squares. All costs will be the pair I described above.
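As a rough sketch of the pair-cost idea, here it is with plain Dijkstra, which is A* with a heuristic of zero, so the Manhattan heuristic is left out to keep it short. The Edge struct, the adjacency-list layout, and the name cheapestPath are my assumptions; how you turn the box into a graph depends on how your walls are stored.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge {
    int to;
    bool breaksWall;  // true if taking this move means breaking a wall
};

// (points spent, walls broken); std::pair already compares lexicographically.
using Cost = std::pair<long long, int>;

// Returns the best (points, walls) pair from source to target. If target is
// unreachable, both fields of the result hold their type's maximum value.
Cost cheapestPath(const std::vector<std::vector<Edge>>& graph, int source, int target,
                  long long a) {
    const Cost unreachable(std::numeric_limits<long long>::max(),
                           std::numeric_limits<int>::max());
    std::vector<Cost> best(graph.size(), unreachable);
    using Entry = std::pair<Cost, int>;  // (cost so far, node)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> frontier;

    best[source] = Cost(0, 0);
    frontier.push(Entry(best[source], source));
    while (!frontier.empty()) {
        const Entry current = frontier.top();
        frontier.pop();
        const Cost cost = current.first;
        const int node = current.second;
        if (cost > best[node]) continue;  // stale queue entry
        if (node == target) return cost;
        for (const Edge& edge : graph[node]) {
            // A normal move costs 1 point, breaking a wall costs a points.
            const Cost candidate(cost.first + (edge.breaksWall ? a : 1),
                                 cost.second + (edge.breaksWall ? 1 : 0));
            if (candidate < best[edge.to]) {
                best[edge.to] = candidate;
                frontier.push(Entry(candidate, edge.to));
            }
        }
    }
    return unreachable;
}
```

When two routes cost the same number of points, the second component of the pair breaks the tie in favour of the route with fewer broken walls, with no floating-point epsilon needed.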
This could easily be modeled as a weighted graph and then apply Dijkstra's shortest path algorithm to it. Each square is a node. It is connected to the nodes of the squares it is adjacent to. The weight of the connections is either 1 or "a", based on whether there is a wall or not. This will get you the minimal cost. It's possible that the minimum cost and the minimum number of wall breaks could be different.
Here is a general algorithm (you'll have to do the implementation yourself):
Convert the matrix into a weighted graph:
For each entry in the matrix, create a Vertex.
For each Vertex, create an array of Edges, one for each neighboring Vertex.
For each Edge, define a weight according to the cost of breaking the wall between the two Vertices that the Edge is connecting.
Then, run the Dijkstra's algorithm (http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm) on the graph, starting from Vertex D. As an output, you will have the shortest (cheapest) path from Vertex D to any other Vertex on the graph, including Vertex R.
For a course in my Computer Science studies, I have to come up with a set of constraints and a score-definition to find a tiling for frequent itemset mining. The matrix with the data consists of ones and zeroes.
My task is to come up with a set of constraints for the tiling (having a fixed amount of tiles), and a score-function that needs to be maximized. Since I started working out a solution that allows overlapping tiles, I tried to find a score-function to calculate the total "area" of all tiles. Bear in mind that the score function has to be evaluated for every possible solution, so I can't simply go over the total matrix (which contains about 100k elements) and see if it is part of a tile.
However, I only took into account overlap between only 2 tiles, and came up with the following:
TotalArea = Sum_a_in_Tiles(Area(a)) - Sum_a/b_in_tiles(Overlap(a,b))
Silly me, I didn't consider a possible overlap between 3 tiles. My question is the following:
Is it possible to come up with a generic score-function for n tiles, considering only area per tile and area per overlap between 2 (or more) tiles, and if so, how would I program it?
I could provide some code, but then again it has to be programmed in some obscure language called Comet :(
In the field of stock market technical analysis there is the concept of rectangular price congestion levels, that is: the price goes up and down, essentially never breaking the previous high and low price levels for some time, forming the figure of a rectangle. E.g.: http://cf.ydcdn.net/1.0.0.25/images/invest/congestion%20area.jpg.
Edit: to be clearer: the stock market, as well as the forex market, is made of sets of movements called "impulse" and "correction", the first being in the direction of the stock's current trend and the other in the opposite direction. When the stock is moving in the direction of the trend, the impulse movement is always bigger than the following correction, but sometimes the correction ends up being the same size as the impulse. So, for example, in a stock with a positive trend, the impulse movement took the price from $10,00 to $15,00, and then a correction appeared, dropping the price to $12,00. When the new impulse appeared, though, instead of passing the previous high value ($15,00), it stopped exactly on it, and was followed by a new correction that dropped the price exactly to the previous low price ($12,00). So now we may draw two parallel horizontal lines in the stock's graph: one at the $15,00 price and the other at $12,00, forming a channel inside which the price is "congested". And if we draw two vertical bars at the extreme sides, we have a rectangle: one that has its top bar at the high level and its bottom bar at the low one.
I'm trying to create an algorithm in C++/Qt capable of detecting such patterns in candlestick data stored in a list container (using Qt -> QList), but currently I'm doing research to see if anybody knows of someone who has already written such code, so that I can save a lot of effort and time in developing the algorithm.
So my first question is: does anybody know of open-source code that can detect such a figure? Obviously it doesn't have to be under exactly these conditions; if there is code that does a similar task and only needs adjustments from me, that would be fine.
On the other hand, how could I create such an algorithm anyway? It's clear that the key point is to detect the high and low levels and then just check when those levels are 'broken' to detect the end of the figure, but how could I do that in an efficient way? Today the best thing I'm able to do is to detect high and low levels using time as a parameter (e.g. "the highest price in four candles"), and this uses very expensive code.
Technical analysis is very vague and subjective, and hard to code in a program when everyone sees different things in the same chart. A good start would be to use some cost function, such as choosing levels that minimize the sum of squared distances, which penalizes large deviations more than smaller ones.
You should use the idea of 'hysteresis' thresholding; you create a 4-level state machine for how the price breaks the low (L) or high (H) levels. (first time reaches new low level) L->L, (return to low level) H->L,(new high level) H->H, and then (return to high level) L->H.
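As a very rough illustration of the thresholding part only (not the full four-state machine), the sketch below treats the rectangle as broken only once a candle goes past a level by more than a tolerance margin, so small overshoots do not end the figure. The Candle struct, the function name firstBreakout, the margin parameter, and the use of std::vector instead of QList are all my assumptions.

```cpp
#include <cstddef>
#include <vector>

struct Candle {
    double high;
    double low;
};

// Returns the index of the first candle whose high exceeds top + margin or
// whose low falls under bottom - margin, or candles.size() if the price
// stays inside the (widened) rectangle for the whole series.
std::size_t firstBreakout(const std::vector<Candle>& candles, double top, double bottom,
                          double margin) {
    for (std::size_t i = 0; i < candles.size(); ++i) {
        if (candles[i].high > top + margin || candles[i].low < bottom - margin) {
            return i;
        }
    }
    return candles.size();
}
```

With levels at 15 and 12 and a margin of 0.5, a candle reaching 15.25 is still counted as inside the congestion area, while one reaching 16 ends it. This is a single pass over the candles, so it is cheap once the two levels are known.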