Q-learning to learn minesweeping behavior - c++

I am attempting to use Q-learning to learn minesweeping behavior on a discreet version of Mat Buckland's smart sweepers, the original available here http://www.ai-junkie.com/ann/evolved/nnt1.html, for an assignment. The assignment limits us to 50 iterations of 2000 moves on a grid that is effectively 40x40, with the mines resetting and the agent being spawned in a random location each iteration.
I've attempted performing q learning with penalties for moving, rewards for sweeping mines and penalties for not hitting a mine. The sweeper agent seems unable to learn how to sweep mines effectively within the 50 iterations because it learns that going to specific cell is good, but after a the mine is gone it is no longer rewarded, but penalized for going to that cell with the movement cost
I wanted to attempt providing rewards only when all the mines were cleared in an attempt to make the environment static as there would only be a state of not all mines collected, or all mines collected, but am struggling to implement this due to the agent having only 2000 moves per iteration and being able to backtrack, it never manages to sweep all the mines in an iteration within the limit with or without rewards for collecting mines.
Another idea I had was to have an effectively new Q matrix for each mine, so once a mine is collected, the sweeper transitions to that matrix and operates off that where the current mine is excluded from consideration.
Are there any better approaches that I can take with this, or perhaps more practical tweaks to my own approach that I can try?
A more explicit explanation of the rules:
The map edges wrap around, so moving off the right edge of the map will cause the bot to appear on the left edge etc.
The sweeper bot can move up down, left or right from any map tile.
When the bot collides with a mine, the mine is considered swept and then removed.
The aim is for the bot to learn to sweep all mines on the map from any starting position.

Given that the sweeper can always see the nearest mine, this should be pretty easy. From your question I assume your only problem is finding a good reward function and representation for your agent state.
Defining a state
Absolute positions are rarely useful in a random environment, especially if the environment is infinite like in your example (since the bot can drive over the borders and respawn at the other side). This means that the size of the environment isn't needed for the agent to operate (we will actually need it to simulate the infinite space, tho).
A reward function calculates its return value based on the current state of the agent compared to its previous state. But how do we define a state? Lets see what we actually need in order to operate the agent like we want it to.
The position of the agent.
The position of the nearest mine.
That is all we need. Now I said erlier that absolute positions are bad. This is because it makes the Q table (you call it Q matrix) static and very fragile to randomness. So let's try to completely eliminate abosulte positions from the reward function and replace them with relative positions. Luckily, this is very simple in your case: instead of using the absolute positions, we use the relative position between the nearest mine and the agent.
Now we don't deal with coordinates anymore, but vectors. Lets calculate the vector between our points: v = pos_mine - pos_agent. This vector gives us two very important pieces of information:
the direction in which the nearst mine is, and
the distance to the nearest mine.
And these are all we need to make our agent operational. Therefore, an agent state can be defined as
State: Direction x Distance
of which distance is a floating point value and direction either a float that describes the angle or a normalized vector.
Defining a reward function
Given our newly defined state, the only thing we care about in our reward function is the distance. Since all we want is to move the agent towards mines, the distance is all that matters. Here are a few guesses how the reward function could work:
If the agent sweeps a mine (distance == 0), return a huge reward (ex. 100).
If the agent moves towards a mine (distance is shrinking), return a neutral (or small) reward (ex. 0).
If the agent moves away from a mine (distance is increasing), retuan a negative reward (ex. -1).
Theoretically, since we penaltize moving away from a mine, we don't even need rule 1 here.
Conclusion
The only thing left is determining a good learning rate and discount so that your agent performs well after 50 iterations. But, given the simplicity of the environment, this shouldn't even matter that much. Experiment.

Related

Physics [UE4]: Implement a "maximum compression" for vehicle-suspension

For a given vehicle, I implemented a suspension system on four wheels.
The system is based on Hooke's Law.
The Problem: The vehicle should not be able to touch the ground. When driving in a spherical container (inside), the suspension gets compressed up to 100%, making the vehicle chassis touch the underground, which leads to unwanted collisions that throw the vehicle around.
Despite that may being a realistical behaviour, our game aims for an arcade-feeling, so I am looking for a formula to implement a maximum compression, so that the vehicle chassis can't come closer to the underground than X percent of the suspension size at any given moment, without actually simulating a physical contact between the two rigid bodys. Thus, I need to apply a fake force to the suspensions.
My current approach:
If the vehicle chassis would in fact touch the suspension base (Sorry, I don't know the proper word to describe this. I mean, when the suspension is at maximum compression), a force equal in magnitude and opposite in direction relative to the force pushing onto the suspension would be applied to the vehicle chassis, forcing it to stop moving downwards.
Therefore, I receive my vehicles world velocity V.
To get the downwards-velocity, I get the DotProduct of the velocity and the BodyUpVector.
float DownForceMagnitude = DotProduct(VelocityAtSuspension, BodyUpVector);
FVector DownForce = DownForceMagnitude * BodyUpVector;
FVector CounterForce = -DownForce * WeightOnSuspension;
Okay, this pseudo-code works somewhat fine on even underground, when the vehicle lands on a plane after a jump. Driving on a increasing slope however (like driving on the inside-walls of a sphere), makes the suspension reach maximum compression anyway, so apparently my approach is not correct.
I am now wondering what the cause is. My weight calculation only is simulated by VehicleWeight / 4, since the Unreal Engine 4 has no functionality to receive weight at a given location. I am no physics-pro, so forgive me if this is easy to calculate. Could that be the issue?
I do not need a physically 100% plausible solution, I just need a solution that works, and sufficiently stops the downwards motion of my vehicle chassis.
Any help is appreciated.
Greetings,
I had this problem with a futuristic magnetic hovercraft.
I solved it by reducing the force by ln depending on suspensions extension levels like so:
y = ln(ln(x+e))
where:
x = Suspension extension lvl in % (-> 0 being fully compressed)
y = the factor that you multiply the force with
e = eulers number
Here a graphic to help what it will be like:
https://ggbm.at/gmGEsAzE
ln is a very slow growing function thats why it works so great for this.
You probably want to clamp the values (maybe between 0 and 100 idk exactly how your code behaves and how u want this "break" to behave)
Tailor the function to your needs, I just wanted to suggest u use the ln like I did to solve this Problem.
I added e to x first to make it go through 0,0 if u want to make it stop earlier just subtract from x before using ln.
Also notice depending on when/how you calculate / update your suspension this (and any function applied to the force based on the extension of suspension levels) may not work under some circumstances or at all.

Plot a curved trajectory on a 2D coordinate system

I have a few ordered points (less than 10) in 2D coordinates system.
I have an agent moving in the system and I want to find the shortest path between those points following their order.
For background the agent can be given a position to go to with a thrust, and my objective is to plot the fastest course given the fact that the agent has a maximum thrust and maximum angular velocity.
After some research I realized that I may be looking for a curve fitting algorithm, but I don't know the underlying function since the points are randomly distributed in the coordinates system.
Please, help me find a solution to this problem.
I am open to any suggestion, my prefered programming language being C++.
I'm sure there is a pure mathematical solution such as spacecraft trajectory optimization for example, but here is how I would consider approaching it from a programming/annealing perspective.
Even though you need a continuous path, you might start the path search with discreet steps and a tolerance for getting close enough to each point as a start.
Assign a certain amount of time to each step and vary applied angular force and thrust at each step.
Calculate resulting angular momentum and position for the start of the next step.
Select the parameters for the next step either with a random search, or iterate through each to find the closest to the next target point (quantize the selection of angles and thrust to begin with). Repeat steps until you are close enough to the next target point. Then repeat to get to the next point.
Once you have a rough path you can start refining it (perhaps use the rough point from the previous run as target points in the new one) by reducing the time/size of the steps and tolerance to target points.
When evaluating the parameters' fitness at each step you might want to consider that once you reach a target point you also want to perhaps have momentum in the direction of the next point. This should be an emergent property if multiple paths are considered and the fitness function considers shortest total time.
c++ could help here if you use the std::priority_queue and std::map for example to keep track of the potential paths.

'Stable' multi-dimensional scaling algorithm

I have a wireless mesh network of nodes, each of which is capable of reporting its 'distance' to its neighbors, measured in (simplified) signal strength to them. The nodes are geographically in 3d space but because of radio interference, the distance between nodes need not be trigonometrically (trigonomically?) consistent. I.e., given nodes A, B and C, the distance between A and B might be 10, between A and C also 10, yet between B and C 100.
What I want to do is visualize the logical network layout in terms of connectness of nodes, i.e. include the logical distance between nodes in the visual.
So far my research has shown the multidimensional scaling (MDS) is designed for exactly this sort of thing. Given that my data can be directly expressed as a 2d distance matrix, it's even a simpler form of the more general MDS.
Now, there seem to be many MDS algorithms, see e.g. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html and http://tapkee.lisitsyn.me/ . I need to do this in C++ and I'm hoping I can use a ready-made component, i.e. not have to re-implement an algo from a paper. So, I thought this: https://sites.google.com/site/simpmatrix/ would be the ticket. And it works, but:
The layout is not stable, i.e. every time the algorithm is re-run, the position of the nodes changes (see differences between image 1 and 2 below - this is from having been run twice, without any further changes). This is due to the initialization matrix (which contains the initial location of each node, which the algorithm then iteratively corrects) that is passed to this algorithm - I pass an empty one and then the implementation derives a random one. In general, the layout does approach the layout I expected from the given input data. Furthermore, between different runs, the direction of nodes (clockwise or counterclockwise) can change. See image 3 below.
The 'solution' I thought was obvious, was to pass a stable default initialization matrix. But when I put all nodes initially in the same place, they're not moved at all; when I put them on one axis (node 0 at 0,0 ; node 1 at 1,0 ; node 2 at 2,0 etc.), they are moved along that axis only. (see image 4 below). The relative distances between them are OK, though.
So it seems like this algorithm only changes distance between nodes, but doesn't change their location.
Thanks for reading this far - my questions are (I'd be happy to get just one or a few of them answered as each of them might give me a clue as to what direction to continue in):
Where can I find more information on the properties of each of the many MDS algorithms?
Is there an algorithm that derives the complete location of each node in a network, without having to pass an initial position for each node?
Is there a solid way to estimate the location of each point so that the algorithm can then correctly scale the distance between them? I have no geographic location of each of these nodes, that is the whole point of this exercise.
Are there any algorithms to keep the 'angle' at which the network is derived constant between runs?
If all else fails, my next option is going to be to use the algorithm I mentioned above, increase the number of iterations to keep the variability between runs at around a few pixels (I'd have to experiment with how many iterations that would take), then 'rotate' each node around node 0 to, for example, align nodes 0 and 1 on a horizontal line from left to right; that way, I would 'correct' the location of the points after their relative distances have been determined by the MDS algorithm. I would have to correct for the order of connected nodes (clockwise or counterclockwise) around each node as well. This might become hairy quite quickly.
Obviously I'd prefer a stable algorithmic solution - increasing iterations to smooth out the randomness is not very reliable.
Thanks.
EDIT: I was referred to cs.stackexchange.com and some comments have been made there; for algorithmic suggestions, please see https://cs.stackexchange.com/questions/18439/stable-multi-dimensional-scaling-algorithm .
Image 1 - with random initialization matrix:
Image 2 - after running with same input data, rotated when compared to 1:
Image 3 - same as previous 2, but nodes 1-3 are in another direction:
Image 4 - with the initial layout of the nodes on one line, their position on the y axis isn't changed:
Most scaling algorithms effectively set "springs" between nodes, where the resting length of the spring is the desired length of the edge. They then attempt to minimize the energy of the system of springs. When you initialize all the nodes on top of each other though, the amount of energy released when any one node is moved is the same in every direction. So the gradient of energy with respect to each node's position is zero, so the algorithm leaves the node where it is. Similarly if you start them all in a straight line, the gradient is always along that line, so the nodes are only ever moved along it.
(That's a flawed explanation in many respects, but it works for an intuition)
Try initializing the nodes to lie on the unit circle, on a grid or in any other fashion such that they aren't all co-linear. Assuming the library algorithm's update scheme is deterministic, that should give you reproducible visualizations and avoid degeneracy conditions.
If the library is non-deterministic, either find another library which is deterministic, or open up the source code and replace the randomness generator with a PRNG initialized with a fixed seed. I'd recommend the former option though, as other, more advanced libraries should allow you to set edges you want to "ignore" too.
I have read the codes of the "SimpleMatrix" MDS library and found that it use a random permutation matrix to decide the order of points. After fix the permutation order (just use srand(12345) instead of srand(time(0))), the result of the same data is unchanged.
Obviously there's no exact solution in general to this problem; with just 4 nodes ABCD and distances AB=BC=AC=AD=BD=1 CD=10 you cannot clearly draw a suitable 2D diagram (and not even a 3D one).
What those algorithms do is just placing springs between the nodes and then simulate a repulsion/attraction (depending on if the spring is shorter or longer than prescribed distance) probably also adding spatial friction to avoid resonance and explosion.
To keep a "stable" diagram just build a solution and then only update the distances, re-using the current position from previous solution as starting point. Picking two fixed nodes and aligning them seems a good idea to prevent a slow drift but I'd say that spring forces never end up creating a rotational momentum and thus I'd expect that just scaling and centering the solution should be enough anyway.

Number of tours through m x n grid?

Let T(x,y) be the number of tours over a X × Y grid such that:
the tour starts in the top left square
the tour consists of moves that are up, down, left, or right one square,
the tour visits each square exactly once, and
the tour ends in the bottom left square.
It’s easy to see, for example, that T(2,2) = 1, T(3,3) = 2, T(4,3) = 0, and T(3,4) = 4.
Write a program to calculate T(10,4).
I have been working on this for hours ... I need a program that takes the dimensions of the grid as input and returns the number of possible tours? Any idea on how I should go about solving this?
Since you're new to backtracking, this might give you an idea how you could solve this:
You need some data structure to represent the state of the cells on the grid (visited/not visited).
Your algorithm:
step(posx, posy, steps_left)
if it is not a valid position, or already visited
return
if it's the last step and you are at the target cell
you've found a solution, increment counter
return
mark cell as visited
for each possible direction:
step(posx_next, posy_next, steps_left-1)
mark cell as not visited
and run with
step(0, 0, sizex*sizey)
The basic building blocks of backtracking are: evaluation of the current state, marking, the recursive step and the unmarking.
This will work fine for small boards. The real fun starts with larger boards where you have to cut branches on the tree which aren't solvable (eg: there's an unreachable area of not visited cells).
The assigned exercise is a good one. It forces you to think through several concepts, step-by-step. I cannot think all the concepts through for you, but maybe I can help by asking the following question.
At some point, your program must represent a partially completed tour. That is, it must represent a path which does not yet pass through all the squares and has not yet reached its target in the bottom left, but which might do both if the path were later extended. How do you mean to represent a partially completed tour?
If you can answer the question, and if you grasp the concept of recursion, then one suspects that you can solve the problem with some work but without too much real trouble. To represent the partially completed tour is your obstacle, so my recommendation is that you go to work on that.
Update: See the comment of #KarolyHorvath below. If you have not yet learned the use of dynamically allocated memory (or, equivalently, of STL containers like std::vector and std::list), then you should rather follow his hint, which is a good hint in any case.

Simulating a car moving along a track

For Operating Systems class I'm going to write a scheduling simulator entitled "Jurrasic Park".
The ultimate goal is for me to have a series of cars following a set path and passengers waiting in line at a set location for those cars to return to so they can be picked up and be taken on the tour. This will be a simple 2d, top-down view of the track and the cars moving along it.
While I can code this easily without having to visually display anything I'm not quite sure what the best way would be to implement a car moving along a fixed track.
To start out, I'm going to simply use OpenGL to draw my cars as rectangles but I'm still a little confused about how to approach updating the car's position and ensuring it is moving along the set path for the simulated theme park.
Should I store vertices of the track in a list and have each call to update() move the cars a step closer to the next vertex?
If you want curved track, you can use splines, which are mathematically defined curves specified by two vector endpoints. You plop down the endpoints, and then solve for a nice curve between them. A search should reveal source code or math that you can derive into source code. The nice thing about this is that you can solve for the heading of your vehicle exactly, as well as get the next location on your path by doing a percentage calculation. The difficult thing is that you have to do a curve length calculation if you don't want the same number of steps between each set of endpoints.
An alternate approach is to use a hidden bitmap with the path drawn on it as a single pixel wide curve. You can find the next location in the path by matching the pixels surrounding your current location to a direction-of-travel vector, and then updating the vector with a delta function at each step. We used this approach for a path traveling prototype where a "vehicle" was being "driven" along various paths using a joystick, and it works okay until you have some intersections that confuse your vector calculations. But if it's a unidirectional closed loop, this would work just fine, and it's dead simple to implement. You can smooth out the heading angle of your vehicle by averaging the last few deltas. Also, each pixel becomes one "step", so your velocity control is easy.
In the former case, you can have specially tagged endpoints for start/stop locations or points of interest. In the latter, just use a different color pixel on the path for special nodes. In either case, what you display will probably not be the underlying path data, but some prettied up representation of your "park".
Just pick whatever is easiest, and write a tick() function that steps to the next path location and updates your vehicle heading whenever the car is in motion. If you're really clever, you can do some radius based collision handling so that cars will automatically stop when a car in front of them on the track has halted.
I would keep it simple:
Run a timer (every 100msec), and on each timer draw each ones of the cars in the new location. The location is read from a file, which contains the 2D coordinates of the car (each car?).
If you design the road to be very long (lets say, 30 seconds) writing 30*10 points would be... hard. So how about storing at the file the location at every full second? Then between those 2 intervals you will have 9 blind spots, just move the car in constant speed (x += dx/9, y+= dy/9).
I would like to hear a better approach :)
Well you could use some path as you describe, ether a fixed point path or spline. Then move as a fixed 'velocity' on this path. This may look stiff, if the car moves at the same spend on the straight as cornering.
So you could then have speeds for each path section, but you would need many speed set points, or blend the speeds, otherwise you'll get jerky speed changes.
Or you could go for full car simulation, and use an A* to build the optimal path. That's over kill but very cool.
If there is only going forward and backward, and you know that you want to go forward, you could just look at the cells around you, find the ones that are the color of the road and move so you stay in the center of the road.
If you assume that you won't have abrupt curves then you can assume that the road is directly in front of you and just scan to the left and right to see if the road curves a bit, to stay in the center, to cut down on processing.
There are other approaches that could work, but this one is simple, IMO, and allows you to have gentle curves in your road.
Another approach is just to have it be tile-based, so you just look at the tile before you, and have different tiles for changes in road direction an so you know how to turn the car to stay on the tile.
This wouldn't be as smooth but is also easy to do.