C++ pathfinding with a-star, optimization - c++

Im wondering if I can optimize my pathfinding code a bit, lets look at this map:
+ - wall, . - free, S - start, F - finish
.S.............
...............
..........+++..
..........+F+..
..........+++..
...............
The human will look at it and say its impossible, becouse finish is surrounded... But A-star MUST check all fields to ascertain, that there isnt possible road. Well, its not a problem with small maps. But when I have 256x265 map, it takes a lot of time to check all points. I think that i can stop searching while there are closed nodes arround the finish, i mean:
+ - wall, . - free, S - start, F - finish, X - closed node
.S.............
.........XXXXX.
.........X+++X.
.........X+F+X.
.........X+++X.
.........XXXXX.
And I want to finish in this situation (There is no entrance to "room" with finish). I thought to check h, and while none of open nodes is getting closer, then to finish... But im not sure if its ok, maybe there is any better way?
Thanx for any replies.

First of all this problem is better solved with breadth-first search, but I will assume you have a good reason to use a-star instead. However I still recommend you first check the connectivity between S and F with some kind of search(Breadth-first or depth-first search). This will solve our issue.

Assuming the map doesn't change, you can preprocess it by dividing it to connected components. It can be done with a fast disjoint set data structure. Then before launching A* you check in constant time that the source and destination belong to the same component. If not—no path exists, otherwie you run A* to find the path.
The downside is that you will need additional n-bits per cell where n = ceil(log C) for C being the number of connected components. If you have enough memory and can afford it then it's OK.
Edit: in case you fix n being small (e.g. one byte) and have more than that number of components (e.g. more than 256 for 8-bit n) then you can assign the same number to multiple components. To achieve best results make sure each component-id has nearly the same number of cells assigned to it.

Related

Memory Error when using combination and intersection in python

I am coding for large graph sampling and meet some memory problems.
possible_edges = set(itertools.combinations(list(sampled_nodes), 2))
sampled_graph = list(possible_edges.intersection(ori_edges))
The code is supposed to find all combinations of nodes in sampled_nodes, which provided all possible edges formed by these nodes. Then take the intersection with original_edges to find which edge exactly exists.
The problem is when the graph is enormous, the itertools.combinations function would cause memory error.
I've thought to write for loop to iteratively calculate the intersection but takes too much time.
Any help from you guys would be appreciated. Thank you!
Found one solution, instead of take intersection of 2 lists, I choose not to create all combinations for possible_edges.
I pick edges in ori_edges to see if both of the node exists in sampled_nodes.
sampled_graph = list(set([(e[0], e[1]) for e in ori_edges if int(e[0]) in result_nodes and int(e[1]) in result_nodes]))
This code won't create the huge list and the speed is acceptable.

How to normalize sequence of numbers?

I am working user behavior project. Based on user interaction I have got some data. There is nice sequence which smoothly increases and decreases over the time. But there are little discrepancies, which are very bad. Please refer to graph below:
You can also find data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I would want to smooth out this sequence. The technique should be able to eliminate numbers with characteristic of X and Y, i.e. error in mono-increasing or mono-decreasing.
If not eliminate, technique should be able to shift them so that series is not affected by errors.
What I have tried and failed:
I tried to test difference between values. In some special cases it works, but for sequence as presented in this the distance between numbers is not such that I can cut out errors
I tried applying a counter, which is some X, then only change is accepted otherwise point is mapped to previous point only. Here I have great trouble deciding on value of X, because this is based on user-interaction, I am not really controller of it. If user interaction is such that its plot would be a zigzag pattern, I am ending up with 'no user movement data detected at all' situation.
Please share the techniques that you are aware of.
PS: Data made available in this example is a particular case. There is no typical pattern in which numbers are going to occure, but we expect some range to be continuous with all the examples. Solution I am seeking is generic.
I do not know how much effort you want to involve in this problem but if you want theoretical guaranties,
topological persistence seems well adapted to your problem imho.
Basically with that method, you can filtrate local maximum/minimum by fixing a scale
and there are theoritical proofs that says that if you sampling is
close from your function, then you extracts correct number of maximums with persistence.
You can see these slides (mainly pages 7-9 to get the idea) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a watershed starting from maximum height and decreasing, you have some picks.
Every pick has a time where it is born which is the time where it becomes emerged and a time where it dies which is when it merges with an higher pick. Now a persistence diagram pictures a point for every pick where its x/y coordinates are its time of birth/death (by assumption the first pick does not die and is not shown).
If a pick is a global maximal, then it will be further from the diagonal in the persistence diagram than a local maximum pick. To remove local maximums you have to remove picks close to the diagonal. There are fours local maximums in your example as you can see with the persistence diagram of your data (thanks for providing the data btw) and two global ones (the first pick is not pictured in a persistence diagram):
If you noise your data like that :
You will still get a very decent persistence diagram that will allow you to filter local maximum as you want :
Please ask if you want more details or references.
Since you can not decide on a cut off frequency, and not even on the filter you want to use, I would implement several, and let the user set the parameters.
The first thing that I thought of is running average, and you can see that there are so many things to set, to get different outputs.

Partition in Equivalence sets

I have a simple, non-dirictional tree T. I should find a path named A and another named B that A and B have no common vertex. The perpuse is to maxmize the Len(A)*Len(B).
I figured this problem is similer to Partition Problem, except in Partition Problem you have a set but here you have a Equivalence set. The solution is to find two uncrossed path that Len(A) ~ Len(B) ~ [n-1/2]. Is this correnct? how should I impliment such algorithm?
First of all. I Think you are looking at this problem the wrong way. As I understand it you have a graph related problem. What you do is
Build a maximum spanning tree and find the length L.
Now, you say that the two paths can't have any vertex in common, so we have to remove an edge to archieve this. I assume that every edge wheight in your graph is 1. So sum of the two paths A and B are L-1 after you removed an edge. The problem is now that you have to remove an edge such that the product of len(a) and len(b) is maximized. You do that by removeing the edge in et most 'middel' of L. Why, the problem is of the same as optimizing the area of a rectangle with a fixed perimeter. A short youtube video on the subject can be found here.
Note if your edge wheights are not equal to 1, then you have a harder problem, because there may exist more than one maximum spanning tree. But you may be able to split them in different ways, if this is the case, write me back, then I will think about a solution, but i do not have one at hand.
Best of luck.
I think there is a dynamic programming solution that is just about tractable if path length is just the number of links in the paths (so links don't have weights).
Work from the leaves up. At each node you need to keep track of the best pair of solutions confined to the subtree with root at that node, and, for each k, the best solution with a path of length k terminating in that node and a second path of maximum length somewhere below that node and not touching the path.
Given this info for all descendants of a node, you can produce similar info for that node, and so work your way up to the route.
You can see that this amount of information is required if you consider a tree that is in fact just a line of nodes. The best solution for a line of nodes is to split it in two, so if you have only worked out the best solution when the line is of length 2n + 1 you won't have the building blocks you need for a line of length 2n + 3.

Best way to model music (notes) for fast searching notes at a particular time

I'm working on an iOS music app (written in C++) and my model looks more or less like this:
--Song
----Track
----Track
------Pattern
------Pattern
--------Note
--------Note
--------Note
So basically a Song has multiple Tracks, a Track can have multiple Patterns and a Pattern has multiple Notes. Each one of those things is represented by a class and except for the Song object, they're all stored inside vectors.
Each Note has a "frame" parameter so that I can calculate when a note should be played. For example, if I have 44100 samples / second and the frame for a particular note is 132300 I know that I need that Note at the start of the third second.
My question is how I should represent those notes for best performance? Right now I'm thinking of storing the notes in a vector datamember of each pattern and than loop all the Tracks of the Song, than look the Patterns and than loop the Notes to see which one has a frame datamember that is greater than 132300 and smaller than 176400 (start of 4th second).
As you can tell, that's a lot of loops and a song could be as long as 10 minutes. So I'm wondering if this will be fast enough to calculate all the frames and send them to the buffer on time.
One thing you should remember is that to improve performance, normally memory consumption would have to increase. It is also relevant (and justified) in this case, because I believe you want to store the same data twice, in different ways.
First of all, you should have this basic structure for a song:
map<Track, vector<Pattern>> tracks;
It maps each Track to a vector of Patterns. Map is fine, because you don't care about the order of tracks.
Traversing through Tracks and Patterns should be fast, as their amounts will not be high (I assume). The main performance concern is to loop through thousands of notes. Here's how I suggest to solve it:
First of all, for each Pattern object you should have a vector<Note> as your main data storage. You will write all the changes on the Pattern's contents to this vector<Note> first.
vector<Note> notes;
And for performance considerations, you can have a second way of storing notes:
map<int, vector<Notes>> measures;
This one will map each measure (by its number) in a Pattern to the vector of Notes contained in this measure. Every time data changes in the main notes storage, you will apply the same changes to data in measures. You could also do it only once every time before the playback, or even while playback, in a separate thread.
Of course, you could only store notes in measures, without having to sync two sources of data. But it may be not so convenient to work with when you have to apply mass operations on bunches of notes.
During the playback, before the next measure starts, the following algorithm would happen (roughly):
In every track, find all patterns, for which pattern->startTime <= [current playback second] <= pattern->endTime.
For each pattern, calculate current measure number and get vector<Notes> for the corresponding measure from the measures map.
Now, until the next measure (second?) starts, you only have to loop through current measure's notes.
Just keep those vectors sorted.
During playback, you can just keep a pointer (index) into each vector for the last note player. To search for new notes, you check have to check the following note in each vector, no looping through notes required.
Keep your vectors sorted, and try things out - that is more important and any answer you can receive here.
For all of your questions you should seek to answer then with tests and prototypes, then you will know if you even have a problem. And also while trying it out you will see things that you wouldn't normally see with just the theory alone.
and my model looks more or less like this:
Several critically important concepts are missing from your model:
Tempo.
Dynamics.
Pedal
Instrument
Time signature.
(Optional) Tonality.
Effect (Reverberation/chorus, pitch wheel).
Stereo positioning.
Lyrics.
Chord maps.
Composer information/Title.
Each Note has a "frame" parameter so that I can calculate when a note should be played.
Several critically important concepts are missing from your model:
Articulation.
Aftertouch.
Note duration.
I'd advise to take a look at lilypond. It is typesetting software, but it is also one of the most precise way to represent music in human-readable text format.
My question is how I should represent those notes for best performance?
Put them all into std::map<Timestamp, Note> and find segment you want to playing using lower_bound/upper_bound. Alternatively you could binary search them in flat std::vector as long as data is sorted.
Unless you want to make a "beeper", making music application is much more difficult than you think. I'd strongly recommend to try another project.

The New Villa Acm solution strategy

I am trying to solve this ACM problem The New Villa
and i am not figuring out how to approach this problem definitely its graph problem but doors and the room that have switches to other rooms are very confusing to make a generic solution. Can some body help me in defining the strategy for this problem.
Also i want some discussion forum for ACM problems if you know any one then please share.
Thanks
A.S
It seems like a pathfinding problem on states.
You can represent each vertex with a binary vector of size n + an indentifier - where which room you are in at the moment [n is the number of rooms].
G=(V,E) where V = {all binary vectors of size n and a recored for which room you are in} and E = {(u,v) | you can switch from binary vector u to v by clicking a button in the room you are in, or move to adjacent lights on room }
Now you only need to run a search algorithm on the possible paths.
Possible search algorithms:
BFS - simplest to program, though slowest run time
bi - directional BFS - since there is only one target node,
a bi-directional search will work here, it is expected to be much
faster then BFS
A* - find an admissible heurstic function and run
informed A* on the problem. It is harder to program it then the rest - but if you find a good heurisitc, it will most likely perform much better.
(*) All of the above are both complete [will find a solution if one exists] and optimal [will find the shortest solution, if one exists]
(*) This solution runs in exponential time on the number of rooms, but it should end up for d <= 10 as indicated in the problem in reasonable time.