I have a Hamming cube of general dimension; in practice the dimension usually ranges from 3 to 6.
The search algorithm is:
Input: any vertex, `v`.
Find all vertices that lie in Hamming distance 1 from `v`.
Find all vertices that lie in Hamming distance 2 from `v`.
...
I do not know in advance how far away from v I will need to go; I might stop at distance 1, for example.
For instance, given this cube:
and v = 100, I would need to get the vertices at Hamming distance 1, which are 000, 101, 110 (in any order). Then, I might need to get those at distance 2, namely 111, 001, 010. In the unlikely event of needing the vertices at distance 3 too, I would get 011 as well.
A vertex of the cube may contain IDs (integers).
Which would be an appropriate data structure to store this cube and search it efficiently? I am not really interested in other operations.
I thought about sorting all the bit sequences somehow, so that I can easily access them, but didn't get anything to work.
My approach so far:
Data structure: use a hash table (specifically std::unordered_map), where the keys are the vertices and the values are the IDs.
Algorithm: given a vertex v, generate all bit sequences within Hamming distance t (i.e. t = 1, 2, ...).
However, this requires me to call such a function every time a vertex v arrives (which happens often). I have a function to achieve this, based on this.
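Something along these lines works for that generation step (just a sketch for small n; `atDistance` is an illustrative name, not the linked function):

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// All vertices at exactly Hamming distance t from v in an n-bit cube.
// For n <= 6, brute-forcing over all 2^n masks is perfectly cheap.
std::vector<std::uint32_t> atDistance(std::uint32_t v, unsigned n, unsigned t) {
    std::vector<std::uint32_t> result;
    for (std::uint32_t mask = 0; mask < (1u << n); ++mask)
        if (std::bitset<32>(mask).count() == t)   // keep masks that flip exactly t bits
            result.push_back(v ^ mask);
    return result;
}
```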
I'm rusty with C++, so I'll keep this abstract.
The neighbors of a given point of your Hamming cube are easily computable. Given a vertex's bit sequence, flip each bit individually.
You could precompute that, though. You could cache the results of your neighbors() function, or you could save them to an array. Each vertex would have its own neighbors, so you have one array for each vertex. That gives you, essentially, your adjacency list.
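Concretely, a minimal sketch of that precomputation, assuming the vertices are stored as n-bit unsigned integers (names are illustrative):

```cpp
#include <cstdint>
#include <vector>

// adjacency[v] lists the n vertices at Hamming distance 1 from v,
// obtained by flipping each of the n bits of v individually.
std::vector<std::vector<std::uint32_t>> buildAdjacency(unsigned n) {
    std::vector<std::vector<std::uint32_t>> adjacency(1u << n);
    for (std::uint32_t v = 0; v < (1u << n); ++v)
        for (unsigned bit = 0; bit < n; ++bit)
            adjacency[v].push_back(v ^ (1u << bit));
    return adjacency;
}
```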
With that adjacency list, you can search your Hamming cube using depth-limited search, a variant of DFS (or BFS, I guess—but space complexity is worse) that only goes k units deep.
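For instance, a capped breadth-first traversal (the BFS variant mentioned above) is a short way to get exactly the distance-1, distance-2, ... layers; this is only a sketch built on the adjacency table from the previous snippet:

```cpp
#include <cstdint>
#include <vector>

// Vertices within Hamming distance k of start, grouped by distance layer
// (layers[0] = {start}, layers[1] = distance-1 vertices, and so on).
std::vector<std::vector<std::uint32_t>>
layersUpTo(const std::vector<std::vector<std::uint32_t>>& adjacency,
           std::uint32_t start, unsigned k) {
    std::vector<bool> visited(adjacency.size(), false);
    std::vector<std::vector<std::uint32_t>> layers{{start}};
    visited[start] = true;
    for (unsigned d = 0; d < k; ++d) {
        std::vector<std::uint32_t> next;
        for (std::uint32_t v : layers[d])
            for (std::uint32_t w : adjacency[v])
                if (!visited[w]) { visited[w] = true; next.push_back(w); }
        if (next.empty()) break;
        layers.push_back(std::move(next));
    }
    return layers;
}
```

If you do not know k in advance, the same loop can be driven one layer at a time, expanding the next layer only when it is actually needed.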
Your data structure is a good choice, but consider that your vertices are binary strings, so they cover all points from 0 to 2^n - 1. You might as well just use an array—lookup will still be O(1), and it'll be more compact because there aren't unused buckets sitting around.
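A minimal sketch of that flat layout, assuming the IDs are plain ints and that a vertex may hold several of them (type and names are assumptions):

```cpp
#include <cstdint>
#include <vector>

// One slot per vertex: ids[v] holds the IDs stored at vertex v.
// With n <= 6 there are at most 64 slots, so the table is tiny.
struct HammingCubeStore {
    explicit HammingCubeStore(unsigned n) : ids(1u << n) {}
    std::vector<int>&       at(std::uint32_t v)       { return ids[v]; }
    const std::vector<int>& at(std::uint32_t v) const { return ids[v]; }
private:
    std::vector<std::vector<int>> ids;
};

// Usage: HammingCubeStore store(3); store.at(0b100).push_back(42);
```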
Related
I have a list of nodes as 2D coordinates (arrays of floats), and the goal is to find how many nodes are linked to a given source node.
Two nodes are defined as linked if the distance between them is less than or equal to 10. Also, if the distance between A and B is <= 10, the distance between B and C is <= 10, and the distance between A and C is > 10, then A and C are still linked, as the path would be A->B->C. So it is a typical graph search problem in theory.
Here is the problem. I have around 100,000 nodes in a list. Each node is a 2D coordinate point. Since the list is enormous, constructing the adjacency list for conventional traversal and path-finding algorithms like DFS or BFS would take O(n^2) time, which is not ideal and not what I am looking for.
I researched on the internet and found that a quad tree or k-d tree would probably be the best fit in this case. I have made my own Quad Tree class as well; I just don't understand how to implement a search algorithm like DFS on it. Or is there something else that I am missing?
A quadtree groups points by splitting 2D space into quarters, either until each point has a quadrant to itself, or until you reach a minimum size, after which you lump all points within the quadrant into a list.
Since you're trying to find all points within a maximum distance of each point in your source list, you don't need to go all the way down to one point per cell. To pick a cutoff, I would run performance tests on a few different values, but as a starting point, the maximum connection distance between points is probably a good guess for the minimum quadrant size.
So now you have all of your points grouped into a tree and you need to know how to actually find nearby ones.
Since the quadtree encodes spatial information, to find points within a certain distance of any given point, you would descend the quadtree and use that spatial information to exclude entire quadrants from your search. To do this, you would check whether the nearest bound of each quadrant is beyond the maximum distance from the point you are searching from. If the closest edge of that quadrant is beyond the maximum distance, then none of the points in that quadrant can possibly be within the maximum distance, so there is no need to explore that part of the tree. (This is similar to how a binary search doesn't need to search parts of a sorted array or tree, because it knows that those parts cannot possibly contain the value being searched for).
Once you get down to the level of the quadtree where you have a single point or list of points, you would do a regular euclidean distance check with those points to see if they were actually within the maximum distance. (Don't forget to check for equality, otherwise you'll find the same point you're searching around).
So, for example, if you were searching for points near one of the points in the bottom-right corner of this image, there would be no need to search the other three top-level quadrants because all three of them would be beyond the maximum distance. This would save you from exploring all of the sub-quadrants in those parts of the tree and avoid doing distance comparisons against all of those points.
If, however, you are searching for a point near the edge of a quadrant, you do need to check neighboring quadrants, because the nearest bound will be close enough that you cannot exclude the possibility of a valid point being in that quadrant.
In your particular case, you would make use of this by building the quadtree once, and then looping over the original list of points and doing the search I described above to find all points near that point. You would then use the found-points to build a connectivity graph, which could be efficiently traversed by Depth/Breadth-First-Search or could be given edge-weights to be used with a more complex, weighted search like Dijkstra's Algorithm or A*.
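A hedged sketch of that pruning test and recursive range query (the node layout and every name here are illustrative assumptions, not the asker's QuadTree class):

```cpp
#include <algorithm>
#include <vector>

struct Point { float x, y; };

struct Quad { float minX, minY, maxX, maxY; };   // axis-aligned quadrant bounds

struct QuadNode {
    Quad box;
    std::vector<Point> points;        // filled only in leaf nodes
    std::vector<QuadNode> children;   // empty for leaves, 4 entries otherwise
};

// Squared distance from p to the nearest point of the box (0 if p is inside).
float minDist2(const Quad& q, const Point& p) {
    float dx = std::max({q.minX - p.x, 0.0f, p.x - q.maxX});
    float dy = std::max({q.minY - p.y, 0.0f, p.y - q.maxY});
    return dx * dx + dy * dy;
}

// Collect all points within maxDist of p, skipping quadrants whose nearest
// bound is already farther than maxDist.
void query(const QuadNode& node, const Point& p, float maxDist,
           std::vector<Point>& out) {
    if (minDist2(node.box, p) > maxDist * maxDist) return;   // prune whole quadrant
    for (const Point& q : node.points) {
        float dx = q.x - p.x, dy = q.y - p.y;
        bool samePoint = (dx == 0.0f && dy == 0.0f);          // don't report p itself
        if (!samePoint && dx * dx + dy * dy <= maxDist * maxDist)
            out.push_back(q);
    }
    for (const QuadNode& child : node.children)
        query(child, p, maxDist, out);
}
```

Calling query once per source point and recording the found neighbours as edges gives you the connectivity graph mentioned above.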
I'm looking to interpolate some contour lines to generate a 3D view. The contours are not stored in a picture; the coordinates of each point of a contour are simply stored in a std::vector.
For convex contours:
it seems (I didn't check it myself) that the height can be easily calculated (by linear interpolation) using the distance between the two closest points of the two closest contours.
My contours are not necessarily convex:
so it's trickier... actually I don't have any idea what kind of algorithm I can use.
UPDATE: 26 Nov. 2013
I finished writing a discrete Laplace example:
You can get the code here.
What you have is basically the classical Dirichlet problem:
Given the values of a function on the boundary of a region of space, assign values to the function in the interior of the region so that it satisfies a specific equation (such as Laplace's equation, which essentially requires the function to have no arbitrary "bumps") everywhere in the interior.
There are many ways to calculate approximate solutions to the Dirichlet problem. A simple approach, which should be well suited to your problem, is to start by discretizing the system; that is, you take a finite grid of height values, assign fixed values to those points that lie on a contour line, and then solve a discretized version of Laplace's equation for the remaining points.
Now, what Laplace's equation actually specifies, in plain terms, is that every point should have a value equal to the average of its neighbors. In the mathematical formulation of the equation, we require this to hold true in the limit as the radius of the neighborhood tends towards zero, but since we're actually working on a finite lattice, we just need to pick a suitable fixed neighborhood. A few reasonable choices of neighborhoods include:
the four orthogonally adjacent points surrounding the center point (a.k.a. the von Neumann neighborhood),
the eight orthogonally and diagonally adjacent grid points (a.k.a. the Moore neighborhood), or
the eight orthogonally and diagonally adjacent grid points, weighted so that the orthogonally adjacent points are counted twice (essentially the sum or average of the above two choices).
(Out of the choices above, the last one generally produces the nicest results, since it most closely approximates a Gaussian kernel, but the first two are often almost as good, and may be faster to calculate.)
Once you've picked a neighborhood and defined the fixed boundary points, it's time to compute the solution. For this, you basically have two choices:
Define a system of linear equations, one for each (unconstrained) grid point, stating that the value at each point is the average of its neighbors, and solve it. This is generally the most efficient approach if you have access to a good sparse linear system solver, but writing one from scratch may be challenging.
Use an iterative method, where you first assign an arbitrary initial guess to each unconstrained grid point (e.g. using linear interpolation, as you suggest) and then loop over the grid, replacing the value at each point with the average of its neighbors. Then keep repeating this until the values stop changing (much).
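A minimal sketch of that iterative variant on a regular grid, using the 4-point von Neumann neighborhood (the grid layout, the in-place update, and all names are assumptions; the outermost ring of the grid is left untouched for brevity):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// height[y][x]: grid of height values; fixedMask[y][x] is true where the value
// comes from a contour line and must not change. Free values are repeatedly
// replaced by the average of their 4 orthogonal neighbours until convergence.
void relaxLaplace(std::vector<std::vector<double>>& height,
                  const std::vector<std::vector<bool>>& fixedMask,
                  int maxIterations = 1000, double tolerance = 1e-4) {
    const std::size_t H = height.size(), W = H ? height[0].size() : 0;
    for (int it = 0; it < maxIterations; ++it) {
        double maxChange = 0.0;
        for (std::size_t y = 1; y + 1 < H; ++y)
            for (std::size_t x = 1; x + 1 < W; ++x) {
                if (fixedMask[y][x]) continue;                // contour point: keep fixed
                double avg = 0.25 * (height[y - 1][x] + height[y + 1][x] +
                                     height[y][x - 1] + height[y][x + 1]);
                maxChange = std::max(maxChange, std::abs(avg - height[y][x]));
                height[y][x] = avg;                           // update in place
            }
        if (maxChange < tolerance) break;                     // values stopped changing (much)
    }
}
```

Swapping in the 8-point or weighted neighborhood only changes the averaging line.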
You can generate the Constrained Delaunay Triangulation of the vertices and line segments describing the contours, then use the height defined at each vertex as a Z coordinate.
The resulting triangulation can then be rendered like any other triangle soup.
Despite the name, you can use TetGen to generate the triangulations, though it takes a bit of work to set up.
Right now I am stuck on the following "semi"-mathematical problem.
I would like to partition an n-dimensional restricted space (a hypercube, to be precise)
D = {(x_1, ..., x_n) : x_i in R and -limits <= x_i <= limits for all i <= n} into smaller cubes.
Meaning I would like to specify n, limits, and m, where m is the number of partitions per side of the cube; 2*limits/m would then be the side length of the small cubes, and I would get m^n such cubes.
Now I would like to return a vector of vectors containing some distinct coordinates of these small cubes (or perhaps one could represent the cubes as objects characterized by a vector pointing to the "left" outer corner?).
Basically I have no idea whether something like that is even doable using C++. Implementing this for a fixed n does not pose a problem, but I would like to give the user free choice of the dimension.
Background: something like that would be priceless in optimization, where one would partition the space into smaller subspaces and run e.g. a genetic algorithm on each of them, later comparing the results. Huge initial populations could thus be avoided and the search results drastically improved.
Also, I am just curious whether something like that is doable :)
My suggestion: use B+ trees?
Let m be the number of partitions per dimension, i.e. per edge, of the hypercube D.
Then there are m^n different subspaces S of D, like you say. Let each subspace S be uniquely represented by integer coordinates S = [y_1, y_2, ..., y_n], where the y_i are integers in the range 1, ..., m. In Cartesian coordinates, S then consists of the points (x_1, x_2, ..., x_n) with Delta*(y_i - 1) - limits <= x_i < Delta*y_i - limits, where Delta = 2*limits/m.
The "left outer corner" or origin of S you were looking for is just the point corresponding to the smallest x_i, i.e. the point (Delta*(y_1-1)-limits, ..., Delta*(y_n-1)-limits). Instead of representing the different S by this point, it makes a lot more sense (and will be faster in a computer) to represent them using the integer coordinates above.
The problem:
We have a set of n vertices in 3D Euclidean space, and there is an even number of these vertices.
We want to pair them up based on their proximity. In other words, we'd like to be able to find a set of vertex pairs, where the vertices in each pair are as close as possible together.
In doing this, we want to sacrifice the proximity between the vertices of the other pairs as little as possible.
I am not looking for the most optimal solution (if it even strictly exists/can be done), just a reasonable one that can be computed relatively quickly.
A relatively awful brute-force approach involves choosing a vertex, looping through the rest to find its nearest neighbour, and then repeating until none are left. Of course, as we near the end of the list the closest remaining vertex could be very far away, but it is the only choice, so this can fail badly on the third point above.
A common approach for this kind of problem (especially if n is large) is to precompute a spatial index structure, such as a k-d tree or an octree, and perform the search for nearest neighbours with its help. Through the nodes of the octree, the available points are put into bins, so you can be sure they are mutually close. You also minimize the number of comparisons.
A sketch of the implementation with an octree: you need a Node class that stores its bounding box. A derived LeafNode class stores a small number of points, up to a maximum (e.g. k = 20), which are added with an insert function. A derived NonLeafNode class stores references to 8 subnodes (which may be either LeafNodes or NonLeafNodes).
The tree is represented by a root node; all insertions and queries start here. The tree is built up by inserting the first k points into a LeafNode. When the (k+1)-th point is inserted, the bounding box is split into 8 sub-boxes and the contained points are sorted into them. The current LeafNode is replaced by a NonLeafNode with 8 subnodes.
This is iterated until all points are in the tree.
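A compact sketch of that structure, folding LeafNode and NonLeafNode into a single Node for brevity (class and member names are illustrative, not from any library):

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct Point3 { double x, y, z; };

struct Box {                                    // axis-aligned bounding box
    Point3 min, max;
    bool contains(const Point3& p) const {      // half-open on the max side
        return p.x >= min.x && p.x < max.x &&
               p.y >= min.y && p.y < max.y &&
               p.z >= min.z && p.z < max.z;
    }
    Box octant(int i) const {                   // i in [0, 8): one of the 8 sub-boxes
        Point3 c{(min.x + max.x) / 2, (min.y + max.y) / 2, (min.z + max.z) / 2};
        Box b;
        b.min.x = (i & 1) ? c.x : min.x;  b.max.x = (i & 1) ? max.x : c.x;
        b.min.y = (i & 2) ? c.y : min.y;  b.max.y = (i & 2) ? max.y : c.y;
        b.min.z = (i & 4) ? c.z : min.z;  b.max.z = (i & 4) ? max.z : c.z;
        return b;
    }
};

constexpr std::size_t kLeafCapacity = 20;       // the "k" from the description

struct Node {
    Box box;
    std::vector<Point3> points;                     // used while this is a leaf
    std::array<std::unique_ptr<Node>, 8> children;  // populated once split
    bool isLeaf() const { return !children[0]; }

    void insert(const Point3& p) {
        if (isLeaf()) {
            points.push_back(p);
            if (points.size() > kLeafCapacity) split();
            return;
        }
        for (auto& child : children)
            if (child->box.contains(p)) { child->insert(p); return; }
    }

    void split() {                              // turn this leaf into an inner node
        for (int i = 0; i < 8; ++i) {
            children[i] = std::make_unique<Node>();
            children[i]->box = box.octant(i);
        }
        for (const Point3& p : points)
            for (auto& child : children)
                if (child->box.contains(p)) { child->insert(p); break; }
        points.clear();
    }
};
```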
For nearest-neighbour searches, the tree is traversed starting from the root node by comparing with the bounding boxes. If the query point is within a node's bounding box, the traversal descends into that node. Note that even after you have found a nearest candidate, you also need to check neighbouring nodes in the octree (they may contain closer points).
For a k-d tree implementation, check the Wikipedia page; it looks quite straightforward.
Since you are not looking for an optimal solution, here's a heuristic you may consider.
For each point p, compute two things: its nearest neighbour and its farthest neighbour. Now let q be the point whose farthest neighbour is farthest away (q is an extreme point of the input). Match q with its nearest neighbour, delete both of them, and recursively compute the matching for the remaining points.
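A brute-force sketch of that heuristic, using O(n^2) distance scans per round (fine for modest inputs; all names are illustrative):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };

static double dist2(const Vec3& a, const Vec3& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Repeatedly take the point whose farthest neighbour is farthest away (the most
// "extreme" point), pair it with its nearest neighbour, and remove both.
std::vector<std::pair<Vec3, Vec3>> pairUp(std::vector<Vec3> pts) {
    std::vector<std::pair<Vec3, Vec3>> pairs;
    while (pts.size() >= 2) {
        std::size_t extreme = 0;
        double largestFar = -1.0;
        for (std::size_t i = 0; i < pts.size(); ++i) {
            double far = 0.0;
            for (std::size_t j = 0; j < pts.size(); ++j)
                far = std::max(far, dist2(pts[i], pts[j]));
            if (far > largestFar) { largestFar = far; extreme = i; }
        }
        std::size_t nearest = (extreme == 0) ? 1 : 0;          // any index != extreme
        for (std::size_t j = 0; j < pts.size(); ++j)
            if (j != extreme &&
                dist2(pts[extreme], pts[j]) < dist2(pts[extreme], pts[nearest]))
                nearest = j;
        pairs.emplace_back(pts[extreme], pts[nearest]);
        if (extreme < nearest) std::swap(extreme, nearest);    // erase larger index first
        pts.erase(pts.begin() + extreme);
        pts.erase(pts.begin() + nearest);
    }
    return pairs;
}
```

For large inputs, the spatial index from the other answer can replace the inner scans.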
This is certainly NOT optimal, but it does seem to do reasonably well on small input sets. If you need an optimal solution you should read about the euclidean matching problem.
I have a graph with four nodes; each node represents a position, and they are laid out like a two-dimensional grid. Every node has a connection (an edge) to each positionally adjacent node. Every edge also has a weight.
Here are the nodes represented by A,B,C,D and the weight of the edges is indicated by the numbers:
A --100-- B
|         |
120       220
|         |
C --150-- D
I want to structure a container and an algorithm that switches the nodes sharing the edge with the highest weight, then resets the weight of that edge. No node (position) can be switched more than once each time the algorithm is executed.
For example, processing the above, the highest weight is on edge BD, so we switch those nodes. Since no node can be switched more than once, all edges involving either B or D are reset.
A         D
|
120
|
C         B
Then, the next highest weight is on the only edge left (A-C); switching those nodes gives us the final layout: C, D, A, B.
I'm currently running a quite awful implementation of this. I store a long list of edges, each holding four values for the nodes it is (potentially) connected to, a value for its weight, and the position of the node itself. Every time anything is requested, I loop through the entire list.
I'm writing this in C++; could some parts of the STL help speed this up? Also, how do I avoid the duplication of data? A node position is currently stored in five objects: the node itself that is there and the four nodes indicating a connection to it.
In short, I want help with:
Can this be structured in a way so that there is no data duplication?
Recognise the problem? If any of this has a name, tell me so I can google for more info on the subject.
Fast algorithms are always nice.
As for names, this is a vertex cover problem. Optimal vertex cover is NP-hard with decent approximation solutions, but your problem is simpler. You're looking at a pseudo-maximum under a tighter edge selection criterion. Specifically, once an edge is selected every connected edge is removed (representing the removal of vertices to be swapped).
For example, here's a standard greedy approach:
0) sort the edges; retain adjacency information
while edges remain:
    1) select the highest-weight edge
    2) remove all adjacent edges from the list
endwhile
The list of edges selected gives you the vertices to swap.
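A minimal sketch of that greedy loop (the Edge type and names are illustrative):

```cpp
#include <algorithm>
#include <vector>

struct Edge { int a, b; int weight; };   // endpoints are node indices

// Returns the selected edges: highest weight first, skipping any edge that
// shares an endpoint with an edge already selected.
std::vector<Edge> selectSwaps(std::vector<Edge> edges, int nodeCount) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& x, const Edge& y) { return x.weight > y.weight; });
    std::vector<bool> used(nodeCount, false);
    std::vector<Edge> selected;
    for (const Edge& e : edges) {
        if (used[e.a] || used[e.b]) continue;   // an adjacent edge was already taken
        used[e.a] = used[e.b] = true;
        selected.push_back(e);                  // swap the contents of nodes a and b
    }
    return selected;
}
```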
Time complexity is O(sorting the edges + a linear pass over the edges), which in general will boil down to the sort, likely O(E*log(E)).
The method of retaining adjacency information depends on the graph properties; see your friendly local algorithms text. Feel free to start with an adjacency matrix for simplicity.
As with the adjacency information, most other speed improvements will apply best to graphs of a certain shape but come with a tradeoff of time versus space complexity.
For example, your problem statement seems to imply that the vertices are laid out in a square pattern, from which we could derive many interesting properties. For example, that system is very easily parallelized. Also, the adjacency information would be highly regular but sparse at large graph sizes (most vertices wouldn't be connected to each other). This makes the adjacency matrix give a high overhead; you could instead store adjacency in an array of 4-tuples as it would retain fast access but almost entirely eliminate overhead.
If you have bigger graphs, look into the Boost Graph Library. It gives you good data structures for graphs and basic iterators for different types of graph traversal.