My question here is what data structure should I use to distribute the work to each threads and get the calculated value from them. First thing in my mind is fill vector[0] .. vector[63999] (for 800x800 pixel) with struct that holds x,y and iterate_value. Pass those vector to each node -> then further divide the given vector to each core(Os-thread) -> then further divide the given vector to each thread. Is there any other possible way to send and received the values? and also if I do it in vector way should I pass the vector by pass by value or pass by reference, which one would be better in this case ?
Different points of the mandelbrot set take varying amounts of time to compute (points near the edge are more expensive), so giving each worker an even number of pixels will have some of them finishing faster than others.
Break the image into small rectangles (tiles). Create a work list using a multithreaded queue, and fill it with the tiles. Each worker thread loops, picking a tile off the work list and submitting the results, until the work list is empty.
Pixels are evenly spaced, so why send the coordinates for each one? Just tell each node the x and y coordinates of its lower left pixel, the spacing between pixels, and the number of pixels. This way, your work unit specification is a small constant size.
As far as the larger design goes, there is no point in having more worker threads than physical cores to run on. The context switches of multiple threads per core only reduces performance.
Related
I have a grayscale texture (8000*8000) , the value of each pixel is an ID (actually, this ID is the ID of triangle to which the fragment belongs, I want to using this method to calculate how many triangles and which triangles are visible in my scene).
now I need to count how many unique IDs there are and what are them. I want to implement this with GLSL and minimize the data transfer between GPU RAM and RAM.
The initial idea I come up with is to use a shader storage buffer, bind it to an array in GLSL, its size is totalTriangleNum, then iterate through the ID texture in shader, increase the array element by 1 that have index equal to ID in texture.
After that, read the buffer to OpenGL application and get what I want. Is this a efficient way to do so? Or are there some better solutions like compute-shader (well I'm not familiar with it) or something else.
I want to using this method to calculate how many triangles and which triangles are visible in my scene)
Given your description of your data let me rephrase that a bit:
You want to determine how many distinct values there are in your dataset, and how often each value appears.
This is commonly known as a Histogram. Unfortunately (for you) generating histograms are among the problems not that trivially solved on GPUs. Essentially you have to divide down your image into smaller and smaller subimages (BSP, quadtree, etc.) until divided down to single pixels on which you perform the evaluation. Then you backtrack propagating up the sub-histograms, essentially performing an insertion or merge sort on the histogram.
Generating histograms with GPUs is still actively researched, so I suggest you read up on the published academic works (usually accompanied with source code). Keywords: Histogram, GPU
This one is a nice paper done by the AMD GPU researchers: https://developer.amd.com/wordpress/media/2012/10/GPUHistogramGeneration_preprint.pdf
I have a wireless mesh network of nodes, each of which is capable of reporting its 'distance' to its neighbors, measured in (simplified) signal strength to them. The nodes are geographically in 3d space but because of radio interference, the distance between nodes need not be trigonometrically (trigonomically?) consistent. I.e., given nodes A, B and C, the distance between A and B might be 10, between A and C also 10, yet between B and C 100.
What I want to do is visualize the logical network layout in terms of connectness of nodes, i.e. include the logical distance between nodes in the visual.
So far my research has shown the multidimensional scaling (MDS) is designed for exactly this sort of thing. Given that my data can be directly expressed as a 2d distance matrix, it's even a simpler form of the more general MDS.
Now, there seem to be many MDS algorithms, see e.g. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html and http://tapkee.lisitsyn.me/ . I need to do this in C++ and I'm hoping I can use a ready-made component, i.e. not have to re-implement an algo from a paper. So, I thought this: https://sites.google.com/site/simpmatrix/ would be the ticket. And it works, but:
The layout is not stable, i.e. every time the algorithm is re-run, the position of the nodes changes (see differences between image 1 and 2 below - this is from having been run twice, without any further changes). This is due to the initialization matrix (which contains the initial location of each node, which the algorithm then iteratively corrects) that is passed to this algorithm - I pass an empty one and then the implementation derives a random one. In general, the layout does approach the layout I expected from the given input data. Furthermore, between different runs, the direction of nodes (clockwise or counterclockwise) can change. See image 3 below.
The 'solution' I thought was obvious, was to pass a stable default initialization matrix. But when I put all nodes initially in the same place, they're not moved at all; when I put them on one axis (node 0 at 0,0 ; node 1 at 1,0 ; node 2 at 2,0 etc.), they are moved along that axis only. (see image 4 below). The relative distances between them are OK, though.
So it seems like this algorithm only changes distance between nodes, but doesn't change their location.
Thanks for reading this far - my questions are (I'd be happy to get just one or a few of them answered as each of them might give me a clue as to what direction to continue in):
Where can I find more information on the properties of each of the many MDS algorithms?
Is there an algorithm that derives the complete location of each node in a network, without having to pass an initial position for each node?
Is there a solid way to estimate the location of each point so that the algorithm can then correctly scale the distance between them? I have no geographic location of each of these nodes, that is the whole point of this exercise.
Are there any algorithms to keep the 'angle' at which the network is derived constant between runs?
If all else fails, my next option is going to be to use the algorithm I mentioned above, increase the number of iterations to keep the variability between runs at around a few pixels (I'd have to experiment with how many iterations that would take), then 'rotate' each node around node 0 to, for example, align nodes 0 and 1 on a horizontal line from left to right; that way, I would 'correct' the location of the points after their relative distances have been determined by the MDS algorithm. I would have to correct for the order of connected nodes (clockwise or counterclockwise) around each node as well. This might become hairy quite quickly.
Obviously I'd prefer a stable algorithmic solution - increasing iterations to smooth out the randomness is not very reliable.
Thanks.
EDIT: I was referred to cs.stackexchange.com and some comments have been made there; for algorithmic suggestions, please see https://cs.stackexchange.com/questions/18439/stable-multi-dimensional-scaling-algorithm .
Image 1 - with random initialization matrix:
Image 2 - after running with same input data, rotated when compared to 1:
Image 3 - same as previous 2, but nodes 1-3 are in another direction:
Image 4 - with the initial layout of the nodes on one line, their position on the y axis isn't changed:
Most scaling algorithms effectively set "springs" between nodes, where the resting length of the spring is the desired length of the edge. They then attempt to minimize the energy of the system of springs. When you initialize all the nodes on top of each other though, the amount of energy released when any one node is moved is the same in every direction. So the gradient of energy with respect to each node's position is zero, so the algorithm leaves the node where it is. Similarly if you start them all in a straight line, the gradient is always along that line, so the nodes are only ever moved along it.
(That's a flawed explanation in many respects, but it works for an intuition)
Try initializing the nodes to lie on the unit circle, on a grid or in any other fashion such that they aren't all co-linear. Assuming the library algorithm's update scheme is deterministic, that should give you reproducible visualizations and avoid degeneracy conditions.
If the library is non-deterministic, either find another library which is deterministic, or open up the source code and replace the randomness generator with a PRNG initialized with a fixed seed. I'd recommend the former option though, as other, more advanced libraries should allow you to set edges you want to "ignore" too.
I have read the codes of the "SimpleMatrix" MDS library and found that it use a random permutation matrix to decide the order of points. After fix the permutation order (just use srand(12345) instead of srand(time(0))), the result of the same data is unchanged.
Obviously there's no exact solution in general to this problem; with just 4 nodes ABCD and distances AB=BC=AC=AD=BD=1 CD=10 you cannot clearly draw a suitable 2D diagram (and not even a 3D one).
What those algorithms do is just placing springs between the nodes and then simulate a repulsion/attraction (depending on if the spring is shorter or longer than prescribed distance) probably also adding spatial friction to avoid resonance and explosion.
To keep a "stable" diagram just build a solution and then only update the distances, re-using the current position from previous solution as starting point. Picking two fixed nodes and aligning them seems a good idea to prevent a slow drift but I'd say that spring forces never end up creating a rotational momentum and thus I'd expect that just scaling and centering the solution should be enough anyway.
I have file with table containing 23 millions records the following form {atomName, x, y, z, transparence}. For solutions I decided to use OpenGL.
My task to render it. In first iteration, I used block "glBegin/glEnd" and have drawed every atom as point some color. This solution worked. But I got 0.002 fps.
Then i tried using VBO. I formed three buffers: vertex, color and indexes. This solution worked. I got 60 fps, but i have not comfortable binding buffers and i am drawing points, not spheres.
Then i read about VAO, which can simplify binding buffers. Ok, it is worked. I got comfortable binding.
Now i want to draw spheres, not points. I thought, to form relative to each point of the set of vertices on which it will be possible to build a sphere (with some accuracy). But if i have 23 million vertices, i must calculate yet ~12 or more vertices relaty every point. 23 000 000 * 4 (float) = 1 Gb data, perhaps it not good solution.
What is the best next move i should do? I can not fully understand, applicable shaders in this task or exist other ways.
About your drawing process
My task to render it. In first iteration, I used block "glBegin/glEnd" and have drawed every atom as point some color. This solution worked. But I got 0.002 fps.
Think about it: For every of your 23 million records you make at least one function call directly (glVertex) and probably several function calls implicitly by that. Even worse, glVertex likely causes a context switch. What this means is, that your CPU hits several speed bumps for every vertex it has to processes. A top notch CPU these days has a clock rate of about 3 GHz and a pipeline length in the order of 10 instructions. When you make a context switch that pipeline gets stalled, in the worst case it then takes one pipeline length to actually process one single instruction. Lets consider that you have to perform at least 1000 instructions for processing a single glVertex call (which is actually a rather optimistic estimation). That alone means, that you're limited to process at most 3 million vertices per second. So at 23 million vertices that's already less than one FPS then.
But you also got context switches in there, which add a further penality. And probably a lot of branching which create further pipeline flushes.
And that's just the glVertex call. You also have colors in there.
And you wonder that immediate mode is slow?
Of course it's slow. Using the Immediate Mode has been discouraged for well over 15 years. Vertex Arrays are available since OpenGL-1.1.
This solution worked. I got 60 fps,
Yes, because all the data resides on the GPU's own memory now. GPUs are massively parallel and optimized to crunch this kind of data and doing the operations they do.
but i have not comfortable binding buffers
Well, OpenGL is not a high level scene graph library. It's a mid to low level drawing API. You use it like a sophisticated pencil to draw on a digital canvas.
Then i read about VAO
Well, VAOs are meant to coalesce buffer objects that belong together so it makes sense using them.
Now i want to draw spheres, not points.
You have two options:
Using point sprite textures. This means that your points will get area when drawn, and that area gets a texture applied. I think this is the best method for you. Given the right shader you can even give your point sprite the right kind of depth values, so that your "spheres" will actually intersect like spheres in the depth buffer.
The other option is using instancing a single sphere geometry, using your atom records as control data for the instancing process. This would then process real sphere geometry. However I fear that implementing an instanced drawing process might be a bit too advanced for your skill level at the moment.
About drawing 23 million points
Seriously what kind of display do you have available, that you can draw 23 million, distinguishable points? Your typical computer screen will have some about 2000×1500 points. The highest resolution displays you can buy these days have about 4k×2.5k pixels, i.e. 10 million individual pixels. Let's assume your atoms are evenly distributed in a plane: At 23 million atoms to draw each pixel will get several times overdrawn. You simply can't display 23 million individual atoms that way. Another way to look at this is, that the display's pixel grid implies a spatial sampling and you can't reproduce anything smaller than twice the average sampling distance (sampling theorem).
So it absolutely makes sense to draw only a subset of the data, namely the subset that's actually in view. Also if you're zoomed very far out (i.e. you have the full dataset in view) it makes sense to coalesce atoms closeby.
It definitely makes sense to sort your data into a spatial subdivision structure. In your case I think an octree would be a good choice.
I'm trying to determine from a large set of positions how to narrow my list down significantly.
Right now I have around 3000 positions (x, y, z) and I want to basically keep the positions that are furthest apart from each other (I don't need to keep 100 positions that are all within a 2 yard radius from each other).
Besides doing a brute force method and literally doing 3000^2 comparisons, does anyone have any ideas how I can narrow this list down further?
I'm a bit confused on how I should approach this from a math perspective.
Well, I can't remember the name for this algorithm, but I'll tell you a fun technique for handling this. I'll assume that there is a semi-random scattering of points in a 3D environment.
Simple Version: Divide and Conquer
Divide your space into a 3D grid of cubes. Each cube will be X yards on each side.
Declare a multi-dimensional array [x,y,z] such that you have an element for each cube in your grid.
Every element of the array should either be a vertex or reference to a vertex (x,y,z) structure, and each should default to NULL
Iterate through each vertex in your dataset, determine which cube the vertex falls in.
How? Well, you might assume that the (5.5, 8.2, 9.1) vertex belongs in MyCubes[5,8,9], assuming X (cube-side-length) is of size 1. Note: I just truncated the decimals/floats to determine which cube.
Check to see if that relevant cube is already taken by a vertex. Check: If MyCubes[5,8,9] == NULL then (inject my vertex) else (do nothing, toss it out! spot taken, buddy)
Let's save some memory
This will give you a nicely simplified dataset in one pass, but at the cost of a potentially large amount of memory.
So, how do you do it without using too much memory?
I'd use a hashtable such that my key is the Grid-Cube coordinate (5,8,9) in my sample above.
If MyHashTable.contains({5,8,9}) then DoNothing else InsertCurrentVertex(...)
Now, you will have a one-pass solution with minimal memory usage (no gigantic array with a potentially large number of empty cubes. What is the cost? Well, the programming time to setup your structure/class so that you can perform the .contains action in a HashTable (or your language-equivalent)
Hey, my results are chunky!
That's right, because we took the first result that fit in any cube. On average, we will have achieved X-separation between vertices, but as you can figure out by now, some vertices will still be close to one another (at the edges of the cubes).
So, how do we handle it? Well, let's go back to the array method at the top (memory-intensive!).
Instead of ONLY checking to see if a vertex is already in the cube-in-question, also perform this other check:
If Not ThisCubeIsTaken()
For each SurroundingCube
If not Is_Your_Vertex_Sufficiently_Far_Away_From_Me()
exit_loop_and_outer_if_statement()
end if
Next
//Ok, we got here, we can add the vertex to the current cube because the cube is not only available, but the neighbors are far enough away from me
End If
I think you can probably see the beauty of this, as it is really easy to get neighboring cubes if you have a 3D array.
If you do some smoothing like this, you can probably enforce a 'don't add if it's with 0.25X' policy or something. You won't have to be too strict to achieve a noticeable smoothing effect.
Still too chunky, I want it smooth
In this variation, we will change the qualifying action for whether a vertex is permitted to take residence in a cube.
If TheCube is empty OR if ThisVertex is closer to the center of TheCube than the Cube's current vertex
InsertVertex (overwrite any existing vertex in the cube
End If
Note, we don't have to perform neighbor detection for this one. We just optimize towards the center of each cube.
If you like, you can merge this variation with the previous variation.
Cheat Mode
For some people in this situation, you can simply take a 10% random selection of your dataset and that will be a good-enough simplification. However, it will be very chunky with some points very close together. On the bright side, it takes a few minutes max. I don't recommend it unless you are prototyping.
I am providing a question regarding a subject that I am now working on.
I have an OpenGL view in which I would like to display points.
So far, this is something I can handle ;)
For every point, I have its coordinates (X ; Y ; Z) and a value (unsigned char).
I have a color array giving the link between one value and a color.
For example, 255 is red, 0 is blue, and so on...
I want to display those points in an OpenGL view.
I want to use a threshold value so that depending on it, I can modify the transparency value of a color depending on the value of one point.
I want also that the performance doesn't go bad even if I have a lot of points (5 billions in the worst case but 1~2 millions in a standard case).
I am now looking for the effective way to handle this.
I am interested in the VBO. I have read that it will allow some good performance and also that I can modify the buffer as I want without recalculating it from scratch (as with display list).
So that I can solve the threshold issue.
However, doing this on a million points dynamically will provide some heavy calculations (at least a pretty bad for loop), no ?
I am opened to any suggestions and I would like to discuss about any of your ideas !
Trying to display a billion points or more is generally (forgive the pun) pointless.
Even an extremely high resolution screen has only a few million pixels. Nothing you can do will get it to display more points than that.
As such, your first step is almost undoubtedly to figure out a way to restrict your display to a number of points that's at least halfway reasonable. OpenGL can (and will) oblige if you ask it to display more, but your monitor won't and neither will mine or much or anybody else's.
Not directly related to the OpenGL part of your question, but if you are looking at rendering massive point clouds you might want to read up on space partitioning hierarchies such as octrees to keep performance in check.
Put everything into one VBO. Draw it as an array of points: glDrawArrays(GL_POINTS,0,num). Calculate alpha in a pixel shader (using threshold passed as uniform).
If you want to change a small subset of points - you can map a sub-range of the VBO. If you need to update large parts frequently - you can use Transform Feedback to utilize GPU.
If you need to simulate something for the updates, you should consider using CUDA or OpenCL to run the update completely on the GPU. This will give you the best performance. Otherwise, you can use a single VBO and update it once per frame from the CPU. If this gets too slow, you could try multiple buffers and distribute the updates across several frames.
For the threshold, you should use a shader uniform variable instead of modifying the vertex buffer. This allows you to set a value per-frame which can be then combined with the data from the vertex buffer (for instance, you set a float minVal; and every vertex with some attribute less than minVal gets discarded in the geometry shader.)