Determine work group count for a particle system (OpenGL compute shader)

What is the correct way to determine the global work group size and local work group size for a compute shader?
And how can the second dimension (num_groups_y) be used?
Can I pass an array of vectors and index it using gl_GlobalInvocationID.y?
My application has 100k particles, and I am passing another array of vectors for the position calculation.
The particle positions are stored in a GL_SHADER_STORAGE_BUFFER, and the array of vectors is a uniform.
Is it possible to index the array of vectors with gl_GlobalInvocationID.y by passing the array size to glDispatchCompute?
And what should the optimal num_groups_x value be?

Optimal way to append to numpy array when dealing with large dimensions

I am working with a JSON file that consists of approximately 17,000 3x1 arrays denoting coordinates.
Currently, I have an image of 1024x1024 dimensions (which I have flattened), and I am using np.hstack to add a 3x1 array to that image; this gives me a 1-D array of dimension 1048579x1.
My objective is to create a final array of dimension 1048579x17,000.
Unfortunately, list.append and np.append are not working in this case because they consume too much memory. I tried running this on Colab Pro, but the memory consumption is too high and it causes the session to crash.
My current code is as follows:
import cv2
import json
import numpy as np

image = cv2.imread('image_name.jpg', 0)
flat_img = image.ravel()
print(flat_img.shape)

# Here data consists of 17,000 entries, each of which is a 3x1 list
with open('data.json') as f:
    json1_data = json.load(f)

local_coordinates = []
for i in range(len(json1_data)):
    local_coord = np.array(json1_data[i]['local_coordinate'])
    new_arr = np.hstack((flat_img, local_coord))
    local_coordinates.append(new_arr.tolist())
Is there an optimal way to stack all 17,000 of these 1,048,579-element 1-D arrays to create the final matrix, which can be used for training purposes?
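One common fix is to preallocate the output once with np.empty and fill it column by column instead of growing a Python list. A minimal sketch with tiny placeholder sizes (the real dimensions would be 1024*1024 + 3 rows by 17,000 columns, at which scale a small dtype such as float32, or a disk-backed np.memmap, would be needed):

```python
import numpy as np

# Tiny placeholder sizes; the real problem would use flat_len = 1024*1024 + 3
# and n_entries = 17000. At that scale, use float32/float16 or a disk-backed
# np.memmap('out.dat', dtype='float32', mode='w+', shape=(flat_len, n_entries)).
flat_len, n_entries = 7, 5

flat_img = np.arange(4, dtype=np.float32)           # stands in for image.ravel()
coords = np.ones((n_entries, 3), dtype=np.float32)  # the 3x1 lists from the JSON

out = np.empty((flat_len, n_entries), dtype=np.float32)  # allocate once
for i in range(n_entries):
    out[:, i] = np.concatenate((flat_img, coords[i]))    # fill a column, no appends

print(out.shape)  # (7, 5)
```

Since every column shares the same flattened image, it may also be worth storing the image once and only keeping the 17,000x3 coordinate block, but that depends on what the training code expects.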

How can I create heightmap for QHeightMapSurfaceDataProxy from 2D array to show 2D Fourier transform results

I have the data: 2D discrete Fourier transform results. I want to obtain a heightmap, but I don't know how to form one. I need to plot this data as a surface in Q3DSurface through a heightmap (not just a 2D array).
QHeightMapSurfaceDataProxy's constructor takes an image or an image file as an argument. All you need to do is create this image and load it.
Images can easily be generated from a 2D array since the indices used to point at a specific value stored in it can be interpreted as X,Y, while the value at the specific pair of indices as the Z coordinate.
Example:
If you have the following assignment
myarr[2][10] = 200;
you can read it as X=2, Y=10 and Z=200, which would mean that pixel at location [2;10] has value 200.
The size of the image is taken from the dimensions of your array: if you have 10x15 elements, your image will be 10x15 pixels. Check how to populate a QImage for more accurate code than my pseudo-code above.

Sum elements in a channel in caffe

If I have a 4-D blob, say of size (40,1024,300,1), and I want to average-pool across the second axis and generate an output of size (40,1,300,1), how would I do it? I think the reduction layer collapses the whole blob and generates a blob of size (40) by summing elements along all the other axes (after 1) as well. Is there any workaround for this without implementing a new layer?
The only easy workaround I found is as follows. Permute your blob to a shape (40,300,1,1024). Use reduction layer to compute the mean with axis = -1 and operation = MEAN. I think the blob will be of shape (40,300,1). You may need to use reshape to append an extra dimension at the end (check if this is needed) and then permute back to shape (40,1,300,1).
You can find an implementation of a Permute layer here or here. I hope this helps.
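Assuming a Permute layer (e.g. from the SSD fork of Caffe) is available, the permute-then-reduce idea might look roughly like this in prototxt (layer and blob names are made up, and the axis/shape details should be checked against your Caffe build):

```
layer {
  name: "perm"    type: "Permute"
  bottom: "data"  top: "perm"
  permute_param { order: 0 order: 2 order: 3 order: 1 }  # (40,1024,300,1) -> (40,300,1,1024)
}
layer {
  name: "chan_mean"  type: "Reduction"
  bottom: "perm"     top: "chan_mean"
  reduction_param { operation: MEAN axis: -1 }  # average over the 1024 channel values
}
```

A Reshape layer (and a second Permute) would then restore the (40,1,300,1) layout, as described above.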

openmp - using locks for 2d array summation

I'm a new OpenMP user, and I am trying to parallelize part of my program using locks for a two-dimensional array. I won't go over all the details of my real problem, but will instead discuss the following simplified example:
Let's say I have a huge group of N>>1 particles, whose x,y locations are stored in some data structure. I want to create a 2D counter array that represents a grid, and count all the particles in each cell of the grid:
count[j][k] += 1 (for the corresponding j,k of each particle, according to its x,y values)
Using locks is a natural choice (the 2D array is on the scale of 100x100 and above, so on a 20-core machine the chance of two threads updating the same array element simultaneously is still pretty low). The relevant parts of the code:
// define a 2D array of locks:
omp_lock_t **count_lock;
count_lock = new omp_lock_t* [J+1];
for (j = 0; j <= J; j++) {
    count_lock[j] = new omp_lock_t[K+1];
    memset(count_lock[j], 0, (K+1)*sizeof(omp_lock_t));
}

// initializing:
for (j = 0; j <= J; j++)
    for (k = 0; k <= K; k++) {
        omp_init_lock(&(count_lock[j][k]));
        omp_unset_lock(&(count_lock[j][k]));
    }

// using the lock (after determining the right j,k):
omp_set_lock(&(count_lock[j][k]));
count[j][k] += 1;
omp_unset_lock(&(count_lock[j][k]));

// destroying the locks:
for (i = 0; i <= J; i++)
    for (j = 0; j <= K; j++)
        omp_destroy_lock(&(count_lock[i][j]));
for (j = 0; j <= J; j++)
    delete[] count_lock[j];
delete[] count_lock;
The particles are divided into groups of 500. Each group is a linked list of particles, and the groups themselves also form a linked list. The parallelization comes from parallelizing the for loop that iterates over the particle groups.
For some reason I can't seem to get it right... no performance improvement is obtained, and the simulation gets stuck after a few iterations.
I tried using "atomic", but it gave even worse performance than the serial code. Another option I tried is to create a private 2D array for each thread and then sum them up. I got some improvement that way, but it is pretty costly, and I hope there is a better way.
Thanks!

Draw multiple meshes to different locations (DirectX 12)

I have a problem with DirectX 12. I have made a small 3D renderer. Models are translated into 3D space in the vertex shader with basic World/View/Projection matrices stored in a constant buffer.
To change the constant buffer's data I'm currently using memcpy(pMappedConstantBuffer + alignedSize * frame, newConstantBufferData, alignedSize); this command replaces the constant buffer's data immediately.
So here is the problem: drawing is recorded into a command list that will later be sent to the GPU for execution.
Example:
/* Now I want to change the constant buffer so that the next draw call's position is (0, 1, 0) */
memcpy(/*Parameters*/);
/* Now I want to record a draw call to the command list */
DrawInstanced(/*Parameters*/);
/* But now I want to draw another mesh at another position, so I have to change the constant buffer. After this memcpy() the draw position will be (0, -1, 0) */
memcpy(/*Parameters*/);
/* Now I want to record a new draw call to the list */
DrawInstanced(/*Parameters*/);
After this I send the command list to the GPU for execution, but guess what: all the meshes end up in the same position, because all the memcpys are executed before the command list is even sent to the GPU. So the last memcpy overwrites the previous ones.
So the question is: how do I draw meshes at different positions, i.e. how do I change the constant buffer's data between draw calls so that the change takes effect per draw on the GPU?
Thanks
No need for help anymore, I solved it myself: I created a constant buffer for each mesh.
About execution order, you are totally right: your memcpy calls will update the buffers immediately, but the commands will not be processed until you push your command list into the queue (and you will not know exactly when this happens).
In Direct3D 11, when you use Map on a buffer, this is handled for you (some extra space is allocated behind the scenes to avoid the problem if required).
In Direct3D 12 you have several choices. I'll assume you want to draw N objects and store one matrix per object in your cbuffer.
The first is to create one buffer per object and set its data independently. If you have only a few objects, this is easy to maintain (and the extra memory footprint due to resource allocations will be fine).
Another option is to create one large buffer (which can contain N matrices) and create N constant buffer views that point to the memory location of each object's matrix. (Note that you also have to respect 256-byte alignment in that case; see CreateConstantBufferView.)
You can also use a StructuredBuffer and copy all the data into it (in that case you do not need the alignment), and use an index in the vertex shader to look up the correct matrix (you can set a uint value in your shader and use SetGraphicsRoot32BitConstant to apply it directly).