I am doing a Marching cube algorithm in a Compute shader. The vertices generated by the compute stage will be input to the vertex stage.
Compute -> Vertices -> Render
There is no way of knowing how many vertices that the compute stage will output, so I need a storage buffer looking something like this:
layout(set = 1, binding = 0) buffer Count{
int value;
} count;
layout(set = 2, binding = 0) buffer Mesh {
vec4 vertices[1<<15];
} mesh;
The vertices do not need a roundtrip to the CPU, but the count is a variable used by the vkCmdDraw command. So I need to put the count buffer in host visible memory, map that memory and do a memcpy after the compute stage. Is this a good way of solving this problem or is there some other way where I don't have to read back data to the CPU?
Well, this is exactly what vkCmdDrawIndirect is for. The vertex count is stored in a Vkuffer, which makes the CPU round-trip unnecessary.
Related
I have many Esri Grid files (https://en.wikipedia.org/wiki/Esri_grid#ASCII) and I would like to render them in 3D without losing precision, I am using OpenSceneGraph.
The problem is this grids are around 1000x1000 (or more) points, so when I extract the vertices, then compute the triangles to create the geometry, I end up having millions of them and the interaction with the scene is impossible (frame rate drops to 0).
I've tried several approches:
Triangle list
Basically, as I read the file, I fill an array with 3 vertices per triangle (this leads to duplication);
osg::ref_ptr<osg::Geode> l_pGeodeSurface = new osg::Geode;
osg::ref_ptr<osg::Geometry> l_pGeometrySurface = new osg::Geometry;
osg::ref_ptr<osg::Vec3Array> l_pvTrianglePoints = osg::Vec3Array;
osg::ref_ptr<osg::Vec3Array> l_pvOriginalPoints = osg::Vec3Array;
... // Read the file and fill l_pvOriginalPoints
for(*triangle inside the file*)
{
... // Compute correct triangle indices (l_iP1, l_iP2, l_iP3)
// Push triangle vertices inside the array
l_pvTrianglePoints->push_back(l_pvOriginalPoints->at(l_iP1));
l_pvTrianglePoints->push_back(l_pvOriginalPoints->at(l_iP2));
l_pvTrianglePoints->push_back(l_pvOriginalPoints->at(l_iP3));
}
l_pGeometrySurface->setVertexArray(l_pvTrianglePoints);
l_pGeometrySurface->addPrimitiveSet(new osg::DrawArrays(GL_TRIANGLES, 0, 3, l_pvTrianglePoints->size()));
Indexed triangle list
Same as before, but the array contains the every vertices just once and I create a second array of indices (basically i tell osg how to build triangles, no duplication)
osg::ref_ptr<osg::Geode> l_pGeodeSurface = new osg::Geode;
osg::ref_ptr<osg::Geometry> l_pGeometrySurface = new osg::Geometry;
osg::ref_ptr<osg::DrawElementsUInt> l_pIndices = new osg::DrawElementsUInt(osg::PrimitiveSet::TRIANGLES, *number of indices*);
osg::ref_ptr<osg::Vec3Array> l_pvOriginalPoints = osg::Vec3Array;
... // Read the file and fill l_pvOriginalPoints
for(i = 0; i < *number of indices*; i++)
{
... // Compute correct triangle indices (l_iP1, l_iP2, l_iP3)
// Push vertices indices inside the array
l_pIndices->at(i) = l_iP1;
l_pIndices->at(i+1) = l_iP2;
l_pIndices->at(i+2) = l_iP3;
}
l_pGeometrySurface->setVertexArray(l_pvOriginalPoints );
l_pGeometrySurface->addPrimitiveSet(l_pIndices.get());
Instancing
this was a bit of an experiment, since I've never used shaders, I tought I could instance a single triangle, then manipulate its coordinates in a vertex shader for every triangle in my scene, using transformation matrices (passing the matrices as a uniform array, one for triangle). I ended up with too many uniforms just with a grid 20x20.
I used these links as a reference:
https://learnopengl.com/Advanced-OpenGL/Instancing,
https://books.google.it/books?id=x_RkEBIJeFQC&pg=PT265&lpg=PT265&dq=osg+instanced+geometry&source=bl&ots=M8ii8zn8w7&sig=ACfU3U0_92Z5EGCyOgbfGweny4KIUfqU8w&hl=en&sa=X&ved=2ahUKEwj-7JD0nq7qAhUXxMQBHcLaAiUQ6AEwAnoECAkQAQ#v=onepage&q=osg%20instanced%20geometry&f=false
None of the above solved my issue, what else can I try? Am I missing something in terms of rendering techinques? I thought it was fairly simple task, but I'm kind of stuck.
I feel like you should consider taking a step back. If you're visualizing GIS-based terrain data, osgEarth is really designed for doing this and has fairly efficient LOD tools for large terrains. Do you need the data always represented at maximum full LOD or are you looking for dynamic LOD to improve frame rate?
Depending on your goals and requirements you might want to look at some more advanced terrain rendering techniques, like rightfield tracing, etc. If the terrain is always static, you can precompute quadtrees and Signed Distance Functions and trace against the heightfield.
I am working on particle system. For calculation of each particle position, time alive and so on I use compute shader. I have problem to get count of dead particles back to the cpu, so I can set how many particles renderer should render. To store data of particles i use shader storage buffer. To render particles i use instancing. I tried to use atomic buffer counter, it works fine, but it is slow to copy data from buffer to the cpu. I wonder if there is some other option.
This is important part of compute shader
if (pData.timeAlive >= u_LifeTime)
{
pData.velocity = pData.defaultVelocity;
pData.timeAlive = 0;
pData.isAlive = u_Loop;
atomicCounterIncrement(deadParticles)
pVertex.position.x = pData.defaultPosition.x;
pVertex.position.y = pData.defaultPosition.y;
}
InVertex[id] = pVertex;
InData[id] = pData;
To copy data to the cpu i use following code
uint32_t* OpenGLAtomicCounter::GetCounters()
{
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, m_AC);
glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(uint32_t) * m_NumberOfCounters, m_Counters);
return m_Counters;
}
I've noticed that my shaders are performing a calculation that I need in the CPU code. Is it possible for me to load that results of that calculation into a uniform array, and then access that uniform from the CPU once the GPU has finished working?
You can write arbitrary amounts of data through either Image Load/Store or SSBOs. While the number of image variables is restricted in image load/store, those variables can refer to buffer textures or array textures. Either of which give you access to a more-or-less arbitrarily large amount of data to write to:
layout(rgba32f, writeOnly) imageBuffer buffer;
imageStore(buffer, valueOffset1, value1);
imageStore(buffer, valueOffset2, value2);
imageStore(buffer, valueOffset3, value3);
imageStore(buffer, valueOffset4, value4);
SSBOs make this even easier:
layout(std430) buffer Data
{
float giantArray[];
};
giantArray[valueOffset1] = data1;
giantArray[valueOffset2] = data2;
giantArray[valueOffset3] = data3;
giantArray[valueOffset4] = data4;
However, note that any such writes will be unordered with regard to writes from other shader invocations. So overwriting such data will be... problematic. And you'll need an appropriate glMemoryBarrier call before you try to read from it.
But if all you're doing is a compute operation, you ought to be using dedicated compute shaders.
As far as i know, there is no way of retreiving uniform data from your GPU. But you could execute the calculation and set the output color to something you can identify on your screen depending on the expected result of your calculation. For exmaple:
#version 330 core
layout(location = 0) out vec4 color;
void main() {
if( Something you're trying to debug )
color = vec4(1, 1, 1, 1);
else
color = vec4(0, 0, 0, 1);
}
That's the only way I know of, and I use it all the time.
So I wanted to store all my meshes in one large VBO. The problem is, how do you do have just one draw call, but let every mesh have its own model to world matrix?
My idea was to submit an array of matrices to a uniform before drawing. In the VBO I would make the color of every first vertex of a mesh negative (So I'd be using the signing bit to check whether a vertex was the first of a mesh).
Okay, so I can detect when a new mesh has started and I have an array of matrices ready and probably a uniform called 'index'. But how do I increase this index by one every time I encounter a new mesh?
Can you modify a uniform from within the shader? If so, how?
Can you modify a uniform from within the shader?
If you could, it wouldn't be uniform anymore, would it?
Furthermore, what you're wanting to do cannot be done even with Image Load/Store or SSBOs, both of which allow shaders to write data. It won't work because vertex shader invocations are not required to be executed sequentially. Many happen at the same time, and there's no way for any shader invocation to know that it will happen "after" the "first vertex" in a mesh.
The simplest way to deal with this is the obvious solution. Render each mesh individually, but set the uniforms for each mesh before each draw call. Without changing buffers between draws, of course. Uniform changes, while not exactly cheap, aren't the most expensive state changes that exist.
There are more complicated drawing methods that could allow you more performance. But that form is adequate for most needs. You've already done the hard part: you removed the need for any state change (textures, buffers, vertex formats, etc) except uniform state.
There are two approaches to minimize draw calls - instancing and batching. The first (instancing) allows you to draw multiple copies of same meshes in one draw call, but it depends on the API (is available from OpenGL 3.1). Batching is similar to instancing but allows you to draw different meshes. Both of these approaches have restrictions - meshes should be with the same materials and shaders.
If you would to draw different meshes in one VBO then instancing is not an option. So, batching requires keeping all meshes in 'big' VBO with applied world transform. It not a problem with static meshes, but have some discomfort with animated. I give you some pseudocode with batching implementation
struct SGeometry
{
uint64_t offsetVB;
uint64_t offsetIB;
uint64_t sizeVB;
uint64_t sizeIB;
glm::mat4 oldTransform;
glm::mat4 transform;
}
std::vector<SGeometry> cachedGeometries;
...
void CommitInstances()
{
uint64_t vertexOffset = 0;
uint64_t indexOffset = 0;
for (auto instance in allInstances)
{
Copy(instance->Vertexes(), VBO);
for (uint64_t i = 0; i < instances->Indices().size(); ++i)
{
auto index = instances->Indices()[i];
index += indexOffset;
IBO[i] = index;
}
cachedGeometries.push_back({vertexOffset, indexOffset});
vertexOffset += instance->Vertexes().size();
indexOffset += instance->Indices().size();
}
Commit(VBO);
Commit(IBO);
}
void ApplyTransform(glm::mat4 modelMatrix, uint64_t instanceId)
{
const SGeometry& geom = cachedGeometries[i];
glm::mat4 inverseOldTransform = glm::inverse(geom.oldTransform);
VertexStream& stream = VBO->GetStream(Position, geom.offsetVB);
for (uint64_t i = 0; i < geom.sizeVB; ++i)
{
glm::vec3 pos = stream->Get(i);
// We need to revert absolute transformation before applying new
pos = glm::vec3(inverseOldNormalTransform * glm::vec4(pos, 1.0f));
pos = glm::vec3(normalTransform * glm::vec4(pos, 1.0f));
stream->Set(i);
}
// .. Apply normal transformation
}
GPU Gems 2 has a good article about geometry instancing http://www.amazon.com/GPU-Gems-Programming-High-Performance-General-Purpose/dp/0321335597
I am trying to do a Sum of absolute difference within my shader and write back the single result back to a uniform float in a in unity.
In the shader I have 2 geometries with the same number of vertices that map one to one.
// substract vertices
float norm = 10;
float error=infereCrater.vertex.y-v.vertex.y;
error = error*error*norm;
o.debugColor = float3(error,1-error ,0.0f);
//////
o.posWorld =mul(_Object2World,v.vertex);
o.normalWorld = normalize(mul(float4(v.normal,0.0),_World2Object).xyz);
o.tangentWorld = normalize(mul(float4(v.tangent,0.0),_World2Object).xyz);
o.binormalWorld = cross(o.normalWorld,o.tangentWorld);
o.tex = v.texcoord;
o.pos = mul(UNITY_MATRIX_MVP,v.vertex);
TRANSFER_VERTEX_TO_FRAGMENT(o);
return o;
}
I am available to calculate the error for each individual vertex and change the color of the surface based on the difference.
I hit a road block where I don't know how to sync all the threads and start adding up the values.
Is there a way to call another vertex shader after the first one is done?
How can the vertex shader read the values of adjacent vertex to it? (don't think its possible because in local memory of thread)
Or its possible to have a global array, to store the difference values, copy this to the CPU (which I don't want because of latency) and add them in the CPU?
I don't want to use compute shader because I am not in Windows