Difference between std140 and std430 layout - OpenGL

I am having trouble with the differences between the std140 and std430 layouts.
This is my struct in the .cpp code:
struct Particle {
glm::vec3 position = glm::vec3(0);
float density = 1;
};
for (int z = 0; z < d; ++z) {
for (int y = 0; y < d; ++y) {
for (int x = 0; x < d; ++x) {
int index = z * d * d + y * d + x;
if (index >= num) break;
// dam break
m_InitParticles[index].position = glm::vec3(x, y, z) * distance;
m_InitParticles[index].position += glm::vec3(getJitter(), getJitter(), getJitter());
m_InitParticles[index].density = index;
}
}
}
And the compute shader code:
struct Particle {
vec3 position;
float density;
};
layout(std140, binding = 0) restrict buffer Particles{
Particle particles[];
};
It seems that I get correct data (std430) in RenderDoc with:
pack#(std430)
struct Particle {
vec3 position;
float t;
}
And when I use pack#(std140), the struct seems to have extra (8N) space in RenderDoc:
pack#(std140)
struct Particle {
vec3 position;
float t;
}
With std140, glGetActiveUniformsiv returns offsets 0 and 12.
Why does a struct of vec3 + float take up extra space in std140?

The official OpenGL wiki has got you covered:
The rules for std140 layout are covered quite well in the OpenGL specification (OpenGL 4.5, Section 7.6.2.2, page 137). Among the most important is the fact that arrays of types are not necessarily tightly packed. An array of floats in such a block will not be the equivalent to an array of floats in C/C++. The array stride (the bytes between array elements) is always rounded up to the size of a vec4 (i.e., 16 bytes). So arrays will only match their C/C++ definitions if the type is a multiple of 16 bytes.
Warning: Implementations sometimes get the std140 layout wrong for vec3 components. You are advised to manually pad your structures/arrays out and avoid using vec3 at all.
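To make the quoted rule concrete, here is a small C++ sketch that computes the array stride of a scalar or vector array under both layouts, plus a CPU-side mirror of the question's Particle struct. The helper and struct names are hypothetical (not part of any GL API), and it assumes 4-byte floats and the base alignments from the specification: 4 for float, 8 for vec2, 16 for vec3 and vec4.

```cpp
#include <cassert>
#include <cstddef>

// Round `value` up to the next multiple of `alignment`.
constexpr std::size_t roundUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Array stride for an array of scalars or vectors (not structs or matrices).
// `baseAlignment` is the element's base alignment in bytes:
// 4 for float, 8 for vec2, 16 for vec3 and vec4.
constexpr std::size_t std140ArrayStride(std::size_t baseAlignment) {
    return roundUp(baseAlignment, 16); // std140 rounds up to the size of a vec4
}
constexpr std::size_t std430ArrayStride(std::size_t baseAlignment) {
    return baseAlignment; // std430 drops the vec4 rounding
}

// CPU-side mirror of `struct Particle { vec3 position; float density; };`.
// vec3 has a 16-byte base alignment, so the float packs into the vec3's
// trailing 4 bytes (offset 12) and the struct is 16 bytes in both layouts.
struct ParticleCPU {
    float position[3];
    float density;
};
static_assert(sizeof(ParticleCPU) == 16, "matches the std140/std430 struct size");
static_assert(offsetof(ParticleCPU, density) == 12, "matches the GL offset 12");

// A float array element in a std140 block must be padded to 16 bytes on the CPU.
struct Std140Float {
    float value;
    float pad[3];
};
static_assert(sizeof(Std140Float) == std140ArrayStride(4), "16-byte stride");
```

For the question's vec3 + float struct the two layouts agree (offsets 0 and 12, 16-byte stride), which is consistent with the glGetActiveUniformsiv result. The std140 padding shows up, for example, with arrays of float or vec2, or with a struct containing only a single float.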

Related

GLSL uint_fast64_t type

How can I get an input to the vertex shader of type uint_fast64_t?
There is no such type available in the language, so how can I pass it differently?
My code is this:
#version 330 core
#define CHUNK_SIZE 16
#define BLOCK_SIZE_X 0.1
#define BLOCK_SIZE_Y 0.1
#define BLOCK_SIZE_Z 0.1
// input vertex and UV coordinates, different for all executions of this shader
layout(location = 0) in uint_fast64_t vertexPosition_modelspace;
layout(location = 1) in vec2 vertexUV;
// Output data ; will be interpolated for each fragment.
out vec2 UV;
// model view projection matrix
uniform mat4 MVP;
int getAxis(uint_fast64_t p, int choice) { // axis: 0=x 1=y 2=z 3=index_x 4=index_z
switch (choice) {
case 0:
return (int)((p>>59 ) & 0xF); //extract the x axis int but i only want 4bits
case 1:
return (int)((p>>23 ) & 0xFF);//extract the y axis int but i only want 8bits
case 2:
return (int)((p>>55 ) & 0xF);//extract the z axis int but i only want 4bits
case 3:
return (int)(p & 0x807FFFFF);//extract the index_x 24bits
case 4:
return (int)((p>>32) & 0x807FFFFF);//extract the index_z 24bits
}
}
void main()
{
// assign vertex position
float x = (getAxis(vertexPosition_modelspace,0) + getAxis(vertexPosition_modelspace,3)*CHUNK_SIZE)*BLOCK_SIZE_X;
float y = getAxis(vertexPosition_modelspace,1)*BLOCK_SIZE_Y;
float z = (getAxis(vertexPosition_modelspace,2) + getAxis(vertexPosition_modelspace,3)*CHUNK_SIZE)*BLOCK_SIZE_Z;
gl_Position = MVP * vec4(x,y,z, 1.0);
// UV of the vertex. No special space for this one.
UV = vertexUV;
}
The error message I am getting is:
I tried uint64_t instead, but I get the same problem.
Unextended GLSL for OpenGL does not have the ability to directly use 64-bit integer values. And even the fairly widely supported ARB extension that allows for the use of 64-bit integers within shaders doesn't actually allow you to use them as vertex shader attributes. That requires an NVIDIA extension supported only by... NVIDIA.
However, you can send 32-bit integers, and a 64-bit integer is just two 32-bit integers. You can put 64-bit integers into the buffer and pass them as two 32-bit unsigned integers in your vertex attribute format:
glVertexAttribIFormat(0, 2, GL_UNSIGNED_INT, <byte_offset>);
Your shader will retrieve them as a uvec2 input:
layout(location = 0) in uvec2 vertexPosition_modelspace;
The x component of the vector will have the first 4 bytes and the y component will store the second 4 bytes. But since "first" and "second" are determined by your CPU's endianness, you'll need to know whether your CPU is little endian or big endian to be able to use them. Since most desktop GL implementations are paired with little endian CPUs, we'll assume that is the case.
In this case, vertexPosition_modelspace.x contains the low 4 bytes of the 64-bit integer, and vertexPosition_modelspace.y contains the high 4 bytes.
So your code could be adjusted as follows (with some cleanup):
const vec3 BLOCK_SIZE = vec3(0.1, 0.1, 0.1);
//Get the three axes all at once.
uvec3 getAxes(in uvec2 p)
{
return uvec3(
(p.y >> 27) & 0xFu,  // bits 59..62 of the 64-bit value
(p.x >> 23) & 0xFFu, // bits 23..30
(p.y >> 23) & 0xFu   // bits 55..58
);
}
//Get the indices
uvec2 getIndices(in uvec2 p)
{
return p & 0x807FFFFFu; //Performs component-wise bitwise & (u suffix: GLSL 3.30 has no implicit int-to-uint conversion)
}
void main()
{
uvec3 iPos = getAxes(vertexPosition_modelspace);
uvec2 indices = getIndices(vertexPosition_modelspace);
vec3 pos = vec3(
iPos.x + (indices.x * uint(CHUNK_SIZE)),
iPos.y,
iPos.z + (indices.x * uint(CHUNK_SIZE)) //You used index 3 in your code, so I used .x here, but I think you meant index 4.
);
pos *= BLOCK_SIZE;
...
}
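On the CPU side, the split can be checked with a small C++ sketch. It assumes a little-endian host, as the answer does, and all names in it are hypothetical. It shows that extracting the fields from two 32-bit words with the shifts used in getAxes gives the same result as shifting the full 64-bit value the way the question's getAxis does, and that copying the bytes of a 64-bit value on a little-endian host puts the low half in .x.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// CPU-side stand-in for the shader's uvec2 attribute.
struct UVec2 {
    std::uint32_t x; // low 32 bits on a little-endian host
    std::uint32_t y; // high 32 bits
};

// Split arithmetically: independent of host byte order.
UVec2 splitU64(std::uint64_t v) {
    return {static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32)};
}

// Reinterpret the bytes as they sit in the buffer: first 4 bytes -> .x, next 4 -> .y.
UVec2 splitViaMemory(std::uint64_t v) {
    UVec2 out;
    static_assert(sizeof(out) == sizeof(v), "no padding expected");
    std::memcpy(&out, &v, sizeof(v));
    return out;
}

bool hostIsLittleEndian() {
    const std::uint32_t one = 1;
    unsigned char firstByte = 0;
    std::memcpy(&firstByte, &one, 1);
    return firstByte == 1;
}

// Fields as extracted in the question (from the whole 64-bit value).
std::uint32_t axisX64(std::uint64_t p) { return static_cast<std::uint32_t>((p >> 59) & 0xF); }
std::uint32_t axisY64(std::uint64_t p) { return static_cast<std::uint32_t>((p >> 23) & 0xFF); }
std::uint32_t axisZ64(std::uint64_t p) { return static_cast<std::uint32_t>((p >> 55) & 0xF); }

// The same fields as extracted in getAxes (from the two 32-bit words).
std::uint32_t axisX32(UVec2 p) { return (p.y >> 27) & 0xFu; }
std::uint32_t axisY32(UVec2 p) { return (p.x >> 23) & 0xFFu; }
std::uint32_t axisZ32(UVec2 p) { return (p.y >> 23) & 0xFu; }
```

A shift of 59 on the 64-bit value becomes a shift of 59 - 32 = 27 on the high word, and 55 becomes 23; the y field (bits 23..30) lies entirely in the low word.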

I'm experiencing very slow OpenGL compute shader compilation (10+ minutes) when using larger work groups, is there anything I can do to speed it up?

So, I'm encountering a really bizarre (at least to me as a compute shader noob) phenomenon when I compile my compute shader and check the result with glGetShaderiv(m_shaderID, GL_COMPILE_STATUS, &status). Inexplicably, my compute shader takes much longer to compile when I increase the size of my work groups! When I have one-dimensional work groups, it compiles in less than a second, but when I increase the size of my work groups to 4x1x6, the compute shader takes 10+ minutes to compile! How strange.
For background, I'm trying to implement a light clustering algorithm (essentially the one shown here: http://www.aortiz.me/2018/12/21/CG.html#tiled-shading--forward), and my compute shader is this monster:
// TODO: Figure out optimal tile size, currently using a 16x9x24 subdivision
#define FLT_MAX 3.402823466e+38
#define FLT_MIN 1.175494351e-38
#define DBL_MAX 1.7976931348623158e+308
#define DBL_MIN 2.2250738585072014e-308
layout(local_size_x = 4, local_size_y = 9, local_size_z = 4) in;
// TODO: Change to reflect my light structure
// struct PointLight{
// vec4 position;
// vec4 color;
// uint enabled;
// float intensity;
// float range;
// };
// TODO: Pack this more efficiently
struct Light {
vec4 position;
vec4 direction;
vec4 ambientColor;
vec4 diffuseColor;
vec4 specularColor;
vec4 attributes;
vec4 intensity;
ivec4 typeIndexAndFlags;
// uint flags;
};
// Array containing offset and number of lights in a cluster
struct LightGrid{
uint offset;
uint count;
};
struct VolumeTileAABB{
vec4 minPoint;
vec4 maxPoint;
};
layout(std430, binding = 0) readonly buffer LightBuffer {
Light data[];
} lightBuffer;
layout (std430, binding = 1) buffer clusterAABB{
VolumeTileAABB cluster[ ];
};
layout (std430, binding = 2) buffer screenToView{
mat4 inverseProjection;
uvec4 tileSizes;
uvec2 screenDimensions;
};
// layout (std430, binding = 3) buffer lightSSBO{
// PointLight pointLight[];
// };
// SSBO of active light indices
layout (std430, binding = 4) buffer lightIndexSSBO{
uint globalLightIndexList[];
};
layout (std430, binding = 5) buffer lightGridSSBO{
LightGrid lightGrid[];
};
layout (std430, binding = 6) buffer globalIndexCountSSBO{
uint globalIndexCount;
};
// Shared variables, shared between all invocations WITHIN A WORK GROUP
// TODO: See if I can use gl_WorkGroupSize for this, gl_WorkGroupSize.x * gl_WorkGroupSize.y * gl_WorkGroupSize.z
// A grouped-shared array which contains all the lights being evaluated
shared Light sharedLights[4*9*4]; // A grouped-shared array which contains all the lights being evaluated, size is thread-count
uniform mat4 viewMatrix;
bool testSphereAABB(uint light, uint tile);
float sqDistPointAABB(vec3 point, uint tile);
bool testConeAABB(uint light, uint tile);
float getLightRange(uint lightIndex);
bool isEnabled(uint lightIndex);
// Runs in batches of multiple Z slices at once
// In this implementation, 6 batches, since each thread group contains four z slices (24/4=6)
// We begin by each thread representing a cluster
// Then in the light traversal loop they change to representing lights
// Then change again near the end to represent clusters
// NOTE: Tiles actually mean clusters, it's just a legacy name from tiled shading
void main(){
// Reset every frame
globalIndexCount = 0; // How many lights are active in this scene
uint threadCount = gl_WorkGroupSize.x * gl_WorkGroupSize.y * gl_WorkGroupSize.z; // Number of threads in a group, same as local_size_x, local_size_y, local_size_z
uint lightCount = lightBuffer.data.length(); // Number of total lights in the scene
uint numBatches = uint((lightCount + threadCount -1) / threadCount); // Number of groups of lights that will be completed, i.e., number of passes
uint tileIndex = gl_LocalInvocationIndex + gl_WorkGroupSize.x * gl_WorkGroupSize.y * gl_WorkGroupSize.z * gl_WorkGroupID.z;
// uint tileIndex = gl_GlobalInvocationID; // doesn't work, is uvec3
// Local thread variables
uint visibleLightCount = 0;
uint visibleLightIndices[100]; // local light index list, to be transferred to global list
// Every light is being checked against every cluster in the view frustum
// TODO: Perform active cluster determination
// Each individual thread will be responsible for loading a light and writing it to shared memory so other threads can read it
for( uint batch = 0; batch < numBatches; ++batch){
uint lightIndex = batch * threadCount + gl_LocalInvocationIndex;
//Prevent overflow by clamping to the last light (index lightCount - 1), which is always null
lightIndex = min(lightIndex, lightCount - 1u);
//Populating shared light array
// NOTE: It is VERY important that lightBuffer.data not be referenced after this point,
// since that is not thread-safe
sharedLights[gl_LocalInvocationIndex] = lightBuffer.data[lightIndex];
barrier(); // Synchronize read/writes between invocations within a work group
//Iterating within the current batch of lights
for( uint light = 0; light < threadCount; ++light){
if( isEnabled(light)){
uint lightType = uint(sharedLights[light].typeIndexAndFlags[0]);
if(lightType == 0){
// Point light
if( testSphereAABB(light, tileIndex) ){
visibleLightIndices[visibleLightCount] = batch * threadCount + light;
visibleLightCount += 1;
}
}
else if(lightType == 1){
// Directional light
visibleLightIndices[visibleLightCount] = batch * threadCount + light;
visibleLightCount += 1;
}
else if(lightType == 2){
// Spot light
if( testConeAABB(light, tileIndex) ){
visibleLightIndices[visibleLightCount] = batch * threadCount + light;
visibleLightCount += 1;
}
}
}
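}
// Wait until every invocation has read sharedLights before the next batch overwrites it
barrier();
}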
// Synchronize all invocations in this work group before continuing (barrier() does not synchronize across work groups)
barrier();
// Back to every thread representing a cluster
// Adding the light indices to the cluster light index list
uint offset = atomicAdd(globalIndexCount, visibleLightCount);
for(uint i = 0; i < visibleLightCount; ++i){
globalLightIndexList[offset + i] = visibleLightIndices[i];
}
// Updating the light grid for each cluster
lightGrid[tileIndex].offset = offset;
lightGrid[tileIndex].count = visibleLightCount;
}
// Return whether or not the specified light intersects with the specified tile (cluster)
bool testSphereAABB(uint light, uint tile){
float radius = getLightRange(light);
vec3 center = vec3(viewMatrix * sharedLights[light].position);
float squaredDistance = sqDistPointAABB(center, tile);
return squaredDistance <= (radius * radius);
}
// TODO: Different test for spot-lights
// Has been done by using several AABBs for spot-light cone, this could be a good approach, or even just use one to start.
bool testConeAABB(uint light, uint tile){
// Light light = lightBuffer.data[lightIndex];
// float innerAngleCos = light.attributes[0];
// float outerAngleCos = light.attributes[1];
// float innerAngle = acos(innerAngleCos);
// float outerAngle = acos(outerAngleCos);
// FIXME: Actually do something clever here
return true;
}
// Get range of light given the specified light index
float getLightRange(uint lightIndex){
int lightType = sharedLights[lightIndex].typeIndexAndFlags[0];
float range;
if(lightType == 0){
// Point light
float brightness = 0.01; // cutoff for end of range
float c = sharedLights[lightIndex].attributes.x;
float lin = sharedLights[lightIndex].attributes.y;
float quad = sharedLights[lightIndex].attributes.z;
range = (-lin + sqrt(lin*lin - 4.0 * c * quad + (4.0/brightness)* quad)) / (2.0 * quad);
}
else if(lightType == 1){
// Directional light
range = FLT_MAX;
}
else{
// Spot light
range = FLT_MAX;
}
return range;
}
// Whether the light at the specified index is enabled
bool isEnabled(uint lightIndex){
uint flags = uint(sharedLights[lightIndex].typeIndexAndFlags[2]);
return (flags & 1u) != 0u; // test the "enabled" bit (flags | 1 would always be nonzero)
}
// Get squared distance from a point to the AABB of the specified tile (cluster)
float sqDistPointAABB(vec3 point, uint tile){
float sqDist = 0.0;
VolumeTileAABB currentCell = cluster[tile];
cluster[tile].maxPoint[3] = tile;
for(int i = 0; i < 3; ++i){
float v = point[i];
if(v < currentCell.minPoint[i]){
sqDist += (currentCell.minPoint[i] - v) * (currentCell.minPoint[i] - v);
}
if(v > currentCell.maxPoint[i]){
sqDist += (v - currentCell.maxPoint[i]) * (v - currentCell.maxPoint[i]);
}
}
return sqDist;
}
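The point-light range in getLightRange comes from solving the attenuation equation 1 / (c + lin*d + quad*d^2) = brightness for the distance d, i.e. taking the positive root of quad*d^2 + lin*d + (c - 1/brightness) = 0. A small C++ check of that formula; the function names are hypothetical and the coefficient values in the test are made-up examples, not values from the question:

```cpp
#include <cassert>
#include <cmath>

// Distance at which 1 / (c + lin*d + quad*d*d) falls to `brightness`:
// positive root of quad*d^2 + lin*d + (c - 1/brightness) = 0,
// written the same way as in getLightRange.
float pointLightRange(float c, float lin, float quad, float brightness) {
    return (-lin + std::sqrt(lin * lin - 4.0f * c * quad + (4.0f / brightness) * quad)) /
           (2.0f * quad);
}

// Attenuation factor at distance d.
float attenuation(float c, float lin, float quad, float d) {
    return 1.0f / (c + lin * d + quad * d * d);
}
```

At the returned distance the attenuation equals the 0.01 cutoff, which is what the "cutoff for end of range" comment describes.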
Edit: Whoops, lost the bottom part of this!
What I don't understand is why changing the size of the work groups affects compilation time at all. It sort of defeats the point of the algorithm if my work group sizes are too small for the compute shader to run efficiently, so I'm hoping there's something that I'm missing.
As a last note, I'd like to avoid using glGetProgramBinary as a solution, not only because it merely circumvents the issue instead of solving it, but also because pre-compiling shaders will not play nicely with the engine's current architecture.
So, I'm figuring that this must be a bug in the compiler, since I've replaced the loop in my sqDistPointAABB function with:
vec3 minPoint = currentCell.minPoint.xyz;
vec3 maxPoint = currentCell.maxPoint.xyz;
vec3 t1 = vec3(lessThan(point, minPoint));
vec3 t2 = vec3(greaterThan(point, maxPoint));
vec3 sqDist = t1 * (minPoint - point) * (minPoint - point) + t2 * (maxPoint - point) * (maxPoint - point);
return sqDist.x + sqDist.y + sqDist.z;
And it compiles just fine now, in less than a second! So strange.
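For reference, the loop version and the vectorized replacement compute the same squared point-to-AABB distance: each axis contributes (min - p)^2 when p is below the box, (p - max)^2 when above, and 0 inside, and the lessThan/greaterThan masks just select those cases. A plain C++ sketch of both, component by component (the Vec3 struct and function names are illustrative, not from the question):

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Component accessor, so both versions can index like point[i] in GLSL.
float comp(const Vec3& v, int i) { return i == 0 ? v.x : (i == 1 ? v.y : v.z); }

// Mirrors the original loop in sqDistPointAABB.
float sqDistLoop(Vec3 p, Vec3 mn, Vec3 mx) {
    float sq = 0.0f;
    for (int i = 0; i < 3; ++i) {
        const float v = comp(p, i);
        if (v < comp(mn, i)) sq += (comp(mn, i) - v) * (comp(mn, i) - v);
        if (v > comp(mx, i)) sq += (v - comp(mx, i)) * (v - comp(mx, i));
    }
    return sq;
}

// Mirrors the replacement: t1 and t2 are 1.0 or 0.0 per component.
float sqDistMasked(Vec3 p, Vec3 mn, Vec3 mx) {
    float sq = 0.0f;
    for (int i = 0; i < 3; ++i) {
        const float t1 = comp(p, i) < comp(mn, i) ? 1.0f : 0.0f; // lessThan(point, minPoint)
        const float t2 = comp(p, i) > comp(mx, i) ? 1.0f : 0.0f; // greaterThan(point, maxPoint)
        const float dMin = comp(mn, i) - comp(p, i);
        const float dMax = comp(mx, i) - comp(p, i);
        sq += t1 * dMin * dMin + t2 * dMax * dMax;
    }
    return sq;
}
```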

OpenGL double overflow

I have a double 'radius' = 2.0E-45. When I set it to ~2.0E-46, the calculation collapses, resulting in a white screen, so it seems like the issue is overflow. I wrote the same algorithm using Numba CUDA and an f64 (double precision) 'radius', and everything works fine. I am using an f32 texture buffer for 'depth_array' (there is no float64 dtype for this), but the Numba implementation works fine with f32, and the OpenGL implementation also works fine as long as 'radius' is bigger than ~2.0E-46. Why does the Numba implementation work while the OpenGL one does not? I want to stick with OpenGL. Is there any way to fix it?
I only included the parts that use 'radius'. All other variables are of type double (the code is messy and just a scratch).
#version 150
#extension GL_ARB_gpu_shader_fp64 : enable
double radius = 2.0E-45;
...
dvec2 pixel = dvec2(gl_FragCoord.xy) + dvec2(-0.5+(double(x)+0.5)/double(AA),-0.5+(double(y)+0.5)/double(AA));
dvec2 c = pixel/dvec2(width, height) * dvec2(radius, radius) + dvec2(-radius/2, -radius/2);
color.rgb += sample(c);
...
vec3 sample(dvec2 dn)
{
vec3 color = vec3(0.0,0.0,0.0);
dvec2 d0 = dn;
double zn_size = 0.0;
int i = 0;
while (i < depth)
{
int x = i % depth;
dvec2 value = dvec2(texelFetch(depth_array, x).rg);
dn = complex_mul(dn, value + dn);
dn = dn + d0;
i++;
x = i % depth;
value = dvec2(texelFetch(depth_array, x).rg);
dvec2 zn = value * 0.5 + dn;
zn_size = dot(zn, zn);
if (zn_size > r)
{
double fraciter = (zn_size-r)/(r2-r);
double iter = double(i) - fraciter;
double m = sqrt(iter)*mul*2.0;
color = sin(vec3(.1, .15, .2)*float(m)*0.5)*.5+0.5;
break;
}
}
return color;
}
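In GLSL, the literal value 2.0E-45 has the type float. That means the value is rounded to the nearest representable float before it gets assigned to the double. 2.0E-45 is below the smallest normalized float (about 1.18E-38), so at best it becomes the smallest denormal float (about 1.4E-45), and ~2.0E-46 rounds to 0.0, which would explain why your calculation collapses at that point.
If you want a literal to be a double, then it needs to use the proper suffix: 2.0E-45lf.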
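The rounding can be reproduced on the CPU with a small C++ sketch, assuming IEEE 754 single and double precision with round-to-nearest and no flush-to-zero (the default on mainstream desktop compilers). throughFloatLiteral is a hypothetical helper that mimics what a suffix-less GLSL literal does to a double initializer, and mapPixel mirrors the question's c = pixel / size * radius - radius / 2 for one axis.

```cpp
#include <cassert>
#include <limits>

// A GLSL literal without a suffix is a float, so a double initialized from it
// only receives the float-rounded value. This helper mimics that step.
double throughFloatLiteral(double value) {
    return static_cast<double>(static_cast<float>(value));
}

// One axis of the question's mapping: c = pixel / size * radius - radius / 2.
double mapPixel(double pixel, double size, double radius) {
    return pixel / size * radius - radius / 2.0;
}
```

With radius rounded to 0, every pixel maps to the same point, so every pixel gets the same color; with the lf suffix the double keeps the intended value.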

Very strange behaviour with sampler handling using OpenGL and GLSL

I have implemented cubemap shadow mapping successfully with just one point light.
To render this scene, I use geometry shaders in the first render pass to dispatch the 6 frustums. In the second render pass, I use samplerCubeShadow in the fragment shader to compute the shadow factor.
I have OpenGL/GLSL version 4.40 with NVIDIA GeForce GTX 780M.
Here's a screenshot:
But now I want to implement multiple cubemap shadow mapping to render shadows from several point lights.
Here's a piece of code from my fragment shader:
[...]
/*
** Shadow Cube sampler array.
*/
uniform samplerCubeShadow ShadowCubeSampler[5]; //Max point light by scene = 5
[...]
float ConvertDistToClipSpace(vec3 lightDir_ws)
{
vec3 AbsVec = abs(lightDir_ws);
float LocalZcomp = max(AbsVec.x, max(AbsVec.y, AbsVec.z));
float NormZComp = (NearFar.y + NearFar.x)/(NearFar.y - NearFar.x)
- (2.0f * NearFar.y * NearFar.x)/(LocalZcomp * (NearFar.y - NearFar.x)); // NDC depth: (f+n)/(f-n) - 2fn/(z(f-n))
return ((NormZComp + 1) * 0.5f);
}
float GetCubeShadowFactor(vec3 vertexPosition_ws, float shadowFactor, int idx)
{
vec3 lightToVertexDir_ws = vertexPosition_ws - LightPos_ws.xyz;
float LightToVertexClipDist = ConvertDistToClipSpace(lightToVertexDir_ws);
float LightToOccluderClipDist = texture(
ShadowCubeSampler[idx], vec4(lightToVertexDir_ws, LightToVertexClipDist));
if (LightToOccluderClipDist < LightToVertexClipDist)
{
shadowFactor = 0.0f;
}
return (shadowFactor);
}
void main(void)
{
[...]
for (int idx = 0; idx < 1; idx++) //Test first with 1 point light
{
float ShadowFactor = GetCubeShadowFactor(Position_ws.xyz, ShadowFactor, idx);
}
[...]
}
The problem is that I get error 1282 (GL_INVALID_OPERATION). To summarize the situation: I want to display exactly the same scene as in the picture above with a SINGLE point light, but this time using an array of samplerCubeShadow. What is amazing is that if I replace the first parameter of the 'texture' function, 'ShadowCubeSampler[idx]', with 'ShadowCubeSampler[0]', it works! Yet the value of 'idx' is always '0'. I tried the following code without success:
int toto = 0;
float LightToOccluderClipDist = texture(ShadowCubeSampler[toto], vec4(lightToVertexDir_ws, LightToVertexClipDist));
I still get error 1282! The type of the index is the same (int)!
I have already used arrays of 'sampler2DShadow' or 'sampler2D' without problems.
So why does it not work with 'samplerCubeShadow', and why does 'ShadowCubeSampler[0]' work when the other forms do not?
PS: If I declare an array of 2 and use 2 cubemaps (so 2 point lights), it works. So if I load fewer cubemaps than the array size declared in the fragment shader, it fails!
I have no compilation error and no linkage error. Here's the code I use to check shader programs state:
void video::IEffectBase::Log(void) const
{
GLint errorLink = 0;
glGetProgramiv(this->m_Handle, GL_LINK_STATUS, &errorLink);
if (errorLink != GL_TRUE)
{
GLint sizeError = 0;
glGetProgramiv(this->m_Handle, GL_INFO_LOG_LENGTH, &sizeError);
char *erreur = new char[sizeError + 1];
glGetProgramInfoLog(this->m_Handle, sizeError, &sizeError, erreur); // program object, so the program info log
erreur[sizeError] = '\0';
std::cerr << erreur << std::endl;
glDeleteProgram(this->m_Handle);
delete[] erreur;
}
}
And about the texture unit limits:
std::cout << GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS << std::endl;
std::cout << GL_MAX_TEXTURE_IMAGE_UNITS << std::endl;
$> 35660
$> 34930
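One detail in the output above: the two printed numbers are the values of the enum tokens themselves, not the limits. GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS is 0x8B4C (35660) and GL_MAX_TEXTURE_IMAGE_UNITS is 0x8872 (34930); the actual limits have to be read with glGetIntegerv. In the C++ sketch below, the token values are defined locally so it compiles without GL headers (the k-prefixed constants are stand-ins for the real GL_* macros), and queryLimit is a hypothetical helper that accepts glGetIntegerv or anything with the same (GLenum, GLint*) shape.

```cpp
#include <cassert>

// Token values as defined in the OpenGL headers; redefined here only so the
// sketch compiles without them. Real code should use the GL_* macros.
constexpr unsigned int kMaxVertexTextureImageUnits = 0x8B4C; // GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS
constexpr unsigned int kMaxTextureImageUnits = 0x8872;       // GL_MAX_TEXTURE_IMAGE_UNITS

// Printing the token itself prints these numbers.
static_assert(kMaxVertexTextureImageUnits == 35660, "token value, not a limit");
static_assert(kMaxTextureImageUnits == 34930, "token value, not a limit");

// Reads an integer limit through a glGetIntegerv-shaped callable,
// e.g. queryLimit(glGetIntegerv, GL_MAX_TEXTURE_IMAGE_UNITS) in real code.
template <typename GetIntegerv>
int queryLimit(GetIntegerv getIntegerv, unsigned int pname) {
    int value = 0;
    getIntegerv(pname, &value);
    return value;
}
```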
If I use 'ShadowCubeSampler[0]', with '0' written directly in the code, I get the same display as the picture at the beginning of my post, without error. If I use 'ShadowCubeSampler[idx]' with idx = 0, I get the following display:
As you can see, none of the geometry sharing this shader has been rendered. However, I don't have any link error. How can you explain that? Is it possible that the system unlinks the shader program?
UPDATE
Let's suppose my array of samplerCubeShadow can contain a maximum of 2 samplers (uniform samplerCubeShadow tex_shadow[2]).
I noticed that if I load just one point light, so one cubemap:
CASE 1
uniform samplerCubeShadow tex_shadow[1]; //MAX POINT LIGHT = 1
for (int i=0; i < 1; i++) {tex_shadow[i];} //OK
for (int i=0; i < 1; i++) {texture(tex_shadow[i], ...);} //OK
for (int i=0; i < 1; i++) {texture(tex_shadow[0], ...);} //OK
CASE 2
uniform samplerCubeShadow tex_shadow[2]; //MAX POINT LIGHT = 2
for (int i=0; i < 1; i++) {tex_shadow[i];} //NOT OK - 1282
for (int i=0; i < 1; i++) {texture(tex_shadow[i], ...);} //NOT OK - 1282
for (int i=0; i < 1; i++) {texture(tex_shadow[0], ...);} //OK
CASE 3
uniform samplerCubeShadow tex_shadow[2]; //MAX POINT LIGHT = 2
for (int i=0; i < 2; i++) {tex_shadow[i];} //OK
for (int i=0; i < 2; i++) {texture(tex_shadow[i], ...);} //OK
for (int i=0; i < 2; i++) {texture(tex_shadow[0], ...);} //OK
Conclusion: if the maximum number of samplers is equal to the number of samplers loaded, I can loop over the samplers contained in my array. If the number is smaller, it does not work! I can use a maximum of 32 texture units for each use of the shader program. I have the same problem using the samplerCube keyword.
It's very strange because I don't have any problem using sampler2D or sampler2DShadow for spot light shadow computation.
I checked with Nsight by putting a breakpoint in the fragment shader file, and of course the breakpoint is never reached. It's as if the shader program is not linked, but that's not the case.
Do you think it could be a problem with cubemap samplers in general, or does the problem come from the cubemap initialization?
Can anyone help me?
I have never used an array inside GLSL, and unfortunately I don't have the equipment right now to try it,
but have you tried using an unsigned int (uint) in GLSL?
float GetCubeShadowFactor(vec3 vertexPosition_ws, float shadowFactor, uint idx) {
....
}
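Also note that you cannot use an unlimited number of samplers in your shaders.
OpenGL has, depending on the targeted version, special restrictions on arrays of opaque types (samplers are one of them). Before OpenGL 4.0 (GLSL 4.00), arrays of samplers can only be indexed with constant integral expressions, so looping over them is not possible; from GLSL 4.00 on, the index must be dynamically uniform. You can check the details here: OpenGL Wiki - Data Types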

Cell-Shading Outlines: edge mesh writer does not define all desired edges

The program that I am writing takes in the vertex data of a 3D mesh, performs a series of calculations (forgive the vagueness; I'll try to explain in more detail later), and outputs a binary file that defines where the edges are on the mesh. My program then draws a colored line along each edge. Without the appropriate vertex shader, this would look like a regular triangulated mesh, but once the appropriate vertex shader is applied, only the edges that are "sharp" (the dot product of their normals is greater than something close to zero) have lines drawn on them, along with the edges on the outside of the figure. My implementation of the outline is not correct, as I assumed that if an edge wasn't behind the object and didn't define a sharp edge, it would be an outline edge. I haven't found a satisfactory answer to this elsewhere, and I didn't want to rely on the old trick of re-drawing the mesh in a solid color, slightly larger than the original mesh. This approach was meant to be entirely math-based, relying only on the vertex data of a mesh. I am writing a program that uses the following vertex shader:
uniform mat4 worldMatrix;
uniform mat4 projMatrix;
uniform mat4 viewProjMatrix;
uniform vec4 eyepos;
attribute vec3 a;
attribute vec3 b;
attribute vec3 n1;
attribute vec3 n2;
attribute float w;
void main()
{
float a_vertex = dot(eyepos.xyz - a, n1);
float b_vertex = dot(eyepos.xyz - a, n2);
if (a_vertex * b_vertex > 0.0) // signs are different, edge is behind the object
{
gl_Position = vec4(2.0,2.0,2.0,1.0);
}
else // the outline of the figure
{
if(w == 0.0)
{
vec4 p = vec4(a.x, a.y, a.z, 1.0);
p = p * worldMatrix * viewProjMatrix;
gl_Position = p;
}
else
{
vec4 p = vec4(b.x, b.y, b.z, 1.0);
p = p * worldMatrix * viewProjMatrix;
gl_Position = p;
}
}
if(dot(n1, n2) <= 0.2) // there is a sharp edge
{
if(w == 0.0)
{
vec4 p = vec4(a.x, a.y, a.z, 1.0);
p = p * worldMatrix * viewProjMatrix;
gl_Position = p;
}
else
{
vec4 p = vec4(b.x, b.y, b.z, 1.0);
p = p * worldMatrix * viewProjMatrix;
gl_Position = p;
}
}
}
... to take information from a binary file that is written using this program in C++:
#include <iostream>
#include "llgl.h"
#include <fstream>
#include <map>
#include <set>
#include <string>
#include <vector>
#include "SuperMesh.h"
using namespace std;
using namespace llgl;
struct Vertex
{
float x,y,z,w;
float s,t,p,q;
float nx,ny,nz,nw;
};
bool isFileAlright(string fName)
{
ifstream in(fName.c_str());
if(!in.good())
return false;
return true;
}
int main(int argc, char* argv[])
{
// INPUT FILE NAME //
string fName;
cout << "Enter the path to your spec.mesh file here: ";
cin >> fName;
while(!isFileAlright(fName))
{
cout << "Enter the path to your spec.mesh file here: ";
cin >> fName;
}
SuperMesh* Model = new SuperMesh(fName.c_str());
// END INPUT //
Model->load();
Model->draw();
string fname = Model->fname;
string FileName = fname.substr(0, fname.size() - 10); // supposed to slash the last 10 characters off of the string, removing ".spec.mesh"...
FileName = FileName + ".bin"; //... and then we make it a .bin file
cout << FileName << endl;
ofstream out(FileName.c_str(), ios::binary);
for (unsigned w = 0; w < Model->m.size(); w++)
{
vector<float> &vdata = Model->m[w]->vdata;
vector<char> &idata = Model->m[w]->idata;
//Create a vertex and index variable, a map for Edge Mesh, perform two loops to analyze all triangles on a mesh and write out their vertex values to a file.//
Vertex* V = (Vertex*)(&vdata[0]);
unsigned short* I16 = (unsigned short*)(&idata[0]);
unsigned char* I8 = (unsigned char*)(&idata[0]);
unsigned int* I32 = (unsigned int*)(&idata[0]);
map<set<int>, vector<vec3> > EM;
for(unsigned i = 0; i < Model->m[w]->ic; i += 3) // 3 because we're looking at triangles //
{
Mesh* foo = Model->m[w];
int i1;
int i2;
int i3;
if( Model->m[w]->ise == GL_UNSIGNED_BYTE)
{
i1 = I8[i];
i2 = I8[i + 1];
i3 = I8[i + 2];
}
else if( Model->m[w]->ise == GL_UNSIGNED_SHORT)
{
i1 = I16[i];
i2 = I16[i + 1];
i3 = I16[i + 2];
}
else
{
i1 = I32[i];
i2 = I32[i + 1];
i3 = I32[i + 2];
}
vec3 p = vec3(V[i1].x, V[i1].y, V[i1].z); // to represent the point in 3D space of each vertex on every triangle on the mesh
vec3 q = vec3(V[i2].x, V[i2].y, V[i2].z);
vec3 r = vec3(V[i3].x, V[i3].y, V[i3].z);
vec3 v1 = p - q;
vec3 v2 = r - q;
vec3 n = cross(v2,v1); //important to make sure the order is correct here: v2 cross v1 (swapping them flips the normal)//
set<int> tmp;
tmp.insert(i1); tmp.insert(i2);
EM[tmp].push_back(n);
set<int> tmp2;
tmp2.insert(i2); tmp2.insert(i3);
EM[tmp2].push_back(n);
set<int> tmp3;
tmp3.insert(i3); tmp3.insert(i1);
EM[tmp3].push_back(n);
//we have now pushed every needed point into our edge map
}
int edgeNumber = 0;
cout << "There should be 12 edges on a lousy cube." << endl;
for(map<set<int>, vector<vec3> >::iterator it = EM.begin(); it != EM.end(); ++it)
{
//Now we will take our edge map and write its data to the file!//
/* Information is written to the file in this form:
Vertex One, Vertex Two, Normal One, Normal Two, r (where r, depending on its value, determines whether one edge is on top of the other in the case
where two edges are aligned with one another)
*/
set<int>::iterator tmp = it->first.begin();
int pi = *tmp;
tmp++;
int qi = *tmp;
Vertex One = V[pi];
Vertex Two = V[qi];
vec3 norm1 = it->second[0];
vec3 norm2;
if(it->second.size() == 1)
norm2 = -1 * norm1;
else
norm2 = it->second[1];
out.write((char*) &One, 12);
out.write((char*) &Two, 12);
out.write((char*) &norm1, 12);
out.write((char*) &norm2, 12);
float r = 0;
out.write((char*) &r, 4);
out.write((char*) &One, 12);
out.write((char*) &Two, 12);
out.write((char*) &norm1, 12);
out.write((char*) &norm2, 12);
r = 1;
out.write((char*) &r, 4);
edgeNumber++;
cout << "Wrote edge #" << edgeNumber << endl;
}
}
return 0;
}
The problem that this program has is that it does neither of these two essential things in the test case where I use it to draw a simple box with outlines:
It does not draw outlines. The vertex shader is not sufficient to determine anything more than where the edges of the object are. The binary file that makes this happen is pre-computed in a separate program using code from the second snippet posted above, and then it is saved as a .bin file along with the mesh assets to which it belongs. However, raw vertex data would only take me so far, and I seek a way to draw a line around the outside of the mesh without using more traditional methods.
It does not draw ALL of the edges that I need. In my test case, two of the edges are missing, and I cannot figure out for the life of me why. I figure I must have done something wrong in writing the edge map.
A couple notes about the above code:
llgl is an OpenGL wrapper that I have used to simplify many elements of OpenGL. It is not used extensively here, but rather in the creation of meshes, done elsewhere.
Things like Mesh and SuperMesh (a collection of meshes into one rigid body) are meant to be 3D objects in my scene. In my test case, there is only one Mesh in my scene, and defining a SuperMesh of a single Mesh is essentially just creating a single Mesh.
The "draw" call in the second snippet, which pre-computes a Mesh's edge map, does not actually draw anything. It is necessary to gain access to the Mesh's vertex data.
The variable "ise" is taken from the individual Meshes in the SuperMesh, and is a variable found by reading it in from the original Blender .OBJ file. It is related to how much memory should be used to store the important vertex data. It generally isn't a good idea to allocate more space than is needed for these values, as I've been told by friends and mentors who work with Blender.
It isn't well-commented, as I'm not the only one who has worked on this code, and I, unfortunately, have a limited understanding of how the second snippet could iterate through all of the triangles on a mesh and somehow miss the last two edges. Once I understand better what this code should do when properly written, I plan on heavily commenting it and using it in future applications.
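As a point of comparison for the edge-map writer, here is a minimal self-contained C++ sketch of the same idea, using std::pair keys instead of std::set<int> and a hypothetical Vec3 in place of the llgl types. It is run on a unit cube whose six quad faces are each split into two triangles. A triangulated cube has 18 unique edges (12 box edges plus 6 face diagonals), which is also what Euler's formula V - E + F = 2 gives with 8 vertices and 12 triangles; only the 12 box edges pass a dot <= 0 sharpness test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// An edge is keyed by its two vertex indices, smaller first, so edge (i, j)
// of one triangle and (j, i) of its neighbour land in the same entry.
using EdgeKey = std::pair<int, int>;
using EdgeMap = std::map<EdgeKey, std::vector<Vec3>>;

EdgeKey makeKey(int i, int j) { return {std::min(i, j), std::max(i, j)}; }

// One face normal per triangle, pushed onto each of its three edges.
EdgeMap buildEdgeMap(const std::vector<Vec3>& verts, const std::vector<int>& indices) {
    EdgeMap edges;
    for (std::size_t t = 0; t + 2 < indices.size(); t += 3) {
        const int i1 = indices[t], i2 = indices[t + 1], i3 = indices[t + 2];
        const Vec3 n = cross(sub(verts[i3], verts[i2]), sub(verts[i1], verts[i2]));
        edges[makeKey(i1, i2)].push_back(n);
        edges[makeKey(i2, i3)].push_back(n);
        edges[makeKey(i3, i1)].push_back(n);
    }
    return edges;
}

// Edges shared by two faces whose normals meet at 90 degrees or more (dot <= 0).
int countSharpEdges(const EdgeMap& edges) {
    int sharp = 0;
    for (const auto& entry : edges) {
        const std::vector<Vec3>& normals = entry.second;
        if (normals.size() == 2 && dot(normals[0], normals[1]) <= 0.0f) ++sharp;
    }
    return sharp;
}

// Unit cube: vertex i has coordinates (bit 0, bit 1, bit 2) of i.
// Each face is a quad in cyclic order, split into two triangles.
void makeCube(std::vector<Vec3>& verts, std::vector<int>& indices) {
    verts.clear();
    indices.clear();
    for (int i = 0; i < 8; ++i)
        verts.push_back({float(i & 1), float((i >> 1) & 1), float((i >> 2) & 1)});
    const int quads[6][4] = {{0, 1, 3, 2}, {4, 5, 7, 6}, {0, 1, 5, 4},
                             {2, 3, 7, 6}, {0, 2, 6, 4}, {1, 3, 7, 5}};
    for (const auto& q : quads)
        indices.insert(indices.end(), {q[0], q[1], q[2], q[0], q[2], q[3]});
}
```

Because both triangles of a face share its winding, each diagonal sees two parallel normals (dot > 0) and is rejected, while each box edge sees two perpendicular normals (dot = 0).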
The order of multiplication between a matrix and a vector is not commutative, so
your vertex shader has to output Projection * Model * Vertex and not the opposite.
I solved the mystery of the undrawn lines by allocating more space to write vertex data in a different part of my code. As for my other problems, although the order of multiplication in my vertex shader was actually alright, I had messed up another fundamental concept of vector math: the dot product of two face normals is negative when the normals make an obtuse angle, the way they do at a sharp point on my model. There was also the faulty logic above, which basically says that if a face is visible, all of the lines on it are drawn. I rewrote my shader to first test whether a face is visible, and then, in that same conditional block, test for sharp edges. Now, if a face is visible BUT it doesn't create a sharp edge, the shader ignores that edge. Outlines also appear now, just not perfectly. Here is a modified version of the above vertex shader:
uniform mat4 worldMatrix; /* the matrix that defines how to project a point from
object space to world space.*/
uniform mat4 viewProjMatrix; // the view (pertaining to screen size) matrix times the projection (how to project points to 3D) matrix.
uniform vec4 eyepos; // the position of the eye, given by the program.
attribute vec3 a; // one vertex on an edge, with x, y, and z coordinates.
attribute vec3 b; // the other edge vertex.
attribute vec3 n1; // the normal of the face the edge is on.
attribute vec3 n2; // another normal in the case that an edge shares two faces... otherwise, the edge writer stores the negated n1.
attribute float w; // an attribute given to make a binary choice between two edges when they draw on top of one another.
void main()
{
// WORLD SPACE ATTRIBUTES //
vec4 eye_world = eyepos * worldMatrix;
vec4 a_world = vec4(a.x, a.y,a.z,1.0) * worldMatrix;
vec4 b_world = vec4(b.x, b.y,b.z,1.0) * worldMatrix;
vec4 n1_world = normalize(vec4(n1.x, n1.y,n1.z,0.0) * worldMatrix);
vec4 n2_world = normalize(vec4(n2.x, n2.y,n2.z,0.0) * worldMatrix);
// END WORLD SPACE ATTRIBUTES //
// TEST CASE ATTRIBUTES //
float a_vertex = dot(eye_world - a_world, n1_world);
float b_vertex = dot(eye_world - b_world, n2_world);
float normalDot = dot(n1_world.xyz, n2_world.xyz);
float vertProduct = a_vertex * b_vertex;
float hardness = 0.0; // this would be the value for an object made of sharp angles, like a box. Take a look at its use below.
// END TEST CASE ATTRIBUTES //
gl_Position = vec4(2.0,2.0,2.0,1.0); // if all else fails, keeping this here will discard unwanted data.
if (vertProduct >= 0.1) // NOTE: face is behind the viewable portion of the object, normally uses 0.0 when not checking for silhouette
{
gl_Position = vec4(2.0,2.0,2.0,1.0);
}
else if(vertProduct < 0.1 && vertProduct >= -0.1) // NOTE: face makes almost a right angle with the eye vector
{
if(w == 0.0)
{
vec4 p = vec4(a_world.x, a_world.y, a_world.z, 1.0);
p = p * viewProjMatrix;
gl_Position = p;
}
else
{
vec4 p = vec4(b_world.x, b_world.y, b_world.z, 1.0);
p = p * viewProjMatrix;
gl_Position = p;
}
}
else // NOTE: this is the case where you can very clearly see a face.
{ // NOTE: the number that normalDot compares to should be its "hardness" value. The more negative the value, the smoother the surface.
// a.k.a. the less we care about hard edges (when the normals of the faces make an obtuse angle) on the object, the more negative
// hardness becomes on a scale of 0.0 to -1.0.
if(normalDot <= hardness) // NOTE: the dot product of the two normals is obtuse, so we are looking at a sharp edge.
{
if(w == 0.0)
{
vec4 p = vec4(a_world.x, a_world.y, a_world.z, 1.0);
p = p * viewProjMatrix;
gl_Position = p;
}
else
{
vec4 p = vec4(b_world.x, b_world.y, b_world.z, 1.0);
p = p * viewProjMatrix;
gl_Position = p;
}
}
else // NOTE: not sharp enough, just throw the vertex away
{
gl_Position = vec4(2.0,2.0,2.0,1.0);
}
}
}