I am writing an OpenCL kernel that captures the triangles in my current GL scene.
To do that, I pass my vertex stream and my indices to the kernel.
Here's the declaration for my kernel entry:
void CaptureTriangles(
const uint NumTriangles,
const float16 WorldMatrix,
__constant ushort3 *IndexDataBlock,
__constant struct Vertex *DataBlock,
__global struct Triangle *TriangleBuffer,
__global uint *TriangleBufferCount)
The Vertex structure is defined as such:
struct Vertex
{
    float3 position;
    float3 normal;
    float materialIndex;
};
Now this stream was created through GL and that's how the data is laid out.
When fetching the triangles, I do the following in kernel code:
const ushort3 idx = IndexDataBlock[get_global_id(0)];
const struct Vertex v0 = DataBlock[idx.x],
v1 = DataBlock[idx.y],
v2 = DataBlock[idx.z];
But OpenCL seems to keep re-aligning the Vertex struct to its own internal requirements, even though it is declared with __attribute__((packed)).
So the triangles are never captured properly.
Switching from __constant struct Vertex *DataBlock to __constant float *DataBlock and fetching each float explicitly in the kernel code fixes the issue.
So this works when reading float by float:
// __constant float *DataBlock
float4 p0 = (float4)(DataBlock[7 * idx.x + 0], DataBlock[7 * idx.x + 1], DataBlock[7 * idx.x + 2], 1.0f),
p1 = (float4)(DataBlock[7 * idx.y + 0], DataBlock[7 * idx.y + 1], DataBlock[7 * idx.y + 2], 1.0f),
p2 = (float4)(DataBlock[7 * idx.z + 0], DataBlock[7 * idx.z + 1], DataBlock[7 * idx.z + 2], 1.0f);
I'd rather use the struct Vertex syntax for code clarity. Is there any way to get OpenCL to not re-align structs?

In OpenCL, float3 and float4 (cl_float3 and cl_float4 on the host) have the same size: 16 bytes. Your GL code, however, writes true 12-byte float3 values.
__attribute__((packed)) will not fix your problem, because as far as CL is concerned the struct is already packed; its members simply have different sizes than the ones in your GL stream.
I'm afraid you will have to parse the data manually.
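To make the size mismatch concrete, here is a minimal host-side C++ sketch. The names PackedVertex, ClFloat3, ClVertex, Float4 and fetchPosition are made up for illustration, and ClFloat3 is only a stand-in that mimics the 16-byte size and alignment of an OpenCL float3. It assumes 4-byte floats.

```cpp
#include <cassert>
#include <cstddef>

// Tightly packed layout as written by the GL vertex stream: 7 floats.
struct PackedVertex {
    float position[3];
    float normal[3];
    float materialIndex;
};

// Hypothetical stand-in for an OpenCL float3: 16 bytes, 16-byte aligned.
struct alignas(16) ClFloat3 {
    float s[4];
};

// Roughly what the kernel-side struct Vertex looks like to OpenCL.
struct ClVertex {
    ClFloat3 position;
    ClFloat3 normal;
    float materialIndex;
};

static_assert(sizeof(PackedVertex) == 28, "GL stream stride is 7 floats");
static_assert(sizeof(ClVertex) == 48, "float3 members grow the struct to 48 bytes");

struct Float4 { float x, y, z, w; };

// Reads one position from the packed stream, like the float-by-float kernel code.
inline Float4 fetchPosition(const float* dataBlock, std::size_t vertexIndex) {
    assert(dataBlock != nullptr);
    const std::size_t floatsPerVertex = sizeof(PackedVertex) / sizeof(float); // 7
    const float* v = dataBlock + floatsPerVertex * vertexIndex;
    return Float4{v[0], v[1], v[2], 1.0f};
}
```

The 28-byte versus 48-byte strides are why indexing the GL stream through the struct pointer lands on the wrong floats, while the explicit `7 * idx` arithmetic works.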


OpenGL: Lambert shading imported OBJs results in artifacts and strange culling

I decided to post this as I now believe the problem isn't simply stemming from the shader program, but most probably the OBJ import and mesh initialization process. I wanted to write a quick Lambert shader to finally get stuff appearing on the screen. The final result is riddled with interesting artifacts and visibility issues:
It appears as though the vertex positions are encoded correctly, but either the normals or the indices are completely messed up.
Vertex Shader
#version 330
// MeshVertex
in layout(location=0) vec3 a_Position;
in layout(location=1) vec3 a_Normal;
in layout(location=2) vec2 a_UV;
in layout(location=3) vec3 a_Tangent;
in layout(location=4) vec3 a_BiTangent;
uniform mat4 View;
uniform mat4 Projection;
uniform mat4 Model;
out VS_out
vec3 fragNormal;
vec3 fragPos;
} vs_out;
void main()
mat3 normalMatrix = mat3(transpose(inverse(Model)));
vec4 position = vec4(a_Position, 1.f);
vs_out.fragPos = (Model * position).xyz;
vs_out.fragNormal = normalMatrix * a_Normal;
gl_Position = Projection * View * Model * position;
I initially thought I was passing the vertex normals incorrectly to the fragment shader. I have seen some samples multiply the vertex position by the ModelView matrix. That sounds non-intuitive to me, my lights are positioned in world space, so I would need the world space coordinates of my vertices, hence the multiplication by the Model matrix only. If there are no red flags in this thought process, here is the fragment shader:
#version 330
struct LightSource
vec3 position;
vec3 intensity;
uniform LightSource light;
in VS_out
vec3 fragNormal;
vec3 fragPos;
} fs_in;
struct Material
vec4 color;
vec3 ambient;
uniform Material material;
void main()
// just playing around with some values for now, dont worry, removing this still does not fix the issue
vec3 ambient = normalize(vec3(69, 111, 124));
vec3 norm = normalize(fs_in.fragNormal);
vec3 pos = fs_in.fragPos;
vec3 lightDir = normalize(light.position - pos);
float lambert = max(dot(norm, lightDir), 0.0);
vec3 illumination = (lambert * light.intensity) + ambient;
gl_FragColor = vec4(illumination *, 1.f);
Now the main suspicion is how the OBJ is interpreted. I use the tinyOBJ importer for this. I mostly copied the sample code they had on their GitHub page, and initialized my native vertex type using that data.
OBJ Import Code
bool Model::Load(const void* rawBinary, size_t bytes)
tinyobj::ObjReader reader;
if(reader.ParseFromString((const char*)rawBinary, ""))
// Fetch meshes
std::vector<MeshVertex> vertices;
std::vector<Triangle> triangles;
const tinyobj::attrib_t& attrib = reader.GetAttrib();
const std::vector<tinyobj::shape_t>& shapes = reader.GetShapes();
// Loop over shapes; in our case, each shape corresponds to a mesh object
for(size_t s = 0; s < shapes.size(); s++)
// Loop over faces(polygon)
size_t index_offset = 0;
for(size_t f = 0; f < shapes[s].mesh.num_face_vertices.size(); f++)
// Num of face vertices for face f
int fv = shapes[s].mesh.num_face_vertices[f];
ASSERT(fv == 3, "Only supporting triangles for now");
Triangle tri;
// Loop over vertices in the face.
for(size_t v = 0; v < fv; v++) {
// access to vertex
tinyobj::index_t idx = shapes[s].mesh.indices[index_offset + v];
tinyobj::real_t vx = 0.f;
tinyobj::real_t vy = 0.f;
tinyobj::real_t vz = 0.f;
tinyobj::real_t nx = 0.f;
tinyobj::real_t ny = 0.f;
tinyobj::real_t nz = 0.f;
tinyobj::real_t tx = 0.f;
tinyobj::real_t ty = 0.f;
vx = attrib.vertices[3 * idx.vertex_index + 0];
vy = attrib.vertices[3 * idx.vertex_index + 1];
vz = attrib.vertices[3 * idx.vertex_index + 2];
nx = attrib.normals[3 * idx.normal_index + 0];
ny = attrib.normals[3 * idx.normal_index + 1];
nz = attrib.normals[3 * idx.normal_index + 2];
tx = attrib.texcoords[2 * idx.texcoord_index + 0];
ty = attrib.texcoords[2 * idx.texcoord_index + 1];
// Populate our native vertex type
MeshVertex meshVertex;
meshVertex.Position = glm::vec3(vx, vy, vz);
meshVertex.Normal = glm::vec3(nx, ny, nz);
meshVertex.UV = glm::vec2(tx, ty);
meshVertex.BiTangent = glm::vec3(0.f);
meshVertex.Tangent = glm::vec3(0.f);
tri.Idx[v] = index_offset + v;
index_offset += fv;
// per-face material
// Adding meshes should occur here!
m_Meshes[s] = std::make_unique<StaticMesh>(vertices, triangles);
// m_Materials[s] = ....
return true;
As I understand OBJ, OpenGL indices do not map directly onto the face elements found in an OBJ file, because each face element carries separate indices into the position, normal, and texcoord arrays. So instead, I copy the vertex attributes referenced by each face element into my native MeshVertex structure, which represents one vertex of my mesh; the running face-element ID then serves as the corresponding index in my index buffer object. In my case I use a Triangle structure instead, but it is effectively the same thing.
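The flattening idea described above can be sketched without any importer. This is a simplified illustration, not the tinyobj API: Vec3, ObjIndex, FlatVertex, FlatMesh and flatten are hypothetical names, and only positions and normals are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct ObjIndex { int position; int normal; };      // one face corner ("v//vn")
struct FlatVertex { Vec3 position; Vec3 normal; };

struct FlatMesh {
    std::vector<FlatVertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Emits one vertex per face corner, so the index buffer is simply 0, 1, 2, ...
FlatMesh flatten(const std::vector<Vec3>& positions,
                 const std::vector<Vec3>& normals,
                 const std::vector<ObjIndex>& corners) {
    assert(corners.size() % 3 == 0); // triangles only
    FlatMesh mesh;
    mesh.vertices.reserve(corners.size());
    mesh.indices.reserve(corners.size());
    for (const ObjIndex& c : corners) {
        mesh.indices.push_back(static_cast<std::uint32_t>(mesh.vertices.size()));
        mesh.vertices.push_back({positions[c.position], normals[c.normal]});
    }
    return mesh;
}
```

Note that every corner has to be appended to both containers; an index that refers to a vertex that was never stored would produce exactly the kind of garbage geometry described in the question.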
The Triangle struct if interested:
struct Triangle
uint32_t Idx[3];
Triangle(uint32_t v1, uint32_t v2, uint32_t v3)
Idx[0] = v1;
Idx[1] = v2;
Idx[2] = v3;
Triangle(const Triangle& Other)
Idx[0] = Other.Idx[0];
Idx[1] = Other.Idx[1];
Idx[2] = Other.Idx[2];
Other than that, I have no idea what can cause this problem, I am open to hearing new thoughts; perhaps someone experienced understands what these artifacts signify. If you want to take a deeper dive, I can post the mesh initialization code as well.
So I tried importing an FBX format, and I encountered a very similar issue. I am now considering silly errors in my OpenGL code to initialize the mesh.
This initializes OpenGL buffers based on arbitrary vertex data, and triangles to index by
void Mesh::InitBuffers(const void* vertexData, size_t size, const std::vector<Triangle>& triangles)
glGenVertexArrays(1, &m_vao);
// Interleaved Vertex Buffer
glGenBuffers(1, &m_vbo);
glBindBuffer(GL_ARRAY_BUFFER, m_vbo);
glBufferData(GL_ARRAY_BUFFER, size, vertexData, GL_STATIC_DRAW);
// Index Buffer
glGenBuffers(1, &m_ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, m_ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(Triangle) * triangles.size(), triangles.data(), GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
Then I set up the layout of the vertex buffer using a BufferLayout structure that specifies the attributes we want.
void Mesh::SetBufferLayout(const BufferLayout& layout)
glBindBuffer(GL_ARRAY_BUFFER, m_vbo);
uint32_t stride = layout.GetStride();
int i = 0;
for(const BufferElement& element : layout)
glVertexAttribPointer(i++, element.GetElementCount(), GLType(element.Type), element.Normalized, stride, (void*)(element.Offset));
glBindBuffer(GL_ARRAY_BUFFER, 0);
So in our case, the BufferLayout corresponds to the MeshVertex I populated, containing a Position(float3), Normal(float3), UV(float2), Tangent(float3), BiTangent(float3). I can confirm via debugging that the strides and offsets, and other values coming from the BufferElement are exactly what I expect; so I am concerned with the nature of the OpenGL calls I am making.
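For reference, the strides and offsets I would expect for that layout can be checked at compile time. This is a sketch that assumes the glm types are tightly packed floats; MeshVertexLayout, AttributeDesc, kStride and attributeAt are illustrative names, not part of the code above.

```cpp
#include <cassert>
#include <cstddef>

// Plain-float mirror of the MeshVertex described above.
struct MeshVertexLayout {
    float Position[3];
    float Normal[3];
    float UV[2];
    float Tangent[3];
    float BiTangent[3];
};

struct AttributeDesc {
    int componentCount;   // "size" argument of glVertexAttribPointer
    std::size_t offset;   // byte offset, passed as the pointer argument
};

constexpr std::size_t kStride = sizeof(MeshVertexLayout);
static_assert(kStride == 56, "14 floats per vertex");

// Returns the expected attribute description for shader locations 0..4.
inline AttributeDesc attributeAt(int location) {
    static const AttributeDesc table[] = {
        {3, offsetof(MeshVertexLayout, Position)},
        {3, offsetof(MeshVertexLayout, Normal)},
        {2, offsetof(MeshVertexLayout, UV)},
        {3, offsetof(MeshVertexLayout, Tangent)},
        {3, offsetof(MeshVertexLayout, BiTangent)},
    };
    assert(location >= 0 && location < 5);
    return table[location];
}
```

Comparing these values with what BufferElement reports is a quick way to rule the layout out as the culprit.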
Alright, let us all just forget this has happened. This is very embarrassing, everything was working fine after all. I simply "forgot" to call the following before rendering:
So understandably, all kinds of shapes were being rendered and culled in completely random fashion. (Why is it not enabled by default?)

Why is my glBindBufferRange offset alignment incorrect?

I'm having difficulty understanding how the glBindBufferRange offset / alignment works in the Nvidia example project gl_commandlist_basic. I've read that the offset needs to be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, which is 256, and that offset and alignment are very important with glBindBufferRange. I have a working example UBO with mat4/vec4 and a non-working example with mat4/mat3/vec4. The UBO size doesn't add up to a multiple of 256 in either case. I'm trying to send vec4(0.f, 1.f, 0.f, 1.f).
If mat4 = 64 bytes, mat3 = 36 bytes, vec4 = 16 bytes then the working example has 64+16=80 bytes, which isn't a multiple of 256. The non-working example has 64+36+16 = 116 bytes.
NV uses an inline called uboAligned which is defined as
inline size_t uboAligned(size_t size) { return ((size + 255) / 256) * 256; }
Removing this from either the working or the non-working example made no difference.
I assume I need to add some "padding" to the UBO in the form of a float/vec2/vec3/vec4, etc. How do I determine the correct amount of padding I need if I want to use the mat4/mat3/vec4 UBO?
typedef struct
glm::mat4 MM;
// glm::mat3 NM;
glm::vec4 Cs;
} myData0;
GLuint objectUBO;
glCreateBuffers(1, &objectUBO);
glNamedBufferData(objectUBO, uboAligned(sizeof(abjObjectData) * 2), 0, GL_STATIC_DRAW); //
for (unsigned int i = 0; i < allObj.size(); ++i)
myData0 myDataTemp;
myDataTemp.Cs = glm::vec4(0.f, 1.f, 0.f, 1.f);
glNamedBufferSubData(objectUBO, sizeof(abjObjectData) * i, sizeof(abjObjectData), &objDataInit);
//hot loop
for (unsigned int i = 0; i < allObj.size(); ++i)
glBindBufferRange(GL_UNIFORM_BUFFER, 1, objectUBO, uboAligned(sizeof(abjObjectData)) * i, sizeof(abjObjectData));
/* HW */
out vec4 Ci;
struct ObjectData
mat4 MM;
// mat3 NM;
vec4 Cs;
layout (std140, binding = 1) uniform objectBuffer { ObjectData object; };
void main()
Ci = object.Cs;
Simple typo with glNamedBufferData. Changing from
glNamedBufferData(objectUBO, uboAligned(sizeof(abjObjectData) * 2), 0, GL_STATIC_DRAW);
to
glNamedBufferData(objectUBO, uboAligned(sizeof(abjObjectData)) * 2, 0, GL_STATIC_DRAW);
fixes the offset / alignment problems.
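The arithmetic behind that typo can be sketched as follows. ObjectData here is a plain-float stand-in for the 80-byte mat4/vec4 struct, and the helper names are made up for illustration; 256 is the alignment reported in the question and should really be queried with glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, ...).

```cpp
#include <cassert>
#include <cstddef>

inline std::size_t uboAligned(std::size_t size) { return ((size + 255) / 256) * 256; }

// Stand-in for the mat4 + vec4 struct: 64 + 16 = 80 bytes.
struct ObjectData {
    float MM[16];
    float Cs[4];
};

// Buggy allocation: aligns the total, so two objects share one 256-byte slot.
inline std::size_t bufferSizeBuggy(std::size_t count) {
    assert(count > 0);
    return uboAligned(sizeof(ObjectData) * count);
}

// Fixed allocation: one aligned slot per object.
inline std::size_t bufferSizeFixed(std::size_t count) {
    assert(count > 0);
    return uboAligned(sizeof(ObjectData)) * count;
}

// Last byte (exclusive) touched by glBindBufferRange for object i.
inline std::size_t rangeEnd(std::size_t i) {
    return uboAligned(sizeof(ObjectData)) * i + sizeof(ObjectData);
}
```

With two objects the buggy size is 256 bytes, but the range bound for the second object ends at byte 336, past the end of the buffer. The fixed size is 512 bytes, which covers it.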
OpenGL uses a particular alignment. Assuming you are using the std140 layout, this, for example, is a structure defined in C++:
struct PointLight
{
    glm::vec3 position;
    int padding; // needed for alignment
    glm::vec3 color; // because a vec3 has to be aligned to 16 bytes
    float intensity; // no padding needed: a float can follow the vec3 directly
};
You can pass it to a uniform block like this in the shader:
uniform light
{
    vec3 Position;
    vec3 Color;
    float Intensity;
} PointLight;
I would test something like:
struct ObjectData
{
    mat4 MM;
    mat3 NM;
    vec3 padding; // I think you have to add 3 floats of padding
    vec4 Cs;
};
But I couldn't find more information on it, and I don't remember where I found it in the first place.
There are different memory layouts, though, and you can check them: the spec defines the std140 memory layout here, and I advise you to use it. If you don't, the shared layout is used and you have to query the layout. You can use OpenGL to query the layout and find out what padding to add: BlockLayoutQuery
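As far as I understand the std140 rules, each column of a mat3 is stored with the alignment of a vec4, so a mat3 occupies 48 bytes in the block even though a tightly packed 3x3 float matrix is 36 bytes on the C++ side. A C++ mirror of the mat4/mat3/vec4 block could therefore look like the sketch below. ObjectDataStd140 and packMat3 are illustrative names, and the input matrix is assumed to be column-major.

```cpp
#include <cassert>
#include <cstddef>

// C++ mirror of: mat4 MM; mat3 NM; vec4 Cs; under std140.
struct alignas(16) ObjectDataStd140 {
    float MM[16];    // mat4: offset 0, 64 bytes
    float NM[3][4];  // mat3: offset 64, three columns padded to vec4 (48 bytes)
    float Cs[4];     // vec4: offset 112, 16 bytes
};

static_assert(offsetof(ObjectDataStd140, NM) == 64, "mat3 follows the mat4");
static_assert(offsetof(ObjectDataStd140, Cs) == 112, "vec4 follows the padded mat3");
static_assert(sizeof(ObjectDataStd140) == 128, "whole block is 128 bytes");

// Copies a tightly packed column-major 3x3 matrix into padded std140 columns.
inline void packMat3(const float (&src)[9], float (&dst)[3][4]) {
    for (int col = 0; col < 3; ++col) {
        for (int row = 0; row < 3; ++row) {
            dst[col][row] = src[col * 3 + row];
        }
        dst[col][3] = 0.0f; // padding component
    }
}
```

If in doubt, querying the block with glGetActiveUniformsiv and GL_UNIFORM_OFFSET shows the offsets the driver actually uses.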
Concerning glBindBufferRange, I've never heard about the 256-byte alignment. Here is an example of how I use it:
const int pointLightCount = SomeNumber;
int pointLightBufferSize = sizeof(PointLight) * pointLightCount + sizeof(int) * 4;
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo[0]);
glBufferData(GL_SHADER_STORAGE_BUFFER, pointLightBufferSize, 0, GL_DYNAMIC_COPY);
void * lightBuffer = glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_WRITE_ONLY);
((int*) lightBuffer)[0] = pointLightCount;
for (int i = 0; i < pointLightCount; ++i) {
PointLight p = { something };
((PointLight*) ((int*) lightBuffer + 4))[i] = p;
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 0, ssbo[0], 0, pointLightBufferSize); //Bind all the buffer

Emulating Direct3D9 fixed function pipeline in Direct3D11

Our software currently uses the fixed function pipeline in Direct3D9 to offer our users an easily scriptable way of throwing lights and objects into a simple scene. We allow directional, point and spot lights. I'm trying to move us over to Direct3D11, but I want it to be as close to the Direct3D9 fixed function pipeline as possible as a baseline. We can add stuff like per-pixel lighting later. I'm brand new to shader coding and even though I've been doing C++ for years, I feel out of my element. I'm wondering if there is a DX11 supported shader out there which perfectly emulates the lighting offered by DX9 fixed function pipeline. I have seen the old point-light emulation example in FixedFuncEMUFX11, and I'll be trying to use that as a base, but I just know I'm not doing this as efficiently as it could be, and the math for proper spot light emulation is beyond me. Is there any open source out there which emulates the fixed function pipeline, or would contain directional, point, and spot light implementations for DirectX11/10? I think all I'd need are the .fx or .hlsl files. Even if it tops out at 8 light sources, that'd be fine.
I'm currently using DirectXTK as a guide to proper DirectX11 programming, but if there is a better standard to code to, I'd love to see an example of industry standard DirectX11 rendering engine programming methodology. Thank you for any advice you can give.
For basic rendering with Direct3D 11, the DirectX Tool Kit built-in shaders are based on XNA Game Studio 4, which provides a good set of basic features including directional lighting and per-pixel lighting. They are designed to work with all Direct3D Feature Level hardware, so they don't implement things like spotlights, which are easily done with more modern hardware.
The FixedFuncEMU11 sample is a Direct3D 11 port of the legacy DirectX SDK's FixedFuncEMU Direct3D 10 sample. The shaders are useful for understanding the various Direct3D 9 specific fixed-function pipeline features, but they don't cover 'standard' stuff like implementing standard lighting models in HLSL. Also note that this sample uses the Effects system for Direct3D 11, which has its own issues. Still, it's useful for seeing:
Fixed-function Transformation Pipeline
Fixed-function Lighting Pipeline
User Clip Planes
Pixel Fog
Gouraud and Flat shade modes
Projected texture lookups (texldp)
D3DFILL_POINT fillmode
Screen space UI rendering
You might want to try some of the old Direct3D 9 era introductions to HLSL shader programming. While not 100% compatible with Direct3D 11, they are pretty close and HLSL itself is basically the same. I found this article for example.
There are also a number of excellent Direct3D 11 books all of which cover HLSL shaders since there's no fixed-function pipeline in Direct3D 11. See Book Recommendations for some details and notes as some of those books were written before the DirectX SDK was retired. In particular, you should look at Real-Time 3D Rendering with DirectX and HLSL by Varcholik as it's heavily focused on HLSL authoring.
For anyone who wound up here by searching for a Fixed Function Pipeline emulation in hlsl, this is pretty much exactly what I was looking for: The shader in there is not very optimized, but is written for clarity and a good introduction for beginners to hlsl. There is very little extra cruft to sift through in order to see exactly what is going on and the bare minimum to get your scene running. The spotlight isn't an exact replication of DX9's FFP spotlight, but it is easily modified to become such.
#define MAX_LIGHTS 8
// Light types.
#define POINT_LIGHT 1
#define SPOT_LIGHT 2
Texture2D Texture : register(t0);
sampler Sampler : register(s0);
struct _Material
float4 Emissive; // 16 bytes
//----------------------------------- (16 byte boundary)
float4 Ambient; // 16 bytes
//------------------------------------(16 byte boundary)
float4 Diffuse; // 16 bytes
//----------------------------------- (16 byte boundary)
float4 Specular; // 16 bytes
//----------------------------------- (16 byte boundary)
float SpecularPower; // 4 bytes
bool UseTexture; // 4 bytes
float2 Padding; // 8 bytes
//----------------------------------- (16 byte boundary)
}; // Total: // 80 bytes ( 5 * 16 )
cbuffer MaterialProperties : register(b0)
_Material Material;
struct Light
float4 Position; // 16 bytes
//----------------------------------- (16 byte boundary)
float4 Direction; // 16 bytes
//----------------------------------- (16 byte boundary)
float4 Color; // 16 bytes
//----------------------------------- (16 byte boundary)
float SpotAngle; // 4 bytes
float ConstantAttenuation; // 4 bytes
float LinearAttenuation; // 4 bytes
float QuadraticAttenuation; // 4 bytes
//----------------------------------- (16 byte boundary)
int LightType; // 4 bytes
bool Enabled; // 4 bytes
int2 Padding; // 8 bytes
//----------------------------------- (16 byte boundary)
}; // Total: // 80 bytes (5 * 16 byte boundary)
cbuffer LightProperties : register(b1)
float4 EyePosition; // 16 bytes
//----------------------------------- (16 byte boundary)
float4 GlobalAmbient; // 16 bytes
//----------------------------------- (16 byte boundary)
Light Lights[MAX_LIGHTS]; // 80 * 8 = 640 bytes
}; // Total: // 672 bytes (42 * 16 byte boundary)
float4 DoDiffuse( Light light, float3 L, float3 N )
float NdotL = max( 0, dot( N, L ) );
return light.Color * NdotL;
float4 DoSpecular( Light light, float3 V, float3 L, float3 N )
// Phong lighting.
float3 R = normalize( reflect( -L, N ) );
float RdotV = max( 0, dot( R, V ) );
// Blinn-Phong lighting
float3 H = normalize( L + V );
float NdotH = max( 0, dot( N, H ) );
return light.Color * pow( RdotV, Material.SpecularPower );
float DoAttenuation( Light light, float d )
return 1.0f / ( light.ConstantAttenuation + light.LinearAttenuation * d + light.QuadraticAttenuation * d * d );
struct LightingResult
float4 Diffuse;
float4 Specular;
LightingResult DoPointLight( Light light, float3 V, float4 P, float3 N )
LightingResult result;
float3 L = ( light.Position - P ).xyz;
float distance = length(L);
L = L / distance;
float attenuation = DoAttenuation( light, distance );
result.Diffuse = DoDiffuse( light, L, N ) * attenuation;
result.Specular = DoSpecular( light, V, L, N ) * attenuation;
return result;
LightingResult DoDirectionalLight( Light light, float3 V, float4 P, float3 N )
LightingResult result;
float3 L = -light.Direction.xyz;
result.Diffuse = DoDiffuse( light, L, N );
result.Specular = DoSpecular( light, V, L, N );
return result;
float DoSpotCone( Light light, float3 L )
float spotMinAngle = cos( light.SpotAngle );
float spotMaxAngle = ( spotMinAngle + 1.0f ) / 2.0f;
float cosAngle = dot( light.Direction.xyz, L );
return smoothstep( spotMinAngle, spotMaxAngle, cosAngle );
LightingResult DoSpotLight( Light light, float3 V, float4 P, float3 N )
LightingResult result;
float3 L = ( light.Position - P ).xyz;
float distance = length(L);
L = L / distance;
float attenuation = DoAttenuation( light, distance );
float spotIntensity = DoSpotCone( light, -L );
result.Diffuse = DoDiffuse( light, L, N ) * attenuation * spotIntensity;
result.Specular = DoSpecular( light, V, L, N ) * attenuation * spotIntensity;
return result;
LightingResult ComputeLighting( float4 P, float3 N )
float3 V = normalize( EyePosition - P ).xyz;
LightingResult totalResult = { {0, 0, 0, 0}, {0, 0, 0, 0} };
for( int i = 0; i < MAX_LIGHTS; ++i )
LightingResult result = { {0, 0, 0, 0}, {0, 0, 0, 0} };
if ( !Lights[i].Enabled ) continue;
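switch( Lights[i].LightType )
{
case POINT_LIGHT:
    result = DoPointLight( Lights[i], V, P, N );
    break;
case SPOT_LIGHT:
    result = DoSpotLight( Lights[i], V, P, N );
    break;
default: // any other light type is treated as a directional light
    result = DoDirectionalLight( Lights[i], V, P, N );
    break;
}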
totalResult.Diffuse += result.Diffuse;
totalResult.Specular += result.Specular;
totalResult.Diffuse = saturate(totalResult.Diffuse);
totalResult.Specular = saturate(totalResult.Specular);
return totalResult;
struct PixelShaderInput
float4 PositionWS : TEXCOORD1;
float3 NormalWS : TEXCOORD2;
float2 TexCoord : TEXCOORD0;
float4 TexturedLitPixelShader( PixelShaderInput IN ) : SV_TARGET
LightingResult lit = ComputeLighting( IN.PositionWS, normalize(IN.NormalWS) );
float4 emissive = Material.Emissive;
float4 ambient = Material.Ambient * GlobalAmbient;
float4 diffuse = Material.Diffuse * lit.Diffuse;
float4 specular = Material.Specular * lit.Specular;
float4 texColor = { 1, 1, 1, 1 };
if ( Material.UseTexture )
texColor = Texture.Sample( Sampler, IN.TexCoord );
float4 finalColor = ( emissive + ambient + diffuse + specular ) * texColor;
return finalColor;
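For anyone who wants to sanity-check the numbers on the CPU, the attenuation and spot cone terms of the shader above can be mirrored in C++. This is only a sketch of what DoAttenuation and DoSpotCone compute, not of the exact DX9 fixed-function spotlight; the function names and the smoothStep helper are my own.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Mirrors DoAttenuation: 1 / (c + l*d + q*d^2).
inline float attenuation(float constant, float linear, float quadratic, float d) {
    const float denom = constant + linear * d + quadratic * d * d;
    assert(denom > 0.0f);
    return 1.0f / denom;
}

// Same formula as the HLSL smoothstep intrinsic.
inline float smoothStep(float edge0, float edge1, float x) {
    const float t = std::min(std::max((x - edge0) / (edge1 - edge0), 0.0f), 1.0f);
    return t * t * (3.0f - 2.0f * t);
}

// Mirrors DoSpotCone: cosAngle is dot(lightDirection, -L).
inline float spotCone(float spotAngleRadians, float cosAngle) {
    const float spotMinAngle = std::cos(spotAngleRadians);
    const float spotMaxAngle = (spotMinAngle + 1.0f) / 2.0f;
    return smoothStep(spotMinAngle, spotMaxAngle, cosAngle);
}
```

A point straight down the spotlight axis (cosAngle of 1) gets full intensity, and anything outside the cone angle gets zero, with a smooth falloff in between.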

HLSL: packing error?

I am passing in a constant buffer with the following layout:
struct
{
    float spread;
    D2D1_POINT_2F dimension;
    D2D1_POINT_2F dimension2;
} m_constants;
For debugging's sake, dimension and dimension2 have the same values.
In the shader I have:
cbuffer constants
float spread;
float2 dimension;
float2 dimension2;
float4 main(
float4 pos : SV_POSITION,
float4 posScene : SCENE_POSITION,
float4 uv0 : TEXCOORD0
) : SV_Target
float width = dimension.x;
float height = dimension.y;
float2 uv2 = float2(posScene.x / width, posScene.y / height);
color.rgb = float3(uv2.xy, 0);
return color;
In theory, this should output a gradient with green at the bottom left and red at the top right. And it does.
But if I change the shader so that width and height use dimension2 instead, I get a horizontal gradient from green on the left to yellow on the right.
Why is that? Both dimensions have the same value when I pass m_constants to the shader.
Constant buffer data is packed into 16-byte registers by default, and a variable cannot straddle a 16-byte boundary, so this means:
cbuffer constants
float spread;
float2 dimension;
float2 dimension2;
will be
cbuffer constants
{
    float spread;      // offset 0, 4 bytes
    float2 dimension;  // offset 4, 8 bytes (ends at 12)
    float dummy;       // 12 + 8 = 20 would cross the 16-byte boundary, so a dummy 4-byte element is added
    float2 dimension2; // offset 16
};
here is a link that describes this.
So a better way to arrange your structure would be:
D2D1_POINT_2F dimension;
D2D1_POINT_2F dimension2;
float spread;
} m_constants;
and modify the hlsl counterpart accordingly:
cbuffer constants
float2 dimension;
float2 dimension2;
float spread; // No more 16 bytes crossing problem
Another way, which keeps the initial member order, is to change the declaration on the C++ side like this:
#pragma pack(push)
#pragma pack(16)
float spread;
D2D1_POINT_2F dimension;
D2D1_POINT_2F dimension2;
} m_constants;
#pragma pack(pop)
The intent is to force the structure to be 16-byte aligned. Note, though, that #pragma pack only sets an upper bound on member alignment, so verify the resulting offsets (for example with offsetof) rather than relying on it to insert the padding.
You can also use the /Zp16 compiler flag, but that then applies to every structure in your program (which is not always desirable). In Visual Studio, go to Project Properties -> C/C++ -> Code Generation, where the "Struct Member Alignment" option lets you set it.
You can also use packoffset on the HLSL side, but then the C++ layout needs to match the packed HLSL one (which means you keep the same order in the HLSL constant buffer, but still have to modify the C++ version).
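If you prefer to keep the original member order, explicit padding on the C++ side is another option. The sketch below is an illustration under a few assumptions: Point2F is a stand-in for D2D1_POINT_2F (two floats), cbufferOffset is a made-up helper that applies the "do not straddle a 16-byte register" rule to scalars and vectors only, and the struct names are mine.

```cpp
#include <cassert>
#include <cstddef>

// Where HLSL would place a scalar/vector of `size` bytes given the current offset.
inline std::size_t cbufferOffset(std::size_t cursor, std::size_t size) {
    assert(size > 0 && size <= 16);
    const std::size_t usedInRegister = cursor % 16;
    if (usedInRegister + size > 16) {
        cursor += 16 - usedInRegister; // skip to the next 16-byte register
    }
    return cursor;
}

struct Point2F { float x, y; }; // stand-in for D2D1_POINT_2F

// Layout from the question: dimension2 lands at byte 12 on the C++ side.
struct ConstantsNaive {
    float spread;
    Point2F dimension;
    Point2F dimension2;
};

// Same order, with the padding HLSL inserts made explicit.
struct ConstantsPadded {
    float spread;
    Point2F dimension;
    float pad0;          // keeps dimension2 from straddling the boundary
    Point2F dimension2;
    float pad1[2];       // rounds the size up to a multiple of 16
};

static_assert(offsetof(ConstantsNaive, dimension2) == 12, "mismatch with the shader");
static_assert(offsetof(ConstantsPadded, dimension2) == 16, "matches the shader");
static_assert(sizeof(ConstantsPadded) % 16 == 0, "constant buffer sizes are multiples of 16");
```

The static_asserts document the expected offsets, so a later change to the struct that breaks the match fails at compile time instead of producing a wrong gradient.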

Why won't this GLSL Vertex Shader compile?

I'm writing my own shader with OpenGL, and I am stumped why this shader won't compile. Could anyone else have a look at it?
What I'm passing in as a vertex is 2 floats (separated as bytes) in this format:
Float 1:
Byte 1: Position X
Byte 2: Position Y
Byte 3: Position Z
Byte 4: Texture Coordinate X
Float 2:
Byte 1: Color R
Byte 2: Color G
Byte 3: Color B
Byte 4: Texture Coordinate Y
And this is my shader:
in vec2 Data;
varying vec3 Color;
varying vec2 TextureCoords;
uniform mat4 projection_mat;
uniform mat4 view_mat;
uniform mat4 world_mat;
void main()
vec4 dataPosition = UnpackValues(Data.x);
vec4 dataColor = UnpackValues(Data.y);
vec4 position = dataPosition * vec4(1.0, 1.0, 1.0, 0.0);
Color = dataColor.xyz;
TextureCoords = vec2(dataPosition.w, dataColor.w)
gl_Position = projection_mat * view_mat * world_mat * position;
vec4 UnpackValues(float value)
return vec4(value % 255, (value >> 8) % 255, (value >> 16) % 255, value >> 24);
If you need any more information, I'd be happy to comply.
You need to declare UnpackValues before you call it. GLSL is like C and C++; names must have a declaration before they can be used.
BTW: What you're trying to do will not work. Floats are floats; unless you're working with GLSL 4.00 (and since you continue to use old terms like "varying", I'm guessing not), you cannot extract bits out of a float. Indeed, the right-shift operator is only defined for integers; attempting to use it on floats will fail with a compiler error.
GLSL is not C or C++ (ironically).
If you want to pack your data, use OpenGL to pack it for you. Send two vec4 attributes that contain normalized unsigned bytes:
glVertexAttribPointer(X, 4, GL_UNSIGNED_BYTE, GL_TRUE, 8, *);
glVertexAttribPointer(Y, 4, GL_UNSIGNED_BYTE, GL_TRUE, 8, * + 4);
Your vertex shader would take two vec4 values as inputs. Since you are not using glVertexAttribIPointer, OpenGL knows that you're passing values that are to be interpreted as floats.
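On the application side, the 8-byte vertex from the question could be described with a struct like the one below. This is a sketch with made-up names (PackedVertex, normalizedByte, extractByte); normalizedByte shows the conversion that GL_TRUE requests for unsigned bytes, and extractByte shows integer byte extraction with a shift and a mask of 0xFF (rather than a modulo by 255).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One vertex: two groups of four bytes, matching the two attribute pointers above.
struct PackedVertex {
    std::uint8_t position[3];
    std::uint8_t texCoordX;
    std::uint8_t color[3];
    std::uint8_t texCoordY;
};

static_assert(sizeof(PackedVertex) == 8, "stride passed to glVertexAttribPointer");
static_assert(offsetof(PackedVertex, color) == 4, "offset of the second attribute");

// What the shader receives for a normalized GL_UNSIGNED_BYTE component.
inline float normalizedByte(std::uint8_t value) {
    return static_cast<float>(value) / 255.0f;
}

// Extracts byte `index` (0 = lowest) from a 32-bit integer.
inline std::uint8_t extractByte(std::uint32_t packed, unsigned index) {
    assert(index < 4);
    return static_cast<std::uint8_t>((packed >> (8u * index)) & 0xFFu);
}
```

Because the values arrive normalized to the range 0 to 1, the shader would need to scale positions back up if they are meant to be in the 0 to 255 range.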