Misaligned address using virtual inheritance - C++

The following apparently valid code produces a misaligned-address runtime error under UndefinedBehaviorSanitizer.
#include <memory>
#include <functional>

struct A {
    std::function<void()> data; // seems to occur only if data is a std::function
};

struct B {
    char data; // occurs only if B contains a member variable
};

struct C : public virtual A, public B {
};

struct D : public virtual C {
};

void test() {
    std::make_shared<D>();
}

int main() {
    test();
    return 0;
}
Compiling and executing on a MacBook with
clang++ -fsanitize=undefined --std=c++11 ./test.cpp && ./a.out
produces the output
runtime error: constructor call on misaligned address 0x7fe584500028 for type 'C', which requires 16 byte alignment [...].
I would like to understand how and why the error occurs.

Since the alignment of std::function<void()> is 16 and its size is 48, let's simplify. This code has the same behavior but is easier to understand:
struct alignas(16) A
{ char data[48]; };
struct B
{ char data; };
struct C : public virtual A, public B
{};
struct D : public virtual C
{};
int main()
{
    D();
}
We have the following alignments and sizes:
                   |  A |  B |  C |  D |
alignment (bytes): | 16 |  1 | 16 | 16 |
size (bytes):      | 48 |  1 | 64 | 80 |
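These figures can be reproduced with a quick probe (a minimal sketch; the exact numbers are ABI-specific, here matching clang on x86-64):
#include <iostream>

struct alignas(16) A { char data[48]; };
struct B { char data; };
struct C : public virtual A, public B {};
struct D : public virtual C {};

int main() {
    // prints the alignment/size table above on clang/x86-64
    std::cout << "A: align " << alignof(A) << " size " << sizeof(A) << '\n'
              << "B: align " << alignof(B) << " size " << sizeof(B) << '\n'
              << "C: align " << alignof(C) << " size " << sizeof(C) << '\n'
              << "D: align " << alignof(D) << " size " << sizeof(D) << '\n';
}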
Now let's see what this looks like in memory. More explanation of that can be found in this great answer.
A: char[48], no padding == 48B
B: char[1], no padding == 1B
C: hidden pointer for the virtual base A (8B) + B (1B) + 7B of padding (so A lands on a 16-byte boundary) + A (48B) == 64B
D: hidden pointer for the virtual base C (8B) + C (64B, at offset 8) + 8B of tail padding (size rounded up to the 16-byte alignment) == 80B
Now it is easy to see that the offset of C inside D is 8 bytes, while C must be aligned to 16. Hence the error, which is helpfully accompanied by this pseudo-graphic:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
Here each zero is 1 byte.
UPDATE:
Where and how to place padding is up to the C++ compiler; the standard does not specify it. Given the padding it has to work with, clang is apparently unable to align everything inside D. One way to mitigate the misalignment is to design your classes carefully so that they all share the same alignment (e.g., 8 bytes).
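For instance, in the simplified model above, dropping A's alignment requirement to 8 bytes should make the diagnostic disappear, because the offset-8 placement of C inside D then satisfies C's alignment (a hedged sketch of the mitigation, not a general fix for the std::function case):
struct alignas(8) A2 { char data[48]; };
struct B2 { char data; };
struct C2 : public virtual A2, public B2 {};
struct D2 : public virtual C2 {};
// Constructing D2 should no longer trip UBSan's misaligned-address check,
// since every subobject now only requires 8-byte alignment.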

Related

Reordering bit-fields mysteriously changes size of struct

For some reason I have a struct that needs to keep track of 56 bits of information ordered as 4 packs of 12 bits and 2 packs of 4 bits. This comes out to 7 bytes of information total.
I tried a bit field like so
struct foo {
    uint16_t R : 12;
    uint16_t G : 12;
    uint16_t B : 12;
    uint16_t A : 12;
    uint8_t  X : 4;
    uint8_t  Y : 4;
};
and was surprised to see sizeof(foo) evaluate to 10 on my machine (a Linux x86_64 box) with g++ version 12.1. I tried reordering the fields like so
struct foo2 {
    uint8_t  X : 4;
    uint16_t R : 12;
    uint16_t G : 12;
    uint16_t B : 12;
    uint16_t A : 12;
    uint8_t  Y : 4;
};
and was surprised to find that the size is now 8 bytes, which is what I originally expected. It's the same size as the structure I expected the first ordering to effectively produce:
struct baseline {
    uint16_t first;
    uint16_t second;
    uint16_t third;
    uint8_t  single;
};
I am aware of size and alignment and structure packing, but I am really stumped as to why the first ordering adds 2 extra bytes. There is no reason to add more than one byte of padding, since the 56 bits I requested fit exactly in 7 bytes.
Minimal working example: try it on Wandbox.
What am I missing?
PS: none of this changes if we change uint8_t to uint16_t
If we create an instance of struct foo, zero it out, set all the bits in one field, and print the bytes, doing this for each field in turn, we see the following:
R: ff 0f 00 00 00 00 00 00 00 00
G: 00 00 ff 0f 00 00 00 00 00 00
B: 00 00 00 00 ff 0f 00 00 00 00
A: 00 00 00 00 00 00 ff 0f 00 00
X: 00 00 00 00 00 00 00 f0 00 00
Y: 00 00 00 00 00 00 00 00 0f 00
So what appears to be happening is that each 12-bit field starts in a new 16-bit storage unit. The first 4-bit field then fills out the remaining bits of the prior 16-bit unit, and the last field takes up 4 bits of the final unit. This occupies 9 bytes, and since the largest field (in this case a 16-bit bit-field storage unit) is 2 bytes wide, one byte of padding is added at the end.
So it appears that a 12-bit field with a 16-bit base type is kept within a single 16-bit storage unit instead of being split across multiple storage units.
If we do the same for the modified struct:
X: 0f 00 00 00 00 00 00 00
R: f0 ff 00 00 00 00 00 00
G: 00 00 ff 0f 00 00 00 00
B: 00 00 00 00 ff 0f 00 00
A: 00 00 00 00 00 00 ff 0f
Y: 00 00 00 00 00 00 00 f0
We see that X takes up 4 bits of the first 16 bit storage unit, then R takes up the remaining 12 bits. The rest of the fields fill out as before. This results in 8 bytes being used, and so requires no additional padding.
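For reference, here is a sketch of the experiment described above (the dump helper and the lambdas are mine, not the question's; the output format matches the dumps shown):
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

struct foo {
    uint16_t R : 12;
    uint16_t G : 12;
    uint16_t B : 12;
    uint16_t A : 12;
    uint8_t  X : 4;
    uint8_t  Y : 4;
};

// Zero an instance, set all bits of one field, then print every byte.
template <class S, class F>
void dump(const char *name, F set) {
    S s;
    std::memset(&s, 0, sizeof s); // zero everything, padding included
    set(s);
    std::printf("%s:", name);
    const unsigned char *p = reinterpret_cast<const unsigned char *>(&s);
    for (std::size_t i = 0; i < sizeof s; ++i)
        std::printf(" %02x", p[i]);
    std::printf("\n");
}

int main() {
    dump<foo>("R", [](foo &f) { f.R = 0xfff; });
    dump<foo>("G", [](foo &f) { f.G = 0xfff; });
    dump<foo>("B", [](foo &f) { f.B = 0xfff; });
    dump<foo>("A", [](foo &f) { f.A = 0xfff; });
    dump<foo>("X", [](foo &f) { f.X = 0xf; });
    dump<foo>("Y", [](foo &f) { f.Y = 0xf; });
}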
While the exact details of the ordering of bit-fields are implementation-defined, the C standard does set a few rules.
From section 6.7.2.1p11:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
And 6.7.2.1p15:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared.
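Given these rules, one implementation-dependent way to pack all 56 bits into 8 bytes without reordering is to declare every field with a 64-bit base type, so that g++ allocates a single 8-byte storage unit and no field has to straddle a 16-bit boundary (a sketch; the standard leaves the storage-unit choice to the implementation):
struct foo3 {
    uint64_t R : 12;
    uint64_t G : 12;
    uint64_t B : 12;
    uint64_t A : 12;
    uint64_t X : 4;
    uint64_t Y : 4;
};
// On g++/x86-64 all 56 bits share one 64-bit unit:
static_assert(sizeof(foo3) == 8, "one 64-bit storage unit");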

C++ structure/array initialization

I have a C++ array or structure initialization issue that I have not been able to resolve.
I have a 4-level nested structure; each level is the same 48 bytes wrapped in the next level up. The issue: when the structure is declared as a scalar, it is correctly initialized with the provided values, but when it is declared as a single-element array, all 48 bytes become zeros, as shown below. Unfortunately the full structures are too complicated to paste here.
If I define 4 simple structures, one containing another, with the innermost one containing the same 12 unsigned integers, then it is initialized correctly, even if it is declared in an array.
Has anyone experienced similar issues? What am I missing? What compiler flags, options, etc could lead to such a problem? Appreciate any comments and help.
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "bls12_381/fq.hpp"
static constexpr embedded_pairing::bls12_381::Fq scalar = {
{{{.std_words = {0x1c7238e5, 0xcf1c38e3, 0x786f0c70, 0x1616ec6e, 0x3a6691ae, 0x21537e29,
0x4d9e82ef, 0xa628f1cb, 0x2e5a7ddf, 0xa68a205b, 0x47085aba, 0xcd91de45}}}}
};
static constexpr embedded_pairing::bls12_381::Fq array[1] = {
{{{{.std_words = {0x1c7238e5, 0xcf1c38e3, 0x786f0c70, 0x1616ec6e, 0x3a6691ae, 0x21537e29,
0x4d9e82ef, 0xa628f1cb, 0x2e5a7ddf, 0xa68a205b, 0x47085aba, 0xcd91de45}}}}}
};
void print_struct(const char *title, const uint8_t *cbuf, int len)
{
    printf("\n");
    printf("[%s] %d\n", title, len);
    for (int i = 0; i < len; i++) {
        if (i % 30 == 0 && i != 0)
            printf("\n");
        else if ((i % 10 == 0 || i % 20 == 0) && i != 0)
            printf(" ");
        printf("%02X ", cbuf[i]);
    }
    printf("\n");
}

void run_tests()
{
    print_struct("scalar", (const uint8_t *) &scalar, sizeof(scalar));
    print_struct("array", (const uint8_t *) &array[0], sizeof(array[0]));
}
[scalar] 48
E5 38 72 1C E3 38 1C CF 70 0C 6F 78 6E EC 16 16 AE 91 66 3A 29 7E 53 21 EF 82 9E 4D CB F1
28 A6 DF 7D 5A 2E 5B 20 8A A6 BA 5A 08 47 45 DE 91 CD
[array] 48
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
I've just narrowed down the example.
The following is a complete, standalone example. I also forgot to mention that on Linux, using g++ 9.3.0 with -std=c++17, the initialization gets the expected result of all FF's. However, on an embedded device, the inherited structure gets all 0's.
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct Data {
    uint32_t words;
};

struct Overlay {
    Data val;
};

struct Inherit : Data {
};

static Overlay overlay[1] = {
    {{.words = 0xffffffff}}
};

static Inherit inherit[1] = {
    {{.words = 0xffffffff}}
};

void print_struct(const char *title, const uint8_t *cbuf, int len)
{
    printf("[%s] %d\n", title, len);
    for (int i = 0; i < len; i++) {
        printf("%02X ", cbuf[i]);
    }
    printf("\n");
}

int main()
{
    print_struct("overlay", (const uint8_t *) &overlay[0], sizeof(overlay[0])); // FF FF FF FF
    print_struct("inherit", (const uint8_t *) &inherit[0], sizeof(inherit[0])); // 00 00 00 00 <-- incorrect?
    return 0;
}
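For what it's worth, the base subobject can also be initialized positionally, without designated initializers (designated initializers are only standard from C++20, and their support on aggregates with base classes varies across toolchains; whether the embedded compiler accepts the designated form is exactly what is in question here). A hedged sketch, valid under -std=c++17 since C++17 aggregates may have base classes:
// outer braces: the array element; middle: Inherit; inner: the Data base
static Inherit inherit2[1] = {
    { { 0xffffffff } }
};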

Read binary file into struct and also problems with endianness

I want to read a binary file image.dd into struct teststruct *test;. Basically there are two problems:
1. Wrong order because of little / big endian.
printf("%02x", test->magic); just gives me 534b554c instead of 4c55b453 (maybe this has something to do with the "main problem" in the next part). Its just "one value". As an example, printf("%c", test->magic); gives L instead of LUKS.
2. No output with test->version.
uint16_t version; in struct teststruct gives no output. That is, I call printf("%x ", test->version); and there is no result at all.
This is exampleh.h which contains struct:
#ifndef _EXAMPLEH_H
#define _EXAMPLEH_H

#define MAGIC_L 6

struct teststruct {
    char magic[MAGIC_L];
    uint16_t version;
};

#endif
This is the main code:
#include <stdint.h>
#include <string.h>
#include <iostream>
#include <fstream>
#include "exampleh.h"

using namespace std;

struct teststruct *test;

int main() {
    FILE *fp = fopen("C:\\image.dd", "rb"); // open file in binary mode
    if (fp == NULL) {
        fprintf(stderr, "Can't read file");
        return 0;
    }
    fread(&test, sizeof(test), 1, fp);
    //printf("%x ", test->magic); //this works, but in the wrong order because of little/big endian
    printf("%x ", test->version); //no output at all
    fclose(fp);
    return 0;
}
And here are the first 112 bytes of image.dd:
4C 55 4B 53 BA BE 00 01 61 65 73 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 78 74 73 2D 70 6C 61 69
6E 36 34 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 73 68 61 32 35 36 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 40
You must allocate the structure and read the data into the structure itself, instead of reading into the pointer. If you are only going to read one structure, you don't need to declare a pointer to it at all.
printf("%x ", test->magic); invokes undefined behavior, because a pointer (automatically converted from the array) is passed where an unsigned int is required.
In this case, the observed behavior came about as follows:
Firstly, fread(&test, sizeof(test), 1, fp); read the first few bytes of the file into the pointer itself.
Then, printf("%02x", test->magic); printed the first 4-byte integer from the file: test->magic converts to a pointer to the array placed at the top of the structure, the address of that array equals the address of the structure itself, and so the value just read from the file was printed. A further stroke of luck is that a 4-byte integer and a pointer happen to be passed as function arguments in the same place.
Finally, you got no output from printf("%x ", test->version); because the address read from the file unfortunately lies in a region that is not readable, and trying to read through it caused a segmentation fault.
Fixed code:
#include <stdint.h>
#include <string.h>
#include <iostream>
#include <fstream>
#include "exampleh.h"

using namespace std;

struct teststruct test; // allocate the structure directly instead of a pointer

int main() {
    FILE *fp = fopen("C:\\image.dd", "rb"); // open file in binary mode
    if (fp == NULL) {
        fprintf(stderr, "Can't read file");
        return 0;
    }
    fread(&test, sizeof(test), 1, fp); // now the structure is read instead of a pointer
    for (int i = 0; i < 6; i++) {
        printf("%02x", (unsigned char)test.magic[i]); // proper combination of format and data
    }
    printf(" ");
    printf("%x ", test.version); // also use . instead of ->
    fclose(fp);
    return 0;
}
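Note that this still leaves problem 1: the LUKS header stores multi-byte integers big-endian, so on a little-endian x86 machine test.version prints as 100 (0x0100) rather than 1 for the on-disk bytes 00 01. A minimal sketch of the conversion (a production version would use ntohs() or check the host byte order):
#include <stdint.h>

// Convert a 16-bit big-endian on-disk value to host order
// (assumes a little-endian host; on a big-endian host no swap is needed).
static uint16_t be16_to_host(uint16_t v) {
    return (uint16_t)((v >> 8) | (v << 8));
}

// usage, after the fread:
//   printf("%x\n", be16_to_host(test.version)); // bytes 00 01 on disk -> 1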
struct teststruct *test; is a null pointer: it has static storage duration, so it is zero-initialized. You never allocate memory for it, so test->version is UB.
fread(&test, sizeof(test), 1, fp); is also wrong: it reads a pointer's worth of bytes, not the contents of the struct.
An easy fix is to make test a struct teststruct rather than a pointer to one.
#include <stdint.h>
#include <string.h>
#include <iostream>
#include <fstream>
#include "exampleh.h"

using namespace std;

struct teststruct test; // not a pointer anymore

int main() {
    FILE *fp = fopen("C:\\image.dd", "rb"); // open file in binary mode
    if (fp == NULL) {
        fprintf(stderr, "Can't read file");
        return 0;
    }
    fread(&test, sizeof(test), 1, fp);
    //printf("%x ", test.magic); //wrong order because of little/big endian
    printf("%x ", test.version);
    fclose(fp);
    return 0;
}

The size of these structs are different in a file but the same in program memory

Consider the following POD struct:
struct MessageWithArray {
    uint32_t raw;
    uint32_t myArray[10];
    //MessageWithArray() : raw(0), myArray{ 10,20,30,40,50,60,70,80,90,100 } { };
};
Running the following:
#include <type_traits>
#include <iostream>
#include <fstream>
#include <string>

struct MessageWithArray {
    uint32_t raw;
    uint32_t myArray[10];
    //MessageWithArray() : raw(0), myArray{ 10,20,30,40,50,60,70,80,90,100 } { };
};

//https://stackoverflow.com/questions/46108877/exact-definition-of-as-bytes-function
template <class T>
char* as_bytes(T& x) {
    return &reinterpret_cast<char&>(x);
    // or:
    // return reinterpret_cast<char*>(std::addressof(x));
}

int main() {
    MessageWithArray msg = { 0, {0,1,2,3,4,5,6,7,8,9} };
    std::cout << "Size of MessageWithArray struct: " << sizeof(msg) << std::endl;
    std::cout << "Is a POD? " << std::is_pod<MessageWithArray>() << std::endl;
    std::ofstream buffer("message.txt");
    buffer.write(as_bytes(msg), sizeof(msg));
    return 0;
}
Gives the following output:
Size of MessageWithArray struct: 44
Is a POD? 1
A hex dump of the "message.txt" file looks like this:
00 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00
03 00 00 00 04 00 00 00 05 00 00 00 06 00 00 00
07 00 00 00 08 00 00 00 09 00 00 00
Now if I uncomment the constructor (so that MessageWithArray has a zero-argument constructor), MessageWithArray becomes a non-POD struct. Then I use the constructor to initialize instead. This results in the following changes in the code:
....
struct MessageWithArray {
    .....
    MessageWithArray() : raw(0), myArray{ 10,20,30,40,50,60,70,80,90,100 } { };
};
....
int main() {
    MessageWithArray msg;
    ....
}
Running this code, I get:
Size of MessageWithArray struct: 44
Is a POD? 0
A hex dump of the "message.txt" file looks like this:
00 00 00 00 0D 0A 00 00 00 14 00 00 00 1E 00 00
00 28 00 00 00 32 00 00 00 3C 00 00 00 46 00 00
00 50 00 00 00 5A 00 00 00 64 00 00 00
Now, I'm not so interested in the actual hex values; what I'm curious about is why there is one more byte in the non-POD dump than in the POD dump, when sizeof() declares them to be the same number of bytes. Is it possible that, because the constructor makes the struct non-POD, something hidden has been added to the struct? sizeof() should be an accurate compile-time check, correct? Could something be evading measurement by sizeof()?
Specifications: I am running this in an empty project in Visual Studio 2017 version 15.7.5, Microsoft Visual C++ 2017, on a Windows 10 machine.
Intel Core i7-4600M CPU
64-bit Operating System, x64-based processor
EDIT: I decided to initialize the struct to avoid undefined behaviour, and because the question is still valid with the initialization. Initializing it with values containing no 10 (0x0A) preserves the behaviour I observed initially, because the data the array previously held never contained any 10s either (even though it was garbage, and random).
It has nothing to do with POD-ness.
Your ofstream is opened in text mode (rather than binary mode). On Windows this means that \n gets converted to \r\n.
In the second case there happened to be one 0x0A (\n) byte in the struct, which became 0x0D 0x0A (\r\n). That's why you see an extra byte.
Also, using an uninitialized variable in the first case leads to undefined behaviour, which in this case simply didn't manifest itself.
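The fix is to open the stream in binary mode, which disables the newline translation:
std::ofstream buffer("message.txt", std::ios::binary);
buffer.write(as_bytes(msg), sizeof(msg));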
The other answer explains the problem with writing binary data to a stream opened in text mode; however, this code is fundamentally wrong. There is no need to dump anything: the proper way to check that the sizes of these structures are equal is a static_assert:
struct MessageWithArray {
    uint32_t raw;
    uint32_t myArray[10];
};

struct NonPodMessageWithArray {
    uint32_t raw;
    uint32_t myArray[10];
    NonPodMessageWithArray() : raw(0), myArray{ 10,20,30,40,50,60,70,80,90,100 } {}
};

static_assert(sizeof(MessageWithArray) == sizeof(NonPodMessageWithArray));

Disable alignment on a 64-bit structure

I'm trying to align my structure and make it as small as possible using bit fields. I have to send this data back to a client, which will examine the fields to set a few data members.
The size of the structure is indeed the same, but when I set members it does not work at all.
Here's some example code:
#pragma pack(push, 1)
struct PW_INFO
{
    char hash[16];          //Does not matter
    uint32_t number;        //Does not matter
    uint32_t salt_id : 30;  //Position: 0 bits
    uint32_t enc_level : 7; //Position: 30 bits
    uint32_t delta : 27;    //Position: 37 bits
};                          //Total size: 28 bytes
#pragma pack(pop)

void int64shrl(uint64_t& base, uint32_t to_shift, uint32_t position)
{
    uint64_t res = static_cast<uint64_t>(to_shift);
    res = Int64ShllMod32(res, position);
    base |= res;
}

int32_t main()
{
    std::cout << "Size of PW_INFO: " << sizeof(PW_INFO) << "\n"; //Returns 28 as expected (16 + sizeof(uint32_t) + 8)
    PW_INFO pw = { "abc123", 0, 0, 0, 0 };
    pw.enc_level = 105;
    uint64_t base{ 0 };
    &base; //debug purposes
    int64shrl(base, 103, 30);
    return 0;
}
Here's where it gets weird: setting the "enc_level" field (which should start 30 bits into the bit field) yields the following result in memory:
0x003FFB8C 61 62 63 31 32 33 00 00 abc123..
0x003FFB94 00 00 00 00 00 00 00 00 ........
0x003FFB9C 00 00 00 00 00 00 00 00 ........
0x003FFBA4 69 00 00 00 i...
(Only the last 8 bytes are of concern since they represent the bit field.)
But Int64ShllMod32 returns the correct result (the remote client understands it perfectly):
0x003FFB7C 00 00 00 c0 19 00 00 00 ...À....
I'm guessing it has to do with alignment; if so, how would I completely get rid of it? It seems that even though the size is correct, the compiler still aligns the bit-field storage units (despite the 1-byte packing the #pragma directive requests).
More information:
I use Visual Studio 2015 and its compiler.
I am not trying to write these in a different format; the reason I'm asking is that I do NOT want to use my own format. The client reads from 64-bit bit fields everywhere. I don't have access to its source code, but I see a lot of calls to Int64ShrlMod32 (from what I read, this is what the compiler produces when dealing with 8-byte structures).
The actual bit field starts at "salt_id": 30 + 7 + 27 = 64 bits. I hope it is clearer now.
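Since the client expects one contiguous 64-bit bit field, one way to reproduce that layout is to give all three members a 64-bit base type, so that the 30 + 7 + 27 bits share a single 8-byte storage unit. This is a sketch relying on MSVC's rule of starting a new storage unit whenever a field's declared type changes size or the field doesn't fit in the current unit; that rule matches the dump above, where enc_level landed at the start of a fresh unit instead of 30 bits in:
#pragma pack(push, 1)
struct PW_INFO64
{
    char hash[16];
    uint32_t number;
    uint64_t salt_id   : 30; // bits 0..29 of one 64-bit unit
    uint64_t enc_level : 7;  // bits 30..36
    uint64_t delta     : 27; // bits 37..63
};
#pragma pack(pop)

static_assert(sizeof(PW_INFO64) == 28, "16 + 4 + 8 bytes");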