Hi, I'm searching for a way to dump the memory layout of a class/struct/datatype with clang.
I have a simple application based on this tutorial.
I also added this function:
bool VisitFieldDecl(FieldDecl *F)
{
    F->dump();
    std::cerr << F->getQualifiedNameAsString() << " "
              << F->getBitWidthValue(*Context) << std::endl;
    std::cerr << "-----------------------------------------" << std::endl;
    return true;
}
Unfortunately getBitWidthValue also returns zero for my types.
I need the complete memory layout, recursively, for each class and all nested types, including sizes and offsets.
Maybe the AST is the wrong place to look, and I need another hook to start from?
One way would be to use the "AST Record Layout" of a given const clang::CXXRecordDecl* decl in llvm/clang-3.4:
const clang::ASTRecordLayout &typeLayout(
    decl->getASTContext().getASTRecordLayout(decl));
std::cout << "record '" << decl->getQualifiedNameAsString() << "' with "
          << typeLayout.getSize().getQuantity() << " bytes\n";
for (clang::RecordDecl::field_iterator fit = decl->field_begin();
     fit != decl->field_end(); ++fit) {
    const clang::QualType qualType =
        fit->getType().getLocalUnqualifiedType().getCanonicalType();
    // getFieldOffset() returns the offset in bits, hence the division by 8 below
    size_t fieldOffset = typeLayout.getFieldOffset(fit->getFieldIndex());
    std::cout << "member '" << qualType.getAsString() << "' with "
              << fieldOffset / 8 << " bytes offset\n";
}
no warranties: Code copied together, not tested as typed here -- but should work... (tm)
Example:
struct EXAMPLE
{
    char a;
    int b;
    long c;
    long long d;
    float e;
    double f;
};
Output:
record 'EXAMPLE' with 40 bytes
member 'char' with 0 bytes offset
member 'int' with 4 bytes offset
member 'long' with 8 bytes offset
member 'long long' with 16 bytes offset
member 'float' with 24 bytes offset
member 'double' with 32 bytes offset
For more, see:
https://clang.llvm.org/doxygen/classclang_1_1CXXRecordDecl.html
https://clang.llvm.org/doxygen/classclang_1_1ASTRecordLayout.html
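Building on the snippet above, here is a sketch of the recursive dump the question asks for (same clang 3.4-era API, same caveats: copied together, untested as typed):
void dumpLayout(const clang::CXXRecordDecl *decl, unsigned indent = 0)
{
    if (!decl->hasDefinition()) // a layout only exists for complete types
        return;
    const clang::ASTRecordLayout &layout =
        decl->getASTContext().getASTRecordLayout(decl);
    const std::string pad(indent, ' ');
    std::cout << pad << "record '" << decl->getQualifiedNameAsString()
              << "' with " << layout.getSize().getQuantity() << " bytes\n";
    for (clang::RecordDecl::field_iterator fit = decl->field_begin();
         fit != decl->field_end(); ++fit) {
        const clang::QualType type = fit->getType().getCanonicalType();
        uint64_t offsetBits = layout.getFieldOffset(fit->getFieldIndex());
        std::cout << pad << "  member '" << type.getAsString() << "' at "
                  << offsetBits / 8 << " bytes offset\n";
        // descend into nested record types (members held by value)
        if (const clang::CXXRecordDecl *nested = type->getAsCXXRecordDecl())
            dumpLayout(nested, indent + 2);
    }
}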
I do not know what to put in the 'Title' box, so the title may not describe my problem completely; I am sorry about that.
First of all, I would like to give a bit of context: I started learning network programming two weeks ago, for a 3D game.
Today I am focusing on sending packets, using std::vector to transmit models such as .obj files, but that is not my problem.
The problem is that I am not receiving the contents of this vector.
As a picture is worth a thousand words, here is my code (it just tests encoding the data into a char[] on the 'server' side and reading it back on the client side).
My C++ program:
#include <vector>
#include <iostream>
#include <cstring> // memcpy
#include <cstdlib> // system

int main()
{
    /* -- server -- */
    // variables to buffer
    std::vector<int> vec = { 1, 25, 156, 0, 1 };
    short type = 25;
    int written = 0;
    char buffer[256] = {};
    memcpy(&buffer[written], &type, sizeof(type));
    written += sizeof(type);
    memcpy(&buffer[written], &vec, sizeof(vec));
    written += sizeof(vec);
    std::cout << "/* -- Server -- */\n" << std::endl;
    std::cout << "[written] --> " << "Size : " << sizeof(written) << " | Value : " << written << std::endl;
    std::cout << "[type] --> " << "Size : " << sizeof(type) << " | Value : " << type << std::endl;
    std::cout << "[vec] --> " << "Size : " << sizeof(vec) << " | Value : " << vec.data() << std::endl;
    // 'send' to client && 'receive' from server (for example)
    char buffer2[256] = {};
    memcpy(&buffer2, &buffer[0], sizeof(buffer2));
    /* -- client -- */
    // buffer to variables
    std::vector<int>* vec2;
    short type2 = 0;
    int read = 0;
    memcpy(&type2, &buffer[read], sizeof(type2));
    read += sizeof(type2);
    memcpy(&vec2, &buffer[read], sizeof(vec2));
    read += sizeof(vec2);
    std::cout << "\n\n";
    std::cout << "/* -- Client -- */\n" << std::endl;
    std::cout << "[read] --> " << "Size : " << sizeof(read) << " | Value : " << read << std::endl;
    std::cout << "[type2] --> " << "Size : " << sizeof(type2) << " | Value : " << type2 << std::endl;
    std::cout << "[vec2] --> " << "Size : " << sizeof(vec2) << " | Value : " << vec2->data() << std::endl;
    std::cout << "\n"; system("pause");
    return 0;
}
Console output:
/* -- Server -- */
[written] --> Size : 4 | Value : 18
[type] --> Size : 2 | Value : 25
[vec] --> Size : 16 | Value : 00589A00
/* -- Client -- */
[read] --> Size : 4 | Value : 6
[type2] --> Size : 2 | Value : 25
[vec2] --> Size : 4 | Value : 00000000
Press any key to continue...
memcpy(&buffer[written], &vec, sizeof(vec));
You already have problems here, even before receiving anything.
sizeof(vec) is the size of the std::vector<int> object itself, which will probably be 8 or 16 bytes, or something like that. The size of the vector object will always be the same, whether the vector is empty or holds an image of every page in an encyclopedia.
A vector, and what's in a vector, are two completely different things.
A vector's size() method gives the number of values in the vector, so the above should obviously be:
memcpy(&buffer[written], vec.data(), vec.size()*sizeof(int));
The rest of the code should be adjusted accordingly.
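Concretely, the send side can prefix the payload with the element count, so the receiver knows how many bytes follow. A sketch under that assumption (bounds checking omitted):
// write the element count first, so the receiver knows how much data follows
uint32_t count = static_cast<uint32_t>(vec.size());
memcpy(&buffer[written], &count, sizeof(count));
written += sizeof(count);
// then the raw element data
memcpy(&buffer[written], vec.data(), count * sizeof(int));
written += count * sizeof(int);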
Similarly, the process of deserializing into a vector is also wrong, in the shown code:
std::vector<int>* vec2;
This is a pointer to a vector. In C++, before using a pointer it must be initialized to point to an existing instance of the same type. There's nothing in the shown code that does that.
memcpy(&vec2, &buffer[read], sizeof(vec2));
Since vec2 is a pointer, sizeof(vec2) will be the size of a pointer: either 4 or 8 bytes. This attempts to deserialize the raw memory address of a pointer. Again, this makes no sense.
What the shown code attempts should instead be done like this:
Declare your vector:
std::vector<int> vec2;
Determine how many values will be read into the vector, and resize it. If, for example, you know that you have n bytes worth of raw integer data to deserialize:
vec2.resize(n / sizeof(int));
At this point you can copy the raw data into the vector's buffer:
memcpy(vec2.data(), &buffer[read], n);
This approach is slightly inefficient due to the resizing, but that's a secondary issue, and it's also possible to implement this logic in more C++-friendly ways. The main issue is that sizeof will not magically give you the number of bytes to read into a vector; that is something you need to track yourself. C++ will do very little of this for you. You need to keep track of the actual number of bytes that were received, that make up the contents of the vector, and then resize and read into the vector accordingly.
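Putting those pieces together, a receive side that matches the send-side sketch above could look like this (again assuming a leading element count):
// read the element count back first
uint32_t count = 0;
memcpy(&count, &buffer[read], sizeof(count));
read += sizeof(count);
// size the vector up front, then copy the raw data into its buffer
std::vector<int> vec2(count);
memcpy(vec2.data(), &buffer[read], count * sizeof(int));
read += count * sizeof(int);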
The code compiles successfully, but I can't understand why the program crashes for certain values of number and not for others. Could someone explain how the compiler handles adding a long int to a char*?
#include <iostream>

int main()
{
    long int number = 255;
    std::cout << "Value 1 : " << std::flush << ("" + number) << std::flush << std::endl;
    number = 15155;
    std::cout << "Value 2 : " << std::flush << ("" + number) << std::flush << std::endl;
    return 0;
}
Test results:
Value 1 : >
Value 2 : Segmentation fault
Note: I'm not looking for a solution on how to add a string with a number.
In C++, "" is a const char[1] array, which decays into a const char* pointer to the first element of the array (in this case, the string literal's '\0' nul terminator).
Adding an integer to a pointer performs pointer arithmetic, which will advance the memory address in the pointer by the specified number of elements of the type the pointer is declared as (in this case, char).
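For contrast, a quick illustration (not from the original post) of in-bounds pointer arithmetic on a string literal:
const char *s = "abc";   // const char[4]: 'a', 'b', 'c', '\0'
std::cout << (s + 1);    // prints "bc" -- still inside the array
std::cout << (s + 3);    // prints "" -- points at the terminator, still valid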
So, in your example, ... << ("" + number) << ... is equivalent to ... << &""[number] << ..., or more generically:
const char *ptr = &""[0];
ptr = reinterpret_cast<const char*>(
    reinterpret_cast<uintptr_t>(ptr)
    + (number * sizeof(char))
);
... << ptr << ...
This means you go out of bounds of the array whenever number is any value other than 0, so your code has undefined behavior, and anything can happen when operator<< tries to dereference the invalid pointer you give it.
Unlike in many scripting languages, ("" + number) is not the correct way to convert an integer to a string in C++. You need to use an explicit conversion function instead, such as std::to_string(), eg:
#include <iostream>
#include <string>

int main()
{
    long int number = 255;
    std::cout << "Value 1 : " << std::flush << std::to_string(number) << std::flush << std::endl;
    number = 15155;
    std::cout << "Value 2 : " << std::flush << std::to_string(number) << std::flush << std::endl;
    return 0;
}
Or, you can simply let std::ostream::operator<< handle that conversion for you, eg:
#include <iostream>

int main()
{
    long int number = 255;
    std::cout << "Value 1 : " << std::flush << number << std::flush << std::endl;
    number = 15155;
    std::cout << "Value 2 : " << std::flush << number << std::flush << std::endl;
    return 0;
}
Pointer arithmetic is the culprit.
A const char* is accepted by operator<<, but will not point to a valid memory address in your example.
If you switch on -Wall, you will see a compiler warning about that:
main.cpp: In function 'int main()':
main.cpp:6:59: warning: array subscript 255 is outside array bounds of 'const char [1]' [-Warray-bounds]
6 | std::cout<< "Value 1 : " << std::flush << ("" + number) << std::flush << std::endl;
| ^
main.cpp:8:59: warning: array subscript 15155 is outside array bounds of 'const char [1]' [-Warray-bounds]
8 | std::cout<< "Value 2 : " << std::flush << ("" + number) << std::flush << std::endl;
| ^
Value 1 : q
I have the class below:
class A
{
public:
    double a;
    float b;
    double c;
};
I want to print a data member's offset within the class, so I use:
double A::* pm = &A::a;
cout << *(int *)&pm << endl;
It works well and prints '0', but I don't want to use the intermediate variable pm:
cout << *(int *)&A::a << endl;
I get a compile error: invalid type conversion.
By 'offset' I assume you mean the offset in bytes.
You could try this solution:
(size_t) &(((A*)0)->a) // prints 0
Actually, this is the classic implementation of the offsetof macro, as WhozCraig suggested.
...
cout << "A::a => " << (size_t) &(((A*)0)->a)
     << "\nA::b => " << (size_t) &(((A*)0)->b)
     << "\nA::c => " << (size_t) &(((A*)0)->c);
...
Combined with your data, the previous snippet will print:
A::a => 0
A::b => 8
A::c => 16
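Since A is a standard-layout class, you can also just use the standard offsetof macro from <cstddef>, which avoids both the intermediate variable and the null-pointer trick. A self-contained sketch:
#include <cstddef>
#include <iostream>

class A
{
public:
    double a;
    float b;
    double c;
};

int main()
{
    std::cout << "A::a => " << offsetof(A, a)   // 0
              << "\nA::b => " << offsetof(A, b) // 8
              << "\nA::c => " << offsetof(A, c) // 16
              << std::endl;
}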
The application I am working on receives C-style structs from an embedded system whose code was generated to target a 16-bit processor. The application which speaks with the embedded system is built with either a 32-bit gcc compiler or a 32-bit MSVC C++ compiler. The communication between the application and the embedded system takes place via UDP packets over ethernet or modem.
The payload within the UDP packets consist of various different C style structs. On the application side a C++ style reinterpret_cast is capable of taking the unsigned byte array and casting it into the appropriate struct.
However, I run into problems with reinterpret_cast when the struct contains enumerated values. The 16 bit Watcom compiler will treat enumerated values as an uint8_t type. However, on the application side the enumerated values are treated as 32 bit values. When I receive a packet with enumerated values in it the data gets garbled because the size of the struct on the application side is larger the struct on the embedded side.
The solution to this problem, so far, has been to change the enumerated type within the struct on the application side to an uint8_t. However, this is not an optimal solution because we can no longer use the member as an enumerated type.
What I am looking for is a solution which will allow me to use a simple cast operation without having to tamper with the struct definition in the source on the application side. By doing so, I can use the struct as is in the upper layers of my application.
As noted, the correct way to deal with the issue is proper serialization and deserialization.
But it doesn't mean we can't try some hacks.
Option 1:
If your particular compiler supports packing the enum (in my case gcc 4.7 on Windows), this might work:
typedef enum { VALUE_1 = 1, VALUE_2, VALUE_3 }__attribute__ ((__packed__)) TheRealEnum;
Option 2:
If your particular compiler supports class sizes of < 4 bytes, you can use a HackedEnum class which uses operator overloading for the conversion (note the gcc attribute, which you might not want):
class HackedEnum
{
private:
    uint8_t evalue;
public:
    void operator=(const TheRealEnum v) { evalue = v; };
    operator TheRealEnum() { return (TheRealEnum)evalue; };
}__attribute__((packed));
You would replace TheRealEnum in your structures with HackedEnum, but still continue using it as TheRealEnum.
A full example to see it working:
#include <iostream>
#include <stddef.h>
#include <stdint.h>

using namespace std;

#pragma pack(push, 1)
typedef enum { VALUE_1 = 1, VALUE_2, VALUE_3 } TheRealEnum;

typedef struct
{
    uint16_t v1;
    uint8_t enumValue;
    uint16_t v2;
}__attribute__((packed)) ShortStruct;

typedef struct
{
    uint16_t v1;
    TheRealEnum enumValue;
    uint16_t v2;
}__attribute__((packed)) LongStruct;

class HackedEnum
{
private:
    uint8_t evalue;
public:
    void operator=(const TheRealEnum v) { evalue = v; };
    operator TheRealEnum() { return (TheRealEnum)evalue; };
}__attribute__((packed));

typedef struct
{
    uint16_t v1;
    HackedEnum enumValue;
    uint16_t v2;
}__attribute__((packed)) HackedStruct;
#pragma pack(pop)
int main(int argc, char **argv)
{
    cout << "Sizes: " << endl
         << "TheRealEnum: " << sizeof(TheRealEnum) << endl
         << "ShortStruct: " << sizeof(ShortStruct) << endl
         << "LongStruct: " << sizeof(LongStruct) << endl
         << "HackedStruct: " << sizeof(HackedStruct) << endl;

    ShortStruct ss;
    cout << "address of ss: " << &ss << " size " << sizeof(ss) << endl
         << "address of ss.v1: " << (void*)&ss.v1 << endl
         << "address of ss.ev: " << (void*)&ss.enumValue << endl
         << "address of ss.v2: " << (void*)&ss.v2 << endl;

    LongStruct ls;
    cout << "address of ls: " << &ls << " size " << sizeof(ls) << endl
         << "address of ls.v1: " << (void*)&ls.v1 << endl
         << "address of ls.ev: " << (void*)&ls.enumValue << endl
         << "address of ls.v2: " << (void*)&ls.v2 << endl;

    HackedStruct hs;
    cout << "address of hs: " << &hs << " size " << sizeof(hs) << endl
         << "address of hs.v1: " << (void*)&hs.v1 << endl
         << "address of hs.ev: " << (void*)&hs.enumValue << endl
         << "address of hs.v2: " << (void*)&hs.v2 << endl;

    uint8_t buffer[512] = {0};
    ShortStruct * short_ptr = (ShortStruct*)buffer;
    LongStruct * long_ptr = (LongStruct*)buffer;
    HackedStruct * hacked_ptr = (HackedStruct*)buffer;

    short_ptr->v1 = 1;
    short_ptr->enumValue = VALUE_2;
    short_ptr->v2 = 3;

    cout << "Values of short: " << endl
         << "v1 = " << short_ptr->v1 << endl
         << "ev = " << (int)short_ptr->enumValue << endl
         << "v2 = " << short_ptr->v2 << endl;

    cout << "Values of long: " << endl
         << "v1 = " << long_ptr->v1 << endl
         << "ev = " << long_ptr->enumValue << endl
         << "v2 = " << long_ptr->v2 << endl;

    cout << "Values of hacked: " << endl
         << "v1 = " << hacked_ptr->v1 << endl
         << "ev = " << hacked_ptr->enumValue << endl
         << "v2 = " << hacked_ptr->v2 << endl;

    HackedStruct hs1, hs2;
    // hs1.enumValue = 1; // error, the value is not the wanted enum
    hs1.enumValue = VALUE_1;
    int a = hs1.enumValue;
    TheRealEnum b = hs1.enumValue;
    hs2.enumValue = hs1.enumValue;

    return 0;
}
The output on my particular system is:
Sizes:
TheRealEnum: 4
ShortStruct: 5
LongStruct: 8
HackedStruct: 5
address of ss: 0x22ff17 size 5
address of ss.v1: 0x22ff17
address of ss.ev: 0x22ff19
address of ss.v2: 0x22ff1a
address of ls: 0x22ff0f size 8
address of ls.v1: 0x22ff0f
address of ls.ev: 0x22ff11
address of ls.v2: 0x22ff15
address of hs: 0x22ff0a size 5
address of hs.v1: 0x22ff0a
address of hs.ev: 0x22ff0c
address of hs.v2: 0x22ff0d
Values of short:
v1 = 1
ev = 2
v2 = 3
Values of long:
v1 = 1
ev = 770
v2 = 0
Values of hacked:
v1 = 1
ev = 2
v2 = 3
On the application side a C++ style reinterpret_cast is capable of taking the unsigned byte array and casting it into the appropriate struct.
The layout of structs is not required to be the same between different implementations. Using reinterpret_cast in this way is not appropriate.
The 16 bit Watcom compiler will treat enumerated values as an uint8_t type. However, on the application side the enumerated values are treated as 32 bit values.
The underlying type of an enum is chosen by the implementation, and is chosen in an implementation defined manner.
This is just one of the many potential differences between implementations that can cause problems with your reinterpret_cast. There are also actual alignment issues if you're not careful, where the data in the received buffer isn't appropriately aligned for the types (e.g., an integer that requires four-byte alignment ends up one byte off), which can cause crashes or poor performance. Padding might be different between platforms, fundamental types might have different sizes, endianness can differ, etc.
What I am looking for is a solution which will allow me to use a simple cast operation without having to tamper with the struct definition in the source on the application side. By doing so, I can use the struct as is in the upper layers of my application.
C++11 introduces a new enum syntax that allows you to specify the underlying type. Or you can replace your enums with integral types along with a bunch of predefined constants with manually declared values. This only fixes the problem you're asking about and not any of the other ones you have.
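For reference, a sketch of that C++11 syntax with a fixed underlying type (reusing TheRealEnum from the example above):
#include <cstdint>

// The enum is now guaranteed to occupy one byte on every conforming compiler.
enum TheRealEnum : uint8_t { VALUE_1 = 1, VALUE_2, VALUE_3 };

static_assert(sizeof(TheRealEnum) == 1, "matches the 16-bit build's layout");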
What you should really do is proper serialization and deserialization.
Put your enumerated type inside of a union with a 32-bit number:
union
{
    Enumerated val;
    uint32_t valAsUint32;
};
This would make the embedded side expand the field to 32 bits. It should work as long as both platforms are little-endian and the structs are zero-filled initially. It would change the wire format, though.
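A sketch of how that anonymous union could sit inside a message struct (struct and field names hypothetical):
#include <stdint.h>

typedef enum { VALUE_1 = 1, VALUE_2, VALUE_3 } Enumerated;

typedef struct
{
    uint16_t v1;
    union
    {
        Enumerated val;       // used as the enum in code
        uint32_t valAsUint32; // pads the field out to 4 bytes on the 16-bit side
    };
    uint16_t v2;
} Message;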
If by "simple cast operation" you mean something that's expressed in the source code, rather than something that's necessarily zero-copy, then you can write two versions of the struct -- one with enums, one with uint8_ts, and a constructor for one from the other that copies it element-by-element to repack it. Then you can use an ordinary type-cast in the rest of the code. Since the data sizes are fundamentally different (unless you use the C++11 features mentioned in another answer), you can't do this without copying things to repack them.
However, if you don't mind some small changes to the struct definition on the application side, there are a couple of options that don't involve dealing with bare uint8_t values. You could use aaronps's answer of a class that is the size of a uint8_t (assuming that's possible with your compiler) and implicitly converts to and from an enum. Alternately, you could store the values as uint8_ts and write some accessor methods for your enum values that take the uint8_t data in the struct and convert it to an enum before returning it.
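A minimal sketch of the two-struct approach described above (all names hypothetical; the wire struct mirrors the embedded layout, the app struct is the convenient one):
#include <cstdint>
#include <cstring>

enum Mode : uint32_t { MODE_IDLE = 0, MODE_RUN = 1 }; // app-side enum

// Mirrors the embedded layout: the enum travels as a single byte.
#pragma pack(push, 1)
struct WirePacket
{
    uint16_t id;
    uint8_t  mode;   // raw enum byte from the 16-bit build
    uint16_t value;
};
#pragma pack(pop)

// Application-side struct, repacked member by member.
struct AppPacket
{
    uint16_t id;
    Mode     mode;
    uint16_t value;

    explicit AppPacket(const WirePacket &w)
        : id(w.id), mode(static_cast<Mode>(w.mode)), value(w.value) {}
};

// Usage: deserialize the raw bytes into the wire struct, then convert once.
AppPacket parse(const uint8_t *bytes)
{
    WirePacket w;
    std::memcpy(&w, bytes, sizeof(w));
    return AppPacket(w);
}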