Serializing binary struct gcc vs cl - c++

Full disclosure - this is homework, although completed and fully working, I'm searching for a nicer solution.
I have a binary file, which was created by a program compiled within Visual Studio (I believe). The structure looks something like this.
struct Record {
char c;
double d;
time_t t;
};
The size of this structure on Windows with Visual Studio 2008 is 24 bytes: 1 + 8 + 8 = 17 bytes of data, so there are 7 bytes of padding. The same structure on Linux with gcc is 16 bytes: 1 + 8 + 4 = 13 bytes of data plus 3 bytes of padding. To line this up I added some padding and changed time_t to another type. So then my struct looks like this.
struct Record {
char c;
char __padding[7];
double d;
long long t;
};
This now works and gcc gives its size as 24 bytes, but it seems a little dirty. So, two questions:
Why is this implemented differently between the two compilers?
Are there any __attribute__ ((aligned))-type options or any other cleaner solutions for this?

The difference stems from whether doubles are 32-bit aligned or 64-bit aligned by default. On a 32-bit machine, having a double on a 64-bit boundary may have some benefit, but it is probably not huge. VC is then probably more careful about this than gcc. Going by the sizes you quote, time_t also differs: 8 bytes in your Visual Studio build and 4 bytes in your gcc build.
The bottom line is that if you are using structs for serialization you should ALWAYS make them packed (i.e. 8-bit aligned) and then do the alignment by hand. This way your code is sure to be compatible across platforms.
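As a rough sketch of that advice, the record from the question could be declared with 1-byte packing, explicit padding and a fixed-width integer instead of time_t. The names DiskRecord and pad are made up for illustration, and the layout assumes the file was written with an 8-byte double and an 8-byte time_t, as the Visual Studio sizes in the question suggest. Byte order is not handled here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical on-disk layout for the Record in the question.
// With 1-byte packing the compiler inserts no padding of its own,
// so every byte of the layout is spelled out by hand.
#pragma pack(push, 1)
struct DiskRecord {
    char         c;
    char         pad[7];  // explicit padding (assumed to match the VS layout)
    double       d;       // assumed to be an 8-byte IEEE double
    std::int64_t t;       // fixed-width stand-in for the 8-byte time_t
};
#pragma pack(pop)

// Fail the build instead of silently reading garbage if the layout drifts.
static_assert(sizeof(double) == 8, "this sketch assumes an 8-byte double");
static_assert(sizeof(DiskRecord) == 24, "record must be 24 bytes on every compiler");
static_assert(offsetof(DiskRecord, d) == 8, "d must start at byte 8");
static_assert(offsetof(DiskRecord, t) == 16, "t must start at byte 16");
```

The static_asserts are the useful part: if a compiler disagrees about the layout, the build stops rather than the file being misread.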

Related

Force C++ structure to pack tightly

I am attempting to read in a binary file. The problem is that the creator of the file took no time to properly align data structures to their natural boundaries and everything is packed tight. This makes it difficult to read the data using C++ structs.
Is there a way to force a struct to be packed tight?
Example:
struct {
short a;
int b;
};
The above structure is 8 bytes: 2 for short a, 2 for padding, 4 for int b. However, on disk, the data is only 6 bytes (it does not have the 2 bytes of padding for alignment).
Please be aware the actual data structures are thousands of bytes and many fields, including a couple arrays, so I would prefer not to read each field individually.
If you're using GCC, you can do struct __attribute__ ((packed)) { short a; int b; };
On VC++ you can do #pragma pack(1). This option is also supported by GCC.
#pragma pack(push, 1)
struct { short a; int b; };
#pragma pack(pop)
Other compilers may have options to do a tight packing of the structure with no padding.
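To see the effect, here is a small sketch that declares the same two fields with and without 1-byte packing. The struct names Natural and Packed are made up for the example, and fixed-width types are used so that the field sizes do not depend on the platform.

```cpp
#include <cstddef>
#include <cstdint>

// Default layout: most compilers put 2 bytes of padding after 'a'
// so that 'b' starts on a 4-byte boundary (typically 8 bytes in total).
struct Natural {
    std::int16_t a;
    std::int32_t b;
};

// Same fields with 1-byte packing: no padding, 6 bytes in total.
#pragma pack(push, 1)
struct Packed {
    std::int16_t a;
    std::int32_t b;
};
#pragma pack(pop)

static_assert(sizeof(Packed) == 6, "packed struct must match the 6 bytes on disk");
static_assert(offsetof(Packed, b) == 2, "b follows a with no gap");
```

One caveat: in the packed version b is not naturally aligned, so taking a pointer or reference to it can be a problem on CPUs that fault on misaligned access. Copying the field to a local variable first avoids that.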
You need to use a compiler-specific, non-Standard directive to specify 1-byte packing, such as this one under Windows:
#pragma pack (push, 1)
"The problem is that the creator of the file took no time to properly byte align the data structures and everything is packed tight."
Actually, the designer did the right thing. Padding is something that the Standard says can be applied, but it doesn't say how much padding should be applied in which cases. The Standard doesn't even say how many bits are in a byte. You might assume that, even though these things aren't specified, they will still have the same reasonable values on all modern machines, but that's simply not true. On a 32-bit Windows machine, for example, the padding might be one thing, whereas on the 64-bit version of Windows it might be something else. Maybe it will be the same; that's not the point. The point is that you don't know what the padding will be on different systems.
So by "packing it tight" the developer did the only thing they could: use a packing that they can be reasonably sure every system will be able to understand. That commonly understood packing is to use no padding in structures saved to disk or sent down a wire.
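If you would rather not depend on any compiler's struct layout at all, the fields can be decoded from the raw bytes. This is only a sketch for the 6-byte short/int example: the names Pair, read_u16_le, read_u32_le and decode_pair are invented here, and it assumes the file stores integers in little-endian order, which the question does not state.

```cpp
#include <cstdint>

// In-memory form of the record; its padding no longer matters because
// it is never copied to or from the file as a block.
struct Pair {
    std::int16_t a;
    std::int32_t b;
};

// Assemble unsigned integers from bytes stored least significant first.
inline std::uint16_t read_u16_le(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t read_u32_le(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Decode one tightly packed 6-byte record: 2 bytes for a, then 4 for b.
// The casts to signed types assume two's complement, which every
// mainstream compiler uses.
inline Pair decode_pair(const unsigned char* bytes) {
    Pair r;
    r.a = static_cast<std::int16_t>(read_u16_le(bytes));
    r.b = static_cast<std::int32_t>(read_u32_le(bytes + 2));
    return r;
}
```

This is more typing than a packed struct for records with thousands of bytes, but it works the same way regardless of the host's padding rules or byte order.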

How to guarantee long is 4 bytes

In C++, is there any way to guarantee that a long is 4 bytes? Perhaps a compiler flag for g++?
We are reusing some Windows code in a Linux program, and on Windows a long is 4 bytes, but on my Linux machine a long is 8 bytes. We can't go and change all the longs to ints because that would break the Windows code.
The reason I need to guarantee longs are 4 bytes is that in certain parts of the code we have a union of a struct and a char array, and when compiling for 64-bit Linux, the char array does not line up with the struct.
Here are some code snippets for clarification:
struct ABC {
unsigned long data1;
unsigned short data2;
unsigned short data3;
unsigned char data4[8];
};
//sizeof(ABC) (32 bit): 16, sizeof(ABC) (64 bit): 24
union ABC_U {
ABC abc;
unsigned char bytes[16];
};
EDIT:
I forgot to mention, this problem only came up when trying to compile for 64-bit. Windows seems to keep longs at 4 bytes regardless of architecture, whereas Linux g++ usually makes longs the same size as pointers.
I'm leaning towards using a uint32_t here because this particular structure isn't used in the Windows code, and that wouldn't affect the program globally. Hopefully there aren't any other sections of the code where this will be a problem.
I found the compiler flag -mlong32, but this also forces pointers to be 32 bits which is undesirable, and as it forces nonstandard behavior would likely break the ABI like PascalCuoq mentioned.
You can use int32_t from stdint.h.
Instead of using int and long, you will likely want to create a header file that uses typedefs (preferred) or preprocessor macros to define common type names for you to use.
#ifdef _WINDOWS
typedef unsigned long uint32;
#else
typedef unsigned int uint32;
#endif
In your union, you would use uint32 instead of long.
The header file stdint.h does exactly this, but it is not always installed as a standard header file with every compiler. Visual Studio versions before 2010 (for example) do not have it by default. You can download it from http://msinttypes.googlecode.com/svn/trunk/stdint.h if you would prefer to use it instead.
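For the struct in the question, a sketch with the fixed-width types could look like the following. It assumes a compiler that provides <cstdint> (or stdint.h), and the static_asserts are an addition for illustration, not part of the original code.

```cpp
#include <cstddef>
#include <cstdint>

// Same fields as the question's ABC, but with types whose sizes do not
// change between 32-bit and 64-bit builds or between Windows and Linux.
struct ABC {
    std::uint32_t data1;     // was unsigned long: 4 or 8 bytes depending on platform
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

union ABC_U {
    ABC abc;
    unsigned char bytes[16];
};

// Catch any mismatch at compile time rather than at run time.
static_assert(sizeof(ABC) == 16, "ABC must overlay the 16-byte array exactly");
static_assert(sizeof(ABC_U) == 16, "union must not grow beyond 16 bytes");
```

With unsigned long replaced by std::uint32_t the struct is 16 bytes on both 32-bit and 64-bit targets, so the char array lines up again.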
You want the compiler to generate code where long is 32-bit.
Compile the entire codebase with -m32. This will generate code in the old IA-32 instruction set, which is still widely supported library-wise(*), and where long is traditionally 32-bit like int.
(*) You may need to install these libraries in your Linux distribution, and you may have to ask your users to install them.

Sizeof() difference between C++ on PC and Arduino [duplicate]

Possible Duplicate:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
In the following code, the value of structSize is different depending on whether it's executed on an Arduino vs my PC (Ubuntu 11.04 x64).
struct testStruct{
uint8_t val1;
uint16_t val2;
};
...
uint8_t structSize = sizeof(testStruct);
On my PC, the value of structSize is 4, and on my Arduino the value of structSize is 3 (as expected).
Where is this 4th byte coming from?
Actually, I would have expected the size to be 4, because uint16_t is usually aligned to 16 bits.
The extra byte is padding inserted between the members to keep the alignment of uint16_t.
This is compiler dependent though. Arduino might be more selfish with memory and probably doesn't care that much about alignment. (possible explanation)
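You can ask the compiler where it put the padding. The sketch below uses offsetof and alignof (C++11) on the struct from the question; the helper name padding_before_val2 is made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

struct testStruct {
    std::uint8_t  val1;
    std::uint16_t val2;
};

// Number of padding bytes the compiler inserted between val1 and val2.
// Where uint16_t needs 2-byte alignment this is 1; where it needs only
// 1-byte alignment this is 0.
inline std::size_t padding_before_val2() {
    return offsetof(testStruct, val2) - sizeof(std::uint8_t);
}

// The alignment the compiler requires for val2 on this platform.
inline std::size_t val2_alignment() {
    return alignof(std::uint16_t);
}
```

Printing sizeof(testStruct), padding_before_val2() and val2_alignment() on both targets should show which rule each compiler applies.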
It's because of differing ABIs between the two CPU types you're targeting. It seems that the ABI on Arduino (ARM v7?) differs from the one on x86_64.
On x86 at least, uint16_t (short) is generally aligned to a two-byte boundary. In order to achieve that, a byte of padding is inserted after val1. I expect the same is true on x86_64.
There's lots of information about this in the Wikipedia article on x86 structure padding.
You may be able to achieve what you want using the #pragma pack directive … but here be dragons, don't tell anyone I suggested it :)
If you are designing a database engine to run on a mobile processor then carry on, but for most anything else you are writing, your time will be better spent on building functionality using type systems that are easy to understand and relatively standard across architectures.

Control trailing struct padding in VS 2008/2010? (#pragma pack is not good enough)

The project that I've been working on involves porting some old code. Right now we are using VS2010, but the project is set up to use the VS2008 compiler and toolchain; eventually we will probably move all the way to the VS2010 toolchain. The struct in question looks like this:
struct HuffmanDecodeNode
{
union
{
TCHAR cSymbol;
struct
{
WORD nOneIndex;
WORD nZeroIndex;
} cChildren;
} uNodeData;
BYTE bLeaf;
};
For reasons that I won't go into, sizeof(HuffmanDecodeNode) needs to be 8. I'm assuming that on the older compilers this worked out correctly, but now I'm seeing that the size is 6 unless I throw in some padding bytes. #pragma pack(show) confirms that the data should be 4 byte aligned which I assume used to be sufficient, but it appears that the newer compiler only uses this for alignment and doesn't insert any trailing padding at the end of the struct.
Is there any way that I can control the trailing padding without just adding more struct members?
You can put __declspec( align(8) ) in front of your struct declaration.
http://msdn.microsoft.com/en-us/library/83ythb65%28v=vs.100%29.aspx
but...
WCHAR has size 2 bytes, same for WORD. They both need alignment to 2 bytes only.
BYTE has size 1 byte and no alignment requirement.
I don't think you need the 4 byte alignment in your struct.
http://msdn.microsoft.com/en-us/library/aa383751%28v=vs.85%29.aspx
P.S. In GCC you can do the same with __attribute__ ((aligned (8)))
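For reference, C++11 has a standard spelling for the same thing, alignas. It is not available in the VS 2008/2010 compilers, so this is only a sketch of what the struct could look like on a newer toolchain. The typedefs Word, Byte and Symbol are stand-ins for WORD, BYTE and a 2-byte TCHAR, and the names Node and Node8 are invented here.

```cpp
#include <cstdint>

// Stand-ins for the Windows typedefs (assumptions for this sketch).
typedef std::uint16_t Word;    // WORD
typedef std::uint8_t  Byte;    // BYTE
typedef std::uint16_t Symbol;  // TCHAR in a Unicode build (2 bytes)

// Without an alignment request: 5 bytes of data, 2-byte alignment,
// so typical compilers round the size up to 6.
struct Node {
    union {
        Symbol cSymbol;
        struct { Word nOneIndex; Word nZeroIndex; } cChildren;
    } uNodeData;
    Byte bLeaf;
};

// With alignas(8) the size must be a multiple of 8, so the compiler
// adds the trailing padding itself.
struct alignas(8) Node8 {
    union {
        Symbol cSymbol;
        struct { Word nOneIndex; Word nZeroIndex; } cChildren;
    } uNodeData;
    Byte bLeaf;
};

static_assert(sizeof(Node8) == 8, "trailing padding brings the size to 8");
static_assert(alignof(Node8) == 8, "alignment request was honoured");
```

Note that this also raises the alignment of every Node8 object to 8 bytes, which is a side effect that explicit padding members would not have.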

Is the sizeof(some pointer) always equal to four? [duplicate]

For example:
sizeof(char*) returns 4. As does int*, long long*, everything that I've tried. Are there any exceptions to this?
The guarantee you get is that sizeof(char) == 1. There are no other guarantees, including no guarantee that sizeof(int *) == sizeof(double *).
In practice, pointers will be size 2 on a 16-bit system (if you can find one), 4 on a 32-bit system, and 8 on a 64-bit system, but there's nothing to be gained in relying on a given size.
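If some code really does depend on a pointer width, it is safer to measure or assert it than to assume it. A minimal sketch, where the function name object_pointer_bits is made up:

```cpp
#include <climits>
#include <cstddef>

// The one size the standard guarantees.
static_assert(sizeof(char) == 1, "guaranteed by the language");

// Width of an object pointer, in bits, for the platform being compiled for.
// CHAR_BIT is used rather than a hard-coded 8 because the standard does
// not fix the number of bits in a byte either.
inline std::size_t object_pointer_bits() {
    return sizeof(void*) * CHAR_BIT;
}

// Code that only works with one pointer size can say so explicitly, e.g.:
// static_assert(sizeof(void*) == 8, "this code assumes 64-bit pointers");
```

The commented-out static_assert turns a silent assumption into a compile error on platforms where it does not hold.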
Even on a plain x86 32 bit platform, you can get a variety of pointer sizes, try this out for an example:
#include <iostream>
using namespace std;
struct A {};
struct B : virtual public A {};
struct C {};
struct D : public A, public C {};
int main()
{
    // Pointers to member functions can be larger than plain object pointers.
    cout << "A:" << sizeof(void (A::*)()) << endl;
    cout << "B:" << sizeof(void (B::*)()) << endl;
    cout << "D:" << sizeof(void (D::*)()) << endl;
}
Under Visual C++ 2008, I get 4, 12 and 8 for the sizes of the pointers-to-member-function.
Raymond Chen talked about this here.
Just another exception to the already posted list. On 32-bit platforms, pointers can take 6, not 4, bytes:
#include <stdio.h>
#include <stdlib.h>
int main() {
char far* ptr; // note that this is a far pointer
printf( "%d\n", (int)sizeof( ptr)); // sizeof yields size_t, so cast for %d
return EXIT_SUCCESS;
}
If you compile this program with Open Watcom and run it, you'll get 6, because the far pointers that it supports consist of a 32-bit offset and a 16-bit segment value.
If you are compiling for a 64-bit machine, then it may be 8.
Technically speaking, the C standard only guarantees that sizeof(char) == 1, and the rest is up to the implementation. But on modern x86 architectures (e.g. Intel/AMD chips) it's fairly predictable.
You've probably heard processors described as being 16-bit, 32-bit, 64-bit, etc. This usually means that the processor uses N bits for integers. Since pointers store memory addresses, and memory addresses are integers, this effectively tells you how many bits are going to be used for pointers. sizeof reports sizes in bytes, so code compiled for 32-bit processors will report the size of pointers as 4 (32 bits / 8 bits per byte), and code for 64-bit processors will report the size of pointers as 8 (64 bits / 8 bits per byte). This is where the 4GB RAM limit for 32-bit processors comes from: if each memory address corresponds to a byte, 32 bits can name only 2^32 bytes (4GB), and to address more memory you need integers larger than 32 bits.
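The arithmetic behind that limit can be written down directly. In this sketch the function name addressable_bytes is made up, and it assumes one address per byte, as the paragraph above does.

```cpp
#include <cstdint>

// Number of distinct byte addresses that an N-bit pointer can represent.
// Only valid for bits < 64, because the result must fit in 64 bits.
constexpr std::uint64_t addressable_bytes(unsigned bits) {
    return std::uint64_t(1) << bits;
}

// 2^32 bytes = 4 * 1024^3 bytes, i.e. the familiar 4GB limit.
static_assert(addressable_bytes(32) == 4294967296ULL, "32-bit pointers reach 4GB");
// 2^16 bytes = 64KB for a 16-bit address.
static_assert(addressable_bytes(16) == 65536ULL, "16-bit pointers reach 64KB");
```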
The size of a pointer basically depends on the architecture of the system on which it is implemented. For example, a pointer is 4 bytes (32 bits) on a 32-bit machine and 8 bytes (64 bits) on a 64-bit machine. The bit width of a machine describes the range of memory addresses it can use: a 32-bit machine has an address space of 2^32 addresses and a 64-bit machine has an address space of up to 2^64 addresses. A pointer (a variable that points to a memory location) must be able to point to any of the memory addresses the machine supports (2^32 for 32-bit and 2^64 for 64-bit).
This is why the size of a pointer is 4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine.
In addition to the 16/32/64 bit differences even odder things can occur.
There have been machines where sizeof(int *) will be one value, probably 4 but where sizeof(char *) is larger. Machines that naturally address words instead of bytes have to "augment" character pointers to specify what portion of the word you really want in order to properly implement the C/C++ standard.
This is now very unusual as hardware designers have learned the value of byte addressability.
8-bit and 16-bit pointers are used in most low-end microcontrollers. That means every washing machine, microwave, fridge, older TV, and even cars.
You could say these have nothing to do with real-world programming.
But here is one real-world example:
Arduino with 1, 2 or 4 KB of RAM (depending on the chip) and 2-byte pointers.
It's recent, cheap, accessible to everyone and worth coding for.
In addition to what people have said about 64-bit (or whatever) systems, there are other kinds of pointer than pointer-to-object.
A pointer-to-member might be almost any size, depending how they're implemented by your compiler: they aren't necessarily even all the same size. Try a pointer-to-member of a POD class, and then a pointer-to-member inherited from one of the base classes of a class with multiple bases. What fun.
From what I recall, it's based on the size of a memory address. So on a system with a 32-bit address scheme, sizeof will return 4, since that's 4 bytes.
In general, sizeof(pretty much anything) will change when you compile on different platforms. On a 32 bit platform, pointers are always the same size. On other platforms (64 bit being the obvious example) this can change.
No, the size of a pointer may vary depending on the architecture. There are numerous exceptions.
The size of both a pointer and an int is 2 bytes with the Turbo C compiler on a 32-bit Windows machine.
So the size of a pointer is compiler-specific. But generally, most compilers use 4-byte pointers on 32-bit machines and 8-byte pointers on 64-bit machines.
So the size of a pointer is not the same on all machines.
On Win64 (Cygwin GCC 5.4), consider the example below.
First, define the following structs:
struct list_node{
int a;
list_node* prev;
list_node* next;
};
struct test_struc{
char a, b;
};
The test code is below:
std::cout<<"sizeof(int): "<<sizeof(int)<<std::endl;
std::cout<<"sizeof(int*): "<<sizeof(int*)<<std::endl;
std::cout<<std::endl;
std::cout<<"sizeof(double): "<<sizeof(double)<<std::endl;
std::cout<<"sizeof(double*): "<<sizeof(double*)<<std::endl;
std::cout<<std::endl;
std::cout<<"sizeof(list_node): "<<sizeof(list_node)<<std::endl;
std::cout<<"sizeof(list_node*): "<<sizeof(list_node*)<<std::endl;
std::cout<<std::endl;
std::cout<<"sizeof(test_struc): "<<sizeof(test_struc)<<std::endl;
std::cout<<"sizeof(test_struc*): "<<sizeof(test_struc*)<<std::endl;
The output is below:
sizeof(int): 4
sizeof(int*): 8
sizeof(double): 8
sizeof(double*): 8
sizeof(list_node): 24
sizeof(list_node*): 8
sizeof(test_struc): 2
sizeof(test_struc*): 8
You can see that in 64-bit, sizeof(pointer) is 8.
The reason the size of your pointer is 4 bytes is because you are compiling for a 32-bit architecture. As FryGuy pointed out, on a 64-bit architecture you would see 8.
A pointer is just a container for an address. On a 32-bit machine, your address range is 32 bits, so a pointer will always be 4 bytes. On a 64-bit machine, where you have an address range of 64 bits, a pointer will be 8 bytes.
Just for completeness and historical interest: in the 64-bit world there are different platform conventions for the sizes of the long and long long types, named LP64 and LLP64, mainly split between Unix-type systems (LP64) and Windows (LLP64). An older convention named ILP64 also makes int 64 bits wide.
Microsoft kept LLP64, where long long is 64 bits wide but long remains at 32 bits, for easier porting.
Type (width in bits)   ILP64   LP64   LLP64
char                       8      8       8
short                     16     16      16
int                       64     32      32
long                      64     64      32
long long                 64     64      64
pointer                   64     64      64
Source: https://stackoverflow.com/a/384672/48026
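A program can work out which of these conventions it was compiled under from the sizes of int, long and pointers. The sketch below is only an illustration: the function names classify_data_model and current_data_model are made up, it assumes 8-bit bytes, and ILP32 (the usual 32-bit convention, not in the table above) is added for completeness.

```cpp
#include <cstddef>
#include <string>

// Map the byte sizes of int, long and pointers to a data model name.
// Anything that does not match a known combination is reported as "other".
inline std::string classify_data_model(std::size_t int_bytes,
                                       std::size_t long_bytes,
                                       std::size_t ptr_bytes) {
    if (ptr_bytes == 8) {
        if (int_bytes == 8 && long_bytes == 8) return "ILP64";
        if (int_bytes == 4 && long_bytes == 8) return "LP64";
        if (int_bytes == 4 && long_bytes == 4) return "LLP64";
    }
    if (ptr_bytes == 4 && int_bytes == 4 && long_bytes == 4) return "ILP32";
    return "other";
}

// Data model of the platform this code is being compiled for.
inline std::string current_data_model() {
    return classify_data_model(sizeof(int), sizeof(long), sizeof(void*));
}
```

Given the table, current_data_model() should report LP64 on 64-bit Linux and LLP64 on 64-bit Windows.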