I'm porting an application from 32 bit to 64 bit.
It is C-style code (a legacy product), although it is C++. I have an issue where a combination of union and struct is used to store values. A custom datatype called "Any" is used that should hold data of any basic datatype. The implementation of Any is as follows:
typedef struct typedvalue
{
long data; // to hold all other types of 4 bytes or less
short id; // this tells what type "data" is holding
short sign; // this differentiates the double value from the rest
}typedvalue;
typedef union Any
{
double any_any;
double any_double; // to hold double value
typedvalue any_typedvalue;
}Any;
The union is 8 bytes in size. A union is used so that only one value is held at any given time, and the struct is used to distinguish the type. You can store a double, long, string, char, float or int value at any given time. That's the idea.
If it's a double value, it is stored in any_double. If it's any other type, it is stored in "data" and the type of the value is stored in "id". The "sign" field tells whether "Any" is holding a double or another type.
any_any is used liberally in the code to copy the value in the address space irrespective of the type. (This is our biggest problem, since we do not know at a given time what it will hold!)
If "Any" is supposed to hold a string or pointer, it is stored in "data" (which is of type long). On 64-bit, this is where the problem lies: pointers are 8 bytes. So we will need to change the "long" to an 8-byte equivalent (long long). But that would increase the size of the union to 16 bytes, and the liberal use of "any_any" would cause problems. There are too many uses of "any_any", and you are never sure what it holds.
I already tried these steps, without success:
1. Changed "long data" to "long long data" in the struct, which makes the size of the union 16 bytes. - This does not allow the data to be passed as "any_any" (8 bytes).
2. Declared the struct as a pointer inside the union, and changed "long data" to "long long data" inside the struct. - The issue encountered here was that, since it's a pointer, we need to allocate memory for the struct. The liberal use of "any_any" makes it difficult for us to allocate memory. Sometimes we might overwrite the memory and hence erase the value.
3. Created a separate collection that holds the value for "data" (a key-value pair). - This will not work because this implementation is at the core of the application; the collection would run into millions of entries.
Can anybody help me in this?
"Can anybody help me" this sounds like a cry of desperation, and I totally understand it.
Whoever wrote this code had absolutely no respect for future-proofing, or of portability, and now you're paying the price.
(Let this be a lesson to anyone who says "but our platform is 32bit! we will never use 64bit!")
I know you're going to say "but the codebase is too big", but you are better off rewriting the product. And do it properly this time!
Ignoring the fact that the original design is insane, you could use <stdint.h> (or soon <cstdint>) to get a little bit of predictability:
struct typedvalue
{
uint16_t id;
uint16_t sign;
uint32_t data;
};
union any
{
char any_raw[8];
double any_double;
typedvalue any_typedvalue;
};
You're still not guaranteed that typedvalue will be tightly packed, since there are no alignment guarantees for non-char members. You could make a struct Foo { char x[8]; }; and type-pun your way around, like *(uint32_t*)(&Foo.x[0]) and *(uint16_t*)(&Foo.x[4]) if you must, but that too would be extremely ugly.
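If the raw-byte route is taken anyway, copying through memcpy sidesteps the alignment and aliasing problems of casting a char pointer to uint32_t*. A minimal sketch, assuming the 4-byte value sits at offset 0 and the 2-byte id at offset 4; Raw8 and the read/write helpers are names made up for illustration, not part of the original code:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte raw container, like "struct Foo { char x[8]; }" above.
struct Raw8 {
    unsigned char x[8];
};

// Copy a 32-bit value into the buffer at the given byte offset.
inline void write_u32(Raw8& r, std::size_t offset, std::uint32_t value) {
    std::memcpy(r.x + offset, &value, sizeof value);
}

// Copy a 32-bit value back out; no misaligned or aliased access happens.
inline std::uint32_t read_u32(const Raw8& r, std::size_t offset) {
    std::uint32_t value;
    std::memcpy(&value, r.x + offset, sizeof value);
    return value;
}

// Same idea for the 16-bit fields.
inline void write_u16(Raw8& r, std::size_t offset, std::uint16_t value) {
    std::memcpy(r.x + offset, &value, sizeof value);
}

inline std::uint16_t read_u16(const Raw8& r, std::size_t offset) {
    std::uint16_t value;
    std::memcpy(&value, r.x + offset, sizeof value);
    return value;
}
```

Compilers typically turn these small fixed-size memcpy calls into plain loads and stores, so the helpers should cost nothing compared with the pointer casts.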
If you are in C++0x, I would definitely throw in a static assertion somewhere for sizeof(typedvalue) == sizeof(double).
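Such a check might look like the sketch below. It assumes a C++11 compiler, and the names TypedValue32 and Any32 are invented here so they do not clash with the real types:

```cpp
#include <cstdint>

// Fixed-width version of the struct: 4 + 2 + 2 bytes.
struct TypedValue32 {
    std::uint32_t data;  // payload for every type of 4 bytes or less
    std::uint16_t id;    // which type "data" currently holds
    std::uint16_t sign;  // distinguishes a double from everything else
};

union Any32 {
    double any_double;
    TypedValue32 any_typedvalue;
};

// Fail the build, rather than misbehave at run time, if the layout drifts.
static_assert(sizeof(TypedValue32) == sizeof(double),
              "typedvalue must overlay a double exactly");
static_assert(sizeof(Any32) == sizeof(double),
              "Any must stay 8 bytes for code that copies any_any");
```

The assertions do not make the layout portable; they only turn a silent size change into a compile error.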
If you need to store both an 8 byte pointer and a "type" field then you have no choice but to use at least 9 bytes, and on a 64-bit system alignment will likely pad that out to 16 bytes.
Your data structure should look something like:
typedef struct {
union {
void *any_pointer;
double any_double;
long any_long;
int any_int;
} any;
char my_type;
} any;
If using C++0x consider using a strongly typed enumeration for the my_type field. In earlier versions the storage required for an enum is implementation dependent and likely to be more than one byte.
To save memory you could use (compiler specific) directives to request optimal packing of the data structure, but the resulting mis-aligned memory accesses may cause performance issues.
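As a rough sketch of that layout in C++11, with a scoped enum as the type tag: the names TaggedAny, AnyKind and the helper functions are assumptions made for illustration. Copies that used to go through any_any would become whole-struct assignments, so the tag always travels with the value.

```cpp
#include <cstdint>

// One byte of storage for the tag, as suggested for C++0x and later.
enum class AnyKind : std::uint8_t { Empty, Long, Double, Pointer };

struct TaggedAny {
    union {
        void* p;
        double d;
        std::int64_t i;
    } value;
    AnyKind kind;
};

inline TaggedAny make_double(double d) {
    TaggedAny a;
    a.value.d = d;
    a.kind = AnyKind::Double;
    return a;
}

inline TaggedAny make_pointer(void* p) {
    TaggedAny a;
    a.value.p = p;
    a.kind = AnyKind::Pointer;
    return a;
}

// Returns false (and leaves "out" untouched) if the value is not a double.
inline bool get_double(const TaggedAny& a, double& out) {
    if (a.kind != AnyKind::Double) return false;
    out = a.value.d;
    return true;
}
```

On a typical 64-bit target sizeof(TaggedAny) comes out as 16 because of alignment padding, which matches the estimate above.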
I want to use and store "Handles" to data in an object buffer to reduce allocation overhead. The handle is simply an index into an array with the object. However I need to detect use-after-reallocations, as this could slip in quite easily. The common approach seems to be using bit fields. However this leads to 2 problems:
Bit fields are implementation defined
Bit shifting is not portable across big/little endian machines.
What I need:
Store handle to file (file handler can manage either integer types (byte swapping) or byte arrays)
Store 2 values in the handle with minimum space
What I got:
template<class T_HandleDef, typename T_Storage = uint32_t>
struct Handle
{
typedef T_HandleDef HandleDef;
typedef T_Storage Storage;
Handle(): handle_(0){}
private:
const T_Storage handle_;
};
template<unsigned T_numIndexBits = 16, typename T_Tag = void>
struct HandleDef{
static const unsigned numIndexBits = T_numIndexBits;
};
template<class T_Handle>
struct HandleAccessor{
typedef typename T_Handle::Storage Storage;
typedef typename T_Handle::HandleDef HandleDef;
static const unsigned numIndexBits = HandleDef::numIndexBits;
static const unsigned numMagicBits = sizeof(Storage) * 8 - numIndexBits;
/// "Magic" struct that splits the handle into values
union HandleData{
struct
{
Storage index : numIndexBits;
Storage magic : numMagicBits;
};
T_Handle handle;
};
};
A usage would be for example:
typedef Handle<HandleDef<24> > FooHandle;
FooHandle Create(unsigned idx, unsigned m){
HandleAccessor<FooHandle>::HandleData data;
data.index = idx;
data.magic = m;
return data.handle;
}
My goal was to keep the handle as opaque as possible, add a bool check but nothing else. Users of the handle should not be able to do anything with it but passing it around.
So problems I run into:
Union is UB -> Replace its T_Handle by Storage and add a ctor to Handle from Storage
How does the compiler lay out the bit field? I fill the whole union/type, so there should be no padding. So probably the only thing that can differ is which member comes first depending on endianness, correct?
How can I store handle_ to a file and load it on a machine with possibly different endianness and still have index and magic be correct? I think I can store the containing Storage 'endian-correct' and get correct values, IF both members occupy exactly half the space (2 shorts in a uint). But I always want more space for the index than for the magic value.
Note: There are already questions about bitfields and unions. Summary:
Bitfields may have unexpected padding (impossible here, as the whole type is occupied)
Order of "members" depends on the compiler (only 2 possible ways here; it should be safe to assume the order depends entirely on endianness, so this may or may not actually help here)
Specific binary layout of bits can be achieved by manual shifting (or e.g. wrappers http://blog.codef00.com/2014/12/06/portable-bitfields-using-c11/) -> This is not an answer here. I also need a specific layout of the values IN the bitfield. So I'm not sure what I get if I e.g. create a handle as handle = (magic << numIndexBits) | index and save/load this as binary (no endianness conversion). I am missing a big-endian machine for testing.
Note: No C++11, but boost is allowed.
The answer is pretty simple (based on another question I forgot the link to and comments by @Jeremy Friesner):
As "numbers" are already an abstraction in C++, one can be sure to always have the same bit representation when the variable is in a CPU register (when it is used for any kind of calculation). Also, bit shifts in C++ are defined on the value, in an endian-independent way. This means that for an unsigned x that does not overflow, x << 1 is always equal to x * 2, whatever the byte order in memory.
The only time one gets endianness problems is when saving to a file, sending/receiving over a network, or accessing the value in memory differently (e.g. via pointers...)
One cannot use C++ bitfields here, as one cannot be 100% sure about the order of the "entries". Bitfield containers might be ok if they allow access to the data as a "number".
Safest is (still) using bit shifts, which are very simple in this case (only 2 values). During storing/serialization the number must then be stored in an endian-agnostic way.
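A minimal sketch of that approach, written without C++11 features. It assumes a 32-bit handle with 24 index bits, as in the FooHandle example, and a little-endian file format; kIndexBits, pack_handle, store_le32 and the other names are illustrative only:

```cpp
#include <stdint.h>

// Assumed split: low 24 bits hold the index, high 8 bits hold the magic value.
const unsigned kIndexBits = 24;
const uint32_t kIndexMask = (uint32_t(1) << kIndexBits) - 1;

inline uint32_t pack_handle(uint32_t index, uint32_t magic) {
    return (magic << kIndexBits) | (index & kIndexMask);
}

inline uint32_t handle_index(uint32_t handle) { return handle & kIndexMask; }
inline uint32_t handle_magic(uint32_t handle) { return handle >> kIndexBits; }

// Write the value byte by byte, least significant byte first. The shifts work
// on the value, so the bytes come out the same on any host.
inline void store_le32(uint32_t value, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(value & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[3] = static_cast<unsigned char>((value >> 24) & 0xFFu);
}

inline uint32_t load_le32(const unsigned char in[4]) {
    return uint32_t(in[0]) | (uint32_t(in[1]) << 8) |
           (uint32_t(in[2]) << 16) | (uint32_t(in[3]) << 24);
}
```

Because the file always holds the four bytes in the same order, a handle written on one machine unpacks to the same index and magic on a machine with the opposite byte order.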
The VS documentation states
Half the size of a pointer. Use within a structure that contains a pointer and two small fields.
Windows Data Types
What, exactly, is this type and how is it used, if ever?
Note: Anonymous structs are not standard, but MSVC takes them:
union
{
int * aPointer;
struct
{
HALF_PTR lowerBits;
HALF_PTR upperBits;
};
} myvar; //You can be assured this union is sizeof(int *)
If you're thinking they're not too terribly useful, you would be right.
I found this article on Intel's site, and they suggest using it in a context where you have a class with many pointer members, along with a 32-bit offset to get the actual address, to cut down on the data bloat of a class. The article specifically talks about the Itanium platform because it uses 64-bit pointers instead of 32-bit, but I assume the problem and its solution would be the same on any system using 64-bit pointers.
So in short, it seems to suggest that it can be used if you, for example, wish to reduce the memory footprint of a class?
Use within a structure that contains a pointer and two small fields.
This means that in the following structure, no padding is required:
struct Example {
void* pointer;
HALF_PTR one;
HALF_PTR two;
};
Of course, this is only relevant if the size of HALF_PTR (32 bits on a 64-bit system, 16 bits on a 32-bit system) is sufficient to hold the intended values.
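The no-padding claim can be checked at compile time. HALF_PTR itself is a Windows typedef, so this sketch substitutes a hypothetical portable stand-in, half_ptr_t, chosen from the pointer size; it assumes a C++11 compiler and a target with 4-byte or 8-byte pointers:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for HALF_PTR: 32 bits on 64-bit targets, else 16 bits.
typedef std::conditional<(sizeof(void*) == 8), std::int32_t, std::int16_t>::type
    half_ptr_t;

struct Example {
    void* pointer;
    half_ptr_t one;
    half_ptr_t two;
};

// Two halves fill exactly one pointer-sized slot...
static_assert(2 * sizeof(half_ptr_t) == sizeof(void*),
              "half_ptr_t must be half the size of a pointer");
// ...so the struct is two pointers wide with no padding left over.
static_assert(sizeof(Example) == 2 * sizeof(void*),
              "Example should contain no padding");
```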
I guess “Use within a structure that contains a pointer and two small fields” means a pointer constructed from two HALF_PTRs, along with two other small non-pointer data fields:
struct Packed {
HALF_PTR low_ptr;
HALF_PTR high_ptr;
SMALL one;
SMALL two;
};
struct Padded {
void *ptr;
SMALL one;
SMALL two;
};
On 32-bit Windows:
When SMALL is char: sizeof(Packed) == 6 but sizeof(Padded) == 8
When SMALL is short: the size of both structs is 8, but the alignment requirement for the former is just 2 compared to 4
On 64-bit Windows:
When SMALL is char: sizeof(Packed) == 12 (10 bytes of members, rounded up to the 4-byte alignment of HALF_PTR) but sizeof(Padded) == 16
When SMALL is short: sizeof(Packed) == 12 but sizeof(Padded) == 16
When SMALL is int: same size, but reduced alignment requirement like above
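These figures depend on the compiler and target, so it may be worth letting the compiler report them. The sketch below assumes C++11 and again uses a made-up stand-in (HalfPtr) for the Windows-only HALF_PTR; Packed, Padded and bytes_saved are illustrative templates, not Windows API:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for HALF_PTR: half the width of a pointer.
typedef std::conditional<(sizeof(void*) == 8), std::int32_t, std::int16_t>::type
    HalfPtr;

// Pointer stored as two halves, followed by two small fields.
template <typename Small>
struct Packed {
    HalfPtr low_ptr;
    HalfPtr high_ptr;
    Small one;
    Small two;
};

// Whole pointer followed by the same two small fields.
template <typename Small>
struct Padded {
    void* ptr;
    Small one;
    Small two;
};

// How many bytes the split layout saves for a given small field type.
template <typename Small>
constexpr std::size_t bytes_saved() {
    return sizeof(Padded<Small>) - sizeof(Packed<Small>);
}

// The split layout only needs the alignment of its widest member.
static_assert(alignof(Packed<short>) < alignof(Padded<short>),
              "splitting the pointer should lower the alignment requirement");
```

The saving comes at a price: the pointer has to be reassembled from its two halves before every use.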
This is unlike Philipp's answer, where the size and alignment of the struct are exactly the same whether or not the second word is split into halves:
struct Example1 {
void* pointer;
HALF_PTR one;
HALF_PTR two;
};
struct Example2 {
void* pointer;
void* one_two;
};
Both have alignment equal to the size of the pointer, and size of 2 pointers
So, you know how the primitive of type char has a size of 1 byte? How would I make a primitive with a custom size? So, instead of an int with a size of 4 bytes, I would make one with a size of, let's say, 16.
Is there a way to do this? Is there a way around it?
It depends on why you are doing this. Usually, you can't use types of less than 8 bits, because that is the addressable unit for the architecture. You can use structs, however, to define different lengths:
struct s {
unsigned int a : 4; // a is 4 bits
unsigned int b : 4; // b is 4 bits
unsigned int c : 16; // c is 16 bits
};
However, there is no guarantee that the struct will be 24 bits long. Also, this can cause endian issues. Where you can, it's best to use system independent types, such as uint16_t, etc. You can also use bitwise operators and bit shifts to twiddle things very specifically.
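For comparison, the same three fields can be packed by hand into a fixed-width integer with shifts and masks, which gives a layout that does not depend on how the compiler arranges bit-fields. A small sketch, assuming a is placed in bits 0-3, b in bits 4-7 and c in bits 8-23 (pack_abc and the getters are made-up names):

```cpp
#include <cstdint>

// Pack a (4 bits), b (4 bits) and c (16 bits) into the low 24 bits.
inline std::uint32_t pack_abc(unsigned a, unsigned b, unsigned c) {
    return static_cast<std::uint32_t>(a & 0xFu) |
           (static_cast<std::uint32_t>(b & 0xFu) << 4) |
           (static_cast<std::uint32_t>(c & 0xFFFFu) << 8);
}

inline unsigned get_a(std::uint32_t v) { return v & 0xFu; }
inline unsigned get_b(std::uint32_t v) { return (v >> 4) & 0xFu; }
inline unsigned get_c(std::uint32_t v) { return (v >> 8) & 0xFFFFu; }
```

The value still occupies a full uint32_t in memory; only the meaning of its bits is under the programmer's control.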
Normally you'd just make a struct that represents the data in which you're interested. If it's 16 bytes of data, either it's an aggregate of a number of smaller types or you're working on a processor that has a native 16-byte integral type.
If you're trying to represent extremely large numbers, you may need to find a special library that handles arbitrarily-sized numbers.
In C++11, there is an excellent solution for this: std::aligned_storage.
#include <iostream>
#include <type_traits>
int main()
{
typedef typename std::aligned_storage<sizeof(int)>::type memory_type;
memory_type i;
reinterpret_cast<int&>(i) = 5;
std::cout << reinterpret_cast<int&>(i) << std::endl;
return 0;
}
It allows you to declare a block of uninitialized storage on the stack.
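Note that std::aligned_storage only provides raw, suitably aligned bytes. For anything beyond trivial types, an object has to be constructed in that storage with placement new and destroyed by hand. A minimal sketch of that pattern (round_trip is an illustrative name; std::aligned_storage was later deprecated in C++23, so this targets C++11 through C++20):

```cpp
#include <new>
#include <string>
#include <type_traits>

// Construct a T inside raw aligned storage, copy it out, then destroy it.
template <typename T, typename Arg>
T round_trip(const Arg& arg) {
    // Raw bytes with the size and alignment of T; no T exists here yet.
    typename std::aligned_storage<sizeof(T), alignof(T)>::type buffer;

    T* obj = new (&buffer) T(arg);  // placement new starts the lifetime
    T copy = *obj;
    obj->~T();                      // explicit destructor call ends it
    return copy;
}
```

For example, round_trip<std::string>("abc") builds a std::string in the buffer and returns a copy of it.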
If you want to make a new type, typedef it. If you want it to be 16 bytes in size, typedef a struct that has 16 bytes of member data within it. Just beware that compilers will quite often pad things to match your system's alignment needs. A 1-byte struct rarely remains 1 byte without care.
You could just static cast to and from std::string. I don't know enough C++ to give an example, but I think this would be pretty intuitive.
I am curious to know why bit fields with the same data type take less space than bit fields with mixed data types.
struct xyz
{
int x : 1;
int y : 1;
int z : 1;
};
struct abc
{
char x : 1;
int y : 1;
bool z : 1;
};
sizeof(xyz) = 4
sizeof(abc) = 12.
I am using VS 2005, 64bit x86 machine.
A bit machine/compiler level answer would be great.
Alignment.
Your compiler is going to align variables in a way that makes sense for your architecture. In your case, char, int, and bool are different sizes, so it will go by that information rather than your bit field hints.
There was some discussion in this question on the matter.
The solution is to give #pragma directives or __attribute__ annotations to your compiler to instruct it to ignore alignment optimizations.
The C standard (1999 version, §6.7.2.1, page 102, point 10) says this:
An implementation may allocate any addressable storage unit large enough to hold a
bit-field. If enough space remains, a bit-field that immediately follows another
bit-field in a structure shall be packed into adjacent bits of the same unit.
There does not seem to be any wording to allow the packing to be affected by the types of the fields. Thus I would conclude that this is a compiler bug.
gcc makes a 4 byte struct in either case, on both a 32-bit and a 64-bit machine, under Linux. I don't have VS and can't test that.
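Since the result differs between compilers, the simplest check is to print the sizes on the compiler in question. A small sketch (SameType, MixedTypes and AllByteSized are made-up names; the sizes noted in the comments are what gcc and clang are commonly observed to give, not something the standard guarantees):

```cpp
#include <cstdio>

struct SameType {      // three 1-bit fields sharing one int-sized unit
    int x : 1;
    int y : 1;
    int z : 1;
};

struct MixedTypes {    // the layout in question: result varies by compiler
    char x : 1;
    int y : 1;
    bool z : 1;
};

struct AllByteSized {  // every declared type is one byte wide
    unsigned char a : 1;
    unsigned char b : 1;
    unsigned char c : 1;
};

// Print what this particular compiler decided.
inline void print_bitfield_sizes() {
    std::printf("SameType:     %u\n", static_cast<unsigned>(sizeof(SameType)));
    std::printf("MixedTypes:   %u\n", static_cast<unsigned>(sizeof(MixedTypes)));
    std::printf("AllByteSized: %u\n", static_cast<unsigned>(sizeof(AllByteSized)));
}
```

Visual C++ appears to start a new storage unit whenever the declared type of the next bit-field has a different size, which would account for the 12 bytes reported in the question.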
It's a compiler bug or some code error.
The compiler appears to size the structure by the largest data type declared among its bit-fields.
E.g. in struct xyz the largest data type is int, with a size of 4.
Similarly, in the second structure abc the largest data type is int, again with a size of 4.
Whereas if we change the members of the structure as follows:
struct abc
{
char a:1;
char b:1;
bool c:1;
};
sizeof(abc) would be 1, not 4, since the largest data type has a size of 1 and all the bits fit into the 1 byte of a char.
Various tests can be performed by changing the data types in the structure.
Link for output based on old structure:
Visit http://codepad.org/6j5z2CEX
Link for output based on above structure defined by me:
Visit http://codepad.org/fqF9Ob8W
To avoid such problems with the size of structures, we should pack them explicitly using the #pragma pack directive.
I am refactoring some old code and have found a few structs containing zero-length arrays (below). The warnings are suppressed by a pragma, of course, but I have failed to create, with "new", structures containing such structures (error 2233). The array 'byData' is used as a pointer, but why not use a pointer instead? Or an array of length 1? And of course, no comments were added to make me enjoy the process...
Is there any reason to use such a thing? Any advice on refactoring those?
struct someData
{
int nData;
BYTE byData[0];
};
NB It's C++, Windows XP, VS 2003
Yes this is a C-Hack.
To create an array of any length:
struct someData* mallocSomeData(int size)
{
struct someData* result = (struct someData*)malloc(sizeof(struct someData) + size * sizeof(BYTE));
if (result)
{ result->nData = size;
}
return result;
}
Now you have an object of someData with an array of a specified length.
There are, unfortunately, several reasons why you would declare a zero length array at the end of a structure. It essentially gives you the ability to have a variable length structure returned from an API.
Raymond Chen did an excellent blog post on the subject. I suggest you take a look at this post because it likely contains the answer you want.
Note in his post, it deals with arrays of size 1 instead of 0. This is the case because zero length arrays are a more recent entry into the standards. His post should still apply to your problem.
http://blogs.msdn.com/oldnewthing/archive/2004/08/26/220873.aspx
EDIT
Note: Even though Raymond's post says zero-length arrays are legal in C99, they are in fact still not legal in C99. Instead of a zero-length array, you should be using a length-1 array here.
This is an old C hack to allow flexibly sized arrays.
In the C99 standard this is not necessary, as it supports the arr[] syntax.
Your intuition about "why not use an array of size 1" is spot on.
The code is doing the "C struct hack" wrong, because declarations of zero length arrays are a constraint violation. This means that a compiler can reject your hack right off the bat at compile time with a diagnostic message that stops the translation.
If we want to perpetrate a hack, we must sneak it past the compiler.
The right way to do the "C struct hack" (which is compatible with C dialects going back to 1989 ANSI C, and probably much earlier) is to use a perfectly valid array of size 1:
struct someData
{
int nData;
unsigned char byData[1];
};
Moreover, instead of sizeof struct someData, the size of the part before byData is calculated using:
offsetof(struct someData, byData);
To allocate a struct someData with space for 42 bytes in byData, we would then use:
struct someData *psd = (struct someData *) malloc(offsetof(struct someData, byData) + 42);
Note that this offsetof calculation is in fact the correct calculation even in the case of the array size being zero. You see, sizeof the whole structure can include padding. For instance, if we have something like this:
struct hack {
unsigned long ul;
char c;
char foo[0]; /* assuming our compiler accepts this nonsense */
};
The size of struct hack is quite possibly padded for alignment because of the ul member. If unsigned long is four bytes wide, then quite possibly sizeof (struct hack) is 8, whereas offsetof(struct hack, foo) is almost certainly 5. The offsetof method is the way to get the accurate size of the preceding part of the struct just before the array.
So that would be the way to refactor the code: make it conform to the classic, highly portable struct hack.
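Put together, the refactored allocation might look like the sketch below. SomeData mirrors the struct above with BYTE replaced by unsigned char so the example compiles without the Windows headers; alloc_some_data and some_data_bytes are made-up helper names. Indexing byData past element 0 is still the hack itself: it relies on behaviour that compilers support in practice rather than on a guarantee from the C++ standard.

```cpp
#include <cstddef>
#include <cstdlib>

struct SomeData {
    int nData;
    unsigned char byData[1];  // really nData bytes; see the allocator below
};

// Bytes needed for a SomeData whose byData holds "size" bytes.
inline std::size_t some_data_bytes(std::size_t size) {
    std::size_t bytes = offsetof(SomeData, byData) + size;
    // Never allocate less than the declared struct itself.
    return bytes < sizeof(SomeData) ? sizeof(SomeData) : bytes;
}

// Allocate with malloc; the caller releases the block with std::free.
inline SomeData* alloc_some_data(int size) {
    if (size < 0) return NULL;
    void* raw = std::malloc(some_data_bytes(static_cast<std::size_t>(size)));
    SomeData* result = static_cast<SomeData*>(raw);
    if (result) result->nData = size;
    return result;
}
```

This also explains the failure with "new" in the question: new allocates only sizeof(SomeData) bytes, so there is no room for the trailing array data unless the storage is obtained separately, as here.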
Why not use a pointer? Because a pointer occupies extra space and has to be initialized.
There are other good reasons not to use a pointer, namely that a pointer requires an address space in order to be meaningful. The struct hack is externalizeable: that is to say, there are situations in which such a layout conforms to external storage such as areas of files, packets or shared memory, in which you do not want pointers because they are not meaningful.
Several years ago, I used the struct hack in a shared memory message passing interface between kernel and user space. I didn't want pointers there, because they would have been meaningful only to the original address space of the process generating a message. The kernel part of the software had a view to the memory using its own mapping at a different address, and so everything was based on offset calculations.
It's worth pointing out IMO the best way to do the size calculation, which is used in the Raymond Chen article linked above.
struct foo
{
size_t count;
int data[1];
};
size_t foo_size_from_count(size_t count)
{
return offsetof(foo, data[count]);
}
The offset of the first entry past the end of the desired allocation is also the size of the desired allocation. IMO it's an extremely elegant way of doing the size calculation. It does not matter what the element type of the variable-size array is. The offsetof (or FIELD_OFFSET or UFIELD_OFFSET in Windows) is always written the same way. There are no sizeof() expressions to accidentally mess up.