Fixed length data types in C/C++

I've heard that the size of data types such as int may vary across platforms.
My first question is: can someone give an example of what goes wrong when a program
assumes an int is 4 bytes, but on a different platform it is, say, 2 bytes?
My second question is related. I know people solve this issue with typedefs,
so that you have types like u8, u16, u32, which are guaranteed to be 8 bits, 16 bits and 32 bits regardless of the platform. My question is: how is this usually achieved? (I am not referring to the types from the stdint library; I am curious how one can manually enforce that some type is always, say, 32 bits regardless of the platform.)

I know people solve this issue with typedefs, so that you have types like u8, u16, u32, which are guaranteed to be 8 bits, 16 bits and 32 bits regardless of the platform
There are some platforms which have no types of a certain size (for example TI's 28xxx, where the size of char is 16 bits). In such cases it is not possible to have an 8-bit type (unless you really want one, but that may introduce a performance hit).
How is this usually achieved?
Usually with typedefs. C99 (and C++11) provide these typedefs in a header (<stdint.h> / <cstdint>). So, just use them.
Can someone give an example of what goes wrong when a program assumes an int is 4 bytes, but on a different platform it is, say, 2 bytes?
The best example is communication between systems with different type sizes. When sending an array of ints from one platform to another where sizeof(int) differs, one has to take extreme care.
The same goes for saving an array of ints in a binary file on a 32-bit platform and reinterpreting it on a 64-bit platform.
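A common way to avoid that is to serialize with a fixed-width type rather than int. Here is a minimal sketch, assuming C99/C++11 <cstdint> is available; the function names are just illustrative, and byte order is deliberately ignored:
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Write and read the array as exactly 32-bit integers, so a platform where
// sizeof(int) is 2 and one where it is 4 agree on the file layout.
// (Endianness is a separate issue, not handled here.)
void save(const std::int32_t* data, std::size_t count, std::FILE* fp) {
    std::fwrite(data, sizeof(std::int32_t), count, fp);
}

std::size_t load(std::int32_t* data, std::size_t count, std::FILE* fp) {
    return std::fread(data, sizeof(std::int32_t), count, fp);
}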

In earlier iterations of the C standard, you generally wrote your own typedef statements to ensure you got (for example) a 16-bit type, based on #define strings passed to the compiler, for example:
gcc -DINT16_IS_LONG ...
Nowadays (C99 and above), there are specific types such as uint16_t, the exactly 16-bit wide unsigned integer.
Provided you include stdint.h, you get exact-width types, at-least-that-width types, fastest types with a given minimum width, and so on, as documented in C99 7.18 Integer types <stdint.h>. If an implementation has compatible types, it is required to provide them.
Also very useful is inttypes.h which adds some other neat features for format conversion of these new types (printf and scanf format strings).
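For example, a small sketch of those format macros in use (the variable names are just illustrative):
#include <cinttypes>
#include <cstdio>

int main() {
    std::int32_t value = -123456;
    std::uint16_t port = 8080;
    // The PRI* macros expand to the correct printf length modifier and
    // conversion specifier for the fixed width types on this platform.
    std::printf("value = %" PRId32 ", port = %" PRIu16 "\n", value, port);
    return 0;
}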

For the first question: Integer Overflow.
For the second question: for example, to typedef an unsigned 32-bit integer on a platform where int is 4 bytes, use:
typedef unsigned int u32;
On a platform where int is 2 bytes while long is 4 bytes:
typedef unsigned long u32;
In this way, you only need to modify one header file to make the types cross-platform.
If there are platform-specific macros available, this can be achieved without manual modification:
#if defined(PLAT1)
typedef unsigned int u32;
#elif defined(PLAT2)
typedef unsigned long u32;
#endif
If C99 stdint.h is supported, it's preferred.

First of all: never write programs that rely on the width of types like short, int, unsigned int, and so on.
Basically: "never rely on a width that isn't guaranteed by the standard".
If you want to be truly platform independent and store e.g. the value 33000 as a signed integer, you can't just assume that an int will hold it. An int has at least the range -32767 to 32767 or -32768 to 32767 (depending on ones'/two's complement). That's just not enough, even though int is usually 32 bits and therefore capable of storing 33000. For this value you definitely need a type wider than 16 bits, so you simply choose int32_t or int64_t. If such a type doesn't exist, the compiler will tell you with an error, rather than letting a silent mistake slip by.
Second: C++11 provides a standard header for fixed width integer types. None of these are guaranteed to exist on your platform, but when they exist, they are guaranteed to be of the exact width. See this article on cppreference.com for a reference. The types are named in the format int[n]_t and uint[n]_t, where n is 8, 16, 32 or 64. You'll need to include the header <cstdint>. The C header is of course <stdint.h>.
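A minimal sketch of the advice above (assuming C++11):
#include <cstdint>

int main() {
    // int is only guaranteed a range of -32767..32767, so 33000 may not fit.
    // int_least32_t always exists and is guaranteed to hold it.
    std::int_least32_t value = 33000;
    // int32_t is exactly 32 bits; on a platform that cannot provide such a
    // type, this line fails to compile instead of silently misbehaving.
    std::int32_t exact = 33000;
    return value == exact ? 0 : 1;
}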

Usually, the issue shows up when you max out the number or when you're serializing. A less common scenario happens when someone makes an explicit size assumption.
In the first scenario:
int x = 32000;
int y = 32000;
int z = x+y; // can cause overflow for 2 bytes, but not 4
In the second scenario,
struct header {
    int magic;
    int w;
    int h;
};
then one goes to fwrite:
header h;
// fill in h
fwrite(&h, sizeof(h), 1, fp);
// this is all fine and good until one freads from an architecture with a different int size
In the third scenario:
int* x = new int[100];
char* buff = (char*)x;
// now try to change the 3rd element of x via buff assuming int size of 2
*((int*)(buff+2*2)) = 100;
// (of course, it's easy to fix this with sizeof(int))
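A sketch of that sizeof(int)-based fix (illustrative only):
#include <cstring>

int main() {
    int* x = new int[100];
    char* buff = reinterpret_cast<char*>(x);
    // Locate the 3rd element using sizeof(int) instead of a hard-coded 2,
    // so the offset stays correct whatever the platform's int size is.
    int value = 100;
    std::memcpy(buff + 2 * sizeof(int), &value, sizeof(int));
    delete[] x;
    return 0;
}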
If you're using a relatively new compiler, I would use uint8_t, int8_t, etc. in order to be sure of the type size.
With older compilers, the typedefs are usually defined on a per-platform basis. For example, one might do:
#ifdef _WIN32
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
// and so on...
#endif
In this way, there would be a header per platform that defines specifics of that platform.

I am curious how one can manually enforce that some type is always, say, 32 bits regardless of the platform.
If you want your (modern) C++ program's compilation to fail if a given type is not the width you expect, add a static_assert somewhere. I'd add this around where the assumptions about the type's width are being made.
static_assert(sizeof(int) == 4, "Expected int to be four chars wide but it was not.");
char is 8 bits wide on most commonly used platforms, but not all platforms work this way.
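If you want to rule that out as well, one possible variant (a sketch, assuming C++11) also checks CHAR_BIT from <climits>:
#include <climits>

// Fail the build, rather than misbehave at run time, if either assumption
// about the platform is wrong.
static_assert(CHAR_BIT == 8, "Expected char to be 8 bits wide.");
static_assert(sizeof(int) * CHAR_BIT == 32, "Expected int to be 32 bits wide.");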

Well, first example - something like this:
int a = 45000; // neither a nor b
int b = 40000; // fits in a signed 2-byte int
int c = a + b; // overflows with 16-bit int, but not with 32-bit int
If you look into the cstdint header, you will find how all the fixed size types (int8_t, uint8_t, etc.) are defined - and the only thing that differs between architectures is this header file. So, on one architecture int16_t could be:
typedef int int16_t;
and on another:
typedef short int16_t;
Also, there are other types which may be useful, like int_least16_t.
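A brief sketch of what those least/fast variants guarantee:
#include <cstdint>

// These are guaranteed to exist on every conforming implementation,
// even where an exact 16-bit type does not:
std::int_least16_t smallest = 12345; // at least 16 bits, smallest such type
std::int_fast16_t fastest = 12345;   // at least 16 bits, fastest such type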

If a type is smaller than you think, then it may not be able to store a value you need to store in it.
To create fixed size types, you read the documentation for the platforms to be supported and then define typedefs based on #ifdefs for the specific platforms.

Can someone give an example of what goes wrong when a program assumes an int is 4 bytes, but on a different platform it is, say, 2 bytes?
Say you've designed your program to read 100,000 inputs and you're counting them using an unsigned int, assuming a size of 32 bits (a 32-bit unsigned int can count up to 4,294,967,295). If you compile the code on a platform (or compiler) with 16-bit integers (a 16-bit unsigned int can count only up to 65,535), the value will wrap around past 65,535 and give a wrong count.
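A hedged sketch of the fix, using a type that is guaranteed to be at least 32 bits wide (the function is purely illustrative):
#include <cstdint>

// Count up to 100,000 inputs. With a 16-bit unsigned int the counter would
// wrap around at 65,535; uint_least32_t removes that assumption on every
// conforming platform.
std::uint_least32_t count_inputs(std::uint_least32_t n_inputs) {
    std::uint_least32_t count = 0;
    for (std::uint_least32_t i = 0; i < n_inputs; ++i) {
        ++count; // stand-in for "read and process one input"
    }
    return count;
}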

Compilers are responsible for obeying the standard. When you include <cstdint> or <stdint.h>, they must provide types with the sizes the standard requires.
Compilers know which platform they are compiling for, so they can use internal macros or magic to build the suitable types. For example, a compiler on a 32-bit machine might generate a __32BIT__ macro and ship a stdint header containing lines like:
#ifdef __32BIT__
typedef __int32_internal__ int32_t;
typedef __int64_internal__ int64_t;
...
#endif
and you can use it.

Bit flags are the trivial example: 0x10000 will cause you problems, since you can't mask with it or check whether a bit is set in that 17th position if everything is being truncated or squashed to fit into 16 bits.
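A minimal illustration of that failure mode, with an illustrative flag constant and function name:
#include <cstdint>

const std::uint32_t FLAG_17 = 0x10000; // bit 17; needs more than 16 bits

bool has_flag_17(std::uint32_t flags) {
    // With a 16-bit unsigned type, 0x10000 would be truncated to 0 and this
    // test could never succeed; a fixed 32-bit type keeps the check valid.
    return (flags & FLAG_17) != 0;
}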

Related

Forcing sign of a bit field (pre-C++14) when using fixed size types

Skip ahead to the essential question below; the rest is just background.
For reasons I prefer not to get into, I'm writing a code generator that generates C++ structs in a (very) pre-C++14 environment. The generator has to create bit-fields; it also needs the tightest possible control over the behaviour of the generated fields, in as portable a fashion as possible. I need to control both the size of the underlying allocation unit, and how signed values are handled. I won't get into why I'm on such a fool's errand, that so obviously runs afoul of Implementation Defined behaviour, but there's a paycheck involved, and all the right ways to do what needs to be done have been rejected by the people who arrange the paychecks.
So I'm stuck generating things like:
int32_t x : 11;
because I need to convince the compiler that this field (and other adjacent fields with the same underlying type) live in a 32 bit word. Generating int for the underlying type is not an option because int doesn't have a fixed size, and things would go very wrong the day someone releases a compiler in which int is 64 bits wide, or we end up back on one where it's 16.
In pre-C++14, int x : 11 might or might not be an unsigned field, and you prepend an explicit signed or unsigned to get what you need. I'm concerned that int32_t and friends will have the same ambiguity (why wouldn't they?), but compilers are gagging on signed int32_t.
Does the C++ standard have any words on whether the intxx_t types impose their signedness on bit fields? If not, is there any guarantee that something like
typedef signed int I32;
...
I32 x : 11;
...
assert(sizeof(I32)==4); //when this breaks, you won't have fun
will carry the signed indicator into the bitfield?
Please note that any suggestion that starts with "just generate a function to..." is by fiat off the table. These generated headers will be plugged into code that does things like s->x = 17; and I've had it nicely explained to me that I must not suggest changing it all to s->set_x(17) even one more time. Even though I could trivially generate a set_x function to exactly and safely do what I need without any implementation defined behaviour at all. Also, I'm very aware of the vagaries of bit fields, and left to right and right to left and inside out and whatever else compilers get up to with them, and several other reasons why this is a fool's errand. And I can't just "try stuff" because this needs to work on compilers I don't have, which is why I'm scrambling after guarantees in the standard.
Note: I can't implement any solution that doesn't allow existing code to simply cast a pointer to a buffer of bytes to a pointer to the generated struct, and then use their pointer to get to fields to read and write. The existing code is all about s->x, and must work with no changes. That rules out any solution involving a constructor in generated code.
Does the C++ standard have any words on whether the intxx_t types impose their signedness on bit fields?
No.
The standard's synopsis for the fixed-width integers of <cstdint>, [cstdint.syn] (link to modern standard; the relevant parts of the synopsis look the same in the C++11 standard), simply specifies, descriptively (not by means of the signed/unsigned keywords), that they shall be of "signed integer type" or "unsigned integer type".
E.g. for gcc, <cstdint> exposes the fixed width integers of <stdint.h>, which in turn are typedefs based on predefined pre-processor macros (e.g. __INT32_TYPE__ for int32_t), the latter being platform specific.
The standard does not impose any required use of the signed or unsigned keywords in this synopsis, and thus bit fields of fixed width integer types will, in C++11, suffer the same implementation-defined behavior regarding their signedness as is present when declaring a plain integer bit field. Recall that the relevant part of [class.bit]/3 prior to C++14 was (prior to action due to CWG 739):
It is implementation-defined whether a plain (neither explicitly signed nor unsigned) char, short, int, long, or long long bit-field is signed or unsigned. ...
Indeed, the following thread
How are the GNU C preprocessor predefined macros used?
shows an example where e.g. __INT32_TYPE__ on the answerer's particular platform is defined with no explicit presence of the signed keyword:
$ gcc -dM -E - < /dev/null | grep __INT
...
#define __INT32_TYPE__ int
it also needs the tightest possible control over the behaviour of the generated fields, in as portable a fashion as possible. I need to control both the size of the underlying allocation unit, and how signed values are handled.
These two goals are incompatible. Bitfields inherently have portability problems.
If the standard defined the behaviors you want, then the "vagaries of bit fields" wouldn't exist, and people wouldn't bother recommending using bitmasks and shifts for portability.
What you possibly could do is to provide a class that exposes the same interface as a struct with bitfields but that doesn't actually use bitfields internally. Then you could make its constructor and destructor read or write those fields portably via masks and shifts. For example, something like:
#include <cstdint>
#include <cassert>

class BitfieldProxy
{
public:
    BitfieldProxy(uint32_t& u)
        : x((u >> 4) & 0x7FF),
          y(u & 0xF),
          mDest(u)
    {
    }

    ~BitfieldProxy()
    {
        assert((x & 0x7FF) == x);
        assert((y & 0xF) == y);
        // Pack the fields back into the referenced destination word.
        mDest = (x << 4) | y;
    }

    BitfieldProxy(const BitfieldProxy&) = delete;
    BitfieldProxy& operator=(const BitfieldProxy&) = delete;

    // Only the last 11 bits are valid.
    unsigned int x;
    // Only the last 4 bits are valid.
    unsigned int y;

private:
    uint32_t& mDest;
};
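A possible usage sketch of that proxy (illustrative names only; whether it satisfies the question's "no change to s->x = 17" constraint is a separate matter):
void example(uint32_t& packed_word)
{
    BitfieldProxy p(packed_word);
    p.x = 17; // plays the role of s->x = 17;
    p.y = 3;
}   // ~BitfieldProxy() writes (17 << 4) | 3 back into packed_word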

x86-64 MSVC++/Intel C++ change size of int, long, etc

I wish to have the following sizes when I compile (using Visual C++ 2015 and/or Intel C++ 16.0)
char 32 bits unsigned (for UTF-32 characters)
short 32 bits
int 64 bits
long 128 bits
Pointers and size_t 64 bits (which they are currently)
Is this possible to change? My current solution uses the macros:
#define int int64_t
#define char char32_t // Is this unsigned?
#define short int32_t
#define long __int128
But it has problems: for example, "int main" doesn't work... And I can't define "signed int", "unsigned int", etc., as macro names can't have spaces.
EDIT: The reason I want to do this is to improve legibility (so I don't have to write int64_t...) and also to make any code I use, that uses int/char/short/long to automatically upgrade (when recompiling) to using 64/32/32/128 bits, without having to modify it directly.
You cannot do this. The only proper way to achieve this is by introducing your own types and using them instead.
Also, when using types like int you must not depend on the underlying size apart from what the standard says (i.e. in the case of int the only guarantee is that it's at least 16 bits). What you want to achieve is a dependency you shouldn't have, and it would make your code completely unportable. Besides, I don't see why int64_t would be less legible than int. Also, the redefinition you want would come unexpectedly to other developers and is thus likely to cause bugs. Using your own types makes it explicit that the types are different.
It's not necessary to use a macro to define an alias for unsigned int; you can write code like the following:
typedef unsigned int UINT;
Alternatively, you could also write code like this:
#define UINT balabala

Content within types.h — where does the compiler define the width of int, signed int and others?

I read both /usr/include/bits/types.h and /usr/include/sys/types.h, but they only use "unsigned int" or "signed int" to define other, relatively rarely used types, e.g.:
typedef signed char __int8_t;
...
typedef signed int __int32_t;
or:
#define __S32_TYPE int
#define __U32_TYPE unsigned int;
As to "where is the signed int (or int) originally defined?" and "in which file, gcc decide the int should be 4 bytes width in my x86-64 server"? I cannot find anything.
I am wondering the process in which the gcc/g++ compiler define these primitive type for us, and want to see the originally definition file.
Please tell me the originally position or enlighten me about some method to find them.
int, unsigned int, long and some others are built-in types; they are defined by the compiler itself. The standard places some demands on those types, for instance int must be at least 16 bits, but the compiler may make it wider. Usually int is the most efficient integral type of at least 16 bits.
You should not rely on the actual size of int; if you need it to hold more than 32767, stick to long or long long. If you need an integral type with an exact number of bits because of the desired overflow behavior, you can use the uint16_t/uint32_t types. If you only want to make sure there is at least a certain number of bits, you can also use uint_fast16_t/uint_fast32_t.
The basic types are intrinsic to the compiler; they are built in when the compiler is compiled, and are not defined anywhere you can find it easily. (Somewhere in the code there is the relevant information, but it won't be particularly easy to find.)
Thus, you won't find the information in a header directly. You can get the size information from the sizeof() operator. You can infer sizes from the macros in <limits.h> and <float.h>.
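For example, a quick sketch that prints what this particular compiler chose:
#include <climits>
#include <cstdio>

int main() {
    // sizeof() reports the size in chars; CHAR_BIT gives bits per char.
    std::printf("sizeof(int) = %zu bytes, CHAR_BIT = %d\n", sizeof(int), CHAR_BIT);
    std::printf("INT_MIN = %d, INT_MAX = %d\n", INT_MIN, INT_MAX);
    return 0;
}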

crossplatform 64 bit type

Is there a 64-bit type that has a size of 64 bits on every OS (32- or 64-bit) and with every compiler?
The same question also applies to a 32-bit type. (Should it be int?)
The origin of the question is: I am implementing a system which has 2 kinds of instructions:
32 bit
64 bit
I want to write something like:
typedef int instruction32bit;
typedef long long instruction64bit; // not correct: on some systems long long is 128 bits
You are looking for int64_t and int32_t, or their unsigned friends uint64_t and uint32_t. Include either cinttypes or cstdint.
If you want your code to be truly portable, then you probably want to typedef your own type, and use for example
typedef int32_t instruction32bit;
typedef int64_t instruction64bit;
This will work MOST of the time, but if it doesn't for a particular system/compiler/whatever, you can do something like this:
#ifdef SOMEDEFINE
typedef long long int instruction64bit;
typedef int instruction32bit;
#else
typedef int32_t instruction32bit;
typedef int64_t instruction64bit;
#endif
Of course, for each model of compiler/OS (or group thereof) that doesn't support int32_t and int64_t, you probably will need a special #ifdef.
This is exactly what all truly portable code does, because no matter how much you find that "nearly all compilers do X", if you get your code popular enough, there's always someone who wants to compile the code with "Bob's Compiler Project" which doesn't have this feature. Of course, the other thing is to just let those who use "Bob's compiler" edit the typedef itself, and not accept the "For Bob's compiler, you need this ..." patch that inevitably gets sent your way.
As Carl Norum points out in a comment, the #ifdef may be possible to convert to a #if in many cases, and then use generic types such as int and long.
Use uint_least32_t and uint_least64_t. The fixed-size types uint32_t and uint64_t will not exist on systems that don't have the exact sizes they describe.
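A sketch combining both approaches, relying on the fact that the exact-width limit macros are defined exactly when the corresponding types exist (as the C and C++ standards specify):
#include <cstdint>

// Prefer the exact-width types when the implementation provides them,
// otherwise fall back to the least-width types, which always exist.
#if defined(UINT32_MAX) && defined(UINT64_MAX)
typedef std::uint32_t instruction32bit;
typedef std::uint64_t instruction64bit;
#else
typedef std::uint_least32_t instruction32bit;
typedef std::uint_least64_t instruction64bit;
#endif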

Is int guaranteed to be 32 bits on each platform supported by Qt, or only qint32?

I remember reading somewhere that Qt guarantees the size of some data types on supported platforms. Is it that int will be at least 32 bits everywhere, and qint32 will be exactly 32 bits everywhere? Or something else?
C++ guarantees that int will be at least 16 bits, and some Qt structures like QRect and QPoint use int internally. I'm developing an application where 32 bits is needed with those types, and I don't want to have to duplicate their functionality so I can use a larger type.
The size of an integer type is up to the compiler. I don't think there's a guarantee that plain int will be of a precise size. But you can make sure you know when it's not what you want by adding this check at the beginning of your main():
// requires #include <stdexcept>
if (sizeof(int) != 4) {
    throw std::runtime_error("int is not 32-bit");
}
While, as far as I know, it's technically possible that int isn't 32 bits, I've never seen a platform where it isn't. Imagine: char is 8 bits, short is 16 bits, int is... 24 bits? It simply doesn't fit the hierarchy for int to be anything other than 32 bits.
In addition, you can use UINT_MAX to confirm int's size on your given compiler.
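One way to turn that into a compile-time check (a sketch, assuming C++11 for static_assert):
#include <climits>

// UINT_MAX equals 4294967295 exactly when unsigned int has 32 value bits.
static_assert(UINT_MAX == 4294967295u, "unsigned int is not 32 bits wide");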