Forcing types to a specific size - C++

I've been learning C++, and one thing I'm not really comfortable with is that data type sizes are not consistent. Depending on what system something is deployed on, an int could be 16 bits or 32 bits, etc.
So I was thinking it might be a good idea to make my own header file with data types like byte, word, etc. that are defined to be a specific size and will maintain that size on any platform.
Two questions. First, is this a good idea? Or is it going to create other problems I'm not aware of? Second, how do you define a type as being, say, 8 bits? I can't just say #define BYTE char, because the width of char could vary across platforms.

Fortunately, other people have noticed the same problem. C99 and C++11 added the header stdint.h (for C) and cstdint (for C++), so set your compiler to one of those modes; there should be a switch in your compiler settings. If you #include <cstdint>, you get the types int8_t, int16_t, int32_t, and int64_t, plus their unsigned counterparts prefixed with u (uint8_t and so on). If your platform supports those types, they will be defined in the header, along with several others.
If your compiler does not yet support that standard (or you are forced by reasons out of your control to remain on C++03), then there is also Boost.
However, you should only use these if you care about the exact size of the type. int and unsigned are fine for throw-away variables in most cases, and size_t should be used for indexing std::vector, etc.
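For instance, a minimal sketch of what this looks like in practice with C++11 (the exact-width types are only defined if the platform can provide them):

#include <cstdint>   // std::int8_t, std::int32_t, std::uint64_t, ...
#include <cstddef>   // std::size_t
#include <iostream>
#include <vector>

int main() {
    std::uint8_t  flags   = 0xFF;     // exactly 8 bits, where available
    std::int32_t  sample  = -123456;  // exactly 32 bits
    std::uint64_t counter = 0;        // exactly 64 bits

    std::vector<int> v{1, 2, 3};      // plain int is fine for throw-away values
    for (std::size_t i = 0; i < v.size(); ++i)   // size_t for indexing
        counter += static_cast<std::uint64_t>(v[i]);

    std::cout << int(flags) << ' ' << sample << ' ' << counter << '\n';
}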

First you need to figure out whether you really care what size things are. If you are using an int to count the number of lines in a file, do you really care if it's 32 bits or 64? You need BYTE, WORD, etc. if you are working with packed binary data, but generally not for any other reason. So you may be worrying about something that doesn't really matter.

Better yet, use the types already defined in stdint.h.
Example:
int32_t is always 32 bits.

Many libraries have their own header with a lot of typedefs to provide fixed-size types. This is useful when writing portable code, and avoids relying on the headers of the platform you are currently working with.

If you only want to make sure the built-in data types have a minimum size, you can use std::numeric_limits from the <limits> header to check.
std::numeric_limits<int>::digits
will give you, for example, the number of bits of an int without the sign bit. And
std::numeric_limits<int>::max()
will give you the max value.
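For example, a small sketch of both queries, plus a compile-time check built on top of them (the static_assert needs C++11):

#include <limits>
#include <iostream>

int main() {
    // Number of value bits in an int, excluding the sign bit
    std::cout << "int value bits: " << std::numeric_limits<int>::digits << '\n';
    // Largest and smallest representable int
    std::cout << "int max: " << std::numeric_limits<int>::max() << '\n';
    std::cout << "int min: " << std::numeric_limits<int>::min() << '\n';

    // Refuse to build if int is narrower than this code assumes
    static_assert(std::numeric_limits<int>::digits >= 31,
                  "int must be at least 32 bits wide for this code");
}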

Related

Size of Primitive data types

What exactly does the size of a primitive data type like int depend on?
Compiler
Processor
Development Environment
Or is it a combination of these or other factors?
An explanation of the reason would be really helpful.
EDIT: Sorry for the confusion. I meant to ask about primitive data types like int, not about PODs; I do understand PODs can include structures, and with structures it is a whole different ball game, with padding coming into the picture.
I have corrected the question; this edit note should ensure the answers regarding PODs don't look irrelevant.
I think there are two parts to this question:
What sizes primitive types are allowed to be.
This is specified by the C and C++ standards: each type must support a minimum range of values, which implicitly places a lower bound on its size in bits (e.g. long must be at least 32 bits to comply with the standard).
The standards do not specify the size in bytes, because the definition of the byte is up to the implementation; e.g. char is one byte, but a byte (whose width is given by the CHAR_BIT macro) may be 16 bits wide.
The actual size as defined by the implementation.
This, as other answers have already pointed out, is dependent on the implementation: the compiler. And the compiler implementation, in turn, is heavily influenced by the target architecture. So it's plausible to have two compilers running on the same OS and architecture but producing different sizes of int. The only assumption you can make is the one stated by the standard (given that the compiler implements it).
There also may be additional ABI requirements (e.g. fixed size of enums).
First of all, it depends on the compiler. The compiler, in turn, usually depends on the architecture, processor, development environment, etc., because it takes them into account. So you may say it's a combination of all of them. But I would NOT say that. I would say: the compiler, since on the same machine you may get different sizes of POD and built-in types if you use different compilers. Also note that your source code is input to the compiler, so it's the compiler that makes the final decision about the sizes of POD and built-in types. However, it's also true that this decision is influenced by the underlying architecture of the target machine. After all, a truly useful compiler has to emit efficient code that eventually runs on the machine you target.
Compilers provide options too, and a few of them can affect sizes as well!
EDIT: What the standards say:
The sizes of char, signed char, and unsigned char are defined by the C++ Standard itself! The sizes of all other types are defined by the compiler.
The C++03 Standard, §5.3.3/1, says:
sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1; the result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined. [Note: in particular, sizeof(bool) and sizeof(wchar_t) are implementation-defined.]
The C99 Standard (§6.5.3.4) likewise defines the sizes of char, signed char, and unsigned char to be 1, but leaves the sizes of other types to be defined by the compiler!
EDIT:
I found this C++ FAQ chapter really good (the entire chapter). It's a very tiny chapter, though. :-)
http://www.parashift.com/c++-faq-lite/intrinsic-types.html
Also read the comments below; there are some good arguments!
If you're asking about the size of a primitive type like int, I'd say it depends on the factors you cited.
The compiler/environment couple (where environment often means OS) is surely part of it, since the compiler can map the various "sensible" sizes onto the built-in types in different ways for various reasons: for example, compilers on x86_64 Windows will usually have a 32-bit long and a 64-bit long long to avoid breaking code written for plain x86; on x86_64 Linux, instead, long is usually 64 bits because it's a more "natural" choice and apps developed for Linux are generally more architecture-neutral (because Linux runs on a much greater variety of architectures).
The processor surely matters in the decision: int should be the "natural size" of the processor, usually the size of its general-purpose registers. This means it's the type that will work fastest on the current architecture. long, instead, is often thought of as a type that trades performance for an extended range (this is rarely true on regular PCs, but on microcontrollers it's normal).
If instead you're also talking about structs & co. (which, if they respect some rules, are PODs), again the compiler and the processor influence their size, since they are made of built-in types plus the padding chosen by the compiler to achieve the best performance on the target architecture.
As I commented under #Nawaz's answer, it technically depends solely on the compiler.
The compiler is just tasked with taking valid C++ code, and outputting valid machine code (or whatever language it targets).
So a C++ compiler could decide to make an int have a size of 15, and require it to be aligned on 5-byte boundaries, and it could decide to insert arbitrary padding between the variables in a POD. Nothing in the standard prohibits this, and it could still generate working code.
It'd just be much slower.
So in practice, compilers take some hints from the system they're running on, in two ways:
- the CPU has certain preferences: for example, it may have 32-bit wide registers, so making an int 32 bits wide would be a good idea, and it usually requires variables to be naturally aligned (a 4-byte wide variable must be aligned on an address divisible by 4, for example), so a sensible compiler respects these preferences because it yields faster code.
- the OS may have some influence too, in that if it uses another ABI than the compiler, making system calls is going to be needlessly difficult.
But those are just practical considerations to make life a bit easier for the programmer or to generate faster code. They're not required.
The compiler has the final word, and it can choose to completely ignore both the CPU and the OS. As long as it generates a working executable with the semantics specified in the C++ standard.
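A quick way to see what your particular compiler chose is simply to print it; the output of the snippet below will differ between compilers and targets, which is exactly the point:

#include <climits>
#include <iostream>

int main() {
    std::cout << "CHAR_BIT:          " << CHAR_BIT          << '\n'
              << "sizeof(short):     " << sizeof(short)     << '\n'
              << "sizeof(int):       " << sizeof(int)       << '\n'
              << "sizeof(long):      " << sizeof(long)      << '\n'
              << "sizeof(long long): " << sizeof(long long) << '\n'
              << "sizeof(void*):     " << sizeof(void*)     << '\n';
}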
It depends on the implementation (compiler).
Implementation-defined behavior means unspecified behavior where each implementation documents how the choice is made.
A struct can also be a POD, in which case you can explicitly control potential padding between members with #pragma pack on some compilers.
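A hedged sketch of that idea follows; #pragma pack is non-standard, so the exact syntax and behavior depend on the compiler (the push/pop form used here is understood by MSVC, GCC, and Clang), and the struct names are just illustrative:

#include <cstdint>

// Default layout: the compiler may insert padding between members.
struct Unpacked {
    std::uint8_t  tag;    // 1 byte, typically followed by 3 bytes of padding
    std::uint32_t value;  // usually aligned to a 4-byte boundary
};

// Packed layout: no padding, at the cost of potentially unaligned access.
#pragma pack(push, 1)
struct Packed {
    std::uint8_t  tag;
    std::uint32_t value;
};
#pragma pack(pop)

// Holds on compilers that honor the pragma; sizeof(Unpacked) is typically 8,
// but that is implementation-defined.
static_assert(sizeof(Packed) == 5, "Packed should be exactly 5 bytes");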

C++: Datatypes, which to use and when?

I've been told that I should always use size_t when I want a 32-bit unsigned int. I don't quite understand why, but I think it has something to do with the fact that if someone compiles the program on a 16- or 64-bit machine, an unsigned int would become 16 or 64 bits while size_t won't. But why doesn't it? And how can I force the bit sizes to be exactly what I want?
So, where is the list of which data type to use and when? For example, is there a size_t alternative to unsigned short? Or for a 32-bit int? etc. How can I be sure my data types have as many bits as I chose in the first place, without worrying about different bit sizes on other machines?
Mostly I care more about the memory used than about the marginal speed boost I'd get from doubling the memory usage, since I don't have much RAM. So I want to stop worrying whether everything will break apart if my program is compiled on a machine that's not 32-bit. For now I've always used size_t when I want it to be 32-bit, but for short I don't know what to do. Someone help me clear my head.
On the other hand: if I need a 64-bit size variable, can I use it on a 32-bit machine successfully? And what is that data type's name (if I want it to be 64 bits always)?
size_t is for storing object sizes. It is of exactly the right size for that and only that purpose - 4 bytes on 32-bit systems and 8 bytes on 64-bit systems. You shouldn't confuse it with unsigned int or any other datatype. It might be equivalent to unsigned int or might be not depending on the implementation (system bitness included).
Once you need to store something other than an object size you shouldn't use size_t and should instead use some other datatype.
As a side note: For containers, to indicate their size, don't use size_t, use container<...>::size_type
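A small illustration of the distinction (nothing more than a sketch):

#include <cstddef>
#include <vector>

int main() {
    std::vector<double> samples(1000);

    // The container's own size type is guaranteed to fit any valid index into it
    for (std::vector<double>::size_type i = 0; i < samples.size(); ++i)
        samples[i] = 0.0;

    // size_t is the natural choice for raw arrays and sizeof results
    double raw[16];
    for (std::size_t i = 0; i < sizeof(raw) / sizeof(raw[0]); ++i)
        raw[i] = 0.0;
}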
boost/cstdint.hpp can be used to make sure the integers have the right size.
size_t is not necessarily 32-bit. It has been 16-bit with some compilers, and it's 64-bit on a 64-bit system.
The C++ standard guarantees, via reference down to the C standard, that long is at least 32 bits.
int is only formally guaranteed 16 bits, but in practice I wouldn't worry: the chance that any ordinary code will be used on a 16-bit system is slim indeed, and on any 32-bit system int is 32-bit. Of course it's different if you're coding for a 16-bit system like some embedded computer. But in that case you'd probably be writing system-specific code anyway.
Where you need exact sizes you can use <stdint.h> if your compiler supports that header (it was introduced in C99, and the current C++ standard stems from 1998), or alternatively the corresponding Boost library header boost/cstdint.hpp.
However, in general, just use int. ;-)
Cheers & hth.,
size_t is not always 32-bit. E.g. It's 64-bit on 64-bit platforms.
For fixed-size integers, stdint.h is best. But it doesn't come with VS2008 or earlier - you have to download it separately. (It comes as a standard part of VS2010 and most other compilers).
Since you're using VS2008, you can use the MS-specific __int32, unsigned __int32 etc types. Documentation here.
To answer the 64-bit question: Most modern compilers have a 64-bit type, even on 32-bit systems. The compiler will do some magic to make it work. For Microsoft compilers, you can just use the __int64 or unsigned __int64 types.
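If you'd rather avoid the Microsoft-specific spellings, the standard equivalents below should behave the same way; on a 32-bit target the compiler synthesizes 64-bit arithmetic out of 32-bit operations (a sketch assuming <cstdint> is available):

#include <cstdint>
#include <iostream>

int main() {
    // 64-bit arithmetic works even when compiled for a 32-bit target.
    std::int64_t  big    = INT64_C(9000000000);    // too large for a 32-bit int
    std::uint64_t ubig   = UINT64_C(18000000000);
    long long     native = 9000000000LL;           // the built-in spelling

    std::cout << big << ' ' << ubig << ' ' << native << '\n';
}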
Unfortunately, one of the quirks of the nature of data types is that it depends a great deal on which compiler you're using. Naturally, if you're only compiling for one target, there is no need to worry - just find out how large the type is using sizeof(...).
If you need to cross-compile, you could ensure compatibility by defining your own typedefs for each target (surrounded by #ifdef blocks referencing which target you're cross-compiling to).
If you're ever concerned that it could be compiled on a system that uses types with even weirder sizes than you have anticipated, you could always assert(sizeof(short)==2) or equivalent, so that you could guarantee at runtime that you're using the correctly sized types.
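If your compiler supports C++11, the same checks can be moved to compile time, so a bad configuration refuses to build at all; a sketch of that idea:

#include <climits>
#include <cstdint>

// Fail the build, rather than the running program, if the assumptions are wrong.
static_assert(CHAR_BIT == 8,              "this code assumes 8-bit bytes");
static_assert(sizeof(short) == 2,         "this code assumes a 16-bit short");
static_assert(sizeof(int) >= 4,           "this code assumes int is at least 32 bits");
static_assert(sizeof(std::int32_t) == 4,  "int32_t must be exactly 4 bytes");

int main() {}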
Your question is tagged visual-studio-2008, so I would recommend looking in the documentation for that compiler for pre-defined data types. Microsoft has a number that are predefined, such as BYTE, DWORD, and LARGE_INTEGER.
Take a look in windef.h and winnt.h for more.

Why do C programmers use typedefs to rename basic types?

So I'm far from an expert on C, but something's been bugging me about code I've been reading for a long time: can someone explain to me why C(++) programmers use typedefs to rename simple types? I understand why you would use them for structs, but what exactly is the reason for declarations I see like
typedef unsigned char uch;
typedef uch UBYTE;
typedef unsigned long ulg;
typedef unsigned int u32;
typedef signed short s16;
Is there some advantage to this that isn't clear to me (a programmer whose experience begins with Java and hasn't ventured far outside of strictly type-safe languages)? Because I can't think of any reason for it; it looks like it would just make the code less readable for people unfamiliar with the project.
Feel free to treat me like a C newbie, I honestly know very little about it and it's likely there are things I've misunderstood from the outset. ;)
Renaming types without changing their exposed semantics/characteristics doesn't make much sense. In your example
typedef unsigned char uch;
typedef unsigned long ulg;
belong to that category. I don't see the point, aside from making a shorter name.
But these ones
typedef uch UBYTE;
typedef unsigned int u32;
typedef signed short s16;
are a completely different story. For example, s16 stands for "signed 16 bit type". This type is not necessarily signed short. Which specific type will hide behind s16 is platform-dependent. Programmers introduce this extra level of naming indirection to simplify the support for multiple platforms. If on some other platform signed 16 bit type happens to be signed int, the programmer will only have to change one typedef definition. UBYTE apparently stands for an unsigned machine byte type, which is not necessarily unsigned char.
It's worth noting that the C99 specification already provides a standard nomenclature for integral types of specific width, like int16_t, uint32_t and so on. It probably makes more sense to stick with this standard naming convention even on platforms that don't support C99 natively.
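A hedged sketch of what such a portability header might look like; the PROJECT_* macros are hypothetical placeholders, and on a C99/C++11 toolchain you would simply forward to the standard fixed-width names:

// project_types.h - hypothetical portability header, for illustration only
#ifndef PROJECT_TYPES_H
#define PROJECT_TYPES_H

#if defined(PROJECT_HAVE_STDINT)       // modern toolchain: reuse the standard names
  #include <cstdint>
  typedef std::int16_t  s16;
  typedef std::uint32_t u32;
#elif defined(PROJECT_PLATFORM_16BIT)  // hypothetical target where int is 16 bits
  typedef signed int    s16;
  typedef unsigned long u32;
#else                                  // typical 32/64-bit desktop target
  typedef signed short  s16;
  typedef unsigned int  u32;
#endif

#endif // PROJECT_TYPES_H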
This allows for portability. For example, suppose you need an unsigned 32-bit integer type. Which standard type is that? You don't know - it's implementation-defined. That's why you typedef a separate type to be a 32-bit unsigned integer and use the new type in your code. When you need to compile on another C implementation, you just change the typedefs.
Sometimes it is used to reduce an unwieldy thing like volatile unsigned long to something a little more compact such as vuint32_t.
Other times it is to help with portability since types like int are not always the same on each platform. By using a typedef you can set the storage class you are interested in to the platform's closest match without changing all the source code.
There are many reasons for it. What I think is:
Type names become shorter, and thus the code smaller and more readable.
Aliasing of longer structure names.
Conventions used by a particular team, company, or style.
Porting: you keep the same name across every OS and machine, even though the native data type underneath might be slightly different.
Following is a quote from The C Programming Language (K&R):
Besides purely aesthetic issues, there are two main reasons for using typedefs.
First - to parameterize a program
The first is to parameterize a program against portability problems. If typedefs are used for data types that may be machine-dependent, only the typedefs need change when the program is moved. One common situation is to use typedef names for various integer quantities, then make an appropriate set of choices of short, int, and long for each host machine. Types like size_t and ptrdiff_t from the standard library are examples.
The quoted passage tells us that programmers typedef basic types for portability. If I want to make sure my program works on different platforms, using different compilers, I will try to ensure its portability in every possible way, and typedef is one of them.
When I started programming using the Turbo C compiler on the Windows platform, it gave me a sizeof(int) of 2. When I moved to the Linux platform and the GCC compiler, the size I get is 4. If I had developed a program using Turbo C that relied on the assertion that sizeof(int) is always two, it would not have ported properly to my new platform.
Hope it helps.
The following quote from K&R is not related to your query, but I have posted it too for the sake of completeness.
Second - to provide better documentation
The second purpose of typedefs is to provide better documentation for a program - a type called Treeptr may be easier to understand than one declared only as a pointer to a complicated structure.
Most of these patterns are bad practices that come from reading and copying existing bad code. Often they reflect misunderstandings about what C does or does not require.
typedef unsigned char uch; is akin to #define BEGIN { except it saves some typing instead of making for more.
typedef uch UBYTE; is akin to #define FALSE 0. If your idea of "byte" is the smallest addressable unit, char is a byte by definition. If your idea of "byte" is an octet, then either char is the octet type, or your machine has no octet type.
typedef unsigned long ulg; is really ugly shorthand for people who can't touch type...
typedef unsigned int u32; is a mistake. It should be typedef uint32_t u32; or better yet, uint32_t should just be used directly.
typedef signed short s16; is the same mistake as the previous one; replace uint32_t with int16_t.
Please put a "considered harmful" stamp on them all. typedef should be used when you really need to create a new type whose definition could change over the life cycle of your code or when the code is ported to different hardware, not because you think C would be "prettier" with different type names.
We use it to make things project/platform specific; everything follows a common naming convention:
pname_int32, pname_uint32, pname_uint8 (pname is the project/platform/module name)
And some #defines:
pname_malloc, pname_strlen
It's easier to read, and it shortens long data types like unsigned char to pname_uint8, also making it a convention across all modules.
When porting, you only need to modify a single file, thus making porting easy.
To cut a long story short, you might want to do it to make your code portable (with less effort/editing). This way you don't depend on int; instead you use INTEGER, which can be anything you want.
All intN_t and uintN_t types, where N = 8, 16, 32, 64 and so forth, are defined per architecture in exactly this manner. This is a direct consequence of the fact that the standard does not mandate that char, int, float, etc. have exactly N bits - that would be insane. Instead, the standard defines minimum and maximum values for each type as guarantees to the programmer, and on various architectures types may well exceed those bounds. It is not an uncommon sight.
The typedefs in your post are used to define types of a certain length on a specific architecture. It's probably not the best choice of naming; u32 and s16 are a bit too short, in my opinion. Also, it's kind of a bad thing to expose the names ulg and uch; one could prefix them with an application-specific string, since they are obviously going to be exposed to other code.
Hope this helps.

Getting a pointer to a 4-byte object.. in an implementation independent way

I was programming normally when I realized that it's probably not perfectly safe to assume an int* is going to be a pointer to something 4 bytes in length.
Because some aspects of C++'s fundamental types, such as the size of an int, are implementation-defined.
What if you're dealing with something (like a waveform, for example) that has 32-bit signed integer samples? You cast the byte pointer to (int*) and deal with it one sample at a time.
I'm just curious what the "safe way" is to acquire a 4-byte pointer that ISN'T going to stop working if sometime in the future the MSVC team decides int is now 8 bytes.
There is a C99 header called stdint.h your compiler might have. It defines types like uint32_t, an unsigned 32-bit integer.
Since C++11, your compiler is required to have this header. You should include it with #include <cstdint>.
If not, check out Boost Integer, which mimics this header as <boost/cstdint.hpp>.
For storing pointers as integers, use intptr_t, defined in the same header.
Use a pointer to uint32_t instead of int. This type (and others) is defined in stdint.h and is part of the C99 standard.
One way I've seen it done is abstracting out the size with preprocessor directives and typedefs. Then you use the abstracted types, which will be correct for the set of systems you want to support.
Perhaps you could just use an assert on the sizeof(int) so that at least if your assumptions are violated in future you'll know.
By far the easiest solution is to get a char* to a char[4]. On each and every platform, char[4] is a 4-byte object. For an entire waveform, you might need a char[4*512].
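One way to combine the two ideas (fixed-width types plus a raw char buffer) without assuming anything about sizeof(int) is sketched below; decode_samples is a hypothetical helper, and std::memcpy sidesteps the alignment and aliasing problems a straight (int*) cast can run into:

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reassemble signed 32-bit samples from a raw byte buffer.
// Assumes the buffer's byte order matches the host's; swap bytes otherwise.
std::vector<std::int32_t> decode_samples(const unsigned char* bytes, std::size_t byte_count) {
    std::vector<std::int32_t> samples(byte_count / 4);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        std::uint32_t u = 0;
        std::memcpy(&u, bytes + 4 * i, 4);   // well-defined, unlike casting to int32_t*
        samples[i] = static_cast<std::int32_t>(u);
    }
    return samples;
}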

Cross-platform primitive data types in C++

Unlike Java or C#, primitive data types in C++ can vary in size depending on the platform. For example, int is not guaranteed to be a 32-bit integer.
Various compiler environments define data types such as uint32 or dword for this purpose, but there seems to be no standard include file for fixed-size data types.
What is the recommended method to achieve maximum portability?
I found this header particularly useful:
Boost cstdint
It's usually better than inventing your own wheel (which incurs maintenance and testing).
Create a header file called types.h, and define all the fixed-size primitive types you need (int32, uint32, uint8, etc.). To support multiple platforms, you can either use #ifdef's or have a separate include directory for each platform (include_x86, include_x86_64, include_sparc). In the latter case you would have separate build configurations for each platform, which would have the right include directory in their include path. The second method is preferable, according to the "The C++ Gotchas" by Stephen Dewhurst.
Just an aside, if you are planning to pass binary data between different platforms, you also have to worry about byte order.
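On the byte-order point, a common approach is to pick one wire format (say, little-endian) and convert explicitly at the boundaries; the helpers below are a small sketch of that, independent of host endianness:

#include <cstdint>

// Serialize a 32-bit value as little-endian bytes, regardless of host byte order.
inline void store_le32(std::uint32_t value, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>( value        & 0xFF);
    out[1] = static_cast<unsigned char>((value >>  8) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[3] = static_cast<unsigned char>((value >> 24) & 0xFF);
}

// Reassemble a little-endian 32-bit value, again independent of host byte order.
inline std::uint32_t load_le32(const unsigned char in[4]) {
    return  static_cast<std::uint32_t>(in[0])
         | (static_cast<std::uint32_t>(in[1]) <<  8)
         | (static_cast<std::uint32_t>(in[2]) << 16)
         | (static_cast<std::uint32_t>(in[3]) << 24);
}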
Part of the C99 standard was a stdint.h header file to provide this kind of information. For instance, it defines a type called uint32_t. Unfortunately, a lot of compilers don't support stdint.h. The best cross-platform implementation I've seen of stdint.h is here: http://www.azillionmonkeys.com/qed/pstdint.h. You can just include that in your project.
If you're using boost, I believe it also provides something equivalent to the stdint header.
Define a type (e.g. int32) in a header file. For each platform, use another #ifdef and make sure that int32 is a 32-bit integer. Everywhere in your code use int32, and make sure that when you compile on different platforms you use the right define.
There is a stdint.h header defined by the C99 standard and (I think) some variant or another of ISO C++. This defines nice types like int16_t, uint64_t, etc., which are guaranteed to have a specific size and representation. Unfortunately, its availability isn't exactly standard (Microsoft in particular was a foot-dragger here).
The simple answer is this, which works on every 32 or 64 bit byte-addressable architecture I am aware of:
All char variables are 1 byte
All short variables are 2 bytes
All int variables are 4 bytes
DO NOT use a "long", which is of indeterminate size.
All known compilers with support for 64 bit math allow "long long" as a native 64 bit type.
Be aware that some 32 bit compilers don't have a 64 bit type at all, so using long long will limit you to 64 bit systems and a smaller set of compilers (which includes gcc and MSVC, so most people won't care about this problem).
If its name begins with two underscores (__), a data type is non-standard.
__int8 (unsigned __int8)
__int16 (unsigned __int16)
__int32 (unsigned __int32)
__int64 (unsigned __int64)
Try to use boost/cstdint.hpp
Two things:
First, there is a header file called limits.h that gives lots of useful platform-specific information. It will give the max and min values for the int type, for example, and from that you can deduce how big the int type is.
You can also use the sizeof operator (which is evaluated at compile time) for these purposes.
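For example, a small sketch that deduces the width of int both ways:

#include <climits>
#include <iostream>

int main() {
    // Deduce the width of int from its limits rather than from sizeof.
    int bits = 0;
    for (unsigned int max = INT_MAX; max != 0; max >>= 1)
        ++bits;
    std::cout << "INT_MAX = " << INT_MAX << " -> "
              << bits + 1 << "-bit int (including the sign bit)\n";

    // Or simply ask for the size directly; sizeof is evaluated at compile time.
    std::cout << "sizeof(int) * CHAR_BIT = " << sizeof(int) * CHAR_BIT << '\n';
}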
I hope this helps . . .
K