Practical consequences of breaking strict aliasing between int and float - c++

I once had a very subtle bug in one of the projects I had to maintain. Essentially it was doing something like this:
union Value {
int64_t int64;
int32_t int32[2];
int16_t shorts[4];
int8_t chars[8];
float floats[2];
double float64;
};
Value v;
// in one place (not sure about exact code, it could be just memcpy):
v.shorts[0] = <some short value>;
v.shorts[1] = <some other short value>;
// in another place:
float f = v.floats[0];
Now, as far as the standard is concerned, this is simply UB. In practice, this could mean anything, but I can hardly imagine a reasonable implementation that would cause the code above to start World War III or disintegrate my PC. In real life, I can only imagine two things happening:
The compiler may screw up something with optimizations, not realizing that it deals with the same memory here. Pretty unlikely in this case, since writes and reads happened in totally different places.
Nothing bad really happens, and the float value is simply read bit-by-bit.
In practice, it was almost always case 2, except once. After running the program compiled with MSVC 2010 in release mode on about 100-150 input files, in one of the files it generated an incorrect value that differed in exactly one bit from what it was supposed to be according to common sense. That was a pretty significant bit, too, so instead of, say, 1.5, I got something like 117.9. I was able to trace it down to that exact read, and after fixing the code to adhere to strict aliasing rules, everything worked fine.
Now the question is, purely from a low-level point of view, what could have caused that? Some peculiarity in how the CPU handles floating-point values? Hardware caching specifics? Compiler quirks? Why was only one value wrong?
The hardware was some old 2-core 64-bit Intel CPU running a 32-bit Windows 7, if that is of any help. The program is a single-threaded console application, nothing fancy. The problem was 100% reproducible: the same input files always produced the same output, and it was always the same value that was wrong.

From the point of view of the Standard, the code v.shorts[0] = something; takes a pointer value of type "short*", adds zero, and uses the resulting pointer to store a value. I think the authors of C89 intended that quality implementations where aliasing would be useful would recognize aliasing in this case, but nothing in the letter of the Standard would require it. Note that when the rules were included in C89, compilers were only expected to apply them on a very local level; further, the rules generally don't pose severe problems for programmers unless they're applied at a more far-reaching level. Unfortunately, some compilers aggressively seek to extend the range of the rules as far as possible.
If you were to place each array within a separate structure within the union and then do something like:
v.floats.arr[0] = value;
v.floats.arr[1] = value;
v.floats = v.floats; // Compiler knows that float* may alter float members,
// and that writing member of union may alter other
// members
... now use other stuff
a compiler should hopefully recognize that the assignment to v.floats need not generate any code, but a conforming compiler must still regard it as proper notice that other members of the union may have been altered. Note that the pattern does not seem to be reliable in gcc as of 6.2, however; in some cases when an assignment is not required to generate any code, the compiler will ignore the assignment--including its aliasing implications--altogether. I don't see any reason to bend over backward working around gcc's broken behavior, however--simply use -fno-strict-aliasing guilt-free unless or until gcc's aliasing logic gets fixed.
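For concreteness, here is a minimal sketch of the struct-wrapped-union pattern described above (the member and function names are illustrative, not from the original code; as noted, gcc does not handle the pattern reliably, so treat it as an illustration rather than a guaranteed fix):
#include <cstdint>
union Value {
    struct Shorts { std::int16_t arr[4]; } shorts;
    struct Floats { float arr[2]; } floats;
};
void punWrite(Value &v, std::int16_t a, std::int16_t b) {
    v.shorts.arr[0] = a;
    v.shorts.arr[1] = b;
    v.floats = v.floats;        // generates no code, but notifies a conforming compiler
                                // that other members of the union may have been altered
    float f = v.floats.arr[0];  // read the punned bits
    (void)f;
}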

Related

C++ Statement Reordering

This is a question about Chandler's answer here (I didn't have a high enough rep to comment): Enforcing statement order in C++
In his answer, suppose foo() has no input or output. It's a black box that does work that is observable eventually, but won't be needed immediately (e.g. executes some callback). So we don't have input/output data locally handy to tell the compiler not to optimize. But I know that foo() will modify the memory somewhere, and the result will be observable eventually. Will the following prevent statement reordering and get the correct timing in this case?
#include <chrono>
#include <iostream>
//I believe this tells the compiler that all memory everywhere will be clobbered?
//(from his cppcon talk: https://youtu.be/nXaxk27zwlk?t=2441)
__attribute__((always_inline)) inline void DoNotOptimize() {
asm volatile("" : : : "memory");
}
// The compiler has full knowledge of the implementation.
static int ugly_global = 1; //we print this to screen sometime later
static void foo(void) { ugly_global *= 2; }
auto time_foo() {
using Clock = std::chrono::high_resolution_clock;
auto t1 = Clock::now(); // Statement 1
DoNotOptimize();
foo(); // Statement 2
DoNotOptimize();
auto t2 = Clock::now(); // Statement 3
return t2 - t1;
}
Will the following prevent statement reordering and get the correct timing in this case?
It should not be necessary because the calls to Clock::now should, at the language-definition level, enforce enough ordering. (That is, the C++11 standard says that the high resolution clock ought to get as much information as the system can give, in the way that is most useful here. See "secondary question" below.)
But there is a more general case. It's worth thinking about the question: How does whoever provides the C++ library implementation actually write this function? Or, take C++ itself out of the equation. Given a language standard, how does an implementor—a person or group writing an implementation of that language—get you what you need? Fundamentally, we need to make a distinction between what the language standard requires and how an implementation provider goes about implementing the requirements.
The language itself may be expressed in terms of an abstract machine, and the C and C++ languages are. This abstract machine is pretty loosely defined: it executes some kind of instructions, which access data, but in many cases we don't know how it does these things, or even how big the various data items are (with some exceptions for fixed-size integers like int64_t), and so on. The machine may or may not have "registers" that hold things in ways that cannot be addressed as well as memory that can be addressed and whose addresses can be recorded in pointers:
p = &var
stores into p (in memory or a register) a value such that using *p accesses the value stored in var (in memory or a register—some machines, especially back in the olden days, have / had addressable registers).1
Nonetheless, despite all of this abstraction, we want to run real code on real machines. Real machines have real constraints: some instructions might require particular values in particular registers (think about all the bizarre stuff in the x86 instruction sets, or wide-result integer multipliers and dividers that use special-purpose registers, as on some MIPS processors), or cause CPU synchronizations, or whatever.
GCC in particular invented a system of constraints to express what you could or could not do on the machine itself, using the machine's instruction set. Over time, this evolved into user-accessible asm constructs with input, output, and clobber sections. The particular one you show:
__attribute__((always_inline)) inline void DoNotOptimize() {
asm volatile("" : : : "memory");
}
expresses the idea that "this instruction" (asm; the actual provided instruction is blank) "cannot be moved" (volatile) "and clobbers all of the computer's memory, but no registers" ("memory" as the clobber section).
This is not part of either C or C++ as a language. It's just a compiler construction, supported by GCC and now supported by clang as well. But it suffices to force the compiler to issue all stores-to-memory before the asm, and reload values from memory as needed after the asm, in case they changed when the computer executed the (nonexistent) instruction included in the asm line. There's no guarantee that this will work, or even compile at all, in some other compiler, but as long as we're the implementor, we choose the compiler we're implementing for/with.
C++ as a language now has support for ordered memory operations, which an implementor must implement. The implementor can use these asm volatile constructs to achieve the right result, provided they do actually achieve the right result. For instance, if we need to cause the machine itself to synchronize—to emit a memory barrier—we can stick the appropriate machine instruction, such as mfence or membar #sync or whatever it may be, in the asm's instruction-section clause. See also compiler reordering vs memory reordering as Klaus mentioned in a comment.
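For instance, a sketch of such a fence helper (GCC/clang extended asm, x86 assumed; the helper name is mine, not a standard API):
__attribute__((always_inline)) inline void full_fence() {
    asm volatile("mfence" ::: "memory"); // real hardware barrier plus the compiler-level "memory" clobber
}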
It is up to the implementor to find an appropriately effective trick, compiler-specific or not, to get the right semantics while minimizing any runtime slowdown: for instance, we might want to use lfence rather than mfence if that's sufficient, or membar #LoadLoad, or whatever the right thing is for the machine. If our implementation of Clock::now requires some sort of fancy inline asm, we write one. If not, we don't. We make sure that we produce what's required—and then all users of the system can just use it, without needing to know what sort of grubby implementation tricks we had to invoke.
There's a secondary question here: does the language specification really constrain the implementor the way we think/hope it does? Chris Dodd's comment says he thinks so, and he's usually right on these kinds of questions. A couple of other commenters think otherwise, but I'm with Chris Dodd on this one: I think the extra barrier is not necessary. You can always compile to assembly, or disassemble the compiled program, to check, though!
If the compiler didn't do the right thing, that asm would force it to do the right thing, in GCC and clang. It probably wouldn't work in other compilers.
1On the KA-10 in particular, the registers were just the first sixteen words of memory. As the Wikipedia page notes, this meant you could put instructions in there and call them. Because the first 16 words were the registers, these instructions ran much faster than other instructions.

When are type-punned pointers safe in practice?

A colleague of mine is working on C++ code that works with binary data arrays a lot. In certain places, he has code like
char *bytes = ...
T *p = (T*) bytes;
T v = p[i]; // UB
Here, T can be sometimes short or int (assume 16 and 32 bit respectively).
Now, unlike my colleague, I belong to the "no UB if at all possible" camp, while he is more along the lines of "if it works, it's OK". I am having a hard time trying to convince him otherwise.
Given that:
bytes really come from somewhere outside this compilation unit, being read from some binary file.
It's safe to assume that the array really contains integers in the native endianness.
In practice, given mainstream C++ compilers like MSVC 2017 and gcc 4.8, and Intel x64 hardware, is such a thing really safe? I know it wouldn't be if T was, say, float (got bitten by it in the past).
char* can alias other entities without breaking strict aliasing rule.
Your code would be UB only if the object at p + i wasn't originally a T.
char* bytes = (char*) floats;
int *p = (int*) bytes;
int v = p[i]; // UB
but
char* bytes = (char*) floats;
float *p = (float*) bytes;
float v = p[i]; // OK
If the origin of bytes is "unknown", the compiler cannot exploit UB for optimization and should assume we are in the valid case, generating code accordingly.
But how do you guarantee it is unknown? Even outside the TU, something like Link-Time Optimization might provide the hidden information.
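As a side note (not part of this answer, just a common standard-sanctioned alternative), you can sidestep the cast entirely by copying the bytes into a properly typed object; for trivially copyable T, compilers typically fold the memcpy into a single load. The helper name below is hypothetical:
#include <cstddef>
#include <cstring>
template <typename T>
T read_at(const char *bytes, std::size_t i) {
    T v;
    std::memcpy(&v, bytes + i * sizeof(T), sizeof(T)); // no aliasing violation, no UB
    return v;
}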
Type-punned pointers are safe if one uses a construct which is recognized by the particular compiler one is using [i.e. any compiler that is configured to support quality semantics, if one is using straightforward constructs; neither gcc nor clang qualifies when optimizations are enabled, however, unless one uses -fno-strict-aliasing]. The authors of C89 were certainly aware that many applications required the use of various type-punning constructs beyond those mandated by the Standard, but thought the question of which constructs to recognize was best left as a quality-of-implementation issue. Given something like:
struct s1 { int objectClass; };
struct s2 { int objectClass; double x,y; };
struct s3 { int objectClass; char someData[32]; };
int getObjectClass(void *p) { return ((struct s1*)p)->objectClass; }
I think the authors of the Standard would have intended that the function be usable to read field objectClass of any of those structures [that is pretty much the whole purpose of the Common Initial Sequence rule], but there would be many ways by which compilers might achieve that. Some might recognize function calls as barriers to type-based aliasing analysis, while others might treat pointer casts in such a fashion. Most programs that use type punning would do several things that compilers might interpret as indications to be cautious with optimizations, so there was no particular need for a compiler to recognize any particular one of them. Further, since the authors of the Standard made no effort to forbid implementations that are "conforming" but of such low quality as to be useless, there was no need to forbid compilers that somehow managed not to see any of the indications that storage might be used in interesting ways.
Unfortunately, for whatever reason, there hasn't been any effort by compiler vendors to find easy ways of recognizing common type-punning situations without needlessly impairing optimizations. While handling most cases would be fairly easy if compiler writers hadn't adopted designs that filter out the clearest and most useful evidence before applying optimization logic, both the designs of gcc and clang--and the mentalities of their maintainers--have evolved to oppose such a concept.
As far as I'm concerned, there is no reason why any "quality" implementation should have any trouble recognizing type punning in situations where all operations upon a byte of storage using a pointer converted to a pointer-to-PODS, or anything derived from that pointer, occur before the first time any of the following occurs:
That byte is accessed in conflicting fashion via means not derived from that pointer.
A pointer or reference is formed which will be used sometime in future to access that byte in conflicting fashion, or derive another that will.
Execution enters a function which will do one of the above before it exits.
Execution reaches the start of a bona fide loop [not, e.g. a do{...}while(0);] which will do one of the above before it exits.
A decently-designed compiler should have no problem recognizing those cases while still performing the vast majority of useful optimizations. Further, recognizing aliasing in such cases would be simpler and easier than trying to recognize it only in the cases mandated by the Standard. For those reasons, compilers that can't handle at least the above cases should be viewed as falling in the category of implementations that are of such low quality that the authors of the Standard didn't particularly want to allow, but saw no reason to forbid. Unfortunately, neither gcc nor clang offer any options to behave reasonably except by requiring that they disable type-based aliasing altogether. Unfortunately, the authors of gcc and clang would rather deride as "broken" any code needing features beyond what the Standard requires, than attempt a useful blend of optimization and semantics.
Incidentally, neither gcc nor clang should be relied upon to properly handle any situation in which storage that has been used as one type is later used as another, even when the Standard would require them to do so. Given something like:
union { struct s1 v1; struct s2 v2; } unionArr[100];
void test(int i)
{
int test = unionArr[i].v2.objectClass;
unionArr[i].v1.objectClass = test;
}
Both clang and gcc will treat it as a no-op even if it is executed between code which writes unionArr[i].v2.objectClass and code which happens to read member v1.objectClass of the same union object, thus causing them to ignore the possibility that the write to unionArr[i].v2.objectClass might affect v1.objectClass.

Is there a reason why not to use link-time optimization (LTO)?

GCC, MSVC, LLVM, and probably other toolchains have support for link-time (whole program) optimization to allow optimization of calls among compilation units.
Is there a reason not to enable this option when compiling production software?
I assume that by "production software" you mean software that you ship to the customers / goes into production. The answers at Why not always use compiler optimization? (kindly pointed out by Mankarse) mostly apply to situations in which you want to debug your code (so the software is still in the development phase -- not in production).
6 years have passed since I wrote this answer, and an update is necessary. Back in 2014, the issues were:
Link time optimization occasionally introduced subtle bugs, see for example Link-time optimization for the kernel. I assume this is less of an issue as of 2020. Safeguard against these kinds of compiler and linker bugs: Have appropriate tests to check the correctness of your software that you are about to ship.
Increased compile time. There are claims that the situation has significantly improved since 2014, for example thanks to slim objects.
Large memory usage. This post claims that the situation has drastically improved in recent years, thanks to partitioning.
As of 2020, I would try to use LTO by default on any of my projects.
This recent question raises another possible (but rather specific) case in which LTO may have undesirable effects: if the code in question is instrumented for timing, and separate compilation units have been used to try to preserve the relative ordering of the instrumented and instrumenting statements, then LTO has a good chance of destroying the necessary ordering.
I did say it was specific.
If you have well-written code, it should only be advantageous. You may hit a compiler/linker bug, but this goes for all types of optimisation; it is rare.
Biggest downside is it drastically increases link time.
Apart from this,
Consider a typical example from embedded system,
void function1(void) { /*Do something*/} //located at address 0x1000
void function2(void) { /*Do something*/} //located at address 0x1100
void function3(void) { /*Do something*/} //located at address 0x1200
With predefined addresses, such functions can be called through pointers to those addresses, like below:
((void (*)(void))0x1000)(); //expected to call function1
((void (*)(void))0x1100)(); //expected to call function2
((void (*)(void))0x1200)(); //expected to call function3
LTO can lead to unexpected behavior.
updated:
In automotive embedded SW development, multiple parts of the SW are compiled and flashed onto separate sections.
Boot-loader, application(s), and application configurations are independently flashable units. The boot-loader has special capabilities to update the application and the application configuration. At every power-on cycle the boot-loader ensures the compatibility and consistency of the SW application and application configuration via hard-coded locations for SW versions, CRCs, and many more parameters. Linker definition files are used to hard-code the locations of variables and of some functions.
Given that the code is implemented correctly, then link time optimization should not have any impact on the functionality. However, there are scenarios where not 100% correct code will typically just work without link time optimization, but with link time optimization the incorrect code will stop working. There are similar situations when switching to higher optimization levels, like, from -O2 to -O3 with gcc.
That is, depending on your specific context (like, age of the code base, size of the code base, depth of tests, are you starting your project or are you close to final release, ...) you would have to judge the risk of such a change.
One scenario where link-time-optimization can lead to unexpected behavior for wrong code is the following:
Imagine you have two source files read.c and client.c which you compile into separate object files. In the file read.c there is a function read that does nothing else than reading from a specific memory address. The contents at this address, however, should have been marked volatile, but unfortunately that was forgotten. From client.c the function read is called several times from the same function. Since read only performs one single read from the address and there is no optimization beyond the boundaries of the read function, read will, whenever called, access the respective memory location. Consequently, every time read is called from client.c, the code in client.c gets a freshly read value from the address, just as if volatile had been used.
Now, with link-time-optimization, the tiny function read from read.c is likely to be inlined whereever it is called from client.c. Due to the missing volatile, the compiler will now realize that the code reads several times from the same address, and may therefore optimize away the memory accesses. Consequently, the code starts to behave differently.
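A minimal sketch of that scenario (the register address and the names are hypothetical; the missing volatile is the deliberate bug):
/* read.c */
#include <stdint.h>
#define STATUS_REG ((uint32_t *)0x40000000u) /* should have been volatile uint32_t * */
uint32_t read_status(void) {
    return *STATUS_REG; /* a single read; nothing to optimize within this TU */
}
/* client.c */
#include <stdint.h>
uint32_t read_status(void);
void wait_until_ready(void) {
    /* Without LTO, each iteration calls read_status and re-reads the register.
       With LTO, the call may be inlined and the repeated loads hoisted out of
       the loop, so this can spin forever on a stale value. */
    while ((read_status() & 0x1u) == 0) { /* busy-wait */ }
}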
Rather than mandating that all implementations support the semantics necessary to accomplish all tasks, the Standard allows implementations intended to be suitable for various tasks to extend the language by defining semantics in corner cases beyond those mandated by the C Standard, in ways that would be useful for those tasks.
An extremely popular extension of this form is to specify that cross-module function calls will be processed in a fashion consistent with the platform's Application Binary Interface without regard for whether the C Standard would require such treatment.
Thus, if one makes a cross-module call to a function like:
uint32_t read_uint32_bits(void *p)
{
return *(uint32_t*)p;
}
the generated code would read the bit pattern in a 32-bit chunk of storage at address p, and interpret it as a uint32_t value using the platform's native 32-bit integer format, without regard for how that chunk of storage came to hold that bit pattern. Likewise, if a compiler were given something like:
uint32_t read_uint32_bits(void *p);
uint32_t f1bits, f2bits;
void test(void)
{
float f;
f = 1.0f;
f1bits = read_uint32_bits(&f);
f = 2.0f;
f2bits = read_uint32_bits(&f);
}
the compiler would reserve storage for f on the stack, store the bit pattern for 1.0f to that storage, call read_uint32_bits and store the returned value, store the bit pattern for 2.0f to that storage, call read_uint32_bits and store that returned value.
The Standard provides no syntax to indicate that the called function might read the storage whose address it receives using type uint32_t, nor to indicate that the pointer the function was given might have been written using type float, because implementations intended for low-level programming already extended the language to support such semantics without using special syntax.
Unfortunately, adding in Link Time Optimization will break any code that relies upon that popular extension. Some people may view such code as broken, but if one recognizes the Spirit of C principle "Don't prevent programmers from doing what needs to be done", the Standard's failure to mandate support for a popular extension cannot be viewed as intending to deprecate its usage if the Standard fails to provide any reasonable alternative.
LTO could also reveal edge-case bugs in code-signing algorithms. Consider a code-signing algorithm based on certain expectations about the TEXT portion of some object or module. Now LTO optimizes the TEXT portion away, or inlines stuff into it in a way the code-signing algorithm was not designed to handle. Worst case scenario, it only affects one particular distribution pipeline but not another, due to a subtle difference in which encryption algorithm was used on each pipeline. Good luck figuring out why the app won't launch when distributed from pipeline A but not B.
LTO support is buggy, and LTO-related issues have the lowest priority for compiler developers. For example: mingw-w64-x86_64-gcc-10.2.0-5 works fine with LTO, while mingw-w64-x86_64-gcc-10.2.0-6 segfaults with a bogus address. We have just noticed that the Windows CI stopped working.
Please refer to the following issue as an example.

Enum class performance when array indexing

Can I expect to see a performance hit for the casting in this...
enum class myEnum {A,B,C};
myArray[(int)myEnum::A] = 123;
Compared to this?
enum myEnum {A,B,C};
myArray[A] = 123;
I'm leaning towards the new style enum classes for the type safety, but don't want to do it at the expense of performance.
It depends whether the enum value used as an index is known at compile time or passed in a variable.
That is, myArray[(int)myEnum::A] shall not incur any penalty, but myArray[(int)e] might, depending on the physical representation of e (i.e., it might be necessary to "extend" it).
On the other hand, a simple extension is a trivial operation that is unlikely to ever show up as a performance issue: things like branch prediction (in conditionals) and caching are much more important in most applications at the low level, and at a higher level algorithms matter most.
Note: to avoid the extension issue in the runtime scenario, you could define the underlying type of myEnum to be the natural type expected for this arithmetic; I believe ptrdiff_t would be most appropriate here. It is a big integer type, though.
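For illustration, a minimal sketch of that suggestion (the to_index helper is mine, not from the answer; C++23 offers std::to_underlying with the same effect):
#include <cstddef>
#include <type_traits>
enum class myEnum : std::ptrdiff_t { A, B, C };
template <typename E>
constexpr std::underlying_type_t<E> to_index(E e) {
    return static_cast<std::underlying_type_t<E>>(e); // resolved at compile time for constants
}
int myArray[3];
void set() {
    myArray[to_index(myEnum::A)] = 123;
}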
Of course it's ultimately up to the compiler, but it's hard to see why any reasonable compiler would generate different code in these two cases.
I've tested this with g++ 4.7.2 on Intel, and they compile to identical assembly code.
No, it's not likely that the casting should have any impact on performance. This is all resolved at compile time.
I would expect NOT, but it would depend on the compiler implementation. Why don't you try both approaches and time them? Try a few different compiler optimisation settings too, so you know that it won't make a difference when it comes to production code compiles.
I doubt there would be any assembly instructions at all for the cast, since the entire expression (int)myEnum::A can be evaluated at compile time.
But if you really want to know, make a pair of sample programs, and analyze both by dumping disassembly, and/or measuring performance.

How to limit the impact of implementation-dependent language features in C++?

The following is an excerpt from Bjarne Stroustrup's book, The C++ Programming Language:
Section 4.6:
Some of the aspects of C++’s fundamental types, such as the size of an int, are implementation-defined (§C.2). I point out these dependencies and often recommend avoiding them or taking steps to minimize their impact. Why should you bother? People who program on a variety of systems or use a variety of compilers care a lot because if they don’t, they are forced to waste time finding and fixing obscure bugs. People who claim they don’t care about portability usually do so because they use only a single system and feel they can afford the attitude that "the language is what my compiler implements." This is a narrow and shortsighted view. If your program is a success, it is likely to be ported, so someone will have to find and fix problems related to implementation-dependent features. In addition, programs often need to be compiled with other compilers for the same system, and even a future release of your favorite compiler may do some things differently from the current one. It is far easier to know and limit the impact of implementation dependencies when a program is written than to try to untangle the mess afterwards.
It is relatively easy to limit the impact of implementation-dependent language features.
My question is: How to limit the impact of implementation-dependent language features? Please mention implementation-dependent language features then show how to limit their impact.
A few ideas:
Unfortunately you will have to use macros to avoid some platform-specific or compiler-specific issues. You can look at the headers of Boost libraries to see that it can quite easily get cumbersome, for example look at the files:
boost/config/compiler/gcc.hpp
boost/config/compiler/intel.hpp
boost/config/platform/linux.hpp
and so on
The integer types tend to be messy among different platforms; you will have to define your own typedefs or use something like Boost cstdint.hpp
If you decide to use any library, then do a check that the library is supported on the given platform
Use the libraries with good support and clearly documented platform support (for example Boost)
You can abstract yourself from some C++ implementation specific issues by relying heavily on libraries like Qt, which provide an "alternative" in sense of types and algorithms. They also attempt to make the coding in C++ more portable. Does it work? I'm not sure.
Not everything can be done with macros. Your build system will have to be able to detect the platform and the presence of certain libraries. Many would suggest autotools for project configuration; I, on the other hand, recommend CMake (rather nice language, no more M4)
endianness and alignment might be an issue if you do some low-level meddling (i.e. reinterpret_cast and friends, and things alike; "friends" was a poor word choice in a C++ context)
throw in a lot of warning flags for the compiler; for gcc I would recommend at least -Wall -Wextra. But there is much more, see the documentation of the compiler or this question.
you have to watch out for everything that is implementation-defined and implementation-dependent. If you want the truth, only the truth, nothing but the truth, then go to the ISO standard.
Well, the variable-sizes issue mentioned is a fairly well-known one, with the common workaround of providing typedef'ed versions of the basic types that have well-defined sizes (normally advertised in the typedef name). This is done using preprocessor macros to give different code visibility on different platforms. E.g.:
#ifdef __WIN32__
typedef int int32;
typedef char char8;
//etc
#endif
#ifdef __MACOSX__
//different typedefs to produce same results
#endif
Other issues are normally solved in the same way too (i.e. using preprocessor tokens to perform conditional compilation)
The most obvious implementation dependency is the size of integer types. There are many ways to handle this. The most obvious way is to use typedefs to create ints of the various sizes:
typedef signed short int16_t;
typedef unsigned short uint16_t;
The trick here is to pick a convention and stick to it. Which convention to pick is the hard part: INT16, int16, int16_t, t_int16, Int16, etc. C99 has the stdint.h file which uses the int16_t style. If your compiler has this file, use it.
Similarly, you should be pedantic about using other standard defines such as size_t, time_t, etc.
The other trick is knowing when not to use these typedefs. A loop control variable used to index an array should just use a raw int type so the compiler will generate the best code for your processor. for (int32_t i = 0; i < x; ++i) could generate a lot of needless code on a 64-bit processor, just like using int16_t would on a 32-bit processor.
A good solution is to use common headers that define typedef'ed types as necessary.
For example, including sys/types.h is an excellent way to deal with this, as is using portable libraries.
There are two approaches to this:
define your own types with a known size and use them instead of built-in types (like typedef int int32 #if-ed for various platforms)
use techniques which are not dependent on the type size
The first is very popular; however the second, when possible, usually results in cleaner code. This includes:
do not assume a pointer can be cast to an int
do not assume you know the byte size of individual types, always use sizeof to check it
when saving data to files or transferring them across network, use techniques which are portable across changing data sizes (like saving/loading text files)
One recent example of this is writing code which can be compiled for both x86 and x64 platforms. The dangerous part here is pointer and size_t size: be prepared for them to be 4 or 8 bytes depending on the platform. When casting or differencing pointers, never cast to int; use intptr_t and similar typedef'ed types instead.
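A minimal sketch of that advice (the function name is illustrative):
#include <cstddef>
#include <cstdint>
void pointer_sizes(char *p, char *q) {
    std::intptr_t addr = reinterpret_cast<std::intptr_t>(p); // wide enough on both x86 and x64
    std::ptrdiff_t diff = q - p;                             // the right type for a pointer difference
    // int bad = (int)p;                                     // would truncate on x64 -- the bug to avoid
    (void)addr; (void)diff;
}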
One of the key ways of avoiding dependency on particular data sizes is to read & write persistent data as text, not binary. If binary data must be used then all read/write operations must be centralised in a few methods and approaches like the typedefs already described here used.
A second thing you can do is to enable all of your compiler's warnings. For example, using the -pedantic flag with g++ will warn you of lots of potential portability problems.
If you're concerned about portability, things like the size of an int can be determined and dealt with without much difficulty. A lot of C++ compilers also support C99 features like the fixed-width integer types: int8_t, uint8_t, int16_t, uint32_t, etc. If yours doesn't support them natively, you can always include <cstdint> or <sys/types.h>, which, more often than not, has those typedefed. <limits.h> has the limits of all the basic types.
The standard only guarantees the minimum size of a type, which you can always rely on: sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long). char must be at least 8 bits. short and int must be at least 16 bits. long must be at least 32 bits.
Other things that might be implementation-defined include the ABI and name-mangling schemes (the behavior of extern "C++" specifically), but unless you're working with more than one compiler, that's usually a non-issue.
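One way (a sketch of mine, not from either answer) to make such size assumptions explicit is to check them at compile time, so the build fails instead of silently miscomputing:
#include <climits>
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(int) >= 4, "this code assumes int is at least 32 bits");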
The following is also an excerpt from Bjarne Stroustrup's book, The C++ Programming Language:
Section 10.4.9:
No implementation-independent guarantees are made about the order of construction of nonlocal objects in different compilation units. For example:
// file1.c:
Table tbl1;
// file2.c:
Table tbl2;
Whether tbl1 is constructed before tbl2 or vice versa is implementation-dependent. The order isn’t even guaranteed to be fixed in every particular implementation. Dynamic linking, or even a small change in the compilation process, can alter the sequence. The order of destruction is similarly implementation-dependent.
A programmer may ensure proper initialization by implementing the strategy that the implementations usually employ for local static objects: a first-time switch. For example:
class Zlib {
static bool initialized;
static void initialize() { /* initialize */ initialized = true; }
public:
// no constructor
void f()
{
if (initialized == false) initialize();
// ...
}
// ...
};
If there are many functions that need to test the first-time switch, this can be tedious, but it is often manageable. This technique relies on the fact that statically allocated objects without constructors are initialized to 0. The really difficult case is the one in which the first operation may be time-critical so that the overhead of testing and possible initialization can be serious. In that case, further trickery is required (§21.5.2).
An alternative approach for a simple object is to present it as a function (§9.4.1):
int& obj() { static int x = 0; return x; } // initialized upon first use
First-time switches do not handle every conceivable situation. For example, it is possible to create objects that refer to each other during construction. Such examples are best avoided. If such objects are necessary, they must be constructed carefully in stages.