Use NULL pointer in C shared library - c++

I am creating a C shared library. I provide a function to the user that has the declaration below:
int getResults(Elements** el)
where Elements is an array of structs provided by the user, which the function then fills with the values. The final number of calculated elements is different each time, depending on parameters in other functions, so there must be a way to inform the user about the number of them.
Instead of having a separate function to return the number of elements, the way I have implemented this is that the user can call this same function with NULL argument to get the number of existing elements:
int n = getResults(NULL);
allocate the required memory and then pass the array pointer. So, inside the function, I check:
if(el == NULL)
{
return numEl;
}
else
{
// Proceed to fill the structs.
// If all good, return 0
return 0;
}
Now, my concern is, could this approach fail?
I have read that NULL does not necessarily mean a specific number, 0 that is. So, if for example a user links with the library using another compiler, standard or integrates it with C++, is it guaranteed that this equation will always be true?

could this approach fail?
It could be misunderstood.
Other than that, I don't see a problem with the API.
I have read that NULL does not necessarily mean a specific number, 0 that is. So, if for example a user links with the library using another compiler, standard or integrates it with C++, is it guaranteed that this equation will always be true?
Null is null, regardless of what number represents it. The language standards give few direct guarantees about compatibility across language or compiler boundaries. This is not limited to the representation of null; it applies to many aspects of the language implementation. Generally, compilers strive to be compatible with other compilers on the same system. If a compiler is compatible with another, then there is no problem. If it is not compatible, then changing the API is unlikely to fix the incompatibility.
To use a shared library is to rely on compatibility of compilers used to produce the components. If you cannot rely on the compilers being compatible with one another, then you cannot make function calls across their boundary. Instead, you would have to rely on serialised communication over for example a socket.
I would consider the case where the compilers are otherwise compatible except for their definition of null to be highly theoretical. But there is a way to design the API so that it doesn't rely on the definition of null, to avoid the problem in such an imaginary case: let the user of the API supply the pointer value that the library should accept as "null".
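The two-call pattern from the question can be sketched as follows. This is only an illustration under stated assumptions: Element, get_results, fetch_all and the three stored values are hypothetical stand-ins, and the signature takes a plain Element* rather than the Elements** shown in the question. The comparison against NULL tests for the null pointer itself, so it does not depend on which bit pattern the platform uses to represent it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type; the real struct is not shown in the question.
struct Element {
    int value;
};

// Stand-in for the library's internal results (assumed: three values).
static const int kStoredValues[] = {10, 20, 30};
static const int kNumElements = 3;

// Hypothetical variant of getResults: a null argument requests the count,
// a non-null argument must point to at least kNumElements elements.
extern "C" int get_results(Element* out) {
    if (out == NULL) {  // true for a null pointer, whatever its representation
        return kNumElements;
    }
    for (int i = 0; i < kNumElements; ++i) {
        out[i].value = kStoredValues[i];
    }
    return 0;  // 0 signals success, as in the question
}

// Caller side: query the count, allocate, then ask the library to fill.
std::vector<Element> fetch_all() {
    const int n = get_results(NULL);
    std::vector<Element> elements(static_cast<std::size_t>(n));
    if (n > 0) {
        const int status = get_results(elements.data());
        assert(status == 0);
        (void)status;  // silences the unused warning when NDEBUG is defined
    }
    return elements;
}
```

One design point worth noting: because the same return value carries either a count or a status code, a caller who forgets which mode was used can misread it, which is the "could be misunderstood" risk mentioned above.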


C vs C++ file handling

I have been working in C and C++ and when it comes to file handling I get confused. Let me state the things I know.
In C, we use functions:
fopen, fclose, fwrite, fread, ftell, fseek, fprintf, fscanf, feof, fileno, fgets, fputs, fgetc, fputc.
FILE *fp for file pointer.
Modes like r, w, a
I know when to use these functions (Hope I didn't miss anything important).
In C++, we use functions / operators:
fstream f
f.open, f.close, f>>, f<<, f.seekg, f.seekp, f.tellg, f.tellp, f.read, f.write, f.eof.
Modes like ios::in, ios::out, ios::binary, etc.
So is it possible (recommended) to use C compatible file operations in C++?
Which is more widely used and why?
Is there anything other than these that I should be aware of?
Sometimes there's existing code expecting one or the other that you need to interact with, which can affect your choice, but in general the C++ versions wouldn't have been introduced if there weren't issues with the C versions that they could fix. Improvements include:
RAII semantics, which means e.g. fstreams close the files they manage when they leave scope
modal ability to throw exceptions when errors occur, which can make for cleaner code focused on the typical/successful processing (see http://en.cppreference.com/w/cpp/io/basic_ios/exceptions for API function and example)
type safety, such that how input and output is performed is implicitly selected using the variable type involved
C-style I/O has potential for crashes: e.g. int my_int = 32; printf("%s", my_int);, where %s tells printf to expect a pointer to an ASCIIZ character buffer but my_int appears instead; firstly, the argument passing convention may mean ints are passed differently to const char*s, secondly sizeof(int) may not equal sizeof(const char*), and finally, even if printf extracts 32 as a const char*, at best it will just print random garbage from memory address 32 onwards until it coincidentally hits a NUL character; far more likely, the process will lack permissions to read some of that memory and the program will crash. Modern C compilers can sometimes validate the format string against the provided arguments, reducing this risk.
extensibility for user-defined types (i.e. you can teach streams how to handle your own classes)
support for dynamically sizing receiving strings based on the actual input, whereas the C functions tend to need hard-coded maximum buffer sizes and loops in user code to assemble arbitrary sized input
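A few of the points above (RAII closing, type-driven output, and std::getline growing the receiving string as needed) can be seen in a minimal sketch. The function names write_lines and read_lines are made up for this illustration, and error handling is kept to the bare minimum.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Writes each string as one line; returns false if the file cannot be
// opened or a write fails.
bool write_lines(const std::string& path, const std::vector<std::string>& lines) {
    assert(!path.empty());  // precondition: a file name is required
    std::ofstream out(path.c_str());
    if (!out) {
        return false;
    }
    for (std::size_t i = 0; i < lines.size(); ++i) {
        out << lines[i] << '\n';  // overload chosen from the argument type
    }
    out.flush();
    return out.good();
}  // the destructor of out closes the file here, on every return path

// Reads the whole file line by line; no maximum line length is hard-coded.
std::vector<std::string> read_lines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in(path.c_str());
    std::string line;
    while (std::getline(in, line)) {  // line is resized to fit each input line
        lines.push_back(line);
    }
    return lines;
}
```

The equivalent C code would need an explicit fclose on every exit path and a fixed-size buffer (or a manual growth loop) around fgets.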
Streams are also sometimes criticised for:
verbosity of formatting, particularly "io manipulators" setting width, precision, base, padding, compared to the printf-style format strings
a sometimes confusing mix of manipulators that persist their settings across multiple I/O operations and others that are reset after each operation
lack of convenience class for RAII pushing/saving and later popping/restoring the manipulator state
being slow, as Ben Voigt comments and documents here
The performance differences between printf()/fwrite style I/O and C++ IO streams formatting are very much implementation dependent.
Some implementations (Visual C++, for instance) build their IO streams on top of FILE * objects, and this tends to increase the run-time complexity of their implementation. Note, however, that there was no particular constraint to implement the library in this fashion.
In my own opinion, the benefits of C++ I/O are as follows:
Type safety.
Flexibility of implementation. Code can be written to do specific formatting or input to or from a generic ostream or istream object. The application can then invoke this code with any kind of derived stream object. If code that I have written and tested against a file now needs to be applied to a socket, a serial port, or some other kind of internal stream, I can create a stream implementation specific to that kind of I/O. Extending C-style I/O in this fashion is not even close to possible.
Flexibility in locale settings: the C approach of using a single global locale is, in my opinion, seriously flawed. I have experienced cases where I invoked library code (a DLL) that changed the global locale settings underneath my code and completely messed up my output. A C++ stream allows you to imbue() any locale to a stream object.
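The first two benefits can be illustrated with a small sketch: formatting code written once against std::ostream, then reused with an in-memory stream. Point, write_report and report_as_string are invented names for this example only.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// A user-defined type that is taught how to print itself.
struct Point {
    int x;
    int y;
};

std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

// Written against the generic std::ostream interface, so it works with
// std::cout, std::ofstream, std::ostringstream, or a custom stream.
void write_report(std::ostream& os, const Point& p) {
    os << "point=" << p << '\n';
}

// Reuses the same formatting code with an in-memory stream.
std::string report_as_string(const Point& p) {
    std::ostringstream buffer;
    write_report(buffer, p);
    assert(buffer.good());  // in-memory writes are not expected to fail
    return buffer.str();
}
```

Passing std::cout or an open std::ofstream to write_report requires no change to the function itself.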
An interesting critical comparison can be found here.
C++ FQA io
Not exactly polite, but it makes you think...
Disclaimer
The C++ FQA (that is, a critical response to the C++ FAQ) is often considered by the C++ community a "stupid joke issued by a silly guy who doesn't even understand what C++ is or wants to be" (cit. from the FQA itself).
This kind of argument is often used to inflame (or escape from) religious battles between C++ believers, believers in other languages, and language atheists, each convinced, in his own humble opinion, of being in some way superior to the others.
I'm not interested in such battles; I just like to stimulate critical reasoning about the arguments for and against. The C++ FQA, in this sense, has the advantage of placing the FQA and the FAQ one above the other, allowing an immediate comparison. That is the only reason why I referenced it.
Following TonyD's comments below (thanks for them; they made it clear that my intention needed clarification), it must be noted that the OP is not just discussing << and >> (I talk only about them in my comments for brevity) but the entire function set that makes up the I/O models of C and C++.
With this idea in mind, think also of other "imperative" languages (Java, Python, D ...) and you'll see they all conform more closely to the C model than to the C++ one, sometimes even making it type safe (which the C model is not, and that's its major drawback).
What my point is all about
At the time C++ came along as mainstream (1996 or so), the <iostream.h> library (note the ".h": pre-ISO) lived in a language where templates were not yet fully available and where there was essentially no type-safe support for variadic functions (we had to wait until C++11 to get it), but where type-safe overloaded functions existed.
The idea of overloading << so that it returns its first parameter over and over is, in fact, a way to chain a variable set of arguments using only a binary function that can be overloaded in a type-safe manner. Extending that idea to any "state management function" (like width() or precision()) through manipulators (like setw) appears as a natural consequence. These points, whatever you may think of the FQA author, are real facts. It is also a matter of fact that the FQA is the only site I found that talks about it.
That said, years later, when the D language was designed with variadic templates from the start, the writef function was added to the D standard library, providing a printf-like syntax while also being perfectly type-safe. (see here)
Nowadays C++11 also has variadic templates ... so the same approach can be put in place in just the same way.
Moral of the story
Both the C++ and C I/O models appear "outdated" with respect to a modern programming style.
C retains speed; C++ retains type safety and a "more flexible abstraction for localization" (but I wonder how many C++ programmers in the world are aware of locales and facets...) at a run-time cost (just trace the << of a number with a debugger, going through stream, buffer, locale and facet ... and all the related virtual functions!).
The C model is also easily extensible to parametric messages (those in which the order of the parameters depends on the localization of the text they are in) with format strings like
#1%d #2%i allowing scripting like "text #2%i text #1%d ..."
The C++ model has no concept of "format string": the parameter order is fixed and intermixed with the text.
But C++11 variadic templates can be used to provide support that:
can offer both compile-time and run-time locale selection
can offer both compile-time and run-time parametric order
can offer compile-time parameter type safety
... all using a simple format string methodology.
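As a rough sketch of that idea, a C++11 variadic template can accept a format string with positional placeholders while letting the compiler pick the conversion for each argument. The names to_text and format_positional, and the simplified "#1".."#9" placeholder syntax (without the %d/%i part), are assumptions made for this illustration; it covers only run-time parameter order, not locale selection or compile-time checking.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Converts one argument to text through operator<<, so the conversion is
// selected from the argument's type rather than from a format specifier.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Replaces "#1".."#9" in fmt with the corresponding argument, in whatever
// order the placeholders appear in the text.
template <typename... Args>
std::string format_positional(const std::string& fmt, const Args&... args) {
    const std::vector<std::string> texts = { to_text(args)... };
    std::string result;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const bool is_placeholder = fmt[i] == '#' && i + 1 < fmt.size() &&
                                    fmt[i + 1] >= '1' && fmt[i + 1] <= '9';
        if (!is_placeholder) {
            result += fmt[i];
            continue;
        }
        const std::size_t index = static_cast<std::size_t>(fmt[i + 1] - '1');
        assert(index < texts.size());  // a placeholder without an argument is a caller bug
        result += texts[index];
        ++i;  // skip the digit that was just consumed
    }
    return result;
}
```

A translated message can then reorder the parameters in the text without touching the call site, e.g. format_positional("#2 is #1 years old", 30, "Ann").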
Is it time to standardize a new C++ i/o model ?

Defining Undefined Behavior

Does there exist any implementation of C++ (and/or C) that guarantees that anytime undefined behavior is invoked, it will signal an error? Obviously, such an implementation could not be as efficient as a standard C++ implementation, but it could be a useful debugging/testing tool.
If such an implementation does not exist, then are there any practical reasons that would make it impossible to implement? Or is it just that no one has done the work to implement it yet?
Edit: To make this a little more precise: I would like to have a compiler that allows me to make the assertion, for a given run of a C++ program that ran to completion, that no part of that run involved undefined behavior.
Yes, and no.
I am fairly certain that for practical purposes, an implementation could make C++ a safe language, meaning every operation has well-defined behavior. Of course, this comes at a huge overhead, and there are probably some cases where it's simply infeasible, such as race conditions in multithreaded code.
Now, the problem is that this can't guarantee your code is defined in other implementations! That is, it could still invoke UB. For instance, observe the following code:
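#include <iostream>

int a;
int* b;  // zero-initialised, so null until foo() has run
int foo() {
    a = 5;
    b = &a;
    return 0;
}
int bar() {
    *b = a;  // dereferences a null pointer if foo() has not run yet
    return 0;
}
int main() {
    std::cout << foo() << bar() << std::endl;
}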
Before C++17, the standard left the order in which foo and bar are called up to the implementation (C++17 and later sequence the operands of << left to right). Now, in a safe implementation this order would have to be defined, likely being left-to-right evaluation. The problem is that evaluating right-to-left invokes UB, which wouldn't be caught until you ran it on an unsafe implementation. The safe implementation could simply compile each permutation of evaluation order or do some static analysis, but this quickly becomes infeasible and possibly undecidable.
So in conclusion, if such an implementation existed it would give you a false sense of security.
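For comparison, here is a sketch of how the dependency in that example can be removed in the program itself, so that the result no longer relies on any implementation's evaluation order. The names shared_value, alias and run_in_fixed_order are invented for this illustration.

```cpp
#include <cassert>

int shared_value = 0;
int* alias = nullptr;

int foo() {
    shared_value = 5;
    alias = &shared_value;
    return 0;
}

int bar() {
    assert(alias != nullptr);  // documents the dependency on foo() having run
    *alias = shared_value;
    return 0;
}

// Separate statements are sequenced one after the other, so foo() is
// guaranteed to run before bar() on every conforming implementation.
int run_in_fixed_order() {
    const int first = foo();
    const int second = bar();
    return first + second;
}
```

This fixes one program by hand; it does not change the point above that a "safe" implementation cannot find every such dependency automatically.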
The new C standard (C11) has an interesting list in the new Annex L with the crude title "Analyzability". It talks about UB that is so-called critical UB. This includes among others:
An object is referred to outside of its lifetime (6.2.4).
A pointer is used to call a function whose type is not compatible with the referenced type
The program attempts to modify a string literal
All of these are UB that is impossible or very hard to catch, since it usually can't be completely tested at compile time. This is because a valid C (or C++) program is composed of several compilation units that may not know much about each other. For example, one unit may pass a pointer to a string literal into a function with a char* parameter or, even worse, cast away const-ness from a static variable.
Two C interpreters that detect a large class of undefined behaviors for a large subset of sequential C are KCC and Frama-C's value analysis. They are both used to make sure that automatically generated, automatically reduced random C programs are appropriate to report bugs in C compilers.
From the webpage for KCC:
One of the main aims of this work is the ability to detect undefined programs (e.g., programs that read invalid memory).
A third interpreter for a dialect of C is CompCert's interpreter mode (a writeup). This one detects all behaviors that are undefined in the input language of the certified C compiler CompCert. The input language of CompCert is essentially C, but it renders defined some behaviors that are undefined in the standard (signed arithmetic overflow is defined as computing 2's complement results, for instance).
In truth, all three of the interpreters mentioned in this answer have had difficult choices to make in the name of pragmatism.
The whole point of defining something as "undefined behaviour" is to avoid having to detect this situation in the compiler. It is defined that way so that compilers can be built for a wide variety of platforms and architectures, and so that the hardware and software don't have to have specific features "just to detect undefined behaviour". Imagine that you have a memory subsystem that can't detect whether you are writing to real memory or not: how would the compiler or runtime system detect that you have just done somepointer = rand(); *somepointer = 42;?
You can detect SOME situations. But to require that ALL are detected, would make life very difficult.
Given the edit in the original question: I still don't think this is plausible to achieve in C. There is so much freedom to do almost anything (pointers can be made to almost anything, and these pointers can be converted, indexed, recalculated, and all manner of other things), and all of it can cause all manner of undefined behaviour.
There is a list of all undefined behaviour in C here - it lists 186 different circumstances of undefined behaviour, ranging from a backslash as the last character of the file (likely to cause compiler error, but not defined as one) to "The comparison function called by the bsearch or qsort function returns ordering values inconsistently".
How on earth do you write a compiler to check that the function passed into bsearch or qsort is ordering values consistently? Of course, if the data passed into the comparison function is of a simple type, such as integers, then it's not that difficult, but if the data type is a complex type such as
struct {
char name[20];
char street[20];
int age;
char post_code[10];
};
and the programmer decides to sort the data based on ascending name, ascending street, descending age and ascending postcode, in that order? If that's what you want, but somehow the code got messed up and the post code comparison returns some inconsistent result, things will go wrong, but it's very hard to formally inspect that case. There are lots of others that are similarly obscure and complex. Sure, YOUR code may not sort names and addresses etc., but someone will probably write something like that at some point or another.
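To make the example concrete, a consistent comparison for that ordering might look like the sketch below. Person, person_less and sort_people are names chosen for this illustration, and it uses std::sort rather than qsort. The assertion can only spot-check the data actually sorted; it does not prove the comparison consistent for all inputs, which is exactly the difficulty described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

struct Person {
    char name[20];
    char street[20];
    int age;
    char post_code[10];
};

// Returns true when a must sort before b: ascending name, ascending street,
// descending age, ascending post code. Each key is consulted only when all
// earlier keys compare equal, which keeps the ordering consistent.
bool person_less(const Person& a, const Person& b) {
    int c = std::strcmp(a.name, b.name);
    if (c != 0) return c < 0;
    c = std::strcmp(a.street, b.street);
    if (c != 0) return c < 0;
    if (a.age != b.age) return a.age > b.age;  // descending age
    return std::strcmp(a.post_code, b.post_code) < 0;
}

void sort_people(Person* people, std::size_t count) {
    std::sort(people, people + count, person_less);
    // Run-time spot check on this data set only.
    assert(std::is_sorted(people, people + count, person_less));
}
```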

Is it possible to strip type names from executable while keeping RTTI enabled?

I recently disabled RTTI on my compiler (MSVC10) and the executable size decreased significantly. By comparing the produced executables using a text editor, I found that the RTTI-less version contains far fewer symbol names, which explains the saved space.
AFAIK, those symbol names are only used to fill the type_info structure associated with each polymorphic type, and one can programmatically access them by calling type_info::name().
According to the standard, the format of the string returned by type_info::name() is unspecified. That is, no one can rely on it to do serious things portably. So, it should be possible for an implementation to always return an empty string without breaking anything, thus reducing the executable size without disabling RTTI support (so we can still use the typeid operator & compare type_info objects safely).
But... is it possible? I'm using MSVC10 and I've not found any option to do that. I can either disable RTTI completely (/GR-), or enable it with full type names (/GR). Does any compiler provide such an option?
So, it should be possible for an implementation to always return an empty string without breaking anything, thus reducing the executable size without disabling RTTI support (so we can still use the typeid operator & compare type_info objects safely).
You are misreading the standard. The intent of making the return value of type_info::name() unspecified (other than that it is a null-terminated byte string) was to give the implementers of the compiler/library/run-time environment free rein to implement the RTTI requirements as they see best. You, the programmer, have no say in how the Application Binary Interface (if there is one) is designed or implemented.
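For reference, the RTTI operations the question wants to keep can be written without ever calling name() in user code. The sketch below uses a made-up Shape/Circle/Square hierarchy; whether a given implementation needs the name strings internally to carry out these operations is a separate, ABI-specific matter.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape {
    virtual ~Shape() {}  // a virtual member makes the type polymorphic
};
struct Circle : Shape {};
struct Square : Shape {};

// Compares the dynamic types of two objects through type_info::operator==.
bool same_dynamic_type(const Shape& lhs, const Shape& rhs) {
    return typeid(lhs) == typeid(rhs);
}

// Returns the object as a Circle, or a null pointer if it is not one.
const Circle* as_circle(const Shape& shape) {
    const Circle* circle = dynamic_cast<const Circle*>(&shape);
    // Holds in this hierarchy because nothing derives from Circle.
    assert((circle != nullptr) == (typeid(shape) == typeid(Circle)));
    return circle;
}
```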
You're asking three different questions here.
The initial question asks whether there's any way to get MSVC to not generate names, or whether it's possible with other compilers, or, failing that, whether there's any way to strip the names out of the generated type_info without breaking things.
Then you want to know whether it would be possible to modify the MS ABI (presumably not too radically) so that it would be possible to strip the names.
Finally, you want to know whether it would be possible to design an ABI that didn't have names.
Question #1 is itself a complex question. As far as I know, there's no way to get MSVC to not generate names. And most other compilers are aimed at ABIs that specifically define what typeid(foo).name() must return, so they also can't be made to not generate names.
The more interesting question is, what happens if you strip out the names. For MSVC, I don't know the answer. The best thing to do here is probably to try it—go into your DLLs and change the first character of each name to \0 and see if it breaks dynamic_cast, etc. (I know that you can do this with Mac and linux x86_64 executables generated by g++ 4.2 and it works, but let's put that aside for now.)
On to question #2, assuming blanking the names doesn't work, it wouldn't be that hard to modify a name-based system to no longer require names. One trivial solution is to use hashes of the names, or even ROT13-encoded names (remember that the original goal here is "I don't want casual users to see the embarrassing names of my classes"). But I'm not sure that would count for what you're looking for. A slightly more complex solution is as follows:
For "dllexport"ed classes, generate a UUID, put that in the typeinfo, and also put it in the .LIB import library that gets generated along with the DLL.
For "dllimport"ed classes, read the UUID out of the .LIB and use that instead.
So, if you manage to get the dllexport/dllimport right, it will work, because your exe will be using the same UUID as the dll. But what if you don't? What if you "accidentally" specify identical classes (e.g., an instantiation of the same template with the same parameters) in your DLL and your EXE, without marking one as dllexport and one as dllimport? RTTI won't see them as the same type.
Is this a problem? Well, the C++ standard doesn't say it is. And neither does any MS documentation. In fact, the documentation explicitly says that you're not allowed to do this. You cannot use the same class or function in two different modules unless you explicitly export it from one module and import it into another. The fact that this is very hard to do with class templates is a problem, and it's a problem they don't try to solve.
Let's take a realistic example: Create a node-based linkedlist class template with a global static sentinel, where every list's last node points to that sentinel, and the end() function just returns a pointer to it. (Microsoft's own implementation of std::map used to do exactly this; I'm not sure if that's still true.) New up a linkedlist<int> in your exe, and pass it by reference to a function in your dll that tries to iterate from l.begin() to l.end(). It will never finish, because none of the nodes created by the exe will point to the copy of the sentinel in the dll. Of course if you pass l.begin() and l.end() into the DLL, instead of passing l itself, you won't have this problem. You can usually get away with passing a std::string or various other types by reference, just because they don't depend on anything that breaks. But you're not actually allowed to do so, you're just getting lucky. So, while replacing the names with UUIDs that have to be looked up at link time means types can't be matched up at link-loader time, the fact that types already can't be matched up at link-loader time means this is irrelevant.
It would be possible to build a name-based system that didn't have these problems. The ARM C++ ABI (and the iOS and Android ABIs based on it) restricts what programmers can get away with much less than MS, and has very specific requirements on how the link-loader has to make it work (3.2.5). This one couldn't be modified to not be name-based because it was an explicit choice in the design that:
• type_info::operator== and type_info::operator!= compare the strings returned by type_info::name(), not just the pointers to the RTTI objects and their names.
• No reliance is placed on the address returned by type_info::name(). (That is, t1.name() != t2.name() does not imply that t1 != t2).
The first condition effectively requires that these operators (and type_info::before()) must be called out of line, and that the execution environment must provide appropriate implementations of them.
But it's also possible to build an ABI that doesn't have this problem and that doesn't use names. Which segues nicely to #3.
The Itanium ABI (used by, among other things, both OS X and recent linux on x86_64 and i386) does guarantee that a linkedlist<int> generated in one object and a linkedlist<int> generated from the same header in another object can be linked together at runtime and will be the same type, which means they must have equal type_info objects. From 2.9.1:
It is intended that two type_info pointers point to equivalent type descriptions if and only if the pointers are equal. An implementation must satisfy this constraint, e.g. by using symbol preemption, COMDAT sections, or other mechanisms.
The compiler, linker, and link-loader must work together to make sure that a linkedlist<int> created in your executable points to the exact same type_info object that a linkedlist<int> created in your shared object would.
So, if you just took out all the names, it wouldn't make any difference at all. (And this is pretty easily tested and verified.)
But how could you possibly implement this ABI spec? j_kubik effectively argues that it's impossible because you'd have to preserve some link-time information in the .so files. Which points to the obvious answer: preserve some link-time information in the .so files. In fact, you already have to do that to handle, e.g., load-time relocations; this just extends what you need to preserve. And in fact, both Apple and GNU/linux/g++/ELF do exactly that. (This is part of the reason everyone building complex linux systems had to learn about symbol visibility and vague linkage a few years ago.)
There's an even more obvious way to solve the problem: Write a C++-based link loader, instead of trying to make the C++ compiler and linker work together to trick a C-based link loader. But as far as I know, nobody's tried that since Be.
Requirements for type-descriptor:
Works correctly in multi compilation-unit and shared-library environment;
Works correctly for different versions of shared libraries;
Works correctly although different compilation units don't share any information about a type except its name: usually one header is used by all compilation units to define the same type, but that is not required; even when it is, it doesn't affect the resulting object file.
Works correctly despite the fact that template instantiations must be fully defined (including type_info data) in every library that uses them, and yet behave like one type if several such libraries are used together.
The fourth rule essentially bans all non-name-based type descriptors like UUIDs (unless specifically mentioned in the type definition, but that is just name replacement at best, and probably requires changes to the standard).
Storing these UUIDs in separate files, like the suggested .LIB files, is also problematic: different library versions implementing new types would cause trouble.
Compilation units should be able to share the same type (and its type_info) without the need to involve the linker, because the linker should stay free of any language specifics.
So the type name is the only possible unique type descriptor without completely remodeling compilation and linking (including dynamic linking). I could imagine it working, but not under the current scheme.

Fortran90 and size of arrays created in C++

I'm trying to call some Fortran 90 code from a C++ main program. The Fortran subroutine takes an array of double (call it X) as a parameter, then proceeds to use size(X) in many places in the code. I call the routine with a C array created through
double *x = new double[21];
but when I print the result of size(X) in the Fortran code I get 837511505, or some other big numbers.
Right now I can modify the fortran code, so worst case is to rewrite the function, passing the size as a parameter. But I'd rather not do it.
Does anyone know if there's a way I can create the C array in such a way that the Fortran routine can figure out its size?
This is an implementation-specific feature. Many implementations (RSX and OpenVMS, for example) define a structure for passing a pointer to the data as well as a description of the dimensions, types, etc. Other implementations pass no such thing unless the external declaration explicitly invokes a mechanism to generate a descriptor. Most others provide no such mechanism.
Without knowing which implementation in use:
a) read the compiler's documentation
b) have the compiler generate assembly, and inspect it to see what it expects
Intel F95 uses an array descriptor structure which, apart from the array pointer, also stores the bounds and dimension information. size() gets its information from the descriptor.
Since you're passing only a pointer from C, no descriptor info is available, and thus size() returns gibberish.
Generally, you're in the rough territory of mixed language programming, where arrays and structures are often a programmer's pain. Intel compiler user doc has a separate section about C<=>F95 mixed calling.
In particular, check about interfaces and binding -- a nice F95 feature that helps in inter-language calls.
The good news, C<=>F95 calling works very well once you stick to the conventions.
I personally use a ton of mixed coding from c++ to fortran 90/95/2003. I typically use gfortran as my compiler, but to avoid this issue, it is common practice to always send the size of the arrays. This even allows you to change the shape. Consider a 2 dimensional array containing x,y points:
double* x = new double[2*21];
real(8),intent(in),dimension(2,21)::x
This is a very handy feature and will then allow you to use the size command. The answers about compiler specifics are correct. To make your code usable on most compilers you should specify length when using multi-language interfaces.
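A sketch of the C++ side of that convention is shown below. The routine sum_array and its Fortran declaration (given in the comment, using the Fortran 2003 bind(C) feature) are assumptions made for this example; here the routine is implemented in C++ as a stand-in so that the sketch links and runs on its own, whereas in a real project the definition would come from the Fortran object file.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Stand-in for a Fortran routine assumed to be declared as:
//   subroutine sum_array(x, n, total) bind(C, name="sum_array")
//     use iso_c_binding
//     integer(c_int), value       :: n
//     real(c_double), intent(in)  :: x(n)   ! explicit shape: no descriptor needed
//     real(c_double), intent(out) :: total
// With x declared as x(n), size(x) inside the routine equals n.
extern "C" void sum_array(const double* x, int n, double* total) {
    *total = 0.0;
    for (int i = 0; i < n; ++i) {
        *total += x[i];
    }
}

// C++ wrapper: the length always travels together with the pointer.
double sum_with_fortran(const std::vector<double>& x) {
    assert(x.size() <= static_cast<std::size_t>(INT_MAX));  // must fit in a C int
    double total = 0.0;
    sum_array(x.data(), static_cast<int>(x.size()), &total);
    return total;
}
```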

How to limit the impact of implementation-dependent language features in C++?

The following is an excerpt from Bjarne Stroustrup's book, The C++ Programming Language:
Section 4.6:
Some of the aspects of C++’s fundamental types, such as the size of an int, are implementation- defined (§C.2). I point out these dependencies and often recommend avoiding them or taking steps to minimize their impact. Why should you bother? People who program on a variety of systems or use a variety of compilers care a lot because if they don’t, they are forced to waste time finding and fixing obscure bugs. People who claim they don’t care about portability usually do so because they use only a single system and feel they can afford the attitude that ‘‘the language is what my compiler implements.’’ This is a narrow and shortsighted view. If your program is a success, it is likely to be ported, so someone will have to find and fix problems related to implementation-dependent features. In addition, programs often need to be compiled with other compilers for the same system, and even a future release of your favorite compiler may do some things differently from the current one. It is far easier to know and limit the impact of implementation dependencies when a program is written than to try to untangle the mess afterwards.
It is relatively easy to limit the impact of implementation-dependent language features.
My question is: How to limit the impact of implementation-dependent language features? Please mention implementation-dependent language features then show how to limit their impact.
Few ideas:
Unfortunately you will have to use macros to avoid some platform specific or compiler specific issues. You can look at the headers of Boost libraries to see that it can quite easily get cumbersome, for example look at the files:
boost/config/compiler/gcc.hpp
boost/config/compiler/intel.hpp
boost/config/platform/linux.hpp
and so on
The integer types tend to be messy among different platforms, you will have to define your own typedefs or use something like Boost cstdint.hpp
If you decide to use any library, then do a check that the library is supported on the given platform
Use the libraries with good support and clearly documented platform support (for example Boost)
You can abstract yourself from some C++ implementation specific issues by relying heavily on libraries like Qt, which provide an "alternative" in sense of types and algorithms. They also attempt to make the coding in C++ more portable. Does it work? I'm not sure.
Not everything can be done with macros. Your build system will have to be able to detect the platform and the presence of certain libraries. Many would suggest autotools for project configuration, I on the other hand recommend CMake (rather nice language, no more M4)
endianness and alignment might be an issue if you do some low-level meddling (i.e. reinterpret_cast and friends; "friends" being a loaded word in a C++ context).
throw in a lot of warning flags for the compiler, for gcc I would recommend at least -Wall -Wextra. But there is much more, see the documentation of the compiler or this question.
you have to watch out for everything that is implementation-defined and implementation-dependend. If you want the truth, only the truth, nothing but the truth, then go to ISO standard.
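To make the macro idea concrete, here is a minimal sketch of a project-local configuration header in the spirit of the Boost config files mentioned above. All MYLIB_* names and the functions are made up for illustration; only the predefined macros being tested (_MSC_VER, __clang__, __GNUC__, _WIN32, __APPLE__, __linux__) come from the compilers themselves.

```cpp
// Sketch of a config header: every MYLIB_* name here is hypothetical.

// Compiler detection. Clang also defines __GNUC__, so it is tested before GCC.
// clang-cl defines _MSC_VER as well and is reported as "msvc" by this sketch.
#if defined(_MSC_VER)
#  define MYLIB_COMPILER "msvc"
#  define MYLIB_NOINLINE __declspec(noinline)
#elif defined(__clang__)
#  define MYLIB_COMPILER "clang"
#  define MYLIB_NOINLINE __attribute__((noinline))
#elif defined(__GNUC__)
#  define MYLIB_COMPILER "gcc"
#  define MYLIB_NOINLINE __attribute__((noinline))
#else
#  define MYLIB_COMPILER "unknown"
#  define MYLIB_NOINLINE  // no known spelling: expands to nothing
#endif

// Platform detection.
#if defined(_WIN32)
#  define MYLIB_PLATFORM "windows"
#elif defined(__APPLE__)
#  define MYLIB_PLATFORM "apple"
#elif defined(__linux__)
#  define MYLIB_PLATFORM "linux"
#else
#  define MYLIB_PLATFORM "unknown"
#endif

constexpr const char* compiler_name() { return MYLIB_COMPILER; }
constexpr const char* platform_name() { return MYLIB_PLATFORM; }

// The rest of the code base uses the wrapper macro, never the raw attribute.
MYLIB_NOINLINE int slow_path(int x) { return x * 2; }
```

The point of the design is that the #if ladders live in one header; everything else only sees MYLIB_NOINLINE and friends, so supporting a new compiler means touching one file.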
Well, the variable sizes issue you mentioned is fairly well known, with the common workaround of providing typedeffed versions of the basic types that have well-defined sizes (normally advertised in the typedef name). This is done using preprocessor macros to give different code-visibility on different platforms. E.g.:
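#if defined(_WIN32)        // predefined by MSVC, MinGW and Clang when targeting Windows
typedef int int32;
typedef char char8;
//etc
#elif defined(__APPLE__)   // predefined by Apple's toolchains
//different typedefs to produce same results
#endif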
Other issues are normally solved in the same way too (i.e. using preprocessor tokens to perform conditional compilation)
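Where the compiler ships <cstdint> (standard since C++11), the per-platform #ifdef ladder can often be replaced by the standard fixed-width types, with any remaining assumption checked at compile time. A minimal sketch; the alias names and the image_bytes helper are hypothetical:

```cpp
#include <climits>
#include <cstdint>

// Assumptions this sketch makes, verified by the compiler.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(sizeof(std::int32_t) == 4, "int32_t should occupy 4 bytes");

// Project-wide aliases (hypothetical names), defined once in one header.
typedef std::int32_t  int32;
typedef std::uint16_t uint16;
typedef std::uint8_t  uint8;

// Hypothetical helper: size in bytes of an image buffer.
// The first operand is widened to 64 bits so the product cannot overflow.
constexpr std::uint64_t image_bytes(uint16 width, uint16 height,
                                    uint8 bytes_per_pixel) {
    return static_cast<std::uint64_t>(width) * height * bytes_per_pixel;
}
```

If a static_assert fires on some platform, the build stops with a readable message instead of producing a subtly wrong binary.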
The most obvious implementation dependency is size of integer types. There are many ways to handle this. The most obvious way is to use typedefs to create ints of the various sizes:
typedef signed short int16_t;
typedef unsigned short uint16_t;
The trick here is to pick a convention and stick to it. Which convention is the hard part: INT16, int16, int16_t, t_int16, Int16, etc. C99 has the stdint.h file which uses the int16_t style. If your compiler has this file, use it.
Similarly, you should be pedantic about using other standard defines such as size_t, time_t, etc.
The other trick is knowing when not to use these typedefs. A loop control variable used to index an array should just take a raw int type so that the compiler can generate the best code for your processor. for (int32_t i = 0; i < x; ++i) could generate a lot of needless code on a 64-bit processor, just as using int16_t would on a 32-bit processor.
A good solution is to use common headers that define typedef'ed types as necessary.
For example, including sys/types.h is an excellent way to deal with this, as is using portable libraries.
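As a rough illustration of keeping fixed-width types for data and natural-width types for loop control: the sketch below stores samples as int16_t but indexes with std::size_t. Note that std::size_t is my substitution for the plain int suggested above, chosen because it matches what std::vector::size() returns; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-width types describe the stored data and the accumulator;
// the loop index uses std::size_t, the platform's natural index type.
inline std::int64_t sum_samples(const std::vector<std::int16_t>& samples) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        total += samples[i];
    }
    return total;
}

// Integer average (truncating); the caller must pass a non-empty vector.
inline std::int64_t average_sample(const std::vector<std::int16_t>& samples) {
    assert(!samples.empty());
    return sum_samples(samples) / static_cast<std::int64_t>(samples.size());
}
```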
There are two approaches to this:
define your own types with a known size and use them instead of built-in types (like typedef int int32 #if-ed for various platforms)
use techniques which are not dependent on the type size
The first is very popular, however the second, when possible, usually results in a cleaner code. This includes:
do not assume pointer can be cast to int
do not assume you know the byte size of individual types, always use sizeof to check it
when saving data to files or transferring them across network, use techniques which are portable across changing data sizes (like saving/loading text files)
One recent example of this is writing code that can be compiled for both x86 and x64 platforms. The dangerous part here is the size of pointers and of size_t: be prepared for it to be 4 or 8 bytes depending on the platform. When casting pointers or taking the difference of two pointers, never cast to int; use intptr_t and similar typedef-ed types instead.
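A small sketch of what that looks like in practice, assuming the implementation provides std::uintptr_t (it is formally optional, though common platforms have it). The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// uintptr_t, where provided, can hold any object pointer value.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t must be able to hold a pointer");

// Round-trip a pointer through an integer without truncating it.
inline std::uintptr_t to_integer(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

inline const void* to_pointer(std::uintptr_t value) {
    return reinterpret_cast<const void*>(value);
}

// Index of an element within an array, computed from two pointers.
inline std::size_t index_of(const int* base, const int* element) {
    const std::ptrdiff_t diff = element - base;  // signed, pointer-sized
    assert(diff >= 0);
    return static_cast<std::size_t>(diff);
}
```

The same source then compiles unchanged whether pointers are 4 or 8 bytes wide, which is exactly what a cast to int cannot promise.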
One of the key ways of avoiding dependency on particular data sizes is to read and write persistent data as text, not binary. If binary data must be used, then all read/write operations should be centralised in a few methods, using approaches like the typedefs already described here.
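If binary really is needed, one possible shape for those few centralised methods is sketched below. It fixes both the width (32 bits) and the byte order (little-endian, an arbitrary choice for this example) in the format itself, so the result does not depend on the host's int size or endianness. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value in little-endian order, whatever the host byte order.
inline void write_u32_le(std::vector<unsigned char>& out, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<unsigned char>((value >> shift) & 0xFFu));
    }
}

// Read a 32-bit little-endian value starting at the given offset.
inline std::uint32_t read_u32_le(const std::vector<unsigned char>& in,
                                 std::size_t offset) {
    assert(offset + 4 <= in.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(in[offset + i]) << (8 * i);
    }
    return value;
}
```

Because the bytes are produced by shifting rather than by copying the in-memory representation, no reinterpret_cast or byte-swapping macro is involved.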
A second thing you can do is to enable all of your compiler's warnings. For example, using the -pedantic flag with g++ will warn you of lots of potential portability problems.
If you're concerned about portability, things like the size of an int can be determined and dealt with without much difficulty. A lot of C++ compilers also support C99 features like the fixed-width integer types: int8_t, uint8_t, int16_t, uint32_t, etc. If yours doesn't support them natively, you can always include <cstdint> or <sys/types.h>, which, more often than not, has those typedefed. <limits.h> defines the minimum and maximum values of all the basic integer types.
The standard only guarantees the minimum size of a type, which you can always rely on: 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long). char must be at least 8 bits. short and int must be at least 16 bits. long must be at least 32 bits.
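These minimum guarantees can be written down as compile-time checks, and the same mechanism works for any stronger assumption your own code makes. A minimal sketch using static_assert (C++11); the bits_in helper and the "at least 32-bit int" requirement are hypothetical examples, not something the standard promises:

```cpp
#include <climits>
#include <limits>

// Guarantees made by the standard: these hold on every conforming compiler.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(char) <= sizeof(short) && sizeof(short) <= sizeof(int) &&
              sizeof(int) <= sizeof(long), "size ordering of integer types");
static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
static_assert(sizeof(short) * CHAR_BIT >= 16, "short has at least 16 bits");
static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");

// A project-specific assumption that goes beyond the standard (hypothetical).
// Stating it here turns a silent portability bug into a build error.
static_assert(std::numeric_limits<int>::digits >= 31,
              "this code base assumes int is at least 32 bits wide");

// Hypothetical helper: storage width of a type in bits.
template <typename T>
constexpr int bits_in() { return static_cast<int>(sizeof(T) * CHAR_BIT); }
```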
Other things that might be implementation-defined include the ABI and name-mangling schemes (the behavior of extern "C++" specifically), but unless you're working with more than one compiler, that's usually a non-issue.
The following is also an excerpt from Bjarne Stroustrup's book, The C++ Programming Language:
Section 10.4.9:
No implementation-independent guarantees are made about the order of construction of nonlocal objects in different compilation units. For example:
// file1.c:
Table tbl1;
// file2.c:
Table tbl2;
Whether tbl1 is constructed before tbl2 or vice versa is implementation-dependent. The order isn’t even guaranteed to be fixed in every particular implementation. Dynamic linking, or even a small change in the compilation process, can alter the sequence. The order of destruction is similarly implementation-dependent.
A programmer may ensure proper initialization by implementing the strategy that the implementations usually employ for local static objects: a first-time switch. For example:
class Zlib {
static bool initialized;
static void initialize() { /* initialize */ initialized = true; }
public:
// no constructor
void f()
{
if (initialized == false) initialize();
// ...
}
// ...
};
If there are many functions that need to test the first-time switch, this can be tedious, but it is often manageable. This technique relies on the fact that statically allocated objects without constructors are initialized to 0. The really difficult case is the one in which the first operation may be time-critical so that the overhead of testing and possible initialization can be serious. In that case, further trickery is required (§21.5.2).
An alternative approach for a simple object is to present it as a function (§9.4.1):
int& obj() { static int x = 0; return x; } // initialized upon first use
First-time switches do not handle every conceivable situation. For example, it is possible to create objects that refer to each other during construction. Such examples are best avoided. If such objects are necessary, they must be constructed carefully in stages.
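The function-based approach from the excerpt (obj() above) also works for non-trivial objects that other nonlocal objects depend on. The sketch below is my own illustration, not from the book; registry, Registrar and plugin_name are hypothetical names. Since C++11, initialization of a function-local static is also guaranteed to happen exactly once, even with concurrent callers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Construct-on-first-use: the vector is created the first time registry()
// is called, so it exists before any caller touches it.
inline std::vector<std::string>& registry() {
    static std::vector<std::string> names;
    return names;
}

// Hypothetical type whose constructor depends on the registry.
struct Registrar {
    explicit Registrar(const char* name) { registry().push_back(name); }
};

// Namespace-scope objects. In a real program these could sit in different
// translation units, where their relative order is not specified.
Registrar alpha_plugin("alpha");
Registrar beta_plugin("beta");

// Look up a registered name; the index must be in range.
inline const std::string& plugin_name(std::size_t index) {
    assert(index < registry().size());
    return registry()[index];
}
```

Whichever Registrar happens to be constructed first, the registry is already alive when its constructor runs. What this does not solve is destruction order, nor objects that refer to each other during construction, as the excerpt notes.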