What type for operations on pointers - c++

I've searched quite a bit, but couldn't find anything helpful - but then I'm not sure I'm searching for the right thing.
Is there any scalar defined by the standard that has to be at least as large as a pointer? I.e. sizeof(?) >= sizeof(void*).
I need it because I'm writing a small garbage collector and want something along the lines of this:
struct Tag {
uint32_t desc:sizeof(uint32_t)*8-2; // pointer to typedescriptor
uint32_t free:1;
uint32_t mark:1;
};
I'd prefer something that's valid according to the standard (if we're at it, I was quite surprised that sizeof(uint32_t)*8-2 is valid for the bitfield definition - but VS2010 allows it).
So does size_t fulfill this requirement?
Edit: So after my inclusion of both C and C++ led to some problems (and there I thought they would be similar in that regard), I'd actually settle for one of them (I don't really need C++ for this part of the code, and I can link C and C++ together, so that should work). Judging from the answers, C99 seems to be the right standard in this case.

You could include <stdint.h> (or <cstdint>) and use uintptr_t or intptr_t.
Since MSVC refuses to support C99, you may need to include <Windows.h> and use ULONG_PTR or LONG_PTR instead. (See C99 stdint.h header and MS Visual Studio)
(Also, please use CHAR_BIT instead of 8.)
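Putting both suggestions together, a minimal C++11 sketch of the tag might look like the following. TypeDescriptor, make_tag and descriptor_of are hypothetical names, not part of the question. The sketch assumes descriptors are at least 4-byte aligned (so the two low address bits are free) and that the pointer/integer conversion round-trips as it does on common platforms, which the standard does not promise for arithmetic on the integer.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical type descriptor; alignas(4) keeps the two low address bits zero.
struct alignas(4) TypeDescriptor {
    unsigned size;
};

struct Tag {
    std::uintptr_t desc : sizeof(std::uintptr_t) * CHAR_BIT - 2;  // descriptor address >> 2
    std::uintptr_t free : 1;
    std::uintptr_t mark : 1;
};

static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t is too small to hold a pointer");

inline Tag make_tag(const TypeDescriptor* d) {
    const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(d);
    assert((bits & 3u) == 0 && "descriptor must be 4-byte aligned");
    Tag t;
    t.desc = bits >> 2;  // drop the two bits that are known to be zero
    t.free = 0;
    t.mark = 0;
    return t;
}

inline const TypeDescriptor* descriptor_of(const Tag& t) {
    // Widen first: a narrow bit-field may promote to int before the shift.
    const std::uintptr_t bits = static_cast<std::uintptr_t>(t.desc) << 2;
    return reinterpret_cast<const TypeDescriptor*>(bits);
}
```

Shifting the address instead of masking it keeps the free and mark bits as ordinary one-bit fields, at the cost of a shift on every descriptor lookup.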

C99 has the optional uintptr_t in <stdint.h>, which guarantees that you can convert a pointer to void to a uintptr_t and back and get a pointer that compares equal to the original, though it doesn't say anything about operations on the integer itself.
Generally, on common platforms a void* is the same as any other pointer, and converting a pointer to an integer, manipulating that integer, and converting it back to a pointer yields well-defined results, but C does not guarantee this, so you'll have to know the compilers/platforms you want to target.
The best you can probably do is use the above-mentioned uintptr_t if you have a C99 compiler, or compile a program on the target platform that checks whether sizeof(void*) is equal to the sizeof of any of unsigned short, int, long, or long long, and generates a header file where you typedef your own uintptr according to what the program found.
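If C++11 is available, the same search can be sketched at compile time instead of with a generated header. The names pointer_sized and my_uintptr below are made up for illustration; the candidate list mirrors the types named above.

```cpp
#include <type_traits>

// Picks the first candidate whose size equals sizeof(void*); void if none match.
template <typename... Candidates>
struct pointer_sized;

template <>
struct pointer_sized<> {
    typedef void type;
};

template <typename First, typename... Rest>
struct pointer_sized<First, Rest...> {
    typedef typename std::conditional<
        sizeof(First) == sizeof(void*),
        First,
        typename pointer_sized<Rest...>::type>::type type;
};

typedef pointer_sized<unsigned short, unsigned int, unsigned long,
                      unsigned long long>::type my_uintptr;

// Fail the build rather than silently truncating pointers.
static_assert(!std::is_void<my_uintptr>::value,
              "no candidate integer type has the size of void*");
```

This only establishes that the sizes match; whether arithmetic on the converted value is meaningful is still up to the platform.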

Related

Getting bool from C to C++ and back

When designing data structures which are to be passed through a C API which connects C and C++ code, is it safe to use bool? That is, if I have a struct like this:
struct foo {
int bar;
bool baz;
};
is it guaranteed that the size and meaning of baz as well as its position within foo are interpreted in the same way by C (where it's a _Bool) and by C++?
We are considering doing this on a single platform (GCC for Debian 8 on a Beaglebone) with both C and C++ code compiled by the same GCC version (as C99 and C++11, respectively). General comments are welcome as well, though.
C's and C++'s bool types are different, but as long as you stick to the same compiler (in your case, GCC), it should be safe, as this is a reasonably common scenario.
In C++, bool has always been a keyword. C didn't have one until C99, where they introduced the keyword _Bool (because people used to typedef or #define bool as int or char in C89 code, so directly adding bool as a keyword would break existing code); there is the header stdbool.h which should, in C, have a typedef or #define from _Bool to bool. Take a look at yours; GCC's implementation looks like this:
/*
* ISO C Standard: 7.16 Boolean type and values <stdbool.h>
*/
#ifndef _STDBOOL_H
#define _STDBOOL_H
#ifndef __cplusplus
#define bool _Bool
#define true 1
#define false 0
#else /* __cplusplus */
/* Supporting <stdbool.h> in C++ is a GCC extension. */
#define _Bool bool
#define bool bool
#define false false
#define true true
#endif /* __cplusplus */
/* Signal that all the definitions are present. */
#define __bool_true_false_are_defined 1
#endif /* stdbool.h */
Which leads us to believe that, at least in GCC, the two types are compatible (in both size and alignment, so that the struct layout will remain the same).
Also worth noting, the Itanium ABI, which is used by GCC and most other compilers (except Visual Studio; as noted by Matthieu M. in the comments below) on many platforms, specifies that _Bool and bool follow the same rules. This is a strong guarantee. A third hint we can get is from Objective-C's reference manual, which says that for Objective-C and Objective-C++, which respect C's and C++'s conventions respectively, bool and _Bool are equivalent; so I'd pretty much say that, though the standards do not guarantee this, you can assume that yes, they are equivalent.
Edit:
If the standard does not guarantee that _Bool and bool will be compatible (in size, alignment, and padding), what does?
When we say those things are "architecture dependent", we actually mean that they are ABI dependent. Every compiler implements one or more ABIs, and two compilers (or versions of the same compiler) are said to be compatible if they implement the same ABI. Since it is expected to call C code from C++, as this is ubiquitously common, all C++ ABIs I've ever heard of extend the local C ABI.
Since OP asked about Beaglebone, we must check the ARM ABI, most specifically the GNU ARM EABI used by Debian. As noted by Justin Time in the comments, the ARM ABI indeed declares C++'s ABI to extend C's, and that _Bool and bool are compatible, both being of size 1, alignment 1, representing a machine's unsigned byte. So the answer to the question, on the Beaglebone, yes, _Bool and bool are compatible.
The language standards say nothing about this (I'm happy to be proven wrong about this, I couldn't find anything), so it can't be safe if we just limit ourselves to language standards. But if you're picky about which architectures you support you can find their ABI documentation to see if it will be safe.
For example, the amd64 ABI document has a footnote for the _Bool type that says:
This type is called bool in C++.
Which I can't interpret in any other way than that it will be compatible.
Also, just musing about this. Of course it will work. Compilers generate code that follows both an ABI and the behavior of the largest compiler for the platform (if that behavior is outside the ABI). A big thing about C++ is that it can link to libraries written in C, and a thing about libraries is that they can be compiled by any compiler on the same platform (this is why we have ABI documents in the first place). Can there be some minor incompatibility at some point? Sure, but that's something you'd better solve with a bug report to the compiler maker rather than a workaround in your code. I doubt bool would be something compiler makers would screw up.
The only thing the C standard says on _Bool :
An object declared as type _Bool is large enough to store the values 0
and 1.
Which would mean that _Bool is at least as large as a char (so true / false are guaranteed to be storable).
The exact size is all implementation defined as Michael said in the comments though. You're better off just performing some tests on their sizes on the relevant compiler and if those match and you stick with that same compiler I'd consider it's safe.
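Such a test can be made part of the build, so a mismatch stops compilation instead of corrupting data. The sketch below is the C++ half; the expected values are assumptions (size 1 as quoted above for the ARM ABI, plus the usual layout for an int followed by a one-byte member) and should be replaced by whatever the C build of the same header reports. The C side can carry the same checks with C11's _Static_assert.

```cpp
#include <cstddef>

struct foo {
    int bar;
    bool baz;
};

// Assumed values; take the real ones from the C side on the target.
constexpr std::size_t kExpectedBoolSize = 1;
constexpr std::size_t kExpectedBazOffset = sizeof(int);
constexpr std::size_t kExpectedFooSize = 2 * sizeof(int);

static_assert(sizeof(bool) == kExpectedBoolSize,
              "bool does not have the size the C side uses for _Bool");
static_assert(offsetof(foo, baz) == kExpectedBazOffset,
              "baz is not where the C side expects it");
static_assert(sizeof(foo) == kExpectedFooSize,
              "foo has unexpected padding");
```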
As Gill Bates says above, you do have a problem that sizeof(bool) is compiler-dependent in C. There's no guarantee that the same compiler will treat it the same in C and C++, or that they would be the same on different architectures. The compiler would even be within its rights (according to the C standard) to represent this as an individual bit in a bitfield if it wanted.
I've personally experienced this when working with the TI OMAP-L138 processor which combines a 32-bit ARM core and a 32-bit DSP core on the same device, with some shared memory accessible by both. The ARM core represented bool as an int (32-bit here), whereas the DSP represented bool as char (8-bit). To solve this, I defined my own type bool32_t for use with the shared memory interface, knowing that a 32-bit value would work for both sides. Of course I could have defined it as an 8-bit value, but I considered it less likely to affect performance if I kept it as the native integer size.
If you do the same as I did then you can 100% guarantee binary compatibility between your C and C++ code. If you don't then you can't. It's really as simple as that. With the same compiler, your odds are very good - but there is no guarantee, and changing compiler options can easily screw you over in unexpected ways.
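A minimal sketch of that approach follows. Only the name bool32_t comes from the description above; shared_flags, to_bool32 and from_bool32 are hypothetical helpers, and the rule that any non-zero value counts as true is an assumption about the interface.

```cpp
#include <cstdint>

// Fixed-width boolean for the shared-memory interface.
typedef std::uint32_t bool32_t;

// Hypothetical structure placed in shared memory.
struct shared_flags {
    std::int32_t bar;
    bool32_t baz;
};

static_assert(sizeof(shared_flags) == 8,
              "shared_flags must have the same layout on both cores");

// Convert at the boundary so the rest of the code can keep using bool.
constexpr bool32_t to_bool32(bool b) { return b ? 1u : 0u; }
constexpr bool from_bool32(bool32_t v) { return v != 0u; }
```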
On a related subject, your int should also be using int16_t, int32_t or another integer of defined size. (You should include stdint.h for these type definitions.) On the same platform it is highly unlikely that this will be different for C and C++, but it is a code smell for firmware to use int. The exception is in places where you genuinely don't care how long an int is, but it should be clear that interfaces and structures must have that well-defined. It is too easy for programmers to make assumptions (which are frequently incorrect!) about its size, and the results are generally catastrophic when it goes wrong - and worse, they often don't go wrong in testing where you can easily find and fix them.

How do you deal with the native size of integers changing between platforms?

I'm afraid I already know the answer to this but I'd like to be sure...
I have a fairly large project with a header file that typedefs native types:
typedef unsigned long int u32;
typedef signed long int s32;
// etc...
The inevitable has happened and I am now trying to compile on a system where long is 64 bits instead of 32. What is the best way to go about fixing it?
I could typedef the above with int (or int32_t/uint32_t from stdint.h) which would satisfy the 32bit size on the platforms I'm aware of but this still seems dubious. There is also the problem with printf style functions where %ld was used (the compiler complains and would like to see %d instead). These would all have to be changed, wouldn't they (perhaps with defines in inttypes.h)?
This seems straightforward but I would like to be sure before I start digging into it (fixing printf format strings seems daunting).
C has <stdint.h>, which in C++0x is <cstdint>. For non-C++0x compilers, you have <boost/cstdint.hpp> if you don't mind reliance on Boost. The <inttypes.h> header also includes macros for printf() format specifiers, which can be adapted for use with the <cstdint> types. If you're using C++, you should be using <iostream>, and consequently won't need to worry about typed format specifiers.
Create a single translation unit (.cpp) that compiles with your library/executable, and use static asserts in it. If you need a specific size, this approach can confirm whether your declarations match the conditions you need before you create a linkable/executable binary, should the environment ever change.
Then turn up the compiler warnings and fix what must be fixed.
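For the u32/s32 typedefs from the question, such a translation unit could look roughly like this. It assumes C++11 (static_assert, <cstdint>); older compilers would need Boost's static assert or a similar trick.

```cpp
#include <climits>
#include <cstdint>

// The project's typedefs, now based on fixed-width types instead of long.
typedef std::uint32_t u32;
typedef std::int32_t s32;

// These fire at compile time if the environment ever changes.
static_assert(sizeof(u32) * CHAR_BIT == 32, "u32 must be exactly 32 bits");
static_assert(sizeof(s32) * CHAR_BIT == 32, "s32 must be exactly 32 bits");
static_assert(static_cast<u32>(-1) == 0xFFFFFFFFu, "u32 must wrap at 2^32");
```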
The solution regarding a portable 32 bit integer (and the like):
Define your own portable types in some hand-built configuration file or
Use stdint.h which does this for you and is guaranteed to be there in any C compiler that is even close to C99 compatible.
As far as printf is concerned, stdint.h provides portable macros for printf. Or just use C++ I/O and then you don't have to worry about printf formats.
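As a sketch of the printf side: the format macros (PRIu32, PRId32 and friends) are defined in <inttypes.h>, or <cinttypes> in C++11, and expand to the right length modifier for the platform. format_u32 and format_s32 are hypothetical helper names.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// "%" PRIu32 becomes "%u" or "%lu" depending on how uint32_t is defined.
inline std::string format_u32(std::uint32_t value) {
    char buf[16];
    const int n = std::snprintf(buf, sizeof buf, "%" PRIu32, value);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf);
}

inline std::string format_s32(std::int32_t value) {
    char buf[16];
    const int n = std::snprintf(buf, sizeof buf, "%" PRId32, value);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf);
}
```

Existing "%ld" format strings would be rewritten in the same way, by splicing the macro into the string literal.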

Best Practices: Should I create a typedef for byte in C or C++?

Do you prefer to see something like t_byte* (with typedef unsigned char t_byte) or unsigned char* in code?
I'm leaning towards t_byte in my own libraries, but have never worked on a large project where this approach was taken, and am wondering about pitfalls.
If you're using C99 or newer, you should use stdint.h for this. uint8_t, in this case.
C++ didn't get this header until C++11, calling it cstdint. Old versions of Visual C++ didn't let you use C99's stdint.h in C++ code, but pretty much every other C++98 compiler did, so you may have that option even when using old compilers.
As with so many other things, Boost papers over this difference in boost/integer.hpp, providing things like uint8_t if your compiler's standard C++ library doesn't.
I suggest that if your compiler supports it use the C99 <stdint.h> header types such as uint8_t and int8_t.
If your compiler does not support it, create one. Here's an example for VC++, older versions of which do not have stdint.h. GCC does support stdint.h, and indeed most of C99.
One problem with your suggestion is that the sign of char is implementation defined, so if you do create a type alias, you should at least be explicit about the sign. There is some merit in the idea since in C# for example a char is 16bit. But it has a byte type as well.
Additional note...
There was no problem with your suggestion, you did in fact specify unsigned.
I would also suggest that plain char is used if the data is in fact character data, i.e. is a representation of plain text such as you might display on a console. This will present fewer type agreement problems when using standard and third-party libraries. If on the other hand the data represents a non-character entity such as a bitmap, or if it is numeric 'small integer' data upon which you might perform arithmetic manipulation, or data that you will perform logical operations on, then one of the stdint.h types (or even a type defined from one of them) should be used.
I recently got caught out on a TI C54xx compiler where char is in fact 16bit, so that is why using stdint.h where possible, even if you use it to then define a byte type is preferable to assuming that unsigned char is a suitable alias.
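A sketch of that last suggestion, assuming C++11: the byte type is defined from uint8_t, and the 8-bit assumption is stated explicitly so a platform like the one above fails at compile time. high_nibble and low_nibble are just illustrative uses of the type for numeric data.

```cpp
#include <climits>
#include <cstdint>

// uint8_t does not exist where char is wider than 8 bits; this makes the
// error message obvious on such a platform.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

typedef std::uint8_t byte_t;

// Numeric (non-character) use of the type.
constexpr unsigned high_nibble(byte_t b) { return (b >> 4) & 0x0Fu; }
constexpr unsigned low_nibble(byte_t b) { return b & 0x0Fu; }
```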
I prefer for types to convey the meaning of the values stored in them. If I need a type describing a byte as it is on my machine, I very much prefer byte_t over unsigned char, which could mean just about anything. (I have been working in a code base that used either signed char or unsigned char to store UTF-8 strings.) The same goes for uint8_t. It could just be used as that: an 8bit unsigned integer.
With byte_t (as with any other aptly named type), there rarely ever is a need to look up what it is defined to (and if so, a good editor will take 3secs to look it up for you; maybe 10secs, if the code base is huge), and just by looking at it it's clear what's stored in objects of that type.
Personally I prefer boost::int8_t and boost::uint8_t.
If you don't want to use boost you could borrow boost\cstdint.hpp.
Another option is to use portable version of stdint.h (link from this answer).
Besides your awkward naming convention, I think that might be okay. Keep in mind boost does this for you, to help with cross-platform-ability:
#include <boost/integer.hpp>
typedef boost::uint8_t byte_t;
Note that types are usually suffixed with _t, as in byte_t.
I prefer to use standard types, unsigned char, uint8_t, etc., so any programmer looking at the source does not have to refer back to headers to grok the code. The more typedefs you use the more time it takes for others to get to know your typing conventions. For structures, absolutely use typedefs, but for primitives use them sparingly.

How to get string from an address?

I have a ULONG value that contains the address.
The address is that of a string (an array of wchar_t terminated by a null character).
I want to retrieve that string.
What is the best way to do that?
@KennyTM's answer is right on the money if by "basically of a string" you mean it's a pointer to an instance of the std::string class. If you mean it's a pointer to a C string, which I suspect may be more likely, you need:
char *s = reinterpret_cast<char *>(your_ulong);
Or, in your case:
wchar_t *s = reinterpret_cast<wchar_t *>(your_ulong);
Note also that you can't safely store pointers in any old integral type. I can make a compiler with a 32-bit long type and a 64-bit pointer type. If your compiler supports it, a proper way to store pointers in integers is to use the stdint.h types intptr_t and uintptr_t, which are guaranteed (by the C99 standard) to be big enough to store pointer types.
However, C99 isn't part of C++, and many C++ compilers (read: Microsoft) may not provide this kind of functionality (because who needs to write portable code?). Fortunately, stdint.h is useful enough that workarounds exist, and portable (and free) implementations of stdint.h for compatibility with older compilers can be found easily on the internet.
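Where uintptr_t is available, the whole round trip can be sketched as below. wstring_from_address is a hypothetical helper; it assumes the address really does point at a null-terminated wchar_t array that is still alive, which the function has no way of checking.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Copies the null-terminated wide string found at the given address.
inline std::wstring wstring_from_address(std::uintptr_t address) {
    assert(address % alignof(wchar_t) == 0 && "address is misaligned for wchar_t");
    const wchar_t* s = reinterpret_cast<const wchar_t*>(address);
    return s ? std::wstring(s) : std::wstring();
}
```

Copying into a std::wstring means the result stays valid even if the original buffer is freed later.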
string& s = *reinterpret_cast<string*>(your_ulong);

How to limit the impact of implementation-dependent language features in C++?

The following is an excerpt from Bjarne Stroustrup's book, The C++ Programming Language:
Section 4.6:
Some of the aspects of C++’s fundamental types, such as the size of an int, are implementation-defined (§C.2). I point out these dependencies and often recommend avoiding them or taking steps to minimize their impact. Why should you bother? People who program on a variety of systems or use a variety of compilers care a lot because if they don’t, they are forced to waste time finding and fixing obscure bugs. People who claim they don’t care about portability usually do so because they use only a single system and feel they can afford the attitude that ‘‘the language is what my compiler implements.’’ This is a narrow and shortsighted view. If your program is a success, it is likely to be ported, so someone will have to find and fix problems related to implementation-dependent features. In addition, programs often need to be compiled with other compilers for the same system, and even a future release of your favorite compiler may do some things differently from the current one. It is far easier to know and limit the impact of implementation dependencies when a program is written than to try to untangle the mess afterwards.
It is relatively easy to limit the impact of implementation-dependent language features.
My question is: How to limit the impact of implementation-dependent language features? Please mention implementation-dependent language features then show how to limit their impact.
Few ideas:
Unfortunately you will have to use macros to avoid some platform specific or compiler specific issues. You can look at the headers of Boost libraries to see that it can quite easily get cumbersome, for example look at the files:
boost/config/compiler/gcc.hpp
boost/config/compiler/intel.hpp
boost/config/platform/linux.hpp
and so on
The integer types tend to be messy among different platforms, you will have to define your own typedefs or use something like Boost cstdint.hpp
If you decide to use any library, then do a check that the library is supported on the given platform
Use the libraries with good support and clearly documented platform support (for example Boost)
You can abstract yourself from some C++ implementation specific issues by relying heavily on libraries like Qt, which provide an "alternative" in sense of types and algorithms. They also attempt to make the coding in C++ more portable. Does it work? I'm not sure.
Not everything can be done with macros. Your build system will have to be able to detect the platform and the presence of certain libraries. Many would suggest autotools for project configuration, I on the other hand recommend CMake (rather nice language, no more M4)
Endianness and alignment might be an issue if you do some low-level meddling (i.e. reinterpret_cast and friends; "friends" is an unfortunate word in a C++ context).
Throw in a lot of warning flags for the compiler; for gcc I would recommend at least -Wall -Wextra. But there are many more; see the documentation of the compiler or this question.
You have to watch out for everything that is implementation-defined or implementation-dependent. If you want the truth, the whole truth, and nothing but the truth, then go to the ISO standard.
Well, the variable sizes one mentioned is a fairly well-known issue, with the common workaround of providing typedef'ed versions of the basic types that have well-defined sizes (normally advertised in the typedef name). This is done using preprocessor macros to give different code visibility on different platforms. E.g.:
#ifdef __WIN32__
typedef int int32;
typedef char char8;
//etc
#endif
#ifdef __MACOSX__
//different typedefs to produce same results
#endif
Other issues are normally solved in the same way too (i.e. using preprocessor tokens to perform conditional compilation)
The most obvious implementation dependency is size of integer types. There are many ways to handle this. The most obvious way is to use typedefs to create ints of the various sizes:
typedef signed short int16_t;
typedef unsigned short uint16_t;
The trick here is to pick a convention and stick to it. Which convention is the hard part: INT16, int16, int16_t, t_int16, Int16, etc. C99 has the stdint.h file which uses the int16_t style. If your compiler has this file, use it.
Similarly, you should be pedantic about using other standard defines such as size_t, time_t, etc.
The other trick is knowing when not to use these typedefs. A loop control variable used to index an array should just take the raw int type so the compiler will generate the best code for your processor. for (int32_t i = 0; i < x; ++i) could generate a lot of needless code on a 64-bit processor, just as using int16_t would on a 32-bit processor.
A good solution is to use common headers that define typedef'ed types as necessary.
For example, including sys/types.h is an excellent way to deal with this, as is using portable libraries.
There are two approaches to this:
define your own types with a known size and use them instead of built-in types (like typedef int int32 #if-ed for various platforms)
use techniques which are not dependent on the type size
The first is very popular, however the second, when possible, usually results in a cleaner code. This includes:
do not assume pointer can be cast to int
do not assume you know the byte size of individual types, always use sizeof to check it
when saving data to files or transferring them across network, use techniques which are portable across changing data sizes (like saving/loading text files)
One recent example of this is writing code which can be compiled for both x86 and x64 platforms. The dangerous part here is the size of pointers and size_t: be prepared for it to be 4 or 8 depending on the platform. When casting pointers or taking their difference, never cast to int; use intptr_t and similar typedef-ed types instead.
One of the key ways of avoiding dependency on particular data sizes is to read and write persistent data as text, not binary. If binary data must be used, then all read/write operations must be centralised in a few methods, using approaches like the typedefs already described here.
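When binary really is required, the centralised routines might look like this sketch. store_u32_le and load_u32_le are hypothetical names, and little-endian is an arbitrary choice of on-disk byte order made for the example; the point is that the format is fixed by the code rather than by the host's int size or endianness.

```cpp
#include <cassert>
#include <cstdint>

// Writes value as 4 bytes, least significant first, regardless of host byte order.
inline void store_u32_le(std::uint32_t value, unsigned char* out) {
    assert(out != nullptr);
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
    }
}

// Reads back 4 bytes written by store_u32_le.
inline std::uint32_t load_u32_le(const unsigned char* in) {
    assert(in != nullptr);
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    }
    return value;
}
```

Because the bytes are produced with shifts rather than by copying the object's memory, the same file reads back correctly on a big-endian machine.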
A second thing you can do is to enable all of your compiler's warnings. For example, using the -pedantic flag with g++ will warn you of lots of potential portability problems.
If you're concerned about portability, things like the size of an int can be determined and dealt with without much difficulty. A lot of C++ compilers also support C99 features like the int types: int8_t, uint8_t, int16_t, uint32_t, etc. If yours doesn't support them natively, you can always include <cstdint> or <sys/types.h>, which, more often than not, has those typedefed. <limits.h> defines the minimum and maximum values of all the basic integer types.
The standard only guarantees the minimum size of a type, which you can always rely on: 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long). char must be at least 8 bits. short and int must be at least 16 bits. long must be at least 32 bits.
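These guarantees can be written down as compile-time checks (C++11). On a conforming compiler they never fire; they are shown only to make the guarantees concrete. Anything stronger that a project relies on, such as a 32-bit int, is an extra assumption and can be asserted in the same style.

```cpp
#include <climits>
#include <limits>

// Guaranteed by the standard.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(char) <= sizeof(short) && sizeof(short) <= sizeof(int) &&
              sizeof(int) <= sizeof(long),
              "integer types are ordered by size");
static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
static_assert(std::numeric_limits<unsigned short>::digits >= 16,
              "short has at least 16 bits");
static_assert(std::numeric_limits<unsigned int>::digits >= 16,
              "int has at least 16 bits");
static_assert(std::numeric_limits<unsigned long>::digits >= 32,
              "long has at least 32 bits");
```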
Other things that might be implementation-defined include the ABI and name-mangling schemes (the behavior of extern "C++" specifically), but unless you're working with more than one compiler, that's usually a non-issue.
The following is also an excerpt from Bjarne Stroustrup's book, The C++ Programming Language:
Section 10.4.9:
No implementation-independent guarantees are made about the order of construction of nonlocal objects in different compilation units. For example:
// file1.c:
Table tbl1;
// file2.c:
Table tbl2;
Whether tbl1 is constructed before tbl2 or vice versa is implementation-dependent. The order isn’t even guaranteed to be fixed in every particular implementation. Dynamic linking, or even a small change in the compilation process, can alter the sequence. The order of destruction is similarly implementation-dependent.
A programmer may ensure proper initialization by implementing the strategy that the implementations usually employ for local static objects: a first-time switch. For example:
class Zlib {
static bool initialized;
static void initialize() { /* initialize */ initialized = true; }
public:
// no constructor
void f()
{
if (initialized == false) initialize();
// ...
}
// ...
};
If there are many functions that need to test the first-time switch, this can be tedious, but it is often manageable. This technique relies on the fact that statically allocated objects without constructors are initialized to 0. The really difficult case is the one in which the first operation may be time-critical so that the overhead of testing and possible initialization can be serious. In that case, further trickery is required (§21.5.2).
An alternative approach for a simple object is to present it as a function (§9.4.1):
int& obj() { static int x = 0; return x; } // initialized upon first use
First-time switches do not handle every conceivable situation. For example, it is possible to create objects that refer to each other during construction. Such examples are best avoided. If such objects are necessary, they must be constructed carefully in stages.
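Applied to the tbl1/tbl2 example from the excerpt, the function approach might look like the sketch below. The Table type here is a made-up stand-in, and making tbl2 copy from tbl1 is only there to show that the dependency is resolved by the call itself, whatever order the translation units are linked in.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Table type in the excerpt.
struct Table {
    std::vector<std::string> names;
};

// Constructed the first time tbl1() is called, not at an unspecified
// point during program start-up.
inline Table& tbl1() {
    static Table t{{"alpha", "beta"}};
    return t;
}

// Depends on tbl1; calling tbl1() here forces it to be constructed first.
inline Table& tbl2() {
    static Table t{tbl1().names};
    assert(!t.names.empty());  // tbl1 was complete when we copied from it
    return t;
}
```

Since C++11 the initialization of a local static is also thread-safe, which the hand-written first-time switch is not.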