Purpose of using UINT64_C?

I found this line in boost source:
const boost::uint64_t m = UINT64_C(0xc6a4a7935bd1e995);
I wonder what the purpose of using a macro here is.
All it does is append a ULL suffix to the constant provided.
I assume it may be used to make it harder for people to make the mistake of typing UL instead of ULL, but I wonder if there is any other reason to use it.

If you look at boost/cstdint.hpp, you can see that the definition of the UINT64_C macro is different on different platforms and compilers.
On some platforms it's defined as value##uL, on others it's value##uLL, and on yet others it's value##ui64. It all depends on the size of unsigned long and unsigned long long on that platform or the presence of compiler-specific extensions.
I don't think using UINT64_C is actually necessary in that context, since the literal 0xc6a4a7935bd1e995 is too large to fit in 32 bits and would already be given a 64-bit unsigned type. It is necessary in some other contexts, though. For example, the literal 0x00000000ffffffff would be interpreted as a 32-bit unsigned integer unless it is explicitly marked as 64-bit with UINT64_C (though in a bitwise AND with a uint64_t operand it would be promoted to uint64_t anyway).
In any case, explicitly declaring the size of literals where it matters serves a valuable role in code-clarity. Sometimes, even if an operation is perfectly well-defined by the language, it can be difficult for a human programmer to tell what types are involved. Saying it explicitly can make code easier to reason about, even if it doesn't directly alter the behavior of the program.
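For a concrete case where the macro (or an explicit suffix) genuinely matters, consider shifting a literal past the width of int. This is a minimal sketch, assuming a C++11 compiler where UINT64_C comes from <cstdint> (Boost's cstdint provides the same macro):
#include <cstdint>
#include <cstdio>

int main() {
    // 1 << 40 would shift a plain int, which is undefined behaviour on
    // platforms where int is 32 bits; the literal must be made 64-bit first.
    std::uint64_t mask = UINT64_C(1) << 40;
    std::printf("%llu\n", static_cast<unsigned long long>(mask));   // 1099511627776
    return 0;
}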

Related

Forcing sign of a bit field (pre-C++14) when using fixed size types

Skip to the bolded part for the essential question, the rest is just background.
For reasons I prefer not to get into, I'm writing a code generator that generates C++ structs in a (very) pre-C++14 environment. The generator has to create bit-fields; it also needs the tightest possible control over the behaviour of the generated fields, in as portable a fashion as possible. I need to control both the size of the underlying allocation unit, and how signed values are handled. I won't get into why I'm on such a fool's errand, that so obviously runs afoul of Implementation Defined behaviour, but there's a paycheck involved, and all the right ways to do what needs to be done have been rejected by the people who arrange the paychecks.
So I'm stuck generating things like:
int32_t x : 11;
because I need to convince the compiler that this field (and other adjacent fields with the same underlying type) live in a 32 bit word. Generating int for the underlying type is not an option because int doesn't have a fixed size, and things would go very wrong the day someone releases a compiler in which int is 64 bits wide, or we end up back on one where it's 16.
In pre-C++14, int x : 11 might or might not be an unsigned field, and you prepend an explicit signed or unsigned to get what you need. I'm concerned that int32_t and friends will have the same ambiguity (why wouldn't they?), but compilers are gagging on signed int32_t.
Does the C++ standard have any words on whether the intxx_t types impose their signedness on bit fields? If not, is there any guarantee that something like
typedef signed int I32;
...
I32 x : 11;
...
assert(sizeof(I32)==4); //when this breaks, you won't have fun
will carry the signed indicator into the bitfield?
Please note that any suggestion that starts with "just generate a function to..." is by fiat off the table. These generated headers will be plugged into code that does things like s->x = 17; and I've had it nicely explained to me that I must not suggest changing it all to s->set_x(17) even one more time. Even though I could trivially generate a set_x function to exactly and safely do what I need without any implementation defined behaviour at all. Also, I'm very aware of the vagaries of bit fields, and left to right and right to left and inside out and whatever else compilers get up to with them, and several other reasons why this is a fool's errand. And I can't just "try stuff" because this needs to work on compilers I don't have, which is why I'm scrambling after guarantees in the standard.
Note: I can't implement any solution that doesn't allow existing code to simply cast a pointer to a buffer of bytes to a pointer to the generated struct, and then use their pointer to get to fields to read and write. The existing code is all about s->x, and must work with no changes. That rules out any solution involving a constructor in generated code.
Does the C++ standard have any words on whether the intxx_t types impose their signedness on bit fields?
No.
The standard's synopsis for the fixed-width integers of <cstdint>, [cstdint.syn] (link to modern standard; the relevant parts of the synopsis look the same in the C++11 standard), simply specifies, descriptively (not by means of the signed/unsigned keywords), that they shall be of "signed integer type" or "unsigned integer type".
E.g. for gcc, <cstdint> exposes the fixed-width integers of <stdint.h>, which in turn are typedefs whose underlying types come from predefined preprocessor macros (e.g. __INT32_TYPE__ for int32_t), the latter being platform-specific.
The standard does not impose any required use of the signed or unsigned keywords in this synopsis, and thus bit fields of fixed-width integer types will, in C++11, suffer the same implementation-defined behavior regarding their signedness as plain integer bit fields. Recall that the relevant part of [class.bit]/3, before it was changed for C++14 as a result of CWG 739, was:
It is implementation-defined whether a plain (neither explicitly signed nor unsigned) char, short, int, long, or long long bit-field is signed or unsigned. ...
Indeed, the following thread
How are the GNU C preprocessor predefined macros used?
shows an example where e.g. __INT32_TYPE__ on the answerer's particular platform is defined with no explicit presence of the signed keyword:
$ gcc -dM -E - < /dev/null | grep __INT
...
#define __INT32_TYPE__ int
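A quick way to see this implementation-defined behaviour on a particular compiler is to probe a plain int32_t bit-field directly. This is only a local experiment, not a portable guarantee (and the questioner rightly notes that "trying stuff" does not answer the standards question):
#include <cstdint>
#include <cstdio>

struct Probe {
    // No explicit signed/unsigned: pre-C++14, the signedness of this field
    // is implementation-defined.
    std::int32_t x : 11;
};

int main() {
    Probe p;
    p.x = -1;                         // all ones in the 11-bit field
    std::printf("%d\n", int(p.x));    // -1 if the field is signed, 2047 if unsigned
    return 0;
}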
it also needs the tightest possible control over the behaviour of the generated fields, in as portable a fashion as possible. I need to control both the size of the underlying allocation unit, and how signed values are handled.
These two goals are incompatible. Bitfields inherently have portability problems.
If the standard defined the behaviors you want, then the "vagaries of bit fields" wouldn't exist, and people wouldn't bother recommending using bitmasks and shifts for portability.
What you possibly could do is to provide a class that exposes the same interface as a struct with bitfields but that doesn't actually use bitfields internally. Then you could make its constructor and destructor read or write those fields portably via masks and shifts. For example, something like:
#include <cassert>
#include <cstdint>

class BitfieldProxy
{
public:
    explicit BitfieldProxy(std::uint32_t& u)
        : x((u >> 4) & 0x7FF),
          y(u & 0xF),
          mDest(u)
    {
    }

    ~BitfieldProxy()
    {
        assert((x & 0x7FF) == x);
        assert((y & 0xF) == y);
        mDest = (x << 4) | y;
    }

    BitfieldProxy(const BitfieldProxy&) = delete;
    BitfieldProxy& operator=(const BitfieldProxy&) = delete;

    // Only the last 11 bits are valid.
    unsigned int x;
    // Only the last 4 bits are valid.
    unsigned int y;

private:
    // Written back in the destructor.
    std::uint32_t& mDest;
};
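A hypothetical usage sketch for the class above (it does not satisfy the questioner's s->x constraint, but it shows the intended read-modify-write pattern):
#include <cstdint>
#include <cstdio>

int main() {
    std::uint32_t raw = 0;
    {
        BitfieldProxy p(raw);   // unpacks x and y from raw
        p.x = 17;
        p.y = 3;
    }                           // the destructor packs the fields back into raw
    std::printf("0x%08X\n", static_cast<unsigned>(raw));   // prints 0x00000113
    return 0;
}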

Why are certain implicit type conversions safe on one machine and not on another? How can I prevent these cross-platform issues?

I recently found a bug in my code that took me a few hours to debug.
The problem was in a function defined as:
unsigned int foo(unsigned int i){
    long int v[] = { i - 1, i, i + 1 };
    .
    .
    .
    return x; // computed by the function; how is not essential to this problem
}
The definition of v didn't cause any issue on my development machine (Ubuntu 12.04 32-bit, g++ compiler), where the unsigned int values were implicitly converted to long int and the negative values were handled correctly.
On a different machine (Ubuntu 12.04 64-bit, g++ compiler), however, this operation was not safe. When i=0, v[0] was not set to -1, but to some weird big value (as often happens when trying to make an unsigned int negative).
I could solve the issue by casting the value of i to long int
long int v[]={(long int) i - 1, (long int) i, (long int) i + 1};
and everything worked fine (on both machines).
I can't figure out why the first version works fine on one machine and doesn't work on the other.
Can you help me understanding this, so that I can avoid this or other issues in the future?
For unsigned values, addition/subtraction is well-defined as modulo arithmetic, so 0U-1 will work out to something like std::numeric_limits<unsigned>::max().
When converting from unsigned to signed, if the destination type is large enough to hold all the values of the unsigned type, the value is simply copied into the destination unchanged. If the destination type is not large enough to hold all the unsigned values, I believe the result is implementation-defined (will try to find a standard reference).
So when long is 64-bit (presumably the case on your 64-bit machine), the unsigned value fits and is copied straight across, which is where your "weird big value" comes from.
When long is 32 bits, as on the 32-bit machine, the conversion most likely just reinterprets the bit pattern as a signed value, which is -1 in this case.
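A minimal sketch of the two conversions described above (assuming a typical two's-complement machine; what you see depends on the width of long):
#include <cstdio>

int main() {
    unsigned int i = 0;
    long v = i - 1;   // i - 1 is computed in unsigned int and wraps to UINT_MAX

    // If long is 64 bits, UINT_MAX fits and v == 4294967295 (the "weird big value").
    // If long is 32 bits, the conversion is implementation-defined and on
    // two's-complement machines typically yields -1.
    std::printf("%ld\n", v);
    return 0;
}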
EDIT: The simplest way to avoid these problems is to avoid mixing signed and unsigned types. What does it mean to subtract one from a value whose concept doesn't allow for negative numbers? I'm going to argue that the function parameter should be a signed value in your example.
That said, g++ (at least since version 4.5) provides a handy -Wsign-conversion warning that detects this issue in your particular code.
You can also use a specialized cast that catches all overflowing casts:
#include <cassert>
#include <limits>
#include <type_traits>

template<typename O, typename I>
O architecture_cast(I x) {
    // Make sure I is an unsigned type; the range check below relies on it.
    static_assert(std::is_unsigned<I>::value, "Input value to architecture_cast has to be unsigned");
    assert(x <= static_cast<typename std::make_unsigned<O>::type>(std::numeric_limits<O>::max()));
    return static_cast<O>(x);
}
Using this will catch, in debug builds, all casts of values bigger than the resulting type can accommodate. This includes your case of an unsigned int holding 0 and having 1 subtracted from it, which wraps around to the biggest unsigned int value.
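A hypothetical usage of the architecture_cast sketch above (C++11 for static_assert and <type_traits>); in a debug build the assert fires whenever the wrapped value does not fit in the destination type, for example when long is only 32 bits:
int main() {
    unsigned int i = 0;
    // i - 1 wraps to UINT_MAX; architecture_cast<long> asserts if long
    // cannot represent that value (i.e. on platforms with a 32-bit long).
    long v = architecture_cast<long>(i - 1);
    (void)v;
    return 0;
}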
Integer promotion rules in the C++ Standard are inherited from those in the C Standard, which were chosen not to describe how a language should most usefully behave, but rather to offer a behavioral description that was as consistent as practical with the ways many existing implementations had extended earlier dialects of C to add unsigned types.
Things get further complicated by an apparent desire to specify only behavioral aspects that were thought to be consistent among 100% of existing implementations, without regard for whether some other compatible behavior might have been more broadly useful, while also avoiding any behavioral requirements on actions whose behavior might be expensive to guarantee on some plausible implementations.
I think it's pretty clear that the Committee wanted to unambiguously specify that long1 = uint1+1; uint2 = long1; must set uint2 in a manner consistent with wraparound behavior in all cases, and did not want to forbid implementations from using wraparound behavior when setting long1. The Standard could have upheld the first requirement while still allowing implementations to promote to long on quiet-wraparound two's-complement platforms, where the assignment to uint2 would yield results consistent with using wraparound behavior throughout, but doing so would have meant including a rule specifically for quiet-wraparound two's-complement platforms, which is something C89, and to an even greater extent C99, was exceptionally keen to avoid doing.

Disable default numeric types in compiler

When creating custom typedefs for integers, is it possible for the compiler to warn you when you use a default numeric type?
For example,
typedef int_fast32_t kint;
int_fast32_t test=0;//Would be ok
kint test=0; //Would be ok
int test=0; //Would throw a warning or error
We're converting a large project, and the default int on the platform has a maximum of 32767 (a 16-bit int), which is causing some issues. This warning would remind users not to use plain int in the code.
If possible, it would be great if this would work on GCC and VC++2012.
I'm reasonably sure gcc has no such option, and I'd be surprised if VC did.
I suggest writing a program that detects references to predefined types in source code, and invoking that tool automatically as part of your build process. It would probably suffice to search for certain keywords.
Be sure you limit this to your own source files; predefined and third-party headers are likely to make extensive use of predefined types.
But I wouldn't make the prohibition absolute. There are a number of standard library functions that use predefined types. For example, in c = getchar() it makes no sense to declare c as anything other than int. And there's no problem for something like for (int i = 0; i <= 100; i ++) ...
Ideally, the goal should be to use predefined types properly. The language has never guaranteed that an int can exceed 32767. (But "proper" use is difficult or impossible to verify automatically.)
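A rough sketch of the kind of detection tool suggested above, written in C++ with std::regex; the keyword list, file handling, and exit codes are illustrative assumptions, and it will also flag keywords inside comments and string literals:
#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "usage: find_builtin_ints <file>\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    // Keywords to flag; extend the list as needed for your project.
    std::regex banned(R"(\b(int|long|short|unsigned|signed)\b)");
    std::string line;
    int lineno = 0, hits = 0;
    while (std::getline(in, line)) {
        ++lineno;
        if (std::regex_search(line, banned)) {
            std::cout << argv[1] << ":" << lineno << ": " << line << "\n";
            ++hits;
        }
    }
    return hits ? 2 : 0;   // non-zero exit code lets the build fail on hits
}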
I'd approach this by doing a replace-all first and then documenting this thoroughly.
You can use a preprocessor directive:
#define int use kint instead
Note that technically this is undefined behavior and you'll run into trouble if you do this definition before including third-party headers.
I would recommend making a bulk replacement of int -> old_int_t at the very beginning of your porting. This way you can continue modifying your code without facing major restrictions and at the same time have access to all places that are not yet updated.
Eventually, at the end of your work, all occurrences of old_int_t should go away.
Even if one could somehow undefine the keyword int, that would do nothing to prevent usage of that type, since there are many cases where the compiler will end up using that type. Beyond the obvious cases of integer literals, there are some more subtle cases involving integer promotion. For example, if int happens to be 64 bits, operations between two variables of type uint32_t will be performed using type int rather than uint32_t. As nice as it would be to be able to specify that some variables represent numbers (which should be eagerly promoted when practical) while others represent members of a wrapping algebraic ring (which should not be promoted), I know of no facility to do such a thing. Consequently, int is unavoidable.
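The uint32_t-with-64-bit-int case is rare in practice, but the same promotion trap is easy to demonstrate with uint16_t on a typical platform where int is 32 bits; this sketch is an analogous illustration, not part of the original answer:
#include <cstdint>
#include <cstdio>

int main() {
    std::uint16_t a = 0xFFFF, b = 0xFFFF;
    // a * b promotes both operands to int; 0xFFFF * 0xFFFF does not fit in a
    // 32-bit int, so the "unsigned" multiplication is actually signed overflow
    // (undefined behaviour) after promotion:
    //     std::uint32_t bad = a * b;
    // Forcing one operand to a 32-bit unsigned type keeps the arithmetic unsigned.
    std::uint32_t ok = static_cast<std::uint32_t>(a) * b;
    std::printf("%u\n", static_cast<unsigned>(ok));   // 4294836225
    return 0;
}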

Forcing types to a specific size

I've been learning C++ and one thing that I'm not really comfortable with is the fact that datatype sizes are not consistent. Depending on what system something is deployed on an int could be 16 bits or 32 bits, etc.
So I was thinking it might be a good idea to make my own header file with data types like byte, word, etc. that are defined to be a specific size and will maintain that size on any platform.
Two questions. First, is this a good idea? Or is it going to create other problems I'm not aware of? Second, how do you define a type as being, say, 8 bits? I can't just say #define BYTE char, because char could vary across platforms.
Fortunately, other people have noticed this same problem. In C99 and C++11 the headers stdint.h (for C) and <cstdint> (for C++) were added, so set your compiler to one of those two modes (there should be a switch in your compiler settings). If you #include <cstdint>, you get the types int8_t, int16_t, int32_t, int64_t, and the same prefixed with a u for the unsigned versions. If your platform supports those types, they will be defined in the header, along with several others.
If your compiler does not yet support that standard (or you are forced by reasons out of your control to remain on C++03), then there is also Boost.
However, you should only use these if you care about the exact size of the type. int and unsigned are fine for throw-away variables in most cases. size_t should be used for indexing std::vector, etc.
First you need to figure out if you really care what sizes things are. If you are using an int to count the number of lines in a file, do you really care if it's 32-bit or 64? You need BYTE, WORD, etc if you are working with packed binary data, but generally not for any other reason. So you may be worrying over something that doesn't really matter.
Better yet, use the already defined stuff in stdint.h. See here for more details. Similar question here.
Example:
int32_t is always 32 bits.
Many libraries have their own .h with lots of typedefs to provide fixed-size types. This is useful when writing portable code, and it avoids relying on the headers of the platform you are currently working with.
If you only want to make sure the built-in data types have a minimum size, you can use std::numeric_limits from the <limits> header to check.
std::numeric_limits<int>::digits
will give you, for example, the number of bits of an int without the sign bit. And
std::numeric_limits<int>::max()
will give you the max value.
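A small sketch, assuming a C++11 compiler, that combines the suggestions above: fixed-width types from <cstdint> where size matters, and std::numeric_limits with static_assert for compile-time checks on the built-in types:
#include <cstdint>
#include <limits>

// Compile-time checks: <cstdint> for exact widths, <limits> for guarantees
// about the built-in types.
static_assert(std::numeric_limits<std::int32_t>::digits == 31,
              "int32_t has 31 value bits plus a sign bit");
static_assert(std::numeric_limits<unsigned int>::digits >= 16,
              "unsigned int is at least 16 bits");

int main() {
    std::int32_t fixed = 0;   // exactly 32 bits on every platform that provides it
    int counter = 0;          // fine for throw-away values where exact width doesn't matter
    (void)fixed;
    (void)counter;
    return 0;
}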

Why do C programmers use typedefs to rename basic types?

So I'm far from an expert on C, but something's been bugging me about code I've been reading for a long time: can someone explain to me why C(++) programmers use typedefs to rename simple types? I understand why you would use them for structs, but what exactly is the reason for declarations I see like
typedef unsigned char uch;
typedef uch UBYTE;
typedef unsigned long ulg;
typedef unsigned int u32;
typedef signed short s16;
Is there some advantage to this that isn't clear to me (a programmer whose experience begins with Java and hasn't ventured far outside of strictly type-safe languages)? Because I can't think of any reason for it--it looks like it would just make the code less readable for people unfamiliar with the project.
Feel free to treat me like a C newbie, I honestly know very little about it and it's likely there are things I've misunderstood from the outset. ;)
Renaming types without changing their exposed semantics/characteristics doesn't make much sense. In your example
typedef unsigned char uch;
typedef unsigned long ulg;
belong to that category. I don't see the point, aside from making a shorter name.
But these ones
typedef uch UBYTE;
typedef unsigned int u32;
typedef signed short s16;
are a completely different story. For example, s16 stands for "signed 16 bit type". This type is not necessarily signed short. Which specific type will hide behind s16 is platform-dependent. Programmers introduce this extra level of naming indirection to simplify the support for multiple platforms. If on some other platform signed 16 bit type happens to be signed int, the programmer will only have to change one typedef definition. UBYTE apparently stands for an unsigned machine byte type, which is not necessarily unsigned char.
It's worth noting that the C99 specification already provides a standard nomenclature for integer types of specific width, like int16_t, uint32_t and so on. It probably makes more sense to stick with this standard naming convention rather than invent a new one, even on platforms that don't natively support C99 (where you can supply the typedefs yourself).
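For illustration, the kind of per-platform typedef header described above might look roughly like this on compilers without <stdint.h>; the platform checks and the exact underlying types are assumptions, not something from the original answer:
#if defined(_MSC_VER)
/* Older MSVC: use the compiler-specific sized types. */
typedef __int16          s16;
typedef unsigned __int32 u32;
#elif defined(__GNUC__)
/* Assumes short is 16-bit and int is 32-bit on the targeted platforms. */
typedef short            s16;
typedef unsigned int     u32;
#else
#  error "Add s16/u32 typedefs for this platform"
#endif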
This allows for portability. For example you need an unsigned 32-bit integer type. Which standard type is that? You don't know - it's implementation defined. That's why you typedef a separate type to be 32-bit unsigned integer and use the new type in your code. When you need to compile on another C implementation you just change the typedefs.
Sometimes it is used to reduce an unwieldy thing like volatile unsigned long to something a little more compact such as vuint32_t.
Other times it is to help with portability since types like int are not always the same on each platform. By using a typedef you can set the storage class you are interested in to the platform's closest match without changing all the source code.
There are many reasons for it. What I think they are:
The type name becomes shorter, so the code is smaller and more readable.
It provides an alias for long structure names.
It follows a convention used by a particular team/company/style.
Porting: you have the same name across all OSes and machines, even though the native data type underneath might be slightly different.
Following is a quote from The C Programming Language (K&R):
Besides purely aesthetic issues, there are two main reasons for using typedefs.
First: to parameterize a program
The first is to parameterize a program against portability problems. If typedefs are used for data types that may be machine-dependent, only the typedefs need change when the program is moved. One common situation is to use typedef names for various integer quantities, then make an appropriate set of choices of short, int, and long for each host machine. Types like size_t and ptrdiff_t from the standard library are examples.
The quoted portions tell us that programmers typedef basic types for portability. If I want to make sure my program works on different platforms, with different compilers, I will try to ensure its portability in every possible way, and typedef is one of them.
When I started programming with the Turbo C compiler on Windows, sizeof(int) was 2. When I moved to Linux and the GCC compiler, the size I got was 4. If I had developed a program using Turbo C which relied on the assertion that sizeof(int) is always two, it would not have ported properly to my new platform.
Hope it helps.
The following quote from K&R is not related to your query, but I have posted it too for the sake of completeness.
Second: to provide better documentation
The second purpose of typedefs is to provide better documentation for a program - a type called Treeptr may be easier to understand than one declared only as a pointer to a complicated structure.
Most of these patterns are bad practices that come from reading and copying existing bad code. Often they reflect misunderstandings about what C does or does not require.
1. typedef unsigned char uch; is akin to #define BEGIN {, except it saves some typing instead of making for more.
2. typedef uch UBYTE; is akin to #define FALSE 0. If your idea of "byte" is the smallest addressable unit, char is a byte by definition. If your idea of "byte" is an octet, then either char is the octet type, or your machine has no octet type.
3. typedef unsigned long ulg; is really ugly shorthand for people who can't touch type...
4. typedef unsigned int u32; is a mistake. It should be typedef uint32_t u32; or better yet, uint32_t should just be used directly.
5. typedef signed short s16; is the same as 4, with uint32_t replaced by int16_t.
Please put a "considered harmful" stamp on them all. typedef should be used when you really need to create a new type whose definition could change over the life cycle of your code or when the code is ported to different hardware, not because you think C would be "prettier" with different type names.
We use it to make things project/platform specific; everything gets a common naming convention:
pname_int32, pname_uint32, pname_uint8 -- pname is project/platform/module name
And some #defines
pname_malloc, pname_strlen
It is easier to read, and it shortens long data types like unsigned char to pname_uint8, also making it a convention across all modules.
When porting, you only need to modify a single file, thus making porting easy.
To cut a long story short,
you might want to do that to make your code portable (with less effort/editing).
This way you don't depend on int; instead you use, say, INTEGER, which can be anything you want.
All [u]intN_t types, where N = 8, 16, 32, 64 and so forth, are defined per architecture in this exact manner. This is a direct consequence of the fact that the standard does not mandate that char, int, float, etc. have exactly N bits - that would be insane. Instead, the standard defines minimum and maximum values of each type as guarantees to the programmer, and on various architectures types may well exceed those boundaries. It is not an uncommon sight.
The typedefs in your post are used to define types of a certain length on a specific architecture. It's probably not the best choice of naming; u32 and s16 are a bit too short, in my opinion. Also, it's kind of a bad thing to expose names like ulg and uch; one could prefix them with an application-specific string, since they obviously will be exposed.
Hope this helps.