How to detect negative number assigned to size_t? - c++

This declaration compiles without warnings in g++ -pedantic -Wall (version 4.6.3):
std::size_t foo = -42;
Less visibly bogus is declaring a function with a size_t argument, and calling it with a negative value. Can such a function protect against an inadvertent negative argument (which appears as umpteen quintillion, obeying §4.7/2)?
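For concreteness, here is a minimal sketch of that second scenario (consume is a made-up name):
#include <cstddef>
#include <cstdio>

// Hypothetical callee: expects a non-negative element count.
void consume(std::size_t n) { std::printf("%zu\n", n); }

int main() {
    int delta = -42;
    consume(delta);   // compiles silently with -Wall; n becomes SIZE_MAX - 41
    return 0;
}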
Incomplete answers:
Just changing size_t to (signed) long discards the semantics and other advantages of size_t.
Changing it to ssize_t is merely POSIX, not Standard.
Changing it to ptrdiff_t is brittle and sometimes broken.
Testing for huge values (high-order bit set, etc) is arbitrary.

The problem with issuing a warning for this is that it's not undefined behavior according to the standard. If you convert a signed value to an unsigned type of the same size (or larger), you can later convert that back to a signed value of the original signed type and get the original value1 on any standards-compliant compiler.
In addition, using negative values converted to size_t is fairly common practice for various error conditions -- many system calls return an unsigned (size_t or off_t) value for success or a -1 (converted to unsigned) for an error. So adding such a warning to the compiler would cause spurious warnings for much existing code. POSIX attempts to codify this with ssize_t, but that breaks calls that may be successful with a return value greater than the maximum signed value for ssize_t.
1"original value" here actually means "a bit pattern that compares as equal to the original bit pattern when compared as that signed type" -- padding bits might not be preserved, and if the signed representation has redundant encodings (eg, -0 and +0 in a sign-magnitude representation) it might be 'canonicalized'

The following excerpt is from a private library.
#include <limits.h>
#if __STDC__ == 1 && __STDC_VERSION__ >= 199901L || \
defined __GNUC__ || defined _MSC_VER
/* Has long long. */
#ifdef __GNUC__
#define CORE_1ULL __extension__ 1ULL
#else
#define CORE_1ULL 1ULL
#endif
#define CORE_IS_POS(x) ((x) && ((x) & CORE_1ULL << (sizeof (x)*CHAR_BIT - 1)) == 0)
#define CORE_IS_NEG(x) (((x) & CORE_1ULL << (sizeof (x)*CHAR_BIT - 1)) != 0)
#else
#define CORE_IS_POS(x) ((x) && ((x) & 1UL << (sizeof (x)*CHAR_BIT - 1)) == 0)
#define CORE_IS_NEG(x) (((x) & 1UL << (sizeof (x)*CHAR_BIT - 1)) != 0)
#endif
#define CORE_IS_ZPOS(x) (!(x) || CORE_IS_POS(x))
#define CORE_IS_ZNEG(x) (!(x) || CORE_IS_NEG(x))
This should work with all unsigned types.
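A usage sketch, assuming the excerpt above is in scope (guard is a made-up name):
#include <cstddef>
#include <cstdio>

// Hypothetical guard: treat a size whose high-order bit is set as a
// converted negative and refuse it.
void guard(std::size_t n) {
    if (CORE_IS_NEG(n)) {
        std::fprintf(stderr, "suspicious size %zu: looks like a converted negative\n", n);
        return;
    }
    /* ... use n ... */
}

int main() {
    guard(static_cast<std::size_t>(-42));   // rejected: high bit set
    guard(42);                               // accepted
    return 0;
}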

Related

Finding SHRT_MAX on systems without <limits.h> or <values.h>

I am reading The C++ Answer Book by Tony L Hansen. It says somewhere that the value of SHRT_MAX (the largest value of a short) can be derived as follows:
const CHAR_BIT= 8;
#define BITS(type) (CHAR_BIT*(int)sizeof(type))
#define HIBIT(type) ((type)(1<< (BITS(type)-1)))
#define TYPE_MAX(type) ((type)~HIBIT(type));
const SHRT_MAX= TYPE_MAX(short);
Could someone explain in simple words what is happening in the above 5 lines?
const CHAR_BIT= 8;
Assuming int is added here (and below): CHAR_BIT is the number of bits in a char. Its value is assumed here without checking.
#define BITS(type) (CHAR_BIT*(int)sizeof(type))
BITS(type) is the number of bits in type. If sizeof(short) == 2, then BITS(short) is 8*2.
Note that C++ does not guarantee that all bits in integer types other than char contribute to the value, but the below will assume that nonetheless.
#define HIBIT(type) ((type)(1<< (BITS(type)-1)))
If BITS(short) == 16, then HIBIT(short) is ((short)(1<<15)). This is implementation-dependent, but assumed to have the sign bit set, and all value bits zero.
#define TYPE_MAX(type) ((type)~HIBIT(type));
If HIBIT(short) is (short)32768, then TYPE_MAX(short) is (short)~(short)32768. This is assumed to have the sign bit cleared, and all value bits set.
const SHRT_MAX= TYPE_MAX(short);
If all assumptions are met, and this indeed has all value bits set but not the sign bit, then it is the highest value representable in short.
It's possible to get the maximum value more reliably in modern C++ when you know that:
the maximum value for an unsigned type is trivially obtainable
the maximum value for a signed type is assuredly either equal to the maximum value of the corresponding unsigned type, or that value right-shifted until it's in the signed type's range
a conversion of an out-of-range value to a signed type does not have undefined behaviour, but instead gives an implementation-defined value in the signed type's range:
template <typename S, typename U>
constexpr S get_max_value(U u) {
    S s = u;
    while (s < 0 || s != u)
        s = u >>= 1;
    return u;
}
constexpr unsigned short USHRT_MAX = -1;
constexpr short SHRT_MAX = get_max_value<short>(USHRT_MAX);
Reformatting a bit:
const CHAR_BIT = 8;
Invalid code in C++; it looks like old C code. Let's assume that const int was meant.
#define BITS(type) (CHAR_BIT * (int)sizeof(type))
Returns the number of bits that a type takes assuming 8-bit bytes, because sizeof returns the number of bytes of the object representation of type.
#define HIBIT(type) ((type) (1 << (BITS(type) - 1)))
Assuming type is a signed integer in two's complement, this would return an integer of that type with the highest bit set. For instance, for an 8-bit integer, you would get 1 << (8 - 1) == 1 << 7 == 0b10000000, which as a signed 8-bit value is -128.
#define TYPE_MAX(type) ((type) ~HIBIT(type));
The bitwise not of the previous thing, i.e. flips each bit. Following the same example as before, you would get ~0b10000000 == 0b01111111 == 127.
const SHRT_MAX = TYPE_MAX(short);
Again invalid, both in C and C++: in C++ because of the missing int, and in C because CHAR_BIT as defined here is a const variable, not a constant expression. Let's assume const int. Uses the previous code to get the maximum of the short type.
Taking it one line at a time:
const CHAR_BIT= 8;
Declare and initialize CHAR_BIT as a variable of type const int with
value 8. Leaving out the type relies on the old "implicit int" rule, which
C89 allowed but C99 and C++ do not, so it's better practice (and, with
modern compilers, necessary) to specify the type.
#define BITS(type) (CHAR_BIT* (int)sizeof(type))
Preprocessor macro, converting a type to the number of bits in that
type. (The asterisk isn’t making anything a pointer, it’s for
multiplication. Would be clearer if the author had put a space before
it.)
#define HIBIT(type) ((type)(1<< (BITS(type)-1)))
Macro, converting a type to a number of that type with the highest bit
set to one and all other bits zero.
#define TYPE_MAX(type) ((type)~HIBIT(type));
Macro, inverting HIBIT so the highest bit is zero and all others are
one. This will be the maximum value of type if it’s a signed type and
the machine uses two’s complement. The semicolon shouldn’t be there, but
it will work in this code.
const SHRT_MAX= TYPE_MAX(short);
Uses the above macros to compute the maximum value of a short.
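For comparison, the same derivation can be written out without macros and checked against <climits>; this is only a sketch and still assumes two's complement, no padding bits, and a short narrower than int (all true on mainstream platforms):
#include <climits>

constexpr int   bits    = CHAR_BIT * static_cast<int>(sizeof(short));
constexpr short hibit   = static_cast<short>(1 << (bits - 1));  // sign bit only
constexpr short typemax = static_cast<short>(~hibit);           // all value bits set

static_assert(typemax == SHRT_MAX, "matches <climits> under the stated assumptions");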

Macro values defined using bit-shifts

I've been going through an old source project, trying to make it compile and run (it's an old game that's been uploaded to GitHub). I think a lot of the code was written with C-style/C-syntax in mind (a lot of typedef struct {...} and the like) and I've been noticing that they define certain macros with the following style:
#define MyMacroOne (1<<0) //This equals 1
#define MyMacroTwo (1<<1) //This equals 2, etc.
So my question now is this - is there any reason why macros would be defined this way? Because, for example, 0x01 and 0x02 are the numerical result of the above. Or is it that the system will not read MyMacroOne = 0x01 but rather as a "shift object" with the value (1<<0)?
EDIT: Thanks for all of your inputs!
It makes defining bit values more intuitive and less error-prone, especially for multi-bit fields. For example, compare
#define POWER_ON (1u << 0)
#define LIGHT_ON (1u << 1)
#define MOTOR_ON (1u << 2)
#define SPEED_STOP (0u << 3)
#define SPEED_SLOW (1u << 3)
#define SPEED_FAST (2u << 3)
#define SPEED_FULL (3u << 3)
#define LOCK_ON (1u << 5)
and
#define POWER_ON 0x01
#define LIGHT_ON 0x02
#define MOTOR_ON 0x04
#define SPEED_STOP 0x00
#define SPEED_SLOW 0x08
#define SPEED_FAST 0x10
#define SPEED_FULL 0x18
#define LOCK_ON 0x20
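For instance, the shifted form makes it easy to mask out the multi-bit speed field; SPEED_MASK below is a hypothetical helper, not part of the list above:
#define SPEED_MASK (3u << 3)   /* covers both bits of the speed field */

int motor_is_fast(unsigned state) {
    return (state & SPEED_MASK) == SPEED_FAST;
}

/* e.g. motor_is_fast(POWER_ON | MOTOR_ON | SPEED_FAST) yields 1 */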
It is convenient for humans, for example:
#define PIN0 (1u<<0)
#define PIN5 (1u<<5)
#define PIN0MASK (~(1u<<0))
#define PIN5MASK (~(1u<<5))
and it is easy to see whether the bit position is correct. It does not make the code slower, since the value is computed at compile time.
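A small usage sketch (gpio here is just a stand-in for whatever register or variable holds the pin state):
void toggle_pins(volatile unsigned &gpio) {
    gpio |= PIN5;        // drive pin 5 high
    gpio &= PIN0MASK;    // clear pin 0, leaving the other bits untouched
}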
You can always use constant integer expression shifts as a way to express (multiples of) powers of two, i.e. Multiple * (2 to the N-th power) = Multiple << N (with some caveats for when you hit the guaranteed size limits of the integer types and UB sets in*), and pretty much rely on the compiler folding them.
An integer expression made of integer constants is defined as an integer constant expression. These can be used to specify array sizes, case labels and stuff like that and so every compiler has to be able to fold them into a single intermediate and it'd be stupid not to utilize this ability even where it isn't strictly required.
*E.g.: you can do 1U<<15, but at 16 you should switch to at least 1L<<16, because int/unsigned int is only required to have at least 16 bits, and left-shifting an integer by its width, or into the place where its sign bit is, is undefined (6.5.7p4):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
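A tiny sketch of that caveat (the BIT name is just for this illustration): shifting 1UL instead of a plain 1 keeps bit positions up to 31 well-defined even where int is only 16 bits wide:
#define BIT(n) (1UL << (n))   /* unsigned long has at least 32 value bits */

/* With a 16-bit unsigned int, (1U << 16) would shift past the type's width
   and be undefined; BIT(16) is well-defined on any conforming compiler. */
const unsigned long FLAG16 = BIT(16);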
Macros are just replacement text: everywhere a macro appears, it is replaced by its replacement text. This is convenient, especially if you want to name something constant which is otherwise prone to mistakes.
To illustrate how this (1<<0) syntax is more practical, consider this example from the code-base of Git 2.25 (Q1 2020), which moves the definition of a set of bitmask constants from octal literals to (1U<<count) notation.
See commit 8679577 (17 Oct 2019) by Hariom Verma (harry-hov).
(Merged by Junio C Hamano -- gitster -- in commit 8f40d89, 10 Nov 2019)
builtin/blame.c: constants into bit shift format
Signed-off-by: Hariom Verma
We are looking at bitfield constants, and elsewhere in the Git source code, such cases are handled via bit shift operators rather than octal numbers, which also makes it easier to spot holes in the range.
If, say, 1<<5 was missing:
it is easier to spot it between 1<<4 and 1<<6
than it is to spot a missing 040 between a 020 and a 0100.
So instead of:
#define OUTPUT_ANNOTATE_COMPAT 001
#define OUTPUT_LONG_OBJECT_NAME 002
#define OUTPUT_RAW_TIMESTAMP 004
#define OUTPUT_PORCELAIN 010
You get:
#define OUTPUT_ANNOTATE_COMPAT (1U<<0)
#define OUTPUT_LONG_OBJECT_NAME (1U<<1)
#define OUTPUT_RAW_TIMESTAMP (1U<<2)
#define OUTPUT_PORCELAIN (1U<<3)

definition of UINT_MAX macro

I would like to know if there is a particular reason to define the macro UINT_MAX as (2147483647 * 2U + 1U) and not directly its true value (4294967295U) in the climits header file.
Thank you all.
As far as the compiled code is concerned, there would be no difference, because the compiler would evaluate both constant expressions to produce the same value at compile time.
Defining UINT_MAX in terms of INT_MAX lets you reuse a constant that you have already defined:
#define UINT_MAX (INT_MAX * 2U + 1U)
In fact, this is very much what clang's header does, reusing an internal constant __INT_MAX__ for both INT_MAX and UINT_MAX:
#define INT_MAX __INT_MAX__
#define UINT_MAX (__INT_MAX__ *2U +1U)
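As a quick sanity check (a sketch; the identity holds wherever unsigned int has exactly one more value bit than int, which is the case on essentially every real platform and is guaranteed from C++20 on):
#include <climits>

static_assert(UINT_MAX == INT_MAX * 2U + 1U,
              "UINT_MAX can be derived from INT_MAX");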

Does "&" vs. "&&" actually make a difference for compile-time flags?

I have a habit of using the following syntax in my compile-time flags:
#if (defined(A) & defined(B))
It's usually suggested that I do it with the && as follows:
#if (defined(A) && defined(B))
I know the difference between the two operators, and that in normal code && would short-circuit. However, the above is all handled by the compiler. Does it even matter what I use? Does it affect compile time by some infinitesimal amount because it doesn't evaluate the second defined()?
Since defined(SOMETHING) yields 0 or 1, so that you're guaranteed 0 or 1 on both sides, it doesn't make a technical difference whether you use & or &&.
It's mostly about good habits (using & could carry over to some situation where it would be wrong) and about writing code that is easy to grasp by simple pattern matching. A & in there causes a millisecond pause while one considers whether it possibly could be a bit-level thing.
On the third hand, you can't use the keyword and, which you can use in ordinary C++ code.¹
Notes:
¹ With Visual C++ you can use and via a forced include of <iso646.h>.
According to the C99 standard, the expressions used in the preprocessor are constant expressions as defined by the C language itself, and are evaluated using the same engine. Therefore, && is a logical and operator that short circuits based on its LHS, and & is a bitwise operator with no predefined order of evaluation.
In practical terms, when used with defined() as you are, there is no difference between the two. However, the following would show a difference:
#include <stdio.h>

#define A 2
#define B 5

int main(void) {
#if (A && B)
    printf("A && B\n");
#endif
#if (A & B)
    printf("A & B\n");
#endif
    return 0;
}
In this case, A && B will be printed, but not A & B (since 2 & 5 is 0).
I would like to add to the previous answers that it can actually matter a lot in a situation like this:
#define A 0
#define B 21
#if (A != 0) && (42 / A == B)
/* ... */
#endif
Here, if A == 0, the directive still works because && never evaluates its right-hand side, so the division is skipped. Writing (A != 0) & (42 / A == B) instead will make the compiler complain about a division by zero.

Testing for a maximum unsigned value

Is this the correct way to test for a maximum unsigned value in C and C++ code:
if (foo == -1)
{
    // at max possible value
}
where foo is an unsigned int, an unsigned short, and so on.
For C++, I believe you should preferably use the numeric_limits template from the <limits> header:
if (foo == std::numeric_limits<unsigned int>::max())
/* ... */
For C, others have already pointed out the <limits.h> header and UINT_MAX.
Apparently, "solutions which are allowed to name the type are easy", so you can have :
template<class T>
inline bool is_max_value(const T t)
{
return t == std::numeric_limits<T>::max();
}
[...]
if (is_max_value(foo))
/* ... */
I suppose that you ask this question because at some point you don't know the concrete type of your variable foo; otherwise you would naturally use UINT_MAX etc.
For C your approach is the right one only for types with a conversion rank of int or higher. This is because, before being compared, an unsigned short value, for example, is first converted to int if all its values fit, or to unsigned int otherwise. So your value foo would be compared either to -1 or to UINT_MAX, which is not what you expect.
I don't see an easy way of implementing the test that you want in C, since basically using foo in any expression would promote it to int.
With gcc's typeof extension this is easily possible. You'd just have to do something like
if (foo == (typeof(foo))-1)
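To make the promotion issue concrete, a small sketch (assuming a 16-bit unsigned short and a wider int); the cast to the operand's own type is exactly what the typeof trick above automates:
#include <cstdio>

int main() {
    unsigned short s = -1;                        // 65535 with a 16-bit unsigned short
    std::printf("%d\n", s == -1);                 // 0: both sides promote to int (65535 vs -1)
    std::printf("%d\n", s == (unsigned short)-1); // 1: compare against the type's own maximum
    return 0;
}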
As already noted, you should probably use if (foo == std::numeric_limits<unsigned int>::max()) to get the value.
However for completeness, in C++ -1 is "probably" guaranteed to be the max unsigned value when converted to unsigned (this wouldn't be the case if there were unused bit patterns at the upper end of the unsigned value range).
See 4.7/2:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). ]
Note that specifically for the unsigned int case, due to the rules in 5/9 it appears that if either operand is unsigned, the other will be converted to unsigned automatically so you don't even need to cast the -1 (if I'm reading the standard correctly). In the case of unsigned short you'll need a direct check or explicit cast because of the automatic integral promotion induced by the ==.
Using #include <limits.h> you could just do
if (foo == UINT_MAX)
If foo is an unsigned int, it holds values in the range [0, 4294967295] (if it is 32 bits).
More: http://en.wikipedia.org/wiki/Limits.h
Edit: in C, if you do
#include <limits.h>
#include <stdio.h>
int main(void) {
    unsigned int x = -1;
    printf("%u\n", x);
    return 0;
}
you will get the result 4294967295 (on a 32-bit system), because internally -1 is represented as 11111111111111111111111111111111 in two's complement. But because the type is unsigned there is no sign bit, so the value is interpreted in the range [0, 2^n - 1].
Also see: http://en.wikipedia.org/wiki/Two%27s_complement
See the other answers for the C++ part: std::numeric_limits<unsigned int>::max()
I would define a constant that holds the maximum value as needed by the design of your code. Using "-1" is confusing. Imagine that someone later changes the type from unsigned int to int; that would break your code.
Here's an attempt at doing this in C. It depends on the implementation not having padding bits:
#define IS_MAX_UNSIGNED(x) ( (sizeof(x)>=sizeof(int)) ? ((x)==-1) : \
((x)==(1<<CHAR_BIT*sizeof(x))-1) )
Or, if you can modify the variable, just do something like:
if (!(x++,x--)) { /* x is at max possible value */ }
(The comma expression increments x and yields the incremented value before restoring it; that value wraps to 0 exactly when x started at its maximum.)
Edit: And if you don't care about possible implementation-defined extended integer types:
#define IS_MAX_UNSIGNED(x) ( (sizeof(x)>=sizeof(int)) ? ((x)==-1) : \
                             (sizeof(x)==sizeof(short)) ? ((x)==USHRT_MAX) : \
                             (sizeof(x)==1 ? ((x)==UCHAR_MAX) : 42) )
You could use sizeof(char) in the last line, of course, but I consider it a code smell and would typically catch it grepping for code smells, so I just wrote 1. Of course you could also just remove the last conditional entirely.
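A usage sketch of that second variant (assuming the macro above is in scope):
#include <climits>
#include <cstdio>

int main() {
    unsigned short len = USHRT_MAX;
    unsigned int   big = UINT_MAX;
    // Both are at their type's maximum, so this prints "1 1".
    std::printf("%d %d\n", IS_MAX_UNSIGNED(len), IS_MAX_UNSIGNED(big));
    return 0;
}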