Is it well-specified (for unsigned types in general), that:
static_assert(-std::size_t{1} == ~std::size_t{0}, "!");
I just looked into libstdc++'s std::align implementation and noticed that it uses std::size_t negation:
inline void*
align(size_t __align, size_t __size, void*& __ptr, size_t& __space) noexcept
{
  const auto __intptr = reinterpret_cast<uintptr_t>(__ptr);
  const auto __aligned = (__intptr - 1u + __align) & -__align;
  const auto __diff = __aligned - __intptr;
  if ((__size + __diff) > __space)
    return nullptr;
  else
    {
      __space -= __diff;
      return __ptr = reinterpret_cast<void*>(__aligned);
    }
}
Unsigned integer types are defined to wrap around, and the highest possible value representable in an unsigned integer type is the number with all bits set to one - so yes.
As cppreference states it (arithmetic operators / overflow):
Unsigned integer arithmetic is always performed modulo 2^n, where n is the number of bits in that particular integer. E.g. for unsigned int, adding one to UINT_MAX gives 0, and subtracting one from 0 gives UINT_MAX.
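A minimal compile-time check of that rule (both asserts are guaranteed to hold on any conforming implementation, since the modulo-2^n behaviour is required rather than implementation-defined):

#include <climits>

static_assert(UINT_MAX + 1u == 0u, "adding one to UINT_MAX gives 0");
static_assert(0u - 1u == UINT_MAX, "subtracting one from 0 gives UINT_MAX");

int main() {}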
Related: Is it safe to use negative integers with size_t?
Is it well-specified (for unsigned types in general), that:
static_assert(-std::size_t{1} == ~std::size_t{0}, "!");
No, it is not.
For calculations using unsigned types, the assertion must hold. However, this assertion is not guaranteed to use unsigned types. Unsigned types narrower than int would be promoted to signed int or unsigned int (depending on the types' ranges) before - or ~ is applied. If it is promoted to signed int, and signed int does not use two's complement for representing negative values, the assertion can fail.
libstdc++'s code, as shown, does not perform any arithmetic in an unsigned type narrower than int, though. The 1u in __aligned ensures each of the calculations uses unsigned int or size_t, whichever is larger. This applies even to the subtraction in __space -= __diff.
Unsigned types at least as wide as unsigned int do not undergo integer promotions, so arithmetic and logical operations on them are applied in their own type, for which Johan Lundberg's answer applies: that's specified to be performed modulo 2^N.
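A small sketch of where the promotion matters, assuming the common case where unsigned short is narrower than int (so it promotes to signed int) while std::size_t is not:

#include <cstddef>
#include <type_traits>

// size_t does not get promoted, so - and ~ operate in size_t itself and the
// modulo-2^N guarantee applies directly:
static_assert(std::is_same<decltype(-std::size_t{1}), std::size_t>::value, "");
static_assert(-std::size_t{1} == ~std::size_t{0}, "holds for size_t");

// unsigned short (typically) promotes to signed int first, so both sides of
// the comparison would be computed in a signed type, where the identity is
// not guaranteed on non-two's-complement implementations:
static_assert(std::is_same<decltype(-static_cast<unsigned short>(1)), int>::value, "");

int main() {}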
#include <cstdio>
#include <vector>
using std::vector;

int main() {
    vector<int> v;
    if (0 < v.size() - 1) {
        printf("true");
    } else {
        printf("false");
    }
}
It prints true, which indicates 0 < -1.
std::vector::size() returns an unsigned integer. If it is 0 and you subtract 1, it underflows and becomes a huge value (specifically std::numeric_limits<std::vector<int>::size_type>::max()). The comparison works fine, but the subtraction produces a value you did not expect.
For more about unsigned underflow (and overflow), see: C++ underflow and overflow
The simplest fix for your code is probably if (1 < v.size()).
v.size() returns a result of size_t, which is an unsigned type. An unsigned value minus 1 is still unsigned. And all non-zero unsigned values are greater than zero.
std::vector<int>::size() returns std::vector<int>::size_type, which is normally size_t, an unsigned type whose rank is usually at least that of int.
When, in a math operation, you put together a signed type with an unsigned type and the unsigned type doesn't have a lower rank, the signed type will get converted to the unsigned type (see 6.3.1.8 Usual arithmetic conversions; I'm citing the C standard, but the rules for integer arithmetic are foundational and common to both languages).
In other words, assuming that size_t isn't unsigned char or unsigned short
(it's usually unsigned long and the C standard recommends it shouldn't be unsigned long long unless necessary)
(size_t)0 - 1
gets implicitly translated to
(size_t)0 - (size_t)1
which is a positive number equal to SIZE_MAX (-1 cannot be represented in an unsigned type, so it gets converted, formally, by "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type" (6.3.1.3p2)).
0 is always less than SIZE_MAX.
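A minimal runtime illustration of the conversion just described (a sketch; the exact numeric value of d is platform dependent, but it is SIZE_MAX on every conforming implementation):

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    std::size_t d = std::size_t{0} - 1;   // wraps around to SIZE_MAX
    std::printf("%d\n", d == SIZE_MAX);   // prints 1
    std::printf("%d\n", 0 < d);           // prints 1 -- the "0 < v.size() - 1" situation
}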
I'm facing a problem where signed integers should be converted to unsigneds, preserving their range and order.
Given the following definition:
#include <limits>
#define MIN(X) std::numeric_limits<X>::min()
#define MAX(X) std::numeric_limits<X>::max()
What is the fastest and correct way to map the signed range [MIN(T), MAX(T)] to the unsigned range [0, MAX(U)]?
where:
T is a signed integer type
U is an unsigned integer type
sizeof(T) == sizeof(U)
I tried various bit twiddling and numeric methods to come up with a solution, without success.
#include <limits>

unsigned int signedToUnsigned(signed int s) {
    unsigned int u = 1U + std::numeric_limits<int>::max();
    u += s;
    return u;
}
This will add signed_max + 1 to the signed int, ensuring that [MIN(int), MAX(int)] is mapped to [0, MAX(unsigned int)].
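A quick endpoint check of that mapping (a sketch; the first assert's value assumes a two's-complement int, so that MIN(int) == -MAX(int) - 1):

#include <cassert>
#include <limits>

unsigned int signedToUnsigned(signed int s) {
    unsigned int u = 1U + std::numeric_limits<int>::max();
    u += s;
    return u;
}

int main() {
    const int  imin = std::numeric_limits<int>::min();
    const int  imax = std::numeric_limits<int>::max();
    const auto umax = std::numeric_limits<unsigned int>::max();

    assert(signedToUnsigned(imin) == 0U);                          // MIN(int) -> 0
    assert(signedToUnsigned(-1)   == static_cast<unsigned>(imax)); // order is preserved
    assert(signedToUnsigned(imax) == umax);                        // MAX(int) -> MAX(unsigned)
}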
Why would this answer work and map correctly:
When you add a signed integral number to an unsigned one, the signed number is converted to the unsigned type. From Section 4.7 [conv.integral]:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
I am looking at some C++ code and I see:
byte b = someByteValue;
// take twos complement
byte TwosComplement = -b;
Is this code taking the twos complement of b? If not, What is it doing?
This code definitely does compute the twos-complement of an 8-bit binary number, on any implementation where stdint.h defines uint8_t:
#include <stdint.h>

uint8_t twos_complement(uint8_t val)
{
    return -(unsigned int)val;
}
That is because, if uint8_t is available, it must be an unsigned type that is exactly 8 bits wide. The conversion to unsigned int is necessary because uint8_t is definitely narrower than int. Without the conversion, the value will be promoted to int before it is negated, so, if you're on a non-twos-complement machine, it will not take the twos-complement.
More generally, this code computes the twos-complement of a value with any unsigned type (using C++ constructs for illustration - the behavior of unary minus is the same in both languages, assuming no user-defined overloads):
#include <cstdint>
#include <type_traits>

template <typename T>
T twos_complement(T val,
                  // "allow this template to be instantiated only for unsigned types"
                  typename std::enable_if<std::is_unsigned<T>::value>::type* = 0)
{
    return -std::uintmax_t(val);
}
because unary minus is defined to take the twos-complement when applied to unsigned types. We still need a cast to an unsigned type that is no narrower than int, but now we need it to be at least as wide as any possible T, hence uintmax_t.
However, unary minus does not necessarily compute the twos-complement of a value whose type is signed, because C (and C++) still explicitly allow implementations based on CPUs that don't use twos-complement for signed quantities. As far as I know, no such CPU has been manufactured in at least 20 years, so the continued provision for them is kind of silly, but there it is. If you want to compute the twos-complement of a value even if its type happens to be signed, you have to do this: (C++ again)
#include <cstdint>
#include <type_traits>

template <typename T>
T twos_complement(T val)
{
    typedef typename std::make_unsigned<T>::type U;
    return T(-std::uintmax_t(U(val)));
}
i.e. convert to the corresponding unsigned type, then to uintmax_t, then apply unary minus, then back-convert to the possibly-signed type. (The cast to U is required to make sure the value is zero- rather than sign-extended from its natural width.)
(If you find yourself doing this, though, stop and change the types in question to unsigned instead. Your future self will thank you.)
The correct expression will look like this:
byte TwosComplement = ~b + 1;
Note: provided that byte is defined as unsigned char
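A small sanity check of the two forms side by side (a sketch assuming byte is an alias for unsigned char and the usual two's-complement int doing the promoted arithmetic):

#include <cassert>
typedef unsigned char byte;   // assumption: byte is unsigned char

int main() {
    byte b = 0x3C;
    byte viaMinus  = static_cast<byte>(-b);       // 2^8 - b after conversion back to byte
    byte viaInvert = static_cast<byte>(~b + 1);   // invert the bits, then add one
    assert(viaMinus == viaInvert);                // both 0xC4 for this b
}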
On a two's complement machine negation computes the two's complement, yes.
On the Unisys something-something, hopefully now dead and buried (but was still extant a few years ago), no for a signed type.
C and C++ supports two's complement, one's complement and sign-and-magnitude representation of signed integers, and only with two's complement does negation do a two's complement.
With byte as an unsigned type negation plus conversion to byte produces the two's complement bitpattern, regardless of integer representation, because conversion to unsigned as well as unsigned arithmetic is modulo 2^n where n is the number of value representation bits.
That is, the resulting value after assigning or initializing with -x is 2^n - x, which is the two's complement of x.
This does not mean that the negation itself necessarily computes the two's complement bitpattern. To understand this, note that with byte defined as unsigned char, and with sizeof(int) > 1, the byte value is promoted to int before the negation, i.e. the negation operation is done with a signed type. But converting the resulting negative value to unsigned byte, creates the two's complement bitpattern by definition and the C++ guarantee of modulo arithmetic and conversion to unsigned type.
The usefulness of 2's complement form follows from 2^n - x = 1 + ((2^n - 1) - x), where the last parenthesis is an all-ones bitpattern minus x, i.e. a simple bitwise inversion of x.
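For a concrete instance of that identity, here is an 8-bit worked example (n = 8, x = 0x3C):

// 2^8 - x             == 0x100 - 0x3C        == 0xC4
// 1 + ((2^8 - 1) - x) == 1 + (0xFF - 0x3C)   == 1 + 0xC3 == 0xC4
static_assert((0x100 - 0x3C) == 1 + (0xFF - 0x3C), "the identity above, with n = 8");
static_assert((0xFF - 0x3C) == (~0x3C & 0xFF), "all-ones minus x is just ~x in the low 8 bits");
int main() {}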
Two's complement code for a byte represented as an array of binary digits:
#include <iostream>
using std::cout;
using std::endl;

int main() {
    int byte[] = {1, 0, 1, 1, 1, 1, 1, 1};
    if (byte[0] != 0) {
        // invert every bit...
        for (int i = 0; i < 8; i++) {
            if (byte[i] == 1)
                byte[i] = 0;
            else
                byte[i] = 1;
        }
        // ...then add one, letting the carry ripple up from the last bit
        for (int j = 7; j >= 0; j--) {
            if (byte[j] == 0) {
                byte[j] = 1;
                break;
            }
            else {
                byte[j] = 0;
            }
        }
    }
    for (int i = 0; i < 8; i++)
        cout << byte[i];
    cout << endl;
}
Consider a typical absolute value function (where for the sake of argument the integral type of maximum size is long):
unsigned long abs(long input);
A naive implementation of this might look something like:
unsigned long abs(long input)
{
    if (input >= 0)
    {
        // input is positive
        // We know this is safe, because the maximum positive signed
        // integer is always less than the maximum positive unsigned one
        return static_cast<unsigned long>(input);
    }
    else
    {
        return static_cast<unsigned long>(-input); // ut oh...
    }
}
This code triggers undefined behavior, because the negation of input may overflow, and triggering signed integer overflow is undefined behavior. For instance, on 2s complement machines, the absolute value of std::numeric_limits<long>::min() will be 1 greater than std::numeric_limits<long>::max().
What can a library author do to work around this problem?
One can cast to the unsigned variant first to avoid any undefined behavior:
unsigned long uabs(long input)
{
    if (input >= 0)
    {
        // input is positive
        return static_cast<unsigned long>(input);
    }
    else
    {
        return -static_cast<unsigned long>(input); // read on...
    }
}
In the above code, we invoke two well defined operations. Converting the signed integer to the unsigned one is well defined by N3485 4.7 [conv.integral]/2:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]
This basically says that when making the specific conversion of going from signed to unsigned, one can assume unsigned-style wraparound.
The negation of the unsigned integer is well defined by 5.3.1 [expr.unary.op]/8:
The negative of an unsigned quantity is computed by subtracting its value from 2^n , where n is the number of bits in the promoted operand.
These two requirements effectively force implementations to operate like a 2s complement machine would, even if the underlying machine is a 1s complement or signed magnitude machine.
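Both rules can be checked directly; a minimal sketch (these asserts hold on any conforming implementation, whatever the machine's signed representation):

#include <climits>

// Rule 1: conversion of a signed value to an unsigned type is modulo 2^n.
static_assert(static_cast<unsigned long>(-1L) == ULONG_MAX,
              "conversion to unsigned wraps modulo 2^n");

// Rule 2: unary minus on an unsigned value subtracts it from 2^n.
static_assert(-1UL == ULONG_MAX,
              "unsigned negation subtracts from 2^n");

int main() {}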
A generalized version that returns the unsigned counterpart of an integral type (the local variable in the constexpr body needs C++14; drop constexpr if you are limited to C++11):
#include <type_traits>

template <typename T>
constexpr typename std::make_unsigned<T>::type uabs(T x)
{
    typename std::make_unsigned<T>::type ux = x;
    return (x < 0) ? -ux : ux; // compare signed x, negate unsigned x
}
This compiles on the Godbolt compiler explorer, with a test case showing that gcc -O3 -fsanitize=undefined finds no UB in uabs(std::numeric_limits<long>::min()); after constant-propagation, but does in std::abs().
Further template stuff should be possible to make a version that would return the unsigned version of integral types, but return T for floating-point types, if you want a general-purpose replacement for std::abs.
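A quick usage sketch (reproducing the template so it compiles standalone; the concrete value in the second assert assumes a two's-complement long, i.e. LONG_MIN == -LONG_MAX - 1):

#include <cassert>
#include <climits>
#include <type_traits>

template <typename T>
constexpr typename std::make_unsigned<T>::type uabs(T x)
{
    typename std::make_unsigned<T>::type ux = x;
    return (x < 0) ? -ux : ux;
}

int main() {
    assert(uabs(-5) == 5u);
    // The case that is UB with the naive abs(): here the negation happens on
    // the already-converted unsigned value, so it is well defined.
    assert(uabs(LONG_MIN) == static_cast<unsigned long>(LONG_MAX) + 1UL);
}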
Just add one if negative.
unsigned long absolute_value(long x) {
    if (x >= 0) return (unsigned long)x;
    x = -(x + 1);
    return (unsigned long)x + 1;
}
I have a sample function as below:
int get_hash(unsigned char* str)
{
    int hash = (str[3]^str[4]^str[5]) % MAX;
    int hashVal = arr[hash];
    return hashVal;
}
Here the array arr has size MAX (int arr[MAX]).
My static code checker complains that there can be an out-of-bounds array access here, as hash could be in the range -255 to -1.
Is this correct? Can bitwise operation on unsigned char produce a negative number? Should hash be declared as unsigned int?
Is this correct?
No, the static code checker is in error (1).
Can bitwise operation on unsigned char produce a negative number?
Some bitwise operations can - bitwise complement, for example - but not the exclusive or.
For the ^, the arguments, unsigned char here, are subject to the usual arithmetic conversions (6.3.1.8), they are first promoted according to the integer promotions; about those, clause 6.3.1.1, paragraph 2 says
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
So, there are two possibilities:
An int can represent all possible values of unsigned char. Then all values obtained from the integer promotions are non-negative, the bitwise exclusive or of these values is also non-negative, and the remainder modulo MAX too. The value of hash is then in the range from 0 (inclusive) to MAX (exclusive) [-MAX if MAX < 0].
An int cannot represent all possible values of unsigned char. Then the values are promoted to type unsigned int, and the bitwise operations are carried out in that type. The result is of course non-negative, and the remainder modulo MAX will be non-negative too. However, in that case, the assignment to int hash might convert an out-of-range value to a negative value [the conversion of out-of-range integers to a signed integer type is implementation-defined]. (1) But in that case, the range of possible negative values is greater than -255 to -1, so even in that - very unlikely - case, the static code checker is wrong in part.
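A minimal compile-time check of the first (and by far the most common) case, where int can represent every unsigned char value:

#include <type_traits>

static_assert(std::is_same<decltype(static_cast<unsigned char>(0xFF)
                                    ^ static_cast<unsigned char>(0x0F)),
                           int>::value,
              "both operands are promoted to int before ^ is applied");
static_assert((static_cast<unsigned char>(0xFF) ^ static_cast<unsigned char>(0x0F)) == 0xF0,
              "and the result is non-negative");

int main() {}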
Should hash be declared as unsigned int?
That depends on the value of MAX. If there is the slightest possibility that a remainder modulo MAX is out-of-range for int, then that would be safer. Otherwise, int is equally safe.
As remarked correctly by gx_, the arithmetic is done in int. Just declare your hash variable as unsigned char, again, to be sure that everybody knows that you expect this to be positive in all cases.
And if MAX is effectively UCHAR_MAX you should just use that to improve readability.