Bitwise operation on unsigned char - c++

I have a sample function as below:
int get_hash (unsigned char* str)
{
    int hash = (str[3] ^ str[4] ^ str[5]) % MAX;
    int hashVal = arr[hash];
    return hashVal;
}
Here the array arr has size MAX (int arr[MAX]).
My static code checker complains that there can be an out-of-bounds array access here, as hash could be in the range -255 to -1.
Is this correct? Can a bitwise operation on unsigned char produce a negative number? Should hash be declared as unsigned int?

Is this correct?
No, the static code checker is in error (with one theoretical caveat, noted below).
Can bitwise operation on unsigned char produce a negative number?
Some bitwise operations can - bitwise complement, for example - but not the exclusive or.
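To illustrate, a minimal sketch (assuming a typical platform where int is wider than unsigned char):

#include <cstdio>

int main()
{
    unsigned char c = 0;
    // c is promoted to int before the operator is applied, so ~c is
    // ~0 == -1, a negative int, while c ^ c stays non-negative.
    printf("%d\n", ~c);     // prints -1
    printf("%d\n", c ^ c);  // prints 0
}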
For ^, the arguments (unsigned char here) are subject to the usual arithmetic conversions (6.3.1.8): they are first promoted according to the integer promotions. About those, clause 6.3.1.1, paragraph 2 says:
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
So, there are two possibilities:
An int can represent all possible values of unsigned char. Then all values obtained from the integer promotions are non-negative, the bitwise exclusive or of these values is also non-negative, and the remainder modulo MAX too. The value of hash is then in the range from 0 (inclusive) to MAX (exclusive) [-MAX if MAX < 0].
An int cannot represent all possible values of unsigned char. Then the values are promoted to type unsigned int, and the bitwise operations are carried out in that type. The result is of course non-negative, and the remainder modulo MAX will be non-negative too. However, in that case, the assignment to int hash might convert an out-of-range value to a negative value [the conversion of out-of-range integers to a signed integer type is implementation-defined]. This is the caveat mentioned above: even then, the range of possible negative values is greater than -255 to -1, so in that - very unlikely - case the static code checker is still wrong in part.
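For the common first case, a minimal sketch; the MAX value here is an arbitrary stand-in for the one in the question:

#include <cstdio>

#define MAX 7

int main()
{
    unsigned char a = 0xF0, b = 0x0F, c = 0xFF;
    // Each operand is promoted to a non-negative int, so the XOR and the
    // remainder are non-negative too: hash is in [0, MAX).
    int hash = (a ^ b ^ c) % MAX;  // (0xF0 ^ 0x0F ^ 0xFF) == 0
    printf("%d\n", hash);          // prints 0
}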
Should hash be declared as unsigned int?
That depends on the value of MAX. If there is the slightest possibility that a remainder modulo MAX is out-of-range for int, then that would be safer. Otherwise, int is equally safe.

As gx_ correctly remarked, the arithmetic is done in int. Just declare your hash variable as unsigned char again, to be sure that everybody knows you expect this to be positive in all cases.
And if MAX is effectively UCHAR_MAX, you should just use that to improve readability.

Related

C++ cannot calculate a formula with a vector's size in it?

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    if (0 < v.size() - 1) {
        printf("true");
    } else {
        printf("false");
    }
}
It prints true, which indicates 0 < -1.
std::vector::size() returns an unsigned integer. If it is 0 and you subtract 1, it underflows and becomes a huge value (specifically std::numeric_limits<std::vector<int>::size_type>::max()). The comparison works fine, but the subtraction produces a value you did not expect.
For more about unsigned underflow (and overflow), see: C++ underflow and overflow
The simplest fix for your code is probably if (1 < v.size()).
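A sketch of the snippet with that fix applied:

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    // Compare sizes directly; v.size() - 1 would wrap around to a huge
    // unsigned value whenever the vector is empty.
    if (1 < v.size()) {
        printf("true");
    } else {
        printf("false");  // printed here: an empty vector has size 0
    }
}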
v.size() returns a result of size_t, which is an unsigned type. An unsigned value minus 1 is still unsigned. And all non-zero unsigned values are greater than zero.
std::vector<int>::size() returns type size_t, which is an unsigned type whose rank is usually at least that of int.
When a math operation combines a signed type with an unsigned type and the unsigned type doesn't have a lower rank, the signed type gets converted to the unsigned type (see 6.3.1.8 Usual arithmetic conversions; I'm linking to the C standard, but the rules for integer arithmetic are foundational and need to be common to both languages).
In other words, assuming that size_t isn't unsigned char or unsigned short
(it's usually unsigned long and the C standard recommends it shouldn't be unsigned long long unless necessary)
(size_t)0 - 1
gets implicitly translated to
(size_t)0 - (size_t)1
which is a positive number equal to SIZE_MAX (-1 cannot be represented in an unsigned type, so it gets converted by formally "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type" (6.3.1.3p2)).
0 is always less than SIZE_MAX.
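A two-line demonstration of that wraparound; SIZE_MAX comes from <cstdint>:

#include <cstdint>
#include <cstdio>

int main() {
    size_t zero = 0;
    // 0 - 1 is reduced modulo SIZE_MAX + 1, producing SIZE_MAX itself.
    printf("%d\n", zero - 1 == SIZE_MAX);  // prints 1
}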

c++ safeness of code with implicit conversion between signed and unsigned

According to the rules on implicit conversions between signed and unsigned integer types, discussed here and here, when summing an unsigned int with an int, the signed int is first converted to an unsigned int.
Consider, e.g., the following minimal program
#include <iostream>

int main()
{
    unsigned int n = 2;
    int x = -1;
    std::cout << n + x << std::endl;
    return 0;
}
The output of the program is, nevertheless, 1 as expected: x is converted first to an unsigned int, and the sum with n leads to an integer overflow, giving the "right" answer.
In a code like the previous one, if I know for sure that n + x is positive, can I assume that the sum of unsigned int n and int x gives the expected value?
In a code like the previous one, if I know for sure that n + x is positive, can I assume that the sum of unsigned int n and int x gives the expected value?
Yes.
First, the signed value is converted to unsigned, using modulo arithmetic:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type).
Then two unsigned values will be added using modulo arithmetic:
Unsigned integers shall obey the laws of arithmetic modulo 2^n, where n is the number of bits in the value representation of that particular size of integer.
This means that you'll get the expected answer.
Even if the result would be negative in the mathematical sense, the result in C++ will be a number that is modulo-equal to that negative number.
Note that I've supposed here that you add two same-sized integers.
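For illustration, a sketch of the case where the mathematical result would be negative, assuming a 32-bit unsigned int:

#include <iostream>

int main() {
    unsigned int n = 1;
    int x = -2;
    // Mathematically n + x is -1; the unsigned result is congruent to -1
    // modulo 2^32, i.e. 4294967295 on a 32-bit unsigned int.
    std::cout << n + x << std::endl;
}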
I think you can be sure, and it is not implementation-defined, although this statement requires some interpretation of the standard when it comes to systems that do not use two's complement for representing negative values.
First, let's state the things that are clear: unsigned integrals do not overflow but wrap around modulo 2^nrOfBits (cf. this online C++ standard draft):
6.7.1 Fundamental types
(7) Unsigned integers shall obey the laws of arithmetic modulo 2^n, where n is the number of bits in the value representation of that particular size of integer.
So it's just a matter of whether a negative value nv is converted correctly into an unsigned integral bit pattern nv(conv) such that x + nv(conv) is always congruent to the mathematical x + nv modulo 2^n. For the case of a system using two's complement, things are clear, since two's complement is actually designed such that this arithmetic works immediately.
For systems using other representations of negative values, we'll have to read the standard carefully:
7.8 Integral conversions
(2) If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). - end note]
As the footnote explicitly says that in a two's complement representation there is no change in the bit pattern, we may assume that on systems other than two's complement a real conversion takes place, such that x + nv(conv) is again congruent to x + nv modulo 2^n.
So due to 7.8 (2), I'd say that your assumption is valid.
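A small sketch of that conversion rule; the values are arbitrary:

#include <iostream>
#include <limits>

int main() {
    int nv = -5;
    // The least unsigned integer congruent to -5 (mod 2^n) is
    // UINT_MAX + 1 - 5 == UINT_MAX - 4, whatever the signed representation.
    unsigned int conv = static_cast<unsigned int>(nv);
    std::cout << (conv == std::numeric_limits<unsigned int>::max() - 4)
              << std::endl;  // prints 1
}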

Longest string in an array of strings in C++

I was trying to get the maximum length of a string from an array of strings.
I wrote this code:
#include <cstring>
#include <iostream>
using namespace std;

int main() {
    char a[100][100] = {"solol", "a", "1234567", "123", "1234"};
    int max = -1;
    for (int i = 0; i < 5; i++)
        if (max < strlen(a[i]))
            max = strlen(a[i]);
    cout << max;
}
The output it gives is -1.
But when I initialize max with 0 instead of -1, the code works fine. Why is that?
The function strlen returns an unsigned integral type (i.e. size_t), so when you compare it to max, which is signed, max gets converted to unsigned, and that -1 turns into something like 0xffffffff (the actual value depends on your architecture/compiler), which is larger than the length of any of those strings.
Change max type to be size_t and never compare unsigned with signed integrals.
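A sketch of the loop with that change applied:

#include <cstring>
#include <iostream>

int main() {
    char a[100][100] = {"solol", "a", "1234567", "123", "1234"};
    size_t max = 0;               // unsigned, matching strlen's return type
    for (int i = 0; i < 5; i++)
        if (max < strlen(a[i]))  // unsigned vs. unsigned: no surprises
            max = strlen(a[i]);
    std::cout << max;             // prints 7
}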
size_t is an unsigned integer type.
From C++ Standard#4.6:
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (4.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
The value of max is -1, which, converted to an unsigned type, represents the maximum value that type can hold. In the expression max < strlen(a[i]), max is converted to the unsigned type and compares as that maximum value, so the condition max < strlen(a[i]) will never be true for the given input, and the value of max will not change.
The longest string in the array of strings a has length 7. When you initialize max with 0 or 1, the expression max < strlen(a[i]) evaluates to true for any string whose length is greater than the value of max. Hence you get the expected output.
strlen returns an unsigned type (size_t), and in the comparison -1 gets converted to the maximum value that type can hold.
To fix the code, change the definition of max to unsigned int max = 0.
Implicit type conversion (coercion)
When evaluating expressions, the compiler breaks each expression down into individual subexpressions. The arithmetic operators require their operands to be of the same type. To ensure this, the compiler uses the following rules:
If an operand is an integer that is narrower than an int, it undergoes integral promotion (as described above) to int or unsigned int.
If the operands still do not match, then the compiler finds the highest priority operand and implicitly converts the other operand to match.
The priority of operands is as follows:
long double (highest)
double
float
unsigned long long
long long
unsigned long
long
unsigned int
int (lowest)
In your case one of the operands has type int and the other type size_t, so max was converted to type size_t, and the bit pattern of -1 is the biggest possible size_t value.
You might also want to look at the following parts of the Standard: [conv.prom], [conv.integral], [conv.rank].
I think this might work for you.
#include <algorithm>
#include <cstring>

const size_t ARRAYSIZE = 100;
char stringArray[ARRAYSIZE][100] = {"solol", "a", "1234567", "123", "1234"};

int main() {
    auto longerString = std::max_element(stringArray, stringArray + ARRAYSIZE,
                                         [](const auto& s1, const auto& s2) {
                                             return strlen(s1) < strlen(s2);
                                         });
    size_t maxSize = strlen(longerString[0]);
}
This works for me, returning the longest char array; just do a strlen over the "first element" and you're done. For your example, it returns 7.
Hope it helps.

C++ function with unsigned int parameter gets strange result when called with a negative value

I am new to C++, and I am confused by C++'s behavior in the code below:
#include <iostream>

void hello(unsigned int x, unsigned int y) {
    std::cout << x << std::endl;
    std::cout << y << std::endl;
    std::cout << x + y << std::endl;
}

int main() {
    int a = -1;
    int b = 3;
    hello(a, b);
    return 1;
}
The x in the output is a very large integer: 4294967295. I know that a negative integer converted to unsigned will behave like this. But why is the x + y in the output 2?
Contrary to the other answers, there is no undefined behavior here, and there is no overflow. Unsigned integers use modulo 2^n arithmetic.
Section 4.7 paragraph 2 of the standard says "If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type)." This dictates that -1 is equal to the largest possible unsigned int (modulo 2^n).
Section 3.9.1 paragraph 4 says "Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer." To make it clear what this means, the footnote to this clause says "This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type."
In other words, converting -1 to 4294967295 is not just defined behavior, it is required behavior (assuming 32 bit integers). Similarly, adding 3 to that value and yielding 2 as a result is also required behavior. In this case, the value of n is irrelevant. The third value printed by hello() must be 2 or the implementation is not compliant with the standard.
Because unsigned ints wrap around. In other words, a = -1 (signed) converts to the maximum value of an unsigned int, 4294967295.
Then you add 3; the unsigned int wraps past its maximum and starts again at 0, so -1 + 3 = 2.
Passing a negative number to an unsigned int parameter gives you undefined behavior. The default int is signed; it has a range of -2,147,483,648 to 2,147,483,647. The unsigned int range is 0 to 4,294,967,295.
This isn't so much having to do with C++ but with how computers represent signed and unsigned numbers.
This is a good source on that. Basically, signed numbers are (usually) represented using two's complement, in which the most significant bit has a value of -2^(n-1). In effect, what this means is that the positive numbers are represented the same in two's complement as they are in regular unsigned binary.
-1 is represented as all ones, which when interpreted as an unsigned integer will be the largest integer that can be represented (4294967295, when dealing with 32 bits).
One of the great things about using two's complement to represent signed numbers is that you can perform addition and subtraction in exactly the same way as with unsigned numbers, and it will work out correctly, so long as the result does not exceed the bounds that can be represented. This isn't as easy with other forms such as signed-magnitude.
So, because the result of -1 + 3 is 2, and because 2 is positive, it is interpreted the same as if it were unsigned. Thus, it prints 2.
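A sketch that makes the wraparound explicit, assuming a 32-bit unsigned int:

#include <cstdio>

int main() {
    unsigned int x = static_cast<unsigned int>(-1);  // 4294967295 on 32 bits
    unsigned int y = 3;
    // 4294967295 + 3 is reduced modulo 2^32, giving 2, the same bit pattern
    // that two's-complement signed addition of -1 + 3 would produce.
    printf("%u\n", x + y);  // prints 2
}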

Is it safe to shift 1-based numbering to 0-based numbering by subtracting 1 if unsigned integers are used?

In a system I am maintaining, users request elements from a collection from a 1-based indexing scheme. Values are stored in 0-based arrays in C++ / C.
Is the following hypothetical code portable if 0 is erroneously entered as the input to this function? Is there a better way to validate the user's input when converting 1-based numbering schemes to 0-based?
const unsigned int arraySize;
SomeType array[arraySize];

SomeType GetFromArray( unsigned int oneBasedIndex )
{
    unsigned int zeroBasedIndex = oneBasedIndex - 1;
    // Intent is to check for a valid index.
    if( zeroBasedIndex < arraySize )
    {
        return array[zeroBasedIndex];
    }
    // else... handle the error
}
My assumption is that (unsigned int)( 0 - 1 ) is always greater than arraySize; is this true?
The alternative, as some have suggested in their answers below, is to check oneBasedIndex and ensure that it is greater than 0:
const unsigned int arraySize;
SomeType array[arraySize];

SomeType GetFromArray( unsigned int oneBasedIndex )
{
    if( oneBasedIndex > 0 && oneBasedIndex <= arraySize )
    {
        return array[oneBasedIndex - 1];
    }
    // else... handle the error
}
For unsigned types, 0-1 is the maximum value for that type, so it's always >= arraySize. In other words, yes that is absolutely safe.
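A quick sketch confirming the check; the helper name isValidOneBased and the arraySize value are made up for illustration:

#include <iostream>

const unsigned int arraySize = 5;

// Mirrors the question's bounds check: 0 wraps to UINT_MAX and is rejected.
bool isValidOneBased(unsigned int oneBasedIndex) {
    unsigned int zeroBasedIndex = oneBasedIndex - 1;
    return zeroBasedIndex < arraySize;
}

int main() {
    std::cout << isValidOneBased(0)   // 0: wraps to UINT_MAX, >= arraySize
              << isValidOneBased(1)   // 1: maps to index 0
              << isValidOneBased(5)   // 1: maps to index 4
              << isValidOneBased(6);  // 0: out of range
}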
Unsigned integers never overflow in C++ and in C.
For C++ language:
(C++11, 3.9.1p4) "Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer. 46)"
and footnote 46):
"46) This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type."
and for C language:
(C11, 6.2.5p9) "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."
Chances are good that it's safe, but for many purposes, there's a much simpler way: allocate one extra spot in your array, and just ignore element 0.
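A sketch of that approach, reusing the question's SomeType placeholder and an arbitrary arraySize:

// Size the array one larger and ignore element 0, so the user's 1-based
// index can be used directly.
const unsigned int arraySize = 10;
SomeType array[arraySize + 1];  // element 0 is deliberately unused

SomeType GetFromArray( unsigned int oneBasedIndex )
{
    // The >= 1 test still matters: 0 would otherwise hit the unused slot.
    if( oneBasedIndex >= 1 && oneBasedIndex <= arraySize )
    {
        return array[oneBasedIndex];  // no index shift needed
    }
    // else... handle the error
}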
No: for example, 0 - 1 is 0xffffffff for a 4-byte unsigned int, so what if your array really is that big?
On a 32-bit platform you are OK because an array that large exceeds the address-space limit, but the code can break when compiled for 64-bit if the array really is that big.
Just check for oneBasedIndex > 0
Make sure oneBasedIndex is greater than zero...