Is a forced integer buffer overflow legitimate? - c++

I want to implement a handle type like in this example.
(Long story short: the structure Handle holds an index member into an array of elements. Its other member, count, validates that the index is up to date by comparing it against the matching entry in the data's countArray. count and countArray have a fixed size given by a type/bit-field (u32 : 20 bits).)
To avoid being restricted to the 20 bits of the generation/counter size, the following came to mind: why not let the unsigned char count/countArray overflow on purpose?
I could also do the same with the modulo method ( counter = ++counter % 0xff ), but that is an additional operation.
So let the count grow up to 0xff, and the overflow will set it back to 0 when 0xff + 1 happens.
Is this legitimate?
Here is my pseudo implementation (C++):
struct Handle
{
    unsigned short index;
    unsigned char count;
};
struct myData
{
    unsigned short curIndex;
    int* dataArray;
    unsigned char* countArray;
    Handle create()
    {
        // check if index not already used
        // create object at dataArray[handle.index]
        Handle handle;
        handle.index = curIndex;
        handle.count = countArray[curIndex];
        return handle;
    }
    void destroy( const Handle& handle )
    {
        // delete object at dataArray[handle.index]
        countArray[handle.index]++; // <-- overflow here?
    }
    bool isValid( const Handle& handle ) const
    {
        return handle.count == countArray[handle.index];
    }
};
EDIT #1: Yes, these integral types should all be unsigned (as indexes are)

As long as you're not using signed types, you're safe.
Technically, unsigned types don't overflow:
3.9.1 Fundamental types [basic.fundamental]
46) This implies that unsigned arithmetic does not overflow because a
result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting unsigned integer type.
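As a minimal sketch of the counter from the question (the helper names `next_generation` and `check_wraparound` are hypothetical, not from the original post), the wrap-around can be written out explicitly. The addition is done in `unsigned int`, and the conversion back to an 8-bit unsigned type reduces the value modulo 256:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: advance an 8-bit generation counter.
// count + 1u is evaluated in unsigned int; converting the result back to
// std::uint8_t reduces it modulo 256, so 255 becomes 0.
inline std::uint8_t next_generation(std::uint8_t count) {
    return static_cast<std::uint8_t>(count + 1u);
}

// Small self-check of the wrap-around behaviour.
inline void check_wraparound() {
    assert(next_generation(254) == 255);
    assert(next_generation(255) == 0);
}
```

One consequence of wrapping is that the counter repeats after 256 increments of the same slot, so a handle that is 256 generations old would compare as valid again.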

Related

Xor encryption in C++ with warning "Use of a signed integer operand with a binary bitwise operator"

I'm learning the simple XOR encryption algorithm in c++.
The next code works fine:
void test(int8_t* data, const int data_length) {
    const uint8_t key = 123;
    for (int index = 0; index < data_length; index++)
        data[index] = data[index] ^ key;
}
The data that I am given is signed, therefore has a type of int8_t.
The problem is that the compiler shows the following warning:
"Use of a signed integer operand with a binary bitwise operator"
I can make the warning go away by casting data to uint8_t when performing the XOR operation, but I don't know the implications. I've done some tests and it doesn't seem to be a problem, but I am confused because data can contain signed values, so I am not sure whether casting it corrupts the data.
Is it correct to cast to uint8_t even if data can contain negative values? Or should I ignore the warning?
The compiler gives the warning because bitwise operations on signed integers are discouraged. Before C++20, C++ allowed different representations of signed integers, meaning that the same number could be represented by different bit patterns on different machines and compilers. This makes the result of bit manipulations on signed integers non-portable. Granted, the intN_t types were always required to use two's complement representation (and C++20 extended that requirement to all signed integers), but it is still not recommended to use signed integers for bitwise operations.
In your particular case, both data[index] and key get promoted to int to perform the XOR operation. However, since data[index] is a signed integer, its value gets sign-extended, while the unsigned key gets zero-extended. The XOR therefore changes only the low 8 bits of the intermediate int values, and the result may not fit in the int8_t range. When you assign the result back to data[index], an out-of-range value has to be converted to int8_t; the result of that conversion is implementation-defined prior to C++20 (since C++20 it is well defined to truncate the upper bits).
The correct thing to do in this case is to treat your data as an array of raw bytes, regardless of what values these bytes represent. This means, you should be using std::byte or std::uint8_t to represent input and output data. This way you will be operating on unsigned integers and have no portability or potential overflow issues.
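A minimal sketch of the raw-bytes approach, assuming the buffer can be declared as std::uint8_t in the first place (the function name `xor_bytes` and its parameters are illustrative, not from the original code). Both operands are non-negative after promotion, so the result always fits back into a byte:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// XOR every byte of an unsigned buffer with a single-byte key.
// Both operands are promoted to int, but neither is negative, so the
// result is in the range 0..255 and converts back without loss.
inline void xor_bytes(std::uint8_t* data, std::size_t length, std::uint8_t key) {
    assert(data != nullptr || length == 0);
    for (std::size_t i = 0; i < length; ++i) {
        data[i] = static_cast<std::uint8_t>(data[i] ^ key);
    }
}
```

Applying the function twice with the same key restores the original bytes, which is the property a simple XOR cipher relies on.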
With C++20, you should use a bitwise copy:
#include <bit>
#include <cstdint>
void test(int8_t* data, const int data_length) {
    const uint8_t key = 123;
    for (int index = 0; index < data_length; index++) {
        // std::byte has no operator^ taking a uint8_t, so do the XOR in uint8_t
        auto const encrypted = static_cast<uint8_t>(std::bit_cast<uint8_t>(data[index]) ^ key);
        data[index] = std::bit_cast<int8_t>(encrypted);
    }
}
For previous versions, you should static_cast the signed type to the corresponding unsigned type:
void test(int8_t* data, const int data_length) {
    const uint8_t key = 123;
    for (int index = 0; index < data_length; index++) {
        // std::byte has no operator^ taking a uint8_t, so do the XOR in uint8_t
        auto const encrypted = static_cast<uint8_t>(static_cast<uint8_t>(data[index]) ^ key);
        data[index] = static_cast<int8_t>(encrypted);
    }
}

Negative size_t

Is it well-specified (for unsigned types in general), that:
static_assert(-std::size_t{1} == ~std::size_t{0}, "!");
I just looked into libstdc++'s std::align implementation and noticed that it negates a std::size_t:
inline void*
align(size_t __align, size_t __size, void*& __ptr, size_t& __space) noexcept
{
    const auto __intptr = reinterpret_cast<uintptr_t>(__ptr);
    const auto __aligned = (__intptr - 1u + __align) & -__align;
    const auto __diff = __aligned - __intptr;
    if ((__size + __diff) > __space)
        return nullptr;
    else
    {
        __space -= __diff;
        return __ptr = reinterpret_cast<void*>(__aligned);
    }
}
Unsigned integer types are defined to wrap around, and the highest possible value representable in an unsigned integer type is the number with all bits set to one - so yes.
As cpp-reference states it (arithmetic operators / overflow):
Unsigned integer arithmetic is always performed modulo 2^n where n is
the number of bits in that particular integer. E.g. for unsigned int,
adding one to UINT_MAX gives 0, and subtracting one from 0 gives
UINT_MAX.
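To illustrate why the negation is useful as a mask, here is a small sketch of the rounding step on its own (the helper name `align_up` is hypothetical, and it assumes `align` is a power of two and that the integer types involved are at least as wide as unsigned int):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Holds when std::size_t is at least as wide as unsigned int.
static_assert(-std::size_t{1} == ~std::size_t{0}, "unsigned negation wraps");

// Hypothetical helper: round value up to the next multiple of align.
// For a power-of-two align, -align (computed modulo 2^N) has every bit set
// except the low alignment bits, so it can be used as a mask.
inline std::uintptr_t align_up(std::uintptr_t value, std::uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);  // power of two only
    return (value - 1u + align) & -align;
}
```

For example, with an alignment of 8 the mask `-align` clears the three lowest bits, so 13 rounds up to 16 while 16 stays at 16.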
Related: Is it safe to use negative integers with size_t?
Is it well-specified (for unsigned types in general), that:
static_assert(-std::size_t{1} == ~std::size_t{0}, "!");
No, it is not.
For calculations using unsigned types, the assertion must hold. However, this assertion is not guaranteed to use unsigned types. Unsigned types narrower than int would be promoted to signed int or unsigned int (depending on the types' ranges) before - or ~ is applied. If it is promoted to signed int, and signed int does not use two's complement for representing negative values, the assertion can fail.
libstdc++'s code, as shown, does not perform any arithmetic in any unsigned type narrower than int though. The 1u in __aligned ensures each of the calculations use unsigned int or size_t, whichever is larger. This applies even to the subtraction in __space -= __diff.
Unsigned types at least as wide as unsigned int do not undergo integer promotions, so arithmetic and logical operations on them are applied in their own type, for which Johan Lundberg's answer applies: that's specified to be performed modulo 2^N.
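The promotion rule can be checked at compile time. The following sketch assumes a typical platform where int can represent every unsigned char value; the alias `uchar` and the function `narrow_negation_matches` are names introduced here for illustration:

```cpp
#include <type_traits>

using uchar = unsigned char;  // illustrative alias

// Narrow unsigned operands are promoted before - or ~ is applied,
// so the result type is int rather than unsigned char.
static_assert(std::is_same<decltype(-uchar{1}), int>::value, "promoted to int");
static_assert(std::is_same<decltype(~uchar{0}), int>::value, "promoted to int");

// unsigned int is not promoted; the arithmetic stays modulo 2^N.
static_assert(std::is_same<decltype(-1u), unsigned int>::value, "stays unsigned");
static_assert(-1u == ~0u, "wraps modulo 2^N");

// Converting both int results back to unsigned char compares them modulo 256.
// This holds on two's complement implementations.
constexpr bool narrow_negation_matches() {
    return static_cast<uchar>(-uchar{1}) == static_cast<uchar>(~uchar{0});
}
```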

Is it dangerous to cast int * to unsigned int *

I have a variable of type int *alen. I am trying to pass it to a function:
typedef int(__stdcall *Tfnc)(
unsigned int *alen
);
with casting
(*Tfnc)( (unsigned int *)alen )
Can I expect problems if the value is never negative?
Under the C++ standard, what you are doing is undefined behavior. The memory layout of unsigned and signed ints is not guaranteed to be compatible, as far as I know.
On most platforms (which use 2s complement integers), this will not be a problem.
The remaining issue is strict aliasing, where the compiler is free to presume that pointers to one type and pointers to another type are not pointers to the same thing.
typedef int(__stdcall *Tfnc)(
    unsigned int *alen
);
int test() {
    int x = 3;
    // the lambda has to return int to match Tfnc
    Tfnc pf = [](unsigned int* bob) -> int { *bob = 2; return 0; };
    pf((unsigned int*)&x);
    return x;
}
The above code might be allowed to ignore the modification to x made through the unsigned int*, even on two's complement hardware.
That is the price of undefined behavior.
No, it won't be a problem as long as the int value you pass is not negative.
But if the given value is negative, then the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type).
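One way to sidestep the pointer cast entirely is to pass a temporary of the expected type and copy the result back. This is only a sketch: `fill_length` is a hypothetical stand-in for the real function behind Tfnc, `call_with_int` is an illustrative wrapper, and `__stdcall` is left out so the example stays portable.

```cpp
#include <cassert>

// Hypothetical stand-in for the API that expects unsigned int*.
inline int fill_length(unsigned int* alen) {
    *alen = 42u;
    return 0;
}

// Call the API through a temporary instead of casting the pointer.
inline int call_with_int(int* alen) {
    assert(alen != nullptr && *alen >= 0);   // assumption: value is never negative
    unsigned int tmp = static_cast<unsigned int>(*alen);
    const int rc = fill_length(&tmp);
    *alen = static_cast<int>(tmp);           // assumes the result fits in int
    return rc;
}
```

The copy costs two assignments, but the conversions are ordinary value conversions, so no question about pointer compatibility arises.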

reverse a number's bits

Here is a C++ class for reversing bits from LeetCode discuss. https://leetcode.com/discuss/29324/c-solution-9ms-without-loop-without-calculation
For example, given input 43261596 (represented in binary as 00000010100101000001111010011100), return 964176192 (represented in binary as 00111001011110000010100101000000).
Can anyone explain it? Thank you very much!
class Solution {
public:
    uint32_t reverseBits(uint32_t n) {
        struct bs
        {
            unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
            unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
            unsigned int _08:1; unsigned int _09:1; unsigned int _10:1; unsigned int _11:1;
            unsigned int _12:1; unsigned int _13:1; unsigned int _14:1; unsigned int _15:1;
            unsigned int _16:1; unsigned int _17:1; unsigned int _18:1; unsigned int _19:1;
            unsigned int _20:1; unsigned int _21:1; unsigned int _22:1; unsigned int _23:1;
            unsigned int _24:1; unsigned int _25:1; unsigned int _26:1; unsigned int _27:1;
            unsigned int _28:1; unsigned int _29:1; unsigned int _30:1; unsigned int _31:1;
        } *b = (bs*)&n,
        c =
        {
              b->_31, b->_30, b->_29, b->_28
            , b->_27, b->_26, b->_25, b->_24
            , b->_23, b->_22, b->_21, b->_20
            , b->_19, b->_18, b->_17, b->_16
            , b->_15, b->_14, b->_13, b->_12
            , b->_11, b->_10, b->_09, b->_08
            , b->_07, b->_06, b->_05, b->_04
            , b->_03, b->_02, b->_01, b->_00
        };
        return *(unsigned int *)&c;
    }
};
Consider casting as providing a different layout stencil on memory.
Using this stencil picture, the code is a layout of a stencil of 32-bits on an unsigned integer memory location.
So instead of treating the memory as a uint32_t, it is treating the memory as 32 bits.
A pointer to the 32-bit structure is created.
The pointer is assigned to the same memory location as the uint32_t variable.
The pointer will allow different treatment of the memory location.
A temporary variable, of 32-bits (using the structure), is created.
The variable is initialized using an initialization list.
The bit fields in the initialization list are from the original variable, listed in reverse order.
So, in the list:
new bit 0 <-- old bit 31
new bit 1 <-- old bit 30
The foundation of this approach relies on initialization lists.
The author is letting the compiler reverse the bits.
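For comparison, the same "new bit i comes from old bit 31 - i" mapping can be written with shifts. This is a different technique from the posted solution, shown only as a sketch: it does not depend on how the compiler lays out bit-fields, which is implementation-defined. The function name `reverse_bits_portable` is made up for this example, and the `constexpr` loop needs C++14 or later.

```cpp
#include <cstdint>

// Reverse the 32 bits of n by moving one bit per iteration:
// the lowest remaining bit of n is appended to the low end of result.
constexpr std::uint32_t reverse_bits_portable(std::uint32_t n) {
    std::uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        result = (result << 1) | (n & 1u);
        n >>= 1;
    }
    return result;
}
```

With the input from the question, 43261596, this yields 964176192.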
The solution uses brute force to reverse the bits.
It declares a bit-field structure (that's when the members are followed by :1) with 32 bit-fields of one bit each.
The 32-bit input is then seen as such a structure, by casting the address of the input to a pointer to the structure. Then c is declared as a variable of that type, which is initialized by reversing the order of the bits.
Finally, the bit-field represented by c is reinterpreted as an integer and you're done.
The generated assembly is not very interesting, as the gcc explorer shows:
https://goo.gl/KYHDY6
It doesn't convert per se; it just looks at the same memory address differently. It uses the value of the int n, but gets a pointer to that address and typecasts the pointer, so that you can interpret the number as a struct of 32 individual bits. So through this struct b you have access to the individual bits of the number.
Then each bit of a new struct c is bluntly set by putting bit 31 of the number in bit 0 of the output struct c, bit 30 in bit 1, et cetera.
After that, the value at the memory location of the struct is returned.
First of all, the posted code has a small bug. The line
return *(unsigned int *)&c;
will not return an accurate number if sizeof(unsigned int) is not equal to sizeof(uint32_t).
That line should be
return *(uint32_t*)&c;
Coming to the question of how it works, I will try to explain it with a smaller type, a uint8_t.
The function
uint8_t reverseBits(uint8_t n) {
    struct bs
    {
        unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
        unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
    } *b = (bs*)&n,
    c =
    {
          b->_07, b->_06, b->_05, b->_04
        , b->_03, b->_02, b->_01, b->_00
    };
    return *(uint8_t *)&c;
}
uses a local struct. The local struct is defined as:
struct bs
{
    unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
    unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
};
That struct has eight members. Each member of the struct is a bit-field of width 1. Together the members occupy 8 bits (sizeof(bs) itself is typically rounded up to sizeof(unsigned int), the declared type of the bit-fields).
If you separate the definition of the struct and the variables of that type, the function will be:
uint8_t reverseBits(uint8_t n) {
    struct bs
    {
        unsigned int _00:1; unsigned int _01:1; unsigned int _02:1; unsigned int _03:1;
        unsigned int _04:1; unsigned int _05:1; unsigned int _06:1; unsigned int _07:1;
    };
    bs *b = (bs*)&n;
    bs c =
    {
          b->_07, b->_06, b->_05, b->_04
        , b->_03, b->_02, b->_01, b->_00
    };
    return *(uint8_t *)&c;
}
Now, let's say the input to the function is 0xB7, which is 1011 0111 in binary. The line
bs *b = (bs*)&n;
says:
Take the address of n ( &n )
Treat it like it is a pointer of type bs* ( (bs*)&n )
Assign the pointer to a variable. (bs *b =)
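By doing that, we are able to pick each bit of n and get its value by using the members of b. At the end of that line (reading the binary digits from left to right; the actual order in which bit-fields map to bits is implementation-defined),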
The value of b->_00 is 1
The value of b->_01 is 0
The value of b->_02 is 1
The value of b->_03 is 1
The value of b->_04 is 0
The value of b->_05 is 1
The value of b->_06 is 1
The value of b->_07 is 1
The statement
bs c =
{
      b->_07, b->_06, b->_05, b->_04
    , b->_03, b->_02, b->_01, b->_00
};
simply creates c such that the bits of c are reversed from the bits of *b.
The line
return *(uint8_t *)&c;
says:
Take the address of c, whose value is the bit pattern 1110 1101.
Treat it like it is a pointer of type uint8_t*.
Dereference the pointer and return the resulting uint8_t
That returns a uint8_t whose value is bitwise reversed from the input argument.
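The 8-bit walk-through can be cross-checked with std::bitset, which indexes bits by position instead of relying on bit-field layout. This is a sketch with made-up names (`reverse_bits8`, `check_worked_example`), not the code from the answer:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reverse the 8 bits of n: bit i of the input becomes bit 7 - i of the output.
inline std::uint8_t reverse_bits8(std::uint8_t n) {
    const std::bitset<8> in(n);
    std::bitset<8> out;
    for (std::size_t i = 0; i < 8; ++i) {
        out[7 - i] = in[i];
    }
    return static_cast<std::uint8_t>(out.to_ulong());
}

// The worked example: 1011 0111 (0xB7) reversed is 1110 1101 (0xED).
inline void check_worked_example() {
    assert(reverse_bits8(0xB7) == 0xED);
}
```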
This isn't exactly obfuscated but a comment or two would assist the innocent. The key is in the middle of the variable declarations, and the first step is to recognize that there is only one line of 'code' here, everything else is variable declarations and initialization.
Between declaration and initialization we find:
} *b = (bs*)&n,
c =
{
This declares a variable 'b' which is a pointer (*) to the struct "bs" just defined. It then casts the address of the function argument 'n', a uint32_t, to the type pointer-to-bs, and assigns it to 'b', effectively creating a union of uint32_t and the bit array bs.
A second variable, an actual struct bs, named "c", is then declared, and it is initialized through the pointer 'b'. b->_31 initializes c._00, and so on.
So after "b" and "c" are created, in that order, there's nothing left to do but return the value of "c".
The author of the code, and the compiler, know that after a struct definition ends, variables of that type or related to that type can be created before the ";", and that's why Thomas Matthews closes with, "The author is letting the compiler reverse the bits."

converting from size_t to unsigned int

Is it possible that converting from size_t to unsigned int results in overflow?
size_t x = foo ( ) ; // foo ( ) returns a value in type size_t
unsigned int ux = (unsigned int ) x ;
ux == x // Is result of that line always 1 ?
language : c++
platform : any
Yes it's possible, size_t and int don't necessarily have the same size. It's actually very common to have 64bit size_ts and 32bit ints.
C++11 draft N3290 says this in §18.2/6:
The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
unsigned int on the other hand is only required to be able to store values from 0 to UINT_MAX (defined in <climits> and following the C standard header <limits.h>) which is only guaranteed to be at least 65535 (2^16 - 1).
Yes, overflow can occur on some platforms. For example, size_t can be defined as unsigned long, which can easily be bigger than unsigned int.
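If the value has to go into an unsigned int anyway, a range check before the conversion makes the lossy case explicit. A minimal sketch (the function name `fits_in_unsigned_int` is an assumption for illustration):

```cpp
#include <cstddef>
#include <limits>

// True if x converts to unsigned int without losing information.
// On platforms where size_t is no wider than unsigned int this is always true.
constexpr bool fits_in_unsigned_int(std::size_t x) {
    return x <= std::numeric_limits<unsigned int>::max();
}
```

When the check fails, the conversion itself is still well defined: the value is reduced modulo 2^N for an N-bit unsigned int, so `ux == x` would simply be false.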