Is pointer arithmetic in iterations overflow-safe? - c++

I've seen very often array iterations using plain pointer arithmetic even in newer C++ code. I wonder how safe they really are and if it's a good idea to use them. Consider this snippet (it compiles also in C if you put calloc in place of new):
int8_t *buffer = new int8_t[16];
for (int8_t *p = buffer; p < buffer + 16; p++) {
...
}
Wouldn't this kind of iteration result in an overflow and the loop being skipped completely when buffer happens to become allocated at address 0xFFFFFFF0 (in a 32 bit address space) or 0xFFFFFFFFFFFFFFF0 (64 bit)?
As far as I know, this would be an exceptionally unlucky, but still possible circumstance.

This is safe. The C and C++ standards explicitly allow you to calculate a pointer value that points one item beyond the end of an array, and to compare a pointer that points within the array to that value.
An implementation that had an overflow problem in the situation you describe would simply not be allowed to place an array right at the end of memory like that.
In practice, a more likely problem is buffer + 16 comparing equal to NULL, but this is not allowed either and again a conforming implementation would need to leave an empty place following the end of the array.

Related

C++/Address Space: 2 Bytes per adress?

I was just trying something and i was wondering how this could be. I have the following Code:
int var1 = 132;
int var2 = 200;
int *secondvariable = &var2;
cout << *(secondvariable+2) << endl << sizeof(int) << endl;
I get the Output
132
4
So how is it possible that the second int is only 2 addresses higher? I mean shouldn't it be 4 addresses? I'm currently under WIN10 x64.
Regards
With cout << *(secondvariable+2) you don't print a pointer, you print the value at secondvariable[2], which is an invalid indexing and lead to undefined behavior.
If you want to print a pointer then drop the dereference and print secondvariable+2.
While you already are far in the field of undefined behaviour (see Some programmer dude's answer) due to indexing an array out of bounds (a single variable is considered an array of length 1 for such matters), some technical background:
Alignment! Compilers are allowed to place variables at addresses such that they can be accessed most efficiently. As you seem to have gotten valid output by adding 2*sizeof(int) to the second variable's address, you apparently have reached the first one by accident. Apparently, the compiler decided to leave a gap in between the two variables so that both can be aligned to addresses dividable by 8.
Be aware, though, that you don't have any guarantee for such alignment, different compilers might decide differently (or same compiler on another system), and alignment even might be changed via compiler flags.
On the other hand, arrays are guaranteed to occupy contiguous memory, so you would have gotten the expected result in the following example:
int array[2];
int* a0 = &array[0];
int* a1 = &array[1];
uintptr_t diff = static_cast<uintptr_t>(a1) - static_cast<uintptr_t>(a0);
std::cout << diff;
The cast to uintptr_t (or alternatively to char*) assures that you get address difference in bytes, not sizes of int...
This is not how C++ works.
You can't "navigate" your scope like this.
Such pointer antics have completely undefined behaviour and shall not be relied upon.
You are not punching holes in tape now, you are writing a description of a program's semantics, that gets converted by your compiler into something executable by a machine.
Code to these abstractions and everything will be fine.

Aligning buffer to an N-byte boundary but not a 2N-byte one?

I would like to allocate some char buffers0, to be passed to an external non-C++ function, that have a specific alignment requirement.
The requirement is that the buffer be aligned to a N-byte1 boundary, but not to a 2N boundary. For example, if N is 64, then an the pointer to this buffer p should satisfy ((uintptr_t)p) % 64 == 0 and ((uintptr_t)p) % 128 != 0 - at least on platforms where pointers have the usual interpretation as a plain address when cast to uintptr_t.
Is there a reasonable way to do this with the standard facilities of C++11?
If not, is there is a reasonable way to do this outside the standard facilities2 which works in practice for modern compilers and platforms?
The buffer will be passed to an outside routine (adhering to the C ABI but written in asm). The required alignment will usually be greater than 16, but less than 8192.
Over-allocation or any other minor wasted-resource issues are totally fine. I'm more interested in correctness and portability than wasting a few bytes or milliseconds.
Something that works on both the heap and stack is ideal, but anything that works on either is still pretty good (with a preference towards heap allocation).
0 This could be with operator new[] or malloc or perhaps some other method that is alignment-aware: whatever makes sense.
1 As usual, N is a power of two.
2 Yes, I understand an answer of this type causes language-lawyers to become apoplectic, so if that's you just ignore this part.
Logically, to satisfy "aligned to N, but not 2N", we align to 2N then add N to the pointer. Note that this will over-allocate N bytes.
So, assuming we want to allocate B bytes, if you just want stack space, alignas would work, perhaps.
alignas(N*2) char buffer[B+N];
char *p = buffer + N;
If you want heap space, std::aligned_storage might do:
typedef std::aligned_storage<B+N,N*2>::type ALIGNED_CHAR;
ALIGNED_CHAR buffer;
char *p = reinterpret_cast<char *>(&buffer) + N;
I've not tested either out, but the documentation suggests it should be OK.
You can use _aligned_malloc(nbytes,alignment) (in MSVC) or _mm_malloc(nbytes,alignment) (on other compilers) to allocate (on the heap) nbytes of memory aligned to alignment bytes, which must be an integer power of two.
Then you can use the trick from Ken's answer to avoid alignment to 2N:
void*ptr_alloc = _mm_malloc(nbytes+N,2*N);
void*ptr = static_cast<void*>(static_cast<char*>(ptr_alloc) + N);
/* do your number crunching */
_mm_free(ptr_alloc);
We must ensure to keep the pointer returned by _mm_malloc() for later de-allocation, which must be done via _mm_free().

How near to the maximum pointer value can an object be to avoid overflow?

How close to the maximum value can a valid pointer be (as a global, allocated on the stack, malloc, new, VirtualAlloc, or any other alloc method a program/library might use), such that ptr + n risks overflowing?
I come across a lot of code that adds values to pointers when dealing with strings/arrays (in C++ sometimes also in a generic "random access iterator" template function).
e.g.
auto end = arr_ptr + len; //or just whatever some_container.end() returns
for (auto i = begin; i < end; ++i) { ... }
for (auto i = begin; i + 2 <= end; i += 2) { ...i[0]...i[1]... }
if (arr_ptr + 4 <= end && memcmp(arr_ptr, "test", 4) == 0) { ... }
if (arr_ptr + count > end) resize(...);
Would it be valid for the last array element to end on 0xFFFFFFFF (assuming 32bit), such that end == 0? If not, how close can it be?
I think always using p != end (and only ever adding 1) or taking the length as len = end - begin then using that (e.g. (end - begin) >= 4) is always safe, but wondering if it is actually an issue to look out for, and audit and change existing code for.
The standard doesn't talk about pointer overflow, it talks about what pointer values can legitimately be formed by pointer arithmetic. Simply put, the legitimate range is pointers into your object/array plus a one-past-the-end pointer.
Then, it is the responsibility of the C or C++ implementation not to create any objects in locations where some implementation-specific danger like pointer overflow prevents those legitimate pointer values from working correctly.
So neither malloc etc, nor the stack (presuming you haven't exceeded any stack bounds) will give you an array of char, starting at an address to which you cannot (due to overflow) add the size of the array.
how close can it be?
As close as allows all the required pointer values to work correctly. So on this 32-bit system, a 1-byte object starting at 0xFFFFFFFE would be the maximum possible address. The standard doesn't permit you to add 2 to the address, so it "doesn't matter" that doing so would overflow, so far as the implementation is concerned. For a 2-byte object the max would be 0xFFFFFFFD if the type is unaligned, but that's an odd number, so 0xFFFFFFFC if it requires 2-alignment.
Of course, other implementation details might dictate a lower limit. For example, it's not unusual for a system to reserve a page of memory either side of 0 and make it inaccessible. This helps catch errors where someone has accessed a null pointer with a small offset. Granted, this is more likely to happen with positive offsets than negative, but still. If your 32-bit system decided to do that, then malloc would need to take account of it, and would never return 0xFFFFFFFE.

Why do I get a random number when increasing the integer value of a pointer?

I am an expert C# programmer, but I am very new to C++. I get the basic idea of pointers just fine, but I was playing around. You can get the actual integer value of a pointer by casting it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
Which makes sense; it's a memory address. But I can move to the next pointer, and cast it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*iptr;
and I get a seemingly random number (although this is not a good PSRG). What is this number? Is it another number used by the same process? Is it possibly from a different process? Is this bad practice, or disallowed? And if not, is there a use for this? It's kind of cool.
What is this number? Is it another number used by the same process? Is it possibly from a different process?
You cannot generally cast pointers to integers and back and expect them to be dereferencable. Integers are numbers. Pointers are pointers. They are totally different abstractions and are not compatible.
If integers are not large enough to be able to store the internal representation of pointers (which is likely the case; integers are usually 32 bits long and pointers are usually 64 bits long), or if you modify the integer before casting it back to a pointer, your program exhibits undefined behaviour and as such anything can happen.
See C++: Is it safe to cast pointer to int and later back to pointer again?
Is this bad practice, or disallowed?
Disallowed? Nah.
Bad practice? Terrible practice.
You move beyond i pointer by 4 or 8 bytes and print out the number, which might be another number stored in your program space. The value is unknown and this is Undefined Behavior. Also there is a good chance that you might get an error (that means your program can blow up) [Ever heard of SIGSEGV? The Segmentation violation problem]
You are discovering that random places in memory contain "unknown" data. Not only that, but you may find yourself pointing to memory that your process does not have "rights" to so that even the act of reading the contents of an address can cause a segmentation fault.
In general is you allocate some memory to a pointer (for example with malloc) you may take a look at these locations (which may have random data "from the last time" in them) and modify them. But data that does not belong explicitly to a pointer's block of memory can behave all kings of undefined behavior.
Incidentally if you want to look at the "next" location just to
NextValue = *(iptr + 1);
Don't do any casting - pointer arithmetic knows (in your case) exactly what the above means : " the contents of the next I refer location".
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*iptr;
You can cast int to pointer and back again, and it will give you same value
Is it possibly from a different process? no it's not, and you can't access memory of other process except using readProcessMemmory and writeProcessMemory under win32 api.
You get other number because you add 1 to the pointer, try to subtract 1 and you will same value.
When you define an integer by
int i = 5;
it means you allocate a space in your thread stack, and initialize it as 5. Then you get a pointer to this memory, which is actually a position in you current thread stack
When you increase your pointer by 1, it means you point to the next location in your thread stack, and you parse it again as an integer,
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
Then you will get an integer from you thread stack which is close to where you defined your int i.
Of course this is not suggested to do, unless you want to become an hacker and want to exploit stack overflow (here it means what it is, not the site name, ha!)
Using a pointer to point to a random address is very dangerous. You must not point to an address unless you know what you're doing. You could overwrite its content or you may try to modify a constant in read-only memory which leads to an undefined behaviour...
This for example when you want to retrieve the elements of an array. But cannot cast a pointer to integer. You just point to the start of the array and increase your pointer by 1 to get the next element.
int arr[5] = {1, 2, 3, 4, 5};
int *p = arr;
printf("%d", *p); // this will print 1
p++; // pointer arithmetics
printf("%d", *p); // this will print 2
It's not "random". It just means that there are some data on the next address
Reading a 32-bit word from an address A will copy the 4 bytes at [A], [A+1], [A+2], [A+3] into a register. But if you dereference an int at [A+1] then the CPU will load the bytes from [A+1] to [A+4]. Since the value of [A+4] is unknown it may make you think that the number is "random"
Anyway this is EXTREMELY dangerous 💀 since
the pointer is misaligned. You may see the program runs fine because x86 allows for unaligned accesses (with some performance penalty). But most other architectures prohibit unaligned operations and your program will just end in segmentation fault. For more information read Purpose of memory alignment, Data Alignment: Reason for restriction on memory address being multiple of data type size
you may not be allowed to touch the next byte as it may be outside of your address space, is write-only, is used for another variable and you changed its value, or whatever other reasons. You'll also get a segfault in that case
the next byte may not be initialized and reading it will crash your application on some architectures
That's why the C and C++ standard state that reading memory outside an array invokes undefined behavior. See
How dangerous is it to access an array out of bounds?
Access array beyond the limit in C and C++
Is accessing a global array outside its bound undefined behavior?

Memcpy : Adding an int offset?

I was looking over some C++ code and I ran into this memcpy function. I understand what memcpy does but they add an int to the source. I tried looking up the source code for memcpy but I can't seem to understand what the adding is actually doing to the memcpy function.
memcpy(Destination, SourceData + intSize, SourceDataSize);
In other words, I want to know what SourceData + intSize is doing. (I am trying to convert this to java.)
EDIT:
So here is my attempt at doing a memcpy function in java using a for loop...
for(int i = 0 ; i < SourceDataSize ; i ++ ) {
Destination[i] = SourceData[i + 0x100];
}
It is the same thing as:
memcpy(&Destination[0], &SourceData[intSize], SourceDataSize);
This is basic pointer arithmetic. SourceData points to some data type, and adding n to it increases the address it's pointing to by n * sizeof(*SourceData).
For example, if SourceData is defined as:
uint32_t *SourceData;
and
sizeof(uint32_t) == 4
then adding 2 to SourceData would increase the address it holds by 8.
As an aside, if SourceData is defined as an array, then adding n to it is sometimes the same as accessing the nth element of the array. It's easy enough to see for n==0; when n==1, it's easy to see that you'll be accessing a memory address that's sizeof(*SourceData) bytes after the beginning of the array.
SourceData + intSize is skipping intSize * sizeof(source data type) bytes at the beginning of SourceData. Maybe SourceDataSize is stored there or something like that.
The closest equivalent to memcpy in Java that you're probably going to get is System.arraycopy, since Java doesn't really have pointers in the same sense.
The add will change the address used for the source of the memory copy.
The amount the address changes will depend on the type of SourceData.
(See http://www.learncpp.com/cpp-tutorial/68-pointers-arrays-and-pointer-arithmetic/)
It might be trying to copy a section of an array SourceData starting at offset intSize and of length SourceDataSize/sizeof(*SourceData).
EDIT
So, for example, if the array was of integers of size 4 bytes, then the equivalent java code would look like:
for(int i = 0 ; i < SourceDataSize/4 ; i ++ ) {
Destination[i] = SourceData[i + intSize];
}
Regarding doing this in Java:
Your loop
for(int i = 0 ; i < SourceDataSize ; i ++ ) {
Destination[i] = SourceData[i + 0x100];
}
will always start copying data from 0x100 elements into SourceData; this may not be desired behavior. (For instance, when i=0, Destination[0] = SourceData[0 + 0x100]; and so forth.) This would be what you wanted if you never wanted to copy SourceData[0]..SourceData[0xFF], but note that hard-coding this prevents it from being a drop-in replacement for memcpy.
The reason the intSize value is specified in the original code is likely because the first intSize elements are not part of the 'actual' data, and those bytes are used for bookkeeping somehow (like a record of what the total size of the buffer is). memcpy itself doesn't 'see' the offset; it only knows the pointer it's starting with. SourceData + intSize creates a pointer that points intSize bytes past SourceData.
But, more importantly, what you are doing is likely to be extremely slow. memcpy is a very heavily optimized function that maps to carefully tuned assembly on most architectures, and replacing it with a simple loop-per-byte iteration will dramatically impact the performance characteristics of the code. While what you are doing is appropriate if you are trying to understand how memcpy and pointers work, note that if you are attempting to port existing code to Java for actual use you will likely want to use a morally equivalent Java function like java.util.Arrays.copyOf.