reverse_iterator weird behavior with 2D arrays - c++

I have a 2D array. It's perfectly okay to iterate the rows in forward order, but when I do it in reverse, it doesn't work. I cannot figure out why.
I'm using MSVC v143 and the C++20 standard.
int arr[3][4];
for (int counter = 0, i = 0; i != 3; ++i) {
for (int j = 0; j != 4; ++j) {
arr[i][j] = counter++;
}
}
std::for_each(std::begin(arr), std::end(arr), [](auto const& row) {
for (auto const& i: row) {
fmt::print("{} ", i);
}
fmt::print("\n");
});
std::for_each(std::rbegin(arr), std::rend(arr), [](auto const& row) {
for (auto const& i: row) {
fmt::print("{} ", i);
}
fmt::print("\n");
});
The output for the first for_each is fine:
0 1 2 3
4 5 6 7
8 9 10 11
Yet the second one is garbage:
-424412040 251 -858993460 -858993460
-424412056 251 -858993460 -858993460
-424412072 251 -858993460 -858993460
When I print the address of each row, the result makes no sense to me:
<Row addr=0xfbe6b3fc58/>
0 1 2 3
<Row addr=0xfbe6b3fc68/>
4 5 6 7
<Row addr=0xfbe6b3fc78/>
8 9 10 11
<Row addr=0xfbe6b3fb98/>
-424412040 251 -858993460 -858993460
<Row addr=0xfbe6b3fb98/>
-424412056 251 -858993460 -858993460
<Row addr=0xfbe6b3fb98/>
-424412072 251 -858993460 -858993460
What is happening here?

This is very likely an MSVC code generation bug related to pointers to multidimensional arrays. The std::reverse_iterator::operator*() that std::for_each calls on each step essentially does *--p, where p is a pointer to int[4] that starts out pointing at the end of the array. When the decrement and the dereference happen in a single expression, MSVC loads the address of the local variable p instead of the address of the element that the decremented p points to, so the address of p itself is what gets returned.
You can observe the problem better in the following standalone example (https://godbolt.org/z/x9q5M74Md):
#include <iostream>
using Int4 = int[4]; // To avoid the awkward pointer-to-array syntax
int arr[3][4] = {};
Int4 & test1()
{
Int4 * p = arr;
Int4 * pP1 = p + 1;
// Works correctly
--pP1;
Int4 & deref = *pP1;
return deref;
}
Int4 & test2()
{
Int4 * p = arr;
Int4 * pP1 = p + 1;
// msvc incorrectly stores the address of the local variable pP1 (i.e. &pP1) in deref
Int4 & deref = *--pP1;
return deref;
}
int main()
{
std::cout << "arr = 0x" << &arr[0][0] << std::endl;
std::cout << "test1 = 0x" << &test1() << std::endl; // Works
std::cout << "test2 = 0x" << &test2() << std::endl; // Bad
}
In this example, &test1() correctly prints the address of the first element of arr. But &test2() actually prints the address of the local variable test2::pP1, i.e. it prints &test2::pP1. MSVC even warns that test2() returns the address of the local variable pP1 (C4172).
clang and gcc work fine. Versions before MSVC v19.23 also compile the code correctly.
Looking at the assembly output, clang and gcc emit the same code for test1() and test2(). But MSVC is doing:
; test1()
mov rax, QWORD PTR pP1$[rsp]
mov QWORD PTR deref$[rsp], rax
; test2()
lea rax, QWORD PTR pP1$[rsp]
mov QWORD PTR deref$[rsp], rax
Notice the lea instead of the mov instruction: test2() loads the address of pP1 itself rather than the value stored in it.
MSVC seems to get confused somehow by pointers to multidimensional arrays.
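Until the compiler is fixed, one way to sidestep the problem is to avoid std::reverse_iterator on the outer array altogether and index the rows instead. The following is only a sketch: for_each_row_reversed and first_elements_reversed are hypothetical names, and it has not been verified against the affected MSVC versions. It relies on the assumption that plain indexing does not trigger the faulty *--p code path.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: visits the rows of a 2D array from last to first.
// It indexes the rows directly, so no reverse_iterator (and no *--p on a
// pointer-to-array) is involved.
template <typename T, std::size_t Rows, std::size_t Cols, typename F>
void for_each_row_reversed(T (&arr)[Rows][Cols], F f)
{
    for (std::size_t i = Rows; i-- > 0;) {
        f(arr[i]);   // arr[i] is an lvalue of type T[Cols]
    }
}

// Usage sketch: fill the array as in the question, then collect the first
// element of every row in reverse order.
inline std::vector<int> first_elements_reversed()
{
    int arr[3][4];
    int counter = 0;
    for (auto& row : arr)
        for (auto& cell : row)
            cell = counter++;

    std::vector<int> firsts;
    for_each_row_reversed(arr, [&](auto const& row) { firsts.push_back(row[0]); });
    return firsts;   // rows start at 0, 4 and 8, so this is {8, 4, 0}
}
```

The lambda receives each row as a reference to int[4], so the range-based inner loop from the question works unchanged inside it.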

Related

How do you fix error exception for access violation?

I am a little confused as to what I need to fix to handle this error: Exception thrown at 0x00AFF748 in ProjectTwo.exe: 0xC0000005: Access violation executing location 0x00AFF748.
I am attempting to write an external assembly language procedure that is then called by a C++ program. I am unsure whether I messed up the assembly code and that is why I keep getting the error. The goal of my program is to implement the following function:
int problem1_ ( )
{
int numberArray [4] = {1, -3, 12, 15};
int result = 0, index = 0;
int numElements = 4;
while (index < numElements)
{
if ( index >= 1 && numberArray[index] > 2 )
{
result = result * numberArray[index] ;
}
else
{
result = result - 5;
}
index++;
}
return result;
}
My C++ Code:
#include<iostream>
using namespace std;
extern "C" int problem1_();
int main()
{
//Declare the variable and set it to returned result
//From calling the assembly program
int x = problem1_();
//Print the results
cout << "The result is: "<< x << endl;
return 0;
}
My assembly Language code:
.model flat, c
.data
numberArray DWORD 1, -3, 12, 15
numElements DWORD 5
.code
problem1_ proc
mov eax, 0 ;value of eax is 0
lea ebx, numberArray ;Load the array's memory address into register ebx'
mov ecx, numElements ;ecx is count down counter
cmp ecx, 0 ;If ecx not greater than 0, jump to quit
jng FINISH
iterateThruArray:
cmp[ebx], eax ;Compare value at current index to eax
jl result1
jmp result2
result1:
mul numberArray
jmp FINISH
result2:
sub eax, 5
jmp FINISH
FINISH:
pop ebp ; pop ebp from the stack
ret ; return to calling program
problem1_ endp
end

Why does shifting a value further than its size not result in 0?

Actual refined question:
Why does this not print 0?
#include "stdafx.h"
#include <iostream>
#include <string>
int _tmain(int argc, _TCHAR* argv[])
{
unsigned char barray[] = {1,2,3,4,5,6,7,8,9};
unsigned long weirdValue = barray[3] << 32;
std::cout << weirdValue; // prints 4
std::string bla;
std::getline(std::cin, bla);
return 0;
}
The disassembly of the shift operation:
10: unsigned long weirdValue = barray[3] << 32;
00411424 movzx eax,byte ptr [ebp-1Dh]
00411428 shl eax,20h
0041142B mov dword ptr [ebp-2Ch],eax
Original question:
I found the following snippet in some old code we maintain. It converts a byte array to multiple float values and adds the floats to a list. Why does it work for byte arrays longer than 4 bytes?
unsigned long ulValue = 0;
for (USHORT usIndex = 0; usIndex < m_oData.usNumberOfBytes; usIndex++)
{
if (usIndex > 0 && (usIndex % 4) == 0)
{
float* pfValue = (float*)&ulValue;
oValues.push_back(*pfValue);
ulValue = 0;
}
ulValue += (m_oData.pabyDataBytes[usIndex] << (8*usIndex)); // Why does this work for usIndex > 3??
}
I would understand that this works if << was a rotate operator, not a shift operator. Or if it was
ulValue += (m_oData.pabyDataBytes[usIndex] << (8*(usIndex%4)))
But the code as I found it just confuses me.
The code is compiled using VS 2005.
If I try the original snippet in the immediate window, it doesn't work though.
I know how to do this properly; I just want to know why the code, and especially the shift operation, works as it is.
Edit: The disassembly for the shift operation is:
13D61D0A shl ecx,3 // multiply uIndex by 8
13D61D0D shl eax,cl // shift to left, does nothing for multiples of 32
13D61D0F add eax,dword ptr [ulValue]
13D61D15 mov dword ptr [ulValue],eax
So the disassembly is fine.
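On x86, the shift count is masked to 5 bits for 32-bit operands, which limits the effective range to 0-31.
A shift by 32 is therefore the same as a shift by zero. In C++ itself, shifting by a count greater than or equal to the width of the promoted operand is undefined behavior, so this result is what the hardware happens to do, not something the language guarantees.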
http://x86.renejeschke.de/html/file_module_x86_id_285.html
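The masking can be modelled in portable C++ without relying on what the hardware does with an oversized count. This is a minimal sketch; shl_x86_like and shl_checked are hypothetical helper names, not part of the code in the question.

```cpp
#include <cstdint>

// Mimics what the x86 shl instruction does with a 32-bit operand:
// only the low 5 bits of the count are used.
constexpr std::uint32_t shl_x86_like(std::uint32_t value, unsigned count)
{
    return value << (count & 31u);
}

// A well-defined shift that yields 0 once every bit has been shifted out.
constexpr std::uint32_t shl_checked(std::uint32_t value, unsigned count)
{
    return count >= 32u ? 0u : value << count;
}

static_assert(shl_x86_like(4u, 32u) == 4u, "count 32 is masked to 0");
static_assert(shl_checked(4u, 32u) == 0u, "all bits shifted out");
static_assert(shl_checked(4u, 8u) == 1024u, "ordinary shift");
```

With shl_x86_like, a count of 8*usIndex behaves like 8*(usIndex%4), which is why the legacy loop appears to work on that compiler and CPU.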

Pointers passed by reference updated with every loop iteration: (Sanity check - Part 2)

I am trying to grasp some concepts related to pointers and references. In the code snippet below, I call function shifting() several times from a for loop while iterating a linked list. With each iteration, int *ptr and int count are changed and passed to the function shifting(). Also, some other pointers and integer variables are passed by reference.
Doubts:
can I assign pointer references like shown in shifting() function? Same question for integer val references.
What is going on in the background? I read that references cannot be re-assigned. Is this not the case here every time shifting() is called?
please note, *ptr and count are NOT passed by reference. They are only to be read.
void shifting(int const * ptr, int const * &ptr6, int const * &ptr7, int const * &ptr8,
int count, int &val6, int &val7, int &val8)
{
ptr8 = ptr7; val8 = val7;
ptr7 = ptr6; val7 = val6;
ptr6 = ptr; val6 = count;
}
int main()
{
int const *ptr6 = NULL; int val6 = 0;
int const *ptr7 = NULL; int val7 = 0;
int const *ptr8 = NULL; int val8 = 0;
int count = 0;
// myList: a linked-list
// front(): gives first element of list
// back(): gives last element of list
// nextElement(): gives next element of list
for (int *ptr = myList.front(); ptr != myList.back(); ptr = nextElement())
{
count++;
shifting(ptr, ptr6, ptr7, ptr8, count, val6, val7, val8);
}
}
EDIT: I tried the above example (after posting here) with only integer part as shown below:
#include <iostream>
using namespace std;
void shifting( int i, int &val6, int &val7, int &val8 )
{
val8 = val7;
val7 = val6;
val6 = i;
}
int main()
{
int val6 = 0;
int val7 = 0;
int val8 = 0;
for (int i = 1; i <= 10; i++)
{
shifting(i, val6, val7, val8);
cout <<"i: "<<i<<" val6: "<<val6<<" val7: "<<val7<<" val8: "<<val8<<endl;
}
return 0;
}
I got this output as below. How come references are being re-assigned??? I read they are not supposed to reassign.
i: 1 val6: 1 val7: 0 val8: 0
i: 2 val6: 2 val7: 1 val8: 0
i: 3 val6: 3 val7: 2 val8: 1
i: 4 val6: 4 val7: 3 val8: 2
i: 5 val6: 5 val7: 4 val8: 3
i: 6 val6: 6 val7: 5 val8: 4
i: 7 val6: 7 val7: 6 val8: 5
i: 8 val6: 8 val7: 7 val8: 6
i: 9 val6: 9 val7: 8 val8: 7
i: 10 val6: 10 val7: 9 val8: 8
can I assign pointer references like shown in shifting() function? Same question for integer val references.
Yes. But when you do so, you are assigning to the referent, not the reference.
What is going on in the background?
This is what's (effectively) going on in the background. That is to say, if you wanted to achieve the same effect without references, you might do this:
void shifting(int const * ptr, int const ** ptr6, int const ** ptr7, int const ** ptr8,
int count, int* val6, int *val7, int *val8)
{
*ptr8 = *ptr7; *val8 = *val7;
*ptr7 = *ptr6; *val7 = *val6;
*ptr6 = ptr; *val6 = count;
}
And at the place where you call it:
shifting(ptr, &ptr6, &ptr7, &ptr8, count, &val6, &val7, &val8);
I read that references cannot be re-assigned.
That is correct, although I prefer the term re-bound. You can write what looks like an assignment to a reference, but it is actually an assignment to the referent. Once a reference is created, its name acts as an alias for the referent: any operation on the reference (with the exception of decltype) is performed as if on the referent.
Is this not the case here every time shifting() is called?
No. When you use the assignment operator on a reference, you are actually assigning to the referent, not the reference.
Your example is extremely convoluted; perhaps something simpler will clear things up.
int a = 0;
int b = 77;
int& ra = a; // ra is a reference to a, and always will be
int& rb = b; // rb is a reference to b, and always will be
ra = b; // ra is still a reference to a, but now a == 77
ra = 999; // now a == 999
rb = ra; // rb is still a reference to b, but now b == 999
To further clarify, the above example is exactly equivalent to this:
int a = 0;
int b = 77;
a = b;
a = 999;
b = a;
Or this example using pointers:
int a = 0;
int b = 77;
int* pa = &a; // pa is a pointer to a, this can change, but won't in this example
int* pb = &b; // pb is a pointer to b, this can change, but won't in this example
*pa = b; // pa is still a pointer to a, but now a == 77
*pa = 999; // now a == 999
*pb = *pa; // pb is still a pointer to b, but now b == 999
I think you may partly be confusing yourself by re-using variable names between the function and the caller: For example, within shifting val6 is a local variable, bound to whatever it was called with, which just happens to be an external variable with the same name.
#include <iostream>
void test(int& i) {
std::cout << "test/i = " << i << '\n';
}
int main() {
int i = 1;
int j = 42;
test(i);
test(j);
}
This outputs (see http://ideone.com/lRNINP)
test/i = 1
test/i = 42
The significance of being a local reference is that it goes away at the end of each function call and is bound afresh to the new arguments on the next call.
I got this output as below. How come references are being re-assigned???
They aren't; the values are simply shifting, as your function appears intended to do (see http://ideone.com/oCmOPk):
#include <iostream>
int main() {
int i = 0;
int& ir = i; // not an assignment, a binding
ir = 1; // pass '1' thru 'ir' to 'i'.
std::cout << "ir " << ir << ", i " << i << "\n";
int j = 0;
int& jr = j;
jr = 42; // pass '42' thru 'jr' to 'j'
std::cout << "jr " << jr << ", j " << j << "\n";
ir = jr; // pass the *value* of 'jr' thru 'ir' to 'i'
j = 99;
// if ir is rebound to jr, both of these will be 99.
std::cout << "ir " << ir << ", i " << i << "\n";
std::cout << "jr " << jr << ", j " << j << "\n";
}
Output:
ir 1, i 1
jr 42, j 42
ir 42, i 42
jr 99, j 99
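The same point can be checked by comparing addresses instead of printing values. The sketch below uses two hypothetical helper functions: the first shows that the reference still aliases its original object after an assignment, the second shows that a pointer, by contrast, really can be made to point somewhere else.

```cpp
// Returns true if 'ir' still refers to 'i' after "ir = jr".
inline bool assignment_keeps_binding()
{
    int i = 1;
    int j = 42;
    int& ir = i;
    int& jr = j;

    ir = jr;   // copies the value 42 into i; ir is not rebound

    return &ir == &i && &ir != &jr && i == 42;
}

// A pointer, by contrast, can be reseated.
inline bool pointer_can_be_reseated()
{
    int i = 1;
    int j = 42;
    int* p = &i;
    p = &j;    // p now points at j; i is untouched
    return p == &j && i == 1;
}
```

Taking the address of a reference yields the address of the referent, which is why &ir == &i holds both before and after the assignment.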

How does an uint32_t pointer work in this code?

I'm really confused by how uint32_t pointers work in C++.
I was fiddling around trying to learn TEA, and I didn't understand why they pass a uint32_t pointer to the encrypt function and then, inside the function, declare uint32_t variables and assign the parameter to them as if the parameter were an array.
Like this:
void encrypt (uint32_t* v, uint32_t* k) {
uint32_t v0=v[0], v1=v[1], sum=0, i;
So I decided to play around with uint32_t pointers, and wrote this short code:
int main ()
{
uint32_t *plain_text;
uint32_t key;
unsigned int temp = 123232;
plain_text = &temp;
key = 7744;
cout << plain_text[1] << endl;
return 0;
}
And it blew my mind when the output was the value of "key". I have no idea how it works... and then when I tried with plain_text[0], it came back with the value of "temp".
So I'm stuck as hell trying to understand what's happening.
Looking back at the TEA code, is the uint32_t* v pointing to an array rather than a single unsigned int? And was what I did just a fluke?
uint32_t is a type. It means unsigned 32-bit integer. On your system it is probably a typedef name for unsigned int.
There's nothing special about a pointer to this particular type; you can have pointers to any type.
The [] in C and C++ is actually pointer indexing notation. p[0] retrieves the value at the location the pointer points to. p[1] gets the value at the next memory location after that. Then p[2] is the next location after that, and so on.
You can use this notation with arrays too because the name of an array is converted to a pointer to its first element when used like this.
So, your code plain_text[1] tries to read the next element after temp. Since temp is not actually an array, this causes undefined behaviour. In your particular case, the manifestation of this undefined behaviour is that it managed to read the memory address after temp without crashing, and that address was the same address where key is stored.
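As for the TEA part of the question: yes, a function written like encrypt expects its pointer to refer to the first element of an array of at least two values. The sketch below illustrates that calling convention. read_block and demo_read_block are hypothetical names, and the function only reads the two halves; it is not the TEA algorithm.

```cpp
#include <cstdint>

// Hypothetical stand-in for the first line of TEA's encrypt(): it only reads
// the two 32-bit halves of the block and packs them into one 64-bit value.
inline std::uint64_t read_block(const std::uint32_t* v)
{
    std::uint32_t v0 = v[0];   // first element the pointer refers to
    std::uint32_t v1 = v[1];   // valid only because the caller passed 2 elements
    return (static_cast<std::uint64_t>(v1) << 32) | v0;
}

inline std::uint64_t demo_read_block()
{
    std::uint32_t block[2] = {123232u, 7744u};  // a real two-element array
    return read_block(block);                   // 'block' decays to &block[0]
}
```

Here v[1] is well defined because block really has a second element, unlike plain_text[1] in the question, where the pointer refers to a single variable.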
Formally your program has undefined behavior.
The expression plain_text[1] is equivalent to *(plain_text + 1) ([expr.sub] / 1). Although you can point to one past the end of an array (objects that aren't arrays are still considered single-element arrays for the purposes of pointer arithmetic ([expr.unary.op] / 3)), you cannot dereference this address ([expr.unary.op] / 1).
At this point the compiler can do whatever it wants. In this case it has simply treated the expression as if plain_text pointed into an array, so that plain_text + 1, i.e. &temp + 1, points to the next uint32_t object on the stack, which here happens by coincidence to be key.
You can see what's going on if you look at the assembly
mov DWORD PTR -16[rbp], 123232 ; unsigned int temp=123232;
lea rax, -16[rbp]
mov QWORD PTR -8[rbp], rax ; plain_text=&temp;
mov DWORD PTR -12[rbp], 7744 ; key=7744;
mov rax, QWORD PTR -8[rbp]
add rax, 4 ; plain_text[1], i.e. -16[rbp] + 4 == -12[rbp] == key
mov eax, DWORD PTR [rax]
mov edx, eax
mov rcx, QWORD PTR .refptr._ZSt4cout[rip]
call _ZNSolsEj ; std::ostream::operator<<(unsigned int)
mov rdx, QWORD PTR .refptr._ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_[rip]
mov rcx, rax
call _ZNSolsEPFRSoS_E ; std::ostream::operator<<(std::ostream& (*)(std::ostream&))
mov eax, 0
add rsp, 48
pop rbp
ret
In C and C++ arrays decay to pointers, resulting in array/pointer equivalence.
a[1]
when a is a pointer to, or an array of, a built-in type, is equivalent to
*(a + 1)
If a is an array of simple types, a will decay at the earliest opportunity to the address of element 0.
int arr[5] = { 0, 1, 2, 3, 4 };
int i = 10;
int* ptr;
ptr = arr;
std::cout << *ptr << "\n"; // outputs 0
ptr = &arr[0]; // same address
std::cout << *ptr << "\n"; // outputs 0
std::cout << ptr[4] << "\n"; // outputs 4
std::cout << *(ptr + 4) << "\n"; // outputs 4
ptr = &i;
std::cout << *ptr << "\n"; // outputs 10
std::cout << ptr[0] << "\n";
std::cout << ptr[1] << "\n"; // UNDEFINED BEHAVIOR.
std::cout << *(ptr + 1) << "\n"; // UNDEFINED BEHAVIOR.
To understand ptr[0] and ptr[1] you simply have to understand pointer arithmetic.
uint32_t *plain_text; // after plain_text = &temp, this holds the address of temp
uint32_t key; // in this particular build, key happens to sit four bytes after temp
Thus &plain_text[0] is plain_text (that is, &temp), and &plain_text[1] refers to the next four bytes, which here happen to be at &key.
This layout is not guaranteed, but it may explain the behaviour you observed.

Where and how are constants stored?

I read this question here, and I also read a related question in the c-faq, but I don't understand the exact reason behind this:
#include <iostream>
using namespace std;
int main()
{
//const int *p1 = (int*) &(5); //error C2101: '&' on constant
//cout << *p1;
const int five = 5;
const int *p2 = &(five);
cout << *p2 << endl;
char *chPtr = (char*) &("abcde");
for (int i=0; i<4; i++) cout << *(chPtr+i);
cout << endl;
return 0;
}
I was wondering how constants, either integer or string literals, get stored. My understanding of string literals is that they are created in global static memory upon start of the program and persist until program exit. In the case of "abcde", even though I did not give it a variable name, I can take its address (chPtr), and I assume I could dereference chPtr any time before program termination and the character values would still be there, even if I dereferenced it outside the scope where it was declared. Is the const int variable "five" also placed in global static memory, so that the address p2 can also be dereferenced at any time?
Why can I take the address of "five" but cannot ask for &(5)? Are the constants "5" and "five" stored differently? And where does "5" get stored in memory?
You cannot take the address of a literal (e.g. &(5)) because the literal is not "stored" anywhere - it is actually written in the assembly instruction. Depending on the platform, you'll get different instructions, but a MIPS64 addition example would look like this:
DADDUI R1, R1, #5
Trying to take the address of the immediate is meaningless as it doesn't reside in (data) memory, but is actually part of the instruction.
If you declare a const int i = 5, and do not need the address of it, the compiler can (and likely will) convert it to a literal and place 5 in the appropriate assembly instructions. Once you attempt to take the address of i, the compiler will see that it can no longer do that, and will place it in memory. This is not the case if you just attempt to take the address of a literal because you haven't indicated to the compiler that it needed to allocate space for a variable (when you declare a const int i, it allocates the space in the first pass, and will later determine it no longer needs it - it does not function in the reverse).
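The two cases from the question can be sketched in code. The helper names below are hypothetical. The first pair of functions relies on the fact that a string literal has static storage duration. The last one shows that, although &(5) is rejected, binding the literal to a const reference makes the compiler create a temporary object that does have an address.

```cpp
#include <cstring>

// A string literal has static storage duration, so the returned pointer
// stays valid after the function returns.
inline const char* literal_address()
{
    return "abcde";
}

// Reads the literal through the pointer outside the function that named it.
inline bool literal_still_readable()
{
    return std::strcmp(literal_address(), "abcde") == 0;
}

// An integer literal has no address, but binding it to a const reference
// materializes a temporary object that lives as long as the reference.
inline bool literal_via_reference()
{
    const int& r = 5;    // temporary int initialized from the literal
    const int* p = &r;   // address of the temporary, not of the literal
    return *p == 5;
}
```

The pointer taken in literal_via_reference is only valid while r is in scope, unlike the pointer to the string literal.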
String constants are stored in the static portion of the data memory - which is why you can take the address of them.
"It depends" is probably not a satisfying answer, but it is the correct one. The compiler will store some const variables on the stack if it needs to (such as if you ever take the address of one). However, compilers have always had the idea of a "constexpr" variable, even if we didn't always have the mechanism to request it directly: if an expression can be calculated at compile time, then instead of calculating it at run time, we can calculate it during compilation. And if we can calculate it at compile time, and we never do anything that requires it to be something different, then we can remove it altogether and turn it into a literal, which would be part of the instruction!
Take for example, the following code:
int main(int argc, char** argv)
{
const int a = 2;
const int b = 3;
const int c = a+b;
volatile int d = 6;
volatile int e = c+d;
std::cout << e << std::endl;
return 0;
}
Look at how smart the compiler is:
37 const int a = 2;
38 const int b = 3;
39 const int c = a+b;
40
41 volatile int d = 6;
0x400949 <+0x0009> movl $0x6,0x8(%rsp)
42 volatile int e = c+d;
0x400951 <+0x0011> mov 0x8(%rsp),%eax
0x400955 <+0x0015> add $0x5,%eax
0x400958 <+0x0018> mov %eax,0xc(%rsp)
43
44 std::cout << e << std::endl;
0x400944 <+0x0004> mov $0x601060,%edi
0x40095c <+0x001c> mov 0xc(%rsp),%esi
0x400960 <+0x0020> callq 0x4008d0 <_ZNSolsEi@plt>
45 return 0;
46 }
(volatile tells the compiler not to do fancy memory tricks to that variable.) In line 42, where c is used, the add is done with the literal 0x5, even though c is itself computed from a and b. Lines 37-39 contain no instructions.
Now let's change the code so that I need the location of a:
int main(int argc, char** argv)
{
const int a = 2;
const int b = 3;
const int c = a+b;
volatile int d = 6;
volatile int e = c+d;
volatile int* f = (int*)&a;
volatile int g = *f;
std::cout << e << std::endl;
std::cout << g << std::endl;
return 0;
}
37 const int a = 2;
0x400955 <+0x0015> movl $0x2,(%rsp)
38 const int b = 3;
39 const int c = a+b;
40
41 volatile int d = 6;
0x400949 <+0x0009> movl $0x6,0x4(%rsp)
42 volatile int e = c+d;
0x400951 <+0x0011> mov 0x4(%rsp),%eax
0x40095c <+0x001c> add $0x5,%eax
0x40095f <+0x001f> mov %eax,0x8(%rsp)
43 volatile int* f = (int*)&a;
44 volatile int g = *f;
0x400963 <+0x0023> mov (%rsp),%eax
0x400966 <+0x0026> mov %eax,0xc(%rsp)
45
46 std::cout << e << std::endl;
0x400944 <+0x0004> mov $0x601060,%edi
0x40096a <+0x002a> mov 0x8(%rsp),%esi
0x40096e <+0x002e> callq 0x4008d0 <_ZNSolsEi@plt>
47 std::cout << g << std::endl;
0x40097b <+0x003b> mov 0xc(%rsp),%esi
0x40097f <+0x003f> mov $0x601060,%edi
0x400984 <+0x0044> callq 0x4008d0 <_ZNSolsEi@plt>
48 return 0;
So we can see that a is initialized in actual memory, on the stack (the rsp-relative address gives it away). But wait... c depends on a, yet whenever I use c it is still a literal 5! What is happening here? Well, the compiler knows that a needs to be in a memory location because of the way it is used. However, it also knows that the variable's value is never anything but 2, so wherever a is used in ways that don't need the memory, it can be treated as a literal 2. This means the a in line 37 is not the same as the a in line 43.
So where are const variables stored? They are stored where they NEED to be stored. CRAZY.
(btw, these were all compiled with g++ -g -O2, different compilers/flags will optimize it differently, this mostly demonstrates what the compiler can do, the only guarantee is that your code will behave correctly.)
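Here's an example of taking the address of a const int and demonstrating that (in gcc on my machine, at least) it's stored as a local (not global static) variable. Note that dereferencing p in main after func() has returned is undefined behavior; the example does it deliberately to show that the stack slot gets reused.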
#include <iostream>
const int *func() {
const int five = 5;
const int *p = &(five);
std::cout << *p << '\n';
return p;
}
// function to overwrite stack values left by earlier function call
int func2(int n, int x) {
for (int i = 0; i < x; ++i)
n *= 2;
return n;
}
int main() {
const int *p = func();
std::cout << func2(2, 10) << '\n';
std::cout << *p << '\n';
return 0;
}
Example output:
5
2048
1