Contiguous memory guarantees with C++ function parameters - c++

Appel [App02] very briefly mentions that C (and presumably C++) provide guarantees regarding the locations of actual parameters in contiguous memory as opposed to registers when the address-of operator is applied to one of formal parameters within the function block.
e.g.
void foo(int a, int b, int c, int d)
{
int* p = &a;
for(int k = 0; k < 4; k++)
{
std::cout << *p << " ";
p++;
}
std::cout << std::endl;
}
and an invocation such as...
foo(1,2,3,4);
will produce the following output "1 2 3 4"
My question is "How does this interact with calling conventions?"
For example __fastcall on GCC will try place the first two arguments in registers and the remainder on the stack. The two requirements are at odds with each other, is there any way to formally reason about what will happen or is it subject to the capricious nature of implementation defined behaviour?
[App02] Modern Compiler Implementation in Java, Andrew w. Appel, Chapter 6, Page 124
Update: I suppose that this question is answered. I think I was wrong to base the whole question on contiguous memory allocation when what I was looking for (and what the reference speaks of) is the apparent mismatch between the need for parameters being in memory due the use of address-of as opposed to in registers due to calling conventions, maybe that is a question for another day.
Someone on the internet is wrong and sometimes that someone is me.

First of all your code doesn't always produce 1, 2, 3, 4. Just check this one: http://ideone.com/ohtt0
Correct code is at least like this:
void foo(int a, int b, int c, int d)
{
int* p = &a;
for (int i = 0; i < 4; i++)
{
std::cout << *p;
p++;
}
}
So now let's try with fastcall, here:
void __attribute__((fastcall)) foo(int a, int b, int c, int d)
{
int* p = &a;
for (int i = 0; i < 4; i++)
{
std::cout << *p << " ";
p++;
}
}
int main()
{
foo(1,2,3,4);
}
Result is messy: 1 -1216913420 134514560 134514524
So I really doubt that something can be guaranteed here.

There is nothing in the standard about calling conventions or how parameters are passed.
It is true that if you take the address of one variable (or parameter) that one has to be stored in memory. It doesn't say that the value cannot be passed in a register and then stored to memory when its address is taken.
It definitely doesn't affect other variables, who's addresses are not taken.

The C++ standard has no concept of a calling convention. That's left for the compiler to deal with.
In this case, if the standard requires that parameters be contiguous when the address-of operator is applied, there's a conflict between what the standard requires of your compiler and what you require of it.
It's up to the compiler to decide what to do. I'd think most compilers would give your requirements priority over the standard's, however.

Your basic assumption is flawed. On my machine, foo(1,2,3,4) using your code prints out:
1 -680135568 32767 4196336
Using g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3 on 64-bit x86.

Related

What is the point of including array's size in a function declaration?

I understand how arrays can be passed to functions in c++, but I don't understand what is the point of including array's size in a function declaration when this size is ignored anyway, because what we are really passing to function is a pointer to the first element of the array.
For example if we have following code:
void ArrayTest1(int(&ints)[3])
{
for (int i = 0; i < 3; i++) std::cout << ints[i];
}
void ArrayTest2(int ints[3])
{
for (int i = 0; i < 3; i++) std::cout << ints[i];
}
void ArrayTest3(int ints[])
{
for (int i = 0; i < 3; i++) std::cout << ints[i];
}
int main()
{
int ints[] = { 1, 2 };
//ArrayTest1(ints); //won't compile
ArrayTest2(ints); //ok
ArrayTest3(ints); //ok
return 0;
}
Then from what I understand functions ArrayTest2 and ArrayTest3 are identical.
Is syntax used in ArrayTest2 only meant to make it clear that this function expects an array with 3 elements and passing array with a different size can cause errors? I'm not new to programming, but I'm new to c++ so I'd like to know what is the point of such syntax and when do people use it.
One of C++’s major design goals was far reaching compatibility with C. Not supporting all aspects of C-array usage would have been seriously detrimental to this goal.
Your examples even give a good indication that compatibility is indeed the intent, and that C++ would work differently if that hadn’t been the case. Consider your ArrayTest1(). It takes a C-array by reference, something that does not exist in C. And in this case the array dimension is significant.
// C-array of length 3 taken by reference
void foo(int (&x)[3]);
void bar() {
int a[4] = {1,2,3,4};
foo(a); // Does not compile. Wrong array size.
}
Only ArrayTest2() and ArrayTest3() are exactly equivalent to:
void foo(int*);
Btw: Bjarne Stroustrup’s The Design and Evolution of C++ is a good read if you’re interested in why C++ is designed the way it is.
I don't exactly know why C permits this (yes, this comes from C), but it is indeed pointless.
Worse than that, the "wrong" dimension would be very misleading to readers of your code.
We might argue that it was to keep the language grammar simple, as the grammar of a declarator already makes the dimension optional (consider int x[] = {1,2,3} vs int x[3] = {1,2,3}).
Maybe some people use it to document intent; I certainly never would.

Why Is The Value Of An Uninitialized Memory Location Give The Value Of -842150451?

I was messing around with memory allocation. I was testing to see that, like Java, this program with give an exception.
int main() {
int* a = nullptr;
int b = *a;
std::cout << b;
}
Which indeed it does. Then I tested using malloc as a pointer to a, but not initializing a.
int main() {
int* a = (int*) malloc(sizeof(int));
int b = *a;
std::cout << b;
}
However, instead of throwing an exception, it prints out a seemingly random number in -842150451. I even tried replacing int with long:
int main() {
long* a = (long*) malloc(sizeof(long));
long b = *a;
std::cout << b;
}
However I got the same result. Then I tried it with short:
int main() {
short* a = (short*) malloc(sizeof(short));
short b = *a;
std::cout << b;
}
Then instead of the previous result, I got -12851. And it continued like this for every primitive type I could think of. What I want to know is, where are these numbers coming from and why these numbers specifically?
-842150451 is the two's complement representation of the value 0xCDCDCDCD, which is a common Visual Studio debugger value for heap-allocated uninitialized memory.
Uninitialized variables or memory have indeterminate values from the C++ specifications perspective, and using such values lead to undefined behavior. If you remember this, and always initialize such values or memory then you'll be alright.
All of your erroroneous programs have what's technically called undefined behaviour. Which means the behaviour of the program is unconstrained by the C++ standard, and therefore it's wrong to expect any particular outcome when you run your code.
C++ is quite unlike Java in this regard which specifies precise behaviours for most situations.

Inconsistency in passing by reference - how does this simple example of passing by reference work?

I had a simple question and was hoping for the underlying logic behind passing by reference.
Here's one code (let's call it Code1):
void fn(int& a)
{
a = 6;
}
int main()
{
int b = 5;
fn(b);
cout << b;
}
Here's another code (Code2):
void fn(int* ptr)
{
*ptr = 6;
}
int main()
{
int b = 5;
fn(&b);
cout << b;
}
And a pass by value code (Code 3):
void fn(int a)
{
a = 6;
}
int main()
{
int b = 5;
fn(b);
cout << b;
}
Here goes my question. Intuitively, I see that while passing by value (Code3), the values are copied ie a would just have taken/copied into itself the value of b. Thus, as a general rule, I see that value passed is just copied always to the called function (here fn). Even with the pointer code (ie Code2), the first line of Code 2 ensures that int *ptr = &a;
I don't understand how this would work in Code1. Saying that &a = b makes no sense. Is this an exception, or does this fit into a rule that is consistent with the cases discussed in the paragraph above?
Thanks!
In this function:
void fn(int &a) {
a=6;
}
the term "&a" does not mean "the address of the variable a". It means "a reference called a". Code 1 and Code 2 are effectively the same (but note that the function in Code 2 can be passed an invalid pointer, which is (almost) impossible for Code 1).
For most intents and purposes, a reference is just a pointer in disguise. Different syntax, same effect (mostly).
Conceptually, in your first case what happens is that the same variable has two labels: b, visible within the scope of main(); and a, visible within the scope of fn.
You don't have to worry about what the compiler does "behind the scenes" to implement this concept.
If you mentally promote the compiler's "behind the scenes" actions to actually being imagined principles of C++, e.g. "the reference is a pointer in disguise", then it leads you to get confused about what is actually a pretty simple concept: the ability to give multiple names to a variable.
It is nothing special being a function parameter; e.g. you could write in main():
int a;
int &c = a;
which is exactly equivalent to:
int c;
int &a = c;
In both cases there is an int variable with two labels, a and c.

Why strange behavior with casting back pointer to the original class?

Assume that in my code I have to store a void* as data member and typecast it back to the original class pointer when needed. To test its reliability, I wrote a test program (linux ubuntu 4.4.1 g++ -04 -Wall) and I was shocked to see the behavior.
struct A
{
int i;
static int c;
A () : i(c++) { cout<<"A() : i("<<i<<")\n"; }
};
int A::c;
int main ()
{
void *p = new A[3]; // good behavior for A* p = new A[3];
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
}
This is just a test program; in actual for my case, it's mandatory to store any pointer as void* and then cast it back to the actual pointer (with help of template). So let's not worry about that part. The output of the above code is,
p->i = 0
p->i = 0 // ?? why not 1
p->i = 1
However if you change the void* p; to A* p; it gives expected behavior. WHY ?
Another question, I cannot get away with (A*&) otherwise I cannot use operator ++; but it also gives warning as, dereferencing type-punned pointer will break strict-aliasing rules. Is there any decent way to overcome warning ?
Well, as the compiler warns you, you are violating the strict aliasing rule, which formally means that the results are undefined.
You can eliminate the strict aliasing violation by using a function template for the increment:
template<typename T>
void advance_pointer_as(void*& p, int n = 1) {
T* p_a(static_cast<T*>(p));
p_a += n;
p = p_a;
}
With this function template, the following definition of main() yields the expected results on the Ideone compiler (and emits no warnings):
int main()
{
void* p = new A[3];
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
}
You have already received the correct answer and it is indeed the violation of the strict aliasing rule that leads to the unpredictable behavior of the code. I'd just note that the title of your question makes reference to "casting back pointer to the original class". In reality your code does not have anything to do with casting anything "back". Your code performs reinterpretation of raw memory content occupied by a void * pointer as a A * pointer. This is not "casting back". This is reinterpretation. Not even remotely the same thing.
A good way to illustrate the difference would be to use and int and float example. A float value declared and initialized as
float f = 2.0;
cab be cast (explicitly or implicitly converted) to int type
int i = (int) f;
with the expected result
assert(i == 2);
This is indeed a cast (a conversion).
Alternatively, the same float value can be also reinterpreted as an int value
int i = (int &) f;
However, in this case the value of i will be totally meaningless and generally unpredictable. I hope it is easy to see the difference between a conversion and a memory reinterpretation from these examples.
Reinterpretation is exactly what you are doing in your code. The (A *&) p expression is nothing else than a reinterpretation of raw memory occupied by pointer void *p as pointer of type A *. The language does not guarantee that these two pointer types have the same representation and even the same size. So, expecting the predictable behavior from your code is like expecting the above (int &) f expression to evaluate to 2.
The proper way to really "cast back" your void * pointer would be to do (A *) p, not (A *&) p. The result of (A *) p would indeed be the original pointer value, that can be safely manipulated by pointer arithmetic. The only proper way to obtain the original value as an lvalue would be to use an additional variable
A *pa = (A *) p;
...
pa++;
...
And there's no legal way to create an lvalue "in place", as you attempted to by your (A *&) p cast. The behavior of your code is an illustration of that.
As others have commented, your code appears like it should work. Only once (in 17+ years of coding in C++) I ran across something where I was looking straight at the code and the behavior, like in your case, just didn't make sense. I ended up running the code through debugger and opening a disassembly window. I found what could only be explained as a bug in VS2003 compiler because it was missing exactly one instruction. Simply rearranging local variables at the top of the function (30 lines or so from the error) made the compiler put the correct instruction back in. So try debugger with disassembly and follow memory/registers to see what it's actually doing?
As far as advancing the pointer, you should be able to advance it by doing:
p = (char*)p + sizeof( A );
VS2003 through VS2010 never give you complaints about that, not sure about g++

Does a simple cast to perform a raw copy of a variable break strict aliasing?

I've been reading about strict aliasing quite a lot lately. The C/C++ standards say that the following code is invalid (undefined behavior to be correct), since the compiler might have the value of a cached somewhere and would not recognize that it needs to update the value when I update b;
float *a;
...
int *b = reinterpret_cast<int*>(a);
*b = 1;
The standard also says that char* can alias anything, so (correct me if I'm wrong) compiler would reload all cached values whenever a write access to a char* variable is made. Thus the following code would be correct:
float *a;
...
char *b = reinterpret_cast<char*>(a);
*b = 1;
But what about the cases when pointers are not involved at all? For example, I have the following code, and GCC throws warnings about strict aliasing at me.
float a = 2.4;
int32_t b = reinterpret_cast<int&>(a);
What I want to do is just to copy raw value of a, so strict aliasing shouldn't apply. Is there a possible problem here, or just GCC is overly cautious about that?
EDIT
I know there's a solution using memcpy, but it results in code that is much less readable, so I would like not to use that solution.
EDIT2
int32_t b = *reinterpret_cast<int*>(&a); also does not work.
SOLVED
This seems to be a bug in GCC.
If you want to copy some memory, you could just tell the compiler to do that:
Edit: added a function for more readable code:
#include <iostream>
using std::cout; using std::endl;
#include <string.h>
template <class T, class U>
T memcpy(const U& source)
{
T temp;
memcpy(&temp, &source, sizeof(temp));
return temp;
}
int main()
{
float f = 4.2;
cout << "f: " << f << endl;
int i = memcpy<int>(f);
cout << "i: " << i << endl;
}
[Code]
[Updated Code]
Edit: As user/GMan correctly pointed out in the comments, a full-featured implementation could check that T and U are PODs. However, given that the name of the function is still memcpy, it might be OK to rely on your developers treating it as having the same constraints as the original memcpy. That's up to your organization. Also, use the size of the destination, not the source. (Thanks, Oli.)
Basically the strict aliasing rules is "it is undefined to access memory with another type than its declared one, excepted as array of characters". So, gcc isn't overcautious.
If this is something you need to do often, you can also just use a union, which IMHO is more readable than casting or memcpy for this specific purpose:
union floatIntUnion {
float a;
int32_t b;
};
int main() {
floatIntUnion fiu;
fiu.a = 2.4;
int32_t &x = fiu.b;
cout << x << endl;
}
I realize that this doesn't really answer your question about strict-aliasing, but I think this method makes the code look cleaner and shows your intent better.
And also realize that even doing the copies correctly, there is no guarantee that the int you get out will correspond to the same float on other platforms, so count any network/file I/O of these floats/ints out if you plan to create a cross-platform project.