How can a C++ reference be changed (by any means necessary) - c++

The C++ language doesn't let you change a reference after it is bound. However, I had a debugging need/desire to change the reference to help debug something. Is there a hacky way to basically overwrite the reference's implementation with a new pointer? Once you get the address of the object you want to change, you can cast it to whatever you want and overwrite it. I could not figure out how to get the memory address of the underlying reference itself; applying & to the reference doesn't give you the address of the reference, but the address of the object the reference refers to.
I realize this is obviously going to invoke undefined behavior, and this is just an experiment. A third-party library has a bug with a global reference that is not initialized before the code that uses it runs, and I want to see if I can fix it by setting the reference myself. At this point, it became a challenge to see if it is even possible. I know you can do this in assembly language, if you can reference the symbol table directly.
I imagine something like this. These are globally scoped variables.
Apple a;
Apple& ref = a;
Later I want ref to refer to a new object instance b and leave a alone.
Apple b;
ref = b; // that doesn't work. That just sets a = b.
&ref = &b; // that doesn't work. the compiler complains.
uint64_t addr = find_symbol_by_any_means_necessary(ref);
*(Apple**)addr = &b; // this should work if I could get addr
Please don't remind me this is a bad idea. I know it is a bad idea. Think of it as a challenge. This is for debug only, to test a hypothesis quickly. I want to learn something about the internals of C++ binary code. (Please tell me if it is impossible because of system page protection... I suppose you could get a segfault if the references are placed in read-only, write-protected memory.)
(The system is CentOS 7, compiler is Intel although I could use gcc for this experiment).

I don't think there is a way to re-direct the object that a standalone reference variable references.
If a reference is contained in a struct as a member variable, you can easily change the object the reference variable references. It's almost certainly undefined behavior, but it works with my current version of g++, g++ 4.8.4.
Here's an example program that demonstrates a method.
#include <iostream>
#include <cstring>
struct Foo
{
int& ref;
};
int main()
{
int a = 10;
int b = 20;
Foo foo = {a}; // foo.ref is a reference to a
std::cout << foo.ref << std::endl;
// Use memcpy to change what foo.ref references.
// Undefined behavior: this assumes g++ stores the reference as a pointer at the start of Foo.
int* bPtr = &b;
std::memcpy(&foo, &bPtr, sizeof(bPtr));
// Now, foo.ref is a reference to b
std::cout << foo.ref << std::endl;
// Changing foo.ref changes b
foo.ref = 30;
std::cout << b << std::endl;
}
Output:
10
20
30
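If you control the struct's definition, a defined-behavior way to get a member that acts like a reseatable reference is to swap in std::reference_wrapper from <functional>. This is a different technique, not a way to modify a real reference, and FooW and rebindAndWrite are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <functional>

// Hypothetical variant of Foo: std::reference_wrapper reads like a
// reference, but unlike int& it can be rebound by plain assignment.
struct FooW
{
    std::reference_wrapper<int> ref;
};

// Rebinds foo.ref to target, then writes value through it.
inline void rebindAndWrite(FooW& foo, int& target, int value)
{
    foo.ref = target;                  // rebinds the wrapper; no memcpy, no UB
    assert(&foo.ref.get() == &target); // the wrapper now aliases target
    foo.ref.get() = value;             // writes to target, not to the old object
}
```

With int a = 10, b = 20; FooW foo{a}; calling rebindAndWrite(foo, b, 30) leaves a at 10 and sets b to 30, mirroring the steps of the memcpy program without relying on how g++ lays out a reference member.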

It's generally not possible, because the fact that a reference is not reseatable allows a lot of optimizations.
For instance, a reference is often implemented as a pointer, but the compiler may also notice that you often use a fixed offset from it. So besides storing the pointer, the compiler may decide to also store the pointer plus that offset, or even to store only the pointer plus offset.
Another optimization is to store the reference as an address in a CPU register. Since it can't change, the compiler doesn't need to reload it.
So your statement that you can change it in assembly is rather misleading. You have no idea what the representation of the reference is after optimization, and this optimization will be situationally dependent.

Related

Reference member variable memory size optimization

I have been told that references, when they are data members of classes, occupy memory because the compiler transforms them into constant pointers. Why is that? Why does the compiler (I know that this is implementation-specific in general) make a reference a pointer when it is part of a class, as opposed to when it is a temporary (local) variable?
So in this code:
class A{
public:
A(int &refval):m_ref(refval){};
private:
int &m_ref;
};
m_ref will be treated as a constant pointer (i.e. it does occupy memory).
However, in this code:
void func(int &a){
int &a_ref = a;
}
the compiler just replaces the reference with the actual variable (i.e. it does not occupy memory).
So to simplify a little, my question basically is: What makes it more meaningful to make references into constant pointers when they are data members than when they are temporary variables?
The C++ standard only defines the semantics of a reference, not how they are actually implemented. So all answers to this question are compiler-specific. A (silly, but compliant) compiler might choose to store all references on the hard disk. It's just that it proved to be the most convenient/efficient to store a reference as a constant pointer for class members, and to replace each occurrence of the reference with the actual thing where possible.
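A quick way to observe the "constant pointer" choice on a given compiler is to compare the size of a class holding only a reference with the size of a pointer. The sketch below reuses the class A from the question (plus hypothetical get/set helpers); the size comparison is implementation-specific and merely expected to hold on mainstream 64-bit ABIs such as GCC, Clang or MSVC on x86-64, not guaranteed by the standard.

```cpp
#include <cassert>

// The class from the question, with a reference data member.
class A
{
public:
    explicit A(int& refval) : m_ref(refval) {}
    int get() const { return m_ref; }
    void set(int v) { m_ref = v; } // writes through the reference
private:
    int& m_ref;
};

// Implementation-specific: on mainstream ABIs the reference member is
// stored as a pointer, so A is pointer-sized. Not required by the standard.
constexpr bool kRefMemberIsPointerSized = sizeof(A) == sizeof(int*);

// Writes v through the reference member and reads it back.
inline int roundTrip(int& storage, int v)
{
    A a(storage);
    a.set(v);
    assert(storage == v); // the write landed in the referenced object
    return a.get();
}
```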
As an example for a situation where it is impossible for the compiler to decide at compile time to which object a reference is bound, consider this:
#include <iostream>
bool func() {
int i;
std::cin >> i;
return i > 5;
}
int main() {
int a = 3, b = 4;
int& r = func() ? a : b;
std::cout << r;
}
So in general a program has to store some information about references at runtime, and sometimes, for special cases, it can prove at compile time what a reference is bound to.
The reference (or pointer) has to be stored in memory somewhere, so why not store it along with the rest of the class?
Even with your example, the parameter a (int &a) is stored in memory (probably on the stack), then a_ref doesn't use any more memory, it's just an alias, but there is memory used by a.
Imagine that a class is just a user defined data type. You need to have something which can lead you to the actual thing that you are referencing.
Using the actual value in the second case is more about the compiler and its work to optimize your code.
A reference is an alias for some variable, so why should this alias use memory when the compiler can optimize it away and access the variable directly on the stack?

Declare stack variable without specifying the name and get the pointer

It's known that defining a heap variable with new gets the pointer without specifying the name:
Var *p = new Var("name", 1);
But I have to clear the variable pointed to by p with delete p later on in the program.
I want to declare a stack variable so it is automatically cleared after function exits, but I only want to get the pointer, and the following:
Var v("name", 1);
Var *p = &v;
is quite tedious, and specifier v will never be referenced.
Can I declare a stack class instance and get its pointer without specifying its name?
There are two questions hidden in here. The first one is:
Var *p = new Var("name", 1);
But I have to clear the variable pointed to by p with delete p later on in the program.
I want to declare a stack variable so it is automatically cleared after function exits
So here, you're asking how to allocate memory without having to explicitly clean it up afterwards. The solution is to use std::unique_ptr:
std::unique_ptr<Var> p(new Var("name", 1));
Voila! unique_ptr will automatically clean itself up, it has virtually no overhead compared to a raw pointer, and it overloads the * and -> operators so you can use it just like a raw pointer. Search for "C++11 smart pointers" if you want to know more.
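To make the automatic cleanup visible, the sketch below gives a hypothetical stand-in for the question's Var class a destruction counter; the real Var is not shown in the question, so its members here are assumptions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the question's Var; counts destructions
// so the automatic cleanup can be observed.
struct Var
{
    static int destroyed;
    std::string name;
    int value;
    Var(std::string n, int v) : name(std::move(n)), value(v) {}
    ~Var() { ++destroyed; }
};
int Var::destroyed = 0;

// The heap object is owned by p and deleted when p goes out of scope,
// so no explicit delete is needed.
inline int useVar()
{
    std::unique_ptr<Var> p(new Var("name", 1)); // as in the answer
    // Since C++14 you can also write: auto p = std::make_unique<Var>("name", 1);
    assert(p->name == "name");
    return p->value;
} // ~Var runs here
```

Each call to useVar() increments Var::destroyed by exactly one, without any delete in sight.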
The second question is:
I only want to get the pointer, and the following:
Var v("name", 1);
Var *p = &v;
is quite tedious, and specifier v will never be referenced.
The important point here is that Var *p = &v is completely unnecessary. If you have a function that requires a pointer, you can use &v on the spot:
void SomeFunc(const Var* p);
// ...
Var v("name", 1);
SomeFunc(&v);
There's no need to put &v in a separate variable before passing it into a function that requires a pointer.
The exception is if the function takes a reference to a pointer (or a pointer to a pointer):
void SomeFunc2(Var*& p);
void SomeFunc3(Var** p);
These types of functions are rare in modern C++, and when you see them, you should read the documentation for that function very carefully. More often than not, those functions will allocate memory, and you'll have to free it explicitly with some other function.
There's no way to do this by allocating on the stack. However, you can use std::make_shared for the heap:
#include <memory>
std::shared_ptr<Var> p = std::make_shared<Var>("name", 1);
At the cost (and risk) of being more confusing, you can avoid repeating the type from the code in your question, like so:
Var v("name", 1), *p = &v;
You could also potentially use alloca, which is provided by most systems and returns a pointer to stack-allocated memory, but then you have to go through a separate painful step to placement new an object into that memory and do your own object destruction. alloca needs to be called inside the function so it's the function stack on which the object is created, and not during the preparation of function arguments (as the variable's memory may be embedded in the stack area the compiler's using to prepare function arguments), which makes it tricky to wrap into some easily reused facility. You could use macros, but they're evil (see Marshall Cline's C++ FAQ for an explanation of that). Overall - not worth the pain....
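The "placement new, then destroy it yourself" step looks roughly like the sketch below. To keep it portable it swaps alloca (non-standard) for a fixed-size, suitably aligned buffer on the stack, which is standard C++; alloca would only change where the raw bytes come from. Widget is a hypothetical type.

```cpp
#include <cassert>
#include <new>

// Hypothetical type used only for illustration.
struct Widget
{
    int a, b;
    Widget(int x, int y) : a(x), b(y) {}
    int sum() const { return a + b; }
};

inline int buildOnStack(int x, int y)
{
    // Raw, suitably aligned stack storage; no Widget exists in it yet.
    alignas(Widget) unsigned char buf[sizeof(Widget)];

    // Placement new constructs the object inside buf and yields a pointer.
    Widget* p = ::new (static_cast<void*>(buf)) Widget(x, y);
    assert(static_cast<void*>(p) == static_cast<void*>(buf));

    int result = p->sum();

    // Manual destruction: the compiler will not do this for us.
    p->~Widget();
    return result;
}
```

Forgetting the explicit destructor call, or calling it twice, is exactly the kind of pain the answer warns about.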
Anyway, I recommend sticking with the code in your question and not over-thinking this: using &v a few times tends to be easier, and when it's not it's normally not a big deal if there's an unnecessary identifier for the stack-based variable.
I don't think there is a way to overcome it without some overhead (like the shared_ptr). So the shortest way to write it will be:
Var v("name", 1), *p = &v;
Some compilers will let you take the address of a temporary (i.e. stack) object and assign it to a pointer, but standard C++ does not allow taking the address of a temporary, and the temporary is destroyed at the end of the full-expression that creates it, so its memory may be overwritten long before the end of the current scope. (TO CLARIFY: THIS MEANS DON'T DO THIS. EVER.) See the discussion in the comments below about the behavior observed in different versions of GCC on different operating systems. (I don't know whether the fact that version 4.5.3 only gives a warning instead of an error indicates that this will always be "safe" in the sense that the pointer will be valid everywhere within the current scope if you compile with that particular version of GCC, but I wouldn't count on it.)
Here's the code I used (modified as per Jonathan Leffler's suggestion):
#include <stdio.h>
class Class {
public:
int a;
int b;
Class(int va, int vb){a = va; b = vb;}
};
int main(){
Class *p = &Class(1, 2); // ill-formed: takes the address of a temporary
Class *q = &Class(3, 4); // even where accepted, p and q dangle immediately
printf("%p: %d,%d\n", (void *)p, p->a, p->b);
printf("%p: %d,%d\n", (void *)q, q->a, q->b);
}
When compiled using GCC 4.5.3 and run (on Windows 7 SP1), this code printed:
0x28ac28: 1,2
0x28ac30: 3,4
When compiled using GCC 4.7.1 and run (on Mac OS X 10.8.3), it printed:
0x7fff51cd04c0: 0,0
0x7fff51cd04d0: 1372390648,32767
In any case, I'm not sure why you wouldn't just declare the variable normally and use &v everywhere you need something "pointer-like" (for instance, in functions that require a pointer as an argument).
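If you do want a pointer to an object created from a temporary expression, a legal variant (different from the &Class(1, 2) code above) is to bind the temporary to a const reference first: this extends the temporary's lifetime to that of the reference, so the pointer stays valid until the end of the scope. It still needs a name (the reference r), and the pointer is to const. The class mirrors the answer's Class.

```cpp
#include <cassert>

// Same shape as the Class used in the answer.
class Class {
public:
    int a;
    int b;
    Class(int va, int vb) : a(va), b(vb) {}
};

inline int sumViaPointer()
{
    // Binding the temporary to a const reference extends its lifetime
    // to the end of this scope, so taking its address is safe here.
    const Class& r = Class(1, 2);
    const Class* p = &r;
    assert(p->a == 1);
    return p->a + p->b;
}
```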

Checking for a null reference?

Let's say you have something like this:
int& refint;
int* foo =0;
refint = *foo;
How could you verify if the reference is NULL to avoid a crash?
You can't late-initialize a reference like that. It has to be initialized when it's declared.
On Visual C++ I get
error C2530: 'refint' : references must be initialized
with your code.
If you 'fix' the code, the crash (strictly, undefined behaviour) happens at reference usage time in VC++ v10.
int* foo = 0;
int& refint(*foo);
int i(refint); // access violation here
The way to make this safe is to check the pointer before a reference is initialized from it (a reference can only be initialized, never assigned).
int* foo =0;
if (foo)
{
int& refint(*foo);
int i(refint);
}
though that still does not guarantee foo points to usable memory, nor that it remains so while the reference is in scope.
You don't, by the time you have a "null" reference you already have undefined behaviour. You should always check whether a pointer is null before trying to form a reference by dereferencing the pointer.
(Your code is illegal; you can't create an uninitialized reference and try and bind it by assigning it; you can only bind it during initialization.)
In general, you can't.
Whoever "creates a null reference" (or tries to, I should say) has already invoked undefined behavior, so the code might (or might not) crash before you get a chance to check anything.
Whoever created the reference should have done:
int *foo = 0;
if (foo) {
int &refint = *foo;
... use refint for something ...
}
Normally it's considered the caller's problem if they've written *foo when foo is null, and it's not one function's responsibility to check for that kind of error in the code of other functions. But you could litter things like assert(&refint); through your code. They might help catch errors made by your callers, since after all for any function you write there's a reasonable chance the caller is yourself. Be aware, though, that because a valid program can never bind a reference to a null pointer, the compiler is allowed to assume &refint is non-null and may optimize such checks away.
All the answers above are correct, but if for some reason you want to do this I thought at least one person should provide an answer. I am currently trying to track down a bad reference in some source code and it would be useful to see if someone has deleted this reference and set it to null at some point. Hopefully this won't generate too many downvotes.
#include <iostream>
int main()
{
int* foo = nullptr;
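int& refint = *foo; // undefined behavior: dereferences a null pointer
if(&refint == nullptr) // an optimizing compiler may assume this is false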
std::cout << "Null" << std::endl;
else
std::cout << "Value " << refint << std::endl;
}
Output:
Null
To make the above code compile, you will have to switch the order:
int* foo =0;
int& refint = *foo; // undefined behavior; on actual PCs this crashes here or when refint is first read
(There may be older processor or runtime architectures where this worked.)
Having said all of the above, if you do want a nullable reference, use boost::optional<T&>; it works like a charm.
You don't need to; references cannot be null.
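Without Boost, the usual standard-library spelling (C++17) is std::optional<std::reference_wrapper<T>>, since std::optional<T&> is not allowed. The sketch below is a swapped-in technique, not boost::optional itself, and fromPointer/valueOr are hypothetical helpers; note that the pointer is checked before any reference is formed.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// A nullable reference to int: either empty or aliasing some int.
using MaybeIntRef = std::optional<std::reference_wrapper<int>>;

// Checks the pointer first, so no reference is ever bound to *nullptr.
inline MaybeIntRef fromPointer(int* p)
{
    if (p == nullptr)
        return std::nullopt;
    return std::ref(*p);
}

// Reads through the maybe-reference, or returns fallback when it is empty.
inline int valueOr(MaybeIntRef r, int fallback)
{
    return r ? r->get() : fallback;
}
```

Writing through a non-empty one (m->get() = 9) modifies the original int, just like a reference would.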
Read the manual.

Using a function with reference as a function with pointers?

Today I stumbled over a piece of code that looked horrifying to me. The pieces were scattered across different files; I have tried to write the gist of it in a simple test case below. The code base is scanned with FlexeLint on a daily basis, but this construct has been lying in the code since 2004.
The thing is that a function that takes its parameter by reference is called as if it took a pointer, due to a function-pointer cast. The construct has worked since 2004 on Irix, and now, after porting, it actually works on Linux/gcc too.
My question now: is this a construct one can trust? I can understand if compiler writers implement reference passing as if it were a pointer, but is it reliable? Are there hidden risks?
Should I change fref(..) to use pointers and risk breaking anything in the process?
What do you think?
Edit
In the actual code both fptr(..) and fref(..) use the same struct - changed code below to reflect this better.
#include <iostream>
#include <string.h>
using namespace std;
// ----------------------------------------
// This will be passed as a reference in fref(..)
struct string_struct {
char str[256];
};
// ----------------------------------------
// Using pointer here!
void fptr(string_struct *str)
{
cout << "fptr: " << str->str << endl;
}
// ----------------------------------------
// Using reference here!
void fref(string_struct &str)
{
cout << "fref: " << str.str << endl;
}
// ----------------------------------------
// Cast to void (*)(void*) and call with a pointer
void ftest(void (*fin)())
{
string_struct str;
void (*fcall)(void*) = (void(*)(void*))fin;
strcpy(str.str, "Hello!");
fcall(&str);
}
// ----------------------------------------
// Let's go for a test
int main() {
ftest((void (*)())fptr); // test with fptr that's using pointer
ftest((void (*)())fref); // test with fref that's using reference
return 0;
}
Clean it up. That's undefined behavior and thus a bomb which might blow up anytime. A new platform or compiler version (or moon phase, for that matter) could trip it.
Of course, I don't know what the real code looks like, but from your simplified version it seems that the easiest way would be to give string_struct an implicit constructor taking a const char*, templatize ftest() on the function pointer argument, and remove all the casts involved.
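A minimal sketch of that cleanup, assuming C++17 (for if constexpr) and assuming fptr/fref can be changed to return their text instead of printing it, so the result can be checked. The template deduces the parameter type from the function pointer and picks pointer or reference passing accordingly; no casts remain, and a function with an unsuitable signature fails to compile instead of misbehaving at run time.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

struct string_struct {
    char str[256];
    // Implicit constructor from const char*, as the answer suggests.
    string_struct(const char* s)
    {
        std::strncpy(str, s, sizeof str - 1);
        str[sizeof str - 1] = '\0';
    }
};

// Modified to return the text instead of printing it.
inline std::string fptr(string_struct* s) { return std::string("fptr: ") + s->str; }
inline std::string fref(string_struct& s) { return std::string("fref: ") + s.str; }

// Templated on the function's parameter type: pointer or reference.
template <typename Arg>
std::string ftest(std::string (*fin)(Arg))
{
    assert(fin != nullptr);
    string_struct str = "Hello!";
    if constexpr (std::is_pointer_v<Arg>)
        return fin(&str); // Arg is string_struct*
    else
        return fin(str);  // Arg is string_struct&
}
```

ftest(fptr) and ftest(fref) both work, each through its real signature.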
It's obviously a horrible technique, and formally it's undefined behaviour and a serious error to call a function through an incompatible type, but it should "work" in practice on a normal system.
At the machine level, a reference and a pointer have exactly the same representation; they are both just the address of something. I would fully expect that fptr and fref compile to exactly the same thing, instruction for instruction, on any computer you could get your hands on. A reference in this context can simply be thought of as syntactic sugar; a pointer that is auto-dereferenced for you. At the machine level they are exactly the same. Obviously there might be some obscure and/or defunct platforms where that might not be the case, but generally speaking that's true 99% of the time.
Furthermore, on most common platforms, all object pointers have the same representation, as do all function pointers. What you've done really isn't all that different from calling a function expecting an int through a type taking a long, on a platform where those types have the same width. It's formally illegal, and all but guaranteed to work.
It can even be inferred from the definition of malloc that all object pointers have the same representation; I can malloc a huge chunk of memory, and stick any (C-style) object I like there. Since malloc only returned one value, but that memory can be reused for any object type I like, it's hard to see how different object pointers could reasonably use different representations, unless the compiler was maintaining a big set of value-representation mappings for every possible type.
void *p = malloc(100000);
foo *f = (foo*)p; *f = some_foo;
bar *b = (bar*)p; *b = some_bar;
baz *z = (baz*)p; *z = some_baz;
quux *q = (quux*)p; *q = some_quux;
(The ugly casts are necessary in C++.) The above is required to work. So while I don't think it is formally required that afterwards f, b, z and q all have the same bit pattern (i.e. that a memcmp of the pointer objects themselves compares equal), it's hard to imagine a sane implementation where they don't.
That being said, don't do this!
It works by pure chance.
In the original version of the question, fptr expected a const char *, while fref expects a string_struct &.
The struct string_struct has the same memory layout as the 256-byte char array it contains, since that array is its only member and it has no virtual members, so its address is also the address of the characters.
In C++, call by reference (e.g. string_struct &) is typically implemented by passing a hidden pointer to the referenced object, so on the call stack it looks the same as if a true pointer had been passed.
But if the structure string_struct changes, everything will break so the code is not considered safe at all. Also it is dependent on compiler implementation.
Let's just agree that this is very ugly and you're going to change that code.
With the cast you promise that you make sure the types match and they clearly don't.
At least get rid of the C-style cast.

Why Can't I store references in a `std::map` in C++?

I understand that references are not pointers, but an alias to an object. However, I still don't understand what exactly this means to me as a programmer, i.e. what are references under the hood?
I think the best way to understand this would be to understand why it is I can't store a reference in a map.
I know I need to stop thinking of references as syntactic sugar over pointers, just not sure how to :/
The way I understand it, references are implemented as pointers under the hood. The reason why you can't store them in a map is purely semantic; you have to initialize a reference when it's created and you can't change it afterward anymore. This doesn't mesh with the way a map works.
You should think of a reference as a 'const pointer to a non-const object':
MyObject& ~~ MyObject * const
Furthermore, a reference can only be built as an alias of something which exists (which is not necessary for a pointer, though advisable apart from NULL). This does not guarantee that the object will stay around (and indeed you might get a core dump when accessing an object through a reference if it is no more). Consider this code, where doSomething is a hypothetical member function:
// Falsifying a reference
MyObject& firstProblem = *((MyObject*)0);
firstProblem.doSomething(); // undefined behavior
// Referencing something that exists no more
MyObject* anObject = new MyObject;
MyObject& secondProblem = *anObject;
delete anObject;
secondProblem.doSomething(); // undefined behavior
Now, there are two requirements for an STL container:
T must be default constructible (a reference is not)
T must be assignable (you cannot reseat a reference, though you can assign to its referee)
So, in STL containers, you have to use proxies or pointers.
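One such proxy in the standard library is std::reference_wrapper: it is copyable and assignable, so it can be a map's mapped type where int& cannot. A minimal sketch, assuming C++17 for insert_or_assign; RefMap and addAlias are hypothetical names. Elements have to be added with insert/insert_or_assign rather than operator[], because reference_wrapper has no default constructor.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Map from names to aliases of existing ints.
using RefMap = std::map<std::string, std::reference_wrapper<int>>;

// Adds (or rebinds) an alias for target under key.
inline void addAlias(RefMap& m, const std::string& key, int& target)
{
    m.insert_or_assign(key, std::ref(target));
}
```

Writing through m.at(key).get() modifies the original variable, and calling addAlias again with the same key rebinds the element, which a plain reference could never do.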
Now, using pointers might prove problematic for memory handling, so you may have to:
use smart pointers (boost::shared_ptr for example)
use a specialized container: Boost Pointer Container Library
DO NOT use auto_ptr, there is a problem with assignment since it modifies the right hand operand.
Hope it helps :)
The important difference apart from the syntactic sugar is that references cannot be changed to refer to another object than the one they were initialized with. This is why they cannot be stored in maps or other containers, because containers need to be able to modify the element type they contain.
As an illustration of this:
A anObject, anotherObject;
A *pointerToA=&anObject;
A &referenceToA=anObject;
// We can change pointerToA so that it points to a different object
pointerToA=&anotherObject;
// But it is not possible to change what referenceToA points to.
// The following code might look as if it does this... but in fact,
// it assigns anotherObject to whatever referenceToA is referring to.
referenceToA=anotherObject;
// Has the same effect as
// anObject=anotherObject;
Actually, you can use references in a map. I don't recommend this for big projects, as it might cause weird compilation errors, but:
map<int, int&> no_prob;
int refered = 666;
no_prob.insert(std::pair<int, int&>(0, refered)); // works
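no_prob[5] = 777; // won't compile!
// operator[] default-constructs the value for key 5 and then assigns to it, which is impossible for int&
std::cout << no_prob[0] << std::endl; // won't compile either: operator[] still needs a default-constructed value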
std::cout << no_prob.at(0) << std::endl; //works!!
So you can use a map this way, but it will be difficult to guarantee that it is used correctly; I have only used this in small programs (usually for competitive programming).
A container that stores a reference has to initialize all its elements when constructed and therefore is less useful.
#include <iostream>
#include <string>
using namespace std;
struct container
{
string& s_; // string reference
};
int main()
{
string s { "hello" };
//container {}; // error - object has an uninitialized reference member
container c { s }; // Ok
c.s_ = "bye";
cout << s; // prints bye
}
Also, once initialized, the storage for the container elements cannot be changed. s_ will always refer to the storage of s above.
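The same point can be checked at compile time with type traits: a reference member deletes the implicit copy-assignment operator and the default constructor, which is what rules such a type out of many container operations. The rebindable struct below is a hypothetical alternative that holds a std::reference_wrapper instead, for comparison.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>

// The struct from the answer: a plain reference member.
struct container
{
    std::string& s_;
};

// Hypothetical alternative holding a std::reference_wrapper instead.
struct rebindable
{
    std::reference_wrapper<std::string> s_;
};

// The reference member deletes copy assignment and the default constructor.
static_assert(!std::is_copy_assignable<container>::value, "cannot reseat");
static_assert(!std::is_default_constructible<container>::value, "must bind at init");

// The wrapper keeps copy assignment, and assigning rebinds it.
static_assert(std::is_copy_assignable<rebindable>::value, "can reseat");

// Rebinds r to target and reports whether r now aliases target.
inline bool rebindsTo(rebindable& r, std::string& target)
{
    r.s_ = target; // rebinds; the previously referenced string is untouched
    return &r.s_.get() == &target;
}
```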
This post explains how references are implemented under the hood - http://www.codeproject.com/KB/cpp/References_in_c__.aspx, which also supports sebastian's answer.