Today I stumbled over a piece of code that looked horrifying to me. The pieces were scattered across different files; I have tried to capture the gist of it in the simple test case below. The code base is scanned daily with FlexeLint, but this construct has been lying in the code since 2004.
The thing is that a function implemented with a reference parameter is called as if it took a pointer parameter... due to a function pointer cast. The construct has worked since 2004 on Irix, and now that we are porting, it actually works on Linux/gcc too.
My question now: is this a construct one can trust? I can understand that compiler writers implement reference passing as if it were a pointer, but is it reliable? Are there hidden risks?
Should I change fref(..) to use pointers and risk breaking something in the process?
What do you think?
Edit
In the actual code both fptr(..) and fref(..) use the same struct - changed code below to reflect this better.
#include <iostream>
#include <string.h>
using namespace std;
// ----------------------------------------
// This will be passed as a reference in fref(..)
struct string_struct {
char str[256];
};
// ----------------------------------------
// Using pointer here!
void fptr(string_struct *str)
{
cout << "fptr: " << str->str << endl;
}
// ----------------------------------------
// Using reference here!
void fref(string_struct &str)
{
cout << "fref: " << str.str << endl;
}
// ----------------------------------------
// Cast to f(void*) and call with pointer
void ftest(void (*fin)())
{
string_struct str;
void (*fcall)(void*) = (void(*)(void*))fin;
strcpy(str.str, "Hello!");
fcall(&str);
}
// ----------------------------------------
// Let's go for a test
int main() {
ftest((void (*)())fptr); // test with fptr that's using pointer
ftest((void (*)())fref); // test with fref that's using reference
return 0;
}
What do you think?
Clean it up. That's undefined behavior and thus a bomb which might blow up anytime. A new platform or compiler version (or moon phase, for that matter) could trip it.
Of course, I don't know what the real code looks like, but from your simplified version it seems that the easiest way would be to give string_struct an implicit constructor taking a const char*, templatize ftest() on the function pointer argument, and remove all the casts involved.
It's obviously a horrible technique, and formally it's undefined behaviour and a serious error to call a function through an incompatible type, but it should "work" in practice on a normal system.
At the machine level, a reference and a pointer typically have exactly the same representation; they are both just the address of something. I would fully expect fptr and fref to compile to exactly the same thing, instruction for instruction, on any computer you could get your hands on. A reference in this context can simply be thought of as syntactic sugar: a pointer that is auto-dereferenced for you. There might be some obscure and/or defunct platforms where that is not the case, but on mainstream platforms it holds.
Furthermore, on most common platforms, all object pointers have the same representation, as do all function pointers. What you've done really isn't all that different from calling a function expecting an int through a type taking a long, on a platform where those types have the same width. It's formally illegal, and all but guaranteed to work.
It can even be inferred from the definition of malloc that all object pointers have the same representation; I can malloc a huge chunk of memory, and stick any (C-style) object I like there. Since malloc only returns one value, but that memory can be reused for any object type I like, it's hard to see how different object pointers could reasonably use different representations, unless the compiler was maintaining a big set of value-representation mappings for every possible type.
void *p = malloc(100000);
foo *f = (foo*)p; *f = some_foo;
bar *b = (bar*)p; *b = some_bar;
baz *z = (baz*)p; *z = some_baz;
quux *q = (quux*)p; *q = some_quux;
(The ugly casts are necessary in C++.) The above is required to work. So while I don't think it is formally required that the four pointers compare bytewise equal afterwards (that is, that memcmp(&f, &b, sizeof f) == 0, and likewise for the other pairs), it's hard to imagine a sane implementation that could make that false.
That being said, don't do this!
It works by pure chance.
fptr expects a const char * while fref expects a string_struct &.
The struct string_struct has the same address as its first member, since it only contains a 256-byte char array and does not have any virtual members, so a pointer to the struct can stand in for a const char *.
In C++, call by reference (e.g. string_struct &) is typically implemented by passing a hidden pointer to the referenced object, so on the call stack it looks the same as if a true pointer had been passed.
But if the structure string_struct changes, everything will break, so the code is not safe at all. It also depends on the compiler implementation.
Let's just agree that this is very ugly and you're going to change that code.
With the cast you promise that you make sure the types match and they clearly don't.
At least get rid of the C-style cast.
Related
I come from a Java background but am now working on large C++ code bases. I often see this pattern:
void function(int value, int& result);
And the above function is called like so:
int result = 0;
function(42, result);
std::cout << "Result is " << result << std::endl;
In Java, the following would be more common:
int result = function(42);
Although the above is perfectly possible in C++, how come the former appears more common (in the codebase I'm working on at least)? Is it stylistic or something more?
First, this used to be an established technique to have more than one output of a function. E.g. in this signature,
int computeNumberButMightFail(int& error_code);
you would have both the payload int as the return value, and a reference to some error variable that is set from within the function to signal an error. It is clear these days that there are better techniques, e.g. std::optional<T> is a good return value, there might be a more flexible std::expected<T, ...>, and with newer C++ standards, we can return multiple values with std::make_tuple and destructure them at the call side with structured bindings. For exceptional error scenarios, the usual approach is to use... well... exceptions.
Second, this is an optimization technique from the days in which (N)RVO wasn't widely available: if the output of a function is an object that is expensive to copy, you wanted to make sure no unnecessary copies are made:
void fillThisHugeBuffer(std::vector<LargeType>& output);
means we pass a reference to the data in order to avoid an unnecessary copy when returning it by value. However, this is outdated, too, and returning large objects by value is usually considered the more idiomatic approach, because C++17 guarantees copy elision for returned temporaries (through what the standard calls temporary materialization) and (named) return value optimization is implemented by all major compilers.
See also the core guidelines:
F.20 - "For “out” output values, prefer return values to output parameters".
As far as I know, this case is not common in C++, at least not with primitive data types as return values. There are a few cases to consider:
If you are working with plain C or in a very restricted context where C++ exceptions are not allowed (like realtime applications), the return value of a function is often used to indicate the success of a function. An example in C could be:
#include <stdio.h>
#include <errno.h>
int func(int arg, int* res) {
    if (arg > 10) {
        return EINVAL; // error code from errno.h
    }
    // ... do stuff
    int my_result = arg; // placeholder for the computed value
    *res = my_result;
    return 0; // success
}
This is sometimes used in C++ as well, and so the result must be assigned by reference/pointer.
When your result is a struct or object that exists before the call and the purpose of your function is to modify attributes inside it, this is a common pattern, because you have to pass the argument by reference (to avoid a copy) anyway. So it is not necessary to return the same object you passed to the function. An example in C++ could be:
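#include <cstdlib>
#include <iostream>
struct Point {
    int x = 0;
    int y = 0;
};
void fill_point(Point& p, int x, int y) {
    p.x = x;
    p.y = y;
}
int main() {
    Point p; // not "Point p();", which would declare a function
    fill_point(p, 3, 4);
    std::cout << p.x << ", " << p.y << std::endl;
    return EXIT_SUCCESS;
}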
However, this is a trivial example, and there are better solutions, like defining the fill function as a method of the object. But with regard to the single-responsibility principle, this pattern is sometimes common under more complex circumstances.
In Java you can't control where objects live: every object you define is on the heap and is passed to functions via a reference. In C++ you have the choice of where you want your object stored (heap or stack) and how to pass the object to a function. It is important to keep in mind that passing an object by value copies it, and that returning an object by value may also copy (or move) it unless the compiler elides the copy. To return an object by reference you have to ensure that its lifetime exceeds the scope of your function, by placing it on the heap or by having it passed to the function by reference.
Modifiable parameters that receive values as a side effect of a function call are called out parameters. They are generally accepted as a bit archaic, and have fallen somewhat out of fashion as better techniques are available in C++. As you suggested, returning computed values from functions is the ideal.
But real-world constraints sometimes drive people toward out parameters:
returning objects by value is too expensive due to the cost of copying large objects or those with non-trivial copy constructors
returning multiple values, when creating a tuple or struct to contain them is awkward, expensive, or not possible
objects that cannot be copied (possibly because of a private or deleted copy constructor) but must be created "in place"
Most of these issues face legacy code, because C++11 gained "move semantics" and C++17 gained "guaranteed copy elision" which obviate most of these cases.
In any new code, it's usually considered bad style or a code smell to use out parameters, and most likely an acquired habit that carried over from the past (when this was a more relevant technique.) It's not wrong, but one of those things we try to avoid if it's not strictly necessary.
There are several reasons why an out parameter might be used in a C++ codebase.
For example:
You have multiple outputs:
void compute(int a, int b, int &x, int &y) { x=a+b; y=a-b; }
You need the return value for something else: For example, in PEG parsing you might find something like this:
if (parseSymbol(pos,symbolName) && parseToken(pos,"=") && parseExpression(pos,exprNode)) {...}
where the parse functions look like
bool parseSymbol(int &pos, string &symbolName);
bool parseToken(int &pos, const char *token);
and so on.
To avoid object copies.
The programmer didn't know better.
But basically I think any answer is opinion based, because it's a matter of style and coding policy whether and how out parameters are used.
Here is my code.
#include <iostream>
class IService {
};
class X_Service {
public:
void service1() {
std::cout<< "Service1 Running..."<<std::endl;
}
};
int main() {
IService service;
auto func = reinterpret_cast<void (IService::*)()>(&X_Service::service1);
(service.*(func))();
return 0;
}
I don't understand how this works. I didn't inherit from IService and didn't create an X_Service object, but it works.
Can someone explain this?
Your confusion probably comes from the misunderstanding that because something compiles and runs without crashing, it "works". Which is not really true.
There are many ways you can break the rules of the language and still write code that compiles and runs. By using reinterpret_cast here and making an invalid cast you have broken the rules of the language, and your program has Undefined Behaviour.
That means it can seem to work, it can crash or it can just do something completely different from what you intended.
In your case it seems to work, but it's still UB and the code is not valid.
Under the hood your compiler will turn all these functions into machine code that is basically just jumps to certain addresses in memory and then executing commands stored there.
A member function is just a function that has, in addition to local variables and parameters, a piece of memory that stores the address of the class object. That piece of memory holds the address you are accessing when you use the this keyword.
If you call a member function on a wrong object or nullptr, then you basically just make the this pointer point to something invalid.
Your function doesn't access this, which is the reason your program doesn't blow up.
That said, this is still undefined behavior, and anything could happen.
So, I had some fun and manipulated the code a bit. This is also an empirical answer. There are a lot of pitfalls that risk stack corruption with this way of doing things, so I changed the code to a version where stack corruption does not occur but that still shows what is happening.
#include <iostream>
class IService {
public:
int x;
};
class X_Service {
public:
int x;
void service1() {
this->x = 65;
std::cout << this->x << std::endl;
}
};
int main() {
IService service;
auto func = reinterpret_cast<void (IService::*)()>(&X_Service::service1);
(service.*(func))();
std::cout << service.x << std::endl;
std::cin.get();
X_Service derp;
(derp.service1)();
std::cout << derp.x << std::endl;
return 0;
}
So from the outset, reinterpret_cast gave you the power to make a non-type-safe pointer of type void (IService::*)() (auto merely deduces that type). Also, the instance of the object itself is what this-> refers to, regardless of which class's member function you are "stealth inheriting". The only issue is that the first variable of the instance is interpreted according to the first variable of the class you are stealth inheriting from, which can lead to stack corruption if the types differ.
If you want cool output at the price of inevitable stack corruption, you can do the following fun thing.
class IService {
public:
char x;
};
Your IDE will detect stack corruption of your IService object, but getting that output of
65
A
is kind of worth it, but you will see that issues will arise doing this stealth inheritance.
I'm also on an x86 compiler, so basically my variables are all lined up. If, say, I add an int y above int x in IService, this program would output nonsense. Basically it only works because my classes are binary compatible.
When you reinterpret_cast a function or member function pointer to a different type, you are never allowed to call the resulting pointer except if you cast it back to its original type first and call through that.
Violating this rule causes undefined behavior. This means that you lose any language guarantee that the program will behave in any specific way, absent additional guarantees from your specific compiler.
reinterpret_cast is generally dangerous because it completely circumvents the type system. If you use it, you always need to verify yourself, by looking at the language rules, whether the cast and the way the result will be used are well-defined. reinterpret_cast tells the compiler that you know what you are doing and that you don't want any warning or error, even if the result will be nonsense.
I was surprised by the results I'm seeing in VC++ 2015, and need help understanding how it works.
struct MyType
{
MyType(int x_) : x(x_) { }
int x;
};
auto u = std::make_unique<MyType>(10);
void* pv = &u;
This obviously fails because u's address is not a pointer to MyType:
MyType *pM = (MyType*)pv;
But this works, pM2 gets the address of the MyType object stored in u:
MyType** ppM = (MyType**)pv;
MyType* pM2 = *ppM;
Is there anything in the standard that says this is supposed to work? Or is it only working due to a non-portable implementation detail of my compiler? Something that allows me to treat unique_ptr like a pointer-to-pointer in a round about way?
And before you say, "that's stupid, don't use void* or C-style casts", please understand that I'm working with legacy code that handles serialization of structs through void pointers and offsets to struct members. I can't change that part right now. But I want to use a unique_ptr for a struct member to simplify memory ownership and cleanup. And I'd like to know how fragile my unique_ptr is in this legacy environment.
This is basically just you getting lucky.
In the ABI of your particular compiler, the T* that stores the object maintained by the unique_ptr is the first member of the object, so it has the same address as the object itself. In much the same way as this example:
#include <cassert>
#include <cstdint>
struct container {
    int val;
};
int main() {
    container c{15};
    std::intptr_t val1 = reinterpret_cast<std::intptr_t>(&c);
    std::intptr_t val2 = reinterpret_cast<std::intptr_t>(&(c.val));
    assert(val1 == val2); //will pretty much always be true
}
Of course, this is not behavior you should depend on! It's unspecified by the Standard, and could change if the vendor decides they have a better format for storing pointers inside std::unique_ptr.
Essentially you are doing something like this:
std::unique_ptr<MyType> up = ...;
MyType* p = *reinterpret_cast<MyType**>(&up);
With some detours and C-style casts, you take the pointer to the unique_ptr and reinterpret it as a pointer to a pointer to MyType.
This is pure luck and results in undefined behavior; you shouldn't use this type of code for any reason. If you need the internal pointer, use the get() method on unique_ptr.
This is undefined behavior that happens to work because the unique pointer happens to store only a single pointer as its state, and that state is a pointer to T.
Undefined behavior can do anything, including time travel and format your hard drive. I know people say this and others think it is a joke, but these are actually true statements you can experimentally verify.
As it happens, your undefined behavior here has reinterpreted some memory in a way that "works".
You cannot serialize/deserialize non-POD structures in a defined way using your library. You can hack it to work, but any compiler update (even a compiler flag update!) could suddenly behave completely differently.
Consider having a structure for serialization/deserialization, and another for runtime use. Marshal from one to the other. Yes, this sucks.
The C++ language doesn't let you change a reference after it is assigned. However, I had a debugging need/desire to change the reference to help debug something. Is there a hacky way to basically overwrite the reference implementation with a new pointer? Once you get the address of the thing you want to change, you can cast it to whatever you want and overwrite it. I could not figure out how to get the memory address of the underlying reference instance; applying & to the reference doesn't give you the address of the reference itself, but the address of the object the reference refers to.
I realize this is obviously going to invoke undefined behavior, and this is just an experiment. A third party library has a bug with global reference that was not getting constructed before the code is exercised, and I want to see if I can fix it by setting the reference myself. At this point, it became a challenge to see if it is even possible. I know you can do this in assembly language, if you can reference the symbol table directly.
I imagine something like this. These are globally scoped variables.
Apple a;
Apple& ref = a;
Later I want ref to refer to a new object instance b and leave a alone.
Apple b;
ref = b; // that doesn't work. that just sets a=b.
&ref = &b; // that doesn't work. the compiler complains.
uint64_t addr = find_symbol_by_any_means_necessary(ref);
*(Apple**)addr = &b; // this should work if I could get addr
Please don't remind me this is a bad idea. I know it is a bad idea. Think of it as a challenge. This is for debug only, to test a hypothesis quickly. I want to learn something about the internals of C++ binary code. (Please tell me if it is impossible because of system page protection... I suppose you could get a seg fault if the references are placed in a holy place).
(The system is CentOS 7, compiler is Intel although I could use gcc for this experiment).
I don't think there is a way to re-direct the object that a standalone reference variable references.
If a reference is contained in a struct as a member variable, you can easily change the object the reference variable references. It's most likely UB but it works with my current version of g++, g++ 4.8.4.
Here's an example program that demonstrates a method.
#include <iostream>
#include <cstring>
struct Foo
{
int& ref;
};
int main()
{
int a = 10;
int b = 20;
Foo foo = {a}; // foo.ref is a reference to a
std::cout << foo.ref << std::endl;
// Use memcpy to change what foo.ref references
int* bPtr = &b;
std::memcpy(&foo, &bPtr, sizeof(bPtr));
// Now, foo.ref is a reference to b
std::cout << foo.ref << std::endl;
// Changing foo.ref changes b
foo.ref = 30;
std::cout << b << std::endl;
}
Output:
10
20
30
It's generally not possible, because the fact that a reference is not reseatable allows a lot of optimizations.
For instance, a reference is often implemented as a pointer, but the compiler may also notice that you often use a fixed offset. So besides storing the pointer, the compiler may decide to store the pointer plus offset. It may even decide to only store the pointer plus offset.
Another optimization is to store the reference as an address in a CPU register. Since it can't change, the compiler doesn't need to reload it.
So your statement that you can change it in assembly is rather misleading. You have no idea what the representation of the reference is after optimization, and this optimization will be situationally dependent.
I need a once-and-for-all clarification on passing by value/pointer/reference.
If I have a variable such as
int SomeInt = 10;
And I want to pass it to a function like
void DoSomething(int Integer)
{
Integer = 1;
}
In my current scenario, when passing SomeInt to DoSomething() I want SomeInt's value to be updated based on whatever we do to it inside DoSomething(), while being as efficient as possible in memory and performance so that I'm not copying the variable around. That being said, which of the following prototypes would accomplish this task?
void DoSomething(int* Integer);
void DoSomething(int& Integer);
How would I actually pass the variable into the function? What is the difference between the previous two prototypes?
Finally if using a function within a class
class SomeClass
{
int MyInteger;
public:
void ChangeValue(int& NewValue)
{
MyInteger = NewValue;
}
};
If I pass an integer into ChangeValue, when the integer I passed in gets deleted, will that mean that MyInteger is no longer usable when I try to use it from within the class?
Thank you all for your time, I know this is kind of a basic question but the explanations I keep running into confuse me further.
Functionally, all three of these work:
pass an int and change the return type to int so you can return the new value, usage: x = f(x);
when you plan to set the value without needing to read the initial value, it's much better to use a function like int DoSomething(); so the caller can just say int x = f(); without having to create x on an earlier line and wondering/worrying whether it needs to be initialised to anything before the call.
pass an int& and set it inside the function, usage: int x; x = ? /* if an input */; f(x);
pass an int* and set the pointed-to int inside the function, usage: int x; x = ?; f(&x);
most efficient on memory and performance so I'm not copying the variable around
Given the C++ Standard doesn't dictate how references should be implemented by the compiler, it's a bit dubious trying to reason about their characteristics. If you care, compile your code to assembly or machine code and see how it works out on your particular compiler (for specific compiler command-line options etc.). If you need a rule of thumb, assume that references have identical performance characteristics to pointers unless profiling or generated-code inspection suggests otherwise.
For an int you can expect the first version above to be no slower than the pointer version, and possibly be faster, because the int parameter can be passed and returned in a register without ever needing a memory address.
If/when/where the by-pointer version is inlined there's more chance that the potentially slow "needing a memory address so we can pass a pointer" / "having to dereference a pointer to access/update the value" aspect of the pass-by-pointer version can be optimised out (if you've asked the compiler to try), leaving both versions with identical performance....
Still, if you need to ask a question like this I can't imagine you're writing code where these are the important optimisation choices, so a better aim is to do what gives you the cleanest, most intuitive and robust usage for the client code. Whether that's x = f(x); (where you might forget the leading x =), or f(x) (where you might not realise x could be modified), or f(&x) (where some caller might think they can pass nullptr) is a reasonable question in its own right, but separate from your performance concerns. FWIW, the C++ FAQ Lite recommends references over pointers for this kind of situation, but I personally reject its reasoning and conclusions. It all boils down to familiarity with either convention, and how often you need to pass const pointer values, or pointer values where nullptr is a valid sentinel, that could be confused with the you-may-modify-me implication hoped for in your scenario. That depends a lot on your coding style, libraries you use, problem domain etc.
Both of your examples
void DoSomething(int* Integer);
void DoSomething(int& Integer);
will accomplish the task. In the first case - with pointer - you need to call the function with DoSomething(&SomeInt);, in the second case - with reference - simpler as DoSomething(SomeInt);
The recommended way is to use references whenever they are sufficient, and pointers only if they are necessary.
You can use either. Function call for first prototype would be
DoSomething(&SomeInt);
and for second prototype
DoSomething(SomeInt);
As was already said before, you can use both. The advantage of the
void DoSomething(int* Integer)
{
*Integer=0xDEADBEEF;
}
DoSomething(&myvariable);
pattern is that it becomes obvious from the call that myvariable is subject to change.
The advantage of the
void DoSomething(int& Integer)
{
Integer=0xDEADBEEF;
}
DoSomething(myvariable);
pattern is that the code in DoSomething is a bit cleaner, DoSomething has a harder time messing with memory in bad ways, and you might get better code out of it. The disadvantage is that it isn't immediately obvious from reading the call that myvariable might get changed.