Upfront disclosure: I think the entire thing is nonsense and "works" by chance, but I found this code and it seems to "work" for low-enough values of work (as in it does not crash when run, which doesn't mean much), and I don't get why.
The issue at hand is an extern "C" API exposed as a DLL/so, which is then called over FFI (by Python in this case), but the extern "C" code uses shared_ptr. And yet it moves.
The C++ code:
#include <memory>
extern "C" {
int make(std::shared_ptr<int> p) {
p = std::make_shared<int>(42);
return 0;
}
int get(std::shared_ptr<int> p) {
return *p;
}
}
the caller:
import ctypes
lib = ctypes.CDLL('lib.so')
p = ctypes.c_void_p()
lib.make(ctypes.byref(p))
print(lib.get(ctypes.byref(p)))
After building the C++ code as a shared library (named lib.so), the Python code runs fine and does print 42. This code was tested on macOS/ARM64 compiled with Clang, but the original code this was munged from reportedly works on Linux/ARM32 (compiled with GCC) and Windows/AMD64 (compiled with MSVC).
My working hypothesis is that in all these runtimes shared_ptr happens to have the object pointer as its first member, and the compilers decide to pass it by reference in order to avoid the copy (and thus incref/decref), so make writes the object pointer over Python's p, and writes the control block into space (maybe somewhere on the stack). When the shared pointer is freed the memory remains accessible (possibly because it's in a small object pool / page rather than being unmapped).
Then get does not need to touch the refcount (because, again, the argument is passed by reference), so it just double-derefs our pointer, which is a UAF, but the memory is still around and it works out.
Note: in the original there is no UAF because the shared_ptr is obtained from a longer-lived structure, so this simplified version is a touch worse than the original.
Some speculative facts:
A shared_ptr is often 16 bytes (on 64-bit architectures) while void* is 8 bytes. A shared_ptr contains a pointer to the object it refers to, then another pointer to a "control block" containing the refcount and destructor. (This is just one possible implementation of shared_ptr)
Overwriting memory doesn't have to immediately lead to a crash; often the memory allocator even allocates more memory than you ask for (e.g. it may round up to a multiple of 16 bytes)
Non-trivial class types are passed by reference. Yes, really. Just like you wrote & after the type. I'm not making this up. See for example the Itanium ABI on which a lot of ABIs are based. In order to simulate the parameter not being a reference, the caller makes a copy and then destroys it after the call. You didn't.
So, probably: You were meant to pass a reference to a shared_ptr to the make function. You did. The make function overwrote it with a real bona-fide shared_ptr. Then, instead of destroying the object like you were supposed to by the ABI (and thereby making it pretend to not be a reference), you passed the same reference to the get function which read the new value assigned inside make.
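For contrast, here is a minimal sketch of one way to expose a shared_ptr across an extern "C" boundary without relying on how the ABI passes class types. The function names handle_make, handle_get and handle_free are hypothetical and not from the original code; the idea is only that the caller holds an opaque void* and the shared_ptr itself lives on the heap.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// shared_ptr has a non-trivial copy constructor and destructor; under the
// Itanium C++ ABI such types are passed via the address of a caller-made copy.
static_assert(!std::is_trivially_copyable<std::shared_ptr<int>>::value,
              "shared_ptr<int> is not trivially copyable");

extern "C" {

// Hypothetical API: the caller only ever sees an opaque void* handle.
void* handle_make(int value) {
    // The shared_ptr object itself is heap-allocated, so it outlives this call.
    return new std::shared_ptr<int>(std::make_shared<int>(value));
}

int handle_get(void* handle) {
    assert(handle != nullptr);
    return **static_cast<std::shared_ptr<int>*>(handle);
}

void handle_free(void* handle) {
    // Destroys the shared_ptr, dropping its reference to the int.
    delete static_cast<std::shared_ptr<int>*>(handle);
}

}  // extern "C"
```

On the Python side, ctypes would need restype set to ctypes.c_void_p for handle_make and argtypes set to [ctypes.c_void_p] for the other two functions, since ctypes otherwise assumes C int and can truncate a 64-bit pointer.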
I have inherited a pile of C++ code, saturated with std::shared_ptrs, most [all?] of which are unnecessary and are, I suspect, degrading performance. It is not feasible to change everything in one go, so I'm changing the code in tranches and doing performance tests.
An issue I'm running into is the interface in the method hierarchy between my new "raw" pointer code, and the underlying shared_ptr stuff. A contrived example (I know it could be simplified...):
SomeObject *MyClass::GetSomeObject(const std::string& aString)
{
//for the underlying shared pointer methods
std::shared_ptr<std::string> tmpString = std::make_shared<std::string>(aString);
//call the method using my local shared pointer
std::shared_ptr<SomeObject> someObj = GetTheObject(tmpString);
//The line below gives compiler warning: "The pointer points to memory allocated on the stack"
return someObj.get(); // a pointer to an object in std::map
}
I know that GetTheObject() is returning a pointer to an object in a persistent std::map, so that memory will be in good standing after we exit GetSomeObject() and the shared pointer [and its wrapped raw pointer] have gone out of scope.
I'm not in the habit of ignoring warnings, SO:
Questions:
Is the warning because the compiler is worried about the scope of the shared pointer rather than the object pointed to? [i.e. could I ignore it in this instance?]
If it is a real problem, are there any neat ways around this (that do not involve building wrapper classes and such workarounds...) ?
If I understand you correctly, you're replacing smart pointers with dumb pointers, in 2021, and you're now facing the exact problem that smart pointers intended to solve.
The warning is 100% accurate, and I'm pleasantly surprised the compiler looked deep enough.
The solution is simple: return a shared_ptr<SomeObject>. If you want efficiency improvements, there are two real improvements possible. First, C++11 introduced move constructors, and moving a shared_ptr is faster than copying it. The compiler will use the move ctor for return someObj; since someObj goes out of scope.
Secondly, shared_ptr is a heavy-weight alternative to unique_ptr. At times, you may be able to downgrade to the latter.
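A minimal sketch of the "return a shared_ptr" suggestion, assuming the objects live in a std::map of shared_ptrs. ObjectStore and Add are hypothetical stand-ins for the real class and its setup code, which the question does not show.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

struct SomeObject {
    int value = 0;
};

// Hypothetical stand-in for the class that owns the persistent std::map.
class ObjectStore {
public:
    void Add(const std::string& key, int value) {
        auto obj = std::make_shared<SomeObject>();
        obj->value = value;
        objects_[key] = std::move(obj);
    }

    // Returns shared ownership instead of a raw pointer.
    std::shared_ptr<SomeObject> GetSomeObject(const std::string& key) const {
        auto it = objects_.find(key);
        if (it == objects_.end()) {
            return nullptr;  // empty shared_ptr: "no such object"
        }
        std::shared_ptr<SomeObject> someObj = it->second;  // one copy: refcount + 1
        return someObj;  // moved or elided, no further refcount traffic
    }

private:
    std::map<std::string, std::shared_ptr<SomeObject>> objects_;
};
```

The caller's copy keeps the object alive even if the map entry is erased later, which is exactly the guarantee a raw pointer from get() cannot give.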
I have similar code in my project. I agree that the proper solution is probably just to commit fully to the smart pointers and use them properly. However, I don't want to churn through piles of perfectly functional code, but I also want the warnings to go away. I was able to work around the warning with something like:
SomeObject *MyClass::GetSomeObject(const std::string& aString)
{
//for the underlying shared pointer methods
std::shared_ptr<std::string> tmpString = std::make_shared<std::string>(aString);
//call the method using my local shared pointer
std::shared_ptr<SomeObject> someObj = GetTheObject(tmpString);
SomeObject *pRet = someObj.get();
return pRet; // a pointer to an object in std::map
}
I'm a little worried that at some point the compiler will get smarter and detect that as a warning as well, but it seems OK for now. (Visual Studio 2022 v17.1) Hope that helps!
According to here, VC++ 2013 supports Minimal GC.
Could you guys give me some examples to illustrate its usage?
In other words, with VC++ 2013, how to use GC?
The code example I want might look like this:
auto p = gcnew int;
Are there any?
You may be disappointed by what Minimal GC in C++11 is: it doesn't do garbage collection! The minimal garbage collection support in C++11 consists of two parts:
There is a mandate for everybody not to "hide" pointers. When you have a pointer you are not allowed to obfuscate this pointer to the system, e.g., by writing it to a file to be read later or by using the xor-trick to create a doubly linked list while storing just one pointer. The standard speaks about safely derived pointers (the relevant clause is 3.7.4.3 [basic.stc.dynamic.safety]).
The standard C++ library provides a set of interfaces which can be used to identify pointers which can't be tracked as being reachable or, once they are no longer reachable to say so. That is, you can define a set of root objects which are considered to be usable and shouldn't be considered released by any garbage collection system.
There is, however, nothing standardized which actually makes use of these facilities. Of course, the absence of a standard collector doesn't mean that these promises and interfaces go unused.
The relevant functions for the API outlined above are defined in 20.6.4 [util.dynamic.safety] and the header to include is <memory>. The functions are, briefly:
void std::declare_reachable(void* p) states that, if p is a non-null pointer, the object p points to is reachable even if a garbage collector has decided that it isn't. The function may allocate memory and, thus, throw.
template <typename T> T* std::undeclare_reachable(T* p) stating that if p is a non-null pointer that p is no longer reachable. The number of calls to undeclare_reachable(p) shall not exceed the number of calls to declare_reachable(p) with the same pointer.
void std::declare_no_pointers(char* p, size_t n) declares that the range of n bytes starting at p does not contain any pointers, even if a garbage collector has decided that there would be pointers inside.
void std::undeclare_no_pointers(char* p, size_t n) undoes the declaration that there are no pointers in the n bytes starting at p.
std::pointer_safety std::get_pointer_safety() noexcept returns a std::pointer_safety value indicating whether the implementation has strict pointer safety.
I think that all of these functions can basically be implemented to do nothing and return a default value, or an argument where a return type is specified. The point of these functions is that there is a portable way to inform garbage collectors about pointers to consider reachable and memory areas not to trace.
In the future some level of garbage collection or, more likely, litter collection may be added, but I'm not sure if there is a concrete proposal on the table. If something is added, it will probably be something dubbed litter collection, because it doesn't actually clean up all garbage: litter collection would just reclaim the memory of unreachable objects but not try to destroy the objects! That is, the system would give a view of an indefinitely living object although it may reuse the memory where it was located.
I am having my first attempt at using C++11 unique_ptr; I am replacing a polymorphic raw pointer inside a project of mine, which is owned by one class, but passed around quite frequently.
I used to have functions like:
bool func(BaseClass* ptr, int other_arg) {
bool val;
// plain ordinary function that does something...
return val;
}
But I soon realized that I wouldn't be able to switch to:
bool func(std::unique_ptr<BaseClass> ptr, int other_arg);
Because the caller would have to hand ownership of the pointer over to the function, which I don't want. So, what is the best solution to my problem?
I thought of passing the pointer by reference, like this:
bool func(const std::unique_ptr<BaseClass>& ptr, int other_arg);
But I feel very uncomfortable doing so: firstly, because it seems counterintuitive to pass something already typed as _ptr by reference, which would be a reference to a reference; secondly, because the function signature gets even bigger; thirdly, because the generated code would need two consecutive pointer indirections to reach my variable.
If you want the function to use the pointee, pass a reference to it. There's no reason to tie the function to work only with some kind of smart pointer:
bool func(BaseClass& base, int other_arg);
And at the call site use operator*:
func(*some_unique_ptr, 42);
Alternatively, if the base argument is allowed to be null, keep the signature as is, and use the get() member function:
bool func(BaseClass* base, int other_arg);
func(some_unique_ptr.get(), 42);
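Both alternatives can be put side by side in a small self-contained sketch. The class bodies, the weight() member and the return expressions are invented for illustration; only the two func signatures and the call-site forms come from the answer.

```cpp
#include <memory>

struct BaseClass {
    virtual ~BaseClass() = default;
    virtual int weight() const { return 1; }  // hypothetical member
};

struct Derived : BaseClass {
    int weight() const override { return 2; }
};

// Borrows the pointee by reference: the argument can never be null.
bool func(BaseClass& base, int other_arg) {
    return base.weight() * other_arg > 0;
}

// Borrows the pointee by raw pointer: null means "no object".
bool func(BaseClass* base, int other_arg) {
    return base != nullptr && func(*base, other_arg);
}
```

Neither overload knows or cares that the caller happens to own the object through a std::unique_ptr, so ownership stays where it was.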
The advantage of using std::unique_ptr<T> (aside from not having to remember to call delete or delete[] explicitly) is that it guarantees that a pointer is either nullptr or it points to a valid instance of the (base) object. I will come back to this after I answer your question, but the first message is DO use smart pointers to manage the lifetime of dynamically allocated objects.
Now, your problem is actually how to use this with your old code.
My suggestion is that if you don't want to transfer or share ownership, you should always pass references to the object. Declare your function like this (with or without const qualifiers, as needed):
bool func(BaseClass& ref, int other_arg) { ... }
Then the caller, which has a std::shared_ptr<BaseClass> ptr, will either handle the nullptr case or it will ask bool func(...) to compute the result:
if (ptr) {
result = func(*ptr, some_int);
} else {
/* the object was, for some reason, either not created or destroyed */
}
This means that any caller has to promise that the reference is valid and that it will continue to be valid throughout the execution of the function body.
Here is the reason why I strongly believe you should not pass raw pointers or references to smart pointers.
A raw pointer is only a memory address. It can have one of (at least) 4 meanings:
1. The address of a block of memory where your desired object is located. (the good)
2. The address 0x0, which you can be certain is not dereferenceable and might have the semantics of "nothing" or "no object". (the bad)
3. The address of a block of memory which is outside of the addressable space of your process (dereferencing it will hopefully cause your program to crash). (the ugly)
4. The address of a block of memory which can be dereferenced but which doesn't contain what you expect. Maybe the pointer was accidentally modified and now it points to another writable address (of a completely different variable within your process). Writing to this memory location will cause lots of fun to happen, at times, during the execution, because the OS will not complain as long as you are allowed to write there. (Zoinks!)
Correctly using smart pointers alleviates the rather scary cases 3 and 4, which are usually not detectable at compile time and which you generally only experience at runtime when your program crashes or does unexpected things.
Passing smart pointers as arguments has two disadvantages: you cannot change the const-ness of the pointed object without making a copy (which adds overhead for shared_ptr and is not possible for unique_ptr), and you are still left with the second (nullptr) meaning.
I marked the second case as (the bad) from a design perspective. This is a more subtle argument about responsibility.
Imagine what it means when a function receives a nullptr as its parameter. It first has to decide what to do with it: use a "magical" value in place of the missing object? change behavior completely and compute something else (which doesn't require the object)? panic and throw an exception? Moreover, what happens when the function takes 2, or 3 or even more arguments by raw pointer? It has to check each of them and adapt its behavior accordingly. This adds a whole new level on top of input validation for no real reason.
The caller should be the one with enough contextual information to make these decisions, or, in other words, the bad is less frightening the more you know. The function, on the other hand, should just take the caller's promise that the memory it is given is safe to work with as intended. (References are still memory addresses, but conceptually represent a promise of validity.)
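The const-ness point made above can be checked at compile time. This is a small sketch, with read_only and the value member as hypothetical names: a function taking the pointee by const reference gets const for free, while a borrowed unique_ptr<BaseClass> cannot be viewed as a unique_ptr<const BaseClass>, and a shared_ptr can only get there through a copy.

```cpp
#include <memory>
#include <type_traits>

struct BaseClass {
    int value = 1;  // hypothetical member
};

// Taking the pointee by const reference adds const at the call boundary.
int read_only(const BaseClass& base) {
    return base.value;
}

// A borrowed unique_ptr<BaseClass> cannot become a unique_ptr<const BaseClass>:
// the only converting constructor requires an rvalue, i.e. giving up ownership.
static_assert(!std::is_convertible<const std::unique_ptr<BaseClass>&,
                                   std::unique_ptr<const BaseClass>>::value,
              "no const-adding view of a borrowed unique_ptr");

// A shared_ptr can be converted, but only by constructing a new shared_ptr,
// which touches the reference count.
static_assert(std::is_convertible<const std::shared_ptr<BaseClass>&,
                                  std::shared_ptr<const BaseClass>>::value,
              "shared_ptr converts by copy");
```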
I agree with Martinho, but I think it is important to point out the ownership semantics of a pass-by-reference. I think the correct solution is to use a simple pass-by-reference here:
bool func(BaseClass& base, int other_arg);
The commonly accepted meaning of a pass-by-reference in C++ is as if the caller of the function tells the function "here, you can borrow this object, use it, and modify it (if not const), but only for the duration of the function body." This is in no way in conflict with the ownership rules of the unique_ptr, because the object is merely being borrowed for a short period of time; there is no actual ownership transfer happening (if you lend your car to someone, do you sign the title over to him?).
So, even though it might seem bad (design-wise, coding practices, etc.) to pull the reference (or even the raw pointer) out of the unique_ptr, it actually is not, because it is perfectly in accordance with the ownership rules set by the unique_ptr. And then, of course, there are other nice advantages, like clean syntax, no restriction to only objects owned by a unique_ptr, and so on.
Personally, I avoid pulling a reference from a pointer/smart pointer. Because what happens if the pointer is nullptr? If you change the signature to this:
bool func(BaseClass& base, int other_arg);
You might have to protect your code from null pointer dereferences:
if (the_unique_ptr)
func(*the_unique_ptr, 10);
If the class is the sole owner of the pointer, the second of Martinho's alternatives seems more reasonable:
func(the_unique_ptr.get(), 10);
Alternatively, you can use std::shared_ptr. However, if there's one single entity responsible for delete, the std::shared_ptr overhead does not pay off.
I have been trying to find a reason why this does not work in my code - I think this should work. Here is an excerpt from a header file:
#define WARN_UNUSED __attribute__((warn_unused_result))
class Trans {
Vector GetTranslation() const WARN_UNUSED {
return t;
}
};
So my question is: why don't I get a warning when I compile code with something like:
Gt.GetTranslation();
?
Thanks for the help.
This attribute is intended mainly (but not exclusively) for functions that return pointers to dynamically allocated data.
It gives a compile-time guarantee that the calling code will store the pointer in a variable (maybe also pass it as a parameter to a function, but I'm not certain of that) and thereby delegates to the caller the responsibility of freeing/releasing/deleting the object it points to.
This is in order to prevent memory leaks and/or other lifetime problems.
For instance, if you call malloc( ... ) without storing the pointer, you are not able to free it afterwards. (malloc should have this attribute.)
If you use it on a function returning an object, then the mechanism is meaningless, because the returned object is stored in a temporary, may be copied to a non-temporary variable (which might be optimized out), and will always be destructed.
BTW, it's not particularly useful for returned references (unless your code is aware of it and requires some kind of release mechanism), since the referenced object doesn't get destructed when the reference goes out of scope.
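For reference, a minimal sketch of the attribute on a separate declaration, placed before the declarator, assuming GCC or Clang (the macro expands to nothing elsewhere). The function compute_offset is hypothetical, and this sketch does not settle whether a particular compiler version warns for class-type return values such as Vector.

```cpp
#if defined(__GNUC__)
#define WARN_UNUSED __attribute__((warn_unused_result))
#else
#define WARN_UNUSED  // assumption: no equivalent configured for other compilers
#endif

// The attribute sits on the declaration, ahead of the return type.
WARN_UNUSED int compute_offset(int base);

int compute_offset(int base) {
    return base + 1;
}
```

With this in place, a bare statement such as compute_offset(1); is the kind of call the attribute is meant to flag, while int r = compute_offset(1); is not.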
I have an object which implements reference counting mechanism. If the number of references to it becomes zero, the object is deleted.
I found that my object is never deleted, even when I am done with it. This is leading to memory overuse. All I have is the number of references to the object and I want to know the places which reference it so that I can write appropriate cleanup code.
Is there some way to accomplish this without having to grep in the source files? (That would be very cumbersome.)
A huge part of getting reference counting (refcounting) done correctly in C++ is to use Resource Acquisition Is Initialization (RAII) so it's much harder to accidentally leak references. However, this doesn't solve everything with refcounts.
That said, you can implement a debug feature in your refcounting which tracks what is holding references. You can then analyze this information when necessary, and remove it from release builds. (Use a configuration macro similar in purpose to how DEBUG macros are used.)
Exactly how you should implement it is going to depend on all your requirements, but there are two main ways to do this (with a brief overview of differences):
1. Store the information on the referenced object itself. This is accessible from your debugger and easier to implement.
2. Output to a special trace file every time a reference is acquired or released. The trace is still available after the program exits (even abnormally), it is possible to use while the program is running without running in your debugger, and it can be used even in special release builds and sent back to you for analysis.
The basic problem, of knowing what is referencing a given object, is hard to solve in general, and will require some work. Compare: can you tell me every person and business that knows your postal address or phone number?
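The first option, storing the information on the object itself, might look roughly like this. It is a sketch under assumptions: the class name Tracked, the string holder tags and the holders() accessor are all hypothetical, and a real version would delete the object at zero and compile the bookkeeping out of release builds with a configuration macro.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical refcounted class that remembers who holds each reference.
class Tracked {
public:
    void addRef(const std::string& holder) {
        ++count_;
        ++holders_[holder];  // debug bookkeeping: references per holder tag
    }

    // Returns true when the last reference is gone; a real implementation
    // would typically delete the object at that point.
    bool release(const std::string& holder) {
        assert(count_ > 0);
        auto it = holders_.find(holder);
        if (it != holders_.end() && --it->second == 0) {
            holders_.erase(it);
        }
        return --count_ == 0;
    }

    int count() const { return count_; }

    // Inspect this from a debugger, or dump it at shutdown, to see who leaked.
    const std::map<std::string, int>& holders() const { return holders_; }

private:
    int count_ = 0;
    std::map<std::string, int> holders_;
};
```

Whatever is still listed in holders() when you expected the count to be zero names the code that forgot to release.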
One known weakness of reference counting is that it does not work when there are cyclic references, i.e. (in the simplest case) when one object has a reference to another object which in turn has a reference to the former object. This sounds like a non-issue, but in data structures such as binary trees with back-references to parent nodes, there you are.
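The parent back-reference case can be sketched with the standard smart pointers rather than a hand-written refcount (a swapped-in technique, not the asker's code). Node and its alive counter are hypothetical; the point is that the back-reference is a non-owning std::weak_ptr, so it does not keep the parent alive.

```cpp
#include <memory>

struct Node {
    static int alive;              // live-instance counter, for demonstration only
    std::shared_ptr<Node> child;   // owning reference, parent -> child
    std::weak_ptr<Node> parent;    // non-owning back-reference, child -> parent

    Node() { ++alive; }
    ~Node() { --alive; }
};

int Node::alive = 0;

// Builds a parent/child pair, lets it go out of scope, and reports how many
// nodes survived.
int alive_after_building_tree() {
    {
        auto root = std::make_shared<Node>();
        root->child = std::make_shared<Node>();
        root->child->parent = root;  // back-reference does not bump the owner count
    }
    return Node::alive;
}
```

If parent were a std::shared_ptr<Node> instead, the two nodes would keep each other alive and neither destructor would ever run.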
If you don't explicitly provide for a list of "reverse" references in the referenced (un-freed) object, I don't see a way to figure out who is referencing it.
In the following suggestions, I assume that you don't want to modify your source, or if so, just a little.
You could of course walk the whole heap / freestore and search for the memory address of your un-freed object, but if its address turns up, it's not guaranteed to actually be a memory address reference; it could just as well be any random floating point number, or anything else. However, if the found value lies inside a block of memory that your application allocated for an object, chances improve a little that it's indeed a pointer to another object.
One possible improvement over this approach would be to modify the memory allocator you use -- e.g. your global operator new -- so that it keeps a list of all allocated memory blocks and their sizes. (In a complete implementation of this, operator delete would have to remove the list entry for the freed block of memory.) Now, at the end of your program, you have a clue where to search for the un-freed object's memory address, since you have a list of memory blocks that your program actually used.
The above suggestions don't sound very reliable to me, to be honest; but maybe defining a custom global operator new and operator delete that does some logging / tracing goes in the right direction to solve your problem.
I am assuming you have some class with say addRef() and release() member functions, and you call these when you need to increase and decrease the reference count on each instance, and that the instances that cause problems are on the heap and referred to with raw pointers. The simplest fix may be to replace all pointers to the controlled object with boost::shared_ptr. This is surprisingly easy to do and should enable you to dispense with your own reference counting - you can just make those functions I mentioned do nothing. The main change required in your code is in the signatures of functions that pass or return your pointers. Other places to change are in initializer lists (if you initialize pointers to null) and if()-statements (if you compare pointers with null). The compiler will find all such places after you change the declarations of the pointers.
If you do not want to use the shared_ptr - maybe you want to keep the reference count intrinsic to the class - you can craft your own simple smart pointer just to deal with your class. Then use it to control the lifetime of your class objects. So for example, instead of pointer assignment being done with raw pointers and you "manually" calling addRef(), you just do an assignment of your smart pointer class which includes the addRef() automatically.
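A minimal sketch of such a hand-rolled smart pointer, assuming the controlled class exposes addRef() and release() as described. RefPtr and Counted are hypothetical names, and Counted::release() here only decrements so the example stays easy to follow; a real class would delete itself at zero.

```cpp
#include <utility>

// Hypothetical intrusive smart pointer: calls addRef()/release() for you.
template <typename T>
class RefPtr {
public:
    RefPtr() = default;
    explicit RefPtr(T* p) : p_(p) { if (p_) p_->addRef(); }
    RefPtr(const RefPtr& other) : p_(other.p_) { if (p_) p_->addRef(); }
    // Copy-and-swap: the by-value parameter takes the new reference, and its
    // destructor releases the old one.
    RefPtr& operator=(RefPtr other) {
        std::swap(p_, other.p_);
        return *this;
    }
    ~RefPtr() { if (p_) p_->release(); }

    T* get() const { return p_; }
    T* operator->() const { return p_; }
    T& operator*() const { return *p_; }

private:
    T* p_ = nullptr;
};

// Hypothetical class with an intrusive reference count.
struct Counted {
    int refs = 0;
    void addRef() { ++refs; }
    void release() { --refs; }  // a real class would `delete this` at zero
};
```

Because every copy, assignment and destruction goes through RefPtr, there is no place left where a manual addRef() or release() call can be forgotten.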
I don't think it's possible to do something without code change. With code change you can for example remember the pointers of the objects which increase reference count, and then see what pointer is left and examine it in the debugger. If possible - store more verbose information, such as object name.
I have created one for my needs. You can compare your code with this one and see what's missing. It's not perfect but it should work in most of the cases.
http://sites.google.com/site/grayasm/autopointer
when I use it I do:
util::autopointer<A> aptr=new A();
I never do it like this:
A* ptr = new A();
util::autopointer<A> aptr = ptr;
and then start fooling around with ptr later; that's not allowed.
Further I am using only aptr to refer to this object.
If I am wrong I have now the chance to get corrections. :) See ya!