C++ precise garbage collector using clang/llvm?

I want to write a precise 'mark and sweep' garbage collector in C++. To make this easier I have made a couple of decisions: every pointer will be wrapped in a 'RelocObject', and the heap will be a single block of memory. It looks something like this:
// This class acts as an indirection to the actual object in memory so that it can be
// relocated in the sweep phase of garbage collector
class MemBlock
{
public:
void* Get( void ) { return m_ptr; }
private:
MemBlock( void ) : m_ptr( NULL ){}
void* m_ptr;
};
// This is of the same size as the above class and is directly cast to it, but is
// typed so that we can easily debug the underlying object
template<typename _Type_>
class TypedBlock
{
public:
_Type_* Get( void ) { return m_pObject; }
private:
TypedBlock( void ) : m_pObject( NULL ){}
// Pointer to actual object in memory
_Type_* m_pObject;
};
// This is our wrapper class that every pointer is wrapped in
template< typename _Type_ >
class RelocObject
{
public:
RelocObject( void ) : m_pRef( NULL ) {}
static RelocObject New( void )
{
RelocObject ref( (TypedBlock<_Type_>*)Allocator()->Alloc( this, sizeof(_Type_), __alignof(_Type_) ) );
new ( ref.m_pRef->Get() ) _Type_();
return ref;
}
~RelocObject(){}
_Type_* operator-> ( void ) const
{
assert( m_pRef && "ERROR! Object is null\n" );
return (_Type_*)m_pRef->Get();
}
// Equality
bool operator ==(const RelocObject& rhs) const { return m_pRef->Get() == rhs.m_pRef->Get(); }
bool operator !=(const RelocObject& rhs) const { return m_pRef->Get() != rhs.m_pRef->Get(); }
RelocObject& operator= ( const RelocObject& rhs )
{
if(this == &rhs) return *this;
m_pRef = rhs.m_pRef;
return *this;
}
private:
RelocObject( TypedBlock<_Type_>* pRef ) : m_pRef( pRef )
{
assert( m_pRef && "ERROR! Can't construct a null object\n");
}
RelocObject* operator& ( void ) { return this; }
_Type_& operator* ( void ) const { return *(_Type_*)m_pRef->Get(); }
// SS:
TypedBlock<_Type_>* m_pRef;
};
// We would use it like so...
typedef RelocObject<Impl::Foo> Foo;
int main( void )
{
Foo foo = Foo::New();
}
So in order to find the 'root' RelocObjects, when I allocate in 'RelocObject::New' I pass the 'this' pointer of the RelocObject to the allocator (the garbage collector). The allocator then checks whether the 'this' pointer is within the range of the heap's memory block; if it is, I can assume it's not a root.
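The range test described above could look roughly like this. It is only a sketch: `HeapRange`, `Contains` and `IsRoot` are hypothetical names that do not appear in the code above, and it assumes the managed heap really is one contiguous block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical descriptor for the single contiguous heap block [begin, begin + size).
struct HeapRange
{
    const unsigned char* begin;
    std::size_t size;

    // True when p points into the managed block.
    bool Contains(const void* p) const
    {
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
        const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(begin);
        return addr >= lo && addr - lo < size;
    }
};

// A RelocObject that does not live inside the managed heap (stack, global, ...)
// is treated as a root.
inline bool IsRoot(const HeapRange& heap, const void* relocObjectAddress)
{
    return !heap.Contains(relocObjectAddress);
}
```

The comparison is done on integer addresses so that pointers into unrelated objects are never compared directly.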
So the issue comes when I want to trace from the roots through the child objects using the zero or more RelocObjects located inside each child object.
I want to find the RelocObjects in a class (ie a child object) using a 'precise' method. I could use a reflection approach and make the user Register where in each class his or her RelocObjects are. However this would be very error prone and so I'd like to do this automatically.
So instead I'm looking to use Clang to find the offsets of the RelocObjects within the classes at compile time and then load this information at program start and use this in the mark phase of the garbage collector to trace through and mark the child objects.
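To make the goal concrete, here is a minimal sketch of what the mark phase could do once such an offset table exists. Everything in it is an assumption made for illustration: `Node`, `Ref` and `kNodeRefOffsets` are hypothetical, `Ref` is a plain-pointer stand-in for RelocObject, and the table is written by hand with `offsetof` where a Clang-based tool would generate it.

```cpp
#include <cassert>
#include <cstddef>

struct Node;

// Stand-in for RelocObject: holds a plain pointer to keep the sketch short.
struct Ref
{
    Node* target = nullptr;
};

// Hypothetical managed type with two embedded references.
struct Node
{
    int value = 0;
    Ref left;
    Ref right;
    bool marked = false;
};

// The kind of table an offline tool would emit: byte offsets of every Ref in Node.
static const std::size_t kNodeRefOffsets[] = { offsetof(Node, left), offsetof(Node, right) };

// Mark phase: visit obj, then follow each reference slot found through the offset table.
inline void Mark(Node* obj)
{
    if (obj == nullptr || obj->marked)
    {
        return; // null, or already visited (this also terminates cycles)
    }
    obj->marked = true;
    unsigned char* base = reinterpret_cast<unsigned char*>(obj);
    for (std::size_t offset : kNodeRefOffsets)
    {
        Ref* slot = reinterpret_cast<Ref*>(base + offset);
        Mark(slot->target);
    }
}
```

A real collector would keep one such table per managed type and would probably use an explicit work list instead of recursion.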
So my question is: can Clang help? I've heard you can gather all kinds of type information during compilation using its compile-time hooks. If so, what should I look for in Clang, i.e. are there any examples of doing this kind of thing?
Just to be explicit: I want to use Clang to automatically find the offset of 'Foo' (which is a typedef of RelocObject) in FooB without the user providing any 'hints', i.e. they just write:
class FooB
{
public:
int m_a;
Foo m_ptr;
};
Thanks in advance for any help.

Whenever a RelocObject is instantiated, its address can be recorded in a RelocObject ownership database along with sizeof(*derivedRelocObject), which will immediately identify which Foo belongs to which FooB. You don't need Clang for that. Also, since Foo will be created shortly after FooB, your ownership database can be very simple: the order of the "I've been created, here's my address and size" calls will place the owning record directly before the records of the RelocObject instances it owns.
Each RelocObject has an ownership_been_declared flag initialized to false. On first use (which would be after the constructors have completed, since no real work should be done in the constructor), a newly created object asks the database to update its ownership. The database goes through its queue of recorded addresses, identifies which objects belong to which, removes the resolved entries from its list, and sets their ownership_been_declared flags to true. You will have the offsets too (if you still need them).
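A rough sketch of the address-and-size lookup at the heart of this idea is shown below. It is not the collector mentioned in the p.s.; `OwnerRecord`, `OwnershipDatabase`, `RegisterOwner` and `FindOffset` are hypothetical names, and the flag handling and queue clean-up are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One "I've been created, here's my address and size" record.
struct OwnerRecord
{
    const void* address;
    std::size_t size;
};

class OwnershipDatabase
{
public:
    void RegisterOwner(const void* address, std::size_t size)
    {
        owners_.push_back(OwnerRecord{address, size});
    }

    // Looks for the most recently registered owner whose bytes contain `member`.
    // On success, stores the member's byte offset inside that owner.
    bool FindOffset(const void* member, std::size_t& offsetOut) const
    {
        const std::uintptr_t m = reinterpret_cast<std::uintptr_t>(member);
        for (auto it = owners_.rbegin(); it != owners_.rend(); ++it)
        {
            const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(it->address);
            if (m >= lo && m - lo < it->size)
            {
                offsetOut = m - lo;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<OwnerRecord> owners_;
};
```

Searching from the newest record backwards reflects the observation that a member is created shortly after its owner.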
p.s. if you like I can share my code for an Incremental Garbage Collector I wrote many years ago, which you might find helpful.

Related

how to find and track objects and functions that using a dynamically allocated memory?

If we have code like this:
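#include <stdlib.h>
void g(int *n){
int m = n[1];
n = NULL; // only clears g's local copy of the pointer
}
int main(){
int *p = (int*)malloc(10 * sizeof(int));
int *q = p; // second pointer to the same block
g(p);
}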
We know that by overloading malloc, calloc, realloc, new, free and delete we can track the first pointer that is created or destroyed through these functions (p in the example above). But how can I find and track the other pointers and functions that use this allocated memory (q and g in the example above)? Should I overload the assignment statement and the function call? If so, how? In other words, I also want to know the live objects and the last time and place an allocation was used.
I want to implement a custom memory-leak detection tool, so I need to find all the objects and pointers that use an allocation before reporting whether or not it is a leak.

What you're talking about is called reference counting. Searching Stack Overflow for this topic gives you these results:
What is a smart pointer and when should I use one?
How does a reference-counting smart pointer's reference counting work?
what exactly reference counting in c++ means?
The C++ standard library (since C++11) already has std::shared_ptr, which does that.
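As a small illustration of how std::shared_ptr covers the p, q and g case from the question, the sketch below reads the owner count at a few points. `OwnersSeenByCallee` is a hypothetical helper, and a std::vector<int> stands in for the malloc'ed array.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Taking the shared_ptr by value makes the callee a temporary owner,
// so the count seen inside the function is one higher than outside.
inline long OwnersSeenByCallee(std::shared_ptr<std::vector<int>> n)
{
    return n.use_count();
}
```

With `auto p = std::make_shared<std::vector<int>>(10); auto q = p;` the count is 2, rises to 3 inside the call and falls back to 2 afterwards. Only copies of the shared_ptr are counted; a raw pointer obtained with get() is invisible to it.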
We need to keep track of the lifecycle of the resource, this is possible by controlling the creation, the copying and the deletion. If you do it without using a class, you'll end up with some functions, a state variable and a global variable. This is not very effective in keeping the concept of the shared resource in the focus of the user's mind. One tends to forget that one is using a shared resource because one sees a naked pointer and tends to use it like one, which would disregard the provided shared-pointer functionality.
If you were to encapsulate the functionality in a class, you would want to implement the concept for all types, i.e. you would want to use templates. One way would be this:
#include <vector>
#include <stdexcept>
#include <cstdlib>
#ifndef SHARED_PTR_H
#define SHARED_PTR_H
template<class Tres_t>
class shared_pointer
{
// members
Tres_t* data_;
static std::vector<std::size_t> ref_counts_;
std::size_t ref_index_ = {};
public:
// lifecycle
shared_pointer() = delete;
shared_pointer(const std::size_t& size)
: data_{nullptr}
{
data_ = static_cast<Tres_t*>(std::malloc(size * sizeof(Tres_t)));
ref_counts_.push_back(1);
ref_index_ = ref_counts_.size() - 1;
}
shared_pointer(const shared_pointer& rhs)
: data_{rhs.data_}, ref_index_{rhs.ref_index_}
{
if (ref_index_ >= ref_counts_.size())
{
throw std::runtime_error("shared_pointer_ctor_index_error");
}
++ref_counts_[ref_index_];
}
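shared_pointer(shared_pointer&& rhs)
: data_{rhs.data_}, ref_index_{rhs.ref_index_}
{
// rhs still decrements the count in its own destructor, so count this owner too
++ref_counts_[ref_index_];
}
shared_pointer& operator=(const shared_pointer& rhs)
{
if (rhs.ref_index_ >= ref_counts_.size())
{
throw std::runtime_error("shared_pointer_assign_index_error");
}
// acquire the new resource first so that self-assignment is safe
++ref_counts_[rhs.ref_index_];
release();
data_ = rhs.data_;
ref_index_ = rhs.ref_index_;
return *this;
}
shared_pointer& operator=(shared_pointer&& rhs)
{
// rhs is an lvalue inside this function, so this forwards to copy assignment
return *this = rhs;
}
~shared_pointer()
{
release();
}
private:
// drop one reference and free the block when the last owner goes away
void release()
{
if (ref_counts_[ref_index_] == 0)
{
return; // counting error; a destructor must not throw
}
--ref_counts_[ref_index_];
if (ref_counts_[ref_index_] == 0)
{
std::free(data_);
}
}
public: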
// main functionality
Tres_t* data()
{
return data_;
}
const Tres_t* data() const
{
return data_;
}
};
template<class Tres_t>
std::vector<std::size_t> shared_pointer<Tres_t>::ref_counts_ = {};
#endif
I've tested this code only rudimentarily so I'll leave it to you to test and improve it.

Enforce NULL checking in c++

My method can return some kind of pointer (for example boost::shared_ptr), and this pointer may be NULL. Is there any way to force users of my code to check whether it is empty or not?
An example of such a thing is Scala's Option container. Maybe Boost has something like boost::option?

You can do the following:
return a smart pointer type that throws an exception if it is accessed while set to NULL.
throw an exception instead of returning a NULL pointer
return a std::optional (or boost::optional) which expresses intent (i.e. "value may be missing") much better than a pointer
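A minimal sketch of the std::optional variant follows (it needs C++17). `FindUserName` and `DisplayName` are hypothetical functions invented for the example.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical lookup: only id 1 exists in this sketch.
inline std::optional<std::string> FindUserName(int id)
{
    if (id == 1)
    {
        return std::string("alice");
    }
    return std::nullopt;
}

// Callers have to unwrap the optional before they can use the value.
inline std::string DisplayName(int id)
{
    const std::optional<std::string> name = FindUserName(id);
    return name ? *name : std::string("<unknown>");
}
```

Note that this expresses intent and does not strictly enforce the check: dereferencing an empty optional with `*` is undefined behavior, while `value()` throws std::bad_optional_access.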

The usual solution is to wrap the return value in a class, which
contains a flag which is set if the pointer is checked or
copied, and whose destructor crashes if the flag wasn't set.
Something like:
#include <cassert>
#include <cstddef>
template <typename T>
class MustBeChecked
{
T* myValue;
mutable bool myHasBeenChecked;
public:
MustBeChecked( T* value )
: myValue( value )
, myHasBeenChecked( false )
{
}
MustBeChecked( MustBeChecked const& other )
: myValue( other.myValue )
, myHasBeenChecked( false )
{
other.myHasBeenChecked = true;
}
~MustBeChecked()
{
assert( myHasBeenChecked );
}
bool operator==( std::nullptr_t ) const
{
myHasBeenChecked = true;
return myValue == nullptr;
}
bool operator!=( std::nullptr_t ) const
{
myHasBeenChecked = true;
return myValue != nullptr;
}
operator T*() const
{
assert( myHasBeenChecked );
return myValue;
}
};
To be frank, I find this to be overkill in most cases. But I've
seen it used on some critical systems.

The reality here is that the callers of your function already have to check. If they try to access the shared pointer without checking, then a seg-fault is coming their way if the underlying pointer is NULL.
You don't specify if you're writing a library, or some code within a project. Nor do you specify any details of the context this code lives in -- all of these might decide which approach I'd take in this situation -- but broadly speaking, all of utnapistim's suggestions are good ones.

Very strange memory leak

I am running the following piece of code under the Marmalade SDK. I need to know if there's a "bug" in my code or in Marmalade:
template <class Return = void, class Param = void*>
class IFunction {
private:
static unsigned int counterId;
protected:
unsigned int id;
public:
//
static unsigned int getNewId() { return counterId++; }
template <class FunctionPointer>
static unsigned int discoverId(FunctionPointer funcPtr) {
typedef std::pair<FunctionPointer, unsigned int> FP_ID;
typedef std::vector<FP_ID> FPIDArray;
static FPIDArray siblingFunctions; // <- NOTE THIS
typename FPIDArray::iterator it = siblingFunctions.begin();
while (it != siblingFunctions.end()) {
if (funcPtr == it->first) return it->second; /// found
++it;
}
/// not found
unsigned int newId = getNewId();
siblingFunctions.push_back( FP_ID(funcPtr, newId) ); // <- NOTE THIS
return newId;
}
//
virtual ~IFunction() {}
bool operator<(const IFunction* _other) const {
if (this->id < _other->id) return true;
return false;
}
virtual Return call(Param) = 0;
};
Note that every time the template function discoverId is called for the 1st time with a new template argument, a static local array is created.
At program exit, the Marmalade memory manager complains that the memory reserved at this line :
siblingFunctions.push_back( FP_ID(funcPtr, newId) );
hasn't been freed. (The truth is that I don't empty the array, but how could I, I don't have access to it outside that function!).
Here is the catch : Marmalade complains only for the memory reserved at the very first call of this function! This function is called several times and with several different template parameters, but the complaining always occurs only for the memory reserved at the 1st call. This is the case even if I mix up the order of the various calls to this function. Memory reserved for every call after the 1st one is automatically freed - I have checked this out.
So, who's to blame now?

I don't know what "Marmalade" is (a quick search for the word, as expected, turned up a lot of irrelevant references), but your code doesn't have a resource leak with respect to the static FPIDArray siblingFunctions: this object is constructed the first time the function is called and is destroyed at some point after main() exits. I seem to recall that objects with static storage duration are destroyed in the reverse order of their construction, but I'm not sure whether this extends to function-local statics.
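The behaviour in question can be reproduced with a much smaller function template. In this sketch `Registry` is a hypothetical stand-in for discoverId: every instantiation owns a separate function-local static, which is constructed on the first call of that instantiation and destroyed during program termination, after main() has returned.

```cpp
#include <cassert>
#include <vector>

template <class T>
std::vector<T>& Registry()
{
    static std::vector<T> items; // constructed on the first call of this instantiation
    return items;
}
```

A leak checker that takes its snapshot before the static destructors have run might therefore still see the vector's buffer as allocated, even though it is released a moment later.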

boost::shared_?? for non-pointer resources

Basically I need to do reference counting on certain resources (like an integer index) that are not immediately equivalent to pointer/address semantics; I need to pass the resource around and call a certain custom function when the count reaches zero. Also, read/write access to the resource is not a simple pointer dereference but something more complex. I don't think boost::shared_ptr fits the bill here, but maybe I'm missing some other Boost class I might use?
Example of what I need to do:
struct NonPointerResource
{
NonPointerResource(int a) : rec(a) {}
int rec;
};
int createResource ()
{
data BasicResource("get/resource");
boost::shared_resource< NonPointerResource > r( BasicResource.getId() ,
boost::function< BasicResource::RemoveId >() );
TypicalUsage( r );
}
//when r goes out of scope, it will call BasicResource::RemoveId( NonPointerResource& ) or something similar
int TypicalUsage( boost::shared_resource< NonPointerResource > r )
{
data* d = access_object( r );
// do something with d
}

Allocate NonPointerResource on the heap and just give it a destructor as normal.
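A sketch of that suggestion is shown below. It uses std::shared_ptr where the question mentions boost::shared_ptr, and `ReleasedIds` is a hypothetical stand-in for the custom clean-up function (BasicResource::RemoveId in the question).

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the custom release function: records released ids.
inline std::vector<int>& ReleasedIds()
{
    static std::vector<int> ids;
    return ids;
}

struct NonPointerResource
{
    explicit NonPointerResource(int a) : rec(a) {}
    ~NonPointerResource() { ReleasedIds().push_back(rec); } // runs when the last owner goes away
    NonPointerResource(const NonPointerResource&) = delete;
    NonPointerResource& operator=(const NonPointerResource&) = delete;
    int rec;
};

inline int TypicalUsage(std::shared_ptr<NonPointerResource> r)
{
    return r->rec; // access goes through the wrapper object
}
```

Creating the resource with `std::make_shared<NonPointerResource>(7)` and letting every copy go out of scope runs the destructor exactly once.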

Maybe boost::intrusive_ptr could fit the bill. Here's a RefCounted base class and ancillary functions that I'm using in some of my code. Instead of delete ptr you can specify whatever operation you need.
struct RefCounted {
int refCount;
RefCounted() : refCount(0) {}
virtual ~RefCounted() { assert(refCount==0); }
};
// boost::intrusive_ptr expects the following functions to be defined:
inline
void intrusive_ptr_add_ref(RefCounted* ptr) { ++ptr->refCount; }
inline
void intrusive_ptr_release(RefCounted* ptr) { if (!--ptr->refCount) delete ptr; }
With that in place you can then have
boost::intrusive_ptr<DerivedFromRefCounted> myResource = ...

Here is a small example about the use of shared_ptr<void> as a counted handle. Preparing proper create/delete functions enables us to use shared_ptr<void> as any resource handle, in a sense. However, as you can see, since this is weakly typed, using it causes us some inconvenience.
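The technique can be sketched as follows. This is not the example the answer originally pointed to; `CreateHandle`, `HandleId` and `LiveHandleCount` are hypothetical names, and std::shared_ptr is used in place of the Boost class.

```cpp
#include <cassert>
#include <memory>

// Hypothetical bookkeeping so the sketch can show when the resource is released.
inline int& LiveHandleCount()
{
    static int count = 0;
    return count;
}

// Create function: the deleter is supplied here, so the handle type can stay void.
inline std::shared_ptr<void> CreateHandle(int id)
{
    ++LiveHandleCount();
    return std::shared_ptr<void>(new int(id), [](void* p) {
        --LiveHandleCount();
        delete static_cast<int*>(p);
    });
}

// The weak typing shows up here: the caller has to know what the handle really holds.
inline int HandleId(const std::shared_ptr<void>& handle)
{
    return *static_cast<const int*>(handle.get());
}
```

The deleter runs once, when the last copy of the handle is destroyed, which is where a custom release function would be called.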

C++ method that can/cannot return a struct

I have a C++ struct and a method:
struct Account
{
unsigned int id;
string username;
...
};
Account GetAccountById(unsigned int id) const { }
I can return an Account struct if the account exists, but what to do if there's no account?
I thought of having:
An "is valid" flag on the struct (so an empty one can be returned, with that set to false)
An additional "is valid" pointer (const string &id, int *is_ok) that's set if the output is valid
Returning an Account* instead, and returning either a pointer to a struct, or NULL if it doesn't exist?
Is there a best way of doing this?

You forgot the most obvious one, in C++:
bool GetAccountById(unsigned int id, Account& account);
Return true and fill in the provided reference if the account exists, else return false.
It might also be convenient to use the fact that pointers can be null, and having:
bool GetAccountById(unsigned int id, Account* account);
That could be defined to return true if the account id exists, but only (of course) to fill in the provided account if the pointer is non-null. Sometimes it's handy to be able to test for existence, and this saves having to have a dedicated method for only that purpose.
It's a matter of taste what you prefer having.
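A minimal sketch of the pointer variant, assuming the accounts live in a std::map keyed by id (`AccountStore` and `Add` are hypothetical and exist only to make the example self-contained):

```cpp
#include <cassert>
#include <map>
#include <string>

struct Account
{
    unsigned int id;
    std::string username;
};

// Hypothetical container used only to make the sketch self-contained.
class AccountStore
{
public:
    void Add(const Account& account) { accounts_[account.id] = account; }

    // Returns true when the id exists; fills *account only if the caller
    // passed a non-null pointer.
    bool GetAccountById(unsigned int id, Account* account) const
    {
        const std::map<unsigned int, Account>::const_iterator it = accounts_.find(id);
        if (it == accounts_.end())
        {
            return false;
        }
        if (account != nullptr)
        {
            *account = it->second;
        }
        return true;
    }

private:
    std::map<unsigned int, Account> accounts_;
};
```

Passing a null pointer turns the call into a pure existence test.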

From the options given I would return Account*. But returning a pointer may have some bad side effects on the interface.
Another possibility is to throw an exception when there is no such account. You may also try boost::optional.

You could also try the null object pattern.
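In its simplest form the pattern could look like the sketch below, which assumes that id 0 is never a real account. `NullAccount` and `IsNull` are hypothetical names, and the lookup only knows a single account.

```cpp
#include <cassert>
#include <string>

struct Account
{
    unsigned int id;
    std::string username;
    bool IsNull() const { return id == 0; }
};

// The shared "no such account" object; assumes id 0 is never a real account.
inline const Account& NullAccount()
{
    static const Account none{0, "<none>"};
    return none;
}

// Hypothetical lookup: only id 1 exists in this sketch.
inline const Account& GetAccountById(unsigned int id)
{
    static const Account alice{1, "alice"};
    return id == 1 ? alice : NullAccount();
}
```

Callers always receive a usable object and can ask IsNull() when the distinction matters.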

It depends how likely you think the non-existent account is going to be.
If it is truly exceptional - deep in the bowels of the internals of the banking system where the data is supposed to be valid - then maybe throw an exception.
If it is in a user-interface level, validating the data, then probably you don't throw an exception.
Returning a pointer means someone has to deallocate the allocated memory - that's messier.
Can you use a 'marker ID' (such as 0) to indicate 'invalid account'?

I would use Account* and add a documentation comment to the method stating that the return value can be NULL.

There are several methods.
1) Throw an exception. This is useful if you want GetAccountById to return the account by value and the use of exceptions fit your programming model. Some will tell you that exceptions are "meant" to be used only in exceptional circumstances. Things like "out of memory" or "computer on fire." This is highly debatable, and for every programmer you find who says exceptions are not for flow control you'll find another (myself included) who says that exceptions can be used for flow control. You need to think about this and decide for yourself.
Account GetAccountById(unsigned int id) const
{
if( account_not_found )
throw std::runtime_error("account not found");
}
2) Don't return an Account by value. Instead, return by pointer (preferably smart pointer), and return NULL when you didn't find the account:
boost::shared_ptr<Account> GetAccountById(unsigned int id) const
{
if( account_not_found )
return NULL;
}
3) Return an object that has a 'presence' flag indicating whether or not the data item is present. Boost.Optional is an example of such a device, but in case you can't use Boost here is a templated object that has a bool member that is true when the data item is present, and is false when it is not. The data item itself is stored in the value_ member. It must be default constructible.
template<class Value>
struct PresenceValue
{
PresenceValue() : present_(false) {};
PresenceValue(const Value& val) : present_(true), value_(val) {};
PresenceValue(const PresenceValue<Value>& that) : present_(that.present_), value_(that.value_) {};
template<class Conv> explicit PresenceValue(const Conv& conv) : present_(true), value_(static_cast<Value>(conv)) {};
PresenceValue<Value>& operator=(const PresenceValue<Value>& that) { present_ = that.present_; value_ = that.value_; return * this; }
template<class Compare> bool operator==(Compare rhs) const
{
if( !present_ )
return false;
return rhs == value_;
}
template<class Compare> bool operator==(const Compare* rhs) const
{
if( !present_ )
return false;
return rhs == value_;
}
template<class Compare> bool operator!=(Compare rhs) const { return !operator==(rhs); }
template<class Compare> bool operator!=(const Compare* rhs) const { return !operator==(rhs); }
bool operator==(const Value& rhs) const { return present_ && value_ == rhs; }
operator bool() const { return present_ && static_cast<bool>(value_); }
operator Value () const { return value_; }
void Reset() { value_ = Value(); present_ = false; }
bool present_;
Value value_;
};
For simplicity, I would create a typedef for Account:
typedef PresenceValue<Account> p_account;
...and then return this from your function:
p_account GetAccountById(...)
{
if( account_found )
return p_account(the_account); // this will set 'present_' to true and 'value_' to the account
else
return p_account(); // this will set 'present_' to false
}
Using this is straightforward:
p_account acct = GetAccountById(some_id);
if( acct.present_ )
{
// magic happens when you found the account
}

Another way besides returning a reference is to return a pointer. If the account exists, return its pointer. Else, return NULL.

There is yet another way similar to the "is valid" pattern. I am developing an application right now that has a lot of such stuff in it. But my IDs can never be less than 1 (they are all SERIAL fields in a PostgreSQL database) so I just have a default constructor for each structure (or class in my case) that initializes id with -1 and isValid() method that returns true if id is not equal to -1. Works perfectly for me.
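A short sketch of this approach, with a hypothetical lookup that only knows account 1:

```cpp
#include <cassert>
#include <string>

struct Account
{
    Account() : id(-1) {} // default-constructed accounts are invalid
    Account(int accountId, const std::string& name) : id(accountId), username(name) {}
    bool isValid() const { return id != -1; }
    int id;
    std::string username;
};

// Hypothetical lookup: only id 1 exists in this sketch.
inline Account GetAccountById(int id)
{
    if (id == 1)
    {
        return Account(1, "alice");
    }
    return Account();
}
```

This only works because real ids can never be -1, as is the case for the SERIAL fields mentioned above.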

I would do:
#include <cstddef>
#include <stdexcept>
#include <vector>
class Bank
{
public:
class Account {};
class AccountRef
{
public:
AccountRef(): m_account(NULL) {}
AccountRef(Account const& acc) : m_account(&acc) {}
bool isValid() const { return m_account != NULL; }
Account const& operator*() const { return *m_account; }
operator bool() const { return isValid(); }
private:
Account const* m_account;
};
Account const& GetAccountById(unsigned int id) const
{
if (id < m_accounts.size())
{ return m_accounts[id];
}
throw std::out_of_range("Invalid account ID");
}
AccountRef FindAccountById(unsigned int id) const
{
if (id < m_accounts.size())
{ return AccountRef(m_accounts[id]);
}
return AccountRef();
}
private:
std::vector<Account> m_accounts;
};
A method called get should always return (IMHO) the object asked for. If it does not exist then that is an exception. If there is the possibility that something may not exist then you should also provide a find method that can determine if the object exists so that a user can test it.
int main()
{
Bank Chase;
// Get a reference
// As the bank ultimately owns the account.
// You just want to manipulate it.
Bank::Account const& account = Chase.GetAccountById(1234);
// If there is the possibility the account does not exist then use find()
Bank::AccountRef ref = Chase.FindAccountById(12345);
if ( !ref )
{ // Report error
return 1;
}
Bank::Account const& anotherAccount = *ref;
}
Now I could have used a pointer instead of going to the effort of creating AccountRef. The problem with that is that pointers do not have ownership semantics and thus there is no true indication of who should own (and therefore delete) the pointer.
As a result I like to wrap pointers in some container that allows the user to manipulate the object only as I want them to. In this case the AccountRef does not expose the pointer so there is no opportunity for the user of AccountRef to actually try and delete the account.
Here you can check if AccountRef is valid and extract a reference to an account (assuming it is valid). Because the object contains only a pointer the compiler is liable to optimize this to the point that this is no more expensive than passing the pointer around. The benefit is that the user can not accidentally abuse what I have given them.
Summary: AccountRef has no real run-time cost. Yet provides type safety (as it hides the use of pointer).

I like to do a combination of what you suggest with the Valid flag and what someone else suggested with the null object pattern.
I have a base class called Status that I inherit from on objects that I want to use as return values. I'll leave most of it out of this discussion since it's a little more involved but it looks something like this
class Status
{
public:
Status(bool isOK=true) : mIsOK(isOK) {}
operator bool() const {return mIsOK;}
private:
bool mIsOK;
};
now you'd have
class Account : public Status
{
public:
Account() : Status(false) {}
Account(/*other parameters to initialize an account*/) : ...
...
};
Now if you create an account with no parameters:
Account A;
It's invalid. But if you create an account with data
Account A(id, name, ...);
It's valid.
You test for the validity with the operator bool.
Account A=GetAccountByID(id);
if (!A)
{
//whoa there! that's an invalid account!
}
I do this a lot when I'm working with math-types. For example, I don't want to have to write a function that looks like this
bool Matrix_Multiply(a,b,c);
where a, b, and c are matrices. I'd much rather write
c=a*b;
with operator overloading. But there are cases where a and b can't be multiplied so it's not always valid. So they just return an invalid c if it doesn't work, and I can do
c=a*b;
if (!c) //handle the problem.

boost::optional is probably the best you can do in a language so broken it doesn't have native variants.
