Concurrent_hash_map implementation throwing SIGSEGV - c++

I am trying to use tbb’s concurrent_hash_map to increase my application’s concurrent performance. Read up about it and implemented it according to my application but I am seeing crashes..
So, my application is a multi-threadedd application where I am storing pairs,the key is char* and the value is an integer. Pseudocode looks like this:
In .h file,
typedef tbb::concurrent_hash_map<const void*, unsigned, Hasher> tbb_concurrent_hash;
tbb_concurrent_hash concurrent_hash_table;
tbb_concurrent_hash::accessor write_lock;
tbb_concurrent_hash::const_accessor read_lock;
In .c file,
void storeName(const char * name) {
int id=0;
// This creates a pair object of name and index
std::pair object(name, 0);
// read_lock is a const accessor for reading. This find function searches for char* in the table and if not found, create a write_lock.
bool found = concurrent_hash_table.find(read_lock, name);
if (found == FALSE) {
concurrent_hash_table.insert(write_lock, name);
// Get integer id somehow.
id = somefunction();
write_lock->second = id;
write_lock.release();
} else {
// if the name is found in the table then get the value and release it later
id = read_lock->second;
read_lock.release();
}
}
As per my understanding, I am good with the implementation but as I said, there are multiple crashes coming when find returns me FALSE. Crash have traces of mutexs as well.

Your 'locks', i.e. accessors are declared global in .h file. so, basically you write to a shared scoped_lock instance... which logically leads to a data race. Accessors are like fused std::shared_ptr and std::scoped_lock classes in one, or simpler - a pointer for your result and a lock guard for the data it points. You don't want to use one global pointer from multiple threads. Declare them locally in a scope you want to have that access (and you'd not need to call .release() as well)
Another problem is the data race between find() and insert (). Two or more threads can decide that they have to insert since they found nothing. In this case, the first thread will insert the new element while other threads will return existing element because insert() acts as find() if there's existing element. The problem is that your code doesn't account for that.
I can see why you might want to double check using const_accessor as the read lock is more scalable. But instead, you might want to use bool insert( const_accessor& result, const value_type& value ); with read lock (const_accessor) and value_type instead of a key only, which will initialize the whole pair in the case when a new element is added.

Related

Generate a unique ID C++

I want to generate a unique identifier "ident" for my complex structure, how can i do that?
in my header, the complexe structure is:
struct Complexe {
float x;
float y;
static unsigned int ident;
};
void Init(Complexe&);
etc...
and in the cpp file, i need to attribute ident a unique int
Init(Complexe&z){
z.ident = 0;
z.y = 0;
z.x = 0;
};
May I recommend you std::hash?
std::size_t ident = std::hash<Complex>()(complexVar);
Writing it from memory but it should return you unique value (with very small chance of it being not) for each Complex type object.
Consider UUID, specifically uuid_generate on GNU/Linux, or (it seems) UuidCreate on Windows.
Generating a unique id is easy even if you want to write your own algorithm. Though your algorithm will need some thoughts if the environment you are working in is multi-threaded. In this case you will need to write a thread safe code. for example below code will generate a unique id and is also thread safe:
class Utility {
public :
static int getUniqueId();
};
int Utility::getUniqueId() {
static std::atomic<std::uint32_t> uid { 0 };
return ++uid;
}
One simple way is to just make a free-list to help you reuse IDs. So for instance you start at ID 0 and whenever you create a structure, you first check the free-list for any released IDs. It will be empty the first time through so you increment the ID counter to 1 and use that for your new structure. When you destroy a structure it's ID goes back on the free-list (which can be implemented as a stack), so that the next time you create a structure that ID will be reused. If the free-list runs out of IDs you just start incrementing the ID counter from where you left off....wash, rinse and repeat. The nice thing about this method is that you will never wrap integer range and accidentally use an already in use ID if your program ends up running a long time. The main down side is the extra storage for the stack, but the stack can always grow as you need it.

How can I calculate a hash/checksum/fingerprint of an object in c++?

How can I calculate a hash/checksum/fingerprint of an object in c++?
Requirements:
The function must be 'injective'(*). In other words, there should be no two different input objects, that return the same hash/checksum/fingerprint.
Background:
I am trying to come up with a simple pattern for checking whether or not an entity object has been changed since it was constructed. (In order to know which objects need to be updated in the database).
Note that I specifically do not want to mark the object as changed in my setters or anywhere else.
I am considering the following pattern: In short, every entity object that should be persisted, has a member function "bool is_changed()". Changed, in this context, means changed since the objects' constructor was called.
Note: My motivation for all this is to avoid the boilerplate code that comes with marking objects as clean/dirty or doing a member by member comparison. In other words, reduce risk of human error.
(Warning: psudo c++ code ahead. I have not tried compiling it).
class Foo {
private:
std::string my_string;
// Assume the "fingerprint" is of type long.
long original_fingerprint;
long current_fingerprint()
{
// *** Suggestions on which algorithm to use here? ***
}
public:
Foo(const std::string& my_string) :
my_string(my_string)
{
original_fingerprint = current_fingerprint();
}
bool is_changed() const
{
// If new calculation of fingerprint is different from the one
// calculated in the constructor, then the object has
// been changed in some way.
return current_fingerprint() != original_fingerprint;
}
void set_my_string(const std::string& new_string)
{
my_string = new_string;
}
}
void client_code()
{
auto foo = Foo("Initial string");
// should now return **false** because
// the object has not yet been changed:
foo.is_changed();
foo.set_my_string("Changed string");
// should now return **true** because
// the object has been changed:
foo.is_changed();
}
(*) In practice, not necessarily in theory (like uuids are not unique in theory).
You can use the CRC32 algorithm from Boost. Feed it with the memory locations of the data you want to checksum. You could use a hash for this, but hashes are cryptographic functions intended to guard against intentional data corruption and are slower. A CRC performs better.
For this example, I've added another data member to Foo:
int my_integer;
And this is how you would checksum both my_string and my_integer:
#include <boost/crc.hpp>
// ...
long current_fingerprint()
{
boost::crc_32_type crc32;
crc32.process_bytes(my_string.data(), my_string.length());
crc32.process_bytes(&my_integer, sizeof(my_integer));
return crc32.checksum();
}
However, now we're left with the issue of two objects having the same fingerprint if my_string and my_integer are equal. To fix this, we should include the address of the object in the CRC, since C++ guarantees that different objects will have different addresses.
One would think we can use:
process_bytes(&this, sizeof(this));
to do it, but we can't since this is an rvalue and thus we can't take its address. So we need to store the address in a variable instead:
long current_fingerprint()
{
boost::crc_32_type crc32;
void* this_ptr = this;
crc32.process_bytes(&this_ptr, sizeof(this_ptr));
crc32.process_bytes(my_string.data(), my_string.length());
crc32.process_bytes(&my_integer, sizeof(my_integer));
return crc32.checksum();
}
Such a function does not exist, at least not in the context that you are requesting.
The STL provides hash functions for basic types (std::hash), and you could use these to implement a hash function for your objects using any reasonable hashing algorithm.
However, you seem to be looking for an injective function, which causes a problem. Essentially, to have an injective function, it would be necessary to have an output of size greater or equal to that of the object you are considering, since otherwise (from the pigeon hole principle) there would be two inputs that give the same output. Given that, the most sensible option would be to just do a straight-up comparison of the object to some sort of reference object.

Prevent v8::Local value from being garbage collected

I have a function that stores the value of an argument to an std::vector<v8::Local<v8::Value>> property of a C++ class exposes as an ObjectWrap like this:
NAN_METHOD(MyObject::Write) {
MyObject* obj = Nan::ObjectWrap::Unwrap<MyObject>(info.This());
obj->data.push_back(info[0]);
}
However, when I try to read back the value from another C++ function, the value is lost, and becomes undefined.
I'm passing a number to MyObject::Write, and I can confirm info[0]->IsNumber() returns true before pushing it to the vector, however when reading it back, the value it not a number, and in fact returns false for all the types I tested using Is<Type> methods from v8::Value, but still returns true for BooleanValue().
My guess is that the variable is being garbage collected after MyObject::Write returns, however I have no idea how to prevent this from happening.
I'm currently trying to initialise the value as a Persistent value. I tried the following attempts without success:
Nan::CopyablePersistentTraits<v8::Value>::CopyablePersistent p;
Nan::Persistent<v8::Value> persistent(info[0]);
Nan::CopyablePersistentTraits::Copy(persistent, p);
And:
v8::Isolate *isolate = info.GetIsolate();
v8::Persistent<v8::Value, v8::CopyablePersistentTraits<v8::Value>> persistent(isolate, info[0]);
But getting tons of C++ errors.
I was running into problems untangling this mess myself. There's a lot of template stuff going on here that we both missed. Here was the solution I found most readable:
// Define the copyable persistent
v8::CopyablePersistentTraits<v8::Value>::CopyablePersistent p;
// Create the local value
auto val = v8::Local<v8::Value>::New(
v8::Isolate::GetCurrent(), //< Isolate required
v8::Integer::New(v8::Isolate::GetCurrent(), v) //< Isolate required
);
// Reset() is a TEMPLATE FUNCTION, you have to template it with the same
// template type parameter as the v8::Local you are passing
p.Reset<v8::Value>(v8::Isolate::GetCurrent(), val); //< Isolate required
By "info" I assume you are referring to a v8::FunctionCallbackInfo reference. If so the above code would collapse to the following:
void SomeFunc(v8::FunctionCallbackInfo<v8::Value>& info) {
v8::CopyablePersistentTraits<v8::Value>::CopyablePersistent p;
p.Reset<v8::Value>(info[0]);
}
Because the persistent is now copyable you can do things like store it inside a standard library container. This was my use case. This is an example of storing a value in a vector:
std::vector<v8::CopyablePersistentTraits<v8::Value>::CopyablePersistent> vect;
void AccumulateData(v8::FunctionCallbackInfo<v8::Value>& info) {
v8::CopyablePersistentTraits<v8::Value>::CopyablePersistent p;
p.Reset<v8::Value>(info[0]);
vect.push_back(p);
}
I hope this helps someone out there.
If you plan on storing v8 values in C++, you need to make them persistent instead of local so they're independent of handle scope and not garbage-collected when the handle scope is released.
Nan has version-independant wrappers for v8::Persistent and Co. Because of using inside std::vector<>, you'll also need to initialize Nan::Persistent with Nan::CopyablePersistentTraits so it becomes copyable (or make an own reference-counted container for it).

tbb concurrent hash map find & insert

I am currently using tbb's concurrent hash map to perform concurrent insertions into a hash map. Each key is a string and a value is a vector of integers. I would like to achieve the following: during insertions, if the key does not exist, I insert it and add the value to its vector. If it exists, I simply add the value to its vector.
After inspecting the tbb concurrent hash map API, I noticed that both the find and insert functions return booleans only. So how can I return a pointer to the key if it exists?
There are methods which require an accessor in theirs arguments. The accessor is basically a pointer coupled with a scoped_lock protecting concurrent access to the element. Without the lock, an element can be modified concurrently resulting in a data-race. So, never use a pointer to element in concurrent_hash_map directly (unless protected by the accessor).
Also, you don't need a find() method for your task since insert() methods create the element if it does not exist.
According to the Reference manual, the hash map has the following methods which will likely satisfy your needs:
bool insert( accessor& result, const Key& key ); // creates new element by default
bool insert( accessor& result, const value_type& value );// creates new element by copying
Here is an example:
{
hash_map_t::accessor a;
hash_map.insert( a, key ); // creates by default if not exists, acquires lock
a->second.my_vector.push_back( value ); // new or old entry, add to vector anyway
} // the accessor's lock is released here
During insertions, if the key does not exist then key inserted and added the value to its vector.
If it exists, return false and I simply add the value to its vector.
{
hash_map_t::accessor accessor;
bool result = hash_map.insert(accessor, std::make_pair(key, {value})); // creates by default if not exists, acquires lock
if(result == false)
accessor->second.push_back(value); // if key exists
} // the accessor's lock is released here

Crash using concurrent_unordered_map

I've got a concurrent_unordered_map. I use the insert function (and no other) to try to insert into the map concurrently. However, many times, this crashes deep in the insert function internals. Here is some code:
class ModuleBase {
public:
virtual Wide::Parser::AST* GetAST() = 0;
virtual ~ModuleBase() {}
};
struct ModuleContents {
ModuleContents() {}
ModuleContents(ModuleContents&& other)
: access(other.access)
, base(std::move(other.base)) {}
Accessibility access;
std::unique_ptr<ModuleBase> base;
};
class Module : public ModuleBase {
public:
// Follows Single Static Assignment form. Once it's been written, do not write again.
Concurrency::samples::concurrent_unordered_map<Unicode::String, ModuleContents> contents;
Wide::Parser::AST* GetAST() { return AST; }
Wide::Parser::NamespaceAST* AST;
};
This is the function I use to actually insert into the map. There is more but it doesn't touch the map, only uses the return value of insert.
void CollateModule(Parser::NamespaceAST* module, Module& root, Accessibility access_level) {
// Build the new module, then try to insert it. If it comes back as existing, then we discard. Else, it was inserted and we can process.
Module* new_module = nullptr;
ModuleContents m;
{
if (module->dynamic) {
auto dyn_mod = MakeUnique<DynamicModule>();
dyn_mod->libname = module->libname->contents;
new_module = dyn_mod.get();
m.base = std::move(dyn_mod);
} else {
auto mod = MakeUnique<Module>();
new_module = mod.get();
m.base = std::move(mod);
}
new_module->AST = module;
m.access = access_level;
}
auto result = root.contents.insert(std::make_pair(module->name->name, std::move(m)));
This is the root function. It is called in parallel from many threads on different inputs, but with the same root.
void Collater::Context::operator()(Wide::Parser::NamespaceAST* input, Module& root) {
std::for_each(input->contents.begin(), input->contents.end(), [&](Wide::Parser::AST* ptr) {
if (auto mod_ptr = dynamic_cast<Wide::Parser::NamespaceAST*>(ptr)) {
CollateModule(mod_ptr, root, Accessibility::Public);
}
});
}
I'm not entirely sure wtf is going on. I've got one bit of shared state, and I only ever access it atomically, so why is my code dying?
Edit: This is actually completely my own fault. The crash was in the insert line, which I assumed to be the problem- but it wasn't. It wasn't related to the concurrency at all. I tested the return value of result the wrong way around- i.e., true for value existed, false for value did not exist, whereas the Standard defines true for insertion succeeded- i.e., value did not exist. This mucked up the memory management significantly, causing a crash- although exactly how it led to a crash in the unordered_map code, I don't know. Once I inserted the correct negation, it worked flawlessly. This was revealed because I didn't properly test the single-threaded version before jumping the concurrent fence.
One possibility is that you are crashing because of some problem with move semantics. Is the crash caused by a null pointer dereference? That would happen if you inadvertently accessed an object (e.g., ModuleContents) after it's been moved.
It's also possible that the crash is the result of a concurrency bug. The concurrent_unordered_map is thread safe in the sense that insertion and retrieval are atomic. However, whatever you are storing inside it is not automatically protected. So if multiple threads retrieve the same ModuleContents object, they will share the AST tree that's inside a Module. I'm not sure which references are modifiable, since I don't see any const pointers or references. Anything that is shared and modifiable must be protected by some synchronization mechanism (for instance, locks).