Boost, Shared Memory and Vectors - C++

I need to share a stack of strings between processes (and possibly more complex objects in the future). I've decided to use boost::interprocess, but I can't get it to work; I'm sure it's because I'm not understanding something. I followed their example, but I would really appreciate it if someone with experience using that library could look at my code and tell me what's wrong. The problem is that it seems to work, but after a few iterations I get all kinds of exceptions, both in the reader process and sometimes in the writer process. Here's a simplified version of my implementation:
using namespace boost::interprocess;
class SharedMemoryWrapper
{
public:
SharedMemoryWrapper(const std::string & name, bool server) :
m_name(name),
m_server(server)
{
if (server)
{
named_mutex::remove("named_mutex");
shared_memory_object::remove(m_name.c_str());
m_segment = new managed_shared_memory (create_only,name.c_str(),65536);
m_stackAllocator = new StringStackAllocator(m_segment->get_segment_manager());
m_stack = m_segment->construct<StringStack>("MyStack")(*m_stackAllocator);
}
else
{
m_segment = new managed_shared_memory(open_only ,name.c_str());
m_stack = m_segment->find<StringStack>("MyStack").first;
}
m_mutex = new named_mutex(open_or_create, "named_mutex");
}
~SharedMemoryWrapper()
{
if (m_server)
{
named_mutex::remove("named_mutex");
m_segment->destroy<StringStack>("MyStack");
delete m_stackAllocator;
shared_memory_object::remove(m_name.c_str());
}
delete m_mutex;
delete m_segment;
}
void push(const std::string & in)
{
scoped_lock<named_mutex> lock(*m_mutex);
boost::interprocess::string inStr(in.c_str());
m_stack->push_back(inStr);
}
std::string pop()
{
scoped_lock<named_mutex> lock(*m_mutex);
std::string result = "";
if (m_stack->size() > 0)
{
result = std::string(m_stack->begin()->c_str());
m_stack->erase(m_stack->begin());
}
return result;
}
private:
typedef boost::interprocess::allocator<boost::interprocess::string, boost::interprocess::managed_shared_memory::segment_manager> StringStackAllocator;
typedef boost::interprocess::vector<boost::interprocess::string, StringStackAllocator> StringStack;
bool m_server;
std::string m_name;
boost::interprocess::managed_shared_memory * m_segment;
StringStackAllocator * m_stackAllocator;
StringStack * m_stack;
boost::interprocess::named_mutex * m_mutex;
};
EDIT: Edited to use named_mutex. The original code was using interprocess_mutex, which is incorrect, but that wasn't the problem.
EDIT2: I should also note that things work up to a point. The writer process can push several small strings (or one very large string) before the reader breaks. The reader breaks in a way that the line m_stack->begin() does not refer to a valid string; it's garbage, and further execution throws an exception.
EDIT3: I have modified the class to use boost::interprocess::string rather than std::string. The reader still fails with an invalid memory address. Here are the reader and writer:
//reader process
SharedMemoryWrapper mem("MyMemory", true);
std::string myString;
int x = 5;
do
{
myString = mem.pop();
if (myString != "")
{
std::cout << myString << std::endl;
}
} while (1); //while (myString != "");
//writer
SharedMemoryWrapper mem("MyMemory", false);
for (int i = 0; i < 1000000000; i++)
{
std::stringstream ss;
ss << i; //causes failure after few thousand iterations
//ss << "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" << i; //causes immediate failure
mem.push(ss.str());
}
return 0;

There are several things that leaped out at me about your implementation. One was the use of a pointer to the named mutex object, whereas the documentation of most Boost libraries bends over backwards not to use pointers. This leads me to ask for a reference to the example snippet you worked from in building your own test case: I have had similar misadventures, and sometimes the only way out was to go back to the exemplar and work forward one step at a time until I came across the breaking change.
The other thing that seems questionable is your allocation of a 64 KB (65536-byte) block of shared memory, while your test code loops up to 1000000000, pushing a string onto the stack on each iteration.
With a modern PC able to execute a thousand instructions per microsecond and more, and operating systems like Windows still doling out execution quanta in 15-millisecond chunks, it won't take long to overflow that segment. That would be my first guess as to why things go haywire.
P.S.
I just got back from fixing my display name to something resembling my actual identity, and then the irony hit me: my answer to your question has been staring us both in the face from the upper left-hand corner of the browser page. (Presuming, of course, that I'm correct, which is so often not the case in this business.)

Well, maybe shared memory is not the right design for your problem to begin with. But we can't know, because we don't know what you're trying to achieve in the first place.

Related

writing pointer contents without copy

I'm trying to print N characters pointed to by a pointer; there is no terminating character. Let's say I have something like this (hopefully my ASCII artwork renders OK here; I want to write the chars/string "bcd" to a file or stdout):
char* ptr ----> 'a' , 'b' , 'c' , 'd' , 'e' , 'f'
                       ^           ^
                       |           |
                     begin        end
Now, I have no terminating character there. I have pointers to the beginning and end of the characters I want to write to stdout (or a log file, say). Performance really matters here, and I want to avoid the overhead of copy-constructing a std::string from the begin/end pointers.
What's the fastest way to accomplish this? I've googled around but can't find anything. I could iterate from begin to end and print/write each char at a time, but I'd like something faster and ready-made. This is a theoretical question (for my own benefit), but I'd like to know how this is done in high-performance applications (think FIX message strings in low-latency applications).
Thanks much
Graham
If you would like to do custom buffering, I can suggest something like this:
class buffered_stream_buf : public std::streambuf {
public:
buffered_stream_buf(std::ostream* stream)
: _size(), _stream(stream), _out(_stream->rdbuf()) {
_stream->rdbuf(this);
}
~buffered_stream_buf() {
_stream->flush();
_stream->rdbuf(_out);
}
int sync() override {
if (_size) {
_out->sputn(_buffer, _size);
_size = 0;
}
return _out->pubsync();
}
int overflow(int c) override {
if (c == std::streambuf::traits_type::eof()) {
return !std::streambuf::traits_type::eof();
}
_buffer[_size] = static_cast<char>(c);
++_size;
if (_size == sizeof(_buffer) && sync() != 0)
return std::streambuf::traits_type::eof();
return c;
}
private:
char _buffer[8 * 1024];
size_t _size;
std::ostream* _stream;
std::streambuf* _out;
};
int main() {
// Unbuffering `cout` might be a good idea:
// to avoid double copying
std::cout.setf(std::ios::unitbuf);
buffered_stream_buf mycoutbuf(&std::cout);
std::ofstream f("testmybuffer.txt", std::ios_base::out);
buffered_stream_buf myfbuf(&f);
std::cout << "Hello";
f << "Hello";
std::string my_long_string("long_long_long");
auto b = my_long_string.begin() + 3;
auto e = my_long_string.begin() + 5;
for (; b != e; ++b) {
std::cout << *b;
f << *b;
}
return 0;
}
The question is: will this improve performance? I am not sure; it can even make it worse. Why? cout and fstream are usually already buffered, probably with a good size for your machine. That is, before data is sent to OS objects (files, pipes, etc.), the C++ standard library implementation may buffer it (though the standard does not require this). Adding a new layer of buffering may be unnecessary and may hurt performance: you first copy data into your buffer, and later copy it again into the standard library's buffer. Second, OS objects such as files and pipes are already mapped to memory, i.e. buffered. What user-level buffering does buy you is fewer system calls, which can be expensive. To be sure your own buffering helps, you should benchmark it. If you don't have time for that, I recommend leaving it out of scope and relying on the standard library and the OS; they are usually quite good at this.
You can use std::basic_ostream::write:
std::cout.write(begin, end - begin);

Prevent or detect "this" from being deleted during use

One error that I often see is a container being cleared whilst iterating through it. I have attempted to put together a small example program demonstrating this happening. One thing to note is that this can often happen many function calls deep so is quite hard to detect.
Note: This example deliberately shows some poorly designed code. I am trying to find a solution to detect the errors caused by writing code such as this without having to meticulously examine an entire codebase (~500 C++ units)
#include <iostream>
#include <string>
#include <vector>
class Bomb;
std::vector<Bomb> bombs;
class Bomb
{
std::string name;
public:
Bomb(std::string name)
{
this->name = name;
}
void touch()
{
if(rand() % 100 > 30)
{
/* Simulate everything being exploded! */
bombs.clear();
/* An error: "this" is no longer valid */
std::cout << "Crickey! The bomb was set off by " << name << std::endl;
}
}
};
int main()
{
bombs.push_back(Bomb("Freddy"));
bombs.push_back(Bomb("Charlie"));
bombs.push_back(Bomb("Teddy"));
bombs.push_back(Bomb("Trudy"));
for(size_t i = 0; i < bombs.size(); i++)
{
bombs.at(i).touch();
}
return 0;
}
Can anyone suggest a way of guaranteeing this cannot happen?
The only way I can currently detect this kind of thing is by replacing the global new and delete with mmap / mprotect and detecting use-after-free memory accesses. However, this and Valgrind sometimes fail to pick it up if the vector does not need to reallocate (i.e. only some elements were removed, or the new size has not yet reached the reserved size). Ideally I don't want to have to clone much of the STL to make a version of std::vector that always reallocates on every insertion/deletion during debug/testing.
One way that almost works is making the std::vector contain std::weak_ptr: the use of .lock() to create a temporary reference prevents deletion while execution is within the class's method. However, this cannot work with std::shared_ptr, because there you don't need lock(), and the same goes for plain objects. Creating a container of weak pointers just for this would be wasteful.
Can anyone else think of a way to protect against this?
The easiest way is to run your unit tests with Clang's MemorySanitizer linked in, and let a continuous-integration Linux box do it automatically on each push to the repo.
MemorySanitizer has use-after-destruction detection (flag -fsanitize-memory-use-after-dtor plus environment variable MSAN_OPTIONS=poison_in_dtor=1), so it will blow up the test that executes such code and turn your continuous integration red.
If you have neither unit tests nor continuous integration in place, you can also just manually debug your code under MemorySanitizer, but that is the hard way compared to the easy one. So better to start using continuous integration and writing unit tests.
Note that there may be legitimate reasons for memory reads and writes after a destructor has run but before the memory has been freed. For example, std::variant<std::string, double> lets us assign it a std::string and then a double, so its implementation might destroy the string and reuse the same storage for the double. Filtering such cases out is unfortunately manual work at the moment, but the tools evolve.
In your particular example the misery boils down to at least two design flaws:
Your vector is a global variable. Limit the scope of all of your objects as much as possible and issues like this are less likely to occur.
Having the single responsibility principle in mind, I can hardly imagine how one could come up with a class that needs to have some method that either directly or indirectly (maybe through 100 layers of call stack) deletes objects that could happen to be this.
I am aware that your example is artificial and intentionally bad, so please don't get me wrong here: I'm sure that in your actual case it is not so obvious how sticking to some basic design rules would prevent this. But as I said, I strongly believe that good design reduces the likelihood of such bugs coming up. In fact, I cannot remember ever facing such an issue myself, but maybe I am just not experienced enough :)
However, if this really keeps being an issue despite sticking with some design rules, then I have this idea how to detect it:
Create a member int recursionDepth in your class and initialize it with 0
At the beginning of each non-private method increment it.
Use RAII to make sure that at the end of each method it is decremented again
In the destructor check it to be 0, otherwise it means that the destructor is directly or indirectly called by some method of this.
You may want to #ifdef all of this and enable it only in debug build. This would essentially make it a debug assertion, some people like them :)
Note, that this does not work in a multi threaded environment.
In the end I went with a custom iterator: if the owning std::vector resizes while the iterator is still in scope, it logs an error or aborts (giving me a stack trace of the program). This example is a bit convoluted, but I have tried to simplify it as much as possible and removed unused functionality from the iterator.
This system has flagged up about 50 errors of this nature (some may be repeats). However, Valgrind and ElectricFence at this point came up clean, which is disappointing: in total they flagged around 10, which I had already fixed since the start of the code cleanup.
In this example I use clear(), which Valgrind does flag as an error. However, in the actual codebase it is random-access erases (i.e. vec.erase(vec.begin() + 9)) that I need to check, and Valgrind unfortunately misses quite a few of them.
main.cpp
#include "sstd_vector.h"
#include <iostream>
#include <string>
#include <memory>
class Bomb;
sstd::vector<std::shared_ptr<Bomb> > bombs;
class Bomb
{
std::string name;
public:
Bomb(std::string name)
{
this->name = name;
}
void touch()
{
if(rand() % 100 > 30)
{
/* Simulate everything being exploded! */
bombs.clear(); // Causes an ABORT
std::cout << "Crickey! The bomb was set off by " << name << std::endl;
}
}
};
int main()
{
bombs.push_back(std::make_shared<Bomb>("Freddy"));
bombs.push_back(std::make_shared<Bomb>("Charlie"));
bombs.push_back(std::make_shared<Bomb>("Teddy"));
bombs.push_back(std::make_shared<Bomb>("Trudy"));
/* The key part is the lifetime of the iterator. If the vector
* changes during the lifetime of the iterator, even if it did
* not reallocate, an error will be logged */
for(sstd::vector<std::shared_ptr<Bomb> >::iterator it = bombs.begin(); it != bombs.end(); it++)
{
it->get()->touch();
}
return 0;
}
sstd_vector.h
#include <vector>
#include <stdlib.h>
namespace sstd
{
template <typename T>
class vector
{
std::vector<T> data;
size_t refs;
void check_valid()
{
if(refs > 0)
{
/* Report an error or abort */
abort();
}
}
public:
vector() : refs(0) { }
~vector()
{
check_valid();
}
vector& operator=(vector const& other)
{
check_valid();
data = other.data;
return *this;
}
void push_back(T val)
{
check_valid();
data.push_back(val);
}
void clear()
{
check_valid();
data.clear();
}
class iterator
{
friend class vector;
typename std::vector<T>::iterator it;
vector<T>* parent;
iterator() { }
iterator& operator=(iterator const&) { abort(); }
public:
iterator(iterator const& other)
{
it = other.it;
parent = other.parent;
parent->refs++;
}
~iterator()
{
parent->refs--;
}
bool operator !=(iterator const& other)
{
if(it != other.it) return true;
if(parent != other.parent) return true;
return false;
}
iterator operator ++(int val)
{
iterator rtn = *this;
it ++;
return rtn;
}
T* operator ->()
{
return &(*it);
}
T& operator *()
{
return *it;
}
};
iterator begin()
{
iterator rtn;
rtn.it = data.begin();
rtn.parent = this;
refs++;
return rtn;
}
iterator end()
{
iterator rtn;
rtn.it = data.end();
rtn.parent = this;
refs++;
return rtn;
}
};
}
The disadvantage of this system is that I must use an iterator rather than .at(idx) or [idx]. I personally don't mind this so much, and I can still use .begin() + idx if random access is needed.
It is also a little slower (though nothing compared to Valgrind). When I am done, I can do a search/replace of sstd::vector with std::vector and there should be no performance drop.

An attempt to create atomic reference counting is failing with deadlock. Is this the right approach?

So I'm attempting to create a copy-on-write map that uses an attempt at atomic reference counting on the read side so that readers don't need a lock.
Something isn't quite right. I see some references getting over-incremented and some going negative, so something isn't really atomic. In my tests I have 10 reader threads looping 100 times each doing a get(), and 1 writer thread doing 100 writes.
It gets stuck in the writer because some of the references never go down to zero, even though they should.
I'm attempting to use the 128-bit DCAS technique explained in this blog post.
Is there something blatantly wrong with this, or is there an easier way to debug it than playing with it in the debugger?
typedef std::unordered_map<std::string, std::string> StringMap;
static const int zero = 0; //provides an l-value for asm code
class NonBlockingReadMapCAS {
public:
class OctaWordMapWrapper {
public:
StringMap* fStringMap;
//std::atomic<int> fCounter;
int64_t fCounter;
OctaWordMapWrapper(OctaWordMapWrapper* copy) : fStringMap(new StringMap(*copy->fStringMap)), fCounter(0) { }
OctaWordMapWrapper() : fStringMap(new StringMap), fCounter(0) { }
~OctaWordMapWrapper() {
delete fStringMap;
}
/**
* Does a compare and swap on an octa-word - in this case, our two adjacent class members fStringMap
* pointer and fCounter.
*/
static bool inline doubleCAS(OctaWordMapWrapper* target, StringMap* compareMap, int64_t compareCounter, StringMap* swapMap, int64_t swapCounter ) {
bool cas_result;
__asm__ __volatile__
(
"lock cmpxchg16b %0;" // cmpxchg16b sets ZF on success
"setz %3;" // if ZF set, set cas_result to 1
: "+m" (*target),
"+a" (compareMap), //compare target's stringmap pointer to compareMap
"+d" (compareCounter), //compare target's counter to compareCounter
"=q" (cas_result) //results
: "b" (swapMap), //swap target's stringmap pointer with swapMap
"c" (swapCounter) //swap target's counter with swapCounter
: "cc", "memory"
);
return cas_result;
}
OctaWordMapWrapper* atomicIncrementAndGetPointer()
{
if (doubleCAS(this, this->fStringMap, this->fCounter, this->fStringMap, this->fCounter +1))
return this;
else
return NULL;
}
OctaWordMapWrapper* atomicDecrement()
{
while(true) {
if (doubleCAS(this, this->fStringMap, this->fCounter, this->fStringMap, this->fCounter -1))
break;
}
return this;
}
bool atomicSwapWhenNotReferenced(StringMap* newMap)
{
return doubleCAS(this, this->fStringMap, zero, newMap, 0);
}
}
__attribute__((aligned(16)));
std::atomic<OctaWordMapWrapper*> fReadMapReference;
pthread_mutex_t fMutex;
NonBlockingReadMapCAS() {
fReadMapReference = new OctaWordMapWrapper();
}
~NonBlockingReadMapCAS() {
delete fReadMapReference;
}
bool contains(const char* key) {
std::string keyStr(key);
return contains(keyStr);
}
bool contains(std::string &key) {
OctaWordMapWrapper *map;
do {
map = fReadMapReference.load()->atomicIncrementAndGetPointer();
} while (!map);
bool result = map->fStringMap->count(key) != 0;
map->atomicDecrement();
return result;
}
std::string get(const char* key) {
std::string keyStr(key);
return get(keyStr);
}
std::string get(std::string &key) {
OctaWordMapWrapper *map;
do {
map = fReadMapReference.load()->atomicIncrementAndGetPointer();
} while (!map);
//std::cout << "inc " << map->fStringMap << " cnt " << map->fCounter << "\n";
std::string value = map->fStringMap->at(key);
map->atomicDecrement();
return value;
}
void put(const char* key, const char* value) {
std::string keyStr(key);
std::string valueStr(value);
put(keyStr, valueStr);
}
void put(std::string &key, std::string &value) {
pthread_mutex_lock(&fMutex);
OctaWordMapWrapper *oldWrapper = fReadMapReference;
OctaWordMapWrapper *newWrapper = new OctaWordMapWrapper(oldWrapper);
std::pair<std::string, std::string> kvPair(key, value);
newWrapper->fStringMap->insert(kvPair);
fReadMapReference.store(newWrapper);
std::cout << oldWrapper->fCounter << "\n";
while (oldWrapper->fCounter > 0);
delete oldWrapper;
pthread_mutex_unlock(&fMutex);
}
void clear() {
pthread_mutex_lock(&fMutex);
OctaWordMapWrapper *oldWrapper = fReadMapReference;
OctaWordMapWrapper *newWrapper = new OctaWordMapWrapper(oldWrapper);
fReadMapReference.store(newWrapper);
while (oldWrapper->fCounter > 0);
delete oldWrapper;
pthread_mutex_unlock(&fMutex);
}
};
Maybe not the answer but this looks suspicious to me:
while (oldWrapper->fCounter > 0);
delete oldWrapper;
A reader thread could be just entering atomicIncrementAndGetPointer() at the moment the counter is 0, and the writer then pulls the rug out from under it by deleting the wrapper.
Edit to sum up the comments below for potential solution:
The best fix I'm aware of is to move fCounter from OctaWordMapWrapper into fReadMapReference (you don't actually need the OctaWordMapWrapper class at all). When the counter is zero, swap the pointer in your writer. Because high contention among reader threads can block the writer indefinitely, you can reserve the highest bit of fCounter as a write lock: while this bit is set, readers spin until it is cleared. The writer sets this bit (__sync_fetch_and_or()) when it's about to change the pointer, waits for the counter to fall to zero (i.e. for existing readers to finish their work), then swaps the pointer and clears the bit.
This approach should be watertight, though it obviously blocks readers during writes. I don't know if that is acceptable in your situation; ideally you would like this to be non-blocking.
The code would look something like this (not tested!):
class NonBlockingReadMapCAS
{
public:
NonBlockingReadMapCAS() :m_ptr(0), m_counter(0) {}
private:
StringMap *acquire_read()
{
while(1)
{
uint32_t counter=atom_inc(m_counter);
if(!(counter&0x80000000))
return m_ptr;
atom_dec(m_counter);
while(m_counter&0x80000000);
}
return 0;
}
void release_read()
{
atom_dec(m_counter);
}
void acquire_write()
{
uint32_t counter=atom_or(m_counter, 0x80000000);
assert(!(counter&0x80000000));
while(m_counter&0x7fffffff);
}
void release_write()
{
atom_and(m_counter, uint32_t(0x7fffffff));
}
StringMap *volatile m_ptr;
volatile uint32_t m_counter;
};
Just call acquire/release_read/write() before and after accessing the pointer for read/write. Replace atom_inc/dec/or/and() with __sync_fetch_and_add(), __sync_fetch_and_sub(), __sync_fetch_and_or() and __sync_fetch_and_and() respectively. You don't actually need doubleCAS() for this.
As @Quuxplusone correctly noted in a comment below, this is a single-producer, multiple-consumer implementation. I modified the code to assert properly in order to enforce this.
Well, there are probably lots of problems, but here are the obvious two.
The most trivial bug is in atomicIncrementAndGetPointer. You wrote:
if (doubleCAS(this, this->fStringMap, this->fCounter, this->fStringMap, this->fCounter +1))
That is, you're attempting to increment this->fCounter in a lock-free way. But it doesn't work, because you're fetching the old value twice with no guarantee that the same value is read each time. Consider the following sequence of events:
Thread A fetches this->fCounter (with value 0) and computes argument 5 as this->fCounter +1 = 1.
Thread B successfully increments the counter.
Thread A fetches this->fCounter (with value 1) and computes argument 3 as this->fCounter = 1.
Thread A executes doubleCAS(this, this->fStringMap, 1, this->fStringMap, 1). It succeeds, of course, but we've lost the "increment" we were trying to do.
What you wanted is more like
StringMap* oldMap = this->fStringMap;
int64_t oldCounter = this->fCounter;
if (doubleCAS(this, oldMap, oldCounter, oldMap, oldCounter+1))
...
The other obvious problem is that there's a data race between get and put. Consider the following sequence of events:
Thread A begins to execute get: it fetches fReadMapReference.load() and prepares to execute atomicIncrementAndGetPointer on that memory address.
Thread B finishes executing put: it deletes that memory address. (It is within its rights to do so, because the wrapper's reference count is still at zero.)
Thread A starts executing atomicIncrementAndGetPointer on the deleted memory address. If you're lucky, you segfault, but of course in practice you probably won't.
As explained in the blog post:
The garbage collection interface is omitted, but in real applications you would need to scan the hazard pointers before deleting a node.
Another user has suggested a similar approach, but if you are compiling with GCC (and perhaps with Clang), you could use the builtin __sync_add_and_fetch, which does something similar to what your assembly code does and is likely much more portable.
I have used it when I implemented refcounting in an Ada library (but the algorithm remains the same).
type __sync_add_and_fetch (type *ptr, type value);
// atomically adds value to *ptr and returns the new value
Although I'm not sure how your reader threads work, I suspect the problem is that you are not catching possible std::out_of_range exceptions in your get() method, which can arise from the line std::string value = map->fStringMap->at(key);. If key is not found in the map, this throws and exits the function without decrementing the counter, which would lead to exactly the condition you describe (the writer thread stuck in its while-loop waiting for the counters to decrement).
In any event, whether or not this is the cause of the issues you're seeing, you definitely need to either handle this exception (and any others) or modify your code so that nothing can throw. For the at() call specifically, I would probably use find() instead and check the iterator it returns. More generally, I suggest using the RAII pattern to ensure that no unexpected exception can escape without unlocking/decrementing. For example, you might use boost::scoped_lock to wrap fMutex, and write something simple like this for the OctaWordMapWrapper increment/decrement:
class ScopedAtomicMapReader
{
public:
explicit ScopedAtomicMapReader(std::atomic<OctaWordMapWrapper*>& map) : fMap(NULL) {
do {
fMap = map.load()->atomicIncrementAndGetPointer();
} while (NULL == fMap);
}
~ScopedAtomicMapReader() {
if (NULL != fMap)
fMap->atomicDecrement();
}
OctaWordMapWrapper* map(void) {
return fMap;
}
private:
OctaWordMapWrapper* fMap;
}; // class ScopedAtomicMapReader
With something like that, then for example, your contains() and get() methods would simplify to (and be immune to exceptions):
bool contains(std::string &key) {
ScopedAtomicMapReader mapWrapper(fReadMapReference);
return (mapWrapper.map()->fStringMap->count(key) != 0);
}
std::string get(std::string &key) {
ScopedAtomicMapReader mapWrapper(fReadMapReference);
return mapWrapper.map()->fStringMap->at(key); // Now it's fine if this throws...
}
Finally, although I don't think you should have to do this, you might also try declaring fCounter as volatile, given that the accesses to it in the while-loop in the put() method happen on a different thread than the writes to it on the reader threads.
Hope this helps!
By the way, one other minor thing: make sure the object held by fReadMapReference isn't leaked; delete it in your destructor.

C++ throwing a std::bad_alloc exception for very small std::vector using std::sort

I'm working on a project in C++ which deals with comma-separated (CSV) data. What I do is read the data from a .csv file into a vector of CsvRow objects.
So today I encountered really weird std::bad_alloc exceptions being thrown in even weirder situations. The first test case, the one where the exception took the longest to appear, was reading a whole CSV file into a vector. The file consists of 500,000 rows and is about 70 MB. The file was read into memory like a charm, but a few seconds into the sorting procedure, std::bad_alloc gets thrown. At that point the program was using roughly 67 MB of RAM.
Note: I'm using boost's flyweights in order to reduce memory consumption.
BUT, this test case was even stranger:
I'm reading a 146 KB file with a few hundred lines, and this time I got the exception while reading the data into the vector, which is totally ridiculous given that a 70 MB file was read successfully before.
I suspected a memory leak, but my machine has 8 GB of RAM and runs 64-bit Windows 8.
I'm using CodeBlocks, and a MinGW 64-bit boost distro.
Any help would be appreciated.
Here is a chunk of code in which the std::bad_alloc is being thrown:
Reading data from a csv file
std::ifstream file(file_name_);
int k=0;
for (CsvIterator it(file); it != CsvIterator(); ++it) {
if(columns_ == 0) {
columns_ = (*it).size();
for (unsigned int i=0; i<columns_; i++) {
distinct_values_.push_back(*new __gnu_cxx::hash_set<std::string,
std::hash<std::string> >());
}
}
for (unsigned int i=0; i<columns_; i++) {
distinct_values_[i].insert((*it)[i]);
}
all_rows_[k]=(*it);
k++;
}
Sorting the vector using an internal struct stored in my class
struct SortRowsStruct
{
CsvSorter* r;
SortRowsStruct(CsvSorter* rr) : r(rr) { };
bool operator() (CsvRow a, CsvRow b)
{
for (unsigned int i=0; i<a.size(); i++) {
if(a[r->sorting_order_[i]] != b[r->sorting_order_[i]]) {
int dir = r->sorting_direction_[i];
switch(dir) {
case 0:
return (a[r->sorting_order_[i]] < b[r->sorting_order_[i]]);
break;
case 1:
return !(a[r->sorting_order_[i]] < b[r->sorting_order_[i]]);
break;
case 2:
return true;
break;
default:
return true;
}
}
}
return true;
}
};
Then, I'm using std::sort() to sort the vector of CsvRows
SortRowsStruct s(this);
std::sort(all_rows_.begin(), all_rows_.end(), s);
This line looks really suspicious, but I could not figure out an easier way to initialize those hash sets.
distinct_values_.push_back( *new __gnu_cxx::hash_set<std::string,
std::hash<std::string> >() );
Deleting those hash sets in the destructor crashes the program (SIGSEGV)
Oh, and another thing to point out: I can't use the default 32-bit gdb debugger because my MinGW is 64-bit, and the 32-bit gdb is buggy and won't work with 64-bit MinGW.
Edit:
Could the boost::flyweight<std::string> which I use in the CsvRow class cause the problem?
In addition to that, here is a part of the CsvRow class:
private:
std::vector<boost::flyweights::flyweight<std::string> > row_data_;
And the overloaded [] operator on the CsvRow class:
std::string const& CsvRow::operator[](std::size_t index) const
{
boost::flyweights::flyweight<std::string> fly = row_data_[index];
return fly.get();
}
Thanks in advance
EDIT - SOLVED:
So, this question solved my problem, although I hadn't even thought of it.
Every custom comparator we pass to std::sort() has to be a strict weak ordering, that is:
1. Irreflexive
2. Asymmetric
3. Transitive
4. Transitivity of incomparability
More info in this question and this Wikipedia article.
Actually, I did not follow the first rule (irreflexivity): when both CsvRow objects are equal, the comparator should not return true as if they were ordered, but instead return false.
I solved the whole problem by changing only the default return value for the case when CsvRow a and CsvRow b are equal.
bool operator() (CsvRow a, CsvRow b)
{
for (unsigned int i=0; i<a.size(); i++) {
if(a[r->sorting_order_[i]] != b[r->sorting_order_[i]]) {
...
...
}
}
return false; //this line does not violate the irreflexivity rule
//return true; //but this one does
}
Thanks to everyone who tried to help.
Remember this solution in case you experience a similar problem. It's pretty tricky.
This:
distinct_values_.push_back( *new __gnu_cxx::hash_set<std::string,
std::hash<std::string> >() );
Looks like you are trying to add one default-constructed element to the vector. There's an easier way:
distinct_values_.resize(distinct_values_.size() + 1);
Apart from being easier to type and more generic, it's also a lot more correct: we should not be new-ing anything here, just creating a single value at the end, and we should let the vector construct it rather than copying it in, which is wasteful (and, as written, leaks the new-ed set).
And of course we should never try to delete these values.

Implicit new and delete operators killing performance

I am running Very Sleepy to profile my application, and it shows that 25% and 23% of the time spent in my function goes to new and delete respectively. I don't understand where this is occurring, so can someone tell me where in my code this happens?
inline FixParser(fixmessage& tokenMap, const std::string& str) {
static seperator sep_delim("\x01");
static seperator sep_equal("=");
static std::string error("ERROR: ");
static FixKey fix_Key;
static tokenizer token_equal(error);
static tokenizer token_delim(error);
static tokenizer::iterator itr;
token_delim.assign(str, sep_delim);
int key;
try {
for(tokenizer::iterator it = token_delim.begin();
it != token_delim.end(); ++it) {
token_equal.assign(*it, sep_equal);
itr = token_equal.begin();
key = boost::lexical_cast<int>(*itr);
if(fix_Key.keys.find(key) == fix_Key.keys.end()) continue;
++itr;
const std::string& value(*itr);
tokenMap.insert(std::pair<int, std::string>(key, value));
}
} catch(boost::bad_lexical_cast &) {
std::cerr << error << str << std::endl;
return;
}
}
I beg forgiveness for the use of statics; they will be removed later and placed in a struct.
One note: there are lots of strings being copied. Each string incurs a call to new to grab memory and delete to release it.
If performance is at a premium and you can keep a copy of str around, you might want to use indexes instead: have the tokens be pairs of indexes (begin, end) rather than full-blown strings. This is obviously more error-prone.
Also, tokenMap allocates one node per entry in the map; if you have a lot of entries, there will be a lot of nodes (and thus calls to new to create them). You might want to use a deque instead and sort the items once you're done, unless you really need what map offers (automatic deduplication).
A bikeshedded version, removing most static variables (I could not help myself):
inline FixParser(fixmessage& tokenMap, const std::string& str) {
static seperator sep_delim("\x01");
static seperator sep_equal("=");
static FixKey const fix_Key;
try {
tokenizer token_delim(str, sep_delim);
// avoid computing token_delim.end() at each iteration
for(tokenizer::iterator it = token_delim.begin(), end = token_delim.end();
it != end; ++it)
{
tokenizer token_equal(*it, sep_equal);
tokenizer::iterator itr = token_equal.begin();
int const key = boost::lexical_cast<int>(*itr);
if(fix_Key.keys.find(key) == fix_Key.keys.end()) continue;
++itr;
tokenMap.insert(std::make_pair(key, *itr));
}
} catch(boost::bad_lexical_cast &) {
std::cerr << "ERROR: " << str << std::endl; // the "error" static was removed above
return;
}
}
Make sure you are testing the Release build, not the Debug version. Debug builds use different versions of new and delete that help detect memory leaks at the expense of speed, and Debug builds don't optimise much (if at all).
I'd look at boost::lexical_cast. In its simplest form it simply uses streams. It probably does a lot of allocations.
The statics may be the problem: how many times are you calling FixParser?
Every time you call it, the token_delim and token_equal objects have their assign methods called, and if those are implemented like a vector assign, the memory backing the sequence may be destroyed and then reallocated on every FixParser call just to hold the new entry.