I have a routine that is meant to load and parse data from a file. There is a possibility that the data from the same file might need to be retrieved from two places at once, i.e. during a background caching process and from a user request.
Specifically I am using C++11 thread and mutex libraries. We compile with Visual C++ 11 (aka 2012), so are limited by whatever it lacks.
My naive implementation went something like this:
map<wstring,weak_ptr<DataStruct>> data_cache;
mutex data_cache_mutex;
shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
auto data_ptr = make_shared<DataStruct>();
/* Parses and processes the data, may take a while */
return data_ptr;
}
shared_ptr<DataStruct> CreateStructFromData(wstring file_path) {
lock_guard<mutex> lock(data_cache_mutex);
auto cache_iter = data_cache.find(file_path);
if (cache_iter != end(data_cache)) {
auto data_ptr = cache_iter->second.lock();
if (data_ptr)
return data_ptr;
// reference died, remove it
data_cache.erase(cache_iter);
}
auto data_ptr = ParseDataFile(file_path);
if (data_ptr)
data_cache.emplace(make_pair(file_path, data_ptr));
return data_ptr;
}
My goals were two-fold:
Allow multiple threads to load separate files concurrently
Ensure that a file is only processed once
The problem with my current approach is that it doesn't allow concurrent parsing of multiple files at all. If I understand correctly what will happen, each thread is going to hit the lock and they will end up processing linearly, one thread at a time. The order in which the threads pass through the lock may change from run to run, but the end result is the same.
One solution I've considered was to create a second map:
map<wstring,mutex> data_parsing_mutex;
shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
lock_guard<mutex> lock(data_parsing_mutex[file_path]);
/* etc. */
data_parsing_mutex.erase(file_path);
}
But now I have to be concerned with how data_parsing_mutex is being updated. So I guess I need another mutex?
map<wstring,mutex> data_parsing_mutex;
mutex data_parsing_mutex_mutex;
shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
unique_lock<mutex> super_lock(data_parsing_mutex_mutex);
lock_guard<mutex> lock(data_parsing_mutex[file_path]);
super_lock.unlock();
/* etc. */
super_lock.lock();
data_parsing_mutex.erase(file_path);
}
In fact, looking at this, it won't necessarily avoid double-processing a file if the background process hasn't finished with it when the user requests it, unless I check the cache yet again.
But by now my spidey senses are saying There must be a better way. Is there? Would futures, promises, or atomics help me at all here?
From what you described, it sounds like you're trying to do a form of lazy initialization of the DataStruct using a thread pool, along with a reference counted cache. std::async should be able to provide a lot of the dispatch and synchronization necessary for something like this.
Using std::async, the code would look something like this...
map<wstring,weak_ptr<DataStruct>> cache;
map<wstring,shared_future<shared_ptr<DataStruct>>> pending;
mutex cache_mutex, pending_mutex;
shared_ptr<DataStruct> ParseDataFromFile(wstring file) {
auto data_ptr = make_shared<DataStruct>();
/* Parses and processes the data, may take a while */
return data_ptr;
}
shared_ptr<DataStruct> CreateStructFromData(wstring file) {
    shared_future<shared_ptr<DataStruct>> pf;
    {
        // Fast path: the file is already parsed and still referenced.
        lock_guard<mutex> lock(cache_mutex);
        auto ci = cache.find(file);
        if (ci != cache.end()) {
            if (auto cached = ci->second.lock())
                return cached;
        }
    }
    {
        // Join a parse that is already in flight, or start a new one.
        lock_guard<mutex> lock(pending_mutex);
        auto fi = pending.find(file);
        if (fi == pending.end()) {
            pf = async(ParseDataFromFile, file).share();
            pending.insert(make_pair(file, pf));
        } else {
            pf = fi->second;
        }
    }
    // Blocks until parsing finishes; every waiter gets the same pointer.
    shared_ptr<DataStruct> ce = pf.get();
    {
        lock_guard<mutex> lock(cache_mutex);
        cache[file] = ce;  // also replaces an expired entry
    }
    {
        lock_guard<mutex> lock(pending_mutex);
        pending.erase(file);
    }
    return ce;
}
This can probably be optimized a bit, but the general idea should be the same.
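One possible tightening, sketched here under a few assumptions: keep a single map of shared futures behind one mutex, so the "is it cached?" and "is it pending?" checks cannot race with each other. ParseCache, Get and parse_calls are hypothetical names, and the parser body is a stand-in. Unlike the code above, this version holds strong references (entries never expire), and a parse that throws stays stored as an exception.

```cpp
#include <atomic>
#include <exception>
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct DataStruct { std::wstring source; };

std::atomic<int> parse_calls{0};  // hypothetical counter, only here to observe behaviour

// Stand-in for the slow parser from the question.
std::shared_ptr<DataStruct> ParseDataFromFile(const std::wstring& file) {
    ++parse_calls;
    auto data = std::make_shared<DataStruct>();
    data->source = file;
    return data;
}

class ParseCache {
public:
    std::shared_ptr<DataStruct> Get(const std::wstring& file) {
        std::promise<std::shared_ptr<DataStruct>> promise;
        std::shared_future<std::shared_ptr<DataStruct>> future;
        bool owner = false;
        {
            // The lock only covers the map lookup, never the parsing itself.
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = entries_.find(file);
            if (it == entries_.end()) {
                future = promise.get_future().share();
                entries_.emplace(file, future);
                owner = true;  // this caller is responsible for parsing
            } else {
                future = it->second;
            }
        }
        if (owner) {
            try {
                promise.set_value(ParseDataFromFile(file));
            } catch (...) {
                promise.set_exception(std::current_exception());
            }
        }
        return future.get();  // waits if another thread is still parsing
    }

private:
    std::mutex mutex_;
    std::map<std::wstring, std::shared_future<std::shared_ptr<DataStruct>>> entries_;
};
```

Different files parse concurrently because the first caller for each file does the work on its own thread, while later callers for the same file just wait on the shared future.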
On a typical computer there is little point in trying to load files concurrently, since disk access will be the bottleneck. Instead, it's better to have a single thread load files (or use asynchronous I/O) and dish out the parsing to a thread pool. Then store the results in a shared container.
Regarding preventing double work, you should consider if this is really necessary. If you are only doing this out of premature optimization, you'd probably make users happier by focussing on making the program responsive, rather than efficient. That is, make sure the user gets what they ask for quickly, even if it means doing double work.
OTOH, if there is a technical reason for not parsing a file twice, you can keep track of the status of each file (loading, parsing, parsed) in the shared container.
I've run into code that simplified looks like this
inline someClass* otherClass::getSomeClass()
{
if (m_someClass)
return m_someClass.get();
std::unique_lock<std::shared_mutex> lock(m_lock);
if (m_someClass)
return m_someClass.get();
m_someClass= std::make_unique<someClass>(this);
return m_someClass.get();
}
It seems to be a pattern for making the creation of the someClass object thread-safe. I don't have much experience with multithreading, but this code doesn't look nice to me. Is there some other way to write this, or is this the way it should be done?
The biggest problem here is that you are violating the C++ memory model. In the C++ memory model, a write and a read of the same data from different threads must be synchronized.
The check of m_someClass at the top, outside the lock, reads what is written while the mutex is held, so it is a data race.
It is possible that operator bool on m_someClass happens to be atomic on your platform, but the standard does not guarantee it.
Also, your code doesn't handle the object ever being destroyed.
If it is atomic, then you should possibly be using atomic operations to update it and not a lock. Such a pattern can result in "wasted" objects being created; often this is worth the cost of removing the lock.
Make m_someClass a std::atomic<std::shared_ptr<someClass>> (available since C++20).
Return std::shared_ptr<someClass> from getSomeClass.
auto existing = m_someClass.load();
if (existing)
return existing;
auto created = std::make_shared<someClass>(this);
if (
m_someClass.compare_exchange_strong(existing, created)
) {
return created;
} else {
return existing;
}
Two threads can both create a new someClass if they both try to get at the same time, but only one will persist, the other will be discarded, and the function will return the one that persists.
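If you would rather keep the lock and never create a wasted object, the double-checked pattern from the question can be made well-defined by publishing the pointer through an atomic. This is only a sketch: m_ptr and m_owner are names I made up, and someClass is reduced to a stub. std::call_once with a std::once_flag is another standard way to get the same effect.

```cpp
#include <atomic>
#include <memory>
#include <mutex>

class otherClass;

// Stub standing in for the real someClass.
class someClass {
public:
    explicit someClass(otherClass* owner) : owner_(owner) {}
    otherClass* owner() const { return owner_; }
private:
    otherClass* owner_;
};

class otherClass {
public:
    someClass* getSomeClass() {
        // First check: an acquire load pairs with the release store below,
        // so a non-null pointer implies a fully constructed object.
        someClass* p = m_ptr.load(std::memory_order_acquire);
        if (p)
            return p;
        std::lock_guard<std::mutex> lock(m_lock);
        // Second check under the lock: another thread may have won the race.
        p = m_ptr.load(std::memory_order_relaxed);
        if (!p) {
            m_owner.reset(new someClass(this));
            p = m_owner.get();
            m_ptr.store(p, std::memory_order_release);
        }
        return p;
    }

private:
    std::mutex m_lock;
    std::unique_ptr<someClass> m_owner;      // owns the object, touched only under m_lock
    std::atomic<someClass*> m_ptr{nullptr};  // published pointer for the lock-free fast path
};
```

The unique_ptr is never read outside the lock here; only the atomic raw pointer is, which is what removes the data race.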
I have a decision to make regarding the way I code something, which is running on an embedded platform and am hoping there is a general "rule-of-thumb" that can be used in this case. Coding both my ideas and then benchmarking would obviously be the best way to go, but to get any meaningful, or rather accurate results out of this platform, in my particular case, would be quite tricky. I'm also sure that there may be others that are having the same question on their respective platforms, so I decided to ask it here. Please be kind, as I'm not very familiar with the threading library, so constructive feedback would be useful.
I have many threads (well, about 10-20 at maximum) all wanting to write to this hardware device. So I decided on using a simple ring-buffer consisting of 2 buffers (primary/secondary) of 8k each. This way each in-coming thread could be dealt with in a timely fashion. An arriving thread would obtain a mutex and write into the primary buffer and then release its mutex ready for the next thread. Now when the primary buffer is full, new incoming threads obviously switch to using the secondary buffer and then you start to write the primary buffer to the hardware device.
So the question really is... How best to write to the hardware device??? I'm thinking that there are two choices:
As soon as the buffer is full, create a new thread that does the write operation.
Signal a pre-created waiting worker-thread to do the write operation.
Both of the options seem to come with their respective pros/cons. Option 1 is the simplest to code and there are a number of ways to do this, but its effectiveness is dependent on how expensive it is to create/start the thread. The thread would be created, it would perform the write operation and then it would die. Option 2 however seems to be the most performant, but if you're going to have a reusable thread, you're going to need a mutex and a couple condition variables to control it. One to notify the thread that data is ready and another to ask for the thread to terminate when the program ends. Add to that a sprinkle of atomics for spurious wake-ups/missing notifications etc, and you've got quite an intricate solution to get right.
So what is the best method here? Are threads in general heavy to create/start or is this something that is completely platform dependent and benchmarking is the only way to know? Is there any benefit to using one method over the other that I've not thought about?
-- This is for the people not suffering from TL;DR syndrome --
I'm sure some of you have already wondered what happens if the secondary buffer becomes full before the write operation has finished? The answer in my case is fairly simple: this should never happen! Although the write operation is slow, it would never be slow enough such that the secondary buffer is filled before the write is complete. However, if someone is going to use this ring-buffer method, they must be prepared for this contingency. The way I thought about tackling this is to have a second mutex that is held during the write operation. This would mean that the thread that was due to write to the buffer would block until the write completed and the mutex was released.
Here's what I roughly ended up with after going with Option 2, but it seems awfully messy. I actually wanted to use promise/futures to avoid the spin-lock predicates on the condition variable, but couldn't think of a good way of moving a promise to an already created thread. Anyway... nice feedback is appreciated, bad-feedback, well, I'm not overly familiar with the threading library.
class Bar
{
public:
Bar(const size_t size) : buffer(new uint8_t[size]), buffer_size(size), used_size(0) {}
const size_t GetRemainingBufferSize(void) const { return buffer_size - used_size; }
const size_t GetUsedBufferSize(void) const { return used_size; }
const uint8_t* GetBuffer(void) const { return buffer.get(); }
const size_t GetBufferSize() const { return buffer_size; }
void ResetBuffer(void) { used_size = 0; }
void WriteIntoBuffer(const vector<uint8_t>& data)
{
std::copy(data.begin(), data.end(), buffer.get() + used_size);
used_size += data.size();
}
private:
std::unique_ptr<uint8_t[]> buffer;
size_t buffer_size;
size_t used_size;
};
class Foo
{
public:
Foo(const size_t buffer_size = 8192) : bar_buffers{ buffer_size, buffer_size }, primary_buffer(&bar_buffers[0]), secondary_buffer(&bar_buffers[1]),
write_predicate(false), quit_predicate(false), write_buffer(primary_buffer)
{
foo_thread = std::thread(&Foo::WriteHWThread, this);
}
~Foo()
{
quit_predicate = true;
begin_write.notify_one();
if (foo_thread.joinable())
foo_thread.join();
}
Foo(const Foo&) = delete;
Foo& operator=(const Foo&) = delete;
void WriteData(const std::vector<uint8_t>& data)
{
if (std::lock_guard<std::mutex> foo_lk(foo_lock); primary_buffer->GetRemainingBufferSize() < data.size())
{
std::unique_lock<std::mutex> write_lk(write_lock);
write_buffer = primary_buffer;
write_lk.unlock();
std::swap(primary_buffer, secondary_buffer);
primary_buffer->ResetBuffer();
write_predicate = true;
begin_write.notify_one();
}
primary_buffer->WriteIntoBuffer(data);
}
void WriteHWThread(void)
{
do
{
std::unique_lock<std::mutex> write_lk(write_lock);
begin_write.wait(write_lk, [&]() -> bool { return write_predicate.load() || quit_predicate.load(); });
write_predicate = false;
if (write_buffer.load()->GetUsedBufferSize())
<<< WRITE TO DEDICATED HARDWARE >>>
write_lk.unlock();
} while (!quit_predicate);
}
private:
Bar bar_buffers[2];
Bar* primary_buffer, *secondary_buffer;
std::atomic<bool> write_predicate, quit_predicate;
std::atomic<Bar*> write_buffer;
std::mutex foo_lock, write_lock;
std::thread foo_thread;
std::condition_variable begin_write;
};
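For comparison, here is a more compact sketch of Option 2 that uses one mutex, one condition variable and plain bool flags, with no atomics. BufferedWriter, Sink and the other names are made up for illustration, and the sink callback stands in for the real hardware write. It assumes producers may block while the previous buffer is still being written, and that a chunk larger than the capacity is simply allowed to overfill an empty buffer.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class BufferedWriter {
public:
    using Bytes = std::vector<std::uint8_t>;
    using Sink = std::function<void(const Bytes&)>;  // stands in for the hardware write

    BufferedWriter(std::size_t capacity, Sink sink)
        : capacity_(capacity), sink_(std::move(sink)),
          worker_(&BufferedWriter::Run, this) {}

    ~BufferedWriter() {
        {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return !pending_; });
            if (!front_.empty()) {  // flush whatever is left
                front_.swap(back_);
                pending_ = true;
            }
            quit_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    BufferedWriter(const BufferedWriter&) = delete;
    BufferedWriter& operator=(const BufferedWriter&) = delete;

    void Write(const Bytes& data) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!front_.empty() && front_.size() + data.size() > capacity_) {
            if (pending_) {  // previous buffer still being written: block
                cv_.wait(lock, [this] { return !pending_; });
                continue;    // another producer may have swapped meanwhile
            }
            front_.swap(back_);  // hand the full buffer to the worker
            pending_ = true;
            cv_.notify_all();
        }
        front_.insert(front_.end(), data.begin(), data.end());
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return pending_ || quit_; });
            if (!pending_)
                return;      // quit requested and nothing left to write
            lock.unlock();
            sink_(back_);    // slow write happens without holding the lock
            lock.lock();
            back_.clear();
            pending_ = false;
            cv_.notify_all();
        }
    }

    const std::size_t capacity_;
    Sink sink_;
    std::mutex mutex_;
    std::condition_variable cv_;
    Bytes front_, back_;
    bool pending_ = false;  // back_ is owned by the worker while this is true
    bool quit_ = false;
    std::thread worker_;    // declared last so it starts after the other members exist
};
```

Because both flags are only touched under the mutex and the waits use predicates, spurious wake-ups and missed notifications are handled without extra atomics.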
I am trying to make a central data structure of a large codebase usable from multiple threads.
The access interfaces were changed to represent read/write locks, which may be up- and downgraded:
Before:
Container& container = state.getContainer();
auto value = container.find( "foo" )->bar;
container.clear();
Now:
ReadContainerLock container = state.getContainer();
auto value = container.find( "foo" )->bar;
{
// Upgrade read lock to write lock
WriteContainerLock write = state.upgrade( container );
write.clear();
} // Downgrades write lock to read lock
Using an actual std::mutex for the locking (instead of r/w implementation) works fine but brings no performance benefit (actually degrades runtime).
Actual changing data is relatively rare, so it seems very desirable to go with the read/write concept. The big issue now is that I cannot seem to find any library, which implements the read/write concept and supports upgrade and downgrade and works on Windows, OSX and Linux alike.
Boost has BOOST_THREAD_PROVIDES_SHARED_MUTEX_UPWARDS_CONVERSIONS but does not seem to support downgrading or (blocking) atomic upgrading from shared to unique.
Is there any library out there, that supports the desired feature set?
EDIT:
Sorry for being unclear. Of course I mean multiple-readers/single-writer lock semantic.
The question has changed since I answered. As the previous answer is still useful, I will leave it up.
The new question seems to be "I want a (general purpose) reader writer lock where any reader can be upgraded to a writer atomically".
This cannot be done without deadlocks, or the ability to roll back operations (transactional reads), which is far from general-purpose.
Suppose you have Alice and Bob. Both want to read for a while, then they both want to write.
Alice and Bob both get a read lock. They then upgrade to a write lock. Neither can progress, because a write lock cannot be acquired while a read lock is acquired. You cannot unlock the read lock, because then the state Alice read while read locked may not be consistent with the state after the write lock is acquired.
This can only be solved with the possibility the read->write upgrade can fail, or the ability to rollback all operations in a read (so Alice can "unread", Bob can advance, then Alice can re-read and try to get the write lock).
Writing type-safe transactional code isn't really supported in C++. You can do it manually, but beyond simple cases it is error prone. Other forms of transactional rollbacks can also be used. None of them are general purpose reader-writer locks.
You can roll your own. If the states are R, U, W and {} (read, upgradable, write and no lock), these are transitions you can easily support:
{} -> R|U|W
R|U|W -> {}
U->W
W->U
U->R
and implied by the above:
W->R
which I think satisfies your requirements.
The "missing" transition is R->U, which is what lets us have multiple-readers safely. At most one reader (the upgrade reader) has the right to upgrade to write without releasing their read lock. While they are in that upgrade state they do not block other threads from reading (but they do block other threads from writing).
Here is a sketch. There is a shared_mutex A; and a mutex B;.
B represents the right to upgrade to write and the right to read while you hold it. All writers also hold a B, so you cannot both have the right to upgrade to write while someone else has the right to write.
Transitions look like:
{}->R = read(A)
{}->W = lock(B) then write(A)
{}->U = lock(B)
U->W = write(A)
W->U = unwrite(A)
U->R = read(A) then unlock(B)
W->R = W->U->R
R->{} = unread(A)
W->{} = unwrite(A) then unlock(B)
U->{} = unlock(B)
This simply requires std::shared_mutex and std::mutex, and a bit of boilerplate to write up the locks and the transitions.
If you want to be able to spawn a write lock while the upgrade lock "remains in scope" extra work needs to be done to "pass the upgrade lock back to the read lock".
Here are some bonus try transitions, inspired by @HowardHinnant below:
R->try U = return try_lock(B) && unread(A)
R->try W = return R->try U->W
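As free functions over the two mutexes from the sketch, the try transitions might look like this. The function names are mine, and the caller is assumed to hold a shared lock on A when calling either of them.

```cpp
#include <mutex>
#include <shared_mutex>

// b is mutex B (the right to upgrade), a is shared mutex A.
// Precondition: the caller holds a shared (read) lock on a.
// On success the caller holds b only (state U); on failure it still
// holds its shared lock on a (state R) and nothing has changed.
bool try_read_to_upgradable(std::mutex& b, std::shared_timed_mutex& a) {
    if (!b.try_lock())
        return false;   // someone else already owns the right to upgrade
    a.unlock_shared();  // reading stays safe: writers need b, which we now hold
    return true;
}

// R -> try W: same precondition. On success the caller holds b and an
// exclusive lock on a (state W).
bool try_read_to_write(std::mutex& b, std::shared_timed_mutex& a) {
    if (!try_read_to_upgradable(b, a))
        return false;
    a.lock();           // U -> W: waits for the remaining readers to leave
    return true;
}
```

A caller that gets false has to decide for itself whether to keep reading, or to release the read lock and start over.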
Here is an upgradeable_mutex with no try operations:
struct upgradeable_mutex {
std::mutex u;
std::shared_timed_mutex s;
enum class state {
unlocked,
shared,
aspiring,
unique
};
// one step at a time:
template<state start, state finish>
void transition_up() {
transition_up<start, (state)((int)finish-1)>();
transition_up<(state)((int)finish-1), finish>();
}
// one step at a time:
template<state start, state finish>
void transition_down() {
transition_down<start, (state)((int)start-1)>();
transition_down<(state)((int)start-1), finish>();
}
void lock();
void unlock();
void lock_shared();
void unlock_shared();
void lock_aspiring();
void unlock_aspiring();
void aspiring_to_unique();
void unique_to_aspiring();
void aspiring_to_shared();
void unique_to_shared();
};
template<>
void upgradeable_mutex::transition_up<
upgradeable_mutex::state::unlocked, upgradeable_mutex::state::shared
>
() {
s.lock_shared();
}
template<>
void upgradeable_mutex::transition_down<
upgradeable_mutex::state::shared, upgradeable_mutex::state::unlocked
>
() {
s.unlock_shared();
}
template<>
void upgradeable_mutex::transition_up<
upgradeable_mutex::state::unlocked, upgradeable_mutex::state::aspiring
>
() {
u.lock();
}
template<>
void upgradeable_mutex::transition_down<
upgradeable_mutex::state::aspiring, upgradeable_mutex::state::unlocked
>
() {
u.unlock();
}
template<>
void upgradeable_mutex::transition_up<
upgradeable_mutex::state::aspiring, upgradeable_mutex::state::unique
>
() {
s.lock();
}
template<>
void upgradeable_mutex::transition_down<
upgradeable_mutex::state::unique, upgradeable_mutex::state::aspiring
>
() {
s.unlock();
}
template<>
void upgradeable_mutex::transition_down<
upgradeable_mutex::state::aspiring, upgradeable_mutex::state::shared
>
() {
s.lock_shared();
u.unlock();
}
void upgradeable_mutex::lock() {
transition_up<state::unlocked, state::unique>();
}
void upgradeable_mutex::unlock() {
transition_down<state::unique, state::unlocked>();
}
void upgradeable_mutex::lock_shared() {
transition_up<state::unlocked, state::shared>();
}
void upgradeable_mutex::unlock_shared() {
transition_down<state::shared, state::unlocked>();
}
void upgradeable_mutex::lock_aspiring() {
transition_up<state::unlocked, state::aspiring>();
}
void upgradeable_mutex::unlock_aspiring() {
transition_down<state::aspiring, state::unlocked>();
}
void upgradeable_mutex::aspiring_to_unique() {
transition_up<state::aspiring, state::unique>();
}
void upgradeable_mutex::unique_to_aspiring() {
transition_down<state::unique, state::aspiring>();
}
void upgradeable_mutex::aspiring_to_shared() {
transition_down<state::aspiring, state::shared>();
}
void upgradeable_mutex::unique_to_shared() {
transition_down<state::unique, state::shared>();
}
I attempt to get the compiler to work out some of the above transitions "for me" with the transition_up and transition_down trick. I think I can do better, and it did increase code bulk significantly.
Having it 'auto-write' the unlocked-to-unique, and unique-to-(unlocked|shared) was all I got out of it. So probably not worth it.
Creating smart RAII objects that use the above is a bit tricky, as they have to support some transitions that the default unique_lock and shared_lock do not support.
You could just write aspiring_lock and then do conversions in there (either as operator unique_lock, or as methods that return said, etc), but the ability to convert from unique_lock&& down to shared_lock is exclusive to upgradeable_mutex and is a bit tricky to work with implicit conversions...
live example.
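A minimal sketch of what such RAII helpers could look like, limited to the aspiring state and a temporary upgrade to unique. upgradeable_mutex_sketch is a cut-down stand-in for the mutex above so the example is self-contained, and aspiring_lock / write_scope are hypothetical names, not an existing library API.

```cpp
#include <mutex>
#include <shared_mutex>

// Cut-down stand-in for the upgradeable mutex: only the transitions used below.
class upgradeable_mutex_sketch {
public:
    void lock_shared()        { s_.lock_shared(); }
    void unlock_shared()      { s_.unlock_shared(); }
    void lock_aspiring()      { u_.lock(); }
    void unlock_aspiring()    { u_.unlock(); }
    void aspiring_to_unique() { s_.lock(); }
    void unique_to_aspiring() { s_.unlock(); }
private:
    std::mutex u_;
    std::shared_timed_mutex s_;
};

// Holds the aspiring (upgradable read) state for its whole lifetime.
class aspiring_lock {
    upgradeable_mutex_sketch& m_;
public:
    explicit aspiring_lock(upgradeable_mutex_sketch& m) : m_(m) { m_.lock_aspiring(); }
    ~aspiring_lock() { m_.unlock_aspiring(); }
    aspiring_lock(const aspiring_lock&) = delete;
    aspiring_lock& operator=(const aspiring_lock&) = delete;

    // Upgrades to unique on construction and drops back to aspiring on
    // destruction, so the outer aspiring_lock stays valid throughout.
    class write_scope {
        upgradeable_mutex_sketch& m_;
    public:
        explicit write_scope(aspiring_lock& outer) : m_(outer.m_) { m_.aspiring_to_unique(); }
        ~write_scope() { m_.unique_to_aspiring(); }
        write_scope(const write_scope&) = delete;
        write_scope& operator=(const write_scope&) = delete;
    };
};
```

Usage would be an aspiring_lock for the read phase and a nested aspiring_lock::write_scope around the part that mutates; plain readers keep using lock_shared / unlock_shared (or std::shared_lock).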
Here's my usual suggestion: Seqlock
You can have a single writer and many readers concurrently. Writers compete using a spinlock. A single writer doesn't need to compete so is cheaper.
Readers are truly only reading. They're not writing any state variables, counters, etc. This means you don't really know how many readers there are. But also, there is no cache line ping pong, so you get the best performance possible in terms of latency and throughput.
What's the catch? The data almost has to be POD. It doesn't really have to be POD, but it cannot be invalidated (no deleting std::map nodes), as readers may read it while it's being written.
It's only after the fact that readers discover the data is possibly bad and they have to re-read.
Yes, writers don't wait for readers so there's no concept of upgrade/downgrade. You can unlock one and lock the other. You pay less than with any sort of mutex but the data may have changed in the process.
I can go into more detail if you like.
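To give an idea, here is a minimal single-writer sketch for a pair of integers. SeqlockPair is a made-up name, and I am assuming exactly one writer thread; with several writers you would wrap write() in the spinlock mentioned above. The payload fields are relaxed atomics so that the concurrent reads are not data races in the C++ memory model.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

class SeqlockPair {
public:
    // Assumption: called from a single writer thread only.
    void write(std::int64_t a, std::int64_t b) {
        const unsigned s = seq_.load(std::memory_order_relaxed);
        seq_.store(s + 1, std::memory_order_relaxed);  // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);
        a_.store(a, std::memory_order_relaxed);
        b_.store(b, std::memory_order_relaxed);
        seq_.store(s + 2, std::memory_order_release);  // even: stable again
    }

    // Readers never write shared state; they retry if a write overlapped.
    std::pair<std::int64_t, std::int64_t> read() const {
        for (;;) {
            const unsigned before = seq_.load(std::memory_order_acquire);
            const std::int64_t a = a_.load(std::memory_order_relaxed);
            const std::int64_t b = b_.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);
            const unsigned after = seq_.load(std::memory_order_relaxed);
            if (before == after && (before & 1u) == 0)
                return {a, b};  // no write started or finished in between
        }
    }

private:
    std::atomic<unsigned> seq_{0};
    std::atomic<std::int64_t> a_{0};
    std::atomic<std::int64_t> b_{0};
};
```

The reader only finds out after the fact whether its copy was consistent, which is why the payload has to stay readable (if possibly torn) at all times.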
The std::shared_mutex (as implemented in boost if not available on your platform(s)) provides some alternative for the problem.
For atomic upgrade lock semantics, the boost upgrade lock may be the best cross platform alternative.
It does not have an upgrade and downgrade locking mechanism you are looking for, but to get an exclusive lock, the shared access can be relinquished first, then exclusive access sought.
// assumes shared_lock with shared access has been obtained
ReadContainerLock container = state.getContainer();
auto value = container.find( "foo" )->bar;
{
container.shared_mutex().unlock_shared();
// Upgrade read lock to write lock
std::unique_lock<std::shared_mutex> write(container.shared_mutex());
// container work...
write.unlock();
container.shared_mutex().lock_shared();
} // Downgrades write lock to read lock
A utility class can be used to cause the re-locking of the shared_mutex at the end of the scope;
struct re_locker {
    re_locker(std::shared_mutex& m) : m_(m) { m_.unlock_shared(); }
    ~re_locker() { m_.lock_shared(); }
    // delete the copy and move constructors and assignment operator (redacted for simplicity)
private:
    std::shared_mutex& m_;
};
// ...
auto value = container.find( "foo" )->bar;
{
re_locker re_lock(container.shared_mutex());
// Upgrade read lock to write lock
std::unique_lock<std::shared_mutex> write(container.shared_mutex());
// container work...
} // Downgrades write lock to read lock
Depending on what exception guarantees you want or require, you may need to add a "can re-lock" flag to the re_locker to either do the re-lock or not if an exception is thrown during the container operations/work.
I currently have a program that has a cache like mechanism. I have a thread listening for updates from another server to this cache. This thread will update the cache when it receives an update. Here is some pseudo code:
void cache::update_cache()
{
cache_ = new std::map<std::string, value>();
while(true)
{
if(recv().compare("update") == 0)
{
std::map<std::string, value> *new_info = new std::map<std::string, value>();
std::map<std::string, value> *tmp;
//Get new info, store in new_info
tmp = cache_;
cache_ = new_info;
delete tmp;
}
}
}
std::map<std::string, value> *cache::get_cache()
{
return cache_;
}
cache_ is being read from many different threads concurrently. I believe that, as written, I will run into undefined behavior if one of my threads calls get_cache(), then my cache updates, then the thread tries to access the stored cache.
I am looking for a way to avoid this problem. I know I could use a mutex, but I would rather not block reads from happening as they have to be as low latency as possible, but if need be, I can go that route.
I was wondering if this would be a good use case for a unique_ptr. Is my understanding correct that if a thread calls get_cache, and that returns a unique_ptr instead of a standard pointer, the object will be deleted once all threads that have the old version of the cache are finished with it (i.e. leave scope)?
Is using a unique_ptr the best option for this case, or is there another option that I am not thinking of?
Any input will be greatly appreciated.
Edit:
I believe I made a mistake in my OP. I meant to use and pass a shared_ptr not a unique_ptr for cache_. And when all threads are finished with cache_ the shared_ptr should delete itself.
A little about my program: My program is a webserver that will be using this information to decide what information to return. It is fairly high throughput(thousands of req/sec) Each request queries the cache once, so telling my other threads when to update is no problem. I can tolerate slightly out of date information, and would prefer that over blocking all of my threads from executing if possible. The information in the cache is fairly large, and I would like to limit any copies on value because of this.
update_cache is only run once. It is run in a thread that just listens for an update command and runs the code.
I feel there are multiple issues:
1) Do not leak memory: for that, never use "delete" in your code and stick with unique_ptr (or shared_ptr in specific cases)
2) Protect accesses to shared data: for that, use either locking (mutex) or a lock-free mechanism (std::atomic)
class Cache {
using Map = std::map<std::string, value>;
std::unique_ptr<Map> m_cache;
std::mutex m_cacheLock;
public:
void update_cache()
{
while(true)
{
if(recv().compare("update") == 0)
{
std::unique_ptr<Map> new_info { new Map };
//Get new info, store in new_info
{
std::lock_guard<std::mutex> lock{m_cacheLock};
using std::swap;
swap(m_cache, new_info);
}
}
}
}
Note: I don't like update_cache() being part of a public interface for the cache as it contains an infinite loop. I would probably externalize the loop with the recv and have a:
void update_cache(std::unique_ptr<Map> new_info)
{
{ // This inner brace is not useless, we don't need to keep the lock during deletion
std::lock_guard<std::mutex> lock{m_cacheLock};
using std::swap;
swap(m_cache, new_info);
}
}
Now for reading from the cache, use proper encapsulation and don't let the pointer to the member map escape:
value get(const std::string &key)
{
// lock, fetch, and return.
// Depending on value type, you might want to allocate memory
// before locking
}
With this signature you have to throw an exception if the value is not present in the cache; another option is to return something like a boost::optional.
Overall you can keep a low latency (everything is relative, I don't know your use case) if you take care of doing costly operations (memory allocation for instance) outside of the locking section.
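Putting the pieces together, a self-contained sketch could look like the following. I am assuming value is a std::string purely for illustration (the real type isn't shown), and throwing std::out_of_range on a miss is just one of the options mentioned above.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

using value = std::string;  // assumption: stand-in for the real value type

class Cache {
public:
    using Map = std::map<std::string, value>;

    void update_cache(std::unique_ptr<Map> new_info) {
        {
            std::lock_guard<std::mutex> lock(m_cacheLock);
            m_cache.swap(new_info);
        }
        // new_info now owns the old map; it is destroyed when this function
        // returns, after the lock has already been released.
    }

    // Returns a copy so no pointer into the map escapes the lock.
    value get(const std::string& key) const {
        std::lock_guard<std::mutex> lock(m_cacheLock);
        if (m_cache) {
            auto it = m_cache->find(key);
            if (it != m_cache->end())
                return it->second;
        }
        throw std::out_of_range("key not in cache: " + key);
    }

private:
    mutable std::mutex m_cacheLock;
    std::unique_ptr<Map> m_cache;
};
```

The critical sections are a pointer swap and a lookup plus copy; if copying a value is expensive, storing shared_ptr<const value> in the map would keep the time under the lock short.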
shared_ptr is very reasonable for this purpose; C++11 has a family of functions for handling shared_ptr atomically. If the data is immutable after creation, you won't even need any additional synchronization:
class cache {
public:
using map_t = std::map<std::string, value>;
void update_cache();
std::shared_ptr<const map_t> get_cache() const;
private:
std::shared_ptr<const map_t> cache_;
};
void cache::update_cache()
{
while(true)
{
if(recv() == "update")
{
auto new_info = std::make_shared<map_t>();
// Get new info, store in new_info
// Make immutable & publish
std::atomic_store(&cache_,
std::shared_ptr<const map_t>{std::move(new_info)});
}
}
}
auto cache::get_cache() const -> std::shared_ptr<const map_t> {
return std::atomic_load(&cache_);
}
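On the reader side, each request would take one snapshot and use it for the whole request, so an update that lands in the middle cannot pull the map out from under it. A small self-contained sketch, with value assumed to be std::string and SnapshotCache, publish, snapshot and lookup_or as illustrative names:

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

using value = std::string;  // assumption: stand-in for the real value type
using map_t = std::map<std::string, value>;

class SnapshotCache {
public:
    // Called by the updater thread: build the map first, then publish it.
    void publish(map_t fresh) {
        std::shared_ptr<const map_t> next =
            std::make_shared<const map_t>(std::move(fresh));
        std::atomic_store(&current_, std::move(next));
    }

    // Called by readers: the returned pointer keeps that version alive.
    std::shared_ptr<const map_t> snapshot() const {
        return std::atomic_load(&current_);
    }

private:
    std::shared_ptr<const map_t> current_ = std::make_shared<const map_t>();
};

// One request: take the snapshot once, then only touch the snapshot.
value lookup_or(const SnapshotCache& cache, const std::string& key,
                const value& fallback) {
    const std::shared_ptr<const map_t> snap = cache.snapshot();
    const auto it = snap->find(key);
    return it == snap->end() ? fallback : it->second;
}
```

The old map is freed when the last reader drops its snapshot. From C++20 on, std::atomic<std::shared_ptr<T>> is the preferred spelling of the same idea.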
Let's say I have a multithreaded C++ program that handles requests in the form of a function call to handleRequest(string key). Each call to handleRequest occurs in a separate thread, and there are an arbitrarily large number of possible values for key.
I want the following behavior:
Simultaneous calls to handleRequest(key) are serialized when they have the same value for key.
Global serialization is minimized.
The body of handleRequest might look like this:
void handleRequest(string key) {
KeyLock lock(key);
// Handle the request.
}
Question: How would I implement KeyLock to get the required behavior?
A naive implementation might start off like this:
KeyLock::KeyLock(string key) {
global_lock->Lock();
internal_lock_ = global_key_map[key];
if (internal_lock_ == NULL) {
internal_lock_ = new Lock();
global_key_map[key] = internal_lock_;
}
global_lock->Unlock();
internal_lock_->Lock();
}
KeyLock::~KeyLock() {
internal_lock_->Unlock();
// Remove internal_lock_ from global_key_map iff no other threads are waiting for it.
}
...but that requires a global lock at the beginning and end of each request, and the creation of a separate Lock object for each request. If contention is high between calls to handleRequest, that might not be a problem, but it could impose a lot of overhead if contention is low.
You could do something similar to what you have in your question, but instead of a single global_key_map have several (probably in an array or vector) - which one is used is determined by some simple hash function on the string.
That way instead of a single global lock, you spread that out over several independent ones.
This is a pattern that is often used in memory allocators (I don't know if the pattern has a name - it should). When a request comes in, something determines which pool the allocation will come from (usually the size of the request, but other parameters can factor in as well), then only that pool needs to be locked. If an allocation request comes in from another thread that will use a different pool, there's no lock contention.
It will depend on the platform, but the two techniques that I'd try would be:
Use named mutex/synchronization objects, where object name = Key
Use filesystem-based locking, where you try to create a non-shareable temporary file with the key name. If it exists already (= already locked) this will fail and you'll have to poll to retry
Both techniques will depend on the detail of your OS. Experiment and see which works.
Perhaps an std::map<std::string, MutexType> would be what you want, where MutexType is the type of the mutex you want. You will probably have to wrap accesses to the map in another mutex in order to ensure that no other thread is inserting at the same time (and remember to perform the check again after the mutex is locked to ensure that another thread didn't add the key while waiting on the mutex!).
The same principle could apply to any other synchronization method, such as a critical section.
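A rough sketch of that idea as the KeyLock from the question, using std::mutex and a per-key user count so an entry is removed once nobody holds or waits for it. Entry, table() and active_keys() are names I introduced, and the global map lives in function-local statics to keep the example in one piece.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

class KeyLock {
public:
    explicit KeyLock(const std::string& key) : key_(key) {
        {
            // Short global section: find or create the entry and register interest.
            std::lock_guard<std::mutex> guard(table_mutex());
            Entry& e = table()[key_];  // map references stay valid across other inserts/erases
            ++e.users;
            entry_ = &e;
        }
        entry_->m.lock();  // may block, but only on callers using the same key
    }

    ~KeyLock() {
        entry_->m.unlock();
        std::lock_guard<std::mutex> guard(table_mutex());
        if (--entry_->users == 0)
            table().erase(key_);  // nobody holds or waits for this key any more
    }

    KeyLock(const KeyLock&) = delete;
    KeyLock& operator=(const KeyLock&) = delete;

    // Only for inspection: number of keys currently held or waited on.
    static std::size_t active_keys() {
        std::lock_guard<std::mutex> guard(table_mutex());
        return table().size();
    }

private:
    struct Entry {
        std::mutex m;
        int users = 0;
    };
    static std::mutex& table_mutex() { static std::mutex m; return m; }
    static std::map<std::string, Entry>& table() {
        static std::map<std::string, Entry> t;
        return t;
    }

    std::string key_;
    Entry* entry_ = nullptr;
};
```

The global mutex is still taken twice per request, but only for a map lookup and a counter update, never while the request itself runs.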
Raise granularity and lock entire key-ranges
This is a variation on Mike B's answer, where instead of having several fluid lock maps you have a single fixed array of locks that apply to key-ranges instead of single keys.
Simplified example: create array of 256 locks at startup, then use first byte of key to determine index of lock to be acquired (i.e. all keys starting with 'k' will be guarded by locks[107]).
To sustain optimal throughput you should analyze distribution of keys and contention rate. The benefits of this approach are zero dynamic allocations and simple cleanup; you also avoid two-step locking. The downside is potential contention peaks if key distribution becomes skewed over time.
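A minimal sketch of such a fixed lock array, using std::hash instead of the first byte so that skewed prefixes matter less (that substitution is my assumption, as are the names StripedLocks and for_key):

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>

class StripedLocks {
public:
    // Equal keys always map to the same mutex; different keys may share one.
    std::mutex& for_key(const std::string& key) {
        const std::size_t index = std::hash<std::string>{}(key) % stripes_.size();
        return stripes_[index];
    }

private:
    std::array<std::mutex, 256> stripes_;  // created once, never resized or freed
};

// Stand-in for handleRequest: 'counter' represents the per-key work.
void handleRequest(StripedLocks& locks, const std::string& key, int& counter) {
    std::lock_guard<std::mutex> lock(locks.for_key(key));
    ++counter;
}
```

Two unrelated keys that hash to the same stripe are serialized unnecessarily; more stripes reduce that at the cost of memory.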
After thinking about it, another approach might go something like this:
In handleRequest, create a Callback that does the actual work.
Create a multimap<string, Callback*> global_key_map, protected by a mutex.
If a thread sees that key is already being processed, it adds its Callback* to the global_key_map and returns.
Otherwise, it calls its callback immediately, and then calls the callbacks that have shown up in the meantime for the same key.
Implemented something like this:
LockAndCall(string key, Callback* callback) {
global_lock.Lock();
if (!global_key_map.contains(key)) {  // nobody is processing this key yet
iterator iter = global_key_map.insert(key, callback);
while (true) {
global_lock.Unlock();
iter->second->Call();
global_lock.Lock();
global_key_map.erase(iter);
iter = global_key_map.find(key);
if (iter == global_key_map.end()) {
global_lock.Unlock();
return;
}
}
} else {
global_key_map.insert(key, callback);
global_lock.Unlock();
}
}
This has the advantage of freeing up threads that would otherwise be waiting for a key lock, but apart from that it's pretty much the same as the naive solution I posted in the question.
It could be combined with the answers given by Mike B and Constantin, though.
/**
* StringLock class for string based locking mechanism
* e.g. usage
* StringLock strLock;
* strLock.Lock("row1");
* strLock.UnLock("row1");
*/
class StringLock {
public:
/**
* Constructor
* Initializes the mutexes
*/
StringLock() {
pthread_mutex_init(&mtxGlobal, NULL);
}
/**
* Lock Function
* The thread will return immediately if the string is not locked
* The thread will wait if the string is locked until it gets a turn
* @param string the string to lock
*/
void Lock(string lockString) {
pthread_mutex_lock(&mtxGlobal);
TListIds *listId = NULL;
TWaiter *wtr = new TWaiter;
wtr->evPtr = NULL;
wtr->threadId = pthread_self();
if (lockMap.find(lockString) == lockMap.end()) {
listId = new TListIds();
listId->insert(listId->end(), wtr);
lockMap[lockString] = listId;
pthread_mutex_unlock(&mtxGlobal);
} else {
wtr->evPtr = new Event(false);
listId = lockMap[lockString];
listId->insert(listId->end(), wtr);
pthread_mutex_unlock(&mtxGlobal);
wtr->evPtr->Wait();
}
}
/**
* UnLock Function
* @param string the string to unlock
*/
void UnLock(string lockString) {
pthread_mutex_lock(&mtxGlobal);
TListIds *listID = NULL;
if (lockMap.find(lockString) != lockMap.end()) {
lockMap[lockString]->pop_front();
listID = lockMap[lockString];
if (!(listID->empty())) {
TWaiter *wtr = listID->front();
Event *thdEvent = wtr->evPtr;
thdEvent->Signal();
} else {
lockMap.erase(lockString);
delete listID;
}
}
pthread_mutex_unlock(&mtxGlobal);
}
protected:
struct TWaiter {
Event *evPtr;
long threadId;
};
StringLock(StringLock &);
void operator=(StringLock&);
typedef list<TWaiter*> TListIds;
typedef map TMapLockHolders;
typedef map<string, TListIds*> TMapLockWaiters;
private:
pthread_mutex_t mtxGlobal;
TMapLockWaiters lockMap;
};