Structuring and Synchronizing a Multithreaded Game Loop

I'm running into a mild conundrum concerning thread safety for my game loop. What I have below are 3 threads (including the main one) that are meant to work together: one for event management (the main thread), one for logic, and one for rendering. Each of these threads lives in its own class, as you can see below. In basic testing the structure works without problems. This system uses SFML and renders with OpenGL.
int main(){
Gamestate gs;
EventManager em(&gs);
LogicManager lm(&gs);
Renderer renderer(&gs);
lm.start();
renderer.start();
em.eventLoop();
return 0;
}
However, as you may have noticed, I have a "Gamestate" class that is meant to act as a container for all the resources that need to be shared between the threads (mostly with LogicManager as a writer and Renderer as a reader; EventManager is mostly just for window events). My questions are (1 and 2 being the most important):
1) Is this a good way of going about things? Meaning is having a "global" Gamestate class a good idea to use? Is there a better way of going about it?
2) My intention was to have Gamestate have mutexes in the getters/setters, except that doesn't work for reading because I can't return the object while it's still locked, which means I'd have to put synchronization outside of the getters/setters and make the mutexes public. It also means I'd have a bloody ton of mutexes for all the different resources. What is the most elegant way of going about this problem?
3) I have all of the threads accessing "bool run" to check whether to continue their loops
while(gs->run){
....
}
run gets set to false if I receive a quit message in the EventManager. Do I need to synchronize that variable at all? Would I set it to volatile?
4) Does constantly dereferencing pointers and such have an impact on performance? eg gs->objects->entitylist.at(2)->move(); Do all those '->' and '.' cause any major slowdown?

Global state
1) Is this a good way of going about things? Meaning is having a "global" Gamestate class a good idea to use? Is there a better way of going about it?
For a game, as opposed to some reusable piece of code, I'd say a global state is good enough. You might even avoid passing gamestate pointers around, and really make it a global variable instead.
Synchronization
2) My intention was to have Gamestate have mutexes in the getters/setters, except that doesn't work for reading because I can't return the object while it's still locked, which means I'd have to put synchronization outside of the getters/setters and make the mutexes public. It also means I'd have a bloody ton of mutexes for all the different resources. What is the most elegant way of going about this problem?
I'd try to think of this in terms of transactions. Wrapping every single state change into its own mutex locking code will not only impact performance, but might lead to actually incorrect behaviour if the code gets one state element, performs some computation on it and sets the value later on, while some other code modified the same element in between. So I'd try to structure LogicManager and Renderer in such ways that all the interaction with the Gamestate occurs bundled in a few places. For the duration of that interaction, the thread should hold a mutex on the state.
If you want to enforce the use of mutexes, then you can create some construct where you have at least two classes. Let's call them GameStateData and GameStateAccess. GameStateData would contain all the state, but without providing public access to it. GameStateAccess would be a friend of GameStateData and provide access to its private data. The constructor of GameStateAccess would take a reference or pointer to the GameStateData and would lock the mutex for that data. The destructor would free the mutex. That way, your code to manipulate the state would simply be written as a block where a GameStateAccess object is in scope.
There is still a loophole, though: In cases where objects returned from this GameStateAccess class are pointers or references to mutable objects, then this setup won't keep your code from carrying such a pointer out of the scope protected by the mutex. To prevent this, either take care about how you write things, or use some custom pointer-like template class which can be cleared once the GameStateAccess goes out of scope, or make sure you only pass things by value not reference.
Example
Using C++11, the above idea for lock management could be implemented as follows:
#include <mutex>
class GameStateData {
private:
std::mutex _mtx;
int _val;
friend class GameStateAccess;
};
GameStateData global_state;
class GameStateAccess {
private:
GameStateData& _data;
std::lock_guard<std::mutex> _lock;
public:
GameStateAccess(GameStateData& data)
: _data(data), _lock(data._mtx) {}
int getValue() const { return _data._val; }
void setValue(int val) { _data._val = val; }
};
void LogicManager::performStateUpdate() {
int valueIncrement = computeValueIncrement(); // No lock for this computation
{ GameStateAccess gs(global_state); // Lock will be held during this scope
int oldValue = gs.getValue();
int newValue = oldValue + valueIncrement;
gs.setValue(newValue); // still in the same transaction
} // free lock on global state
cleanup(); // No lock held here either
}
Loop termination indicator
3) I have all of the threads accessing "bool run" to check whether to continue their loops
while(gs->run){
....
}
run gets set to false if I receive a quit message in the EventManager. Do I need to synchronize that variable at all? Would I set it to volatile?
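For this application, a volatile but otherwise unsynchronized variable will usually appear to work, since volatile keeps the compiler from caching the value and thereby hiding a modification by another thread. Under the C++11 memory model, however, volatile does not make concurrent access well defined: an unsynchronized write in one thread that races with reads in other threads is a data race, which is undefined behavior.
The better choice is therefore a std::atomic<bool> for this flag, which gives exactly the guarantee needed here.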
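A minimal sketch of a std::atomic termination flag, assuming C++11. The names worker_loop and count_until_stopped are hypothetical stand-ins for the question's logic/render loops and for the EventManager requesting shutdown; they are not part of the original code.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Shared termination flag; std::atomic makes concurrent reads and writes well defined.
std::atomic<bool> run{true};

// Hypothetical stand-in for a logic or render loop: iterates until the flag is cleared.
void worker_loop(long& iterations) {
    while (run.load()) {
        ++iterations;
        std::this_thread::yield();
    }
}

// Starts a worker, requests shutdown from the calling thread, and waits for the worker.
long count_until_stopped() {
    long iterations = 0;
    run.store(true);
    std::thread worker(worker_loop, std::ref(iterations));
    run.store(false);  // what the EventManager would do on a quit message
    worker.join();     // join() makes the worker's last write to iterations visible here
    return iterations;
}
```

The default sequentially consistent ordering is used here for simplicity; for a flag that is only polled, that is typically more than enough.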
Pointer indirection overhead
4) Does constantly dereferencing pointers and such have an impact on performance? eg gs->objects->entitylist.at(2)->move(); Do all those -> and . cause any major slowdown?
It depends on the alternatives. In many cases, the compiler will be able to keep the value of e.g. gs->objects->entitylist.at(2) in the above code, if it is used repeatedly, and won't have to compute it over and over again. In general I would consider the performance penalty due to all this pointer indirection to be of minor concern, though it is hard to say for sure without measuring.

Is it a good way of going about things? (class Gamestate)
1) Is this a good way of going about things?
Yes.
Meaning is having a "global" Gamestate class a good idea to use?
Yes, if the getter/setter are thread-safe.
Is there a better way of going about it?
No. The data is necessary for both game logic and representation. You could remove the global gamestate if you put it in a sub-routine, but this would only transport your problem to another function. A global Gamestate will also enable you to save the current state very easily.
Mutex and getters/setters
2) My intention was to have Gamestate have mutexes in the getters/setters [...]. What is the most elegant way of going about this problem?
This is called the readers-writers problem. You don't need public mutexes for this. Just keep in mind that you can have many readers, but only one writer. You could implement a queue for the readers/writers and block additional readers until the writer has finished.
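One way to get the many-readers, one-writer behaviour without hand-rolling a queue is a shared mutex. The following is a minimal sketch, assuming C++17 (std::shared_mutex); the class SharedPositions and its members are hypothetical and only illustrate the locking pattern.

```cpp
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical shared state: many readers may hold the lock at once, writers are exclusive.
class SharedPositions {
public:
    // Writer side (e.g. the logic thread): exclusive lock.
    void set(std::vector<int> positions) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        positions_ = std::move(positions);
    }
    // Reader side (e.g. the renderer): shared lock; returns a copy
    // so that no reference escapes the locked region.
    std::vector<int> snapshot() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return positions_;
    }
private:
    mutable std::shared_mutex mutex_;  // private, so callers cannot lock it incorrectly
    std::vector<int> positions_;
};
```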
while(gs->run)
Do I need to synchronize that variable at all?
Whenever a non-synchronized access of a variable could result in an unknown state, it should be synchronized. So if run is set to false immediately after the rendering engine has started the next iteration and the Gamestate has been destroyed, it will result in a mess. However, if gs->run is only an indicator of whether the loop should continue, it is safe.
Keep in mind that both the logic and the rendering engine should be stopped at the same time. If you can't shut down both at the same time, stop the rendering engine first in order to prevent a freeze.
Dereferencing pointers
4) Does constantly dereferencing pointers and such have an impact on performance?
There are two rules of optimization:
1) Do not optimize.
2) Do not optimize yet.
The compiler will probably take care of this problem. You, as a programmer, should use the version which is most readable for you.

Related

Guarded Data Design Pattern

In our application we deal with data that is processed in a worker thread and accessed in a display thread and we have a mutex that takes care of critical sections. Nothing special.
Now we thought about re-working our code where currently locking is done explicitly by the party holding and handling the data. We thought of a single entity that holds the data and only gives access to the data in a guarded fashion.
For this, we have a class called GuardedData. The caller can request such an object and should keep it only for a short time in local scope. As long as the object lives, it keeps the lock. As soon as the object is destroyed, the lock is released. The data access is coupled with the locking mechanism without any explicit extra work in the caller. The name of the class reminds the caller of the present guard.
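#include <boost/thread/locks.hpp>
template<typename T, typename Lockable>
class GuardedData {
public:
GuardedData(T &d, Lockable &m) : guard(m), data(d) {} // initializers listed in member declaration order
T *operator->() { return &data; } // operator-> has to return a pointer (or an object with its own operator->)
private:
boost::lock_guard<Lockable> guard; // holds the lock for the lifetime of this object
T &data;
};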
Again, a very simple concept. The operator-> mimics the semantics of STL iterators for access to the payload.
Now I wonder:
Is this approach well known?
Is there maybe a templated class like this already available, e.g. in the boost libraries?
I am asking because I think it is a fairly generic and usable concept. I could not find anything like it though.

Depending upon how this is used, you are almost guaranteed to end up with deadlocks at some point. If you want to operate on 2 pieces of data then you end up locking the mutex twice and deadlocking (unless each piece of data has its own mutex - which would also result in deadlock if the lock order is not consistent - you have no control over that with this scheme without making it really complicated). Unless you use a recursive mutex which may not be desired.
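The lock-ordering problem for two separately guarded pieces of data can be sidestepped by acquiring both mutexes in one step. A minimal sketch, assuming C++11 and std::mutex rather than the Boost types from the question; GuardedInt and move_amount are hypothetical names used only for illustration.

```cpp
#include <mutex>

// Hypothetical piece of data with its own mutex.
struct GuardedInt {
    std::mutex m;
    int value = 0;
};

// Moves `amount` from one guarded value to another. std::lock acquires both
// mutexes using a deadlock-avoidance algorithm, so two threads calling this
// with the arguments swapped cannot deadlock each other.
void move_amount(GuardedInt& from, GuardedInt& to, int amount) {
    if (&from == &to) return;  // locking the same non-recursive mutex twice is undefined
    std::lock(from.m, to.m);
    // Adopt the already-held locks so they are released on scope exit.
    std::lock_guard<std::mutex> hold_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> hold_to(to.m, std::adopt_lock);
    from.value -= amount;
    to.value += amount;
}
```

This only works when the code that needs both values can see both mutexes, which is exactly what a guard object handed out per data item makes difficult.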
Also, how are your GuardedData objects passed around? boost::lock_guard is not copyable - it raises ownership issues on the mutex i.e. where & when it is released.
It's probably easier to copy the parts of the data you need to the reader/writer threads as and when they need them, keeping the critical section short. The writer would similarly commit to the data model in one go.
Essentially your viewer thread gets a snapshot of the data it needs at a given time. This may even fit entirely in a cpu cache sitting near the core that is running the thread and never make it into RAM. The writer thread may modify the underlying data whilst the reader is dealing with it (but that should invalidate the view). However since the viewer has a copy it can continue on and provide a view of the data at the moment it was synchronized with the data.
The other option is to give the view a smart pointer to the data (which should be treated as immutable). If the writer wishes to modify the data, it copies it at that point, modifies the copy and when completes, switches the pointer to the data in the model. This would necessitate blocking all readers/writers whilst processing, unless there is only 1 writer. The next time the reader requests the data, it gets the fresh copy.
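The smart-pointer option can be sketched roughly as follows, assuming C++11 and a single writer thread. The class Model, its Data alias, and the member names are hypothetical; a mutex protects only the pointer itself, not the data it points to.

```cpp
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical model: readers get an immutable snapshot, and the single writer
// publishes a modified copy by swapping the pointer.
class Model {
public:
    using Data = std::vector<int>;

    Model() : current_(std::make_shared<Data>()) {}

    // Reader: copy the pointer under the lock, then use the data without locking.
    std::shared_ptr<const Data> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

    // Writer (assumed to be the only one): copy, modify the copy, then swap it in.
    void append(int value) {
        auto next = std::make_shared<Data>(*snapshot());
        next->push_back(value);
        std::lock_guard<std::mutex> lock(mutex_);
        current_ = std::move(next);
    }
private:
    mutable std::mutex mutex_;             // guards current_ only
    std::shared_ptr<const Data> current_;  // the data itself is never modified in place
};
```

A reader that still holds an old snapshot keeps it alive through the shared_ptr and simply sees the fresh data the next time it calls snapshot().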

Well known, I'm not sure. However, I use a similar mechanism in Qt pretty often called a QMutexLocker. The distinction (a minor one, imho) is that you bind the data together with the mutex. A very similar mechanism to the one you've described is the norm for thread synchronization in C#.
Your approach is nice for guarding one data item at a time but gets cumbersome if you need to guard more than that. Additionally, it doesn't look like your design would stop me from creating this object in a shared place and accessing the data as often as I please, thinking that it's guarded perfectly fine, but in reality recursive access scenarios are not handled, nor are multi-threaded access scenarios if they occur in the same scope.
There seems to me to be a slight disconnect in the idea. Its use conveys to me that accessing the data is always made to be thread-safe because the data is guarded. Often, this isn't enough to ensure thread-safety. Order of operations on protected data often matters, so the locking is really scope-oriented, not data-oriented. You could get around this in your model by guarding a dummy object and wrapping your guard object in a temporary scope, but then why not just use one of the existing mutex implementations?
Really, it's not a bad approach, but you need to make sure its intended use is understood.

std::mutex vs std::recursive_mutex as class member

I have seen some people hate on recursive_mutex:
http://www.zaval.org/resources/library/butenhof1.html
But when thinking about how to implement a class that is thread safe (mutex protected), it seems to me excruciatingly hard to prove that every method that should be mutex protected is mutex protected and that mutex is locked at most once.
So for object-oriented design, should std::recursive_mutex be the default and std::mutex considered a performance optimization in the general case, unless it is used only in one place (to protect only one resource)?
To make things clear, I'm talking about one private nonstatic mutex. So each class instance has only one mutex.
At the beginning of each public method:
{
std::scoped_lock<std::recursive_mutex> sl;

Most of the time, if you think you need a recursive mutex then your design is wrong, so it definitely should not be the default.
For a class with a single mutex protecting the data members, then the mutex should be locked in all the public member functions, and all the private member functions should assume the mutex is already locked.
If a public member function needs to call another public member function, then split the second one in two: a private implementation function that does the work, and a public member function that just locks the mutex and calls the private one. The first member function can then also call the implementation function without having to worry about recursive locking.
e.g.
class X {
std::mutex m;
int data;
int const max=50;
void increment_data() {
if (data >= max)
throw std::runtime_error("too big");
++data;
}
public:
X():data(0){}
int fetch_count() {
std::lock_guard<std::mutex> guard(m);
return data;
}
void increase_count() {
std::lock_guard<std::mutex> guard(m);
increment_data();
}
int increase_count_and_return() {
std::lock_guard<std::mutex> guard(m);
increment_data();
return data;
}
};
This is of course a trivial contrived example, but the increment_data function is shared between two public member functions, each of which locks the mutex. In single-threaded code, it could be inlined into increase_count, and increase_count_and_return could call that, but we can't do that in multithreaded code.
This is just an application of good design principles: the public member functions take responsibility for locking the mutex, and delegate the responsibility for doing the work to the private member function.
This has the benefit that the public member functions only have to deal with being called when the class is in a consistent state: the mutex is unlocked, and once it is locked then all invariants hold. If you call public member functions from each other then they have to handle the case that the mutex is already locked, and that the invariants don't necessarily hold.
It also means that things like condition variable waits will work: if you pass a lock on a recursive mutex to a condition variable then (a) you need to use std::condition_variable_any because std::condition_variable won't work, and (b) only one level of lock is released, so you may still hold the lock, and thus deadlock because the thread that would trigger the predicate and do the notify cannot acquire the lock.
I struggle to think of a scenario where a recursive mutex is required.

should std::recursive_mutex be the default and std::mutex considered a performance optimization?
Not really, no. The advantage of using non-recursive locks is not just a performance optimization, it means that your code is self-checking that leaf-level atomic operations really are leaf-level, they aren't calling something else that uses the lock.
There's a reasonably common situation where you have:
1) a function that implements some operation that needs to be serialized, so it takes the mutex and does it;
2) another function that implements a larger serialized operation, and wants to call the first function to do one step of it, while it is holding the lock for the larger operation.
For the sake of a concrete example, perhaps the first function atomically removes a node from a list, while the second function atomically removes two nodes from a list (and you never want another thread to see the list with only one of the two nodes taken out).
You don't need recursive mutexes for this. For example you could refactor the first function as a public function that takes the lock and calls a private function that does the operation "unsafely". The second function can then call the same private function.
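A minimal sketch of that refactoring for the list example, assuming C++11. The class NodeList and the helper remove_unlocked are hypothetical names; only remove_one_node and remove_two_nodes come from the discussion.

```cpp
#include <cstddef>
#include <list>
#include <mutex>
#include <utility>

// Hypothetical list wrapper: public functions lock, the private helper assumes
// the lock is already held, so a plain (non-recursive) std::mutex suffices.
class NodeList {
public:
    explicit NodeList(std::list<int> nodes) : nodes_(std::move(nodes)) {}

    void remove_one_node(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        remove_unlocked(value);
    }
    // Both removals happen under one lock, so no other thread can observe
    // the list with only one of the two nodes taken out.
    void remove_two_nodes(int first, int second) {
        std::lock_guard<std::mutex> lock(mutex_);
        remove_unlocked(first);
        remove_unlocked(second);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return nodes_.size();
    }
private:
    // Precondition: mutex_ is held by the caller.
    void remove_unlocked(int value) { nodes_.remove(value); }

    mutable std::mutex mutex_;
    std::list<int> nodes_;
};
```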
However, sometimes it's convenient to use a recursive mutex instead. There's still an issue with this design: remove_two_nodes calls remove_one_node at a point where a class invariant doesn't hold (the second time it calls it, the list is in precisely the state we don't want to expose). But assuming we know that remove_one_node doesn't rely on that invariant this isn't a killer fault in the design, it's just that we've made our rules a little more complex than the ideal "all class invariants always hold whenever any public function is entered".
So, the trick is occasionally useful and I don't hate recursive mutexes to quite the extent that article does. I don't have the historical knowledge to argue that the reason for their inclusion in POSIX is different from what the article says, "to demonstrate mutex attributes and thread extensions". I certainly don't consider them the default, though.
I think it's safe to say that if in your design you're uncertain whether you need a recursive lock or not, then your design is incomplete. You will later regret the fact that you're writing code and you don't know something so fundamentally important as whether the lock is allowed to be already held or not. So don't put in a recursive lock "just in case".
If you know that you need one, use one. If you know that you don't need one, then using a non-recursive lock isn't just an optimization, it's helping to enforce a constraint of the design. It's more useful for the second lock to fail, than for it to succeed and conceal the fact that you've accidentally done something that your design says should never happen. But if you follow your design, and never double-lock the mutex, then you'll never find out whether it's recursive or not, and so a recursive mutex isn't directly harmful.
This analogy might fail, but here's another way to look at it. Imagine you had a choice between two kinds of pointer: one that aborts the program with a stacktrace when you dereference a null pointer, and another one that returns 0 (or to extend it to more types: behaves as if the pointer refers to a value-initialized object). A non-recursive mutex is a bit like the one that aborts, and a recursive mutex is a bit like the one that returns 0. They both potentially have their uses -- people sometimes go to some lengths to implement a "quiet not-a-value" value. But in the case where your code is designed to never dereference a null pointer, you don't want to use by default the version that silently allows that to happen.

I'm not going to directly weigh in on the mutex versus recursive_mutex debate, but I thought it would be good to share a scenario where recursive mutexes are absolutely critical to the design.
When working with Boost.Asio and Boost.Coroutine (and probably things like NT Fibers, although I'm less familiar with them), it is absolutely essential that your mutexes be recursive even without the design problem of re-entrancy.
The reason is that the coroutine-based approach, by its very design, will suspend execution inside a routine and then subsequently resume it. This means that two top-level methods of a class might "be being called at the same time on the same thread" without any sub-calls being made.

interprocess object passing

I need to have a class with one activity that is performed once per 5 seconds in its own thread. It is a web service one, so it needs an endpoint to be specified. During the object runtime the main thread can change the endpoint. This is my class:
class Worker
{
public:
void setEndpoint(const std::string& endpoint);
private:
void activity (void);
mutex endpoint_mutex;
volatile std::auto_ptr<std::string> newEndpoint;
WebServiceClient client;
};
Does the newEndpoint object need to be declared volatile? I would certainly do it if the read was in some loop (to make the compiler not optimize it out), but here I don't know.
In each run the activity() function checks for a new endpoint (if a new one is there, it passes it to the client and performs some reconnection steps) and then does its work.
void Worker::activity(void)
{
endpoint_mutex.lock(); //don't consider exceptions
std::auto_ptr<std::string>& ep = const_cast<std::auto_ptr<std::string>&>(newEndpoint); // cast away volatile
if (NULL != ep.get())
{
client.setEndpoint(*ep);
ep.reset(NULL);
endpoint_mutex.unlock();
client.doReconnectionStuff();
client.doReconnectionStuff2();
}
else
{
endpoint_mutex.unlock();
}
client.doSomeStuff();
client.doAnotherStuff();
.....
}
I lock the mutex, which means that the newEndpoint object cannot change anymore, so I cast away the volatile qualifier to be able to invoke its member functions (which are not volatile-qualified).
The setEndpoint method (called from other threads):
void Worker::setEndpoint(const std::string& endpoint)
{
endpoint_mutex.lock(); //again - don't consider exceptions
std::auto_ptr<std::string>& ep = const_cast<std::auto_ptr<std::string>&>(newEndpoint); // cast away volatile
ep.reset(new std::string(endpoint));
endpoint_mutex.unlock();
}
Is this thing thread safe? If not, what is the problem? Do I need the newEndpoint object to be volatile?

volatile is used in the following cases per MSDN:
The volatile keyword is a type qualifier used to declare that an
object can be modified in the program by something such as the
operating system, the hardware, or a concurrently executing thread.
Objects declared as volatile are not used in certain optimizations
because their values can change at any time. The system always reads
the current value of a volatile object at the point it is requested,
even if a previous instruction asked for a value from the same object.
Also, the value of the object is written immediately on assignment.
The question in your case is how often, and from where, newEndpoint actually changes. You create a connection in thread A, and then you do some work. While this is going on, nothing else can fiddle with your endpoint, as it is protected by a mutex. So, per my analysis, and from what I can see in your code, every access to this variable already happens under the lock, and volatile adds nothing here.
I cannot see the call site of your class, so I don't know if you are using the same class instance 100 times or more, or if you are creating new objects.
This is the kind of analysis you need to make when asking whether something should be volatile or not.
Also, on your thread-safety, what happens in these functions:
client.doReconnectionStuff();
client.doReconnectionStuff2();
Are they using any of the shared state from your Worker class? Are they sharing and modifying any other state used by another thread? If yes, you need to do the appropriate synchronization.
If not, then you're ok.
Threading requires some thinking; you need to ask yourself these questions. You need to look at all state and wonder whether or not you're sharing it. If you're dealing with pointers, then you need to wonder who owns the pointer, and whether you're ever sharing it amongst threads, accidentally or not, and act accordingly. If you pass a pointer to a function that is run in a different thread, then you're sharing the object that the pointer points to. If you then alter what it points to in this new thread, you are sharing and need to synchronize.
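As an illustration of state that is shared only under a lock, here is a minimal sketch of the endpoint handoff, assuming C++11. It swaps std::auto_ptr for std::unique_ptr and uses std::lock_guard instead of manual lock()/unlock(); the class EndpointSlot and its member names are hypothetical and not part of the question's Worker class.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical reduction of the Worker's endpoint handling: every access goes
// through the mutex, so the pending endpoint is a plain (non-volatile) member.
class EndpointSlot {
public:
    // Called from any thread to request a new endpoint.
    void set(const std::string& endpoint) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.reset(new std::string(endpoint));
    }
    // Called from the worker thread: takes ownership of the pending endpoint,
    // if any. The lock is released on return, before any slow reconnection work.
    std::unique_ptr<std::string> take() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::move(pending_);  // leaves pending_ empty
    }
private:
    std::mutex mutex_;
    std::unique_ptr<std::string> pending_;
};
```

The worker would call take() once per cycle and reconnect only when the returned pointer is non-null; the string it receives is no longer shared with any other thread.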

Mutex protection for Singleton resources in multithreaded env

I have a server listening on a port for request. When the request comes in, it is dispatched to a singleton class Singleton. This Singleton class has a data structure RootData.
class Singleton {
public:
void process();
void refresh();
private:
RootData mRootData;
};
The class has two functions: process, which works with mRootData to do some processing, and refresh, which is called periodically from another thread to refresh mRootData with the latest changes from the database.
Access to mRootData is required to be guarded by a mutex.
I have the following questions:
1] If the class is a singleton and mRootData is inside that class, is the mutex guard really necessary?
I know it is necessary for the conflict between refresh and process. But from the server, I think there will be only one call to process happening at any given time (because the class is a singleton). Please correct me if my understanding is wrong.
2] Should I protect i) the data structure or ii) the function accessing the data structure? E.g.
i) const RootData& GetRootData()
{
ACE_Read_Guard guard(m_oMutexReadWriteLock);
return mRootData;
// Mutex is released when this function returns
}
// Similarly Write lock for SetRootData()
ii) void process()
{
ACE_Read_Guard guard(m_oMutexReadWriteLock);
// use mRootData and do processing
// GetRootData() and SetRootData() wont be mutex protected.
// Mutex is released when this function returns
}
3] If the answer to the above is i), should I return by reference or by value?
Please explain in either case.
Thanks in advance.

1] If the class is a singleton and mRootData is inside that class, is the mutex guard really necessary?
Yes it is, since one thread may call process() while another is calling refresh().
2] Should I protect i) the data structure or ii) the function accessing the data structure?
A mutex is meant to protect a common code path, i.e. (part of) the function accessing the shared data. And it is easiest to use when locking and releasing happen within the same code block. Putting them into different methods is almost an open invitation for deadlocks, since it is up to the caller to ensure that every lock is properly released.
Update: if GetRootData and SetRootData are public functions too, there is not much point to guard them with mutexes in their current form. The problem is, you are publishing a reference to the shared data, after which it is completely out of your control what and when the callers may do with it. 100 callers from 100 different threads may store a reference to mRootData and decide to modify it at the same time! Similarly, after calling SetRootData the caller may retain the reference to the root data, and access it anytime, without you even noticing (except eventually from a data corruption or deadlock...).
You have the following options (apart from praying that the clients be nice and don't do nasty things to your poor singleton ;-)
create a deep copy of mRootData both in the getter and the setter. This keeps the data confined to the singleton, where it can be guarded with locks. OTOH callers of GetRootData get a snapshot of the data, and subsequent changes are not visible to them - this may or may not be acceptable to you.
rewrite RootData to make it thread safe, then the singleton needs to care no more about thread safety (in its current setup - if it has other data members, the picture may be different).
Update2:
or remove the getter and setter altogether (possibly together with moving data processing methods from other classes into the singleton, where these can be properly guarded by mutexes). This would be the simplest and safest, unless you absolutely need other parties to access mRootData directly.

1.) Yes, the mutex is necessary. Although there is only one instance of the class in existence at any one time, multiple threads could still call process() on that instance at the same time (unless you design your app so that never happens).
2.) Anytime you use the value you should protect it with the mutex.
However, you don't mention a GetRootData and SetRootData in your class declaration above. Are these private (used only inside the class to access the data) or public (to allow other code to access the data directly)?
If you need to provide outside access to the data by making the GetRootData() function public, then you would need to return a copy, or your callers could then store a reference and manipulate the data after the lock has been released. Of course, then changes they made to the data wouldn't be reflected inside the singleton, which might not be what you want.
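A minimal sketch of the copy-returning accessor, assuming C++11 and a plain std::mutex in place of the ACE read/write lock from the question. RootData is stood in for by a std::map, and RootDataHolder is a hypothetical name for the singleton's data-owning part.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the question's RootData type.
using RootData = std::map<std::string, int>;

class RootDataHolder {
public:
    // Returns a copy made while the lock is held; callers never obtain a
    // reference to the guarded member.
    RootData GetRootData() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return mRootData;
    }
    // Replaces the guarded data with a copy of the argument.
    void SetRootData(const RootData& data) {
        std::lock_guard<std::mutex> lock(mutex_);
        mRootData = data;
    }
private:
    mutable std::mutex mutex_;
    RootData mRootData;
};
```

Changes a caller makes to its copy are not reflected inside the holder, which is the trade-off described above.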

Does a getter function need a mutex?

I have a class that is accessed from multiple threads. Both of its getter and setter functions are guarded with locks.
Are the locks for the getter functions really needed? If so, why?
class foo {
public:
void setCount (int count) {
boost::lock_guard<boost::mutex> lg(mutex_);
count_ = count;
}
int count () {
boost::lock_guard<boost::mutex> lg(mutex_); // mutex needed?
return count_;
}
private:
boost::mutex mutex_;
int count_;
};

The only way you can get around having the lock is if you can convince yourself that the system will transfer the guarded variable atomically in all cases. If you can't be sure of that for one reason or another, then you'll need the mutex.
For a simple type like an int, you may be able to convince yourself this is true, depending on architecture, and assuming that it's properly aligned for single-instruction transfer. For any type that's more complicated than this, you're going to have to have the lock.

If you don't have a mutex around the getter, and a thread is reading it while another thread is writing it, you'll get funny results.

Is the mutex really only protecting a single int? It makes a difference -- if it is a more complex datatype you definitely need locking.
But if it is just an int, and you are sure that int is an atomic type (i.e., the processor will not have to do two separate memory reads to load the int into a register), and you have benchmarked the performance and determined you need better performance, then you may consider dropping the lock from both the getter and the setter. If you do that, make sure to qualify the int as volatile. And write a comment explaining why you do not have mutex protection, and under what conditions you would need it if the class changes.
Also, beware that you don't have code like this:
void func(foo &f) {
int temp = f.count();
++temp;
f.setCount(temp);
}
That is not threadsafe, regardless of whether you use a mutex or not. If you need to do something like that, the mutex protection has to be outside the setter/getter functions.
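One way to move the protection around the whole read-modify-write is to offer the compound operation as a member function. A minimal sketch, assuming C++11 and std::mutex instead of the Boost types in the question; the increment() member is a hypothetical addition to the question's class.

```cpp
#include <mutex>

// Variant of the question's class with the read-modify-write done under one lock.
class foo {
public:
    void setCount(int count) {
        std::lock_guard<std::mutex> lg(mutex_);
        count_ = count;
    }
    int count() const {
        std::lock_guard<std::mutex> lg(mutex_);
        return count_;
    }
    // Hypothetical addition: read, increment and write back form a single
    // critical section, so no other thread can interleave with them.
    int increment() {
        std::lock_guard<std::mutex> lg(mutex_);
        return ++count_;
    }
private:
    mutable std::mutex mutex_;
    int count_ = 0;
};
```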

The synchronization concern is already covered in other answers (specifically David Schwartz's).
There's another concern I don't see addressed, though: this is usually a bad design.
Consider David's example code, assuming we have a correctly-synchronized version of foo
{
foo j;
some_func(j);
while (j.count() == 0)
{
// do we still expect (j.count() == 0) here?
bar();
}
}
The code suggests that the while condition still holds in the body. That's how single-threaded code works, after all.
But of course, even if we correctly synchronize the implementation of a getter, the setter can still be called from another thread, between our while condition succeeding and the first instruction of the loop body executing.
So, if any logic in the loop body can't depend on the condition being true, what was the point of testing it?
Sometimes it makes perfect sense, such as
while (foo.shouldKeepRunning())
{
// foo event loop or something
}
where it's OK if our shouldKeepRunning state changes during the loop body, because we only need to test it periodically. However, if you're going to do something with count, you need a longer-lived lock, and an interface to support it:
{
auto guard = j.lock_guard();
while (j.count(guard) == 0) // prove to count that we're locked
{
// now we _know_ count is zero in the body
// (but bar should release and re-acquire the lock or that can never change)
bar(j);
}
} // guard goes out of scope and unlocks
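An interface supporting that usage could look roughly like this. It is only a sketch, assuming C++11 and std::unique_lock as the guard type; the runtime check and the exception are assumptions of this example, not something the snippet above prescribes.

```cpp
#include <mutex>
#include <stdexcept>

class foo {
public:
    using guard_type = std::unique_lock<std::mutex>;

    // Hands the caller a lock that stays held until the guard is destroyed.
    guard_type lock_guard() { return guard_type(mutex_); }

    // The guard parameter is the caller's proof that it holds this object's lock.
    int count(const guard_type& guard) const {
        check(guard);
        return count_;
    }
    void setCount(const guard_type& guard, int count) {
        check(guard);
        count_ = count;
    }
private:
    // Rejects guards that are currently unlocked or that belong to another mutex.
    void check(const guard_type& guard) const {
        if (!guard.owns_lock() || guard.mutex() != &mutex_)
            throw std::logic_error("foo accessed without holding its lock");
    }
    mutable std::mutex mutex_;
    int count_ = 0;
};
```

Because std::unique_lock exposes unlock() and lock(), a function such as bar could temporarily release the guard it is given and re-acquire it, which is what allows the count to change between iterations.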

In your case probably not, if your CPU is 32-bit. However, if count is a complex object, or the CPU needs more than one instruction to update its value, then yes.

The lock is necessary to serialize access to a shared resource. In your specific case you might get away with just atomic integer operations, but in general, for larger objects that require more than one bus transaction, you do need locks to guarantee that a reader always sees a consistent object.
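For the single-int case, the atomic integer operations mentioned here could be sketched as follows, assuming C++11's std::atomic rather than anything Boost-specific. The increment() member is a hypothetical addition.

```cpp
#include <atomic>

// Variant of the question's class with the mutex replaced by an atomic integer.
class foo {
public:
    void setCount(int count) { count_.store(count); }
    int count() const { return count_.load(); }
    // Atomic read-modify-write; fetch_add returns the value before the addition.
    int increment() { return count_.fetch_add(1) + 1; }
private:
    std::atomic<int> count_{0};
};
```

This covers only a single scalar; as soon as two members have to change together, a lock is needed again.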

It depends on the exact implementation of the object being locked. However, in general you do not want someone modifying (setting?) an object while someone else is in the process of reading (getting?) it. The easiest way to prevent that is to have a reader lock it.
In more complicated setups the lock will be implemented in such a way that any number of folks can read at once, but nobody can write to it while anyone is reading, and nobody can read while a write is going on.

They are really needed.
Imagine if you have an instance of class foo that's completely local to some piece of code. And you have something like this:
{
foo j;
some_func(j); // this stashes a reference to j where another thread can find it
while (j.count() == 0)
bar();
}
Suppose the optimizer looks carefully at the code to bar and sees that it can't possibly modify j.count_. This allows the optimizer to rewrite the code as follows:
{
foo j;
some_func(j); // this stashes a reference to j where another thread can find it
if (j.count() == 0)
{
while (1)
bar();
}
}
Clearly this is a disaster. Another thread might call j.setCount(5) and the thread wouldn't exit the loop.
The compiler can prove that bar can't modify the return value of j.count(). If it was required to assume that another thread could modify every memory value it accesses, it could never stash anything in a register ever, which would clearly be an untenable situation.
So, yes, the lock is needed. Alternatively, you need to use some other construct that provides similar guarantees.
Do not ever write code that relies on compilers not being able to make any optimization that they are permitted to make unless you really have no other practical choice. I have seen this cause a lot of pain over the many years I've been programming. Optimizers today can do things that would have been considered absurdly implausible a decade ago and lots of code lasts longer than you expect.
