I would like to know if there is a good way to monitor my application internals, ideally in the form of an existing library.
My application is heavily multithreaded, and uses a messaging system to communicate between threads and with the external world. My goal is to monitor what kinds of messages are sent, at what frequency, etc.
There could also be other statistics in a more general way, like how many threads are spawned every minute, how often new/delete are called, or more specific aspects of the application; you name it.
What would be awesome is something like the "internal pages" you have in Google Chrome, like net or chrome://tracing, but in a command-line fashion.
If there is a library that's generic enough to accommodate the specifics of my app, that would be great.
Otherwise I'm prepared to implement a small class that would do the job, but I don't know where to start. I think the most important thing is that the code shouldn't interfere too much, so that performance is not impacted.
Do you guys have some pointers on this matter?
Edit: my application runs on Linux, in an embedded environment, sadly not supported by Valgrind :(
I would recommend that in your code, you maintain counters that get incremented. The counters can be static class members or globals. If you use a class to define your counter, you can have the constructor register your counter with a single repository along with a name. Then, you can query and reset your counters by consulting the repository.
struct Counter {
    unsigned long c_;

    unsigned long operator++ ()     { return ++c_; }
    operator unsigned long () const { return c_; }

    // Reset by atomically subtracting the current value rather than assigning 0,
    // so increments racing with the reset are not lost.
    void reset () { unsigned long c = c_; ATOMIC_DECREMENT(c_, c); }

    Counter (std::string name);
};

struct CounterAtomic : public Counter {
    unsigned long operator++ () { return ATOMIC_INCREMENT(c_, 1); }
    CounterAtomic (std::string name) : Counter(name) {}
};
ATOMIC_INCREMENT would be a platform specific mechanism to increment the counter atomically. GCC provides a built-in __sync_add_and_fetch for this purpose. ATOMIC_DECREMENT is similar, with GCC built-in __sync_sub_and_fetch.
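For illustration, on GCC these macros might be defined roughly as follows (a sketch of mine, assuming the legacy __sync builtins are available on your target; the definitions are not part of the original code):

#define ATOMIC_INCREMENT(var, amount) __sync_add_and_fetch(&(var), (amount))
#define ATOMIC_DECREMENT(var, amount) __sync_sub_and_fetch(&(var), (amount))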
struct CounterRepository {
    typedef std::map<std::string, Counter *> MapType;

    mutable Mutex lock_;   // Mutex/ScopedLock stand in for your platform's locking primitives
    MapType map_;

    void add (std::string n, Counter &c) {
        ScopedLock<Mutex> sl(lock_);
        if (map_.find(n) != map_.end()) throw n;   // duplicate counter name
        map_[n] = &c;
    }

    Counter & get (std::string n) const {
        ScopedLock<Mutex> sl(lock_);
        MapType::const_iterator i = map_.find(n);
        if (i == map_.end()) throw n;              // unknown counter name
        return *(i->second);
    }
};
CounterRepository counterRepository;
Counter::Counter (std::string name) : c_(0) {
    counterRepository.add(name, *this);
}
If you know the same counter will be incremented by more than one thread, then use CounterAtomic. For counters that are specific to a thread, just use Counter.
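Usage could then look something like this (a sketch; the counter names, the Message type, and the places counters are incremented are of course application-specific):

// Registered once, e.g. at namespace scope or during startup.
CounterAtomic messagesSent("messages_sent");   // incremented from several threads
Counter timerTicks("timer_ticks");             // only touched by one thread

void sendMessage(const Message &m) {
    // ... actual send ...
    ++messagesSent;   // cheap bump on the hot path
}

// A monitoring/console thread can later read a counter back by name:
//   unsigned long n = counterRepository.get("messages_sent");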
I gather you are trying to implement the gathering of run-time statistics -- things like how many bytes you sent, how long you've been running, and how many times the user has activated a particular function.
Typically, in order to compile run-time statistics such as these from a variety of sources (like worker threads), I would have each source (thread) increment its own, local counters of the most fundamental data but not perform any lengthy math or analysis on that data yet.
Then back in the main thread (or wherever you want these stats analyzed and displayed), I send a RequestProgress-type message to each of the worker threads. In response, the worker threads gather up all the fundamental data and perhaps perform some simple analysis. This data, along with the results of the basic analysis, is sent back to the requesting (main) thread in a ProgressReport message. The main thread then aggregates all this data, does additional (perhaps costly) analysis, and formats and displays it to the user or logs it.
The main thread sends this RequestProgress message either on user request (like when they press the S key), or on a timed interval. If a timed interval is what I'm going for, I'll typically implement another new "heartbeat" thread. All this thread does is Sleep() for a specified time, then send a Heartbeat message to the main thread. The main thread in turn acts on this Heartbeat message by sending RequestProgress messages to every worker thread the statistics are to be gathered from.
The act of gathering statistics seems like it should be fairly straightforward. So why such a complex mechanism? The answer is two-fold.
First, the worker threads have a job to do, and computing usage statistics isn't it. Trying to refactor these threads to take on a second responsibility orthogonal to their main purpose is a little like trying to jam a square peg into a round hole. They weren't built to do that, so the code will resist being written.
Second, the computation of run-time statistics can be costly if you try to do too much, too often. Suppose for example you have a worker thread that sends multicast data on the network, and you want to gather throughput data: how many bytes, over how long a time period, and an average of how many bytes per second. You could have the worker thread compute all this on the fly itself, but it's a lot of work, and that CPU time is better spent by the worker thread doing what it's supposed to be doing: sending multicast data. If instead you simply increment a counter for how many bytes you've sent every time you send a message, the counting has minimal impact on the performance of the thread. Then, in response to the occasional RequestProgress message, you can figure out the start and stop times and send just that along, letting the main thread do all the division etc.
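A minimal sketch of the worker side of that idea (the names ProgressReport, onSend and onRequestProgress are placeholders of mine, not part of the answer):

#include <chrono>
#include <cstddef>

// Snapshot handed back to the main thread; in a real app this would travel
// inside a ProgressReport message on your messaging system.
struct ProgressReport {
    unsigned long long bytesSent = 0;
    std::chrono::steady_clock::duration window{};
};

struct Worker {
    unsigned long long bytesSent_ = 0;   // worker-local counter, no locking needed
    std::chrono::steady_clock::time_point windowStart_ = std::chrono::steady_clock::now();

    void onSend(std::size_t n) { bytesSent_ += n; }   // hot path: just count

    // Called when a RequestProgress message arrives: snapshot, reset, reply.
    ProgressReport onRequestProgress() {
        ProgressReport r;
        auto now = std::chrono::steady_clock::now();
        r.bytesSent  = bytesSent_;
        r.window     = now - windowStart_;
        bytesSent_   = 0;
        windowStart_ = now;
        return r;   // the main thread computes bytes/second and formats it
    }
};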
Use shared memory (POSIX, System V, mmap or whatever you have available). Put a fixed length array of volatile unsigned 32- or 64-bit integers (i.e. the largest you can atomically increment on your platform) in there by casting the raw block of memory to your array definition. Note that the volatile doesn't get you atomicity; it prevents compiler optimizations that might trash your stats values. Use intrinsics like gcc's __sync_add_and_fetch() or the newer C++11 atomic<> types.
You can then write a small program that attaches to the same block of shared memory and can print out one or all of the stats. This small stats-reader program and your main program would have to share a common header file that fixes the position of each stat in the array.
The obvious drawback here is that you're stuck with a fixed number of counters. But it's hard to beat, performance-wise. The impact is the atomic increment of an integer at various points in your program.
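A rough sketch of that approach on Linux (assumptions of mine: a POSIX shared-memory object name, a fixed enum of counters, and GCC's __sync builtins; link with -lrt on older glibc):

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

// Shared between the application and the stats-reader tool (common header).
enum StatIndex { STAT_MSGS_SENT, STAT_BYTES_SENT, STAT_THREADS_SPAWNED, STAT_COUNT };

static volatile uint64_t *g_stats = 0;

// Map (creating if needed) the shared counter block; returns false on failure.
bool statsInit(const char *shmName /* e.g. "/myapp_stats" */) {
    int fd = shm_open(shmName, O_RDWR | O_CREAT, 0666);
    if (fd < 0) return false;
    if (ftruncate(fd, sizeof(uint64_t) * STAT_COUNT) != 0) { close(fd); return false; }
    void *p = mmap(0, sizeof(uint64_t) * STAT_COUNT,
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return false;
    g_stats = static_cast<volatile uint64_t *>(p);
    return true;
}

inline void statsAdd(StatIndex i, uint64_t n = 1) {
    __sync_add_and_fetch(&g_stats[i], n);   // lock-free atomic increment
}

The reader tool would shm_open/mmap the same name read-only and simply print g_stats[0..STAT_COUNT-1].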
In embedded systems, a common technique is to reserve a block of memory for a "log" and treat it like a circular queue. Write some code that can read this block of memory, which will help you take "snapshots" at run time.
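A bare-bones sketch of such a circular log (assuming a statically reserved buffer and a single writer; the record layout is just an example of mine):

#include <stdint.h>

// One fixed-size record per event; keep it trivially copyable so a debugger
// or snapshot tool can dump the raw memory.
struct LogRecord {
    uint32_t timestamp;   // e.g. a tick counter
    uint16_t eventId;
    uint16_t arg;
};

static const unsigned kLogSize = 1024;
static LogRecord g_log[kLogSize];          // could also live in a reserved RAM section
static volatile uint32_t g_logHead = 0;    // next slot to write

inline void logEvent(uint32_t now, uint16_t id, uint16_t arg) {
    uint32_t slot = g_logHead;             // single writer; add atomics if shared
    g_log[slot].timestamp = now;
    g_log[slot].eventId   = id;
    g_log[slot].arg       = arg;
    g_logHead = (slot + 1) % kLogSize;     // wrap around; oldest entries are overwritten
}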
Search the web for "debug logging". Should turn up some source you could use to play with. Most shops I've been at usually roll their own.
Should you have extra non-volatile memory, you could reserve an area and write to that. This would also include files if your system is large enough to support a file system.
Worst case, write data out to a debug (serial) port.
For actual, real-time, measurements, we usually use an oscilloscope connected to a GPIO or test point and output pulses to the GPIO / Test point.
Have a look at valgrind/callgrind.
It can be used for profiling, which is what I understand you are looking for. I do not think it works at runtime, though; it generates its reports after your process has finished.
That's a good answer, @John Dibling! I had a system quite similar to this. However, my "stat" thread was querying the workers 10 times per second and it affected the performance of the worker threads: each time the "stat" thread asked for data, there was a critical section accessing this data (counters, etc.), which means the worker thread was blocked while the data was being retrieved. It turned out that under heavy load of the worker threads, this 10 Hz stat querying affected the overall performance of the workers.
So I switched to a slightly different stat reporting model: instead of actively querying the worker threads from the main thread, I now have the worker threads report their basic stat counters to their own exclusive statistics repositories, which can be queried by the main thread at any time with no direct impact on the workers.
If you are on C++11 you could use std::atomic<>
#include <atomic>

class GlobalStatistics {
public:
    static GlobalStatistics &get() {
        static GlobalStatistics instance;   // Meyers singleton; thread-safe init in C++11
        return instance;
    }

    void incrTotalBytesProcessed(unsigned int incrBy) {
        totalBytesProcessed += incrBy;      // atomic read-modify-write
    }

    long long getTotalBytesProcessed() const { return totalBytesProcessed; }

private:
    std::atomic_llong totalBytesProcessed;

    GlobalStatistics() : totalBytesProcessed(0) { }
    GlobalStatistics(const GlobalStatistics &) = delete;
    void operator=(const GlobalStatistics &) = delete;
};
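Usage from any thread is then just (a sketch; bytesJustRead is a made-up variable):

// From any worker thread:
GlobalStatistics::get().incrTotalBytesProcessed(bytesJustRead);

// From the monitoring/reporting code:
long long total = GlobalStatistics::get().getTotalBytesProcessed();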
I have a highly performance-sensitive (read: low-latency requirement) C++17 class for logging that has member functions that can either log locally or log remotely, depending on the flags with which it is instantiated. The "remote logging" or "local logging" functionality is fully defined at the time the object is constructed.
The code looks something like this
class Logger {
public:
    Logger(bool aIsTx) : isTx_(aIsTx) { init(); }
    ~Logger() {}

    uint16_t fbLog(const fileId_t aId, const void *aData, const uint16_t aSz) {
        if (isTx_)
            // do remote logging
            return remoteLog(aId, aData, aSz);
        else
            // do local logging
            return fwrite(aData, aSz, 1, fd_[aId]);
    }

protected:
    bool isTx_;
};
What I would like to do is
Some way of removing the if (isTx_) check such that the code path to be used gets fixed at the time of instantiation.
Since the class objects are used by multiple other modules, I would not like to templatize the class, because that would require me to wrap two templatized implementations of the class in an interface wrapper, which would result in a v-table call every time a member function is called.
You cannot "templetize" the behaviour, since you want the choice to be done at runtime.
In case you want to get rid of the if because of performance, rest assured that it will have negligible impact compared to disk access or network communication. Same goes for virtual function call.
If you need low latency, I recommend considering asynchronous logging: The main thread would simply copy the message into an internal buffer. Memory is way faster than disk or network, so there will be much less latency. You can then have a separate service thread that waits for the buffer to receive messages, and handles the slow communication.
As a bonus, you don't need branches or virtual functions in the main thread, since it is the service thread that decides what to do with the messages (see the sketch after the list below).
Asynchronicity is not an easy approach, however. There are many cases that must be taken into consideration:
How to synchronise access to the buffer (I suggest trying out a lock-free queue).
How much memory the buffer should be allowed to occupy. Without a limit it can consume too much if the program logs faster than the messages can be written out.
If the buffer limit is reached, what should the main thread do? It either needs to fall back to synchronously waiting while the buffer is being drained, or messages need to be discarded.
How to flush the buffer when the program crashes. If that is not possible, the last messages may be lost, and those are probably exactly the ones you need to figure out why the program crashed in the first place.
Regardless of choice: If performance is critical, then try out multiple approaches and measure.
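For concreteness, here is a minimal sketch of such an asynchronous logger (assumptions of mine: a mutex/condition_variable protected queue rather than a lock-free one, an unbounded buffer, and plain strings as the payload):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class AsyncLogger {
public:
    AsyncLogger() : done_(false), worker_(&AsyncLogger::run, this) {}

    ~AsyncLogger() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }

    // Hot path: just copy the message into the in-memory queue and return.
    void log(std::string msg) {
        { std::lock_guard<std::mutex> lk(m_); queue_.push(std::move(msg)); }
        cv_.notify_one();
    }

private:
    // Service thread: does the slow I/O (file, socket, ...) off the hot path.
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return done_ || !queue_.empty(); });
            while (!queue_.empty()) {
                std::string msg = std::move(queue_.front());
                queue_.pop();
                lk.unlock();
                writeSomewhere(msg);   // placeholder for fwrite()/send()/etc.
                lk.lock();
            }
            if (done_) return;
        }
    }

    void writeSomewhere(const std::string &) { /* real local or remote I/O */ }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> queue_;
    bool done_;
    std::thread worker_;
};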
I have an application which has a couple of processing levels like:
InputStream->Pre-Processing->Computation->OutputStream
Each of these entities runs in a separate thread.
So in my code I have the general thread, which owns the
std::vector<ImageRead> m_readImages;
and then it passes this member variable to each thread:
InputStream input{&m_readImages};
std::thread threadStream{&InputStream::start, &input};
PreProcess pre{&m_readImages};
std::thread preStream{&PreProcess::start, &pre};
...
And each of these classes owns a pointer member to this data:
std::vector<ImageRead>* m_ptrReadImages;
I also have a global mutex defined, which I lock and unlock on each read/write operation to that shared container.
What bothers me is that this mechanism is pretty obscure and sometimes I get confused whether the data is used by another thread or not.
So what is the more straightforward way to share this container between those threads?
The process you described as "Input-->preprocessing-->computation-->Output" is sequential by design: each step depends on the previous one, so parallelizing in this particular manner is not beneficial; each thread just has to wait for another to complete. Try to find out which step takes the most time and parallelize that. Or try to set up multiple parallel processing pipelines that operate sequentially on independent, individual data sets. A usual approach for that would employ a processing queue which distributes the tasks among a set of threads.
It would seem to me that your reading and preprocessing could be done independently of the container.
Naively, I would structure this as a fan-out and then fan-in network of tasks.
First, make a dispatch task (a task being a unit of work that is given to a thread to actually execute) that will create the input-and-preprocess tasks.
Use futures as a means for the sub-tasks to communicate back a pointer to the completely loaded image.
Make a second task, the std::vector-builder task, that just waits on the futures to get the results when they are done and adds them to the std::vector.
I suggest you structure things this way because I suspect that any IO and preprocessing you are doing will take longer than setting a value in the vector. Using tasks instead of threads directly lets you tune the parallel portion of your work.
I hope that's not too abstracted away from the concrete elements. This is a pattern I find to be well balanced between saturating available hardware, reducing thrash / lock contention, and is understandable by future-you debugging it later.
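A compact sketch of that shape using std::async (my own choice of task mechanism; loadAndPreprocess and the ImageRead body are placeholders):

#include <future>
#include <string>
#include <vector>

struct ImageRead { /* pixel data, metadata, ... */ };

ImageRead loadAndPreprocess(const std::string &path) {
    ImageRead img;
    // ... read from disk and preprocess; this is the part worth parallelizing ...
    return img;
}

std::vector<ImageRead> loadAll(const std::vector<std::string> &paths) {
    std::vector<std::future<ImageRead>> pending;
    for (const auto &p : paths)                       // fan-out: one task per image
        pending.push_back(std::async(std::launch::async, loadAndPreprocess, p));

    std::vector<ImageRead> images;
    for (auto &f : pending)                           // fan-in: collect the results
        images.push_back(f.get());
    return images;
}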
I would use 3 separate queues, ready_for_preprocessing which is fed by InputStream and consumed by Pre-processing, ready_for_computation which is fed by Pre-Processing and consumed by Computation, and ready_for_output which is fed by Computation and consumed by OutputStream.
You'll want each queue to be in a class, which has an access mutex (to control actually adding and removing items from the queue) and an "image available" semaphore (to signal that items are available) as well as the actual queue. This would allow multiple instances of each thread. Something like this:
class imageQueue
{
    std::deque<ImageRead> m_readImages;
    std::mutex            m_changeQueue;
    Semaphore             m_imagesAvailable;   // your own / platform semaphore type
public:
    bool addImage( ImageRead );
    ImageRead getNextImage();
};
addImage() takes the m_changeQueue mutex, adds the image to m_readImages, then signals m_imagesAvailable.
getNextImage() waits on m_imagesAvailable. When it becomes signaled, it takes m_changeQueue, removes the next image from the list, and returns it.
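If you don't have a Semaphore type at hand, the same queue can be sketched with std::mutex and std::condition_variable (a substitution of mine, not what the answer literally describes):

#include <condition_variable>
#include <deque>
#include <mutex>

template <typename T>
class ImageQueue {
public:
    void addImage(T img) {
        { std::lock_guard<std::mutex> lk(m_changeQueue); m_readImages.push_back(std::move(img)); }
        m_imagesAvailable.notify_one();   // plays the role of the semaphore signal
    }

    T getNextImage() {
        std::unique_lock<std::mutex> lk(m_changeQueue);
        m_imagesAvailable.wait(lk, [this] { return !m_readImages.empty(); });
        T img = std::move(m_readImages.front());
        m_readImages.pop_front();
        return img;
    }

private:
    std::deque<T> m_readImages;
    std::mutex m_changeQueue;
    std::condition_variable m_imagesAvailable;
};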
cf. http://en.cppreference.com/w/cpp/thread
Ignoring the question of "Should each operation run in an individual thread?", it appears that the objects you want to process move from thread to thread. In effect, they are uniquely owned by only one thread at a time (no thread ever needs to access data owned by another thread). There is a way to express just that in C++: std::unique_ptr.
Each step then only works on its owned image. All you have to do is find a thread-safe way to move the ownership of your images through the process steps one by one, which means the critical sections are only at the boundaries between tasks. Since you have multiple of these, abstracting it away would be reasonable:
#include <atomic>
#include <memory>
#include <mutex>
#include <thread>

class ProcessBoundary
{
public:
    void setImage(std::unique_ptr<ImageRead> newImage)
    {
        while (running)
        {
            {
                std::lock_guard<std::mutex> guard(m_mutex);
                if (m_imageToTransfer == nullptr)
                {
                    // The previous image has been picked up by the next step,
                    // so we can place this one here.
                    m_imageToTransfer = std::move(newImage);
                    return;
                }
            }
            std::this_thread::yield();
        }
    }

    std::unique_ptr<ImageRead> getImage()
    {
        while (running)
        {
            {
                std::lock_guard<std::mutex> guard(m_mutex);
                if (m_imageToTransfer != nullptr)
                {
                    // An image is waiting; take ownership of it for this step.
                    return std::move(m_imageToTransfer);
                }
            }
            std::this_thread::yield();
        }
        return nullptr;   // shutting down, no more images
    }

    void stop()
    {
        running = false;
    }

private:
    std::mutex m_mutex;
    std::unique_ptr<ImageRead> m_imageToTransfer;
    std::atomic<bool> running{true};
};
The process steps would then ask for an image with getImage(), which they uniquely own once that function returns. They process it and pass it to the setImage of the next ProcessBoundary.
You could probably improve on this with condition variables, or adding a queue in this class so that threads can get back to processing the next image. However, if some steps are faster than others they will necessarily be stalled by the slower ones eventually.
This is a design pattern problem. I suggest reading about concurrency design patterns and seeing if there is anything that would help you out.
If you want to add concurrency to the following sequential process:
InputStream->Pre-Processing->Computation->OutputStream
Then I suggest using the active object design pattern. This way each process is not blocked by the previous step and can run concurrently. It is also very simple to implement (here is an implementation: http://www.drdobbs.com/parallel/prefer-using-active-objects-instead-of-n/225700095).
As to your question about each thread sharing a DTO: this is easily solved with a wrapper on the DTO. The wrapper will contain write and read functions. The write functions block with a mutex, and the read functions return const data.
However, I think your problem lies in the design. If the process is sequential as you described, then why is each process sharing the data? The data should be passed to the next process once the current one completes. In other words, each process should be decoupled.
You are correct in using mutexes and locks. For C++11, this is really the most elegant way of accessing complex data between threads.
I'm making a text-based RPG, and I'd really like to emulate time.
I could just make some time pass between each time the player types something, but I'd like it to be better than that if possible. I was wondering if multithreading would be a good way to do this.
I was thinking of maybe just having a second, really simple thread in the background that just has a loop, looping every 1000 ms. For every pass through its loop the world time would increase by 1 second and the player would regenerate a bit of health and mana.
Is this something that multithreading could do, or is there some stuff I don't know about that would make this not work? (I'd prefer not to spend a bunch of time struggling to learn this if it's not going to help me with this project.)
Yes, multithreading could certainly do this, but be wary that threading is usually more complicated than the alternative (which would be the main thread polling various update events as part of its main loop, which should be running at least once every 100 ms or so anyway).
In your case, if the clock thread follows pretty strict rules, you'll probably be "ok."
The clock thread is the only thread allowed to set/modify the time variables.
The main/ui thread is only allowed to read the time.
You must still use a system time function, since the thread sleep functions cannot be trusted for accuracy (depending on system activity, the thread's update loop may not run until some milliseconds after you requested it run).
If you implement it like that, then you won't even need to familiarize yourself with mutexes in order to get the thread up and running safely, and your time will be accurate.
But! Here's some food for thought: what if you want to bind in-game triggers at specific times of the day? For example, a message that would be posted to the user "The sun has set" or similar. The code needed to do that will need to be running on the main thread anyway (unless you want to implement cross-thread message communication queues!), and will probably look an awful lot like basic periodic-check-and-update-clock code. So at that point you would be better off just keeping a simple unified thread model anyway.
I usually use a class named Simulation to step time forward. I don't have it in C++, but I've done threading in Java that steps time forward and activates events according to a schedule (or a random event at a planned time). You can take this and translate it to C++, or use it to see what an object-oriented implementation looks like.
package adventure;
public class Simulation extends Thread {
private PriorityQueue prioQueue;
Simulation() {
prioQueue = new PriorityQueue();
start();
}
public void wakeMeAfter(Wakeable SleepingObject, double time) {
prioQueue.enqueue(SleepingObject, System.currentTimeMillis() + time);
}
public void run() {
while (true) {
try {
sleep(5);
if (prioQueue.getFirstTime() <= System.currentTimeMillis()) {
((Wakeable) prioQueue.getFirst()).wakeup();
prioQueue.dequeue();
}
} catch (InterruptedException e) {
}
}
}
}
To use it, you just instantiate it and add your objects:
Simulation sim = new Simulation();
// Load images to be used as appearance-parameter for persons
Image studAppearance = loadPicture("Person.gif");
// --- Add new persons here ---
new WalkingPerson(sim, this, "Peter", studAppearance);
I'm going to assume that your program currently spends the majority of its time waiting for user input, which blocks your main thread irregularly and for relatively long periods of time, preventing you from having short time-dependent updates. And that you want to avoid complicated solutions (threading).
If you want to access the time in the main thread, accessing it without a separate thread is relatively easy (see the sketch below).
If you don't need to do anything in the background while waiting for user input, couldn't you write a function to calculate the new value based on the amount of time that has passed while waiting? You can have some variable LastSystemTimeObserved that gets updated every time you need to use one of your time-dependent variables, calling some function that calculates the variable's changed value based on how much time has passed since it was last called, instead of recalculating values every second.
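A small sketch of that idea using std::chrono (the names and regeneration rates are made up for illustration):

#include <chrono>

struct Player {
    double health = 50.0;
    double mana   = 20.0;
};

using Clock = std::chrono::steady_clock;
Clock::time_point lastObserved = Clock::now();

// Call this right before any code that needs up-to-date health/mana.
void catchUp(Player &p) {
    Clock::time_point now = Clock::now();
    double elapsedSec = std::chrono::duration<double>(now - lastObserved).count();
    lastObserved = now;

    p.health += elapsedSec * 0.5;   // e.g. 0.5 HP per second of real time
    p.mana   += elapsedSec * 1.0;   // e.g. 1 MP per second
}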
If you do make a separate thread, be sure that you properly protect any variables that are accessed by both threads.
I'm programming in C++, but I'm only using pthread.h, no boost or C++11 threads.
So I'm trying to use threads but based on one of my previous questions (link), this doesn't seem feasible since threads terminate right after completion of its task, and one of the more prevalent reasons to use a thread-pool implementation is to reduce thread-creation overhead by reusing these threads for multiple tasks.
So is the only other way to implement this in C to use fork(), and create a pipe from the main to child processes? Or is there a way to set up a pipe between threads and their parent that I don't know about?
Many thanks in advance!
Yes, you can create a thread-safe queue between the threads. Then the threads in the pool will sit in a loop retrieving an item from the queue, executing whatever it needs, then going back and getting another.
That's generally a bit easier/simpler in C++ because it's a little easier to agree on some of the interface (e.g., overload operator() to execute the code for a task), but at a fundamental level you can do all the same things in C (e.g., each task struct you put in the queue will contain a pointer to a function to carry out the work for that task).
In your case, since you are using C++, it's probably easier to use an overload of operator() to do the work though. The rest of the task struct (or whatever you choose to call it) will contain any data needed, etc.
From the POSIX standard:
int pthread_create(pthread_t *restrict thread,
const pthread_attr_t *restrict attr,
void *(*start_routine)(void*), void *restrict arg);
(...) The thread is created executing start_routine with arg as its sole argument.
So, you should create a bunch of threads with this function, and have them all execute a function that goes something like
void *consumer(void *arg)
{
    // WorkQueue and Task are your own types; pop() blocks until an item is available.
    WorkQueue *queue = static_cast<WorkQueue *>(arg);
    for (;;) {
        Task *task = queue->pop();
        if (task == STOP_WORKING)   // sentinel item: time to shut down
            break;
        task->doWork();             // or invoke an overloaded operator(), etc.
    }
    return NULL;
}
(At the end of input, push n STOP_WORKING items to the queue where n is the number of threads.)
Mind you, pthreads is a very low-level API that offers very little type-safety (all data is passed as void pointers). If you're trying to parallelize CPU-intensive tasks, you might want to look at OpenMP instead.
'doesn't seem feasible since threads terminate right after completion of its task' what??
for (;;) {
    Task *myTask = theCommonProducerConsumerQueue->pop();
    myTask->run();
}
... and never return anything; in fact, they never return at all.
You may find it helpful to look at the source code for libdispatch, which is the basis for Apple's Grand Central Dispatch and uses thread pools.
I would suggest using Threaded Building Blocks from Intel to accomplish work-queue/threadpool like tasks. A fairly contrived example using TBB 3.0:
#include <cmath>
#include <tbb/concurrent_queue.h>
#include <tbb/task.h>

class PoorExampleTask : public tbb::task {
public:
    PoorExampleTask(int foo, tbb::concurrent_queue<float>& results)
        : _bar(foo), _results(results)
    { }

    tbb::task* execute() {
        _results.push(pow(2.0, _bar));
        return NULL;
    }

private:
    int _bar;
    tbb::concurrent_queue<float>& _results;
};
Used later on like so:
tbb::concurrent_queue<float> powers;
for (int ww = 0; ww < LotsOfWork; ++ww) {
PoorExampleTask* tt
= new (tbb::task::allocate_root()) PoorExampleTask(ww, powers);
tbb::task::enqueue(*tt);
}
http://people.clarkson.edu/~jmatthew/cs644.archive/cs644.fa2001/proj/locksmith/code/ExampleTest/threadpool.c
I found it with Google a couple of months ago; you should try it.
Edit: it seems you may want a thread group instead. I was able to create one with some minor alterations of the above, so that the worker didn't perform work but just joined threads.
I have some status data that I want to cache from a database. Any of several threads may modify the status data. After the data is modified, it will be written to the database. The database writes will always be done in series by the underlying database access layer, which queues database operations in a different process, so I am not concerned about race conditions for those.
Is it a problem to just modify the static data from several threads? In theory it is possible that modifications are implemented as read, modify, write but in practice I can't imagine that this is so.
My data handling class will look something like this:
class StatusCache
{
public:
    static void SetActivityStarted(bool activityStarted)
    { m_activityStarted = activityStarted; WriteToDB(); }

    static void SetActivityComplete(bool activityComplete)
    { m_activityComplete = activityComplete; WriteToDB(); }

    static void SetProcessReady(bool processReady)
    { m_processReady = processReady; WriteToDB(); }

    static void SetProcessPending(bool processPending)
    { m_processPending = processPending; WriteToDB(); }

private:
    static void WriteToDB(); // will write all the class data to the db (multiple requests will happen in series)

    static bool m_activityStarted;
    static bool m_activityComplete;
    static bool m_processReady;
    static bool m_processPending;
};
I don't want to use locks as there are already a couple of locks in this part of the app and adding more will increase the possibility of deadlocks.
It doesn't matter if there is some overlap between 2 threads in the database update, e.g.
thread 1                     thread 2                     activity started in db
SetActivityStarted(true)     SetActivityStarted(false)
m_activityStarted = true
                             m_activityStarted = false
                             WriteToDB()                  false
WriteToDB()                                               false
So the db shows the status that was most recently set by the m_... = x lines. This is OK.
Is this a reasonable approach to use or is there a better way of doing it?
[Edited to state that I only care about the last status - order is unimportant]
No, it's not safe.
The generated code that does the writing to m_activityStarted and the others may be atomic, but that is not guaranteed. Also, in your setters you do two things: set a boolean and make a call. That is definitely not atomic.
You're better off synchronizing here using a lock of some sort.
For example, one thread may call the first function, and before that thread gets into WriteToDB(), another thread may call another function and enter WriteToDB() ahead of the first one. Then the statuses may be written to the DB in the wrong order.
If you're worried about deadlocks then you should revise your whole concurrency strategy.
On multi-CPU machines, there's no guarantee that memory writes will be seen by threads running on different CPUs in the correct order without issuing a synchronisation instruction. It's only when you issue a synch operation, e.g. a mutex lock or unlock, that each thread's view of the data is guaranteed to be consistent.
To be safe, if you want the state shared between your threads, you need to use synchronisation of some form.
You never know exactly how things are implemented at the lower levels. Especially when you start dealing with multiple cores, the various cache levels, pipelined execution, etc. At least not without a lot of work, and implementations change frequently!
If you don't mutex it, eventually you will regret it!
My favorite example involves integers. One particular system wrote its integer values in two writes, i.e. not atomically. Naturally, when the thread was interrupted between those two writes, you got the upper bytes from one set() call and the lower bytes from the other. A classic blunder. But far from the worst that can happen.
Mutexing is trivial.
You mention: I don't want to use locks as there are already a couple of locks in this part of the app and adding more will increase the possibility of deadlocks.
You'll be fine as long as you follow the golden rules:
Don't mix mutex lock orders. E.g. A.lock();B.lock() in one place and B.lock();A.lock(); in another. Use one order or the other!
Lock for the briefest possible time.
Don't try to use one mutex for multiple purposes. Use multiple mutexes.
Whenever possible use recursive or error-checking mutexes.
Use RAII or macros to insure unlocking.
E.g.:
#define RUN_UNDER_MUTEX_LOCK( MUTEX, STATEMENTS ) \
do { (MUTEX).lock(); STATEMENTS; (MUTEX).unlock(); } while ( false )
class StatusCache
{
public:
    static void SetActivityStarted(bool activityStarted)
    { RUN_UNDER_MUTEX_LOCK( mMutex, mActivityStarted = activityStarted );
      WriteToDB(); }

    static void SetActivityComplete(bool activityComplete)
    { RUN_UNDER_MUTEX_LOCK( mMutex, mActivityComplete = activityComplete );
      WriteToDB(); }

    static void SetProcessReady(bool processReady)
    { RUN_UNDER_MUTEX_LOCK( mMutex, mProcessReady = processReady );
      WriteToDB(); }

    static void SetProcessPending(bool processPending)
    { RUN_UNDER_MUTEX_LOCK( mMutex, mProcessPending = processPending );
      WriteToDB(); }

private:
    static void WriteToDB(); // read data under mMutex.lock()!

    static Mutex mMutex;
    static bool mActivityStarted;
    static bool mActivityComplete;
    static bool mProcessReady;
    static bool mProcessPending;
};
I'm no C++ guy, but I don't think it will be safe to write to it if you don't have some sort of synchronization.
It looks like you have two issues here.
#1 is that your boolean assignment is not necessarily atomic, even though it's one call in your code. So, under the hood, you could have inconsistent state. You could look into using atomic_set(), if your threading/concurrency library supports that.
#2 is synchronization between your reading and writing. From your code sample, it looks like your WriteToDB() function writes out the state of all 4 variables. Where is WriteToDB serialized? Could you have a situation where thread1 starts WriteToDB(), which reads m_activityStarted but doesn't finish writing it to the database, then is preempted by thread2, which writes m_activityStarted all the way through. Then, thread1 resumes, and finishes writing its inconsistent state through to the database. At the very least, I think that you should have write access to the static variables locked out while you are doing the read access necessary for the database update.
In theory it is possible that modifications are implemented as read, modify, write but in practice I can't imagine that this is so.
Generally it is so unless you've set up some sort of transactional memory. Variables are generally stored in RAM but modified in hardware registers, so the read isn't just for kicks. The read is necessary to copy the value out of RAM and into a place it can be modified (or even compared to another value). And while the data is being modified in the hardware register, the stale value is still in RAM in case somebody else wants to copy it into another hardware register. And while the modified data is being written back to RAM somebody else may be in the process of copying it into a hardware register.
And in C++ bools are guaranteed to take at least a byte of space, which means it is actually possible for them to hold a value other than true or false, say due to a race condition where a read happens partway through a write.
On .Net there is some amount of automatic synchronization of static data and static methods. There is no such guarantee in standard C++.
If you're looking at only ints, bools, and (I think) longs, you have some options for atomic reads/writes and addition/subtraction. C++0x has something. So does Intel TBB. I believe that most operating systems also have the needed hooks to accomplish this.
While you may be afraid of deadlocks, I am sure you will be far prouder of your code knowing that it works correctly.
So I would recommend you throw in the locks. You may also want to consider semaphores, a more primitive (and perhaps more versatile) type of lock.
You may get away with it with bools, but if the static objects being changed are of types of any great complexity, terrible things will occur. My advice - if you are going to write from multiple threads, always use synchronisation objects, or you will sooner or later get bitten.
This is not a good idea. There are many variables that will affect the timing of different threads.
Without some kind of lock you will not be guaranteed to have the correct last state.
It is possible that two status updates could be written to the database out of order.
As long as the locking code is designed properly, deadlocks should not be an issue with a simple process like this.
As others have pointed out, this is generally a really bad idea (with some caveats).
Just because you don't see a problem on your particular machine when you happen to test it doesn't prove that the algorithm works right. This is especially true for concurrent applications. Interleavings can change dramatically for example when you switch to a machine with a different number of cores.
Caveat: if all your setters are doing atomic writes and if you don't care about the timing of them, then you may be okay.
Based on what you've said, I'd think that you could just have a dirty flag that's set in the setters. A separate database writing thread would poll the dirty flag every so often and send the updates to the database. If some items need extra atomicity, their setters would need to lock a mutex. The database writing thread must always lock the mutex.
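A rough sketch of that last suggestion (the names, the polling interval and the locking granularity are my own assumptions, not the poster's):

#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

class StatusCache
{
public:
    static void SetActivityStarted(bool v)  { Set(m_activityStarted, v); }
    static void SetActivityComplete(bool v) { Set(m_activityComplete, v); }

    // Run this on a dedicated thread; it is the only caller of WriteToDB().
    static void DbWriterLoop()
    {
        for (;;)
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            if (m_dirty.exchange(false))   // anything changed since the last write?
                WriteToDB();               // reads the flags under m_mutex
        }
    }

private:
    static void Set(bool &field, bool v)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        field = v;
        m_dirty = true;
    }

    static void WriteToDB();   // as in the question: serialized by the DB layer

    static std::mutex m_mutex;
    static std::atomic<bool> m_dirty;
    static bool m_activityStarted;
    static bool m_activityComplete;
};

std::mutex StatusCache::m_mutex;
std::atomic<bool> StatusCache::m_dirty{false};
bool StatusCache::m_activityStarted = false;
bool StatusCache::m_activityComplete = false;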