Safely reading two variables when an interrupt might modify them between the reads - concurrency

First of all I'd welcome edits to the title of this question, I couldn't think how to word it better but I'm not too happy with what I came up with.
This is a question about concurrency; my application is on a microcontroller, in C, but I don't think that matters a great deal.
I have an interrupt routine which can change the values of two global variables. I have some main code which can read those variables, but it must get consistent values for both. That means I can't simply read one and then the other, because the interrupt might occur between the two reads and change both, leaving me with one value from the old pair and one from the new.
Normally I would just disable the interrupt for the tiny part of code that reads both variables, but I can't do this because the interrupt needs to be called at exactly the right time with no "jitter" in calls. Four or five instructions to read and store the variables will lead to too much jitter in the interrupt timing. (I'm generating PAL video in the interrupt, so any jitter in timing will lead to visible movement of pixels on the screen.)
I could rearrange the code to do this in a different way so that the same interrupt isn't responsible for the two things, and long term I'll probably do this, but it's significant work to do so. The question has become interesting to me anyway now as a 'puzzle' even if I recode it later to avoid the situation.
So what I'm asking is, is there any way I can read both variables and ensure they are in a consistent state without disabling interrupts?
It doesn't matter if I get the value before or after the interrupt occurs as long as both values come from the same place.
I've thought about having a separate "version number" count to read to ensure that each variable is the same version that is read, but that just makes the problem worse as now I have 4 variables to read.
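For reference, the standard refinement of the version-number idea is a single sequence counter read before and after the pair (a "seqlock"), rather than one version per variable. A minimal sketch, assuming 16-bit values and hypothetical names:
#include <stdint.h>
volatile uint16_t seq = 0;   // even = stable, odd = ISR is mid-update
volatile int16_t var_a = 0;
volatile int16_t var_b = 0;
// In the interrupt: the only extra cost is two increments, a constant
// overhead, so it adds no jitter to the interrupt timing.
void isr_update(int16_t a, int16_t b) {
    seq++;       // now odd: any concurrent reader will retry
    var_a = a;
    var_b = b;
    seq++;       // even again: the pair is consistent
}
// In the main code: retry until the counter is even and unchanged,
// i.e. the interrupt neither ran during the reads nor was running.
void read_pair(int16_t* a, int16_t* b) {
    uint16_t s1, s2;
    do {
        s1 = seq;
        *a = var_a;
        *b = var_b;
        s2 = seq;
    } while (s1 != s2 || (s1 & 1u));
}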

What microcontroller are you using? What kind of variables are we talking about? If you need fewer than 4 bytes in total and are using a 32-bit MCU, you can solve this problem by packing the two variables into one 32-bit variable.
EDIT:
If you are using an 8-bit MCU, I think the best you can do is put the variables into a structure, create an array of two such structures, and keep one variable indicating which structure is currently valid for reading. Your interrupt then changes the unused structure and, after that, changes the value of the indicator. A sketch follows.
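A minimal sketch of that scheme, assuming 16-bit values and hypothetical names:
#include <stdint.h>
struct Pair { int16_t a; int16_t b; };
volatile Pair buf[2];
volatile uint8_t read_idx = 0;   // which element the main code may read
// Interrupt: fill the buffer the reader is NOT using, then publish it
// by flipping the index (a single-byte store, atomic even on 8-bit MCUs).
void isr_update(int16_t a, int16_t b) {
    uint8_t w = read_idx ^ 1;
    buf[w].a = a;
    buf[w].b = b;
    read_idx = w;
}
// Main code: snapshot the index once, then read both fields from it.
// Caveat: this is only safe if the interrupt cannot fire twice while
// these three lines run; otherwise combine it with a retry loop.
void read_pair(int16_t* a, int16_t* b) {
    uint8_t r = read_idx;
    *a = buf[r].a;
    *b = buf[r].b;
}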


Is it bad to use globals in a low power multicore microcontroller? (C++) [closed]

I have a Parallax Propeller, which is an 8-core microcontroller clocked at 80 MHz (i.e. it's a lot better than an Arduino, but not amazing).
I have a few variables I need access to throughout my code - pitch and roll angles, and so on (all floats). My intention is to declare them as extern volatile globals that will only be set inside a class and only read elsewhere (volatile because it is multicore; extern because the class is in a separate file).
Is this bad practice? I am trying to avoid calling them up using functions, as that will only slow things down - all variables are stored in main RAM, so it shouldn't matter how they are placed there. Is there a better way to do this?
Assuming it was better to call them up using functions, how would this be done?
volatile guarantees that the compiler doesn't "optimise away" accesses to a variable. The classic example is:
bool b = false;
while(!b) /* do nothing */;
which the compiler turns into:
bool b = false;
bool tmp = !b; // tmp is really a processor register.
while(tmp) /* do nothing */;
Since nothing in the loop is changing b, it's a valid optimisation for the compiler to do this.
Using volatile bool b; will guarantee that every time you "use" b in your code, the compiler reads/writes the actual variable, rather than some "temporary copy".
However, volatile does not guarantee:
That the data is updated atomically - in other words, if you store something, half the new data may be stored while the other half still holds the old value, causing all kinds of "interesting" results [such as random values in a float].
That the other CPUs' caches are aware that the value has changed - in other words, the value read by one CPU may have been updated some time ago on another CPU - say b in the above code - because despite volatile, the CPU may well not have caused a flush of another CPU's caches.
You need to take care of caches and atomic updates to ensure that you don't get "fun" bugs caused by either of those not being done correctly. This of course applies regardless of whether you use global variables or otherwise share data between processors - the only case where you can rely on it is when you have a proper interface via the OS or runtime environment, which provides guarantees for these things.
For the topical question: No, it is not necessarily bad to use global variables in a microcontroller. Whether the Propeller is "low power" is arguable, but if I recall correctly it has neither floating point nor functionality to do multiword atomic access, by design. Any core in it has 1/8th of the main memory access cycles, whether it wants them or not.
You describe storing attitude in a set of global variables. If every word may be independently updated and is written in one atomic operation (I believe the propeller has 32 bit RAM as well as processors), you can get away with ignoring synchronisation, but if there's ever a requirement to correlate the measurements you want to use some sort of barrier or lock. The Propeller was designed to operate without locks by having predictable timing, but that's basically only for cycle counting assembly programmers; it won't help you with C++ at all.
The Propeller does have hardware locks, implemented through a pointlessly awkward and limited set of test-and-set primitives. If you manage to make sense of the Spin documentation for them you could use them (because neither the datasheet nor the GCC documentation covers their semantics).
As for this controller being "a lot better" than the Arduino? I wouldn't say that's a given. It has only primitive timer and video generation peripherals (anything else is done in software), the processors take four cycles per instruction compared to the AVR's typical one or two, and you could just as easily have gotten a Cortex-M4F based microcontroller, which does have floating point.
The problem stems from multithreading itself: two or more threads can access (read or write) one variable in parallel. In such cases synchronisation is one of the most important aspects. Whichever technique you use (locking, atomic operations, semaphores, etc.), you have to avoid inconsistency of the variable (i.e. another thread writing the variable during a read operation).
So, if you use an external global variable, all the external modules can access this variable directly without synchronisation. Certainly the developer implementing a given module can use an atomic operation (or another solution), but that depends on his or her attention, and the responsibility for synchronisation falls on every external module accessing the global variable, which is bad methodology and should be avoided if possible.
The usual solution in most multithreaded systems is that all the methods which can access a variable directly are implemented in the same module. The benefits of this solution:
The methods can use a common synchronisation mechanism, so parallel, synchronised access to the variable is handled in one place.
Generally one module is implemented by one developer, so synchronisation mistakes are also reduced.
The other modules (and the other developers) get a well-specified and synchronised interface for accessing the common resources (global variables).
So, the external modules can access these global variables only by means of interface methods. Certainly in this case some insignificant calling overhead appears, but if you would like to avoid it and make your code more efficient, you can declare these interface methods as inline methods, as in the sketch below.
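A minimal sketch of such an interface module, with hypothetical names (a std::mutex stands in for whatever synchronisation primitive the platform provides):
#include <mutex>
namespace attitude {          // the one module that owns the data
    static std::mutex mtx;    // shared by all accessors below
    static float pitch = 0.0f;
    inline float get_pitch() {
        std::lock_guard<std::mutex> lock(mtx);
        return pitch;
    }
    inline void set_pitch(float value) {
        std::lock_guard<std::mutex> lock(mtx);
        pitch = value;
    }
}
// External modules call only attitude::get_pitch() / attitude::set_pitch()
// and never touch the variable directly.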
The gcc/C implementation on the Propeller is a hack at best; the multicore features and such are tossed out to make C available. So while it is one thing to cite the C standard, or to have religious debates over globals or not, there is also the implementation of C on that platform, which may or may not really conform to any such standard.
I agree with another poster: the Propeller, if used as designed, has some features that are interesting, but an Arduino is much cleaner - pretty much anything is. If you actually want hardware multithreading, go with XMOS; it has its own C implementation for its features - very cool hardware as in style/design, but not cool as in power consumption. The Propeller is a neat toy if you toss the C and use it as it was designed; once you go with C you are crippling it and turning it into a very weak microcontroller, and pretty much every other competitor is better at that point. I was extremely disappointed when I finally bought one and looked behind the curtain.
Unless there is an implementation problem, it is okay to use globals and to use volatile on them to get code to share that global. It may leave a bad taste in some folks' mouths, but per spec it should work.

Is it necessary for all data accessed by multiple threads to be declared volatile?

For primitive types, I think it's necessary.
Even for non-primitives, for example an array, I think it's also necessary.
Without volatile:
int d[2];
Thread 1:
while (d[1] > 0) modify(d[0]);
Thread 2:
while (d[0] > 0) modify(d[1]);
I am afraid that, without volatile, the compiler may change my code to the following:
while (true) modify();
So I put volatile in front of int d[2];
But decorating everything with volatile feels a little strange.
No, that's not what volatile is used for. Volatile is used for variables which may change outside your program - e.g. memory-mapped devices, graphics memory, etc.
It's not necessary just because a program is multithreaded - neither for primitive types nor for arrays.
No. Volatile is for variables that may be read and/or written without the compiler knowing about it. Although another thread changing the variable might look like that situation, volatile is not enough, nor actually needed, for multithreaded programming.
Unless you are writing the synchronization primitives yourself, but that is way more difficult to do right than it seems. And it seems hard enough...
For more details you can read the Linux insight about this issue at Volatile considered harmful. The article is for C, not C++, but the same principles apply.
In this case, two threads are "modifying each other's data", which indeed requires the compiler to KNOW that the data is being modified by another thread. There are several solutions; volatile will tell the compiler that it can't keep the value in a register from the first read, but there are problems with that...
Most importantly, volatile will NOT solve the problem of precisely detecting the "edge" when d[1] > 0 changes, since with volatile all you are guaranteed is that the compiler doesn't remove the read of the variable. In a system with multiple cores, there could well be a delay before the new data written in thread 1 reaches thread 2, meaning that d[0] may be modified more times than you expected because the loop ran a few extra cycles. In extreme cases, such as certain models of ARM processors, the loop may run more or less indefinitely, since the processor (core) needs the cache line flushed, and that will not happen without intervention unless something else is using that same cache line. In a system that is otherwise not busy, this may take as long as you've got, or longer... ;)
So, I don't agree that volatile isn't needed in multithreaded environments, but I do agree that it's not the whole solution. std::atomic and similar constructs are required to ensure correctness if the code needs to detect "immediately" that a value has changed.
Sharing data across threads is a difficult matter, and it needs careful planning and understanding to make it work right. I know the code in the question is probably just a simplified example, but if modify(d[1]) is trivial, then it would be a very bad case of sharing data, and it is likely to run MUCH slower as two threads than as a single-threaded loop, because every cache-line write by one processor forces a flush of that cache line on the other processor. So it would be like driving a Ferrari sports car in busy Manhattan traffic - not very energy efficient, and no faster than the simple solution.
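For illustration, a sketch of the questioner's loops rewritten with std::atomic rather than volatile (fetch_sub is only a stand-in for modify()):
#include <atomic>
std::atomic<int> d0{1}, d1{1};   // stand-ins for d[0] and d[1]
void thread1() {
    while (d1.load() > 0)
        d0.fetch_sub(1);         // stands in for modify(d[0])
}
void thread2() {
    while (d0.load() > 0)
        d1.fetch_sub(1);         // stands in for modify(d[1])
}
// Unlike volatile, the atomic loads/stores are guaranteed to become
// visible across cores, and the compiler cannot hoist them out of the loops.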

Real time programming in C++

I have two C++ classes, objectManip and updater. The updater class has a timer to check the status of the robot arm in my application.
If the arm is moving, do nothing; otherwise getNextAction() from the actions queue.
The actions queue is populated by the objectManip class. I have a global variable, the current_status of the robot arm, that I need in objectManip.
The problem is that when the actions queue is being filled, current_status is captured once as a constant rather than read dynamically.
The question is very unclear, so this is really a stab in the dark, but you need to use atomic data types. With C++11, you have std::atomic (see here or here). For an earlier version of C++, you need to use some library- or compiler-specific data type which offers atomic data types.
If you make some assumptions about how multithreading works for your CPU and operating system, you may get away with just declaring shared variables volatile and reading the value into a temporary variable when you use it. volatile is really meant for cases like reading hardware-mapped values, where the value must be read from memory every time, so many optimizations are not possible. It does not guarantee atomic updates in itself, because a thread modifying a value may be interrupted in the middle of the update, and another read may then see an invalid, partially updated value. For booleans this should be fairly safe. For integers which do not cross a memory word boundary and which are word-sized or smaller, this may be safe on many CPUs, which will not interrupt a thread in the middle of writing a single memory word. Otherwise, it is data corruption waiting to happen. Some (today uncommon) CPUs also do not synchronize caches between multiple CPU cores, in which case volatile will not help: different threads may see different cached values. So, conclusion: use volatile only as a last-resort hack!
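A sketch of that last-resort pattern, with hypothetical names; it assumes current_status fits in one machine word:
volatile int current_status = 0;   // written by the updater thread
void fill_action_queue() {
    int status = current_status;   // one volatile read into a local
    // ... use `status` consistently below, instead of re-reading
    // current_status, which could change between two uses.
}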

Is it possible to have a race condition for a write-only operation?

If I have several threads trying to write the same value to a single location in memory, is it possible to have a race condition? Can the data somehow get corrupted during the writes? There is no preceding read or test conditions, only the write...
EDIT: To clarify, I'm computing a dot product on a GPU. I'm using several threads to calculate the individual products (one thread per row/column element) and saving them to a temporary location in memory. I need to then sum those intermediate products and save the result.
I was thinking about having all threads individually perform this sum/store operation since branching on a GPU can hurt performance. (You would think it should take the same amount of time for the sum/store whether it's done by a single thread or all threads, but I've tested this and there is a small performance hit.) All threads will get the same sum, but I'm concerned about a race condition when they each try to write their answer to the same location in memory. In the limited testing I've done, everything seems fine, but I'm still nervous...
Under most threading standards on most platforms, this is simply prohibited or undefined. That is, you are not allowed to do it and if you do, anything can happen.
High-level language compilers like those for C and C++ are free to optimize code based on the assumption that you will not do anything you are not allowed to do. So a "write-only" operation may turn out to be no such thing. If you write i = 1; in C or C++, the compiler is free to generate the same code as if you wrote i = 0; i++;. Similarly confounding optimizations really do occur in the real world.
Instead, follow the rules for whatever threading model you are using to use appropriate synchronization primitives. If your platform provides them, use appropriate atomic operations.
There is no problem having multiple threads writing a single (presumably shared or global) memory location in CUDA, even "simultaneously" i.e. from the same line of code.
If you care about the order of the writes, then this is a problem, as CUDA makes no guarantee of order, for multiple threads executing the same write operation to the same memory location. If this is an issue, you should use atomics or some other method of refactoring your code to sort it out. (It doesn't sound like this is an issue for you.)
Presumably, as another responder has stated, you care about the result at some point. Therefore it's necessary to have a barrier of some sort, either explicit (e.g. __syncthreads(), for multiple threads within a block using shared memory) or implicit (e.g. the end of a kernel, for multiple threads writing to a location in global memory), before you read that location and expect a sensible result. Note these are not the only possible barrier methods that could give you sane results, just two examples. Warp-synchronous behavior or other clever coding techniques could be leveraged to ensure the sanity of a read following a collection of writes.
Although on the surface the answer would seem to be that there are no race conditions, the answer is a bit more nuanced. Boris is right that on some 32-bit architectures, storing a 64-bit long or address may take two operations and may therefore be read in an invalid state. This is probably pretty difficult to reproduce, since memory pages are what typically get updated, and a long would never span a memory page.
However, the more important issue is that you need to realize that without memory synchronization there are no guarantees around when a thread would see the updated value. A thread could run for a long period of time reading an out-of-date value from memory. It wouldn't be an invalid value but it would not be the most recent one written. That may not specifically cause a "race-condition" but it might cause your program to perform in an unexpected manner.
Also, although you say it is "write-only", obviously someone is reading the value otherwise there would be no reason to perform the update. The details of what portion of the code is reading the value will better inform us as to whether the write-only without synchronization is truly safe.
If write-only operations are not atomic, there will obviously be moments when another thread may observe the data in a corrupted state.
For example, consider writing to a 64-bit integer that is stored as a pair of 32-bit integers:
Thread A has just finished writing the high-order word, while Thread B has just finished writing the low-order word and is about to set the high-order word;
Thread C may then see an integer consisting of the low-order word written by Thread B and the high-order word written by Thread A.
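A sketch of that hazard and the usual fix, assuming a 32-bit target where a plain 64-bit store compiles to two 32-bit stores:
#include <atomic>
#include <cstdint>
std::uint64_t plain_value = 0;            // may be observed half-written
std::atomic<std::uint64_t> safe_value{0}; // never observed half-written
void writer() {
    plain_value = 0x1111111122222222ULL;  // two stores: a reader can see
                                          // one old and one new half
    safe_value.store(0x1111111122222222ULL);
}
std::uint64_t reader() {
    // Always returns a value some writer actually wrote. std::atomic falls
    // back to a lock if the hardware lacks 64-bit atomics; check
    // safe_value.is_lock_free() if that matters.
    return safe_value.load();
}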
P.S. This question is very generic; actual results will depend on the memory model of the environment (language) and the underlying processor architecture (hardware).

Is it safe to read an integer variable that's being concurrently modified without locking?

Suppose that I have an integer variable in a class, and this variable may be concurrently modified by other threads. Writes are protected by a mutex. Do I need to protect reads too? I've heard that there are some hardware architectures on which, if one thread modifies a variable, and another thread reads it, then the read result will be garbage; in this case I do need to protect reads. I've never seen such architectures though.
This question assumes that a single transaction only consists of updating a single integer variable so I'm not worried about the states of any other variables that might also be involved in a transaction.
atomic read
As said before, it's platform dependent. On x86, the value must be aligned on a 4 byte boundary. Generally for most platforms, the read must execute in a single CPU instruction.
optimizer caching
The optimizer doesn't know you are reading a value modified by a different thread. Declaring the value volatile helps with that: the optimizer will issue a memory read/write for every access, instead of trying to keep the value cached in a register.
CPU cache
Still, you might read a stale value, since on modern architectures you have multiple cores with individual cache that is not kept in sync automatically. You need a read memory barrier, usually a platform-specific instruction.
On Wintel, thread synchronization functions will automatically add a full memory barrier, or you can use the InterlockedXxxx functions.
MSDN: Memory and Synchronization issues, MemoryBarrier Macro
[edit] please also see drhirsch's comments.
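A short Win32 sketch of the InterlockedXxx functions mentioned above:
#include <windows.h>
volatile LONG counter = 0;
void bump() {
    InterlockedIncrement(&counter);   // atomic ++ with a full barrier
}
LONG set_if_zero(LONG new_value) {
    // Atomically: if counter == 0, replace it with new_value.
    // Returns the value counter held before the call.
    return InterlockedCompareExchange(&counter, new_value, 0);
}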
You ask a question about reading a variable and later you talk about updating a variable, which implies a read-modify-write operation.
Assuming you really mean the former, the read is safe if it is an atomic operation. For almost all architectures this is true for integers.
There are a few (and rare) exceptions:
The read is misaligned, for example accessing a 4-byte int at an odd address. Usually you need to force the compiler with special attributes before it will generate misaligned accesses.
The size of an int is bigger than the natural size of instructions, for example using 16-bit ints on an 8-bit architecture.
Some architectures have an artificially limited bus width. I only know of very old and outdated ones, like a 386sx or a 68008.
I'd recommend not relying on any particular compiler or architecture in this case.
Whenever you have a mix of readers and writers (as opposed to just readers or just writers), you'd better sync them all. Imagine your code is running someone's artificial heart: you don't really want it to read wrong values, and you surely don't want a power plant in your city to go "boom" because someone decided not to use that mutex. Save yourself sleepless nights in the long run: sync them.
If you only have one thread reading, you're good to go with just that one mutex. However, if you're planning for multiple readers and multiple writers, you'll need a sophisticated piece of code to sync them. A read/write lock implementation that is also "fair" is yet to be seen by me.
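For reference, a minimal readers/writer lock sketch using C++17's std::shared_mutex (whose fairness is, indeed, implementation-defined):
#include <shared_mutex>
std::shared_mutex rw;
int shared_value = 0;
int read_value() {
    std::shared_lock<std::shared_mutex> lock(rw);   // readers share the lock
    return shared_value;
}
void write_value(int v) {
    std::unique_lock<std::shared_mutex> lock(rw);   // writers exclude everyone
    shared_value = v;
}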
Imagine that you're reading the variable in one thread; that thread gets interrupted while reading, and the variable is changed by a writing thread. Now what is the value of the read integer after the reading thread resumes?
Unless reading the variable is an atomic operation - in this case, one that takes only a single (assembly) instruction - you cannot ensure that the above situation never happens.
(The variable could be written to memory, and retrieving the value would take more than one instruction)
The consensus is that you should encapsulate/lock all writes individually, while reads can be executed concurrently with (only) other reads.
In the general case, the scenario you describe can bite on potentially every architecture: every architecture has cases where reading concurrently with a write will result in garbage.
However, almost every architecture also has exceptions to this rule.
It is common that word-sized variables are read and written atomically, so synchronization is not needed when reading or writing. The proper value will be written atomically as a single operation, and threads will read the current value as a single atomic operation as well, even if another thread is writing. So for integers, you're safe on most architectures. Some architectures extend this guarantee to a few other sizes as well, but that's obviously hardware-dependent.
For non-word-sized variables both reading and writing will typically be non-atomic, and will have to be synchronized by other means.
If you don't use the previous value of the variable when writing a new one, then:
You can read and write the integer variable without using a mutex. That's because an integer is a native type on a 32-bit architecture, and every modification/read of the value is done in one operation.
But if you are doing something such as an increment:
myvar++;
Then you need to use a mutex, because this construct expands to myvar = myvar + 1, and between reading myvar and writing the incremented value back, myvar can be modified by another thread. In that case you will get a bad value.
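A minimal sketch of the protected increment (std::mutex standing in for whatever mutex the platform provides):
#include <mutex>
std::mutex m;
int myvar = 0;
void safe_increment() {
    std::lock_guard<std::mutex> lock(m);
    myvar++;   // the read-modify-write is now indivisible
}
// Alternatively, declaring myvar as std::atomic<int> and calling
// myvar.fetch_add(1) avoids the mutex on hardware with atomic RMW support.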
While it would probably be safe to read ints on 32-bit systems without synchronization, I would not risk it. Multiple concurrent reads are not a problem, but I do not like writes happening at the same time as reads.
I would recommend placing the reads in the critical section too, and then stress-testing your application on multiple cores to see if this causes too much contention. Finding concurrency bugs is a nightmare I prefer to avoid. What happens if in the future someone decides to change the int to a long long or a double, so it can hold larger numbers?
If you have a nice thread library like boost.thread or zthread then you should have read/writer locks. These would be ideal for your situation as they allow multiple reads while protecting writes.
This may happen on 8-bit systems which use 16-bit integers.
If you want to avoid locking you can under suitable circumstances get away with reading multiple times, until you get two equal consecutive values. For example, I've used this approach to read the 64 bit clock on a 32 bit embedded target, where the clock tick was implemented as an interrupt routine. In that case reading three times suffices, because the clock can only tick once in the short time the reading routine runs.
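A sketch of that approach, assuming a 64-bit tick counter written only by the timer interrupt, on a target where the two 32-bit halves are read separately:
#include <stdint.h>
volatile uint64_t clock_ticks;   // updated only by the timer interrupt
uint64_t read_clock() {
    uint64_t a, b;
    do {
        a = clock_ticks;   // each read is two 32-bit loads: may be torn
        b = clock_ticks;
    } while (a != b);      // two equal reads in a row cannot both be torn
                           // if the clock ticks at most once while this runs
    return a;
}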
In general, each machine instruction goes through several hardware stages when executing. As most current CPUs are multi-core or hyper-threaded, reading a variable may start it moving through the instruction pipeline, but that doesn't stop another CPU core or hyper-thread from concurrently executing a store instruction to the same address. The two concurrently executing instructions, read and store, might "cross paths", meaning that the read will receive the old value just before the new value is stored.
To summarise: you do need the mutex for both reads and writes.
Both reading and writing of variables accessed concurrently must be protected by a critical section (not a mutex), unless you want to waste your whole day debugging.
Critical sections are platform-specific, I believe. On Win32, a critical section is very efficient: when no contention occurs, entering a critical section is almost free and does not affect overall performance. When contention occurs, it is still more efficient than a mutex, because it performs a series of checks before suspending the thread.
Depends on your platform. Most modern platforms offer atomic operations for integers: Windows has InterlockedIncrement, InterlockedDecrement, InterlockedCompareExchange, etc. These operations are usually supported by the underlying hardware (read: the CPU) and they are usually cheaper than using a critical section or other synchronization mechanisms.
See MSDN: InterlockedCompareExchange
I believe Linux (and modern Unix variants) support similar operations in the pthreads package but I don't claim to be an expert there.
If a variable is marked with the volatile keyword then, on some compilers (notably Microsoft's), reads/writes gain extra guarantees, but volatile has many, many other implications in terms of what the compiler does and how it behaves, and shouldn't just be used for this purpose.
Read up on what volatile does before you blindly start using it: http://msdn.microsoft.com/en-us/library/12a04hfd(VS.80).aspx