Memory barriers in userspace? (Linux, x86-64) - c++

It is easy to set memory barriers on the kernel side: the macros mb, wmb, rmb, etc. are always in place thanks to the Linux kernel headers.
How to accomplish this on the user side?

You are looking for the full memory barrier atomic builtins of gcc.
Please note the detail on the reference i gave here says,
The [following] builtins are intended to be compatible with those described in the Intel Itanium Processor-specific Application Binary Interface, section 7.4. As such, they depart from the normal GCC practice of using the “__builtin_” prefix, and further that they are overloaded such that they work on multiple types.

Posix defines a number of functions as acting as memory barriers. Memory locations must not be concurrently accessed; to prevent this, use synchronization - and that synchronization will also work as a barrier.

Use libatomic_ops. http://www.hpl.hp.com/research/linux/atomic_ops/
It's not compiler-specific, and less buggy than the GCC stuff. It's not a giganto-library that provides tons of functionality you don't care about. It just provides atomic operations. Also, it's portable to different CPU architectures.

Linux x64 means you can use the Intel memory barrier instructions.
You might wrap them in macros similar to those in the Linux headers, if
those macros aren't appropriate or accessible to your code

__sync_synchronize() in GCC 4.4+
The Intel Memory Ordering White Paper, a section from Volume 3A of Intel 64 and IA-32 manual http://developer.intel.com/Assets/PDF/manual/253668.pdf

The Qprof profiling library (nothing to do with Qt) also includes in its source code a library of atomic operations, including memory barriers. They work on many compilers and architectures. I'm using it on a project of mine.
http://www.hpl.hp.com/research/linux/qprof/download.php4

The include/arch/qatomic_*.h headers of a recent Qt distribution include (LGPL) code for a lot of architectures and all kinds of memory barriers (acquire, release, both).

Simply borrowing barriers defined for Linux kernel, just add those macros to your header file: http://lxr.linux.no/#linux+v3.6.5/arch/x86/include/asm/barrier.h#L21 . And of course, give Linux developers credit in your source code.

Related

atomic<int> for older c++ compilers

I am using atomic<int> in my code, but the machine in which now I'm compiling has an older g++ version which doesn't support C++11. Is there any equivalent class available on the net, so that I can use it in my code, or if not, where I can find the C++11 implementation of atomic<int> so I can copy it from there. Can this be easily done?
The proposed (i.e. unofficial) Boost.Atomic library aims to do exactly this. I don't know what state it's in currently, but it's used in the implementation of the recently (officially) accepted Boost.Lockfree library, so presumably it's usable.
EDIT — updated links, now that both Atomic and Lockfree have officially been in Boost for some time:
Boost.Atomic
Boost.Lockfree
Hans Boehm's atomic ops library is good, although it's hard to determine what's available on various platforms.
If you're OK with the LGPL, Intel TBB has what you're looking for as well (plus a lot of other stuff).
If you're only looking at GCC, then you may be able to get away with just using GCC's intrinsics (I'm not sure which version of GCC those showed up in, but they've been around for a while).
sig_atomic_t
This is an integral type of an object that can be accessed as an
atomic entity, even in the presence of asynchronous signals.
in gcc is atomic
To avoid uncertainty about interrupting access to a variable, you can use a particular data type for which access is always atomic: sig_atomic_t.

Interlocked ops vs XXX::atomic on Win32

What are the advantages and disadvantages of using Interlocked winapi functions instead of any library provides atomic operations on Win32 platform?
Portability is not an issue.
If portability is not a concern then you're basically down to deciding whom you trust more to get this right. A library is generally designed to provide portability. It otherwise has a tough time competing with an OS provided implementation that's been battle-hardened for over 15 years.
Check this thread to see an example of how the obvious implementation is not in fact the best.
The Interlocked winapi functions work on old processors even when there is no CPU support for locked operations. 386 and maybe 486, not really a issue today unless you still support Win9x and older NT.
It would likely depend up on the specific atomic library in question.
A good library with a specific back-end would likely end up with the same implementation of a couple of ASM instructions to issue an x86 lock instruction and do their work. And assuming the library itself is portable, subsequently make your code portable.
A naive atomic implementation might do something heavier like use a mutex to protect a normal variable. I don't know of any that do - just making the point for argument.
As such, given your stated non-portability requirements, using the Win32 functions should be fine. Alternately, go ahead with an Atomic version, but perhaps look at the actual implementation.

Portable c++ atomic swap (Windows - GNU/Linux - MacOSX)

Is there free a portable (Windows, GNU/Linux & MacOSX) library providing a lock-free atomic swap function?
If not, how would it be implemented for each of these platforms? (x86 with VC++ or g++)
Thanks
There's a lock-free library pending review in boost. Also if you dig into source of boost smart pointers library you will find atomic ops inlined for multiple platforms. Another one - Intel Threading Building Blocks has implementation of atomic<> template.
Depends what you want to swap. In assembler for x86 you might be able to get a "nearly" atomic xor swap, otherwise I'd go with some solution that uses locking, which will differ on Win32/{Linux,Darwin}.
If you are looking for a library, have a look at APR (Apache Portable Runtime) - http://apr.apache.org/
Boost has a set of macros for facilitating lock-free operations in a portable way.

Intel C++ compiler as an alternative to Microsoft's?

Is anyone here using the Intel C++ compiler instead of Microsoft's Visual c++ compiler?
I would be very interested to hear your experience about integration, performance and build times.
The Intel compiler is one of the most advanced C++ compiler available, it has a number of advantages over for instance the Microsoft Visual C++ compiler, and one major drawback. The advantages include:
Very good SIMD support, as far as I've been able to find out, it is the compiler that has the best support for SIMD instructions.
Supports both automatic parallelization (multi core optimzations), as well as manual (through OpenMP), and does both very well.
Support CPU dispatching, this is really important, since it allows the compiler to target the processor for optimized instructions when the program runs. As far as I can tell this is the only C++ compiler available that does this, unless G++ has introduced this in their yet.
It is often shipped with optimized libraries, such as math and image libraries.
However it has one major drawback, the dispatcher as mentioned above, only works on Intel CPU's, this means that advanced optimizations will be left out on AMD cpu's. There is a workaround for this, but it is still a major problem with the compiler.
To work around the dispatcher problem, it is possible to replace the dispatcher code produced with a version working on AMD processors, one can for instance use Agner Fog's asmlib library which replaces the compiler generated dispatcher function. Much more information about the dispatching problem, and more detailed technical explanations of some of the topics can be found in the Optimizing software in C++ paper - also from Anger (which is really worth reading).
On a personal note I have used the Intel c++ Compiler with Visual Studio 2005 where it worked flawlessly, I didn't experience any problems with microsoft specific language extensions, it seemed to understand those I used, but perhaps the ones mentioned by John Knoeller were different from the ones I had in my projects.
While I like the Intel compiler, I'm currently working with the microsoft C++ compiler, simply because of the financial extra investment the Intel compiler requires. I would only use the Intel compiler as an alternative to Microsofts or the GNU compiler, if performance were critical to my project and I had a the financial part in order ;)
I'm not using Intel C++ compiler at work / personal (I wish I would).
I would use it because it has:
Excellent inline assembler support. Intel C++ supports both Intel and AT&T (GCC) assembler syntaxes, for x86 and x64 platforms. Visual C++ can handle only Intel assembly syntax and only for x86.
Support for SSE3, SSSE3, and SSE4 instruction sets. Visual C++ has support for SSE and SSE2.
Is based on EDG C++, which has a complete ISO/IEC 14882:2003 standard implementation. That means you can use / learn every C++ feature.
I've had only one experience with this compiler, compiling STLPort. It took MSVC around 5 minutes to compile it and ICC was compiling for more than an hour. It seems that their template compilation is very slow. Other than this I've heard only good things about it.
Here's something interesting:
Intel's compiler can produce different
versions of pieces of code, with each
version being optimised for a specific
processor and/or instruction set
(SSE2, SSE3, etc.). The system detects
which CPU it's running on and chooses
the optimal code path accordingly; the
CPU dispatcher, as it's called.
"However, the Intel CPU dispatcher
does not only check which instruction
set is supported by the CPU, it also
checks the vendor ID string," Fog
details, "If the vendor string says
'GenuineIntel' then it uses the
optimal code path. If the CPU is not
from Intel then, in most cases, it
will run the slowest possible version
of the code, even if the CPU is fully
compatible with a better version."
OSnews article here
I tried using Intel C++ at my previous job. IIRC, it did indeed generate more efficient code at the expense of compilation time. We didn't put it to production use though, for reasons I can't remember.
One important difference compared to MSVC is that the Intel compiler supports C99.
Anecdotally, I've found that the Intel compiler crashes more frequently than Visual C++. Its diagnostics are a bit more thorough and clearly written than VC's. Thus, it's possible that the compiler will give diagnostics that weren't given with VC, or will crash where VC didn't, making your conversion more expensive.
However, I do believe that Intel's compiler allows you to link with Microsoft runtimes like the CRT, easing the transition cost.
If you are interoperating with managed code you should probably stick with Microsoft's compiler.
Recent Intel compilers achieve significantly better performance on floating-point heavy benchmarks, and are similar to Visual C++ on integer heavy benchmarks. However, it varies dramatically based on the program and whether or not you are using link-time code generation or profile-guided optimization. If performance is critical for you, you'll need to benchmark your application before making a choice. I'd only say that if you are doing scientific computing, it's probably worth the time to investigate.
Intel allows you a month-long free trial of its compiler, so you can try these things out for yourself.
I've been using the Intel C++ compiler since the first Release of Intel Parallel Studio, and so far I haven't felt the temptation to go back. Here's an outline of dis/advantages as well as (some obvious) observations.
Advantages
Parallelization (vectorization, OpenMP, SSE) is unmatched in other compilers.
Toolset is simply awesome. I'm talking about the profiling, of course.
Inclusion of optimized libraries such as Threading Building Blocks (okay, so Microsoft replicated TBB with PPL), Math Kernel Library (standard routines, and some implementations have MPI (!!!) support), Integrated Performance Primitives, etc. What's great also is that these libraries are constantly evolving.
Disadvantages
Speed-up is Intel-only. Well duh! It doesn't worry me, however, because on the server side all I have to do is choose Intel machines. I have no problem with that, some people might.
You can't really do OSS or anything like that on this, because the project file format is different. Yes, you can have both VS and IPS file formats, but that's just weird. You'll get lost in synchronising project options whenever you make a change. Intel's compiler has twice the number of options, by the way.
The compiler is a lot more finicky. It is far too easy to set incompatible project settings that will give you a cryptic compilation error instead of a nice meaningful explanation.
It costs additional money on top of Visual Studio.
Neutrals
I think that the performance argument is not a strong one anymore, because plenty of libraries such as Thrust or Microsoft AMP let you use GPGPU which will outgun your cpu anyway.
I recommend anyone interested to get a trial and try out some code, including the libraries. (And yes, the libraries are nice, but C-style interfaces can drive you insane.)
The last time the company I work for compared the two was about a year ago, (maybe 2). The Intel compiler generated faster code, usually only a bit faster, but in some cases quite a bit.
But it couldn't handle some of the MS language extensions that we depended on, so we ended up sticking with MS. It was VS 2005 that we were comparing it to. And I'm wracking my brain to remember exactly what MS extension the Intel compiler couldn't handle. I'll come back and edit this post if I can remember.
Intel C++ Compiler has AMAZING (human) support. Talking to Microsoft can literally take days. My non-trivial issue was solved through chat in under 10 minutes (including membership verification time).
EDIT: I have talked to Microsoft about problems in their products such as Office 2007, even got a bug reported. While I eventually succeeded, the overall size and complexity of their products and organization hierarchy is daunting.

Interlocked equivalent on Linux

In a C++ Linux app, what is the simplest way to get the functionality that the Interlocked functions on Win32 provide? Specifically, a lightweight way to atomically increment or add 32 or 64 bit integers?
Just few notes to clarify the issue which has nothing to do with Linux.
RWM (read-modify-write) operations and those that do not execute in a single-step need the hardware-support to execute atomically; among them increments and decrements, fetch_and_add, etc.
For some architecture (including I386, AMD_64 and IA64) gcc has a built-in support for atomic memory access, therefore no external libray is required. Here you can read some information about the API.
Intel's open-source ThreadBuildingBlocks has a template, Atomic, that offers the same functionality as .NET's Interlocked class.
Unlike gcc's Atomic built-ins, it's cross platform and doesn't depend on a particular compiler. As Nemanja Trifunovic correctly points out above, it does depend on the compare-and-swap CPU instruction provided by x86 and Itanium chips. I guess you wouldn't expect anything else from an Intel library : )
Strictly speaking, Linux cannot offer atomic "interlocked" functions like ones in Win32, simply because these functions require hardware support, and Linux runs on some platforms that don't offer that support. Having said that, if you can constrain yourself to Intel x86/x64, take a look at the implementation of reference counting in Boost shared pointers library.
The atomic functions from the Apache Portable Runtime are really close to the Win32 InterlockedXXX functions.
You can insert some assembly code in your source, to use x68 interlocked instructions directly.
You should use a lock xadd operation.
See for instance this.
The fairly common glib library that's used in GTK and QT programming as well as standalone offers a variety of atomic operations. See http://library.gnome.org/devel/glib/2.16/glib-Atomic-Operations.html for a list. There are g_atomic functions for most of the operations that Interlocked supports on Win32, and on platforms where the hardware directly supports these, they are inlined as the needed assembly code.
Upon further review, this looks promising. Yay stack overflow.