I have this C++ code that asserts atomic bools are always lock-free (for reference, this is from Godot Engine):
static_assert(std::atomic_bool::is_always_lock_free);
This works fine on x86, but when compiling with GCC for RISC-V, this assert fails with this message:
./core/templates/safe_refcount.h:140:34: error: static assertion failed
140 | static_assert(std::atomic_bool::is_always_lock_free);
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
This assert doesn't fail when compiling with Clang (clang++), it only happens with GCC (g++).
I've been looking around for a while, but resources on RISC-V are fairly scarce. I did find a somewhat similar question about lock-free atomics on ARM on a Raspberry Pi 3, where the problem was that the compiler couldn't prove at compile time that atomic bools are lock-free. If that's the case here, perhaps I'm just missing a compiler flag. Alternatively, the target architecture might be missing a RISC-V extension for this (I have tried both the default rv64imafdc and rv64gc), but I doubt it, since the same target works with Clang.
Related
Are there any advantages to Apple's provided Clang compiler compared to the OpenMP-capable Clang compiler available from Homebrew?
Will there be any performance loss if switching to OpenMP Clang (regardless of the multi-threading ability)?
I also found this old question that has no good answer
Update
I compiled OOFEM with both Apple's Clang and mainstream Clang and ran the same problem:
Apple's Clang: Real time consumed: 000h:01m:26s
Mainstream Clang: Real time consumed: 000h:01m:24s
With multi-threading enabled, the performance is also similar.
One difference I also noticed is that Apple's Clang seems to ignore some CMake options: e.g. -DOpenMP_CXX_FLAGS="-I/usr/local/opt/libomp/include" has no effect with Apple's Clang, while it works fine with mainstream Clang.
Is there a difference?
As stated, that answers itself. They're two different compilers, and we don't know what Apple has done inside theirs. We do know that Apple's doesn't provide OpenMP support, so that is at least one difference.
Will there be any performance loss if switching to OpenMP Clang
(regardless of the multi-threading ability)?
I doubt it, but since you're clearly measuring performance and playing with both compilers, you seem in a good position to tell us :-)
I am writing an atomic increment function for int64_t type that works on many different OS / CPU combinations. For example, on Windows I can use InterlockedIncrement64, on OS X I can use OSAtomicIncrement64Barrier, and on Linux variants I can use GCC built-in __sync_fetch_and_add.
However, when cross-compiling with GCC for MIPS 32-bit architecture, I encounter a link error regarding missing reference to __sync_fetch_and_add_8. Some quick Googling showed that the MIPS 32-bit architecture does not support 64-bit atomic increment instruction (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56300). The suggestion in that bug report to link against libatomic does not seem to work, which may be because I am still on GCC 4.7.
I know that I can always resort to a pthread mutex to protect the increment logic, but this is dramatically slower than taking advantage of a native instruction.
Do you have any recommendation on how to achieve the 64-bit atomic increment in any other way for the MIPS 32-bit architecture?
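The per-platform dispatch described above, with a mutex as the last resort, might be sketched like this. The function name `atomic_increment64` is mine, the macro `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8` is a GCC-defined predefine indicating 8-byte atomic support, and the OS X branch is elided for brevity:

```cpp
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
static int64_t atomic_increment64(int64_t* p) {
    // Windows: hardware-backed 64-bit interlocked increment.
    return InterlockedIncrement64(reinterpret_cast<LONGLONG*>(p));
}
#elif defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8)
static int64_t atomic_increment64(int64_t* p) {
    // GCC/Clang built-in; this branch only exists when the target
    // actually has native 8-byte atomics (not MIPS32).
    return __sync_add_and_fetch(p, 1);
}
#else
#include <pthread.h>
// Last resort (e.g. MIPS32 without libatomic): serialize with a mutex.
// Dramatically slower than a native instruction, as noted above.
static pthread_mutex_t g_inc_lock = PTHREAD_MUTEX_INITIALIZER;
static int64_t atomic_increment64(int64_t* p) {
    pthread_mutex_lock(&g_inc_lock);
    int64_t v = ++*p;
    pthread_mutex_unlock(&g_inc_lock);
    return v;
}
#endif
```

The key point of the sketch is the middle branch: gating on the compiler's own predefine rather than on the OS means the code falls back to the mutex automatically on targets where `__sync_fetch_and_add_8` would fail to link.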
I encountered a similar problem when using the __atomic built-ins:
undefined reference to `__atomic_fetch_add_8'
I solved it by linking with libatomic.
BTW, my mipsel cross-compiler is GCC 4.8.1.
See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56300
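To illustrate the answer above: on a 32-bit target without native 8-byte atomics, code like the following is what GCC lowers to a call into libatomic, producing the undefined reference unless `-latomic` is on the link line (a hypothetical invocation would be `mipsel-linux-g++ foo.cpp -latomic`):

```cpp
#include <atomic>
#include <cstdint>

// On MIPS32 this fetch_add becomes a library call to __atomic_fetch_add_8,
// which lives in libatomic; hence the link error without -latomic.
std::atomic<int64_t> counter{0};

int64_t bump() {
    return counter.fetch_add(1, std::memory_order_seq_cst) + 1;
}
```

On 64-bit or x86 targets the same code compiles to an inline atomic instruction, which is why the problem only shows up when cross-compiling.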
I have a question concerning llvm, clang, and gcc on OS X.
What is the difference between llvm-gcc 4.2, LLVM 2.0, and Clang? I know that they all build on LLVM, but how are they different?
Besides faster compiling, what is the advantage of llvm over gcc?
LLVM originally stood for "low-level virtual machine", though it now just stands for itself as it has grown to be something other than a traditional virtual machine. It is a set of libraries and tools, as well as a standardized intermediate representation, that can be used to help build compilers and just-in-time compilers. It cannot compile anything other than its own intermediate representation on its own; it needs a language-specific frontend in order to do so. If people just refer to LLVM, they probably mean just the low-level library and tools. Some people might refer to Clang or llvm-gcc incorrectly as "LLVM", which may cause some confusion.
llvm-gcc is a modified version of GCC, which uses LLVM as its backend instead of GCC's own. It is now deprecated, in favor of DragonEgg, which uses GCC's new plugin system to do the same thing without forking GCC.
Clang is a whole new C/C++/Objective-C compiler, which uses its own frontend, and LLVM as the backend. The advantages it provides are better error messages, faster compile time, and an easier way for other tools to hook into the compilation process (like the LLDB debugger and Clang static analyzer). It's also reasonably modular, and so can be used as a library for other software that needs to analyze C, C++, or Objective-C code.
Each of these approaches (plain GCC, GCC + LLVM, and Clang) has its advantages and disadvantages. The last few sets of benchmarks I've seen showed GCC producing slightly faster code in most test cases (though LLVM had a slight edge in a few), while LLVM and Clang gave significantly better compile times. GCC and the GCC/LLVM combos have the advantage that a lot more code has been tested with, and works on, the GCC flavor of C; there are some compiler-specific extensions that only GCC has, and some places where the standard allows the implementation to vary but code depends on one particular implementation. A large body of legacy C code is a lot more likely to work with GCC than with Clang, though this is improving over time.
There are 2 different things here.
LLVM is a backend compiler meant to build compilers on top of it. It deals with optimizations and production of code adapted to the target architecture.
Clang is a front end that parses C, C++, and Objective-C code and translates it into a representation suitable for LLVM.
llvm-gcc was an initial version of an LLVM-based C++ compiler built on GCC 4.2; it is now deprecated, since Clang can parse everything it could parse, and more.
Finally, the main difference between Clang and GCC lies not in the produced code but in the approach. While GCC is monolithic, Clang has been built as a suite of libraries. This modular design allows great reuse opportunities, for IDEs or completion tools for example.
At the moment, the code produced by GCC 4.6 is generally a bit faster, but Clang is closing the gap.
llvm-gcc-4.2 uses the GCC front-end to parse your code, then generates the compiled output using LLVM.
The "llvm compiler 2.0" uses the clang front-end to parse your code, and generates the compiled output using LLVM. "clang" is actually just the name for this front-end, but it is often used casually as a name for the compiler as a whole.
I'd like to produce cross-compiler compatible C++ code.
I've written somewhat "exotic" code that pushes the C++ language into its gray, weird, mysterious areas.
Considering that my code only depends on Boost and the STL, and that the issue is code compatibility, not library compatibility:
Would my code compiling under both MSVC and MinGW ensure 100% that it is compatible with GCC on every platform?
Not at all.
Compiling your code with MSVC and MinGW guarantees that your code is compatible with Microsoft's C/C++ libraries. I understand you're only talking about code compatibility, but such a thing doesn't exist. If you're pushing C++ into the gray areas, it might well be that the same code will have different results depending on the platform you compile it on.
The best, and only way to guarantee full compatibility is compiling and testing it on both platforms.
Although using GCC with -std=c++0x -Wall -Wextra -pedantic (or any other std version) and getting rid of all the warnings will give a pretty good idea of code quality.
Honestly? To guarantee your code will compile with GCC on any platform is impossible. There is always that likelihood that something could be off, especially if you are doing 'exotic' things with your code.
You could also try compiling with Cygwin, which would give a better idea of how it will build on a more Unix-like system (although it's still not guaranteed to work on all systems, it's better than just trying MSVC and MinGW, which are both Windows-only compilers).
I have an AMD Opteron server running CentOS 5. I want to have a compiler for a fairly large C++ Boost based program. Which compiler I should choose?
There is an interesting PDF here which compares a number of compilers.
I hope this helps more than hurts :)
I did a little compiler shootout sometime over a year ago, and I am going off memory.
GCC 4.2 (Apple)
Intel 10
GCC 4.2 (Apple) + LLVM
I tested multiple template heavy audio signal processing programs that I'd written.
Compilation times: the Intel compiler was by far the slowest, more than twice as slow, as another poster cited.
GCC handled deep templates very well in comparison to Intel.
The Intel compiler generated huge object files.
GCC+LLVM yielded the smallest binary.
The generated code may have significant variance due to the program's construction, and where SIMD could be used.
For the way I write, I found that GCC + LLVM generated the best code. For programs which I'd written before I took optimization seriously (as I wrote), Intel was generally better.
Intel's results varied; it handled some programs far better, and some programs far worse. It handled raw processing very well, but I give GCC+LLVM the cake because when put into the context of a larger (normal) program... it did better.
Intel won for out-of-the-box number crunching on huge data sets.
GCC alone generated the slowest code, though it can be as fast with measurement and nano-optimizations. I prefer to avoid those because the wind may change direction with the next compiler release, so to speak.
I never measured poorly written programs in this test (i.e. results outperformed distributions of popular performance libraries).
Finally, the programs were written over several years, using GCC as the primary compiler in that time.
Update: I was also enabling optimizations/extensions for Core2Duo. The programs were clean enough to enable strict aliasing.
The MySQL team posted once that icc gave them about a 10% performance boost over gcc. I'll try to find the link.
In general I've found that the 'native' compilers perform better than gcc on their respective platforms
edit: I was a little off. Typical gains were 20-30% not 10%. Some narrow edge cases got a doubling of performance. http://www.mysqlperformanceblog.com/files/presentations/LinuxWorld2004-Intel.pdf
I suppose it varies depending on the code, but with the codebase I am working on now, ICC 11.035 gives an almost 2x improvement over gcc 4.4.0 on a Xeon 5504.
icc options: -O2 -fno-alias
gcc options: -O3 -msse3 -mfpmath=sse -fargument-noalias-global
The options are specific to just the file containing the compute-intensive code, where I know there is no aliasing. Single-threaded code with a 5-level nested loop.
Although autovectorization is enabled, neither compiler generates vectorized code (not a fault of the compilers).
Update (2015/02/27):
While optimizing some geophysics code (Q2 2013) to run on Sandy Bridge-E Xeons, I had an opportunity to compare the performance of ICC 11.1 against GCC 4.8.0, and GCC was now generating faster code than ICC. The code made use of AVX intrinsics and 8-way vectorized instructions (neither compiler autovectorized the code properly due to certain data-layout requirements). In addition, GCC's LTO implementation (with the IR core embedded in the .o files) was much easier to manage than ICC's. GCC with LTO was running roughly 3 times faster than ICC without LTO. I'm not able to find the numbers right now for GCC without LTO, but I recall it was still faster than ICC. It's by no means a general statement on ICC's performance, but the results were sufficient for us to go ahead with GCC 4.8.*.
Looking forward to GCC 5.0 (http://www.phoronix.com/scan.php?page=article&item=gcc-50-broadwell)!
We use the Intel compiler on our product (DB2), on Linux and Windows IA32/AMD64, and on OS X (i.e. all our Intel platform ports except SunAMD).
I don't know the numbers, but the performance is good enough that we:
pay for the compiler which I'm told is very expensive.
live with build times twice as slow (primarily due to the time the compiler spends acquiring licenses before it allows itself to run).
PHP - Compilation from source with ICC rather than GCC should result in a 10% to 20% speed improvement - http://www.papelipe.no/tags/ez_publish/benchmark_of_intel_compiled_icc_apache_php_and_apc
MySQL - Compilation from source with ICC rather than GCC should result in a 25% to 50% speed improvement - http://www.mysqlperformanceblog.com/files/presentations/LinuxWorld2005-Intel.pdf
I used to work on a fairly large signal processing system which ran on a large cluster. We used to reckon for heavy maths crunching, the Intel compiler gave us about 10% less CPU load than GCC. That's very unscientific but it was our experience (that was about 18 months ago).
What would have been interesting is if we'd been able to use Intel's math libraries as well which use their chipset more efficiently.
I used UnixBench (v. 5.1.3) on an openSUSE 12.2 (kernel 3.4.33-2.24-default x86_64), and compiled it first with GCC, and then with Intel's compiler.
With 1 parallel copy, UnixBench compiled with Intel's is about 20% faster than the version compiled with GCC.
However, this hides huge differences. Dhrystone is about 25% slower with the Intel compiler, while Whetstone runs 2x faster.
With 4 copies of UnixBench running in parallel, the improvement of the Intel compiler over GCC is only 7%. Again, Intel is much better at Whetstone (> 200%) and slower at Dhrystone (about 20%).
Many optimizations which the Intel compiler performs routinely require specific source syntax and the use of -O3 -ffast-math for gcc. Unfortunately, the -funsafe-math-optimizations component of -O3 -ffast-math -march=native has turned out to be incompatible with -fopenmp, so I must split my source files into groups compiled with different options in the Makefile. Today I ran into a failure where a g++ build using -O3 -ffast-math -fopenmp -march=native was able to write to the screen but not redirect to a file.
One of the more egregious differences, in my opinion, is that only icpc optimizes std::max and std::min directly, whereas gcc/g++ want fmax/fmin (or fmaxf/fminf) with -ffast-math, which changes their meaning away from the standard.