It seems that for CMPXCHG16B to be used, one has to define _STD_ATOMIC_ALWAYS_USE_CMPXCHG16B = 1.
Why is this the default? I would never have found out about this if I hadn't read the whole atomic.h header.
What other global defines in the STL are there? Is there a list to review so one can reliably be aware of these implementation details?
_STD_ATOMIC_ALWAYS_USE_CMPXCHG16B was recently introduced in Visual Studio 2019 (the PR)
Visual Studio 2019 still supports older OSes such as Windows Vista and Windows 7. These OSes can run on old AMD Opteron CPUs that don't have this instruction.
Even if _STD_ATOMIC_ALWAYS_USE_CMPXCHG16B = 0, there's runtime detection that uses CMPXCHG16B if it is available. But in this case the instruction is not inlined, and there's also a branch, so it is less efficient than defining _STD_ATOMIC_ALWAYS_USE_CMPXCHG16B = 1.
Please also note that CMPXCHG16B is used for atomic_ref, but not for atomic, due to ABI compatibility. (It was possible to introduce it for atomic_ref, since there was no pre-C++20 atomic_ref to be ABI-compatible with.)
In the vNext version (the next major, ABI-breaking version), atomic should use CMPXCHG16B as well. There's also hope that support for old CPUs/OSes will be dropped and the use of CMPXCHG16B will become unconditional. (See https://github.com/microsoft/STL/issues/1151.)
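As a rough illustration of how you would opt in (assuming MSVC with /std:c++20; the Pair struct and the values here are placeholders of mine, not from the STL's documentation):

// This knob must be set before <atomic> is included (or pass /D_STD_ATOMIC_ALWAYS_USE_CMPXCHG16B=1).
#define _STD_ATOMIC_ALWAYS_USE_CMPXCHG16B 1
#include <atomic>
#include <cstdint>

struct Pair {                 // 16 bytes total
    std::uint64_t lo;
    std::uint64_t hi;
};

int main() {
    alignas(16) Pair p{1, 2};
    std::atomic_ref<Pair> ref(p);
    Pair expected{1, 2};
    // With the macro defined, this 16-byte CAS should compile to an inlined
    // lock cmpxchg16b instead of going through the runtime-dispatch helper.
    ref.compare_exchange_strong(expected, Pair{3, 4});
}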
I would never have found out about this if I hadn't read the whole atomic.h header.
What other global defines in the STL are there? Is there a list to review so one can reliably be aware of these implementation details?
I'm afraid there's no comprehensive list, although some are documented.
The excuse for _STD_ATOMIC_ALWAYS_USE_CMPXCHG16B in particular could be that atomic_ref as a whole is not documented, and, as a C++20 feature, it has experimental status in Visual Studio 2019.
I'm just curious how this is done in games and other software.
More precisely, I'm asking for a solution in C++.
Something like:
if AMX available -> Use AMX version of the math library
else if AVX-512 available -> Use AVX-512 version of the math library
else if AVX-256 available -> Use AVX-256 version of the math library
etc.
The basic idea I have is to compile the library into different DLLs and swap them at runtime, but that doesn't seem like the best solution to me.
For the detection part
See Are the xgetbv and CPUID checks sufficient to guarantee AVX2 support? which shows how to detect CPU and OS support for new extensions: cpuid and xgetbv, respectively.
ISA extensions that add new/wider registers that need to be saved/restored on context switch also need to be supported and enabled by the OS, not just the CPU. New instructions like AVX-512 will still fault on a CPU that supports them if the OS hasn't set a control-register bit. (Effectively promising that it knows about them and will save/restore them.) Intel designed things so the failure mode is faulting, not silent corruption of registers on CPU migration, or context switch between two programs using the extension.
Extensions that added new or wider registers are AVX, AVX-512F, and AMX. OSes need to know about them. (AMX is very new, and adds a large amount of state: 8 tile registers T0-T7 of 1KiB each. Apparently OSes need to know about AMX for power-management to work properly.)
OSes don't need to know about AVX2/FMA3 (still YMM0-15), or any of the various AVX-512 extensions which still use k0-k7 and ZMM0-31.
There's no OS-independent way to detect OS support of SSE, but fortunately it's old enough that these days you don't have to. It and SSE2 are baseline for x86-64. Everything up to SSE4.2 uses the same register state (XMM0-15) so OS support for SSE1 is sufficient for user-space to use SSE4.2. SSE1 was new in 1999, with Pentium 3.
Different compilers have different ways of doing CPUID and xgetbv detection. See does gcc's __builtin_cpu_supports check for OS support? - unfortunately no, only CPUID, at least when that was asked. I'd consider that a GCC bug, but IDK if it ever got reported or fixed.
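As a rough sketch of that CPUID + XGETBV dance for AVX2 (written against MSVC's __cpuidex/_xgetbv intrinsics; GCC/clang spell these differently, and the bit positions come from Intel's manual, so verify them there):

#include <intrin.h>   // __cpuidex and _xgetbv on MSVC

bool os_and_cpu_support_avx2() {
    int regs[4];                                       // EAX, EBX, ECX, EDX

    __cpuidex(regs, 1, 0);
    const bool osxsave = (regs[2] & (1 << 27)) != 0;   // OS has enabled XSAVE/XGETBV
    const bool avx     = (regs[2] & (1 << 28)) != 0;   // CPU supports AVX
    if (!osxsave || !avx)
        return false;

    // Ask the OS (via XCR0) whether it saves/restores XMM and YMM state on context switch.
    const unsigned long long xcr0 = _xgetbv(0);
    if ((xcr0 & 0x6) != 0x6)                           // bit 1 = SSE state, bit 2 = AVX state
        return false;

    __cpuidex(regs, 7, 0);                             // leaf 7, subleaf 0: extended features
    return (regs[1] & (1 << 5)) != 0;                  // EBX bit 5: AVX2
}

An AVX-512 check works the same way, except that it additionally needs the opmask/ZMM state bits in XCR0 (mask 0xE0) and the corresponding CPUID leaf-7 feature bits.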
For the optional-use part
Typically setting function pointers to selected versions of some important functions. Inlining through function pointers isn't generally possible, so make sure you choose the boundaries appropriately, like an AVX-512 version of a function that includes a loop, not just a single vector.
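A minimal sketch of that pattern (the sum_* functions and the cpu_has_* helpers are hypothetical placeholders; each implementation would live in its own translation unit compiled with the matching -m options):

#include <cstddef>

// Hypothetical per-ISA implementations, each compiled separately.
float sum_avx512(const float* p, std::size_t n);
float sum_avx2(const float* p, std::size_t n);
float sum_scalar(const float* p, std::size_t n);

// Hypothetical detection helpers built on CPUID/XGETBV (see above).
bool cpu_has_avx512();
bool cpu_has_avx2();

// Choose once at startup; all later calls go through the pointer.
float (*sum_impl)(const float*, std::size_t) = [] {
    if (cpu_has_avx512()) return &sum_avx512;
    if (cpu_has_avx2())   return &sum_avx2;
    return &sum_scalar;
}();

float sum(const float* p, std::size_t n) { return sum_impl(p, n); }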
GCC's function multi-versioning can automate that for you, transparently compiling multiple versions and hooking some function-pointer setup.
There have been some previous Q&As about this with different compilers, search for "CPU dispatch avx" or something like that, along with other search terms.
See The Effect of Architecture When Using SSE / AVX Intrinsics to understand the difference between GCC/clang's model for intrinsics, where you have to enable -march=skylake or whatever, or manually -mavx2, before you can use an intrinsic, vs. MSVC and classic ICC, where you could use any intrinsic anywhere, even to emit instructions the compiler wouldn't be able to auto-vectorize with. (Those compilers can't or don't optimize intrinsics much at all, perhaps because that could lead to them getting hoisted out of if(cpu) statements.)
Windows provides IsProcessorFeaturePresent but AVX support is not on the list.
For more detailed detection you need to ask the CPU directly. On x86 this means the CPUID instruction. Visual C++ provides the __cpuidex intrinsic for this. In your case, function/leaf 1 and check bit 28 in ECX. Wikipedia has a decent article but you really should download the Intel instruction set manual to use as a reference.
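For instance, a minimal sketch of that check with the MSVC intrinsic:

#include <intrin.h>

bool cpu_reports_avx() {
    int regs[4];                 // EAX, EBX, ECX, EDX
    __cpuidex(regs, 1, 0);       // function/leaf 1: processor feature flags
    // ECX bit 28 is the AVX flag; this only says the CPU has AVX,
    // so pair it with an XGETBV check to confirm OS support.
    return (regs[2] & (1 << 28)) != 0;
}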
In order to narrow the scope of this question, let's consider projects in C / C++ only.
There is a whole array of new SIMD instruction set extensions for the x86 architecture, though in order to benefit from them a developer should recompile the code with an appropriate optimization flag, and perhaps modify it accordingly as well.
Since new instruction set extensions come out relatively frequently, it's unclear how backward compatibility can be maintained while taking advantage of the available instruction set extensions.
Does the resulting application stay compatible with older CPU models that don't support a new instruction set extension? If yes, could you elaborate on how such support is implemented?
New CPU instructions require new hardware to execute. If you try to run them on older CPUs that don't support those instructions, your program will crash with an Invalid Opcode fault. Occasionally OSes will handle this condition, but usually not.
To run with the new instructions, you either need to require that they are supported in hardware, or (if the benefit is great enough) check at runtime to see if the new instructions you need are supported. If they are, you run a section of code that uses them. If they are not, you run a different section of code that does not use them.
Generally "backwards compatible" refers to a new version of something running stuff that runs on the older, existing things, and not old things running with new stuff.
Historically, most x86 instruction sets have been (practically) strict supersets of previous sets. However, the AVX-512 extension comes in several mutually-incompatible variants, so particular care will need to be taken.
Fortunately, compilers are also getting smarter. GCC has __attribute__((simd)) and __attribute__((target_clones(...))) to automatically create multiple implementations of the given function, and choose the best one at load time based on what the actual CPU supports. (For older GCC versions, you had to use IFUNC manually ... and in ancient days, ld.so would load libraries from a completely separate directory depending on things like cmov).
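A sketch of the target_clones form (the function body is just a placeholder loop; GCC compiles one clone per listed target and wires up an IFUNC resolver that picks among them at load time):

__attribute__((target_clones("avx512f", "avx2", "default")))
void scale(float* x, int n, float factor)
{
    // GCC auto-vectorizes each clone with whatever ISA that clone is allowed to use.
    for (int i = 0; i < n; ++i)
        x[i] *= factor;
}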
When Microsoft initially released Visual Studio 2012 in September 2012, they announced their plan for providing updates for Visual Studio on a more regular basis. Since then, they have released Visual Studio 2012 Update 1 (Visual Studio 2012.1) in November 2012 and Visual Studio 2012 Update 2 (Visual Studio 2012.2) in April 2013.
My question is: Did the updates introduce any changes to the C++ ABI (with regard to the initial VS2012 version)? Is it safe to link .libs of different VS2012 versions?
I have searched the internet for a while and could not find any definite statement from Microsoft. Some sources mention that some bugs in the C++ code generation have been fixed but I suppose that does not imply an ABI change?
Stephan T. Lavavej, a key author of Visual C++'s STL implementation, laid out the rules in this Reddit thread:
Here are the precise rules:
If you include any C++ Standard Library headers, you have to play by its rules, and we intentionally break binary compatibility between major versions (but preserve it between hotfixes and service packs). Any representation changes (including but not limited to adding/removing data members) break binary compatibility, which is why this always happens and why we jealously guard this right.
[snip]
So, if you're playing by the STL's rules, you need to ensure the following:
All object files and static libraries linked into a single binary (EXE/DLL) must be compiled with the same major version. We added linker checks, so that mismatching VS 2010+ major versions will trigger hard errors at link time, but if VS 2008 or earlier is involved we can't help you (no time machines). Because the ODR applies here, you really should be using the same toolset (i.e. same service pack level) for all of the object files and static libraries. For example, we fixed a std::string memory leak between VS 2010 RTM and SP1, but if you mix RTM and SP1, the resulting binary may or may not be affected by the leak. (Additionally, you need to be using the same _ITERATOR_DEBUG_LEVEL and release/debug settings; we have linker checks for these now.)
If you have multiple binaries loaded into the same process, and they pass C++ Standard Library objects to each other, those binaries must be built with the same major version and _ITERATOR_DEBUG_LEVEL settings (release/debug should match too, I forget if you can get away with mismatch here). Importantly, we cannot detect violations of this rule, so it's up to you to follow it.
Multiple binaries whose interfaces are purely C or COM (or now WinRT) may internally use different major versions, because those things guarantee binary compatibility. If your interfaces involve the C++ Core Language (e.g. stuff with virtuals) but are extremely careful to never mention any C++ Standard Library types, then you are probably okay - the compiler really tries to avoid breaking binary compatibility.
Note, however, that when multiple binaries loaded into a single process are compiled with different major versions, you'll almost certainly end up with multiple CRTs loaded into your process, which is undesirable.
Bottom line - if you compile everything 100% consistently, you just don't have to worry about any of this stuff. Don't play mixing games if you can possibly avoid it.
Finally, I found an answer to my question in Stephan T. Lavavej's blog post C++11/14 STL Features, Fixes, And Breaking Changes In VS 2013:
The VS Update mechanism is primarily for shipping high-priority bugfixes, not for shipping new features, especially massive rewrites with breaking changes (which are tied to equally massive compiler changes).
Major versions like Visual C++ 2013 give us the freedom to change and break lots of stuff. There's simply no way we can ship this stuff in an Update.
Q5: What about the bugfixes? Can we get those in an Update?
A5: This is an interesting question because the answer depends on my choices (whereas in the previous question, I wouldn't be allowed to ship such a rewrite in an Update even if I wanted to).
Each team gets to choose which bugfixes they take to "shiproom" for consideration to be included in an Update. There are things shiproom won't let us get away with (e.g. binary breaking changes are forbidden outside of major versions), but otherwise we're given latitude to decide things. I personally prioritize bandwidth over latency - that is, I prefer to ship a greater total number of bugfixes in every major version, instead of shipping a lesser total number of bugfixes (over the same period of time) more frequently in multiple Updates.
Some months ago I posted the following question
Problem with templates in VS 6.0
The ensuing discussion and your comments helped me to realize that getting my hands on a new compiler was mandatory - or basically they were the final spark which set me into motion. After one month of company-internal "lobbying" I am finally getting VS 2012 !! (thank you guys)
Several old tools which I have to use were developed with VS 6.0
My concerns are that some of these tools might not work with the new compiler. This is why I was wondering whether somebody here could point out the major differences between VS 6 and VS 2012 - or at least the ones between VS 6 and VS 2010 - the changes from 2010 to 2012 are well documented online.
Obviously the differences between VS 6.0 and VS 2012 must be enormous ... I am mostly concerned with basic things like casts etc. There is hardly any information about VS 6.0 on the web - and I am somewhat at a loss :(
I think I will have to create new projects with the same classes. In the second step I would overwrite the .h and .cpp files with the ones of the old tools. Thus at least I will be able to open the files via the new compiler. Still some casts or Class definitions might not be supported and I would like to have a general idea of what to look for while debugging :)
The language has evolved significantly since VS 6.0 came out.
VS6.0 is pre-C++98; VS 2012 is C++03, with a few features from
C++11.
Most of the newer language features are upwards compatible;
older code should still work. Still, VC 6.0 is pre-standard,
and the committee was less concerned about breaking existing
code when there was no previous standard (and implementations
did vary). There are several aspects of the language (at least)
which might cause problems.
The first is that VC 6.0 still used the old scoping for
variables defined in a for. Thus, in VC 6.0, things like the following
were legal:
int findIndex( int* array, int size, int target )
{
    for ( int i = 0; i < size && array[i] != target ; ++ i ) {
    }
    return i;
}
This will not compile in VC 2012 (unless there is also a global
variable i, in which case, it will return that, and not the
local one).
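Under the standard scoping rules you would hoist the declaration out of the for statement, for example:

int findIndex( int* array, int size, int target )
{
    int i;
    for ( i = 0; i < size && array[i] != target; ++i ) {
    }
    return i;    // i is still in scope here
}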
IIRC, too, VC 6.0 wasn't very strict in enforcing access
controls and const. This may not be a problem when migrating,
however, because VC 2012 still fails to conform to C++98 in some
of the more flagrant cases, at least with the default options.
(You can still bind a temporary to a non-const reference, for
example.)
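For example, something like this is rejected by conforming compilers but accepted by VC by default as a language extension (the function names are just placeholders):

#include <string>

void widen(std::string& s);     // takes a non-const reference

void caller()
{
    widen(std::string("tmp"));  // ill-formed in ISO C++ (temporary bound to a
                                // non-const lvalue reference), but VC accepts it
}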
Another major language change which isn't backwards compatible
is name lookup in templates. Here too, however, even in VC
2012, Microsoft has implemented pre-standard name lookup (and
I mean pre-C++98). This is a serious problem if you want to
port your code to other compilers, but it does make migrating
from VC 6.0 to VC 2012 a lot easier.
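A small illustration of the difference (not from the original answer; helper/call_helper are made-up names):

#include <iostream>

void helper(long) { std::cout << "long\n"; }

template <class T>
void call_helper(T value)
{
    // Conforming two-phase lookup: only the declarations visible here (helper(long))
    // plus ADL at instantiation are considered; for a fundamental T, ADL adds nothing.
    helper(value);
}

void helper(int) { std::cout << "int\n"; }

int main()
{
    call_helper(42);   // prints "long" with standard lookup; the old MSVC behaviour,
                       // which looks everything up at instantiation, picks helper(int)
}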
With regards to the library, I can't remember whether 6.0
supported the C++98 library, or whether it was still
pre-standard (or possibly it supported both). If your code has
things like #include <iostream.h> in it, be prepared for some
differences here: minor for straightforward use of << and
>>; major if you implement some complicated streambuf. And
of course, all of the library was moved from global namespace to
std::.
For the rest: your code obviously won't use any of the features
introduced after VC 6.0 appeared. This won't cause migration
problems (since the older features are still supported), but
you'll doubtlessly want to go back, and gradually upgrade the
code once you've migrated. (You mentioned casts. This is
a good example: C style casts are still legal, with the same
semantics they've always had, but in new code, you'll want to
avoid them, at least when pointers or references are involved.)
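For instance, the kind of gradual cleanup you would do:

void casts_example()
{
    double d = 3.14;
    int i = (int)d;                   // C-style cast: still legal, but hides intent
    int j = static_cast<int>(d);      // preferred in new code

    const char* s = "hello";
    char* p = (char*)s;               // C-style cast silently drops const
    char* q = const_cast<char*>(s);   // the explicit form makes that visible
    (void)i; (void)j; (void)p; (void)q;
}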
I am curious, do new compilers use some extra features built into new CPUs such as MMX, SSE, 3DNow! and so on?
I mean, the original 8086 didn't even have an FPU, so a compiler that old couldn't use one, but new compilers can, since the FPU is part of every new CPU. So, do new compilers use new CPU features?
Or, to ask it more precisely, do new C/C++ standard library functions use new features?
Thanks for the answer.
EDIT:
OK, so, if I understand you all correctly, even some standard operations, especially on floating-point numbers, can be done faster using SSE.
In order to use it, I must enable this feature in my compiler, if it supports it. If it does, I must be sure that the target platform supports those features.
In the case of some system libraries that require top performance, such as OpenGL, DirectX and so on, this support may be built into the system.
By default, for compatibility reasons, the compiler doesn't enable it, but you can add this support using special C functions delivered by, for example, Intel. This should be the best way, since you can directly control whether and when you use special features of the desired platform, in order to write applications that support multiple CPUs.
gcc will support newer instructions via command line arguments. See here for more info. To quote:
GCC can take advantage of the additional instructions in the MMX, SSE, SSE2, SSE3 and 3dnow extensions of recent Intel and AMD processors. The options -mmmx, -msse, -msse2, -msse3 and -m3dnow enable the use of these extra instructions, allowing multiple words of data to be processed in parallel. The resulting executables will only run on processors supporting the appropriate extensions--on other systems they will crash with an Illegal instruction error (or similar).
These instructions are not part of any ISO C/C++ standards. They are available through compiler intrinsics, depending on the compiler used.
For MSVC, see http://msdn.microsoft.com/en-us/library/26td21ds(VS.80).aspx
For GCC, you could look at http://developer.apple.com/hardwaredrivers/ve/sse.html
AFAIK, SSE intrinsics are the same between GCC and MSVC.
Compilers will aim for producing code for a minimal set of features in a processor. They also provide compilation switches that allow you to target specific processors. In this manner, they can sell more compilers (to those folks with old processors as well as the trendy folk with new ones).
You will need to study the documentation that came with your compiler.
Sometimes the runtime library will contain multiple implementations of a feature, and the library will dynamically choose between implementations when the program is run. The overhead might be the cost of a function pointer call instead of a direct function call, but the benefit could be much greater when using a CPU-specific optimised function.
JIT compilers (for VM languages such as Java and C#) take this one step further and compile the bytecode for the specific CPU that it's running on. This gives your own code the benefit of specific CPU optimisation. This is one reason why Java code can actually be faster than compiled C code, because the Java JIT compiler can delay its optimisation decisions until the program is run on the actual target machine. A C compiler must make those decisions without always knowing what the target CPU is. Furthermore, JIT compilers evolve and can make your program faster over time without you having to do anything.
If you use the Intel C compiler, and set sufficiently high optimisation options, you will find that some of your loops get 'vectorised', which means the compiler has rewritten them to use SSE-style instructions.
If you want to use SSE operations directly, you use the intrinsics defined in the 'xmmintrin.h' header file; say
#include <xmmintrin.h>

int main(void)
{
    __m128 U, V, W;
    float ww[4];
    V = _mm_set1_ps(1.5f);        /* V = {1.5, 1.5, 1.5, 1.5} */
    U = _mm_set_ps(0, 1, 2, 3);   /* memory order {3, 2, 1, 0} - note the reversed argument order */
    W = _mm_add_ps(U, V);         /* element-wise addition */
    _mm_storeu_ps(ww, W);         /* store the four sums to ww */
    return 0;
}
Varying compilers will use varying new features. Visual Studio will use SSE/SSE2, and I believe the Intel compiler will support the very latest in CPU features. You should, of course, be wary about the market penetration of your favourite feature.
As for what your favourite standard library uses, that depends on what it was compiled with. However, the C++ standard library is typically compiled on-site, since it's very heavily templated, so if you enable SSE2, the C++ std libs should use it. As for the CRT, it depends on what it was compiled with.
There are generally two ways a compiler can generate code that uses special features like these:
When the compiler itself is compiled, you configure it to generate code for a particular architecture, and it can take advantage of any features it knows that architecture will have. For example, if gcc is configured for an Intel processor new enough (or is that "not old enough"?) to contain an integrated FPU, it will generate floating-point instructions.
When the compiler is invoked, flags or parameters can specify the type of features available to the processor that will run the program, and then the compiler will know it is safe to use these features. If the flags aren't present, it will generate equivalent code without using the special instructions provided by those features.
If you're talking about code written in C/C++, the new features are exploited if you tell your compiler to do so. By default, your compiler probably targets "plain x86" (naturally with FPU :) ), usually optimized for the most widespread processor generation at the moment, but still able to run on older processors.
If you want the compiler to generate code also considering the new instruction sets, you should tell it to do so with the appropriate command line switch/project setting, for example for Visual C++ the option to enable SSE/SSE2 instructions generation is /arch.
Notice that many features of new instruction sets cannot be exploited directly in "normal" code, so you are usually provided with compiler intrinsics to operate on the particular datatypes native of the new instruction sets.
Intel provides updated CPUID example code every time they release a new CPU so that you can check for the new features, and has done so for as long as I can remember. At least this is what I found the first time I thought about this same question myself.
Using CPUID to Detect the presence of SSE 4.1 and SSE 4.2 Instruction Sets
As new compilers are released, they add support for the new features directly, like VS2010 for example.
Visual C++ Code Generation in Visual Studio 2010