Detect the availability of SSE/SSE2 instruction set in Visual Studio - c++

How can I check in code whether SSE/SSE2 is enabled or not by the Visual Studio compiler?
I have tried #ifdef __SSE__ but it didn't work.

Some additional information on _M_IX86_FP.
_M_IX86_FP is only defined for 32-bit code. 64-bit x86 code has at least SSE2. You can use _M_AMD64 or _M_X64 to determine if the code is 64-bit.
#ifdef __AVX2__
//AVX2
#elif defined ( __AVX__ )
//AVX
#elif (defined(_M_AMD64) || defined(_M_X64))
//SSE2 x64
#elif _M_IX86_FP == 2
//SSE2 x32
#elif _M_IX86_FP == 1
//SSE x32
#else
//nothing
#endif

From the documentation:
_M_IX86_FP
Expands to a value indicating which /arch compiler option was used:
0 if /arch:IA32 was used.
1 if /arch:SSE was used.
2 if /arch:SSE2 was used. This value is the default if /arch was not specified.
I don't see any mention of __SSE__.

The relevant preprocessor macros have two underscores at each end:
#ifdef __SSE__
#ifdef __SSE2__
#ifdef __SSE3__
#ifdef __SSE4_1__
#ifdef __AVX__
...etc...
UPDATE: apparently the above macros are not automatically pre-defined for you when using Visual Studio (even though they are in every other x86 compiler that I have ever used), so you may need to define them yourself if you want portability with gcc, clang, ICC, et al...
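One way to get those names on MSVC is to derive them yourself from the macros it does define; a minimal sketch (the exact mapping, and keeping it in a private header, are choices made for this example, not something MSVC provides):
#if defined(_MSC_VER) && !defined(__clang__)
// MSVC does not predefine __SSE__/__SSE2__/..., so derive them from /arch info.
# if defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  ifndef __SSE2__
#   define __SSE2__ 1
#  endif
# endif
# if defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  ifndef __SSE__
#   define __SSE__ 1
#  endif
# endif
# if defined(__AVX__)
// /arch:AVX (and above) implies the SSE3/SSSE3/SSE4.1 levels as well.
#  ifndef __SSE3__
#   define __SSE3__ 1
#  endif
#  ifndef __SSSE3__
#   define __SSSE3__ 1
#  endif
#  ifndef __SSE4_1__
#   define __SSE4_1__ 1
#  endif
# endif
#endif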

This is a late answer, but on MSDN you can find an article about __cpuid and __cpuidex. I reworked the example class into a single function that checks support for MMX, SSE, SSE2, SSE3, SSSE3, and SSE4.1.
https://learn.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=vs-2019
#include <array>
#include <bitset>
#include <vector>
#include <intrin.h> // __cpuid

[[nodiscard]] bool CheckSimdSupport() noexcept
{
    std::array<int, 4> cpui;
    int nIds_{};
    std::bitset<32> f_1_ECX_{};
    std::bitset<32> f_1_EDX_{};
    std::vector<std::array<int, 4>> data_;

    // Leaf 0: EAX reports the highest valid function ID.
    __cpuid(cpui.data(), 0);
    nIds_ = cpui[0];

    // Cache the results of leaves 0 and 1.
    for (int i = 0; i <= 1; ++i)
    {
        __cpuid(cpui.data(), i);
        data_.push_back(cpui);
    }

    // Load the feature flags from leaf 1 (ECX and EDX).
    if (nIds_ >= 1)
    {
        f_1_ECX_ = data_[1][2];
        f_1_EDX_ = data_[1][3];
    }

    // f_1_ECX_[0]  - SSE3
    // f_1_ECX_[9]  - SSSE3
    // f_1_ECX_[19] - SSE4.1
    // f_1_EDX_[23] - MMX
    // f_1_EDX_[25] - SSE
    // f_1_EDX_[26] - SSE2
    return f_1_ECX_[0] && f_1_ECX_[9] && f_1_ECX_[19]
        && f_1_EDX_[23] && f_1_EDX_[25] && f_1_EDX_[26];
}
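A minimal usage sketch (just printing the result; what you do with it is up to you):
#include <iostream>

int main()
{
    std::cout << (CheckSimdSupport()
                      ? "MMX/SSE/SSE2/SSE3/SSSE3/SSE4.1 are all supported\n"
                      : "at least one of the checked extensions is missing\n");
}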

Related

Is there a way to check if virtualisation is enabled on the BIOS using C++?

I'm trying to check if virtualisation (AMD-V or Intel VT) is enabled programmatically. I know the bash commands that gives you this information but I'm trying to achieve this in C++.
On that note, I'm trying to avoid using std::system to execute shell commands because of how hacky and suboptimal that solution is. Thanks in advance!
To check if VMX or SVM (Intel and AMD virtualization technologies) are enabled you need to use the cpuid instruction.
This instruction comes up so often in similar tests that the mainstream compilers all have an intrinsic for it, so you don't need inline assembly.
For Intel's CPUs you have to check CPUID.1.ecx[5], while for AMD's you have to check CPUID.1.ecx[2].
Here's an example. I only have gcc on an Intel CPU, so you will need to test and fix this code yourself for other compilers and AMD CPUs.
The principles I followed are:
Unsupported compilers and non-x86 architectures should fail to compile the code.
If run on an x86-compatible CPU that is neither Intel nor AMD, the function returns false.
The code assumes cpuid exists; for Intel's CPUs this is true since the 486. The code is designed so that cpuid_ex can return false to denote that cpuid is not present, but I didn't implement a test for it because that would require inline assembly. GCC and clang have a built-in for this but MSVC doesn't.
Note that if you only build 64-bit binaries you always have cpuid, you can remove the 32-bit architecture check so that if someday somebody tries to use a 32-bit build, the code fails to compile.
#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__clang__) || defined(__GNUC__)
#include <cpuid.h>
#endif
#include <string>
#include <iostream>
#include <cstdint>
//Execute cpuid with leaf "leaf" and given the temporary array x holding the
//values of eax, ebx, ecx and edx (after cpuid), copy x[start:end) into regs
//converting each item in a uint32_t value (so this interface is the same even
//for esoteric ILP64 compilers).
//If CPUID is not supported, returns false (NOTE: this check is not done, CPUID
//exists since the 486, it's up to you to implement a test. GCC & CLANG have a
//nice macro for this. MSVC doesn't. ICC I don't know.)
bool cpuid_ex(unsigned int leaf, uint32_t* regs, size_t start = 0, size_t end = 4)
{
#if ( ! defined(__x86_64__) && ! defined(__i386__) && ! defined(_M_IX86) && ! defined(_M_X64))
//Not an x86
return false;
#elif defined(_MSC_VER) || defined(__INTEL_COMPILER)
//MS & Intel
int x[4];
__cpuid((int*)x, leaf);
#elif defined(__clang__) || defined(__GNUC__)
//GCC & clang
unsigned int x[4];
__cpuid(leaf, x[0], x[1], x[2], x[3]);
#else
//Unknown compiler
static_assert(false, "cpuid_ex: compiler is not supported");
#endif
//Conversion from x[i] to uint32_t is safe since GP registers are 32-bit at least
//if we are using cpuid
for (; start < end; start++)
*regs++ = static_cast<uint32_t>(x[start]);
return true;
}
//Check for Intel and AMD virtualization.
bool support_virtualization()
{
//Get the signature
uint32_t signature_reg[3] = {0};
if ( ! cpuid_ex(0, signature_reg, 1))
//cpuid is not supported, this returns false but you may want to throw
return false;
uint32_t features;
//Get the features
cpuid_ex(1, &features, 2, 3);
//Is intel? Check bit5
if (signature_reg[0] == 0x756e6547 && signature_reg[1] == 0x6c65746e && signature_reg[2] == 0x49656e69)
return features & 0x20;
//Is AMD? check bit2
if (signature_reg[0] == 0x68747541 && signature_reg[1] == 0x69746e65 && signature_reg[2] == 0x444d4163)
return features & 0x04;
//Not intel or AMD, this returns false but you may want to throw
return false;
}
int main()
{
std::cout << "Virtualization is " << (support_virtualization() ? "": "NOT ") << "supported\n";
return 0;
}
No, there is no way to detect virtualization support using only standard C++ facilities. The C++ standard library does not have anything related to hardware virtualization. (Its exposure of low level details like that is extremely limited.)
You would need to use OS-specific facilities to detect this.
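For illustration, one such OS-specific facility on Windows is IsProcessorFeaturePresent with PF_VIRT_FIRMWARE_ENABLED; a minimal sketch (whether this matches your exact notion of "enabled in the BIOS" is something to verify against the Windows documentation):
#include <windows.h>
#include <iostream>

int main()
{
    // Asks Windows whether virtualization support is enabled in the firmware.
    const BOOL enabled = IsProcessorFeaturePresent(PF_VIRT_FIRMWARE_ENABLED);
    std::cout << "Firmware virtualization: "
              << (enabled ? "enabled" : "disabled or not reported") << '\n';
}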

How to define endian enum class waiting for C++20? [duplicate]

This question already has answers here:
Macro definition to determine big endian or little endian machine?
Is there a safe, portable way to determine (during compile time) the endianness of the platform that my program is being compiled on? I'm writing in C.
[EDIT]
Thanks for the answers, I decided to stick with the runtime solution!
To answer the original question of a compile-time check, there's no standardized way to do it that will work across all existing and all future compilers, because none of the existing C, C++, and POSIX standards define macros for detecting endianness.
But, if you're willing to limit yourself to some known set of compilers, you can look up each of those compilers' documentations to find out which predefined macros (if any) they use to define endianness. This page lists several macros you can look for, so here's some code which would work for those:
#if defined(__BYTE_ORDER) && __BYTE_ORDER == __BIG_ENDIAN || \
defined(__BIG_ENDIAN__) || \
defined(__ARMEB__) || \
defined(__THUMBEB__) || \
defined(__AARCH64EB__) || \
defined(_MIPSEB) || defined(__MIPSEB) || defined(__MIPSEB__)
// It's a big-endian target architecture
#elif defined(__BYTE_ORDER) && __BYTE_ORDER == __LITTLE_ENDIAN || \
defined(__LITTLE_ENDIAN__) || \
defined(__ARMEL__) || \
defined(__THUMBEL__) || \
defined(__AARCH64EL__) || \
defined(_MIPSEL) || defined(__MIPSEL) || defined(__MIPSEL__)
// It's a little-endian target architecture
#else
#error "I don't know what architecture this is!"
#endif
If you can't find what predefined macros your compiler uses from its documentation, you can also try coercing it to spit out its full list of predefined macros and guess from there what will work (look for anything with ENDIAN, ORDER, or the processor architecture name in it). This page lists a number of methods for doing that in different compilers:
Compiler | C macros | C++ macros
Clang/LLVM | clang -dM -E -x c /dev/null | clang++ -dM -E -x c++ /dev/null
GNU GCC/G++ | gcc -dM -E -x c /dev/null | g++ -dM -E -x c++ /dev/null
Hewlett-Packard C/aC++ | cc -dM -E -x c /dev/null | aCC -dM -E -x c++ /dev/null
IBM XL C/C++ | xlc -qshowmacros -E /dev/null | xlc++ -qshowmacros -E /dev/null
Intel ICC/ICPC | icc -dM -E -x c /dev/null | icpc -dM -E -x c++ /dev/null
Microsoft Visual Studio | (none) | (none)
Oracle Solaris Studio | cc -xdumpmacros -E /dev/null | CC -xdumpmacros -E /dev/null
Portland Group PGCC/PGCPP | pgcc -dM -E | (none)
Finally, to round it out, the Microsoft Visual C/C++ compilers are the odd ones out and don't have any of the above. Fortunately, they have documented their predefined macros here, and you can use the target processor architecture to infer the endianness. While all of the currently supported processors in Windows are little-endian (_M_IX86, _M_X64, _M_IA64, and _M_ARM are little-endian), some historically supported processors like the PowerPC (_M_PPC) were big-endian. But more relevantly, the Xbox 360 is a big-endian PowerPC machine, so if you're writing a cross-platform library header, it can't hurt to check for _M_PPC.
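A hedged sketch of that inference, using only the documented MSVC architecture macros (the MY_TARGET_* names are made up for this example):
#if defined(_MSC_VER)
# if defined(_M_PPC)
// Historically big-endian targets (e.g. the Xbox 360 PowerPC)
#  define MY_TARGET_BIG_ENDIAN 1
# elif defined(_M_IX86) || defined(_M_X64) || defined(_M_IA64) || defined(_M_ARM) || defined(_M_ARM64)
// All currently supported Windows targets are little-endian
#  define MY_TARGET_LITTLE_ENDIAN 1
# else
#  error "Unknown MSVC target architecture, cannot infer endianness"
# endif
#endif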
For compile-time checking:
You could use information from the Boost header endian.hpp, which covers many platforms.
Edit: for runtime checking,
bool isLittleEndian()
{
short int number = 0x1;
char *numPtr = (char*)&number;
return (numPtr[0] == 1);
}
Create an integer, and read its first byte (least significant byte). If that byte is 1, then the system is little endian, otherwise it's big endian.
Edit: thinking about it,
you could run into a potential issue on some platforms (I can't think of any) where sizeof(char) == sizeof(short int). You could use the fixed-width multi-byte integer types available in <stdint.h>, or if your platform doesn't have it, again you could adapt a Boost header for your use: stdint.hpp
With C99, you can perform the check as:
#define I_AM_LITTLE (((union { unsigned x; unsigned char c; }){1}).c)
Conditionals like if (I_AM_LITTLE) will be evaluated at compile-time and allow the compiler to optimize out whole blocks.
I don't have the reference right off for whether this is strictly speaking a constant expression in C99 (which would allow it to be used in initializers for static-storage-duration data), but if not, it's the next best thing.
Interesting read from the C FAQ:
You probably can't. The usual techniques for detecting endianness
involve pointers or arrays of char, or maybe unions, but preprocessor
arithmetic uses only long integers, and there is no concept of
addressing. Another tempting possibility is something like
#if 'ABCD' == 0x41424344
but this isn't reliable, either.
I would like to extend the answers by providing a constexpr function for C++:
union Mix {
int sdat;
char cdat[4];
};
static constexpr Mix mix { 0x1 };
constexpr bool isLittleEndian() {
return mix.cdat[0] == 1;
}
Since mix is constexpr too, it is evaluated at compile time and can be used in constexpr bool isLittleEndian(). It should be safe to use.
Update
As @Cheersandhth pointed out below, this seems to be problematic.
The reason is that it is not conforming C++11, where this kind of type punning is forbidden: only one union member can be active at a time. With a strictly conforming compiler you will get an error.
So, don't use it in C++. It seems you can do it in C, though. I leave my answer in for educational purposes :-) and because the question is about C...
Update 2
This assumes that int is 4 chars wide, which is not always the case, as @PetrVepřek correctly pointed out below. To make your code truly portable you have to be cleverer here, but this should suffice for many cases. Note that sizeof(char) is always 1 by definition; the code above assumes sizeof(int) == 4.
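For reference, now that C++20 has shipped, <bit> provides std::endian, which makes this a true compile-time one-liner (a minimal sketch; it needs a C++20 compiler):
#include <bit>

constexpr bool isLittleEndian()
{
    return std::endian::native == std::endian::little;
}

static_assert(isLittleEndian() || std::endian::native == std::endian::big,
              "mixed-endian platforms need separate handling");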
Use CMake TestBigEndian as
INCLUDE(TestBigEndian)
TEST_BIG_ENDIAN(ENDIAN)
IF (ENDIAN)
# big endian
ELSE (ENDIAN)
# little endian
ENDIF (ENDIAN)
Not during compile time, but perhaps during runtime. Here's a C function I wrote to determine endianness:
/* Returns 1 if LITTLE-ENDIAN or 0 if BIG-ENDIAN */
#include <inttypes.h>
int endianness()
{
union { uint8_t c[4]; uint32_t i; } data;
data.i = 0x12345678;
return (data.c[0] == 0x78);
}
From Finally, one-line endianness detection in the C preprocessor:
#include <stdint.h>
#define IS_BIG_ENDIAN (*(uint16_t *)"\0\xff" < 0x100)
Any decent optimizer will resolve this at compile-time. gcc does at -O1.
Of course stdint.h is C99. For ANSI/C89 portability see Doug Gwyn's Instant C9x library.
I took this from the rapidjson library:
#define BYTEORDER_LITTLE_ENDIAN 0 // Little endian machine.
#define BYTEORDER_BIG_ENDIAN 1 // Big endian machine.
//#define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
#ifndef BYTEORDER_ENDIAN
// Detect with GCC 4.6's macro.
# if defined(__BYTE_ORDER__)
# if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
# else
# error "Unknown machine byteorder endianness detected. User needs to define BYTEORDER_ENDIAN."
# endif
// Detect with GLIBC's endian.h.
# elif defined(__GLIBC__)
# include <endian.h>
# if (__BYTE_ORDER == __LITTLE_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif (__BYTE_ORDER == __BIG_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
# else
# error "Unknown machine byteorder endianness detected. User needs to define BYTEORDER_ENDIAN."
# endif
// Detect with _LITTLE_ENDIAN and _BIG_ENDIAN macro.
# elif defined(_LITTLE_ENDIAN) && !defined(_BIG_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
// Detect with architecture macros.
# elif defined(__sparc) || defined(__sparc__) || defined(_POWER) || defined(__powerpc__) || defined(__ppc__) || defined(__hpux) || defined(__hppa) || defined(_MIPSEB) || defined(_POWER) || defined(__s390__)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
# elif defined(__i386__) || defined(__alpha__) || defined(__ia64) || defined(__ia64__) || defined(_M_IX86) || defined(_M_IA64) || defined(_M_ALPHA) || defined(__amd64) || defined(__amd64__) || defined(_M_AMD64) || defined(__x86_64) || defined(__x86_64__) || defined(_M_X64) || defined(__bfin__)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif defined(_MSC_VER) && (defined(_M_ARM) || defined(_M_ARM64))
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# else
# error "Unknown machine byteorder endianness detected. User needs to define BYTEORDER_ENDIAN."
# endif
#endif
I once used a construct like this one:
uint16_t HI_BYTE = 0,
LO_BYTE = 1;
uint16_t s = 1;
if(*(uint8_t *) &s == 1) {
HI_BYTE = 1;
LO_BYTE = 0;
}
pByte[HI_BYTE] = 0x10;
pByte[LO_BYTE] = 0x20;
gcc with -O2 was able to resolve this completely at compile time. That means the HI_BYTE and LO_BYTE variables were replaced entirely, and even the pByte access was replaced in the assembly by the equivalent of *(uint16_t *)pByte = 0x1020;.
It's as compile time as it gets.
To my knowledge no, not during compile time.
At run-time, you can do trivial checks such as setting a multi-byte value to a known bit string and inspect what bytes that results in. For instance using a union,
typedef union {
uint32_t word;
uint8_t bytes[4];
} byte_check;
or casting,
uint32_t word;
uint8_t * bytes = (uint8_t *) &word;
Please note that for completely portable endianness checks, you need to take into account big-endian, little-endian, and mixed-endian systems.
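A hedged sketch of such a three-way runtime classification (the enum and function names are made up here; std::memcpy is used to sidestep the aliasing concerns discussed above):
#include <cstdint>
#include <cstring>

enum class ByteOrder { Little, Big, Mixed };

inline ByteOrder detectByteOrder()
{
    const uint32_t word = 0x01020304u;
    uint8_t bytes[4];
    std::memcpy(bytes, &word, sizeof bytes);
    if (bytes[0] == 0x04) return ByteOrder::Little; // stored as 04 03 02 01
    if (bytes[0] == 0x01) return ByteOrder::Big;    // stored as 01 02 03 04
    return ByteOrder::Mixed;                        // e.g. PDP-style 02 01 04 03
}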
For my part, I decided to use an intermediate approach: try the macros, and if they don't exist, or if we can't find them, then do it at runtime. Here is one that works with the GNU compiler:
#define II 0x4949 // arbitrary values != 1; examples are
#define MM 0x4D4D // taken from the TIFF standard
int
#if defined __BYTE_ORDER__ && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
const host_endian = II;
# elif defined __BYTE_ORDER__ && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
const host_endian = MM;
#else
#define _no_BYTE_ORDER
host_endian = 1; // plain "int", not "int const" !
#endif
and then, in the actual code:
int main(int argc, char **argv) {
#ifdef _no_BYTE_ORDER
host_endian = * (char *) &host_endian ? II : MM;
#undef _no_BYTE_ORDER
#endif
// .... your code here, for instance:
printf("Endedness: %s\n", host_endian == II ? "little-endian"
: "big-endian");
return 0;
}
On the other hand, as the original poster noted, the overhead of a runtime check is so little (two lines of code, and micro-seconds of time) that it's hardly worth the bother to try and do it in the preprocessor.

C++14 : 2 random generators - one works, the other doesn't

I'm learning C++, and in my random number generation code I'm always getting the same number:
random_device rd;
mt19937 x{rd()};
uniform_int_distribution<int> ran{1, 100};
cout << ran(x);
but srand/rand() works.
srand (time(0));
cout << rand()%100;
I think that it has to do with time(). But how do I get the first code to work?
Assuming your problem is with the MinGW g++ compiler, you can define a header that wraps <random>, like this:
#pragma once
#ifndef MY_NO_FIX_OF_RANDOM_DEVICE
# ifdef __GNUC__
# undef _GLIBCXX_USE_RANDOM_TR1
# define _GLIBCXX_USE_RANDOM_TR1
# endif
#endif
#include <random>
That's just a modified-for-SO version of a header in the Wrapped stdlib library.
I would recommend using a forced include (a command-line option) of this fix, or just defining _GLIBCXX_USE_RANDOM_TR1 on the command line.
By inspecting the source code of my MinGW g++ 7.3.0, files <random.h> and random.cc, it appears that the approach works on most PCs because (with that compiler) _GLIBCXX_USE_RANDOM_TR1 selects number generation via the rdrand instruction, if available, and else via the "/dev/urandom" *nix world device, if available.
So, criteria for “works”:
the processor supports the rdrand instruction, or
fopen succeeds in opening "/dev/urandom".
According to the Wikipedia article about rdrand
” AMD added support for the instruction in June 2015.
... so this approach may fail on a Windows PC (no "/dev/urandom") with an AMD processor produced before that time (no rdrand instruction).
Technical details:
With _GLIBCXX_USE_RANDOM_TR1 defined the random_device default constructor calls the following function with argument "default":
void
random_device::_M_init(const std::string& token)
{
const char *fname = token.c_str();
if (token == "default")
{
#if (defined __i386__ || defined __x86_64__) && defined _GLIBCXX_X86_RDRAND
unsigned int eax, ebx, ecx, edx;
// Check availability of cpuid and, for now at least, also the
// CPU signature for Intel's
if (__get_cpuid_max(0, &ebx) > 0 && ebx == signature_INTEL_ebx)
{
__cpuid(1, eax, ebx, ecx, edx);
if (ecx & bit_RDRND)
{
_M_file = nullptr;
return;
}
}
#endif
fname = "/dev/urandom";
}
else if (token != "/dev/urandom" && token != "/dev/random")
fail:
std::__throw_runtime_error(__N("random_device::"
"random_device(const std::string&)"));
_M_file = static_cast<void*>(std::fopen(fname, "rb"));
if (!_M_file)
goto fail;
}
If __cpuid reports that the processor supports the rdrand instruction then this causes the _M_file member to be zeroed, which in turn causes the number generation code to use the rdrand instruction.
And otherwise, this code attempts to open the *nix random device, and if that fails then it and hence the random_device construction fails with an exception.

Flushing denormalised numbers to zero

I've scoured the web to no avail.
Is there a way for Xcode and Visual C++ to treat denormalised numbers as 0? I would have thought there's an option in the IDE preferences to turn on this option but can't seem to find it.
I'm doing some cross-platform audio stuff and need to stop certain processors hogging resources.
Cheers
You're looking for a platform-defined way to set FTZ and/or DAZ in the MXCSR register (on x86 with SSE or x86-64); see https://stackoverflow.com/a/2487733/567292
Usually this is called something like _controlfp; Microsoft documentation is at http://msdn.microsoft.com/en-us/library/e9b52ceh.aspx
You can also use the _MM_SET_FLUSH_ZERO_MODE macro: http://msdn.microsoft.com/en-us/library/a8b5ts9s(v=vs.71).aspx - this is probably the most cross-platform portable method.
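For the _controlfp route on MSVC specifically, a minimal sketch could look like this (it assumes the _controlfp_s / _DN_FLUSH / _MCW_DN names from <float.h>; check the linked documentation for the exact semantics on your toolset):
#include <float.h>

void enableFlushDenormalsToZero()
{
    unsigned int current = 0;
    // Ask the CRT to flush denormal results to zero for this thread.
    _controlfp_s(&current, _DN_FLUSH, _MCW_DN);
}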
For disabling denormals globally I use these 2 macros:
//warning: these macros have to be used in the same scope
#define MXCSR_SET_DAZ_AND_FTZ \
int oldMXCSR__ = _mm_getcsr(); /*read the old MXCSR setting */ \
int newMXCSR__ = oldMXCSR__ | 0x8040; /* set DAZ and FZ bits */ \
_mm_setcsr( newMXCSR__ ); /*write the new MXCSR setting to the MXCSR */
#define MXCSR_RESET_DAZ_AND_FTZ \
/*restore old MXCSR settings to turn denormals back on if they were on*/ \
_mm_setcsr( oldMXCSR__ );
I call the first one at the beginning of the process and the second at the end.
Unfortunately this seems not to work well on Windows.
To flush denormals locally I use this
const Float32 k_DENORMAL_DC = 1e-25f;
inline void FlushDenormalToZero(Float32& ioFloat)
{
ioFloat += k_DENORMAL_DC;
ioFloat -= k_DENORMAL_DC;
}
See the update (4 Aug 2022) at the end of this entry.
To do this, use the Intel Intrinsics macros during program startup. For example:
#include <immintrin.h>
int main() {
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
}
In my version of MSVC, this emitted the following assembly code:
stmxcsr DWORD PTR tv805[rsp]
mov eax, DWORD PTR tv805[rsp]
bts eax, 15
mov DWORD PTR tv807[rsp], eax
ldmxcsr DWORD PTR tv807[rsp]
MXCSR is the control and status register, and this code is setting bit 15, which turns flush zero mode on.
One thing to note: this only affects denormals resulting from a computation. If you want to also set denormals to zero if they're used as input, you also need to set the DAZ flag (denormals are zero), using the following command:
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
See https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-setting-the-ftz-and-daz-flags for more information.
Also note that you need to set MXCSR for each thread, as the values contained are local to each thread.
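As a sketch of that per-thread setup (the helper name setDenormalsOff is made up; every thread that does floating-point work calls it once before processing):
#include <immintrin.h>
#include <thread>

static void setDenormalsOff()
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);         // FTZ: flush denormal results
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON); // DAZ: treat denormal inputs as zero
}

int main()
{
    setDenormalsOff();              // main thread
    std::thread worker([] {
        setDenormalsOff();          // each worker thread needs its own call
        // ... audio processing ...
    });
    worker.join();
}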
Update 4 Aug 2022
I've now had to deal with ARM processors as well. The following is a cross-platform macro that works on ARM and Intel:
#ifndef __ARM_ARCH
extern "C" {
extern unsigned int _mm_getcsr();
extern void _mm_setcsr(unsigned int);
}
#define MY_FAST_FLOATS _mm_setcsr(_mm_getcsr() | 0x8040U)
#else
#define MY_FPU_GETCW(fpcr) __asm__ __volatile__("mrs %0, fpcr" : "=r"(fpcr))
#define MY_FPU_SETCW(fpcr) __asm__ __volatile__("msr fpcr, %0" : : "r"(fpcr))
#define MY_FAST_FLOATS \
{ \
uint64_t eE2Hsb4v {}; /* random name to avoid shadowing warnings */ \
MY_FPU_GETCW(eE2Hsb4v); \
eE2Hsb4v |= (1 << 24) | (1 << 19); /* FZ flag, FZ16 flag; flush denormals to zero */ \
MY_FPU_SETCW(eE2Hsb4v); \
} \
static_assert(true, "require semi-colon after macro with this assert")
#endif

Detecting CPU architecture compile-time

What is the most reliable way to find out CPU architecture when compiling C or C++ code? As far as I can tell, different compilers have their own sets of non-standard preprocessor definitions (_M_IX86 in MSVC, __i386__ and __arm__ in GCC, etc.).
Is there a standard way to detect the architecture I'm building for? If not, is there a source for a comprehensive list of such definitions for various compilers, such as a header with all the boilerplate #ifdefs?
Enjoy, I was the original author of this.
extern "C" {
const char *getBuild() { //Get current architecture, detects nearly every architecture. Coded by Freak
#if defined(__x86_64__) || defined(_M_X64)
return "x86_64";
#elif defined(i386) || defined(__i386__) || defined(__i386) || defined(_M_IX86)
return "x86_32";
#elif defined(__ARM_ARCH_2__)
return "ARM2";
#elif defined(__ARM_ARCH_3__) || defined(__ARM_ARCH_3M__)
return "ARM3";
#elif defined(__ARM_ARCH_4T__) || defined(__TARGET_ARM_4T)
return "ARM4T";
#elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__)
return "ARM5";
#elif defined(__ARM_ARCH_6T2__)
return "ARM6T2";
#elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__)
return "ARM6";
#elif defined(__ARM_ARCH_7__)
return "ARM7";
#elif defined(__ARM_ARCH_7A__)
return "ARM7A";
#elif defined(__ARM_ARCH_7R__)
return "ARM7R";
#elif defined(__ARM_ARCH_7M__)
return "ARM7M";
#elif defined(__ARM_ARCH_7S__)
return "ARM7S";
#elif defined(__aarch64__) || defined(_M_ARM64)
return "ARM64";
#elif defined(mips) || defined(__mips__) || defined(__mips)
return "MIPS";
#elif defined(__sh__)
return "SUPERH";
#elif defined(__powerpc) || defined(__powerpc__) || defined(__powerpc64__) || defined(__POWERPC__) || defined(__ppc__) || defined(__PPC__) || defined(_ARCH_PPC)
return "POWERPC";
#elif defined(__PPC64__) || defined(__ppc64__) || defined(_ARCH_PPC64)
return "POWERPC64";
#elif defined(__sparc__) || defined(__sparc)
return "SPARC";
#elif defined(__m68k__)
return "M68K";
#else
return "UNKNOWN";
#endif
}
}
There's no inter-compiler standard, but each compiler tends to be quite consistent. You can build a header for yourself that's something like this:
#if defined(_MSC_VER)
#ifdef _M_IX86
#define ARCH_X86
#endif
#endif
#if defined(__GNUC__)
#ifdef __i386__
#define ARCH_X86
#endif
#endif
There's not much point to a comprehensive list, because there are thousands of compilers but only 3-4 in widespread use (Microsoft C++, GCC, Intel CC, maybe TenDRA?). Just decide which compilers your application will support, list their #defines, and update your header as needed.
If you would like to dump all available features on a particular platform, you could run GCC like:
gcc -march=native -dM -E - </dev/null
It would dump macros like #define __SSE3__ 1, #define __AES__ 1, etc.
If you want a cross-compiler solution then just use Boost.Predef which contains
BOOST_ARCH_ for system/CPU architecture one is compiling for.
BOOST_COMP_ for the compiler one is using.
BOOST_LANG_ for language standards one is compiling against.
BOOST_LIB_C_ and BOOST_LIB_STD_ for the C and C++ standard library in use.
BOOST_OS_ for the operating system we are compiling to.
BOOST_PLAT_ for platforms on top of operating system or compilers.
BOOST_ENDIAN_ for endianness of the os and architecture combination.
BOOST_HW_ for hardware specific features.
BOOST_HW_SIMD for SIMD (Single Instruction Multiple Data) detection.
Note that although Boost is usually thought of as a C++ library, Boost.Predef is purely header-only and works for C as well.
For example
#include <boost/predef.h>
// or just include the necessary headers
// #include <boost/predef/architecture.h>
// #include <boost/predef/other.h>
#if BOOST_ARCH_X86
#if BOOST_ARCH_X86_64
std::cout << "x86-64\n";
#elif BOOST_ARCH_X86_32
std::cout << "x86-32\n";
#else
std::cout << "x86-" << BOOST_ARCH_WORD_BITS << '\n'; // Probably x86-16
#endif
#elif BOOST_ARCH_ARM
#if BOOST_ARCH_ARM >= BOOST_VERSION_NUMBER(8, 0, 0)
#if BOOST_ARCH_WORD_BITS == 64
std::cout << "ARMv8+ Aarch64\n";
#elif BOOST_ARCH_WORD_BITS == 32
std::cout << "ARMv8+ Aarch32\n";
#else
std::cout << "Unexpected ARMv8+ " << BOOST_ARCH_WORD_BITS << "bit\n";
#endif
#elif BOOST_ARCH_ARM >= BOOST_VERSION_NUMBER(7, 0, 0)
std::cout << "ARMv7 (ARM32)\n";
#elif BOOST_ARCH_ARM >= BOOST_VERSION_NUMBER(6, 0, 0)
std::cout << "ARMv6 (ARM32)\n";
#else
std::cout << "ARMv5 or older\n";
#endif
#elif BOOST_ARCH_MIPS
#if BOOST_ARCH_WORD_BITS == 64
std::cout << "MIPS64\n";
#else
std::cout << "MIPS32\n";
#endif
#elif BOOST_ARCH_PPC_64
std::cout << "PPC64\n";
#elif BOOST_ARCH_PPC
std::cout << "PPC32\n";
#else
std::cout << "Unknown " << BOOST_ARCH_WORD_BITS << "-bit arch\n";
#endif
You can find out more on how to use it here
Demo on Godbolt
There's nothing standard. Brian Hook documented a bunch of these in his "Portable Open Source Harness", and even tries to make them into something coherent and usable (ymmv regarding that). See the posh.h header on this site:
http://hookatooka.com/poshlib/
Note, the link above may require you to enter some bogus userid/password due to a DOS attack some time ago.
There's a list of the #defines here. There was a previous highly voted answer that included this link but it was deleted by a mod presumably due to SO's "answers must have code" rule. So here's a random sample. Follow the link for the full list.
AMD64
Type | Macro | Description
Identification | __amd64__ __amd64 __x86_64__ __x86_64 | Defined by GNU C and Sun Studio
Identification | _M_X64 _M_AMD64 | Defined by Visual Studio
If you need fine-grained detection of CPU features, the best approach is to also ship a small CPUID program that writes the set of features supported by the CPU to stdout or to a "cpu_config.h" file, and then integrate that program into your build process.
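A hedged sketch of such a generator program (the macro names it emits and the output layout are this example's own convention; the cpuid plumbing mirrors the compiler-specific intrinsics shown earlier):
#include <cstdio>
#if defined(_MSC_VER)
# include <intrin.h>
#elif defined(__GNUC__) || defined(__clang__)
# include <cpuid.h>
#else
# error "unsupported compiler for this sketch"
#endif

static void cpuidLeaf1(unsigned int& ecx, unsigned int& edx)
{
#if defined(_MSC_VER)
    int r[4];
    __cpuid(r, 1);
    ecx = static_cast<unsigned int>(r[2]);
    edx = static_cast<unsigned int>(r[3]);
#else
    unsigned int eax = 0, ebx = 0;
    __get_cpuid(1, &eax, &ebx, &ecx, &edx);
#endif
}

int main()
{
    unsigned int ecx = 0, edx = 0;
    cpuidLeaf1(ecx, edx);
    // Redirect this output to cpu_config.h from your build script.
    std::printf("#define CPU_HAS_SSE2  %u\n", (edx >> 26) & 1u);
    std::printf("#define CPU_HAS_SSE3  %u\n", ecx & 1u);
    std::printf("#define CPU_HAS_SSE41 %u\n", (ecx >> 19) & 1u);
    std::printf("#define CPU_HAS_AVX   %u\n", (ecx >> 28) & 1u);
}
Running it once at configure time and redirecting its output to cpu_config.h lets the rest of the build include that file and compile only the code paths the machine can actually execute.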