How to define an endian enum class while waiting for C++20? [duplicate] - c++

This question already has answers here:
Macro definition to determine big endian or little endian machine?
(22 answers)
Is there a safe, portable way to determine (during compile time) the endianness of the platform that my program is being compiled on? I'm writing in C.
[EDIT]
Thanks for the answers, I decided to stick with the runtime solution!

To answer the original question of a compile-time check, there's no standardized way to do it that will work across all existing and all future compilers, because none of the existing C, C++, and POSIX standards define macros for detecting endianness.
But if you're willing to limit yourself to some known set of compilers, you can look up each compiler's documentation to find out which predefined macros (if any) it uses to indicate endianness. This page lists several macros you can look for, so here's some code that would work for those:
#if defined(__BYTE_ORDER) && __BYTE_ORDER == __BIG_ENDIAN || \
    defined(__BIG_ENDIAN__) || \
    defined(__ARMEB__) || \
    defined(__THUMBEB__) || \
    defined(__AARCH64EB__) || \
    defined(_MIPSEB) || defined(__MIPSEB) || defined(__MIPSEB__)
// It's a big-endian target architecture
#elif defined(__BYTE_ORDER) && __BYTE_ORDER == __LITTLE_ENDIAN || \
    defined(__LITTLE_ENDIAN__) || \
    defined(__ARMEL__) || \
    defined(__THUMBEL__) || \
    defined(__AARCH64EL__) || \
    defined(_MIPSEL) || defined(__MIPSEL) || defined(__MIPSEL__)
// It's a little-endian target architecture
#else
#error "I don't know what architecture this is!"
#endif
If you can't find what predefined macros your compiler uses from its documentation, you can also try coercing it to spit out its full list of predefined macros and guess from there what will work (look for anything with ENDIAN, ORDER, or the processor architecture name in it). This page lists a number of methods for doing that in different compilers:
Clang/LLVM: clang -dM -E -x c /dev/null (C); clang++ -dM -E -x c++ /dev/null (C++)
GNU GCC/G++: gcc -dM -E -x c /dev/null (C); g++ -dM -E -x c++ /dev/null (C++)
Hewlett-Packard C/aC++: cc -dM -E -x c /dev/null (C); aCC -dM -E -x c++ /dev/null (C++)
IBM XL C/C++: xlc -qshowmacros -E /dev/null (C); xlc++ -qshowmacros -E /dev/null (C++)
Intel ICC/ICPC: icc -dM -E -x c /dev/null (C); icpc -dM -E -x c++ /dev/null (C++)
Microsoft Visual Studio: (none)
Oracle Solaris Studio: cc -xdumpmacros -E /dev/null (C); CC -xdumpmacros -E /dev/null (C++)
Portland Group PGCC/PGCPP: pgcc -dM -E (C); (none) (C++)
Finally, to round it out, the Microsoft Visual C/C++ compilers are the odd ones out and don't have any of the above. Fortunately, they have documented their predefined macros here, and you can use the target processor architecture to infer the endianness. While all of the currently supported processors in Windows are little-endian (_M_IX86, _M_X64, _M_IA64, and _M_ARM are little-endian), some historically supported processors like the PowerPC (_M_PPC) were big-endian. But more relevantly, the Xbox 360 is a big-endian PowerPC machine, so if you're writing a cross-platform library header, it can't hurt to check for _M_PPC.
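As a minimal sketch of that inference for MSVC only (the _M_* macros are Microsoft's documented architecture macros; the MY_TARGET_* names are just illustrative):
#if defined(_MSC_VER)
# if defined(_M_PPC)
#  define MY_TARGET_BIG_ENDIAN 1    /* e.g. Xbox 360 */
# elif defined(_M_IX86) || defined(_M_X64) || defined(_M_IA64) || defined(_M_ARM) || defined(_M_ARM64)
#  define MY_TARGET_LITTLE_ENDIAN 1
# else
#  error "Unknown MSVC target; cannot infer endianness"
# endif
#endif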

This is for compile-time checking: you could use information from the Boost header endian.hpp, which covers many platforms.
Edit: for runtime checking:
bool isLittleEndian()
{
    short int number = 0x1;
    char *numPtr = (char *)&number;
    return (numPtr[0] == 1);
}
Create an integer, and read its first byte (least significant byte). If that byte is 1, then the system is little endian, otherwise it's big endian.
Edit: Thinking about it, you could run into a potential issue on some platforms (I can't think of any) where sizeof(char) == sizeof(short int). You could use the fixed-width multi-byte integer types available in <stdint.h>, or if your platform doesn't have them, you could again adapt a Boost header for your use: stdint.hpp
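For example, a sketch of the same check using the fixed-width types (assuming <stdint.h> is available):
#include <stdint.h>
#include <stdbool.h>

bool isLittleEndian(void)
{
    uint16_t number = 0x1;
    uint8_t *numPtr = (uint8_t *)&number;
    return (numPtr[0] == 1);    /* least significant byte first => little endian */
}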

With C99, you can perform the check as:
#define I_AM_LITTLE (((union { unsigned x; unsigned char c; }){1}).c)
Conditionals like if (I_AM_LITTLE) will be evaluated at compile-time and allow the compiler to optimize out whole blocks.
I don't have the reference right off for whether this is strictly speaking a constant expression in C99 (which would allow it to be used in initializers for static-storage-duration data), but if not, it's the next best thing.
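For example, a hypothetical byte-swap helper built on the macro might look like this (the branch is expected to fold away after constant propagation):
#include <stdint.h>

uint16_t to_big_endian16(uint16_t v)
{
    if (I_AM_LITTLE)    /* constant for any given target */
        return (uint16_t)((v << 8) | (v >> 8));
    return v;
}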

Interesting read from the C FAQ:
You probably can't. The usual techniques for detecting endianness
involve pointers or arrays of char, or maybe unions, but preprocessor
arithmetic uses only long integers, and there is no concept of
addressing. Another tempting possibility is something like
#if 'ABCD' == 0x41424344
but this isn't reliable, either.

I would like to extend the answers by providing a constexpr function for C++:
union Mix {
    int sdat;
    char cdat[4];
};
static constexpr Mix mix { 0x1 };

constexpr bool isLittleEndian() {
    return mix.cdat[0] == 1;
}
Since mix is constexpr too, it is evaluated at compile time and can be used in constexpr bool isLittleEndian(). It should be safe to use.
Update
As #Cheersandhth pointed out below, this seems to be problematic.
The reason is that it does not conform to the C++11 standard, which forbids this kind of type punning: only one union member can be active at a time. With a standard-conforming compiler you will get an error.
So don't use it in C++. It seems you can do it in C, though. I leave my answer in for educational purposes :-) and because the question is about C...
Update 2
This assumes that int has the size of 4 chars, which is not always the case, as #PetrVepřek correctly pointed out below. To make the code truly portable you have to be more clever here, but this should suffice for many cases. Note that sizeof(char) is always 1, by definition; the code above assumes sizeof(int) == 4.
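For completeness, since the question title mentions C++20: C++20 provides std::endian in <bit>, which makes the check trivial and fully conforming in C++:
#include <bit>

constexpr bool isLittleEndian()
{
    return std::endian::native == std::endian::little;
}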

Use CMake TestBigEndian as
INCLUDE(TestBigEndian)
TEST_BIG_ENDIAN(ENDIAN)
IF (ENDIAN)
# big endian
ELSE (ENDIAN)
# little endian
ENDIF (ENDIAN)
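One way to consume the result in C code is to pass a definition from CMake, e.g. with target_compile_definitions(mytarget PRIVATE MY_BIG_ENDIAN=1) inside the IF (ENDIAN) branch (the MY_BIG_ENDIAN name is just an example), and then test it in the source:
#include <stdint.h>

static uint32_t to_native_u32(uint32_t big_endian_value)
{
#ifdef MY_BIG_ENDIAN
    return big_endian_value;                 /* already in native order */
#else
    return ((big_endian_value & 0x000000FFu) << 24) |
           ((big_endian_value & 0x0000FF00u) <<  8) |
           ((big_endian_value & 0x00FF0000u) >>  8) |
           ((big_endian_value & 0xFF000000u) >> 24);
#endif
}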

Not during compile time, but perhaps during runtime. Here's a C function I wrote to determine endianness:
/* Returns 1 if LITTLE-ENDIAN or 0 if BIG-ENDIAN */
#include <inttypes.h>
int endianness()
{
    union { uint8_t c[4]; uint32_t i; } data;
    data.i = 0x12345678;
    return (data.c[0] == 0x78);
}

From Finally, one-line endianness detection in the C preprocessor:
#include <stdint.h>
#define IS_BIG_ENDIAN (*(uint16_t *)"\0\xff" < 0x100)
Any decent optimizer will resolve this at compile-time. gcc does at -O1.
Of course stdint.h is C99. For ANSI/C89 portability see Doug Gwyn's Instant C9x library.

I took it from rapidjson library:
#define BYTEORDER_LITTLE_ENDIAN 0 // Little endian machine.
#define BYTEORDER_BIG_ENDIAN 1 // Big endian machine.
//#define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
#ifndef BYTEORDER_ENDIAN
// Detect with GCC 4.6's macro.
# if defined(__BYTE_ORDER__)
# if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
# else
# error "Unknown machine byteorder endianness detected. User needs to define BYTEORDER_ENDIAN."
# endif
// Detect with GLIBC's endian.h.
# elif defined(__GLIBC__)
# include <endian.h>
# if (__BYTE_ORDER == __LITTLE_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif (__BYTE_ORDER == __BIG_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
# else
# error "Unknown machine byteorder endianness detected. User needs to define BYTEORDER_ENDIAN."
# endif
// Detect with _LITTLE_ENDIAN and _BIG_ENDIAN macro.
# elif defined(_LITTLE_ENDIAN) && !defined(_BIG_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
// Detect with architecture macros.
# elif defined(__sparc) || defined(__sparc__) || defined(_POWER) || defined(__powerpc__) || defined(__ppc__) || defined(__hpux) || defined(__hppa) || defined(_MIPSEB) || defined(_POWER) || defined(__s390__)
# define BYTEORDER_ENDIAN BYTEORDER_BIG_ENDIAN
# elif defined(__i386__) || defined(__alpha__) || defined(__ia64) || defined(__ia64__) || defined(_M_IX86) || defined(_M_IA64) || defined(_M_ALPHA) || defined(__amd64) || defined(__amd64__) || defined(_M_AMD64) || defined(__x86_64) || defined(__x86_64__) || defined(_M_X64) || defined(__bfin__)
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# elif defined(_MSC_VER) && (defined(_M_ARM) || defined(_M_ARM64))
# define BYTEORDER_ENDIAN BYTEORDER_LITTLE_ENDIAN
# else
# error "Unknown machine byteorder endianness detected. User needs to define BYTEORDER_ENDIAN."
# endif
#endif

I once used a construct like this one:
uint16_t HI_BYTE = 0,
LO_BYTE = 1;
uint16_t s = 1;
if (*(uint8_t *) &s == 1) {
    HI_BYTE = 1;
    LO_BYTE = 0;
}
/* pByte is assumed to point at a 2-byte buffer elsewhere in the program */
pByte[HI_BYTE] = 0x10;
pByte[LO_BYTE] = 0x20;
gcc with -O2 was able to make it completely compile time. That means the HI_BYTE and LO_BYTE variables were replaced entirely, and even the pByte access was replaced in the assembly by the equivalent of *(uint16_t *)pByte = 0x1020;.
It's as compile time as it gets.

To my knowledge no, not during compile time.
At run-time, you can do trivial checks such as setting a multi-byte value to a known bit string and inspect what bytes that results in. For instance using a union,
typedef union {
uint32_t word;
uint8_t bytes[4];
} byte_check;
or casting,
uint32_t word;
uint8_t *bytes = (uint8_t *)&word;
Please note that for completely portable endianness checks, you need to take into account big-endian, little-endian, and mixed-endian systems.
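A sketch of how such a union check might classify the three cases at run time (the typedef is repeated so the snippet stands alone; the byte values are just the obvious choice):
#include <stdint.h>
#include <stdio.h>

typedef union {
    uint32_t word;
    uint8_t bytes[4];
} byte_check;

int main(void)
{
    byte_check check;
    check.word = 0x01020304u;

    if (check.bytes[0] == 0x04)
        puts("little-endian");
    else if (check.bytes[0] == 0x01)
        puts("big-endian");
    else
        puts("mixed-endian");
    return 0;
}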

For my part, I decided to use an intermediate approach: try the macros, and if they don't exist, or if we can't find them, do it at runtime. Here is one that works with the GNU compiler:
#define II 0x4949 // arbitrary values != 1; examples are
#define MM 0x4D4D // taken from the TIFF standard
int
#if defined __BYTE_ORDER__ && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
const host_endian = II;
#elif defined __BYTE_ORDER__ && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
const host_endian = MM;
#else
#define _no_BYTE_ORDER
host_endian = 1; // plain "int", not "int const" !
#endif
and then, in the actual code:
int main(int argc, char **argv) {
#ifdef _no_BYTE_ORDER
host_endian = * (char *) &host_endian ? II : MM;
#undef _no_BYTE_ORDER
#endif
// .... your code here, for instance:
printf("Endedness: %s\n", host_endian == II ? "little-endian"
: "big-endian");
return 0;
}
On the other hand, as the original poster noted, the overhead of a runtime check is so small (two lines of code and microseconds of time) that it's hardly worth the bother to try to do it in the preprocessor.

Related

Is there a way to check if virtualisation is enabled on the BIOS using C++?

I'm trying to check whether virtualisation (AMD-V or Intel VT) is enabled, programmatically. I know the bash commands that give you this information, but I'm trying to achieve this in C++.
On that note, I'm trying to avoid using std::system to execute shell commands because of how hacky and suboptimal that solution is. Thanks in advance!
To check if VMX or SVM (Intel and AMD virtualization technologies) are enabled you need to use the cpuid instruction.
This instruction comes up so often in similar tests that the mainstream compilers all have an intrinsic for it, you don't need inline assembly.
For Intel CPUs you have to check CPUID.1.ecx[5], while for AMD CPUs you have to check CPUID.1.ecx[2].
Here's an example. I only have gcc on an Intel CPU, so you will have to properly test and fix this code for other compilers and AMD CPUs.
The principles I followed are:
Unsupported compilers and non-x86 architectures should fail to compile the code.
If run on an x86-compatible CPU that is neither Intel nor AMD, the function returns false.
The code assumes cpuid exists; for Intel's CPUs this is true since the 486. The code is designed so that cpuid_ex can return false to denote that cpuid is not present, but I didn't implement a test for it because that would require inline assembly. GCC and clang have a built-in for this but MSVC doesn't.
Note that if you only build 64-bit binaries you always have cpuid, and you can remove the 32-bit architecture check so that if someday somebody tries to make a 32-bit build, the code fails to compile.
#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__clang__) || defined(__GNUC__)
#include <cpuid.h>
#endif
#include <string>
#include <iostream>
#include <cstdint>
//Execute cpuid with leaf "leaf" and given the temporary array x holding the
//values of eax, ebx, ecx and edx (after cpuid), copy x[start:end) into regs
//converting each item in a uint32_t value (so this interface is the same even
//for esoteric ILP64 compilers).
//If CPUID is not supported, returns false (NOTE: this check is not done, CPUID
//exists since the 486, it's up to you to implement a test. GCC & CLANG have a
//nice macro for this. MSVCC doesn't. ICC I don't know.)
bool cpuid_ex(unsigned int leaf, uint32_t* regs, size_t start = 0, size_t end = 4)
{
#if ( ! defined(__x86_64__) && ! defined(__i386__) && ! defined(_M_IX86) && ! defined(_M_X64))
//Not an x86
return false;
#elif defined(_MSC_VER) || defined(__INTEL_COMPILER)
//MS & Intel
int x[4];
__cpuid((int*)x, leaf);
#elif defined(__clang__) || defined(__GNUC__)
//GCC & clang
unsigned int x[4];
__cpuid(leaf, x[0], x[1], x[2], x[3]);
#else
//Unknown compiler
static_assert(false, "cpuid_ex: compiler is not supported");
#endif
//Conversion from x[i] to uint32_t is safe since GP registers are 32-bit at least
//if we are using cpuid
for (; start < end; start++)
*regs++ = static_cast<uint32_t>(x[start]);
return true;
}
//Check for Intel and AMD virtualization.
bool support_virtualization()
{
//Get the signature
uint32_t signature_reg[3] = {0};
if ( ! cpuid_ex(0, signature_reg, 1))
//cpuid is not supported, this returns false but you may want to throw
return false;
uint32_t features;
//Get the features
cpuid_ex(1, &features, 2, 3);
//Is intel? Check bit5
if (signature_reg[0] == 0x756e6547 && signature_reg[1] == 0x6c65746e && signature_reg[2] == 0x49656e69)
return features & 0x20;
//Is AMD? check bit2
if (signature_reg[0] == 0x68747541 && signature_reg[1] == 0x69746e65 && signature_reg[2] == 0x444d4163)
return features & 0x04;
//Not intel or AMD, this returns false but you may want to throw
return false;
}
int main()
{
std::cout << "Virtualization is " << (support_virtualization() ? "": "NOT ") << "supported\n";
return 0;
}
No, there is no way to detect virtualization support using only standard C++ facilities. The C++ standard library does not have anything related to hardware virtualization. (Its exposure of low level details like that is extremely limited.)
You would need to use OS-specific facilities to detect this.

Reading CF, PF, ZF, SF, OF

I am writing a virtual machine for my own assembly language, and I want to be able to set the carry, parity, zero, sign and overflow flags as they are set in the x86-64 architecture when I perform operations such as addition.
Notes:
I am using Microsoft Visual C++ 2015 & Intel C++ Compiler 16.0
I am compiling as a Win64 application.
My virtual machine (currently) only does arithmetic on 8-bit integers
I'm not (currently) interested in any other flags (e.g. AF)
My current solution is using the following function:
void update_flags(uint16_t input)
{
    Registers::flags.carry = (input > UINT8_MAX);
    Registers::flags.zero = (input == 0);
    Registers::flags.sign = (input < 0);
    Registers::flags.overflow = (int16_t(input) > INT8_MAX || int16_t(input) < INT8_MIN);
    // I am assuming that overflow is handled by truncation
    uint8_t input8 = uint8_t(input);
    // The parity flag
    int ones = 0;
    for (int i = 0; i < 8; ++i)
        if ((input8 & (1 << i)) != 0) ++ones;
    Registers::flags.parity = (ones % 2 == 0);
}
Which for addition, I would use as follows:
uint8_t a, b;
update_flags(uint16_t(a) + uint16_t(b));
uint8_t c = a + b;
EDIT:
To clarify, I want to know if there is a more efficient/neat way of doing this (such as by accessing RFLAGS directly)
Also my code may not work for other operations (e.g. multiplication)
EDIT 2 I have updated my code now to this:
void update_flags(uint32_t result)
{
    Registers::flags.carry = (result > UINT8_MAX);
    Registers::flags.zero = (result == 0);
    Registers::flags.sign = (int32_t(result) < 0);
    Registers::flags.overflow = (int32_t(result) > INT8_MAX || int32_t(result) < INT8_MIN);
    Registers::flags.parity = (_mm_popcnt_u32(uint8_t(result)) % 2 == 0);
}
One more question, will my code for the carry flag work properly?, I also want it to be set correctly for "borrows" that occur during subtraction.
Note: The assembly language I am virtualising is of my own design, meant to be simple and based of Intel's implementation of x86-64 (i.e. Intel64), and so I would like these flags to behave in mostly the same way.
TL:DR: use lazy flag evaluation, see below.
input is a weird name. Most ISAs update flags based on the result of an operation, not the inputs. You're looking at the 16-bit result of an 8-bit operation, which is an interesting approach. In C, you should just use unsigned int, which is guaranteed to be at least 16 bits wide. It will compile to better code on x86, where unsigned is 32-bit; 16-bit operations take an extra prefix and can lead to partial-register slowdowns.
That might help with the 8-bit x 8-bit -> 16-bit multiply problem you noted, depending on how you want to define the flag-updating for the mul instruction in the architecture you're emulating.
I don't think your overflow detection is correct. See this tutorial linked from the x86 tag wiki for how it's done.
This will probably not compile to very fast code, especially the parity flag. Do you need the ISA you're emulating/designing to have a parity flag? You never said you're emulating an x86, so I assume it's some toy architecture you're designing yourself.
An efficient emulator (esp. one that needs to support a parity flag) would probably benefit a lot from some kind of lazy flag evaluation. Save a value that you can compute flags from if needed, but don't actually compute anything until you get to an instruction that reads flags. Most instructions only write flags without reading them, and they just save the uint16_t result into your architectural state. Flag-reading instructions can either compute just the flag they need from that saved uint16_t, or compute all of them and store that somehow.
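A minimal lazy-flags sketch along those lines (the names and layout are illustrative, not from the question's code; the overflow flag would additionally need the operands or a saved signed result):
#include <cstdint>

struct LazyFlags {
    uint16_t result;    // widened result of the last 8-bit operation

    bool zero()   const { return uint8_t(result) == 0; }
    bool sign()   const { return (result & 0x80) != 0; }
    bool carry()  const { return result > UINT8_MAX; }
    bool parity() const {       // x86 PF: set when the low byte has an even number of 1 bits
        uint8_t x = uint8_t(result);
        x ^= x >> 4; x ^= x >> 2; x ^= x >> 1;
        return (x & 1) == 0;
    }
};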
Assuming you can't get the compiler to actually read PF from the result, you might try _mm_popcnt_u32((uint8_t)x) & 1. Or, horizontally XOR all the bits together:
x = (x&0b00001111) ^ (x>>4)
x = (x&0b00000011) ^ (x>>2)
PF = (x&0b00000001) ^ (x>>1) // tweaking this to produce better asm is probably possible
I doubt any of the major compilers can peephole-optimize a bunch of checks on a result into LAHF + SETO al, or a PUSHF. Compilers can be led into using a flag condition to detect integer overflow to implement saturating addition, for example. But having it figure out that you want all the flags, and actually use LAHF instead of a series of setcc instruction, is probably not possible. The compiler would need a pattern-recognizer for when it can use LAHF, and probably nobody's implemented that because the use-cases are so vanishingly rare.
There's no C/C++ way to directly access flag results of an operation, which makes C a poor choice for implementing something like this. IDK if any other languages do have flag results, other than asm.
I expect you could gain a lot of performance by writing parts of the emulation in asm, but that would be platform-specific. More importantly, it's a lot more work.
I appear to have solved the problem, by splitting the arguments to update flags into an unsigned and signed result as follows:
void update_flags(int16_t unsigned_result, int16_t signed_result)
{
    Registers::flags.zero = unsigned_result == 0;
    Registers::flags.sign = signed_result < 0;
    Registers::flags.carry = unsigned_result < 0 || unsigned_result > UINT8_MAX;
    Registers::flags.overflow = signed_result < INT8_MIN || signed_result > INT8_MAX;
}
For addition (which should produce the correct result for both signed & unsigned inputs) I would do the following:
int8_t a, b;
int16_t signed_result = int16_t(a) + int16_t(b);
int16_t unsigned_result = int16_t(uint8_t(a)) + int16_t(uint8_t(b));
update_flags(unsigned_result, signed_result);
int8_t c = a + b;
And signed multiplication I would do the following:
int8_t a, b;
int16_t result = int16_t(a) * int16_t(b);
update_flags(result, result);
int8_t c = a * b;
And so on for the other operations that update the flags
Note: I am assuming here that int16_t(a) sign extends, and int16_t(uint8_t(a)) zero extends.
I have also decided against having a parity flag; my _mm_popcnt_u32 solution should work if I change my mind later.
P.S. Thank you to everyone who responded, it was very helpful. Also if anyone can spot any mistakes in my code, that would be appreciated.

SPI_IOC_MESSAGE(N) macro giving me fits

I'm having trouble getting a SPI program I'm working on to behave correctly and it seems to be some issue with the SPI_IOC_MESSAGE(N) macro.
Here's sample code that DOESN'T work (ioctl returns EINVAL (22) ):
std::vector<spi_ioc_transfer> tr;
<code that fills tr with 1+ transfers>
// Hand the transmission(s) off to the SPI driver
if (tr.size() > 0)
{
int ret = ioctl(fd, SPI_IOC_MESSAGE(tr.size()), tr.data());
if (ret < 1)
{
int err = errno;
}
}
My test code right now is creating a vector of length 1.
If I explicitly change the code to:
int ret = ioctl(fd, SPI_IOC_MESSAGE( 1 ), tr.data());
...then ioctl(...) succeeds and my bits go down the pipe.
Looking at the expansion of the SPI_IOC_MESSAGE macro in Eclipse, I don't see why this isn't happy.
Suggestions?
I'm cross-compiling for Linux/ARM (Beaglebone Black) from a 64-bit Linux VM, but that I can't see that affecting the macro.
EDIT:
Here are the two macro expansions out of the C pre-processor
int ret = ioctl(fd, (((1U) << (((0 +8)+8)+14)) | ((('k')) << (0 +8)) | (((0)) << 0) | ((((sizeof(char[((((tr.size())*(sizeof (struct spi_ioc_transfer))) < (1 << 14)) ? ((tr.size())*(sizeof (struct spi_ioc_transfer))) : 0)])))) << ((0 +8)+8))), tr.data());
and the literal:
int ret = ioctl(fd, (((1U) << (((0 +8)+8)+14)) | ((('k')) << (0 +8)) | (((0)) << 0) | ((((sizeof(char[((((1)*(sizeof (struct spi_ioc_transfer))) < (1 << 14)) ? ((1)*(sizeof (struct spi_ioc_transfer))) : 0)])))) << ((0 +8)+8))), tr.data());
Absolutely hideous, but I don't see anything surprising in how tr.size() would be getting used there.
Edit to include what seems to be the answer
#ifdef __cplusplus /* If this is a C++ compiler, use C linkage */
extern "C" {
#endif
#include <linux/spi/spidev.h>
#ifdef __cplusplus /* If this is a C++ compiler, use C linkage */
}
#endif
Wrapping the linux SPI include file in an "extern C" instructs the system to treat that section as plain old C, and that seems to let me call SPI_IOC_MESSAGE( tr.size() ) or SPI_IOC_MESSAGE( an_int ) and have the correct thing happen (verified with GDB stepthrough and a signal analyzer).
I suspect the problem might lie in this particular nugget buried in the macro soup:
...sizeof(char[...tr.size()...])...
Noting that Linux code is exclusively C, the C standard (I have C99 draft n1256 here) states for the sizeof operator that an expression operand is unevaluated, and the result is a constant, unless the type of the operand is a variable-length array, in which case it is evaluated and the result is an integer.
The C++ standard (C++11 draft n3242), however, does not seem to give any condition under which the operand is evaluated, and only states the result is a constant.
Thus it looks like this may be one of the corners where C and C++ differ, and compiling one as the other leads to undefined behaviour. In that case, I think the choice is either just hacking up your own version of the macro, or having a separate external C function that wraps it.
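For the "separate wrapper" option, a possible C++-safe sketch is to compute the same request number arithmetically with the _IOC() helper from <linux/ioctl.h>, avoiding the array type entirely (the function name spi_ioc_message_cxx is made up for illustration):
#include <linux/spi/spidev.h>
#include <linux/ioctl.h>
#include <stddef.h>

/* Computes the same value as SPI_IOC_MESSAGE(n), but without sizeof on a
   variable-length array type, so it behaves the same in C and C++. */
static inline unsigned long spi_ioc_message_cxx(size_t n)
{
    size_t size = n * sizeof(struct spi_ioc_transfer);
    if (size >= (1u << _IOC_SIZEBITS))   /* mirror the macro's overflow guard */
        size = 0;
    return _IOC(_IOC_WRITE, SPI_IOC_MAGIC, 0, size);
}

/* usage: int ret = ioctl(fd, spi_ioc_message_cxx(tr.size()), tr.data()); */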

Detect the availability of SSE/SSE2 instruction set in Visual Studio

How can I check in code whether SSE/SSE2 is enabled or not by the Visual Studio compiler?
I have tried #ifdef __SSE__ but it didn't work.
Some additional information on _M_IX86_FP.
_M_IX86_FP is only defined for 32-bit code. 64-bit x86 code has at least SSE2. You can use _M_AMD64 or _M_X64 to determine if the code is 64-bit.
#ifdef __AVX2__
//AVX2
#elif defined ( __AVX__ )
//AVX
#elif (defined(_M_AMD64) || defined(_M_X64))
//SSE2 x64
#elif _M_IX86_FP == 2
//SSE2 x32
#elif _M_IX86_FP == 1
//SSE x32
#else
//nothing
#endif
From the documentation:
_M_IX86_FP
Expands to a value indicating which /arch compiler option was used:
0 if /arch:IA32 was used.
1 if /arch:SSE was used.
2 if /arch:SSE2 was used. This value is the default if /arch was not specified.
I don't see any mention of __SSE__.
The relevant preprocessor macros have two underscores at each end:
#ifdef __SSE__
#ifdef __SSE2__
#ifdef __SSE3__
#ifdef __SSE4_1__
#ifdef __AVX__
...etc...
UPDATE: apparently the above macros are not automatically pre-defined for you when using Visual Studio (even though they are in every other x86 compiler that I have ever used), so you may need to define them yourself if you want portability with gcc, clang, ICC, et al...
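One possible compatibility shim, if you want the GCC-style names under MSVC (a sketch; it only maps the /arch-related macros discussed above and deliberately defines nothing it cannot infer):
#if defined(_MSC_VER) && !defined(__clang__)
# if defined(_M_X64) || defined(_M_AMD64) || defined(__AVX__) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  ifndef __SSE2__
#   define __SSE2__ 1
#  endif
#  ifndef __SSE__
#   define __SSE__ 1
#  endif
# elif defined(_M_IX86_FP) && _M_IX86_FP >= 1
#  ifndef __SSE__
#   define __SSE__ 1
#  endif
# endif
#endif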
This is a late answer, but on MSDN you can find an article about __cpuid and __cpuidex. I reworked the class into a function that checks support for MMX, SSE, SSE2, SSE3, SSSE3, and SSE4.1.
https://learn.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=vs-2019
#include <array>
#include <bitset>
#include <vector>
#include <intrin.h>

[[nodiscard]] bool CheckSimdSupport() noexcept
{
    std::array<int, 4> cpui;
    int nIds_{};
    std::bitset<32> f_1_ECX_{};
    std::bitset<32> f_1_EDX_{};
    std::vector<std::array<int, 4>> data_;
    __cpuid(cpui.data(), 0);
    nIds_ = cpui[0];
    for (int i = 0; i <= 1; ++i)
    {
        __cpuid(cpui.data(), i);
        data_.push_back(cpui);
    }
    if (nIds_ >= 1)
    {
        f_1_ECX_ = data_[1][2];
        f_1_EDX_ = data_[1][3];
    }
    // f_1_ECX_[0]  - SSE3
    // f_1_ECX_[9]  - SSSE3
    // f_1_ECX_[19] - SSE4.1
    // f_1_EDX_[23] - MMX
    // f_1_EDX_[25] - SSE
    // f_1_EDX_[26] - SSE2
    return f_1_ECX_[0] && f_1_ECX_[9] && f_1_ECX_[19] && f_1_EDX_[23] && f_1_EDX_[25] && f_1_EDX_[26];
}

Detecting CPU architecture compile-time

What is the most reliable way to find out the CPU architecture when compiling C or C++ code? As far as I can tell, different compilers have their own sets of non-standard preprocessor definitions (_M_IX86 in MSVC, __i386__ and __arm__ in GCC, etc.).
Is there a standard way to detect the architecture I'm building for? If not, is there a source for a comprehensive list of such definitions for various compilers, such as a header with all the boilerplate #ifdefs?
Enjoy, I was the original author of this.
extern "C" {
const char *getBuild() { //Get current architecture, detects nearly every architecture. Coded by Freak
#if defined(__x86_64__) || defined(_M_X64)
return "x86_64";
#elif defined(i386) || defined(__i386__) || defined(__i386) || defined(_M_IX86)
return "x86_32";
#elif defined(__ARM_ARCH_2__)
return "ARM2";
#elif defined(__ARM_ARCH_3__) || defined(__ARM_ARCH_3M__)
return "ARM3";
#elif defined(__ARM_ARCH_4T__) || defined(__TARGET_ARM_4T)
return "ARM4T";
#elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__)
return "ARM5";
#elif defined(__ARM_ARCH_6T2__)
return "ARM6T2";
#elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__)
return "ARM6";
#elif defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7S__)
return "ARM7";
#elif defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7S__)
return "ARM7A";
#elif defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7S__)
return "ARM7R";
#elif defined(__ARM_ARCH_7M__)
return "ARM7M";
#elif defined(__ARM_ARCH_7S__)
return "ARM7S";
#elif defined(__aarch64__) || defined(_M_ARM64)
return "ARM64";
#elif defined(mips) || defined(__mips__) || defined(__mips)
return "MIPS";
#elif defined(__sh__)
return "SUPERH";
#elif defined(__powerpc) || defined(__powerpc__) || defined(__powerpc64__) || defined(__POWERPC__) || defined(__ppc__) || defined(__PPC__) || defined(_ARCH_PPC)
return "POWERPC";
#elif defined(__PPC64__) || defined(__ppc64__) || defined(_ARCH_PPC64)
return "POWERPC64";
#elif defined(__sparc__) || defined(__sparc)
return "SPARC";
#elif defined(__m68k__)
return "M68K";
#else
return "UNKNOWN";
#endif
}
}
There's no inter-compiler standard, but each compiler tends to be quite consistent. You can build a header for yourself that's something like this:
#if defined(_MSC_VER)
#ifdef _M_IX86
#define ARCH_X86
#endif
#endif
#if defined(__GNUC__)
#ifdef __i386__
#define ARCH_X86
#endif
#endif
There's not much point to a comprehensive list, because there are thousands of compilers but only 3-4 in widespread use (Microsoft C++, GCC, Intel CC, maybe TenDRA?). Just decide which compilers your application will support, list their #defines, and update your header as needed.
If you would like to dump all available features on a particular platform, you could run GCC like:
gcc -march=native -dM -E - </dev/null
It would dump macros like #define __SSE3__ 1, #define __AES__ 1, etc.
If you want a cross-compiler solution then just use Boost.Predef which contains
BOOST_ARCH_ for system/CPU architecture one is compiling for.
BOOST_COMP_ for the compiler one is using.
BOOST_LANG_ for language standards one is compiling against.
BOOST_LIB_C_ and BOOST_LIB_STD_ for the C and C++ standard library in use.
BOOST_OS_ for the operating system we are compiling to.
BOOST_PLAT_ for platforms on top of operating system or compilers.
BOOST_ENDIAN_ for endianness of the os and architecture combination.
BOOST_HW_ for hardware specific features.
BOOST_HW_SIMD for SIMD (Single Instruction Multiple Data) detection.
Note that although Boost is usually thought of as a C++ library, Boost.Predef is pure header-only and works for C
For example
#include <boost/predef.h>
// or just include the necessary headers
// #include <boost/predef/architecture.h>
// #include <boost/predef/other.h>
#if BOOST_ARCH_X86
#if BOOST_ARCH_X86_64
std::cout << "x86-64\n";
#elif BOOST_ARCH_X86_32
std::cout << "x86-32\n";
#else
std::cout << "x86-" << BOOST_ARCH_WORD_BITS << '\n'; // Probably x86-16
#endif
#elif BOOST_ARCH_ARM
#if BOOST_ARCH_ARM >= BOOST_VERSION_NUMBER(8, 0, 0)
#if BOOST_ARCH_WORD_BITS == 64
std::cout << "ARMv8+ Aarch64\n";
#elif BOOST_ARCH_WORD_BITS == 32
std::cout << "ARMv8+ Aarch32\n";
#else
std::cout << "Unexpected ARMv8+ " << BOOST_ARCH_WORD_BITS << "bit\n";
#endif
#elif BOOST_ARCH_ARM >= BOOST_VERSION_NUMBER(7, 0, 0)
std::cout << "ARMv7 (ARM32)\n";
#elif BOOST_ARCH_ARM >= BOOST_VERSION_NUMBER(6, 0, 0)
std::cout << "ARMv6 (ARM32)\n";
#else
std::cout << "ARMv5 or older\n";
#endif
#elif BOOST_ARCH_MIPS
#if BOOST_ARCH_WORD_BITS == 64
std::cout << "MIPS64\n";
#else
std::cout << "MIPS32\n";
#endif
#elif BOOST_ARCH_PPC_64
std::cout << "PPC64\n";
#elif BOOST_ARCH_PPC
std::cout << "PPC32\n";
#else
std::cout << "Unknown " << BOOST_ARCH_WORD_BITS << "-bit arch\n";
#endif
You can find out more on how to use it here
Demo on Godbolt
There's nothing standard. Brian Hook documented a bunch of these in his "Portable Open Source Harness", and even tries to make them into something coherent and usable (ymmv regarding that). See the posh.h header on this site:
http://hookatooka.com/poshlib/
Note, the link above may require you to enter some bogus userid/password due to a DOS attack some time ago.
There's a list of the #defines here. There was a previous highly voted answer that included this link but it was deleted by a mod presumably due to SO's "answers must have code" rule. So here's a random sample. Follow the link for the full list.
AMD64
Identification macros: __amd64__, __amd64, __x86_64__, __x86_64 (defined by GNU C and Sun Studio); _M_X64, _M_AMD64 (defined by Visual Studio)
If you need fine-grained detection of CPU features, the best approach is to also ship a CPUID program that outputs to stdout, or to some "cpu_config.h" file, the set of features supported by the CPU. Then you integrate that program into your build process.
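A sketch of such a generator for GCC/clang targets (the __builtin_cpu_init and __builtin_cpu_supports builtins are real; the output macro names are illustrative, and you would redirect stdout to cpu_config.h in the build):
#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();   /* must be called before __builtin_cpu_supports */
    printf("#define CPU_HAS_SSE2   %d\n", __builtin_cpu_supports("sse2") ? 1 : 0);
    printf("#define CPU_HAS_SSE4_1 %d\n", __builtin_cpu_supports("sse4.1") ? 1 : 0);
    printf("#define CPU_HAS_AVX2   %d\n", __builtin_cpu_supports("avx2") ? 1 : 0);
    return 0;
}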