Header for _blsr_u64 with Sun supplied GCC on Solaris 11? - c++

We've got some code that runs on multiple platforms. The code uses BMI/BMI2 intrinsics when available, for example on a 5th-generation Core i7. The GCC supplied by Sun on Solaris 11.3 defines __BMI__ and __BMI2__, but it's having trouble locating the BMI/BMI2 intrinsics:
$ cat test.cxx
#include <x86intrin.h>
int main(int argc, char* argv[])
{
unsigned long long t = argc;
#if defined(__BMI__) || defined(__BMI2__)
t = _blsr_u64(t);
#endif
return int(t);
}
$ /bin/g++ -march=native test.cxx -o test.exe
test.cxx: In function ‘int main(int, char**)’:
test.cxx:6:18: error: ‘_blsr_u64’ was not declared in this scope
t = _blsr_u64(t);
^
Including immintrin.h does not make a difference.
Which header do we include for _blsr_u64 when using GCC on Solaris 11.3?
Here are the relevant defines from GCC:
$ /bin/g++ -march=native -dM -E - < /dev/null | sort | \
/usr/gnu/bin/egrep -i '(sse|aes|rdrnd|rdseed|avx|bmi)'
#define __AES__ 1
#define __AVX__ 1
#define __AVX2__ 1
#define __BMI__ 1
#define __BMI2__ 1
#define __core_avx2 1
#define __core_avx2__ 1
#define __RDRND__ 1
#define __RDSEED__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSSE3__ 1
#define __tune_core_avx2__ 1
And CPU features:
$ isainfo -v
64-bit amd64 applications
avx xsave pclmulqdq aes movbe sse4.2 sse4.1 ssse3 amd_lzcnt popcnt tscp
ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu prfchw adx
rdseed efs rtm hle bmi2 avx2 bmi1 f16c fma rdrand
And GCC version:
$ /bin/g++ --version
g++ (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.

Which header do we include for _blsr_u64 when using GCC on Solaris 11.3?
It looks like #include <x86intrin.h> is correct.
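The problem was that the compiler invocation required both -march=native and -m64, even though 64-bit is native for the machine and the kernel is 64-bit. Presumably this GCC build targets 32-bit code by default, and the 64-bit intrinsics such as _blsr_u64 are only declared for 64-bit targets: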
$ /bin/g++ -march=native -m64 test.cxx -o test.exe
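To make the code robust against a 32-bit default, the guard can test for a 64-bit target as well as __BMI__. The following is a minimal sketch, not the original project's code: the helper name reset_lowest_set_bit is hypothetical, and the fallback relies on the fact that BLSR computes t & (t - 1).

```cpp
#include <cstdint>

// Only pull in the intrinsic when both BMI and a 64-bit x86 target are in effect.
#if defined(__BMI__) && defined(__x86_64__)
# include <x86intrin.h>
#endif

// Hypothetical helper: clears the lowest set bit of t.
inline std::uint64_t reset_lowest_set_bit(std::uint64_t t)
{
#if defined(__BMI__) && defined(__x86_64__)
    return _blsr_u64(t);   // single BLSR instruction
#else
    return t & (t - 1);    // portable equivalent of BLSR
#endif
}
```

With this guard the same source builds with or without -m64; only the 64-bit build with BMI enabled uses the intrinsic.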

Related

Why isn't -mmacosx-version-min=10.10 preventing use of a function tagged as starting in 10.11?

By my understanding of how the availability macros and the -mmacosx-version-min flag work, the following code should fail to compile when targeting OS X 10.10:
#include <Availability.h>
#include <CoreFoundation/CoreFoundation.h>
#include <Security/Security.h>
#if !defined(__MAC_OS_X_VERSION_MIN_REQUIRED)
#error
#endif
#if __MAC_OS_X_VERSION_MIN_REQUIRED < 101000
#error __MAC_OS_X_VERSION_MIN_REQUIRED too low
#endif
#if __MAC_OS_X_VERSION_MIN_REQUIRED > 101000
#error __MAC_OS_X_VERSION_MIN_REQUIRED too high
#endif
int main() {
size_t len = 0;
SSLContextRef x{};
auto status = SSLCopyRequestedPeerNameLength(x, &len);
return status != 0;
}
because the function SSLCopyRequestedPeerNameLength is tagged as becoming available in 10.11 in SecureTransport.h:
$ grep -C5 ^SSLCopyRequestedPeerNameLength /System/Library/Frameworks//Security.framework/Headers/SecureTransport.h
/*
* Server Only: obtain the hostname specified by the client in the ServerName extension (SNI)
*/
OSStatus
SSLCopyRequestedPeerNameLength (SSLContextRef ctx,
size_t *peerNameLen)
__OSX_AVAILABLE_STARTING(__MAC_10_11, __IPHONE_9_0);
Yet when I compile on the command line with -mmacosx-version-min=10.10 I get no warning at all, despite -Wall -Werror -Wextra:
$ clang++ -Wall -Werror -Wextra ./foo.cpp --std=c++11 -framework Security -mmacosx-version-min=10.10 --stdlib=libc++ ; echo $?
0
Is there some additional definition I need to provide or specific warning to enable to ensure that I don't pick up a dependency on APIs newer than 10.10? I really had expected that -mmacosx-version-min=10.10 would prevent usage of APIs tagged with higher version numbers.
What have I misunderstood here?
Using Xcode 10.0 (10A255) on macOS 10.13.6 here.
Now that I can answer my own question, I will: you need to add -Wunguarded-availability to your compile flags. Only then will you get a warning/error.

Illegal instruction - vcvtsi2sd

I am writing a program to compute Groebner bases using the library FGB. While it has a C interface, I am calling the library from C++ code compiled with g++ on Ubuntu.
Compiling with the option -g and using x/i $pc in gdb, the illegal instruction is as follows.
0x421c39 <FGb_xmalloc_spec+985>: vcvtsi2sd %rbx,%xmm0,%xmm0
As far as I can tell, my processor does not support this instruction, and I am trying to figure out why the program uses it. It looks to me like the instruction comes from the library code. However, the code I am compiling used to work on the desktop it is now failing on; one day it just started throwing the illegal instruction. I assumed I had broken some libraries, so I reinstalled Ubuntu 16.04, but I continue to get the illegal instruction. The exact same code does work on another desktop and a Chromebook, running Ubuntu 16.04 and 14.04 respectively.
Technical information:
g++: 5.4.0 20160609
gdb: 7.11.1
Ubuntu: 16.04/14.04 LTS
Processor: x86info output
Found 4 identical CPUs
Extended Family: 0 Extended Model: 1 Family: 6 Model: 23 Stepping: 10
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core 2 Duo
Processor name string (BIOS programmed): Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz
cpu flags
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority dtherm
Compile line
g++ -std=c++11 -g -I src -o bin/main.o -c src/main.cpp
g++ -std=c++11 -g -I src -o bin/Polynomial.o -c src/Polynomial.cpp
g++ -std=c++11 -g -I src -o bin/Util.o -c src/Util.cpp
g++ -std=c++11 -g -I src -o bin/Solve.o -c src/Solve.cpp
g++ -std=c++11 -g -o bin/StartUp bin/main.o bin/Util.o bin/Polynomial.o bin/Solve.o -Llib -lfgb -lfgbexp -lgb -lgbexp -lminpoly -lminpolyvgf -lgmp -lm -fopenmp
At this point, I am not sure what further things I can try to avoid this illegal instruction and welcome any and all suggestions.
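For what it is worth, vcvtsi2sd is the VEX-encoded (AVX) form of cvtsi2sd, and the cpu flags listed above contain no avx entry. A minimal sketch, assuming GCC or Clang on x86, of a startup check that reports the mismatch before any library code runs. The function names cpu_supports_avx and avx_diagnostic are hypothetical, and the check only detects the problem; it does not make an AVX-built library usable on such a CPU.

```cpp
#include <string>

// Returns true if the running CPU reports AVX support.
// On non-x86 targets or other compilers this sketch simply returns false.
bool cpu_supports_avx()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                        // initialise the CPU feature data
    return __builtin_cpu_supports("avx") != 0;   // GCC/Clang builtin
#else
    return false;
#endif
}

// Hypothetical helper producing a message suitable for logging at startup.
std::string avx_diagnostic()
{
    return cpu_supports_avx()
        ? "AVX: supported by this CPU"
        : "AVX: not supported; code built with AVX instructions will raise SIGILL";
}
```

Calling avx_diagnostic() at the top of main and printing the result would show whether the machine can run AVX code at all.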

warning: 'assume_aligned' attribute directive ignored

I just started with C++ and I think the best way to learn is to look at source code. I have the following code in a header file.
#ifdef _MSC_VER
#define MYAPP_CACHE_ALIGNED_RETURN /* not supported */
#else
#define MYAPP_CACHE_ALIGNED_RETURN __attribute__((assume_aligned(64)))
#endif
I am using gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11) and it's quite old. I get this warning during compilation:
warning: 'assume_aligned' attribute directive ignored [-Wattributes]
How can I make the #if check more specific to fix the warning during compilation?
It seems that assume_aligned is not supported in RHEL's GCC. It was not backported to the upstream gcc-4_8-branch and is also unavailable in Ubuntu 14.04's GCC 4.8.4, so that is not surprising.
To emit a more user-friendly diagnostic you can do an explicit check for the GCC version in one of your headers:
#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 9)
# warning "Your version of GCC does not support 'assume_aligned' attribute"
#endif
But this may not work if your distro vendor has backported assume_aligned from upstream (which is not the case for Red Hat and Ubuntu, but who knows about other distros). The most robust way to check this would be a build-time test in a configure script or in the Makefile:
CFLAGS += $(shell echo 'void* my_alloc1() __attribute__((assume_aligned(16)));' | gcc -x c - -c -o /dev/null -Werror && echo -DHAS_ASSUME_ALIGNED)
This will define the HAS_ASSUME_ALIGNED macro on the compiler command line if the attribute is supported by the compiler.
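Note that you can achieve a similar effect with the __builtin_assume_aligned function:
void foo() {
double *p = func_that_misses_assume_align();
// __builtin_assume_aligned returns void*, so C++ needs an explicit cast
p = static_cast<double*>(__builtin_assume_aligned(p, 128));
// ...
}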
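Folding the version check into the header macro from the question could look like the sketch below. It is only an illustration under stated assumptions: it prefers __has_attribute where the compiler provides it (newer GCC and Clang), falls back to the GCC 4.9 version test, and otherwise defines the macro as empty so that no warning is emitted. The allocator myapp_cache_alloc is hypothetical and uses POSIX posix_memalign.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign, free (POSIX; an assumption of this sketch)

// Use the attribute only where the compiler is known to accept it.
#if defined(__has_attribute)
#  if __has_attribute(assume_aligned)
#    define MYAPP_CACHE_ALIGNED_RETURN __attribute__((assume_aligned(64)))
#  endif
#elif defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9))
#  define MYAPP_CACHE_ALIGNED_RETURN __attribute__((assume_aligned(64)))
#endif
#ifndef MYAPP_CACHE_ALIGNED_RETURN
#  define MYAPP_CACHE_ALIGNED_RETURN /* not supported */
#endif

// Hypothetical allocator whose result is always 64-byte aligned (or null).
void* myapp_cache_alloc(std::size_t bytes) MYAPP_CACHE_ALIGNED_RETURN;

void* myapp_cache_alloc(std::size_t bytes)
{
    void* p = nullptr;
    if (posix_memalign(&p, 64, bytes) != 0)
        return nullptr;   // allocation failed
    return p;
}
```

On GCC 4.8 the macro expands to nothing, so the declaration compiles cleanly and only the optimisation hint is lost.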

float.h not found or included properly when using mingw-w64

I'm trying to cross-compile a piece of code which uses float.h to set some FPU sizes.
The particular piece of code that requires it is:
#ifdef SINGLE
_control87(_PC_24, _MCW_PC); /* Set FPU control word for single precision. */
#else /* not SINGLE */
_control87(_PC_53, _MCW_PC); /* Set FPU control word for double precision. */
#endif /* not SINGLE */
When I compile, however, I get the error
/home/rcrozier/src/xfemm-hg/mfemm/../cfemm/fmesher/triangle.c:4922:14: error: '_PC_53' undeclared (first use in this function)
_control87(_PC_53, _MCW_PC); /* Set FPU control word for double precision. */
Another person explains what seems to be the same problem in more detail here. There is also a very similar issue described in a (rather old) thread here. In case it's relevant, I'm using mingw-w64, but via the M Cross Environment (MXE).
What exactly is the problem with float.h in this case, and is there a workaround?
EDIT: Verbose output from gcc
Using built-in specs.
COLLECT_GCC=/opt/mxe/usr/bin/x86_64-w64-mingw32.shared-gcc
Target: x86_64-w64-mingw32.shared
Configured with: /opt/mxe/tmp-gcc-x86_64-w64-mingw32.shared/gcc-4.9.4/configure --target=x86_64-w64-mingw32.shared --build=x86_64-unknown-linux-gnu --prefix=/opt/mxe/usr --libdir=/opt/mxe/usr/lib --enable-languages=c,c++,objc,fortran --enable-version-specific-runtime-libs --with-gcc --with-gnu-ld --with-gnu-as --disable-nls --disable-multilib --without-x --disable-win32-registry --enable-threads=win32 --enable-libgomp --with-gmp=/opt/mxe/usr/x86_64-unknown-linux-gnu --with-isl=/opt/mxe/usr/x86_64-unknown-linux-gnu --with-mpc=/opt/mxe/usr/x86_64-unknown-linux-gnu --with-mpfr=/opt/mxe/usr/x86_64-unknown-linux-gnu --with-cloog=/opt/mxe/usr/x86_64-unknown-linux-gnu --with-as=/opt/mxe/usr/bin/x86_64-w64-mingw32.shared-as --with-ld=/opt/mxe/usr/bin/x86_64-w64-mingw32.shared-ld --with-nm=/opt/mxe/usr/bin/x86_64-w64-mingw32.shared-nm
Thread model: win32
gcc version 4.9.4 (GCC)
COLLECT_GCC_OPTIONS='-c' '-I' '../cfemm/fmesher' '-I' '../cfemm/libfemm' '-I' '../cfemm/libfemm/liblua' '-I' '/usr/local/MATLAB/R2015a/extern/include' '-I' '/usr/local/MATLAB/R2015a/simulink/include' '-D' 'MATLAB_MEX_FILE' '-std=c99' '-D' '_GNU_SOURCE' '-fexceptions' '-fPIC' '-fno-omit-frame-pointer' '-pthread' '-v' '-fpermissive' '-D' 'CPU86' '-D' 'MX_COMPAT_32' '-O' '-D' 'NDEBUG' '-o' '/home/rcrozier/src/xfemm-hg/mfemm/../cfemm/fmesher/triangle.o' '-mtune=generic' '-march=x86-64'
/opt/mxe/usr/libexec/gcc/x86_64-w64-mingw32.shared/4.9.4/cc1 -quiet -v -I ../cfemm/fmesher -I ../cfemm/libfemm -I ../cfemm/libfemm/liblua -I /usr/local/MATLAB/R2015a/extern/include -I /usr/local/MATLAB/R2015a/simulink/include -D_REENTRANT -U_REENTRANT -D MATLAB_MEX_FILE -D _GNU_SOURCE -D CPU86 -D MX_COMPAT_32 -D NDEBUG /home/rcrozier/src/xfemm-hg/mfemm/../cfemm/fmesher/triangle.c -quiet -dumpbase triangle.c -mtune=generic -march=x86-64 -auxbase-strip /home/rcrozier/src/xfemm-hg/mfemm/../cfemm/fmesher/triangle.o -O -std=c99 -version -fexceptions -fPIC -fno-omit-frame-pointer -fpermissive -o /tmp/ccMkwwWD.s
cc1: warning: command line option '-fpermissive' is valid for C++/ObjC++ but not for C
GNU C (GCC) version 4.9.4 (x86_64-w64-mingw32.shared)
compiled by GNU C version 4.8.4, GMP version 6.1.1, MPFR version 3.1.4, MPC version 1.0.2
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/opt/mxe/usr/lib/gcc/x86_64-w64-mingw32.shared/4.9.4/../../../../x86_64-w64-mingw32.shared/sys-include"
#include "..." search starts here:
#include <...> search starts here:
../cfemm/fmesher
../cfemm/libfemm
../cfemm/libfemm/liblua
/usr/local/MATLAB/R2015a/extern/include
/usr/local/MATLAB/R2015a/simulink/include
/opt/mxe/usr/lib/gcc/x86_64-w64-mingw32.shared/4.9.4/include
/opt/mxe/usr/lib/gcc/x86_64-w64-mingw32.shared/4.9.4/include-fixed
/opt/mxe/usr/lib/gcc/x86_64-w64-mingw32.shared/4.9.4/../../../../x86_64-w64-mingw32.shared/include
End of search list.
EDIT: more info
I also get the same result if I use the full directory path to the mingw-w64 float.h like so:
//#include <float.h>
#include "/opt/mxe/usr/x86_64-w64-mingw32.static/include/float.h"
EDIT more info on code structure
To give some further information, I am actually compiling a C library (header and C file) where the declaration of the function I'm using is wrapped in extern "C". The actual declaration from the header file is shown below:
#ifdef __cplusplus
extern "C" {
#endif
#ifdef ANSI_DECLARATORS
int triangulate(char *, struct triangulateio *, struct triangulateio *,
struct triangulateio *, int (*TriMessage)(const char * format, ...));
void trifree(VOID *memptr);
#else /* not ANSI_DECLARATORS */
int triangulate();
void trifree();
#endif /* not ANSI_DECLARATORS */
#ifdef __cplusplus
}
#endif
The actual library I'm using is Triangle. The float.h include is in triangle.c, and looks like this:
#ifdef CPU86
//#include <float.h>
#include "/opt/mxe/usr/x86_64-w64-mingw32.static/include/float.h"
#endif /* CPU86 */
#ifdef LINUX
#include <fpu_control.h>
#endif /* LINUX */
Where you define CPU86 or LINUX at compile time. For the cross build, I'm defining CPU86.
OK, after some searching around I found this in triangle.c:
/* On some machines, my exact arithmetic routines might be defeated by the */
/* use of internal extended precision floating-point registers. The best */
/* way to solve this problem is to set the floating-point registers to use */
/* single or double precision internally. On 80x86 processors, this may */
/* be accomplished by setting the CPU86 symbol for the Microsoft C */
/* compiler, or the LINUX symbol for the gcc compiler running on Linux. */
/* */
Note that it says "On 80x86 processors". The target you're compiling for, 64-bit Windows (x86_64), does not match that.
This is further supported by the official documentation from Microsoft about the extension that's used by the library you're trying to compile:
Mask
_MCW_PC (Precision control)
(Not supported on ARM or x64 platforms.)
[..]
_PC_24 (24 bits)
_PC_53 (53 bits)
_PC_64 (64 bits)
[..]
Thus I guess you need to configure your build differently, probably not defining CPU86. Though I don't know whether this really solves your issue, or just leads to wrong results. After all ... is this library even ported to 64 bit?
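One way to keep CPU86 defined for 32-bit builds while skipping the call elsewhere is to guard on the target and on the macros themselves. This is a sketch only: the function names are hypothetical, and it rests on the assumption that 64-bit x86 code normally performs double arithmetic in SSE2 registers, so the x87 precision setting should not matter there. Whether Triangle's exact arithmetic stays correct in that configuration is not confirmed by anything above.

```cpp
#include <float.h>

// The x87 precision-control path is taken only for 32-bit x86 Windows targets
// whose <float.h> actually provides the MSVC-style macros.
#if defined(_WIN32) && (defined(_M_IX86) || defined(__i386__)) \
    && defined(_PC_53) && defined(_MCW_PC)
#  define MYAPP_HAVE_X87_PRECISION_CONTROL 1
#else
#  define MYAPP_HAVE_X87_PRECISION_CONTROL 0
#endif

// Hypothetical query: is the precision-control call compiled in?
constexpr bool fpu_precision_control_available()
{
    return MYAPP_HAVE_X87_PRECISION_CONTROL != 0;
}

// Hypothetical wrapper: sets 53-bit precision where supported, otherwise does nothing.
// Returns true if the control word was changed.
bool set_fpu_double_precision()
{
#if MYAPP_HAVE_X87_PRECISION_CONTROL
    _control87(_PC_53, _MCW_PC);
    return true;
#else
    return false;
#endif
}
```

On the x86_64 cross build from the question, _PC_53 is not declared, so the wrapper compiles to a no-op instead of failing.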

Why is __ARM_FEATURE_CRC32 not being defined by the compiler?

I've been working on this issue for some time now, and I hope someone can point out my mistake. I guess I can no longer see the forest for the trees.
I have a LeMaker HiKey dev board I use for testing. It's AArch64, so it has NEON and the other CPU features like AES, SHA and CRC32:
$ cat /proc/cpuinfo
Processor : AArch64 Processor rev 3 (aarch64)
...
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
...
When I attempt to compile a program:
$ cat test.cxx
#if (defined(__ARM_NEON__) || defined(__ARM_NEON))
# define NEON_INTRINSICS_AVAILABLE 1
#else
# define NEON_INTRINSICS_AVAILABLE 0
#endif
#if NEON_INTRINSICS_AVAILABLE
# include <arm_neon.h>
# if defined(__ARM_FEATURE_CRC32) || (__ARM_ACLE >= 200)
# include <arm_acle.h>
# endif
#endif
#include <stdint.h>
int main(int argc, char* argv[])
{
uint32_t crc = 0;
crc = __crc32b(crc, (uint8_t)0);
return 0
}
It results in the following:
$ g++ test.cxx -o test.exe
test.cxx: In function ‘int main(int, char**)’:
test.cxx:20:33: error: ‘__crc32b’ was not declared in this scope
crc = __crc32b(crc, (uint8_t)0);
^
test.cxx:22:1: error: expected ‘;’ before ‘}’ token
}
^
$ clang++ test.cxx -o test.exe
test.cxx:20:9: error: use of undeclared identifier '__crc32b'
crc = __crc32b(crc, (uint8_t)0);
^
test.cxx:21:11: error: expected ';' after return statement
return 0
^
;
2 errors generated.
A grep of the file system reveals arm_acle.h is in fact the header:
$ grep -IR '__crc32' /usr/lib
/usr/lib/gcc/.../include/arm_acle.h:__crc32b (uint32_t __a, uint8_t __b)
...
And according to ARM® C Language Extensions, Section 9.7 CRC32 Intrinsics, the missing symbols are supposed to be present when __ARM_FEATURE_CRC32 is defined. Inspecting arm_acle.h confirms it.
For completeness, I tried compiling with -march=native, but the compiler rejected it.
Why is __ARM_FEATURE_CRC32 not being defined by the compiler?
What can I do to get the program to compile with the native features available on the board?
$ gcc --version
gcc (Debian/Linaro 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ clang --version
Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)
Target: aarch64-unknown-linux-gnu
Thread model: posix
$ g++ -dM -E - </dev/null | egrep -i '(arm|neon|acle)'
#define __ARM_NEON 1
$ clang++ -dM -E - </dev/null | egrep -i '(arm|neon|acle)'
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xe
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_FP_FENV_ROUNDING 1
#define __ARM_NEON 1
#define __ARM_NEON_FP 0xe
#define __ARM_PCS_AAPCS64 1
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_SIZEOF_WCHAR_T 4
As for why this feature isn't enabled by default: it is an optional feature not present in the baseline architecture that your compiler targets, i.e. the binaries that your compiler produces are expected to be able to run on devices lacking the CRC feature.
At least for gcc, you can enable this feature with the -march modifier crc, like this:
$ gcc -dM -E - -march=armv8-a+crc < /dev/null | egrep -i '(arm|neon|acle|crc)'
#define __ARM_FEATURE_CRC32 1
#define __ARM_NEON 1
See https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/AArch64-Options.html (or the same page for older gcc versions) for more docs on how to set this.
I guess one could expect -march=native to do the same, but that option currently only seems to be implemented for x86 architectures.
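Since the macro is only defined when the crc extension is enabled, portable code can use it to choose between the intrinsic and a software fallback. The sketch below is an illustration, not code from the question: the names crc32_update_byte and crc32_ieee are hypothetical, and the fallback is a plain bitwise CRC-32 over the reflected polynomial 0xEDB88320, which is the polynomial the __crc32b intrinsic implements.

```cpp
#include <cstddef>
#include <cstdint>

// arm_acle.h only declares the CRC intrinsics when the feature is enabled,
// e.g. with -march=armv8-a+crc.
#if defined(__ARM_FEATURE_CRC32)
#  include <arm_acle.h>
#endif

// Hypothetical helper: folds one byte into a running CRC-32 value.
inline std::uint32_t crc32_update_byte(std::uint32_t crc, std::uint8_t b)
{
#if defined(__ARM_FEATURE_CRC32)
    return __crc32b(crc, b);                       // hardware CRC32B instruction
#else
    crc ^= b;
    for (int i = 0; i < 8; ++i)                    // bitwise software fallback
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
#endif
}

// Hypothetical helper: standard CRC-32 (initial value and final XOR of 0xFFFFFFFF).
inline std::uint32_t crc32_ieee(const void* data, std::size_t len)
{
    const std::uint8_t* p = static_cast<const std::uint8_t*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i)
        crc = crc32_update_byte(crc, p[i]);
    return crc ^ 0xFFFFFFFFu;
}
```

Built without the crc modifier, the same source still compiles and produces identical results through the fallback path.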