compiling error while using sse4.2 function on intel machine - c++

I am trying to use the intrinsic function _mm_crc32_u32 on my Intel Xeon(R) CPU E5-2650 v2 machine.
I compile the project with the SSE4.2 flag enabled (inside the makefile):
CCFLAGS += -msse4.2
but I still get the error:
nmmintrin.h:31:3: error: #error "SSE4.2 instruction set not enabled"
Any ideas why this might still happen?
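The #error in nmmintrin.h fires when the __SSE4_2__ macro is not defined, which suggests that -msse4.2 is not reaching the compiler for this translation unit. If the makefile relies on GNU make's built-in rules, C++ sources pick up CXXFLAGS, so it is worth checking that CCFLAGS is really part of the compile command. The following is a minimal sketch, not the asker's code: it reports whether SSE4.2 was enabled at compile time and falls back to a software CRC-32C when it was not. The helper names (sse42_enabled_at_compile_time, crc32c_update_u32, crc32c_update_u8, crc32c) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#ifdef __SSE4_2__
#include <nmmintrin.h>  // only included when -msse4.2 (or equivalent) is active
#endif

// True when the compiler was invoked with SSE4.2 enabled for this file.
constexpr bool sse42_enabled_at_compile_time() {
#ifdef __SSE4_2__
    return true;
#else
    return false;
#endif
}

// One-byte CRC-32C update: hardware instruction if available, else bitwise.
inline std::uint32_t crc32c_update_u8(std::uint32_t crc, std::uint8_t byte) {
#ifdef __SSE4_2__
    return _mm_crc32_u8(crc, byte);
#else
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));  // reflected Castagnoli polynomial
    return crc;
#endif
}

// Four-byte CRC-32C update, matching what _mm_crc32_u32 computes.
inline std::uint32_t crc32c_update_u32(std::uint32_t crc, std::uint32_t word) {
#ifdef __SSE4_2__
    return _mm_crc32_u32(crc, word);
#else
    crc ^= word;
    for (int bit = 0; bit < 32; ++bit)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
#endif
}

// CRC-32C over a buffer, using the usual initial value and final inversion.
inline std::uint32_t crc32c(const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    std::size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        // Assemble the word in little-endian byte order, independent of the host.
        const std::uint32_t word = static_cast<std::uint32_t>(p[i]) |
                                   (static_cast<std::uint32_t>(p[i + 1]) << 8) |
                                   (static_cast<std::uint32_t>(p[i + 2]) << 16) |
                                   (static_cast<std::uint32_t>(p[i + 3]) << 24);
        crc = crc32c_update_u32(crc, word);
    }
    for (; i < len; ++i)
        crc = crc32c_update_u8(crc, p[i]);
    return crc ^ 0xFFFFFFFFu;
}
```

Printing sse42_enabled_at_compile_time() from the failing build is a quick way to see whether the flag was applied; crc32c("123456789", 9) should give the standard CRC-32C check value 0xE3069283 on either path.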

Related

Cross compile C++ for ARM64/x86_64, using clang, with core2-duo enabled

OK, so I am new to cross compilation.
I am writing some shell-scripts to compile some C++ files, on my Mac.
I want to build a "Fat universal binary", so I want this to work for Arm64 and x86_64.
After a lot of searching, I found using: --arch arm64 --arch x86_64 will solve my problem of cross compilation.
However, my old "optimisation flags" conflicted with this. I used them to make my code run faster, for a computer game I was making. Well... these are the flags:
-march=core2 -mfpmath=sse -mmmx -msse -msse2 -msse4.1 -msse4.2
Unfortunately... clang can't figure out that I mean to use these for the Intel build only. I get this error message:
clang: error: the clang compiler does not support '-march=core2'
If I remove --arch arm64 --arch x86_64 the code compiles again.
I tried various things like --arch x86_64+sse4 which seem allowed by the gcc documentation, but clang does not recognise them.
As far as I know, gcc/clang do not emit SSE4.2 instructions by default, despite CPUs that support them having been released about 17 years ago. This is quite a bad assumption, I think.
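Whatever flags are used, each slice of a universal binary is compiled separately, so the source can branch on the target architecture with predefined macros. The sketch below assumes the usual compiler-defined macros (__x86_64__, __aarch64__, __SSE4_2__); the function names are hypothetical. Apple's clang driver also has an -Xarch_<arch> option intended for passing a flag to one slice only (for example -Xarch_x86_64 -msse4.2); whether it accepts -march=core2 on a given toolchain version is an assumption to verify.

```cpp
#include <string>

// Pick a label for the slice currently being compiled.
#if defined(__x86_64__) || defined(_M_X64)
#define BUILD_ARCH_NAME "x86_64"
#elif defined(__aarch64__) || defined(_M_ARM64)
#define BUILD_ARCH_NAME "arm64"
#else
#define BUILD_ARCH_NAME "other"
#endif

// Name of the architecture this translation unit was compiled for.
inline std::string build_arch() {
    return BUILD_ARCH_NAME;
}

// True only when this slice was compiled with SSE4.2 enabled.
inline bool built_with_sse42() {
#ifdef __SSE4_2__
    return true;
#else
    return false;
#endif
}
```

SSE-specific code can then sit behind #ifdef __SSE4_2__, so the arm64 slice never sees it.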

Enable AVX intrinsics in GCC 4.8.2 without -mavx

Is there any way to get the AVX intrinsics included via x86intrin.h in GCC 4.8.2 without turning on -mavx for the whole compilation unit? I only want to use them in a function that I've marked target("avx"). More recent versions of gcc seem to do this correctly. But with 4.8.2, even if I manually #define __AVX__ (ew), it complains about the __builtin* functions not being defined...
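For reference, this is roughly what the per-function approach looks like on compilers where it works (the question notes that newer gcc versions handle it). It is a sketch under that assumption, not a fix for 4.8.2: the AVX path is marked target("avx"), selected at run time with __builtin_cpu_supports, and a scalar loop serves as the fallback. The function names sum_avx, sum_scalar and sum_floats are hypothetical.

```cpp
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX_TARGET 1

// Only this function is compiled with AVX enabled; the rest of the file is not.
__attribute__((target("avx")))
static float sum_avx(const float* p, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(p + i));  // unaligned load, 8 floats
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (float v : lanes)
        total += v;
    for (; i < n; ++i)  // remaining elements that do not fill a vector
        total += p[i];
    return total;
}
#endif

// Portable fallback used when AVX is unavailable.
static float sum_scalar(const float* p, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}

// Dispatch at run time so the binary still runs on CPUs without AVX.
float sum_floats(const float* p, std::size_t n) {
#ifdef HAVE_AVX_TARGET
    if (__builtin_cpu_supports("avx"))
        return sum_avx(p, n);
#endif
    return sum_scalar(p, n);
}
```

The run-time check matters because a target("avx") function executes AVX instructions regardless of the flags used for the rest of the file.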

Performance comparison issue between OpenMPI and Intel MPI

I am working with a C++ MPI code which takes 1 min 12 s when compiled with OpenMPI and 16 s with Intel MPI (I have tested it on other inputs too, and the difference is similar; both builds give the correct answer). I want to understand why there is such a big difference in run time, and what can be done to decrease the run time with OpenMPI (GCC).
I am using CentOS 6 with an Intel Haswell processor.
I am using the following flags for compiling.
openMPI (GCC): mpiCC -Wall -O3
I have also tried -march=native -funroll-loops. It does not make a great difference. I have also tried -lm option. I cannot compile for 32 bit.
Intel MPI: mpiicpc -Wall -O3 -xhost
-xhost saves 3 seconds in run time.

GNU Fortran compiler optimisation flags for Ivy Bridge architecture

May I please ask for your suggestions on the GNU Fortran compiler (v6.3.0) flags to optimise the code for the Ivy Bridge architecture (Intel Xeon CPU E5-2697v2 Ivy Bridge @ 2.7 GHz)?
At the moment I’m compiling the code with the following flags:
-O3 -march=ivybridge -mtune=ivybridge -ffast-math -mavx -m64 -w
Unless you use intrinsics specific to Ivy Bridge, the Sandy Bridge flag is sufficient. I expect you should find some advantage by additionally setting -funroll-loops --param max-unroll-times=2
Sometimes -O2 -ftree-vectorize will work out better than -O3.
If you have complex data types, you will want to compare against -fno-cx-limited-range, as the default under -ffast-math may be too aggressive.

Successful compilation of SSE instruction with qmake (but SSE2 is not recognized)

I'm trying to compile and run my code, migrated from Unix to Windows. My code is pure C++ and does not use Qt classes. It works fine on Unix.
I'm also using Qt Creator as an IDE and qmake.exe with -spec win32-g++ for compiling. As I have SSE instructions in my code, I have to include the emmintrin.h header.
I added:
QMAKE_FLAGS_RELEASE += -O3 -msse4.1 -mssse3 -msse3 -msse2 -msse
QMAKE_CXXFLAGS_RELEASE += -O3 -msse4.1 -mssse3 -msse3 -msse2 -msse
to the .pro file. The code compiles without errors, but at run time it fails in some functions that use __m128 or similar types.
When I open emmintrin.h, I see:
#ifndef __SSE2__
# error "SSE2 instruction set not enabled"
#else
and it is undefined after #else.
I don't know how to enable SSE on my computer.
Platform: Windows Vista
System type: 64-bit
Processor: Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz
Does anyone know the solution?
Thanks in advance.
It sounds like your data is not 16-byte aligned, which is a requirement for SSE loads such as _mm_load_ps. You can either:
use _mm_loadu_ps as a temporary workaround. On newer CPUs the performance hit for misaligned loads such as this is fairly small (on older CPUs it's much more significant), but it should still be avoided if possible
or
fix your memory alignment. On Windows/Visual Studio you can use the __declspec(align(16)) attribute for static allocations or _aligned_malloc for dynamic allocations. For gcc and most other civilised platforms/compilers use __attribute__ ((aligned(16))) for the former and posix_memalign for the latter.
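The alignment options above can be sketched as follows. This is an illustration under assumptions, not the asker's code: the helper names (is_aligned16, alloc_floats_aligned16, free_floats_aligned16, g_static_buffer) are hypothetical, and the standard C++11 alignas specifier stands in for the compiler-specific attributes for static storage.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign on POSIX systems
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc / _aligned_free
#endif

// True when the pointer is suitable for aligned SSE loads such as _mm_load_ps.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Dynamic allocation with 16-byte alignment; returns nullptr on failure.
inline float* alloc_floats_aligned16(std::size_t count) {
#if defined(_WIN32)
    return static_cast<float*>(_aligned_malloc(count * sizeof(float), 16));
#else
    void* p = nullptr;
    if (posix_memalign(&p, 16, count * sizeof(float)) != 0)
        return nullptr;
    return static_cast<float*>(p);
#endif
}

// Must match the allocator used above.
inline void free_floats_aligned16(float* p) {
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}

// Static storage with 16-byte alignment.
alignas(16) static float g_static_buffer[4] = {1.0f, 2.0f, 3.0f, 4.0f};
```

Checking is_aligned16 on the pointer just before the failing load is a cheap way to confirm whether alignment is the cause of the crash.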