Processor optimization flags in OpenCV - c++

I'm building an application that uses OpenCV and will run on a variety of Windows computers (Win7, Win8, Win10).
Now I have discovered that my application crashes randomly on some computers. After a lot of googling I have realized that enabling SSE3 in OpenCV can cause Illegal Instruction crashes on processors that don't support SSE3.
http://answers.opencv.org/question/18001/illegal-instruction-when-running-any-compiled-opencv-demo-binary-sse3-flag/
https://bugs.launchpad.net/linuxmint/+bug/1258259
So this is my question: do any of you know which processor flags are "safe"? I understand what they do, but I don't know how common it is for a processor to support, for instance, SSE42.
In other words: Which of these flags do you think I should disable when I compile OpenCV?
OCV_OPTION:
ENABLE_SSE
ENABLE_SSE2
ENABLE_SSE3
ENABLE_SSSE3
ENABLE_SSE41
ENABLE_SSE42
ENABLE_POPCNT
ENABLE_AVX
ENABLE_AVX2
ENABLE_FMA3
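For reference, you can check what a particular machine actually supports at runtime with OpenCV's cv::checkHardwareSupport(). Below is a minimal sketch, assuming OpenCV 3.x headers (the CV_CPU_* constants correspond to the build options above); running it on the least capable machines you target shows which options are risky:

#include <opencv2/core/utility.hpp>
#include <cstdio>

int main ()
{
    // Each entry pairs a build option with the runtime feature flag OpenCV checks.
    const struct { const char* name; int feature; } features[] = {
        { "SSE",    CV_CPU_SSE    },
        { "SSE2",   CV_CPU_SSE2   },
        { "SSE3",   CV_CPU_SSE3   },
        { "SSSE3",  CV_CPU_SSSE3  },
        { "SSE4.1", CV_CPU_SSE4_1 },
        { "SSE4.2", CV_CPU_SSE4_2 },
        { "POPCNT", CV_CPU_POPCNT },
        { "AVX",    CV_CPU_AVX    },
        { "AVX2",   CV_CPU_AVX2   },
        { "FMA3",   CV_CPU_FMA3   }
    };
    for ( const auto& f : features )
        std::printf ( "%-7s %s\n", f.name,
                      cv::checkHardwareSupport ( f.feature ) ? "supported" : "NOT supported" );
    return 0;
}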

Related

How Can I Assemble ARM and Flash to STM32 in Linux?

This term I have a Microprocessors course, and we're working on ARM development with C/C++ and assembly.
For a while I've been looking for an alternative to Keil uVision that is compatible with Linux distributions (I'm currently using Arch) and can assemble ARM code and flash it, but I could not find anything. The closest match was Eclipse, but it does not appear to support ARM assembly, and I found nothing about flashing to STM32.
I don't want to work on Windows for ARM development; is there any way to assemble ARM code and flash it from Linux?
Very simple. Install STM32CubeIDE for Linux and get a Nucleo board with your preferred STM32 microcontroller. Follow the tutorials online.
Be aware that Keil uses ARM's own compiler, version 5 or 6 (current releases of Keil MDK support both: v5 is ARM's legacy ARMCC, while v6 is based on Clang/LLVM). If you are following a course, and the course material is based on a different toolchain, you may encounter difficulties - or worse, your tutor may not be able to mark your work. Just a consideration before you go off-piste.
Linux solutions are likely to be GNU toolchain based. An ARM GNU toolchain for Cortex-M can be found at: https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm.
Flashing an STM32 may be done either through a JTAG/SWD debugger, usually using OpenOCD, or via the on-chip bootloader using a tool such as http://manpages.ubuntu.com/manpages/bionic/man1/stm32flash.1.html. Your hardware debugger vendor may have their own Linux driver, so it is worth checking. Mbed-compatible boards present as a USB mass-storage device and can be flashed simply by copying the image file to that drive.
Building and flashing on Linux is only half the battle, however; you will presumably want to debug your code too. GDB with OpenOCD or a proprietary driver will of course work, but raw GDB is not a pleasant experience, and you might want a more "visual" debug solution. IDEs such as STM32CubeIDE integrate the toolchain, flashing and debugging - but they are specific to STM32.

How can we distribute compiled source code if it is specific to the hardware it was compiled on?

Suppose we take a compiled language, for example C++, and an example framework, say Qt. Qt has its source code publicly available and offers binary downloads so that users can just use its API. My question is: when they compiled their code, it was compiled for their specific hardware, operating system, and so on. I understand that much software requires recompilation for different operating systems (including 32-bit vs 64-bit) and offers multiple downloads on its website, but why does this not go even further, making binaries hardware-specific as well and making the redistribution of compiled executables extremely frustrating to produce?
Code gets compiled for a target base CPU (e.g. 32-bit x86, x86_64, or ARM), but not necessarily for a specific processor like the Core i9-10900K. By default, the compiler typically generates code that runs on the widest range of processors, and Intel and AMD guarantee forward compatibility for running that code on newer processors. Compilers often offer switches for optimizing for newer processors with new instruction sets, but you rarely use them, since not all your customers have that configuration. Or perhaps you build your code twice (once for older processors, and an optimized build for newer processors).
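To make the "optimized build for newer processors" idea concrete, here is a minimal sketch of runtime dispatch using MSVC's __cpuid intrinsic. The two process_* functions are hypothetical stand-ins for your own code paths, and the bit position is the SSE4.2 flag in the CPUID leaf-1 feature bits:

#include <intrin.h>
#include <cstdio>

// Hypothetical stand-ins: a generic path and an SSE4.2-optimized path.
void process_generic () { std::puts ( "generic code path" ); }
void process_sse42   () { std::puts ( "SSE4.2 code path" ); }

static bool cpu_has_sse42 ()
{
    int info[4] = { 0 };
    __cpuid ( info, 1 );                     // CPUID leaf 1: feature flags
    return ( info[2] & ( 1 << 20 ) ) != 0;   // ECX bit 20 = SSE4.2
}

int main ()
{
    if ( cpu_has_sse42 () )
        process_sse42 ();
    else
        process_generic ();
    return 0;
}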
There's also a concept called cross-compiling. That's where the compiler generates code for a completely different processor than the one it runs on. Such is the case when you build your iOS app on a Mac: the compiler itself is an x86_64 program, but it generates the ARM instruction set to run on the iPhone.
Code also gets compiled and linked against a certain set of OS APIs and external runtime libraries (including the C/C++ runtime). If you want your code to run on Windows 7 or Mac OS X Mavericks, you wouldn't statically link to an API that only exists on Windows 10 or macOS Big Sur. The code would compile, but it wouldn't run on the older operating systems. Instead, you'd do a workaround or conditionally load the API if it is available. Microsoft and Apple provide forward compatibility by keeping those same runtime library APIs available on later OS releases.
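As a sketch of the "conditionally load the API" approach on Windows: look the function up at runtime instead of linking against it, so the binary still loads on older releases. SetThreadDescription is used here purely as an example of an API that only exists on newer Windows versions:

#include <windows.h>
#include <cstdio>

// Signature of the newer API, so it can be called through a pointer.
typedef HRESULT ( WINAPI *SetThreadDescriptionFn )( HANDLE, PCWSTR );

int main ()
{
    HMODULE kernel32 = GetModuleHandleW ( L"kernel32.dll" );
    SetThreadDescriptionFn setDesc = kernel32
        ? reinterpret_cast<SetThreadDescriptionFn> ( GetProcAddress ( kernel32, "SetThreadDescription" ) )
        : nullptr;

    if ( setDesc )
        setDesc ( GetCurrentThread (), L"worker" );       // the API exists: use it
    else
        std::puts ( "SetThreadDescription not available; falling back" );
    return 0;
}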
Additionally, Windows supports running 32-bit processes on 64-bit chips and OS. Macs can even emulate x86_64 on their new ARM-based devices coming out later this year. But I digress.
As for Qt, they actually offer several pre-built configurations for their reference binary downloads, because, at least on Windows, the MSVCRT (the C runtime that ships with Visual Studio) is closely tied to the compiler version of Visual Studio. So they offer various downloads to match the configuration you want to build your code for (32-bit, 64-bit, VS2017, VS2019, etc.). When you put together a complete application with third-party dependencies, these build, linkage, and CPU/OS configurations all have to be accounted for.

cv::split() crashes on Xeon processor but works elsewhere

I am using pre-built opencv lib & dll, version 3.4.3 Winpack (downloaded from official site https://opencv.org/releases.html).
Till now everything worked fine, but recently my code started to crash.
It is one specific function that causes this crash: cv::split(), a common utility function for extracting channels from a cv::Mat array. The crash occurs only on a Xeon processor running Windows Server 2012. Regardless of preceding calls or context, it crashes immediately on this call and the application just closes.
On other processors the same .exe works without problems; the code is normally tested on Windows 10 with ordinary processors. I don't have a Xeon processor at hand to test every function, but the mentioned crash could be reproduced 100% on a Xeon Gold machine, and I have used quite a lot of other library functions that worked there, so this is the first one that crashed.
It seems that some functions' runtime code simply contains instructions that are incompatible with the Xeon processor, so it just crashes there.
Question: how do I know in advance whether a certain OpenCV function will work or not on a Xeon processor?
Currently I have just removed the cv::split() calls from my code and replaced them with cv::extractChannel(), which works fine on all tested platforms. I suspect one option would be to compile a custom version of the lib and disable specific instructions, but that needs knowledge of what to disable, etc., so frankly I am not in the mood to maintain a custom compiled version for what seems like a relatively 'standard' architecture (a Xeon processor).
What can you suggest to avoid these errors?
Maybe there is a list of OpenCV functions that are known to be 'special' (i.e. not safe on Xeon processors), so I can just avoid them?
Code example:
#include <opencv2/opencv.hpp>
int main ( int argc, char* argv[] )
{
    // Load the image unchanged (keeps the alpha channel if present)
    cv::Mat Patch = cv::imread ( "image.png", -1 );
    cv::Mat Patch_planes[4];
    // Crashes immediately on the Xeon machine, works elsewhere
    cv::split ( Patch, Patch_planes );
    return 0;
}
Compiler command (Microsoft (R) C/C++ Optimizing Compiler Version 19.15.26732.1 for x64):
cl.exe "minim.cpp" /EHsc /W2 /I "c:\VCLIB\openCV-3.4.3" "c:\VCLIB\openCV-3.4.3\lib\opencv_world343.lib" /link /SUBSYSTEM:CONSOLE
How do I know in advance whether a certain OpenCV function will work or not on a Xeon processor?
You don't. The compiler will use whatever instructions it deems most suitable to compile any particular piece of code, subject to the constraints given on the command line.
So to be safe (assuming it is an 'illegal instruction' error), you probably do need to compile OpenCV for the least capable processor you need to support and then check the performance hit on other processors. Either that, or check the CPU in your installer and install a version of OpenCV tailored to that processor. Yuk, I don't envy you.
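For completeness, the workaround mentioned in the question, replacing cv::split() with per-channel cv::extractChannel() calls, looks roughly like this. A sketch only: it sidesteps the crashing call path, but it is not a guarantee about any other function in the build:

#include <opencv2/opencv.hpp>
#include <vector>

int main ()
{
    cv::Mat Patch = cv::imread ( "image.png", -1 );
    std::vector<cv::Mat> planes ( Patch.channels () );

    // Extract each channel individually instead of calling cv::split().
    for ( int c = 0; c < Patch.channels (); ++c )
        cv::extractChannel ( Patch, planes[c], c );

    return 0;
}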

How to select processor(MIPS R2000) in g++?

What is the command for selecting processor(MIPS R2000) in g++? Thanks
You'll probably need a cross-compilation environment for your target platform. You might find an existing one or you may need to build your own cross-compiler using the gcc toolchain. There's no single way to do this - it will depend on the specifics of the target architecture. Specifically, is there already an operating system (e.g. Linux, BSD, etc.) running on your target system? What kind of userland does it use - your build chain will need the relevant C and C++ library as well as any other libraries you need to build and run your software. Or are you coding straight against the metal? In this case, you'll want to find existing bootstrap code for getting the system into a sensible state for running your code - rolling your own will not be easy.
Generally, you're probably best off finding an existing developer community centred around the platform in question and asking for advice there. They may have step-by-step instructions for getting started.
Note that the CPU alone is only part of the picture - for example, the ARM architecture is very popular, but compiling code for Android devices (Linux kernel with Android userland), iOS devices (xnu kernel with BSD- and OSX-derived iOS userland), a Nintendo DS or a Playstation Vita (probably no multitasking OS at all) will be extremely different, even though they all use ARM chips, in many cases even the same instruction set generation.

HD Photo source compile on ARM?

I've downloaded HD Photo Device Porting Kit 1.0 and successfully compiled and executed it on x86 PC.
I want to port the image viewer program to ARM-based Windows Mobile Smartphone, but there is some missing ARM code.
First, no "/image/x86/x86.h" equivalent header file for ARM. But the file is very simple, so I copied and renamed it to "arm.h" and successfully compiled and linked the source code.
But at runtime, a DWORD alignment exception occurs. I found that on the ARM build it seems ARMOPT_BITIO should be defined for properly aligned reads and writes. But with ARMOPT_BITIO, some I/O functions are missing, e.g. peekBits, getBits, flushToByte, flushBits.
I copied the x86 versions of these functions (peekBit16, flushBit16, etc.), but no luck; it does not work (I got a stack overflow error).
I can't debug the complex HD Photo source files. Please let me know where I can find the missing ARM code.
Any help would be much appreciated. Thanks!
Based on my experience of porting some Microsoft code to ARM Linux, I do not think there is an easy way around it, unless someone has ported it already. You'll have to dive into this sort of low-level debugging.
Bugs I encountered were mainly related to unaligned access and missing platform API calls. Also, incorrect preprocessor checks resulted in code thinking it was running on a big-endian platform.
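A typical instance of the unaligned-access class of bug, with the usual fix: the reinterpret_cast version often works on x86 but can fault on ARM when the pointer is not suitably aligned (the functions here are purely illustrative):

#include <cstring>
#include <cstdint>

// Reading a 32-bit value from an arbitrary (possibly unaligned) byte pointer.
std::uint32_t read_u32_unsafe ( const unsigned char* p )
{
    // May raise an alignment fault on ARM if p is not 4-byte aligned.
    return *reinterpret_cast<const std::uint32_t*> ( p );
}

std::uint32_t read_u32_safe ( const unsigned char* p )
{
    std::uint32_t v;
    std::memcpy ( &v, p, sizeof v );   // compilers emit an alignment-safe load here
    return v;
}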
The method I found useful for debugging in such a scenario is to build the code both for the target platform and for a platform where it is known to work, and debug/trace these builds in parallel using a number of use cases. This will catch the most severe bugs.