CUDA: check the nvcc "-arch" flag at run time - c++

Is there some way to call different kernels depending on whether, for example, nvcc -arch=sm_11 or nvcc -arch=sm_20 was used to compile the code? To be a bit more explicit:
if (FANCY_FLAG == CU_TARGET_COMPUTE_11) {
    // Do some conversions here..
    krnl1<<<GRID_DIM1, BLOCK_DIM1>>>(converted_value1);
} else if (FANCY_FLAG == CU_TARGET_COMPUTE_20) {
    krnl2<<<GRID_DIM2, BLOCK_DIM2>>>(value1);
}
As you can see, I found the CUjit_target_enum in cuda.h, but I wasn't able to find out whether nvcc defines any macros that would be equal to one of the enum's values.
My intention is to deal with not knowing whether my device supports double-precision floats or not. If it doesn't, I have to convert my data from double to float and, hence, run a different kernel (yes, I'd prefer to run the kernel with double precision over single precision wherever possible).
I'd also appreciate a completely different approach as long as it does the trick.

In the device code, check the __CUDA_ARCH__ macro value.
In the host code, check the major and minor fields of the device properties (cudaDeviceProp).
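A minimal sketch of both checks; the kernel names and launch dimensions are placeholders, and error checking is omitted:
#include <cuda_runtime.h>

__global__ void krnl1(float v)  { /* single-precision path */ }
__global__ void krnl2(double v) { /* double-precision path */ }

__device__ float helper(float x)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 200
    return x;  // branch compiled when targeting sm_20 and above
#else
    return x;  // branch compiled for older targets
#endif
}

void launch(double value1)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0
    // Double precision is available from compute capability 1.3 onwards
    if (prop.major > 1 || (prop.major == 1 && prop.minor >= 3)) {
        krnl2<<<128, 256>>>(value1);    // placeholder launch dimensions
    } else {
        // No native double support: convert and use the float kernel
        krnl1<<<128, 256>>>(static_cast<float>(value1));
    }
}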

Related

speeding up complex-number multiplication in C++

I have some code which multiplies complex numbers, and have noticed that __mulxc3 (the long double version of __muldc3) is being called frequently: i.e. the complex-number multiplications are not being inlined.
I am compiling with g++ version 7.5, with -O3 and -ffast-math.
It is similar to this question, except that the problem persists when I compile with -ffast-math. Since I do not need to check whether the arguments are Inf or NaN, I was considering writing my own very simple complex class without such checks to allow the multiplication to be inlined, but given my lack of C++ proficiency, and having read this article, I suspect that would be counterproductive.
So, is there a way to change either my code or my compilation process so that I can keep using std::complex, but inline the multiplication?
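For illustration, here is a minimal sketch of the kind of check-free helper the question is contemplating. fast_mul is a hypothetical name, and it is only valid when the inputs are known to be finite:
#include <complex>

// Multiplies component-wise, skipping the Inf/NaN fix-up that
// __muldc3/__mulxc3 perform. Assumes finite inputs.
template <typename T>
inline std::complex<T> fast_mul(const std::complex<T>& a,
                                const std::complex<T>& b)
{
    return { a.real() * b.real() - a.imag() * b.imag(),
             a.real() * b.imag() + a.imag() * b.real() };
}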

asin produces different answers on different platforms using Clang

#include <cmath>
#include <cstdio>
int main() {
    float a = std::asin(-1.f);
    printf("%.10f\n", a);
    return 0;
}
I ran the code above on multiple platforms using Clang, g++ and Visual Studio. They all gave me the same answer: -1.5707963705
If I run it on macOS using Clang, it gives me -1.5707962513.
Clang on macOS is supposed to use libc++, but does macOS have its own implementation of libc++?
If I run clang --version I get:
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin18.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
asin is implemented in libm, which is part of the standard C library, not the C++ standard library. (Technically, the C++ standard library includes the C library functions, but in practice both the GNU and the LLVM C++ library implementations rely on the underlying platform math library.) The three platforms -- Linux, OS X and Windows -- each have their own implementation of the math library, so if the library function is actually being called, it is certainly a different library function on each platform, and the results might differ in the last bit position (which is what your test shows).
However, it is quite possible that the library function is not being called in every case. This will depend on the compilers and the optimization options you pass them (and perhaps other options as well). Since the asin function is part of the standard library and thus has known behaviour, it is entirely legitimate for a compiler to compute the value of std::asin(-1.0F) at compile time, as it would for any other constant expression (like 1.0 + 1.0, which almost any compiler will constant-fold to 2.0 at compile time).
Since you don't mention what optimization settings you are using, it's hard to tell exactly what's going on, but I did a few tests with http://gcc.godbolt.org to get a basic idea:
GCC constant folds the call to asin without any optimisation flags, but it does not precompute the argument promotion in printf (which converts a to a double in order to pass it to printf) unless you specify at least -O1. (Tested with GCC 8.3).
Clang (7.0) calls the standard library function unless you specify at least -O2. However, if you explicitly call asinf, it constant folds at -O1. Go figure.
MSVC (v19.16) does not constant fold. It either calls a std::asin wrapper or directly calls asinf, depending on optimisation settings. I don't really understand what the wrapper does, and I didn't spend much time investigating.
Both GCC and Clang constant fold the expression to precisely the same binary value (0xBFF921FB60000000 as a double), which is the binary value -1.10010010000111111011011 (trailing zeros truncated).
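If you want to check the raw bit pattern on your own machine rather than relying on the decimal output, a small sketch:
#include <cinttypes>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float a = std::asin(-1.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &a, sizeof bits);  // safe type-punning
    std::printf("%.10f 0x%08" PRIx32 "\n", static_cast<double>(a), bits);
    return 0;
}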
Note that there is also a difference between the printf implementations on the three platforms (printf is also part of the platform C library). In theory, you could see a different decimal output from the same binary value, but since the argument to printf is promoted to double before printf is called and the promotion is precisely defined and not value-altering, it is extremely unlikely that this has any impact in this particular case.
As a side note, if you really care about the seventh decimal place, use double instead of float. Indeed, you should only use float in very specific applications in which precision is unimportant; the normal floating-point type is double.
The mathematically exact value of asin(-1) would be -pi/2, which of course is irrational and not possible to represent exactly as a float. The binary digits of pi/2 start with
1.1001001000011111101101010100010001000010110100011000010001101..._2
Your first three libraries round this (correctly) to
1.10010010000111111011011_2 = 1.57079637050628662109375_10
On macOS it appears to get truncated to:
1.10010010000111111011010_2 = 1.57079625129699707031250_10
This is an error of less than 1 ULP (unit in the last place). It could be caused by a different implementation, by the FPU being set to a different rounding mode, or perhaps by the compiler computing the value at compile time in some cases.
I don't think the C++ standard really gives any guarantees on the accuracy of transcendental functions.
If you have code which really depends on platform/hardware-independent accuracy, I suggest using a library such as MPFR. Otherwise, just live with the difference, or have a look at the source of the asin implementation that is called in each case.
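For reference, a minimal MPFR sketch (assuming you link with -lmpfr -lgmp); it computes asin(-1) correctly rounded to float precision:
#include <mpfr.h>

int main() {
    mpfr_t x;
    mpfr_init2(x, 24);              // 24-bit significand, like IEEE float
    mpfr_set_si(x, -1, MPFR_RNDN);
    mpfr_asin(x, x, MPFR_RNDN);     // correctly rounded result
    mpfr_printf("%.10Rf\n", x);
    mpfr_clear(x);
    return 0;
}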

-Wtype-limits on attempt to limit an unsigned integer

Consider the following example:
unsigned short c = // ...
if (c > 0xfffful)
c = 0xfffful;
Since unsigned short can actually be larger than 16 bits, I want to clamp the value before snprintf-ing it in hex format into a fixed-size buffer.
However, GCC (but not clang) gives a warning: comparison is always false due to limited range of data type [-Wtype-limits].
Is it a bug in GCC, or did I miss something? I understand that on my machine unsigned short is exactly 16 bits, but that's not guaranteed to be so on other platforms.
I'd say it is not a bug. GCC is claiming that if (c > 0xfffful) will always be false, which, on your machine, is true. GCC was smart enough to catch this; Clang wasn't. Good job, GCC!
On the other hand, GCC was not smart enough to notice that while it is always false on your machine, it's not necessarily always false on someone else's machine. Come on, GCC!
Note that in C++11, the uint_leastN_t types appear (I reserve the right to be proven wrong!) to be implemented via typedef. By the time GCC is running its warning checks, it likely has no clue that the original data type was uint_least16_t. If that is the case, the compiler would have no way of inferring that the comparison might be true on other systems. Changing GCC to remember what the original data type was might be extremely difficult. I'm not defending GCC's naive warning, but suggesting why it might be hard to fix.
I'd be curious to see what the GCC guys say about this. Have you considered filing an issue with them?
This doesn't seem like a bug (maybe it could be deemed a slightly naive feature), but I can see why you'd want this code there for portability.
In the absence of any standard macros to tell you what the size of the type is on your platform (and there aren't any), I would probably have a step in my build process that works that out and passes it to your program as a -D definition.
e.g. in Make:
if ...
CFLAGS += -DTRUNCATE_UINT16_LEAST_T
endif
then:
#ifdef TRUNCATE_UINT16_LEAST_T
if (c > 0xfffful)
c = 0xfffful;
#endif
with the Makefile conditional predicated on output from a step in configure, or the execution of some other C++ program that simply prints out sizeofs. Sadly that rules out cross-compiling.
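A possible sketch of such a detection program, using the macro name from above; it runs on the build host, which is exactly why it rules out cross-compiling:
#include <cstdio>
#include <limits>

int main() {
    // Print the -D flag only when unsigned short can exceed 16 bits
    if (std::numeric_limits<unsigned short>::max() > 0xfffful)
        std::puts("-DTRUNCATE_UINT16_LEAST_T");
    return 0;
}
In GNU Make its output could then be captured with something like CFLAGS += $(shell ./detect).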
Longer-term I propose suggesting more intelligent behaviour to the GCC guys, for when these particular type aliases are in use.

Is there a standard way to determine at compile-time if system is 32 or 64 bit?

I need to set up #ifdef checks for conditional compilation. I want to automate the process but cannot specify the target OS/machine. Is there some way that the preprocessor can resolve whether it is compiling for 32-bit or 64-bit?
(Explanation) I need to define a type that is 64 bits in size. On a 64-bit OS it is a long; on most others it is a long long.
I found this answer - is this the correct way to go?
[edit] a handy reference for compiler macros
The only compile-time check you can do reliably is sizeof(void*) == 8, true for x64 and false for x86. This is a constant expression and you can pass it to templates, but you can forget about using #ifdef with it. There is no platform-independent way to know the address size of the target architecture at preprocessing time; you will need to ask your IDE for one. The Standard doesn't even have the concept of address size.
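For instance, a sketch of passing that constant into templates; arch_traits is a made-up name:
#include <cstddef>
#include <cstdint>

template <std::size_t PtrSize> struct arch_traits;  // undefined primary
template <> struct arch_traits<4> { using word = std::uint32_t; };
template <> struct arch_traits<8> { using word = std::uint64_t; };

// Selected at compile time from the target's pointer size:
using native_word = arch_traits<sizeof(void*)>::word;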
No, there is no standard macro to determine at the preprocessing stage whether the machine is 64-bit or 32-bit.
In response to your edit: there is a macro-free way to get a type that is 64 bits.
If you need a type that can hold 64 bits, then #include <cstdint> and use either int64_t or uint64_t. You can also use the standard integer types provided by Boost.
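A trivial sketch of that:
#include <cstdint>

std::int64_t  i = INT64_C(-1);  // exactly 64 bits, signed
std::uint64_t u = UINT64_C(1);  // exactly 64 bits, unsigned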
Another option is to use long long. It's technically not part of the C++ standard (it will be in C++0x) but is supported on just about every compiler.
I would look at the source code of a cross-platform library. It is quite a large undertaking: every OS/compiler pair has its own set of definitions. A few libraries you may look at:
http://www.libsdl.org/ \include\SDL_config*.h (few files)
http://qt.nokia.com/ \src\corelib\global\qglobal.h
Boost has absorbed the old Predef project. You'll want the architecture macros, more specifically BOOST_ARCH_X86_32/BOOST_ARCH_X86_64, assuming you only care about x86.
If you need wider detection (e.g. ARM64), either add the relevant macros to your check, or check what you actually want to check, e.g.
sizeof(void*) == 8
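A sketch of what the Boost.Predef check can look like; these macros expand to non-zero version numbers when the architecture is detected:
#include <boost/predef.h>

#if BOOST_ARCH_X86_64
    // compiled for 64-bit x86
#elif BOOST_ARCH_X86_32
    // compiled for 32-bit x86
#else
    // some other architecture (ARM, PowerPC, ...)
#endif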
Well, the answer is clearly going to be OS-specific, so you need to narrow down your requirements.
For example, on Unix uname -a typically gives enough info to distinguish a 32-bit build of the OS from a 64-bit build.
The command can be invoked by your pre-compiler. Depending on its output, compiler flags can be set appropriately.
I would be tempted to hoist the detection out of the code and put that into the Makefile. Then, you can leverage system tools to detect and set the appropriate macro upon which you are switching in your code.
In your Makefile ...
<do stuff to detect and set SUPPORT_XX_BIT to the appropriate value>
gcc myFile.c -D$(SUPPORT_XX_BIT) -o myFile
In your code ...
#if defined(SUPPORT_32_BIT)
...
#elif defined(SUPPORT_64_BIT)
...
#else
#error "Select either 32 or 64 bit option\n"
#endif
Probably the easiest way might be comparing the sizes of int and long long. You cannot do this in the preprocessor, but you can use it in a static_assert.
Edit: Wow, all the negative votes. I have made my point a bit clearer. It also appears I should have mentioned long long rather than long because of the way MSVC works.
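A sketch of that static_assert approach (C++11):
#include <climits>

static_assert(sizeof(long long) * CHAR_BIT >= 64,
              "long long is at least 64 bits on every conforming compiler");
static_assert(sizeof(void*) == 8,
              "this translation unit assumes a 64-bit target");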

How to detect an overflow in C++?

I just wonder if there is some convenient way to detect, at run time, whether overflow happens to any variable of any built-in data type used in a C++ program. By convenient, I mean without needing to write code that checks every variable against the range of its data type each time its value changes. If that is impossible to achieve, how would you do it?
For example,
float f1=FLT_MAX+1;
cout << f1 << endl;
doesn't give any error or warning, either when compiling with gcc -W -Wall or when running.
Thanks and regards!
Consider using Boost's Numeric Conversion library, which gives you negative_overflow and positive_overflow exceptions (see the examples).
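A minimal sketch using boost::numeric_cast, which throws on out-of-range conversions:
#include <boost/numeric/conversion/cast.hpp>
#include <iostream>

int main() {
    try {
        double big = 1e40;  // too large for a 32-bit float
        float f = boost::numeric_cast<float>(big);
        std::cout << f << '\n';
    } catch (const boost::numeric::positive_overflow& e) {
        std::cout << "overflow: " << e.what() << '\n';
    }
    return 0;
}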
Your example doesn't actually overflow in the default floating-point environment on an IEEE-754 compliant system.
On such a system, where float is 32 bit binary floating point, FLT_MAX is 0x1.fffffep127 in C99 hexadecimal floating point notation. Writing it out as an integer in hex, it looks like this:
0xffffff00000000000000000000000000
Adding one (without rounding, as though the values were arbitrary precision integers), gives:
0xffffff00000000000000000000000001
But in the default floating-point environment on an IEEE-754 compliant system, any value between
0xfffffe80000000000000000000000000
and
0xffffff80000000000000000000000000
(which includes the value you have specified) is rounded to FLT_MAX. No overflow occurs.
Compounding the matter, your expression (FLT_MAX + 1) is likely to be evaluated at compile time, not runtime, since it has no side effects visible to your program.
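You can see both effects in a couple of lines:
#include <cfloat>
#include <cstdio>

int main() {
    float f1 = FLT_MAX + 1;               // rounds back to FLT_MAX; no overflow
    std::printf("%d\n", f1 == FLT_MAX);   // prints 1
    float f2 = FLT_MAX * 2.0f;            // genuinely overflows, producing +inf
    std::printf("%f\n", f2);              // prints inf
    return 0;
}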
In situations where I need to detect integer overflow, I use SafeInt<T>. It's a cross-platform solution which throws an exception in overflow situations. (Note that SafeInt works with integer types; it would not apply to the float example above.)
SafeInt<int> i = INT_MAX;
i += 1; // throws
It is available on codeplex
http://www.codeplex.com/SafeInt/
Back in the old days when I was developing C++ (199x) we used a tool called Purify. Back then it was a tool that instrumented the object code and logged everything 'bad' during a test run.
I did a quick google and I'm not quite sure if it still exists.
As far as I know, several open-source tools exist nowadays that do more or less the same thing.
Check out Electric Fence and Valgrind.
Clang provides -fsanitize=signed-integer-overflow and -fsanitize=unsigned-integer-overflow.
http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
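A minimal sketch; note these sanitizers catch integer overflow, not the floating-point case in the question:
// Compile with: clang++ -fsanitize=signed-integer-overflow overflow.cpp
#include <climits>
#include <cstdio>

int main() {
    int x = INT_MAX;
    ++x;                      // the sanitizer reports a runtime error here
    std::printf("%d\n", x);
    return 0;
}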