LLVM with CUDA inline assembly - c++

I am trying to compile CUDA code that contains the following inline assembly:
static __device__ uint get_smid(void) {
    uint ret;
    asm("mov.u32 %0, %smid;" : "=r"(ret) );
    return ret;
}
The code compiles fine with nvcc using the flag -Xptxas -v.
When I try to compile it with clang++ (version 4.0), using the corresponding flag -Xcuda-ptxas -v (I think this is right, but I may be mistaken), I get the following error:
../../include/cutil_subset.h:23:25: error: invalid % escape in inline assembly string
asm("mov.u32 %0, %smid;" : "=r"(ret) );
It points to %smid.
I thought I might need to link the proper library, but I already have this too: L/cuda/install/lib.
Another possibility is an NVPTX asm incompatibility. On this page, it is explained that LLVM has different definitions for all PTX variables (there are some for smid and warpid as well). Now I am unsure whether the code above has to be written separately (not inline) and compiled as such.
Has anybody dealt with a similar issue before? Suggestions are welcome.

You need to reference the special register with a double percent sign: %%smid.
The %% escape sequence gets converted to a single percent sign during compilation, so that ptxas sees the correct special register name. The double percent sign version also works under nvcc.
nvcc seems to be more forgiving with escape sequences in inline assembler than clang++ is, and leaves unknown escape sequences untouched rather than emitting an error as clang does in this case.
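To make the escaping rule concrete, here is a toy C++ model of how a GCC-style asm template string might be expanded. It is only a sketch under simplifying assumptions (single-digit operands, no named operands or other escapes), not clang's actual implementation. The helper expand_asm_template, the operand name %r1, and the error text are made up for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model (hypothetical helper, not a real clang or nvcc API).
//   "%%"       -> a literal '%'
//   "%<digit>" -> the operand with that index
//   any other character after '%' is rejected, as clang does in the question.
std::string expand_asm_template(const std::string& tmpl,
                                const std::vector<std::string>& operands) {
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        if (tmpl[i] != '%') {
            out += tmpl[i];
            continue;
        }
        if (i + 1 >= tmpl.size())
            throw std::invalid_argument("invalid % escape");
        const char next = tmpl[++i];
        if (next == '%') {
            out += '%';  // "%%" collapses to a single percent sign
        } else if (std::isdigit(static_cast<unsigned char>(next))) {
            const std::size_t index = static_cast<std::size_t>(next - '0');
            if (index >= operands.size())
                throw std::invalid_argument("operand index out of range");
            out += operands[index];
        } else {
            throw std::invalid_argument("invalid % escape");
        }
    }
    return out;
}

// Usage sketch: the escaped template expands, the unescaped one is rejected.
void check_expand() {
    const std::vector<std::string> ops = {"%r1"};  // made-up PTX register name
    assert(expand_asm_template("mov.u32 %0, %%smid;", ops) ==
           "mov.u32 %r1, %smid;");
    bool rejected = false;
    try {
        expand_asm_template("mov.u32 %0, %smid;", ops);
    } catch (const std::invalid_argument&) {
        rejected = true;
    }
    assert(rejected);
}
```

In this model, %%smid survives as %smid in the text handed to the assembler, while a bare %smid fails because 's' is not a recognized escape.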

Related

asm function with c++

I would like to add a method to my class using assembly language. How can I do it?
example:
main.cpp
Struct ex {
int field1;
asm_method(char*);
}
add.asm
asm_method:
//some asm code
Get the asm output that the compiler generates for a non-inline definition of the C++ member function, and use that as a starting point for an asm source file. This works for any ISA with any compiler that can emit valid asm (which is most of them, although apparently MSVC emits a bunch of extra junk that you have to remove).
Example with GCC (for x86-64 GNU/Linux, but works anywhere)
Also works with clang.
e.g. g++ -O3 -fverbose-asm -masm=intel -S -o foo_func.S foo.cpp (How to remove "noise" from GCC/clang assembly output?)
That .S file is now your asm source file. Remove the compiler-generated instruction lines and insert your own.
Obviously you need to know the calling convention and other stuff like that (e.g. for x86 see https://www.agner.org/optimize/#manuals for a calling convention guide), but this will get the compiler to do the name mangling for you, for that specific target platform's ABI.
struct ex { // lower case struct not Struct
    int field1;
    void *asm_method(char*); // methods need a return type
}; // struct declarations end with a ;
void *ex::asm_method(char*) {
    return this; // easy way to find out what register `this` is passed in.
}
compiles as follows for x86-64 System V, with g++ -O3 (Godbolt with Linux gcc and Windows MSVC)
# x86-64 System V: GNU/Linux g++ -O3
# This is GAS syntax
.intel_syntax noprefix
.text # .text section is already the default at top of file
.align 2
.p2align 4 # aligning functions by 16 bytes is typical
.globl _ZN2ex10asm_methodEPc # the symbol is global, not private to this file
.type _ZN2ex10asm_methodEPc, @function # (optional) and it's a function.
_ZN2ex10asm_methodEPc: # a label defines the symbol
.cfi_startproc
## YOUR CODE GOES HERE ##
## RSP-8 is aligned by 16 in x86-64 SysV and Windows ##
mov rax, rdi # copy first arg (this) to return-value register.
ret # pop into program counter
.cfi_endproc
.size _ZN2ex10asm_methodEPc, .-_ZN2ex10asm_methodEPc # maybe non-optional for dynamic linking
It's probably fine to omit the .cfi stack-unwind directives from hand-written asm for leaf functions, since you're not going to be throwing C++ exceptions from hand-written asm (I hope).
This depends on your target platform and compiler/toolchain and is generally too broad a question for StackOverflow.
For example, the C++ compiler in the GCC toolchain actually generates assembly from C++, and then produces object files from that assembly. Then the linker links together multiple object files to produce an ELF module.
You can bypass the C++ compilation step for a single object file and write the assembly file directly.
You can assemble it the same way you compile a .c file: gcc -c myfile.S -o myfile.o.
You should take the platform ABI into account, though, so that you accept function arguments and return values via the correct registers. The platform ABI also specifies the calling convention and which registers must be preserved across function calls. Finally, you need to produce correct function names according to C++ name mangling rules, or use C naming rules (which are simpler) and declare your function extern "C".
For more details see C++ to ASM linkage and for Linux ABI refer to System V ABI.
For Windows start here: calling conventions and compiling assembly in Visual Studio.
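The extern "C" route could look roughly like the sketch below. The name asm_method_impl is hypothetical, and the function is defined in C++ here only so that the example links on its own; in a real project that definition would be the hand-written routine in a separate .S file.

```cpp
#include <cassert>

struct ex;  // forward declaration so the C-linkage prototype can mention it

// Hypothetical name. With C linkage the symbol is not C++-mangled, so a
// hand-written asm file can define it without knowing the mangling scheme.
extern "C" void* asm_method_impl(ex* self, char* arg);

struct ex {
    int field1;
    // Thin inline wrapper: passes `this` explicitly as the first argument.
    void* asm_method(char* arg) { return asm_method_impl(this, arg); }
};

// Stand-in definition (an assumption for this sketch). It mirrors the
// `return this;` example: the object pointer comes back as the result.
extern "C" void* asm_method_impl(ex* self, char* /*arg*/) {
    return self;
}

// Usage sketch.
void check_wrapper() {
    ex e{42};
    assert(e.asm_method(nullptr) == &e);
}
```

The trade-off is one extra (usually inlined) call layer in exchange for a symbol name that does not depend on the C++ ABI's mangling rules.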

How to turn off the constant folding optimization in llvm

I am new to clang and llvm. I'm trying to generate an unoptimized version of bitcode from C source code. I found that the generated bitcode has constant folding applied, which I don't want.
I'm using this command: clang -O0 -Xclang -disable-O0-optnone test1.c -S -emit-llvm -o test1.ll
The test1.c file has the following code:
int test() {
    int y;
    y = 2 * 4;
    return y;
}
The content of the test1.ll file:
Instead of generating an instruction for multiplying 2 and 4, it is directly storing the value 8 by doing the constant folding operation:
store i32 8, i32* %1, align 4
It would be really nice if someone kindly let me know what I am missing and how should I turn off the constant folding optimization. The version of llvm I am using is 6.0.0.
Thank you.
This constant folding is a Clang feature and can't be turned off, even with -O0. To work around it, try making the variables global, passing them as parameters to the function, or just writing the IR manually.
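A minimal sketch of the first two workarounds follows. The names test_params, test_globals, g_a and g_b are made up for illustration. Because the operands are no longer literal constants in the expression, the unoptimized IR would be expected to contain a multiply instruction rather than a stored 8.

```cpp
#include <cassert>

// Operands arrive as parameters, so there are no literals to fold.
int test_params(int a, int b) {
    int y;
    y = a * b;
    return y;
}

// Operands live in mutable globals (hypothetical names).
int g_a = 2;
int g_b = 4;

int test_globals() {
    int y;
    y = g_a * g_b;
    return y;
}

// Usage sketch: both variants still compute the same value at run time.
void check_workarounds() {
    assert(test_params(2, 4) == 8);
    assert(test_globals() == 8);
}
```

The functions are plain enough to compile as C as well, so the same command line from the question can be reused on them.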

G++ ignores _Pragma diagnostic ignored

I am trying to disable g++ warnings in code expanded from macros. By my understanding, _Pragma should follow macro usage, so this should not trigger -Wparentheses when compiled with g++:
#include <stdio.h>
#define TEST(expr) \
    int a = 1; \
    _Pragma( "GCC diagnostic push" ) \
    _Pragma( "GCC diagnostic ignored \"-Wparentheses\"" ) \
    if (a <= expr) { \
        printf("filler\n"); \
    } \
    _Pragma( "GCC diagnostic pop" )
int main(){
    int b = 2, c = 3;
    TEST(b == c);
}
When I compile this with g++, I get the -Wparentheses warning that I am trying to disable.
xarn@DESKTOP-B2A3CNC:/mnt/c/ubuntu$ g++ -Wall -Wextra test3.c
test3.c: In function ‘int main()’:
test3.c:8:11: warning: suggest parentheses around comparison in operand of ‘==’ [-Wparentheses]
if (a <= expr) { \
^
test3.c:15:5: note: in expansion of macro ‘TEST’
TEST(b == c);
^
However it works as expected when using gcc:
xarn@DESKTOP-B2A3CNC:/mnt/c/ubuntu$ gcc -Wall -Wextra test3.c
test3.c: In function ‘main’:
test3.c:16:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
I am using g++ version 4.8.5.
There are long-standing bugs in g++ handling of _Pragmas, that are not present when using the gcc front-end. The only solution is to either go forward to a sufficiently modern version of g++ (IIRC 6+), or to disable the warning for the entire TU.
Xarn's answer was very helpful in working out why we were hitting the same issues with our macros when compiling with g++ < 9.0, but fortunately I'm stubborn and don't take "the only solution" for an answer. Some more digging revealed that there is a workaround for affected versions of GCC.
One of the original 2012 reports for this issue at GNU's bugzilla included an offhand mention from the reporter, that _Pragma() would be processed as expected if they added either -save-temps or -no-integrated-cpp to the compile command.
It turns out that either of those options causes g++ NOT to run in its default streamlined mode, which folds the preprocessing and compiling stages together into a single pass. From the man page for g++ 9.1.1:
-no-integrated-cpp
Perform preprocessing as a separate pass before compilation. By
default, GCC performs preprocessing as an integrated part of input
tokenization and parsing. If this option is provided, the
appropriate language front end (cc1, cc1plus, or cc1obj for C, C++,
and Objective-C, respectively) is instead invoked twice, once for
preprocessing only and once for actual compilation of the
preprocessed input. This option may be useful in conjunction with
the -B or -wrapper options to specify an alternate preprocessor or
perform additional processing of the program source between normal
preprocessing and compilation.
This means that adding -no-integrated-cpp does indeed work around the _Pragma() bug in every affected version of GCC we've tested (so far that's 5.4, 7.3, and I believe 8.1), but otherwise has no effect on the final results of the build. (One can deduce from this that the _Pragma() bug was introduced with and by that single-pass streamlining.)
The only real tradeoff is that compilation is indeed a bit slower, if you build with that option enabled. While that's certainly worth it when your GCC is one of the affected versions, we're using a conditional in our CMake build setup to ensure -no-integrated-cpp is only set when necessary:
#### Work around a GCC < 9 bug with handling of _Pragma() in macros
#### See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578
if ((${CMAKE_CXX_COMPILER_ID} STREQUAL "GNU") AND
(${CMAKE_CXX_COMPILER_VERSION} VERSION_LESS "9.0.0"))
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -no-integrated-cpp")
endif()
(Substitute appropriately modern calls to target_compile_options() for the ugly brute-forcing of CMAKE_CXX_FLAGS, if your CMake setup is better than ours.)
Typically you use warning suppression only to deal with unavoidable warnings coming from third-party code, so that they won't clutter compilation logs. In your case it would be better to
1) use a regular function, because macros are evil
2) deal with the warning by adding round brackets around the potentially broken expression
if (a <= (expr)) {
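Both suggestions can be sketched as follows. The names at_most, AT_MOST_MACRO and unparenthesized are hypothetical; the last function only spells out how the original macro body parses once b == c has been pasted in.

```cpp
#include <cassert>

// Option 1: a regular function. The argument is fully evaluated before the
// comparison, so there is no precedence surprise and nothing to suppress.
inline bool at_most(int a, int value) {
    return a <= value;
}

// Option 2: keep a macro, but parenthesize every parameter use.
#define AT_MOST_MACRO(a, expr) ((a) <= (expr))

// What `a <= expr` becomes for expr = `b == c` without parentheses:
// a <= b == c, which parses as (a <= b) == c.
inline bool unparenthesized(int a, int b, int c) {
    return (a <= b) == c;
}

// Usage sketch with values where the two readings disagree.
void check_precedence() {
    int a = 1, b = 2, c = 1;
    assert(!at_most(a, b == c));        // 1 <= (2 == 1)  ->  1 <= 0
    assert(!AT_MOST_MACRO(a, b == c));  // same reading as the function
    assert(unparenthesized(a, b, c));   // (1 <= 2) == 1  ->  1 == 1
}
```

The differing results are the reason the warning exists: the unparenthesized macro quietly computes something other than what the call site suggests.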

Assembler Messages: no such instruction when Compiling C++

I am attempting to compile C++ code using gcc/5.3 on Scientific Linux release 6.7. I keep getting the following errors whenever I run my Makefile, though:
/tmp/ccjZqIED.s: Assembler messages:
/tmp/ccjZqIED.s:768: Error: no such instruction: `shlx %rax,%rdx,%rdx'
/tmp/ccjZqIED.s:1067: Error: no such instruction: `shlx %rax,%rdx,%rdx'
/tmp/ccjZqIED.s: Assembler messages:
/tmp/ccjZqIED.s:6229: Error: no such instruction: `mulx %r10,%rcx,%rbx'
/tmp/ccjZqIED.s:6248: Error: no such instruction: `mulx %r13,%rcx,%rbx'
/tmp/ccjZqIED.s:7109: Error: no such instruction: `mulx %r10,%rcx,%rbx'
/tmp/ccjZqIED.s:7128: Error: no such instruction: `mulx %r13,%rcx,%rbx'
I've attempted to follow the advice from this question, with no change to my output:
Compile errors with Assembler messages
My compiler options are currently:
CXXFLAGS = -g -Wall -O0 -pg -std=c++11
Does anyone have any idea what could be causing this?
This means that GCC is outputting an instruction that your assembler doesn't support. Either it comes from inline asm in the source code, or it shouldn't happen at all, which suggests that GCC was built on a different machine with a newer assembler and then copied to a machine where it doesn't work properly.
Assuming those instructions aren't used explicitly in an asm statement, you should be able to tell GCC not to emit them with a suitable flag such as -mno-avx (or whatever flag is appropriate to disable use of those particular instructions; shlx and mulx belong to the BMI2 extension, so -mno-bmi2 is the likely candidate).
@jonathan-wakely's answer is correct in that the assembler, which your compiler invokes, does not understand the assembly code, which your compiler generates.
As to why that happens, there are multiple possibilities:
1. You installed the newer compiler by hand without also updating your assembler
2. Your compiler generates 64-bit instructions, but the assembler is limited to 32-bit ones for some reason
Disabling AVX (-mno-avx) is unlikely to help, because it is not explicitly requested either: there is no -march in the quoted CXXFLAGS. If it did help, then you did not show us all of the compiler flags, and it would have been best if you had simply included the entire compiler command line.
If my suspicion in 1. above is correct, then you should build and/or install the latest binutils package, which will provide a version of as (the assembler) that is aware of AVX instructions, among other things. You would then need to rebuild the compiler with the --with-as=/path/to/the/updated/as flag passed to configure.
If your Linux installation is 32-bit only (suspicion 2.), then you should not be generating 64-bit binaries at all. It is possible, but not trivial...
Do post the output of uname -a and your entire compiler command-line leading to the above error-messages.
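One way to find out whether the compiler itself is targeting the instruction set in question is to test its predefined macros. The sketch below assumes GCC or clang, which define __BMI2__ when BMI2 code generation is enabled (shlx and mulx are BMI2 instructions). The function name bmi2_status and the returned strings are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Reports whether this translation unit is compiled with BMI2 enabled.
// __BMI2__ is predefined by GCC and clang when BMI2 code generation is on,
// for example via -mbmi2 or a -march value that includes it.
std::string bmi2_status() {
#ifdef __BMI2__
    return "BMI2 enabled";
#else
    return "BMI2 disabled";
#endif
}

// Usage sketch: exactly one of the two answers comes back.
void check_bmi2_status() {
    const std::string status = bmi2_status();
    assert(status == "BMI2 enabled" || status == "BMI2 disabled");
}
```

If this reports BMI2 as enabled with only the quoted CXXFLAGS, the setting is coming from somewhere other than those flags, such as the compiler's configured default.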

g++ compilation of a separately preprocessed file gives error depending on the architecture

I am using g++ version 4.1.2 on an x86_64 GNU/Linux architecture. The code base is huge, and I don't have a sufficient understanding of the makefiles used in the project. The code compiles fine as it is.
For debugging purposes, I need to preprocess (g++ -E) a few source files individually and then recompile them. I am giving the required include paths using -I. Ideally the compilation should go fine.
But I am getting a few discrepancies in standard headers, such as:
1) typedef unsigned long size_t; causes errors with the operator new() declaration generated by the compiler (if I manually change it to unsigned int, this error disappears)
2) In library functions like unsigned long numeric_limits<>::max(), the compiler complains about big numbers such as 922...807L; it reports the error "integer constant is too large for long type"
3) A mismatched declaration of __errno_location() gives a compiler error
I am having a hard time finding out what is going wrong. Why does compilation go fine when I run make on the unchanged file, and why do the standard headers start complaining when I use g++ -I <> -E on an individual file?
(Note that there is no problem with the code we have written; the errors come only from the standard library side. I tried locating the stddef.h that has unsigned int as the typedef, but that fixes only the first problem.)
Any ideas for fixing these errors would be highly appreciated.
Don't preprocess and compile separately, or if you must then use consistent compiler options and a consistent environment.
It sounds as though you're running the preprocessor on a 32-bit machine (or using the -m32 option) and then compiling on a 64-bit machine.
When compiling the output of the preprocessor, make sure that you use the -fpreprocessed compiler option so that the preprocessor will not run again.
If you don't pass in that option, certain constructs that produced identifiers that look like macros may get expanded again into something they shouldn't get expanded to. It's hard for me to come up with a case that shows a difference (I'm sure I can, but it would take a bit of puzzling out and would be pretty contrived). However, the implementation headers may well use some arcane macro techniques that might be sensitive to this option.
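To see why mixing 32-bit and 64-bit settings produces exactly these symptoms, it can help to build a tiny probe with the same flags as each step and compare the answers. This is only a sketch: the function names and the informal labels "LP64" and "ILP32" are chosen here for illustration.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Describes the data model of the current target. Only the two common
// Linux cases are named; anything else is reported as "other".
std::string data_model() {
    if (sizeof(long) == 8 && sizeof(void*) == 8) return "LP64";
    if (sizeof(long) == 4 && sizeof(void*) == 4) return "ILP32";
    return "other";
}

// True when a literal such as 9223372036854775807L fits in long.
bool long_holds_64_bit_max() {
    return sizeof(long) * CHAR_BIT >= 64;
}

// Usage sketch: an LP64 target always has a long of at least 64 bits.
void check_data_model() {
    if (data_model() == "LP64")
        assert(long_holds_64_bit_max());
    if (data_model() == "ILP32")
        assert(!long_holds_64_bit_max());
}
```

If the preprocessing step and the compile step disagree on this probe, headers expanded for one data model are being compiled under the other, which matches the size_t and "too large for long" errors described in the question.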