Using AVX with GCC: __builtin_ia32_addpd256 not declared - c++

If I #include <immintrin.h> I get this error:
error: '__builtin_ia32_addpd256' was not declared in this scope
I have defined the __AVX__ and __FMA__ macros to make AVX available, but apparently this isn't enough. There is no error if I use the compiler flag -mavx instead of the macros, but that solution is not acceptable. So, what else should I define to use AVX?

You shouldn't be defining __AVX__ and __FMA__ yourself - these get defined automatically when you enable the correct compiler options, e.g.
gcc -Wall -mavx ...
You can check this yourself if you're interested:
No AVX:
$ gcc -dM -E - < /dev/null | egrep "AVX|FMA"
$
AVX:
$ gcc -mavx -dM -E - < /dev/null | egrep "AVX|FMA"
#define __AVX__ 1
$
AVX + FMA:
$ gcc -mavx -mfma -dM -E - < /dev/null | egrep "AVX|FMA"
#define __AVX__ 1
#define __FMA__ 1
$
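If enabling AVX for the whole build with -mavx really is not an option, GCC (4.9 and later, as far as I know) and Clang also accept a per-function target attribute, which lets the intrinsics from <immintrin.h> be used inside that one function while the rest of the file keeps the default flags. The sketch below assumes an x86-64 GCC or Clang toolchain; the function name sum4_avx is made up for illustration, and it should only be called after a run-time check such as __builtin_cpu_supports("avx").

```cpp
#include <immintrin.h>

// Hypothetical example function: only this function is compiled for AVX,
// the rest of the translation unit keeps the default -m flags.
__attribute__((target("avx")))
double sum4_avx(const double* a, const double* b)
{
    __m256d va = _mm256_loadu_pd(a);     // unaligned load of 4 doubles
    __m256d vb = _mm256_loadu_pd(b);
    __m256d vs = _mm256_add_pd(va, vb);  // element-wise a[i] + b[i]
    double out[4];
    _mm256_storeu_pd(out, vs);
    return out[0] + out[1] + out[2] + out[3];
}
```

Calling sum4_avx on a CPU without AVX will typically crash with an illegal-instruction fault, which is why the run-time check matters.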

The proper solution might be to put the processor-specific intrinsics in a separate file and to set the -mavx -mfma options only for that file. The program itself then determines which version to call at runtime.
I use GCC's built-in helpers (__builtin_cpu_supports and the ifunc attribute) to pick the best-optimized version at runtime.
func_avx_fma.c
    void domagic_avx_fma(...) {}
func_general.c
    void domagic_general(...) {}
helper.c
    void domagic_avx_fma(...);
    void domagic_general(...);
    typedef void (*domagic_func_t)(...);
    /* The resolver runs during dynamic relocation, before any constructors,
       so __builtin_cpu_init() has to be called explicitly first. */
    domagic_func_t resolve_domagic()
    {
        __builtin_cpu_init();
        if (__builtin_cpu_supports("avx") && __builtin_cpu_supports("fma")) {
            return domagic_avx_fma;
        }
        return domagic_general;
    }
    void domagic(...) __attribute__ ((ifunc ("resolve_domagic")));
program.c
    void domagic(...);
    int main() {
        domagic(...);
    }
To compile:
$ gcc -c func_avx_fma.c -o func_avx_fma.o -O3 -mfma -mavx
$ gcc -c func_general.c -o func_general.o -O3
$ gcc -c helper.c -o helper.o
$ ...
This approach works well on x86 (x86_64), but not all targets support these helpers; ifunc in particular needs an ELF target and a C library that supports it, such as glibc.
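Where ifunc is not available, the same run-time selection can be done with an ordinary function pointer that is resolved once on first use. The following single-file sketch assumes GCC or Clang on x86 and borrows the answer's domagic naming, but the float-array signature is hypothetical (the original elides the parameters); instead of a separate file built with -mavx -mfma, it uses the target attribute on the optimized variant.

```cpp
#include <cstddef>

// Hypothetical signature: out[i] = a[i] * b[i] + c[i].
using domagic_func_t = void (*)(const float*, const float*, const float*, float*, std::size_t);

// Compiled with AVX and FMA enabled, like func_avx_fma.c built with -mavx -mfma.
__attribute__((target("avx,fma")))
static void domagic_avx_fma(const float* a, const float* b, const float* c, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];  // the compiler may emit FMA instructions here
}

// Baseline version that runs on any x86 CPU.
static void domagic_general(const float* a, const float* b, const float* c, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];
}

static domagic_func_t resolve_domagic()
{
    __builtin_cpu_init();  // harmless here; required only before constructors have run
    if (__builtin_cpu_supports("avx") && __builtin_cpu_supports("fma"))
        return domagic_avx_fma;
    return domagic_general;
}

void domagic(const float* a, const float* b, const float* c, float* out, std::size_t n)
{
    // Function-local static: resolved once, initialization is thread-safe since C++11.
    static const domagic_func_t impl = resolve_domagic();
    impl(a, b, c, out, n);
}
```

The trade-off compared with ifunc is one indirect call per invocation, in exchange for working on any target where __builtin_cpu_supports is available.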


Detecting this undefined behavior in gcc/clang?

I am trying to detect the following undefined behavior:
% cat undef.cxx
#include <iostream>
class C
{
int I;
public:
int getI() { return I; }
};
int main()
{
C c;
std::cout << c.getI() << std::endl;
return 0;
}
For some reason all my naive attempts have failed so far:
% g++ -Wall -pedantic -o undef -fsanitize=undefined undef.cxx && ./undef
21971
The same goes for:
% clang++ -Weverything -o undef -fsanitize=undefined undef.cxx && ./undef
0
Is there a way to use a magic flag in gcc/clang to report a warning/error for the above code at compile time? At run time?
References:
% g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110
and
% clang++ --version
Debian clang version 11.0.1-2
It turns out my g++ version handles it just fine; all I was missing was the optimization flag (the warning comes from -Wuninitialized, which needs optimization enabled, not from the sanitizer):
% g++ -O2 -Wall -pedantic -o undef -fsanitize=undefined undef.cxx && ./undef
undef.cxx: In function ‘int main()’:
undef.cxx:7:25: warning: ‘c.C::I’ is used uninitialized in this function [-Wuninitialized]
7 | int getI() { return I; }
| ^
0
This is clearly documented upstream:
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wuninitialized
Because these warnings depend on optimization, the exact variables or
elements for which there are warnings depend on the precise
optimization options and version of GCC used.
Here is the upstream meta-bug tracking all the related issues:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24639
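For the run-time half of the question: as far as I know, -fsanitize=undefined does not check for reads of uninitialized memory at all; Clang's MemorySanitizer (-fsanitize=memory) is the tool for that. The most robust fix, though, is to remove the undefined behaviour altogether by giving the member a default member initializer (C++11). A minimal sketch on the question's class C:

```cpp
// The question's class, with the member given a default member initializer,
// so "C c; c.getI();" no longer reads an indeterminate value.
class C
{
    int I = 0;  // or: int I{};
public:
    int getI() const { return I; }
};
```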

Is there any guarantee that multiple type_index instances for a type will compare equal?

I have some code which expects type_index instances for a particular type created in a shared library and instances created in an executable (for the same particular type) to compare equal.
However, I have encountered a case where this does not work on QNX 7:
// idxlib.h
#include <typeindex>
#include <string>
#include <iostream>
#ifdef BUILD_LIB
#define LIB_EXPORT __attribute__((visibility("default")))
#else
#define LIB_EXPORT
#endif
template <typename T>
class Templ
{
};
class LIB_EXPORT LibType
{
public:
LibType();
template <typename T=int>
void templateMethod(int arg = 0) const
{
#ifndef REMOVE_INSTANTIATION
if (arg == 42)
{
// arg is never 42. This code path is not taken, but it instantiates the template
templateMethod();
}
#endif
if (mti == std::type_index(typeid(Templ<int>)))
std::cout << "Type indexes the same" << std::endl;
else
std::cout << "Type indexes NOT the same" << std::endl;
}
void normalMethod();
protected:
std::type_index mti;
};
// idxlib.cpp
#include "idxlib.h"
LibType::LibType() : mti(std::type_index(typeid(Templ<int>))) {}
void LibType::normalMethod()
{
templateMethod();
}
// sharedidx.cpp
#include "idxlib.h"
int main(int argc, char* argv[])
{
LibType lt;
if (argc == 65)
// argc is not 65, so don't call it, just instantiate it
lt.templateMethod();
lt.normalMethod();
return 0;
}
Build, scp and run:
QCC -Vgcc_ntox86_64 -g -fPIC -o idxlib.cpp.o -c idxlib.cpp -DBUILD_LIB -fvisibility=hidden -fvisibility-inlines-hidden
QCC -Vgcc_ntox86_64 -g -shared -o libidx.so idxlib.cpp.o
QCC -Vgcc_ntox86_64 -g -o sharedidx libidx.so sharedidx.cpp
scp -i ~/qnxinstall/id_rsa_qnx sharedidx libidx.so qnxuser@${QNXBOX}:/home/qnxuser/test
echo
echo "comparison fails:"
ssh -i ~/qnxinstall/id_rsa_qnx -t qnxuser@${QNXBOX} "cd /home/qnxuser/test && LD_LIBRARY_PATH=/home/qnxuser/test ./sharedidx"
QCC -Vgcc_ntox86_64 -g -shared -fPIC -o idxlib.cpp.o -c idxlib.cpp -DREMOVE_INSTANTIATION -DBUILD_LIB -fvisibility=hidden -fvisibility-inlines-hidden
QCC -Vgcc_ntox86_64 -g -shared -o libidx.so idxlib.cpp.o
QCC -Vgcc_ntox86_64 -g -o sharedidx libidx.so -DREMOVE_INSTANTIATION sharedidx.cpp -fvisibility=hidden -fvisibility-inlines-hidden
scp -i ~/qnxinstall/id_rsa_qnx sharedidx libidx.so qnxuser@${QNXBOX}:/home/qnxuser/test
echo
echo "comparison works:"
ssh -i ~/qnxinstall/id_rsa_qnx -t qnxuser@${QNXBOX} "cd /home/qnxuser/test && LD_LIBRARY_PATH=/home/qnxuser/test ./sharedidx"
Output:
Type indexes NOT the same
Type indexes the same
So, the type_index comparison fails when there is a template instantiation which contains a template instantiation of itself.
Is it a bug in QNX 7, or is my expectation (that it should ever work) wrong?
Is this code relying on implementation-defined behavior? Or undefined behavior?
The QNX 7 QCC compiler is based on GCC 5.4 and uses a standard library based on libc++ from the same era. I have tested GCC 5.4 (and clang with libc++ and libstdc++) on Linux and I do not get the same behavior. I have also tried with and without _LIBCPP_NONUNIQUE_RTTI_BIT defined.
So, I'm assuming this is a result of the linker rather than the compiler. Could that be true?
Are the GCC compilers just "too helpful" in making this work on Linux across shared library boundaries?
I would never assume that RTTI works correctly on a toolchain targeting embedded systems. It might be supposed to work correctly, but almost nobody enables RTTI or exceptions on embedded systems, so it gets next to no testing or attention from support.
I'd suggest that you use a library-based RTTI emulation, such as https://www.boost.org/doc/libs/1_74_0/doc/html/boost_typeindex.html, which works on systems without RTTI and is also fully deterministic and bounded in space and time, unlike language RTTI.
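If pulling in Boost is not possible, an approach similar in spirit to Boost.TypeIndex's ctti_type_index can be sketched in a few lines: derive a key from the compiler-generated function signature and compare the strings by content, so equality no longer depends on the linker merging RTTI objects across the shared-library boundary. This assumes GCC or Clang (__PRETTY_FUNCTION__ is a compiler extension); the names ctti_signature, TypeKey and type_key are hypothetical, and the strings are only comparable between binaries built with the same compiler.

```cpp
#include <cstring>

// Hypothetical helper: GCC and Clang expand __PRETTY_FUNCTION__ to a string that
// spells out T, something like "const char* ctti_signature() [with T = int]" on GCC.
template <typename T>
const char* ctti_signature()
{
    return __PRETTY_FUNCTION__;
}

// Hypothetical key type: equality compares string contents, not addresses,
// so two copies of the same instantiation in different binaries still compare equal.
struct TypeKey
{
    const char* signature;

    bool operator==(const TypeKey& other) const
    {
        return std::strcmp(signature, other.signature) == 0;
    }
    bool operator!=(const TypeKey& other) const { return !(*this == other); }
};

template <typename T>
TypeKey type_key()
{
    return TypeKey{ctti_signature<T>()};
}
```

In the question's code, LibType would store type_key<Templ<int>>() instead of a std::type_index and compare against the same expression in templateMethod.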

Clang does not generate profraw file when linking manually

I am trying out the profiling functionality of clang using llvm-cov and llvm-profdata. I have everything set up with CMake, but it doesn't generate the default.profraw as expected. I've tried the steps manually and discovered that clang does not generate the default.profraw file if I split the build into compiling the object files and linking the executable.
For example, the following works:
$ clang++ -g -O0 -fprofile-instr-generate -fcoverage-mapping -std=gnu++2a binoperator.cpp main.cpp
$ ./a.out
38
Done...
$ ls -al default.profraw
-rw-rw-r--. 1 marten marten 224 May 13 13:59 default.profraw
The following doesn't work (this is roughly what CMake tries to do):
$ clang++ -g -O0 -fprofile-instr-generate -fcoverage-mapping -std=gnu++2a -o binoperator.cpp.o -c binoperator.cpp
$ clang++ -g -O0 -fprofile-instr-generate -fcoverage-mapping -std=gnu++2a -o main.cpp.o -c main.cpp
$ clang++ -o a.out binoperator.cpp.o main.cpp.o
$ ./a.out
38
Done...
$ ls -al default.profraw
ls: cannot access 'default.profraw': No such file or directory
Why? What is the difference? How can I make the second case work?
With kind regards,
Marten
Additional info:
main.cpp
#include "binoperator.h"
#include <iostream>
int main()
{
BinOperator bo;
int result = bo.add(5, 33);
std::cout << result << std::endl;
std::cout << "Done..." << std::endl;
return 0;
}
binoperator.h
#ifndef BINOPERATOR_H
#define BINOPERATOR_H
class BinOperator
{
public:
int add(int a, int b) const;
};
#endif
binoperator.cpp
#include "binoperator.h"
int BinOperator::add(int a, int b) const
{
return (a + b);
}
$ clang --version
clang version 8.0.0 (Fedora 8.0.0-1.fc30)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
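I've found out that in the second case, the -fprofile-instr-generate -fcoverage-mapping options also need to be specified in the linking call to clang++; -fprofile-instr-generate is what makes the driver link in the profiling runtime, which writes default.profraw when the program exits: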
$ clang++ -O0 -fprofile-instr-generate -fcoverage-mapping binoperator.cpp.o main.cpp.o -o a.out
In CMake, this can be done with target_link_options().

Why are only some of these C++ template instantiations exported in a shared library?

I have a C++ dynamic library (on macOS) that has a templated function with some explicit instantiations that are exported in the public API. Client code only sees the template declaration; they have no idea what goes on inside it and are relying on these instantiations to be available at link time.
For some reason, only some of these explicit instantiations are made visible in the dynamic library.
Here is a simple example:
// libtest.cpp
#define VISIBLE __attribute__((visibility("default")))
template<typename T> T foobar(T arg) {
return arg;
}
template int VISIBLE foobar(int);
template int* VISIBLE foobar(int*);
I would expect both instantiations to be visible, but only the non-pointer one is:
$ clang++ -dynamiclib -O2 -Wall -Wextra -std=c++1z -stdlib=libc++ -fvisibility=hidden -fPIC libtest.cpp -o libtest.dylib
$ nm -gU libtest.dylib | c++filt
0000000000000f90 T int foobar<int>(int)
This test program fails to link because the pointer one is missing:
// client.cpp
template<typename T> T foobar(T); // assume this was in the library header
int main() {
foobar<int>(1);
foobar<int*>(nullptr);
return 0;
}
$ clang++ -O2 -Wall -Wextra -std=c++1z -stdlib=libc++ -L. -ltest client.cpp -o client
Undefined symbols for architecture x86_64:
"int* foobar<int*>(int*)", referenced from:
_main in client-e4fe7d.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
There does seem to be some connection between the types and the visibility. If I change the return type to void, they are all visible (even if the template arguments are still pointers or whatever). Even more bizarrely, this exports both:
template auto VISIBLE foobar(int) -> int;
template auto VISIBLE foobar(int*) -> int*;
Is this a bug? Why would apparent syntactic sugar change behavior?
It works if I change the template definition to be visible, but it seems non-ideal because only a few of these instantiations should be exported... and I still want to understand why this is happening, either way.
I am using Apple LLVM version 8.0.0 (clang-800.0.42.1).
Your problem is reproducible on Linux:
$ clang++ --version
clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
$ clang++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden \
-fPIC libtest.cpp -o libtest.so
$ nm -C libtest.so | grep foobar
0000000000000620 W int foobar<int>(int)
0000000000000630 t int* foobar<int*>(int*)
The non-pointer overload is weakly global but the pointer overload is
local.
The cause of this is obscured by clang's lax diagnosis of the __attribute__
syntax extension, which after all is a GCC invention. If we compile with
g++ instead, we get:
$ g++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden -fPIC libtest.cpp -o libtest.so
libtest.cpp:9:36: warning: ‘visibility’ attribute ignored on non-class types [-Wattributes]
template int * VISIBLE foobar(int *);
^
Notice that g++ ignores the visibility attribute only in the pointer overload,
and, just like clang - and consistent with that warning - it emits code with:
$ nm -C libtest.so | grep foobar
0000000000000610 W int foobar<int>(int)
0000000000000620 t int* foobar<int*>(int*)
Clearly clang is doing the same thing, but not telling us why.
The only difference between the overload that satisfies g++ and the one
that does not is the difference between int and int *.
On that basis we'd expect g++ to be satisfied with the change:
template int VISIBLE foobar(int);
//template int * VISIBLE foobar(int *);
template float VISIBLE foobar(float);
And so it is:
$ g++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden -fPIC libtest.cpp -o libtest.so
$ nm -C libtest.so | grep foobar
0000000000000650 W float foobar<float>(float)
0000000000000640 W int foobar<int>(int)
And so is clang:
$ clang++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden -fPIC libtest.cpp -o libtest.so
$ nm -C libtest.so | grep foobar
0000000000000660 W float foobar<float>(float)
0000000000000650 W int foobar<int>(int)
Both of them will do what you want for overloads with T a non-pointer type, but
not with T a pointer type.
What you face here, however, is not a ban on dynamically visible functions
that return pointers rather than non-pointers. It couldn't have escaped notice if
visibility were as broken as that. It is just a ban on types of the form:
D __attribute__((visibility("...")))
where D is a pointer or reference type, as distinct from types of the form:
E __attribute__((visibility("..."))) *
or:
E __attribute__((visibility("..."))) &
where E is not a pointer or reference type. The distinction is between:
A (pointer or reference that has visibility ...) to type D
and:
A (pointer or reference to type E) that has visibility ...
See:
$ cat demo.cpp
int xx ;
int __attribute__((visibility("default"))) * pvxx; // OK
int * __attribute__((visibility("default"))) vpxx; // Not OK
int __attribute__((visibility("default"))) & rvxx = xx; // OK
int & __attribute__((visibility("default"))) vrxx = xx; // Not OK
$ g++ -shared -Wall -Wextra -std=c++1z -fvisibility=hidden -o libdemo.so demo.cpp
demo.cpp:3:46: warning: ‘visibility’ attribute ignored on non-class types [-Wattributes]
int * __attribute__((visibility("default"))) vpxx; // Not OK
^
demo.cpp:5:46: warning: ‘visibility’ attribute ignored on non-class types [-Wattributes]
int & __attribute__((visibility("default"))) vrxx = xx; // Not OK
^
$ nm -C libdemo.so | grep xx
0000000000201030 B pvxx
0000000000000620 R rvxx
0000000000201038 b vpxx
0000000000000628 r vrxx
0000000000201028 b xx
The OK declarations become global symbols; the Not OK ones become local,
and only the former are dynamically visible:
$ nm -CD libdemo.so | grep xx
0000000000201030 B pvxx
0000000000000620 R rvxx
This behaviour is reasonable. We can't expect a compiler to attribute
global, dynamic visibility to a pointer or reference that could point or
refer to something that does not have global or dynamic visibility.
This reasonable behaviour only appears to frustrate your objective because,
as you can probably now see,
template int VISIBLE foobar(int);
template int* VISIBLE foobar(int*);
doesn't mean what you thought it did. You thought that, for given type U,
template U VISIBLE foobar(U);
declares a template instantiating function that has default
visibility, accepting an argument of type U and returning the same. In fact,
it declares a template instantiating function that accepts an argument of
type U and returns type:
U __attribute__((visibility("default")))
which is allowed for U = int, but disallowed for U = int *.
To express your intention that instantiations of template<typename T> T foobar(T arg)
shall be dynamically visible functions, qualify the type of the template function
itself with the visibility attribute. Per GCC's documentation of the __attribute__
syntax - which admittedly
says nothing specific concerning templates - you must make an attribute
qualification of a function in a declaration other than its definition. So complying
with that, you'd revise your code like:
// libtest.cpp
#define VISIBLE __attribute__((visibility("default")))
template<typename T> T foobar(T arg) VISIBLE;
template<typename T> T foobar(T arg) {
return arg;
}
template int foobar(int);
template int* foobar(int*);
g++ no longer has any gripes:
$ g++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden -fPIC libtest.cpp -o libtest.so
$ nm -CD libtest.so | grep foobar
0000000000000640 W int foobar<int>(int)
0000000000000650 W int* foobar<int*>(int*)
and both of the overloads are dynamically visible. The same goes for clang:
$ clang++ -shared -O2 -Wall -Wextra -std=c++1z -fvisibility=hidden -fPIC libtest.cpp -o libtest.so
$ nm -CD libtest.so | grep foobar
0000000000000650 W int foobar<int>(int)
0000000000000660 W int* foobar<int*>(int*)
With any luck, you'll get the same result with clang on macOS.

Resolved: C++: put normal method definitions in a source file while keeping template method definitions in the header file

Since template definitions must be put in the header file, I don't like it when a template class is big. So I want to make a normal class with some templated methods: the definitions of the templated methods go into the header file, and the others go into C++ source files. Here is what I have in mind:
// lambda.h
#include <functional>  // for std::function
#include <iostream>
class X {
public:
    std::function<bool(int)> filter;
    template <class F>
    void setFilter(F fn) {
        filter = fn;
    }
    void big_function(int x);
};
// lambda.cpp
#include <iostream>
#include "lambda.h"
void X::big_function(int x) {
if (filter(x)) std::cout << x << std::endl;
}
// main2.cpp
#include <stdlib.h>
#include "lambda.h"
class Filter {
public:
bool operator()(int x) { return true; }
};
int main() {
X x;
x.setFilter(Filter());
x.big_function(3);
return 0;
}
// 2.sh
g++ -c lambda.cpp -ggdb
g++ -c main2.cpp -ggdb -std=c++11
g++ -o main2 main2.o lambda.o -ggdb
This program compiles, but I get a segmentation fault when executing x.big_function(3).
Update:
Q1: Is my thinking reasonable? Is there any obvious error in my code?
Answer: Yes, it is reasonable, and there is no obvious error. Thanks to the first four comments, I did more testing and it works.
Q2: Actually, if I compile with -std=c++11, I get a segmentation fault, but there is no segmentation fault if I don't use -std=c++11. (I tried C++11 yesterday because at the beginning I used a lambda expression rather than a function object for "Filter".) In my real case, I can't give up C++11 features.
Answer: Shame on me, it was my own fault. I fixed the issue by adding -std=c++11 to every compilation unit.
zhifan$ sh -x 2.sh
+ g++ -c lambda.cpp -ggdb
+ g++ -c main2.cpp -ggdb
+ g++ -o main2 main2.o lambda.o -ggdb
zhifan$ ./main2
3
zhifan$ vim 2.sh
zhifan$ sh -x 2.sh
+ g++ -c lambda.cpp -ggdb **-std=c++11**
+ g++ -c main2.cpp -ggdb -std=c++11
+ g++ -o main2 main2.o lambda.o -ggdb
zhifan$ ./main2
Segmentation fault: 11
zhifan$ g++ -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix
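For reference, the design from the question works once every translation unit is built with the same -std setting; mixing modes presumably compiles the header's std::function member with two different layouts. Below is a single-file sketch of the same pattern. Assumptions: the header's #include <functional> is present, the filter is a lambda, and big_function is changed (hypothetically) to return whether the value passed the filter so the behaviour can be checked.

```cpp
#include <functional>
#include <iostream>

class X {
public:
    std::function<bool(int)> filter;

    // Template member: its definition must be visible where it is used,
    // so it stays in the header.
    template <class F>
    void setFilter(F fn) {
        filter = fn;
    }

    // Ordinary member: declared in the header, defined once in a .cpp file.
    bool big_function(int x);
};

// In the real project this definition would live in lambda.cpp.
bool X::big_function(int x) {
    // Calling an empty std::function throws std::bad_function_call, so check first.
    if (filter && filter(x)) {
        std::cout << x << std::endl;
        return true;
    }
    return false;
}
```

Only setFilter is instantiated per filter type; big_function is compiled once, which is the point of the split.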