I'm designing a CUDA C++ library with template classes. My classes use some template functions that are invisible to main as well as to the user. I need to specialize (explicitly instantiate) them because compilation happens in two steps; otherwise I get an "unresolved external" error when linking. Since these classes are used in main.cpp, there is no way (I guess...) to tell nvcc which types are going to be used in the main program, so I thought of using some macros to specialize them. Here's a simplified version of the code:
//CUDA_functions.h
// CUDA functions declared here and included in files that will be compiled
// with g++. Those functions are implemented in .cu files, compiled with nvcc
template <typename T>
void foo1(T x);
template <typename T>
void foo2(T x);
template <typename T>
void foo3(T x);
//fileA.h - included in main.cpp
#include "CUDA_functions.h"
template <typename T>
class A {
// it uses foo1 & foo2 inside
};
//fileB.h - included in main.cpp
#include "CUDA_functions.h"
template <typename T>
class B {
// it uses foo1 & foo3 inside
};
//macros.h
#define _USE_CLASS_A(T) template void foo1(T); \
template void foo2(T); /**/
#define _USE_CLASS_B(T) template void foo1(T); \
template void foo3(T); /**/
//user_spec.cu - template specializations by user. This is the first file to be
// - compiled and it doesn't know what classes are going to be used
// say, user wants to use classes A & B: HERE THE ERROR RAISES!
#include "macros.h"
_USE_CLASS_A( int );
_USE_CLASS_B( int );
When I compile this code with Visual Studio, I get a warning about the duplicate explicit instantiation of foo1, but when I compile it with g++ the warning becomes an error!
I can't write macros like
#define _USE_FOO1(T) template void foo1(T) /**/
#define _USE_FOO2(T) template void foo2(T) /**/
#define _USE_FOO3(T) template void foo3(T) /**/
because the user shouldn't have to worry about the existence of those functions, and I'd like to specialize a list of them based on which class he/she is going to use. Last but not least, I found nothing about a "conditional specialization" of templates. What can I do to solve this? Thanks to everyone who is kind enough to answer.
Is it for host code or device code? I believe CUDA does not support linking for device code. Linking template functions in host code has always been a bit fishy, CUDA or no CUDA.
Instead of having your hands dirty with macros -- how about putting them in a header, inside of namespace detail?
By convention, detail namespace indicates library internal stuff that you shouldn't ever access as a user.
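A minimal sketch of that suggestion, assuming the helper templates can be defined entirely in a header (this avoids explicit instantiation, but it does not apply if their bodies must be compiled by nvcc in a .cu file). The namespace `mylib`, the method `run`, and the bodies of `foo1` and `foo2` are hypothetical placeholders, not part of the original code.

```cpp
// mylib.h (hypothetical header-only layout)
namespace mylib {
namespace detail {
// Internal helpers: defined in the header so that every translation unit
// using them can instantiate them implicitly. Bodies are placeholders.
template <typename T>
T foo1(T x) { return x + x; }

template <typename T>
T foo2(T x) { return x * x; }
}  // namespace detail

template <typename T>
class A {
public:
    // Public API; users never name detail::foo1 or detail::foo2 themselves.
    T run(T x) const { return detail::foo1(x) + detail::foo2(x); }
};
}  // namespace mylib
```

With this layout a user only writes `mylib::A<int> a; a.run(3);` and the needed `foo1<int>` and `foo2<int>` are instantiated on demand, so no per-class macro is required.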
I'm trying to optimize my build time using extern templates because I have a lot of generated headers that contain typedefs to a template class.
The template class
// TypeID.h
template <typename T>
class TypeID
{
public:
TypeID(/* <some parameters> */);
bool isNull() const;
// ... many other methods
};
template <typename T>
TypeID<T>::TypeID(/* <some parameters> */)
{
// impl
}
template <typename T>
bool TypeID<T>::isNull() const
{
// impl
}
// same for the rest of the methods
Example of generated header
// NamedID.h
#include "TypeID.h"
typedef TypeID</* some type */> NamedID;
There are many (~2k) headers like NamedID with different types and they're included throughout the project.
I changed the code generator to add this line above the typedef:
extern template class TypeID</* some type */>;
and in addition to the header files, it now also generates a cpp where all the extern templates have a corresponding
template class TypeID</* some type */>;
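For reference, the arrangement described above can be sketched in a single file as follows. This is only an illustration under assumptions: the `int` constructor parameter, the `id_` member, and the tag type `NameTag` are hypothetical stand-ins for the elided parameters and types.

```cpp
// Single-file sketch; in the real project the three parts live in
// TypeID.h, NamedID.h and the generated .cpp respectively.

// --- TypeID.h (simplified; the int payload is a placeholder) ---
template <typename T>
class TypeID {
public:
    explicit TypeID(int id);
    bool isNull() const;
private:
    int id_;
};

template <typename T>
TypeID<T>::TypeID(int id) : id_(id) {}

template <typename T>
bool TypeID<T>::isNull() const { return id_ == 0; }

// --- NamedID.h ---
struct NameTag {};                      // hypothetical tag type
extern template class TypeID<NameTag>;  // declaration: do not instantiate here
typedef TypeID<NameTag> NamedID;

// --- generated .cpp ---
template class TypeID<NameTag>;         // definition: instantiate exactly once
```

Note that the `extern template` declaration only suppresses implicit instantiation of the non-inline members; the template header is still parsed in every translation unit that includes it.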
Due to the number of headers and how many times they're used in the project I expected a big difference in compile time (at least something noticeable) but there's no difference.
I ran several runs of the build with and without this change and all of them take 2h 30m +/-2m.
Did I implement this wrong ? Am I expecting too much ?
My environment:
RHEL 7.7
GCC 8.3.1
CMake + ninja, no ccache, no icecream/distcc
First of all, I have found some seemingly related threads in this forum, but they do not help. For example, 33182246 is about static template member function, but the template member function in my question is not static, and the error therein is not the one I encountered. 12229396 is another irrelevant thread in which answerers are quarreling about whether it is allowed to export a static. But I don't intend to export a static in dll, and the question I am going to ask has nothing to do with it. As for 1053097 which is the first in my search result here, it is not about C++ at all. So I think my question is a new one and here comes the problem.
Environment:
Windows 10 version 1803
Visual Studio 2015 Update 3
Debug x64 mode in VS
Source:
There are two projects in the solution:
1) DllProject, built as a dll, contains three sources: Dll.h, Dll2.h and Dll2.cpp.
Dll.h:
#pragma once
#include "Dll2.h"
#ifdef _WINDLL
#define API_TYPE __declspec(dllexport)
#else
#define API_TYPE __declspec(dllimport)
#endif
class API_TYPE AClass {
public:
template <class T> void CallFunc() {
BClass<T>::testStatic();
}
};
template void AClass::CallFunc<float>(); //explicit instantiation
Dll2.h:
#pragma once
template <typename T>
class BClass {
public:
static T m_Static;
static T testStatic();
};
template <typename T>
T BClass<T>::testStatic() {
return m_Static;
}
Dll2.cpp:
#include "Dll2.h"
template <typename T>
T BClass<T>::m_Static; //define the static
template class BClass<float>; //explicit instantiation
2) ExeProject, built as an exe, contains Exe.cpp.
Exe.cpp:
#include "Dll.h"
int main() {
AClass a;
a.CallFunc<float>();
}
The idea behind the solution structure is as follows. The exe program ExeProject calls functions in the dll, with Dll.h specifying the interface. To hide complexity, as many large open-source projects do, Dll.h only provides a wrapper around the underlying details, and it is this wrapper that the exe calls. The details are implemented by a template class BClass in Dll2.h and Dll2.cpp. By design, BClass is used only inside the dll, so it is not qualified with __declspec(dllexport). Since BClass is a template class, I explicitly instantiate it in Dll2.cpp at line 6. Because CallFunc is also a template, I explicitly instantiate it at line 27 of Dll.h. The problem is caused by the static member in BClass. Since it is static, I need to define it, and this is done in lines 3-4 of Dll2.cpp, so there is an m_Static static variable inside the dll that the dll uses through the static function testStatic(). DllProject compiles correctly. ExeProject also compiles without any problem. But errors arise at link time:
1>------ Build started: Project: ExeProject, Configuration: Debug x64 ------
1>Exe.obj : error LNK2001: unresolved external symbol "public: static float BClass<float>::m_Static" (?m_Static@?$BClass@M@@2MA)
1>C:\tmp\TestStatic\x64\Debug\ExeProject.exe : fatal error LNK1120: 1 unresolved externals
========== Build: 0 succeeded, 1 failed, 1 up-to-date, 0 skipped ==========
As I said above, I have defined the static, I don't want the static to be exported, I have instantiated the template class BClass and the template member function CallFunc, and I have applied API_TYPE to AClass so everything inside it should be exported. So why is there still an error?
The following are what I have tried. To make sure CallFunc is really exported from dll, I qualified it with API_TYPE
class API_TYPE AClass {
public:
template <class T> void API_TYPE CallFunc() {
BClass<T>::testStatic();
}
};
Then I received a compile error: error C2491: 'AClass::CallFunc': definition of dllimport function not allowed. The same is true if I move the implementation out of the class.
However, everything works fine if I move the implementation of template function CallFunc out of the header Dll.h and into Dll.cpp:
Dll.cpp:
#include "Dll.h"
#include "Dll2.h"
template <class T> void AClass::CallFunc() {
BClass<T>::testStatic();
}
and accordingly Dll.h is changed to:
#pragma once
#ifdef _WINDLL
#define API_TYPE __declspec(dllexport)
#else
#define API_TYPE __declspec(dllimport)
#endif
class API_TYPE AClass {
public:
template <class T> void CallFunc();
};
template void API_TYPE AClass::CallFunc<float>(); //explicit instantiation
where #include "Dll2.h" is removed, only declaration of CallFunc is in AClass' declaration and API_TYPE is added to the explicit instantiation of CallFunc at the end (API_TYPE is really erratic to use. Sometimes it works without it but sometimes not).
Now this is my question. For some reason, I would like to keep the implementation of the template member function calling static function in the header Dll.h (out of the class preferably), but as you can see, VS insists on putting the implementation in a .cpp file. So, is there any chance to work around this restriction and have the implementation code stay in header in VS? Thank you.
The workaround I found is to move the definition of the static variable from Dll2.cpp to Dll2.h. The implementation still resides in Dll.h, nothing big is changed, and everything now works. For those who have doubts, Visual Studio handles this very well: the static has only one instance in memory, as expected, and it is not exported, as desired. Pound.
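A sketch of what Dll2.h might look like after that change, assuming nothing else in the class is altered. The point is that the templated definition of m_Static is now visible to any translation unit that instantiates BClass<T>::testStatic(), so that translation unit can instantiate the static itself.

```cpp
// Dll2.h (sketch): the static member's definition now sits in the header.
template <typename T>
class BClass {
public:
    static T m_Static;
    static T testStatic();
};

// Templated definition of the static data member. Because it is a template,
// it may appear in every translation unit that includes this header.
template <typename T>
T BClass<T>::m_Static;

template <typename T>
T BClass<T>::testStatic() {
    return m_Static;
}
```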
Since I have solved my question, I had planned to just delete it, but I am surprised to see the downvote. Following is what I would like to say to this downvoter: I think I know what you are thinking: Sigh! This question is so low. The answer is all in the documentation or C++ Standard which I can recite immediately without even a single glance at it. I am an expert of C++, I can't tolerate such a low question ...
OK, I agree you are, but this question is all about problem-solving, without any intention to deeply discuss the theory of programming language so that I can write a phd dissertation. A practical problem does not have to be solved by showing off a profound mastery of the topic concerned. Just a small surface modification would suffice like what I did described at the beginning. So stop your arrogance and be open to all possibilities so that you can learn some problem-solving skills. It is not only good to you, but also good to the SO as a whole.
First of all, This is not a duplicate question, because 1) this is a linker problem, compiler is passed successfully because I have explicitly instantiated. 2) It's not about template class, but template member function, 3) I have some restrictions on the code structure so some existing tricks do not apply. I have searched my title here and the first few threads (40832391, 20330521, 25320619, 12848876, 36940394) are all about template class, not template member function. Some other threads are actually talking about failure of instantiation so actually a compiler issue but I have tried explicit instantiation and compiling has passed successfully, to repeat. So I hope you can suppress the temptation a little bit to close my question as duplicate and here comes my thread.
Environment:
Windows 10 version 1803
Visual Studio 2015 Update 3
Debug x64 mode in VS
Source:
There are two projects:
1) DllProject, built as a dll, contains two sources: Dll.h and Dll.cpp.
Dll.h:
#pragma once
#ifdef _WINDLL
#define API_TYPE __declspec(dllexport)
#else
#define API_TYPE __declspec(dllimport)
#endif
class API_TYPE AClass {
public:
template <class T> void Func(T& data);
template <class T> void CallFunc(T& data) {
Func<T>(data);
}
};
Dll.cpp:
#include "Dll.h"
template <class T> void AClass::Func(T& data) {
data++;
}
template void AClass::Func<float>(float&); //attempt to explicitly instantiate
2) ExeProject, built as an exe, contains Exe.cpp.
Exe.cpp:
#include "Dll.h"
int main() {
AClass a;
float f = 0.f;
a.CallFunc<float>(f);
}
As you can see, what I want is to only call the template member function CallFunc defined in dll which in turn calls another core template member function Func that does the real work. I don't need to call Func directly in exe so I don't need to export it in dll. Only CallFunc is in the API that needs to be exported and it works fine. The dll project DllProject compiles correctly. The exe project ExeProject also compiles without any problem. But errors arise at linking time:
1>------ Build started: Project: ExeProject, Configuration: Debug x64 ------
1>Exe.obj : error LNK2019: unresolved external symbol "public: void __cdecl AClass::Func<float>(float &)" (??$Func@M@AClass@@QEAAXAEAM@Z) referenced in function "public: void __cdecl AClass::CallFunc<float>(float &)" (??$CallFunc@M@AClass@@QEAAXAEAM@Z)
1>C:\tmp\DllProject\x64\Debug\ExeProject.exe : fatal error LNK1120: 1 unresolved externals
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
I have explicitly instantiated the template member function Func in Dll.cpp at line 8, so compilation works. But linker fails to link to this instantiated function in dll from inside CallFunc. I think the function is instantiated in dll, not in exe, because I specified __declspec(dllimport) for the class. I don't know why CallFunc can't find Func; it's just right above it.
The sources are isolated and minimized from a big open-source project (DllProject corresponds to the library itself and ExeProject to the test code), so the implementation and declaration of the template member function are separated into two files, which is unavoidable in a large project, and the code structure is not to be changed. DllProject, as its name suggests, is to be built as a dll, though building and linking everything as a static library works without any problem. I have done a lot of searching, but threads in this forum and others either move the template implementation into the class declaration or into the header by #include-ing a .tpp file, all violating the above restrictions, or suggest explicit instantiation in various forms, which I think I have done already because compilation succeeds.
I have tried the following methods:
1) Compile under Release configuration
2) use inline (and even __forceinline)
3) Put explicit specialization in Dll.h in class:
class API_TYPE AClass {
public:
template <class T> void Func(T& data);
template<> void AClass::Func<float>(float&);
template <class T> void CallFunc(T& data) {
Func<T>(data);
}
};
4) Put explicit specialization in Dll.h outside class:
class API_TYPE AClass {
...
};
template<> void AClass::Func<float>(float&);
6) Put explicit instantiation in Dll.h outside class:
class API_TYPE AClass {
...
};
template void AClass::Func<float>(float&);
7) Add API_TYPE in the declaration of Func, template implementation and explicit instantiation
template <class T> void API_TYPE Func(T& data);
But none of them works; each reports the same error.
8) Put explicit instantiation in Dll.h in class:
class API_TYPE AClass {
public:
template <class T> void Func(T& data);
template <class T> void CallFunc(T& data) {
Func<T>(data);
}
template void AClass::Func<float>(float&);
};
Compile error: error C2252: an explicit instantiation of a template can only occur at namespace scope
9) Put explicit specialization in Dll.cpp:
Compile error: error C2910: 'AClass::Func': cannot be explicitly specialized
I hope that's enough to demonstrate my effort. So, is there any chance to fix the "unresolved external symbol" under the aforementioned restriction? In case you forget it or didn't read it at all, the restriction is
The template implementation of Func must be separate
from its declaration, i.e., must not be in class declaration or in header file.
In case you assume I don't know that a template function should be instantiated: to repeat, I have explicitly instantiated Func, and tried it in many ways. So the compiler is perfectly happy but the linker spews the error "unresolved external symbol." So why can't the linker find the already instantiated template member function? I also checked the output symbols in the import library DllProject.dll using dumpbin. The symbol ??$Func@M@AClass@@QEAAXAEAM@Z indeed resides in it, as factual as the earth rotating eastward. So do you know how to actively trace the behavior of the linker to figure out why it fails to find the function, instead of guessing blindly? Thanks a lot.
№6 with export directive should work:
template API_TYPE void AClass::Func<float>(float&);
Explicit instantiation tells the compiler that this variant is instantiated in some translation unit, while the export directive informs the linker that it should be exported / imported.
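A single-file sketch of the question's Dll.h and Dll.cpp with that line applied. As a simplifying assumption, API_TYPE here expands to dllexport under MSVC and to nothing on other compilers, so the sketch builds anywhere; the real header would keep the dllexport/dllimport switch from the question.

```cpp
// Simplified API_TYPE (assumption: export-only, empty outside MSVC).
#if defined(_MSC_VER)
#define API_TYPE __declspec(dllexport)
#else
#define API_TYPE
#endif

// --- Dll.h ---
class API_TYPE AClass {
public:
    template <class T> void Func(T& data);
    template <class T> void CallFunc(T& data) {
        Func<T>(data);
    }
};

// --- Dll.cpp ---
template <class T> void AClass::Func(T& data) {
    data++;
}

// Explicit instantiation carrying the export directive.
template API_TYPE void AClass::Func<float>(float&);
```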
I'm trying to templatize a CUDA kernel based on a boolean variable (as shown here: Should I unify two similar kernels with an 'if' statement, risking performance loss?), but I keep getting a compiler error that says my function is not a template. I think that I'm just missing something obvious so it's pretty frustrating.
The following does NOT work:
util.cuh
#include "kernels.cuh"
//Utility functions
kernels.cuh
#ifndef KERNELS
#define KERNELS
template<bool approx>
__global__ void kernel(...params...);
#endif
kernels.cu
template<bool approx>
__global__ void kernel(...params...)
{
if(approx)
{
//Approximate calculation
}
else
{
//Exact calculation
}
}
template __global__ void kernel<false>(...params...); //Error occurs here
main.cu
#include "kernels.cuh"
kernel<false><<<dimGrid,dimBlock>>>(...params...);
The following DOES work:
util.cuh
#include "kernels.cuh"
//Utility functions
kernels.cuh
#ifndef KERNELS
#define KERNELS
template<bool approx>
__global__ void kernel(...params...);
template<bool approx>
__global__ void kernel(...params...)
{
if(approx)
{
//Approximate calculation
}
else
{
//Exact calculation
}
}
#endif
main.cu
#include "kernels.cuh"
kernel<false><<<dimGrid,dimBlock>>>(...params...);
If I throw in the
template __global__ void kernel<false>(...params...);
line at the end of kernels.cuh it also works.
I get the following errors (both referring to the marked line above):
kernel is not a template
invalid explicit instantiation declaration
If it makes a difference I compile all of my .cu files in one line, like:
nvcc -O3 -arch=sm_21 -I. main.cu kernels.cu -o program
All explicit specialization declarations must be visible at the time of the template instantiation. Your explicit specialization declaration is visible only in the kernels.cu translation unit, but not in main.cu.
The following code is indeed working correctly (apart from adding a __global__ qualifier at the explicit instantiation instruction).
#include<cuda.h>
#include<cuda_runtime.h>
#include<stdio.h>
template<bool approx>
__global__ void kernel()
{
    if(approx)
    {
        printf("True branch\n");
    }
    else
    {
        printf("False branch\n");
    }
}
template __global__ void kernel<false>();
int main(void) {
    kernel<false><<<1,1>>>();
    // Wait for the kernel to finish so its printf output is flushed.
    cudaDeviceSynchronize();
    return 0;
}
EDIT
In C++, templated functions are not compiled until an instantiation of the function is required, either implicitly by a use such as a call or through an explicit instantiation. From this point of view, CUDA, which now fully supports templates, behaves exactly the same way as C++.
To make a concrete example, when the compiler finds something like
template<class T>
__global__ void kernel(...params...)
{
...
T a;
...
}
it just checks the function syntax, but produces no object code. So, if you compile a file containing only a templated function like the one above, you get an "empty" object file. This is reasonable, since the compiler does not know which type to assign to a.
The compiler produces object code only when it encounters an instantiation of the function template for concrete arguments. This is how compilation of templated functions works, and it introduces a restriction for multiple-file projects: the implementation (definition) of a templated function must be visible in every file that instantiates it, unless the needed instantiation is provided explicitly in some other translation unit. So, as a rule, you cannot keep the interface in kernels.cuh separate from the implementation in kernels.cu, which is the main reason why the first version of your code does not compile. Accordingly, you must include both interface and implementation in any file that uses the templates; namely, main.cu must include both kernels.cuh and kernels.cu.
Since no code is generated until a template is instantiated, and identical instantiations coming from different files are merged at link time, compilers tolerate the same template file, with both declarations and definitions, being included more than once in a project without generating linkage errors.
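The same behavior can be seen without CUDA. The following is a host-only C++ analogue of the kernel above, given as a sketch; `select_branch` is a hypothetical name and the returned strings are placeholders for the two calculations.

```cpp
#include <string>

// Host-only analogue of the kernel: the template parameter picks a branch
// at compile time. select_branch is a hypothetical name.
template <bool approx>
std::string select_branch() {
    if (approx) {
        return "approximate";
    } else {
        return "exact";
    }
}

// Explicit instantiation definition: forces code generation for <false> in
// this translation unit even if nothing here calls it.
template std::string select_branch<false>();
```

A call such as `select_branch<true>()` in the same file would still work, because the definition is visible and the compiler can instantiate it implicitly.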
There are several tutorials on using templates in C++. An Idiot's Guide to C++ Templates - Part 1, apart from the irritating title, will provide you with a step-by-step introduction to the topic.
Been away from C++ for a few years and am getting a linker error from the following code:
Gene.h
#ifndef GENE_H_INCLUDED
#define GENE_H_INCLUDED
template <typename T>
class Gene {
public:
T getValue();
void setValue(T value);
void setRange(T min, T max);
private:
T value;
T minValue;
T maxValue;
};
#endif // GENE_H_INCLUDED
Gene.cpp
#include "Gene.h"
template <typename T>
T Gene<T>::getValue() {
return this->value;
}
template <typename T>
void Gene<T>::setValue(T value) {
if(value >= this->minValue && value <= this->minValue) {
this->value = value;
}
}
template <typename T>
void Gene<T>::setRange(T min, T max) {
this->minValue = min;
this->maxValue = max;
}
Using Code::Blocks and GCC if it matters to anyone. Also, clearly porting some GA stuff to C++ for fun and practice.
The template definition (the cpp file in your code) has to be included prior to instantiating a given template class, so you either have to include function definitions in the header, or #include the cpp file prior to using the class (or do explicit instantiations if you have a limited number of them).
Including the cpp file containing the implementations of the template class functions works. However, IMHO, this is weird and awkward. There must surely be a slicker way of doing this?
If you have only a few different instances to create, and know them beforehand, then you can use "explicit instantiation".
This works something like this:
At the bottom of Gene.cpp, after the member function definitions, add the following lines
template class Gene<int>;
template class Gene<float>;
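Put together, Gene.cpp might look like the sketch below, with the header contents inlined so that it compiles on its own. Two things here are assumptions rather than part of the original: the members are value-initialized, and the upper bound in setValue compares against maxValue.

```cpp
// Gene.cpp sketch with the contents of Gene.h inlined.
template <typename T>
class Gene {
public:
    T getValue();
    void setValue(T value);
    void setRange(T min, T max);
private:
    T value{};     // value-initialized here; the original leaves it unset
    T minValue{};
    T maxValue{};
};

template <typename T>
T Gene<T>::getValue() {
    return this->value;
}

template <typename T>
void Gene<T>::setValue(T value) {
    // Upper bound compares against maxValue (assumed intent).
    if (value >= this->minValue && value <= this->maxValue) {
        this->value = value;
    }
}

template <typename T>
void Gene<T>::setRange(T min, T max) {
    this->minValue = min;
    this->maxValue = max;
}

// Explicit instantiations, placed after the member definitions so that
// those members are instantiated too.
template class Gene<int>;
template class Gene<float>;
```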
In if(value >= this->minValue && value <= this->minValue) the second minValue should be maxValue, no?
Echo what Sean said: What's the error message? You've defined and declared the functions, but you've not used them in anything anywhere, nor do I see an error (besides the typo).
TLDR
It seems that you need an Explicit Instantiation i.e. to actually create the class. Since template classes are just "instructions" on how to create a class you actually need to tell the compiler to create the class. Otherwise the linker won't find anything when it goes looking.
The thorough explanation
When compiling your code, g++ goes through a number of steps; the problem you're seeing occurs in the linking step. Template classes define how classes "should" be created; they're literally templates. At compile time g++ compiles each cpp file individually, so when it compiles Gene.cpp the compiler sees your template for how to create a class but no instructions on which classes to create, and therefore generates nothing. Later, during the linking step, g++ attempts to link against the code for the class (which doesn't exist), fails to find it, and ultimately returns an error.
To remedy this you actually need to "explicitly instantiate" the class by adding the following lines to Gene.cpp after the definition of the class
template class Gene<whatever_type_u_wanna_use_t>; // e.g. int
Check out these docs; I found them to be super helpful.