Can you include a .cu extension header in a c++ header? - c++

I have a .cu file that compiles fine on its own (right-click, then Compile). However, when a C++ header file includes this .cu file, the build fails. The .cu file's properties have been set to build with the CUDA compiler. The errors I get are 'blockIdx': undeclared identifier, 'blockDim': undeclared identifier, and so on: basically the errors I would expect from compiling CUDA code with a C++ compiler. So is it possible to include .cu CUDA code in a C++ header?
Here is the .cu file:
Matrix.cu
#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_device_runtime_api.h>
#define BLOCKSIZE 32
using namespace std;
template<typename T> class Matrix
{
public:
typedef T value_type;
~Matrix();
Matrix();
Matrix(int rows, int columns);
int height;
int width;
int stride;
size_t size;
void CreateIdentity(Matrix<T>&I);
private:
vector<T> elements;
T* firstElement;
};
template<typename T>
Matrix<T>::~Matrix()
{
}
template<typename T>
Matrix<T>::Matrix()
{
}
template<typename T>
Matrix<T>::Matrix(int rows, int columns)
{
height = rows;
width = columns;
stride = columns; //in row major order this is equal to the # of columns
elements.resize(rows*columns);
firstElement = elements.data();
size = height*width*sizeof(T);
}
__global__ void IdentityMatrixKernel(float* identity, int size)
{
int index_x = blockIdx.x * blockDim.x + threadIdx.x;
int index_y = blockIdx.y * blockDim.y + threadIdx.y;
// map the two 2D indices to a single linear, 1D index
int grid_width = gridDim.x * blockDim.x;
int index = index_y * grid_width + index_x;
// map the two 2D block indices to a single linear, 1D block index
//int result = blockIdx.y * gridDim.x + blockIdx.x;
// write out the result
if (index % (size+1))
{
identity[index] = 0;
}
else
{
identity[index] = 1;
}
}
template<typename T>
void Matrix<T>::CreateIdentity(Matrix<T>&I)
{
float* d_I;
int size1 = I.height;
int size2 = I.height*I.width*sizeof(float);
cudaMalloc(&d_I,size2);
dim3 block_size;
block_size.x = BLOCKSIZE;
block_size.y = BLOCKSIZE;
dim3 grid_size;
grid_size.x = size1/ block_size.x + 1;
grid_size.y = size1/ block_size.y + 1;
IdentityMatrixKernel<<<block_size,grid_size>>>(d_I,size1);
cudaMemcpy(I.GetPointer(),d_I,size2,cudaMemcpyDeviceToHost);
cudaFree(d_I);
}
And here is the header file that includes Matrix.cu:
Element.h
#pragma once
#include "Matrix.cu"
#include <vector>
using namespace std;
class Element
{
public:
Element(void);
~Element(void);
Element(int iD, float k, vector<int> nodes);
Element(int iD, vector<int> nodes, int pId);
void SetElementType(DOF type);
DOF GetElementType();
int GetNodeId(int index);
int GetNodesPerElement();
int GetPartId();
void CalculateShapeFunctions(Matrix<int> spaceCoordinates);
void CalculateSShapeDerivative(Matrix<int> spaceCoordinates);
void CalculateTShapeDerivative(Matrix<int> spaceCoordinates);
Matrix<float> GetShapeFunctions();
float GetSShapeDerivative(int row, int column);
float GetTShapeDerivative(int row, int column);
void SetStrainDisplacement(Matrix<float> B);
Matrix<float> GetStrainDisplacement();
private:
int elementId;
float stiffness;
vector<int> nodeIds;
DOF elementType;
int partId;
Matrix<float> shapeFunctions;
Matrix<float> sShapeDerivative;
Matrix<float> tShapeDerivative;
Matrix<float> strainDisplacement;
};
EDIT:
So I have been directed to try and separate the template class member functions implementing cuda into a .cu file while keeping the template class definition and any template member functions not using cuda in the original header file. This does seem on the right path, c++ compiler compiles the .h file while the cuda compiler does the .cu, but I am having trouble getting rid of link errors. I understand that I need to explicitly instantiate my template class for the types I need in the .cu file to avoid link errors, but I seem to still get them.
I instantiated my template class at the end of the .cu file as follows:
template class Matrix<float>;
template class Matrix<int>;
template class Matrix<string>;
I am now getting link errors to the template member functions using cuda.

Answer: A .cu file cannot be pulled in with #include "file.cu" like a header file, because when it is included from a file built by the C++ compiler, its contents are compiled as plain C++ rather than by the CUDA compiler. The solution was to move everything that uses CUDA into a separate .cu file, keep the declarations of the template member functions inside the template class definition in the header, and add #include "file.h" to file.cu. To resolve the link errors for the template member functions that were moved to the .cu file, an explicit instantiation of the template class was added to the bottom of the header file. Since only float was used with the template functions that call CUDA, only the float instantiation was added: template class Matrix<float>;. With these changes the project compiled and ran correctly.
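The split described above can be sketched in a single file. This is a minimal illustration under stated assumptions, not the asker's actual project: the comments mark where each part would live, SetIdentity and At are hypothetical member names, and a plain CPU loop stands in for the CUDA kernel launch so the sketch builds with any C++ compiler. Note that an explicit instantiation only emits the members whose definitions are visible at that point, which is why it is placed after the definitions here.

```cpp
#include <cstddef>
#include <vector>

// ---- Matrix.h (sketch): declarations only, safe to include from any .cpp ----
template <typename T>
class Matrix {
public:
    Matrix(int rows, int columns);
    void SetIdentity();               // hypothetical name; defined out of line below
    T At(int row, int column) const;  // hypothetical accessor for the example

private:
    int height;
    int width;
    std::vector<T> elements;
};

// ---- Matrix.cu (sketch): definitions, built once by nvcc in the real project ----
template <typename T>
Matrix<T>::Matrix(int rows, int columns)
    : height(rows), width(columns),
      elements(static_cast<std::size_t>(rows) * columns) {}

template <typename T>
void Matrix<T>::SetIdentity() {
    // Assumption: a CPU loop stands in for the kernel launch and cudaMemcpy.
    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c)
            elements[static_cast<std::size_t>(r) * width + c] = (r == c) ? T(1) : T(0);
}

template <typename T>
T Matrix<T>::At(int row, int column) const {
    return elements[static_cast<std::size_t>(row) * width + column];
}

// Explicit instantiation: emits every member of Matrix<float> in this
// translation unit so that other object files can link against them.
template class Matrix<float>;
```

Any type that is used elsewhere but not instantiated this way (for example Matrix<int>) would still produce unresolved-symbol link errors for the out-of-line members.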

Related

How can I resolve calloc() error in .cpp and .hpp files?

I am trying to compile a .cpp file that includes a .hpp file on Linux with the command g++ -c main.cpp, but I get this error about calloc():
error: there are no arguments to ‘calloc’ that depend on a template parameter, so a declaration of ‘calloc’ must be available [-fpermissive]
Tr=(T *)calloc(Rows*Colomns, sizeof(T));
In member function ‘T* MyMatrix::Adjoint()’:
MyMatrix.hpp:276:35: error: there are no arguments to ‘calloc’ that depend on a template parameter, so a declaration of ‘calloc’ must be available [-fpermissive]
Temp = (T*)calloc(N*N, sizeof(T));
I noticed that this code works in Microsoft Visual Studio:
#pragma once
#include <iostream>
#include <fstream>
template <typename T>
class MyMatrix {
private:
int Rows;
int Colomns;
T* A; //Matricea
T* Tr; //Transpusa acesteia
float* Inv; //Inversa
public:
MyMatrix(int L, int C)
{
Rows = L;
Colomns = C;
A = (T*)calloc(Rows * Colomns, sizeof(T));
if (A == NULL)
throw("Eroare la alocarea matricii! :(");
}
MyMatrix(T* S, int L, int C)
: MyMatrix(L, C)
{
for (int i = 0; i < Rows * Colomns; ++i)
A[i] = S[i];
}
~MyMatrix() { free(A); }
void Transposed()
{
Tr = (T*)calloc(Rows * Colomns, sizeof(T));
for (int i = 0; i < Colomns; ++i)
for (int j = 0; j < Rows; ++j)
Tr[j * Colomns + i] = A[i * Rows + j];
}
void Inverse()
{ //some code
T* Adj = Adjoint();
Inv = (float*)calloc(Rows * Rows, sizeof(float));
for (int i = 0; i < this->Rows * this->Rows; ++i)
Inv[i] = Adj[i] / (float)Det;
}
};
#endif // MYMATRIX_HPP_INCLUDED
a declaration of ‘calloc’ must be available
The solution is to declare calloc before using it. Since it is a standard function, it must be declared by including the standard header that is specified to declare it.
calloc is declared in the header <stdlib.h>. Note that the .h suffixed headers from the C standard library are deprecated in favour of using the c prefixed headers such as <cstdlib>. However, the c prefixed headers declare the functions in the std namespace which you have failed to use in this case.
So the complete solution is to include <cstdlib>, and use std::calloc.
However, you don't need to use calloc at all. A better solution is to use std::make_unique or std::vector.
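As a rough sketch of the std::vector suggestion, the matrix storage could look like the following. The member names (at, rows, cols, transposed) are chosen for this example and are not from the original class; the vector value-initialises its elements, which matches the zero fill that calloc provided, and it releases the memory on its own.

```cpp
#include <cstddef>
#include <vector>

template <typename T>
class MyMatrix {
public:
    MyMatrix(int rows, int cols)
        : rows_(rows), cols_(cols),
          data_(static_cast<std::size_t>(rows) * cols) {}  // elements start at T()

    int rows() const { return rows_; }
    int cols() const { return cols_; }

    // Row-major element access.
    T& at(int r, int c) { return data_[static_cast<std::size_t>(r) * cols_ + c]; }
    const T& at(int r, int c) const { return data_[static_cast<std::size_t>(r) * cols_ + c]; }

    // Returns a new matrix instead of writing into a raw member pointer.
    MyMatrix transposed() const {
        MyMatrix result(cols_, rows_);
        for (int r = 0; r < rows_; ++r)
            for (int c = 0; c < cols_; ++c)
                result.at(c, r) = at(r, c);
        return result;
    }

private:
    int rows_;
    int cols_;
    std::vector<T> data_;  // no free() needed; the destructor is implicit
};
```

Because no raw pointer is owned, the compiler-generated copy and move operations are also correct, which was not the case for the calloc-based version.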
As the error message suggests, the call to calloc has no argument that depends on the template parameter T (sizeof(T) is dependent, but the function name itself is not looked up through it), so g++ looks the name up when the template is defined rather than when it is instantiated. At that point no declaration of calloc is visible, because no header that declares it has been included, and the lookup fails.
Visual Studio probably accepts the code either because its standard headers pull in the declaration transitively or because it defers the lookup until the template is instantiated.
Updating g++ to the latest version will not help; the fix is to include the header that declares calloc.
hope this helps!

VisualC++ compiler error C3646: unknown override specifier

I'm trying to code a simple DirectX11 engine but I keep getting this strange error and I can't find the problem: I define a Terrain class and a Mesh class and #include the Mesh class in the Terrain class:
the Terrain class definition:
// Terrain.h
#pragma once
#include "Noise.h"
#include "Mesh.h"
class Terrain
{
public:
Terrain(float width, float depth, int numVerticesW, int numVerticesD);
~Terrain();
float GetHeight(float x, float z);
void Draw();
private:
Mesh mMesh; // I get the error on this line
Noise mNoiseGenerator;
std::vector<float> mHeights;
void CreateTerrain(float width, float depth, int numVerticesW, int numVerticesD);
float ComputeHeight(float x, float z, float startFrequency, float startAmplitude, float persistence, int octaves);
};
and the Mesh class definition:
// Mesh.h
#pragma once
#include <d3d11.h>
#include <vector>
#include "Game.h"
class Mesh
{
public:
Mesh();
~Mesh();
template <typename T, unsigned int N>
void LoadVertexBuffer(T data[][N], unsigned int size, bool dynamic = false);
void LoadIndexBuffer(std::vector<unsigned int> indices);
void SetVertexCount(unsigned int vertexCount);
void Bind();
void Draw();
private:
std::vector<ID3D11Buffer*> mVertexBuffers;
std::vector<unsigned int> mStrides;
ID3D11Buffer *mIndexBuffer;
unsigned int mVertexCount;
};
template <typename T, unsigned int N>
void Mesh::LoadVertexBuffer(T data[][N], unsigned int size, bool dynamic)
{
D3D11_BUFFER_DESC bufferDesc = {};
bufferDesc.Usage = dynamic ? D3D11_USAGE_DYNAMIC : D3D11_USAGE_IMMUTABLE;
bufferDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bufferDesc.ByteWidth = sizeof(T[N]) * size;
bufferDesc.CPUAccessFlags = dynamic ? D3D11_CPU_ACCESS_WRITE : 0;
bufferDesc.MiscFlags = 0;
bufferDesc.StructureByteStride = 0;
D3D11_SUBRESOURCE_DATA bufferData = {};
bufferData.pSysMem = data;
ID3D11Buffer *buffer;
Game::GetInstance()->GetDevice()->CreateBuffer(&bufferDesc, &bufferData, &buffer);
mVertexBuffers.push_back(buffer);
mStrides.push_back(sizeof(T[N]));
}
When I compile the code I get:
Severity Code Description Project File Line Suppression State
Error C3646 'mMesh': unknown override specifier DirectX11 engine 0.3 c:\users\luca\desktop\programming\code\c++\source\visual studio\directx11 engine 0.3\terrain.h 14
Severity Code Description Project File Line Suppression State
Error C4430 missing type specifier - int assumed. Note: C++ does not support default-int DirectX11 engine 0.3 c:\users\luca\desktop\programming\code\c++\source\visual studio\directx11 engine 0.3\terrain.h 14
I searched the web but most results show missing semicolons or circular inclusion issues but I can't find any.
EDIT
I found the issue but I can't explain why my solution works:
following the inclusion tree:
Terrain.h --> Mesh.h --> Game.h --> Renderer.h --> Terrain.h
Removing #include "Terrain.h" from Renderer.h (since I only declare Terrain * pointers inside that class) and adding it to Terrain.cpp seems to solve the issue.
So it must be a matter of circular inclusion, but shouldn't I be guarded against that by using header/include guards?
Your problem is that #pragma once only protects against double inclusion, i.e. it makes the following safe (simplified to make it obvious):
// Terrain.cpp
#include "Terrain.h"
#include "Terrain.h"
It does not solve circular inclusion, which is far harder to solve automatically. With double inclusion, it's clear which one comes first. But a circle has no beginning.
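The usual way to break such a cycle is the one the asker stumbled on: a header that only stores pointers to a class can forward-declare it instead of including its header. The following single-file sketch marks the file boundaries with comments. Renderer::SampleHeight and the body of Terrain::GetHeight are made up for this illustration; only the class names come from the question.

```cpp
// ---- Renderer.h (sketch) ----
class Terrain;  // forward declaration: enough for pointers and references

class Renderer {
public:
    explicit Renderer(Terrain* terrain) : terrain_(terrain) {}
    float SampleHeight(float x, float z) const;  // hypothetical; needs the full type, so defined in the .cpp

private:
    Terrain* terrain_;  // an incomplete type is fine for a pointer member
};

// ---- Terrain.h (sketch) ----
class Terrain {
public:
    // Placeholder body for the example, not the real height computation.
    float GetHeight(float x, float z) const { return x + z; }
};

// ---- Renderer.cpp (sketch): includes Terrain.h, so Terrain is complete here ----
float Renderer::SampleHeight(float x, float z) const {
    return terrain_->GetHeight(x, z);
}
```

With this layout Renderer.h no longer includes Terrain.h, so the chain Terrain.h, Mesh.h, Game.h, Renderer.h ends there instead of looping back.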

Eclipse C++: class, namespace, enumeration not found

Greetings and thank you in advance!
I'm working in macOS X 10.12; Eclipse Neon 4.6, Compiling using macOS X GCC. I am receiving the following error:
../matrix.h:82:1: error: 'Matx' is not a class, namespace, or enumeration
Matx::~matx(){
^
../matrix.h:27:7: note: 'Matx' declared here
The error is confusing due to the following matrix.h file:
#ifndef MATRIX_H_
#define MATRIX_H_
#include <iostream>
template <class T>
class Matx {
int ROWS, COLS ;
int colix[COLS], rowix[ROWS] ;
T ** array ;
Matx(int, int) ;
~Matx() ;
void rowSwap() ;
void size( void ) ;
void swapRows(int i1, int i2) { std::swap(this->array[i1], this->array[i2]); }
void printMat( void ) ;
};// end class matrix
template <class T>
Matx::~Matx(){
delete this->array ;
}// end ~matx()
Note there are a few more functions in the file, but the error is consistent across all of them. I have tried defining the functions with scope resolution and without, i.e. Matx::~m but to no avail. Any help is much appreciated!
You should write the definition of the function like this:
template <class T>
Matx<T>::~Matx(){
delete this->array ;
}// end ~matx()
This part is also wrong:
int ROWS, COLS ;
int colix[COLS], rowix[ROWS] ;
You're defining arrays of size COLS and ROWS, but these are non-const member variables. Array bounds must be compile-time constant expressions. For example:
static constexpr int ROWS = 4;
static constexpr int COLS = 4;
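Putting both fixes together, a minimal sketch might look like this. It assumes a fixed 4x4 size as in the example constants, allocates the rows in a constructor that the original post did not show, and adds a hypothetical at() accessor so the class can be exercised; copying is disabled because the class owns raw memory.

```cpp
#include <utility>

template <class T>
class Matx {
public:
    static constexpr int ROWS = 4;  // compile-time constants, usable as array bounds
    static constexpr int COLS = 4;

    Matx();
    ~Matx();
    Matx(const Matx&) = delete;             // owning raw pointers: forbid copies
    Matx& operator=(const Matx&) = delete;

    void swapRows(int i1, int i2) { std::swap(array[i1], array[i2]); }
    T& at(int r, int c) { return array[r][c]; }  // hypothetical accessor

private:
    int colix[COLS], rowix[ROWS];
    T** array;
};

// Out-of-class definitions repeat the template header and name the class as Matx<T>.
template <class T>
Matx<T>::Matx() : colix{}, rowix{}, array(new T*[ROWS]) {
    for (int r = 0; r < ROWS; ++r)
        array[r] = new T[COLS]();  // zero-initialised row
}

template <class T>
Matx<T>::~Matx() {
    for (int r = 0; r < ROWS; ++r)
        delete[] array[r];  // memory from new[] must be released with delete[]
    delete[] array;
}
```

If the dimensions need to vary per object, non-type template parameters (template <class T, int ROWS, int COLS>) or std::vector would be the alternatives to fixed constants.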

What exactly is template constants in c++, and how we can use them?

Here is my code:
template<int BlockWidth, int FilterWidth>
__global__ gaussian_blur(uchar4 inputImage, uchar4 outputImage,
int numRows, int numCols,
int BLOCKWIDTH, int FILTERWIDTH)
{...}
int main()
{
...
const int blockWidth = 16;
const int filterWidth = 9;
gaussian_blur<<<gridDimension, blockDimension>>>(d_input, d_output,
numRows, numCols,
blockWidth, filterWidth);
...
}
The compiler just kept saying it cannot match the parameters. After searching for a while, I also tried:
gaussian_blur<<<gridDimension, blockDimension>>><int, int>(d_input, d_output,
numRows, numCols,
blockWidth, filterWidth);
but it still does not work.
By the way, can you also explain why template constants were used when trying to squeeze performance out of hardware?
These are not template arguments; the triple angle brackets (<<<...>>>) are part of NVIDIA's non-standard CUDA syntax. You should remove your template declaration; the arguments you provide inside the angle brackets are handled by the CUDA compiler.
(Specifically, get rid of template<int BlockWidth, int FilterWidth>. The arguments you pass with <<< when calling a device (or global) function are handled by nvcc.)
The template arguments should be specified before CUDA <<<...>>> syntax:
const int blockWidth = 16;
const int filterWidth = 9;
gaussian_blur<blockWidth, filterWidth><<<gridDimension, blockDimension>>>(
d_input, d_output,
numRows, numCols,
blockWidth, filterWidth);
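On the performance part of the question: a template constant is known at compile time, so it can size fixed arrays and give the compiler a loop bound it can fully unroll, which a runtime function argument cannot. The sketch below is a host-only C++ analogue, not the asker's kernel; blur_at and its parameters are hypothetical names. In CUDA the same kind of parameter would typically also be used to size a __shared__ array.

```cpp
#include <array>

// FilterWidth is a compile-time constant: it fixes the size of the weights
// array and gives the loop a constant trip count the compiler can unroll.
template <int FilterWidth>
float blur_at(const float* row, int center,
              const std::array<float, FilterWidth>& weights) {
    static_assert(FilterWidth % 2 == 1, "filter width must be odd");
    float sum = 0.0f;
    for (int k = 0; k < FilterWidth; ++k)
        sum += weights[k] * row[center - FilterWidth / 2 + k];
    return sum;
}
```

A call then names the constant explicitly, for example blur_at<3>(row, 2, weights), in the same position where the kernel launch above names blockWidth and filterWidth before the <<<...>>> part. The caller is responsible for keeping center at least FilterWidth / 2 elements away from both ends of the row.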

Using static const + const as array bound

I'm doing something like this
Class.hpp:
class Class {
private:
static const unsigned int arraySize;
int ar[arraySize+2];
};
Class.cpp:
#include <Class.hpp>
const unsigned int arraySize = 384;
The compiler (q++, a c++ compiler for the QNX OS based on g++) gives me error: array bound is not an integer constant while compiling a unit including Class.hpp (not while compiling Class.cpp).
Why isn't that working? I know that a static const member can be used as an array bound, guaranteed by the C++ standard (see this answer). But why doesn't the compiler see the result of static const + const as a constant?
This is good code which should have been accepted by the compiler:
class Class {
const static int arraySize = 384;
int ar[arraySize+2];
};
and if it isn't, your compiler is broken.
However, if you move the actual constant out of the header file into a separate translation unit, the code becomes invalid.
// Class.h
class Class {
const static int arraySize;
int ar[arraySize+2]; // ERROR
};
// Class.cpp
const int Class::arraySize = 384;
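This is because the size of your Class object cannot be determined at compile time from the data available in the header alone. This is not exactly the right reason, but reasoning along these lines helps to understand compilation errors such as this. Strictly speaking, an array bound must be an integral constant expression, and a static const member only qualifies when its initializer is visible at the point of use.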
To avoid making such mistakes, you can replace static const int with an enum, e.g.
class Class {
enum { arraySize = 384 };
int ar[arraySize+2];
};
I'm surprised this actually compiles on gcc, as a comment says. Since the 384 isn't in the header file, the size of the Class is not known to other compilation units. It might not matter in some compilation units depending on how/if they are using Class, but I can't imagine this compiling:
// this is a source file called, say, blah.cpp
#include <Class.hpp>
void someFunc()
{
void *mem = malloc(sizeof(Class)); // size is not known, so this can't compile
// do something with mem
}
You need to have in your .hpp:
class Class {
private:
static const unsigned int arraySize = 384;
int ar[arraySize+2];
};
as it is in the question that you link to.
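A small self-contained check of that last form, assuming a C++11 compiler (so constexpr is used in place of const, and the array is made public purely so the example can inspect it):

```cpp
#include <cstddef>

class Class {
public:
    // The initializer is visible in the class body, so arraySize is usable
    // in constant expressions wherever this header is included.
    static constexpr unsigned int arraySize = 384;
    int ar[arraySize + 2];  // public only for this illustration
};

// Every translation unit that sees the class can now compute its size.
static_assert(sizeof(Class) >= (Class::arraySize + 2) * sizeof(int),
              "Class must hold arraySize + 2 ints");
```

With the value visible in the header, sizeof(Class) is known in every translation unit, which is exactly what the out-of-class definition in Class.cpp could not provide.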