Calling a function makes my code ridiculously slow - C++

I'm trying to implement an algorithm that, given a rectangle and a number of polygons chosen by the user, determines whether each polygon is inside, outside, or intersecting the rectangle, and counts how many polygons fall into each category.
I coded an algorithm and it works, but I noticed that right after compilation it takes at least 20 seconds to start (this doesn't happen if I run it a second, third, or any later time).
Trying to figure out what was slowing my code down so much, I noticed that the program runs instantly if I delete the call to the function that determines each polygon's position relative to the rectangle.
I tried to find something wrong but found nothing.
Here it is:
// struct used in the function
struct Polygon
{
    int ** points;
    int vertices;
};
// inside, outside and over are the number of polygons that are inside, outside, or intersecting the rectangle;
// they're initialized to 0 in main.
// down_side, up_side are the y coordinates of the two horizontal sides.
// left_side, right_side are the x coordinates of the two vertical sides.
void checkPolygons( Polygon * polygon, int & inside, int & outside, int & over, unsigned int polygons, const unsigned int down_side, const unsigned int up_side, const unsigned int left_side, const unsigned int right_side )
{
    for ( unsigned int pol = 0; pol < polygons; ++pol )
    {
        unsigned int insideVertices = 0;
        unsigned int vertices = polygon[ pol ].vertices;
        for ( unsigned int point = 0; point < vertices; ++point )
        {
            unsigned int x_coordinate = polygon[ pol ].points[ point ][ 0 ];
            unsigned int y_coordinate = polygon[ pol ].points[ point ][ 1 ];
            if ( ( x_coordinate <= right_side ) and ( x_coordinate >= left_side ) and ( y_coordinate <= up_side ) and ( y_coordinate >= down_side ) )
            {
                insideVertices++;
            }
        }
        if ( insideVertices == 0 )
            ++outside;
        else if ( insideVertices == vertices )
            ++inside;
        else
            ++over;
    }
}

Check your antivirus activity and configuration. It may be scanning the newly compiled executables for viruses. If that's the case, you may want to exclude the directory where you compile from virus scanning.
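One way to confirm that the function itself is not the bottleneck is to time only the call: a start-up stall caused by an on-access scanner happens before main() even runs, so it will not show up in the measurement. Below is a minimal, hypothetical harness (the polygon coordinates and rectangle bounds are made up) that times a single call to checkPolygons with std::chrono:

#include <chrono>
#include <iostream>

int main()
{
    int inside = 0, outside = 0, over = 0;

    // One 4-vertex polygon with made-up coordinates.
    int p0[2] = { 1, 1 }, p1[2] = { 4, 1 }, p2[2] = { 4, 4 }, p3[2] = { 1, 4 };
    int * pts[4] = { p0, p1, p2, p3 };
    Polygon poly{ pts, 4 };

    auto start = std::chrono::steady_clock::now();
    checkPolygons( &poly, inside, outside, over, 1,
                   /*down_side=*/0, /*up_side=*/10, /*left_side=*/0, /*right_side=*/10 );
    auto stop = std::chrono::steady_clock::now();

    std::cout << "checkPolygons took "
              << std::chrono::duration_cast<std::chrono::microseconds>( stop - start ).count()
              << " us (inside=" << inside << ", outside=" << outside << ", over=" << over << ")\n";
}

If this prints a tiny number while the program as a whole still takes 20 seconds to appear, the delay is happening outside your code, which points at the scanner (or something else in process start-up) rather than at checkPolygons.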

Related

Salome problem with importing 2D elements from another mesh

I want to have conformal meshes at the interface of two solids, but the import 2D algorithm misidentifies which nodes belong to which sub-shape and raises an error. I went through the source files but couldn't find the problem, and I wasn't able to join the Salome forum.
Here is the copied mesh:
It correctly identifies the node belonging to edge 541, but the next node is set to belong to another edge. This problem only happens when there is an extra vertex, like so:
I suspect this part of the source code is the problem:
// check if a not shared link lies on face boundary
bool nodesOnBoundary = true;
list< TopoDS_Shape > bndShapes;
for ( int is1stN = 0; is1stN < 2 && nodesOnBoundary; ++is1stN )
{
    const SMDS_MeshNode* n = is1stN ? link.node1() : link.node2();
    if ( !subShapeIDs.count( n->getshapeId() )) // n is assigned to FACE
    {
        for ( size_t iE = 0; iE < edges.size(); ++iE )
            if ( helper.CheckNodeU( edges[iE], n, u=0, projTol, /*force=*/true ))
            {
                BRep_Tool::Range(edges[iE],f,l);
                if ( Abs(u-f) < 2 * faceTol || Abs(u-l) < 2 * faceTol )
                    // duplicated node on vertex
                    return error("Source elements overlap one another");
                tgtFaceSM->RemoveNode( n );
                tgtMesh->SetNodeOnEdge( n, edges[iE], u );
                break;
            }
        nodesOnBoundary = subShapeIDs.count( n->getshapeId());
    }

OpenGL triangle adjacency calculation

I am trying to write a program that uses OpenGL's triangle adjacencies feature (GL_TRIANGLES_ADJACENCY) to determine the silhouette of a mesh from a local light source. I'm using ASSIMP to load my mesh, and everything seems to be working correctly as far as loading and displaying the mesh is concerned. Unfortunately, the code I've written to store the indices of the adjacent triangles does not seem to be working correctly.
index[0] = mesh.mFaces[i].mIndices[0];
index[2] = mesh.mFaces[i].mIndices[1];
index[4] = mesh.mFaces[i].mIndices[2];
index[1] = findAdjacentIndex( mesh, index[0], index[2], index[4] );
index[3] = findAdjacentIndex( mesh, index[0], index[2], index[4] );
index[5] = findAdjacentIndex( mesh, index[0], index[2], index[4] );
The basic idea behind my algorithm is this: given a mesh and three indices from that mesh, find all faces (there should be 1 or 2, depending on whether there actually is an adjacent face) of the mesh that share the edge between the first and second vertices. Then return the third index of the triangle that does NOT use the third index of our original passed triangle. This way the same algorithm can be used for all indices of the triangle in sequence.
unsigned int Mesh::findAdjacentIndex(const aiMesh& mesh, const unsigned int index1, const unsigned int index2, const unsigned int index3) {
    std::vector<unsigned int> indexMap[2];
    // first pass: find all faces that use the first index
    for( unsigned int i=0; i<mesh.mNumFaces; ++i ) {
        unsigned int*& indices = mesh.mFaces[i].mIndices;
        if( indices[0] == index1 || indices[1] == index1 || indices[2] == index1 ) {
            indexMap[0].push_back(i);
        }
    }
    // second pass: find the two faces that share the second index
    for( unsigned int i=0; i<indexMap[0].size(); ++i ) {
        unsigned int*& indices = mesh.mFaces[indexMap[0][i]].mIndices;
        if( indices[0] == index2 || indices[1] == index2 || indices[2] == index2 ) {
            indexMap[1].push_back(i);
        }
    }
    // third pass: find the face that does NOT use the third index and return its third index
    for( unsigned int i=0; i<indexMap[1].size(); ++i ) {
        unsigned int*& indices = mesh.mFaces[indexMap[1][i]].mIndices;
        if( indices[0] != index3 && indices[1] != index3 && indices[2] != index3 ) {
            if( indices[0] != index1 && indices[0] != index2 ) {
                return indices[0];
            }
            if( indices[1] != index1 && indices[1] != index2 ) {
                return indices[1];
            }
            if( indices[2] != index1 && indices[2] != index2 ) {
                return indices[2];
            }
        }
    }
    // no third index was found, this means there is no face adjacent to this one.
    // return primitive restart index
    return restartIndex;
}
Based on my understanding of what I've written, the above function should work perfectly on this example image taken from the OpenGL spec:
Triangle Adjacency Example
Unfortunately, my function does NOT work on any of my real world meshes and I have no idea why. Passing a simple box mesh through the function for example seems to usually return 0 as the adjacent index for each vertex, which makes little sense to me. The result is that the adjacencies are not uploaded correctly and I get an incorrect silhouette from my object...
If anyone here could thus shed any light on what's going wrong and what I can do to fix it, I'd be very grateful. I'd also be happy to provide more info if any is needed.
You are making it way more complicated than it needs to be. You want to search for triangles that share a specific edge and return the third vertex. Then just do that.
for(unsigned int i=0; i<mesh.mNumFaces; ++i ) {
    unsigned int*& indices = mesh.mFaces[i].mIndices;
    for(int edge = 0; edge < 3; ++edge) { //iterate all edges of the face
        unsigned int v1 = indices[edge];             //first edge index
        unsigned int v2 = indices[(edge + 1) % 3];   //second edge index
        unsigned int vOpp = indices[(edge + 2) % 3]; //index of opposite vertex
        //if the edge matches the search edge and the opposite vertex does not match
        if(((v1 == index1 && v2 == index2) || (v2 == index1 && v1 == index2)) && vOpp != index3)
            return vOpp; //we have found the adjacent vertex
    }
}
return -1;
Furthermore, you need to change your calls. If you call the function three times with the same arguments, you will get the same results, of course:
index[1] = findAdjacentIndex( mesh, index[0], index[2], index[4] );
index[3] = findAdjacentIndex( mesh, index[2], index[4], index[0] );
index[5] = findAdjacentIndex( mesh, index[4], index[0], index[2] );
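If the per-call linear scan over all faces ever becomes a bottleneck on large meshes, a common alternative (not part of the answer above, and it assumes a consistently wound, 2-manifold mesh) is to build a map from each directed edge to its opposite vertex once, and then answer every adjacency query with a single lookup. The names buildEdgeMap and adjacentIndex below are hypothetical:

#include <map>
#include <utility>
#include <assimp/mesh.h>

// Build a map: directed edge (a, b) -> index of the vertex opposite that edge.
// With consistent winding, the neighbouring face stores the same edge as (b, a).
std::map<std::pair<unsigned int, unsigned int>, unsigned int>
buildEdgeMap(const aiMesh& mesh)
{
    std::map<std::pair<unsigned int, unsigned int>, unsigned int> edgeToOpposite;
    for (unsigned int i = 0; i < mesh.mNumFaces; ++i) {
        const unsigned int* idx = mesh.mFaces[i].mIndices;
        for (int e = 0; e < 3; ++e) {
            unsigned int a    = idx[e];
            unsigned int b    = idx[(e + 1) % 3];
            unsigned int vOpp = idx[(e + 2) % 3];
            edgeToOpposite[std::make_pair(a, b)] = vOpp;
        }
    }
    return edgeToOpposite;
}

// Adjacent vertex for the edge (index1, index2) of the current face: look up
// the reversed key, which belongs to the face on the other side of the edge.
unsigned int adjacentIndex(
    const std::map<std::pair<unsigned int, unsigned int>, unsigned int>& edgeToOpposite,
    unsigned int index1, unsigned int index2, unsigned int restartIndex)
{
    auto it = edgeToOpposite.find(std::make_pair(index2, index1));
    return it != edgeToOpposite.end() ? it->second : restartIndex;
}

Note that with this layout the third index of the triangle is not needed at all, since the reversed edge key can only come from the neighbouring face.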

How to use Matlab's 512 element lookup table array in OpenCV?

I am designing morphological operations in OpenCV. I am trying to mimic the functions remove and bridge in MATLAB's bwmorph. To do this I referred to the function definition in bwmorph.m, where I obtained the lookup-table arrays for remove and bridge.
After that step the procedure is the same for both MATLAB and OpenCV.
lut(img,lutarray,img)
The problem is that MATLAB uses a 512-element (9-bit) lookup-table scheme while OpenCV uses a 256-element (8-bit) scheme. How do I use the MATLAB lutarray in OpenCV?
After doing some research I came across this post.
What does the person mean when they say that they "split" the image from 0-512 and then into two parts?
Is the above method even correct? Are there any alternatives to doing this?
bwlookup(bw,lut) (http://se.mathworks.com/help/images/ref/bwlookup.html), or internally applylut, performs a 2-by-2 or 3-by-3 neighborhood operation on a binary (black & white) image, whereas OpenCV's cv::LUT performs a per-pixel gray-level transform (closely related to intlut in MATLAB). An example of the latter is performing a gamma correction on a gray-level image.
//! transforms array of numbers using a lookup table: dst(i)=lut(src(i))
CV_EXPORTS_W void LUT(InputArray src, InputArray lut, OutputArray dst,
int interpolation=0);
To my knowledge, there is no neighborhood bwlookup implementation in OpenCV. However, following the description of MATLAB's bwlookup, you can write it yourself.
// performs 3-by-3 lookup on binary image
void bwlookup(
        const cv::Mat & in,
        cv::Mat & out,
        const cv::Mat & lut,
        int bordertype=cv::BORDER_CONSTANT,
        cv::Scalar px = cv::Scalar(0) )
{
    if ( in.type() != CV_8UC1 )
        CV_Error(CV_StsError, "er");
    if ( lut.type() != CV_8UC1 || lut.rows*lut.cols!=512 || !lut.isContinuous() )
        CV_Error(CV_StsError, "lut size != 512" );
    if ( out.type() != in.type() || out.size() != in.size() )
        out = cv::Mat( in.size(), in.type() );

    const unsigned char * _lut = lut.data;
    cv::Mat t;
    cv::copyMakeBorder( in,t,1,1,1,1,bordertype,px);
    const int rows=in.rows+1;
    const int cols=in.cols+1;
    for ( int y=1;y<rows;++y)
    {
        for ( int x=1;x<cols;++x)
        {
            int L = 0;
            const int jmax=y+1;
#if 0 // row-major order
            for ( int j=y-1, k=1; j<=jmax; ++j, k<<=3 )
            {
                const unsigned char * p = t.ptr<unsigned char>(j) + x-1;
                for ( unsigned int u=0;u<3;++u )
                {
                    if ( p[u] )
                        L += (k<<u);
#else // column-major order (MATLAB)
            for ( int j=y-1, k=1; j<=jmax; ++j, k<<=1 )
            {
                const unsigned char * p = t.ptr<unsigned char>(j) + x-1;
                for ( unsigned int u=0;u<3;++u )
                {
                    if ( p[u] )
                        L += (k<<3*u);
#endif
                }
            }
            out.at<unsigned char>(y-1,x-1)=_lut[ L ];
        }
    }
}
I tested it against remove and bridge, so it should work. Hope that helps.
Edit: After checking against a random lookup table,
lut = uint8( rand(512,1)>0.5 ); % #MATLAB
B = bwlookup( A, lut );
I flipped the order in which the indices appear in the lookup table (it doesn't matter if the operation is symmetric).
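For completeness, here is a hypothetical example of calling the bwlookup function above. The 512 values in removeLutValues are placeholders (they would have to be copied from the table extracted out of bwmorph.m), and the file names are made up:

#include <opencv2/opencv.hpp>

int main()
{
    // Placeholder table; fill with the 512 values taken from bwmorph.m.
    unsigned char removeLutValues[512] = { 0 };
    cv::Mat removeLut( 512, 1, CV_8UC1, removeLutValues );

    // Any nonzero pixel counts as "on" inside bwlookup above.
    cv::Mat bw = cv::imread( "binary.png", cv::IMREAD_GRAYSCALE );
    bw = bw > 0;

    cv::Mat result;
    bwlookup( bw, result, removeLut );

    // The MATLAB tables hold 0/1 values, so scale up before saving for viewing.
    cv::imwrite( "removed.png", result * 255 );
    return 0;
}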

Cuda efficient insertion of data into unsorted populated array

I have two arrays in CUDA:
int *main; // unsorted
int *source; // sorted
Part of my algorithm requires that I regularly insert new data into the main array from the source array. If a position within the main array is zero, it is assumed to be empty and can therefore be populated with a value from the source array.
I'm just wondering what the most efficient method of doing this is; I've tried a couple of approaches, but I still think there are some more performance gains to be made here.
Currently I'm using a modified version of a radix sort to "shuffle" the contents of the main array to the very end, leaving all zero values at the beginning, which makes the insertion from source trivial. The sort has been modified to iterate over a single bit rather than 32 bits; this works with a simple switch on the input:
input[i] = source[i] > 1 ? 1 : 0
I'm wondering whether this is already an efficient way of doing this, or whether I would gain something by using a tactically deployed atomicAdd, such as:
__global__ void find(int *destination, int *indices, const int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if((destination[idx] == 0)&&(count<elements_to_add))
    {
        indices[count] = idx;
        atomicAdd(&count, 1);
    }
}

__global__ void insert(int *destination, int *indices, int *source, const int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if((source[idx] > 0)&&(indices[idx] > 0))
    {
        destination[indices[idx]] = source[idx];
    }
}

find<<<G,T>>>(...);
insert<<<G,T>>>(...);
I'm not inserting that many items via the source array at the moment, but that could change in the future.
This feels like it should be a common problem that has been solved before. I'm wondering if the Thrust library may help, but having browsed for appropriate functions, it doesn't quite feel right for what I'm trying to accomplish (it doesn't fit very neatly with the code I already have).
Thoughts from experienced Cuda developers appreciated!
You can decouple your finding algorithm, which is categorized as a stream compaction procedure, from your insertion, which is categorized as a scatter procedure. However, you can also merge the functionality of the two.
Assuming srcPtr points to a counter in global memory that is set to zero before the kernel launch:
// Assuming N is the length of the destination buffer and that the length of the source buffer is less than N.
__global__ void find_and_insert( int* destination, int const* source, int const N, int* srcPtr )
{
    int const idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Get the assigned element.
    int const dstElem = destination[ idx ];
    bool const pred = ( dstElem == 0 );

    // Intra-warp binary reduction to count the total number of lanes with empty elements.
    int const predBallot = __ballot( pred );
    int const intraWarpRed = __popc( predBallot );

    // Warp-aggregated atomics to reduce the contention over the srcPtr content.
    unsigned int laneID; asm( "mov.u32 %0, %%laneid;" : "=r"(laneID) ); //const uint laneID = tidWithinCTA & ( WARP_SIZE - 1 );
    int posW;
    if( laneID == 0 )
        posW = atomicAdd( srcPtr, intraWarpRed );
    posW = __shfl( posW, 0 );

    // Threads that have found empty elements can fill out their assigned positions from the src. Intra-warp binary prefix sum is used here.
    uint laneMask; asm( "mov.u32 %0, %%lanemask_lt;" : "=r"(laneMask) ); //const uint laneMask = 0xFFFFFFFF >> ( WARP_SIZE - laneID ) ;
    int const positionToRead = posW + __popc( predBallot & laneMask );
    if( pred )
        destination[ idx ] = source[ positionToRead ];
}
A few things:
This kernel is just a suggestion on how you can do it. Here threads inside the warps collaborate on the task. You can extend the binary reduction and prefix sum over the thread-block.
I wrote this kernel inside the browser and haven't tested it. So be careful.
The whole design is not something new. Similar approaches have been implemented (for example in this paper) and are mostly based on the work done by Mark Harris and Michael Garland.
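Since the question mentions Thrust: the same find-then-insert pattern can also be written with library primitives, which may be a reasonable baseline to compare the custom kernels against. This is only a sketch under the question's assumptions (zero means "empty", and extra source values are dropped if there are fewer empty slots than source values); it has to be compiled with nvcc:

#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/scatter.h>
#include <thrust/iterator/counting_iterator.h>
#include <algorithm>

// Predicate over positions: true where the destination slot holds a zero.
struct is_empty
{
    const int* dest;
    __host__ __device__ bool operator()(int idx) const { return dest[idx] == 0; }
};

void insert_from_source( thrust::device_vector<int>& destination,
                         const thrust::device_vector<int>& source )
{
    thrust::device_vector<int> empty_indices( destination.size() );
    is_empty pred = { thrust::raw_pointer_cast( destination.data() ) };

    // Stream compaction: collect the indices of all empty slots.
    thrust::device_vector<int>::iterator end =
        thrust::copy_if( thrust::counting_iterator<int>( 0 ),
                         thrust::counting_iterator<int>( (int)destination.size() ),
                         empty_indices.begin(), pred );
    size_t n = std::min( (size_t)( end - empty_indices.begin() ), source.size() );

    // Scatter: destination[ empty_indices[i] ] = source[i] for i in [0, n).
    thrust::scatter( source.begin(), source.begin() + n,
                     empty_indices.begin(), destination.begin() );
}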

Implementing De Boor's algorithm for finding points on a B-spline

I've been working on this for several weeks but have been unable to get my algorithm working properly, and I'm at my wit's end. Here's an illustration of what I have achieved:
If everything were working, I would expect a perfect circle/oval at the end.
My sample points (in white) are recalculated every time a new control point (in yellow) is added. At 4 control points everything looks perfect, and when I add a 5th on top of the 1st things still look alright, but on the 6th it starts to drift off to the side, and on the 7th it jumps to the origin!
Below I'll post my code, where calculateWeightForPointI contains the actual algorithm. For reference, here is the information I'm trying to follow. I'd be very grateful if someone could take a look for me.
void updateCurve(const std::vector<glm::vec3>& controls, std::vector<glm::vec3>& samples)
{
    int subCurveOrder = 4; // = k = I want to break my curve into cubics

    // De Boor, 1st attempt
    if(controls.size() >= subCurveOrder)
    {
        createKnotVector(subCurveOrder, controls.size());
        samples.clear();
        for(int steps=0; steps<=20; steps++)
        {
            // use steps to get a 0-1 range value for progression along the curve
            // then get that value into the range [k-1, n+1]
            // k-1 = subCurveOrder-1
            // n+1 = always the number of total control points
            float t = ( steps / 20.0f ) * ( controls.size() - (subCurveOrder-1) ) + subCurveOrder-1;

            glm::vec3 newPoint(0,0,0);
            for(int i=1; i <= controls.size(); i++)
            {
                float weightForControl = calculateWeightForPointI(i, subCurveOrder, controls.size(), t);
                newPoint += weightForControl * controls.at(i-1);
            }
            samples.push_back(newPoint);
        }
    }
}

//i = the weight we're looking for, i should go from 1 to n+1, where n+1 is equal to the total number of control points.
//k = curve order = power/degree +1. eg, to break whole curve into cubics use a curve order of 4
//cps = number of total control points
//t = current step/interp value
float calculateWeightForPointI( int i, int k, int cps, float t )
{
    //test if we've reached the bottom of the recursive call
    if( k == 1 )
    {
        if( t >= knot(i) && t < knot(i+1) )
            return 1;
        else
            return 0;
    }

    float numeratorA = ( t - knot(i) );
    float denominatorA = ( knot(i + k-1) - knot(i) );
    float numeratorB = ( knot(i + k) - t );
    float denominatorB = ( knot(i + k) - knot(i + 1) );

    float subweightA = 0;
    float subweightB = 0;

    if( denominatorA != 0 )
        subweightA = numeratorA / denominatorA * calculateWeightForPointI(i, k-1, cps, t);
    if( denominatorB != 0 )
        subweightB = numeratorB / denominatorB * calculateWeightForPointI(i+1, k-1, cps, t);

    return subweightA + subweightB;
}

//returns the knot value at the passed in index
//if i = 1 and we want Xi then we have to remember to index with i-1
float knot(int indexForKnot)
{
    // When getting the index for the knot function I remember to subtract 1 from i because of the difference caused by us counting from i=1 to n+1 and indexing a vector from 0
    return knotVector.at(indexForKnot-1);
}

//calculate the whole knot vector
void createKnotVector(int curveOrderK, int numControlPoints)
{
    int knotSize = curveOrderK + numControlPoints;
    for(int count = 0; count < knotSize; count++)
    {
        knotVector.push_back(count);
    }
}
Your algorithm seems to work for any inputs I tried it on. Your problem might be that a control point is not where it is supposed to be, or that the control points haven't been initialized properly. It looks like there are two control points half the height below the bottom-left corner.
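If it helps with debugging, one quick sanity check on the Cox-de Boor weights (a hypothetical helper built on the functions from the question, not part of the original post) is that they should sum to 1 at every sampled t inside the valid parameter range. Note that knotVector is cleared here before rebuilding, since createKnotVector above only appends:

#include <cstdio>

void checkPartitionOfUnity( const std::vector<glm::vec3>& controls, int k )
{
    knotVector.clear();
    createKnotVector( k, controls.size() );
    for( int steps = 0; steps <= 20; steps++ )
    {
        float t = ( steps / 20.0f ) * ( controls.size() - (k-1) ) + k-1;
        float sum = 0;
        for( int i = 1; i <= controls.size(); i++ )
            sum += calculateWeightForPointI( i, k, controls.size(), t );
        std::printf( "t = %f, weight sum = %f\n", t, sum ); // expect ~1.0 at every sample
    }
}

If the sums stay at 1 while the curve still misbehaves, the weights and knot vector are fine and the problem is in the control points themselves, as suggested above.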