Nested loop summation in 2D using OpenCL - c++

I have recently started working with OpenCL in C++ and I'm trying to fully understand how to use 2D and 3D NDRange. I'm currently implementing Inverse Distance Weighting in OpenCL, but my problem is general.
Below is the serial function that computes the weights; it consists of a nested loop.
void computeWeights(int nGrids, int nPoints, double *distances, double *weightSum, const double p) {
    for (int i = 0; i < nGrids; ++i) {
        double sum = 0;
        for (int j = 0; j < nPoints; ++j) {
            double weight = 1 / pow(distances[i * nPoints + j], p);
            distances[i * nPoints + j] = weight;
            sum += weight;
        }
        weightSum[i] = sum;
    }
}
What I would want is to implement the above function using a 2D NDRange, the first being over nGrids and the second over nPoints. What I don't understand, though, is how to handle the summation of the weights into weightSum[i]. I understand that I may have to use parallel sum reduction, somehow.

When dispatching a kernel with a 2D global workspace, OpenCL creates a grid of work-items. Each work-item executes the kernel and gets unique ids in both those dimensions.
(x,y)|________________________
| (0,0) (0,1) (0,2) ...
| (1,0) (1,1) (1,2)
| (2,0) (2,1) (2,2)
| ...
The work-items are also divided into groups and get unique ids within those work-groups. E.g. for work-groups of size (2,2):
(x,y)|________________________
| (0,0) (0,1) (0,0) ...
| (1,0) (1,1) (1,0)
| (0,0) (0,1) (0,0)
| ...
You can arrange the work-groups so that each one of them performs a reduction.
Your SDK probably has samples, and a parallel reduction will be one of them.
To get you started, here is a kernel that solves your problem. It's in its simplest form, and works for a single work-group per row.
// cl::NDRange global(nPoints, nGrids);
// cl::NDRange local(nPoints, 1);
// cl::Local data(nPoints * sizeof (double));
kernel
void computeWeights(global double *distances, global double *weightSum, local double *data, double p)
{
    uint nPoints = get_global_size(0);
    uint j = get_global_id(0);
    uint i = get_global_id(1);
    uint lX = get_local_id(0);

    // Each work-item computes one weight and stores it back to global memory
    double weight = 1.0 / pow(distances[i * nPoints + j], p);
    distances[i * nPoints + j] = weight;
    data[lX] = weight;

    // Tree reduction in local memory; assumes the work-group size
    // (here nPoints) is a power of two
    for (uint d = get_local_size(0) >> 1; d > 0; d >>= 1)
    {
        barrier(CLK_LOCAL_MEM_FENCE);
        if (lX < d)
            data[lX] += data[lX + d];
    }
    if (lX == 0)
        weightSum[i] = data[0];
}
Each row of work-items (i.e. each work-group) computes the weights (and their sum) for grid i. Each work-item computes a weight, stores it back to distances, and loads it onto local memory. Then each work-group performs a reduction in local memory, and finally the result gets stored in weightSum.
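For reference, here is a rough host-side sketch of how this kernel could be dispatched with the legacy C++ wrapper (cl.hpp). The context, queue and built program are assumed to exist already, and nPoints is assumed to be a power of two that fits within the device's maximum work-group size:
#include <CL/cl.hpp>
#include <vector>

// A sketch only: `context`, `queue` and `program` are assumed to be set up
// elsewhere, and nPoints must be a power of two small enough for one work-group.
void runComputeWeights(cl::Context &context, cl::CommandQueue &queue,
                       cl::Program &program,
                       std::vector<double> &distances, // nGrids * nPoints entries
                       std::vector<double> &weightSum, // nGrids entries
                       int nGrids, int nPoints, double p)
{
    cl::Buffer dDistances(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                          distances.size() * sizeof(double), distances.data());
    cl::Buffer dWeightSum(context, CL_MEM_WRITE_ONLY,
                          weightSum.size() * sizeof(double));

    cl::Kernel kernel(program, "computeWeights");
    kernel.setArg(0, dDistances);
    kernel.setArg(1, dWeightSum);
    kernel.setArg(2, cl::Local(nPoints * sizeof(double))); // local scratch for the reduction
    kernel.setArg(3, p);

    // One row of work-items per grid, one work-group per row.
    queue.enqueueNDRangeKernel(kernel, cl::NullRange,
                               cl::NDRange(nPoints, nGrids),
                               cl::NDRange(nPoints, 1));

    queue.enqueueReadBuffer(dDistances, CL_TRUE, 0,
                            distances.size() * sizeof(double), distances.data());
    queue.enqueueReadBuffer(dWeightSum, CL_TRUE, 0,
                            weightSum.size() * sizeof(double), weightSum.data());
}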

Related

How to get texture coordinates from VTK IntersectWithLine?

I've loaded a texture mapped OBJ via vtkOBJReader and loaded it into a vtkModifiedBSPTree:
auto readerOther(vtkSmartPointer<vtkOBJReader>::New());
auto rawOtherPath(modelPathOther.toLatin1());
readerOther->SetFileName(rawOtherPath.data());
readerOther->Update();
auto meshDataOther(readerOther->GetOutput());
auto bspTreeOther(vtkSmartPointer<vtkModifiedBSPTree>::New());
bspTreeOther->SetDataSet(meshDataOther);
bspTreeOther->BuildLocator();
I then compute my line segment start and end and feed that into
if (bspTreeOther->IntersectWithLine(p1, p2, tolerance, distanceAlongLine, intersectionCoords, pcoords, subId, cellId, cell))
With all the relevant predefined variables of course.
What I need is the texture's UV coordinates at the point of intersection.
I'm so new to VTK that I haven't yet caught the logic of how it's put together; the abstraction layers are still losing me as I dig through the source.
I've hunted for this answer across SO and the VTK users archives and found vague hints given by those who understood VTK deeply to those who were nearly there themselves, and thus of little help to me thus far.
(Appended 11/9/2018)
To clarify, I'm working with non-degenerate triangulated meshes created by a single 3D scanner shot, so quads and other higher polygons are not going to be ever seen by my code. A general solution should account for such things, but that can be accomplished via triangulating the mesh first via a good application of handwavium.
Code
Note that if one vertex belongs to several polygons and has different texture coordinates, VTK will create duplicates of the vertex.
I don't use vtkCleanPolyData, because VTK will merge such "duplicates" and we will lose needed information, as far as I know.
I use vtkCellLocator instead of vtkModifiedBSPTree,
because in my case it was faster.
The main file main.cpp.
You can find magic numbers in start and end arrays — these are your p1 and p2.
I've set these values just as an example.
#include <vtkSmartPointer.h>
#include <vtkPointData.h>
#include <vtkCellLocator.h>
#include <vtkGenericCell.h>
#include <vtkIdList.h>
#include <vtkOBJReader.h>
#include <vtkTriangleFilter.h>
#include <vtkMath.h>
#include <iostream>

int main(int argc, char * argv[])
{
    if (argc < 2)
    {
        std::cerr << "Usage: " << argv[0] << " OBJ_file_name" << std::endl;
        return EXIT_FAILURE;
    }

    auto reader{vtkSmartPointer<vtkOBJReader>::New()};
    reader->SetFileName(argv[1]);
    reader->Update();

    // Triangulate the mesh if needed
    auto triangleFilter{vtkSmartPointer<vtkTriangleFilter>::New()};
    triangleFilter->SetInputConnection(reader->GetOutputPort());
    triangleFilter->Update();
    auto mesh{triangleFilter->GetOutput()};
    // Use `auto mesh(reader->GetOutput());` instead if no triangulation needed

    // Build a locator to find intersections
    auto locator{vtkSmartPointer<vtkCellLocator>::New()};
    locator->SetDataSet(mesh);
    locator->BuildLocator();

    // Initialize variables needed for intersection calculation
    double start[3]{-1, 0, 0.5};
    double end[3]{ 1, 0, 0.5};
    double tolerance{1E-6};
    double relativeDistanceAlongLine;
    double intersectionCoordinates[3];
    double parametricCoordinates[3];
    int subId;
    vtkIdType cellId;
    auto cell{vtkSmartPointer<vtkGenericCell>::New()};

    // Find intersection
    int intersected = locator->IntersectWithLine(
        start,
        end,
        tolerance,
        relativeDistanceAlongLine,
        intersectionCoordinates,
        parametricCoordinates,
        subId,
        cellId,
        cell.Get()
    );
    // In real code, check that `intersected` is non-zero before continuing

    // Get points of intersection cell
    auto pointsIds{vtkSmartPointer<vtkIdList>::New()};
    mesh->GetCellPoints(cellId, pointsIds);

    // Store coordinates and texture coordinates of vertices of the cell
    double meshTrianglePoints[3][3];
    double textureTrianglePoints[3][2];
    auto textureCoordinates{mesh->GetPointData()->GetTCoords()};
    for (unsigned pointNumber = 0; pointNumber < cell->GetNumberOfPoints(); ++pointNumber)
    {
        mesh->GetPoint(pointsIds->GetId(pointNumber), meshTrianglePoints[pointNumber]);
        textureCoordinates->GetTuple(pointsIds->GetId(pointNumber), textureTrianglePoints[pointNumber]);
    }

    // Move the triangle so that its first vertex lies at the origin
    double movedMeshTrianglePoints[3][3];
    for (unsigned i = 0; i < 3; ++i)
    {
        movedMeshTrianglePoints[0][i] = 0;
        movedMeshTrianglePoints[1][i] =
            meshTrianglePoints[1][i] -
            meshTrianglePoints[0][i];
        movedMeshTrianglePoints[2][i] =
            meshTrianglePoints[2][i] -
            meshTrianglePoints[0][i];
    }

    // Shift the texture coordinates in the same way
    double movedTextureTrianglePoints[3][2];
    for (unsigned i = 0; i < 2; ++i)
    {
        movedTextureTrianglePoints[0][i] = 0;
        movedTextureTrianglePoints[1][i] =
            textureTrianglePoints[1][i] -
            textureTrianglePoints[0][i];
        movedTextureTrianglePoints[2][i] =
            textureTrianglePoints[2][i] -
            textureTrianglePoints[0][i];
    }

    // Calculate SVD of the matrix consisting of the moved vertices
    double U[3][3];
    double w[3];
    double VT[3][3];
    vtkMath::SingularValueDecomposition3x3(movedMeshTrianglePoints, U, w, VT);

    // Calculate the pseudo-inverse of that matrix
    double pseudoInverse[3][3]{0};
    for (unsigned i = 0; i < 3; ++i)
    {
        for (unsigned j = 0; j < 3; ++j)
        {
            for (unsigned k = 0; k < 3; ++k)
            {
                if (w[k] != 0)
                {
                    pseudoInverse[i][j] += VT[k][i] * U[j][k] / w[k];
                }
            }
        }
    }

    // Calculate interpolation matrix
    double interpolationMatrix[3][2]{0};
    for (unsigned i = 0; i < 3; ++i)
    {
        for (unsigned j = 0; j < 2; ++j)
        {
            for (unsigned k = 0; k < 3; ++k)
            {
                interpolationMatrix[i][j] += pseudoInverse[i][k] * movedTextureTrianglePoints[k][j];
            }
        }
    }

    // Calculate interpolated texture coordinates of the intersection point
    double interpolatedTexturePoint[2]{textureTrianglePoints[0][0], textureTrianglePoints[0][1]};
    for (unsigned i = 0; i < 2; ++i)
    {
        for (unsigned j = 0; j < 3; ++j)
        {
            interpolatedTexturePoint[i] += (intersectionCoordinates[j] - meshTrianglePoints[0][j]) * interpolationMatrix[j][i];
        }
    }

    // Print the result
    std::cout << "Interpolated texture coordinates";
    for (unsigned i = 0; i < 2; ++i)
    {
        std::cout << " " << interpolatedTexturePoint[i];
    }
    std::cout << std::endl;

    return EXIT_SUCCESS;
}
CMake project file CMakeLists.txt
cmake_minimum_required(VERSION 3.1)
PROJECT(IntersectInterpolate)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
find_package(VTK REQUIRED)
include(${VTK_USE_FILE})
add_executable(IntersectInterpolate MACOSX_BUNDLE main.cpp)
if(VTK_LIBRARIES)
    target_link_libraries(IntersectInterpolate ${VTK_LIBRARIES})
else()
    target_link_libraries(IntersectInterpolate vtkHybrid vtkWidgets)
endif()
Math
What we need
Suppose you have a mesh consisting of triangles and your vertices have texture coordinates.
Given vertices of a triangle A, B and C, and corresponding texture coordinates A', B' and C', you want to find a mapping (an interpolation) from the other interior and boundary points of the triangle to the texture.
Let's make some rational assumptions:
Points A, B, C should correspond to their texture coordinates A', B', C';
Each point X on an edge, say AB, should correspond to the points of the line A'B' in the following way: |AX| / |AB| = |A'X'| / |A'B'| — halfway along the original edge should be halfway along the texture edge;
The centroid of the triangle, (A + B + C) / 3, should correspond to the centroid of the texture triangle, (A' + B' + C') / 3.
Equations to solve
It looks like we want an affine mapping: the coordinates of the vertices of the original triangle should be multiplied by some coefficients and added to some constants.
Let's construct the system of equations
Ax * Mxx + Ay * Myx + Az * Mzx + M0x = A'x
Ax * Mxy + Ay * Myy + Az * Mzy + M0y = A'y
Ax * Mxz + Ay * Myz + Az * Mzz + M0z = 0
and the same for B and C.
You can see that we have 9 equations and 12 unknowns.
However, the equations containing Miz (for i in {x, y, z}) have the solution 0, and those unknowns play no role in further computations, so we can simply set them to 0.
Thus, we have a system with 6 equations and 8 unknowns
Ax * Mxx + Ay * Myx + Az * Mzx + M0x = A'x
Ax * Mxy + Ay * Myy + Az * Mzy + M0y = A'y
Let's write entire system in matrix view
-- -- -- -- -- --
| 1 Ax Ay Az | | M0x M0y | | A'x A'y |
| 1 Bx By Bz | x | Mxx Mxy | = | B'x B'y |
| 1 Cx Cy Cz | | Myx Myy | | C'x C'y |
-- -- | Mzx Mzy | -- --
-- --
I subtract the coordinates of vertex A from B and C,
and the texture coordinates A' from B' and C'.
Now the triangle has its first vertex at the origin of the coordinate system, with correspondingly shifted texture coordinates.
This means the triangles are no longer translated (moved) relative to one another,
and we don't need the M0 part of the interpolation matrix
-- -- -- -- -- --
| Bx By Bz | | Mxx Mxy | | B'x B'y |
| Cx Cy Cz | x | Myx Myy | = | C'x C'y |
-- -- | Mzx Mzy | -- --
-- --
Solution
Let's call the first matrix P, the second M and the last one T
P M = T
The matrix P is not square.
If we add a zero row to it, the matrix becomes singular.
So we have to calculate its pseudo-inverse in order to solve the equation.
There's no function for calculating the pseudo-inverse of a matrix in VTK.
We go to the Moore–Penrose inverse article on Wikipedia and see that it can be calculated using SVD.
The vtkMath::SingularValueDecomposition3x3 function allows us to do it.
The function gives us U, S and VT matrices.
I'll write the pseudo-inverse of matrix P as P",
the transpose of U as UT, and the transpose of VT as V.
The pseudo-inverse of the diagonal matrix S is the diagonal matrix S" with elements 1 / Sii
where Sii is nonzero, and 0 for the zero elements
P = U S VT
P" = V S" UT
M = P" T
Usage
To apply the interpolation matrix,
we must not forget to translate the input and output vectors.
A' is a 2D vector of texture coordinates of the first vertex in the triangle,
A is a 3D vector of coordinates of the vertex,
M is the found interpolation matrix,
p is a 3D intersection point we want to get texture coordinates for,
t' is the resulting 2D vector with interpolated texture coordinates
t' = A' + (p - A) M
[Rewritten 2019/5/7 to reflect an updated understanding.]
After finding out that the parametric coordinates are inputs to a function from which one can get barycentric coordinates in the case of triangular cells, and then learning about what barycentric coordinates are, I was able to work out the following.
const auto readerOther(vtkSmartPointer<vtkOBJReader>::New());
const auto rawOtherPath(modelPathOther.toLatin1());
readerOther->SetFileName(rawOtherPath.data());
readerOther->Update();
const auto meshDataOther(readerOther->GetOutput());
const auto bspTreeOther(vtkSmartPointer<vtkModifiedBSPTree>::New());
bspTreeOther->SetDataSet(meshDataOther);
bspTreeOther->BuildLocator();
double point1[3]{0.0, 0.0, 0.0}; // start of line segment used to intersect the model.
double point2[3]{0.0, 0.0, 10.0}; // end of line segment
double distanceAlongLine;
double intersectionCoords[3]; // The coordinate of the intersection.
double parametricCoords[3]; // Parametric Coordinates of the intersection - see https://lorensen.github.io/VTKExamples/site/VTKBook/08Chapter8/#82-interpolation-functions
int subId; // ?
vtkIdType cellId;
double intersectedTextureCoords[2];
if (bspTreeOther->IntersectWithLine(point1, point2, TOLERANCE, distanceAlongLine, intersectionCoords, parametricCoords, subId, cellId))
{
    const auto textureCoordsOther(meshDataOther->GetPointData()->GetTCoords());
    const auto pointIds{meshDataOther->GetCell(cellId)->GetPointIds()};
    const auto vertexIndex0{pointIds->GetId(0)};
    const auto vertexIndex1{pointIds->GetId(1)};
    const auto vertexIndex2{pointIds->GetId(2)};
    double texCoord0[2];
    double texCoord1[2];
    double texCoord2[2];
    textureCoordsOther->GetTuple(vertexIndex0, texCoord0);
    textureCoordsOther->GetTuple(vertexIndex1, texCoord1);
    textureCoordsOther->GetTuple(vertexIndex2, texCoord2);
    const auto parametricR{parametricCoords[0]};
    const auto parametricS{parametricCoords[1]};
    const auto barycentricW0{1 - parametricR - parametricS};
    const auto barycentricW1{parametricR};
    const auto barycentricW2{parametricS};
    intersectedTextureCoords[0] =
        barycentricW0 * texCoord0[0] +
        barycentricW1 * texCoord1[0] +
        barycentricW2 * texCoord2[0];
    intersectedTextureCoords[1] =
        barycentricW0 * texCoord0[1] +
        barycentricW1 * texCoord1[1] +
        barycentricW2 * texCoord2[1];
}
Please note that this code is an interpretation of the actual code I'm using; I'm using Qt and its QVector2D and QVector3D classes along with some interpreter glue functions to go to and from arrays of doubles.
See https://lorensen.github.io/VTKExamples/site/VTKBook/08Chapter8 for details about the parametric coordinate systems of various cell types.

Suggestions to Compute the Intersections of Multiple Convex 2D Polygons

I am writing this question fishing for any state-of-the-art software or methods that can quickly compute the intersections of N 2D polygons (the convex hulls of projected convex polyhedra) with M 2D polygons, where typically N >> M. N may be on the order of at least 1M polygons and M on the order of 50k. I've searched for some time now, but I keep coming up with the same answer shown below.
Use boost and a loop to
compute the projection of the polyhedron (not the bottleneck)
compute the convex hull of said polyhedron (bottleneck)
compute the intersection of the projected polyhedron and existing 2D polygon (major bottleneck).
This loop is repeated NK times where typically K << M, and K is the average number of 2D polygons intersecting a single projected polyhedron. This is done to reduce the number of computations.
The problem with this is that if I have N=262144 and M=19456 it takes about 129 seconds (when multithreaded by polyhedron), and this must be done about 300 times. Ideally, I would like to reduce the computation time to about 1 second for the above sizes, so I was wondering if someone could help point to some software or literature that could improve efficiency.
[EDIT]
Per @sehe's request, I'm posting the most relevant parts of the code. I haven't compiled it, so this is just to get the gist... This code assumes there are voxels and pixels, but the shapes can be anything. The order of the points in the grid can be arbitrary, but the indices of where the points reside in the grid are the same.
#include <boost/geometry/geometry.hpp>
#include <boost/geometry/geometries/point.hpp>
#include <boost/geometry/geometries/box.hpp>
#include <boost/geometry/geometries/polygon.hpp>
#include <boost/geometry/geometries/ring.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

const std::size_t Dimension = 2;
typedef boost::geometry::model::point<float, Dimension, boost::geometry::cs::cartesian> point_2d;
typedef boost::geometry::model::polygon<point_2d, false /* is cw */, true /* closed */> polygon_2d;
typedef boost::geometry::model::box<point_2d> box_2d;

std::vector<float> getOverlaps(std::vector<float> & projected_grid_vx, // projected voxels
                               std::vector<float> & pixel_grid_vx,     // pixels
                               std::vector<int> & projected_grid_m,    // number of voxels in each dimension
                               std::vector<int> & pixel_grid_m,        // number of pixels in each dimension
                               std::vector<float> & pixel_grid_omega,  // size of the pixel grid in cm
                               int projected_grid_size,                // total number of voxels
                               int pixel_grid_size) {                  // total number of pixels
    std::vector<float> overlaps(projected_grid_size * pixel_grid_size);
    std::vector<float> h(pixel_grid_m.size());
    for (int d = 0; d < pixel_grid_m.size(); d++) {
        h[d] = (pixel_grid_omega[2*d+1] - pixel_grid_omega[2*d]) / pixel_grid_m[d];
    }
    for (int i = 0; i < projected_grid_size; i++) {
        std::vector<float> point_indices(8);
        point_indices[0] = i;
        point_indices[1] = i + 1;
        point_indices[2] = i + projected_grid_m[0];
        point_indices[3] = i + projected_grid_m[0] + 1;
        point_indices[4] = i + projected_grid_m[0] * projected_grid_m[1];
        point_indices[5] = i + projected_grid_m[0] * projected_grid_m[1] + 1;
        point_indices[6] = i + (projected_grid_m[1] + 1) * projected_grid_m[0];
        point_indices[7] = i + (projected_grid_m[1] + 1) * projected_grid_m[0] + 1;
        std::vector<float> vx_corners(8 * projected_grid_m.size());
        for (int vn = 0; vn < 8; vn++) {
            for (int d = 0; d < projected_grid_m.size(); d++) {
                vx_corners[vn + d * 8] = projected_grid_vx[point_indices[vn] + d * projected_grid_size];
            }
        }
        polygon_2d proj_voxel;
        for (int vn = 0; vn < 8; vn++) {
            point_2d poly_pt(vx_corners[2 * vn], vx_corners[2 * vn + 1]);
            boost::geometry::append(proj_voxel, poly_pt);
        }
        boost::geometry::correct(proj_voxel);
        polygon_2d proj_voxel_hull;
        boost::geometry::convex_hull(proj_voxel, proj_voxel_hull);
        box_2d bb_proj_vox;
        boost::geometry::envelope(proj_voxel_hull, bb_proj_vox);
        point_2d min_pt = bb_proj_vox.min_corner();
        point_2d max_pt = bb_proj_vox.max_corner();
        // then get min and max indices of intersecting bins
        std::vector<float> min_idx(projected_grid_m.size() - 1),
                           max_idx(projected_grid_m.size() - 1);
        // compute min and max indices of incidence on the pixel grid
        // this is easy assuming you have a regular grid of pixels
        min_idx[0] = std::min((float) std::max(std::floor((min_pt.get<0>() - pixel_grid_omega[0]) / h[0] - 0.5), 0.), (float) (pixel_grid_m[0] - 1));
        min_idx[1] = std::min((float) std::max(std::floor((min_pt.get<1>() - pixel_grid_omega[2]) / h[1] - 0.5), 0.), (float) (pixel_grid_m[1] - 1));
        max_idx[0] = std::min((float) std::max(std::floor((max_pt.get<0>() - pixel_grid_omega[0]) / h[0] + 0.5), 0.), (float) (pixel_grid_m[0] - 1));
        max_idx[1] = std::min((float) std::max(std::floor((max_pt.get<1>() - pixel_grid_omega[2]) / h[1] + 0.5), 0.), (float) (pixel_grid_m[1] - 1));
        // iterate only over pixels which intersect the projected voxel
        for (int iy = min_idx[1]; iy <= max_idx[1]; iy++) {
            for (int ix = min_idx[0]; ix <= max_idx[0]; ix++) {
                int idx = ix + iy * pixel_grid_m[0]; // `first' index of pixel corner point
                polygon_2d pix_poly;
                for (int pn = 0; pn < 4; pn++) {
                    point_2d pix_corner_pt(
                        pixel_grid_vx[idx + pn % 2 + (pn / 2) * pixel_grid_m[0]],
                        pixel_grid_vx[idx + pn % 2 + (pn / 2) * pixel_grid_m[0] + pixel_grid_size]
                    );
                    boost::geometry::append(pix_poly, pix_corner_pt);
                }
                boost::geometry::correct(pix_poly);
                // make this into a convex hull since the order of the points may be any
                polygon_2d pix_hull;
                boost::geometry::convex_hull(pix_poly, pix_hull);
                // on to perform intersection
                std::vector<polygon_2d> vox_pix_ints;
                polygon_2d vox_pix_int;
                try {
                    boost::geometry::intersection(proj_voxel_hull, pix_hull, vox_pix_ints);
                } catch (const std::exception & e) {
                    // skip since these may coincide at a point or line
                    continue;
                }
                if (vox_pix_ints.empty())
                    continue; // no overlap
                // both are convex so only one intersection expected
                vox_pix_int = vox_pix_ints[0];
                overlaps[i + idx * projected_grid_size] = boost::geometry::area(vox_pix_int);
            }
        } // end intersection for
    } // end projected_voxel for
    return overlaps;
}
You could use the ratio of polygon area to bounding-box area:
This could be computed once to arrive at an average poly-area-to-BB-area ratio constant R.
Or you could derive it geometrically, using a circle bounded by its BB, since you're using only projected polyhedra:
R = 0.0;
count = 0;
for (each poly) {
    count++;
    R += polyArea / itsBoundingBoxArea;
}
R = R / count;
Then calculate the sum of the intersection areas of the bounding boxes:
Sbb = 0.0;
for (box1, box2 where box1.isIntersecting(box2)) {
    Sbb += box1.intersect(box2);
}
Then:
Approximation = R * Sbb
All of this would not work if concave polys were allowed, because a concave poly can occupy less than 1% of its bounding box. You would still have to find the convex hull.
Alternatively, if you can find a polygon's area more quickly than its hull, you could use the actual computed average poly area. This would give you a decent approximation as well, while avoiding both poly intersection and hull wrapping.
Hm, the problem seems similar to doing "collision detection" in game engines. Or "potentially visible sets".
While I don't know much about the current state of the art, I remember an optimization was to enclose objects in spheres, since checking overlaps between spheres (or circles in 2D) is really cheap.
In order to speed up checks for collisions, objects were often put into search structures (e.g. a sphere tree, or circle tree in the 2D case) -- basically organizing the space into a hierarchical structure to make queries for overlaps fast.
So basically my suggestion boils down to: try looking at algorithms for collision detection in game engines.
Assumption
I'm assuming that you mean "intersections" and not intersection. Moreover, it is not the expected use case that most of the individual polys from M and N will overlap at the same time. If this assumption is true, then:
Answer
The way this is done in 2D game engines is by having a scene graph where every object has a bounding box. Then all the polygons are placed into the nodes of a quadtree according to their location, as determined by their bounding boxes. The task then becomes parallel, because each node can be processed separately for intersection.
Here is the wiki for quadtree:
Quadtree Wiki
An octree could be used when in 3D.
It actually doesn't even have to be an octree. You could get the same results with any space partition. You could find the maximum separation of polys (let's call it S) and create, say, S/10 space partitions. Then you would have 10 separate spaces to process in parallel. Not only would it be concurrent, but it would no longer be M * N time, since not every poly must be compared against every other poly. A sketch of this idea using a spatial index follows.
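To make the spatial-index suggestions concrete: since the question already uses Boost, one readily available option is boost::geometry::index::rtree. Here is a minimal sketch (reusing the question's point_2d/polygon_2d/box_2d typedefs; the function name and glue are mine) that indexes the smaller polygon set once by bounding box, so each projected hull is only intersected against candidates whose boxes overlap its own:
#include <boost/geometry.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <iterator>
#include <utility>
#include <vector>

namespace bg = boost::geometry;
namespace bgi = boost::geometry::index;

typedef std::pair<box_2d, std::size_t> rtree_value; // bounding box + polygon index

double sumOverlaps(const std::vector<polygon_2d> &pixelPolys,
                   const std::vector<polygon_2d> &projectedHulls)
{
    // Build the index once over the smaller set: O(M log M).
    bgi::rtree<rtree_value, bgi::quadratic<16> > index;
    for (std::size_t m = 0; m < pixelPolys.size(); ++m) {
        box_2d bb;
        bg::envelope(pixelPolys[m], bb);
        index.insert(std::make_pair(bb, m));
    }

    double total = 0.0;
    std::vector<rtree_value> candidates;
    for (const polygon_2d &hull : projectedHulls) {
        box_2d bb;
        bg::envelope(hull, bb);
        candidates.clear();
        // Only polygons whose bounding boxes overlap the hull's box are returned.
        index.query(bgi::intersects(bb), std::back_inserter(candidates));
        for (const rtree_value &c : candidates) {
            std::vector<polygon_2d> pieces;
            bg::intersection(hull, pixelPolys[c.second], pieces);
            for (const polygon_2d &piece : pieces)
                total += bg::area(piece);
        }
    }
    return total;
}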

Computing Rand error efficiently

I'm trying to compare two image segmentations to one another.
In order to do so, I transform each image into a vector of unsigned short values and calculate the Rand error
according to the following formula:
RE = (a + b) / (N * (N - 1) / 2)
where a is the number of pairs of positions with the same label in both segmentations, b is the number of pairs with different labels in both segmentations, and N is the length of each vector.
Here is my code (the Rand error calculation part):
cv::Mat im1, im2;
// code for acquiring data for im1, im2
// code for copying im1(:)->v1, im2(:)->v2
int N = v1.size();
double a = 0;
double b = 0;
for (int i = 0; i < N; i++)
{
    for (int j = 0; j < i; j++)
    {
        unsigned short l1 = v1[i];
        unsigned short l2 = v1[j];
        unsigned short gt1 = v2[i];
        unsigned short gt2 = v2[j];
        if (l1 == l2 && gt1 == gt2)
        {
            a++;
        }
        else if (l1 != l2 && gt1 != gt2)
        {
            b++;
        }
    }
}
double NPairs = (double)N * (N - 1) / 2; // number of distinct pairs; note N*N would overflow int
double res = (a + b) / NPairs;
My problem is that the length of each vector is 307,200,
so the total number of pair iterations is about 47,185,920,000.
This makes the running time of the entire process very slow (a few minutes to compute).
Do you have any idea how I can improve it?
Thanks!
Let's assume that we have P distinct labels in the first image and Q distinct labels in the second image. The key observation for efficient computation of Rand error, also called Rand index, is that the number of distinct labels is usually much smaller than the number of pixels (i.e. P, Q << n).
Step 1
First, pre-compute the following auxiliary data:
the vector s1, with size P, such that s1[p] is the number of pixel positions i with v1[i] = p.
the vector s2, with size Q, such that s2[q] is the number of pixel positions i with v2[i] = q.
the matrix M, with size P x Q, such that M[p][q] is the number of pixel positions i with v1[i] = p and v2[i] = q.
The vectors s1, s2 and the matrix M can be computed by passing once through the input images, i.e. in O(n).
Step 2
Once s1, s2 and M are available, a and b can be computed efficiently:
a = sum over all p and q of M[p][q] * (M[p][q] - 1) / 2
This holds because each pair of pixels (i, j) that we are interested in has the property that both its pixels have the same label in image 1, i.e. v1[i] = v1[j] = p, and the same label in image 2, i.e. v2[i] = v2[j] = q. Since v1[i] = p and v2[i] = q, the pixel i will contribute to the bin M[p][q], and so does the pixel j. Therefore, for each combination of labels p and q we need to count the number of pairs of pixels that fall into the M[p][q] bin (that is, M[p][q] choose 2), and then sum over all possible labels p and q.
Similarly, for b we have:
b = sum over all p and q of M[p][q] * (n - s1[p] - s2[q] + M[p][q]) / 2
Here, we are counting how many pairs are formed with one of the pixels falling into the bin M[p][q]. Such a pixel can form a good pair with each pixel that is falling into a bin M[p'][q'], with the condition that p != p' and q != q'. Summing over all such M[p'][q'] is equivalent to subtracting from the sum over the entire matrix M (this sum is n) the sum on row p (i.e. s1[p]) and the sum on the column q (i.e. s2[q]). However, after subtracting the row and column sums, we have subtracted M[p][q] twice, and this is why it is added at the end of the expression above. Finally, this is divided by 2 because each pair was counted twice (once for each of its two constituent pixels as being part of a bin M[p][q] in the argument above).
The Rand error (Rand index) can now be computed as:
RE = (a + b) / (n * (n - 1) / 2)
The overall complexity of this method is O(n) + O(PQ), with the first term usually being the dominant one.
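A hedged sketch of this method in C++ (the function name is mine; v1 and v2 are the flattened label vectors from the question, with labels assumed to be unsigned short so that a pair (p, q) can be packed into a 32-bit key):
#include <cstdint>
#include <unordered_map>
#include <vector>

// Rand index of two labelings in O(n + PQ) instead of O(n^2).
double randIndex(const std::vector<unsigned short> &v1,
                 const std::vector<unsigned short> &v2)
{
    const std::size_t n = v1.size();
    std::unordered_map<unsigned short, double> s1, s2; // row and column sums
    std::unordered_map<std::uint32_t, double> M;       // contingency table M[p][q]
    for (std::size_t i = 0; i < n; ++i) {
        s1[v1[i]] += 1.0;
        s2[v2[i]] += 1.0;
        M[(std::uint32_t(v1[i]) << 16) | v2[i]] += 1.0; // key packs (p, q)
    }
    double a = 0.0, b = 0.0;
    for (const auto &bin : M) {
        const double m = bin.second;
        const double rowSum = s1[(unsigned short)(bin.first >> 16)];    // s1[p]
        const double colSum = s2[(unsigned short)(bin.first & 0xFFFF)]; // s2[q]
        a += m * (m - 1.0) / 2.0;                         // same labels in both
        b += m * ((double)n - rowSum - colSum + m) / 2.0; // different labels in both
    }
    const double nPairs = (double)n * (n - 1.0) / 2.0;
    return (a + b) / nPairs;
}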
After reading your comments, I tried the following approach:
calculate the intersections for each possible pair of values.
use the intersection results to calculate the error.
I performed the calculation directly on the cv::Mat objects, without converting them into std::vector objects. That let me use OpenCV functions and achieve a faster runtime.
Code:
double a = 0, b = 0; // init variables
// The unique function finds all the unique values of a matrix, with an optional input mask
std::set<unsigned short> m1Vals = unique(mat1);
for (unsigned short s1 : m1Vals)
{
    cv::Mat mask1 = (mat1 == s1);
    std::set<unsigned short> m2ValsInRoi = unique(mat2, mask1);
    for (unsigned short s2 : m2ValsInRoi)
    {
        cv::Mat mask2 = (mat2 == s2);
        cv::Mat andMask = mask1 & mask2;
        double andVal = cv::countNonZero(andMask);
        a += (andVal * (andVal - 1)) / 2;
        b += (andVal * (double)cv::countNonZero(~mask1 & ~mask2)) / 2;
    }
}
double NPairs = (double)(N * (N - 1)) / 2;
double res = (a + b) / NPairs;
The runtime is now reasonable (only a few milliseconds vs a few minutes), and the output is the same as the code above.
Example:
I ran the code on the following matrices:
//mat1 = [1 1 2]
cv::Mat mat1 = cv::Mat::ones(cv::Size(3, 1), CV_16U);
mat1.at<ushort>(cv::Point(2, 0)) = 2;
//mat2 = [1 2 1]
cv::Mat mat2 = cv::Mat::ones(cv::Size(3, 1), CV_16U);
mat2.at<ushort>(cv::Point(1, 0)) = 2;
In this case a = 0 (no pair has matching labels in both matrices) and b = 1 (one pair, i=2, j=3, has differing labels in both). The algorithm's result:
a = 0
b = 1
NPairs = 3
result = 0.3333333
Thank you all for your help!

Optimized float Blur variations

I am looking for optimized functions in C++ for calculating areal averages of floats. The function is passed a source float array, a destination float array (same size as the source array), the array width and height, and the "blurring" area width and height.
The function should "wrap-around" edges for the blurring/averages calculations.
Here is example code that blur with a rectangular shape:
/*****************************************
 * Find averages extended variations
 *****************************************/
void findaverages_ext(float *floatdata, float *dest_data, int fwidth, int fheight, int scale, int aw, int ah, int weight, int xoff, int yoff)
{
    printf("findaverages_ext scale: %d, width: %d, height: %d, weight: %d \n", scale, aw, ah, weight);
    float total = 0.0;
    int spos = scale * fwidth * fheight;
    int apos;
    int w = aw;
    int h = ah;
    float* f_temp = new float[fwidth * fheight];
    // Horizontal
    for (int y = 0; y < fheight; y++)
    {
        Sleep(10); // Do not burn your processor
        total = 0.0;
        // Process entire window for first pixel (including wrap-around edge)
        for (int kx = 0; kx <= w; ++kx)
            if (kx >= 0 && kx < fwidth)
                total += floatdata[y*fwidth + kx];
        // Wrap
        for (int kx = (fwidth-w); kx < fwidth; ++kx)
            if (kx >= 0 && kx < fwidth)
                total += floatdata[y*fwidth + kx];
        // Store first window
        f_temp[y*fwidth] = (total / (w*2+1));
        for (int x = 1; x < fwidth; x++) // x width changes with y
        {
            // Subtract pixel leaving window
            if (x-w-1 >= 0)
                total -= floatdata[y*fwidth + x-w-1];
            // Add pixel entering window
            if (x+w < fwidth)
                total += floatdata[y*fwidth + x+w];
            else
                total += floatdata[y*fwidth + x+w-fwidth];
            // Store average
            apos = y * fwidth + x;
            f_temp[apos] = (total / (w*2+1));
        }
    }
    // Vertical
    for (int x = 0; x < fwidth; x++)
    {
        Sleep(10); // Do not burn your processor
        total = 0.0;
        // Process entire window for first pixel
        for (int ky = 0; ky <= h; ++ky)
            if (ky >= 0 && ky < fheight)
                total += f_temp[ky*fwidth + x];
        // Wrap
        for (int ky = fheight-h; ky < fheight; ++ky)
            if (ky >= 0 && ky < fheight)
                total += f_temp[ky*fwidth + x];
        // Store first if not out of bounds
        dest_data[spos + x] = (total / (h*2+1));
        for (int y = 1; y < fheight; y++) // y width changes with x
        {
            // Subtract pixel leaving window
            if (y-h-1 >= 0)
                total -= f_temp[(y-h-1)*fwidth + x];
            // Add pixel entering window
            if (y+h < fheight)
                total += f_temp[(y+h)*fwidth + x];
            else
                total += f_temp[(y+h-fheight)*fwidth + x];
            // Store average
            apos = y * fwidth + x;
            dest_data[spos+apos] = (total / (h*2+1));
        }
    }
    delete[] f_temp; // allocated with new[], so delete[] is required
}
What I need are similar functions that, for each pixel, find the average (blur) of pixels from shapes other than rectangular.
The specific shapes are: "S" (sharp edges), "O" (rectangular but hollow), "+" and "X", where the average float is stored at the center pixel on the destination data array. The blur shape's size should be variable in width and height.
The functions do not need to be pixel-perfect, only optimized for performance. There could be separate functions for each shape.
I am also happy if anyone can give me tips on how to optimize the example function above for rectangular blurring.
What you are trying to implement are various sorts of digital filters for image processing. This is equivalent to convolving two signals, where the second one is the filter's impulse response. So far, you recognized that a "rectangular average" is separable. By separable I mean that you can split the filter into two parts: one that operates along the X axis and one that operates along the Y axis -- in each case a 1D filter. This is nice and can save you lots of cycles. But not every filter is separable. Averaging along other shapes (S, O, +, X) is not separable. You need to actually compute a 2D convolution for these.
As for performance, you can speed up your 1D averages by properly implementing a "moving average". A proper "moving average" implementation requires only a small, fixed amount of work per pixel, regardless of the averaging "window". This works by recognizing that neighbouring pixels of the target image are computed as averages of almost the same source pixels. You can reuse the sum for the neighbouring target pixel by adding one new pixel intensity and subtracting an older one (for the 1D case).
In case of arbitrary non-separable filters your best bet performance-wise is "fast convolution", which is FFT-based. Check out www.dspguide.com. If I recall correctly, it even has a chapter on how to properly do "fast convolution" using the FFT algorithm. Although it explains this for 1-dimensional signals, it also applies to 2-dimensional signals. For images you have to perform 2D FFT/iFFT transforms.
To add to sellibitze's answer, you can use a summed area table for your O, S and + kernels (not for the X one though). That way you can convolve a pixel in constant time, and it's probably the fastest method to do it for kernel shapes that allow it.
Basically, a SAT is a data structure that lets you calculate the sum of any axis-aligned rectangle. For the O kernel, after you've built a SAT, you'd take the sum of the outer rect's pixels and subtract the sum of the inner rect's pixels. The S and + kernels can be implemented similarly.
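A minimal sketch of the SAT idea in C++ (names are mine, and wrap-around at the image edges would still need separate handling): build the table once, after which any axis-aligned rectangle sum is four lookups, so the hollow "O" kernel becomes the outer-rectangle sum minus the inner-rectangle sum.
#include <vector>

// Summed area table with a zero border row/column:
// sat[(y+1)*(w+1) + (x+1)] = sum of src[0..y][0..x].
std::vector<double> buildSAT(const float *src, int w, int h)
{
    std::vector<double> sat((w + 1) * (h + 1), 0.0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            sat[(y + 1) * (w + 1) + (x + 1)] =
                src[y * w + x]
                + sat[y * (w + 1) + (x + 1)]
                + sat[(y + 1) * (w + 1) + x]
                - sat[y * (w + 1) + x];
    return sat;
}

// Sum over the inclusive rectangle [x0..x1] x [y0..y1] in constant time.
double rectSum(const std::vector<double> &sat, int w,
               int x0, int y0, int x1, int y1)
{
    return sat[(y1 + 1) * (w + 1) + (x1 + 1)]
         - sat[y0 * (w + 1) + (x1 + 1)]
         - sat[(y1 + 1) * (w + 1) + x0]
         + sat[y0 * (w + 1) + x0];
}

// Hollow "O" kernel sum: outer rectangle minus inner rectangle.
double ringSum(const std::vector<double> &sat, int w,
               int cx, int cy, int outer, int inner)
{
    return rectSum(sat, w, cx - outer, cy - outer, cx + outer, cy + outer)
         - rectSum(sat, w, cx - inner, cy - inner, cx + inner, cy + inner);
}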
For the X kernel you can use a different approach. A skewed box filter is separable:
You can convolve with two long, thin skewed box filters, then add the two resulting images together. The center of the X will be counted twice, so you will need to convolve with another skewed box filter and subtract that.
Apart from that, you can optimize your box blur in many ways.
Remove the two ifs from the inner loop by splitting that loop into three loops - two short loops that do checks, and one long loop that doesn't. Or you could pad your array with extra elements from all directions - that way you can simplify your code.
Calculate values like h * 2 + 1 outside the loops.
An expression like f_temp[ky*fwidth + x] does two adds and one multiplication. You can initialize a pointer to &f_temp[ky*fwidth] outside the loop, and just increment that pointer in the loop (see the sketch after this list).
Don't divide by w * 2 + 1 in the horizontal step; instead, divide once by (w * 2 + 1) * (h * 2 + 1) in the vertical step.
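For instance, point 3 applied to the first vertical window loop of the question's code might look like this (a sketch against the question's f_temp buffer):
// Instead of recomputing f_temp[ky*fwidth + x] every iteration,
// walk a pointer down column x:
const float *p = &f_temp[x]; // start of column x
for (int ky = 0; ky <= h; ++ky)
{
    total += *p;  // same as f_temp[ky*fwidth + x]
    p += fwidth;  // advance one row
}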

How do I draw a cylinder in OpenTK(.Glu.Cylinder)?

How do I draw a cylinder with OpenGL in OpenTK?
Sample code from an older project of mine. This creates an "uncapped" cylinder (top and bottom are empty).
int segments = 10; // Higher numbers improve quality
int radius = 3;    // The radius (width) of the cylinder
int height = 10;   // The height of the cylinder

var vertices = new List<Vector3>();
for (double y = 0; y < 2; y++)
{
    for (double x = 0; x < segments; x++)
    {
        double theta = (x / (segments - 1)) * 2 * Math.PI;
        vertices.Add(new Vector3()
        {
            X = (float)(radius * Math.Cos(theta)),
            Y = (float)(height * y),
            Z = (float)(radius * Math.Sin(theta)),
        });
    }
}

var indices = new List<int>();
for (int x = 0; x < segments - 1; x++)
{
    indices.Add(x);
    indices.Add(x + segments);
    indices.Add(x + segments + 1);
    indices.Add(x + segments + 1);
    indices.Add(x + 1);
    indices.Add(x);
}
You can now render the cylinder like this:
GL.Begin(BeginMode.Triangles);
foreach (int index in indices)
    GL.Vertex3(vertices[index]);
GL.End();
You can also upload vertices and indices into a vertex buffer object to improve performance.
Generating the geometry for a cylinder is quite simple (let's consider a Z-aligned cylinder). Let me use pseudocode:
points = list of (x,y,z)
    where x = sin(a)*RADIUS, y = cos(a)*RADIUS, z = b,
    for each a in [0..2*PI) with step StepA,
    for each b in [0..HEIGHT] with step StepB
About the indices: let N be the number of "levels" or "slices" of the cylinder (which depends on HEIGHT and StepB) and M the number of points on every "slice" (which depends on StepA).
The cylinder contains some quads, each spanning 2 neighbouring slices, so the indices would look like:
indices = list of (a,b,c,d)
    where a = M * slice + point,
          b = M * slice + (point+1) % M,
          c = M * (slice+1) + (point+1) % M,
          d = M * (slice+1) + point
    for each slice in [0..N-2]
    for each point in [0..M-1]
If you need normals for the cylinder, they are simple to generate:
normals = (x/RADIUS, y/RADIUS, 0)
    for each (x,y,z) in points
That's it for the round part of the cylinder, you might also want the "caps" but I believe they are easy to do.
I'll leave the fun part of translating my pseudocode into your language of choice for you. :)
The rest is to create/bind the VBO, load up the geometry, set pointers, use your shader of choice and call glDrawArrays(...) - any OpenGL 3 tutorial should cover this; are you familiar with that part?