Running error of SSE2 code in VS2013 - c++

I am trying to run the following SIMD code in VS2013. It compiles fine but crashes at run time. Does anyone know why?
#include <cstdio>
#include <xmmintrin.h>
int main()
{
const size_t num = 7;
float a[num] = { 1, 2, 3, 4, 5, 6, 7 };
float b[num] = { 1, -1, -2, 1, -3, -2, 5 };
float c[num];
__m128 A, B, C;
A = _mm_load_ps(&a[0]); // <== crash here.
B = _mm_load_ps(&b[0]);
C = _mm_add_ps(A, B);
_mm_store_ps(&c[0], C);
return 0;
}

The address passed to _mm_load_ps or _mm_store_ps must be 16-byte aligned (divisible by 16). A plain float array on the stack is only guaranteed 4-byte alignment, so the aligned load can fault. See
https://msdn.microsoft.com/en-us/library/zzd50xxt(v=vs.90).aspx
Declare the arrays a, b and c like this:
__declspec(align(16)) float a[num] = { 1, 2, 3, 4, 5, 6, 7 };

Related

Solving a C2039 error and a C3861 error using std::minmax_element

I'm new to C++.
I've written the following line in a test function inside a standard VS2019 test project:
auto minAndMaxYards = std::minmax_element(simResults.begin(), simResults.end());
It yields both C2039 and C3861 errors for minmax_element, even though IntelliSense recognizes it as a member of std and I can peek its definition. I can't figure out what I'm missing. I've also included <algorithm> at the top of the test file.
Is there a project setting that I don't have right?
Full error text:
C2039 'minmax_element': is not a member of 'std'
C3861 'minmax_element': identifier not found
Edit, including code in case it helps
#include <algorithm>
#include "pch.h"
#include "CppUnitTest.h"
#include "Playbook.h"
#include "PlaySim.h"
using namespace Microsoft::VisualStudio::CppUnitTestFramework;
std::string output;
using std::vector;
namespace FootballDynastyV20UnitTest
{
TEST_CLASS(PlaybookIO)
{
public:
TEST_METHOD(setAndGetPlayblookName)
{
Playbook testPlays;
string testName = "testPlays";
testPlays.setName(testName);
string name = testPlays.getName();
Assert::IsTrue(name == testName);
}
TEST_METHOD(addPlayIncrementsPlayNum)
{
Playbook testPlays;
string playName = "Play1";
int numDLine = 4;
int numLB = 3;
vector<int> playerPos = { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 19 };
vector<int> playerStance = { 2, 1, 0, 0, 1, 2, 2, 3, 2, 3, 3 };
vector<int> playerBlitzGaps = { 0, 3, 0, 0, 3, 0, 0, 0, 0, 0, 0 };
testPlays.setName("testPlays");
testPlays.addPlay(playName, numDLine, numLB, playerPos, playerStance, playerBlitzGaps);
Assert::IsTrue(testPlays.getNumPlays() == 1);
}
TEST_METHOD(saveAndLoadPlayblook)
{
Playbook testPlays;
Playbook testPlaysLoad;
string playName = "Play1";
int numDLine = 4;
int numLB = 3;
vector<int> playerPos = { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 19 };
vector<int> playerStance = { 2, 1, 0, 0, 1, 2, 2, 3, 2, 3, 3 };
vector<int> playerBlitzGaps = { 0, 3, 0, 0, 3, 0, 0, 0, 0, 0, 0 };
testPlays.setName("testPlays");
testPlays.addPlay(playName, numDLine, numLB, playerPos, playerStance, playerBlitzGaps);
testPlays.save();
testPlaysLoad.load(testPlays.getName());
Assert::IsTrue(testPlays == testPlaysLoad);
}
};
TEST_CLASS(PlaySimTesting)
{
public:
TEST_METHOD(playSimReturnsYdsGainedBetweenNegative10And40)
{
PlaySim newPlay;
int numSims = 2000;
int lwrBound = -10;
int uprBound = 40;
vector<int> simResults;
for (int i = 0; i < numSims; i++)
{
newPlay.Run();
simResults.push_back(newPlay.GetYds());
}
auto minAndMaxYards = std::minmax_element(simResults.begin(), simResults.end());
int actualMin = *minAndMaxYards.first;
int actualMax = *minAndMaxYards.second;
int yds = newPlay.GetYds();
Assert::IsTrue((actualMin >= lwrBound) && (actualMax <= uprBound));
}
};
}
Move #include "pch.h" to the top of the file. When using precompiled headers, the compiler ignores everything above this line. In your example, that is #include <algorithm>, which is why std::minmax_element is not found.

How to define a C++ function in VTK

I'm new to C++ and VTK. I'm trying to get cell IDs in a basic rectilinearGrid example. I'm using the code below, but the compiler rejects it with the error I describe in the comments.
#include <vtkActor.h>
#include <vtkCamera.h>
#include <vtkFloatArray.h>
#include <vtkNamedColors.h>
#include <vtkNew.h>
#include <vtkPolyDataMapper.h>
#include <vtkProperty.h>
#include <vtkRectilinearGrid.h>
#include <vtkRectilinearGridGeometryFilter.h>
#include <vtkRenderWindow.h>
#include <vtkRenderWindowInteractor.h>
#include <vtkRenderer.h>
#include <array>
int main()
{
vtkNew<vtkNamedColors> colors;
std::array<int, 16> x = {
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}};
std::array<int, 16> y = {
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}};
std::array<int, 16> z = {
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}};
// Create a rectilinear grid by defining three arrays specifying the
// coordinates in the x-y-z directions.
vtkNew<vtkFloatArray> xCoords;
for (auto&& i : x)
{
xCoords->InsertNextValue(i);
}
vtkNew<vtkFloatArray> yCoords;
for (auto&& i : y)
{
yCoords->InsertNextValue(i);
}
vtkNew<vtkFloatArray> zCoords;
for (auto&& i : z)
{
zCoords->InsertNextValue(i);
}
// The coordinates are assigned to the rectilinear grid. Make sure that
// the number of values in each of the XCoordinates, YCoordinates,
// and ZCoordinates is equal to what is defined in SetDimensions().
//
vtkNew<vtkRectilinearGrid> rgrid;
rgrid->SetDimensions(int(x.size()), int(y.size()), int(z.size()));
rgrid->SetXCoordinates(xCoords);
rgrid->SetYCoordinates(yCoords);
rgrid->SetZCoordinates(zCoords);
vtkCell* GetCell(vtkRectilinearGrid * rgrid, int i, int j, int k) //I SHOULD INSERT IN HERE ";" FOR
{ //CLOSING THE STATEMENT. BUT IN
int dims[3]; //THIS WAY THE FUNCTION PARAMETER
rgrid->GetDimensions(dims); // BEHIND WOULDN'T BE CONNECTED.
if (i < 0 || i > dims[0] - 1 ||
j < 0 || j > dims[1] - 1 ||
k < 0 || k > dims[2] - 1)
{
return NULL; // out of bounds!
}
int pos[3];
pos[0] = i;
pos[1] = j;
pos[2] = k;
vtkIdType id;
id = vtkStructuredData::ComputeCellId(dims, pos);
return rgrid->GetCell(id);
};
// Extract a plane from the grid to see what we've got.
vtkNew<vtkRectilinearGridGeometryFilter> plane;
plane->SetInputData(rgrid);
plane->SetExtent(0, 46, 16, 16, 0, 43);
vtkNew<vtkPolyDataMapper> rgridMapper;
rgridMapper->SetInputConnection(plane->GetOutputPort());
vtkNew<vtkActor> wireActor;
wireActor->SetMapper(rgridMapper);
wireActor->GetProperty()->SetRepresentationToWireframe();
wireActor->GetProperty()->SetColor(colors->GetColor3d("Black").GetData());
// Create the usual rendering stuff.
vtkNew<vtkRenderer> renderer;
vtkNew<vtkRenderWindow> renWin;
renWin->AddRenderer(renderer);
vtkNew<vtkRenderWindowInteractor> iren;
iren->SetRenderWindow(renWin);
renderer->AddActor(wireActor);
renderer->SetBackground(1, 1, 1);
renderer->ResetCamera();
renderer->GetActiveCamera()->Elevation(30.0);
renderer->GetActiveCamera()->Azimuth(15.0);
renderer->GetActiveCamera()->Zoom(1.0);
renderer->SetBackground(colors->GetColor3d("Beige").GetData());
renWin->SetSize(600, 600);
// interact with data
renWin->Render();
iren->Start();
return EXIT_SUCCESS;
}
How can this be fixed?
UPDATE 1: I have inserted an image of the compile error. It says a ";" should be inserted to close the statement before the {}.
UPDATE 2: the exact error is
Error (active) E0065 expected ';' RGrid C:\vtk\VTK-8.2.0\Examples\DataManipulation\Cxx\RGrid.cxx 73
I'm using Visual Studio. I have tried dropping the last ";" but nothing changes.
UPDATE 3: I have uploaded all the code
You have defined your GetCell function inside the body of the main function, which is not allowed in C++. Only a function declaration is allowed inside a function body, hence the compiler expects a semicolon after the function header.
Move the whole GetCell function block outside the main function, above it. If that leads to problems you cannot solve, ask another question about them.

AVX calculation precision

I wrote a program to display the Mandelbrot set. To speed it up, I used AVX (really AVX2) instructions through the <immintrin.h> header.
The problem is: the result of the AVX computation (with double precision) has artifacts, and it differs from the result computed using "normal" doubles.
In detail, there is a function getIterationCount which calculates the number of iterations until the squared magnitude of the Mandelbrot sequence exceeds 4, or assumes the point is included in the set if the sequence does not exceed that bound during the first N steps.
The code looks like this:
#include "stdafx.h"
#include <iostream>
#include <complex>
#include <immintrin.h>
class MandelbrotSet {
public:
int getIterationCount(const std::complex<double>, const int) const noexcept;
__m256i getIterationCount(__m256d cReal, __m256d cIm, unsigned maxIterations) const noexcept;
};
inline int MandelbrotSet::getIterationCount(const std::complex<double> c, const int maxIterations) const noexcept
{
double currentReal = 0;
double currentIm = 0;
double realSquare;
double imSquare;
for (int i = 0; i < maxIterations; ++i) {
realSquare = currentReal * currentReal;
imSquare = currentIm * currentIm;
currentIm = 2 * currentReal * currentIm + c.imag();
currentReal = realSquare - imSquare + c.real();
if (realSquare + imSquare >= 4) {
return i;
}
}
return -1;
}
const __m256i negone = _mm256_set_epi64x(-1, -1, -1, -1);
const __m256i one = _mm256_set_epi64x(1, 1, 1, 1);
const __m256d two = _mm256_set_pd(2, 2, 2, 2);
const __m256d four = _mm256_set_pd(4, 4, 4, 4);
//calculates for i = 0,1,2,3
//output[i] = if ctrl[i] == 0b11...1 then onTrue[i] else onFalse[i]
inline __m256i _mm256_select_si256(__m256i onTrue, __m256i onFalse, __m256i ctrl) {
return _mm256_or_si256(_mm256_and_si256(onTrue, ctrl), _mm256_and_si256(onFalse, _mm256_xor_si256(negone, ctrl)));
}
inline __m256i MandelbrotSet::getIterationCount(__m256d cReal, __m256d cIm, unsigned maxIterations) const noexcept {
__m256i result = _mm256_set_epi64x(0, 0, 0, 0);
__m256d currentReal = _mm256_set_pd(0, 0, 0, 0);
__m256d currentIm = _mm256_set_pd(0, 0, 0, 0);
__m256d realSquare;
__m256d imSquare;
for (unsigned i = 0; i <= maxIterations; ++i)
{
realSquare = _mm256_mul_pd(currentReal, currentReal);
imSquare = _mm256_mul_pd(currentIm, currentIm);
currentIm = _mm256_mul_pd(currentIm, two);
currentIm = _mm256_fmadd_pd(currentIm, currentReal, cIm);
currentReal = _mm256_sub_pd(realSquare, imSquare);
currentReal = _mm256_add_pd(currentReal, cReal);
__m256i isSmaller = _mm256_castpd_si256(_mm256_cmp_pd(_mm256_add_pd(realSquare, imSquare), four, _CMP_LE_OS));
result = _mm256_select_si256(_mm256_add_epi64(one, result), result, isSmaller);
//if (i % 10 == 0 && !isSmaller.m256i_i64[0] && !isSmaller.m256i_i64[1] && !isSmaller.m256i_i64[2] && !isSmaller.m256i_i64[3]) return result;
}
return result;
}
using namespace std;
int main() {
MandelbrotSet m;
std::complex<double> point(-0.14203954214360026, 1);
__m256i result_avx = m.getIterationCount(_mm256_set_pd(-0.14203954214360026, -0.13995837669094691, -0.13787721123829355, -0.13579604578563975),
_mm256_set_pd(1, 1, 1, 1), 2681);
int result_normal = m.getIterationCount(point, 2681);
cout << "Normal: " << result_normal << ", AVX: " << result_avx.m256i_i64[0] << ", at point " << point << endl;
return 0;
}
When I run this code, I get the following result:
(The point -0.14203954214360026 + i is chosen intentionally, because both methods return the same/almost the same value in most points)
Normal: 13, AVX: 20, at point (-0.14204,1)
A difference of 1 might be acceptable, but a difference of 7 seems quite big, since both methods use double precision.
Do AVX instructions have lower precision than "normal" instructions? If not, why do the two results differ so much?
I use MS Visual Studio 2017, MS Visual C++ 2017 15.6 v14.13 141, and my computer has an i7-7700K processor. The project is compiled for x64. The result is the same whether it is compiled with no optimization or full optimization.
The rendered results look like this:
[Image: AVX rendering]
[Image: normal (scalar double) rendering]
The values of realSquare and imSquare during the loop are as follows:
0, 0, 0
1, 0.0201752, 1
2, 1.25858, 0.512543
3, 0.364813, 0.367639
4, 0.0209861, 0.0715851
5, 0.0371096, 0.850972
6, 0.913748, 0.415495
7, 0.126888, 0.0539759
8, 0.00477863, 0.696364
9, 0.69493, 0.782567
10, 0.0527514, 0.225526
11, 0.0991077, 1.48388
12, 2.33115, 0.0542994
13, 4.5574, 0.0831971
In the AVX loop the values are:
0, 0, 0
1, 0.0184406, 1
2, 1.24848, 0.530578
3, 0.338851, 0.394109
4, 0.0365017, 0.0724287
5, 0.0294888, 0.804905
6, 0.830307, 0.478687
7, 0.04658, 0.0680608
8, 0.024736, 0.78746
9, 0.807339, 0.519651
10, 0.0230712, 0.0872787
11, 0.0400014, 0.828561
12, 0.854433, 0.404359
13, 0.0987707, 0.0308286
14, 0.00460416, 0.791455
15, 0.851277, 0.773114
16, 0.00332154, 0.387519
17, 0.270393, 1.14866
18, 1.02832, 0.0131355
19, 0.773319, 1.51892
20, 0.776852, 10.0336
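Reversing the order of the arguments passed to _mm256_set_pd solves the problem. _mm256_set_pd takes its arguments from the highest element down to the lowest, so the last argument ends up in element 0, and the code then reads element 0 via m256i_i64[0].
If you inspect the value of cReal in the debugger, you'll see that the first element is set to -0.13579604578563975, not -0.14203954214360026. The scalar and AVX runs were therefore iterating different points, which is unrelated to precision.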

GEOS OverlayOp intersection operation

I am using GEOS 3.6.2 to compute an intersection between two polygons. I was able to construct my polygons, but when I try to compute the intersection it won't work.
Running a Debug build of my program, I get the error message:
The inferior stopped because it received a signal from the operating system.
Signal name : SIGSEGV
Signal meaning : Segmentation fault
Any idea what I'm doing wrong?
Here is my code:
#include <geos/geom/Polygon.h>
#include <geos/geom/LinearRing.h>
#include <geos/geom/CoordinateSequenceFactory.h>
#include <geos/geom/GeometryFactory.h>
#include <geos/geom/Geometry.h>
#include <geos/operation/overlay/OverlayOp.h>
#include <iostream>
#include <array>
////////////////////////////////////////////////////////////////////////////////
geos::geom::Polygon* MakePoly(std::vector<std::vector<int>> const& polyCoords)
{
geos::geom::GeometryFactory* factory = geos::geom::GeometryFactory::create().get();
geos::geom::CoordinateSequence* temp = factory->getCoordinateSequenceFactory()->create((std::size_t) 0, 0);
std::vector<std::vector<int>>::const_iterator it_x = polyCoords.begin();
int size = it_x->size();
for (int i=0; i<size; i++)
{
temp->add(geos::geom::Coordinate(polyCoords[0][i], polyCoords[1][i]));
}
geos::geom::LinearRing *shell=factory->createLinearRing(temp);
//NULL in this case could instead be a collection of one or more holes
//in the interior of the polygon
return factory->createPolygon(shell,NULL);
}
////////////////////////////////////////////////////////////////////////////////
int main()
{
// Create geometry.
std::vector<std::vector<int>> polyCoords1 = {
{1, 1, 2, 2, 1, 1, 4, 5, 4, 1},
{1, 2, 2, 4, 4, 5, 5, 3, 1, 1}
};
geos::geom::Polygon* poly1 = MakePoly(polyCoords1);
std::vector<std::vector<int>> polyCoords2 = {
{4, 4, 6, 6, 4},
{1, 5, 5, 1, 1}
};
geos::geom::Polygon* poly2 = MakePoly(polyCoords2);
// Actually perform the operation.
geos::operation::overlay::OverlayOp intersection(poly1, poly2);
// Extracting the geometry of the intersection (position of the error).
geos::geom::Geometry* intersectionGeo = intersection.getResultGeometry( geos::operation::overlay::OverlayOp::OpCode::opINTERSECTION );
std::cout<<intersectionGeo->getArea()<<std::endl;
}
The problem in your code is how you obtain the GeometryFactory pointer.
geos::geom::GeometryFactory::create() returns a smart pointer (std::unique_ptr), so in this line:
geos::geom::GeometryFactory* factory = geos::geom::GeometryFactory::create().get();
the temporary unique_ptr returned by create is destroyed at the end of the statement, taking the factory with it. The raw pointer stored in factory is left dangling, and using it causes the segmentation fault.
Replace that line with:
geos::geom::GeometryFactory::Ptr factory = geos::geom::GeometryFactory::create();
With that change the code works.

Member List implementation of different class not working

I can't seem to get the implementation of my member list correct. I want to DEFAULT initialize my Set members nyX and nyY, however I keep getting an error.
class Location
{
public:
vector<int> nyXv = { 0, 1, 2, 3, 4, 5};
vector<int> nyYv = { 0, 1, 2, 3, 4, 5 };
Set nyX(vector<int>);
Set nyY(vector<int>);
Location();
~Location();
};
Location::Location()
:nyX(nyXv), nyY(nyYv)
{
}
Look at this example.
You can initialize your vectors like this:
class Location
{
public:
vector<int> nyXv;// = { 0, 1, 2, 3, 4, 5};
vector<int> nyYv;// = { 0, 1, 2, 3, 4, 5 };
///...
Location();
~Location();
};
static const int arrX[] = {0, 1, 2, 3, 4, 5};
static const int arrY[] = {0, 1, 2, 3, 4, 5};
Location::Location()
:nyXv(arrX, arrX + sizeof(arrX) / sizeof(arrX[0]) )
,nyYv(arrY, arrY + sizeof(arrY) / sizeof(arrY[0]))
{
}
P.S. Of course there are many ways to improve this code, but it should give you an idea.