Memory access error with _mm512_i64gather_pd() - c++

I am trying to use a very simple example of the AVX-512 gather instructions:
double __attribute__((aligned(64))) array3[17] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0,
9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0,
17.0};
int __attribute__((aligned(64))) i_index_ar[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
__m512i i_index = _mm512_load_epi64(i_index_ar);
__m512d a7AVX = _mm512_i64gather_pd(i_index, &array3[0], 1);
Unfortunately, my last call to _mm512_i64gather_pd results in a memory access error (core dumped).
The error message (German locale) is "Speicherzugriffsfehler (Speicherabzug geschrieben)", i.e. "Segmentation fault (core dumped)".
I am using Intel Xeon Phi (KNL) 7210.
Edit: The errors were that I was loading 32-bit integers with a 64-bit load instruction, and that the scale in _mm512_i64gather_pd has to be 8, i.e. sizeof(double), when the indices are element indices.

I think you need to set scale to sizeof(double), not 1.
Change:
__m512d a7AVX = _mm512_i64gather_pd(i_index, &array3[0], 1);
to:
__m512d a7AVX = _mm512_i64gather_pd(i_index, &array3[0], sizeof(double));
See also: this question and its answers for a fuller explanation of Intel SIMD gathered loads and their usage.
—
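Another problem: your indices need to be 64-bit ints. _mm512_load_epi64 reads eight 64-bit lanes, so with an int array each lane holds two adjacent 32-bit indices fused together (on little-endian x86, 1 and 2 become 2^33 + 1), which points far outside array3. Change: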
int __attribute__((aligned(64))) i_index_ar[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, ...
to:
int64_t __attribute__((aligned(64))) i_index_ar[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, ...
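To make both fixes concrete, here is a sketch that (a) shows how two 32-bit ints end up fused into one 64-bit index and (b) performs the corrected gather. It assumes GCC or Clang on x86-64, because it relies on `__attribute__((target("avx512f")))` and `__builtin_cpu_supports`; `fused_index`, `gather8_avx512` and `gather8` are hypothetical names. Unaligned loads are used so the index array does not need 64-byte alignment.

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cstring>

// What _mm512_load_epi64 sees in one 64-bit lane when the source array holds
// 32-bit ints: two adjacent ints fused into one index. On little-endian x86
// the second int lands in the upper 32 bits, so {1, 2} becomes 2^33 + 1.
inline std::int64_t fused_index(const std::int32_t* p) {
    std::int64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Corrected gather: 64-bit indices and scale = sizeof(double), so lane i
// loads the double at byte address base + idx[i] * 8, i.e. base[idx[i]].
// The target attribute lets this function compile without -mavx512f.
__attribute__((target("avx512f")))
void gather8_avx512(const double* base, const std::int64_t* idx, double* out) {
    __m512i vindex = _mm512_loadu_si512(idx);                        // 8 x int64
    __m512d v = _mm512_i64gather_pd(vindex, base, sizeof(double));   // scale 8
    _mm512_storeu_pd(out, v);
}

// Hypothetical dispatcher: take the AVX-512 path only if the CPU supports it,
// otherwise run the equivalent scalar loop.
void gather8(const double* base, const std::int64_t* idx, double* out) {
    if (__builtin_cpu_supports("avx512f")) {
        gather8_avx512(base, idx, out);
    } else {
        for (int i = 0; i < 8; ++i) out[i] = base[idx[i]];
    }
}
```

A `__m512i` holds only eight 64-bit indices, so one call gathers eight doubles; the 16-element index array from the question would need two gathers.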

Related

avx2 register bits reverse

Is there a (fast) way to reverse the bits of 32-bit int values within an AVX2 register?
E.g.
_mm256_set1_epi32(2732370386);
<do something here>
//binary: 10100010110111001010100111010010 => 1001011100101010011101101000101
//register contains 1268071237 which is decimal representation of 1001011100101010011101101000101
Since I can't find a suitable dupe, I'll just post it.
The main idea here is to exploit pshufb's dual use: besides shuffling bytes, it works as a parallel 16-entry table lookup, which is used here to reverse the bits of each nibble. Reversing the bytes within each 32-bit element is a plain pshufb. Reversing the order of the two nibbles in every byte can be done either by building it into the lookup tables (saves a shift) or by explicitly shifting the low nibble up (saves a LUT).
Something like this in total, not tested:
__m256i rbit32(__m256i x) {
__m256i shufbytes = _mm256_setr_epi8(3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12);
__m256i luthigh = _mm256_setr_epi8(0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15, 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15);
__m256i lutlow = _mm256_slli_epi16(luthigh, 4);
__m256i lowmask = _mm256_set1_epi8(15);
__m256i rbytes = _mm256_shuffle_epi8(x, shufbytes);
__m256i high = _mm256_shuffle_epi8(lutlow, _mm256_and_si256(rbytes, lowmask));
__m256i low = _mm256_shuffle_epi8(luthigh, _mm256_and_si256(_mm256_srli_epi16(rbytes, 4), lowmask));
return _mm256_or_si256(low, high);
}
In a typical loop, those constant loads should be hoisted out of the loop.
Curiously, Clang uses 4 shuffles: it duplicates the first shuffle.
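Since the vector version above is marked as untested, one cheap way to check it is to compare it against a scalar bit reversal. The sketch below copies the answer's function unchanged apart from a GCC/Clang `target("avx2")` attribute (an assumption, so the file builds without `-mavx2`); `rbit32_scalar`, `rbit32_x8_avx2` and `rbit32_x8` are hypothetical helpers.

```cpp
#include <immintrin.h>
#include <cstdint>

// Scalar reference: shift bits out of x and into r one at a time.
inline std::uint32_t rbit32_scalar(std::uint32_t x) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

// The answer's rbit32, unchanged except for the target attribute.
__attribute__((target("avx2")))
static inline __m256i rbit32(__m256i x) {
    __m256i shufbytes = _mm256_setr_epi8(3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12,
                                         3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12);
    __m256i luthigh = _mm256_setr_epi8(0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15,
                                       0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15);
    __m256i lutlow = _mm256_slli_epi16(luthigh, 4);
    __m256i lowmask = _mm256_set1_epi8(15);
    __m256i rbytes = _mm256_shuffle_epi8(x, shufbytes);
    __m256i high = _mm256_shuffle_epi8(lutlow, _mm256_and_si256(rbytes, lowmask));
    __m256i low = _mm256_shuffle_epi8(luthigh, _mm256_and_si256(_mm256_srli_epi16(rbytes, 4), lowmask));
    return _mm256_or_si256(low, high);
}

// Reverse the bits of 8 values in place with the AVX2 version.
__attribute__((target("avx2")))
static void rbit32_x8_avx2(std::uint32_t* v) {
    __m256i x = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(v));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(v), rbit32(x));
}

// Use AVX2 when the CPU has it, otherwise fall back to the scalar reference.
void rbit32_x8(std::uint32_t* v) {
    if (__builtin_cpu_supports("avx2")) {
        rbit32_x8_avx2(v);
    } else {
        for (int i = 0; i < 8; ++i) v[i] = rbit32_scalar(v[i]);
    }
}
```

Feeding both versions the value from the question (2732370386) should give 1268071237 in every lane; extending the input set to random values is a straightforward next step.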

Efficient Eigen Matrix SubIndexing + Concatenation

I'm using Eigen for easy optimization of some of my matrix math. I'm currently trying to make the following operation more efficient:
Given Matrix A:
1, 2, 3
4, 5, 6
Matrix B:
7, 11, 13, 19, 26, 7, 11
8, 9, 15, 6, 8, 4, 1
and "index map" column vector IM:
0, 1, 3, 6
I'd like to append the columns of matrix B selected by the indices in IM to matrix A, like this:
1, 2, 3, 7, 11, 19, 11
4, 5, 6, 8, 9, 6, 1
I'm currently able to do this with a massive for loop, but this is the bottleneck in my code and I'd like to avoid this:
#pragma unroll
for (int i = 0; i < 25088; i++) {
block.noalias() += _features.col(ff[i]);
}
I've seen the discussion here and pored over the docs, but can't seem to figure out the right syntax for Eigen matrices: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=329
Any thoughts/tips would be much appreciated!

How do I generate missing values of an array in Fortran?

I have an array, x, with dimension 4, that has the following values
3, 4.5, 7, 9
How do I generate the missing values, and make a new one like this?
3, 3.5, 4, 4.5, 5, 5.5, ...,9
Thanks in advance.

How do I find size of varying rows of a dynamically allocated array?

I have the following code:
int *exceptions[7];
int a[] = {1, 4, 11, 13};
int b[] = {5, 6, 11, 12, 14, 15};
int c[] = {2, 12, 14, 15};
int d[] = {1, 4, 7, 9, 10, 15};
int e[] = {1, 3, 4, 5, 7, 9};
int f[] = {1, 2, 3, 7, 13};
int g[] = {0, 1, 7, 12};
exceptions[0] = a;
exceptions[1] = b;
exceptions[2] = c;
exceptions[3] = d;
exceptions[4] = e;
exceptions[5] = f;
exceptions[6] = g;
The sizes of exceptions[0] and exceptions[1] should be 4 and 6, respectively.
Here's my code:
short size = sizeof(exceptions[1]) / sizeof(exceptions[1][0]);
But I'm getting 2 for every row. How can I solve this problem?
short size = sizeof(exceptions[1]) / sizeof(exceptions[1][0]);
effectively does the same as
short size = sizeof(int*) / sizeof(int);
On a typical 64-bit platform (8-byte pointers, 4-byte int), that yields 2.
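A minimal sketch of the difference (the names are illustrative): sizeof only knows the length while it is applied to the array itself. Once the array has decayed to int*, as happens when it is stored in exceptions[i], the length is no longer part of the type.

```cpp
#include <cstddef>

int demo_row[] = {1, 4, 11, 13};
int* demo_ptr = demo_row;  // array-to-pointer conversion: the length is lost

// sizeof on the array sees the complete type int[4].
constexpr std::size_t from_array = sizeof(demo_row) / sizeof(demo_row[0]);

// sizeof on the pointer sees only int*, i.e. sizeof(int*) / sizeof(int).
constexpr std::size_t from_pointer = sizeof(demo_ptr) / sizeof(demo_ptr[0]);

// Taking the array by reference keeps N in the type; C++17's std::size
// works on arrays the same way.
template <std::size_t N>
constexpr std::size_t length_of(const int (&)[N]) { return N; }
```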
How can I solve this problem?
Use a C++ standard container such as std::vector<std::vector<int>> instead:
std::vector<std::vector<int>> exceptions {
{1, 4, 11, 13},
{5, 6, 11, 12, 14, 15},
{2, 12, 14, 15},
{1, 4, 7, 9, 10, 15},
{1, 3, 4, 5, 7, 9},
{1, 2, 3, 7, 13},
{0, 1, 7, 12},
};
Your statement will become:
short size = exceptions[0].size();
size = exceptions[1].size();
(for whatever that's needed)
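A compilable version of the vector approach, using the rows from the question; `row_sizes` is a hypothetical helper added only to show that every row reports its own length.

```cpp
#include <cstddef>
#include <vector>

// The rows from the question; each inner vector stores its own length.
const std::vector<std::vector<int>> exceptions {
    {1, 4, 11, 13},
    {5, 6, 11, 12, 14, 15},
    {2, 12, 14, 15},
    {1, 4, 7, 9, 10, 15},
    {1, 3, 4, 5, 7, 9},
    {1, 2, 3, 7, 13},
    {0, 1, 7, 12},
};

// Collect the length of every row, in order.
std::vector<std::size_t> row_sizes(const std::vector<std::vector<int>>& rows) {
    std::vector<std::size_t> sizes;
    sizes.reserve(rows.size());
    for (const auto& row : rows) {
        sizes.push_back(row.size());
    }
    return sizes;
}
```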
The best remedy would be to use std::vector from the standard library. It has a size() member function, and it is much more versatile than a raw array.

Unknown error in array initialization: invalid in-class initialization of static data member of non-integral type `const unsigned char[256]'

I was trying to make an Intel 8080 CPU emulator (later I'd like to emulate Space Invaders, which uses it).
I coded a nearly complete implementation of this CPU (thanks mostly to MAME and the Tickle project ;) ) except for the undocumented instructions (0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x0CB, 0x0D9, 0x0DD, 0x0ED, 0x0FD).
My only problem is that it fails to compile, and I don't know why.
This is the code:
static const unsigned char cycles_table[256] =
{
/* 8080's Cycles Table */
/* 0 1 2 3 4 5 6 7 8 9 A B C D E F */
/*0*/ 4, 10, 7, 5, 5, 5, 7, 4, 0, 10, 7, 5, 5, 5, 7, 4,
/*1*/ 0, 10, 7, 5, 5, 5, 7, 4, 0, 10, 7, 5, 5, 5, 7, 4,
/*2*/ 0, 10, 16, 5, 5, 5, 7, 4, 0, 10, 16, 5, 5, 5, 7, 4,
/*3*/ 0, 10, 13, 5, 10, 10, 10, 4, 0, 10, 13, 5, 5, 5, 7, 4,
/*4*/ 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 7, 5,
/*5*/ 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 7, 5,
/*6*/ 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 7, 5,
/*7*/ 7, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5, 5, 5, 5, 7, 5,
/*8*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*9*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*A*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*B*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*C*/ 5, 10, 10, 10, 11, 11, 7, 11, 5, 10, 10, 0, 11, 17, 7, 11,
/*D*/ 5, 10, 10, 10, 11, 11, 7, 11, 5, 0, 10, 10, 11, 0, 7, 11,
/*E*/ 5, 10, 10, 18, 11, 11, 7, 11, 5, 5, 10, 4, 11, 0, 7, 11,
/*F*/ 5, 10, 10, 4, 11, 11, 7, 11, 5, 5, 10, 4, 11, 0, 7, 11
};
g++ gives me this error:
8080.h:521: error: invalid in-class initialization of static data member of non-integral type `const unsigned char[256]'
This array is in a class called i8080.
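As the message says, you cannot initialize a static data member of non-integral type in a class definition. That is, you could do this:
static const unsigned value = 123;
static const bool value_again = true;
But nothing else, at least under pre-C++11 rules (C++11 added in-class initialization of static constexpr members).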
What you should do is place this in your class definition:
static const unsigned char cycles_table[256];
And in the corresponding source file, place what you have:
const unsigned char i8080::cycles_table[256] = // ...
What this does is say, in the class definition, "Hey, there's going to be this array," and in the source file, "Hey, here's that array."
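Put together in one file for illustration, the fix looks like the sketch below. In the real project the class would stay in 8080.h and the definition would go in exactly one .cpp file; only the table member of i8080 is shown, and the data is the table from the question.

```cpp
// In 8080.h: declaration only, no initializer inside the class.
class i8080 {
public:
    static const unsigned char cycles_table[256];
    // ... rest of the emulator ...
};

// In exactly one source file (e.g. 8080.cpp): the definition with the data.
const unsigned char i8080::cycles_table[256] =
{
    /*       0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F */
    /*0*/    4, 10,  7,  5,  5,  5,  7,  4,  0, 10,  7,  5,  5,  5,  7,  4,
    /*1*/    0, 10,  7,  5,  5,  5,  7,  4,  0, 10,  7,  5,  5,  5,  7,  4,
    /*2*/    0, 10, 16,  5,  5,  5,  7,  4,  0, 10, 16,  5,  5,  5,  7,  4,
    /*3*/    0, 10, 13,  5, 10, 10, 10,  4,  0, 10, 13,  5,  5,  5,  7,  4,
    /*4*/    5,  5,  5,  5,  5,  5,  7,  5,  5,  5,  5,  5,  5,  5,  7,  5,
    /*5*/    5,  5,  5,  5,  5,  5,  7,  5,  5,  5,  5,  5,  5,  5,  7,  5,
    /*6*/    5,  5,  5,  5,  5,  5,  7,  5,  5,  5,  5,  5,  5,  5,  7,  5,
    /*7*/    7,  7,  7,  7,  7,  7,  7,  7,  5,  5,  5,  5,  5,  5,  7,  5,
    /*8*/    4,  4,  4,  4,  4,  4,  7,  4,  4,  4,  4,  4,  4,  4,  7,  4,
    /*9*/    4,  4,  4,  4,  4,  4,  7,  4,  4,  4,  4,  4,  4,  4,  7,  4,
    /*A*/    4,  4,  4,  4,  4,  4,  7,  4,  4,  4,  4,  4,  4,  4,  7,  4,
    /*B*/    4,  4,  4,  4,  4,  4,  7,  4,  4,  4,  4,  4,  4,  4,  7,  4,
    /*C*/    5, 10, 10, 10, 11, 11,  7, 11,  5, 10, 10,  0, 11, 17,  7, 11,
    /*D*/    5, 10, 10, 10, 11, 11,  7, 11,  5,  0, 10, 10, 11,  0,  7, 11,
    /*E*/    5, 10, 10, 18, 11, 11,  7, 11,  5,  5, 10,  4, 11,  0,  7, 11,
    /*F*/    5, 10, 10,  4, 11, 11,  7, 11,  5,  5, 10,  4, 11,  0,  7, 11
};
```

Every translation unit that includes the header sees the declaration, while the data itself is emitted only once, so there is no duplicate-symbol problem at link time.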
Static data members need to be initialised outside of the class.
You cannot initialize a static array embedded within a class like this:
class Thing
{
public:
static const int vals[3] = {1, 2, 3};  // error: not allowed in-class
};
You have to do it like this:
thing.h:
class Thing
{
public:
static const int vals[3];
};
thing.cpp:
const int Thing::vals[3] = {1, 2, 3};
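Assuming C++11 or later is available, a `static constexpr` array is an alternative to the header/source split, because constexpr static members may be initialized inside the class. A minimal sketch; `thing_val` is a hypothetical helper that indexes the array at run time.

```cpp
// C++11 and later: a static constexpr array may be initialized in-class.
class Thing {
public:
    static constexpr int vals[3] = {1, 2, 3};
};

// Before C++17, an odr-used constexpr static member still needs exactly one
// out-of-class definition (without an initializer). From C++17 on, such
// members are implicitly inline and this line is redundant but still allowed.
constexpr int Thing::vals[3];

// The values are usable in constant expressions.
static_assert(Thing::vals[2] == 3, "initialized in-class");

// Indexing with a run-time value odr-uses the array.
inline int thing_val(int i) { return Thing::vals[i]; }
```

In C++17, `static inline const int vals[3] = {1, 2, 3};` inside the class also works without any out-of-class definition.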