Is there a (fast) way to reverse the bits of 32-bit int values within an AVX2 register?
E.g.
_mm256_set1_epi32(2732370386);
<do something here>
//binary: 10100010110111001010100111010010 => 1001011100101010011101101000101
//each element of the register should then contain 1268071237, which is the decimal representation of 1001011100101010011101101000101
Since I can't find a suitable dupe, I'll just post it.
The main idea here is to make use of pshufb's dual use: besides shuffling bytes, it works as a parallel 16-entry table lookup, which is used here to reverse the bits of each nibble. Reversing the bytes is obvious (one more pshufb). Reversing the order of the two nibbles in every byte could be done by building it into the lookup tables (saves a shift) or by explicitly shifting the low nibble up (saves a LUT).
Something like this in total, not tested:
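#include <immintrin.h>

// Reverse the bits of each 32-bit element of x (requires AVX2).
__m256i rbit32(__m256i x) {
    // Reverse the byte order within each 32-bit element (pshufb works per 128-bit lane, so the pattern repeats).
    __m256i shufbytes = _mm256_setr_epi8(3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12);
    // luthigh[n] = n with its 4 bits reversed.
    __m256i luthigh = _mm256_setr_epi8(0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15, 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15);
    // Same table moved into the high nibble; every entry is <= 15, so the 16-bit shift never crosses a byte.
    __m256i lutlow = _mm256_slli_epi16(luthigh, 4);
    __m256i lowmask = _mm256_set1_epi8(15);
    __m256i rbytes = _mm256_shuffle_epi8(x, shufbytes);
    // Low nibble of each byte, bit-reversed, placed in the high half of the byte.
    __m256i high = _mm256_shuffle_epi8(lutlow, _mm256_and_si256(rbytes, lowmask));
    // High nibble of each byte (there is no byte shift, so shift 16-bit units and mask), bit-reversed, placed in the low half.
    __m256i low = _mm256_shuffle_epi8(luthigh, _mm256_and_si256(_mm256_srli_epi16(rbytes, 4), lowmask));
    return _mm256_or_si256(low, high);
}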
In a typical context inside a loop, those constant loads should be hoisted out of the loop.
Curiously, Clang emits 4 shuffles: it duplicates the first (byte-reversing) shuffle.
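To check the per-lane logic without AVX2 hardware, the computation each 32-bit element goes through can be modelled in scalar C++. This is a minimal sketch, assuming the same nibble table as luthigh; rbit32_scalar and kRevNibble are hypothetical names introduced here, not part of the answer. It follows the same steps: reverse the bytes, then look up each nibble in the bit-reversal table while swapping the two nibbles of every byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference: kRevNibble[n] is n with its 4 bits reversed
// (same contents as luthigh in the AVX2 version).
static const std::uint8_t kRevNibble[16] = {0, 8, 4, 12, 2, 10, 6, 14,
                                            1, 9, 5, 13, 3, 11, 7, 15};

// Bit-reverse one 32-bit value using the same steps rbit32 applies per element.
std::uint32_t rbit32_scalar(std::uint32_t x) {
    // Step 1: reverse the byte order (what the shufbytes pshufb does).
    std::uint32_t rbytes = (x >> 24) | ((x >> 8) & 0x0000FF00u) |
                           ((x << 8) & 0x00FF0000u) | (x << 24);
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        std::uint32_t b = (rbytes >> (8 * i)) & 0xFFu;
        // Step 2: reversed low nibble goes to the high half (the lutlow lookup),
        // reversed high nibble goes to the low half (the luthigh lookup).
        std::uint32_t hi = static_cast<std::uint32_t>(kRevNibble[b & 0xFu]) << 4;
        std::uint32_t lo = kRevNibble[b >> 4];
        result |= (hi | lo) << (8 * i);
    }
    return result;
}
```

For the question's input, rbit32_scalar(2732370386u) gives 1268071237. Storing the AVX2 result with _mm256_storeu_si256 and comparing each element against this function is one way to test the SIMD version.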
I'm using Eigen for easy optimization of some of my matrix math. I'm currently trying to make the following operation more efficient:
Given Matrix A:
1, 2, 3
4, 5, 6
Matrix B:
7, 11, 13, 19, 26, 7, 11
8, 9, 15, 6, 8, 4, 1
and "index map" column vector IM:
0, 1, 3, 6
I'd like to append the columns of Matrix B selected by the indices in IM to Matrix A, like so:
1, 2, 3, 7, 11, 19, 11
4, 5, 6, 8, 9, 6, 1
I'm currently able to do this with a massive for loop, but this is the bottleneck in my code and I'd like to avoid this:
#pragma unroll
for (int i = 0; i < 25088; i++) {
block.noalias() += _features.col(ff[i]);
}
I've seen the discussion here and pored over the docs, but can't figure out the right syntax for Eigen matrices: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=329
Any thoughts/tips would be much appreciated!
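To pin down the semantics of the requested operation (A followed by the columns of B selected by IM), here is a minimal sketch in plain C++, assuming column-major storage, which is Eigen's default. ColMajor and appendColumns are hypothetical names and not Eigen API. Because each column is contiguous in column-major layout, every selected column of B is copied as one block rather than element by element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal column-major matrix (not Eigen): element (r, c)
// is stored at data[c * rows + r], so each column is contiguous.
struct ColMajor {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<int> data;

    int at(std::size_t r, std::size_t c) const { return data[c * rows + r]; }
};

// Builds [A | B(:, im[0]) | B(:, im[1]) | ...]. A and B must have the same row count.
ColMajor appendColumns(const ColMajor& A, const ColMajor& B,
                       const std::vector<std::size_t>& im) {
    assert(A.rows == B.rows);
    ColMajor C;
    C.rows = A.rows;
    C.cols = A.cols + im.size();
    C.data.reserve(C.rows * C.cols);
    // A's columns come first, unchanged.
    C.data.insert(C.data.end(), A.data.begin(), A.data.end());
    for (std::size_t idx : im) {
        assert(idx < B.cols);
        // Copy column idx of B as one contiguous block of B.rows values.
        auto first = B.data.begin() + static_cast<std::ptrdiff_t>(idx * B.rows);
        C.data.insert(C.data.end(), first,
                      first + static_cast<std::ptrdiff_t>(B.rows));
    }
    return C;
}
```

Eigen 3.4 and later also support indexing a matrix with a container of indices (described in its "Slicing and Indexing" documentation), which is worth checking as an alternative to a hand-written loop.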
I have the following code:
int *exceptions[7];
int a[] = {1, 4, 11, 13};
int b[] = {5, 6, 11, 12, 14, 15};
int c[] = {2, 12, 14, 15};
int d[] = {1, 4, 7, 9, 10, 15};
int e[] = {1, 3, 4, 5, 7, 9};
int f[] = {1, 2, 3, 7, 13};
int g[] = {0, 1, 7, 12};
exceptions[0] = a;
exceptions[1] = b;
exceptions[2] = c;
exceptions[3] = d;
exceptions[4] = e;
exceptions[5] = f;
exceptions[6] = g;
The sizes of exceptions[0] and exceptions[1] should be 4 and 6 respectively.
Here's my code:
short size = sizeof(exceptions[1]) / sizeof(exceptions[1][0]);
But I'm getting 2 for every row. How can I solve this problem?
short size = sizeof(exceptions[1]) / sizeof(exceptions[1][0]);
effectively does the same as
short size = sizeof(int*) / sizeof(int);
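On a typical 64-bit platform (8-byte pointers, 4-byte int), that most probably yields 2: exceptions[1] is a pointer, so sizeof measures the pointer, not the array it points to.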
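If the raw arrays are to be kept, the length has to be taken while the expression still has array type, that is, before it decays to int* when stored in exceptions. A minimal sketch, assuming a hypothetical Row struct that pairs each pointer with its element count (only the first three rows of the question's data are shown):

```cpp
#include <cassert>
#include <cstddef>

int a[] = {1, 4, 11, 13};
int b[] = {5, 6, 11, 12, 14, 15};
int c[] = {2, 12, 14, 15};

// Hypothetical helper: a pointer plus the number of elements it refers to.
struct Row {
    const int* data;
    std::size_t size;
};

// Here sizeof(a) is the size of the whole array, because `a` has not decayed yet.
Row exceptions[] = {
    {a, sizeof(a) / sizeof(a[0])},  // 4 elements
    {b, sizeof(b) / sizeof(b[0])},  // 6 elements
    {c, sizeof(c) / sizeof(c[0])},  // 4 elements
};

// Once stored as a pointer, sizeof only measures the pointer itself.
static_assert(sizeof(exceptions[0].data) == sizeof(const int*),
              "sizeof on a pointer member gives the pointer size");
```

In C++17, std::size(a) from <iterator> computes the same count and fails to compile when given a pointer, which guards against exactly this mistake.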
How can I solve this problem?
Use a C++ standard container such as std::vector<std::vector<int>> instead:
#include <vector>

std::vector<std::vector<int>> exceptions {
    {1, 4, 11, 13},
    {5, 6, 11, 12, 14, 15},
    {2, 12, 14, 15},
    {1, 4, 7, 9, 10, 15},
    {1, 3, 4, 5, 7, 9},
    {1, 2, 3, 7, 13},
    {0, 1, 7, 12},
};
Your statement will become:
short size = exceptions[0].size();
size = exceptions[1].size();
(for whatever that's needed)
The best remedy would be to use std::vector from the standard library. It has a size() member function you can use, and it is much more versatile than a raw array.
I'm trying to make an Intel 8080 CPU emulator (I'd then like to emulate Space Invaders, which uses it).
I've coded a nearly complete implementation of this CPU (thanks mostly to MAME and the Tickle project ;) ), except for the undocumented instructions (0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38, 0x0CB, 0x0D9, 0x0DD, 0x0ED, 0x0FD).
My only problem is that it doesn't compile, and I don't know why.
This is the code:
static const unsigned char cycles_table[256] =
{
/* 8080's Cycles Table */
/* 0 1 2 3 4 5 6 7 8 9 A B C D E F */
/*0*/ 4, 10, 7, 5, 5, 5, 7, 4, 0, 10, 7, 5, 5, 5, 7, 4,
/*1*/ 0, 10, 7, 5, 5, 5, 7, 4, 0, 10, 7, 5, 5, 5, 7, 4,
/*2*/ 0, 10, 16, 5, 5, 5, 7, 4, 0, 10, 16, 5, 5, 5, 7, 4,
/*3*/ 0, 10, 13, 5, 10, 10, 10, 4, 0, 10, 13, 5, 5, 5, 7, 4,
/*4*/ 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 7, 5,
/*5*/ 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 7, 5,
/*6*/ 5, 5, 5, 5, 5, 5, 7, 5, 5, 5, 5, 5, 5, 5, 7, 5,
/*7*/ 7, 7, 7, 7, 7, 7, 7, 7, 5, 5, 5, 5, 5, 5, 7, 5,
/*8*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*9*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*A*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*B*/ 4, 4, 4, 4, 4, 4, 7, 4, 4, 4, 4, 4, 4, 4, 7, 4,
/*C*/ 5, 10, 10, 10, 11, 11, 7, 11, 5, 10, 10, 0, 11, 17, 7, 11,
/*D*/ 5, 10, 10, 10, 11, 11, 7, 11, 5, 0, 10, 10, 11, 0, 7, 11,
/*E*/ 5, 10, 10, 18, 11, 11, 7, 11, 5, 5, 10, 4, 11, 0, 7, 11,
/*F*/ 5, 10, 10, 4, 11, 11, 7, 11, 5, 5, 10, 4, 11, 0, 7, 11
};
g++ gives me this error:
8080.h:521: error: invalid in-class initialization of static data member of non-integral type `const unsigned char[256]'
This array is in a class called i8080.
As the error says, you cannot give a static const data member of non-integral type an initializer inside the class definition. That is, you could do this:
static const unsigned value = 123;
static const bool value_again = true;
But nothing of non-integral type, such as an array.
What you should do is place this in your class definition:
static const unsigned char cycles_table[256];
And in the corresponding source file, place what you have:
const unsigned char i8080::cycles_table[256] = // ...
What this does is say (in the class definition, which only declares the member), "Hey, there's gonna be this array," and in the source file, "Hey, here's that array."
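As an alternative, with C++17 or later the table can stay inside the class: a static constexpr data member may be initialized in-class and is implicitly inline, so no separate definition in a source file is needed. A minimal sketch, assuming a C++17 compiler; the cycles() accessor is a hypothetical addition, and only the first row of the question's table is written out here (the remaining entries are zero-initialized until the other rows are pasted in):

```cpp
#include <cassert>

class i8080 {
public:
    // C++17: constexpr static data members are implicitly inline, so this
    // in-class initializer is also the definition.
    static constexpr unsigned char cycles_table[256] = {
        /*0*/ 4, 10, 7, 5, 5, 5, 7, 4, 0, 10, 7, 5, 5, 5, 7, 4,
        /* rows 1-F go here, as in the question; omitted entries are zero */
    };

    // Hypothetical accessor: cycle count for a given opcode.
    static unsigned cycles(unsigned char opcode) { return cycles_table[opcode]; }
};
```

Under C++11/14 the same constexpr declaration compiles, but because indexing the table with a runtime opcode odr-uses it, one source file still needs the out-of-class definition constexpr unsigned char i8080::cycles_table[256];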
Static data members like this need to be defined (and initialised) outside of the class.
You cannot initialize a static array embedded within a class like this:
class Thing
{
public:
static const int vals[3] = {1, 2, 3}; // error: in-class initializer not allowed here
};
You have to do it like this:
thing.h:
class Thing
{
public:
static const int vals[3];
};
thing.cpp:
const int Thing::vals[3] = {1, 2, 3};