I'm reading an ESRI Shapefile, and to my dismay it uses big endian and little endian at different points (see, for instance, the table on page 4, plus the tables on pages 5 to 8).
So I created two functions in C++, one for each endianness.
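#include <cstdint>
#include <fstream>
using namespace std;
// Reads a 32-bit big-endian value. Each byte is widened to uint32_t before
// shifting; otherwise buf[0]<<24 is computed in signed int and overflows for bytes >= 0x80.
uint32_t readBig(ifstream& f) {
uint8_t buf[4];
f.read((char*)buf,4);
return uint32_t(buf[3]) | uint32_t(buf[2])<<8 | uint32_t(buf[1])<<16 | uint32_t(buf[0])<<24;
}
// Reads a 32-bit little-endian value. Copying the bytes straight into num
// is only correct on a little-endian host (e.g. x86).
uint32_t readLittle(ifstream& f) {
uint32_t num;
f.read(reinterpret_cast<char *>(&num),4);
//f.read((char*)&num,4);
return num;
}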
But I'm not sure this is the most efficient way to do it. Can this code be improved? Keep in mind it will run thousands, maybe millions of times for a single shapefile, so having one function call the other seems worse than having two separate functions. Is there a difference in performance between using reinterpret_cast and an explicit C-style conversion (char*)? Should I use the same one in both functions?
Casting between pointer types does not affect performance; in this case, it's just a technicality to make the compiler happy. If you're really making a separate call to read for every 32-bit value, the time taken by the byte-swapping operation will likely be in the noise. For speed, you probably should have your own buffering layer so that your inner loop doesn't make any function calls.
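A minimal sketch of that buffering idea, assuming the whole .shp file fits in memory: read it once into a byte vector, then decode each field with small inline helpers built from shifts, which give the same result on any host endianness. The names readFile, loadBig32 and loadLittle32 are hypothetical helpers, not part of any library.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read the whole file into memory once, so the
// per-field decoding below never calls into the stream.
std::vector<std::uint8_t> readFile(const std::string& path) {
    std::ifstream f(path, std::ios::binary);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(f),
                                     std::istreambuf_iterator<char>());
}

// Decode a 32-bit big-endian value starting at p. Each byte is widened to
// uint32_t before shifting; the result does not depend on the host.
inline std::uint32_t loadBig32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Decode a 32-bit little-endian value starting at p (also host-independent).
inline std::uint32_t loadLittle32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}
```

A record loop then reads fields as loadBig32(bytes.data() + offset) or loadLittle32(bytes.data() + offset); checking offset + 4 <= bytes.size() is left to the caller. Compilers commonly turn these shift patterns into a plain load (plus a bswap for the big-endian case), but that is an optimization, not a guarantee.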
It's nice if the swap compiles down to a single opcode (like bswap), but whether that is possible, or the fastest option, is processor-specific.
If you're really interested in maximizing speed, consider using SIMD intrinsics.
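In most cases the compiler should generate a bswap instruction, which is probably sufficient. If, however, you need something faster than that, vpshufb (the AVX2 intrinsic _mm256_shuffle_epi8) is your friend. It shuffles bytes within each 128-bit lane, which is why each mask below repeats its pattern for the upper lane, and the code must be compiled with AVX2 enabled (e.g. -mavx2 with GCC or Clang):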
#include <immintrin.h>
#include <cstdint>
// swap byte order in 16 x int16
inline void swap_16xi16(uint16_t input[16])
{
constexpr uint8_t mask_data[] = {
1, 0,
3, 2,
5, 4,
7, 6,
9, 8,
11, 10,
13, 12,
15, 14,
1, 0,
3, 2,
5, 4,
7, 6,
9, 8,
11, 10,
13, 12,
15, 14
};
const __m256i swapped = _mm256_shuffle_epi8(
_mm256_loadu_si256((const __m256i*)input),
_mm256_loadu_si256((const __m256i*)mask_data)
);
_mm256_storeu_si256((__m256i*)input, swapped);
}
// swap byte order in 8 x int32
inline void swap_8xi32(uint32_t input[8])
{
constexpr uint8_t mask_data[] = {
3, 2, 1, 0,
7, 6, 5, 4,
11, 10, 9, 8,
15, 14, 13, 12,
3, 2, 1, 0,
7, 6, 5, 4,
11, 10, 9, 8,
15, 14, 13, 12
};
const __m256i swapped = _mm256_shuffle_epi8(
_mm256_loadu_si256((const __m256i*)input),
_mm256_loadu_si256((const __m256i*)mask_data)
);
_mm256_storeu_si256((__m256i*)input, swapped);
}
// swap byte order in 4 x int64
inline void swap_4xi64(uint64_t input[4])
{
constexpr uint8_t mask_data[] = {
7, 6, 5, 4, 3, 2, 1, 0,
15, 14, 13, 12, 11, 10, 9, 8,
7, 6, 5, 4, 3, 2, 1, 0,
15, 14, 13, 12, 11, 10, 9, 8
};
const __m256i swapped = _mm256_shuffle_epi8(
_mm256_loadu_si256((const __m256i*)input),
_mm256_loadu_si256((const __m256i*)mask_data)
);
_mm256_storeu_si256((__m256i*)input, swapped);
}
inline void swap_16xi16(int16_t input[16])
{ swap_16xi16((uint16_t*)input); }
inline void swap_8xi32(int32_t input[8])
{ swap_8xi32((uint32_t*)input); }
inline void swap_4xi64(int64_t input[4])
{ swap_4xi64((uint64_t*)input); }
inline void swap_8f(float input[8])
{ swap_8xi32((uint32_t*)input); }
inline void swap_4d(double input[4])
{ swap_4xi64((uint64_t*)input); }
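For CPUs without AVX2, or to unit-test the SIMD routines above, a plain scalar version may help. This is a sketch: bswap32 and swap_8xi32_scalar are hypothetical names, and whether bswap32 becomes a single bswap instruction depends on the compiler and optimization level (GCC and Clang usually recognize this pattern at -O2). In C++23, std::byteswap from <bit> does the same job.

```cpp
#include <cstddef>
#include <cstdint>

// Portable byte swap of one 32-bit value using shifts and masks.
inline std::uint32_t bswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Scalar counterpart of swap_8xi32: swaps the byte order of 8 values in place.
inline void swap_8xi32_scalar(std::uint32_t input[8]) {
    for (std::size_t i = 0; i < 8; ++i)
        input[i] = bswap32(input[i]);
}
```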
The task is as follows: find the indices of duplicate rows in a 2D array. Two rows are considered duplicates if the elements at index 2 and index 4 of one row are equal to the elements at index 2 and index 4 of the other row. The simplest way to do it is something like this:
std::unordered_set<int> result;
for (int i = 0; i < rows_count; ++i)
{
for (int j = i + 1; j < rows_count; ++j)
{
if (arr[i][2] == arr[j][2] && arr[i][4] == arr[j][4])
{
result.insert(j); // std::unordered_set has insert(), not push_back()
}
}
}
But if rows_count is very large, this algorithm is too slow (it is O(n²)). So my question is: is there any way to get the needed indices using some data structure (from the STL or elsewhere) with only a single loop (without a nested loop)?
You could take advantage of the properties of a std::unordered_set.
A small helper class will make things even easier.
So, we can store the values at index 2 and 4 in a class and use a comparison function to detect duplicates.
Besides the key type, std::unordered_set has 2 further template parameters that matter here:
a functor for calculating the hash value, and
a functor for comparing two keys for equality.
So we will add 2 functions to our class and make it a functor for both parameters at the same time. In the code below you will see:
std::unordered_set<Dupl, Dupl, Dupl> dupl{};
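So, we use our class as both functors as well. The hash must depend only on the fields that the equality functor compares; otherwise equal rows can get different hashes and will never be detected as duplicates.
The std::unordered_set does the rest.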
Please see below one of many potential solutions:
#include <cstddef>
#include <functional>
#include <vector>
#include <unordered_set>
#include <iostream>
struct Dupl {
Dupl() {}
Dupl(const std::size_t row, const std::vector<int>& data) : index(row), firstValue(data[2]), secondValue(data[4]){};
std::size_t index{};
int firstValue{};
int secondValue{};
// Hash function: uses only the fields compared below. Mixing in 'index'
// would give equal rows different hashes, so duplicates would not be found.
std::size_t operator()(const Dupl& d) const noexcept {
return std::hash<int>{}(d.firstValue) ^ (std::hash<int>{}(d.secondValue) << 1);
}
// Comparison: rows are equal if their values at index 2 and 4 match
bool operator()(const Dupl& lhs, const Dupl& rhs) const {
return (lhs.firstValue == rhs.firstValue) and (lhs.secondValue == rhs.secondValue);
}
};
std::vector<std::vector<int>> data{
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, // Index 0
{2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, // Index 1
{3, 4, 42, 6, 42, 8, 9, 10, 11, 12}, // Index 2 ***
{4, 5, 6, 7, 8, 9, 10, 11, 12, 13}, // Index 3
{5, 6, 42, 8, 42, 10, 11, 12, 13, 14}, // Index 4 ***
{6, 7, 8, 9, 10, 11, 12, 13, 14, 15}, // Index 5
{7, 8, 9, 10, 11, 12, 13, 14, 15, 16}, // Index 6
{8, 9, 10, 11, 12, 13, 14, 15, 16, 17}, // Index 7
{9, 10, 42, 12, 42, 14, 15, 16, 17, 18}, // Index 8 ***
{10, 11, 12, 13, 14, 15, 16, 17, 18, 19}, // Index 9
};
int main() {
std::unordered_set<Dupl, Dupl, Dupl> dupl{};
// Find the unique rows
for (size_t i{}; i < data.size(); ++i)
dupl.insert({i, data[i]});
// Show some debug output
for (const Dupl& d : dupl) {
std::cout << "\nIndex:\t " << d.index << "\t\tData: ";
for (const int i : data[d.index]) std::cout << i << ' ';
}
}
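The program above keeps one row per key but does not return the duplicate indices the question asks for. A minimal single-loop sketch that does, assuming int is 32 bits so the two values can be packed into one 64-bit key: it relies on std::unordered_set::insert returning a pair whose bool member is false when an equal key is already present. duplicateRows is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Returns the indices of rows whose values at index 2 and 4 match those of
// an earlier row. The first occurrence is not reported. One pass, average O(n).
std::vector<std::size_t> duplicateRows(const std::vector<std::vector<int>>& rows) {
    std::unordered_set<std::uint64_t> seen;
    std::vector<std::size_t> duplicates;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        // Pack both 32-bit values into one 64-bit key (assumes 32-bit int).
        const std::uint64_t key =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(rows[i][2])) << 32) |
            static_cast<std::uint32_t>(rows[i][4]);
        // insert() reports false in .second if the key was already in the set.
        if (!seen.insert(key).second)
            duplicates.push_back(i);
    }
    return duplicates;
}
```

For the sample data above it should return {4, 8}: row 2 is the first row with the key (42, 42), and rows 4 and 8 repeat it.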
I'm working with C++ and ran into a problem. I want to pass an argument to a function, and the argument must be a 2D array. When I try to do it, I get 2 errors:
Too many initializer values
and
initializing: cannot convert from initializer list to size_t**
How do I fix this? I've tried changing it to a 5x5 matrix, but that doesn't help.
size_t** matrix =
{
{1, 16, 20, 23, 25},
{6, 2, 17, 21, 24},
{10, 7, 3, 18, 22},
{13, 11, 8, 4, 19},
{15, 14, 12, 9, 5},
};
set<bool> set1 = iterateover(matrix);
The function:
std::set<bool> iterateover(size_t **arrayy)
size_t** matrix defines a pointer to a pointer to a size_t. An array is not a pointer. It can decay to a pointer, but in the case of a 2D array, it decays to a pointer to a 1D array, not to a pointer to a pointer.
The closest thing I can think of to what you seem to be after is
// here be the data
size_t actual_matrix[][5] = // note: We can omit the first dimension but we cannot
// omit the inner dimensions
{
{1, 16, 20, 23, 25},
{6, 2, 17, 21, 24},
{10, 7, 3, 18, 22},
{13, 11, 8, 4, 19},
{15, 14, 12, 9, 5},
};
// an array of pointers to the rows of actual data. This 1D array of pointers will
// decay to a size_t **
size_t * matrix[] =
{
actual_matrix[0],
actual_matrix[1],
actual_matrix[2],
actual_matrix[3],
actual_matrix[4],
};
// now we have the correct type to use with iterateover
std::set<bool> set1 = iterateover(matrix);
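To illustrate the decay rule above: size_t actual_matrix[][5] decays to size_t (*)[5], so a function can take that type directly, without the extra array of row pointers, as long as the inner dimension is fixed at compile time. sumDiagonal is a hypothetical function used only for illustration.

```cpp
#include <cstddef>

// 'rows' points to the first row of 5 elements; the row count is passed separately
// because it is lost when the array decays to a pointer.
std::size_t sumDiagonal(std::size_t (*rows)[5], std::size_t rowCount) {
    std::size_t sum = 0;
    for (std::size_t i = 0; i < rowCount && i < 5; ++i)
        sum += rows[i][i];
    return sum;
}
```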
I want to pass an argument to a function. The argument must be a 2d array.
You can make iterateOver a function template that takes a 2D array by reference, as shown below. You can adapt the function to your needs, since it is not clear from the question what your iterateover function does; here it just prints all the elements of the 2D array.
#include <iostream>
template<typename T,std::size_t N, std::size_t M>
void iterateOver(T (&arr)[N][M])
{
for(std::size_t i= 0; i < N; ++i)
{
for(std::size_t j = 0; j < M; ++j)
{
std::cout<<arr[i][j] <<" ";
}
std::cout<<std::endl;
}
}
int main()
{
size_t matrix[5][5] =
{
{1, 16, 20, 23, 25},
{6, 2, 17, 21, 24},
{10, 7, 3, 18, 22},
{13, 11, 8, 4, 19},
{15, 14, 12, 9, 5},
};
//call iterateOver by passing the matrix by reference
iterateOver(matrix);
}
The output of the above program is:
1 16 20 23 25
6 2 17 21 24
10 7 3 18 22
13 11 8 4 19
15 14 12 9 5
I'm facing strange behaviour using the Intel C++ compiler 2019 update 5. When I fill a std::map, it seems to lead to a non-deterministic (?) result. The STL is from VS2019 16.1.6, in which ICC is embedded. I am on Windows 10.0.17134.286.
My code:
#include <map>
#include <vector>
#include <iostream>
std::map<int, int> AddToMapWithDependencyBetweenElementsInLoop(const std::vector<int>& values)
{
std::map<int, int> myMap;
for (int i = 0; i < values.size(); i+=3)
{
myMap.insert(std::make_pair(values[i], myMap.size()));
myMap.insert(std::make_pair(values[i + 1], myMap.size()));
myMap.insert(std::make_pair(values[i + 2], myMap.size()));
}
return myMap;
}
std::map<int, int> AddToMapOnePerLoop(const std::vector<int>& values)
{
std::map<int, int> myMap;
for (int i = 0; i < values.size(); ++i)
{
myMap.insert(std::make_pair(values[i], 0));
}
return myMap;
}
int main()
{
std::vector<int> values{ 6, 7, 15, 5, 4, 12, 13, 16, 11, 10, 9, 14, 0, 1, 2, 3, 8, 17 };
{
auto myMap = AddToMapWithDependencyBetweenElementsInLoop(values);
for (const auto& keyValuePair : myMap)
{
std::cout << keyValuePair.first << ", ";
}
std::cout << std::endl;
}
{
auto myMap = AddToMapOnePerLoop(values);
for (const auto& keyValuePair : myMap)
{
std::cout << keyValuePair.first << ", ";
}
std::cout << std::endl;
}
return 0;
}
I simply wanted to perform a test, so I called icl directly from the command line:
$ icl /nologo mycode.cpp
$ mycode.exe
0, 1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 16, 17,
0, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17
Curious. I expected to have 18 entries and I got 15 and 14 (depending on the insertion method, see the code).
$ icl /nologo /EHsc mycode.cpp
$ mycode.exe
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17,
0, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17
Still curious, now I got 17 and 14 entries rather than 18 and 18!
$ icl /nologo /Od mycode.cpp
$ mycode.exe
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
Now, with no optimization, I got 18/18, as expected.
My question is two-fold: 1) is it normal to get such results, and 2) if it's not (as I suspect), what did I do wrong? I thought a simple call to the compiler would compile the std::map::insert() calls correctly.
Does the problem lie in the for(){} loop?
Thanks for helping me understand this problem and find a solution!
I cannot reproduce this, but in either case, for peace of mind you could populate the map much more simply:
for (auto i: values) {
myMap[i] = 0;
}
There is no need to use myMap.insert(std::make_pair(key, value)) just to add an entry to the map.
Otherwise, your code produces the expected output (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 twice; the sequence is sorted because std::map is an ordered map) when compiled with gcc 8.4.0 under Ubuntu. I suspect this is simply a bug in the particular compiler you use. It would be worth reporting it to the compiler developers so that they can fix it.
Syntactically, your code is fine. I see no possible undefined behavior here (as long as you haven't done any further hidden hacks such as redefining size_t or map, modifying standard headers, etc.).
But:
Since I experienced loop-optimizer issues with older compilers due to lines like this one
for (int i = 0; i < values.size(); ++i)
where you mixed signed and unsigned integers (and their value ranges), I suspect your Intel compiler might have an issue with loop unrolling here. It may also be related to the subscript operator usage inside the loop. A typical underlying cause is a wrong assumption about which registers may be used. Can you try your code again using size_t consistently for the loop index?
Further idea:
Can you reproduce the issue if the values are generated at run time instead of being hard-coded? If you cannot, that would at least rule out many possible causes.
My guess is that there could be an optimization issue related to for(...; i+=3).
I see that in your use case the number of items is divisible by 3, but I would still fix a bug in your code for the general case:
{
std::map<int, int> myMap;
for (int i = 0; (i + 2) < values.size(); i+=3) // ignore the possibly incomplete last triplet
I know it is not directly related to your problem, but maybe this fix triggers something in the compiler's optimizer so that it generates correct code.
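A complete version of the suggested fix, as a sketch: it uses std::size_t for the index so the loop condition no longer mixes signed and unsigned types, and it skips an incomplete final triplet. AddToMapInTriplets is a hypothetical name for the question's AddToMapWithDependencyBetweenElementsInLoop; the insertion logic is otherwise unchanged, and whether it avoids the ICC behaviour is untested.

```cpp
#include <cstddef>
#include <map>
#include <vector>

std::map<int, int> AddToMapInTriplets(const std::vector<int>& values)
{
    std::map<int, int> myMap;
    // i + 2 < size() ignores a possibly incomplete last triplet; all operands are unsigned.
    for (std::size_t i = 0; i + 2 < values.size(); i += 3)
    {
        // size() is evaluated before each insert, as in the original code.
        myMap.insert({values[i], static_cast<int>(myMap.size())});
        myMap.insert({values[i + 1], static_cast<int>(myMap.size())});
        myMap.insert({values[i + 2], static_cast<int>(myMap.size())});
    }
    return myMap;
}
```

With the question's 18 distinct values, a conforming compiler should produce 18 entries, with each key mapped to its insertion position.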
Is there a (fast) way to reverse the bits of 32-bit int values within an AVX2 register?
E.g.
_mm256_set1_epi32(2732370386);
<do something here>
//binary: 10100010110111001010100111010010 => 1001011100101010011101101000101
//register contains 1268071237 which is decimal representation of 1001011100101010011101101000101
Since I can't find a suitable dupe, I'll just post it.
The main idea here is to make use of pshufb's dual use: as a byte shuffle, and as a parallel 16-entry table lookup to reverse the bits of each nibble. Reversing the bytes is obvious. Reversing the order of the two nibbles in every byte could be done by building it into the lookup tables (saves a shift) or by explicitly shifting the low nibble up (saves a LUT).
Something like this in total, not tested:
#include <immintrin.h> // requires AVX2 (e.g. -mavx2)
__m256i rbit32(__m256i x) {
__m256i shufbytes = _mm256_setr_epi8(3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12);
__m256i luthigh = _mm256_setr_epi8(0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15, 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15);
__m256i lutlow = _mm256_slli_epi16(luthigh, 4);
__m256i lowmask = _mm256_set1_epi8(15);
__m256i rbytes = _mm256_shuffle_epi8(x, shufbytes);
__m256i high = _mm256_shuffle_epi8(lutlow, _mm256_and_si256(rbytes, lowmask));
__m256i low = _mm256_shuffle_epi8(luthigh, _mm256_and_si256(_mm256_srli_epi16(rbytes, 4), lowmask));
return _mm256_or_si256(low, high);
}
In a typical loop, those constant loads should be hoisted out of the loop.
Curiously, Clang uses 4 shuffles: it duplicates the first shuffle.
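When experimenting with rbit32, it helps to have a scalar reference to compare against (for example after storing the vector with _mm256_storeu_si256). This sketch applies the same idea one byte at a time: a 16-entry nibble-reversal table, a nibble swap inside each byte, and a byte reversal. rbit32_scalar is a hypothetical name.

```cpp
#include <cstdint>

std::uint32_t rbit32_scalar(std::uint32_t x) {
    // Bit-reversal of every 4-bit value, same table as luthigh in rbit32.
    static const std::uint8_t rev4[16] = {0, 8, 4, 12, 2, 10, 6, 14,
                                          1, 9, 5, 13, 3, 11, 7, 15};
    std::uint32_t result = 0;
    for (int byte = 0; byte < 4; ++byte) {
        const auto b = static_cast<std::uint8_t>(x >> (8 * byte));
        // Reverse the bits of the byte: the reversed low nibble becomes the
        // high nibble and the reversed high nibble becomes the low nibble.
        const auto rb = static_cast<std::uint8_t>((rev4[b & 0x0F] << 4) | rev4[b >> 4]);
        // Byte 'byte' of the input becomes byte (3 - byte) of the output.
        result |= static_cast<std::uint32_t>(rb) << (8 * (3 - byte));
    }
    return result;
}
```

rbit32_scalar(2732370386u) gives 1268071237u, matching the example in the question.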
Is it possible to initialize a static Eigen Matrix4d in a header file? I want to use it as a global variable.
I'd like to do something along the lines of:
static Eigen::Matrix4d foo = Eigen::Matrix4d(1, 2 ... 16);
Or similar to vectors:
static Eigen::Matrix4d foo = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
I've looked through the Eigen Matrix documentation, but I can't find how to do this there.
A more elegant solution might include the use of finished(). The function returns 'the built matrix once all its coefficients have been set.'
E.g:
static Eigen::Matrix4d foo = (Eigen::Matrix4d() << 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16).finished();
Along the lines of Dawid's answer (which has a small issue, see the comments), you can do:
static Eigen::Matrix4d foo = [] {
Eigen::Matrix4d tmp;
tmp << 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16;
return tmp;
}();
(Named) return value optimization will typically elide the temporary, so an extra copy is not a concern.
You can use an initialization lambda like this:
static Eigen::Matrix4d foo = [] {
Eigen::Matrix4d matrix;
matrix << 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16;
return matrix;
}();