Compile time efficient remove duplicates from a boost::hana tuple

I use the boost::hana to_map function to remove duplicates from a boost::hana tuple of types. See it at the compiler explorer. The code works very well but takes very long to compile (~10s). I wonder whether there is a faster solution that is compatible with boost::hana tuple.
#include <boost/hana/map.hpp>
#include <boost/hana/pair.hpp>
#include <boost/hana/type.hpp>
#include <boost/hana/basic_tuple.hpp>
#include <boost/hana/size.hpp>
#include <boost/hana/transform.hpp>
using namespace boost::hana;
constexpr auto to_type_pair = [](auto x) { return make_pair(typeid_(x), x); };
template <class Tuple>
constexpr auto remove_duplicate_types(Tuple tuple)
{
return values(to_map(transform(tuple, to_type_pair)));
}
int main(){
auto tuple = make_basic_tuple(
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40
, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50
, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60
, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70
);
auto noDuplicatesTuple = remove_duplicate_types(tuple);
// Should return 1 since there is only one distinct type in the tuple
return size(noDuplicatesTuple);
}

I haven't run any benchmarks, but your example does not appear to take 10 seconds on Compiler Explorer. However, I can explain why it is a relatively slow solution, and suggest an alternative that assumes you are only interested in getting a unique list of types and not in retaining any run-time information in your result.
Creating large tuples and/or instantiating function templates that have large tuples in their prototypes are expensive compile-time operations.
Just your call to transform instantiates a lambda for each element which in turn instantiates pair. The input/output of this call are both large tuples.
The call to to_map makes an empty map and recursively calls insert for each element each time making a new map, but in this simple case the intermediate result will always be hana::map<int>. I'm willing to bet that this is exploding your compile-times if your actual use case is non-trivial. (It was certainly an issue when we were implementing hana::map so we made hana::make_map avoid this since it has all of its inputs up front).
On top of all this, there is a significant penalty for these large function types being used in run-time code. You might notice a difference if you wrapped the operations in decltype and only used the resulting type.
Alternatively, using raw template metaprogramming can sometimes yield better compile-time performance than function-template-based metaprogramming. Here is an example for your use case:
#include <boost/hana/basic_tuple.hpp>
#include <boost/mp11/algorithm.hpp>
#include <type_traits>
namespace hana = boost::hana;
using namespace boost::mp11;
int main() {
auto tuple = hana::make_basic_tuple(
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40
, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50
, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60
, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70
);
hana::basic_tuple<int> no_dups = mp_unique<std::decay_t<decltype(tuple)>>{};
}
https://godbolt.org/z/EnTWf6
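To illustrate what "raw template metaprogramming" means here without any library, the following is a minimal sketch that deduplicates the types of any variadic type list using only the standard library (C++17). The names contains_v, unique_impl, unique_types and unique_types_t are hypothetical helpers invented for this sketch; they are not part of Boost.Hana or Boost.Mp11, and this naive version does a linear search per element, so mp_unique is likely the better choice in practice.

```cpp
#include <tuple>
#include <type_traits>

// Hypothetical helper: true when T already occurs in Ts...
// (an empty fold over || evaluates to false).
template <class T, class... Ts>
constexpr bool contains_v = (std::is_same_v<T, Ts> || ...);

// Walks the input pack and accumulates each type not seen before.
// Primary template: no input left, the accumulator is the result.
template <class Out, class... In>
struct unique_impl { using type = Out; };

// Recursive case: append T to the accumulator only if it is new.
template <template <class...> class List, class... Out, class T, class... Rest>
struct unique_impl<List<Out...>, T, Rest...>
    : std::conditional_t<contains_v<T, Out...>,
                         unique_impl<List<Out...>, Rest...>,
                         unique_impl<List<Out..., T>, Rest...>> {};

// Entry point: works on any variadic type list such as std::tuple.
template <class L>
struct unique_types;

template <template <class...> class List, class... Ts>
struct unique_types<List<Ts...>> {
    using type = typename unique_impl<List<>, Ts...>::type;
};

template <class L>
using unique_types_t = typename unique_types<L>::type;
```

Only types are manipulated, so no tuple object or function template with a large signature is ever instantiated.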

Related

Inlined function to return nested array value not performing as expected

I want to inline the function MyClass::at(), but performance isn't what I expect.
MyClass.cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>
#include <string>
// Making this a lot shorter than in my actual program
std::vector<std::vector<int>> arrarr =
{
{ 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48},
{24, 47, 32, 60, 99, 3, 45, 2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50},
{32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70},
{67, 26, 20, 68, 2, 62, 12, 20, 95, 63, 94, 39, 63, 8, 40, 91, 66, 49, 94, 21},
{24, 55, 58, 5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72},
{21, 36, 23, 9, 75, 0, 76, 44, 20, 45, 35, 14, 0, 61, 33, 97, 34, 31, 33, 95},
{78, 17, 53, 28, 22, 75, 31, 67, 15, 94, 3, 80, 4, 62, 16, 14, 9, 53, 56, 92},
{16, 39, 5, 42, 96, 35, 31, 47, 55, 58, 88, 24, 0, 17, 54, 24, 36, 29, 85, 57},
{86, 56, 0, 48, 35, 71, 89, 7, 5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58},
{19, 80, 81, 68, 5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77, 4, 89, 55, 40},
{ 4, 52, 8, 83, 97, 35, 99, 16, 7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66},
{88, 36, 68, 87, 57, 62, 20, 72, 3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69},
{ 4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18, 8, 46, 29, 32, 40, 62, 76, 36},
{20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74, 4, 36, 16},
{20, 73, 35, 29, 78, 31, 90, 1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57, 5, 54},
{ 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48},
};
class MyClass
{
public:
MyClass(std::vector<std::vector<int>> arr) : arr(arr)
{
rows = arr.size();
cols = arr.at(0).size();
}
inline auto at(int row, int col) const { return arr[row][col]; }
void arithmetic(int n) const;
private:
std::vector<std::vector<int>> arr;
int rows;
int cols;
};
MyClass.cpp:
void MyClass::arithmetic(int n) const
{
using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::duration;
using std::chrono::milliseconds;
auto t1 = high_resolution_clock::now();
int highest_product = 0;
for (auto y = 0; y < rows; ++y)
{
for (auto x = 0; x < cols; ++x)
{
// Horizontal product
if (x + n < cols)
{
auto product = 1;
for (auto i = 0; i < n; ++i)
{
product *= at(y, x + i);
}
highest_product = std::max(highest_product, product);
}
}
}
auto t2 = high_resolution_clock::now();
duration<double, std::milli> ms_double = t2 - t1;
std::cout << ms_double.count() << "ms\n";
// The function is declared void, so print the result rather than return it.
std::cout << highest_product << "\n";
}
What I want to know is why I get better performance when I replace product *= at(y, x + i); with product *= arr[y][x+i];. When I test the first case, the timing on my large array is roughly 6.7ms, and the second case takes 5.3ms. I thought that once the function was inlined, it would be the same implementation as the second case.

Member functions defined directly in the class definition (typically in header files) are implicitly inline, so using inline is useless in this case. inline does not guarantee that the function is inlined; it is just a hint for the compiler. The keyword also matters at link time, where it avoids the multiple-definition issue. Functions that are not marked inline can still be inlined if the compiler can see the code of the target function (i.e. it is in the same translation unit or link-time optimization is applied). For more information about this, please read Why are class member functions inlined?
Note that inlining is typically performed in the optimization step of compilers (e.g. -O1 or /O1). Thus, without optimizations, most compilers will not inline the function.
Using std::vector<std::vector<int>> is not efficient since it is not a contiguous data structure and it requires two indirections to access an item. Two sub-vectors next to each other can be stored far apart in memory, likely causing more cache misses (and/or thrashing due to the alignment). Please consider using one big flattened array and accessing items using y*cols+x, where cols is the size of the sub-vectors (20 here). Alternatively, an int[16][20] data type should do the job well if the size is fixed at compile time.
MyClass(std::vector<std::vector<int>> arr) causes the input parameter to be copied (and so all the sub-vectors). Please consider using a const std::vector<std::vector<int>>& type.
While at is convenient for checking bounds at runtime, this feature can strongly decrease performance. Consider using operator[] if you do not need that. You can use assertions combined with flattened arrays to get fast code in release and safe code in debug (you can enable/disable them by defining the NDEBUG macro).
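The flattened-array suggestion with assertions could look roughly like the sketch below. FlatGrid and its members are hypothetical names chosen for illustration, not code from the question; the bounds check compiles away when NDEBUG is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical FlatGrid: a rows x cols grid stored in one contiguous buffer,
// indexed as row * cols + col.
class FlatGrid {
public:
    FlatGrid(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0) {}

    // Checked in debug builds only; the assert vanishes when NDEBUG is defined.
    int& at(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    int at(std::size_t row, std::size_t col) const {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;  // single allocation, rows stored back to back
};
```

Because all rows live in one allocation, walking along a row touches consecutive memory and needs only one pointer dereference per access.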

How do tripled sequences in an IBO work?

I'm analyzing an obfuscated OpenGL application. I want to generate a .obj file that describes the multi-polygon model which is displayed in the application.
So I froze the app and dug out the values set in the VBO and IBO. But the values set in the IBO were far more mysterious than I had expected. The values were
0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 5, 8, 3, 3, 9, 9, 10, 11, 12, 12, 10, 13, 14, 14, 10, 15, 16, 16, 17, 17, 7, 8, 8, 18, 18, 19, 20, 21, 21, 22, 22, 23, 24, 25, 25, 26, 26, 27, 28, 29, 29, 30, 30, 31, 32, 32, 33, 33, 34, 35, 36, 37, 38, 38, 36, 39, 34, 34, 40, 40, 40, 41, 42, 43, 44, 44, 45, 45, 46, 47, 48, 49, 49, 50, 50, 51, 52, 52, 53, 53, 54, 55, 55, 56, 56, 57, 58, 58, 59, 59, 60, 61, 62, 62, 63, 63, 63, 64, 65, 66, 67, 64, 68, 68, 69, 69, 70, 71, 72, 73, 74, 75, 76, 76, 77, 77, 78, 79, 80, 81, 82, 82, 80, 83, 83, 84, 84, 85, 86, 87, 88, 88, 89, 89, 90, 91, 91, 92, 92, 92, 93, 94, 95, 96, 96, 97, 97, 97, 98, 99, 100, 101, 102, 102, 100, 103, 103, 104, 104, 105, 106, 107, 107, 108, 108, 108, 109, 110, 111, 112, 112, 100, 100, 101, 113, 114, 114, ... (length=10495)
As you can see, indices like 40, 63, 92 and 108 are tripled, so none of GL_TRIANGLES, GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN, GL_QUADS, GL_QUAD_STRIP or GL_POLYGON seems to work correctly with glDrawElements.
Is there some kind of advanced technique that uses tripled indices in an IBO? What do they mean? Why are they used?

Repeated indices like that are indicative of aggressive optimization of triangle strips. A repeated index creates degenerate triangles: triangles with zero area. Since they have no visible area, they are not rendered. They exist so that you can jump from one triangle strip to the next without having to issue another draw command.
So a double-index is often used to stitch two strips together. The two triangles it generates will not be rendered.
However, because of the way strips work with the winding order, the facing for the triangles can work out incorrectly. That is, if you stitched two strips together with a double-index, the second strip would start out with the reverse of the winding order it needs.
That's where triple indices come in. The third index fixes the winding order for the triangles in the destination strip. The three extra triangles it generates will not be rendered.
The more modern way to handle multiple strips in the same draw call is to use primitive restart indices. But the index list as it currently stands is adequate for use with GL_TRIANGLE_STRIP.
You can read this strip list and process it into a series of separate triangles (as appropriate for GL_TRIANGLES) easily enough. Simply look at each sequence of 3 vertices, and output that to your triangle buffer, so long as it is not a degenerate triangle. And you'll have to reverse the order of two of the indices for every odd-numbered triangle. The code would look something like this:
const int num_faces = indices.size() - 2;
faces.reserve(num_faces);
for(auto i = 0; i < num_faces; ++i)
{
Face f(indices[i], indices[i + 1], indices[i + 2]);
//Don't add any degenerate faces.
if(!(f[0] == f[1] || f[0] == f[2] || f[1] == f[2]))
{
if(i % 2 == 1) //Every odd-numbered face.
std::swap(f[1], f[2]);
faces.push_back(f);
}
}
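For completeness, here is a self-contained variant of the same loop. The Face alias (a std::array of three indices) and the function name strip_to_triangles are assumptions made for this sketch; the snippet above does not say how Face, faces or indices are declared.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed representation: a face is three vertex indices.
using Face = std::array<unsigned, 3>;

// Converts a GL_TRIANGLE_STRIP index list into separate triangles,
// dropping degenerate faces and restoring a consistent winding order.
std::vector<Face> strip_to_triangles(const std::vector<unsigned>& indices) {
    std::vector<Face> faces;
    if (indices.size() < 3) return faces;  // not enough indices for a face

    const std::size_t num_faces = indices.size() - 2;
    faces.reserve(num_faces);
    for (std::size_t i = 0; i < num_faces; ++i) {
        Face f{indices[i], indices[i + 1], indices[i + 2]};
        // Skip degenerate faces (any repeated index means zero area).
        if (f[0] == f[1] || f[0] == f[2] || f[1] == f[2]) continue;
        // Every odd-numbered face in a strip has flipped winding.
        if (i % 2 == 1) std::swap(f[1], f[2]);
        faces.push_back(f);
    }
    return faces;
}
```

The parity test uses the position in the original strip, not the count of emitted faces, which is why a doubled index alone would flip the facing of the strip that follows it.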

Vector to Matrix

I am new to the Eigen library and I am having problems transforming/reshaping a vector into a matrix.
I am trying to get a specific row of a matrix and convert it to a matrix, but each time I do that the result is not what I expect.
Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> m(8, 9);
m << 11, 12, 13, 14, 15, 16, 17, 18, 19,
21, 22, 23, 24, 25, 26, 27, 28, 29,
31, 32, 33, 34, 35, 36, 37, 38, 39,
41, 42, 43, 44, 45, 46, 47, 48, 49,
51, 52, 53, 54, 55, 56, 57, 58, 59,
61, 62, 63, 64, 65, 66, 67, 68, 69,
71, 72, 73, 74, 75, 76, 77, 78, 79,
81, 82, 83, 84, 85, 86, 87, 88, 89;
std::cout << m << std::endl << std::endl;
Matrix<double,1,Dynamic,RowMajor> B = m.row(0);
std::cout << B << std::endl << std::endl;
Map<Matrix3d,RowMajor> A(B.data(),3,3);
std::cout << A << std::endl << std::endl;
Result
11 14 17
12 15 18
13 16 19
I want:
11 12 13
14 15 16
17 18 19

You don't need to select a row first and then map. Just map directly from m and assign the transpose of the map to a matrix A as follows:
Matrix3d A = Map<Matrix3d>(m.data()).transpose();
If you don't like transposing then forcing the map to use RowMajor for the destination type works too
Matrix3d A = Map<Matrix<double, 3, 3, RowMajor>>(m.data());
Although, at this small size it doesn't matter. Cheers

You need to take the transpose of the resulting matrix. Eigen uses column-major storage by default, so when a vector is mapped to an n*n matrix, each run of n consecutive elements forms a column rather than a row.
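The difference between the two storage orders can be shown without Eigen at all. The sketch below uses two hypothetical helper functions (at_col_major and at_row_major, not Eigen API) that read element (r, c) of a 3x3 matrix from the same flat 9-element buffer.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helpers, independent of Eigen: interpret a flat buffer of
// nine values as a 3x3 matrix under each storage order.

// Column-major: consecutive buffer elements run down a column.
constexpr double at_col_major(const std::array<double, 9>& buf,
                              std::size_t r, std::size_t c) {
    return buf[c * 3 + r];
}

// Row-major: consecutive buffer elements run along a row.
constexpr double at_row_major(const std::array<double, 9>& buf,
                              std::size_t r, std::size_t c) {
    return buf[r * 3 + c];
}
```

With the buffer 11, 12, ..., 19 from the question, element (0, 1) is 14 under the column-major reading (the unexpected result) and 12 under the row-major reading (the desired one).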

Permuting on a schedule python

I'm trying to implement simplified DES for learning purposes in python, but I am having trouble figuring out how to do the permutations based on a "schedule." Essentially, I have a tuple with the appropriate permutations, and I need to bit shift to the correct location.
For example, using a key:
K = 00010011 00110100 01010111 01111001 10011011 10111100 11011111 11110001
Would move the 57th bit to the first bit spot, the 49th bit to the second bit spot, etc...
K+ = 1111000 0110011 0010101 0101111 0101010 1011001 1001111 0001111
Current code:
def keyGen(key):
    PC1table = (57, 49, 41, 33, 25, 17, 9,
                1, 58, 50, 42, 34, 26, 18,
                10, 2, 59, 51, 43, 35, 27,
                19, 11, 3, 60, 52, 44, 36,
                63, 55, 47, 39, 31, 23, 15,
                7, 62, 54, 46, 38, 30, 22,
                14, 6, 61, 53, 45, 37, 29,
                21, 13, 5, 28, 20, 12, 4)
    keyBinary = bin(int(key, 16))[2:].zfill(64)
    print keyBinary
    permute(PC1table, keyBinary)
def permute(permutation, permuteInput):
    elements = list(enumerate(permutation))
    for bit in permuteInput:
        ***magic bitshifting goes here***
keyGen("133457799BBCDFF1")
The logic I thought would work was to enumerate the tuple of permutations and, for each bit of my old key, look in the enumeration to find the index corresponding to the bit, and bit shift the appropriate number of times, but I just can't figure out how to go about doing this. It may be that I am approaching the problem from the wrong angle, but any guidance would be greatly appreciated!

OK, I ended up figuring out a way to make this work, although this probably isn't the most efficient way...
Prior to calling the function, turn the binary number into a list:
keyBinary = bin(int(key, 16))[2:].zfill(64)
keyBinary = [int(i) for i in keyBinary]
Kplus = permute(PC1table, keyBinary)
def permute(mapping, permuteInput):
    permuteOutput = []
    for i in range(len(mapping)):
        permuteOutput.append(permuteInput[mapping[i % 56] - 1])
    return permuteOutput
If anyone has a better way of tackling this, I'd love to see your solutions!
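The bit-shifting approach the question originally asked about can be sketched as follows. This is written in C++ rather than Python purely for illustration; permute_bits is a hypothetical helper name, and PC1 is the table from the question. Positions in the table are 1-based and counted from the most significant bit of the input.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// PC-1 table as given in the question (1-based positions, MSB first).
constexpr std::array<int, 56> PC1 = {
    57, 49, 41, 33, 25, 17, 9,
    1, 58, 50, 42, 34, 26, 18,
    10, 2, 59, 51, 43, 35, 27,
    19, 11, 3, 60, 52, 44, 36,
    63, 55, 47, 39, 31, 23, 15,
    7, 62, 54, 46, 38, 30, 22,
    14, 6, 61, 53, 45, 37, 29,
    21, 13, 5, 28, 20, 12, 4};

// Hypothetical helper: builds the output one bit at a time. For each table
// entry, shift the wanted input bit down to position 0, mask it, and append
// it to the low end of the output.
template <std::size_t N>
constexpr std::uint64_t permute_bits(std::uint64_t input,
                                     const std::array<int, N>& table,
                                     int input_width) {
    std::uint64_t output = 0;
    for (std::size_t i = 0; i < N; ++i) {
        const std::uint64_t bit = (input >> (input_width - table[i])) & 1u;
        output = (output << 1) | bit;
    }
    return output;
}
```

Applied to the 64-bit key 0x133457799BBCDFF1 with input_width 64, this produces the 56-bit value 0xF0CCAAF556678F, which is the K+ bit string shown in the question.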

Automatically Deleting specific Elements in Mathematica Tables

I have a question which can be divided into two subquestions.
I have created a table the code of which is given below.
Problem 1.
xstep = 1;
xmaximum = 6;
numberofxnodes = 6;
numberofynodes = 3;
numberofzlayers = 3;
maximumgridnodes = numberofxnodes*numberofynodes
mnodes = numberofxnodes*numberofynodes*numberofzlayers;
orginaltable =
Table[{i,
node2 = i + xstep, node3 = node2 + xmaximum,
node4 = node3 - xstep,node5 = i + maximumgridnodes,
node6 = node5 + xstep,node7 = node6 + xmaximum,
node8 = node7 - xstep},
{i, 1, mnodes}]
If I run this I get my original table. Basically, I want to remove the sixth element, and every element whose position is a multiple of six, from my original table. I am able to do this using the code below.
modifiedtable = Drop[orginaltable, {6, mnodes, 6}]
Now I get the modified table, in which every sixth element of my original table is removed. This solves my Problem 1.
Now my Problem 2:
MAJOR EDITED VERSION (all the code given above is correct):
Thanks a lot for the answers, but I wanted something else and I made a mistake
while explaining it initially, so I'm making another try.
Below is my modified table. I want the elements between
"/**" and "**/" deleted and the rest kept.
{{1, 2, 8, 7, 19, 20, 26, 25}, {2, 3, 9, 8, 20, 21, 27, 26}, {3, 4,10, 9, 21, 22, 28, 27}, {4, 5, 11, 10, 22, 23, 29, 28}, {5, 6, 12, 11, 23, 24, 30, 29}, {7, 8, 14, 13, 25, 26, 32, 31}, {8, 9, 15, 14, 26, 27, 33, 32}, {9, 10, 16, 15, 27, 28, 34, 33}, {10, 11, 17, 16, 28, 29, 35, 34}, {11, 12, 18, 17, 29, 30, 36, 35}, /**{13, 14, 20, 19, 31, 32, 38, 37}, {14, 15, 21, 20, 32, 33, 39, 38}, {15, 16, 22, 21, 33, 34, 40, 39}, {16, 17, 23, 22, 34, 35, 41, 40}, {17, 18, 24, 23, 35, 36, 42, 41},**/ {19, 20, 26, 25, 37, 38, 44, 43}, {20, 21, 27, 26, 38, 39, 45, 44}, {21, 22, 28, 27, 39, 40, 46, 45}, {22, 23, 29, 28, 40, 41, 47, 46}, {23, 24, 30, 29, 41, 42, 48, 47}, {25, 26, 32, 31,43, 44, 50, 49}, {26, 27, 33, 32, 44, 45, 51, 50}, {27, 28, 34, 33, 45, 46, 52, 51}, {28, 29, 35, 34, 46, 47, 53, 52}, {29, 30, 36, 35, 47, 48, 54, 53}, /**{31, 32, 38, 37, 49, 50, 56, 55}, {32, 33, 39, 38,50, 51, 57, 56}, {33, 34, 40, 39, 51, 52, 58, 57}, {34, 35, 41, 40, 52, 53, 59, 58}, {35, 36, 42, 41, 53, 54, 60, 59},**/ {37, 38, 44, 43,55, 56, 62, 61}, {38, 39, 45, 44, 56, 57, 63, 62}, {39, 40, 46, 45, 57, 58, 64, 63}, {40, 41, 47, 46, 58, 59, 65, 64}, {41, 42, 48, 47,59, 60, 66, 65}, {43, 44, 50, 49, 61, 62, 68, 67}, {44, 45, 51, 50, 62, 63, 69, 68}, {45, 46, 52, 51, 63, 64, 70, 69}, {46, 47, 53, 52, 64, 65, 71, 70}, {47, 48, 54, 53, 65, 66, 72, 71}, /**{49, 50, 56, 55, 67, 68, 74, 73}, {50, 51, 57, 56, 68, 69, 75, 74},{51,52, 58, 57, 69, 70, 76, 75}, {52, 53, 59, 58, 70, 71, 77, 76}, {53, 54, 60, 59, 71, 72, 78, 77}}**/
Now, if you observe, I want the first ten elements
(1st to 10th element of modifiedtable) to be there in my final table
(DoubleModifiedTable), then the next five (11th to 15th elements of modifiedtable) deleted.
Then the next ten elements (16th to 25th elements of modifiedtable)
should be present in my final table (DoubleModifiedTable),
then the next five deleted (26th to 30th elements of modifiedtable), and so on for the whole table.
Let's say we solve this problem and we name the final table DoubleModifiedTable.
I am basically interested in getting the DoubleModifiedTable. I decided to subdivide the problem as it is easier to explain.
I want this to happen automatically through the table, since this is just an example table and in reality I have a huge table. If I can understand how to solve this problem for this table, then I can solve it for my large table too.

Perhaps simpler:
DoubleModifiedTable =
Module[{copy = modifiedtable},
copy[[Flatten[# + Range[5] & /@ Range[10, Length[copy], 10]]]] = Sequence[];
copy]
EDIT
Even simpler:
DoubleModifiedTable =
Delete[modifiedtable,
Transpose[{Flatten[# + Range[5] & /@ Range[10, Length[modifiedtable], 10]]}]]
EDIT 2
Per OP's request: one only has to change a single number (10 to 15) in any of my solutions to get the answer to a modified problem:
DoubleModifiedTable =
Delete[modifiedtable,
Transpose[{Flatten[# + Range[5] & /@ Range[10, Length[modifiedtable], 15]]}]]

Another way is to do something like
DoubleModifiedTable = With[{n = 10, m = 5},
Flatten[{{modifiedtable[[;; m]]},
Partition[modifiedtable, n - m, n, {n - m + 1, 1}, {}]}, 2]]
Edit
The edited version of Problem 2 is actually slightly simpler to solve than the original version. You could for example do something like
DoubleModifiedTable =
With[{n = 10, m = 5}, Flatten[Partition[modifiedtable, n, n + m, 1, {}], 1]]
Edit 2
What my second version does is split the original list modifiedtable into sublists using Partition and then flatten these sublists to form the final list. If you look at the Documentation for Partition you can see that I'm using the 6th form of Partition, which means that the length of the sublists is n and the offset (the distance between the starts of successive sublists) is n+m. The gap between the sublists is therefore n+m-n==m.
The next argument, 1, is actually equivalent to {1,1}, which tells Mathematica that the first element of modifiedtable should appear at position 1 in the first sublist and the last element of modifiedtable should appear on or after position 1 of the last sublist.
The last argument, {}, indicates that no padding should be used for sublists with length <=n.
In summary, if you want to delete the first 10 elements and keep the next 5 you want sublists of length n=5 with gap m=10. Since you want the first sublist to start with the (m+1)-th element of modifiedtable, you could replace the fourth argument in Partition with something of the form {k,1} for some value of k but it's probably easier to just drop the first m elements of modifiedtable beforehand, i.e.
DoubleModifiedTable =
With[{n = 5, m = 10},
Flatten[Partition[Drop[modifiedtable, m], n, n + m, 1, {}], 1]]

DoubleModifiedTable=
modifiedtable[[
Complement[
Range[Length[modifiedtable]],
Flatten@Table[10 i + j, {i, Floor[Length[modifiedtable]/10]}, {j, 5}]
]
]]
or, slightly shorter
DoubleModifiedTable=
#[[
Complement[
Range[Length[#]],
Flatten@Table[10 i + j, {i, Floor[Length[#]/10]}, {j, 5}]
]
]] & @ modifiedtable
