I encountered weird behaviour when trying to access pixels as shown below:
void Dbscan::regionQuery(int i, int j, std::vector<Point>* res) const {
// check rect. grid around center point
const size_t row_min = std::max(0, i-eps_);
const size_t row_max = std::min(n_rows_, i+eps_+1);
const size_t col_min = std::max(0, j-eps_);
const size_t col_max = std::min(n_cols_, j+eps_+1);
assert(masked_img_.depth() == CV_8UC1);
for (int m = row_min; m<row_max; ++m) {
const uchar* mask_ptr = masked_img_.ptr(m);
for (int n = col_min; n<col_max; ++n) {
assert(*mask_ptr == masked_img_.at<uchar>(m, n));
if (masked_img_.at<uchar>(m, n) == 255) {
res->emplace_back(Point(m,n));
}
mask_ptr++;
}
}
}
The second assertion fails, and I can't work out why. Does anyone have an idea how best to debug this?
Best regards
Felix
cv::Mat::ptr returns a pointer to the beginning of the requested row, that is, the address of the element in the first column of that row. cv::Mat::at returns a reference to the element at the requested row and column. In your code the row matches, but the column doesn't (unless col_min evaluates to 0): mask_ptr starts at column 0 while n starts at col_min. You therefore need to offset the pointer returned by cv::Mat::ptr by n elements so that the column matches as well:
for (int m = row_min; m<row_max; ++m) {
const uchar* mask_ptr = masked_img_.ptr(m);
for (int n = col_min; n<col_max; ++n) {
assert(*(mask_ptr + n) == masked_img_.at<uchar>(m, n));
if (masked_img_.at<uchar>(m, n) == 255) {
res->emplace_back(Point(m,n));
}
}
}
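To see the difference between the two access styles without needing OpenCV, here is a minimal sketch on a plain row-major buffer. The Image struct and its ptr/at members are hypothetical stand-ins written for this illustration, not OpenCV's API; the only assumption carried over is that a row pointer addresses column 0 of its row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a single-channel 8-bit image (not cv::Mat).
struct Image {
    std::size_t rows;
    std::size_t cols;
    std::vector<unsigned char> data;  // row-major pixel storage

    Image(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}

    // Pointer to the element in column 0 of row m.
    const unsigned char* ptr(std::size_t m) const {
        assert(m < rows);
        return data.data() + m * cols;
    }

    // Reference to the element at row m, column n.
    unsigned char& at(std::size_t m, std::size_t n) {
        assert(m < rows && n < cols);
        return data[m * cols + n];
    }
    const unsigned char& at(std::size_t m, std::size_t n) const {
        assert(m < rows && n < cols);
        return data[m * cols + n];
    }
};

// Counts pixels equal to 255 in rows [row_min, row_max) and columns
// [col_min, col_max), reading each pixel as row pointer + column index.
std::size_t countSet(const Image& img, std::size_t row_min, std::size_t row_max,
                     std::size_t col_min, std::size_t col_max) {
    std::size_t count = 0;
    for (std::size_t m = row_min; m < row_max; ++m) {
        const unsigned char* row = img.ptr(m);
        for (std::size_t n = col_min; n < col_max; ++n) {
            assert(row[n] == img.at(m, n));  // holds because the offset is n
            if (row[n] == 255) ++count;
        }
    }
    return count;
}
```

Dereferencing the row pointer without the offset reads column 0, which is exactly what the failing assertion in the question compared against column n.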
I want to implement a 2D convolution function in C++ myself, without using filter2D(). I'm trying to iterate over all pixels of the input image and the kernel, and then assign a new value to each pixel of dst.
However, I get this error:
Thread 1: EXC_BAD_ACCESS (code=1, address=0x0)
I found out that this error means I'm dereferencing a null pointer, but I could not solve the problem. Here is my C++ code.
cv::Mat_<float> spatialConvolution(const cv::Mat_<float>& src, const cv::Mat_<float>& kernel)
{
// declare variables
Mat_<float> dst;
Mat_<float> flipped_kernel;
float tmp = 0.0;
// flip kernel
flip(kernel, flipped_kernel, -1);
// multiply and integrate
// input rows
for(int i=0;i<src.rows;i++){
// input columns
for(int j=0;j<src.cols;j++){
// kernel rows
for(int k=0;k<flipped_kernel.rows;k++){
// kernel columns
for(int l=0;l<flipped_kernel.cols;l++){
tmp += src.at<float>(i,j) * flipped_kernel.at<float>(k,l);
}
}
dst.at<float>(i,j) = tmp;
}
}
return dst.clone();
}
To simplify, let's suppose you have a 3x3 kernel:
k(0,0) k(0,1) k(0,2)
k(1,0) k(1,1) k(1,2)
k(2,0) k(2,1) k(2,2)
To calculate the convolution you scan the input image (denoted I) from left to right and from top to bottom,
and for every pixel of the input image you assign one value calculated from the formula below:
newValue(y,x) = I(y-1,x-1) * k(0,0) + I(y-1,x) * k(0,1) + I(y-1,x+1) * k(0,2)
+ I(y,x-1) * k(1,0) + I(y,x) * k(1,1) + I(y,x+1) * k(1,2)
+ I(y+1,x-1) * k(2,0) + I(y+1,x) * k(2,1) + I(y+1,x+1) * k(2,2)
------------------x------------>
|
|
| [k(0,0) k(0,1) k(0,2)]
y [k(1,0) k(1,1) k(1,2)]
| [k(2,0) k(2,1) k(2,2)]
|
(y,x) of the input image I is the anchor point of the kernel. To compute the new value at (y,x)
you need to multiply every coefficient k by the corresponding pixel of I. Your code doesn't do that: it multiplies every coefficient by the same pixel src(i,j), and it never resets tmp.
First you need to create the dst matrix with the same dimensions and the same pixel type as the original image.
Then you need to rewrite your loops to reflect the formula described above:
cv::Mat_<float> spatialConvolution(const cv::Mat_<float>& src, const cv::Mat_<float>& kernel)
{
Mat dst(src.rows,src.cols,src.type());
Mat_<float> flipped_kernel;
flip(kernel, flipped_kernel, -1);
const int dx = kernel.cols / 2;
const int dy = kernel.rows / 2;
for (int i = 0; i<src.rows; i++)
{
for (int j = 0; j<src.cols; j++)
{
float tmp = 0.0f;
for (int k = 0; k<flipped_kernel.rows; k++)
{
for (int l = 0; l<flipped_kernel.cols; l++)
{
int x = j - dx + l;
int y = i - dy + k;
if (x >= 0 && x < src.cols && y >= 0 && y < src.rows)
tmp += src.at<float>(y, x) * flipped_kernel.at<float>(k, l);
}
}
dst.at<float>(i, j) = saturate_cast<float>(tmp);
}
}
return dst.clone();
}
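The same loop structure can be checked without OpenCV. The sketch below is only a stand-in under stated assumptions: Grid and convolveZeroPadded are hypothetical names introduced here, pixels outside the image are treated as zero (as the bounds check above does), and the kernel is read back to front instead of calling flip.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major float grid standing in for cv::Mat_<float>.
struct Grid {
    int rows;
    int cols;
    std::vector<float> values;

    Grid(int r, int c)
        : rows(r), cols(c),
          values(static_cast<std::size_t>(r) * static_cast<std::size_t>(c), 0.0f) {
        assert(r > 0 && c > 0);
    }
    float& at(int y, int x) {
        assert(y >= 0 && y < rows && x >= 0 && x < cols);
        return values[static_cast<std::size_t>(y) * cols + x];
    }
    float at(int y, int x) const {
        assert(y >= 0 && y < rows && x >= 0 && x < cols);
        return values[static_cast<std::size_t>(y) * cols + x];
    }
};

// 2D convolution with the kernel anchored at its centre; samples that fall
// outside src contribute zero.
Grid convolveZeroPadded(const Grid& src, const Grid& kernel) {
    Grid dst(src.rows, src.cols);  // allocated before any element is written
    const int dy = kernel.rows / 2;
    const int dx = kernel.cols / 2;
    for (int i = 0; i < src.rows; ++i) {
        for (int j = 0; j < src.cols; ++j) {
            float acc = 0.0f;  // reset for every output pixel
            for (int k = 0; k < kernel.rows; ++k) {
                for (int l = 0; l < kernel.cols; ++l) {
                    const int y = i - dy + k;
                    const int x = j - dx + l;
                    if (y < 0 || y >= src.rows || x < 0 || x >= src.cols) continue;
                    // Reversed kernel indices are equivalent to flipping the
                    // kernel around both axes.
                    acc += src.at(y, x) *
                           kernel.at(kernel.rows - 1 - k, kernel.cols - 1 - l);
                }
            }
            dst.at(i, j) = acc;
        }
    }
    return dst;
}
```

A kernel that is 1 at its centre and 0 elsewhere should return the image unchanged, which makes a convenient first test for any implementation of this kind.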
Your memory access error is presumably happening due to the line:
dst.at<float>(i,j) = tmp;
because dst has not been allocated: Mat_<float> dst; default-constructs an empty matrix with no size and no data, so there is nothing at that index to assign to. Allocate the matrix first, using one of the Mat constructors that take a cv::Size or the row and column counts (see the docs). For example, you can create dst with:
Mat dst{src.size(), src.type()};
I have a homework exercise. I'm almost sure it's unsolvable the way they ask it. However, I'm interested if you guys have any solution for the problem mentioned below because it seems like something that often occurs.
The description is not long, so I share it with you below:
A matrix S ∈ R^(n×n) is skew-symmetric if S^T = −S. Derive from the class SquareMatrix from the lecture the class SkewSymmetricMatrix. Use a vector of length n(n − 1)/2 to store the matrix entries. Implement constructors, type casting and a suitable access to the coefficients.
The problem occurs while trying to provide the access, because the virtual access method defined in SquareMatrix returns a reference.
const double& SquareMatrix::operator()( int j, int k ) const
{
assert( j >= 0 && j < m );
assert( k >= 0 && k < n );
return coeff[ j + k * m ];
}
However, I can't return a reference to values that are not stored. The following code only demonstrates the problem: the j > k branch obviously cannot work, because it returns a reference to a temporary.
const double& SkewSymmetricMatrix::operator()( int j, int k ) const
{
assert( j >= 0 && j < size() );
assert( k >= 0 && k < size() );
if( j < k )
{
const double* coeff = getCoeff();
// packed index into the strict upper triangle
return coeff[ k * ( k - 1 ) / 2 + j ];
}
else if ( j > k )
{
const double* coeff = getCoeff();
return -coeff[ j * ( j - 1 ) / 2 + k ];
}
else
{
return const_zero;
}
}
Do you have any suggestions on how to provide a proper access operator while reducing memory use by storing fewer elements and calculating the non-stored elements from the ones that are actually stored?
One idea in the comments was to use a temporary private member and return a reference to it. That would be a really bad idea:
If you return a reference to a private member, its value changes on the next call. Suppose the caller writes:
const double& val1 = my_square_matrix(2, 1);
const double& val2 = my_square_matrix(3, 1);
double sum = val1 + val2;
The result will be val2 + val2, not val1 + val2, because both references refer to the same member, which by then holds the second value.
But there are two solutions that fulfil the requirements:
Implement a get method, and throw an exception if the parentheses operator is called.
Add a second member vector, but leave it empty at first (it has to be declared mutable, since operator() is const). Whenever a non-stored element is accessed, check the vector (resize it once to n(n-1)/2 if it is empty, but never to another size), write the value at the corresponding position, and return a reference to that position. Resize the vector only once, because resizing can reallocate its memory, which would invalidate references returned earlier.
The problem requires using a vector of n*(n-1)/2 elements, but it doesn't specify the type of element that that vector should contain.
A matrix S ∈ R^(n×n) is skew-symmetric if S^T = −S. Derive from the class SquareMatrix from the lecture the class SkewSymmetricMatrix. Use a vector of length n(n − 1)/2 to store the matrix entries. Implement constructors, type casting and a suitable access to the coefficients.
Given that we have no capacity to change the interface given by the SquareMatrix class, we will have to resort to solving this problem with a solution that is technically compliant, even if it doesn't follow the spirit of the professor's request (which is impossible).
Each element of the vector will contain two double values, one corresponding to the element m_ij, and the other corresponding to m_ji (for i < j). We can call these the left and right elements:
struct mirror {
double left;
double right;
void assignLeft(double val) {
left = val;
right = -val;
}
void assignRight(double val) {
left = -val;
right = val;
}
};
We may use a vector of n*(n-1) / 2 elements of type mirror, and we return either the left or right members based on which index is asked for:
const double& SkewSymmetricMatrix::operator()( int j, int k ) const
{
assert( j >= 0 && j < size() );
assert( k >= 0 && k < size() );
if( j < k )
{
const mirror* coeff = getCoeff();
// packed index into the strict upper triangle
return coeff[ k * ( k - 1 ) / 2 + j ].left;
}
else if ( j > k )
{
const mirror* coeff = getCoeff();
return coeff[ j * ( j - 1 ) / 2 + k ].right;
}
else
{
return const_zero;
}
}
Such is life in Moscow.
If you want const double& SquareMatrix::operator()(int j, int k) const { ... } to override a virtual method, thus keep returning a const double&, then I don't know how to do it.
If you are allowed to change the signature, then create
class ValueAccessor {
double &value;
bool is_negated;
public:
ValueAccessor& operator=(double other) {
value = is_negated ? -other : other;
return *this;
}
operator double() const {
return is_negated ? -value : value;
}
...
};
and return a ValueAccessor from operator()(...).
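To make the proxy idea concrete, here is one way it could be fleshed out if the signature may change. Everything in this sketch is an assumption made for illustration: ValueProxy and SkewSymmetricSketch are hypothetical names, the class does not derive from SquareMatrix, and the storage layout (strict upper triangle, entry (j,k) with j < k at index k(k-1)/2 + j) is just one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical proxy: refers to one stored coefficient and remembers whether
// the value seen by the caller is its negation.
class ValueProxy {
    double* value_;  // nullptr marks a diagonal entry, which is always zero
    bool negated_;
public:
    ValueProxy(double* value, bool negated) : value_(value), negated_(negated) {}

    ValueProxy& operator=(double other) {
        assert(value_ != nullptr || other == 0.0);  // the diagonal must stay zero
        if (value_ != nullptr) *value_ = negated_ ? -other : other;
        return *this;
    }
    operator double() const {
        if (value_ == nullptr) return 0.0;
        return negated_ ? -*value_ : *value_;
    }
};

// Hypothetical skew-symmetric matrix storing only n(n-1)/2 coefficients.
class SkewSymmetricSketch {
    int n_;
    std::vector<double> coeff_;  // strict upper triangle

    // Packed index of (j, k) with j < k.
    static std::size_t index(int j, int k) {
        return static_cast<std::size_t>(k * (k - 1) / 2 + j);
    }
public:
    explicit SkewSymmetricSketch(int n)
        : n_(n), coeff_(static_cast<std::size_t>(n * (n - 1) / 2), 0.0) {
        assert(n >= 0);
    }
    int size() const { return n_; }
    std::size_t stored() const { return coeff_.size(); }

    // Writable access returns a proxy instead of a reference.
    ValueProxy operator()(int j, int k) {
        assert(j >= 0 && j < n_ && k >= 0 && k < n_);
        if (j == k) return ValueProxy(nullptr, false);
        if (j < k) return ValueProxy(&coeff_[index(j, k)], false);
        return ValueProxy(&coeff_[index(k, j)], true);
    }
    // Read-only access can simply return by value.
    double operator()(int j, int k) const {
        assert(j >= 0 && j < n_ && k >= 0 && k < n_);
        if (j == k) return 0.0;
        return j < k ? coeff_[index(j, k)] : -coeff_[index(k, j)];
    }
};
```

One known pitfall of proxies like this: assigning one proxy to another, as in s(0,1) = s(1,2), copies the proxy object rather than the value, so a production version would need to handle that case explicitly.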
I've tried to compute the binomial coefficient recursively using Pascal's triangle. It works great for small numbers, but from about 20 upwards it is either really slow or doesn't work at all.
I've tried to look up some optimization techniques, such as "caching", but they don't really seem to be well integrated in C++.
Here's the code, if that helps.
int binom(const int n, const int k)
{
double sum;
if(n == 0 || k == 0){
sum = 1;
}
else{
sum = binom(n-1,k-1)+binom(n-1,k);
}
if((n== 1 && k== 0) || (n== 1 && k== 1))
{
sum = 1;
}
if(k > n)
{
sum = 0;
}
return sum;
}
int main()
{
int n;
int k;
int sum;
cout << "Enter a n: ";
cin >> n;
cout << "Enter a k: ";
cin >> k;
sum = binom(n,k);
cout << endl << endl << "Number of possible combinations: " << sum <<
endl;
}
My guess is that the program wastes a lot of time calculating results it has already calculated. It somehow must remember past results.
My guess is that the program wastes a lot of time calculating results it has already calculated.
That's definitely true.
On this topic, I'd suggest you have a look at dynamic programming.
There is a class of problems whose naive recursive solution has exponential runtime complexity, but which can be solved efficiently with dynamic programming techniques.
That reduces the runtime complexity to polynomial (most of the time at the expense of increased space complexity).
The common approaches for dynamic programming are:
Top-down (recursion with memoization).
Bottom-up (iterative).
Here is my bottom-up solution (fast and compact):
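int BinomialCoefficient(const int n, const int k) {
  // C(n, 0) = 1; without this guard the vector below would be empty.
  if (k == 0) return 1;
  if (k < 0 || k > n) return 0;
  // aSolutions[i] holds C(n - k + 1 + i, i + 1).
  std::vector<int> aSolutions(k);
  aSolutions[0] = n - k + 1;
  for (int i = 1; i < k; ++i) {
    // The division is exact, but the product can overflow int for large n.
    aSolutions[i] = aSolutions[i - 1] * (n - k + 1 + i) / (i + 1);
  }
  return aSolutions[k - 1];
}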
This algorithm has runtime complexity O(k) and space complexity O(k), i.e. it is linear in k.
Moreover, this solution is simpler and faster than the recursive approach, and it is very CPU cache-friendly.
Note also that the running time does not depend on n.
I obtained this result from simple arithmetic on the definition, which gives the following formula:
C(n, k) = C(n - 1, k - 1) * n / k
Some math references on the binomial coefficient.
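For completeness, the recurrence follows directly from the factorial definition. The derivation below uses the usual notation \binom{n}{k} for what the answer writes as (n, k); the last line is the product that the loop accumulates step by step.

```latex
\begin{align*}
\binom{n}{k} &= \frac{n!}{k!\,(n-k)!}
              = \frac{n}{k}\cdot\frac{(n-1)!}{(k-1)!\,\bigl((n-1)-(k-1)\bigr)!}
              = \frac{n}{k}\binom{n-1}{k-1}, \qquad 1 \le k \le n, \\
\binom{n}{k} &= \prod_{i=1}^{k}\frac{n-k+i}{i}
              \quad\text{(unrolling $k$ times down to } \tbinom{n-k}{0} = 1\text{)} .
\end{align*}
```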
Note
The algorithm does not really need O(k) space.
Indeed, the solution at the i-th step depends only on the one at the (i-1)-th step.
Therefore, there is no need to store all intermediate solutions, just the one from the previous step. That would make the algorithm O(1) in terms of space complexity.
However, I prefer keeping all intermediate solutions in the code above to better show the principle behind the dynamic programming methodology.
Here is my repository with the optimized algorithm.
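The O(1)-space variant described in the note could look roughly like the sketch below. It is not the code from the linked repository; binomialConstantSpace is a hypothetical name, and the use of unsigned long long plus the symmetry shortcut C(n, k) = C(n, n - k) are choices made here for illustration.

```cpp
#include <cassert>
#include <climits>

// Returns C(n, k), keeping only the previous step's value.
// Returns 0 when k > n.
unsigned long long binomialConstantSpace(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;  // C(n, k) == C(n, n - k): use the shorter loop
    unsigned long long result = 1;  // C(n - k, 0)
    for (unsigned i = 1; i <= k; ++i) {
        const unsigned long long factor = n - k + i;
        // The intermediate product must still fit into 64 bits.
        assert(result <= ULLONG_MAX / factor);
        // result goes from C(n - k + i - 1, i - 1) to C(n - k + i, i);
        // the division is exact at every step.
        result = result * factor / i;
    }
    return result;
}
```

For example, binomialConstantSpace(20, 10) evaluates to 184756, the range where the question reported the recursive version becoming slow.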
I would cache the results of each calculation in a map. std::map accepts any key type that can be ordered with operator<, so a std::pair<int, int> would work directly; alternatively, you could turn the key into a string.
string key = to_string(n) + "," + to_string(k);
Then have a global map:
map<string, double> cachedValues;
You can then do a lookup with the key and, if it is found, return immediately. Otherwise, store the result in the map before you return.
I began mapping out what would happen with a call to 4,5. It gets messy, with a LOT of calculations: every call spawns two more, so the number of calls roughly doubles with each level of recursion.
I don't know if your basic algorithm is correct, but if so, then I'd move this code to the top of the method:
if(k > n)
{
return 0;
}
As it appears that if k > n, you always return 0, even for something like 6,100. I don't know if that's correct or not, however.
You're computing some binomial values multiple times. A quick solution is memoization.
Untested:
int binom(int n, int k);
int binom_mem(int n, int k)
{
static std::map<std::pair<int, int>, std::optional<int>> lookup_table;
auto const input = std::pair{n,k};
if (lookup_table[input].has_value() == false) {
lookup_table[input] = binom(n, k);
}
return lookup_table[input].value();
}
int binom(int n, int k)
{
int sum;
if (n == 0 || k == 0){
sum = 1;
} else {
sum = binom_mem(n-1,k-1) + binom_mem(n-1,k);
}
if ((n== 1 && k== 0) || (n== 1 && k== 1))
{
sum = 1;
}
if(k > n)
{
sum = 0;
}
return sum;
}
A better solution would be to make the recursion tail-recursive (not easy with double recursion) or, better yet, not to use recursion at all ;)
I found this very simple (perhaps a bit slow) way of computing the binomial coefficient, which also works for non-integer k, based on this proof (written by me):
double binomial_coefficient(float k, int a) {
double b=1;
for(int p=1; p<=a; p++) {
b=b*(k+1-p)/p;
}
return b;
}
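Written out, the loop above evaluates the product form of the generalized binomial coefficient, using the function's own parameter names k (possibly non-integer) and a (a non-negative integer). The value for k = 1/2, a = 2 is included as a quick sanity check.

```latex
\[
\binom{k}{a} = \prod_{p=1}^{a} \frac{k+1-p}{p},
\qquad
\binom{1/2}{2} = \frac{\tfrac{1}{2}}{1}\cdot\frac{\tfrac{1}{2}-1}{2} = -\frac{1}{8} .
\]
```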
If you can tolerate spending some memory on a precomputed table, you can compute Pascal's triangle at compile time. With a simple lookup mechanism, this will give you maximum speed.
The downside is that the table is limited: n = 67 is the last row whose entries all fit into an unsigned long long, because C(68, 34) already overflows 64 bits.
So, we simply use a constexpr function and calculate the values of Pascal's triangle in a 2 dimensional compile-time constexpr std::array.
The nCr function simply uses an index into that array (into Pascal's triangle).
Please see the following example code:
#include <iostream>
#include <utility>
#include <array>
#include <iomanip>
#include <cmath>
#include <string>
// Number of rows stored: n = 0..67. C(68, 34) no longer fits into a 64 bit variable
constexpr size_t MaxN = 68u;
// If we store Pascal's triangle in a 2 dimensional array, this is the size of each dimension
constexpr size_t ArraySize = MaxN;
// This function will generate Pascals triangle stored in a 2 dimension std::array
constexpr auto calculatePascalTriangle() {
// Result of function. Here we will store Pascal's triangle as a 2 dimensional array
std::array<std::array<unsigned long long, ArraySize>, ArraySize> pascalTriangle{};
// Go through all rows and columns of Pascals triangle
for (size_t row{}; row < MaxN; ++row) for (size_t col{}; col <= row; ++col) {
// Border values are always one
unsigned long long result{ 1 };
if (col != 0 && col != row) {
// And calculate the new value for the current row
result = pascalTriangle[row - 1][col - 1] + pascalTriangle[row - 1][col];
}
// Store new value
pascalTriangle[row][col] = result;
}
// And return array as function result
return pascalTriangle;
}
// This is a constexpr std::array<std::array<unsigned long long, ArraySize>, ArraySize> with the name PPP, containing all nCr results
constexpr auto PPP = calculatePascalTriangle();
// To calculate nCr, we simply look up the value in the array
constexpr unsigned long long nCr(size_t n, size_t r) {
return PPP[n][r];
}
// Some debug test driver code. Print Pascal triangle
int main() {
constexpr size_t RowsToPrint = 16u;
const size_t digits = static_cast<size_t>(std::ceil(std::log10(nCr(RowsToPrint, RowsToPrint / 2))));
for (size_t row{}; row < RowsToPrint; ++row) {
std::cout << std::string((RowsToPrint - row) * ((digits + 1) / 2), ' ');
for (size_t col{}; col <= row; ++col)
std::cout << std::setw(digits) << nCr(row, col) << ' ';
std::cout << '\n';
}
return 0;
}
We can also store Pascal's triangle in a 1 dimensional constexpr std::array. Then we additionally need the triangle numbers to find the start index of each row, but this can also be done completely at compile time.
Then the solution would look like this:
#include <iostream>
#include <utility>
#include <array>
#include <iomanip>
#include <cmath>
#include <string>
// Number of rows stored: n = 0..67. The largest entry is C(67, 33) = 14226520737620288370;
// C(68, 34) no longer fits into a 64 bit variable
constexpr size_t MaxN = 68u;
// If we store Pascal's triangle in a 1 dimensional array, this is its size
constexpr size_t ArraySize = (MaxN + 1) * MaxN / 2;
// To get the offset of a row of Pascal's triangle stored in a 1 dimensional array
constexpr size_t getTriangleNumber(size_t row) {
size_t sum{};
for (size_t i = 1; i <= row; i++) sum += i;
return sum;
}
// Generate a std::array with n elements of a given type and a generator function
template <typename DataType, DataType(*generator)(size_t), size_t... ManyIndices>
constexpr auto generateArray(std::integer_sequence<size_t, ManyIndices...>) {
return std::array<DataType, sizeof...(ManyIndices)>{ { generator(ManyIndices)... } };
}
// This is a std::array<size_t, MaxN> with the name TriangleNumber, containing the triangle numbers up to MaxN
constexpr auto TriangleNumber = generateArray<size_t, getTriangleNumber>(std::make_integer_sequence<size_t, MaxN>());
// This function will generate Pascals triangle stored in an 1 dimension std::array
constexpr auto calculatePascalTriangle() {
// Result of function. Here we will store Pascals triangle as an 1 dimensional array
std::array <unsigned long long, ArraySize> pascalTriangle{};
size_t index{}; // Running index for storing values in the array
// Go through all rows and columns of Pascals triangle
for (size_t row{}; row < MaxN; ++row) for (size_t col{}; col <= row; ++col) {
// Border values are always one
unsigned long long result{ 1 };
if (col != 0 && col != row) {
// So, we are not at the border. Get the start index the upper 2 values
const size_t offsetOfRowAbove = TriangleNumber[row - 1] + col;
// And calculate the new value for the current row
result = pascalTriangle[offsetOfRowAbove] + pascalTriangle[offsetOfRowAbove - 1];
}
// Store new value
pascalTriangle[index++] = result;
}
// And return array as function result
return pascalTriangle;
}
// This is a constexpr std::array<unsigned long long, ArraySize> with the name PPP, containing all nCr results
constexpr auto PPP = calculatePascalTriangle();
// To calculate nCr, we simply look up the value in the array
constexpr unsigned long long nCr(size_t n, size_t r) {
return PPP[TriangleNumber[n] + r];
}
// Some debug test driver code. Print Pascal triangle
int main() {
constexpr size_t RowsToPrint = 16; // MaxN - 1;
const size_t digits = static_cast<size_t>(std::ceil(std::log10(nCr(RowsToPrint, RowsToPrint / 2))));
for (size_t row{}; row < RowsToPrint; ++row) {
std::cout << std::string((RowsToPrint - row+1) * ((digits+1) / 2), ' ');
for (size_t col{}; col <= row; ++col)
std::cout << std::setw(digits) << nCr(row, col) << ' ';
std::cout << '\n';
}
return 0;
}
I'm trying to understand how to solve the problem of finding all unique paths in a grid using dynamic programming:
A robot is located at the top-left corner of a m x n grid (marked ‘Start’ in the diagram below). The robot can only move either down or right at any point in time. The robot is trying to reach the bottom-right corner of the grid (marked ‘Finish’ in the diagram below). How many possible unique paths are there?
I was looking at this article and I was wondering why, in the solution below, the matrix is declared with dimensions M_MAX + 2 and N_MAX + 2, and also why the last parameter in the signature of backtrack is declared as int mat[][N_MAX+2].
const int M_MAX = 100;
const int N_MAX = 100;
int backtrack(int r, int c, int m, int n, int mat[][N_MAX+2]) {
if (r == m && c == n)
return 1;
if (r > m || c > n)
return 0;
if (mat[r+1][c] == -1)
mat[r+1][c] = backtrack(r+1, c, m, n, mat);
if (mat[r][c+1] == -1)
mat[r][c+1] = backtrack(r, c+1, m, n, mat);
return mat[r+1][c] + mat[r][c+1];
}
int bt(int m, int n) {
int mat[M_MAX+2][N_MAX+2];
for (int i = 0; i < M_MAX+2; i++) {
for (int j = 0; j < N_MAX+2; j++) {
mat[i][j] = -1;
}
}
return backtrack(1, 1, m, n, mat);
}
Then in the author's bottom-up approach solution:
const int M_MAX = 100;
const int N_MAX = 100;
int dp(int m, int n) {
int mat[M_MAX+2][N_MAX+2] = {0};
mat[m][n+1] = 1;
for (int r = m; r >= 1; r--)
for (int c = n; c >= 1; c--)
mat[r][c] = mat[r+1][c] + mat[r][c+1];
return mat[1][1];
}
I don't understand what purpose the line mat[m][n+1] = 1; serves.
I'm not familiar with Java, so I apologize if these boil down to syntactical or language-specific questions.
Firstly, notice that both the top-down and the bottom-up solution use 1-based indexing, so a size of at least mat[M_MAX+1][N_MAX+1] is needed just to be able to address cell (m, n).
Now, notice the logic the author is using.
mat[r][c] = mat[r+1][c] + mat[r][c+1];
Hence, to prevent r+1 or c+1 from going out of bounds when r = m or c = n, instead of adding if-statements like this:
if (r == m)
mat[r][c] = mat[r][c+1];
if (c == n)
mat[r][c] = mat[r+1][c];
he has decided to simply add an extra row and an extra column with the value 0 stored in them. Hence:
mat[M_MAX+2][N_MAX+2] = {0};
Finally, in a bottom-up approach, one must initialize mat[m][n] to 1. Instead of doing that directly, knowing that mat[m][n] = mat[m+1][n] + mat[m][n+1], he initialized:
mat[m][n+1] = 1; // mat[m+1][n] = 0;
Feel free to ask any questions in comments.
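The padding trick can be sketched in a self-contained form as follows. This is an illustration rather than the article's code: uniquePathsPadded is a hypothetical name, and a std::vector sized from m and n replaces the fixed M_MAX/N_MAX array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of monotone (right/down) paths in an m x n grid, using 1-based
// cells and one extra row and column of zeros as a border.
long long uniquePathsPadded(int m, int n) {
    assert(m >= 1 && n >= 1);
    std::vector<std::vector<long long>> mat(
        static_cast<std::size_t>(m) + 2,
        std::vector<long long>(static_cast<std::size_t>(n) + 2, 0LL));
    // Seed inside the border: mat[m][n] then becomes mat[m+1][n] + mat[m][n+1]
    // = 0 + 1 without any special case in the loop.
    mat[m][n + 1] = 1;
    for (int r = m; r >= 1; --r) {
        for (int c = n; c >= 1; --c) {
            // r + 1 and c + 1 never leave the table thanks to the border.
            mat[r][c] = mat[r + 1][c] + mat[r][c + 1];
        }
    }
    return mat[1][1];
}
```

Indices 0 through m+1 are addressable in each dimension, which is where the "+ 2" in the original declaration comes from.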
EDIT You can checkout my implementation on Github: https://github.com/Sheljohn/WalshHadamard
I am looking for an implementation, or indications on how to implement, the sequency-ordered Fast Walsh Hadamard transform (see this and this).
I slightly adapted a very nice implementation found online:
// (a,b) -> (a+b,a-b) without overflow
void rotate( long& a, long& b )
{
static long t;
t = a;
a = a + b;
b = t - b;
}
// Integer log2
long ilog2( long x )
{
long l2 = 0;
for (; x; x >>=1) ++l2;
return l2;
}
/**
* Fast Walsh-Hadamard transform
*/
void fwht( std::vector<long>& data )
{
const long l2 = ilog2(data.size()) - 1;
for (long i = 0; i < l2; ++i)
{
for (long j = 0; j < (1 << l2); j += 1 << (i+1))
for (long k = 0; k < (1 << i ); ++k)
rotate( data[j + k], data[j + k + (1<<i)] );
}
}
but it does not compute the WHT in sequency order (the natural Hadamard matrix is used implicitly). Note that in the code above (and if you try it), the size of data needs to be a power of 2.
My question is: is there a simple adaptation of this implementation that gives the sequency-ordered FWHT?
A possible solution would be to write a small function to compute dynamically the elements of Hn (the Hadamard matrix of order n), count the number of zero crossings, and create a ranking of the rows, but I am wondering whether there is a smarter way. Thanks in advance for any input! Cheers
As indicated here (linked from within your reference):
The sequency ordering of the rows of the Walsh matrix can be derived from the ordering of the Hadamard matrix by first applying the bit-reversal permutation and then the Gray code permutation.
There are various implementations of bit-reversal algorithm such as this:
// Bit-reversal
// adapted from http://www.idi.ntnu.no/~elster/pubs/elster-bit-rev-1989.pdf
void bitrev(int t, std::vector<long>& c)
{
long n = 1<<t;
long L = 1;
c[0] = 0;
for (int q=0; q<t; ++q)
{
n /= 2;
for (long j=0; j<L; ++j)
{
c[L+j] = c[j] + n;
}
L *= 2;
}
}
The gray code can be obtained from here:
/*
The purpose of this function is to convert an unsigned
binary number to reflected binary Gray code.
The operator >> is shift right. The operator ^ is exclusive or.
*/
unsigned int binaryToGray(unsigned int num)
{
return (num >> 1) ^ num;
}
These can be combined to yield the final permutation:
// Compute a permutation of size 2^order
// to reorder the Fast Walsh-Hadamard transform's output
// into Walsh (sequency) order
void sequency_permutation(long order, std::vector<long>& p)
{
long n = 1<<order;
std::vector<long> tmp(n);
bitrev(order, tmp);
p.resize(n);
for (long i=0; i<n; ++i)
{
p[i] = tmp[binaryToGray(i)];
}
}
All that's left to do is to apply the permutation to the normal Walsh-Hadamard Transform output.
void permuted_fwht(std::vector<long>& data, const std::vector<long>& permutation)
{
std::vector<long> tmp = data;
fwht(tmp);
for (long i=0; i<data.size(); ++i)
{
data[i] = tmp[permutation[i]];
}
}
Note that the permutation is fixed for a given data size, so it only needs to be computed once (assuming you are processing multiple blocks of data). So, putting it all together you would get something such as:
std::vector<long> p;
const long order = ilog2(data_block_size) - 1;
sequency_permutation(order, p);
permuted_fwht( data_block_1, p);
permuted_fwht( data_block_2, p);
//...
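A quick way to convince yourself that the permutation is right is to count sign changes: in sequency order, row i of the matrix must have exactly i sign changes. The sketch below checks this independently of the code above; reverseBits, hadamardEntry, signChanges and sequencyPermutation are hypothetical helper names, and the natural-order Hadamard entry is taken as (-1) raised to the number of set bits in row & col.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reverses the lowest `bits` bits of x.
unsigned reverseBits(unsigned x, unsigned bits) {
    unsigned r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

// Entry (row, col) of the natural-order Hadamard matrix: +1 or -1 depending
// on the parity of the number of set bits in row & col.
int hadamardEntry(unsigned row, unsigned col) {
    unsigned v = row & col;
    int sign = 1;
    while (v != 0) {
        sign = -sign;
        v &= v - 1;  // clear the lowest set bit
    }
    return sign;
}

// Number of sign changes along one row of the n x n natural-order matrix.
unsigned signChanges(unsigned row, unsigned n) {
    assert(n >= 1);
    unsigned changes = 0;
    for (unsigned col = 1; col < n; ++col) {
        if (hadamardEntry(row, col) != hadamardEntry(row, col - 1)) ++changes;
    }
    return changes;
}

// p[i] is the natural-order row index that has sequency i:
// Gray code first, then bit reversal.
std::vector<unsigned> sequencyPermutation(unsigned order) {
    const unsigned n = 1u << order;
    std::vector<unsigned> p(n);
    for (unsigned i = 0; i < n; ++i) {
        p[i] = reverseBits(i ^ (i >> 1), order);
    }
    return p;
}
```

For order 2 the permutation comes out as 0, 2, 3, 1, which matches the sign-change counts of the four rows of the 4 x 4 natural-order matrix.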