Write a class that takes in a step size (n) in its constructor. The only method in the class takes in an integer, adds it into a sequence of numbers, and returns the average of the last n values inserted into the sequence. Do not iterate over the sequence to calculate the average.
And NO, this isn't homework
Following is my way of doing it in C++:
Initialize two STL queue<int>, one of which has length n and is called buffer
User-input values are pushed into the buffer. Once the buffer is full, add each new user-input value to "sum" and subtract the buffer.front() value.
Push the first value from buffer into the second queue<int> named values
Pop the first value (buffer.pop())
Return the average by dividing sum by n
Following is the code I came up with:
// Window.h
#ifndef calcAverage_Window_h
#define calcAverage_Window_h
#include <iostream>
#include <queue>
using namespace std;
class Window{
private:
int n, sum;
queue<int> values, buffer, sums;
public:
Window(int);
float calcAverage(int);
};
#endif
// Window.cpp
#include "Window.h"
Window::Window(int m){
n = m;
buffer.push(1);
buffer.push(2);
buffer.push(3);
sum = 6;
}
float Window::calcAverage(int val){
buffer.push(val);
values.push(buffer.front());
sum = sum + val - buffer.front();
buffer.pop();
return float(sum)/n; // cast required: sum/n alone would be integer division and truncate the result
}
// main.cpp
#include "Window.h"
int main()
{
Window w(3);
cout<<w.calcAverage(4)<<endl;
cout<<w.calcAverage(5)<<endl;
cout<<w.calcAverage(6)<<endl;
return 0;
}
I have the following questions:
Is there a better way to do this?
If we are not allowed to use STL either, I would implement a queue and use that for buffer and values. Does anyone have a better idea?
I cheated a bit by initializing the buffer in the Window(n) constructor. That is because: 1) I did not know how else I would go about it
2) It may be clear for the case when n = 2, but it is ambiguous for n = 3.
Where will this method / code fail?
I came to think of this way empirically. Is there an algorithmic way to look at this problem?
To answer a few of your questions:
Where will this method / code fail?
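Well, assuming the above code is bug-free, it will not necessarily work correctly if you decide to move to floating-point data: adding a value and later subtracting it does not cancel exactly in floating point, so rounding errors accumulate in the running sum and it slowly drifts away from the true sum of the last n values.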
Note that its overflow behaviour is also subtly different compared to a direct implementation of a moving average.
Is there an algorithmic way to look at this problem?
Yes. With window size L the moving sums for time n and time n-1 are as follows:
y[n] = x[n] + x[n-1] + ... + x[n-L+1]
y[n-1] = x[n-1] + ... + x[n-L+1] + x[n-L]
Subtract one equation from the other, you get:
y[n] - y[n-1] = x[n] - x[n-L]
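Move y[n-1] to the other side of the equals sign and you're done: y[n] = y[n-1] + x[n] - x[n-L]. That is exactly the update your code performs with sum = sum + val - buffer.front().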
So I was asked to write a function that changes an array's values in a way that:
All occurrences of the smallest value are left unchanged.
If, let's assume, the smallest number is 2 and there are no 3s and 4s, then all 5s are changed to 3s, etc.
For example, the array [2, 5, 7, 5] becomes [2, 3, 4, 3]. In general, the minimum value of the array remains unchanged, and every other distinct value is replaced by the minimum plus its rank among the values above the minimum. In our example, 5 is the smallest value after 2, so it becomes 2 + 1 = 3; 7 is the 2nd smallest after 2, so it becomes 2 + 2 = 4.
I've come up with something like this:
int fillGaps(int arr[], size_t sz){
int min = *min_element(arr, arr+sz);
int w = 1;
for (int i = 0; i<sz; i++){
if (arr[i] == min) {continue;}
else{
int mini = *min_element(arr+i, arr+sz);
for (int j = 0; j<sz; j++){
if (arr[j] == mini){arr[j] = min+w;}
}
w++;}
}
return arr[sz-1];
}
However, it only works for the 0th and 1st values; it doesn't affect any further items. Could anyone please help me with that?
I don't quite follow the logic of your function, so can't quite comment on that.
Here's how I interpret what needs to be done. Note that my example implementation is written to be as understandable as possible. There might be ways to make it faster.
Note that I'm also using an std::vector, to make things more readable and C++-like. You really shouldn't be passing raw pointers and sizes around; that's super error-prone. At the very least, bundle them in a struct.
#include <algorithm>
#include <set>
#include <unordered_map>
#include <vector>
int fillGaps (std::vector<int> & data) {
// Make sure we don't have to worry about edge cases in the code below.
if (data.empty()) { return 0; }
/* The minimum number of times we need to loop over the data is two.
* First to check which values are in there, which lets us decide
* what each original value should be replaced with. Second to do the
* actual replacing.
*
* So let's trade some memory for speed and start by creating a lookup table.
* Each entry will map an existing value to its new value. Let's use the
* "define lambda and immediately invoke it" to make the scope of variables
* used to calculate all this as small as possible.
*/
auto const valueMapping = [&data] {
// Use an std::set so we get all unique values in sorted order.
std::set<int> values;
for (int e : data) { values.insert(e); }
std::unordered_map<int, int> result;
result.reserve(values.size());
// Map minimum value to itself, and increase replacement value by one for
// each subsequent value present in the data vector.
int replacement = *values.begin();
for (auto e : values) { result.emplace(e, replacement++); }
return result;
}();
// Now the actual algorithm is trivial: loop over the data and replace each
// element with its replacement value.
for (auto & e : data) { e = valueMapping.at(e); }
return data.back();
}
My question is a follow-up to "How to make this code faster (learning best practices)?", which has been put on hold (bummer). The problem is to optimize a loop over an array of floats which are tested for whether they lie within a given interval. Indices of matching elements in the array are to be stored in a provided result array.
The test includes two conditions (smaller than the upper threshold and bigger than the lower one). The obvious code for the test is if( elem <= upper && elem >= lower ) .... I observed that branching (including the implicit branch involved in the short-circuiting operator&&) is much more expensive than the second comparison. What I came up with is below. It is about 20%-40% faster than a naive implementation, more than I expected. It uses the fact that bool is an integer type. The condition test result is used as an index into two result arrays. Only one of them will contain the desired data, the other one can be discarded. This replaces program structure with data structure and computation.
I am interested in more ideas for optimization. "Technical hacks" (of the kind provided here) are welcome. I'm also interested in whether modern C++ could provide means to be faster, e.g. by enabling the compiler to create parallel running code. Think visitor pattern/functor. Computations on the single srcArr elements are almost independent, except that the order of indices in the result array depends on the order of testing the source array elements. I would loosen the requirements a little so that the order of the matching indices reported in the result array is irrelevant. Can anybody come up with a fast way?
Here is the source code of the function. A supporting main is below. gcc needs -std=c++11 because of chrono. VS 2013 express was able to compile this too (and created 40% faster code than gcc -O3).
#include <cstdlib>
#include <iostream>
#include <chrono>
using namespace std;
using namespace std::chrono;
/// Check all elements in srcArr whether they lie in
/// the interval [lower, upper]. Store the indices of
/// such elements in the array pointed to by destArr[1]
/// and return the number of matching elements found.
/// This has been highly optimized, mainly to avoid branches.
int findElemsInInterval( const float srcArr[], // contains candidates
int **const destArr, // two arrays to be filled with indices
const int arrLen, // length of each array
const float lower, const float upper // interval
)
{
// Instead of branching, use the condition
// as an index into two distinct arrays. We need to keep
// separate indices for both those arrays.
int destIndices[2];
destIndices[0] = destIndices[1] = 0;
for( int srcInd=0; srcInd<arrLen; ++srcInd )
{
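// If the element is inside the interval, both conditions
// are true and therefore equal. In all other cases
// exactly one condition is true so that they are not equal
// (this assumes lower <= upper and no NaNs in srcArr; for a NaN
// both comparisons are false, so it would be counted as a match).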
// Matching elements' indices are therefore stored in destArr[1].
// destArr[0] is a kind of a dummy (it will incidentally contain
// indices of non-matching elements).
// This used to be (with a simple int *destArr)
// if( srcArr[srcInd] <= upper && srcArr[srcInd] >= lower) destArr[destIndex++] = srcInd;
int isInInterval = (srcArr[srcInd] <= upper) == (srcArr[srcInd] >= lower);
destArr[isInInterval][destIndices[isInInterval]++] = srcInd;
}
return destIndices[1]; // the number of elements in the results array
}
int main(int argc, char *argv[])
{
int arrLen = 1000*1000*100;
if( argc > 1 ) arrLen = atol(argv[1]);
// destArr[1] will hold the indices of elements which
// are within the interval.
int *destArr[2];
// we don't check destination boundaries, so make them
// the same length as the source.
destArr[0] = new int[arrLen];
destArr[1] = new int[arrLen];
float *srcArr = new float[arrLen];
// Create always the same numbers for comparison (don't srand).
for( int srcInd=0; srcInd<arrLen; ++srcInd ) srcArr[srcInd] = rand();
// Create an interval in the middle of the rand() spectrum
float lowerLimit = RAND_MAX/3;
float upperLimit = lowerLimit*2;
cout << "lower = " << lowerLimit << ", upper = " << upperLimit << endl;
int numInterval;
auto t1 = high_resolution_clock::now(); // measure clock time as an approximation
// Call the function a few times to get a longer run time
for( int srcInd=0; srcInd<10; ++srcInd )
numInterval = findElemsInInterval( srcArr, destArr, arrLen, lowerLimit, upperLimit );
auto t2 = high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>( t2 - t1 ).count();
cout << numInterval << " elements found in " << duration << " milliseconds. " << endl;
return 0;
}
Thinking of the integer range check optimization of turning a <= x && x < b into ((unsigned)(x-a)) < b-a, a floating point variant comes to mind:
You could try something like
const float radius = (b-a)/2;
if( fabs( x-(a+radius) ) < radius )
...
to reduce the check to one conditional.
I see about a 10% speedup from this:
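int destIndex = 0; // declared before the loop; replaces destIndices
// inside the loop:
int isInInterval = (srcArr[srcInd] <= upper) == (srcArr[srcInd] >= lower);
destArr[1][destIndex] = srcInd; // always write the index...
destIndex += isInInterval; // ...but only keep it if the element matched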
Eliminate the pair of output arrays. Instead only advance the 'number written' by 1 if you want to keep the result, otherwise just keep overwriting the 'one past the end' index.
I.e., retval[destIndex]=curIndex; destIndex+= isInArray; -- better cache locality and less wasted memory.
Write two versions: one that supports a fixed array length (of, say, 1024 or whatever) and another that supports a runtime parameter. Use a template argument to remove code duplication. Assume the length is less than that constant.
Have the function return the size and an RVO'd std::array<unsigned, 1024>.
Write a wrapper function that merges results (create all results, then merge them). Then throw the Parallel Patterns Library at the problem (so the results get computed in parallel).
If you allow yourself vectorization using the SSE (or better, AVX) instruction set, you can perform 4/8 comparisons in one go, do this twice (once per bound), 'and' the results, and retrieve the 4 results (-1 or 0 per lane). At the same time, this unrolls the loop.
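#include <xmmintrin.h> // SSE intrinsics
// Preload the bounds into all four lanes
__m128 lo = _mm_set1_ps(lower);
__m128 up = _mm_set1_ps(upper);
int srcInd = 0, dstIndex = 0;
for (; srcInd + 3 < arrLen; )
{
__m128 src = _mm_loadu_ps(&srcArr[srcInd]); // Load 4 values (unaligned load)
__m128 tst = _mm_and_ps(_mm_cmpge_ps(src, lo), _mm_cmple_ps(src, up)); // Test lower <= x <= upper
int mask = _mm_movemask_ps(tst); // bit k is set if lane k passed the test
// Copy the 4 indexes with conditional incrementation
dstArr[dstIndex] = srcInd++; dstIndex += mask & 1;
dstArr[dstIndex] = srcInd++; dstIndex += (mask >> 1) & 1;
dstArr[dstIndex] = srcInd++; dstIndex += (mask >> 2) & 1;
dstArr[dstIndex] = srcInd++; dstIndex += (mask >> 3) & 1;
}
// Handle the last 0-3 elements with the scalar test
for (; srcInd < arrLen; ++srcInd)
{
dstArr[dstIndex] = srcInd;
dstIndex += (srcArr[srcInd] >= lower) & (srcArr[srcInd] <= upper);
}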
CAUTION: unchecked code.
I have a large array of floating point numbers and I want to find out the minimum value of the array (ignoring -1s wherever present) as well as its index, using reduction in CUDA. I have written the following code to do this, which in my opinion should work:
__global__ void get_min_cost(float *d_Cost,int n,int *last_block_number,int *number_in_last_block,int *d_index){
int tid = threadIdx.x;
int myid = blockDim.x * blockIdx.x + threadIdx.x;
int s;
if(blockIdx.x == (*last_block_number)-1){
s = (*number_in_last_block)/2;
}else{
s = 1024/2;
}
for(;s>0;s/=2){
if(myid+s>=n)
continue;
if(tid<s){
if(d_Cost[myid+s] == -1){
continue;
}else if(d_Cost[myid] == -1 && d_Cost[myid+s] != -1){
d_Cost[myid] = d_Cost[myid+s];
d_index[myid] = d_index[myid+s];
}else{
// both not -1
if(d_Cost[myid]<=d_Cost[myid+s])
continue;
else{
d_Cost[myid] = d_Cost[myid+s];
d_index[myid] = d_index[myid+s];
}
}
}
else
continue;
__syncthreads();
}
if(tid==0){
d_Cost[blockIdx.x] = d_Cost[myid];
d_index[blockIdx.x] = d_index[myid];
}
return;
}
The last_block_number argument is the id of the last block, and number_in_last_block is the number of elements in last block (which is a power of 2). Thus, all blocks will launch 1024 threads every time and the last block will only use number_in_last_block threads, while others will use 1024 threads.
After this function runs, I expect the minimum values for each block to be in d_Cost[blockIdx.x] and their indices in d_index[blockIdx.x].
I call this function multiple times, each time updating the number of threads and blocks. The second time I call this function, the number of threads becomes equal to the number of blocks remaining, etc.
However, the above function isn't giving me the desired output. In fact, it gives a different output every time I run the program, i.e., it returns an incorrect value as the minimum during some intermediate iteration (though that incorrect value is quite close to the minimum every time).
What am I doing wrong here?
As I mentioned in my comment above, I would recommend avoiding writing your own reductions and using CUDA Thrust whenever possible. This holds true even when you need to customize those operations; the customization is possible by properly overloading, e.g., relational operations.
Below I'm providing a simple code to evaluate the minimum in an array along with its index. It is based on a classical example contained in the An Introduction to Thrust presentation. The only addition is skipping, as you requested, the -1's from the search. This can reasonably be done by replacing all the -1's in the array by INT_MAX, i.e., the maximum representable int value (for float data, as in your question, FLT_MAX from <cfloat> plays the same role).
#include <climits> // INT_MAX
#include <cstdio> // printf, getchar
#include <thrust/device_vector.h>
#include <thrust/replace.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
// --- Struct returning the smallest of two tuples
struct smaller_tuple
{
__host__ __device__ thrust::tuple<int,int> operator()(thrust::tuple<int,int> a, thrust::tuple<int,int> b)
{
if (a < b)
return a;
else
return b;
}
};
int main() {
const int N = 20;
const int large_value = INT_MAX;
// --- Setting the data vector
thrust::device_vector<int> d_vec(N,10);
d_vec[3] = -1; d_vec[5] = -2;
// --- Copying the data vector to a new vector where the -1's are changed to INT_MAX
thrust::device_vector<int> d_vec_temp(d_vec);
thrust::replace(d_vec_temp.begin(), d_vec_temp.end(), -1, large_value);
// --- Creating the index sequence [0, 1, 2, ... )
thrust::device_vector<int> indices(d_vec_temp.size());
thrust::sequence(indices.begin(), indices.end());
// --- Setting the initial value of the search
thrust::tuple<int,int> init(d_vec_temp[0],0);
thrust::tuple<int,int> smallest;
smallest = thrust::reduce(thrust::make_zip_iterator(thrust::make_tuple(d_vec_temp.begin(), indices.begin())),
thrust::make_zip_iterator(thrust::make_tuple(d_vec_temp.end(), indices.end())),
init, smaller_tuple());
printf("Smallest %i %i\n",thrust::get<0>(smallest),thrust::get<1>(smallest));
getchar();
}
I'm implementing this in Visual Studio 2010 (C++).
I have two binary arrays. For example,
array1[100] = {1,0,1,0,0,1,1, .... }
array2[100] = {0,0,1,1,1,0,1, .... }
To calculate the Hamming distance between array1 and array2,
array3[100] stores the xor result of array1 and array2.
Then I have to count the number of 1 bits in array3. To do this, I know I can use the __popcnt instruction.
For now, I'm doing something like below:
popcnt_result = 0;
for (i=0; i<100; i++) {
popcnt_result = popcnt_result + __popcnt(array3[i]);
}
It gives the correct result but is slow. How can I make it faster?
array3 seems a bit wasteful: you're accessing a whole extra 400 bytes of memory (assuming 4-byte int elements) that you don't need to. I would try comparing what you have with the following:
for (int i = 0; i < 100; ++i) {
result += (array1[i] ^ array2[i]); // could also try != in place of ^
}
If that helps at all, then I leave it as an exercise for the reader how to apply both this change and duskwuff's.
As implemented, the __popcnt call is not helping. It's actually slowing you down.
__popcnt counts the number of set bits in its argument. You're only passing in one element, which looks like it's guaranteed to be 0 or 1, so the result (also 0 or 1) is not useful. Doing this would be slightly faster:
popcnt_result += array3[i];
Depending on how your array is laid out, you may or may not be able to use __popcnt in a cleverer way. Specifically, if your array consists of one-byte elements (e.g., char, bool, int8_t, or similar), you could perform a population count on four elements at a time:
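for(i = 0; i < 100; i += 4) {
uint32_t chunk; // requires <cstdint> and <cstring>
memcpy(&chunk, &array3[i], sizeof chunk); // copy 4 one-byte elements without an aliasing/alignment-unsafe pointer cast
popcnt_result += __popcnt(chunk);
}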
(Note that this depends on the fact that 100 is divisible evenly by 4. You'd have to add some special-case handling for the last few elements otherwise.)
If the array consists of larger values, such as int, though, you're out of luck, and there's still no guarantee that this will be any faster than the naïve implementation above.
If your arrays only contain two values (0 or 1) the Hamming distance is just the number of positions where corresponding values are different. This can be done in one pass using std::inner_product from the standard library.
#include <iostream>
#include <functional>
#include <numeric>
int main()
{
int array1[100] = { 1,0,1,0,0,1,1, /* ... */ };
int array2[100] = { 0,0,1,1,1,0,1, /* ... */ };
int distance = std::inner_product(array1, array1 + 100, array2, 0, std::plus<int>(), std::not_equal_to<int>());
std::cout << "distance=" << distance << '\n';
return 0;
}
Hey, so basically I have this issue: I'm trying to put an equation inside a function, but the function doesn't seem to set the values at all; they stay unchanged.
This is a predator prey simulation and I have this code inside of a for loop.
wolves[i+1] = ((1 - wBr) * wolves[i] + I * S * rabbits[i] * wolves[i]);
rabbits[i+1] = (1 + rBr) * rabbits[i] - I * rabbits[i] * wolves[i];
When I execute this, it works as intended and changes the value of both of these arrays appropriately, however when I try to put it inside of a function,
int calcRabbits(int R, int rBr, int I, int W)
{
int x = (1 + rBr) * R - I * R * W;
return x;
}
int calcWolves(int wBr, int W, int I, int S, int R)
{
int x = ((1 - wBr) * W + I * S * R * R);
return x;
}
And set the values as such
rabbits[i+1] = calcRabbits ( rabbits[i], rBr, I, wolves[i]);
wolves[i+1] = calcWolves(wBr, wolves[i], I, S, rabbits[i]);
The values remain the same as they were when they were initialized and it doesn't seem to work at all, and I have no idea why. I have been at this for a good few hours and it's probably something that I'm missing, but I can't figure it out.
Any and all help is appreciated.
Edit: I realized the parameters were wrong, but I had tried it before with the correct parameters and it still didn't work; I just accidentally changed them to the wrong parameters (the compiler mouse-over was showing the old version of the parameters).
Edit2: The entire section of code is this
days = getDays(); // Runs function to get Number of days to run the simulation for
dayCycle = getCycle(); // Runs the function get Cycle to get the # of days to mod by
int wolves[days]; // Creates array wolves[] the size of the amount of days
int rabbits[days]; // Creates array rabbits [] the size of the amount of days
wolves[0] = W; // Sets the value of the starting number of wolves
rabbits[0] = R; // sets starting value of rabbits
for(int i = 0; i < days; i++) // For loop runs the simulation for the number of days
{
// rabbits[i+1] = calcRabbits ( rabbits[i], rBr, I, wolves[i]);
// // //This is the code to change the value of both of these using the function
// wolves[i+1] = calcWolves(wBr, wolves[i], I, S, rabbits[i]);
// This is the code that works and correctly sets the value for wolves[i+1]
wolves[i+1] = calcWolves(wBr, wolves[i], I, S, rabbits[i]);
rabbits[i+1] = (1 + rBr) * rabbits[i] - I * rabbits[i] * wolves[i];
}
Edit: I realized my mistake: I was passing rBr and wBr in as ints, but they were floats below 1, so they were being implicitly converted to 0. Thanks sje
Phil I cannot see anything evidently wrong in your code.
My hunch is that you are messing up the parameters.
Using gdb at this point would be overkill. I recommend you put printouts in calcRabbits and calcWolves. Print out all the parameters, the new value, and the iteration number. That will give you a good idea of what is going on and will help trace the problem.
Do you have the full code with initialization we could try to test and run?
I'm not sure this is the problem, but this is bad:
int wolves[days]; // Creates array wolves[] the size of the amount of days
int rabbits[days]; // Creates array rabbits [] the size of the amount of days
days is determined at runtime. Variable-length arrays like these are nonstandard in C++ (and for a large number of days they could overflow your stack); you should only use compile-time constants as array sizes. You can dynamically size a vector to work around this limitation (or heap-allocate the array).
Change to this:
std::vector<int> wolves(days + 1); // + 1: the loop writes index i + 1, up to days
std::vector<int> rabbits(days + 1);
Or to this:
int *wolves = new int[days + 1]; // + 1: the loop writes index i + 1, up to days
int *rabbits = new int[days + 1];
// all your code goes here
delete [] wolves; // when you're done
delete [] rabbits; // when you're done
Which will dynamically allocate the array on the heap. The rest of the code should work the same.
Don't forget to #include <vector>, if you use the vector approach.
If you're still having problems, I would cout << "Days: " << days << endl; to make sure you're getting the right number back from getDays(). If you got zero, it would seem to manifest itself in "the loop not working".
I was using an integer as an argument for a double.