Knapsack algorithm for two bags - c++
I found a thread that provides pseudo-code for the knapsack algorithm with two knapsacks.
I tried to implement it in C++, but it doesn't work as it's supposed to. Here's the code:
#include <cstdio>
#define MAX_W1 501
#define MAX_W2 501
int maximum(int a, int b, int c) {
int max = a>b?a:b;
return c>max?c:max;
}
int knapsack[MAX_W1][MAX_W2] = {0};
int main() {
int n, s1, s2, gain, weight; // items, sack1, sack2, gain, cost
scanf("%d %d %d", &n, &s1, &s2);
// filling knapsack
for (int i = 0; i < n; i++) {
scanf("%d %d", &gain, &weight);
for (int w1 = s1; w1 >= weight; w1--) {
for (int w2 = s2; w2 >= weight; w2--) {
knapsack[w1][w2] = maximum(
knapsack[w1][w2], // we have best option
knapsack[w1 - weight][w2] + gain, // put into sack one
knapsack[w1][w2 - weight] + gain // put into sack two
);
}
}
}
int result = 0;
// searching for result
for (int i = 0; i <= s1; i++) {
for (int j = 0; j <= s2; j++) {
if (knapsack[i][j] > result) {
result = knapsack[i][j];
}
}
}
printf("%d\n", result);
return 0;
}
For instance, for the following input:
5 4 3
6 2
3 2
4 1
2 1
1 1
I get the output:
13
Obviously it's wrong, because I can take all the items (items 1 and 2 into the first bag and the rest into the second bag) and the sum is 16.
I would be grateful for any explanation of where I got the pseudo-code wrong.
I've added a small update, since some people had trouble understanding the input format:
The first line contains 3 numbers, as follows: number of items, capacity of sack one, capacity of sack two.
Then there are n lines, each containing 2 numbers: the gain and the cost of the i-th item.
Assume that the sack capacities cannot be larger than 500.
The algorithm you're using appears incorrect, because it will only consider cases where the object happens to fit in both sacks. I made the following changes to your code and it operates correctly now:
#include <algorithm>
using std::max;
int max3(int a, int b, int c) {
return max(a, max(b, c));
}
and
for (int w1 = s1; w1 >= 0; w1--) {
for (int w2 = s2; w2 >= 0; w2--) {
if (w1 >= weight && w2 >= weight) // both sacks have room
{
knapsack[w1][w2] = max3(
knapsack[w1][w2], // we have best option
knapsack[w1 - weight][w2] + gain, // put into sack one
knapsack[w1][w2 - weight] + gain // put into sack two
);
}
else if (w1 >= weight) // only sack one has room
{
knapsack[w1][w2] = max(
knapsack[w1][w2], // we have best option
knapsack[w1 - weight][w2] + gain // put into sack one
);
}
else if (w2 >= weight) // only sack two has room
{
knapsack[w1][w2] = max(
knapsack[w1][w2], // we have best option
knapsack[w1][w2 - weight] + gain // put into sack two
);
}
}
}
Here is a modification of your code that makes it work:
#include <cstdio>
#define MAX_W1 501
#define MAX_W2 501
int maximum(int a, int b, int c) {
int max = a>b?a:b;
return c>max?c:max;
}
int knapsack[MAX_W1][MAX_W2] = {0};
int main() {
int n, s1, s2, gain, weight; // items, sack1, sack2, gain, cost
scanf("%d %d %d", &n, &s1, &s2);
// filling knapsack
for (int i = 0; i < n; i++) {
scanf("%d %d", &gain, &weight);
// need to fill the whole table; cannot stop when one sack is full, because the item might fit in the other
for (int w1 = s1; w1 >= 0; w1--) {
for (int w2 = s2; w2 >= 0; w2--) {
int val1=0,val2=0;
if(weight<=w1)
val1 = knapsack[w1 - weight][w2] + gain;
if(weight<=w2)
val2 = knapsack[w1][w2 - weight] + gain;
knapsack[w1][w2] = maximum(
knapsack[w1][w2], // we have best option
val1, // put into sack one
val2 // put into sack two
);
}
}
}
// No need to search for the max value; it will always be knapsack[s1][s2]
printf("%d\n", knapsack[s1][s2]);
return 0;
}
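For reference, the corrected recurrence from the answers above can be packaged as a standalone function. This is only a sketch: the names Item and two_knapsack are made up for illustration and do not appear in the thread, and it assumes non-negative gains and weights, as in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper type: one item with its gain and weight (cost).
struct Item {
    int gain;
    int weight;
};

// Best total gain when each item goes into sack one, sack two, or nowhere.
// best[w1][w2] = best gain using at most w1 of sack one and w2 of sack two.
int two_knapsack(const std::vector<Item>& items, int s1, int s2) {
    assert(s1 >= 0 && s2 >= 0);
    std::vector<std::vector<int>> best(s1 + 1, std::vector<int>(s2 + 1, 0));
    for (const Item& it : items) {
        // Walk capacities downwards so each item is used at most once.
        for (int w1 = s1; w1 >= 0; --w1) {
            for (int w2 = s2; w2 >= 0; --w2) {
                int v = best[w1][w2];                    // leave the item out
                if (w1 >= it.weight)                     // fits in sack one
                    v = std::max(v, best[w1 - it.weight][w2] + it.gain);
                if (w2 >= it.weight)                     // fits in sack two
                    v = std::max(v, best[w1][w2 - it.weight] + it.gain);
                best[w1][w2] = v;
            }
        }
    }
    return best[s1][s2];
}
```

Called with the five items from the question and capacities 4 and 3, this returns 16, the value the asker expected.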
Finding realtime reliable and precise peak detection algorithm for noisy signals
I am working on a project where I need to measure, precisely and in real time, the peaks in a signal from a Hall sensor read by an RPi Pico. I am coding in the Arduino IDE with the Arduino-Pico library. The problem is that the signal is quite noisy and not every peak is clean; many are badly distorted, so I need a reliable and precise algorithm. I would be very grateful if someone who has worked on a similar problem could give me some advice.
The signal looks like this:
[Figure: raw signal from the Hall sensor]
[Figure: signal averaged over the 4 previous values (the data is not the same as in the previous figure)]
I have tried two methods. The first was to set a highThreshold and, when the value is over it, have the program look for the highest number in the current region. This worked, although not in parts where the data is somewhat corrupted and the graph does not have a proper peak (curVal is the input).
HighThresCoeff = 0.85
//code for highThreshold generation
vals[i]=curVal;
i++;
if(i==arrSize){
    low=getLow(vals);
    high=getHigh(vals);
    highThreshold=((high-low)*HighThresCoeff+low);
    i=0;
}
//peak detection
if (curVal > highThreshold) {
    activated = true;
    if(curVal > lastHigh){
        lastHigh = curVal;
        lastHighTime = micros();
    }
} else if (activated == true) {
    lastHigh = 0;
    activated = false;
    t2 = t1;
    t1 = lastHighTime;
    // code for processing the time of the peak
}
The other method was also based on highThreshold, but I recorded the times at which the graph value went over and under the threshold and then took their average. This was better, although, because of the noise, the data still wasn't as clean as I wished.
HighThresCoeff = 0.85
//code for highThreshold generation
vals[i]=curVal;
i++;
if(i==arrSize){
    low=getLow(vals);
    high=getHigh(vals);
    highThreshold=((high-low)*HighThresCoeff+low);
    i=0;
}
//peak detection
if (curVal > highThreshold) {
    tss = micros();
    activated = true;
} else if (activated == true) {
    activated = false;
    tse = micros();
    t2 = t1;
    t1 = tss + ((tse - tss) / 2);
    //code for processing the time further
}
Additional info:
Latency: If the latency is under 1/3 of the peak-to-peak time, and is predictable or constant, it's okay.
Example data: https://github.com/Atores1/exampleData
Having noticed that the OP provided a link to raw integer data, I ran it through a moving average filter. The advantage of a moving average filter is that one doesn't need to add up all the samples in the buffer on every step; it is enough to subtract the sample dropping off from a running sum of the buffer contents and add in the new one. That means far less computation and far fewer memory accesses.
Here's the filtered result overlaid with the original data:
[Figure: filtered signal overlaid on the original data]
And here's the code that reads in the original data and outputs synced original data and filtered data.
#include <iostream>
#include <fstream>
#include <vector>
#include <array>
#include <numeric>
#include <algorithm>
#include <cstdio>
#include <type_traits>
#include <utility>

using std::array, std::vector, std::size_t;

using sample_type = int;             // data sample type, either int or double
constexpr int Global_Filter_N = 41;  // filter length, must be odd

// moving average filter
template <typename T=sample_type, int N=Global_Filter_N>
class Filter_MA
{
public:
    T clk(T in)
    {
        sum += in - buf[index];
        buf[index] = in;
        index = (index + 1) % N;
        if constexpr (std::is_floating_point_v<T>)
            return sum / N;
        else
            return (sum + (N / 2)) / N;
    }
    bool update_vectors(const vector<T>& vin, vector<T>* pvout, vector<T>* prawout = nullptr)
    {
        if (vin.size() <= N || pvout == nullptr)
            return false;
        pvout->reserve(vin.size() - N);
        if (prawout != nullptr)
            prawout->reserve(vin.size() - N);
        for (size_t i = 0; i < N; i++)
            clk(vin[i]);
        for (size_t i = N; i < vin.size(); i++)
        {
            pvout->push_back(clk(vin[i]));
            if (prawout != nullptr)
                prawout->push_back(vin[i - N / 2]);
        }
        return true;
    }
private:
    array<T, N> buf{};  // moving average buffer
    T sum{};            // running sum of buffer
    size_t index{};     // current loc remove output, add input
};

template <typename T=sample_type>
std::pair<T, T> peak_detect(T y1, T y2, T y3)
{
    // scale pk location by 100 to work with int arith
    T pk = 100* (y1 - y3) / (2 * (y1 - 2 * y2 + y3));
    T mag = 2 * y2 - y1 - y3;
    return std::pair{ pk, mag };
}

struct WaveInfo {
    sample_type w_mean{};
    sample_type w_max{};
    sample_type w_min{};
    vector<sample_type> peaks;
    vector<sample_type> mags;
};

inline WaveInfo get_wave_info(std::vector<sample_type> v)
{
    constexpr int N = Global_Filter_N;
    static_assert(Global_Filter_N & 1, "filter must be odd number");
    WaveInfo w;
    w.w_max = *std::max_element(v.begin(), v.end());
    w.w_min = *std::min_element(v.begin(), v.end());
    // "0ll + sample_type{}" Produces either a double or long long int depending on sample_type to stop overflow if > 2M samples
    w.w_mean = static_cast<sample_type>(std::accumulate(v.begin(), v.end(), 0ll + sample_type{}) / std::size(v));

    sample_type pos_thresh = w.w_mean + (w.w_max - w.w_mean) / 10;  // 10% above ave.
    sample_type neg_thresh = w.w_mean + (w.w_min - w.w_mean) / 10;  // 10% below ave

    int search_polarity = 0;  // if 0 prior peak polarity not determined
    for (int i = 0; i < int(v.size()) - N; i++)
    {
        const int center = N/2;
        if (v[i] > pos_thresh && v[i] > v[i + N - 1] && v[i] < v[i + center] && search_polarity >= 0)
        {
            search_polarity = -1;
            auto results = peak_detect(v[i], v[i + center], v[i + N - 1]);
            w.peaks.push_back(results.first * center / 100 + i + center);
            w.mags.push_back(results.second);
        }
        if (v[i] < neg_thresh && v[i] < v[i + N - 1] && v[i] > v[i + center] && search_polarity <= 0)
        {
            search_polarity = 1;
            auto results = peak_detect(v[i], v[i + N / 2], v[i + N - 1]);
            w.peaks.push_back(results.first * center / 100 + i + center);
            w.mags.push_back(-results.second);
        }
    }
    return w;
}

// Used to get text file int samples
vector<sample_type> get_raw_data()
{
    std::ifstream in("raw_data.txt");
    vector<sample_type> v;
    int x;
    while(in >> x)
        v.push_back(x);
    return v;
}

int main()
{
    Filter_MA filter;
    vector<sample_type> vn = get_raw_data();
    vector<sample_type> vfiltered;
    vector<sample_type> vraw;
    if (!filter.update_vectors(vn, &vfiltered, &vraw))
        return 1;  // exit if update failed
    // file with aligned raw and filtered data
    std::ofstream out("waves.txt");
    for (size_t i = 0; i < vfiltered.size(); i++)
        out << vraw[i] << " " << vfiltered[i] << '\n';
    // get filtered file metrics
    WaveInfo info = get_wave_info(vfiltered);
    out.close();
    // file with peak locs and magnitudes
    out.open("peaks.txt");
    for (size_t i = 0; i < info.peaks.size(); i++)
        out << info.peaks[i] << " " << info.mags[i] << '\n';
}
Here's the peak info output for the first 4 peaks. The first column is the location, the second column is a relative magnitude of the peak:
116 43
344 32
577 44
812 37
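The peak_detect helper in the answer fits a parabola through three equally spaced samples and reads off its vertex. A floating-point sketch of just that step may make the formula easier to check. The names PeakEstimate and parabolic_peak are hypothetical, and the sketch assumes the three samples are not collinear; unlike the answer's integer version, it does not scale the offset by 100.

```cpp
#include <cassert>
#include <cmath>

// Result of fitting a parabola through three equally spaced samples.
struct PeakEstimate {
    double offset;  // vertex position relative to the middle sample, in sample spacings
    double value;   // interpolated signal value at the vertex
};

// y1, y2, y3 are the samples at positions -1, 0 and +1 (in units of the spacing).
PeakEstimate parabolic_peak(double y1, double y2, double y3) {
    const double curvature = y1 - 2.0 * y2 + y3;
    assert(curvature != 0.0);  // collinear samples have no vertex
    const double offset = (y1 - y3) / (2.0 * curvature);
    const double value = y2 - 0.25 * (y1 - y3) * offset;
    return {offset, value};
}
```

For samples taken from y = -(x - 0.25)^2 at x = -1, 0, 1, that is -1.5625, -0.0625 and -0.5625, the estimated offset is 0.25 and the interpolated value is 0. In the answer's code the three samples are N/2 apart, which is why the offset is multiplied by center before being added to the index.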
Minimum cuts on a rectangle to make into squares
I'm trying to solve this problem:
Given an a×b rectangle, your task is to cut it into squares. On each move you can select a rectangle and cut it into two rectangles in such a way that all side lengths remain integers. What is the minimum possible number of moves?
My logic is that the minimum number of cuts means the minimum number of squares; I don't know if that's the correct approach. I check which side is smaller. I then need bigSide/smallSide cuts to get squares of side smallSide, and I am left with a rectangle of sides smallSide and bigSide%smallSide. I repeat this until either side is 0 or both sides are equal.
#include <iostream>

int main() {
    int a, b;
    std::cin >> a >> b; // sides of the rectangle
    int res = 0;
    while (a != 0 && b != 0) {
        if (a > b) {
            if (a % b == 0)
                res += a / b - 1;
            else
                res += a / b;
            a = a % b;
        } else if (b > a) {
            if (b % a == 0)
                res += b / a - 1;
            else
                res += b / a;
            b = b % a;
        } else {
            break;
        }
    }
    std::cout << res;
    return 0;
}
When the input is 404 288, my code gives 18, but the right answer is actually 10. What am I doing wrong?
It seems clear to me that the problem defines each move as cutting a rectangle into two rectangles along integer lines, and then asks for the minimum number of such cuts. There is a clear recursive structure in this problem: once you cut a rectangle into two parts, you can recurse, cut each part into squares with the minimum number of moves, and sum up the answers. The problem is that plain recursion can lead to exponential time complexity, which leads us directly to dynamic programming. You have to use memoization to solve it efficiently (worst-case time O(a*b*(a+b))).
Here is what I'd suggest doing:
#include <iostream>
#include <vector>
using std::vector;

int min_cuts(int a, int b, vector<vector<int> > &mem) {
    int min = mem[a][b];
    // if already computed, just return the value
    if (min >= 0)
        return min;
    // if one side is divisible by the other,
    // store min-cuts in 'min'
    if (a%b==0)
        min= a/b-1;
    else if (b%a==0)
        min= b/a -1;
    // if there's no obvious solution, recurse
    else {
        // recurse on height
        for (int i=1; i<=a/2; i++) {
            int m = min_cuts(i,b, mem);
            int n = min_cuts(a-i, b, mem);
            if (min<0 or m+n+1<min)
                min = m + n + 1;
        }
        // recurse on width
        for (int j=1; j<=b/2; j++) {
            int m = min_cuts(a,j, mem);
            int n = min_cuts(a, b-j, mem);
            if (min<0 or m+n+1<min)
                min = m + n + 1;
        }
    }
    mem[a][b] = min;
    return min;
}

int main() {
    int a, b;
    std::cin >> a >> b; // sides of the rectangle
    // -1 means the problem is not solved yet,
    vector<vector<int> > mem(a+1, vector<int>(b+1, -1));
    int res = min_cuts(a,b,mem);
    std::cout << res << std::endl;
    return 0;
}
The reason the for loops only go up to a/2 and b/2 is that cutting the paper is symmetric: cutting along line i is the same as cutting along line a-i if you flip the paper. The loop bounds must include a/2 and b/2 themselves, otherwise small rectangles such as 2×3 have no cut to try. This little optimization roughly halves the number of cuts tried per rectangle.
Another little trick is that transposing the paper does not change the result, meaning min_cuts(a,b)=min_cuts(b,a), so you can potentially halve the computation again. Any major further improvement, say a greedy algorithm, would take more thinking (if one exists at all).
The current answer is a good start, especially the suggestions to use memoization or dynamic programming, and it is potentially efficient enough.
So far, all answers used the former with a sub-par data structure. A vector of vectors has considerable space and performance overhead; a (strict) lower triangular matrix stored in a flat array is much more efficient.
Using the maximum value as a sentinel (easier with unsigned) would also reduce complexity.
Finally, let's move to dynamic programming instead of memoization to simplify and get even more efficient:
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>

// note: std::make_unique_for_overwrite is C++20 and only usable in constant expressions from C++23
constexpr unsigned min_cuts(unsigned a, unsigned b) {
    if (a < b)
        std::swap(a, b);
    if (a == b || !b)
        return 0;
    const auto triangle = [](std::size_t n) { return n * (n - 1) / 2; };
    const auto p = std::make_unique_for_overwrite<unsigned[]>(triangle(a));
    /* const! */ unsigned zero = 0;
    const auto f = [&](auto a, auto b) -> auto& {
        if (a < b)
            std::swap(a, b);
        return a == b ? zero : p[triangle(a - 1) + b - 1];
    };
    for (auto i = 1u; i <= a; ++i) {
        for (auto j = 1u; j < i; ++j) {
            auto r = -1u;
            for (auto k = i / 2; k; --k)
                r = std::min(r, f(k, j) + f(i - k, j));
            for (auto k = j / 2; k; --k)
                r = std::min(r, f(k, i) + f(j - k, i));
            f(i, j) = ++r;
        }
    }
    return f(a, b);
}
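For comparison, the same recurrence can be written as a plain bottom-up table without the triangular packing. This is a minimal sketch rather than a tuned solution: the name min_cuts_table is made up here, and it assumes both sides are at least 1.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Minimum number of straight integer cuts that turn an a x b rectangle into squares.
int min_cuts_table(int a, int b) {
    assert(a >= 1 && b >= 1);
    // cuts[h][w] = minimum cuts for an h x w rectangle; row and column 0 are unused.
    std::vector<std::vector<int>> cuts(a + 1, std::vector<int>(b + 1, 0));
    for (int h = 1; h <= a; ++h) {
        for (int w = 1; w <= b; ++w) {
            if (h == w)
                continue;                      // already a square: zero cuts
            int best = h * w;                  // upper bound: unit squares need h*w - 1 cuts
            for (int k = 1; k <= h / 2; ++k)   // split the height into k and h - k
                best = std::min(best, cuts[k][w] + cuts[h - k][w] + 1);
            for (int k = 1; k <= w / 2; ++k)   // split the width into k and w - k
                best = std::min(best, cuts[h][k] + cuts[h][w - k] + 1);
            cuts[h][w] = best;
        }
    }
    return cuts[a][b];
}
```

A 3×5 rectangle needs 3 cuts (one 3×3 square, then a 3×2 strip that takes two more cuts). The question reports 10 as the expected answer for the input 404 288, which is a useful value to check any implementation against.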
Weird different results between debug and run mode in Clion
I found a weird problem when I was solving a dynamic programming problem.
Code
// Problem 01
// A child is running up a staircase with n steps and can hop either 1 step, 2 steps, or 3 steps at a time. Implement a method to count how many possible ways the child can run up the stairs.
#include <cstdio>
#include <iostream>
#include <sstream>
#include <vector>
using namespace std;

// 1st method
// top-down dynamic programming
long steps(int n, vector<long>& d) {
    if (d[n] > 0L)
        return d[n];
    //
    if (n == 1)
        return 1L;
    else if (n == 2) // 1,1 2
        return 2L;
    else if (n == 3) // 1,1,1 1,2 2,1 3
        return 4L;
    //
    long s = steps(n-1, d) + steps(n-2, d) + steps(n-3, d);
    d[n] = s;
    return s;
}

long solve(int n) {
    vector<long> d;
    for (int i = 0; i < n; ++i) {
        d.push_back(-1L);
    }
    long r = steps(n, d);
    return r;
}

// 2nd method
// bottom-up dynamic programming
// and remove the redundant array
long steps2(int n) {
    if (n == 1)
        return 1L;
    else if (n == 2) // 1,1 2
        return 2L;
    else if (n == 3) // 1,1,1 1,2 2,1 3
        return 4L;
    //
    long s = 0L;
    long s1 = 4L;
    long s2 = 2L;
    long s3 = 1L;
    for (int i = 4; i < n; ++i) {
        s = s1 + s2 + s3;
        s3 = s2;
        s2 = s1;
        s1 = s;
    }
    s = s1 + s2 + s3;
    return s;
}

long solve2(int n) {
    return steps2(n);
}

////////////////////// Test ////////////////////////
class Test {
public:
    Test() {
        basicTests();
    }
private:
    int num_fail = 0;
    void basicTests() {
        printf("C++ version: %ld\n", __cplusplus);
        // customize your own tests here
        printf("%d, %ld\n", 20, solve(20));
        printf("%d, %ld\n", 20, solve2(20));
    }
};

////////////////////// Main ////////////////////////
int main() {
    Test t = Test(); // change the method you want to test here.
    return 0;
}
Results
When I run the code, both methods give the same result, which seems correct.
C++ version: 201103
20, 121415
20, 121415
But when I use debug mode, sometimes the first result seems wrong, like this:
C++ version: 201103
20, 4210418572061733749
20, 121415
What's the problem?
In solve you fill the vector d with n values, at indices 0...n-1. Then in steps you immediately check d[n], which would be the (n+1)-th value and does not exist. Reading it is undefined behavior, so you get a garbage result.
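A minimal sketch of the fix described above: size the memo table with n + 1 entries so that index n is valid, and check the base cases before touching the table. The name count_ways is made up for this example and is not from the question.

```cpp
#include <cassert>
#include <vector>

// Top-down count of the ways to climb n steps taking 1, 2 or 3 steps at a time.
long count_ways(int n, std::vector<long>& memo) {
    if (n == 1) return 1L;
    if (n == 2) return 2L;
    if (n == 3) return 4L;
    if (memo[n] > 0L) return memo[n];  // index n is valid: memo has n + 1 entries
    memo[n] = count_ways(n - 1, memo) + count_ways(n - 2, memo) + count_ways(n - 3, memo);
    return memo[n];
}

long count_ways(int n) {
    assert(n >= 1);
    std::vector<long> memo(n + 1, -1L);  // indices 0..n, so memo[n] exists
    return count_ways(n, memo);
}
```

With this sizing, count_ways(20) gives 121415, matching the output of the bottom-up version in the question.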
Having problems with ctime, and working out function running time
I'm having trouble working out the time for my two maxsubarray functions to run (right at the bottom of the code).
The output it gives me:
Inputsize: 101 Time using Brute Force:0 Time Using DivandCon: 12
is correct for the second time I use clock(), but for the first difference diff1 it just gives me 0 and I'm not sure why?
Edit: Revised Code.
Edit2: Added Output.
#include <iostream>
#include <cmath>
#include <cstdlib>
#include <ctime>
#include <limits.h>
using namespace std;

int Kedane(int a[], int size)
{
    int max_so_far = 0, max_ending_here = 0;
    int i;
    for(i = 0; i < size; i++)
    {
        max_ending_here = max_ending_here + a[i];
        if(max_ending_here < 0)
            max_ending_here = 0;
        if(max_so_far < max_ending_here)
            max_so_far = max_ending_here;
    }
    return max_so_far;
}

int BruteForce(int array[],int n)
{
    int sum,ret=0;
    for(int j=-1;j<=n-2;j++)
    {
        sum=0;
        for(int k=j+1;k<+n-1;k++)
        {
            sum+=array[k];
            if(sum>ret)
            {
                ret=sum;
            }
        }
    }
    return ret;
}

//------------------------------------------------------
// FUNCTION WHICH FINDS MAX OF 2 INTS
int max(int a, int b) { return (a > b)? a : b; }

// FUNCTION WHICH FINDS MAX OF 3 NUMBERS
// CALL MAX FUNCT FOR 2 VARIS TWICE!
int max(int a, int b, int c) { return max(max(a, b), c); }

// WORKS OUT FROM MIDDLE+1->RIGHT THE MAX SUM &
// THE MAX SUM FROM MIDDLE->LEFT + RETURNS SUM OF THESE
int maxCrossingSum(int arr[], int l, int m, int h)
{
    int sum = 0;            // LEFT OF MID
    int LEFTsum = INT_MIN;  // INITIALLISES SUM TO LOWEST POSSIBLE INT
    for (int i = m; i >= l; i--)
    {
        sum = sum + arr[i];
        if (sum > LEFTsum)
            LEFTsum = sum;
    }
    sum = 0;                // RIGHT OF MID
    int RIGHTsum = INT_MIN;
    for (int i = m+1; i <= h; i++)
    {
        sum = sum + arr[i];
        if (sum > RIGHTsum)
            RIGHTsum = sum;
    }
    // RETURN SUM OF BOTH LEFT AND RIGHT SIDE MAX'S
    return LEFTsum + RIGHTsum;
}

// Returns sum of maxium sum subarray in aa[l..h]
int maxSubArraySum(int arr[], int l, int h)
{
    // Base Case: Only one element
    if (l == h)
        return arr[l];
    // Find middle point
    int m = (l + h)/2;
    /* Return maximum of following three possible cases
       a) Maximum subarray sum in left half
       b) Maximum subarray sum in right half
       c) Maximum subarray sum such that the subarray crosses the midpoint */
    return max(maxSubArraySum(arr, l, m),
               maxSubArraySum(arr, m+1, h),
               maxCrossingSum(arr, l, m, h));
}

// DRIVER
int main(void)
{
    std::srand (time(NULL));
    // CODE TO FILL ARRAY WITH RANDOMS [-50;50]
    int size=30000;
    int array[size];
    for(int i=0;i<=size;i++)
    {
        array[i]=(std::rand() % 100) -50;
    }
    // TIMING VARI'S
    clock_t t1,t2;
    clock_t A,B;
    clock_t K1,K2;
    volatile int mb, md, qq;
    //VARYING ELEMENTS IN THE ARRAY
    for(int n=101;n<size;n=n+100)
    {
        t1=clock();
        mb=BruteForce(array,n);
        t2=clock();
        A=clock();
        md=maxSubArraySum(array, 0, n-1) ;
        B=clock();
        K1=clock();
        qq=Kedane(array, n);
        K2=clock();
        cout<< n << "," << (double)t2-(double)t1 << ","<<(double)B-(double)A << ","<<(double)K2-(double)K1<<endl;
    }
    return 0;
}
Output (columns: n, BruteForce, maxSubArraySum, Kedane; times in clock ticks):
101,0,0,0 201,0,0,0 301,1,0,0 401,0,0,0 501,0,0,0 601,0,0,0 701,0,0,0 801,1,0,0 901,1,0,0 1001,0,0,0 1101,1,0,0 1201,1,0,0 1301,0,0,0 1401,1,0,0 1501,1,0,0 1601,2,0,0 1701,1,0,0 1801,2,0,0 1901,1,1,0 2001,1,0,0 2101,2,0,0 2201,3,0,0 2301,2,0,0 2401,3,0,0 2501,3,0,0 2601,3,0,0
2701,4,0,0 2801,4,0,0 2901,4,0,0 3001,4,0,0 3101,4,0,0 3201,5,0,0 3301,5,0,0 3401,6,0,0 3501,5,0,0 3601,6,0,0 3701,6,0,0 3801,8,0,0 3901,7,0,0 4001,8,0,0 4101,7,0,0 4201,10,1,0 4301,9,0,0 4401,8,0,0 4501,9,0,0 4601,10,0,0 4701,11,0,0 4801,11,0,0 4901,11,0,0 5001,12,0,1 5101,11,1,0 5201,13,0,0 5301,13,0,0 5401,15,0,0 5501,14,0,0 5601,16,0,0 5701,15,0,0 5801,15,1,0 5901,16,0,0 6001,17,0,0 6101,18,0,0 6201,18,0,0 6301,19,0,0 6401,21,0,0 6501,19,0,0 6601,21,1,0 6701,20,0,0 6801,22,0,0 6901,23,0,0 7001,22,0,0 7101,24,0,0 7201,26,0,0 7301,26,0,0 7401,24,1,0 7501,26,0,0 7601,27,0,0 7701,28,0,0 7801,28,0,0 7901,30,0,0 8001,29,0,0 8101,31,0,0 8201,31,1,0 8301,35,0,0 8401,33,0,0 8501,35,0,0 8601,35,1,0 8701,35,0,0 8801,36,1,0 8901,37,0,0 9001,38,0,0 9101,39,0,0 9201,41,1,0 9301,40,0,0 9401,41,0,0 9501,42,0,0 9601,45,0,0 9701,45,0,0 9801,44,0,0 9901,47,0,0 10001,47,0,0 10101,48,0,0 10201,50,0,0 10301,51,0,0 10401,50,0,0 10501,51,0,0 10601,53,0,0 10701,55,0,0 10801,54,0,0 10901,56,0,0 11001,57,0,0 11101,56,0,0 11201,60,0,0 11301,60,0,0 11401,61,1,0 11501,61,1,0 11601,63,0,0 11701,62,1,0 11801,66,1,0 11901,65,0,0 12001,68,1,0 12101,68,0,0 12201,70,0,0 12301,71,0,0 12401,72,0,0 12501,73,1,0 12601,73,1,0 12701,76,0,0 12801,77,0,0 12901,78,1,0 13001,79,1,0 13101,80,0,0 13201,83,0,0 13301,82,0,0 13401,86,0,0 13501,85,1,0 13601,86,0,0 13701,89,0,0 13801,90,0,1 13901,90,0,0 14001,91,0,0 14101,97,0,0 14201,93,0,0 14301,96,0,0 14401,99,0,0 14501,100,0,0 14601,101,0,0 14701,101,0,0 14801,103,1,0 14901,104,0,0 15001,107,0,0 15101,108,0,0 15201,109,0,0 15301,109,0,0 15401,114,0,0 15501,114,0,0 15601,115,0,0 15701,116,0,0 15801,119,0,0 15901,118,0,0 16001,124,0,0 16101,123,1,0 16201,123,1,0 16301,125,0,0 16401,127,1,0 16501,128,1,0 16601,131,0,0 16701,132,0,0 16801,134,0,0 16901,134,1,0 17001,135,1,0 17101,139,0,0 17201,139,0,0 17301,140,1,0 17401,143,0,0 17501,145,0,0 17601,147,0,0 17701,147,0,0 17801,150,1,0 17901,152,1,0 18001,153,0,0 18101,155,0,0 18201,157,0,0 18301,157,1,0 
18401,160,0,0 18501,160,1,0 18601,163,1,0 18701,165,0,0 18801,169,0,0 18901,171,0,1 19001,170,1,0 19101,173,1,0 19201,178,0,0 19301,175,1,0 19401,176,1,0 19501,180,0,0 19601,180,1,0 19701,182,1,0 19801,184,0,0 19901,187,1,0 20001,188,1,0 20101,191,0,0 20201,192,1,0 20301,193,1,0 20401,195,0,0 20501,199,0,0 20601,200,0,0 20701,201,0,0 20801,209,1,0 20901,210,0,0 21001,206,0,0 21101,210,0,0 21201,210,0,0 21301,213,0,0 21401,215,1,0 21501,217,1,0 21601,218,1,0 21701,221,1,0 21801,222,1,0 21901,226,1,0 22001,225,1,0 22101,229,0,0 22201,232,0,0 22301,233,1,0 22401,234,1,0 22501,237,1,0 22601,238,0,1 22701,243,0,0 22801,242,1,0 22901,246,1,0 23001,246,0,0 23101,250,1,0 23201,250,1,0 23301,254,1,0 23401,254,0,0 23501,259,0,1 23601,260,1,0 23701,263,1,0 23801,268,0,0 23901,266,1,0 24001,271,0,0 24101,272,1,0 24201,274,1,0 24301,280,0,1 24401,279,0,0 24501,281,0,0 24601,285,0,0 24701,288,0,0 24801,289,0,0 24901,293,0,0 25001,295,1,0 25101,299,1,0 25201,299,1,0 25301,302,0,0 25401,305,1,0 25501,307,0,0 25601,310,1,0 25701,315,0,0 25801,312,1,0 25901,315,0,0 26001,320,1,0 26101,320,0,0 26201,322,0,0 26301,327,1,0 26401,329,0,0 26501,332,1,0 26601,339,1,0 26701,334,1,0 26801,337,0,0 26901,340,0,0 27001,341,1,0 27101,342,1,0 27201,347,0,0 27301,348,1,0 27401,351,1,0 27501,353,0,0 27601,356,1,0 27701,360,0,1 27801,361,1,0 27901,362,1,0 28001,366,1,0 28101,370,0,1 28201,372,0,0 28301,375,1,0 28401,377,1,0 28501,380,0,0 28601,384,1,0 28701,384,0,0 28801,388,1,0 28901,391,1,0 29001,392,1,0 29101,399,1,0 29201,399,0,0 29301,404,1,0 29401,405,0,0 29501,409,1,0 29601,412,2,0 29701,412,1,0 29801,422,1,0 29901,419,1,0
The return values from BruteForce and maxSubArraySum are never used, and this gives the compiler a lot of latitude when it comes to optimizing them. On my machine, for example, using clang -O3 reduces the call to BruteForce to a vector copy and nothing else.
One method for forcing the evaluation of these functions is to write their results to volatile variables:
volatile int mb, md;
// ...
mb = BruteForce(array, n);
// ...
md = maxSubArraySum(array, 0, n-1);
As the variables are volatile, the value given by the right-hand side of the assignments must be stored, despite the absence of any other side effects, which prevents the compiler from optimising the computation away.
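The same idea can be combined with std::chrono::steady_clock, which usually has a finer resolution than clock(), and with repeated calls so that very short runs do not round down to 0. The following is a rough sketch under those assumptions; Timed, time_call and max_subarray_sum are names invented for the example. An aggressive optimiser may still hoist a side-effect-free call out of the loop, so treat the numbers with care.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Kadane's algorithm: largest sum of a contiguous run (the empty run counts as 0).
int max_subarray_sum(const std::vector<int>& a) {
    int best = 0, current = 0;
    for (int x : a) {
        current = (current + x > 0) ? current + x : 0;
        if (current > best) best = current;
    }
    return best;
}

struct Timed {
    int result;              // value returned by the measured call
    long long microseconds;  // total elapsed time over all repeats
};

// Hypothetical helper: runs f(data) `repeats` times and reports the total time.
template <typename F>
Timed time_call(F f, const std::vector<int>& data, int repeats) {
    assert(repeats > 0);
    volatile int sink = 0;  // the volatile store must happen, so the result must be computed
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i)
        sink = f(data);
    const auto stop = std::chrono::steady_clock::now();
    const auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    return {sink, us.count()};
}
```

Dividing microseconds by repeats gives an average per call, which is more informative than a single reading of 0 or 1 ticks for the small input sizes in the question.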
Windows Threads: Parallel Mergesort
I have what is hopefully a very easy question; I just can't find the answer online. I made a merge sort function (which I'm sure has inefficiencies), but I'm here to ask about the threads. I'm using Windows' CreateThread function to spawn threads that sort intervals of a given array. Once all the threads are finished, I will merge the segments together for the final result. I haven't implemented the final merge yet, because I'm getting strange errors which I'm sure come from a dumb mistake in the threads. I'll post my code; if you could kindly look at paraMSort. I'll post the whole MergeSort.h file so you can see the helper functions as well.
Sometimes the code will compile and run perfectly. Sometimes the console will abruptly close with no errors/exceptions. There shouldn't be mutex issues, because I'm doing operations on different segments of the array (different memory locations altogether). Does anyone see something wrong? Thanks so much.
PS. Are Windows CreateThread threads kernel level? In other words, if I make 2 threads on a dual-core computer, may they run simultaneously on separate cores? I'm thinking yes, since on this computer I can do the same work in 1/2 the time with 2 threads (with another test example).
PPS. I also saw some parallelism answers solved using Boost.Thread. Should I just use Boost threads instead of Windows threads? I don't have experience with Boost.
#include "Windows.h"
#include <iostream>
using namespace std;

void insert_sort(int* A, int sA, int eA, int* B, int sB, int eB)
{
    int value;
    int iterator;
    for(int i = sA + 1; i < eA; i++)
    {
        value = A[i];      // Grab the next value in the array
        iterator = i - 1;
        // Move this value left up the list until its in the right spot
        while(iterator >= sA && value < A[iterator])
            A[iterator + 1] = A[iterator--];
        A[iterator + 1] = value;  // Put value in its correct spot
    }
    for(int i = sA; i < eB; i++)
    {
        B[i] = A[i];  // Put results in B
    }
}

void merge_func(int* a, int sa, int ea, int* b, int sb, int eb, int* c, int sc)
{
    int i = sa, j = sb, k = sc;
    while(i < ea && j < eb)
        c[k++] = a[i] < b[j] ? a[i++] : b[j++];
    while(i < ea)
        c[k++] = a[i++];
    while(j < eb)
        c[k++] = b[j++];
}

void msort_big(int* a, int* b, int s, int e, bool inA)
{
    if(e-s < 4)
    {
        insert_sort(a, s, e, b, s, e);
        return;  // We sorted (A,s,e) into (B,s,e).
    }
    int m = (s + e)/2;
    msort_big(a, b, s, m, !inA);
    msort_big(a, b, m, e, !inA);
    // If we want to merge in A, do it. Otherwise, merge in B
    inA ? merge_func(b, s, m, b, m, e, a, s) : merge_func(a, s, m, a, m, e, b, s);
}

void msort(int* toBeSorted, int s, int e)
// Sorts toBeSorted from [s, e+1), so just enter [s, e] and
// the call to msort_big adds one.
{
    int* b = new int[e - s + 1];
    msort_big(toBeSorted, b, s, e+1, true);
    delete [] b;
}

template <class T>
struct SortData_Send
{
    T* data;
    int start;
    int end;
};

DWORD WINAPI msort_para_callback(LPVOID lpParam)
{
    SortData_Send<int> dat = *(SortData_Send<int>*)lpParam;
    msort(dat.data, dat.start, dat.end);
    cout << "done! " << endl;
}

int ceiling_func(double num)
{
    int temp = (int)num;
    if(num > (double)temp)
    {
        return temp + 1;
    }
    else
    {
        return temp;
    }
}

void paraMSort(int* toBeSorted, int s, int e, int numThreads)
{
    HANDLE threads[numThreads];
    DWORD threadIDs[numThreads];
    SortData_Send<int>* sent[numThreads];
    for(int i = 0; i < numThreads; i++)
    {
        // So for each thread, make an interval and pass the pointer to the array of ints.
        // So for numThreads = 3 and array size of 0 to 99 (100), we have 0-32, 33-65, 66-100.
        // 100 because sort function takes [start, end).
        sent[i] = new SortData_Send<int>;
        sent[i]->data = toBeSorted;
        sent[i]->start = s + ceiling_func(double(i)*(double)e/double(numThreads));
        sent[i]->end = ceiling_func(double(i+1)*double(e)/double(numThreads)) + ((i == numThreads-1) ? 1 : -1);
        threads[i] = CreateThread(NULL, 0, msort_para_callback, sent[i], 0, &threadIDs[i]);
    }
    WaitForMultipleObjects(numThreads, threads, true, INFINITE);
    cout << "Done waiting!" <<endl;
}
Assuming 's' is your starting point and 'e' is your ending point for a thread, shouldn't your code be something like this?
sent[i]->start = s + ceiling_func(double(i)*(double)(e-s)/double(numThreads));
sent[i]->end = (i == numThreads-1) ? e : (s - 1 + ceiling_func(double(i+1)*(double)(e-s)/double(numThreads)));
This matters when your function void paraMSort(int* toBeSorted, int s, int e, int numThreads) is called with a value of 's' not equal to 0. Otherwise the threads could read the wrong sections of memory.
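The interval arithmetic can also be done with integers only, which avoids the rounding questions of the floating-point version. The sketch below is one possible way to do it, not the asker's method: split_range is a hypothetical helper, and it uses half-open chunks [lo, hi), so a sort that takes an inclusive end would be passed hi - 1.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Splits the half-open range [s, e) into `parts` contiguous half-open chunks
// whose lengths differ by at most one element.
std::vector<std::pair<int, int>> split_range(int s, int e, int parts) {
    assert(parts > 0 && s <= e);
    const long long len = static_cast<long long>(e) - s;
    std::vector<std::pair<int, int>> chunks;
    chunks.reserve(parts);
    for (int i = 0; i < parts; ++i) {
        const int lo = s + static_cast<int>(len * i / parts);
        const int hi = s + static_cast<int>(len * (i + 1) / parts);
        chunks.emplace_back(lo, hi);  // chunk i covers [lo, hi)
    }
    return chunks;
}
```

For 100 elements and 3 threads this yields [0, 33), [33, 66) and [66, 100). Because each chunk starts exactly where the previous one ends, no element is skipped or sorted twice, and a non-zero s shifts every chunk by the same amount.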