Computing all distinct pair-combinations from N elements - combinations

Working on a USACO programming problem, I got stuck when using a brute-force approach.
From a list of N elements, I need to compute all distinct pair-configurations.
My problem is twofold.
How do I express such a configuration in, lets say, an array?
How do I go about computing all distinct combinations?
I only resorted to the brute-force approach after I gave up solving it analytically. Although this is context-specific, I came as far as noting that one can quickly rule out the rows where there is only a single, so called, "wormhole" --- it isn't effectively in an infinite cycle.
Update
I'll express them with a tree structure. Set N = 6; {A,B,C,D,E,F}.
By constructing the following trees chronologically, all combinations are listed.
A --> B,C,D,E,F;
B --> C,D,E,F;
C --> D,E,F;
D --> E,F;
E --> F.
Check: in total there are 6 over 2 = 6!/(2!*4!) = 15 combinations.
Note. Once a lower node is selected, it should be discarded as a top node; it can only exist in one single pair.
Next, selecting them and looping over all configurations.

Here is a sample code (in C/C++):
int c[N];
void LoopOverAll(int n)
{
if (n == N)
{
// output, the array c now contains a configuration
// do anything you want here
return;
}
if (c[n] != - 1)
{
// this warmhole is already paired with someone
LoopOverAll(n + 1);
return;
}
for (int i = n + 1; i < N; i ++)
{
if (c[i] != - 1)
{
// this warmhole is already paired with someone
continue;
}
c[i] = n; c[n] = i; LoopOverAll(n + 1);
c[i] = - 1;
}
c[n] = - 1;
}
int main()
{
for (int i = 0; i < N; i ++)
c[i] = - 1;
LoopOverAll(0);
return 1;
}

Related

Minimize the maximum difference between the heights

Given heights of n towers and a value k. We need to either increase or decrease height of every tower by k (only once) where k > 0. The task is to minimize the difference between the heights of the longest and the shortest tower after modifications, and output this difference.
I get the intuition behind the solution but I can not comment on the correctness of the solution below.
// C++ program to find the minimum possible
// difference between maximum and minimum
// elements when we have to add/subtract
// every number by k
#include <bits/stdc++.h>
using namespace std;
// Modifies the array by subtracting/adding
// k to every element such that the difference
// between maximum and minimum is minimized
int getMinDiff(int arr[], int n, int k)
{
if (n == 1)
return 0;
// Sort all elements
sort(arr, arr+n);
// Initialize result
int ans = arr[n-1] - arr[0];
// Handle corner elements
int small = arr[0] + k;
int big = arr[n-1] - k;
if (small > big)
swap(small, big);
// Traverse middle elements
for (int i = 1; i < n-1; i ++)
{
int subtract = arr[i] - k;
int add = arr[i] + k;
// If both subtraction and addition
// do not change diff
if (subtract >= small || add <= big)
continue;
// Either subtraction causes a smaller
// number or addition causes a greater
// number. Update small or big using
// greedy approach (If big - subtract
// causes smaller diff, update small
// Else update big)
if (big - subtract <= add - small)
small = subtract;
else
big = add;
}
return min(ans, big - small);
}
// Driver function to test the above function
int main()
{
int arr[] = {4, 6};
int n = sizeof(arr)/sizeof(arr[0]);
int k = 10;
cout << "\nMaximum difference is "
<< getMinDiff(arr, n, k);
return 0;
}
Can anyone help me provide the correct solution to this problem?
The codes above work, however I don't find much explanation so I'll try to add some in order to help develop intuition.
For any given tower, you have two choices, you can either increase its height or decrease it.
Now if you decide to increase its height from say Hi to Hi + K, then you can also increase the height of all shorter towers as that won't affect the maximum. Similarly, if you decide to decrease the height of a tower from Hi to Hi − K, then you can also decrease the heights of all taller towers.
We will make use of this, we have n buildings, and we'll try to make each of the building the highest and see making which building the highest gives us the least range of heights(which is our answer). Let me explain:
So what we want to do is - 1) We first sort the array(you will soon see why).
2) Then for every building from i = 0 to n-2[1] , we try to make it the highest (by adding K to the building, adding K to the buildings on its left and subtracting K from the buildings on its right).
So say we're at building Hi, we've added K to it and the buildings before it and subtracted K from the buildings after it. So the minimum height of the buildings will now be min(H0 + K, Hi+1 - K), i.e. min(1st building + K, next building on right - K).
(Note: This is because we sorted the array. Convince yourself by taking a few examples.)
Likewise, the maximum height of the buildings will be max(Hi + K, Hn-1 - K), i.e. max(current building + K, last building on right - K).
3) max - min gives you the range.
[1]Note that when i = n-1. In this case, there is no building after the current building, so we're adding K to every building, so the range will merely be
height[n-1] - height[0] since K is added to everything, so it cancels out.
Here's a Java implementation based on the idea above:
class Solution {
int getMinDiff(int[] arr, int n, int k) {
Arrays.sort(arr);
int ans = arr[n-1] - arr[0];
int smallest = arr[0] + k, largest = arr[n-1]-k;
for(int i = 0; i < n-1; i++){
int min = Math.min(smallest, arr[i+1]-k);
int max = Math.max(largest, arr[i]+k);
if (min < 0) continue;
ans = Math.min(ans, max-min);
}
return ans;
}
}
int getMinDiff(int a[], int n, int k) {
sort(a,a+n);
int i,mx,mn,ans;
ans = a[n-1]-a[0]; // this can be one possible solution
for(i=0;i<n;i++)
{
if(a[i]>=k) // since height of tower can't be -ve so taking only +ve heights
{
mn = min(a[0]+k, a[i]-k);
mx = max(a[n-1]-k, a[i-1]+k);
ans = min(ans, mx-mn);
}
}
return ans;
}
This is C++ code, it passed all the test cases.
This python code might be of some help to you. Code is self explanatory.
def getMinDiff(arr, n, k):
arr = sorted(arr)
ans = arr[-1]-arr[0] #this case occurs when either we subtract k or add k to all elements of the array
for i in range(n):
mn=min(arr[0]+k, arr[i]-k) #after sorting, arr[0] is minimum. so adding k pushes it towards maximum. We subtract k from arr[i] to get any other worse (smaller) minimum. worse means increasing the diff b/w mn and mx
mx=max(arr[n-1]-k, arr[i]+k) # after sorting, arr[n-1] is maximum. so subtracting k pushes it towards minimum. We add k to arr[i] to get any other worse (bigger) maximum. worse means increasing the diff b/w mn and mx
ans = min(ans, mx-mn)
return ans
Here's a solution:-
But before jumping on to the solution, here's some info that is required to understand it. In the best case scenario, the minimum difference would be zero. This could happen only in two cases - (1) the array contain duplicates or (2) for an element, lets say 'x', there exists another element in the array which has the value 'x + 2*k'.
The idea is pretty simple.
First we would sort the array.
Next, we will try to find either the optimum value (for which the answer would come out to be zero) or at least the closest number to the optimum value using Binary Search
Here's a Javascript implementation of the algorithm:-
function minDiffTower(arr, k) {
arr = arr.sort((a,b) => a-b);
let minDiff = Infinity;
let prev = null;
for (let i=0; i<arr.length; i++) {
let el = arr[i];
// Handling case when the array have duplicates
if (el == prev) {
minDiff = 0;
break;
}
prev = el;
let targetNum = el + 2*k; // Lets say we have an element 10. The difference would be zero when there exists an element with value 10+2*k (this is the 'optimum value' as discussed in the explaination
let closestMatchDiff = Infinity; // It's not necessary that there would exist 'targetNum' in the array, so we try to find the closest to this number using Binary Search
let lb = i+1;
let ub = arr.length-1;
while (lb<=ub) {
let mid = lb + ((ub-lb)>>1);
let currMidDiff = arr[mid] > targetNum ? arr[mid] - targetNum : targetNum - arr[mid];
closestMatchDiff = Math.min(closestMatchDiff, currMidDiff);
if (arr[mid] == targetNum) break; // in this case the answer would be simply zero, no need to proceed further
else if (arr[mid] < targetNum) lb = mid+1;
else ub = mid-1;
}
minDiff = Math.min(minDiff, closestMatchDiff);
}
return minDiff;
}
Here is the C++ code, I have continued from where you left. The code is self-explanatory.
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int minDiff(int arr[], int n, int k)
{
// If the array has only one element.
if (n == 1)
{
return 0;
}
//sort all elements
sort(arr, arr + n);
//initialise result
int ans = arr[n - 1] - arr[0];
//Handle corner elements
int small = arr[0] + k;
int big = arr[n - 1] - k;
if (small > big)
{
// Swap the elements to keep the array sorted.
int temp = small;
small = big;
big = temp;
}
//traverse middle elements
for (int i = 0; i < n - 1; i++)
{
int subtract = arr[i] - k;
int add = arr[i] + k;
// If both subtraction and addition do not change the diff.
// Subtraction does not give new minimum.
// Addition does not give new maximum.
if (subtract >= small or add <= big)
{
continue;
}
// Either subtraction causes a smaller number or addition causes a greater number.
//Update small or big using greedy approach.
// if big-subtract causes smaller diff, update small Else update big
if (big - subtract <= add - small)
{
small = subtract;
}
else
{
big = add;
}
}
return min(ans, big - small);
}
int main(void)
{
int arr[] = {1, 5, 15, 10};
int n = sizeof(arr) / sizeof(arr[0]);
int k = 3;
cout << "\nMaximum difference is: " << minDiff(arr, n, k) << endl;
return 0;
}
class Solution {
public:
int getMinDiff(int arr[], int n, int k) {
sort(arr, arr+n);
int diff = arr[n-1]-arr[0];
int mine, maxe;
for(int i = 0; i < n; i++)
arr[i]+=k;
mine = arr[0];
maxe = arr[n-1]-2*k;
for(int i = n-1; i > 0; i--){
if(arr[i]-2*k < 0)
break;
mine = min(mine, arr[i]-2*k);
maxe = max(arr[i-1], arr[n-1]-2*k);
diff = min(diff, maxe-mine);
}
return diff;
}
};
class Solution:
def getMinDiff(self, arr, n, k):
# code here
arr.sort()
res = arr[-1]-arr[0]
for i in range(1, n):
if arr[i]>=k:
# at a time we can increase or decrease one number only.
# Hence assuming we decrease ith elem, we will increase i-1 th elem.
# using this we basically find which is new_min and new_max possible
# and if the difference is smaller than res, we return the same.
new_min = min(arr[0]+k, arr[i]-k)
new_max = max(arr[-1]-k, arr[i-1]+k)
res = min(res, new_max-new_min)
return res

Algorithm on hexagonal grid

Hexagonal grid is represented by a two-dimensional array with R rows and C columns. First row always comes "before" second in hexagonal grid construction (see image below). Let k be the number of turns. Each turn, an element of the grid is 1 if and only if the number of neighbours of that element that were 1 the turn before is an odd number. Write C++ code that outputs the grid after k turns.
Limitations:
1 <= R <= 10, 1 <= C <= 10, 1 <= k <= 2^(63) - 1
An example with input (in the first row are R, C and k, then comes the starting grid):
4 4 3
0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
Simulation: image, yellow elements represent '1' and blank represent '0'.
This problem is easy to solve if I simulate and produce a grid each turn, but with big enough k it becomes too slow. What is the faster solution?
EDIT: code (n and m are used instead R and C) :
#include <cstdio>
#include <cstring>
using namespace std;
int old[11][11];
int _new[11][11];
int n, m;
long long int k;
int main() {
scanf ("%d %d %lld", &n, &m, &k);
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) scanf ("%d", &old[i][j]);
}
printf ("\n");
while (k) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
int count = 0;
if (i % 2 == 0) {
if (i) {
if (j) count += old[i-1][j-1];
count += old[i-1][j];
}
if (j) count += (old[i][j-1]);
if (j < m-1) count += (old[i][j+1]);
if (i < n-1) {
if (j) count += old[i+1][j-1];
count += old[i+1][j];
}
}
else {
if (i) {
if (j < m-1) count += old[i-1][j+1];
count += old[i-1][j];
}
if (j) count += old[i][j-1];
if (j < m-1) count += old[i][j+1];
if (i < n-1) {
if (j < m-1) count += old[i+1][j+1];
count += old[i+1][j];
}
}
if (count % 2) _new[i][j] = 1;
else _new[i][j] = 0;
}
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) old[i][j] = _new[i][j];
}
k--;
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
printf ("%d", old[i][j]);
}
printf ("\n");
}
return 0;
}
For a given R and C, you have N=R*C cells.
If you represent those cells as a vector of elements in GF(2), i.e, 0s and 1s where arithmetic is performed mod 2 (addition is XOR and multiplication is AND), then the transformation from one turn to the next can be represented by an N*N matrix M, so that:
turn[i+1] = M*turn[i]
You can exponentiate the matrix to determine how the cells transform over k turns:
turn[i+k] = (M^k)*turn[i]
Even if k is very large, like 2^63-1, you can calculate M^k quickly using exponentiation by squaring: https://en.wikipedia.org/wiki/Exponentiation_by_squaring This only takes O(log(k)) matrix multiplications.
Then you can multiply your initial state by the matrix to get the output state.
From the limits on R, C, k, and time given in your question, it's clear that this is the solution you're supposed to come up with.
There are several ways to speed up your algorithm.
You do the neighbour-calculation with the out-of bounds checking in every turn. Do some preprocessing and calculate the neighbours of each cell once at the beginning. (Aziuth has already proposed that.)
Then you don't need to count the neighbours of all cells. Each cell is on if an odd number of neighbouring cells were on in the last turn and it is off otherwise.
You can think of this differently: Start with a clean board. For each active cell of the previous move, toggle the state of all surrounding cells. When an even number of neighbours cause a toggle, the cell is on, otherwise the toggles cancel each other out. Look at the first step of your example. It's like playing Lights Out, really.
This method is faster than counting the neighbours if the board has only few active cells and its worst case is a board whose cells are all on, in which case it is as good as neighbour-counting, because you have to touch each neighbours for each cell.
The next logical step is to represent the board as a sequence of bits, because bits already have a natural way of toggling, the exclusive or or xor oerator, ^. If you keep the list of neigbours for each cell as a bit mask m, you can then toggle the board b via b ^= m.
These are the improvements that can be made to the algorithm. The big improvement is to notice that the patterns will eventually repeat. (The toggling bears resemblance with Conway's Game of Life, where there are also repeating patterns.) Also, the given maximum number of possible iterations, 2⁶³ is suspiciously large.
The playing board is small. The example in your question will repeat at least after 2¹⁶ turns, because the 4×4 board can have at most 2¹⁶ layouts. In practice, turn 127 reaches the ring pattern of the first move after the original and it loops with a period of 126 from then.
The bigger boards may have up to 2¹⁰⁰ layouts, so they may not repeat within 2⁶³ turns. A 10×10 board with a single active cell near the middle has ar period of 2,162,622. This may indeed be a topic for a maths study, as Aziuth suggests, but we'll tacke it with profane means: Keep a hash map of all previous states and the turns where they occurred, then check whether the pattern has occurred before in each turn.
We now have:
a simple algorithm for toggling the cells' state and
a compact bitwise representation of the board, which allows us to create a hash map of the previous states.
Here's my attempt:
#include <iostream>
#include <map>
/*
* Bit representation of a playing board, at most 10 x 10
*/
struct Grid {
unsigned char data[16];
Grid() : data() {
}
void add(size_t i, size_t j) {
size_t k = 10 * i + j;
data[k / 8] |= 1u << (k % 8);
}
void flip(const Grid &mask) {
size_t n = 13;
while (n--) data[n] ^= mask.data[n];
}
bool ison(size_t i, size_t j) const {
size_t k = 10 * i + j;
return ((data[k / 8] & (1u << (k % 8))) != 0);
}
bool operator<(const Grid &other) const {
size_t n = 13;
while (n--) {
if (data[n] > other.data[n]) return true;
if (data[n] < other.data[n]) return false;
}
return false;
}
void dump(size_t n, size_t m) const {
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
std::cout << '\n';
}
};
int main()
{
size_t n, m, k;
std::cin >> n >> m >> k;
Grid grid;
Grid mask[10][10];
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
int x;
std::cin >> x;
if (x) grid.add(i, j);
}
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
Grid &mm = mask[i][j];
if (i % 2 == 0) {
if (i) {
if (j) mm.add(i - 1, j - 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j) mm.add(i + 1, j - 1);
mm.add(i + 1, j);
}
} else {
if (i) {
if (j < m - 1) mm.add(i - 1, j + 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j < m - 1) mm.add(i + 1, j + 1);
mm.add(i + 1, j);
}
}
}
}
std::map<Grid, size_t> prev;
std::map<size_t, Grid> pattern;
for (size_t turn = 0; turn < k; turn++) {
Grid next;
std::map<Grid, size_t>::const_iterator it = prev.find(grid);
if (1 && it != prev.end()) {
size_t start = it->second;
size_t period = turn - start;
size_t index = (k - turn) % period;
grid = pattern[start + index];
break;
}
prev[grid] = turn;
pattern[turn] = grid;
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
if (grid.ison(i, j)) next.flip(mask[i][j]);
}
}
grid = next;
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (grid.ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
return 0;
}
There is probably room for improvement. Especially, I'm not so sure how it fares for big boards. (The code above uses an ordered map. We don't need the order, so using an unordered map will yield faster code. The example above with a single active cell on a 10×10 board took significantly longer than a second with an ordered map.)
Not sure about how you did it - and you should really always post code here - but let's try to optimize things here.
First of all, there is not really a difference between that and a quadratic grid. Different neighbor relationships, but I mean, that is just a small translation function. If you have a problem there, we should treat this separately, maybe on CodeReview.
Now, the naive solution is:
for all fields
count neighbors
if odd: add a marker to update to one, else to zero
for all fields
update all fields by marker of former step
this is obviously in O(N). Iterating twice is somewhat twice the actual run time, but should not be that bad. Try not to allocate space every time that you do that but reuse existing structures.
I'd propose this solution:
at the start:
create a std::vector or std::list "activated" of pointers to all fields that are activated
each iteration:
create a vector "new_activated"
for all items in activated
count neighbors, if odd add to new_activated
for all items in activated
set to inactive
replace activated by new_activated*
for all items in activated
set to active
*this can be done efficiently by putting them in a smart pointer and use move semantics
This code only works on the activated fields. As long as they stay within some smaller area, this is far more efficient. However, I have no idea when this changes - if there are activated fields all over the place, this might be less efficient. In that case, the naive solution might be the best one.
EDIT: after you now posted your code... your code is quite procedural. This is C++, use classes and use representation of things. Probably you do the search for neighbors right, but you can easily make mistakes there and therefore should isolate that part in a function, or better method. Raw arrays are bad and variables like n or k are bad. But before I start tearing your code apart, I instead repeat my recommendation, put the code on CodeReview, having people tear it apart until it is perfect.
This started off as a comment, but I think it could be helpful as an answer in addition to what has already been stated.
You stated the following limitations:
1 <= R <= 10, 1 <= C <= 10
Given these restrictions, I'll take the liberty to can represent the grid/matrix M of R rows and C columns in constant space (i.e. O(1)), and also check its elements in O(1) instead of O(R*C) time, thus removing this part from our time-complexity analysis.
That is, the grid can simply be declared as bool grid[10][10];.
The key input is the large number of turns k, stated to be in the range:
1 <= k <= 2^(63) - 1
The problem is that, AFAIK, you're required to perform k turns. This makes the algorithm be in O(k). Thus, no proposed solution can do better than O(k)[1].
To improve the speed in a meaningful way, this upper-bound must be lowered in some way[1], but it looks like this cannot be done without altering the problem constraints.
Thus, no proposed solution can do better than O(k)[1].
The fact that k can be so large is the main issue. The most anyone can do is improve the rest of the implementation, but this will only improve by a constant factor; you'll have to go through k turns regardless of how you look at it.
Therefore, unless some clever fact and/or detail is found that allows this bound to be lowered, there's no other choice.
[1] For example, it's not like trying to determine if some number n is prime, where you can check all numbers in the range(2, n) to see if they divide n, making it a O(n) process, or notice that some improvements include only looking at odd numbers after checking n is not even (constant factor; still O(n)), and then checking odd numbers only up to √n, i.e., in the range(3, √n, 2), which meaningfully lowers the upper-bound down to O(√n).

Knapsack Backtracking Using only O(W) space

So I have this code that I have written that correctly finds the optimal value for the knapsack problem.
int mat[2][size + 1];
memset(mat, 0, sizeof(mat));
int i = 0;
while(i < nItems)
{
int j = 0;
if(i % 2 != 0)
{
while(++j <= size)
{
if(weights[i] <= j) mat[1][j] = max(values[i] + mat[0][j - weights[i]], mat[0][j]);
else mat[1][j] = mat[0][j];
}
}
else
{
while(++j <= size)
{
if(weights[i] <= j) mat[0][j] = max(values[i] + mat[1][j - weights[i]], mat[1][j]);
else mat[0][j] = mat[1][j];
}
}
i++;
}
int val = (nItems % 2 != 0)? mat[0][size] : mat[1][size];
cout << val << endl;
return 0;
This part I udnerstand. However I am trying to keep the same memory space, i.e. O(W), but also now compute the optimal solution using backtracking. This is where I am finding trouble. The hints I have been given is this
Now suppose that we also want the optimal set of items. Recall that the goal
in finding the optimal solution in part 1 is to find the optimal path from
entry K(0,0) to entry K(W,n). The optimal path must pass through an
intermediate node (k,n/2) for some k; this k corresponds to the remaining
capacity in the knapsack of the optimal solution after items n/2 + 1,...n
have been considered
The question asked is this.
Implement a modified version of the algorithm from part 2 that returns not
only the optimal value, but also the remaining capacity of the optimal
solution after the last half of items have been considered
Any help would be apprecaited to get me started. Thanks

Dynamic Programming for analyzing a Vector of Vector of Bools

Here's the problem I'm trying to solve.
Given a square of bools, I want to find the size of largest subsquare entirely full of trues (1's). Also, I am allowed O(n^2) memory requirement as well as the run time must be O(n^2). The header to the function will look like the following
unsigned int largestCluster(const vector<vector<bool>> &map);
Some other things to note will be there always be at least one 1 (a 1 x 1 subsquare) and the input will also always be a square.
Now for my attempts at the problem:
Given this is based on the concept of dynamic programming, which to my limited understanding, helps store information that is previously found for later use. So if my understanding is correcting, Prim's algorithm would be an example of a dynamic algorithm because it remembers what vertices we've visited, the smallest distance to a vertice, and the parent that enables that smallest distance.
I tried analyzing the map and keeping track of the number of true neighbors, a true location location has. I was thinking if a spot had 4 true neighbors than that is a potential subsquare. However, this didn't help with subsquares of size 4 or less..
I tried to include a lot of detail in this question for help as I'm trying to game plan a way to tackle this problem because I don't believe it's going to require writing a lengthy function. Thanks for any help
Here's my nomination. Dynamic programming, O(n^2) complexity. I realize that I probably just did somebody's homework, but it looked like an intriguing little problem.
int largestCluster(const std::vector<std::vector<bool> > a)
{
const int n = a.size();
std::vector<std::vector<short> > s;
s.resize(n);
for (int i = 0; i < n; ++i)
{
s[i].resize(n);
}
s[0][0] = a[0][0] ? 1 : 0;
int maxSize = s[0][0];
for (int k = 1; k < n; ++k)
{
s[k][0] = a[k][0] ? 1 : 0;
for (int j = 1; j < k; ++j)
{
if (a[k][j])
{
int m = s[k - 1][j - 1];
if (s[k][j - 1] < m)
{
m = s[k][j - 1];
}
if (s[k - 1][j] < m)
{
m = s[k - 1][j];
}
s[k][j] = ++m;
if (m > maxSize)
{
maxSize = m;
}
}
else
{
s[k][j] = 0;
}
}
s[0][k] = a[0][k] ? 1 : 0;
for (int i = 1; i <= k; ++i)
{
if (a[i][k])
{
int m = s[i - 1][k - 1];
if (s[i - 1][k] < m)
{
m = s[i - 1][k];
}
if (s[i][k - 1] < m)
{
m = s[i][k - 1];
}
s[i][k] = ++m;
if (m > maxSize)
{
maxSize = m;
}
}
else
{
s[i][k] = 0;
}
}
}
return maxSize;
}
If you want a dynamic programming approach one strategy I could think of would be to consider a box (base case 1 entry) as a potential upper left corner of a larger box and start by the bottom right corner of your large square, you then need to evaluate only the "boxes" (using information previously stored to only consider the largest cluster so far) that are to the right, bottom, and diagonally right-bottom of that we are now evaluating.
By saving information about each edge we would be respecting the O(n^2) (though not o(n^2)) however for the run-time you need to work on the details of the approach to get to O(n^2)
This is just a rough draft idea as I don't have much time, and I would appreciate any more hints/comments about this myself.

weighted RNG speed problem in C++

Edit: to clarify, the problem is with the second algorithm.
I have a bit of C++ code that samples cards from a 52 card deck, which works just fine:
void sample_allcards(int table[5], int holes[], int players) {
int temp[5 + 2 * players];
bool try_again;
int c, n, i;
for (i = 0; i < 5 + 2 * players; i++) {
try_again = true;
while (try_again == true) {
try_again = false;
c = fast_rand52();
// reject collisions
for (n = 0; n < i + 1; n++) {
try_again = (temp[n] == c) || try_again;
}
temp[i] = c;
}
}
copy_cards(table, temp, 5);
copy_cards(holes, temp + 5, 2 * players);
}
I am implementing code to sample the hole cards according to a known distribution (stored as a 2d table). My code for this looks like:
void sample_allcards_weighted(double weights[][HOLE_CARDS], int table[5], int holes[], int players) {
// weights are distribution over hole cards
int temp[5 + 2 * players];
int n, i;
// table cards
for (i = 0; i < 5; i++) {
bool try_again = true;
while (try_again == true) {
try_again = false;
int c = fast_rand52();
// reject collisions
for (n = 0; n < i + 1; n++) {
try_again = (temp[n] == c) || try_again;
}
temp[i] = c;
}
}
for (int player = 0; player < players; player++) {
// hole cards according to distribution
i = 5 + 2 * player;
bool try_again = true;
while (try_again == true) {
try_again = false;
// weighted-sample c1 and c2 at once
// h is a number < 1325
int h = weighted_randi(&weights[player][0], HOLE_CARDS);
// i2h uses h and sets temp[i] to the 2 cards implied by h
i2h(&temp[i], h);
// reject collisions
for (n = 0; n < i; n++) {
try_again = (temp[n] == temp[i]) || (temp[n] == temp[i+1]) || try_again;
}
}
}
copy_cards(table, temp, 5);
copy_cards(holes, temp + 5, 2 * players);
}
My problem? The weighted sampling algorithm is a factor of 10 slower. Speed is very important for my application.
Is there a way to improve the speed of my algorithm to something more reasonable? Am I doing something wrong in my implementation?
Thanks.
edit: I was asked about this function, which I should have posted, since it is key
inline int weighted_randi(double *w, int num_choices) {
double r = fast_randd();
double threshold = 0;
int n;
for (n = 0; n < num_choices; n++) {
threshold += *w;
if (r <= threshold) return n;
w++;
}
// shouldn't get this far
cerr << n << "\t" << threshold << "\t" << r << endl;
assert(n < num_choices);
return -1;
}
...and i2h() is basically just an array lookup.
Your reject collisions are turning an O(n) algorithm into (I think) an O(n^2) operation.
There are two ways to select cards from a deck: shuffle and pop, or pick sets until the elements of the set are unique; you are doing the latter which requires a considerable amount of backtracking.
I didn't look at the details of the code, just a quick scan.
you could gain some speed by replacing the all the loops that check if a card is taken with a bit mask, eg for a pool of 52 cards, we prevent collisions like so:
DWORD dwMask[2] = {0}; //64 bits
//...
int nCard;
while(true)
{
nCard = rand_52();
if(!(dwMask[nCard >> 5] & 1 << (nCard & 31)))
{
dwMask[nCard >> 5] |= 1 << (nCard & 31);
break;
}
}
//...
My guess would be the memcpy(1326*sizeof(double)) within the retry-loop. It doesn't seem to change, so should it be copied each time?
Rather than tell you what the problem is, let me suggest how you can find it. Either 1) single-step it in the IDE, or 2) randomly halt it to see what it's doing.
That said, sampling by rejection, as you are doing, can take an unreasonably long time if you are rejecting most samples.
Your inner "try_again" for loop should stop as soon as it sets try_again to true - there's no point in doing more work after you know you need to try again.
for (n = 0; n < i && !try_again; n++) {
try_again = (temp[n] == temp[i]) || (temp[n] == temp[i+1]);
}
Answering the second question about picking from a weighted set also has an algorithmic replacement that should be less time complex. This is based on the principle of that which is pre-computed does not need to be re-computed.
In an ordinary selection, you have an integral number of bins which makes picking a bin an O(1) operation. Your weighted_randi function has bins of real length, thus selection in your current version operates in O(n) time. Since you don't say (but do imply) that the vector of weights w is constant, I'll assume that it is.
You aren't interested in the width of the bins, per se, you are interested in the locations of their edges that you re-compute on every call to weighted_randi using the variable threshold. If the constancy of w is true, pre-computing a list of edges (that is, the value of threshold for all *w) is your O(n) step which need only be done once. If you put the results in a (naturally) ordered list, a binary search on all future calls yields an O(log n) time complexity with an increase in space needed of only sizeof w / sizeof w[0].