I'm trying to rewrite the main loop in a physics simulation to split the workload across multiple threads. It calls dostuff on every unique pair of indices and looks like this:
for (int i = 0; i < n - 1; ++i)
{
    for (int j = i + 1; j < n; ++j)
    {
        dostuff(i, j);
    }
}
I came up with two options:
//#1
//sqrt is implemented as binary search on ints, floors the result
//x and the bound are 64-bit: n * (n - 1) / 2 overflows int for n near 10^6
for (long long x = 0; x < (long long) n * (n - 1) / 2; ++x)
{
    int i = (1 + sqrt(1 + 8 * x)) / 2;
    int j = x - (long long) i * (i - 1) / 2;
    dostuff(i, j);
}
//#2
for (long long x = 0; x < (long long) n * n; ++x)
{
    int i = x % n;
    int j = x / n;
    if (i < j)
        dostuff(i, j);
}
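A quick single-threaded sanity check that the #1 mapping visits every unordered pair exactly once (a sketch, not from the original loop: it uses floating-point sqrt, which is exact at this small scale, and it yields pairs with j < i, the mirror of the original loop's i < j, which makes no difference for unordered pairs):

#include <cassert>
#include <cmath>
#include <vector>

int main()
{
    const int n = 100; // small test size
    std::vector<int> seen(n * n, 0);
    for (int x = 0; x < n * (n - 1) / 2; ++x)
    {
        int i = (int) ((1 + std::sqrt(1.0 + 8.0 * x)) / 2);
        int j = x - i * (i - 1) / 2;
        assert(0 <= j && j < i && i < n);
        ++seen[i * n + j];
    }
    for (int i = 1; i < n; ++i)
        for (int j = 0; j < i; ++j)
            assert(seen[i * n + j] == 1); // every pair hit exactly once
}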
And for each option there is a corresponding per-thread loop that uses a shared atomic counter:
//#1
long long x;
while ((x = counter.fetch_add(1)) < (long long) n * (n - 1) / 2)
{
    int i = (1 + sqrt(1 + 8 * x)) / 2;
    int j = x - (long long) i * (i - 1) / 2;
    dostuff(i, j);
}
//#2
long long x;
while ((x = counter.fetch_add(1)) < (long long) n * n)
{
    int i = x % n;
    int j = x / n;
    if (i < j)
        dostuff(i, j);
}
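For reference, a minimal self-contained sketch of how the #1 worker loop can be driven (dostuff is stubbed out, isqrt stands in for the binary-search integer sqrt, and the sizes are illustrative). One caveat: the real dostuff below writes accelerations[i] and accelerations[j], so two threads can touch the same element at once; that needs atomics, locks, or per-thread accumulators:

#include <atomic>
#include <thread>
#include <vector>

std::atomic<long long> counter(0);

void dostuff(int i, int j) { /* stand-in for the real pair kernel */ }

// binary-search integer sqrt, floors the result
long long isqrt(long long v)
{
    long long lo = 0, hi = 3000000; // hi * hi exceeds 1 + 8 * x for n < 10^6
    while (lo < hi)
    {
        long long mid = (lo + hi + 1) / 2;
        if (mid * mid <= v) lo = mid; else hi = mid - 1;
    }
    return lo;
}

void worker(long long total)
{
    long long x;
    while ((x = counter.fetch_add(1)) < total)
    {
        int i = (int) ((1 + isqrt(1 + 8 * x)) / 2);
        int j = (int) (x - (long long) i * (i - 1) / 2);
        dostuff(i, j);
    }
}

int main()
{
    const int n = 1000;       // illustrative size
    const int numThreads = 4; // illustrative thread count
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back(worker, (long long) n * (n - 1) / 2);
    for (auto& th : threads)
        th.join();
}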
My question is, what is the best way to share the workload of the main loop between threads for n < 10^6?
EDIT:
//dostuff
Element& a = elements[i];
Element& b = elements[j];
glm::dvec3 r = b.getPosition() - a.getPosition(); // vector from a to b
double rv = glm::length(r);                       // distance between a and b
double base = G / (rv * rv);                      // G / r^2
glm::dvec3 dir = glm::normalize(r);               // unit vector from a to b
glm::dvec3 bd = dir * base;
accelerations[i] += bd * b.getMass();             // a accelerates toward b
accelerations[j] -= bd * a.getMass();             // b accelerates toward a
Your work is a triangle. You want to divide the triangle into k distinct pieces.
If k is a power of 2 you can do this:
a
a a
b c d
b c d d
Each of those regions is roughly equal in size.
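One way to realize this in code (a sketch: it cuts the flattened triangle from option #1 into contiguous, equal-length index ranges rather than performing the recursive split drawn above, and it works for any k, not just powers of 2; dostuff is stubbed out):

#include <thread>
#include <vector>

void dostuff(int i, int j) { /* stand-in for the real pair kernel */ }

// binary-search integer sqrt, floors the result
long long isqrt(long long v)
{
    long long lo = 0, hi = 3000000;
    while (lo < hi)
    {
        long long mid = (lo + hi + 1) / 2;
        if (mid * mid <= v) lo = mid; else hi = mid - 1;
    }
    return lo;
}

// Each thread walks its own half-open range [begin, end) of the flattened
// triangle. Only the first pair needs the sqrt; after that the pair advances
// exactly like the original nested loop.
void processChunk(long long begin, long long end)
{
    int i = (int) ((1 + isqrt(1 + 8 * begin)) / 2);
    int j = (int) (begin - (long long) i * (i - 1) / 2);
    for (long long x = begin; x < end; ++x)
    {
        dostuff(i, j);
        if (++j == i) { ++i; j = 0; } // step to the next pair, no sqrt needed
    }
}

int main()
{
    const int n = 1000, k = 4; // illustrative sizes
    long long total = (long long) n * (n - 1) / 2;
    std::vector<std::thread> threads;
    for (int t = 0; t < k; ++t)
        threads.emplace_back(processChunk, t * total / k, (t + 1) * total / k);
    for (auto& th : threads)
        th.join();
}

Since every chunk contains (almost) the same number of pairs, no shared counter is needed; the trade-off is that this static split only balances well when the cost per pair is roughly uniform.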
Question Description : Given an array arr[] of length N, the task is to find the XOR of the pairwise sums of every possible unordered pair of the array.
I solved this question using the method described in this post.
My Code :
int xorAllSum(int a[], int n)
{
    int curr, prev = 0;
    int ans = 0;
    for (int k = 0; k < 32; k++) {
        int o = 0, z = 0;
        for (int i = 0; i < n; i++) {
            if (a[i] & (1 << k)) {
                o++;
            }
            else {
                z++;
            }
        }
        curr = o * z + prev;
        if (curr & 1) {
            ans = ans | (1 << k);
        }
        prev = o * (o - 1) / 2;
    }
    return ans;
}
Code Description : I am finding out, for each bit, whether our answer will have that bit set or not. To do this, for each bit position I count the numbers which have a set bit at that position (represented by 'o' in the code) and the numbers which have an unset bit at that position (represented by 'z').
Now if we pair up these numbers (one with a set bit and one with an unset bit at that position), each such pair gets a set bit in its sum, and we need the XOR of all pair sums.
The 'prev' term is included to account for the carry-over bits coming from the position below. The answer will have a set bit at the current position only if the number of such contributions is odd, as we are doing an XOR operation.
But I am not getting the correct output. Can anyone please help me?
Test Cases :
n = 3, a[] = {1, 2, 3} => (1 + 2) ^ (1 + 3) ^ (2 + 3)
=> 3 ^ 4 ^ 5 = 2
=> Output : 2
n = 6
a[] = {1, 2, 10, 11, 18, 20}
Output : 50
n = 8
a[] = {10, 26, 38, 44, 51, 70, 59, 20}
Output : 182
Constraints : 2 <= n <= 10^8
Also, here we need to consider unordered pairs and not ordered pairs for the answer.
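For checking small cases, a direct O(n^2) reference that computes the definition literally (a quick sketch; xorAllSumBrute is just an illustrative name):

#include <iostream>

// XOR of every unordered pair sum, straight from the definition
int xorAllSumBrute(int a[], int n)
{
    int ans = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            ans ^= a[i] + a[j];
    return ans;
}

int main()
{
    int a[] = { 1, 2, 10, 11, 18, 20 };
    std::cout << xorAllSumBrute(a, 6) << std::endl; // expected: 50
}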
PS : I know that the same question has been asked before but I couldn't explain my problem with this much detail in the comments so I created a new post. I am new here, so please pardon me and give me your feedback :)
I suspect that the idea in the post you referred to is missing important details, if it could work at all with the stated complexity. (I would be happy to better understand and be corrected should that author wish to clarify their method further.)
Here's my understanding of at least one author's intention for an O(n * log n * w) solution, where w is the number of bits in the largest sum, as well as JavaScript code with a random comparison to brute force to show that it works (easily translatable to C or Python).
The idea is to examine the contribution of each bit one at a time. Since in any one iteration we are only interested in whether the kth bit of the sums is set, we can remove all parts of the numbers that include higher bits, taking each of them modulo 2^(k + 1).
Now the sums that necessarily have the kth bit set lie in the intervals [2^k, 2^(k + 1)) (that's when the kth bit is the highest) and [2^(k + 1) + 2^k, 2^(k + 2) - 2] (when we have both the kth and (k+1)th bits set; 2^(k + 2) - 2 is the largest possible sum of two residues). For example, for k = 1 we reduce each number modulo 4; a pair sum is then at most 6, and bit 1 is set exactly when the sum falls in [2, 4) or equals 6. So in the iteration for each bit, we sort the input list (modulo 2^(k + 1)), and for each left summand, we decrement a pointer to the end of each of the two intervals and binary search the relevant start index.
// https://stackoverflow.com/q/64082509
// Returns the lowest index of a value
// greater than or equal to the target
function lowerIdx(a, val, left, right){
  if (left >= right)
    return left;
  const mid = left + ((right - left) >> 1);
  if (a[mid] < val)
    return lowerIdx(a, val, mid+1, right);
  else
    return lowerIdx(a, val, left, mid);
}
function bruteForce(A){
  let answer = 0;
  for (let i=1; i<A.length; i++)
    for (let j=0; j<i; j++)
      answer ^= A[i] + A[j];
  return answer;
}
function f(A, W){
  const n = A.length;
  const _A = new Array(n);
  let result = 0;
  for (let k=0; k<W; k++){
    for (let i=0; i<n; i++)
      _A[i] = A[i] % (1 << (k + 1));
    _A.sort((a, b) => a - b);
    let pairs_with_kth_bit = 0;
    let l1 = 1 << k;
    let r1 = 1 << (k + 1);
    let l2 = (1 << (k + 1)) + (1 << k);
    let r2 = (1 << (k + 2)) - 2;
    let ptr1 = n - 1;
    let ptr2 = n - 1;
    for (let i=0; i<n-1; i++){
      // Interval [2^k, 2^(k+1))
      while (ptr1 > i+1 && _A[i] + _A[ptr1] >= r1)
        ptr1 -= 1;
      const idx1 = lowerIdx(_A, l1-_A[i], i+1, ptr1);
      let sum = _A[i] + _A[idx1];
      if (sum >= l1 && sum < r1)
        pairs_with_kth_bit += ptr1 - idx1 + 1;
      // Interval [2^(k+1)+2^k, 2^(k+2)-2]
      while (ptr2 > i+1 && _A[i] + _A[ptr2] > r2)
        ptr2 -= 1;
      const idx2 = lowerIdx(_A, l2-_A[i], i+1, ptr2);
      sum = _A[i] + _A[idx2];
      if (sum >= l2 && sum <= r2)
        pairs_with_kth_bit += ptr2 - idx2 + 1;
    }
    if (pairs_with_kth_bit & 1)
      result |= 1 << k;
  }
  return result;
}
var As = [
  [1, 2, 3],                       // 2
  [1, 2, 10, 11, 18, 20],          // 50
  [10, 26, 38, 44, 51, 70, 59, 20] // 182
];
for (let A of As){
  console.log(JSON.stringify(A));
  console.log(`DP, brute force: ${ f(A, 10) }, ${ bruteForce(A) }`);
  console.log('');
}
var numTests = 500;
for (let i=0; i<numTests; i++){
  const W = 8;
  const A = [];
  const n = 12;
  for (let j=0; j<n; j++){
    const num = Math.floor(Math.random() * (1 << (W - 1)));
    A.push(num);
  }
  const fA = f(A, W);
  const brute = bruteForce(A);
  if (fA != brute){
    console.log('Mismatch:');
    console.log(A);
    console.log(fA, brute);
    console.log('');
  }
}
console.log("Done testing.");
Hi, I'm trying to understand this piece of code for determining the lexicographically minimal string rotation, but I can't seem to figure out why it works. I understand what the first two ifs do, but in the third one, when there is a new minimum, it takes the maximum of p and m + l + 1. Does anyone have an explanation?
int p = 0, l = 0, m = 0, n = 0;
string inp;
cin >> inp;
n = inp.size();
p = l = 1;
while (p < n && m + l + 1 < n) {
    if (inp[m + l] == inp[(p + l) % n])
        ++l;
    if (inp[m + l] < inp[(p + l) % n])
        p += l + 1, l = 0;
    if (inp[m + l] > inp[(p + l) % n]) {
        if (m + l + 1 < p) m = p;
        else m = m + l + 1;
        p = m + 1;
        l = 0;
    }
}
cout << m;
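For reference while studying it, here is a straightforward O(n^2) brute force for the same task (a sketch for comparison, independent of the snippet above): it keeps the starting index of the lexicographically smallest rotation, comparing rotations in place without building them.

#include <iostream>
#include <string>
using namespace std;

// Naive O(n^2) reference: keep the starting index of the smallest rotation.
int minRotationBrute(const string& s)
{
    int n = s.size(), best = 0;
    for (int r = 1; r < n; ++r)
    {
        for (int k = 0; k < n; ++k)
        {
            char cr = s[(r + k) % n], cb = s[(best + k) % n];
            if (cr != cb)
            {
                if (cr < cb) best = r;
                break;
            }
        }
    }
    return best;
}

int main()
{
    string inp;
    cin >> inp;
    cout << minRotationBrute(inp);
}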
I have a record called Move that is defined as follows:
type Move = {
    X : int
    Y : int
    By : CellState }
I have created a list of lists of moves (Move list list) to store some data. I want to remove duplicate entries from this list. Each sublist in my example has the same contents but in a different order. It looks as follows when printed:
[[{X = 5; Y = 1; By = R;}; {X = 5; Y = 0; By = B;}; {X = 4; Y = 0; By = B;}];
 [{X = 5; Y = 0; By = B;}; {X = 4; Y = 0; By = B;}; {X = 5; Y = 1; By = R;}];
 [{X = 4; Y = 0; By = B;}; {X = 5; Y = 1; By = R;}; {X = 5; Y = 0; By = B;}]]
This list contains 3 lists that each have 3 records. Each sublist has the same records but in a different order. I want to know if there's a way to remove the duplicate sublists from the main list.
If you sort each sublist first, the duplicates become structurally equal, and List.distinct will do the trick (F# records and lists get structural equality and comparison by default):
yourList
|> List.map List.sort
|> List.distinct
I'm writing my own implementation of a Neural Network class in C++. I'm not sure how to index the weights in this statement:
in = in + (inputs [l] * calcWeights [l]) ;
The reason is that there can be more weights than inputs. Here is my code:
void Train (int numInputs, int numOutputs, double inputs [], double outputs []) {
    // Set the Random Seed:
    srand (time (0)) ;
    // Weights (n input(s) * n output(s) = n weight branch(es)):
    double calcWeights [numInputs * numOutputs] ;
    // Errors (n input(s) * n output(s) = n error branch(es)):
    double errors [numInputs * numOutputs] ;
    // Set the Weights to random:
    for (int j = 0 ; j < numInputs ; j = j + 1) {
        calcWeights [j] = ((-1 * numInputs) + (rand () % (1 * numInputs))) ;
    }
    // Train:
    int i = 0 ;
    double in = 0 ;
    double out [numOutputs] ;
    while (i < 14999) {
        // Get the estimated output:
        for (int k = 0 ; k < numOutputs ; k = k + 1) {
            for (int l = 0 ; l < numInputs ; l = l + 1) {
                in = in + (inputs [l] * calcWeights [l]) ;
            }
            out [k] = in + GetBias () ;
        }
        for (int m = 0 ; m < numOutputs ; m = m + 1) {
            errors [m] = outputs [m] - out [m] ;
        }
        // Increment the iterator:
        i = i + 1 ;
    }
}
From your clarification in comments, I believe modifying your loop a bit will give you what you want.
for (int k = 0 ; k < numOutputs ; k = k + 1) {
    in = 0; //Reset in to 0 at the beginning of each output loop
    for (int l = 0 ; l < numInputs ; l = l + 1) {
        in = in + (inputs [l] * calcWeights [l + k*numInputs]) ;
    }
    out [k] = in + GetBias () ;
}
You should also make sure you initialize all of the weights; the initialization loop should run to numInputs * numOutputs, not just numInputs:
for (int j = 0 ; j < (numInputs * numOutputs) ; j = j + 1) {
    calcWeights [j] = ((-1 * numInputs) + (rand () % (1 * numInputs))) ;
}
As a couple of style points, you can replace k = k + 1 with simply ++k; likewise, you can replace in = in + ...; with in += ...;
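Applied to the loop above (same variables, just the style tweaks):

for (int k = 0 ; k < numOutputs ; ++k) {
    in = 0 ;
    for (int l = 0 ; l < numInputs ; ++l) {
        in += inputs [l] * calcWeights [l + k*numInputs] ;
    }
    out [k] = in + GetBias () ;
}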