passing vector elements in a function [closed] - c++

I am new to multithreading so any suggestions will be very useful! I am implementing a multithreading program according to the following requirements:
The user inputs a list of integers as a vector. Each vector element N represents a cell.
The vector elements are passed to a threading function, from which the total cells at a certain time are calculated
The lifetime of the cells ( 0.1 + N % 8 seconds) is calculated. At half their lifetime, they breed a number (( N – N % 8 ) / 8 ) of child cells.
The child cells live the same amount as their parents, but die without breeding when their lifetime is over.
A cell monitor is started before the first genesis cell thread is created. The monitor will print out the number of existing live cells every second, so as to monitor how many cells are live.
A main function awaits input from the user. Once the vector inputs are given, it starts the monitor thread and then starts the genesis cell threads.

The code in the question has given me some uncertainties on how the reproduction is supposed to be performed, but I have made the following assumptions:
((N - N % 8) / 8) is the number of child cells produced by each parent N when it reaches half its lifetime. This is not what the code in the question implies.
The child cells live for the same amount of time as their parents, starting from when they are created, so they outlive their parents instead of dying at the same time. Again, this is not what the code in the question does.
The scheme I would employ to accomplish the outlined simulation would be to have one thread which controls a time variable, either the main thread or a thread created specifically for the purpose. This thread will increment the time as needed, but will wait for all threads to check if their cell has died or needs to reproduce and perform necessary operations between increments. The example below demonstrates this approach.
I find this is a little bit easier and perhaps clearer when using std::atomic variables to store the number of living cells, the simulation time, the number of threads needing to be checked, etc. When using an atomic variable, the necessary memory fencing is performed for any increment or decrement without needing a std::mutex or other explicit synchronization. Additionally, it may be better to implement a class for the cells, that way they can store their own lifetime, if they are alive still, whether they are a child or a parent, whether they have children, etc.
Example
#include <iostream>
#include <thread>
#include <atomic>
#include <vector>
#include <mutex>
#include <string>  // std::to_string
#include <cstdlib> // atoi
class Cell {
public:
Cell(int x, bool child = false) {
lifetime = (0.1 + x % 8); // note: lifetime is an int, so the 0.1 is truncated away
n = x;
is_child = child;
alive = true;
has_children = false;
}
int lifetime;
int n;
bool is_child;
bool has_children;
bool alive;
};
std::mutex mtx; // This will be used to synchronize threads.push_back()
// when creating children cells
std::vector<Cell> cells;
std::vector<std::thread> threads;
std::atomic<int> t; // The simulation time
std::atomic<int> living; // The number of living cells
std::atomic<int> check; // This will be used to ensure every thread goes through the
// necessary checks every time step
void thread_function(Cell cell) {
int prev = t;
while (living > 0) {
while (prev == t) {if (living == 0) return;}
prev = (int)t;
if (!cell.has_children && !cell.is_child && t > cell.lifetime / 2.0) {
cell.has_children = true;
// Create children and send them to new threads
for (int ii = 0; ii < ((cell.n - cell.n % 8) / 8); ii ++) {
living ++;
Cell c(ii, true); // Create a new cell which will die
c.lifetime = cell.lifetime + t; // {lifetime} seconds from now
mtx.lock();
threads.push_back(std::thread(thread_function, c));
mtx.unlock();
}
}
if (cell.alive && t >= cell.lifetime) {
cell.alive = false;
living --;
}
check --;
}
}
int main(int argn, char** argv) {
living = argn - 1;
if (argn > 1) {
for (int ii = 1; ii < argn; ii ++) {
cells.push_back(Cell(atoi(argv[ii])));
threads.push_back(std::thread(thread_function, cells[ii-1]));
}
}
t = 0;
while (living > 0) {
std::cout << "Total Cells: "+std::to_string(living)+" [ "+std::to_string(t)+
" s ]\n" << std::flush;
check = threads.size();
t ++;
while (check > 0) {
if (living == 0) break;
}
}
std::cout << "Total Cells: "+std::to_string(living)+" [ "+std::to_string(t)+
" s ]\n" << std::flush;
for (int ii = 0; ii < threads.size(); ii ++) {
threads[ii].join();
}
}
./cells 1 2 3 4 5 6 7
Total Cells: 7 [ 0 s ]
Total Cells: 6 [ 1 s ]
Total Cells: 5 [ 2 s ]
Total Cells: 4 [ 3 s ]
Total Cells: 3 [ 4 s ]
Total Cells: 2 [ 5 s ]
Total Cells: 1 [ 6 s ]
Total Cells: 0 [ 7 s ]
./cells 21 12 6 7 1 17 25
Total Cells: 7 [ 0 s ]
Total Cells: 9 [ 1 s ]
Total Cells: 4 [ 2 s ]
Total Cells: 7 [ 3 s ]
Total Cells: 6 [ 4 s ]
Total Cells: 5 [ 5 s ]
Total Cells: 4 [ 6 s ]
Total Cells: 2 [ 7 s ]
Total Cells: 0 [ 8 s ]
You can achieve the same result using mutexes by surrounding every increment and decrement of check, t, and living with a lock.
Note: Using global variables as I have is not good practice. I have done so only to simplify the demonstration of the multithreading; in practice it would be best to wrap them in a namespace, refactor the entire simulation into a class, or something of the like.
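The mutex-based alternative mentioned above can be sketched as follows. This is only an illustration, not the answer's code: the class name GuardedCounter and the function run_workers are hypothetical, and the workload (each thread nets +1) is made up so that the final value is easy to predict.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: a plain int whose every access is protected by a mutex.
class GuardedCounter {
public:
    void add(int delta) {
        std::lock_guard<std::mutex> lock(mtx_);  // unlocks automatically on scope exit
        value_ += delta;
    }
    int get() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return value_;
    }
private:
    mutable std::mutex mtx_;
    int value_ = 0;
};

// Starts `workers` threads. Each one increments the shared counter `steps`
// times and decrements it `steps - 1` times, so each thread nets +1.
int run_workers(int workers, int steps) {
    GuardedCounter living;
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&living, steps] {
            for (int i = 0; i < steps; ++i) living.add(1);
            for (int i = 0; i < steps - 1; ++i) living.add(-1);
        });
    }
    for (auto& th : pool) th.join();
    return living.get();
}
```

Calling run_workers(4, 1000) should return 4 regardless of how the threads interleave, which is the same guarantee std::atomic<int> gives for single increments and decrements.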


Creating the Backtracking Algorithm for n-queen Problem [closed]

I have tried to come up with a solution to the n-queen problem, through backtracking. I have created a board, and I think I have created functions which checks whether a piece can be placed at position column2 or not, in comparison to a piece at position column1. And I guess I somehow want to loop through the columns, to check if the current piece is in a forbidden position to any of the power pieces already placed at the first row through the current minus one. I haven't done this yet, but I'm just confused at the moment, so I can't really see how I should do it.
Let me share the code I have written so far
// Method for creating chessboard
vector<vector<vector<int>>> create_chessboard(int size_of_board)
{
vector<int> v1;
vector<vector<int>> v2;
vector<vector<vector<int>>> v3;
for (int i = 0; i < size_of_board; i++)
{
for (int j = 0; j < size_of_board; j++)
{
v1.clear();
v1.push_back(i);
v1.push_back(j);
v2.push_back(v1);
}
v3.push_back(v2);
v2.clear();
}
return v3;
}
// Method for visualizing chessboard
void visualize_board(vector<vector<vector<int>>> chess, int dimension_of_board)
{
int i = 1;
for (vector<vector<int>> rows : chess)
{
for (int j = 0; j < dimension_of_board; j++)
{
cout << "(" << rows[j][0] << "," << rows[j][1] << ")" << " ";
}
cout << endl;
}
}
// Method for checking if two coordinates are on the same diagonal
bool check_diagonal(vector<int> coordinate1, vector<int> coordinate2)
{
if(abs(coordinate1[1] - coordinate2[1]) == abs(coordinate1[0] - coordinate2[0]))
{
return true;
}
return false;
}
bool check_column(vector<int> coordinate1, vector<int> coordinate2)
{
if(coordinate1[1] == coordinate2[1])
{
return true;
}
return false;
}
bool check_row(vector<int> coordinate1, vector<int> coordinate2)
{
if (coordinate1[0] == coordinate2[0])
{
return true;
}
return false;
}
bool check_allowed_positions(vector<int> coordinate1, vector<int> coordinate2, int column)
{
if (check_diagonal(coordinate1, coordinate2))
{
return false;
}
if (check_column(coordinate1, coordinate2))
{
return false;
}
if (check_row(coordinate1, coordinate2))
{
return false;
}
return true;
}
vector<vector<int>> solve_nqueen(vector<vector<vector<int>>> board, int dimension_of_board, int row)
{
vector<int> first_element = board[0][0];
vector<vector<int>> solution_space;
if (dimension_of_board == row)
{
cout << "we found a solution!";
}
/*
if (dimension_of_board == row)
{
}
for (int j = 0; j < dimension_of_board; j++)
{
if (check_allowed_positions(board, row, j))
{
do something here
solve_nqueen(board, dimension_of_board, row+1);
}
else
{
do something here;
}
}
return;
*/
return solution_space;
}
I would be really happy if someone could just lay up a few steps I have to take in order to build the solve_nqueen function, and maybe some remarks on how I could do that. If I should complement with some further information, just let me know! I'm happy to elaborate.
I hope this isn't a stupid question, but I have been trying to search the internet for a solution. But I didn't manage to use what I found.
Best wishes,
Joel
There is not always a solution, e.g. not for 2 queens on a 2x2 board, or for 3 queens on a 3x3 board.
This is a well-known problem (which can also be found on the internet). According to this, there is no simple rule or structure for finding a solution. In fact, you could reduce the problem by symmetries, but that is not that simple either.
According to this, you have to loop through all (n out of n x n) candidate placements and do all tests for every queen. (In fact, you can halve the work again by checking each pair of queens only once, but that is not much, and such a reduction takes some time, too.)
Note: Your check routines are correct.
For 8 queens on an 8x8 board, write 8 nested loops, each running i(x) from 0 to 63 (row is i(x)%8 and column is i(x)/8). You also need to check that a queen does not sit on another queen, but your check routines will already find that. Within the second nested loop, you can already check whether the first two queens are okay; if they are not, you do not have to go any deeper and can move straight on to the next value (move the second queen to a new position).
Also, I propose not writing the search for the general n problem at first, but for a fixed n=8 or n=7 problem. (That is easier for the beginning.)
Speed-Ups:
While going deeper into the nested loops, you might hold a quick record (array) of positions which already did not work for upper loops (still 64 records to check, but could be written to be faster than doing your check routines again).
Or even better, do the inner loops only through a list from remaining candidates, much less than (n x n) positions.
There should be some more options for speed-ups, which you might find.
Final proposal: do not only wait for the full result to come, but also track, when e.g. you find a valid position of 5 queens, then of 6 queens and so on - which will be more fun then (instead of waiting ages with nothing happening).
A further idea is not to loop, e.g. from 0 to 63 for each queen, but "randomly", which might also lead to more surprising results. For this, shuffle an array 0 .. 63 into a random order. Then still do the loop from 0 to 63, but use the loop variable only as the index into the random vector. It would be even more interesting to create 8 random vectors, one for each queen. If you run this program then, anything could happen ... the first few trials could (theoretically) already deliver a successful result.
If you would like to become super efficient, please note that the queen state on the 8x8 board can be stored in one 64-bit-integer variable (64 times '0' or '1' where '1' means here is queen. Keyword: bitboards). But I didn't mention this in the beginning, because the approach which you started is quite different.
And from that on, you could create 64 bit masks for each queen position, to each position to which a queen can go. Then you only need to do 1 "bitwise AND" operation of two (properly defined) 64-bit variables, like a & b, which replaces your (diagonal-, column-, row-) check routines by only one operation and thus is much faster.
Avoid too many function calls, or use inline.
... an endless list of possible dramatic speed-ups: compiler options, parallelization, better algorithms, avoid cache misses (work on a possibly low amount of memory or access memory in a regular way), ... as usual ...
My best answer, e.g. for 8-queen problem:
Queen 1 is between 0 .. 7
Queen 2 is between 8 .. 15
Queen 3 is between 16 .. 23
Queen 4 is between 24 .. 31
Queen 5 is between 32 .. 39
Queen 6 is between 40 .. 47
Queen 7 is between 48 .. 55
Queen 8 is between 56 .. 63
because all 8 queens have to be on different rows!
These are the limits of the nested loops then, which gives "only"
8 * 8 * 8 * 8 * 8 * 8 * 8 * 8 = 16777216
possibilities to be checked. This can be quick on modern machines.
Then probably you don't need anything more sophisticated (to which my first answer refers - for the 8x8 queens problem.) Anyway, you could still also keep a record of which column is still free, while diving into the nested loops, which yields a further dramatic cut down of checks.
I wrote some C code (similar to C++) to verify my answer. In fact, it is very fast, much less than a second (real 0m0,004s; user 0m0,003s; sys 0m0,001s). The code finds the correct number of 92 solutions for the 8x8 queens problem.
#include <stdio.h>
int f(int a, int b)
{
int r1, c1, r2, c2, d1, d2;
int flag = 1;
r1 = a / 8;
r2 = b / 8;
c1 = a % 8;
c2 = b % 8;
d1 = r1 - r2;
d2 = c1 - c2;
if( d1 == d2 || d1 == -d2 || c1 == c2 ) flag=0;
return flag;
}
int main()
{
int p0,p1, p2, p3, p4, p5, p6, p7;
int solutions=0;
for(p0=0; p0<8; p0++)
{
for(p1=8; p1<16; p1++)
{
if( f(p0,p1) )
for(p2=16; p2<24; p2++)
{
if( f(p0,p2) && f(p1,p2) )
for(p3=24; p3<32; p3++)
{
if( f(p0,p3) && f(p1,p3) && f(p2,p3) )
for(p4=32; p4<40; p4++)
{
if( f(p0,p4) && f(p1,p4) && f(p2,p4) && f(p3,p4))
for(p5=40; p5<48; p5++)
{
if( f(p0,p5) && f(p1,p5) && f(p2,p5) && f(p3,p5) && f(p4,p5) )
for(p6=48; p6<56; p6++)
{
if( f(p0,p6) && f(p1,p6) && f(p2,p6) && f(p3,p6) && f(p4,p6) && f(p5,p6))
for(p7=56; p7<64; p7++)
{
if( f(p0,p7) && f(p1,p7) && f(p2,p7) && f(p3,p7) && f(p4,p7) && f(p5,p7) && f(p6,p7))
{
solutions++;
// 0 .. 63 integer print
printf("%2i %2i %2i %2i %2i %2i %2i %2i\n",
p0,p1,p2,p3,p4,p5,p6,p7);
// a1 .. h8 chess notation print
//printf("%c%d %c%d %c%d %c%d %c%d %c%d %c%d %c%d\n",
//p0%8+'a', p0/8+1, p1%8+'a', p1/8+1, p2%8+'a', p2/8+1, p3%8+'a', p3/8+1,
//p4%8+'a', p4/8+1, p5%8+'a', p5/8+1, p6%8+'a', p6/8+1, p7%8+'a', p7/8+1);
}
}
}
}
}
}
}
}
}
printf("%i solutions have been found\n",solutions);
return 0; /* 0 signals success to the calling environment */
}
Notes: Subroutine f checks if two queen positions are "ok" with each other (1 means true, 0 means false, in C). An inner loop is only entered, if all already selected positions (in outer loops) are "ok" with each other.
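For a general n, the same "one queen per row" idea can be written as a recursive backtracking search instead of eight fixed loops. The following is a minimal sketch under my own assumptions, not the code from either post: the names QueenSearch and count_queens are hypothetical, and instead of comparing pairs of queens it keeps a record of which columns and diagonals are already occupied (the "which column is still free" idea mentioned above).

```cpp
#include <vector>

// Hypothetical helper that places one queen per row and counts full placements.
struct QueenSearch {
    int n;
    std::vector<bool> col;   // col[c] is true if column c holds a queen
    std::vector<bool> diag;  // indexed by row - c + n - 1
    std::vector<bool> anti;  // indexed by row + c
    long long solutions = 0;

    explicit QueenSearch(int size)
        : n(size), col(size, false), diag(2 * size - 1, false),
          anti(2 * size - 1, false) {}

    void place(int row) {
        if (row == n) {          // all rows filled: one complete solution
            ++solutions;
            return;
        }
        for (int c = 0; c < n; ++c) {
            const int d = row - c + n - 1;
            const int a = row + c;
            if (col[c] || diag[d] || anti[a]) continue;  // square is attacked
            col[c] = true;  diag[d] = true;  anti[a] = true;
            place(row + 1);                               // go one row deeper
            col[c] = false; diag[d] = false; anti[a] = false;  // backtrack
        }
    }
};

// Returns the number of ways to place n non-attacking queens on an n x n board.
long long count_queens(int n) {
    if (n <= 0) return 0;
    QueenSearch search(n);
    search.place(0);
    return search.solutions;
}
```

count_queens(8) should give the 92 solutions reported above, and count_queens(2) and count_queens(3) should give 0, matching the remark that those boards have no solution.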

How to make my CodeChef solution code faster?

I am a beginner currently in first semester. I have been practising on Code Chef and am stuck at this problem. They are asking to reduce the execution time of my code. The problem goes as follows:
Meliodas and Ban are fighting over chocolates. Meliodas has X chocolates, while Ban has Y. Whoever has lesser number of chocolates eats as many chocolates as he has from the other's collection. This eatfest war continues till either they have the same number of chocolates, or at least one of them is left with no chocolates.
Can you help Elizabeth predict the total no of chocolates they'll be left with at the end of their war?
Input:
First line will contain T, number of testcases. Then the testcases follow.
Each testcase contains of a single line of input, which contains two integers X,Y, the no of chocolates Meliodas and Ban have, respectively.
Output:
For each testcase, output in a single line the no of chocolates that remain after Ban and Meliodas stop fighting.
Sample Input:
3
5 3
10 10
4 8
Sample Output:
2
20
8
My code is as follows:
#include <iostream>
using namespace std;
int main()
{
unsigned int t,B,M;
cin>>t;
while(t--)
{
cin>>M>>B;
if(B==M)
{
cout<<B+M<<endl;
}
else
{
for(int i=1;B!=M;i++)
{
if(B>M)
B=B-M;
else
M=M-B;
}
cout<<M+B<<endl;
}
}
return 0;
}
Assuming that B and M are different from 0, this algorithm corresponds to one version of the Euclidean algorithm. Therefore, you can simply write (std::gcd is declared in <numeric> since C++17):
std::cout << 2 * std::gcd(B, M) << "\n";
If at least one of the quantities is equal to 0, then just print B + M.
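Put together, the two cases above fit in one small function. This is a sketch assuming a C++17 compiler; the function name chocolates_left is hypothetical, and the wide integer type is my own choice, since the post does not state the input limits.

```cpp
#include <numeric>  // std::gcd (C++17)

// Returns the total number of chocolates left once the fight stops.
unsigned long long chocolates_left(unsigned long long x, unsigned long long y) {
    if (x == 0 || y == 0) {
        return x + y;           // nothing can be eaten, the total never changes
    }
    return 2 * std::gcd(x, y);  // both end up with gcd(x, y) chocolates
}
```

For the sample input, chocolates_left(5, 3), chocolates_left(10, 10) and chocolates_left(4, 8) give 2, 20 and 8.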
After realizing that your code was correct, I wondered where there could be any algorithmic improvement. I realized that eating as many chocolates from the peer as one has is in fact close to a modulo operation. If both numbers are close, a subtraction could be slightly faster than a modulo, but if one number is high while the other is 1, you get the result immediately instead of looping a great number of times.
The key to preventing stupid errors is to realize that if a modulo is 0, the high number is a multiple of the small one, and we must stop immediately and write twice the lower value.
And care should be taken that if one of the initial counts is 0, the total number will never change.
So the outer loop should become:
if(B==M || B == 0 || M == 0)
{
cout<<B+M<<'\n';
}
else {
for (;;) {
if (M < B) {
B = B % M;
if (B == 0) {
cout << M * 2 << '\n';
break;
}
}
else {
M = M % B;
if (M == 0) {
cout << B * 2 << '\n';
break;
}
}
}
}
...
Note: no infinite loop is possible here, because the modulo ensures that if, for example, M > B > 0, then after M = M % B you will have B > M >= 0. As the case == 0 is explicitly handled, the number of iterations cannot be higher than the lower number.

Counting active tasks using start time and duration in C++

The input consists of a set of tasks given in increasing order of start time, and each task has a certain duration associated.
The first line is number of tasks, for example
3
2 5
4 23
7 4
This means that there are 3 tasks. The first one starts at time 2, and ends at 7 (2+5). Second starts at 4, ends at 27. Third starts at 7, ends at 11.
We assume each task starts as soon as it is ready, and does not need to wait for a processor or anything else to free up.
This means we can keep track of number of active tasks:
Time #tasks
0 - 2 0
2 - 4 1
4 - 11 2
11 - 27 1
I need to find 2 numbers:
Max number of active tasks at any given time (2 in this case) and
Average number of active tasks over the entire duration computed here as :
[ 0*(2-0) + 1*(4-2) + 2*(11-4) + 1*(27-11) ] / 27
For this,
I have first found the time when all tasks have come to an end using the below code:
#include "stdio.h"
#include "stdlib.h"
typedef struct
{
long int start;
int dur;
} task;
int main()
{
long int num_tasks, endtime;
long int maxtime = 0;
scanf("%ld",&num_tasks);
task *t = new task[num_tasks];
for (int i=0;i<num_tasks;i++)
{
scanf("%ld %d",&t[i].start,&t[i].dur);
endtime = t[i].start + t[i].dur;
if (endtime > maxtime)
maxtime = endtime;
}
printf("%ld\n",maxtime);
}
Can this be done using Priority Queues implemented as heaps ?
Your question is rather broad, so I am going to just give you a teaser answer that will, hopefully, get you started, attempting to answer your first part of the question, with a not necessarily optimized solution.
In your toy input, you have:
2 5
4 23
7 4
Thus you can compute and store, in the array of structs that you have, the end time of a task rather than its duration, for later use. That gives us an array like this:
2 7
4 27
7 11
Your array is already sorted (because the input is given in that order) by start time, and that's useful. Use std::sort to sort the array, if needed.
Observe how you could check for the end time of the first task versus the start time of the other tasks. With the right comparison, you can determine the number of active tasks along with the first task. Checking whether the end time of the first task is greater than the start time of the second task, if true, denotes that these two tasks are active together at some point.
Then you would do the same for the comparison of the first with the third task. After that you would know how many tasks were active in relation with the first task.
Afterwards, you are going to follow the same procedure for the second task, and so on.
Putting all that together in code, we get:
#include "stdio.h"
#include "stdlib.h"
#include <algorithm>
typedef struct {
int start;
int dur;
int end;
} task;
bool compare (const task& a, const task& b) {
return ( a.start < b.start ); // strict ordering by start time, as std::sort expects
}
int main() {
int num_tasks;
scanf("%d",&num_tasks);
task *t = new task[num_tasks];
for (int i=0;i<num_tasks;i++) {
scanf("%d %d",&t[i].start,&t[i].dur);
t[i].end = t[i].start + t[i].dur;
}
std::sort(t, t + num_tasks, compare);
for (int i=0;i<num_tasks;i++) {
printf("%d %d\n", t[i].start, t[i].end);
}
int max_noOf_tasks = 0;
for(int i = 0; i < num_tasks - 1; i++) {
int noOf_tasks = 1;
for(int j = i + 1; j < num_tasks; j++) {
if(t[i].end > t[j].start)
noOf_tasks++;
}
if(max_noOf_tasks < noOf_tasks)
max_noOf_tasks = noOf_tasks;
}
printf("Max. number of active tasks: %d\n", max_noOf_tasks);
delete [] t;
}
Output:
2 7
4 27
7 11
Max. number of active tasks: 2
Now, good luck with the second part of your question.
PS: Since this is C++, you could have used a std::vector to store your structs, rather than a plain array. That way you would avoid manual memory management too, since the vector takes care of that for you automatically.
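On the question of whether a priority queue can be used: one possible approach, which is different from the pairwise comparison above, is to keep the end times of the currently running tasks in a min-heap. The sketch below assumes the tasks arrive sorted by start time, as stated in the question; the names TaskStats and task_stats are hypothetical. For the average it uses the fact that the weighted sum in the question's formula equals the sum of all durations, divided by the latest end time.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct TaskStats {
    int max_active;         // largest number of tasks running at the same time
    double average_active;  // time-weighted average over [0, latest end time]
};

// tasks holds (start, duration) pairs sorted by increasing start time.
TaskStats task_stats(const std::vector<std::pair<long, long>>& tasks) {
    // Min-heap of end times of the tasks that are still running.
    std::priority_queue<long, std::vector<long>, std::greater<long>> ends;
    int max_active = 0;
    long latest_end = 0;
    long total_duration = 0;
    for (const auto& task : tasks) {
        const long start = task.first;
        const long end = task.first + task.second;
        // Drop every task that has finished by the time this one starts.
        while (!ends.empty() && ends.top() <= start) ends.pop();
        ends.push(end);
        max_active = std::max(max_active, static_cast<int>(ends.size()));
        latest_end = std::max(latest_end, end);
        total_duration += task.second;
    }
    const double average =
        latest_end > 0 ? static_cast<double>(total_duration) / latest_end : 0.0;
    return TaskStats{max_active, average};
}
```

For the toy input (2 5), (4 23), (7 4) this gives a maximum of 2 and an average of 32/27, matching the table in the question. Each task is pushed and popped at most once, so the running time is O(n log n).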

How to improve upon this?

There are n groups of friends staying in the queue in front of bus station. The i-th group consists of ai men. Also, there is a single bus that works on the route. The size of the bus is x, that is it can transport x men simultaneously.
When the bus comes (it always comes empty) to the bus station, several groups from the head of the queue go into the bus. Of course, groups of friends don't want to split, so they go to the bus only if the bus can hold the whole group. On the other hand, no one wants to lose his position, that is, the order of groups never changes.
The question is: how to choose the size x of the bus in such a way that the bus can transport all the groups and everytime when the bus moves off the bus station there is no empty space in the bus (the total number of men inside equals to x)?
Input Format:
The first line contains the only integer n (1≤n≤10^5). The second line contains n space-separated integers a1,a2,…,an (1≤ai≤10^4).
Output Format:
Print all the possible sizes of the bus in the increasing order.
Sample:
8
1 2 1 1 1 2 1 3
Output:
3 4 6 12
I made this code:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int main(void)
{
int max=0,sum=0,i,n;
cin>>n;
int values[100000];
for ( i = 0; i < n; i++ )
{
cin>>values[i];
sum = sum + values[i];
if ( values[i] > max )
max = values[i];
}
int p = 0,j;
int count = 0;
vector<int> final;
for ( i = 0; i < n; i++ )
{
p = p + values[i];
j = 0;
if ( p >= max && sum%p == 0)
{
flag = 0;
while ( j < n )
{
garb = p;
while (garb!= 0)
{
garb = garb - values[j++];
if ( garb < 0 )
flag = 1;
}
}
if ( flag == 0 )
{
final.push_back(p);
count++;
}
}
}
sort(final.begin(),final.end());
for ( j = 0; j < count; j++ )
{
cout<<final[j]<<"\t";
}
return 0;
}
Edit: I did this in which basically, I am checking if the found divisor satisfies the condition, and if at any point of time, I get a negative integer on taking difference with the values, I mark it by using a flag. However, it seems to give me a seg fault now. Why?
I firstly, calculated the maximum value out of the all possible values, and then, I checked if its a divisor of the sum of the values. However, this approach doesn't work for the input as:
10
2 2 1 1 1 1 1 2 1 2
My output is
2 7 14
whereas the output should be
7 14
only.
Any other approach that I can go with?
Thanks!
I can think of the following simple solution (since your present concern is correctness and not time complexity):
Calculate the sum of all ai's (as you are already doing).
Calculate the maximum of all ai's (as you are already doing).
Find all the factors of sum that are >= max(ai). (The comparison must include equality: in the sample, max(ai) = 3 and 3 is a valid bus size.)
For each factor, iterate through the ai's and check whether the bus condition is satisfied.
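The steps above can be sketched as follows. This is an illustration under my own assumptions rather than a reference solution: the function names fits_exactly and bus_sizes are hypothetical, and as a shortcut the candidates are taken from the prefix sums of the queue, since the first trip always carries a prefix of the queue and so any valid size must be one of those sums.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// True if the groups can board in order so that every trip carries exactly x.
bool fits_exactly(const std::vector<int>& groups, long long x) {
    long long load = 0;
    for (int g : groups) {
        load += g;
        if (load > x) return false;  // this group would have to split
        if (load == x) load = 0;     // bus is full, the next trip starts empty
    }
    return load == 0;                // the last trip must be full as well
}

// Returns all valid bus sizes in increasing order.
std::vector<long long> bus_sizes(const std::vector<int>& groups) {
    std::vector<long long> result;
    if (groups.empty()) return result;
    const long long sum = std::accumulate(groups.begin(), groups.end(), 0LL);
    const long long largest = *std::max_element(groups.begin(), groups.end());
    long long prefix = 0;
    for (int g : groups) {
        prefix += g;  // candidate size: the load of the first trip
        if (prefix >= largest && sum % prefix == 0 && fits_exactly(groups, prefix)) {
            result.push_back(prefix);
        }
    }
    return result;  // prefix sums grow strictly (ai >= 1), so this is sorted
}
```

For the sample queue 1 2 1 1 1 2 1 3 this yields 3 4 6 12, and for 2 2 1 1 1 1 1 2 1 2 it yields 7 14, because the candidate 2 is rejected by the per-trip check.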

C++ Intel TBB and Microsoft PPL, how to use next_permutation in a parallel loop?

I have Visual Studio 2012 with Intel parallel studio 2013 installed, so I have Intel TBB.
Say I have the following piece of code:
const int cardsCount = 12; // will be READ by all threads
// the required number of cards of each colour to complete its set:
// NOTE that the required number of cards of each colour is not the same as the total number of cards of this colour available
int required[] = {2,3,4}; // will be READ by all threads
Card cards[cardsCount]; // will be READ by all threads
int cardsIndices[cardsCount];// this will be permuted, permutations need to be split among threads !
// set "cards" to 4 cards of each colour (3 colours total = 12 cards)
// set cardsIndices to {0,1,2,3...,11}
// this variable will be written to by all threads, maybe have one for each thread and combine them later?? or can I use concurrent_vector<int> instead !?
int logColours[] = {0,0,0};
int permutationsCount = fact(cardsCount);
for (int pNum=0; pNum<permutationsCount; pNum++) // I want to make this loop parallel !!
{
int countColours[3] = {0,0,0}; // local loop variable, no problem with multithreading
for (int i=0; i<cardsCount; i++)
{
Card c = cards[cardsIndices[i]]; // accessed "cards"
countColours[c.Colour]++; // local loop variable, np.
// we got the required number of cards of this colour to complete it
if (countColours[c.Colour] == required[c.Colour]) // read global variable "required" !
{
// log that we completed this colour and go to next permutation
logColours[c.Colour] ++; // should I use a concurrent_vector<int> for this shared variable?
break;
}
}
std::next_permutation(cardsIndices, cardsIndices+cardsCount); // !! this is my main issue
}
What I'm calculating is how many times we will complete a colour if we pick randomly from available cards, and that's done exhaustively by going through each permutation possible and picking sequentially, when a colour is "complete" we break and go to the next permutation. Note that we have 4 cards of each colour but the required number of cards to complete each colour is {2,3,4} for Red, Green, Blue. 2 red cards are enough to complete red and we have 4 available, hence red is more likely to be completed than blue which requires all 4 cards to be picked.
I want to make this for-loop parallel, but my main problem is how to deal with "cards" permutations? you have ~0.5 billion permutation here (12!), if I have 4 threads how can I split this into 4 different quarters and let every thread go through each of them?
What if I don't know the number of cores the machine has and I want the program to automatically choose the right number of concurrent threads? surely there must be a way to do that using Intel or Microsoft tools?
This is my Card struct just in case:
struct Card
{
public:
int Colour;
int Symbol;
};
Let N = cardsCount, M = required[0] * required[1] * ... * required[maxColor].
Then, actually, your problem could be easily solved in O(N * M) time. In your very case, that is 12 * 2 * 3 * 4 = 288 operations. :)
One of possible ways to do this is to use a recurrence relation.
Consider a function logColours f(n, required). Let n be the current number of already considered cards; required is a vector from your example. Function returns the answer in a vector logColours.
You are interested in f(12, {2,3,4}). Brief recurrent calculation inside a function f could be written like this:
std::vector<int> f(int n, std::vector<int> require) {
if (cache[n].count(require)) {
// we have already calculated function with same arguments, do not recalculate it again
return cache[n][require];
}
std::vector<int> logColours(maxColor, 0); // maxColor = 3 in your example
for (int putColor=0; putColor<maxColor; ++putColor) {
if (/* there is still at least one card with color 'putColor'*/) {
// put a card of color 'putColor' on place 'n'
if (require[putColor] == 1) {
// means we've reached needed amount of cards of color 'putColor'
++logColours[putColor];
} else {
--require[putColor];
std::vector<int> logColoursRec = f(n+1, require);
++require[putColor];
// merge child array into your own.
for (int i=0; i<maxColor; ++i)
logColours[i] += logColoursRec[i];
}
}
}
// store logColours in a cache corresponding to this function arguments
cache[n][require] = std::move(logColours); // the parameter is named 'require'
return cache[n][require];
}
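The cache could be implemented as a std::unordered_map<int, std::map<std::vector<int>, std::vector<int>>>. (A std::unordered_map keyed by std::vector<int> would additionally need a user-supplied hash, since the standard library does not provide one for that type.)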
Once you understand the main idea, you'll be able to implement it in even more efficient code.
You can easily make your code run in parallel with 1, 2, ..., or cardsCount threads by fixing the first element of the permutation and calling std::next_permutation on the other elements independently in each thread.
Consider the following code:
// declarations
// #pragma omp parallel may be here
{ // start of a parallel section
const int start = (cardsCount * threadIndex) / threadNumber;
const int end = (cardsCount * (threadIndex + 1)) / threadNumber;
int cardsIndices[cardsCount]; // a local array for each thread
for (int firstElement = start; firstElement < end; ++firstElement) { // not const: it is incremented
cardsIndices[0] = firstElement;
// fill the other cardsIndices with the elements 0..cardsCount-1 in ascending order, skipping firstElement
do {
// your calculations go here
} while (std::next_permutation(cardsIndices + 1, cardsIndices + cardsCount)); // note the +1 here
}
}
If you wish to use OpenMP as a parallelization tool, you only have to add #pragma omp parallel just before the parallel section, and use the omp_get_thread_num() function to get a thread index.
You also do not have to use a concurrent_vector here, which would probably make your program extremely slow; use a thread-specific accumulation array instead:
logColours[threadNumber][3] = {};
++logColours[threadIndex][c.Colour];
If Card is a rather heavy class, I would suggest using const Card& c = ... instead of copying each time Card c = ....
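The splitting scheme above can also be tried out with plain std::thread instead of OpenMP. The following is a small sketch, not the answer's code: the function name count_permutations_parallel is hypothetical, and the per-permutation work is replaced by a simple counter so that the result can be checked against n!. Each thread writes only to its own slot of the accumulation vector.

```cpp
#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

// Visits every permutation of {0, ..., n-1} exactly once, with the work split
// by first element across thread_count threads, and returns how many were seen.
long long count_permutations_parallel(int n, unsigned thread_count) {
    if (n <= 0) return 0;
    if (thread_count == 0) thread_count = 1;
    std::vector<long long> per_thread(thread_count, 0);  // one slot per thread
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < thread_count; ++t) {
        pool.emplace_back([n, t, thread_count, &per_thread] {
            const long long total = n;
            const int start = static_cast<int>(total * t / thread_count);
            const int end = static_cast<int>(total * (t + 1) / thread_count);
            long long local = 0;
            for (int first = start; first < end; ++first) {
                std::vector<int> indices;
                indices.push_back(first);
                for (int v = 0; v < n; ++v) {
                    if (v != first) indices.push_back(v);  // ascending tail
                }
                do {
                    ++local;  // the real per-permutation work would go here
                } while (std::next_permutation(indices.begin() + 1, indices.end()));
            }
            per_thread[t] = local;  // no other thread touches this slot
        });
    }
    for (auto& th : pool) th.join();
    return std::accumulate(per_thread.begin(), per_thread.end(), 0LL);
}
```

With n = 6 the function should return 720 for any thread count, including counts larger than n, where some threads simply get an empty range.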
I guess this is an amateur-friendly version of what @Ixanezis means
If red wins
the final outcome will be: 2 red, 0-2 green, 0-3 blue
Say the winning red is A, and the other red is B, there are 12 ways to get A and B.
The following are the possible cases:
Cases               #Cards after A    #Cards before A   #pick green   #pick blue
0 green, 0 blue:    10! = 3628800     1! = 1            1             1
0 green, 1 blue:    9!  = 362880      2! = 2            1             4
0 green, 2 blue:    8!  = 40320       3! = 6            1             6
0 green, 3 blue:    7!  = 5040        4! = 24           1             4
1 green, 0 blue:    9!  = 362880      2! = 2            4             1
1 green, 1 blue:    8!  = 40320       3! = 6            4             4
1 green, 2 blue:    7!  = 5040        4! = 24           4             6
1 green, 3 blue:    6!  = 720         5! = 120          4             4
2 green, 0 blue:    8!  = 40320       3! = 6            6             1
2 green, 1 blue:    7!  = 5040        4! = 24           6             4
2 green, 2 blue:    6!  = 720         5! = 120          6             6
2 green, 3 blue:    5!  = 120         6! = 720          6             4
Let's take the sum of the row-wise products of those four columns, which is 29064960, and then multiply by 12, giving 348779520.
Similarly, you can calculate the cases where green wins and where blue wins.
You can use std::thread::hardware_concurrency() from <thread>. Quoting from "C++ Concurrency in Action" by A. Williams:
One feature of the C++ Standard Library that helps here is std::thread::hardware_concurrency(). This function returns an indication of the number of threads that can truly run concurrently for a given execution of a program. On a multicore system it might be the number of CPU cores, for example.
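A small sketch of how this could be used to pick the thread count. The function name pick_thread_count is hypothetical. The fallback matters because std::thread::hardware_concurrency() is only a hint and may return 0 when the value cannot be determined.

```cpp
#include <thread>

// Returns the number of worker threads to start, never less than 1.
unsigned pick_thread_count() {
    const unsigned hint = std::thread::hardware_concurrency();
    return hint == 0 ? 1u : hint;  // 0 means "unknown", so fall back to 1
}
```

The returned value can then be used as the number of threads among which the permutations are split.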