Parallelize for-loop in C++ - memory error

I am trying to parallelize a for-loop in C++, but every time I try to use this loop for a larger data set, I get this error:
Process returned -1073741819 (0xC0000005)
For small data sets the loop works; for larger sets the initialization still succeeds, but after that I get memory errors.
I am using Code::Blocks and the GNU GCC compiler.
In this loop I want to run several iterations of an evolutionary optimization heuristic.
I am using OpenMP and tried to make the variables that are used by several threads private.
#include <omp.h>
void search_framework(Data &data, Solution &best_s)
{
    vector<Solution> pop(data.p_size);
    vector<Solution> child(data.p_size);
    for (int i = 0; i < data.p_size; i++)
    {
        pop[i].reserve(data);
        child[i].reserve(data);
    }
    // parent index in pop
    vector<tuple<int, int>> p_indice(data.p_size);
    bool time_exhausted = false;
    int run = 1;
    #pragma omp parallel for firstprivate(pop, pop_fit, pop_argrank, child, child_fit, child_argrank, p_indice)
    for (int run = 1; run <= data.runs; run++)
    {
        run++;
        int no_improve = 0;
        int gen = 0;
        initialization(pop, pop_fit, pop_argrank, data);
        local_search(pop, pop_fit, pop_argrank, data);
        while (!termination(no_improve, data))
        {
            gen++;
            // printf("---------------------------------Gen %d---------------------------\n", gen);
            no_improve++;
            // select parents
            select_parents(pop, pop_fit, p_indice, data);
            // do local search for children
            local_search(child, child_fit, child_argrank, data);
            // replacement
            replacement(pop, p_indice, child, pop_fit, pop_argrank, child_fit, child_argrank, data);
            // update best
            argsort(pop_fit, pop_argrank, data.p_size);
            update_best_solution(pop[pop_argrank[0]], best_s, used, run, gen, data);
            if (data.tmax != NO_LIMIT && used > clock_t(data.tmax))
            {
                time_exhausted = true;
                break;
            }
        }
        if (time_exhausted) run = data.runs;
    }
}
Edit: this is the part where pop etc. is initialized:
void initialization(vector<Solution> &pop, vector<double> &pop_fit, vector<int> &pop_argrank, Data &data)
{
    int len = int(pop.size());
    for (int i = 0; i < len; i++)
    {
        pop[i].clear(data);
    }
    for (int i = 0; i < len; i++)
    {
        data.lambda_gamma = data.latin[i];
        new_route_insertion(pop[i], data);
    }
    for (int i = 0; i < len; i++)
    {
        pop_fit[i] = pop[i].cost;
    }
    argsort(pop_fit, pop_argrank, len);
}

You increment run twice:
once in for (int run = 1; run <= data.runs; run++)
and again with the run++ right below it.
I don't know what 'Data' is in this case, but I suspect this makes the loop unstable.
If not, check the type of data.runs: if it is unsigned long, be careful with
for (int run = 1; run <= data.runs; run++)
The range of int is -2147483648 to 2147483647, so if the value of data.runs is outside the int range this is very dangerous and may even create an infinite loop.
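A minimal sketch of the loop header only (the body is the one from the question minus the stray run++; treating data.runs as unsigned long is an assumption):
for (unsigned long run = 1; run <= data.runs; run++)
{
    // the for-statement already increments run once per iteration, so there is
    // no extra run++ in the body, and the outer "int run = 1;" that the loop
    // variable shadows can be dropped as well
    // ... rest of the body unchanged ...
}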

Try increasing the stack size of each OpenMP thread using the OMP_STACKSIZE environment variable:
https://gcc.gnu.org/onlinedocs/gcc-12.1.0/libgomp/OMP_005fSTACKSIZE.html
I think the private data structures get put on each thread's stack, so increasing the problem size will eventually exceed the reserved stack space.
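An alternative sketch (not the original code; the fitness/rank vector types are assumptions, and this only addresses per-thread storage, not any writes to the shared data object): construct the working containers inside the loop instead of copying them in with firstprivate, so each iteration owns fresh locals whose element storage lives on the heap.
#pragma omp parallel for
for (int run = 1; run <= data.runs; run++)
{
    // per-iteration (and therefore per-thread) working state; only the small
    // vector objects themselves sit on the thread's stack, the elements are heap-allocated
    vector<Solution> pop(data.p_size), child(data.p_size);
    vector<double> pop_fit(data.p_size), child_fit(data.p_size);
    vector<int> pop_argrank(data.p_size), child_argrank(data.p_size);
    vector<tuple<int, int>> p_indice(data.p_size);
    for (int i = 0; i < data.p_size; i++)
    {
        pop[i].reserve(data);
        child[i].reserve(data);
    }
    // ... the rest of the per-run work as in the original loop ...
}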

Related

"void value not ignored as it ought to be" When trying to sort an array

I'm new to programming and trying to program a sensor. I'm sorting the array that I put the sensor input into, but I'm running into a problem when sorting it; it gives me this error:
exit status 1
void value not ignored as it ought to be
Here's my code:
#include <ArduinoSort.h>
int a[10];
int b[10];
int sensor1 = A0;
int sensor2 = A1;
int display1;
int display2;
int sort[10];
void setup() {
  // put your setup code here, to run once:
  Serial.begin(9600);
}
void loop() {
  // put your main code here, to run repeatedly:
  delay(100);
  Serial.print("Sensor 1: [");
  for (int i = 0; i < 11; i++) {
    display1 = analogRead(sensor1);
    a[i] = display1;
    sort[i] = sortArray(a, 10);
    if (i < 10) {
      Serial.print(String(sort[i]) + ",");
    } else {
      Serial.print(String(sort[i]));
    }
  }
  Serial.print("]");
  Serial.println();
  Serial.print("Sensor 2: [");
  for (int i = 0; i < 11; i++) {
    display2 = analogRead(sensor2);
    b[i] = display2;
    if (i < 10) {
      Serial.print(String(b[i]) + ",");
    } else {
      Serial.print(String(b[i]));
    }
  }
  Serial.print("]");
  Serial.println();
  Serial.println();
  //String hej = "Sensor 1"+ String(display1);
  //Serial.println(hej);
}
and on this line I get the error:
sort[i] = sortArray(a,10);
The sortArray() function returns nothing i.e. void.
It sorts the passed array in-place so you need to call it like this:
sortArray(a, 10); // no return value assignment
As pointed out by Ben Voigt, you have not filled a completely at that point, so you need to collect all the values first and then sort the array after the loop.
Here's your loop (corrected, with sortArray moved after the loop):
for (int i = 0; i < 11; i++) { // 0-10 i.e. 11 iterations
  a[i] = analogRead(sensor1); // read values in the array
  // ...
}
sortArray(a, 10); // sort after the loop
Another problem is that your loop's condition is i < 11, which means iterations from 0 to 10, i.e. 11 iterations. But the arrays you're using are of size 10, i.e. locations 0 to 9, as C++ arrays are ZERO-based. So this is causing out-of-bounds access, resulting in Undefined Behavior.
So your loop iterations and the array sizes should match, i.e. 10 iterations and 10 memory locations to write to.
It's better to use a constant and use that at all places like this:
const int SIZE = 10;
int a[SIZE] = {0}; // initialize if it's not an overhead for arduino
for (int i = 0; i < SIZE; i++) {
  a[i] = analogRead(sensor1);
  // ...
}
sortArray(a, SIZE);
You have a somewhat bigger problem with this code.
You simply cannot "print outputs in sorted order" while you are still collecting them: you don't know whether the "smallest" value so far should be printed now, or whether you should wait because an even smaller one is forthcoming.
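A sketch of that approach for the first sensor (a hypothetical rewrite assembled from the snippets above, not the poster's final code): fill the array completely, sort it once, then print it.
#include <ArduinoSort.h>

const int SIZE = 10;
int a[SIZE] = {0};
int sensor1 = A0;

void setup() {
  Serial.begin(9600);
}

void loop() {
  delay(100);
  // 1) collect all the samples first
  for (int i = 0; i < SIZE; i++) {
    a[i] = analogRead(sensor1);
  }
  // 2) sort the completed array in place (sortArray returns void)
  sortArray(a, SIZE);
  // 3) print the sorted values
  Serial.print("Sensor 1: [");
  for (int i = 0; i < SIZE; i++) {
    Serial.print(a[i]);
    if (i < SIZE - 1) Serial.print(",");
  }
  Serial.println("]");
}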

How can I avoid dividing by zero without too many conditionals?

I have an integer parameter which is supposed to control how many times in a particular run an event occurs.
For example, if the number of iterations for each run is 1000, then the parameter FREQ would be 5, if I wanted to have the event occur every 200 iterations. However, I want to be able to change the number of iterations, but keep the ratio the same, and also to be able to set the FREQ parameter to 0 if I don't want the event to occur at all.
Here is what I am currently using:
int N_ITER = 1000;
int FREQ = 5;
void doRun(){
    int count = 0;
    for (int i = 0; i < N_ITER; ++i){
        if (FREQ > 0){
            if (count < N_ITER/FREQ){
                doSomething();
                count++;
            }
            else{
                doSomethingElse();
                count = 0;
            }
        }
        else{
            doSomething();
        }
    }
}
This works fine, but the nested conditionals don't sit right with me, especially since I end up with two calls to doSomething(); I feel like it should be possible to accomplish this more simply.
I tried collapsing it into the single conditional if (FREQ > 0 && count < N_ITER/FREQ), but that threw a floating point exception because of the divide by zero.
I also tried using a try/catch block, but in terms of messiness it really was no different from the nested conditionals. Is there a more elegant solution to this problem?
How about rearranging the condition? Instead of count < N_ITER/FREQ, use count*FREQ < N_ITER. This avoids the division entirely, and if FREQ = 0 the left-hand side is always 0, so the expression will still be true and doSomething() runs on every iteration.
int N_ITER = 1000;
int FREQ = 5;
void doRun() {
    int count = 0;
    for (int i = 0; i < N_ITER; ++i) {
        if (count*FREQ < N_ITER) {
            doSomething();
            count++;
        } else {
            doSomethingElse();
            count = 0;
        }
    }
}

threading program in C++ not faster

I have a program which reads a file line by line and stores each possible substring of length 50 in a hash table along with its frequency. I tried to use threads so that the program reads 5 lines and then uses five different threads to do the processing. The processing involves reading each substring of a line and putting it into the hash map with its frequency. But something seems to be wrong that I cannot figure out, because the program is not faster than the serial approach. Also, for a large input file it is aborted. Here is the piece of code I am using:
unordered_map<string, int> m;
mutex mtx;
void parseLine(char *line, int subLen){
    int no_substr = strlen(line) - subLen;
    for(int i = 0; i <= no_substr; i++) {
        char *subStr = (char*) malloc(sizeof(char) * (subLen + 1));
        strncpy(subStr, line + i, subLen);
        subStr[subLen] = '\0';
        mtx.lock();
        string s(subStr);
        if(m.find(s) != m.end()) m[s]++;
        else {
            pair<string, int> ret(s, 1);
            m.insert(ret);
        }
        mtx.unlock();
        free(subStr); // release the temporary buffer
    }
}
int main(){
    char **Array = (char **) malloc(sizeof(char *) * (num_th + 1));
    int num = 0;
    while (NOT END OF FILE) {
        if(num < num_th) {
            if(num == 0)
                for(int x = 0; x < num_th; x++)
                    Array[x] = (char*) malloc(sizeof(char) * (strlen(line) + 1));
            strcpy(Array[num], line);
            num++;
        }
        else {
            vector<thread> threads;
            for(int i = 0; i < num_th; i++) {
                threads.push_back(thread(parseLine, Array[i], 50)); // 50 = substring length
            }
            for(int i = 0; i < num_th; i++){
                if(threads[i].joinable()) {
                    threads[i].join();
                }
            }
            for(int x = 0; x < num_th; x++) free(Array[x]);
            num = 0;
        }
    }
}
It's a myth that, just by virtue of using threads, the end result must be faster. In general, in order to take advantage of multithreading, two conditions must be met (*):
1) You actually have to have enough physical CPU cores that can run the threads at the same time.
2) The threads have independent tasks to do, that they can do on their own.
From a cursory examination of the shown code, it seems to fail on the second part. It seems to me that, most of the time, all of these threads will be fighting each other to acquire the same mutex. There's little to be gained from multithreading in this situation.
(*) Of course, you don't always use threads for purely performance reasons. Multithreading also comes in useful in many other situations, for example in a program with a GUI: having a separate thread updating the GUI keeps the UI working even while the main execution thread is chewing on something for a while...
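One common way to reduce that contention (a sketch, not a drop-in replacement for the code above): let each thread count substrings in its own local map and take the mutex only once per line, to merge its local counts into the shared map.
#include <cstring>
#include <mutex>
#include <string>
#include <unordered_map>
using namespace std;

unordered_map<string, int> m; // shared result
mutex mtx;

void parseLine(const char *line, int subLen) {
    unordered_map<string, int> local;            // thread-local counts, no locking needed
    int no_substr = (int)strlen(line) - subLen;
    for (int i = 0; i <= no_substr; i++) {
        local[string(line + i, subLen)]++;       // build the substring directly, no malloc
    }
    lock_guard<mutex> lock(mtx);                 // lock once per line
    for (const auto &kv : local) {
        m[kv.first] += kv.second;                // merge into the shared map
    }
}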

c++ BWAPI exception access violation

Is there anybody using BWAPI who gets an access violation error when accessing the Unit objects of the current game?
I am certain that the error is not in my code. Anyway, is there anything I can do to avoid the access violation?
I am getting this error sometimes at the line with the comment below. The code below executes many times, and only sometimes do I get the error.
int Squad::getSize() {
    int no = 0;
    for (int i = 0; i < (int) agents.size(); i++) {
        BaseAgent* agent = agents.at(i);
        if (agent != NULL && agent->isAlive() && agent->getUnit() != NULL && !agent->getUnit()->isBeingConstructed()) // this line
            no++;
    }
    return no;
}
This is the code that I use to remove a BaseAgent from the vector; please take a look and see whether I can do it better:
void AgentManager::cleanup() {
    //Step 2. Do the cleanup.
    int cnt = 0;
    int oldSize = (int)agents.size();
    for (int i = 0; i < (int)agents.size(); i++) {
        if (!agents.at(i)->isAlive()) {
            delete agents.at(i);
            agents.erase(agents.begin() + i);
            cnt++;
            i--;
        }
    }
    int newSize = (int)agents.size();
}
The BaseAgent code is at this link.
I would speculate that this line:
BaseAgent* agent = agents.at(i);
is returning some invalid pointer which is not set to 0.
Looking at your cleanup code, it looks a bit complicated. I would suggest looping over the entire vector, deleting the dead elements and setting the pointers to 0.
After the loop, use the erase-remove idiom to remove all NULL pointers from the vector.
Step 1:
for (unsigned int i = 0; i < agents.size(); ++i) {
    if (!agents.at(i)->isAlive()) {
        delete agents.at(i);
        agents.at(i) = nullptr;
    }
}
Step 2:
agents.erase(std::remove(agents.begin(), agents.end(), nullptr), agents.end());
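Putting the two steps together, cleanup() could look roughly like this (a sketch assuming agents is a std::vector<BaseAgent*> that owns its pointers):
#include <algorithm>

void AgentManager::cleanup() {
    // step 1: delete dead agents and null out their slots
    for (unsigned int i = 0; i < agents.size(); ++i) {
        if (!agents.at(i)->isAlive()) {
            delete agents.at(i);
            agents.at(i) = nullptr;
        }
    }
    // step 2: erase-remove all the null pointers in one pass
    agents.erase(std::remove(agents.begin(), agents.end(), nullptr), agents.end());
}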

misusing OpenMP?

I have a program using OpenMP to parallelize a for-loop. Inside the loop, the threads write to a shared variable, so I need to synchronize them. However, I sometimes get either a segmentation fault or a double free or corruption error. Does anyone know what is happening? Thanks and regards! Here is the code:
void KNNClassifier::classify_various_k(int dim, double *feature, int label, int *ks, double *errors, int nb_ks, int k_max) {
    ANNpoint queryPt = 0;
    ANNidxArray nnIdx = 0;
    ANNdistArray dists = 0;
    queryPt = feature;
    nnIdx = new ANNidx[k_max];
    dists = new ANNdist[k_max];
    if(strcmp(_search_neighbors, "brutal") == 0) { // search
        _search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps);
    } else if(strcmp(_search_neighbors, "kdtree") == 0) {
        _search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps); // double free or corruption
    }
    for (int j = 0; j < nb_ks; j++)
    {
        scalar_t result = 0.0;
        for (int i = 0; i < ks[j]; i++) {
            result += _labels[ nnIdx[i] ]; // Segmentation fault
        }
        if (result*label < 0)
        {
            #pragma omp critical
            {
                errors[j]++;
            }
        }
    }
    delete [] nnIdx;
    delete [] dists;
}
void KNNClassifier::tune_complexity(int nb_examples, int dim, double **features, int *labels, int fold, char *method, int nb_examples_test, double **features_test, int *labels_test) {
    int nb_try = (_k_max - _k_min) / scalar_t(_k_step);
    scalar_t *error_validation = new scalar_t [nb_try];
    int *ks = new int [nb_try];
    for(int i = 0; i < nb_try; i++){
        ks[i] = _k_min + _k_step * i;
    }
    if (strcmp(method, "ct") == 0)
    {
        train(nb_examples, dim, features, labels); // train once for all nb of nbs in ks
        for(int i = 0; i < nb_try; i++){
            if (ks[i] > nb_examples){ nb_try = i; break; }
            error_validation[i] = 0;
        }
        int i = 0;
        #pragma omp parallel shared(nb_examples_test, error_validation, features_test, labels_test, nb_try, ks) private(i)
        {
            #pragma omp for schedule(dynamic) nowait
            for (i = 0; i < nb_examples_test; i++)
            {
                classify_various_k(dim, features_test[i], labels_test[i], ks, error_validation, nb_try, ks[nb_try - 1]); // where error occurs
            }
        }
        for (i = 0; i < nb_try; i++)
        {
            error_validation[i] /= nb_examples_test;
        }
    }
    ......
}
UPDATE:
As in my last post (double free or corruption), the code runs fine single-threaded but gives runtime errors when multi-threaded. The error changes from run to run: if I run it twice, one run will segfault and the other will report double free or corruption.
Let's take a look at your segmentation fault line:
result+=_labels[ nnIdx[i] ];
result is local -- OK.
nnIdx is local -- also OK.
i is local -- still OK.
_labels ... what is it?
Is it global? Did you define access to it via #pragma shared?
The same goes for the other problematic line:
_search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps);
It seems we have a problem here that is not easily solvable: _search_struct is not thread-safe -- values in it are probably modified by several threads at once. You would have to have a dedicated _search_struct per thread, probably by allocating it in classify_various_k.
The really bad news, however, is that ANN is probably completely non-threadable:
The library allocates a small amount of storage, which is shared by all search structures built during the program's lifetime. Because the data is shared, it is not deallocated, even when all the individual structures are deleted.
As seen above, there will always be problems with parallel data modification, because the library itself has some shared data -- hence it's not thread-safe itself :/.
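If you still want to keep the loop parallel, one hedged workaround (a sketch, not a proper fix) is to serialize every call into the library with a named critical section, so the non-thread-safe search runs on one thread at a time:
// Sketch: guard every ANN call with the same named critical section.
// This removes the races inside the library at the cost of making the
// searches effectively sequential.
#pragma omp critical(ann_search)
{
    _search_struct->annkSearch(queryPt, k_max, nnIdx, dists, _eps);
}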