I want to declare a parallel for within a master region, like this:
#pragma omp parallel
{
    #pragma omp master
    {
        // many functions...
        #pragma omp parallel for
        for (int i = 0; i < x; ++i) {
            a += i;
        }
    }
}
This is just example code; I have hundreds of functions and I don't want to manually add a master construct to each of them. Is this possible to do, or is there another way to achieve what I want?
#pragma omp parallel
{
    // master only
    #pragma omp master
    {
        // many functions...
    }
    // full team: just `for`, not `parallel for`
    #pragma omp for
    for (int i = 0; i < x; ++i) {
        a += i;
    }
}
Just declare the `for` outside of the master. Or do the sequential actions outside of the parallel section altogether:
// many functions...
#pragma omp parallel for
for (int i = 0; i < x; ++i) {
    a += i;
}
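One subtlety with the first variant: `master` has no implied barrier, so the other threads can reach the `omp for` before the master thread has finished the setup functions. If the loop depends on their results, add an explicit barrier, or use `single`, which ends with an implicit barrier. A minimal sketch (the `run` function, `offset`, and the setup line are illustrative stand-ins for the question's "many functions"):

```cpp
// Sketch: `single` runs the setup on one thread; the rest of the team
// waits at its implicit barrier before entering the worksharing loop.
int run(int x) {
    int a = 0;
    int offset = 0;
    #pragma omp parallel
    {
        #pragma omp single
        {
            // serial setup the loop depends on (stands in for the
            // question's many functions)
            offset = 1;
        }  // implicit barrier: the whole team waits here
        #pragma omp for reduction(+ : a)
        for (int i = 0; i < x; ++i) {
            a += i + offset;
        }
    }
    return a;
}
```

The `reduction(+ : a)` makes the shared accumulation safe; with `master` instead of `single` you would need a `#pragma omp barrier` after the block to get the same guarantee.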
I'm trying to write the equivalent of this code using omp parallel and omp critical to synchronize it:
std::vector<int> randValueCounter(int n, int listSize) {
    std::vector<int> list1(listSize, 0);
    for (int i = 0; i < n; ++i) {
        int rand = randomVal();
        if (rand < listSize) {
            ++list1[rand];
        }
    }
    return list1;
}
My attempt at parallelization + synchronization using omp parallel and omp critical:
std::vector<int> randValueCounter2(int n, int listSize) {
    std::vector<int> list1(listSize, 0);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        int rand = randomVal();
        #pragma omp critical
        {
            if (rand < listSize) {
                ++list1[rand];
            }
        }
    }
    return list1;
}
I read that randValueCounter2 will have some overhead, but would it, in this case, accumulate the same result as the first function?
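The critical section serializes every increment, which is where the overhead comes from. One lower-overhead alternative, assuming a compiler with OpenMP 4.5 array-section reductions, is to reduce over the histogram itself. The sketch below uses `pseudoRandomVal` as a deterministic stand-in for the question's `randomVal()` so it is self-contained:

```cpp
#include <vector>

// Deterministic stand-in for the question's randomVal(); hypothetical,
// used only so the example can be run as-is.
static int pseudoRandomVal(int i, int listSize) { return (i * 7) % listSize; }

// Sketch: per-bucket counting without a critical section, assuming
// OpenMP 4.5 array-section reduction support. Each thread accumulates
// into a private copy of the counts, merged once at the end.
std::vector<int> randValueCounter3(int n, int listSize) {
    std::vector<int> list1(listSize, 0);
    int* counts = list1.data();
    #pragma omp parallel for reduction(+ : counts[0:listSize])
    for (int i = 0; i < n; ++i) {
        int r = pseudoRandomVal(i, listSize);
        if (r < listSize) {  // bounds check keeps the index valid
            ++counts[r];
        }
    }
    return list1;
}
```

With the same bounds check in both versions, the critical-section variant and this reduction variant produce the same counts; only the synchronization cost differs.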
I am trying out OpenMP offloading with an nvidia GPU and I am trying to do some array calculations with it in C++.
Right now my output is not what I expect, as I am new to offloading calculations with OpenMP. I would appreciate it if someone could point me in the right direction.
Code Snippet:
#include <omp.h>
#include <iostream>
using namespace std;

int main() {
    int totalSum, ompSum;
    const int N = 1000;
    int array[N];
    for (int i = 0; i < N; i++) {
        array[i] = i;
    }
    #pragma omp target
    {
        #pragma omp parallal private(ompSum) shared(totalSum)
        {
            ompSum = 0;
            #pragma omp parallel for
            for (int i = 0; i < N; i++) {
                ompSum += array[i];
            }
            #pragma omp critical
            totalSum += ompSum;
        }
        printf("Calculated sum should be %d but is %d\n", N * (N - 1) / 2, totalSum);
    }
    return 0;
}
Right now, I know that the sum should calculate to a number 499500, but my machine is outputting extremely big numbers that are also negative.
You have some typos in the OpenMP constructs, namely:
#pragma omp parallal -> #pragma omp parallel;
#pragma omp parallel for -> #pragma omp for.
Regarding the second one: you do not need the parallel because you are already inside a parallel region. Note also that totalSum must be initialized to 0 before the accumulation; otherwise you are adding to an indeterminate value, which is why you see huge (and negative) numbers.
Try the following:
#include <omp.h>
#include <cstdio>
using namespace std;

int main() {
    int totalSum = 0, ompSum = 0;
    const int N = 1000;
    int array[N];
    for (int i = 0; i < N; i++) {
        array[i] = i;
    }
    #pragma omp target
    {
        #pragma omp parallel private(ompSum) shared(totalSum)
        {
            ompSum = 0;
            #pragma omp for
            for (int i = 0; i < N; i++) {
                ompSum += array[i];
            }
            #pragma omp critical
            totalSum += ompSum;
        }
        printf("Calculated sum should be %d but is %d\n", N * (N - 1) / 2, totalSum);
    }
    return 0;
}
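The private-sum-plus-critical pattern can also be expressed with a reduction clause, which is shorter and sidesteps the uninitialized-variable pitfall entirely. A sketch, assuming a toolchain with OpenMP target-offload support (without a device, most implementations fall back to the host):

```cpp
// Sketch: the private-sum / critical pattern expressed as a reduction.
// The array is mapped to the device and totalSum mapped tofrom so the
// reduced result comes back to the host.
int sumArray(const int* array, int n) {
    int totalSum = 0;
    #pragma omp target teams distribute parallel for \
        map(to : array[0:n]) map(tofrom : totalSum) reduction(+ : totalSum)
    for (int i = 0; i < n; i++) {
        totalSum += array[i];
    }
    return totalSum;
}
```

`teams distribute parallel for` also spreads the loop over multiple teams, which a single `parallel` region inside `target` does not do on most GPU implementations.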
There are two consecutive loops, and there is a reduction clause in the second loop.
#pragma opm parallel
{
    #pragma omp for
    for (size_t i = 0; i < N; ++i)
    {
    }
    #pragma omp barrier
    #pragma omp for reduction(+ \
                              : sumj)
    for (size_t i = 0; i < N; ++i)
    {
        sumj = 0.0;
        for (size_t j = 0; j < adjList[i].size(); ++j)
        {
            sumj += 0;
        }
        Jac[i, i] = sumj;
    }
}
To reduce the thread-creation overhead I want to keep the threads alive and reuse them in the second loop, but I get the following error:
lib.cpp:131:17: error: reduction variable ‘sumj’ is private in outer context
#pragma omp for reduction(+ \
^~~
How can I fix that?
I'm not sure what you are trying to do, but it seems that something like this would do what you expect:
#pragma omp parallel
{
    #pragma omp for
    for (size_t i = 0; i < N; ++i)
    {
    }
    #pragma omp barrier
    #pragma omp for
    for (size_t i = 0; i < N; ++i)
    {
        double sumj = 0.0;
        for (size_t j = 0; j < adjList[i].size(); ++j)
        {
            sumj += 0;
        }
        // note: the original Jac[i, i] applies the comma operator and
        // indexes only one dimension; Jac[i][i] is presumably intended
        Jac[i][i] = sumj;
    }
}
A reduction would be useful if the interior loop itself were an "omp for".
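To make that concrete, here is a sketch of the case where the reduction *is* needed: the inner loop is the one shared among the team, so each thread's partial `sumj` must be combined. `adjList` and a flat `diag` vector are stand-ins for the question's data structures:

```cpp
#include <vector>

// Sketch: parallelizing the inner loop, so the partial sums of sumj
// accumulated by each thread must be merged via reduction.
void fillDiagonal(const std::vector<std::vector<double>>& adjList,
                  std::vector<double>& diag) {
    const std::size_t N = adjList.size();
    for (std::size_t i = 0; i < N; ++i) {
        double sumj = 0.0;
        #pragma omp parallel for reduction(+ : sumj)
        for (std::size_t j = 0; j < adjList[i].size(); ++j) {
            sumj += adjList[i][j];  // actual work instead of `sumj += 0`
        }
        diag[i] = sumj;  // diagonal entry, stored flat for simplicity
    }
}
```

Note that opening a parallel region per row like this reintroduces the per-region overhead the asker wanted to avoid, so it only pays off when the rows are long.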
I am new to using OpenMP 2.0 with MSVC++ 2017. I'm working with a big data structure (referred to as bigMap), so I need to distribute the workload when iterating over it in the best possible way. My attempt at doing so is:
std::map<int, std::set<std::pair<double, double>>> bigMap;
/// thousands of values are added here
int k;
int max_threads = omp_get_max_threads();
omp_set_num_threads(max_threads);
#pragma omp parallel default(none) private(k)
{
    #pragma omp for
    for (k = kMax; k > kMin; k--)
    {
        for (auto& myPair : bigMap[k])
        {
            int pthread = omp_get_thread_num();
            std::cout << "Thread " << pthread << std::endl;
            for (auto& item : myPair)
            {
                #pragma omp critical
                myMap[k - 1].insert(std::make_pair(item, 0));
            }
        }
    }
}
The output for "pthread" is always "0" and the execution time is the same as single-threaded (so I assume no new threads are being created). Why doesn't this code work, and which OpenMP directives/clauses are wrong?
UPDATE:
OpenMP is now working, but the code below does not work as expected:
#pragma omp parallel for schedule(static, 1)
for (int i = 0; i < map_size; ++i) {
    #pragma omp critical
    bigMap[i] = std::set<int>();
}
bigMap[1] = { 10, 100, 1000 };
int i;
#pragma omp parallel for schedule(static) num_threads(8)
for (i = thread_num; i < map_size; i += thread_count)
{
    for (auto it = bigMap[i].begin(); it != bigMap[i].end(); ++it)
    {
        int elem = *it;
        bigMap[i + 1].insert(elem);
    }
}
I expect the 3 elements from bigMap[1] to be inserted across all entries of bigMap; instead, they're inserted only once, for bigMap[2]. Why?
A little bug:
#pragma omp parallel for schedule(static, 1)
for (int i = 0; i < map_size; ++i) {
    #pragma omp critical
    bigMap[i] = std::set<int>();
}
bigMap[1] = { 10, 100, 1000 };
int i;
#pragma omp parallel for schedule(static) num_threads(8)
for (i = thread_num; i < map_size; i += thread_count)
{
    // here you loop on bigMap[i], which is empty except for i == 1:
    // for (auto it = bigMap[i].begin(); it != bigMap[i].end(); ++it)
    for (auto it = bigMap[1].begin(); it != bigMap[1].end(); ++it)
    {
        int elem = *it;
        bigMap[i + 1].insert(elem);
    }
}
Maybe you misunderstand what static means.
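To illustrate: with schedule(static) and no chunk size, the iteration space is split up front into one contiguous chunk per thread (the exact split is implementation-defined, but roughly equal contiguous blocks is the common behavior). The sketch below computes such a partition by hand, with no OpenMP runtime needed:

```cpp
#include <vector>

// Sketch: which thread a default schedule(static) would give each
// iteration to, assuming the common "one contiguous chunk per thread"
// partitioning. Purely illustrative arithmetic, not an OpenMP API.
std::vector<int> staticChunkOwner(int niters, int nthreads) {
    std::vector<int> owner(niters);
    int chunk = (niters + nthreads - 1) / nthreads;  // ceiling division
    for (int i = 0; i < niters; ++i) {
        owner[i] = i / chunk;
    }
    return owner;
}
```

So with 8 iterations and 4 threads, thread 0 owns iterations 0-1, thread 1 owns 2-3, and so on. Combining `parallel for` with a manual `i = thread_num; i += thread_count` stride therefore partitions the work twice, and most index combinations are never visited by any thread.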
If I use nested parallel for loops like this:
#pragma omp parallel for schedule(dynamic, 1)
for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for schedule(dynamic, 1)
    for (int y = 0; y < y_max; ++y) {
        // parallelize this code here
    }
    // IMPORTANT: no code in here
}
is this equivalent to:
for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for schedule(dynamic, 1)
    for (int y = 0; y < y_max; ++y) {
        // parallelize this code here
    }
    // IMPORTANT: no code in here
}
Is the outer parallel for doing anything other than creating a new task?
If your compiler supports OpenMP 3.0, you can use the collapse clause:
#pragma omp parallel for schedule(dynamic, 1) collapse(2)
for (int x = 0; x < x_max; ++x) {
    for (int y = 0; y < y_max; ++y) {
        // parallelize this code here
    }
    // IMPORTANT: no code in here
}
If it doesn't (e.g. only OpenMP 2.5 is supported), there is a simple workaround:
#pragma omp parallel for schedule(dynamic, 1)
for (int xy = 0; xy < x_max * y_max; ++xy) {
    int x = xy / y_max;
    int y = xy % y_max;
    // parallelize this code here
}
You can enable nested parallelism with omp_set_nested(1), and your nested omp parallel for code will work, but that might not be the best idea.
By the way, why the dynamic scheduling? Is every loop iteration evaluated in non-constant time?
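If nested parallelism really is wanted, it has to be enabled explicitly. A sketch (omp_set_nested is deprecated since OpenMP 5.0 in favor of omp_set_max_active_levels; the guard lets the code also compile without OpenMP):

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Sketch: explicitly enabling nested parallelism. Without this, the
// inner parallel region typically runs with a team of one thread.
int nestedSum(int x_max, int y_max) {
    int total = 0;
#ifdef _OPENMP
    omp_set_max_active_levels(2);  // allow two active parallel levels
#endif
    #pragma omp parallel for reduction(+ : total)
    for (int x = 0; x < x_max; ++x) {
        // inner region: each outer thread spawns its own team
        #pragma omp parallel for reduction(+ : total)
        for (int y = 0; y < y_max; ++y) {
            total += 1;
        }
    }
    return total;
}
```

Oversubscription is the usual downside: with T outer threads each spawning T inner threads, you get T*T threads competing for the same cores, which is why collapse(2) or the manual index-flattening above is generally preferable.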
NO.
The first #pragma omp parallel will create a team of parallel threads, and the second will then try to create for each of the original threads another team, i.e. a team of teams. However, on almost all existing implementations the second team has only one thread: the second parallel region is essentially unused. Thus, your code is roughly equivalent to
#pragma omp parallel for schedule(dynamic, 1)
for (int x = 0; x < x_max; ++x) {
    // only one x per thread
    for (int y = 0; y < y_max; ++y) {
        // code here: each thread loops over all y
    }
}
If you don't want that, but only parallelise the inner loop, you can do this:
#pragma omp parallel
for (int x = 0; x < x_max; ++x) {
    // each thread loops over all x
    #pragma omp for schedule(dynamic, 1)
    for (int y = 0; y < y_max; ++y) {
        // code here: only one y per thread
    }
}