Force Auto-Parallelization in VS 2012 - C++

Assume that /Qpar is set, and consider the following code:
#pragma loop(hint_parallel(8))
for (int i = 0; i < u; i++)
{
    SomeExpensiveCall();
}
My u is small (~50), and SomeExpensiveCall takes ~1 second. The code doesn't appear to be getting parallelized (commenting out the hint made no difference). Is there any way I can force the compiler to parallelize this loop?
Something I just thought of - would this have anything to do with the fact that the project containing the above code is in a static library that is linked into a C++/CLI DLL that does not (and cannot) have /Qpar?
Thanks

/Qpar-report:2 ought to tell you what's happening. Most likely it doesn't want to parallelize the loop because the function call may have side effects.

boost circular buffer in stl vector crashes in release

I have a class where I define a circular buffer like so:
class cTest
{
public:
    boost::circular_buffer<std::vector<std::pair<double, double>>> circDat;

    cTest() : circDat(1000)
    {
    }
};
I then create a stl vector of type cTest
std::vector<cTest> vC;
Afterwards I try to fill the vector like this:
for (unsigned int i = 0; i < 4; ++i)
{
    cTest obj;
    vC.push_back(obj);
}
While this works in Debug mode, it crashes in Release (sometimes, when I run it from Visual Studio, I get a heap corruption message). The Boost documentation mentions that, in Debug mode, uninitialized memory is filled with 0xcc. I assume the error has its root in uninitialized memory, but I am not really sure how to fix this problem.
If I use pointers, it seems to work:
std::vector<cTest*> vC;
for (unsigned int i = 0; i < 4; ++i)
{
    cTest* obj = new cTest;
    vC.push_back(obj);
}
But I still do not know what the problem with the first version is. If anyone knows, I'd appreciate the help.
Edit:
I've tried to create a minimal, reproducible example but failed. It also seemed to crash randomly, without really correlating with the lines added/removed. I then stumbled on the /GL flag in Visual Studio 2015.
After turning the /GL flag off in the GUI project (in the library project it can stay on), I've been unable to recreate the crash. I do not know if this is really a fix, but it seems a similar problem was present in Visual Studio 2010:
crash-in-program-using-openmp-x64-only
Edit2:
I've managed to pull together a minimal working example. The code can be downloaded here:
https://github.com/davidmarianovak/crashtest
You need Boost (I used 1.60) and QT5 (I used 5.6.3). Build GoAcquire in Release (/GL is active in Visual Studio). Afterwards, build GoGUI in Release (activate /GL and use 'standard' for link-time code generation). After you've built it, run it and it should crash.
The crash can be avoided by changing this in 'GoInterface.hpp' line 22:
void fillGraphicsViews(std::vector<cSensorConstruct> vSens);
to
void fillGraphicsViews(std::vector<cSensorConstruct> &vSens);
But I do not really believe that that is the problem. Can anyone tell me what I'm doing wrong? I'm using Visual Studio 2015.
I bet you're forgetting about iterator/reference invalidation, so the problem is not in the code shown.
Iterator invalidation rules
This makes sense since you report that pointers seem to work: the pointers stay the same even if push_back causes reallocation.
Simply don't hold on to references/iterators to vector elements unless you know they're going to stay valid.
If your vector has a known maximum size, you could "cheat" by reserving the capacity ahead of time:
static constexpr size_t MAX_BUFFERS = 100;
std::vector<cTest> vC;
vC.reserve(MAX_BUFFERS); // never more
And then perhaps guard the invariant:
assert(vC.size() < MAX_BUFFERS);
vC.push_back(obj);

Are comparisons between macro values bad in embedded programming?

I am building a program that needs to run on an ARM.
The processor has plenty of resources to run the program, so this question is not directly about this type of processor, but about less powerful ones, where resources and computing power are limited.
To print debug information (or even to activate portions of code) I am using a header file where I define macros that I set to true or false, like this:
#define DEBUG_ADCS_OBC true
and in the main program:
if (DEBUG_ADCS_OBC == true) {
    printf("O2A ");
    for (j = 0; j < 50; j++) {
        printf("%x ", buffer_obc[j]);
    }
}
Is this a bad habit? Are there better ways to do this?
In addition, will having these if checks affect performance in a measurable way?
Or is it safe to assume that the ifs are somehow removed from the flow when the code is compiled, since the comparison is between two values that cannot change?
Since the expression DEBUG_ADCS_OBC == true can be evaluated at compile time, optimizing compilers will figure out that the branch is either always taken or is always bypassed, and eliminate the condition altogether. Therefore, there is zero runtime cost to the expression when you use an optimized compiler.
If you are compiling with all optimization turned off, use conditional compilation instead. This will do the same thing an optimizing compiler does with a constant expression, but at the preprocessor stage. Hence the compiler will not "see" the conditional even with optimization turned off.
Note 1: Since DEBUG_ADCS_OBC has the meaning of a boolean variable, use if (DEBUG_ADCS_OBC) without the == true for a somewhat cleaner look.
Note 2: Rather than defining the value in the body of your program, consider passing a value on the command line, for example -DDEBUG_ADCS_OBC=true. This lets you change the debug setting without modifying your source code, simply by manipulating the make file or one of its options.
The code you are using is evaluated every time your program reaches this line (unless the optimizer removes it). Since every change of DEBUG_ADCS_OBC requires a recompile of your code anyway, you should use #ifdef/#ifndef expressions instead. Their advantage is that they are evaluated only once, at compile time.
Your code segment could look like the following:
Header:
//Remove this line if debugging should be disabled
#define DEBUG_DCS_OBS
Source:
#ifdef DEBUG_DCS_OBS
printf("O2A ");
for (j = 0; j < 50; j++) {
    printf("%x ", buffer_obc[jj]);
}
#endif
The problem with getting the compiler to do this is the unnecessary run-time test of a constant expression. An optimising compiler will remove it, but it may equally issue warnings about constant conditional expressions or, when the macro is undefined, about unreachable code.
It is not a matter of being "bad in embedded programming"; it has little merit in any programming domain.
The following is the more usual idiom; it will not include unreachable code in the final build, and an appropriately configured syntax-highlighting editor or IDE will generally show you which code sections are active and which are not.
#define DEBUG_ADCS_OBC
...
#if defined DEBUG_ADCS_OBC
    printf("O2A ");
    for (j = 0; j < 50; j++)
    {
        printf("%x ", buffer_obc[j]);
    }
#endif
I'll add one thing that I didn't see mentioned.
If optimizations are disabled in debug builds, the code is still included even when its runtime performance impact is insignificant. As a result, debug builds are usually bigger than release builds.
If memory is very limited, you can run into a situation where the release build fits in the device memory and the debug build does not.
For this reason I prefer a compile-time #if over a runtime if. It keeps the memory usage of debug and release builds closer to each other, and it is easier to keep using the debugger at the end of the project.
The optimizer will solve the extra-resources problem, as mentioned in the other replies, but I want to add another point. From a readability point of view this code will be repeated many times, so consider creating specific printing macros. Those macros are what should be enclosed by the debug enable/disable macros:
#ifdef DEBUG_DCS_OBS
#define myCustomPrint(...) printf(__VA_ARGS__) // your custom printing code
#else
#define myCustomPrint(...) // no code here
#endif
This also decreases the probability of the macro being forgotten in some file, which would cause a real optimization problem.

Why my program is receiving SIGABRT when I use OpenMP to make a for loop parallel?

I'm writing a scientific program in C++ to solve Maxwell's equations. The task is data parallel, and I want to use OpenMP to parallelize the program. But when I use OpenMP to parallelize a for loop inside a function, the program gets SIGABRT when I run it. I couldn't find out what went wrong. Please help.
The for loop is as follows:
#pragma omp parallel for
for (int i = 0; i < totalNoOfElementsInSecondMesh; i++) {
    FEMSecondMeshElement2D *secondMeshElement = (FEMSecondMeshElement2D *)mesh->secondMeshFEMElement(i);
    if (secondMeshElement->elementType == FEMDelectricElement) {
        if (solutionType == TE)
            calculateEzFieldForDielectricElement(secondMeshElement, i, currentSecondMeshIndex, nextFirstMeshIndex);
        else
            calculateHzFieldForDielectricElement(secondMeshElement, i, currentSecondMeshIndex, nextFirstMeshIndex);
    } else if (secondMeshElement->elementType == FEMXPMLDielectricElement) {
        if (solutionType == TE)
            calculateEzFieldForDielectricPMLElement((FEMPMLSecondMeshElement2D *)secondMeshElement, i, currentSecondMeshIndex, nextFirstMeshIndex);
        else
            calculateHzFieldForDielectricPMLElement((FEMPMLSecondMeshElement2D *)secondMeshElement, i, currentSecondMeshIndex, nextFirstMeshIndex);
    }
}
The compiler is llvm-gcc which came with Xcode 4.2 by default.
It is possible you've run into a compiler problem on Lion. See this link:
https://plus.google.com/101546077160053841119/posts/9h35WKKqffL
You can download gcc 4.7 pre-compiled for Lion from a link on that page, and that seems to work fine.
The most likely reason your program crashes is memory corruption when accessing FEMSecondMeshElement2D *secondMeshElement, currentSecondMeshIndex or nextFirstMeshIndex, depending on what the other functions in the if clauses do with them.
I recommend checking the accesses to these variables carefully and declaring them thread-private / shared explicitly beforehand, e.g.
FEMSecondMeshElement2D *secondMeshElement = NULL;
#pragma omp parallel for private(secondMeshElement)
...
Did you try compiling your program with debugging and all warnings, i.e. with the -g -Wall flags? Then you can use a debugger (such as gdb) to debug it.
You can enable core(5) dumps by setting RLIMIT_CORE appropriately, with setrlimit(2) or the ulimit shell builtin which calls it. Once you have a core file, gdb can be used for post-mortem analysis. There is also gcore(1) to force a core dump.

Some tests to see whether the code generated for iteration using pointers and iteration using indexing is different?

Is there a way to check whether the compiler generates equivalent code for iteration using pointers and iteration using indexing?
i.e. for the codes
void f1(char v[])
{
    for (int i = 0; v[i] != 0; i++) use(v[i]);
}
and
void f1(char v[])
{
    for (char *p = v; *p != 0; p++) use(*p);
}
I use Microsoft Visual C++ as my compiler. Please help.
Put a breakpoint in the function.
Verify that you are compiling in Release (otherwise the code will surely be different) with debug information turned on.
Run.
Open the Disassembly window to see the generated assembly (usually Alt+8).
I haven't used Visual Studio in some time, but there should be an option to create assembler listing files (/FA) that you could compare.
Otherwise you can put the two versions of the function in two C files and create object files from them. Then use a disassembler to get the assembler code and compare the two files.

dynamic cast throws pointer is not std::__non_rtti_object

I'm having a problem with dynamic_cast. I just compiled my project and tested everything in Debug mode, and then I tried compiling it in Release mode. I copied every configuration from Debug mode except the optimization parameter, which is now /O2 (while debugging I set it to /Od). The project compiled, but when it starts loading my resources I get an exception in this piece of code:
for (int j = 1; j < i->second->getParametersNumber(); j++)
{
    CCTMXTiledMap* temp = CCTMXTiledMap::tiledMapWithTMXFile(i->second->As<string>(j).c_str());
    CCTMXLayer* ret = NULL;
    for (NSMutableArray<CCNode*>::NSMutableArrayIterator l = temp->getChildren()->begin(); !ret && l != temp->getChildren()->end(); l++)
        ret = dynamic_cast<CCTMXLayer*>(*l);
    t1.first = ret;
    templates[i->first].second.push_back(t1);
    templates[i->first].second.back().first->retain();
}
Nothing in the code changed, and when I check in the debugger every variable in the classes is what it should be, but dynamic_cast is throwing std::__non_rtti_object. What am I doing wrong? I'm using cocos2d-x; I didn't have enough reputation to add that tag!
Does CCNode have any virtual functions? Are all elements of temp->getChildren() really CCNodes? Does temp->getChildren() return a reference? The latter is especially insidious: you call both temp->getChildren()->begin() and temp->getChildren()->end(). If getChildren() returns a copy, you're taking the begin of one copy and the end of another copy.
In this case, after many code changes, I found out there must be some bug that only shows itself when the code is optimized (I still don't know whether it's a compiler mis-optimization or a problem in my code, but it's probably mine). The main cause of the problem was *l being NULL.