Execution and completion of methods - C++

This is a very naive question, please forgive my ignorance if I use the wrong terms.
If I have a series of instructions as in the snippet,
bool methodComplete = false;
methodComplete = doSomeMethod(someParam, etcParam); //long & complex method that returns true
if (methodComplete)
    doSomeOtherMethod();
will the method doSomeMethod() finish its execution before if (methodComplete) is evaluated?
Or is this a case for an asynchronous pattern if I want to guarantee it is completed?

The language specification defines how a program behaves from the point of view of the user/programmer. So yes, you can assume that the program behaves as follows:
It computes doSomeMethod
It stores the result in methodComplete
It executes the if clause
That said, some optimizations might result in code being executed ahead of time; see speculative execution.

will the method doSomeMethod() finish executing before if (methodComplete) is evaluated?
Yes*.
or is this a case for an asynchronous pattern if I want to guarantee it has completed?
Only if you are doing parallel computing.
*) It can become a no if your code is executing in parallel.
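For example, if the method were launched asynchronously, the completion guarantee would disappear and you would have to wait explicitly. A minimal sketch using C++11's std::async (the stub bodies here are invented for illustration):

#include <future>
#include <iostream>

// Stubs standing in for the question's long & complex method
// (hypothetical bodies, for illustration only).
bool doSomeMethod(int someParam, int etcParam) { return someParam + etcParam > 0; }
void doSomeOtherMethod() { std::cout << "second step\n"; }

int main() {
    // Launch the method on another thread; completion is no longer implicit.
    std::future<bool> methodComplete =
        std::async(std::launch::async, doSomeMethod, 1, 2);

    // get() blocks until doSomeMethod() has actually finished,
    // restoring the guarantee the sequential version had for free.
    if (methodComplete.get())
        doSomeOtherMethod();
}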

Related

Could you help me understand the parallelism in VHDL?

I understand that in a process the instructions are executed sequentially and that the value of a signal is not updated until the end of the process, but I cannot understand the principle of parallelism. For example, in the following code I know that these instructions will be executed in parallel (at the same time), but I do not know whether Q will get the new value of Sig2 or the previous one. Also, when we compute Sig2, do we use the new value of Sig1 or the previous one?
Sig1<=a and b;
Sig2<=Sig1 and a;
Q<=Sig2;
As VHDL uses event-driven semantics, nothing actually executes in parallel; it just has the appearance of parallelism. The concurrent assignments you show execute whenever their right-hand-side operands change, and there is no implied ordering between them. If a changes from 1 to 0, you cannot depend on the order in which the first two statements execute. It's possible that the 2nd assignment executes first, then the 1st, then the 3rd (because Sig2 has changed), and then the 2nd again because Sig1 has changed.
Most tools will try to order the statements to minimize the number of assignment re-executions and may even optimize it as if you wrote:
Q <= a and b;
and eliminate Sig1 and Sig2 from the simulation.

How should one wait for a condition to become true in Chapel?

Consider the following simplified situation -- task A increments a counter i (while possibly also doing some work), while task B needs to start its task when i reaches a particular value. Task A is oblivious to the existence of B, so I can't assume that A can signal to B when the condition is met. B can however read i, although i could well be remote to B.
What is the best way (or most idiomatic way) for B to check to see if i has reached/crossed a value?
I thought of a few different options (some of which don't work):
A simple while loop, with no body -- does this lock up the task, or does Chapel sometimes yield from the while loop? Also, I assume the correct procedure would be to execute the while loop on i's locale.
Using atomics and the waitFor method -- unfortunately, this doesn't work, since it's possible that i has already crossed the value of interest.
It's a little ugly, but you can implement a slight variation of waitFor(). Something like:
on i {
  while i.read() < valueOfInterest {
    chpl_task_yield();
  }
}
Note that you have to explicitly do a chpl_task_yield() yourself. Chapel will not automatically insert yields into a loop or anything.
You can also make a wrapper:
proc waitUntil(i, valueOfInterest) where isAtomic(i) {
  on i {
    while i.read() < valueOfInterest {
      chpl_task_yield();
    }
  }
}
waitUntil(i, valueOfInterest);
Ideally the signature would be more like proc waitUntil(i: atomic(?t), valueOfInterest: t), but that's not supported today.
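For readers more familiar with C++, roughly the same pattern can be sketched with std::atomic and std::this_thread::yield() (the names here are illustrative and not part of Chapel):

#include <atomic>
#include <thread>

// Spin until the counter reaches the value of interest, yielding the
// CPU so other threads can make progress (the counterpart of
// chpl_task_yield() in the Chapel version above).
void waitUntil(const std::atomic<long>& i, long valueOfInterest) {
    while (i.load() < valueOfInterest)
        std::this_thread::yield();
}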

Are atomic types necessary in multi-threading? (OS X, clang, c++11)

I'm trying to demonstrate that it's a very bad idea not to use std::atomic<>s, but I can't manage to create an example that reproduces the failure. I have two threads and one of them does:
{
foobar = false;
}
and the other:
{
if (foobar) {
// ...
}
}
the type of foobar is either bool or std::atomic_bool and it's initialized to true. I'm using OS X Yosemite and I even tried to use this trick to hint, via CPU affinity, that I want the threads to run on different cores. I run such operations in loops etc., and in any case there's no observable difference in execution. I ended up inspecting the generated assembly with clang (clang -std=c++11 -lstdc++ -O3 -S test.cpp), and the differences on the read side are minor: no mfence or anything that "dramatic". On the write side something more "dramatic" happens: the atomic<> version uses xchgb, which carries an implicit lock. When I compile with a relatively old version of gcc (v4.5.2) I can see all sorts of mfences being added, which also indicates there's a serious concern.
I kind of understand that "x86 implements a very strong memory model" (ref) and that mfences might not be necessary, but does this mean that, unless I want to write cross-platform code that e.g. supports ARM, I don't really need any atomic<>s unless I care about consistency at the nanosecond level?
I've watched "atomic<> Weapons" from Herb Sutter but I'm still impressed with how difficult it is to create a simple example that reproduces those problems.
The big problem with data races is that they're undefined behavior, not guaranteed wrong behavior. And this, in conjunction with the general unpredictability of threads and the strength of the x64 memory model, means that it gets really hard to create reproducible failures.
A slightly more reliable failure mode is when the optimizer does unexpected things, because you can observe those in the assembly. Of course, the optimizer is notoriously finicky as well and might do something completely different if you change just one code line.
Here's an example failure that we had in our code at one point. The code implemented a sort of spin lock, but didn't use atomics.
bool operation_done;

void thread1() {
    while (!operation_done) {
        sleep();
    }
    // do something that depends on operation being done
}

void thread2() {
    // do the operation
    operation_done = true;
}
This worked fine in debug mode, but the release build got stuck. Debugging showed that execution of thread1 never left the loop, and looking at the assembly, we found that the condition was gone; the loop was simply infinite.
The problem was that the optimizer realized that under its memory model, operation_done could not possibly change within the loop (that would have been a data race), and thus it "knew" that once the condition was true once, it would be true forever.
Changing the type of operation_done to atomic_bool (or actually, a pre-C++11 compiler-specific equivalent) fixed the issue.
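For reference, a minimal sketch of the fixed version, assuming C++11 (<atomic>, <thread>) is available and with the actual operation stubbed out:

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> operation_done(false);

void thread1() {
    // The atomic load cannot legally be hoisted out of the loop,
    // so the store from thread2 is eventually observed.
    while (!operation_done.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // do something that depends on the operation being done
}

void thread2() {
    // do the operation
    operation_done.store(true);
}

int main() {
    std::thread t1(thread1), t2(thread2);
    t1.join();
    t2.join();
}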
This is my own version of @Sebastian Redl's answer that fits the question more closely. I will still accept his for the credit, plus kudos to @HansPassant, whose comment brought my attention back to the writes and made everything clear: as soon as I observed that the compiler was adding synchronization on writes, the problem turned out to be that it wasn't optimizing bool as much as one would expect.
I was able to have a trivial program that reproduces the problem:
#include <atomic>
#include <iostream>
#include <unistd.h>

std::atomic_bool foobar(true);
//bool foobar = true;
long long cnt = 0;
long long loops = 400000000ll;

void thread_1() {
    usleep(200000);
    foobar = false;
}

void thread_2() {
    while (loops--) {
        if (foobar) {
            ++cnt;
        }
    }
    std::cout << cnt << std::endl;
}
The main difference from my original code is that I used to have a usleep() inside the while loop, which was enough to prevent any optimizations there. The cleaner code above yields the same asm for the write, but quite different asm for the read: in the bool case, clang hoists the if (foobar) check outside the loop. Thus when I run the bool case I get:
400000000
real 0m1.044s
user 0m1.032s
sys 0m0.005s
while when I run the atomic_bool case I get:
95393578
real 0m0.420s
user 0m0.414s
sys 0m0.003s
It's interesting that the atomic_bool case is faster; I guess that's because it does just 95 million increments of the counter, versus 400 million in the bool case.
What is even more interesting is this: if I move the std::cout << cnt << std::endl; out of the threaded code, to after pthread_join(), the loop in the non-atomic case disappears entirely; it becomes just if (foobar != 0) cnt = loops;. Clever clang. Then the execution yields:
400000000
real 0m0.206s
user 0m0.001s
sys 0m0.002s
while the atomic_bool remains the same.
So this is more than enough evidence that we should use atomics. The only thing to remember: don't put any usleep() in your benchmarks, because even a small one will prevent quite a few compiler optimizations.
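As an aside, if you want to keep reads from being optimized away in a benchmark loop without resorting to sleeps, one option is the Google Benchmark library's benchmark::DoNotOptimize() (a sketch, assuming that library fits your setup):

#include <benchmark/benchmark.h>

static void BM_ReadFlag(benchmark::State& state) {
    bool foobar = true;
    for (auto _ : state) {
        // Tells the optimizer the value is observed, so the read
        // cannot simply be hoisted out of the loop or deleted.
        benchmark::DoNotOptimize(foobar);
    }
}
BENCHMARK(BM_ReadFlag);
BENCHMARK_MAIN();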
In general, it is very rare that the use of atomic types actually does anything useful for you in multithreaded situations. You are usually better off with things like mutexes, semaphores and so on.
One reason why it's not very useful: as soon as you have two values that both need to be changed in an atomic way, you are absolutely stuck; you can't do it with atomic values. And it's quite rare that I want to change just a single value in an atomic way.
In iOS and OS X, the three methods to use are:
Protecting the change using @synchronized.
Avoiding multi-threaded access by running the code on a serial queue (which may be the main queue).
Using mutexes.
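For instance, here is a minimal C++ sketch of the mutex approach when two values must change together (the names are invented for illustration):

#include <mutex>

std::mutex m;
int balance_a = 100;
int balance_b = 0;

// Both values change under one lock, so no other thread can
// ever observe the amount "in flight" between the two balances.
void transfer(int amount) {
    std::lock_guard<std::mutex> lock(m);
    balance_a -= amount;
    balance_b += amount;
}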
I hope you are aware that atomicity for boolean values is rather pointless. What you have is a race condition: One thread stores a value, another reads it. Atomicity doesn't make a difference here. It makes (or might make) a difference if two threads accessing a variable at exactly the same time causes problems. For example, if a variable is incremented on two threads at exactly the same time, is it guaranteed that the final result is increased by two? That requires atomicity (or one of the methods mentioned earlier).
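A minimal sketch of that increment example, assuming C++11 atomics:

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> counter(0);   // with a plain int, increments could be lost

void bump() {
    for (int i = 0; i < 100000; ++i)
        ++counter;             // an atomic read-modify-write
}

int main() {
    std::thread t1(bump), t2(bump);
    t1.join();
    t2.join();
    std::cout << counter << '\n';  // always 200000 with std::atomic<int>
}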
Sebastian claims that atomicity fixes the data race. In the C++11 sense of the term that is true (the undefined behaviour goes away), but it does not fix the race condition: a reader will read the value either before or after it is changed, and whether that value is atomic or not makes no difference whatsoever. The reader will read the old value or the new value, so the behaviour is unpredictable. All that atomicity does is prevent the reader from observing some half-written in-between state; it does not fix the race condition.
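To illustrate that point with a sketch (names invented for illustration): even with atomics, a check-then-act sequence remains a race condition, because another thread can interleave between the load and the store; an atomic read-modify-write such as compare_exchange is needed to close the window.

#include <atomic>

std::atomic<int> slots(1);

// BROKEN even though 'slots' is atomic: two threads can both
// see slots == 1 before either one decrements it.
bool try_take_slot_racy() {
    if (slots.load() > 0) {   // read...
        slots.fetch_sub(1);   // ...then act: another thread may run in between
        return true;
    }
    return false;
}

// Correct: compare_exchange makes the check and the update one atomic step.
bool try_take_slot() {
    int cur = slots.load();
    while (cur > 0) {
        if (slots.compare_exchange_weak(cur, cur - 1))
            return true;      // cur is refreshed on failure, loop re-checks
    }
    return false;
}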

How to get an objective evaluation of the execution time of a C++ code snippet?

I am following this post, How to Calculate Execution Time of a Code Snippet in C++, which gives a nice solution for calculating the execution time of a code snippet. However, when I use this solution to measure the execution time of my code snippet on Linux, I find that every time I run the program, the execution time given by the solution is different. So my question is how I can get an objective evaluation of the execution time. An objective evaluation is important to me, as I use the following scheme to evaluate two different implementations of the same task:
int main()
{
    int64 begin, end;
    begin = GetTimeMs64();
    execute_my_codes_method1();
    end = GetTimeMs64();
    std::cout << "Execution time is " << end - begin << std::endl;
}
First, I run the above code to get the execution time for the first method. After that, I change the code to invoke execute_my_codes_method2() and get the execution time for the second method.
int main()
{
    int64 begin, end;
    begin = GetTimeMs64();
    execute_my_codes_method2(); //execute_my_codes_method1();
    end = GetTimeMs64();
    std::cout << "Execution time is " << end - begin << std::endl;
}
By comparing the different execution time I expect to compare the efficiency of these two different implementations.
The reason I change the code and run the implementations separately is that it is very difficult to call them sequentially in one program. Therefore, if running the same program at different times gives different execution times, then comparing the two implementations via the measured times is meaningless. Any suggestions on this problem? Thanks.
Measuring a single call's execution time is pretty useless for judging any performance improvements; too many factors influence the actual execution time of a function. If you measure timing, you should make many calls to the function, measure the total time, and compute a statistical average of the measured execution times:
int main() {
    int64 begin = 0, end = 0;
    begin = GetTimeMs64();
    for (int i = 0; i < 10000; ++i) {
        execute_my_codes_method1();
    }
    end = GetTimeMs64();
    std::cout << "Average execution time is " << (end - begin) / 10000 << std::endl;
}
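If C++11 is available, the same measurement can be sketched with std::chrono::steady_clock instead of the GetTimeMs64() helper from the linked post (steady_clock is preferable here because it is monotonic):

#include <chrono>
#include <iostream>

// Stub for the implementation under test (replace with the real method).
void execute_my_codes_method1() { /* ... work under test ... */ }

int main() {
    const int runs = 10000;
    auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        execute_my_codes_method1();
    }
    auto end = std::chrono::steady_clock::now();
    auto total = std::chrono::duration_cast<std::chrono::microseconds>(end - begin);
    std::cout << "Average execution time is "
              << total.count() / runs << " us" << std::endl;
}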
Additionally, having unit tests for your functions up front (using a decent testing framework such as Google Test) will make such quick judgments a lot easier.
Not only can you determine how often the test cases should be run (to gather the statistical data for the average-time calculation), the unit tests can also prove that an alternate implementation doesn't break the desired functionality or input/output consistency.
As an extra benefit (you mentioned difficulties running the two functions sequentially), most of these unit-test frameworks let you define SetUp() and TearDown() methods that are executed before/after each test case, so you can easily provide consistent preconditions and invariants for every single test case run.
As a further option, instead of gathering the statistical data yourself, you can use profiling tools that work via code instrumentation. A good example is GCC's gprof: it gathers information about how often each function was called and how long execution took, and this data can be analyzed afterwards with the tool to find potential bottlenecks in your implementations.
Additionally, if you decide to provide unit tests in the future, you may want to ensure that all of your code paths for the various input-data situations are well covered by your test cases. A very good example of how to do this is GCC's gcov instrumentation. To analyze the gathered code-coverage information you can use lcov, which visualizes the results quite nicely and comprehensively.

How to print result of C++ evaluation with GDB?

I've been looking around but was unable to figure out how one could print out in GDB the result of an evaluation. For example, in the code below:
if (strcmp(current_node->word,min_node->word) > 0)
min_node = current_node;
(above I was trying out a possible method for checking alphabetical order for strings, and wasn't absolutely certain it works correctly.)
Now I could watch min_node and see if the value changes but in more involved code this is sometimes more complicated. I am wondering if there is a simple way to watch the evaluation of a test on the line where GDB / program flow currently is.
There is no expression-level single stepping in gdb, if that's what you are asking for.
Your options are (from most commonly to most infrequently used):
evaluate the expression in gdb, doing print strcmp(current_node->word,min_node->word). Surprisingly, this works: gdb can evaluate function calls, by injecting code into the running program and having it execute the code. Of course, this is fairly dangerous if the functions have side effects or may crash; in this case, it is so harmless that people typically won't think about potential problems.
perform instruction-level (assembly) single-stepping (ni/si). When the call instruction is done, you find the result in a register, according to the processor conventions (%eax on x86).
edit the code to assign intermediate values to variables, and split that into separate lines/statements; then use regular single-stepping and inspect the variables.
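For the third option, applied to the snippet from the question, the edit would look roughly like this:

// Split the expression so the intermediate result lands in a variable
// that can be inspected with a plain 'print cmp' after a single step.
int cmp = strcmp(current_node->word, min_node->word);
if (cmp > 0)
    min_node = current_node;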
You may also simply try to type:
call my_function()
As far as I remember, though, this won't work when the function is inlined.