I don't think the following Fortran DO CONCURRENT loop is valid, as acc is modified in every iteration. However, gfortran is not giving me any kind of warning, and the resulting value in acc is correct at 55. Is it valid or not?
integer :: acc, i
acc = 0
do concurrent (i=1:10)
acc = acc + i
end do
The loop is indeed not valid. The compiler is not required to detect this and report the reason in this case.
In Fortran 2008 8.1.6.5 ('Restrictions on DO CONCURRENT constructs') we have as one restriction:
A variable that is referenced in an iteration shall either be previously defined during that iteration, or shall not be defined or become undefined during any other iteration. A variable that is defined or becomes undefined by more than one iteration becomes undefined when the loop terminates.
acc is such a variable that becomes defined (being on the left-hand side of an intrinsic assignment statement) by more than one iteration (all of them). The loop is thus a bad one (and at the end of the loop construct acc is undefined, so checking its value is also naughty).
As noted in the comments, and similarly to other invalid Fortran programs, you may still appear to get the correct answer without any complaints here. In this case, a DO CONCURRENT construct could be implemented in exactly the same way as a normal DO construct to give exactly the same answer. Only when running in parallel (say with autoparallelization or on GPUs), or with very strict compiler checks, would the data dependency result in a race condition and a wrong answer or abort.
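If the goal is simply the sum, a couple of standard-conforming alternatives are sketched below (the program name and layout are my own); newer revisions of the standard (Fortran 2023) also add a REDUCE locality specifier to DO CONCURRENT for exactly this kind of reduction.
program sum_demo
  implicit none
  integer :: acc, i

  ! An ordinary DO loop: the dependency on acc is perfectly legal here.
  acc = 0
  do i = 1, 10
     acc = acc + i
  end do
  print *, acc                     ! 55

  ! Or let the SUM intrinsic do the reduction.
  print *, sum([(i, i = 1, 10)])   ! 55
end program sum_demo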
Since I started multi-threading, I've been asking myself this one question:
Is writing and reading a variable from different threads undefined behavior?
Let's use the minimal example where we increment an integer in a thread and read the integer inside another one.
int x = 0; // shared between the two threads, with no synchronization

void thread1()
{
    x++;
}

void thread2()
{
    if (x == 5)
    {
        //doSomething
    }
}
I understand that the addition operation is not atomic, and therefore the second thread could read x while the first thread is in the middle of the addition, but there is something I'm not quite sure of.
Does x keep its value until the whole addition operation is completed and only then get assigned the new value, or does x have an intermediate state where reading from it would result in undefined behavior?
If the first theory applies, then reading from x while it's being written to would simply return the value before the addition and wouldn't be so problematic.
If the second theory is true, could someone explain in more detail what the addition operation involves and why it would be undefined behavior (maybe with an example)?
Thanks
The comments already got the basics right.
The compiler, when compiling a single function, may consider the ways in which a variable is changed. If the function cannot directly or indirectly change a certain variable, then the compiler may assume that the variable does not change at all, unless there's thread synchronization; in that case the compiler must deal with the possibility of another thread changing it.
If the compiler's assumption is violated (i.e. you have a bug), then literally anything may happen. This is not constrained, because that would severely restrict optimizers. You may assume that x has a unique address in memory, but optimizers are known to move variables around and to have multiple variables share a single address (just at different times). Such optimizations may very well be justified based on a single-thread assumption, one that your example violates. Your second thread may think it's looking at x, but it might also be getting y.
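As a hypothetical illustration of that single-thread assumption (the function name and the loop are my own, not from the question), consider a plain int that the function only reads:
int x = 0;                        // plain int, no synchronization

void spin_until_five()
{
    // Nothing in this function modifies x, so under the single-thread
    // assumption the compiler may load x once and hoist the test out of
    // the loop, effectively if (x != 5) for (;;) {}. A write from another
    // thread is simply not part of its reasoning.
    while (x != 5) { /* spin */ }
}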
x (a 32-bit variable) will always be defined on a 32-bit-or-wider CPU, just not precisely. You only know that x can be any value from the starting value up to the end of the range produced by the ++ operations.
For example, if x is initialized to 0 and thread1 is called 5 times, thread2 can see x anywhere in the range 0 to 5.
It means I can consider the assignment of an integer to memory as atomic.
There are, however, reasons why x is not synchronized between the two threads; for example, thread1 may already see x as 5 while thread2 still sees 0 at the same time.
One of the reasons is the CPU cache, which is separate for each core. To synchronize the value between the caches you have to use memory barriers; std::atomic, for example, does that job for you.
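A minimal sketch of that std::atomic fix, assuming the shared counter from the question is made atomic (the main driver and the printed message are my own additions):
#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> x{0};   // atomic: concurrent reads and writes are not a data race

void thread1()
{
    x.fetch_add(1);      // atomic read-modify-write, no torn updates
}

void thread2()
{
    if (x.load() == 5)   // atomic read: always observes a complete value
    {
        std::cout << "x is 5\n";
    }
}

int main()
{
    std::thread writers[5];
    for (auto &t : writers) t = std::thread(thread1);
    std::thread reader(thread2);

    for (auto &t : writers) t.join();
    reader.join();
}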
In reference to this question.
I have tried this same program many times, and I have seen others (a group of friends) using the same logic for swapping, but none of them ever found wrong output. I want to ask: is there any chance of getting wrong output due to sequence points?
C++11 doesn't have sequence points anymore, but yes, the line is undefined behavior because the modification of b is not sequenced relative to its read.
This means that anything can happen; in general, though, the main problem is that compilers might reorder the exact sequence of events.
Yes, as far as I can tell, this is undefined behavior. The semicolon here is the only sequence point, so it is undefined whether the assignment takes place before or after the same variable gets used.
Now, if all of your group of friends are using the same compiler and the same platform, which seems likely, they're all going to see the same results, so this is not surprising. That's the answer to that part of the question.
Basically, yes. It may give a wrong result, because in this line b is both written and read, and it is not specified which happens first.
Most probably you have tried it many times, but always with the same compiler, right? In that case it's very unlikely for you to observe different results: for the same bit of code, a given compiler usually produces the same, stable result.
To see a difference, you may need to change the compiler, or at least change some options, such as more or less aggressive optimization.
The problem with this expression is that, theoretically, it may be compiled as:
assign b <- a
a = a + b - a // but now b is already equal to a
or
assign temp1 <- a
assign temp2 <- b
assign b <- a
a = temp1 + temp2 - a // here values are preserved
Yes, this is undefined behavior
and it will give the following warning:
$ g++ -Wall -o test test.cpp
test.cpp: In function ‘int main()’:
test.cpp:11:21: warning: operation on ‘b’ may be undefined [-Wsequence-point]
If you use the above "trick" instead of a standard swap, with Visual Studio you will get an unpleasant surprise: the side effects of the evaluations are still there.
The C standard (1999 ed.) says in section 6.5 clause 2:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
So, yes, this code violates the sequence point rules (b is read from but not to determine the new value of b). C++ inherits this from C.
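For completeness, a minimal sketch of well-defined ways to swap two ints (the original trick expression is not reproduced here; the variable values are my own):
#include <iostream>
#include <utility>

int main()
{
    int a = 1, b = 2;

    std::swap(a, b);                        // the idiomatic, fully defined swap
    std::cout << a << ' ' << b << '\n';     // 2 1

    int tmp = a;                            // or spell it out with a temporary;
    a = b;                                  // every step here is sequenced
    b = tmp;
    std::cout << a << ' ' << b << '\n';     // 1 2
}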
The following program makes a common mistake: it modifies a function argument that is passed as a constant. Usually the constant is stored in a read-only section of the object code, so at run time one gets an access violation.
That's exactly what happens with gfortran at optimization levels -O0 and -O1 (gfortran 4.8.1 on Windows).
But the crash disappears with -O2, and the second PRINT shows the value 100, like the first.
By inspection of the assembly output, I can see that in the -O1 case, the function F is optimized out, but the computations are still done in the code of A, and storing 117 causes a crash. With -O2, no computation is done, the result (201) is included in the assembly output as a constant, and the value 117 is never stored.
program bob
  implicit none
  call a(100)
contains
  subroutine a(n)
    integer :: n
    print *, "In A:", f(n), n
    print *, n
  end subroutine
  function f(n)
    integer :: n, f
    f = 2*n + 1
    n = 117
  end function
end program
Is this behaviour accepted by the standard? Is this a bug?
My first thought was that maybe it's a bug of the optimizer (it does not do something that would have indeed an effect, since the modified value is printed afterwards). But I'm aware that usually, an undefined behaviour in the standard can have any consequence when actually run.
If I replace the constant 100 in the call, with a variable previously initialized to 100, the compiler produces the expected result (the second PRINT gives me 117, with any optimization level).
So maybe the optimizer is very clever in the "constant" case: since the code would crash, the print would not happen, so the value is not needed and is optimized out, and in the end the program doesn't crash. But I still find it a bit puzzling.
The behaviour of the erroneous program is consistent with what the standard requires.
The standard doesn't require the compiler to diagnose this particular error (it is not a violation of the numbered syntax rules or numbered constraints). Beyond that, if a program is in error in this way, then the standard doesn't impose any requirements on the Fortran processor.
It does not reveal a bug in the compiler. Any behaviour is valid, including things like the compiler beating you over the head with a stick.
Perhaps you should have stated your INTENT.
This is probably a bug in the constant propagation module of the GCC optimiser. It is enabled by default at optimisation levels above -O1 and can be disabled by passing -fno-ipa-cp.
This example only serves to illustrate the importance of giving each dummy argument the correct INTENT attribute. When n is marked as INTENT(INOUT) in a, the compiler gives an error, no matter what the optimisation level.
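As a sketch of that advice (my own variant of the program above, not the answerer's code): declaring the dummy argument of f as INTENT(IN) makes gfortran reject the offending assignment at compile time, while the answer's INTENT(INOUT) on a's dummy would instead make the call a(100) itself illegal.
program bob_intent
  implicit none
  call a(100)
contains
  subroutine a(n)
    integer, intent(in) :: n
    print *, "In A:", f(n), n
    print *, n
  end subroutine
  function f(n) result(r)
    integer, intent(in) :: n
    integer :: r
    r = 2*n + 1
    ! n = 117   ! with INTENT(IN) this line is rejected at compile time
  end function
end program bob_intent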
So, this sample loop:
do i=1,1
print *,i
enddo
print *,i
gives me 2 as the final value of i. How can I set up Intel Fortran for Visual Studio on Windows to give me a final value of 1 for i?
This has been the way that Fortran loops work for decades and you can't simply change this with a compiler option. The Fortran standard clearly states:
8.1.4.4.1 Loop initiation
(2) The DO variable becomes defined with the value of the initial parameter m1.
(3) The iteration count is established and is the value of the expression
MAX (INT ((m2 - m1 + m3) / m3), 0)
Here m1, m2 and m3 are the three parameters in the loop-control: [,] var = m1,m2[,m3]. Given your example of i=1,1 (m3 is implicitly 1 if omitted), the iteration count is MAX(INT((1-1+1)/1),0), which evaluates to 1, i.e. the loop body is executed once. i is initialised to 1 as per (2).
8.1.4.4.2 The execution cycle
The execution cycle of a DO construct consists of the following steps performed in sequence repeatedly until termination:
(1) The iteration count, if any, is tested. If the iteration count is zero, the loop terminates and the DO construct becomes inactive. If loop-control is [ , ] WHILE (scalar-logical-expr), the scalar-logical-expr is evaluated; if the value of this expression is false, the loop terminates and the DO construct becomes inactive. If, as a result, all of the DO constructs sharing the do-term-shared-stmt are inactive, the execution of all of these constructs is complete. However, if some of the DO constructs sharing the do-term-shared-stmt are active, execution continues with step (3) of the execution cycle of the active DO construct whose DO statement was most recently executed.
Fortran tests if the remaining iteration count is greater than zero, not if the DO variable is less than (greater than) the end value.
(2) If the iteration count is nonzero, the range of the loop is executed.
(3) The iteration count, if any, is decremented by one. The DO variable, if any, is incremented by the value of the incrementation parameter m3.
The DO variable is always incremented after an iteration of the loop has been executed. Thus after the first (and only) iteration, i is incremented by 1, giving 2.
Except for the incrementation of the DO variable that occurs in step (3), the DO variable must neither be redefined nor become undefined while the DO construct is active.
8.1.4.4.4 Loop termination
When a DO construct becomes inactive, the DO-variable, if any, of the DO construct retains its last defined value.
The last defined value is 2. Thus after the DO loop has ended, i is equal to 2.
I've pulled the text out of ISO/IEC 1539:1991 (a.k.a. Fortran 90) but one can also find pretty much the same text in §11.10.3 of ISO/IEC 1539:1980 (a.k.a. ANSI X3J3/90.4 a.k.a. FORTRAN 77; sans the WHILE stuff which is not present in F77) as well as in §8.1.6.6 of ISO/IEC 1539-1:2010 (a.k.a. Fortran 2008).
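A small demonstration of those rules (the program and variable names are my own), including the iteration-count formula from 8.1.4.4.1:
program do_var_demo
  implicit none
  integer :: i, k

  do i = 1, 1          ! iteration count = MAX(INT((1-1+1)/1), 0) = 1
  end do
  print *, i           ! prints 2: incremented once more before the count hits 0

  do k = 1, 10, 3      ! iterations at k = 1, 4, 7, 10; count = 4
  end do
  print *, k           ! prints 13: the last defined value is 10 + 3
end program do_var_demo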
You can't, because that's how DO works; it stops when the control variable exceeds the limit.
In general, in pretty much any language with a FOR/DO counting loop, you should only use the loop control variable inside the loop body, and treat it as undefined elsewhere, even if you can't actually limit its scope to the body.
In your case, I would use a different variable to keep track of the actual last value of i in any iteration:
lasti = 0
do i=1,1
print *,i
lasti = i
enddo
print *,lasti
I've (essentially) come across the following in the wild
x = x = 5;
which apparently compiles cleanly under earlier versions of gcc (it generates a warning under gcc 4.5.1). As far as I can tell the warning is generated by -Wsequence-point. So my question is: does this violate the wording in the standard about manipulating variables between sequence points (i.e., it is undefined behavior per the spec), or is this a gcc false positive (i.e., it is defined behavior per the spec)? The wording on sequence points is a bit hard to follow.
I said essentially because what I actually came across (in a larger expression) was
x[0][0] = x[0][0] = 5;
but I didn't think that was material to the warning (please correct me if that is to the point and not what I've assumed is the crux of the matter).
Assuming x is of built-in type, it assigns to x twice without an intervening sequence point, which is all you need to know. The fact that both assignments are of the same value (5), and could in theory be optimized into a single assignment (if x is not volatile), is neither here nor there.
At least, that's how I interpret "modified" in the standard: assigned a value, regardless of whether it happens to be the same as the old value. Likewise, casting away const and assigning to a const object is, I think, UB regardless of whether the value you assign happens to be equal to the prior value. Otherwise there'd be a huge overhead on all memory writes if an implementation wanted to put string literals into ROM and avoid a page fault in that case, and we know by inspection that compilers don't emit that code.
An even more exciting example would be x[0][0] = x[0][i] = 5;, which assigns to the same object without an intervening sequence point if (and only if) i == 0, so whether the behaviour is defined depends on the value of i.
I don't see quite why a compiler might do anything unexpected in either case, but again my lack of imagination is irrelevant :-)
What ablenky says is right. If you're in some context where you can't use two statements, maybe write x[0][0] = 5, x[0][i] = 5 instead. In both your given cases, just ditch the redundant assignment.
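A minimal sketch of those well-defined alternatives (the array size and the value of i are my own choices):
#include <iostream>

int main()
{
    int x[1][2] = {{0, 0}};
    int i = 1;

    x[0][0] = 5;        // two separate statements: each assignment is its own
    x[0][i] = 5;        // full expression, so nothing is unsequenced

    (x[0][0] = 7), (x[0][i] = 7);   // the comma operator also sequences its
                                    // left operand before its right operand

    std::cout << x[0][0] << ' ' << x[0][1] << '\n';   // prints 7 7
}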