Fortran performance when passing array slices as arguments

I like Fortran's array-slicing notation (array(1:n)), but I wonder whether I take a performance hit if I use it when it's not necessary.
Consider, for example, this simple quicksort code (it works, but obviously it's not taking care to pick a good pivot):
recursive subroutine quicksort(array, size)
    real, dimension(:), intent(inout) :: array
    integer, intent(in) :: size
    integer :: p
    if (size > 1) then
        p = partition(array, size, 1)
        call quicksort(array(1:p-1), p-1)
        call quicksort(array(p+1:size), size-p)
    end if
end subroutine quicksort

function partition(array, size, pivotdex) result(p)
    real, dimension(:), intent(inout) :: array
    integer, intent(in) :: size, pivotdex
    real :: pivot
    integer :: i, p
    pivot = array(pivotdex)
    call swap(array(pivotdex), array(size))
    p = 1
    do i = 1, size-1
        if (array(i) < pivot) then
            call swap(array(i), array(p))
            p = p + 1
        end if
    end do
    call swap(array(p), array(size))
end function partition

subroutine swap(a, b)
    real, intent(inout) :: a, b
    real :: temp
    temp = a
    a = b
    b = temp
end subroutine swap
I could just as easily pass the whole array along with the indices of where the recursive parts should be working, but I like the code this way. When I call quicksort(array(1:p-1), p-1), however, does it make a temporary array to operate on, or does it just make a shallow reference structure or something like that? Is this a sufficiently efficient solution?
This question is related, but there the temporary arrays seem to be made because of the strided slice and the explicit-size dummy arguments, so I'm safe from that, right?

Regarding your question of efficiency: Yes, for most cases, using assumed-shape arrays and array slices is indeed a sufficiently efficient solution.
There is some overhead involved. Assumed-shape arrays require an array descriptor (sometimes also called a "dope vector"). This array descriptor contains information about dimensions and strides, and setting it up requires some work.
The code in the called procedure with an assumed-shape dummy argument has to take both unity stride (the usual case) and non-unity stride into account. Somebody, somewhere, might want to call your sorting routine with an actual argument of somearray(1:100:3) because they only want to sort every third element of the array. Unusual, but legal. Code that cannot depend on unity stride may have some performance penalty.
Having said that, compilers, especially those using link-time optimization, are quite good nowadays at inlining and/or stripping away all the extra work, and they also tend to clone procedures to special-case unity strides.
So, as a rule, clarity (and assumed-shape arrays) should win. Just keep in mind that the old-fashioned way of passing array arguments may, in some circumstances, gain some extra efficiency.

Your subarray
array(1:p-1)
is contiguous, provided array is contiguous.
Also, you use an assumed-shape array dummy argument
real, dimension(:), intent(inout) :: array
There is no need for a temporary. Just the descriptor of the assumed-shape array is passed. And as your subarray is contiguous, even an assumed-size or explicit-size dummy argument, or an assumed-shape dummy argument with the contiguous attribute, would be OK.

Related

Size implications of 1-D array over a 2-D array

Is there a performance or size difference between the following ways of declaring a large array:
int a[4000][4000] and int a[4000 * 4000]? Should we prefer one over the other if possible?
There's zero difference in memory layout.
There should be no difference in access speed, but you need to measure to be sure.
The 1D array is more versatile. If you want to make a function that can accept arrays of different sizes, with a 1D array you can simply do void foo(int *arr, std::size_t w, std::size_t h). But for a 2D array there's no good solution. Even though their memory layout is the same, attempting to pass a 2D array to such a function would cause UB, just because the standard says so.
If you later decide that you want to allocate the array on the heap, the transition is easier with a 1D array, because you can keep the same [] syntax. For 2D arrays, you would either have to use an array of pointers to arrays (which is less efficient), or write a class that wraps a 1D array and overloads operator[] (which is the proper way of doing it, but takes time).
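To make the last point concrete, here is a minimal sketch of such a wrapper class (the name Grid2D and the use of std::vector as backing storage are illustrative choices, not something from the question):
#include <cstddef>
#include <vector>

// Hypothetical wrapper: a 2D view over a contiguous 1D buffer (row-major order).
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    int& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    const int& operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<int> data_;   // one contiguous heap block, indexed like a 2D array
};

int main() {
    Grid2D g(4000, 4000);
    g(1, 2) = 42;             // same layout as a[1][2] in int a[4000][4000]
    return g(1, 2) == 42 ? 0 : 1;
}
Keeping a single contiguous buffer preserves the memory layout of int a[4000][4000] while letting the dimensions be runtime values.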
Any array is accessed using pointer arithmetic in the code generated by the compiler, in a way like this:
A[i] == *(A + i)
In the case of a two-dimensional array A[d1][d2]:
A[i][j] == *(&A[0][0] + (i * d2) + j)
For higher-dimensional arrays the index expression keeps getting more complex, but internally it is always one linear block of memory starting at the array's base address.
If your code has no row/column logic then it is simpler to use a linear array, but if you do need that kind of code then it is better to use a multi-dimensional array as appropriate, because the pointer arithmetic can be optimized very efficiently by the compiler.
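A quick way to check the row-major indexing described above (a small self-contained sketch; the 3-by-4 dimensions are arbitrary):
#include <cassert>

int main() {
    int a[3][4] = {};
    // Element (i, j) of a row-major 2D array sits i*4 + j ints past &a[0][0].
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 4; ++j)
            assert(&a[i][j] == &a[0][0] + i * 4 + j);
    return 0;
}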

Iterate over vector using loop c++

Let's say I have a vector of long long elements:
std::vector<long long> v {1, 2, 3};
I have seen that in order to iterate over the vector you can do this:
for (auto i = v.begin(); i != v.end(); i++)
    std::cout << *i;
i++ means i grows by 1, but shouldn't the address go up by 8 bytes to reach the next element? So the "growing" part of the for loop (for any type) should look like this:
i += sizeof(v[0]);
I'm assuming each address holds 1 byte, so if the starting address of an integer were 1000, then its total allocation would be represented by addresses 1000, 1001, 1002, 1003. I would like to understand memory better, so I'd be thankful if you could help me.
When you increment a pointer it goes up by the size of the pointed-to type. Remember, for a given pointer i, i[0] is the first element and is equivalent to *(i + 0), i.e. *i.
Iterators tend to work in a very similar fashion so their operation feels familiar; they feel like pointers because of how operator* and operator-> are implemented.
In other words, the meaning of i++ depends entirely on what i is and what operator++ will do on that type. Seeing ++ does not automatically mean +1. For iterators it has a very specific meaning, and that depends entirely on the type.
For std::vector it moves a pointer up to the next entry. For other structures it might navigate a linked list. You could write your own class where it makes a database call, reads a file from disk, or basically whatever you want.
Now if you do i += sizeof(v[0]) instead, then you're moving up an arbitrary number of places in your array; how many depends on the size of that element type, which in turn depends on your platform.
std::vector is really simple, it's just a straight up block of memory treated like an array. You can even get a pointer to this via the data() function.
In other words think of i++ as "move up one index" not "move up one byte".
One of the nice things about the way you've written this code is that you don't need to worry about bytes.
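To make the "one index, not one byte" point concrete, here is a small sketch; the 8-byte difference assumes a typical platform where sizeof(long long) == 8:
#include <iostream>
#include <vector>

int main() {
    std::vector<long long> v{1, 2, 3};
    auto it = v.begin();
    const long long* p0 = &*it;   // address of the first element
    ++it;                         // advances by ONE element...
    const long long* p1 = &*it;
    std::cout << *it << '\n';     // prints 2
    // ...which corresponds to sizeof(long long) bytes between the raw addresses.
    std::cout << (reinterpret_cast<const char*>(p1)
                  - reinterpret_cast<const char*>(p0)) << '\n';   // typically 8
    return 0;
}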
Your auto type for i is masking the fact that it's a vector iterator, in other words, a special type that is designed for accessing members of a vector of long long. This means that *i evaluates to a long long pulled from whichever member of the vector i is currently indexing, so i++ advances i to index the next member of the vector.
Now, in terms of low level implementation, the values 1, 2, 3 are probably sitting in adjacent 8-byte blocks of memory, and i is probably implemented as a pointer, and incrementing i is probably implemented by adding 8 to its value, but each of those assumptions is going to depend on your architecture and the implementation of your compiler. And as I said, it's something you probably don't need to worry about.

Why doesn't C++ support range based for loop for dynamic arrays?

Why doesn't C++ support range based for loop over dynamic arrays? That is, something like this:
int* array = new int[len];
for[] (int i : array) {};
I just invented the for[] statement to rhyme with new[] and delete[]. As far as I understand, the runtime has the size of the array available (otherwise delete[] could not work) so in theory, range based for loop could also be made to work. What is the reason that it's not made to work?
What is the reason that it's not made to work?
A range based loop like
for(auto a : y) {
    // ...
}
is just syntactic sugar for something like the following:
auto endit = std::end(y);
for(auto it = std::begin(y); it != endit; ++it) {
    auto a = *it;
    // ...
}
Since std::begin() and std::end() cannot be used with a plain pointer, this cannot be applied to a pointer returned by new[].
As far as I understand, the runtime has the size of the array available (otherwise delete[] could not work)
How delete[] keeps track of the memory block that was allocated with new[] (which isn't necessarily the same size as the user requested) is a completely different thing, and the compiler most probably doesn't even know exactly how it is implemented.
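Not part of the original answer, but to illustrate that the missing piece really is just the length: if you carry the pointer and the length together, for example with std::span from C++20, the range-based for loop works fine over memory from new[]:
#include <cstddef>
#include <iostream>
#include <span>   // C++20

int main() {
    std::size_t len = 4;
    int* array = new int[len]{10, 20, 30, 40};

    // std::span bundles the begin pointer and the element count,
    // which is exactly what the range-based for loop needs.
    for (int i : std::span<int>(array, len))
        std::cout << i << ' ';
    std::cout << '\n';

    delete[] array;
    return 0;
}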
When you have this:
int* array = new int[len];
The problem here is that your variable called array is not an array at all. It is a pointer. That means it only contains the address of one object (in this case the first element of the array created using new).
For range-based for to work, the compiler needs two addresses: the beginning and the end of the array.
So the problem is the compiler does not have enough information to do this:
// array is only a pointer and does not have enough information
for(int i : array)
{
}
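To contrast the two cases (a small sketch, not from the original answer): with a real array the size is part of the type, so the compiler has both addresses; after decay to a pointer it does not:
#include <iostream>

int main() {
    int real_array[4] = {1, 2, 3, 4};    // type is int[4]: size known to the compiler
    for (int i : real_array)             // begin and end are both deducible
        std::cout << i << ' ';
    std::cout << '\n';

    int* just_a_pointer = real_array;    // decays to int*; the size information is gone
    (void)just_a_pointer;
    // for (int i : just_a_pointer) {}   // error: no begin/end for a plain pointer
    return 0;
}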
int* array = new int[len];
for[] (int i : array) {}
There are several points which must be addressed; I'll tackle them one at a time.
Does the run-time know the size of the array?
In certain conditions, it must. As you pointed out, a call to delete[] will call the destructor of each element (in reverse order) and therefore must know how many there are.
However, by not specifying that the number of elements must be known and accessible, the C++ standard allows an implementation to omit it whenever calling the destructors is not required (that is, when std::is_trivially_destructible<T>::value is true).
Can the run-time distinguish between pointer and array?
In general, no.
When you have a pointer, it could point to anything:
a single item, or an item in an array,
the first item in an array, or any other,
an array on the stack, or an array on the heap,
just an array, or an array part of a larger object.
This is the reason why delete[] exists, and using delete here would be incorrect. With delete[], you the user state: this pointer points to the first item of a heap-allocated array.
The implementation can then assume that, for example, in the 8 bytes preceding this first item it can find the size of the array. Without you guaranteeing this, those 8 bytes could be anything.
Then, why not go all the way and create for[] (int i : array)?
There are two reasons:
As mentioned, today an implementation can elide storing the number of elements in a number of cases; with this new for[] syntax, that would no longer be possible on a per-type basis.
It's not worth it.
Let us be honest, new[] and delete[] are relics of an older time. They are incredibly awkward:
the number of elements has to be known in advance, and cannot be changed,
the elements must be default constructible, or otherwise C-ish,
and unsafe to use:
the number of elements is inaccessible to the user.
There is generally no reason to use new[] and delete[] in modern C++. Most of the time a std::vector should be preferred; in the few instances where the spare capacity is superfluous, a std::dynarray (proposed at the time, though it never made it into the standard) would still be better, because it keeps track of the size.
Therefore, without a valid reason to keep using these statements, there is no motivation to include new semantic constructs specifically dedicated to handling them.
And should anyone be motivated enough to make such a proposal:
the inhibition of the current optimization, a violation of C++ philosophy of "You don't pay for what you don't use", would likely be held against them,
the inclusion of new syntax, when modern C++ proposals have gone to great lengths to avoid it as much as possible (to the point of having a library-defined std::variant), would also likely be held against them.
I recommend that you simply use std::vector.
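For completeness, a std::vector version of the snippet from the question (a minimal sketch): it owns its length, needs no delete[], and works directly with the range-based for loop:
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::size_t len = 5;
    std::vector<int> array(len, 0);   // len elements, all zero-initialized
    for (int i : array)
        std::cout << i << ' ';
    std::cout << '\n';
    return 0;
}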
This is not related to dynamic arrays; it is more general. Of course, for dynamic arrays the size exists somewhere so that the destructors can be called (but remember that the standard doesn't say anything about that, only that calling delete[] works as intended).
The problem is with pointers in general: given a pointer, you can't tell what kind of thing it corresponds to.
Arrays decay to pointers, but given only a pointer, what can you say?
array is not an array but a pointer, and there's no information about the size of the "array". So the compiler cannot deduce the begin and end of this array.
See the syntax of range based for loop:
{
    auto && __range = range_expression ;
    for (auto __begin = begin_expr, __end = end_expr;
         __begin != __end; ++__begin) {
        range_declaration = *__begin;
        loop_statement
    }
}
range_expression - any expression that represents a suitable sequence
(either an array or an object for which begin and end member functions
or free functions are defined, see below) or a braced-init-list.
auto works at compile time, so begin_expr and end_expr are not deduced at run time.
The reason is that, given only the value of the pointer array, the compiler (and your code) has no information about what it points at. The only thing known is that array has a value which is the address of a single int.
It could point at the first element of a statically allocated array. It could point at an element in the middle of a dynamically allocated array. It could point at a member of a data structure. It could point at an element of an array that is within a data structure. The list goes on.
Your code will make ASSUMPTIONS about what the pointer points at. It may assume it is an array of 50 elements. Your code may access the value of len, and assume array points at the (first element of) an array of len elements. If your code gets it right, all works as intended. If your code gets it wrong (e.g. accessing the 50th element of an array with 5 elements) then the behaviour is simply undefined. It is undefined because the possibilities are endless - the book-keeping to keep track of what an arbitrary pointer ACTUALLY points at (beyond the information that there is an int at that address) would be enormous.
You're starting with the ASSUMPTION that array points at the result from new int[len]. But that information is not stored in the value of array itself, so the compiler has no way to work back to a value of len. That would be needed for your "range based" approach to work.
While, yes, given array = new int[len], the machinery invoked by delete [] array will work out that array has len elements, and release them. But delete [] array also has undefined behaviour if array results from something other than a new [] expression. Even
int *array = new int;
delete [] array;
gives undefined behaviour. The "runtime" is not required to work out, in this case, that array is actually the address of a single dynamically allocated int (and not an actual array). So it is not required to cope with that.

How can I sort an array passed as a parameter?

I have to write a method within already-written code that passes me an array directly. However, once inside my method that array has become a pointer to the first object in the array. So now I have done some calculations and want to sort the array. But since it's no longer considered an array, I can't call the sort() function.
What's the best way to sort an array when I only have the pointer to work with?
You either need to know the number of elements in the array, passed as a separate parameter, or have a pointer to one past the last element.
#include <algorithm>

void my_sort(int* p, unsigned n) {
    std::sort(p, p + n);
}
or
void my_sort2(int* p, int* p_end) {
    std::sort(p, p_end);
}
and you would call them
int a[] = { 3, 1, 2 };
my_sort(a, sizeof a / sizeof a[0]); // or 3...
my_sort2(a, &a[2] + 1); // one past the last element! i.e. a+3
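As an aside not in the original answer: since C++17 the bounds can also be computed from the array type itself, as long as the argument has not yet decayed to a pointer:
#include <algorithm>
#include <iterator>

int main() {
    int a[] = {3, 1, 2};
    static_assert(std::size(a) == 3);        // element count from the type (C++17)
    std::sort(std::begin(a), std::end(a));   // same bounds my_sort2 needs
    return a[0] == 1 ? 0 : 1;
}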
In C and C++, an array argument decays to a pointer to its first element, so inside the function the array is referred to through its base pointer, that is, a pointer to the first object.
A technically precise explanation is at "Array base pointer and its address are same. Why?"
So, just sort the array as you would anywhere else. Got an example sort or sample code in mind or is that sufficient?
Sort it exactly as you would sort it before you passed it in. If your sort() function requires a length, then pass the length as an additional parameter.
The best would be if you could start using std::array, available from C++11 on:
http://en.cppreference.com/w/cpp/container/array
This way, you would also have the size known and accessible through the corresponding size() method. You could also consider other standard container types rather than raw arrays. In general, it is better to avoid raw arrays as much as possible.
Failing that, you would need to know the size of the array, either through a function parameter or by other means such as a class member variable if this is happening inside a class, and so on.
Then, you could use different sorting algorithms based on your complexity requirements, be it quicksort, bubble sort, heap sort, a stable sort, etc.; it depends on what kind of data the array represents.
One ready-made option is std::sort. With it, you would be writing something like this:
std::sort (mystdarray.begin(), mystdarray.end());
or
std::sort (myrawarray, myrawarray+size);

changing pointer members of a subroutine argument with intent(in)

I'm writing a sparse matrix library in Fortran for fun but ran into a little snag. I have a subroutine for matrix-vector multiplication with the interface
subroutine matvec(A, x, y)
    class(sparse_matrix), intent(in) :: A
    real(double_precision), intent(in) :: x(:)
    real(double_precision), intent(inout) :: y(:)
    {etc.}
This uses a sparse matrix type that I've defined myself, the implementation of which is unimportant. Now, I can make things nicer and have a lot less code if A contains an object called iterator:
type :: sparse_matrix
    type(matrix_iterator) :: iterator
    {etc.}
which stores a few variables that keep track of things during matvec. But, if I change the state of iterator and in turn the state of A during matrix multiplication, the compiler will throw a fit because A has intent(in) for that subroutine.
Suppose I change things around and instead define
type :: sparse_matrix
    type(matrix_iterator), pointer :: iterator
    {etc.}
It's no problem if I change the state of iterator during a procedure in which a matrix has intent(in), because the value of the pointer to iterator doesn't change; only the memory stored at that address is affected. This is confirmed by making a reduced test case, which compiles and runs just fine using GCC.
Am I correct in thinking this is an appropriate fix? Or should I change the subroutine so that A has intent(inout)? The fact that it compiled with GCC doesn't necessarily mean it's standard-compliant, nor does it mean it's good programming practice.
To make an analogy with C, suppose I had a function foo(int* const p). If I wrote
*p = 42;
this would be ok, as the value of the pointer doesn't change, only the data stored at the address pointed to. On the other hand, I couldn't write
p = &my_var;
because it's a constant pointer.
Yes, it is OK. Actually, this practice is well known and is used, for example, when doing reference-counting memory management, because the right-hand side of a defined assignment is an intent(in) argument, but you must be able to decrease the reference count in it.
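Not Fortran, but to spell out the C analogy from the question in a runnable form (a small sketch; the type and member names only mirror the question for illustration): when the containing object is const, the pointer member itself becomes const, while the data it points to does not.
#include <cassert>

struct matrix_iterator { int position; };

struct sparse_matrix {
    matrix_iterator* iterator;        // analogous to the Fortran pointer component
};

// A plays the role of an intent(in) argument: the object itself is const.
void matvec(const sparse_matrix& A) {
    A.iterator->position = 42;        // OK: the pointee is not const
    // A.iterator = nullptr;          // error: the pointer member is const here
}

int main() {
    matrix_iterator it{0};
    sparse_matrix A{&it};
    matvec(A);
    assert(it.position == 42);        // the iterator state changed; A itself did not
    return 0;
}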