Use of MPI_GATHERV where the "root" has no send buffer - fortran

I want to employ the MPI function MPI_GATHERV, where each MPI rank has a certain buffer with varying sizes that needs to be Gathered at the Root process.
My root process will only gather the buffers, but has no send buffer itself, as usual.
E.g.
RANK1
buf_sz =3
buf(1) = 1
buf(2) = 2
buf(3) = 3
Rank 2
buf_sz =3
buf(1) = 4
buf(2) = 5
buf(3) = 6
After MPI_GATHERV, my Root should have
buf(1) = 1
buf(2) = 2
buf(3) = 3
buf(4) = 4
buf(5) = 5
buf(6) = 6
As for now, this will mean that at the Root I will have to apply a sendbuf which is allocated with zero elements and zero count. I am not sure how well defined and safe this is generally?
Sample code
if(myrankid == root)then
allocate(displ(3))
allocate(counts(3))
displ(1) = 0
displ(2) = 0
displ(3) = 3
counts(1) = 0
counts(2) = 3
counts(3) = 3
allocate(buf_send(0))
sendcount = 0
allocate(recvbuf(6))
else
allocate(displ(1))
allocate(counts(1))
allocate(buf_send(3))
allocate(recvbuf(1))
sendcount = 3
endif
MPI_Gatherv(buf_send, sendcount, mpi_integer,&
recvbuf, counts, displs,&
mpi_integer, root, mpi_comm_world)

(Turning my comment into an answer)
Yes, it is legal to allocate to zero size and pass that. That is precisely what I do

Related

Using AND bitwise operator between a number, and its negative counterpart

I stumbled upon this simple line of code, and I cannot figure out what it does. I understand what it does in separate parts, but I don't really understand it as a whole.
// We have an integer(32 bit signed) called i
// The following code snippet is inside a for loop declaration
// in place of a simple incrementor like i++
// for(;;HERE){}
i += (i&(-i))
If I understand correctly it uses the AND binary operator between i and negative i and then adds that number to i. I first thought that this would be an optimized way of calculating the absolute value of an integer, however as I come to know, c++ does not store negative integers simply by flipping a bit, but please correct me if I'm wrong.
Assuming two's complement representation, and assuming i is not INT_MIN, the expression i & -i results in the value of the lowest bit set in i.
If we look at the value of this expression for various values of i:
0 00000000: i&(-i) = 0
1 00000001: i&(-i) = 1
2 00000010: i&(-i) = 2
3 00000011: i&(-i) = 1
4 00000100: i&(-i) = 4
5 00000101: i&(-i) = 1
6 00000110: i&(-i) = 2
7 00000111: i&(-i) = 1
8 00001000: i&(-i) = 8
9 00001001: i&(-i) = 1
10 00001010: i&(-i) = 2
11 00001011: i&(-i) = 1
12 00001100: i&(-i) = 4
13 00001101: i&(-i) = 1
14 00001110: i&(-i) = 2
15 00001111: i&(-i) = 1
16 00010000: i&(-i) = 16
We can see this pattern.
Extrapolating that to i += (i&(-i)), assuming i is positive, it adds the value of the lowest set bit to i. For values that are a power of two, this just doubles the number.
For other values, it rounds the number up by the value of that lowest bit. Repeating this in a loop, you eventually end up with a power of 2. As for what such an increment could be used for, that depends on the context of where this expression was used.

Not all invocations running

I am attempting to implement a simple compute operation of multiplying all values in a buffer by a push constant.
This shader:
#version 450
layout(local_size_x = 1024, local_size_y = 1, local_size_z = 1) in;
layout(binding = 0) buffer Buffer { float x[]; };
layout(push_constant) uniform PushConsts { float a; };
void main() {
x[gl_GlobalInvocationID.x] *= a;
}
In my current implementation from my prints I get:
in:
1 2 3 4 5 5 4 3 2 1
[7]
workgroups: (1,1,1)
out:
7 14 3 4 5 5 4 3 2 1
1 2 3 4 5 5 4 3 2 1 being the initial values of the buffer.
7 being the value of the push constant.
(1,1,1) being the number of workgroups set.
7 14 3 4 5 5 4 3 2 1 being the resultant values of the buffer after execution.
Strangely only the 1st 2 values have been multiplied by the push constant (which leads me to think only the 1st 2 invocations ran).
I suspect this is likely an issue with memory allocation, although I don’t know where it may be coming from.
I would greatly appreciate any help, and if there is anything I can do to make this post better please let me know.
Project link, code file link and the .spv files (.comp files are in the repo, but I thought it worth including these in case you didn't want to compile the .comp files)
Problem originated with setting the size of the VkBuffer and VkDescriptorBufferInfo to the number of values rather than the number of bytes.
Making these changes fixes it:
bufferCreateInfo.size = size; -> bufferCreateInfo.size = sizeof(float)*size;
bindings[i].range = size; -> bindings[i].range = VK_WHOLE_SIZE;

Using bit wise operators

Am working on a C++ app in Windows platform. There's a unsigned char pointer that get's bytes in decimal format.
unsigned char array[160];
This will have values like this,
array[0] = 0
array[1] = 0
array[2] = 176
array[3] = 52
array[4] = 0
array[5] = 0
array[6] = 223
array[7] = 78
array[8] = 0
array[9] = 0
array[10] = 123
array[11] = 39
array[12] = 0
array[13] = 0
array[14] = 172
array[15] = 51
.......
........
.........
and so forth...
I need to take each block of 4 bytes and then calculate its decimal value.
So for eg., for the 1st 4 bytes the combined hex value is B034. Now i need to convert this to decimal and divide by 1000.
As you see, for each 4 byte block the 1st 2 bytes are always 0. So i can ignore those and then take the last 2 bytes of that block. So from above example, it's 176 & 52.
There're many ways of doing this, but i want to do it via using bit wise operators.
Below is what i tried, but it's not working. Basically am ignoring the 1st 2 bytes of every 4 byte block.
int index = 0
for (int i = 0 ; i <= 160; i++) {
index++;
index++;
float Val = ((Array[index]<<8)+Array[index+1])/1000.0f;
index++;
}
Since you're processing the array four-by-four, I recommend that you increment i by 4 in the for loop. You can also avoid confusion after dropping the unnecessary index variable - you have i in the loop and can use it directly, no?
Another thing: Prefer bitwise OR over arithmetic addition when you're trying to "concatenate" numbers, although their outcome is identical.
for (int i = 0 ; i <= 160; i += 4) {
float val = ((array[i + 2] << 8) | array[i + 3]) / 1000.0f;
}
First of all, i <= 160 is one iteration too many.
Second, your incrementation is wrong; for index, you have
Iteration 1:
1, 2, 3
And you're combining 2 and 3 - this is correct.
Iteration 2:
4, 5, 6
And you're combining 5 and 6 - should be 6 and 7.
Iteration 3:
7, 8, 9
And you're combining 8 and 9 - should be 10 and 11.
You need to increment four times per iteration, not three.
But I think it's simpler to start looping at the first index you're interested in - 2 - and increment by 4 (the "stride") directly:
for (int i = 2; i < 160; i += 4) {
float Val = ((Array[i]<<8)+Array[i+1])/1000.0f;
}

shifting with re-sampling in time series data

assume that i have this time-series data:
A B
timestamp
1 1 2
2 1 2
3 1 1
4 0 1
5 1 0
6 0 1
7 1 0
8 1 1
i am looking for a re-sample value that would give me specific count of occurrences at least for some frequency
if I would use re sample for the data from 1 to 8 with 2S, i will get different maximum if i would start from 2 to 8 for the same window size (2S)
ds = series.resample( str(tries) +'S').sum()
for shift in range(1,100):
tries = 1
series = pd.read_csv("file.csv",index_col='timestamp') [shift:]
ds = series.resample( str(tries) +'S').sum()
while ( (ds.A.max + ds.B.max < 4) & (tries < len(ds))):
ds = series.resample( str(tries) +'S').sum()
tries = tries + 1
#other lines
i am looking for performance improvement as it takes prohibitively long to finish for large data

Ordering an array based on 2D array of relations (higher, lower, doesn't matter)

I have been stuck with this problem for two days and I still can't get it right.
Basically, I have a 2D array with relations between certain numbers (in given range):
0 = the order doesn't matter
1 = the first number (number in left column) should be first
2 = the second number (number in upper row) should be first
So, I have some 2D array, for example this:
0 1 2 3 4 5 6
0 0 0 1 0 0 0 2
1 0 0 2 0 0 0 0
2 2 1 0 0 1 0 0
3 0 0 0 0 0 0 0
4 0 0 2 0 0 0 0
5 0 0 0 0 0 0 0
6 1 0 0 0 0 0 0
And my goal is to create a new array of given numbers (0 - 6) in such a way that it is following the rules from the 2D array (e.g. 0 is before 2 but it is after 6). I probably also have to check if such array exists and then create the array. And get something like this:
6 0 2 1 4 5
My Code
(It doesn't really matter, but I prefer c++)
So far I tried to start with ordered array 0123456 and then swap elements according to the table (but that obviously can't work). I also tried inserting the number in front of the other number according to the table, but it doesn't seem to work either.
// My code example
// I have:
// relArr[n][n] - array of relations
// resArr = {1, 2, ... , n} - result array
for (int i = 0; i < n; i++) {
for (int x = 0; x < n; x++) {
if (relArr[i][x] == 1) {
// Finding indexes of first (i) and second (x) number
int iI = 0;
int iX = 0;
while (resArr[iX] != x)
iX++;
while (resArr[iI] != i)
iI++;
// Placing the (i) before (x) and shifting array
int tmp, insert = iX+1;
if (iX < iI) {
tmp = resArr[iX];
resArr[iX] = resArr[iI];
while (insert < iI+1) {
int tt = resArr[insert];
resArr[insert] = tmp;
tmp = tt;
insert++;
}
}
} else if (relArr[i][x] == 2) {
int iI = 0;
int iX = 0;
while (resArr[iX] != x)
iX++;
while (resArr[iI] != i)
iI++;
int tmp, insert = iX-1;
if (iX > iI) {
tmp = resArr[iX];
resArr[iX] = resArr[iI];
while (insert > iI-1) {
int tt = resArr[insert];
resArr[insert] = tmp;
tmp = tt;
insert--;
}
}
}
}
}
I probably miss correct way how to check whether or not it is possible to create the array. Feel free to use vectors if you prefer them.
Thanks in advance for your help.
You seem to be re-ordering the output at the same time as you're reading the input. I think you should parse the input into a set of rules, process the rules a bit, then re-order the output at the end.
What are the constraints of the problem? If the input says that 0 goes before 1:
| 0 1
--+----
0 | 1
1 |
does it also guarantee that it will say that 1 comes after 0?
| 0 1
--+----
0 |
1 | 2
If so you can forget about the 2s and look only at the 1s:
| 0 1 2 3 4 5 6
--+--------------
0 | 1
1 |
2 | 1 1
3 |
4 |
5 |
6 | 1
From reading the input I would store a list of rules. I'd use std::vector<std::pair<int,int>> for this. It has the nice feature that yourPair.first comes before yourPair.second :)
0 before 2
2 before 1
2 before 4
6 before 0
You can discard any rules where the second value is never the first value of a different rule.
0 before 2
6 before 0
This list would then need to be sorted so that "... before x" and "x before ..." are guaranteed to be in that order.
6 before 0
0 before 2
Then move 6, 0, and 2 to the front of the list 0123456, giving you 6021345.
Does that help?
Thanks for the suggestion.
As suggested, only ones 1 are important in 2D array. I used them to create vector of directed edges and then I implemented Topological Sort. I decide to use this Topological Sorting Algorithm. It is basically Topological Sort, but it also checks for the cycle.
This successfully solved my problem.