How are ReplicatedDist and PrivateDist different? I know the syntax is different ;-) I’ve used PrivateDist extensively in my code and I am wondering what the advantages are of each. Is the ReplicatedDist local access (and known to the compiler) by default? I don’t think the documentation is clear enough.
How are ReplicatedDist and PrivateDist different?
(A) PrivateDist gives you a single array. ReplicatedDist gives you one array per locale. Normal array accesses give you the array on the current locale; special methods provided by ReplicatedDist let you access the arrays on other locales.
(B) The domain of a PrivateDist array is always the same, PrivateSpace. When using ReplicatedDist, you choose the domain that the array on each locale will have.
(C) Performance may differ, for example in the amount of communication between locales. I am not sure whether one is always better than the other, and if so, which one.
Is the ReplicatedDist local access (and known to the compiler) by default?
Yes, this is the intention. You may need to wrap the surrounding code in a local block for the compiler to take advantage of this.
How exactly are structs packed and padded in C++? The standard does not say anything about how it should be done (as far as I know), so compilers can do whatever they want. But there are tutorials showing how to pack structs efficiently using known rules (for example, that every member needs to be at an address that is a multiple of its size, and that padding is inserted if the end of the previous member is not at such a multiple), and with these rules we can pack structs by hand in the source. So which is it? Do we know how structs will be packed on modern machines (for example PCs), or is this just a rule of thumb that is often right but should not be taken for granted?
How exactly are structs packed and padded in C++?
Short answer: in such a way that the alignment requirements are satisfied.
The standard does not say anything about how it should be done (as far as I know) and compilers can do whatever they want.
Within the bounds of the alignment requirements, this is indeed correct, and it is also the answer to your question.
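For illustration, here is a minimal sketch of those rules in C++ (the offsets in the comments are not guaranteed by the standard; they assume a typical ABI where alignof(int) == 4):

    #include <cstddef>
    #include <cstdio>

    // Sketch of the common (but not standard-mandated) layout rule:
    // each member is placed at the next offset that is a multiple of its
    // alignment, and the struct gets trailing padding so its size is a
    // multiple of the strictest member alignment.
    struct Example {
        char  c;   // offset 0, size 1
                   // 3 padding bytes if alignof(int) == 4
        int   i;   // offset 4, size 4
        short s;   // offset 8, size 2
                   // 2 trailing padding bytes -> sizeof is a multiple of 4
    };

    int main() {
        std::printf("offset of i: %zu\n", offsetof(Example, i));  // typically 4
        std::printf("offset of s: %zu\n", offsetof(Example, s));  // typically 8
        std::printf("sizeof(Example): %zu\n", sizeof(Example));   // typically 12
    }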
I'm trying to port C++ code from a developer who uses global variables called
p0, p1, ..., p30
of integer type.
I wondered if I could not just use an array int p[31]; and access them as p[0], p[1],...
It seems plausible that there would be no performance hit if the indices were always passed as constants. Then I could just pass his data as extern int p[];.
Obviously I could use descriptive macros for the various indices to make the code clearer.
I know that this sounds like weird code, but the developer seems to have a "neurodiverse" personality, and we can't just tell him to mend his ways. Performance is very important in the module he is working on.
I don't see any danger in the replacement of variables with an array.
Modern compilers are very good at optimizing code.
You can normally assume that there will be no difference between using individual variables p0, … p30 and an std::array<int, 31> (or an int[31]), if they are used in the same way and you use only constants for indexing the array.
A compiler is not required to keep an std::array or an int[] as such; it can completely or partially optimize it away as long as it complies with the as-if rule.
Variables (including arrays) only need to exist in memory if the compiler can't determine their contents at compile time and/or if there are not enough registers to do all the manipulations involving those variables in registers alone.
If they do exist in memory, they need to be referenced by their address; for both a pN and a p[N] (with constant N), the address where the value lives can be determined in the same way at compile time.
If you are unsure whether the generated code is the same, you can always compare the output generated by the compiler (this can e.g. be done on godbolt), or use the corresponding compiler flags to inspect it locally if you don't want to submit code to an external service.
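As a hypothetical sketch of the replacement (the enum names are invented here; in practice they would describe what p0..p30 actually mean):

    int p[31];  // other translation units would declare: extern int p[31];

    // Descriptive names for the indices (invented for illustration).
    enum PIndex {
        kVelocity   = 0,   // was p0
        kPressure   = 1,   // was p1
        // ... one name per former variable ...
        kIterations = 30   // was p30
    };

    int readPressure() {
        // With a constant index, &p[kPressure] is a fixed address known at
        // compile/link time, just like the address of a separate global
        // variable, so the generated code is typically identical.
        return p[kPressure];
    }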
In a typical C or C++ struct the developer must explicitly order data members in a way that provides efficient memory alignment and padding, if that is an issue.
Google's Protocol Buffers behave a lot like structs and it is not clear how the compilation of these affects memory layout. Does anyone know if this tendency to organize data in a specific order for the sake of efficient memory layout is automatically handled by the protocol buffer compiler? I have been unable to find any information on this.
I.e., the buffer might actually order the data internally in a different way than it is specified in the protobuf message definition.
In a typical C or C++ struct the developer must explicitly order data members in a way that provides efficient memory alignment and padding, if that is an issue.
Actually this is not entirely true.
It's true that most compilers (actually all I know of) tend to align struct members on machine-word addresses. They do this for performance reasons: it is usually cheaper to read a whole word and just mask away some bits than to read a word, shift it so the value you are looking for is right-aligned, and then mask away the bits that are not needed. (Of course this depends on the architecture you are compiling for.)
So why is the statement I quoted above not entirely true? Precisely because compilers arrange members as described above, they also offer the programmer a way to influence this behavior, usually through a compiler-specific pragma.
For example, GCC and the MS C compilers provide a pragma called "pack" which allows the programmer to change the alignment behavior of the compiler for specific structs. Of course, if you choose to set pack to 1, memory usage improves, but this will possibly impact your runtime performance.
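A minimal sketch of the effect (the pack pragma shown is supported by GCC, Clang, and MSVC; the unpacked size in the comment assumes a typical ABI where alignof(int) == 4):

    #include <cstdio>

    struct Natural {        // default alignment rules apply
        char c;
        int  i;             // preceded by 3 padding bytes if alignof(int) == 4
    };

    #pragma pack(push, 1)
    struct Packed {         // pack(1): no padding between members
        char c;
        int  i;
    };
    #pragma pack(pop)

    int main() {
        std::printf("sizeof(Natural) = %zu\n", sizeof(Natural));  // typically 8
        std::printf("sizeof(Packed)  = %zu\n", sizeof(Packed));   // 5
    }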
What never happens to my knowledge is a reordering of the members in a struct by the compiler.
I am porting an application from Fortran to Java. I was wondering how to convert an EQUIVALENCE when it is between two different datatypes.
If I use a type cast, I may lose the data; or should I pass it as a byte array?
You have to fully understand the old FORTRAN code. EQUIVALENCE shares memory WITHOUT converting the values between different datatypes. Perhaps the programmer was conserving memory by overlapping arrays that weren't used at the same time and the EQUIVALENCE can be ignored. Perhaps they were doing something very tricky, based on the binary representation of a particular platform, and you will need to figure out what they were doing.
There is extremely little reason to use EQUIVALENCE in modern Fortran. In most cases where bits need to be transferred from one type to another without conversion, the TRANSFER intrinsic function should be used instead.
From http://www.fortran.com/F77_std/rjcnf0001-sh-8.html#sh-8.2 :
An EQUIVALENCE statement is used to specify the sharing of storage units by two or more entities in a program unit. This causes association of the entities that share the storage units.
If the equivalenced entities are of different data types, the EQUIVALENCE statement does not cause type conversion or imply mathematical equivalence. If a variable and an array are equivalenced, the variable does not have array properties and the array does not have the properties of a variable.
So, consider the reason it was EQUIVALENCE'd in the Fortran code and decide from there how to proceed. There's not enough information in your question to assess the intention or best way to convert it.
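If the EQUIVALENCE turns out to be a deliberate bit reinterpretation, the portable translation is a raw bit copy rather than a value cast. A minimal C++ sketch of what TRANSFER amounts to (in Java, Float.floatToIntBits and ByteBuffer serve the same purpose):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Copy the bits of a float into a 32-bit integer without converting
    // the value -- roughly what Fortran's TRANSFER does, and what an
    // EQUIVALENCE between a REAL and an INTEGER exposes.
    std::uint32_t bitsOf(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);  // bit copy, no value conversion
        return u;
    }

    int main() {
        // A value cast would give 1; the bit copy gives the IEEE 754
        // representation of 1.0f, which is 0x3F800000.
        std::printf("bits of 1.0f = 0x%08X\n", (unsigned)bitsOf(1.0f));
    }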
I am currently in the process of making our application Large Address Aware. As experience has shown, there are some unexpected gotchas when doing so. I am creating this post to collect a complete list of the steps that need to be taken.
The development considerations listed in the AMD Large Address Aware guide provide a good starting point, but are by no means complete:
The following considerations will help to make sure that the code can handle addresses larger than 2GB:
Avoid the use of signed pointer arithmetic (i.e., compares and adds)
Pointers use all 32-bits. Don’t use Bit31 for something else.
Some DLLs will be loaded just under the 2 GB boundary. In this case, no contiguous memory can be allocated with VirtualAlloc().
Whenever possible, use GlobalMemoryStatusEx() (preferred) or GlobalMemoryStatus() to retrieve memory sizes.
Therefore, the question is: What is the complete list of things which need to be done when making C++ Win32 native application Large Address Aware?
(obvious) select Support Addresses Larger Than 2 Gigabytes (/LARGEADDRESSAWARE) in the project properties: Linker / System / Enable Large Addresses
check all pointer subtractions and verify the result is stored in a type which can contain the possible difference, or replace them with comparisons or other constructs (see Detect pointer arithmetics because of LARGEADDRESSAWARE, and the sketch after this list). Note: pointer comparison should be fine; contrary to the AMD advice, there is no reason why it should cause 4 GB issues
make sure you are not assuming pointers have Bit31 zero, and do not attempt to use Bit31 for something else
replace all GetCursorPos calls with GetCursorInfo - see GetCursorPos fails with large addresses
for all assignments into PVOID64 use PtrToPtr64; this is needed e.g. when using ReadFileScatter, see the Remarks section of the ReadFileScatter documentation
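To make the pointer-subtraction item concrete, here is a minimal sketch (the helper functions are invented for illustration; the concern applies to 32-bit builds):

    #include <cstddef>
    #include <cstdint>

    // In a 32-bit large-address-aware process, two valid pointers can be
    // more than 2 GB apart, so their difference may not fit in a signed
    // 32-bit ptrdiff_t.

    // Safe: plain comparisons work regardless of where the object lives.
    bool contains(const char* begin, const char* end, const char* p) {
        return begin <= p && p < end;
    }

    // Safe: subtract as unsigned integers instead of relying on the
    // (possibly overflowing) signed result of end - begin.
    std::size_t spanSize(const char* begin, const char* end) {
        return static_cast<std::size_t>(
            reinterpret_cast<std::uintptr_t>(end) -
            reinterpret_cast<std::uintptr_t>(begin));
    }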