OpenGL shared memory layout and size

Given the following glsl declarations (this is just an example):
struct S {
    f16vec3   a;
    float16_t b;
    f16vec3   c;
    float16_t d;
};
shared float16_t my_float_array[100];
shared S my_S_array[100];
I have the following questions:
How much shared memory will be used by a given declaration, in the above example for instance?
Which memory layout is used for variables in shared memory? std140, std430 or something else?
How does this play with bank conflicts?
I was able to get the total shared memory required by a program using glGetProgramBinary and skipping to the beginning of the text part, indicated by a line starting with "!!NV":
...
!!NVcp5.0
OPTION NV_shader_buffer_load;
OPTION NV_internal;
OPTION NV_gpu_program_fp64;
OPTION NV_shader_storage_buffer;
OPTION NV_bindless_texture;
OPTION NV_gpu_program5_mem_extended;
GROUP_SIZE 4 4 4;
SHARED_MEMORY 4480;
SHARED shared_mem[] = { program.sharedmem };
...
This is rather indirect though and does not tell much about the alignment/packing rules.

Which memory layout is used for variables in shared memory? std140, std430 or something else?
It's implementation-defined.
The layouts applied to buffer-backed storage matter because code external to the shader needs to be able to access it. shared variables cannot be accessed outside the shader, so the layout of such variables is effectively implementation-defined.
In short, you cannot know for certain how much storage any particular shared variable declaration will consume. You can assume that it will take up at least as many bytes as is minimally required to store the data you asked to store. But that's about it.
How does this play with bank conflicts?
"Banks" are not a concept which OpenGL or GLSL recognizes. It is therefore implementation-defined.

Related

How to manage type-safety in POSIX shared memory

I have a 'program' that runs as multiple independent processes that use POSIX semaphores and shared memory to store and share common variables with each other. The problem is that so far this has been implemented as a main program which sets the initial values of the variables in shared memory, and the other programs rely on getting the address offset right to access a given variable.
I would like to improve on this design. An idea I had was to bundle together all methods for access to shared memory in a shared library, and have perhaps an enum as an argument to a GetVariable method, something like this
enum class SHMVariables : long
{
    kVariable1 = address_offset_of_variable_1,
    kVariable2 = address_offset_of_variable_2,
    ...
};
template <class T>
void GetVariable(SHMVariables var, T &value)
{
    // lock semaphore, do some memory boundary checks, and return the value of the variable
}
The challenge is that the variables may have different arithmetic types (some are user-defined structs with arithmetic type members) and I'm wondering what a best approach could be to manage type-safety. Could a map do the trick?
std::map<SHMVariables, std::size_t> // the value being assigned with something like sizeof(struct foo)
What are best practices when it comes to managing variables of different type in shared memory? Is shared memory even an appropriate choice?

Two Different Processes With 2 std::atomic Variables on Same Address?

I read C++ Standard (n4713)'s § 32.6.1 3:
Operations that are lock-free should also be address-free. That is,
atomic operations on the same memory location via two different
addresses will communicate atomically. The implementation should not
depend on any per-process state. This restriction enables
communication by memory that is mapped into a process more than once
and by memory that is shared between two processes.
So it sounds like it is possible to perform a lock-free atomic operation on the same memory location. I wonder how it can be done.
Let's say I have a named shared memory segment on Linux (via shm_open() and mmap()). How can I perform a lock-free operation on the first 4 bytes of the shared memory segment, for example?
At first, I thought I could just reinterpret_cast the pointer to std::atomic<int32_t>*. But then I read this. It first points out that std::atomic<T> might not have the same size or alignment as T:
When we designed the C++11 atomics, I was under the misimpression that
it would be possible to semi-portably apply atomic operations to data
not declared to be atomic, using code such as
int x; reinterpret_cast<atomic<int>&>(x).fetch_add(1);
This would clearly fail if the representations of atomic and int
differ, or if their alignments differ. But I know that this is not an
issue on platforms I care about. And, in practice, I can easily test
for a problem by checking at compile time that sizes and alignments
match.
Though, that is fine with me in this case because I use shared memory on the same machine, and casting the pointer in two different processes will "acquire" the same location. However, the article states that the compiler might not treat the cast pointer as a pointer to an atomic type:
However this is not guaranteed to be reliable, even on platforms on
which one might expect it to work, since it may confuse type-based
alias analysis in the compiler. A compiler may assume that an int is
not also accessed as an atomic<int>. (See 3.10, [Basic.lval], last
paragraph.)
Any input is welcome!
The C++ standard doesn't concern itself with multiple processes and no guarantees were given outside of a multi-threaded environment.
However, the standard does recommend that implementations of lock-free atomics be usable across processes, which is the case in most real implementations.
This answer will assume atomics behave more or less the same with processes as with threads.
The first solution requires C++20 atomic_ref
void* shared_mem = /* something */;
auto p1 = new (shared_mem) int;   // For creating the shared object
auto p2 = (int*)shared_mem;       // For getting the shared object
std::atomic_ref<int> i{*p2};      // Use i as if it were atomic<int>
You need to make sure the shared int has std::atomic_ref<int>::required_alignment alignment; typically the same as sizeof(int). Normally you'd use alignas() on a struct member or variable, but in shared memory the layout is up to you (relative to a known page boundary).
This avoids having opaque atomic types in the shared memory itself, which gives you precise control over what exactly goes in there.
A solution prior to C++20 would be
auto p1 = new (shared_mem) std::atomic<int>;     // For creating the shared object
auto p2 = (std::atomic<int>*)shared_mem;         // For getting the shared object
auto& i = *p2;
Or using C11 atomic_load and atomic_store
_Atomic int* i = (_Atomic int*)shared_mem;
atomic_store(i, 42);
int i2 = atomic_load(i);
Alignment requirements are the same here, alignof(std::atomic<int>) or _Alignof(atomic_int).
Yes, the C++ standard is a bit mealy-mouthed about all this.
If you are on Windows (which you probably aren't) then you can use InterlockedExchange() etc, which offer all the required semantics and don't care where the referenced object is (it's a LONG *).
On other platforms, gcc has some atomic builtins which might help with this. They might free you from the tyranny of the standards writers. Trouble is, it's hard to test if the resulting code is bullet-proof.
On all mainstream platforms, std::atomic<T> does have the same size as T, although possibly a stricter alignment requirement if alignof(T) < sizeof(T).
You can check these assumptions with:
static_assert(sizeof(T) == sizeof(std::atomic<T>),
              "atomic<T> isn't the same size as T");
static_assert(std::atomic<T>::is_always_lock_free,  // C++17
              "atomic<T> isn't lock-free, unusable on shared mem");

auto atomic_ptr = static_cast<atomic<int>*>(some_ptr);
// Beware strict-aliasing violations: don't also access the same
// memory via int* unless you're aware of the possible issues.
// Also make sure that the pointer is aligned to alignof(atomic<T>),
// otherwise you might get tearing (non-atomicity).
On exotic C++ implementations where these aren't true, people that want to use your code on shared memory will need to do something else.
Or if all accesses to shared memory from all processes consistently use atomic<T> then there's no problem, you only need lock-free to guarantee address-free. (You do need to check this: std::atomic uses a hash table of locks for non-lock-free. This is address-dependent, and separate processes will have separate hash tables of locks.)

C++ Class Memory Model And Alignment

I have several questions to ask that pertains to data position and alignment in C++. Do classes have the same memory placement and memory alignment format as structs?
More specifically, is data loaded into memory based on the order in which it's declared? Do functions affect memory alignment and data position or are they allocated to another location? Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct. I'm just curious to know whether or not this is intrinsic to classes as it is to structs and whether or not it will translate well into classes if I chose to use that approach.
Edit: Thanks for all your answers. They've really helped a lot.
Do classes have the same memory placement and memory alignment format
as structs?
The memory placement/alignment of objects is not contingent on whether their type was declared as a class or a struct. The only difference between a class and a struct in C++ is that a class has private members by default while a struct has public members by default.
More specifically, is data loaded into memory based on the order in
which it's declared?
I'm not sure what you mean by "loaded into memory". Within an object however, the compiler is not allowed to rearrange variables. For example:
class Foo {
int a;
int b;
int c;
};
The variable c must be located after b, and b must be located after a, within a Foo object. They are also constructed (initialized) in the order shown in the class declaration when a Foo is created, and destructed in the reverse order when a Foo is destroyed.
It's actually more complicated than this due to inheritance and access modifiers, but that is the basic idea.
Do functions affect memory alignment and data position or are they
allocated to another location?
Functions are not data, so alignment isn't a concern for them. In some executable file formats and/or architectures, function binary code does in fact occupy a separate area from data variables, but the C++ language is agnostic to that fact.
Generally speaking, I keep all of my memory alignment and position
dependent stuff like file headers and algorithmic data within a
struct. I'm just curious to know whether or not this is intrinsic to
classes as it is to structs and whether or not it will translate well
into classes if I chose to use that approach.
Memory alignment is something that's almost automatically taken care of for you by the compiler. It's more of an implementation detail than anything else. I say "almost automatically" since there are situations where it may matter (serialization, ABIs, etc) but within an application it shouldn't be a concern.
With respect with reading files (since you mention file headers), it sounds like you're reading files directly into the memory occupied by a struct. I can't recommend that approach since issues with padding and alignment may make your code work on one platform and not another. Instead you should read the raw bytes a couple at a time from the file and assign them into the structs with simple assignment.
Do classes have the same memory placement and memory alignment format as structs?
Yes. Technically there is no difference between a class and a struct. The only difference is the default member access specification otherwise they are identical.
More specifically, is data loaded into memory based on the order in which it's declared?
Yes.
Do functions affect memory alignment and data position or are they allocated to another location?
No. They do not affect alignment. Methods are compiled separately, and the object does not contain any reference to them. (To those who say virtual tables do affect members: the answer is yes and no, but this is an implementation detail that does not change the relative layout between members. The compiler is allowed to add implementation-specific data to the object.)
Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct.
OK. Not sure how that affects anything.
I'm just curious to know whether or not this is intrinsic to classes as it is to structs
Class/Structs different name for the same thing.
and whether or not it will translate well into classes if I chose to use that approach.
Choose what approach?
C++ classes essentially translate into structs whose data members are the instance variables, while the member functions are separated from the class and treated like free functions that accept those structs as an argument.
The exact way instance variables are stored depends on the compiler used, but they generally tend to be in order.
C++ classes do not participate in "persistence" the way binary-mode structures do, and shouldn't have alignment attached to them. Keep the classes simple.
Attaching alignment to classes may have negative performance effects and may have side effects too.

C++ object in memory

Is there a standard for storing C++ objects in memory? I wish to set a char* pointer to a certain address in memory, so that I can read certain objects' variables directly from memory, byte by byte. When I am using Dev-C++, the variables are stored one by one right at the memory address of an object, in the order in which they were defined. Now, can it be different when using a different compiler (like the variables being in a different order, or somewhere else)? Thank you in advance. :-)
The variables can't be in a different order, as far as I know. However, there may be varying amounts of padding between members. Also I think all bets are off with virtual classes and different implementations of user-defined types (such as std::string) may be completely different between libraries (or even build options).
It seems like a very suspicious thing to do. What do you need it for: to access private members?
I believe that the in-memory layout of objects is implementation defined - not the ordering, necessarily, but the amount of space. In particular, you will probably run into issues with byte-alignment and so-forth, especially across platforms.
Can you give us some details of what you're trying to do?
Implementations are free to do anything they want :P. However since C++ has to appeal to certain styles of programming, you will find a deterministic way of accessing your fields for your specific compiler/platform/cpu architecture.
If your byte ordering is varied on a different compiler, my first assumption would be byte packing issues. If you need the class to have a certain specific byte ordering first look up "#pragma pack" directives for your compiler... you can change the packing order into something less optimal but deterministic. Please note this piece of advice generally applies to POD data types.
The C++ compiler is not allowed to reorder variables within a visibility block (public, protected, etc). But it is allowed to reorder variables in separate visibility blocks. For example:
struct A {
    int a;
    short b;
    char c;
};

struct B {
    int a;
public:
    short b;
protected:
    char c;
};
In the above, the variables in A will always be laid out in the order a, b, c. The variables in B might be laid out in another order if the compiler chose. And, of course, there are alignment and packing requirements so there might be "spaces" between some of the variables if needed.
Keep in mind when working with multi-dimensional arrays that they are stored in Row Major Order.
The order of the variables should never change, but as others have said, the byte packing will vary. Another thing to consider is the endianness of the platform.
To get around the byte alignment/packing problem, most compilers offer some way to guide the process. In gcc you could use __attribute__((__packed__)) and in msvc #pragma pack.
I've worked with something that did this professionally, and as far as I could tell, it worked very specifically because it was decoding something another tool encoded, so we always knew exactly how it worked.
We did also use structs that we pointed at a memory address, then read out data via the struct's variables, but the structs notably included packing and we were on an embedded platform.
Basically, you can do this, so long as you know -exactly- how everything is constructed on a byte-by-byte level. (You might be able to get away with knowing when it's constructed the same way, which could save some time and learning)
It sounds like you want to marshall objects between machines over a TCP/IP connection. You can probably get away with this if the code was compiled with the same compiler on each end, otherwise, I'm not so sure. Keep in mind that if the platforms can be different, then you might need to take into account different processor endians!
Sounds like what you real want to ask is how to serialize your objects
http://dieharddeveloper.blogspot.in/2013/07/c-memory-layout-and-process-image.html
In the middle of the process's address space, there is a region reserved for shared objects. When a new process is created, the process manager first maps the two segments from the executable into memory. It then decodes the program's ELF header. If the program header indicates that the executable was linked against a shared library, the process manager (PM) will extract the name of the dynamic interpreter from the program header. The dynamic interpreter points to a shared library that contains the runtime linker code.

Arranging global/static objects sequentially in memory

In C++, is it possible to force the compiler to arrange a series of global or static objects in a sequential memory position? Or is this the default behavior? For example, if I write…
MyClass g_first("first");
MyClass g_second("second");
MyClass g_third("third");
… will these objects occupy a contiguous chunk of memory, or is the compiler free to place them anywhere in the address space?
The compiler can do as it pleases when it comes to placing static objects in memory; if you want better control over how your globals are placed, you should consider writing a struct that encompasses all of them. That will guarantee that your objects will all be packed in a sequential and predictable order.
Placing specific variables or groups of variables at a particular memory location is not a standard feature of C++.
But some compilers support special methods to do this, especially in embedded systems. For example, Keil provides an _at_ keyword to place a particular variable at a specific address.
The way to force objects to be in a contiguous piece of memory is to put them into an array.
If you use the built-in array type, the elements can be initialized with an aggregate initializer (and you can change their values later):
MyClass my_globals[3] = { MyClass("first"), MyClass("second"), MyClass("third") };
If you use a dynamic array (called std::vector in C++), you are more flexible:
namespace {
    typedef std::vector<MyClass> my_globals_type;

    my_globals_type init_my_globals()
    {
        my_globals_type globals;
        globals.push_back(MyClass("first"));
        globals.push_back(MyClass("second"));
        globals.push_back(MyClass("third"));
        return globals;
    }

    my_globals_type my_globals = init_my_globals();
}
Note that global variables are usually frowned upon. And rightly so.
Yes, some compilers now contain optimizations that will automatically do something like this for you:
e.g.
Automatic pool allocation: improving performance by controlling data structure layout in the heap
Restructuring field layouts for embedded memory systems
MPADS: memory-pooling-assisted data splitting