should I lock before dlopen? - c++

I have a *.so library that obtains some information from system libraries using dlopen. The library can be used by multiple applications simultaneously.
Maybe it is a silly question, but should I flock the library before doing dlopen on it? I haven't found a direct answer anywhere.

Similar to what was said in the comments, you don't need a semaphore (or flock) unless you are accessing a shared resource that could change underneath you (i.e., accessing shared memory and needing to ensure consistency of that data). Here is how dynamic loading via dlopen() works:
Those two routines are actually simple wrappers that call back into
the dynamic linker. When the dynamic linker loads a library via
dlopen(), it does the same relocation and symbol resolution it does on
any other library, so the dynamically loaded program can without any
special arrangements call back to routines already loaded.
Because of the way linking works, relocations and modifications to the GOT/PLT are made in the memory space of the process calling dlopen(), not in the shared object file itself.
If a hundred processes use a shared library, it makes no sense to have
100 copies of the code in memory taking up space. If the code is
completely read-only, and hence never, ever, modified...
Since the shared object's code is mapped into read-only memory, you never need to worry about it suddenly changing underneath you, so there is no need for flock :)
Note: because your shared object links to other shared objects, the GOT of the initial shared object needs to be updated with the relocations of the libraries being loaded with dlopen(). But that GOT is stored in a read-write segment of the process's own memory space, not in the shared objects themselves.
the shared library must still have a unique data instance in each
process... the read-write data section is always put at a known offset
from the code section of the library. This way, via the magic of
virtual memory, every process sees its own data section but can share
the unmodified code.
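To make this concrete, here is a minimal sketch of plain, unsynchronized dlopen() use (compile with -ldl); "libexample.so" and "example_fn" are hypothetical names used only for illustration:

    // No flock or other locking around dlopen(): the dynamic linker
    // performs relocations in this process's own memory, and the
    // library's code pages are mapped read-only and shared.
    #include <dlfcn.h>
    #include <cstdio>

    int main() {
        void* handle = dlopen("libexample.so", RTLD_NOW);  // hypothetical library
        if (!handle) {
            std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }

        // The cast through dlsym's void* return is the usual POSIX idiom.
        auto fn = reinterpret_cast<void (*)()>(dlsym(handle, "example_fn"));
        if (fn) fn();

        dlclose(handle);
        return 0;
    }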

Related

Does dlopen create multiple library instances?

I can't seem to find an answer to this after searching the net.
When I use dlopen the first time it seems to take longer than any time after that, including if I run it from multiple instances of a program.
Does dlopen load up the so into memory once and have the OS save it so that any following calls even from another instance of the program point to the same spot in memory?
So basically, do 3 instances of a program using a library mean 3 instances of the same .so are loaded into memory, or is there only one instance in memory?
Thanks
Does dlopen load up the so into memory once and have the OS save it so that any following calls even from another instance of the program point to the same spot in memory?
Multiple calls to dlopen from within a single process are guaranteed to not load the library more than once. From the man page:
If the same shared object is loaded again with dlopen(), the same
object handle is returned. The dynamic linker maintains reference
counts for object handles, so a dynamically loaded shared object is
not deallocated until dlclose() has been called on it as many times
as dlopen() has succeeded on it.
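A quick sketch of that reference counting, using libm.so.6 only because it is present on practically every Linux system:

    // Opening the same library twice in one process returns the same
    // handle; the dynamic linker just bumps the reference count.
    #include <dlfcn.h>
    #include <cassert>
    #include <cstdio>

    int main() {
        void* h1 = dlopen("libm.so.6", RTLD_NOW);
        void* h2 = dlopen("libm.so.6", RTLD_NOW);
        assert(h1 && h1 == h2);   // same object handle, refcount is now 2
        std::printf("handle: %p\n", h1);
        dlclose(h2);              // refcount back to 1, library stays mapped
        dlclose(h1);              // refcount 0, library may be unmapped
        return 0;
    }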
When the first call to dlopen happens, the library is mmaped into the calling process. There are usually at least two separate mmap calls: the .text and .rodata sections (which usually reside in a single RO segment) are mapped read-only, the .data and .bss sections are mapped read-write.
A subsequent dlopen from another process performs the same mmaps. However the OS does not have to load any of the read-only data from disk -- it merely increments reference counts on the pages already loaded for the first dlopen call. That is the sharing in "shared library".
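On Linux you can watch those mappings directly. A sketch that dlopens libm and prints its lines from /proc/self/maps (expect an r-xp mapping for code, r--p for read-only data, and rw-p for .data/.bss):

    #include <dlfcn.h>
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        dlopen("libm.so.6", RTLD_NOW);
        std::ifstream maps("/proc/self/maps");
        std::string line;
        while (std::getline(maps, line))
            if (line.find("libm") != std::string::npos)
                std::cout << line << '\n';   // one line per mapped segment
        return 0;
    }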
So basically, do 3 instances of a program using a library mean 3 instances of the same .so are loaded into memory, or is there only one instance in memory?
Depends on what you call an "instance".
Each process will have its own set of (dynamically allocated) runtime loader structures describing this library, and each set will contain an "instance" of the shared library (which can be loaded at a different address in each process). Each process will also have its own instance of the writable data (which uses copy-on-write semantics). But the read-only mappings will all occupy the same physical memory (though they can appear at different addresses in each of the processes).

Will the complete library get loaded into memory (RAM) during execution of a program?

How does it differ between a static and a dynamic library?
I understand how static and dynamic libraries are created and used, but I have a doubt about how a library is loaded into primary memory: will a static/dynamic library get fully loaded into RAM if we call only one function from it?
E.g., suppose we have a 10 MB library and we call only one function from it: will the complete library get loaded, or only the object code of the called function? And is it the same for a static and a dynamic library? (If we use a static library the executable size will be larger, but what about loading time?)
thanks in advance!
Linux (like all modern OSes with on-demand paging) will map your whole library on load, but only page in the pages it actually has to touch, e.g. to initialize the library and resolve all external (non-lazy) symbols.
Those tasks are mostly delegated to a user-mode dynamic loader.
Parts of your image that are never written, or that are re-merged afterwards by KSM (Kernel Samepage Merging), can be stored only once, relieving memory pressure.
When dynamic linking is needed, the kernel bootstraps the dynamic
linker (ELF interpreter), which initializes itself, and then loads the
specified shared objects (unless already loaded).
IBM: Anatomy of Linux dynamic libraries
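A sketch of the demand-paging mechanism itself, the same one that pages in library code on first access. It uses an anonymous mapping and the Linux-specific mincore() call; no page is resident until it is touched:

    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        long page = sysconf(_SC_PAGESIZE);
        size_t len = 16 * page;
        auto* p = static_cast<unsigned char*>(
            mmap(nullptr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));

        unsigned char vec[16];
        mincore(p, len, vec);
        std::printf("before touch: page0=%d page8=%d\n",
                    vec[0] & 1, vec[8] & 1);      // both 0: nothing resident

        p[8 * page] = 1;                          // faults in page 8 only
        mincore(p, len, vec);
        std::printf("after touch:  page0=%d page8=%d\n",
                    vec[0] & 1, vec[8] & 1);      // 0 and 1
        munmap(p, len);
        return 0;
    }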

Dynamic linking, memory usage and concurrency

When an executable links with a static library, the executable contains only the necessary library parts that are used in the code, right?
But I'm missing the part about how the shared objects (the dynamically linked libraries) are used exactly.
As far as I know, they are not included in the executable; they are loaded dynamically (e.g. using dlopen), and this is done directly by the linker, right?
In this case, where is this library located in memory? I mean, there are posts here explaining that dynamic libraries can reduce memory usage, but how exactly? And if a dynamic library is somehow loaded into shared memory (for several processes), how does the kernel handle concurrency in this case?
I realize this is probably something fundamental, and sorry if this is a duplicate; I couldn't find such a question.
I am aware of Static linking vs dynamic linking, and what I ask is a bit different.
The shared library is indeed loaded into memory that is shared between all "users" (all applications using the same library).
This is essentially done by reference-counting, so for each new user of the library, the reference is counted up. When an application exits, the reference count is counted down. If it gets to zero, the library is no longer needed, and will be removed from memory (quite possibly only when "memory is needed for something else", rather than "immediately"). The reference counting is done "atomically" by the kernel, so there is no conflict of concurrency.
Note that it's only the CODE in the shared library that is actually shared. Any data sections will be private per process.
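A sketch of that split, with a hypothetical two-file setup (libcounter.so and its `counter` global are invented for illustration):

    // counter.cpp -- build with: g++ -shared -fPIC -o libcounter.so counter.cpp
    extern "C" int counter = 0;                    // lands in the .data section
    extern "C" int bump() { return ++counter; }

    // main.cpp -- build with: g++ main.cpp -L. -lcounter
    #include <cstdio>
    extern "C" int bump();

    int main() {
        // Every process running this maps the same read-only code pages
        // of libcounter.so, but each gets a private copy of `counter`:
        // two instances started together both print 1, 2, 3.
        for (int i = 0; i < 3; ++i) std::printf("%d\n", bump());
        return 0;
    }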
The dynamic library is loaded only once for all the processes that use it. The memory of the dynamic library is then mapped into each process's address space by the operating system. This way, it consumes its required memory only once. With static linking, every executable includes the statically linked code. When the executable is loaded, the statically linked code is loaded as well. This means a function that is included in 10 executables resides 10 times in memory.

Will using a shared library in place of a static library affect memory usage?

I am linking against 10 static libraries.
My binary file size is reduced when I use dynamic libraries.
As far as I know, using dynamic libraries will not reduce memory usage.
But my senior told me that using shared libraries will also reduce memory usage (when multiple processes are running the same executable code).
Is that statement right?
He told me that since there will be no duplicate copies of the functions used in the library, memory usage will be less when you create n instances of that process.
When the process starts, it forks its 10 children. So will using dynamic libraries in place of static libraries reduce total memory usage?
In your example, dynamic libraries won't save you much. When you fork your process on a modern OS all the pages are marked copy on write rather than actually copied. So your static library is already shared between your 10 copies of your process.
However, where you can save is when the dynamic library is shared between different processes rather than forks of the same process. So if you're using the same glibc.so as another process, the two processes are sharing the physical pages of glibc.so, even though they are otherwise unrelated processes.
If you fork given process there shouldn't be much of a difference, because most operating systems use copy-on-write. This means that pages will only be copied if they're updated, so things like the code segments in shared libraries shouldn't be affected.
On the other hand different processes won't be able to share code if they're statically linked. Consider libc, which practically every binary links against... if they were all statically linked you'd end up with dozens of copies of printf in memory.
The bottom line is you shouldn't link your binaries statically unless you have an excellent reason for it.
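A minimal sketch of the copy-on-write behaviour described above: after fork(), parent and child share the physical pages of this static buffer until one of them writes, and only the written page gets duplicated:

    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    static int big[1 << 20];    // ~4 MiB of statically linked data

    int main() {
        big[0] = 42;
        pid_t pid = fork();
        if (pid == 0) {                             // child
            big[0] = 7;                             // COW copies one page only
            std::printf("child:  %d\n", big[0]);    // prints 7
            _exit(0);
        }
        waitpid(pid, nullptr, 0);
        std::printf("parent: %d\n", big[0]);        // still prints 42
        return 0;
    }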
Your senior in this instance is correct. A single copy of the shared library will be loaded into memory and will be used by every program that references it.
There is a post regarding this topic here:
http://www.linuxquestions.org/linux/articles/Technical/Understanding_memory_usage_on_Linux

How to get Shared Object in Shared Memory

Our app depends on an external, 3rd party-supplied configuration (including custom driving/decision making functions) loadable as .so file.
Independently, it cooperates with external CGI modules using a chunk of shared memory, where almost all of its volatile state is kept, so that the external modules can read it and modify it where applicable.
The problem is that the CGI modules require a lot of the permanent config data from the .so as well, and the main app performs a whole lot of entirely unnecessary copying between the two memory areas to make the data available. The idea is to load the whole shared object into shared memory and make it directly available to the CGI modules. The problem is: how?
dlopen and dlsym don't provide any facilities for assigning where to load the SO file.
we tried shmat(). It seems to work only until some external CGI actually tries to access the shared memory. Then the area pointed to appears just as private as if it had never been shared. Maybe we're doing something wrong?
loading the .so in each script that needs it is out of the question. The sheer size of the structure, combined with the frequency of calls (some of the scripts are called once a second to generate live updates), and this being an embedded app, makes it a no-go.
simply memcpy()'ing the .so into shm is not good either - some structures and all functions are interconnected through pointers.
The first thing to bear in mind when using shared memory is that the same physical memory may well be mapped into the two processes' virtual address spaces at different addresses. This means that if pointers are used anywhere in your data structures, they are going to cause problems. Everything must work off an index or an offset to work correctly. To use shared memory, you will have to purge all the pointers from your code.
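A sketch of what "index or offset instead of pointer" looks like in practice; SharedTable and Node are hypothetical types invented for this example:

    #include <cstdint>

    struct Node {
        std::int32_t value;
        std::int32_t next;      // index of the next node, -1 for end of list
    };

    struct SharedTable {        // placed at the start of the shared segment
        std::int32_t head;      // index of the first node, -1 if empty
        Node nodes[1024];
    };

    // Walking the list needs only the mapping's local base address, so it
    // works no matter where each process happens to map the segment.
    inline int sum(const SharedTable* t) {
        int total = 0;
        for (std::int32_t i = t->head; i != -1; i = t->nodes[i].next)
            total += t->nodes[i].value;
        return total;
    }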
When loading a .so file, only one copy of the .so file code is loaded (hence the term shared object).
fork may also be your friend here. Most modern operating systems implement copy-on-write semantics. This means that when you fork, your data segments are only copied into separate physical memory when one process writes to the given data segment.
I suppose the easiest option would be to use a memory-mapped file, as Neil has already proposed. If that option does not fit well, an alternative could be to define a dedicated allocator. Here is a good paper about it: Creating STL Containers in Shared Memory
There is also Ion Gaztañaga's excellent Boost.Interprocess library with shared_memory_object and related features. Ion has proposed the solution to the C++ standardization committee for a future TR: Memory Mapped Files And Shared Memory For C++
which may indicate it is a solution worth considering.
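For reference, a minimal Boost.Interprocess sketch of the writer side; "AppConfigShm" is a hypothetical segment name, and a second process would open it with open_only instead of create_only:

    #include <boost/interprocess/shared_memory_object.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    #include <cstring>

    int main() {
        using namespace boost::interprocess;

        shared_memory_object shm(create_only, "AppConfigShm", read_write);
        shm.truncate(4096);                       // size the segment

        mapped_region region(shm, read_write);    // map it into this process
        std::memset(region.get_address(), 0, region.get_size());

        // ... fill in pointer-free config data here ...

        shared_memory_object::remove("AppConfigShm");   // cleanup for the demo
        return 0;
    }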
Placing actual C++ objects in shared memory is very, very difficult, as you have found. I would strongly recommend you don't go that way - putting data that needs to be shared in shared memory or a memory mapped file is much simpler and likely to be much more robust.
You need to implement serialization for your objects.
A serialization function converts your object into bytes; you can then write the bytes into shared memory and have your CGI module deserialize the bytes back into an object.
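A sketch of that approach with a hypothetical Config type: the pointer-bearing in-process form is flattened into a contiguous, pointer-free byte image that can be memcpy'd into shared memory:

    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <vector>

    struct Config {                  // in-process form; std::string holds a pointer
        std::uint32_t version;
        std::string   name;
    };

    // Wire format: [version:4][name_len:4][name bytes...] -- no pointers.
    std::vector<char> serialize(const Config& c) {
        auto len = static_cast<std::uint32_t>(c.name.size());
        std::vector<char> out(8 + len);
        std::memcpy(out.data(),     &c.version, 4);
        std::memcpy(out.data() + 4, &len,       4);
        std::memcpy(out.data() + 8, c.name.data(), len);
        return out;
    }

    Config deserialize(const char* buf) {
        Config c;
        std::uint32_t len;
        std::memcpy(&c.version, buf,     4);
        std::memcpy(&len,       buf + 4, 4);
        c.name.assign(buf + 8, len);
        return c;
    }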