How are global variables in shared libraries linked?

Suppose I have a shared library with this function, where "i" is some global variable.
int foo() {
    return i++;
}
When I call this function from multiple processes, the value of "i" in each process is independent of the other processes.
This behavior is quite expected.
I was just wondering how this behavior is usually implemented by the linker. From my understanding the code is shared between processes, so the variable has to have the same virtual address in the address space of every program that uses this library. That condition seems quite difficult to satisfy, so I guess I am missing something here and it is done differently.
Can I get some more detailed info on this subject?

The dynamic linking process at run time (much like the static linking process) allocates separate data (and bss) segments for each process and maps them into the process's address space. Only the text segments are shared between processes. This way, each process gets its own copy of static data.

the code is shared between processes, so the variable has to have the
same virtual address in all address spaces of every program that uses
this library
The code is not shared the way you think. Yes, the dynamic shared object is loaded only once, but the memory it refers to, including the stack and the heap used by the code in the .so, is not shared. Only the section that contains the code is shared.

Each process has its own unique address space, so when a process accesses the variable it can have a different value than in another process. If the processes should share the same memory, they would have to set this up specifically. A shared library is not enough for that.
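To see the per-process copy in action without building a library, here is a minimal sketch. It assumes a POSIX system and uses fork() as a stand-in for "a second process that has the same code mapped"; the helper name parent_value_after_child_calls is made up for this illustration. The child increments the global, and the parent's copy stays untouched because writable data pages are private to each process.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stand-in for the library's global variable "i" from the question.
int i = 0;

int foo() {
    return i++;
}

// Hypothetical helper: forks a child that calls foo() `child_calls` times,
// then returns the value of `i` the parent still sees (or -1 on error).
int parent_value_after_child_calls(int child_calls) {
    assert(child_calls >= 0);
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        // Child: the first write to `i` gives the child its own private
        // copy of the page, so the parent's memory is not modified.
        for (int n = 0; n < child_calls; ++n) {
            foo();
        }
        _exit(i == child_calls ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;
    }
    return i;  // the parent's own copy, untouched by the child
}
```

Calling parent_value_after_child_calls(3) from a fresh program returns 0: the child counted to 3 in its own copy while the parent's "i" never moved.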

Related

Understanding Dynamic Library loading in Linux

I am trying to understand Dynamic Library loading in Linux from here [1] and want to clarify the concept. Concretely, when a dynamic library is loaded in a process in a Linux environment, it is loaded at any point in the address space. Now, a library has a code segment, and a data segment. The code segment's address is not defined pre-linking so it is 0x0000000 while for data segment, some number is defined to be an address.
But here is the trick, this address of data segment is not actually the true address. Actually, at whatever position code segment is loaded, data segment's pre-defined address is added to it.
Am I correct here?
One more thing from the referenced article. What does this statement mean?
However, we have the constraint that the shared library must still have a unique data instance in each process. While it would be possible to put the library data anywhere we want at runtime, this would require leaving behind relocations to patch the code and inform it where to actually find the data, destroying the always read-only property of the code and thus sharability.
[1] http://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html

Actually, at whatever position code segment is loaded, data segment's pre-defined address is added to it.
Yes. The "VirtAddr" of the data segment will be added to the base address.
What does this statement mean?
It means that when the library accesses its own static data, the library code should not need relocations. Otherwise the dynamic linker may have to patch the binary code, which prevents those parts of the library code from being shared between processes (if process1 loads library lib1 at 0x40000000 and process2 loads lib1 at 0x50000000, their data relocations will differ).
So a different solution is used in practice: the library code and data are loaded together, and the offset between code and data is fixed in all cases. The "solution" follows the text you cited: http://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
As you can see from the above headers, the solution is that the read-write data section is always put at a known offset from the code section of the library. This way, via the magic of virtual-memory, every process sees its own data section but can share the unmodified code. All that is needed to access data is some simple maths; address of thing I want = my current address + known fixed offset.
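The "simple maths" can be worked through with the two load addresses mentioned above. This is only a sketch of the arithmetic: the offsets kInstructionOffset and kVariableOffset are invented numbers, not taken from any real library, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical link-time layout of one library (offsets from its load base).
constexpr std::uintptr_t kInstructionOffset = 0x00001000;  // an instruction in the code segment
constexpr std::uintptr_t kVariableOffset    = 0x00201000;  // a variable in the data segment

// Run-time address of something that sits `offset` bytes into the library.
constexpr std::uintptr_t runtime_address(std::uintptr_t load_base, std::uintptr_t offset) {
    return load_base + offset;
}

// Distance from the instruction to the variable for a given load base.
constexpr std::uintptr_t code_to_data_distance(std::uintptr_t load_base) {
    return runtime_address(load_base, kVariableOffset) -
           runtime_address(load_base, kInstructionOffset);
}

// The distance is the same wherever the library is loaded, so the code can
// reach its data relative to its own position without being patched.
static_assert(code_to_data_distance(0x40000000) == code_to_data_distance(0x50000000),
              "offset between code and data must not depend on the load address");
```

With these numbers the variable lands at 0x40201000 in process1 and at 0x50201000 in process2, yet the instruction finds it 0x200000 bytes ahead of itself in both, which is why the same unmodified code pages can serve both processes.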

Sharing heap memory in a dll between two separate applications

Sorry if this question has been answered before; however all of the questions that are similar seem to be related to global or static variables in a DLL and sharing of those.
Is it possible to have one instance of a dll shared between two separate applications?
I have two applications (appA, appB) and a DLL (theDLL).
I am seeing if it is possible for appA to call a function in theDLL which then creates a variable on the heap and stores that in theDLL. At a later time, I would like to have appB connect to theDLL and be able to access that variable created earlier. Again, sorry if this answer is the same as static and global variables in dlls.
Here is some pseudo code:
(theDLL)
class VariableHolder
{
public:
    void StoreVariable(int x)
    {
        mInt = new int(x);  // allocated on the heap
    }
    int GetVariable()
    {
        return *mInt;
    }
private:
    int* mInt = nullptr;
};
(appA)
int main()
{
    ...
    (assuming we got access to a VariableHolder singleton created in theDLL)
    theVariableHolder.StoreVariable(5);
    ...
}
(appB)
int main()
{
    ...
    (assuming we got access to a VariableHolder singleton created in theDLL)
    if (theVariableHolder.GetVariable() == 5)
    {
        cout << "Hurray, how did you know?";
    }
    ...
}

Exactly this is not possible: the address spaces of the two processes are different (they are virtual, having been created by the kernel), so a valid pointer in one won't work in the other. However, you can use shared memory to transport raw scalar data (strings, integers) between processes - here's how.

Yes, this is possible using shared memory. It doesn't need to use a shared DLL though.
Depending on the operating system, the approaches are somewhat different:
On Windows, a shared file mapping is created and mapped into memory (see Creating Named Shared Memory).
On Linux and Unix, there are direct functions to create shared memory areas, e.g. System V IPC. Just google for it.
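As a rough illustration of the Unix side, here is a sketch that shares one int through a file mapped with MAP_SHARED. It deliberately uses a plain file plus mmap rather than System V IPC, simply because that keeps the example short; the helper names map_shared_int and unmap_shared_int and the file path are assumptions made up for this example. Any process that maps the same path sees the same memory.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: maps one int stored in the file at `path` as shared
// memory. The file is created if it does not exist. Returns nullptr on failure.
int* map_shared_int(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return nullptr;
    }
    // Make sure the file is large enough to back the mapping.
    if (ftruncate(fd, sizeof(int)) != 0) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : static_cast<int*>(p);
}

// Releases a mapping obtained from map_shared_int.
void unmap_shared_int(int* p) {
    if (p != nullptr) {
        munmap(p, sizeof(int));
    }
}
```

In the scenario from the question, appA would call map_shared_int with an agreed path and store 5 through the returned pointer, and appB would map the same path and read it back. Note that the pointer values themselves may differ between the two processes; only the memory behind them is common, and real code would also need some interprocess synchronization.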

Shared libraries on almost any modern operating system are implemented by shared read-only executable and data pages, mapped simultaneously into the address space of any process that uses the given library. On Windows though (in contrast to most Unix systems) this sharing can also be extended to read-write data segments in DLLs, so it is possible to have global variables in a DLL that are shared among all images that have the DLL loaded. Achieving this is a two-step process. First you tell the compiler to put the shared variables in a new named data section:
#pragma data_seg (".myshared")
int iscalar = 0;
int iarray[10] = { 0 };
#pragma data_seg ()
It is important to have all those variables statically initialised, otherwise they will end up in the .bss section instead. Then you have to tell the linker that you'd like the .myshared section to have shared read-write attributes, using the /SECTION:.myshared,RWS option.
This mechanism is much simpler than creating and binding to named shared memory objects, but it only allows sharing statically allocated global variables - you cannot use it to share data on the heap, as the heap is private to the process. For anything more complex you should use shared memory mappings, i.e. as shown on the MSDN page linked in the answer from H2CO3.

This is not possible. The DLL can be shared between the 2 processes, but the data isn't. It's the code or program image (i.e. the logic or instructions) that is shared, not the data. Every DLL is mapped into the virtual address space of the process that loads it, so the data is either in the data section of that process or on the stack if it is local to a function. While a process is executing, the addresses of another process's data are not visible to it.
You need to do some reading on virtual memory and how the memory management unit (MMU) works. The OS, CPU and MMU work together to make this possible. The reliable way to do this is interprocess communication. You can use shared memory, where each process sees the data at its own virtual address but those addresses are ultimately mapped to the same location in real memory, i.e. the same physical address. The OS makes this possible.

This, as @H2CO3 pointed out, is not possible because of the different address spaces.
However, from your problem, it looks like you need either a surrogate process around that DLL or a Service and then different processes can connect to that surrogate process/exe and use the shared memory.

You must use shared memory (as was written above).
I recommend using the Boost.Interprocess library. See its documentation about shared memory - Shared memory between processes

Converting a string into a function in c++

I have been looking for a way to dynamically load functions into C++ for some time now, and I think I have finally figured it out. Here is the plan:
1. Pass the function as a string into C++ (via a socket connection, a file, or something).
2. Write the string into a file.
3. Have the C++ program compile the file and execute it. If there are any errors, catch them and return them.
4. Have the newly executed program with the new function pass the memory location of the function to the currently running program.
5. Save the location of the function to a function pointer variable (the function will always have the same return type and arguments, so this simplifies the declaration of the pointer).
6. Run the new function with the function pointer.
The issue is that after step 4, I do not want to keep the new program running since if I do this very often, many running programs will suck up threads. Is there some way to close the new program, but preserve the memory location where the new function is stored? I do not want it being overwritten or made available to other programs while it is still in use.
If you guys have any suggestions for the other steps, that would be appreciated as well. There might be other libraries that do things similar to this, and it is fine to recommend them, but this is the approach I want to look into, if not for the accomplishment of it, then for the knowledge of how to do so.
Edit: I am aware of dynamically linked libraries. This is something I am largely looking into to gain a better understanding of how things work in C++.

I can't see how this can work. When you run the new program it'll be a separate process and so any addresses in its process space have no meaning in the original process.
And not just that, but the code you want to call doesn't even exist in the original process, so there's no way to call it in the original process.
As Nick says in his answer, you need either a DLL/shared library or you have to set up some form of interprocess communication so the original process can send data to the new process to be operated on by the function in question and then sent back to the original process.

How about a Dynamic Link Library?
These can be linked/unlinked/replaced at runtime.
Or, if you really want to communicate between processes, you could use a named pipe.
edit- you can also create named shared memory.
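For the dynamic library route on Linux and other POSIX systems, a sketch along these lines may help. It assumes the received string has already been written to a file and compiled into a shared object (for example with something like g++ -shared -fPIC), and that the function is declared extern "C" so its name is not mangled. The int(int) signature and the helper name load_plugin_function are assumptions for illustration only. On older glibc versions the program also has to be linked with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>

// The question says every loaded function has the same signature; int(int)
// is assumed here purely for illustration.
using PluginFn = int (*)(int);

// Hypothetical helper: loads `symbol` from the shared library at
// `library_path` and returns it as a callable pointer, or nullptr on failure.
// Passing nullptr as the path searches the running program and the
// libraries it already has loaded.
PluginFn load_plugin_function(const char* library_path, const char* symbol) {
    assert(symbol != nullptr);
    void* handle = dlopen(library_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        return nullptr;
    }
    void* address = dlsym(handle, symbol);
    if (address == nullptr) {
        dlclose(handle);
        return nullptr;
    }
    // The handle is intentionally left open: closing it could unmap the
    // code that the returned pointer refers to.
    return reinterpret_cast<PluginFn>(address);
}
```

Because the library is mapped into the calling process itself, the returned pointer is valid there and no second running program is needed, which sidesteps the problem with step 4 entirely.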

For step 4: we can't directly pass a memory location (address) from one process to another, because the two processes use different virtual address spaces. One process can't use memory that belongs to another process.
So you need to create shared memory between the two processes and copy your function into it; then you can close the new process.
For shared memory on Windows, see Creating Named Shared Memory:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366551(v=vs.85).aspx
After that, you still need to allocate another memory region and copy the function into it again.
The reason is that normally allocated memory only has read/write permissions; if you try to execute code in it, the CPU will raise an exception.
So, on Windows, you need to use VirtualAlloc to allocate the memory with the PAGE_EXECUTE_READWRITE flag (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx)
void* address = NULL;
address = VirtualAlloc(NULL,
                       sizeof(emitcode),
                       MEM_COMMIT | MEM_RESERVE,
                       PAGE_EXECUTE_READWRITE);
After copying the function to address, you can call it there, but you need to be very careful to keep the stack balanced.

A dynamic library is best suited to your problem. Also, forget about launching a different process; that is another problem in itself. In addition to the post above: provided that you did the VirtualAlloc correctly, just call your function from within the same "loader"; then you shouldn't have to worry, since you will be running on the same stack.
The real problems are:
1 - Compiling the function you want to load, offline from the main program.
2 - Extracting the relevant code from the binary produced by the compiler.
3 - Loading the string.
1 and 2 require a deep understanding of the entire compiler suite, including compiler flags, the linker, etc., not just the IDE's push buttons.
If you are OK with 1 and 2, you should know why using a std::string, or anything but a plain char *, is harmful.
I could continue the entire story, but it definitely deserves its own book. Since this is the hacker/cracker way of doing things, I strongly recommend that normal users use dynamic libraries; this is why they exist.

Usually we call this code injection ...
Basically, for the sake of security, any modern operating system forbids executing memory that was not set up as executable during the initial loading, so we must fall back on dynamic libraries, which the OS validates.
That said, once you have valid compiled code, if you really want to achieve that effect you must load your function into memory and then mark it as executable (clear the NX bit) in a system-specific way.
But let's be clear: your function must be position-independent code, and you get no help from the dynamic linker to resolve symbols ... that's the hard part of the job.

How to link non thread-safe library so each thread will have its own global variables from it?

I have a program that I link with many libraries. I ran my application under a profiler and found out that most of the time is spent in a "waiting" state after some network requests.
Those requests are the effect of my code calling sleeping_function() from an external library.
I call this function in a loop which executes many, many times, so all the waiting times sum up to huge amounts.
As I cannot modify sleeping_function(), I want to start a few threads to run a few iterations of my loop in parallel. The problem is that this function internally uses some global variables.
Is there a way to tell the linker on SunOS that I want to link specific libraries in a way that will place all variables from them in Thread Local Storage?

I don’t think you’ll be able to achieve this with just the linker, but you might be able to get something working with some code in C.
The problem is that a call to load a library that is already loaded will return a reference to the already loaded instance instead of loading a new copy. A quick look at the documentation for dlopen and LoadLibrary seems to confirm that there’s no way to load the same library more than once, at least not if you want the image to be prepared for execution. One way to circumvent this would be to prevent the OS from knowing that it is the same library. To do this you could make a copy of the file.
Some pseudo code, just replace calls to sleeping_function with calls to call_sleeping_function_thread_safe:
#include <dlfcn.h>
#include <pthread.h>

char *shared_lib_name;

void *sleeping_function_thread_init(void *arg);

void call_sleeping_function_thread_safe(void)
{
    pthread_t pthread;
    /* make_copy_of_file: hypothetical helper returning the path of the copy */
    char *new_file_name = make_copy_of_file(shared_lib_name);
    pthread_create(&pthread, NULL, sleeping_function_thread_init, new_file_name);
}

void *sleeping_function_thread_init(void *arg)
{
    char *lib_name = arg;
    void *lib_handle;
    void (*sleeping_function)(void);

    /* dlopen needs RTLD_LAZY or RTLD_NOW in addition to RTLD_LOCAL */
    lib_handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
    sleeping_function = (void (*)(void))dlsym(lib_handle, "sleeping_function");
    while (...)
        sleeping_function();
    dlclose(lib_handle);
    delete_file(lib_name);  /* hypothetical helper */
    return NULL;
}
For windows dlopen becomes LoadLibrary and dlsym becomes GetProcAddress etc... but the basic idea would still work.

In general, this is a bad idea. Global data isn't the only issue that may prevent a non thread-safe library from running in a multithreaded environment.
As one example, what if the library had a global variable that points to a memory-mapped file that it always maps at a single, hardcoded address? In this case, with your technique, you would have one global variable per thread, but they would all point to the same memory location, which would be trashed by multi-threaded access.

Creating an object in shared memory inside a Shared Lib (so) in C++

Is it possible to share a single 'god' instance among everyone that links to this code, to be placed in a shared object?
god* _god = NULL;
extern "C"
{
    int set_log_level(int level)
    {
        if(_god == NULL) return -1;
        _god->log_level(level);
        return 0;
    }
    int god_init(){
        if(_god == NULL){
            _god = new god(); //Magic happens here
        }
        return 0;
    }
}
Provided that I perform lock synchronization at the beginning of every function, and considering that God itself can new/malloc other things, but those things will never be returned to the caller (God mallocs only for internal use), what is the simplest way of doing this, if it is possible at all?
How can that be extended to an arbitrary number of programs linked to this shared library?

Boost Interprocess library has high(er) level, portable shared memory objects.

This isn't the correct approach at all. By doing what you suggest, the variable is indeed global to the library, and thus to the program, but the data is private to the actual running process. You won't be able to share the values across running programs. @grieve is referring to a global accessed by multiple threads, but threads share the same parent process instance.
Across actual processes, you need to break out to an OS-specific shared memory facility.
Take a look at Shared Memory for details. It's a doable issue, but it's not particularly trivial to pull off. You'll also need an interprocess synchronization system like Semaphores to coordinate usage.
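To give a feel for what "breaking out to shared memory" means for an object, here is a minimal sketch assuming Linux. SharedGod is a hypothetical stand-in for the god class: it holds no pointers and never allocates, which is what makes its bytes meaningful in more than one process. The sketch uses an anonymous MAP_SHARED region and fork(), so it only covers related processes; unrelated programs would need a named mapping instead. A lock-free std::atomic is assumed in place of a real interprocess lock.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the question's `god` class.
struct SharedGod {
    std::atomic<int> log_level{0};
};

// Constructs a SharedGod inside a MAP_SHARED region, or returns nullptr.
SharedGod* create_shared_god() {
    void* mem = mmap(nullptr, sizeof(SharedGod), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return nullptr;
    }
    return new (mem) SharedGod();  // placement new: no heap allocation
}

// Forks a child that sets the log level, then returns what the parent reads
// back from the shared object (or -1 on error).
int log_level_set_by_child(SharedGod* god, int level) {
    assert(god != nullptr);
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        god->log_level.store(level);  // written by the child process
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return god->log_level.load();  // read by the parent process
}
```

Unlike an object created with plain new, the change made by the child is visible to the parent. A god that mallocs internally would break this, because those allocations would land on the private heap of whichever process made them; allocating inside the shared region is the kind of work a library such as Boost.Interprocess takes care of.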

I have a feeling that god will be a server of some kind. Consider using a proper client/server architecture, so as to keep god away from the masses.
