Initialize before any global objects - c++

I am writing a C++ memory profiler.
As everyone knows, an executable program may contain many global objects,
and its DLL modules may contain global objects as
well. These global objects will be initialized with CRT
initialization - after the entry point and before WinMain; for DLLs,
the entry point is _DllMainCRTStartup, before you get into DllMain,
all the global objects of the DLL are initialized.
A global object may allocate memory. so a memory profiler
must be initialized before any global object initialization. After
doing a lot of researching, I found that this is not an easy job.
One idea is using CreateProcess with CREATE_SUSPENDED flag - try
to get a first chance. After that, use CreateRemoteThread to call
LoadLibrary in the target process to load an injection DLL, and
initialize that DLL. But it doesn't work because this will load
all the implicit-linking DLLs of the executable program first. Maybe CreateRemoteThread triggers this behavior?
So, how do we get the first chance?

There might be a way to do this otherwise using very platform-specific ways, but one way to solve the issue is to combine lazy initialization with dylib loading.
For example, say your memory allocator functions are exported like this:
API void* exported_alloc();
API void exported_free(void* mem);
... inside a dylib called mem.dll.
In this case, to ensure that all of your other dylibs can get to it when they are being loaded, we can create a central statically-linked library (ex: sdk.lib) that all of your dylibs link against with a header like so:
#ifndef MEMORY_H
#define MEMORY_H
// Memory.h
void* my_alloc();
void my_free(void* mem);
#endif
... which we can implement like so:
static void* (exported_alloc)() = 0;
static void (exported_free)(void* mem) = 0;
static void initialize()
{
if (!exported_alloc)
{
// Load 'mem.dll' (ex: 'LoadLibrary') and look up
// symbols for `exported_alloc` and `exported_free`
// (ex: GetProcAddress).
}
}
void* my_alloc()
{
initialize();
return exported_alloc();
}
void my_free(void* mem)
{
initialize();
exported_free(mem);
}
.. Then call FreeLibrary at the appropriate time when you're finished with the DLL. This incurs a bit of run-time overhead (similar to the overhead of accessing a singleton), but is a cross-platform solution (provided that you have a cross-platform means of loading/unloading dylibs/shared libs at runtime).
With this solution, all of your DLLs allocating memory at a global scope would then load mem.dll prior to performing any memory allocations in a lazy-initialized kind of way, ensuring that they all have access to your memory functions at the appropriate time.

Related

Can destruction order be controlled across dynamic libraries?

I am encountering an issue where my library is crashing due to executable that loads me calling my function after main exits. I'm wondering - can I control the lifecycle of my globals to not be destroyed until (and if) my library gets unloaded? I understand what I'm asking.
Basically, the executable looks something like this (roughly):
struct MyLibController
{
void *libhandle;
void (*function)() myLibFunction;
~MyLibController()
{
myLibFunction(); // Call to my library's exported function
dlclose(libhandle);
}
};
std::shared_ptr<MyLibController> globalPtr;
int main(int argc, const char **argv)
{
globalPtr = std::make_shared<MyLibController>();
// initializing globalPtr to dlopen my library, map function ptrs, etc.
// do some work with my library
return 0;
}
I have absolutely no control over the code in this executable.
My library code look something like this:
SomeType globalObject;
// Exported function via c interface in my library
void myLibFunction()
{
// crash occurs globalObject is used after executable's main function exits
globalObject.someFunction();
// do some work
}
I have a lot of control over the library code - but this is a simple example. The Sometype globalObject is very necessary (suppose that it's a mutex, used to sync myLibFunction and a bunch of others).
I'd like to ensure that my globalObject is valid even after executable's main function exits. Is this possible? If so, how?
P.S. I am aware that I can dynamically allocate the globalObject and leak it, which resolves the crash. It feels wrong though, and I don't want to sign off on it.
You can register for a callback when the main() returns using std::atexit(): http://en.cppreference.com/w/cpp/utility/program/atexit
For example, when your library is loaded, use atexit() to register, then when that callback fires, set a flag for yourself that you check before trying to do anything else. If the flag is set, ignore any other actions from the caller, because the program is shutting down.
Your best bet is to make your global object a reference counted singleton of some sort. Then update your interface to instantiate an object from your loaded library. This object can then grab a reference to the global object during construction, and thus only releases it after it has been destroyed. The global variable itself would also merely have a reference, thus during the dlclose() process the global reference would be released, but your object would still have a reference until it is destroyed.
#o11c's comment provided a working solution:
Using attribute((constructor)) and attribute((destructor))
allows you to specify a priority
In other words, I can have the control over lifecycle of my variables despite binary's main function exiting (I just have to dynamically allocate them and free them).

DLL unloading procedure

I did solve my immediate problem at hand, but now I need to understand why it is solved. ;-)
So here I have couple more questions.
Suppose I have a class that is being exported from the DLL. Now this DLL should be loaded into the memory every time I call:
MyExportedClass *pb = new MyExportedClass;
and it should stay in memory and becomes unloaded only when I call:
delete pb;
Is this correct?
If I understood this correctly and the answer to the previous question is yes, then what should happen in the following scenario:
I have an interface which exported from the dll (dll1) and I have its implementation which is exported from another dll (dll2). So every time I execute:
MyInterface *pInterface = new MyImplementation;
both those dlls should be loaded in memory and they should stay in memory until I call:
delete pInterface;
Is this correct?
Now, if the answer to this question is yes - do I have a control/saying, which library will be unloaded first and which one will be second? Or the unload will always happen right after calling destructor of the appropriate class?
Now, is there a tool which checks if the library becomes unloaded and at which point? I can probably just use the fake DllMain() and check its process_detach case, but my impression always was: use DllMain when the library exports function and don't use DllMain when the library exports classes. I used this approach since MSVC 5/6 (following one of the books about C++).
Was I wrong and I can still use DLLMain in both cases?
Thank you.
DLL loading can be automatic (if listed in the import table) or manual (using LoadLibrary). Some managed frameworks, like .NET, will silently call LoadLibrary for you when they see a DLL import declaration in their metadata, but C++ is not one of these. The closest thing C++ has is delay-loading, where the function that calls LoadLibrary is (by default, you can substitute your own) provided by the compiler.
On the other hand, DLL unloading is always manual. Deleting an object never implicitly unloads a DLL. You have to call FreeLibrary (or FreeLibraryAndExitThread). And you had better not call FreeLibrary while objects defined in that library are still in use.
Now, the COM system in Windows is a bit more complicated, because it manages DLL lifetime for you. But DLL lifetime is still not controlled by object deletion, rather by calling a DllCanUnloadNow function in the DLL. Usually you need to maintain a count of active objects in order to write that function correctly. But it still requires your manual implementation, and you can simply always return false to never unload (this is a bit of a pain during development because you have to close the entire application to try a new plugin version, etc... but actually freeing every usage of a DLL and successfully unloading it is rare anyway)
And certainly there is nothing that will automatically unload a DLL listed in the import table, those get loaded during process startup and never unloaded, no matter how many object instances are created or destroyed.

Sharing heap memory in a dll between two separate applications

Sorry if this question has been answered before; however all of the questions that are similar seem to be related to global or static variables in a DLL and sharing of those.
Is it possible to have one instance of a dll shared between two separate applications?
I have two applications (appA, appB) and a DLL (theDLL).
I am seeing if it is possible for appA to call a function in theDLL which then creates a variable on the heap and stores that in theDLL. At a later time, I would like to have appB connect to theDLL and be able to access that variable created earlier. Again, sorry if this answer is the same as static and global variables in dlls.
Here is some psuedo code:
(theDLL)
class VariableHolder
{
public:
void StoreVariable(int x)
{
mInt = new int(x);
}
int GetVariable()
{
return mInt;
}
private:
int mInt;
}
(appA)
int main()
{
...
(assuming we got access to a VariableHolder singleton created in theDLL)
theVarialbeHolder.StoreVariable(5);
...
}
(appB)
int main()
{
...
(assuming we got access to a VariableHolder singleton created in theDLL)
if (theVarialbeHolder.GetVariable() == 5)
{
cout << "Hurray, how did you know?";
}
...
}
This exactly is not possible - as the address spaces for the two processes are different (because they're virtual, having been created by the kernel), so a valid pointer in one won't work within the other. However, you can use shared memory to transport raw scalar data (strings, integers) between processes - here's how.
Yes, this is possible using shared memory. It doesn't need to use a shared DLL though.
Depending on the operating, the approaches are somewhat different:
On Windows, a shared file is used on mapped into memory (see Creating Named Shared Memory).
On Linux and Unix, there are direct functions to create shared memory areas, e.g. System V IPC. Just google for it.
Shared libraries on almost any modern operating system are implemented by shared read-only executable and data pages, mapped simultaneously into the address space of any process that uses the given library. On Windows though (in contrast to most Unix system) this sharing can also be extended to read-write data segments in DLLs, so it is possible to have global variables in a DLL, shared among all images that have the DLL loaded. To achieve this, there is a two-step process involved. First you tell the compiler to put the shared variables in a new named data section:
#pragma data_seg (".myshared")
int iscalar = 0;
int iarray[10] = { 0 };
#pragma data_seg ()
It is important to have all those variables statically intialised otherwise they will end up into the .bss section instead. Then you have to tell the linker that you'd like to have the .myshared section with shared read-write attributes using the /SECTION:.myshared,RWS option.
This mechanism is much simpler than creating and binding to named shared memory objects but it only allows to share statically allocated global variables - you cannot use it to share data on the heap as the heap is private to the process. For anything more complex you should use shared memory mappings, i.e. as shown on the MSDN page, linked in the answer from H2CO3.
This is not possible. The DLL can be shared in the 2 process but the data isn't. It's the code or program image (i.e. the logic or instructions) that is shared and not the data. Every Dll is mapped into the virtual address space of the process that loads it so the data either is on the data section of the process or on stack if it is local to the function. When a process is executing the address of the other process data is not visible.
You need to do some reading on virtual memory and how memory management unit(MMU) works. The OS, CPU, MMU works together to make it possible. The reliable way to do this is inter process communication. You can use shared memory where each process has a copy of data in form of virtual address but it is eventually mapped to same location into the real memory i.e the real address. The OS makes it possible.
This as #H2CO3 pointed out, is not possible because of different address spaces.
However, from your problem, it looks like you need either a surrogate process around that DLL or a Service and then different processes can connect to that surrogate process/exe and use the shared memory.
You must use shared memory (as was written above).
I recommend to use boost interprocess library. See documentation about shared memory - Shared memory between processes

How to link non thread-safe library so each thread will have its own global variables from it?

I have a program that I link with many libraries. I run my application on profiler and found out that most of the time is spent in "waiting" state after some network requests.
Those requests are effect of my code calling sleeping_function() from external library.
I call this function in a loop which executes many, many times so all waiting times sum up to huge amounts.
As I cannot modify the sleeping_function() I want to start a few threads to run a few iterations of my loop in parallel. The problem is that this function internally uses some global variables.
Is there a way to tell linker on SunOS that I want to link specific libraries in a way that will place all variables from them in Thread Local Storage?
I don’t think you’ll be able to achieve this with just the linker, but you might be able to get something working with some code in C.
The problem is that a call to load a library that is already loaded will return a reference to the already loaded instance instead of loading a new copy. A quick look at the documentation for dlopen and LoadLibrary seems to confirm that there’s no way to load the same library more than once, at least not if you want the image to be prepared for execution. One way to circumvent this would be to prevent the OS from knowing that it is the same library. To do this you could make a copy of the file.
Some pseudo code, just replace calls to sleeping_function with calls to call_sleeping_function_thread_safe:
char *shared_lib_name
void sleeping_function_thread_init(char *lib_name);
void call_sleeping_function_thread_safe()
{
void *lib_handle;
pthread_t pthread;
new_file_name = make_copy_of_file(shared_lib_name);
pthread_create(&pthread, NULL, sleeping_function_thread_init, new_file_name);
}
void sleeping_function_thread_init(char *lib_name)
{
void *lib_handle;
void (*)() sleeping_function;
lib_handle = dlopen(lib_name, RTLD_LOCAL);
sleeping_function = dlsym(lib_handle, "sleeping_function")
while (...)
sleeping_function;
dlclose(lib_handle);
delete_file(lib_name);
}
For windows dlopen becomes LoadLibrary and dlsym becomes GetProcAddress etc... but the basic idea would still work.
In general, this is a bad idea. Global data isn't the only issue that may prevent a non thread-safe library from running in a multithreaded environment.
As one example, what if the library had a global variable that points to a memory-mapped file that it always maps into a single, hardcoded address. In this case, with your technique, you would have one global variable per thread, but they would all point to the same memory location, which would be trashed by multi-threaded access.

Loading a dll from a dll?

What's the best way for loading a dll from a dll ?
My problem is I can't load a dll on process_attach, and I cannot load the dll from the main program, because I don't control the main program source. And therefore I cannot call a non-dllmain function, too.
After all the debate that went on in the comments, I think that it's better to summarize my positions in a "real" answer.
First of all, it's still not clear why you need to load a dll in DllMain with LoadLibrary. This is definitely a bad idea, since your DllMain is running inside another call to LoadLibrary, which holds the loader lock, as explained by the documentation of DllMain:
During initial process startup or after a call to LoadLibrary, the system scans the list of loaded DLLs for the process. For each DLL that has not already been called with the DLL_PROCESS_ATTACH value, the system calls the DLL's entry-point function. This call is made in the context of the thread that caused the process address space to change, such as the primary thread of the process or the thread that called LoadLibrary. Access to the entry point is serialized by the system on a process-wide basis. Threads in DllMain hold the loader lock so no additional DLLs can be dynamically loaded or initialized.
The entry-point function should perform only simple initialization or termination tasks. It must not call the LoadLibrary or LoadLibraryEx function (or a function that calls these functions), because this may create dependency loops in the DLL load order. This can result in a DLL being used before the system has executed its initialization code. Similarly, the entry-point function must not call the FreeLibrary function (or a function that calls FreeLibrary) during process termination, because this can result in a DLL being used after the system has executed its termination code.
(emphasis added)
So, this on why it is forbidden; for a clear, more in-depth explanation, see this and this, for some other examples about what can happen if you don't stick to these rules in DllMain see also some posts in Raymond Chen's blog.
Now, on Rakis answer.
As I already repeated several times, what you think that is DllMain, isn't the real DllMain of the dll; instead, it's just a function that is called by the real entrypoint of the dll. This one, in turn, is automatically took by the CRT to perform its additional initialization/cleanup tasks, among which there is the construction of global objects and of the static fields of the classes (actually all these from the compiler's perspective are almost the same thing). After (or before, for the cleanup) it completes such tasks, it calls your DllMain.
It goes somehow like this (obviously I didn't write all the error checking logic, it's just to show how it works):
/* This is actually the function that the linker marks as entrypoint for the dll */
BOOL WINAPI CRTDllMain(
__in HINSTANCE hinstDLL,
__in DWORD fdwReason,
__in LPVOID lpvReserved
)
{
BOOL ret=FALSE;
switch(fdwReason)
{
case DLL_PROCESS_ATTACH:
/* Init the global CRT structures */
init_CRT();
/* Construct global objects and static fields */
construct_globals();
/* Call user-supplied DllMain and get from it the return code */
ret = DllMain(hinstDLL, fdwReason, lpvReserved);
break;
case DLL_PROCESS_DETACH:
/* Call user-supplied DllMain and get from it the return code */
ret = DllMain(hinstDLL, fdwReason, lpvReserved);
/* Destruct global objects and static fields */
destruct_globals();
/* Destruct the global CRT structures */
cleanup_CRT();
break;
case DLL_THREAD_ATTACH:
/* Init the CRT thread-local structures */
init_TLS_CRT();
/* The same as before, but for thread-local objects */
construct_TLS_globals();
/* Call user-supplied DllMain and get from it the return code */
ret = DllMain(hinstDLL, fdwReason, lpvReserved);
break;
case DLL_THREAD_DETACH:
/* Call user-supplied DllMain and get from it the return code */
ret = DllMain(hinstDLL, fdwReason, lpvReserved);
/* Destruct thread-local objects and static fields */
destruct_TLS_globals();
/* Destruct the thread-local CRT structures */
cleanup_TLS_CRT();
break;
default:
/* ?!? */
/* Call user-supplied DllMain and get from it the return code */
ret = DllMain(hinstDLL, fdwReason, lpvReserved);
}
return ret;
}
There isn't anything special about this: it also happens with normal executables, with your main being called by the real entrypoint, which is reserved by the CRT for the exact same purposes.
Now, from this it will be clear why the Rakis' solution isn't going to work: the constructors for global objects are called by the real DllMain (i.e. the actual entrypoint of the dll, which is the one about the MSDN page on DllMain talks about), so calling LoadLibrary from there has exactly the same effect as calling it from your fake-DllMain.
Thus, following that advice you'll obtain the same negative effects of calling directly LoadLibrary in the DllMain, and you'll also hide the problem in a seemingly-unrelated position, which will make the next maintainer work hard to find where this bug is located.
As for delayload: it may be an idea, but you must be really careful not to call any function of the referenced dll in your DllMain: in fact, if you did that you would trigger a hidden call to LoadLibrary, which would have the same negative effects of calling it directly.
Anyhow, in my opinion, if you need to refer to some functions in a dll the best option is to link statically against its import library, so the loader will automatically load it without giving you any problem, and it will resolve automatically any strange dependency chain that may arise.
Even in this case you mustn't call any function of this dll in DllMain, since it's not guaranteed that it's already been loaded; actually, in DllMain you can rely only on kernel32 being loaded, and maybe on dlls you're absolutely sure that your caller has already loaded before the LoadLibrary that is loading your dll was issued (but still you shouldn't rely on this, because your dll may also be loaded by applications that don't match these assumptions, and just want to, e.g., load a resource of your dll without calling your code).
As pointed out by the article I linked before,
The thing is, as far as your binary is concerned, DllMain gets called at a truly unique moment. By that time OS loader has found, mapped and bound the file from disk, but - depending on the circumstances - in some sense your binary may not have been "fully born". Things can be tricky.
In a nutshell, when DllMain is called, OS loader is in a rather fragile state. First off, it has applied a lock on its structures to prevent internal corruption while inside that call, and secondly, some of your dependencies may not be in a fully loaded state. Before a binary gets loaded, OS Loader looks at its static dependencies. If those require additional dependencies, it looks at them as well. As a result of this analysis, it comes up with a sequence in which DllMains of those binaries need to be called. It's pretty smart about things and in most cases you can even get away with not following most of the rules described in MSDN - but not always.
The thing is, the loading order is unknown to you, but more importantly, it's built based on the static import information. If some dynamic loading occurs in your DllMain during DLL_PROCESS_ATTACH and you're making an outbound call, all bets are off. There is no guarantee that DllMain of that binary will be called and therefore if you then attempt to GetProcAddress into a function inside that binary, results are completely unpredictable as global variables may not have been initialized. Most likely you will get an AV.
(again, emphasis added)
By the way, on the Linux vs Windows question: I'm not a Linux system programming expert, but I don't think that things are so different there in this respect.
There are still some equivalents of DllMain (the _init and _fini functions), which are - what a coincidence! - automatically took by the CRT, which in turn, from _init, calls all the constructors for the global objects and the functions marked with __attribute__ constructor (which are somehow the equivalent of the "fake" DllMain provided to the programmer in Win32). A similar process goes on with destructors in _fini.
Since _init too is called while the dll loading is still taking place (dlopen didn't return yet), I think that you're subject to similar limitations in what you can do in there. Still, in my opinion on Linux the problem is felt less, because (1) you have to explicitly opt-in for a DllMain-like function, so you aren't immediately tempted to abuse of it and (2), Linux applications, as far as I saw, tend to use less dynamic loading of dlls.
In a nutshell
No "correct" method will allow you to reference to any dll other than kernel32.dll in DllMain.
Thus, don't do anything important from DllMain, neither directly (i.e. in "your" DllMain called by the CRT) neither indirectly (in global class/static fields constructors), especially don't load other dlls, again, neither directly (via LoadLibrary) neither indirectly (with calls to functions in delay-loaded dlls, which trigger a LoadLibrary call).
The right way to have another dll loaded as a dependency is to - doh! - mark it as a static dependency. Just link against its static import library and reference at least one of its functions: the linker will add it to the dependency table of the executable image, and the loader will load it automatically (initializing it before or after the call to your DllMain, you don't need to know about it because you mustn't call it from DllMain).
If this isn't viable for some reason, there's still the delayload options (with the limits I said before).
If you still, for some unknown reason, have the inexplicable need to call LoadLibrary in DllMain, well, go ahead, shoot in your foot, it's in your faculties. But don't tell me I didn't warn you.
I was forgetting: another fundamental source of information on the topic is the [Best Practices for Creating DLLs][6] document from Microsoft, which actually talks almost only about the loader, DllMain, the loader lock and their interactions; have a look at it for additional information on the topic.
Addendum
No, not really an answer to my question. All it says is: "It's not possible with dynamic linking, you must link statically", and "you musn't call from dllmain".
Which *is* an answer to your question: under the conditions you imposed, you can't do what you want. In a nutshell of a nutshell, from DllMain you can't call *anything other than kernel32 functions*. Period.
Although in detail, but I'm not really interested in why it doesn't work,
You should, instead, because understanding why the rules are made in that way makes you avoid big mistakes.
fact is, the loader is not resolving dependenies correctly and the loading process is improperly threaded from Microsoft's part.
No, my dear, the loader does its job correctly, because *after* LoadLibrary has returned, all the dependencies are loaded and everything is ready to be used. The loader tries to call the DllMain in dependency order (to avoid problems with broken dlls which rely on other dlls in DllMain), but there are cases in which this is simply impossible.
For example, there may be two dlls (say, A.dll and B.dll) that depend on each other: now, whose DllMain is to call first? If the loader initialized A.dll first, and this, in its DllMain, called a function in B.dll, anything could happen, since B.dll isn't initialized yet (its DllMain hasn't been called yet). The same applies if we reverse the situation.
There may be other cases in which similar problems may arise, so the simple rule is: don't call any external functions in DllMain, DllMain is just for initializing the internal state of your dll.
The problem is there is no other way then doing it on dll_attach, and all the nice talk about not doing anything there is superfluous, because there is no alternative, at least not in my case.
This discussion is going on like this: you say "I want to solve an equation like x^2+1=0 in the real domain". Everybody says you that it's not possible; you say that it's not an answer, and blame the math.
Someone tells you: hey, you can, here's a trick, the solution is just +/-sqrt(-1); everybody downvotes this answer (because it's wrong for your question, we're going outside the real domain), and you blame who downvotes. I explain you why that solution is not correct according to your question and why this problem can't be solved in the real domain. You say that you don't care about why it can't be done, that you can only do that in the real domain and again blame math.
Now, since, as explained and restated a million times, under your conditions your answer has no solution, can you explain us why on earth do you "have" to do such an idiotic thing as loading a dll in DllMain? Often "impossible" problems arise because we've chosen a strange route to solve another problem, which brings us to deadlock. If you explained the bigger picture, we could suggest a better solution to it which does not involve loading dlls in DllMain.
PS: If I statically link DLL2 (ole32.dll, Vista x64) against DLL1 (mydll), which version of the dll will the linker require on older operating systems?
The one that is present (obviously I'm assuming you're compiling for 32 bit); if an exported function needed by your application isn't present in the found dll, your dll is simply not loaded (LoadLibrary fails).
Addendum (2)
Positive on injection, with CreateRemoteThread if you wanna know. Only on Linux and Mac the dll/shared library is loaded by the loader.
Adding the dll as a static dependency (what has been suggested since the beginning) makes it to be loaded by the loader exactly as Linux/Mac do, but the problem is still there, since, as I explained, in DllMain you still cannot rely on anything other than kernel32.dll (even if the loader in general intelligent enough to init first the dependencies).
Still, the problem can be solved. Create the thread (that actually calls LoadLibrary to load your dll) with CreateRemoteThread; in DllMain use some IPC method (for example named shared memory, whose handle will be saved somewhere to be closed in the init function) to pass to the injector program the address of the "real" init function that your dll will provide. DllMain then will exit without doing anything else. The injector application, instead, will wait for the end of the remote thread with WaitForSingleObject using the handle provided by CreateRemoteThread. Then, after the remote thread will be ended (thus LoadLibrary will be completed, and all the dependencies will be initialized), the injector will read from the named shared memory created by DllMain the address of the init function in the remote process, and start it with CreateRemoteThread.
Problem: on Windows 2000 using named objects from DllMain is prohibited because
In Windows 2000, named objects are provided by the Terminal Services DLL. If this DLL is not initialized, calls to the DLL can cause the process to crash.
So, this address may have to be passed in another manner. A quite clean solution would be to create a shared data segment in the dll, load it both in the injector application and in the target one and have it put in such data segment the required address. The dll would obviously have to be loaded first in the injector and then in the target, because otherwise the "correct" address would be overwritten.
Another really interesting method that can be done is to write in the other process memory a little function (directly in assembly) that calls LoadLibrary and returns the address of our init function; since we wrote it there, we can also call it with CreateRemoteThread because we know where it is.
In my opinion, this is the best approach, and is also the simplest, since the code is already there, written in this nice article. Have a look at it, it is quite interesting and it probably will do the trick for your problem.
The most robust way is to link the first DLL against the import lib of the second. This way, the actual loading of the second DLL will be done by Windows itself. Sounds very trivial, but not everyone knows that DLLs can link against other DLLs. Windows can even deal with cyclic dependencies. If A.DLL loads B.DLL which needs A.DLL, the imports in B.DLL are resolved without loading A.DLL again.
I suggest you to use delay-loading mechanism. The DLL will be loaded at the fisrt time you call imported function. Moreover you can modify load function and error handling. See Linker Support for Delay-Loaded DLLs for more info.
One possible answer is through the use of LoadLibrary and GetProcAddress to access pointers to functions found/located inside the loaded dll - but your intentions/needs aren't clear enough to determine if this is a suitable answer.