Scenario: Global variables in DLL which is used by Multi-threaded Application


A few months back, I came across this interesting scenario, asked by someone on Orkut. I've come up with a "non-portable" solution to the problem (tested with a small piece of code), but I would still like to know what you have to say and suggest.
Suppose I created a DLL, written in C++, that exports some functionality for a single-threaded client. This DLL declares lots of global variables; some may be const (read-only) and others are modifiable.
Later, things changed, and now I want the same DLL to work with a multi-threaded application (without modifying the DLL). That means several threads call the DLL's functions and read and modify its global variables, which may leave those variables holding inconsistent values.
So the question is,
Can we do something in the client code to prevent concurrent access to the DLL and, at the same time, ensure that each thread runs in its own context (meaning that when a thread gets access to the DLL, the DLL's global values are the same as they were when that thread last used it)?

Sure, you can always create a wrapper-layer handling multi-threading specific tasks such as locking. You could even do so in a second DLL that links with the original one, and then have the final project link with that new DLL.
Be aware that no matter how you implement it, this won't be an easy task. You have to know exactly which thread is able to modify which value at what time, who is able to read what and when etc. unless you want to run into problems like deadlocks or race conditions.
If your solution allows it, it's often best to assign a single thread to modify any data, and have all others just read and never write, as concurrent read access is always easier to implement than concurrent write access (Boost provides the basic functionality to do so, for example shared_mutex).
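A minimal sketch of the "one writer, many readers" idea, assuming C++17 is available. It uses std::shared_mutex from the standard library in place of the Boost shared_mutex mentioned above, and SharedSetting is a hypothetical class standing in for one piece of shared data:

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical shared value, standing in for one of the DLL's globals.
class SharedSetting {
public:
    // Readers take a shared lock, so many threads may read at the same time.
    std::string read() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return value_;
    }

    // The writer takes an exclusive lock, which waits until all readers have left.
    void write(const std::string& v) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        value_ = v;
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so that read() can stay const
    std::string value_;
};
```

Only the thread designated as the writer would call write(); every other thread would call read().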

Can we do something in the client code to prevent concurrent access to the DLL and, at the same time, ensure that each thread runs in its own context (meaning that when a thread gets access to the DLL, the DLL's global values are the same as they were when that thread last used it)?
This is the hard part. I think the only way to do this would be to create a wrapper around the existing DLL. When it is called, it would restore the state (global variables) for the current thread, and save them again when the call to the DLL returns. You would need to know all of the state variables in the DLL, and be able to read and write them.
If performance is not an issue, a single lock for the entire DLL would suffice, and be the easiest to implement correctly. That would ensure that only one thread was accessing (reading or writing) the DLL at one time.
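The two ideas above (one lock for the whole DLL, plus restoring and saving state around each call) could be combined roughly as follows. This is only a sketch: g_dll_counter and dll_increment are hypothetical stand-ins for the real DLL's exported global and function, and a real DLL would have many more state variables to swap.

```cpp
#include <mutex>

// Stand-ins for the unmodifiable DLL (hypothetical names): one exported
// global and one exported function that changes it.
int g_dll_counter = 0;
int dll_increment() { return ++g_dll_counter; }

namespace wrapper {

std::mutex g_dll_lock;                 // a single lock for the entire DLL
thread_local int t_saved_counter = 0;  // the calling thread's private copy of the state

// Serialise the call and swap the calling thread's state in and out around it.
int increment() {
    std::lock_guard<std::mutex> lock(g_dll_lock);
    g_dll_counter = t_saved_counter;   // restore this thread's context
    int result = dll_increment();      // call into the "DLL"
    t_saved_counter = g_dll_counter;   // save the context for this thread's next call
    return result;
}

}  // namespace wrapper
```

Client threads would call wrapper::increment() instead of the DLL function directly, so each thread sees its own count regardless of what other threads did in between.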

Related

C++ Injecting a dll, do you need threads?

I'm a little new to this and I don't fully understand threading yet, although I know how to create a thread and run programs with multiple threads. What I'm wondering is this: when you create a DLL file (C++) and inject it into a process (let's say for game hacking), would you need to create threads in the DLL file, or is that not going to work? As I understand it, the main thread will be running from the host process, right? Or how does it work?

Well, it depends on what you are planning to achieve with the DLL. If the DLL only has some static functions or a utility class that takes an input, does some calculation or processing, and produces an output, then there is no need for threading.
But if the DLL is going to listen on a socket, write to a file, or do other work that needs some parallelism, then you might want to create threads inside the DLL.
Basically, you need to understand what task the DLL is meant to accomplish. A DLL can be linked at build time like a normal library, or it can be loaded dynamically at run time, depending on your use case.
To answer your question,
Would you need to create threads in the dll file, or is that not going to work?
Ans: Not always. You create a thread when some task needs one. It is perfectly feasible to run a DLL inside a process without the DLL creating any threads.
After my understanding the main thread will be running from the host process right? Or how does it work?
Ans: That's right. Every process you run has one thread by default. If your application is simple enough to be handled by a single thread, that is a blessing. Stick with it :)

Since you are specifically referring to injecting the DLL, I have some input for you in addition to what has already been said.
First, let's make sure that the concepts of threads, processes and modules are clear.
A thread is basically the immediate environment in which code runs. Things like the current state of processor registers and stack variables (e.g. your local variables in functions, in most cases, but also where in the code the execution currently stands) belong to a thread. There are also other resources which often have thread-affinity, such as windows. It depends a lot on the resource in question whether and what kind of thread affinity they have.
Let's assume you write a simple hello world program. It will run in one thread which goes through your program from the beginning to the end and print "Hello World". Now let's assume you would want to write a program which slowly writes "Hello World", one character per second, but in the meantime download a file. Then you could create a second thread and have one thread output "Hello World" slowly, and one thread download the file. This means execution can happen in parallel, with different local state - one thread is currently inside your printHelloWorld function and one thread is inside downloadFile.
A process is basically a container for one or more threads. It bundles them together in a shared environment which uses the same virtual memory (this means that for example global variables in your code would be accessible from all threads, but this would require careful synchronization to avoid race conditions) and shares resources such as file handles the threads in the process create. So, your hello-world-and-download program from before would have 2 threads in 1 process, sharing the console for example, and being seen in the task manager as one entity.
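As a rough illustration of two threads in one process sharing a global, here is a sketch of the hello-world-and-download example. The function names printHelloWorld and downloadFile come from the text; everything else (g_log, record, runBoth) is a hypothetical helper, the one-second pacing is left out, and the download is only simulated.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex g_log_mutex;          // guards g_log
std::vector<std::string> g_log;  // global: visible to every thread in the process

// Both threads write to the same global, so access is synchronised.
void record(const std::string& entry) {
    std::lock_guard<std::mutex> lock(g_log_mutex);
    g_log.push_back(entry);
}

void printHelloWorld() {         // runs on the first thread
    for (char c : std::string("Hello World")) record(std::string(1, c));
}

void downloadFile() {            // runs on the second thread (download is simulated)
    for (int chunk = 0; chunk < 3; ++chunk) record("chunk " + std::to_string(chunk));
}

// Start both threads, wait for them, and report how many entries were logged.
std::size_t runBoth() {
    std::thread a(printHelloWorld);
    std::thread b(downloadFile);
    a.join();
    b.join();
    std::lock_guard<std::mutex> lock(g_log_mutex);
    return g_log.size();
}
```

The order of entries in g_log depends on scheduling, but thanks to the mutex no entry is lost or corrupted.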
A module is a file which contains executable code (in most cases, that is) and is loaded into a process. Usually, a process has one EXE file and several DLL files loaded as modules. DLL files and EXE files are technically almost the same, but EXE files are meant to be the basis from which a process starts, and DLL files are meant to be libraries exporting certain functions which can be used by other modules. Since modules are loaded into processes, a module is accessible by all threads in the process, and it doesn't have thread affinity of its own - in our previous example, when the second thread downloads the file, it may be calling into an HTTP networking DLL, whose code would then run in the second thread. A number of modules are loaded automatically into each process by Windows, and others are probably loaded by certain features of your compiler.
OK, so, back to your question:
Would you need to create threads in the dll file[...]?
Per se, using a DLL has nothing to do with whether you need to create new threads or not. It depends on what you want to do - if you need to do some time-consuming task in parallel to whatever other code is running, then you would need to create a new thread for it, otherwise there is no need.
[...]or is that not going to work?
As said, you can create new threads if you want (it will work), but it's not a necessity coming with using a DLL.
After my understanding the main thread will be running from the host process right?
The main thread of the host process will of course be in the host process. (Although there is technically no "main thread", since it's perfectly valid to have the first thread in a process create a second one and then terminate, so only the second one would be running anymore, you usually do have the first thread live through the whole lifetime of the process, and you can probably call it the "main thread" in this case.) In which module the currently-running code is located, though, will depend on what the thread is currently doing.
Let me get back to the matter of "injecting": The previous answers appear to have assumed a more "normal" environment where your DLL is just linked to the program and meant to be loaded by it. In this case, your DLL's initialization routine (which is automatically run when a module is loaded into a process) would just be run in the "main thread", probably before the actual work of the process begins.
However, things are a bit different when you inject a DLL. It depends on how you do the injection:
If you inject the DLL by modifying the imports table of the host EXE, then your DLL will be loaded the "normal" way I just said. So you can expect your initialization routine to run during the process' startup, in the main thread.
If you inject the DLL by using the AppInit_DLLs registry key, it would be the same.
The same applies if you inject the DLL by starting the host process suspended, writing a stub that loads the DLL into the process's memory, and using SetThreadContext to point the instruction pointer at it.
If you inject the DLL through means of remotely calling LoadLibrary inside the target process, using CreateRemoteThread, then however, as the name suggests, you are creating a new thread inside the process. In this thread, LoadLibrary will load your DLL and also call your initialization routine, so in this case, your initialization routine would indeed run in a new thread other than the "main thread".

Every process has at least one thread. When that process starts, it's possible to link a bunch of functions, or a library, to the memory space of that process. That's what a dll is. The advantage compared to linking directly to the binary is the library only has to exist in one place in the file system and one place in memory while being used by multiple processes. It's a linking technique, similar to how .so files are used in Linux. It has nothing to do with threading.

Would you need to create threads in the dll file, or is that not going to work?
There wouldn't be any point loading a DLL that didn't contain code that would be run. That said, there are several ways the DLL code might get run:
when the DLL is loaded it gets a chance to run some initialisation code
during initialisation, it might:
start one or more threads, which can keep running - perhaps watching for some event that triggers some action on their part
register for callbacks from the OS or application, such as setting up signal handlers, keystroke handlers, any type of event handler....
it might contain functions that the program will look for dynamically and run, mistaking your DLL code for the original versions of those functions that the program came with
Which of these suits your needs depends entirely on what your DLL is trying to achieve, and what's technically necessary to achieve it. For example, if watching for some memory to have specific content, then modifying it further, it might suffice to have a function in your DLL called by an OS alarm service, resetting itself to go off again later if the triggering memory content is not found. But, the trigger might be existence of a file, or shared memory service, a socket being created etc..
After my understanding the main thread will be running from the host process right? Or how does it work?
Yes - threads started within a process - including any DLL initialisation routines - are also within the process. There are some library functions that may create other processes - such as fork, popen, system - that may contain their own threads.

How to create more than one instance with own copy of global variables

I have two projects:
Embedded one, written in C++, which uses a lot of static/global variables.
Second one, running on PC and using the same source codes as embedded one uses.
It works very well.
But now second project should run more than one instances of embedded project. Furthermore each instance should have its own copy of static/global variables, and I should be able to interact with each instance in one program scope. I don't know how to do this with all that static/global variables.
Is there any simple way to solve my problem?

There are several ways you can solve this:
Spawn multiple processes (each with their own set of globals) and setup channels of communication between them and the main program.
Get rid of the global variables. The easiest way to do this would be to dump them all in a class (as non-static members) and use instances of that class to access each set of variables.
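A small sketch of the second option, moving globals into a class as non-static members. The names (Device, speed_, name_, accelerate) are hypothetical and only illustrate the shape of the change:

```cpp
#include <string>
#include <utility>

// Before (hypothetical): int g_speed; std::string g_name; void accelerate(int);
// After: the same state as non-static members, so each instance has its own copy.
class Device {
public:
    explicit Device(std::string name) : name_(std::move(name)) {}

    void accelerate(int delta) { speed_ += delta; }  // formerly modified g_speed
    int speed() const { return speed_; }
    const std::string& name() const { return name_; }

private:
    int speed_ = 0;     // formerly the global g_speed
    std::string name_;  // formerly the global g_name
};
```

The PC program could then create as many Device objects as it needs and talk to each one directly.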
Either way, it's not a small problem to solve if you have a large number of globals.

Run two separate processes and use some form of IPC to communicate between the processes. On Windows, the available IPC mechanisms include:
Clipboard
COM
Data Copy
DDE
File Mapping
Mailslots
Pipes
RPC
Windows Sockets
See here for details of each of these. Similar mechanisms are available in other operating systems.
A perhaps simpler alternative is to run each instance in a separate thread and place the globals in thread local storage.
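A minimal sketch of the thread-local alternative, assuming a C++11 compiler. The names g_state, step and runInstance are hypothetical; the point is that marking the former global as thread_local gives every thread its own copy while the code that uses it stays unchanged.

```cpp
#include <thread>

thread_local int g_state = 0;  // formerly a plain global (hypothetical name)

void step() { ++g_state; }     // existing code keeps using the "global" as before

// Run one instance on its own thread and return the state that thread ended with.
int runInstance(int steps) {
    int result = 0;
    std::thread t([&result, steps] {
        for (int i = 0; i < steps; ++i) step();
        result = g_state;      // reads this thread's copy only
    });
    t.join();
    return result;
}
```

One limitation of this approach is that an instance's state can only be reached from the thread that owns it, so interacting with several instances from one place needs some extra plumbing.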
In all cases, however, you should avoid not just "a lot" of global variables but any at all. They are generally indicative of poor design. See this article for why globals are bad, and ways to avoid them.

As the other answers state the best solution is to get rid of the globals, but I understand that this is not always feasible.
I ran into the exact same problem with our code base.
The solution I used was to build each instance as a separate DLL.
Then I loaded each DLL with LoadLibrary() at runtime.
In this way you can get everything to run in a single process and have multiple versions of the same globals and singletons.
And then you don't need to use any IPC but can pass data between the instances with a simple function call. It also makes the debugging easier, because you can see everything in one debugger.
NOTE: I did this on Windows, but I assume something similar is possible on Unix.

Multi thread C++ dll global variable causing crash

I have a small C++ dll that has 2 callback functions that retrieve information from another dll.
These 2 callback functions are being called repeatedly in more than 1 thread.
They both add information to the same global CString variable.
I have another function that the program that uses this dll will call that reads this variable.
It is rare, but sometimes I get a crash, and it's definitely due to this global variable being read and written at the same time by 2 different functions.
I am not very experienced with multithreads, so I don't really know what to do.
Any suggestions?
here is a previous question I posted about the same problem with a bit more info..(and some of the code).
One of the users helped me confirm that it was a multithreading issue, but we didn't get much further than that.
C++ DLL crash (reading/writing crash related I think)

Have a read of Thread Synchronization for Beginners.
If you're using MFC then CMutex may be appropriate.

You have to create a critical section on this variable. In Windows, you can do it by using Mutex Objects.
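A sketch of protecting the shared string with a lock. It uses std::mutex and std::string as portable stand-ins for the Windows mutex object and the CString from the question; appendInfo and readInfo are hypothetical names for the callbacks' write path and the host program's read path.

```cpp
#include <mutex>
#include <string>

namespace {
std::mutex g_buffer_mutex;  // guards g_buffer
std::string g_buffer;       // stands in for the global CString
}  // namespace

// Called from the callback threads: append under the lock.
void appendInfo(const std::string& text) {
    std::lock_guard<std::mutex> lock(g_buffer_mutex);
    g_buffer += text;
}

// Called by the host program: return a copy taken under the lock,
// so the caller never reads the string while it is being modified.
std::string readInfo() {
    std::lock_guard<std::mutex> lock(g_buffer_mutex);
    return g_buffer;
}
```

Returning a copy rather than a reference matters here, because a reference would let the caller read the string after the lock has been released.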

Odd issue with std::map and thread safety

This isn't so much of a problem now as I've implemented my own collection but still a little curious on this one.
I've got a singleton which provides access to various common components. It holds instances of these components keyed by thread ID, so each thread should (and does, I checked) have its own instance of a component such as an Oracle database access library.
When running the system (which is a C++ library being called by a C# application) with multiple incoming requests, everything seems to run fine for a while, but then it crashes with an AccessViolation exception. Stepping through in the debugger, the problem appears to be that when one thread finishes and clears out its session information (held in a std::map object), the session information held in a separate collection instance for the other thread also appears to be cleared out.
Is this something anyone else has encountered or knows about? I've tried having a look around but can't find anything about this kind of problem.
Cheers

Standard C++ containers do not concern themselves with thread safety much. Your code sounds like it is modifying the map instance from two different threads or modifying the map in one thread and reading from it in another. That is obviously wrong. Use some locking primitives to synchronize the access between the threads.
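One way the locking could look, as a sketch only: a registry that keeps one entry per thread in a single std::map and guards every access with a mutex. SessionRegistry and its members are hypothetical names, not the asker's actual classes.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical registry: one session string per thread, all in one shared map.
class SessionRegistry {
public:
    // Store (or replace) the calling thread's session.
    void set(const std::string& session) {
        std::lock_guard<std::mutex> lock(mutex_);
        sessions_[std::this_thread::get_id()] = session;
    }

    // Remove only the calling thread's entry; other threads' entries are untouched.
    void clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        sessions_.erase(std::this_thread::get_id());
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return sessions_.size();
    }

private:
    mutable std::mutex mutex_;  // every read and write of sessions_ holds this
    std::map<std::thread::id, std::string> sessions_;
};
```

Because the map itself is shared, even inserting or erasing a different key from another thread needs the lock.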

If all you want is a separate object for each thread, you might want to take a look at boost::thread_specific_ptr.

How do you manage giving each thread its own session information? Somewhere under there you have classes managing the lifetimes of these objects, and this is where it appears to be going wrong.

A Wrapper to hardware functions

I'm developing a project and I have to make a wrapper to some hardware functions.
We have to write and read data to and from a non-volatile memory. I have a library with the read and write functions from the seller company. The problem is that these functions should be called with a delay between each call due to hardware characteristics.
So my solution is to start a thread, make a queue and make my own read and write functions. So every time my functions are called, the data will be stored on the queue and then in the loop thread will be actually read or written on the memory. My functions will use a mutex to synchronize the access to the queue. My wrapper is going to be on a dll. The main module will call my dll init function once to start the thread, and then it will call my read/write functions many times from different threads.
My question is: is it safe to do this? The original functions are non-reentrant, and I don't know if this is going to be a problem. Is there a better way to do this?
Any help will be appreciated.
Sorry I forgot something:
-The language to be used is C++
-The main program will call my wrapper dll but also will call other modules (dlls) that are going to call the wrapper dll.

Adding a mediator in this context is a pretty typical solution so you aren't out in the weeds here. I would say you would need to implement this because the original functions are not reentrant. Assuming, of course, that you own the access to the hardware. (i.e. You are the driver.) If other people can get access to the same piece of hardware, then you're going to have to come up with some higher level contract. Your thread then provides the ordered access to the driver. You'll find that the mediator will also allow you to throttle.
The hard part, it seems, is knowing when it is okay to make the next call to the device. Does it have some sort of flag to let you know it is ready for reads and writes? Some other questions: How do you plan to communicate state to your clients? Since you are providing an async interface, you'll need some sort of error callback registration, etc. Take a look at a normal async driver interface for ideas.
But overall, sounds like a good strategy to start with. As another poster mentioned, more specifics would be nice.
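A minimal sketch of the queue-plus-worker-thread mediator discussed above, assuming a fixed delay between hardware calls is enough. ThrottledWorker and post are hypothetical names; the jobs passed to post would wrap the vendor's read and write functions, which are not shown here.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class ThrottledWorker {
public:
    explicit ThrottledWorker(std::chrono::milliseconds delay)
        : delay_(delay), thread_([this] { run(); }) {}

    // Ask the worker to stop, let it drain the queue, and wait for it to finish.
    ~ThrottledWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        thread_.join();
    }

    // May be called from any thread; the job itself runs later on the worker thread.
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                                // only this thread touches the hardware
            std::this_thread::sleep_for(delay_);  // enforced gap before the next call
        }
    }

    std::chrono::milliseconds delay_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the other members exist before run() starts
};
```

Since every job runs on the single worker thread, the non-reentrant vendor functions are never entered concurrently. Reads would still need a way to hand the result back to the caller, for example a callback or a future.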
