I SEEM to be having an issue with GlobalLock in my application. I say seem because I haven't been able to witness the issue by stepping through yet, but when I let it run it breaks in one of two locations.
The app has multiple threads (say 2) simultaneously reading and writing bitmaps from PDF files. Each thread handles a different file.
The first location where it breaks is where I am reading a DIB from the PDF to be OCRed. OCR reads the characters on the bitmap and turns them into string data. The second location is where a new PDF is being created with the string data added over the bitmap.
GlobalLock is being used on a HANDLE created by the following:
GlobalAlloc(GMEM_MOVEABLE, uBytes);
I either get an access violation (always at the first location) or GlobalLock returns a NULL pointer (at the second).
It seems like one file is being read and another is having a copy written at the same time. There seems to be no pattern to which files it happens on.
Now I understand that the VC++ runtime has been multithreaded since 2005 (I am using VS2010 with the 2008 toolchain). But is GlobalLock part of the runtime? It seems to me more like a platform thing, independent of the runtime.
I want to avoid just putting a CRITICAL_SECTION around GlobalLock and GlobalUnlock to get them to work, or at least not do so without knowing why.
Can anyone inform me better about GlobalLock/Unlock?
-A fish out of water
First, the Global* heap routines are provided for compatibility with 16-bit Windows. They still work, but there's no real reason to use them anymore, except for compatibility with routines that still use global heap object handles. Note that GlobalLock/GlobalUnlock are not threading locks - they prevent the memory from moving, but multiple threads can GlobalLock the same object at the same time.
That said, they are otherwise thread-safe; they take a heap lock internally, so there is no need to wrap your own locking around every Global* call. If you are having problems like this, it suggests you may be trying to GlobalLock a freed object, or you may be corrupting the heap (heap overflows, use-after-free, etc). You may also be missing thread synchronization on the contents of the heap object - the Global* API does not prevent multiple threads from accessing or modifying the same object at once.
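To make that concrete, here is a minimal sketch under the same GMEM_MOVEABLE allocation style as the question. The handle, the message helper, and the sizes are made up for illustration; the point is that the critical section guards the block contents, not the Global* calls themselves:

#include <windows.h>
#include <cstring>

// Hypothetical shared state: one movable block touched by several threads.
static HGLOBAL g_hData;
static CRITICAL_SECTION g_csData;   // guards the block *contents*, not the Global* calls

void WriteMessage(const char *msg)
{
    EnterCriticalSection(&g_csData);            // serialize access to the contents
    char *p = (char *)GlobalLock(g_hData);      // GlobalLock itself is already thread-safe
    if (p != NULL)
    {
        size_t n = strlen(msg) + 1;
        if (n <= GlobalSize(g_hData))
            memcpy(p, msg, n);
        GlobalUnlock(g_hData);
    }
    LeaveCriticalSection(&g_csData);
}

int main()
{
    InitializeCriticalSection(&g_csData);
    g_hData = GlobalAlloc(GMEM_MOVEABLE, 256);  // same allocation style as the question
    if (g_hData != NULL)
    {
        WriteMessage("Test");
        GlobalFree(g_hData);                    // free exactly once, after all users are done
    }
    DeleteCriticalSection(&g_csData);
    return 0;
}

If the failure modes in the question show up with a structure like this, the usual suspects are a GlobalFree racing with a reader, or a write past uBytes corrupting the heap.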
The boost::interprocess::managed_shared_memory manual and most other resources I checked always show examples where there is a parent process and a bunch of children spawned by it.
In my case, I have several processes spawned by a third-party application, and I can only control the "children". This means I cannot have a central brain to allocate and deallocate the shared memory segment; all my processes must be able to do so (therefore, I can't erase the data on exit).
My idea was to open_or_create a segment and, using a lock stored (find_or_construct'ed) in this area, I check a certain hash to see if the memory area was created by this same software version.
If this is not true, the memory segment must be wiped to avoid breaking code.
Ideally, I would want to keep the lock object because there could be other processes already waiting on it.
Things I thought of:
List all object names and delete all but the lock.
This cannot be done because the objects might be using different implementations.
Also I couldn't find where to list the names.
Use shared_memory_object::truncate
I could not find much about it
By using a managed_shared_memory, I don't know how reliable it would be because I'm not sure the lock was the first allocated data.
Refcount the processes and wipe data on last one
Prone to fatal termination problems.
Use a separated shared memory area just for this bookkeeping.
Sounds reasonable, but overkill?
Any suggestions or insights?
This sounds like a "shared ownership" scenario.
What you'd usually think of in such a scenario would be shared pointers:
http://www.boost.org/doc/libs/1_58_0/doc/html/interprocess/interprocess_smart_ptr.html#interprocess.interprocess_smart_ptr.shared_ptr
Interprocess has specialized shared pointers (and ditto make_shared) for exactly this purpose.
Creating the shared memory realm can be done "optimistically" from each participating process (open_or_create). Note that creation needs to be synchronized. Further segment manager operations are usually already implicitly synchronized:
Whenever the same managed shared memory is accessed from different processes, operations such as creating, finding, and destroying objects are automatically synchronized. If two programs try to create objects with different names in the managed shared memory, the access is serialized accordingly. To execute multiple operations at one time without being interrupted by operations from a different process, use the member function atomic_func() (see Example 33.11).
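To make the open_or_create idea concrete, here is a minimal sketch. The segment name and the version-hash constant are illustrative, and it assumes the segment-creation race is handled or acceptable on your platform, as noted above:

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

namespace bip = boost::interprocess;

// Hypothetical constant that changes whenever the shared layout changes.
static const unsigned kLayoutHash = 0xA1B2C3D4;

int main()
{
    // Every "child" opens or creates the segment optimistically.
    bip::managed_shared_memory segment(bip::open_or_create, "MySegment", 65536);

    // find_or_construct calls are serialized by the segment manager, so only
    // one process actually constructs the lock and the hash.
    bip::interprocess_mutex *mtx =
        segment.find_or_construct<bip::interprocess_mutex>("lock")();
    unsigned *hash =
        segment.find_or_construct<unsigned>("layout_hash")(kLayoutHash);

    bip::scoped_lock<bip::interprocess_mutex> guard(*mtx);
    if (*hash != kLayoutHash)
    {
        // Stale layout from another software version: destroy the application
        // objects by name here (the lock and the hash themselves are kept),
        // then stamp the current hash.
        *hash = kLayoutHash;
    }
    // ... use the segment ...
    return 0;
}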
I'm trying to understand something about HGLOBALs, because I just found out that what I thought is simply wrong.
In app A I GlobalAlloc() data (with GMEM_SHARE|GMEM_MOVEABLE) and place the string "Test" in it. Now, what can I give to another application to get to that data?
I thought (wrongly!) that HGLOBALs are valid in all processes, which is obviously wrong, because an HGLOBAL is a HANDLE to the global data, not a pointer to the global data (that's where I said "OHHHH!").
So how can I pass the HGLOBAL to another application?
Notice: I want to pass just a "pointer" to the data, not the data itself, like in the clipboard.
Thanks a lot! :-)
(This is just a very long comment, as others have already explained that Win32 takes a different approach to memory sharing.)
I would say that you are reading books (or tutorials) on Windows programming which are quite old and obsolete, as Win16 has been virtually dead for quite some time.
16-bit Windows (3.x) didn't have the concept of memory isolation (or a virtual /flat/ address space) that 32-bit (and later) Windows versions provide. Memory there used to be divided into local (to the process) and global sections, both living in the same global address space. Descriptors like HGLOBAL were used to allow memory blocks to be moved around in physical memory and still be accessed correctly despite their new location in the address space (after being properly fixed in place with LocalLock()/GlobalLock()).
Win32 uses pointers instead, since physical memory pages can be moved without affecting their location in the virtual address space. It still provides all of the Global* and Local* API functions for compatibility reasons, but they should not be used anymore and the usual heap management should be used instead (e.g. malloc() in C or the new operator in C++).
Also, several different kinds of pointers existed on Win16 in order to reflect the several different addressing modes available on x86 - near (same segment), far (segment:offset) and huge (normalised segment:offset). You can still see things like the FAR pointer macros in legacy Win16 code that got ported to Win32, but they are defined to expand to nothing, as in flat mode only near pointers are used.
Read the documentation. With the introduction of 32-bit processing, GlobalAlloc() does not actually allocate global memory anymore.
To share a memory block with another process, you could allocate the block with GlobalAlloc() and put it on the clipboard, then have the other process retrieve it. Or you can allocate a block of shared memory using CreateFileMapping() and MapViewOfFile() instead.
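A minimal sketch of the second approach; the mapping name is illustrative, and the other process would open it with OpenFileMappingA plus MapViewOfFile:

#include <windows.h>
#include <cstring>

int main()
{
    // Illustrative name; both processes must agree on it.
    const char *kName = "Local\\MySharedBlock";
    const DWORD kSize = 4096;

    // Backed by the page file, not a real file (INVALID_HANDLE_VALUE).
    HANDLE hMap = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
                                     PAGE_READWRITE, 0, kSize, kName);
    if (hMap == NULL)
        return 1;

    char *view = (char *)MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, kSize);
    if (view != NULL)
    {
        memcpy(view, "Test", 5);   // the string the question wants to share
        // App B: OpenFileMappingA(FILE_MAP_ALL_ACCESS, FALSE, kName) followed
        // by MapViewOfFile gives it a view of the same bytes.
        UnmapViewOfFile(view);
    }
    CloseHandle(hMap);
    return 0;
}

Note that the mapping disappears when the last handle to it is closed, so app A has to keep hMap open (or app B must have opened the mapping) for as long as the data is needed.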
Each process "thinks" that it owns the full memory space available on the computer. No process can "see" the memory space of another process. As such, normally, nothing a process stores can be seen by another process.
Because it can be necessary to pass information between processes, certain mechanisms exist to provide this functionality.
One approach is message passing: one process issues a message to another, for example over a pipe or a socket, or via a Windows message.
Another is shared memory, where a given block of memory is made available to two or more processes, such that whatever one process writes can be seen by the others.
Don't be confused by the GMEM_SHARE flag. It does not work the way you might suppose. From MSDN:
The following values are obsolete, but are provided for compatibility
with 16-bit Windows. They are ignored.
GMEM_SHARE
GMEM_SHARE flag explained by Raymond Chen:
In 16-bit Windows, the GMEM_SHARE flag controlled whether the memory
should outlive the process that allocated it.
To share memory with another process/application you instead should take a look at File Mappings: Memory-mapped files and how they work.
I have been looking for a way to dynamically load functions into C++ for some time now, and I think I have finally figured it out. Here is the plan:
1. Pass the function as a string into C++ (via a socket connection, a file, or something).
2. Write the string into a file.
3. Have the C++ program compile the file and execute it. If there are any errors, catch them and return them.
4. Have the newly executed program with the new function pass the memory location of the function to the currently running program.
5. Save the location of the function to a function pointer variable (the function will always have the same return type and arguments, so this simplifies the declaration of the pointer).
6. Run the new function with the function pointer.
The issue is that after step 4, I do not want to keep the new program running since if I do this very often, many running programs will suck up threads. Is there some way to close the new program, but preserve the memory location where the new function is stored? I do not want it being overwritten or made available to other programs while it is still in use.
If you have any suggestions for the other steps, those would be appreciated as well. There might be other libraries that do things similar to this, and it is fine to recommend them, but this is the approach I want to look into, if not for the accomplishment of it then for the knowledge of how to do it.
Edit: I am aware of dynamically linked libraries. This is something I am largely looking into to gain a better understanding of how things work in C++.
I can't see how this can work. When you run the new program it'll be a separate process and so any addresses in its process space have no meaning in the original process.
And not just that, but the code you want to call doesn't even exist in the original process, so there's no way to call it in the original process.
As Nick says in his answer, you need either a DLL/shared library or you have to set up some form of interprocess communication so the original process can send data to the new process to be operated on by the function in question and then sent back to the original process.
How about a Dynamic Link Library?
These can be linked/unlinked/replaced at runtime.
Or, if you really want to communicate between processes, you could use a named pipe.
Edit: you can also create named shared memory.
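A minimal sketch of the DLL route, assuming the compiled string ends up in a DLL exporting one function with the fixed signature the asker mentions (the file name, export name, and signature here are made up):

#include <windows.h>
#include <cstdio>

// The fixed signature the asker mentions; name and types are illustrative.
typedef int (*user_func_t)(int);

int main()
{
    // "userfunc.dll" / "user_function" stand in for the artifact produced by
    // the compile-the-string step.
    HMODULE hLib = LoadLibraryA("userfunc.dll");
    if (hLib == NULL)
        return 1;

    user_func_t fn = (user_func_t)GetProcAddress(hLib, "user_function");
    if (fn != NULL)
        printf("result: %d\n", fn(42));

    FreeLibrary(hLib);   // unload; a recompiled DLL can be loaded again later
    return 0;
}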
For step 4: we can't directly pass a memory location (address) from one process to another, because the two processes use different virtual address spaces; one process can't use memory belonging to another process.
So you need to create a shared memory region between the two processes and copy your function into that memory; then you can close the newly started process.
For shared memory on Windows, see Creating Named Shared Memory:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366551(v=vs.85).aspx
After that, you still need to allocate another block of memory and copy the function into it again. The idea is that normally allocated memory only has read/write permissions; if you execute code from it, the CPU will raise an exception.
So, on Windows, you need to use VirtualAlloc to allocate the memory with the PAGE_EXECUTE_READWRITE flag (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx):
// emitcode is assumed to be the buffer holding the extracted machine code
void* address = NULL;
address = VirtualAlloc(NULL,
                       sizeof(emitcode),
                       MEM_COMMIT | MEM_RESERVE,
                       PAGE_EXECUTE_READWRITE);  // readable, writable and executable
After copying the function to address, you can call the code at address, but you need to be very careful to keep the stack balanced.
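Putting the pieces together, a minimal, heavily simplified sketch. The one-byte ret stands in for real emitted code and assumes x86/x64; everything else is just the VirtualAlloc pattern from above:

#include <windows.h>
#include <cstring>

int main()
{
    // Stand-in for real emitted code: a single `ret` (0xC3), valid on x86/x64.
    unsigned char emitcode[] = { 0xC3 };

    void *address = VirtualAlloc(NULL, sizeof(emitcode),
                                 MEM_COMMIT | MEM_RESERVE,
                                 PAGE_EXECUTE_READWRITE);
    if (address == NULL)
        return 1;

    memcpy(address, emitcode, sizeof(emitcode));
    FlushInstructionCache(GetCurrentProcess(), address, sizeof(emitcode));

    typedef void (*func_t)(void);
    func_t fn = (func_t)address;
    fn();                                   // jumps into the copied bytes and returns

    VirtualFree(address, 0, MEM_RELEASE);
    return 0;
}

A production version would typically allocate with PAGE_READWRITE, copy the code, and only then switch the page to PAGE_EXECUTE_READ with VirtualProtect, which plays better with DEP policies.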
Dynamic libraries are best suited to your problem. Also, forget about launching a different process; that's another problem by itself. But in addition to the post above, provided that you did the VirtualAlloc correctly, just call your function within the same "loader" process; then you shouldn't have to worry, since you will be running on the same stack, bound by the same size limits.
The real problems are:
1 - Compiling the function you want to load, offline from the main program.
2 - Extracting the relevant code from the binary produced by the compiler.
3 - Loading the string.
1 and 2 require a deep understanding of the entire compiler suite, including compiler flag options, the linker, etc. ... not just the IDE's push buttons ...
If you are OK with 1 and 2, you should know why using std::string, or anything but a pure char *, is harmful.
I could continue the entire story, but it definitely deserves its own book. Since this is the hacker/cracker way of doing things, I strongly recommend that the normal user stick to dynamic libraries; that is why they exist.
Usually we call this code injection ...
Basically, for the sake of security, modern operating systems forbid executing anything that was not mapped as code at initial load time, so we must fall back to OS-wide validated dynamic libraries.
That said, once you have valid compiled code, if you really want to achieve that effect, you must load your function into memory and then mark it as executable (clear the NX bit) in a system-specific way.
But let's be clear: your function must be position-independent code, and you get no help from the dynamic linker to resolve symbols ... that's the hard part of the job.
I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I don't get what kind of conditions might lead to this crash, which happens only once in a while. About 95% of the time the code runs perfectly fine, but sometimes this seems to be corrupted somehow and it crashes.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit: As requested, the crashing code:
long CImageBuffer::Release()
{
    long nRefCount = InterlockedDecrement(&m_nRefCount);
    if (nRefCount == 0)
    {
        delete this;
    }
    return nRefCount;
}
The object has been created with new; it is not in any kind of array.
The most obvious answer is : don't delete this.
If you insist on doing that, then use common ways of finding bugs:
1. use valgrind (or a similar tool) to find memory access problems
2. write unit tests
3. use a debugger (prepare for a looong stare at the screen - depends on how big your project is)
It seems like you've mismatched new and delete. Note that delete this; can only be used on an object which was allocated using new (and, in case of an overridden operator new or multiple copies of the C++ runtime, the particular new that matches the delete found in the current scope).
Crashes upon deallocation can be a pain: It is not supposed to happen, and when it happens, the code is too complicated to easily find a solution.
Note: The use of InterlockedDecrement leads me to assume you are working on Windows.
Log everything
My own solution was to massively log the construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value, and other relevant data
Log the destruction, including the this pointer value, and other relevant data
This way, you'll be able to see if the this was deallocated twice, or even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning that I had easy access to the stack, which was a blessing. You seem to work on plain C++, so retrieving the stack could be a chore, but still, it remains very very useful.
You should try to find code on the internet to print out the current stack for each log entry. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need to either be in a debug build, or have the PDB file alongside the executable file, to make sure the stack will be fully printed.
... everything, including multiple crashes
I believe you are on Windows: You could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all, instead of seeing only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I went even as far as using maps to remember addresses of allocations/deallocations, thus organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but here, Google is your friend. The thing to remember is that you can't have a __try/__except handler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception.
Is your crash thread-related?
Last, but not least, the "it happens only 5% of the time" symptom could be caused by different code path executions, or by the fact that you have multiple threads playing together with the same data.
The InterlockedDecrement part bothers me: Is your object living in multiple threads? And is m_nRefCount a correctly aligned volatile LONG?
The correctly aligned and LONG parts are important here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on a 32-bit boundary. Are there #pragma pack() directives in your code? Does your project file change the default alignment (I assume you're working in Visual Studio)?
For the volatile part, InterlockedDecrement seems to generate a read/write memory barrier, so the volatile part should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).
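For reference, here is a minimal sketch of the shape being asked about: a naturally 4-byte-aligned volatile LONG used with the Interlocked* functions, in the same form as the question's Release(). The class body is an assumption for illustration, not the asker's actual code:

#include <windows.h>

class CImageBuffer
{
public:
    CImageBuffer() : m_nRefCount(1) {}

    long AddRef()
    {
        return InterlockedIncrement(&m_nRefCount);
    }

    long Release()
    {
        long nRefCount = InterlockedDecrement(&m_nRefCount);
        if (nRefCount == 0)
            delete this;                 // safe only if every AddRef is balanced by a Release
        return nRefCount;
    }

private:
    ~CImageBuffer() {}                   // private: force destruction through Release()

    // A LONG is 32 bits and naturally 4-byte aligned, which is exactly what
    // InterlockedIncrement/InterlockedDecrement expect.
    volatile LONG m_nRefCount;
};

int main()
{
    CImageBuffer *buf = new CImageBuffer();  // refcount starts at 1
    buf->AddRef();                           // a second owner appears
    buf->Release();                          // back to 1
    buf->Release();                          // reaches 0, object deletes itself
    return 0;
}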
In Java, each object has a synchronisation monitor, so I guess the implementation is pretty condensed in terms of memory usage and hopefully fast as well.
When porting this to C++, what would be the best implementation for it? I think there must be something better than "pthread_mutex_init", or is the object overhead in Java really so high?
Edit: I just checked that pthread_mutex_t on Linux i386 is 24 bytes. That's huge if I have to reserve this space for each object.
In a sense it's worse than pthread_mutex_init, actually. Because of Java's wait/notify you kind of need a paired mutex and condition variable to implement a monitor.
In practice, when implementing a JVM you hunt down and apply every single platform-specific optimisation in the book, and then invent some new ones, to make monitors as fast as possible. If you can't do a really fiendish job of that, you definitely aren't up to optimising garbage collection ;-)
One observation is that not every object needs to have its own monitor. An object which isn't currently synchronised doesn't need one. So the JVM can create a pool of monitors, and each object could just have a pointer field, which is filled in when a thread actually wants to synchronise on the object (with a platform-specific atomic compare and swap operation, for instance). So the cost of monitor initialisation doesn't have to add to the cost of object creation. Assuming the memory is pre-cleared, object creation can be: decrement a pointer (plus some kind of bounds check, with a predicted-false branch to the code that runs gc and so on); fill in the type; call the most derived constructor. I think you can arrange for the constructor of Object to do nothing, but obviously a lot depends on the implementation.
In practice, the average Java application isn't synchronising on very many objects at any one time, so monitor pools are potentially a huge optimisation in time and memory.
The Sun Hotspot JVM implements thin locks using compare and swap. If an object is locked, then the waiting thread waits on the monitor of the thread which locked the object. This means you only need one heavy lock per thread.
I'm not sure how Java does it, but .NET doesn't keep the mutex (or analog - the structure that holds it is called "syncblk" there) directly in the object. Rather, it has a global table of syncblks, and object references its syncblk by index in that table. Furthermore, objects don't get a syncblk as soon as they're created - instead, it's created on demand on the first lock.
I assume (note, I do not know how it actually does that!) that it uses atomic compare-and-exchange to associate the object and its syncblk in a thread-safe way:
Check the hidden syncblk_index field of our object for 0. If it's not 0, lock it and proceed, otherwise...
Create a new syncblk in the global table and get the index for it (global locks are acquired/released here as needed).
Compare-and-exchange to write it into object itself.
If previous value was 0 (assume that 0 is not a valid index, and is the initial value for the hidden syncblk_index field of our objects), our syncblk creation was not contested. Lock on it and proceed.
If previous value was not 0, then someone else had already created a syncblk and associated it with the object while we were creating ours, and we have the index of that syncblk now. Dispose the one we've just created, and lock on the one that we've obtained.
Thus the overhead per-object is 4 bytes (assuming 32-bit indices into syncblk table) in best case, but larger for objects which actually have been locked. If you only rarely lock on your objects, then this scheme looks like a good way to cut down on resource usage. But if you need to lock on most or all your objects eventually, storing a mutex directly within the object might be faster.
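A rough sketch of that lazy association in C++, using a per-object pointer and an atomic compare-and-swap instead of a real syncblk table index; purely illustrative:

#include <atomic>
#include <condition_variable>
#include <mutex>

// A monitor pairs a mutex with a condition variable (for wait/notify).
struct Monitor
{
    std::mutex              mtx;
    std::condition_variable cv;
};

// Each object carries only one pointer-sized hidden field, initially null.
struct Object
{
    std::atomic<Monitor*> monitor{nullptr};
};

Monitor *get_monitor(Object &obj)
{
    Monitor *m = obj.monitor.load(std::memory_order_acquire);
    if (m != nullptr)
        return m;                              // fast path: already associated

    Monitor *fresh = new Monitor();            // slow path: build a candidate
    Monitor *expected = nullptr;
    if (obj.monitor.compare_exchange_strong(expected, fresh,
                                            std::memory_order_acq_rel))
        return fresh;                          // we won the race

    delete fresh;                              // lost the race: dispose ours...
    return expected;                           // ...and use the winner's monitor
}

int main()
{
    Object obj;
    // Roughly "synchronized (obj) { ... }" in Java terms.
    std::lock_guard<std::mutex> lk(get_monitor(obj)->mtx);
    return 0;
}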
Surely you don't need such a monitor for every object!
When porting from Java to C++, it strikes me as a bad idea to just copy everything blindly. The best structure for Java is not the same as the best for C++, not least because Java has garbage collection and C++ doesn't.
Add a monitor to only those objects that really need it. If only some instances of a type need synchronization then it's not that hard to create a wrapper class that contains the mutex (and possibly condition variable) necessary for synchronization. As others have already said, an alternative is to use a pool of synchronization objects with some means of choosing one for each object, such as using a hash of the object address to index the array.
I'd use the boost thread library or the new C++0x standard thread library for portability rather than relying on platform specifics at each turn. Boost.Thread supports Linux, MacOSX, win32, Solaris, HP-UX and others. My implementation of the C++0x thread library currently only supports Windows and Linux, but other implementations will become available in due course.
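As an illustration of the pooled approach mentioned above, a small sketch that picks a mutex from a fixed array by hashing the object's address. The pool size and the hash are arbitrary choices, and unrelated objects may occasionally share a lock, which is the trade-off of this scheme:

#include <cstdint>
#include <mutex>

// A fixed pool of mutexes; the size is an arbitrary choice for the sketch.
static const std::size_t kPoolSize = 64;
static std::mutex g_lock_pool[kPoolSize];

// Pick a mutex by hashing the object's address; no per-object storage needed.
inline std::mutex &lock_for(const void *obj)
{
    std::uintptr_t p = reinterpret_cast<std::uintptr_t>(obj);
    return g_lock_pool[(p >> 4) % kPoolSize];  // drop low bits, then index
}

struct Account
{
    int balance;
    Account() : balance(0) {}
};

void deposit(Account &a, int amount)
{
    std::lock_guard<std::mutex> lk(lock_for(&a));   // "synchronized (a)"
    a.balance += amount;
}

int main()
{
    Account a;
    deposit(a, 10);
    return 0;
}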