Windows WriteFile problem when using threads - c++

My company is developing a hardware that needs to communicate with software. To do this, we have made a driver that enables writing to and reading from the hardware. To access the driver, we use the command:
HANDLE device = CreateFile(DEVICE_NAME,
GENERIC_READ | GENERIC_WRITE,
0x00000007,
&sec,
OPEN_EXISTING,
0,
NULL);
Reading and writing is done using the functions:
WriteFile(device,&package,package.datasize,&bytesWritten,NULL);
and
ReadFile(device,returndata,returndatasize,&bytesRead,NULL);
And finally, CloseHandle(device), to close the file.
This works just fine in the case where the functions are called from the main thread. If they are called from some other thread, we get error 998 (no_acccess) when trying to Write more than a couple of elements. The threads are created using
CreateThread(NULL, 0, thread_func, NULL, 0, &thread_id);
I'm running out of ideas here, any suggestions?
edit:
When running the following sequence:
Main_thread:
CreateFile
Write
Close
CreateThread
WaitForThread
Thread_B:
CreateFile
Write
Close
Main_Thread succeeds and Thread_B does not. However, when writing small sets of data, this works fine. May this be because Thread_B does not inherit all of Main_Thread's access privileges?
edit2:
a lot of good thinking going on here, much appreciated! After some work on this problem, the following seems to be the case:
The api contains a Queue-thread, handling all packages going to and from the device. This thread handles pointers to package-objects. When a pointer reaches the front of the queue, a "send_and_get" function is called. If the arrays in the package is allocated in the same thread that calls the "send_and_get" function, everything works fine. If the arrays are allocated in some other thread, sending fails. How to fix this, though, I don't know.

According to winerror, Win32 error 998 is one of the following native status values (which would be returned by the O/S or the driver):
998 ERROR_NOACCESS <--> 0x80000002 STATUS_DATATYPE_MISALIGNMENT
998 ERROR_NOACCESS <--> 0xc0000005 STATUS_ACCESS_VIOLATION
998 ERROR_NOACCESS <--> 0xc00002c5 STATUS_DATATYPE_MISALIGNMENT_ERROR
Access violation might be a likely candidate based on you saying, "when trying to Write more than a couple of elements." Are you sure the buffer that you're sending is large enough?
The alignment errors are fairly exotic, but might be relevant if the device has some alignment requirements and the developer chose to use these particular errors.
-scott

Still sounds to me like it's concurrent access.
Your separate threads writing to this device will need to properly protect access to the file using a mutex or similar. Either open the handle in the main thread and leave it open or protect the whole Open -> Write -> Close sequence that can occur in each thread (with a mutex).

As a debugging measure, since it's your own driver, you could get the driver to log the requests it is receiving, e.g., into the event log. Set up two test runs which are identical except that one runs all the code in the main thread and the other runs all the code in a second thread. Comparing the results should give you a better insight into what is happening.
It would also be a good idea to get your driver to report any error codes that it is returning to the operating system.

First thing that you should check is if the error (998) reported by your driver or by the kernel-mode I/O manager (which is responsible to initiate the IRP and call your driver) even before the request reaches your driver. You should be able to discover this since this is your driver. Just log the calls to the driver's Dispatch routine, what it returns, what it does (does it call other drivers or calls IoCompleteRequest with an error code or etc.) and things should become clear.
From the scenario that you describe it seems that most likely the error is caused by your driver. For instance, your driver may allocate some global state structure on a response to CreateFile (which is driver's IRP_MJ_CREATE), and purge it when the file is closed. Such a driver won't function correctly if simultaneously two files are opened, then one is closed whereas the second still receives I/O requests.

Related

Rare EXCEPTION_ACCESS_VIOLATION when debugging any process started with CREATE_SUSPENDED

While writing an x86 WinAPI-based debugger, I've encountered a rare condition when the debuggee (which usually works well) suddenly terminates with EXCEPTION_ACCESS_VIOLATION after I attach to it with my native debugger. I can stably reproduce this on any applications it seems (tried on .NET Hello World-styled application and on notepad.exe on multiple Windows 10 machines).
Essentially I've written a simple WaitForDebugEvent loop:
CreateProcessW(L"C:\\Windows\\SYSWOW64\\notepad.exe", […], CREATE_SUSPENDED, […]);
DebugActiveProcess(processId);
DEBUG_EVENT debugEvent = {};
while (WaitForDebugEvent(&debugEvent, INFINITE)) {
switch (debugEvent.dwDebugEventCode) {
// log all the events
}
ContinueDebugEvent(debugEvent.dwProcessId, debugEvent.dwThreadId, DBG_EXCEPTION_NOT_HANDLED);
}
DebugActiveProcessStop(processId);
(here's the full listing: I won't paste it all here, because there's some additional non-essential boilerplate there; the MCVE is 136 lines long)
For the sake of an example, I'll just log all the debugger events and detect whether the debuggee is ready to "proceed normally" or it will terminate due to an exception.
Most of the time, my debugging session looks like that:
CREATE_PROCESS_DEBUG_EVENT (which reports creation of both the process and its initial thread)
LOAD_DLL_DEBUG_EVENT (I was never able to get the name for this DLL, but this is documented in MSDN)
CREATE_THREAD_DEBUG_EVENT (which, I suspect, is a thread injected by debugger)
LOAD_DLL_DEBUG_EVENT […] — after this, many DLLs get loaded into the target process and everything looks okay, the process works as intended
But sometimes (in about 1.5% of all runs), the event sequence changes:
CREATE_PROCESS_DEBUG_EVENT
LOAD_DLL_DEBUG_EVENT
CREATE_THREAD_DEBUG_EVENT
EXCEPTION_DEBUG_EVENT: EXCEPTION_ACCESS_VIOLATION (which I never was able to gather details for: it reports a DEP violation, and the address is empty)
After that, I cannot proceed with debugging, because my debuggee is in exception state and will terminate soon. I was never able to catch notepad.exe crash without my debugger attached (and I doubt it is that bad and will crash for no reason), so I suspect that my debugger causes these exceptions.
One bizarre detail is that I could "fix" the situation by calling Sleep(1) immediately after WaitForDebugEvent. So, this is possibly some sort of race condition, but race condition between what? Between the debugger thread and other threads in the debuggee? Is it a thing? How are we supposed to debug other applications, then? How could actual debuggers work if it is a thing?
I couldn't reproduce the issue with the same code compiled for x64 CPU (and debugging an x64 process).
What could actually cause this erroneous behavior? I've carefully read the documentation about the API functions I call, and checked some other debugger examples online, but still wasn't able to find what's wrong with my debugger: it looks like I follow all the right conventions.
I have tried to debug my debuggee with WinDBG while it is still paused in my debugger, but had no luck doing that. First of all, it's difficult to attach to the debuggee with another debugger (WinDBG only allows to use non-intrusive mode, which is less functional it seems?), and the call stacks for the process' threads aren't usually meaningful.
Steps to reproduce
Checkout this repository, compile with MSVC and then execute in cmd:
Debug\NetRuntimeWaiter.exe > log.txt
It is important to redirect output to the log file and not show it in the terminal: without that, timings for the log writer get changed, and the issue won't reproduce (due to a possible race condition I mentioned earlier?).
Usually the program will start and terminate 1000 notepads in about 10 seconds, and 10-15 of 1000 invocations will hold the error condition (i.e. EXCEPTION_ACCESS_VIOLATION).
the DebugActiveProcess (and undocumented DbgUiDebugActiveProcess which is internally called by DebugActiveProcess) have serious design problem: after calling NtDebugActiveProcess it create remote thread in the target process, via DbgUiIssueRemoteBreakin call - as result new thread in target process is created - DbgUiRemoteBreakin - this thread call DbgBreakPoint and then RtlExitUserThread
all this not documented and explained, only this note from DebugActiveProcess:
After all of this is done, the system resumes all threads in the
process. When the first thread in the process resumes, it executes a
breakpoint instruction that causes an EXCEPTION_DEBUG_EVENT
debugging event to be sent to the debugger.
of course this is wrong. why is DbgUiRemoteBreakin first (??) thread ? and which thread resume first undefined. why not exactly write - we create additional (but not first) thread in process ? and this thread execute breakpoint.
however, when process already running - create this additional thread not create problems. but in case we create process in suspended state, and then just call DebugActiveProcess - the DbgUiRemoteBreakin really became first executing thread in process and process initialization was done on this thread, instead of created first thread. on xp this always lead to fail process initialize at connect to csrss phase. (csrss wait connect to it only on first created thread in process). on later systems this is fixed and process can execute as usual. but can and not, because thread on which it was initialized is exit. it can cause subtle problems.
solution here - not use DebugActiveProcess but NtDebugActiveProcess in it place.
the debug object we can create or via DbgUiConnectToDbg() and then get it via DbgUiGetThreadDebugObject() (system store debug object in thread TEB) or direct by call NtCreateDebugObject
also if we create debuggee process from another process(B) we can do next:
duplicate debug object from debugger process to this B process
call DbgUiSetThreadDebugObject(hDdg) just before call
CreateProcessW with DEBUG_ONLY_THIS_PROCESS or DEBUG_PROCESS
system will be use DbgUiGetThreadDebugObject() for get debug object
from your thread and pass it to low level process create api
remove debug object from your thread via
DbgUiSetThreadDebugObject(0)
really no matter who is create process with debug object. matter who is handle events posted to this debug object.
all undocumented api definitions you can take from ntdbg.h and then link with ntdll.lib or ntdllp.lib

How can I perform network IO at the very end of a process' lifetime?

I'm developing a DLL in C++ which needs to write some data via a (previously established) TCP/IP connection using the write() call. To be precise, the DLL should send a little 'Process 12345 is terminating at 2007-09-27 15:30:42, value of i is 131' message over the wire when the process goes down.
Unfortunately, all the ways I know for detecting that the process is ending are apparently too late for any network calls to succeed. In particular, I tried the following approaches and the write() call returned -1 in every case:
Calling write() from the destructor of a global object.
Calling write() from a callback function registered using atexit().
Calling write() from DllMain (in case the reason argument is DLL_PROCESS_DETACH). I know that this is not a safe thing to do, but I'm getting a bit desperate. :-)
I'm aware that a DLL can't detect any process shutdown (it might have been unloaded long before the process terminates) but since the shutdown data which the DLL needs to send depends on other code in the DLL, that's acceptable. I'm basically looking for the latest moment at which I can safely perform network IO.
Does anybody know how to do this?
Consider monitoring the process from a separate watchdog process.
Determining If a Process Has Exited: http://msdn.microsoft.com/en-us/library/y111seb2(v=VS.71).aspx
Tutorial: Managing a Windows Process: http://msdn.microsoft.com/en-us/library/s9tkk4a3(v=VS.71).aspx
Consider to use Windows Job Objects.
You main program (monitoring program, which will use for example send()) can start child process suspended, place it into a Job and then resume. Then it will run in the job object. You can register notification via SetInformationJobObject with JobObjectAssociateCompletionPortInformation. Then you will be notified if in the job will be created some child process and if some process inside of job will be ended. So you will be able to send all what you need from the monitoring process. If you debug a program in Visual Studio it uses also job objects to have control under your process and all child processes which you start.
I successfully use the technique in C++ and in C#. So if you will have some problem with implementation I could post you a code example.
I suggest taking option 3. Just do your DLL loading/unloading properly and you're fine. Calling write() should work, I can't explain why it's not in your case. Is it possible that the call fails for a different reason that is unrelated?
Does it work if you call your DLL function manually from the host app?
Why? Just close the socket. If that's the only close in the program, which by your description it must be, that tells the other end that this end is exiting, and you can send the process ID information at the beginning instead of the end. You shouldn't do anything time-consuming or potentially blocking in an exit hook or static destructor.
Where is Winsock being shut down using WSACleanup? You need to make sure that your I/O completes before this happens.
You should be able to work out if this is happening by placing a breakpoint on the Win32 call in Winsock2.dll. Unload of DLLs is displayed in the output in the debug window.

What happens to a named pipe if server crashes?

i know little about pipes but have used one to connect two processes in my code in visual C++. The pipe is working well, but I need to add error handling to the same, hence wanted to know what will happen to a pipe if the server creating it crashed and how do I recognize it from client process?
Also what will happen if the client process tried accessing the same pipe, after the server crash, if no error handling is put in place?
Edit:
What impact will be there on the memory if i keep creating new pipes (say by using system time as pipe name) while the previous was broken because of a server crash? Will these broken pipes be removed from the memory?
IIRC the ReadFile or WriteFile function will return FALSE and GetLastError() will return STATUS_PIPE_DISCONNECTED
I guess this kind of handling is implemented in your code, if not you should better add it ;-)
I just want to throw this out there.
If you want a survivable method for transferring data between two applications, you might consider using MSMQ or even bringing in BizTalk or another message platform.
There are several things to consider:
what happens if the server is rebooted or loses power?
What happens if the server application becomes unresponsive?
What happens if the server application is killed or goes away completely?
What is the appropriate response of a client application in each of the above?
Each of those contexts represent a potential loss of data. If the data loss is unacceptable then named pipes is not the mechanism you should be using. Instead you need to persist the messages somehow.
MSMQ, storing to a database, or even leveraging Biztalk can take care of the survivability of the message itself.
If 1 or 3 happens, then the named pipe goes away and must be recreated by a new instance of your server application. If #2 happens, then the pipe won't go away until someone either reboots the server or kills the server app and starts it again.
Regardless, the client application needs to handle the above issues. They boil down to connection failed problems. Depending on what the client does you might have it move into a wait state and let it ping the server every so often to see if it has come back again.
Without knowing the nature of the data and communication processes involved its hard to recommend a proper approach.

How to check if an application is in waiting

I have two applications running on my machine. One is supposed to hand in the work and other is supposed to do the work. How can I make sure that the first application/process is in wait state. I can verify via the resources its consuming, but that does not guarantee so. What tools should I use?
Your 2 applications shoud communicate. There are a lot of ways to do that:
Send messages through sockets. This way the 2 processes can run on different machines if you use normal network sockets instead of local ones.
If you are using C you can use semaphores with semget/semop/semctl. There should be interfaces for that in other languages.
Named pipes block until there is both a read and a write operation in progress. You can use that for synchronisation.
Signals are also good for this. In C it is called sendmsg/recvmsg.
DBUS can also be used and has bindings for variuos languages.
Update: If you can't modify the processing application then it is harder. You have to rely on some signs that indicate the progress. (I am assuming you processing application reads a file, does some processing then writes the result to an output file.) Do you know the final size the result should be? If so you need to check the size repeatedly (or whenever it changes).
If you don't know the size but you know how the processing works you may be able to use that. For example the processing is done when the output file is closed. You can use strace to see all the system calls including the close. You can replace the close() function with the LD_PRELOAD environment variable (on windows you have to replace dlls). This way you can sort of modify the processing program without actually recompiling or even having access to its source.
you can use named pipes - the first app will read from it but it will be blank and hence it will keep waiting (blocked). The second app will write into it when it wants the first one to continue.
Nothing can guarantee that your application is in waiting state. You have to pass it some work and get back a response. It might be transactions or not - application can confirm that it got the message to process before it starts to process it or after it was processed (successfully or not). If it does not wait, passing a piece of work should fail. Whether when trying to write to a TCP/IP socket or other means, or if timeout occurs. This depends on implementation, what kind of transport you are using and other requirements.
There is actually a way of figuring out if the process (thread) is in blocking state and waiting for data on a socket (or other source), but that means that client should be on the same computer and have access privileges required to do that, but that makes no sense other than debugging, which you can do using any debugger anyway.
Overall, the idea of making sure that application is waiting for data before trying to pass it that data smells bad. Not to mention the racing condition - what if you checked and it was OK, and when you actually tried to send the data, you found out that application is not waiting at that time (even if that is microseconds).

Writing files to USB stick causes file corruption/lockup on surprise removal

I'm writing a background application to copy files in a loop to a USB stick with the "Optimize for quick removal" policy set. However, if the stick is removed part way through this process (specifically in the WriteFile() call below, which returns ERROR FILE NOT FOUND) the application hangs, the drive is then permanently inaccessible from any other application and the PC cannot be shutdown/logged off/restarted etc. All running instances of Windows Explorer also hang as a result.
I have traced the issue to the CloseHandle() call made after the stick is removed and the above error occurs. Its almost as if CloseHandle() is blocking indefinitely in the driver somewhere because the stick is no longer there? Anyway, I have managed to get past this issue by simply skipping the CloseHandle() call if WriteFile() returns ERROR FILE NOT FOUND. However, this leads to another problem where, every so often, a file gets irrecoverably corrupted and the only way to fix it is using chkdsk, or reformat the stick.
Note that this only happens on XP (SP2 and 3), Vista does not seem to suffer from the issue. A snippet of the code follows:
HANDLE hFile = CreateFile(szFile,
GENERIC_WRITE,
FILE_SHARE_WRITE | FILE_SHARE_READ | FILE_SHARE_DELETE,
NULL,
CREATE_ALWAYS,
FILE_FLAG_WRITE_THROUGH,
NULL);
if (hFile != INVALID_HANDLE_VALUE)
{
if (!WriteFile(hFile, pBuffer, dwBufferSize, &dwWritten))
{
int nLastError = GetLastError();
}
// If usb stick is removed during WriteFile(), ERROR_FILE_NOT_FOUND usually results.
// If handle is closed at this point then drive is inaccessible.
// If CloseHandle() is skipped, then file corruption occurs instead
if (nLastError != ERROR_FILE_NOT_FOUND)
{
CloseHandle(hFile);
}
}
I've tried pretty much every combination of flags for CreateFile() all to no avail. Has anybody seen this before or have any good ideas how to avoid either of the two problems occuring. Is what I'm seeing a driver problem which has been silently fixed under vista?
Thanks for any help.
Its almost as if CloseHandle() is blocking indefinitely in the driver somewhere because
the stick is no longer there?
Sounds reasonable. CloseHandle() will ultimately emit a file system IRP and you're not using non-blocking I/O, so that IRP will be synchronous, but it looks like where the actual file system has abruptly disappeared from underneath the file system driver, that IRP is never completed. Which means you're stuffed - the user-mode function call which lead to the file system IRP being issued will never return.
Try using non-blocking I/O - that will prolly get you around this problem, at least from the point of view of not hanging. You will still be experiencing resource loss and the like, since the IRP will still be passed down and almost certainly still won't be coming back up, but at least you won't be blocking on it.
BTW, "optimize for quick removal" is I would say designed to reduce the amount of caching that goes on and perhaps influence the order of writes to the file system to reduce the chance of corruption; I extremely doubt it is intended to preserve the file system in the event of file system departure during a write!
You should not be surprised that this kills the file system.
This seems to be a driver problem.
You must free all handles to a driver, to let it cleanup itself, and windows to unload it. When you don't do that, the driver thinks it is still responsible for the device, even though it changed.
You cannot escape this problem in User-Mode.
Abandoning the handle just transfers the problem to a later stage (e.g. Quitting your program, so that windows tries to close all your abandoned open handles.)
About the hanging problem: You could spawn a separate thread for writing, supervise the writing process from the main thread and abort the writing thread if it takes suspiciously long. In other words: Implement the writing asynchronously and look for a timeout.
Could it be a problem with the filesystem driver rather than the hardware driver? You might find that if you use NTFS the problem goes away.