DeviceIoControl buffer parameters marshalling and alignment - C++

I am writing a Windows CE service and an API library for it, which wraps the DeviceIoControl calls needed to communicate with the service.
Can I be sure that the marshalling of memory buffers passed to DeviceIoControl will not break the alignment of any data? E.g., if I call DeviceIoControl the following way:
int32_t value = 5; // properly aligned at 4 bytes
DeviceIoControl(handle, IOCTL_CODE, &value, sizeof(value), NULL, 0, NULL, NULL);
can I handle it on the service side the following way:
BOOL APIENTRY SRV_IOControl(DWORD data, DWORD code, PBYTE inputBuffer, DWORD inputBufferLength, /*other params*/)
{
    if ((code == IOCTL_CODE) && (inputBufferLength == sizeof(int32_t)))
    {
        // if inputBuffer is not aligned to 4 bytes, then this may produce an
        // unaligned memory access failure on some ARM processors
        int32_t value = *(reinterpret_cast<int32_t*>(inputBuffer));
    }
    //...
}
In Windows CE 6.0 each process uses its own address space, so a memory buffer passed from a client to a service needs to be marshalled by the OS somehow, e.g. through memory aliasing or copying. The (potential) problem could be solved preemptively on the service side by using the UNALIGNED (__unaligned) Visual C++ extension keyword, or by copying buffers to an aligned destination. But since both approaches cost extra work for the developer and the CPU, it would be good to avoid them if the problem is known not to exist at all.
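For reference, the copy-based workaround would look roughly like this (a sketch reusing IOCTL_CODE and the SRV_IOControl signature from above; memcpy itself has no alignment requirements):
#include <cstring>  // memcpy

BOOL APIENTRY SRV_IOControl(DWORD data, DWORD code, PBYTE inputBuffer, DWORD inputBufferLength, /*other params*/)
{
    if ((code == IOCTL_CODE) && (inputBufferLength == sizeof(int32_t)))
    {
        int32_t value;
        memcpy(&value, inputBuffer, sizeof(value)); // safe even if inputBuffer is unaligned
        // ... use value ...

        // Alternative: let the compiler generate alignment-safe access code
        // (Microsoft-specific UNALIGNED macro from winnt.h):
        // int32_t value2 = *(UNALIGNED int32_t*)inputBuffer;
    }
    //...
    return TRUE;
}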

The DeviceIoControl call will not change the alignment of any data that it marshals, so whatever alignment you have at the source is what you'll get in the driver. That's not to say that a caller couldn't still screw things up by passing genuinely unaligned data (e.g. via UNALIGNED on its side) and breaking the driver, but if the caller is doing that, it's on them, and your driver shouldn't be expected to handle unaligned data anyway.

Related

The order of VirtualAlloc calls appears to matter (C++)

I am seeing some odd behavior when using VirtualAlloc. I'm in C++, Visual Studio 2010.
I have two things I want to allocate, and I'm using VirtualAlloc (I have my reasons, irrelevant to the question):
1 - Space to hold a buffer of x86 assembly code
2 - Space to hold the data structure that the x86 code wants
In my code I am doing:
thread_data_t * p_data = (thread_data_t*)VirtualAlloc(NULL, sizeof(thread_data_t), MEM_COMMIT, PAGE_READWRITE);
// set up all the values in the structure
unsigned char* p_function = (unsigned char*)VirtualAlloc(NULL, sizeof(buffer), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(p_function, buffer, sizeof(buffer));
CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)p_function, p_data, 0, NULL);
In DEBUG mode: works fine.
In RELEASE mode: the spun-up thread receives a null as its input data. I verified through debugging that the pointer is correct when I call CreateThread.
If I switch the VirtualAlloc calls around, so that I allocate the function space before the data space, then both DEBUG and RELEASE mode work fine.
Any ideas why? I've verified that all my VS build settings are the same between DEBUG and RELEASE.
After copying assembly code into a memory buffer, you can't just jump straight into that buffer. You need to flush the CPU's instruction cache first, or it will not work reliably. You can use FlushInstructionCache to do this.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms679350%28v=vs.85%29.aspx
It's hard to say exactly why reordering the allocations would fix the issue, but if you copied the instructions into their buffer and then did a lot of work before jumping into the buffer, that would likely improve the odds of "getting away with it," as the CPU caches would have more of an opportunity to get flushed out by other means.
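As a sketch, the copy-then-flush sequence would look roughly like this (reusing buffer and p_data from the question):
unsigned char* p_function = (unsigned char*)VirtualAlloc(NULL, sizeof(buffer), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(p_function, buffer, sizeof(buffer));
// Make sure the CPU won't execute stale instruction-cache contents
// for the freshly written region before jumping into it.
FlushInstructionCache(GetCurrentProcess(), p_function, sizeof(buffer));
CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)p_function, p_data, 0, NULL);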

Is there no GetFilePointer(Ex) Windows API function?

I am trying to debug a program that manipulates a file. For example, I set the file-pointer to offset 4 (using a base of 0), but it seems to be starting at offset 5 instead.
To try to figure out what is happening, I want to put in a line to print out the current file pointer (I’m not using an IDE for this little project, just Notepad2 and the command-line). Unfortunately there does not seem to be a Windows API function to retrieve the current file pointer, only one to set it.
I recall being able to get the current file pointer in Pascal (in DOS), but how can it be determined in C++ on Windows?
Unlike most of the API, which provides both a getter and a setter (in the read/write sense), there is indeed no GetFilePointer or GetFilePointerEx.
However, the value can be retrieved by calling SetFilePointer(Ex) itself: specify an offset of 0 and FILE_CURRENT as the move method, and the current position comes back as the return value of SetFilePointer (or through the output parameter of SetFilePointerEx). That way, the pointer moves 0 bytes from where it is and the function reports where it ended up (I can’t vouch for whether or not the zero-move wastes CPU cycles, but I would think it is optimized not to).
Yes, it is inconsistent and confusing (and redundant and poorly designed), but you can wrap it in your own GetFilePointer(Ex) function:
DWORD GetFilePointer(HANDLE hFile) {
    return SetFilePointer(hFile, 0, NULL, FILE_CURRENT);
}

LONGLONG GetFilePointerEx(HANDLE hFile) {
    LARGE_INTEGER liOfs = {0};
    LARGE_INTEGER liNew = {0};
    SetFilePointerEx(hFile, liOfs, &liNew, FILE_CURRENT);
    return liNew.QuadPart;
}
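Usage is then a single call, e.g. LONGLONG pos = GetFilePointerEx(hFile); to read the current 64-bit offset without moving it.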

Consistency of two C FILE* streams on a single file

I need to implement a simple "spill to disk" layer for a large volume of data coming off a network socket. I was hoping to have two C FILE* streams, one used by a background thread writing to the file, one used by a front-end thread reading it.
The two streams are there so that one thread can be writing at one offset while the other is reading elsewhere, without taking a lock and blocking the other thread.
There will be a paging mechanism, so the reads/writes are at random access locations - not necessarily sequential.
One more caveat: this needs to work on Windows and Linux.
The question: after the fwrite to the first stream has returned, is that written data guaranteed to be immediately visible to an fread on the second stream?
If not, what other options might I consider?
So the POSIX pread/pwrite functions turned out to be what I needed. Here's a version for Win32:
size_t pread64(int fd, void* buf, size_t nbytes, __int64 offset)
{
    OVERLAPPED ovl;
    memset(&ovl, 0, sizeof(ovl));
    // Split the 64-bit offset across the Offset/OffsetHigh pair.
    ovl.Offset = static_cast<DWORD>(offset);
    ovl.OffsetHigh = static_cast<DWORD>(offset >> 32);
    DWORD nBytesRead;
    if (!ReadFile((HANDLE)_get_osfhandle(fd), buf, (DWORD)nbytes, &nBytesRead, &ovl))
        return -1;
    return nBytesRead;
}

size_t pwrite64(int fd, const void* buf, size_t nbytes, __int64 offset)
{
    OVERLAPPED ovl;
    memset(&ovl, 0, sizeof(ovl));
    ovl.Offset = static_cast<DWORD>(offset);
    ovl.OffsetHigh = static_cast<DWORD>(offset >> 32);
    DWORD nBytesWritten;
    if (!WriteFile((HANDLE)_get_osfhandle(fd), buf, (DWORD)nbytes, &nBytesWritten, &ovl))
        return -1;
    return nBytesWritten;
}
(And thank you everyone for input on this - much appreciated).
This sounds like a great fit for memory-mapped I/O. It's guaranteed to be coherent, very fast, and keeping track of multiple pointers is straightforward.
You'll need different functions to set up the memory mapping on different OSes, but the actual I/O is completely portable (using pointer dereference).
Linux: open, mmap
Windows: CreateFileMapping, MapViewOfFile
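For illustration, the Windows side might look roughly like this (a sketch with error handling omitted; hFile is assumed to be a handle to a non-empty file opened for read/write):
// Map the whole file and access it through a plain pointer; the writer
// and the reader share one coherent view of the file contents.
HANDLE hMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);
unsigned char* view = (unsigned char*)MapViewOfFile(hMapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
// ... read and write through 'view' at arbitrary offsets ...
UnmapViewOfFile(view);
CloseHandle(hMapping);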
Two stdio streams on the same file definitely will not give you the semantics you want. If you disabled buffering, it might be reasonable to expect it to work, but I still don't think there are any guarantees. stdio/FILE is really not the right tool for specialized I/O needs like this.
The POSIX way to do what you want is with file descriptors and the pread/pwrite functions. I suspect there's a Windows way (or you could emulate them based on some other underlying Windows primitive) but I don't know it.
Also Ben's suggestion of using memory-mapped IO is a very good one, assuming the file fits in your address space.

Are there equivalents to pread on different platforms?

I am writing a concurrent, persistent message queue in C++, which requires concurrent read access to a file without using memory-mapped I/O. The short story is that several threads will need to read from different offsets of the file.
Originally I had a file object with typical read/write methods, and threads would acquire a mutex to call those methods. However, somewhere I did not acquire the mutex properly, so one thread could move the file offset during another thread's read/write, and that thread would start reading/writing at an incorrect part of the file.
So, the paranoid solution is to have one open file handle per thread. Now I've got a lot of file handles to the same file, which I'm assuming can't be great.
I'd like to use something like pread, which allows the offset to be passed in to the read/write call.
However, that function is only available on Linux, and I need equivalent implementations on Windows, AIX, Solaris and HP-UX. Any suggestions?
On Windows, the ReadFile() function can do it; see the lpOverlapped parameter and the MSDN documentation on synchronous and asynchronous I/O.
With NIO, java.nio.channels.FileChannel has a read(ByteBuffer dst, long position) method, which internally uses pread.
Oh wait, your question is about C++, not Java. Well, I just looked at the JDK source code to see how it does it for Windows, but unfortunately on Windows it isn't atomic: it simply seeks, then reads, then seeks back.
For Unix platforms, the punchline is that pread is standard for any XSI-supporting (X/Open System Interface, apparently) operating system: http://www.opengroup.org/onlinepubs/009695399/functions/pread.html
Based on another answer, the closest I could come up with is this. However, there is a bug: ReadFile will change the file offset, and pread is guaranteed to not change the file offset. There's no real way to fix this, because code can do normal read() and write() concurrently with no lock. Anybody found a call that will not change the offset?
unsigned int FakePRead(int fd, void *to, std::size_t size, uint64_t off) {
    // size_t might be 64-bit. DWORD is always 32.
    const std::size_t kMax = static_cast<std::size_t>(1UL << 31);
    DWORD reading = static_cast<DWORD>(std::min<std::size_t>(kMax, size));
    DWORD ret;
    OVERLAPPED overlapped;
    memset(&overlapped, 0, sizeof(OVERLAPPED));
    overlapped.Offset = static_cast<DWORD>(off);
    overlapped.OffsetHigh = static_cast<DWORD>(off >> 32);
    if (!ReadFile((HANDLE)_get_osfhandle(fd), to, reading, &ret, &overlapped)) {
        // TODO: set errno to something?
        return -1;
    }
    // Note the limit to 1 << 31 above.
    return static_cast<unsigned int>(ret);
}
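On the POSIX side, a portable wrapper can simply forward to pread (which, per the earlier answer, is XSI-standard and therefore available on AIX, Solaris and HP-UX as well as Linux), keeping the Windows emulation behind an #ifdef. A sketch of that dispatch, assuming the FakePRead above:
#ifdef _WIN32
// no pread on Windows; use the FakePRead emulation above
#else
#include <unistd.h>  // pread
#endif

// Hypothetical portable wrapper: read nbytes at the given offset without
// relying on (or, on POSIX, disturbing) the descriptor's current position.
long long PortablePRead(int fd, void* buf, std::size_t nbytes, long long offset)
{
#ifdef _WIN32
    return FakePRead(fd, buf, nbytes, offset);
#else
    return pread(fd, buf, nbytes, offset);
#endif
}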

Why would waveOutWrite() cause an exception in the debug heap?

While researching this issue, I found multiple mentions of the following scenario online, invariably as unanswered questions on programming forums. I hope that posting this here will at least serve to document my findings.
First, the symptom: While running pretty standard code that uses waveOutWrite() to output PCM audio, I sometimes get this when running under the debugger:
ntdll.dll!_DbgBreakPoint#0()
ntdll.dll!_RtlpBreakPointHeap#4() + 0x28 bytes
ntdll.dll!_RtlpValidateHeapEntry#12() + 0x113 bytes
ntdll.dll!_RtlDebugGetUserInfoHeap#20() + 0x96 bytes
ntdll.dll!_RtlGetUserInfoHeap#20() + 0x32743 bytes
kernel32.dll!_GlobalHandle#4() + 0x3a bytes
wdmaud.drv!_waveCompleteHeader#4() + 0x40 bytes
wdmaud.drv!_waveThread#4() + 0x9c bytes
kernel32.dll!_BaseThreadStart#8() + 0x37 bytes
While the obvious suspect would be a heap corruption somewhere else in the code, I found out that that's not the case. Furthermore, I was able to reproduce this problem using the following code (this is part of a dialog-based MFC application):
void CwaveoutDlg::OnBnClickedButton1()
{
    WAVEFORMATEX wfx;
    wfx.nSamplesPerSec = 44100;  /* sample rate */
    wfx.wBitsPerSample = 16;     /* sample size */
    wfx.nChannels = 2;
    wfx.cbSize = 0;              /* size of _extra_ info */
    wfx.wFormatTag = WAVE_FORMAT_PCM;
    wfx.nBlockAlign = (wfx.wBitsPerSample >> 3) * wfx.nChannels;
    wfx.nAvgBytesPerSec = wfx.nBlockAlign * wfx.nSamplesPerSec;

    waveOutOpen(&hWaveOut,
                WAVE_MAPPER,
                &wfx,
                (DWORD_PTR)m_hWnd,
                0,
                CALLBACK_WINDOW);

    ZeroMemory(&header, sizeof(header));
    header.dwBufferLength = 4608;
    header.lpData = (LPSTR)GlobalLock(GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE | GMEM_ZEROINIT, 4608));
    waveOutPrepareHeader(hWaveOut, &header, sizeof(header));
    waveOutWrite(hWaveOut, &header, sizeof(header));
}

afx_msg LRESULT CwaveoutDlg::OnWOMDone(WPARAM wParam, LPARAM lParam)
{
    HWAVEOUT dev = (HWAVEOUT)wParam;
    WAVEHDR *hdr = (WAVEHDR*)lParam;
    waveOutUnprepareHeader(dev, hdr, sizeof(WAVEHDR));
    GlobalFree(GlobalHandle(hdr->lpData));
    ZeroMemory(hdr, sizeof(*hdr));
    hdr->dwBufferLength = 4608;
    hdr->lpData = (LPSTR)GlobalLock(GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE | GMEM_ZEROINIT, 4608));
    waveOutPrepareHeader(hWaveOut, hdr, sizeof(WAVEHDR));
    waveOutWrite(hWaveOut, hdr, sizeof(WAVEHDR));
    return 0;
}
Before anyone comments on this, yes - the sample code plays back uninitialized memory. Don't try this with your speakers turned all the way up.
Some debugging revealed the following information: waveOutPrepareHeader() populates header.reserved with a pointer to what appears to be a structure containing at least two pointers as its first two members. The first pointer is set to NULL. After calling waveOutWrite(), this pointer is set to a pointer allocated on the global heap. In pseudo code, that would look something like this:
struct Undocumented { void *p1, *p2; };  /* This might have more members */

MMRESULT waveOutPrepareHeader(handle, LPWAVEHDR hdr, ...) {
    hdr->reserved = (Undocumented*)calloc(1, sizeof(Undocumented));
    /* Do more stuff... */
}

MMRESULT waveOutWrite(handle, LPWAVEHDR hdr, ...) {
    /* The following assignment fails rarely, causing the problem: */
    hdr->reserved->p1 = malloc( /* chunk of private data */ );
    /* Probably more code to initiate playback */
}
Normally, the header is returned to the application by waveCompleteHeader(), a function internal to wdmaud.drv. waveCompleteHeader() tries to deallocate the pointer allocated by waveOutWrite() by calling GlobalHandle()/GlobalUnlock() and friends. Sometimes, GlobalHandle() bombs, as shown above.
Now, the reason that GlobalHandle() bombs is not due to a heap corruption, as I suspected at first - it's because waveOutWrite() returned without setting the first pointer in the internal structure to a valid pointer. I suspect that it frees the memory pointed to by that pointer before returning, but I haven't disassembled it yet.
This only appears to happen when the wave playback system is low on buffers, which is why I'm using a single header to reproduce this.
At this point I have a pretty good case against this being a bug in my application - after all, my application is not even running. Has anyone seen this before?
I'm seeing this on Windows XP SP2. The audio card is from SigmaTel, and the driver version is 5.10.0.4995.
Notes:
To prevent confusion in the future, I'd like to point out that the answer suggesting that the problem lies with the use of malloc()/free() to manage the buffers being played is simply wrong. You'll note that I changed the code above to reflect the suggestion, to prevent more people from making the same mistake - it doesn't make a difference. The buffer being freed by waveCompleteHeader() is not the one containing the PCM data; the responsibility for freeing the PCM buffer lies with the application, and there's no requirement that it be allocated in any specific way.
Also, I make sure that none of the waveOut API calls I use fail.
I'm currently assuming that this is either a bug in Windows, or in the audio driver. Dissenting opinions are always welcome.
Now, the reason that GlobalHandle() bombs is not due to a heap corruption, as I suspected at first - it's because waveOutWrite() returned without setting the first pointer in the internal structure to a valid pointer. I suspect that it frees the memory pointed to by that pointer before returning, but I haven't disassembled it yet.
I can reproduce this with your code on my system. I see something similar to what Johannes reported. After the call to waveOutWrite(), hdr->reserved normally holds a pointer to allocated memory (which appears to contain the wave out device name in Unicode, among other things).
But occasionally, after returning from waveOutWrite(), the least significant byte of the pointer in hdr->reserved has been set to 0. The rest of the bytes in hdr->reserved are OK, and the block of memory that it normally points to is still allocated and uncorrupted.
It probably is being clobbered by another thread - I can catch the change with a conditional breakpoint immediately after the call to waveOutWrite(). And the system debug breakpoint is occurring in another thread, not the message handler.
However, I can't cause the system debug breakpoint to occur if I use a callback function instead of the Windows message pump (fdwOpen = CALLBACK_FUNCTION in waveOutOpen()).
When I do it this way, my OnWOMDone handler is called by a different thread - possibly the one that's otherwise responsible for the corruption.
So I think there is a bug, either in Windows or in the driver, but I think you can work around it by handling WOM_DONE with a callback function instead of the Windows message pump.
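A sketch of that workaround (hypothetical callback name; note that you must not call other waveOut functions from inside the callback, so it should only record the completion and let another thread recycle the buffer):
// Runs on a thread owned by the audio subsystem, not the UI thread.
static void CALLBACK WaveOutProc(HWAVEOUT hwo, UINT uMsg, DWORD_PTR dwInstance,
                                 DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg == WOM_DONE)
    {
        WAVEHDR* hdr = (WAVEHDR*)dwParam1;
        // e.g. SetEvent() or push hdr onto a queue that another thread drains
    }
}

// Open with a callback function instead of a window message:
waveOutOpen(&hWaveOut, WAVE_MAPPER, &wfx, (DWORD_PTR)WaveOutProc, 0, CALLBACK_FUNCTION);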
You're not alone with this issue:
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=100589
I'm seeing the same problem and have done some analysis myself:
waveOutWrite() allocates (via GlobalAlloc) a heap area of 354 bytes and correctly stores a pointer to it in the data area pointed to by header.reserved.
But when this heap area is to be freed again (in waveCompleteHeader(), according to your analysis; I don't have the symbols for wdmaud.drv myself), the least significant byte of the pointer has been set to zero, thus invalidating the pointer (while the heap is not corrupted yet). In other words, what happens is something like:
*(BYTE *)&header.reserved = 0;
So I disagree with your statements in one point: waveOutWrite() stores a valid pointer first; the pointer only becomes corrupted later from another thread.
Probably that's the same thread (mxdmessage) that later tries to free this heap area, but I did not yet find the point where the zero byte is stored.
This does not happen very often, and the same heap area (same address) has successfully been allocated and deallocated before.
I'm quite convinced that this is a bug somewhere in the system code.
Not sure about this particular problem, but have you considered using a higher-level, cross-platform audio library? There are a lot of quirks with Windows audio programming, and these libraries can save you a lot of headaches.
Examples include PortAudio, RtAudio, and SDL.
The first thing that I'd do would be to check the return values from the waveOutX functions. If any of them fail - which isn't unreasonable given the scenario you describe - and you carry on regardless then it isn't surprising that things start to go wrong. My guess would be that waveOutWrite is returning MMSYSERR_NOMEM at some point.
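For example (a minimal sketch using the calls from the question):
MMRESULT res = waveOutWrite(hWaveOut, &header, sizeof(header));
if (res != MMSYSERR_NOERROR)
{
    // e.g. MMSYSERR_NOMEM or WAVERR_UNPREPARED - don't carry on as if
    // the buffer had actually been queued for playback.
    TRACE(_T("waveOutWrite failed: %u\n"), res);
    return;
}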
Use Application Verifier to figure out what's going on; if you do something suspicious, it will catch it much earlier.
It may be helpful to look at the source code for Wine, although it's possible that Wine has fixed whatever bug there is, and it's also possible Wine has other bugs in it. The relevant files are dlls/winmm/winmm.c, dlls/winmm/lolvldrv.c, and possibly others. Good luck!
What about the fact that you are not allowed to call winmm functions from within the callback?
MSDN does not mention such a restriction for window messages, but the window-message mechanism behaves much like a callback function. Possibly it's implemented internally as a callback from the driver, and that callback does a SendMessage.
Internally, waveout has to maintain a linked list of headers that were written using waveOutWrite, so my guess is that:
hdr->reserved = (Undocumented*)calloc(1, sizeof(Undocumented));
sets the previous/next pointers of that linked list, or something like this. If you write more buffers and then check the pointers, and any of them point to one another, my guess is most likely correct.
Multiple sources on the web mention that you don't need to unprepare and re-prepare the same headers repeatedly. If you comment out the prepare/unprepare header calls in the original example, it appears to work fine without any problems.
I solved the problem by polling the sound playback with delays:
WAVEHDR header = { buffer, sizeof(buffer), 0, 0, 0, 0, 0, 0 };
waveOutPrepareHeader(hWaveOut, &header, sizeof(WAVEHDR));
waveOutWrite(hWaveOut, &header, sizeof(WAVEHDR));
/*
 * Wait a while for the block to play, then start trying to unprepare
 * the header. This will fail until the block has finished playing.
 */
while (waveOutUnprepareHeader(hWaveOut, &header, sizeof(WAVEHDR)) == WAVERR_STILLPLAYING)
    Sleep(100);
waveOutClose(hWaveOut);
Playing Audio in Windows using waveOut Interface