The order of VirtualAlloc calls appears to matter (C++)

I am seeing some odd behavior when using VirtualAlloc. I'm in C++, Visual Studio 2010.
I have two things I want to allocate, and I'm using VirtualAlloc for both (I have my reasons; they're irrelevant to the question):
1 - Space to hold a buffer of x86 assembly code
2 - Space to hold the data structure that the x86 code wants
In my code I am doing:
thread_data_t * p_data = (thread_data_t*)VirtualAlloc(NULL, sizeof(thread_data_t), MEM_COMMIT, PAGE_READWRITE);
//set up all the values in the structure
unsigned char* p_function = (unsigned char*)VirtualAlloc(NULL, sizeof(buffer), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(p_function, buffer, sizeof(buffer));
CreateThread(0, 0, (LPTHREAD_START_ROUTINE)p_function, p_data, 0, NULL);
In DEBUG mode: works fine.
In RELEASE mode: the spun-up thread receives a NULL as its input data. I verified through debugging that the pointer is correct when I call CreateThread.
If I switch the VirtualAlloc calls around, so that I allocate the function space before the data space, then both DEBUG and RELEASE mode work fine.
Any ideas why? I've verified that all my VS build settings are the same between DEBUG and RELEASE.

After copying assembly code into a memory buffer, you can't just jump straight into that buffer. You need to flush the CPU's instruction cache first, or the processor may execute stale instructions. You can use FlushInstructionCache to do this.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms679350%28v=vs.85%29.aspx
It's hard to say exactly why reordering the allocations would fix the issue, but if you copied the instructions into their buffer and then did a lot of work before jumping into the buffer, that would likely improve the odds of "getting away with it," as the CPU caches would have more of an opportunity to get flushed out by other means.
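A minimal sketch of that fix, reusing the variable names from the question (buffer and p_data as above):
unsigned char* p_function = (unsigned char*)VirtualAlloc(
    NULL, sizeof(buffer), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(p_function, buffer, sizeof(buffer));
// Flush exactly the range we just wrote before any thread jumps into it.
if (!FlushInstructionCache(GetCurrentProcess(), p_function, sizeof(buffer)))
{
    // handle the failure; GetLastError() has the details
}
CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)p_function, p_data, 0, NULL);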

Related

clCreateBuffer() never fails

I intend to use ALL the available on-GPU memory for my algorithm, so I retrieve its amount with:
clGetDeviceInfo( ..., CL_DEVICE_GLOBAL_MEM_SIZE, ... );
which reports 536543232 bytes, and then allocate that on the GPU with:
clCreateBuffer( gpuContext, CL_MEM_READ_WRITE, 536543232, NULL, & errcode_ret );
I wondered why this worked, and whether it would fail if I tried to allocate more memory. I tried 100 GB and it still worked!
clCreateBuffer( gpuContext, CL_MEM_READ_WRITE, 100000000000, NULL, & errcode_ret );
So the question is: why does it succeed no matter how much memory is requested?
It may happen if the OpenCL platform uses lazy memory allocation (almost every platform does). I suspect some OpenCL platforms check whether the requested size can actually be allocated right in clCreateBuffer, and yours doesn't. You will probably get an error from the first OpenCL function that actually uses the buffer, such as clEnqueueWriteBuffer(). What is your OpenCL platform?
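If you want to check the lazy-allocation theory, one way is to force the buffer to be backed immediately after creating it. A sketch, assuming an existing gpuContext and a command queue named queue (both placeholders):
size_t size = 100000000000ULL; // the deliberately absurd size from above
cl_int errcode_ret;
cl_mem buf = clCreateBuffer(gpuContext, CL_MEM_READ_WRITE, size, NULL, &errcode_ret);
// Creation can "succeed" even though nothing has been committed yet.
char probe = 0;
cl_int err = clEnqueueWriteBuffer(queue, buf, CL_TRUE, size - 1, 1,
                                  &probe, 0, NULL, NULL);
// On a lazily allocating platform, this is where a too-large buffer
// should fail, e.g. with CL_MEM_OBJECT_ALLOCATION_FAILURE.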

WinAPI: Is it needed to call FlushInstructionCache on an executable memory-mapped file?

I have written a short program that reads a Windows .obj file, finds the .text section, and runs the code in it. To do this I make the following Windows API calls (full code [gist.github.com], for those interested):
HANDLE FileHandle = CreateFile("lib.obj",
                               GENERIC_READ | GENERIC_EXECUTE,
                               FILE_SHARE_READ, 0,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
HANDLE MappingHandle = CreateFileMapping(FileHandle, 0, PAGE_EXECUTE_READ, 0, 0, 0);
void *Address = MapViewOfFile(MappingHandle, FILE_MAP_EXECUTE | FILE_MAP_READ,
                              0, 0, 0);
I then find the .text section in the file, cast the pointer to its code to a C++ function pointer, and simply call the function. This actually appeared to work for me.
Have I made a mistake by not calling FlushInstructionCache on the range of virtual memory mapped to the file?
I ask this because I was recently reading the VirtualAlloc documentation and it notes at the bottom:
When creating a region that will be executable, the calling program bears responsibility for ensuring cache coherency via an appropriate call to FlushInstructionCache once the code has been set in place. Otherwise attempts to execute code out of the newly executable region may produce unpredictable results.
Is it possible that my code will cause the CPU to execute old instructions in the instruction cache?
There is no such note on the MapViewOfFile or CreateFileMapping pages.
If you only load the file content into memory using MapViewOfFile, it should be fine without it.
If you MODIFY the content in memory, you need to flush the instruction cache before executing the code, as it MAY exist in the cache in its unmodified form, and MAY then be executed without your modifications.
I use the word MAY because of two things:
it depends on the processor architecture whether the processor detects writes to the memory it is about to execute [some processors don't even have hardware to detect writes to data that is in the instruction cache, because such writes are rare enough not to be worth the silicon].
it's hard to predict what may be in a cache - processors have all manner of "clever" ways to prefetch and in general "fill" caches.
Obviously, memory fresh from VirtualAlloc has zero chance of already containing the code you want, so the note appears on that page because you'd ALWAYS write to such a region before executing it.
Modifications include "fixing up absolute addresses," for example (something you'd have to do if you want to complete a project that loads something complex and executes it), or, if you write a debugger, setting a breakpoint by replacing an instruction with the INT 3 instruction on x86.
A second case of "modification" is if you unload the file and load a different file (perhaps the "same" file, but rebuilt), in which case the previously executed code may still be in the cache, and you get a mysterious "why didn't my changes do what I expect?" moment.
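To illustrate the debugger case, a hedged sketch of patching a single instruction in an already-writable executable region (target is a hypothetical pointer into such a region):
unsigned char saved = *target; // keep the original opcode to restore later
*target = 0xCC;                // x86 INT 3 breakpoint
// Without this, the CPU MAY still execute the old instruction from cache.
FlushInstructionCache(GetCurrentProcess(), target, 1);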

DeviceIoControl buffer parameters marshalling and alignment

I am writing a Windows CE service and an API library for it, which wraps the DeviceIoControl calls needed to communicate with the service.
Can I be sure, that marshalling of memory buffers passed to the DeviceIoControl function will not break any memory aligned data? E.g., if I call the DeviceIoControl the following way:
int32_t value = 5; // properly aligned at 4 bytes
DeviceIoControl(handle, IOCTL_CODE, &value, sizeof(value), NULL, 0, NULL, NULL);
can I handle it on the service side the following way:
BOOL APIENTRY SRV_IOControl(DWORD data, DWORD code, PBYTE inputBuffer,
                            DWORD inputBufferLength /*, other params */)
{
    if ((code == IOCTL_CODE) && (inputBufferLength == sizeof(int32_t)))
    {
        // if inputBuffer is not aligned to 4 bytes, this may produce an
        // unaligned memory access fault on some ARM processors
        int32_t value = *(reinterpret_cast<int32_t*>(inputBuffer));
    }
    //...
}
In Windows CE 6.0 each process uses its own address space, so a memory buffer passed from a client to a service needs to be marshalled by the OS somehow, e.g. through memory aliasing or copying. The (potential) problem can be solved preemptively on the service side by using the UNALIGNED (__unaligned) Visual C++ extension keyword, or by copying buffers to an aligned destination. But since all of this means more work for the developer and the CPU, it's better to avoid it if it's known that the problem does not exist at all.
The DeviceIoControl call will not change the alignment of any data that it marshals, so whatever alignment you have at the source is what you'll get in the driver. That's not to say a caller couldn't still screw things up, say by building the input with UNALIGNED pointers, and the driver would then break, but if the caller is doing that, it's on them; your driver shouldn't be expected to handle unaligned data anyway.
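If you'd rather not rely on that, a defensive sketch of the handler using memcpy, which tolerates any source alignment (same hypothetical IOCTL_CODE as above):
BOOL APIENTRY SRV_IOControl(DWORD data, DWORD code, PBYTE inputBuffer,
                            DWORD inputBufferLength /*, other params */)
{
    if ((code == IOCTL_CODE) && (inputBufferLength == sizeof(int32_t)))
    {
        int32_t value;
        // memcpy imposes no alignment requirement on its source, so this
        // cannot fault even on strict-alignment ARM cores.
        memcpy(&value, inputBuffer, sizeof(value));
        // ... use value ...
    }
    // ...
    return TRUE;
}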

Is it possible to protect a region of memory from WinAPI?

Having read this interesting article outlining a technique for debugging heap corruption, I started wondering how I could tweak it for my own needs. The basic idea is to provide a custom malloc() that allocates whole pages of memory, then enables memory protection bits on those pages, so that the program crashes when they get written to, and the offending write instruction can be caught in the act. The sample code is C under Linux (mprotect() is used to enable the protection), and I'm curious how to apply this to native C++ and Windows. VirtualAlloc() and/or VirtualProtect() look promising, but I'm not sure what a usage scenario would look like.
Fred *p = new Fred[100];
ProtectBuffer(p);
p[10] = Fred(); // like this to crash please
I am aware of the existence of specialized tools for debugging memory corruption in Windows, but I'm still curious if it would be possible to do it "manually" using this approach.
EDIT: Also, is this even a good idea under Windows, or just an entertaining intellectual exercise?
Yes, you can use VirtualAlloc and VirtualProtect to set up sections of memory that are protected from read/write operations.
You would have to re-implement operator new and operator delete (and their [] relatives), such that your memory allocations are controlled by your code.
And bear in mind that it would only work on a per-page basis, and you would be using (at least) three pages' worth of virtual memory per allocation - not a huge problem on a 64-bit system, but it may cause problems if you have many allocations on a 32-bit system.
Roughly what you need to do (you should actually query the page size from Windows, e.g. with GetSystemInfo - I'm too lazy, so I'll use 4096 and 4095 to represent pagesize and pagesize-1 - and you will also need to do more error checking than this code does!!!):
void *operator new(size_t size)
{
    // Round size up to a whole number of pages, plus 2 guard pages extra.
    size_t bigsize = (size + 2*4096 + 4095) & ~4095;
    // Reserve the whole range (guard pages included) with no access allowed.
    void *addr = VirtualAlloc(NULL, bigsize, MEM_RESERVE, PAGE_NOACCESS);
    // Skip the leading guard page, then commit just the usable part.
    addr = reinterpret_cast<void *>(reinterpret_cast<char *>(addr) + 4096);
    void *new_addr = VirtualAlloc(addr, size, MEM_COMMIT, PAGE_READWRITE);
    return new_addr;
}
void operator delete(void *ptr)
{
    char *tmp = reinterpret_cast<char *>(ptr) - 4096;
    VirtualFree(reinterpret_cast<void*>(tmp), 0, MEM_RELEASE);
}
Something along those lines, as I said - I haven't tried compiling this code, as I only have a Windows VM, and I can't be bothered to download a compiler and see if it actually compiles. [I know the principle works, as we did something similar where I worked a few years back].
This is what Guard Pages are for (see this MSDN tutorial): they raise a special exception when the page is first accessed, letting you do more than just crash on the first invalid page access (and catch bad reads/writes, as opposed to only NULL pointers etc).
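A minimal sketch of the guard-page approach with SEH, assuming a 4096-byte page:
void *page = VirtualAlloc(NULL, 4096, MEM_RESERVE | MEM_COMMIT,
                          PAGE_READWRITE | PAGE_GUARD);
__try
{
    *(int *)page = 42; // the first touch raises the guard-page exception
}
__except (GetExceptionCode() == EXCEPTION_GUARD_PAGE
              ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
{
    // First access caught. Note PAGE_GUARD is one-shot: the system clears
    // it when it fires, so re-arm with VirtualProtect to catch the next one.
}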

Why would waveOutWrite() cause an exception in the debug heap?

While researching this issue, I found multiple mentions of the following scenario online, invariably as unanswered questions on programming forums. I hope that posting this here will at least serve to document my findings.
First, the symptom: While running pretty standard code that uses waveOutWrite() to output PCM audio, I sometimes get this when running under the debugger:
ntdll.dll!_DbgBreakPoint#0()
ntdll.dll!_RtlpBreakPointHeap#4() + 0x28 bytes
ntdll.dll!_RtlpValidateHeapEntry#12() + 0x113 bytes
ntdll.dll!_RtlDebugGetUserInfoHeap#20() + 0x96 bytes
ntdll.dll!_RtlGetUserInfoHeap#20() + 0x32743 bytes
kernel32.dll!_GlobalHandle#4() + 0x3a bytes
wdmaud.drv!_waveCompleteHeader#4() + 0x40 bytes
wdmaud.drv!_waveThread#4() + 0x9c bytes
kernel32.dll!_BaseThreadStart#8() + 0x37 bytes
While the obvious suspect would be heap corruption somewhere else in the code, I found out that that's not the case. Furthermore, I was able to reproduce this problem using the following code (part of a dialog-based MFC application):
void CwaveoutDlg::OnBnClickedButton1()
{
    WAVEFORMATEX wfx;
    wfx.nSamplesPerSec = 44100;       /* sample rate */
    wfx.wBitsPerSample = 16;          /* sample size */
    wfx.nChannels = 2;
    wfx.cbSize = 0;                   /* size of _extra_ info */
    wfx.wFormatTag = WAVE_FORMAT_PCM;
    wfx.nBlockAlign = (wfx.wBitsPerSample >> 3) * wfx.nChannels;
    wfx.nAvgBytesPerSec = wfx.nBlockAlign * wfx.nSamplesPerSec;
    waveOutOpen(&hWaveOut,
                WAVE_MAPPER,
                &wfx,
                (DWORD_PTR)m_hWnd,
                0,
                CALLBACK_WINDOW);
    ZeroMemory(&header, sizeof(header));
    header.dwBufferLength = 4608;
    header.lpData = (LPSTR)GlobalLock(GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE | GMEM_ZEROINIT, 4608));
    waveOutPrepareHeader(hWaveOut, &header, sizeof(header));
    waveOutWrite(hWaveOut, &header, sizeof(header));
}

afx_msg LRESULT CwaveoutDlg::OnWOMDone(WPARAM wParam, LPARAM lParam)
{
    HWAVEOUT dev = (HWAVEOUT)wParam;
    WAVEHDR *hdr = (WAVEHDR*)lParam;
    waveOutUnprepareHeader(dev, hdr, sizeof(WAVEHDR));
    GlobalFree(GlobalHandle(hdr->lpData));
    ZeroMemory(hdr, sizeof(*hdr));
    hdr->dwBufferLength = 4608;
    hdr->lpData = (LPSTR)GlobalLock(GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE | GMEM_ZEROINIT, 4608));
    waveOutPrepareHeader(hWaveOut, hdr, sizeof(WAVEHDR));
    waveOutWrite(hWaveOut, hdr, sizeof(WAVEHDR));
    return 0;
}
Before anyone comments on this, yes - the sample code plays back uninitialized memory. Don't try this with your speakers turned all the way up.
Some debugging revealed the following information: waveOutPrepareHeader() populates header.reserved with a pointer to what appears to be a structure containing at least two pointers as its first two members. The first pointer is set to NULL. After calling waveOutWrite(), this pointer is set to a pointer allocated on the global heap. In pseudocode, that would look something like this:
struct Undocumented { void *p1, *p2; }; /* This might have more members */

MMRESULT waveOutPrepareHeader( handle, LPWAVEHDR hdr, ...) {
    hdr->reserved = (Undocumented*)calloc(1, sizeof(Undocumented));
    /* Do more stuff... */
}

MMRESULT waveOutWrite( handle, LPWAVEHDR hdr, ...) {
    /* The following assignment fails rarely, causing the problem: */
    hdr->reserved->p1 = malloc( /* chunk of private data */ );
    /* Probably more code to initiate playback */
}
Normally, the header is returned to the application by waveCompleteHeader(), a function internal to wdmaud.dll. waveCompleteHeader() tries to deallocate the pointer allocated by waveOutWrite() by calling GlobalHandle()/GlobalUnlock() and friends. Sometimes, GlobalHandle() bombs, as shown above.
Now, the reason that GlobalHandle() bombs is not due to a heap corruption, as I suspected at first - it's because waveOutWrite() returned without setting the first pointer in the internal structure to a valid pointer. I suspect that it frees the memory pointed to by that pointer before returning, but I haven't disassembled it yet.
This only appears to happen when the wave playback system is low on buffers, which is why I'm using a single header to reproduce this.
At this point I have a pretty good case against this being a bug in my application - after all, my application is not even running. Has anyone seen this before?
I'm seeing this on Windows XP SP2. The audio card is from SigmaTel, and the driver version is 5.10.0.4995.
Notes:
To prevent confusion in the future, I'd like to point out that the answer suggesting that the problem lies with the use of malloc()/free() to manage the buffers being played is simply wrong. You'll note that I changed the code above to reflect the suggestion, to prevent more people from making the same mistake - it doesn't make a difference. The buffer being freed by waveCompleteHeader() is not the one containing the PCM data, the responsibility to free the PCM buffer lies with the application, and there's no requirement that it be allocated in any specific way.
Also, I make sure that none of the waveOut API calls I use fail.
I'm currently assuming that this is either a bug in Windows, or in the audio driver. Dissenting opinions are always welcome.
Now, the reason that GlobalHandle() bombs is not due to a heap corruption, as I suspected at first - it's because waveOutWrite() returned without setting the first pointer in the internal structure to a valid pointer. I suspect that it frees the memory pointed to by that pointer before returning, but I haven't disassembled it yet.
I can reproduce this with your code on my system. I see something similar to what Johannes reported. After the call to waveOutWrite(), hdr->reserved normally holds a pointer to allocated memory (which appears to contain the wave-out device name in Unicode, among other things).
But occasionally, after waveOutWrite() returns, the least significant byte of the pointer in hdr->reserved has been set to 0. The rest of the bytes in hdr->reserved are OK, and the block of memory it normally points to is still allocated and uncorrupted.
It probably is being clobbered by another thread - I can catch the change with a conditional breakpoint immediately after the call to waveOutWrite(). And the system debug breakpoint occurs in another thread, not in the message handler.
However, I can't get the system debug breakpoint to occur if I use a callback function instead of the Windows message pump (fdwOpen = CALLBACK_FUNCTION in waveOutOpen()).
When I do it this way, my OnWOMDone handler is called by a different thread - possibly the one that's otherwise responsible for the corruption.
So I think there is a bug, either in Windows or in the driver, but I think you can work around it by handling WOM_DONE with a callback function instead of the Windows message pump.
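A sketch of that workaround. Per the restriction mentioned elsewhere in this thread, the callback must not call other waveOut functions, so here it only signals an event (hDoneEvent is a hypothetical event created elsewhere):
void CALLBACK WaveOutProc(HWAVEOUT hwo, UINT uMsg, DWORD_PTR dwInstance,
                          DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg == WOM_DONE)
        SetEvent((HANDLE)dwInstance); // unprepare/refill on another thread
}

// ...
waveOutOpen(&hWaveOut, WAVE_MAPPER, &wfx,
            (DWORD_PTR)WaveOutProc, (DWORD_PTR)hDoneEvent,
            CALLBACK_FUNCTION);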
You're not alone with this issue:
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=100589
I'm seeing the same problem and have done some analysis myself:
waveOutWrite() allocates (via GlobalAlloc) a pointer to a heap area of 354 bytes and correctly stores it in the data area pointed to by header.reserved.
But when this heap area is to be freed again (in waveCompleteHeader(), according to your analysis; I don't have the symbols for wdmaud.drv myself), the least significant byte of the pointer has been set to zero, thus invalidating the pointer (while the heap is not corrupted yet). In other words, what happens is something like:
*(BYTE *)&header.reserved = 0;
So I disagree with your statement on one point: waveOutWrite() stores a valid pointer first; the pointer only becomes corrupted later, from another thread.
Probably that's the same thread (mxdmessage) that later tries to free this heap area, but I have not yet found the point where the zero byte is stored.
This does not happen very often, and the same heap area (same address) has successfully been allocated and deallocated before.
I'm quite convinced that this is a bug somewhere in the system code.
Not sure about this particular problem, but have you considered using a higher-level, cross-platform audio library? There are a lot of quirks with Windows audio programming, and these libraries can save you a lot of headaches.
Examples include PortAudio, RtAudio, and SDL.
The first thing that I'd do would be to check the return values from the waveOutX functions. If any of them fail - which isn't unreasonable given the scenario you describe - and you carry on regardless, then it isn't surprising that things start to go wrong. My guess would be that waveOutWrite is returning MMSYSERR_NOMEM at some point.
Use Application Verifier to figure out what's going on, if you do something suspicious, it will catch it much earlier.
It may be helpful to look at the source code for Wine, although it's possible that Wine has fixed whatever bug there is, and it's also possible Wine has other bugs in it. The relevant files are dlls/winmm/winmm.c, dlls/winmm/lolvldrv.c, and possibly others. Good luck!
What about the fact that you are not allowed to call winmm functions from within a callback?
MSDN does not mention such restrictions for window messages, but the usage of window messages is similar to a callback function. Possibly it's implemented internally as a callback from the driver, and that callback does a SendMessage.
Internally, waveout has to maintain a linked list of headers that were written via waveOutWrite, so I guess that:
hdr->reserved = (Undocumented*)calloc(1, sizeof(Undocumented));
sets the previous/next pointers of the linked list, or something like that. If you write more buffers, then check these pointers; if any of them point to one another, my guess is most likely correct.
Multiple sources on the web mention that you don't need to unprepare and re-prepare the same headers repeatedly. If you comment out the prepare/unprepare calls in the original example, it appears to work fine without any problems.
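A sketch of that "prepare once" pattern; the hDoneEvent event and the morePcmData condition are hypothetical stand-ins for however you learn that a block finished:
waveOutPrepareHeader(hWaveOut, &header, sizeof(WAVEHDR));
while (morePcmData)
{
    // refill header.lpData with the next block here
    waveOutWrite(hWaveOut, &header, sizeof(WAVEHDR));
    WaitForSingleObject(hDoneEvent, INFINITE); // signalled on WOM_DONE
}
waveOutUnprepareHeader(hWaveOut, &header, sizeof(WAVEHDR)); // once, at shutdown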
I solved the problem by polling the playback status with a delay:
WAVEHDR header = { buffer, sizeof(buffer), 0, 0, 0, 0, 0, 0 };
waveOutPrepareHeader(hWaveOut, &header, sizeof(WAVEHDR));
waveOutWrite(hWaveOut, &header, sizeof(WAVEHDR));
/*
 * wait a while for the block to play, then start trying to
 * unprepare the header. this will fail until the block has
 * finished playing.
 */
while (waveOutUnprepareHeader(hWaveOut, &header, sizeof(WAVEHDR)) == WAVERR_STILLPLAYING)
    Sleep(100);
waveOutClose(hWaveOut);
Playing Audio in Windows using waveOut Interface