Embedded RTOS and using malloc/free - c++

I am currently evaluating embOS from SEGGER running on a Cortex-M4F. The part has 128 kilobytes of internal RAM and 2 megabytes of external RAM, so I know I have plenty of memory.
My program uses some dynamic allocations (yes, I am aware that is not recommended on embedded systems).
When starting my task, I try to call malloc/OS_malloc, where OS_malloc is the thread-safe version provided by embOS. In both cases, the allocation fails and returns a NULL pointer.
When I do the same malloc/OS_malloc before the OS starts, it works correctly:
//Malloc here does not fail
OS_IncDI();    /* Initially disable interrupts */
//Malloc here does not fail
OS_InitKern(); /* Initialize OS */
//Malloc here does fail!!
OS_InitHW();   /* Initialize hardware for OS */
OS_CREATETASK(&TCBHP, "My Task", HPTask, 50, StackHP); /* <-- And of course malloc fails inside the task as well */
OS_Start();
I also tried uC/OS from Micrium, and I see the same behavior. Any ideas why this happens?

I think I am on my way to fixing the problem.
It seems that setting this in the linker script:
_Min_Heap_Size = 0x19000; /* required amount of heap */
_Min_Stack_Size = 0x200; /* required amount of stack */
instead of:
_Min_Heap_Size = 0x00; /* required amount of heap */
_Min_Stack_Size = 0x200; /* required amount of stack */
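That makes sense: malloc() ultimately draws from the heap region described by the linker script, so a zero _Min_Heap_Size can leave it with nothing to hand out (depending on how your toolchain's _sbrk()/heap setup bounds the heap). Below is a minimal sketch of a newlib-style _sbrk(), assuming the linker script exports end (the first address after .bss) and the _Min_Heap_Size symbol shown above; exact symbol names and the limit check vary per toolchain, so treat it as illustrative only:
#include <cerrno>
#include <cstddef>

extern "C" char end;             /* provided by the linker: first address after .bss        */
extern "C" char _Min_Heap_Size;  /* linker symbol: its "address" equals the configured size */

extern "C" void* _sbrk(ptrdiff_t incr)
{
    static char* heap_end = &end;
    char* heap_limit = &end + (size_t)&_Min_Heap_Size;

    if (heap_end + incr > heap_limit)
    {
        errno = ENOMEM;          /* malloc() will then return NULL */
        return (void*)-1;
    }

    char* prev = heap_end;
    heap_end += incr;
    return prev;
}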

malloc may fail and return NULL in the following conditions:
1) Running out of memory; but as you said, you have plenty of memory, so this is not the case.
2) malloc cannot find a contiguous free block of the requested size.
I guess option 2 is responsible in your case.


I switched my compiler from 32-bit to 64-bit, but I still can't use more than 2 GB. Why?

I can create this array:
int Array[490000000];
cout << "Array Byte= " << sizeof(Array) << endl;
sizeof(Array) is 1,960,000,000 bytes, which is about 1.96 GB.
But I can't create both of these at the same time:
int Array[490000000];
int Array2[490000000];
It gives an error. Why? (Sorry for the bad English.)
I also checked my compiler like this:
printf("%zu\n", sizeof(char *));
It gives me 8.
C++ programs are not usually compiled with 2 GB+ of stack space, regardless of whether they are compiled in 32-bit or 64-bit mode. Stack space can be increased via compiler/linker options, but even where it is permissible to set the stack size that high, it is still not an idiomatic or recommended solution.
If you need a 2 GB array, you should use std::vector<int> Array(490'000'000); (strongly recommended) or a manually created array, i.e. int* Array = new int[490'000'000]; (remember that manually allocated memory must be manually deallocated with delete[]). Either of these allocates dynamic memory. You'll still want to compile in 64-bit mode, since this brushes up against the maximum memory limit of your application if you don't, but in your scenario it's not strictly necessary, since 2 GB is less than the maximum memory of a 32-bit application.
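For illustration, a minimal sketch of the heap-based approach (the names are just for the example); the ~1.96 GB of int storage lives in dynamic memory, so the default stack size no longer matters:
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> Array(490'000'000);               // ~1.96 GB, allocated on the heap
    Array.front() = 1;
    Array.back()  = 2;
    std::cout << "bytes = " << Array.size() * sizeof(int) << '\n';
}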
But still I can't use more than 2 GB :( why?
The C++ language does not provide a way to modify (or report) how much automatic memory is available, or at least I have not seen one. Compilers rely on the OS to provide some 'useful' amount. You will have to search (Google, your hardware documents, user manuals, etc.) for how much. This limit is machine-dependent, in that some machines do not have as much memory as you may want.
On Ubuntu, for the last few releases, the POSIX function ::pthread_attr_getstacksize(...) reports 8 MB per thread. (I am not sure of the proper terminology, but) what Linux calls the 'stack' is the resource the C++ compiler uses for automatic memory. For this release of the OS and compiler, the limit for automatic variables is thus 8 MB (much smaller than 2 GB).
I suppose that because the next machine might have more memory, the compiler might be given a bigger automatic memory budget; I have seen no semantics that limit the size of your array based on the memory of the machine performing the compile, so there can be no compile-time report that the stack will overflow.
I see POSIX has a function suggesting a way to adjust the size of the stack. I've not tried it.
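For reference, a rough sketch (untested here) of querying the default stack size and requesting a larger one for a new thread with the POSIX thread-attribute functions:
#include <cstdio>
#include <pthread.h>

static void* worker(void*) { return nullptr; }

int main()
{
    pthread_attr_t attr;
    size_t stack_size = 0;

    pthread_attr_init(&attr);
    pthread_attr_getstacksize(&attr, &stack_size);       // default, e.g. 8 MiB on recent glibc/Ubuntu
    std::printf("default thread stack size: %zu bytes\n", stack_size);

    pthread_attr_setstacksize(&attr, 64 * 1024 * 1024);  // request 64 MiB for the new thread
    pthread_t t;
    pthread_create(&t, &attr, worker, nullptr);
    pthread_join(t, nullptr);
    pthread_attr_destroy(&attr);
}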
I have also found Ubuntu commands that can report and adjust various memory limits.
From https://www.nics.tennessee.edu/:
The command to modify limits varies by shell. The C shell (csh) and
its derivatives (such as tcsh) use the limit command to modify limits.
The Bourne shell (sh) and its derivatives (such as ksh and bash) use
the ulimit command. The syntax for these commands varies slightly and
is shown below. More detailed information can be found in the man page
for the shell you are using.
One minor experiment: at the command prompt,
$ dtb_chimes
launches this work-in-progress app, which uses the POSIX call and reports an 8 MB stack (automatic memory).
With the ulimit prefix command
$ ulimit -S -s 131072 ; dtb_chimes
the app now reports 134,217,728
./dtb_chimes_ut
default Stack size: 134,217,728
argc: 1
1 ./dtb_chimes_ut
But I have not confirmed the actual allocation ... and this is still a lot smaller than 1.96 GBytes ... but, maybe you can get there.
Note: I strongly recommend std::vector versus big array.
On my Ubuntu desktop, there are 4 GB of DRAM in total (I have memory test utilities), and my dynamic memory is limited to about 3.5 GB. Again, the amount of dynamic memory is machine-dependent.
64 bits address a lot more memory than I can afford.

How to assert/test when uninitialised memory is passed to function

I have a situation where a part of my code has been found to be passed uninitialized memory at times. I am looking for a way in which I could assert when this case occurs when running with the debug-heap. This is a function that could be thrown about in places for that extra help in tracking bugs:
void foo( char* data, int dataBytes )
{
assert( !hasUninitialisedData(data,dataBytes) ); //, This is what we would like
...
}
I have seen that there are tools like Valgrind, and as I run on Windows there is Dr. Memory. These, however, run externally to the application, so they don't find the issue when it occurs for the developer. More importantly, they throw up thousands of reports for Qt and other irrelevant functions, making things impossible.
I think the idea is to have a function that searches for 0xBAADF00D within the array, but there is a whole series of potential hex values and they change per platform. These hex values may also sometimes be valid when integers are stored, so I am not sure if there is more information that can be obtained from the debug heap.
I am primarily interested in whether there is a CRT function, library, Visual Studio breakpoint, or other helper function for doing this sort of check. It 'feels' like there should be one somewhere already; I couldn't find it yet, so if anybody has a nice solution for this sort of situation it would be appreciated.
EDIT: I should explain better. I know the debug heap will initialize all allocations with a value in an attempt to allow detecting uninitialised data. As mentioned, the data being received contains some 0xBAADF00D values; normally memory is initialized with 0xCDCDCDCD, but this is a third-party library allocating the data and apparently there are multiple magic numbers, hence I am interested in whether there is a generalized check hidden somewhere.
The VC++ runtime, at least in debug builds, initializes all heap allocations with a certain value. It has been the same value for as long as I can remember; I can't, however, remember the actual value. You could do a quick allocation test and check.
Debug builds of VC++ programs often set uninitialized memory to 0xCD at startup. That's not dependable over the life of the session (once the memory has been allocated/used/deallocated the value will change), but it's a place to start.
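A quick allocation test along those lines might look like this (a sketch for a VC++ debug build; the fill value is an implementation detail of the debug CRT, typically 0xCD for fresh heap memory):
#include <cstdio>

int main()
{
    unsigned char* p = new unsigned char[16];  // fresh allocation from the debug heap
    std::printf("first byte of a fresh allocation: 0x%02X\n", p[0]);  // typically prints CD in a debug build
    delete[] p;
    return 0;
}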
I have implemented a function now that basically does what is intended, after finding a list of magic numbers on Wikipedia (Magic numbers):
#include <cassert>
#include <cstddef>

/** Performs a check for potentially uninitialised data
\remarks May incorrectly report uninitialised data, as it is always possible that the contained data matches the magic numbers by coincidence, so this function should be used for initial identification of uninitialised data only
*/
bool hasUninitialisedData( const char* data, size_t lenData )
{
    const unsigned int kUninitialisedMagic[] =
    {
        0xABABABAB, // Used by Microsoft's HeapAlloc() to mark "no man's land" guard bytes after allocated heap memory
        0xABADCAFE, // A startup value used to initialize all free memory to catch errant pointers
        0xBAADF00D, // Used by Microsoft's LocalAlloc(LMEM_FIXED) to mark uninitialised allocated heap memory
        0xBADCAB1E, // Error code returned to the Microsoft eVC debugger when the connection to the debugger is severed
        0xBEEFCACE, // Used by Microsoft .NET as a magic number in resource files
        0xCCCCCCCC, // Used by Microsoft's C++ debugging runtime library to mark uninitialised stack memory
        0xCDCDCDCD, // Used by Microsoft's C++ debugging runtime library to mark uninitialised heap memory
        0xDEADDEAD, // A Microsoft Windows STOP error code used when the user manually initiates the crash
        0xFDFDFDFD, // Used by Microsoft's C++ debugging heap to mark "no man's land" guard bytes before and after allocated heap memory
        0xFEEEFEEE, // Used by Microsoft's HeapFree() to mark freed heap memory
    };
    const unsigned int kUninitialisedMagicCount = sizeof(kUninitialisedMagic)/sizeof(kUninitialisedMagic[0]);

    if ( lenData < sizeof(unsigned int) )
        return assert(false && "not enough data for checks!"), false;

    for ( unsigned int i = 0; i + sizeof(unsigned int) <= lenData; ++i ) //< only complete 4-byte windows are checked
    {
        for ( unsigned int iMagic = 0; iMagic < kUninitialisedMagicCount; ++iMagic )
        {
            const unsigned int* ival = reinterpret_cast<const unsigned int*>(data + i);
            if ( *ival == kUninitialisedMagic[iMagic] )
                return true;
        }
    }
    return false;
}
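As a quick sanity check of the helper (a sketch; behaviour depends on the VC++ debug CRT, which typically fills fresh heap blocks with 0xCD):
// Requires <cassert>, <cstdlib> and <cstring> in addition to the helper above.
char* buf = static_cast<char*>( malloc(64) );  // debug CRT fills this with 0xCD
assert( hasUninitialisedData(buf, 64) );       // expected to report uninitialised data in a debug build
memset(buf, 0, 64);
assert( !hasUninitialisedData(buf, 64) );      // zeroed data no longer matches any magic value
free(buf);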

Threads are blocked in malloc and free, virtual size

I'm running a 64-bit multi-threaded program on Windows Server 2003 (x64). It has run into a state where some of the threads seem to be blocked in malloc or free forever. The stack traces look like the following:
ntdll.dll!NtWaitForSingleObject() + 0xa bytes
ntdll.dll!RtlpWaitOnCriticalSection() - 0x1aa bytes
ntdll.dll!RtlEnterCriticalSection() + 0xb040 bytes
ntdll.dll!RtlpDebugPageHeapAllocate() + 0x2f6 bytes
ntdll.dll!RtlDebugAllocateHeap() + 0x40 bytes
ntdll.dll!RtlAllocateHeapSlowly() + 0x5e898 bytes
ntdll.dll!RtlAllocateHeap() - 0x1711a bytes
MyProg.exe!malloc(unsigned __int64 size=0) Line 168 C
MyProg.exe!operator new(unsigned __int64 size=1) Line 59 + 0x5 bytes C++
ntdll.dll!NtWaitForSingleObject()
ntdll.dll!RtlpWaitOnCriticalSection()
ntdll.dll!RtlEnterCriticalSection()
ntdll.dll!RtlpDebugPageHeapFree()
ntdll.dll!RtlDebugFreeHeap()
ntdll.dll!RtlFreeHeapSlowly()
ntdll.dll!RtlFreeHeap()
MyProg.exe!free(void * pBlock=0x000000007e8e4fe0) C
BTW, the parameter values passed to the new operator are not correct here, maybe due to optimization.
Also, at the same time, I found in Process Explorer that the virtual size of this program is 10 GB, but the private bytes and working set are very small (<2 GB). We do have some threads using VirtualAlloc, but in a way that commits the memory in the call, and these threads are not blocked.
m_pBuf = VirtualAlloc(NULL, m_size, MEM_COMMIT, PAGE_READWRITE);
......
VirtualFree(m_pBuf, 0, MEM_RELEASE);
This looks strange to me: it seems a lot of virtual address space is reserved but not committed, and malloc/free is blocked on a lock. I'm wondering whether there is some corruption in memory/objects, so I plan to turn on gflags with PageHeap to troubleshoot this.
Does anyone have similar experience with this? Could you share it with me so I can get more hints?
Thanks a lot!
Your program is using PageHeap, which is intended for debugging only and imposes a ton of memory overhead. To see which programs have PageHeap activated, do this at a command line.
% Gflags.exe /p
To disable it for your process, type this (for MyProg.exe):
% Gflags.exe /p /disable MyProg.exe
Pageheap.exe detects most heap-related bugs - try Pageheap
Also, you should look into "the param values passed to the new ..." issue - does this corruption occur in debug mode? Make sure all optimizations are disabled.
If your system is running out of memory, it might be the case that the OS is swapping. That means that for a single allocation, in the worst case, the OS could need to locate the best candidate for swapping, write it to disk, free the memory and return it. Are you sure that it is blocking, or might it just be performing very slowly? Can another thread be swapping memory to disk while these two threads wait for its call to malloc/free to complete?
My preferred approach for debugging leaks in native applications is to use UMDH to get consecutive snapshots of the user-mode heap(s) in the process and then run UMDH again to diff the snapshots. Any pattern of change in the snapshots is likely a leak.
You get a count and size of memory blocks bucketed by their allocating callstack so it's reasonably straightforward to see where the biggest hogs are.
The user-mode dump heap (UMDH) utility works with the operating system to analyze Windows heap allocations for a specific process.
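For what it's worth, the basic UMDH workflow described above looks roughly like this (the PID and file names are placeholders; user-mode stack traces must first be enabled with gflags):
gflags /i MyProg.exe +ust                 (enable user-mode stack traces for the process)
umdh -p:1234 -f:snap1.log                 (first heap snapshot; 1234 is a placeholder PID)
umdh -p:1234 -f:snap2.log                 (second snapshot, taken after the suspected growth)
umdh snap1.log snap2.log > diff.log       (diff the snapshots; growing callstacks are the leak suspects)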

C/C++ maximum stack size of program on mainstream OSes

I want to do DFS on a 100 x 100 array (say the elements of the array represent graph nodes). Assuming the worst case, the depth of recursive function calls can go up to 10,000, with each call taking up to, say, 20 bytes. So is it feasible, i.e. is there a possibility of stack overflow?
What is the maximum size of the stack in C/C++?
Please specify for gcc for both
1) cygwin on Windows
2) Unix
What are the general limits?
In Visual Studio the default stack size is 1 MB, I think, so with a recursion depth of 10,000 each stack frame can be at most ~100 bytes, which should be sufficient for a DFS algorithm.
Most compilers, including Visual Studio, let you specify the stack size. On some (all?) Linux flavours the stack size isn't part of the executable but an environment setting in the OS. You can then check the stack size with ulimit -s and set it to a new value with, for example, ulimit -s 16384.
Here's a link with default stack sizes for gcc.
DFS without recursion:
std::stack<Node> dfs;
dfs.push(start);
do {
    Node top = dfs.top();
    if (top is what we are looking for) {
        break;
    }
    dfs.pop();
    for (outgoing nodes from top) {
        dfs.push(outgoing node);
    }
} while (!dfs.empty());
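Applied to the 100 x 100 grid from the question, a concrete version of that idea might look like the sketch below (the types and names are just for illustration); the explicit stack lives on the heap, so the search depth is no longer limited by the thread stack:
#include <stack>
#include <utility>

constexpr int N = 100;
bool visited[N][N] = {};

void dfs(int start_r, int start_c)
{
    std::stack<std::pair<int, int>> st;   // explicit stack, backed by heap memory
    st.push({start_r, start_c});
    while (!st.empty())
    {
        auto [r, c] = st.top();
        st.pop();
        if (r < 0 || r >= N || c < 0 || c >= N || visited[r][c])
            continue;                     // out of bounds or already seen
        visited[r][c] = true;
        st.push({r + 1, c});              // push the four neighbours
        st.push({r - 1, c});
        st.push({r, c + 1});
        st.push({r, c - 1});
    }
}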
Stacks for threads are often smaller.
You can change the default at link time, or change it at run time as well. For reference, some defaults are:
For reference, some defaults are:
glibc i386, x86_64: 7.4 MB
Tru64 5.1: 5.2 MB
Cygwin: 1.8 MB
Solaris 7..10: 1 MB
MacOS X 10.5: 460 KB
AIX 5: 98 KB
OpenBSD 4.0: 64 KB
HP-UX 11: 16 KB
Platform-dependent, toolchain-dependent, ulimit-dependent, parameter-dependent.... It is not at all specified, and there are many static and dynamic properties that can influence it.
Yes, there is a possibility of stack overflow. The C and C++ standard do not dictate things like stack depth, those are generally an environmental issue.
Most decent development environments and/or operating systems will let you tailor the stack size of a process, either at link or load time.
You should specify which OS and development environment you're using for more targeted assistance.
For example, under Ubuntu Karmic Koala, the default for gcc is 2M reserved and 4K committed but this can be changed when you link the program. Use the --stack option of ld to do that.
I just ran out of stack at work; it was a database and it was running some threads. Basically, the previous developer had put a big array on the stack, and the stack was small anyway. The software was compiled using Microsoft Visual Studio 2015.
Even though the thread had run out of stack, it silently failed and continued on; it only overflowed the stack when it came to access the contents of the data on the stack.
The best advice I can give is to not declare arrays on the stack, especially in complex applications and particularly in threads; instead, use the heap. That's what it's there for ;)
Also, keep in mind it may not fail immediately when declaring the array, but only on access. My guess is that the compiler declares the stack under Windows "optimistically", i.e. it assumes that the stack has been declared and is sufficiently sized until it comes to use it, and then finds out that the stack isn't there.
Different operating systems may have different stack declaration policies. Please leave a comment if you know what these policies are.
I am not sure what you mean by doing a depth first search on a rectangular array, but I assume you know what you are doing.
If the stack limit is a problem you should be able to convert your recursive solution into an iterative solution that pushes intermediate values onto a stack which is allocated from the heap.
(Added 26 Sept. 2020)
On 24 Oct. 2009, as @pixelbeat first pointed out here, Bruno Haible empirically discovered the following default thread stack sizes for several systems. He said that in a multithreaded program, "the default thread stack size is" as follows. I added the "Actual" size column because, as @Peter Cordes indicates in his comments below my answer, the odd tested numbers shown below do not include all of the thread stack, since some of it was used in initialization. If I run ulimit -s to see "the maximum stack size" that my Linux computer is configured for, it outputs 8192 kB, which is exactly 8 MB, not the odd 7.4 MB listed in the table below for my x86-64 computer with the gcc compiler and glibc. So you can probably add a little to the numbers in the table below to get the actual full stack size for a given thread.
Note also that the below "Tested" column units are all in MB and KB (base 1000 numbers), NOT MiB and KiB (base 1024 numbers). I've proven this to myself by verifying the 7.4 MB case.
Thread stack sizes
System and std library Tested Actual
---------------------- ------ ------
- glibc i386, x86_64 7.4 MB 8 MiB (8192 KiB, as shown by `ulimit -s`)
- Tru64 5.1 5.2 MB ?
- Cygwin 1.8 MB ?
- Solaris 7..10 1 MB ?
- MacOS X 10.5 460 KB ?
- AIX 5 98 KB ?
- OpenBSD 4.0 64 KB ?
- HP-UX 11 16 KB ?
Bruno Haible also stated that:
32 KB is more than you can safely allocate on the stack in a multithreaded program
And he said:
And the default stack size for sigaltstack, SIGSTKSZ, is
only 16 KB on some platforms: IRIX, OSF/1, Haiku.
only 8 KB on some platforms: glibc, NetBSD, OpenBSD, HP-UX, Solaris.
only 4 KB on some platforms: AIX.
Bruno
He wrote the following simple Linux C program to empirically determine the above values. You can run it on your system today to quickly see what your maximum thread stack size is, or you can run it online on GDBOnline here: https://onlinegdb.com/rkO9JnaHD.
Explanation: It simply creates a single new thread, so as to check the thread stack size and NOT the program stack size, in case they differ, then it has that thread repeatedly allocate 128 bytes of memory on the stack (NOT the heap), using the Linux alloca() call, after which it writes a 0 to the first byte of this new memory block, and then it prints out how many total bytes it has allocated. It repeats this process, allocating 128 more bytes on the stack each time, until the program crashes with a Segmentation fault (core dumped) error. The last value printed is the estimated maximum thread stack size allowed for your system.
Important note: alloca() allocates on the stack: even though this looks like dynamic memory allocation onto the heap, similar to a malloc() call, alloca() does NOT dynamically allocate onto the heap. Rather, alloca() is a specialized Linux function to "pseudo-dynamically" (I'm not sure what I'd call this, so that's the term I chose) allocate directly onto the stack as though it was statically-allocated memory. Stack memory used and returned by alloca() is scoped at the function-level, and is therefore "automatically freed when the function that called alloca() returns to its caller." That's why its static scope isn't exited and memory allocated by alloca() is NOT freed each time a for loop iteration is completed and the end of the for loop scope is reached. See man 3 alloca for details. Here's the pertinent quote (emphasis added):
DESCRIPTION
The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.
RETURN VALUE
The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behavior is undefined.
Here is Bruno Haible's program from 24 Oct. 2009, copied directly from the GNU mailing list here:
Again, you can run it live online here.
// By Bruno Haible
// 24 Oct. 2009
// Source: https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html
// =============== Program for determining the default thread stack size =========
#include <alloca.h>
#include <pthread.h>
#include <stdio.h>
void* threadfunc (void*p) {
    int n = 0;
    for (;;) {
        printf("Allocated %d bytes\n", n);
        fflush(stdout);
        n += 128;
        *((volatile char *) alloca(128)) = 0;
    }
}

int main()
{
    pthread_t thread;
    pthread_create(&thread, NULL, threadfunc, NULL);
    for (;;) {}
}
When I run it on GDBOnline using the link above, I get the exact same results each time I run it, as both a C and a C++17 program. It takes about 10 seconds or so to run. Here are the last several lines of the output:
Allocated 7449856 bytes
Allocated 7449984 bytes
Allocated 7450112 bytes
Allocated 7450240 bytes
Allocated 7450368 bytes
Allocated 7450496 bytes
Allocated 7450624 bytes
Allocated 7450752 bytes
Allocated 7450880 bytes
Segmentation fault (core dumped)
So, the thread stack size is ~7.45 MB for this system, as Bruno mentioned above (7.4 MB).
I've made a few changes to the program, mostly just for clarity, but also for efficiency, and a bit for learning.
Summary of my changes:
[learning] I passed in BYTES_TO_ALLOCATE_EACH_LOOP as an argument to the threadfunc() just for practice passing in and using generic void* arguments in C.
Note: This is also the required function prototype, as required by the pthread_create() function, for the callback function (threadfunc() in my case) passed to pthread_create(). See: https://www.man7.org/linux/man-pages/man3/pthread_create.3.html.
[efficiency] I made the main thread sleep instead of wastefully spinning.
[clarity] I added more-verbose variable names, such as BYTES_TO_ALLOCATE_EACH_LOOP and bytes_allocated.
[clarity] I changed this:
*((volatile char *) alloca(128)) = 0;
to this:
volatile uint8_t * byte_buff =
(volatile uint8_t *)alloca(BYTES_TO_ALLOCATE_EACH_LOOP);
byte_buff[0] = 0;
Here is my modified test program, which does exactly the same thing as Bruno's, and even has the same results:
You can run it online here, or download it from my repo here. If you choose to run it locally from my repo, here's the build and run commands I used for testing:
Build and run it as a C program:
mkdir -p bin && \
gcc -Wall -Werror -g3 -O3 -std=c11 -pthread -o bin/tmp \
onlinegdb--empirically_determine_max_thread_stack_size_GS_version.c && \
time bin/tmp
Build and run it as a C++ program:
mkdir -p bin && \
g++ -Wall -Werror -g3 -O3 -std=c++17 -pthread -o bin/tmp \
onlinegdb--empirically_determine_max_thread_stack_size_GS_version.c && \
time bin/tmp
It takes < 0.5 seconds to run locally on a fast computer with a thread stack size of ~7.4 MB.
Here's the program:
// =============== Program for determining the default thread stack size =========
// Modified by Gabriel Staples, 26 Sept. 2020
// Originally by Bruno Haible
// 24 Oct. 2009
// Source: https://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00262.html
#include <alloca.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h> // sleep
/// Thread function to repeatedly allocate memory within a thread, printing
/// the total memory allocated each time, until the program crashes. The last
/// value printed before the crash indicates how big a thread's stack size is.
///
/// Note: passing in a `uint32_t` as a `void *` type here is for practice,
/// to learn how to pass in ANY type to a func by using a `void *` parameter.
/// This is also the required function prototype, as required by the
/// `pthread_create()` function, for the callback function (this function)
/// passed to `pthread_create()`. See:
/// https://www.man7.org/linux/man-pages/man3/pthread_create.3.html
void* threadfunc(void* bytes_to_allocate_each_loop)
{
    const uint32_t BYTES_TO_ALLOCATE_EACH_LOOP =
            *(uint32_t*)bytes_to_allocate_each_loop;

    uint32_t bytes_allocated = 0;
    while (true)
    {
        printf("bytes_allocated = %u\n", bytes_allocated);
        fflush(stdout);
        // NB: it appears that you don't necessarily need `volatile` here,
        // but you DO definitely need to actually use (ex: write to) the
        // memory allocated by `alloca()`, as we do below, or else the
        // `alloca()` call does seem to get optimized out on some systems,
        // making this whole program just run infinitely forever without
        // ever hitting the expected segmentation fault.
        volatile uint8_t * byte_buff =
                (volatile uint8_t *)alloca(BYTES_TO_ALLOCATE_EACH_LOOP);
        byte_buff[0] = 0;
        bytes_allocated += BYTES_TO_ALLOCATE_EACH_LOOP;
    }
}

int main()
{
    const uint32_t BYTES_TO_ALLOCATE_EACH_LOOP = 128;

    pthread_t thread;
    pthread_create(&thread, NULL, threadfunc,
                   (void*)(&BYTES_TO_ALLOCATE_EACH_LOOP));

    while (true)
    {
        const unsigned int SLEEP_SEC = 10000;
        sleep(SLEEP_SEC);
    }

    return 0;
}
Sample output (same results as Bruno Haible's original program):
bytes_allocated = 7450240
bytes_allocated = 7450368
bytes_allocated = 7450496
bytes_allocated = 7450624
bytes_allocated = 7450752
bytes_allocated = 7450880
Segmentation fault (core dumped)

Why would waveOutWrite() cause an exception in the debug heap?

While researching this issue, I found multiple mentions of the following scenario online, invariably as unanswered questions on programming forums. I hope that posting this here will at least serve to document my findings.
First, the symptom: While running pretty standard code that uses waveOutWrite() to output PCM audio, I sometimes get this when running under the debugger:
ntdll.dll!_DbgBreakPoint@0()
ntdll.dll!_RtlpBreakPointHeap@4() + 0x28 bytes
ntdll.dll!_RtlpValidateHeapEntry@12() + 0x113 bytes
ntdll.dll!_RtlDebugGetUserInfoHeap@20() + 0x96 bytes
ntdll.dll!_RtlGetUserInfoHeap@20() + 0x32743 bytes
kernel32.dll!_GlobalHandle@4() + 0x3a bytes
wdmaud.drv!_waveCompleteHeader@4() + 0x40 bytes
wdmaud.drv!_waveThread@4() + 0x9c bytes
kernel32.dll!_BaseThreadStart@8() + 0x37 bytes
While the obvious suspect would be heap corruption somewhere else in the code, I found out that that's not the case. Furthermore, I was able to reproduce this problem using the following code (this is part of a dialog-based MFC application):
void CwaveoutDlg::OnBnClickedButton1()
{
    WAVEFORMATEX wfx;
    wfx.nSamplesPerSec = 44100; /* sample rate */
    wfx.wBitsPerSample = 16;    /* sample size */
    wfx.nChannels = 2;
    wfx.cbSize = 0;             /* size of _extra_ info */
    wfx.wFormatTag = WAVE_FORMAT_PCM;
    wfx.nBlockAlign = (wfx.wBitsPerSample >> 3) * wfx.nChannels;
    wfx.nAvgBytesPerSec = wfx.nBlockAlign * wfx.nSamplesPerSec;

    waveOutOpen(&hWaveOut,
                WAVE_MAPPER,
                &wfx,
                (DWORD_PTR)m_hWnd,
                0,
                CALLBACK_WINDOW );

    ZeroMemory(&header, sizeof(header));
    header.dwBufferLength = 4608;
    header.lpData = (LPSTR)GlobalLock(GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE | GMEM_ZEROINIT, 4608));
    waveOutPrepareHeader(hWaveOut, &header, sizeof(header));
    waveOutWrite(hWaveOut, &header, sizeof(header));
}
afx_msg LRESULT CwaveoutDlg::OnWOMDone(WPARAM wParam, LPARAM lParam)
{
    HWAVEOUT dev = (HWAVEOUT)wParam;
    WAVEHDR *hdr = (WAVEHDR*)lParam;
    waveOutUnprepareHeader(dev, hdr, sizeof(WAVEHDR));
    GlobalFree(GlobalHandle(hdr->lpData));

    ZeroMemory(hdr, sizeof(*hdr));
    hdr->dwBufferLength = 4608;
    hdr->lpData = (LPSTR)GlobalLock(GlobalAlloc(GMEM_MOVEABLE | GMEM_SHARE | GMEM_ZEROINIT, 4608));
    waveOutPrepareHeader(hWaveOut, &header, sizeof(WAVEHDR));
    waveOutWrite(hWaveOut, hdr, sizeof(WAVEHDR));
    return 0;
}
Before anyone comments on this, yes - the sample code plays back uninitialized memory. Don't try this with your speakers turned all the way up.
Some debugging revealed the following information: waveOutPrepareHeader() populates header.reserved with a pointer to what appears to be a structure containing at least two pointers as its first two members. The first pointer is set to NULL. After calling waveOutWrite(), this pointer is set to a pointer allocated on the global heap. In pseudo code, that would look something like this:
struct Undocumented { void *p1, *p2; }; /* This might have more members */

MMRESULT waveOutPrepareHeader( handle, LPWAVEHDR hdr, ...) {
    hdr->reserved = (Undocumented*)calloc(1, sizeof(Undocumented));
    /* Do more stuff... */
}

MMRESULT waveOutWrite( handle, LPWAVEHDR hdr, ...) {
    /* The following assignment fails rarely, causing the problem: */
    hdr->reserved->p1 = malloc( /* chunk of private data */ );
    /* Probably more code to initiate playback */
}
Normally, the header is returned to the application by waveCompleteHeader(), a function internal to wdmaud.drv. waveCompleteHeader() tries to deallocate the pointer allocated by waveOutWrite() by calling GlobalHandle()/GlobalUnlock() and friends. Sometimes, GlobalHandle() bombs, as shown above.
Now, the reason that GlobalHandle() bombs is not due to a heap corruption, as I suspected at first - it's because waveOutWrite() returned without setting the first pointer in the internal structure to a valid pointer. I suspect that it frees the memory pointed to by that pointer before returning, but I haven't disassembled it yet.
This only appears to happen when the wave playback system is low on buffers, which is why I'm using a single header to reproduce this.
At this point I have a pretty good case against this being a bug in my application - after all, my application is not even running. Has anyone seen this before?
I'm seeing this on Windows XP SP2. The audio card is from SigmaTel, and the driver version is 5.10.0.4995.
Notes:
To prevent confusion in the future, I'd like to point out that the answer suggesting that the problem lies with the use of malloc()/free() to manage the buffers being played is simply wrong. You'll note that I changed the code above to reflect the suggestion, to prevent more people from making the same mistake - it doesn't make a difference. The buffer being freed by waveCompleteHeader() is not the one containing the PCM data, the responsibility to free the PCM buffer lies with the application, and there's no requirement that it be allocated in any specific way.
Also, I make sure that none of the waveOut API calls I use fail.
I'm currently assuming that this is either a bug in Windows, or in the audio driver. Dissenting opinions are always welcome.
Now, the reason that GlobalHandle() bombs is not due to a heap corruption, as I suspected at first - it's because waveOutWrite() returned without setting the first pointer in the internal structure to a valid pointer. I suspect that it frees the memory pointed to by that pointer before returning, but I haven't disassembled it yet.
I can reproduce this with your code on my system. I see something similar to what Johannes reported. After the call to waveOutWrite(), hdr->reserved normally holds a pointer to allocated memory (which appears to contain the wave out device name in Unicode, among other things).
But occasionally, after returning from waveOutWrite(), the byte pointed to by hdr->reserved is set to 0. This is normally the least significant byte of that pointer. The rest of the bytes in hdr->reserved are OK, and the block of memory that it normally points to is still allocated and uncorrupted.
It is probably being clobbered by another thread - I can catch the change with a conditional breakpoint immediately after the call to waveOutWrite(). And the system debug breakpoint is occurring in another thread, not the message handler.
However, I can't cause the system debug breakpoint to occur if I use a callback function instead of the Windows message pump (fdwOpen = CALLBACK_FUNCTION in waveOutOpen()).
When I do it this way, my OnWOMDone handler is called by a different thread - possibly the one that's otherwise responsible for the corruption.
So I think there is a bug, either in Windows or the driver, but I think you can work around it by handling WOM_DONE with a callback function instead of the Windows message pump.
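A rough sketch of that workaround (g_hBufferDone and the surrounding structure are placeholder names, not from the original code); the callback only signals an event, since calling other waveOut functions from inside the callback is documented as unsafe, and the header is recycled on the application's own thread:
#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

static HANDLE g_hBufferDone;   // auto-reset event, created elsewhere with CreateEvent()

static void CALLBACK WaveOutProc(HWAVEOUT hwo, UINT uMsg, DWORD_PTR dwInstance,
                                 DWORD_PTR dwParam1, DWORD_PTR dwParam2)
{
    if (uMsg == WOM_DONE)
        SetEvent(g_hBufferDone);   // defer all real work to the application thread
}

// When opening the device:
//   waveOutOpen(&hWaveOut, WAVE_MAPPER, &wfx,
//               (DWORD_PTR)WaveOutProc, 0, CALLBACK_FUNCTION);
//
// On the application thread:
//   WaitForSingleObject(g_hBufferDone, INFINITE);
//   waveOutUnprepareHeader(hWaveOut, &header, sizeof(WAVEHDR));
//   // refill the buffer, re-prepare the header, and call waveOutWrite() again here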
You're not alone with this issue:
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=100589
I'm seeing the same problem and have done some analysis myself:
waveOutWrite() allocates (i.e. GlobalAlloc) a pointer to a heap area of 354 bytes and correctly stores it in the data area pointed to by header.reserved.
But when this heap area is to be freed again (in waveCompleteHeader(), according to your analysis; I don't have the symbols for wdmaud.drv myself), the least significant byte of the pointer has been set to zero, thus invalidating the pointer (while the heap is not corrupted yet). In other words, what happens is something like:
*(BYTE *)&header.reserved = 0;
So I disagree with your statements in one point: waveOutWrite() stores a valid pointer first; the pointer only becomes corrupted later from another thread.
Probably that's the same thread (mxdmessage) that later tries to free this heap area, but I did not yet find the point where the zero byte is stored.
This does not happen very often, and the same heap area (same address) has successfully been allocated and deallocated before.
I'm quite convinced that this is a bug somewhere in the system code.
Not sure about this particular problem, but have you considered using a higher-level, cross-platform audio library? There are a lot of quirks with Windows audio programming, and these libraries can save you a lot of headaches.
Examples include PortAudio, RtAudio, and SDL.
The first thing that I'd do would be to check the return values from the waveOutX functions. If any of them fail - which isn't unreasonable given the scenario you describe - and you carry on regardless then it isn't surprising that things start to go wrong. My guess would be that waveOutWrite is returning MMSYSERR_NOMEM at some point.
Use Application Verifier to figure out what's going on, if you do something suspicious, it will catch it much earlier.
It may be helpful to look at the source code for Wine, although it's possible that Wine has fixed whatever bug there is, and it's also possible Wine has other bugs in it. The relevant files are dlls/winmm/winmm.c, dlls/winmm/lolvldrv.c, and possibly others. Good luck!
What about the fact that you are not allowed to call winmm functions from within the callback?
MSDN does not mention such restrictions for window messages, but the usage of window messages is similar to a callback function. Possibly, internally it's implemented as a callback function from the driver and that callback does SendMessage.
Internally, waveout has to maintain a linked list of headers that were written using waveOutWrite; so, I guess that:
hdr->reserved = (Undocumented*)calloc(1, sizeof(Undocumented));
sets the previous/next pointers of the linked list, or something like this. If you write more buffers, check the pointers; if any of them point to one another, then my guess is most likely correct.
Multiple sources on the web mention that you don't need to unprepare/prepare the same headers repeatedly. If you comment out the prepare/unprepare header calls in the original example, it appears to work fine without any problems.
I solved the problem by polling the sound playback and using delays:
WAVEHDR header = { buffer, sizeof(buffer), 0, 0, 0, 0, 0, 0 };
waveOutPrepareHeader(hWaveOut, &header, sizeof(WAVEHDR));
waveOutWrite(hWaveOut, &header, sizeof(WAVEHDR));
/*
* wait a while for the block to play then start trying
* to unprepare the header. this will fail until the block has
* played.
*/
while (waveOutUnprepareHeader(hWaveOut,&header,sizeof(WAVEHDR)) == WAVERR_STILLPLAYING)
Sleep(100);
waveOutClose(hWaveOut);
Playing Audio in Windows using waveOut Interface