Having read this interesting article outlining a technique for debugging heap corruption, I started wondering how I could tweak it for my own needs. The basic idea is to provide a custom malloc() that allocates whole pages of memory and then enables memory protection bits for those pages, so that the program crashes when they get written to and the offending write instruction can be caught in the act. The sample code is C under Linux (mprotect() is used to enable the protection), and I'm curious how to apply this to native C++ and Windows. VirtualAlloc() and/or VirtualProtect() look promising, but I'm not sure what a usage scenario would look like.
Fred *p = new Fred[100];
ProtectBuffer(p);
p[10] = Fred(); // like this to crash please
I am aware of the existence of specialized tools for debugging memory corruption in Windows, but I'm still curious if it would be possible to do it "manually" using this approach.
EDIT: Also, is this even a good idea under Windows, or just an entertaining intellectual exercise?
Yes, you can use VirtualAlloc and VirtualProtect to set up sections of memory that are protected from read/write operations.
You would have to re-implement operator new and operator delete (and their [] relatives), such that your memory allocations are controlled by your code.
And bear in mind that it would only be on a per-page basis, and you would be using (at least) three pages worth of virtual memory per allocation - not a huge problem on a 64-bit system, but may cause problems if you have many allocations in a 32-bit system.
Roughly what you need to do (you should actually find the page-size for the build of Windows - I'm too lazy, so I'll use 4096 and 4095 to represent pagesize and pagesize-1 - you also will need to do more error checking than this code does!!!):
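#include <windows.h>
#include <new>
void *operator new(size_t size)
{
    if (size == 0) size = 1; // operator new must return a unique non-null pointer
    // Round size up to whole pages, plus one guard page on each side.
    size_t bigsize = (size + 2*4096 + 4095) & ~(size_t)4095;
    // Reserve "bigsize" bytes of address space with no access at all.
    void *addr = VirtualAlloc(NULL, bigsize, MEM_RESERVE, PAGE_NOACCESS);
    if (addr == NULL) throw std::bad_alloc();
    // Skip the leading guard page, then commit only the pages that hold data.
    addr = reinterpret_cast<void *>(reinterpret_cast<char *>(addr) + 4096);
    void *new_addr = VirtualAlloc(addr, size, MEM_COMMIT, PAGE_READWRITE);
    if (new_addr == NULL) throw std::bad_alloc();
    return new_addr;
}
void operator delete(void *ptr)
{
    if (ptr == NULL) return;
    // Step back over the leading guard page to the base of the reservation.
    char *tmp = reinterpret_cast<char *>(ptr) - 4096;
    // MEM_RELEASE requires a size of 0 and the original base address.
    VirtualFree(reinterpret_cast<void*>(tmp), 0, MEM_RELEASE);
}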
Something along those lines, as I said - I haven't tried compiling this code, as I only have a Windows VM, and I can't be bothered to download a compiler and see if it actually compiles. [I know the principle works, as we did something similar where I worked a few years back].
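The size arithmetic in operator new above can be checked on its own, without touching the Windows API. This is a minimal sketch: the function name is made up for illustration, and the 4096 default is the same assumption the answer makes (real code should query the page size from the OS).

```cpp
#include <cassert>
#include <cstddef>

// Total bytes to reserve for one allocation: the payload rounded up to
// whole pages, plus one inaccessible guard page on each side.
// `page` must be a power of two; 4096 is an assumption, not a queried value.
inline std::size_t guarded_reservation(std::size_t size, std::size_t page = 4096)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    const std::size_t payload = (size + page - 1) & ~(page - 1);
    return payload + 2 * page;
}
```

With these numbers a 1-byte allocation reserves three pages, which is where the "(at least) three pages per allocation" cost comes from.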
This is what Guard Pages are for (see this MSDN tutorial). They raise a special exception the first time the page is accessed, which lets you do more than crash on the first invalid page access (and catch bad reads/writes as opposed to NULL pointers etc).
I have a complex code base in C++. I have run a memory profiler that counts the number of bytes allocated by malloc, this gives me X bytes. Theoretically, my code should return X-Y bytes (Y varies with the input, and ranges from a few KB to a couple of GB, so this is not negligible.)
I need to find out which part of my code is asking for the extra bytes. I've tried a few tools, but to no avail: massif, perf, I've even tried gdb breaking on malloc(). I could probably write a wrapper for malloc asking to provide the calling function, but I don't know how to do that.
Does anyone know a way to find how much memory different parts of the program are asking for?
If you use a custom allocate function - a wrapper around malloc - you can use the gcc backtrace functions (http://man7.org/linux/man-pages/man3/backtrace.3.html) to find out which functions call malloc with what arguments.
That'll tell you the functions which are allocating. From there you can probably sort the biggies into domains by hand.
This question has good info on the wrapping itself. Create a wrapper function for malloc and free in C
Update:
This won't catch new/delete allocations but overriding them is even easier than malloc! See here: How to properly replace global new & delete operators + the very important comment on the best answer "Don't forget the other 3 versions: new[], delete[], nothrow"
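A rough sketch of the wrapper idea, assuming glibc (or another libc that ships execinfo.h). The name traced_malloc is hypothetical; backtrace() and backtrace_symbols_fd() are the functions described on the linked man page.

```cpp
#include <execinfo.h>   // backtrace, backtrace_symbols_fd
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical wrapper: report the requested size and the call stack on
// stderr, then hand the request to the real malloc.
inline void *traced_malloc(std::size_t size)
{
    void *frames[16];
    const int depth = backtrace(frames, 16);
    std::fprintf(stderr, "malloc(%zu) called from:\n", size);
    backtrace_symbols_fd(frames, depth, 2);   // 2 is the stderr descriptor
    return std::malloc(size);
}
```

Function names only show up in the output if the binary exports its symbols (with GCC, typically by linking with -rdynamic); otherwise you get raw addresses to feed to addr2line.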
You can make a macro that calls the libc malloc and prints the details of the allocation.
#define malloc( sz ) (\
{\
printf( "Allocating %zu Bytes, File %s:%d\n", (size_t)(sz), __FILE__, __LINE__ );\
void *(*libc_malloc)(size_t) = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");\
printf("malloc\n");\
void* mem = libc_malloc(sz);\
mem; /* GCC-specific statement-expression: this is the macro's value */\
})
This should ( touch wood ) get called in lieu of the real malloc and spit out the number of bytes allocated and where the allocation occurred. Returning mem like this is GCC-specific though.
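If printing every allocation is too noisy, a variation on the same idea is to tally bytes per call site and inspect the totals afterwards. This is only a sketch: tracked_malloc, allocation_totals and TRACKED_MALLOC are hypothetical names, and the bookkeeping is not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// Bytes requested so far, keyed by "file:line".
inline std::map<std::string, std::size_t> &allocation_totals()
{
    static std::map<std::string, std::size_t> totals;
    return totals;
}

// Record the request against its call site, then allocate as usual.
inline void *tracked_malloc(std::size_t size, const char *file, int line)
{
    allocation_totals()[std::string(file) + ":" + std::to_string(line)] += size;
    return std::malloc(size);
}

// Use TRACKED_MALLOC(n) wherever the code would otherwise call malloc(n).
#define TRACKED_MALLOC(sz) tracked_malloc((sz), __FILE__, __LINE__)
```

Because this does not replace malloc itself, the map's own allocations are not counted and there is no recursion to worry about.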
On Windows platform, I'm trying to dump memory from my application where the variables lie. Here's the function:
void MyDump(const void *m, unsigned int n)
{
const unsigned char *p = reinterpret_cast<const unsigned char *>(m);
char buffer[16];
unsigned int mod = 0;
for (unsigned int i = 0; i < n; ++i, ++mod) {
if (mod % 16 == 0) {
mod = 0;
std::cout << " | ";
for (unsigned short j = 0; j < 16; ++j) {
switch (buffer[j]) {
case 0xa:
case 0xb:
case 0xd:
case 0xe:
case 0xf:
std::cout << " ";
break;
default: std::cout << buffer[j];
}
}
std::cout << "\n0x" << std::setfill('0') << std::setw(8) << std::hex << (long)i << " | ";
}
buffer[i % 16] = p[i];
std::cout << std::setw(2) << std::hex << static_cast<unsigned int>(p[i]) << " ";
if (i % 4 == 0 && i != 1)
std::cout << " ";
}
}
Now, how can I know at which address my process memory space starts, where all the variables are stored? And how do I know how long the area is?
For instance:
MyDump(0x0000 /* <-- Starts from here? */, 0x1000 /* <-- This much? */);
Best regards,
nhaa123
The short answer to this question is that you cannot approach this problem this way. The way processes are laid out in memory is very much compiler and operating system dependent, and there is no easy way to determine where all of the code and variables lie. To accurately and completely find all of the variables, you'd need to write large portions of a debugger yourself (or borrow them from a real debugger's code).
But, you could perhaps narrow the scope of your question a little bit. If what you really want is just a stack trace, those are not too hard to generate: How can one grab a stack trace in C?
Or if you want to examine the stack itself, it is easy to get a pointer to the current top of the stack (just declare a local variable and then take its address). The easiest way to get the bottom of the stack is to declare a variable in main, store its address in a global variable, and use that address later as the "bottom" (this is easy but not really 'clean').
Getting a picture of the heap is a lot harder, because you need extensive knowledge of the internal workings of the heap to know which pieces of it are currently allocated. Since the heap is basically "unlimited" in size, that's quite a lot of data to print if you just print all of it, even the unused parts. I don't know of a way to do this, and I would highly recommend you not waste time trying.
Getting a picture of static global variables is not as bad as the heap, but also difficult. These live in the data segments of the executable, and unless you want to get into some assembly and parsing of executable formats, just avoid doing this as well.
Overview
What you're trying to do is absolutely possible, and there are even tools to help, but you'll have to do more legwork than I think you're expecting.
In your case, you're particularly interested in "where the variables lie." The system heap API on Windows will be an incredible help to you. The reference is really quite good, and though it won't be a single contiguous region the API will tell you where your variables are.
In general, though, not knowing anything about where your memory is laid out, you're going to have to do a sweep of the entire address space of the process. If you want only data, you'll have to do some filtering of that, too, because code and stack nonsense are also there. Lastly, to avoid seg-faulting while you dump the address space, you may need to add a segfault signal handler that lets you skip unmapped memory while you're dumping.
Process Memory Layout
What you will have, in a running process, is multiple disjoint stretches of memory to print out. They will include:
1. Compiled code (read-only),
2. Stack data (local variables),
3. Static Globals (e.g. from shared libraries or in your program), and
4. Dynamic heap data (everything from malloc or new).
The key to a reasonable dump of memory is being able to tell which range of addresses belongs to which family. That's your main job, when you're dumping the program. Some of this, you can do by reading the addresses of functions (1) and variables (2, 3 and 4), but if you want to print more than a few things, you'll need some help.
For this, we have...
Useful Tools
Rather than just blindly searching the address space from 0 to 2^64 (which, we all know, is painfully huge), you will want to employ OS and compiler developer tools to narrow down your search. Someone out there needs these tools, maybe even more than you do; it's just a matter of finding them. Here are a few of which I'm aware.
Disclaimer: I don't know many of the Windows equivalents for many of these things, though I'm sure they exist somewhere.
I've already mentioned the Windows system heap API. This is a best-case scenario for you. The more things you can find in this vein, the more accurate and easy your dump will be. Really, the OS and the C runtime know quite a bit about your program. It's a matter of extracting the information.
On Linux, memory types 1 and 3 are accessible through utilities like /proc/pid/maps. In /proc/pid/maps you can see the ranges of the address space reserved for libraries and program code. You can also see the protection bits; read-only ranges, for instance, are probably code, not data.
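To make the /proc/pid/maps idea concrete, here is a small sketch that pulls the address range and protection bits out of one line of that file. MapRange, parse_maps_line and looks_like_code are illustrative names, and the "readable, not writable, executable means code" test is only the heuristic described above.

```cpp
#include <cstdio>
#include <string>

struct MapRange {
    unsigned long long start;
    unsigned long long end;
    char perms[5];   // e.g. "r-xp", NUL-terminated
};

// Parse the leading "start-end perms" fields of one /proc/<pid>/maps line.
// Returns false if the line does not have that shape.
inline bool parse_maps_line(const std::string &line, MapRange &out)
{
    return std::sscanf(line.c_str(), "%llx-%llx %4s",
                       &out.start, &out.end, out.perms) == 3;
}

// Heuristic: readable, not writable, executable is probably code, not data.
inline bool looks_like_code(const MapRange &r)
{
    return r.perms[0] == 'r' && r.perms[1] == '-' && r.perms[2] == 'x';
}
```

Reading /proc/self/maps line by line with std::getline and feeding each line to parse_maps_line gives you the list of ranges worth dumping, and lets you skip the ones that are clearly code.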
For Windows tips, Mark Russinovich has written some articles on how to learn about a Windows process's address space and where different things are stored. I imagine he might have some good pointers in there.
Well, you can't, not really... at least not in a portable manner. For the stack, you could do something like:
void* ptr_to_start_of_stack = 0;
int main(int argc, char* argv[])
{
int item_at_approximately_start_of_stack;
ptr_to_start_of_stack = &item_at_approximately_start_of_stack;
// ...
// ... do lots of computation
// ... a function called here can do something similar, and
// ... attempt to print out from ptr_to_start_of_stack to its own
// ... approximate start of stack
// ...
return 0;
}
In terms of attempting to look at the range of the heap, on many systems you could use the sbrk() function (specifically sbrk(0)) to get a pointer to the current end of the heap, the "program break". Typically the heap grows upward from just above the program's data, while the stack grows downward from high addresses.
That said, this is a really bad idea. Not only is it platform dependent, but the information you can get from it is really not as useful as good logging. I suggest you familiarize yourself with Log4Cxx.
Good logging practice, in addition to the use of a debugger such as GDB, is really the best way to go. Trying to debug your program by looking at a full memory dump is like trying to find a needle in a haystack, and so it really is not as useful as you might think. Logging where the problem might logically be, is more helpful.
AFAIK, this depends on OS, you should look at e.g. memory segmentation.
Assuming you are running on a mainstream operating system, you'll need help from the operating system to find out which addresses in your virtual memory space have mapped pages. For example, on Windows you'd use VirtualQueryEx(). The memory dump you'll get can be as large as two gigabytes, so it isn't likely that you'll discover anything recognizable quickly.
Your debugger already allows you to inspect memory at arbitrary addresses.
You can't, at least not portably. And you can't make many assumptions either.
Unless you're running this on CP/M or MS-DOS.
But with modern systems, the where and how of your data and code placement, in the generic case, aren't really up to you.
You can play linker games and such to get better control of the memory map for your executable, but you won't have any control over, say, any shared libraries you may load, etc.
There's no guarantee that any of your code, for example, is even in a continuous space. The Virtual Memory and loader can place code pretty much where it wants. Nor is there any guarantee that your data is anywhere near your code. In fact, there's no guarantee that you can even READ the memory space where your code lives. (Execute, yes. Read, maybe not.)
At a high level, your program is split in to 3 sections: code, data, and stack. The OS places these where it sees fit, and the memory manager controls what and where you can see stuff.
There are all sorts of things that can muddy these waters.
However.
If you want.
You can try having "markers" in your code. For example, put a function at the start of your file called "startHere()" and then one at the end called "endHere()". If you're lucky, for a single file program, you'll have a continuous blob of code between the function pointers for "startHere" and "endHere".
Same thing with static data. You can try the same concept if you're interested in that at all.
Apparently this function in SDL_Mixer keeps dying, and I'm not sure why. Does anyone have any ideas? According to visual studio, the crash is caused by Windows triggering a breakpoint somewhere in the realloc() line.
The code in question is from the SVN version of SDL_Mixer specifically, if that makes a difference.
static void add_music_decoder(const char *decoder)
{
void *ptr = realloc(music_decoders, num_decoders * sizeof (const char **));
if (ptr == NULL) {
return; /* oh well, go on without it. */
}
music_decoders = (const char **) ptr;
music_decoders[num_decoders++] = decoder;
}
I'm using Visual Studio 2008, and music_decoders and num_decoders are both correct (music_decoders contains one pointer, to the string "WAVE", and music_decoders. ptr is 0x00000000, and the best I can tell, the crash seems to be in the realloc() function. Does anyone have any idea how I could handle this crash problem? I don't mind having to do a bit of refactoring in order to make this work, if it comes down to that.
For one thing, it's not valid to allocate an array of num_decoders pointers, and then write to index num_decoders in that array. Presumably the first time this function was called, it allocated 0 bytes and wrote a pointer to the result. This could have corrupted the memory allocator's structures, resulting in a crash/breakpoint when realloc is called.
Btw, if you report the bug, note that add_chunk_decoder (in mixer.c) is broken in the same way.
I'd replace
void *ptr = realloc(music_decoders, num_decoders * sizeof (const char **));
with
void *ptr = realloc(music_decoders, (num_decoders + 1) * sizeof(*music_decoders));
Make sure that the SDL_Mixer.DLL file and your program build are using the same C Runtime settings. It's possible that the memory is allocated using one CRT, and realloc'ed using another CRT.
In the project settings, look for C/C++ -> Code Generation. The Runtime Library setting there should be the same for both.
music_decoders[num_decoders++] = decoder;
You are one off here. If num_decoders is the size of the array then the last index is num_decoders - 1. Therefore you should replace the line with:
music_decoders[num_decoders-1] = decoder;
And you may want to increment num_decoders at the beginning of the function, not at the end, since you want to realloc for the new size, not for the old one.
One additional thing: you want to multiply the size with sizeof (const char *), not with double-star.
Ah, the joys of C programming. A crash in realloc (or malloc or free) can be triggered by writing past the bounds of a memory block -- and this can happen anywhere else in your program. The approach I've used in the past is some flavor of debugging malloc package. Before jumping in with a third party solution, check the docs to see if Visual Studio provides anything along these lines.
Crashes are not generally triggered by breakpoints. Are you crashing, breaking due to a breakpoint or crashing during the handling of the breakpoint?
The debug output window should have some information as to why a CRT breakpoint is being hit. For example, it might notice during the memory operations that guard bytes around the original block have been modified (due to a buffer overrun that occurred before add_music_decoder was even invoked). The CRT will check these guard bytes when memory is freed and possibly when realloced too.
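The guard-byte mechanism is easy to imitate by hand, which can help when a debug heap is not available. The following is a sketch of the general technique, not the CRT's actual layout: the function names, the guard width GUARD and the fill value FILL are all choices made for this example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Layout of one block: [size_t size][GUARD bytes][user data][GUARD bytes]
const std::size_t GUARD = 8;       // arbitrary width for this sketch
const unsigned char FILL = 0xFD;   // arbitrary fill pattern for this sketch

inline void *guarded_alloc(std::size_t size)
{
    unsigned char *raw = static_cast<unsigned char *>(
        std::malloc(sizeof(std::size_t) + 2 * GUARD + size));
    if (raw == nullptr) {
        return nullptr;
    }
    std::memcpy(raw, &size, sizeof(size));   // remember the user size
    unsigned char *user = raw + sizeof(std::size_t) + GUARD;
    std::memset(user - GUARD, FILL, GUARD);  // guard before the data
    std::memset(user + size, FILL, GUARD);   // guard after the data
    return user;
}

// True if neither guard region has been overwritten.
inline bool guards_intact(const void *ptr)
{
    const unsigned char *user = static_cast<const unsigned char *>(ptr);
    std::size_t size;
    std::memcpy(&size, user - GUARD - sizeof(size), sizeof(size));
    for (std::size_t i = 0; i < GUARD; ++i) {
        if ((user - GUARD)[i] != FILL || user[size + i] != FILL) {
            return false;
        }
    }
    return true;
}

inline void guarded_free(void *ptr)
{
    if (ptr != nullptr) {
        std::free(static_cast<unsigned char *>(ptr) - GUARD - sizeof(std::size_t));
    }
}
```

Calling guards_intact on a block right before it is freed or resized reports an overrun at that point, which is the same moment the CRT breakpoint in the question fires: long after the bad write happened.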
I'm pretty inexperienced using C++, but I'm trying to compile version 2.0.2 of the SBML toolbox for matlab on a 64-bit XP platform. The SBML toolbox depends upon Xerces 2.8 and libsbml 2.3.5.
I've been able to build and compile the toolbox on a 32-bit machine, and it works when I test it. However, after rebuilding it on a 64-bit machine (which is a HUGE PITA!), I get a segmentation fault when I try to read long .xml files with it.
I suspect that the issue is caused by pointer addresses issues.
The stack trace from the segmentation fault starts with:
[ 0] 000000003CB3856E libsbml.dll+165230 (StringBuffer_append+000030)
[ 6] 000000003CB1BFAF libsbml.dll+049071 (EventAssignment_createWith+001631)
[ 12] 000000003CB1C1D7 libsbml.dll+049623 (SBML_formulaToString+000039)
[ 18] 000000003CB2C154 libsbml.dll+115028 (
So I'm looking at the StringBuffer_append function in the libsbml code:
LIBSBML_EXTERN
void
StringBuffer_append (StringBuffer_t *sb, const char *s)
{
unsigned long len = strlen(s);
StringBuffer_ensureCapacity(sb, len);
strncpy(sb->buffer + sb->length, s, len + 1);
sb->length += len;
}
ensureCapacity looks like this:
LIBSBML_EXTERN
void
StringBuffer_ensureCapacity (StringBuffer_t *sb, unsigned long n)
{
unsigned long wanted = sb->length + n;
unsigned long c;
if (wanted > sb->capacity)
{
/**
* Double the total new capacity (c) until it is greater-than wanted.
* Grow StringBuffer by this amount minus the current capacity.
*/
for (c = 2 * sb->capacity; c < wanted; c *= 2) ;
StringBuffer_grow(sb, c - sb->capacity);
}
}
and StringBuffer_grow looks like this:
LIBSBML_EXTERN
void
StringBuffer_grow (StringBuffer_t *sb, unsigned long n)
{
sb->capacity += n;
sb->buffer = (char *) safe_realloc(sb->buffer, sb->capacity + 1);
}
Is it likely that the
strncpy(sb->buffer + sb->length, s, len + 1);
in StringBuffer_append is the source of my segfault?
If so, can anyone suggest a fix? I really don't know C++, and am particularly confused by pointers and memory addressing, so am likely to have no idea what you're talking about - I'll need some hand-holding.
Also, I put details of my build process online here, in case anyone else is dealing with trying to compile C++ for 64-bit systems using Microsoft Visual C++ Express Edition.
Thanks in advance!
-Ben
Try printing, or using a debugger, to see what values you're getting for some of your intermediate variables. In StringBuffer_append() output len; in StringBuffer_ensureCapacity() observe sb->capacity and c before and inside the loop. See if the values make sense.
A segmentation fault may be caused by accessing data beyond the end of the string.
The strange fact that it worked on a 32-bit machine and not a 64-bit O/S is also a clue. Is the physical and pagefile memory size the same for the two machines? Also, in a 64-bit machine the kernel space may be larger than the 32-bit machine, and eating some available memory space that was in the user part of the memory space for 32-bit O/S. For XML the entire document must fit into memory. There are probably some switches to set the size if this is the problem. The difference in machines being the cause of the problem should only be the case if you are working with a very large string. If the string is not huge, it may be some problem with library or utility method that doesn't work well in a 64-bit environment.
Also, use a simple/small xml file to start with if you have nothing else to try.
Where do you initialize sb->length? Your problem is likely in strncpy(), though I don't know why the 32-bit -> 64-bit O/S change would matter. Your best bet is to look at the intermediate values; the problem will then be obvious.
Being one of the developers of libsbml, I just stumbled over this. Is this still a problem for you? In the meantime we have released libsbml 5, with separate 64-bit and 32-bit versions, and improved testing a lot. Please have a look at:
http://sf.net/projects/sbml/files/libsbml
The problem could be pretty much anything. True, it might be that strncpy does something bad, but most likely, it is simply passed a bad pointer. Which could originate from anywhere. A segfault (or access violation in Windows) simply means that the application tried to read or write to an address it did not have permission to access. So the real question is, where did that address come from? The function that tried to follow the pointer is probably ok. But it was passed a bad pointer from somewhere else. Probably.
Unfortunately, debugging C code is not trivial at the best of times. If the code isn't your own, that doesn't make it easier. :)
StringBuffer is defined as follows:
/**
* Creates a new StringBuffer and returns a pointer to it.
*/
LIBSBML_EXTERN
StringBuffer_t *
StringBuffer_create (unsigned long capacity)
{
StringBuffer_t *sb;
sb = (StringBuffer_t *) safe_malloc(sizeof(StringBuffer_t));
sb->buffer = (char *) safe_malloc(capacity + 1);
sb->capacity = capacity;
StringBuffer_reset(sb);
return sb;
}
A bit more of the stack trace is:
[ 0] 000000003CB3856E libsbml.dll+165230 (StringBuffer_append+000030)
[ 6] 000000003CB1BFAF libsbml.dll+049071 (EventAssignment_createWith+001631)
[ 12] 000000003CB1C1D7 libsbml.dll+049623 (SBML_formulaToString+000039)
[ 18] 000000003CB2C154 libsbml.dll+115028 (Rule::setFormulaFromMath+000036)
[ 20] 0000000001751913 libmx.dll+137491 (mxCheckMN_700+000723)
[ 25] 000000003CB1E7B2 libsbml.dll+059314 (KineticLaw_getFormula+000018)
[ 37] 0000000035727749 TranslateSBML.mexw64+030537 (mexFunction+009353)
Seems if it was in any of the StringBuffer_* functions, that would be in the stack trace. I disagree with how _ensureCapacity and _grow are implemented. None of the functions check if realloc works or not. Realloc failure will certainly cause a segfault. I would insert a check for null after _ensureCapacity. With the way _ensureCapacity and _grow are, it seems possible to get an off-by-one error. If you're running on Windows, the 64-bit and 32-bit systems may have different page protection mechanisms that cause it to fail. (You can often live through off-by-one errors in malloc'ed memory on systems with weak page protection.)
Let's assume that safe_malloc and safe_realloc do something sensible like aborting the program when they can't get the requested memory. That way your program won't continue executing with invalid pointers.
Now let's look at how big StringBuffer_ensureCapacity grows the buffer to, in comparison to the wanted capacity. It's not an off-by-one error. It's an off-by-a-factor-of-two error.
How your program ever worked on 32-bit, I can't guess.
In response to bk1e's comment on the question - unfortunately, I need version 2.0.2 for use with the COBRA toolbox, which doesn't work with the newer version 3. So, I'm stuck with this older version for now.
I'm also hitting some walls attempting to debug - I'm building a .dll, so in addition to recompiling xerces to make sure it has the same debugging settings in MSVC++, I also need to attach to the Matlab process to do the debugging - it's a pretty big jump for my limited experience in this environment, and I haven't dug into it very far yet.
I had hoped there was some obvious syntax issue with the buffer allocation or expansion. Looks like I'm in for a few more days of pain, though.
unsigned long is not a safe type to use for sizes on a 64-bit machine in Windows. Unlike Linux, Windows defines long to be 32 bits on both 32- and 64-bit architectures. So if the buffer being appended to grows beyond 4 GB in size (or if you're trying to append a string whose length is >4GB), you need to change the unsigned long type declarations to size_t (which is 64 bits on 64-bit architectures, in all operating systems).
However, if you're only reading a 1.5 MB file, I don't see how you'd ever get a StringBuffer to exceed 4 GB in size, so this might be a blind alley.
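For illustration, here is what the capacity calculation might look like with size_t throughout. This is a sketch under assumptions, not libsbml's code: grown_capacity is a made-up name, and it also guards two cases the original loop does not handle, a starting capacity of 0 and overflow while doubling.

```cpp
#include <cstddef>
#include <limits>

// Smallest capacity >= wanted that can be reached by repeatedly doubling
// `capacity`. Returns 0 if doubling would overflow size_t.
inline std::size_t grown_capacity(std::size_t capacity, std::size_t wanted)
{
    std::size_t c = capacity > 0 ? capacity : 1;   // doubling 0 never grows
    while (c < wanted) {
        if (c > std::numeric_limits<std::size_t>::max() / 2) {
            return 0;
        }
        c *= 2;
    }
    return c;
}
```

A caller would then realloc to the returned capacity plus one byte for the terminating NUL, and treat a return value of 0 as an allocation failure.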
I'm working on a project in C and it requires memalign(). Really, posix_memalign() would do as well, but darwin/OSX lacks both of them.
What is a good solution to shoehorn-in memalign? I don't understand the licensing for posix-C code if I were to rip off memalign.c and put it in my project- I don't want any viral-type licensing LGPL-ing my whole project.
Mac OS X appears to be 16-byte mem aligned.
Quote from the website:
I had a hard time finding a definitive
statement on MacOS X memory alignment
so I did my own tests. On 10.4/intel,
both stack and heap memory is 16 byte
aligned. So people porting software
can stop looking for memalign() and
posix_memalign(). It’s not needed.
update: OSX now has posix_memalign()
Late to the party, but newer versions of OSX do have posix_memalign(). You might want this when aligning to page boundaries. For example:
#include <stdlib.h>
#include <unistd.h> /* sysconf */
char *buffer;
long pagesize; /* sysconf() returns long */
pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1) handle_error("sysconf");
if (posix_memalign((void **)&buffer, (size_t)pagesize, 4 * (size_t)pagesize) != 0) {
handle_error("posix_memalign");
}
One thing to note is that, unlike memalign(), posix_memalign() takes **buffer as an argument and returns an integer error code.
Should be easy enough to do yourself, no? Something like the following (not tested):
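/* needs <stdlib.h> and <stdint.h>; align must be a power of two */
void *aligned_malloc( size_t size, size_t align )
{
    void *mem = malloc( size + (align-1) + sizeof(void*) );
    if (mem == NULL) return NULL;
    char *amem = ((char*)mem) + sizeof(void*);
    /* round up to the next multiple of align; adds 0 if already aligned */
    amem += (align - ((uintptr_t)amem & (align - 1))) & (align - 1);
    /* stash the original pointer just below the aligned block */
    ((void**)amem)[-1] = mem;
    return amem;
}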
void aligned_free( void *mem )
{
free( ((void**)mem)[-1] );
}
(thanks Jonathan Leffler)
Edit:
Regarding ripping off another memalign implementation, the problem with that is not licensing. Rather, you'd run into the difficulty that any good memalign implementation will be an integral part of the heap-manager codebase, not simply layered on top of malloc/free. So you'd have serious trouble transplanting it to a different heap manager, especially when you have no access to its internals.
Why does the software you are porting need memalign() or posix_memalign()? Does it use it for alignments bigger than the 16-byte alignments referenced by austirg?
I see Mike F posted some code - it looks relatively neat, though I think the while loop may be sub-optimal (if the alignment required is 1KB, it could iterate quite a few times).
Doesn't:
amem += align - ((uintptr)amem & (align - 1));
get there in one operation?
Yes Mac OS X does have 16 Byte memory alignment in the ABI.
You should not need to use memalign(). If your memory requirements are a factor of 16 then I would not implement it and maybe just add an assert.
From the macosx man pages:
The malloc(), calloc(), valloc(),
realloc(), and reallocf() functions
allocate memory. The allocated memory is aligned such that it can be
used for any data type, including AltiVec- and SSE-related types. The free()
function frees allocations that were created via the preceding allocation
functions.
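The assert mentioned above could be as small as this. It is a sketch; is_aligned is a hypothetical helper, and align is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// True if `ptr` is a multiple of `align` (align must be a power of two).
inline bool is_aligned(const void *ptr, std::size_t align)
{
    return (reinterpret_cast<std::uintptr_t>(ptr) & (align - 1)) == 0;
}
```

After each allocation that relies on the platform guarantee, assert(is_aligned(p, 16)) documents the assumption and fails loudly on a platform where malloc does not provide it.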
If you need an arbitrarily aligned malloc, check out x264's malloc (common/common.c in the git repository), which has a custom memalign for systems without malloc.h. It's extremely trivial code, to the point where I would not even consider it copyrightable, but you should easily be able to implement your own after seeing it.
Of course, if you only need 16-byte alignment, as stated above, it's in the OS X ABI.
It might be worthwhile to use Doug Lea's malloc in your code.
Thanks for the help, guys... helped in my case (OpenCascade src/Image/Image_PixMap.cxx, OSX10.5.8 PPC)
Combined with the answers above, this might save someone some digging around or instill hope if not particularly familiar with malloc, etc.:
The rather large project I'm building only had one reference to posix_memalign, and it turns out it was the result of a bunch of preprocessor conditions that didn't include OSX but DID include BORLANDC, which confirms what others suggested about it being safe to use malloc in some cases:
#if defined(_MSC_VER)
return (TypePtr )_aligned_malloc (theBytesCount, theAlign);
#elif (defined(__GNUC__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 1)
return (TypePtr ) _mm_malloc (theBytesCount, theAlign);
#elif defined(__BORLANDC__)
return (TypePtr ) malloc (theBytesCount);
#else
void* aPtr;
if (posix_memalign (&aPtr, theAlign, theBytesCount))
{
aPtr = NULL;
}
return (TypePtr )aPtr;
#endif
So, it could be as simple as just using malloc, as suggested by others.
e.g. here: moving __BORLANDC__ condition above __GNUC__ and adding APPLE:
#elif (defined(__BORLANDC__) || defined(__APPLE__)) //now above `__GNUC__`
NOTE: I did NOT check that BORLANDC uses 16-byte alignment like someone above stated OS X does. Nor did I verify that PPC OS X does. However, this usage suggests that this alignment isn't particularly important. (Here's hoping it works, and that it could be that easy for you searchers, as well!)