OS X lacks memalign - C++

I'm working on a project in C and it requires memalign(). Really, posix_memalign() would do as well, but Darwin/OS X lacks both of them.
What is a good solution for shoehorning in memalign()? I don't understand the licensing implications of ripping memalign.c out of a POSIX C library and putting it in my project; I don't want any viral-type licensing LGPL-ing my whole project.

Mac OS X appears to be 16-byte aligned.
Quote from the website:
I had a hard time finding a definitive statement on Mac OS X memory alignment, so I did my own tests. On 10.4/Intel, both stack and heap memory is 16-byte aligned. So people porting software can stop looking for memalign() and posix_memalign(). It's not needed.

update: OSX now has posix_memalign()
Late to the party, but newer versions of OSX do have posix_memalign(). You might want this when aligning to page boundaries. For example:
#include <stdlib.h>
#include <unistd.h>   /* for sysconf() */

char *buffer;
long pagesize;

pagesize = sysconf(_SC_PAGE_SIZE);
if (pagesize == -1)
    handle_error("sysconf");

if (posix_memalign((void **)&buffer, pagesize, 4 * pagesize) != 0) {
    handle_error("posix_memalign");
}
One thing to note is that, unlike memalign(), posix_memalign() takes **buffer as an argument and returns an integer error code.
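If the code you are porting insists on calling memalign() itself, one option is a thin shim over posix_memalign(). This is only a minimal sketch under that assumption, and the shim name is made up:
```
#include <stdlib.h>

/* Hypothetical shim: emulate memalign() on top of posix_memalign().
 * The alignment must be a power of two and a multiple of sizeof(void *),
 * as posix_memalign() requires. */
static void *memalign_shim(size_t alignment, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;   /* posix_memalign() reports errors via its return value */
    return p;
}
```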

Should be easy enough to do yourself, no? Something like the following (not tested):
#include <stdint.h>   /* for uintptr_t */
#include <stdlib.h>

void *aligned_malloc( size_t size, int align )
{
    /* Over-allocate: alignment slack plus room to stash the original pointer. */
    void *mem = malloc( size + align + sizeof(void*) );
    if ( mem == NULL )
        return NULL;

    /* Step past the stored pointer, then round up to the next align boundary. */
    char *amem = ((char*)mem) + sizeof(void*);
    amem += align - ((uintptr_t)amem & (align - 1));

    /* Stash the original pointer just before the aligned block. */
    ((void**)amem)[-1] = mem;
    return amem;
}

void aligned_free( void *mem )
{
    free( ((void**)mem)[-1] );
}
(thanks Jonathan Leffler)
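A quick usage sketch of the helper above (my own example, not from the original answer):
```
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Request 1000 bytes aligned to a 64-byte boundary. */
    void *p = aligned_malloc(1000, 64);
    if (p != NULL) {
        printf("aligned pointer: %p (address mod 64 = %zu)\n",
               p, (size_t)((uintptr_t)p % 64));
        aligned_free(p);
    }
    return 0;
}
```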
Edit:
Regarding ripping off another memalign implementation, the problem with that is not really licensing. Rather, you'd run into the difficulty that any good memalign implementation is an integral part of its heap-manager codebase, not simply layered on top of malloc/free. So you'd have serious trouble transplanting it onto a different heap manager, especially when you have no access to its internals.

Why does the software you are porting need memalign() or posix_memalign()? Does it use it for alignments bigger than the 16-byte alignments referenced by austirg?
I see Mike F posted some code - it looks relatively neat, though I think the while loop in his original revision was sub-optimal (if the alignment required is 1 KB, it could iterate quite a few times).
Doesn't:
amem += align - ((uintptr_t)amem & (align - 1));
get there in one operation?

Yes, Mac OS X does have 16-byte memory alignment in the ABI.
You should not need to use memalign(). If your alignment requirements are a factor of 16, then I would not implement it and would maybe just add an assert.
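As a rough sketch of that assert idea (the helper name here is made up), relying on the ABI guarantee that malloc() hands back 16-byte-aligned blocks:
```
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate and assert the 16-byte alignment that the
 * Mac OS X ABI already guarantees for malloc(). */
void *checked_alloc(size_t size)
{
    void *p = malloc(size);
    assert(p == NULL || ((uintptr_t)p & 15) == 0);
    return p;
}
```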

From the Mac OS X man pages:
The malloc(), calloc(), valloc(), realloc(), and reallocf() functions allocate memory. The allocated memory is aligned such that it can be used for any data type, including AltiVec- and SSE-related types. The free() function frees allocations that were created via the preceding allocation functions.

If you need an arbitrarily aligned malloc, check out x264's malloc (common/common.c in the git repository), which has a custom memalign for systems without malloc.h. It's extremely trivial code, to the point where I would not even consider it copyrightable, and you should easily be able to implement your own after seeing it.
Of course, if you only need 16-byte alignment, then as stated above, it's in the OS X ABI.

It might be worthwhile to use Doug Lea's malloc (dlmalloc) in your code.

Thanks for the help, guys... this helped in my case (OpenCascade src/Image/Image_PixMap.cxx, OS X 10.5.8 PPC).
Combined with the answers above, this might save someone some digging around, or instill hope if you're not particularly familiar with malloc, etc.
The rather large project I'm building had only one reference to posix_memalign, and it turned out to be the result of a bunch of preprocessor conditions that didn't include OS X but DID include __BORLANDC__, which confirms what others suggested about it being safe to just use malloc in some cases:
#if defined(_MSC_VER)
  return (TypePtr )_aligned_malloc (theBytesCount, theAlign);
#elif (defined(__GNUC__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 1)
  return (TypePtr ) _mm_malloc (theBytesCount, theAlign);
#elif defined(__BORLANDC__)
  return (TypePtr ) malloc (theBytesCount);
#else
  void* aPtr;
  if (posix_memalign (&aPtr, theAlign, theBytesCount))
  {
    aPtr = NULL;
  }
  return (TypePtr )aPtr;
#endif
So, it could be as simple as just using malloc, as suggested by others.
For example, moving the __BORLANDC__ condition above __GNUC__ and adding __APPLE__:
#elif (defined(__BORLANDC__) || defined(__APPLE__)) // now above __GNUC__
NOTE: I did NOT check that Borland C uses 16-byte alignment the way someone above stated OS X does, nor did I verify that PPC OS X does. However, this usage suggests that the alignment isn't particularly important here. (Here's hoping it works, and that it could be that easy for you searchers as well!)

Related

std::aligned_alloc() never returns a null pointer. How?

I am using std::aligned_alloc() in one of my projects to allocate aligned memory for optimized PCIe read/write.
When I read about aligned_alloc from here, it says:
Defined in header <stdlib.h>
void *aligned_alloc( size_t alignment, size_t size );
Passing a size which is not an integral multiple of alignment or an alignment which is not valid or not supported by the implementation causes the function to fail and return a null pointer (C11, as published, specified undefined behaviour in this case, this was corrected by DR 460). Removal of size restrictions to make it possible to allocate small objects at restrictive alignment boundaries (similar to alignas) has been proposed by n2072.
From what I understood, now the only valid restriction is that the alignment parameter should be a valid alignment value (and a power of two). Fine. To get a valid alignment value, we can get the value of max_align_t.
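(For reference, that value can be queried like this; a tiny sketch of my own, not part of the original question:)
```
#include <cstddef>
#include <cstdio>

int main()
{
    // The strictest fundamental alignment the implementation supports.
    std::printf("max_align_t alignment: %zu\n", alignof(std::max_align_t));
}
```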
[My System RAM - 128 GB.
2 CPUs - AMD EPYC 7313 16-Core Processor. It is a server machine running Centos7 latest]
I now have a couple of doubts here:
In my system, for almost every combination of 'alignment value' and 'size', aligned_alloc() returns success. (Unless the alignment is some huge value). How is this possible? Is it implementation specific?
My code snippet:
```
#include <cstdlib>
#include <iostream>

void* a = aligned_alloc(64, 524288000);
if (a == nullptr)
    std::cout << "Failed" << std::endl;
else
    std::cout << "Success" << std::endl;
```
Here is what values I tried for aligned_alloc() and their results:
aligned_alloc(64, 524288000) - Success
aligned_alloc(4096, 524288000) - Success
aligned_alloc(64, 331) - Success
aligned_alloc(21312323, 889998) - Success
aligned_alloc(1, 331) - Success
aligned_alloc(0, 21) - Success
aligned_alloc(21312314341, 331); - Success
aligned_alloc(21312312243413, 331); - Failed
Please do comment if any more info is needed to clear the question.
Thanks
Glibc has this line of code https://github.com/lattera/glibc/blob/master/malloc/malloc.c#L3278
/* Make sure alignment is power of 2.  */
if (!powerof2 (alignment))
  {
    size_t a = MALLOC_ALIGNMENT * 2;
    while (a < alignment)
      a <<= 1;
    alignment = a;
  }
How is this possible?
(Weeeeellll, the fact that something is in the specification doesn't restrict reality.) There is just code that makes it possible. If you want to know what exactly happens, inspect the source code - glibc is open source.
Centos7 "latest" is quite old; I see glibc 2.17, which is from the year 2012 ( https://centos.pkgs.org/7/centos-x86_64/glibc-2.17-317.el7.x86_64.rpm.html and https://sourceware.org/glibc/wiki/Glibc%20Timeline ). DR 460 is from 2014. For that glibc, that DR does not exist, and we can say that glibc followed the C11 standard and the behavior is undefined.
Is it implementation specific?
"Implementation specific" is a... specific term used by standards to specify the behavior. In C11 the behavior is undefined. in C17 the behavior is that aligned_alloc should fail with invalid alignment. In real life, everything is implementation specific, as glibc comes with the implementation of aligned_alloc.
If you are wondering not about alignment, but why you can specify a size greater than your available RAM, then welcome to virtual memory. Malloc allocates memory more than RAM
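If you want to see what glibc actually hands back for one of those odd alignments, a small test of my own (not from the question, requires C++17) can print the effective alignment of the returned pointer:
```
#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main()
{
    // Deliberately pass a non-power-of-two alignment and see what comes back.
    void *p = std::aligned_alloc(21312323, 889998);
    if (p == nullptr) {
        std::puts("Failed");
        return 1;
    }
    // Largest power of two dividing the address, i.e. at least the
    // alignment glibc effectively used after rounding up.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::printf("address %p, effective alignment %zu\n",
                p, (size_t)(addr & (0 - addr)));
    std::free(p);
    return 0;
}
```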
Looks like you found a bug. The libc doesn't seem to fail as specified by the standard, but just gives you memory instead. Personally, I don't see anything wrong with 331 bytes aligned to a 64-byte boundary; it's just not something C/C++ ever produces on its own, because a struct with 64-byte alignment always has padding at the end up to a multiple of 64.
None of your allocations use a lot of RAM, half a gig at most, so you are not running out of memory.
As for why an insanely huge alignment works:
If the code isn't stupid, it will use mmap() with a fixed address to allocate memory to the closest page. So no matter the alignment, you should never have more than 2 * 4095 bytes wasted (assuming 4 kB pages; they could be 16 kB or 64 kB too).
And as KamilCuk pointed out:
https://github.com/lattera/glibc/blob/master/malloc/malloc.c#L3278
/* Make sure alignment is power of 2.  */
if (!powerof2 (alignment))
  {
    size_t a = MALLOC_ALIGNMENT * 2;
    while (a < alignment)
      a <<= 1;
    alignment = a;
  }
Seems like glibc will round the alignment up to the next power of 2, so all your huge odd numbers become multiples of the page size and waste even less. Although how that fulfills the standard, I don't know.
As for your last case: the address space of the architecture is only so big. You can see that in /proc/cpuinfo under Linux:
address sizes : 43 bits physical, 48 bits virtual
Relevant here is the 48 bits virtual. That goes from -128 TB to 128 TB, or 0 to 256 TB, depending on how you view the address space (signed or unsigned addresses). Either way, user space has a maximum of around 128 TB to work with. Your last alignment is ~19 TB, or 32 TB after rounding up. Looks like glibc isn't smart enough to mmap that properly, even though it is plenty small enough to fit.

new-Operator increases Arduino sketch size drastically - why?

While restructuring a part of my code into a class, I chose to change a statically sized array into a dynamic array and was shocked by the size of my sketch, which increased by ~579%!
Now there is some discussion going on about whether to use new or malloc(), but I did not find any hint about this massive increase in sketch size.
So, if anybody would like to explain where this huge increase is coming from, that would be great!
Further, if anybody knows of similar pitfalls, it would be very nice of you to share ;)
Here is some demo code to check for yourselves:
void setup() {
  // put your setup code here, to run once:
#define BUFLEN 8 * sizeof(char)
#define BUFVAL '5'
#define BUFARR {BUFVAL,BUFVAL,BUFVAL,BUFVAL,BUFVAL,BUFVAL,BUFVAL,BUFVAL,0}
#define MODE 2

  int i = 0;
  Serial.begin(115200);

#if (MODE == 0)
  // 10,772 bytes for an Arduino Due on IDE 1.57 -> 2% of total
  char empty_page[BUFLEN+1] = BUFARR;
#elif (MODE == 1)
  // 12,772 bytes for an Arduino Due on IDE 1.57 -> 2% of total, ~18.5% increase
  char *empty_page = (char *)malloc(BUFLEN+1);
  memset(empty_page, BUFVAL, BUFLEN);
  empty_page[BUFLEN] = '\0'; // NUL-terminate (last valid index is BUFLEN)
#elif (MODE == 2)
  // 73,152 bytes for an Arduino Due on IDE 1.57 -> 13% of total, ~579% increase
  char *empty_page = new char[BUFLEN+1]BUFARR;
#endif

  Serial.println("Result: ");
  for (i = 0; i < BUFLEN; i++) {
    Serial.print(empty_page[i]);
  }
  Serial.println("");

#if (MODE == 1)
  free(empty_page);
#elif (MODE == 2)
  delete[] empty_page;
#endif
}

void loop() {
  // put your main code here, to run repeatedly:
}
To check this code without arduino: http://ideone.com/e.js/bMVi0d
EDIT:
My understanding is that new leads the toolchain to compile in some large C++ support code in order to handle it. On the other hand, the verbose compiler output of the IDE is identical.
I am trying to minimize my sketches, and anybody else with this goal would surely be interested in the parts, like new, that you need to avoid in order to get a smaller sketch. This seems to be a general Arduino IDE thing, so there should be a more general explanation for it.
The new operator is essentially a type-safe version of malloc, meant to reduce errors in C++. You can see from the code here that new actually just calls malloc with a few bells and whistles added on. As for when to use new vs malloc, one great discussion can be found here, where the main verdict is that almost all C++ programs should use new. That being said, you do not need the extra bells and whistles for a char array, since you don't need to call a constructor for primitive types (calling constructors is one of the main jobs of the new operator, and primitive types don't have constructors at all). If memory is of dire concern, malloc is a perfectly acceptable solution. Your code with malloc looks perfectly fine and should work better for your purposes.
You're confusing a few parts. The "IDE" is the Integrated Development Environment. That's a GUI which acts as an integrated front end for a few tools, including the compiler and linker. This problem looks like a linker problem.
In particular, this looks like a very poor toolchain: it drags in about 60 kB while it should drag in nothing. Type safety is handled at compile time, so the compiler could simply have emitted a call to malloc instead of new[], as all the type checks pass.
You should understand that the Arduino is a cheap product and the tooling isn't exactly state of the art.

How do I count which function requests what number of bytes?

I have a complex code base in C++. I have run a memory profiler that counts the number of bytes allocated by malloc; this gives me X bytes. Theoretically, my code should request X-Y bytes (Y varies with the input and ranges from a few KB to a couple of GB, so it is not negligible).
I need to find out which part of my code is asking for the extra bytes. I've tried a few tools, but to no avail: massif, perf, even gdb breaking on malloc(). I could probably write a wrapper for malloc that reports the calling function, but I don't know how to do that.
Does anyone know a way to find how much memory different parts of the program are asking for?
If you use a custom allocation function - a wrapper around malloc - you can use the glibc backtrace functions (http://man7.org/linux/man-pages/man3/backtrace.3.html) to find out which functions call malloc with what arguments.
That'll tell you the functions which are allocating. From there you can probably sort the biggies into domains by hand.
This question has good info on the wrapping itself. Create a wrapper function for malloc and free in C
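As a rough sketch of that wrapper idea (the wrapper name is made up, and this assumes glibc's backtrace facilities from <execinfo.h>):
```
#include <execinfo.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical wrapper: record who asked for how many bytes.
void *my_malloc(std::size_t size)
{
    void *callers[8];
    int depth = backtrace(callers, 8);           // capture the call stack
    std::fprintf(stderr, "my_malloc(%zu) called from:\n", size);
    backtrace_symbols_fd(callers, depth, 2);     // 2 = stderr
    return std::malloc(size);
}
```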
Update:
This won't catch new/delete allocations, but overriding those is even easier than wrapping malloc! See here: How to properly replace global new & delete operators, plus the very important comment on the best answer: "Don't forget the other 3 versions: new[], delete[], nothrow".
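For illustration only, a minimal global operator new override that logs the requested size might look like this (a sketch, not tied to any particular codebase; real code would also override operator new[], the nothrow variants, and the matching deletes):
```
#include <cstdio>
#include <cstdlib>
#include <new>

// Log every allocation that goes through the global (throwing) operator new.
void *operator new(std::size_t size)
{
    std::fprintf(stderr, "operator new: %zu bytes\n", size);
    void *p = std::malloc(size);
    if (!p)
        throw std::bad_alloc();
    return p;
}

void operator delete(void *p) noexcept
{
    std::free(p);
}
```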
You can make a macro that calls the libc malloc and prints the details of the allocation.
#define _GNU_SOURCE        /* needed for RTLD_NEXT; define before any includes */
#include <dlfcn.h>
#include <stdio.h>

#define malloc( sz ) ({ \
    printf( "Allocating %zu bytes, file %s:%d\n", (size_t)(sz), __FILE__, __LINE__ ); \
    void *(*libc_malloc)(size_t) = (void *(*)(size_t))dlsym( RTLD_NEXT, "malloc" ); \
    void *mem = libc_malloc( (sz) ); \
    mem; /* GCC statement expression: its value is the last expression */ \
})
This should (touch wood) get called in lieu of the real malloc and spit out the number of bytes allocated and where the allocation occurred. Note that returning mem as the value of the block relies on a GCC-specific statement expression, though.

Is it possible to protect a region of memory from WinAPI?

Having read this interesting article outlining a technique for debugging heap corruption, I started wondering how I could tweak it for my own needs. The basic idea is to provide a custom malloc() that allocates whole pages of memory, then enables some memory-protection bits for those pages so that the program crashes when they get written to, and the offending write instruction can be caught in the act. The sample code is C under Linux (mprotect() is used to enable the protection), and I'm curious how to apply this to native C++ and Windows. VirtualAlloc() and/or VirtualProtect() look promising, but I'm not sure what a usage scenario would look like.
Fred *p = new Fred[100];
ProtectBuffer(p);
p[10] = Fred(); // like this to crash please
I am aware of the existence of specialized tools for debugging memory corruption in Windows, but I'm still curious if it would be possible to do it "manually" using this approach.
EDIT: Also, is this even a good idea under Windows, or just an entertaining intellectual exercise?
Yes, you can use VirtualAlloc and VirtualProtect to set up sections of memory that are protected from read/write operations.
You would have to re-implement operator new and operator delete (and their [] relatives), such that your memory allocations are controlled by your code.
And bear in mind that it would only be on a per-page basis, and you would be using (at least) three pages worth of virtual memory per allocation - not a huge problem on a 64-bit system, but may cause problems if you have many allocations in a 32-bit system.
Roughly what you need to do is the following (you should really look up the page size for your build of Windows - I'm too lazy, so I'll use 4096 and 4095 to represent pagesize and pagesize-1 - and you will also need to do more error checking than this code does!!!):
#include <windows.h>

void *operator new(size_t size)
{
    // Round the size up to whole pages, plus 2 extra guard pages.
    size_t bigsize = (size + 2*4096 + 4095) & ~4095;

    // Reserve the whole region with no access rights.
    void *addr = VirtualAlloc(NULL, bigsize, MEM_RESERVE, PAGE_NOACCESS);

    // Skip the first (still protected) page, then commit only the usable part.
    addr = reinterpret_cast<void *>(reinterpret_cast<char *>(addr) + 4096);
    void *new_addr = VirtualAlloc(addr, size, MEM_COMMIT, PAGE_READWRITE);
    return new_addr;
}

void operator delete(void *ptr)
{
    // Step back to the base of the original reservation and release it.
    char *tmp = reinterpret_cast<char *>(ptr) - 4096;
    VirtualFree(reinterpret_cast<void*>(tmp), 0, MEM_RELEASE);
}
Something along those lines. As I said, I haven't tried compiling this code, as I only have a Windows VM and I can't be bothered to download a compiler and see if it actually compiles. [I know the principle works, as we did something similar where I worked a few years back.]
This is what Guard Pages are for (see this MSDN tutorial): they raise a special exception when the page is accessed for the first time, allowing you to do more than crash on the first invalid page access (and to catch bad reads/writes, as opposed to just NULL pointer dereferences).
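A small sketch of flagging an already-committed page as a guard page (the function name echoes the ProtectBuffer from the question and is only illustrative):
```
#include <windows.h>

// Mark the page containing 'page' as a guard page. The next access raises
// STATUS_GUARD_PAGE_VIOLATION, after which Windows clears the guard flag.
bool ProtectBuffer(void *page)
{
    DWORD oldProtect = 0;
    return VirtualProtect(page, 1, PAGE_READWRITE | PAGE_GUARD, &oldProtect) != 0;
}
```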

C++, function that checks if software runs on 32-bit or 64-bit system

As I explained in many questions, I'm trying to move a piece of software from a 32-bit system to a 64-bit system.
I had some problems with the malloc() function, but I have now solved them by correcting a parameter.
In that part of my code, if I run on a 32-bit system, I can use:
(int**) malloc (count * sizeof(int))
But, on a 64-bit system, I have to use:
(int**) malloc (count * sizeof(int64_t))
I'd like to manage this crossroads with an if() condition, so I need a boolean isIt64system() function that behaves this way:
if(isIt64system()) then [64-bit code]
else [32-bit code]
Does this function exist in C++?
Is there any function that tells me whether the software is running on a 32-bit or a 64-bit system?
Rather than writing two size-dependent branches, just write one correct, portable code path. In your case:
(int**)malloc(count*sizeof(int*));
This will work correctly regardless of the size of int* on your system.
Postscript: As you can see from this literal answer to your question, you are better off not having an if:
if (sizeof(int*) == sizeof(int))
    x = (int**)malloc(count*sizeof(int));
else if (sizeof(int*) == sizeof(int64_t))
    x = (int**)malloc(count*sizeof(int64_t));
Hopefully you can see how absurdly redundant that code is, and how it should be replaced by a single well-constructed malloc call.
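To make the point concrete, a minimal sketch of that single well-constructed call (the variable names are illustrative):
```
#include <cstdlib>

int main()
{
    const std::size_t count = 100;   // however many pointers you need

    // sizeof *x is always the size of one element (an int*),
    // whether you compile for 32-bit or 64-bit.
    int **x = (int**)std::malloc(count * sizeof *x);

    std::free(x);
    return 0;
}
```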
Your compiler will have preprocessor defines that will let you check 32-bit versus 64-bit.
The best approach would be to use something like this:
#ifdef __LP64__
<64bit code>
#else
<32bit code>
#endif
But if you really need a function for it then this should work.
bool is64bit() {
    if (sizeof(int*) == 4) {
        return false;
    }
    return true;
}
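A quick usage example (my own addition; note this reports the pointer width the program was compiled with, not the OS itself):
```
#include <cstdio>

int main()
{
    std::printf("Running as a %s build\n", is64bit() ? "64-bit" : "32-bit");
    return 0;
}
```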