So one major problem I have is determining whether a given function in C/C++ does memory allocation or not. I frequently work with external libraries, some of which have functions that return pointers to new objects. Is there some basic design paradigm or convention that will let me know ahead of time if something allocates memory?
It would seem like any function that returns a pointer to a new object must be allocating memory, but this does not always seem to be the case. For example, fopen does not.
Edit: To be clear, I don't have access to the source code so I can't just check if it uses new or malloc, etc.
Read the documentation of all libraries you use. They should tell you if certain things should be freed.
If the libraries are well documented (as the standard libraries are), the documentation should state something along the lines of "caller must free" among the function's postconditions or side effects.
For C++ anything that calls new or new [] allocates memory. So a function does if it calls those or calls any function (that calls any function.... and so on) that calls new.
The same in C except the calls are malloc, calloc and family.
The best, or at least simplest, solution is the documentation, of course.
But if you want to be sure that a function doesn't use malloc, you can wrap malloc (and its friends calloc, realloc and eventually free) to gather some stats.
Writing wrappers is quite simple, at least if you can use dlsym(3) (sorry, I don't know the Windows way of doing that). Here is the code for malloc:
#define _GNU_SOURCE // needed for RTLD_NEXT on glibc
#include <dlfcn.h>
#include <stddef.h>

// Global counters (described below the code)
static struct { size_t nmalloc, smalloc; } stat;

void *malloc(size_t s) {
    // Retrieve the pointer to libc's malloc.
    // A static variable avoids repeating the lookup on every call.
    static void *(*real_malloc)(size_t) = NULL;
    if (!real_malloc) real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    stat.nmalloc += 1; // count malloc calls
    stat.smalloc += s; // count malloced size
    // You can also print malloc's parameters directly,
    // but first check that the stdio functions
    // don't use malloc, or write your own printer
    return real_malloc(s);
}
In my example, I use a static global struct to store the number of calls to each function and the sum of the sizes requested. The wrapper code is put in a small lib that you can link with your test code (or, if you print the statistics directly, you can use LD_PRELOAD).
The result is interesting. For example, you said that fopen doesn't use malloc; with this kind of trick you can see that this is false. On a recent 64-bit Linux system, for example, I got one malloc call for 568 bytes when using fopen ([edit] of course the free is done in fclose).
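For C++ libraries, the same counting idea can be sketched by replacing the global operator new instead of malloc. This is only a sketch under assumptions: the names g_new_calls, g_new_bytes and count_new_calls are made up for illustration, and it only sees allocations that go through operator new (code that calls malloc directly is not counted).

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters, analogous to the stat struct in the malloc wrapper.
static std::size_t g_new_calls = 0;
static std::size_t g_new_bytes = 0;

// Replacement for the global allocation function. The default operator new[]
// forwards to this one, so array allocations are counted as well.
void *operator new(std::size_t size) {
    ++g_new_calls;
    g_new_bytes += size;
    if (void *p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}

void operator delete(void *p) noexcept { std::free(p); }
void operator delete(void *p, std::size_t) noexcept { std::free(p); }

// Returns how many times operator new ran while fn executed.
template <typename Fn>
std::size_t count_new_calls(Fn fn) {
    const std::size_t before = g_new_calls;
    fn();
    return g_new_calls - before;
}
```

Wrapping a call to the library function in count_new_calls then tells you whether it allocated through new during that call, without needing its source.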
I'm implementing a C API that exposes some data collected from a systemd service. The header of the API contains two structs:
typedef struct Info {
int latest;
int prev;
int cur;
} Info;
typedef struct InfoArray {
Info *info;
} InfoArray;
The implementation of this C header is written in C++ and I plan to offer a create(size_t size) function to let the user create a new struct with an array of the user entered size. This returned array can then later be passed by reference to another function to fill it with data until the array has reached the size defined by the user.
How would I achieve this?
Edit:
I'm looking for a way to return the allocated array of size n, however I tried the following:
InfoArray *create(size_t size) {
InfoArray *array = (InfoArray *)malloc(size * sizeof(Info));
return array;
}
which should return an allocated array, but when trying to delete the array on the client side by calling delete info I get an error that I cannot free this handle. So how would I let the consumer of the API manage the deletion of the memory?
Since the implementation is written in a .cpp file (C++), I figured there must be a way to return it by using the new operator, but I've failed to implement it like this.
When using malloc(), to clear the object you have to use free(). This is the C way of doing things.
When using C++, use new and delete instead, like this:
InfoArray *array = new InfoArray[size];
However, when going with C++ I would urge you to just dump the InfoArray thing and roll with something like std::vector<Info>.
Edit:
The implementation of this C header is written in C++
This doesn't make sense. It's either a C unit or a C++ unit. Those are two separate languages; you only get to pick one. Similar syntax and good interoperability do not change that.
I see "I'm implementing a C API", so the interface is C, I understand.
I see "implementation of this C header is written in C++" so the code behind the interface is C++ by your design.
I see "trying to delete the array on the client side by calling delete info", so the calling code is also C++.
Strictly speaking, it would be imaginable to use something with a C interface from C++ and to even implement it in C++ with a C interface. I just really hope it is not what OP wants.
But whatever comes out of a C interface will never be accepted by delete, the C++ code which uses the C interface has to accept that it is a C interface and use free() on the malloced pointer.
That is why I recommend to redesign the interface. Either drop the C interface between two pieces of C++ in favor of a C++ interface. Or go with the C interface and drop the idea to use delete.
how would i let the consumer of the API manage the deletion of the memory?
Problems with freeing are one of the reasons why the C idiom is for the caller to provide a buffer:
int fillInfo(InfoArray *array, size_t arraySize) {
    /* memcpy up to arraySize entries into the caller's array ... */
    return countCopied;
}
In this approach the API never allocates anything, it only copies the data to provided buffers. Just like strcpy. The huge upside of this approach is that client can use whatever allocation methods they deem appropriate at the moment.
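A minimal sketch of that caller-provides-the-buffer idiom, assuming a slightly different signature than the fragment above (it takes a plain Info array and returns a count). The kCollected table is made-up placeholder data standing in for whatever the service actually collected.

```cpp
#include <stddef.h>
#include <string.h>

extern "C" {
typedef struct Info {
    int latest;
    int prev;
    int cur;
} Info;

// Copies at most `capacity` records into caller-owned storage and
// returns how many were written. Never allocates.
size_t fillInfo(Info *out, size_t capacity);
}

// Placeholder for the collected data (illustrative values only).
static const Info kCollected[] = {{3, 2, 1}, {6, 5, 4}, {9, 8, 7}};

extern "C" size_t fillInfo(Info *out, size_t capacity) {
    const size_t available = sizeof kCollected / sizeof kCollected[0];
    const size_t n = capacity < available ? capacity : available;
    if (out == NULL || n == 0) return 0;
    memcpy(out, kCollected, n * sizeof(Info));
    return n;
}
```

The client can then use a stack array, a std::vector<Info>, or malloc'ed storage as it sees fit, and the library never needs to know which.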
If you really must return a new buffer then provide a pair of methods:
InfoArray *create(size_t size);
void release(InfoArray *array);
This way clients neither know nor care how the objects are allocated and freed. This is useful when you want to restrict how the client allocates the memory, but note that the client is still responsible for memory management; it's just limited to one option. And it still doesn't guarantee the client won't pass in memory allocated in unapproved ways.
Keeping allocation and freeing on 2 sides of the demarcation line between client and library is asking for trouble. You're hiding from the client author how the memory was allocated, and yet you require them to use correct deallocation method. Which is precisely the culprit here.
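Here is one way the create/release pair could be implemented in C++ behind the C header. Treat it as a sketch: the size member added to InfoArray is my assumption (the original struct only has the pointer), and nothrow new is used so that no exception can escape through the C interface.

```cpp
#include <stddef.h>
#include <new>

extern "C" {
typedef struct Info {
    int latest;
    int prev;
    int cur;
} Info;

typedef struct InfoArray {
    Info *info;
    size_t size; // assumption: not part of the original struct
} InfoArray;

InfoArray *create(size_t size);
void release(InfoArray *array);
}

extern "C" InfoArray *create(size_t size) {
    InfoArray *array = new (std::nothrow) InfoArray;
    if (!array) return nullptr;
    array->info = new (std::nothrow) Info[size](); // zero-initialized records
    if (!array->info) {
        delete array;
        return nullptr;
    }
    array->size = size;
    return array;
}

// The only correct way to dispose of something returned by create().
extern "C" void release(InfoArray *array) {
    if (!array) return;
    delete[] array->info;
    delete array;
}
```

The client never calls free or delete on the result; it hands the pointer back to release, so allocation and deallocation stay on the same side of the boundary.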
I am fairly new to C++ and working with DLLs. I have a main application that aggregates results from different measurements. As the measurements are different from case to case I decided to put them into external DLLs so they can be loaded at runtime (they simply all export the same function). The idea is to just load them like this so the aggregator can be extended depending on the runtime needs:
typedef int (*measure)(measurement &dataHolder);
int callM() {
[...]
measurement dataHolder;
lib = LoadLibraryA("measureDeviceTypeA.dll");
measure measureFunc = (measure)GetProcAddress(lib, "measureFunc");
measureFunc(dataHolder);
[...] // close the lib and load the next one depending on found Devices
}
This works pretty well for simple datatypes (depending on the actual definition of the struct "measurement") such as this:
typedef struct measurement {
DWORD realPBS;
DWORD imaginaryPBS;
int a;
} measurement;
Now there also may be a string of arbitrary length (char representations of results). I would like to put them into the measurement struct as well and fill them inside the actual worker function inside the DLL. My first assumption was that it would be easy to just use std::string, which works sometimes and sometimes not (as it will reallocate memory on std::string().append() and this might break (access violation) depending on the actual runtime environment of the program and the dll). I read here and here that returning a string from a function is a bad idea.
So what would be the "proper" C++ way of returning arbitrary length strings from such a call? Is it helpful at all to pass a struct to the DLL or should I split it into separate calls? I don't want to have pointers dangling around or unfreed memory when I close the DLL again.
This won't work with std::string, as noted by Dani in the comments. The problem is that std::string is a type that belongs to your implementation, and different C++ implementations have different std::strings.
For DLLs specifically (Microsoft), you do have another alternative. COM is an ancient technology, but it still works today and is unlikely to ever go away. And it has its own string type, BSTR. Visual Studio provides a helper C++ class _bstr_t for your own code, but on the interface you'd use the plain BSTR from _bstr_t::GetBSTR.
BSTR relies on the Windows allocator SysAllocString from OleAut32.dll.
The problem is, that the string data is often allocated on the heap, so it has to be freed / managed somehow.
You could think, hey std::string is returned by value - so why I need to care about memory management. The problem is that usually only very small strings are stored "inside" the class. For larger strings the string class contains a pointer to some "heap-storage".
DLLs can be used from and with different programming languages, which is why DLLs do not necessarily share a memory manager: freeing in one module what another module allocated can fail.
To solve this you need two function calls, one that returns a pointer or handle to the data and one that frees it. Alternatively, the caller can give the callee a pointer to where it wants the data stored; in that case you also need to pass a maximum byte count.
As you can see, there are some reasons why you should avoid these APIs, but that is not always possible. See for example the Windows API (there you can find both approaches).
Another approach would be to ensure a shared memory manager, but this is tricky because it must be set up really early!
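The variant where the caller supplies the storage plus a maximum byte count could look roughly like this. The function name measureText and the result text are invented for illustration; the return-the-full-length convention is borrowed from snprintf so the caller can detect truncation.

```cpp
#include <stddef.h>
#include <string.h>

// Hypothetical DLL export: copies a NUL-terminated result into `buffer`
// (at most `capacity` bytes including the terminator) and returns the
// full length of the text, whether or not it fit.
extern "C" size_t measureText(char *buffer, size_t capacity) {
    static const char text[] = "PBS: 12.5 + 3.0i"; // made-up result
    const size_t needed = sizeof text - 1;
    if (buffer != NULL && capacity > 0) {
        const size_t n = needed < capacity - 1 ? needed : capacity - 1;
        memcpy(buffer, text, n);
        buffer[n] = '\0';
    }
    return needed;
}
```

On the application side the bytes can be copied into a std::string that the application itself owns, so no std::string object and no heap pointer ever crosses the DLL boundary.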
Note: when I say "static string" here I mean memory that can not be handled by realloc.
Hi, I have written a procedure that takes a char * argument and I would like to create a duplicate IF the memory is not relocatable/resizable via realloc. As is, the procedure is a 'heavy' string processor, so being ignorant and duplicating the string whether or not it is static will surely cause some memory overhead/processing issues in the future.
I have tried to use exception handlers to modify a static string, the application just exits without any notice. I step back, look at C and say: "I'm not impressed." That would be an exception if I have ever heard of one.
I tried to use exception handlers to call realloc on a static variable... Glib reports that it can't find some private information to a structure (I'm sure) I don't know about and obviously calls abort on the program, which means it's not an exception that can be caught with longjmp/setjmp OR C++ try, catch, finally.
I'm pretty sure there must be a way to do this reasonably. For instance dynamic memory most likely is not located anywhere near the static memory so if there is a way to divulge this information from the address... we might just have a bingo..
I'm not sure if there are any macros in the C/C++ Preprocessors that can identify the source and type of a macro argument, but it would be pretty dumb if it didn't. Macro Assemblers are pretty smart about things like that. Judging by the lack of robust error handling, I would not be a bit surprised if it did not.
C does not provide a portable way to tell statically allocated memory blocks from dynamically allocated ones. You can build your own struct with a string pointer and a flag indicating the type of memory occupied by the object. In C++ you can make it a class with two different constructors, one per memory type, to make your life easier.
As far as aborting your program goes, trying to free or re-allocate memory that has not been allocated dynamically is undefined behavior, so aborting is fair game.
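The struct-with-a-flag idea might be sketched like this. The names TaggedString, from_static, from_heap and append are all hypothetical, and I use two factory functions rather than two constructors to keep it short. The sketch also assumes suffix does not point into the string being grown.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct TaggedString {
    char *data;
    bool heap; // true only if data came from malloc/realloc/strdup
};

inline TaggedString from_static(const char *s) {
    return {const_cast<char *>(s), false};
}
inline TaggedString from_heap(char *s) { return {s, true}; }

// Appends suffix. Static storage is copied to the heap first; heap storage
// is grown with realloc. Returns false (ts unchanged) if allocation fails.
inline bool append(TaggedString &ts, const char *suffix) {
    const std::size_t old_len = std::strlen(ts.data);
    const std::size_t add_len = std::strlen(suffix);
    char *grown = static_cast<char *>(
        ts.heap ? std::realloc(ts.data, old_len + add_len + 1)
                : std::malloc(old_len + add_len + 1));
    if (!grown) return false;
    if (!ts.heap) std::memcpy(grown, ts.data, old_len);
    std::memcpy(grown + old_len, suffix, add_len + 1);
    ts.data = grown;
    ts.heap = true;
    return true;
}
```

The flag is only as trustworthy as whoever sets it, which is the point of the answer: the knowledge has to be carried along explicitly because it cannot be recovered from the pointer.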
You may be able to detect ranges of memory and do some pointer comparisons. I've done this in some garbage collection code, where I need to know whether a pointer is in the stack, heap, or elsewhere.
If you control all allocation, you can simply keep min and max bounds based on every dynamic pointer that ever came out of malloc, calloc or realloc. A pointer lower than min or greater than max is probably not in the heap, and this min and max delimited region is unlikely to intersect with any static area, ever. If you know that a pointer is either static or it came from malloc, and that pointer is outside of the "bounding box" of malloced storage, then it must be static.
There are some "museum" machines where that sort of stuff doesn't work and the C standard doesn't give a meaning to comparisons of pointers to different objects using the relational operators, other than exact equality or inequality.
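A rough sketch of the bounding-box bookkeeping, assuming every dynamic allocation in the program goes through one wrapper (tracked_malloc and maybe_heap are names I made up). The comparison is done on uintptr_t values rather than on the pointers themselves, which sidesteps the relational-operator issue at the cost of relying on the implementation-defined pointer-to-integer mapping.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Lowest and one-past-highest address ever handed out by tracked_malloc.
static std::uintptr_t g_heap_lo = UINTPTR_MAX;
static std::uintptr_t g_heap_hi = 0;

void *tracked_malloc(std::size_t n) {
    void *p = std::malloc(n);
    if (p) {
        const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(p);
        const std::uintptr_t hi = lo + n;
        if (lo < g_heap_lo) g_heap_lo = lo;
        if (hi > g_heap_hi) g_heap_hi = hi;
    }
    return p;
}

// false: definitely not from tracked_malloc. true: inside the box, so
// probably (not certainly) dynamic.
bool maybe_heap(const void *p) {
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    return a >= g_heap_lo && a < g_heap_hi;
}
```

calloc and realloc would need the same treatment, and the answer is only a heuristic: anything else that happens to live between two heap blocks will also report true.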
Any solution you would get would be platform specific, so you might want to specify the platform you are running on.
As for why a library should call abort when you pass it unexpected parameters, that tends to be safer than continuing execution. It's more annoying, certainly, but at that point the library knows that the code calling into it is in a state that cannot be recovered from.
I have written a procedure that takes a char * argument and I would like to create a duplicate IF the memory is not relocatable/resizable via realloc.
Fundamentally, the problem is that you want to do memory management based on information that isn't available in the scope you're operating in. Obviously you know if the string is on the stack or heap when you create it, but that information is lost by the time you're inside your function. Trying to fix that is going to be nearly impossible and definitely outside of the Standard.
I have tried to use exception handlers to modify a static string, the application just exits without any notice. I step back, look at C and say: "I'm not impressed." That would be an exception if I have ever heard of one.
As already mentioned, C doesn't have exceptions. C++ could do this, but the C++ Standards Committee believes that having C functions behave differently in C++ would be a nightmare.
I'm pretty sure there must be a way to do this reasonably.
You could have your application replace the default stack with one you created (and, as such, know the range of addresses in) using ucontext.h or Windows Fibers, and check if the address is inside that range. However, (1) this puts a huge burden on any application using your library (of course, if you wrote the only application using your library, then you may be willing to accept that burden); and (2) it doesn't detect memory that can't be realloced for other reasons (allocated using static, allocated using a custom allocator, allocated using SysAlloc or HeapAlloc on Windows, allocated using new in C++, etc.).
Instead, I would recommend having your function take a function pointer that would point at a function used to reallocate the memory. If the function pointer is NULL, then you duplicate the memory. Otherwise, you call the function.
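Something along these lines, as a sketch. The names resize_fn and appendText are hypothetical; the only convention taken from the suggestion above is that a NULL function pointer means "duplicate instead of resizing".

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Same shape as realloc. NULL means "s cannot be resized in place".
typedef void *(*resize_fn)(void *, std::size_t);

// Returns a string holding s followed by suffix, or NULL on failure.
// With a resizer, s is handed to it (on success the old pointer must no
// longer be used). Without one, s is left untouched and a fresh malloc'ed
// copy is returned.
char *appendText(char *s, const char *suffix, resize_fn resize) {
    const std::size_t len = std::strlen(s);
    const std::size_t add = std::strlen(suffix);
    char *out = static_cast<char *>(resize ? resize(s, len + add + 1)
                                           : std::malloc(len + add + 1));
    if (!out) return NULL;
    if (!resize) std::memcpy(out, s, len);
    std::memcpy(out + len, suffix, add + 1);
    return out;
}
```

A caller holding a literal or a stack buffer passes NULL; a caller holding malloc'ed memory passes realloc, and one with a custom allocator passes its own resize function.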
original poster here. I neglected to mention that I have a working solution to the problem, it is not as robust as I would have hoped for. Please do not be upset, I appreciate everyone participating in this Request For Comments and Answers. The 'procedure' in question is variadic in nature and expects no more than 63 anonymous char * arguments.
What it is: a multiple string concatenator. It can handle many arguments but I advise the developer against passing more than 20 or so. The developer never calls the procedure directly. Instead a macro known as 'the procedure name' passes the arguments along with a trailing null pointer, so I know when I have met the end of statistics gathering.
If the function receives only two arguments, I create a copy of the first argument and return that pointer. This is the string literal case. But really all it is doing is masking strdup.
Failing the single valid argument test, we proceed to realloc and memcpy, using record info from a static database of 64 records containing each pointer and its strlen, each time adding the size of the memcpy to a secondary pointer (memcpy destination) that began as a copy of the return value from realloc.
I've written a second macro with an appendage of 'd' to indicate that the first argument is not dynamic, therefore a dynamic argument is required, and that macro uses the following code to inject a dynamic argument into the actual procedure call as the first argument:
strdup("")
It is a valid memory block that can be reallocated. Its strlen returns 0 so when the loop adds the size of it to the records, it affects nothing. The null terminator will be overwritten by memcpy. It works pretty damned well I should say. However being new to C in only the past few weeks, I didn't understand that you can't 'fool proof' this stuff. People follow directions or wind up in DLL hell I suppose.
The code works great without all of these extra shenanigans, do-hickies and whistles, but without a way to reciprocate a single block of memory, the procedure is lost on loop processing, because of all the dynamic pointer management involved. Therefore the first argument must always be dynamic. I read somewhere someone had suggested using a c-static variable holding the pointer in the function, but then you can't use the procedure to do other things in other functions, such as would be needed in a recursive descent parser that decided to compile strings as it went along.
If you would like to see the code just ask!
Happy Coding!
mkstr.cpp
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
struct mkstr_record {
size_t size;
void *location;
};
// use the mkstr macro (in mkstr.h) to call this procedure.
// The first argument to mkstr MUST BE dynamically allocated. i.e.: by malloc(),
// or strdup(), unless that argument is the sole argument to mkstr. Calling mkstr()
// with a single argument is functionally equivalent to calling strdup() on the same
// address.
char *mkstr_(char *source, ...) {
va_list args;
size_t length = 0, item = 0;
mkstr_record list[64]; /*
maximum of 64 input vectors. this goes beyond reason!
the result of this procedure is a string that CAN be
concatenated by THIS procedure, or further more reallocated!
We could probably count the arguments and initialize properly,
but this function shouldn't be used to concatenate more than 20
vectors per call. Unless you are just "asking for it".
In any case, develop a workaround. Thank yourself later.
*/// Argument Range Will Not Be Validated. Caller Beware!!!
va_start(args, source);
char *thisArg = source;
while (thisArg) {
// don't validate list bounds here.
// an if statement here is too costly
// for the meager benefit it can provide.
length += list[item].size = strlen(thisArg);
list[item].location = thisArg;
thisArg = va_arg(args, char *);
item++;
}
va_end(args);
if (item == 1) return strdup(source); // single argument: fail-safe
length++; // final zero terminator index.
char *str = (char *) realloc(source, length);
if (!str) return str; // don't care. memory error. check your work.
thisArg = (str + list[0].size);
size_t count = item;
for (item = 1; item < count; item++) {
memcpy(thisArg, list[item].location, list[item].size);
thisArg += list[item].size;
}
*(thisArg) = '\0'; // terminate the string.
return str;
}
mkstr.h
#ifndef MKSTR_H_
#define MKSTR_H_
extern char *mkstr_(char *string, ...);
// This macro ensures that the final argument to "mkstr" is null.
// arguments: const char *, ...
// limitation: 63 variable arguments max.
// stipulation: caller must free returned pointer.
#define mkstr(str, args...) mkstr_(str, ##args, NULL)
#define mkstrd(str, args...) mkstr_(strdup(str), ##args, NULL)
/* calling mkstr with more than 64 arguments should produce a segmentation fault
* this is not a bug. it is intentional operation. The price of saving an in loop
* error check comes at the cost of writing code that looks good and works great.
*
* If you need a babysitter, find a new function [period]
*/
#endif /* MKSTR_H_ */
Don't forget to mention me in the credits. She's fine and dandy.
1) Is MMGR thread safe?
2) I was hoping someone could help me understand some code. I am looking at something where a macro is used, but I don't understand the macro. I know it contains a function call and an if check, however, the function is a void function. How does wrapping "(m_setOwner(__FILE__,__LINE__,__FUNCTION__),false)" ever change return types?
#define someMacro (m_setOwner(__FILE__,__LINE__,__FUNCTION__),false) ? NULL : new ...
void m_setOwner(const char *file, const unsigned int line, const char *func);
3) What is the point of the reservoir?
4) On line 770 ("void *operator new(size_t reportedSize)") there is the line
"// ANSI says: allocation requests of 0 bytes will still return a valid value"
Who/what is ANSI in this context? Do they mean the standards?
5) This is more of C++ standards, but where does "reportedSize" come from for "void *operator new(size_t reportedSize)"?
6) Is this the code that is actually doing the allocation needed?
"au->actualAddress = malloc(au->actualSize);"
1) The C++03 standard does not mention threads. However, in all platforms with thread support I know of, the default memory allocator (new and delete) is thread safe.
Edit: In general, if things are not marked as thread-safe, you should assume they aren't, especially when there is implicit global data (such as heap management structures in a memory manager). I've read some comments on another forum about this MMGR library not being thread safe.
2) The comma operator in the macro discards the result on the left, so the result of the (m_setOwner(...), false) expression is always false.
Edit: This syntax is used in MMGR to log the memory allocation before proceeding to the real allocation. The comma operator is used so that the new macro syntax is unchanged. Pre-processor macros are a simple text-based find-and-replace mechanism. Any use of new in your code will compile with or without this MMGR library. Just that, when using MMGR, the memory allocation will be logged, which is useful for debugging!
3) What "reservoir"? Are you referring to the heap? Where did you get this term from?
Edit: The memory manager at the application level is just a front-end to the memory manager at the system level. Hence, it must ask the system to allocate large pages of memory. The reservoir, in this case, seems to be the name for the mechanism that pre-allocates some of those large pages such that the next few allocations are guaranteed to succeed. This is mainly an optimization, as you amortize the cost of a single (expensive) system-level allocation over several application-level allocations.
4) Yes, "ANSI", in this context, refers to the C++03 standard. The proper way to refer to it now is to use the ISO standard number. Feel free to Google it.
5) The reported size is set by the compiler. When you write something like X* x = new X(...); the compiler logically "rewrites" this to the equivalent form:
X* x = static_cast<X*>(operator new(sizeof(X)));
new(x) X(...);
The first line allocates enough memory (sizeof(X) is the value passed as the reportedSize argument to operator new). The second line invokes the constructor of the X class to create an object in the allocated slot of memory.
6) See #5. Yes, you can think of it in these terms, although your platform will likely not call malloc() in operator new in "release" mode.
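To make answer 2 concrete, here is a small sketch of the comma-operator trick. It is not MMGR's actual code: setOwner, g_ownerFile, g_ownerLine and TRACKED_NEW are stand-in names I chose for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the library's bookkeeping.
static const char *g_ownerFile = nullptr;
static unsigned g_ownerLine = 0;

void setOwner(const char *file, unsigned line) {
    g_ownerFile = file;
    g_ownerLine = line;
}

// (setOwner(...), false) runs the void call, discards its (non-)result and
// yields false, so the conditional always takes the `new` branch. Whatever
// follows the macro in the source becomes the operand of that `new`.
#define TRACKED_NEW (setOwner(__FILE__, __LINE__), false) ? nullptr : new
```

Writing int *p = TRACKED_NEW int(5); expands to a conditional expression whose value is the result of new int(5), with the call site recorded as a side effect. The void return type of the logging function never matters because the comma operator throws the left operand away.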
I am working on a library that supports multiple programming environments such as VB6 and FoxPro. I have to stick with C conventions as they are the lowest common denominator. Now I have a question regarding style.
Suppose that the function processes input and returns a string. During processing, an error can happen. The currently proposed style is this:
int func(input params... char* buffer, unsigned int* buffer_size);
The good thing about this style is that everything is included in the prototype, including the error code, and memory allocation can be avoided. The problem is that the function is quite verbose, and because buffer_size can be anything, it requires more code to implement.
Another option is to return char*, and return NULL to indicate error:
char* func(input params...);
This style requires the caller to free the buffer. Memory allocation is required, so a server program could face memory fragmentation issues.
A variant of the second option is to use a thread local variable to hold the returned pointer char*, so that the user does not need to delete the buffer.
Which style do you like? And reason?
I'm a bit "damaged goods" when it comes to this subject. I used to design and maintain fairly large APIs for embedded telecom. A context where you cannot take anything for granted. Not even things like global variables or TLS. Sometimes even heap buffers show up that actually are addressed ROM memory.
Hence, if you're looking for a "lowest common denominator", you might also want to think about what language constructs are available in your target environment (the compiler is likely to accept anything within standard C, but if something is unsupported the linker will say no).
Having said that, I would always go for alternative 1. Partly because (as others have pointed out), you should never allocate memory for the user directly (an indirect approach is explained further down). Even if the user is guaranteed to work with pure and plain C, they still might for instance use their own customized memory management API for tracking leaks, diagnostic logging etc. Support for strategies like that is commonly appreciated.
Error communication is one of the most important things when dealing with an API. Since the user probably has distinct ways to handle errors in their code, you should be as consistent as possible about this communication throughout the API. The user should be able to wrap error handling towards your API in a consistent way and with minimum code. I would generally always recommend using clear enum codes or defines/typedefs. I personally prefer typedef'ed enums:
typedef enum {
RESULT_ONE,
RESULT_TWO
} RESULT;
..because it provides type/assignment safety.
Having a get-last-error function is also nice (requires central storage however), I personally use it solely for providing extra information about an already recognized error.
The verbosity of alternative 1 can be limited by making simple compounds like this:
struct Buffer
{
unsigned long size;
char* data;
};
Then your api might look better:
ERROR_CODE func( params... , Buffer* outBuffer );
This strategy also opens up for more elaborate mechanisms. Say for instance you MUST be able to allocate memory for the user (e.g. if you need to resize the buffer), then you can provide an indirect approach to this:
struct Buffer
{
unsigned long size;
char* data;
void* (*allocator_callback)( unsigned long size );
void (*free_callback)( void* p );
};
Of course, the style of such constructs is always open for serious debate.
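For what it's worth, a function using the callback-carrying Buffer might look like the sketch below. The function storeText, the specific RESULT values and the two client callbacks are hypothetical; only the Buffer layout is taken from above.

```cpp
#include <cstdlib>
#include <cstring>

typedef enum {
    RESULT_OK,
    RESULT_INVALID_ARGUMENT,
    RESULT_BUFFER_TOO_SMALL,
    RESULT_OUT_OF_MEMORY
} RESULT;

struct Buffer {
    unsigned long size;
    char *data;
    void *(*allocator_callback)(unsigned long size);
    void (*free_callback)(void *p);
};

// Stores `text` in out, growing the buffer through the caller's own
// callbacks when it is missing or too small.
RESULT storeText(const char *text, Buffer *out) {
    if (!text || !out) return RESULT_INVALID_ARGUMENT;
    const unsigned long needed =
        static_cast<unsigned long>(std::strlen(text)) + 1;
    if (!out->data || out->size < needed) {
        if (!out->allocator_callback || !out->free_callback)
            return RESULT_BUFFER_TOO_SMALL;
        char *grown = static_cast<char *>(out->allocator_callback(needed));
        if (!grown) return RESULT_OUT_OF_MEMORY;
        if (out->data) out->free_callback(out->data);
        out->data = grown;
        out->size = needed;
    }
    std::memcpy(out->data, text, needed);
    return RESULT_OK;
}

// Example callbacks a client might plug in.
void *clientAlloc(unsigned long size) { return std::malloc(size); }
void clientFree(void *p) { std::free(p); }
```

The library still never calls malloc or free itself, so a client with a leak-tracking or pooled allocator just supplies different callbacks.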
Good Luck!
I would prefer the first definition, where buffer and its size are passed in. There are exceptions, but usually you don't expect to have to clean up after the functions you call. Whereas if I allocate memory and pass it into a function, then I know that I have to clean up after myself.
Handling different sized buffers shouldn't be a big deal.
Another issue with the second style is that the contexts that the memory is allocated on may be different. For example:
// your library in C
char * foo() {
return malloc( 100 );
}
// my client code C++
char * p = foo(); // call your code
delete p; // natural for me to do, but ... aaargh!
And this is only a minor part of the problem. You can say that both sides should use malloc & free, but what if they are using different compiler implementations? It is better for all allocations and deallocations to occur in the same place. Whether this is the library or the client code is up to you.
If I have to choose between the two styles shown I'd go for the 1st one every time. The 2nd style gives the users of your library something else to think about, memory allocation, and someone is bound to forget to free the memory.
The second variant is cleaner.
COM IErrorInfo is an implementation of the second approach. The server calls SetErrorInfo to set details of what went wrong and returns an error code. The caller examines the code and can call GetErrorInfo to get the details. The caller is responsible for releasing the IErrorInfo, but passing the parameters of each call in the first variant is not beautiful either.
The server could preallocate enough memory on start so that it surely has enough memory to return error details.
A few things to ponder:
Allocation and deallocation should happen in the same scope (ideally). It is best for the caller to pass in a pre-allocated buffer. The caller can safely free this later on. This raises the question: how big should the buffer be? An approach that I've seen used fairly widely in Win32 is to pass NULL as the input buffer, and the size parameter will tell you how much you need.
How many possible error conditions do you foresee? Returning a char* may limit the extent of error reporting.
What pre and post conditions do you want to be fulfilled? Does your prototype reflect that?
Do you do error checking in the caller or the callee?
I can't really tell you one is better than the other, since I don't have the big picture. But I am sure these things can get you started thinking, as well as the other posts.
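The pass-NULL-to-query-the-size approach mentioned in the first point can be sketched with the in/out size parameter from the question's first prototype. The function name processInput and the error constants are assumptions, and the "processing" is just an echo of the input so the example stays small.

```cpp
#include <cstring>

enum { FUNC_OK = 0, FUNC_BAD_ARGUMENT = 1, FUNC_BUFFER_TOO_SMALL = 2 };

// *buffer_size is the capacity on input and the required size (including
// the terminator) on output. Passing buffer == NULL only queries the size.
int processInput(const char *input, char *buffer, unsigned int *buffer_size) {
    if (!input || !buffer_size) return FUNC_BAD_ARGUMENT;
    const unsigned int needed =
        static_cast<unsigned int>(std::strlen(input)) + 1;
    const unsigned int capacity = *buffer_size;
    *buffer_size = needed;
    if (!buffer) return FUNC_OK; // size query only
    if (capacity < needed) return FUNC_BUFFER_TOO_SMALL;
    std::memcpy(buffer, input, needed); // stand-in for the real processing
    return FUNC_OK;
}
```

A caller typically makes two calls: one with NULL to learn the size, then one with a buffer it allocated however it likes. The error code and the required size are both reported even when the buffer turns out to be too small.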
What about using both methods? I agree with the consensus of answers favoring style 1 vs. the pitfalls of style 2. I do feel style 2 could be used if all your API's follow a consistent naming idiom, like so:
// Style 1 functions
int fooBuff(char* buffer, unsigned int buffer_size, input params... );
// Style 2 functions
char* fooBuffAlloc(input params...);
bool fooBuffFree(char* foo);
/D
The first edition would be less error prone when other programmers use it.
If programmers have to allocate memory themselves they are more likely to remember to free it. If a library allocates memory for them it's yet another abstraction and can/will lead to complications.
I'd do it similarly to the first way, but just subtly different, after the model of snprintf and similar functions:
int func(char* buffer, size_t buffer_size, input params...);
This way, if you have lots of these, they can look similar, and you can use variable numbers of arguments wherever useful.
I agree greatly with the already-stated reasons for using version 1 rather than version 2 - memory problems are much more likely with version 2.