Prevent malloc/free from being compiled for embedded projects - c++

Background: We are using Keil to compile our NXP LPC2458 project. Numerous tasks run on Keil’s RealView RTOS. Stack space is created and allocated to each task. No heap is created by default, and I want to avoid one since we can't afford the code-space overhead and the cost of "garbage collecting".
Objective: Use C++ in the embedded code without using the heap. Keil provides #pragma import(__use_no_heap), which prevents malloc() and free() calls from being linked.
Solution: I tried creating a Singleton with a private static pointer. My hope was that new would not be called, since I declared dlmData as static in getDLMData(). For some reason, the linker still states that malloc() and free() are being called. I have thought about a private operator new() and a private operator delete(), and then declaring dlmData as static within the overloaded function. It is not working for some reason. What am I doing wrong?
//class declaration
class DataLogMaintenanceData
{
public:
static DataLogMaintenanceData* getDLMData();
~DataLogMaintenanceData()
{ instanceFlag = FALSE; }
protected:
DataLogMaintenanceData(); //constructor declared protected to avoid poly
private:
static Boolean instanceFlag;
static DataLogMaintenanceData *DLMData;
};
//set these to NULL when the code is first started
Boolean DataLogMaintenanceData::instanceFlag = FALSE;
DataLogMaintenanceData *DataLogMaintenanceData::DLMData = NULL;
//class functions
DataLogMaintenanceData *DataLogMaintenanceData::getDLMData()
{
if (FALSE == instanceFlag)
{
static DataLogMaintenanceData dlmData;
DLMData = &dlmData;
instanceFlag = TRUE;
return DLMData;
}
else
{
return DLMData;
}
}
void InitDataLog ( void )
{
DataLogMaintenanceData *dlmData;
dlmData = DataLogMaintenanceData::getDLMData();
// to avoid dlmData warning
dlmData = dlmData;
}
//ACTUAL TASK
__task void DataLog(void)
{
// ... code to initialize stuff
InitDataLog();
// ... more stuff
}
For some reason, the only way I can get this to compile, is to create a heap space and then allow the malloc() and free() calls to be compiled into the project. As expected, the “static”ally defined object, dlmData, resides in the RAM space allocated to the dataLog.o module (i.e. it doesn’t live in the HEAP).
I have checked Google and still can’t figure out what I am missing. Is it possible in C++ to bypass malloc() and free() when compiling pure objects? I know I can replace the RTOS’s implementation of malloc() and free() to do nothing, but I want to avoid compiling in code that I won’t use.

Probably some of the code we aren't seeing calls a function that calls malloc behind the scenes.
From http://www.keil.com/support/man/docs/armlib/armlib_CJAIJCJI.htm you can use --verbose --list=out.txt on the link line to get details about the malloc caller.

Included in the Keil installation is a set of PDFs... one of the documents (document ID DUI0475A) is titled "Using ARM C and C++ Libraries and Floating-Point Support". It discusses use of the heap (and preventing its use) in several places.
Specifically, check out section 2.64 "Avoiding the ARM-supplied heap and heap-using library functions", lots of good information there. The interesting text in that section:
You can reference the __use_no_heap or __use_no_heap_region symbols in
your code to guarantee that no heap-using functions are linked in from
the ARM library.
__use_no_heap guards against the use of malloc(), realloc(), free(),
and any function that uses those functions. For example, calloc() and
other stdio functions.
__use_no_heap_region has the same properties as __use_no_heap, but in
addition, guards against other things that use the heap memory region.
For example, if you declare main() as a function taking arguments, the
heap region is used for collecting argc and argv.
Since your question is about how to prevent malloc() from being called or used, that might put you on the right track.

From the code you've posted I cannot see anything that would allocate memory on the heap. Are there any implicit conversions taking place somewhere? What if you compile without this class at all?
What you could do:
1) Run under debugger (assuming you can build a runnable image, maybe on an emulator), set a breakpoint in malloc and examine the stack
2) Provide your own malloc and free to make linker happy, then repeat step 1.
You may find that you need to link against a different version of the C runtime startup. In the worst case, if the number of calls to malloc/free is limited, you can roll your own version which gives the callers some preallocated memory, but hopefully this will not be necessary.
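The "roll your own version with preallocated memory" fallback could look roughly like the sketch below. It is a simple bump allocator over a static buffer, not anything Keil ships. The names pool_malloc, pool_free and pool_bytes_used and the 1024-byte budget are assumptions made for illustration; in a real project the functions would be wired in as the malloc/free replacements, and access from several RTOS tasks would need a lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace {
constexpr std::size_t kPoolSize = 1024;  // assumed budget, tune per project
alignas(std::max_align_t) unsigned char g_pool[kPoolSize];
std::size_t g_used = 0;                  // bytes handed out so far
}

// Hypothetical stand-in for malloc(): carves blocks out of the static pool.
void* pool_malloc(std::size_t size) {
    constexpr std::size_t kAlign = alignof(std::max_align_t);
    if (size == 0) return nullptr;
    // Round up so every returned block stays suitably aligned.
    const std::size_t rounded = (size + kAlign - 1) / kAlign * kAlign;
    if (rounded < size || rounded > kPoolSize - g_used) {
        return nullptr;  // arithmetic overflow or pool exhausted
    }
    void* p = &g_pool[g_used];
    g_used += rounded;
    return p;
}

// Hypothetical stand-in for free(): a bump allocator never reclaims blocks.
void pool_free(void* p) {
    auto* c = static_cast<unsigned char*>(p);
    // Debug check only: the pointer must be null or come from the pool.
    assert(c == nullptr || (c >= g_pool && c < g_pool + kPoolSize));
    (void)c;
}

std::size_t pool_bytes_used() { return g_used; }
```

Because nothing is ever reclaimed, this only suits callers that allocate a bounded amount once (for example during startup), which matches the "number of calls is limited" condition above.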

Related

Returning unique_ptr from a function executed via dlsym

I have a function that is located in a shared object, and is loaded and executed with dlsym from the main program.
(both the shared object and the main program are C++)
Is it possible that this function will return std::unique_ptr ?
shared object function -
extern "C" {
unique_ptr<Obj> some_function() {
return make_unique<Obj>();
}
}
main program :
void main_flow() {
auto handle = dlopen(...);
FuncPtr func = dlsym(handle, "some_function");
unique_ptr<Obj> func();
}
Yes, ish, with a lot of caveats. Firstly, using boost or STL within a DSO interface is a little dangerous.
std::unique_ptr differs between compilers
std::unique_ptr differs between C++ versions
std::unique_ptr may differ between debug/release builds.
This means if you use STL or boost in your DSO interface, ALL exes and dsos must use exactly the same version of the C++ runtime compiled with the same build flags (and same version of boost if that's your kind of thing).
I'd recommend using warning level 4 on Visual Studio, which will nicely list all of the above problems in your DSO interfaces (as C4251 warnings).
As for your question: yes, the function will return a std::unique_ptr. However, you are now allocating memory in the DSO, which you may be freeing in the exe. This can be very bad in the Windows world, where you may find that debug builds have different heaps. Attempting to free the DSO-allocated object in the EXE heap will throw a runtime error, but usually only in debug builds.
Your main should look like this:
void main_flow() {
auto handle = dlopen(...);
FuncPtr func = (FuncPtr)dlsym(handle, "some_function");
unique_ptr<Obj> obj = func();
}
Personally though, I'd recommend just returning a naked pointer and wrapping it in a std::unique_ptr in your exe (make_unique itself cannot adopt an existing pointer). That at least removes the C4251 problems, although you may still get bitten by the heap issue (unless you make the destructor of the class type virtual).
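One way to sketch the naked-pointer approach while also sidestepping the heap mismatch is to export a matching destroy function and use it as the unique_ptr deleter, so the object is always freed by the module that allocated it. The names create_obj, destroy_obj, make_obj and the g_destroyed counter are hypothetical. Both "sides" are shown in one listing here; in a real program the two function pointers would come from dlsym.

```cpp
#include <cassert>
#include <memory>

struct Obj { int value = 7; };

// --- would live in the shared object ---
static int g_destroyed = 0;  // illustration only: counts destroy calls
extern "C" Obj* create_obj() { return new Obj(); }
extern "C" void destroy_obj(Obj* p) { delete p; ++g_destroyed; }

// --- would live in the main program ---
using CreateFn  = Obj* (*)();
using DestroyFn = void (*)(Obj*);
using ObjPtr    = std::unique_ptr<Obj, DestroyFn>;

// Wraps the raw pointer so that destruction goes back through the DSO.
ObjPtr make_obj(CreateFn create, DestroyFn destroy) {
    assert(create != nullptr && destroy != nullptr);  // dlsym may return null
    return ObjPtr(create(), destroy);
}
```

The cost is that the deleter becomes part of the pointer type (ObjPtr rather than plain unique_ptr<Obj>), which is the price of keeping allocation and deallocation in the same module.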

Overloading base types with a custom allocator, and its alternatives

So, this is a bit of an open question. But let's say that I have a large application which globally overrides the various new and delete operators so that they use home-brewed jemalloc-style arenas and custom alignments.
All fine and good, but I have been running into segfault issues because other C++-based DLLs and their dependencies (namely LLVM) also use the overloaded allocators when they shouldn't, bringing the little custom allocator to its knees (lack of memory and more stress).
Testing workarounds, I have wrapped (and moved) those global operators into a class, and I made all base classes inherit from it. And well, that works for classes, but not for base types. That's the problem.
Given that C++ doesn't allow useful things like having separate allocators per namespace, or limiting the new operator per executable module, what is the best way of emulating this in base data types, where I can't directly subclass an int?
The obvious way is wrapping them in a custom template, but the problem is performance. Do I have to emulate all the array and indexing operations under a second layer just so that I can malloc from a different place without having to change the rest of the functional code? Is there a better way?
P.S.: I have also been thinking about using special global new/delete operators with extra parameters, while leaving the standard ones alone. Thus ensuring that I am (well, my executable module is) the only one calling those global functions. It should be a simple search-and-replace.
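The "extra parameters" idea from the P.S. can be sketched with a tag type, leaving the standard operators untouched. ArenaTag, in_arena, arena_live and arena_delete are hypothetical names, and std::malloc stands in for the real arena so the listing stays self-contained. Only the scalar form is covered; array allocations would need a matching operator new[] overload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct ArenaTag {};
constexpr ArenaTag in_arena{};

static std::size_t arena_live = 0;  // outstanding tagged allocations

// Tagged allocation function: only code that writes `new (in_arena) T` uses it.
void* operator new(std::size_t size, ArenaTag) {
    void* p = std::malloc(size);  // stand-in for the custom arena
    if (p == nullptr) throw std::bad_alloc();
    ++arena_live;
    return p;
}

// Matching placement delete: the compiler calls it only if T's constructor throws.
void operator delete(void* p, ArenaTag) noexcept {
    if (p != nullptr) { --arena_live; std::free(p); }
}

// There is no tagged delete-expression, so release goes through a helper.
template <class T>
void arena_delete(T* p) {
    if (p == nullptr) return;
    assert(arena_live > 0);
    p->~T();
    --arena_live;
    std::free(p);
}
```

A call site would read `int* p = new (in_arena) int(42);` followed later by `arena_delete(p);`. The asymmetry on the delete side is the main reason this is more than a plain search-and-replace.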
Well, quick update. What I did in the end to 'solve' this conundrum is to manually detect if the code that called the overridden global allocators comes from the main executable module and conditionally redirect all the external new / delete calls to their corresponding malloc / free while still using the custom arena allocator for our own internal code.
How? After doing some R&D I found that this could be done by using the _ReturnAddress() built-in on MSVC and __builtin_extract_return_addr(__builtin_return_address(0)) on GCC/Clang; and I can say that it seems to work fine so far in production software.
Now, when some C++ code from our address space wants some memory we can see where it comes from.
But, how do we find out if that address is part of some other module in our process space or our own? We might need to find out both the base and end addresses of the main program, cache them at startup as globals, and check that the return address is within bounds.
All for extremely little overhead. But, our second problem is that retrieving the base address is different in every platform. After some research I found that things were more straightforward than expected:
In Windows/Win32 we can simply do this:
#include <windows.h>
#include <psapi.h>
inline void __initialize_base_address()
{
MODULEINFO minfo;
GetModuleInformation(GetCurrentProcess(), GetModuleHandle(NULL), &minfo, sizeof(minfo));
base_addr = (uintptr_t) minfo.lpBaseOfDll;
base_end = (uintptr_t) minfo.lpBaseOfDll + minfo.SizeOfImage;
}
In Linux there are a thousand ways of doing this, including linker globals and some debuggey (verbose and unreliable) ways of walking the process module table. I was looking at the linker map output and noticed that the _init and _fini functions always seem to wrap the rest of the .text section symbols. Sometimes it's hard to get to the simplest solution that works everywhere:
#include <dlfcn.h>
#include <link.h>
inline void __initialize_base_address()
{
void *handle = dlopen(0, RTLD_NOW);
base_addr = (uintptr_t) dlsym(handle, "_init");
base_end = (uintptr_t) dlsym(handle, "_fini");
dlclose(handle);
}
While in macOS things are even less documented and I had to cobble together my own thing using the Darwin kernel open-source code and tracking down some obscure low-level tools as reference. Keep in mind that _NSGetMachExecuteHeader() is just a wrapper for the internal _mh_execute_header linker global. If you need to do anything about parsing the Mach-O format and its structures then getsect.h is the way to go:
#include <mach-o/getsect.h>
#include <mach-o/ldsyms.h>
#include <crt_externs.h>
inline void __initialize_base_address()
{
size_t size;
void *ptr = getsectiondata(&_mh_execute_header, SEG_TEXT, SECT_TEXT, &size);
base_addr = (uintptr_t) _NSGetMachExecuteHeader();
base_end = (uintptr_t) ptr + size;
}
Another thing to keep in mind is that this issue (some other C++ module using our internal allocator because it globally overrides new, causing weird bugs) seems to be a problem on Linux and maybe macOS. I didn't have it on Windows, probably because no conflicting DLLs were loaded in the process, most of them being C API-based, or maybe because the platform uses a different C++ runtime for each module.
The main issue I had was caused by Mesa3D, which uses LLVM (pure C++ in and out) for many of their GLSL shader compilers and liked to gobble up big chunks of my small custom-tailored memory arena uninvited.
Rewriting a legacy program that is structurally dependent on these allocators was out of the question due to its sheer size and complexity, so this turned out to be the best way of making things work as expected.
It's only a few lines of optional, sneaky, extra per-platform code.
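The bounds check described above might be sketched like this. base_addr and base_end are the cached globals from the snippets in this answer; is_internal_caller, Route and route_for are hypothetical names. The return address is passed in as a parameter so the listing stays portable; in the real operators it would come from _ReturnAddress() or __builtin_return_address(0).

```cpp
#include <cassert>
#include <cstdint>

// Cached once at startup by the per-platform __initialize_base_address().
static std::uintptr_t base_addr = 0;
static std::uintptr_t base_end  = 0;

// True when the caller's return address lies inside the main executable image.
inline bool is_internal_caller(std::uintptr_t return_address) {
    assert(base_addr <= base_end);
    return return_address >= base_addr && return_address < base_end;
}

enum class Route { Arena, SystemHeap };

// Decides which allocator a global operator new/delete call should use.
inline Route route_for(std::uintptr_t return_address) {
    return is_internal_caller(return_address) ? Route::Arena
                                              : Route::SystemHeap;
}
```

One caveat with this design, stated as an assumption rather than something observed in the original setup: a block allocated by internal code but released from an external module would be routed to the wrong deallocator, so the delete side may be safer if it also checks whether the pointer lies inside the arena.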

Boost shared memory object in DLL

I've created a DLL and implemented shared memory that every connected process can use. My problem is that I can't change anything in the object that is stored in the memory.
my class :
class MyClass
{
public:
MyClass();
void test();
int counter;
};
void MyClass::test() {
MessageBoxA(NULL, "test", "test", 0x0000000L);
counter++;
}
in stdafx.h i have :
static offset_ptr<MyClass> offset_mt;
static managed_shared_memory *memSegment;
I initialize shared memory and pointer :
memSegment = new managed_shared_memory(create_only, SHARED_MEMORY_NAME, 4096);
offset_mt = memSegment->construct<MyClass>("MyClass myClass")();
And then in an exported function i call
offset_mt.get()->test();
I'm calling this from Java using JNA and the result is a memory error (Invalid memory access). However, if I delete 'counter++' from the test method, everything works fine and the message box appears. Is there a limitation that prevents modifying objects inside mapped memory, or is this done some other way?
Well, I solved this by moving my variables to stdafx.cpp:
offset_ptr<MyClass> offset_mt;
managed_shared_memory *memSegment;
and making them extern in stdafx.h :
extern offset_ptr<MyClass> offset_mt;
extern managed_shared_memory *memSegment;
Now it's running fine, but I did this more or less by accident and I'm not really sure why this works and the previous way did not. If anyone could explain this to me, it would be great.
When you say
static offset_ptr<MyClass> offset_mt;
the compiler has to do a few things. One of them is allocating space for your variable. Another is calling any nontrivial constructors. This last part is done by the CRT before main() (or DllMain) runs. In fact the CRT replaces your entry point and initializes statics before calling your [dll]main().
When you say that in a header, the compiler allocates space for the variable in each compilation unit that includes the header.
When you say that in stdafx.h, that means every cpp file. Without static that would normally be a linker error, but with static (or an anonymous namespace) it slips through, and each cpp file sees its own copy of the variable. So if you initialize it in one cpp and use it in another, you blow up.
When you are importing the dll in interesting ways sometimes importing code doesn't call the entry point at all -- this kills most CRT facilities and results in your own statics being uninitialized. Don't know about JNA, but some old versions of .Net had this problem.
There is also the static initialization order fiasco, but that might not affect your particular case.
By moving your definitions into cpp and removing static modifier, you avoided all those pitfalls.
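As a minimal sketch of the layout that works, the three files are shown below in a single listing. A plain pointer stands in for the Boost offset_ptr, and g_instance, init_shared_state and bump_counter are hypothetical names. The header only declares the variable, exactly one cpp file defines it, and every other cpp file then sees that same object.

```cpp
#include <cassert>

// ---- shared_state.h (included everywhere) ----
struct MyClass {
    int counter = 0;
    void test() { ++counter; }
};
extern MyClass* g_instance;  // declaration only: no storage allocated here

// ---- shared_state.cpp (the single definition) ----
MyClass* g_instance = nullptr;

// ---- init.cpp ----
void init_shared_state() {
    static MyClass storage;  // stand-in for the shared-memory object
    g_instance = &storage;
}

// ---- exported.cpp ----
int bump_counter() {
    assert(g_instance != nullptr);  // would fail with per-file static copies
    g_instance->test();
    return g_instance->counter;
}
```

With `static MyClass* g_instance;` in the header instead, init.cpp and exported.cpp would each get a private pointer, and the one used by bump_counter would still be null.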

Inline class constructor to avoid vc memory crash

C++ class constructor can be inlined or not be inlined. However, I found a strange situation where only inline class constructor can avoid Visual Studio memory crash. The example is as follows:
dll.h
class _declspec(dllexport) Image
{
public:
Image();
virtual ~Image();
};
class _declspec(dllexport) Testimage:public Image
{
public:
Testimage();
virtual ~Testimage();
};
typedef std::auto_ptr<Testimage> TestimagePtr;
dll.cpp
#include "dll.h"
#include <assert.h>
#include <iostream>
Image::~Image()
{
std::cout<<"Image is being deleted."<<std::endl;
}
Image::Image()
{
}
Testimage::Testimage()
{
}
Testimage::~Testimage()
{
std::cout<<"Geoimage is being deleted."<<std::endl;
}
The dll library is compiled as a dynamic library, and it is statically linked to the C++ runtime library (Multi-threaded Debug (/MTd)). The executable program that runs the library is as follows:
int main()
{
TestimagePtr my_img(new Testimage());
return 0;
}
The executable program will invoke the dll library and it also statically links the runtime library. The problem I have is that when running the executable program an error dialog appears (screenshot not preserved here).
However, when the class constructor in the dll is inlined as the following code shows:
class _declspec(dllexport) Image
{
public:
Image();
virtual ~Image();
};
class _declspec(dllexport) Testimage:public Image
{
public:
Testimage()
{
}
virtual ~Testimage();
};
The crash will disappear. Could someone explain the reason behind this? Thanks! By the way, I am using VC2010.
EDIT: The following situation also triggers the same crash.
Situation 1
int main()
{
//TestimagePtr my_img(new Testimage());
Testimage *p_img;
p_img = new Testimage();
delete p_img;
return 0;
}
"it is statically linked to the C++ runtime library (Multi-threaded Debug (/MTd))"
This is a very problematic scenario in versions of Visual Studio prior to VS2012. The issue is that you have more than one version of the CRT loaded in your process. One used by your EXE, another used by the DLL. This can cause many subtle problems, and not so subtle problems like this crash.
The CRT has global state, stuff like errno and strtok() cannot work properly when that global state is updated by one copy of the CRT and read back by another copy. Relevant to your crash, a hidden global state variable is the heap that the CRT uses to allocate memory from. Functions like malloc() and ::operator new use that heap.
This goes wrong when objects are allocated by one copy of the CRT and released by another. The pointer that's passed to free() or ::operator delete belongs to the wrong heap. What happens next depends on your operating system. A silent memory leak in XP. In Vista and up, your program runs with the debug version of the memory manager enabled, which triggers a breakpoint when you have a debugger attached to your process to tell you that there's a problem with the pointer. The dialog in your screenshot is the result. It isn't otherwise very clear to me how inlining the constructor could make a difference; the fundamental issue, however, is that your code invokes undefined behavior, which has a knack for producing random outcomes.
There are two approaches available to solve this problem. The first one is the simple one, just build both your EXE and your DLL project with the /MD compile option instead. This selects the DLL version of the CRT. It is now shared by both modules and you'll only have a single copy of the CRT in your process. So there is no longer a problem with having one module allocating and another module releasing memory, the same heap is used.
This will work fine to solve your problem but can still become an issue later. A DLL tends to live a life of its own and may some day be used by another EXE that was built with a different version of the CRT. The CRT will now again not be shared since they'll use different versions of the DLL, invoking the exact same failure mode you are seeing today.
The only way to guarantee that this cannot happen is to design your DLL interface carefully. And ensure that there will never be a case where the DLL allocates memory that the client code needs to release. That requires giving up on a lot of C++ goodies. You for example can never write a function that returns a C++ object, like std::string. And you can never allow an exception to cross the module boundary. You are basically down to a C-style interface. Note how COM addresses this problem by using interface-based programming techniques and a class factory plus reference counting to solve the memory management problem.
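The COM-style approach mentioned above can be sketched as follows. IImage, ImageImpl, CreateImage and g_live_images are hypothetical names, and this is an illustration of the idea rather than real COM. The client only sees an abstract interface and never calls delete; Release() runs inside the DLL, so the object is freed by the same CRT that allocated it.

```cpp
#include <cassert>

// ---- public header: pure interface, no data members, no STL types ----
struct IImage {
    virtual void AddRef() = 0;
    virtual void Release() = 0;
    virtual int Width() const = 0;
protected:
    ~IImage() = default;  // clients cannot delete through the interface
};

// ---- inside the DLL ----
namespace {
int g_live_images = 0;  // illustration only: counts live objects

class ImageImpl final : public IImage {
public:
    ImageImpl() { ++g_live_images; }
    ~ImageImpl() { --g_live_images; }
    void AddRef() override { ++refs_; }
    void Release() override {
        assert(refs_ > 0);
        if (--refs_ == 0) delete this;  // freed by the DLL's own heap
    }
    int Width() const override { return 640; }
private:
    int refs_ = 1;  // not thread-safe; a real version would use atomics
};
}  // namespace

// Class factory exported with C linkage.
extern "C" IImage* CreateImage() { return new ImageImpl(); }
```

The reference count here is a plain int for brevity, which is an assumption that all calls happen on one thread.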
VS2012 has a counter-measure against this problem, it has a CRT version that allocates from the default process heap. Which solves this particular problem, not otherwise a workaround for the global state issue for other runtime functions. And adds some new problems, a DLL compiled with /MT that gets unloaded that doesn't release all of its allocations now causes an unpluggable leak for example.
This is an ugly problem in C++, the language fundamentally misses an ABI specification that addresses problems like this. The notion of modules is entirely missing from the language specification. Being worked on today but not yet completed. Not simple to do, it is solved in other languages like Java and the .NET languages by specifying a virtual machine, providing a runtime environment where memory management is centralized. Not the kind of runtime environment that excites C++ programmers.
I tried to reproduce your problem in VC2010 and it doesn't crash. It works with a constructor inline or not. Your problem is probably not in what you write here.
Your project is too hard to open as it seems to have its file paths set as absolute, probably because it was generated with CMake (so the files are not found by the compiler).
The problem I see in your code is that you declare the exported classes with _declspec(dllexport) written directly.
You should have a #define for this, and its value should be _declspec(dllimport) when the header is read during the exe compilation. Maybe the problem comes from that.
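A common shape for such a macro is sketched below. IMAGE_API and IMAGE_DLL_EXPORTS are hypothetical names; the DLL project would define IMAGE_DLL_EXPORTS in its build settings and the exe would not. The non-Windows branch is only there so the header also compiles elsewhere, and the width member is added purely so the listing does something.

```cpp
#include <cassert>

#if defined(_WIN32)
#  if defined(IMAGE_DLL_EXPORTS)
#    define IMAGE_API __declspec(dllexport)  // building the DLL itself
#  else
#    define IMAGE_API __declspec(dllimport)  // building a client of the DLL
#  endif
#else
#  define IMAGE_API                          // no decoration needed
#endif

class IMAGE_API Image {
public:
    Image() : width_(0) {}
    virtual ~Image() {}
    void set_width(int w) { assert(w >= 0); width_ = w; }
    int width() const { return width_; }
private:
    int width_;
};
```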

Is this a valid way to provide STL functions in a library regardless of CRT version?

I am trying to migrate some static C++ libraries into DLLs with a C interface so I don't need to build a separate version of the library for every version of Visual Studio (i.e. CRT) we want to support. However, I do like the convenience of using STL objects for some of the function calls. I came up with something that seems to work, but was wondering if there may be some hidden things that I'm just not thinking of.
Here is what I came up with to get STL versions of functions while still maintaining Visual Studio independence.
Original library function:
//library.h
...
std::wstring GetSomeString();
...
StringGenerator* mStrGen; //assume forward declared for pimpl implementation
//library.cpp
std::wstring library::GetSomeString()
{
return mStrGen->GetString(); //returns a wstring;
}
First, I created a private function that would provide the C interface
//library.h
__declspec(dllexport) void GetSomeStringInternal(wchar_t*& pSomeString);
//library.cpp
void library::GetSomeStringInternal(wchar_t*& pSomeString)
{
if(pSomeString != nullptr) {
delete [] pSomeString; //assumes allocated by the DLL
}
std::wstring tmpString(mStrGen->GetString());
size_t stringLength(tmpString.size());
pSomeString = new wchar_t[stringLength + 1];
wcscpy_s(pSomeString, stringLength + 1, tmpString.c_str());
}
Next, I added a private function that deallocates memory allocated by the DLL
//library.h
__declspec(dllexport) void FreeArray(void* arrayPtr);
//library.cpp
void library::FreeArray(void* arrayPtr)
{
if(arrayPtr) {
delete [] static_cast<wchar_t*>(arrayPtr); //delete[] on a void* is not valid; cast back to the allocated type
}
}
Finally, I converted the original C++ function returning a string into a function that calls the internal C interface function
//library.h
std::wstring GetSomeString()
{
std::wstring someString(L"");
wchar_t* pSomeString= NULL;
GetSomeStringInternal(pSomeString);
someString = pSomeString;
FreeArray(pSomeString);
return someString;
}
//library.cpp
//removed GetSomeString from cpp since it is defined in header
My thinking is that since the header will be compiled every time it is included, an application that uses a different version of the CRT will compile the function using its implementation of the CRT. All data passed into and out of the library uses a C interface to preserve compatibility and memory is allocated and freed by the library so you don't run into one version of the CRT trying to free memory from a different version.
It seems to perform as I intend:
The library can be used by programs compiled with multiple versions of Visual Studio.
There are no memory leaks or access violations
If I modify the code so that memory is allocated in the GetSomeString function in the header, I do get a memory access error when trying to free that memory.
The GetSomeString function is compiled by the library and included in the DLL, but it is never called since it is 1) not exported and 2) the compiling program will always choose its version since it is inlined.
Is there anything I am missing, or is this a valid way of going about providing a C++ interface to a library that is Visual Studio version independent?
Side note: I have run into some deletion issues if I have a program that uses a std::shared_ptr<library>, but haven't researched that issue enough and will probably have a follow up question on that problem.
One thing I could see becoming a problem is if you actually needed to pass large objects around by reference for performance reasons. You're dealing with the binary compatibility issues by copying all the data to and from a compatible format, which is fine until it becomes a performance issue.
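A variation that removes one allocation and the FreeArray round trip is to let the caller own the buffer and ask the library twice: once for the length, once for the data. The sketch below assumes a hypothetical exported function GetSomeStringBuffer, with a fixed string standing in for mStrGen->GetString(). The data is still copied, so it does not address the large-object concern, only the cross-CRT deallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// ---- DLL side (C interface) ----
// Returns the string length in characters. If `buffer` can hold that many
// characters, they are copied into it (without a terminating null).
extern "C" std::size_t GetSomeStringBuffer(wchar_t* buffer,
                                           std::size_t capacity) {
    static const wchar_t kValue[] = L"hello from the library";  // stand-in
    const std::size_t length = std::wcslen(kValue);
    if (buffer != nullptr && capacity >= length) {
        std::wmemcpy(buffer, kValue, length);
    }
    return length;
}

// ---- header side, compiled by each client with its own CRT ----
inline std::wstring GetSomeString() {
    // First call: query the required length; the wstring allocates client-side.
    std::wstring result(GetSomeStringBuffer(nullptr, 0), L'\0');
    if (!result.empty()) {
        // Second call: fill the client-owned storage.
        const std::size_t written =
            GetSomeStringBuffer(&result[0], result.size());
        assert(written == result.size());
        (void)written;
    }
    return result;
}
```

All memory is allocated and freed by the client's std::wstring, so no pointer owned by one CRT is ever released by another.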