Access violation when using alloca - c++

My stackAlloc function looks like this:
void* stackAlloc(size_t size) {
if (size > maxStackAllocation)
return malloc(size);
else
return _alloca(size);
}
void stackAllocFree(void *ptr, size_t size) {
if (size > maxStackAllocation) {
free(ptr);
}
}
If I change so the stackAlloc function always use malloc instead of alloca everything works.
I changed the function to a macro, and now its working as expected:
#define maxStackAllocation 1024
#define stackAlloc(size) \
( \
(size > maxStackAllocation)? \
malloc(size): \
_alloca(size) \
)
#define stackAllocFree(ptr, size) \
( \
(size > maxStackAllocation)? \
free(ptr): \
void() \
)

Assuming you're running on Windows, since your code calls _alloca(), per the MSDN documentation:
_alloca allocates size bytes from the program stack. The allocated space is automatically freed when the calling function exits
Note that the memory is freed when the calling function exits - which I'm assuming also means the calling function returns.
Your code:
void* stackAlloc(size_t size) {
if (size > maxStackAllocation)
return malloc(size);
else
return _alloca(size);
}
returns, thus freeing the memory obtained via _alloca().

From the man page,
This temporary space is automatically freed
when the function that called alloca() returns to its caller.
So whenever your stackAlloc function returns, it will automatically free the memory.

This works, but I'd advise against using it in production:
#include <iostream>
#include <alloca.h>
auto stackAlloc(const size_t size)
{
return [size](){ return alloca(size); };
}
int main() {
char *ch = (char *)stackAlloc(40000)();
ch[39999] = '\0';
return 0;
}
Counter-check: if I decrease stackAlloc's parameter, it doesn't work (which is the expected behaviour here)
Feel free to add the check, etc. in stackAlloc (either by returning different lambdas or having the lambda do the check).

Related

fixing the recursive call to malloc with LD_PRELOAD

I am using LD_PRELOAD to log malloc calls from an application and map out the virtual address space however malloc is used internally by fopen/printf. Is there a way I can fix this issue?
I know about glibc's hooks but I want to avoid changing the source code of the application.
My issue was caused by the fact that malloc is used internally by glibc so when I use LD_PRELOAD to override malloc any attempt to log caused malloc to be called resulting in a recursive call to malloc itself
Solution:
call original malloc whenever the TLS needs memory allocation
providing code:
static __thread int no_hook;
static void *(*real_malloc)(size_t) = NULL;
static void __attribute__((constructor))init(void) {
real_malloc = (void * (*)(size_t))dlsym(RTLD_NEXT, "malloc");
}
void * malloc(size_t len) {
void* ret;
void* caller;
if (no_hook) {
return (*real_malloc)(len);
}
no_hook = 1;
caller = (void*)(long) __builtin_return_address(0);
printf("malloc call %zu from %lu\n", len, (long)caller);
ret = (*real_malloc)(len);
// fprintf(logfp, ") -> %pn", ret);
no_hook = 0;
return ret;
}

Override global new/delete and malloc/free with tcmalloc library

I want to override new/delete and malloc/free. I have tcmalloc library linked in my application. My aim is to add stats.
From new I am calling malloc. Below is an example it's global.
void* my_malloc(size_t size, const char *file, int line, const char *func)
{
void *p = malloc(size);
....
....
....
return p;
}
#define malloc(X) my_malloc(X, __FILE__, __LINE__, __FUNCTION__)
void *
operator new(size_t size)
{
auto new_addr = malloc(size);
....
...
return new_addr;
}
New/delete override is working fine.
My question is what happen to other file where I have use malloc directly for example
first.cpp
malloc(sizeof(..))
second.cpp
malloc(sizeof(..))
How this malloc call get's interpret as my macro is not in header file.
tcmalloc provides new/delete hooks that can be used to implement any kind of tracking/accounting of memory usage. See e.g. AddNewHook in https://github.com/gperftools/gperftools/blob/master/src/gperftools/malloc_hook.h

Device pointer in a device class (Cuda C++)

I would like to implement a device side vector class which encapsulates a pointer to the elements of the container.
After I instantiate an object of this class I have no access to the inside pointer. It always says 'Access violation writing location some device memory address'.
My code is the following:
#include <iostream>
#include <cuda_runtime.h>
template <typename T>
class DeviceVector
{
private:
T* m_bValues;
std::size_t m_bSize;
public:
__host__
void* operator new(std::size_t size)
{
DeviceVector<T>* object = nullptr;
cudaMalloc((void**)&object, size);
return object;
}
__host__
void operator delete(void* object)
{
cudaFree(object);
}
__host__
DeviceVector(std::size_t size = 1)
{
cudaMemcpy(&m_bSize, &size, sizeof(std::size_t), cudaMemcpyHostToDevice);
// At this cudaMalloc I get Access violation writing location...
cudaMalloc((void**)&m_bValues, size * sizeof(T));
// It's an alternative solution here
T* ptr;
cudaMalloc((void**)&ptr, size * sizeof(T));
cudaMemcpy(&m_bValues, &ptr, sizeof(T*), cudaMemcpyHostToDevice);
// The memory is allocated
// But I can't access it through m_bValues pointer
// It is also Access violation writing location...
}
__host__
~DeviceVector()
{
// Access violation here if I use the second solution in the constructor
cudaFree(m_bValues);
}
};
int main()
{
DeviceVector<int>* vec = new DeviceVector<int>();
delete vec;
return 0;
}
Note:
I have access to the size attribute.
So my questions are:
How to allocate memory for this class to get access to the pointer inside?
Is this even possible to encapsulate a pointer into a class on the device?
This line is illegal:
cudaMalloc((void**)&m_bValues, size * sizeof(T));
because your new operator allocated the object on the device:
cudaMalloc((void**)&object, size);
return object;
and the constructor was called to operate on that allocation. Therefore &m_bValues is taking the address of a device variable in host code which is illegal in CUDA. If you do that, and then attempt to use it in host code (i.e. the cudaMalloc operation), you're going to get a seg fault. cudaMalloc creates a device allocation of a particular size, and then stores the device pointer to that allocation in a variable that is expected to be resident on the host. If you pass it a device address to store that pointer into instead, cudaMalloc will segfault trying to write the pointer value.
Your alternative solution is a somewhat better approach, and is the general idea when it's necessary to copy a pointer to a device allocation to a variable resident on the device.
But you've still basically made the allocation that m_bValues points to inaccessible from the host. (ptr, being a temporary variable, won't help, and creating another variable in the class to hold a value like ptr won't help either because the entire class is allocated and resident on the device.) For the same reason that you're not allowed to use &m_bValues in the previous cudaMalloc operation, you won't be able to use it directly in any other host code (except as the target for cudaMempcy host->device when copying the pointer value itself).
I don't think there are any simple fixes for this. I suggest re-crafting the object to live on the host, and provide appropriate host- and device-side allocations for corresponding pointers and parameters (like size).
It also seems like you're re-inventing the wheel. You might want to investigate thrust device vectors (which are easily usable with ordinary CUDA code.)
Anyway, this was the closest I could come up with:
#include <iostream>
#include <cuda_runtime.h>
#include <stdio.h>
#define cudaCheckErrors(msg) \
do { \
cudaError_t __err = cudaGetLastError(); \
if (__err != cudaSuccess) { \
fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
msg, cudaGetErrorString(__err), \
__FILE__, __LINE__); \
fprintf(stderr, "*** FAILED - ABORTING\n"); \
exit(1); \
} \
} while (0)
template <typename T>
class DeviceVector
{
private:
T* m_bValues;
std::size_t m_bSize;
std::size_t eleSize;
public:
__host__
void* operator new(std::size_t size)
{
DeviceVector<T>* object = NULL;
object = (DeviceVector<T> *)malloc(size*sizeof(DeviceVector<T>));
return object;
}
__host__
void operator delete(void* object)
{
free(object);
}
__host__
DeviceVector(std::size_t size = 1)
{
m_bSize = size;
eleSize = sizeof(T);
cudaMalloc(&m_bValues, m_bSize*sizeof(T));
cudaCheckErrors("constructor cudaMalloc fail");
cudaMemset(m_bValues, 0, m_bSize*sizeof(T));
}
__host__
~DeviceVector()
{
cudaFree(m_bValues);
cudaCheckErrors("destructor cudaFree fail");
}
__host__
T* getDevPtr(){
return m_bValues;}
__host__
std::size_t getSize(){
return m_bSize;}
__host__
std::size_t geteleSize(){
return eleSize;}
};
int main()
{
DeviceVector<int>* vec = new DeviceVector<int>();
cudaMemset(vec->getDevPtr(), 0xFF, vec->getSize()*vec->geteleSize());
cudaCheckErrors("vector fill fail");
delete vec;
return 0;
}
You've shown very little about how you want to interact with an object of this class, so I'm just guessing here.

Track memory allocation per function

So I know I can track memory allocation with methods of overloading new globally like so: http://www.almostinfinite.com/memtrack.html
However, I was wondering if there was a good way to do this per function so I can get a report of how much is allocated per function. Right now I can get file and lines and what the typeid is as in the link I provided but I would like to find which function is allocating the most.
What about doing something like: http://ideone.com/Wqjkrw
#include <iostream>
#include <cstring>
class MemTracker
{
private:
static char func_name[100];
static size_t current_size;
public:
MemTracker(const char* FuncName) {strcpy(&func_name[0], FuncName);}
static void inc(size_t amount) {current_size += amount;}
static void print() {std::cout<<func_name<<" allocated: "<<current_size<<" bytes.\n";}
static void reset() {current_size = 0; memset(&func_name[0], 0, sizeof(func_name)/sizeof(char));}
};
char MemTracker::func_name[100] = {0};
size_t MemTracker::current_size = 0;
void* operator new(size_t size)
{
MemTracker::inc(size);
return malloc(size);
}
void operator delete(void* ptr)
{
free(ptr);
}
void FuncOne()
{
MemTracker(__func__);
int* i = new int[100];
delete[] i;
i = new int[200];
delete[] i;
MemTracker::print();
MemTracker::reset();
}
void FuncTwo()
{
MemTracker(__func__);
char* c = new char[1024];
delete[] c;
c = new char[2048];
delete[] c;
MemTracker::print();
MemTracker::reset();
}
int main()
{
FuncOne();
FuncTwo();
FuncTwo();
FuncTwo();
return 0;
}
Prints:
FuncOne allocated: 1200 bytes.
FuncTwo allocated: 3072 bytes.
FuncTwo allocated: 3072 bytes.
FuncTwo allocated: 3072 bytes.
What platform are you using? There might be platform specific solutions without changing the functions in your code base.
If you are using Microsoft Visual Studio, you can use compiler switches /Gh and /GH to let the compiler call functions _penter and _pexit that you can define. In those functions, you can query how much memory the program is using. There should be enough information in there to figure out how much memory is allocated in each function.
Example code for checking memory usage is provided in this MSDN article.

Problems with LD_PRELOAD and calloc() interposition for certain executables

Relating to a previous question of mine
I've successfully interposed malloc, but calloc seems to be more problematic.
That is with certain hosts, calloc gets stuck in an infinite loop with a possible internal calloc call inside dlsym. However, a basic test host does not exhibit this behaviour, but my system's "ls" command does.
Here's my code:
// build with: g++ -O2 -Wall -fPIC -ldl -o libnano.so -shared Main.cc
#include <stdio.h>
#include <dlfcn.h>
bool gNanoUp = false;// global
// Function types
typedef void* (*MallocFn)(size_t size);
typedef void* (*CallocFn)(size_t elements, size_t size);
struct MemoryFunctions {
MallocFn mMalloc;
CallocFn mCalloc;
};
MemoryFunctions orgMemFuncs;
// Save original methods.
void __attribute__((constructor)) __nano_init(void) {
fprintf(stderr, "NANO: init()\n");
// Get address of original functions
orgMemFuncs.mMalloc = (MallocFn)dlsym(RTLD_NEXT, "malloc");
orgMemFuncs.mCalloc = (CallocFn)dlsym(RTLD_NEXT, "calloc");
fprintf(stderr, "NANO: malloc() found #%p\n", orgMemFuncs.mMalloc);
fprintf(stderr, "NANO: calloc() found #%p\n", orgMemFuncs.mCalloc);
gNanoUp = true;
}
// replacement functions
extern "C" {
void *malloc(size_t size) {
if (!gNanoUp) __nano_init();
return orgMemFuncs.mMalloc(size);
}
void* calloc(size_t elements, size_t size) {
if (!gNanoUp) __nano_init();
return orgMemFuncs.mCalloc(elements, size);
}
}
Now, When I do the following, I get an infinite loop followed by a seg fault, eg:
% setenv LD_PRELOAD "./libnano.so"
% ls
...
NANO: init()
NANO: init()
NANO: init()
Segmentation fault (core dumped)
However if I comment out the calloc interposer, it almost seems to work:
% setenv LD_PRELOAD "./libnano.so"
% ls
NANO: init()
NANO: malloc() found #0x3b36274dc0
NANO: calloc() found #0x3b362749e0
NANO: init()
NANO: malloc() found #0x3b36274dc0
NANO: calloc() found #0x3b362749e0
<directory contents>
...
So somethings up with "ls" that means init() gets called twice.
EDIT
Note that the following host program works correctly - init() is only called once, and calloc is successfully interposed, as you can see from the output.
// build with: g++ test.cc -o test
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]) {
void* p = malloc(123);
printf("HOST p=%p\n", p);
free(p);
char* c = new char;
printf("HOST c=%p\n", c);
delete c;
void* ca = calloc(10,10);
printf("HOST ca=%p\n", ca);
free(ca);
}
% setenv LD_PRELOAD "./libnano.so"
% ./test
NANO: init()
NANO: malloc() found #0x3b36274dc0
NANO: calloc() found #0x3b362749e0
HOST p=0x601010
HOST c=0x601010
HOST ca=0x601030
I know I am a bit late (6 years). But I wanted to override calloc() today and faced a problem because dlsym() internally uses calloc(). I solved it using a simple technique and thought of sharing it here:
static unsigned char buffer[8192];
void *calloc(size_t nmemb, size_t size)
{
if (calloc_ptr == NULL) // obtained from dlsym
return buffer;
init(); // uses dlsym() to find address of the real calloc()
return calloc_ptr(len);
}
void free(void *in)
{
if (in == buffer)
return;
free_ptr(in);
}
buffer satisfies the need of dlsym() till the real calloc() has been located and my calloc_ptr function pointer initialized.
With regard to __nano_init() being called twice: You've declared the function as a constructor, so it's called when the library is loaded, and it's called a second time explicitly when your malloc() and calloc() implementations are first called. Pick one.
With regard to the calloc() interposer crashing your application: Some of the functions you're using, including dlsym() and fprintf(), may themselves be attempting to allocate memory, calling your interposer functions. Consider the consequences, and act accordingly.
Using dlsym based hooking can result in crashes, as dlsym calls back into the memory allocator. Instead use malloc hooks, as I suggested in your prior question; these can be installed without actually invoking dlsym at all.
You can get away with a preliminary poor calloc that simply returns NULL. This actually works on Linux, YMMV.
static void* poor_calloc(size_t nmemb, size_t size)
{
// So dlsym uses calloc internally, which will lead to infinite recursion, since our calloc calls dlsym.
// Apparently dlsym can cope with slightly wrong calloc, see for further explanation:
// http://blog.bigpixel.ro/2010/09/interposing-calloc-on-linux
return NULL; // This is a poor implementation of calloc!
}
You can also use sbrk to allocate the memory for "poor calloc".