Bad memory address when allocating OpenCL buffer

Bad memory address when allocating OpenCL buffer - c++

I have a program running some image treatment with OpenCL, I sometimes have a crash because it's trying to write something into a memory address (with clCreateBuffer) that is null.
Is their any OpenCL call I can use to delay that memory write, or is it possible to check via C++ if a memory address is valid ?

You probably can use OpenCL events.
cl_int clWaitForEvents (cl_uint num_events,
const cl_event *event_list)
You can create an event from the call or operation you want to wait for, then before creating your buffer you wait for that event to complete.
However, could you provide a little bit of information. For example what exactly do you want to do? Maybe there is another way. It would also be better if you have some code showing your operations.

Related

Can Kernel mode driver do ReadProcessMemory on any process?

I am currently writing a kernel mode driver (software driver) with KMDF and since I am very new to this topic I want to ask you if my driver would be able to call OpenProcess and ReadProcessMemory on any running process or is there some way to prevent that my driver can call those functions on a process from kernel mode?

you can get target process pointer by call PsLookupProcessByProcessId. than call KeStackAttachProcess and direct read process memory. because this is user mode memory - mandatory do it in __try/__except block. finally call KeUnstackDetachProcess and ObfDereferenceObject for target process

According to https://github.com/Zer0Mem0ry/KernelBhop/blob/master/Driver/Driver.c, you need to use an undocumented MmCopyVirtualMemory for both reading and writing any process.
NTSTATUS NTAPI MmCopyVirtualMemory
(
PEPROCESS SourceProcess,
PVOID SourceAddress,
PEPROCESS TargetProcess,
PVOID TargetAddress,
SIZE_T BufferSize,
KPROCESSOR_MODE PreviousMode,
PSIZE_T ReturnSize
);

You have NtReadVirtualMemory, but there is no Zw* version in kernel-mode, which means you're going to have to locate the address yourself (using the KeServiceDescriptorTable will work, but memory scanning is also an option).
Bear in mind, if you want to make use of any kernel-mode addresses, you'll need to set the PreviousMode of the current thread to 0 (KernelMode) if you happen to be executing under the context of a non-kernel thread (e.g. in a callback routine you might be put under the context of another process other than NTOSKRNL). This is what the Zw* routines will do for you automatically in kernel-mode, but obviously as I've already said, there isn't one for NtReadVirtualMemory in kernel-mode (Microsoft just don't want you to use it I guess).
A second approach would be to attach to the context of the process you'd like to read the memory of, and then rely on MmCopyMemory (documented at MSDN) to copy memory from an address valid in the process you've just attached to, to your own buffer. Then you can access the copied memory from your own buffer. Remember to detach.
Alternatively, you can take the path which #RbMm suggested. Personally, I'd take his suggestion because it is a documented approach, and you're likely to have more success with implementing it (not to mention you'll have less work to do).

Converting a string into a function in c++

I have been looking for a way to dynamically load functions into c++ for some time now, and I think I have finally figure it out. Here is the plan:
Pass the function as a string into C++ (via a socket connection, a file, or something).
Write the string into file.
Have the C++ program compile the file and execute it. If there are any errors, catch them and return it.
Have the newly executed program with the new function pass the memory location of the function to the currently running program.
Save the location of the function to a function pointer variable (the function will always have the same return type and arguments, so
this simplifies the declaration of the pointer).
Run the new function with the function pointer.
The issue is that after step 4, I do not want to keep the new program running since if I do this very often, many running programs will suck up threads. Is there some way to close the new program, but preserve the memory location where the new function is stored? I do not want it being overwritten or made available to other programs while it is still in use.
If you guys have any suggestions for the other steps as well, that would be appreciated as well. There might be other libraries that do things similar to this, and it is fine to recommend them, but this is the approach I want to look into — if not for the accomplishment of it, then for the knowledge of knowing how to do so.
Edit: I am aware of dynamically linked libraries. This is something I am largely looking into to gain a better understanding of how things work in C++.

I can't see how this can work. When you run the new program it'll be a separate process and so any addresses in its process space have no meaning in the original process.
And not just that, but the code you want to call doesn't even exist in the original process, so there's no way to call it in the original process.
As Nick says in his answer, you need either a DLL/shared library or you have to set up some form of interprocess communication so the original process can send data to the new process to be operated on by the function in question and then sent back to the original process.

How about a Dynamic Link Library?
These can be linked/unlinked/replaced at runtime.
Or, if you really want to communicated between processes, you could use a named pipe.
edit- you can also create named shared memory.

for the step 4. we can't directly pass the memory location(address) from one process to another process because the two process use the different virtual memory space. One process can't use memory in other process.
So you need create a shared memory through two processes. and copy your function to this memory, then you can close the newly process.
for shared memory, if in windows, looks Creating Named Shared Memory
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366551(v=vs.85).aspx
after that, you still create another memory space to copy function to it again.
The idea is that the normal memory allocated only has read/write properties, if execute the programmer on it, the CPU will generate the exception.
So, if in windows, you need use VirtualAlloc to allocate the memory with the flag,PAGE_EXECUTE_READWRITE (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx)
void* address = NULL;
address= VirtualAlloc(NULL,
sizeof(emitcode),
MEM_COMMIT|MEM_RESERVE,
PAGE_EXECUTE_READWRITE);
After copy the function to address, you can call the function in address, but need be very careful to keep the stack balance.

Dynamic library are best suited for your problem. Also forget about launching a different process, it's another problem by itself, but in addition to the post above, provided that you did the virtual alloc correctly, just call your function within the same "loadder", then you shouldn't have to worry since you will be running the same RAM size bound stack.
The real problems are:
1 - Compiling the function you want to load, offline from the main program.
2 - Extract the relevant code from the binary produced by the compiler.
3 - Load the string.
1 and 2 require deep understanding of the entire compiler suite, including compiler flag options, linker, etc ... not just the IDE's push buttons ...
If you are OK, with 1 and 2, you should know why using a std::string or anything but pure char *, is an harmfull.
I could continue the entire story but it definitely deserve it's book, since this is Hacker/Cracker way of doing things I strongly recommand to the normal user the use of dynamic library, this is why they exists.

Usually we call this code injection ...
Basically it is forbidden by any modern operating system to access something for exceution after the initial loading has been done for sake of security, so we must fall back to OS wide validated dynamic libraries.
That's said, one you have valid compiled code, if you realy want to achieve that effect you must load your function into memory then define it as executable ( clear the NX bit ) in a system specific way.
But let's be clear, your function must be code position independant and you have no help from the dynamic linker in order to resolve symbol ... that's the hard part of the job.

GlobalLock Multithreading

I SEEM to be having an issue with GlobalLock in my application. I say seem because I haven't been able to witness the issue by stepping through yet but when I let it run it breaks in one of two locations.
The app has multiple threads (say 2) simultaneously reading and writing bitmaps from PDF files. each thread handles a different file.
The first location it breaks I am reading a dib from the pdf to be OCRed. OCR is reading the characters on the bitmap and turning them into string data. The second location is when a new PDF is being created with the string data being added over the bitmap.
GlobalLock is being used on a HANDLE created by the following:
GlobalAlloc(GMEM_MOVEABLE, uBytes);
I either get an AccessViolationError (always in the first instance) or I get GlobalLock returning a NULL pointer. (The second occurance)
It seems like one file is being read and another is having a copy written at the same time. There seems to be no pattern to which files it happens on.
Now I understand that the VC++ runtime has been multithreaded since 2005 (I am using VS2010 with 2008 toolchain). But is GlobalLock part of the runtime? It seems to me more like a platform independent thing.
I want to avoid just putting a CRITICAL_SECTION around globallock and globalunlock to get them to work, or at least not know why I am doing so.
Can anyone inform me better about GlobalLock/Unlock?
-A fish out of water

First, the Global* heap routines are provided for compatibility with 16-bit windows. They still work, but there's no real reason to use them anymore, except for compatibility with routines that still use global heap object handles. Note that GlobalLock/GlobalUnlock are not threading locks - they prevent the memory from moving, but multiple threads can GlobalLock the same object at the same time.
That said, they are otherwise thread-safe; they take a heap lock internally, so there is no need to wrap your own locking around every Global* call. If you are having problems like this, it suggests you may be trying to GlobalLock a freed object, or you may be corrupting the heap (heap overflows, use-after-free, etc). You may also be missing thread synchronization on the contents of the heap object - the Global* API does not prevent multiple threads from accessing or modifying the same object at once.

Better way to handle read buffer in C++

I am working a linux server using epoll. I have this code to read buffer
int str_len = read(m_events[i].data.fd, buf, BUF_SIZE);
if (str_len == 0) {
if (!removeClient(m_events[i].data.fd))
break;
close(m_events[i].data.fd);
} else {
char *pdata = buf;
pushWork(pdata);
}
buf is declared like this
buf[BUF_SIZE]
pushWork function declared like this
pushWork(char *pdata){
push pdata to the thread pool's queue
}
Firt of all, I think char *pdata = buf has a problem since it just points to the buffer and the buffer will be overriden whenever a new data comes in. So do I need to memcpy?
Also, is there any other nice way to handle this in c++ ? This code is kind of c style I think I have a better way to do this in c++.

is there any other nice way to handle
this in c++ ? This code is kind of c
style I think I have a better way to
do this in c++
Like I suggested in one of your previous questions, the Boost.Asio library is the de-facto C++ networking library. I strongly suggest you spend some time reading about it and studying the examples if you are writing a C++ networking application.

That way is bad because you have only one buf, meaning that you can only have one thread at a time working on it, so you are not using threads like they should be used.
What you can do is malloc() a buffer, copy the payload to that malloc'd buffer and pass it to the thread and let the thread free that buffer once it's done with it. Just be sure to free it or else there will be memory leaks.
example code:
char * p;
p = (char *)malloc(str_len+1);
memcpy(p, buf, str_len+1);
pushWork(p); //free p inside, after use.

You will need to do a memcpy (or something equivalent) unless you can make it so that the data in (buf) is never needed after pushWork() returns. If you can afford to all the processing of the buffer's data inside pushWork(), OTOH, no copying of the buffer is necessary.
Another alternative would be to allocate a buffer off the heap in advance each time therough the epoll loop, and read() directly into that buffer instead of reading into a buffer on the stack... that way the dynamically allocated buffer would still be available for use ever after the event loop goes on to other things. Note that doing this would open the door to possible memory leaks or memory exhaustion though, so you'd need to be careful to avoid those issues.
As far as a "C++ way" to do this, I'm sure there are some, but I don't know that they will necessarily improve your program. The C way works fine, why fix what isn't broken?
(One other thing to note: if you are using non-blocking I/O, then read() will sometimes read and return fewer than BUF_SIZE bytes, in which case you'll need logic to handle that eventuality. If you are using blocking I/O, OTOH, read() will sometimes block for long periods (e.g. if a client machine dies having sent only half of a buffer) which will block your epoll loop and make it unresponsive for perhaps several minutes. That's a more difficult problem to deal with, which is why I usually end up using non-blocking I/O, even if it is more of a pain to get right.

What can modify the frame pointer?

I have a very strange bug cropping up right now in a fairly massive C++ application at work (massive in terms of CPU and RAM usage as well as code length - in excess of 100,000 lines). This is running on a dual-core Sun Solaris 10 machine. The program subscribes to stock price feeds and displays them on "pages" configured by the user (a page is a window construct customized by the user - the program allows the user to configure such pages). This program used to work without issue until one of the underlying libraries became multi-threaded. The parts of the program affected by this have been changed accordingly. On to my problem.
Roughly once in every three executions the program will segfault on startup. This is not necessarily a hard rule - sometimes it'll crash three times in a row then work five times in a row. It's the segfault that's interesting (read: painful). It may manifest itself in a number of ways, but most commonly what will happen is function A calls function B and upon entering function B the frame pointer will suddenly be set to 0x000002. Function A:
result_type emit(typename type_trait<T_arg1>::take _A_a1) const
{ return emitter_type::emit(impl_, _A_a1); }
This is a simple signal implementation. impl_ and _A_a1 are well-defined within their frame at the crash. On actual execution of that instruction, we end up at program counter 0x000002.
This doesn't always happen on that function. In fact it happens in quite a few places, but this is one of the simpler cases that doesn't leave that much room for error. Sometimes what will happen is a stack-allocated variable will suddenly be sitting on junk memory (always on 0x000002) for no reason whatsoever. Other times, that same code will run just fine. So, my question is, what can mangle the stack so badly? What can actually change the value of the frame pointer? I've certainly never heard of such a thing. About the only thing I can think of is writing out of bounds on an array, but I've built it with a stack protector which should come up with any instances of that happening. I'm also well within the bounds of my stack here. I also don't see how another thread could overwrite the variable on the stack of the first thread since each thread has it's own stack (this is all pthreads). I've tried building this on a linux machine and while I don't get segfaults there, roughly one out of three times it will freeze up on me.

Stack corruption, 99.9% definitely.
The smells you should be looking carefully for are:-
Use of 'C' arrays
Use of 'C' strcpy-style functions
memcpy
malloc and free
thread-safety of anything using pointers
Uninitialised POD variables.
Pointer Arithmetic
Functions trying to return local variables by reference

I had that exact problem today and was knee-deep in gdb mud and debugging for a straight hour before occurred to me that I simply wrote over array boundaries (where I didn't expect it the least) of a C array.
So, if possible, use vectors instead because any decend STL implementation will give good compiler messages if you try that in debug mode (whereas C arrays punish you with segfaults).

I'm not sure what you're calling a "frame pointer", as you say:
On actual execution of that
instruction, we end up at program
counter 0x000002
Which makes it sound like the return address is being corrupted. The frame pointer is a pointer that points to the location on the stack of the current function call's context. It may well point to the return address (this is an implementation detail), but the frame pointer itself is not the return address.
I don't think there's enough information here to really give you a good answer, but some things that might be culprits are:
incorrect calling convention. If you're calling a function using a calling convention different from how the function was compiled, the stack may become corrupted.
RAM hit. Anything writing through a bad pointer can cause garbage to end up on the stack. I'm not familiar with Solaris, but most thread implementations have the threads in the same process address space, so any thread can access any other thread's stack. One way a thread can get a pointer into another thread's stack is if the address of a local variable is passed to an API that ultimately deals with the pointer on a different thread. unless you synchronize things properly, this will end up with the pointer accessing invalid data. Given that you're dealing with a "simple signal implementation", it seems like it's possible that one thread is sending a signal to another. Maybe one of the parameters in that signal has a pointer to a local?

There's some confusion here between stack overflow and stack corruption.
Stack Overflow is a very specific issue cause by try to use using more stack than the operating system has allocated to your thread. The three normal causes are like this.
void foo()
{
foo(); // endless recursion - whoops!
}
void foo2()
{
char myBuffer[A_VERY_BIG_NUMBER]; // The stack can't hold that much.
}
class bigObj
{
char myBuffer[A_VERY_BIG_NUMBER];
}
void foo2( bigObj big1) // pass by value of a big object - whoops!
{
}
In embedded systems, thread stack size may be measured in bytes and even a simple calling sequence can cause problems. By default on windows, each thread gets 1 Meg of stack, so causing stack overflow is much less of a common problem. Unless you have endless recursion, stack overflows can always be mitigated by increasing the stack size, even though this usually is NOT the best answer.
Stack Corruption simply means writing outside the bounds of the current stack frame, thus potentially corrupting other data - or return addresses on the stack.
At it's simplest:-
void foo()
{
char message[10];
message[10] = '!'; // whoops! beyond end of array
}

That sounds like a stack overflow problem - something is writing beyond the bounds of an array and trampling over the stack frame (and probably the return address too) on the stack. There's a large literature on the subject. "The Shell Programmer's Guide" (2nd Edition) has SPARC examples that may help you.

With C++ unitialized variables and race conditions are likely suspects for intermittent crashes.

Is it possible to run the thing through Valgrind? Perhaps Sun provides a similar tool. Intel VTune (Actually I was thinking of Thread Checker) also has some very nice tools for thread debugging and such.
If your employer can spring for the cost of the more expensive tools, they can really make these sorts of problems a lot easier to solve.

It's not hard to mangle the frame pointer - if you look at the disassembly of a routine you will see that it is pushed at the start of a routine and pulled at the end - so if anything overwrites the stack it can get lost. The stack pointer is where the stack is currently at - and the frame pointer is where it started at (for the current routine).
Firstly I would verify that all of the libraries and related objects have been rebuilt clean and all of the compiler options are consistent - I've had a similar problem before (Solaris 2.5) that was caused by an object file that hadn't been rebuilt.
It sounds exactly like an overwrite - and putting guard blocks around memory isn't going to help if it is simply a bad offset.
After each core dump examine the core file to learn as much as you can about the similarities between the faults. Then try to identify what is getting overwritten. As I remember the frame pointer is the last stack pointer - so anything logically before the frame pointer shouldn't be modified in the current stack frame - so maybe record this and copy it elsewhere and compare upon return.

Is something meaning to assign a value of 2 to a variable but instead is assigning its address to 2?
The other details are lost on me but "2" is the recurring theme in your problem description. ;)

I would second that this definitely sounds like a stack corruption due to out of bound array or buffer writing. Stack protector would be good as long as the writing is sequential, not random.

I second the notion that it is likely stack corruption. I'll add that the switch to a multi-threaded library makes me suspicious that what has happened is a lurking bug has been exposed. Possibly the sequencing the buffer overflow was occurring on unused memory. Now it's hitting another thread's stack. There are many other possible scenarios.
Sorry if that doesn't give much of a hint at how to find it.

I tried Valgrind on it, but unfortunately it doesn't detect stack errors:
"In addition to the performance penalty an important limitation of Valgrind is its inability to detect bounds errors in the use of static or stack allocated data."
I tend to agree that this is a stack overflow problem. The tricky thing is tracking it down. Like I said, there's over 100,000 lines of code to this thing (including custom libraries developed in-house - some of it going as far back as 1992) so if anyone has any good tricks for catching that sort of thing, I'd be grateful. There's arrays being worked on all over the place and the app uses OI for its GUI (if you haven't heard of OI, be grateful) so just looking for a logical fallacy is a mammoth task and my time is short.
Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
No one asked this, but I built with gcc-4.2. Also, I can guarantee ABI safety here so that's also not the issue. As for the "garbage at the end of the stack" on the RAM hit, the fact that it is universally 2 (though in different places in the code) makes me doubt that as garbage tends to be random.

It is impossible to know, but here are some hints that I can come up with.
In pthreads you must allocate the stack and pass it to the thread. Did you allocate enough? There is no automatic stack growth like in a single threaded process.
If you are sure that you don't corrupt the stack by writing past stack allocated data check for rouge pointers (mostly uninitialized pointers).
One of the threads could overwrite some data that others depend on (check your data synchronisation).
Debugging is usually not very helpful here. I would try to create lots of log output (traces for entry and exit of every function/method call) and then analyze the log.
The fact that the error manifest itself differently on Linux may help. What thread mapping are you using on Solaris? Make sure you map every thread to it's own LWP to ease the debugging.

Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
If you pass anything on the stack by reference or by address, this would most certainly happen if another thread tried to use it after the first thread returned from a function.
You might be able to repro this by forcing the app onto a single processor. I don't know how you do that with Sparc.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Bad memory address when allocating OpenCL buffer - c++

Related

Can Kernel mode driver do ReadProcessMemory on any process?

Converting a string into a function in c++

GlobalLock Multithreading

Better way to handle read buffer in C++

What can modify the frame pointer?

Categories

Resources