Crash using memcpy/memmove repeatedly - c++

So I have this litte code, It loops through memory regions, saves them to a byte array, then uses it and finally deletes it (deallocate it). This all happens in a non-main thread, therefore the use of CriticalSections.
Code looks like this:
SIZE_T addr_min = (SIZE_T)sysInfo.lpMinimumApplicationAddress;
SIZE_T addr_max = (SIZE_T)sysInfo.lpMaximumApplicationAddress;
while (addr_min < addr_max)
{
MEMORY_BASIC_INFORMATION mbi = { 0 };
if (!::VirtualQueryEx(hndl, (LPCVOID)addr_min, &mbi, sizeof(mbi)))
{
continue;
}
if (mbi.State == MEM_COMMIT && ((mbi.Protect & PAGE_GUARD) == 0) && ((mbi.Protect & PAGE_NOACCESS) == 0))
{
SIZE_T region_size = mbi.RegionSize;
PVOID Base_Address = mbi.BaseAddress;
BYTE * dump = new BYTE[region_size + 1];
EnterCriticalSection(...);
memset(dump, 0x00, region_size + 1);
//this is where it crashes, same thing with memcpy
//Access violation reading "dump"'s address:
//memmove(unsigned char * dst=0x42aff024, unsigned char *
//src=0x7a768000, unsigned long count=1409024)
std::memmove(dump, Base_Address, region_size);
LeaveCriticalSection(...);
//Do Stuff with dump, that only involves reading from it
if (dump){
delete[] dump;
dump = NULL;
}
}
addr_min += mbi.RegionSize;
}
Code works fine most of the time. But sometimes it just crashes in memcpy/memmove. Under the Visual Studio Debugger it shows that the crash is because there is a error reading "dump", how is that possible if I just define and allocated memory for it. Thanks!
Also, could it be because memory can change in the middle of memcpy?

Related

Finding the rendezvous structure of tracee (program being debugged)

I need debugger I am writing to give me the name of shared lib that program being debugged is linking with, or loading dynamically. I get the rendezvous structure as described in link.h, and answers to other questions, using DT_DEBUG, in the loop over _DYNAMIC[].
First, debugger never hits the break point set at r_brk.
Then I put a break in the program being debugged, and use link_map to print all loaded libraries. It only prints libraries loaded by the debugger, not the program being debugged.
It seems that, the rendezvous structure I am getting belongs to the debugger itself. If so, could you please tell me how to get the rendezvous structure of the program I am debugging? If what I am doing must work, your confirmation will be helpful, perhaps with some hint as to what else might be needed.
Thank you.
// You need to include <link.h>. All structures are explained
// in elf(5) manual pages.
// Caller has opened "prog_name", the debugee, and fd is the
// file descriptor. You can send the name instead, and do open()
// here.
// Debugger is tracing the debugee, so we are using ptrace().
void getRandezvousStructure(int fd, pid_t pd, r_debug& rendezvous) {
Elf64_Ehdr elfHeader;
char* elfHdrPtr = (char*) &elfHeader;
read(fd, elfHdrPtr, sizeof(elfHeader));
Elf64_Addr debugeeEntry = elfHeader.e_entry; // entry point of debugee
// Here, set a break at debugeeEntry, and after "PTRACE_CONT",
// and waitpid(), remove the break, and set rip back to debugeeEntry.
// After that, here it goes.
lseek(fd, elfHeader.e_shoff, SEEK_SET); // offset of section header
Elf64_Shdr secHeader;
elfHdrPtr = (char*) &secHeader;
Elf64_Dyn* dynPtr;
// Keep reading until we get: secHeader.sh_addr.
// That is the address of _DYNAMIC.
for (int i = 0; i < elfHeader.e_shnum; i++) {
read(fd, elfHdrPtr, elfHeader.e_shentsize);
if (secHeader.sh_type == SHT_DYNAMIC) {
dynPtr = (Elf64_Dyn*) secHeader.sh_addr; // address of _DYNAMIC
break;
}
}
// Here, we get "dynPtr->d_un.d_ptr" which points to rendezvous
// structure, r_debug
uint64_t data;
for (;; dynPtr++) {
data = ptrace(PTRACE_PEEKDATA, pd, dynPtr, 0);
if (data == DT_NULL) break;
if (data == DT_DEBUG) {
data = ptrace(PTRACE_PEEKDATA, pd, (uint64_t) dynPtr + 8 , 0);
break;
}
}
// Using ptrace() we read sufficient chunk of memory of debugee
// to copy to rendezvous.
int ren_size = sizeof(rendezvous);
char* buffer = new char[2 * ren_size];
char* p = buffer;
int total = 0;
uint64_t value;
for (;;) {
value = ptrace(PTRACE_PEEKDATA, pd, data, 0);
memcpy(p, &value, sizeof(value));
total += sizeof(value);
if (total > ren_size + sizeof(value)) break;
data += sizeof(data);
p += sizeof(data);
}
// Finally, copy the memory to rendezvous, which was
// passed by reference.
memcpy(&rendezvous, buffer, ren_size);
delete [] buffer;
}

ReadProcessMemory _out_ bytesread

The program uses ReadProcessMemory to scan through chunks of memory for a certain value. Unfortunately when I call ReadProcessMemory I get error 299.
void update_memblock(MEMBLOCK *mb)
{
//variables
static unsigned char tempbuf[128*1024];
size_t bytes_left;
size_t total_read;
size_t bytes_to_read;
size_t bytes_read;
size_t sizeMem;
size_t MemoryBase;
bytes_left = mb->size;
total_read = 0;
while (bytes_left)
{
bytes_to_read = (bytes_left > sizeof(tempbuf)) ? sizeof(tempbuf) : bytes_left;
ReadProcessMemory(mb->hProc ,mb->addr + total_read,mb->buffer, bytes_to_read, (SIZE_T*)&bytes_read);
if (bytes_read != bytes_to_read)break;
memcpy(mb->buffer + total_read, tempbuf,bytes_read);
bytes_left -= bytes_read;
total_read += bytes_read;
}
mb->size = total_read;
}
Erorr Code 299 (0x12B) ERROR_PARTIAL_COPY
"Only part of a ReadProcessMemory or WriteProcessMemory request was completed"
You're receiving this error because you're trying to read memory from a page that isn't "allocated".
You want to use VirtualQueryEx() on each page of memory which yields a MEMORY_BASIC_INFORMATION structure, which contains 2 variables of note:
State: can be MEM_COMMIT, MEM_FREE or MEM_RESERVE
Protect: can be any of the Memory Protection Constants
You want to loop through all the pages of memory, call VirtualQueryEx() on them and skip any pages that are bad. I like to skip all pages/regions in which state != MEM_COMMIT and Protect == PAGE_NOACCESS
Here is a psuedo code example:
MEMORY_BASIC_INFORMATION mbi = { 0 };
while (LoopingThroughTheMemories.bat)
{
if (!VirtualQueryEx(hProc, currentMemoryAddress, &mbi, sizeof(mbi))) continue
if (mbi.State != MEM_COMMIT || mbi.Protect == PAGE_NOACCESS) continue;
//good mem region, do ReadProcessMemory() stuffs
}

Have a very long buffer but only use the last 1GB bytes of data.

Need to write an application in C/C++ on Linux that receives a stream of bytes from a socket and process them. The total bytes could be close to 1TB. If I have unlimited amount memory, I will just put it all in the memory, so my application can easily process data. It's much easy to do many things on flat memory space, such as memmem(), memcmp() ... On a circular buffer, the application has to be extra smart to be aware of the circular buffer.
I have about 8G of memory, but luckily due to locality, my application never needs to go back by more than 1GB from the latest data it received. Is there a way to have a 1TB buffer, with only the latest 1GB data mapped to physical memory? If so, how to do it?
Any ideas? Thanks.
Here's an example. It sets up a full terabyte mapping, but initially inaccessible (PROT_NONE). You, the programmer, maintain a window that can only extend and move upwards in memory. The example program uses a one and a half gigabyte window, advancing it in steps of 1,023,739,137 bytes (the mapping_use() makes sure the available pages cover at least the desired region), and does actually modify every page in every window, just to be sure.
#define _GNU_SOURCE
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
typedef struct mapping mapping;
struct mapping {
unsigned char *head; /* Start of currently accessible region */
unsigned char *tail; /* End of currently accessible region */
unsigned char *ends; /* End of region */
size_t page; /* Page size of this mapping */
};
/* Discard mapping.
*/
void mapping_free(mapping *const m)
{
if (m && m->ends > m->head) {
munmap(m->head, (size_t)(m->ends - m->head));
m->head = NULL;
m->tail = NULL;
m->ends = NULL;
m->page = 0;
}
}
/* Move the accessible part up in memory, to [from..to).
*/
int mapping_use(mapping *const m, void *const from, void *const to)
{
if (m && m->ends > m->head) {
unsigned char *const head = ((unsigned char *)from <= m->head) ? m->head :
((unsigned char *)from >= m->ends) ? m->ends :
m->head + m->page * (size_t)(((size_t)((unsigned char *)from - m->head)) / m->page);
unsigned char *const tail = ((unsigned char *)to <= head) ? head :
((unsigned char *)to >= m->ends) ? m->ends :
m->head + m->page * (size_t)(((size_t)((unsigned char *)to - m->head) + m->page - 1) / m->page);
if (head > m->head) {
munmap(m->head, (size_t)(head - m->head));
m->head = head;
}
if (tail > m->tail) {
#ifdef USE_MPROTECT
mprotect(m->tail, (size_t)(tail - m->tail), PROT_READ | PROT_WRITE);
#else
void *result;
do {
result = mmap(m->tail, (size_t)(tail - m->tail), PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_FIXED | MAP_PRIVATE | MAP_NORESERVE, -1, (off_t)0);
} while (result == MAP_FAILED && errno == EINTR);
if (result == MAP_FAILED)
return errno = ENOMEM;
#endif
m->tail = tail;
}
return 0;
}
return errno = EINVAL;
}
/* Initialize a mapping.
*/
int mapping_create(mapping *const m, const size_t size)
{
void *base;
size_t page, truesize;
if (!m || size < (size_t)1)
return errno = EINVAL;
m->head = NULL;
m->tail = NULL;
m->ends = NULL;
m->page = 0;
/* Obtain default page size. */
{
long value = sysconf(_SC_PAGESIZE);
page = (size_t)value;
if (value < 1L || (long)page != value)
return errno = ENOTSUP;
}
/* Round size up to next multiple of page. */
if (size % page)
truesize = size + page - (size % page);
else
truesize = size;
/* Create mapping. */
do {
errno = ENOTSUP;
base = mmap(NULL, truesize, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, (off_t)0);
} while (base == MAP_FAILED && errno == EINTR);
if (base == MAP_FAILED)
return errno;
/* Success. */
m->head = base;
m->tail = base;
m->ends = (unsigned char *)base + truesize;
m->page = page;
errno = 0;
return 0;
}
static void memtouch(void *const ptr, const size_t size)
{
if (ptr && size > 0) {
unsigned char *mem = (unsigned char *)ptr;
const size_t step = 2048;
size_t n = size / (size_t)step - 1;
mem[0]++;
mem[size-1]++;
while (n-->0) {
mem += step;
mem[0]++;
}
}
}
int main(void)
{
const size_t size = (size_t)1024 * (size_t)1024 * (size_t)1024 * (size_t)1024;
const size_t need = (size_t)1500000000UL;
const size_t step = (size_t)1023739137UL;
unsigned char *base;
mapping map;
size_t i;
if (mapping_create(&map, size)) {
fprintf(stderr, "Cannot create a %zu-byte mapping: %m.\n", size);
return EXIT_FAILURE;
}
printf("Have a %zu-byte mapping at %p to %p.\n", size, (void *)map.head, (void *)map.ends);
fflush(stdout);
base = map.head;
for (i = 0; i <= size - need; i += step) {
printf("Requesting %p to %p .. ", (void *)(base + i), (void *)(base + i + need));
fflush(stdout);
if (mapping_use(&map, base + i, base + i + need)) {
printf("Failed (%m).\n");
fflush(stdout);
return EXIT_FAILURE;
}
printf("received %p to %p.\n", (void *)map.head, (void *)map.tail);
fflush(stdout);
memtouch(base + i, need);
}
mapping_free(&map);
return EXIT_SUCCESS;
}
The approach is twofold. First, an inaccessible (PROT_NONE) mapping is created to reserve the necessary virtual contiguous address space. If we omit this step, it would make it possible for a malloc() call or similar to acquire pages within this range, which would defeat the entire purpose; a single terabyte-long mapping.
Second, when the accessible window extends into the region, either mprotect() (if USE_MPROTECT is defined), or mmap() is used to make the required pages accessible. Pages no longer needed are completely unmapped.
Compile and run using
gcc -Wall -Wextra -std=c99 example.c -o example
time ./example
or, to use mmap() only once and mprotect() to move the window,
gcc -DUSE_MPROTECT=1 -Wall -Wextra -std=c99 example.c -o example
time ./example
Note that you probably don't want to run the test if you don't have at least 4GB of physical RAM.
On this particular machine (i5-4200U laptop with 4GB of RAM, 3.13.0-62-generic kernel on Ubuntu x86_64), quick testing didn't show any kind of performance difference between mprotect() and mmap(), in execution speed or resident set size.
If anyone bothers to compile and run the above, and finds that one of them has a repeatable benefit/drawback (resident set size or time used), I'd very much like to know about it. Please also define your kernel and CPU used.
I'm not sure which details I should expand on, since this is pretty straightforward, really, and the Linux man pages project man 2 mmap and man 2 mprotect pages are quite descriptive. If you have any questions on this approach or program, I'd be happy to try and elaborate.

free() causing stack overflow

I am trying to develop an application with Visual Studio C++ using some communication DLLs .
In one of the DLLs, I have a stack overflow exception.
I have two functions, one receives packet, and another function which do some operations on the packets.
static EEcpError RxMessage(unsigned char SrcAddr, unsigned char SrcPort, unsigned char DestAddr, unsigned char DestPort, unsigned char* pMessage, unsigned long MessageLength)
{
EEcpError Error = ERROR_MAX;
TEcpChannel* Ch = NULL;
TDevlinkMessage* RxMsg = NULL;
// Check the packet is sent to an existing port
if (DestPort < UC_ECP_CHANNEL_NB)
{
Ch = &tEcpChannel[DestPort];
RxMsg = &Ch->tRxMsgFifo.tDevlinkMessage[Ch->tRxMsgFifo.ucWrIdx];
// Check the packet is not empty
if ((0UL != MessageLength)
&& (NULL != pMessage))
{
if (NULL == RxMsg->pucDataBuffer)
{
// Copy the packet
RxMsg->SrcAddr = SrcAddr;
RxMsg->SrcPort = SrcPort;
RxMsg->DestAddr =DestAddr;
RxMsg->DestPort = DestPort;
RxMsg->ulDataBufferSize = MessageLength;
RxMsg->pucDataBuffer = (unsigned char*)malloc(RxMsg->ulDataBufferSize);
if (NULL != RxMsg->pucDataBuffer)
{
memcpy(RxMsg->pucDataBuffer, pMessage, RxMsg->ulDataBufferSize);
// Prepare for next message
if ((UC_ECP_FIFO_DEPTH - 1) <= Ch->tRxMsgFifo.ucWrIdx)
{
Ch->tRxMsgFifo.ucWrIdx = 0U;
}
else
{
Ch->tRxMsgFifo.ucWrIdx += 1U;
}
// Synchronize the application
if (0 != OS_MbxPost(Ch->hEcpMbx))
{
Error = ERROR_NONE;
}
else
{
Error = ERROR_WINDOWS;
}
}
else
{
Error = ERROR_WINDOWS;
}
}
else
{
// That should never happen. In case it happens, that means the FIFO
// is full. Either the FIFO size should be increased, or the listening thread
// does no more process the messages.
// In that case, the last received message is lost (until the messages are processed, or forever...)
Error = ERROR_FIFO_FULL;
}
}
else
{
Error = ERROR_INVALID_PARAMETER;
}
}
else
{
// Trash the packet, nothing else to do
Error = ERROR_NONE;
}
return Error;
}
static EEcpError ProcessNextRxMsg(unsigned char Port, unsigned char* SrcAddr, unsigned char* SrcPort, unsigned char* DestAddr, unsigned char* Packet, unsigned long* PacketSize)
{
EEcpError Error = ERROR_MAX;
TEcpChannel* Ch = &tEcpChannel[Port];
TDevlinkMessage* RxMsg = &Ch->tRxMsgFifo.tDevlinkMessage[Ch->tRxMsgFifo.ucRdIdx];
if (NULL != RxMsg->pucDataBuffer)
{
*SrcAddr = RxMsg->ucSrcAddr;
*SrcPort = RxMsg->ucSrcPort;
*DestAddr = RxMsg->ucDestAddr;
*PacketSize = RxMsg->ulDataBufferSize;
memcpy(Packet, RxMsg->pucDataBuffer, RxMsg->ulDataBufferSize);
// Cleanup the processed message
free(RxMsg->pucDataBuffer); // <= Exception stack overflow after 40 min
RxMsg->pucDataBuffer = NULL;
RxMsg->ulDataBufferSize = 0UL;
RxMsg->ucSrcAddr = 0U;
RxMsg->ucSrcPort = 0U;
RxMsg->ucDestAddr = 0U;
RxMsg->ucDestPort = 0U;
// Prepare for next message
if ((UC_ECP_FIFO_DEPTH - 1) <= Ch->tRxMsgFifo.ucRdIdx)
{
Ch->tRxMsgFifo.ucRdIdx = 0U;
}
else
{
Ch->tRxMsgFifo.ucRdIdx += 1U;
}
Error =ERROR_NONE;
}
else
{
Error = ERROR_NULL_POINTER;
}
return Error;
}
The problem occur after 40 min, during all this time I receive a lot of packets, and everything is going well.
After 40 min, the stack overflow exception occur on the free.
I don't know what is going wrong.
Can anyone help me please ?
Thank you.
A few suggestions:
The line
memcpy(Packet, RxMsg->pucDataBuffer, RxMsg->ulDataBufferSize);
is slightly suspect as it occurs just before the free() call which crashes. How is Packet allocated and how are you making sure a buffer overflow does not occur here?
If this is an asynchronous / multi-threaded program do you have the necessary locks to prevent data from being written/read at the same time?
Best bet if you still need to find the issue is to run a tool like Valgrind to help diagnose and narrow down memory issues more precisely. As dasblinklight mentions in the comments the issue most likely originates somewhere else and merely happens to show up at the free() call.

Segmentation fault occurs when calling function in the Pin tool

I am currently building a Pin tool which detects uninitialized reads from Linux application, based on this blog post.
You can also see the author's code from the blog.
Since this one is for Windows, I tried to create a Linux-compatible one.
But when I execute my Pin tool with application, a segmentation fault occurs.
The weird one is that the fault occurs when a function is called(the fault occurs when the pin tool is calling the function taint_get which is in the taint_define function), not because of access of uninitialized heap pointer or such points of general segmentation fault.
The point of the segmentation fault looks like this:
VOID Instruction(INS ins, VOID *v)
{
Uninit_Instruction(ins, v);
}
void Uninit_Instruction(INS ins, void* v)
{
// check if the stack pointer is altered (i.e. memory is allocated on the
// stack by subtracting an immediate from the stack pointer)
if(INS_Opcode(ins) == XED_ICLASS_SUB &&
INS_OperandReg(ins, 0) == REG_STACK_PTR &&
INS_OperandIsImmediate(ins, 1))
{
// insert call after, so we can pass the stack pointer directly
INS_InsertCall(ins, IPOINT_AFTER, (AFUNPTR)taint_undefined,
IARG_REG_VALUE,
REG_STACK_PTR,
IARG_ADDRINT, (UINT32) INS_OperandImmediate(ins, 1),
IARG_END);
}
UINT32 memOperands = INS_MemoryOperandCount(ins);
for (UINT32 memOp = 0; memOp < memOperands; memOp++)
{
if (INS_MemoryOperandIsRead(ins, memOp))
{
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)taint_check,
IARG_INST_PTR,
IARG_MEMORYOP_EA, memOp,
IARG_MEMORYREAD_SIZE,
IARG_END);
}
if (INS_MemoryOperandIsWritten(ins, memOp))
{
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)taint_define,
IARG_MEMORYOP_EA, memOp,
IARG_MEMORYWRITE_SIZE,
IARG_END);
}
}
}
The callback functions look like these:
// Taint this address as written
void taint_define(ADDRINT addr, UINT32 size)
{
// Debug purpose
TraceFile << "taint_define: " << addr << ", " << size << endl;
// taint the addresses as defined, pretty slow, but easiest to implement
for (UINT32 i = 0; i < size; i++)
{
//TraceFile << "taint_define_loop size: " << size << endl;
UINT32 *t = taint_get(addr + i);
TraceFile << "after taint_get" << endl;
UINT32 index = (addr + i) % 0x20000;
// define this bit
t[index / 32] |= 1 << (index % 32);
}
}
inline UINT32* taint_get(ADDRINT addr)
{
// Debug purpose
TraceFile << "taint_get: " << addr;
// allocate memory to taint these memory pages
if(taint[addr / 0x20000] == NULL) {
// we need an 16kb page to track 128k of memory
/*
taint[addr / 0x20000] = (UINT32 *) W::VirtualAlloc(NULL, 0x20000 / 8,
MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
*/
taint[addr / 0x20000] = (UINT32*)malloc(0x20000/8);
}
return taint[addr / 0x20000];
}
The output looks like this:
C:Tool (or Pin) caused signal 11 at PC 0x7fcf475e08a4
segmentation fault (core dumped)
and the log is here.
Watched Image count: 0x1
WatchedImage: unread_3vars
Uninit_Image
Uninit_Image
Thread start
taint_define: 0x7fff06930d58, 0x8
I'm currently working on Fedora core 17 x86-64, gcc 4.7.2, and Pin 2.12-58423.
And, my pin tool code is attached here
I am currently building a Pin tool which detects uninitialized reads from Linux application, based on this blog post.
This doesn't really answer your question, and you may have other reasons to learn Pin tool, but ...
We've found Pin-based tools inadequate for instrumenting non-toy programs. IF your goal is to detect uninitialized memory reads, consider using Memory Sanitizer.
readb4write is 32 bit only. I don't know how are you are compiling it but even if you add -m32 it might still not work. This is what happened in my case but i am running it on Windows.
You can tell it is 32 bit only by looking for example at the comment: "// we use 0x8000 chunks of 128k to taint"
0x8000 x 128kb = 4294967296 which is the virtual range limit of 32 bit process.
On x64 you would need to cater for 48 bit addresses in taint_get method. This is still a naive implementation but so is everything else
typedef UINT64 * TTaint[0x80000];
TTaint *taintTable[0x10000] = { 0 };
inline UINT64 *taint_get(ADDRINT addr)
{
UINT64 chunkAddress = addr / 0x20000; //get number address of 128kb chunk.
UINT64 firstLevAddr = chunkAddress / 0x10000;
UINT64 secondLevelAddr = chunkAddress % 0x10000;
TTaint *taint = NULL;
if (taintTable[firstLevAddr] == NULL){
taintTable[firstLevAddr] = (TTaint*)W::VirtualAlloc(NULL, sizeof(TTaint),
MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
}
taint = taintTable[firstLevAddr];
// allocate memory to taint these memory pages
if ((*taint)[secondLevelAddr ] == NULL) {
// we need an 16kb page to track 128k of memory
(*taint)[secondLevelAddr] = (UINT64 *)W::VirtualAlloc(NULL, 0x20000 / 8,
MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
}
return (*taint)[secondLevelAddr];
}
Also most (if not all ) variables need to be UINT64 instead of UINT32. And 32 need to be changed to 64.
There is another problem i have not solved yet. There is a line that detects if the instruction accessing uninitialized memory belongs to the program being checked. It is unlikely that it is still valid in x64:
(ip & 0xfff00000) == 0x00400000)
I will publish the code in github if i manage to get it working.