faster than Stackwalk - c++

Does anybody know of a better/faster way to get the call stack than "StackWalk"?
I also think that StackWalk can be slower on methods with a lot of variables...
(I wonder what commercial profilers do?)
I'm using C++ on windows. :)
thanks :)

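I don't know if it's faster, and it won't show you any symbols, and I'm sure you can do better than that, but this is some code I wrote a while back when I needed this info (it only works in 32-bit x86 Windows builds compiled with MSVC, because it uses inline assembly and walks the ebp frame chain):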
struct CallStackItem
{
void* pc;
CallStackItem* next;
CallStackItem()
{
pc = NULL;
next = NULL;
}
};
typedef void* CallStackHandle;
CallStackHandle CreateCurrentCallStack(int nLevels)
{
void** ppCurrent = NULL;
// Get the current frame pointer (set up by the compiler in the function prologue).
__asm { mov ppCurrent, ebp };
// Don't limit if nLevels is not positive
if (nLevels <= 0)
nLevels = 1000000;
// ebp points to the old call stack, where the first two items look like this:
// ebp -> [0] Previous ebp
// [1] previous program counter
CallStackItem* pResult = new CallStackItem;
CallStackItem* pCurItem = pResult;
int nCurLevel = 0;
// We need to read two pointers from the stack
int nRequiredMemorySize = sizeof(void*) * 2;
while (nCurLevel < nLevels && ppCurrent && !IsBadReadPtr(ppCurrent, nRequiredMemorySize))
{
// Keep the previous program counter (where the function will return to)
pCurItem->pc = ppCurrent[1];
pCurItem->next = new CallStackItem;
// Go to the previously saved ebp
ppCurrent = (void**)*ppCurrent;
pCurItem = pCurItem->next;
++nCurLevel;
}
return pResult;
}
void PrintCallStack(CallStackHandle hCallStack)
{
CallStackItem* pCurItem = (CallStackItem*)hCallStack;
printf("----- Call stack start -----\n");
while (pCurItem)
{
printf("0x%p\n", pCurItem->pc);
pCurItem = pCurItem->next;
}
printf("----- Call stack end -----\n");
}
void ReleaseCallStack(CallStackHandle hCallStack)
{
CallStackItem* pCurItem = (CallStackItem*)hCallStack;
CallStackItem* pPrevItem;
while (pCurItem)
{
pPrevItem = pCurItem;
pCurItem = pCurItem->next;
delete pPrevItem;
}
}

I use Jochen Kalmbach's StackWalker.
I sped it up this way:
Most of the time is lost looking for the PDB files in the default directories and on the PDB servers.
I use only one PDB path and implemented a whitelist for the images I want resolved (I have no need to look for user32.pdb).
Sometimes I don't need to walk all the way to the bottom of the stack, so I defined a maximum depth.
Code changes:
BOOL StackWalker::LoadModules()
{
...
// comment this line out and replace it with your pdb path
// BOOL bRet = this->m_sw->Init(szSymPath);
BOOL bRet = this->m_sw->Init(<my pdb path>);
...
}
BOOL StackWalker::ShowCallstack(int iMaxDeep /* new parameter */ ... )
{
...
// define a maximum depth
// for (frameNum = 0; ; ++frameNum )
for (frameNum = 0; frameNum < iMaxDeep; ++frameNum )
{
...
}
}
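The whitelist itself is not shown above. Here is a minimal sketch of such a filter, assuming module names are compared case-insensitively by base file name. shouldResolveModule and its helpers are hypothetical names, not part of StackWalker:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lower-cases an ASCII string so comparisons ignore case.
static std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Strips any directory part, accepting both '\\' and '/' separators.
static std::string baseName(const std::string& path)
{
    const std::size_t pos = path.find_last_of("\\/");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Hypothetical filter: true only for modules whose symbols are worth loading.
bool shouldResolveModule(const std::string& modulePath,
                         const std::vector<std::string>& whitelist)
{
    const std::string name = toLower(baseName(modulePath));
    for (const std::string& allowed : whitelist)
        if (toLower(allowed) == name)
            return true;
    return false;
}
```

A check like this would run once per loaded module, before any symbol lookup is attempted for it, so system DLLs never trigger a PDB search.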

Check out http://msdn.microsoft.com/en-us/library/bb204633%28VS.85%29.aspx - this is "CaptureStackBackTrace", although that page documents it as "RtlCaptureStackBackTrace".
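A minimal sketch of calling that function follows. The helper names captureFrames and printFrames are hypothetical. The non-Windows branch uses glibc's backtrace() only so that the snippet also builds off Windows; it is my addition and not part of the suggestion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

#if defined(_WIN32)
#include <windows.h>
#else
#include <execinfo.h>
#endif

// Hypothetical helper: returns raw return addresses for the current thread.
// No symbol lookup is done here, which is what keeps the capture cheap.
std::vector<void*> captureFrames(unsigned maxFrames)
{
    assert(maxFrames > 0);
    std::vector<void*> frames(maxFrames);
#if defined(_WIN32)
    // FramesToSkip = 0, BackTraceHash = NULL (not needed here).
    USHORT n = CaptureStackBackTrace(0, static_cast<DWORD>(maxFrames),
                                     frames.data(), NULL);
#else
    // Assumption: glibc is available; used only for portability of the sketch.
    int n = backtrace(frames.data(), static_cast<int>(maxFrames));
#endif
    frames.resize(static_cast<std::size_t>(n));
    return frames;
}

// Prints one address per line; resolving names can be deferred until needed.
void printFrames(const std::vector<void*>& frames)
{
    for (void* pc : frames)
        std::printf("%p\n", pc);
}
```

Calling captureFrames(32) and storing the result is enough at capture time. On Windows XP and Server 2003 the sum of frames skipped and frames captured must stay below 63, so a small cap is a safe default.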

Related

How to look up 64-bit module's function table when it's mapped in memory?

My goal is to understand stack unwinding in 64-bit PE32+ executable format in Windows, or how the following API can calculate addresses of a function prologue, body, epilogue, etc.:
CONTEXT context = {0};
RtlCaptureContext(&context);
DWORD64 ImgBase = 0;
RUNTIME_FUNCTION* pRTFn = RtlLookupFunctionEntry(context.Rip, &ImgBase, NULL);
_tprintf(L"Prologue=0x%p\n", (void*)(ImgBase + pRTFn->BeginAddress));
I know that the information on the offsets of all non-leaf functions used by the linker is included in the PE32+ header in the exceptions directory. So I tried to write my own function to parse it. I got to this point where I got stumped:
//INFO -- must be compiled as x64 only!
void GetFunctionTable(BYTE* lpBaseAddress, size_t szImageSz)
{
if(lpBaseAddress)
{
if(szImageSz > sizeof(IMAGE_DOS_HEADER))
{
IMAGE_DOS_HEADER* pDOSHeader = (IMAGE_DOS_HEADER*)lpBaseAddress;
if(pDOSHeader->e_magic == IMAGE_DOS_SIGNATURE)
{
IMAGE_NT_HEADERS* pNtHeader = (IMAGE_NT_HEADERS*)((BYTE*)pDOSHeader + pDOSHeader->e_lfanew);
PIMAGE_DATA_DIRECTORY pDataDirectories = NULL;
if(pNtHeader->OptionalHeader.Magic == IMAGE_NT_OPTIONAL_HDR64_MAGIC)
{
//64-bit image only
IMAGE_NT_HEADERS64* pHdr64 = (IMAGE_NT_HEADERS64*)pNtHeader;
IMAGE_OPTIONAL_HEADER64* pIOH64 = &pHdr64->OptionalHeader;
pDataDirectories = pIOH64->DataDirectory;
IMAGE_DATA_DIRECTORY* pExceptDir = &pDataDirectories[IMAGE_DIRECTORY_ENTRY_EXCEPTION];
if(pExceptDir->VirtualAddress &&
pExceptDir->Size)
{
IMAGE_RUNTIME_FUNCTION_ENTRY* pRFs = (IMAGE_RUNTIME_FUNCTION_ENTRY*)
GetPtrFromRVA64(pExceptDir->VirtualAddress, pNtHeader, lpBaseAddress);
//'pRFs' = should point to an array of RUNTIME_FUNCTION structs
// but in my case it points to an empty region of memory with all zeros.
}
}
}
}
}
}
with the following helper functions:
PIMAGE_SECTION_HEADER GetEnclosingSectionHeader64(DWORD_PTR rva, PIMAGE_NT_HEADERS64 pNTHeader)
{
PIMAGE_SECTION_HEADER section = IMAGE_FIRST_SECTION(pNTHeader);
unsigned int i;
for ( i=0; i < pNTHeader->FileHeader.NumberOfSections; i++, section++ )
{
if ( (rva >= section->VirtualAddress) &&
(rva < (section->VirtualAddress + section->Misc.VirtualSize)))
return section;
}
return 0;
}
LPVOID GetPtrFromRVA64(DWORD rva, const void* pNTHeader, const void* imageBase)
{
PIMAGE_SECTION_HEADER pSectionHdr;
INT_PTR delta;
pSectionHdr = GetEnclosingSectionHeader64( rva, (PIMAGE_NT_HEADERS64)pNTHeader );
if ( !pSectionHdr )
return 0;
delta = (INT_PTR)(pSectionHdr->VirtualAddress - pSectionHdr->PointerToRawData);
return (PVOID) ( (BYTE*)imageBase + rva - delta );
}
So I'm testing it on the running executable itself:
HMODULE hMod = ::GetModuleHandle(NULL);
MODULEINFO mi = {0};
if(::GetModuleInformation(::GetCurrentProcess(), hMod, &mi, sizeof(mi)))
{
GetFunctionTable((BYTE*)hMod, mi.SizeOfImage);
}
But the problem is that inside my GetFunctionTable, when I try to look up the function table mapped in memory via the IMAGE_DIRECTORY_ENTRY_EXCEPTION directory, I get a pointer (i.e. IMAGE_RUNTIME_FUNCTION_ENTRY*) to an empty region of memory. I must not be translating the RVA correctly.
So can anyone who knows how the PE32+ header is mapped in memory please show me what I am doing wrong there?
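One likely explanation, offered as an assumption because the thread has no confirmed answer: the VirtualAddress - PointerToRawData adjustment in GetPtrFromRVA64 converts an RVA into an offset within the file on disk. For a module the loader has already mapped (which is what GetModuleHandle returns), sections sit at their virtual addresses, so the pointer is simply base + RVA. The sketch below uses a simplified, hypothetical Section struct instead of the real IMAGE_SECTION_HEADER so the two translations can be compared side by side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for IMAGE_SECTION_HEADER (hypothetical, illustration only).
struct Section {
    std::uint32_t virtualAddress;   // RVA of the section once loaded
    std::uint32_t virtualSize;      // size of the section in memory
    std::uint32_t pointerToRawData; // offset of the section in the file
};

// Translation for a raw file read from disk: RVA -> file offset.
// Returns false if no section contains the RVA.
bool rvaToFileOffset(std::uint32_t rva, const std::vector<Section>& sections,
                     std::uint32_t& offsetOut)
{
    for (const Section& s : sections) {
        if (rva >= s.virtualAddress && rva < s.virtualAddress + s.virtualSize) {
            offsetOut = rva - s.virtualAddress + s.pointerToRawData;
            return true;
        }
    }
    return false;
}

// Translation for an image already mapped by the loader: no section lookup,
// the RVA is added directly to the module base.
const std::uint8_t* rvaToMappedPtr(const std::uint8_t* imageBase, std::uint32_t rva)
{
    return imageBase + rva;
}
```

With a section at RVA 0x5000 stored at file offset 0x3200, the file translation of RVA 0x5000 yields 0x3200, while the mapped translation yields base + 0x5000. Applying the file offset to a mapped module lands 0x1E00 bytes too early, which could easily be zero-filled padding. Under that assumption, the code in the question would use lpBaseAddress + pExceptDir->VirtualAddress directly.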

make_fcontext/jump_fcontext used with shared stack

Is there a way to use Boost.Context make_fcontext/jump_fcontext with a shared stack, so that coroutines share memory by saving/restoring the stack?
It seems that make_fcontext and jump_fcontext write to the stack themselves, and I get crashes when trying to save/restore the stack on yield/resume, but it is really hard for me to understand what happens because make_fcontext/jump_fcontext are pure assembly code.
Here are the coroutine methods that trigger the segmentation fault (the same code works very well if I use a different stack for each coroutine and don't use saveStack/restoreStack):
void resume()
{
if (yielded)
{
restoreStack();
yielded = false;
}
else
{
running = true;
thisContext = boost::context::make_fcontext(
(char*)sharedStackPtr + sharedStackSize ,
sharedStackSize,
my_entry_func);
}
boost::context::jump_fcontext(&yieldContext, thisContext, reinterpret_cast<intptr_t>(this));
}
void yield()
{
yielded = true;
saveStack();
boost::context::jump_fcontext(&thisContext, yieldContext, 0);
}
void restoreStack()
{
char* stackTop = (char*)sharedStackPtr + sharedStackSize ;
memcpy(stackTop - savedStackSize, savedStackPtr, savedStackSize);
}
void saveStack()
{
char dummy = 0;
char* stackPointer = &dummy;
char* stackTop = (char*)sharedStackPtr + sharedStackSize ;
assert((stackPointer < stackTop) && (stackPointer >= sharedStackPtr ));
savedStackSize = stackTop - stackPointer;
if (savedStackPtr == nullptr)
{
savedStackPtr = coroutine_stack_alloc(savedStackSize);
}
else
{
savedStackPtr = coroutine_stack_realloc(savedStackPtr, savedStackSize);
}
memcpy(savedStackPtr, stackPointer, savedStackSize);
}
Any ideas? Is there something I am doing wrong somewhere?
make_fcontext() must be applied to a stack in order to initialize it before it can be used with jump_fcontext(). Of course, you could reuse a stack by applying make_fcontext() again after the execution context has finished.

Same names in PCBs in array of pointers to Thread

I am writing code for thread handling in C++. Each Thread instance has a pointer to a PCB structure, and in the Thread constructor I just call myPCB = new PCB(name, stackSize, timeSlice, this). It was all working just fine until I tried to make an array of pointers to Thread.
When I make a single pointer to a thread and initialize it with new Thread(name, stackSize, timeSlice), the name in that Thread's PCB is assigned correctly.
But when I try it with an array of pointers, all the PCBs have the same value for the name variable.
I have checked and they are all different PCBs (their IDs are different). Their names are properly initialized in the constructor, but somewhere between the end of the Nth constructor and the end of the (N+1)th, all names get the same value, that of the (N+1)th.
PCB constructor:
PCB::PCB(TName namee, StackSize stackSizee, Time timeSlicee,Thread *threadd){
status = begging;
id = genID++;
if(namee) name = namee;
else name = "Thread"+id;
createStack(stackSizee);
thread = threadd;
timeSlice = timeSlicee;
System::numberOfThreads++;
System::allPCBQueue.add(this);
waitingMe = new Queue();
waitingFor = 0;
semaphore = 0;
sleepTime = -1;
}
void PCB::createStack(StackSize stackSize){
intOff;
if(stackSize > maxStack) stack = new char[maxStack];
else stack = new char[stackSize];
newSS = FP_SEG(stack + stackSize);
newSP = FP_OFF(stack + stackSize);
asm{
mov oldSS, ss
mov oldSP, sp
mov ss, newSS
mov sp, newSP
push ax; push bx; push cx; push dx; push es; push ds; push si; push di; push bp
mov newSP, sp
mov newSS, ss
mov sp, oldSP
mov ss, oldSS
}
stackPointer = MK_FP(newSS, newSP);
intOn;
}
I figure it's something in createStack(), but I don't know what. All help is appreciated.
*Note: I currently don't have constant access to the internet, so please don't get angry if I don't reply quickly. I will try to check on this question as often as I can.
EDITED:
PCB class definition:
class PCB
{
static ID genID;
char *stack;
void *stackPointer;
Thread *thread;
TName name;
ID id;
Time timeSlice, sleepTime;
Status status;
Queue *waitingMe;
PCB* waitingFor;
KernelSem* semaphore;
friend class Thread;
// static unsigned newSS, newSP, oldSS, oldSP;
public:
static StackSize maxStack;
PCB(TName name, StackSize stackSize, Time timeSlice,Thread *thread);
~PCB(void);
void runThread();
ID getID(){
return id;
}
TName getName(){
return name;
}
void setStatus(Status status){
this->status = status;
}
Status getStatus(){
return status;
}
int getEnding(){
if(status == ending) return 1;
return 0;
}
int getBegging(){
if(status == begging) return 1;
return 0;
}
void createStack(StackSize stackSize);
void* getStackPointer(){
return stackPointer;
}
void setStack(void *newStackPointer){
stackPointer = newStackPointer;
}
Time getTimeSlice(){return timeSlice;}
Time getSleepTime(){return sleepTime;}
void decrementSleepTime(){sleepTime--;}
void setSemaphore(KernelSem* kersem){this->semaphore = kersem;}
void resetSemphore(){this->semaphore = 0;}
Thread* getThread(){return thread;}
};
Code where this happens:
Producer **pro = new Producer*[N];
for (i=0; i<N; i++){
producerName[8]='0'+i;
pro[i] = new Producer(buff, producerName ,'0'+i, TIME_SLICE);
pro[i]->start();
}
It's part of a test file that I got with this assignment. It mustn't be changed, but it is ordinary code.
I have put
allPCBQueue->listAll()
after
pro[i] = new Producer(buff, producerName ,'0'+i, TIME_SLICE);
and I always see that all of the names are the same. allPCBQueue is a simple list of PCBs.
if(namee) name = namee;
else name = "Thread"+id; <<< Doesn't do what you think it does.
"Thread" is a char *, adding a number to it will give you the pointer + offset.
You don't want to SWITCH to your new stack until AFTER you have created it. Instead of using push to store, just use something like this:
stackPointer = MK_FP(newSS, newSP);
unsigned *sp = reinterpret_cast<unsigned *>(stackPointer);
*--sp = 0; // AX
*--sp = 0; // BX
*--sp = 0; // CX
*--sp = 0; // DX
*--sp = default_es; // ES: you'll have to dig this out with some inline assembler
*--sp = default_ds; // DS: likewise (ES goes first to match the push order in the question)
*--sp = 0; // SI
*--sp = 0; // DI
*--sp = 0; // BP
stackPointer = reinterpret_cast<void *>(sp);
[Of course, it would be easier to just make stackpointer a pointer to int in the first place].
Since the thread is starting from scratch, the values of AX, BX, etc. don't matter. ES/DS may matter depending on what memory model you are using. Not pushing onto the stack also means you don't have to disable interrupts for this part, which is always a bonus.
Unfortunately, your code doesn't show what you are doing with the "array of PCBs", so I can't say what's wrong there. And I'm sure someone will say this should be a comment, not an answer, since it doesn't actually answer your question, but formatting code in comments is nearly hopeless...
Edit:
I'm guessing that "producerName" is a local variable in the code that creates the threads, and that every PCB stores a pointer to that same buffer. This won't work, but I think it's a bit difficult to dictate that the caller must ensure that the name stays valid forever, so I think what you should do is:
if(namee)
{
size_t len = strlen(namee);
char *name_buf = new char[len+1];
strcpy(name_buf, namee);
name = name_buf;
}
else
{
// Make up some random name here.
}
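The difference between storing the caller's pointer and copying the text can be reproduced in isolation. This sketch uses hypothetical NamedByPointer and NamedByCopy structs in place of PCB, and std::string for brevity (if the target compiler lacks it, the new char[] plus strcpy approach above applies instead).

```cpp
#include <cassert>
#include <string>

struct NamedByPointer { const char* name; };  // keeps the caller's pointer
struct NamedByCopy    { std::string name; };  // keeps its own copy of the text

// Mimics the test loop: one buffer is patched and reused for every object.
void buildBoth(NamedByPointer* byPtr, NamedByCopy* byCopy, int count)
{
    static char producerName[] = "producer0";
    for (int i = 0; i < count; ++i) {
        producerName[8] = static_cast<char>('0' + i);
        byPtr[i].name = producerName;   // every element aliases the same buffer
        byCopy[i].name = producerName;  // snapshot of the text at this moment
    }
}
```

After building three of each, all NamedByPointer entries read "producer2" because they share one buffer, while the NamedByCopy entries keep "producer0", "producer1" and "producer2".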
The code was
name = namee
or
this->name = namee
I just made it
strcpy(name, namee)
and it works now.

Segmentation fault occurs only under release configuration

For some odd reason, my application likes to break on me when I switch to release and run it outside of my debugger. Here's what works for me, and here's what doesn't
(Qt Creator is the IDE)
Debugging with debug configuration - ok
Running with debug configuration - ok
Debugging with release configuration - ok
Running with release configuration - application crash
My UI is one project, and the core logic is a separate dependency. On Windows (compiling with MSVC), I hit a menu button, which eventually calls down to a function. In that function, the app breaks when adding a new element to a vector, e.g.:
str *x = new str();
str *y = new str();
/* ...set some of x & y's members... */
vector.push_back(x); // works fine
vector.push_back(y); // causes crash
If I comment out the line vector.push_back(y);, the app continues no problem until the app leaves the event scope (i.e. the end of OnMenuButtonClick). On OS X, it's similar to the issue of adding an element to a vector, except I have:
std::vector<foo *> SomeFunction()
{
std::vector<foo *> returningVector;
/* do stuff */
std::vector<foo *> goo = GetFooObjects();
for (int i = 0; i < goo.size(); i++)
{
returningVector.push_back(goo[i]); // breaks here
}
return returningVector;
}
So what are some causes of this strange behavior without a debugger attached and not under debug configuration? I've checked to make sure all of my variables are initialized, so I'm stumped. If you want to view the code above, the first part can be located here, and the second part here. Please forgive anything you see as "bad", and if you have suggestions that you just can't contain, then please do message me on GitHub.
Edit:
I looked more into it, and found out exactly what's causing the problem, but don't know how to fix it. This is the function where my app crashes (on OS X):
vector<Drive *> Drive::GetFATXDrives( bool HardDisks )
{
vector<Drive *> Return;
if (HardDisks)
{
vector<DISK_DRIVE_INFORMATION> Disks = GetPhysicalDisks();
for (int i = 0; i < (int)Disks.size(); i++)
{
DISK_DRIVE_INFORMATION ddi = Disks.at(i);
// First, try reading the disk way
Streams::xDeviceStream* DS = NULL;
try
{
char path[0x200] = {0};
wcstombs(path, ddi.Path, wcslen(ddi.Path));
DS = new Streams::xDeviceStream(ddi.Path);
}
catch (xException& e)
{
continue;
}
if (DS == NULL || DS->Length() == 0 || DS->Length() < HddOffsets::Data)
{
// Disk is not of valid length
continue;
}
DS->SetPosition(HddOffsets::Data);
// Read the FATX partition magic
int Magic = DS->ReadInt32();
// Close the stream
DS->Close();
// Compare the magic we read to the *actual* FATX magic
if (Magic == FatxMagic)
{
Drive *d = new Drive(Disks.at(i).Path, Disks.at(i).FriendlyName, false);
Return.push_back(d);
}
}
}
vector<Drive *> LogicalDisks = GetLogicalPartitions();
for (int i = 0; i < (int)LogicalDisks.size(); i++)
{
Return.push_back(LogicalDisks.at(i));
}
return Return;
}
If I change if (HardDisks) to if (HardDisks = false), the app works just fine. So I looked into that scope and discovered that after vector<DISK_DRIVE_INFORMATION> Disks = GetPhysicalDisks();, the heap gets corrupted or something like that. I noticed this because in the debugger, after that function is called, my HardDisks bool changes to "false", which is not the value it had before.
Here is GetPhysicalDisks:
vector<Drive::DISK_DRIVE_INFORMATION> Drive::GetPhysicalDisks( void )
{
// RIGHT AFTER this vector is initialized, everything goes to hell
vector<Drive::DISK_DRIVE_INFORMATION> ReturnVector;
DIR *dir;
dirent *ent;
dir = opendir("/dev/");
if (dir != NULL)
{
// Read the directory entries
while ((ent = readdir(dir)) != NULL)
{
// Check the directory name, and if it starts with "disk" then keep it!
QRegExp exp("disk*");
exp.setPatternSyntax(QRegExp::Wildcard);
exp.setCaseSensitivity(Qt::CaseInsensitive);
if (exp.exactMatch(ent->d_name))
{
DISK_DRIVE_INFORMATION curdir;
memset(curdir.FriendlyName, 0, sizeof(curdir.FriendlyName));
memset(curdir.Path, 0, sizeof(curdir.Path));
char diskPath[0x50] = {0};
sprintf(diskPath, "/dev/r%s", ent->d_name);
mbstowcs(curdir.Path, diskPath, strlen(diskPath));
int device;
if ((device = open(diskPath, O_RDONLY)) > 0)
{
#ifdef __linux
hd_driveid hd;
if (!ioctl(device, HDIO_GET_IDENTITY, &hd))
{
swprintf(curdir.FriendlyName, sizeof(curdir.FriendlyName) / sizeof(curdir.FriendlyName[0]), L"%hs", hd.model);
}
#elif defined __APPLE__
mbstowcs(curdir.FriendlyName, ent->d_name, strlen(ent->d_name));
#endif
ReturnVector.push_back(curdir);
}
}
}
}
return ReturnVector;
}
While this isn't a real answer as to what happened, I did find a way to fix the problem. Looking at my edit above, I edited my Drive::GetFATXDrives function like so:
vector<Drive *> Drive::GetFATXDrives( bool HardDisks )
{
// Initialize Disks vector up here
vector<DISK_DRIVE_INFORMATION> Disks;
// Call the function to get the hard disks
if (HardDisks)
Drive::GetPhysicalDisks(Disks);
vector<Drive *> ReturnVector;
if (HardDisks)
{
Streams::xDeviceStream* DS = NULL;
for (int i = 0; i < (int)Disks.size(); i++)
{
/* ... */
}
if (DS)
{
DS->Close();
delete DS;
}
}
vector<Drive *> LogicalDisks = GetLogicalPartitions();
for (int i = 0; i < LogicalDisks.size(); i++)
{
ReturnVector.push_back(LogicalDisks[i]);
}
return ReturnVector;
}
And my Drive::GetPhysicalDisks function now takes a vector<DISK_DRIVE_INFORMATION> reference instead of returning one. Seemed to make my program work just fine after that.

Function has corrupt return value

I have a situation in Visual C++ 2008 that I have not seen before. I have a class with 4 STL objects (list and vector to be precise) and integers.
It has a method:
inline int id() { return m_id; }
The return value from this method is corrupt, and I have no idea why.
debugger screenshot http://img687.imageshack.us/img687/6728/returnvalue.png
I'd like to believe it's a stack smash, but as far as I know, I have no buffer overruns or allocation issues.
Some more observations
Here's something that throws me off. The debugger shows the right values at the line marked // wrong ID.
m_header = new DnsHeader();
assert(_CrtCheckMemory());
if (m_header->init(bytes, size))
{
eprintf("0The header ID is %d\n", m_header->id()); // wrong ID!!!
inside m_header->init()
m_qdcount = ntohs(h->qdcount);
m_ancount = ntohs(h->ancount);
m_nscount = ntohs(h->nscount);
m_arcount = ntohs(h->arcount);
eprintf("The details are %d,%d,%d,%d\n", m_qdcount, m_ancount, m_nscount, m_arcount);
// copy the flags
// this doesn't work with a bitfield struct :(
// memcpy(&m_flags, bytes + 2, sizeof(m_flags));
//unpack_flags(bytes + 2); //TODO
m_init = true;
}
eprintf("Assigning an id of %d\n", m_id); // Correct ID.
return
m_header->id() is an inline function in the header file
inline int id() { return m_id; }
I don't really know how best to post the code snippets I have, but here's my best shot at it. Please let me know if they are insufficient:
DnsPacket holds a pointer to a DnsHeader object, m_header.
Main body:
DnsPacket *p ;
p = new DnsPacket(r);
assert (_CrtCheckMemory());
p->add_bytes(buf, r); // add bytes to a vector m_bytes inside DnsPacket
if (p->parse())
{
read_packet(sin, *p);
}
p->parse:
size_t size = m_bytes.size(); // m_bytes is a vector
unsigned char *bytes = new u_char[m_bytes.size()];
copy(m_bytes.begin(), m_bytes.end(), bytes);
m_header = new DnsHeader();
eprintf("m_header allocated at %p\n", (void*)m_header);
assert(_CrtCheckMemory());
if (m_header->init(bytes, size)) // just set the ID and a bunch of other ints here.
{
size_t pos = DnsHeader::SIZE; // const int
if (pos != size)
; // XXX perhaps generate a warning about extraneous data?
if (ok)
m_parsed = true;
}
else
{
m_parsed = false;
}
if (!ok) {
m_parsed = false;
}
return m_parsed;
}
read_packet:
DnsHeader& h = p.header();
eprintf("The header ID is %d\n", h.id()); // ID is wrong here
...
DnsHeader constructor:
m_id = -1;
m_qdcount = m_ancount = m_nscount = m_arcount = 0;
memset(&m_flags, 0, sizeof(m_flags)); // m_flags is a struct
m_flags.rd = 1;
p.header():
return *m_header;
m_header->init: (u_char* bytes, int size)
header_fmt *h = (header_fmt *)bytes;
m_id = ntohs(h->id);
eprintf("Assigning an id of %d/%d\n", ntohs(h->id), m_id); // ID is correct here
m_qdcount = ntohs(h->qdcount);
m_ancount = ntohs(h->ancount);
m_nscount = ntohs(h->nscount);
m_arcount = ntohs(h->arcount);
You seem to be calling the method through a pointer to an invalid object. The return value shown is the pattern that the Visual C++ debug runtime uses to fill uninitialized heap memory:
2^32 - 842150451 = 3452816845 = 0xCDCDCDCD
You have probably not initialized the object that this function is called on.
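The arithmetic can be checked directly. This sketch reinterprets the 0xCDCDCDCD fill pattern as a signed 32-bit integer, which is how an int member would be displayed if its memory had been allocated but never written. asSigned and looksLikeUninitializedHeap are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets the bits of an unsigned 32-bit value as a signed one.
std::int32_t asSigned(std::uint32_t bits)
{
    std::int32_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Hypothetical debugging aid: does this int hold the MSVC debug-heap fill pattern?
bool looksLikeUninitializedHeap(std::int32_t value)
{
    return value == asSigned(0xCDCDCDCDu);
}
```

Seeing this exact value in an int member is a strong hint that the object was allocated in a debug build but the member was never assigned, or that the read went through a pointer to a different, untouched allocation.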
Without seeing more of the code in context, it might be that m_id is not in the scope you expect it to be in.
Reinstalled VC++. That fixed everything.
Thank you for your time and support everybody! :) Appreciate it!