gcc objdump - what is section .ARM.exidx.text - c++

I need to find out what a particular byte in my .obj file, at offset 0x584a7, maps to, i.e. what is responsible for generating it (code, debug info, etc.).
I've successfully run objdump -xSsDg on my .obj file.
Looking at the output, I've identified the areas that refer to that byte:
First section: note that the file offset is 0x584a0 and the size is 8, so this section contains my byte of interest (0x584a7).
1012 .ARM.exidx.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE 00000008 00000000 00000000 000584a0 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
...
Second section: using a hex editor, I've confirmed that this data (00000000 00000080) matches my actual .obj file. The leading 0000 on that line is the offset into this section, which itself starts at file offset 0x584a0.
Contents of section .ARM.exidx.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE:
0000 00000000 00000080 ........
...
Third section: this looks like code, but I'm not sure what is what.
Disassembly of section .ARM.exidx.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE:
00000000 <.ARM.exidx.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE>:
0: 00000000 andeq r0, r0, r0
0: R_ARM_PREL31 .text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE
0: R_ARM_NONE __aeabi_unwind_cpp_pr1
4: 80000000 andhi r0, r0, r0
4: R_ARM_PREL31 .ARM.extab.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE
So I have my .cpp and my .obj, and I can see that the mangled name has something to do with QHash, QString, QHashData...
QUESTION
How do I conclusively map this section to a specific thing, be it code, debug info or whatever else, so that I know what affects this particular byte?

In an ARM object file, sections named .ARM.exidx.text.* contain the exception index table. Each entry in this table contains the association between a function in the code being compiled and its exception-handling code; in other words, each entry specifies what should happen when an exception is thrown during execution of the function (or in other cases where the function call stack frame needs to be unwound outside the normal code execution path). These sections don't contain ARM or Thumb instructions, so their contents cannot be disassembled, and the andeq and andhi instructions you are seeing in your disassembly are misleading (the -D option you are passing to objdump makes it disassemble the contents of all object sections, even if they don't contain instructions).
In your case, you are looking at the frame unwinding instructions associated with the deleteNode2() method of the QHash<QString, int> class from the Qt library (the mangled name _ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE demangles to QHash<QString, int>::deleteNode2(QHashData::Node*)). The format of these instructions is defined in the exception handling ABI for the ARM architecture (available for download at http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038b/IHI0038B_ehabi.pdf), specifically in section 9.3 of that document.
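To make the entry layout concrete, here is a minimal C++ sketch based on the EHABI rules for a two-word index entry: the first word is a prel31 (signed 31-bit, self-relative) offset to the function, and the second word is either EXIDX_CANTUNWIND (0x1), an inline compact entry (bit 31 set), or a prel31 offset into .ARM.extab. It assumes a little-endian target, which matches the byte order in the objdump output above; all helper names are hypothetical. Keep in mind that in an unlinked .o both words only hold relocation addends (that is what the R_ARM_PREL31 lines mean), so classifying entries is only meaningful on a linked image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a 32-bit little-endian word from raw section bytes.
std::uint32_t load_le32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Decode a prel31 field: bits 0..30 hold a signed 31-bit offset relative to
// the address of the word itself; bit 31 is not part of the offset.
std::int32_t prel31_offset(std::uint32_t word) {
    std::int32_t low30 = static_cast<std::int32_t>(word & 0x3fffffffu);
    return (word & 0x40000000u) ? low30 - 0x40000000 : low30;  // bit 30 = sign
}

enum class ExidxKind { CantUnwind, InlineCompact, ExtabOffset };

// Classify the second word of an .ARM.exidx entry (EHABI rules).
ExidxKind classify_second_word(std::uint32_t word) {
    if (word == 0x1u) return ExidxKind::CantUnwind;           // EXIDX_CANTUNWIND
    if (word & 0x80000000u) return ExidxKind::InlineCompact;  // unwind data inline
    return ExidxKind::ExtabOffset;                            // prel31 into .ARM.extab
}

// Position of a file byte inside a section made of 32-bit words.
struct BytePos { std::size_t section_off, word_index, byte_in_word; };

BytePos locate(std::size_t file_off, std::size_t section_file_off) {
    assert(file_off >= section_file_off);
    std::size_t rel = file_off - section_file_off;
    return {rel, rel / 4, rel % 4};
}
```

For the question's data, locate(0x584a7, 0x584a0) gives section offset 7, i.e. word 1, byte 3. On a little-endian target that is the most significant byte of the second word, which is the word carrying the R_ARM_PREL31 relocation against .ARM.extab.text._ZN5QHashI7QStringiE11deleteNode2EPN9QHashData4NodeE, so the byte belongs to the exception index entry for QHash<QString, int>::deleteNode2.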

Related

Differences in environment layout with and without GDB

Recently I have been working on CTF challenges that require the attacker to stage shellcode in the environment. With ASLR disabled, one can rely on there being only slight differences between the environment of, for example, the shell and that of the exploitable process (e.g. differences due only to the binary name). However, GDB (and R2) make slight changes to the environment, so the environment variables shift around slightly when the process is not being debugged, which makes this very hard to do.
For example, GDB seems to at least add the environment variables LINES and COLUMNS. However, these can be removed by invoking GDB as follows:
gdb -ex 'set exec-wrapper env -u LINES -u COLUMNS' -ex 'r < exploit.input' challenge.bin
Note that GDB implicitly uses the fully qualified path when debugging a binary, so the user can further reduce any differences by invoking the binary the same way:
`pwd`/challenge.bin < exploit.input
However, there still appear to be some differences. Many times I have been able to get an exploit working in GDB, only to have it crash when run without the debugger. I've read mention of a script (sometimes referred to as setenv.sh) that can (allegedly) be used to set up an environment exactly like GDB's, but I have not been able to find it.
Looking at the env of the shell:
LANG=en_US.UTF-8
PROFILEHOME=
DISPLAY=:0
OLDPWD=/home/user
SHELL_SESSION_ID=e7a0e681012e480fb044a034a775bb83
INVOCATION_ID=8ef3be94d09f4e47a0322ddf0d6ed787
COLORTERM=truecolor
MOZ_PLUGIN_PATH=/usr/lib/mozilla/plugins
XDG_VTNR=1
XDG_SESSION_ID=c1
USER=user
PWD=/test
HOME=/home/user
JOURNAL_STREAM=9:15350
KONSOLE_DBUS_SESSION=/Sessions/1
KONSOLE_DBUS_WINDOW=/Windows/1
GTK_MODULES=canberra-gtk-module
MAIL=/var/spool/mail/user
WINDOWPATH=1
TERM=xterm-256color
SHELL=/bin/bash
KONSOLE_DBUS_SERVICE=:1.7
KONSOLE_PROFILE_NAME=Profile 1
SHELLCODE=<non-printable shellcode bytes>
XDG_SEAT=seat0
SHLVL=4
COLORFGBG=15;0
LANGUAGE=
WINDOWID=16777222
LOGNAME=user
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
XDG_RUNTIME_DIR=/run/user/1000
XAUTHORITY=/home/user/.Xauthority
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
_=/usr/bin/env
And comparing it to that of GDB (LINES and COLUMNS removed):
/test/challenge.bin
_=/usr/bin/gdb
LANG=en_US.UTF-8
DISPLAY=:0
PROFILEHOME=
OLDPWD=/home/user
SHELL_SESSION_ID=e7a0e681012e480fb044a034a775bb83
INVOCATION_ID=8ef3be94d09f4e47a0322ddf0d6ed787
COLORTERM=truecolor
MOZ_PLUGIN_PATH=/usr/lib/mozilla/plugins
XDG_VTNR=1
XDG_SESSION_ID=c1
USER=user
PWD=/test
HOME=/home/user
JOURNAL_STREAM=9:15350
KONSOLE_DBUS_SESSION=/Sessions/1
KONSOLE_DBUS_WINDOW=/Windows/1
GTK_MODULES=canberra-gtk-module
MAIL=/var/spool/mail/user
WINDOWPATH=1
SHELL=/bin/bash
TERM=xterm-256color
KONSOLE_DBUS_SERVICE=:1.7
KONSOLE_PROFILE_NAME=Profile 1
SHELLCODE=<non-printable shellcode bytes>
COLORFGBG=15;0
SHLVL=4
XDG_SEAT=seat0
LANGUAGE=
WINDOWID=16777222
LOGNAME=user
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
XDG_RUNTIME_DIR=/run/user/1000
XAUTHORITY=/home/user/.Xauthority
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
/test/challenge.bin
On inspection, the environments are not very different. Notably, the GDB environment seems to contain a second instance of the debugged binary's name (challenge.bin, in this case), and GDB sets _ to the GDB binary rather than to the debugged binary. Even taking these small changes into account, the offsets seem to be way off.
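One way to reason about how far a variable moves: assuming the usual Linux layout, the kernel copies the strings to the top of the new stack so that, in increasing address order, they are the argv strings, then the envp strings in order, then the executable filename passed to execve (the /test/challenge.bin at the end of the GDB dump), with a pointer-sized terminator at the very top. Under that assumption a variable's address depends only on the strings that come after it. A minimal sketch; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Distance from the start of env[index]'s "NAME=value" string to the stack top,
// assuming the layout: ... envp strings, execfn string, pointer-sized terminator.
std::size_t bytes_below_top(const std::vector<std::string>& env,
                            std::size_t index,
                            const std::string& execfn,
                            std::size_t ptr_size) {
    assert(index < env.size());
    std::size_t total = ptr_size + execfn.size() + 1;  // terminator + execfn + NUL
    for (std::size_t j = index; j < env.size(); ++j)
        total += env[j].size() + 1;                     // each string plus its NUL
    return total;
}

// Predicted address of env[index], given the stack top (fixed when ASLR is off).
std::uint64_t predicted_address(std::uint64_t stack_top,
                                const std::vector<std::string>& env,
                                std::size_t index,
                                const std::string& execfn,
                                std::size_t ptr_size = 8) {
    return stack_top - bytes_below_top(env, index, execfn, ptr_size);
}
```

Under this model, arguments never affect environment addresses, but every variable listed after SHELLCODE does. In the two dumps above the variables after SHELLCODE are the same set except _, which follows SHELLCODE in the shell's list but precedes it under GDB, so that alone shifts SHELLCODE by the length of the _=... string plus one; a different execfn (./challenge.bin versus /test/challenge.bin) shifts it further.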
TL;DR
How can the GDB environment differences be nulled out for the case when it is necessary to know a priori the locations of things in the environment with and without the debugger running?
In the interest of laziness, has anyone fully characterized the environment with and without GDB, or the changes GDB makes?
And for those interested, R2 appears to make changes to PATH. There may also be other differences.
How can the GDB environment differences be nulled out
One way is to run the binary outside of GDB (have the binary wait for GDB to attach, as described here), and attach GDB to it from "outside".
Update:
the binary in question is part of a challenge and source is not provided
You can patch _start with a jmp _start (so the binary will never progress past the first instruction). Once attached, replace the jmp with the original instruction, and start debugging.
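As a sketch of that trick on x86/x86-64, where jmp to itself is the two-byte short jump EB FE: save the original bytes at the file offset of _start, write EB FE, and after attaching write the saved bytes back (for example with GDB's set command). The function names are hypothetical, the image is assumed to be already read into memory, and finding the file offset of _start is a separate step.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// x86 "jmp rel8" with displacement -2: an infinite loop on itself.
constexpr std::array<std::uint8_t, 2> kJmpSelf = {0xEB, 0xFE};

// Overwrite two bytes at `offset` with jmp-to-self and return the original
// bytes so they can be restored once the debugger has attached.
std::array<std::uint8_t, 2> patch_jmp_self(std::vector<std::uint8_t>& image,
                                           std::size_t offset) {
    if (offset + kJmpSelf.size() > image.size())
        throw std::out_of_range("patch offset past end of image");
    std::array<std::uint8_t, 2> saved = {image[offset], image[offset + 1]};
    image[offset] = kJmpSelf[0];
    image[offset + 1] = kJmpSelf[1];
    return saved;
}

// Put the original bytes back (the same edit you would make in the debugger).
void restore(std::vector<std::uint8_t>& image, std::size_t offset,
             const std::array<std::uint8_t, 2>& saved) {
    assert(offset + saved.size() <= image.size());
    image[offset] = saved[0];
    image[offset + 1] = saved[1];
}
```

Two bytes is a convenient size here: in the _start disassembly shown further down, the first instruction (xor %ebp,%ebp, bytes 31 ED) is exactly two bytes long, so the patch replaces one whole instruction.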
Update 2:
Are you familiar with a better process?
In order to find the offset of a given function in the ELF file, you need two values: the offset of the function within its section, and the offset of that section within the file.
For example:
$ readelf -Ws a.out | grep ' _start'
58: 00000000004003b0 43 FUNC GLOBAL DEFAULT 11 _start
This tells you that _start is linked at 0x4003b0 in section 11.
What is that section, what is its starting address, and where in the file does it start?
$ readelf -WS a.out | grep '\[11\]'
[11] .text PROGBITS 00000000004003b0 0003b0 000151 00 AX 0 0 16
We now see that _start is at the very start of .text (this is usually the case), and that .text starts at offset 0x3b0 in the file. QED.
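The same computation as a small function: the file offset of a symbol is the section's file offset plus the symbol's distance from the section's start address. This sketch assumes the section occupies space in the file (PROGBITS, not NOBITS like .bss); the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct SectionInfo {
    std::uint64_t addr;    // "Address" column of readelf -WS
    std::uint64_t offset;  // "Off" column: where the section starts in the file
    std::uint64_t size;    // "Size" column
};

// File offset of a symbol at virtual address `sym_addr`, or nullopt if the
// address does not fall inside the section.
std::optional<std::uint64_t> file_offset(std::uint64_t sym_addr,
                                         const SectionInfo& sec) {
    if (sym_addr < sec.addr || sym_addr >= sec.addr + sec.size)
        return std::nullopt;
    return sec.offset + (sym_addr - sec.addr);
}
```

With the values from the readelf output above, file_offset(0x4003b0, {0x4003b0, 0x3b0, 0x151}) gives 0x3b0.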
An even better process is to use GDB to perform the patching (GDB does all the offset finding for you). Suppose I want to overwrite the first instruction of _start with the 0xCC (int3) instruction:
$ gdb --write -q ./a.out
Reading symbols from ./a.out...done.
Let's look at the original instructions first:
(gdb) x/4i _start
0x4003b0 <_start>: xor %ebp,%ebp
0x4003b2 <_start+2>: mov %rdx,%r9
0x4003b5 <_start+5>: pop %rsi
0x4003b6 <_start+6>: mov %rsp,%rdx
Now let's patch the first one:
(gdb) set *(char*)0x4003b0 = 0xCC
(gdb) x/4i _start
0x4003b0 <_start>: int3
0x4003b1 <_start+1>: in (%dx),%eax
0x4003b2 <_start+2>: mov %rdx,%r9
0x4003b5 <_start+5>: pop %rsi
(gdb) quit
Segmentation fault (core dumped) <<-- this is a GDB bug. I should fix it some day.
$ objdump -d a.out
...
Disassembly of section .text:
00000000004003b0 <_start>:
4003b0: cc int3 <<-- success!
4003b1: ed in (%dx),%eax
4003b2: 49 89 d1 mov %rdx,%r9
...
Voila!

Debugging: Is there a Way to Discern Exported DLL Functions in the C++ Call Stack

Is there a way to predict/calculate where in a module's memory a function will be located?
*Sigh* The first thing you'll tell me is: you have to have symbols for stuff to work well. I know. (Oh! Do I know!) It isn't an option for me, so I'm looking to make the most of what I have. I'm not trying to get the values of local variables or anything (which would require intimate knowledge of compiler optimizations, etc.). I just would like to know what DLL function we were in.
I have a memory dump from a crashed process. The process loads numerous DLLs, each of which has some exported functions. One thread in the process has some items on the call stack "within" this DLL. The Visual Studio debugger reports them as (the names are changed to protect the innocent):
msvcr120.dll!abort() Line 90
msvcr120.dll!_XcptFilter(unsigned long xcptnum, _EXCEPTION_POINTERS * pxcptinfoptrs) Line 366
MyProcess.exe!016c9a9a()
[Frames below may be incorrect and/or missing, no symbols loaded for Mach4GUI.exe]
[External Code]
MyProcess.exe!016ca390()
[External Code]
ExtLib01.dll!6eb1f75c()
ExtLib01.dll!6eb95991()
ExtLib01.dll!6e658979()
ExtLib01.dll!6e653bab()
ExtLib01.dll!6e66d5dd()
[External Code]
The DLL wasn't built for debugging, and I don't have a .PDB or other source of symbols.
Visual Studio reports this about the DLL module:
Name Path Address
---- ---- -------
ExtLib01.dll C:\MyProj\ExtLib01.dll 6E5E0000-6FE8800
The DLL, of course, has some exported functions (reported by dumpbin /exports ExtLib01.dll), like:
ordinal hint RVA name
1 0 0009CE30 LibFunc1#16
2 1 0009CDF0 LibFunc2#4
3 2 0009CE60 LibFunc3#4
...
482 1E1 0010FB40 LibFunc482
483 1E2 0010FBD0 LibFunc483
484 1E3 0010FAC0 LibFunc484
Not knowing any better, it seems like there might be enough information here to figure out what functions are in the call stack (if the return address is "within" one of the DLL functions).
Some simple arithmetic on the first item in the call stack, 0x6eb1f75c-0x6E5E0000, yields an offset into the ExtLib01.dll "module" in memory: +0x0053f75c, but this doesn't correspond at all to the relative virtual addresses listed in the DLL exports.
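The arithmetic can be taken one step further by looking up the export with the largest RVA at or below the computed offset. This is only a heuristic: exports are usually a small subset of a DLL's functions, so a return address typically falls in a non-exported function that merely lies after the nearest export, and the distance to that export is the main hint about how much the guess is worth. The names are hypothetical; the RVAs are the ones listed by dumpbin above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

struct Guess {
    std::string name;        // nearest export at or below the address
    std::uint32_t distance;  // bytes from that export's RVA to the address
};

// exports: RVA -> name (e.g. from dumpbin /exports); module_base: load address.
std::optional<Guess> nearest_export(const std::map<std::uint32_t, std::string>& exports,
                                    std::uint32_t module_base,
                                    std::uint32_t address) {
    if (address < module_base) return std::nullopt;
    std::uint32_t rva = address - module_base;
    auto it = exports.upper_bound(rva);              // first export strictly above rva
    if (it == exports.begin()) return std::nullopt;  // no export at or below rva
    --it;
    return Guess{it->second, rva - it->first};
}
```

For the first frame, 0x6eb1f75c, the nearest export among those listed is LibFunc483 at RVA 0x10FBD0, about 0x42FB8C bytes (over 4 MB) earlier, which suggests that frame is not inside any of the listed exported functions.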
So, Visual Studio tells me nothing but the module name and an address (within that module) in the call stack. One might assume that this is because more information isn't available. My experience with Microsoft products suggests that they just might not have implemented such a feature—though in the case of their flagship development tool, surely they would try hard to make it useful.
More confusingly, WinDbg tells me different things about the call stack for the same thread.
# ChildEBP RetAddr
00 0022cdb8 7740171a ntdll!NtWaitForMultipleObjects+0x15
01 0022ce54 76c519fc KERNELBASE!WaitForMultipleObjectsEx+0x100
02 0022ce9c 76c541d8 kernel32!WaitForMultipleObjectsExImplementation+0xe0
03 0022ceb8 76c780bc kernel32!WaitForMultipleObjects+0x18
04 0022cf24 76c77f7b kernel32!WerpReportFaultInternal+0x186
05 0022cf38 76c77870 kernel32!WerpReportFault+0x70
06 0022cf48 76c777ef kernel32!BasepReportFault+0x20
07 0022cfd4 74fb4820 kernel32!UnhandledExceptionFilter+0x1af
08 0022cfe0 74fb4611 msvcr120!__crtUnhandledException+0x14 [f:\dd\vctools\crt\crtw32\misc\winapisupp.c # 259]
09 0022d318 74fb7676 msvcr120!_call_reportfault+0xfe [f:\dd\vctools\crt\crtw32\misc\invarg.c # 201]
0a 0022d328 74fb4e38 msvcr120!abort+0x38 [f:\dd\vctools\crt\crtw32\misc\abort.c # 90]
0b 0022d340 016c9a9a msvcr120!_XcptFilter+0x14b [f:\dd\vctools\crt\crtw32\misc\winxfltr.c # 366]
WARNING: Stack unwind information not available. Following frames may be wrong.
0c 0022f8c8 76c5336a MyProcess!SomeSymbol+0xa0d71a
0d 0022f8d4 77b39902 kernel32!BaseThreadInitThunk+0xe
0e 0022f914 77b398d5 ntdll!__RtlUserThreadStart+0x70
0f 0022f92c 00000000 ntdll!_RtlUserThreadStart+0x1b
I can perhaps understand some of the items at the top of the list (everything above UnhandledExceptionFilter, where Windows Error Reporting is doing its thing, capturing the memory dump, etc.), but why the discrepancy between the two call stacks? I see that both report "Following frames may be wrong." Is this information completely unreliable? What can be known or discerned from the information available?
What can be discerned about the items in the call stack from the information available?
Is it possible to know where a function from a loaded DLL is located in memory of a running process?
I'm willing to use WinDbg if it'll help. I'm willing to use PowerShell within Visual Studio's NuGet console. I'm not unwilling to do some work, but I have no real idea whether it's even possible to know something about these DLL functions.
Note: I realize the answer may very well be, "Nope. It's impossible." I'm looking for a good answer with an explanation (or references to some sort of authoritative source with an explanation) of how the DLLs are loaded and their functions called, and why a memory dump doesn't contain enough information to figure out where any given function "starts" in memory.

Activation record for main()

I am new to reversing. My apologies if the question sounds too beginner-ish :) I have created simple code in Visual Studio C++ 2010 on XP SP3:
int main()
{
return 0;
}
Whenever I open it in Olly it shows the following state of the stack with execution paused:
0012FFC4 7C817077 RETURN to kernel32.7C817077
0012FFC8 7C910228 ntdll.7C910228
0012FFCC FFFFFFFF
0012FFD0 7FFD5000
0012FFD4 80544CFD
0012FFD8 0012FFC8
0012FFDC 82537DA8
0012FFE0 FFFFFFFF End of SEH chain
0012FFE4 7C839AD8 SE handler
0012FFE8 7C817080 kernel32.7C817080
0012FFEC 00000000
0012FFF0 00000000
0012FFF4 00000000
0012FFF8 004012A0 Reversin.<ModuleEntryPoint>
0012FFFC 00000000
I can see the end of the SEH chain and the SE handler, but the rest of it just doesn't make sense to me. I have found the following stack layout for functions with an exception handler installed:
Function_Local_Variables
Exception_Registration_Record
Exception_Handler
Callers_EBP
Return_Address_in_Caller
Function_Arguments
It does not seem to apply in my case. I need help understanding what has been stored on the stack, please.
Thank you.
If you're trying to learn the stack layout by looking at hex in Olly, you should consider which calling convention your code is following. By default, most C++ code follows the __cdecl convention. Check out this link: http://en.wikipedia.org/wiki/X86_calling_conventions#cdecl
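On 32-bit Windows, each SEH registration record on the stack is two DWORDs, a pointer to the next record followed by a pointer to the handler; the chain head is at fs:[0], and the last record's next field is 0xFFFFFFFF. That is what Olly labels "End of SEH chain" at 0012FFE0, with the "SE handler" at 0012FFE4. A minimal sketch that walks such a chain over a captured stack dump; the dump representation and function name are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using StackDump = std::map<std::uint32_t, std::uint32_t>;  // address -> DWORD

struct SehRecord {
    std::uint32_t address;  // where the record lives on the stack
    std::uint32_t next;     // next record, 0xFFFFFFFF at the end of the chain
    std::uint32_t handler;  // SE handler address
};

// Follow the chain from `head` (normally the value stored at fs:[0]).
// Stops at the 0xFFFFFFFF terminator, at a missing DWORD, or after max_records.
std::vector<SehRecord> walk_seh_chain(const StackDump& dump, std::uint32_t head,
                                      std::size_t max_records = 64) {
    std::vector<SehRecord> chain;
    std::uint32_t cur = head;
    while (cur != 0xFFFFFFFFu && chain.size() < max_records) {
        auto next_it = dump.find(cur);
        auto handler_it = dump.find(cur + 4);
        if (next_it == dump.end() || handler_it == dump.end()) break;
        chain.push_back({cur, next_it->second, handler_it->second});
        cur = next_it->second;
    }
    return chain;
}
```

Fed the question's stack with head 0012FFE0, it yields a single record with handler 7C839AD8, an address in kernel32's range. Since Olly normally pauses at the module entry point, this record was presumably installed by kernel32 before the entry point was called, not by main().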

What is a good way to create a string for crash reporting Win32 C++ that reflects the cause of the crash?

We're using Fogbugz for tracking issues and I am in the middle of writing a C++ wrapper around the XML API for Fogbugz.
The best practice seems to be to use the "scout" field so that similar/same crashes are just counted but not reported again. To do that we need a unique string for a particular cause of a crash.
In Win32 - after getting a dmp file or other crash handler what is a good way to make a unique string for a crash? (we're going to create a dmp file and send it to the fogbugz server)
In previous postings/articles/etc. Joel has made various suggestions, but many of those relied on a language like C#, which uses reflection and has a lot of information that is either harder to get or impossible to get in C++.
Have any other people gotten things like stack traces or other things to make scout entries in fogbugz?
EDIT
To clarify - we don't want a unique ID for every incident - there are likely crashes that have the same code path, and we want to capture that. I was thinking that we would take the last few stack calls that are in our code (not ones from Win32 DLLs), but I'm not sure how to go about doing this.
Reporting every crash as unique is not right. Reporting all crashes under the same case is not right. Different users repeating a scenario that causes a crash should map to the same incident.
EDIT
What I think we want is a general "signature" of a crash - based on what is on the stack. Similar stacks should have the same signature. For example - take the top 5 methods that are in our app and then the first call (if any) we make into an MS DLL. This would probably be sufficient for a signature and would likely correlate the crashes that are "the same".
So how does one get the list of methods on the stack? And how can you tell if they are from your own app or in another DLL?
EDIT - NOTE
We want to create a "bucket id"/signature while in the exception handler so that we can create the minidump and send it to fogbugz with the signature as the scout description. Alternatively, we can load the dump on the next start of the app and send it then, with a signature we generate.
Here in my project I use the memory address of the crash as a "unique" ID.
IMO the best thing you can use is the bucket id from dump analysis. With properly configured Debugging Tools for Windows (windbg), you can run !analyze -v and classify your dumps into different buckets based on bucket id. The bucket id guarantees that if two dumps are the same, their bucket ids will be the same. That solves part of the puzzle.
Often, two dumps rooted in the same problem will produce different bucket ids (perhaps because of a version difference, say your 1.0 and 1.1 both crash at the same point). You can use the faulting module and stack signature to correlate bugs from the same point of fault.
There will be certain things that cause very random dumps (e.g. heap corruption, where the faulting module is typically the victim). Therefore dump analysis should be considered best-effort. When you can't, you can't.
I used something like this to generate exceptions in my last app (MSVC), so every error would get logged with the source file and line it occurred on:
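#include <string>
class Error {
    //...
public:
    // __FILE__ expands to a string literal, __LINE__ to an int
    Error(std::string file, int line, std::string error);
};
// Caution: <wingdi.h> (pulled in by <windows.h>) also defines ERROR
#define ERROR(err) Error(__FILE__, __LINE__, err)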
It's probably a little bit late, but I will add my solution here, too, in case it can help other people.
You can do this using tools from "Debugging Tools for Windows", for example windbg.exe or, better, kd.exe.
Running the command kd.exe -z "path_to_dump.dmp" -c "kb;q" >> dumpstack.txt, you might get the following result:
Microsoft (R) Windows Debugger Version 10.0.15063.400 X86
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [d:\work\bugs\14122\myexe.exe.2624.dmp]
User Mini Dump File with Full Memory: Only application data is available
************* Symbol Path validation summary **************
Response Time (ms) Location
Deferred srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows 10 Version 15063 MP (4 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
15063.0.x86fre.rs2_release.170317-1834
Machine Name:
Debug session time: Fri Oct 13 00:09:01.000 2017 (UTC + 1:00)
System Uptime: 0 days 0:18:33.797
Process Uptime: 0 days 0:03:40.000
................................................................
.....................................................
Loading unloaded module list
..............................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(a40.2580): Security check failure or stack buffer overrun - code c0000409 (first/second chance not available)
eax=00000001 ebx=00000000 ecx=00000007 edx=77cc4350 esi=00000000 edi=00000000
eip=62ae7666 esp=0b75e17c ebp=0b75e1a8 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
msvcr120!abort+0x28:
62ae7666 cd29 int 29h
0:068> kd: Reading initial command 'kb;q'
ChildEBP RetAddr Args to Child
0b75e178 62addc5f 935dda1f 00000000 00000000 msvcr120!abort+0x28
0b75e1a8 0b75e7d4 62a9b436 0b75e1dc 62a52aa5 msvcr120!terminate+0x33
WARNING: Frame IP not in any known module. Following frames may be wrong.
0b75e1ac 62a9b436 0b75e1dc 62a52aa5 00000000 0xb75e7d4
0b75e1b4 62a52aa5 00000000 62a59740 0b75e7d4 msvcr120!__FrameUnwindToState+0x89
0b75e1c8 62a52b33 00000000 00000000 00000000 msvcr120!_EH4_CallFilterFunc+0x12
0b75e1f4 62a5a0f3 62b1f7b8 62a4f7c6 0b75e324 msvcr120!_except_handler4_common+0x8e
0b75e214 77cd6152 0b75e324 0b75e7c4 0b75e344 msvcr120!_except_handler4+0x1e
0b75e238 77cd6124 0b75e324 0b75e7c4 0b75e344 ntdll!ExecuteHandler2+0x26
0b75e30c 77cc4266 0b75e324 0b75e344 0b75e324 ntdll!ExecuteHandler+0x24
0b75e30c 74cf28f2 0b75e324 0b75e344 0b75e324 ntdll!KiUserExceptionDispatcher+0x26
0b75e684 62a59339 e06d7363 00000001 00000003 KERNELBASE!RaiseException+0x62
0b75e6c4 6001821c 0b75e6e4 6004e1bc 946a8f2a msvcr120!_CxxThrowException+0x5b
0b75e6f8 60018042 0b75e720 946a8efa ffffffff mymodule!FunctionC+0x7c
0b75e730 60016544 946a8ece ffffffff 092889d8 mymodule!FunctionB+0x32
0b75e754 600166b8 00842338 6000588d 00000001 myothermodule!FunctionB+0x44
From this stack, you can create a unique bucket if you take, for example, only your methods from the stack and concatenate them into a string: "mymodule!FunctionC+0x7c;mymodule!FunctionB+0x32;myothermodule!FunctionB+0x44". For this to work, you need access to your own symbol server, configured either through the _NT_SYMBOL_PATH environment variable or with the -y command-line switch.
You can alternatively create a string from the return addresses only (second column): "62addc5f,0b75e7d4,62a9b436,62a52aa5,62a52b33,62a5a0f3,77cd6152,77cd6124,77cc4266,74cf28f2,62a59339,6001821c,60018042,60016544,600166b8"
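The concatenation of your own frames can be sketched as below, assuming the frames have already been extracted from the kb output as module!function+offset strings and that you know which module names are yours. The function name and the default limit of five frames are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Join, in stack order, the first `max_frames` frames whose module is one of
// `own_modules`, separated by ';'. Frames without a module (raw addresses) are skipped.
std::string crash_signature(const std::vector<std::string>& frames,
                            const std::set<std::string>& own_modules,
                            std::size_t max_frames = 5) {
    std::string sig;
    std::size_t taken = 0;
    for (const std::string& f : frames) {
        if (taken == max_frames) break;
        std::size_t bang = f.find('!');
        if (bang == std::string::npos) continue;  // e.g. "0xb75e7d4"
        if (own_modules.count(f.substr(0, bang)) == 0) continue;
        if (!sig.empty()) sig += ';';
        sig += f;
        ++taken;
    }
    return sig;
}
```

The +0x... offsets change whenever the code is rebuilt, so if signatures should survive version changes you may prefer to strip them before joining.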
Just use an MD5 string generated from the dump file and you are likely to get a unique string for every crash.
I would start with collecting the data on how often every function in your code has been "flashed" in a crash report stack trace. Every report would have to be added to some kind of database, and every function would have to be indexed so that you could later query, which functions seem to crash more often than others. (And of course, functions like main() will be in every report, but that's understandable).
Or, if you think that only your own code is the problem, you could just remove all the other entries from the crash stack traces and then hash the rest (your functions). That way you could see whether any particular call chain of your own functions causes a crash repeatedly, no matter what external functions have been called in between.
Then of course, some of the more complicated problems will not be captured this way anyway, as the stack trace will be completely different. To help that, you could record other data from your application along with the stack trace in every report, like sizes of buffers, counters, states of different parts of the application and so on... And then do some statistics on that.
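The "hash the rest" step could look like the sketch below. It uses 64-bit FNV-1a, chosen here only because its result is fully specified and therefore stable across compilers and runs, unlike std::hash; any stable hash would do. Frames are assumed to be module!function strings, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// 64-bit FNV-1a over the bytes of a string.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;                  // FNV prime
    }
    return h;
}

// Drop frames from foreign modules, keep the rest in order, hash the chain.
std::uint64_t own_chain_hash(const std::vector<std::string>& frames,
                             const std::set<std::string>& own_modules) {
    std::string chain;
    for (const std::string& f : frames) {
        std::string module = f.substr(0, f.find('!'));
        if (own_modules.count(module) == 0) continue;
        chain += f;
        chain += '\n';  // separator so "ab"+"c" and "a"+"bc" hash differently
    }
    return fnv1a64(chain);
}
```

Two reports whose own-code call chains match get the same hash even if different external functions appear between those frames.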

ELF File Format

I'm attempting to manually load the hexdump of an elf file that I compiled using g++ into a processor simulation I designed. There are 30 sections to a standard elf file and I am loading all 30 segments with their proper memory location offset taken into account. I then start my program counter at the beginning of the .text section (00400130) but it seems that the program isn't running correctly. I have verified my processor design relatively thoroughly using SPIM as a gold standard. The strange thing is that, if I load an assembly file into SPIM, and then take the disassembled .text and .data sections that are generated by the software, load them into my processor's memory, the programs work. This is different from what I want to do because I want to:
write a c++ program
compile it using mipseb-linux-g++ (cross compiler)
hex dump all sections into their own file
read files and load contents into processor "memory"
run program
Where in the ELF file should I place my program counter initially? I have it at the beginning of .text right now. Also, do I only need to include .text and .data for my program to work correctly? What am I doing wrong here?
The ELF header should include the entry address, which is not necessarily the same as the first address in the .text region. Use objdump -f to see what the entry point of the file is -- it'll be called the "start address".
The format is described here - you should be using the program headers rather than the section headers for loading the ELF image into memory (I doubt that there are 30 program headers), and the entry point will be described by the e_entry field in the ELF header.
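As a sketch of loading through the program headers: this assumes a 32-bit ELF image already read into memory and a simulator memory modelled as a map from address to byte. mipseb-linux-g++ produces big-endian output, so multi-byte fields are read according to the EI_DATA byte. For each PT_LOAD segment it copies p_filesz bytes from p_offset to p_vaddr and zero-fills up to p_memsz, which also takes care of .bss; the return value is e_entry. Field offsets follow the Elf32_Ehdr and Elf32_Phdr layouts; the function and type names are hypothetical and error handling is minimal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

using Memory = std::map<std::uint32_t, std::uint8_t>;  // simulator memory

// Read an n-byte unsigned field at `off`, honouring the ELF data encoding.
static std::uint32_t rd(const std::vector<std::uint8_t>& f, std::size_t off,
                        int n, bool big_endian) {
    assert(n >= 1 && n <= 4);
    if (off + n > f.size()) throw std::out_of_range("truncated ELF");
    std::uint32_t v = 0;
    for (int i = 0; i < n; ++i)
        v = (v << 8) | (big_endian ? f[off + i] : f[off + n - 1 - i]);
    return v;
}

// Load every PT_LOAD segment of a 32-bit ELF into `mem`; return e_entry.
std::uint32_t load_elf32(const std::vector<std::uint8_t>& f, Memory& mem) {
    if (f.size() < 52 || f[0] != 0x7f || f[1] != 'E' || f[2] != 'L' || f[3] != 'F')
        throw std::runtime_error("not an ELF file");
    if (f[4] != 1) throw std::runtime_error("not ELFCLASS32");
    bool be = (f[5] == 2);                       // EI_DATA: 1 = LSB, 2 = MSB
    std::uint32_t entry = rd(f, 24, 4, be);      // e_entry
    std::uint32_t phoff = rd(f, 28, 4, be);      // e_phoff
    std::uint32_t phentsize = rd(f, 42, 2, be);  // e_phentsize
    std::uint32_t phnum = rd(f, 44, 2, be);      // e_phnum
    for (std::uint32_t i = 0; i < phnum; ++i) {
        std::size_t ph = std::size_t(phoff) + std::size_t(i) * phentsize;
        if (rd(f, ph, 4, be) != 1) continue;            // p_type != PT_LOAD
        std::uint32_t offset = rd(f, ph + 4, 4, be);    // p_offset
        std::uint32_t vaddr = rd(f, ph + 8, 4, be);     // p_vaddr
        std::uint32_t filesz = rd(f, ph + 16, 4, be);   // p_filesz
        std::uint32_t memsz = rd(f, ph + 20, 4, be);    // p_memsz
        if (std::size_t(offset) + filesz > f.size())
            throw std::out_of_range("segment past end of file");
        for (std::uint32_t k = 0; k < memsz; ++k)       // bytes past filesz are zero (.bss)
            mem[vaddr + k] = k < filesz ? f[offset + k] : 0;
    }
    return entry;
}
```

The program counter would then start at the returned e_entry, which is the same value objdump -f reports as the "start address".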
Use the e_entry field of the ELF header to determine where to set the Program Counter.
Look into Elf32_Ehdr.e_entry (or Elf64_Ehdr.e_entry if you are on a 64-bit platform). You should at least also include the .bss section, which takes up no space in the on-disk ELF image but has a nonzero in-memory size.
Wikipedia will lead you to all necessary documentation.
Edit:
Here's the output of objdump -h /usr/bin/vim on my current box:
Sections:
Idx Name Size VMA LMA File off Algn
...
22 .bss 00009628 00000000006df760 00000000006df760 001df760 2**5
ALLOC
23 .comment 00000bc8 0000000000000000 0000000000000000 001df760 2**0
CONTENTS, READONLY
Note the File off is the same for both .bss and .comment, which means .bss is empty in the disk file, but should be 0x9628 bytes in memory.