Will (or can?) a switch statement based on a template argument be removed by the compiler?
See for example the following function in which the template argument funcStep1 is used to select the appropriate function. Will this switch statement be removed as its argument is known at compile time? I tried to learn it from the assembly code (given below) but I have no experience in reading assembly yet so this is not really a feasible task for me.
This question became now relevant for me since the newly introduced if constexpr provides an alternative that is guaranteed to be evaluated at compile time.
template<int funcStep1>
double Mdp::expectedCost(int sdx0, int x0, int x1, int adx0, int adx1) const
{
switch (funcStep1)
{
case 1:
case 3:
return expectedCost_exact(sdx0, x0, x1, adx0, adx1);
case 2:
case 4:
return expectedCost_approx(sdx0, x0, x1, adx0, adx1);
default:
throw string("Unknown value (Mdp::expectedCost.cc)");
}
}
// Define all the template types we need
template double Mdp::expectedCost<1>(int, int, int, int, int) const;
template double Mdp::expectedCost<2>(int, int, int, int, int) const;
template double Mdp::expectedCost<3>(int, int, int, int, int) const;
template double Mdp::expectedCost<4>(int, int, int, int, int) const;
Here you find the output of 'objdump -D' when the above function is compiled with gcc -O2 -ffunction-sections:
1expectedCost.o: file format elf64-x86-64
Disassembly of section .group:
0000000000000000 <.group>:
0: 01 00 add %eax,(%rax)
2: 00 00 add %al,(%rax)
4: 08 00 or %al,(%rax)
6: 00 00 add %al,(%rax)
8: 09 00 or %eax,(%rax)
...
Disassembly of section .group:
0000000000000000 <.group>:
0: 01 00 add %eax,(%rax)
2: 00 00 add %al,(%rax)
4: 0a 00 or (%rax),%al
6: 00 00 add %al,(%rax)
8: 0b 00 or (%rax),%eax
...
Disassembly of section .group:
0000000000000000 <.group>:
0: 01 00 add %eax,(%rax)
2: 00 00 add %al,(%rax)
4: 0c 00 or $0x0,%al
6: 00 00 add %al,(%rax)
8: 0d .byte 0xd
9: 00 00 add %al,(%rax)
...
Disassembly of section .group:
0000000000000000 <.group>:
0: 01 00 add %eax,(%rax)
2: 00 00 add %al,(%rax)
4: 0e (bad)
5: 00 00 add %al,(%rax)
7: 00 0f add %cl,(%rdi)
9: 00 00 add %al,(%rax)
...
Disassembly of section .bss:
0000000000000000 <_ZStL8__ioinit>:
...
Disassembly of section .text._ZNK3Mdp12expectedCostILi1EEEdiiiii:
0000000000000000 <_ZNK3Mdp12expectedCostILi1EEEdiiiii>:
0: e9 00 00 00 00 jmpq 5 <_ZNK3Mdp12expectedCostILi1EEEdiiiii+0x5>
Disassembly of section .text._ZNK3Mdp12expectedCostILi2EEEdiiiii:
0000000000000000 <_ZNK3Mdp12expectedCostILi2EEEdiiiii>:
0: e9 00 00 00 00 jmpq 5 <_ZNK3Mdp12expectedCostILi2EEEdiiiii+0x5>
Disassembly of section .text._ZNK3Mdp12expectedCostILi3EEEdiiiii:
0000000000000000 <_ZNK3Mdp12expectedCostILi3EEEdiiiii>:
0: e9 00 00 00 00 jmpq 5 <_ZNK3Mdp12expectedCostILi3EEEdiiiii+0x5>
Disassembly of section .text._ZNK3Mdp12expectedCostILi4EEEdiiiii:
0000000000000000 <_ZNK3Mdp12expectedCostILi4EEEdiiiii>:
0: e9 00 00 00 00 jmpq 5 <_ZNK3Mdp12expectedCostILi4EEEdiiiii+0x5>
Disassembly of section .text.startup._GLOBAL__sub_I_expectedCost.cc:
0000000000000000 <_GLOBAL__sub_I_expectedCost.cc>:
0: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 7 <_GLOBAL__sub_I_expectedCost.cc+0x7>
7: 48 83 ec 08 sub $0x8,%rsp
b: e8 00 00 00 00 callq 10 <_GLOBAL__sub_I_expectedCost.cc+0x10>
10: 48 8b 3d 00 00 00 00 mov 0x0(%rip),%rdi # 17 <_GLOBAL__sub_I_expectedCost.cc+0x17>
17: 48 8d 15 00 00 00 00 lea 0x0(%rip),%rdx # 1e <_GLOBAL__sub_I_expectedCost.cc+0x1e>
1e: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # 25 <_GLOBAL__sub_I_expectedCost.cc+0x25>
25: 48 83 c4 08 add $0x8,%rsp
29: e9 00 00 00 00 jmpq 2e <_GLOBAL__sub_I_expectedCost.cc+0x2e>
Disassembly of section .init_array:
0000000000000000 <.init_array>:
...
Disassembly of section .comment:
0000000000000000 <.comment>:
0: 00 47 43 add %al,0x43(%rdi)
3: 43 3a 20 rex.XB cmp (%r8),%spl
6: 28 55 62 sub %dl,0x62(%rbp)
9: 75 6e jne 79 <_ZStL8__ioinit+0x79>
b: 74 75 je 82 <_ZStL8__ioinit+0x82>
d: 20 37 and %dh,(%rdi)
f: 2e 33 2e xor %cs:(%rsi),%ebp
12: 30 2d 32 37 75 62 xor %ch,0x62753732(%rip) # 6275374a <_ZStL8__ioinit+0x6275374a>
18: 75 6e jne 88 <_ZStL8__ioinit+0x88>
1a: 74 75 je 91 <_ZStL8__ioinit+0x91>
1c: 31 7e 31 xor %edi,0x31(%rsi)
1f: 38 2e cmp %ch,(%rsi)
21: 30 34 29 xor %dh,(%rcx,%rbp,1)
24: 20 37 and %dh,(%rdi)
26: 2e 33 2e xor %cs:(%rsi),%ebp
29: 30 00 xor %al,(%rax)
Disassembly of section .eh_frame:
0000000000000000 <.eh_frame>:
0: 14 00 adc $0x0,%al
2: 00 00 add %al,(%rax)
4: 00 00 add %al,(%rax)
6: 00 00 add %al,(%rax)
8: 01 7a 52 add %edi,0x52(%rdx)
b: 00 01 add %al,(%rcx)
d: 78 10 js 1f <.eh_frame+0x1f>
f: 01 1b add %ebx,(%rbx)
11: 0c 07 or $0x7,%al
13: 08 90 01 00 00 10 or %dl,0x10000001(%rax)
19: 00 00 add %al,(%rax)
1b: 00 1c 00 add %bl,(%rax,%rax,1)
1e: 00 00 add %al,(%rax)
20: 00 00 add %al,(%rax)
22: 00 00 add %al,(%rax)
24: 05 00 00 00 00 add $0x0,%eax
29: 00 00 add %al,(%rax)
2b: 00 10 add %dl,(%rax)
2d: 00 00 add %al,(%rax)
2f: 00 30 add %dh,(%rax)
31: 00 00 add %al,(%rax)
33: 00 00 add %al,(%rax)
35: 00 00 add %al,(%rax)
37: 00 05 00 00 00 00 add %al,0x0(%rip) # 3d <.eh_frame+0x3d>
3d: 00 00 add %al,(%rax)
3f: 00 10 add %dl,(%rax)
41: 00 00 add %al,(%rax)
43: 00 44 00 00 add %al,0x0(%rax,%rax,1)
47: 00 00 add %al,(%rax)
49: 00 00 add %al,(%rax)
4b: 00 05 00 00 00 00 add %al,0x0(%rip) # 51 <.eh_frame+0x51>
51: 00 00 add %al,(%rax)
53: 00 10 add %dl,(%rax)
55: 00 00 add %al,(%rax)
57: 00 58 00 add %bl,0x0(%rax)
5a: 00 00 add %al,(%rax)
5c: 00 00 add %al,(%rax)
5e: 00 00 add %al,(%rax)
60: 05 00 00 00 00 add $0x0,%eax
65: 00 00 add %al,(%rax)
67: 00 14 00 add %dl,(%rax,%rax,1)
6a: 00 00 add %al,(%rax)
6c: 6c insb (%dx),%es:(%rdi)
6d: 00 00 add %al,(%rax)
6f: 00 00 add %al,(%rax)
71: 00 00 add %al,(%rax)
73: 00 2e add %ch,(%rsi)
75: 00 00 add %al,(%rax)
77: 00 00 add %al,(%rax)
79: 4b 0e rex.WXB (bad)
7b: 10 5e 0e adc %bl,0xe(%rsi)
7e: 08 00 or %al,(%rax)
Yes, this is optimized. There are a few things that make reading assembly easier, such as demangling names (example: _ZNK3Mdp12expectedCostILi1EEEdiiiii is the mangled form of double Mdp::expectedCost<1>(int, int, int, int, int) const), stripping comments and text (and using Intel syntax):
double expectedCost<1>(int, int, int, int, int): # #double expectedCost<1>(int, int, int, int, int)
jmp expectedCost_exact(int, int, int, int, int) # TAILCALL
double expectedCost<2>(int, int, int, int, int): # #double expectedCost<2>(int, int, int, int, int)
jmp expectedCost_approx(int, int, int, int, int) # TAILCALL
double expectedCost<3>(int, int, int, int, int): # #double expectedCost<3>(int, int, int, int, int)
jmp expectedCost_exact(int, int, int, int, int) # TAILCALL
double expectedCost<4>(int, int, int, int, int): # #double expectedCost<4>(int, int, int, int, int)
jmp expectedCost_approx(int, int, int, int, int) # TAILCALL
https://godbolt.org/z/ZtoKFH
The above site simplifies this whole process for you.
In this case I didn't provide definitions for expectedCost_approx so the compiler just leaves a jump. But in any case, compilers are definitely smart enough to realize that each template function has a constant value in the switch.
The answer to your question is: Yes, any moderately useful compiler will perform dead code elimination.
if constexpr is not so much about forcing compile time evaluation for reasons of performance. In terms of performance, there's not really going to be any difference between if constexpr and a normal if when the given expression is a compile-time constant because compilers will end up optimizing the unused branch away either way. What if constexpr enables is to have code in the inactive branch that must not be instantiated with the given template arguments (e.g., because it would be invalid in that particular case). For your switch above, the whole code will be instantiated for all cases. Only afterwards will the unused code be removed by the optimizer. if constexpr on the other hand, guarantees that the code in the unused branch will never be instantiated to begin with. See, e.g., here for more on that…
We don't have switch constexpr, and there are no guaranties of branch elimination for simple switch, even with constexpr value (as for regular if in fact), but I expect than compiler would remove them with proper optimization flag.
Notice also that your not-used branches would instantiate, if any, template methods/objects whereas if constexpr would not.
So if you want to have guaranty that only relevant code is there, or avoid unneeded instantiations, use if constexpr. Else use the one you find the clearer.
I have some template-heavy C++ code that I want to ensure the compiler optimizes as much as possible due to the large amount of information it has at compile time. To evaluate its performance, I decided to take a look at the disassembly of the object file that it generates. Below is a snippet of what I got from objdump -dC:
0000000000000000 <bar<foo, 0u>::get(bool)>:
0: 41 57 push %r15
2: 49 89 f7 mov %rsi,%r15
5: 41 56 push %r14
7: 41 55 push %r13
9: 41 54 push %r12
b: 55 push %rbp
c: 53 push %rbx
d: 48 81 ec 68 02 00 00 sub $0x268,%rsp
14: 48 89 7c 24 10 mov %rdi,0x10(%rsp)
19: 48 89 f7 mov %rsi,%rdi
1c: 89 54 24 1c mov %edx,0x1c(%rsp)
20: e8 00 00 00 00 callq 25 <bar<foo, 0u>::get(bool)+0x25>
25: 84 c0 test %al,%al
27: 0f 85 eb 00 00 00 jne 118 <bar<foo, 0u>::get(bool)+0x118>
2d: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
34: 00 00
36: 4c 89 ff mov %r15,%rdi
39: 4d 8d b7 30 01 00 00 lea 0x130(%r15),%r14
40: e8 00 00 00 00 callq 45 <bar<foo, 0u>::get(bool)+0x45>
45: 84 c0 test %al,%al
47: 88 44 24 1b mov %al,0x1b(%rsp)
4b: 0f 85 ef 00 00 00 jne 140 <bar<foo, 0u>::get(bool)+0x140>
51: 80 7c 24 1c 00 cmpb $0x0,0x1c(%rsp)
56: 0f 85 24 03 00 00 jne 380 <bar<foo, 0u>::get(bool)+0x380>
5c: 48 8b 44 24 10 mov 0x10(%rsp),%rax
61: c6 00 00 movb $0x0,(%rax)
64: 80 7c 24 1b 00 cmpb $0x0,0x1b(%rsp)
69: 75 25 jne 90 <bar<foo, 0u>::get(bool)+0x90>
6b: 48 8b 74 24 10 mov 0x10(%rsp),%rsi
70: 4c 89 ff mov %r15,%rdi
73: e8 00 00 00 00 callq 78 <bar<foo, 0u>::get(bool)+0x78>
78: 48 8b 44 24 10 mov 0x10(%rsp),%rax
7d: 48 81 c4 68 02 00 00 add $0x268,%rsp
84: 5b pop %rbx
85: 5d pop %rbp
86: 41 5c pop %r12
88: 41 5d pop %r13
8a: 41 5e pop %r14
8c: 41 5f pop %r15
8e: c3 retq
8f: 90 nop
90: 4c 89 f7 mov %r14,%rdi
93: e8 00 00 00 00 callq 98 <bar<foo, 0u>::get(bool)+0x98>
98: 83 f8 04 cmp $0x4,%eax
9b: 74 f3 je 90 <bar<foo, 0u>::get(bool)+0x90>
9d: 85 c0 test %eax,%eax
9f: 0f 85 e4 08 00 00 jne 989 <bar<foo, 0u>::get(bool)+0x989>
a5: 49 83 87 b0 01 00 00 addq $0x1,0x1b0(%r15)
ac: 01
ad: 49 8d 9f 58 01 00 00 lea 0x158(%r15),%rbx
b4: 48 89 df mov %rbx,%rdi
b7: e8 00 00 00 00 callq bc <bar<foo, 0u>::get(bool)+0xbc>
bc: 49 8d bf 80 01 00 00 lea 0x180(%r15),%rdi
c3: e8 00 00 00 00 callq c8 <bar<foo, 0u>::get(bool)+0xc8>
c8: 48 89 df mov %rbx,%rdi
cb: e8 00 00 00 00 callq d0 <bar<foo, 0u>::get(bool)+0xd0>
d0: 4c 89 f7 mov %r14,%rdi
d3: e8 00 00 00 00 callq d8 <bar<foo, 0u>::get(bool)+0xd8>
d8: 83 f8 04 cmp $0x4,%eax
The disassembly of this particular function continues on, but one thing I noticed is the relatively large number of call instructions like this one:
20: e8 00 00 00 00 callq 25 <bar<foo, 0u>::get(bool)+0x25>
These instructions, always with the opcode e8 00 00 00 00, occur frequently throughout the generated code, and from what I can tell, are nothing more than no-ops; they all seem to just fall through to the next instruction. This begs the question, then, is there a good reason why all these instructions are generated?
I'm concerned about the instruction cache footprint of the generated code, so wasting 5 bytes many times throughout a function seems counterproductive. It seems a bit heavyweight for a nop, unless the compiler is trying to preserve some kind of memory alignment or something. I wouldn't be surprised if this were the case.
I compiled my code using g++ 4.8.5 using -O3 -fomit-frame-pointer. For what it's worth, I saw similar code generation using clang 3.7.
The 00 00 00 00 (relative) target address in e8 00 00 00 00 is intended to be filled in by the linker. It doesn't mean that the call falls through. It just means you are disassembling an object file that has not been linked yet.
Also, a call to the next instruction, if that was the end result after the link phase, would not be a no-op, because it changes the stack (a certain hint that this is not what is going on in your case).
i got some simple code file
mangen.c:
///////////// begin of the file
void mangen(int* data)
{
for(int j=0; j<100; j++)
for(int i=0; i<100; i++)
data[j*100+i] = 111;
}
//////// end of the file
I compile it with mingw (on win32)
c:\mingw\bin\gcc -std=c99 -c mangen.c -fno-exceptions -march=core2 -mtune=generic -mfpmath=both -msse2
it yeilds to mangen.o file which is 400 bytes
00000000 4C 01 03 00 00 00 00 00-D8 00 00 00 0A 00 00 00 L...............
00000010 00 00 05 01 2E 74 65 78-74 00 00 00 00 00 00 00 .....text.......
00000020 00 00 00 00 4C 00 00 00-8C 00 00 00 00 00 00 00 ....L...........
00000030 00 00 00 00 00 00 00 00-20 00 30 60 2E 64 61 74 ........ .0`.dat
00000040 61 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 a...............
00000050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000060 40 00 30 C0 2E 62 73 73-00 00 00 00 00 00 00 00 #.0..bss........
00000070 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000080 00 00 00 00 00 00 00 00-80 00 30 C0 55 89 E5 83 ..........0.U...
00000090 EC 10 C7 45 FC 00 00 00-00 EB 34 C7 45 F8 00 00 ...E......4.E...
000000A0 00 00 EB 21 8B 45 FC 6B-D0 64 8B 45 F8 01 D0 8D ...!.E.k.d.E....
000000B0 14 85 00 00 00 00 8B 45-08 01 D0 C7 00 6F 00 00 .......E.....o..
000000C0 00 83 45 F8 01 83 7D F8-63 7E D9 83 45 FC 01 83 ..E...}.c~..E...
000000D0 7D FC 63 7E C6 C9 C3 90-2E 66 69 6C 65 00 00 00 }.c~.....file...
000000E0 00 00 00 00 FE FF 00 00-67 01 6D 61 6E 67 65 6E ........g.mangen
000000F0 2E 63 00 00 00 00 00 00-00 00 00 00 5F 6D 61 6E .c.........._man
00000100 67 65 6E 00 00 00 00 00-01 00 20 00 02 01 00 00 gen....... .....
00000110 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
00000120 2E 74 65 78 74 00 00 00-00 00 00 00 01 00 00 00 .text...........
00000130 03 01 4B 00 00 00 00 00-00 00 00 00 00 00 00 00 ..K.............
00000140 00 00 00 00 2E 64 61 74-61 00 00 00 00 00 00 00 .....data.......
00000150 02 00 00 00 03 01 00 00-00 00 00 00 00 00 00 00 ................
00000160 00 00 00 00 00 00 00 00-2E 62 73 73 00 00 00 00 .........bss....
00000170 00 00 00 00 03 00 00 00-03 01 00 00 00 00 00 00 ................
00000180 00 00 00 00 00 00 00 00-00 00 00 00 04 00 00 00 ................
Now I need to know where is the binary chunk containing
above function body in here
Could someone provide some simple code that will allow me to retrive
this boundaries ?
(assume that function body may be shorter or longer and also
there may be other functions or data in source fite added so
it will move in chunk but I suspect procedure to localise it
should be not very complex.
You can use objdump -Fd mangen.o to find out file offset and lenght of a function.
Alternatively, you can use readelf -s mangen.o to find out size of a function.
You may define something like int abc = 0x11223344; in the beginning and end of function and use the constants to locate the function body.
You can use objdump or nm.
For instance, try:
nm mangen.o
Or
objdump -t mangen.o
If you need to use your own code, have a look here:
http://www.rohitab.com/discuss/topic/38591-c-import-table-parser/
It will give you something to start with. You can find much more information about the format in MSDN.
If you are into Python, there is nice tool/library (including source code) that can be helpful:
https://code.google.com/p/pefile/
I read that with inline functions where ever the function call is made we replace the function call with the body of the function definition.
According to the above explanation there should not be any function call when inline is user.
If that is the case Why do I see three call instructions in the assembly code ?
#include <iostream>
inline int add(int x, int y)
{
return x+ y;
}
int main()
{
add(8,9);
add(20,10);
add(100,233);
}
meow#vikkyhacks ~/Arena/c/temp $ g++ -c a.cpp
meow#vikkyhacks ~/Arena/c/temp $ objdump -M intel -d a.o
0000000000000000 <main>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: be 09 00 00 00 mov esi,0x9
9: bf 08 00 00 00 mov edi,0x8
e: e8 00 00 00 00 call 13 <main+0x13>
13: be 0a 00 00 00 mov esi,0xa
18: bf 14 00 00 00 mov edi,0x14
1d: e8 00 00 00 00 call 22 <main+0x22>
22: be e9 00 00 00 mov esi,0xe9
27: bf 64 00 00 00 mov edi,0x64
2c: e8 00 00 00 00 call 31 <main+0x31>
31: b8 00 00 00 00 mov eax,0x0
36: 5d pop rbp
37: c3 ret
NOTE
Complete dump of the object file is here
You did not optimize so the calls are not inlined
You produced an object file (not a .exe) so the calls are not resolved. What you see is a dummy call whose address will be filled by the linker
If you compile a full executable you will see the correct addresses for the jumps
See page 28 of:
http://www.cs.princeton.edu/courses/archive/spr04/cos217/lectures/Assembler.pdf
I have the following entry in a Dr Watson log. What is the significance of the "ds:0023:003a3000=??" part of the entry to the right of the FAULT line?
*----> State Dump for Thread Id 0xdfc <----*
eax=00000000 ebx=00390320 ecx=0854ff48 edx=09e44bfc esi=00012ce1 edi=0854ff61
eip=00465c51 esp=0854ff30 ebp=00000000 iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
function: sysman
00465c37 49 dec ecx
00465c38 eb02 jmp sysman+0x65c3c (00465c3c)
00465c3a 33c9 xor ecx,ecx
00465c3c 8d542428 lea edx,[esp+0x28]
00465c40 52 push edx
00465c41 51 push ecx
00465c42 8d4c2418 lea ecx,[esp+0x18]
00465c46 e8d5c0fcff call sysman+0x31d20 (00431d20)
00465c4b 33c0 xor eax,eax
00465c4d 8d4c2418 lea ecx,[esp+0x18]
FAULT ->00465c51 8a441eff mov al,[esi+ebx-0x1] ds:0023:003a3000=??
00465c55 50 push eax
00465c56 6864074900 push 0x490764
00465c5b 51 push ecx
00465c5c e8cfd0fcff call sysman+0x32d30 (00432d30)
00465c61 8d542424 lea edx,[esp+0x24]
00465c65 68689f4800 push 0x489f68
00465c6a 8d44242c lea eax,[esp+0x2c]
00465c6e 52 push edx
00465c6f 50 push eax
00465c70 e83bc3fcff call sysman+0x31fb0 (00431fb0)
*----> Stack Back Trace <----*
ChildEBP RetAddr Args to Child
00000000 00000000 00000000 00000000 00000000 sysman+0x65c51
*----> Raw Stack Dump <----*
000000000854ff30 58 01 55 08 75 07 c8 09 - 00 00 00 00 18 6d c7 01 X.U.u........m..
000000000854ff40 fc 4b e4 09 04 bd 47 00 - 04 bd 47 00 fc 0c c9 01 .K....G...G.....
000000000854ff50 ac ca ae 09 64 5f c4 01 - 20 37 37 30 32 34 3a 20 ....d_.. 77024:
000000000854ff60 00 b3 42 00 a8 ff 54 08 - 90 a6 47 00 02 00 00 00 ..B...T...G.....
000000000854ff70 8b c5 42 00 b8 ff 54 08 - 2e 03 39 00 28 99 cb 01 ..B...T...9.(...
000000000854ff80 ff ff ff ff 00 00 00 00 - 00 00 00 00 20 1e cb 01 ............ ...
000000000854ff90 a6 f7 ba 77 06 00 00 00 - c9 f7 ba 77 e1 6b d9 09 ...w.......w.k..
000000000854ffa0 06 00 00 00 1f 00 00 00 - 68 00 55 08 c1 a0 47 00 ........h.U...G.
000000000854ffb0 00 00 00 00 58 c4 42 00 - c9 a5 ca 09 d1 fb 38 0a ....X.B.......8.
000000000854ffc0 27 00 00 00 e1 6b d9 09 - ef f2 41 00 c9 a5 ca 09 '....k....A.....
000000000854ffd0 01 59 cc 01 38 00 55 08 - ec 00 55 08 00 00 00 00 .Y..8.U...U.....
000000000854ffe0 e0 00 55 08 ff ff ff ff - 89 00 00 00 01 00 01 01 ..U.............
000000000854fff0 c8 ff 54 08 b8 ff 54 08 - 77 00 55 08 29 a5 ca 09 ..T...T.w.U.)...
0000000008550000 51 00 00 00 5f 00 00 00 - 00 9f 82 7c 61 36 ca 01 Q..._......|a6..
0000000008550010 25 00 00 00 3f 00 00 00 - 00 ce bb 77 91 b7 c7 01 %...?......w....
0000000008550020 19 00 00 00 1f 00 00 00 - 00 ff ff ff d9 28 cc 01 .............(..
0000000008550030 0b 00 00 00 1f 00 00 00 - 00 00 55 08 d1 fb 38 0a ..........U...8.
0000000008550040 27 00 00 00 3f 00 00 00 - 00 20 ba 77 00 00 00 00 '...?.... .w....
0000000008550050 00 00 00 00 00 00 00 00 - 20 b7 c7 01 00 00 00 00 ........ .......
0000000008550060 00 00 00 00 00 00 00 00 - b4 00 55 08 1b 90 47 00 ..........U...G.`
To summarize:
You get a register dump here:
eax=00000000 ebx=00390320 ecx=0854ff48 edx=09e44bfc esi=00012ce1 edi=0854ff61
eip=00465c51 esp=0854ff30 ebp=00000000 iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
The eip indicates the instruction that failed:
FAULT ->00465c51 8a441eff mov al,[esi+ebx-0x1] ds:0023:003a3000=??
The stuff at the end is the address that failed to read, which is the "usual" data segment of 23, and address 3A3000, whcih is composed of esi and ebx minus 1: 390320+12ce1-1. To me, that looks like an index gone bad - 3a3000 would be the first address of a new "page" in memory, so that's why it's failing at that point. 77025 bytes into an array is quite a long way, but it is of course possible that it's something else that is wrong.