Extracting int array from compiled C++ code

Extracting int array from compiled C++ code - c++

Is it possible for an attacker to get integer arrays from your compiled code?
Like how the attacker can get strings from your code using the strings command.

Yes. A quick example with main.c:
int main(void) {
int vars[8] = {0,1,2,3,4,5,6,7};
}
Then gcc -O0 main.c -o main to disable optimization so our unused array isn't removed. Then if you simply disassemble it:
0000000000400474 <main>:
400474: 55 push %rbp
400475: 48 89 e5 mov %rsp,%rbp
400478: c7 45 e0 00 00 00 00 movl $0x0,-0x20(%rbp)
40047f: c7 45 e4 01 00 00 00 movl $0x1,-0x1c(%rbp)
400486: c7 45 e8 02 00 00 00 movl $0x2,-0x18(%rbp)
40048d: c7 45 ec 03 00 00 00 movl $0x3,-0x14(%rbp)
400494: c7 45 f0 04 00 00 00 movl $0x4,-0x10(%rbp)
40049b: c7 45 f4 05 00 00 00 movl $0x5,-0xc(%rbp)
4004a2: c7 45 f8 06 00 00 00 movl $0x6,-0x8(%rbp)
4004a9: c7 45 fc 07 00 00 00 movl $0x7,-0x4(%rbp)
It makes logical sense, if you have data in your code and your program uses it, then the data must exist somewhere.

It is possible, I recommend you to open a executable file after compiled with notepad++ or notepad, and maybe you see something as:
But why this is so simple with strings and not with vectors? Because strings are usually just made by alfphanumeric letters and so on, just seen them inside of binary files we found tham, an array of int will be inside of your code as raw data but eye seen them is hard since they are not (obligatorily) printed characters, but if you use a disassembling program it is as easy to find strings as other arrays.

Related

Is there a way to see what's inside a ".rodata+(memory location)" in an object file?

So I'm taking a class where I am given a single object file and need to reverse engineer it into c++ code. The command I'm told to use is "gdb assignment6_1.o" to open it in gdb, and "disass main" to see assembly code.
I'm also using "objdump -dr assignment6_1.o" myself since it outputs a little more information.
The problem I'm running into, is that using objdump, I can see that the program is trying to access what I believe is a variable or maybe a string, ".rodata+0x41". There are multiple .rodata's, that's just one example.
Is there a command or somewhere I can look to see what that's referencing? I also have access to the "Bless" program.
Below is a snippet of the disassembled code I have.
a3: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # aa <main+0x31>
a6: R_X86_64_PC32 .rodata+0x41
aa: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b1 <main+0x38>
ad: R_X86_64_PC32 _ZSt4cout-0x4
b1: e8 00 00 00 00 callq b6 <main+0x3d>
b2: R_X86_64_PLT32 _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc-0x4
b6: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # bd <main+0x44>
b9: R_X86_64_PC32 .rodata+0x53
bd: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # c4 <main+0x4b>
c0: R_X86_64_PC32 _ZSt4cout-0x4
c4: e8 00 00 00 00 callq c9 <main+0x50>
c5: R_X86_64_PLT32 _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc-0x4
c9: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # d0 <main+0x57>
cc: R_X86_64_PC32 .rodata+0x5e
d0: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # d7 <main+0x5e>
d3: R_X86_64_PC32 _ZSt4cout-0x4
d7: e8 00 00 00 00 callq dc <main+0x63>
d8: R_X86_64_PLT32 _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc-0x4
dc: 48 8d 35 00 00 00 00 lea 0x0(%rip),%rsi # e3 <main+0x6a>
df: R_X86_64_PC32 .rodata+0x6e
e3: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # ea <main+0x71>
e6: R_X86_64_PC32 _ZSt4cout-0x4
ea: e8 00 00 00 00 callq ef <main+0x76>
eb: R_X86_64_PLT32 _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc-0x4```

Is there a way to see what's inside a ".rodata+(memory location)" in an object file?
Sure. Both objdump and readelf can dump contents of any section.
Example:
// x.c
#include <stdio.h>
int foo() { return printf("AA.\n") + printf("BBBB.\n"); }
gcc -c x.c
objdump -dr x.o
...
9: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 10 <foo+0x10>
c: R_X86_64_PC32 .rodata-0x4
...
1f: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 26 <foo+0x26>
22: R_X86_64_PC32 .rodata+0x1
...
Note that because the RIP used in these instructions is the address of the next instruction, the actual data we care about is at .rodata+0 and .rodata+5 (in your original disassembly, you care about .rodata+45, not .rodata+41).
So what's there?
objdump -sj.rodata x.o
x.o: file format elf64-x86-64
Contents of section .rodata:
0000 41412e0a 00424242 422e0a00 AA...BBBB...
or, using readelf:
readelf -x .rodata x.o
Hex dump of section '.rodata':
0x00000000 41412e0a 00424242 422e0a00 AA...BBBB...

Why does C++ inline function has call instructions?

I read that with inline functions where ever the function call is made we replace the function call with the body of the function definition.
According to the above explanation there should not be any function call when inline is user.
If that is the case Why do I see three call instructions in the assembly code ?
#include <iostream>
inline int add(int x, int y)
{
return x+ y;
}
int main()
{
add(8,9);
add(20,10);
add(100,233);
}
meow#vikkyhacks ~/Arena/c/temp $ g++ -c a.cpp
meow#vikkyhacks ~/Arena/c/temp $ objdump -M intel -d a.o
0000000000000000 <main>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: be 09 00 00 00 mov esi,0x9
9: bf 08 00 00 00 mov edi,0x8
e: e8 00 00 00 00 call 13 <main+0x13>
13: be 0a 00 00 00 mov esi,0xa
18: bf 14 00 00 00 mov edi,0x14
1d: e8 00 00 00 00 call 22 <main+0x22>
22: be e9 00 00 00 mov esi,0xe9
27: bf 64 00 00 00 mov edi,0x64
2c: e8 00 00 00 00 call 31 <main+0x31>
31: b8 00 00 00 00 mov eax,0x0
36: 5d pop rbp
37: c3 ret
NOTE
Complete dump of the object file is here

You did not optimize so the calls are not inlined
You produced an object file (not a .exe) so the calls are not resolved. What you see is a dummy call whose address will be filled by the linker
If you compile a full executable you will see the correct addresses for the jumps
See page 28 of:
http://www.cs.princeton.edu/courses/archive/spr04/cos217/lectures/Assembler.pdf

Why does vectorization fail?

I want to optimize my code for vectorization using
-msse2 -ftree-vectorizer-verbose=2.
I have the following simple code:
int main(){
int a[2048], b[2048], c[2048];
int i;
for (i=0; i<2048; i++){
b[i]=0;
c[i]=0;
}
for (i=0; i<2048; i++){
a[i] = b[i] + c[i];
}
return 0;
}
Why do I get the note
test.cpp:10: note: not vectorized: not enough data-refs in basic block.
Thanks!

For what it's worth, after adding an asm volatile("": "+m"(a), "+m"(b), "+m"(c)::"memory"); near the end of main, my copy of gcc emits this:
400610: 48 81 ec 08 60 00 00 sub $0x6008,%rsp
400617: ba 00 20 00 00 mov $0x2000,%edx
40061c: 31 f6 xor %esi,%esi
40061e: 48 8d bc 24 00 20 00 lea 0x2000(%rsp),%rdi
400625: 00
400626: e8 b5 ff ff ff callq 4005e0 <memset#plt>
40062b: ba 00 20 00 00 mov $0x2000,%edx
400630: 31 f6 xor %esi,%esi
400632: 48 8d bc 24 00 40 00 lea 0x4000(%rsp),%rdi
400639: 00
40063a: e8 a1 ff ff ff callq 4005e0 <memset#plt>
40063f: 31 c0 xor %eax,%eax
400641: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
400648: c5 f9 6f 84 04 00 20 vmovdqa 0x2000(%rsp,%rax,1),%xmm0
40064f: 00 00
400651: c5 f9 fe 84 04 00 40 vpaddd 0x4000(%rsp,%rax,1),%xmm0,%xmm0
400658: 00 00
40065a: c5 f8 29 04 04 vmovaps %xmm0,(%rsp,%rax,1)
40065f: 48 83 c0 10 add $0x10,%rax
400663: 48 3d 00 20 00 00 cmp $0x2000,%rax
400669: 75 dd jne 400648 <main+0x38>
So it recognised that the first loop was just doing memset to a couple arrays and the second loop was doing a vector addition, which it appropriately vectorised.
I'm using gcc version 4.9.0 20140521 (prerelease) (GCC).
An older machine with gcc version 4.7.2 (Debian 4.7.2-5) also vectorises the loop, but in a different way. Your -ftree-vectorizer-verbose=2 setting makes it emit the following output:
Analyzing loop at foo155.cc:10
Vectorizing loop at foo155.cc:10
10: LOOP VECTORIZED.
foo155.cc:1: note: vectorized 1 loops in function.
You probably goofed your compiler flags (I used g++ -O3 -ftree-vectorize -ftree-vectorizer-verbose=2 -march=native foo155.cc -o foo155 to build) or have a really old compiler.

remove the first loop and do this
int a[2048], b[2048], c[2048] = {0};
also try this tag
-ftree-vectorize
instead of
-msse2 -ftree-vectorizer-verbose=2

odd compiled code

I've compiled some Qt code with google's nacl compiler, but the ncval validator does not grok it. One example among many:
src/corelib/animation/qabstractanimation.cpp:165
Here's the relevant code:
#define Q_GLOBAL_STATIC(TYPE, NAME) \
static TYPE *NAME() \
{ \
static TYPE thisVariable; \
static QGlobalStatic<TYPE > thisGlobalStatic(&thisVariable); \
return thisGlobalStatic.pointer; \
}
#ifndef QT_NO_THREAD
Q_GLOBAL_STATIC(QThreadStorage<QUnifiedTimer *>, unifiedTimer)
#endif
which compiles to:
00000480 <_ZL12unifiedTimerv>:
480: 55 push %ebp
481: 89 e5 mov %esp,%ebp
483: 57 push %edi
484: 56 push %esi
485: 53 push %ebx
486: 83 ec 2c sub $0x2c,%esp
489: c7 04 24 28 00 2e 10 movl $0x102e0028,(%esp)
490: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
494: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
49b: e8 fc ff ff ff call 49c <_ZL12unifiedTimerv+0x1c>
4a0: 84 c0 test %al,%al
4a2: 74 1c je 4c0 <_ZL12unifiedTimerv+0x40>
4a4: 0f b6 05 2c 00 2e 10 movzbl 0x102e002c,%eax
4ab: 83 f0 01 xor $0x1,%eax
4ae: 84 c0 test %al,%al
4b0: 74 0e je 4c0 <_ZL12unifiedTimerv+0x40>
4b2: b8 01 00 00 00 mov $0x1,%eax
4b7: eb 27 jmp 4e0 <_ZL12unifiedTimerv+0x60>
4b9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
4c0: b8 00 00 00 00 mov $0x0,%eax
4c5: eb 19 jmp 4e0 <_ZL12unifiedTimerv+0x60>
4c7: 90 nop
4c8: 90 nop
4c9: 90 nop
4ca: 90 nop
4cb: 90 nop
Check the call instruction at 49b: it is what the validator cannot grok. What on earth could induce the compiler to issue an instruction that calls into the middle of itself? Is there a way around this? I've compiled with -g -O0 -fno-inline. Compiler bug?

Presumably it's really a call to an external symbol, which will get filled in at link time. Actually what will get called is externalSymbol-4, which is a bit strange -- perhaps this is what is throwing the ncval validator off the scent.

Is this a dynamic library or a static object that is not linked to an executable yet?
In a dynamic library this likely came out because the code was built as position-dependent and linked into a dynamic library. Try "objdump -d -r -R" on it, if you see TEXTREL, that is the case. TEXTREL is not supported in NaCl dynamic linking stories. (solved by having -fPIC flag during compilation of the code)
With a static object try to validate after it was linked into a static executable.

Is this an optimization bug in g++?

I'm not sure whether I've found a bug in g++ (4.4.1-4ubuntu9), or if I'm doing
something wrong. What I believe I'm seeing is a bug introduced by enabling
optimization with g++ -O2. I've tried to distill the code down to just the
relevant parts.
When optimization is enabled, I have an ASSERT which is failing. When
optimization is disabled, the same ASSERT does not fail. I think I've tracked
it down to the optimization of one function and its callers.
The System
Language: C++
Ubuntu 9.10
g++-4.4.real (Ubuntu 4.4.1-4ubuntu9) 4.4.1
Linux 2.6.31-22-server x86_64
Optimization Enabled
Object compiled with:
g++ -DHAVE_CONFIG_H -I. -fPIC -g -O2 -MT file.o -MD -MP -MF .deps/file.Tpo -c -o file.o file.cpp
And here is the relevant code from objdump -dg file.o.
00000000000018b0 <helper_function>:
;; This function takes two parameters:
;; pointer to int: %rdi
;; pointer to int[]: %rsi
18b0: 0f b6 07 movzbl (%rdi),%eax
18b3: 83 f8 12 cmp $0x12,%eax
18b6: 74 60 je 1918 <helper_function+0x68>
18b8: 83 f8 17 cmp $0x17,%eax
18bb: 74 5b je 1918 <helper_function+0x68>
...
1918: c7 06 32 00 00 00 movl $0x32,(%rsi)
191e: 66 90 xchg %ax,%ax
1920: c3 retq
0000000000005290 <buggy_invoker>:
... snip ...
52a0: 48 81 ec c8 01 00 00 sub $0x1c8,%rsp
52a7: 48 8d 84 24 a0 01 00 lea 0x1a0(%rsp),%rax
52ae: 00
52af: 48 c7 84 24 a0 01 00 movq $0x0,0x1a0(%rsp)
52b6: 00 00 00 00 00
52bb: 48 c7 84 24 a8 01 00 movq $0x0,0x1a8(%rsp)
52c2: 00 00 00 00 00
52c7: c7 84 24 b0 01 00 00 movl $0x0,0x1b0(%rsp)
52ce: 00 00 00 00
52d2: 4c 8d 7c 24 20 lea 0x20(%rsp),%r15
52d7: 48 89 c6 mov %rax,%rsi
52da: 48 89 44 24 08 mov %rax,0x8(%rsp)
;; ***** BUG HERE *****
;; Pointer to int[] loaded into %rsi
;; But where is %rdi populated?
52df: e8 cc c5 ff ff callq 18b0 <helper_function>
0000000000005494 <perfectly_fine_invoker>:
5494: 48 83 ec 20 sub $0x20,%rsp
5498: 0f ae f0 mfence
549b: 48 8d 7c 24 30 lea 0x30(%rsp),%rdi
54a0: 48 89 e6 mov %rsp,%rsi
54a3: 48 c7 04 24 00 00 00 movq $0x0,(%rsp)
54aa: 00
54ab: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
54b2: 00 00
54b4: c7 44 24 10 00 00 00 movl $0x0,0x10(%rsp)
54bb: 00
;; Non buggy invocation here: both %rdi and %rsi loaded correctly.
54bc: e8 ef c3 ff ff callq 18b0 <helper_function>
Optimization Disabled
Now compiled with:
g++ -DHAVE_CONFIG_H -I. -fPIC -g -O0 -MT file.o -MD -MP -MF .deps/file.Tpo -c -o file.o file.cpp
0000000000008d27 <helper_function>:
;; Still the same parameters here, but it looks a little different.
... snip ...
8d2b: 48 89 7d e8 mov %rdi,-0x18(%rbp)
8d2f: 48 89 75 e0 mov %rsi,-0x20(%rbp)
8d33: 48 8b 45 e8 mov -0x18(%rbp),%rax
8d37: 0f b6 00 movzbl (%rax),%eax
8d3a: 0f b6 c0 movzbl %al,%eax
8d3d: 89 45 fc mov %eax,-0x4(%rbp)
8d40: 8b 45 fc mov -0x4(%rbp),%eax
8d43: 83 f8 17 cmp $0x17,%eax
8d46: 74 40 je 8d88 <helper_function+0x61>
...
000000000000948a <buggy_invoker>:
948a: 55 push %rbp
948b: 48 89 e5 mov %rsp,%rbp
948e: 41 54 push %r12
9490: 53 push %rbx
9491: 48 81 ec c0 01 00 00 sub $0x1c0,%rsp
9498: 48 89 bd 38 fe ff ff mov %rdi,-0x1c8(%rbp)
949f: 48 89 b5 30 fe ff ff mov %rsi,-0x1d0(%rbp)
94a6: 48 c7 45 c0 00 00 00 movq $0x0,-0x40(%rbp)
94ad: 00
94ae: 48 c7 45 c8 00 00 00 movq $0x0,-0x38(%rbp)
94b5: 00
94b6: c7 45 d0 00 00 00 00 movl $0x0,-0x30(%rbp)
94bd: 48 8d 55 c0 lea -0x40(%rbp),%rdx
94c1: 48 8b 85 38 fe ff ff mov -0x1c8(%rbp),%rax
94c8: 48 89 d6 mov %rdx,%rsi
94cb: 48 89 c7 mov %rax,%rdi
;; ***** NOT BUGGY HERE *****
;; Now, without optimization, both %rdi and %rsi loaded correctly.
94ce: e8 54 f8 ff ff callq 8d27 <helper_function>
0000000000008eec <different_perfectly_fine_invoker>:
8eec: 55 push %rbp
8eed: 48 89 e5 mov %rsp,%rbp
8ef0: 48 83 ec 30 sub $0x30,%rsp
8ef4: 48 89 7d d8 mov %rdi,-0x28(%rbp)
8ef8: 48 c7 45 e0 00 00 00 movq $0x0,-0x20(%rbp)
8eff: 00
8f00: 48 c7 45 e8 00 00 00 movq $0x0,-0x18(%rbp)
8f07: 00
8f08: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%rbp)
8f0f: 48 8d 55 e0 lea -0x20(%rbp),%rdx
8f13: 48 8b 45 d8 mov -0x28(%rbp),%rax
8f17: 48 89 d6 mov %rdx,%rsi
8f1a: 48 89 c7 mov %rax,%rdi
;; Another example of non-optimized call to that function.
8f1d: e8 05 fe ff ff callq 8d27 <helper_function>
The Original C++ Code
This is a sanitized version of the original C++. I've just changed some names
and removed irrelevant code. Forgive my paranoia, I just don't want to expose
too much code from unpublished and unreleased work :-).
static void helper_function(my_struct_t *e, int *outArr)
{
unsigned char event_type = e->header.type;
if (event_type == event_A || event_type == event_B) {
outArr[0] = action_one;
} else if (event_type == event_C) {
outArr[0] = action_one;
outArr[1] = action_two;
} else if (...) { ... }
}
static void buggy_invoker(my_struct_t *e, predicate_t pred)
{
// MAX_ACTIONS is #defined to 5
int action_array[MAX_ACTIONS] = {0};
helper_function(e, action_array);
...
}
static int has_any_actions(my_struct_t *e)
{
int actions[MAX_ACTIONS] = {0};
helper_function(e, actions);
return actions[0] != 0;
}
// *** ENTRY POINT to this code is this function (note not static).
void perfectly_fine_invoker(my_struct_t e, predicate_t pred)
{
memfence();
if (has_any_actions(&e)) {
buggy_invoker(&e, pred);
}
...
}
If you think I've obfuscated or eliminiated too much, let me know. Users of
this code call 'perfectly_fine_invoker'. With optimization, g++ optimizes the
'has_any_actions' function away into a direct call to 'helper_function', which
you can see in the assembly.
The Question
So, my question is, does it look like a buggy optimization to anyone else?
If it would be helpful, I could post a sanitized version of the original C++ code.
This is my first posting to Stack Overflow, so please let me know if I can do
anything to make the question clearer, or provide any additional information.
The Answer
Edit (several days after the fact):
I accepted an answer below to my question -- it was not an optimization bug in g++, I was just looking at the assembly code wrong.
However, for whoever may be viewing this question in the future, I've found the answer. I did some reading on undefined behavior in C ( http://blog.regehr.org/archives/213 and http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html ) and some of the descriptions of the compiler optimizing away functions with undefined behavior seemed eerily familiar.
I added some NULL-pointer checks to the function 'helper_function' and lo and behold... bug goes away. I should have had the NULL-pointer checks to begin with, but apparently not having them allowed g++ to do whatever it wanted (in my case, optimize away the call).
Hope this information helps someone down the road.

I think you are looking at the wrong thing. I imagine the compiler notice that your function is short and doesn't touch the %rdi register so it just leaves it alone (you have the same variable as the first parameter, which I guess is what is placed in %rdi. See page 21 here http://www.x86-64.org/documentation/abi.pdf)
If you look at the unoptimized version it saves the %rdi register on this line
9498: 48 89 bd 38 fe ff ff mov %rdi,-0x1c8(%rbp)
...and then later just before calling helper_function it moves the saved value into %rax that is moved into %rdi.
94c1: 48 8b 85 38 fe ff ff mov -0x1c8(%rbp),%rax
94c8: 48 89 d6 mov %rdx,%rsi
94cb: 48 89 c7 mov %rax,%rdi
When optimizing it the compiler just get rid of all that moving back and forth.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Extracting int array from compiled C++ code - c++

Is it possible for an attacker to get integer arrays from your compiled code? Like how the attacker can get strings from your code using the strings command.

Related

Is there a way to see what's inside a ".rodata+(memory location)" in an object file?

Why does C++ inline function has call instructions?

Why does vectorization fail?

odd compiled code

Is this an optimization bug in g++?

Categories

Resources