gdb - optimized value analysis


My CPU is ARM. How can I figure out the value of a function parameter if it has been optimized out?
For example:
status_t NuPlayer::GenericSource::setDataSource(
int fd, int64_t offset, int64_t length) {
resetDataSource();
mFd = dup(fd);
mOffset = offset;
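mLength = length;
// delay data source creation to prepareAsync() to avoid blocking
// the calling thread in setDataSource for any significant time.
return OK;
}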
The function above has three parameters. When I try to print the second parameter, offset, I get the following result:
Thread 4 "Binder:15082_3" hit Breakpoint 1, android::NuPlayer::GenericSource::setDataSource (this=0xae63bb40, fd=8, offset=<optimized out>, length=9384436) at frameworks/av/media/libmediaplayerservice/nuplayer/GenericSource.cpp:123
123 resetDataSource();
(gdb) x/i $pc
=> 0xb02aaa80 <android::NuPlayer::GenericSource::setDataSource(int, long long, long long)+12>: blx 0xb0282454 <_ZN7android8NuPlayer13GenericSource15resetDataSourceEv@plt>
(gdb) n
125 mFd = dup(fd);
(gdb) print offset
$1 = <optimized out>
(gdb) p $eax
$2 = void
(gdb) disassemble /m
Dump of assembler code for function android::NuPlayer::GenericSource::setDataSource(int, long long, long long):
122 int fd, int64_t offset, int64_t length) {
0xb02aaa74 <+0>: push {r4, r5, r6, r7, lr}
0xb02aaa76 <+2>: sub sp, #4
0xb02aaa78 <+4>: mov r4, r3
0xb02aaa7a <+6>: mov r5, r2
0xb02aaa7c <+8>: mov r6, r1
0xb02aaa7e <+10>: mov r7, r0
123 resetDataSource();
=> 0xb02aaa80 <+12>: blx 0xb0282454 <_ZN7android8NuPlayer13GenericSource15resetDataSourceEv@plt>
124
125 mFd = dup(fd);
0xb02aaa84 <+16>: mov r0, r6
0xb02aaa86 <+18>: blx 0xb027e5d8 <dup@plt>
0xb02aaa8a <+22>: ldrd r2, r1, [sp, #24]
0xb02aaa8e <+26>: str.w r0, [r7, #224] ; 0xe0
0xb02aaa92 <+30>: movs r0, #0
126 mOffset = offset;
0xb02aaa94 <+32>: strd r5, r4, [r7, #232] ; 0xe8
127 mLength = length;
0xb02aaa98 <+36>: strd r2, r1, [r7, #240] ; 0xf0
128
129 // delay data source creation to prepareAsync() to avoid blocking
130 // the calling thread in setDataSource for any significant time.
131 return OK;
0xb02aaa9c <+40>: add sp, #4
0xb02aaa9e <+42>: pop {r4, r5, r6, r7, pc}
End of assembler dump.
(gdb)
I guess it's in some register but the result of $eax is void.

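There is no register called eax on ARM. eax is an x86 register.
To know which register a parameter is in, you need to know the calling convention.
It looks like you are using 32-bit ARM, where the AAPCS (Procedure Call Standard for the Arm Architecture) says:
r0 to r3: used to hold argument values passed to a subroutine
So you should run info registers and verify that r0 == 0xae63bb40 (this) and r1 == 8 (fd). Because offset is a 64-bit int64_t, the AAPCS passes it in the even/odd register pair r2:r3, with the low word in r2 on little-endian ARM. length no longer fits in r0-r3, so it is passed on the stack. In this function the prologue also copies the arguments into callee-saved registers (mov r5, r2 and mov r4, r3). As a result, even after stepping over resetDataSource() you can still print offset with p ((long long)$r4 << 32) | (unsigned int)$r5.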
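The following minimal sketch shows how the two 32-bit halves combine back into the int64_t. It assumes a little-endian AAPCS target, where the low word of a 64-bit argument sits in the lower-numbered register of the pair. The helper name combine_reg_pair is hypothetical. In gdb you would apply the same arithmetic directly to $r4 (high) and $r5 (low).

```cpp
#include <cstdint>

// Hypothetical helper: rebuild a 64-bit AAPCS argument from its register pair.
// On little-endian 32-bit ARM the low word is in the lower-numbered register
// (r2 at entry, copied to r5 in this prologue) and the high word in the other
// (r3 at entry, copied to r4).
constexpr std::int64_t combine_reg_pair(std::uint32_t lo, std::uint32_t hi)
{
    // Move the high word into bits 63..32, then OR in the low word.
    return static_cast<std::int64_t>((static_cast<std::uint64_t>(hi) << 32) | lo);
}
```

For example, combine_reg_pair(value_of_r5, value_of_r4) yields the same number as the strd r5, r4, [r7, #232] store writes into mOffset.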

It looks like the code has already assigned the parameters to member variables. Once source lines 126 and 127 have executed, printing those members (p mOffset, p mLength) gives exactly the same values as the optimized-out parameters:
mOffset = offset;
mLength = length;

Related

Converting my C++ Program to ARM Assembly

I have been assigned to convert this program to ARMv8 assembly.
int power(int x, int y){
if (x == 0){
return 0;
}
else if (y < 0){
return 0;
}
else if (y == 0){
return 1;
}
else {
return x * power(x, y - 1);
}
}
However, I'm not very familiar with ARM assembly language and would like to know where to start.
I have tried researching this, but found very little about ARM on the internet.

The magic command is arm-linux-gnueabi-gcc -S -O2 -march=armv8-a power.c.
I used arm-linux-gnueabi-gcc because I work on an x86-64 machine and my gcc does not have ARM targets available. If you are on an ARM system, you should be able to use regular gcc instead. If that doesn't work, gcc will just report an error, so no harm done.
-S tells gcc to output assembly.
The -O2 is optional. It optimizes the code and reduces debug clutter in the result.
-march=armv8-a tells gcc to target ARMv8 while compiling. I chose armv8-a somewhat arbitrarily. According to the docs, the ARMv8 options are armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a, armv8.5-a, armv8.6-a, armv8-m.base, armv8-m.main, and armv8.1-m.main. I have no idea what the differences are, so you may want to choose a different one.
Note that arm-linux-gnueabi-gcc generates 32-bit (AArch32) code even with an ARMv8 -march, which is why the output below uses r0-r3. If your assignment expects 64-bit AArch64 code (x and w registers), use aarch64-linux-gnu-gcc from the gcc-aarch64-linux-gnu package instead.
power.c just tells gcc which file to compile. Since we don't specify an output file (e.g. -o output.asm), the assembly is written to power.s.
If you are not compiling on an ARM machine whose regular gcc provides the desired target, you can use arm-linux-gnueabi-gcc instead. If you do not have it installed, you can install it with:
sudo apt-get update
sudo apt-get install gcc-arm-linux-gnueabi binutils-arm-linux-gnueabi
Output
If anyone is curious, this is the output I received when I tried it on my machine.
.arch armv8-a
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "testing.c"
.text
.align 2
.global power
.syntax unified
.arm
.fpu softvfp
.type power, %function
power:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
clz r2, r0
mov r3, r0
lsr r2, r2, #5
orrs r2, r2, r1, lsr #31
bne .L4
cmp r1, #0
mov r0, #1
bxeq lr
.L3:
subs r1, r1, #1
mul r0, r3, r0
bne .L3
bx lr
.L4:
mov r0, #0
bx lr
.size power, .-power
.ident "GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0"
.section .note.GNU-stack,"",%progbits
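The -O2 listing above no longer recurses. GCC has turned x * power(x, y - 1) into a loop. To help read it, here is a hypothetical C++ rendering of that listing placed next to the original function. The name power_loop is made up for this sketch, and the comments map each step to the instruction it is meant to mirror, as far as I read the listing. Unlike the assembly, signed overflow is undefined in the C++ version, so the two only agree for results that fit in an int.

```cpp
// The original recursive function from the question.
int power(int x, int y)
{
    if (x == 0) return 0;
    else if (y < 0) return 0;
    else if (y == 0) return 1;
    else return x * power(x, y - 1);
}

// Hypothetical C++ reading of the -O2 listing: the recursion became a loop.
int power_loop(int x, int y)
{
    // clz/lsr/orrs combine (x == 0) and (y < 0); bne .L4 then returns 0.
    if (x == 0 || y < 0)
        return 0;
    int result = 1;           // mov r0, #1
    if (y == 0)               // cmp r1, #0 / bxeq lr
        return result;
    do {                      // .L3:
        --y;                  // subs r1, r1, #1 (sets the flags used by bne)
        result = x * result;  // mul r0, r3, r0
    } while (y != 0);         // bne .L3
    return result;            // bx lr
}
```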
How can I get started?
Here is my thought process for how I was able to solve this problem. Looking at the problem I could tell it would probably be in two parts:
How can that code be compiled to assembly?
For the first part, I found this answer, which provided the command gcc -S -fverbose-asm -O2 foo.c. After testing it, I decided to drop -fverbose-asm because it only seemed to add clutter for a program this small.
How can I set the compiler target to ARM v8?
After a quick Google search, I found that gcc lets you specify the target architecture with -march=xxx. My next step was to find a list of ARM architectures to choose from. After finding gcc.gnu.org/onlinedocs/gcc/ARM-Options.html, I selected armv8-a because it sounded the most correct. When I tried it, gcc told me that the target architecture could not be found. This was not really a surprise: I am on x86-64, and compilers usually ship only with the targets compatible with the host to save space. I knew this probably meant I would need to find the apt package that provides the ARM targets, so I searched around until I found this answer, which filled in the rest of the information I needed.

Compiler Explorer is your friend in simple cases like this.
ARMv8-a Clang Assembly with the compiler option -O1 for keeping the recursion:
# Compilation provided by Compiler Explorer at https://godbolt.org/
power(int, int): // @power(int, int)
stp x29, x30, [sp, #-32]! // 16-byte Folded Spill
str x19, [sp, #16] // 8-byte Folded Spill
mov x29, sp
mov w19, w0
mov w0, wzr
cbz w19, .LBB0_5
tbnz w1, #31, .LBB0_5
cbz w1, .LBB0_4
sub w1, w1, #1
mov w0, w19
bl power(int, int)
mul w0, w0, w19
b .LBB0_5
.LBB0_4:
mov w0, #1
.LBB0_5:
ldr x19, [sp, #16] // 8-byte Folded Reload
ldp x29, x30, [sp], #32 // 16-byte Folded Reload
ret
ARM GCC (linux) Assembly with the compiler option -O1 for keeping the recursion:
# Compilation provided by Compiler Explorer at https://godbolt.org/
power(int, int):
push {r4, lr}
mov r4, r0
clz r0, r0
lsrs r0, r0, #5
orrs r3, r0, r1, lsr #31
it ne
movne r0, #0
beq .L6
.L1:
pop {r4, pc}
.L6:
movs r0, #1
cmp r1, #0
beq .L1
subs r1, r1, #1
mov r0, r4
bl power(int, int)
mul r0, r4, r0
b .L1
ARM GCC (none) Assembly with the compiler option -O1 for keeping the recursion:
# Compilation provided by Compiler Explorer at https://godbolt.org/
power(int, int):
push {r4, lr}
mov r4, r0
rsbs r0, r0, #1
movcc r0, #0
orrs r3, r0, r1, lsr #31
movne r0, #0
beq .L6
.L1:
pop {r4, lr}
bx lr
.L6:
cmp r1, #0
moveq r0, #1
beq .L1
sub r1, r1, #1
mov r0, r4
bl power(int, int)
mul r0, r4, r0
b .L1
ARM64 GCC Assembly with the compiler option -O1 for keeping the recursion:
# Compilation provided by Compiler Explorer at https://godbolt.org/
power(int, int):
cmp w1, 0
ccmp w0, 0, 4, ge
bne .L9
mov w0, 0
ret
.L9:
stp x29, x30, [sp, -32]!
mov x29, sp
str x19, [sp, 16]
mov w19, w0
mov w0, 1
cbnz w1, .L10
.L1:
ldr x19, [sp, 16]
ldp x29, x30, [sp], 32
ret
.L10:
sub w1, w1, #1
mov w0, w19
bl power(int, int)
mul w0, w19, w0
b .L1

Does armclang save all needed registers on the stack with attribute("IRQ")?

I'm working with Keil ARM Compiler 6.15 (armclang.exe) and I doubt the correctness of the generated assembler code.
It seems to me that the attribute 'interrupt("IRQ")' is ignored.
I would expect r1 and r2 to be saved on the stack, too.
When I remove the attribute 'used', the whole function is removed (optimization).
Can anyone see the mistake I made or what I've forgotten?
The code was originally written for gcc.
Attributes used for interrupt routines:
#define INTERRUPT_PROCEDURE __attribute__((interrupt("IRQ"),used,section(".IsrSection")))
#define ISR_VARIABLE __attribute__((section(".IsrSection")))
#define FAST_SHARED_DATA __attribute__((section(".FastSharedDataSection")))
C++ Code:
uint64_t volatile FAST_SHARED_DATA systick_value = uint64_t(0);
extern "C" {
void INTERRUPT_PROCEDURE SysTick_Handler()
{
systick_value++;
}
}
Assembler Code:
0x08001280 push {r4, r6, r7, lr}
0x08001282 add r7, sp, #8
0x08001284 mov r4, sp
0x08001286 bfc r4, #0, #3
0x0800128a mov sp, r4
0x0800128c movw r0, #8192 ; 0x2000
0x08001290 movt r0, #8192 ; 0x2000
0x08001294 ldrd r1, r2, [r0]
0x08001298 adds r1, #1
0x0800129a adc.w r2, r2, #0
0x0800129e strd r1, r2, [r0]
0x080012a2 sub.w r4, r7, #8
0x080012a6 mov sp, r4
0x080012a8 pop {r4, r6, r7, pc}
0x080012aa movs r0, r0
0x080012ac movs r0, r0
0x080012ae movs r0, r0

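You do not need this attribute on Cortex-M. On exception entry, the hardware automatically stacks R0-R3, R12, LR, PC and xPSR. That is why the compiler does not save r1 and r2 itself: an ordinary AAPCS function can serve directly as an interrupt handler.

The attribute is needed only in the rare case where the hardware does not align the stack to 8 bytes on exception entry (the STKALIGN bit is not set) and you are going to call functions with 64-bit parameters (like uint64_t). The mov r4, sp / bfc r4, #0, #3 / mov sp, r4 sequence in your listing is exactly that 8-byte realignment.

If you use the FPU, you may want to enable automatic stacking of the FPU registers as well.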
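Below is a minimal sketch of the handler written as a plain function, without interrupt("IRQ"). It assumes a Cortex-M target (the SysTick_Handler name and the 0x0800xxxx addresses suggest one) whose exception entry already aligns the stack to 8 bytes. The section attributes from the original macros are left out to keep it short. This illustrates the answer's point rather than a verified fix for the armclang build; on a host compiler it simply builds as an ordinary function.

```cpp
#include <cstdint>

// 64-bit tick counter, as in the question (section attribute omitted here).
volatile std::uint64_t systick_value = 0;

extern "C" {
// Plain AAPCS function: on Cortex-M the core has already stacked R0-R3, R12,
// LR, PC and xPSR before this runs, so no interrupt attribute is required.
// 'used' is kept, as in the original INTERRUPT_PROCEDURE macro.
void __attribute__((used)) SysTick_Handler()
{
    systick_value = systick_value + 1;  // 64-bit read-modify-write (ldrd/adds/adc/strd on ARM)
}
}
```

Writing the increment as an explicit assignment avoids the C++20 deprecation of ++ on volatile objects; the generated code is the same.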

Program OK outside debugger, SIGILL under debugger when stepping?

I'm trying to debug a program on a BeagleBone Black. Outside the debugger it produces an incorrect result, but no SIGILL. It also runs OK under the debugger without a breakpoint. However, it raises SIGILL when a breakpoint is set and I step. The program and library do not use SIGILL-based CPU feature probes, and I don't know what GDB is doing.
Under the debugger I am seeing:
(gdb) b main
Breakpoint 1 at 0x26f20: file test.cxx, line 22.
(gdb) r
Starting program: /home/cryptopp/test.exe
Breakpoint 1, main (argc=0x1, argv=0xbeffea54) at test.cxx:22
22 byte key[16] = {0};
(gdb) n
23 byte iv[12] = {0};
(gdb)
25 GCM<AES>::Encryption enc;
(gdb)
26 enc.SetKeyWithIV(key, 16, iv, 12);
(gdb)
28 std::string plain(0x00, 16);
(gdb)
Program received signal SIGILL, Illegal instruction.
0x00026d5c in std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
()
(gdb) n
Single stepping until exit from function _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_,
which has no line number information.
Program terminated with signal SIGILL, Illegal instruction.
The program no longer exists.
And:
(gdb) shell echo _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ | c++filt
std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
I tried searching for this issue, but I haven't been able to find a relevant hit because there is too much noise.
Why am I experiencing a SIGILL when GDB sets a breakpoint, and how do I work around it?
NEON is the problem I am trying to investigate. Here's the command line used for the program and library:
$ echo $CXXFLAGS
-DDEBUG -g3 -O0 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=hard
$ g++ $CXXFLAGS test.cxx ./libcryptopp.a -o test.exe
And:
$ gdb --version
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
$ uname -a
Linux beaglebone 4.1.15-ti-rt-r40 #1 SMP PREEMPT RT Thu Jan 7 23:32:08 UTC 2016 armv7l GNU/Linux
$ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 2 (v7l)
BogoMIPS : 996.14
Features : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc08
CPU revision : 2
Hardware : Generic AM33XX (Flattened Device Tree)
Revision : 0000
Serial : 0000000000000000
And:
Breakpoint 1, main (argc=0x1, argv=0xbeffea54) at test.cxx:22
22 byte key[16] = {0};
(gdb) n
23 byte iv[12] = {0};
(gdb)
25 GCM<AES>::Encryption enc;
(gdb)
26 enc.SetKeyWithIV(key, 16, iv, 12);
(gdb)
28 std::string plain(0x00, 16);
(gdb)
Program received signal SIGILL, Illegal instruction.
0x00026d5c in std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
()
(gdb) up
#1 0x00026f82 in main (argc=0x1, argv=0xbeffea54) at test.cxx:28
28 std::string plain(0x00, 16);
(gdb) disass
Dump of assembler code for function main(int, char**):
0x00026f10 <+0>: push {r4, r7, lr}
0x00026f12 <+2>: sub.w sp, sp, #916 ; 0x394
0x00026f16 <+6>: add r7, sp, #16
0x00026f18 <+8>: adds r3, r7, #4
0x00026f1a <+10>: str r0, [r3, #0]
0x00026f1c <+12>: mov r3, r7
0x00026f1e <+14>: str r1, [r3, #0]
0x00026f20 <+16>: add.w r3, r7, #692 ; 0x2b4
0x00026f24 <+20>: movs r2, #0
0x00026f26 <+22>: str r2, [r3, #0]
0x00026f28 <+24>: adds r3, #4
0x00026f2a <+26>: movs r2, #0
0x00026f2c <+28>: str r2, [r3, #0]
0x00026f2e <+30>: adds r3, #4
0x00026f30 <+32>: movs r2, #0
0x00026f32 <+34>: str r2, [r3, #0]
0x00026f34 <+36>: adds r3, #4
0x00026f36 <+38>: movs r2, #0
0x00026f38 <+40>: str r2, [r3, #0]
0x00026f3a <+42>: adds r3, #4
0x00026f3c <+44>: add.w r3, r7, #680 ; 0x2a8
0x00026f40 <+48>: movs r2, #0
0x00026f42 <+50>: str r2, [r3, #0]
0x00026f44 <+52>: adds r3, #4
0x00026f46 <+54>: movs r2, #0
0x00026f48 <+56>: str r2, [r3, #0]
0x00026f4a <+58>: adds r3, #4
0x00026f4c <+60>: movs r2, #0
0x00026f4e <+62>: str r2, [r3, #0]
0x00026f50 <+64>: adds r3, #4
0x00026f52 <+66>: add.w r3, r7, #240 ; 0xf0
0x00026f56 <+70>: mov r0, r3
0x00026f58 <+72>: bl 0x2a804 <CryptoPP::GCM_Final<CryptoPP::Rijndael, (CryptoPP::GCM_TablesOption)0, true>::GCM_Final()>
0x00026f5c <+76>: add.w r1, r7, #240 ; 0xf0
0x00026f60 <+80>: add.w r2, r7, #692 ; 0x2b4
0x00026f64 <+84>: add.w r4, r7, #680 ; 0x2a8
0x00026f68 <+88>: movs r3, #12
0x00026f6a <+90>: str r3, [sp, #0]
0x00026f6c <+92>: mov r0, r1
0x00026f6e <+94>: mov r1, r2
0x00026f70 <+96>: movs r2, #16
0x00026f72 <+98>: mov r3, r4
0x00026f74 <+100>: bl 0x2da0c <CryptoPP::SimpleKeyingInterface::SetKeyWithIV(unsigned char const*, unsigned int, unsigned char const*, unsigned int)>
0x00026f78 <+104>: add.w r3, r7, #708 ; 0x2c4
0x00026f7c <+108>: mov r0, r3
0x00026f7e <+110>: blx 0x26d58 <_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_+852>
=> 0x00026f82 <+114>: add.w r2, r7, #676 ; 0x2a4
0x00026f86 <+118>: add.w r3, r7, #708 ; 0x2c4
0x00026f8a <+122>: mov r0, r2
0x00026f8c <+124>: movs r1, #0
0x00026f8e <+126>: movs r2, #16
...

Thanks to @ks1322, I learned this is a known GDB/kernel bug. See "GDB crashes on debugging multithreaded program on ARM SMP dual core system" in the GDB issue tracker.
According to the Debian BTS, this is also a known issue. See "SIGILL when stepping through application on armhf" in the Debian BTS.
The bug was refiled in the hope that it might actually be fixed sometime in the next year or two. See "GDB Crash due to GDB/Kernel generated SIGILL".
This is why I despise Debian's bug reporting systems. Stuff gets reported and then it just rots. Nothing gets fixed.

Assembly, global variable

I have the following source code:
const ClassTwo g_classTwo;
void ClassOne::first()
{
g_classTwo.doSomething(1);
}
void ClassOne::second()
{
g_classTwo.doSomething(2);
}
This produces the following objdump output:
void ClassOne::first()
{
1089c50: e1a0c00d mov ip, sp
1089c54: e92dd800 push {fp, ip, lr, pc}
1089c58: e24cb004 sub fp, ip, #4
1089c5c: e24dd008 sub sp, sp, #8
1089c60: e50b0010 str r0, [fp, #-16]
g_classTwo.doSomething(1);
1089c64: e59f3014 ldr r3, [pc, #20] ; 1089c80 <ClassOne::first()+0x30>
1089c68: e08f3003 add r3, pc, r3
1089c6c: e1a00003 mov r0, r3
1089c70: e3a01001 mov r1, #1
1089c74: ebffffe2 bl 1089c04 <ClassTwo::doSomething(int) const>
}
1089c78: e24bd00c sub sp, fp, #12
1089c7c: e89da800 ldm sp, {fp, sp, pc}
1089c80: 060cd35c .word 0x060cd35c
01089c84 <ClassOne::second()>:
void ClassOne::second()
{
1089c84: e1a0c00d mov ip, sp
1089c88: e92dd800 push {fp, ip, lr, pc}
1089c8c: e24cb004 sub fp, ip, #4
1089c90: e24dd008 sub sp, sp, #8
1089c94: e50b0010 str r0, [fp, #-16]
g_classTwo.doSomething(2);
1089c98: e59f3014 ldr r3, [pc, #20] ; 1089cb4 <ClassOne::second()+0x30>
1089c9c: e08f3003 add r3, pc, r3
1089ca0: e1a00003 mov r0, r3
1089ca4: e3a01002 mov r1, #2
1089ca8: ebffffd5 bl 1089c04 <ClassTwo::doSomething(int) const>
}
1089cac: e24bd00c sub sp, fp, #12
1089cb0: e89da800 ldm sp, {fp, sp, pc}
1089cb4: 060cd328 .word 0x060cd328
Both methods load the address of g_classTwo through a PC-relative literal. The ldr r3, [pc, #20] instruction loads 0x060cd35c in the first method and 0x060cd328 in the second.
Why are the addresses different even though they are both addressing the same global variable?
How do those addresses relate to the nm output for the same symbol: 07156fcc b g_classTwo?

In ClassOne::first() you have:
1089c64: e59f3014 ldr r3, [pc, #20] ; 1089c80 <ClassOne::first()+0x30>
1089c68: e08f3003 add r3, pc, r3
1089c6c: e1a00003 mov r0, r3
...
1089c80: 060cd35c .word 0x060cd35c
In ClassOne::second() you have:
1089c98: e59f3014 ldr r3, [pc, #20] ; 1089cb4 <ClassOne::second()+0x30>
1089c9c: e08f3003 add r3, pc, r3
1089ca0: e1a00003 mov r0, r3
...
1089cb4: 060cd328 .word 0x060cd328
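In both, r0 is the this pointer (the address of g_classTwo). The literal pool holds an offset, not an address. After the offset is loaded into r3, the code adds pc to it to get r0. In ARM state, reading pc yields the address of the current instruction plus 8. So for the add at 0x1089c68, pc is 0x1089c70, and for the add at 0x1089c9c, it is 0x1089ca4.
In ClassOne::first(), you get r0 = pc + r3 = 0x01089c70 + 0x060cd35c = 0x07156fcc.
In ClassOne::second(), you get r0 = pc + r3 = 0x01089ca4 + 0x060cd328 = 0x07156fcc.
So for both, the this pointer is 0x07156fcc, which is the address of g_classTwo reported by nm. The two literals differ by 0x34 only because the two add instructions are 0x34 bytes apart.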
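You can check the arithmetic above with a small sketch. It assumes ARM (A32) state, where reading pc returns the current instruction's address plus 8. pc_relative_target is a hypothetical helper name; it takes the address of the add instruction and the literal-pool value.

```cpp
#include <cstdint>

// In ARM (A32) state, an instruction that reads pc sees its own address + 8.
// 'add r3, pc, r3' therefore computes: target = address_of_add + 8 + literal.
constexpr std::uint32_t pc_relative_target(std::uint32_t add_insn_addr,
                                           std::uint32_t literal)
{
    return add_insn_addr + 8u + literal;
}
```

With the values from the two listings, both calls resolve to the nm address of g_classTwo.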

Looking for unnecessary buffer copies in assembly code

I am using Visual Studio 2008 C++ for Windows Mobile 6 ARMV4I, and I'm trying to learn to read the ARM assembly code generated by VS so that I can minimize unnecessary buffer copies within an application. So I've created a test application that looks like this:
#include <windows.h> // BYTE
#include <tchar.h> // _tmain, _TCHAR
#include <vector>
typedef std::vector< BYTE > Buf;
class Foo
{
public:
Foo( Buf b ) { b_.swap( b ); };
private:
Buf b_;
};
Buf Create()
{
Buf b( 1024 );
b[ 0 ] = 0x0001;
return b;
}
int _tmain( int argc, _TCHAR* argv[] )
{
Foo f( Create() );
return 0;
}
I'd like to understand if the buffer returned by Create is copied when given to the Foo constructor or if the compiler is able to optimize that copy away. In the Release build with optimizations turned on, this generates assembly like this:
class Foo
{
public:
Foo( Buf b ) { b_.swap( b ); };
0001112C stmdb sp!, {r4 - r7, lr}
00011130 mov r7, r0
00011134 mov r3, #0
00011138 str r3, this
0001113C str r3, [r7, #4]
00011140 str r3, [r7, #8]
00011144 ldr r3, this
00011148 ldr r2, this
0001114C mov r5, r7
00011150 mov r4, r1
00011154 str r3, this, #4
00011158 str r2, this, #4
0001115C mov r6, r1
00011160 ldr r2, this
00011164 ldr r3, this
00011168 mov lr, r7
0001116C str r3, this
00011170 str r2, this
00011174 ldr r2, [lr, #8]!
00011178 ldr r3, [r6, #8]!
0001117C str r3, this
00011180 str r2, this
00011184 ldr r3, this
00011188 movs r0, r3
0001118C beq |Foo::Foo + 0x84 ( 111b0h )|
00011190 ldr r3, [r1, #8]
00011194 sub r1, r3, r0
00011198 cmp r1, #0x80
0001119C bls |Foo::Foo + 0x80 ( 111ach )|
000111A0 bl 000112D4
000111A4 mov r0, r7
000111A8 ldmia sp!, {r4 - r7, pc}
000111AC bl |stlp_std::__node_alloc::_M_deallocate ( 11d2ch )|
000111B0 mov r0, r7
000111B4 ldmia sp!, {r4 - r7, pc}
--- ...\stlport\stl\_vector.h -----------------------------
// snip!
--- ...\asm_test.cpp
private:
Buf b_;
};
Buf Create()
{
00011240 stmdb sp!, {r4, lr}
00011244 mov r4, r0
Buf b( 1024 );
00011248 mov r1, #1, 22
0001124C bl |
b[ 0 ] = 0x0001;
00011250 ldr r3, [r4]
00011254 mov r2, #1
return b;
}
int _tmain( int argc, _TCHAR* argv[] )
{
00011264 str lr, [sp, #-4]!
00011268 sub sp, sp, #0x18
Foo f( Create() );
0001126C add r0, sp, #0xC
00011270 bl |Create ( 11240h )|
00011274 mov r1, r0
00011278 add r0, sp, #0
0001127C bl |Foo::Foo ( 1112ch )|
return 0;
00011280 ldr r0, argc
00011284 cmp r0, #0
00011288 beq |wmain + 0x44 ( 112a8h )|
0001128C ldr r3, [sp, #8]
00011290 sub r1, r3, r0
00011294 cmp r1, #0x80
00011298 bls |wmain + 0x40 ( 112a4h )|
0001129C bl 000112D4
000112A0 b |wmain + 0x44 ( 112a8h )|
000112A4 bl |stlp_std::__node_alloc::_M_deallocate ( 11d2ch )|
000112A8 mov r0, #0
}
What patterns can I look for in the assembly code to understand where the Buf structure is being copied?

Analyzing Create is fairly straightforward, because the code is so short. NRVO has clearly been applied here: the return statement generated no instructions, because the return value is constructed in place at the address the caller passes in r0 (add r0, sp, #0xC just before bl Create in _tmain).
The copy that would take place for Foo::Foo's pass-by-value parameter is slightly harder to analyze. However, there's very little code between the calls to Create and Foo::Foo (just mov r1, r0) where the copy would have to take place, and nothing that would do a deep copy of a std::vector. So it looks like that copy has been eliminated as well. The other possibility is a custom calling convention for Foo::Foo, where the argument is actually passed by reference and copied inside the function. You'd need someone more capable of deep ARM assembly analysis than I am to rule that out.
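Here is a host-side sketch of the same pattern that counts deep copies instead of reading assembly. It assumes a C++11-or-later compiler. VS2008 is C++03, where this elision was allowed but not required, so this does not prove what that compiler does. CountingBuf is a hypothetical stand-in for Buf. Passing the prvalue returned by Create() to a by-value parameter should record zero copies: C++17 guarantees that elision, and in earlier standards a missed elision falls back to the move constructor.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for Buf that counts how often it is deep-copied.
struct CountingBuf {
    static int copies;  // number of copy constructions observed
    std::vector<unsigned char> data;

    explicit CountingBuf(std::size_t n = 0) : data(n) {}
    CountingBuf(const CountingBuf& other) : data(other.data) { ++copies; }
    CountingBuf(CountingBuf&& other) noexcept : data(std::move(other.data)) {}
    void swap(CountingBuf& other) noexcept { data.swap(other.data); }
};
int CountingBuf::copies = 0;

class Foo {
public:
    // Pass-by-value parameter as in the question; swap itself copies nothing.
    explicit Foo(CountingBuf b) { b_.swap(b); }
    std::size_t size() const { return b_.data.size(); }
private:
    CountingBuf b_;
};

CountingBuf Create()
{
    CountingBuf b(1024);
    b.data[0] = 0x01;
    return b;  // NRVO, or failing that an implicit move; never a copy
}
```

Usage: Foo f(Create()); should leave CountingBuf::copies at 0, whereas constructing a CountingBuf from a named lvalue increments it.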

The buffer will be copied: you are using C++'s pass-by-value semantics, and no compiler will optimize that for you. How it's copied will depend on the copy constructor of std::vector.
