My problem is the following. I have a code unit composed of various c files, say for example
file1.c
file2.c
file3.c
that are all compiled with GCC into a single object, "object.o", which is then linked with other objects to produce the executable "application.out", running on VxWorks.
Since I'm unit testing "object.o", I need to exercise every possible path through the code. Specifically, there are situations where mock functions should be executed instead of the original ones in order to simulate error conditions.
Suppose, for example, that I'm testing a function "func_caller" that, at some point in its execution, calls another function "func_called" (declared as static).
Since I DON'T WANT TO MODIFY THE ORIGINAL CODE, I wonder whether there is a way to manipulate the instruction pointers so that when "func_called" is called, a mock function "func_called_mock" is actually executed instead, without the caller "func_caller" noticing anything.
Thanks in advance.
The most direct method of overriding function calls would be to use VxWorks's load time linking. Consider the following source:
file1.c:
#include <stdio.h>
int function1 (void);
int function1 ()
{
printf ("function1 called\n");
return 1;
}
file2.c:
#include <stdio.h>
int function2 (void);
int function2 ()
{
printf ("function2 called\n");
return 2;
}
file3.c:
int function1 (void);
int function2 (void);
int function3 (void);
int function3 ()
{
function1 ();
function2 ();
return 0;
}
mock.c:
#include <stdio.h>
int function1 (void);
int function2 (void);
int function1 ()
{
printf ("mock function1 called\n");
return 1;
}
int function2 ()
{
printf ("mock function2 called\n");
return 2;
}
When you load an object, its functions are added to the global symbol table.
-> ld < file1.o
value = 273740816 = 0x1050f410
-> lkup "function"
function1 0x108b0000 text (file1.o)
value = 0 = 0x0
->
When you load an object that uses functions already in the symbol table, each call will be immediately resolved to the last address associated with that symbol in the table.
-> ld < file2.o
value = 292535232 = 0x116fbbc0
-> ld < file3.o
value = 292537592 = 0x116fc4f8
-> lkup "function"
function1 0x108b0000 text (file1.o)
function2 0x108d0000 text (file2.o)
function3 0x108f0000 text (file3.o)
value = 0 = 0x0
-> l function3
function3:
0x108f0000 55 PUSH EBP
0x108f0001 89 e5 MOV EBP, ESP
0x108f0003 56 PUSH ESI
0x108f0004 57 PUSH EDI
0x108f0005 e8 f6 ff fb ff CALL function1
0x108f000a e8 f1 ff fd ff CALL function2
0x108f000f 31 c0 XOR EAX, EAX
0x108f0011 5f POP EDI
0x108f0012 5e POP ESI
0x108f0013 89 ec MOV ESP, EBP
value = 0 = 0x0
-> function3
function1 called
function2 called
value = 0 = 0x0
->
Although l() helpfully displays function names, no symbol references are stored in memory with the object. Instead, each call is resolved at load time to the address most recently associated with that name. So a previously loaded function may be overridden by loading another function of the same name before the caller is loaded.
-> unld "file3.o"
value = 0 = 0x0
-> ld < mock.o
value = 292537592 = 0x116fc4f8
-> ld < file3.o
value = 292539496 = 0x116fcc68
-> lkup "function"
function1 0x108f0000 text (mock.o)
function1 0x108b0000 text (file1.o)
function2 0x108f0020 text (mock.o)
function2 0x108d0000 text (file2.o)
function3 0x10910000 text (file3.o)
value = 0 = 0x0
-> l function3
function3:
0x10910000 55 PUSH EBP
0x10910001 89 e5 MOV EBP, ESP
0x10910003 56 PUSH ESI
0x10910004 57 PUSH EDI
0x10910005 e8 f6 ff fd ff CALL function1
0x1091000a e8 11 00 fe ff CALL function2
0x1091000f 31 c0 XOR EAX, EAX
0x10910011 5f POP EDI
0x10910012 5e POP ESI
0x10910013 89 ec MOV ESP, EBP
value = 0 = 0x0
-> function3
mock function1 called
mock function2 called
value = 0 = 0x0
->
Note that for this method to work, the called and calling functions cannot be compiled into the same object; in particular, it cannot redirect a call to a static function, which is not exported from its own object. You might also note that the call targets in the disassembly don't match the addresses in the symbol table. This is the result of executing the above in VxSim. The VxSim loader actually calls the loader of the underlying operating system, so these addresses don't match those in the symbol table, and the assembly reflects the underlying Pentium architecture on which WorkBench is being run.
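The binding rule described above can be modeled in a few lines of Python. This is only a toy model, not VxWorks code: `SymbolTable`, `load`, `resolve` and `load_function3` are hypothetical names invented for the illustration. It shows the two points that matter: the newest definition of a name wins, and a caller keeps whatever it was bound to when it was loaded.

```python
# Toy model of load-time linking. NOT VxWorks code, just the binding rule.

class SymbolTable:
    """Maps a symbol name to every definition loaded so far, newest last."""

    def __init__(self):
        self._defs = {}

    def load(self, module, symbols):
        """Register the functions exported by a (hypothetical) object module."""
        for name, func in symbols.items():
            self._defs.setdefault(name, []).append((module, func))

    def resolve(self, name):
        """Return the most recently loaded definition of name."""
        return self._defs[name][-1][1]


def load_function3(table):
    """'Load' file3.o: bind its two calls once, to the current winners."""
    f1 = table.resolve("function1")
    f2 = table.resolve("function2")

    def function3():
        return [f1(), f2()]

    return function3


table = SymbolTable()
table.load("file1.o", {"function1": lambda: "function1 called"})
table.load("file2.o", {"function2": lambda: "function2 called"})
original = load_function3(table)   # bound to the real functions

table.load("mock.o", {
    "function1": lambda: "mock function1 called",
    "function2": lambda: "mock function2 called",
})
reloaded = load_function3(table)   # bound to the mocks

print(original())   # still calls the real functions
print(reloaded())   # calls the mocks
```

In the model, as in the shell session, the caller has to be loaded again after the mocks before it picks them up.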
A function call may also be overridden by directly manipulating the address to be called in memory. This method is going to be implementation dependent. Below, this is demonstrated for source compiled for PPC using the gcc -mlongcall option. This has been run on an actual target, not VxSim.
-> ld < file1.o
value = 33538216 = 0x1ffc0a8 = function1 + 0x498
-> ld < file2.o
value = 33548336 = 0x1ffe830 = function2 + 0x80
-> ld < mock.o
value = 33549600 = 0x1ffed20 = function2 + 0x570
-> ld < file3.o
value = 33550744 = 0x1fff198 = function2 + 0x9e8
->
-> lkup "function"
function1 0x01ffbef8 text (mock.o)
function1 0x01ffbc10 text (file1.o)
function2 0x01ffbf58 text (mock.o)
function2 0x01ffe7b0 text (file2.o)
function3 0x01ffe558 text (file3.o)
value = 0 = 0x0
->
-> function3
mock function1 called
mock function2 called
value = 0 = 0x0
->
-> l function3
function3:
0x1ffe558 9421ffe8 stwu r1,-24(r1)
0x1ffe55c 7c0802a6 mfspr r0,LR
0x1ffe560 93a1000c stw r29,12(r1)
0x1ffe564 93c10010 stw r30,16(r1)
0x1ffe568 93e10014 stw r31,20(r1)
0x1ffe56c 9001001c stw r0,28(r1)
0x1ffe570 7c3f0b78 or r31,r1,r1
0x1ffe574 3d200200 lis r9,512
0x1ffe578 3ba9bef8 addi r29,r9,-16648
0x1ffe57c 7fa803a6 mtspr LR,r29
value = 33547648 = 0x1ffe580 = function3 + 0x28
->
-> *0x1ffe578
function3 + 0x20 = 0x1ffe578: value = 1000980216 = 0x3ba9bef8
-> *0x1ffe578 = 0x3ba9bc10
function3 + 0x20 = 0x1ffe578: value = 1000979472 = 0x3ba9bc10
-> *0x1ffe578
function3 + 0x20 = 0x1ffe578: value = 1000979472 = 0x3ba9bc10
->
-> function3
function1 called
mock function2 called
value = 0 = 0x0
->
Obviously, directly manipulating pointers in memory quickly becomes tedious. Additionally, memory protection will prevent you from changing RTPs or objects loaded in VxSim (which is why I ran this on actual hardware). I mention the possibility primarily because it seems to best match what your question asks for.
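For anyone wondering where the patched word 0x3ba9bc10 in the session above comes from, here is the arithmetic as a small Python sketch. It relies only on the usual PowerPC rule that `lis` places its immediate in the upper 16 bits and `addi` adds a sign-extended 16-bit immediate; the helper names are made up for this illustration, and the sketch assumes the `lis r9,512` instruction is left untouched.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lis_addi_target(lis_imm, addi_word):
    """Address built by `lis rX,lis_imm` followed by the given addi instruction word."""
    high = (lis_imm << 16) & 0xFFFFFFFF
    low = sign_extend16(addi_word)   # the immediate is the low 16 bits of the word
    return (high + low) & 0xFFFFFFFF


def patch_addi(addi_word, target, lis_imm):
    """Return the addi word with its immediate changed so the pair yields target."""
    low = (target - (lis_imm << 16)) & 0xFFFF
    if lis_addi_target(lis_imm, low) != target:
        raise ValueError("target not reachable without changing the lis immediate")
    return (addi_word & 0xFFFF0000) | low


# Values taken from the session above.
print(hex(lis_addi_target(512, 0x3BA9BEF8)))         # 0x1ffbef8 (function1 in mock.o)
print(hex(patch_addi(0x3BA9BEF8, 0x01FFBC10, 512)))  # 0x3ba9bc10 (points at file1.o)
```

The first line reproduces the mock address from lkup, and the second reproduces the word that was written to 0x1ffe578.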
Finally, for non-trivial unit testing, you may want to consider a tool designed specifically for the task. Try a search on "vxworks unit test framework". I don't have deep experience with any particular tool (and don't want to come across spammy). Perhaps, someone else here can provide a good suggestion.
I have a broadly used function foo(int a, int b) and I want to provide a special version of foo that performs differently if a is say 1.
a) I don't want to go through the whole code base and change all occurrences of foo(1, b) to foo1(b), because the rules on arguments may change and I don't want to keep going through the code base whenever they do.
b) I don't want to burden function foo with an "if (a == 1)" test because of performance issues.
It seems to me to be a fundamental skill of the compiler to call the right code based on what it can see in front of it. Or is this a missing feature of C++ that currently requires macros or something similar to handle?
Simply write
inline int foo(int a, int b)
{
    if (a == 1) {
        // skip complex code and call easy code
        return call_easy(b);
    } else {
        // complex code here
        return do_complex(a, b);
    }
}
When you call
foo(1, 10);
the optimizer will (or at least should) simply emit a call to call_easy(10).
Any decent optimizer will inline the function and detect if the function has been called with a==1. Also I think that the entire constexpr mentioned in other posts is nice, but not really necessary in your case. constexpr is very useful, if you want to resolve values at compile time. But you simply asked to switch code paths based on a value at runtime. The optimizer should be able to detect that.
In order to detect that, the optimizer needs to see your function definition at all places where your function is called. Hence the inline requirement - although compilers such as Visual Studio have a "generate code at link time" feature, that reduces this requirement somewhat.
Finally, you might want to look at the C++20 attributes [[likely]] and [[unlikely]]. I haven't worked with them yet, but they are meant to tell the compiler which execution path is more likely, as a hint to the optimizer.
And why don't you experiment a little and look at the generated code in the debugger/disassemble. That will give you a feel for the optimizer. Don't forget that the optimizer is likely only active in Release Builds :)
Templates work at compile time, and you want to decide at run time, which is never possible. If and only if you really can call your function with constexpr values, then you can change to a template, but the call becomes foo<1,2>() instead of foo(1,2). "Performance issues"... that's really funny! If that single compare instruction is the performance problem... yes, then you have done everything super perfectly :-)
BTW: If you already call with constexpr values and the function is visible in the compilation unit, you can be sure the compiler already knows to optimize it away...
But there is another way to handle such things if you really have constexpr values sometimes and your algorithm inside the function can be constexpr evaluated. In that case, you can decide inside the function if your function was called in a constexpr context. If that is the case, you can do a full compile time algorithm which also can contain your if ( a== 1) which will be fully evaluated in compile time. If the function is not called in constexpr context, the function is running as before without any additional overhead.
To make such a decision at compile time we need the current C++ standard (C++20)!
#include <iostream>
#include <type_traits>
constexpr int foo( int a, int)
{
if (std::is_constant_evaluated() )
{ // this part is fully evaluated in compile time!
if ( a == 1 )
{
return 1;
}
else
{
return 2;
}
}
else
{ // and the rest runs as before in runtime
if ( a == 0 )
{
return 3;
}
else
{
return 4;
}
}
}
int main()
{
constexpr int res1 = foo( 1,0 ); // fully evaluated during compile time
constexpr int res2 = foo( 2,0 ); // also full compile time
std::cout << res1 << std::endl;
std::cout << res2 << std::endl;
std::cout << foo( 5, 0) << std::endl; // here we go in runtime
std::cout << foo( 0, 0) << std::endl; // here we go in runtime
}
That code will return:
1
2
4
3
So we do not need to go with classic templates, no need to change the rest of the code but have full compile time optimization if possible.
@Sebastian's suggestion works, at least in the simple case, at all optimisation levels except -O0 with g++ 9.3.0 on Ubuntu 20.04 in C++20 mode. Thanks again.
See below disassembly always calling directly the correct subfunction func1 or func2 instead of the top function func(). A similar disassembly after -O0 shows only the top level func() being called leaving the decision to run-time which is not desired.
I hope this will work in production code and perhaps with multiple hard coded arguments.
Breakpoint 1, main () at p1.cpp:24
24 int main() {
(gdb) disass /m
Dump of assembler code for function main():
6 inline void func(int a, int b) {
7
8 if (a == 1)
9 func1(b);
10 else
11 func2(a,b);
12 }
13
14 void func1(int b) {
15 std::cout << "func1 " << " " << " " << b << std::endl;
16 }
17
18 void func2(int a, int b) {
19 std::cout << "func2 " << a << " " << b << std::endl;
20 }
21
22 };
23
24 int main() {
=> 0x0000555555555286 <+0>: endbr64
0x000055555555528a <+4>: push %rbp
0x000055555555528b <+5>: push %rbx
0x000055555555528c <+6>: sub $0x18,%rsp
0x0000555555555290 <+10>: mov $0x28,%ebp
0x0000555555555295 <+15>: mov %fs:0x0(%rbp),%rax
0x000055555555529a <+20>: mov %rax,0x8(%rsp)
0x000055555555529f <+25>: xor %eax,%eax
25
26 X x1;
27
28 int b=1;
29 x1.func(1,b);
0x00005555555552a1 <+27>: lea 0x7(%rsp),%rbx
0x00005555555552a6 <+32>: mov $0x1,%esi
0x00005555555552ab <+37>: mov %rbx,%rdi
0x00005555555552ae <+40>: callq 0x55555555531e <X::func1(int)>
30
31 b=2;
32 x1.func(2,b);
0x00005555555552b3 <+45>: mov $0x2,%edx
0x00005555555552b8 <+50>: mov $0x2,%esi
0x00005555555552bd <+55>: mov %rbx,%rdi
0x00005555555552c0 <+58>: callq 0x5555555553de <X::func2(int, int)>
33
34 b=3;
35 x1.func(1,b);
0x00005555555552c5 <+63>: mov $0x3,%esi
0x00005555555552ca <+68>: mov %rbx,%rdi
0x00005555555552cd <+71>: callq 0x55555555531e <X::func1(int)>
36
37 b=4;
38 x1.func(2,b);
0x00005555555552d2 <+76>: mov $0x4,%edx
0x00005555555552d7 <+81>: mov $0x2,%esi
0x00005555555552dc <+86>: mov %rbx,%rdi
0x00005555555552df <+89>: callq 0x5555555553de <X::func2(int, int)>
39
40 return 0;
0x00005555555552e4 <+94>: mov 0x8(%rsp),%rax
0x00005555555552e9 <+99>: xor %fs:0x0(%rbp),%rax
0x00005555555552ee <+104>: jne 0x5555555552fc <main()+118>
0x00005555555552f0 <+106>: mov $0x0,%eax
0x00005555555552f5 <+111>: add $0x18,%rsp
0x00005555555552f9 <+115>: pop %rbx
0x00005555555552fa <+116>: pop %rbp
0x00005555555552fb <+117>: retq
0x00005555555552fc <+118>: callq 0x555555555100 <__stack_chk_fail@plt>
End of assembler dump.
Very simply, what is tail-call optimization?
More specifically, what are some small code snippets where it could be applied, and where not, with an explanation of why?
Tail-call optimization is where you are able to avoid allocating a new stack frame for a function because the calling function will simply return the value that it gets from the called function. The most common use is tail-recursion, where a recursive function written to take advantage of tail-call optimization can use constant stack space.
Scheme is one of the few programming languages that guarantee in the spec that any implementation must provide this optimization, so here are two examples of the factorial function in Scheme:
(define (fact x)
(if (= x 0) 1
(* x (fact (- x 1)))))
(define (fact x)
(define (fact-tail x accum)
(if (= x 0) accum
(fact-tail (- x 1) (* x accum))))
(fact-tail x 1))
The first function is not tail recursive because when the recursive call is made, the function needs to keep track of the multiplication it needs to do with the result after the call returns. As such, the stack looks as follows:
(fact 3)
(* 3 (fact 2))
(* 3 (* 2 (fact 1)))
(* 3 (* 2 (* 1 (fact 0))))
(* 3 (* 2 (* 1 1)))
(* 3 (* 2 1))
(* 3 2)
6
In contrast, the stack trace for the tail recursive factorial looks as follows:
(fact 3)
(fact-tail 3 1)
(fact-tail 2 3)
(fact-tail 1 6)
(fact-tail 0 6)
6
As you can see, we only need to keep track of the same amount of data for every call to fact-tail because we are simply returning the value we get right through to the top. This means that even if I were to call (fact 1000000), I need only the same amount of space as (fact 3). This is not the case with the non-tail-recursive fact, and as such large values may cause a stack overflow.
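To make the "same amount of data at every step" point concrete, here is a small Python sketch (Python is used only for illustration; the generator name is made up). It yields the pair (x, accum) that fact-tail holds at each call, which is all the state the computation ever needs.

```python
def fact_tail_trace(x):
    """Yield the (x, accum) pair at each step of the accumulator-style factorial."""
    accum = 1
    while True:
        yield (x, accum)
        if x == 0:
            return
        x, accum = x - 1, x * accum


print(list(fact_tail_trace(3)))  # [(3, 1), (2, 3), (1, 6), (0, 6)]
```

The pairs match the (fact-tail ...) lines of the trace above, and the final accumulator is the result.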
Let's walk through a simple example: the factorial function implemented in C.
We start with the obvious recursive definition
unsigned fac(unsigned n)
{
if (n < 2) return 1;
return n * fac(n - 1);
}
A function ends with a tail call if the last operation before the function returns is another function call. If this call invokes the same function, it is tail-recursive.
Even though fac() looks tail-recursive at first glance, it is not as what actually happens is
unsigned fac(unsigned n)
{
if (n < 2) return 1;
unsigned acc = fac(n - 1);
return n * acc;
}
i.e., the last operation is the multiplication and not the function call.
However, it's possible to rewrite fac() to be tail-recursive by passing the accumulated value down the call chain as an additional argument and passing only the final result up again as the return value:
unsigned fac_tailrec(unsigned acc, unsigned n); /* defined below */
unsigned fac(unsigned n)
{
return fac_tailrec(1, n);
}
unsigned fac_tailrec(unsigned acc, unsigned n)
{
if (n < 2) return acc;
return fac_tailrec(n * acc, n - 1);
}
Now, why is this useful? Because we immediately return after the tail call, we can discard the previous stackframe before invoking the function in tail position, or, in case of recursive functions, reuse the stackframe as-is.
The tail-call optimization transforms our recursive code into
unsigned fac_tailrec(unsigned acc, unsigned n)
{
TOP:
if (n < 2) return acc;
acc = n * acc;
n = n - 1;
goto TOP;
}
This can be inlined into fac() and we arrive at
unsigned fac(unsigned n)
{
unsigned acc = 1;
TOP:
if (n < 2) return acc;
acc = n * acc;
n = n - 1;
goto TOP;
}
which is equivalent to
unsigned fac(unsigned n)
{
unsigned acc = 1;
for (; n > 1; --n)
acc *= n;
return acc;
}
As we can see here, a sufficiently advanced optimizer can replace tail-recursion with iteration, which is far more efficient as you avoid function call overhead and only use a constant amount of stack space.
TCO (Tail Call Optimization) is the process by which a smart compiler can make a call to a function and take no additional stack space. The only situation in which this happens is if the last instruction executed in a function f is a call to a function g (Note: g can be f). The key here is that f no longer needs stack space - it simply calls g and then returns whatever g would return. In this case the optimization can be made that g just runs and returns whatever value it would have to the thing that called f.
This optimization can make recursive calls take constant stack space, rather than explode.
Example: this factorial function is not TCOptimizable:
from dis import dis
def fact(n):
if n == 0:
return 1
return n * fact(n-1)
dis(fact)
2 0 LOAD_FAST 0 (n)
2 LOAD_CONST 1 (0)
4 COMPARE_OP 2 (==)
6 POP_JUMP_IF_FALSE 12
3 8 LOAD_CONST 2 (1)
10 RETURN_VALUE
4 >> 12 LOAD_FAST 0 (n)
14 LOAD_GLOBAL 0 (fact)
16 LOAD_FAST 0 (n)
18 LOAD_CONST 2 (1)
20 BINARY_SUBTRACT
22 CALL_FUNCTION 1
24 BINARY_MULTIPLY
26 RETURN_VALUE
This function does things besides call another function in its return statement.
The function below is TCOptimizable:
def fact_h(n, acc):
if n == 0:
return acc
return fact_h(n-1, acc*n)
def fact(n):
return fact_h(n, 1)
dis(fact)
2 0 LOAD_GLOBAL 0 (fact_h)
2 LOAD_FAST 0 (n)
4 LOAD_CONST 1 (1)
6 CALL_FUNCTION 2
8 RETURN_VALUE
This is because the last thing to happen in any of these functions is to call another function.
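Being optimizable is not the same as being optimized: CPython does not eliminate tail calls, so even the tail-recursive form still consumes one frame per call. The sketch below (function names are made up for the illustration) shows the tail-recursive version hitting the interpreter's recursion limit while a hand-written loop with the same logic does not.

```python
import sys


def fac_tailrec(acc, n):
    """Tail-recursive form: the recursive call is the last operation."""
    if n < 2:
        return acc
    return fac_tailrec(n * acc, n - 1)


def fac_loop(n):
    """The same computation with the tail call rewritten as a loop by hand."""
    acc = 1
    while n >= 2:
        acc, n = n * acc, n - 1
    return acc


# Both forms agree on small inputs.
assert all(fac_tailrec(1, n) == fac_loop(n) for n in range(50))

# Go deeper than the interpreter allows for nested calls.
deep = sys.getrecursionlimit() * 2
try:
    fac_tailrec(1, deep)
    outcome = "completed"
except RecursionError:
    outcome = "RecursionError"

print(outcome)               # the recursive form runs out of frames
print(fac_loop(deep) > 0)    # the loop handles the same input
```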
Probably the best high level description I have found for tail calls, recursive tail calls and tail call optimization is the blog post
"What the heck is: A tail call"
by Dan Sugalski. On tail call optimization he writes:
Consider, for a moment, this simple function:
sub foo (int a) {
a += 15;
return bar(a);
}
So, what can you, or rather your language compiler, do? Well, what it can do is turn code of the form return somefunc(); into the low-level sequence pop stack frame; goto somefunc();. In our example, that means before we call bar, foo cleans itself up and then, rather than calling bar as a subroutine, we do a low-level goto operation to the start of bar. Foo's already cleaned itself out of the stack, so when bar starts it looks like whoever called foo has really called bar, and when bar returns its value, it returns it directly to whoever called foo, rather than returning it to foo which would then return it to its caller.
And on tail recursion:
Tail recursion happens if a function, as its last operation, returns
the result of calling itself. Tail recursion is easier to deal with
because rather than having to jump to the beginning of some random
function somewhere, you just do a goto back to the beginning of
yourself, which is a darned simple thing to do.
So that this:
sub foo (int a, int b) {
if (b == 1) {
return a;
} else {
return foo(a*a + a, b - 1);
}
}
gets quietly turned into:
sub foo (int a, int b) {
label:
if (b == 1) {
return a;
} else {
a = a*a + a;
b = b - 1;
goto label;
}
}
What I like about this description is how succinct and easy it is to grasp for those coming from an imperative language background (C, C++, Java)
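In a language whose implementation does not do this for you, the "clean up, then goto" idea from the quote can be imitated with a trampoline: instead of making the tail call, a function hands back a zero-argument callable describing it, and a small driver loop runs these one after another. The following Python sketch is my own illustration, not something from the blog post; `trampoline`, `is_even` and `is_odd` are hypothetical names, and the approach assumes the final result is never itself callable.

```python
def trampoline(func, *args):
    """Call func, then keep calling whatever zero-argument callable it hands back."""
    result = func(*args)
    while callable(result):      # assumes real results are never callable
        result = result()
    return result


def bar(a):
    return a * 2


def foo(a):
    a += 15
    return lambda: bar(a)        # hand the tail call back instead of making it


def is_even(n):
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))


print(trampoline(foo, 1))           # 32
print(trampoline(is_even, 100000))  # True, without deep recursion
```

Each function returns before the next one starts, so the stack never holds more than a couple of frames, even for tail calls between different functions.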
GCC C minimal runnable example with x86 disassembly analysis
Let's see how GCC can automatically do tail call optimizations for us by looking at the generated assembly.
This will serve as an extremely concrete example of what was mentioned in other answers such as https://stackoverflow.com/a/9814654/895245 that the optimization can convert recursive function calls to a loop.
This in turn saves memory and improves performance, since memory accesses are often the main thing that makes programs slow nowadays.
As an input, we give GCC a non-optimized naive stack based factorial:
tail_call.c
#include <stdio.h>
#include <stdlib.h>
unsigned factorial(unsigned n) {
if (n == 1) {
return 1;
}
return n * factorial(n - 1);
}
int main(int argc, char **argv) {
int input;
if (argc > 1) {
input = strtoul(argv[1], NULL, 0);
} else {
input = 5;
}
printf("%u\n", factorial(input));
return EXIT_SUCCESS;
}
GitHub upstream.
Compile and disassemble:
gcc -O1 -foptimize-sibling-calls -ggdb3 -std=c99 -Wall -Wextra -Wpedantic \
-o tail_call.out tail_call.c
objdump -d tail_call.out
where -foptimize-sibling-calls is GCC's name for this generalization of tail calls, according to man gcc:
-foptimize-sibling-calls
Optimize sibling and tail recursive calls.
Enabled at levels -O2, -O3, -Os.
as mentioned at: How do I check if gcc is performing tail-recursion optimization?
I choose -O1 because:
the optimization is not done with -O0. I suspect that this is because there are required intermediate transformations missing.
-O3 produces ungodly efficient code that would not be very educative, although it is also tail call optimized.
Disassembly with -fno-optimize-sibling-calls:
0000000000001145 <factorial>:
1145: 89 f8 mov %edi,%eax
1147: 83 ff 01 cmp $0x1,%edi
114a: 74 10 je 115c <factorial+0x17>
114c: 53 push %rbx
114d: 89 fb mov %edi,%ebx
114f: 8d 7f ff lea -0x1(%rdi),%edi
1152: e8 ee ff ff ff callq 1145 <factorial>
1157: 0f af c3 imul %ebx,%eax
115a: 5b pop %rbx
115b: c3 retq
115c: c3 retq
With -foptimize-sibling-calls:
0000000000001145 <factorial>:
1145: b8 01 00 00 00 mov $0x1,%eax
114a: 83 ff 01 cmp $0x1,%edi
114d: 74 0e je 115d <factorial+0x18>
114f: 8d 57 ff lea -0x1(%rdi),%edx
1152: 0f af c7 imul %edi,%eax
1155: 89 d7 mov %edx,%edi
1157: 83 fa 01 cmp $0x1,%edx
115a: 75 f3 jne 114f <factorial+0xa>
115c: c3 retq
115d: 89 f8 mov %edi,%eax
115f: c3 retq
The key difference between the two is that:
the -fno-optimize-sibling-calls uses callq, which is the typical non-optimized function call.
This instruction pushes the return address to the stack, therefore increasing it.
Furthermore, this version also does push %rbx, which pushes %rbx to the stack.
GCC does this because it stores edi, which is the first function argument (n) into ebx, then calls factorial.
GCC needs to do this because it is preparing for another call to factorial, which will use the new edi == n-1.
It chooses ebx because this register is callee-saved (see "What registers are preserved through a linux x86-64 function call"), so the subcall to factorial won't change it and lose n.
the -foptimize-sibling-calls does not use any instructions that push to the stack: it only does goto jumps within factorial with the instructions je and jne.
Therefore, this version is equivalent to a while loop, without any function calls. Stack usage is constant.
Tested in Ubuntu 18.10, GCC 8.2.
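For readers who don't read x86 assembly, here is a line-by-line transliteration of the -foptimize-sibling-calls listing into Python. It is only a reading aid under a few assumptions: the variable names stand in for the registers, the mask emulates 32-bit unsigned arithmetic, and, like the C source, it only terminates sensibly for n >= 1.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit unsigned registers


def factorial_as_compiled(n):
    """Mirror of the optimized listing; like the C source, it assumes n >= 1."""
    eax = 1                        # mov  $0x1,%eax
    edi = n & MASK
    if edi == 1:                   # cmp  $0x1,%edi ; je 115d
        return edi                 # mov  %edi,%eax ; retq
    while True:
        edx = (edi - 1) & MASK     # lea  -0x1(%rdi),%edx
        eax = (eax * edi) & MASK   # imul %edi,%eax
        edi = edx                  # mov  %edx,%edi
        if edx == 1:               # cmp  $0x1,%edx ; jne 114f
            return eax             # retq


print([factorial_as_compiled(n) for n in range(1, 7)])  # [1, 2, 6, 24, 120, 720]
```

There is no call anywhere in the loop, which is exactly why stack usage stays constant.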
Note first of all that not all languages support it.
TCO applies to a special case of recursion. The gist of it is: if the last thing you do in a function is call itself (i.e. it calls itself from the "tail" position), the compiler can optimize this to act like iteration instead of standard recursion.
You see, normally during recursion, the runtime needs to keep track of all the recursive calls, so that when one returns it can resume at the previous call and so on. (Try manually writing out the result of a recursive call to get a visual idea of how this works.) Keeping track of all the calls takes up space, which gets significant when the function calls itself a lot. But with TCO, it can just say "go back to the beginning, only this time change the parameter values to these new ones." It can do that because nothing after the recursive call refers to those values.
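If you'd rather measure the bookkeeping than write it out by hand, the following Python sketch counts interpreter frames. It uses `sys._getframe`, which is a CPython implementation detail, and the function names are invented for this example. The recursive sum needs one frame per pending call, while the loop needs a single frame regardless of n.

```python
import sys


def stack_depth():
    """Count the Python frames currently on the call stack (CPython-specific)."""
    frame, count = sys._getframe(), 0
    while frame is not None:
        count += 1
        frame = frame.f_back
    return count


def sum_recursive(n, base):
    """Return (1 + 2 + ... + n, frames in use above base at the deepest call)."""
    if n == 0:
        return 0, stack_depth() - base
    total, extra = sum_recursive(n - 1, base)
    return n + total, extra


def sum_loop(n, base):
    """Same sum with the recursion replaced by iteration."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total, stack_depth() - base


base = stack_depth()
for n in (10, 100):
    print(n, sum_recursive(n, base), sum_loop(n, base))
```

The second number in each pair is the frame count: it grows with n for the recursive version and stays fixed for the loop.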
Look here:
http://tratt.net/laurie/tech_articles/articles/tail_call_optimization
As you probably know, recursive function calls can wreak havoc on a stack; it is easy to quickly run out of stack space. Tail call optimization is a way to write a recursive-style algorithm that uses constant stack space, so the stack does not keep growing until you get stack errors.
The recursive function approach has a problem. It builds up a call stack of size O(n), which makes our total memory cost O(n). This makes it vulnerable to a stack overflow error, where the call stack gets too big and runs out of space.
Tail call optimization (TCO) addresses this: it optimizes recursive functions so that they avoid building up a tall call stack, which saves that memory cost.
Some language implementations perform TCO (for example some JavaScript engines, Ruby, and some C compilers), whereas Python and Java do not.
For JavaScript, TCO is part of the ECMAScript 6 specification :) http://2ality.com/2015/06/tail-call-optimization.html
We should ensure that there are no goto statements in the function itself; this is taken care of by the function call being the last thing the function does.
Large scale recursions can use this for optimizations, but in small scale, the instruction overhead for making the function call a tail call reduces the actual purpose.
With TCO, a function that would otherwise overflow the stack might instead run forever:
void eternity()
{
eternity();
}
In a functional language, tail call optimization is as if a function call could return a partially evaluated expression as the result, which would then be evaluated by the caller.
f x = g x
f 6 reduces to g 6. So if the implementation could return g 6 as the result, and then call that expression it would save a stack frame.
Also
f x = if c x then g x else h x.
This reduces f 6 to either g 6 or h 6. So if the implementation evaluates c 6 and finds it is true, then it can reduce:
if true then g x else h x ---> g x
f x ---> g x
A simple non tail call optimization interpreter might look like this,
class simple_expresion
{
...
public:
virtual simple_value *DoEvaluate() const = 0;
};
class simple_value
{
...
};
class simple_function : public simple_expresion
{
...
private:
simple_expresion *m_Function;
simple_expresion *m_Parameter;
public:
virtual simple_value *DoEvaluate() const
{
vector<simple_expresion *> parameterList;
parameterList.push_back(m_Parameter);
return m_Function->Call(parameterList);
}
};
class simple_if : public simple_function
{
private:
simple_expresion *m_Condition;
simple_expresion *m_Positive;
simple_expresion *m_Negative;
public:
simple_value *DoEvaluate() const
{
if (m_Condition->DoEvaluate()->IsTrue())
{
return m_Positive->DoEvaluate();
}
else
{
return m_Negative->DoEvaluate();
}
}
};
A tail call optimization interpreter might look like this,
class tco_expresion
{
...
public:
virtual tco_expresion *DoEvaluate() const = 0;
virtual bool IsValue()
{
return false;
}
};
class tco_value
{
...
public:
virtual bool IsValue()
{
return true;
}
};
class tco_function : public tco_expresion
{
...
private:
tco_expresion *m_Function;
tco_expresion *m_Parameter;
public:
virtual tco_expresion *DoEvaluate() const
{
vector<tco_expresion *> parameterList;
tco_expresion *function = const_cast<tco_function *>(this);
while (!function->IsValue())
{
function = function->DoCall(parameterList);
}
return function;
}
tco_expresion *DoCall(vector<tco_expresion *> &p_ParameterList)
{
p_ParameterList.push_back(m_Parameter);
return m_Function;
}
};
class tco_if : public tco_function
{
private:
tco_expresion *m_Condition;
tco_expresion *m_Positive;
tco_expresion *m_Negative;
public:
tco_expresion *DoEvaluate() const
{
if (m_Condition->DoEvaluate()->IsTrue())
{
return m_Positive;
}
else
{
return m_Negative;
}
}
};
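Since the C++ above is only an outline, here is a small runnable Python sketch of the same idea, under my own simplifying assumptions: every class and function name is invented, functions take a single argument, and a function body is an ordinary Python function that returns the next expression instead of evaluating it. Evaluating a node one step yields either a value or another expression, and a driver loop keeps stepping until a value appears.

```python
class Value:
    """A fully evaluated result."""

    def __init__(self, payload):
        self.payload = payload

    def is_value(self):
        return True


class Apply:
    """A pending call. One step runs the body, which returns the next expression."""

    def __init__(self, func, arg):
        self.func, self.arg = func, arg

    def is_value(self):
        return False

    def step(self):
        return self.func(self.arg)


class If:
    """Evaluates only its condition, then hands back the chosen branch unevaluated."""

    def __init__(self, condition, positive, negative):
        self.condition = condition
        self.positive = positive
        self.negative = negative

    def is_value(self):
        return False

    def step(self):
        return self.positive if evaluate(self.condition).payload else self.negative


def evaluate(expr):
    """Driver loop: keep stepping until a Value appears. Tail calls add no depth."""
    while not expr.is_value():
        expr = expr.step()
    return expr


def count_down(n):
    # f n = if n == 0 then "done" else f (n - 1)
    return If(Value(n == 0), Value("done"), Apply(count_down, n - 1))


print(evaluate(Apply(count_down, 50000)).payload)  # done
```

The If node returns its chosen branch rather than evaluating it, which is the part that corresponds to tco_if returning m_Positive or m_Negative.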
The main application is crashing because the server is affected by a format string bug when it handles players' nicknames, which leads to an access to an invalid memory zone.
The instruction executed is "cmp [EAX], 00000000", where EAX contains 4 of the bytes of the nickname, and it crashes the server.
I debugged it and found that "%s" is missing before the logging string passed to the File_printf function. So I tried to add this string via the IDA debugger and succeeded. After entering these bytes, the server still crashes, but the testing tool now reports "server is not vulnerable", whereas before it crashed with the message "server is vulnerable".
CODE
Bytes I have entered to patch the application:
RVA
00400000
OFFSET
0041dfad cc 68 ; push 0061d0dc
+ cc |0061d0dc
+ cc e8 ; call 0040d270
+ cc ^0040d270
+ cc 83 ; add esp,04
+ cc c4
+ cc 04
+ cc e9 ; jmp 0041e059
+ cc ^0041e059
0041e054 e8 e9 ; jmp 0041dfad
+ ?? ^0041dfad
0055DD63 cmp dword ptr [eax], 0
/*source*/
if ( *(_DWORD *)a1 )
a1 = sub_445D50();
if ( v2 )
{
--*(_DWORD *)(v2 + 4);
*(_DWORD *)a1 = *(_DWORD *)(v2 + 20);
*(_DWORD *)(v2 + 20) = a1;
}
else
{
v3 = *(_DWORD *)((a1 - 4) & 0xFFFFFFFC);
--dword_798ABD0;
sub_445D50();
memset(*(void **)(v3 + 8), 0xCDu, *(_DWORD *)(v3 + 16));
free((void *)v3);
}
}
/Hex Value/
0055DD63 83 38 00
After testing again, the server crashed while the testing tool showed the message "Server is not Vulnerable".
In the IDA debugger I get this result, with the detailed message:
55dd63: The instruction at 0x55DD63 referenced memory at 0x61616161. The memory could not be read -> 61616161 (exc.code c0000005, tid 4692)
I can also share the testing tool, but not here, because it contains the .simplese trojan and may harm your PC. I can share the source code of the testing tool on request.
The bug is caused by the logging function NetManager_LogMessage, which takes the text to dump, adds a timestamp (using snprintf) and then passes the whole string to the function File_printf without the needed format argument (%s). You need to use the value 05 instead of 04 to make an empty space to fool the bug. This trick works on many games. Good luck.
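The bug pattern itself, using caller-supplied text as the format string rather than as data for a fixed "%s", can be shown with Python's % operator. This is only an analogy: `log_unsafe` and `log_safe` are hypothetical stand-ins, not the server's functions, and where C would read stray memory, Python raises an exception.

```python
def log_unsafe(message):
    """Uses the caller's text as the format string itself (the bug pattern)."""
    return message % ()


def log_safe(message):
    """Passes the text as data to a fixed "%s" format (the fix)."""
    return "%s" % (message,)


nickname = "player%s%s"

print(log_safe(nickname))        # the specifiers are printed literally
try:
    log_unsafe(nickname)
except TypeError as exc:
    print("unsafe call failed:", exc)
```

A harmless nickname works with both functions, which is why this kind of bug tends to go unnoticed until someone sends format specifiers on purpose.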
While playing around with template recursion in D, I found that the intermediate results of the classical factorial are still in the object file. I suppose they are also in the executable...?
I can see that the actually executed code contains only the value (or a pointer to it) but:
Shouldn't there be a single mov statement without the intermediate data being saved for no reason?
This is the code:
int main()
{
static int x = factorial!(5);
return x;//factorial!(5);
}
template factorial(int n)
{
static if (n == 1)
const factorial = 1;
else
const factorial = n * factorial!(n-1);
}
and this is the output of obj2asm test.o:
( for your convenience: 1! = 1h, 2! = 2h, 3! = 6h, 4! = 18h, 5! = 78h )
FLAT group
;File = test_fac_01.d
extrn _main
public _deh_beg
public _deh_end
public _tlsstart
public _tlsend
public _D11test_fac_014mainFZi1xi
extrn _GLOBAL_OFFSET_TABLE_
public _Dmain
public _D11test_fac_0112__ModuleInfoZ
extrn _Dmodule_ref
public _D11test_fac_017__arrayZ
public _D11test_fac_018__assertFiZv
public _D11test_fac_0115__unittest_failFiZv
extrn _d_array_bounds
extrn _d_unittestm
extrn _d_assertm
.text segment
assume CS:.text
:
mov EAX,offset FLAT:_D11test_fac_0112__ModuleInfoZ[018h]@32
mov ECX,offset FLAT:_Dmodule_ref@32
mov RDX,[RCX]
mov [RAX],RDX
mov [RCX],RAX
ret
.text ends
.data segment
_D11test_fac_0112__ModuleInfoZ:
db 004h,000h,000h,0ffffff80h,000h,000h,000h,000h ;........
db 074h,065h,073h,074h,05fh,066h,061h,063h ;test_fac
db 05fh,030h,031h,000h,000h,000h,000h,000h ;_01.....
db 000h,000h,000h,000h,000h,000h,000h,000h ;........
dq offset FLAT:_D11test_fac_0112__ModuleInfoZ@64
.data ends
.bss segment
.bss ends
.rodata segment
.rodata ends
.tdata segment
_tlsstart:
db 000h,000h,000h,000h,000h,000h,000h,000h ;........
db 000h,000h,000h,000h,000h,000h,000h,000h ;........
.tdata ends
.tdata. segment
_D11test_fac_014mainFZi1xi:
db 078h,000h,000h,000h ;x...
.tdata. ends
.text._Dmain segment
assume CS:.text._Dmain
_Dmain:
push RBP
mov RBP,RSP
mov RAX,FS:[00h]
mov RCX,_D11test_fac_014mainFZi1xi@GOTTPOFF[RIP]
mov EAX,[RCX][RAX]
pop RBP
ret
nop
nop
nop
.text._Dmain ends
.data._D11test_fac_0117__T9factorialVi5Z9factorialxi segment
_D11test_fac_0117__T9factorialVi5Z9factorialxi:
db 078h,000h,000h,000h ;x...
.data._D11test_fac_0117__T9factorialVi5Z9factorialxi ends
.data._D11test_fac_0117__T9factorialVi4Z9factorialxi segment
_D11test_fac_0117__T9factorialVi4Z9factorialxi:
db 018h,000h,000h,000h ;....
.data._D11test_fac_0117__T9factorialVi4Z9factorialxi ends
.data._D11test_fac_0117__T9factorialVi3Z9factorialxi segment
_D11test_fac_0117__T9factorialVi3Z9factorialxi:
db 006h,000h,000h,000h ;....
.data._D11test_fac_0117__T9factorialVi3Z9factorialxi ends
.data._D11test_fac_0117__T9factorialVi2Z9factorialxi segment
_D11test_fac_0117__T9factorialVi2Z9factorialxi:
db 002h,000h,000h,000h ;....
.data._D11test_fac_0117__T9factorialVi2Z9factorialxi ends
.data._D11test_fac_0117__T9factorialVi1Z9factorialxi segment
_D11test_fac_0117__T9factorialVi1Z9factorialxi:
db 001h,000h,000h,000h ;....
.data._D11test_fac_0117__T9factorialVi1Z9factorialxi ends
.ctors segment
dq offset FLAT:@64
.ctors ends
.text._D11test_fac_017__arrayZ segment
assume CS:.text._D11test_fac_017__arrayZ
_D11test_fac_017__arrayZ:
push RBP
mov RBP,RSP
sub RSP,010h
mov RSI,RDI
mov RDI,offset FLAT:_D11test_fac_0112__ModuleInfoZ@64
call _d_array_bounds@PC32
nop
nop
.text._D11test_fac_017__arrayZ ends
.text._D11test_fac_018__assertFiZv segment
assume CS:.text._D11test_fac_018__assertFiZv
_D11test_fac_018__assertFiZv:
push RBP
mov RBP,RSP
sub RSP,010h
mov RSI,RDI
mov RDI,offset FLAT:_D11test_fac_0112__ModuleInfoZ@64
call _d_assertm@PC32
nop
nop
.text._D11test_fac_018__assertFiZv ends
.text._D11test_fac_0115__unittest_failFiZv segment
assume CS:.text._D11test_fac_0115__unittest_failFiZv
_D11test_fac_0115__unittest_failFiZv:
push RBP
mov RBP,RSP
sub RSP,010h
mov RSI,RDI
mov RDI,offset FLAT:_D11test_fac_0112__ModuleInfoZ@64
call _d_unittestm@PC32
leave
ret
.text._D11test_fac_0115__unittest_failFiZv ends
end
You shouldn't use templates when what you want is compile-time function execution (CTFE). Just write the function as you normally would and call it in a context that requires a compile-time value, such as a static initializer.
int main()
{
    static int x = factorial(5); // static causes CTFE
    return x;
}
int factorial(int n)
{
    if (n == 1)
        return 1;
    else
        return n * factorial(n-1);
}
This won't result in any extra symbols, because factorial(5) is evaluated at compile time; the only symbol emitted is the factorial function itself. Your template trick instantiates one symbol per template instance to achieve the same effect, and those symbols are not something you want.
Alternatively, if you still want to use templates but don't want the symbols, you can use manifest constants via enum.
template factorial(int n)
{
    static if (n == 1)
        enum factorial = 1;
    else
        enum factorial = n * factorial!(n-1);
}
Notice the change from const to enum. enum values are purely compile-time, so they produce no symbols or data in the object files.
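For comparison, C draws a similar line, and a small sketch may make the distinction concrete. This is an analogy in C rather than D, and the names FACTORIAL_5, factorial_5_stored, table and table_size are made up for illustration. An enumerator is a pure compile-time value with no storage, while a file-scope const int is an object that normally gets a symbol in the object file, much like the const template members in the question.

```c
#include <assert.h>

/* Compile-time only: an enumerator has no address and emits no data. */
enum { FACTORIAL_5 = 5 * 4 * 3 * 2 * 1 };

/* A real object: with external linkage it normally appears as a symbol. */
const int factorial_5_stored = 5 * 4 * 3 * 2 * 1;

/* The enumerator can be used where a constant expression is required,
 * for example as an array size. */
static int table[FACTORIAL_5];

/* Hypothetical helper: reports how many elements the array was given. */
int table_size(void)
{
    return (int)(sizeof table / sizeof table[0]);
}
```

Running nm on the resulting object file would be expected to show an entry for factorial_5_stored but none for FACTORIAL_5, which mirrors the const versus enum difference described above.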
I'm debugging this code:
len = NGX_SYS_NERR * sizeof(ngx_str_t);
ngx_sys_errlist = malloc(len);
if (ngx_sys_errlist == NULL) {
    goto failed;
}
for (err = 0; err < NGX_SYS_NERR; err++) {
But in gdb, the line if (ngx_sys_errlist == NULL) { is skipped entirely:
(gdb)
59 ngx_sys_errlist = malloc(len);
(gdb) n
64 for (err = 0; err < NGX_SYS_NERR; err++) {
I have run into this before but never found out the reason. Does anyone know why?
Is it a bug?
UPDATE
0x000000000041be9d <ngx_strerror_init+0>: mov %rbx,-0x30(%rsp)
0x000000000041bea2 <ngx_strerror_init+5>: mov %rbp,-0x28(%rsp)
0x000000000041bea7 <ngx_strerror_init+10>: mov %r12,-0x20(%rsp)
0x000000000041beac <ngx_strerror_init+15>: mov %r13,-0x18(%rsp)
0x000000000041beb1 <ngx_strerror_init+20>: mov %r14,-0x10(%rsp)
0x000000000041beb6 <ngx_strerror_init+25>: mov %r15,-0x8(%rsp)
0x000000000041bebb <ngx_strerror_init+30>: sub $0x38,%rsp
0x000000000041bebf <ngx_strerror_init+34>: mov $0x840,%edi
0x000000000041bec4 <ngx_strerror_init+39>: callq 0x402388 <malloc@plt>
0x000000000041bec9 <ngx_strerror_init+44>: mov %rax,0x26e718(%rip) # 0x68a5e8 <ngx_sys_errlist>
0x000000000041bed0 <ngx_strerror_init+51>: mov $0x840,%r12d
0x000000000041bed6 <ngx_strerror_init+57>: test %rax,%rax
0x000000000041bed9 <ngx_strerror_init+60>: je 0x41bf56 <ngx_strerror_init+185>
0x000000000041bedb <ngx_strerror_init+62>: mov $0x0,%r13d
0x000000000041bee1 <ngx_strerror_init+68>: mov $0x0,%r14d
0x000000000041bee7 <ngx_strerror_init+74>: mov $0x0,%r15d
0x000000000041beed <ngx_strerror_init+80>: mov %r13d,%edi
0x000000000041bef0 <ngx_strerror_init+83>: callq 0x402578 <strerror@plt>
UPDATE
Has nobody else run into the same thing when using gdb? It happens to me frequently when debugging.
Most likely the two statements were optimized into a single set-and-test expression, which then can't be decomposed into the original two lines. The generated pseudocode is likely to be something like:
call _malloc
jz _failed
mov acc, _ngx_sys_errlist
where the test now happens before the assignment. Would you want the source-level trace to jump backwards to reflect this?
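A minimal sketch of the pattern in question, reduced so that it compiles on its own and can be stepped through in gdb. The names entry_t, ENTRY_COUNT, errlist, errlist_init and errlist_free are hypothetical stand-ins, not nginx's actual code, and the count is arbitrary. With optimization enabled the compiler is free to interleave the store and the NULL test, so the if line may not get its own step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ngx_str_t and NGX_SYS_NERR. */
typedef struct {
    size_t len;
    const char *data;
} entry_t;

#define ENTRY_COUNT 132

entry_t *errlist;

/* Returns 0 on success, -1 if the allocation fails. */
int errlist_init(void)
{
    size_t len = ENTRY_COUNT * sizeof(entry_t);
    size_t i;

    errlist = malloc(len);      /* the assignment ...                    */
    if (errlist == NULL) {      /* ... and the test an optimizer may fuse */
        return -1;
    }

    for (i = 0; i < ENTRY_COUNT; i++) {
        errlist[i].len = 0;
        errlist[i].data = NULL;
    }
    return 0;
}

/* Releases the table and resets the pointer. */
void errlist_free(void)
{
    free(errlist);
    errlist = NULL;
}
```

Building this with gcc -O0 -g and stepping with n should normally stop on the if line; with -O2 the line information may attribute both statements to a single step, which would match the behaviour described in the question.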
Please check:
a) whether you are debugging a release (optimized) build, if one exists
b) whether your source file has been modified since the binary was built
If you still have the issue, please provide more details (compiler and version, debugger version, platform, code, ...).