I need to use an array inside a loop, and the loop runs a huge number of times.
Case 1: define the array outside the for-loop and pass it to fun2
void fun1() {
    int temp[16];
    for (int i = 0; i < times; i++)
    {
        fun2(temp);
    }
}
void fun2(int (&temp)[16]) {
    /** do something with temp */
}
Case 2: define the array in fun2:
void fun1() {
    for (int i = 0; i < times; i++)
    {
        fun2();
    }
}
void fun2() {
    int temp[16];
    /** do something with temp */
}
fun1 will be called very often. In this situation, which is better?
Does Case 2 have some influence on performance?
If you are looking for an answer to the general case, the answer is, "it depends." If you want an answer for your specific example, the answer is that the second version will be more efficient.
Ask yourself:
Is there a cost to construction / destruction / reuse?
In your example, there is none (except adjusting the stack pointer, which is extremely cheap). But if it were an array of objects, or if you had to initialize the array to a specific value, that changes.
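For instance, a hypothetical variation on the question's code makes that cost visible:

void fun2() {
    int temp[16] = {0};   // zero-initializes all 16 ints on every call: no longer free
    /** do something with temp */
}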
How does the cost of parameterization factor in?
This is very minor, but in your first case you pass a reference (effectively a pointer) to the function. If the function is not inlined, this means that the array can only be accessed through that pointer. This takes up one register which could be used otherwise. In the second example, the array can be accessed through the stack pointer, which is basically free.
It also affects alias and escape analysis negatively, which can lead to less efficient code. Basically, the compiler has to write values to memory instead of keeping them in registers if it cannot prove that a subsequent memory read does not refer to the same location.
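As an illustration of the aliasing problem (a sketch with made-up names, not from the question's code):

// If the compiler cannot prove that `out` and `in` point to different
// memory, every store through `out` forces it to re-read `in[i]` from
// memory instead of keeping values in registers.
void accumulate(int* out, const int* in, int n) {
    for (int i = 0; i < n; i++)
        *out += in[i];
}

Keeping a local accumulator, or, as in Case 2, a stack-local array the compiler can prove is private, sidesteps this.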
Which version is more robust?
The second version ensures that the array is always properly sized. On the other hand, if you pass an object whose constructor may throw an exception, constructing outside the function may allow you to throw the exception at a more convenient location. This could be significant for exception safety guarantees.
Is there a benefit in deallocating early?
Yes, allocation and deallocation are costly, but early destruction may allow some reuse. I've had cases where deallocating objects early allowed reuse of the memory in other parts of the code which improved use of the CPU cache.
It depends on what you want to achieve. In this case, I'm assuming you are looking for performance, in which case Case 2 would be the better option, as the function creates the variable on the fly instead of having to fetch a variable defined outside it and then its value.
I wonder what the maximum depth of a recursive function is. I know it is related to the stack size, but what exactly is the relationship? If I write a function on a 32-bit machine which does nothing but call itself, what is the maximum depth?
unsigned long times = 0;
void fun()
{
    ++times;
    fun();
}
Then what is the value of times when the stack overflows?
The relationship is roughly this:
Maximum recursion depth = ((Stack size) - (Total size of stack frames in call chain up to the recursive function)) / (Stack frame size of recursive function)
The stack frame is the data that gets pushed onto the stack each time you make a function call. It consists of the function return address, space for parameters (that weren't passed in registers) and space for the local variables. It will differ for different functions but it will be constant for a given function recursively calling itself at each call.
It follows from this that a recursive function with a large number of parameters and/or large number of local variables will have a larger stack frame size, and therefore a smaller maximum recursion depth for a stack of a given size.
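For a rough worked example with made-up numbers: a 1 MB stack and a 16-byte frame give about 1,048,576 / 16 ≈ 65,000 calls. A minimal sketch to observe this empirically (compile without optimization, for the reason given below):

#include <cstdio>

unsigned long times = 0;

void fun()
{
    int local;   // marks this frame's position on the stack
    ++times;
    if (times % 100000 == 0)
        std::printf("%lu %p\n", times, (void*)&local);
    fun();
}

int main() { fun(); }

The difference between successive printed addresses, divided by 100000, is roughly the frame size, and the last count printed before the crash approximates the maximum depth.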
If the compiler performs tail-recursion optimization (the fun() above qualifies, since the recursive call is the last thing the function does), then the stack frame size is effectively zero after the top-level call, so the formula gives a divide by zero: no max recursion depth.
Everything I've said here probably has multiple exceptions to the rule, but this is the basic relationship.
Is there a way to take two pieces of heap-allocated memory and put them together efficiently? Would this be more efficient than the following?
for (unsigned int i = 0; i < LENGTH_0; ++i)
    AddToArray(buffer, array0[i]);
for (unsigned int i = 0; i < LENGTH_1; ++i)
    AddToArray(buffer, array1[i]);
For copying memory byte by byte, you can't go wrong with memcpy. That's going to be the fastest way to move memory.
Note that there are several caveats, however. For one, you have to manually ensure that your destination memory is big enough. You have to manually compute sizes of objects (with the sizeof operator). It won't work well with some objects (shared_ptr comes to mind). And it's pretty gross looking in the middle of some otherwise elegant C++11.
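A minimal sketch of the memcpy route, assuming buffer points to the same element type and already has room for both arrays of trivially copyable elements:

#include <cstring>

// Copy array0, then array1 right after it, into buffer.
std::memcpy(buffer, array0, LENGTH_0 * sizeof(*array0));
std::memcpy(buffer + LENGTH_0, array1, LENGTH_1 * sizeof(*array1));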
Your way works too and should be nearly as fast.
You should strongly consider C++'s copy algorithm (or one of its siblings), and use vectors to resize on the fly. You get to use iterators, which are much nicer. Again, it should be nearly as fast as memcpy, with the added benefit that it is far, far safer than moving bytes around: shared_ptr and its ilk will work as expected.
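For instance, a sketch with std::copy, assuming the arrays hold int:

#include <algorithm>
#include <vector>

std::vector<int> buffer(LENGTH_0 + LENGTH_1);
std::copy(array0, array0 + LENGTH_0, buffer.begin());
std::copy(array1, array1 + LENGTH_1, buffer.begin() + LENGTH_0);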
I'd do something like this until proven too slow:
// decltype(*array0) alone is a reference type, so strip the reference
std::vector<std::remove_reference_t<decltype(*array0)>> vec;
vec.reserve(LENGTH_0 + LENGTH_1);
vec.insert(vec.end(), array0, array0 + LENGTH_0);
vec.insert(vec.end(), array1, array1 + LENGTH_1);
Depending on the data stored in array0 and array1, that might be as fast as or even faster than calling a function for every single element.
What's faster when compiling with GCC in C++:
using a precomputed array of 2 columns and 300 rows,
or evaluating a third-degree polynomial, such as "x^3 + x^2 + x + 40"?
(Sorry for my English.)
Edit:
Is it faster to search in the array (the first column as the input value, the second column as the output),
or to evaluate the polynomial as a function (its input and output are obvious)?
Edit 2:
the array is accessed by index
I think he is trying to compare the speed of a polynomial computation against that of a lookup table.
It depends. A lookup table is stored in memory, so load instructions are involved; if the table is not cached, expect a long delay while it is fetched from memory.
If you need to access the lookup table frequently, many times over, and the table is of reasonable size, use the lookup table, because the table will then very likely stay cached. If you can store the table on the stack, do so, since data on the stack is more likely to be cached than data on the heap.
On the other hand, if the calculation is not frequent, polynomial computation is fine. It saves you some memory and makes your code more readable.
You should actually profile the code.
Polynomial Evaluation
The function is:
int Evaluate_Polynomial(int x)
{
    register const int term1 = x * x * x;
    register const int term2 = x * x;
    register const int result = term1 + term2 + x + 40;
    return result;
}
Note: In the above function, register is used to remind the compiler to use registers, even in the unoptimized version (a.k.a. debug); it is only a hint, and it is deprecated in newer C++ standards.
Without optimization, the above function has 3 multiply operations and 3 addition operations, for a total of 6 data processing operations (not including load or store).
Table Lookup
The function is:
int Table_Lookup_Polynomial(int x)
{
    int result = 0;
    if ((x >= 0) && (x < 300))   // index must be inside the 300-row table
    {
        result = table[x];
    }
    else
    {
        // Handle array index out of bounds
    }
    return result;
}
In the above example, there are up to two comparisons (with their conditional jumps) and a pointer dereference. More importantly, there is a need for error handling.
Summary
The polynomial version may contain more instructions, but they are data processing instructions and can be easily inlined. They do not cause a processor's instruction cache to be reloaded.
The table lookup needs to perform boundary checking. The boundary checking will cause the processor to pause and maybe reload the instruction pipeline, which takes time. The error checking and handling may cause issues during maintenance if the range is ever changed.
The functions should be profiled to verify which algorithm is faster. The loss of time due to altering the execution flow may be longer than the pure math data processing functions of the polynomial evaluation.
The compiler may be able to use special processor functions to make the polynomial evaluation faster. For example, the ARM7 processor has instructions that can perform a multiply and add together.
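For example, rewriting the polynomial in Horner form (a sketch, not from the original answer) reduces the work to two multiplies and three adds, and each step is exactly such a multiply-accumulate:

int Evaluate_Polynomial_Horner(int x)
{
    // x^3 + x^2 + x + 40 == ((x + 1) * x + 1) * x + 40
    return ((x + 1) * x + 1) * x + 40;
}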
Computing a third-degree polynomial will nearly always be faster.
People seem to forget that you need to search for a value in this "look-up" table, which is O(log N).
Evaluating a third-degree polynomial is trivial enough that the table would need to be uselessly small to outperform it.
The table has a chance only if you store exactly the values for the arguments you will look up and you know where in the table they reside, so you do not have to perform a search. That would make it a real look-up table, and it would probably be faster (a sketch follows).
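A sketch of such a direct-indexed table (names and contents assumed for illustration):

static const int table[300] = { /* precomputed f(0) .. f(299) */ };

int lookup(int x)
{
    return table[x];   // direct index, no search
}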
The example I know of where tables are indeed used is computing the sine function with high precision (more on wiki), where the computation would be really expensive.
If we consider recursive functions in C/C++, are they useful in any way? Where exactly are they mostly used?
Are there any advantages in terms of memory from using recursive functions?
Edit: is recursion better, or using a while loop?
Recursive functions are primarily used for ease of designing algorithms. For example, you may need to traverse a directory tree recursively: its depth is limited, so you are quite likely never to face anything like too-deep recursion and a consequent stack overflow, yet writing the tree traversal recursively is so much easier than doing the same in an iterative manner (see the sketch below).
In most cases recursive functions don't save memory compared to iterative solutions. Even worse, they consume stack memory, which is relatively scarce.
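A minimal sketch of such a traversal, assuming C++17's <filesystem> (not part of the original answer):

#include <filesystem>
#include <iostream>

// Recursively print every entry under `dir`; the call stack simply
// mirrors the depth of the directory tree.
void walk(const std::filesystem::path& dir)
{
    for (const auto& entry : std::filesystem::directory_iterator(dir))
    {
        std::cout << entry.path() << '\n';
        if (entry.is_directory())
            walk(entry.path());
    }
}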
They have many uses, and some things become very difficult or impossible without them, iterating trees for instance.
Recursion definitely has advantages for problems with a recursive nature. Other posters named some of them.
Using C's capability for recursion definitely has advantages in memory management. When you try to avoid recursion, most of the time your own stack or some other dynamic data type is used to break down the problem. This involves dynamic memory management in C/C++, and dynamic memory management is costly and error-prone!
You can't beat the stack
On the other hand, when you just use the stack and use recursion with local variables, the memory management is simple, and the stack is most of the time more time-efficient than any memory management you can do yourself with plain C/C++ memory management. The reason is that the system stack is such a simple and convenient data structure with low overhead, implemented using special processor operations optimized for this work. Believe me, you can't beat that, since compilers, operating systems and processors have been optimized for stack manipulation for decades!
PS: Also, the stack does not become fragmented the way heap memory easily does. In this way, too, it is possible to save memory by using the stack / recursion.
Implement QuickSort with and without using recursion, then you can tell by yourself if it's useful or not.
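For reference, a minimal recursive sketch (classic in-place partitioning; not from the original answer):

// Sort a[lo..hi] in place; each recursive call handles one partition.
void quicksort(int a[], int lo, int hi)
{
    if (lo >= hi)
        return;
    int pivot = a[(lo + hi) / 2];
    int i = lo, j = hi;
    while (i <= j)
    {
        while (a[i] < pivot) ++i;   // find misplaced element on the left
        while (a[j] > pivot) --j;   // find misplaced element on the right
        if (i <= j)
        {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            ++i; --j;
        }
    }
    quicksort(a, lo, j);   // left partition
    quicksort(a, i, hi);   // right partition
}

The iterative version has to manage an explicit stack of pending partitions, which is precisely the bookkeeping the recursive version gets for free.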
I often find recursive algorithms easier to understand because they involve less mutable state. Consider the algorithm for determining the greatest common divisor of two numbers.
unsigned greatest_common_divisor_iter(unsigned x, unsigned y)
{
    while (y != 0)
    {
        unsigned temp = x % y;
        x = y;
        y = temp;
    }
    return x;
}

unsigned greatest_common_divisor(unsigned x, unsigned y)
{
    return y == 0 ? x : greatest_common_divisor(y, x % y);
}
There is too much renaming going on in the iterative version for my taste. In the recursive version, everything is immutable, so you could even make x and y const if you liked.
When using recursion, you can store data on the stack (effectively, in the calling contexts of all the function invocations above the current instance) that you would otherwise have to store on the heap with dynamic allocation if you were doing the same thing with a while loop.
Think of most divide-and-conquer algorithms, where there are two things to do on each call (that is, one of the calls is not tail-recursive).
And with respect to Tom's interesting comment/subquestion, this advantage of recursive functions is perhaps particularly noticeable in C, where the management of dynamic memory is so basic. But that doesn't make it very specific to C.
Dynamic programming is a key area where recursion is crucial, though it goes beyond that (remembering sub-answers can give drastic performance improvements). Algorithms are where recursion is normally used, rather than typical day-to-day coding. It's more a computer-science concept than a programming one.
One thing that is worth mentioning is that in most functional languages (Scheme, for example), you can take advantage of tail-call optimization, and thus you can use recursive functions without growing the stack.
Basically, complex recursive tail calls run flawlessly in Scheme, while in C/C++ the same ones may produce a stack overflow.
There are two reasons I see for using recursion:
an algorithm operates on recursive data structures (e.g. a tree)
an algorithm is of recursive nature (often happens for mathematical problems as recursion often offers beautiful solutions)
Handle recursion with care as there is always the danger of infinite recursion.
Recursive functions make it easier to code solutions that have a recurrence relation.
For instance the factorial function has a recurrence relation:
factorial(0) = 1
factorial(n) = n * factorial(n-1)
Below I have implemented factorial using recursion and looping.
The recursive version and the recurrence relation defined above look similar, and the recursive version is hence easier to read.
Recursive:
double factorial(int n)
{
    return (n ? n * factorial(n - 1) : 1);
}
Looping:
double factorial(int n)
{
    double result = 1;
    while (n > 1)
    {
        result = result * n;
        n--;
    }
    return result;
}
One more thing:
The recursive version of factorial above is not actually a tail call: the multiplication by n happens after the recursive call returns. It can, however, be rewritten with an accumulator parameter so that the call is in tail position and can be tail-call optimized. That brings the space complexity of the recursive factorial down to the space complexity of the iterative factorial.
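A sketch of that accumulator form (not in the original answer):

// The recursive call is the last thing that happens, so a compiler
// performing tail-call optimization can turn it into a jump.
double factorial_acc(int n, double acc)
{
    return (n > 1) ? factorial_acc(n - 1, acc * n) : acc;
}

double factorial(int n)
{
    return factorial_acc(n, 1);
}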