In terms of readability and memory usage/processing speed, is it better to define a variable, modify it, and output the variable, or just to output the result directly? e.g.:
int a = 1, b = 2, c;
c = a+b;
std::cout << c << std::endl;
vs
int a = 1, b = 2;
std::cout << a+b << std::endl;
Thanks
Well, with this example the difference in processing speed and space is negligible: the code is small and uses only a few instructions.
But in the grand scheme of things the answer is -- well it depends.
The term "better" is in the eye of the beholder. What is better for one program might not be better for another (this includes readability). What may work in one instance may not work in another. Or in the end it could be negligible (arithmetic instructions are pretty fast depending on the scope of what you need and int, double, char, float data types are relatively small and well defined so you know how much memory you are taking up).
You also don't say whether these variables live on the stack or the heap. If they're on the stack, it barely matters that you declared them, because the memory is released as soon as the enclosing function returns. If they're on the heap, you may not want to allocate millions of variables just to sit there; then again, you may need them there.
So with bigger projects it's decided almost entirely on a case-by-case basis.
And you tell me what is better here?
int result = (3434*234 + 3423 - 4/3*23 < 233 + 5435*234 + 342) ? (int)(234 + 234 + 234234/34) : (int)(2 + 3*234);
std::cout << result << std::endl;
OR
double x = 3434*234+3423-4/3*23;
double y = 233+5435*234+342;
double a = 234+234+234234/34;
double b = 2+3*234;
int result = 0;
if (x > y) result = a;
else result = b;
std::cout << result << std::endl;
In the end these two do the same thing, with negligible difference in cost, but which one is easier to read?
Your question about memory is easy to answer: each of these variables takes only a few bytes (an int is typically 4 bytes, and a byte is 8 bits), which is next to no memory. In terms of RAM, a handful of bytes is negligible, so defining a, b, and c is barely different from just computing a + b directly. Makes sense?
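One quick way to convince yourself of this is to put both versions in separate functions and compare the generated assembly (for example with g++ -O2 -S, or an online compiler explorer). A minimal sketch, with hypothetical function names of my own:

#include <iostream>

// With optimizations enabled, both functions typically compile to the same
// machine code: the sum is folded to the constant 3 at compile time.
void with_variable() {
    int a = 1, b = 2, c;
    c = a + b;
    std::cout << c << std::endl;
}

void without_variable() {
    int a = 1, b = 2;
    std::cout << a + b << std::endl;
}

int main() {
    with_variable();
    without_variable();
}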
I've implemented factorial recursively this way:
int factorial(int x){
    if (x <= 0)
        return 1;
    else
        return (x * factorial(x - 1));
}
int _tmain(int argc, _TCHAR* argv[])
{
    std::cout << "Please enter your number :::";
    int x;
    std::cin >> x;
    std::cout << "factorial(" << x << ") = " << factorial(x);
    getchar(); getchar();
}
Which way of implementing such code is more useful: writing it with iteration and loops, or writing it recursively as above?
It depends on the number itself.
For numbers in the normal range, the recursive solution works fine. Since it builds each value from the previous one, it naturally expresses the relation
factorial(n) = factorial(n-1) * n
However, since recursion uses the call stack, it will eventually overflow if the recursion goes deeper than the stack size. Moreover, the recursive solution can perform poorly because of the heavy pushing and popping of registers at each level of the recursive calls.
In such cases, the iterative approach is the safe choice.
Have a look at this comparison.
Hope it helps!
In C++ a recursive implementation is usually more expensive than an iterative one.
The reason is that each call creates a new stack frame, which holds data such as the return address, local variables, and registers that need to be saved. A recursive implementation therefore requires an amount of memory that grows linearly with the depth of the nested calls, while an iterative implementation uses a constant amount of memory.
If the recursion is written in tail-recursive form (your version is not quite there, since the multiplication still happens after the recursive call returns, but it is easy to rewrite with an accumulator parameter), a compiler optimization can replace the function call with a direct jump, avoiding the extra memory usage on the call stack.
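For illustration, here is a minimal sketch of both alternatives (the accumulator parameter is my own addition to make the recursion genuinely tail-recursive, and unsigned long long is assumed just to delay overflow a little):

#include <iostream>

// Iterative version: constant stack usage.
unsigned long long factorial_iter(unsigned int n) {
    unsigned long long result = 1;
    for (unsigned int i = 2; i <= n; ++i)
        result *= i;
    return result;
}

// Tail-recursive version: the multiplication happens before the recursive
// call, so an optimizing compiler can turn the call into a jump.
unsigned long long factorial_tail(unsigned int n, unsigned long long acc = 1) {
    if (n <= 1)
        return acc;
    return factorial_tail(n - 1, acc * n);
}

int main() {
    std::cout << factorial_iter(10) << ' ' << factorial_tail(10) << '\n'; // 3628800 3628800
}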
For more detailed explanation you might want to read:
Is recursion ever faster than looping?
Code 1
int A, B, MAX;
cout << "Give two numbers:" << endl;
cin >> A >> B;
if (A > B)
{
    MAX = A;
}
else
{
    MAX = B;
}
cout << "Largest amongst given numbers is: ";
cout << MAX << endl;
return 0;
Code 2
int A, B, MAX;
cout << "Give two numbers:" << endl;
cin >> A >> B;
MAX = A;
if (B > MAX)
{
    MAX = B;
}
cout << "Largest amongst given numbers is: ";
cout << MAX << endl;
return 0;
In the above program logic, which one is better and why? Is there any difference between them? It is an exam question, and I would like to hear Stack Overflow's opinion on which is best.
MAX = std::max(A, B);
Is better than both in terms of clarity.
In terms of speed, the compiler should be able to optimise any of these methods to be equivalent; but again I'd favour std::max, because I'd sooner trust the compiler to optimise a standard function than some arbitrary hand-rolled code that performs the same task.
They are effectively the same, so I would prefer code 1 because it's more readable. In both cases you have to bring both A and B into registers and then do a single comparison, and MAX isn't written out until after this segment is done (because nothing needs to be kicked out of a register).
This isn't something that is going to cause any kind of performance gain. In fact, it's possible (although I doubt it) that the compiler would compile them the same (the compiler does all kinds of code modification to create an optimal set of instructions).
As suggested, the only thing that is likely to give a performance gain is using the library function std::max. This is because the compiler will perform the comparison in the most efficient way (likely without even a conditional branch). If your two values are integers, then you can see here that it's possible to find the max with five integer operations (all of which, except the multiplication, can generally be done in a single cycle). You generally want to avoid conditional branches as much as possible, and this algorithm does that. Most likely the max function does something like this or similar (but it would have to be different for floating-point values).
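For the curious, here is one well-known branchless formulation for ints (a sketch of the bit-twiddling idea; it is not necessarily what the linked page or your standard library's max actually uses, and on modern compilers a plain ternary usually becomes a conditional move anyway):

#include <iostream>

// If a < b the mask is all ones, so a ^ (a ^ b) yields b;
// otherwise the mask is zero and the expression yields a.
int branchless_max(int a, int b) {
    int mask = -(a < b);          // 0 when a >= b, -1 (all bits set) when a < b
    return a ^ ((a ^ b) & mask);
}

int main() {
    std::cout << branchless_max(3, 7) << ' ' << branchless_max(7, 3) << '\n'; // prints: 7 7
}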
After MAX = std::max(A,B), the next best code is:
MAX = A > B ? A : B;
If you don't want to use that, then I prefer your "code 1" because:
code 2 always sets MAX = A, which is momentarily misleading and only becomes clear as the later code is studied; while this issue is common in C++ and many other languages, there's no particular reason to embrace that complication when it's easily avoided
for some types (the question didn't originally specify int), an assignment may be an expensive operation (e.g. copying a lot of data), so always assigning once is better than potentially assigning twice
For both those reasons, it is also desirable to declare and initialise MAX in one step. That may not be practical if, say, it's a non-const reference accepted as a function argument, but when it is possible it's another good reason to prefer std::max or the ternary ?: approach: you won't need a misleading and potentially inefficient or unavailable default construction. (Now that you've changed the question to be int-specific, the cost of copying and construction is known and trivial, and the optimiser can be counted on to remove them in most cases.)
I would say use code 2. It is arguably better because it says explicitly: if B is greater than the current MAX (which starts as A), then change MAX to B. In code 1 there is nothing that explains why A rather than B is picked first as the candidate maximum. From what I see, you will probably have a harder time maintaining code 1 than code 2.
Which of the following two pieces of code is faster, and why?
In which cases should one be preferred over the other?
double x = (a + b) >> 1
double x = (a + b) / 2.0
These do different things, so pick the one with the behaviour you want: truncating the result down, or keeping the .5 fraction.
"Premature optimization is root of all evil". Use what is more readable, when you have perfomance issue first look for algorithms and data structures, where you can get most perfomance gain, then run profiler and optimize where is necessary.
As others have said, the two statements produce different results, especially if (a + b) is an odd value.
Also, according to the language, a and b must be integral values in order to satisfy the shifting operation.
If a and b differ in type between the two statements, you are comparing apples to elephants.
Given this demo program:
#include <iostream>
#include <cstdlib>
#include <cmath>
using std::cout;
using std::endl;
int main(void)
{
    const unsigned int a = 5;
    const unsigned int b = 8;
    double x = (a + b) >> 1;
    cout << x << endl;
    double y = (a + b) / 2.0;
    cout << y << endl;
    return EXIT_SUCCESS;
}
The output:
6
6.5
Based on this experiment, the comparison is apples to oranges. The statement involving shifting is a different operation than dividing by a floating-point number.
As far as speed goes, the second statement is slower because the expression (a + b) must be converted to double before applying the division. The division is floating point, which may be slow on platforms without hardware floating point support.
You should not concern yourself on the execution speed of either statement. Of more importance is the correctness and robustness of the program. For example, the two statements above provide different results, which is a very important concern for correctness.
Most users would rather wait for a program to produce correct results than have a quick program that produces incorrect results or behavior (nobody is in a hurry for a program to crash).
Management would rather you spend time completing the program than wasting time optimizing portions of the program that are executed infrequently.
If a or b is a double or float, the shift version won't even compile, since the shift operators are only defined for integral types.
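To make the intent explicit in code, a minimal sketch (the helper names are hypothetical; note that a + b can overflow int for large inputs, and that for negative sums integer division and >> round differently):

#include <iostream>

// Integer average: the fractional part is discarded (truncation toward zero).
int mid_trunc(int a, int b) { return (a + b) / 2; }

// Floating-point average: the .5 fraction is kept.
double mid_exact(int a, int b) { return (a + b) / 2.0; }

int main() {
    std::cout << mid_trunc(5, 8) << ' ' << mid_exact(5, 8) << '\n'; // prints: 6 6.5
}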
I want to understand how type conversion is done in professional-grade software.
Consider the following conversions:
int to double
double to int
string to double
double to string
Currently I am using Qt for my project, which has API for doing these tasks.
So I just wanted to know how people perform these conversions with standard C++ only.
Accuracy, speed, and memory are the priorities, in that order.
For int to double you can simply do static_cast<double>(int_value);
From double to int it depends on the case. Most of the time static_cast will do; however, sometimes you need specific control over how the value is rounded to an int. For this you can use functions like floor, round, or ceil.
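A minimal sketch of those options (the variable names are just for illustration; std::lround assumes C++11):

#include <cmath>
#include <iostream>

int main() {
    double d = 2.7;
    int truncated = static_cast<int>(d);              // toward zero -> 2
    long rounded  = std::lround(d);                   // to nearest  -> 3
    int floored   = static_cast<int>(std::floor(d));  // downward    -> 2
    int ceiled    = static_cast<int>(std::ceil(d));   // upward      -> 3
    std::cout << truncated << ' ' << rounded << ' ' << floored << ' ' << ceiled << '\n';
}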
For string to anything and anything to string there are a couple of options:
snprintf() - not recommended unless you know what you are doing; accuracy is hard to control
stringstream / istringstream - good accuracy and easy to control
boost::format - personal favorite, check doc: http://www.boost.org/doc/libs/1_53_0/libs/format/doc/format.html
In terms of performance it depends; I would expect snprintf to be the fastest in certain cases, since it does not require allocations.
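A minimal sketch of the snprintf route for double-to-string (the %.17g format is my choice here; 17 significant digits are enough to round-trip an IEEE-754 double, and std::snprintf assumes C++11):

#include <cstdio>

int main() {
    double x = 123.456;
    char buf[32];
    // 17 significant digits reproduce the exact double on read-back.
    std::snprintf(buf, sizeof buf, "%.17g", x);
    std::printf("%s\n", buf); // no heap allocation involved
}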
The first one often happens implicitly (no cast is needed). Both the first and the second can be accomplished with a simple static_cast:
double x = 0.123;
int y = 123;
std::cout << static_cast<int>(x) << '\n';
std::cout << static_cast<double>(y) << '\n';
The third one would require a std::istringstream:
std::string x = "123.456";
double y;
std::istringstream ss(x);
ss >> y;
The fourth would require a std::stringstream:
double x = 123.456;
std::stringstream ss;
ss << x;
ss.str(); // your string
int to double is an implicit conversion in C++, so it can be done without a cast.
double to int can also be done implicitly, but it should be used with caution, since many doubles either don't represent round integral values or are too big to be converted. Most compilers warn about the loss of precision, so the cast should be made explicit to get rid of the warning - if you are sure you are doing the right thing.
string to double, double to string: These are normally needed only for input/output (GUI, console, text files, ...). Numerical values should be handled inside the program as numbers, not as strings. Avoid double->string->double conversion chains inside your program when possible, since both conversions are inexact and prone to rounding and other errors.
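For completeness, a minimal sketch using the C++11 conveniences std::stod and std::to_string (assuming a C++11 compiler; note that std::to_string uses a fixed default precision, so the stream- or snprintf-based approaches above give more control over formatting):

#include <string>
#include <iostream>

int main() {
    double d = std::stod("123.456");       // string -> double, throws std::invalid_argument on bad input
    std::string s = std::to_string(42.5);  // double -> string, fixed default precision ("42.500000")
    std::cout << d << ' ' << s << '\n';
}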
In the book Game Coding Complete, 3rd Edition, the author mentions a technique to both reduce data structure size and increase access performance. In essence it relies on the fact that you gain performance when member variables are memory aligned. This is an obvious potential optimization that compilers would take advantage of, but by making sure each variable is aligned they end up bloating the size of the data structure.
Or that was his claim at least.
The real performance increase, he states, comes from using your brain and ensuring that your structure is properly designed to take advantage of the speed increases while avoiding the compiler bloat. He provides the following code snippet:
#pragma pack( push, 1 )
struct SlowStruct
{
    char c;
    __int64 a;
    int b;
    char d;
};

struct FastStruct
{
    __int64 a;
    int b;
    char c;
    char d;
    char unused[2]; // fill to 8-byte boundary for array use
};
#pragma pack( pop )
Using the above struct objects in an unspecified test he reports a performance increase of 15.6% (222ms compared to 192ms) and a smaller size for the FastStruct. This all makes sense on paper to me, but it fails to hold up under my testing:
Same timing results and the same size (accounting for the char unused[2])!
Now if the #pragma pack( push, 1 ) is isolated only to FastStruct (or removed completely) we do see a difference:
So, finally, here lies the question: do modern compilers (VS2010 specifically) already optimize for alignment, hence the lack of performance increase (but an increase in structure size as a side effect, as Mike McShaffry stated)? Or is my test not intensive enough/inconclusive to return any significant results?
For the tests I did a variety of tasks from math operations, column-major multi-dimensional array traversing/checking, matrix operations, etc. on the unaligned __int64 member. None of which produced different results for either structure.
In the end, even if there was no performance increase, this is still a useful tidbit to keep in mind for keeping memory usage to a minimum. But I would love it if there were a performance boost (no matter how minor) that I am just not seeing.
It is highly dependent on the hardware.
Let me demonstrate:
#include <iostream>  // cout, endl
#include <ctime>     // clock(), CLOCKS_PER_SEC
#include <cstring>   // memset()
using namespace std;

#pragma pack( push, 1 )
struct SlowStruct
{
    char c;
    __int64 a;
    int b;
    char d;
};

struct FastStruct
{
    __int64 a;
    int b;
    char c;
    char d;
    char unused[2]; // fill to 8-byte boundary for array use
};
#pragma pack( pop )
int main (void){
    int x = 1000;
    int iterations = 10000000;

    SlowStruct *slow = new SlowStruct[x];
    FastStruct *fast = new FastStruct[x];

    // Warm the cache.
    memset(slow, 0, x * sizeof(SlowStruct));

    clock_t time0 = clock();
    for (int c = 0; c < iterations; c++){
        for (int i = 0; i < x; i++){
            slow[i].a += c;
        }
    }
    clock_t time1 = clock();
    cout << "slow = " << (double)(time1 - time0) / CLOCKS_PER_SEC << endl;

    // Warm the cache.
    memset(fast, 0, x * sizeof(FastStruct));

    time1 = clock();
    for (int c = 0; c < iterations; c++){
        for (int i = 0; i < x; i++){
            fast[i].a += c;
        }
    }
    clock_t time2 = clock();
    cout << "fast = " << (double)(time2 - time1) / CLOCKS_PER_SEC << endl;

    // Print to avoid Dead Code Elimination
    __int64 sum = 0;
    for (int c = 0; c < x; c++){
        sum += slow[c].a;
        sum += fast[c].a;
    }
    cout << "sum = " << sum << endl;

    return 0;
}
Core i7 920 @ 3.5 GHz
slow = 4.578
fast = 4.434
sum = 99999990000000000
Okay, not much difference. But it's still consistent over multiple runs. So the alignment makes a small difference on the Nehalem Core i7.
Intel Xeon X5482 Harpertown @ 3.2 GHz (Core 2 - generation Xeon)
slow = 22.803
fast = 3.669
sum = 99999990000000000
Now take a look...
6.2x faster!!!
Conclusion:
You see the results. You decide whether or not it's worth your time to do these optimizations.
EDIT :
Same benchmarks but without the #pragma pack:
Core i7 920 @ 3.5 GHz
slow = 4.49
fast = 4.442
sum = 99999990000000000
Intel Xeon X5482 Harpertown @ 3.2 GHz
slow = 3.684
fast = 3.717
sum = 99999990000000000
The Core i7 numbers didn't change. Apparently it can handle misalignment without trouble for this benchmark.
The Core 2 Xeon now shows the same times for both versions. This confirms that misalignment is a problem on the Core 2 architecture.
Taken from my comment:
If you leave out the #pragma pack, the compiler will keep everything aligned so you don't see this issue. So this is actually an example of what could happen if you misuse #pragma pack.
Such hand-optimizations are generally long dead. Alignment is only a serious consideration if you're packing for space, or if you have an enforced-alignment type like SSE types. The compiler's default alignment and packing rules are intentionally designed to maximize performance, obviously, and whilst hand-tuning them can be beneficial, it's not generally worth it.
Probably, in your test program, the compiler never stored any structure on the stack and just kept the members in registers, which do not have alignment, which means that it's fairly irrelevant what the structure size or alignment is.
Here's the thing: There can be aliasing and other nasties with sub-word accessing, and it's no slower to access a whole word than to access a sub-word. So in general, it's no more efficient, in time, to pack more tightly than word size if you're only accessing, say, one member.
Visual Studio is a great compiler when it comes to optimization. However, bear in mind that the current "Optimization War" in game development is not on the PC arena. While such optimizations may quite well be dead on the PC, on the console platforms it's a completely different pair of shoes.
That said, you might want to repost this question on the specialized gamedev stackexchange site, you might get some answers directly from "the field".
Finally, your results are exactly the same up to the microsecond which is dead impossible on a modern multithreaded system -- I'm pretty sure you either use a very low resolution timer, or your timing code is broken.
Modern compilers align members on different byte boundaries depending on the size of the member. See the bottom of this.
Normally you really shouldn't care about structure padding, but if a type is going to have 1,000,000 instances or so, the rule of thumb is simply to order your members from biggest to smallest. I wouldn't recommend messing with the padding via #pragma directives.
The compiler is going to optimize for either size or speed, and unless you explicitly tell it, you won't know what you get. But if you follow the advice of that book you will win either way on most compilers: put the biggest, aligned things first in your struct, then the half-size stuff, then the single-byte stuff if any, and add some dummy variables to pad to alignment. Using bytes for things that don't have to be bytes can be a performance hit anyway; as a compromise, use ints for everything (you have to know the pros and cons of doing that).
The x86 has produced a lot of bad programmers and compilers because it allows unaligned accesses, which makes it hard for many folks to move to other platforms (that are taking over). Although unaligned accesses work on an x86, you take a serious performance hit. That is why it is important to know how compilers work, both in general and the particular one you are using.
With caches, and with modern platforms relying on caches to get any kind of performance, you want to be both aligned and packed. The simple rule being taught gives you both... in general. It is very good advice. Adding compiler-specific pragmas is not nearly as good: it makes the code non-portable, and it doesn't take much searching through SO or Google to find out how often a compiler ignores the pragma or doesn't do what you really wanted.
On some platforms the compiler doesn't have an option: objects of types bigger than char often have strict requirements to be at a suitably aligned address. Typically the alignment requirement is identical to the size of the object, up to the size of the biggest word supported by the CPU natively. That is, a short typically has to be at an even address, a long typically at an address divisible by 4, a double at an address divisible by 8, and SIMD vectors, for example, at an address divisible by 16.
Since C and C++ require members to be laid out in the order they are declared, the size of structures can differ quite a bit on the corresponding platforms. Since bigger structures effectively cause more cache misses, page misses, etc., there will be a substantial performance degradation when creating bigger structures than necessary.
Since I saw a claim that it doesn't matter: it matters on most (if not all) systems I'm using. Here is a simple example showing different sizes. How much this affects performance obviously depends on how the structures are used.
#include <iostream>
struct A
{
    char a;
    double b;
    char c;
    double d;
};

struct B
{
    double b;
    double d;
    char a;
    char c;
};

int main()
{
    std::cout << "sizeof(A) = " << sizeof(A) << "\n";
    std::cout << "sizeof(B) = " << sizeof(B) << "\n";
}
./alignment.tsk
sizeof(A) = 32
sizeof(B) = 24
The C standard specifies that fields within a struct must be allocated at increasing addresses. A struct which has eight variables of type 'int8' and seven variables of type 'int64', stored in that order, will take 64 bytes (pretty much regardless of a machine's alignment requirements). If the fields were ordered 'int8', 'int64', 'int8', ... 'int64', 'int8', the struct would take 120 bytes on a platform where 'int64' fields are aligned on 8-byte boundaries. Reordering the fields yourself will allow them to be packed more tightly. Compilers, however, will not reorder fields within a struct absent explicit permission to do so, since doing so could change program semantics.
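As an illustration of those numbers, a minimal sketch (struct and member names are mine; exact sizes depend on the ABI, but on a typical platform where int64_t is 8-byte aligned they match the 64 and 120 bytes described above):

#include <cstdint>
#include <cstdio>

// All eight 1-byte fields first, then the seven 8-byte fields: no padding needed.
struct Grouped {
    std::int8_t  b[8];
    std::int64_t w[7];
};

// Alternating 1-byte and 8-byte fields: 7 bytes of padding after each int8_t,
// plus trailing padding so the 8-byte alignment holds in arrays.
struct Interleaved {
    std::int8_t b0; std::int64_t w0;
    std::int8_t b1; std::int64_t w1;
    std::int8_t b2; std::int64_t w2;
    std::int8_t b3; std::int64_t w3;
    std::int8_t b4; std::int64_t w4;
    std::int8_t b5; std::int64_t w5;
    std::int8_t b6; std::int64_t w6;
    std::int8_t b7;
};

int main() {
    std::printf("%zu %zu\n", sizeof(Grouped), sizeof(Interleaved)); // typically: 64 120
}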