I've been told that when we write int a[100] = {1};, the elements after 1 will be initialized to 0. But I didn't find out how this is done. (And is this true?)
I've tried the code below to roughly find out the time cost:
#define M 500000
clock_t begin = clock();
int main(){
/*-1-*/ //int a[M] = {1};
/*-2-*/ //int a[M]; memset(a, 0, sizeof(int) * M);
/*-3-*/ //int a[M]; for(int i = 0; i < M; i++) a[i] = 0;
clock_t end = clock();
cout<<(double) (end - begin) / CLOCKS_PER_SEC<<endl;
return 0;
}
Here -1- to -3- are the three cases I tested, enabling one of them at a time. On my computer, the average time of the first two cases is 0.04 s, and the third case takes 0.08 s (each tested 10 times). So I guess the initialization is done like memset?
But I'm still confused whether these two are the same. If so, who is doing this?
Please excuse me for my poor English.
Thanks for the comments! And I've read the references and the linked questions.
I now understand that the compiler does set the remaining elements to 0, but how is this done at a lower level... maybe in assembly? I'm sorry, I didn't find any references about that.
I'm new to stackoverflow, should I just edit my question like this?
Update:
Thanks a lot! I think I'm clear now.
The standard says that when I write int a[100] = {1};, the rest will be set to 0. So my compiler will use some method to implement this, and different compilers may do it in different ways. (Thanks to #eerorika for helping me understand this!)
And thanks to #john for the site godbolt.org, where I can easily read the generated assembly. There I found that with x86-64 gcc 10.2 (compiled as C rather than C++), int a[100] = {1}; becomes:
leaq -400(%rbp), %rdx
movl $0, %eax
movl $50, %ecx
movq %rdx, %rdi
rep stosq
movl $1, -400(%rbp)
movl $0, %eax
It uses rep stosq to store zeros one quad-word (8 bytes, the size of two ints) at a time, 50 times, which zero-fills the whole array; the movl that follows then writes 1 into a[0].
And thanks to #largest_prime_is_463035818 for pointing out my mistake in measuring time.
Thank you all for clearing this up!
There is already an answer to your question, but to add the standard's wording: the latest C++ draft, 9.4.2 Initializers - Aggregates [dcl.init.aggr], defines which elements are explicitly initialized, and says that elements not explicitly initialized are copy-initialized from an empty initializer list ([dcl.init.aggr]/5.2).
I've been told that when we write int a[100] = {1};, the elements
after 1 will be initialized to 0.
You were told correctly. With int a[100] = {1};, a[0] is explicitly initialized from the initializer list, and the remaining elements a[1] through a[99] are copy-initialized from an empty initializer list, which for int means 0.
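A minimal sketch of that rule, for illustration only. The function name count_zero_elements is hypothetical and not part of the question; it simply inspects every element after the first.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the question): counts the zero elements
// of an array whose initializer list names only the first element.
std::size_t count_zero_elements() {
    int a[100] = {1};      // a[0] is explicitly initialized to 1
    assert(a[0] == 1);

    std::size_t zeros = 0;
    for (std::size_t i = 1; i < 100; ++i) {
        if (a[i] == 0) ++zeros;   // the remaining elements are zero
    }
    return zeros;
}
```

Calling count_zero_elements() should report 99 on any conforming compiler, whichever instructions it chooses to emit for the zero-fill.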
But I didn't find out how this is done.
It is done by the compiler.
And is this true?
Yes, this is true.
but how is this done in lower level
It can be done in any way that the compiler designer had chosen.
If you are interested in what your compiler did, you can read the assembly that it produced.
I'm sorry I didn't find references about that.
There is no reference for that.
Related
If there are similar questions, please direct me there; I searched for quite some time but didn't find anything.
Background:
I was just playing around and found some behavior I can't completely explain...
For primitive types, it looks like when there's an implicit conversion, the assignment operator = takes longer than it does with an explicit cast.
int iTest = 0;
long lMax = std::numeric_limits<long>::max();
for (int i=0; i< 100000; ++i)
{
// I had 3 such loops, each running 1 of the below lines.
iTest = lMax;
iTest = (int)lMax;
iTest = static_cast<int>(lMax);
}
The result is that the C-style cast and the C++-style static_cast perform the same on average (the timings differ from run to run, but with no visible difference), and they both outperform the implicit conversion.
Result:
iTest=-1, lMax=9223372036854775807
(iTest = lMax) used 276 microseconds
iTest=-1, lMax=9223372036854775807
(iTest = (int)lMax) used 191 microseconds
iTest=-1, lMax=9223372036854775807
(iTest = static_cast<int>(lMax)) used 187 microseconds
Question:
Why does the implicit conversion result in larger latency? My guess is that the assignment has to detect that the int overflows and adjust it to -1. But what exactly is going on in the assignment?
Thanks!
If you want to know why something is happening under the covers, the best place to look is ... wait for it ... under the covers :-)
That means examining the assembler language that is produced by your compiler.
A C++ environment is best thought of as an abstract machine for running C++ code. The standard (mostly) dictates behaviour rather than implementation details. Once you leave the bounds of the standard and start thinking about what happens underneath, the C++ source code is of little help anymore - you need to examine the actual code that the computer is running, the stuff output by the compiler (usually machine code).
It may be that the compiler is throwing away the loop because it's calculating the same thing every time so only needs do it once. It may be that it throws away the code altogether if it can determine you don't use the result.
There was a time many moons ago, when the VAX Fortran compiler (I did say many moons) outperformed its competitors by several orders of magnitude in a given benchmark.
That was for that exact reason. It had determined the results of the loop weren't used so had optimised the entire loop out of existence.
The other thing you might want to watch out for is the measuring tools themselves. When you're talking about durations of 1/10,000th of a second, your results can be swamped by the slightest bit of noise.
There are ways to alleviate these effects such as ensuring the thing you're measuring is substantial (over ten seconds for example), or using statistical methods to smooth out any noise.
But the bottom line is, it may be the measuring methodology causing the results you're seeing.
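One way to reduce both problems, sketched under stated assumptions: the function name time_narrowing_ns and the use of volatile objects are illustrative choices, not the original poster's code. The volatile qualifiers keep the optimizer from deleting the loop, and std::chrono::steady_clock is a monotonic clock suited to measuring intervals.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: times `iterations` narrowing assignments and
// returns the elapsed time in nanoseconds.
std::int64_t time_narrowing_ns(long iterations) {
    assert(iterations >= 0);
    // volatile forces a real load and a real store on every iteration,
    // so the compiler cannot discard the loop as unused work.
    volatile long source = 1234567L;
    volatile int sink = 0;

    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        sink = static_cast<int>(source);
    }
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

To get numbers worth comparing, one would call this with an iteration count large enough to run for a substantial time, repeat the measurement several times, and compare medians rather than single runs.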
#include <limits>
int iTest = 0;
long lMax = std::numeric_limits<long>::max();
void foo1()
{
iTest = lMax;
}
void foo2()
{
iTest = (int)lMax;
}
void foo3()
{
iTest = static_cast<int>(lMax);
}
Compiled with GCC 5 using -O3 yields:
__Z4foo1v:
movq _lMax(%rip), %rax
movl %eax, _iTest(%rip)
ret
__Z4foo2v:
movq _lMax(%rip), %rax
movl %eax, _iTest(%rip)
ret
__Z4foo3v:
movq _lMax(%rip), %rax
movl %eax, _iTest(%rip)
ret
They are all exactly the same.
Since you didn't provide a complete example I can only guess that the difference is due to something you aren't showing us.
My question is a very basic one. In C or C++:
Let's say the for loop is as follows,
for(int i=0; i<someArray[a+b]; i++) {
....
do operations;
}
My question is whether the calculation a+b, is performed for each for loop or it is computed only once at the beginning of the loop?
For my requirements, the value a+b is constant. If a+b is computed and someArray[a+b] is accessed on every iteration, I would use a temporary variable for someArray[a+b] to get better performance.
You can find out, when you look at the generated code
g++ -S file.cpp
and
g++ -O2 -S file.cpp
Look at the output file.s and compare the two versions. If someArray[a+b] can be reduced to a constant value for all loop cycles, the optimizer will usually do so and pull it out into a temporary variable or register.
It will behave as if it was computed each time. If the compiler is optimising and is capable of proving that the result does not change, it is allowed to move the computation out of the loop. Otherwise, it will be recomputed each time.
If you're certain the result is constant, and speed is important, use a variable to cache it.
is performed for each for loop or it is computed only once at the beginning of the loop?
If the compiler is not optimizing this code, then it will be computed each time. It is safer to use a temporary variable; it should not cost much.
First, the C and C++ standards do not specify how an implementation must evaluate i<someArray[a+b], just that the result must be as if it were performed each iteration (provided the program conforms to other language requirements).
Second, any C and C++ implementation of modest quality will have the goal of avoiding repeated evaluation of expressions whose value does not change, unless optimization is disabled.
Third, several things can interfere with that goal, including:
If a, b, or someArray are declared with scope visible outside the function (e.g., are declared at file scope) and the code in the loop calls other functions, the C or C++ implementation may be unable to determine whether a, b, or someArray are altered during the loop.
If the address of a, b, or someArray or its elements is taken, the C or C++ implementation may be unable to determine whether that address is used to alter those objects. This includes the possibility that someArray is an array passed into the function, so its address is known to other entities outside the function.
If a, b, or the elements of someArray are volatile, the C or C++ implementation must assume they can be changed at any time.
Consider this code:
void foo(int *someArray, int *otherArray)
{
int a = 3, b = 4;
for(int i = 0; i < someArray[a+b]; i++)
{
… various operations …
otherArray[i] = something;
}
}
In this code, the C or C++ implementation generally cannot know whether otherArray points to the same array (or an overlapping part) as someArray. Therefore, it must assume that otherArray[i] = something; may change someArray[a+b].
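The aliasing case can be made concrete with a small sketch. Both function names below are hypothetical; the first re-reads the bound on every iteration, as the language semantics require, and the second reads it once before the loop.

```cpp
#include <cassert>

// Re-reads limit[0] in the loop condition on every iteration.
int iterations_rereading_bound(int* limit, int* out) {
    assert(limit != nullptr && out != nullptr);
    int iterations = 0;
    for (int i = 0; i < limit[0]; ++i) {
        out[0] = 2;        // changes the bound if out aliases limit
        ++iterations;
    }
    return iterations;
}

// Reads limit[0] once and keeps the value in a local variable.
int iterations_cached_bound(int* limit, int* out) {
    assert(limit != nullptr && out != nullptr);
    const int bound = limit[0];
    int iterations = 0;
    for (int i = 0; i < bound; ++i) {
        out[0] = 2;        // no longer affects the loop bound
        ++iterations;
    }
    return iterations;
}
```

When both pointers refer to the same int holding 5, the first version stops after 2 iterations while the second runs 5. That difference is why the compiler may not hoist the load on its own, and why caching the value by hand is only correct when you know the bound really is constant.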
Note that I have answered regarding the larger expression someArray[a+b] rather than just the part you asked about, a+b. If you are only concerned about a+b, then only the factors that affect a and b are relevant, obviously.
Depends on how good the compiler is, what optimization levels you use and how a and b are declared.
For example, if a and/or b has volatile qualifier then compiler has to read it/them everytime. In that case, compiler can't choose to optimize it with the value of a+b. Otherwise, look at the code generated by the compiler to understand what your compiler does.
There's no standard behaviour for how this is calculated in either C or C++.
I will bet that if a and b do not change over the loop, it is optimized. Moreover, if someArray[a+b] is not touched, it is also optimized. This is actually more important, since fetching operations are quite expensive.
That is with any half-decent compiler with most basic optimizations. I will also go as far as saying that people who say it does always evaluate are plain wrong. It is not always for certain, and it is most probably optimized whenever possible.
The calculation is performed each for loop. Although the optimizer can be smart and optimize it out, you would be better off with something like this:
// C++ lets you create a const reference; you cannot do it in C, though
const some_array_type &last(someArray[a+b]);
for(int i=0; i<last; i++) {
...
}
It calculates every time. Use a variable :)
You can compile it and check the assembly code just to make sure.
But I think most compilers are clever enough to optimize this kind of stuff. (If you are using some optimization flag)
It might be calculated every time or it might be optimised. It will depend on whether a and b exist in a scope where the compiler can guarantee that no external function can change their values. That is, if they are in a global context, the compiler cannot guarantee that a function you call in the loop will not modify them (unless you don't call any functions). If they are only in local context, then the compiler can attempt to optimise that calculation away.
Generating both optimised and unoptimised assembly code is the easiest way to check. However, the best thing to do is not care, because the cost of that sum is so incredibly cheap. Modern processors are very, very fast, and the thing that is slow is pulling in data from RAM to the cache. If you want to optimise your code, profile it; don't guess.
The calculation a+b would be carried out on every iteration of the loop, and so would the lookup into someArray, so you could probably save a lot of processor time by setting a temporary variable outside the loop, for example (if the array is an array of ints, say):
int myLoopVar = someArray[a+b];
for(int i=0; i<myLoopVar; i++)
{
....
do operations;
}
Very simplified explanation:
If the value at array position a+b were a mere 5, for example, that would be 5 calculations and 5 lookups, so 10 operations, which would be replaced by 8 by using a variable outside the loop (5 accesses (1 per iteration of the loop), 1 calculation of a+b, 1 lookup and 1 assignment to the new variable): not so great a saving. If, however, you are dealing with larger values, for example the value stored in the array at a+b is 100, you would potentially be doing 100 calculations and 100 lookups, so 200 operations, versus 103 operations if you have a variable outside the loop (100 accesses (1 per iteration of the loop), 1 calculation of a+b, 1 lookup and 1 assignment to the new variable).
The majority of the above, however, is dependent on the compiler: depending upon which switches you utilise, what optimisations the compiler can apply automatically etc., the code may well be optimised without you having to make any changes to it. The best thing to do is weigh up the pros and cons of each approach specifically for your current implementation, as what may suit a large number of iterations may not be most efficient for a small number, or perhaps memory may be an issue which would dictate a differing style to your program . . . Well you get the idea :)
Let me know if you need any more info:)
for the following code:
int a = 10, b = 10;
for(int i=0; i< (a+b); i++) {} // a and b do not change in the body of loop
you get the following assembly:
L3:
addl $1, 12(%esp) ;increment i
L2:
movl 4(%esp), %eax ;load a into EAX
movl 8(%esp), %edx ;load b into EDX
addl %edx, %eax ;EAX = EAX + EDX, i.e. EAX = a + b
cmpl 12(%esp), %eax ;compare EAX with i
jg L3 ;if EAX > i, jump to L3 label
if you apply the compiler optimization, you get the following assembly:
movl $20, %eax ;move 20 (a+b) to EAX
L3:
subl $1, %eax ;decrement EAX
jne L3 ;jump back while EAX is not zero
movl $0, %eax ;move 0 to EAX
Basically, in this case, with my compiler (MinGW 4.8.0) and without optimization, the loop will do "the calculation" on every iteration regardless of whether you're changing the conditional variables within the loop or not (I haven't posted assembly for this, but take my word for it, or even better, don't, and disassemble the code yourself).
When you apply the optimization, the compiler will do some magic and churn out a set of instructions that bear little resemblance to the source.
If you don't feel like optimizing your loop through a compiler option (-On), then declaring one variable and assigning a+b to it will reduce your assembly by an instruction or two.
int a = 10, b = 10;
const int c = a + b;
for(int i=0; i< c; i++) {}
assembly:
L3:
addl $1, 12(%esp)
L2:
movl 12(%esp), %eax
cmpl (%esp), %eax
jl L3
movl $0, %eax
Keep in mind that the assembly code I posted here is only the relevant snippet; there's a bit more, but it's not relevant as far as the question goes.
I have an enormous array:
int* arr = new int[BIGNUMBER];
How can I fill it with a single value really fast? Normally I would do
for(int i = 0; i < BIGNUMBER; i++)
arr[i] = 1;
but I think it would take too long.
Can I use memcpy or similar?
You could try using the standard function std::uninitialized_fill_n:
#include <memory>
// ...
std::uninitialized_fill_n(arr, BIGNUMBER, 1);
In any case, when it comes to performance, the rule is to always make measurements to back up your assumptions - especially if you are going to abandon a clear, simple design to embrace a more complex one because of an alleged performance improvement.
EDIT:
Notice that - as Benjamin Lindley mentioned in the comments - for trivial types std::uninitialized_fill_n does not bring any advantage over the more obvious std::fill_n. The advantage would exist for non-trivial types, since std::uninitialized_fill would allow you to allocate a memory region and then construct objects in place.
However, one should not fall into the trap of calling std::uninitialized_fill_n for a memory region that is not uninitialized. The following, for instance, would give undefined behavior:
my_object* v = new my_object[BIGNUMBER]; // elements are already constructed here
std::uninitialized_fill_n(v, BIGNUMBER, my_object(42)); // UB!
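For contrast, here is a sketch of a use that is well-defined, because the storage is obtained raw and no constructor has run before the fill. Widget and sum_of_widgets are hypothetical names chosen for illustration, with Widget standing in for my_object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical type standing in for my_object: no default constructor.
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
};

// Hypothetical helper: constructs `count` copies of Widget(seed) in raw
// storage, sums their values, then destroys them and frees the storage.
long sum_of_widgets(std::size_t count, int seed) {
    assert(count > 0);
    // Raw storage only; no Widget constructor has run yet.
    void* raw = ::operator new(count * sizeof(Widget));
    Widget* first = static_cast<Widget*>(raw);

    // Copy-construct each element in place.
    std::uninitialized_fill_n(first, count, Widget(seed));

    long sum = 0;
    for (std::size_t i = 0; i < count; ++i) sum += first[i].value;

    // Objects built by hand must be destroyed by hand before freeing.
    for (std::size_t i = 0; i < count; ++i) first[i].~Widget();
    ::operator delete(raw);
    return sum;
}
```

In everyday code a std::vector does this bookkeeping for you; the sketch only shows where std::uninitialized_fill_n fits when you manage raw storage yourself.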
An alternative to a dynamic array is std::vector<int>, with the constructor that accepts an initial value for each element:
std::vector<int> v(BIGNUMBER, 1); // 'BIGNUMBER' elements, all with value 1.
As already stated, performance would need to be measured. This approach provides the additional benefit that the memory will be freed automatically.
Some possible alternatives to Andy Prowl's std::uninitialized_fill_n() solution, just for posterity:
If you are lucky and your value is composed of all the same bytes, memset will do the trick.
Some implementations offer a 16-bit version memsetw, but that's not everywhere.
GCC has an extension for Designated Initializers that can fill ranges.
I've worked with a few ARM systems that had libraries that had accelerated CPU and DMA variants of word-fill, hand coded in assembly -- you might look and see if your platform offers any of this, if you aren't terribly concerned about portability.
Depending on your processor, even looking into loops around SIMD intrinsics may provide a boost; some of the SIMD units have load/store pipelines that are optimized for moving data around like this. On the other hand you may take severe penalties for moving between register types.
Last but definitely not least, to echo some of the commenters: you should test and see. Compilers tend to be pretty good at recognizing and optimizing patterns like this -- you probably are just trading off portability or readability with anything other than the simple loop or uninitialized_fill_n.
You may be interested in prior questions:
Is there memset() that accepts integers larger than char?
initializing an array of ints
How to initialize all members of an array to the same value?
Under Linux/x86 gcc with optimizations turned on, your code will compile to the following:
rax = arr
rdi = BIGNUMBER
400690: c7 04 90 01 00 00 00 movl $0x1,(%rax,%rdx,4)
Store the immediate int 1 at address rax + rdx*4, i.e. arr[rdx]
400697: 48 83 c2 01 add $0x1,%rdx
Increment register rdx
40069b: 48 39 fa cmp %rdi,%rdx
Cmp rdi to rdx
40069e: 75 f0 jne 400690 <main+0xa0>
If BIGNUMBER has not been reached yet, jump back to the start.
It takes about 1 second per gigabyte on my machine, but most of that I bet is paging in physical memory to back the uninitialized allocation.
Just unroll the loop by, say, 8 or 16 times. Functions like memcpy are fast, but they're really there for convenience, not to be faster than anything you could possibly write:
for (i = 0; i < BIGNUMBER-8; i += 8){
a[i+0] = 1; // this gets rid of the test against BIGNUMBER, and the increment, on 7 out of 8 items.
a[i+1] = 1; // the compiler should be able to see that a[i] is being calculated repeatedly here
...
a[i+7] = 1;
}
for (; i < BIGNUMBER; i++) a[i] = 1;
The compiler might be able to unroll the loop for you, but why take the chance?
Use memset or memcpy
memset(arr, 0, sizeof(int) * BIGNUMBER);
Try using memset?
memset(arr, 1, BIGNUMBER);
http://www.cplusplus.com/reference/cstring/memset/
memset(arr, 1, sizeof(int) * BIGNUMBER);
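One thing worth checking before relying on the memset calls above: memset writes its value into every byte, not into every int. The sketch below compares it with std::fill_n; the two function names are hypothetical, and a std::vector<int> is used in place of the raw arr only to keep the example leak-free.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Sets every BYTE of the buffer to 1, which is what memset does.
std::vector<int> fill_bytes_with_memset(std::size_t n) {
    assert(n > 0);
    std::vector<int> v(n);
    std::memset(v.data(), 1, n * sizeof(int));
    return v;
}

// Sets every ELEMENT of the buffer to 1.
std::vector<int> fill_elements_with_fill_n(std::size_t n) {
    assert(n > 0);
    std::vector<int> v(n);
    std::fill_n(v.begin(), n, 1);
    return v;
}
```

With a 4-byte int, each element produced by the memset version is 0x01010101 (16843009), not 1. memset therefore only gives the intended result for values whose bytes are all identical, such as 0 or -1 on two's complement machines.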
I have the following line in a function to count the number of 'G' and 'C' in a sequence:
count += (seq[i] == 'G' || seq[i] == 'C');
Are compilers smart enough to do nothing when they see 'count += 0' or do they actually lose time 'adding' 0 ?
Generally
x += y;
is faster than
if (y != 0) { x += y; }
Even if y = 0, because there is no branch in the first option. If it's really important, you'll have to check the compiler output, but don't assume your way is faster just because it sometimes skips an add.
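The two forms side by side, as a sketch applied to the counting problem from the question. The function names are hypothetical, and which one is faster on a given machine still has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Adds the result of the comparison (0 or 1) unconditionally.
std::size_t count_gc_adding(const std::string& seq) {
    std::size_t count = 0;
    for (char c : seq) {
        count += (c == 'G' || c == 'C');
    }
    return count;
}

// Tests first and only increments when there is a match.
std::size_t count_gc_testing(const std::string& seq) {
    std::size_t count = 0;
    for (char c : seq) {
        if (c == 'G' || c == 'C') ++count;
    }
    return count;
}
```

Both return the same count for any input; an optimizing compiler is free to turn either source form into the same machine code.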
Honestly, who cares??
[Edit:] This actually turned out somewhat interesting. Contrary to my initial assumption, in unoptimized compilation, n += b; is better than n += b ? 1 : 0;. However, with optimizations, the two are identical. More importantly, though, the optimized version of this form is always better than if (b) ++n;, which always creates a cmp/je instruction. [/Edit]
If you're terribly curious, put the following code through your compiler and compare the resulting assembly! (Be sure to test various optimization settings.)
int n;
void f(bool b) { n += b ? 1 : 0; }
void g(bool b) { if (b) ++n; }
I tested it with GCC 4.6.1: With g++ and with no optimization, g() is shorter. With -O3, however, f() is shorter:
g():
cmpb $0, 4(%esp)
je .L1
addl $1, n
.L1:
rep
f():
movzbl 4(%esp), %eax
addl %eax, n
Note that the optimization for f() actually does what you wrote originally: It literally adds the value of the conditional to n. This is in C++ of course. It'd be interesting to see what the C compiler would do, absent a bool type.
Another note, since you tagged this C as well: In C, if you don't use bools (from <stdbool.h>) but rather ints, then the advantage of one version over the other disappears, since both now have to do some sort of testing.
It depends on your compiler, its optimization options that you used and its optimization heuristics. Also, on some architectures it may be faster to add than to perform a conditional jump to avoid the addition of 0.
Compilers will NOT optimize away the +0 unless the expression on the right is a compile-time constant equal to zero. But adding zero is much faster on all modern processors than branching (if then) to attempt to avoid the add. So the compiler ends up doing the smartest thing available in the given situation: simply adding 0.
Some are smart enough and some are not. It's highly dependent on the optimizer implementation.
The optimizer might also determine that if is slower than + so it will still do the addition.
Haven't used C++ in a while. I've been depending on my Java compiler to do optimization.
What is the most optimized way to write a for loop in C++? Or is it all the same now with modern compilers? In the 'old days' there was a difference.
for (int i=1; i<=100; i++)
OR
int i;
for (i=1; i<=100; i++)
OR
int i = 1;
for ( ; i<=100; i++)
Is it the same in C?
EDIT:
Okay, so the overwhelming consensus is to use the first case and let the compiler optimize it if it wants to.
I'd say that trivial things like this are probably optimized by the compiler, and you shouldn't worry about them. The first option is the most readable, so you should use that.
EDIT: Adding what other answers said, there is also the difference that if you declare the variable in the loop initializer, it will cease to exist after the loop ends.
The difference is scope.
for(int i = 1; i <= 100; ++i)
is generally preferable because then the scope of i is restricted to the for loop. If you declare it before the for loop, then it continues to exist after the for loop has finished and could clash with other variables. If you're only using it in the for loop, there's no reason to let it exist longer than that.
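A short sketch of the scope difference, with hypothetical function names. In the first function each i lives only for its own loop, so the name can be reused freely; in the second, i is declared before the loop and is still readable afterwards.

```cpp
#include <cassert>

// Each loop declares its own i; neither is visible outside its loop.
int sum_two_ranges() {
    int total = 0;
    for (int i = 1; i <= 3; ++i) total += i;     // this i ends with the loop
    for (int i = 10; i <= 12; ++i) total += i;   // a fresh, unrelated i
    assert(total == 39);
    return total;
}

// i is declared outside the loop, so it outlives the loop.
int value_of_i_after_loop() {
    int i = 1;
    for (; i <= 100; ++i) {
        // empty body: only the final value of i matters here
    }
    return i;   // the first value that failed the test i <= 100
}
```

The second form is the one to choose only when code after the loop really needs the final value, which here is 101.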
Let's say the original poster had a loop they really wanted optimized - every instruction counted. How can we figure out - empirically - the answer to his question?
gcc at least has a useful, if uncommonly used switch, '-S'. It dumps the assembly code version of the .c file and can be used to answer questions like the OP poses. I wrote a simple program:
int main( )
{
int sum = 0;
for(int i=1;i<=10;++i)
{
sum = sum + i;
}
return sum;
}
And ran: gcc -O0 -std=c99 -S main.c, creating the assembly version of the main program. Here's the contents of main.s (with some of the fluff removed):
movl $0, -8(%rbp)
movl $1, -4(%rbp)
jmp .L2
.L3:
movl -4(%rbp), %eax
addl %eax, -8(%rbp)
addl $1, -4(%rbp)
.L2:
cmpl $10, -4(%rbp)
jle .L3
You don't need to be an assembly expert to figure out what's going on. movl moves values, addl adds things, cmpl compares and jle stands for 'jump if less than or equal'; $ is for constants. It's loading 0 into something - that must be 'sum', 1 into something else - ah, 'i'! A jump to L2 where we do the compare to 10, jump to L3 to do the add. Fall through to L2 for the compare again. Neat! A for loop.
Change the program to:
int main( )
{
int sum = 0;
int i=1;
for( ;i<=10;++i)
{
sum = sum + i;
}
return sum;
}
Rerun gcc and the resultant assembly will be very similar. There's some stuff going on with recording line numbers, so they won't be identical, but the assembly ends up being the same. Same result with the last case. So, even without optimization, the code's just about the same.
For fun, rerun gcc with '-O3' instead of '-O0' to enable optimization and look at the .s file.
main:
movl $55, %eax
ret
gcc not only figured out we were doing a for loop, but also realized it was to be run a constant number of times, did the loop for us at compile time, chucked out 'i' and 'sum' and hard coded the answer - 55! That's FAST - albeit a bit contrived.
Moral of the story? Spend your time on ensuring your code is clean and well designed. Code for readability and maintainability. The guys that live on mountain dew and cheetos are way smarter than us and have taken care of most of these simple optimization problems for us. Have fun!
It's the same. The compiler will optimize these to the same thing.
Even if they weren't the same, the difference compared to the actual body of your loop would be negligible. You shouldn't worry about micro-optimizations like this. And you shouldn't make micro-optimizations unless you are performance profiling to see if it actually makes a difference.
It's the same in terms of speed. The compiler will optimize if you do not use i later.
In terms of style - I'd put the definition in the loop construct, as it reduces the risk that you'll conflict if you define another i later.
Don't worry about micro-optimizations, let the compiler do it. Pick whichever is most readable. Note that declaring a variable within a for initialization statement scopes the variable to the for statement (C++03 § 6.5.3 1), though the exact behavior of compilers may vary (some let you pick). If code outside the loop uses the variable, declare it outside the loop. If the variable is truly local to the loop, declare it in the initializer.
It has already been mentioned that the main difference between the two is scope. Make sure you understand how your compiler handles the scope of an int declared as
for (int i = 1; ...;...)
I know that when using MSVC++6, i is still in scope outside the loop, just as if it were declared before the loop. This behavior is different from VS2005 and, though I'd have to check, I think also from the last version of gcc that I used. In both of those compilers, the variable was only in scope inside the loop.
for(int i = 1; i <= 100; ++i)
This is easiest to read, except for ANSI C / C89 where it is invalid.
A c++ for loop is literally a packaged while loop.
for (int i=1; i<=100; i++)
{
some foobar ;
}
To the compiler, the above code is equivalent to the code below (provided the body contains no continue, which in the for loop still executes i++).
{
int i=1 ;
while (i<=100){
some foobar ;
i++ ;
}
}
Note the int i=1 ; is contained within a dedicated scope that encloses only it and the while loop.
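The rewrite can be sketched with a body that uses continue, the one place where the two forms need different handling. Both function names are hypothetical.

```cpp
#include <cassert>

// for form: continue jumps to the increment, so i++ always runs.
int sum_odds_for(int limit) {
    assert(limit >= 0);
    int sum = 0;
    for (int i = 1; i <= limit; i++) {
        if (i % 2 == 0) continue;
        sum += i;
    }
    return sum;
}

// while form: the increment has to be repeated before continue,
// otherwise the loop would never advance past the first even number.
int sum_odds_while(int limit) {
    assert(limit >= 0);
    int sum = 0;
    {
        int i = 1;                 // same dedicated scope as in the text
        while (i <= limit) {
            if (i % 2 == 0) { i++; continue; }
            sum += i;
            i++;
        }
    }
    return sum;
}
```

For a limit of 10 both return 25, the sum of 1, 3, 5, 7 and 9.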
It's all the same.