g++ compiler optimization - c++

list<mpz_class> baseFactor;
1)
int *tab = new int [baseFactor.size()]; //baseFactor.size() ~= 20000
for (size_t i = 0; i < baseFactor.size(); i++) {
cout << tab[i] << endl;
}
// Total time: 2.620790
2)
int size = baseFactor.size();
int *tab = new int [size]; //baseFactor.size() ~= 20000
for (size_t i = 0; i < size; i++) {
cout << tab[i] << endl;
}
//Total time: 0.366500
Why doesn't the g++ compiler optimize code 1) into code 2)?

Depending on where baseFactor is defined (global variable?), it can be difficult for the compiler to prove that size() always returns the same value.
If it cannot prove that, the call cannot be moved out of the loop.

For the first one to be optimized into the second, it would require that baseFactor.size() never changes during the loop.
Of course it probably doesn't, but does the compiler know that?
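A minimal sketch of why not (touch is a hypothetical function in another translation unit, and int stands in for mpz_class): any call the compiler can't see into might modify a global container, so size() must be re-evaluated each time.
#include <cstddef>
#include <list>

std::list<int> baseFactor; // global: visible to other translation units

void touch(); // defined elsewhere; might call baseFactor.push_back(...)

void loop()
{
    for (std::size_t i = 0; i < baseFactor.size(); i++) // size() can't be hoisted
        touch();
}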

A std::list container is a linked list, and computing its size may be costly (an O(n) algorithm; that changed to a required O(1) in the C++11 standard, IIRC). The compiler has no idea that the body of your function is not changing baseFactor, so in the first case its size is recomputed at every iteration (at every test of the for loop), and in the second case only once.
Maybe you should consider using std::vector instead.
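For illustration, a minimal sketch of that advice (int stands in for mpz_class, and printAll is a made-up name): size() is O(1) for a vector, and hoisting it by hand keeps the loop bound obvious to both the reader and the optimizer.
#include <cstddef>
#include <iostream>
#include <vector>

std::vector<int> baseFactor(20000); // stand-in data, roughly the size from the question

void printAll()
{
    const std::size_t size = baseFactor.size(); // hoisted once, like version 2)
    for (std::size_t i = 0; i < size; i++)
        std::cout << baseFactor[i] << '\n';
}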


heap-buffer-overflow issue in array C++ [duplicate]

So the user inputs values within the first for loop and the vector pushes them back, creating its own indices. The problem arises in the second for loop; I think it has something to do with sizeof(v)/sizeof(vector).
vector<int> v;
for (int i; cin >> i;)
{
v.push_back(i);
cout << v.size() << endl;
}
for (int i =0; i < sizeof(v)/sizeof(vector); i++)
{
cout << v[i] << endl;
}
How will I determine the size of the vector after entering values?
(I'm quite new to C++, so if I have made a stupid mistake, I apologize.)
Use the vector::size() method: i < v.size().
The sizeof operator yields the size in bytes of the object or expression at compile time, which for a std::vector is a constant, regardless of how many elements it holds.
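A tiny demonstration of that point (sketch only; the printed values are platform-dependent, 24 being typical for a 64-bit libstdc++):
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::cout << sizeof(v) << '\n'; // size of the vector object itself
    v.push_back(1);
    v.push_back(2);
    std::cout << sizeof(v) << '\n'; // unchanged: the elements live on the heap
    std::cout << v.size() << '\n';  // 2, the element count you actually want
}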
How will I determine the size of the vector after entering values?
v.size() is the number of elements in v. Thus,
another easy-to-understand style for the second loop is:
for (int i=0; i<v.size(); ++i)
A different aspect of the 'size' function you might find interesting:
on Ubuntu 15.10, g++ 5.2.1,
Using a 32-byte class UI224, sizeof(UI224) reports 32 (as expected).
Note that
sizeof(std::vector<UI224>) with 0 elements reports 24
sizeof(std::vector<UI224>) with 10 elements reports 24
sizeof(std::vector<UI224>) with 100 elements reports 24
sizeof(std::vector<UI224>) with 1000 elements reports 24
Note also that
sizeof(std::vector<uint8_t>) with 0 elements reports 24
Thus, in your line
for (int i =0; i < sizeof(v) / sizeof(vector); i++)
^^^^^^^^^ ^^^^^^^^^^^^^^
the 2 values being divided are probably not what you are expecting.
http://cppreference.com is a great site to look up member functions of STL containers.
That being said, you are looking for the vector::size() member function.
for (int i = 0; i < v.size(); i++)
{
cout << v[i] << endl;
}
If you have at your disposal a compiler that supports C++11 onwards, you can use the new range-based for loops:
for(auto i : v)
{
cout << i << endl;
}
A std::vector is a class. It's not the actual data, but a class that manages it.
Use the size() member function, v.size(), to get the size of the actual data.
Coliru example:
http://coliru.stacked-crooked.com/a/de0bffb1f4d8c836
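Putting the fix together, a minimal corrected sketch of the program from the question:
#include <cstddef>
#include <iostream>
#include <vector>
using std::cin;
using std::cout;

int main()
{
    std::vector<int> v;
    for (int i; cin >> i;) // read until input fails
        v.push_back(i);

    for (std::size_t i = 0; i < v.size(); i++) // size_t matches size()'s type
        cout << v[i] << '\n';
}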

How to deal with large data, such as arrays, that causes a stack overflow in C++?

It's my first time dealing with large numbers or arrays, and I can't avoid overflowing the stack. I tried to use long long to avoid it, but the error points at the int main line:
CODE:
#include <iostream>
using namespace std;
int main()
{
long long n=0, city[100000], min[100000] = {10^9}, max[100000] = { 0 };
cin >> n;
for (int i = 0; i < n; i++) {
cin >> city[i];
}
for (int i = 0; i < n; i++)
{//min
for (int s = 0; s < n; s++)
{
if (city[i] != city[s])
{
if (min[i] >= abs(city[i] - city[s]))
{
min[i] = abs(city[i] - city[s]);
}
}
}
}
for (int i = 0; i < n; i++)
{//max
for (int s = 0; s < n; s++)
{
if (city[i] != city[s])
{
if (max[i] <= abs(city[i] - city[s]))
{
max[i] = abs(city[i] - city[s]);
}
}
}
}
for (int i = 0; i < n; i++) {
cout << min[i] << " " << max[i] << endl;
}
}
**ERROR:**
Severity Code Description Project File Line Suppression State
Warning C6262 Function uses '2400032' bytes of stack: exceeds /analyze:stacksize '16384'. Consider moving some data to heap.
then it opens chkstk.asm and shows error in :
test dword ptr [eax],eax ; probe page.
Small optimistic remark:
100,000 is not a large number for your computer! (You're also not dealing with that many arrays, but with arrays of that size.)
The error message describes what goes wrong pretty well:
You're creating arrays on your current function's "scratchpad" (the stack), which has a very limited size!
This is C++, so you really should do things the (modern-ish) C++ way and avoid manually handling large data objects when you can.
So, replace
long long n=0, city[100000], min[100000] = {10^9}, max[100000] = { 0 };
with the following. (I don't see any case where you'd want long long here; presumably, you want a 64-bit variable? Note also that 10^9 is "10 XOR 9", which is 3, not "10 to the power of 9".)
constexpr size_t size = 100000;
constexpr int64_t default_min = 1'000'000'000;
uint64_t n = 0;
std::vector<int64_t> city(size);
std::vector<int64_t> min_value(size, default_min);
std::vector<int64_t> max_value(size, 0);
Additional remarks:
Notice how I took your 100000 and your 10⁹ and made them constexpr constants? Do that! Whenever some non-zero "magic constant" appears in your code, it's a good time to ask yourself "will I ever need that value somewhere else, too?" and "would it make sense to give this number a name explaining what it is?". If you answer either with "yes", make a new constexpr constant, even just directly above where you use it. The compiler treats it exactly as if you had written the literal number in place, so this costs no extra memory or CPU cycles.
As a matter of fact, that fixed size is itself a problem! Pre-allocating a not-really-large-but-still-unnecessarily-large array is just a bad idea. Instead, read n first, then use that n to make std::vectors of that size.
Don't use using namespace std;, for multiple reasons, chief among them that your min and max variables would now shadow std::min and std::max, so whenever you call something, you never know whether you're actually calling what you mean to, or just the function of the same name from the std:: namespace. Instead, using std::cout; using std::cin; would do for you here.
This might be beyond your current learning level (that's fine!), but
for (int i = 0; i < n; i++) {
cin >> city[i];
}
is inelegant, and with the std::vector approach, if you make your std::vector really have length n, it can be written nicely as:
for (auto &value: city) {
cin >> value;
}
This will also make sure you're not accidentally reading more values than you mean to when changing the length of that city storage one day.
It looks as if you're trying to find the minimum and maximum absolute distance between city values. But you do it in an incredibly inefficient way, needing nested loops over 10⁵·10⁵ = 10¹⁰ iterations.
Start with the maximum distance: assume your city vector, array (whatever!) were sorted. What are the two elements with the greatest absolute distance?
If you had a sorted array/vector: how would you find the two elements with the smallest distance?
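For reference, a minimal sketch of where those two hints lead (distances is a made-up name; city is the vector from the rewritten code above, assumed non-empty, and the question's special-casing of equal values is ignored): one sort replaces the 10¹⁰-step nested loops.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

void distances(const std::vector<int64_t>& city)
{
    if (city.empty()) return;
    std::vector<int64_t> sorted = city;
    std::sort(sorted.begin(), sorted.end());

    // For any element x, the greatest absolute distance is to one of the
    // two extremes of the sorted range.
    const int64_t lo = sorted.front();
    const int64_t hi = sorted.back();
    for (int64_t x : city) {
        int64_t max_dist = std::max(x - lo, hi - x);
        (void)max_dist; // print or store as needed
    }

    // The smallest absolute distance is between some pair of neighbors
    // in sorted order, so one linear pass finds it.
    int64_t min_gap = std::numeric_limits<int64_t>::max();
    for (std::size_t i = 1; i < sorted.size(); ++i)
        min_gap = std::min(min_gap, sorted[i] - sorted[i - 1]);
    (void)min_gap;
}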

Second FOR loop is being skipped

I am not getting any compile-time errors, but whenever I run the program the second for loop gets skipped and the program stops. I put a cout after it and that executed, but the second for loop was still skipped.
#include <iostream>
using std::cout;
int main()
{
int numbers[5];
for(int i; i < 5; i++)
{
std::cin >> numbers[i];
}
for(int i; i < 5; i++)
{
cout << numbers[i] << "\t";
}
}
When you write
int i;
That declares an int named i without initializing it. Its value is said to be indeterminate. There is not much you can do with an indeterminate value, for example
int x = i;
invokes undefined behavior. When your code has UB, almost anything can happen. You could get no output at all, or some gibberish, or Hamlet printed on the screen (though this happens rarely).
You are not initializing the counter in either loop. That the first one appears to work is purely a matter of luck. Or rather bad luck, because appearing to work is the worst incarnation of wrong code.
Many guidelines suggest to initialize variables as soon as you declare them. Often this is put another way: Only declare a variable when you can initialize it. In any case, seeing int i; should make you shiver in fear ;).
Your loops should look like this:
for(int i=0; i < 5; i++)
{
std::cin >> numbers[i];
}
for(int i=0; i < 5; i++)
{
std::cout << numbers[i] << "\t";
}
On the other hand, if you want to iterate over the whole container, you can use a range-based for loop, which eliminates the chance of such a mistake completely:
for(auto& n : numbers)
{
std::cin >> n;
}
for(const auto& n : numbers)
{
std::cout << n << "\t";
}
Note that use of uninitialized variables can be diagnosed by most compilers. With -Wall -Werror, clang refuses to compile your code. Unfortunately, gcc fails to diagnose the issue in your code (while it quite reliably diagnoses cases such as int i; int x = i;). Hence, you should pay attention to warnings, and to make sure you cannot miss them, you can treat them as errors.
In both of the for loops, try replacing int i; with int i = 0;.
From Wikipedia: Uninitialized variable in C
A common assumption made by novice programmers is that all variables are set to a known value, such as zero, when they are declared. While this is true for many languages, it is not true for all of them, and so the potential for error is there
In C/C++, when you don't initialize a variable, it gets whatever value happens to be at that spot in memory or on the stack; you can't expect a specific value.
It's considered good practice to always initialize variables in C/C++. So just change your code to:
for (int i = 0; i < 5; i++) for both loops, and it will work.
Extra
Your IDE should warn you about uninitialized variables; check tools like cppcheck or any kind of linter for C/C++.
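For reference, here is the diagnosable pattern in its smallest form (a sketch of the int i; int x = i; case mentioned above); compiled with -Wall -Werror, clang rejects it, and gcc warns about this direct case too:
int main()
{
    int i;     // indeterminate value
    int x = i; // undefined behavior: reading an uninitialized int
    return x;
}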

Why is the auto-vectorized version of this program fragment slower than the simple version

In a larger numerical computation, I have to perform the trivial task of summing up the products of the elements of two vectors. Since this task needs to be done very often, I tried to make use of the auto-vectorization capabilities of my compiler (VC2015). I introduced a temporary vector, where the products are saved in a first loop, and then performed the summation in a second loop. Optimization was set to full and fast code was preferred. This way, the first loop got vectorized by the compiler (I know this from the compiler output).
The result was surprising. The vectorized code performed 3 times slower on my machine (core i5-4570 3.20 GHz) than the simple code. Could anybody explain why and what might improve the performance? I've put both versions of the algorithm fragment into a minimal running example, which I used myself for testing:
#include "stdafx.h"
#include <vector>
#include <Windows.h>
#include <iostream>
using namespace std;
int main()
{
// Prepare timer
LARGE_INTEGER freq,c_start,c_stop;
QueryPerformanceFrequency(&freq);
int size = 20000000; // size of data
double v = 0;
// Some data vectors. The data inside doesn't matter
vector<double> vv(size);
vector<double> tt(size);
vector<float> dd(size);
// Put random values into the vectors
for (int i = 0; i < size; i++)
{
tt[i] = rand();
dd[i] = rand();
}
// The simple version of the algorithm fragment
QueryPerformanceCounter(&c_start); // start timer
for (int p = 0; p < size; p++)
{
v += tt[p] * dd[p];
}
QueryPerformanceCounter(&c_stop); // Stop timer
cout << "Simple version took: " << ((double)(c_stop.QuadPart - c_start.QuadPart)) / ((double)freq.QuadPart) << " s" << endl;
cout << v << endl; // We use v once. This avoids its calculation to be optimized away.
// The version that is auto-vectorized
for (int i = 0; i < size; i++)
{
tt[i] = rand();
dd[i] = rand();
}
v = 0;
QueryPerformanceCounter(&c_start); // start timer
for (int p = 0; p < size; p++) // This loop is vectorized according to compiler output
{
vv[p] = tt[p] * dd[p];
}
for (int p = 0; p < size; p++)
{
v += vv[p];
}
QueryPerformanceCounter(&c_stop); // Stop timer
cout << "Vectorized version took: " << ((double)(c_stop.QuadPart - c_start.QuadPart)) / ((double)freq.QuadPart) << " s" << endl;
cout << v << endl; // We use v once. This avoids its calculation to be optimized away.
cin.ignore();
return 0;
}
You added a large amount of work by storing the products in a temporary vector.
For such a simple computation on large data, the CPU time that you expect to save by vectorization doesn't matter. Only memory references matter.
You added memory references, so it runs slower.
I would have expected the compiler to optimize the original version of that loop. I doubt the optimization would affect the execution time (because it is dominated by memory access regardless), but it should be visible in the generated code. If you wanted to hand-optimize code like that, a temporary vector is always the wrong way to go. The right direction is the following (for simplicity, I assumed size is even):
double v1 = 0; // second accumulator
for (int p = 0; p < size; p += 2)
{
    v += tt[p] * dd[p];
    v1 += tt[p+1] * dd[p+1];
}
v += v1;
Note that your data is large enough, and the operation simple enough, that NO optimization should be able to improve on the simplest version. That includes my sample hand optimization. But I assume your test is not exactly representative of what you are really trying to do or understand. So with smaller data or a more complicated operation, the approach I showed may help.
Also notice my version relies on addition being associative. For real numbers, addition is associative, but in floating point it isn't. The answer is likely to differ by an amount too tiny for you to care about, but that is data dependent. If you have large values of opposite sign in odd/even positions canceling each other early in the original sequence, then by segregating the even and odd positions my "optimization" would totally destroy the answer. (Of course, the opposite can also be true: if all the even positions were tiny and the odd positions included large values canceling each other, then the original sequence produced garbage and the changed sequence would be more correct.)
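As an aside (not part of the original answer): the standard library already expresses this sum of products directly, with no temporary vector, via std::inner_product from <numeric>. A minimal sketch, with tt and dd as in the question:
#include <numeric>
#include <vector>

double dot(const std::vector<double>& tt, const std::vector<float>& dd)
{
    // Computes 0.0 + tt[0]*dd[0] + tt[1]*dd[1] + ... in one pass.
    return std::inner_product(tt.begin(), tt.end(), dd.begin(), 0.0);
}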

which is faster method to traverse an array

Compare the following two pieces of code:
for (int i = 0; i<array_size; i++)
cout << array[i] << endl;
OR
int* p_end = array+array_size;
for (int* p_array = array; p_array != p_end; p_array++)
cout << *p_array << endl;
Which one is faster?
Another question: if we just want to traverse, which is faster, a linked list or an array? Thank you!
array[i] is potentially faster because the compiler knows you're not aliasing your pointer to someplace you really shouldn't.
Lists are much slower to traverse because of the indirection imposed between every node - this ruins your caches and causes many a cache miss, which is probably the worst thing that can happen to a modern processor.
The first one is likely to be faster on an array.
The second one, as written here, compares against a precomputed end pointer rather than counting with an index, so it isn't quite the same loop. But suppose you changed your code to read:
p_array = array;
for (int i = 0; i < array_size; ++i, ++p_array)
cout << *p_array << endl;
That loop would probably take the same amount of time as the first one with any modern compiler. (In all actuality, the cost of cout would swamp everything anyway.)
The best approach for iterating through a container is really dictated by the container. vector<T> allows O(1) random access via operator[], and its iterators offer O(1) operator++. In contrast, a container like list<T> has no operator[]; its iterators only let you use operator++, which is also O(1) per step.
And a final thought: prefer pre-increment over post-increment in C++ when you can.
Note the following:
for (int i = 0; i<array_size; i++)
cout << array[i] << endl;
Each iteration you must: compare i to array_size, increment i, and add i to array to form the element's address.
int* p_end = array+array_size;
for (int* p_array = array; p_array != p_end; p_array++)
cout << *p_array << endl;
Each iteration you must: compare p_array to p_end and increment p_array (there is no add here).
So it would seem that the second loop, because it does not have to do an add, should be faster.
However, optimizers can do quite a lot, so I would recommend compiling both and then comparing the asm for each loop.
As for your second question: arrays are faster than linked lists to traverse, because each node costs a separate load and because list nodes take more memory, so fewer of them can be cached.
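To check this yourself, here is a minimal sketch with both loop styles in one translation unit (function names are made up); compiling with g++ -O2 -S and diffing the assembly will typically show the two bodies generate equivalent code:
#include <iostream>

void by_index(const int* array, int array_size)
{
    for (int i = 0; i < array_size; i++) // index form
        std::cout << array[i] << '\n';
}

void by_pointer(const int* array, int array_size)
{
    const int* p_end = array + array_size;
    for (const int* p = array; p != p_end; ++p) // pointer form
        std::cout << *p << '\n';
}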