Can you please explain why this code goes into an infinite loop? I am unable to find the error. It works fine with small values of n and m.
#include <bits/stdc++.h>
using namespace std;

int main()
{
    long long n = 1000000, m = 1000000;
    long long k = 1;
    for (long long i = 0; i < n; i++)
    {
        for (long long j = 0; j < m; j++)
        {
            k++;
        }
    }
    cout << k;
    return 0;
}
It's not infinite, but that k++ operation has to run 1,000,000 * 1,000,000 = 1,000,000,000,000 times, which simply takes too long. That's exactly why it works fine with small n and m values.
It is a typical target for optimization.
Build with -Ofast.
g++ t_duration.cpp -Ofast -std=c++11 -o a_fast
#time ./a_fast
1000000000001
real 0m0.002s
user 0m0.000s
sys 0m0.002s
It takes almost no time to return the output.
Build with -O1.
g++ t_duration.cpp -O1 -std=c++11 -o a_1
#./a_1
419774 ms
About 420 seconds to complete the calculation.
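The difference between the two builds is that at higher optimization levels the compiler is allowed to collapse the nested counting loops into their closed-form result. Conceptually (this is a sketch of the arithmetic the optimizer may perform, not its literal output), the program reduces to:

#include <iostream>
using namespace std;

int main()
{
    long long n = 1000000, m = 1000000;
    // k starts at 1 and is incremented n * m times, so the two loops
    // collapse into a single multiplication and addition.
    long long k = 1 + n * m;
    cout << k; // prints 1000000000001
    return 0;
}

At -O1 the loops are kept more or less as written, so all 10^12 increments really execute, which is why that build needs hundreds of seconds.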
When I run this code I get this as my output:
1000000000001
Reference: I ran this code in the CodeChef IDE (https://www.codechef.com/ide).
You can try it in that IDE once; I guess there is some problem with your IDE, or it might be some other issue.
It took me less than 20 sec to run this (on the clock 😁).
But when I put the same code in Code::Blocks it takes a long time (like you said, as if an infinite loop were running), and the reason is quite simple: it has to do 1,000,000,000,000 iterations.
However, this raises a new question: what's the difference between the CodeChef IDE and the Code::Blocks compiler?
(Got a new question from your question 😁🤔 (Difference Between CodeChef IDE and Code::Blocks))
Finally, the answer is: try it in the CodeChef IDE, that's it, this code runs fast there 🤣
Hope this helps you 😃
So I encountered some strange behavior, which I stripped down to the following minimal example:
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> vec;
    for (int i = 0; i < 1000; i++)
    {
        vec.push_back(2150000 * i);
        if (i % 100 == 0) std::cout << i << std::endl;
    }
}
When compiling with gcc 7.3.0 using the command
c++ -Wall -O2 program.cpp -o program
I get no warnings. Running the program produces the following output:
0
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
[ snip several thousand lines of output ]
1073741600
1073741700
1073741800
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
which I guess means that I finally ran out of memory for the vector.
Clearly something is wrong here. I guess this has something to do with the fact that 2150000 * 1000 is slightly larger than 2^31, but it's not quite as simple as that -- if I decrease this number to 2149000 then the program behaves as expected:
0
100
200
300
400
500
600
700
800
900
The cout isn't necessary to reproduce this behavior, so I suppose a minimal example is actually
#include <vector>

int main()
{
    std::vector<int> vec;
    for (int i = 0; i < 1000; i++)
    {
        vec.push_back(2150000 * i);
    }
}
Running this causes the program to wait for a long time and then crash.
Question
I'm fairly new to C++ at any serious level. Am I doing something stupid here that allows for undefined behavior, and if so, what? Or is this a bug in gcc?
I did try to Google this, but I don't really know what to Google.
Addendum
I see that (signed) integer overflow is undefined behavior in C++. To my understanding, that would only mean that the behavior of the expression
2150000 * i
is undefined -- i.e. that it could evaluate to an arbitrary number. That said, we can see that this expression is at least not changing the value of i.
To answer my own question, after examining the assembler output it looks like g++ optimizes this loop by changing
for (int i = 0; i < 1000; i++)
{
    vec.push_back(2150000 * i);
}
to something like
for (int j = 0; j < 1000 * 2150000; j += 2150000)
{
    vec.push_back(j);
}
I guess the addition is faster than doing a multiplication each cycle, and the rule about overflows being undefined behavior means that this change can be made without worrying about whether this introduces unexpected behavior if that calculation overflows.
Of course, the exit condition of the optimized loop can never be satisfied: no int can reach 1000 * 2150000 (it is larger than INT_MAX), and the compiler is allowed to assume that j never overflows. So ultimately I end up with something more like
for (int j = 0; true; j += 2150000)
{
    vec.push_back(j);
}
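For completeness, here is a sketch of how the original program could avoid the undefined behavior altogether; the only changes from the question's code are widening the element type and doing the multiplication in 64 bits:

#include <cstdint>
#include <vector>

int main()
{
    std::vector<std::int64_t> vec;
    for (int i = 0; i < 1000; i++)
    {
        // Promote to 64 bits before multiplying so the product cannot
        // overflow a signed int, which would be undefined behavior.
        vec.push_back(static_cast<std::int64_t>(2150000) * i);
    }
}

Alternatively, building the original code with g++'s -fsanitize=undefined makes the runtime report the signed-integer overflow instead of letting the optimizer silently exploit it.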
Consider the following code:
#include <iostream>
#include <chrono>
using Time = std::chrono::high_resolution_clock;
using us = std::chrono::microseconds;

int main()
{
    volatile int i, k;
    const int n = 1000000;
    for (k = 0; k < 200; ++k) {
        auto begin = Time::now();
        for (i = 0; i < n; ++i); // <--
        auto end = Time::now();
        auto dur = std::chrono::duration_cast<us>(end - begin).count();
        std::cout << dur << std::endl;
    }
    return 0;
}
I am repeatedly measuring the execution time of the inner for loop.
The results (y: duration in microseconds, x: repetition) show the measured duration steadily decreasing over the first repetitions.
What is causing this decrease in the loop execution time?
Environment: Linux (kernel 4.2) @ Intel i7-2600, compiled using: g++ -std=c++11 main.cpp -O0 -o main
Edit 1
The question is not about compiler optimization or performance benchmarks.
The question is, why the performance gets better over time.
I am trying to understand what is happening at run-time.
Edit 2
As proposed by Vaughn Cato, I have changed the CPU frequency scaling policy to "Performance". With that setting the decreasing trend is gone, which confirms Vaughn Cato's conjecture. Sorry for the silly question.
What you are probably seeing is CPU frequency scaling (throttling). The CPU goes into a low-frequency state to save power when it isn't being heavily used.
Just before running your program, the CPU clock speed is probably fairly low, since there is no big load. When you run your program, the busy loop increases the load, and the CPU clock speed goes up until you hit the maximum clock speed, decreasing your times.
If you run your program several times in a row, you'll probably see the times stay at a lower value after the first run.
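One practical mitigation is to warm the CPU up before the timed region, so the governor has already ramped the clock up when the first measurement starts. A minimal sketch (the helper name warm_up_cpu and the 100 ms budget are arbitrary choices, not part of the original answer):

#include <chrono>

// Busy-wait for a short, fixed budget so the frequency governor raises the
// clock before the real measurement begins. The 100 ms budget is arbitrary.
inline void warm_up_cpu(std::chrono::milliseconds budget = std::chrono::milliseconds(100))
{
    volatile unsigned sink = 0;
    const auto start = std::chrono::steady_clock::now();
    while (std::chrono::steady_clock::now() - start < budget)
        ++sink;
}

Calling warm_up_cpu() once before the outer k loop (or simply discarding the first few measurements) should remove most of the initial slowdown.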
In your original experiment, there are too many variables that can affect the measurements:
the use of your processor by other active processes (i.e. scheduling by your OS)
the question whether your loop is optimized away or not
the access to and buffering of the console
the initial mode of your CPU (see the answer about throttling)
I must admit that I was very skeptical about your observations. I therefore wrote a small variant using a preallocated vector, to avoid I/O synchronisation effects:
// Time and us are the aliases from the question:
//   using Time = std::chrono::high_resolution_clock;
//   using us = std::chrono::microseconds;
volatile int i, k;
const int n = 1000000, kmax = 200, n_avg = 30;
std::vector<long> v(kmax, 0);
for (k = 0; k < kmax; ++k) {
    auto begin = Time::now();
    for (i = 0; i < n; ++i); // <-- remains thanks to volatile
    auto end = Time::now();
    auto dur = std::chrono::duration_cast<us>(end - begin).count();
    v[k] = dur;
}
I then ran it several times on ideone (which, given the scale of its use, we can assume keeps the processor under more or less constant load). Indeed, your observations seemed to be confirmed.
I guess that this could be related to branch prediction, which should improve through the repetitive patterns.
I however went on, updated the code slightly, and added a loop to repeat the experiment several times. Then I also started to get runs where your observation was not confirmed (i.e. at the end, the time was higher). But it may also be that the many other processes running on ideone influence the branch prediction in a different manner.
So in the end, concluding anything would require a more cautious experiment, on a machine running this benchmark (and only it) for a couple of hours.
One comparison I felt that was missing from the Why is this C++ program so incredibly fast? discussion is Fortran. I translated Sven Hager's C++ benchmark:
#include <iostream>
#include <cstdlib>
#include <cstdint>
int main(int argc, char* argv[]) {
    uint32_t s = 0;
    uint32_t outer = atoi(argv[1]);
    uint32_t inner = atoi(argv[2]);
    for (uint32_t i = 0; i < outer; ++i) {
        for (uint32_t j = 0; j < inner; ++j)
            ++s;
        s -= inner;
    }
    std::cout << s << std::endl;
    return 0;
}
to its Fortran equivalent:
PROGRAM Benchmark
  IMPLICIT NONE
  INTEGER :: i, j, s
  INTEGER, PARAMETER :: outer = 1000, inner = 1000000

  s = 0
  DO i = 1, outer
    DO j = 1, inner
      s = s + 1
    END DO
    s = s - inner
  END DO
  PRINT *, s
END PROGRAM Benchmark
and compiled a fully optimized version with gfortran -g -std=f2008 -Wall -Wextra -O3 Benchmark.f08. I expected to achieve performance similar to Herr Hager's:
real 0m0.003s
user 0m0.002s
sys 0m0.002s
What I got was a little puzzling:
real 0m0.003s
user 0m0.000s
sys 0m0.000s
Digging more deeply, I found this discussion on What do 'real', 'user' and 'sys' mean in the output of time(1)?. In it they say that user+sys gives the actual CPU time the process used. So, what does a user+sys of zero actually mean?
In this case, "zero" means "less than a millisecond" (or half a millisecond, depending on how it's rounded), since that's the resolution of the times given by time.
It's only useful for measuring programs that take considerably longer than a millisecond to run.
The reason that the Fortran version is much faster is probably because the loop bounds are hard-coded constants, so the entire calculation can be done at compile time, leaving just the PRINT of the constant result (0) to do at runtime.
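The same effect can be illustrated in C++ (a sketch of the answer's argument, not a guarantee for every compiler): if the bounds are compile-time constants instead of being read from argv, an optimizer building at -O3 is free to fold the whole computation down to printing the constant 0.

#include <cstdint>
#include <iostream>

int main() {
    // Bounds are known at compile time, so the optimizer can evaluate the
    // entire nested loop during compilation; s provably ends up as 0.
    constexpr std::uint32_t outer = 1000;
    constexpr std::uint32_t inner = 1000000;
    std::uint32_t s = 0;
    for (std::uint32_t i = 0; i < outer; ++i) {
        for (std::uint32_t j = 0; j < inner; ++j)
            ++s;
        s -= inner; // cancels the inner loop, so s stays 0
    }
    std::cout << s << std::endl; // effectively just prints 0
    return 0;
}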
I've made a small application that averages the numbers between 1 and 1000000. It's not hard to see (using a very basic algebraic formula) that the average is 500000.5 but this was more of a project in learning C++ than anything else.
Anyway, I made clock variables that were designed to find the number of clock ticks required for the application to run. When I first ran the program, it said that it took 3770000 clock ticks, but every time that I've run it since then, it's taken "0.0" seconds...
I've attached my code at the bottom.
Either a.) It's saved the variables from the first time I ran it, and it's just running quickly to the answer...
or b.) something is wrong with how I'm declaring the time variables.
Regardless... it doesn't make sense.
Any help would be appreciated.
FYI (I'm running this through a Linux computer, not sure if that matters)
#include <cstdio>
#include <ctime>

double avg(int arr[], int beg, int end)
{
    int nums = end - beg + 1;
    double sum = 0.0;
    for (int i = beg; i <= end; i++)
    {
        sum += arr[i];
    }
    //for(int p = 0; p < nums*10000; p++){}
    return sum / nums;
}

int main(int argc, char *argv[])
{
    int nums = 1000000; //atoi(argv[0]);
    int myarray[nums];
    double timediff;
    //printf("Arg is: %d\n", argv[0]);
    printf("Nums is: %d\n", nums);
    clock_t begin_time = clock();
    for (int i = 0; i < nums; i++)
    {
        myarray[i] = i + 1;
    }
    double average = avg(myarray, 0, nums - 1);
    printf("%f\n", average);
    clock_t end_time = clock();
    timediff = (double) difftime(end_time, begin_time);
    printf("Time to Average: %f\n", timediff);
    return 0;
}
You are measuring the I/O operation too (printf), that depends on external factors and might be affecting the run time. Also, clock() might not be as precise as needed to measure such a small task - look into higher resolution functions such as clock_get_time(). Even then, other processes might affect the run time by generating page fault interrupts and occupying the memory BUS, etc. So this kind of fluctuation is not abnormal at all.
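For example (a sketch, not the original answerer's code; timed_avg is a hypothetical helper), timing just the averaging step with std::chrono gives much finer resolution than clock() typically offers:

#include <chrono>
#include <cstdio>

double avg(int arr[], int beg, int end); // as defined in the question's code

// Wraps the question's avg() and reports the elapsed time using a steady,
// high-resolution clock instead of clock().
double timed_avg(int arr[], int beg, int end)
{
    const auto t0 = std::chrono::steady_clock::now();
    double average = avg(arr, beg, end);
    const auto t1 = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    printf("Time to Average: %f s\n", seconds);
    return average;
}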
On the machine I tested, Linux's clock call was only accurate to 1/100th of a second. If your code runs in less than 0.01 seconds, it will usually say zero seconds have passed. Also, I ran your program a total of 50 times in .13 seconds, so I find it suspicious that you claim it takes 2 seconds to run it once on your computer.
Your code also uses difftime incorrectly, which may produce incorrect output even when clock reports that time did pass.
I'd guess that the first timing you got was with different code than that posted in this question, because I can't think of any way the code in this question could produce a time of 3770000.
Finally, benchmarking is hard, and your code has several benchmarking mistakes:
You're timing how long it takes to (1) fill an array, (2) calculate an average, (3) format the result string, and (4) make an OS call (slow) that prints said string in the right language/font/color/etc., which is especially slow.
You're attempting to time a task which takes less than a hundredth of a second, which is WAY too small for any accurate measurement.
Here is my take on your code, measuring that the average takes ~0.001968 seconds on this machine.
I have a C++ application in which I need to compare two values and decide which is greater. The only complication is that one number is represented in log-space, the other is not. For example:
double log_num_1 = log(1.23);
double num_2 = 1.24;
If I want to compare num_1 and num_2, I have to use either log() or exp(), and I'm wondering if one is easier to compute than the other (i.e. runs in less time, in general). You can assume I'm using the standard cmath library.
In other words, the following are semantically equivalent, so which is faster:
if (exp(log_num_1) > num_2) cout << "num_1 is greater";
or
if(log_num_1 > log(num_2)) cout << "num_1 is greater";
AFAIK, the complexity of the two algorithms is the same; the difference should be only a (hopefully negligible) constant factor.
Because of this, I'd use exp(a) > b, simply because it doesn't break on invalid input.
Do you really need to know? Is this going to occupy a large fraction of your running time? How do you know?
Worse, it may be platform dependent. Then what?
So sure, test it if you care, but spending much time agonizing over micro-optimization is usually a bad idea.
Edit: Modified the code to avoid exp() overflow. This caused the margin between the two functions to shrink considerably. Thanks, fredrikj.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(int argc, char **argv)
{
    if (argc != 3) {
        return 0;
    }

    int type = atoi(argv[1]);
    int count = atoi(argv[2]);
    double (*func)(double) = type == 1 ? exp : log;

    int i;
    for (i = 0; i < count; i++) {
        func(i % 100);
    }
    return 0;
}
(Compile using:)
emil@lanfear /home/emil/dev $ gcc -o test_log test_log.c -lm
The results seems rather conclusive:
emil@lanfear /home/emil/dev $ time ./test_log 0 10000000
real 0m2.307s
user 0m2.040s
sys 0m0.000s
emil@lanfear /home/emil/dev $ time ./test_log 1 10000000
real 0m2.639s
user 0m2.632s
sys 0m0.004s
A bit surprisingly log seems to be the faster one.
Pure speculation:
Perhaps the underlying mathematical Taylor series converges faster for log or something? It actually seems to me like the natural logarithm is easier to calculate than the exponential function:
ln(1+x) = x - x^2/2 + x^3/3 ...
e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! ...
Not sure that the C library functions even do it like this, however. But it doesn't seem totally unlikely.
Since you're working with values << 1, note that x - 1 > log(x) for x < 1, which means that x - 1 < log(y) implies log(x) < log(y). That already takes care of 1/e ≈ 37% of the cases without having to use log or exp.
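A sketch of that shortcut in C++ (the helper name log_value_is_greater is hypothetical, and it assumes the plain value b is positive, as in the question):

#include <cmath>

// Decide whether the value whose natural log we hold (log_a = ln(a)) is
// greater than a plain positive value b, sometimes without calling log().
bool log_value_is_greater(double log_a, double b)
{
    // ln(b) <= b - 1 for all b > 0, so b - 1 < log_a already implies ln(b) < log_a.
    if (b - 1.0 < log_a)
        return true;
    // Otherwise fall back to the direct comparison in log space.
    return log_a > std::log(b);
}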
Some quick tests in Python (which uses C for its math functions):
$ time python -c "from math import log, exp;[exp(100) for i in xrange(1000000)]"
real 0m0.590s
user 0m0.520s
sys 0m0.042s
$ time python -c "from math import log, exp;[log(100) for i in xrange(1000000)]"
real 0m0.685s
user 0m0.558s
sys 0m0.044s
would indicate that log is slightly slower
Edit: it seems the C functions are being optimized out by the compiler, so the loop is what is taking up the time.
Interestingly, in C they seem to be the same speed (possibly for the reasons Mark mentioned in a comment)
#include <math.h>

void runExp(int n) {
    int i;
    for (i = 0; i < n; i++) {
        exp(100);
    }
}

void runLog(int n) {
    int i;
    for (i = 0; i < n; i++) {
        log(100);
    }
}

int main(int argc, char **argv) {
    if (argc <= 1) {
        return 0;
    }

    if (argv[1][0] == 'e') {
        runExp(1000000000);
    } else if (argv[1][0] == 'l') {
        runLog(1000000000);
    }
    return 0;
}
giving times:
$ time ./exp l
real 0m2.987s
user 0m2.940s
sys 0m0.015s
$ time ./exp e
real 0m2.988s
user 0m2.942s
sys 0m0.012s
It can depend on your libm, platform and processor. You're best off writing some code that calls exp/log a large number of times, and using time to run it a few times to see if there's a noticeable difference.
Both take basically the same time on my computer (Windows), so I'd use exp, since it's defined for all values (assuming you check for ERANGE). But if it's more natural to use log, you should use that instead of trying to optimise without good reason.
It would make sense for log to be faster... exp has to perform several multiplications to arrive at its answer, whereas log only has to convert the mantissa and exponent from base-2 to base-e.
Just be sure to boundary check (as many others have said) if you're using log.
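A minimal sketch of such a boundary check (the helper name num1_is_greater is hypothetical; num_2 is the plain, non-log value from the question):

#include <cmath>

// If num_2 is not positive, log(num_2) is undefined (it yields -inf or NaN),
// but any value with a finite natural log is positive and therefore greater.
bool num1_is_greater(double log_num_1, double num_2)
{
    if (num_2 <= 0.0)
        return true;
    return log_num_1 > std::log(num_2);
}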
If you are sure that this is a hotspot, compiler intrinsics are your friends. Although it's platform-dependent (if you go for performance in places like this, you cannot be platform-agnostic).
So the question really is: which one maps to an asm instruction on your target architecture, and what are its latency and cycle counts? Without this it is pure speculation.