C++ Program, Console/Terminal Output. How to implement "updating text"

I am writing a C++ program, which runs a long data analysis algorithm. It takes several days to finish running, so it is useful to have a prompt which outputs the "percentage complete" every time a new loop in the program starts, so that the user (me) knows the computer isn't stuck in an infinite loop somewhere and hasn't crashed.
At the moment I am doing this the most basic way, by computing the percentage complete as a floating point number and doing:
std::cout << "Percentage complete: " << percentage_complete << " %" << std::endl;
But, when the program has a million loops to run, this is kind of messy. In addition, if the terminal scrollback is only 1000 lines, then I lose the initial debug info printed out at the start once the program is 0.1 % complete.
I would like to copy an idea I have seen in other programs, where instead of writing a new line each time with the percentage complete, I simply replace the last line written to the terminal with the new percentage complete.
How can I do this? Is that possible? And if so, can this be done in a cross platform way? Are there several methods of doing this?
I am unsure how to describe what I am trying to do perfectly clearly, so I hope that this is clear enough that you understand what I am trying to do.
To clarify, rather than seeing this:
Running program.
Debug info:
Total number of loops: 1000000
Percentage complete: 0 %
Percentage complete: 0.001 %
Percentage complete: 0.002 %
.
.
.
Percentage complete: 1.835 %
I would like to see this:
Running program.
Debug info:
Total number of loops: 1000000
Percentage complete: 1.835 %
And then on the next loop the terminal should update to this:
Running program.
Debug info:
Total number of loops: 1000000
Percentage complete: 1.836 %
I hope that's enough information.
(Okay, so this output would actually be for 100000 steps, not 1000000.)

Instead of \n or std::endl, use \r. The difference is that \r returns the cursor to the beginning of the line without starting a new one.
Disclaimer (as per Lightness' objections): This is not necessarily portable, so YMMV.
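For example, a minimal sketch of a self-overwriting progress line (the variable names and the update interval are illustrative, not taken from the original program):

#include <iostream>

int main() {
    const long total_loops = 1000000;
    for (long i = 0; i <= total_loops; ++i) {
        if (i % 1000 == 0) {  // update occasionally rather than on every iteration
            double percentage_complete = 100.0 * i / total_loops;
            // '\r' moves the cursor back to the start of the current line;
            // std::flush pushes the text out even though no newline is printed.
            std::cout << "\rPercentage complete: " << percentage_complete
                      << " %" << std::flush;
        }
        // ... one iteration of the long-running analysis ...
    }
    std::cout << std::endl;  // finish with a real newline
    return 0;
}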

Related

C++ not printing to console during while loop, only after loop is finished

Background
I'm currently writing some code for a noughts and crosses machine learning program based on M.E.N.A.C.E., and I have finished the code for the actual machine learning and for playing against the computer, as well as another computer player that plays games against it to train it. The user can enter that they'd like to let the computer train itself, and then enter the number of games to play.
Problem
I'm trying to make a % completion display that overwrites itself each time it updates. The issue I have is that for some reason the code won't print anything that should be printed during the while loop, instead waiting until the end to print all of it in one go. I am using '\r' (carriage return) to overwrite the last printed text. If I remove the carriage return, the while loop prints the text on each iteration like it should do. I don't have any idea what's causing this problem as I'm quite new to C++.
I am programming in Repl.it since I'm not able to install an IDE on the computer I'm using.
Here is the subroutine for calculating and displaying the % completion (using namespace std).
void calcCompletion(int a, int b)
{
int completion = (static_cast<float>(a)/b) * 100;
cout << '\r';
cout << completion << "%";
}
And here is the start of the while loop where the procedure is called (mode is always 2 when I am testing this).
while(gamesPlayed < gameEnd)
{
//permutations();
if(mode != "1")
{
calcCompletion(gamesPlayed, gameEnd);
}
It's a very long while loop, so I won't show the whole thing (which is why the curly brackets do not match up).
And here is the output:
 clang++-7 -pthread -std=c++17 -o main ai.cpp base3.cpp main.cpp otherai.cpp permutations.cpp winCheck.cpp
 ./main
Enter mode.
1 - Play the AI
2 - Train the AI
2
How many games would you like the AI to play?
5
Simulating...
80%
Games complete.
Games played: 5
Games won: 1
Games lost: 0
Games drawn: 4
Win Percentage: 20%
Loss Percentage: 0%
--------------
It just waits until it is done with the while loop and then prints the last number, instead of printing as it goes.
I have tested trying to overwrite something I've written with no time delay in another code, it works fine so clearly being overwritten too quickly isn't the problem.
You have not closed the curly braces of the if clause inside the while loop.
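As an aside, here is a minimal sketch of the progress routine with an explicit flush added. This assumes the missing output is down to stdout buffering (cout is normally only pushed out on a newline or when the buffer fills), which is my assumption rather than something established in this question:

#include <iostream>

void calcCompletion(int a, int b)
{
    int completion = (static_cast<float>(a) / b) * 100;
    // No newline is printed, so nothing forces the buffer out; flush explicitly.
    std::cout << '\r' << completion << "%" << std::flush;
}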

FIO runtime different than gettimeofday()

I am trying to measure the execution time of the FIO benchmark. I am currently doing so by wrapping the FIO call between gettimeofday() calls:
gettimeofday(&startFioFix, NULL);
FILE* process = popen("fio --name=randwrite --ioengine=posixaio --rw=randwrite --size=100M --direct=1 --thread=1 --bs=4K", "r");
gettimeofday(&doneFioFix, NULL);
and calculate the elapsed time as:
double tstart = startFioFix.tv_sec + startFioFix.tv_usec / 1000000.;
double tend = doneFioFix.tv_sec + doneFioFix.tv_usec / 1000000.;
double telapsed = (tend - tstart);
Now, the questions are:
1. telapsed is different (larger) than the runt reported by FIO. Can you please help me understand why? The difference can be seen in the FIO output:
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=posixaio, iodepth=1
fio-2.2.8
Starting 1 thread
randwrite: (groupid=0, jobs=1): err= 0: pid=3862: Tue Nov 1 18:07:50 2016
write: io=102400KB, bw=91674KB/s, iops=22918, runt= 1117msec
...
and the telapsed is:
telapsed: 1.76088 seconds
2. What is the actual time taken by the FIO execution:
a) the runt given by FIO, or
b) the elapsed time measured by gettimeofday()?
3. How does FIO measure its runt? (This question is probably linked to 1.)
PS: I have tried replacing gettimeofday() with std::chrono::high_resolution_clock::now(), but it behaves the same (by same, I mean it also gives a larger elapsed time than runt).
Thank you in advance, for your time and assistance.
A quick point: gettimeofday() on Linux uses a clock that doesn't necessarily tick at a constant interval and can even move backwards (see http://man7.org/linux/man-pages/man2/gettimeofday.2.html and https://stackoverflow.com/a/3527632/4513656 ) - this may make telapsed unreliable (or even negative).
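If you want a wall-clock measurement that is immune to those clock adjustments, a monotonic clock is the usual choice. Below is a minimal sketch using std::chrono::steady_clock around the same popen() call; it assumes a POSIX system, and reading the pipe to EOF plus pclose() is added so the child actually finishes before the second timestamp is taken:

#include <chrono>
#include <cstdio>
#include <iostream>

int main() {
    auto start = std::chrono::steady_clock::now();  // monotonic, never moves backwards

    FILE* process = popen("fio --name=randwrite --ioengine=posixaio --rw=randwrite "
                          "--size=100M --direct=1 --thread=1 --bs=4K", "r");
    if (process) {
        char buf[4096];
        while (fgets(buf, sizeof buf, process) != nullptr) {
            // consume (or parse) fio's output here
        }
        pclose(process);  // blocks until fio has exited
    }

    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double> telapsed = end - start;
    std::cout << "telapsed: " << telapsed.count() << " seconds\n";
    return 0;
}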
Your gettimeofday/popen/gettimeofday measurement (telapsed) is going to be: fio process start-up elapsed (i.e. fork+exec on Linux) + fio initialisation elapsed (e.g. thread creation because I see --thread, ioengine initialisation) + fio job elapsed (runt) + fio stopping elapsed + process stop elapsed. You are comparing this to just runt, which is a sub-component of telapsed. It is unlikely all the non-runt components are going to happen instantly (i.e. take up 0 usecs), so the expectation is that runt will be smaller than telapsed. Try running fio with --debug=all just to see all the things it does in addition to actually submitting I/O for the job.
This is difficult to answer because it depends on what you mean when you say "fio execution" and why (i.e. the question is hard to interpret in an unambiguous way). Are you interested in how long fio actually spent trying to submit I/O for a given job (runt)? Are you interested in how long it takes your system to start/stop a new process that just so happens to try and submit I/O for a given period (telapsed)? Are you interested in how much CPU time was spent submitting I/O (none of the above)? So because I'm confused I'll ask you some questions instead: what are you going to use the result for and why?
Why not look at the source code? https://github.com/axboe/fio/blob/7a3b2fc3434985fa519db55e8f81734c24af274d/stat.c#L405 shows runt comes from ts->runtime[ddir]. You can see it is initialised by a call to set_epoch_time() (https://github.com/axboe/fio/blob/6be06c46544c19e513ff80e7b841b1de688ffc66/backend.c#L1664 ), is updated by update_runtime() ( https://github.com/axboe/fio/blob/6be06c46544c19e513ff80e7b841b1de688ffc66/backend.c#L371 ) which is called from thread_main().

Single thread programme apparently using multiple core

Question summary: all four cores are used when running a single-threaded programme. Why?
Details: I have written a non-parallelised programme in Xcode (C++). I was in the process of parallelising it, and wanted to see whether what I was doing was actually resulting in more cores being used. To that end I used Instruments to look at the core usage. To my surprise, while my application is single threaded, all four cores were being utilised.
To test whether it changed the performance, I dialled down the number of cores available to 1 (you can do it in Instruments, preferences) and the speed wasn't reduced at all. So (as I knew) the programme isn't parallelised in any way.
I can't find any information on what it means to use multiple cores to perform single threaded tasks. Am I reading the Instruments output wrong? Or is the single-threaded process being shunted between different cores for some reason (like changing lanes on a road instead of driving in two lanes at once - i.e. actual parallelisation)?
Thanks for any insight anyone can give on this.
EDIT with MWE (apologies for not doing this initially).
The following is C++ code that finds primes under 500,000, compiled in Xcode.
#include <iostream>
#include <ctime>

int main(int argc, const char * argv[]) {
    clock_t start, end;
    double runTime;
    start = clock();
    int i, num = 1, primes = 0;
    int num_max = 500000;
    while (num <= num_max) {
        i = 2;
        while (i <= num) {
            if (num % i == 0)
                break;
            i++;
        }
        if (i == num) {
            primes++;
            std::cout << "Prime: " << num << std::endl;
        }
        num++;
    }
    end = clock();
    runTime = (end - start) / (double) CLOCKS_PER_SEC;
    std::cout << "This machine calculated all " << primes << " under " << num_max << " in " << runTime << " seconds." << std::endl;
    return 0;
}
This runs in 36s or thereabouts on my machine, as shown by the final output and my phone's stopwatch. When I profile it (using Instruments launched from within Xcode) it gives a run-time of around 28s. The following image shows the core usage.
[Image: Instruments showing core usage across all 4 cores (with hyperthreading)]
Now I reduce number of available cores to 1. Re-running from within the profiler (pressing the record button), it says a run-time of 29s; a picture is shown below.
[Image: Instruments output with only 1 core available]
That would accord with my theory that more cores don't improve performance for a single-threaded programme! Unfortunately, when I actually time the programme with my phone, the above took about 1 minute 30s, so there is a meaningful performance gain from having all cores switched on.
One thing that is really puzzling me is that if you leave the number of cores at 1, go back to Xcode and run the program, it again says it takes about 33s, but my phone says it takes 1 minute 50s. So changing the number of cores is (perhaps) doing something to the internal clock.
Hopefully that describes the problem fully. I'm running on a 2015 15 inch MBP, with 2.2GHz i7 quad core processor. Xcode 7.3.1
I want to preface this by saying that your question lacks a lot of the information needed for an accurate diagnosis. Anyway, I'll try to explain the most common reason, IMHO, assuming your application doesn't use any third-party components that work in a multi-threaded way.
I think this could be the result of a scheduler effect. Let me explain what I mean.
Each core of the processor takes a process from the system and executes it for a "short" amount of time. This is the most common approach in desktop operating systems.
Your process is executed on a single core for this amount of time and then stopped to allow other processes to continue. When your process is resumed it may be executed on another core (always a single core at a time, but a different one). So an imprecise task monitor with a low time resolution could register utilisation of all cores, even though only one is in use at any instant.
To verify whether that is the cause, I suggest you look at the CPU % used while your application is running. For a single-threaded application the CPU usage should be about 1/#numberOfCores; in your case, 25%.
If it's a release build, your compiler may be vectorising or parallelising your code. Also, libraries you link against, say the standard library for example, may be threaded or vectorised.

How to track total time of program?

Normally in an IDE, when you run a program the IDE will tell you the total amount of time that it took to run the program. Is there a way to get the total amount of time that it takes to run a program when using the terminal in Unix/Linux to compile and run?
I'm aware of ctime which allows for getting the total time since 1970, however I want to get just the time that it takes for the program to run.
You can start programs with time:
[:~/tmp] $ time sleep 1
real 0m1.007s
user 0m0.001s
sys 0m0.003s
You are on the right track! You can get the current time and subtract it from the end time of your program. The code below illustrates this:
time_t begin = time(0); // get current time
// Do Stuff //
time_t end = time(0); // get current time
// Show number of seconds that have passed since program began //
std::cout << end - begin << std::endl;
NOTE: The time granularity is only a single second. If you need higher granularity, I suggest looking into precision timers such as QueryPerformanceCounter() on Windows or clock_gettime() on Linux. In both cases, the code will likely work very similarly.
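A portable alternative in C++ is std::chrono, which gives sub-second granularity without platform-specific calls; a minimal sketch (with steady_clock standing in for the precision timers mentioned above):

#include <chrono>
#include <iostream>

int main() {
    auto begin = std::chrono::steady_clock::now();

    // Do Stuff //

    auto end = std::chrono::steady_clock::now();
    // Convert the elapsed duration to milliseconds before printing.
    auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - begin);
    std::cout << elapsed.count() << " ms" << std::endl;
    return 0;
}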
As an addendum to mdsl's answer, if you want to get something close to that measurement in the program itself, you can get the time at the start of the program and get the time at the end of the program (as you said, in time since 1970) - then subtract the start time from the end time.

C++ Program performs better when piped

I haven't done any programming in a decade. I wanted to get back into it, so I made this little pointless program as practice.
The easiest way to describe what it does is with the output of my --help code block:
./prng_bench --help
./prng_bench: usage: ./prng_bench $N $B [$T]
This program will generate an N digit base(B) random number until
all N digits are the same.
Once a repeating N digit base(B) number is found, the following statistics are displayed:
-Decimal value of all N digits.
-Time & number of tries taken to randomly find.
Optionally, this process is repeated T times.
When running multiple repetitions, averages for all N digit base(B)
numbers are displayed at the end, as well as total time and total tries.
My "problem" is that when the problem is "easy", say a 3 digit base 10 number, and I have it do a large number of passes the "total time" is less when piped to grep. ie:
command ; command |grep took :
./prng_bench 3 10 999999 ; ./prng_bench 3 10 999999|grep took
....
Pass# 999999: All 3 base(10) digits = 3 base(10). Time: 0.00005 secs. Tries: 23
It took 191.86701 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
An average of 0.00019 secs & 99 tries was needed to find each one.
It took 159.32355 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
If I run the same command many times without grep, the time is always VERY close.
I'm using srand(1234) for now, to test. The code between my calls to clock_gettime() for start and stop does not involve any stream manipulation, which would obviously affect time. I realize this is an exercise in futility, but I'd like to know why it behaves this way.
Below is the heart of the program. Here's a link to the full source on Dropbox if anybody wants to compile and test. https://www.dropbox.com/s/bczggar2pqzp9g1/prng_bench.cpp
clock_gettime() requires -lrt.
for (int pass_num=1; pass_num<=passes; pass_num++) {          //Executes $passes # of times.
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &temp_time);      //get time
    start_time = timetodouble(temp_time);                      //convert time to double, store as start_time
    for (i=1, tries=0; i!=0; tries++) {                        //loops until the 'comparison for' fully completes; counts reps as 'tries'.
        for (i=0; i<Ndigits; i++)                              //Move forward through array.
            results[i] = (rand() % base);                      //assign random num of base to element (digit).
        /*for (i=0; i<Ndigits; i++)                            //---Debug Lines---
            std::cout << " " << results[i];                    //---a LOT of output.---
        std::cout << "\n";*/                                   //---Comment/uncomment to disable/enable.
        for (i=Ndigits-1; i>0 && results[i]==results[0]; i--); //Move back through array; a non-matching element breaks out with i!=0, so new digits are drawn.
    }                                                           //If all are equal, i will be 0 and the nested for condition is satisfied.
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &temp_time);      //get time
    draw_time = (timetodouble(temp_time) - start_time);        //convert time to double, subtract start_time, store difference as draw_time.
    total_time += draw_time;                                    //add time for this pass to total.
    total_tries += tries;                                       //add tries for this pass to total.
    /*Formatted output for each pass:
      Pass# ---: All -- base(--) digits = -- base(10) Time: ----.---- secs. Tries: ----- (LINE) */
    std::cout << "Pass# " << std::setw(width_pass) << pass_num << ": All " << Ndigits << " base(" << base << ") digits = "
              << std::setw(width_base) << results[0] << " base(10). Time: " << std::setw(width_time) << draw_time
              << " secs. Tries: " << tries << "\n";
}
if (passes == 1) return 0;                                      //No need for totals and averages of 1 pass.
/* It took ----.---- secs & ------ tries to find --- repeating -- digit base(--) numbers. (LINE)
   An average of ---.---- secs & ---- tries was needed to find each one. (LINE)(LINE) */
std::cout << "It took " << total_time << " secs & " << total_tries << " tries to find "
          << passes << " repeating " << Ndigits << " digit base(" << base << ") numbers.\n"
          << "An average of " << total_time/passes << " secs & " << total_tries/passes
          << " tries was needed to find each one. \n\n";
return 0;
Printing to the screen is very slow in comparison to a pipe or running without printing. Piping to grep keeps you from doing it.
It is not about printing to the screen; it is about the output being a terminal (tty).
According to the POSIX spec:
When opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
Linux interprets this to make the FILE * (i.e. stdio) stdout line-buffered when the output is a tty (e.g. your terminal window), and block-buffered otherwise (e.g. your pipe).
The reason sync_with_stdio makes a difference is that when it is enabled, the C++ cout stream inherits this behavior. When you set it to false, it is no longer bound by that behavior and thus becomes block buffered.
Block buffering is faster because it avoids the overhead of flushing the buffer on every newline.
You can further verify this by piping to cat instead of grep. The difference is the pipe itself, not the screen per se.
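If you want to see the tty/pipe distinction from inside the program, a small sketch like the one below can help. It uses POSIX isatty(), which is my addition and not something from the programs discussed here, and it calls sync_with_stdio(false) before any output, as noted further down:

#include <cstdio>
#include <iostream>
#include <unistd.h>   // isatty, fileno (POSIX)

int main() {
    // Opt out of stdio synchronisation before any output, so cout can
    // buffer in larger blocks regardless of where the output goes.
    std::cout.sync_with_stdio(false);

    // Report (on stderr) whether stdout refers to an interactive device.
    if (isatty(fileno(stdout)))
        std::cerr << "stdout is a terminal (tty)\n";
    else
        std::cerr << "stdout is a pipe or a file\n";

    // A burst of output to compare timing with and without a pipe.
    for (int i = 0; i < 1000000; ++i)
        std::cout << "line " << i << "\n";
    return 0;
}

Running this once directly and once piped to cat (and with the sync_with_stdio line toggled) makes the buffering differences described above visible.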
Thank you Collin & Nemo. I was certain that because I wasn't calling std::cout between getting start & stop times that it wouldn't have an effect. Not so. I think this is due to optimizations that the compiler performs even with -O0 or 'defaults'.
What I think is happening...? I think that as Collin suggested, the compiler is trying to be clever about when it writes to the TTY. And, as Nemo pointed out, cout inherits the line buffered properties of stdio.
I'm able to reduce the effect, but not eliminate it, by using:
std::cout.sync_with_stdio(false);
From my limited reading on this, it should be called before any output operations are done.
Here's source for no_sync version: https://www.dropbox.com/s/wugo7hxvu9ao8i3/prng_bench_no_sync.cpp
./no_sync 3 10 999999;./no_sync 3 10 999999|grep took
Compiled with -O0
999999: All 3 base(10) digits = 3 base(10) Time: 0.00004 secs. Tries: 23
It took 166.30801 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
An average of 0.00017 secs & 99 tries was needed to find each one.
It took 163.72914 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
Compiled with -O3
999999: All 3 base(10) digits = 3 base(10) Time: 0.00003 secs. Tries: 23
It took 143.23234 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
An average of 0.00014 secs & 99 tries was needed to find each one.
It took 140.36195 secs & 99947208 tries to find 999999 repeating 3 digit base(10) numbers.
Specifying not to sync with stdio changed my delta between piped and non-piped runs from over 30 seconds to less than 3. See the original question for the original delta: it was ~191 vs ~160 seconds.
To further test I created another version using a struct to store stats about each pass. This method does all output after all passes are complete. I want to emphasize that this is probably a terrible idea. I'm allowing a command line argument to determine the size of a dynamically allocated array of structs containing an int, double and unsigned long. I can't even run this version with 999,999 passes. I get a segmentation fault. https://www.dropbox.com/s/785ntsm622q9mwd/prng_bench_struct.cpp
./struct_prng 3 10 99999;./struct_prng 3 10 99999|grep took
Pass# 99999: All 3 base(10) digits = 6 base(10) Time: 0.00025 secs. Tries: 193
It took 13.10071 secs & 9970298 tries to find 99999 repeating 3 digit base(10) numbers.
An average of 0.00013 secs & 99 tries was needed to find each one.
It took 13.12466 secs & 9970298 tries to find 99999 repeating 3 digit base(10) numbers.
What I've learned from this is that you can't count on the order you've coded things being the order they're executed in. In future programs I'll probably implement getopt instead of writing my own parse_args function. That would allow me to suppress extraneous output on high-repetition loops by requiring users to use the -v switch if they want to see it.
I hope the further testing proves useful to anybody wondering about piping and output in loops. All of the results I've posted were obtained on a RasPi. All of the linked source code is GPL'd, just because that's the first license I could think of... I really have no self-aggrandizing need for the copyleft provisions of the GPL; I just wanted to be clear that it's free, but without warranty or liability.
Note that all of the sources linked have the call to srand(...) commented out, so all of your pseudo-random results will be exactly the same.