I was experimenting with C++, trying to figure out how I could print the numbers from 0 to n as fast as possible.
At first I just printed all the numbers with a loop:
for (int i = 0; i < n; i++)
{
std::cout << i << std::endl;
}
However, I think this flushes the buffer after every single number it outputs, and surely that must take some time. So I tried to first print all the numbers into the buffer (or rather, until it is full, since it seems to flush automatically at that point) and then flush it all at once. However, it seemed that printing a '\n' after each number flushes the buffer just like std::endl, so I omitted it:
for (int i = 0; i < n; i++)
{
std::cout << i << ' ';
}
std::cout << std::endl;
This seems to run about 10 times faster than the first example. However, I want to know how to store all the values in the buffer and flush it all at once, rather than letting it flush every time it becomes full, so I have a few questions:
Is it possible to print a newline without flushing the buffer?
How can I change the buffer size so that I could store all the values inside it and flush it at the very end?
Is this method of outputting text dumb? If so, why, and what would be a better alternative to it?
EDIT: It seems that my results were biased by a laggy system (Terminal app of a smartphone)... With a faster system the execution times show no significant difference.
TL;DR: In general, using '\n' instead of std::endl is faster, since std::endl forces a flush of the output buffer.
Explanation:
std::endl causes a flushing of the buffer, whereas '\n' does not.
However, you might or might not notice any speedup whatsoever depending upon the method of testing that you apply.
Consider the following test files:
endl.cpp:
#include <iostream>
int main() {
for ( int i = 0 ; i < 1000000 ; i++ ) {
std::cout << i << std::endl;
}
}
slashn.cpp:
#include <iostream>
int main() {
for ( int i = 0 ; i < 1000000 ; i++ ) {
std::cout << i << '\n';
}
}
Both of these are compiled using g++ on my Linux system and undergo the following tests:
1. time ./a.out
For endl.cpp, it takes 19.415s.
For slashn.cpp, it takes 19.312s.
2. time ./a.out >/dev/null
For endl.cpp, it takes 0.397s
For slashn.cpp, it takes 0.153s
3. time ./a.out >temp
For endl.cpp, it takes 2.255s
For slashn.cpp, it takes 0.165s
Conclusion: '\n' is definitely faster (even in practice), but the difference in speed can depend upon other factors. In the case of a terminal window, the limiting factor seems to be how fast the terminal itself can display the text: since the text is shown on screen and auto-scrolling and so on have to happen, the execution slows down massively. For normal files (like the temp example above), on the other hand, the rate at which the buffer is flushed matters a lot. And for some special files (like /dev/null above), since the data just disappears into a black hole, the flushing doesn't seem to have any effect.
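As for the question about keeping everything in the buffer until the very end: the size and behaviour of std::cout's own buffer are implementation-specific (pubsetbuf is not guaranteed to do anything useful on it), so the portable way is to build the whole output in memory yourself and hand it to the stream in one write. A minimal sketch of that idea, not part of the measurements above:
#include <iostream>
#include <sstream>

int main() {
    const int n = 1000000;
    // Optional: decouple C++ streams from C stdio; this often speeds up heavy cout use.
    std::ios::sync_with_stdio(false);
    // Build everything in memory first, then write it with a single insertion,
    // so the stream flushes only when its own buffer fills and at program exit.
    std::ostringstream buffer;
    for (int i = 0; i < n; ++i) {
        buffer << i << '\n';
    }
    std::cout << buffer.str();
}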
Update
Ok, I removed the 3 couts and replaced them with *buffer = 'a', and there was a big performance difference: removing that line made the program twice as fast. If you go to Godbolt and compile it with MSVC, that single line of code changes most of the program (it adds a whole lot more complexity).
The following might seem extremely weird, but it's true on my computer:
Alright, so I was doing some benchmarking of some code, and I noticed extremely weird performance anomalies that were 100% consistent. I'm running Windows 10 and Visual Studio 2019. Basically, deleting a line of code that is never called completely changes the performance of the program.
Here is exactly what to do:
Create new VS-2019 Console C++ App project
Set the configuration to Release & x64
Paste the code below:
#include <iostream>
#include <chrono>
#include <cstdlib>   // for malloc
class Test {
public:
size_t length;
size_t doublingVal;
char* buffer;
Test() : length(0), doublingVal(2) {
buffer = static_cast<char*>(malloc(1));
}
~Test() {
std::cout << "called" << "\n";
std::cout << "called" << "\n";
std::cout << "called" << "\n"; // Remove this line and the time decreases DRASTICALLY (ie remove line 14)
}
void append() {
if (doublingVal == length) {
doublingVal <<= 1;
}
*buffer = 'a';
++length;
}
};
int main()
{
Test test;
auto start = std::chrono::high_resolution_clock::now();
for (size_t i = 0; i < static_cast<size_t>(1024) * 1024 * 1024 * 4; ++i) {
test.append();
}
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - start).count() << "\n";
}
Run the program using Ctrl+F5, not under the debugger. Note how long it takes to run (a few seconds).
Then, in the destructor of Test, remove the third line which has the comment.
Run the program again, and you should see that the performance increases drastically. I tested this exact same code with 4 different projects all brand new, and 3 different computers.
The destructor is called at the very end, when the entire program is finished measuring time. The extra cout shouldn't affect anything.
Edit:
You can also see a similar thing go on if you remove the 3 cout's and replace it with a single *buffer = 'a'. Then CTRL+F5 once again, record the time, and then remove that line we just added. Then run it again and the time magically decreases by half.
WTF is going on, and how do you solve the weird performance difference?
My program reads a file, processes it and saves the results in a csv file.
The whole thing runs in a loop in which many different files are processed; a separate CSV file is generated for each of these files.
I was able to implement the processing itself very efficiently in terms of time, so that saving the results is now the longest step in the loop.
The results are available as a std::vector<float*> and are currently saved as follows:
std::vector<float*> out = calculation(bla);
fstream data;
data.open(savepfad + name + ".csv", ios::out);
data<< sizex << endl;
data<< sizey << endl;
data<< dim << endl;
for (int d = 0; d < dim; d++)
{
for (int x = 0; x < sizex * sizey; x++)
{
data << out[d][x] << ",";
}
data << endl;
}
data.close();
My first thought was to simply offload the saving to a new thread (possibly with a fork) so I could continue with the main loop. But I am on Windows.
Can I somehow write the data to the hard drive faster?
Does anyone have a brilliant idea?
EDIT:
So I rebuilt the code according to the suggestions, but there is no real speed advantage. The code now looks like this:
std::vector<float*> out = calculation(bla);
string line = std::to_string(sizex) + "\n" + std::to_string(sizey ) + "\n" + std::to_string(dim) + "\n";
for (int d = 0; d < dim; d++)
{
for (int x = 0; x < sizex * sizey; x++)
{
line += std::to_string(out[d][x]);
line += ",";
}
line += "\n";
}
fstream data;
data.open(savepfad + name + ".csv", ios::out);
data<<line;
data.close();
I also noticed that if out[][] == 0, std::to_string(out[][]) turns the 0 into 0.000000, whereas data << out[][] writes just 0 into the file. This grows the file size from 8000KB to 36000KB.
In Python I can dump 100MB onto the hard disk almost instantly, so I should be able to write 8000KB relatively quickly; currently it takes between 1 and 2 minutes.
example size:
sizex = 638
sizey = 958
dim = 8
The time measurement shows that almost the entire time is spent going through the two loops. It is a vector consisting of arrays; is the access to out too slow?
data << endl sends a newline AND flushes the result to disk.
You could do
data << "\n";
instead to send a newline without flushing.
The end result is that you flush fewer times, which means you spend less time waiting for the OS.
If that is still not fast enough, consider buffering everything into a std::ostringstream and dumping that into data in one go.
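For illustration only, a sketch of that idea wrapped in a hypothetical helper (writeCsv is not a name from the question; the parameters just mirror the question's variables):
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper mirroring the question's loops: format the whole CSV
// in memory first, then write it to the file with a single insertion.
void writeCsv(const std::string& path, const std::vector<float*>& out,
              int sizex, int sizey, int dim)
{
    std::ostringstream buffer;
    buffer << sizex << '\n' << sizey << '\n' << dim << '\n';
    for (int d = 0; d < dim; d++)
    {
        for (int x = 0; x < sizex * sizey; x++)
        {
            buffer << out[d][x] << ',';
        }
        buffer << '\n';
    }
    std::ofstream data(path);
    data << buffer.str();   // one large write instead of many small, flushed ones
}
There is no endl anywhere, and the ofstream flushes its contents when it goes out of scope.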
There are a couple of things you can do which may help, I would try implementing them one after another and measure the performance.
Don't flush after every line:
std::endl actually flushes the buffer and forces the data out to the drive; that's probably what is killing the performance. Use << '\n' instead.
You can try to minimize memory allocation and copying, if you buffer every line (or multiple lines) before writing it out. I would try to reserve a big string (std::string line; line.reserve(<big number enough for the full line>);) and do line += std::to_string(out[d][x]); line += ',';
You can optimize this even further by using std::to_chars (see the sketch below).
+1. If you are on Windows, you can try the latest MSVC: they reported a 5x speedup in float-to-string conversion (compared to the CRT functions) after implementing to_chars. https://www.youtube.com/watch?v=4P_kbF0EbZM
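A minimal sketch of the std::to_chars route (C++17, <charconv>; the floating-point overloads need a recent standard library such as MSVC's). appendFloat is just an illustrative name, not something from the question:
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: append one float and a comma to an already-reserved
// string, using std::to_chars instead of std::to_string.
void appendFloat(std::string& line, float value)
{
    char tmp[32];                                    // more than enough for a float
    auto res = std::to_chars(tmp, tmp + sizeof tmp, value);
    if (res.ec == std::errc())                       // success
    {
        line.append(tmp, res.ptr);
        line += ',';
    }
}
As a side effect, to_chars produces the shortest round-trip form, so a value of 0 comes out as "0" rather than "0.000000", which also addresses the file-size observation in the question's edit.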
I know endl or calling flush() will flush it. I also know that calling cin after cout flushes too, and so does program exit. Are there other situations where cout flushes?
I just wrote a simple loop, and I didn't flush it, but I can see it being printed to the screen. Why? Thanks!
for (int i =0; i<399999; i++) {
cout<<i<<"\n";
}
Also, the time it takes to finish is the same as with endl, both about 7 seconds.
for (int i =0; i<399999; i++) {
cout<<i<<endl;
}
There is no strict rule in the standard - only that endl WILL flush, while the implementation may also flush at any other time it "likes".
And of course, the numbers under 400K take up to 6 digits each, so the total output is roughly 6 * 400K = 2.4MB, which is very unlikely to fit in the buffer; the loop also runs fast enough that you won't notice whether there is a pause between outputs. Try something like this:
for(int i = 0; i < 100; i++)
{
cout<<i<<"\n";
Sleep(1000);
}
(If you are using a Unix based OS, use sleep(1) instead - or add a loop that takes some time, etc)
Edit: It should be noted that this is not guaranteed to show any difference. I know that on my Linux machine, if you don't have a flush in this particular type of scenario, it doesn't output anything - however, some systems may do "flush on \n" or something similar.
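If you explicitly want every output to reach the screen immediately rather than relying on the implementation, the standard knob is std::unitbuf, which makes the stream flush after every insertion (std::cerr has it set by default). A small illustration, not part of the answer above:
#include <iostream>

int main()
{
    std::cout << std::unitbuf;     // flush after every insertion from here on
    for (int i = 0; i < 100; ++i)
    {
        std::cout << i << '\n';    // now behaves like endl as far as flushing goes
    }
    std::cout << std::nounitbuf;   // back to normal buffering
}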
I am working on some grid-generation code, during which I really want to see how far along I am, so I downloaded a piece of progress-bar code from the internet and inserted it into my code, something like:
std::string bar;
for(int i = 0; i < 50; i++)
{
if( i < (percent/2))
{
bar.replace(i,1,"=");
}
else if( i == (percent/2))
{
bar.replace(i,1,">");
}
else
{
bar.replace(i,1," ");
}
}
std::cout<< "\r" "[" << bar << "] ";
std::cout.width( 3 );
std::cout<< percent << "% "
<< " ieration: " << iterationCycle << std::flush;
This is very straightforward. However, it GREATLY slows down the whole process (note that percent = iterI / nIter).
I am really get annoyed with this, I am wondering if there is any smarter and more efficient way to print a progress bar to the screen.
Thanks a million.
Firstly, you could consider only updating it every 100 or 1000 iterations. Secondly, I don't think the division is the bottleneck; it is much more likely the string operations and the output itself.
I guess the only significant improvement would be to just output less often.
Oh, and just for good measure: an efficient way to execute the code only every, say, 1024 iterations is not to test divisibility with the modulo operator, but with a bitwise AND against 1023 (i.e. 1024 - 1). Something along the lines of
if ((iterationCycle & 1023) == 0) {
would work. You'd be masking off everything except the lowest 10 bits of iterationCycle, so the result is zero exactly once every 1024 iterations; note that this trick only works for power-of-two intervals. These kinds of operations are extremely fast, as your CPU has dedicated hardware for them.
You might be overthinking this. I would just output a single character every however-many cycles of your main application code. Run some tests to see how many (hundreds? millions?), but you shouldn't print more than say once a second. Then just do:
std::fputc('*', stdout);
std::fflush(stdout);
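As an illustration of the "at most about once a second" idea (the iteration count below is a made-up stand-in for the question's nIter):
#include <chrono>
#include <cstdio>

int main()
{
    const long nIter = 100000000;                   // stand-in for the question's nIter
    auto last = std::chrono::steady_clock::now();
    for (long iter = 0; iter < nIter; ++iter)
    {
        // ... grid generation work would go here ...
        auto now = std::chrono::steady_clock::now();
        if (now - last >= std::chrono::seconds(1))  // print at most roughly once per second
        {
            std::fputc('*', stdout);
            std::fflush(stdout);
            last = now;
        }
    }
}
Reading the clock every iteration has its own cost, so in practice you would combine this with the "only update every N iterations" idea from the other answer.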
You should really measure the "efficiency" yourself, but something that works almost the same is boost.progress:
#include <boost/progress.hpp>
...
boost::progress_display pd(50);
for (int i = 0; i < 50; i++) {
++pd;
}
and as Joost already answered, output less often
We are using the following method to write the log to a log file. The log entries are kept in a vector of std::string named m_LogList. The method is called when the size of the vector exceeds 100. The CPU utilization of the log server is around 20-40% when we call the FlushLog method; if we comment it out, the CPU utilization drops to the 10-20% range.
What optimizations can I use to reduce the CPU utilization? We are using an fstream object to write the log entries to the file.
void CLogFileWriter::FlushLog()
{
CRCCriticalSectionLock lock(m_pFileCriticalSection);
//Entire content of the vector are writing to the file
if(0 < m_LogList.size())
{
for (int i = 0; i < (int)m_LogList.size(); ++i)
{
m_ofstreamLogFile << m_LogList[i].c_str()<<endl;
m_nSize = m_ofstreamLogFile.tellp();
if(m_pLogMngr->NeedsToBackupFile(m_nSize))
{
// Backup the log file
}
}
m_ofstreamLogFile.flush();
m_LogList.clear(); //Clearing the content of the Log List
}
}
The first optimization I'd use is to drop the .c_str() in << m_LogList[i].c_str(). It forces operator<< to do a strlen (O(n)) instead of relying on string::size (O(1)).
Also, I'd just sum string sizes, instead of calling tellp.
Finally, << endl includes a flush on every line. Just use << '\n'; you already have the flush at the end.
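A sketch of FlushLog with those three changes applied; the member names are copied from the question, and it assumes m_nSize is just a byte count that can be accumulated instead of read back via tellp():
void CLogFileWriter::FlushLog()
{
    CRCCriticalSectionLock lock(m_pFileCriticalSection);
    if (!m_LogList.empty())
    {
        for (size_t i = 0; i < m_LogList.size(); ++i)
        {
            m_ofstreamLogFile << m_LogList[i] << '\n';   // no c_str(), no per-line flush
            m_nSize += m_LogList[i].size() + 1;          // sum string sizes instead of tellp()
            if (m_pLogMngr->NeedsToBackupFile(m_nSize))
            {
                // Backup the log file
            }
        }
        m_ofstreamLogFile.flush();                       // single flush at the end
        m_LogList.clear();
    }
}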
I'd consider first of all dumping the log in one stdlib call like this:
std::copy(list.begin(), list.end(), std::ostream_iterator<std::string>(m_ofstreamLogFile, "\n"));
This removes the flushing caused by endl and the unnecessary conversion to a C string. CPU-wise this should be quite efficient.
You can do the backup afterwards unless you really care about a very specific limit, but even in that case I'd say: back up at some lower threshold so that you can account for some overflow.
Also, remove if(0 < m_LogList.size()), it's not really necessary for anything.
A few comments:
if(0 < m_LogList.size())
Should be:
if(!m_LogList.empty())
Although with a vector it shouldn't make a difference.
Also you should consider moving the
m_nSize = m_ofstreamLogFile.tellp();
if(m_pLogMngr->NeedsToBackupFile(m_nSize)) { /*...*/ }
out of the loop. You don't say how much CPU it uses, but I'd bet it is heavy.
You could also iterate using iterators:
for (int i = 0; i < (int)m_LogList.size(); ++i)
Should be:
for (std::vector<std::string>::iterator it = m_LogList.begin();
it != m_LogList.end(); ++it)
Lastly, change the line:
m_ofstreamLogFile << m_LogList[i].c_str()<<endl;
Into:
m_ofstreamLogFile << m_LogList[i] << '\n';
The .c_str() is unnecessary. And endl writes an EOL and flushes the stream. You do not want to do that, as you already flush the stream after the loop.