C++ program runs slow in VS2008

I have a program, written in C++, that opens a binary file (test.bin), reads it object by object, and puts each object into a new file (it opens the new file, writes into it in append mode, and closes it).
I use fopen/fclose, fread and fwrite.
test.bin contains 20,000 objects.
This program runs under Linux with g++ in 1 second, but under VS2008, in both debug and release mode, it takes 1 minute!
There are reasons why I don't process the objects in batches, keep them in memory, or apply other optimizations of that kind.
I just wonder why it is so slow under Windows.
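For reference, the pattern being described is roughly this (the record layout and file-naming scheme here are invented for illustration):

#include <cstdio>

struct Object { int id; char payload[64]; };   // hypothetical record layout

int main() {
    FILE* in = fopen("test.bin", "rb");        // note "b": binary mode
    if (!in) return 1;
    Object obj;
    while (fread(&obj, sizeof obj, 1, in) == 1) {
        char name[32];
        sprintf(name, "obj_%d.bin", obj.id);   // hypothetical naming scheme
        FILE* out = fopen(name, "ab");         // open for append...
        if (!out) break;
        fwrite(&obj, sizeof obj, 1, out);
        fclose(out);                           // ...and close immediately
    }
    fclose(in);
    return 0;
}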
Thanks,

I believe that when you close a file on Windows, it flushes the contents to disk each time. On Linux, I don't think that is the case. A flush on each operation would be very expensive.

Unfortunately file access on Windows isn't renowned for its brilliant speed, particularly if you're opening lots of files and only reading and writing small amounts of data. For better results, the (not particularly helpful) solution would be to read large amounts of data from a small number of files. (Or switch to Linux entirely for this program?!)
Other random suggestions to try:
turn off the virus checker if you have one (I've got Kaspersky on my PC, and writing 20,000 files quickly drove it bananas)
use an NTFS disk if you have one (FAT32 will be even worse)
make sure you're not accidentally using text mode with fopen (easily done)
use setvbuf to increase the buffer size for each FILE (these two points are sketched after this list)
try CreateFile/ReadFile/etc. instead of fopen and friends, which won't solve your problem but may shave a few seconds off the running time (since the stdio functions do a bit of extra work that you probably don't need)
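To illustrate those two points, a minimal sketch (the 64 KB buffer size is an arbitrary choice):

#include <cstdio>

int main() {
    // Open in binary mode explicitly; plain "w" on Windows means text mode,
    // which translates every '\n' into "\r\n" on write.
    FILE* out = fopen("out.bin", "wb");
    if (!out) return 1;

    // Replace the default stdio buffer with a larger one, so each FILE
    // hits the OS less often. setvbuf must be called before any I/O.
    static char buf[64 * 1024];
    setvbuf(out, buf, _IOFBF, sizeof buf);

    const char data[] = "object bytes";
    fwrite(data, 1, sizeof data, out);
    fclose(out);   // flushes whatever remains in the buffer
    return 0;
}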

I think it is not a matter of VS2008. It is a matter of the differences between the Linux and Windows file systems, and of how C++ works with files on each.

I'm seeing a lot of guessing here.
You're running under the VS2008 IDE, so you can always use the "poor man's profiler" and find out exactly what's going on.
During that minute, hit the "pause" button and look at what it's doing, including the call stack. Do this several times. Every single pause is almost certain (probability 59/60) to catch it doing precisely what it doesn't do under Linux.

Related

Writing similar contents to many files at once in C++

I am working on a C++ program that needs to write several hundred ASCII files. These files will be almost identical. In particular, the size of the files is always exactly the same, with only a few characters differing between them.
For this I am currently opening up N files with a for-loop over fopen and then calling fputc/fwrite on each of them for every chunk of data (every few characters). This seems to work, but it feels like there should be some more efficient way.
Is there something I can do to decrease the load on the file system and/or improve the speed of this? For example, how taxing is it on the file system to keep hundreds of files open and write to all of them bit by bit? Would it be better to open one file, write that one entirely, close it and only then move on to the next?
If you consider the cost of the context switches usually involved in any of those syscalls, then yes, you should "piggy-back" as much data as possible, taking into account the writing time and the length of the buffers.
Given also that this is primarily an I/O-driven problem, a pub/sub architecture, where the publisher buffers the data and hands it to a subscriber that does the I/O work (and that also waits for the underlying storage mechanism to be ready), could be a good choice.
You can write just once to one file and then make copies of that file. You can read about how to make copies here.
This is the sample code from the link above, showing how to do it in Managed C++ (.NET):
#using <mscorlib.dll>
using namespace System;
using namespace System::IO;

int main() {
    String* path = S"c:\\temp\\MyTest.txt";
    String* path2 = String::Concat(path, S"temp");
    // Ensure that the target does not exist.
    File::Delete(path2);
    // Copy the file.
    File::Copy(path, path2);
    Console::WriteLine(S"{0} copied to {1}", path, path2);
    return 0;
}
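For a plain, unmanaged C++ program, a minimal stdio-based equivalent might look like this (the helper name and 64 KB buffer size are arbitrary):

#include <cstdio>

// Copy src to dst through a fixed-size buffer; returns true on success.
bool CopyFileContents(const char* src, const char* dst) {
    FILE* in = fopen(src, "rb");
    if (!in) return false;
    FILE* out = fopen(dst, "wb");
    if (!out) { fclose(in); return false; }
    char buf[64 * 1024];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(out);
    fclose(in);
    return true;
}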
Without benchmarking your particular system, I would GUESS - and that is probably the best you can get - that writing one file at a time is better than opening lots of files and writing the data to several of them. After all, preparing the data in memory is a minor detail; writing to the files is the long part.
I have done some testing now and it seems like, at least on my system, writing all files in parallel is about 60% slower than writing them one after the other (263s vs. 165s for 100 files times 100000000 characters).
I also tried to use ofstream instead of fputc, but fputc seems to be about twice as fast.
In the end, I will probably keep doing what I am doing at the moment, since the complexity of rewriting my code to write one file at a time is not worth the performance improvement.

Does my text editor application have a memory leak? Why does it consume 3x more memory than Notepad?

I am writing a text editor application. As an experiment, I ran the application and monitored its memory usage in Task Manager as I performed different actions.
When I first launched the application, it used 3,000 KB.
It stayed roughly the same while I typed.
When I clicked Save, it shot up to 9,000 KB,
and then it settled at 8,500 KB (it didn't go back down to 3,000 KB).
Is this caused by a memory leak? I'm a bit confused because I observed similar behaviour with Notepad.
Launching: 1,500 KB
Saving: 6,000 KB
After saving, memory stays at around 5,000 KB
Also, why does my application take up 3x more memory than Notepad.exe? What kinds of things could cause that? Should I be worried?
To start with you want to know where that memory is actually being used. There are a lot of complex programs to do memory analysis/profiling, but if you want something more detailed than Task Manager but still fairly simple and free, Sysinternals vmmap is great.
http://technet.microsoft.com/en-us/sysinternals/dd535533
As others have mentioned, the save is probably causing other libraries to be pulled in. The text itself is also going to contribute to your memory usage. VMMap will help you determine how much is yours and how much is other stuff. Then you can see whether your part is really growing substantially over time. If you are not going to use a memory profiler, you probably want a long stress-testing run to really see whether it is leaking memory; otherwise the leak is probably not going to be big enough to notice easily.
The File-Save dialog starting up for the first time probably burns a lot of memory. Opening the file dialog embeds a copy of Explorer in the window, for instance, and loading Explorer into your process carries a lot of baggage along with it.
The fact that you are using Qt means there's a lot of extra code added to your software. Qt Core, for instance, is over 2 MB, and Qt Gui is about 8 MB. Microsoft, on the other hand, has probably coded Notepad using pure C/C++ and the Windows API, which means they have a smaller and faster executable.
Finally, it also depends on your compiler. MinGW is going to create larger and slower executables than the Visual C++ compiler, so if you can, try to use Microsoft's compiler.
I tried exactly the same thing in Notepad: the save needs more memory. If you open an existing file and save it, there is no difference in memory. In the end, it is creating the file that takes tons of memory.

Should I leave a QFile (or fstream) open?

I usually use two methods to write to files, either with Qt's QFile or STL's fstream.
I have a long-running (several minutes) simulation which logs data to a file. Performance-wise and design-wise, is it a good idea to:
Keep it open the whole time
Close and open on every write
Somewhere in-between (1) and (2)
Several questions on here address this issue (for Perl, for fopen), but I didn't see any discussion of QFile and fstream. The previous answers suggest keeping it open (option 1). Is this still the case for QFile and fstream?
Performance-wise, it would definitely be better to keep it open for the life of the application: opening and closing files is work that doesn't move the application closer to completion, so doing it on every write will slow the program down. As for design, just make sure to close the file before the application terminates.
QFile and fstream probably use fopen, fwrite etc under the hood (although it is of course implementation dependent). So I would bet that anything applying to FILE*s would apply to QFiles and fstreams.
This may depend on your libc implementation, but fstream generally uses memory-mapped files. These are generally very efficient, and only consume main memory or swap when a page of data is written to.
If you are running a 32-bit system and these files are very large or very numerous, then you could have issues with exhausting the virtual address space (on Windows, ~2GB might cause such problems). Seeing as you are simply logging, this seems quite unlikely.
But simply closing the files might make it worse in that case, because then the virtual address space could become fragmented.
I would advise that it is best to leave the files open at all times unless you think you will run into the issues above. If you are memory-constrained, flushing the data periodically will reduce the physical memory requirements.
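As a concrete illustration of option (1) combined with flushing, a minimal fstream sketch (the file name and flush interval are arbitrary):

#include <fstream>

int main() {
    std::ofstream log("simulation.log");   // opened once, for the whole run
    for (int step = 0; step < 1000000; ++step) {
        log << "step " << step << ": result...\n";
        if (step % 1000 == 0)
            log.flush();   // bound the amount of buffered, unsaved data
    }
}   // the destructor closes the file, satisfying the design concern above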

Need some help writing my results to a file

My application continuously calculates strings and outputs them into a file, and it runs for almost an entire day. But writing to the file is slowing my application down. Is there a way I can improve the speed? I also want to extend the application so that I can send the results to another system after some particular amount of time.
Thanks & Regards,
Mousey
There are several things that may or may not help you, depending on your scenario:
Consider using asynchronous I/O, for instance by using Boost.Asio. This way your application does not have to wait for expensive I/O-operations to finish. However, you will have to buffer your generated data in memory, so make sure there is enough available.
Consider buffering your strings to a certain size, and then write them to disk (or the network) in big batches. Few big writes are usually faster than many small ones.
If you want to make it really good C++, meaning STL-compliant, make your algorithm a template function that takes an output iterator as its argument. This way you can easily have it write to files, the network, memory, or the console by providing the appropriate iterator.
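For that last point, a minimal sketch (the function name and the squaring stand in for the real computation):

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// The algorithm writes its results through any output iterator.
template <typename OutIt>
void computeResults(int n, OutIt out) {
    for (int i = 0; i < n; ++i)
        *out++ = i * i;   // stand-in for the real calculation
}

int main() {
    // The same algorithm feeding three different sinks:
    std::ofstream file("results.txt");
    computeResults(10, std::ostream_iterator<int>(file, "\n"));     // file
    computeResults(10, std::ostream_iterator<int>(std::cout, " ")); // console
    std::vector<int> mem;
    computeResults(10, std::back_inserter(mem));                    // memory
    return 0;
}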
How about writing the results to a socket instead of a file? Another program, Y, will read from the socket, open a file, write to it, and close it, and after the specified time it will transfer the results to the other system.
I mean that the file handling is done by the other program; the original program X just sends the output to the socket. It does not concern itself with flushing the file stream.
"Also I want to extend the application so that I can send the results to another system after some particular amount of time."
If you just want to transfer the file to the other system, then I think a simple script will be enough for that.
Use more than one file for the logging. Say, after your file reaches a size of 1 MB, rename it to something that contains the date and time, and start writing to a new file named as the original one (a sketch of this follows the example below).
then you have:
results.txt
results2010-1-2-1-12-30.txt (January 2 2010, 1:12:30)
and so on.
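A minimal sketch of that rotation logic (the size limit matches the example, and the helper name is hypothetical):

#include <cstdio>
#include <ctime>

const long kMaxSize = 1024 * 1024;   // 1 MB

// If the log has grown past the limit, archive it under a timestamped
// name and open a fresh file under the original name.
FILE* rotateIfNeeded(FILE* log, const char* name) {
    if (ftell(log) < kMaxSize)
        return log;
    fclose(log);
    time_t now = time(NULL);
    struct tm* t = localtime(&now);
    char archived[64];
    sprintf(archived, "results%d-%d-%d-%d-%d-%d.txt",
            t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
            t->tm_hour, t->tm_min, t->tm_sec);
    rename(name, archived);
    return fopen(name, "a");
}

Call it after each write: whenever results.txt passes 1 MB, it is renamed to the timestamped form and a fresh results.txt is started.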
You can buffer the results of different computations in memory and only write to the file when the buffer is full. For example, you can design your application in such a way that it computes the results of 100 calculations and writes all 100 of them at once to the file, then computes another 100, and so on.
Writing to a file is obviously slow, but you can buffer the data and start a separate thread for writing to the file; this can improve the speed of your application (a minimal sketch follows below).
Secondly, you can use FTP to transfer the files to the other system.
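Here is a minimal sketch of such a writer thread, assuming a C++11 compiler is available (std::thread plus a mutex-protected queue; all names are illustrative):

#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

std::queue<std::string> pending;   // results waiting to be written
std::mutex m;
std::condition_variable cv;
bool done = false;

// Writer thread: drains the queue and appends each result to the file.
void writerLoop(const char* filename) {
    std::ofstream out(filename, std::ios::app);
    std::unique_lock<std::mutex> lock(m);
    while (!done || !pending.empty()) {
        cv.wait(lock, [] { return done || !pending.empty(); });
        while (!pending.empty()) {
            std::string line = pending.front();
            pending.pop();
            lock.unlock();          // don't hold the lock during the I/O
            out << line << '\n';
            lock.lock();
        }
    }
}

// Called by the computation thread for each result.
void publish(const std::string& line) {
    { std::lock_guard<std::mutex> lock(m); pending.push(line); }
    cv.notify_one();
}

int main() {
    std::thread writer(writerLoop, "results.txt");
    for (int i = 0; i < 1000; ++i)
        publish("result " + std::to_string(i));
    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_one();
    writer.join();
    return 0;
}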
I think there are some red herrings here.
On an older computer system, I would recommend caching the strings and doing a small number of large writes instead of a large number of small writes. On modern systems, the default disk-caching is more than adequate and doing additional buffering is unlikely to help.
I presume that you aren't disabling caching or opening the file for every write.
It is possible that there is some issue with writing very large files, but that would not be my first guess.
How big is the output file when you finish?
What causes you to think that the file is the bottleneck? Do you have profiling data?
Is it possible that there is a memory leak?
Any code or statistics you can post would help in the diagnosis.

Random Complete System Unresponsiveness Running Mathematical Functions

I have a program that loads a file (anywhere from 10MB to 5GB) a chunk at a time (ReadFile), and for each chunk performs a set of mathematical operations (basically calculates the hash).
After calculating the hash, it stores info about the chunk in an STL map (basically <chunkID, hash>) and then writes the chunk itself to another file (WriteFile).
That's all it does. This program will cause certain PCs to choke and die. The mouse begins to stutter, the task manager takes > 2 min to show, ctrl+alt+del is unresponsive, running programs are slow.... the works.
I've done literally everything I can think of to optimize the program, and have triple-checked all objects.
What I've done:
Tried different (less intensive) hashing algorithms.
Switched all allocations to nedmalloc instead of the default new operator
Switched from std::map to unordered_set, found the performance to still be abysmal, so I switched again to Google's dense_hash_map.
Converted all objects to store pointers to objects instead of the objects themselves.
Cached all read and write operations. Instead of reading a 16k chunk of the file and performing the math on it, I read 4MB into a buffer and read 16k chunks from there instead. Same for all write operations - they are coalesced into 4MB blocks before being written to disk. (A sketch of this follows the list.)
Ran extensive profiling with Visual Studio 2010, AMD CodeAnalyst, and perfmon.
Set the thread priority to THREAD_MODE_BACKGROUND_BEGIN
Set the thread priority to THREAD_PRIORITY_IDLE
Added a Sleep(100) call after every loop.
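For reference, a minimal sketch of the read-coalescing step described above (the class and method names are hypothetical):

#include <windows.h>
#include <cstring>
#include <vector>

// Pulls 4 MB from disk at a time and hands out small chunks from memory.
class BufferedReader {
    HANDLE file_;
    std::vector<char> buf_;
    DWORD filled_;
    DWORD pos_;
public:
    explicit BufferedReader(HANDLE file)
        : file_(file), buf_(4 * 1024 * 1024), filled_(0), pos_(0) {}

    // Copies up to chunkSize bytes into chunk; returns 0 at end of file.
    DWORD NextChunk(char* chunk, DWORD chunkSize) {
        if (pos_ == filled_) {   // buffer exhausted: refill from disk
            if (!ReadFile(file_, &buf_[0], (DWORD)buf_.size(), &filled_, NULL)
                || filled_ == 0)
                return 0;
            pos_ = 0;
        }
        DWORD n = chunkSize < filled_ - pos_ ? chunkSize : filled_ - pos_;
        memcpy(chunk, &buf_[0] + pos_, n);
        pos_ += n;
        return n;
    }
};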
Even after all this, the application still results in a system-wide hang on certain machines under certain circumstances.
Perfmon and Process Explorer show minimal CPU usage (with the sleep), no constant reads/writes from disk, few hard pagefaults (and only ~30k pagefaults in the lifetime of the application on a 5GB input file), little virtual memory (never more than 150MB), no leaked handles, no memory leaks.
The machines I've tested it on run Windows XP - Windows 7, x86 and x64 versions included. None have less than 2GB RAM, though the problem is always exacerbated under lower memory conditions.
I'm at a loss as to what to do next. I don't know what's causing it - I'm torn between CPU and memory as the culprit: CPU, because without the sleep and under different thread priorities the system's performance changes noticeably; memory, because there's a huge difference in how often the issue occurs when using unordered_set vs. Google's dense_hash_map.
What's really weird? Obviously, the NT kernel design is supposed to prevent this sort of behavior from ever occurring (a user-mode application driving the system to this sort of extremely poor performance!?)... but when I compile the code and run it on OS X or Linux (it's fairly standard C++ throughout), it performs excellently even on poor machines with little RAM and weaker CPUs.
What am I supposed to do next? How do I know what the hell it is that Windows is doing behind the scenes that's killing system performance, when all the indicators are that the application itself isn't doing anything extreme?
Any advice would be most welcome.
I know you said you had monitored memory usage and that it seems minimal here, but the symptoms sound very much like the OS thrashing like crazy, which would definitely cause general loss of OS responsiveness like you're seeing.
When you run the application on a file say 1/4 to 1/2 the size of available physical memory, does it seem to work better?
What I suspect may be happening is that Windows is "helpfully" caching your disk reads into memory and not giving up that cache memory to your application for use, forcing it to go to swap. Thus, even though swap use is minimal (150MB), it's going in and out constantly as you calculate the hash. This then brings the system to its knees.
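One way to test that hypothesis is to hint the cache manager when opening the file. A minimal sketch using FILE_FLAG_SEQUENTIAL_SCAN (FILE_FLAG_NO_BUFFERING would bypass the cache entirely, at the cost of alignment requirements):

#include <windows.h>

int main() {
    // Tell the cache manager the file will be read front to back once,
    // so it can recycle cache pages instead of hoarding them.
    HANDLE h = CreateFileA("input.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;
    // ... the existing ReadFile/hash/WriteFile loop goes here ...
    CloseHandle(h);
    return 0;
}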
Some things to check:
Antivirus software. These often scan files as they're opened to check for viruses. Is your delay occurring before any data is read by the application?
General system performance. Does copying the file using Explorer also show this problem?
Your code. Break it down into the various stages. Write a program that just reads the file, then one that reads and writes the files, then one that just hashes random blocks of RAM (i.e. remove the disk I/O part), and see if any particular step is problematic. If you can get a profiler, use it as well to see if there are any slow spots in your code.
EDIT
More ideas. Perhaps your program is holding on to the GDI lock too much. This would explain everything else being slow without high CPU usage. Only one app at a time can have the GDI lock. Is this a GUI app, or just a simple console app?
You also mentioned RtlEnterCriticalSection. This is a costly operation, and can hang the system quite easily, e.g. with mismatched Enters and Leaves. Are you multi-threading at all? Is the slowdown due to race conditions between threads?
XPerf is your guide here - watch the PDC Video about it, and then take a trace of the misbehaving app. It will tell you exactly what's happening throughout the system, it is extremely powerful.
I like the disk-caching/thrashing suggestions, but if that's not it, here are some scattershot suggestions:
What non-MSVC libraries, if any, are you linking to?
Can your program be modified (#ifdef'd) to run without a GUI? Does the problem occur?
You added ::Sleep(100) after each loop in each thread, right? How many threads are you talking about? A handful or hundreds? How long does each loop take, roughly? What happens if you make that ::Sleep(10000)?
Is your program perhaps doing something else that locks a limited resource? (ProcExp can show you what handles are being acquired... of course, you might have difficulty with ProcExp not responding :-[)
Are you sure CriticalSections are userland-only? I recall that was so back when I worked on Windows (or so I believed), but Microsoft could have modified that. I don't see any guarantee in the MSDN article Critical Section Objects (http://msdn.microsoft.com/en-us/library/ms682530%28VS.85%29.aspx) ... and this leads me to wonder: Anti-convoy locks in Windows Server 2003 SP1 and Windows Vista
Hmmm... presumably we're all multi-processor now, so are you setting the spin count on the CS?
How about running a debugging version of one of these OSes and monitoring the kernel debugging output (using DbgView)... possibly using the kernel debugger from the Platform SDK ... if MS still calls it that?
I wonder whether VMMap (another SysInternal/MS utility) might help with the Disk caching hypothesis.
It turns out that this is a bug in the Visual Studio compiler. Using a different compiler resolves the issue entirely.
In my case, I installed and used the Intel C++ Compiler, and even with all optimizations disabled I did not see the full-system hang that I was experiencing with the Visual Studio 2005 - 2010 compilers on this library.
I'm not certain as to what is causing the compiler to generate such broken code, but it looks like we'll be buying a copy of the Intel compiler.
It sounds like you're poking around fixing things without knowing what the problem is. Take stackshots. They will tell you what your program is doing when the problem occurs. It might not be easy to get the stackshots if the problem occurs on other machines where you cannot use an IDE or a stack sampler. One possibility is to kill the app and get a stack dump when it's acting up. You need to reproduce the problem in an environment where you can get a stack dump.
Added: You say it performs well on OSX and Linux, and poorly on Windows. I assume the ratio of completion time is some fairly large number, like 10 or 100, if you've even had the patience to wait for it. I said this in the comment, but it is a key point. The program is waiting for something, and you need to find out what. It could be any of the things people mentioned, but it is not random.
Every program, all the time while it runs, has a call stack consisting of a hierarchy of call instructions at specific addresses. If at a point in time it is calculating, the last instruction on the stack is a non-call instruction. If it is in I/O the stack may reach into a few levels of library calls that you can't see into. That's OK. Every call instruction on the stack is waiting. It is waiting for the work it requested to finish. If you look at the call stack, and look at where the call instructions are in your code, you will know what your program is waiting for.
Your program, since it is taking so long to complete, is spending nearly all of its time waiting for something to finish, and as I said, that's what you need to find out. Get a stack dump while it's being slow, and it will give you the answer. The chance that it will miss it is 1/the-slowness-ratio.
Sorry to be so elemental about this, but lots of people (and profiler makers) don't get it. They think they have to measure.