I have a program (let's call it a.exe) that reads some data from files, calculates something, and then writes the result to another file.
I need to call a.exe from my own Fortran code.
I came up with the following solution:
Write the data to disk from Fortran.
Call a.exe from Fortran.
a.exe now reads its input file, does the calculation, and writes the result to a file.
Parse the output file of a.exe in Fortran.
This works in principle, but it's not fast enough, because I need to call a.exe a lot and my program therefore spends too much time doing I/O.
So, if anyone has an idea to improve my solution, I'd be grateful.
As long as a.exe remains a separate executable that operates in the way you describe, there is little improvement possible. After the first invocation, the contents of a.exe should be cached in memory, and further invocations should not read a.exe from disk again until it is evicted from the cache.
The time spent on I/O reading and writing the files you are processing is unavoidable as long as you are using a disk for interprocess communication; it can be sped up considerably by setting up a RAM disk (tmpfs).
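For example, on Linux a tmpfs mount such as the following (mount point and size are arbitrary choices) gives you a directory whose files live entirely in memory:

mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk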
Obviously, more could be done if you can rewrite the code of a.exe. If you can turn it into an object file that can be linked with your Fortran code, that would be ideal. If not, you could make it into a persistent server, for example, so that you only start it once. Though if your process is still slow after moving your data files to a ramdisk, you might have other problems.
I am working on a C++ program that needs to write several hundred ASCII files. These files will be almost identical. In particular, the size of the files is always exactly the same, with only a few characters differing between them.
For this I am currently opening up N files with a for-loop over fopen and then calling fputc/fwrite on each of them for every chunk of data (every few characters). This seems to work, but it feels like there should be some more efficient way.
Is there something I can do to decrease the load on the file system and/or improve the speed of this? For example, how taxing is it on the file system to keep hundreds of files open and write to all of them bit by bit? Would it be better to open one file, write that one entirely, close it and only then move on to the next?
If you consider the cost of the context switches usually involved in any of those syscalls, then yes, you should piggyback as much data as possible, taking into account the writing time and the length of your buffers.
Given also that this is primarily an I/O-driven problem, a pub/sub architecture, in which the publisher buffers the data and hands it to any subscriber that does the I/O work (and that also waits for the underlying storage mechanism to be ready), could be a good choice.
You can write just once to one file and then make copies of that file. You can read about how to make copies here.
This is the sample code from the link above showing how to do it (note that it is the old Managed C++ syntax, not standard C++):
// Old Managed C++ (.NET) syntax; requires compiling with /clr.
#using <mscorlib.dll>
using namespace System;
using namespace System::IO;

int main() {
    String* path = S"c:\\temp\\MyTest.txt";
    String* path2 = String::Concat(path, S"temp");
    // Ensure that the target does not exist.
    File::Delete(path2);
    // Copy the file.
    File::Copy(path, path2);
    Console::WriteLine(S"{0} copied to {1}", path, path2);
    return 0;
}
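On a current compiler, a plain standard C++17 equivalent is just a call to std::filesystem::copy_file (a hedged sketch, not from the linked page; paths as in the sample above):

#include <filesystem>
namespace fs = std::filesystem;

int main() {
    // Copy the template file; overwrite any previous copy.
    fs::copy_file("c:/temp/MyTest.txt", "c:/temp/MyTest.txttemp",
                  fs::copy_options::overwrite_existing);
}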
Without benchmarking your particular system, I would GUESS - and that is probably the best you can get - that writing one file at a time is better than opening lots of files and writing the data to several of them interleaved. After all, preparing the data in memory is a minor detail; writing to the files is the long-running part.
I have done some testing now and it seems like, at least on my system, writing all files in parallel is about 60% slower than writing them one after the other (263s vs. 165s for 100 files times 100000000 characters).
I also tried to use ofstream instead of fputc, but fputc seems to be about twice as fast.
In the end, I will probably keep doing what I am doing at the moment, since the complexity of rewriting my code to write one file at a time is not worth the performance improvement.
I would like to have a small "application loader" program that receives other binary application files over TCP from an external server and runs them.
I could do this by saving the transmitted file to the hard disk and using the system() call to run it. However, I am wondering if it would be possible to launch the new application from memory without it ever touching the hard drive.
The state of the loader application does not matter after loading a new application. I prefer to stick to C, but C++ solutions are welcome as well. I would also like to stick to standard Linux C functions and not use any external libraries, if possible.
Short answer: no.
Long answer: it's possible, but rather tricky, to do this without writing the binary out to disk. You can theoretically write your own ELF loader that reads the binary, maps some memory, handles the dynamic linking as required, and then transfers control, but that's an awful lot of work that's hardly ever going to be worth the effort.
The next best solution is to write it to disk and call unlink as soon as possible. The disk doesn't even have to be a "real" disk; it can be tmpfs or similar.
The alternative I've been using recently is not to pass complete compiled binaries around, but to pass LLVM bytecode instead, which can then be JITed, interpreted, or saved to disk as you see fit. This also has the advantage of making your application work in heterogeneous environments.
It may be tempting to try a combination of fmemopen, fileno and fexecve, but this won't work for two reasons:
From fexecve() manpage:
"The file descriptor fd must be opened read-only, and the caller must have permission to execute the file that it refers to"
That is, it needs to be an fd that refers to a real file.
From fmemopen() manpage:
"There is no file descriptor associated with the file stream returned by these functions (i.e., fileno(3) will return an error if called on the returned stream)"
Much easier than doing it in C would be to just set up a tmpfs file system. You'd have all the advantages of the interface of a hard disk, and from your program / server / whatever you could just do an exec. These virtual file systems are quite efficient nowadays; there would really be just one copy of the executable, in the page cache.
As Andy points out, for such a scheme to be efficient you'd have to ensure that you don't use buffered writes to the file, but that you "write" (in a broader sense) directly in place:
you'd have to know how large your executable will be
create a file on your tmpfs
scale it to that size with ftruncate
"map" that file into memory with mmap to obtain the address of a buffer
pass that address directly to the recv call so the data is written in place
munmap the file
call exec with the file
rm the file; this can be done even while the executable is still running (see the sketch below)
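A minimal sketch of those steps on Linux (error handling mostly omitted). The /dev/shm path, the socket setup, and the assumption that the payload size is transmitted first are all mine; the unlink-before-fexecve variant relies on glibc resolving the still-open descriptor through /proc (or execveat), which is worth verifying on your target system.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

extern char **environ;

/* Assumes sock is a connected TCP socket and size was received beforehand. */
void receive_and_exec(int sock, size_t size)
{
    /* Create the backing file on a tmpfs mount and grow it to the payload size. */
    int fd = open("/dev/shm/payload", O_RDWR | O_CREAT | O_TRUNC, 0700);
    ftruncate(fd, (off_t)size);

    /* Map the file and recv() the binary directly in place. */
    char *buf = (char *)mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    for (size_t got = 0; got < size; ) {
        ssize_t n = recv(sock, buf + got, size - got, 0);
        if (n <= 0) _exit(1);            /* connection error; give up */
        got += (size_t)n;
    }
    munmap(buf, size);
    close(fd);

    /* Reopen read-only (as fexecve requires), unlink the path, then exec:
       the inode stays alive while the descriptor is open, and the loader's
       state no longer matters once the exec succeeds. */
    int rfd = open("/dev/shm/payload", O_RDONLY);
    unlink("/dev/shm/payload");
    char *const argv[] = { (char *)"payload", NULL };
    fexecve(rfd, argv, environ);
    _exit(127);                          /* only reached if the exec failed */
}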
You might want to look at (and reuse) UPX, which decompresses the executable into memory and then transfers control to ld-linux to start it.
My application continuously calculates strings and outputs them into a file; this runs for almost an entire day. But writing to the file is slowing my application down. Is there a way I can improve the speed? I also want to extend the application so that I can send the results to another system after some particular amount of time.
Thanks & Regards,
Mousey
There are several things that may or may not help you, depending on your scenario:
Consider using asynchronous I/O, for instance by using Boost.Asio. That way your application does not have to wait for expensive I/O operations to finish. However, you will have to buffer your generated data in memory, so make sure there is enough available.
Consider buffering your strings to a certain size, and then write them to disk (or the network) in big batches. Few big writes are usually faster than many small ones.
If you want to make it really good C++, meaning STL-compliant, make your algorithm a template function that takes an output iterator as its argument. That way you can easily have it write to files, the network, memory, or the console by providing the appropriate iterator.
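A minimal sketch of that idea (the loop body is a stand-in for the real calculation):

#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// The algorithm only sees an output iterator, so the destination
// (file, container, socket streambuf, ...) is the caller's choice.
template <typename OutIt>
void compute(OutIt out) {
    for (int i = 0; i < 100; ++i)                       // stand-in calculation
        *out++ = "result " + std::to_string(i) + "\n";
}

int main() {
    std::ofstream file("results.txt");
    compute(std::ostream_iterator<std::string>(file));  // write to a file

    std::vector<std::string> buffer;
    compute(std::back_inserter(buffer));                // or buffer in memory
}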
What if you wrote the results to a socket instead of a file? Another program, Y, would read from the socket, open a file, write to it, and close it, and after the specified time it would transfer the results to the other system.
That is, the file handling would be done by the other program; the original program X just sends its output to the socket and does not concern itself with flushing the file stream.
Also I want to extend the application so that I can send the results to another system after some particular amount of time.
If you just want to transfer the file to the other system, then I think a simple script will be enough for that.
Use more than one file for the logging. Say, after your file reaches a size of 1 MB, rename it to something that contains the date and time, and start writing to a new one with the original file name (a sketch follows the example below).
then you have:
results.txt
results2010-1-2-1-12-30.txt (January 2 2010, 1:12:30)
and so on.
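A hedged sketch of that rotation scheme (the size check via fseek/ftell and the exact name format are my choices):

#include <cstdio>
#include <ctime>

// Rename results.txt to a timestamped name once it exceeds max_bytes;
// the caller then reopens results.txt and keeps writing to the fresh file.
void rotate_if_needed(const char *name, long max_bytes) {
    FILE *f = std::fopen(name, "rb");
    if (!f) return;                      // nothing to rotate yet
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fclose(f);
    if (size < max_bytes) return;

    std::time_t t = std::time(nullptr);
    std::tm *lt = std::localtime(&t);
    char archived[64];
    std::snprintf(archived, sizeof archived, "results%d-%d-%d-%d-%d-%d.txt",
                  lt->tm_year + 1900, lt->tm_mon + 1, lt->tm_mday,
                  lt->tm_hour, lt->tm_min, lt->tm_sec);
    std::rename(name, archived);
}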
You can buffer the results of different computations in memory and only write to the file when the buffer is full. For example, you can design your application so that it computes the results of 100 calculations and writes all 100 of them to the file at once, then computes the next 100, and so on.
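In code, that could look something like this (the batch size and the stand-in calculation are assumptions):

#include <cstdio>
#include <string>

std::string compute(int i) {                 // stand-in for the real calculation
    return "result " + std::to_string(i) + "\n";
}

int main() {
    FILE *f = std::fopen("results.txt", "a");
    std::string batch;
    for (int i = 0; i < 100000; ++i) {
        batch += compute(i);
        if ((i + 1) % 100 == 0) {            // write every 100 results at once
            std::fwrite(batch.data(), 1, batch.size(), f);
            batch.clear();
        }
    }
    std::fwrite(batch.data(), 1, batch.size(), f);  // flush any remainder
    std::fclose(f);
}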
Writing to a file is obviously slow, but you can buffer the data and have a separate thread do the writing to the file. This can improve the speed of your application.
Secondly, you can use FTP to transfer the files to the other system.
I think there are some red herrings here.
On an older computer system, I would recommend caching the strings and doing a small number of large writes instead of a large number of small writes. On modern systems, the default disk-caching is more than adequate and doing additional buffering is unlikely to help.
I presume that you aren't disabling caching or opening the file for every write.
It is possible that there is some issue with writing very large files, but that would not be my first guess.
How big is the output file when you finish?
What causes you to think that the file is the bottleneck? Do you have profiling data?
Is it possible that there is a memory leak?
Any code or statistics you can post would help in the diagnosis.
I have a program written in C++ that opens a binary file (test.bin), reads it object by object, and puts each object into a new file (it opens the new file, writes into it in append mode, and closes it).
I use fopen/fclose, fread and fwrite.
test.bin contains 20,000 objects.
This program runs under Linux with g++ in 1 second, but with VS2008, in both debug and release mode, it takes 1 minute!
There are reasons why I don't do them in batches or don't keep them in memory or any other kind of optimizations.
I just wonder why it is so much slower under Windows.
Thanks,
I believe that when you close a file on Windows, it flushes the contents to disk each time. On Linux, I don't think that is the case. A flush on every close would be very expensive.
Unfortunately file access on Windows isn't renowned for its brilliant speed, particularly if you're opening lots of files and only reading and writing small amounts of data. For better results, the (not particularly helpful) solution would be to read large amounts of data from a small number of files. (Or switch to Linux entirely for this program?!)
Other random suggestions to try:
turn off the virus checker if you have one (I've got Kaspersky on my PC, and writing 20,000 files quickly drove it bananas)
use an NTFS disk if you have one (FAT32 will be even worse)
make sure you're not accidentally using text mode with fopen (easily done)
use setvbuf to increase the buffer size for each FILE (see the example after this list)
try CreateFile/ReadFile/etc. instead of fopen and friends, which won't solve your problem but may shave a few seconds off the running time (since the stdio functions do a bit of extra work that you probably don't need)
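For the setvbuf suggestion, a minimal example (the 64 KiB size is an arbitrary choice; setvbuf must be called before any other operation on the stream):

#include <cstdio>

int main() {
    FILE *f = std::fopen("out.bin", "wb");   // note the "b": binary mode, per the tip above
    static char buf[64 * 1024];
    std::setvbuf(f, buf, _IOFBF, sizeof buf);
    // ... fwrite the objects here ...
    std::fclose(f);
}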
I don't think it is a matter of VS2008; it is a matter of the differences between the Linux and Windows file systems, and of how the C++ runtime works with files on each.
I'm seeing a lot of guessing here.
You're running under VS2008 IDE. You can always use the "poor man's profiler" and find out exactly what's going on.
In that minute, hit the "pause" button and look at what it's doing, including the call stack. Do this several times. Every single pause is almost certain (Prob = 59/60) to catch it doing precisely what it doesn't do under Linux.
I have created an application that does the following:
Make some calculations and write the calculated data to a file; repeat 500,000 times (in all, 500,000 files are written one after the other); then repeat 2 more times (in all, 1.5 million files are written).
Read the data from a file and make some intense calculations with it; repeat for 1,500,000 iterations (iterating over all the files written in step 1).
Repeat step 2 for 200 iterations.
Each file is ~212 KB, so in all I have ~300 GB of data. The entire process looks like it will take ~40 days on a 2.8 GHz Core 2 Duo CPU.
My problem is (as you can probably guess) the time it takes to complete the entire process. All the calculations are serial (each calculation depends on the one before), so I can't parallelize this process across different CPUs or PCs. I'm trying to think how to make the process more efficient, and I'm pretty sure most of the overhead goes to file system access (duh...). Every time I access a file I open a handle to it and then close it once I finish reading the data.
One of my ideas to improve the run time was to use one big file of 300 GB (or several big files of 50 GB each); then I would use only one open file handle and simply seek to each relevant piece of data and read it. But I'm not sure what the overhead of opening and closing file handles is; can someone shed some light on this?
Another idea I had was to group the files into bigger ~100 MB files and then read 100 MB at a time instead of making many 212 KB reads, but this is much more complicated to implement than the idea above.
Anyway, if anyone can give me some advice on this or has any idea how to improve the run time, I would appreciate it!
Thanks.
Profiler update:
I ran a profiler on the process, and it looks like the calculations take 62% of the runtime and the file reads take 34%. That means even if I could miraculously eliminate that 34% of file I/O cost entirely, I would still be left with ~24 days, which is quite an improvement, but still a long time :)
Opening a file handle probably isn't the bottleneck; actual disk I/O is. If you can parallelize disk access (e.g. by using multiple disks, faster disks, or a RAM disk) you may benefit far more. Also, make sure I/O does not block the application: read from disk and process while waiting for the I/O, e.g. with a reader thread and a processor thread.
Another thing: if the next step depends on the current calculation, why go through the effort of saving it to disk? Maybe with another view of the process's dependencies you can rework the data flow and get rid of a lot of I/O.
Oh yes, and measure it :)
Each file is ~212 KB, so in all I have ~300 GB of data. The entire process looks like it will take ~40 days ... All the calculations are serial (each calculation depends on the one before), so I can't parallelize this process across different CPUs or PCs. ... pretty sure most of the overhead goes to file system access ... Every time I access a file I open a handle to it and then close it once I finish reading the data.
Writing 300 GB of data serially might take 40 minutes, only a tiny fraction of 40 days, so disk write performance shouldn't be an issue here.
Your idea of opening the file only once is spot-on. Probably closing the file after every operation causes your processing to block until the disk has completely written out all the data, negating the benefits of disk caching.
My bet is that the fastest implementation of this application will use a memory-mapped file; all modern operating systems have this capability. It can end up being the simplest code, too. You'll need a 64-bit processor and operating system, but you will not need 300 GB of RAM. Map the whole file into the address space at once and just read and write your data with pointers.
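A minimal sketch of this on POSIX (the file name and fixed record size are assumptions; pages are faulted in on demand, which is why 300 GB of RAM is not needed):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    const size_t RECORD_SIZE = 212 * 1024;    // ~212 KB per record, as in the question
    int fd = open("all_data.bin", O_RDWR);
    struct stat st;
    fstat(fd, &st);

    // Map the whole file; the OS pages data in and out as it is touched.
    char *data = (char *)mmap(NULL, st.st_size,
                              PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    // Record i is just a pointer computation; no open/seek/read/close cycle.
    char *record = data + 42 * RECORD_SIZE;   // e.g. record 42
    (void)record;                             // ... process records here ...

    munmap(data, st.st_size);
    close(fd);
}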
From your brief explanation it sounds like xtofl's suggestion of threads is the correct way to go. I would recommend that you profile your application first, though, to ensure that the time really is divided between I/O and CPU.
Then I would consider three threads joined by two queues.
Thread 1 reads files and loads them into RAM, then places the data (or pointers to it) in the queue. If the queue goes over a certain size the thread sleeps; if it goes below a certain size it starts again.
Thread 2 reads the data off the queue, does the calculations, and then writes the results to the second queue.
Thread 3 reads the second queue and writes the data to disk.
You could consider merging threads 1 and 3; this might reduce contention on the disk, as your app would only do one disk operation at a time. A sketch of the pipeline follows.
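A hedged sketch of that pipeline in C++11 (the bounded queue and the stand-in load/compute/store functions are assumptions, not the asker's code):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Bounded queue: push() sleeps while full, pop() sleeps while empty.
template <typename T>
class BoundedQueue {
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    const std::size_t cap_;
public:
    explicit BoundedQueue(std::size_t cap) : cap_(cap) {}
    void push(T v) {
        std::unique_lock<std::mutex> lk(m_);
        not_full_.wait(lk, [this] { return q_.size() < cap_; });
        q_.push(std::move(v));
        not_empty_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lk(m_);
        not_empty_.wait(lk, [this] { return !q_.empty(); });
        T v = std::move(q_.front());
        q_.pop();
        not_full_.notify_one();
        return v;
    }
};

// Stand-ins for the real per-file work.
std::string load_file(int i)              { return "data " + std::to_string(i); }
std::string compute(const std::string& s) { return s + " processed"; }
void store(const std::string&)            { /* fwrite in the real program */ }

int main() {
    BoundedQueue<std::string> raw(16), done(16);   // queue caps throttle the reader
    const int N = 1000;                            // 1,500,000 in the real run

    std::thread reader([&] {                       // thread 1: files -> RAM
        for (int i = 0; i < N; ++i) raw.push(load_file(i));
    });
    std::thread worker([&] {                       // thread 2: the calculations
        for (int i = 0; i < N; ++i) done.push(compute(raw.pop()));
    });
    std::thread writer([&] {                       // thread 3: RAM -> disk
        for (int i = 0; i < N; ++i) store(done.pop());
    });
    reader.join(); worker.join(); writer.join();
}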
Also, how does the operating system handle all the files? Are they all in one directory? What is performance like when you browse the directory (GUI file manager / dir / ls)? If that performance is bad, you might be working outside your file system's comfort zone. Although you can only change this easily on Unix, some file systems are optimized for different types of file usage, e.g. large files, lots of small files, etc. You could also consider splitting the files across different directories.
Before making any changes it might be useful to run a profiler trace to figure out where most of the time is spent to make sure you actually optimize the real problem.
What about using SQLite? I think you can get away with a single table.
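A hedged sketch of the single-table idea with the SQLite C API (the schema, file name, and blob size are assumptions; the single transaction and the reused prepared statement keep SQLite from syncing after every row):

#include <sqlite3.h>

int main() {
    sqlite3 *db = nullptr;
    sqlite3_open("results.db", &db);
    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS rec(id INTEGER PRIMARY KEY, data BLOB)",
                 nullptr, nullptr, nullptr);

    sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr);
    sqlite3_stmt *st = nullptr;
    sqlite3_prepare_v2(db, "INSERT INTO rec(id, data) VALUES(?, ?)", -1, &st, nullptr);
    for (int i = 0; i < 1000; ++i) {            // 1000 rows as a demo
        char payload[212] = {0};                // stand-in for the ~212 KB record
        sqlite3_bind_int(st, 1, i);
        sqlite3_bind_blob(st, 2, payload, sizeof payload, SQLITE_TRANSIENT);
        sqlite3_step(st);
        sqlite3_reset(st);
    }
    sqlite3_finalize(st);
    sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr);
    sqlite3_close(db);
}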
Using memory-mapped files should also be investigated, as this will reduce the number of system calls.