File writing with overlapped IO vs file writing in a separate thread - c++

Is there any advantage in using file writing with overlapped IO in Windows, vs just doing the file writing in a separate thread that I create?
[Edit - please note that I'm doing the file writes without system caching, i.e. I use the FILE_FLAG_NO_BUFFERING flag in CreateFile.]

Since all writes are cached in the system cache by default, there is little advantage to doing overlapped I/O or creating a separate thread for writes at all. Most WriteFile calls are just memcpys at their core, which are lazily written to disk by the OS in an optimal fashion with other writes.
You can, of course, turn off buffered I/O via flags to CreateFile and then there are advantages to doing some sort of async I/O - but you probably didn't/shouldn't do that.
Edit
The OP has clarified they are in fact using unbuffered I/O. In that case the two suggested solutions are nearly identical; internally Windows uses a thread pool to service async I/O requests. But hypothetically Windows can be more efficient because its half is implemented in the kernel, has fewer context switches, etc.

One advantage of overlapped I/O is that it lets a single thread (or more usually a pool of threads) handle an arbitrary number of I/O requests concurrently. This might not be an advantage for a single-user desktop application, but for a server application that can get I/O requests from many different clients it can be a major win.

Possibly because overlapped I/O tells Windows to write out the file on its own time in the background, as opposed to spawning a whole new thread and engaging in a blocking operation?

Related

What actually happens in asynchronous IO

I keep reading about why asynchronous IO is better than synchronous IO, which is because in a-sync IO, your program can keep running, while in sync IO you're blocked until operation is finished.
I do not understand this saying because using sync IO (such as write()) the kernel writes the data to the disk - it doesn't happen by itself. The kernel does need CPU time in order to do it.
So a-sync IO needs it as well, which might result in a context switch from my application to the kernel. So it's not really blocking, but CPU cycles are still needed to run this operation.
Is that correct?
Is the difference between those two that we assume disk access is slow, so compared to sync IO where you wait for the data to be written to disk, in a-sync IO the time you wait for it to be written to disk can be used to continue doing application processing, and the kernel part of writing it to disk is small?
Let's say I have an application that all it does is get info and write it into files. Is there any benefit for using a-sync IO instead of sync IO?
Examples of sync IO:
write()
Examples of async IO:
io_uring (as I understand has zero copy as well, so it's a benefit)
spdk (should be best, though I don't understand how to use it)
aio
Your understanding is partly right, but which tools you use are a matter of what programming model you prefer, and don't determine whether your program will freeze waiting for I/O operations to finish. For certain, specialized, very-high-load applications, some models are marginally to moderately more efficient, but unless you're in such a situation, you should pick the model that makes it easy to write and maintain your program and have it be portable to systems you and your users care about, not the one someone is marketing as high-performance.
Traditionally, there were two ways to do I/O without blocking:
Structure your program as an event loop performing select (nowadays poll; select is outdated and has critical flaws) on a set of file descriptors that might be ready for reading input or accepting output. This requires keeping some sort of explicit state for partial input that you're not ready to process yet and for pending output that you haven't been able to write out yet.
Separate I/O into separate execution contexts. Historically the unixy approach to this was separate processes, and that can still make sense when you have other reasons to want separate processes anyway (privilege isolation, etc.) but the more modern way to do this is with threads. With a separate execution context for each I/O channel you can just use normal blocking read/write (or even buffered stdio functions) and any partial input or unfinished output state is kept for you implicitly in the call frame stack/local variables of its execution context.
Note that, of the above two options, only the latter helps with stalls from disk access being slow, as regular files are always "ready" for input and output according to select/poll.
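To make the point about regular files concrete, here is a minimal POSIX sketch. It is not from the answer above, and the helper names `readable_within`, `pipe_becomes_readable_after_write` and `empty_regular_file_is_ready` are made up for illustration. It polls an empty pipe, which should only become readable after something is written, and an empty temporary file, which poll should report as ready right away. It assumes a POSIX system with a writable /tmp.

```cpp
#include <cassert>
#include <cstdlib>
#include <poll.h>
#include <unistd.h>

// True if poll() reports fd as readable within timeout_ms milliseconds.
bool readable_within(int fd, int timeout_ms) {
    assert(fd >= 0);
    pollfd p{};
    p.fd = fd;
    p.events = POLLIN;
    return poll(&p, 1, timeout_ms) == 1 && (p.revents & POLLIN) != 0;
}

// An empty pipe is not readable until something has been written to it.
bool pipe_becomes_readable_after_write() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    bool before = readable_within(fds[0], 0);
    bool wrote = write(fds[1], "x", 1) == 1;
    bool after = readable_within(fds[0], 0);
    close(fds[0]);
    close(fds[1]);
    return !before && wrote && after;
}

// A regular file is reported ready immediately, even when it is empty,
// so poll() cannot be used to wait for a slow disk.
bool empty_regular_file_is_ready() {
    char path[] = "/tmp/poll_demo_XXXXXX";  // hypothetical temp-file name pattern
    int fd = mkstemp(path);
    if (fd < 0) return false;
    bool ready = readable_within(fd, 0);
    close(fd);
    unlink(path);
    return ready;
}
```

Both helpers should return true on a typical Linux box, which is why an event loop built on poll helps with sockets and pipes but not with disk stalls.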
Nowadays there's a trend, probably owing largely to languages like JavaScript, towards a third approach, the "async model", with event handler callbacks. I find it harder to work with, requiring more boilerplate code, and harder to reason about, than either of the above methods, but plenty of people like it. If you want to use it, it's probably preferable to do so with a library that abstracts the Linuxisms you mentioned (io_uring, etc.) so your program can run on other systems and doesn't depend on the latest Linux fads.
Now to your particular question:
Let's say I have an application that all it does is get info and write it into files. Is there any benefit for using a-sync IO instead of sync IO?
If your application has a single input source (no interactivity) and a single output, e.g. like most unix commands, there is absolutely no benefit to any kind of async I/O regardless of which programming model you use (event loop, threads, async callbacks, whatever). The simplest and most efficient thing to do is just read and write.
The kernel does need CPU time in order to do it.
Is that correct?
Pretty much, yes.
Is the difference between those two that we assume disk access is slow ... in a-sync IO the time you wait for it to be written to disk can be used to continue doing application processing, and the kernel part of writing it to disk is small?
Exactly.
Let's say I have an application that all it does is get info and write it into files. Is there any benefit for using a-sync IO instead of sync IO?
Depends on many factors. How does the application "get info"? Is it CPU intensive? Does it use the same IO as the writing? Is it a service that processes multiple requests concurrently? How many simultaneous connections? Is the performance important in the first place? In some cases: Yes, there may be significant benefit in using async IO. In some other cases, you may get most of the benefits by using sync IO in a separate thread. And in other cases single threaded sync IO can be sufficient.
I do not understand this saying because using sync IO (such as write()) the kernel writes the data to the disk - it doesn't happen by itself. The kernel does need CPU time in order to do it.
No. Most modern devices are able to transfer data to/from RAM by themselves (using DMA or bus mastering).
For example, the CPU might tell a disk controller "read 4 sectors into RAM at address 0x12345000" and then the CPU can do anything else it likes while the disk controller does the transfer (and will be interrupted by an IRQ from the disk controller when the disk controller has finished transferring the data).
However; for modern systems (where you can have any number of processes all wanting to use the same device at the same time) the device driver has to maintain a list of pending operations. In this case (under load); when the device generates an IRQ to say that it finished an operation the device driver responds by telling the device to start the next "pending operation". That way the device spends almost no time idle waiting to be asked to start the next operation (much better device utilization) and the CPU spends almost all of its time doing something else (between IRQs).
Of course often hardware is more advanced (e.g. having an internal queue of operations itself, so driver can tell it to do multiple things and it can start the next operation as soon as it finished the previous operation); and often drivers are more advanced (e.g. having "IO priorities" to ensure that more important stuff is done first rather than just having a simple FIFO queue of pending operations).
Let's say I have an application that all it does is get info and write it into files. Is there any benefit for using a-sync IO instead of sync IO?
Let's say that you get info from deviceA (while CPU and deviceB are idle); then process that info a little (while deviceA and deviceB are idle); then write the result to deviceB (while deviceA and CPU are idle). You can see that most hardware is doing nothing most of the time (poor utilization).
With asynchronous IO; while deviceA is fetching the next piece of info the CPU can be processing the current piece of info while deviceB is writing the previous piece of info. Under ideal conditions (no speed mismatches) you can achieve 100% utilization (deviceA, CPU and deviceB are never idle); and even if there are speed mismatches (e.g. deviceB needs to wait for CPU to finish processing the current piece) the time anything spends idle will be minimized (and utilization maximized as much as possible).
The other alternative is to use multiple tasks - e.g. one task that fetches data from deviceA synchronously and notifies another task when the data was read; a second task that waits until data arrives and processes it and notifies another task when the data was processed; then a third task that waits until data was processed and writes it to deviceB synchronously. For utilization; this is effectively identical to using asynchronous IO (in fact it can be considered "emulation of asynchronous IO"). The problem is that you've added a bunch of extra overhead managing and synchronizing multiple tasks (more RAM spent on state and stacks, task switches, lock contention, ...); and made the code more complex and harder to maintain.
Context switching is necessary in any case. The kernel always works in its own context. So synchronous access doesn't save processor time.
Usually, writing doesn't require a lot of processor work. The limiting factor is the disk response. The question is whether we wait for this response or do our work in the meantime.
Let's say I have an application that all it does is get info and write
it into files. Is there any benefit for using a-sync IO instead of
sync IO?
If you implement synchronous access, your sequence is as follows:
1. get information
2. write information
3. goto 1.
So, you can't get information until write() completes. Suppose the information supplier is as slow as the disk you write to. In this case the program will be twice as slow as the asynchronous one.
If the information supplier can't wait and hold the information while you are writing, you will lose portions of the information during each write. Examples of such information sources could be sensors for fast processes. In this case, you should read the sensors synchronously and save the obtained values asynchronously.
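The overlapped version of that get/write loop can be sketched in portable C++ with std::async. This is only an illustration under my own assumptions: `pump` and `demo_pump` are made-up names, the "information supplier" is any callable that fills a string and returns false when it has nothing more, and only one write is kept in flight so the chunks reach the file in order.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <future>
#include <iterator>
#include <string>
#include <utility>

// Writes every chunk produced by next_chunk to path. While chunk i is being
// written on a background thread, the loop is already fetching chunk i+1.
// Returns the number of chunks written.
std::size_t pump(const std::function<bool(std::string&)>& next_chunk,
                 const std::string& path) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    std::future<void> pending;  // the write currently in flight, if any
    std::size_t written = 0;
    std::string chunk;
    while (next_chunk(chunk)) {              // "get information"
        if (pending.valid()) pending.get();  // wait for the previous write
        pending = std::async(std::launch::async,
                             [&out, data = std::move(chunk)] {  // "write information"
                                 out.write(data.data(),
                                           static_cast<std::streamsize>(data.size()));
                             });
        chunk.clear();  // leave the moved-from string in a known state
        ++written;
    }
    if (pending.valid()) pending.get();
    return written;
}

// Pumps three generated chunks into path and returns the file's contents.
std::string demo_pump(const std::string& path) {
    int i = 0;
    auto source = [&i](std::string& chunk) {
        if (i == 3) return false;
        chunk = "chunk" + std::to_string(i++) + "\n";
        return true;
    };
    pump(source, path);
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With a supplier and a disk of similar speed, each loop iteration costs roughly the slower of the two steps instead of their sum. If the supplier must never wait at all (the sensor case), a queue in front of the writer is the safer design.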
Asynchronous IO is not better than synchronous IO. Nor vice versa.
The question is which one is better for your use case.
Synchronous IO is generally simpler to code, but asynchronous IO can lead to better throughput and responsiveness at the expense of more complicated code.
I never had any benefit from asynchronous IO just for file access, but some applications may benefit from it.
Applications accessing "slow" IO like the network or a terminal have the most benefit. Using asynchronous IO allows them to do useful work while waiting for IO to complete. This can mean the ability to serve more clients or to keep the application responsive for the user.
(and "slow" just means that the time for an IO operation to finish is unbounded; it may even never finish, e.g. when waiting for a user to press enter or a network client to send a command)
In the end, asynchronous IO doesn't do less work, it's just distributed differently in time to reduce idle waiting.

fopen and fwrite to the same file from multiple threads

This is similar but a bit different to existing questions. Say I have many threads that open the same file but they all do their own fopen and maintain their own FILE pointer.
a) is it necessary to lock fwrite calls if they have their own FILE ptrs?
b) if it is necessary, is locking around fwrite enough or will they potentially flush at different times and end up intermingling when they flush? If yes, would locking on fwrite and then fflush cover it?
This question cannot be answered in the context of programming languages. As far as the programming language is concerned, those file handles are completely independent objects, and whatever you do with one has no effect whatsoever on another.
The question is for the operating system: can it handle multiple write operations to the same underlying file at the same time? In other words, are those writes atomic? I can't say for all of them, but in Linux, for example, writes of at most PIPE_BUF bytes are atomic (strictly speaking, POSIX only promises this for pipes and FIFOs, not for regular files).
As a quick measure, yes, you can put a lock around the I/O part. That will work, I guarantee it. As for flushing the I/O cache, I'd recommend not doing that. It's always best to let the OS handle I/O timing because the kernel knows best what's going on. The data is not going to hit the disk immediately after calling flush anyway, because it's that complicated, just like other flush operations (Java GC, glFlush and so on). If you choose to stick with this option, please be mindful of the start and end points of the concurrent I/O. You wouldn't want a case where the main thread closes the file while a worker thread is still trying to do I/O on it.
The general solution to this problem is to create a thread that handles the file exclusively. If other threads need to read from or write to the file, they must ask that thread to do it for them. This is tricky, I know. You'd need to compose a simple protocol and a sync mechanism, but in a nutshell, it goes like this:
Prepare a queue, a CV (condition variable), and a lock. Create a thread and open the file; it doesn't matter who opens the file.
The thread starts and waits for the queue to be filled.
Other threads send I/O requests to the thread. Each request includes the data for the file and an op code.
The thread handles the requests from the queue. This is where the real I/O happens.
You could use an anonymous FIFO instead of a queue. Or skip the opcode part if the file is write-only.
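The steps above can be sketched roughly like this for the write-only case (so no op code). The names `FileWriter`, `submit` and `demo_line_count` are made up for illustration, and the sketch assumes the worker opens the file itself and that the queue is unbounded.

```cpp
#include <cassert>
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// One thread owns the file; every other thread only enqueues data.
class FileWriter {
public:
    explicit FileWriter(const std::string& path)
        : out_(path, std::ios::binary | std::ios::trunc),
          worker_([this] { run(); }) {}

    ~FileWriter() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains the queue before the file closes
    }

    void submit(std::string data) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(data));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !q_.empty(); });
            if (q_.empty()) return;  // done_ is set and nothing is left
            std::string data = std::move(q_.front());
            q_.pop();
            lock.unlock();  // do the slow I/O without holding the lock
            out_.write(data.data(), static_cast<std::streamsize>(data.size()));
            lock.lock();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool done_ = false;
    std::ofstream out_;
    std::thread worker_;  // declared last so the other members exist first
};

// Four threads submit 100 lines each; returns how many intact lines the
// file contains afterwards.
int demo_line_count(const std::string& path) {
    {
        FileWriter writer(path);
        std::vector<std::thread> threads;
        for (int t = 0; t < 4; ++t)
            threads.emplace_back([&writer, t] {
                for (int i = 0; i < 100; ++i)
                    writer.submit("thread " + std::to_string(t) + "\n");
            });
        for (auto& th : threads) th.join();
    }  // the destructor flushes the queue and closes the file
    std::ifstream in(path);
    int intact = 0;
    for (std::string line; std::getline(in, line);)
        if (line.size() == 8 && line.compare(0, 7, "thread ") == 0) ++intact;
    return intact;
}
```

Because only the worker touches the stream, lines from different threads cannot interleave, and callers hold the lock only long enough to push onto the queue.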
Unlike network I/O, file I/O on modern OSes cannot be done in a non-blocking manner, so expect significant blocking time (I/O wait). Also, there's the problem that the queue can fill up too quickly and eat a lot of memory when I/O is relatively slow. There will be cases where the whole program has to wait for the I/O to complete before terminating. Not much you can do about it. You could close the file from another thread while I/O is in progress on Linux (close() is MT-safe); I don't know how that works on other OSes.
There are alternatives like async file I/O or overlapped I/O, which involve signal handling or callbacks. Using these doesn't require creating a thread, but each has pros and cons, mostly regarding portability.

fastest technique to read a file into memory?

Is there a generally-accepted fastest technique which is used to read a file into memory in c++?
I will only be reading the file.
I have seen boost have an implementation and I have seen a couple other implementations on here but I would like to know what is considered the fastest?
Thank you in advance
In case it matters, I am considering files up to 1GB and this is for windows.
Use memory-mapped files, maybe using the boost wrapper for portability.
If you want to read files bigger than the free, contiguous portion of your virtual address space, you can move the mapped portion of the file at will.
Consider using Memory-Mapped Files for your case, as the files can be up to 1 GB in size.
Memory-mapped files and how they work
And here you can start with win32 API:
MapViewOfFile
There are several other helpful API on MSDN page.
In the event memory-mapped files are not adequate for your application, and file I/O is your bottleneck, using an I/O completion port to handle async I/O on the files will be the fastest you can get on Windows.
I/O completion ports provide an efficient threading model for
processing multiple asynchronous I/O requests on a multiprocessor
system. When a process creates an I/O completion port, the system
creates an associated queue object for requests whose sole purpose is
to service these requests. Processes that handle many concurrent
asynchronous I/O requests can do so more quickly and efficiently by
using I/O completion ports in conjunction with a pre-allocated thread
pool than by creating threads at the time they receive an I/O request.
Generally speaking, mmap it is. But on Windows they have invented their own way of doing this, see "File Mapping". Boost has a Memory-Mapped Files library that wraps both ways under a portable pile of code.
Also, you have to optimize for your use case if you want to be fast. Just mapping file contents into memory is not enough. It is indeed possible that you don't need memory-mapped files and would be better off using async file I/O, for example. There are many solutions for many problems.
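For reference, the mmap side looks roughly like this on a POSIX system. This is a sketch only: the question is about Windows, where the same idea goes through the File Mapping API (MapViewOfFile and friends) or the Boost wrapper, and `MappedFile` and `demo_roundtrip` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <fstream>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only view of a whole file. The kernel pages data in on first access,
// so nothing is copied into a user-space buffer up front.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0) return;
        struct stat st{};
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            std::size_t len = static_cast<std::size_t>(st.st_size);
            void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = len;
            }
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) munmap(const_cast<char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* data() const { return data_; }  // nullptr if mapping failed
    std::size_t size() const { return size_; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};

// Writes content to path, maps the file, and returns what the mapping holds.
std::string demo_roundtrip(const std::string& path, const std::string& content) {
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        out << content;
    }
    MappedFile mapped(path);
    return mapped.data() ? std::string(mapped.data(), mapped.size()) : std::string();
}
```

Whether this beats plain reads depends on the access pattern; the first touch of each page still waits for the disk.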

Multiple threads with locks vs single threads?

I am designing a client and server socket program.
I have a file to be transferred to the server from the client using UDP, I repeat I am using UDP.....
Because I am sending through UDP, the sender is faster than the receiver, so I have created 3 threads listening on the same socket, so that when one thread is doing some work (I mean writing to a file using fwrite) with the received data, another thread can recv from the client.
My 1st question: when I am using fwrite with multiple threads, I have to use locks because the file pointer is shared between the threads. Am I right in thinking so?
My 2nd question is "Will there be any improvement in performance if I use multiple threads to fwrite using locks, compared to using a single thread to do the fwrite work with no locks?" Please guide me.
I would use one thread. Saves the complications. You can buffer the data and use asynchronous writes
http://www.gnu.org/s/hello/manual/libc/Asynchronous-Reads_002fWrites.html
Cache the data before writing it.
Let the writing happen in another thread.
Doing it the way you do will require locking the socket.
Q1: Yes, you do need to lock it (very slow!). Why not use a separate file descriptor in each thread? The problem comes mostly from the current file position managed by that descriptor.
Q2: Neither. If data needs ordering (yes, UDP!) you should still buffer it. RAM is much faster than disk IO. Feed a stream to buffer it and handle the data in that stream in a separate thread.
Similar to Ed's answer, I'd suggest using asynchronous I/O and a single thread for your server. Though I find using Boost.Asio easier than posix AIO.
My 1st question is when I am using a fwrite with multiple threads I have to use locks as the file pointer is shared between the threads
Yes, you always have to use locks when multiple threads are writing to a single object (file, memory, etc).
My 2nd question is "Will there be any improvement in the performance if I use multiple threads to fwrite using locks over using a single thread to do the fwrite work with no locks...??? "
I would use two threads. The first thread does nothing but read from the socket and store the data in memory. The second thread reads data from memory and writes it to the file. Treat the memory buffer as a FIFO queue and use a mutex to protect the queue pointers. You'll gain nothing from a third thread. In fact, it would probably harm performance and it definitely makes the problem far more complicated.
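A rough sketch of that two-thread layout, with the socket and the file replaced by stand-ins so it runs anywhere. `PacketQueue` and `demo_transfer` are made-up names, and the fixed capacity is my own assumption. With a real UDP socket, a receiver blocked on a full queue means the kernel's socket buffer has to absorb the backlog.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Fixed-capacity FIFO shared by the receiving thread and the writing thread.
class PacketQueue {
public:
    explicit PacketQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full.
    void push(std::string packet) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push_back(std::move(packet));
        not_empty_.notify_one();
    }

    // Blocks while the queue is empty; returns false once closed and drained.
    bool pop(std::string& packet) {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        packet = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();
        return true;
    }

    // Tells the consumer that no more packets will arrive.
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        not_empty_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<std::string> q_;
    std::size_t capacity_;
    bool closed_ = false;
};

// Pushes `packets` fake 100-byte datagrams through the queue and returns the
// number of bytes the writer thread consumed.
std::size_t demo_transfer(int packets) {
    PacketQueue queue(8);
    std::size_t bytes = 0;
    std::thread writer([&] {
        std::string p;
        while (queue.pop(p)) bytes += p.size();  // a real writer calls fwrite here
    });
    for (int i = 0; i < packets; ++i)
        queue.push(std::string(100, 'x'));       // a real receiver calls recv here
    queue.close();
    writer.join();
    return bytes;
}
```

Only the writer thread ever touches the file, so fwrite itself needs no lock; the mutex protects nothing but the queue.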
First, try to avoid using UDP for bulk transfers. If you use UDP you have to reinvent your own flow control protocol, as well as logic for retransmission and reordering. From the sounds of it, your problems boil down to missing flow control - so why not just use TCP?
Anyway, don't put your file writing in another thread. Modern OSes will internally buffer disk writes in any case - you'll only start blocking if you're writing data much faster than the disk can keep up, in which case buffering inside your process will only buy you another few seconds at most. Switch to TCP, or implement a proper flow control mechanism.

How to perform Cross-Platform Asynchronous File I/O in C++

I am writing an application that needs to use large audio multi-samples, usually around 50 MB in size. One file contains approximately 80 individual short sound recordings, which can get played back by my application at any time. For this reason all the audio data gets loaded into memory for quick access.
However, when loading one of these files, it can take many seconds to put into memory because I need to read a large amount of data with ifstream, meaning my program GUI is temporarily frozen. I have tried memory mapping my file but this causes huge CPU spikes and a mess of audio every time I need to jump to a different area of the file, which is not acceptable.
So this has led me to think that performing an Asynchronous file read will solve my problem, that is the data gets read in a different process and calls a function on completion. This needs to be both compatible for Mac OS X and Windows and in C++.
EDIT: Don't want to use the Boost library because I want to keep a small code base.
boost has an asio library, which I've not used before (it's not on NASA's list of approved third-party libraries).
My own approach has been to write the file reading code twice, once for Windows, once for the POSIX aio API, and then just pick the right one to link with.
For Windows, use OVERLAPPED (you have to enable it in the CreateFile call, then pass an OVERLAPPED structure when you read). You can either have it set an event on completion (ReadFile) or call a completion callback (ReadFileEx). You'll probably need to change your main event loop to use MsgWaitForMultipleObjectsEx so you can either wait for the I/O events or allow callbacks to run, in addition to receiving WM_ window messages. MSDN has documentation for these functions.
For Linux, there's either fadvise and epoll, which will use the readahead cache, or aio_read which will allow actual async read requests. You'll get a signal when the request completes, which you should use to post an XWindows message and wake up your event processing loop.
Both are a little different in the details, but the net effect is the same -- you request a read which completes in the background, then your event dispatch loop gets woken up when the I/O finishes.
The Boost.Asio library has only a limited implementation of asynchronous file I/O operations (just a Windows wrapper for HANDLE), so it is not suitable for you. See this question also.
You could easily implement your own asynchronous reading using standard streams and Boost.Thread library (or platform specific threads support).
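A minimal sketch of that idea using only the standard library (std::async instead of Boost.Thread, so no extra dependency). The names `read_file_async`, `is_ready` and `demo_load` are made up for illustration, and the loop in `demo_load` stands in for a GUI event loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <future>
#include <iterator>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Reads the whole file on a background thread and hands back a future.
std::future<std::vector<char>> read_file_async(std::string path) {
    return std::async(std::launch::async, [path = std::move(path)] {
        std::ifstream in(path, std::ios::binary);
        return std::vector<char>(std::istreambuf_iterator<char>(in),
                                 std::istreambuf_iterator<char>());
    });
}

// True once get() can be called without blocking.
bool is_ready(const std::future<std::vector<char>>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Writes n bytes to path, loads them back asynchronously, and returns the
// number of bytes that arrived.
std::size_t demo_load(const std::string& path, std::size_t n) {
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        out << std::string(n, 'a');
    }
    auto pending = read_file_async(path);
    while (!is_ready(pending)) {
        std::this_thread::yield();  // a GUI would process window events here
    }
    return pending.get().size();
}
```

One thing to keep in mind: a future returned by std::async blocks in its destructor until the task finishes, so keep it alive somewhere (for example as a member of the loader object) instead of letting it go out of scope in the GUI thread.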