Background:
I'm developing the new SparkleDB NoSQL database. The database is ACID and has its own disk space manager (DSM) for all of its database file storage access. The DSM allows multiple threads to perform concurrent I/O operations on the same physical file, i.e. asynchronous (overlapped) I/O. We disable disk caching and write pages directly to the disk, as this is required for an ACID database.
My question is:
Is there a performance gain from arranging contiguous disk page writes from many threads before sending the I/O request to the underlying OS I/O subsystem (thus merging the data to be written when the ranges are contiguous), or does the I/O subsystem do this for you? My question applies to UNIX, Linux, and Windows.
Example (all of this happens within a span of 100 ms):
Thread #1: Write 4k to physical file address 4096
Thread #2: Write 4k to physical file address 0
Thread #3: Write 4k to physical file address 8192
Thread #4: Write 4k to physical file address 409600
Thread #5: Write 4k to physical file address 413696
Using this information, the DSM arranges a single 12 KB write operation to physical file address 0 and a single 8 KB write operation to physical file address 409600.
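The merging step described above can be sketched as a sort-and-coalesce pass over the queued page writes. This is a minimal illustration, not any real DSM's code; names like `PendingWrite` and `coalesce` are invented for the example:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One pending page write queued by a thread.
struct PendingWrite {
    uint64_t offset;           // physical file address
    std::vector<char> data;    // page payload
};

// Merge writes whose byte ranges are adjacent into single larger writes,
// so fewer (bigger) I/O requests reach the OS I/O subsystem.
std::vector<PendingWrite> coalesce(std::vector<PendingWrite> writes) {
    std::sort(writes.begin(), writes.end(),
              [](const PendingWrite& a, const PendingWrite& b) {
                  return a.offset < b.offset;
              });
    std::vector<PendingWrite> merged;
    for (auto& w : writes) {
        if (!merged.empty() &&
            merged.back().offset + merged.back().data.size() == w.offset) {
            // Contiguous with the previous range: append the payload.
            merged.back().data.insert(merged.back().data.end(),
                                      w.data.begin(), w.data.end());
        } else {
            merged.push_back(std::move(w));
        }
    }
    return merged;
}
```

Feeding it the five 4 KB writes from the example above yields exactly the two merged requests described: 12 KB at offset 0 and 8 KB at offset 409600.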
Update:
The DSM does all the physical file address positioning itself: by providing an OVERLAPPED structure on Windows, via io_prep_pwrite on Linux AIO, and via aiocb's aio_offset on POSIX AIO.
The most efficient way to use a hard drive is to keep writing as much data as you can while the platters are spinning. This means reducing the number of writes and increasing the amount of data per write. If you can do that, then having a disk area of contiguous sectors will help.
For each write, the OS needs to translate the write to your file into logical or physical coordinates on the drive. This may involve reading the directory, searching for your file and locating the mapping of your file within the directory.
After the OS determines the location, it sends data across the interface to the hard drive. Your data may be cached along the way many times until it is placed onto the platters. An efficient write will use the block sizes of the caches and data interfaces.
Now the questions are: 1) How much time does this save? and 2) Is the saving significant? For example, if all this work saves you one second, that second may be lost again waiting for a response from the user.
Many programs, OSes, and drivers postpone writes to a hard drive until non-critical or non-peak periods. For example, while you are waiting for user input, you could be writing to the hard drive. Postponing writes this way may take less effort than optimizing the disk writes, and may have a more significant impact on your application.
BTW, this has nothing to do with C++.
Related
I have a hundred million files, and my program reads all of them at every startup. I have been looking for ways to make this process faster, and along the way I've encountered something strange. My CPU has 4 physical cores, but reading this many files with much higher thread counts yields much better results. That is interesting, given that spawning more threads than the CPU's logical core count should be somewhat pointless.
8 Threads: 29.858 s
16 Threads: 15.882 s
32 Threads: 9.989 s
64 Threads: 7.965 s
128 Threads: 8.275 s
256 Threads: 8.159 s
512 Threads: 8.098 s
1024 Threads: 8.253 s
4096 Threads: 8.744 s
16001 Threads: 10.033 s
Why might this occur? Is it some disk bottleneck?
I did my homework and profiled the code; literally 95% of the runtime consists of read(), open() and close().
I am reading the first 4096 bytes of every file (my pagesize)
Ubuntu 18.04
Intel i7 6700HQ
Samsung 970 Evo Plus NVMe SSD
GCC/G++ 11
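A minimal sketch of this kind of benchmark (a reconstruction under stated assumptions, not the original program): each of `nthreads` workers pulls the next path from a shared atomic counter and blocks in open()/read(), so up to `nthreads` requests can be pending in the kernel at once:

```cpp
#include <atomic>
#include <fcntl.h>
#include <string>
#include <thread>
#include <unistd.h>
#include <vector>

// Read the first 4096 bytes of every file in `paths` using `nthreads`
// worker threads. Returns the number of files successfully read.
size_t read_headers(const std::vector<std::string>& paths,
                    unsigned nthreads) {
    std::atomic<size_t> next{0};   // index of the next path to claim
    std::atomic<size_t> ok{0};     // successfully read files
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&] {
            char buf[4096];
            for (size_t i; (i = next.fetch_add(1)) < paths.size(); ) {
                int fd = open(paths[i].c_str(), O_RDONLY);
                if (fd < 0) continue;
                if (read(fd, buf, sizeof buf) >= 0) ++ok;
                close(fd);
            }
        });
    }
    for (auto& w : workers) w.join();
    return ok;
}
```

With a structure like this, raising `nthreads` raises the number of simultaneously blocked open()/read() calls, which matches the timing pattern in the table above.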
Why might this occur?
If you open one file at "/a/b/c/d/e" then read one block of data from the file; the OS may have to fetch directory info for "/a", then fetch directory info for "/a/b", then fetch directory info for "/a/b/c", then... It might add up to a total of 6 blocks fetched from disk (5 blocks of directory info then one block of file data), and those blocks might be scattered all over the disk.
If you open 100 million files and read one block of file data from each, this might involve fetching 600 million things (500 million pieces of directory info and 100 million pieces of file data).
What is the optimal order to do these 600 million things?
Often there are directory-info caches and file-data caches involved (and all requests that can be satisfied by data that's already cached should be done ASAP, before that data is evicted from the cache(s) to make room for other data). Often the disk hardware also has rules (e.g. it is faster to access all blocks within the same "group of disk blocks" before switching to the next group). Sometimes there's parallelism in the disk hardware (e.g. two requests in the same zone can't be done in parallel, but two requests in different zones can).
The optimal order to do these 600 million things is something the OS can figure out.
More specifically: the optimal order to do these 600 million things is something the OS can figure out, if and only if the OS actually knows about all of them.
If you have (e.g.) 8 threads that each send one request (e.g. to open a file) and then block (using no CPU time) until the pending request completes, then the OS will only know about at most 8 requests at a time. In other words, the operating system's ability to optimize the order in which file I/O requests are performed is constrained by the number of pending requests, which is constrained by the number of threads you have.
Ideally, a single thread would be able to ask the OS "open all the files in this list of a hundred million files" so that the OS could fully optimize the order (with the least thread management overhead). Sadly, most operating systems don't support anything like this (e.g. POSIX asynchronous I/O fails to support any kind of "asynchronous open").
Having a large number of threads (all blocked and using no CPU time while they wait for their requests to actually be done by the file system and/or disk driver) is the only way to improve the operating system's ability to optimize the order of I/O requests.
I am trying to build a tool similar to Task Manager. I was able to get the CPU and memory usage of each process, but I couldn't figure out the disk statistics. I was able to get the I/O read and write bytes, but they include all file, disk, and network I/O. How can I get only the disk usage of each process? Alternatively, is it possible to separate the disk statistics from those I/O byte counts? If so, how?
I am receiving a large quantity of data at a fixed rate. I need to do some processing on this data on a different thread, but that thread may run slower than the data is coming in, so I need to buffer the data. Given the quantity of data coming in, the available RAM would be quickly exhausted, so the buffer needs to overflow onto the hard disk. What I need is something like a filesystem-backed pipe, where the writer can be blocked by the filesystem, but not by the reader running too slowly.
Here's a rough set of requirements:
Writing should not be blocked by the reader running too slowly.
If data is read slow enough that the available RAM is exhausted it should overflow to the filesystem. It's ok for writes to the disk to block.
Reading should block if no data is available unless the stream has been closed by the writer.
If the reader is able to keep up with the data then it should never hit the hard disk as the RAM buffer would be sufficient (nice but not essential).
Disk space should be recovered as the data is consumed (or soon after).
Does such a mechanism exist in Windows?
This looks like a classic message queue. Have you considered MSMQ or similar? MSMQ has all the properties you are asking for. You may want to use direct addressing to avoid Active Directory (http://msdn.microsoft.com/en-us/library/ms700996(v=vs.85).aspx) and use a local or TCP/IP queue address.
Use an actual file. Write to the file as the data is received, and in another process read the data from the file and process it.
You even get the added benefit of avoiding multithreading.
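A minimal sketch of such a buffer in portable C++ (all names are illustrative; this is not a built-in Windows mechanism): chunks live in RAM up to a byte cap and overflow to a temporary spill file, so the writer is only ever blocked by the disk, never by the reader. Spill space here is not reclaimed until the queue is destroyed; a real implementation would truncate or rotate the file:

```cpp
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <vector>

// Single-producer / single-consumer FIFO with disk overflow.
class SpillQueue {
    struct Chunk { bool on_disk; long offset; std::vector<char> data; };
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Chunk> q_;          // FIFO order of all chunks
    std::FILE* spill_;             // temp file for overflowed chunks
    size_t ram_bytes_ = 0, ram_cap_;
    bool closed_ = false;

public:
    explicit SpillQueue(size_t ram_cap)
        : spill_(std::tmpfile()), ram_cap_(ram_cap) {}
    ~SpillQueue() { if (spill_) std::fclose(spill_); }

    void push(std::vector<char> data) {        // never blocks on the reader
        std::lock_guard<std::mutex> lk(m_);
        if (ram_bytes_ + data.size() > ram_cap_) {
            std::fseek(spill_, 0, SEEK_END);   // may block on disk: allowed
            long off = std::ftell(spill_);
            std::fwrite(data.data(), 1, data.size(), spill_);
            q_.push_back({true, off, std::vector<char>(data.size())});
        } else {
            ram_bytes_ += data.size();
            q_.push_back({false, 0, std::move(data)});
        }
        cv_.notify_one();
    }

    void close() {                             // writer signals end of stream
        std::lock_guard<std::mutex> lk(m_);
        closed_ = true;
        cv_.notify_all();
    }

    // Blocks until a chunk is available; returns false once the queue is
    // drained and the writer has closed it.
    bool pop(std::vector<char>& out) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        Chunk c = std::move(q_.front());
        q_.pop_front();
        if (c.on_disk) {
            std::fseek(spill_, c.offset, SEEK_SET);
            std::fread(c.data.data(), 1, c.data.size(), spill_);
        } else {
            ram_bytes_ -= c.data.size();
        }
        out = std::move(c.data);
        return true;
    }
};
```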
I have written a program in C/C++ which needs to fetch data from the disk. After some time, the operating system ends up storing some of the data in its caches. Is there a way to figure out, in a C/C++ program, whether the data was retrieved from the cache or from the disk?
A simple solution is to time the read operation, since disk reads are significantly slower. You can read a group of file blocks (4 KB) twice to get an estimate.
The problem is that if you run the program again or copy the file in a shell, the OS will cache it.
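The timing approach can be sketched as follows. The exact cached/uncached ratio is system dependent, so treat the measured numbers as a heuristic rather than a guarantee; the function name is illustrative:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Time one read of `nbytes` from the start of `path`, in microseconds.
// Returns a negative value if the file cannot be opened. A second call
// right after the first is typically much faster, because the OS page
// cache then holds the data.
double time_read_us(const char* path, size_t nbytes) {
    std::vector<char> buf(nbytes);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1.0;
    auto t0 = std::chrono::steady_clock::now();
    std::fread(buf.data(), 1, nbytes, f);
    auto t1 = std::chrono::steady_clock::now();
    std::fclose(f);
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}
```

Calling it twice on the same 4 KB region and comparing the two durations gives the estimate described above: if the second read is dramatically faster, the first one likely hit the disk.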
I'm working on an ACID database software product and I have some questions about file durability on Windows.
CreateFile has two flags, FILE_FLAG_WRITE_THROUGH and FILE_FLAG_NO_BUFFERING - do I need both of these to achieve file durability (i.e. to override all kinds of disk or OS file caching)? I'm asking because they seem to do the same thing, and setting FILE_FLAG_NO_BUFFERING causes WriteFile to fail with ERROR_INVALID_PARAMETER.
FILE_FLAG_NO_BUFFERING specifies no caching at all: no read cache and no write cache; all data goes directly between your application and the disk. This is mostly useful if you read such large chunks that caching is useless, or if you do your own caching. Note WhozCraig's comment on properly aligning your data when using this flag.
FILE_FLAG_WRITE_THROUGH only means that writes should be written directly to disk before the function returns. This is enough to achieve ACID, while still giving the OS the option to cache data from the file.
Using FlushFileBuffers() can provide a more efficient approach to achieving ACID, because you can do several writes to a file and then flush them in one go. Combining writes into one flush is very important, because non-cached writes will limit you to the spindle speed of your hard drive: at most 120 non-cached writes or flushes per second for a 7200 RPM disk.