Synchronized write operation in C/C++

I am working on a smart camera that runs Linux. I capture images from the camera streaming software and write the individual JPEG images to an SD card attached to the camera. For writing the images, I use the C functions fopen and fwrite. To synchronize the disk write operation, I call fflush(pointer) to flush the buffers and write the data to the SD card. But it seems to have no effect: the write operation goes through system memory, and free memory decreases after every write. I also used the low-level open and write functions in conjunction with fsync(filedesc), but that has no effect either.
The buffers are only flushed when I unmount the SD card, and only then is the memory freed. How can I disable this caching, or force the data to be written to the SD card immediately instead of sitting in system memory?

sync(2) is probably your best bet:
SYNC(2) Linux Programmer's Manual SYNC(2)
NAME
sync - commit buffer cache to disk
SYNOPSIS
#include <unistd.h>
void sync(void);
DESCRIPTION
sync() first commits inodes to buffers, and then buffers to disk.
BUGS
According to the standard specification (e.g., POSIX.1-2001), sync()
schedules the writes, but may return before the actual writing is done.
However, since version 1.3.20 Linux does actually wait. (This still
does not guarantee data integrity: modern disks have large caches.)

You can set the O_SYNC flag if you open the file using open(), or use sync() as suggested above.
With fopen(), you can use fsync() on fileno(f), or use fileno() together with fcntl() to set flags on the underlying descriptor.
For more details see this very similar post: How can you flush a write using a file descriptor?
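A minimal sketch of the approach above (opening with O_SYNC, or falling back to an explicit fsync()); the path and image buffer are placeholders:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* O_SYNC: each write() returns only after the data (and the metadata
     * needed to retrieve it) has been handed to the storage device. */
    int fd = open("/mnt/sdcard/image0001.jpg",      /* hypothetical path */
                  O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char jpeg_data[] = "...";                 /* placeholder for image bytes */
    if (write(fd, jpeg_data, sizeof jpeg_data) < 0)
        perror("write");

    /* Alternatively, open without O_SYNC and force the flush explicitly: */
    if (fsync(fd) < 0)
        perror("fsync");

    close(fd);
    return 0;
}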

Check out fsync(2) when working with specific files.

There may be nothing that you can really do. Many file systems are heavily cached in memory so a write to a file may not immediately be written to disk. The only way to guarantee a write in this scenario is to actually unmount the drive.
When mounting the disk, you might want to specify the sync option (either using the -o flag to mount, or in your fstab line). This will ensure that at least your writes are performed synchronously. This is what you should always use for removable media.

Just because it's still taking up memory doesn't mean it hasn't also been written out to storage - a clean (identical to the copy on physical storage) copy of the data will stay in the page cache until that memory is needed for something else, in case an application later reads that data back.
Note that fflush() doesn't ensure the data has been written to storage - if you are using stdio, you must first use fflush(f), then fsync(fileno(f)).
If you know that you will not need to read that data again in the foreseeable future (as seems likely in this case), you can use posix_fadvise() with the POSIX_FADV_DONTNEED flag before closing the file.
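A short sketch of the sequence described in this answer, assuming the images are written through stdio; the helper name is made up:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Flush a stdio stream all the way to the device, then tell the kernel the
 * cached pages will not be needed again so they can be reclaimed. */
int write_out_and_drop(FILE *f)
{
    int fd = fileno(f);

    if (fflush(f) != 0)        /* stdio buffer -> kernel page cache */
        return -1;
    if (fsync(fd) != 0)        /* kernel page cache -> SD card      */
        return -1;

    /* Pages are now clean, so the kernel can drop them immediately. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    return 0;
}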

Related

Does fwrite block until data has been written to disk?

Does the fwrite() function return after the data to be written to disk has been handed over to the operating system or does it return only after the data is actually physically written to the disk?
For my case, I'm hoping that it's the first case since I don't want to wait until all the data is physically written to the disk. I'm hoping that another OS thread transfers it in the background.
I'm curious about behavior on Windows 10 in this particular case.
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fwrite
There are several places where data is buffered to improve efficiency when using fwrite(): buffering within the C runtime, buffering in the operating system's file system layer, and buffering within the actual disk hardware.
The default for these is to delay the actual physical write to disk until there is an explicit request to flush the buffers, or until the appropriate flags are set to perform physical writes as the write requests are made.
If you want to change the behavior of fwrite(), take a look at the setbuf() function (setbuf redirection), the setbuf() Linux man page, and the Microsoft documentation on setbuf().
And if you look at the documentation for the underlying Windows CreateFile() function, you will see a number of flags that control whether data buffering is done.
FILE_FLAG_NO_BUFFERING 0x20000000
The file or device is being opened with no system caching for data
reads and writes. This flag does not affect hard disk caching or
memory mapped files.
There are strict requirements for successfully working with files
opened with CreateFile using the FILE_FLAG_NO_BUFFERING flag, for
details see File Buffering.
And see the Microsoft documentation topic File Buffering.
In a simple example, the application would open a file for write
access with the FILE_FLAG_NO_BUFFERING flag and then perform a call to
the WriteFile function using a data buffer defined within the
application. This local buffer is, in these circumstances, effectively
the only file buffer that exists for this operation. Because of
physical disk layout, file system storage layout, and system-level
file pointer position tracking, this write operation will fail unless
the locally-defined data buffers meet certain alignment criteria,
discussed in the following section.
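As a hedged illustration of the alignment requirement quoted above, here is a sketch of opening a file with FILE_FLAG_NO_BUFFERING and writing one sector-aligned block. The 4096-byte sector size and the path are assumptions; real code should query the actual sector size.

#include <windows.h>
#include <malloc.h>     /* _aligned_malloc */
#include <stdio.h>
#include <string.h>

int main(void)
{
    HANDLE h = CreateFileA("C:\\data\\out.bin",     /* hypothetical path */
                           GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                           NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFile failed\n");
        return 1;
    }

    /* Both the buffer address and the write size must be multiples of the
     * sector size when FILE_FLAG_NO_BUFFERING is used. */
    void *buf = _aligned_malloc(4096, 4096);
    memset(buf, 0xAB, 4096);

    DWORD written = 0;
    if (!WriteFile(h, buf, 4096, &written, NULL))
        fprintf(stderr, "WriteFile failed\n");

    _aligned_free(buf);
    CloseHandle(h);
    return 0;
}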
Take a look at this discussion about OS-level settings on Linux: https://superuser.com/questions/479379/how-long-can-file-system-writes-be-cached-with-ext4
Does the fwrite(fp,...) function return after the data to be written to disk has been handed over to the operating system or does it return only after the data is actually physically written to the disk?
No. In fact, it does not even (necessarily) wait until data has been handed over to the OS -- fwrite may just put the data in its internal buffer and return immediately without actually writing anything.
To force data to the OS, you need to use fflush(fp) on the FILE pointer, but that still does not necessarily write data to the disk, though it will generally queue it for writing. But it does not wait for those queued writes to finish.
So to guarantee the data is written to disk, you need an OS-level call that waits until the queued writes complete. On POSIX systems (such as Linux), that is fsync(fileno(fp)). You'll need to study the Windows documentation to figure out how to do the equivalent on Windows.
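A hedged sketch of that sequence, with the Windows branch using the CRT's _commit() as one possible equivalent (verify against the Microsoft documentation):

#include <stdio.h>
#ifdef _WIN32
#include <io.h>         /* _commit, _fileno */
#else
#include <unistd.h>     /* fsync */
#endif

/* Push data from the stdio buffer to the OS, then wait for the OS to push
 * it to the disk. Returns 0 on success. */
int flush_to_disk(FILE *fp)
{
    if (fflush(fp) != 0)                /* stdio buffer -> operating system */
        return -1;
#ifdef _WIN32
    return _commit(_fileno(fp));        /* wait for queued writes to finish */
#else
    return fsync(fileno(fp));
#endif
}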

RAM consumption on opening a file

I have a Binary file of ~400MB which I want to convert to CSV format. The output CSV file will be ~1GB (according to my calculations).
I read the binary file and store it in an array of structures (required for other processing too), and when the user wants to export it to CSV, I am creating a file (or opening an existing file - depending on the user's choice), opening it using fopen and then writing to it using fwrite, line by line.
Coming to my question, this link from CPlusPlus.com says:
The returned stream is fully buffered by default if it is known to not
refer to an interactive device
My query is: when I open this file, will it be loaded into RAM? That is, when my file ends up at ~1 GB, will it consume that much RAM, or will it just be on the hard disk?
This code will run on Windows as well as Android.
FILE* stream buffering is a C feature and is used to reduce system call overhead (i.e., not calling read() for every fgetc(), which would be expensive). Usually the buffer is small, e.g. 512 bytes.
The page cache and similar mechanisms are a different beast: they are used to reduce the number of disk operations. The operating system usually uses free memory to cache data previously read from or written to disk, so subsequent operations can be served from RAM.
If there is a shortage of free memory, data is evicted from the page cache.
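To make the first point concrete, here is a small sketch showing how the stdio buffer can be replaced with setvbuf(); the file name and buffer size are arbitrary examples:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("out.csv", "w");    /* hypothetical output file */
    if (!f) { perror("fopen"); return 1; }

    /* Replace the default BUFSIZ-sized buffer with a 64 KB fully buffered
     * one. This only changes user-space buffering, not the page cache. */
    static char buf[64 * 1024];
    setvbuf(f, buf, _IOFBF, sizeof buf);

    fprintf(f, "col1,col2\n1,2\n");
    fclose(f);                          /* flushes and releases the buffer */
    return 0;
}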
It is specific to the operating system, the file system, and the machine. And it might not matter that much. Read about the page cache.
BTW, you might be interested in SQLite.
From an application writer point of view, you should care more about virtual memory and address space of your process than about RAM. Physical RAM is managed by the operating system.
On Linux and Android, if you want to optimize that you might consider (later) using posix_fadvise(2) and perhaps madvise(2). I'm not sure it is worth the pain in your case (since a gigabyte file is not that much today).
I read the binary file and store it in an array of structures (required for other processing too), and when the user wants to export it to CSV
Reading per se doesn't use a lot of memory; as myaut says, the buffer is small. The elephant in the room here is: do you read the whole file and put all the data into structures, or do you start processing after one or a few reads, with just the minimum amount of data needed? Doing the former will indeed use ~400 MB or more of memory; doing the latter will use quite a lot less. That said, it all depends on how much data is needed to start processing, and maybe you do need all the data loaded at once.

Flushing only file metadata

We're developing a new ACID database system that focuses more on data integrity than throughput. Its storage engine accesses secondary storage devices directly with flags like O_DIRECT or FILE_FLAG_WRITE_THROUGH & FILE_FLAG_NO_BUFFERING.
In some cases we only change file metadata, using kernel functions like fallocate() or SetFileValidData(). In those cases I would like to flush only the metadata, not all pending file I/O, to improve performance, since the call blocks until the device reports that the transfer has completed. Even when no file buffering is in use, that only applies to application data; the file system may still cache file metadata.
So far I've found that fsync() or FlushFileBuffers() flushes metadata, but unfortunately it also flushes all pending I/O. Does anyone know of a way to flush only the file metadata? This problem applies to Linux, UNIX, and Windows.
I am a newbie to file systems, but if you go through the implementation of any physical FS (ext4/ext3/etc.), you will see that they do not expose such functionality to the upper layer. Internally, however, the fsync() implementation only updates the file's metadata, and the remaining work is delegated to generic_block_fdatasync().
You might want to write a hack for your requirement of flushing only metadata.
Anyone know of a way of only flushing the file metadata?
No. Based on my understanding, there is no such interface/API provided by any operating system. There are two interfaces provided by the file system through which an application (user-mode) program can control when data gets written to disk:
fsync(): ensures that all dirty data associated with the file mapped by the file descriptor fd is written back to disk. This call writes back both data and metadata.
fdatasync(): does the same thing as fsync(), except that it only flushes the data.
This means there is a way to do the opposite of what this question asks. Reading your question, it appears that you want this for both optimal performance and data consistency. In my understanding, we should not worry too much about execution performance, as modern file systems implement delayed writes and various other mechanisms to avoid unnecessary disk writes.
The main cost here is switching between user mode and kernel mode, which is expensive compared to almost anything else. That may be why kernel developers have not provided an interface that updates only the metadata of a particular file. It could also be a limitation of the file system; either way, I suspect there is little to be gained here.
For complete information on the internal algorithms, you may want to refer to the classic book "The Design of the UNIX Operating System" by Maurice J. Bach, which describes these concepts and their implementation in detail.
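A minimal sketch contrasting the two calls described in this answer; the path and payload are placeholders:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/example.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char buf[] = "some payload\n";
    if (write(fd, buf, strlen(buf)) < 0) { perror("write"); return 1; }

    /* fdatasync(): flushes the file data, plus only the metadata needed to
     * read that data back (such as the file size). */
    if (fdatasync(fd) < 0) perror("fdatasync");

    /* fsync(): flushes the data and all metadata (timestamps, etc.). */
    if (fsync(fd) < 0) perror("fsync");

    close(fd);
    return 0;
}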

Speeding up file I/O: mmap() vs. read()

I have a Linux application that reads 150-200 files (4-10GB) in parallel. Each file is read in turn in small, variably sized blocks, typically less than 2K each.
I currently need to maintain over 200 MB/s read rate combined from the set of files. The disks handle this just fine. There is a projected requirement of over 1 GB/s (which is out of the disk's reach at the moment).
We have implemented two different read systems; both make heavy use of posix_fadvise. The first is an mmap-based read in which we map the entirety of the data set and read on demand.
The second is a read()/seek() based system.
Both work well, but only for moderate cases. The read() method manages our overall file cache much better and can deal with hundreds of GB of files, but it is badly rate limited; mmap is able to pre-cache data, making a sustained data rate of over 200 MB/s easy to maintain, but it cannot deal with large total data set sizes.
So my question comes to these:
A: Can read()-type file I/O be further optimized beyond the posix_fadvise calls on Linux, or, having tuned the disk scheduler, VMM, and posix_fadvise calls, is that as good as we can expect?
B: Are there systematic ways for mmap to better deal with very large mapped data?
Mmap-vs-reading-blocks is a similar problem to the one I am working on, and it provided a good starting point, along with the discussions in mmap-vs-read.
Reads back to what? What is the final destination of this data?
Since it sounds like you are completely IO bound, mmap and read should make no difference. The interesting part is in how you get the data to your receiver.
Assuming you're putting this data into a pipe, I recommend you just dump the contents of each file in its entirety into the pipe. To do this with zero copies, try the splice system call (a sketch follows the code below). You might also try copying the file manually, or forking an instance of cat or some other tool that can buffer heavily, with the current file as stdin and the pipe as stdout.
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* fork, dup2, execlp */

pid_t pid = fork();
if (pid > 0) {
    waitpid(pid, NULL, 0);               /* parent: wait for cat to finish */
} else if (pid == 0) {
    dup2(dest, STDOUT_FILENO);           /* pipe write end becomes stdout  */
    dup2(source, STDIN_FILENO);          /* current file becomes stdin     */
    execlp("cat", "cat", (char *)NULL);
    _exit(127);                          /* only reached if exec fails     */
}
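And here is a hedged sketch of the splice() alternative mentioned above; it moves data from a file descriptor into a pipe without copying it through user space (one end must be a pipe, and _GNU_SOURCE is required):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy an entire file into pipe_fd using splice(); both descriptors are
 * assumed to be opened by the caller. Returns 0 on success. */
static int splice_file_to_pipe(int file_fd, int pipe_fd)
{
    for (;;) {
        ssize_t n = splice(file_fd, NULL, pipe_fd, NULL,
                           64 * 1024, SPLICE_F_MORE);
        if (n == 0)                     /* end of file */
            return 0;
        if (n < 0) {
            perror("splice");
            return -1;
        }
    }
}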
Update0
If your processing is file-agnostic, and doesn't require random access, you want to create a pipeline using the options outlined above. Your processing step should accept data from stdin, or a pipe.
To answer your more specific questions:
A: Can read()-type file I/O be further optimized beyond the posix_fadvise calls on Linux, or, having tuned the disk scheduler, VMM, and posix_fadvise calls, is that as good as we can expect?
That's as good as it gets with regard to telling the kernel what to do from userspace. The rest is up to you: buffering, threading, etc., but it's dangerous and probably unproductive guesswork. I'd just go with splicing the files into a pipe.
B: Are there systematic ways for mmap to better deal with very large mapped data?
Yes. The following options may give you awesome performance benefits (and may make mmap worth using over read, with testing):
MAP_HUGETLB
Allocate the mapping using "huge pages."
This will reduce the paging overhead in the kernel, which is great if you will be mapping gigabyte sized files.
MAP_NORESERVE
Do not reserve swap space for this mapping. When swap space is reserved, one has the guarantee that it is possible to modify the mapping. When swap space is not reserved one might get SIGSEGV upon a write if no physical memory is available.
This will prevent you from running out of memory while keeping your implementation simple, if you don't actually have enough physical memory + swap for the entire mapping.
MAP_POPULATE
Populate (prefault) page tables for a mapping. For a file mapping, this causes read-ahead on the file. Later accesses to the mapping will not be blocked by page faults.
This may give you speed-ups with sufficient hardware resources, and if the prefetching is ordered and lazy. I suspect this flag is redundant; the VFS likely does this better by default.
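A hedged sketch of a read-only mapping using the flags above. MAP_HUGETLB is left out because, for regular files, it generally requires hugetlbfs; the path is a placeholder:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/data/bigfile.bin", O_RDONLY);   /* hypothetical path */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* MAP_NORESERVE: no swap reservation; MAP_POPULATE: prefault/read ahead. */
    void *p = mmap(NULL, st.st_size, PROT_READ,
                   MAP_PRIVATE | MAP_NORESERVE | MAP_POPULATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Optionally tell the kernel the access pattern is sequential. */
    madvise(p, st.st_size, MADV_SEQUENTIAL);

    /* ... read from p ... */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}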
Perhaps using the readahead system call might help, if your program can predict in advance the file fragments it wants to read (but this is only a guess, I could be wrong).
And I think you should tune your application, and perhaps even your algorithms, to read data in chunks much bigger than a few kilobytes. Can't that be half a megabyte instead?
The problem here doesn't seem to be which API is used. It doesn't matter whether you use mmap() or read(); the disk still has to seek to the specified point and read the data (although the OS does help to optimize the access).
mmap() has advantages over read() if you read very small chunks (a couple of bytes), because you don't have to call the OS for every chunk, which becomes very slow.
Like Basile, I would also advise reading more than 2 KB consecutively so the disk doesn't have to seek that often.

How best to manage Linux's buffering behavior when writing a high-bandwidth data stream?

My problem is this: I have a C/C++ app that runs under Linux, and this app receives a constant-rate, high-bandwidth (~27 MB/sec) stream of data that it needs to stream to a file (or files). The computer it runs on is a quad-core 2 GHz Xeon running Linux. The filesystem is ext4, and the disk is a solid-state E-SATA drive which should be plenty fast for this purpose.
The problem is Linux's too-clever buffering behavior. Specifically, instead of writing the data to disk immediately, or soon after I call write(), Linux will store the "written" data in RAM, and then at some later time (I suspect when the 2GB of RAM starts to get full) it will suddenly try to write out several hundred megabytes of cached data to the disk, all at once. The problem is that this cache-flush is large, and holds off the data-acquisition code for a significant period of time, causing some of the current incoming data to be lost.
My question is: is there any reasonable way to "tune" Linux's caching behavior, so that either it doesn't cache the outgoing data at all, or if it must cache, it caches only a smaller amount at a time, thus smoothing out the bandwidth usage of the drive and improving the performance of the code?
I'm aware of O_DIRECT, and will use it if I have to, but it places some behavioral restrictions on the program (e.g., buffers must be aligned to, and a multiple of, the disk sector size, etc.) that I'd rather avoid if I can.
You can use the posix_fadvise() with the POSIX_FADV_DONTNEED advice (possibly combined with calls to fdatasync()) to make the system flush the data and evict it from the cache.
See this article for a practical example.
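As a hedged sketch of that technique, this writes the stream in chunks and periodically forces the dirty pages out with fdatasync() before dropping them with POSIX_FADV_DONTNEED; the flush window is an arbitrary example and short writes are not handled:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define FLUSH_WINDOW (8u * 1024 * 1024)     /* e.g. flush every 8 MB */

/* Append one chunk of the incoming stream and, every FLUSH_WINDOW bytes,
 * push the dirty pages to disk and evict them from the page cache. */
void stream_chunk(int fd, const void *buf, size_t len, size_t *since_flush)
{
    if (write(fd, buf, len) < 0) { perror("write"); return; }
    *since_flush += len;

    if (*since_flush >= FLUSH_WINDOW) {
        fdatasync(fd);                                 /* write dirty pages */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);  /* then evict them   */
        *since_flush = 0;
    }
}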
If you have latency requirements that the OS cache can't meet on its own (the default IO scheduler is usually optimized for bandwidth, not latency), you are probably going to have to manage your own memory buffering. Are you writing out the incoming data immediately? If you are, I'd suggest dropping that architecture and going with something like a ring buffer, where one thread (or multiplexed I/O handler) is writing from one side of the buffer while the reads are being copied into the other side.
At some size, this will be large enough to handle the latency required by a pessimal OS cache flush. Or not, in which case you're actually bandwidth limited and no amount of software tuning will help you until you get faster storage.
You can adjust the page cache settings in /proc/sys/vm (see /proc/sys/vm/dirty_ratio and /proc/sys/vm/swappiness specifically) to tune the page cache to your liking.
If we are talking about std::fstream (or any C++ stream object)
You can specify your own buffer using:
streambuf* ios::rdbuf(streambuf* streambuffer);
By defining your own buffer you can customize the behavior of the stream.
Alternatively you can always flush the buffer manually at pre-set intervals.
Note: there is a reason for having a buffer. It is quicker than writing to disk directly (every 10 bytes). There is very little reason to write to disk in chunks smaller than the disk block size. If you write too frequently, the disk controller will become your bottleneck.
But I take issue with the write process using the same thread and thereby needing to block the read process.
While the data is being written, there is no reason why another thread cannot continue to read data from your stream (you may need some fancy footwork to make sure they are reading/writing to different areas of the buffer). I don't see any real potential issue with this, as the I/O system will go off and do its work asynchronously (potentially stalling your write thread, depending on your use of the I/O system, but not necessarily your application).
I know this question is old, but we know a few things now we didn't know when this question was first asked.
Part of the problem is that the default values for /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio are not appropriate for newer machines with lots of memory. Linux begins the flush when dirty_background_ratio is reached, and blocks all I/O when dirty_ratio is reached. Lower dirty_background_ratio to start flushing sooner, and raise dirty_ratio to start blocking I/O later. On very large memory systems, (32GB or more) you may even want to use dirty_bytes and dirty_background_bytes, since the minimum increment of 1% for the _ratio settings is too coarse. Read https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/ for a more detailed explanation.
Also, if you know you won't need to read the data again, call posix_fadvise with FADV_DONTNEED to ensure cache pages can be reused sooner. This has to be done after linux has flushed the page to disk, otherwise the flush will move the page back to the active list (effectively negating the effect of fadvise).
To ensure you can still read incoming data in the cases where Linux does block on the call to write(), do file writing in a different thread than the one where you are reading.
Well, try this ten-pound-hammer solution that might prove useful for seeing whether I/O system caching contributes to the problem: every 100 MB or so, call sync().
You could use a multithreaded approach: have one thread simply read data packets and add them to a FIFO, and another thread remove packets from the FIFO and write them to disk. This way, even if the write to disk stalls, the program can continue to read incoming data and buffer it in RAM.
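A minimal sketch of that two-thread FIFO, using a fixed-size ring buffer guarded by a mutex; the packet size, slot count, and the use of fwrite() for the disk side are all placeholders:

#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define PKT_SIZE   4096
#define RING_SLOTS 256                  /* 1 MB of buffering in this example */

static char ring[RING_SLOTS][PKT_SIZE];
static size_t head, tail, count;        /* guarded by the mutex below */
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

/* Called by the acquisition thread: deposit one packet into the ring. */
void ring_put(const char *pkt)
{
    pthread_mutex_lock(&lock);
    while (count == RING_SLOTS)         /* buffer full: wait (or drop) */
        pthread_cond_wait(&not_full, &lock);
    memcpy(ring[head], pkt, PKT_SIZE);
    head = (head + 1) % RING_SLOTS;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

/* Writer thread: drain packets to the FILE* passed as the argument. */
void *writer_thread(void *arg)
{
    FILE *out = arg;
    for (;;) {
        char pkt[PKT_SIZE];
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&not_empty, &lock);
        memcpy(pkt, ring[tail], PKT_SIZE);
        tail = (tail + 1) % RING_SLOTS;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);

        fwrite(pkt, 1, PKT_SIZE, out);  /* may stall; the reader keeps going */
    }
    return NULL;
}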