What does _locking() really do? - c++

Looking for an answer to that question, I found the function _locking(). The documentation says that it "locks or unlocks bytes of a file" (I can't work out what this sentence actually means). If someone has experience using this function: can it be used to solve the problem described in the first question?

Quoting the MSDN page you linked:
int _locking(
    int fd,
    int mode,
    long nbytes
);
The _locking function locks or unlocks nbytes bytes of the file specified by fd. Locking bytes in a file prevents access to those bytes by other processes. All locking or unlocking begins at the current position of the file pointer and proceeds for the next nbytes bytes. It is possible to lock bytes past end of file.

It simply reserves a range of a file for the exclusive use of the process that acquires the file lock. If a lock call succeeds, another process that tries to read or write that portion of the file will fail. This allows multiple processes to access the same file and update it in a coherent manner. It's kind of like a mutex for a range of a file.
Basically it allows you to update portions of a file atomically, so any other process reading or writing the file will see (or change) either all of the update, or none of it. It also applies to reads - you can lock a range of a file that you want to read, to prevent another process from changing part of it while you're in the middle of reading it.
But processes can still access other parts of the file without error or delay.
It will not solve the problem in the question you're referring to, because _locking() only works at process granularity. If thread A locks a file range, thread B in the same process can still read/write that range. To prevent another thread in the same process from accessing a file range, the process would have to implement its own internal mechanism for respecting that a file range has been locked by another thread. At least, I'm unaware of anything in the Win32 API that does that (I suppose there could be something I don't know about).
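For comparison (this is not in the original answer), POSIX offers the same kind of per-process, byte-range lock via fcntl with F_SETLK; like _locking, it stops other processes but not other threads of your own process. A minimal sketch:

```cpp
#include <fcntl.h>
#include <unistd.h>

// Try to take an exclusive (write) lock on `nbytes` bytes of the file
// behind `fd`, starting at `offset`. Returns true on success. Like
// _locking on Windows, this is a per-process record lock: another
// thread in the same process is NOT blocked by it.
bool lock_range(int fd, off_t offset, off_t nbytes) {
    struct flock fl{};
    fl.l_type = F_WRLCK;     // exclusive lock
    fl.l_whence = SEEK_SET;  // offset is absolute
    fl.l_start = offset;
    fl.l_len = nbytes;
    return fcntl(fd, F_SETLK, &fl) == 0;  // non-blocking attempt
}

bool unlock_range(int fd, off_t offset, off_t nbytes) {
    struct flock fl{};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = nbytes;
    return fcntl(fd, F_SETLK, &fl) == 0;
}
```

As with _locking, the range may extend past the current end of the file.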

http://msdn.microsoft.com/en-us/library/8054ew2f(v=vs.71).aspx
I found that this helps "fix" the race condition problem!
The last person to write to a file wins. Say you only need to read the first half of a file: there's no reason to lock the whole file.
So you take the number of bytes you need, pass it to this function, and lock that range down.
The function returns 0 if successful. A return value of -1 indicates failure, in which case errno is set to one of the values the MSDN page lists.
To answer your question:
You can take the size of the file and lock it all down, but then you will only be able to read from the file - you lock it into a kind of read-only mode.
The Wikipedia article on race conditions gives examples of how to lock a file by having a second process check for a flag; that may work in your case, so look into it.

It prevents other processes from accessing that same part of the file.

Related

How can I make sure only one thread performs IO from a file?

Here's my use case (using C++): I have a multithreaded environment performing operations on data structures written on disk. There are M files. The workflow is:
Thread reads from file into a data structure
Operations on the data structure are performed
The data structure is inserted in cache
Least recently used element is written to file
Cache insertions and deletions are already thread-safe. However, I have no idea how to parallelize writes and reads, i.e. if Thread 1 is reading from File 1, then Thread 2 can read from File 2, but of course Thread 2 should not read from File 1. If I simply insert a mutex, the whole section is locked and only one thread can read at a time. What is the most efficient way to make sure only one thread reads from a given file, while multiple files can be read at the same time?
edit: code is something like this
for element in elements
file = element.txt
data = file.read()
cache.insert(data)
Put the file name in a std::map as the key, with a mutex pointer as the value. Then, whenever a thread has a file name to work on, it locks using that mutex and a lock guard:
{
    std::lock_guard<std::mutex> lg(*mapping[filename]);
    compute(filename);
}
Since the OS has its own file cache, it would also be worthwhile to use a read lock (e.g. std::shared_mutex), so that multiple threads can read a file concurrently while writers still take a unique lock.
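The map-of-mutexes idea above can be sketched as a small self-contained class (FileLocks and for_file are hypothetical names; the answer only shows the lookup):

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// One mutex per file name. The map itself needs its own mutex, since
// threads may look up (and lazily create) entries concurrently.
// std::mutex is not movable, so entries are held by unique_ptr.
class FileLocks {
public:
    std::mutex& for_file(const std::string& filename) {
        std::lock_guard<std::mutex> lg(map_mutex_);
        auto& slot = locks_[filename];  // creates a null entry on first use
        if (!slot) slot = std::make_unique<std::mutex>();
        return *slot;
    }

private:
    std::mutex map_mutex_;
    std::map<std::string, std::unique_ptr<std::mutex>> locks_;
};
```

Each worker then does `std::lock_guard<std::mutex> lg(locks.for_file(name));` before reading: two threads working on different files proceed in parallel, while threads on the same file serialize.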

Random File Writing

If I have multiple threads generating blocks of a file, what is the best way to write out the blocks?
ex) 5 threads working on a file of 500 blocks; block 0 is not necessarily completed before block 1, but the output file on disk needs to be in order (block 0, block 1, block 2, ... block 499).
The program is in C++. Can fwrite() somehow "random access" the file? The file is created from scratch, meaning that when block 5 is completed, the file may still be of size 0 because blocks 1-4 are not completed yet. Can I directly write out block 5 (with a proper fseek)?
This piece of code is performance critical, so I'm really curious about anything that can improve the performance. This looks like a multiple-producer (block generators), single-consumer (output writer) scenario. The ideal case is that a thread can start generating the next block as soon as it completes the previous one.
If fwrite can be "random", then the output writer can simply take outputs, seek, and then write. However, I'm not sure this design performs well at large scale.
Some limitations
Each block is of the same size, generated in memory.
Block size is known in advance, but not the total number of blocks.
The total size is a few GBs. Big.
There can be multiple jobs running on one server, each as described above. They have their own independent generators/writers, in different processes.
The server is a Linux/CentOS machine.
Assuming each block is the same size, and that the blocks are generated in memory before they are required to be written to disk, then a combination of lseek and write would be perfectly fine.
If you are able to write the entire block in one write, you would not gain any advantage from using fwrite, so just use write directly. However, you would need some sort of locking (a mutex) if all the threads share the same fd, since seek+write cannot be done atomically, and you would not want one thread to seek just before a second thread is about to write.
This further assumes that your file system is a standard file system and not of some exotic nature, since not every input/output device supports lseek (a pipe, for example).
Update: lseek can seek beyond the end of the file; just set the whence parameter to SEEK_SET and the offset to the absolute position in the file (fseek has the same option, though I have never used it).
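On POSIX there is also pwrite, which takes the offset as a parameter and performs the seek and the write as a single call, so the mutex around seek+write is unnecessary as long as each thread writes a disjoint range. A minimal sketch, assuming the fixed-size blocks from the question:

```cpp
#include <fcntl.h>
#include <unistd.h>

// Write one fixed-size block at its final position, regardless of the
// order in which blocks complete. pwrite never touches the shared file
// offset, so concurrent callers need no lock as long as their byte
// ranges do not overlap. Writing past the current end of file simply
// extends the file (with a hole where nothing was written yet).
bool write_block(int fd, const char* block, size_t block_size,
                 size_t block_index) {
    off_t offset = static_cast<off_t>(block_index) *
                   static_cast<off_t>(block_size);
    ssize_t n = pwrite(fd, block, block_size, offset);
    return n == static_cast<ssize_t>(block_size);
}
```

With this, the "output writer" thread is optional: each generator can write its own block directly once it is complete.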

Do I need mutex for 1 reader and 1 writer where I don't mind losing some writes?

I have a ROS node running two threads and they both share the same class. This class has two sets of parameters "to read" and "to write" to be updated in a control loop.
There are two situations where questions arises.
My program is a node that pumps control data into a quadrotor (case 1) and reads the drone data to get feedback (case 2). Here I can control the execution frequency of thread A and I know the frequency at which thread B can communicate with its read/write external source.
Thread A reads data from the control source and updates the "to read" parameters. Thread B is constantly reading these "to read" parameters and writing them to the drone. My point here is that I don't mind missing some of the values thread A has read, but could thread B happen to read something that's not a "true" value because thread A is writing at that moment, or something similar?
After writing the "to read" parameters, thread B reads the state of the drone, which updates the second set, "to write". Again, thread A needs to read these "to write" parameters and write them back to the control source; in the same way, I don't care if a value is missed because I'll get the next one.
So do I need a mutex here? Or will the reading threads just miss some values, with the ones they do read being correct and consistent?
BTW: I am using boost::threads to implement thread B, as thread A is the ROS node itself.
A data race is undefined behavior, even if the hardware guarantees atomic access and even if, due to timing, your threads never actually access the same data at the same time. There is no such thing as a benign data race in C++. You can get lucky and the undefined behavior does what you want, but you can never be sure, and every new compilation could break everything (not just cause a missed write). I strongly suggest you use std::atomic. It will most likely generate almost the same code, except that it is guaranteed to always work.
In general the answer is that you need a lock or some other type of synchronization mechanism. For example, if your data is a null-terminated string it's possible for you to read interleaved data. Say one thread was reading the buffer and the string in the buffer is "this is a test". The thread copies the first four bytes and then another thread comes in and overwrites the buffer with "cousin it is crazy". You'd end up copying "thisin it is crazy". That's just one example of things that could go wrong.
If you're always copying atomic types and everything is fixed length, then you could get away with it. But if you do, your data is potentially inconsistent. If two values are supposed to be related, it's possible for that relationship now to be broken because you read one value from the previous update and one value from the new update.
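The std::atomic suggestion can be sketched like this (SharedParam and the single double-valued parameter are hypothetical stand-ins for the drone's control values):

```cpp
#include <atomic>

// Thread A stores the latest reading; thread B loads whenever it likes.
// Thread B may miss intermediate values, but every value it loads is a
// complete value that A actually stored - never a torn mix of two
// writes, and never undefined behavior.
struct SharedParam {
    std::atomic<double> value{0.0};

    void publish(double v) { value.store(v, std::memory_order_release); }
    double read() const { return value.load(std::memory_order_acquire); }
};
```

Note the caveat from the answer above: for several *related* parameters, independent atomics are not enough, because two loads can straddle a writer's update. In that case use a mutex, or atomically swap a pointer to a freshly built snapshot of the whole set.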

Can asynchronous calls to WriteFile result in torn writes?

I have a hypothetical scenario where a file handle is opened in asynchronous mode, and some threads append to that file handle. They append by setting the Offset and OffsetHigh parts of the OVERLAPPED structure to 0xFFFFFFFF, as documented in the MSDN article for WriteFile.
Can I issue a second write in append mode like this before the first append completes, and expect the file to contain the entire contents of the first append followed by the entire contents of the second append? Or must I wait to issue the following asynchronous write until the previous write completes?
Yes, it works. I worked at a company that used a similar scheme, although to get their seek calls to work each time, they pre-allocated the file at a known size (about 2 GB...) and then truncated it at the end.
However, you can just "append" by seeking to the right location before each write. You'll have to track the position yourself, though.
And, of course, each thread must access the file atomically.
A simple example:
lock mutex
seek to position
write data
position += data size
unlock mutex
Of course here I assume that the file is properly opened before you call this function from any thread.
The one thing that you cannot do, unless you create a large file first (which is very fast, since files full of zeroes are created virtually), is seek to a position that depends on something such as a frame number. So if thread 3 wants to write at "size * 3" and that happens before thread 2 writes at "size * 2", then the seek() will fail...
You should never issue multiple outstanding WriteFile operations with the offset set to 0xFFFFFFFF, even from a single thread. This causes multiple calls to contend for the end of the data at the same time and leads to data corruption: when WriteFile operates in asynchronous mode with other WriteFile operations outstanding against the end of file, some operations write data at the end of the file while others pick up a stale end-of-file pointer. In short, use 0xFFFFFFFF only once and wait for that operation to finish before issuing another one with that offset. Otherwise, you need to calculate the offsets yourself so that each outstanding write operation uses a unique offset. This bug took me three days to find, due to the poor MSDN documentation about that offset.
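"Calculate the offsets yourself" can be done with a single atomic counter: each writer reserves a unique byte range *before* issuing its write, so no two outstanding writes can overlap. A portable sketch (the WriteFile/OVERLAPPED plumbing is omitted; on Windows the reserved value would go into Offset/OffsetHigh instead of 0xFFFFFFFF):

```cpp
#include <atomic>
#include <cstdint>

// Each append reserves [old, old + nbytes) for itself. fetch_add is
// atomic, so no two callers ever receive overlapping ranges, and the
// reservations land in the order the calls complete.
class AppendOffset {
public:
    std::uint64_t reserve(std::uint64_t nbytes) {
        return next_.fetch_add(nbytes, std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint64_t> next_{0};
};
```

Each thread then issues its (possibly still outstanding) write at the offset it reserved; the writes themselves may complete in any order.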

How to detect the disk full error and let program resume after getting free disk space

I am writing an application to run on Linux, which writes to disk with fprintf and fwrite. I would like to be able to trap "disk full" errors, prompt the user to make more space, and then resume operation as if nothing had happened. Is there any graceful solution for this?
Check the return value of each call to fprintf() and fwrite(). If fprintf() returns a negative value, or fwrite() returns fewer items than requested, check errno to see whether it is EDQUOT or ENOSPC (see the man page for write; in the case of fprintf(), possibly even ENOMEM, as mentioned in some man pages for fprintf but not all). If so, you're probably out of disk space.
As for resuming the operation as if nothing ever happened; that's a little bit harder; you'll need to keep track of what data you successfully wrote to the disk, so that after you've notified the user and they've indicated that it's time to try again, you can resume writing that data from the point at which the error occurred. That means keeping the state of the write in a structure of some sort (i.e. not just on the stack) so that you can return from your writing-function and then resume it later on. (Either that, or do the writing in a separate thread, and have the thread notify the main thread and then block until the main thread notifies back that it's safe to continue... that might get a little tricky though)
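A sketch of the check-and-resume idea: the caller keeps the progress counter outside the function (here in `written`), so after the user frees disk space a retry continues exactly where the failure occurred. The function and parameter names are made up for illustration:

```cpp
#include <cerrno>
#include <cstdio>

// Write all `len` bytes of `buf` to `f`, resuming from `*written`.
// Returns true when everything (including the stdio buffer) is on its
// way to the kernel. On failure, *out_errno records why; ENOSPC or
// EDQUOT mean the disk (or quota) is full, and the caller can prompt
// the user, then call this again with the same `written` counter.
bool write_all(FILE* f, const char* buf, size_t len,
               size_t* written, int* out_errno) {
    while (*written < len) {
        size_t n = fwrite(buf + *written, 1, len - *written, f);
        *written += n;
        if (n == 0) {            // fwrite signals errors via a short count
            *out_errno = errno;
            return false;
        }
    }
    if (fflush(f) != 0) {        // buffered data can also hit ENOSPC here
        *out_errno = errno;
        return false;
    }
    return true;
}
```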
You could reserve space in larger chunks (say, 64kB or 1MB) and use custom wrappers for fwrite and fprintf to make sure that data is written only in the already reserved area. These wrappers would also allocate new disk space for your files as necessary. Then, you'll have only a few points in your code where the "out of disk space" can actually happen, and this error is relatively easy to recover from if you know you only have been allocating.
If you are able to use the Boost library, then it's pretty simple.
boost::filesystem::space returns information about disk space. The input to space() is the path to a file, and the result is a space_info structure containing the capacity, free space, and available space.
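Since C++17, the same facility exists in the standard library as std::filesystem::space, so Boost is no longer strictly required. A minimal sketch:

```cpp
#include <cstdint>
#include <filesystem>

// Returns the number of bytes available to this (unprivileged) process
// on the filesystem containing `path`. space_info also exposes
// .capacity (total size) and .free (free to root as well).
std::uintmax_t bytes_available(const std::filesystem::path& path) {
    std::filesystem::space_info si = std::filesystem::space(path);
    return si.available;
}
```

You could poll this before large writes, or after an ENOSPC error, to decide when it is safe to resume.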