SHARING_VIOLATION with multi-threaded file IO on Windows - c++

I have some code that resembles this minimal reproduction example (the real version generates some code and compiles it):
#include <cstdio>   // std::remove
#include <cstdlib>  // system, EXIT_SUCCESS
#include <fstream>
#include <string>
#include <thread>
#include <vector>

void write(unsigned int thread)
{
    std::ofstream stream("test_" + std::to_string(thread) + ".txt");
    stream << "test" << std::endl;
    stream << "thread" << std::endl;
    stream << "bad" << std::endl;
}

void test(unsigned int thread)
{
    write(thread);
#ifdef _WIN32
    const std::string command = "rename test_" + std::to_string(thread) + ".txt test_renamed_" + std::to_string(thread) + ".txt";
#else
    const std::string command = "mv test_" + std::to_string(thread) + ".txt test_renamed_" + std::to_string(thread) + ".txt";
#endif
    system(command.c_str());
}

int main()
{
    std::vector<std::thread> threads;
    for(unsigned int i = 0; i < 5; i++) {
        // Remove any renamed file left over from a previous run
        std::remove(("test_renamed_" + std::to_string(i) + ".txt").c_str());
        threads.emplace_back(test, i);
    }
    // Join all threads
    for(auto &t : threads) {
        t.join();
    }
    return EXIT_SUCCESS;
}
My understanding is that std::ofstream should behave in a nice RAII manner, closing and flushing the file at the end of the write function. On Linux, it appears to do just this. However, on Windows 10 I get sporadic "The process cannot access the file because it is being used by another process" errors. I've dug into it with procmon, and it looks like the file isn't getting closed by the parent process (22224), resulting in the SHARING_VIOLATION that presumably causes the error.
Although the procmon trace makes it look like the problem is within my process, I have tried turning off the virus scanner. I have also tried using C-style fopen/fprintf/fclose, and ensuring that the process I'm spawning with system isn't somehow inheriting file handles by clearing HANDLE_FLAG_INHERIT on the underlying file handle...which leaves me somewhat out of ideas! Any thoughts, SO?
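For reference, clearing the inherit flag in the C-style variant looked roughly like this (a sketch; std::ofstream doesn't portably expose its handle, so it goes through the CRT):

#include <cstdio>
#include <string>
#include <io.h>
#include <windows.h>

void write_no_inherit(unsigned int thread)
{
    const std::string name = "test_" + std::to_string(thread) + ".txt";
    FILE* fp = std::fopen(name.c_str(), "w");
    if (!fp)
        return;
    // Fetch the Win32 handle behind the CRT stream and clear the inherit
    // flag, so the child spawned via system() cannot inherit it.
    HANDLE h = reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(fp)));
    SetHandleInformation(h, HANDLE_FLAG_INHERIT, 0);
    std::fputs("test\n", fp);
    std::fclose(fp);
}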

We can rewrite the file writing using the Win32 API:
#include <string>
#include <windows.h>

void writeRaw(unsigned int thread)
{
    const auto str = "test_" + std::to_string(thread) + ".txt";
    auto hFile = CreateFileA(str.c_str(), GENERIC_WRITE,
                             FILE_SHARE_WRITE, nullptr, CREATE_ALWAYS, 0, nullptr);
    if (hFile == INVALID_HANDLE_VALUE)
        return;
    DWORD ret{};
    // Write the file name itself as the payload; the content doesn't matter here
    WriteFile(hFile, str.data(), static_cast<DWORD>(str.size()), &ret, nullptr);
    CloseHandle(hFile);
}
Running the test still gives a sharing violation, due to the way Windows works. When the last handle is closed, the filesystem driver receives an IRP_MJ_CLEANUP request to finish processing anything related to the file.
Antivirus software, for instance, may attempt to scan the file at this point (and incidentally hold a lock on it =) ). Additionally, the MSDN documentation for IRP_MJ_CLEANUP states:
It is important to note that when all handles to a file object have been closed, this does not necessarily mean that the file object is no longer being used. System components, such as the Cache Manager and the Memory Manager, might hold outstanding references to the file object. These components can still read from or write to a file, even after an IRP_MJ_CLEANUP request is received.
Conclusion: it is expected to receive a sharing violation on Windows if a process tries to do something with a file shortly after closing its handle, because the underlying system components may still be processing the close.
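If the rename cannot be avoided right after the close, one common mitigation is to treat the violation as transient and retry with a short backoff. A minimal sketch (the function name and timings are just illustrative):

#include <chrono>
#include <cstdio>
#include <thread>

// Retry a rename for a short while, assuming the sharing violation is
// transient (an antivirus scan or the cache manager still holding the file).
bool rename_with_retry(const char* from, const char* to, int attempts = 10)
{
    for (int i = 0; i < attempts; ++i) {
        if (std::rename(from, to) == 0)
            return true;
        std::this_thread::sleep_for(std::chrono::milliseconds(20 * (i + 1)));
    }
    return false;
}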

At least on VS 2017, I can confirm the file is closed by your snippet. (In the destructor of ofstream, the code calls fclose on the handle.)
I think however, that the issue is not in the C++ code, but the behavior of the OS.
On Windows, the act of removing a file which the OS thinks is open will be blocked. On Unix, the behavior of unlinking a file from a directory is to allow existing handles to continue operating on the orphaned file. So on Unix the operation could never be a sharing violation, as unlinking a file is a different operation. Linux semantics can be opted into on recent Windows 10 builds.
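For completeness, opting into those semantics looks roughly like this. This is a sketch assuming a Windows 10 RS1+ SDK, where FILE_DISPOSITION_INFO_EX and FILE_DISPOSITION_FLAG_POSIX_SEMANTICS are available:

#include <windows.h>

// Unlink with POSIX semantics: the name disappears immediately, while
// existing handles keep operating on the now-orphaned file.
bool posix_delete(const wchar_t* path)
{
    HANDLE h = CreateFileW(path, DELETE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           nullptr, OPEN_EXISTING, 0, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return false;
    FILE_DISPOSITION_INFO_EX info{};
    info.Flags = FILE_DISPOSITION_FLAG_DELETE | FILE_DISPOSITION_FLAG_POSIX_SEMANTICS;
    const BOOL ok = SetFileInformationByHandle(h, FileDispositionInfoEx,
                                               &info, sizeof(info));
    CloseHandle(h);
    return ok != FALSE;
}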
procmon on Windows runs as a filter driver at a given altitude. That means any operation performed by virus scanners at other altitudes may be hidden from procmon, and it would give a false answer.
A process can also duplicate a handle to the open file, and that would also cause this issue, but without showing the handle being closed.

The most probable cause of the problem is that when you delete a file on Windows, it isn't immediately deleted (it's just flagged for deletion). It can/will take some milliseconds (up to seconds if you're very unlucky) for it to actually be deleted.
Source: Niall Douglas in "Better mutual exclusion on the filesystem using Boost.AFIO", at about 10m10s: https://www.youtube.com/watch?v=9l28ax3Zq0w

Related

In C++, how to detect that file has been already opened by own process?

I need to create a logger facility that outputs from different places in the code to the same or different files, depending on what the user provides. It should recreate the file for logging if it is not already open, but it must append to an already opened file.
A naive attempt such as
std::ofstream f1("log");
f1 << "1 from f1\n";
std::ofstream f2("log");
f2 << "1 from f2\n";
f1 << "2 from f1\n";
steals the stream and recreates the file. The log contains:
1 from f2
With append, it will reuse the file, but the second open still steals the stream from f1.
The log contains:
1 from f1
1 from f2
Trying to guess which files will be used and opening them all at the very beginning would work, but it may create a lot of files that are never actually used.
Opening for append and closing on each logging call would almost work, but it seems slow due to the many system calls and the flushing on every logging action.
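For illustration, that almost-working variant would be just this (sketch):

#include <fstream>
#include <string>

// Reopen in append mode for every message: correct for interleaved
// writers, but pays an open/flush/close for each call.
void log_line(const std::string& path, const std::string& message)
{
    std::ofstream f(path, std::ios::app);
    f << message << '\n';
}   // f is flushed and closed here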
I'm going to create a static table of opened files, hoping that std::filesystem::canonical will work in all of my cases. But as far as I understand, such a table should already exist somewhere in the process.
I've read that in Fortran one can check whether a file is already open using inquire.
Check whether file has been opened already
But that answer did not give me any insight on how to achieve the same in C/C++.
Update
A scratch of the logger with a "static" table of open logs can look like this:
//hpp
#include <filesystem>
#include <fstream>
#include <iostream>
#include <mutex>
#include <string>
#include <unordered_map>

class Logger {
    static std::mutex _mutex;
    static std::unordered_map<std::string, std::ofstream> _openFiles;
    std::ostream& _appender;
    static std::ostream& _createAppender(const std::filesystem::path& logPath);
public:
    Logger(const std::filesystem::path& logPath):
        _appender(_createAppender(logPath)) {
    }

    template<class... Args>
    void log(const Args&... args) const {
        std::scoped_lock<std::mutex> lock(_mutex);
        (_appender << ... << args);
    }
};

//cpp
#include "Logger.hpp"

std::mutex Logger::_mutex;
std::unordered_map<std::string, std::ofstream> Logger::_openFiles;

std::ostream& Logger::_createAppender(const std::filesystem::path& logPath) {
    if (logPath.empty()) return std::cout;
    const auto truePath{std::filesystem::weakly_canonical(logPath).string()};
    std::scoped_lock<std::mutex> lock(_mutex);
    const auto entry{_openFiles.find(truePath)};
    if (entry != _openFiles.end()) return entry->second;
    // Not open yet: open it now and keep it in the static table
    std::ostream& stream{_openFiles.emplace(truePath, logPath).first->second};
    stream.exceptions(std::ios::failbit | std::ios::badbit);
    return stream;
}
Maybe it will help someone.
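Hypothetical usage (the paths are made up; both loggers end up on one stream because weakly_canonical maps them to the same key):

int main()
{
    Logger a("logs/app.log");
    Logger b("logs/../logs/app.log"); // canonicalizes to the same file
    a.log("first ", 1, '\n');
    b.log("second ", 2, '\n');        // appended via the same ofstream
}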
Yet, I still wonder whether it is possible to get the table mapping handles/descriptors from the OS mentioned by #yzt, and I will accept as an answer an explanation of how to do that from inside the program.
So here is simple Linux-specific code that checks whether a specified target file is open in the current process (using --std=c++17 for the directory listing, but any method can be used, of course).
#include <string>
#include <iostream>
#include <filesystem>
#include <sys/types.h>
#include <unistd.h>
#include <limits.h>

bool is_open_by_me(const std::string &target)
{
    char readlinkpath[PATH_MAX];
    std::string path = "/proc/" + std::to_string(getpid()) + "/fd";
    for (const auto & entry : std::filesystem::directory_iterator(path)) {
        // readlink() does not null-terminate, so record the length explicitly
        const ssize_t len = readlink(entry.path().c_str(), readlinkpath, sizeof(readlinkpath) - 1);
        if (len < 0)
            continue;
        readlinkpath[len] = '\0';
        if (target == readlinkpath)
            return true;
    }
    return false;
}
Simply list the current pid's open descriptors via procfs, then use the readlink function to resolve each one to an actual file name.
That is the best way to do it from userspace that I know of. This information is not known by the process itself; it is known by the kernel about the process, hence the process has to use various tricks, in this case parsing procfs, to access it.
If you want to check whether a different process holds an open handle to a file, you will have to parse procfs for all processes. That may not always be possible, since other processes may be run by different users.
All that said, in your specific case, when you are the sole owner opening and closing the files, maintaining a table of open handles is a much cleaner solution.

`boost::asio::io_context` with `boost::process::async_pipe`: is there a way to run it reliably?

With the Windows API, we can use pipes to forward a process's output and error streams, so we can read the process output without any temporary files. Instead of this:
std::system("my_command.exe > out.tmp");
we can work much faster and without the risk of generating a lot of forgotten temporary files (on a system crash, for example).
Linux has similar functionality. However, implementing OS-specific code for each OS is a time-consuming and complex task, so it looks like a good idea to use some portable solution.
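For reference, here is a minimal sketch of the kind of pipe-based capture meant above, using the CRT's popen family (_popen on Windows). The function name is illustrative and error handling is minimal:

#include <cstdio>
#include <string>

// Run a command and capture everything it writes to stdout via a pipe,
// with no temporary file involved.
std::string run_and_capture(const char* cmd)
{
#ifdef _WIN32
    FILE* pipe = _popen(cmd, "r");
#else
    FILE* pipe = popen(cmd, "r");
#endif
    std::string output;
    if (!pipe)
        return output;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), pipe)) > 0)
        output.append(buf, n);
#ifdef _WIN32
    _pclose(pipe);
#else
    pclose(pipe);
#endif
    return output;
}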
boost::process claims to be such a solution. However, it is fundamentally unreliable. See the following sample program:
#include <fstream>
#include <iostream>
#include <memory>
#include <vector>
#include <boost/asio.hpp>
#include <boost/process.hpp>

void ReadPipe(boost::process::async_pipe& pipe, char* output_buffer, size_t output_buffer_size, boost::asio::io_context::strand& executor, std::ofstream& output_saver)
{
    namespace io = boost::asio;
    using namespace std;
    io::async_read(
        pipe,
        io::buffer(
            output_buffer,
            output_buffer_size
        ),
        io::bind_executor(executor, [&pipe, output_buffer, output_buffer_size, &executor, &output_saver](const boost::system::error_code& error, std::size_t bytes_transferred) mutable
        {
            // Save transferred data
            if (bytes_transferred)
                output_saver.write(output_buffer, bytes_transferred);
            // Handle error
            if (error)
            {
                if (error.value() == boost::asio::error::basic_errors::broken_pipe)
                    cout << "Child standard output is broken, so the process is most probably exited." << endl;
                else
                    cout << "Child standard output read error occurred. " << boost::system::system_error(error).what() << endl;
            }
            else
            {
                //this_thread::sleep_for(chrono::milliseconds(50));
                ReadPipe(pipe, output_buffer, output_buffer_size, executor, output_saver);
            }
        })
    );
}

int main(void)
{
    namespace io = boost::asio;
    namespace bp = boost::process;
    using namespace std;

    // Initialize
    io::io_context asio_context;
    io::io_context::strand executor(asio_context);
    bp::async_pipe process_out(asio_context);
    char buffer[65535];
    constexpr const size_t buffer_size = sizeof(buffer);
    ofstream output_saver(R"__(c:\screen.png)__", ios_base::out | ios_base::binary | ios_base::trunc);

    // Schedule to read standard output
    ReadPipe(process_out, buffer, buffer_size, executor, output_saver);

    // Run child
    bp::child process(
        bp::search_path("adb"),
        bp::args({ "exec-out", "screencap", "-p" }),
        bp::std_in.close(),
        bp::std_out > process_out,
        bp::std_err > process_out
    );
    asio_context.run();
    process.wait();
    output_saver.close();

    // Finish
    return 0;
}
This code works nicely; it runs ADB, generates an Android device screenshot, and saves it through the asynchronous pipe, so no temporary files are involved. This specific example saves the screenshot as a file, but in a real application you can keep the data in memory, then load and parse it.
I use ADB in my sample because this tool gives a good example of data that is generated comparatively slowly and sent over USB or Wi-Fi (so also slowly), and the data size is comparatively big (for a full HD device with a complex image, the PNG file will be 1 MB+).
When I uncomment the following line:
this_thread::sleep_for(chrono::milliseconds(50));
the pipe reading operation becomes completely unreliable. The program reads only part of the data (of unpredictable size).
So even a delay as short as 50 milliseconds forces the Boost implementation of the asynchronous pipe to fail.
This is not a normal situation. What if CPU usage is near 100% (i.e., we are on a highly loaded server)? What if the thread runs other ASIO jobs that may execute for 50 milliseconds or less? This is just an easily reproducible demonstration of a fundamental Boost.Asio bug: an asynchronous pipe cannot tolerate any delay once you have started reading it; you have to call async_read again immediately after receiving data, otherwise you risk losing data.
In practice, when I use the same ASIO context to run multiple jobs (not only the one async_read that reads the process standard output), async_pipe fails in 50% of attempts to read 1 MB of data or more.
Does anyone know a workaround to make async_pipe reliable, so that it does not break the connection when the ASIO context runs async_read with the small delays required to run other jobs?

How to open file in exclusive mode in C++

I am implementing some file system in C++. Up to now I was using fstream, but I realized that it is impossible to open it in exclusive mode. Since there are many threads, I want to allow multiple readers, but when opening a file in write mode I want to open it in exclusive mode.
What is the best way to do it? I think Boost offers some features for this. Is there any other possibility? I would also like to see a simple example. If it is not easy/good to do in C++, I could write it in C as well.
I am using Windows.
On many operating systems, it's simply impossible, so C++ doesn't support it. You'll have to write your own streambuf. If the only platform you're worried about is Windows, you can possibly use the exclusive mode for opening that it offers. More likely, however, you would want to use some sort of file locking, which is more precise, and is available on most, if not all, platforms (but not portably: you'll need LockFileEx under Windows, fcntl under Unix).

Under Posix, you could also use pthread_rwlock. Butenhof gives an implementation of this using classical mutexes and condition variables, which are present in C++11, so you could actually implement a portable version (provided all of the readers and writers are in the same process; the Posix requests will work across process boundaries, but this is not true for the C++ threading primitives).
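A minimal sketch of the LockFileEx route on Windows (assuming hFile is already open; the range below simply covers the largest possible region):

#include <windows.h>

// Take an exclusive lock on the whole file. A reader would pass 0
// instead of LOCKFILE_EXCLUSIVE_LOCK to take a shared lock.
bool lock_whole_file(HANDLE hFile)
{
    OVERLAPPED ov{}; // lock starts at offset 0
    return LockFileEx(hFile, LOCKFILE_EXCLUSIVE_LOCK, 0,
                      MAXDWORD, MAXDWORD, &ov) != FALSE;
    // UnlockFileEx(hFile, 0, MAXDWORD, MAXDWORD, &ov) releases it.
}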
If your app only works on Windows, the Win32 API function CreateFile() is your choice.
For example:
HANDLE hFile = ::CreateFileW(lpszFileFullPathName, GENERIC_WRITE, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
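To detect that somebody else already has the file open, a sketch along the same lines could deny all sharing and inspect GetLastError() (lpszFileFullPathName is the same hypothetical variable as above):

HANDLE hFile = ::CreateFileW(lpszFileFullPathName, GENERIC_WRITE,
                             0 /* no sharing: exclusive */, NULL,
                             OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hFile == INVALID_HANDLE_VALUE && ::GetLastError() == ERROR_SHARING_VIOLATION)
{
    // some other process (or this one) already has the file open
}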
If you are open to using Boost, then I would suggest the file_lock class from Boost.Interprocess. This means you want to keep the filenames of the files you open/close, because fstream does not do so for you.
It has two modes: lock(), which you can use for writing (only one such lock at a time; a sharable lock prevents this lock too), and lock_sharable(), which you can use for reading (any number of threads can obtain such a lock).
Note that you will eventually find it complicated to manage both reads and writes this way. That is, if there is always someone reading, the sharable lock may never get released, and the exclusive lock will never get a chance to be taken.
// add the lock in your class
#include <boost/interprocess/sync/file_lock.hpp>

class my_files
{
    ...
private:
    ...
    boost::interprocess::file_lock m_lock;
};
Now when you want to access a file, you can lock it one way or the other. If the threads are in charge of when they do that, you could add functions giving the user access to the lock. If your implementations of the read and write functions in my_files are in charge, you want a stack-based object that locks and unlocks for you (RAII):
class safe_exclusive_lock
{
public:
    safe_exclusive_lock(boost::interprocess::file_lock & lock)
        : m_lock_ref(lock)
    {
        m_lock_ref.lock();
    }
    ~safe_exclusive_lock()
    {
        m_lock_ref.unlock();
    }
private:
    boost::interprocess::file_lock & m_lock_ref;
};
Now you can safely lock the file (i.e., you lock, do things that may throw, and you always unlock before exiting your current {}-block):
ssize_t my_files::read(char *buf, size_t len)
{
    safe_exclusive_lock guard(m_lock);
    ...your read code here...
    return len;
}   // <- here we get the unlock()

ssize_t my_files::write(char const *buf, size_t len)
{
    safe_exclusive_lock guard(m_lock);
    ...your write code here...
    return len;
}   // <- here we get the unlock()
The file_lock operates on an existing file, so you will want the fstream file to already be created whenever the file_lock is created. If the fstream file may not be created in your constructor, you probably want to transform the m_lock variable into a unique pointer:
private:
    std::unique_ptr<boost::interprocess::file_lock> m_lock;
And when you reference it, you now need an asterisk:
safe_exclusive_lock guard(*m_lock);
Note that for safety, you should check whether the pointer is indeed allocated; if it is not, the file is not open yet, so I would suggest you throw:
if(m_lock)
{
    safe_exclusive_lock guard(*m_lock);
    ...do work here...
}
else
{
    throw file_not_open();
}
// here the lock was released, so you cannot touch the file anymore
In the open, you create the lock:
bool open(std::string const & filename)
{
    m_stream.open(...);
    ...make sure it worked...
    m_lock.reset(new boost::interprocess::file_lock(filename.c_str())); // file_lock takes a C string
    // TODO: you may want a try/catch around the m_lock and
    //       close the m_stream if it fails, or use a local
    //       variable and swap() on success...
    return true;
}
And do not forget to release the lock object in your close:
void close()
{
    m_lock.reset();
}
Well, you can manually prevent yourself from opening a file if it has already been opened in write mode. Just keep track internally of which files you've opened in write mode.
Perhaps you could hash the filename and store it in a table upon open with write access. This allows a fast lookup to see whether a file has been opened or not, as in the sketch below.
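A sketch of that bookkeeping; std::unordered_set already hashes the filename for you, and the names here are illustrative:

#include <mutex>
#include <string>
#include <unordered_set>

std::unordered_set<std::string> open_for_write;
std::mutex open_mutex;

// Returns false if the path is already open for writing in this process.
bool try_mark_open_for_write(const std::string& path)
{
    std::lock_guard<std::mutex> lock(open_mutex);
    return open_for_write.insert(path).second;
}

void mark_closed(const std::string& path)
{
    std::lock_guard<std::mutex> lock(open_mutex);
    open_for_write.erase(path);
}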
You could rename the file, update it under the new name, and rename it back. I've done it, but it's a little heavy.
Since C++17 there are two options:
In C++23, by using the openmode std::ios::noreplace.
In C++17, by using the std::fopen mode x (exclusive).
Note: The x mode was added to C in C11.
C++23 and later:
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>

int main() {
    std::ofstream ofs("the_file", std::ios::noreplace);

    if (ofs) {
        std::cout << "success\n";
    } else {
        std::cerr << "Error: " << std::strerror(errno) << '\n';
    }
}
C++17 and later:
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <iostream>
#include <memory>

struct FILE_closer {
    void operator()(std::FILE* fp) const { std::fclose(fp); }
};

// you may want overloads for `std::filesystem::path`, `std::string` etc too:
std::ofstream open_exclusively(const char* filename) {
    bool excl = [filename] {
        std::unique_ptr<std::FILE, FILE_closer> fp(std::fopen(filename, "wx"));
        return !!fp;
    }();
    auto saveerr = errno;

    std::ofstream stream;
    if (excl) {
        stream.open(filename);
    } else {
        stream.setstate(std::ios::failbit);
        errno = saveerr;
    }
    return stream;
}

int main() {
    std::ofstream ofs = open_exclusively("the_file");

    if (ofs) {
        std::cout << "success\n";
    } else {
        std::cout << "Error: " << std::strerror(errno) << '\n';
    }
}

c++ ofstream pointer fails to write to disk

I am having trouble writing to an ofstream pointer, and this is quite perplexing, as I really don't see what is missing anymore. Note, this is a follow-up to this question:
C++ vector of ofstream, how to write to one particular element
My code is as follows:
#include <cstdlib>
#include <fstream>
#include <memory>
#include <string>
#include <vector>

using namespace std;

std::vector<shared_ptr<ofstream>> filelist;

int main()
{
    for(int ii = 0; ii < 10; ii++)
    {
        // std::to_string stands in for the original int2string helper
        string filename = "/dev/shm/table_" + to_string(ii) + ".csv";
        filelist.push_back(make_shared<ofstream>(filename.c_str()));
    }
    *filelist[5] << "some string" << endl;
    filelist[5]->flush();
    exit(1);
}
This doesn't write anything to the output file, but it does create 10 empty files. Does anybody know what might possibly be wrong here?
EDIT: I ran some further tests. I let the code run without exit(1) until completion, over all files, until all callbacks had finished. It turns out that some files are not empty, while others that should have data are empty.
There is plenty of disk space, and I know I have more file descriptors than necessary for this. Any explanation for why some of the files are written properly while others are not?
I'd try: (*filelist[5]) << "some string\n";.
I'd guess, however, that you probably meant to write to the files inside a loop; as-is, you're writing to only one file.
Oh, and in C++, you don't want to use exit.
Edit: Here's a quick (tested) standalone demo:
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

std::vector<std::ofstream *> filelist;

int main() {
    for (int ii = 0; ii < 3; ii++)
    {
        const char *names[] = { "one", "two", "three" }; // string literals are const
        std::string filename = "c:\\trash_";
        filename += names[ii];
        filename += ".txt";
        filelist.push_back(new std::ofstream(filename.c_str()));
    }
    for (std::size_t i = 0; i < filelist.size(); i++) {
        (*filelist[i]) << "some string\n";
        filelist[i]->close();
    }
}
Note, however, that the file name this generates is for Windows, whereas the original was (apparently) intended for something Unix-like. For a Unix-like OS, you'll need/want a different file name string.
Try closing the file before you call exit, with filelist[5]->close();. You've aborted a process with an open file, which means your write may not have made it to the OS buffer, or was discarded upon process exit. You could also remove the exit call; that would probably fix the problem. The results of IO in a process that is aborted are tricky to nail down, so it's best to avoid aborting with active IO, or to assume any active IO will fail upon abort.
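In the question's snippet, that amounts to (sketch):

*filelist[5] << "some string" << std::endl; // std::endl also flushes the stream
filelist[5]->close();                       // data is handed to the OS here
exit(1);                                    // nothing buffered can be lost now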

How to read from a library standard error?

I have a Qt/C++ application which uses a C++ library.
This library has a log mechanism that writes string messages to standard error.
Now, I would like to be able to redirect those messages to a panel in my Qt tool.
I would like to avoid modifying the library, because it is used by many other clients.
Any idea how to capture these messages at runtime?
If modifying it were an option instead, what would be a good practice for carrying those messages up to the application?
That's very poor library design. However...
How does it write to standard error? If it is outputting to std::cerr, then you can change the streambuf used by std::cerr, something like this:
std::filebuf logStream;
if ( !logStream.open( "logfile.txt", std::ios_base::out ) ) {
    // Error handling...
}
std::streambuf* originalCErrStream = std::cerr.rdbuf();
std::cerr.rdbuf( &logStream );

// Processing here, with calls to library

std::cerr.rdbuf( originalCErrStream ); // Using RAII would be better.
Just don't forget to restore the original streambuf; leaving std::cerr pointing to a filebuf which has been destructed is not a good idea.
If they're using FILE*, there's an freopen function in C (and by inclusion in C++) that you can use.

If they're using system-level output (write under Unix, WriteFile under Windows), then you're going to have to use some system-level code to change the output: open on the new file, close on fd STDERR_FILENO, and dup2 to set STDERR_FILENO to use the newly opened file under Unix. I'm not sure it's possible under Windows; maybe something with ReOpenFile or some combination of CloseHandle followed by CreateFile.
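For the FILE* case, a sketch (the log file name is made up):

#include <cstdio>

int main()
{
    // If the library logs through C stdio's stderr, freopen can repoint it:
    if (std::freopen("stderr.log", "w", stderr) == nullptr)
        return 1; // could not redirect
    std::fprintf(stderr, "this lands in stderr.log\n");
}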
EDIT:

I just noticed that you actually want to output to a Qt window. This means that you probably need a string, rather than a file. If the library is using std::cerr, you can use a std::stringbuf instead of a std::filebuf; you may, in fact, want to create your own streambuf to pick up calls to sync (which will normally be called after each << on std::cerr). If the library uses one of the other techniques, the only thing I can think of is to periodically read the file to see if anything has been added. (I would use read() in Unix, ReadFile() in Windows for this, in order to be sure of distinguishing a read of zero bytes, due to nothing having been written since the last read, from an error condition. FILE* and iostream functions treat a read of zero bytes as end of file, and will not read further.)
A write to stderr is ultimately a syscall:
write(2, "blahblah ...", 12);
You can redirect file descriptor number 2 to anything (a file, a pipe, a socket):
int redirect_target = open(...); // open the file you want to redirect to,
                                 // or use a pipe, socket, whatever you like
dup2(redirect_target, 2);        // make fd 2 refer to redirect_target;
                                 // dup2 closes the old stderr atomically
close(redirect_target);
In your situation, you will need a pipe:
int pipefd[2];
pipe(pipefd);        // create the pipe first, so it cannot be handed fd 2
dup2(pipefd[1], 2);  // stderr now writes into the pipe
close(pipefd[1]);
Then everything written to stderr can be obtained by reading pipefd[0]:
read(pipefd[0], buffer, ...);
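Since a Qt tool will likely poll this pipe from its event loop, it may also help to make the read end non-blocking; this is a sketch, and with it read() returns -1 with errno set to EAGAIN when no data is pending:

#include <fcntl.h>

void make_nonblocking(int fd)
{
    const int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}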
If they're using calls to std::cerr, you can redirect this to a std::ostringstream.
#include <iostream>
#include <sstream>

class cerr_redirector
{
public:
    cerr_redirector(std::ostream& os)
        : backup_(std::cerr.rdbuf())
        , sbuf_(os.rdbuf())
    {
        std::cerr.rdbuf(sbuf_);
    }

    ~cerr_redirector()
    {
        std::cerr.rdbuf(backup_);
    }

private:
    cerr_redirector();
    cerr_redirector(const cerr_redirector& copy);
    cerr_redirector& operator =(const cerr_redirector& assign);

    std::streambuf* backup_;
    std::streambuf* sbuf_;
};
You can capture the output using:
std::ostringstream os;
cerr_redirector red(os);
std::cerr << "This is written to the stream" << std::endl;
std::cout will be unaffected:
std::cout << "This is written to stdout" << std::endl;
So you can then test your capture is working:
std::cout << "and now: " << os.str() << std::endl;
Or just add the contents of os.str() to your Qt Window.
Demonstration at ideone.
Here I found a complete implementation of what I needed...
Thanks, everybody, for the help! :)
Will loading a DLL dynamically reconcile its stderr to a main application? If so, then how...?