Reading a (binary) file fails, but only on Linux - C++

I've never done any programming on Linux before, but I'm now writing server software. I want to read a binary file. The code is essentially:
std::basic_ifstream<std::byte> read_file(filename.data(), std::ios::binary);
std::vector<std::byte> contents(100);
read_file.read(contents.data(), 1);
Note that this is reduced code. The actual program allocates the correct number of bytes, checks failbits, verifies the file size, etc. I can verify that the file in question exists and is actually found by the program (renaming it produces a loud error). But even the single-byte read flips the failbit from false to true. The file has all the permissions in the world, and the program has optionally been run as root. I don't know anything about Linux or what could possibly cause such a failure. Again: the program finds the file; the error only happens when reading the first byte.
The server is a DigitalOcean droplet - is there maybe some kind of protection against reading files that I'm not aware of? I'm a bit out of ideas.
The exact same program works correctly on a (local) Windows machine.
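(A likely culprit, for what it's worth: std::char_traits is only specified for character types, so std::basic_ifstream<std::byte> relies on a std::char_traits<std::byte> specialization and, inside basic_filebuf, a std::codecvt<std::byte, char, std::mbstate_t> locale facet; libstdc++'s default locale does not provide the facet, so the first read fails, while MSVC happens to cope. A minimal sketch of the usual portable workaround - read through a plain char stream and cast - with a helper name of my own choosing:)

#include <cstddef>
#include <fstream>
#include <vector>

std::vector<std::byte> read_bytes(const char* filename, std::size_t n)
{
    std::ifstream in(filename, std::ios::binary);  // plain char stream
    std::vector<std::byte> contents(n);
    in.read(reinterpret_cast<char*>(contents.data()),
            static_cast<std::streamsize>(contents.size()));
    contents.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was read
    return contents;  // empty if the open itself failed
}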

Related

How to check if a file is used by another process in C++?

I need to check if a file is currently opened by another process, e.g. a text editor (but it needs to apply to everything else too).
I tried using std::ofstream::is_open() etc., but this did not work: I could open the file in my text editor while my program was checking whether it was open, and the program saw it as a closed file and went on. Only if I opened it as another ofstream did this work.
I'm using the filesystem library to copy files and they may only be copied (and later removed) if the file is not currently written to by another process on the client server.
Really curious about this one. Been wondering this for quite some time but never found a good way for it myself.
I'm currently making a program that needs to run on both Linux and Windows. Every 5 seconds it copies all files from directories a, b, c, d to x (this can be configured by the client in rules). After it has copied everything, all the files may be removed. After a day (or whatever the client tells the program), all those files in x need to be zipped and archived at location y. Hence the problem: files may only be deleted (and copied) if the other programs that place the files in directories a, b, c, d are not touching that specific file right now. Hope that makes the question clearer.
And before anybody starts: yes, I know about the data race condition. I do not care about it for now. The program does absolutely nothing with the contents of a file, and after a file is closed by the other process, it will stay closed forever.
I need to check if a file is currently opened by another process
This is heavily operating-system specific (and might be useless), so first read a good textbook on operating systems.
On Linux specifically you might use inotify(7) facilities, or /proc/ pseudo-file system (see proc(5)), or perhaps lsof(8). They work only for local file systems (not remote ones, like NFS). See also Advanced Linux Programming and syscalls(2).
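To make the /proc route concrete, here is a minimal sketch that walks /proc/<pid>/fd and compares each descriptor's symlink target against the file in question - roughly what lsof(8) does. It is Linux-only, it usually needs root to inspect other users' processes, the answer can already be stale by the time the function returns, and the function name is mine:

#include <cctype>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// True if some local process currently holds an open descriptor to `target`.
bool is_open_by_any_process(const fs::path& target)
{
    std::error_code ec;
    const fs::path wanted = fs::canonical(target, ec);
    if (ec)
        return false;  // the target itself is gone

    for (const auto& proc : fs::directory_iterator("/proc", ec))
    {
        const std::string name = proc.path().filename();
        if (name.empty() || !std::isdigit(static_cast<unsigned char>(name[0])))
            continue;  // not a /proc/<pid> directory

        for (const auto& fd : fs::directory_iterator(proc.path() / "fd", ec))
            if (fs::read_symlink(fd.path(), ec) == wanted)
                return true;  // e.g. /proc/1234/fd/5 -> /path/to/target
    }
    return false;
}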
And you could have surprises (e.g. a process scheduled so quickly that it removes the file before you have time to do anything about it).
For Windows, take more time to read its documentation.
I'm currently making a program that needs to run on both Linux and Windows. Every 5 seconds it copies all files from directories a, b, c, d to x.
You might look, at least for inspiration, inside the source code of rsync.
I don't understand what your actual problem is, but rsync might be part of the solution, and it is rumored to run on both Windows and Linux.

Is file ready to read immediately after std::ofstream is destroyed?

Basically I have the following workflow (through a console application):
Read a binary file (std::ifstream::read)
Do something with the data read
Write back to the same file (std::ofstream::write), overwriting what was there before.
Now, if I run this whole console program 1000 times through a shell script (always using the same file), is it safe to assume that the read operation will not conflict with a previously run instance still trying to write to the file? Or do I need to wait between executions (how long???)? Can I reliably determine whether the file is ready?
I know it is not the best design; I just want to know whether it will work reliably (I'm trying to quickly gather some statistics - the inputs are different, but the output file is always the same: it needs to be read, the info needs to be processed, and then the file needs to be updated (at this point simply overwritten)).
EDIT:
Based on the answers, it looks like the problem with the output being wrong is not OS-related. The read/write I do looks like:
//read
std::ifstream input(fname, std::ios_base::binary);
while (input)
{
    unsigned value;
    input.read(reinterpret_cast<char*>(&value), sizeof(unsigned));
    // (note: `input` should be re-checked here, after the read and
    // before `value` is used - the final iteration reads nothing)
    ....
}
input.close();
...
//write
std::ofstream output(fname, std::ios_base::binary);
for (std::map<unsigned, unsigned>::const_iterator iter = originalMap.begin();
     iter != originalMap.end(); ++iter)
{
    unsigned temp = iter->first;
    output.write(reinterpret_cast<char*>(&temp), sizeof(unsigned));
    temp = iter->second;
    output.write(reinterpret_cast<char*>(&temp), sizeof(unsigned));
}
They run sequentially - essentially, the shell script runs the same console app in a loop...
Then, on any "normal" operating system, there should be no problem.
When the application terminates, the streams are destroyed, so any data that may be kept in the C/C++ stream buffers is written to the underlying OS files, which are then closed.
Whether the OS does some more caching is irrelevant - caching done by the operating system is transparent to applications, so, as far as applications are concerned, the data is now written to the file. Whether it has actually been written to disk is of no concern here - applications reading from the file will see the data in it anyway.
If you think about it, without such a guarantee it would be complicated to do reliably any work on a computer! :)
Update:
std::ofstream output(fname,std::ios_base::binary);
You should truncate the output file before writing to it; otherwise, if the previous contents were longer than the new output, old data would linger at the end of the file. (Strictly speaking, an std::ofstream opened with only std::ios_base::binary implies std::ios_base::out, which already truncates, but stating std::ios_base::trunc makes the intent explicit:)
std::ofstream output(fname,std::ios_base::binary | std::ios_base::trunc);
Check the parameters of the fstream constructor. Some implementations have an extension that lets you conveniently set sharing modes.
If you ask for exclusive read or write, that's what you get as long as you keep the stream open - other such operations cannot happen, whether from a different process or from the same one with a different stream instance.
With the pure standard it takes more hoops, probably implementing those modes in your own filebuf and replacing the stock one. Look into it.
Use of sharing modes is the mainstream way to defend file consistency, so I suggest using them in any case.
Certainly, if you make sure the race conditions are handled and one process does not open the file before the other has closed it, the result is good that way too.
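As an illustration of such an extension (MSVC-specific and not portable: its fstream constructors accept a third argument, a sharing flag from <share.h>; the file name here is made up):

#include <fstream>
#include <share.h>  // _SH_DENYRW and friends - MSVC only

int main()
{
    // While `out` is open, attempts by anyone else to open the file
    // for reading or writing are denied.
    std::ofstream out("data.bin",
                      std::ios_base::binary | std::ios_base::trunc,
                      _SH_DENYRW);
    if (!out)
    {
        // open failed - possibly because another process holds the file
    }
}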

ofstream::write fails in the middle when writing large binary files

During runtime my program creates and simultaneously writes two large binary files to disk. File A is about 240 GB, file B about 480 GB. The two files are maintained by two ofstream objects, and the write operations are performed with the member function write in a loop.
Now the problem is: the write operation fails every time the whole procedure reaches about 63~64%. The first time it failed on file A, the second time on file B.
While the program has been running these past few days, the power supply of my building has happened to be under upgrade. By a strange coincidence, every time the program failed, the electrician happened to be cutting and restoring the power of the central air conditioner and some offices. Therefore, I really wonder whether the write failures were caused by an unstable power supply.
I'm sure that the failure is not caused by file size limit, because I've tried to write a single 700GB file using the same method without any problem.
Is there any way to find out the detailed reason? I feel that the flags (badbit, eofbit and failbit) of ofstream don't provide much information. Now I'm trying to use errno and strerror to get a detailed error message. However, I see that one possible value of errno is EIO, which means "I/O error", which again provides no useful information.
Is there anyone who encountered this situation before?
By the way, the program runs without error when the sizes of file A and file B are small.
PS: This time the program fails at 55%, and the errno value is EINVAL: Invalid argument. Very strange.
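Capturing errno right after the failing call, before anything else can overwrite it, looks roughly like this (the helper name is mine):

#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>

bool write_chunk(std::ofstream& out, const char* buf, std::streamsize n)
{
    errno = 0;  // so a stale value is not mistaken for this call's error
    if (!out.write(buf, n))
    {
        std::cerr << "write failed: " << std::strerror(errno)
                  << " (errno=" << errno << ")\n";
        return false;
    }
    return true;
}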
All right, I've worked around the problem with the Win32 API: CreateFile and WriteFile.
Update: confirmed, the cause is indeed an NTFS bug - a heavily fragmented file on an NTFS volume may not be able to grow beyond a certain size. This means that CreateFile and WriteFile cannot fundamentally solve the problem either.
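The Win32 version of the write path looks roughly like this (a minimal sketch; a real multi-hundred-GB write would loop over chunks and inspect GetLastError on failure):

#include <windows.h>

bool write_all(const char* path, const char* data, DWORD size)
{
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return false;

    DWORD written = 0;
    BOOL ok = WriteFile(h, data, size, &written, nullptr);
    CloseHandle(h);
    return ok && written == size;
}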

Versioning an executable and modifying it at runtime

What I'm trying to do is sign my compiled executable's first 32 bytes with a version signature, say "1.2.0", and I need to modify this signature at runtime, keeping in mind that:
this will be done by the executable itself
the executable resides on the client side, meaning no recompilation is possible
using an external file to track the version instead of encoding it in the binary itself is also not an option
the solution has to be platform-independent; I'm aware that Windows/VC allows you to version an executable using a .rc resource, but I'm unaware of an equivalent for Mac (maybe Info.plist?) and Linux
The solution in my head was to write the version signature into the first or last 32 bytes of the binary (which I haven't figured out how to do yet) and then modify those bytes when I need to. Sadly it's not that simple, since I'm trying to modify the same binary that I'm executing.
If you know of how I can do this, or of a cleaner/mainstream solution for this problem, I'd be very grateful. FWIW, the application is a patcher/launcher for a game; I chose to encode the version in the patcher itself instead of the game executable as I'd like it to be self-contained and target-independent.
Update: from your helpful answers and comments, I see that messing with the header/footer of the binary is not the way to go. But regarding write permission for the running users: the game has to be patched one way or another, and the game files need to be modified; there's no way to circumvent that. To update the game, you'll need admin privileges.
I would opt for using an external file to hold the signature and modifying it with every update, but I can't see how I can guard against the user tampering with that file: if they mess up the version numbers, how can I detect which version I'm running?
Update 2: Thanks for all your answers and comments; in truth there are two ways to do this: either use an external resource to track the version or embed it in the main application's binary itself. I could choose only one answer on SO, so I accepted the one I'm going with, although it's not the only one. :-)
Modern Windows versions will not allow you to update an installed program file unless you're running with administrator privileges. I believe all versions of Windows block modifications to a running file altogether; this is why you're forced to reboot after an update. I think you're asking for the impossible.
This is going to be a bit of a challenge, for a number of reasons. First, writing to the first N bytes of the binary is likely to step on the binary file's header information, which is used by the program loader to determine where the code & data segments, etc. are located within the file. This will be different on different platforms (see the ELF format and executable format comparison) - there are a lot of different binary format standards.
Assuming you can overcome that one, you're likely to run afoul of security/antivirus systems if you start modifying a program's code at runtime. I don't believe most current operating systems will allow you to overwrite a currently-running executable. At the very least, they might allow you to do so with elevated permissions - not likely to be present while gaming.
If your application is meant to patch a game, why not embed the version in there while you're at it? You can use a string like @Juliano shows and modify it from the patcher while the game is not running - which should be the case if you're currently patching anyway. :P
Edit: If you're working with Visual Studio, it's really easy to embed such a string in the executable with a #pragma comment, according to this MSDN page:
#pragma comment(user, "Version: 1.4.1")
Since the second argument is a simple string literal, it can be concatenated, and I'd have the version in a simple #define:
// somewhere
#define MY_EXE_VERSION "1.4.1"
// somewhere else
#pragma comment(user, "Version: " MY_EXE_VERSION)
I'll give just some ideas on how to do this.
I think it's not possible to change arbitrary bytes in the executable without side effects. To overcome this, I would create a string in your source code, like:
static const char Version[] = "Version: AA.BB.CC";
I don't know if this is a rule, but you can look for this string in your binary code (open it in a text editor and you will see it). So you search for these bytes in the binary file and change them to your version number. Their position will probably vary each time you compile the application, so this is possible only if the varying location is not a problem for you.
Because the file is being used (it's running), you have to launch an external program to do this. After modifying the file, the external program could relaunch the original application.
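A sketch of that external patching step - read the file, find the marker, and overwrite the bytes after it in place (the helper name is mine; the new version must not be longer than the old one, or it must be padded to the same length):

#include <fstream>
#include <iterator>
#include <string>

bool patch_version(const std::string& exe_path, const std::string& new_ver)
{
    std::fstream f(exe_path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f)
        return false;

    // Slurp the file and locate the marker string.
    std::string blob((std::istreambuf_iterator<char>(f)),
                     std::istreambuf_iterator<char>());
    const std::string marker = "Version: ";
    const std::string::size_type pos = blob.find(marker);
    if (pos == std::string::npos)
        return false;

    f.clear();  // drop any EOF state before seeking
    f.seekp(static_cast<std::streamoff>(pos + marker.size()));
    f.write(new_ver.data(), static_cast<std::streamsize>(new_ver.size()));
    return static_cast<bool>(f);
}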
The version will be stored somewhere in your binary code. Is that useful to you? How will you retrieve the version number?

fopen: is it a good idea to leave files open, or to use a buffer?

So I have many log files that I need to write to. They are created when the program begins and are saved to file when the program closes.
I was wondering if it is better to do:
fopen() at the start of the program, then close the files when the program ends - I would just write to the files when needed. Will anything (such as other file I/O) be slowed down by these files still being "open"?
OR
Save what needs to be written into a buffer, and then open the file, write from the buffer, and close the file when the program ends. I imagine this would be faster?
Well, fopen(3) + fwrite(3) + fclose(3) is a buffered I/O package, so another layer of buffering on top of it might just slow things down.
In any case, go for a simple and correct program. If it seems to run slowly, profile it, and then optimize based on evidence and not guesses.
Short answer:
- A big number of open files shouldn't slow anything down.
- Writing to a file will be buffered anyway.
So you can leave those files open, but do not forget to check the limit on open files in your OS.
Part of the point of log files is being able to figure out what happened when/if your program runs into a problem. Quite a few people also do log file analysis in (near) real-time. Your second scenario doesn't work for either of these.
I'd start with the first approach, but with a high-enough level interface that you could switch to the second if you really needed to. I wouldn't view that switch as a major benefit of the high-level interface though - the real benefit would normally be keeping the rest of the code a bit cleaner.
There is no good reason to buffer log messages in your program and write them out on exit. Simply write them as they're generated using fprintf. The stdio system will take care of the buffering for you. Of course this means opening the file (with fopen) from the beginning and keeping it open.
For log files, you will probably want a functional interface that flushes the data to disk after each complete message, so that if the program crashes (it has been known to happen), the log information is safe. Leaving stuff in standard I/O buffers means excavating the data from a core dump, which is less satisfactory than having the information safely on disk.
Other I/O really won't be affected by holding one - or even a few - log files open. You lose a few file descriptors, perhaps, but that is not often a serious problem. When it is a problem, you use one file descriptor for one log file - and you keep it open so you can log information. You might elect to map stderr to the log file, leaving that as the file descriptor that's in use.
It's been mentioned that the FILE* returned by fopen is already buffered. For logging, you should probably also look into using the setbuf() or setvbuf() functions to change the buffering behavior of the FILE*.
In particular, you might want to set the buffering mode to line-at-a-time, so the log file is flushed automatically after each line is written. You can also specify the size of the buffer to use.
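For example, switching a log FILE* to line buffering so that each complete line is pushed out at the newline (the file name is made up; note that on Windows, _IOLBF is reportedly treated as full buffering):

#include <cstdio>

int main()
{
    std::FILE* log = std::fopen("app.log", "a");
    if (!log)
        return 1;

    // _IOLBF: flush at every '\n'; 4096-byte buffer allocated internally.
    std::setvbuf(log, nullptr, _IOLBF, 4096);

    std::fprintf(log, "started\n");  // hits the disk at the newline
    std::fclose(log);
}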