Changing binary file - C++

I have a file on Windows. I'm writing in C++. I need to remove some bytes from the end of the file. I am using ifstream, but I don't know how to remove those chars. Do I simply put '\0' in the file, or what?

On linux machines, use truncate():
http://linux.die.net/man/2/truncate
On the Windows machines, use SetEndOfFile():
http://msdn.microsoft.com/en-us/library/aa365531%28v=vs.85%29.aspx
Both are OS-dependent calls.
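Here is a minimal sketch of both approaches behind a single wrapper; truncate_file() is a hypothetical name, and error handling is kept to the bare minimum:
#ifdef _WIN32
#include <windows.h>
// Hypothetical wrapper: cut the file at newSize bytes using SetEndOfFile().
bool truncate_file(const wchar_t* path, long long newSize) {
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;
    LARGE_INTEGER pos;
    pos.QuadPart = newSize;
    bool ok = SetFilePointerEx(h, pos, nullptr, FILE_BEGIN) != 0
           && SetEndOfFile(h) != 0;  // the file now ends at the current position
    CloseHandle(h);
    return ok;
}
#else
#include <sys/types.h>
#include <unistd.h>
// POSIX: truncate() cuts the named file to exactly newSize bytes.
bool truncate_file(const char* path, off_t newSize) {
    return truncate(path, newSize) == 0;
}
#endif
If the file is still open through your ifstream, close it first; on Windows in particular, the exclusive CreateFileW call above will fail while another handle is open.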

You can't portably change the size of a file; the only portable way is to copy the file to a temporary, delete the original, and rename the temporary.
If it's just a case of truncating the file, both Windows and Unix (but not necessarily other systems) have system-level functions which can do this, but there's nothing in the standard which supports it. And if you ever end up having to remove bytes other than at the end, neither Windows nor Unix allows it (although some other systems do, at least in specific cases).

Why not truncate the file? Have a look at the chsize() function.

How to keep CR when reading file into string?

I tried using fopen in C; the second parameter is the open mode. The two modes "r" and "rb" confuse me a lot. They seem to be the same, but sometimes it is better to use "rb". So why does "r" exist?
Explain it to me in detail or with examples.
Thank you.
You should use "r" for opening text files. Different operating systems have slightly different ways of storing text, and this will perform the correct translations so that you don't need to know about the idiosyncrasies of the local operating system. For example, you will know that newlines will always appear as a simple "\n", regardless of where the code runs.
You should use "rb" if you're opening non-text files, because in this case, the translations are not appropriate.
On Linux, and Unix in general, "r" and "rb" are the same. More specifically, a FILE pointer obtained by fopen()ing a file in text mode and in binary mode behaves the same way on Unixes. On Windows, and in general on systems that use more than one character to represent "newlines", a file opened in text mode behaves as if all those characters are just one character, '\n'.
If you want to portably read/write text files on any system, use "r", and "w" in fopen(). That will guarantee that the files are written and read properly. If you are opening a binary file, use "rb" and "wb", so that an unfortunate newline-translation doesn't mess your data.
Note that a consequence of the underlying system doing the newline translation for you is that you can't determine the number of bytes you can read from a file using fseek(file, 0, SEEK_END).
Finally, see What's the difference between text and binary I/O? on comp.lang.c FAQs.
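A small sketch of the difference, assuming the file name demo.txt is free to use; on POSIX systems both runs print the same bytes:
#include <cstdio>
int main() {
    // Write "a\nb" in text mode: on Windows the '\n' is stored as "\r\n".
    FILE* out = std::fopen("demo.txt", "w");
    std::fputs("a\nb", out);
    std::fclose(out);
    // Read the file back in binary mode to see the raw, untranslated bytes.
    FILE* in = std::fopen("demo.txt", "rb");
    int c;
    while ((c = std::fgetc(in)) != EOF)
        std::printf("%02X ", c);  // Windows: 61 0D 0A 62; POSIX: 61 0A 62
    std::fclose(in);
}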
Use "rb" to open a binary file; then the bytes of the file won't be translated (newline conversion, for example) when you read them.
"r" is the same as "rt": translated (text) mode.
"rb" is non-translated (binary) mode.
This makes a difference on Windows, at least. See that link for details.
On most POSIX systems, it is ignored. But check your system to be sure.
XNU
The mode string can also include the letter 'b' either as last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with ISO/IEC 9899:1990 ('ISO C90') and has no effect; the 'b' is ignored.
Linux
The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two-character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the 'b' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-UNIX environments.)

Multi-processing and file operations?

On Windows-based OSes, assuming there are several different processes that may read and/or write a file frequently using fopen/fopen_s/fwrite etc., do I need to consider data races, or does the OS handle this automatically, ensuring the file can only be opened/updated by a single process at any given time while the remaining fopen attempts fail? And what about Linux-based OSes on this matter?
In Windows it depends on how you open the file.
See the possible values for the uStyle parameter in the case of OpenFile and dwShareMode in the case of CreateFile.
Note that OpenFile is deprecated, so better to use CreateFile.
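As an illustration (not a definitive recipe), a dwShareMode of 0 requests exclusive access, so a second process's open attempt fails until the handle is closed:
#include <windows.h>
// Sketch: open a file so that no other process can open it concurrently.
// Pass FILE_SHARE_READ instead of 0 to allow concurrent readers.
HANDLE openExclusive(const wchar_t* path) {
    return CreateFileW(path,
                       GENERIC_WRITE,
                       0,                      // dwShareMode: exclusive access
                       nullptr,
                       OPEN_ALWAYS,            // create the file if missing
                       FILE_ATTRIBUTE_NORMAL,
                       nullptr);               // INVALID_HANDLE_VALUE on failure
}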
You will have to take care not to open the same file from multiple threads simultaneously, as it's entirely possible to open the file multiple times, and the OS may or may not do what you expect, depending on the mode you are opening the file in. For example, if you create a new file, it will definitely create two different files (one of which will disappear when it gets closed, because it was deleted by the other thread; great, eh?). The rules are pretty complex, and the worst part is that if you don't take extra care, you'll get mixed-up output in the same file, so lines or even parts of lines get interleaved from the two threads.
Even if the OS stops you from opening the same file twice, you will still have to deal with the consequences of "FILE * came back as NULL". What do you do then? Go back and try again, or fail, or?
I'm not sure I can make a good suggestion as to HOW to solve this problem, since you haven't described very well what you are doing with these files. There are a few different things that come to mind:
Keep a "register" of file-names, and a mutex for each file that has to be held to be able to open the file.
Use a single "file-thread" to read/write data on files, and just queue "I want to write this stuff to file aa.txt", and let the worker write as it goes along.
Use lower level file system calls, and use "exclusive" access to the files, with some sort of "backoff" behaviour in case of collision.
I'm sure there are dozens of other ways to solve the problem - it really depends on what you are trying to do.
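As a rough sketch of the second suggestion (a single file-thread draining a queue), with all names purely illustrative:
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
// One worker thread owns the file; other threads only enqueue lines.
class FileWriter {
public:
    explicit FileWriter(const std::string& path)
        : out_(path, std::ios::app), worker_([this] { run(); }) {}
    ~FileWriter() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    void write(std::string line) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(line)); }
        cv_.notify_one();
    }
private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return done_ || !q_.empty(); });
            while (!q_.empty()) {
                std::string line = std::move(q_.front());
                q_.pop();
                lk.unlock();          // never write while holding the lock
                out_ << line << '\n';
                lk.lock();
            }
            if (done_) return;
        }
    }
    std::ofstream out_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool done_ = false;
    std::thread worker_;
};
Note this only serializes writers within one process; coordinating several processes still needs one of the other approaches (locking or exclusive opens).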
Maybe. If you're talking about different processes (and not threads), the conventional data race conditions which apply to threads don't apply. However (and there is no difference between Unix and Windows here):
Any single write/WriteFile operation will be atomic. (I'm not 100% sure concerning Windows, but I can't imagine it otherwise.) However, if you're using iostream or the older FILE* functions, you don't have direct control of when those operations take place. Normally, they will only occur when the stream's buffer is full. You'll want to ensure that the buffer is big enough, and explicitly flush after each output. (If you're outputting lines of a reasonable length, say 80 characters at the most, it's a safe bet that the buffer will hold a complete line. In this case, just use std::endl to terminate the lines in iostreams; for the C-style functions, you'll have to call setvbuf( stream, NULL, _IOLBF, 0 ) before the first output.)
Each open file in the process has its own idea of where to write in the file, and its own idea of where the end of file is. If you want all writes to go to the end of file, you'll need to open it with std::ios_base::app in C++, or "a" in C. Just std::ios_base::out/"w" is not enough. (Also, of course, with just std::ios_base::out or "w", the file will be truncated when it is opened. Having several different processes truncating the file could result in loss of data.)
When reading a file that other processes are writing to: when you reach end of file, the stream or FILE goes into an error state, and will not try to read further, even if other processes are appending data. In C, clearerr should (I think) undo this, but it's not clear what happens next; in C++, clearing the error in the stream doesn't necessarily mean that further reads will not immediately encounter end of file either. In both cases, the safest bet is to memorize where you were before each read, and if the read fails, close the file, then later reopen it, seek to where you were, and start reading from there.
Random access, writing other than at the end of file, will also work, as long as all writes are atomic (see above); you should always get a consistent state. If what you write depends on what you have read, however, and other processes are doing something similar, you'll need file locking, which isn't available at the iostream/FILE* level.
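A sketch of that last reading recipe, reopening the file and seeking back on each pass; the function name and the line-oriented format are illustrative assumptions:
#include <fstream>
#include <string>
// Read any complete new lines appended since `pos`, advancing `pos`.
// Call periodically; reopening sidesteps the stream's sticky EOF state.
bool readNewLines(const std::string& path, std::streampos& pos) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.seekg(pos);
    std::string line;
    while (std::getline(in, line)) {
        if (in.eof()) break;   // partial last line; wait for its '\n'
        // ... process `line` ...
        pos = in.tellg();      // remember the position after each full line
    }
    return true;
}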

Read/Write at the same time

What I am doing is opening my file using fstream at the start of main and closing it at the end. In between, I am writing "Hello World" and after that reading what I wrote, but the result is always weird characters and not "Hello World". I did do a cast to char but that didn't help. Is there any way I can do this?
You need to interpose an fseek call when you switch from reading to writing, or vice versa. (Of course, you also need to fopen for "r+" or the like, so that both reading and writing are allowed, but I imagine you are already aware of that -- the need for seeking in order to switch between reading and writing is a lesser known fact).
As this page puts it,
For the modes where both reading and writing (or appending) are allowed (those which include a "+" sign), the stream should be flushed (fflush) or repositioned (fseek, fsetpos, rewind) between either a reading operation followed by a writing operation or a writing operation followed by a reading operation.
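To illustrate, a minimal sketch with a single std::fstream; the seekg() between the write and the read is the step that matters (the file name is illustrative):
#include <fstream>
#include <iostream>
#include <string>
int main() {
    std::fstream f("demo.txt",
                   std::ios::in | std::ios::out | std::ios::trunc);
    f << "Hello World";
    f.seekg(0, std::ios::beg);   // reposition before switching to reading
    std::string line;
    std::getline(f, line);
    std::cout << line << '\n';   // prints "Hello World"
}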
I'd be amused if this works, because I always had to open a file twice to do that: once for reading and once for writing. Even then, I had to write the whole file out and close it (which flushed the OS buffers) before I could be sure I could read the whole file and not get an early EOF.
Nowadays, since I use Unix-style operating systems, I would just use the pipe() function. Not sure if that works in Windows (because so much doesn't, like select() on files).
Make sure you are seeking to the beginning of the file before reading, like so:
fileFStream.seekg(0, ios_base::beg);
If that doesn't work, post your code.

Can we write an EOF character ourselves?

Most languages, like C++, put an EOF character when writing to a file, even if we omit statements like:
filestream.close
However, is there any way we can put the EOF character wherever we want, in C++ for instance?
Or is there any other method we may use, apart from the functions provided in C++?
If you need more information, kindly leave a comment.
EDIT:
What if we want to trick the OS and place an EOF character in a file, then write some data after the EOF, so that an application like notepad.exe is not able to read past our EOF character?
I have read answers to questions related to this topic and have come to know that nowadays OSes generally don't look for an EOF character, but rather check the length of the file to know how long it is; still, there must be a procedure in the OS which checks the length of the file and updates the file records.
I am sorry if I am wrong at any point in my reasoning, but please do help me, because it could lead to a lot of new ideas.
There is no EOF character. EOF by definition "is unequal to any valid character code". Often it is -1. It is not written into the file at any point.
There is a historical EOF character value (CTRL+Z) in DOS, but it is obsolete these days.
To answer the follow-up question of Apoorv: the OS never uses the file data to determine file length (files are not 'null terminated' in any way), so you cannot trick the OS. Perhaps old, stupid programs won't read past a CTRL+Z character. I wouldn't assume that any Windows application (even Notepad) does that. My guess is that it would be easier to trick them with a null (\0) character.
Well, EOF is just a value, a constant defined in the C stdio.h header file. It's actually returned to all the reading functions by the OS, so it's system dependent. When the OS reaches the end of a file, it tells the function, which then most commonly returns -1, but not always. So, to summarize, EOF is not a character, but a constant returned by the OS.
EDIT: Well, you need to know more about filesystems; look at this.
Hi, to your second question:
Once again, you should look more closely into filesystems. FAT is a very nice example because you can find many articles about it, and its principles are very similar to NTFS's. Anyway, once again, EOF is NOT a character. You cannot place it in a file directly. If you could, imagine the consequences: even a "dumb" image file could not be read by the system.
Why? Because the OS works like a very complex structure of layers. One of those layers is the filesystem driver. It makes sure that it transfers data from every filesystem known to the driver. It provides a bridge between applications and the actual system of storing files on the HDD.
To be exact, the FAT filesystem uses the so-called FAT table: a table located close to the start of the HDD (or partition) address space, which contains a map of all clusters (little storage cells). So when you want to save some file to the HDD, the OS (filesystem driver) looks into the FAT table and searches for the value 0x0. This 0x0 value tells the OS that the cluster whose address corresponds to that entry's position in the FAT table is free to write to.
So it writes the first part of the file into it. Then it looks for another 0x0 value in the FAT and, if found, writes the second part of the file into the cluster that entry points to. Then it changes the value of the first FAT record where the file is located to the physical address of the next (in our case, second) part of the file.
When your file is all stored on the HDD, there comes the final part: the driver writes the desired end-of-chain ("EOF") value, but into the FAT table, not into the "data part" of the HDD. So when the file is read next time, the system knows where it ends and doesn't look any further.
So, now you see: if you wanted to manually write an EOF value in a place it doesn't belong, you would have to write your own driver able to rewrite the FAT records, which is practically impossible for a beginner.
I came here while going through the Kernighan & Ritchie C exercises.
Ctrl+D makes the terminal signal end of input, which the reading functions report as the EOF constant from stdio.h.
(Edit: this is on Mac OS X; thanks to @markmnl for pointing out that the Windows 10 equivalent is Ctrl+Z.)
Actually in C++ there is no physical EOF character written to a file using either the fprintf() or ostream mechanisms. EOF is an I/O condition to indicate no more data to read.
Some early disk operating systems like CP/M actually did use a physical 0x1A (ASCII SUB character) to indicate EOF because the file system only maintained file size in blocks so you never knew exactly how long a file was in bytes. With the advent of storing actual length counts in the directory it is no longer typical to store an "EOF" character as part of the 'in-band' file data.
Under Windows, if you encounter an ASCII 26 (EOF) in stdin, it will stop reading the rest of the data. I believe writing this character will also terminate output sent to stdout, but I haven't confirmed this. You can switch the stream to binary mode as in this SO question:
#include <io.h>
#include <fcntl.h>
...
_setmode(0, _O_BINARY);
And not only will you stop 0x0A being converted to 0x0D 0x0A, but you'll also gain the ability to read and write 0x1A. Note you may have to switch both stdin (0) and stdout (1).
If by the EOF character you mean something like Control-Z, then modern operating systems don't need such a thing, and the C++ runtime will not write one for you. You can of course write one yourself:
filestream.put( 26 ); // write Ctrl-Z
but there is no good reason to do so. There is also no need to do:
filestream.close();
as the file stream will be closed for you automatically when its destructor is called, but it is (I think) good practice to do so.
There is no such thing as the "EOF" character. The fact of closing the stream in itself is the "EOF" condition.
When you press Ctrl+D in a unix shell, that simply closes the standard input stream, which in turn is recognized by the shell as "EOF" and it exits.
So, to "send" an "EOF", just close the stream to which the "EOF" needs to be sent.
Nobody has yet mentioned the [f]truncate system calls, which are how you make a file shorter without recreating it from scratch.
The truncate() and ftruncate() functions cause the regular file named by path or referenced by fd to be truncated to a size of precisely length bytes.
If the file previously was larger than this size, the extra data is lost. If the file previously was shorter, it is extended, and the extended part reads as null bytes ('\0').
Understand that this is a distinct operation from writing any sort of data to the file. The file is a linear array of bytes, laid out on disk somehow, with metadata that says how long it is; truncate changes the metadata.
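A minimal sketch of the fd-based variant; the file name and size are illustrative:
#include <fcntl.h>
#include <unistd.h>
int main() {
    int fd = open("data.bin", O_WRONLY);  // assumes the file already exists
    if (fd != -1) {
        ftruncate(fd, 1024);              // keep only the first 1024 bytes
        close(fd);
    }
}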
On modern filesystems EOF is not a character, so you don't have to issue it when finishing to write to a file. You just have to close the file or let the OS do it for you when your process terminates.
Yes, you can manually add an EOF character to a file.
1) In a Mac terminal, create a new file: touch filename.txt
2) Open the file in vi:
vi filename.txt
3) In insert mode (hit i), type Control+V and then Control+D. Do not let go of the Control key on the Mac.
Alternatively, for other control characters like ^N^M^O^P, etc., do Control+V and then Control+NewLetter. So for example, to get ^O, hold down Control, then type V and O, then let go of Control.

C++ ofstream vs. C++ cout piped to file

I'm writing a set of unit tests that write calculated values out to files. Each test produces a square matrix that holds anywhere from 50,000 to 500,000 doubles, and I have a total of 128 combinations of test cases.
Is there any significant overhead involved in writing cout statements and then piping that output to files, or would I be better off writing directly to the file using an ofstream?
This is going to be dependent on your system and environment. There is likely to be very little difference, but there is only one way to be sure: try both approaches and measure them.
Since the dimensions involved are so large I'm assuming that these files are not meant to be read by a human being? Just make sure you write them out as binary and not human-readable text because that will make so much more difference than the difference between using ofstream or piping cout.
Whether this means you have to use ofstream or not I don't know. I've never written binary to cout so I can't say whether that's possible...
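For instance, a sketch of dumping one matrix in raw binary through an ofstream; the header layout (rows, cols, then the data) is purely illustrative:
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>
// Write the dimensions, then the raw doubles in one block.
void writeMatrix(const std::string& path, const std::vector<double>& m,
                 std::uint64_t rows, std::uint64_t cols) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&rows), sizeof rows);
    out.write(reinterpret_cast<const char*>(&cols), sizeof cols);
    out.write(reinterpret_cast<const char*>(m.data()),
              static_cast<std::streamsize>(m.size() * sizeof(double)));
}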
As Charles Bailey said, it's implementation dependent; what follows is mostly for the Linux implementation with the GNU toolchain, but I can hardly imagine it being very different on another OS.
In libstdc++ 4.4.2:
An fstream contains an underlying stdio_filebuf, which is a basic_filebuf. This basic_filebuf contains its own buffer by inheriting basic_streambuf, and actually contains a __basic_file, itself containing an underlying plain C stdio abstraction (FILE* or std::__c_file*), into which it flushes the buffer.
cout, which is an ostream, is initialized with a stdio_sync_filebuf, itself initialized with the C file abstraction stdout. stdio_sync_filebuf calls plain C stdio functions.
Considering only C++, it appears that an fstream may be more efficient thanks to the two layers of buffering.
Considering C only, if the process is forked with the stdout file descriptor redirected to a file, there should be no difference between writing to a newly opened file (what fstream does in the end) or to stdout, since the fd points to a file anyway (what cout does in the end).
If I were you, I would use an fstream since it's your intent.