Reading binary istream byte by byte - c++

I was attempting to read a binary file byte by byte using an ifstream. I've used istream methods like read() before to read entire chunks of a binary file at once without a problem. But my current task lends itself to going byte by byte and relying on the buffering in the I/O system to make it efficient. The problem is that I seemed to reach the end of the file several bytes sooner than I should. So I wrote the following test program:
#include <iostream>
#include <fstream>
int main() {
typedef unsigned char uint8;
std::ifstream source("test.dat", std::ios_base::binary);
while (source) {
std::ios::pos_type before = source.tellg();
uint8 x;
source >> x;
std::ios::pos_type after = source.tellg();
std::cout << before << ' ' << static_cast<int>(x) << ' '
<< after << std::endl;
}
return 0;
}
This dumps the contents of test.dat, one byte per line, showing the file position before and after.
Sure enough, if my file happens to have the two-byte sequence 0x0D-0x0A (which corresponds to carriage return and line feed), those bytes are skipped.
I've opened the stream in binary mode. Shouldn't that prevent it from interpreting line separators?
Do extraction operators always use text mode?
What's the right way to read byte by byte from a binary istream?
MSVC++ 2008 on Windows.

The >> extractors are for formatted input; they skip white space (by
default). For single character unformatted input, you can use
istream::get() (returns an int, either EOF if the read fails, or
a value in the range [0,UCHAR_MAX]) or istream::get(char&) (puts the
character read in the argument, returns something which converts to
bool, true if the read succeeds, and false if it fails).

There is a read() member function in which you can specify the number of bytes.

Why are you using formatted extraction, rather than .read()?

source.get()
will give you a single byte. It is an unformatted input function.
operator>> is a formatted input function, which skips leading whitespace characters by default.

As others mentioned, you should use istream::read(). But, if you must use formatted extraction, consider std::noskipws.

Related

Read a file line by line in C++

I wrote the following C++ program to read a text file line by line and print out the content of the file line by line. I entered the name of the text file as the only command line argument into the command line.
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char* argv[])
{
char buf[255] = {};
if (argc != 2)
{
cout << "Invalid number of files." << endl;
return 1;
}
ifstream f(argv[1], ios::in | ios::binary);
if (!f)
{
cout << "Error: Cannot open file." << endl;
return 1;
}
while (!f.eof())
{
f.get(buf,255);
cout << buf << endl;
}
f.close();
return 0;
}
However, when I ran this code in Visual Studio, the Debug Console was completely blank. What's wrong with my code?
Apart from the errors mentioned in the comments, the program has a logical error because istream& istream::get(char* s, streamsize n) does not do what you (or I, until I debugged it) thought it does. Yes, it reads to the next newline; but it leaves the newline in the input!
The next time you call get(), it will see the newline immediately and return with an empty line in the buffer, for ever and ever.
The best way to fix this is to use the appropriate function, namely istream::getline() which extracts, but does not store the newline.
The EOF issue
is worth mentioning. The canonical way to read lines (if you want to write to a character buffer) is
while (f.getline(buf, bufSz))
{
cout << buf << "\n";
}
getline() returns a reference to the stream which in turn has a conversion function to bool, which makes it usable in a boolean expression like this. The conversion is true if input could be obtained. Interestingly, it may have encountered the end of file, and f.eof() would be true; but that alone does not make the stream convert to false. As long as it could extract at least one character it will convert to true, indicating that the last input operation made input available, and the loop will work as expected.
The next read after encountering EOF would then fail because no data could be extracted: After all, the read position is still at EOF. That is considered a read failure. The condition is false and the loop is exited, which was exactly the intent.
The buffer size issue
is worth mentioning, as well. The standard draft says in 30.7.4.3:
Characters are extracted and stored until one of the following occurs:
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit));
traits::eq(c, delim) for the next available input character c (in which case the input character is extracted but not stored);
n is less than one or n - 1 characters are stored (in which case the function calls setstate(failbit)).
The conditions are tested in that order, which means that if n-1 characters have been stored and the next character is a newline (the default delimiter), the input was successful (and the newline is extracted as well).
This means that if your file contains a single line 123 you can read that successfully with f.getline(buf, 4), but not a line 1234 (both may or may not be followed by a newline).
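The buffer size rule can be checked with a small sketch. The helper name fits_in_four is hypothetical, and an istringstream stands in for the file.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: tries to read one line of `text` into a 4-byte buffer
// and reports whether istream::getline() left the stream without failbit.
bool fits_in_four(const std::string& text) {
    std::istringstream in(text);
    char buf[4];
    in.getline(buf, sizeof buf);  // stores at most 3 characters plus '\0'
    return !in.fail();
}
```

Under the ordering quoted above, fits_in_four("123") and fits_in_four("123\n") should both succeed, while "1234" should fail with or without a trailing newline.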
The line ending issue
Another complication here is that on Windows a file created with a typical editor will have a hidden carriage return before the newline, i.e. a line actually looks like "123\r\n" ("\r" and "\n" each being a single character with the values 13 and 10, respectively). Because you opened the file with the binary flag the program will see the carriage return; all lines will contain that "invisible" character, and the number of visible characters fitting in the buffer will be one shorter than one would assume.
The console issue ;-)
Oh, and your Console was not entirely empty; it's just that modern computers are too fast and the first line which was probably printed (it was in my case) scrolled away faster than anybody could switch windows. When I looked closely there was a cursor in the bottom left corner where the program was busy printing line after line of nothing ;-).
The conclusion
Debug your programs. It's very easy with VS.
Use getline(istream, string).
Use the return value of input functions (typically the stream)
as a boolean in a while loop: "As long as you can extract any input, use that input."
Beware of line ending issues.
Consider C I/O (printf, scanf) for anything non-trivial (I didn't discuss this in my answer but I think that's what many people do).

C++ ifstream will read some values then stop

I am trying to write a program that reads 940 4-byte long values of binary data [hex] from a bin file, and output the values to console. I have ifstream::read, cout and seekg operations in a loop.
It will work for the first 10 or so iterations, and then in one iteration skip the read and write operations, perform the seekg operation, and continue on reading and writing. Also the last 200 lines or so are coming out the same value.
It will work properly for 12 iterations, then it will start outputting the wrong numbers. At this point it goes from address 0x230 to 0x28B when it should be at 0x260. It looks like read and cout are not called in this particular iteration.
The last correct value reads 3f4fc938. The next value should be 3ef646c1.
Does anyone know why this would fail? Any help is appreciated.
This is the program:
int main(int argc, char* argv[]) {
fstream in;
uint32_t buffer;
in.open(argv[1]);
in.seekg(0x6500,in.beg);
for(int i = 0; i < 940; i++) {
in.read(reinterpret_cast<char*> (&buffer),4);
cout << hex << buffer << endl;
in.seekg(0x2c,in.cur);
}
}
You have opened your file in text mode. Text mode means that operations on the file will interpret a Byte sequence that matches the platform-specific representation of a newline as a single '\n' character. If you're on Windows, for example, newlines are represented as the Byte sequence 0D 0A. So on Windows, whatever you do in your file will work well up to the point where your file happens to have a Byte with value 13 followed by a Byte with value 10. Once you reach that point, that 13 followed by 10 will be interpreted as a single character. Essentially, text mode will just swallow any Byte with value 13 if it happens to appear right before a Byte with value 10. Your application will never see the 13 and anything beyond the point where the 13 appeared will end up "shifted" by one Byte. On other platforms, other newline representations are common. If you wanna work with binary data, you will generally want to open your file in binary mode, for example
fstream in(argv[1], std::ios::in | std::ios::binary);
or
in.open(argv[1], std::ios::in | std::ios::binary);
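A sketch of the same loop with binary mode and a checked read, assuming a read-only std::ifstream is acceptable. The helper name read_strided and its parameters are hypothetical; the values come back in the host's byte order, exactly as read() copies them.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads up to `count` 4-byte values starting at offset
// `start`, skipping `gap` bytes after each value.
std::vector<std::uint32_t> read_strided(const std::string& path,
                                        std::streamoff start,
                                        int count,
                                        std::streamoff gap) {
    std::vector<std::uint32_t> values;
    std::ifstream in(path.c_str(), std::ios::binary);  // no newline translation
    in.seekg(start, std::ios::beg);
    for (int i = 0; i < count; ++i) {
        std::uint32_t value = 0;
        if (!in.read(reinterpret_cast<char*>(&value), sizeof value)) {
            break;  // failed or short read: stop instead of reusing stale data
        }
        values.push_back(value);
        in.seekg(gap, std::ios::cur);
    }
    return values;
}
```

Checking the result of read() is also a plausible explanation for the repeated trailing values in the question: once a read fails, the buffer keeps its previous contents and the unchecked loop prints them again.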

visual studio c++ cin big string from command line

When I run the following program and paste 50000 symbols to the command line, the program gets 4096 symbols only. Could you please suggest what I should do in order to get the full list of symbols?
#include <iostream>
#include <string>
using namespace std;
int main()
{
char temp[50001];
while (cin.getline(temp, 50001, '\n'))
{
string s(temp);
cout << s.size() << endl;
}
return 0;
}
P.S.
When I read the symbols from file using fstream, it's OK
I'm taking a leap here, but since many PowerShell terminals have 4096 truncation limits (take a look at the Out-File documentation), this is likely a Windows command line limitation rather than a getline limitation.
The same problem has been encountered previously by others: https://github.com/Discordia/large-std-input/blob/master/LargeStdInput/Main.cpp
I don't understand why you are reading into a character array, then transferring it into a string.
In any case, your issue may be with repeated allocations.
Reading into std::string directly
Two simple lines:
std::string s;
getline(cin, s, '\n');
Reading into an array first
Yes, there is a simpler method:
#define BUFFER_SIZE 8196 // Very important, named constant
char temp[BUFFER_SIZE];
cin.getline(temp, BUFFER_SIZE, '\n');
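// Get the number of characters extracted; this count includes the '\n' if one was consumed
std::streamsize chars_read = cin.gcount();
if (chars_read > 0 && temp[chars_read - 1] == '\0') --chars_read; // the discarded '\n' is not stored
std::string s(temp, chars_read); // Here's how to transfer the characters.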
Using a debugger, you need to view the value in chars_read to verify that the quantity of characters read is valid.
Binary reading
Some platforms provide translations between the data read and your program. For example, Windows uses Ctrl-Z as an EOF character; Linux uses Ctrl-D.
The input data may use UTF encoding and contain values outside the range of ASCII printable set.
So, the preferred method is to read from a stream opened in binary mode. Unfortunately, cin cannot be opened easily in binary mode.
See Open cin in binary
The preferred method, if possible, is to put the text into a file and read from the file.

What's the difference between read, readsome, get, and getline?

What is the difference between these functions? When I use them, they all appear to do the same thing. For example, all four calls return "hello":
#include <iostream>
#include <sstream>
int main()
{
std::stringstream ss("hello");
char x[10] = {0};
ss.read(x, sizeof(x)); // #1
std::cout << x << std::endl;
ss.clear();
ss.seekg(0, ss.beg);
ss.readsome(x, sizeof(x)); // #2
std::cout << x << std::endl;
ss.clear();
ss.seekg(0, ss.beg);
ss.get(x, sizeof(x)); // #3
std::cout << x;
ss.clear();
ss.seekg(0, ss.beg);
ss.getline(x, sizeof(x)); // #4
std::cout << x << std::endl;
}
get and getline are quite similar, when get is called with parameters ( char_type* s, std::streamsize count ). However, get reads from the stream until a delimiter is found, and then leaves it there. getline by comparison will pull the delimiter off the stream, but then drop it. It won't be added to the buffer it fills.
get looks for \n, and when a specific number of characters is provided in an argument (say, count) it will read up to count - 1 characters before stopping. read will pull in all count of them.
You could envisage read as being an appropriate action on a binary datasource, reading a specific number of bytes. get would be more appropriate on a text stream, when you're reading into a string that you'd like null-terminated, and where things like newlines have useful syntactic meanings splitting up text.
readsome only returns characters that are immediately available in the underlying buffer, something which is a bit nebulous and implementation specific. This probably includes characters returned to the stream using putback, for example. The fact that you can't see the difference between read and readsome just shows that the two might share an implementation on the particular stream type and library you are using.
I've observed the difference between read() and readsome() on a flash filing system.
The underlying stream reads 8k blocks and the read method will go for the next block to satisfy the caller, whereas the readsome method is allowed to return less than the request in order to avoid spending time fetching the next block.
The main difference between get() and getline() is that get() leaves the newline character in the input stream, making it the first character seen by the next input operation, whereas getline() extracts and discards the newline character from the input stream.
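The difference in delimiter handling can be observed directly with peek(). This is a small sketch; the helper names next_after_get and next_after_getline are hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads one line with get(char*, n) and returns the
// character that the next input operation would see.
int next_after_get(const std::string& text) {
    std::istringstream in(text);
    char buf[16];
    in.get(buf, sizeof buf);      // stops before '\n' and leaves it in the stream
    return in.peek();
}

// Same, but with getline(char*, n), which extracts and discards the '\n'.
int next_after_getline(const std::string& text) {
    std::istringstream in(text);
    char buf[16];
    in.getline(buf, sizeof buf);
    return in.peek();
}
```

For the input "ab\ncd", the first helper should return '\n' and the second 'c'.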

using fstream to read every character including spaces and newline

I wanted to use fstream to read a txt file.
I am using inFile >> characterToConvert, but the problem is that this omits any spaces and newline.
I am writing an encryption program so I need to include the spaces and newlines.
What would be the proper way to go about accomplishing this?
Probably the best way is to read the entire file's contents into a string, which can be done very easily using ifstream's rdbuf() method:
std::ifstream in("myfile");
std::stringstream buffer;
buffer << in.rdbuf();
std::string contents(buffer.str());
You can then use regular string manipulation now that you've got everything from the file.
While Tomek was asking about reading a text file, the same approach will work for reading binary data, though the std::ios::binary flag needs to be provided when creating the input file stream.
For encryption, you're better off opening your file in binary mode. Use something like this to put the bytes of a file into a vector:
std::ifstream ifs("foobar.txt", std::ios::binary);
ifs.seekg(0, std::ios::end);
std::ifstream::pos_type filesize = ifs.tellg();
ifs.seekg(0, std::ios::beg);
std::vector<char> bytes(filesize);
ifs.read(&bytes[0], filesize);
Edit: fixed a subtle bug as per the comments.
I haven't tested this, but I believe you need to clear the "skip whitespace" flag:
inFile.unsetf(ios_base::skipws);
I use the following reference for C++ streams:
IOstream Library
std::ifstream ifs( "filename.txt" );
std::string str( ( std::istreambuf_iterator<char>( ifs ) ),
std::istreambuf_iterator<char>()
);
The following c++ code will read an entire file...
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main ()
{
string line;
ifstream myfile ("foo.txt");
if (myfile.is_open()){
while (getline (myfile,line)){
cout << line << endl;
}
myfile.close();
}
return 0;
}
post your code and I can give you more specific help to your problem...
A lot of the benefit of the istream layer is providing basic formatting and parsing for simple types to and from a stream. For the purposes that you describe, none of this is really important and you are just interested in the file as a stream of bytes.
For this purpose you may be better off just using the basic_streambuf interface provided by a filebuf. The 'skip whitespace' behaviour is part of the istream interface functionality that you just don't need.
filebuf underlies an ifstream, but it is perfectly valid to use it directly.
std::filebuf myfile;
myfile.open( "myfile.dat", std::ios_base::in | std::ios_base::binary );
// gets next char, then moves 'get' pointer to next char in the file
int ch = myfile.sbumpc();
// get (up to) the next n chars from the stream
std::streamsize getcount = myfile.sgetn( char_array, n );
Also have a look at the functions snextc (moves the 'get' pointer forward and then returns the current char), sgetc (gets the current char but doesn't move the 'get' pointer) and sungetc (backs up the 'get' pointer by one position if possible).
When you don't need any of the insertion and extraction operators provided by an istream class and just need a basic byte interface, often the streambuf interface (filebuf, stringbuf) is more appropriate than an istream interface (ifstream, istringstream).
You can call int fstream::get(), which will read a single character from the stream. You can also use istream& fstream::read(char*, streamsize), which does the same operation as get(), just over multiple characters. The given links include examples of using each method.
I also recommend reading and writing in binary mode. This allows ASCII control characters to be properly read from and written to files. Otherwise, an encrypt/decrypt operation pair might result in non-identical files. To do this, you open the filestream with the ios::binary flag. With a binary file, you want to use the read() method.
Another, arguably better, way is to use istreambuf_iterator; sample code is below:
ifstream inputFile("test.data");
string fileData((istreambuf_iterator<char>(inputFile)), istreambuf_iterator<char>()); // extra parentheses avoid the most vexing parse
For encryption, you should probably use read(). Encryption algorithms usually deal with fixed-size blocks. Oh, and to open in binary mode (no translation from \r\n to \n), pass ios_base::binary as the second parameter to the constructor or open() call.
Simple
#include <fstream>
#include <iomanip>
ifstream ifs ("file");
ifs >> noskipws;
that's all.
ifstream ifile(path);
std::string contents((std::istreambuf_iterator<char>(ifile)), std::istreambuf_iterator<char>());
ifile.close();
I also find that the get() method of an ifstream object can read all the characters of the file without unsetting std::ios_base::skipws. Quote from C++ Primer:
Several of the unformatted operations deal with a stream one byte at a time. These operations, which are described in Table 17.19, read rather than ignore whitespace.
These operations are listed below:
is.get(), os.put(), is.putback(), is.unget() and is.peek().
Below is a minimum working code
#include <iostream>
#include <fstream>
#include <string>
int main(){
std::ifstream in_file("input.txt");
char s;
if (in_file.is_open()){
int count = 0;
while (in_file.get(s)){
std::cout << count << ": "<< (int)s <<'\n';
count++;
}
}
else{
std::cout << "Unable to open input.txt.\n";
}
in_file.close();
return 0;
}
The content of the input file (cat input.txt) is
ab cd
ef gh
The output of the program is:
0: 97
1: 98
2: 32
3: 99
4: 100
5: 10
6: 101
7: 102
8: 32
9: 103
10: 104
11: 32
12: 10
10 and 32 are the decimal values of the newline and space characters. As the output shows, all characters have been read.
As Charles Bailey correctly pointed out, you don't need fstream's services just to read bytes. So forget this iostream silliness, use fopen/fread and be done with it. C stdio is part of C++, you know ;)