C++ iostream >> operator behaves differently than get() unsigned char - c++

I was working on a piece of code to do some compression, and I wrote a bitstream class.
My bitstream class kept track of the current bit we are reading and the current byte (unsigned char).
I noticed that reading the next unsigned character from the file was done differently if I used the >> operator vs get() method in the istream class.
I was just curious why I was getting different results?
ex:
this->m_inputFileStream.open(inputFile, std::ifstream::binary);
unsigned char currentByte;
this->m_inputFileStream >> currentByte;
vs.
this->m_inputFileStream.open(inputFile, std::ifstream::binary);
unsigned char currentByte;
this->m_inputFileStream.get((char&)currentByte);
Additional Info:
To be specific, the byte I was reading was 0x0A; however, when using >> it would be read as 0x6F.
I'm not sure how they're even related? (They're not the two's complement of each other.)
The >> operator is also defined to work for unsigned char, however (see the C++ istream class reference).

operator>> is for formatted input. It'll read "23" as an integer if you stream it into an int, and it'll eat whitespace between tokens. get() on the other hand is for unformatted, byte-wise input.

If you aren't parsing text, don't use operator>> or operator<<. You'll get weird bugs that are hard to track down. These bugs also tend to slip past unit tests, unless you know what to look for. Reading a uint8, for instance, will fail on 9, because byte value 9 is a tab and is skipped as whitespace.
edit:
#include <iostream>
#include <sstream>
#include <cstdint>
void test(char r) {
std::cout << "testing " << r << std::endl;
char t = '!';
std::ostringstream os(std::ios::binary);
os << r;
if (!os.good()) std::cout << "os not good" << std::endl;
std::istringstream is(os.str(), std::ios::binary);
is >> t;
if (!is.good()) std::cout << "is not good" << std::endl;
std::cout << std::hex << (uint16_t)r
<< " vs " << std::hex << (uint16_t)t << std::endl;
}
int main(int argc, char ** argv) {
test('z');
test('\n');
return 0;
}
produces:
testing z
7a vs 7a
testing
is not good
a vs 21
I suppose that would never have been evident a priori.

C++'s formatted input (operator >>) treats char and unsigned char as a character, rather than an integer. This is a little annoying, but understandable.
You have to use get, which returns the next byte, instead.
However, if you open a file with the binary flag, you should not be using formatted I/O. You should be using read, write and related functions. Formatted I/O won't behave correctly, as it's intended to operate on text formats, not binary formats.

Related

Different behaviors in stringstream given std::hex, uint8_t vs int

I suppose the fact that uint8_t is really just an alias for a character type (unsigned char) means that when the characters in the stringstream get read back in, they are stored natively, i.e. as the ASCII code for the character, whereas if the char is input to an int, with std::hex specified, it gets read in as a hex digit converted to base 10.
This is a small example exhibiting the behavior, but the reason I'm asking is that I started out under the assumption that specifying an output vector of uint8_t was the right approach, as I need to later feed this vector as an array to a C function as a byte vector (you pass it in as a void pointer and it figures out what to do with it somehow). So I figured reading pairs of input chars into a stringstream and then out into a uint8_t would help me skip manually bitshifting. Is there a way of keeping the idea of reading into a uint8_t but invoking the handling of an int? Can I cast to int at the point of the read from the stringstream? Or what?
#include <iostream>
#include <string>
#include <sstream>
#include <iomanip>
#include <vector>
#include <cstdint>
int main (int argc, char** argv) {
uint8_t n = 0;
std::stringstream ss;
ss << 'F';
ss >> std::hex >> n;
std::cout << "uint8:" << (int)n << std::endl;
int m = 0;
std::stringstream tt;
tt << 'F';
tt >> std::hex >> m;
std::cout << "int:" << std::hex << m << std::endl;
return 0;
}
output
uint8:70
int:f
There are operator>> overloads for char, signed char and unsigned char, which all read a single character from the stream. A uint8_t argument selects the unsigned char overload as the best match, reading F as a character (ASCII code 70 or 0x46).
To match an integer extraction operator>> overload, use an argument of type short or bigger (for example, uint16_t):
uint16_t n = 0;
std::stringstream ss;
ss << 'F';
ss >> std::hex >> n;
Can I cast to int at the point of the read from the stringstream?
No. The argument is taken by reference, which in function calls works as a pointer. Type punning a uint8_t argument as (int&) will invoke undefined behavior. Just use a bigger type instead.

C++ how to read what I want from a file?

How can I assign a variable from my C++ code a value from a structured .txt file for example?
If I have this following input.txt structured like this:
<Name> "John Washington"
<Age> "24"
<ID> "19702408447417"
<Alive Status> "Deceased"
In my c++ code if I have
ifstream read("input.txt",ios::in);
char name[64];
read>>name;//name will point to John Washington string
int age;
read>>age;// cout<<age will return 24;
int ID;
read>>ID;// cout<<ID will return 19702408447417
char Alivestatus[32];
read>>Alivestatus;//Alivestatus will point to Deceased string;
How can I make it work like above?
As @πάντα ῥεῖ mentioned in the comments, you will need to implement a parser that can interpret the <> tags within your file. Additionally, I would recommend reconsidering the data types.
Specifically, given that there's no special reason that you're using char [], please switch to std::string. I don't know the use case of your code, but if input.txt happens to contain data that's larger than the size of the arrays, or even worse if the input is user-controlled, this can easily lead to buffer overflows and unwanted exploits. std::string also has the benefit of being standardized, optimized, much more friendly than char arrays, and has a variety of useful algorithms and functions readily available for use.
With regards to text file parsing, you can perhaps implement the following:
#include <cstdlib> // std::atoi
#include <fstream>
#include <iostream>
#include <string>
int main()
{
std::ifstream input_file("input.txt");
std::string name_delimeter("<Name> ");
std::string age_delimeter("<Age> ");
std::string id_delimeter("<ID> ");
std::string alive_delimeter("<Alive Status> ");
std::string line;
std::getline(input_file,line);
std::string name(line,line.find(name_delimeter) + name_delimeter.size()); // string for the reasons described above
std::getline(input_file,line);
int age = std::atoi(line.substr(line.find(age_delimeter) + age_delimeter.size() + 1).c_str()); // + 1 skips the opening quote so atoi sees the digits
std::getline(input_file,line);
std::string id(line,line.find(id_delimeter) + id_delimeter.size()); // the example ID overflows a 32-bit integer,
// so representing it as a string may be more appropriate
std::getline(input_file,line);
std::string alive_status(line,line.find(alive_delimeter) + alive_delimeter.size()); // string for the same reason as name
std::cout << "Name = " << name << std::endl << "Age = " << age << std::endl << "ID = " << id << std::endl << "Alive? " << alive_status << std::endl;
}
The basis of the code is just to read the file as it is structured and construct the appropriate data types from them. In fact, because I used std::string for most of the data types, it was easy to build the correct output by means of std::string's constructors and available functions.
Maybe you are performing this in a loop, or the file has several structures. To approach this problem, you can make a Record class that overloads operator >> and reads in the data as required.

Inconsistent behavior when parsing numbers from stringstream on different platforms

In a project I'm using a stringstream to read numeric values using operator>>. I'm now getting reports indicating that the parsing behaviour is inconsistent across different platforms if additional characters are appended to the number (for instance "2i"). Compiling the sample below with GCC/VCC/LLVM on Linux results in:
val=2; fail=0
Compiling and running it on iOS with either GCC or LLVM reportedly yields:
val=0; fail=1
What does the standard say about the behavior of operator>> in such a case?
--- Sample Code ---------------------------------------------
#include <sstream>
#include <iostream>
int main(int argc, const char **args)
{
double val;
std::stringstream ss("2i");
ss >> val;
std::cout << "val=" << val << "; fail=" << ss.fail() << std::endl;
return 0;
}
According to this reference:
Thus, in either case:
if your compiler is pre-C++11 and reading fails, it leaves the value of val intact and sets failbit.
if your compiler is C++11 or later and reading fails, it sets the value of val to 0 and sets failbit.
However, operator>> extracts and parses characters sequentially with the function num_get::get [27.7.2.2.2 Arithmetic extractors], for as long as it can interpret them as the representation of a value of the proper type.
Thus, in your case operator>> will call num_get::get for the first character (i.e., 2) and the reading will succeed; it will then move on to the next character (i.e., i). i doesn't fit a numerical value, so num_get::get will fail and reading will stop. However, valid characters have already been read. These valid characters will be processed and assigned to val, and the rest of the characters will remain in the stringstream. To illustrate this I'll give an example:
#include <sstream>
#include <iostream>
#include <string>
int main(int argc, const char **args)
{
double val(0.0);
std::stringstream ss("2i");
ss >> val;
std::cout << "val=" << val << "; fail=" << ss.fail() << std::endl;
std::string str;
ss >> str;
std::cout << str << std::endl;
return 0;
}
Output:
val=2; fail=0
i
You see that if I use the extraction operator again into a std::string, the character i is extracted.
The above, however, doesn't explain why you don't get the same behaviour on iOS.
This is a known bug with libc++ that was submitted to Bugzilla. The problem as I see it is with std::num_get::do_get()'s double overload somehow continuing to parse the characters a, b, c, d, e, f, i, x, p, n and their capital equivalents despite those being invalid characters for an integral type (other than e, where it denotes scientific notation but must be followed by a numeric value, otherwise the parse fails). Normally do_get() would stop when it finds an invalid character and not set failbit as long as characters were successfully extracted (as explained above).

Writing class object to file using streams

I have this code to serialize/deserialize class objects to file, and it seems to work.
However, I have two questions.
What if, instead of two wstrings (as I have now), I want to have one wstring and one string member
variable in my class? (I think in that case my code won't work?)
Finally, below, in main, when I initialize s2.product_name_= L"megatex"; if instead of megatex I write something in Russian say (e.g., s2.product_name_= L"логин"), the code doesn't work anymore as intended.
What can be wrong? Thanks.
Here is code:
// ConsoleApplication3.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <iostream>
#include <string>
#include <fstream> // std::ifstream
#include <limits> // std::numeric_limits
using namespace std;
// product
struct Product
{
double price_;
double product_index_;
wstring product_name_;
wstring other_data_;
friend std::wostream& operator<<(std::wostream& os, const Product& p)
{
return os << p.price_ << endl
<< p.product_index_ << endl
<< p.product_name_ << endl
<< p.other_data_ << endl;
}
friend wistream& operator>>(std::wistream& is, Product& p)
{
is >> p.price_ >> p.product_index_;
is.ignore(std::numeric_limits<streamsize>::max(), '\n');
getline(is,p.product_name_);
getline(is,p.other_data_);
return is;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
Product s1,s2;
s1.price_ = 100;
s1.product_index_ = 0;
s1.product_name_= L"flex";
s1.other_data_ = L"dat001";
s2.price_ = 300;
s2.product_index_ = 2;
s2.product_name_= L"megatex";
s2.other_data_ = L"dat003";
// write
wofstream binary_file("c:\\test.dat",ios::out|ios::binary|ios::app);
binary_file << s1 << s2;
binary_file.close();
// read
wifstream binary_file2("c:\\test.dat");
Product p;
while (binary_file2 >> p)
{
if(2 == p.product_index_){
cout<<p.price_<<endl;
cout<<p.product_index_<<endl;
wcout<<p.product_name_<<endl;
wcout<<p.other_data_<<endl;
}
}
if (!binary_file2.eof())
std::cerr << "error during parsing of input file\n";
else
std::cerr << "Ok \n";
return 0;
}
What if, instead of two wstrings (as I have now), I want to have one
wstring and one string member variable in my class? (I think in that
case my code won't work?)
There is an inserter defined for char * for any basic_ostream (ostream and wostream), so you can use the result of the c_str() member function call for the string member. For example, if the string member is other_data_:
return os << p.price_ << endl
<< p.product_index_ << endl
<< p.product_name_ << endl
<< p.other_data_.c_str() << endl;
The extractor case is more complex, since you'll have to read as wstring and then convert to string. The simplest way to do this is to read as wstring and then narrow each character:
wstring temp;
getline(is, temp);
p.other_data_ = string(temp.begin(), temp.end());
I'm not using locales in this sample, just converting a sequence of bytes (8 bits) to a sequence of words (16 bits) for output and doing the opposite (truncating values) for input. That is OK if you are using ASCII chars, or using single-byte chars and you don't require a specific format (such as Unicode) for output.
Otherwise, you will need to deal with locales. A locale gives the cultural context needed to interpret the string (remember that it is just a sequence of bytes, not characters in the sense of letters or symbols; the mapping between the bytes and the symbols they represent is defined by the locale). Locale is not a very easy concept to use (human culture isn't either). As you suggest yourself, it would be better to first do some investigation into how it works.
Anyway, the idea is:
Identify the charset used in string and the charset used in file (Unicode or utf-16).
Convert the strings from original charset to Unicode using locale for output.
Convert the wstrings read from file (in Unicode) to strings using locale.
Finally, below, in main, when I initialize s2.product_name_=
L"megatex"; if instead of megatex I write something in Russian say
(e.g., s2.product_name_= L"логин"), the code doesn't work anymore as
intended.
When you define an array of wchar_t using L"", you're not really specifying that the string is Unicode, just that the array is of wchar_t, not char. I suppose the intent is for s2.product_name_ to store the name in Unicode format, but the compiler will take every char in that string (as if written without the L) and convert it to wchar_t, just padding the most significant byte with zeros. Unicode was not well supported in the C++ standard until C++11 (and is still not fully supported). It works for ASCII characters only because they have the same encoding in Unicode (or UTF-8).
To use Unicode characters in a static string, you can use escape sequences: \uXXXX. Doing that for every non-English character is not very comfortable, I know. You can find lists of Unicode characters on multiple sites on the web. For example, on Wikipedia: http://en.wikipedia.org/wiki/List_of_Unicode_characters.

How do you print out the binary representation of a file?

I'm trying to create a compression program but I need to know the basics of how to open a file in binary and print out its contents.
In a text file, called "Tester.txt", I have this:
MJ
In a .cpp file, I have this:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
fstream istr;
istr.open("Tester.txt", ios::binary);
}
From my understanding in the cplusplus reference, this uses a stream object to open the file specified in binary?
But I'm stuck on how exactly I can "print" out the first byte of the file, i.e. the letter M in binary?
I know that M (capital letter) in binary is 01001101.
So how do I do a cout of M in binary?
Thanks
You have a confusion between numbers and representations of numbers, probably created by the fact that the word "binary" can sometimes be used to describe both. When you open a file in "binary mode", that means you see the raw values of the bytes in the file. This has nothing to do with "binary" in the sense of representing numbers in base two.
Say a file has "x" followed by a newline and a return. In "binary mode", you will see that as three byte-size values, one containing the ASCII code for "x", one containing the ASCII code for newline, and one containing the ASCII code for return. These are values that you read from the file. You can represent them in binary, but you can also represent them in decimal or hex, you still have read the exact same values from the file.
Reading a file in "binary" determines the values you read, not how you represent them. Two cars are the same two cars whether you represent the value two as "2" (decimal), "10" (binary), or "two" (English).
Binary input/output on streams is done using their member functions read() and write().
Like this:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
fstream istr;
istr.open("Tester.txt", ios::in | ios::binary); // fstream needs ios::in (or ios::out) stated explicitly
if (istr) {
// Read one byte
char byte;
if (!istr.read(&byte, 1)) {
// Error when reading
}
// Alternative way to read one byte (thanks to DyP)
byte = istr.get();
// Another alternative:
if (!istr.get(byte)) {
// Error when reading.
}
// Read a block of bytes:
char buffer[1024];
if (!istr.read(buffer, 1024)) {
// Read error, or EOF reached before 1024 bytes were read.
}
}
}
Here is a quick program which uses the C++ Standard Library to do all the heavy lifting.
#include <iostream>
#include <iterator>
#include <bitset>
#include <algorithm>
int main() {
std::istreambuf_iterator< char > in( std::cin ), in_end;
std::ostream_iterator< std::bitset< 8 > > out( std::cout, " " );
std::copy( in, in_end, out );
std::cout << '\n';
}
See it run. I used std::cin for demonstration, but you should open a file with std::ios::binary and pass that instead.
Since each variable is only used once, this could all be done on one line. Even if you open the file instead of using std::cin.
EDIT:
std::copy is a function encapsulating the loop for ( ; in != in_end; ++ in ) * out ++ = * in;.
The type std::istreambuf_iterator either takes an istream constructor argument and provides an iterator in suitable for such a loop, or takes no constructor argument and provides an iterator in_end such that in == in_end if in.eof() == true. The iterator gets unformatted bytes (type char) from the stream.
The type std::ostream_iterator< std::bitset< 8 > > provides an iterator out so * out ++ = x converts x to std::bitset< 8 > and prints the result. In this case x is a byte and bitset provides a constructor for such a byte value, and overloads operator<< to print a binary representation of 1's and 0's.
To output a value in binary you need to do it manually, as the standard streams have no manipulator for that output format.
int mask = 0x80;
while(mask)
{
std::cout << (byteValue & mask ? '1' : '0');
mask >>= 1;
}
std::cout << std::endl;
This will scan from the top bit to the low bit and print out a value representing each one.
Try this:
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <bitset>
#include <iomanip>
int main()
{
// Set up your objects.
char c;
std::ifstream istr("Tester.txt", std::ios::binary);
unsigned long loc = 0;
// Read the file one character at a time.
// Remembering not to skip white space in this situation.
for(;istr >> std::noskipws >> c;++loc)
{
// When printing compensate for non printable characters.
// If the character is outside the printable ASCII range then print it as an integer.
std::stringstream charStr;
if ((c < 32) || (c > 126))
{
charStr << "Non Printable: " << static_cast<int>(c);
}
else
{
charStr << "Char: " << c;
}
// Print the value and location in a nicely formatted way.
std::cout << std::setw(16) << loc
<< " : "
<< std::bitset<8>(c).to_string() // Prints the character as an 8 bit binary value.
<< " : "
<< charStr.str()
<< "\n";
}
}
But there are standard tools that do this already:
Look at od