Mismatch between characters put and read - c++

I'm trying to write a Huffman encoder but I'm getting some compression errors. I identified the problem as mismatches between characters that were put() to the ofstream and the characters read() from the same file.
One specific instance of this problem :
The put() writes ASCII character 10 (Line feed)
The read() reads ASCII character 13 (Carriage return)
I thought read() and put() operate on raw data (no character translation), so I'm not sure why this is happening. Can someone help me out?
Here is the ofstream instance for writing the compressed file:
std::ofstream compressedFileStream(getCompressedFileName(),std::ios::binary||std::ios::ate);
and the ifstream instance for reading the same
std::ifstream fileInput(getFileName()+".huf",std::ios::binary);
The code is running on Windows 7 and all streams in the program are opened in binary mode.

Not opening in binary mode due to a typo:
std::ofstream compressedFileStream(getCompressedFileName(),std::ios::binary||std::ios::ate)
should be:
std::ofstream compressedFileStream(getCompressedFileName(),std::ios::binary|std::ios::ate)
// ^
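Use |, not ||. The logical || evaluates to the bool value true instead of combining the two flags, so std::ios::binary is never passed to the stream.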
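To make the difference between the two operators concrete, here is a minimal sketch. The helper names (has_binary, logical_or, bitwise_or) are made up for illustration and are not part of the question's code.

```cpp
#include <ios>

// Hypothetical helper: true when the given mode has the binary bit set.
bool has_binary(std::ios::openmode mode) {
    return (mode & std::ios::binary) == std::ios::binary;
}

// What the typo computes: a logical OR, which only yields true or false
// and discards the individual flag bits.
bool logical_or(std::ios::openmode a, std::ios::openmode b) {
    return a || b;
}

// What was intended: a bitwise OR that keeps the bits of both flags.
std::ios::openmode bitwise_or(std::ios::openmode a, std::ios::openmode b) {
    return a | b;
}
```

Calling logical_or(std::ios::binary, std::ios::ate) gives a plain bool, while has_binary(bitwise_or(std::ios::binary, std::ios::ate)) confirms that the binary bit survives the bitwise version.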

The symptoms suggest that the ofstream is being created in text mode, or from a file descriptor that was opened in text mode.
Pass std::ios::binary at construction time; otherwise the stream is opened in text mode, which on Windows translates line endings.
Now that you have added the code, the cause turns out to be a typo:
std::ios::binary||std::ios::ate
should be
std::ios::binary|std::ios::ate

On Windows, if you are writing binary data, you need to open the file in binary mode (std::ios::binary).
Similarly, if you are reading binary data, the input stream must also be opened in binary mode.

Related

C++ - Missing end of line characters in file read

I am using C++ streams to read in a bunch of files in a directory and then write them to another directory. Since these files may be of different types, I am using the generic ios::binary flag when reading/writing these files.
Example code below:
std::fstream inf( "ex.txt", std::ios::in | std::ios::binary);
char c;
while( inf >> c ) {
// writing to another file in binary format
}
The issue I have is that in the case of files containing text, the end of line characters in these text files are not being written to the output file.
Edit: Or at least they do not appear to be as when the newly written file is opened, there is only a single continuous line of characters.
Edit again: The problem (of the continuous string) appears to persist even when the read / write is made in text mode.
Thus, I was wondering if there was a way to check if a file has text or binary and then read/write it appropriately. Else, is there any way to preserve the end of line characters even when opening the file in binary format?
Edit: I am using the g++ 4.8.2 compiler
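When you want to manipulate raw bytes, use the read and write methods (or get and put), not the >> and << operators. By default, operator>> skips whitespace, including newline characters, even on a stream opened in binary mode.
You can get the intended behavior from >> with inf.flags(inf.flags() & ~std::ios_base::skipws);, though.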
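A small sketch of the difference, using string streams so it runs without any files. The function names read_with_extraction and read_with_get are hypothetical, chosen only for this illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads with operator>>, which skips whitespace (including '\n')
// unless the skipws flag has been cleared on the stream.
std::string read_with_extraction(std::istream& in) {
    std::string out;
    char c;
    while (in >> c) out.push_back(c);
    return out;
}

// Reads with get(), an unformatted input function that returns
// every character, whitespace included.
std::string read_with_get(std::istream& in) {
    std::string out;
    char c;
    while (in.get(c)) out.push_back(c);
    return out;
}
```

With the input "ab cd\nef\n", the first function drops the space and both newlines, while the second returns the text unchanged. Applying std::noskipws to the stream before calling read_with_extraction should also preserve the whitespace.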

Why am i getting these invalid characters before my file data?

I am trying to read a file into a string, either with the getline function or with fileContents.assign( (istreambuf_iterator<char>(myFile)), (istreambuf_iterator<char>()));
Either way gives me the output shown in the image.
First way:
string fileContents;
ifstream myFile("textFile.txt");
while(getline(myFile,fileContents))
cout<<fileContents<<endl;
Alternate way:
string fileContents;
ifstream myFile(fileName.c_str());
if (myFile.is_open())
{
fileContents.assign( (istreambuf_iterator<char>(myFile) ),
(istreambuf_iterator<char>() ) );
cout<<fileContents;
}
The file begins with those characters, most likely a BOM to tell you what the encoding of the file is.
You probably are not able to see them in Windows Notepad because Notepad hides the encoding bytes. Get a decent text editor that lets you see the binary of the file and you will see those characters.
Your file starts with a UTF-8 BOM (bytes 0xEF 0xBB 0xBF). You are reading the file's raw bytes as-is and outputting them to a display that is using an OEM font for codepage 437. To handle text files properly, especially Unicode-encoded text files, you need to read the first few bytes, check for a BOM (and there are several you can look for), and if detected then seek past the BOM and interpret the remaining bytes of the file in the specified encoding, in this case UTF-8.
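As a rough sketch of the BOM check described above, the following handles only the UTF-8 BOM and works on a string that has already been read from the file. The name strip_utf8_bom is invented for this example, and other BOMs (UTF-16, UTF-32) would need their own checks.

```cpp
#include <string>

// Hypothetical helper: returns the text without a leading
// UTF-8 BOM (bytes 0xEF 0xBB 0xBF), if one is present.
std::string strip_utf8_bom(const std::string& text) {
    static const std::string bom = "\xEF\xBB\xBF";
    if (text.compare(0, bom.size(), bom) == 0) {
        return text.substr(bom.size());
    }
    return text;  // no BOM found, return the input unchanged
}
```

Applied to the fileContents string from the question after it has been filled, this would remove the three leading bytes before the text is printed.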

c++ getline reads entire file

I'm using std::getline() to read from a text file, line by line. However, the first call to getline is reading in the entire file! I've also tried specifying the delimiter as '\n' explicitly. Any ideas why this might be happening?
My code:
std::ifstream serialIn;
...
serialIn.open(argv[3]);
...
std::string tmpStr;
std::getline(serialIn, tmpStr, '\n');
// All 570 lines in the file are in tmpStr!
...
std::string serialLine;
std::getline(serialIn, serialLine);
// serialLine == "" here
I am using Visual Studio 2008. The text file has 570 lines (I'm viewing it in Notepad++ fwiw).
Edit: I worked around this problem by using Notepad++ to convert the line endings in my input text file to "Windows" line endings. The file was written with '\n' at the end of each line, using c++ code. Why would getline() require the Windows line endings (\r\n)?? Does this have to do with character width, or Microsoft implementation?
Just guessing, but could your file have Unix line-endings and you're running on Windows?
You're confusing the newline you see in code ('\n') with the actual line-ending representation for the platform (some combination of carriage-return (CR) and linefeed (LF) bytes).
The standard I/O library functions automatically convert line-endings for your platform to and from conceptual newlines for text-mode streams (the default). See What's the difference between text and binary I/O? from the comp.lang.c FAQ. (Although that's from the C FAQ, the concepts apply to C++ as well.) Since you're on Windows, the standard I/O functions by default write newlines as CR-LF and expect CR-LF for newlines when reading.
If you don't want these conversions done and would prefer to see the raw, unadulterated data, then you should set your streams to binary mode. In binary mode, \n corresponds to just LF, and \r corresponds to just CR.
In C, you can specify binary mode by passing "b" as one of the flags to fopen:
FILE* file = fopen(filename, "rb"); // Open a file for reading in binary mode.
In C++:
std::ifstream in;
in.open(filename, std::ios::binary);
or:
std::ifstream in(filename, std::ios::binary);
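One consequence of binary mode is that a file with CR-LF endings leaves a trailing '\r' on each line returned by std::getline, because only the LF is treated as the delimiter. A minimal sketch of one way to cope with that, assuming a hypothetical helper named read_lines:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits the stream on '\n' and removes a single
// trailing '\r', so both LF and CR-LF line endings give the same result.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        lines.push_back(line);
    }
    return lines;
}
```

The same function can be handed a std::ifstream opened with std::ios::binary; the string stream is only used here to keep the example independent of any file.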

Problem with iostream, my output endl are littles squares

I have a problem with my output: when I write to a file, I get squares wherever I use endl to change lines.
std::ofstream outfile (a_szFilename, std::ofstream::binary);
outfile<<"["<<TEST<<"]"<<std::endl;
I get something like this in my file, and the subsequent output is written on the same line instead of the next one.
[TEST]square
Apparently I can't write the square here, but does this have something to do with the ofstream being binary?
You don't really want to open the file in binary mode in this case.
Try this instead:
std::ofstream outfile (a_szFilename);
outfile<<"["<<TEST<<"]"<<std::endl;
You're opening the file in binary mode. In this case std::endl is written as \n, while a newline on Windows is supposed to be \r\n.
To open your file in text mode, just omit the binary flag; the translation will then be done automatically:
std::ofstream outfile(a_szFilename);
outfile<<"["<<TEST<<"]"<<std::endl;
It's probably because you're in binary mode and the line endings are wrong. std::endl will place '\n' on the stream before flushing. In text mode, this will be converted to the correct line ending for your platform. In binary mode, no such conversions take place.
If you're on Windows, your code will write a line feed (LF), but Windows also requires a carriage return (CR) first, which is '\r'. That is, it wants "\r\n", not just a newline.
Your fix is to open the file in text mode. Binary files are not supposed to contain newlines or formatted output, which is why you don't want to use the extraction and insertion operators with them.
If you really want to use binary, then treat your file like a binary file and don't expect it to display properly. Binary and formatted output do not go hand in hand. From your usage, it seems you should be opening in text mode.
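If the stream has to stay in binary mode for some other reason, one option is to write the Windows line ending explicitly instead of relying on std::endl. This is only a sketch, and write_line_crlf is a made-up name:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes the text followed by an explicit CR-LF pair.
// Intended for streams where no newline translation takes place.
void write_line_crlf(std::ostream& out, const std::string& text) {
    out << text << "\r\n";
}
```

This should not be used on a text-mode file stream on Windows, because the '\n' would be translated again and the file would end up with "\r\r\n".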

Why does the C++ ofstream write() method modify my raw data?

I have a jpeg image in a char[] buffer in memory, all I need to do is write it out to disk exactly as is. Right now I'm doing this
ofstream ofs;
ofs.open(filename);
ofs.write(buffer, bufferLen);
ofs.close();
but the image doesn't come out right, it looks garbled with random black and white stripes everywhere. After comparing the image with the original in a hex viewer, I found out that the ofstream is modifying the data when it thinks I'm writing a newline character. Anyplace that 0x0A shows up in the original, the ofstream writes as two bytes: 0x0D0A. I have to assume the ofstream is intending to convert from LF only to CRLF, is there a standard way to get it to not do this?
Set the mode to binary when you open the file:
http://www.cplusplus.com/reference/iostream/ofstream/ofstream/
You should set the file mode to binary when you are opening it:
std::ofstream file;
file.open("filename.jpg", std::ios_base::out | std::ios_base::binary);
This way the stream doesn't try to adjust the newlines to your native text format.
Try opening the ofstream as binary.
Something like this should work:
ofstream ofs;
ofs.open(filename, ios::out | ios::binary);
ofs.write(buffer, bufferLen);
ofs.close();
Since you are not opening the file in binary mode, it is opened in text mode by default. In text mode, your implementation converts end-of-line characters as you describe.
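A quick way to check that nothing is being translated is to write a buffer containing 0x0A and 0x0D in binary mode, read it back in binary mode, and compare. The sketch below assumes two hypothetical helpers, write_bytes and read_bytes, and a file name chosen by the caller.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: writes the buffer unchanged; returns false on failure.
bool write_bytes(const std::string& path, const std::vector<char>& data) {
    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out) return false;
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();
    return out.good();
}

// Hypothetical helper: reads the whole file back as raw bytes.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

If the vector returned by read_bytes compares equal to the one passed to write_bytes, both streams handled the data without newline conversion.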
I wish I could get my version to write anything at all... no errors, no complaints, nothing wrong when you debug it but it doesn't even try and create the file.
What the hell was wrong with fopen, fwrite and fclose... I never had a problem with them