When I use this code
std::string filename = "tmp.bin";
std::ifstream fileStream;
std::vector<unsigned char> fileBuffer;
fileStream = std::ifstream(filename.c_str(), std::ios::binary | std::ios::ate);
fileBuffer.reserve(fileStream.tellg());
fileStream.seekg(0, std::ios::beg);
fileBuffer.insert(fileBuffer.begin(), std::istream_iterator<BYTE>(fileStream), std::istream_iterator<BYTE>());
all whitespace bytes in my binary file are skipped, so fileBuffer contains no spaces, but I need every byte for Base64 encoding.
What is wrong here?
You need to use std::istreambuf_iterator<char>. istream_iterator uses operator>> to extract data, which for char and unsigned char skips whitespace by default.
Side note: filebufs in C++ are defined in terms of the C standard, which has the following to say in a note regarding seeking to the end of binary files:
Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.
It'll probably work fine regardless, but unless the reallocations are a serious issue you should just read the file in one shot:
std::ifstream fileStream("tmp.bin", std::ios::binary);
std::vector<char> fileBuffer{
std::istreambuf_iterator<char>(fileStream),
std::istreambuf_iterator<char>()
};
Older C++ (before C++11 brace initialization) needs extra parentheses to avoid the most vexing parse:
std::vector<char> fileBuffer(
(std::istreambuf_iterator<char>(fileStream)),
std::istreambuf_iterator<char>()
);
If your library has char_traits for unsigned char you could also use std::basic_ifstream<unsigned char>, although this isn't portable. You can always convert to unsigned char later anyway, depending on what you need.
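The "convert to unsigned char later" route can be folded into the read itself. This is a minimal sketch, not the answerer's code: readBytes is a hypothetical helper name, and it assumes that an unopenable file should simply yield an empty vector.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file as raw bytes.
// istreambuf_iterator<char> pulls characters straight from the stream buffer,
// so whitespace bytes are kept; each char is converted to unsigned char
// as it is copied into the vector.
std::vector<unsigned char> readBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

The resulting vector can be handed to a Base64 encoder directly, since no byte is dropped on the way in.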
Related
I'm coding LZ77 compression algorithm, and I have trouble storing unsigned chars in a string. To compress any file, I use its binary representation and then read it as chars (because 1 char is equal to 1 byte, afaik) to a std::string. Everything works perfectly fine with chars. But after some time googling I learned that char is not always 1 byte, so I decided to swap it for unsigned char. And here things start to get tricky:
When compressing plain .txt, everything works as expected, I get equal files before and after decompression (I assume it should, because we basically work with text before and after byte conversion)
However, when trying to compress .bmp, decompressed file loses 3 bytes compared to input file (I lose these 3 bytes when trying to save unsigned chars to a std::string)
So, my question is: is there a way to properly save unsigned chars to a string?
I tried to use typedef basic_string<unsigned char> ustring and swap all related functions for their basic alternatives to use with unsigned char, but I still lose 3 bytes.
UPDATE: I found out that the 3 bytes (symbols) are lost not because of std::string, but because of std::istream_iterator, which I use instead of std::istreambuf_iterator to create a string of unsigned chars (because std::istreambuf_iterator's argument is char, not unsigned char).
So, are there any solutions to this particular problem?
Example:
std::vector<char> tempbuf(std::istreambuf_iterator<char>(file), {}); // reads 112782 symbols
std::vector<char> tempbuf(std::istream_iterator<char>(file), {}); // reads 112779 symbols
Sample code:
void LZ77::readFileUnpacked(std::string& path)
{
std::ifstream file(path, std::ios::in | std::ios::binary);
if (file.is_open())
{
// Works just fine with char, but loses 3 bytes with unsigned
std::string tempstring = std::string(std::istreambuf_iterator<char>(file), {});
file.close();
}
else
throw std::ios_base::failure("Failed to open the file");
}
char in all of its forms (and std::byte, which is isomorphic with unsigned char) is always the smallest possible type that a system supports. The C++ standard defines that sizeof(char) and its variations shall always be exactly 1.
"One" what? That's implementation-defined. But every type in the system will be some multiple of sizeof(char) in size.
So you shouldn't be too concerned about systems where char is not an 8-bit byte. If you're working on a system where CHAR_BIT isn't 8, then that system can't handle 8-bit bytes directly at all, so unsigned char won't be any different or better for this purpose.
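If the code really does depend on 8-bit bytes, that assumption can be stated at compile time instead of changing the character type. A small sketch follows; charIsOctet is a hypothetical helper name used only for illustration.

```cpp
#include <climits>

// Guaranteed by the standard on every implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(unsigned char) == 1, "same for unsigned char");
static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");

// Hypothetical helper: true when a char is exactly an 8-bit byte.
constexpr bool charIsOctet()
{
    return CHAR_BIT == 8;
}
```

A compression routine that assumes octets could add static_assert(charIsOctet(), "...") so that an unusual platform fails to compile rather than misbehaving.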
As to the particulars of your problem, istream_iterator is fundamentally different from istreambuf_iterator. The purpose of the latter is to allow iterator access to the actual stream as a sequence of characters. The purpose of istream_iterator<T> is to allow access to a stream as if by performing a repeated sequence of operator>> calls with a T value.
So if you're using istream_iterator<char>, you're saying that you want to read the stream as if you did stream >> some_char; for each iterator access. That isn't equivalent to accessing the stream's characters directly. Specifically, FormattedInputFunctions like operator>> can do things like skip whitespace, depending on how you set up your stream.
istream_iterator reads using operator>>, which by default skips whitespace as part of its function. If you want to disable that behavior, you'll have to do:
#include <ios>
file >> std::noskipws;
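The difference between the two iterators, and the effect of std::noskipws, can be seen without touching a file. The sketch below uses std::istringstream as a stand-in for the file stream; readFormatted and readUnformatted are hypothetical helper names.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: collects what istream_iterator<char> yields from `bytes`,
// optionally clearing the skipws flag first.
std::string readFormatted(const std::string& bytes, bool keepWhitespace)
{
    std::istringstream in(bytes);
    if (keepWhitespace)
        in >> std::noskipws;  // operator>> no longer discards leading whitespace
    return std::string(std::istream_iterator<char>(in),
                       std::istream_iterator<char>());
}

// Hypothetical helper: the same data read through istreambuf_iterator,
// which bypasses formatted input and is unaffected by the skipws flag.
std::string readUnformatted(const std::string& bytes)
{
    std::istringstream in(bytes);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For the input "a b\nc\t", readFormatted(input, false) drops the three whitespace bytes, while the other two calls return the input unchanged. That matches the 3 missing bytes reported in the question if the .bmp happened to contain three bytes with whitespace values.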
I am trying to append into file in binary mode but the logic below is not working.
For PDF files, the file gets corrupted, and for text files, some junk data is added in addition to my file contents.
My variable m_strReceivedMessage is of type std::string.
std::ofstream out(file, std::ios::binary | std::ios_base::app );
int i = sizeof(m_strReceivedMessage);
if (out.is_open()) {
// out.write(m_strReceivedMessage.c_str(), m_strReceivedMessage.size());
//out << m_strReceivedMessage;
out.write(reinterpret_cast<char *>(&m_strReceivedMessage), m_strReceivedMessage.size());
}
You're writing the memory of the std::string object itself, rather than the character buffer that it contains. To get a pointer to the character buffer, see the data() member function. Hint: The fact that you need to cast std::string* using reinterpret_cast<char*> is a dead giveaway that you're doing something very wrong.
Also, I'm not familiar with the PDF spec, but I suspect that it may possibly contain nul bytes. And depending on how you get your std::string, it's possible you may have missed any content after the first nul. std::vector<char> would be more appropriate way to store binary data.
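A minimal sketch of the corrected append, assuming the message is already held in a std::string; appendBytes and its bool return convention are hypothetical, not part of the question's code.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: appends the bytes held by `message` to the file at `path`.
// Returns false if the file could not be opened or the write failed.
bool appendBytes(const std::string& path, const std::string& message)
{
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out)
        return false;
    // data() points at the characters; size() counts them, embedded '\0' included.
    out.write(message.data(), static_cast<std::streamsize>(message.size()));
    out.flush();  // push the data out so the stream state reflects the write
    return static_cast<bool>(out);
}
```

This is equivalent to the commented-out out.write(m_strReceivedMessage.c_str(), m_strReceivedMessage.size()) line in the question, which was already correct.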
I've been trying to compress strings and save them to text files, then read the data and decompress it. When I try to decompress the read string, however, I get a Z_BUF_ERROR (-5) and the string may or may not decompress.
In the console, I can compress/decompress all day:
std::string s = zlib_compress("HELLO asdfasdf asdf asdfasd f asd f asd f awefo#8 892y*(#Y");
std::string e = zlib_decompress(s);
The string e will return the original string with no difficulty.
However, when I do this:
zlib_decompress(readFile(filename));
I get a Z_BUF_ERROR. I think it might be due in part to hidden characters in files, but I'm not really sure.
Here's my readFile function:
std::string readFile(std::string filename)
{
std::ifstream file;
file.open(filename.c_str(), std::ios::binary);
file.seekg (0, std::ios::end);
int length = file.tellg();
file.seekg (0, std::ios::beg);
char * buffer = new char[length];
file.read(buffer, length);
file.close();
std::string data(buffer);
return data;
}
When I write the compressed data, I use:
void writeFile(std::string filename, std::string data)
{
std::ofstream file;
file.open(filename.c_str(), std::ios::binary);
file << data;
file.close();
}
If needed, I'll show the functions I use to de/compress, but if it works without the File IO, I feel that the problem is an IO problem.
First, you're dealing with binary data that might or might not have embedded null characters. std::string isn't really the correct container for that, although you can handle embedded null characters if you do it correctly. However, using a std::string to store something documents a certain expectation and you're breaking that convention.
Second, the line std::string data(buffer); isn't doing what you think it does: that is the constructor for building a string from a null-terminated C string. You're dealing with binary data here, so either you don't get the full buffer into the string because it encounters a null byte in the middle of the buffer, or it runs off the end of the buffer until it finds a null (or causes a segfault). If you absolutely, positively must use a std::string, use the "correct" constructor, which would be std::string data(buffer, length);.
All that said, you are using the wrong data structure. What you want is a dynamic array of char/unsigned char, which is a std::vector, not a std::string. As an aside, you should also pass the parameters to readFile and writeFile by const reference. The code that you wrote makes copies of the strings, and if the buffer you pass into writeFile() is large, that leads to an unnecessary hit in memory consumption and performance.
As the file might contain '\0' characters, you should specify the size when you assign the content to the std::string.
std::string data(buffer, length);
For what it's worth, here's how you could alter readFile() and writeFile():
std::vector<char> readFile(const std::string& filename)
{
std::ifstream file;
file.open(filename.c_str(), std::ios::binary);
file.seekg (0, std::ios::end);
const int length = file.tellg();
file.seekg (0, std::ios::beg);
std::vector<char> data(length);
file.read(&data[0], length);
file.close();
return data;
}
void writeFile(const std::string& filename, const std::vector<char>& data)
{
std::ofstream file;
file.open(filename.c_str(), std::ios::binary);
file.write(&data[0], data.size());
file.close();
}
Then you would also change your compress() and decompress() functions to work with std::vector<char>. Also note that so far the code is lacking any error handling. For example, what happens if the file doesn't exist? After calling file.open() you can check for any error by doing if (!file) { /* error handling */ }.
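A sketch of what those checks might look like. The function names readFileChecked and writeFileChecked are hypothetical, and throwing std::runtime_error is just one possible error policy. The read also avoids seekg/tellg by reading until end-of-file.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical variant of readFile(): throws if the file cannot be opened.
std::vector<char> readFileChecked(const std::string& filename)
{
    std::ifstream file(filename, std::ios::binary);
    if (!file)
        throw std::runtime_error("cannot open " + filename);
    // Read until end-of-file instead of relying on seekg/tellg for the size.
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}

// Hypothetical variant of writeFile(): throws on open or write failure.
void writeFileChecked(const std::string& filename, const std::vector<char>& data)
{
    std::ofstream file(filename, std::ios::binary);
    if (!file)
        throw std::runtime_error("cannot open " + filename);
    // data.data() is valid even for an empty vector, unlike &data[0].
    file.write(data.data(), static_cast<std::streamsize>(data.size()));
    if (!file)
        throw std::runtime_error("write failed for " + filename);
}
```

With these, zlib_decompress would receive exactly the bytes that were written, including any '\0' bytes in the compressed stream.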
I'm trying to create a binary file in the following way:
string buf;
...
buf += filename.length();
buf += filename;
etc. So first I write the length in binary format, but how do I convert it into a 4-byte char array, or a 2-byte one, etc.? Basically I want to achieve the same functionality as this would:
int len = filename.length();
fwrite(&len, sizeof(len), 1, fp);
Which works fine, but having it in one string might be easier to process.
Edit: I don't want to use streams or vector; I'm trying to find out if it's possible with strings.
Streams are the way to go. Not strings.
Use a vector for holding the data, or write it straight to the file (via streams)
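For completeness, the string-only route asked about is possible. Note that buf += filename.length() appends a single char (the length converted to char), not 4 bytes. The sketch below appends the integer's raw bytes instead. appendWithLength and readLength are hypothetical helper names, and a fixed 32-bit prefix in native byte order is an assumption chosen to mirror the fwrite() call in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: appends a 4-byte length prefix followed by the name itself.
// The prefix uses the machine's native byte order, like fwrite(&len, ...) would.
void appendWithLength(std::string& buf, const std::string& name)
{
    const std::uint32_t len = static_cast<std::uint32_t>(name.size());
    char raw[sizeof len];
    std::memcpy(raw, &len, sizeof len);  // copy the integer's object representation
    buf.append(raw, sizeof raw);         // (pointer, count) overload keeps '\0' bytes
    buf += name;
}

// Hypothetical helper: reads the 4-byte prefix stored at `offset`.
std::uint32_t readLength(const std::string& buf, std::size_t offset)
{
    std::uint32_t len = 0;
    std::memcpy(&len, buf.data() + offset, sizeof len);
    return len;
}
```

Because the prefix is in native byte order, a file written this way is only readable on machines with the same endianness, which is equally true of the fwrite() version.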
Simply use std::vector<unsigned char> and an istream or ostream iterator to read/write data to/from the vector. Note that istream_iterator reads through operator>>, which skips whitespace bytes unless the skipws flag is cleared first. For instance, to read from a file you can do:
vector<unsigned char> binary_buffer;
ifstream in_file("my_binary_file.bin", ios_base::binary | ios_base::in);
in_file.unsetf(ios_base::skipws); // keep whitespace bytes; operator>> drops them by default
istream_iterator<unsigned char> end_of_file;
istream_iterator<unsigned char> in_file_iter(in_file);
while (in_file_iter != end_of_file)
{
binary_buffer.push_back(*in_file_iter++);
}
Output would be even simpler:
ofstream out_file("another_binary_file.bin", ios_base::binary | ios_base::out);
ostream_iterator<unsigned char> binary_output(out_file);
copy(binary_buffer.begin(), binary_buffer.end(), binary_output);
Yes, it is possible to do this, because this is C++, and everything is possible in C++.
First, here is how you do it, then I'll answer the question of why you might need to do it:
std::ifstream input( "my.png", std::ios::binary );
std::vector<unsigned char> buffer(std::istreambuf_iterator<char>(input), {});
int buffSize = buffer.size();
std::string myString(buffer.begin(), buffer.end());
In my case, I was using a framework where the HTTP Client only supported a string for post message body, but I needed to post raw binary of a file to the service I was using. I could either mess around with the internals of the library (bad) or introduce another unnecessary dependency for a simple file post (also not so good).
But since strings can hold this data, I was able to read the file, copy it to a string and then upload.
This is an unfortunate situation to be in, either way, but in some cases it is helpful to be able to do. Usually you would want to do something different. This isn't the clearest for someone else to read, and you will have some performance penalties from the copy, but it works; and it helps to understand the internals of why it works. std::string saves the data contiguously internally in binary format, like vector does, and can be initialized from iterators in the vector.
unsigned char is one byte long, so it is like a byte value. The string copies the bytes into itself, so you end up with the exact data.
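The copy works in both directions. A small sketch with hypothetical helper names toString and toBytes:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies raw bytes into a std::string, '\0' bytes included.
std::string toString(const std::vector<unsigned char>& bytes)
{
    return std::string(bytes.begin(), bytes.end());
}

// Hypothetical helper: copies a string's characters back out as raw bytes.
std::vector<unsigned char> toBytes(const std::string& text)
{
    return std::vector<unsigned char>(text.begin(), text.end());
}
```

The string's size() reports the full byte count, but anything that treats c_str() as a C string stops at the first '\0', so the data should be passed on together with its size.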
I am reading a binary file as:
const size_t stBuffer = 256;
char buffer[stBuffer];
std::wstring wPath(L"blah");
std::wifstream ifs(wPath.c_str(), std::wifstream::in | std::wifstream::binary);
while (ifs.good())
{
ifs.read(buffer, sizeof(buffer));
...
}
But I am realizing this is not a true binary read. The wifstream actually reads a byte and converts it to a wide char. So if the binary file has the content 0x112233...ff, I actually read 0x110022003300...ff00.
This doesn't make much sense to me: first, I only need to use a wide fstream because the file name is non Latin. Second, if I say the fstream is binary, why does read read wide chars? The code below does what I want. Is there a way to achieve that using std fstreams?
FILE* ifs = _wfopen(L"blah", L"rb");
while (!feof(ifs))
{
size_t numBytesRead = fread(buffer, 1, sizeof(buffer), ifs);
...
}
The current C++ standard doesn't provide wide char paths. Even the wchar_t version receives a regular const char* filename. You already used a compiler extension, so continue using this extension with a normal ifstream:
std::wstring wPath(L"blah");
std::ifstream ifs(wPath.c_str(), std::ios::in | std::ios::binary);
EDIT: consider using utf-8 strings instead of wide strings and use Boost.Nowide (not yet in boost) to open files.
EDIT: Boost.Nowide was accepted in boost. Also Windows 10 added support for UTF-8 in its narrow-string API, which can be enabled through a manifest. This makes all the wide-char interfaces pretty much unportable and redundant.
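Since C++17 there is also a standard route that postdates the answer above: the file stream constructors accept a std::filesystem::path, and a path can be constructed from a wide string. A minimal sketch, assuming a C++17 compiler; readAllBytes is a hypothetical helper name.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical helper: opens a file by std::filesystem::path (C++17) and reads raw bytes.
// The path may come from a wide string, while the stream itself stays a narrow,
// byte-oriented std::ifstream, so no byte-to-wide-char conversion takes place.
std::vector<char> readAllBytes(const std::filesystem::path& path)
{
    std::ifstream ifs(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(ifs),
                             std::istreambuf_iterator<char>());
}
```

A call such as readAllBytes(std::filesystem::path(L"blah")) keeps the wide file name from the question while reading the content byte for byte.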