How to push binary data into std::string?

I'm trying to create a binary file in the following way:
string buf;
...
buf += filename.length();
buf += filename;
and so on. So first I write the length in binary format, but how do I convert it into a 4-byte char array (or 2-byte, etc.)? Basically, I want to achieve the same result as this would:
int len = filename.length();
fwrite(&len, sizeof(len), 1, fp);
Which works fine, but having it all in one string might be easier to process.
Edit: I don't want to use streams or vector; I'm trying to find out whether it's possible with strings.

Streams are the way to go. Not strings.

Use a vector for holding the data, or write it straight to the file (via streams)

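Simply use std::vector<unsigned char> and an istream or ostream iterator to read/write data to/from the vector. Note that istream_iterator reads through operator>>, which skips whitespace by default, so clear the skipws flag first or whitespace bytes (0x20, 0x0A and so on) will be silently dropped. For instance, to read from a file you can do:
vector<unsigned char> binary_buffer;
ifstream in_file("my_binary_file.bin", ios_base::binary | ios_base::in);
in_file.unsetf(ios_base::skipws); // keep whitespace bytes; operator>> skips them by default
istream_iterator<unsigned char> end_of_file;
istream_iterator<unsigned char> in_file_iter(in_file);
while (in_file_iter != end_of_file)
{
    binary_buffer.push_back(*in_file_iter++);
}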
Output would be even simpler:
ofstream out_file("another_binary_file.bin", ios_base::binary | ios_base::out);
ostream_iterator<unsigned char> binary_output(out_file);
copy(binary_buffer.begin(), binary_buffer.end(), binary_output);

Yes, it is possible to do this, because this is C++, and everything is possible in C++.
First, here is how you do it, then I'll answer the question of why you might need to do it:
std::ifstream input( "my.png", std::ios::binary );
std::vector<unsigned char> buffer(std::istreambuf_iterator<char>(input), {});
int buffSize = buffer.size();
std::string myString(buffer.begin(), buffer.end());
In my case, I was using a framework where the HTTP Client only supported a string for post message body, but I needed to post raw binary of a file to the service I was using. I could either mess around with the internals of the library (bad) or introduce another unnecessary dependency for a simple file post (also not so good).
But since strings can hold this data, I was able to read the file, copy it to a string and then upload.
This is an unfortunate situation to be in, either way, but in some cases it is helpful to be able to do. Usually you would want to do something different. This isn't the clearest for someone else to read, and you will have some performance penalties from the copy, but it works; and it helps to understand the internals of why it works. std::string saves the data contiguously internally in binary format, like vector does, and can be initialized from iterators in the vector.
unsigned char is one byte long, so it is like a byte value. The string copies the bytes into itself, so you end up with the exact data.
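Since a std::string can hold arbitrary bytes, the length-prefix layout from the question can be built directly in a string. The following is only a sketch under assumptions: the helper names are hypothetical, the length is fixed at 4 bytes, and the byte order is chosen as little-endian so the result does not depend on the host.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: append a 32-bit value as 4 bytes, least significant first.
void append_u32_le(std::string& buf, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8) {
        buf.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
}

// Hypothetical helper: a 4-byte length prefix followed by the raw bytes of `field`.
void append_length_prefixed(std::string& buf, const std::string& field) {
    append_u32_le(buf, static_cast<std::uint32_t>(field.size()));
    buf.append(field.data(), field.size());
}
```

To mirror the fwrite(&len, sizeof(len), 1, fp) call exactly (host byte order, sizeof(int) bytes), buf.append(reinterpret_cast<const char*>(&len), sizeof(len)) would do the same thing, at the cost of a layout that differs between platforms.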

Related

How to use std::string to store bytes (unsigned chars) in a right way?

I'm coding LZ77 compression algorithm, and I have trouble storing unsigned chars in a string. To compress any file, I use its binary representation and then read it as chars (because 1 char is equal to 1 byte, afaik) to a std::string. Everything works perfectly fine with chars. But after some time googling I learned that char is not always 1 byte, so I decided to swap it for unsigned char. And here things start to get tricky:
When compressing plain .txt, everything works as expected, I get equal files before and after decompression (I assume it should, because we basically work with text before and after byte conversion)
However, when trying to compress .bmp, decompressed file loses 3 bytes compared to input file (I lose these 3 bytes when trying to save unsigned chars to a std::string)
So, my question is – is there a way to properly save unsigned chars to
a string?
I tried to use typedef basic_string<unsigned char> ustring and swap all related functions for their basic alternatives to use with unsigned char, but I still lose 3 bytes.
UPDATE: I found out that 3 bytes (symbols) are lost not because of
std::string, but because of std::istream_iterator (that I use instead
of std::istreambuf_iterator) to create string of unsigned chars
(because std::istreambuf_iterator's argument is char, not unsigned
char)
So, are there any solutions to this particular problem?
Example:
std::vector<char> tempbuf(std::istreambuf_iterator<char>(file), {}); // reads 112782 symbols
std::vector<char> tempbuf(std::istream_iterator<char>(file), {}); // reads 112779 symbols
Sample code:
void LZ77::readFileUnpacked(std::string& path)
{
std::ifstream file(path, std::ios::in | std::ios::binary);
if (file.is_open())
{
// Works just fine with char, but loses 3 bytes with unsigned
std::string tempstring = std::string(std::istreambuf_iterator<char>(file), {});
file.close();
}
else
throw std::ios_base::failure("Failed to open the file");
}
char in all of its forms (and std::byte, which is isomorphic with unsigned char) is always the smallest possible type that a system supports. The C++ standard defines that sizeof(char) and its variations shall always be exactly 1.
"One" what? That's implementation-defined. But every type in the system will be some multiple of sizeof(char) in size.
So you shouldn't be too concerned about systems where char is not one byte. If you're working on a system where CHAR_BIT isn't 8, then that system can't handle 8-bit bytes directly at all, so unsigned char won't be any different or better for this purpose.
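If you want the compiler to confirm these size guarantees on your platform, a small sketch like the following would do it. It relies on the CHAR_BIT macro from <climits>; the bit_width_of helper is a hypothetical name used only for illustration.

```cpp
#include <climits>
#include <cstddef>

// Guaranteed by the standard for every implementation.
static_assert(sizeof(char) == 1 && sizeof(signed char) == 1 && sizeof(unsigned char) == 1,
              "all char variants have size 1");
static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");

// Hypothetical helper: number of bits occupied by an object of type T.
template <typename T>
constexpr std::size_t bit_width_of() {
    return sizeof(T) * CHAR_BIT;
}
```

Code that truly depends on 8-bit bytes could add static_assert(CHAR_BIT == 8, "...") so that it refuses to compile elsewhere.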
As to the particulars of your problem, istream_iterator is fundamentally different from istreambuf_iterator. The purpose of the latter is to allow iterator access to the actual stream as a sequence of characters. The purpose of istream_iterator<T> is to allow access to a stream as if by performing a repeated sequence of operator>> calls with a T value.
So if you're using istream_iterator<char>, you're saying that you want to read the stream as if you did stream >> some_char_variable; for each iterator access. That isn't equivalent to accessing the stream's characters directly. Specifically, formatted input functions like operator>> can do things like skip whitespace, depending on how you set up your stream.
istream_iterator reads using operator>>, which by default skips whitespace as part of its function. If you want to disable that behavior, you'll have to do
#include <ios>
file >> std::noskipws;
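The difference between the two iterators is easy to observe on an in-memory stream. This is a small sketch with hypothetical function names; it counts how many characters each approach yields from the same data.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Count characters as seen through operator>> (formatted input).
std::size_t count_formatted(const std::string& data, bool skip_whitespace) {
    std::istringstream in(data);
    if (!skip_whitespace) {
        in >> std::noskipws;  // turn off the default whitespace skipping
    }
    return static_cast<std::size_t>(
        std::distance(std::istream_iterator<char>(in), std::istream_iterator<char>()));
}

// Count characters as seen through the stream buffer (unformatted input).
std::size_t count_unformatted(const std::string& data) {
    std::istringstream in(data);
    return static_cast<std::size_t>(
        std::distance(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>()));
}
```

For the input "a b\nc" the formatted count drops the space and the newline unless noskipws is set, which is the same effect that made three bytes disappear from the .bmp file.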

File writing and appending in Binary mode not working

I am trying to append to a file in binary mode, but the logic below is not working.
For PDF files, the file gets corrupted, and for text files, some junk data is added in addition to my file contents.
My variable m_strReceivedMessage is of type std::string.
std::ofstream out(file, std::ios::binary | std::ios_base::app );
int i = sizeof(m_strReceivedMessage);
if (out.is_open()) {
// out.write(m_strReceivedMessage.c_str(), m_strReceivedMessage.size());
//out << m_strReceivedMessage;
out.write(reinterpret_cast<char *>(&m_strReceivedMessage), m_strReceivedMessage.size());
}
You're printing the memory of the std::string object, rather than the character buffer that it contains. To get a pointer to the character buffer, see the data() member function. Hint: The fact that you need to cast std::string* using reinterpret_cast<char*> is a dead giveaway that you're doing something very wrong.
Also, I'm not familiar with the PDF spec, but I suspect that it may contain nul bytes. Depending on how you build your std::string (for example, from a const char* without an explicit length), you may have lost any content after the first nul. std::vector<char> would be a more appropriate way to store binary data.
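A minimal sketch of the corrected write, assuming the goal is simply to append the string's bytes to a file. The function name append_bytes is hypothetical; the point is that data() and size() describe the character buffer, whereas &m_strReceivedMessage and sizeof describe the std::string object itself.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: append the bytes held by `message` to the file at `path`.
// Returns false if the file could not be opened or the write failed.
bool append_bytes(const std::string& path, const std::string& message) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out) {
        return false;
    }
    // data() points at the characters; size() counts them, embedded NULs included.
    out.write(message.data(), static_cast<std::streamsize>(message.size()));
    return static_cast<bool>(out);
}
```

A string built with an explicit length, such as std::string(ptr, n), keeps embedded nul bytes, so this works for binary payloads as long as the string was filled that way.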

Reading part of a binary file into an pre-existing basic_string object

I have a large string buffer and an input stream:
basic_string<uint8_t> *buf = ......;
istream in = ......;
What is the most efficient way to read a part of the file into the string? Say, the 0xE3CC'th to 0x1A481'th bytes from the file.
Here istream::read doesn't seem to be the answer, since it reads into a raw char[]. Since the data is quite large, having a temporary variable is inefficient.
And sadly, I don't have C++0x, so copy_n can't be used. What would you suggest? Thanks.
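in.seekg(0xE3CC); // move to the start of the requested range
buf->resize(size); // size = number of bytes wanted, e.g. 0x1A481 - 0xE3CC + 1
in.read(reinterpret_cast<char*>(&(*buf)[0]), size); // read() takes char*, so cast from uint8_t*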
BTW, do you really need buf to be a pointer?

How can I use fread() on a binary file to read the data into a std::vector?

Follow-up question on an earlier question I had, that has been perfectly answered. To quickly recap, I had trouble creating a class holding a huge array (stack overflow error). In the answers, some users recommended I use std::vector instead.
The function to read in the data looks like this:
Test()
{
memset(myarray, 0, sizeof(myarray));
FILE* fstr = fopen("myfile.dat", "rb");
size_t success= fread(myarray, sizeof(myarray), 1, fstr);
fclose(fstr);
}
for a myarray which looked like this:
int myarray[45000000];
My question is: How can I read this into a preferable:
std::vector<int> myvector;
I searched google , and have found multiple answers, usually pointing to the following code:
std::ifstream input("myfile.dat", std::ios::in | std::ifstream::binary);
std::copy(std::istream_iterator<int>(input),
std::istream_iterator<int>(),
std::back_inserter(myvector));
After implementing this, and when calling myvector.size() I get 16 (for whatever reason), and accessing a vector element leads to an immediate crash for going out of the vector bounds.
So what do I have to do to get this right? I once read somewhere that I could just simply use the "old" method, and then reading the array into the vector, but this seems to defeat the purpose of using the vector in the first place.
fread() reads your file as raw binary, while istream_iterator tries to extract formatted ints (text such as 42).
You want to resize your vector and use input.read(...) instead:
const size_t size = 45000000; // change this to the appropriate value
std::vector<char> myvector(size, 0);
std::ifstream input("myfile.dat", std::ios::in | std::ifstream::binary);
input.read(&myvector[0], myvector.size());
Note that you need to use a std::vector<char> since read expects the first parameter to be a char *. You can use other types T if you cast the type correctly:
input.read(reinterpret_cast<char*>(&myvector[0]), myvector.size() * sizeof(T));
If you're using C++ you should try to avoid using the C FILE APIs altogether -- so you're on the right track. The problem you're having is that istream_iterator reads input as text, not binary -- it's expecting ASCII digits. Try this out instead:
std::vector<int> vec(45000000);
std::filebuf fb;
fb.open("myfile.dat", std::ios_base::in | std::ios_base::binary);
fb.sgetn((char*)&vec[0], vec.size() * sizeof(vec[0]));

Read .part files and concatenate them all

So I am writing my own custom FTP client for a school project. I managed to get everything to work with the swarming FTP client and am down to one last small part...reading the .part files into the main file. I need to do two things. (1) Get this to read each file and write to the final file properly (2) The command to delete the part files after I am done with each one.
Can someone please help me to fix my concatenate function I wrote below? I thought I had it right to read each file until the EOF and then go on to the next.
In this case *numOfThreads is 17. Ended up with a file of 4742442 bytes instead of 594542592 bytes. Thanks and I am happy to provide any other useful information.
EDIT: Modified code for comment below.
std::string s = "Fedora-15-x86_64-Live-Desktop.iso";
std::ofstream out;
out.open(s.c_str(), std::ios::out);
for (int i = 0; i < 17; ++i)
{
std::ifstream in;
std::ostringstream convert;
convert << i;
std::string t = s + ".part" + convert.str();
in.open(t.c_str(), std::ios::in | std::ios::binary);
int size = 32*1024;
char *tempBuffer = new char[size];
if (in.good())
{
while (in.read(tempBuffer, size))
out.write(tempBuffer, in.gcount());
}
delete [] tempBuffer;
in.close();
}
out.close();
return 0;
Almost everything in your copying loop has problems.
while (!in.eof())
This is broken. Not much more to say than that.
bzero(tempBuffer, size);
This is fairly harmless, but utterly pointless.
in.read(tempBuffer, size);
This is the "almost" part -- i.e., the one piece that isn't obviously broken.
out.write(tempBuffer, strlen(tempBuffer));
You don't want to use strlen to determine the length -- it's intended only for NUL-terminated (C-style) strings. If (as is apparently the case) the data you read may contain zero-bytes (rather than using zero-bytes only to signal the end of a string), this will simply produce the wrong size.
What you normally want to do is a loop something like:
while (read(some_amount) == succeeded)
write(amount that was read);
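In C++ that will typically be something like the following. The extra gcount() check matters because a final read that returns fewer than buffer_size bytes makes the stream test false, even though those bytes still need to be written:
while (infile.read(buffer, buffer_size) || infile.gcount() > 0)
    outfile.write(buffer, infile.gcount());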
It's probably also worth noting that since you're allocating memory for the buffer using new, but never using delete, your function is leaking memory. Probably better to do without new for this -- an array or vector would be obvious alternatives here.
Edit: as for why while (infile.read(...)) works, the read returns a reference to the stream. The stream in turn provides a conversion to bool (in C++11) or void * (in C++03) that can be interpreted as a Boolean. That conversion operator returns the state of the stream, so if reading failed, it will be interpreted as false, but as long as it succeeded, it will be interpreted as true.