Best way to convert ostream to std::vector<uint8_t> - c++

I am trying to find the most efficient way of converting an std::ostream to a std::vector<uint8_t>. I could obviously convert to string first but I am trying to avoid extra data copies. Is there a nice way to do this? I have been looking at the rdbuf of ostream and think that it might be doable using that. Not sure how to proceed though. Any advice?

Assuming you mean istream rather than ostream (because you're reading input into the vector), you're probably looking for std::istream_iterator. You can pass a pair of them to the constructor of vector:
std::istringstream str("5 16 32 8");
std::vector<uint8_t> numbers{ std::istream_iterator<int>(str), std::istream_iterator<int>() };
Note that you have to pass int as the iterator's template argument: uint8_t is an alias for unsigned char, so extracting it with operator>> would read single characters instead of parsing numbers.

Assuming you mean istream (input) and not ostream (output), then istream has a read() method for reading input into a user-provided buffer. So, simply pre-allocate your vector to the desired size and then read() directly into its data buffer, e.g.:
std::vector<uint8_t> buffer(size);
stream.read(reinterpret_cast<char*>(buffer.data()), size);
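A fuller sketch of that approach, assuming the stream is a file opened in binary mode and the size is obtained by seeking to the end (the file name here is just an example):
#include <cstdint>
#include <fstream>
#include <vector>

int main() {
    std::ifstream stream("input.bin", std::ios::binary); // example file name
    stream.seekg(0, std::ios::end);
    std::size_t size = static_cast<std::size_t>(stream.tellg());
    stream.seekg(0, std::ios::beg);

    std::vector<uint8_t> buffer(size);
    stream.read(reinterpret_cast<char*>(buffer.data()), size);
    buffer.resize(static_cast<std::size_t>(stream.gcount())); // keep only the bytes actually read
}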

If you would like to read file contents without parsing:
#include <vector>
#include <iostream>
#include <cstdint>
#include <iterator>
std::vector<uint8_t> read_data(std::istream& s) {
    return std::vector<uint8_t>(std::istreambuf_iterator<char>{s},
                                std::istreambuf_iterator<char>{});
}

int main() {
    auto v = read_data(std::cin);
}
The drawback is that you cannot specify the maximum size to read. If that is a requirement (a public API that you want to protect from unfriendly clients), use istream::read with a fixed size, as Remy Lebeau advises.
If the file is large (e.g. megabytes or more), you may like to map directly into memory to avoid large data copies and memory reallocations.
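For the memory-mapping route, a minimal POSIX-only sketch using mmap (Windows would use CreateFileMapping instead); the file name is illustrative and error handling is reduced to early returns:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <cstddef>

int main() {
    int fd = open("big_file.bin", O_RDONLY);
    if (fd < 0) return 1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return 1; }
    std::size_t size = static_cast<std::size_t>(st.st_size);
    void* addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) { close(fd); return 1; }
    const uint8_t* bytes = static_cast<const uint8_t*>(addr);
    // use bytes[0] .. bytes[size-1] directly; no copy into a vector is made
    munmap(addr, size);
    close(fd);
}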

Related

Having trouble reading file into vector

I have a file that has three ints on three rows. It looks like this:
000
001
010
And I'm trying to read each integer into the vector positions but I don't know if I'm doing it right. Here is my code:
#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    std::vector<int> numbers;
    std::fstream out("out.txt");
    std::copy(std::ostreambuf_iterator<int>(out.rdbuf()),
              std::ostreambuf_iterator<int>(), std::back_inserter(numbers));
}
What am I doing wrong here? I'm getting a "no matching function call" error on the line where I do the copy.
You're using the wrong iterator.
You need istreambuf_iterator, not ostreambuf_iterator:
std::copy(std::istreambuf_iterator<int>(out.rdbuf()),
          std::istreambuf_iterator<int>(), std::back_inserter(numbers));
Note that ostreambuf_iterator is an output iterator; it is used to write, not read. What you want to do is read, for which you need istreambuf_iterator.
But wait! The above code is not going to work either. Why?
Because you're passing int to istreambuf_iterator. istreambuf_iterator reads the stream as an unformatted buffer of char* or wchar_t*, so its template argument can only be char or wchar_t.
What you actually need is called istream_iterator which reads formatted data of given type:
std::copy(std::istream_iterator<int>(out), // changed here also!
          std::istream_iterator<int>(), std::back_inserter(numbers));
This will work great now.
Note that you could just avoid using std::copy, and use the constructor of std::vector itself as:
std::fstream in("out.txt");
std::vector<int> numbers((std::istream_iterator<int>(in)), // extra parentheses
                         std::istream_iterator<int>());
Note the extra parentheses around the first argument, which are needed to avoid the most vexing parse in C++.
If the vector object is already created (and possibly already has some elements in it), then you can still avoid std::copy:
numbers.insert(numbers.end(),
               std::istream_iterator<int>(in), // no extra parentheses
               std::istream_iterator<int>());
No extra parentheses are needed in this case.
Hope that helps.
Read the book 'C++ How to Program' by Deitel & Deitel, the chapter on vectors; I assure you, all your problems will be solved. You have opened the text file for output instead of input. Instead of using this function, I would suggest reading in strings and copying them into your vector using iterators until EOF is encountered in the file. EDIT: This way is more natural and easier to read and understand if you are new to vectors.
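For what it's worth, a minimal sketch of that read-until-EOF loop, using >> extraction since the file holds whitespace-separated integers:
#include <fstream>
#include <vector>

int main()
{
    std::ifstream in("out.txt"); // opened for input
    std::vector<int> numbers;
    int n;
    while (in >> n)              // extraction fails at EOF (or on bad input)
        numbers.push_back(n);
}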

Reading a Binary File into a bitset or vector<bool>

How do I read a binary file into a bitset or vector<bool>? The binary file will vary in length. Is there a better container for this? I am new to C++ though experienced as a programmer.
If the file is large, why read the whole file into memory at once?
You can read a small piece at a time; how much is read per call is determined by the size argument to this function:
file.read(buff, size)
where buff is a char array.
I'm sorry, but you can't simply read/save a vector to a file directly; for more details see here and here.
And use Google, it's very helpful...
You didn't give too much context of what you're trying to do in your question. But here's one quick & dirty way to do it:
#include <algorithm>
#include <iterator>
#include <fstream>
#include <vector>
#include <assert.h>

using namespace std;

const char *filename = "foo.bar";

int main()
{
    vector<bool> v;
    ifstream binary_file(filename, ios::binary);
    assert(binary_file);
    binary_file >> noskipws; // don't silently skip whitespace bytes in binary data
    copy(istream_iterator<unsigned char>(binary_file),
         istream_iterator<unsigned char>(),
         back_insert_iterator< vector<bool> >(v));
}
A zero byte ('\0') read from the file will become false in the vector; any other byte value will be treated as true.
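If what you actually want is one vector<bool> element per bit rather than per byte, a possible sketch (assuming most-significant-bit-first order within each byte):
#include <fstream>
#include <vector>

int main()
{
    std::ifstream file("foo.bar", std::ios::binary);
    std::vector<bool> bits;
    char byte;
    while (file.get(byte))                  // read one byte at a time
        for (int i = 7; i >= 0; --i)        // MSB first
            bits.push_back((static_cast<unsigned char>(byte) >> i) & 1);
}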

Parsing binary data from file

Thank you in advance for your help!
I am in the process of learning C++. My first project is to write a parser for a binary-file format we use at my lab. I was able to get a parser working fairly easily in Matlab using "fread", and it looks like that may work for what I am trying to do in C++. But from what I've read, it seems that using an ifstream is the recommended way.
My question is two-fold. First, what, exactly, are the advantages of using ifstream over fread?
Second, how can I use ifstream to solve my problem? Here's what I'm trying to do. I have a binary file containing a structured set of ints, floats, and 64-bit ints. There are 8 data fields all told, and I'd like to read each into its own array.
The structure of the data is as follows, in repeated 288-byte blocks:
Bytes 0-3: int
Bytes 4-7: int
Bytes 8-11: float
Bytes 12-15: float
Bytes 16-19: float
Bytes 20-23: float
Bytes 24-31: int64
Bytes 32-287: 64x float
I am able to read the file into memory as a char * array, with the fstream read command:
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes
So, from what I understand, I now have a pointer to an array called "buffer". If I were to call buffer[0], I should get a 1-byte memory address, right? (Instead, I'm getting a seg fault.)
What I now need to do really ought to be very simple. After executing the above ifstream code, I should have a fairly long buffer populated with a number of 1's and 0's. I just want to be able to read this stuff from memory, 32-bits at a time, casting as integers or floats depending on which 4-byte block I'm currently working on.
For example, if the binary file contained N 288-byte blocks of data, each array I extract should have N members each. (With the exception of the last array, which will have 64N members.)
Since I have the binary data in memory, I basically just want to read from buffer, one 32-bit number at a time, and place the resulting value in the appropriate array.
Lastly - can I access multiple array positions at a time, a la Matlab? (e.g. array(3:5) -> [1,2,1] for array = [3,4,1,2,1])
Firstly, the advantage of using iostreams, and in particular file streams, relates to resource management. Automatic file stream variables will be closed and cleaned up when they go out of scope, rather than having to manually clean them up with fclose. This is important if other code in the same scope can throw exceptions.
Secondly, one possible way to address this type of problem is to simply define the stream insertion and extraction operators in an appropriate manner. In this case, because you have a composite type, you need to help the compiler by telling it not to add padding bytes inside the type. The following code should work on gcc and microsoft compilers.
#include <cstdint>
#include <istream>
#include <ostream>

#pragma pack(push, 1)   // no padding between members
struct MyData
{
    int i0;
    int i1;
    float f0;
    float f1;
    float f2;
    float f3;
    uint64_t ui0;
    float f4[64];
};
#pragma pack(pop)

std::istream& operator>>( std::istream& is, MyData& data ) {
    is.read( reinterpret_cast<char*>(&data), sizeof(data) );
    return is;
}

std::ostream& operator<<( std::ostream& os, const MyData& data ) {
    os.write( reinterpret_cast<const char*>(&data), sizeof(data) );
    return os;
}
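A usage sketch built on those operators, assuming the file is just a sequence of packed 288-byte blocks (the file name is illustrative):
#include <fstream>
#include <vector>

int main() {
    std::ifstream in("data.bin", std::ios::binary);
    std::vector<MyData> blocks;
    MyData block;
    while (in >> block)        // read() inside operator>> fails cleanly at EOF
        blocks.push_back(block);
    // blocks[k].i0, blocks[k].f4[j], ... can then be copied into per-field arrays
}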
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes
you need to allocate a buffer first before you read into it:
buffer = new char[filesize];
datafile.read (buffer, filesize);
As to the advantages of ifstream: it is a matter of abstraction. You can represent the contents of your file in a more convenient way; you then do not have to work with raw buffers, but can instead model the structure using classes and hide the details of how it is stored in the file, for instance by overloading the << operator.
You might perhaps look for serialization libraries for C++. Perhaps s11n might be useful.
This question shows how you can convert data from a buffer to a certain type. In general, you should prefer using a std::vector<char> as your buffer. This would then look like this:
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

int main() {
    std::ifstream input("your_file.dat", std::ios::binary);
    std::vector<char> buffer;
    std::copy(std::istreambuf_iterator<char>(input),
              std::istreambuf_iterator<char>(),
              std::back_inserter(buffer));
}
This code will read the entire file into your buffer. The next thing you'd want to do is to write your data into valarrays (for the selection you want). valarray is constant in size, so you have to be able to calculate the required size of your array up-front. This should do it for your format:
std::valarray<int> array1(buffer.size()/288); // each entry takes up 288 bytes (needs #include <valarray>)
std::valarray<int> array2(buffer.size()/288); // likewise for the second field
Then you'd use a normal for-loop to insert the elements into your arrays:
for(std::size_t i = 0; i < buffer.size()/288; i++) {
    array1[i] = *reinterpret_cast<int *>(&buffer[i*288]);     // first position
    array2[i] = *reinterpret_cast<int *>(&buffer[i*288 + 4]); // second position
}
Note that the sizes of the built-in types are implementation-defined (int is commonly 4 bytes, but that is not guaranteed), so check that the field sizes match your file format; fixed-width types such as int32_t are safer. This question explains a bit about C++ and sizes of types.
The selection you describe there can be achieved using valarray.
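For reference, a small sketch of that Matlab-style selection with std::slice; note that Matlab's array(3:5) is 1-based, while valarray indexing is 0-based:
#include <valarray>
#include <iostream>

int main() {
    std::valarray<int> array = {3, 4, 1, 2, 1};
    // Matlab's array(3:5): start at index 2, take 3 elements, stride 1
    std::valarray<int> sub = array[std::slice(2, 3, 1)];
    for (int x : sub)
        std::cout << x << ' ';  // prints: 1 2 1
}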

C++ - Implementing my own stream

Hello! My problem can be described the following way:
I have some data which actually is an array and could be represented as char* data with some size
I also have some legacy code (function) that takes some abstract std::istream object as a param and uses that stream to retrieve data to operate.
So, my question is the following - what would be the easy way to map my data to some std::istream object so that I can pass it to my function? I thought about creating a std::stringstream object from my data, but that means copying and (as I assume) isn't the best solution.
Any ideas how this could be done so that my std::istream operates on the data directly?
Thank you.
If you're looking at actually creating your own stream, I'd look at the Boost.Iostreams library. It makes it easy to create your own stream objects.
Definitely not the easiest way but just in case someone wanted to understand how std streams work inside, this seems to be a very nice introduction about how you can roll your own:
http://www.mr-edd.co.uk/blog/beginners_guide_streambuf
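For reference, a minimal sketch of that roll-your-own approach: a streambuf whose get area simply points at the existing buffer, so an istream built on top of it reads the data with no copy (the class name is just illustrative):
#include <streambuf>
#include <istream>
#include <iostream>
#include <string>
#include <cstddef>

// Read-only streambuf over an existing char buffer (no copying).
struct membuf : std::streambuf {
    membuf(char* data, std::size_t size) {
        setg(data, data, data + size); // begin / current / end of the get area
    }
};

int main() {
    char data[] = "PLOP PLOP PLOP";
    membuf buf(data, sizeof(data) - 1);
    std::istream in(&buf);             // reads directly from data

    std::string word;
    while (in >> word)
        std::cout << word << '\n';
}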
Use string stream:
#include <sstream>
#include <string>

int main()
{
    char data[] = "PLOP PLOP PLOP";
    int size = 13; // PS I know this is not the same as strlen(data);
    std::stringstream stream(std::string(data, size));
    // use stream as an istream;
}
If you want to be really efficient, you can muck with the stream buffer directly. I have not tried this and do not have a compiler to test with, but the following should work:
#include <sstream>

int main()
{
    char data[] = "PLOP PLOP PLOP";
    int size = 13; // PS I know this is not the same as strlen(data);
    std::stringstream stream;
    stream.rdbuf()->pubsetbuf(data, size);
    // use stream as an istream;
}

How to write an object to file in C++

I have an object with several text strings as members. I want to write this object to the file all at once, instead of writing each string to file. How can I do that?
You can overload operator>> and operator<< to read from and write to a stream.
Example Entry struct with some values:
struct Entry2
{
    string original;
    string currency;

    Entry2() {}
    Entry2(string& in);
    Entry2(string& original, string& currency)
        : original(original), currency(currency)
    {}
};
istream& operator>>(istream& is, Entry2& en);
ostream& operator<<(ostream& os, const Entry2& en);
Implementation:
using namespace std;

istream& operator>>(istream& is, Entry2& en)
{
    is >> en.original;
    is >> en.currency;
    return is;
}

ostream& operator<<(ostream& os, const Entry2& en)
{
    os << en.original << " " << en.currency;
    return os;
}
Then you open a file stream, and for each object you call:
ifstream in(filename.c_str());
Entry2 e;
in >> e;
// if you want to use read:
// in.read(reinterpret_cast<char*>(&e), sizeof(e));
in.close();
Or output:
Entry2 e;
// set values in e
ofstream out(filename.c_str());
out << e;
out.close();
Or if you want to use stream read and write then you just replace relevant code in operators implementation.
If the members are private inside your struct/class, you need to declare the operators as friend functions.
You can implement whatever format/separators you like. If your strings can include spaces, use getline() (which takes a stream and a string) instead of >>, because operator>> uses whitespace as a delimiter by default; it all depends on your separators.
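For example, a sketch of the getline variant, assuming each field is stored on its own line (shown as a replacement for the operators above, not in addition to them):
istream& operator>>(istream& is, Entry2& en)
{
    getline(is, en.original);   // whole line, spaces included
    getline(is, en.currency);
    return is;
}

ostream& operator<<(ostream& os, const Entry2& en)
{
    os << en.original << "\n" << en.currency << "\n";
    return os;
}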
It's called serialization. There are many serialization threads on SO.
There is also a nice serialization library included in Boost:
http://www.boost.org/doc/libs/1_42_0/libs/serialization/doc/index.html
Basically you can do
myFile << myObject
and
myFile >> myObject
with Boost serialization.
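In practice the << and >> go through an archive wrapped around the file stream; a minimal sketch, where the struct and file name are only illustrative:
#include <fstream>
#include <string>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/string.hpp>

struct MyObject {
    std::string name;
    std::string value;

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & name;
        ar & value;
    }
};

int main() {
    MyObject obj{"key", "some text"};
    {
        std::ofstream ofs("object.txt");
        boost::archive::text_oarchive oa(ofs);
        oa << obj;                 // write
    }
    MyObject restored;
    {
        std::ifstream ifs("object.txt");
        boost::archive::text_iarchive ia(ifs);
        ia >> restored;            // read back
    }
}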
If you have:
struct A {
    char a[30], b[25], c[15];
    int x;
};
then you can write it all out in one go with write(fh, ptr, sizeof(struct A)).
Of course, this isn't portable (because we're not accounting for the endianness or size of int), but that may not be an issue for you.
If you have:
struct A {
    char *a, *b, *c;
    int d;
};
then you're not looking to write the object; you're looking to serialize it. Your best bet is to look in the Boost libraries and use their serialization routines, because it's not an easy problem in languages without reflection.
There's not really a simple way; it's C++ after all, not PHP or JavaScript.
http://www.parashift.com/c++-faq-lite/serialization.html
Boost also has some library for it: http://www.boost.org/doc/libs/release/libs/serialization ... like Tronic already mentioned :)
The better method is to write each field individually along with the string length.
As an alternative, you can create a char array (or std::vector<char>) and write all the members into the buffer, then write the buffer to the output.
The underlying thorn is that a compiler is allowed to insert padding between members in a class or structure. Using memcpy or std::copy will result in padding bytes being written to the output.
Just remember that you need to either write the string lengths and the content or the content followed by some terminating character.
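For instance, a sketch of the length-prefixed idea, assuming a simple uint32_t-length-then-bytes layout (this is just one possible format, not the only one):
#include <cstdint>
#include <istream>
#include <ostream>
#include <string>

// Write a string as a 32-bit length followed by its bytes.
void write_string(std::ostream& os, const std::string& s) {
    std::uint32_t len = static_cast<std::uint32_t>(s.size());
    os.write(reinterpret_cast<const char*>(&len), sizeof(len));
    os.write(s.data(), len);
}

// Read it back in the same layout.
std::string read_string(std::istream& is) {
    std::uint32_t len = 0;
    is.read(reinterpret_cast<char*>(&len), sizeof(len));
    std::string s(len, '\0');
    is.read(&s[0], len);
    return s;
}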
Other people will suggest checking out the Boost Serialization library.
Unfortunately that is generally not quite possible. If your struct only contains plain data (no pointers or complex objects), you can store it as one chunk, but care must be taken if portability is an issue: padding, data type sizes, and endianness make this problematic.
You can use Boost.Serialization to minimize the amount of code required for proper, portable, and versionable serialization.
Assuming your goal is as stated, to write out the object with a single call to write() or fwrite() or whatever, you'd first need to copy the string and other object data into a single contiguous block of memory. Then you could write() that block of memory out with a single call. Or you might be able to do a vector-write by calling writev(), if that call is available on your platform.
That said, you probably won't gain much by reducing the number of write calls. Especially if you are using fwrite() or similar already, then the C library is already doing buffering for you, so the cost of multiple small calls is minimal anyway. Don't put yourself through a lot of extra pain and code complexity unless it will actually do some good...