There is a question that is very similar in spirit here. Unfortunately that question didn't prompt much response - I thought I would ask a more specific question with the hope that an alternative method can be suggested.
I'm writing a binary file into std::cin (with tar --to-command=./myprog).
The binary file happens to be a set of floats and I want to put the data into std::vector<float> - ideally the c++ way.
I can generate a std::vector<char> very nicely (thanks to this answer)
#include <fstream>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>
int
main (int ac, char **av)
{
std::istream& input = std::cin;
std::vector<char> buffer;
std::copy(
std::istreambuf_iterator<char>(input),
std::istreambuf_iterator<char>( ),
std::back_inserter(buffer)); // copies all data into buffer
}
I now want to transform my std::vector<char> into a std::vector<float>, presumably with std::transform and a function that does the conversion (a char[2] to a float, say). I am struggling however, because my std::vector<float> will have half as many elements as std::vector<char>. If I could iterate with a stride of 2 then I think I would be fine, but from the previous question it seems that I cannot do that (at least not elegantly).
I would write my own class that reads two chars and converts them to a float.
struct FloatConverter
{
    // When the FloatConverter object is assigned to a float value,
    // i.e. when it is put into the vector<float>, this method is called
    // to convert the object into a float.
    operator float() const { return 1.0f; /* How you convert the 2 chars */ }
    friend std::istream& operator>>(std::istream& str, FloatConverter& fc)
    {
        // You were not exactly clear on what should be read in,
        // so I went pedantic and made sure we read exactly 2 characters.
        fc.data[0] = str.get();
        fc.data[1] = str.get();
        return str;
    }
    char data[2];
};
Based on comments by GMan:
struct FloatConverterFromBinary
{
    // When the FloatConverterFromBinary object is assigned to a float value,
    // i.e. when it is put into the vector<float>, this method is called
    // to convert the object into a float.
    operator float() const { return data; }
    friend std::istream& operator>>(std::istream& str, FloatConverterFromBinary& fc)
    {
        // Use reinterpret_cast to emphasize how dangerous and unportable this is.
        str.read(reinterpret_cast<char*>(&fc.data), sizeof(float));
        return str;
    }
    float data;
};
Then use it like this:
int main (int ac, char **av)
{
    std::istream& input = std::cin;
    std::vector<float> buffer;
    // Note: FloatConverter's operator>> does not drop whitespace while reading,
    //       so std::istream_iterator<> can be used on binary data here.
    //       (std::istreambuf_iterator<> only works with character types.)
    std::copy(
        std::istream_iterator<FloatConverter>(input),
        std::istream_iterator<FloatConverter>( ),
        std::back_inserter(buffer));
}
It seems to me that the best answer is to write a pair of your own iterators that parse the file the way that you want. You could change std::vector<char> to std::vector<float> and use the same streambuf iterators provided the input was formatted with at least one space between values.
Use Boost range adaptors:
boost::copy(istream_range(input)|stride(2),back_inserter(buffer));
You might need to write your own istreambuf_iterator, which is trivial.
Related
My understanding is that reading a uint8_t from a stringstream is a problem because the stringstream will interpret the uint8_t as a char. I would like to know how I can read a uint8_t from a stringstream as a numeric type. For instance, the following code:
#include <cstdint>
#include <iostream>
#include <sstream>
using namespace std;
int main()
{
uint8_t ui;
std::stringstream ss("46");
ss >> ui;
cout << unsigned(ui);
return 0;
}
prints out 52. I would like it to print out 46.
EDIT: An alternative would be to just read a string from the stringstream and then convert the result to uint8_t, but this breaks the nice chaining properties. For example, in the actual code I have to write, I often need something like this:
void foobar(std::istream & istream){
uint8_t a,b,c;
istream >> a >> b >> c;
// TODO...
}
You can overload the input operator>> for uint8_t, such as:
std::stringstream& operator>>(std::stringstream& str, uint8_t& num) {
uint16_t temp;
str >> temp;
/* constexpr */ auto max = std::numeric_limits<uint8_t>::max();
num = std::min(temp, (uint16_t)max);
if (temp > max) str.setstate(std::ios::failbit);
return str;
}
Live demo: https://wandbox.org/permlink/cVjLXJk11Gigf5QE
To tell the truth, I am not sure whether such a solution is problem-free. Someone more experienced might clarify.
UPDATE
Note that this solution is not generally applicable to std::basic_istream (or to its specialization std::istream), since there is already an overloaded operator>> for unsigned char: [istream.extractors]. The behavior will then depend on how uint8_t is implemented.
Please do not use char or unsigned char (uint8_t) if you want formatted numeric input. The result of your example code is the expected behavior.
As we can see from https://en.cppreference.com/w/cpp/io/basic_istream/operator_gtgt2
template< class Traits >
basic_istream<char,Traits>& operator>>( basic_istream<char,Traits>& st, unsigned char& ch );
This overload "performs character input operations".
52 is the ASCII code for '4', which means that the stringstream has read only one byte and is still ready to read '6'.
So if you want it to work in the desired way, you should use a 2-byte or bigger integer type with operator>> and then cast the result to uint8_t, exactly as you did in your self-answer.
Here's a reference for those overloads.
https://en.cppreference.com/w/cpp/io/basic_istream/operator_gtgt
After much back and forth, the answer seems to be that there is no standard way of doing this. The options are to read the value as either a uint16_t or a std::string, and then convert it to uint8_t:
#include <cstdint>
#include <iostream>
#include <sstream>
using namespace std;
int main()
{
uint8_t ui;
uint16_t tmp;
std::stringstream ss("46");
ss >> tmp;
ui = static_cast<uint8_t>(tmp);
cout << unsigned(ui);
return 0;
}
However, such a solution disregards range checking. So you will need to implement that yourself if you need it.
I have some data in a buffer pointed to by a const char* pointer. The data is just an ASCII string. I know its size. I would like to be able to read it in the same way data is read from streams. I'm looking for a solution that would allow me to write code like this:
// for example, data points to a string "42 3.14 blah"
MemoryStreamWrapper in(data, data_size);
int x;
float y;
std::string w;
in >> x >> y >> w;
Important condition: the data must not be copied or altered in any way (otherwise I'd just use a string stream. To my best knowledge, it isn't possible to create a string stream from a const char pointer without copying the data.)
The way to do this is to create a suitable stream buffer. This can, e.g., be done like this:
#include <streambuf>
#include <istream>
struct membuf: std::streambuf {
membuf(char const* base, size_t size) {
char* p(const_cast<char*>(base));
this->setg(p, p, p + size);
}
};
struct imemstream: virtual membuf, std::istream {
imemstream(char const* base, size_t size)
: membuf(base, size)
, std::istream(static_cast<std::streambuf*>(this)) {
}
};
The only somewhat awkward thing is the const_cast<char*>() in the stream buffer: the stream buffer won't change the data but the interface still requires char* to be used, mainly to make it easier to change the buffer in "normal" stream buffers. With this, you can use imemstream as a normal input stream:
imemstream in(data, size);
in >> value;
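For a complete picture, here is a small self-contained sketch of the same technique applied to the "42 3.14 blah" example from the question. It repeats the stream buffer under the hypothetical name view_buf so that it compiles on its own, and it uses a plain std::istream over the buffer instead of the combined imemstream class. The Parsed struct and parse_fields function are made up for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Same idea as membuf above: the get area points straight at the caller's data.
struct view_buf : std::streambuf {
    view_buf(const char* base, std::size_t size) {
        char* p = const_cast<char*>(base);  // required by setg; never written through
        setg(p, p, p + size);
    }
};

// Hypothetical result type for the three fields in the question's example.
struct Parsed {
    int x;
    float y;
    std::string w;
    bool ok;
};

Parsed parse_fields(const char* data, std::size_t size)
{
    view_buf buf(data, size);
    std::istream in(&buf);  // reads directly from `data`; nothing is copied
    Parsed p{};
    p.ok = static_cast<bool>(in >> p.x >> p.y >> p.w);
    return p;
}
```

Because the buffer is bounded by the size argument, the data does not need to be null-terminated. Keeping the buffer and the stream as two separate objects avoids the virtual inheritance trick, at the cost of having to manage both lifetimes.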
The only way would be to subclass std::istream (which also requires subclassing std::streambuf) to create your own stream class that reads from constant memory.
It's not as easy as it sounds because the C++ standard library stream classes are pretty messy and badly designed. I don't think it's worth it unless you need it to scale a lot.
I currently do this, and the conversion to std::string at the end takes 98% of the execution time. There must be a better way!
std::string
file2string(std::string filename)
{
std::ifstream file(filename.c_str());
if(!file.is_open()){
// If they passed a bad file name, or one we have no read access to,
// we pass back an empty string.
return "";
}
// find out how much data there is
file.seekg(0,std::ios::end);
std::streampos length = file.tellg();
file.seekg(0,std::ios::beg);
// Get a vector of that size and
std::vector<char> buf(length);
// fill it with the file's contents
file.read(&buf[0],length);
file.close();
// return buffer as string
std::string s(buf.begin(),buf.end());
return s;
}
Being a big fan of C++ iterator abstraction and the algorithms, I would love the following to be the fastest way to read a file (or any other input stream) into a std::string (and then print the content):
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
int main()
{
std::string s(std::istreambuf_iterator<char>(std::ifstream("file")
>> std::skipws),
std::istreambuf_iterator<char>());
std::cout << "file='" << s << "'\n";
}
This certainly is fast for my own implementation of IOStreams but it requires a lot of trickery to actually get it fast. Primarily, it requires optimizing algorithms to cope with segmented sequences: a stream can be seen as a sequence of input buffers. I'm not aware of any STL implementation consistently doing this optimization. The odd use of std::skipws is just to get reference to the just created stream: the std::istreambuf_iterator<char> expects a reference to which the temporary file stream wouldn't bind.
Since this probably isn't the fastest approach, I would be inclined to use std::getline() with a particular "newline" character, i.e. one which isn't in the file:
std::string s;
// optionally reserve space although I wouldn't be too fussed about the
// reallocations because the reads probably dominate the performance
std::getline(std::ifstream("file") >> std::skipws, s, '\0');
This assumes that the file doesn't contain a null character. Any other character would do as well. Unfortunately, std::getline() takes a char_type as delimiting argument, rather than an int_type which is what the member std::istream::getline() takes for the delimiter: in this case you could use eof() for a character which never occurs (char_type, int_type, and eof() refer to the respective member of char_traits<char>). The member version, in turn, can't be used because you would need to know ahead of time how many characters are in the file.
BTW, I saw some attempts to use seeking to determine the size of the file. This is bound not to work too well. The problem is that the code conversion done in std::ifstream (well, actually in std::filebuf) can create a different number of characters than there are bytes in the file. Admittedly, this isn't the case when using the default C locale and it is possible to detect that this doesn't do any conversion. Otherwise the best bet for the stream would be to run over the file and determine the number of characters being produced. I actually think that this is what would need to be done when the code conversion could do something interesting, although I don't think it actually is done. However, none of the examples explicitly set up the C locale, using e.g. std::locale::global(std::locale("C"));. Even with this it is also necessary to open the file in std::ios_base::binary mode because otherwise end of line sequences may be replaced by a single character when reading. Admittedly, this would only make the result shorter, never longer.
The other approaches using the extraction from std::streambuf* (i.e. those involving rdbuf()) all require that the resulting content is copied at some point. Given that the file may actually be very large this may not be an option. Without the copy this could very well be the fastest approach, however. To avoid the copy, it would be possible to create a simple custom stream buffer which takes a reference to a std::string as constructor argument and directly appends to this std::string:
#include <fstream>
#include <iostream>
#include <string>
class custombuf:
public std::streambuf
{
public:
custombuf(std::string& target): target_(target) {
this->setp(this->buffer_, this->buffer_ + bufsize - 1);
}
private:
std::string& target_;
enum { bufsize = 8192 };
char buffer_[bufsize];
int overflow(int c) {
if (!traits_type::eq_int_type(c, traits_type::eof()))
{
*this->pptr() = traits_type::to_char_type(c);
this->pbump(1);
}
this->target_.append(this->pbase(), this->pptr() - this->pbase());
this->setp(this->buffer_, this->buffer_ + bufsize - 1);
return traits_type::not_eof(c);
}
int sync() { this->overflow(traits_type::eof()); return 0; }
};
int main()
{
std::string s;
custombuf sbuf(s);
if (std::ostream(&sbuf)
<< std::ifstream("readfile.cpp").rdbuf()
<< std::flush) {
std::cout << "file='" << s << "'\n";
}
else {
std::cout << "failed to read file\n";
}
}
At least with a suitably chosen buffer I would expect this version to be fairly fast. Which version is the fastest will certainly depend on the system, the standard C++ library being used, and probably a number of other factors, so you will want to measure the performance.
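To make the measuring concrete, here is a minimal timing sketch. The helper name time_ms is hypothetical. It runs a callable once and reports the elapsed wall-clock time using std::chrono::steady_clock.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds.
template <typename Work>
double time_ms(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Each of the reading strategies can be wrapped in a lambda and passed to time_ms with the same input file. It is worth repeating every measurement a few times, since the first run usually pays for loading the file into the operating system's cache.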
You can try this:
#include <fstream>
#include <sstream>
#include <string>
int main()
{
std::ostringstream oss;
std::string s;
std::string filename = get_file_name();
if (oss << std::ifstream(filename, std::ios::binary).rdbuf())
{
s = oss.str();
}
else
{
// error
}
// now s contains your file
}
You can also just use oss.str() directly if you like; just make sure you have some sort of error check somewhere.
No guarantee that it's the most efficient; you probably can't beat <cstdio> and fread. As @Benjamin pointed out, the string stream only exposes the data by copy, so you could instead read directly into the target string:
#include <string>
#include <cstdio>
std::FILE * fp = std::fopen("file.bin", "rb");
std::fseek(fp, 0L, SEEK_END);
unsigned int fsize = std::ftell(fp);
std::rewind(fp);
std::string s(fsize, 0);
if (fsize != std::fread(static_cast<void*>(&s[0]), 1, fsize, fp))
{
// error
}
std::fclose(fp);
(You might like to use a RAII wrapper for the FILE*.)
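One possible shape for such a wrapper, as a sketch: a std::unique_ptr with a custom deleter closes the file on every exit path. The names FileCloser, FilePtr and read_file are hypothetical, and the function adds the error checks that the snippet above leaves out.

```cpp
#include <cstdio>
#include <memory>
#include <string>

// Deleter so that std::unique_ptr closes the FILE* on every exit path.
struct FileCloser {
    void operator()(std::FILE* fp) const { std::fclose(fp); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical helper: reads a whole file into `out`; returns false on any error.
bool read_file(const char* path, std::string& out)
{
    FilePtr fp(std::fopen(path, "rb"));
    if (!fp) return false;

    // Determine the size by seeking to the end, as in the snippet above.
    if (std::fseek(fp.get(), 0L, SEEK_END) != 0) return false;
    const long size = std::ftell(fp.get());
    if (size < 0) return false;
    std::rewind(fp.get());

    // Read straight into the string's own storage; no intermediate buffer.
    out.assign(static_cast<std::size_t>(size), '\0');
    if (size == 0) return true;
    return std::fread(&out[0], 1, out.size(), fp.get()) == out.size();
}
```

std::unique_ptr never invokes the deleter on a null pointer, so a failed fopen needs no special handling in FileCloser.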
Edit: The fstream-analogue of the second version goes like this:
#include <string>
#include <fstream>
std::ifstream infile("file.bin", std::ios::binary);
infile.seekg(0, std::ios::end);
unsigned int fsize = infile.tellg();
infile.seekg(0, std::ios::beg);
std::string s(fsize, 0);
if (!infile.read(&s[0], fsize))
{
// error
}
Edit: Yet another version, using streambuf-iterators:
std::ifstream thefile(filename, std::ios::binary);
std::string s((std::istreambuf_iterator<char>(thefile)), std::istreambuf_iterator<char>());
(Mind the additional parentheses to get the correct parsing.)
Ironically, the example for string::reserve is reading a file into a string. You don't want to read the file into one buffer and then have to allocate/copy into another one.
Here's the example code:
// string::reserve
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main ()
{
string str;
size_t filesize;
ifstream file ("test.txt",ios::in|ios::ate);
filesize=file.tellg();
str.reserve(filesize); // allocate space in the string
file.seekg(0);
for (char c; file.get(c); )
{
str += c;
}
cout << str;
return 0;
}
I don't know how efficient it is, but here is a simple (to read) way, by just setting the EOF as the delimiter:
string buffer;
ifstream fin;
fin.open("filename.txt");
if(fin.is_open()) {
getline(fin,buffer,'\x1A');
fin.close();
}
The efficiency of this obviously depends on what's going on internally in the getline algorithm, so you could take a look at the code in the standard libraries to see how it works.
Guido Van Rossum demonstrates the simplicity of Python in this article and makes use of this function for buffered reads of a file of unknown length:
def intsfromfile(f):
    while True:
        a = array.array('i')
        a.fromstring(f.read(4000))
        if not a:
            break
        for x in a:
            yield x
I need to do the same thing in C++ for speed reasons! I have many files containing sorted lists of unsigned 64 bit integers that I need to merge. I have found this nice piece of code for merging vectors.
I am stuck on how to make an ifstream for a file of unknown length present itself as a vector which can be happily iterated over until the end of the file is reached. Any suggestions? Am I barking up the correct tree with an istreambuf_iterator?
In order to disguise an ifstream (or really, any input stream) in a form that acts like an iterator, you want to use the istream_iterator or the istreambuf_iterator template class. The former is useful for files where the formatting is of concern. For example, a file full of whitespace-delimited integers can be read into the vector's iterator range constructor as follows:
#include <fstream>
#include <vector>
#include <iterator> // needed for istream_iterator
using namespace std;
int main(int argc, char** argv)
{
ifstream infile("my-file.txt");
// It isn't customary to declare these as standalone variables,
// but see below for why it's necessary when working with
// initializing containers.
istream_iterator<int> infile_begin(infile);
istream_iterator<int> infile_end;
vector<int> my_ints(infile_begin, infile_end);
// You can also do stuff with the istream_iterator objects directly:
// Careful! If you run this program as is, this won't work because we
// used up the input stream already with the vector.
int total = 0;
while (infile_begin != infile_end) {
total += *infile_begin;
++infile_begin;
}
return 0;
}
istreambuf_iterator is used to read through files a single character at a time, disregarding the formatting of the input. That is, it will return you all characters, including spaces, newline characters, and so on. Depending on your application, that may be more appropriate.
Note: Scott Meyers explains in Effective STL why the separate variable declarations for istream_iterator are needed above. Normally, you would do something like this:
ifstream infile("my-file.txt");
vector<int> my_ints(istream_iterator<int>(infile), istream_iterator<int>());
However, C++ actually parses the second line in an incredibly bizarre way. It sees it as the declaration of a function named my_ints that takes two parameters and returns a vector<int>. The first parameter is of type istream_iterator<int> and is named infile (the parentheses are ignored). The second parameter is an unnamed function pointer that takes zero arguments (because of the parentheses) and returns an object of type istream_iterator<int>.
Pretty cool, but also pretty aggravating if you're not watching out for it.
EDIT
Here's an example using the istreambuf_iterator to read in a file of 64-bit numbers laid out end-to-end:
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
int main(int argc, char** argv)
{
// Open in binary mode so that no newline translation alters the raw bytes.
ifstream input("my-file.txt", ios::binary);
istreambuf_iterator<char> input_begin(input);
istreambuf_iterator<char> input_end;
// Fill a char vector with input file's contents:
vector<char> char_input(input_begin, input_end);
input.close();
// Convert it to an array of unsigned long with a cast:
unsigned long* converted = reinterpret_cast<unsigned long*>(&char_input[0]);
size_t num_long_elements = char_input.size() * sizeof(char) / sizeof(unsigned long);
// Put that information into a vector:
vector<unsigned long> long_input(converted, converted + num_long_elements);
return 0;
}
Now, I personally rather dislike this solution (using reinterpret_cast, exposing char_input's array), but I'm not familiar enough with istreambuf_iterator to comfortably use one templatized over 64-bit characters, which would make this much easier.
Is it possible to read one line from an input stream and pass it to a string stream without using a temporary string variable in C++?
I currently do the reading like this (but I don't like the temporary variable line):
string line;
getline(in, line); // in is input stream
stringstream str;
str << line;
Like @Steve Townsend said above, it's probably not worth the effort, however if you wanted to do this (and you knew beforehand the number of lines involved), you could do something like:
#include <iostream>
#include <iterator>
#include <string>
#include <sstream>
#include <algorithm>
using namespace std;
template <typename _t, int _count>
struct ftor
{
ftor(istream& str) : _str(str), _c() {}
_t operator() ()
{
++_c;
if (_count > _c) return *(_str++); // need more
return *_str; // last one
}
istream_iterator<_t> _str;
int _c;
};
int main(void)
{
ostringstream sv;
generate_n(ostream_iterator<string>(sv, "\n"), 5, ftor<string, 5>(cin));
cout << sv.str();
return 0;
}
There is detailed info in the question below (per @Martin York) on reading direct from stream to stringstream. This is not a direct dup as you wish to handle the input line by line, but this approach will be hard to beat for efficiency. You can instantiate the individual lines using a character range once the raw data is in the stringstream.
How to read file content into istringstream?
To be honest, this may be a lot of work for a problem that's not really a huge perf concern anyway.
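If you still want to avoid the temporary string, one option is the std::istream::get overload that takes a stream buffer, which copies characters up to a delimiter directly into another stream's buffer. The sketch below is one way to wrap it. The helper name read_line_into is hypothetical, and I have not measured whether it is any faster than the getline version.

```cpp
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical helper: moves one line from `in` into `line` without an
// intermediate std::string. Returns false when the input is exhausted.
bool read_line_into(std::istream& in, std::stringstream& line)
{
    line.str("");  // discard the previous line
    line.clear();  // and any error flags left over from parsing it

    if (in.peek() == std::istream::traits_type::eof()) return false;

    // get(streambuf&, delim) copies characters up to, but not including, '\n'.
    in.get(*line.rdbuf(), '\n');

    // get() sets failbit when it copies nothing, which happens for an empty
    // line; that is not an error here, so drop the flag again.
    if (in.fail()) in.clear(in.rdstate() & ~std::ios_base::failbit);

    // The delimiter is left in the input, so consume it if it is there.
    if (in.peek() == '\n') in.ignore();
    return true;
}
```

A typical loop would be while (read_line_into(in, str)) { /* parse str */ }. Like std::getline, it treats a final line without a trailing newline as a normal line.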