I'm writing a command-line utility for some text processing. I need a helper function (or two) that does the following:
If the filename is -, return standard input/output;
Otherwise, create and open a file, check for error, and return it.
And here comes my question: what is the best practice to design/implement such a function? What should it look like?
I first considered the old-school FILE*:
FILE *open_for_read(const char *filename)
{
if (strcmp(filename, "-") == 0)
{
return stdin;
}
else
{
auto fp = fopen(filename, "r");
if (fp == NULL)
{
throw runtime_error(filename);
}
return fp;
}
}
It works, and it's safe to fclose(stdin) later on (provided one remembers to), but then I would lose access to stream facilities such as std::getline.
So I figure, the modern C++ way would be to use smart pointers with streams. At first, I tried
unique_ptr<istream> open_for_read(const string& filename);
This works for ifstream but not for cin, because you can't delete cin. So I have to supply a custom deleter (that does nothing) for the cin case. But suddenly, it fails to compile, because apparently, when supplied a custom deleter, the unique_ptr becomes a different type.
Eventually, after many tweaks and searches on StackOverflow, this is the best I can come up with:
unique_ptr<istream, void (*)(istream *)> open_for_read(const string &filename)
{
if (filename == "-")
{
return {static_cast<istream *>(&cin), [](istream *) {}};
}
else
{
unique_ptr<istream, void (*)(istream *)> pifs{new ifstream(filename), [](istream *is)
{
delete static_cast<ifstream *>(is);
}};
if (!pifs->good())
{
throw runtime_error(filename);
}
return pifs;
}
}
It is type-safe and memory-safe (or at least I believe so; do correct me if I'm wrong), but it looks ugly and boilerplate-heavy, and above all, it was a headache just to get it to compile.
Am I doing it wrong and missing something here? There's gotta be a better way.
I would probably make it into
std::istream& open_for_read(std::ifstream& ifs, const std::string& filename) {
return filename == "-" ? std::cin : (ifs.open(filename), ifs);
}
and then supply an ifstream to the function.
std::ifstream ifs;
auto& is = open_for_read(ifs, the_filename);
std::string line;
// now use `is` everywhere:
if(!is) { /* error */ }
while(std::getline(is, line)) {
// ...
}
ifs will, if it was opened, be closed when it goes out of scope as usual.
A throwing version might look like this:
std::istream& open_for_read(std::ifstream& ifs, const std::string& filename) {
if(filename == "-") return std::cin;
ifs.open(filename);
if(!ifs) throw std::runtime_error(filename + ": " + std::strerror(errno));
return ifs;
}
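The question asked for standard input *or output*. The same pattern should carry over to the output side; here is a sketch, where the name `open_for_write` is hypothetical and not part of the answer above:

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical output counterpart: "-" means std::cout, otherwise the
// caller-owned ofstream is opened and returned. The caller's ofstream
// flushes and closes the file when it goes out of scope.
std::ostream& open_for_write(std::ofstream& ofs, const std::string& filename) {
    if (filename == "-") return std::cout;
    ofs.open(filename);
    if (!ofs)
        // errno is not guaranteed by the standard to be set by a failed
        // open, but common implementations do set it.
        throw std::runtime_error(filename + ": " + std::strerror(errno));
    return ofs;
}
```

As with the input version, the `std::ofstream` must outlive every use of the returned reference.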
As an alternative to Ted's answer (which I think I prefer, actually), you could make your custom deleter a bit smarter:
auto stream_deleter = [] (std::istream *stream) { if (stream != &std::cin) delete stream; };
using stream_ptr = std::unique_ptr <std::istream, decltype (stream_deleter)>;
stream_ptr open_for_read (const std::string& filename)
{
if (filename == "-")
return stream_ptr (&std::cin, stream_deleter);
auto sp = stream_ptr (new std::ifstream (filename), stream_deleter);
if (!sp->good ())
throw std::runtime_error (filename);
return sp;
}
Then the same deleter works for both cases and there are no typing problems.
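A variation on the smarter deleter above, offered as a sketch: replacing the lambda with a named function-object type makes the deleter default-constructible, so `stream_ptr(&std::cin)` needs no deleter argument and the type can be spelled anywhere. Reusing the name `stream_deleter` for a struct is an assumption, not the answer's code:

```cpp
#include <fstream>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

// Stateless, default-constructible deleter that never deletes std::cin.
struct stream_deleter {
    void operator()(std::istream* stream) const {
        if (stream != &std::cin)
            delete stream;  // safe: std::istream has a virtual destructor
    }
};

using stream_ptr = std::unique_ptr<std::istream, stream_deleter>;

stream_ptr open_for_read(const std::string& filename) {
    if (filename == "-")
        return stream_ptr(&std::cin);
    stream_ptr sp(new std::ifstream(filename));
    if (!sp->good())
        throw std::runtime_error(filename);
    return sp;
}
```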
Something I've used in the past is calling rdbuf to change the buffer of std::cin. That may be useful if you don't want to change existing code that uses std::cin. You have to be careful not to use the buffer after it has been destroyed, but that's nothing an RAII wrapper can't solve. Something like (not tested, not even proven correct):
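#include <cerrno>
#include <cstring>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

struct stream_redirector {
    // std::ios (basic_ios<char>) is the common base that provides rdbuf(),
    // so this accepts std::cin as well as other narrow streams.
    stream_redirector(std::ios& s, std::string const& filename,
                      std::ios_base::openmode mode = std::ios_base::in)
        : redirected_stream_(s)
    {
        if (filename != "-") {
            stream_.open(filename, mode);
            if (!stream_) // open failed: leave s untouched
                throw std::runtime_error(filename + ": " + std::strerror(errno));
            saved_buf_ = redirected_stream_.rdbuf();   // remember the original buffer
            redirected_stream_.rdbuf(stream_.rdbuf()); // use the file's buffer instead
        }
    }
    ~stream_redirector() {
        // Runs before stream_ is destroyed, so s never refers to a dead buffer.
        if (saved_buf_ != nullptr) {
            redirected_stream_.rdbuf(saved_buf_);
        }
    }
    stream_redirector(stream_redirector const&) = delete;
    stream_redirector& operator=(stream_redirector const&) = delete;
private:
    std::ios& redirected_stream_;
    std::streambuf* saved_buf_{nullptr};
    std::fstream stream_;
};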
To be used:
...
stream_redirector cin_redirector(std::cin, filename);
std::string str;
std::cin >> str;
...
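The rdbuf() swap at the heart of the redirector can be tried without touching the file system. This sketch feeds std::cin from an in-memory string instead of a file; the class name `cin_from_string` is hypothetical:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical RAII helper: while alive, std::cin reads from `text`.
// rdbuf(sb) installs the new buffer, clears the stream state, and returns
// the previous buffer, which the destructor puts back.
class cin_from_string {
public:
    explicit cin_from_string(const std::string& text)
        : source_(text), saved_(std::cin.rdbuf(source_.rdbuf())) {}
    ~cin_from_string() { std::cin.rdbuf(saved_); }
    cin_from_string(const cin_from_string&) = delete;
    cin_from_string& operator=(const cin_from_string&) = delete;
private:
    std::istringstream source_;  // declared first, so constructed before saved_
    std::streambuf* saved_;
};
```

This is also a convenient way to unit-test code that reads from std::cin.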
In a console program I am creating, I have a bit of code that parses a file. After each line is parsed, it is checked for syntax errors. If there is a syntax error, the program stops reading the file and goes on to the next part of the program. The problem is that the code is very messy: my only solutions so far are a series of nested if statements or a flat sequence of if statements. Nested ifs get messy very fast, and a flat sequence of ifs makes the program test several things that don't need to be tested. Here's some pseudocode of my problem (note that I am NOT using a return statement):
Pseudocode is shown instead of real code, as the real code is very large.
Nested if:
open file;
read line;
//Each if is testing something different
//Every error is different
if (line is valid)
{
read line;
if (line is valid)
{
read line;
if (line is valid)
{
do stuff;
}
else
error;
}
else
error;
}
else
error;
code that must be reached, even if there was an error;
Non-nested ifs:
bool fail = false;
open file;
read line;
//Each if is testing something different
//Every error is different
if (line is valid)
read line;
else
{
error;
fail = true;
}
if (!fail && line is valid)
read line;
else
{
error;
fail = true;
}
if (!fail && line is valid)
do stuff;
else
error;
//Note how fail is checked again and again, even after it has already become true
code that must be reached, even if there was an error;
I have looked at many different sites, but none of the answers I found fit my problem. This code does work at runtime, but as you can see it is not very elegant. Does anyone have a more readable/efficient approach to my problem? Any help is appreciated :)
Two options come to mind:
Option 1: chain reads and validations
This is similar to how std::istream extraction operators work. You could do something like this:
void your_function() {
std::ifstream file("some_file");
std::string line1, line2, line3;
if (std::getline(file, line1) &&
std::getline(file, line2) &&
std::getline(file, line3)) {
// do stuff
} else {
// error
}
// code that must be reached, even if there was an error;
}
Option 2: split into different functions
This can get a little long, but if you split things out right (and give everything a sane name), it can actually be very readable and debuggable.
bool step3(const std::string& line1,
const std::string& line2,
const std::string& line3) {
// do stuff
return true;
}
bool step2(std::ifstream& file,
const std::string& line1,
const std::string& line2) {
std::string line3;
return std::getline(file, line3) && step3(line1, line2, line3);
}
bool step1(std::ifstream& file,
const std::string& line1) {
std::string line2;
return std::getline(file, line2) && step2(file, line1, line2);
}
bool step0(std::ifstream& file) {
std::string line1;
return std::getline(file, line1) && step1(file, line1);
}
void your_function() {
std::ifstream file("some_file");
if (!step0(file)) {
// error
}
// code that must be reached, even if there was an error;
}
This example code is a little too trivial. If the line validation that occurs in each step is more complicated than std::getline's return value (which is often the case when doing real input validation), then this approach has the benefit of making that more readable. But if the input validation is as simple as checking std::getline, then the first option should be preferred.
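When each step needs real validation rather than just a successful std::getline, Option 1 can still stay flat by chaining reads and checks with `&&`. A sketch, where the validators `valid_header` and `valid_body` and their rules are hypothetical placeholders:

```cpp
#include <sstream>
#include <string>

// Hypothetical validators; real ones would check syntax and could report
// their own specific error before returning false.
bool valid_header(const std::string& line) { return line.rfind("HEADER", 0) == 0; }
bool valid_body(const std::string& line)   { return !line.empty(); }

// Each operand of && runs only if everything before it succeeded, so a
// failed read or a failed check stops the chain without nesting or a flag.
bool parse(std::istream& in) {
    std::string header, body;
    return std::getline(in, header) && valid_header(header)
        && std::getline(in, body)   && valid_body(body);
}
```

The caller then does `if (!parse(file)) { /* error */ }` followed by the code that must always run.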
Is there [...] a more readable/efficient approach on my problem
Step 1. Look around for a classic example of a text parser
Answer: a compiler, which parses text files and produces different kinds of results.
Step 2. Read some theory on how compilers work
There are lots of approaches and techniques, with books, online resources, and open-source examples, both simple and complicated.
Of course, you can skip this step if you are not that interested.
Step 3. Apply the theory to your problem
Looking through the theory, you will not miss terms such as "state machine", "automata", etc. Here is a brief explanation on Wikipedia:
https://en.wikipedia.org/wiki/Automata-based_programming
There is basically a ready-to-use example on the Wiki page:
#include <stdio.h>
enum states { before, inside, after };
void step(enum states *state, int c)
{
if(c == '\n') {
putchar('\n');
*state = before;
} else
switch(*state) {
case before:
if(c != ' ') {
putchar(c);
*state = inside;
}
break;
case inside:
if(c == ' ') {
*state = after;
} else {
putchar(c);
}
break;
case after:
break;
}
}
int main(void)
{
int c;
enum states state = before;
while((c = getchar()) != EOF) {
step(&state, c);
}
if(state != before)
putchar('\n');
return 0;
}
Or a C++ example with a table-driven state machine:
#include <stdio.h>
class StateMachine {
enum states { before = 0, inside = 1, after = 2 } state;
struct branch {
unsigned char new_state:2;
unsigned char should_putchar:1;
};
static struct branch the_table[3][3];
public:
StateMachine() : state(before) {}
void FeedChar(int c) {
int idx2 = (c == ' ') ? 0 : (c == '\n') ? 1 : 2;
struct branch *b = & the_table[state][idx2];
state = (enum states)(b->new_state);
if(b->should_putchar) putchar(c);
}
};
struct StateMachine::branch StateMachine::the_table[3][3] = {
/* ' ' '\n' others */
/* before */ { {before,0}, {before,1}, {inside,1} },
/* inside */ { {after, 0}, {before,1}, {inside,1} },
/* after */ { {after, 0}, {before,1}, {after, 0} }
};
int main(void)
{
int c;
StateMachine machine;
while((c = getchar()) != EOF)
machine.FeedChar(c);
return 0;
}
In your case, of course, you would feed it lines instead of chars.
This technique scales up to complicated compilers and has been proven in countless implementations. So if you are looking for a "right" approach, here it is.
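Applied to the question, a line-fed state machine might look like the sketch below. The state names, the placeholder rule "a line is valid if it is non-empty", and the names `LineParser` and `run` are assumptions for illustration, not part of the Wikipedia example:

```cpp
#include <sstream>
#include <string>

// Hypothetical line-driven state machine for a three-line format: each
// state names the line expected next; done and error are terminal.
class LineParser {
public:
    enum class State { expect_first, expect_second, expect_third, done, error };

    void feed(const std::string& line) {
        switch (state_) {
        case State::expect_first:
            state_ = line.empty() ? State::error : State::expect_second;
            break;
        case State::expect_second:
            state_ = line.empty() ? State::error : State::expect_third;
            break;
        case State::expect_third:
            state_ = line.empty() ? State::error : State::done;
            break;
        case State::done:
        case State::error:
            break;  // terminal states ignore further input
        }
    }
    State state() const { return state_; }
    bool finished() const { return state_ == State::done || state_ == State::error; }

private:
    State state_ = State::expect_first;
};

// Feeds lines until a terminal state or end of input; the code that must
// always run can simply follow the call.
LineParser::State run(std::istream& in) {
    LineParser parser;
    std::string line;
    while (!parser.finished() && std::getline(in, line))
        parser.feed(line);
    return parser.state();
}
```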
A common modern practice is an early return combined with RAII. Basically, the code that must always run goes in the destructor of a class, and your function creates a local object of that class. When you hit an error, you exit the function early (either with an exception or a plain return), and the destructor of that local object runs the code that must happen.
The code will look something like this:
class Guard
{
...
Guard();
~Guard() { /* code that must happen */ }
...
};
void someFunction()
{
Guard localGuard;
...
open file;
read line;
//Each if is testing something different
//Every error is different
if (!line)
{
return;
}
read line;
if (!line)
{
return;
}
...
}
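A compilable version of that outline, as a sketch: here the guard stores a callable instead of hard-coding the cleanup, and the parsing rule in the hypothetical `parse_lines` (two non-empty lines) is only a placeholder:

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <utility>

// Minimal scope guard (a sketch, not a library type): runs the stored
// callable when the enclosing scope exits, by return or by exception.
class Guard {
public:
    explicit Guard(std::function<void()> on_exit) : on_exit_(std::move(on_exit)) {}
    ~Guard() { if (on_exit_) on_exit_(); }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
private:
    std::function<void()> on_exit_;
};

// Each failed check returns early; the guard still runs the "code that
// must be reached" (here it just records that it ran).
bool parse_lines(std::istream& in, bool& cleanup_ran) {
    Guard guard([&cleanup_ran] { cleanup_ran = true; });
    std::string line;
    if (!std::getline(in, line) || line.empty()) return false;
    if (!std::getline(in, line) || line.empty()) return false;
    return true;
}
```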
What is the proper C++11 way to extract a set of characters out of a stringstream without using Boost?
I want to do it without copying, if possible, because this is used in a performance-critical data loop. It seems, though, that std::string does not allow direct access to its data.
For example, the code below performs a substring copy out of a stringstream:
inline std::string left(std::stringstream& ss, uint32_t count) {
char* buffer = new char[count];
ss.get(buffer, count); // First copy: stream into buffer
std::string str(buffer); // Second copy performed here
delete[] buffer;
return str;
}
Should I even be using char *buffer in C++11?
How do I get around making a second copy?
My understanding is that vectors initialize every character, so I want to avoid that.
Also, this needs to be passed into a function which accepts const char *, so now after this runs I am forced to do a .c_str(). Does this also make a copy?
It would be nice to be able to pass back a const char *, but that seems to go against the "proper" c++11 style.
To understand what I am trying to do, here is "effectively" what I want to use it for:
fprintf( stderr, "Data: [%s]...", left(ststream, 255) );
But C++11 forces me to write:
fprintf( stderr, "Data: [%s]...", left(str_data, 255).c_str() );
How many copies of that string am I making here?
How can I reduce it to only a single copy out of the stringstream?
You could use something like the approach described in "How to create a std::string directly from a char* array without copying?"
Basically, create a string, call resize() on it with the size passed to your function, and then pass a pointer to its first character to std::stringstream::get(). You end up with only one copy. Because get() stores at most count - 1 characters followed by a null, shrink the string to gcount() afterwards:
inline std::string left(std::stringstream& ss, uint32_t count) {
std::string str;
str.resize(count);
ss.get(&str[0], count); // at most count - 1 chars, then '\0'
str.resize(static_cast<std::size_t>(ss.gcount())); // drop the unused tail
return str;
}
My suggestion:
Create the std::string to be returned by giving it the size.
Read the characters one by one from the stringstream and set the values in the std::string.
Here's what the function looks like:
inline std::string left(std::stringstream& ss, uint32_t count) {
std::string str(count, '\0');
uint32_t i = 0;
for (; i < count; ++i)
{
int c = ss.get();
if ( c != EOF )
{
str[i] = static_cast<char>(c);
}
else
{
break;
}
}
str.resize(i); // trim unused slots if the stream ran out early
return str;
}
R Sahu, this I like! Obvious now that I see it done. ;-)
I do have one modification, though (and I pass a shared_ptr to the stream, which is what I actually had in my version):
In your initializer, you are filling every character with nulls, which isn't needed, so I propose this tweak:
inline std::string left(std::shared_ptr<std::stringstream> ss, uint32_t count) {
std::string str;
str.reserve(count); // one allocation, but no characters constructed yet
for(uint32_t i = 0; i < count; ++i) {
int c = ss->get();
if(c != EOF) {
str.push_back(static_cast<char>(c)); // str[i] would be out of range after reserve()
} else {
break;
}
}
return str; // c_str() still provides the terminating null
}
Now nothing is pre-filled with nulls.
Thanks R Sahu!
If the purpose of this function is solely for passing to fprintf or another C-style stream, then you could avoid allocation completely by doing the following:
void left(FILE *out, std::stringstream &in, size_t count)
{
in.seekg(0);
char ch;
while ( count-- && in.get(ch) )
fputc(static_cast<unsigned char>(ch), out); // fputc(int ch, FILE *stream)
}
Usage:
fprintf( stderr, "Data: [" );
left(stderr, stream, 255);
fprintf( stderr, "] ...\n");
Bear in mind that another seekg will be required if you try to use the stream reading functions on the stringstream later; and it would not surprise me if this is the same speed or slower than the options involving str().
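Along the same lines, the std::string can be skipped entirely by reading into a stack buffer and giving printf an explicit length through the `%.*s` precision, so no null terminator is needed. A sketch; `left_into` and `print_left` are hypothetical names, and unlike the answer above this reads from the current position rather than seeking to 0:

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>

// Reads up to `count` characters into the caller's buffer and returns how
// many were read. The buffer is NOT null-terminated.
std::size_t left_into(std::istream& in, char* buf, std::size_t count) {
    in.read(buf, static_cast<std::streamsize>(count));
    return static_cast<std::size_t>(in.gcount());
}

// Example use: print at most 255 characters with an explicit length.
void print_left(std::FILE* out, std::istream& in) {
    char buf[255];
    std::size_t n = left_into(in, buf, sizeof buf);
    std::fprintf(out, "Data: [%.*s]...\n", static_cast<int>(n), buf);
}
```

This performs exactly one copy (stream to buffer) and no heap allocation.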
I am trying to work with ifstream and istringstream using only one variable. I know that both of them are derived from istream, so I am trying to make a single variable of type istream and then initialize it depending on some input.
The real problem is that I ask the user to input either a file path or the content of a file, and then I read it line by line. I tried this:
istream * stream;
if(isFile){
ifstream a("fileOrContent");
stream = &a;
} else {
istringstream a("fileOrContent");
stream = &a;
}
getline(stream,line)
// do something with line
I also tried this
ifstream stream;
if(isFile){
ifstream stream("fileOrContent");
} else {
istringstream stream("fileOrContent");
}
getline(stream,line)
// do something with line
Currently, I am using two full copies of my code, one for each case. Any suggestions on how I might do it?
Thank you
How about refactoring your code like this:
void process(std::istream & is)
{
// ....
}
int main()
{
if (isFile)
{
std::ifstream is("foo.txt");
process(is);
}
else
{
std::istringstream is(str);
process(is);
}
}
What you are trying to do is something like this:
istream * stream;
if(isFile){
stream = new ifstream("fileOrContent");
} else {
stream = new istringstream("fileOrContent");
}
getline(*stream,line);
That said, you should use a smart pointer to hold the istream pointer to avoid memory leaks, as pointed out by πάντα ῥεῖ.
I just did something similar to this. You are almost there:
if(isFile) {
stream = new ifstream("whatever");
} else {
stream = new istringstream("whatever");
}
getline(*stream, line);
Make sure to delete it, though.
If you don't want to manage the memory yourself, you can use a unique_ptr, which will automatically free the memory when it goes out of scope:
#include <memory>
std::unique_ptr<std::istream> stream;
if(isFile){
stream = std::unique_ptr<std::istream>(new ifstream("fileOrContent"));
} else {
stream = std::unique_ptr<std::istream>(new istringstream("fileOrContent"));
}
getline(*stream,line);
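Wrapped in a function, the unique_ptr approach could look like this sketch. `open_input` and `count_lines` are hypothetical names, and std::make_unique assumes C++14:

```cpp
#include <fstream>
#include <istream>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical helper: returns a file stream or a string stream behind the
// common std::istream interface. Deleting through istream* is safe because
// the standard stream classes have virtual destructors.
std::unique_ptr<std::istream> open_input(bool isFile, const std::string& fileOrContent) {
    if (isFile)
        return std::make_unique<std::ifstream>(fileOrContent);
    return std::make_unique<std::istringstream>(fileOrContent);
}

// The processing code sees only std::istream&, so it is written once.
int count_lines(std::istream& in) {
    int n = 0;
    std::string line;
    while (std::getline(in, line))
        ++n;
    return n;
}
```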
Put the getline and the "do something with line" in a function which takes a std::istream & argument.
Then create either an ifstream or an istringstream, and place the function call into the if/else branches.
void DoSomethingWithLine(std::istream &stream)
{
std::string line;
getline(stream,line);
// do something with line
}
if (isFile){
ifstream a("fileOrContent");
DoSomethingWithLine(a);
} else {
istringstream a("fileOrContent");
DoSomethingWithLine(a);
}
It won't get much simpler than this.
I am trying to extract a string from an istream using strings as delimiters, yet I haven't found any istream operations that behave like find() or substr().
Here is an example istream content:
delim_oneFUUBARdelim_two
and my goal is to get FUUBAR into a string with as little workarounds as possible.
My current solution is to copy all of the istream's content into a string (using this solution) and then extract with string operations. Is there a way to avoid this unnecessary copying and read only as much from the istream as needed, so that all content after the delimited string is preserved in case there are more to be found in the same fashion?
You can easily create a type that will consume the expected separator or delimiter:
struct Text
{
std::string t_;
};
std::istream& operator>>(std::istream& is, const Text& t)
{
is >> std::ws; // skip leading whitespace before matching
for (char c: t.t_)
{
if (is.peek() != c)
{
is.setstate(std::ios::failbit);
break;
}
is.get(); // throw away known-matching char
}
return is;
}
This suffices when the previous stream extraction naturally stops without consuming the delimiter (e.g. an int extraction followed by a delimiter that doesn't start with a digit), which will typically be the case unless the previous extraction is of a std::string. Single-character delimiters can be passed to getline, but say your delimiter is "</block>" and the stream contains "<black>metallic</black></block>42"; you'd want something to extract "<black>metallic</black>" into a string, throw away the "</block>" delimiter, and leave the "42" on the stream:
struct Until_Delim {
Until_Delim(std::string& s, std::string delim) : s_(s), delim_(delim) { }
std::string& s_;
std::string delim_;
};
std::istream& operator>>(std::istream& is, const Until_Delim& ud)
{
std::istream::sentry sentry(is);
size_t in_delim = 0;
for (char c = is.get(); is; c = is.get())
{
if (c == ud.delim_[in_delim])
{
if (++in_delim == ud.delim_.size())
break;
continue;
}
if (in_delim) // was part-way into delimiter match...
{
ud.s_.append(ud.delim_, 0, in_delim);
in_delim = 0;
}
ud.s_ += c;
}
// may need to trim trailing whitespace...
if (is.flags() & std::ios_base::skipws)
while (!ud.s_.empty() && std::isspace(ud.s_.back()))
ud.s_.pop_back();
return is;
}
This can then be used as in:
string a_string;
if (some_stream >> Until_Delim(a_string, "</block>") >> whatevers_after)
...
This notation might seem a bit hackish, but there's precedent in the Standard Library's std::quoted().
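A simpler alternative technique, sketched here as a free function (the name `read_until` is hypothetical): append one character at a time and check whether the accumulated text now ends with the delimiter. Checking the suffix after every character also copes with delimiters whose prefix repeats (e.g. "aab" inside "aaab"), and only the characters up to the delimiter are consumed. Unlike Until_Delim, it does no whitespace handling, and it assumes a non-empty delimiter:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads into `out` up to (not including) `delim`, consuming the delimiter.
// If the delimiter never appears, in.get() eventually sets failbit.
std::istream& read_until(std::istream& in, std::string& out, const std::string& delim) {
    out.clear();
    char c;
    while (in.get(c)) {
        out += c;
        if (out.size() >= delim.size() &&
            out.compare(out.size() - delim.size(), delim.size(), delim) == 0) {
            out.erase(out.size() - delim.size());  // strip the delimiter
            return in;
        }
    }
    return in;
}
```

For the question's input, one call with "delim_one" and a second with "delim_two" yields "FUUBAR" and leaves the rest of the stream untouched.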
Standard streams are equipped with locales that can do classification, namely the std::ctype<> facet. We can use this facet to ignore() characters in a stream while a certain classification is not present in the next available character. Here's a working example:
#include <iostream>
#include <locale>
#include <sstream>
using mask = std::ctype_base::mask;
template<mask m>
void scan_classification(std::istream& is)
{
auto& ctype = std::use_facet<std::ctype<char>>(is.getloc());
while (is.peek() != std::char_traits<char>::eof() && !ctype.is(m, is.peek()))
is.ignore();
}
int main()
{
std::istringstream iss("some_string_delimiter3.1415another_string");
double d;
scan_classification<std::ctype_base::digit>(iss);
if (iss >> d)
std::cout << d; // "3.1415" (std::to_string(d) would give "3.141500")
}
struct T
{
void eat(std::string const& segment)
{
buffer << segment;
std::string sentence;
while (std::getline(buffer, sentence))
std::cout << "[" << sentence.size() << "]";
}
std::stringstream buffer;
};
int main() {
T t;
t.eat("A\r\nB\nC\nD");
// ^^ ^ ^ ^
}
// Actual output: [2][1][1][1]
// Desired output: [1][1][1][1]
I would like the std::stringstream to strip that carriage return for me (and would prefer not to have to copy and modify segment).
How might I go about this? I would have thought that this would happen anyway, on Linux, for a stream in text mode... but perhaps that mechanism is in the logic of file streams.
This is a general problem on Unix machines when reading files created on
a Windows machine. I would suggest doing the clean-up at the input
level.
One of the best solutions I've found when reading line-based files is to
create a class something like:
class Line
{
std::string myText;
public:
friend std::istream& operator>>( std::istream& source, Line& dest )
{
std::getline( source, dest.myText );
if ( source ) {
dest.myText.erase(
std::remove( dest.myText.begin(), dest.myText.end(), '\015' ),
dest.myText.end() );
}
return source;
}
operator std::string() const
{
return myText;
}
};
You can add other functions as necessary: the automatic type conversion
doesn't come into play when trying to match templates, for example, and I found it
useful to add friends to wrap boost::regex_match.
I use this (without the '\015' removal) even when I don't have to
worry about Windows/Linux differences; it supports reading lines using
std::istream_iterator<Line>, for example.
Another solution would be to use a filtering streambuf, inserted into
the input stream. This is also very simple:
class RemoveCRStreambuf : public std::streambuf
{
std::streambuf* mySource;
char myBuffer; // One char buffer required for input.
protected:
int underflow()
{
int results = mySource->sbumpc();
while ( results == '\015' ) {
results = mySource->sbumpc();
}
if ( results != EOF ) {
myBuffer = results;
setg( &myBuffer, &myBuffer + 1, &myBuffer + 1 );
}
return results;
}
public:
RemoveCRStreambuf( std::streambuf* source )
: mySource( source )
{
}
};
To insert it:
std::streambuf* originalSB = source->rdbuf();
RemoveCRStreambuf newSB( originalSB );
source->rdbuf( &newSB );
// Do input here...
source->rdbuf( originalSB ); // Restore...
(Obviously, using some sort of RAII for the restoration would be
preferable. My own filtering streambufs have a constructor which takes
an std::istream; they save a pointer to this as well, and restore the
streambuf in their destructor.)
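The RAII variant described in that last paragraph might be sketched as follows; the class name `RemoveCRFilter` is hypothetical, and the one-character get area mirrors RemoveCRStreambuf above:

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Installs itself into `owner` on construction and restores the original
// streambuf on destruction, filtering out '\r' in between.
class RemoveCRFilter : public std::streambuf {
public:
    explicit RemoveCRFilter(std::istream& owner)
        : owner_(owner), source_(owner.rdbuf()) {
        owner_.rdbuf(this);
    }
    ~RemoveCRFilter() override { owner_.rdbuf(source_); }
    RemoveCRFilter(const RemoveCRFilter&) = delete;
    RemoveCRFilter& operator=(const RemoveCRFilter&) = delete;

protected:
    int_type underflow() override {
        int_type c = source_->sbumpc();
        while (c == traits_type::to_int_type('\r'))  // skip carriage returns
            c = source_->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::eof();
        buffer_ = traits_type::to_char_type(c);
        setg(&buffer_, &buffer_, &buffer_ + 1);      // one-character get area
        return c;
    }

private:
    std::istream& owner_;
    std::streambuf* source_;
    char buffer_ = 0;
};
```

With the question's input "A\r\nB\nC\nD", reading lines through the filtered stream should give four lines of length 1.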