Wrapping std::getline() - c++

I am struggling with the problem of reading input from file on a per-line basis, in a cross-platform way.
Different platforms use different sequences of characters to represent a new line/end of line.
std::getline doesn't deal with these in a cross platform way.
What do I mean by this?
std::getline changes its behavior depending on the platform on which an executable is compiled. On Windows platforms, it expects to see CRLF to denote line endings. On Linux, it expects just LF.
It does not handle cases where a file contains a line ending which is not what the platform expects. For example a file created on a Windows machine is likely to have CRLF line endings. If that file is copied to a Linux machine without changing the line ending format then std::getline "breaks".
It seemed to me that the easiest way to work around this would be to create a new function which wraps std::getline. Something like this:
return_type GetLine(stream_type ifs, string_type s)
{
return_type ret = std::getline(ifs, s);
s.erase(std::remove(s.begin(), s.end(), '\r' ), s.end());
s.erase(std::remove(s.begin(), s.end(), '\n' ), s.end());
return ret;
}
However at this point I'm stuck. From some searching, although getline returns a stream object (?) it also has an implicit cast-to-bool operator.
I could force return_type to be bool, but then this prevents my wrapper function from returning a stream object, if such a thing were to be required in future.
I also haven't been able to make sense of the STL templates in a sufficient enough way to determine what stream_type and string_type should be. I can force them to be std::ifstream and std::string, but I think this decision would also make the function less generic.
How should I proceed here?

You should take the stream by reference because streams typically cannot be copied. Also the string should be passed by reference because you want to write to it.
To be generic you can use the same interface as std::getline does. As you want to use specific delimiters, they need not be passed as arguments. If you make the function a template then it will work with any stream that also works for std::getline:
#include <iostream>
#include <sstream>
#include <string>
template< class CharT, class Traits, class Allocator >
std::basic_istream<CharT,Traits>& my_getline(
std::basic_istream<CharT,Traits>& input,
std::basic_string<CharT,Traits,Allocator>& str)
{
return std::getline(input,str);
}
int main() {
std::istringstream s{"hello world"};
std::string foo;
my_getline(s,foo);
std::cout << foo;
}
However at this point I'm stuck. From some searching, although getline returns a stream object (?) it also has an implicit cast-to-bool operator.
It's not getline that converts to bool but the stream returned by getline can be converted to bool. Your line is almost correct, but it needs to be a reference (and you need not spell out the type explicitly):
auto& ret = std::getline(ifs, s);
// more code
return ret;
Note that I didn't address the actual issue of extracting characters until any of the delimiters is encountered (rather than only the platform specific newline that you already get with bare std::getline).

Related

Reading from file without using string

I am doing a school project where we must not use std::string. How can I do this? In the txt file the data are separated with a ";", and we do not know the length of the words.
Example:
apple1;apple2;apple3
mango1;mango2;mango3
I tried a lot of things, but nothing worked, always got errors.
I tried using getline, but since it is for string it did not work.
I also tried to reload the operator<< but it did not help.
There are two entirely separate getline()'s. One is std::getline(), which takes a std::string as a parameter.
But there's also a member function in std::istream, which works with an array of chars instead of a std::string, eg:
#include <sstream>
#include <iostream>
int main() {
std::istringstream infile{"apple1;apple2;apple3"};
char buffer[256];
while (infile.getline(buffer, sizeof(buffer), ';'))
std::cout << buffer << "\n";
}
Result:
apple1
apple2
apple3
Note: while this fits the school prohibition against using std::string, there's almost no other situation where it makes sense.

Why is getline written so strangely?

I don't understand the design decisions behind the C++ getline function.
Why does it take a stream and a string by reference as arguments, only to return the same stream that was passed in? It seems more intuitive to only take the stream as an argument, then return the string that was read. Returning the same stream lets you chain the call, but would anyone really want to use getline(getline(stream, x), y)?
Additionally, why is the function not in the std namespace like the rest of the standard library?
If the function returned a string, there would be no way of indicating that the read failed, as all string values are valid values that could be returned by this (or any other) function. On the other hand, a stream has lots of error indicator flags that can be tested by the code that calls getline. So people can write code like:
while( std::getline( std::cin, somestring )) {
// do stuff with somestring
}
and it is hard to see how you could write similar code if getline returned a string.
why is the function not in the std namespace like the rest of the standard library?
It is in the std namespace - what makes you think otherwise?
Why does it take a stream and a string by reference as arguments, only to return the same stream that was passed in?
It is a common pattern in the stream library to do that. It means you can test the operation being performed as you perform it. For example:
std::string line;
while(std::getline(std::cin, line))
{
// use line here because we know the read succeeded
}
You can also make succinct parsers by "chaining" stream functions:
std::string key, value;
if(std::getline(std::getline(in, key, '='), value))
my_map[key] = value;
It seems more intuitive to only take the stream as an argument, then return the string that was read.
The problem with returning a new string every call is that you are constantly allocating new memory for them instead of reusing the memory already allocated to the string you passed in or that it gained while iterating through a loop.
// Here line will not need to allocate memory every time
// through the loop. Only when it finds a longer line than
// it has capacity for:
std::string line;
while(std::getline(std::cin, line))
{
// use line here because we know the read succeeded
}

How to parse a sequence of integers stored in a text buffer?

Parsing text consisting of a sequence of integers from a stream in C++ is easy enough: just decode them. When the data is received somehow and is readily available within a program, e.g., receiving a base64 encoded text (the decoding isn't the problem), the situation is a bit different. The data is sitting in a buffer within the program and only needs to be decoded, not read. Of course, a std::istringstream could be used:
std::vector<int> parse_text(char* begin, char* end) {
std::istringstream in(std::string(begin, end));
return std::vector<int>(std::istream_iterator<int>(in),
std::istream_iterator<int>());
}
Since a lot of these buffers are received and they can be fairly big, it is desirable to not copy the actual content of character array and, ideally, to also avoid creating a stream for each buffer. Thus, the question becomes:
Given a buffer of chars containing a sequences of (space separated; dealing with other separators is easily done, e.g., using a suitable manipulator) integers how can they be decoded without copying the sequence and, if possible, without creating even an std::istream?
Avoiding a copy of the buffer is easily done with a custom stream buffer which simply sets of the get area to use the buffer. The stream buffer actually doesn't even need to override any of the virtual functions and would just set up the internal buffer:
class imemstream
: private virtual std::streambuf
, public std::istream
{
public:
imemstream(char* begin, char* end)
: std::streambuf()
, std::istream(static_cast<std::streambuf*>(this))
{
this->setg(begin, begin, end);
}
};
std::vector<int> parse_data_via_istream(char* begin, char* end)
{
imemstream in(begin, end);
return std::vector<int>(std::istream_iterator<int>(in),
std::istream_iterator<int>());
}
This approach avoids copying the stream and uses the ready made std::istream functionality. However, it does create a stream object. With a suitable update function the stream stream/stream buffer can be extended to reset the buffer and process multiple buffers.
To avoid creation of the stream, the underlying functionality from std::num_get<...> could be used. The actual parsing is done by one of the std::locale facets. The numeric parsing for std::istream is done by std::num_get<char, std::istreambuf_iterator<char>>. This facet isn't much help as it uses a sequence specified by std::istreambuf_iterator<char>s but a std::num_get<char, char const*> facet can be instantiated. It won't be in part of the default std::locale but it easy to create a corresponding std::locale and install it, e.g., as the global std::locale object first thing in main():
int main()
{
std::locale::global(std::locale(std::locale(),
new std::num_get<char, char const*>()));
...
Note that the std::locale object will clean-up the added facet, i.e., there is no need to add any clean-up code: the facets are reference counted and released when the last std::locale holding a particular facet disappears. To actually use the facet it, unfortunately, needs an std::ios_base object which is can only really be obtained from some stream object. However, any stream can be used (although in a multi-threaded system it should probably be a separate stream object per stream to avoid accidental race conditions):
char const* skipspace(char const* it, char const* end)
{
return std::find_if(it, end,
[](unsigned char c){ return !std::isspace(c); });
}
std::vector<int> parse_data_via_istream(std::ios_base& fmt,
char const* it, char const* end)
{
std::vector<int> rc;
std::num_get<char, char const*> const& ng
= std::use_facet<std::num_get<char, char const*>>(std::locale());
std::ios_base::iostate error;
for (long tmp;
(it = ng.get(skipspace(it, end), end, fmt, error, tmp))
, error == std::ios_base::goodbit; ) {
rc.push_back(tmp);
}
return rc;
}
Most of this just about a bit of error handling and skipping leading whitespace: mostly, std::istream provides facilities to automatically skip whitespace for formatted input and deals with the necessary error protocol. There is potentially a small advantage of the approach outlined above with respect to getting the facet just once per buffer and avoiding creation of a std::istream::sentry object as well as avoiding creation of a stream. Of course, the code assumes that some stream can be used to pass it in as its std::ios_base& subobject to provide parsing flags like the base to be used.
OK, this is quite a bit of code for something which strtol() could mostly do, too. The approach using std::num_get<char, char const*> has some flexibility which isn't offered by strtol():
Since the std::locale's facet are used which can be overridden to parse arbitrary formats of representation, e.g., Roman numerals, it more flexible with respect to input formats.
It is easy to set up use of thousands separators or change the representation of the decimal point (just change std::numpunct<char> in std::locale used by fmt to set these up).
The buffer doesn't have to be null-terminated. For example, a contiguous sequence of character made up of 8 digit values can be parsed by feeding it and it+8 as the range when calling std::num_get<char, char const*>::get().
However, strtol() is probably a good approach for most uses. On the other hand, the above provides an alternative which may be useful in some contexts.

storing text to string from a file that contains spaces

So I've been doing algorithms in C++ for about 3 months now as a hobby. I've never had a problem I couldn't solve by googleing up until now. I'm trying to read from a text file that will be converted into a hash table, but when i try and capture the data from a file it ends at a space. here's the code
#include <iostream>
#include <fstream>
int main()
{
using namespace std;
ifstream file("this.hash");
file >> noskipws;
string thing;
file >> thing;
cout << thing << endl;
return 0;
}
I'm aware of the noskipws flag i just don't know how to properly implement it
When using the formatted input operator for std::string it always stops at what the stream considers to be whitespace. Using the std::locale's character classification facet std::ctype<char> you can redefine what space means. It's a bit involved, though.
If you want to read up to a specific separator, you can use std::getline(), possibly specifying the separator you are interested in, e.g.:
std::string value;
if (std::getline(in, value, ',')) { ... }
reads character until it finds a comma or the end of the file is reached and stores the characters up to the separator in value.
If you just want to read the entire file, one way to do is to use
std::ifstream in(file.c_str());
std::string all((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
I think the best tool for what you're trying to do is get, getline or read. Now those all use char buffers rather than std::strings, so need a bit more thought, but they're quite straightforward really. (edit: std::getline( file, string ), as pointed out by Dietmar Kühl, uses c++ strings rather than character buffers, so I would actually recommend that. Then you won't need to worry about maximum line lengths)
Here's an example which will loop through the entire file:
#include <iostream>
int main () {
char buffer[1024]; // line length is limited to 1023 bytes
std::ifstream file( "this.hash" );
while( file.good( ) ) {
file.getline( buffer, sizeof( buffer ) );
std::string line( buffer ); // convert to c++ string for convenience
// do something with the line
}
return 0;
}
(note that line length is limited to 1023 bytes, and if a line is longer it will be broken into 2 reads. When it's a true newline, you'll see a \n character at the end of the string)
Of course, if you a maximum length for your file in advance, you can just set the buffer accordingly and do away with the loop. If the buffer needs to be very big (more than a couple of kilobytes), you should probably use new char[size] and delete[] instead of the static array, to avoid stack overflows.
and here's a reference page: http://www.cplusplus.com/reference/fstream/ifstream/

Conversion problems in C++ (string expected)

I have a function that I cannot touch, Is a "log maker", It puts something to print in a file an show it up when I run the file. The problem is that the function only gets const string so if I want to print something I have to convert everything in this data type (I cannot use cout).
itoa & atoi functions are not standard functions so I cannot use it neither. C++ is very "special" with data types and doesn't accept conversions really easy, so this is my question:
How can I convert everytype of data into string for the log purposes?
Probably I should check data type on a function to convert things and returning them into a stringstream (witch I have to convert into a string, of course).
So, any advice on how to do that?
boost::lexical_cast encapsulates the use of ostringstream, so you
could use that. Otherwise, the code isn't that difficult:
template<typename T>
std::string
toString( T const& object )
{
std::ostringstream results;
results << object;
return results.str();
}
(There's no reason to use stringstream here; ostringstream is largely sufficient.
You can use
std::stringstream
or
boost lexical_cast<>
Yes, if you want arbitrary type in string representation stringstream intermediate sounds like a solution.
I assume the functions expects a const std::string & ?
Your approach with std::stringstream is correct. Alternatively you could simply write a toString() method for the class you wish to directly output. However, usually when one wants to output objects to a file, overloads the << operator for that particular type.