Matching a string at the beginning of an input stream - C++

I have implemented a simple input stream manipulator to match the next n chars in an input stream against a given string. However, I am not sure if this is the best way to do this. Any hints?
class MatchString {
private:
std::string mString;
public:
MatchString(const std::string &str) {
mString = str;
}
std::istream& operator()(std::istream& is) const {
// Allocate a string buffer, ...
char *buffer = new char[mString.length()];
// ... read next n chars into the buffer ...
is.read(buffer, mString.length());
// ... and compare them with given string.
if(strncmp(buffer, mString.c_str(), mString.length())) {
throw MismatchException(mString);
}
delete[] buffer;
return is;
}
};
inline MatchString match(const std::string &str) {
return MatchString(str);
}
inline std::istream& operator>>(std::istream& is, const MatchString& matchStr) {
return matchStr(is);
}
EDIT:
A solution consuming the matched chars could be implemented based on the suggestion of user673679:
class MatchString {
...
std::istream& operator()(std::istream& is) const {
// Match the next n chars.
std::for_each(mString.begin(), mString.end(),
[&](const char c) {
if(is.get() != c) {
throw MismatchException(mString);
}
});
return is;
}
};
How would I implement this if I don't want to consume the chars?
EDIT II:
Here is another solution, mentioned by fjardon:
class MatchString {
...
std::istream& operator()(std::istream& is) const {
// Match the next n chars.
if(std::mismatch(mString.begin(), mString.end(),
std::istreambuf_iterator<char>(is)).first != mString.end()) {
throw MismatchException(mString);
}
return is;
}
};
EDIT III:
Finally got a working function that reverts the consumption if the string doesn't match:
class MatchString {
...
std::istream& operator()(std::istream& is) const {
// Match the next n chars.
std::streampos oldPos = is.tellg();
if(std::mismatch(mString.begin(), mString.end(),
std::istreambuf_iterator<char>(is)).first != mString.end()) {
is.seekg(oldPos);
throw MismatchException(mString);
}
return is;
}
};

Instead of allocating and copying the whole string from the stream, you could just check one character at a time and avoid allocating the buffer completely:
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
auto mString = std::string("foobar");
std::istream& match(std::istream& is) {
for (auto c : mString)
if (c != is.get())
throw std::runtime_error("nope");
return is;
}
int main()
{
auto input = "foobarbaz";
auto stream = std::istringstream(input);
match(stream);
std::cout << "done!" << std::endl;
}
You should also add error checking for is.get() (or .read() in your original code).

Related

Extracting a specific thing from a string

I have a string in the format <a,b>, which represents an edge in a directed graph (a is source and b is target). a and b are also strings themselves (for example, a can be "Square" and b is "Circle").
I need to build a function which extracts a, and another function which extracts b. So the signature will be:
string getSource(string edge); // will return a
string getTarget(string edge); // will return b
I am using the std::string library to represent those strings.
I know that I need to find a way to find the ',' separating them in the middle of the string, and get rid of the '<' and '>'. But I couldn't find a function in std::string that will help me to do that.
How would you go about doing this?
This seems to be a good use case for a regex:
std::regex sd {R"(<(.*),(.*)>)"};
and then your functions can be written as:
std::string getSource(std::string const & edge) {
std::smatch m;
std::regex_match(edge, m, sd);
return m[1].str();
}
and in getTarget you would return m[2].str();.
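Put together, and with the result of std::regex_match actually checked, the regex approach might look like the sketch below. The helper edgePart is a name I made up, and the pattern is tightened from `.*` to `[^,>]*` on the assumption that the names never contain ',' or '>'.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns capture group `index` (1 = source, 2 = target)
// of an edge written as "<a,b>", and throws if the format does not match.
inline std::string edgePart(const std::string& edge, std::size_t index) {
    // [^,>]* keeps each name from swallowing the separator or the closing '>'.
    static const std::regex pattern{R"(<([^,>]*),([^,>]*)>)"};
    std::smatch m;
    if (!std::regex_match(edge, m, pattern))
        throw std::invalid_argument("edge is not of the form <a,b>");
    return m[index].str();
}

inline std::string getSource(const std::string& edge) { return edgePart(edge, 1); }
inline std::string getTarget(const std::string& edge) { return edgePart(edge, 2); }
```

Throwing on a malformed edge is one possible policy; returning an empty string would work just as well if that suits the caller better.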
If you know for certain that the string is in the correct format, this is just a matter of using std::find to locate the characters of interest and then constructing a new string from those iterators. For example:
std::string getSource(std::string const & edge) {
return {
std::next(std::find(std::begin(edge), std::end(edge), '<')),
std::find(std::begin(edge), std::end(edge), ',')
};
}
std::string getTarget(std::string const & edge) {
return {
std::next(std::find(std::begin(edge), std::end(edge), ',')),
std::find(std::begin(edge), std::end(edge), '>')
};
}
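If the strings are not in the correct format then these functions could exhibit undefined behavior. A helper function that validates the range catches the common malformed cases (it cannot help when the first std::find returns the end iterator, because calling std::next on that is already undefined):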
template <typename T>
std::string checkedRangeToString(T begin, T end) {
if (begin >= end) {
// Bad format... throw an exception or return an empty string?
return "";
}
return {begin, end};
}
std::string getSource(std::string const & edge) {
return checkedRangeToString(
std::next(std::find(std::begin(edge), std::end(edge), '<')),
std::find(std::begin(edge), std::end(edge), ',')
);
}
std::string getTarget(std::string const & edge) {
return checkedRangeToString(
std::next(std::find(std::begin(edge), std::end(edge), ',')),
std::find(std::begin(edge), std::end(edge), '>')
);
}
This sounds like it belongs in a class whose constructor takes that std::string argument and parses it.
class edge {
public:
edge(const std::string& str);
std::string source() const { return src; }
std::string target() const { return tgt; }
private:
std::string src;
std::string tgt;
};
edge::edge(const std::string& str) {
auto comma = std::find(std::begin(str), std::end(str), ',');
if (str.length() < 3 || comma == std::end(str) || str.front() != '<' || str.back() != '>')
throw std::runtime_error("bad input");
src = std::string(std::next(std::begin(str)), comma);
tgt = std::string(std::next(comma), std::prev(std::end(str)));
}
I wouldn't use a regular expression for such a simple parse. Regular expressions are expensive and highly overrated.

How to extract data from a line which has fields separated by '|' character in C++?

I have data in the following format in a text file. Filename - empdata.txt
Note that there are no blank space between the lines.
Sl|EmployeeID|Name|Department|Band|Location
1|327427|Brock Mcneil|Research and Development|U2|Pune
2|310456|Acton Golden|Advertising|P3|Hyderabad
3|305540|Hollee Camacho|Payroll|U3|Bangalore
4|218801|Simone Myers|Public Relations|U3|Pune
5|144051|Eaton Benson|Advertising|P1|Chennai
I have a class like this
class empdata
{
public:
int sl,empNO;
char name[20],department[20],band[3],location[20];
};
I created an array of objects of class empdata.
How to read the data from the file which has n lines of data in the above specified format and store them to the array of (class)objects created?
This is my code
int main () {
string line;
ifstream myfile ("empdata.txt");
for(int i=0;i<10;i++) //processing only first 10 lines of the file
{
getline (myfile,line);
//What should I do with this "line" so that I can extract data
//from this line and store it in the class object?
}
return 0;
}
So basically my question is how to extract data from a string which has fields separated by the '|' character and store each field in a separate variable.
I prefer to use the String Toolkit. The String Toolkit will take care of converting the numbers as it parses.
Here is how I would solve it.
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <strtk.hpp> // http://www.partow.net/programming/strtk
using namespace std;
// using strings instead of character arrays
class Employee
{
public:
int index;
int employee_number;
std::string name;
std::string department;
std::string band;
std::string location;
};
std::string filename("empdata.txt");
// assuming the file is text
std::fstream fs;
fs.open(filename.c_str(), std::ios::in);
if(fs.fail()) return false;
const char *whitespace = " \t\r\n\f";
const char *delimiter = "|";
std::vector<Employee> employee_data;
// process each line in turn
std::string line;
while( std::getline(fs, line ) )
{
// removing leading and trailing whitespace
// can prevent parsing problems from different line endings.
strtk::remove_leading_trailing(whitespace, line);
// strtk::parse combines multiple delimiters in these cases
Employee e;
if( strtk::parse(line, delimiter, e.index, e.employee_number, e.name, e.department, e.band, e.location) )
{
std::cout << "succeed" << std::endl;
employee_data.push_back( e );
}
}
AFAIK, there is nothing that does it out of the box. But you have all the tools to build it yourself.
The C way
You read each line into a char buffer (with istream::getline()) and then use strtok and strcpy.
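A rough sketch of the strtok part, assuming the line has already been read into a writable buffer. The function name splitWithStrtok is hypothetical, and I collect the tokens in a std::vector rather than calling strcpy into each field, just to keep the example short.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits a mutable C string on '|' using std::strtok.
// strtok writes '\0' into `line` and skips empty fields, so "a||b" yields
// two tokens, not three.
inline std::vector<std::string> splitWithStrtok(char* line) {
    std::vector<std::string> fields;
    for (char* tok = std::strtok(line, "|"); tok != nullptr;
         tok = std::strtok(nullptr, "|")) {
        fields.push_back(tok);
    }
    return fields;
}
```

Keep in mind that strtok modifies its input and keeps hidden state between calls, so it is not a good fit if empty fields matter or if several threads parse at once.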
The getline way
The getline function accepts a third parameter to specify a delimiter. You can make use of that to split the line through an istringstream. Something like:
int main() {
std::string line, temp;
std::ifstream myfile("file.txt");
std::getline(myfile, line); // skip the header line
// Testing the result of getline also handles a last line without '\n'.
while (std::getline(myfile, line)) {
empdata data;
std::istringstream istr(line);
std::getline(istr, temp, '|');
data.sl = ::strtol(temp.c_str(), NULL, 10);
std::getline(istr, temp, '|');
data.empNO = ::strtol(temp.c_str(), NULL, 10);
istr.getline(data.name, sizeof(data.name), '|');
istr.getline(data.department, sizeof(data.department), '|');
istr.getline(data.band, sizeof(data.band), '|');
istr.getline(data.location, sizeof(data.location), '|');
}
return 0;
}
This is the C++ version of the previous one
The find way
You read the lines into a string (as you currently do) and use string::find(char sep, size_t pos) to find the next occurrence of the separator, then copy the data (from string::c_str()) between the start of the substring and the separator to your fields.
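As an illustration of the find way, here is a small sketch that only does the splitting; converting and storing each field is left out. The name splitWithFind is made up for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on `sep` with std::string::find.
// Empty fields are preserved, so "a||b" yields three entries.
inline std::vector<std::string> splitWithFind(const std::string& line, char sep = '|') {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type end = line.find(sep, start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start)); // last field
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1; // skip the separator
    }
    return fields;
}
```

From there, the numeric fields can be converted with strtol (or std::stoi) and the text fields copied into the object.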
The manual way
You just iterate over the string. If the character is a separator, you put a null character ('\0') at the end of the current field and move on to the next field. Otherwise, you just write the character at the current position of the current field.
Which to choose?
If you are more used to one of them, stick to it.
Following is just my opinion.
The getline way will be the simplest to code and to maintain.
The find way is mid level. It is still at a rather high level and avoids the usage of istringstream.
The manual way will be really low level, so you should structure it to make it maintainable. For example, you could use an explicit description of the lines as an array of fields, each with a maximum size and a current position. And as you have both int and char[] fields, it will be tricky. But you can easily configure it the way you want.
One caveat for the getline way: your code only allows 20 characters for the department field, whereas "Research and Development" in line 2 of the file is longer. Without special processing, the getline way will leave the istringstream in a bad state and will not read anything more. And even if you clear the state, you will be badly positioned. So you should first read into a std::string and then copy the beginning to the char array field.
Here is a working manual implementation:
class Field {
public:
virtual ~Field() {} // objects are deleted through Field*
virtual void reset() = 0;
virtual void add(empdata& data, char c) = 0;
};
class IField: public Field {
private:
int (empdata::*data_field);
bool ok;
public:
IField(int (empdata::*field)): data_field(field) {
ok = true;
reset();
}
void reset() { ok = true; }
void add(empdata& data, char c);
};
void IField::add(empdata& data, char c) {
if (ok) {
if ((c >= '0') && (c <= '9')) {
data.*data_field = data.*data_field * 10 + (c - '0');
}
else {
ok = false;
}
}
}
class CField: public Field {
private:
char (empdata::*data_field);
size_t current_pos;
size_t size;
public:
CField(char (empdata::*field), size_t size): data_field(field), size(size) {
reset();
}
void reset() { current_pos = 0; }
void add(empdata& data, char c);
};
void CField::add(empdata& data, char c) {
if (current_pos < size) {
char *ix = &(data.*data_field);
ix[current_pos ++] = c;
if (current_pos == size) {
ix[size -1] = '\0';
current_pos +=1;
}
}
}
int main() {
std::string line, temp;
std::ifstream myfile("file.txt");
Field* fields[] = {
new IField(&empdata::sl),
new IField(&empdata::empNO),
new CField(reinterpret_cast<char empdata::*>(&empdata::name), 20),
new CField(reinterpret_cast<char empdata::*>(&empdata::department), 20),
new CField(reinterpret_cast<char empdata::*>(&empdata::band), 3),
new CField(reinterpret_cast<char empdata::*>(&empdata::location), 20),
NULL
};
std::getline(myfile, line); // skip the header line
while (std::getline(myfile, line)) {
Field** f = fields;
(*f)->reset(); // the first field is never reached through a '|'
empdata data = {0};
for (std::string::const_iterator it = line.begin(); it != line.end(); it++) {
char c;
c = *it;
if (c == '|') {
f += 1;
if (*f == NULL) {
break; // more separators than fields: ignore the rest of the line
}
(*f)->reset();
}
else {
(*f)->add(data, c);
}
}
// do something with data ...
}
for(Field** f = fields; *f != NULL; f++) {
delete *f; // allocated with new, so release with delete, not free()
}
return 0;
}
It is robust, efficient and maintainable: adding a field is easy, and it tolerates errors in the input file. But it is much longer than the other ones and would need many more tests. So I would not advise using it without a special reason (the need to accept multiple separators, optional fields, dynamic field order, ...).
Try this simple code segment. It reads the file line by line and prints each line; you can then process each line as you need.
Data: as provided by you, in a file named data.txt.
package com.demo;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
public class Demo {
public static void main(String a[]) {
try {
File file = new File("data.txt");
FileReader fileReader = new FileReader(file);
BufferedReader bufferReader = new BufferedReader(fileReader);
String data;
while ((data = bufferReader.readLine()) != null) {
// data = br.readLine( );
System.out.println(data);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
In console you will get output like this :
Sl|EmployeeID|Name|Department|Band|Location
1|327427|Brock Mcneil|Research and Development|U2|Pune
2|310456|Acton Golden|Advertising|P3|Hyderabad
3|305540|Hollee Camacho|Payroll|U3|Bangalore
4|218801|Simone Myers|Public Relations|U3|Pune
5|144051|Eaton Benson|Advertising|P1|Chennai
This is a simple idea, you may do what you need.
In C++ you can imbue the stream with a locale whose ctype facet classifies an extra character, here '|', as whitespace, so that operator>> splits on it:
#include <locale>
#include <iostream>
#include <string>
struct pipe_is_space : std::ctype<char> {
pipe_is_space() : std::ctype<char>(get_table()) {}
static mask const* get_table()
{
static mask rc[table_size];
rc['|'] = std::ctype_base::space;
rc['\n'] = std::ctype_base::space;
return &rc[0];
}
};
int main() {
using std::string;
using std::cin;
using std::locale;
cin.imbue(locale(cin.getloc(), new pipe_is_space));
string word;
while(cin >> word) {
std::cout << word << "\n";
}
}

inserting text before each line using std::ostream

I would like to know if it is possible to inherit from std::ostream, and to override flush() in such a way that some information (say, the line number) is added to the beginning of each line. I would then like to attach it to a std::ofstream (or cout) through rdbuf() so that I get something like this:
ofstream fout("file.txt");
myostream os;
os.rdbuf(fout.rdbuf());
os << "this is the first line.\n";
os << "this is the second line.\n";
would put this into file.txt
1 this is the first line.
2 this is the second line.
flush() wouldn't be the function to override in this context, though you're on the right track. You should redefine overflow() on the underlying std::streambuf interface. For example:
class linebuf : public std::streambuf
{
public:
linebuf() : m_sbuf() { m_sbuf.open("file.txt", std::ios_base::out); }
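int_type overflow(int_type c) override
{
// Nothing to write on EOF; report success without touching the file.
if (traits_type::eq_int_type(c, traits_type::eof()))
return traits_type::not_eof(c);
char_type ch = traits_type::to_char_type(c);
if (new_line)
{
std::ostream os(&m_sbuf);
os << line_number++ << " ";
}
new_line = (ch == '\n');
return m_sbuf.sputc(ch);
}
// pubsync() already returns 0 on success and -1 on failure.
int sync() override { return m_sbuf.pubsync(); }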
private:
std::filebuf m_sbuf;
bool new_line = true;
int line_number = 1;
};
Now you can do:
linebuf buf;
std::ostream os(&buf);
os << "this is the first line.\n"; // "1 this is the first line."
os << "this is the second line.\n"; // "2 this is the second line."
James Kanze's classic article on Filtering Streambufs has a very similar example which puts a timestamp at the beginning of every line. You could adapt that code.
Or, you could use the Boost tools that grew out of the ideas in that article.
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/array.hpp>
#include <cstdio> // std::snprintf
#include <iostream> // std::cout
#include <limits>
// line_num_filter is a model of the Boost concept OutputFilter which
// inserts a sequential line number at the beginning of every line.
class line_num_filter
: public boost::iostreams::output_filter
{
public:
line_num_filter();
template<typename Sink>
bool put(Sink& snk, char c);
template<typename Device>
void close(Device&);
private:
bool m_start_of_line;
unsigned int m_line_num;
boost::array<char, std::numeric_limits<unsigned int>::digits10 + 4> m_buf;
const char* m_buf_pos;
const char* m_buf_end;
};
line_num_filter::line_num_filter() :
m_start_of_line(true),
m_line_num(1),
m_buf_pos(m_buf.data()),
m_buf_end(m_buf_pos)
{}
// put() must return true if c was written to dest, or false if not.
// After returning false, put() with the same c might be tried again later.
template<typename Sink>
bool line_num_filter::put(Sink& dest, char c)
{
// If at the start of a line, print the line number into a buffer.
if (m_start_of_line) {
m_buf_pos = m_buf.data();
m_buf_end = m_buf_pos +
std::snprintf(m_buf.data(), m_buf.size(), "%u ", m_line_num);
m_start_of_line = false;
}
// If there are buffer characters to be written, write them.
// This can be interrupted and resumed if the sink is not accepting
// input, which is why the buffer and pointers need to be members.
while (m_buf_pos != m_buf_end) {
if (!boost::iostreams::put(dest, *m_buf_pos))
return false;
++m_buf_pos;
}
// Copy the actual character of data.
if (!boost::iostreams::put(dest, c))
return false;
// If the character copied was a newline, get ready for the next line.
if (c == '\n') {
++m_line_num;
m_start_of_line = true;
}
return true;
}
// Reset the filter object.
template<typename Device>
void line_num_filter::close(Device&)
{
m_start_of_line = true;
m_line_num = 1;
m_buf_pos = m_buf_end = m_buf.data();
}
int main() {
using namespace boost::iostreams;
filtering_ostream myout;
myout.push(line_num_filter());
myout.push(std::cout);
myout << "this is the first line.\n";
myout << "this is the second line.\n";
}

How to cleanly extract a string delimited string from an istream in c++

I am trying to extract a string from an istream with strings as delimiters, yet I haven't found any istream operations that behave like the string operations find() or substr().
Here is an example istream content:
delim_oneFUUBARdelim_two
and my goal is to get FUUBAR into a string with as little workarounds as possible.
My current solution is to copy all the istream content into a string and then extract using string operations. Is there a way to avoid this unnecessary copying and read only as much from the istream as needed, preserving all content after the delimited string in case there are more to be found in a similar fashion?
You can easily create a type that will consume the expected separator or delimiter:
struct Text
{
std::string t_;
};
std::istream& operator>>(std::istream& is, Text& t)
{
is >> std::skipws;
for (char c: t.t_)
{
if (is.peek() != c)
{
is.setstate(std::ios::failbit);
break;
}
is.get(); // throw away known-matching char
}
return is;
}
This suffices when the previous stream extraction naturally stops without consuming the delimiter (e.g. an int extraction followed by a delimiter that doesn't start with a digit), which will typically be the case unless the previous extraction is of a std::string. Single-character delimiters can be specified to getline, but say your delimiter is "</block>" and the stream contains "<black>metallic</black></block>42" - you'd want something to extract "<black>metallic</black>" into a string, throw away the "</block>" delimiter, and leave the "42" on the stream:
struct Until_Delim {
Until_Delim(std::string& s, std::string delim) : s_(s), delim_(delim) { }
std::string& s_;
std::string delim_;
};
std::istream& operator>>(std::istream& is, const Until_Delim& ud)
{
std::istream::sentry sentry(is);
size_t in_delim = 0;
for (char c = is.get(); is; c = is.get())
{
if (c == ud.delim_[in_delim])
{
if (++in_delim == ud.delim_.size())
break;
continue;
}
if (in_delim) // was part-way into delimiter match...
{
ud.s_.append(ud.delim_, 0, in_delim);
in_delim = 0;
}
ud.s_ += c;
}
// may need to trim trailing whitespace...
if (is.flags() & std::ios_base::skipws)
while (!ud.s_.empty() && std::isspace(ud.s_.back()))
ud.s_.pop_back();
return is;
}
This can then be used as in:
string a_string;
if (some_stream >> Until_Delim(a_string, "</block>") >> whatevers_after)
...
This notation might seem a bit hackish, but there's precedent in Standard Library's std::quoted().
Standard streams are equipped with locales that can do classification, namely the std::ctype<> facet. We can use this facet to ignore() characters in a stream while a certain classification is not present in the next available character. Here's a working example:
#include <iostream>
#include <locale>
#include <sstream>
#include <string>
using mask = std::ctype_base::mask;
template<mask m>
void scan_classification(std::istream& is)
{
auto& ctype = std::use_facet<std::ctype<char>>(is.getloc());
while (is.peek() != std::char_traits<char>::eof() && !ctype.is(m, is.peek()))
is.ignore();
}
int main()
{
std::istringstream iss("some_string_delimiter3.1415another_string");
double d;
scan_classification<std::ctype_base::digit>(iss);
if (iss >> d)
std::cout << std::to_string(d); // "3.141500"
}

Can you specify what ISN'T a delimiter in std::getline?

I want it to consider anything that isn't an alphabet character to be a delimiter. How can I do this?
You can't. The default delimiter is \n:
while (std::getline (std::cin, str)) // '\n' is implicit
For other delimiters, pass them:
while (std::getline (std::cin, str, ' ')) // splits at a single space
However, the delimiter is of type char, so you can only specify a single delimiter character, not a set of characters that should not count as delimiters.
If your input already happens to be inside a container like std::string, you can use find_first_not_of or find_last_not_of.
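For instance, treating everything that is not a letter as a delimiter could be sketched like this. The function name alphaWords is made up, and the letter set is limited to ASCII as a simplifying assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the maximal runs of ASCII letters in `text`;
// every other character acts as a delimiter.
inline std::vector<std::string> alphaWords(const std::string& text) {
    static const std::string letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::vector<std::string> words;
    std::string::size_type start = text.find_first_of(letters);
    while (start != std::string::npos) {
        // First non-letter after the word, or npos if the word runs to the end.
        const std::string::size_type end = text.find_first_not_of(letters, start);
        words.push_back(text.substr(start, end - start));
        if (end == std::string::npos)
            break;
        start = text.find_first_of(letters, end);
    }
    return words;
}
```

Runs of consecutive delimiters are skipped, so no empty words are produced.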
In your other question, are you sure you have considered all answers? One uses operator>>(std::istream&, std::string&), which will match a sequence of non-whitespace characters.
You don't. getline is a simple tool for a simple job. If you need something more complex, then you need to use a more complex tool, such as regular expressions.
You can't do what you want using std::getline(), but you can roll your own. Here's a getline variant that lets you specify a predicate (function, functor, or lambda if it's C++11) to indicate whether a character is a delimiter, along with a couple of overloads that let you pass in a string of delimiter characters (kind of like strtok()):
#include <functional>
#include <iostream>
#include <string>
using namespace std;
template <typename Predicate>
istream& getline_until( istream& is, string& str, Predicate pred)
{
bool changed = false;
istream::sentry k(is,true);
if (bool(k)) {
streambuf& rdbuf(*is.rdbuf());
str.erase();
istream::traits_type::int_type ch = rdbuf.sgetc(); // get next char, but don't move stream position
for (;;ch = rdbuf.sgetc()) {
if (istream::traits_type::eof() == ch) {
is.setstate(ios_base::eofbit);
break;
}
changed = true;
rdbuf.sbumpc(); // move stream position to consume char
if (pred(istream::traits_type::to_char_type(ch))) {
break;
}
str.append(1,istream::traits_type::to_char_type(ch));
if (str.size() == str.max_size()) {
is.setstate(ios_base::failbit);
break;
}
}
if (!changed) {
is.setstate(ios_base::failbit);
}
}
return is;
}
// a couple of overloads (along with a predicate) that allow you
// to pass in a string that contains a set of delimiter characters
struct in_delim_set : unary_function<char,bool>
{
in_delim_set( char const* delim_set) : delims(delim_set) {};
in_delim_set( string const& delim_set) : delims(delim_set) {};
bool operator()(char ch) {
return (delims.find(ch) != string::npos);
};
private:
string delims;
};
istream& getline_until( istream& is, string& str, char const* delim_set)
{
return getline_until( is, str, in_delim_set(delim_set));
}
istream& getline_until( istream& is, string& str, string const& delim_set)
{
return getline_until( is, str, in_delim_set(delim_set));
}
// a simple example predicate functor
struct is_digit : unary_function<char,bool>
{
public:
bool operator()(char c) const {
return ('0' <= c) && (c <= '9');
}
};
int main(int argc, char* argv[]) {
string test;
// treat anything that's not a digit as end-of-line
while (getline_until( cin, test, not1(is_digit()))) {
cout << test << endl;
}
return 0;
}