Tokenizing a string, accepting everything between a given set of characters in C++

I have the following code:
int main()
{
string s = "server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')";
regex re("(\'[!-~]+\')");
sregex_token_iterator i(s.begin(), s.end(), re, 1);
sregex_token_iterator j;
unsigned count = 0;
while(i != j)
{
cout << "the token is "<<*i<< endl;
count++;
}
cout << "There were " << count << " tokens found." << endl;
return 0;
}
Using the above regex, I want to extract each string between the parentheses, including its single quotes. The output should look like this:
the token is 'm1.labs.teradata.com'
the token is 'use\')r_*5'
the token is 'u" er 5'
the token is 'default'
There were 4 tokens found.
Basically, the regex is supposed to extract everything between " (' " and " ') ". The content can be anything: a space, a special character, a quote, or a closing parenthesis.
I had earlier used the following regex:
boost::regex re_arg_values("(\'[!-~]+\')");
But it was not accepting spaces. Can someone please help me out with this? Thanks in advance.

Here's a sample of using Spirit X3 to create a grammar that actually parses this. I'd like to parse into a map of (key->value) pairs, which makes a lot more sense than just blindly assuming the names are always the same:
using Config = std::map<std::string, std::string>;
using Entry = std::pair<std::string, std::string>;
Now, we set up some grammar rules using X3:
namespace parser {
using namespace boost::spirit::x3;
auto value = quoted("'") | quoted('"');
auto key = lexeme[+alpha];
auto pair = key >> '(' >> value >> ')';
auto config = skip(space) [ *as<Entry>(pair) ];
}
The helpers as<> and quoted are simple lambdas:
template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
auto quoted = [](auto q) { return lexeme[q >> *('\\' >> char_ | char_ - q) >> q]; };
Now we can parse the string into a map directly:
Config parse_config(std::string const& cfg) {
Config parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::config, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
And the demo program
int main() {
Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");
for (auto& setting : cfg)
std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}
Prints
Key dbname has value default
Key password has value u" er 5
Key server has value m1.labs.teradata.com
Key username has value use')r_*5
LIVE DEMO
Live On Coliru
#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <map>
using Config = std::map<std::string, std::string>;
using Entry = std::pair<std::string, std::string>;
namespace parser {
using namespace boost::spirit::x3;
template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
auto quoted = [](auto q) { return lexeme[q >> *(('\\' >> char_) | (char_ - q)) >> q]; };
auto value = quoted("'") | quoted('"');
auto key = lexeme[+alpha];
auto pair = key >> '(' >> value >> ')';
auto config = skip(space) [ *as<Entry>(pair) ];
}
Config parse_config(std::string const& cfg) {
Config parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::config, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
int main() {
Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");
for (auto& setting : cfg)
std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}
Bonus
If you want to learn how to extract the raw input: just try
auto source = skip(space) [ *raw [ pair ] ];
as in this:
using RawSettings = std::vector<std::string>;
RawSettings parse_raw_config(std::string const& cfg) {
RawSettings parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::source, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
int main() {
for (auto& setting : parse_raw_config(text))
std::cout << "Raw: " << setting << "\n";
}
Which prints (Live On Coliru):
Raw: server ('m1.labs.teradata.com')
Raw: username ('use\')r_*5')
Raw: password('u" er 5')
Raw: dbname ('default')

Fixing a few syntax and style issues:
you need to escape \ in C strings
you had a " in s, making a syntax error
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <iostream>
int main() {
std::string s = "server ('m1.labs.teradata.com') username ('use\')r_*5') password('u' er 5') dbname ('default')";
boost::regex re(R"(('([^'\\]*(?:\\[\s\S][^'\\]*)*)'))");
size_t count = 0;
for (auto tok : boost::make_iterator_range(boost::sregex_token_iterator(s.begin(), s.end(), re, 1), {})) {
std::cout << "Token " << ++count << " is " << tok << "\n";
}
}
Prints
Token 1 is 'm1.labs.teradata.com'
Token 2 is 'use'
Token 3 is ') password('
Token 4 is ' er 5'
Token 5 is 'default'

Related

boost spirit x3 match an end of lexeme? [duplicate]

How does one prevent X3 symbol parsers from matching partial tokens? In the example below, I want to match "foo", but not "foobar". I tried throwing the symbol parser in a lexeme directive as one would for an identifier, but then nothing matches.
Thanks for any insights!
#include <string>
#include <iostream>
#include <iomanip>
#include <boost/spirit/home/x3.hpp>
int main() {
boost::spirit::x3::symbols<int> sym;
sym.add("foo", 1);
for (std::string const input : {
"foo",
"foobar",
"barfoo"
})
{
using namespace boost::spirit::x3;
std::cout << "\nParsing " << std::left << std::setw(20) << ("'" + input + "':");
int v;
auto iter = input.begin();
auto end = input.end();
bool ok;
{
// what's the right rule?
// this matches nothing
// auto r = lexeme[sym - alnum];
// this matches prefix strings
auto r = sym;
ok = phrase_parse(iter, end, r, space, v);
}
if (ok) {
std::cout << v << " Remaining: " << std::string(iter, end);
} else {
std::cout << "Parse failed";
}
}
}

Qi used to have distinct in their repository.
X3 doesn't.
The thing that solves it for the case you showed is a simple lookahead assertion:
auto r = lexeme [ sym >> !alnum ];
You could make a distinct helper easily too, e.g.:
auto kw = [](auto p) { return lexeme [ p >> !(alnum | '_') ]; };
Now you can just parse kw(sym).
Live On Coliru
#include <iostream>
#include <boost/spirit/home/x3.hpp>
int main() {
boost::spirit::x3::symbols<int> sym;
sym.add("foo", 1);
for (std::string const input : { "foo", "foobar", "barfoo" }) {
std::cout << "\nParsing '" << input << "': ";
auto iter = input.begin();
auto const end = input.end();
int v = -1;
bool ok;
{
using namespace boost::spirit::x3;
auto kw = [](auto p) { return lexeme [ p >> !(alnum | '_') ]; };
ok = phrase_parse(iter, end, kw(sym), space, v);
}
if (ok) {
std::cout << v << " Remaining: '" << std::string(iter, end) << "'\n";
} else {
std::cout << "Parse failed";
}
}
}
Prints
Parsing 'foo': 1 Remaining: ''
Parsing 'foobar': Parse failed
Parsing 'barfoo': Parse failed

Parsing file input with spirit

I played around with boost::spirit recently and wanted to use it to parse file input. What I have so far is this, starting with some semantic actions:
data = ifstream("herpderp", ios::in);
std::string line;
auto pri = [&](auto &ctx){cout << "got this" << endl;};
auto bri = [&](auto &ctx){cout << "got that" << endl;};
and the actual reading happens like this:
while(getline(data, line, '\n'))
{
bool r = phrase_parse(line.begin(), line.end(), (int_ >> char_ >> int_ >> double_)[pri] | (int_ >> char_ >> int_)[bri], space);
}
However, the problem is that I have no idea how to access the contents of _attr(ctx) inside the lambdas pri and bri. I know they work as intended, depending on the contents of the file, because the cout prints alternate. The attributes are, however, compound types, as one can tell from the parsing rules. If anyone can shed some light on this, I'd be grateful.
Edit:
Got this to work the way I wanted it to. It required another include:
#include <boost/mpl/int.hpp>
And each of the lambdas looks like this:
auto bri = [&](auto &ctx)
{
int firstIntFromMatch = at<boost::mpl::int_<0>>(_attr(ctx));
char charFromMatch = at<boost::mpl::int_<1>>(_attr(ctx));
int secondIntFromMatch = at<boost::mpl::int_<2>>(_attr(ctx));
doSomething(firstIntFromMatch, charFromMatch, secondIntFromMatch);
};
auto pri = [&](auto &ctx)
{
int firstIntFromMatch = at<boost::mpl::int_<0>>(_attr(ctx));
char charFromMatch = at<boost::mpl::int_<1>>(_attr(ctx));
int secondIntFromMatch = at<boost::mpl::int_<2>>(_attr(ctx));
double doubleFromMatch = at<boost::mpl::int_<3>>(_attr(ctx));
doSomething(firstIntFromMatch, charFromMatch, secondIntFromMatch);
doSomethingElse(doubleFromMatch);
};

I'm with @lakeweb, see also http://stackoverflow.com/questions/8259440/boost-spirit-semantic-actions-are-evil
However to answer your specific question: the attributes are fusion sequences. Including fusion/include/io.hpp enables you to just print them:
auto pri = [&](auto &ctx){std::cout << "got this: " << _attr(ctx) << std::endl;};
auto bri = [&](auto &ctx){std::cout << "got that: " << _attr(ctx) << std::endl;};
Prints
Live On Coliru
got this: (321 a 321 3.14)
Parsed
got that: (432 b 432)
Parsed
Doing Useful Stuff
Doing useful stuff is always more exciting. You could manually take apart these fusion sequences. Defining the simplest data struct I can think of to receive our data:
struct MyData {
int a = 0;
char b = 0;
int c = 0;
double d = 0;
friend std::ostream& operator<<(std::ostream& os, MyData const& md) {
return os << "MyData{" << md.a << "," << md.b << "," << md.c << "," << md.d << "}";
}
};
Now, we can "enhance" (read: complicate) stuff to parse into it:
auto pri = [&](auto &ctx) {
auto& attr = _attr(ctx);
std::cout << "got this: " << attr << std::endl;
using boost::fusion::at_c;
_val(ctx) = { at_c<0>(attr), at_c<1>(attr), at_c<2>(attr), at_c<3>(attr) };
};
auto bri = [&](auto &ctx)
{
auto& attr = _attr(ctx);
std::cout << "got that: " << attr << std::endl;
using boost::fusion::at_c;
_val(ctx) = { at_c<0>(attr), at_c<1>(attr), at_c<2>(attr), std::numeric_limits<double>::infinity()};
};
auto const pri_rule = x3::rule<struct _pri, MyData> {"pri_rule"} =
(x3::int_ >> x3::char_ >> x3::int_ >> x3::double_)[pri];
auto const bri_rule = x3::rule<struct _bri, MyData> {"bri_rule"} =
(x3::int_ >> x3::char_ >> x3::int_)[bri];
And yes, this "works":
Live On Coliru
for(std::string const line : {
"321 a 321 3.14",
"432 b 432"
})
{
MyData data;
bool r = x3::phrase_parse(
line.begin(), line.end(),
pri_rule | bri_rule,
x3::space,
data);
if (r)
std::cout << "Parsed " << data << "\n";
else
std::cout << "Failed\n";
}
Prints
got this: (321 a 321 3.14)
Parsed MyData{321,a,321,3.14}
got that: (432 b 432)
Parsed MyData{432,b,432,inf}
However this seems horribly complicated.
SIMPLIFY!!!
It seems you merely have an optional trailing double_. With a little bit of help:
BOOST_FUSION_ADAPT_STRUCT(MyData, a,b,c,d);
You can have the same effect without any of the mess:
bool r = x3::phrase_parse(
line.begin(), line.end(),
x3::int_ >> x3::char_ >> x3::int_ >> (x3::double_ | x3::attr(9999)),
x3::space, data);
Which would print (Live On Coliru):
Parsed MyData{321,a,321,3.14}
Parsed MyData{432,b,432,9999}
Optional: Optionality
If you don't have a valid default for the double you could make it an optional:
x3::int_ >> x3::char_ >> x3::int_ >> -x3::double_,
And could still parse it:
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/optional/optional_io.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct MyData {
int a = 0;
char b = 0;
int c = 0;
boost::optional<double> d;
friend std::ostream& operator<<(std::ostream& os, MyData const& md) {
return os << "MyData{" << md.a << "," << md.b << "," << md.c << "," << md.d << "}";
}
};
BOOST_FUSION_ADAPT_STRUCT(MyData, a,b,c,d)
int main() {
for(std::string const line : { "321 a 321 3.14", "432 b 432" }) {
MyData data;
bool r = x3::phrase_parse(
line.begin(), line.end(),
x3::int_ >> x3::char_ >> x3::int_ >> -x3::double_,
x3::space, data);
if (r)
std::cout << "Parsed " << data << "\n";
else
std::cout << "Failed\n";
}
}
Prints:
Parsed MyData{321,a,321, 3.14}
Parsed MyData{432,b,432,--}

Tokenize a "Braced Initializer List"-Style String in C++ (With Boost?)

I have strings (even nested ones) that are formatted like a C++ braced initializer list. I want to tokenize them one level at a time into a vector of strings.
So when I input "{one, two, three}", the function should output a three-element vector:
"one",
"two",
"three"
To complicate this, it needs to support quoted tokens and preserve nested lists:
Input String: "{one, {2, \"three four\"}, \"five, six\", {\"seven, eight\"}}"
Output is a four element vector:
"one",
"{2, \"three four\"}",
"five, six",
"{\"seven, eight\"}"
I've looked at a few other SO posts:
Using Boost Tokenizer escaped_list_separator with different parameters
Boost split not traversing inside of parenthesis or braces
And used those to start a solution, but this seems slightly too complicated for the tokenizer (because of the braces):
#include <boost/algorithm/string.hpp>
#include <boost/tokenizer.hpp>
std::vector<std::string> TokenizeBracedList(const std::string& x)
{
std::vector<std::string> tokens;
std::string separator1("");
std::string separator2(",\n\t\r");
std::string separator3("\"\'");
boost::escaped_list_separator<char> elements(separator1, separator2, separator3);
boost::tokenizer<boost::escaped_list_separator<char>> tokenizer(x, elements);
for(auto i = std::begin(tokenizer); i != std::end(tokenizer); ++i)
{
auto token = *i;
boost::algorithm::trim(token);
tokens.push_back(token);
}
return tokens;
}
With this, even in the trivial case, it doesn't strip the opening and closing braces.
Boost and C++17 are fair game for a solution.

Simple (Flat) Take
Defining a flat data structure like:
using token = std::string;
using tokens = std::vector<token>;
We can define an X3 parser like:
namespace Parser {
using namespace boost::spirit::x3;
rule<struct list_, token> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = '{' >> (item % ',') >> '}';
auto sublist = raw [ list ];
auto item_def = sublist | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
Live On Wandbox
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
using token = std::string;
using tokens = std::vector<token>;
namespace x3 = boost::spirit::x3;
namespace Parser {
using namespace boost::spirit::x3;
rule<struct list_, token> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = '{' >> (item % ',') >> '}';
auto sublist = raw [ list ];
auto item_def = sublist | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
int main() {
for (std::string const input : {
R"({one, "five, six"})",
R"({one, {2, "three four"}, "five, six", {"seven, eight"}})",
})
{
auto f = input.begin(), l = input.end();
std::vector<std::string> parsed;
bool ok = phrase_parse(f, l, Parser::list, x3::space, parsed);
if (ok) {
std::cout << "Parsed: " << parsed.size() << " elements\n";
for (auto& el : parsed) {
std::cout << " - " << std::quoted(el, '\'') << "\n";
}
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: " << std::quoted(std::string{f, l}) << "\n";
}
}
Prints
Parsed: 2 elements
- 'one'
- 'five, six'
Parsed: 4 elements
- 'one'
- '{2, "three four"}'
- 'five, six'
- '{"seven, eight"}'
Nested Data
Changing the data structure to be a bit more specific/realistic:
namespace ast {
using value = boost::make_recursive_variant<
double,
std::string,
std::vector<boost::recursive_variant_>
>::type;
using list = std::vector<value>;
}
Now we can change the grammar, as we no longer need to treat sublist as if it is a string:
namespace Parser {
using namespace boost::spirit::x3;
rule<struct item_, ast::value> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = x3::rule<struct list_, ast::list> {"list" }
= '{' >> (item % ',') >> '}';
auto item_def = list | double_ | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
Everything "still works" (Live On Wandbox):
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
namespace ast {
using value = boost::make_recursive_variant<
double,
std::string,
std::vector<boost::recursive_variant_>
>::type;
using list = std::vector<value>;
}
namespace x3 = boost::spirit::x3;
namespace Parser {
using namespace boost::spirit::x3;
rule<struct item_, ast::value> item;
auto quoted = lexeme [ '"' >> *('\\' >> char_ | ~char_('"')) >> '"' ];
auto bare = lexeme [ +(graph-','-'}') ];
auto list = x3::rule<struct list_, ast::list> {"list" }
= '{' >> (item % ',') >> '}';
auto item_def = list | double_ | quoted | bare;
BOOST_SPIRIT_DEFINE(item)
}
struct pretty_printer {
using result_type = void;
std::ostream& _os;
int _indent;
pretty_printer(std::ostream& os, int indent = 0) : _os(os), _indent(indent) {}
void operator()(ast::value const& v) { boost::apply_visitor(*this, v); }
void operator()(double v) { _os << v; }
void operator()(std::string s) { _os << std::quoted(s); }
void operator()(ast::list const& l) {
_os << "{\n";
_indent += 2;
for (auto& item : l) {
_os << std::setw(_indent) << "";
operator()(item);
_os << ",\n";
}
_indent -= 2;
_os << std::setw(_indent) << "" << "}";
}
};
int main() {
pretty_printer print{std::cout};
for (std::string const input : {
R"({one, "five, six"})",
R"({one, {2, "three four"}, "five, six", {"seven, eight"}})",
})
{
auto f = input.begin(), l = input.end();
ast::value parsed;
bool ok = phrase_parse(f, l, Parser::item, x3::space, parsed);
if (ok) {
std::cout << "Parsed: ";
print(parsed);
std::cout << "\n";
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: " << std::quoted(std::string{f, l}) << "\n";
}
}
Prints:
Parsed: {
"one",
"five, six",
}
Parsed: {
"one",
{
2,
"three four",
},
"five, six",
{
"seven, eight",
},
}

parsing a single value into an ast node with a container

My problem is the following. I have an AST node which is defined like this:
struct foo_node{
std::vector<std::string> value;
};
and I have a parser like this for parsing into the struct, which works fine:
typedef x3::rule<struct foo_node_class, foo_node> foo_node_type;
const foo_node_type foo_node = "foo_node";
auto const foo_node_def = "(" >> +x3::string("bar") >> ")";
Now I want the parser to also accept "bar" without brackets, but only if it's a single bar. I tried to do it like this:
auto const foo_node_def = x3::string("bar")
| "(" > +x3::string("bar") > ")";
but this gives me a compile-time error, since x3::string("bar") returns a string and not a std::vector<std::string>.
My question is: how can I make the x3::string("bar") parser (and every other parser which returns a string) parse into a vector?

The way to parse a single element and expose it as a single-element container attribute is x3::repeat(1) [ p ]:
Live On Coliru
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct foo_node {
std::vector<std::string> value;
};
BOOST_FUSION_ADAPT_STRUCT(foo_node, value)
namespace rules {
auto const bar
= x3::string("bar");
auto const foo_node
= '(' >> +bar >> ')'
| x3::repeat(1) [ bar ]
;
}
int main() {
for (std::string const input : {
"bar",
"(bar)",
"(barbar)",
})
{
auto f = input.begin(), l = input.end();
foo_node data;
bool ok = x3::parse(f, l, rules::foo_node, data);
if (ok) {
std::cout << "Parse success: " << data.value.size() << " elements\n";
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
Prints
Parse success: 1 elements
Parse success: 1 elements
Parse success: 2 elements

Matching and Extracting data using regular expression

Problem: To find a matching string and to extract data from the matched string. There are a number of command strings which have keywords and data.
Command Examples:
Ask name to call me
Notify name that do this action
Message name that request
Keywords: Ask, Notify, Message, to, that. Data:
Input strings:
Ask peter to call me
Notify Jenna that I am going to be away
Message home that I am running late
My problem consists of two parts:
1) Find matching command
2) Extract data
Here is what I am doing:
I create multiple regular expressions:
"Ask[[:s:]][[:w:]]+[[:s:]]to[[:s:]][[:w:]]+" or "Ask([^\t\n]+?)to([^\t\n]+?)"
"Notify[[:s:]][[:w:]]+[[:s:]]that[[:s:]][[:w:]]+" or "Notify([^\t\n]+?)that([^\t\n]+?)"
void searchExpression(const char *regString)
{
std::string str;
boost::regex callRegEx(regString, boost::regex_constants::icase);
boost::cmatch im;
while(true) {
std::cout << "Enter String: ";
getline(std::cin, str);
fprintf(stderr, "str %s regstring %s\n", str.c_str(), regString);
if(boost::regex_search(str.c_str(), im, callRegEx)) {
int num_var = im.size() + 1;
fprintf(stderr, "Matched num_var %d\n", num_var);
for(int j = 0; j <= num_var; j++) {
fprintf(stderr, "%d) Found %s\n",j, std::string(im[j]).c_str());
}
}
else {
fprintf(stderr, "Not Matched\n");
}
}
}
I am able to find a matching string, but I am not able to extract the data.
Here is the output:
input_string: Ask peter to call Regex Ask[[:s:]][[:w:]]+[[:s:]]to[[:s:]][[:w:]]+
Matched num_var 2
0) Found Ask peter to call
1) Found
2) Found
I would like to extract peter and call from "Ask peter to call".

Since you're really wanting to parse a grammar, you should consider Boost's parser generator.
You'd simply write the whole thing top-down:
auto sentence = [](auto&& v, auto&& p) {
auto verb = lexeme [ no_case [ as_parser(v) ] ];
auto name = lexeme [ +graph ];
auto particle = lexeme [ no_case [ as_parser(p) ] ];
return confix(verb, particle) [ name ];
};
auto ask = sentence("ask", "to") >> lexeme[+char_];
auto notify = sentence("notify", "that") >> lexeme[+char_];
auto message = sentence("message", "that") >> lexeme[+char_];
auto command = ask | notify | message;
This is a Spirit X3 grammar for it. Read lexeme as "keep whole word" (don't ignore spaces).
Here, "name" is taken to be anything up to the expected particle¹
If you just want to return the raw string matched, this is enough:
Live On Coliru
#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/directive/confix.hpp>
namespace x3 = boost::spirit::x3;
namespace commands {
namespace grammar {
using namespace x3;
auto sentence = [](auto&& v, auto&& p) {
auto verb = lexeme [ no_case [ as_parser(v) ] ];
auto name = lexeme [ +graph ];
auto particle = lexeme [ no_case [ as_parser(p) ] ];
return confix(verb, particle) [ name ];
};
auto ask = sentence("ask", "to") >> lexeme[+char_];
auto notify = sentence("notify", "that") >> lexeme[+char_];
auto message = sentence("message", "that") >> lexeme[+char_];
auto command = ask | notify | message;
auto parser = raw [ skip(space) [ command ] ];
}
}
int main() {
for (std::string const input : {
"Ask peter to call me",
"Notify Jenna that I am going to be away",
"Message home that I am running late",
})
{
std::string matched;
if (parse(input.begin(), input.end(), commands::grammar::parser, matched))
std::cout << "Matched: '" << matched << "'\n";
else
std::cout << "No match in '" << input << "'\n";
}
}
Prints:
Matched: 'Ask peter to call me'
Matched: 'Notify Jenna that I am going to be away'
Matched: 'Message home that I am running late'
BONUS
Of course, you'd actually want to extract the relevant bits of information.
Here's how I'd do that. Let's parse into a struct:
struct Command {
enum class Type { ask, message, notify } type;
std::string name;
std::string message;
};
And let's write our main() as:
commands::Command cmd;
if (parse(input.begin(), input.end(), commands::grammar::parser, cmd))
std::cout << "Matched: " << cmd.type << "|" << cmd.name << "|" << cmd.message << "\n";
else
std::cout << "No match in '" << input << "'\n";
Live On Coliru
#include <iostream>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/directive/confix.hpp>
namespace x3 = boost::spirit::x3;
namespace commands {
struct Command {
enum class Type { ask, message, notify } type;
std::string name;
std::string message;
friend std::ostream& operator<<(std::ostream& os, Type t) { return os << static_cast<int>(t); } // TODO
};
}
BOOST_FUSION_ADAPT_STRUCT(commands::Command, type, name, message)
namespace commands {
namespace grammar {
using namespace x3;
auto sentence = [](auto type, auto&& v, auto&& p) {
auto verb = lexeme [ no_case [ as_parser(v) ] ];
auto name = lexeme [ +graph ];
auto particle = lexeme [ no_case [ as_parser(p) ] ];
return attr(type) >> confix(verb, particle) [ name ];
};
using Type = Command::Type;
auto ask = sentence(Type::ask, "ask", "to") >> lexeme[+char_];
auto notify = sentence(Type::notify, "notify", "that") >> lexeme[+char_];
auto message = sentence(Type::message, "message", "that") >> lexeme[+char_];
auto command // = rule<struct command, Command> { }
= ask | notify | message;
auto parser = skip(space) [ command ];
}
}
int main() {
for (std::string const input : {
"Ask peter to call me",
"Notify Jenna that I am going to be away",
"Message home that I am running late",
})
{
commands::Command cmd;
if (parse(input.begin(), input.end(), commands::grammar::parser, cmd))
std::cout << "Matched: " << cmd.type << "|" << cmd.name << "|" << cmd.message << "\n";
else
std::cout << "No match in '" << input << "'\n";
}
}
Prints
Matched: 0|peter|call me
Matched: 2|Jenna|I am going to be away
Matched: 1|home|I am running late
¹ I'm no English linguist so I don't know whether that is the correct grammatical term :)

This code reads the command strings from the file "commands.txt", searches for the regular expressions and prints the parts whenever there is a match.
#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib> // for exit
#include <boost/regex.hpp>
const int NumCmdParts = 4;
std::string CommandPartIds[] = {"Verb", "Name", "Preposition", "Content"};
int main(int argc, char *argv[])
{
std::ifstream ifs;
ifs.open ("commands.txt", std::ifstream::in);
if (!ifs.is_open()) {
std::cout << "Error opening file commands.txt" << std::endl;
exit(1);
}
std::string cmdStr;
// Pieces of regular expression pattern
// '(?<Verb>' : This is to name the capture group as 'Verb'
std::string VerbPat = "(?<Verb>(Ask)|(Notify|Message))";
std::string SeparatorPat = "\\s*";
std::string NamePat = "(?<Name>\\w+)";
// Conditional expression. if (Ask) (to) else (that)
std::string PrepositionPat = "(?<Preposition>(?(2)(to)|(that)))";
std::string ContentPat = "(?<Content>.*)";
// Put the pieces together to compose pattern
std::string TotalPat = VerbPat + SeparatorPat + NamePat + SeparatorPat
+ PrepositionPat + SeparatorPat + ContentPat;
boost::regex actions_re(TotalPat);
boost::smatch action_match;
while (getline(ifs, cmdStr)) {
bool IsMatch = boost::regex_search(cmdStr, action_match, actions_re);
if (IsMatch) {
for (int i=1; i <= NumCmdParts; i++) {
std::cout << CommandPartIds[i-1] << ": " << action_match[CommandPartIds[i-1]] << "\n";
}
}
}
ifs.close();
}
