Extracting Values from string using spirit parser


I have the following line:
/90pv-RKSJ-UCS2C usecmap
std::string const line = "/90pv-RKSJ-UCS2C usecmap";
auto first = line.begin(), last = line.end();
std::string label, token;
bool ok = qi::phrase_parse(
first, last,
qi::lexeme [ "/" >> +~qi::char_(" ") ] >> ' ' >> qi::lexeme[+~qi::char_(' ')] , qi::space, label, token);
if (ok)
std::cout << "Parse success: label='" << label << "', token='" << token << "'\n";
else
std::cout << "Parse failed\n";
if (first!=last)
std::cout << "Remaining unparsed input: '" << std::string(first, last) << "'\n";
I want 90pv-RKSJ-UCS2C in the label variable and usecmap in the token variable.
I can extract 90pv-RKSJ-UCS2C, but not usecmap.

With space as the skipper, you can never match ' ': the skipper consumes it before your parser gets to see it. See also: Boost spirit skipper issues
So either don't use a skipper, or let the skipper eat the separator:
bool ok = qi::phrase_parse(
first, last,
qi::lexeme [ "/" >> +qi::graph ] >> qi::lexeme[+qi::graph], qi::blank, label, token);
Notes:
I used qi::graph instead of the ~qi::char_(" ") formulation
I used qi::blank (rather than qi::space) as the skipper because you said "I have the following line", which implies that line ends should not be skipped
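To picture what the skipper does here, the following is a hand-written sketch in plain C++ (no Spirit) of the same two-lexeme grammar. The helper names `skip_blanks`, `read_graph_run` and `parse_label_and_token` are made up for illustration and are not part of Spirit. The point is that the blanks are consumed before each lexeme, so a later attempt to match a literal ' ' would find nothing left to match.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: consume spaces and tabs, roughly what a blank
// skipper does before each lexeme.
void skip_blanks(std::string_view& in) {
    while (!in.empty() && (in.front() == ' ' || in.front() == '\t'))
        in.remove_prefix(1);
}

// Hypothetical helper: consume one or more printable non-space characters,
// the counterpart of +graph inside a lexeme (no skipping in between).
std::optional<std::string> read_graph_run(std::string_view& in) {
    std::size_t n = 0;
    while (n < in.size() && std::isgraph(static_cast<unsigned char>(in[n])))
        ++n;
    if (n == 0)
        return std::nullopt;
    std::string out(in.substr(0, n));
    in.remove_prefix(n);
    return out;
}

// Sketch of: lexeme['/' >> +graph] >> lexeme[+graph] with a blank skipper.
std::optional<std::pair<std::string, std::string>>
parse_label_and_token(std::string_view in) {
    skip_blanks(in);
    if (in.empty() || in.front() != '/')
        return std::nullopt;
    in.remove_prefix(1);
    auto label = read_graph_run(in);
    if (!label)
        return std::nullopt;
    skip_blanks(in); // the separator is eaten here, not by the grammar itself
    auto token = read_graph_run(in);
    if (!token)
        return std::nullopt;
    return std::make_pair(*label, *token);
}
```

Calling `parse_label_and_token("/90pv-RKSJ-UCS2C usecmap")` yields the pair of label and token; an input with only a label yields an empty optional.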
Demo
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
int main()
{
std::string const line = "/90pv-rksj-ucs2c usecmap";
auto first = line.begin(), last = line.end();
std::string label, token;
bool ok = qi::phrase_parse(
first, last,
qi::lexeme [ "/" >> +qi::graph ] >> qi::lexeme[+qi::graph], qi::blank, label, token);
if (ok)
std::cout << "parse success: label='" << label << "', token='" << token << "'\n";
else
std::cout << "parse failed\n";
if (first!=last)
std::cout << "remaining unparsed input: '" << std::string(first, last) << "'\n";
}
Prints:
parse success: label='90pv-rksj-ucs2c', token='usecmap'

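If you are using C++11 or later, I suggest using a regular expression instead.
#include <iostream>
#include <regex>
#include <string>
int main() {
    // 1st () captures 90pv-RKSJ-UCS2C, 2nd () captures usecmap
    std::regex const re("^/(\\S+)\\s+(\\S+)$");
    std::smatch sm;
    std::string const s = "/90pv-RKSJ-UCS2C usecmap";
    // regex_match requires the whole string to match; check the result
    // before touching the captures
    if (!std::regex_match(s, sm, re)) {
        std::cout << "No match\n";
        return 1;
    }
    for (std::size_t i = 0; i < sm.size(); ++i) {
        std::cout << sm[i] << '\n';
    }
    std::string const label = sm[1], token = sm[2];
}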

Related

Boost Spirit (x3) failing to consume last token when parsing character escapes

Using boost spirit x3 to parse escaped ascii strings I came across this answer but am getting an expectation exception. I have changed the expectation operator in the original to the sequence operator to disable the exception in the code below. Running the code it parses the input and assigns the correct value to the attribute but returns false and is not consuming the input. Any ideas what I've done wrong here?
gcc version 10.3.0
boost 1.71
std = c++17
#include <boost/spirit/home/x3.hpp>
#include <string>
#include <iostream>
namespace x3 = boost::spirit::x3;
using namespace std::string_literals;
//changed expectation to sequence
auto const qstring = x3::lexeme['"' >> *(
"\\n" >> x3::attr('\n')
| "\\b" >> x3::attr('\b')
| "\\f" >> x3::attr('\f')
| "\\t" >> x3::attr('\t')
| "\\v" >> x3::attr('\v')
| "\\0" >> x3::attr('\0')
| "\\r" >> x3::attr('\r')
| "\\n" >> x3::attr('\n')
| "\\" >> x3::char_("\"\\")
| "\\\"" >> x3::char_('"')
| ~x3::char_('"')
) >> '"'];
int main(int, char**){
auto const quoted = "\"Hel\\\"lo Wor\\\"ld"s;
auto const expected = "Hel\"lo Wor\"ld"s;
std::string result;
auto first = quoted.begin();
auto const last = quoted.end();
bool ok = x3::phrase_parse(first, last, qstring, x3::ascii::space, result);
std::cout << "parse returned " << std::boolalpha << ok << '\n';
std::cout << result << " == " << expected << " is " << std::boolalpha << (result == expected) << '\n';
std::cout << "first == last = " << (first == last) << '\n';
std::cout << "first = " << *first << '\n';
return 0;
}

Your input isn't terminated with a quote character. Writing it as a raw string literal helps:
std::string const qinput = R"("Hel\"lo Wor\"ld)";
Should be
std::string const qinput = R"("Hel\"lo Wor\"ld")";
Now, the rest is common container handling: in Spirit, when a rule fails (also when it just backtracks a branch) the container attribute is not rolled back. See e.g. boost::spirit::qi duplicate parsing on the output, Understanding Boost.spirit's string parser, etc.
Basically, you cannot rely on the result if the parse failed. This is likely why the original had an expectation point: to raise an exception.
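The "not rolled back" behaviour can be illustrated without Spirit. The sketch below is a hand-rolled analogue, not Spirit code: `parse_quoted_in_place` and `parse_quoted_transactional` are hypothetical names, and only the `\"` and `\\` escapes are handled. The first function appends to the caller's string as it goes, so a failed parse still leaves characters behind; the second parses into a temporary and commits only on success.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Appends straight into `out` while parsing, the way a container attribute
// is filled. If the closing quote is missing, the function returns false
// but `out` already holds everything parsed so far.
bool parse_quoted_in_place(std::string_view in, std::string& out) {
    if (in.empty() || in.front() != '"')
        return false;
    in.remove_prefix(1);
    while (!in.empty() && in.front() != '"') {
        bool const is_escape = in.front() == '\\' && in.size() >= 2 &&
                               (in[1] == '"' || in[1] == '\\');
        if (is_escape) {
            out.push_back(in[1]);
            in.remove_prefix(2);
        } else {
            out.push_back(in.front());
            in.remove_prefix(1);
        }
    }
    return !in.empty(); // true only if a closing quote was found
}

// Parses into a temporary and assigns to `out` only on success, so a
// failed parse leaves the caller's string untouched.
bool parse_quoted_transactional(std::string_view in, std::string& out) {
    std::string tmp;
    if (!parse_quoted_in_place(in, tmp))
        return false;
    out = std::move(tmp);
    return true;
}
```

With the unterminated input from the question, the in-place version returns false yet leaves `Hel"lo Wor"ld` in the output string, which mirrors what the question observed.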
A full demonstration of the correct working:
Live On Coliru
#include <boost/spirit/home/x3.hpp>
#include <string>
#include <iostream>
#include <iomanip>
#include <string_view>
namespace x3 = boost::spirit::x3;
auto escapes = []{
x3::symbols<char> sym;
sym.add
("\\b", '\b')
("\\f", '\f')
("\\t", '\t')
("\\v", '\v')
("\\0", '\0')
("\\r", '\r')
("\\n", '\n')
("\\\\", '\\')
("\\\"", '"')
;
return sym;
}();
auto const qstring = x3::lexeme['"' >> *(escapes | ~x3::char_('"')) >> '"'];
int main(){
auto squote = [](std::string_view s) { return std::quoted(s, '\''); };
std::string const expected = R"(Hel"lo Wor"ld)";
for (std::string const qinput : {
R"("Hel\"lo Wor\"ld)", // oops no closing quote
R"("Hel\"lo Wor\"ld")",
"\"Hel\\\"lo Wor\\\"ld\"", // if you insist
R"("Hel\"lo Wor\"ld" trailing data)",
})
{
std::cout << "\n -- input " << squote(qinput) << "\n";
std::string result;
auto first = cbegin(qinput);
auto last = cend(qinput);
bool ok = x3::phrase_parse(first, last, qstring, x3::space, result);
ok &= (first == last);
std::cout << "parse returned " << std::boolalpha << ok << "\n";
std::cout << squote(result) << " == " << squote(expected) << " is "
<< (result == expected) << "\n";
if (first != last)
std::cout << "Remaining input unparsed: " << squote({first, last})
<< "\n";
}
}
Prints
-- input '"Hel\\"lo Wor\\"ld'
parse returned false
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true
Remaining input unparsed: '"Hel\\"lo Wor\\"ld'
-- input '"Hel\\"lo Wor\\"ld"'
parse returned true
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true
-- input '"Hel\\"lo Wor\\"ld"'
parse returned true
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true
-- input '"Hel\\"lo Wor\\"ld" trailing data'
parse returned false
'Hel"lo Wor"ld' == 'Hel"lo Wor"ld' is true
Remaining input unparsed: 'trailing data'

boost spirit x3 match an end of lexeme? [duplicate]

How does one prevent X3 symbol parsers from matching partial tokens? In the example below, I want to match "foo", but not "foobar". I tried throwing the symbol parser in a lexeme directive as one would for an identifier, but then nothing matches.
Thanks for any insights!
#include <string>
#include <iostream>
#include <iomanip>
#include <boost/spirit/home/x3.hpp>
int main() {
boost::spirit::x3::symbols<int> sym;
sym.add("foo", 1);
for (std::string const input : {
"foo",
"foobar",
"barfoo"
})
{
using namespace boost::spirit::x3;
std::cout << "\nParsing " << std::left << std::setw(20) << ("'" + input + "':");
int v;
auto iter = input.begin();
auto end = input.end();
bool ok;
{
// what's right rule??
// this matches nothing
// auto r = lexeme[sym - alnum];
// this matches prefix strings
auto r = sym;
ok = phrase_parse(iter, end, r, space, v);
}
if (ok) {
std::cout << v << " Remaining: " << std::string(iter, end);
} else {
std::cout << "Parse failed";
}
}
}

Qi used to have distinct in their repository.
X3 doesn't.
The thing that solves it for the case you showed is a simple lookahead assertion:
auto r = lexeme [ sym >> !alnum ];
You could make a distinct helper easily too, e.g.:
auto kw = [](auto p) { return lexeme [ p >> !(alnum | '_') ]; };
Now you can just parse kw(sym).
Live On Coliru
#include <iostream>
#include <boost/spirit/home/x3.hpp>
int main() {
boost::spirit::x3::symbols<int> sym;
sym.add("foo", 1);
for (std::string const input : { "foo", "foobar", "barfoo" }) {
std::cout << "\nParsing '" << input << "': ";
auto iter = input.begin();
auto const end = input.end();
int v = -1;
bool ok;
{
using namespace boost::spirit::x3;
auto kw = [](auto p) { return lexeme [ p >> !(alnum | '_') ]; };
ok = phrase_parse(iter, end, kw(sym), space, v);
}
if (ok) {
std::cout << v << " Remaining: '" << std::string(iter, end) << "'\n";
} else {
std::cout << "Parse failed";
}
}
}
Prints
Parsing 'foo': 1 Remaining: ''
Parsing 'foobar': Parse failed
Parsing 'barfoo': Parse failed
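The effect of the `!(alnum | '_')` lookahead can be pictured in plain C++ as a boundary check after the keyword. This is only a sketch of the idea, not how X3 implements it, and `starts_with_keyword` is a hypothetical helper name.

```cpp
#include <cctype>
#include <string_view>

// Hypothetical helper: true if `input` begins with `keyword` and the keyword
// is not immediately followed by an identifier character (alnum or '_').
bool starts_with_keyword(std::string_view input, std::string_view keyword) {
    if (input.substr(0, keyword.size()) != keyword)
        return false;
    if (input.size() == keyword.size())
        return true; // keyword ends exactly at end of input
    auto const next = static_cast<unsigned char>(input[keyword.size()]);
    // Look at the next character without consuming it.
    return !(std::isalnum(next) || next == '_');
}
```

Under this check "foo" and "foo bar" are accepted for the keyword "foo", while "foobar", "foo_" and "barfoo" are rejected.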

How to write a boost::spirit::qi parser to parse an integer range from 0 to std::numeric_limits<int>::max()?

I tried qi::uint_parser<int>(), but it behaves the same as qi::uint_: both match integers in the range 0 to std::numeric_limits<unsigned int>::max().
Is qi::uint_parser<int>() designed to work like this? Which parser should I use to match an integer in the range 0 to std::numeric_limits<int>::max()? Thanks.

Simplest demo, attaching a semantic action to do the range check:
uint_ [ _pass = (_1>=0 && _1<=std::numeric_limits<int>::max()) ];
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <limits>
#include <string>
template <typename It>
struct MyInt : boost::spirit::qi::grammar<It, int()> {
MyInt() : MyInt::base_type(start) {
using namespace boost::spirit::qi;
start %= uint_ [ _pass = (_1>=0 && _1<=std::numeric_limits<int>::max()) ];
}
private:
boost::spirit::qi::rule<It, int()> start;
};
template <typename Int>
void test(Int value, char const* logical) {
MyInt<std::string::const_iterator> p;
std::string const input = std::to_string(value);
std::cout << " ---------------- Testing '" << input << "' (" << logical << ")\n";
auto f = input.begin(), l = input.end();
int parsed;
if (parse(f, l, p, parsed)) {
std::cout << "Parse success: " << parsed << "\n";
} else {
std::cout << "Parse failed\n";
}
if (f!=l) {
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
int main() {
unsigned maxint = std::numeric_limits<int>::max();
MyInt<std::string::const_iterator> p;
test(maxint , "maxint");
test(maxint-1, "maxint-1");
test(maxint+1, "maxint+1");
test(0 , "0");
test(-1 , "-1");
}
Prints
---------------- Testing '2147483647' (maxint)
Parse success: 2147483647
---------------- Testing '2147483646' (maxint-1)
Parse success: 2147483646
---------------- Testing '2147483648' (maxint+1)
Parse failed
Remaining unparsed: '2147483648'
---------------- Testing '0' (0)
Parse success: 0
---------------- Testing '-1' (-1)
Parse failed
Remaining unparsed: '-1'
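For comparison, the same range check can be sketched with the standard library alone: parse into a wider unsigned type, then reject anything above std::numeric_limits<int>::max(). This assumes C++17 (`std::from_chars`), and `parse_nonnegative_int` is a made-up name, not a Spirit facility. Unlike the Spirit demo, it also rejects trailing characters instead of reporting them as unparsed.

```cpp
#include <charconv>
#include <limits>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: accepts only an unsigned decimal number in the range
// 0 .. std::numeric_limits<int>::max() that spans the whole input.
std::optional<int> parse_nonnegative_int(std::string_view text) {
    unsigned long long wide = 0;
    char const* const first = text.data();
    char const* const last = text.data() + text.size();
    // from_chars on an unsigned type accepts no sign, so "-1" is rejected.
    auto const [ptr, ec] = std::from_chars(first, last, wide);
    if (ec != std::errc{} || ptr != last)
        return std::nullopt; // not a number, overflow, or trailing characters
    if (wide > static_cast<unsigned long long>(std::numeric_limits<int>::max()))
        return std::nullopt; // fits an unsigned type but not int
    return static_cast<int>(wide);
}
```

On a platform with 32-bit int, "2147483647" is accepted, while "2147483648" and "-1" produce an empty optional, matching the cases in the output above.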

How to extract trimmed text using Boost Spirit?

Using boost spirit, I'd like to extract a string that is followed by some data in parentheses. The relevant string is separated by a space from the opening parenthesis. Unfortunately, the string itself may contain spaces. I'm looking for a concise solution that returns the string without a trailing space.
The following code illustrates the problem:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>
#include <iostream>
namespace qi = boost::spirit::qi;
using std::string;
using std::cout;
using std::endl;
void
test_input(const string &input)
{
string::const_iterator b = input.begin();
string::const_iterator e = input.end();
string parsed;
bool const r = qi::parse(b, e,
*(qi::char_ - qi::char_("(")) >> qi::lit("(Spirit)"),
parsed
);
if(r) {
cout << "PASSED:" << endl;
} else {
cout << "FAILED:" << endl;
}
cout << " Parsed: \"" << parsed << "\"" << endl;
cout << " Rest: \"" << string(b, e) << "\"" << endl;
}
int main()
{
test_input("Fine (Spirit)");
test_input("Hello, World (Spirit)");
return 0;
}
Its output is:
PASSED:
Parsed: "Fine "
Rest: ""
PASSED:
Parsed: "Hello, World "
Rest: ""
With this simple grammar, the extracted string is always followed by a space (which I'd like to eliminate).
The solution should work within Spirit since this is only part of a larger grammar. (Thus, it would probably be clumsy to trim the extracted strings after parsing.)
Thank you in advance.

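As the comment said, if there is always exactly one space, you can simply hard-code it. If you need to be more flexible or tolerant:
I'd use a skipper together with raw to "cheat" the skipper for your purposes. qi::raw exposes the matched source range, so the spaces inside the text are preserved, while the space before the parenthesis is left to the skipper and stays out of the result: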
bool const r = qi::phrase_parse(b, e,
qi::raw [ *(qi::char_ - qi::char_("(")) ] >> qi::lit("(Spirit)"),
qi::space,
parsed
);
This works, and prints
PASSED:
Parsed: "Fine"
Rest: ""
PASSED:
Parsed: "Hello, World"
Rest: ""
See it Live on Coliru
Full program for reference:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>
#include <iostream>
namespace qi = boost::spirit::qi;
using std::string;
using std::cout;
using std::endl;
void
test_input(const string &input)
{
string::const_iterator b = input.begin();
string::const_iterator e = input.end();
string parsed;
bool const r = qi::phrase_parse(b, e,
qi::raw [ *(qi::char_ - qi::char_("(")) ] >> qi::lit("(Spirit)"),
qi::space,
parsed
);
if(r) {
cout << "PASSED:" << endl;
} else {
cout << "FAILED:" << endl;
}
cout << " Parsed: \"" << parsed << "\"" << endl;
cout << " Rest: \"" << string(b, e) << "\"" << endl;
}
int main()
{
test_input("Fine (Spirit)");
test_input("Hello, World (Spirit)");
return 0;
}

Simple parser and generator

I need to parse and generate some texts from and to c++ objects.
The syntax is:
command #param #param #param
There is a set of commands; some of them take no params.
Params are mainly numbers.
The question is: should I use Boost Spirit for this task? Or should I simply tokenize each line, pick the function to call by comparing the command string, read the additional parameters, and create a C++ object from them?
If you suggest Spirit or any other solution, it would be nice if you could provide some examples similar to my problem. I've read and tried all the examples from the Boost Spirit docs.

I implemented more or less precisely this in a previous answer to the question " Using boost::bind with boost::function: retrieve binded variable type ".
The complete working sample program (which expects a very similar grammar) using Boost Spirit is here: https://gist.github.com/1314900. You'd just want to remove the /execute literals for your grammar, so edit Line 41 from
if (!phrase_parse(f,l, "/execute" > (
to
if (!phrase_parse(f,l, (
The example script
WriteLine "bogus"
Write "here comes the answer: "
Write 42
Write 31415e-4
Write "that is the inverse of" 24 "and answers nothing"
Shutdown "Bye" 9
Shutdown "Test default value for retval"
Now results in the following output after execution:
WriteLine('bogus');
Write(string: 'here comes the answer: ');
Write(double: 42);
Write(double: 3.1415);
Write(string: 'that is the inverse of');
Write(double: 24);
Write(string: 'and answers nothing');
Shutdown(reason: 'Bye', retval: 9)
Shutdown(reason: 'Test default value for retval', retval: 0)
Full Code
For archival purposes:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
///////////////////////////////////
// 'domain classes' (scriptables)
struct Echo
{
void WriteLine(const std::string& s) { std::cout << "WriteLine('" << s << "');" << std::endl; }
void WriteStr (const std::string& s) { std::cout << "Write(string: '" << s << "');" << std::endl; }
void WriteInt (int i) { std::cout << "Write(int: " << i << ");" << std::endl; }
void WriteDbl (double d) { std::cout << "Write(double: " << d << ");" << std::endl; }
void NewLine () { std::cout << "NewLine();" << std::endl; }
} echoService;
struct Admin
{
void Shutdown(const std::string& reason, int retval)
{
std::cout << "Shutdown(reason: '" << reason << "', retval: " << retval << ")" << std::endl;
// exit(retval);
}
} adminService;
void execute(const std::string& command)
{
typedef std::string::const_iterator It;
It f(command.begin()), l(command.end());
using namespace qi;
using phx::bind;
using phx::ref;
rule<It, std::string(), space_type> stringlit = lexeme[ '"' >> *~char_('"') >> '"' ];
try
{
if (!phrase_parse(f,l, /*"/execute" >*/ (
(lit("WriteLine")
> stringlit [ bind(&Echo::WriteLine, ref(echoService), _1) ])
| (lit("Write") >> +(
double_ [ bind(&Echo::WriteDbl, ref(echoService), _1) ] // the order matters
| int_ [ bind(&Echo::WriteInt, ref(echoService), _1) ]
| stringlit [ bind(&Echo::WriteStr, ref(echoService), _1) ]
))
| (lit("NewLine") [ bind(&Echo::NewLine, ref(echoService)) ])
| (lit("Shutdown") > (stringlit > (int_ | attr(0)))
[ bind(&Admin::Shutdown, ref(adminService), _1, _2) ])
), space))
{
if (f!=l) // allow whitespace only lines
std::cerr << "** (error interpreting command: " << command << ")" << std::endl;
}
}
catch (const expectation_failure<It>& e)
{
std::cerr << "** (unexpected input '" << std::string(e.first, std::min(e.first+10, e.last)) << "') " << std::endl;
}
if (f!=l)
std::cerr << "** (warning: skipping unhandled input '" << std::string(f,l) << "')" << std::endl;
}
int main()
{
std::ifstream ifs("input.txt");
std::string command;
while (std::getline(ifs/*std::cin*/, command))
execute(command);
}

For simply formatted, easily tested input, tokenizing should be enough.
When tokenizing, you can read a line from the input and put it in a stringstream (iss). From iss, you read the first word and pass it to a command factory, which creates the right command for you. Then you pass iss to the readParameters function of the new command, so each command can parse its own parameters and check whether they are all valid.
Untested code sample:
std::string line;
std::getline(inputStream, line);
std::istringstream iss(line);
std::string strCmd;
iss >> strCmd;
try
{
std::unique_ptr<Cmd> newCmd = myCmdFactory(strCmd);
newCmd->readParameters(iss);
newCmd->execute();
//...
}
catch (std::exception& e)
{
std::cout << "Issue with received command: " << e.what() << "\n";
}
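The snippet above leaves `Cmd` and `myCmdFactory` undefined. One possible shape for them is sketched below. The names `Cmd`, `myCmdFactory`, `readParameters` and `execute` come from the answer; the concrete commands `WriteCmd` and `NewLineCmd`, the `runLine` wrapper, and the choice to have `execute` return a description string (so the result is easy to inspect) are assumptions made for illustration.

```cpp
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>

// Assumed interface: each command reads its own parameters from the stream.
struct Cmd {
    virtual ~Cmd() = default;
    virtual void readParameters(std::istream& in) = 0;
    virtual std::string execute() = 0; // returns a description (assumption)
};

// Hypothetical command taking one numeric parameter.
struct WriteCmd : Cmd {
    double value = 0;
    void readParameters(std::istream& in) override {
        if (!(in >> value))
            throw std::invalid_argument("Write expects one numeric parameter");
    }
    std::string execute() override {
        std::ostringstream os;
        os << "Write(" << value << ")";
        return os.str();
    }
};

// Hypothetical command without parameters.
struct NewLineCmd : Cmd {
    void readParameters(std::istream&) override {}
    std::string execute() override { return "NewLine()"; }
};

// Maps the first word of a line to a command object; throws on unknown names.
std::unique_ptr<Cmd> myCmdFactory(std::string const& name) {
    if (name == "Write")
        return std::make_unique<WriteCmd>();
    if (name == "NewLine")
        return std::make_unique<NewLineCmd>();
    throw std::invalid_argument("unknown command: " + name);
}

// Tokenize one line: first word selects the command, the rest are parameters.
std::string runLine(std::string const& line) {
    std::istringstream iss(line);
    std::string strCmd;
    iss >> strCmd;
    std::unique_ptr<Cmd> cmd = myCmdFactory(strCmd);
    cmd->readParameters(iss);
    return cmd->execute();
}
```

With these definitions, `runLine("Write 42")` returns "Write(42)", and an unknown command name or a missing parameter surfaces as a `std::invalid_argument`, which the try/catch in the answer's snippet would report.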
