After some delay I'm now again trying to parse some ASCII text file
surrounded by some binary characters.
Parsing text file with binary envelope using Boost Spirit
However, I'm now unsure whether a skip parser is the right approach.
The grammar of the file (it's a JEDEC file) is quite simple:
Each data field in the file starts with a single letter and ends with an asterisk. The data field can contain spaces and carriage returns.
After the asterisk, spaces and carriage returns may also follow before the next field identifier.
This is what I used to start building a parser for such a file:
phrase_parse(first, last,
// First char in File
char_('\x02') >>
// Data field
*((print[cout << _1] | graph[cout << _1]) - char_('*')) >>
// End of data followed by 4 digit hexnumber. How to limit?
char_('\x03') >> *xdigit,
// Skip asterisks
char_('*') );
Unfortunately I don't get any output from this one. Does someone have an idea what might be wrong?
Sample file:
<STX>
JEDEC file generated by John Doe*
DM SIGNETICS(PHILIPS)*
DD GAL16R8*
QP20*
QV0*
G0*F0*
L00000 1110101111100110111101101110111100111111*
CDEAD*
<ETX>BEEF
And this is what I want to achieve:
Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF
I would suggest you want to use a skipper at the toplevel rule only. And use it to skip the insignificant whitespace.
You don't use a skipper for the asterisks because you do not want to ignore them. If they're ignored, your rules cannot act upon them.
Furthermore the inner rules should not use the space skipper for the simple reason that whitespace and linefeeds are valid field data in JEDEC.
So, the upshot of all this would be:
value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
field = ascii::graph >> value;
start = STX >> value >> *field >> ETX >> xmit_checksum;
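Where the rules would be declared with the respective skippers. The checksum uses a uint_parser with radix 16 and both MinDigits and MaxDigits set to 4, which is how you limit it to exactly four hex digits: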
qi::uint_parser<uint16_t, 16, 4, 4> xmit_checksum;
qi::rule<It, ascii::space_type> start;
qi::rule<It> field, value; // no skippers - they are lexemes
Take-away: Split your grammar up in rules. Be happier for it.
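To make the "exactly four hex digits" constraint concrete outside of Spirit, here is a minimal plain-C++ sketch. parse_checksum is a hypothetical helper, not a Boost API; it only mirrors what MinDigits = MaxDigits = 4 means for the uint_parser declared above.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Boost): succeeds only if `text` consists of
// exactly four hexadecimal digits, and stores their value in `out`.
bool parse_checksum(const std::string& text, std::uint16_t& out) {
    if (text.size() != 4) return false;            // fewer or more digits: reject
    std::uint16_t value = 0;
    for (unsigned char c : text) {
        if (!std::isxdigit(c)) return false;       // non-hex character: reject
        int digit = std::isdigit(c) ? c - '0' : std::toupper(c) - 'A' + 10;
        value = static_cast<std::uint16_t>(value * 16 + digit);
    }
    out = value;
    return true;
}
```

With the sample file, the text after ETX is "BEEF", so this helper would accept it, whereas "BEE" or "BEEF0" would be rejected.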
Processing the results
Your sample needlessly mixes responsibilities for parsing and "printing".
I'd suggest not using semantic actions here (Boost Spirit: "Semantic actions are evil"?).
Instead, declare appropriate attribute types:
struct JEDEC {
std::string caption;
struct field {
char id;
std::string value;
};
std::vector<field> fields;
uint16_t checksum;
};
And declare them in your rules:
qi::rule<It, ast::JEDEC(), ascii::space_type> start;
qi::rule<It, ast::JEDEC::field()> field;
qi::rule<It, std::string()> value;
qi::uint_parser<uint16_t, 16, 4, 4> xmit_checksum;
Now, nothing needs to be changed in your grammar, and you can print the desired output with:
inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
os << "Start: " << jedec.caption << "\n";
for(auto& f : jedec.fields)
os << f.id << ": " << f.value << "\n";
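auto saved = os.flags(); // save the format flags (rdstate() only returns the error state)
os << "End: " << std::hex << std::setw(4) << std::setfill('0') << std::uppercase << jedec.checksum;
os.flags(saved); // restore them so std::hex does not leak to the caller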
return os;
}
LIVE DEMO
Here's a demo program that ties it together using the sample input from your question:
Live On Coliru
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
namespace qi = boost::spirit::qi;
namespace ascii = qi::ascii;
namespace ast {
struct JEDEC {
std::string caption;
struct field {
char id;
std::string value;
};
std::vector<field> fields;
uint16_t checksum;
};
inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
os << "Start: " << jedec.caption << "\n";
for(auto& f : jedec.fields)
os << f.id << ": " << f.value << "\n";
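auto saved = os.flags(); // save the format flags (rdstate() only returns the error state)
os << "End: " << std::hex << std::setw(4) << std::setfill('0') << std::uppercase << jedec.checksum;
os.flags(saved); // restore them so std::hex does not leak to the caller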
return os;
}
}
BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC::field,
(char, id)(std::string, value))
BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC,
(std::string, caption)
(std::vector<ast::JEDEC::field>, fields)
(uint16_t, checksum))
template <typename It>
struct JedecGrammar : qi::grammar<It, ast::JEDEC(), ascii::space_type>
{
JedecGrammar() : JedecGrammar::base_type(start) {
const char STX = '\x02';
const char ETX = '\x03';
value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
field = ascii::graph >> value;
start = STX >> value >> *field >> ETX >> xmit_checksum;
BOOST_SPIRIT_DEBUG_NODES((start)(field)(value))
}
private:
qi::rule<It, ast::JEDEC(), ascii::space_type> start;
qi::rule<It, ast::JEDEC::field()> field;
qi::rule<It, std::string()> value;
qi::uint_parser<uint16_t, 16, 4, 4> xmit_checksum;
};
int main() {
typedef boost::spirit::istream_iterator It;
It first(std::cin>>std::noskipws), last;
JedecGrammar<It> g;
ast::JEDEC jedec;
bool ok = phrase_parse(first, last, g, ascii::space, jedec);
if (ok)
{
std::cout << "Parse success\n";
std::cout << jedec;
}
else
std::cout << "Parse failed\n";
if (first != last)
std::cout << "Remaining input unparsed: '" << std::string(first, last) << "'\n";
}
Output:
Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF
Take-away: See your dentist twice a year.
I'm basing my app off this example and getting the exact same results. For some reason, the contents of the input string are all parsed into the fusion struct 'comments', and nothing is parsed into the fusion struct 'numbers'. So not sure where I'm going wrong here.
namespace client {
namespace ast {
struct number {
int num1;
int num2;
};
struct comment {
std::string text;
bool dummy;
};
struct input {
std::vector<comment> comments;
std::vector<number> numbers;
};
}
}
BOOST_FUSION_ADAPT_STRUCT(client::ast::comment, text, dummy)
BOOST_FUSION_ADAPT_STRUCT(client::ast::number, num1, num2)
BOOST_FUSION_ADAPT_STRUCT(client::ast::input, comments, numbers)
namespace client {
namespace parser {
namespace x3 = boost::spirit::x3;
using namespace x3;
x3::attr_gen dummy;
typedef std::string::const_iterator It;
using namespace x3;
auto const comment = *(char_ - eol) >> dummy(false);
auto const number = int_ >> int_;
auto lines = [](auto p) { return *(p >> eol); };
auto const input =
lines(comment) >>
lines(number);
}
}
int main()
{
namespace x3 = boost::spirit::x3;
std::string const iss("any char string here\n1 2\n");
auto iter = iss.begin(), eof = iss.end();
client::ast::input types;
bool ok = parse(iter, eof, client::parser::input, types);
if (iter != eof) {
std::cout << "Remaining unparsed: '" << std::string(iter, eof) << "'\n";
}
std::cout << "Parsed: " << (100.0 * std::distance(iss.begin(), iter) / iss.size()) << "%\n";
std::cout << "ok = " << ok << std::endl;
// This range loop prints all contents of the input.
for (auto& item : types.comments) { std::cout << "comment: " << boost::fusion::as_deque(item) << "\n"; }
// This loop prints nothing.
for (auto& item : types.numbers) { std::cout << "number: " << boost::fusion::as_deque(item) << "\n"; }
}
My larger application does the same with a large input file and several more AST's, yet it would seem all my examples are consumed by the comment parser.
Here's the complete running example.
http://coliru.stacked-crooked.com/a/f983b26d673305a0
Thoughts?
You took the grammar idea from my answer here: X3, how to populate a more complex AST?
There it worked because the line formats are not ambiguous. In fact the "variant" approach you had required special attention, and I noted that in this bullet:
departments need to be ordered before teams, or you get "team" matched instead of departments
The same kind of ambiguity exists in your grammar. *(char_ - eol) matches "1 2" just fine, so obviously it is added as a comment. You will have to disambiguate the grammar or somehow force the switch to "parse number lines now" mode.
If you wholly don't care what precedes the number lines, just use x3::seek [ lines(number) ].
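The ordering problem can be shown without Spirit at all. The following is a minimal sketch in plain C++, where Input, is_number_line and classify are hypothetical names chosen for illustration. It tries the more specific "two integers" pattern first and only falls back to "comment" when that fails; swapping the order would classify every line as a comment, which is the behaviour described in the question.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Input {
    std::vector<std::string> comments;
    std::vector<std::pair<int, int>> numbers;
};

// True only if the whole line consists of exactly two whitespace-separated integers.
bool is_number_line(const std::string& line, std::pair<int, int>& out) {
    std::istringstream in(line);
    std::string rest;
    if (!(in >> out.first >> out.second)) return false;
    return !(in >> rest);   // leftover text means this is not a number line
}

// Classify each line: the specific pattern is tried before the catch-all one.
Input classify(const std::string& text) {
    Input result;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::pair<int, int> n;
        if (is_number_line(line, n)) result.numbers.push_back(n);
        else result.comments.push_back(line);
    }
    return result;
}
```

For the input from the question, "any char string here\n1 2\n", this yields one comment and one number pair.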
I'm trying to understand how boost spirit assign_to_* customization points work.
Here is an example I am using:
I have this parser in a rule in a grammar:
int_ >> lit(':') >> char_
And I want the result to be put in this struct:
struct IntAndChar{
int i;
char c;
};
(This is just an example to exercise the customization point, so I won't use BOOST_FUSION_ADAPT_STRUCT or semantic actions.)
I thought I could just define a specialization of assign_to_attribute_from_value but I only get the int this way and the second element is dropped.
Can someone give me a hint to understand how it works?
You don't want to assign to the attribute¹. Instead you wish to transform boost::fusion::vector2<int, char> into IntAndChar.
Therefore, let's start off telling spirit our type is not container-like:
template<>
struct is_container<IntAndChar, void> : mpl::false_ { };
Next, tell it how it can transform between the raw and cooked forms of our attributes:
template<>
struct transform_attribute<IntAndChar, fusion::vector2<int, char>, qi::domain, void> {
using Transformed = fusion::vector2<int, char>;
using Exposed = IntAndChar;
using type = Transformed;
static Transformed pre(Exposed&) { return Transformed(); }
static void post(Exposed& val, Transformed const& attr) {
val.i = fusion::at_c<0>(attr);
val.c = fusion::at_c<1>(attr);
}
static void fail(Exposed&) {}
};
That's it! There is one catch though. It won't work unless you trigger a transformation. The docs say:
It is invoked by Qi rule, semantic action and attr_cast, [...]
1. Using qi::rule (not very helpful)
So here's a solution using rule:
Live On Coliru
int main() {
using It = std::string::const_iterator;
qi::rule<It, boost::fusion::vector2<int, char>(), qi::space_type> rule = qi::int_ >> ':' >> qi::char_;
//qi::rule<It, IntAndChar(), qi::space_type> rule = qi::attr_cast(qi::int_ >> ':' >> qi::char_);
for (std::string const input : { "123:a", "-4 : \r\nq" }) {
It f = input.begin(), l = input.end();
IntAndChar data;
bool ok = qi::phrase_parse(f, l, rule, qi::space, data);
if (ok) std::cout << "Parse success: " << data.i << ", " << data.c << "\n";
else std::cout << "Parse failure ('" << input << "')\n";
if (f != l) std::cout << "Remaining unparsed input: '" << std::string(f, l) << "'\n";
}
}
Prints:
Parse success: 123, a
Parse success: -4, q
Of course this approach requires you to spell out boost::fusion::vector2<int, char> which is tedious and error-prone.
2. Using qi::attr_cast
You can use qi::attr_cast to trigger the transform:
qi::rule<It, IntAndChar(), qi::space_type> rule = qi::attr_cast<IntAndChar, boost::fusion::vector2<int, char> >(qi::int_ >> ':' >> qi::char_);
// using deduction:
qi::rule<It, IntAndChar(), qi::space_type> rule = qi::attr_cast<IntAndChar>(qi::int_ >> ':' >> qi::char_);
// using even more deduction:
qi::rule<It, IntAndChar(), qi::space_type> rule = qi::attr_cast(qi::int_ >> ':' >> qi::char_);
CAVEAT That should work. However, due to very subtle behaviour (bugs?) you need to deep-copy the Proto expression tree there, in order for it to work without Undefined Behaviour:
qi::rule<It, IntAndChar(), qi::space_type> rule = qi::attr_cast(qi::copy(qi::int_ >> ':' >> qi::char_));
Bringing it all together, we can even do without the qi::rule:
Live On Coliru
int main() {
using It = std::string::const_iterator;
for (std::string const input : { "123:a", "-4 : \r\nq" }) {
It f = input.begin(), l = input.end();
IntAndChar data;
bool ok = qi::phrase_parse(f, l, qi::attr_cast(qi::copy(qi::int_ >> ':' >> qi::char_)), qi::space, data);
if (ok) std::cout << "Parse success: " << data.i << ", " << data.c << "\n";
else std::cout << "Parse failure ('" << input << "')\n";
if (f != l) std::cout << "Remaining unparsed input: '" << std::string(f, l) << "'\n";
}
}
Prints
Parse success: 123, a
Parse success: -4, q
¹ (unless you want to treat IntAndChar as a container, which is a different story)
I have a file which contains some "entity" data in Valve's format. It's basically a key-value deal, and it looks like this:
{
"world_maxs" "3432 4096 822"
"world_mins" "-2408 -4096 -571"
"skyname" "sky_alpinestorm_01"
"maxpropscreenwidth" "-1"
"detailvbsp" "detail_sawmill.vbsp"
"detailmaterial" "detail/detailsprites_sawmill"
"classname" "worldspawn"
"mapversion" "1371"
"hammerid" "1"
}
{
"origin" "553 -441 322"
"targetname" "tonemap_global"
"classname" "env_tonemap_controller"
"hammerid" "90580"
}
Each pair of {} counts as one entity, and the rows inside count as KeyValues. As you can see, it's fairly straightforward.
I want to process this data into a vector<map<string, string> > in C++. To do this, I've tried using regular expressions that come with Boost. Here is what I have so far:
static const boost::regex entityRegex("\\{(\\s*\"([A-Za-z0-9_]+)\"\\s*\"([^\"]+)\")+\\s*\\}");
boost::smatch what;
while (regex_search(entitiesString, what, entityRegex)) {
cout << what[0] << endl;
cout << what[1] << endl;
cout << what[2] << endl;
cout << what[3] << endl;
break; // TODO
}
Easier-to-read regex:
\{(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)")+\s*\}
I'm not sure the regex is well-formed for my problem yet, but it seems to print the last key-value pair (hammerid, 1) at least.
My question is, how would I go about extracting the "nth" matched subexpression within an expression? Or is there not really a practical way to do this? Would it perhaps be better to write two nested while-loops, one which searches for the {} patterns, and then one which searches for the actual key-value pairs?
Thanks!
Using a parser generator you can code a proper parser.
For example, using Boost Spirit you can define the rules of the grammar inline as C++ expressions:
start = *entity;
entity = '{' >> *entry >> '}';
entry = text >> text;
text = '"' >> *~char_('"') >> '"';
Here's a full demo:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <map>
using Entity = std::map<std::string, std::string>;
using ValveData = std::vector<Entity>;
namespace qi = boost::spirit::qi;
template <typename It, typename Skipper = qi::space_type>
struct Grammar : qi::grammar<It, ValveData(), Skipper>
{
Grammar() : Grammar::base_type(start) {
using namespace qi;
start = *entity;
entity = '{' >> *entry >> '}';
entry = text >> text;
text = '"' >> *~char_('"') >> '"';
BOOST_SPIRIT_DEBUG_NODES((start)(entity)(entry)(text))
}
private:
qi::rule<It, ValveData(), Skipper> start;
qi::rule<It, Entity(), Skipper> entity;
qi::rule<It, std::pair<std::string, std::string>(), Skipper> entry;
qi::rule<It, std::string()> text;
};
int main()
{
using It = boost::spirit::istream_iterator;
Grammar<It> parser;
It f(std::cin >> std::noskipws), l;
ValveData data;
bool ok = qi::phrase_parse(f, l, parser, qi::space, data);
if (ok) {
std::cout << "Parsing success:\n";
int count = 0;
for(auto& entity : data)
{
++count;
for (auto& entry : entity)
std::cout << "Entity " << count << ": [" << entry.first << "] -> [" << entry.second << "]\n";
}
} else {
std::cout << "Parsing failed\n";
}
if (f!=l)
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Which prints (for the input shown):
Parsing success:
Entity 1: [classname] -> [worldspawn]
Entity 1: [detailmaterial] -> [detail/detailsprites_sawmill]
Entity 1: [detailvbsp] -> [detail_sawmill.vbsp]
Entity 1: [hammerid] -> [1]
Entity 1: [mapversion] -> [1371]
Entity 1: [maxpropscreenwidth] -> [-1]
Entity 1: [skyname] -> [sky_alpinestorm_01]
Entity 1: [world_maxs] -> [3432 4096 822]
Entity 1: [world_mins] -> [-2408 -4096 -571]
Entity 2: [classname] -> [env_tonemap_controller]
Entity 2: [hammerid] -> [90580]
Entity 2: [origin] -> [553 -441 322]
Entity 2: [targetname] -> [tonemap_global]
I think doing it all with one regular expression is hard because of the variable number of entries inside each entity {}. Personally I would consider simply using std::getline to do your parsing.
#include <map>
#include <vector>
#include <string>
#include <sstream>
#include <iostream>
std::istringstream iss(R"~(
{
"world_maxs" "3432 4096 822"
"world_mins" "-2408 -4096 -571"
"skyname" "sky_alpinestorm_01"
"maxpropscreenwidth" "-1"
"detailvbsp" "detail_sawmill.vbsp"
"detailmaterial" "detail/detailsprites_sawmill"
"classname" "worldspawn"
"mapversion" "1371"
"hammerid" "1"
}
{
"origin" "553 -441 322"
"targetname" "tonemap_global"
"classname" "env_tonemap_controller"
"hammerid" "90580"
}
)~");
int main()
{
std::string skip;
std::string entity;
std::vector<std::map<std::string, std::string> > vm;
// skip to open brace, read entity until close brace
while(std::getline(iss, skip, '{') && std::getline(iss, entity, '}'))
{
// turn entity into input stream
std::istringstream iss(entity);
// temporary map
std::map<std::string, std::string> m;
std::string key, val;
// skip to open quote, read key to close quote
while(std::getline(iss, skip, '"') && std::getline(iss, key, '"'))
{
// skip to open quote read val to close quote
if(std::getline(iss, skip, '"') && std::getline(iss, val, '"'))
m[key] = val;
}
// move map (no longer needed)
vm.push_back(std::move(m));
}
for(auto& m: vm)
{
for(auto& p: m)
std::cout << p.first << ": " << p.second << '\n';
std::cout << '\n';
}
}
Output:
classname: worldspawn
detailmaterial: detail/detailsprites_sawmill
detailvbsp: detail_sawmill.vbsp
hammerid: 1
mapversion: 1371
maxpropscreenwidth: -1
skyname: sky_alpinestorm_01
world_maxs: 3432 4096 822
world_mins: -2408 -4096 -571
classname: env_tonemap_controller
hammerid: 90580
origin: 553 -441 322
targetname: tonemap_global
I would have written it like this:
^\{(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)")+\s*\}$
Or split the regex into two. First match the curly braces, then loop through the content of the curly braces line by line.
Match curly braces: ^\{([^\}]+)\}$
Match the lines: ^(\s*"([A-Za-z0-9_]+)"\s*"([^"]+)"\s*)$
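A repeated capture group such as (...)+ keeps only its last match, which is why the single regex in the question printed just the final key-value pair. Here is a rough sketch of the two-pass idea, written with std::regex from the standard library instead of Boost.Regex (that swap, the simplified patterns and the name parse_entities are my assumptions, not the asker's code). The outer iterator finds each {...} block and the inner one walks the quoted pairs inside it.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

using Entity = std::map<std::string, std::string>;

// Two passes: first every {...} block, then every "key" "value" pair inside it.
std::vector<Entity> parse_entities(const std::string& text) {
    static const std::regex entity_re(R"rx(\{([^}]*)\})rx");
    static const std::regex pair_re(R"rx("([^"]*)"\s*"([^"]*)")rx");

    std::vector<Entity> entities;
    const std::sregex_iterator end;
    for (std::sregex_iterator e(text.begin(), text.end(), entity_re); e != end; ++e) {
        const std::string body = (*e)[1].str();   // text between the braces
        Entity entity;
        for (std::sregex_iterator p(body.begin(), body.end(), pair_re); p != end; ++p)
            entity[(*p)[1].str()] = (*p)[2].str();
        entities.push_back(std::move(entity));
    }
    return entities;
}
```

Calling parse_entities on the entity text from the question should give a vector with one map per {...} block. Note that this sketch does not handle escaped quotes inside values.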
(I'm sorry if I'm asking this question the wrong way; this is the first time I've written on a forum.)
When I started programming my SFML game, I had a very old book, which was very C-like (e.g. it recommended atoi()).
Now I have a new C++ book (including C++11), and I want to rewrite the old lines with newer code.
I saved the Tiles in a file stored like this:
[0-0,15-1|22,44] [0-1|0]
[4-0,10-1,3-1|0] [0-5,5-5|0]
That means:
[...] describes a Tile
0-0 etc. is the xy position on the texture sheet
22 etc. is the event that will be triggered.
The number of events and sf::Vector2i entries is not fixed.
The Tiles are extracted separately by another class, which manages the entire Tilemap.
Now my problem: I have no idea how I should push the numbers from the stringstream into the two containers.
My code:
class Tile{
private:
std::deque<sf::Sprite> tile;
std::deque<int> event;
public:
Tile(sf::Texture& texture, std::deque<sf::Vector2i>&& ctor_texturerects, std::deque<int>&& ctor_events);//This one is working fine
Tile(sf::Texture& texture, std::stringstream&& ctor_stream/*reads the Tile*/){
std::deque<sf::Vector2i> temp_texturerects;
std::deque<int>temp_events;
/*TODO: filter the stringstream and push them into the containers*/
Tile::Tile(texture,std::move(temp_texturerect),std::move(temp_events));
}
I'd also be very happy if you could suggest another approach, such as a better alternative to sf::Vector2i or a better stream and class concept.
Thanks in advance
Xeno Ceph
Edit:
I made a little workaround:
(I changed the input stream to a normal string.)
But the code doesn't look good.
There must be an easier solution.
Tile:: Tile(sf::Texture& texture, std::string&& ctor_string){
std::deque<sf::Vector2i> temp_texturerects;
std::deque<int> temp_events;
std::stringstream strstr;
for(int i=0; i<ctor_string.size(); ++i){
while(ctor_string[i]!='|'){
while(ctor_string[i] != ','){
strstr << ctor_string[i];
}
sf::Vector2i v2i;
strstr >> v2i.x >> v2i.y;
temp_texturerects.push_front(v2i);
strstr.str("");
}
while(ctor_string[i]!=']'){
while(ctor_string[i] != ','){
strstr << ctor_string[i];
}
int integer;
strstr >> integer;
temp_events.push_front(integer);
strstr.str("");
}
}
Tile::Tile(texture, std::move(temp_texturerects), std::move(temp_events));
}
Does anybody have a better solution?
If I understand your question correctly, you have some strings of the form
[0-0,15-1|22,44] [0-1|0]
[4-0,10-1,3-1|0] [0-5,5-5|0]
and you want to extract 2 types of data - positions (e.g. 0-0) and events (e.g. 22).
Your question is how to extract this data cleanly, discarding the [ and ] characters, etc.
One great way to approach this is to use the getline function that operates on stringstreams, which inherit from std::istream (http://www.cplusplus.com/reference/string/string/getline/). It can take custom delimiters, not just the newline character. So you can use '[', '|' and ']' as different delimiting characters and parse them in a logical order.
For example, since your string is just a collection of tiles, you can split it up into a number of functions - ParseTile, ParsePositions and ParseEvents, something like the following:
void Tile::ParseInput(stringstream&& ctor_string) {
    //extract input, tile by tile
    string tile;
    //you can treat each tile as though it is on a separate line by
    //specifying the ']' character as the delimiter for the "line";
    //looping on getline itself (rather than on eof()) stops cleanly at the end of the input
    while(getline(ctor_string, tile, ']')) {
        tile += "]"; //add back the ']' character that was discarded from the getline
        //the string "tile" should now contain a single tile [...], which we can further process using ParseTile
        ParseTile(std::move(tile));
    }
}
The ParseTile function:
void Tile::ParseTile(const string& tile) {
    //input string tile is e.g. " [0-0,15-1|22,44]"
    string discard; //string to hold parts of tile string that should be thrown away
    string positions; //string to hold list of positions, separated by ','
    string events; //string to hold events, separated by ','
    //create stringstream from input
    stringstream tilestream(tile);
    //tilestream is e.g. " [0-0,15-1|22,44]"
    getline(tilestream, discard, '['); //reads everything up to '[' and drops the '[' itself
    //now, discard is " " and tilestream is "0-0,15-1|22,44]"
    getline(tilestream, positions, '|');
    //now, positions is "0-0,15-1" and tilestream is "22,44]"
    getline(tilestream, events, ']');
    //now, events is "22,44" and tilestream is empty
    ParsePositions(positions);
    ParseEvents(events);
}
You can write your own ParsePositions and ParseEvents functions, which will basically be more getline calls using ',' as the delimiting character (just loop until the string ends).
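As a rough sketch of what those two helpers could look like: the versions below are free functions that return their results, which is an assumption on my part (the answer declares them as Tile members), and Vec2i is a stand-in for sf::Vector2i so the example compiles without SFML.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for sf::Vector2i so the sketch compiles without SFML.
struct Vec2i { int x; int y; };

// "0-0,15-1" -> {0,0}, {15,1}; items that do not look like "x-y" are skipped.
std::vector<Vec2i> ParsePositions(const std::string& positions) {
    std::vector<Vec2i> result;
    std::istringstream list(positions);
    std::string item;
    while (std::getline(list, item, ',')) {        // split on ','
        std::istringstream parts(item);
        Vec2i v{};
        char dash = 0;
        if (parts >> v.x >> dash >> v.y && dash == '-')
            result.push_back(v);
    }
    return result;
}

// "22,44" -> 22, 44; items that are not integers are skipped.
std::vector<int> ParseEvents(const std::string& events) {
    std::vector<int> result;
    std::istringstream list(events);
    std::string item;
    while (std::getline(list, item, ',')) {        // split on ','
        std::istringstream number(item);
        int value = 0;
        if (number >> value) result.push_back(value);
    }
    return result;
}
```

ParseTile could then pass the returned containers on to whatever builds the sprites and event list.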
I suggest either writing a proper parser manually (not unlike what the other answer proposes) or using a proper parsing framework, like Boost Spirit.
The advantage of the latter is that you get debuggability, composability, attributes etc. "for free". Here's the simplest example I could think of:
struct TileData
{
std::deque<sf::Vector2i> texturerects;
std::deque<int> events;
};
typedef std::deque<TileData> TileDatas;
template <typename It>
struct parser : qi::grammar<It, TileDatas(), qi::space_type>
{
parser() : parser::base_type(start)
{
using namespace qi;
v2i = (int_ >> '-' >> int_)
[ _val = phx::construct<sf::Vector2i>(_1, _2) ];
tiledata =
(v2i % ',') >> '|' >>
(int_ % ',');
start = *('[' >> tiledata >> ']');
}
private:
qi::rule<It, sf::Vector2i(), qi::space_type> v2i;
qi::rule<It, TileData(), qi::space_type> tiledata;
qi::rule<It, TileDatas(), qi::space_type> start;
};
Adding a bit of code to test this, see it live on http://liveworkspace.org/code/3WM0My$1, output:
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(0, 0), v2i(15, 1), }
events: deque<i> {22, 44, }
}
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(0, 1), }
events: deque<i> {0, }
}
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(4, 0), v2i(10, 1), v2i(3, 1), }
events: deque<i> {0, }
}
Parsed: TileData {
texturerects: deque<N2sf8Vector2iE> {v2i(0, 5), v2i(5, 5), }
events: deque<i> {0, }
}
Full code:
#define BOOST_SPIRIT_USE_PHOENIX_V3
// #define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <deque>
#include <iostream>
#include <sstream>
#include <string>
#include <typeinfo>
namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;
// liveworkspace.org doesn't have SFML
namespace sf { struct Vector2i { int x, y; Vector2i(int ax=0, int ay=0) : x(ax), y(ay) {} }; }
struct TileData
{
std::deque<sf::Vector2i> texturerects;
std::deque<int> events;
};
BOOST_FUSION_ADAPT_STRUCT(TileData,
(std::deque<sf::Vector2i>, texturerects)
(std::deque<int>, events))
typedef std::deque<TileData> TileDatas;
template <typename It>
struct parser : qi::grammar<It, TileDatas(), qi::space_type>
{
parser() : parser::base_type(start)
{
using namespace qi;
v2i = (int_ >> '-' >> int_)
[ _val = phx::construct<sf::Vector2i>(_1, _2) ];
tiledata =
(v2i % ',') >> '|' >>
(int_ % ',');
start = *('[' >> tiledata >> ']');
}
private:
qi::rule<It, sf::Vector2i(), qi::space_type> v2i;
qi::rule<It, TileData(), qi::space_type> tiledata;
qi::rule<It, TileDatas(), qi::space_type> start;
};
typedef boost::spirit::istream_iterator It;
std::ostream& operator<<(std::ostream& os, sf::Vector2i const &v) { return os << "v2i(" << v.x << ", " << v.y << ")"; }
template <typename T> std::ostream& operator<<(std::ostream& os, std::deque<T> const &d) {
os << "deque<" << typeid(T).name() << "> {";
for (auto& t : d) os << t << ", ";
return os << "}";
}
std::ostream& operator<<(std::ostream& os, TileData const& ttd) {
return os << "TileData {\n"
"\ttexturerects: " << ttd.texturerects << "\n"
"\tevents: " << ttd.events << "\n}";
}
int main()
{
parser<It> p;
std::istringstream iss(
"[0-0,15-1|22,44] [0-1|0]\n"
"[4-0,10-1,3-1|0] [0-5,5-5|0]");
It f(iss), l;
TileDatas data;
if (qi::phrase_parse(f,l,p,qi::space,data))
{
for (auto& tile : data)
{
std::cout << "Parsed: " << tile << "\n";
}
}
if (f != l)
{
std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
}
}
template <typename Iterator>
struct parse_grammar
: qi::grammar<Iterator, std::string()>
{
parse_grammar()
: parse_grammar::base_type(start_p, "start_p"){
a_p = ',' > qi::double_;
b_p = *a_p;
start_p = qi::double_ > b_p >> qi::eoi;
}
qi::rule<Iterator, std::string()> a_p;
qi::rule<Iterator, std::string()> b_p;
qi::rule<Iterator, std::string()> start_p;
};
// implementation
std::vector<double> parse(std::istream& input, const std::string& filename)
{
// iterate over stream input
typedef std::istreambuf_iterator<char> base_iterator_type;
base_iterator_type in_begin(input);
// convert input iterator to forward iterator, usable by spirit parser
typedef boost::spirit::multi_pass<base_iterator_type> forward_iterator_type;
forward_iterator_type fwd_begin = boost::spirit::make_default_multi_pass(in_begin);
forward_iterator_type fwd_end;
// prepare output
std::vector<double> output;
// wrap forward iterator with position iterator, to record the position
typedef classic::position_iterator2<forward_iterator_type> pos_iterator_type;
pos_iterator_type position_begin(fwd_begin, fwd_end, filename);
pos_iterator_type position_end;
parse_grammar<pos_iterator_type> gram;
// parse
try
{
qi::phrase_parse(
position_begin, position_end, // iterators over input
gram, // recognize list of doubles
ascii::space); // comment skipper
}
catch(const qi::expectation_failure<pos_iterator_type>& e)
{
const classic::file_position_base<std::string>& pos = e.first.get_position();
std::stringstream msg;
msg <<
"parse error at file " << pos.file <<
" line " << pos.line << " column " << pos.column << std::endl <<
"'" << e.first.get_currentline() << "'" << std::endl <<
" " << "^- here";
throw std::runtime_error(msg.str());
}
// return result
return output;
}
I have the sample code above (adapted from an example on the Boost.Spirit website).
In rule a_p of the grammar I want to use a semantic action that calls a method and passes the iterators to it, something like this:
a_p = ',' > qi::double_[boost::bind(&parse_grammar::doStuff(), this,
boost::ref(position_begin), boost::ref(position_end)];
and if the signature of the method doStuff is like this:
void doStuff(pos_iterator_type const& first, pos_iterator_type const& last);
Any ideas how to do this?
I don't mind how it is done (perhaps with boost::phoenix, though I'm not sure how), as long as the iterators are passed to the method in their current state.
I'm not completely sure why you think you 'need' what you describe. I'm afraid the solution to your actual task might be very simple:
start_p = qi::double_ % ',' > qi::eoi;
However, since the actual question is quite interesting, and the use of position iterators in combination with istreambuf_iterator (rather than just the usual, slower boost::spirit::istream_iterator) has its merit, I'll show you how to do it with the semantic action as well.
For a simple (but rather complete) test main of
int main()
{
std::istringstream iss(
"1, -3.4 ,3.1415926\n"
",+inF,-NaN ,\n"
"2,-.4,4.14e7\n");
data_t parsed = parse(iss, "<inline-test>");
std::cout << "Done, parsed " << parsed.size() << " values ("
<< "min: " << *std::min_element(parsed.begin(), parsed.end()) << ", "
<< "max: " << *std::max_element(parsed.begin(), parsed.end()) << ")\n";
}
The output with the semantic action now becomes:
debug ('start_p') at <inline-test>:1:[1..2] '1' = 1
debug ('start_p') at <inline-test>:1:[4..8] '-3.4' = -3.4
debug ('start_p') at <inline-test>:1:[10..19] '3.1415926' = 3.14159
debug ('start_p') at <inline-test>:2:[2..6] '+inF' = inf
debug ('start_p') at <inline-test>:2:[7..11] '-NaN' = -nan
debug ('start_p') at <inline-test>:3:[1..2] '2' = 2
debug ('start_p') at <inline-test>:3:[3..6] '-.4' = -0.4
debug ('start_p') at <inline-test>:3:[7..13] '4.14e7' = 4.14e+07
Done, parsed 8 values (min: -3.4, max: inf)
See it live at http://liveworkspace.org/code/8a874ef3...
Note how it
demonstrates access to the name of the actual parser instance ('start_p')
demonstrates access to the full source iterator range
shows how to do specialized processing inside the semantic action
I still suggest using qi::double_ to parse the raw input, because it is the only thing I know that easily handles all cases (see test data and this other question: Is it possible to read infinity or NaN values using input streams?)
demonstrates parsing the actual data into the vector efficiently by displaying statistics of the parsed values
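As a rough illustration of the bookkeeping behind the line and column numbers in that debug output, here is a plain-C++ sketch. Position and locate are hypothetical names, not Spirit APIs, and the sketch assumes 1-based lines and columns with every character (tabs included) counting as one column, which may differ from what position_iterator2 does with tabs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Position {
    std::size_t line;
    std::size_t column;
};

// Derive the 1-based line and column of the character at `offset` by
// scanning everything that precedes it and counting newlines.
Position locate(const std::string& text, std::size_t offset) {
    Position pos{1, 1};
    for (std::size_t i = 0; i < offset && i < text.size(); ++i) {
        if (text[i] == '\n') {
            ++pos.line;        // a newline starts the next line...
            pos.column = 1;    // ...at its first column
        } else {
            ++pos.column;
        }
    }
    return pos;
}
```

For the test input above, the '-' of "-3.4" sits at offset 3, which this sketch maps to line 1, column 4, and the '+' of "+inF" at offset 20 maps to line 2, column 2, in line with the ranges shown in the debug output.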
Full Code
Here is the full code for future reference:
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_multi_pass.hpp>
#include <boost/spirit/include/classic_position_iterator.hpp>
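#include <boost/phoenix/function/adapt_function.hpp>
#include <algorithm>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>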
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
namespace classic = boost::spirit::classic;
namespace ascii = boost::spirit::ascii;
typedef std::vector<double> data_t;
///////// USING A FREE FUNCTION
//
template <typename Grammar, typename Range>
double doStuff_(Grammar &grammar, Range pos_range)
{
// for efficiency, cache adhoc grammar:
static const qi::rule <typename Range::iterator, double()> r_double = qi::double_;
static const qi::grammar<typename Range::iterator, double()> g_double(r_double); // caching just the rule may be enough, actually
double value = 0;
qi::parse(pos_range.begin(), pos_range.end(), g_double, value);
std::cout << "debug ('" << grammar.name() << "') at "
<< pos_range.begin().get_position().file << ":"
<< pos_range.begin().get_position().line << ":["
<< pos_range.begin().get_position().column << ".."
<< pos_range.end ().get_position().column << "]\t"
<< "'" << std::string(pos_range.begin(),pos_range.end()) << "'\t = "
<< value
<< '\n';
return value;
}
BOOST_PHOENIX_ADAPT_FUNCTION(double, doStuff, doStuff_, 2)
template <typename Iterator, typename Skipper>
struct parse_grammar : qi::grammar<Iterator, data_t(), Skipper>
{
parse_grammar()
: parse_grammar::base_type(start_p, "start_p")
{
using qi::raw;
using qi::double_;
using qi::_1;
using qi::_val;
using qi::eoi;
using phx::push_back;
value_p = raw [ double_ ] [ _val = doStuff(phx::ref(*this), _1) ];
start_p = value_p % ',' > eoi;
// // To use without the semantic action (more efficient):
// start_p = double_ % ',' >> eoi;
}
qi::rule<Iterator, data_t::value_type(), Skipper> value_p;
qi::rule<Iterator, data_t(), Skipper> start_p;
};
// implementation
data_t parse(std::istream& input, const std::string& filename)
{
// iterate over stream input
typedef std::istreambuf_iterator<char> base_iterator_type;
base_iterator_type in_begin(input);
// convert input iterator to forward iterator, usable by spirit parser
typedef boost::spirit::multi_pass<base_iterator_type> forward_iterator_type;
forward_iterator_type fwd_begin = boost::spirit::make_default_multi_pass(in_begin);
forward_iterator_type fwd_end;
// wrap forward iterator with position iterator, to record the position
typedef classic::position_iterator2<forward_iterator_type> pos_iterator_type;
pos_iterator_type position_begin(fwd_begin, fwd_end, filename);
pos_iterator_type position_end;
parse_grammar<pos_iterator_type, ascii::space_type> gram;
data_t output;
// parse
try
{
if (!qi::phrase_parse(
position_begin, position_end, // iterators over input
gram, // recognize list of doubles
ascii::space, // whitespace skipper
output) // <-- attribute reference
)
{
std::cerr << "Parse failed at "
<< position_begin.get_position().file << ":"
<< position_begin.get_position().line << ":"
<< position_begin.get_position().column << "\n";
}
}
catch(const qi::expectation_failure<pos_iterator_type>& e)
{
const classic::file_position_base<std::string>& pos = e.first.get_position();
std::stringstream msg;
msg << "parse error at file " << pos.file
<< " line " << pos.line
<< " column " << pos.column
<< "\n\t'" << e.first.get_currentline()
<< "'\n\t " << std::string(pos.column, ' ') << "^-- here";
throw std::runtime_error(msg.str());
}
return output;
}
int main()
{
std::istringstream iss(
"1, -3.4 ,3.1415926\n"
",+inF,-NaN ,\n"
"2,-.4,4.14e7\n");
data_t parsed = parse(iss, "<inline-test>");
std::cout << "Done, parsed " << parsed.size() << " values ("
<< "min: " << *std::min_element(parsed.begin(), parsed.end()) << ", "
<< "max: " << *std::max_element(parsed.begin(), parsed.end()) << ")\n";
}