Serializing a FlatBuffer object to JSON without it's schema file - c++

I've been working with FlatBuffers as a solution for various things in my project, one of them specifically being JSON support. However, while FB natively supports JSON generation, the documentation for flatbuffers is poor, and the process is somewhat cumbersome. Right now, I am working in the Object->JSON direction. The issue I am having doesn't really arise the other way around (I think).
I currently have JSON generation working per an example I found here (line 630, JsonEnumsTest()) - by parsing a .fbs file into a flattbuffers::Parser, building and packaging my flatbuffer object, then running GenerateText() to generate a JSON string. The code I have is simpler than the example in test.cpp, and looks vaguely like this:
bool MyFBSchemaWrapper::asJson(std::string& jsonOutput)
{
//**This is the section I don't like having to do
std::string schemaFile;
if (flatbuffers::LoadFile((std::string(getenv("FBS_FILE_PATH")) + "MyFBSchema.fbs").c_str(), false, &schemaFile))
{
flatbuffers::Parser parser;
const char *includePaths[] = { getenv("FBS_FILE_PATH");
parser.Parse(schemaFile.c_str(), includePaths);
//**End bad section
parser.opts.strict_json = true;
flatbuffers::FlatBufferBuilder fbBuilder;
auto testItem1 = fbBuilder.CreateString("test1");
auto testItem2 = fbBuilder.CreateString("test2");
MyFBSchemaBuilder myBuilder(fbBuilder);
myBuilder.add_item1(testItem1);
myBuilder.add_item2(testItem2);
FinishMyFBSchemaBuffer(fbBuilder, myBuilder.finish());
auto result = GenerateText(parser, fbBuilder.GetBufferPointer(), &jsonOutput);
return true;
}
return false;
}
Here's my issue: I'd like to avoid having to include the .fbs files to set up my Parser. I don't want to clutter an already large monolithic program by adding even more random folders, directories, environment variables, etc. I'd like to be able to generate JSON from the compiled FlatBuffer schemas, and not have to search for a file to do so.
Is there a way for me to avoid having to read back in my .fbs schemas into the parser? My intuition is pointing to no, but the lack of documentation and community support on the topic of FlatBuffers & JSON is telling me there might be a way. I'm hoping that there's a way to use the already generated MyFBSchema_generated.h to create a JSON string.

Yes, see Mini Reflection in the documentation: http://google.github.io/flatbuffers/flatbuffers_guide_use_cpp.html

Related

Mocking proto.Message in Go

I wrote a function that takes a list of proto.Message objects. Looking at the documentation, it seems like proto.Message wraps protoreflect.ProtoMessage which contains a single function, ProtoReflect() Message. Looking at the documentation for Message, it implements a number of other functions that return types referenced by the protoreflect package.
It seems that attempting to create a mock proto.Message would be a lot more work that it's worth but I don't want to go through the whole process of creating a protobuf file, compiling it and referencing it just for unit testing.
Is there another way I can create a mock proto.Message object?
After looking into this further, I decided that the best thing to do would be to create an actual protobuf message:
syntax = "proto3";
package my.package;
option go_package = "path/to/my/package"; // golang
message TestRecord {
string key = 1;
string value = 2;
}
and compile it into a Golang file and then modify it for my own purposes:
$ protoc --go_out=./gopb/ --go_opt=paths=source_relative *.proto
Once the test file has been created, I can delete it or save it as a record.

How to get the next enum value from an enum in protobuf?

I have a protobuf message with non-consecutive enum values something like this:
message Information {
enum Versions {
version1 = 0;
version2 = 1;
version3 = 10;
version4 = 20;
version5 = 30;
}
}
I want to have a C++ function GetNextVersion() which takes in one enum version and gives the next version as output. For eg: GetNextVersion(Information::version4) should give Information::version5 as output. Is there any inbuilt and easy method to do this?
You can use protobuf's reflection to achieve the goal:
Information::Versions GetNextVersion(Information::Versions ver) {
const auto *desc = Information::Versions_descriptor();
auto cur_idx = desc->FindValueByNumber(ver)->index();
if (cur_idx >= desc->value_count() - 1) {
throw runtime_error("no next enum");
}
auto next_idx = cur_idx + 1;
return Information::Versions(desc->value(next_idx)->number());
}
int main() {
try {
auto ver = Information::version1;
while (true) {
cout << ver << endl;
ver = GetNextVersion(ver);
}
} catch (const runtime_error &e) {
cout << e.what() << endl;
}
return 0;
}
Is there any inbuilt and easy method to do this?
I see no easy method to get that.
But I can suggest metaprogramming approaches (at least on Linux) with C++ code generation.
You could, assuming you have access to the source code of protobuf-c :
write some GNU gawk script to parse that C++ code and generate the C++ code of GetNextVersion
perhaps write some GNU sed (or a Python one) script doing the same.
write some GCC plugin and use it to parse that C++ code and generate the C++ code of GetNextVersion
write some GNU emacs code doing the same.
wait a few months and (in spring 2021) use Bismon. I am developing it, so contact me by email
extend and adapt the Clang static analyzer for your needs.
extend and adapt the SWIG tool for your needs.
extend and adapt the RPGGEN tool for your needs.
use GNU bison or ANTLR to parse C++ code, or design your domain specific language with some documented EBNF syntax and write some code generator with them.
You could also keep the description of enum Versions in some database (sqlite, PostGreSQL, etc...) or some JSON file or some CSV file (or an XML one, using XSLT or libexpat) and emit it (for protobuf) and the source code of GetNextVersion using some Python script, or GNU m4, or GPP.
You could write a GNU guile script or some rules for CLIPS generating some C++ code with your protobuf description.
In a few months (spring 2021), the RefPerSys system might be helpful. Before that, you could contribute and extend it and reuse it for your needs.
A pragmatic approach could be to add a comment in your protobuf declaration to remind you of editing another file when you need to change the protobuf message and protocol.
No, there isn't.
You define your own data type, so you also must define the operators for it.
So, your GetNextVersion()method contains that knowledge how to increment the version number. If you had decided to use an integer, then the compiler knows already how to increment that, but you wanted something special and that is the price you have to pay for it.

Use Arrow schema with parquet StreamWriter

I am attempting to use the C++ StreamWriter class provided by Apache Arrow.
The only example of using StreamWriter uses the low-level Parquet API i.e.
parquet::schema::NodeVector fields;
fields.push_back(parquet::schema::PrimitiveNode::Make(
"string_field", parquet::Repetition::OPTIONAL, parquet::Type::BYTE_ARRAY,
parquet::ConvertedType::UTF8));
fields.push_back(parquet::schema::PrimitiveNode::Make(
"char_field", parquet::Repetition::REQUIRED, parquet::Type::FIXED_LEN_BYTE_ARRAY,
parquet::ConvertedType::NONE, 1));
auto node = std::static_pointer_cast<parquet::schema::GroupNode>(
parquet::schema::GroupNode::Make("schema", parquet::Repetition::REQUIRED, fields));
The end result is a std::shared_ptr<parquet::schema::GroupNode> that can then be passed into StreamWriter.
Is it possible to construct and use "high-level" Arrow schemas with StreamWriter? They are supported when using the WriteTable function (non-streaming) but I've found no examples of its use with the streaming API.
I, of course, can resort to using the low-level API but it is extremely verbose when creating large and complicated schema and I would prefer (but not need) to use the high-level Arrow schema mechanisms.
For example,
std::shared_ptr<arrow::io::FileOutputStream> outfile_;
PARQUET_ASSIGN_OR_THROW(outfile_, arrow::io::FileOutputStream::Open("test.parquet"));
// construct an arrow schema
auto schema = arrow::schema({arrow::field("field1", arrow::int64()),
arrow::field("field2", arrow::float64()),
arrow::field("field3", arrow::float64())});
// build the writer properties
parquet::WriterProperties::Builder builder;
auto properties = builder.build()
// my current best attempt at converting the Arrow schema to a Parquet schema
std::shared_ptr<parquet::SchemaDescriptor> parquet_schema;
parquet::arrow::ToParquetSchema(schema.get(), *properties, &parquet_schema); // parquet_schema is now populated
// and now I try and build the writer - this fails
auto writer = parquet::ParquetFileWriter::Open(outfile_, parquet_schema->group_node(), properties);
The last line fails because parquet_schema->group_node() (which is the only way I'm aware of to get access to the GroupNode for the schema) returns a const GroupNode* whereas ParquetFileWriter::Open) needs a std::shared_ptr<GroupNode>.
I'm not sure casting-away the constness of the returned group node and forcing it into the ::Open() call is an officially supported (or correct) usage of StreamWriter.
Is what I want to do possible?
It seems that you need to use low-level api for StreamWriter.
a very tricky way:
auto writer = parquet::ParquetFileWriter::Open(outfile_, std::shared_ptr<parquet::schema::GroupNode>(const_cast<parquet::schema::GroupNode *>(parquet_schema->group_node())), properties);
you may need to converter schema manually.
source code cpp/src/parquet/arrow/schema.cc may help you.
PARQUET_EXPORT
::arrow::Status ToParquetSchema(const ::arrow::Schema* arrow_schema,
const WriterProperties& properties,
std::shared_ptr<SchemaDescriptor>* out);

C/CPP version of BeautifulSoup especially at handling malformed HTML

Are there any recommendations for a c/cpp lib which can be used to easily (as much as that possible) parse / iterate / manipulate HTML streams/files assuming some might be malformed, i.e. tags not closed etc.
BeautifulSoup
HTMLparser from Libxml is easy to use (simple tutorial below) and works great even on malformed HTML.
Edit : Original blog post is no longer accessible, so I've copy pasted the content here.
Parsing (X)HTML in C is often seen as a difficult task.
It's true that C isn't the easiest language to use to develop a parser.
Fortunately, libxml2's HTMLParser module come to the rescue. So, as promised, here's a small tutorial explaining how to use libxml2's HTMLParser to parse (X)HTML.
First, you need to create a parser context. You have many functions for doing that, depending on how you want to feed data to the parser. I'll use htmlCreatePushParserCtxt(), since it work with memory buffers.
htmlParserCtxtPtr parser = htmlCreatePushParserCtxt(NULL, NULL, NULL, 0, NULL, 0);
Then, you can set many options on that parser context.
htmlCtxtUseOptions(parser, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
We are now ready to parse an (X)HTML document.
// char * data : buffer containing part of the web page
// int len : number of bytes in data
// Last argument is 0 if the web page isn't complete, and 1 for the final call.
htmlParseChunk(parser, data, len, 0);
Once you've pushed it all your data, you can call that function again with a NULL buffer and 1 as the last argument. This will ensure that the parser have processed everything.
Finally, how to get the data you parsed? That's easier than it seems. You simply have to walk the XML tree created.
void walkTree(xmlNode * a_node)
{
xmlNode *cur_node = NULL;
xmlAttr *cur_attr = NULL;
for (cur_node = a_node; cur_node; cur_node = cur_node->next)
{
// do something with that node information, like... printing the tag's name and attributes
printf("Got tag : %s\n", cur_node->name)
for (cur_attr = cur_node->properties; cur_attr; cur_attr = cur_attr->next)
{
printf(" ->; with attribute : %s\n", cur_attr->name);
}
walkTree(cur_node->children);
}
}
walkTree(xmlDocGetRootElement(parser->myDoc));
And that's it! Isn't that simple enough? From there, you can do any kind of stuff, like finding all referenced images (by looking at img tag) and fetching them, or anything you can think of doing.
Also, you should know that you can walk the XML tree anytime, even if you haven't parsed the whole (X)HTML document yet.
If you have to parse (X)HTML in C, you should use libxml2's HTMLParser. It will save you a lot of time.
you could use Google gumbo-parser
Gumbo is an implementation of the HTML5 parsing algorithm implemented as a pure C99 library with no outside dependencies. It's designed to serve as a building block for other tools and libraries such as linters, validators, templating languages, and refactoring and analysis tools.
#include "gumbo.h"
int main() {
GumboOutput* output = gumbo_parse("<h1>Hello, World!</h1>");
// Do stuff with output->root
gumbo_destroy_output(&kGumboDefaultOptions, output);
}
There's also a C++ binding for this library gumbo-query
A C++ library that provides jQuery-like selectors for Google's Gumbo-Parser.
#include <iostream>
#include <string>
#include "Document.h"
#include "Node.h"
int main(int argc, char * argv[])
{
std::string page("<h1><a>some link</a></h1>");
CDocument doc;
doc.parse(page.c_str());
CSelection c = doc.find("h1 a");
std::cout << c.nodeAt(0).text() << std::endl; // some link
return 0;
}
I've only used libCurl C++ for this type of thing but found it to be pretty good and useable. Don't know how it would cope with broken HTML though.
Try using SIP and run BeautifulSoup on it might help.
More details on below link thread. OpenFrameworks + Python

Making files with map or scenario information, such as what resources to load, objects, locations, events

I'm making a simple graphics engine in C++, using Visual C++ and DirectX, and I'm testing out different map layouts.
Currently, I construct "maps" by simply making a C++ source file and start writing:
SHADOWENGINE ShadowEngine(&settings);
SPRITE_SETTINGS sset;
MODEL_SETTINGS mset;
sset.Name = "Sprite1";
sset.Pivot = TOPLEFT;
sset.Source = "sprite1.png";
sset.Type = STATIC;
sset.Movable = true;
sset.SoundSet = "sprite1.wav"
ShadowEngine->Sprites->Load(sset);
sset.Name = "Sprite2"
sset.Source = "sprite2.png";
sset.Parent = "Sprite1";
sset.Type = ANIMATED;
sset.Frames = 16;
sset.Interval = 1000;
sset.Position = D3DXVECTOR(0.0f, (ShadowEngine->Resolution->Height/2), 0.0f);
ShadowEngine->Sprites->Load(sset);
mset.Source = "character.sx";
mset.Collision = false;
mset.Type = DYNAMIC;
ShadowEngine->Models->Load(mset);
//Etc..
What I'd like to be able to do, is to create map files that are instead loaded into the engine, without having to write them into the executable. That way, I can make changes to the maps without having to recompile every damn time.
SHADOWENGINE ShadowEngine(&settings);
ShadowEngine->InitializeMap("Map1.sm");
The only way I can think of is to make it read the file as text and then just parse the information, but it sounds like such a hassle.
Am I thinking the wrong way?
What should I do?
Wouldn't mind an explanation on how others do it, like Warcraft III, Starcraft, Age of Empires, Heroes of Might and Magic...
Would really appreciate some help on this one.
You are not thinking the wrong way, loading your map data is definitely desirable. The most common prebuilt solutions are Protocol Buffers and Lua. If you don't already know Lua I would use protocol buffers, as it directly solves your problem, whereas Lua is a scripting language which is flexible enough to do what you need done.
Some people write their data as XML, but this is only a partial solution as XML is just a markup language. After loading the XML you'll have a DOM tree to parse.
Google's CPP Protobuf Tutorial