I'd like to communicate complex data from a C++ service to a Lua application. This communication takes place over the network. For simplicity and speed in the Lua application, I would prefer to send Lua table literals directly (no need for a separate parser) instead of XML, JSON, YAML, or the like.
While there are C++ libraries that write JSON, I cannot find an existing C++ library for emitting serialized Lua. My idea, then, is to use an existing JSON library for C++ and then convert the resulting string to Lua.
So, for example, I'd like to convert this string:
{
"hello":42,
"array":[1,2,{"more":false},null,true],
"worst":"still [null]: got it?"
}
into this string:
{
["hello"]=42,
["array"]={1,2,{["more"]=false},nil,true},
["worst"]="still [null]: got it?"
}
A naive replace_all converting : to =, [ and ] to { and }, and null to nil will destroy content inside strings. How can I perform this conversion?
To avoid an XY problem, I have included my end motivation at the top and in the title, in case JSON-to-Lua string conversion is the wrong choice.
I would write that Lua-format serialization library myself. You could choose a free-software JSON library for C++ (e.g. jsoncpp or libjson) and adapt its code to your Lua format quite easily.
Of course you should obey that library's license, and I strongly suggest making your Lua-format serialization library free software itself, e.g. on GitHub and/or Freecode and/or SourceForge...
The point is that JSON (and hopefully your Lua format) is simple enough that parsing and printing it are quite easy. Adapting an existing library to your format is probably simpler, and certainly faster, than "post-processing" its output.
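To make that concrete, here is a minimal, untested sketch of such a writer on top of jsoncpp's Json::Value (the Json::* names are jsoncpp's real API; the escaping and number formatting are deliberately simplified):

#include <json/json.h>   // jsoncpp
#include <ostream>

// Recursively emit a Json::Value as a Lua table constructor.
// Caveats: string escaping only handles quotes and backslashes, and all
// numbers go through asDouble(); a real implementation should refine both.
static void writeLua(const Json::Value& v, std::ostream& out) {
    if (v.isNull()) {
        out << "nil";
    } else if (v.isBool()) {
        out << (v.asBool() ? "true" : "false");
    } else if (v.isNumeric()) {
        out << v.asDouble();
    } else if (v.isString()) {
        out << '"';
        for (char c : v.asString()) {
            if (c == '"' || c == '\\') out << '\\';
            out << c;
        }
        out << '"';
    } else if (v.isArray()) {
        out << '{';
        for (Json::ArrayIndex i = 0; i < v.size(); ++i) {
            if (i) out << ',';
            writeLua(v[i], out);        // JSON null becomes nil, as in the example
        }
        out << '}';
    } else {                            // object
        out << '{';
        bool first = true;
        for (const std::string& key : v.getMemberNames()) {
            if (!first) out << ',';
            first = false;
            out << "[\"" << key << "\"]=";
            writeLua(v[key], out);
        }
        out << '}';
    }
}

Driving this from jsoncpp's reader (or building the Json::Value directly in C++ and skipping JSON entirely) produces the Lua text in one pass, with no fragile string replacement.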
Although I'm not locating it readily today, I recall that there was discussion a year or so ago on the Lua list of the merits of defining a limited subset of Lua table literals analogous to JSON, dubbed "LSON" for the sake of discussion. IIRC, the consensus developed that there wasn't enough benefit to be had over just using an established standard lightweight format like JSON, but I know some experiments were conducted.
This GitHub Gist for lson.lua demonstrates a simple LSON writer and reader in pure Lua. The writer could be transformed to C or C++ with only moderate effort based on that code. A key feature of that code is that it provides some protection against circular references, and against data types that can be stored in tables but have no reasonable mechanism for being written as source code (userdata and thread are both highly problematic to serialize in any form). Of course, for data originating as plain old data in C with only lightweight structure, you won't have any of the problematic data types anyway. If serializing lists or trees from C, circular references may be impossible by construction; if not, you will need to deal with them yourself.
Note that using Lua's own parser potentially introduces security issues. The most glaring issue is that just writing assert(loadstring('return '..Input))() allows the imported text access to your entire current environment. While there is some protection from applying the return keyword outside of the input text, that still won't prevent clever use of any functions that can be called from an expression. For best safety, you will want to read about sandboxes and possibly even apply some clever tricks to restrict the compiled bytecode before blindly executing it.
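As a rough example of that sandboxing idea with the Lua 5.1 C API (in 5.2+ you would set the chunk's _ENV upvalue via lua_setupvalue instead of lua_setfenv), a minimal sketch:

extern "C" {
#include <lua.h>
#include <lauxlib.h>
}
#include <string>

// Load untrusted table-literal text and run it with an empty environment,
// so it cannot reach print, os, io, or anything else in the global table.
// Note this does not guard against CPU or memory exhaustion in the chunk.
bool loadUntrustedTable(lua_State* L, const std::string& payload) {
    std::string chunk = "return " + payload;
    if (luaL_loadstring(L, chunk.c_str()) != 0)
        return false;                   // syntax error; message left on the stack
    lua_newtable(L);                    // fresh, empty environment table
    lua_setfenv(L, -2);                 // sandbox the loaded chunk (Lua 5.1)
    return lua_pcall(L, 0, 1, 0) == 0;  // on success the result table is on the stack
}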
The security issues may be a strong argument in favor of qualifying and using a JSON parser. Even JavaScript applications often prefer to use JSON parsers rather than just letting the JavaScript engine execute untrusted content.
Is there any way in C++ or Java or Python that would allow me to save the state of my program, no questions asked? For example, I've spent an hour learning how to save a tree-like structure into a file. Very educational, but I feel I could just do:
saveState(file);
And the "file" would contain whole memory my program uses. Just like operating system's "hibernate" or "suspend-to-disk" feature. I know about boost serialization, this is probably not what I'm looking for.
What you most likely want is what we call serialization or object marshalling. There are a whole butt load of academic problems with data/object serialization that you can easily google.
That being said, given the right (probably very platform-specific) library you could take a true snapshot of your running program, similar to what an OS-specific hibernate does. Here is an SO answer for doing that on Linux: https://stackoverflow.com/a/12190830/318174
To do the above snapshotting, though, you will most likely need an external process separate from the process you want to save. I highly recommend you don't do that. Instead, read up in your language of choice (btw, welcome to SO; don't tag every language... that pisses people off) on how to do serialization or object marshalling... hint: most people these days pick JSON.
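For example, in C++ one of many reasonable options is a JSON library such as nlohmann/json; a minimal sketch of saving a small tree-like structure (the Node type is made up for illustration):

#include <nlohmann/json.hpp>   // https://github.com/nlohmann/json (one option among many)
#include <fstream>
#include <string>
#include <vector>

struct Node {
    std::string name;
    std::vector<Node> children;
};

// Convert the tree to JSON recursively.
nlohmann::json toJson(const Node& n) {
    nlohmann::json j;
    j["name"] = n.name;
    for (const Node& c : n.children)
        j["children"].push_back(toJson(c));
    return j;
}

// Rebuild the tree from JSON.
Node fromJson(const nlohmann::json& j) {
    Node n;
    n.name = j.at("name").get<std::string>();
    if (j.contains("children"))
        for (const auto& c : j["children"])
            n.children.push_back(fromJson(c));
    return n;
}

void saveState(const Node& root, const std::string& path) {
    std::ofstream out(path);
    out << toJson(root).dump(2);   // pretty-printed JSON on disk
}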
I think that what you describe would be a feature that few people would actually want to use for a real system. Usually you want to save something so it can be transmitted, or so you can stop running the program, or guard against the possibility that the program quits (or power fails).
In most production systems one wants to make the writes to disk small and incremental so that the system can remain responsive, and writing inconsistent data can be avoided. Writing ALL memory to disk on a regular basis would probably result in lots of non-responsive time. You would need to lock the entire system to avoid inconsistent state.
Writing your own persistence is tedious and error-prone, however, so you may find this SO question of interest: Persisting graph data (Java)
There are a couple of frameworks for this. Check out Google Protocol Buffers if you need support for Java, Python, and C++: https://developers.google.com/protocol-buffers/. I've used it in some projects and it works well.
There's also Thrift (originally from Facebook): http://thrift.apache.org/. I don't have any experience with it, though.
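To give a flavour of the Protocol Buffers workflow (the Node message here is a made-up example): you describe the data in a .proto file, run protoc to generate C++ classes, and then use the generated serialization API:

// state.proto (hypothetical schema), compiled with:  protoc --cpp_out=. state.proto
//
//   syntax = "proto3";
//   message Node {
//     string name = 1;
//     repeated Node children = 2;
//   }

#include "state.pb.h"   // header generated by protoc from the schema above
#include <fstream>

void saveNode(const Node& node, const char* path) {
    std::ofstream out(path, std::ios::binary);
    node.SerializeToOstream(&out);    // generated message API
}

Node loadNode(const char* path) {
    std::ifstream in(path, std::ios::binary);
    Node node;
    node.ParseFromIstream(&in);
    return node;
}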
Another option is what @QuentinUK suggests: use a class that inherits from something streamable and/or write streamable operators/functions.
I'd use a framework.
Here's your problem:
http://en.wikipedia.org/wiki/Address_space_layout_randomization
Back in ancient history (16-bit DOS programs with extenders), compilers used to support "based" pointers, which stored relative addresses. These were safe to serialize en masse, and applications did so, saving both code and data; the serialized modules were called "overlays".
Today, you'd need based pointer support in your toolchain (resulting in every pointer access requiring an extra adjustment), or else to go through all the data, distinguishing the pointers from the other data (how?) and adjusting them to their new storage location, in case the OS already loaded some library at the same address your old program had used for its heap. In modern "managed" environments, where pointers already have to be identified for the garbage collector, this is feasible even if not commonly done. In native code, it's very difficult, although that metadata is created to enable relocation of shared libraries.
So instead people end up walking their entire data structures manually, and converting object links (pointers) into something that can be restored on the other end, even though the object has a new address (again, because the old address may have been used for a shared library).
Note that many processors have features to support based addressing... and that since based addressing is no longer common, compilers went ahead and used those pointer arithmetic features to speed up user code.
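The usual manual technique is pointer swizzling: replace each object link with a stable identifier (such as an index into a node table) on the way out, and rebuild the pointers on the way in. A minimal sketch of the writing side, with made-up types:

#include <cstdint>
#include <unordered_map>
#include <vector>

struct Node {
    int   value;
    Node* next;     // object link that cannot be stored directly
};

// On-disk form: the pointer is replaced by an index ("swizzled").
struct FlatNode {
    int      value;
    uint32_t next;  // index into the flattened array, or UINT32_MAX for null
};

std::vector<FlatNode> flatten(const std::vector<Node*>& nodes) {
    std::unordered_map<const Node*, uint32_t> index;
    for (uint32_t i = 0; i < nodes.size(); ++i)
        index[nodes[i]] = i;

    std::vector<FlatNode> out;
    for (const Node* n : nodes)
        out.push_back({n->value,
                       n->next ? index.at(n->next) : UINT32_MAX});
    return out;     // safe to write to disk; addresses no longer matter
}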
Yes, derive objects from a streamable class and add the streaming functions. Then you can stream everything to disk. You will need a library for this such as MFC.
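Without MFC, the same idea can be sketched with plain iostreams (a minimal illustration, not a full framework; the Player type is made up):

#include <iostream>
#include <string>

struct Player {
    std::string name;
    int score = 0;
};

// Streaming functions: write fields in a fixed order, read them back in
// the same order. A real framework also needs versioning and escaping.
std::ostream& operator<<(std::ostream& os, const Player& p) {
    return os << p.name << '\n' << p.score << '\n';
}

std::istream& operator>>(std::istream& is, Player& p) {
    std::getline(is, p.name);
    is >> p.score;
    is.ignore();    // skip the trailing newline so records can be chained
    return is;
}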
We have a CORBA implementation that autogenerates Java and C++ stubs for us. Because the CORBA-generated code is difficult to work with, we need to write wrappers/helpers around the CORBA code. So we have a 2-step code generation process (yes, I know this is bad):
CORBA IDL -> annoying CORBA-generated code -> useful wrappers/helper functions
Using Java's reflection, I can inspect the CORBA-generated code and use that to generate additional code. However, because C++ doesn't have reflection, I am not sure how to do this on the C++ side. Should I use a C++ parser? C++ templates?
TLDR: How to generate C++ code using generated C++ code as input?
Have you considered taking a step back and using the IDL as the source for a custom code generator? Probably you have some wrapper code that hides things like duplicate, var, ptr, etc. We have a Ruby-based CORBA IDL compiler that currently generates Ruby and C++ code. It could be extended with a custom generator; see https://www.remedy.nl for RIDL and R2CORBA.
Another option would be to check out the IDL-to-C++11 language mapping; more details at https://www.taox11.org. This newer language mapping is much easier to use and works with standard types and STL containers.
GCC XML could help in recovering the interface.
I'm using it to write a Prolog foreign interface for OpenGL and Horde3D rendering engine.
The interfaces I'm interested in are limited to C, but GCC XML handles C++ as well.
GCC XML parses the source code interface and emits an XML AST. Then, with an XML library, it's fairly easy to extract the requested info. One caveat is the loss of macro symbols: AFAIK only the values survive the parse. As an example, here is (part of) the Prolog code used to generate the FLI:
make_funcs(NameChange, Xml, FileName, Id) :-
index_id(Xml, Indexed),
findall(Name:Returns:ArgTypes,
(xpath(Xml, //'Function'(#file = Id, #name = Name, #returns = ReturnsId), Function),
typeid_indexed(Indexed, ReturnsId, Returns),
findall(Arg:Type, (xpath(Function, //'Argument'(#name = Arg, #type = TypeId), _),
typeid_indexed(Indexed, TypeId, Type)), ArgTypes)
),
AllFuncs),
length(AllFuncs, LAllFuncs),
writeln(FileName:LAllFuncs),
fat('prolog/h3dplfi/~s.cpp', [FileName], Cpp),
open(Cpp, write, Stream),
maplist(\X^((X = K-A -> true ; K = X, A = []), format(Stream, K, A), nl(Stream)),
['#include "swi-uty.h"',
'#include <~#>'-[call(NameChange, FileName)]
]),
forall(member(F, AllFuncs), make_func(Stream, F)),
close(Stream).
xpath (you guessed it) is the SWI-Prolog library that makes the analysis simpler...
If you want to reliably process C++ source code, you need a program transformation tool that understands C++ syntax and semantics, can parse C++ code, transform the parsed representation, and regenerate valid C++ code (including the original comments). Such a tool provides in effect arbitrary metaprogramming by operating outside the language, so it is not limited by the "reflection" or "metaprogramming" facilities built into the language.
Our DMS Software Reengineering Toolkit with its C++ Front End can do this.
It has been used on a couple of automated C++ transformation tasks, both (coincidentally) related to CORBA-based activities. The first involved reshaping interfaces for a proprietary distributed system into CORBA-compatible facets. The second reshaped a large CORBA-based application in the face of IDL changes; such changes in effect cause code to be moved around and cause signature changes. You can find technical papers at the web site that describe the first activity; the second was done for a major defense contractor.
Take a look at the Clang compiler. Aside from being a standalone compiler, it is also intended to be used as a library in situations like the one you describe. It will provide you with a parse tree on which you can do your analysis and transformations.
I'm looking to get an AST for C++ that I can then parse with an external program. What programs are out there that are good for generating an AST for C++? I don't care what language it is implemented in or the output format (so long as it is readily parseable).
My overall goal is to transform a C++ unit test bed to its corresponding C# wrapper test bed.
You can use Clang and especially libclang to parse C++ code. It's a very high-quality, hand-written library for lexing, parsing and compiling C++ code, but it can also generate an AST.
Clang also supports C, Objective-C and Objective-C++. Clang itself is written in C++.
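For example, a minimal libclang program that parses a file and dumps its AST cursors (error handling omitted; pass the file name and any compiler flags on the command line):

#include <clang-c/Index.h>   // link with -lclang
#include <cstdint>
#include <cstdio>

// Print every cursor's kind and spelling, indented by depth in the AST.
static CXChildVisitResult dump(CXCursor c, CXCursor, CXClientData data) {
    std::intptr_t depth = (std::intptr_t)data;
    CXString kind = clang_getCursorKindSpelling(clang_getCursorKind(c));
    CXString name = clang_getCursorSpelling(c);
    std::printf("%*s%s %s\n", 2 * (int)depth, "",
                clang_getCString(kind), clang_getCString(name));
    clang_disposeString(kind);
    clang_disposeString(name);
    clang_visitChildren(c, dump, (CXClientData)(depth + 1));
    return CXChildVisit_Continue;   // we recurse manually to track depth
}

int main(int argc, char** argv) {
    CXIndex index = clang_createIndex(0, 0);
    CXTranslationUnit tu = clang_parseTranslationUnit(
        index, argv[1], argv + 2, argc - 2, nullptr, 0, CXTranslationUnit_None);
    clang_visitChildren(clang_getTranslationUnitCursor(tu), dump,
                        (CXClientData)(std::intptr_t)0);
    clang_disposeTranslationUnit(tu);
    clang_disposeIndex(index);
}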
Actually, GCC will emit the AST at any stage in the pipeline that interests you, including the GENERIC and GIMPLE forms. Check out the (plethora of) command-line switches beginning with -fdump-, e.g. -fdump-tree-original-raw
This is one of the easier (…) ways to work, as you can use it on arbitrary code; just pass the appropriate CFLAGS or CXXFLAGS into most Makefiles:
make CXXFLAGS=-fdump-tree-original-raw all
… and you get “the works.”
Updated: Saw this neat little graphing system based on GCC ASTs while checking my flag name :-) Google FTW.
http://digitocero.com/en/blog/exporting-and-visualizing-gccs-abstract-syntax-tree-ast
Our C++ Front End, built on top of our DMS Software Reengineering Toolkit can parse a variety of C++ dialects (including C++11 and ObjectiveC) and export that AST as an XML document with a command line switch. See example ASTs produced by this front end.
As a practical matter, you will need more than the AST; you can't really do much with C++ (or any other modern language) without an understanding of the meaning and scope of each identifier. For C++, meaning and scope are particularly ugly. The DMS C++ front end handles all of that; it can build full symbol tables associating identifiers with explicit C++ types. That information isn't dumpable in XML with a command-line switch, but it is "technically easy" to code logic in DMS to walk the symbol table and spit out XML. (There is an option to dump this information, just not in XML format.)
I caution you against the idea of manipulating (or even just analyzing) the XML. First, XSLT isn't a particularly good way to understand the meaning of the ASTs, let alone transform them, because the ASTs represent context-sensitive language structures (that's why you want [nay, MUST HAVE] the symbol table). You can read the XML into a DOM-like tree if you like and write your own procedural code to manipulate it. But source-to-source transformations are an easier way; you can write your transformations using C++ notation rather than buckets of code goo climbing over a tree data structure.
You'll have another problem: how to generate valid C++ code from the transformed XML. If you don't mind spitting out raw text, you can solve this problem in purely ad hoc ways, at the price of having no guarantee other than sweat that the generated code is syntactically valid. If you want to generate a C++ representation of your final result as an AST, and regenerate valid text from that, you'll need a prettyprinter, which is not technically hard but is still a lot of work to build, especially for a language as big as C++.
Finally, the reason that tools like DMS exist is to provide the vast amount of infrastructure it takes to process and manipulate complex structures such as C++ ASTs (parse, analyze, transform, prettyprint). You can try to replicate all this machinery yourself, but this is usually a poor time/cost/productivity tradeoff. The claim is that it is best to stay within the tool ecosystem rather than escape it and build bad versions of it yourself. If you haven't done this before, you'll find this out painfully.
FWIW, DMS has been used to carry out massive analysis and transformations on C++ source code. See Publications on DMS and check the papers by Akers on "Re-engineering C++ Component Models".
Clang is based on the same kind of philosophy; there's an ecosystem of tools.
YMMV, but I'd be surprised.
I'm developing a game and now I want to make script system for it.
Right now I have an abstract class Object which is inherited by all game objects. I have to write a lot of boilerplate: add each new object type to an enum, and register a parser function for each object (the function that parses the object's params from a file).
I don't want to do that work by hand. So the idea is to use a scripting system (boost.python for example, because I'm already using Boost in my project). Each object will be a simple Python script; on the C++ side I just load and run all those scripts.
Python isn't statically typed, so I can register functions and build types dynamically without maintaining an enum, etc. The only bad part is writing a lot of binding code, but that only has to be done once.
Are my ideas right?
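For reference, the binding code involved might look like this minimal boost.python sketch (the GameObject class and its members are made up):

#include <boost/python.hpp>

// Hypothetical game object exposed to Python.
struct GameObject {
    float x = 0, y = 0;
    void update(float dt) { x += dt; }
};

// One module definition covers every class you want scripts to see.
BOOST_PYTHON_MODULE(game)
{
    using namespace boost::python;
    class_<GameObject>("GameObject")
        .def_readwrite("x", &GameObject::x)
        .def_readwrite("y", &GameObject::y)
        .def("update", &GameObject::update);
}

The C++ side then imports the scripts and calls back into them, and the per-object enum and parser registration go away.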
Can you give us a rough idea of how large the game is going to be?
If you're not careful, you could give yourself a lot of extra work without much benefit, but with some planning it sounds like it might help. The important questions are "What parts of the program do I want to simplify?", "Do I need a scripting language to simplify them?" and "Can the scripting language simplify them?".
You mentioned that you don't want to have to manually parse files. Python's pickle module could handle serialization for you, but so could .NET. If you're using Visual Studio, then you may find it easier to write the code in C# than in Python.
You should also look for ways to simplify your code without adding a new language. For example, you might be able to create a simple binary file format and store your data structures without much parsing. There are probably other things you can do, but that would require more detailed knowledge of the program.
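As a sketch of that simple-binary-format idea (non-portable and minimal on purpose; real code should also think about endianness and versioning, and the Record type is made up):

#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct Record {
    uint32_t id;
    float    x, y;
};

// Write a tiny header (magic tag + count) followed by fixed-size records.
void writeRecords(const std::string& path, const std::vector<Record>& recs) {
    std::ofstream out(path, std::ios::binary);
    const uint32_t magic = 0x47414D45;   // "GAME", arbitrary tag
    const uint32_t count = static_cast<uint32_t>(recs.size());
    out.write(reinterpret_cast<const char*>(&magic), sizeof magic);
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    out.write(reinterpret_cast<const char*>(recs.data()),
              count * sizeof(Record));
}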
Rationale: In my day-to-day C++ development, I frequently need to answer basic questions such as who calls what in a very large C++ code base that is frequently changing. But I also need some automated way to identify exactly what the code is doing around a particular area of code. "grep" tools such as Cscope are useful (and I use them heavily already), but are not C++-language-aware: they don't give any way to identify the types and kinds of lexical environment of a given use of a type or function in a way that is conducive to automation (even if said automation is limited to "read-only" operations such as code browsing and navigation, but I'm asking for much more than that below).
Question: Does there already exist an open-source C/C++-based library (native, not managed, not Microsoft- or Linux-specific) that can statically scan or analyze a large tree of C++ code and produce result sets that answer detailed questions such as:
What functions are called by some supplied function?
What functions make use of this supplied type?
Ditto the above questions if C++ classes or class templates are involved.
The result set should provide some sort of "handle". I should be able to feed that handle back to the library to perform the following types of introspection:
What is the byte offset into the file where the reference was made?
What is the reference into the abstract syntax tree (AST) of that reference, so that I can inspect surrounding code constructs? Each AST entity would also have file path, byte-offset, and type-info data associated with it, so that I could recursively walk up the graph of callers or referrers to do useful operations.
The answer should meet the following requirements:
API: The API exposed must be one of the following: (a) C or C++, probably "C handle" or C++-class-instance-based (and if so, it must be generic C or C++ code and not Microsoft- or Linux-specific code constructs, unless that is needed to meet the specifics of the given platform); or (b) command-line, standard-input and standard-output based.
C++ aware: Is not limited to C code, but understands C++ language constructs in minute detail, including awareness of inter-class inheritance relationships and C++ templates.
Fast: Should scan large code bases significantly faster than compiling the entire code base from scratch. This probably needs to be relaxed, but only if the "incremental result retrieval" and "resilient to small code changes" requirements below are fully met.
Provides result counts: I should be able to ask "How many results would you provide for some request?" (and no, don't send me all of the results) and get an answer in less than about 3 seconds, versus having to retrieve all results for any given question. If it takes too long to get that answer, it wastes development time. This is coupled with the next requirement.
Incremental result retrieval: I should then be able to ask "Give me just the next N results of this request", and get a handle to the result set so that I can ask the question repeatedly, thus incrementally pulling out the results in stages. This means I should not have to wait for the entire result set before seeing some subset of all of the results, and that I can cancel the operation safely if I have seen enough results. Reason: I need to answer the question "What is the build or development impact of changing some particular function signature?"
Resilient to small code changes: If I change a header or source file, I should not have to wait for the entire code base to be rescanned, but only that header or source file, and rescanning should be quick. E.g., don't do what cscope requires, which is to rescan the entire code base for small changes. It is understood that if you change a header, scanning can take longer, since other files that include that header would have to be rescanned.
IDE agnostic: Is text-editor agnostic (don't make me use a specific text editor; I've made my choice already, thank you!).
Platform agnostic: Is platform-agnostic (don't make me use it only on Linux or only on Windows; I have to use both of those platforms in my daily grind, and I need the tool to be useful on both, as I have code sandboxes on both platforms).
Non-binary: Should not cost me anything other than the time to download and compile the library and all of its dependencies. Not trial-ware.
Actively supported: Sending help requests to mailing lists or associated forums is likely to get a response in less than 2 days.
Network agnostic: Databases the library builds should be usable directly over a network from 32-bit and 64-bit systems, both Linux and Windows, interchangeably and at the same time, and should not embed hardcoded filesystem paths that would otherwise "root" the database to a particular network.
Build-environment agnostic: Does not require intimate knowledge of my build environment, with the notable exception of possibly requiring knowledge of compiler-supplied CPP macro definitions (e.g. -Dmacro=value).
I would say that Clang Index is a close fit. However, I don't think it stores data in a database.
Anyway, the Clang framework offers what you need to build a tool tailored to your needs, if only because of its C, C++ and Objective-C parsing/indexing capabilities. And since it's provided as a set of reusable libraries... it was crafted to be built upon!
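As a small illustration of the kind of query libclang lets you build, here is a sketch that walks a translation unit and prints, for each call expression, the function declaration it resolves to; it is a crude "who calls what" with none of the database, incrementality, or result-paging requirements above:

#include <clang-c/Index.h>   // link with -lclang
#include <cstdio>

// For every call expression in the translation unit, print which function
// declaration it resolves to, plus the line of the call site.
static CXChildVisitResult visit(CXCursor c, CXCursor, CXClientData) {
    if (clang_getCursorKind(c) == CXCursor_CallExpr) {
        CXCursor callee = clang_getCursorReferenced(c);   // cross-reference
        CXString name   = clang_getCursorSpelling(callee);
        unsigned line   = 0;
        clang_getSpellingLocation(clang_getCursorLocation(c),
                                  nullptr, &line, nullptr, nullptr);
        std::printf("line %u: call to %s\n", line, clang_getCString(name));
        clang_disposeString(name);
    }
    return CXChildVisit_Recurse;
}

int main(int argc, char** argv) {
    CXIndex index = clang_createIndex(0, 0);
    CXTranslationUnit tu = clang_parseTranslationUnit(
        index, argv[1], argv + 2, argc - 2, nullptr, 0, CXTranslationUnit_None);
    clang_visitChildren(clang_getTranslationUnitCursor(tu), visit, nullptr);
    clang_disposeTranslationUnit(tu);
    clang_disposeIndex(index);
}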
I have to admit that I haven't used either, because I work with a lot of Microsoft-specific code that uses Microsoft compiler extensions that I don't expect them to understand, but the two open-source analyzers I'm aware of are Mozilla Pork and the Clang Analyzer.
If you are looking for the results of code analysis (metrics, graphs, ...), why not use a tool (instead of an API) to do it? If you can, I suggest you take a look at Understand.
It's not free (there's a trial version) but I found it very useful.
Maybe Doxygen with GraphViz could be the answer to some of your constraints, but not all; for example, Doxygen's analysis is not incremental.