parsing a C structure in C++ - c++

I'm looking for a way to parse a C structure in order to get the the name and the type of the variables.
For example I have a structure like this:
struct MyStruct {
int anInt ;
float aFloat ;
}
I need to get the types int and float and the 2 strings "anInt" and "aFloat".
After I have to use these values in another function:
addValue<int>("anInt") ;
add Value<float>("aFloat") ;
Do you know how to do this automatically, I guess, at compilation ?
Thanks.

You probably cannot do that with standard C++11 template code.
You could consider customizing your C++ compiler to get such information. For example, if compiling your code with a recent GCC, you might consider customizing it with your MELT extension (MELT is a domain specific language, implemented by a plugin, to customize GCC). You'll need to understand the details of GCC internal representation (so such an extension might take a week or more of your time).
The Qt MOC facility might perhaps be useful or inspirational. It is parsing a limited form of class declaration.
Alternatively you might consider generating the C++ struct or class representation from some other input; e.g. change your build procedure, perhaps your Makefile, to generate some .h header file (and perhaps some .cc C++ translation unit) from your higher level representation.

Related

List of member reflection in C++ [duplicate]

Is there a way to enumerate the members of a structure (struct | class) in C++ or C? I need to get the member name, type, and value. I've used the following sample code before on a small project where the variables were globally scoped. The problem I have now is that a set of values need to be copied from the GUI to an object, file, and VM environment. I could create another "poor man’s" reflection system or hopefully something better that I haven't thought of yet. Does anyone have any thoughts?
EDIT: I know C++ doesn't have reflection.
union variant_t {
unsigned int ui;
int i;
double d;
char* s;
};
struct pub_values_t {
const char* name;
union variant_t* addr;
char type; // 'I' is int; 'U' is unsigned int; 'D' is double; 'S' is string
};
#define pub_v(n,t) #n,(union variant_t*)&n,t
struct pub_values_t pub_values[] = {
pub_v(somemember, 'D'),
pub_v(somemember2, 'D'),
pub_v(somemember3, 'U'),
...
};
const int no_of_pub_vs = sizeof(pub_values) / sizeof(struct pub_values_t);
To state the obvious, there is no reflection in C or C++. Hence no reliable way of enumerating member variables (by default).
If you have control over your data structure, you could try a std::vector<boost::any> or a std::map<std::string, boost::any> then add all your member variables to the vector/map.
Of course, this means all your variables will likely be on the heap so there will be a performance hit with this approach. With the std::map approach, it means that you would have a kind of "poor man's" reflection.
You can specify your types in an intermediate file and generate C++ code from that, something like COM classes can be generated from idl files. The generated code provides reflection capabilities for those types.
I've done something similar two different ways for different projects:
a custom file parsed by a Ruby script to do the generation
define the types as C# types, use C#'s reflection to get all the information and generate C++ from this (sounds convoluted, but works surprisingly well, and writing the type definitions is quite similar to writing C++ definitions)
Boost has a ready to use Variant library that may fit your needs.
simplest way - switch to Objective-C OR Objective-C++. That languages have good introspection and are full-compatible with C/C++ sources.
also You can use m4/cog/... for simultaneous generation structure and his description from some meta-description.
It feels like you are constructing some sort of debugger. I think this should be doable if you make sure you generate pdb files while building your executable.
Not sure in what context you want to do this enumeration, but in your program you should be able to call functions from Microsofts dbghelp.dll to get type information from variables etc. (I'm assuming you are using windows, which might of course not be the case)
Hope this helps to get you a little bit further.
Cheers!
Since C++ does not have reflection builtin, you can only get the information be teaching separately your program about the struct content.
This can be either by generating your structure from a format that you can use after that to know the strcture information, or by parsing your .h file to extract the structure information.

Is it possible to create a user-defined datatype in a language like C/C++(or maybe any) from a string as user input or from file

Well this might be a very weird question but my curiosity has striken pretty hard on this. So here it goes...
NOTE: Lets take the language C into consideration here.
As programmers we usually define a user-defined datatype(say struct) in the source code with the appropriate name.
Suppose I have a program in which I have a structure defined as:
struct Animal {
char *name;
int lifeSpan;
};
And also I have started the execution of this program.
Now, my question here is;
What if I want to define a new structure called "Plant" just like "Animal" mentioned above in my program, without writing its definition in the source code itself(which is obviously impossible currently) but rather from a user input string(or a file input) during runtime.
Lets say my program takes input string from a text file named file1.txt whose content is:
struct Plant {
char *name;
int lifeSpan;
};
What I want now is to have a new structure named "Plant" in my program which is already in execution. The program should read the file content and create a structure as written in the file and attach it to itself on-the-go.
I have checked out a solution for C++ in the discussion Declaring a data type dynamically in C++ but it doesnt seem to have a very convincing solution.
The solution I am looking for is at the compiler-linker-loader level rather than from the language itself.I would be very pleased and thankful if anyone is looking forward to sharing their ideas on this.
What you're asking about is basically "can we implement C as a scripting language?", since this is the only way code can be executed after compilation.
I'm aware that people have been writing (mostly in the comments) that it's possible in other languages but isn't possible in C, since C is a compiled language (hence data types should be defined during compile time).
However, to the best of my knowledge it's actually possible (and might not be as hard as one would imagine).
There are many possible approaches (machine code emulation (VM), JIT compilation, etc').
One approach will use a C compiler to compile the C script as an external dynamic library (.dll on windows, .so on linux, etc') and than "load" the compiled library and execute the code (this is pretty much the JIT compilation approach, for lazy people).
EDIT:
As mentioned in the comments, by using this approach, the new type is loaded as part of an external library.
The original code won't know about this new type, only the new code (or library) will be "aware" of this new type and able to properly use it.
On the other hand, I'm not sure why you're insisting on the need to use static types and a compiler-linker-loader level solution.
The language itself (the C language) can manage this task dynamically (during execution time).
Consider Ruby MRI, for example. The Ruby language supports dynamic types that can be defined during runtime...
...However, this is implemented in C and it's possible to use the code from within C to define new modules and classes. These aren't static types that can be tested during compilation (type creation and identification is performed during runtime).
This is a perfect example showing that C (as a language) can dynamically define "types".
However, this is also a poor example because Ruby's approach is slow. A custom approved can be far faster since it would avoid the huge overhead related to functionality you might not need (such as inheritance).

Conversion between C structs (C++ POD) and google protobufs?

I have code that currently passes around a lot of (sometimes nested) C (or C++ Plain Old Data) structs and arrays.
I would like to convert these to/from google protobufs. I could manually write code that converts between these two formats, but it would be less error prone to auto-generate such code. What is the best way to do this? (This would be easy in a language with enough introspection to iterate over the names of member variables, but this is C++ code we're talking about)
One thing I'm considering is writing python code that parses the C structs and then spits out a .proto file, along with C code that copies from member to member (in either direction) for all of the types, but maybe there is a better way... or maybe there is another IDL that already can generate:
.h file containing all of nested types
.proto file containing equivalents
.c file with functions that copy either direction between the C++ structs that the .proto file generates and the structs defined in the .h file
I could not find a ready solution for this problem, if there is one, please let me know!
If you decide to roll your own in python, the python bindings for gdb might be useful. You could then read the symbol table, find all structs defined in specified file, and iterate all struct members.
Then use <gdbtype>.strip_typedefs() to get the primitive type of each member and translate it to appropriate protobuf type.
This is probably safer then a text parsers as it will handle types that depends on architecture, compiler flags, preprocessor macros, etc.
I guess the code to convert to and from protobuf also could be generated from the struct member to message field relation, but does not sound easy.
Protocol buffers can be built by parsing an ASCII representation using TextFormat. So one option would be to add a method dumpAsciiProtoBuf to each of your structs. The method would dump any simple fields (like strings, bools, etc) and call dumpAsciiProtoBuf recursively on nested structs fields. You would then have to make sure that the concatenated result is a valid ASCII protocol buffer which can be parsed using TextFormat.
Note though that this might have some performance implications (since parsing the ASCII representation could be expensive). However, this would save you the trouble of writing a converter in a different language, so it seems to be a convenient solution.
I would not parse the C source code myself, instead I would use the LibClang to parse C files into an AST and my own AST walker to generate the Protobuf and the transcoders as necessary. Googling for "libclang walk AST" should give something to start with, like ast-walker.cc and ast-dumper.cc from this github repository, for example.
The question brought up is the age old challenge with "C" (and C++) code - No easy (or standard) way to reflect on c "struct" (or classes). Just search stack overflow on C reflection, and you will see lot of unsuccessful attempts. My first advice will be NOT to try to build another solution (in python, etc.).
One simple approach: Consider using gdb ptype to get structured output for you structures, which you can use to create the .proto file. The advantage is that there is no need to handle the full syntax of the C language (#define, line breaks, ...). See How do I show what fields a struct has in GDB?
From the gdb ptype, it's a short trip to protobuf '.proto' file.
You can get similar result from libCLang (and I believe there is comparable gcc plugin, but I can not locate it). However, you will have to write some non-trivial "C" code.
Another approach - will be to use 'swig' (https://www.swig.org), and process the swig xml output (or the -xmlout option) to dump the parse tree into XML. While this approach will require a little bit of digging to locate the structure that are needed, the information in XML format is complete, easy to parse (using whatever XML parser you want - python, perl). If you are brave enough, you can use xslt to generate the output.

gcc for parsing code

I would like to know how to use GCC as a library to parse C/C++/Java/Objective C/Ada code for my program.
I want to bypass prepocessing and prefix all the functions that are user written with a prefix My.
like so Print(); becomes MyPrint(); I also wish to do this with the variables.
You can look here:
http://codesynthesis.com/~boris/blog/2010/05/03/parsing-cxx-with-gcc-plugin-part-1/
This is description of how to use gcc plugin interface to parse C++ code. Other language should be handled in the same manner.
Also you can try pork from mozilla:
https://wiki.mozilla.org/Pork
When I tried it (pork), I spend hour or so to fix compile problems, but then
I can write scripts like this:
rewrite SyncPrimitiveUpgrade {
type PRLock* => Mutex*
call PR_NewLock() => new Mutex()
call PR_Lock(lock) => lock->Lock()
call PR_Unlock(lock) => lock->Unlock()
call PR_DestroyLock(lock) => delete lock
}
so it found all type PRLock and replate it with Mutex, also it search call of functions
like PR_NewLock and replace it with "new Mutex".
You might wish to investigate the sparse C parser. It understands a lot of C (all the C used in the Linux kernel sources, which is a fairly good subset of legal ANSI-C and GNU-C extensions) and provides a few sample compiler backends to provide a lint-like static analysis tool for type checking.
While the code looks very clean and thorough, your task might be easier done via another mechanism -- the example.c included with the sparse source that demonstrates a compiler is 1955 lines long.
For C, you cannot do that reliably. If you skip preprocessing you will -- in general -- not have valid C code to be parsed. E.g.
#define FOO
#define BAR
#define BAZ
FOO void BAR qux BAZ(void) { }
How is the parser supposed to recognize this a function definition of qux without doing the preprocessing?
First, GCC is not a library, and is not structured to be one (in contrast to LLVM).
Why (i.e. what for) do you want to parse C, C++, Ada source code?
I would consider (assuming a GCC 4.6 version) extending GCC either thru plugins written in C, or preferably using MELT, a high level domain specific language to extend GCC (disclaimer: I am the main author of MELT).
But using GCC as a library is not realistic at all.
I really think that for what you want to achieve, MELT is the right tool. However, it is poorly documented. Please use the gcc-melt#googlegroups.com list to ask questions.
And be aware that extending GCC does take some amount of work (more than a week perhaps), because you need to partly understand the GCC internal representations.
Our DMS Software Reengineering Toolkit can parse C, C++, Java and Ada code (not Objective C at this time) in a wide variety of dialects and carry out transformations on the code. DMS's C and C++ front ends include a preprocessor, so you can you can cause preprocessing before you parse.
I'm probably don't understand what you want to do, because it seems strange to rename every function and (global?) variable with a "My...." prefix. But you could do that with some DMS rules (a rough sketch of renames of user functions for GCC3:
domain C~GCC3.
rule rewrite_function_names(t: type_designator, i: IDENTIFIER, p: parameter_list, s: statements):
function_header->functionheader
"\t \i(\p) { \s } " -> "\t \renamed\(\i\) (\p) { \s }" ;
and a helper function "renames" that takes a tree node containing an identifer, and returns a tree node with the renamed identifier.
Because DMS patterns only match against the parse trees, you won't get any false positives.
You'd need some additional patterns to handle various different syntax cases within each langauge (e.g, for C, "void" return type, because "void" isn't a type designator in the syntax, and global variable declarations), and different rules for different languages (Ada's syntax is not the same as that of C).
This might seem like big hammer for your task, but if you really insist on doing this for a variety of languages in a reliable way, it seems hard to avoid the problem of getting decent parsers for all those languages. (And if you are really going to do this for all these languages, DMS can be taught to handle ObjectiveC the same we we have taught it to handle the other langauges).
Your alternative is some kind of string hacking solution, which might work 95% of the time. If you can live with that, then Perl or something similar is likely your answer.
forget about GCC, its made as a compiler's parser, not an analysis parser, you'd do way better using something like libclang, a C interface to clang, which can process both C & C++

Enumerate members of a structure?

Is there a way to enumerate the members of a structure (struct | class) in C++ or C? I need to get the member name, type, and value. I've used the following sample code before on a small project where the variables were globally scoped. The problem I have now is that a set of values need to be copied from the GUI to an object, file, and VM environment. I could create another "poor man’s" reflection system or hopefully something better that I haven't thought of yet. Does anyone have any thoughts?
EDIT: I know C++ doesn't have reflection.
union variant_t {
unsigned int ui;
int i;
double d;
char* s;
};
struct pub_values_t {
const char* name;
union variant_t* addr;
char type; // 'I' is int; 'U' is unsigned int; 'D' is double; 'S' is string
};
#define pub_v(n,t) #n,(union variant_t*)&n,t
struct pub_values_t pub_values[] = {
pub_v(somemember, 'D'),
pub_v(somemember2, 'D'),
pub_v(somemember3, 'U'),
...
};
const int no_of_pub_vs = sizeof(pub_values) / sizeof(struct pub_values_t);
To state the obvious, there is no reflection in C or C++. Hence no reliable way of enumerating member variables (by default).
If you have control over your data structure, you could try a std::vector<boost::any> or a std::map<std::string, boost::any> then add all your member variables to the vector/map.
Of course, this means all your variables will likely be on the heap so there will be a performance hit with this approach. With the std::map approach, it means that you would have a kind of "poor man's" reflection.
You can specify your types in an intermediate file and generate C++ code from that, something like COM classes can be generated from idl files. The generated code provides reflection capabilities for those types.
I've done something similar two different ways for different projects:
a custom file parsed by a Ruby script to do the generation
define the types as C# types, use C#'s reflection to get all the information and generate C++ from this (sounds convoluted, but works surprisingly well, and writing the type definitions is quite similar to writing C++ definitions)
Boost has a ready to use Variant library that may fit your needs.
simplest way - switch to Objective-C OR Objective-C++. That languages have good introspection and are full-compatible with C/C++ sources.
also You can use m4/cog/... for simultaneous generation structure and his description from some meta-description.
It feels like you are constructing some sort of debugger. I think this should be doable if you make sure you generate pdb files while building your executable.
Not sure in what context you want to do this enumeration, but in your program you should be able to call functions from Microsofts dbghelp.dll to get type information from variables etc. (I'm assuming you are using windows, which might of course not be the case)
Hope this helps to get you a little bit further.
Cheers!
Since C++ does not have reflection builtin, you can only get the information be teaching separately your program about the struct content.
This can be either by generating your structure from a format that you can use after that to know the strcture information, or by parsing your .h file to extract the structure information.