How does to!string(enum.member) works? - d

How does std.conv.to!string(enum.member) work? How is it possible that a function takes an enum member and returns its name? Does it use a compiler extension or something similar? It's a bit usual to me since I came from C/C++ world.

What it does is use compile time reflection on the enum type to get a list of members (the names as strings) and their values. It constructs a switch statement out of this information for a fast lookup to get the name from a value. to!SomeEnum("a_string") uses the same principle, just in the other direction.
The compile time reflection info is accessed with __traits(allMembers, TheEnumType), which returns a list of strings that can be looped over to build the switch statement. Then __traits(getMember, TheEnumType, memberName) is used to fetch the body.
Traits can be seen more of here: http://dlang.org/traits.html#allMembers
That allMembers one works on many types, not just classes as seen in the example, but also structs, enums, and more, even modules.
The phobos source code has some examples like EnumMembers in std.traits: https://github.com/D-Programming-Language/phobos/blob/master/std/traits.d#L3360
though the phobos source is kinda hard to read, but on line 3399, at the bottom of that function, you can see it using __traits(allMembers) as its data source. std.conv.to is implemented in terms of many std.traits functions.
You can also check out the sample chapter tab to get the Reflection chapter out of my D cookbook which discusses this stuff too:
http://www.packtpub.com/discover-advantages-of-programming-in-d-cookbook/book
The final example in that chapter shows how to use several of the reflection capabilities to build a little function dispatcher based on strings. The following chapter (not available for free though) shows how to build a switch out of it for better efficiency too. It's actually pretty easy: just put the case statements inside a foreach over the compile time data and the D compiler will unroll then optimize the lookup table for you!

Related

Is it possible to create a user-defined datatype in a language like C/C++(or maybe any) from a string as user input or from file

Well this might be a very weird question but my curiosity has striken pretty hard on this. So here it goes...
NOTE: Lets take the language C into consideration here.
As programmers we usually define a user-defined datatype(say struct) in the source code with the appropriate name.
Suppose I have a program in which I have a structure defined as:
struct Animal {
char *name;
int lifeSpan;
};
And also I have started the execution of this program.
Now, my question here is;
What if I want to define a new structure called "Plant" just like "Animal" mentioned above in my program, without writing its definition in the source code itself(which is obviously impossible currently) but rather from a user input string(or a file input) during runtime.
Lets say my program takes input string from a text file named file1.txt whose content is:
struct Plant {
char *name;
int lifeSpan;
};
What I want now is to have a new structure named "Plant" in my program which is already in execution. The program should read the file content and create a structure as written in the file and attach it to itself on-the-go.
I have checked out a solution for C++ in the discussion Declaring a data type dynamically in C++ but it doesnt seem to have a very convincing solution.
The solution I am looking for is at the compiler-linker-loader level rather than from the language itself.I would be very pleased and thankful if anyone is looking forward to sharing their ideas on this.
What you're asking about is basically "can we implement C as a scripting language?", since this is the only way code can be executed after compilation.
I'm aware that people have been writing (mostly in the comments) that it's possible in other languages but isn't possible in C, since C is a compiled language (hence data types should be defined during compile time).
However, to the best of my knowledge it's actually possible (and might not be as hard as one would imagine).
There are many possible approaches (machine code emulation (VM), JIT compilation, etc').
One approach will use a C compiler to compile the C script as an external dynamic library (.dll on windows, .so on linux, etc') and than "load" the compiled library and execute the code (this is pretty much the JIT compilation approach, for lazy people).
EDIT:
As mentioned in the comments, by using this approach, the new type is loaded as part of an external library.
The original code won't know about this new type, only the new code (or library) will be "aware" of this new type and able to properly use it.
On the other hand, I'm not sure why you're insisting on the need to use static types and a compiler-linker-loader level solution.
The language itself (the C language) can manage this task dynamically (during execution time).
Consider Ruby MRI, for example. The Ruby language supports dynamic types that can be defined during runtime...
...However, this is implemented in C and it's possible to use the code from within C to define new modules and classes. These aren't static types that can be tested during compilation (type creation and identification is performed during runtime).
This is a perfect example showing that C (as a language) can dynamically define "types".
However, this is also a poor example because Ruby's approach is slow. A custom approved can be far faster since it would avoid the huge overhead related to functionality you might not need (such as inheritance).

How to modify C++ code from user-input

I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.
I want to replace anything of the form
A->Draw(B1, B2)
with
MyFunc(A, B1, B2).
My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression
My next thought was to call clang as a subprocess, use "-dump-ast" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.
The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the ast in any way that I was able to find.
Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?
What you want is a Program Transformation System.
These are tools that generally let you express changes to source code, written in source level patterns that essentially say:
if you see *this*, replace it by *that*
but operating on Abstract Syntax Trees so the matching and replacement process is
far more trustworthy than what you get with string hacking.
Such tools have to have parsers for the source language of interest.
The source language being C++ makes this fairly difficult.
Clang sort of qualifies; after all it can parse C++. OP objects
it cannot do so without all the environment context. To the extent
that OP is typing (well-formed) program fragments (statements, etc,.)
into the interpreter, Clang may [I don't have much experience with it
myself] have trouble getting focused on what the fragment is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface syntax rewrite rule are convenient but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.
GCC with Melt sort of qualifies in the same way that Clang does.
I'm under the impression that Melt makes GCC at best a bit less
intolerable for this kind of work. YMMV.
Our DMS Software Reengineering Toolkit with its full C++14 [EDIT July 2018: C++17] front end absolutely qualifies. DMS has been used to carry out massive transformations
on large scale C++ code bases.
DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, e.g. ambiguities, that you'll have decide how to resolve, see Why can't C++ be parsed with a LR(1) parser? for more discussion] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif not allowed) but that's unlikely a real problem for interactively entered code fragments.
DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text
to feed to the interpreter.
Where it shines in this case is OP can likely write most of his modifications directly as source-to-source syntax rules. For his
example, he can provide DMS with a rewrite rule (untested but pretty close to right):
rule replace_Draw(A:primary,B1:expression,B2:expression):
primary->primary
"\A->Draw(\B1, \B2)" -- pattern
rewrites to
"MyFunc(\A, \B1, \B2)"; -- replacement
and DMS will take any parsed AST containing the left hand side "...Draw..." pattern and replace that subtree with the right hand side, after substituting the matches for A, B1 and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.
If OP provides a set of such rules, DMS can be asked to apply the entire set.
So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a 3rd party; DMS and its C++ front end are hardly "small" programs. But then modern machines have lots of resources so I think its a question of how badly does OP need to do this.
Try modify the headers to supress the method, then compiling you'll find the errors and will be able to replace all core.
As far as you have a C++ interpreter (as CERN's Root) I guess you must use the compiler to intercept all the Draw, an easy and clean way to do that is declare in the headers the Draw method as private, using some defines
class ItemWithDrawMehtod
{
....
public:
#ifdef CATCHTHEMETHOD
private:
#endif
void Draw(A,B);
#ifdef CATCHTHEMETHOD
public:
#endif
....
};
Then compile as:
gcc -DCATCHTHEMETHOD=1 yourfilein.cpp
In case, user want to input complex algorithms to the application, what I suggest is to integrate a scripting language to the app. So that the user can write code [function/algorithm in defined way] so the app can execute it in the interpreter and get the final results. Ex: Python, Perl, JS, etc.
Since you need C++ in the interpreter http://chaiscript.com/ would be a suggestion.
What happens when someone gets ahold of the Draw member function (auto draw = &A::Draw;) and then starts using draw? Presumably you'd want the same improved Draw-functionality to be called in this case too. Thus I think we can conclude that what you really want is to replace the Draw member function with a function of your own.
Since it seems you are not in a position to modify the class containing Draw directly, a solution could be to derive your own class from A and override Draw in there. Then your problem reduces to having your users use your new improved class.
You may again consider the problem of automatically translating uses of class A to your new derived class, but this still seems pretty difficult without the help of a full C++ implementation. Perhaps there is a way to hide the old definition of A and present your replacement under that name instead, via clever use of header files, but I cannot determine whether that's the case from what you've told us.
Another possibility might be to use some dynamic linker hackery using LD_PRELOAD to replace the function Draw that gets called at runtime.
There may be a way to accomplish this mostly with regular expressions.
Since anything that appears after Draw( is already formatted correctly as parameters, you don't need to fully parse them for the purpose you have outlined.
Fundamentally, the part that matters is the "SYMBOL->Draw("
SYMBOL could be any expression that resolves to an object that overloads -> or to a pointer of a type that implements Draw(...). If you reduce this to two cases, you can short-cut the parsing.
For the first case, a simple regular expression that searches for any valid C++ symbol, something similar to "[A-Za-z_][A-Za-z0-9_\.]", along with the literal expression "->Draw(". This will give you the portion that must be rewritten, since the code following this part is already formatted as valid C++ parameters.
The second case is for complex expressions that return an overloaded object or pointer. This requires a bit more effort, but a short parsing routine to walk backward through just a complex expression can be written surprisingly easily, since you don't have to support blocks (blocks in C++ cannot return objects, since lambda definitions do not call the lambda themselves, and actual nested code blocks {...} can't return anything directly inline that would apply here). Note that if the expression doesn't end in ) then it has to be a valid symbol in this context, so if you find a ) just match nested ) with ( and extract the symbol preceding the nested SYMBOL(...(...)...)->Draw() pattern. This may be possible with regular expressions, but should be fairly easy in normal code as well.
As soon as you have the symbol or expression, the replacement is trivial, going from
SYMBOL->Draw(...
to
YourFunction(SYMBOL, ...
without having to deal with the additional parameters to Draw().
As an added benefit, chained function calls are parsed for free with this model, since you can recursively iterate over the code such as
A->Draw(B...)->Draw(C...)
The first iteration identifies the first A->Draw( and rewrites the whole statement as
YourFunction(A, B...)->Draw(C...)
which then identifies the second ->Draw with an expression "YourFunction(A, ...)->" preceding it, and rewrites it as
YourFunction(YourFunction(A, B...), C...)
where B... and C... are well-formed C++ parameters, including nested calls.
Without knowing the C++ version that your interpreter supports, or the kind of code you will be rewriting, I really can't provide any sample code that is likely to be worthwhile.
One way is to load user code as a DLL, (something like plugins,)
this way, you don't need to compile your actual application, just the user code will be compiled, and you application will load it dynamically.

Conversion between C structs (C++ POD) and google protobufs?

I have code that currently passes around a lot of (sometimes nested) C (or C++ Plain Old Data) structs and arrays.
I would like to convert these to/from google protobufs. I could manually write code that converts between these two formats, but it would be less error prone to auto-generate such code. What is the best way to do this? (This would be easy in a language with enough introspection to iterate over the names of member variables, but this is C++ code we're talking about)
One thing I'm considering is writing python code that parses the C structs and then spits out a .proto file, along with C code that copies from member to member (in either direction) for all of the types, but maybe there is a better way... or maybe there is another IDL that already can generate:
.h file containing all of nested types
.proto file containing equivalents
.c file with functions that copy either direction between the C++ structs that the .proto file generates and the structs defined in the .h file
I could not find a ready solution for this problem, if there is one, please let me know!
If you decide to roll your own in python, the python bindings for gdb might be useful. You could then read the symbol table, find all structs defined in specified file, and iterate all struct members.
Then use <gdbtype>.strip_typedefs() to get the primitive type of each member and translate it to appropriate protobuf type.
This is probably safer then a text parsers as it will handle types that depends on architecture, compiler flags, preprocessor macros, etc.
I guess the code to convert to and from protobuf also could be generated from the struct member to message field relation, but does not sound easy.
Protocol buffers can be built by parsing an ASCII representation using TextFormat. So one option would be to add a method dumpAsciiProtoBuf to each of your structs. The method would dump any simple fields (like strings, bools, etc) and call dumpAsciiProtoBuf recursively on nested structs fields. You would then have to make sure that the concatenated result is a valid ASCII protocol buffer which can be parsed using TextFormat.
Note though that this might have some performance implications (since parsing the ASCII representation could be expensive). However, this would save you the trouble of writing a converter in a different language, so it seems to be a convenient solution.
I would not parse the C source code myself, instead I would use the LibClang to parse C files into an AST and my own AST walker to generate the Protobuf and the transcoders as necessary. Googling for "libclang walk AST" should give something to start with, like ast-walker.cc and ast-dumper.cc from this github repository, for example.
The question brought up is the age old challenge with "C" (and C++) code - No easy (or standard) way to reflect on c "struct" (or classes). Just search stack overflow on C reflection, and you will see lot of unsuccessful attempts. My first advice will be NOT to try to build another solution (in python, etc.).
One simple approach: Consider using gdb ptype to get structured output for you structures, which you can use to create the .proto file. The advantage is that there is no need to handle the full syntax of the C language (#define, line breaks, ...). See How do I show what fields a struct has in GDB?
From the gdb ptype, it's a short trip to protobuf '.proto' file.
You can get similar result from libCLang (and I believe there is comparable gcc plugin, but I can not locate it). However, you will have to write some non-trivial "C" code.
Another approach - will be to use 'swig' (https://www.swig.org), and process the swig xml output (or the -xmlout option) to dump the parse tree into XML. While this approach will require a little bit of digging to locate the structure that are needed, the information in XML format is complete, easy to parse (using whatever XML parser you want - python, perl). If you are brave enough, you can use xslt to generate the output.

gcc for parsing code

I would like to know how to use GCC as a library to parse C/C++/Java/Objective C/Ada code for my program.
I want to bypass prepocessing and prefix all the functions that are user written with a prefix My.
like so Print(); becomes MyPrint(); I also wish to do this with the variables.
You can look here:
http://codesynthesis.com/~boris/blog/2010/05/03/parsing-cxx-with-gcc-plugin-part-1/
This is description of how to use gcc plugin interface to parse C++ code. Other language should be handled in the same manner.
Also you can try pork from mozilla:
https://wiki.mozilla.org/Pork
When I tried it (pork), I spend hour or so to fix compile problems, but then
I can write scripts like this:
rewrite SyncPrimitiveUpgrade {
type PRLock* => Mutex*
call PR_NewLock() => new Mutex()
call PR_Lock(lock) => lock->Lock()
call PR_Unlock(lock) => lock->Unlock()
call PR_DestroyLock(lock) => delete lock
}
so it found all type PRLock and replate it with Mutex, also it search call of functions
like PR_NewLock and replace it with "new Mutex".
You might wish to investigate the sparse C parser. It understands a lot of C (all the C used in the Linux kernel sources, which is a fairly good subset of legal ANSI-C and GNU-C extensions) and provides a few sample compiler backends to provide a lint-like static analysis tool for type checking.
While the code looks very clean and thorough, your task might be easier done via another mechanism -- the example.c included with the sparse source that demonstrates a compiler is 1955 lines long.
For C, you cannot do that reliably. If you skip preprocessing you will -- in general -- not have valid C code to be parsed. E.g.
#define FOO
#define BAR
#define BAZ
FOO void BAR qux BAZ(void) { }
How is the parser supposed to recognize this a function definition of qux without doing the preprocessing?
First, GCC is not a library, and is not structured to be one (in contrast to LLVM).
Why (i.e. what for) do you want to parse C, C++, Ada source code?
I would consider (assuming a GCC 4.6 version) extending GCC either thru plugins written in C, or preferably using MELT, a high level domain specific language to extend GCC (disclaimer: I am the main author of MELT).
But using GCC as a library is not realistic at all.
I really think that for what you want to achieve, MELT is the right tool. However, it is poorly documented. Please use the gcc-melt#googlegroups.com list to ask questions.
And be aware that extending GCC does take some amount of work (more than a week perhaps), because you need to partly understand the GCC internal representations.
Our DMS Software Reengineering Toolkit can parse C, C++, Java and Ada code (not Objective C at this time) in a wide variety of dialects and carry out transformations on the code. DMS's C and C++ front ends include a preprocessor, so you can you can cause preprocessing before you parse.
I'm probably don't understand what you want to do, because it seems strange to rename every function and (global?) variable with a "My...." prefix. But you could do that with some DMS rules (a rough sketch of renames of user functions for GCC3:
domain C~GCC3.
rule rewrite_function_names(t: type_designator, i: IDENTIFIER, p: parameter_list, s: statements):
function_header->functionheader
"\t \i(\p) { \s } " -> "\t \renamed\(\i\) (\p) { \s }" ;
and a helper function "renames" that takes a tree node containing an identifer, and returns a tree node with the renamed identifier.
Because DMS patterns only match against the parse trees, you won't get any false positives.
You'd need some additional patterns to handle various different syntax cases within each langauge (e.g, for C, "void" return type, because "void" isn't a type designator in the syntax, and global variable declarations), and different rules for different languages (Ada's syntax is not the same as that of C).
This might seem like big hammer for your task, but if you really insist on doing this for a variety of languages in a reliable way, it seems hard to avoid the problem of getting decent parsers for all those languages. (And if you are really going to do this for all these languages, DMS can be taught to handle ObjectiveC the same we we have taught it to handle the other langauges).
Your alternative is some kind of string hacking solution, which might work 95% of the time. If you can live with that, then Perl or something similar is likely your answer.
forget about GCC, its made as a compiler's parser, not an analysis parser, you'd do way better using something like libclang, a C interface to clang, which can process both C & C++

Is there a tool that enables me to insert one line of code into all functions and methods in a C++-source file?

It should turn this
int Yada (int yada)
{
return yada;
}
into this
int Yada (int yada)
{
SOME_HEIDEGGER_QUOTE;
return yada;
}
but for all (or at least a big bunch of) syntactically legal C/C++ - function and method constructs.
Maybe you've heard of some Perl library that will allow me to perform these kinds of operations in a view lines of code.
My goal is to add a tracer to an old, but big C++ project in order to be able to debug it without a debugger.
Try Aspect C++ (www.aspectc.org). You can define an Aspect that will pick up every method execution.
In fact, the quickstart has pretty much exactly what you are after defined as an example:
http://www.aspectc.org/fileadmin/documentation/ac-quickref.pdf
If you build using GCC and the -pg flag, GCC will automatically issue a call to the mcount() function at the start of every function. In this function you can then inspect the return address to figure out where you were called from. This approach is used by the linux kernel function tracer (CONFIG_FUNCTION_TRACER). Note that this function should be written in assembler, and be careful to preserve all registers!
Also, note that this should be passed only in the build phase, not link, or GCC will add in the profiling libraries that normally implement mcount.
I would suggest using the gcc flag "-finstrument-functions". Basically, it automatically calls a specific function ("__cyg_profile_func_enter") upon entry to each function, and another function is called ("__cyg_profile_func_exit") upon exit of the function. Each function is passed a pointer to the function being entered/exited, and the function which called that one.
You can turn instrumenting off on a per-function or per-file basis... see the docs for details.
The feature goes back at least as far as version 3.0.4 (from February 2002).
This is intended to support profiling, but it does not appear to have side effects like -pg does (which compiles code suitable for profiling).
This could work quite well for your problem (tracing execution of a large program), but, unfortunately, it isn't as general purpose as it would have been if you could specify a macro. On the plus side, you don't need to worry about remembering to add your new code into the beginning of all new functions that are written.
There is no such tool that I am aware of. In order to recognise the correct insertion point, the tool would have to include a complete C++ parser - regular expressions are not enough to accomplish this.
But as there are a number of FOSS C++ parsers out there, such a tool could certainly be written - a sort of intelligent sed for C++ code. The biggest problem would probably be designing the specification language for the insert/update/delete operation - regexes are obviously not the answer, though they should certainly be included in the language somehow.
People are always asking here for ideas for projects - how about this for one?
I use this regex,
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
to locate the functions and add extra lines of code.
With that regex I also get the function name (group 1) and the arguments (group 2).
Note: you must filter out names like, "while", "do", "for", "switch".
This can be easily done with a program transformation system.
The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C#, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C++.
DMS parses source code (using full langauge front end, in this case for C++),
builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C# program into another with whatever properties you wish. THe transformation rule to accomplish exactly the task you specified would be:
domain CSharp.
insert_trace():function->function
"\visibility \returntype \fnname(int \parametername)
{ \body } "
->
"\visibility \returntype \fnname(int \parametername)
{ Heidigger(\CppString\(\methodname\),
\CppString\(\parametername\),
\parametername);
\body } "
The quote marks (") are not C++ quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is C++ syntax (because we said, "domain CSharp"). The \foo notations are meta syntax.
This rule matches the AST representing the function, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.
It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.