I have used pegjs to create a parser, quite similar actually to the example shown here for a custom string-formatting function.
I would like to port this to C++. What might be the best way to do this? Are there are existing parsers in C/C++ that are compliant with the pegjs grammar? Or, would I want to transpile the generated JS with something like V8, or what might be a possible route to do this conversion, aside from manually rewriting everything from scratch?
Related
I'm considering Google protocol buffers as a solution to my problem of communication between C++ and C# using named pipes. But I have one concern: all I've been able to find on protobuf is how to create a message from a prototype using protobuf compiler. This is neat, but I would also need to be able to serialize existing structs. I can't seem to find any info (but maybe I'm overlooking it). Do you know if it is possible to serialize a struct in C++ using protobufs, so it can be read in .NET, without modifying said existing struct?
Yes and no.
It's possible. In fact, I have done it. Not the .NET loading part, but the serialization to protobuf and the generation of a prototype from a C++ class. However, doing so requires a number of things and is not that easy.
First of all, protobufs are quite limited in their ability to represent data. They are basically only capable of representing POD-types (in the C++ sense), and very little else. I personally had to add a few basic things to the format to make it into a proper full-featured serialization format. But if you restrict yourself to POD-types, then the plain protobuf format will work fine.
The second thing is that you'll need a serialization library of some kind, and that will require that you add some code for each struct / class to perform the serialization / de-serialization (not necessarily "intrusively", meaning that you might not have to change the classes, just add some code on the side). You can look at Boost.Serialization, that's the basic template for how to create a serialization library in C++. Boost.Serialization is not particularly flexible for this purpose, and so, you might have to change a few things (like I had to do).
The third thing is that you will need quite a bit of wizardry under-the-hood to make this happen. In particular, you are going to need a reliable and feature-rich run-time type identification system (RTTI) in order to able to have useful type names, and you might need to clever meta-programming or some intrusive class hierarchy to be able to detect user-defined types for which you need to generate a prototype.
So, that's why my answer is "yes and no" because it is possible, but not without quite a bit of work and a good framework to rely on.
N.B.: Writing code to encode/decode data into the proto-buf format (with those small-ints, and all that) is really the easy part, proto-buf format is so simple, it's almost laughable. Writing the serialization framework that will allow you to do fancy things like generating prototypes, that's the hard part.
I am looking for a good tool that can automatically convert c++ naming styles, such as change method name from "SetPosition()" to "setPosition()", and class name "CPoint" to "Point". Seems most of formatters have no such feature. Thanks.
You can manipulate C++ source using Clang, although you might have to write a bit of code to do the conversion that you have in mind.
For example, the Clang infrastructure would allow you to write a program that iterates over all class method names in a given source file. You could then programatically convert any pascal-case method names that you encounter into camel-case names.
Code formatters don't do renaming of functions. What you need is a refactoring tool that can perform rename. If your project isn't very big then Devexpress Refactor! for C++ (free) might be of help. It will involve some manual work, but it'll be safer than search and replace. Qt Creator has a rename reporting built in an I have found it to be quite reliable (hasn't messed up so far unlike some dedicated refactoring tools).
If manual work using refactoring tools is too much then you can use clangs pieces to build a tool that performs the refactorings automatically, but this is probably more work than the semi-manual process with refactoring tools.
I am trying to write a JSON parser (instead of using one of the freely available ones, because of certain project constraints) and have written lex+yacc based version with a simple wrapper C++ class. I have redefined the YY_INPUT macro for lex to read from a memory buffer. Now the deal is to ensure that the parser is thread-safe and I am not sure how easy it is to ensure that. There are two concerns:
Ultimately YY_INPUT is reading from a global object. I could not think of another way of doing this.
I have no idea how many globals does the generated lex/yacc code end up using.
Would be great if folks can share their experience of doing something similar.
Cheers.
PS. We don' t use STL/string or any templates for that matter. We use our own variant-based containers. We use lex+yacc rather than flex+bison, on four major Unices.
I don't have much experience working directly with yacc, but I know that bison supports reentrant parsers that are thread-safe. It also looks like lex supports a reentrant lexer as well, and I'd guess that if you put the two together it should work out just fine.
I'm trying to create an app to search my company's ColdFusion codebase. I'd like to be able to do intelligent searches, for example: find where a function is defined (and not hit everywhere the function is called). In order to do this, I'd need to parse the ColdFusion code to identify things like function declarations, function calls, database queries, etc.
I've looked into using lex and yacc, but I've never used them before and the learning curve seems very steep. I'm hoping there is something already out there that I could use. My other option is a mess of difficult-to-maintain regex-spaghetti code, which I want to avoid.
I used the source to CFEclipse, since it is open source and has a parser. Not sure about the legality of this if we were selling/redistributing it, but we're only using it for an internal tool.
Writing parsers for real langauges is usually difficult because they contain constructs that Lex and Yacc often don't handle well, e.g., the langauge isn't LALR(1). ColdFusion might be easier than some because of its XML-like style.
If you want to build a sophisticated parser quickly, you might consider using our
DMS Software Reengineering Toolkit which has GLR parsing support.
If you want to avoid writing your own or hacking all those Regexps, you could consider our Source Code Search Engine. It has language-sensitive parsers and can search across very large source code bases very quickly. One of its "language sensitive" parsers is AdhocText, which is designed to handle "generic" programming languages such as those you might find in a random programming book; it even understands XML-like tags such as ColdFusion has. You can download a evaluation version from the link provided to try it.
EDIT 4/3/2010: A recent feature added to the SCSE is the ability to tag definitions and uses separately. That would address the OP's desire to find the function definition rather than all the calls.
None existed. Since ColdFusion is more like scripts than code, I'd imagine it'll be hard to write a parser for it.
ColdFusion Builder can parse CFM/CFC to an outline in Eclipse. Maybe you can do some research on whether a CF Builder plugin can do what you want to do.
A few semesters back I had a class where we wrote a very rudimentary scheme parser and eventually an interpreter. After the class, I converted my parser into a C++ parser that did a reasonably good job of parsing C++ as long as I didn't do anything fancy with the preprocessor or macros. I could use it to read over my classes and functions and do neat things like automatically generate class readers or writers or set up function callbacks from a text file.
However, my program is pretty limited. I'm sure I could spend some time to make it more robust and do more neat things, but I don't want to spend the time and effort if there are already more robust tools available that do the same thing. I figure there has to be something like this out there since parsers are an essential part of compilers, but I haven't seen tools specifically for automatic code generation that make it easy to go through and play with data structures that represent classes, functions and variables for C++ specifically. Are there tools that do this?
Edit:
Hopefully this will clarify a little bit of what I'm looking for. The program I have runs as a prebuild step in visual studio. It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code. Currently I just use it to make it easy to read and write my data structures to a plain text file, but I could do other things as well. The file readers and writers are output into plain .cpp and .h files which I include in the rest of my project just as I would any other file. What I'm looking for are tools that do similar things so I can decide if I should continue to use my own or switch to a some better solution. I'm not looking for anything that generates machine code or edits code that I've written.
A complete parser-building tool like ANTLR or YACC is necessary if you want to parse C++ from scratch, but it's overkill for your purposes.
It reads over my source files, makes a list of classes, their members, their functions, etc. which is then used to generate new code.
Two main options:
GCC-XML can generate a list of classes, members, and functions. The distribution version on their web site is quite old; try the CVS version instead. I don't know about the availability of a Windows port.
Doxygen is designed for producing documentation, but it can also produce an XML output, which you should be able to use to do what you want.
Currently I just use it to make it easy to read and write my data structures to a plain text file...
This is known as serialization. Try Boost.Serialization or maybe libs11n or Google Protocol Buffers. Stack Overflow has further discussion.
...but I could do other things as well.
Other cool applications of this kind of automatic code generation include reflection (inspecting your objects' members at runtime, using duck typing with C++, etc.) and generating wrappers for calling C++ from scripting languages. For a C++ reflection library, see Reflex. For an example of generating wrappers for scripting languages, see Boost.Python or SWIG.
The C++ FAQ Lite has references to YACC grammars for C++. YACC is an old-school parser that was used to generate parser output, clumsy and difficult to learn but very powerful. Nowadays, you'd use Gnu Bison instead of YACC.
Don't forget about Cog. It requires you to know Python. In essence it embeds the output of Python scripts into your code. It's absurdly easy to use, but it takes a totally different approach from things like ANTLR and its purpose is somewhat different.
Maybe Boost::Serialize or ANTLR?
I answered a similar question (re splitting source files into separate header and cpp files) by suggesting the use of lzz.
lzz has a very powerful C++ parser that builds a representation for everything except the bodies of functions. As long as you don't need the contents of the function bodies you you could modify 'lzz' so that it performs the generation step you want.
If you want tools that can parse production C++ code, and carry out arbitrary analyses and transformations, see our DMS Software Reengineering Toolkit and its C++ front end.
It would be straightforward to use the information DMS can provide about C++ code, its structures, types, instances, to generate such access functions. If you wanted to generate access functions in another language, DMS provides means to code transformations from the input language (in this case, C++) to that target language.
Mozilla developed Pork for this kind of thing. I can't say it's easy to use (or even to build), but it is in production.
I've already used professionally the Nvelocity engine combined with C# as a prevoius step to coding, with very good results.