Is C++ code generation in ANTLR 3.2 ready? - c++

I was trying hard to make ANTLR 3.2 generate parser/lexer in C++. It was fruitless. Things went well with Java & C though.
I was using this tutorial to get started: http://www.ibm.com/developerworks/aix/library/au-c_plusplus_antlr/index.html
When I checked the *.stg files, I found that:
CPP has only
./tool/src/main/resources/org/antlr/codegen/templates/CPP/CPP.stg
C has so many files:
./tool/src/main/resources/org/antlr/codegen/templates/C/AST.stg
./tool/src/main/resources/org/antlr/codegen/templates/C/ASTDbg.stg
./tool/src/main/resources/org/antlr/codegen/templates/C/ASTParser.stg
./tool/src/main/resources/org/antlr/codegen/templates/C/ASTTreeParser.stg
./tool/src/main/resources/org/antlr/codegen/templates/C/C.stg
./tool/src/main/resources/org/antlr/codegen/templates/C/Dbg.stg
And so do the other languages.
My C.g file:
grammar C;
options { language='CPP'; }
/** Match things like "call foo;" */
r : 'call' ID ';' {System.out.println("invoke "+$ID.text);} ;
ID: ('a'..'z'|'A'..'Z'|'_')('0'..'9'|'a'..'z'|'A'..'Z'|'_')* ;
WS: (' ' |'\n' |'\r' )+ {$channel=HIDDEN;} ; // ignore whitespace
Errors:
error(10): internal error: group Cpp does not satisfy interface ANTLRCore: missing templates [lexerRuleRefAndListLabel, parameterSetAttributeRef, scopeSetAttributeRef, returnSetAttributeRef, lexerRulePropertyRef_text, lexerRulePropertyRef_type, lexerRulePropertyRef_line, lexerRulePropertyRef_pos, lexerRulePropertyRef_index, lexerRulePropertyRef_channel, lexerRulePropertyRef_start, lexerRulePropertyRef_stop, ruleSetPropertyRef_tree, ruleSetPropertyRef_st]
error(10): internal error: group Cpp does not satisfy interface ANTLRCore: mismatched arguments on these templates [outputFile(LEXER, PARSER, TREE_PARSER, actionScope, actions, docComment, recognizer, name, tokens, tokenNames, rules, cyclicDFAs, bitsets, buildTemplate, buildAST, rewriteMode, profile, backtracking, synpreds, memoize, numRules, fileName, ANTLRVersion, generatedTimestamp, trace, scopes, superClass, literals), optional headerFile(LEXER, PARSER, TREE_PARSER, actionScope, actions, docComment, recognizer, name, tokens, tokenNames, rules, cyclicDFAs, bitsets, buildTemplate, buildAST, rewriteMode, profile, backtracking, synpreds, memoize, numRules, fileName, ANTLRVersion, generatedTimestamp, trace, scopes, superClass, literals), lexer(grammar, name, tokens, scopes, rules, numRules, labelType, filterMode, superClass), rule(ruleName, ruleDescriptor, block, emptyRule, description, exceptions, finally, memoize), alt(elements, altNum, description, autoAST, outerAlt, treeLevel, rew), tokenRef(token, label, elementIndex, hetero), tokenRefAndListLabel(token, label, elementIndex, hetero), listLabel(label, elem), charRangeRef(a, b, label), ruleRef(rule, label, elementIndex, args, scope), ruleRefAndListLabel(rule, label, elementIndex, args, scope), lexerRuleRef(rule, label, args, elementIndex, scope), lexerMatchEOF(label, elementIndex), tree(root, actionsAfterRoot, children, nullableChildList, enclosingTreeLevel, treeLevel)]
error(10): internal error: C.g : java.lang.IllegalArgumentException: Can't find template actionGate.st; group hierarchy is [Cpp]
... and so on.
Please kindly advise. Thank you! I'm using Leopard 10.5.8 with
CLASSPATH=:/Users/vietlq/projects/antlr-3.2.jar:/Users/vietlq/projects/stringtemplate-3.2.1/lib/stringtemplate-3.2.1.jar:/Users/vietlq/projects/stringtemplate-3.2.1/lib/antlr-2.7.7.jar

It sounds like you've answered your own question: ANTLR's C++ lexer/parser generators are not yet functional.
For what it's worth, it's still possible to use ANTLR for parsing from C++, via the C target. I use ANTLR to generate a C language lexer and parser, which I then compile and link to my C++ code.
I have one C++ file that translates an ANTLR parse tree to my target abstract syntax tree classes, and the rest of my code doesn't care where the AST comes from. It works pretty well in practice! It would be easy to replace ANTLR with a different parser generator, and I find that the separation leads to cleaner ANTLR grammars.
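To illustrate the separation described above, here is a minimal sketch of such a translation layer. It is not the ANTLR C runtime API: `RawNode`, the token constants, and `CallStmt` are hypothetical stand-ins for the tree the generated C parser hands back and for your own AST classes, loosely modelled on the `'call' ID ';'` rule from the question.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a node of the tree returned by the C parser.
// A real program would read these fields through the C runtime instead.
struct RawNode {
    int type;                       // token type from the generated parser
    std::string text;               // matched text
    std::vector<RawNode> children;  // child nodes, if any
};

// Hypothetical token types; real values would come from the generated header.
constexpr int kTokCall = 4;
constexpr int kTokId = 5;

// The application's own AST class. Nothing here depends on the parser generator.
struct CallStmt {
    std::string callee;
};

// The single place that knows about both the parser's tree and the AST.
CallStmt toCallStmt(const RawNode& node) {
    if (node.type != kTokCall || node.children.size() != 1 ||
        node.children[0].type != kTokId) {
        throw std::invalid_argument("expected a 'call' node with one ID child");
    }
    return CallStmt{node.children[0].text};
}
```

Because the rest of the code only ever sees `CallStmt`, swapping the parser generator would mean rewriting this one function and nothing else.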

I have posted a C++ Target for ANTLR. Please check it out.

Related

Calcite: Implementing functions with reserved names (current_schema)

I'm using Calcite for parsing and executing queries, and can't figure out how to implement functions like Postgres' current_schema(). For functions with non-reserved keywords, ex: VERSION(), the implementation is fairly straightforward:
private void addRootOperators() {
SchemaPlus plus = this.rootSchema.plus();
// VersionFunction is a custom class with a run() method
plus.add("VERSION", ScalarFunctionImpl.create(VersionFunction.class, "run"));
}
Unfortunately this approach doesn't work when I try to implement CURRENT_SCHEMA(), though it will if I give it a non-reserved name.
When I run the query SELECT current_schema() with the function defined like VERSION above, I get the error:
Encountered "(" at line 1, column 22
I believe this is because the default implementation of CURRENT_SCHEMA is as a SqlStringContextVariable (hence why the parentheses are causing a parsing error), but I don't know how to support the version of the function with the parentheses in light of this.

xerces_3_1 is able to create invalid xml at comments & processing instructions

I've encountered a problem using the xerces-dom library:
When you add comments to the XML tree like this:
DOMDocument* doc = impl->createDocument(0, L"root", 0);
DOMElement* root = doc->getDocumentElement();
DOMComment* com1 = doc->createComment(L"SetA -- DataA");
DOMComment* com2 = doc->createComment(L"SetB -- DataB");
doc->insertBefore(com1, root);
doc->insertBefore(com2, root);
That will create the following xml-tree:
<?xml version="1.0" encoding="UTF-8" standalone="false"?>
<!--SetA -- DataA-->
<!--SetB -- DataB-->
<root/>
which is indeed invalid xml.
The same can be done with processing instructions by using ?> as data:
DOMProcessingInstruction* procInstr = doc->createProcessingInstruction(L"target", L"?>");
My question:
Is there a way I can configure Xerces to not create these kinds of comments, or do I have to check for these things myself?
And my other question: why isn't it possible to just always escape characters like <>&'", even in comments and processing instructions, in order to avoid these kinds of problems?
A DOMDocument is not an XML document. It is supposed to represent one, but it is conceivable that a valid DOM may not be serializable into a valid XML document (the converse should be less likely). Indeed this appears to be the case here:
Neither the Level 1 nor the Level 2 specs say anything about this, but the Level 3 DOM specification added this sentence about the DOMComment interface:
No lexical check is done on the content of a comment and it is therefore possible to have the character sequence "--" (double-hyphen) in the content, which is illegal in a comment per section 2.5 of [XML 1.0]. The presence of this character sequence must generate a fatal error during serialization.
So Xerces is operating within the DOM Level 3 specification even if it accepts a comment with '--' in it, as long as it bombs if you go to serialize it.
Not a great situation, but it makes sense because DOM was originally intended to represent XML Documents that have been read in, not to create new ones. So it is liberal in what it can represent. Fine for reading - a DOMComment can represent anything (and more) the XML document can, but a bit annoying that it doesn't catch the invalid string when you createComment().
Checking DOMDocumentImpl.cpp we see:
DOMComment *DOMDocumentImpl::createComment(const XMLCh *data)
{
return new (this, DOMMemoryManager::COMMENT_OBJECT) DOMCommentImpl(this, data);
}
And in DOMCommentImpl.cpp we have just:
DOMCommentImpl::DOMCommentImpl(DOMDocument *ownerDoc, const XMLCh *dat)
: fNode(ownerDoc), fCharacterData(ownerDoc, dat)
{
fNode.setIsLeafNode(true);
}
Finally we see in DOMCharacterDataImpl.cpp that there is no chance of validation up front - it just saves the user provided string without checking it.
DOMCharacterDataImpl::DOMCharacterDataImpl(DOMDocument *doc, const XMLCh *dat)
{
fDoc = (DOMDocumentImpl*)doc;
XMLSize_t len=XMLString::stringLen(dat);
fDataBuf = fDoc->popBuffer(len+1);
if (!fDataBuf)
fDataBuf = new (fDoc) DOMBuffer(fDoc, len+15);
fDataBuf->set(dat, len);
}
Sadly, no: Xerces does not have an option or even a nice hook to check this for you. And because the Level 3 spec seems to demand that "No lexical check is done", it probably isn't even legal to add one.
Your second question is simpler to answer: because that's the way they wanted it defined. See the XML 1.1 spec for example:
Comments
[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
It is similar for PIs.
The grammar simply does not allow for escapes. Seems about right: baroque and broke.
Maybe there is a way to catch the error on serialization or normalization, but I wasn't able to confirm whether Xerces 3.1 can. To be safe I think the best way is to wrap createComment() and check for it before creating the node, or walk the tree and check it yourself.
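A minimal sketch of the check such a wrapper could run before calling createComment() or createProcessingInstruction(). The function names are hypothetical and nothing here touches the Xerces API; the functions are templated on the character type so they work with whatever your build uses for XMLCh strings (assumed here to be convertible to a std::basic_string).

```cpp
#include <cstddef>
#include <string>

// True if `data` may appear between "<!--" and "-->".
// Per the Comment production, the text must not contain "--"
// and must not end with '-'.
template <typename CharT>
bool isSerializableComment(const std::basic_string<CharT>& data) {
    const CharT dash = static_cast<CharT>('-');
    for (std::size_t i = 0; i < data.size(); ++i) {
        // A dash is only allowed when followed by a non-dash character.
        if (data[i] == dash && (i + 1 == data.size() || data[i + 1] == dash)) {
            return false;
        }
    }
    return true;
}

// True if `data` may appear as processing-instruction content,
// i.e. it does not contain the closing delimiter "?>".
template <typename CharT>
bool isSerializablePIData(const std::basic_string<CharT>& data) {
    const CharT closing[2] = {static_cast<CharT>('?'), static_cast<CharT>('>')};
    return data.find(closing, 0, 2) == std::basic_string<CharT>::npos;
}
```

With these, `isSerializableComment(std::wstring(L"SetA -- DataA"))` is false, so the wrapper can reject or rewrite the text before the node is ever created.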

Enterprise Architect Reverse Engineering: 'Unexpected symbol' error

I'm trying to generate a class diagram, using reverse engineering, but the following is happening:
There was an error parsing C:\Documents and Settings\Meus documentos\EA_Documentos\Modelos\Environment\class\Factory.h on line 11. Unexpected symbol: ISIMFactory
You may need to define a language macro.
There was an error parsing C:\Documents and Settings\Meus documentos\EA_Documentos\Modelos\Environment\class\Model.h on line 99. Unexpected symbol: ISIMModel
You may need to define a language macro.
There are many more of these.
This is the corresponding code in CSIMEnvironmentModel.h
class SIMMDLENVv01_EXPORT CSIMEnvironmentModel // line 99
: public ISIMModel
, public ISIMEventSource
, public ISIMScheduledModel
, public ISIMExecut
, public ISIMPublisher
{
public:
CSIMEnvironmentModel(const std::string &a_modelType);
virtual ~CSIMEnvironmentModel(void);
and CSIMEnvFactory.h
class SIMMDLENVv01_EXPORT CSIMEnvFactory // line 11
: public ISIMFactory
{
public:
CSIMEnvFactory();
virtual ~CSIMEnvFactory(void);
std::vector<ISIMModel*> InstanceModel(const std::string &a_modelType, const std::string &a_conf);
};
What's the reason for this error message?
Your code uses a macro (SIMMDLENVv01_EXPORT) that isn't part of EA's standard macro definitions (there's a whole lot of them, mostly covering ATL and MFC).
You'll need to add additional ones under 'Settings->Language Macros' (as the hint in the error message suggests).
NOTE
Use the syntax MACRO() when declaring macros that were #defined to receive any number of arguments.
If you're trying to reverse engineer framework libraries like Qt or the like, you'll need to define many of these macros before you can reverse engineer the code without getting errors.
In that case, you should perhaps think of a different strategy for referencing these types and classes in your model.
Another workaround might be to solely preprocess all the code you want to import first, and import from the preprocessed results.
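For context, here is a sketch of why the parser stumbles. The macro definition below is an assumption (the real one lives in a project header not shown in the question), and `ISIMFactory` and `name()` are simplified stand-ins. Export macros of this kind usually expand to a compiler attribute or to nothing, so the compiler never sees the extra identifier, but a tool that reads the source without preprocessing sees two names after `class`.

```cpp
#include <string>

// Assumed definition of the export macro; hypothetical, for illustration only.
#if defined(_WIN32)
#  define SIMMDLENVv01_EXPORT __declspec(dllexport)
#else
#  define SIMMDLENVv01_EXPORT
#endif

// Simplified stand-in for the interface from the question.
class ISIMFactory {
public:
    virtual ~ISIMFactory() = default;
};

// After preprocessing this reads "class CSIMEnvFactory : public ISIMFactory"
// (possibly with an attribute), which is what a parser expects to see.
class SIMMDLENVv01_EXPORT CSIMEnvFactory : public ISIMFactory {
public:
    std::string name() const { return "env"; }  // hypothetical member
};
```

Registering SIMMDLENVv01_EXPORT as a language macro tells EA to skip the token, which has the same effect as the empty expansion above.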

Sphinx: Correct way to document an enum?

Looking through the C and C++ domains of Sphinx, it doesn't appear to have native support for documenting enums (and much less anonymous enums). As of now, I use cpp:type:: for the enum type, and then a list of all possible values and their description, but that doesn't seem like an ideal way to deal with it, especially since it makes referencing certain values a pain (either I reference just the type, or add an extra marker in front of the value).
Is there a better way to do this? And how would I go about handling anonymous enums?
A project on Github, spdylay, seems to have an approach. One of the header files at
https://github.com/tatsuhiro-t/spdylay/blob/master/lib/includes/spdylay/spdylay.h
has code like this:
/**
* #enum
* Error codes used in the Spdylay library.
*/
typedef enum {
/**
* Invalid argument passed.
*/
SPDYLAY_ERR_INVALID_ARGUMENT = -501,
/**
* Zlib error.
*/
SPDYLAY_ERR_ZLIB = -502,
} spdylay_error;
There is some description of how they're doing it at https://github.com/tatsuhiro-t/spdylay/tree/master/doc, which includes using an API generator called mkapiref.py, available at
https://github.com/tatsuhiro-t/spdylay/blob/master/doc/mkapiref.py
The RST it generates for this example is
.. type:: spdylay_error
Error codes used in the Spdylay library.
.. macro:: SPDYLAY_ERR_INVALID_ARGUMENT
(``-501``)
Invalid argument passed.
.. macro:: SPDYLAY_ERR_ZLIB
(``-502``)
Zlib error.
You could take a look and see if it's useful for you.
Sphinx now has support for enums.
Here is an example with enum values:
.. enum-class:: partition_affinity_domain
.. enumerator:: \
not_applicable
numa
L4_cache
L3_cache
L2_cache
L1_cache
next_partitionab
Hi. Maybe you should consider using Doxygen for documentation, as it has a lot more native support for C and C++. If you want to retain the Sphinx output of your documentation, you can have Doxygen output XML; Breathe will then take that XML and give you the same Sphinx output you are used to having.
Here is an example of documenting an enum in doxygen format from the breathe website.
//! Our toolset
/*! The various tools we can opt to use to crack this particular nut */
enum Tool
{
kHammer = 0, //!< What? It does the job
kNutCrackers, //!< Boring
kNinjaThrowingStars //!< Stealthy
};
Hope this helps.

Error parsing when reversing code

I try to make a class diagram from existing C++ code using Enterprise Architect 9.3.935. I do Code Engineering / Import Source Directory and then select my directory.
However, I get tons of errors of this type:
"There was an error parsing C:\xxxxx # on line xxxx. Unexpected symbol: XXXXX.
You may need to define a language macro."
In the code, I have a macro for exporting from a DLL, and most of my classes look like:
class MACRO_FOR_DLL_EXPORT CMyClassName
{
...
}
or
class MACRO_FOR_DLL_EXPORT CMyClassName : public CHerMother
{
...
}
The unexpected symbol is usually "{" in the first case and "CHerMother" in the second.
How can I fix this issue? Is it related to the macro?
You can declare language-specific macros in your EA project so that they are ignored when reverse engineering (parsing) code. A number of standard C/C++ framework macros are predefined natively by EA.