How to suppress C++ keywords? - c++

I'm generating C++ code and run into issues when the model being generated from has properties clashing with C++ keywords. I'd prefer the model to stay language agnostic.
I've tried some #define int ReSeRvEd_int hacks local to the generated code, but it just feels wrong to allocate other symbols: the problem does not really go away, and in either case cross-referencing between the generated code and the model becomes more difficult.
Any suggestions on how to suppress or hide keywords?

I can think of a couple of approaches:
Add a standard prefix or suffix to all generated tokens. So rather than properties named "steve" and "int" producing variables named steve and int respectively, they would produce prop_steve and prop_int.
Force generated tokens to be capitalized.
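A minimal sketch of both approaches as generator-side helpers, assuming the generator is itself written in C++ and receives property names that are already valid identifiers. The function names and the prop_ prefix are hypothetical. Capitalizing works because every C++ keyword, including alternative tokens such as and and bitor, is spelled in lowercase.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Approach 1 (hypothetical helper): give every generated identifier the same prefix.
// "steve" -> "prop_steve", "int" -> "prop_int". The rule is uniform, so users can
// still map generated names back to model properties.
// Caveat: a property starting with '_' would yield "prop__...", and C++ reserves
// identifiers that contain a double underscore.
std::string prefixed_name(const std::string& property) {
    return "prop_" + property;
}

// Approach 2 (hypothetical helper): capitalize the first letter.
// "int" -> "Int". No C++ keyword starts with an uppercase letter.
std::string capitalized_name(std::string property) {
    if (!property.empty()) {
        property[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(property[0])));
    }
    return property;
}
```

One difference worth weighing: capitalizing alone can merge two model properties that differ only in the case of their first letter ("count" and "Count" both become Count), whereas the prefix keeps them distinct.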
Two things that I would not do:
Try to make the parser okay with a property named int, as you seem to be trying to do above. In addition to violating the Principle of Least Astonishment, this is not legal.
Have a hardcoded remapping from, say, "int" to innt. Ugly, inconsistent, and (assuming the generated code interfaces with user-written code) forces the user to memorize the remappings.

Related

Enable Sublime Text 3 or 4's syntax highlighting for custom types

I'm trying to set a specific color in Sublime Text (coding C/C++) for custom types, such as ones created with typedefs. By default, ST3/4 seems to treat these custom types as variable names and colors them the same way.
I found this question, which is essentially the same, but it's about Vim; mine is specific to ST.
My problem is identical:
Here you can see that InitMyStruct takes a pointer to MyStruct as an argument and returns one, but MyStruct is not colored like a type, which is misleading in my opinion.
Is there a way to make Sublime Text 3 or 4 change the color of these custom types? I have the same request for references to #defined names: with #define ERROR_CODE (-1), every later use of ERROR_CODE appears as an ordinary variable.
I tried looking through the color scheme preferences, but there is no Sublime scope for that. I doubt this is a problem with my theme or color scheme, since I can't find a single color scheme that handles it.
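The screenshot the question refers to is not included, so here is a minimal snippet of the kind it describes. MyStruct, InitMyStruct, and ERROR_CODE come from the question; the struct members and the function body are hypothetical. On the lines that merely use MyStruct or ERROR_CODE, nothing in the characters themselves says that one names a type and the other a macro, which is what a line-oriented, regex-based highlighter has to work with.

```cpp
#include <cassert>

// Custom type created with a typedef (members are hypothetical).
typedef struct MyStruct {
    int id;
    int status;
} MyStruct;

// Macro constant from the question.
#define ERROR_CODE (-1)

// Takes a pointer to MyStruct and returns one. On this line a regex-based
// highlighter cannot tell that MyStruct names a type, so it may color it
// like a variable.
MyStruct* InitMyStruct(MyStruct* s) {
    s->id = 0;
    s->status = ERROR_CODE;  // likewise just an identifier to the highlighter
    return s;
}
```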
They aren't colored as types because they aren't built in: you declared them elsewhere in the file, or perhaps even elsewhere in the project. Regex-based syntax highlighting (mostly) only knows about the characters on a given line, so it can only assign scopes to what it can anticipate a priori from the syntax alone.
Your best bet for getting relevant scopes for these types would be LSP (the Language Server Protocol). I haven't used it, though, so I'm not sure whether it will actually help in this scenario.

Boost.Spirit adding #include feature into calculator example

Following the Boost.Spirit compiler examples, I am migrating my Flex/Bison-based, calculator-like grammar to Spirit. I want to add an #include<another_input.inp> feature, and I have defined the include_statement grammar successfully. Should I follow the way error handling is done in the examples, i.e. on_success(include_statement, annotation_function(...)), so that for each successful match of include_statement I get the new input file name and call phrase_parse() again? Or should I push and pop an input stack, as with Flex/Bison?
Thanks.
Going by the little information here, I'm guessing you meant to ask whether you can reuse the same grammar instance to parse the includes, or whether it's better to instantiate a new one. It depends.
You can do both.
When the grammar is stateless (hint: it usually is if you can use it const) there's no difference. Otherwise, prefer to instantiate a separate instance.
However,
the point is somewhat moot since it appears you already decided to parse the includes after parsing the main document (if I get your comment right)
there's always the danger of global state: even if the grammar object is const, you could still modify external state (e.g. using phx::ref from a semantic action), so this would be an issue regardless of whether you used separate grammar instances.
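The answer does not show code, so here is a library-free sketch of the recursive option; it does not use Boost.Spirit at all. It assumes each include sits on its own line as #include<name>, and Sources, parse_input, and the in-memory file map are hypothetical stand-ins. Each included file is handled by a fresh recursive call, which corresponds to calling phrase_parse() again, while the open vector plays the role of the Flex/Bison input stack and is used here only to reject include cycles.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory "file system": file name -> contents.
using Sources = std::map<std::string, std::string>;

// Parses one input. Ordinary lines are collected as statements (a real grammar
// would parse them); each "#include<name>" line triggers a fresh recursive parse
// of the named input, similar to calling phrase_parse() again.
// 'open' is the chain of inputs currently being parsed (pushed on entry,
// popped on exit) and is used to detect include cycles.
void parse_input(const Sources& files, const std::string& name,
                 std::vector<std::string>& open,
                 std::vector<std::string>& statements) {
    if (std::find(open.begin(), open.end(), name) != open.end())
        throw std::runtime_error("include cycle at " + name);
    const auto it = files.find(name);
    if (it == files.end())
        throw std::runtime_error("cannot open " + name);

    open.push_back(name);
    std::istringstream in(it->second);
    const std::string directive = "#include<";
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, directive.size(), directive) == 0 && line.back() == '>') {
            const std::string target =
                line.substr(directive.size(), line.size() - directive.size() - 1);
            parse_input(files, target, open, statements);
        } else if (!line.empty()) {
            statements.push_back(line);
        }
    }
    open.pop_back();
}
```

Whether the recursive call reuses one grammar object or builds a new one is exactly the choice discussed above: with a stateless grammar, either works.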

C++ Naming variables [duplicate]

This question already has answers here: Variable Naming Conventions in C++ (12 answers). Closed 9 years ago.
In many code examples that I've seen, variables are named in a specific way.
E.g.
class obj
{
    int mInt;
};
or
bool gTexture;
Questions:
Why do they name them in such way, and there are for sure more ways, I think...
How do you name them, and why?
Thank you
The m in mInt represents that the int is a member variable, while the g in gTexture denotes the variable being global.
This comes from Hungarian Notation.
http://en.wikipedia.org/wiki/Hungarian_notation
Naming is personal. To answer your second question, I don't use such a naming convention; I append an underscore to class attributes.
Companies often have naming conventions. You may want to have a look at Google's naming conventions: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#General_Naming_Rules
The example you have given uses 'm' for member variables and 'g' for globals. Some people use this convention. In a member function that is longer than a few lines (so you can't just glance at the top to see the parameters, local variables, and so on), it makes it easy to tell what is a "local variable" and what affects things "outside of the function".
If you work for a company, in a school, or on an open source project, there is most likely a coding standard that specifies the naming convention. If it's your personal project, decide on something that works for you. The main point is consistency. If not ALL member variables start with 'm', and not all global variables start with 'g', then it's pretty pointless to use the prefixes in some places; it just gives a false sense of security.
You don't have to follow a specific notation, but it helps if you do.
It's all about the clarity of your code: at first glance, when you quickly scan a piece of code, a variable name without any capitalization is harder to understand than one that follows a good convention.
For clear code, I can recommend Google's C++ style guide: http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
Why do they name them in such way, and there are for sure more ways, I think...
Generally, it is difficult to understand other people's code; if enough time passes, it is difficult to understand your own code as well.
Because of this, software teams set up conventions to make sure the code their team writes is as similar as possible to the code they themselves would have written.
This refers to structuring code, used elements (interfaces, classes, namespaces, etc), naming functions and variables, what to document and in which format, and so on.
When done properly and consistently, this significantly shortens code maintenance time within a team.
There are a few well-known conventions, mostly derived from those used in large code bases and widely used libraries.
Java tends to use camelCaseNotation (start with a lowercase letter, use no underscores, capitalize each subsequent word).
MFC used the Hungarian notation, where variable names are prefixed with a few letters specifying scope and type of data (m_XXX for member variables, g_XXX for globals, s_XXX for statics, etc).
In particular the Hungarian convention can be gotten right (by using prefixes for semantic information) or horribly wrong (by using prefixes for syntactical information).
(MFC got it horribly wrong.)
ANSI C++ (and the std:: namespace) tends to use small_letters_with_underscores for identifiers.
There are others and most software teams set up a convention that is a variation of one of the big ones.
How do you name them, and why?
These days I follow the ANSI C++ conventions, mostly because I want my code to integrate seamlessly with library code. I also think it looks simple and obvious (and this is very subjective).
I rarely use one-letter variables (only when the meaning is clear) and prefer full words to shortened ones.
Examples:
indexes: int index, line_index, col_index;
class names: class recordset; class task_details; etc.
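To put the conventions mentioned in this thread side by side, here is a small sketch; all names are hypothetical, and none of these styles is required by the language. The first class uses Hungarian-style scope prefixes (m_, g_, s_, as in MFC code and, without the underscore, the question's mInt and gTexture); the second uses standard-library-like lowercase names with a trailing underscore for data members.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hungarian-style scope prefixes. All names here are hypothetical.
int g_instance_count = 0;            // g_: global

class TaskDetails {
public:
    explicit TaskDetails(std::string name) : m_name(std::move(name)) {
        ++g_instance_count;
        ++s_created;
    }
    const std::string& Name() const { return m_name; }
    static int Created() { return s_created; }
private:
    std::string m_name;              // m_: member
    static int s_created;            // s_: static
};
int TaskDetails::s_created = 0;

// Standard-library-like style: lowercase_with_underscores for types and
// functions, plus a trailing underscore for data members (one common variant).
class task_details {
public:
    explicit task_details(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};
```

Which style to pick matters less than applying one consistently.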
Not a real question. Everyone names them as they like. You may want to read these guidelines, though: http://msdn.microsoft.com/en-us/library/vstudio/ms229045(v=vs.100).aspx

Is it possible to strip type names from executable while keeping RTTI enabled?

I recently disabled RTTI on my compiler (MSVC10) and the executable size decreased significantly. By comparing the produced executables in a text editor, I found that the RTTI-less version contains far fewer symbol names, which explains the saved space.
AFAIK, those symbol names are only used to fill the type_info structure associated with each polymorphic type, and one can access them programmatically by calling type_info::name().
According to the standard, the format of the string returned by type_info::name() is implementation-defined. That is, no one can rely on it to do serious things portably. So, it should be possible for an implementation to always return an empty string without breaking anything, thus reducing the executable size without disabling RTTI support (so we can still use the typeid operator & compare type_info's objects safely).
But... is it possible? I'm using MSVC10 and I haven't found any option to do that. I can either disable RTTI completely (/GR-) or enable it with full type names (/GR). Does any compiler provide such an option?
So, it should be possible for an implementation to always return an empty string without breaking anything, thus reducing the executable size without disabling RTTI support (so we can still use the typeid operator & compare type_info's objects safely).
You are misreading the standard. The intent of making the return value of type_info::name() implementation-defined (beyond being a null-terminated byte string) was to give the implementers of the compiler, library, and run-time environment free rein to implement the RTTI requirements as they see fit. You, the programmer, have no say in how the Application Binary Interface (if there is one) is designed or implemented.
You're asking three different questions here.
The initial question asks whether there's any way to get MSVC to not generate names, or whether it's possible with other compilers, or, failing that, whether there's any way to strip the names out of the generated type_info without breaking things.
Then you want to know whether it would be possible to modify the MS ABI (presumably not too radically) so that it would be possible to strip the names.
Finally, you want to know whether it would be possible to design an ABI that didn't have names.
Question #1 is itself a complex question. As far as I know, there's no way to get MSVC to not generate names. And most other compilers are aimed at ABIs that specifically define what typeid(foo).name() must return, so they also can't be made to not generate names.
The more interesting question is what happens if you strip out the names. For MSVC, I don't know the answer. The best thing to do here is probably to try it: go into your DLLs, change the first character of each name to \0, and see whether it breaks dynamic_cast, etc. (I know that you can do this with Mac and Linux x86_64 executables generated by g++ 4.2 and it works, but let's put that aside for now.)
On to question #2, assuming blanking the names doesn't work, it wouldn't be that hard to modify a name-based system to no longer require names. One trivial solution is to use hashes of the names, or even ROT13-encoded names (remember that the original goal here is "I don't want casual users to see the embarrassing names of my classes"). But I'm not sure that would count for what you're looking for. A slightly more complex solution is as follows:
For "dllexport"ed classes, generate a UUID, put that in the typeinfo, and also put it in the .LIB import library that gets generated along with the DLL.
For "dllimport"ed classes, read the UUID out of the .LIB and use that instead.
So, if you manage to get the dllexport/dllimport right, it will work, because your exe will be using the same UUID as the dll. But what if you don't? What if you "accidentally" specify identical classes (e.g., an instantiation of the same template with the same parameters) in your DLL and your EXE, without marking one as dllexport and one as dllimport? RTTI won't see them as the same type.
Is this a problem? Well, the C++ standard doesn't say it is. And neither does any MS documentation. In fact, the documentation explicitly says that you're not allowed to do this. You cannot use the same class or function in two different modules unless you explicitly export it from one module and import it into another. The fact that this is very hard to do with class templates is a problem, and it's a problem they don't try to solve.
Let's take a realistic example: Create a node-based linkedlist class template with a global static sentinel, where every list's last node points to that sentinel, and the end() function just returns a pointer to it. (Microsoft's own implementation of std::map used to do exactly this; I'm not sure if that's still true.) New up a linkedlist<int> in your exe, and pass it by reference to a function in your dll that tries to iterate from l.begin() to l.end(). It will never finish, because none of the nodes created by the exe will point to the copy of the sentinel in the dll. Of course if you pass l.begin() and l.end() into the DLL, instead of passing l itself, you won't have this problem. You can usually get away with passing a std::string or various other types by reference, just because they don't depend on anything that breaks. But you're not actually allowed to do so, you're just getting lucky. So, while replacing the names with UUIDs that have to be looked up at link time means types can't be matched up at link-loader time, the fact that types already can't be matched up at link-loader time means this is irrelevant.
It would be possible to build a name-based system that didn't have these problems. The ARM C++ ABI (and the iOS and Android ABIs based on it) restricts what programmers can get away with much less than MS, and has very specific requirements on how the link-loader has to make it work (3.2.5). This one couldn't be modified to not be name-based because it was an explicit choice in the design that:
• type_info::operator== and type_info::operator!= compare the strings returned by type_info::name(), not just the pointers to the RTTI objects and their names.
• No reliance is placed on the address returned by type_info::name(). (That is, t1.name() != t2.name() does not imply that t1 != t2).
The first condition effectively requires that these operators (and type_info::before()) must be called out of line, and that the execution environment must provide appropriate implementations of them.
But it's also possible to build an ABI that doesn't have this problem and that doesn't use names. Which segues nicely to #3.
The Itanium ABI (used by, among other things, both OS X and recent linux on x86_64 and i386) does guarantee that a linkedlist<int> generated in one object and a linkedlist<int> generated from the same header in another object can be linked together at runtime and will be the same type, which means they must have equal type_info objects. From 2.9.1:
It is intended that two type_info pointers point to equivalent type descriptions if and only if the pointers are equal. An implementation must satisfy this constraint, e.g. by using symbol preemption, COMDAT sections, or other mechanisms.
The compiler, linker, and link-loader must work together to make sure that a linkedlist<int> created in your executable points to the exact same type_info object that a linkedlist<int> created in your shared object would.
So, if you just took out all the names, it wouldn't make any difference at all. (And this is pretty easily tested and verified.)
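The principle that type identity can come from a unique object's address rather than from a name can be sketched in portable C++. This is a hypothetical illustration, not how any particular compiler implements type_info: each type gets one static object, and its address serves as the type's ID, with no name stored anywhere. Within one program the language guarantees a single such object per type; across shared libraries, the same guarantee needs exactly the kind of compiler, linker, and loader cooperation (vague linkage, COMDAT sections, symbol preemption) that the Itanium text above requires.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical name-free type ID: exactly one static object per type T.
// The member is non-const so a linker cannot fold the objects for different
// types together as identical read-only data.
template <typename T>
struct type_tag {
    static char anchor;
};
template <typename T>
char type_tag<T>::anchor;

// The address of type_tag<T>::anchor identifies T. Comparing two IDs is a
// pointer comparison, and no type name is stored anywhere.
template <typename T>
const void* type_id() {
    return &type_tag<T>::anchor;
}
```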
But how could you possibly implement this ABI spec? j_kubik effectively argues that it's impossible because you'd have to preserve some link-time information in the .so files. Which points to the obvious answer: preserve some link-time information in the .so files. In fact, you already have to do that to handle, e.g., load-time relocations; this just extends what you need to preserve. And in fact, both Apple and GNU/linux/g++/ELF do exactly that. (This is part of the reason everyone building complex linux systems had to learn about symbol visibility and vague linkage a few years ago.)
There's an even more obvious way to solve the problem: Write a C++-based link loader, instead of trying to make the C++ compiler and linker work together to trick a C-based link loader. But as far as I know, nobody's tried that since Be.
Requirements for a type descriptor:
Works correctly in a multi-compilation-unit and shared-library environment;
Works correctly for different versions of shared libraries;
Works correctly even though different compilation units don't share any information about a type except its name: usually one header is used in all compilation units to define the same type, but that's not required, and even when it is done, it doesn't affect the resulting object file;
Works correctly despite the fact that template instantiations must be fully defined (including their type_info data) in every library that uses them, and yet must behave like one type if several such libraries are used together.
The fourth rule essentially bans all non-name based type-descriptors like UUIDs (unless specifically mentioned in type definition, but that is just name-replacement at best, and probably requires standard-alterations).
Storing those UUIDs in separate files, like the suggested .LIB files, also causes trouble: different library versions that implement new types would break the scheme.
Compilation units should be able to share the same type (and its type_info) without the need to involve linker - because it should stay free of any language-specifics.
So the type name is the only possible unique type descriptor short of completely re-modeling compilation and (static and dynamic) linking. I could imagine such a scheme working, but not under the current one.

Is there a tool to add the "override" identifier to existing C++ code

The task
I am trying to work out how best to add C++0x's override identifier to all existing methods that are already overrides in a large body of C++ code, without doing it manually.
(We have many, many hundreds of thousands of lines of code, and doing it manually would be a complete non-starter.)
Current idea
Our coding standards say that we should add the virtual keyword against all implicitly virtual methods in derived classes, even though strictly unnecessary (to aid comprehension).
So if I were to script the addition myself, I'd write a script that read all our headers, found all functions beginning with virtual, and inserted override before the following semicolon. Then I'd compile the code on a compiler that supports override and fix all the errors in base classes.
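A rough, single-line version of that home-grown approach, shown mainly to make its limits concrete. add_override is a hypothetical helper, and it assumes each virtual declaration fits on one line ending in a semicolon. It skips lines that already mention override and places the specifier before a trailing = 0 (or = default / = delete), because override must come before the pure-specifier.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns 'line' with " override" added if it looks like a
// one-line virtual member declaration ending in ';'. Multi-line declarations,
// inline bodies, and overrides written without 'virtual' are all missed, which
// is why this approach is error-prone. Base-class declarations get 'override'
// too, so the compile step is still needed to find and undo those.
std::string add_override(const std::string& line) {
    const auto first = line.find_first_not_of(" \t");
    if (first == std::string::npos || line.compare(first, 8, "virtual ") != 0)
        return line;                                   // not a virtual declaration
    if (line.find("override") != std::string::npos)
        return line;                                   // already marked
    const auto semi = line.rfind(';');
    if (semi == std::string::npos)
        return line;                                   // inline body or multi-line

    // 'override' must precede "= 0", so insert before an '=' that follows the
    // last ')' (an '=' inside the parentheses is a default argument).
    auto insert_at = semi;
    const auto eq = line.rfind('=', semi);
    const auto rparen = line.rfind(')', semi);
    if (eq != std::string::npos && rparen != std::string::npos && eq > rparen)
        insert_at = eq;
    while (insert_at > 0 && line[insert_at - 1] == ' ')
        --insert_at;                                   // keep spacing tidy
    return line.substr(0, insert_at) + " override" + line.substr(insert_at);
}
```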
But I'd really much rather not use this home-grown way, as:
it's obviously going to be tedious and error-prone.
not everyone has remembered, every time, to add the virtual keyword, so this method would miss out some existing overrides
Is there an existing tool?
So, is there already a tool that parses C++ code, detects existing methods that are overrides, and appends override to their declarations?
(I am aware of static analysis tools such as PC-lint that warn about functions that look like they should be overrides. What I'm after is something that would actually munge our code, so that future errors in overrides will be detected at compile time, rather than later on in static analysis.)
(In case anyone is tempted to point out that C++03 doesn't support 'override'... In practice, I'd be adding a macro, rather than the actual "override" identifier, to use our code on older compilers that don't support this feature. So after the identifier was added, I'd run a separate script to replace it with whatever macro we're going to use...)
Thanks in advance...
There is a tool under development by the LLVM project called "cpp11-migrate" which currently has the following features:
convert loops to range-based for loops
convert null pointer constants (like NULL or 0) to C++11 nullptr
replace the type specifier in variable declarations with the auto type specifier
add the override specifier to applicable member functions
This tool is documented here and should be released as part of clang 3.3.
However, you can download the source and build it yourself today.
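For illustration, a hypothetical before/after pair of the kind of edit this override transformation makes (in current LLVM releases it is Clang-Tidy's modernize-use-override check). Per that check's documentation, it adds override to functions that really do override a base-class virtual and also removes the now-redundant virtual, which would run against a coding standard that requires repeating virtual. The class names are made up.

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
    virtual double area() const = 0;
};

// Before (following the "repeat virtual in derived classes" standard):
//   struct Square : Shape {
//       virtual std::string name() const { return "square"; }
//       virtual double area() const { return side * side; }
//       double side = 1.0;
//   };
// After the fix-it: 'override' added, redundant 'virtual' dropped.
struct Square : Shape {
    std::string name() const override { return "square"; }
    double area() const override { return side * side; }
    double side = 1.0;
};
```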
Edit
Some more info:
Status of the C++11 Migrator - a blog post, dated 2013-04-15
cpp11-migrate User’s Manual
Edit 2: 2013-09-07
"cpp11-migrate" has been renamed to "clang-modernize". For windows users, it is now included in the new LLVM Snapshot Builds.
Edit 3: 2020-10-07
"clang-modernize" has since been superseded by Clang-Tidy, where this transformation is available as the modernize-use-override check.
Our DMS Software Reengineering Toolkit with its C++11-capable C++ Front End can do this.
DMS is a general purpose program transformation system for arbitrary programming languages; the C++ front end allows it to process C++. DMS parses, builds ASTs and symbol tables that are accurate (this is hard to do for C++), provides support for querying properties of the AST nodes and trees, allows procedural and source-to-source transformations on the tree. After all changes are made, the modified tree can be regenerated with comments retained.
Your problem requires that you find derived virtual methods and change them. A DMS source-to-source transformation rule to do that would look something like:
source domain Cpp. -- tells DMS the following rules are for C++
rule insert_override_keyword (n:identifier, a: arguments, s: statements):
   method_declaration -> method_declaration =
   " void \n(\a) { \s } " -> " void \n(\a) override { \s }"
   if is_implicitly_virtual(n).
Such rules match against the syntax trees, so they can't mismatch to a comment, string, or whatever. The funny quotes are not C++ string quotes; they are meta-quotes to allow the rule language to know that what is inside them has to be treated as target language ("Cpp") syntax. The backslashes are escapes from the target language text, allowing matches to arbitrary structures e.g., \a indicates a need for an "a", which is defined to be the syntactic category "arguments".
You'd need more rules to handle cases where the function returns a non-void result, etc. but you shouldn't need a lot of them.
The fun part is implementing the predicate (returning TRUE or FALSE) controlling application of the transformation: is_implicitly_virtual. This predicate takes (an abstract syntax tree for) the method name n.
This predicate would consult the full C++ symbol table to determine what n really is. We already know it is a method from just its syntactic setting, but we want to know in what class context.
The symbol table provides the linkage between the method and class, and the symbol table information for the class tells us what the class inherits from, and for those classes, which methods they contain and how they are declared, eventually leading to the discovery (or not) that the parent class method is virtual. The code to do this has to be implemented as procedural code going against the C++ symbol table API. However, all the hard work is done; the symbol table is correct and contains references to all the other data needed. (If you don't have this information, you can't possibly decide algorithmically, and any code changes will likely be erroneous).
DMS has been used to carry out massive changes on C++ code in the past using program transformations. (Check the Papers page at the web site for C++ rearchitecting topics.)
(I'm not a C++ expert, merely the DMS architect, so if I have minor detail wrong, please forgive.)
I did something like this a few months ago with about 3 MB worth of code, and while you say that "doing it manually would be a complete non-starter," I think it is the only way. The reason is that you should be applying the override keyword to the prototypes that are intended to override base class methods. Any tool that adds it will put it on the prototypes that actually override base class methods. The compiler already knows which methods those are, so adding the keyword doesn't change anything. (Please note that I am not terribly familiar with the new standard, and I am assuming the override keyword is optional. Visual Studio has supported override since at least VS2005.)
I used a search for "virtual" in the header files to find most of them and I still occasionally find another prototype that is missing the override keyword.
I found two bugs by going through that.
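Here is a hypothetical example of the kind of bug that a hand-placed override catches and that a tool inferring override from the current code cannot. Derived::value was meant to override Base::value, but the missing const makes it a separate function that merely hides the base version. A tool would not mark it, whereas writing override on it by hand turns the silent bug into a compile error.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int value() const { return 1; }
};

struct Derived : Base {
    // Intended to override Base::value, but the missing 'const' makes it a new,
    // unrelated function. Writing
    //     virtual int value() override { return 2; }
    // would fail to compile; an automatic tool would not add override here.
    virtual int value() { return 2; }
};
```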
Eclipse CDT has a working C++ parser and semantic utilities. The latest version IIRC also has markers for overriding methods.
It wouldn't take much code to write a plug-in based on that, which rewrites the code to add override where appropriate.
One option is to enable the -Wsuggest-override compiler warning and then write a script that inserts the override keyword at each location reported by the warnings.
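A starting point for the script half of that suggestion, assuming GCC-style diagnostics of the form file:line:column: warning: ... [-Wsuggest-override]. Location and parse_suggest_override are hypothetical names. The sketch only extracts the reported locations; inserting override at each one still needs care, because the reported column typically points at the function name rather than at the spot where override belongs.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical record of one reported location.
struct Location {
    std::string file;
    int line = 0;
    int column = 0;
};

// Parses one diagnostic line such as
//   src/shape.h:12:18: warning: '...' can be marked override [-Wsuggest-override]
// Returns false for lines that are not -Wsuggest-override warnings.
// Locating ": warning:" and splitting from the right keeps Windows paths
// with a drive letter ("C:\...") intact.
bool parse_suggest_override(const std::string& text, Location& out) {
    if (text.find("[-Wsuggest-override]") == std::string::npos)
        return false;
    const auto warn = text.find(": warning:");
    if (warn == std::string::npos)
        return false;
    const std::string where = text.substr(0, warn);   // "file:line:col"
    const auto col_sep = where.rfind(':');
    if (col_sep == std::string::npos || col_sep == 0)
        return false;
    const auto line_sep = where.rfind(':', col_sep - 1);
    if (line_sep == std::string::npos)
        return false;
    try {
        out.file = where.substr(0, line_sep);
        out.line = std::stoi(where.substr(line_sep + 1, col_sep - line_sep - 1));
        out.column = std::stoi(where.substr(col_sep + 1));
    } catch (const std::exception&) {                 // non-numeric fields
        return false;
    }
    return true;
}
```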