Replace C style comments by C++ style comments - c++

How can I automatically replace all C style comments (/* comment */) by C++ style comments (// comment)?
This has to be done automatically in several files. Any solution is okay, as long as it works.

This tool does the job:
https://github.com/cenit/jburkardt/tree/master/recomment
RECOMMENT is a C++ program which
converts C style comments to C++ style
comments.
It also handles all the non-trivial cases mentioned by other people:
This code incorporates suggestions and
coding provided on 28 April 2005 by
Steven Martin of JDS Uniphase,
Melbourne Florida. These suggestions
allow the program to ignore the
internal contents of strings, (which
might otherwise seem to begin or end
comments), to handle lines of code
with trailing comments, and to handle
comments with trailing bits of code.

This is not a trivial problem.
int * /* foo
/* this is not the beginning of a comment.
int * */ var = NULL;
What do you want to replace that with? Any real substitution requires sometimes splitting lines.
int * // foo
// this is not the beginning of a comment.
// int *
var = NULL;

How do you intend to handle situations like this:
void CreateExportableDataTable(/*[out, retval]*/ IDispatch **ppVal)
{
//blah
}
Note the comment inside the parens... this is a common way of documenting things in generated code, or mentioning default parameter values in the implementation of a class, etc. I'm usually not a fan of such uses of comments, but they are common and need to be considered. I don't think you can convert them to C++ style comments without doing some heavy thinking.

I'm with the people who commented in your question. Why do it? Just leave it.
it wastes time, adds useless commits to version control, risk of screwing up
EDIT:
Adding details from the comments from the OP
The fundamental reason of preferring C++-style comment is that you can comment out a block of code which may have comments in it. If that comment is in C-style, this block-comment-out of code is not straight forward. – unknown (yahoo)
that might be a fair/ok thing to want to do, but I have two comments about that:
I know of no one who would advocate changing all existing code - that is a preference for new code. (IMO)
If you feel the need to "comment out code" (another iffy practice) then you can do it as needed - not before
It also appears that you want to use the c-style comments to block out a section of code? Or are you going to use the // to block out many lines?
One alternative is a preprocessor #ifdef for that situation. I cringe at that but it is just as bad as commenting out lines/blocks. Neither should be left in the production code.

I recently converted all C-style comments to C++-style for all files in our repository. Since I could not find a tool that would do it automatically, I wrote my own: c-comments-to-cpp
It is not fool-proof, but way better than anything else I've tried (including RECOMMENT). Among other things, it supports converting Doxygen style comments, for instance:
/**
* #brief My foo struct.
*/
struct foo {
int bar; /*!< This is a member.
It also has a meaning. */
};
Gets converted to:
/// #brief My foo struct.
struct foo {
int bar; ///< This is a member.
///< It also has a meaning.
};

Here's a Python script that will (mostly) do the job. It handles most edge cases, but it does not handle comment characters inside of strings, although that should be easy to fix.
#!/usr/bin/python
import sys
out = ''
in_comment = False
file = open(sys.argv[1], 'r+')
for line in file:
if in_comment:
end = line.find('*/')
if end != -1:
out += '//' + line[:end] + '\n'
out += ' ' * (end + 2) + line[end+2:]
in_comment = False
else:
out += '//' + line
else:
start = line.find('/*')
cpp_start = line.find('//')
if start != -1 and (cpp_start == -1 or cpp_start > start):
out += line[:start] + '//' + line[start+2:]
in_comment = True
else:
out += line
file.seek(0)
file.write(out)

Why don't you write a C app to parse it's own source files? You could find the /* comments */ sections with a relatively easy Regex query. You could then replace the new line characters with new line character + "//".
Anyway, just a thought. Good luck with that.

If you write an application/script to process the C source files, here are some things to be careful of:
comment characters within strings
comment characters in the middle of a line (you might not want to split the code line)
You might be better off trying to find an application that understands how to actually parse the code as code.

There are a few suggestions that you might like to try out:
a)Write your own code (C/ Python/ any language you like) to replace the comments. Something along the lines of what regex said or this naive solution 'might' work:
[Barring cases like the one rmeador, Darron posted]
for line in file:
if line[0] == "\*":
buf = '//' + all charachters in the line except '\*'
flag = True
if flag = True:
if line ends with '*/':
strip off '*/'
flag = False
add '//' + line to buf
b)Find a tool to do it. (I'll look up some and post, if I find them.)
c)Almost all modern IDE's (if you are using one) or text editors have an auto comment feature. You can then manually open up each file, select comment lines, decide how to handle the situation and comment C++ style using an accelerator (say Ctrl + M). Then, you can simply 'Find and Replace' all "/*" and "*/", again using your judgment. I have Gedit configured to do this using the "Code Comment' plugin. I don't remember the way I did it in Vim off hand. I am sure this one can be found easily.

If there are just "several files" is it really necessary to write a program? Opening it up in a text editor might do the trick quicker in practice, unless there's a whole load of comments. emacs has a comment-region command that (unsurprisingly) comments a region, so it'd just be a case of ditching the offending '/*' and '*/'.

Very old question, I know, but I just achieved this using "pure emacs". In short, the solution looks as follows:
Run M-x query-replace-regexp. When prompted, enter
/\*\(\(.\|^J\)*?\)*\*/
as the regex to search for. The ^J is a newline, which you can enter by pressing ^Q (Ctrl+Q in most keyboards), and then pressing the enter key. Then enter
//\,(replace-regexp-in-string "[\n]\\([ ]*?\\) \\([^ ]\\)" "\n\\1// \\2" \1))
as the replacement expression.
Essentially, the idea is that you use two nested regex searches. The main one simply finds C-style comments (the *? eager repetition comes very handy for this). Then, an elisp expression is used to perform a second replacement inside the comment text only. In this case, I'm looking for newlines followed by space, and replacing the last three space characters by //, which is nice for preserving the comment formatting (works only as long as all comments are indented, though).
Changes to the secondary regex will make this approach work in other cases, for example
//\,(replace-regexp-in-string "[\n]" " " \1))
will just put the whole contents of the original comment into a single C++-style comment.

from PHP team convention... some reasonning has to exist if the question was asked. Just answer if you know.
Never use C++ style comments (i.e. // comment). Always use C-style
comments instead. PHP is written in C, and is aimed at compiling
under any ANSI-C compliant compiler. Even though many compilers
accept C++-style comments in C code, you have to ensure that your
code would compile with other compilers as well.
The only exception to this rule is code that is Win32-specific,
because the Win32 port is MS-Visual C++ specific, and this compiler
is known to accept C++-style comments in C code.

Related

Creating a simple parser in (V)C++ (2010) similar to PEG

For an school project, I need to parse a text/source file containing a simplified "fake" programming language to build an AST. I've looked at boost::spirit, however since this is a group project and most seems reluctant to learn extra libraries, plus the lecturer/TA recommended leaning to create a simple one on C++. I thought of going that route. Is there some examples out there or ideas on how to start? I have a few attempts but not really successful yet ...
parsing line by line
Test each line with a bunch of regex (1 for procedure/function declaration), one for assignment, one for while etc...
But I will need to assume there are no multiple statements in one line: eg. a=b;x=1;
When I reach a container statement, procedures, whiles etc, I will increase the indent. So all nested statements will go under this
When I reach a } I will decrement indent
Any better ideas or suggestions? Example code I need to parse (very simplified here ...)
procedure Hello {
a = 1;
while a {
b = a + 1 + z;
}
}
Another idea was to read whole file into a string, and go top down. Match all procedures, then capture everything in { ... } then start matching statements (end with ;) or containers while { ... }. This is similar to how PEG does things? But I will need to read entire file
Multipass makes things easier. On a first pass, split things into tokens, like "=", or "abababa", or a quote-delimited string, or a block of whitespace. Don't be destructive (keep the original data), but break things down to simple chunks, and maybe have a little struct or enum that describes what the token is (ie, whitespace, a string literal, an identifier type thing, etc).
So your sample code gets turned into:
identifier(procedure) whitespace( ) identifier(Hello) whitespace( ) operation({) whitespace(\n\t) identifier(a) whitespace( ) operation(=) whitespace( ) number(1) operation(;) whitespace(\n\t) etc.
In those tokens, you might also want to store line number and offset on the line (this will help with error message generation later).
A quick test would be to turn this back into the original text. Another quick test might be to dump out pretty-printed version in html or something (where you color whitespace to have a pink background, identifiers as light blue, operations as light green, numbers as light orange), and see if your tokenizer is making sense.
Now, your language may be whitespace insensitive. So discard the whitespace if that is the case! (C++ isn't, because you need newlines to learn when // comments end)
(Note: a professional language parser will be as close to one-pass as possible, because it is faster. But you are a student, and your goal should be to get it to work.)
So now you have a stream of such tokens. There are a bunch of approaches at this point. You could pull out some serious parsing chops and build a CFG to parse them. (Do you know what a CFG is? LR(1)? LL(1)?)
An easier method might be to do it a bit more ad-hoc. Look for operator({) and find the matching operator(}) by counting up and down. Look for language keywords (like procedure), which then expects a name (the next token), then a block (a {). An ad-hoc parser for a really simple language may work fine.
I've done exactly this for a ridiculously simple language, where the parser consisted of a really simple PDA. It might work for you guys. Or it might not.
Since you mentioned PEG i'll like to throw in my open source project : https://github.com/leblancmeneses/NPEG/tree/master/Languages/npeg_c++
Here is a visual tool that can export C++ version: http://www.robusthaven.com/blog/parsing-expression-grammar/npeg-language-workbench
Documentation for rule grammar: http://www.robusthaven.com/blog/parsing-expression-grammar/npeg-dsl-documentation
If i was writing my own language I would probably look at the terminals/non-terminals found in System.Linq.Expressions as these would be a great start for your grammar rules.
http://msdn.microsoft.com/en-us/library/system.linq.expressions.aspx
System.Linq.Expressions.Expression
System.Linq.Expressions.BinaryExpression
System.Linq.Expressions.BlockExpression
System.Linq.Expressions.ConditionalExpression
System.Linq.Expressions.ConstantExpression
System.Linq.Expressions.DebugInfoExpression
System.Linq.Expressions.DefaultExpression
System.Linq.Expressions.DynamicExpression
System.Linq.Expressions.GotoExpression
System.Linq.Expressions.IndexExpression
System.Linq.Expressions.InvocationExpression
System.Linq.Expressions.LabelExpression
System.Linq.Expressions.LambdaExpression
System.Linq.Expressions.ListInitExpression
System.Linq.Expressions.LoopExpression
System.Linq.Expressions.MemberExpression
System.Linq.Expressions.MemberInitExpression
System.Linq.Expressions.MethodCallExpression
System.Linq.Expressions.NewArrayExpression
System.Linq.Expressions.NewExpression
System.Linq.Expressions.ParameterExpression
System.Linq.Expressions.RuntimeVariablesExpression
System.Linq.Expressions.SwitchExpression
System.Linq.Expressions.TryExpression
System.Linq.Expressions.TypeBinaryExpression
System.Linq.Expressions.UnaryExpression

Highlighting Javascript Inline Block Comments in Vim

Background
I use JScript (Microsoft's ECMAScript implementation) for a lot of Windows system administration needs. This means I use a lot of ActiveX (Automated COM) objects. The methods of these objects often expect Number or Boolean arguments. For example:
var fso = new ActiveXObject("Scripting.FileSystemObject");
var a = fso.CreateTextFile("c:\\testfile.txt", true);
a.WriteLine("This is a test.");
a.Close();
(CreateTextFile Method on MSDN)
On the second line you see that the second argument is one that I'm talking about. A Boolean of "true" doesn't really describe how the method's behavior will change. This isn't a problem for me, but my automation-shy coworkers are easily spooked. Not knowing what an argument does spooks them. Unfortunately a long list of constants (not real constants, of course, since current JScript versions don't support them) will also spook them. So I've taken to documenting some of these basic function calls with inline block comments. The second line in the above example would be written as such:
var a = fso.CreateTextFile("c:\\testfile.txt", /*overwrite*/ true, /*unicode*/ false);
That ends up with a small syntax highlighting dilemma for me, though. I like my comments highlighted vibrantly; both block and line comments. These tiny inline block comments mean little to me, personally, however. I'd like to highlight those particular comments in a more muted fashion (light gray on white, for example). Which brings me to my dilemma.
Dilemma
I'd like to override the default syntax highlighting for block comments when both the beginning and end marks are on the same line. Ideally this is done solely in my vimrc file, and not in a superseding personal copy of the javascript.vim syntax. My initial attempt is pathetic:
hi inlineComment guifg=#bbbbbb
match inlineComment "\/\*.*\*\/"
Straight away you can see the first problem with this regular expression pattern is that it's a greedy search. It's going to match from the first "/*" to the last "*/" on the line, meaning everything between two inline block comments will get this highlight style as well. I can fix that, but I'm really not sure how to deal with my second concern.
Comments can't be defined inside of String literals in ECMAScript. So this syntax highlighting will override String highlighting as well. I've never had a problem with this in system administration scripts, but it does often bite me when I'm examining the source of many javascript libraries intended for browsers (less.js for example).
What regex pattern, syntax definition, or other solution would the amazing StackOverflow community recommend to restore my vimrc zen?
I'm not sure, but from your description it sounds like you don't need a new syntax definition. Vim syntax files usually let you override a particular syntax item with your own choice of highlighting. In this case, the item you want is called javaScriptComment, so a command like this will set its highlighting:-
hi javaScriptComment guifg=#bbbbbb
but you have to do this in your .vimrc file (or somewhere that's sourced from there), so it's evaluated before the syntax file. The syntax file uses the highlight default command, so the syntax file's choice of highlighting only affects syntax items with no highlighting set. See :help :hi-default for more details on that. BTW, it only works on Vim 5.8 and later.
The above command will change all inline /* */ comments, and leave // line comments with their default setting, because line comments are a different syntax item (javaScriptLineComment). You can find the names of all these groups by looking at the javascript.vim file. (The easiest way to do this is :e $VIMRUNTIME/syntax/javascript.vim .)
If you only want to change some inline comments, it's a little more complicated, but still easy to see what to do by looking at javascript.vim . If you do that, you can see that block comments are defined like this:-
syn region javaScriptComment start="/\*" end="\*/" contains=#Spell,javaScriptCommentTodo
See that you can use separate regexes for begin and end markers: you don't need to worry about matching the stuff in between with non-greedy quantifiers, or anything like that. To have a syntax item that works similarly but only on one line, try adding the oneline option (:h :syn-oneline for more details):-
syn region myOnelineComment start="/\*" end="\*/" oneline
I've removed the two contains groups because (1) if you're only using it for parameter names, you probably don't want spell-checking turned on inside these comments, and (2) contained sections that aren't oneline override the oneline in the container region, so you would still match all TODO comments with this region.
You can define this new kind of comment region in your .vimrc, and set the highlighting how you like: it looks like you already know how to do that, so I won't go into more details on that. I haven't tried out this particular example, so you may still need a bit of fiddling to make it work. Give it a try and let me know how it goes.
Why don't you simply add a comment line above the call?
I think that
// fso.CreateTextFile(filename:String, overwrite:Boolean, unicode:Boolean)
var a = fso.CreateTextFile("c:\\testfile.txt", true, false);
is a lot more readable and informative than
var a = fso.CreateTextFile("c:\\testfile.txt", /*overwrite*/ true, /*unicode*/ false);

Making Doxygen read double-slash C++ comments as markup

I'm trying to set up automated Doxygen runs on our massive 78,000 file C++ codebase. It's doing okay with extracting basic type and hierarchy information, but I'd like to make it smarter about picking up the documentation comments that are already in place.
Most of the comments that have accumulated over the years do follow a general pattern, though not the pattern that Doxygen expected. Mostly they look like
// class description
class foo
{
// returns ascii art of a fruit
const char* apples( void );
// does something to some other thing
customtype_t baz( foo &other );
enum
{
kBADGER, // an omnivorous mustelid
kMUSHROOM, // tasty on pizza
kSNAKE, // oh no!
};
}
Which are double-slashed, rather than the /// or //! style comments that Doxygen expects.
There are far too many files to go through searching and replacing all such comments, and many of my programmers are violently allergic to seeing triple-slashes in their code, so I'd like to find some way to make Doxygen read ordinary comments as JavaDoc comments, when they're in the right place. Is there a way to make Doxygen read // as ///?
I couldn't find any such configuration parameter, so I figure I'll need to transform the input somehow. In general the rule I'd use is:
if there is a line containing just a
comment, immediately preceding a
function/class/type/variable
declaration, assume it is a ///
comment.
if there is a declaration
followed on the same line by a //
comment, treat it as a ///<
But I don't know how to go about teaching Doxygen this rule. The two ways I can think of are:
Write a program as an INPUT_FILTER, which parses the input C++ and transforms //s into ///s as above. But this kind of transform is too complicated to do as a regular expression, and I really don't want to have to write a full blown C++ parser just to feed input to another C++ parser! Also, spinning up an INPUT_FILTER program for each file slows down Doxygen unacceptably: it already takes over 30 minutes to run across our source, and adding an INPUT_FILTER makes it take over six hours.
Modify the Doxygen source code to include the above comment rules. That seems like a scary amount of work in unfamiliar code.
Any other ideas?
The answer is simple: You can't.
The special style of doxygen must be used, to mark a comment as documentation.
Doxygen does NOT only take comments, that precede the declaration. You also could use them everywhere in the code.
If you want to use the doxygen features, you would have to update the comments by hand, or write a script/tool that looks for declarations and preceding comments to change them.
You have to decide, choose one from the 3 solutions (your two, and the script, added as answer) or not using doxygen.
You can use a script to change comment to Doxygen style, here is a simple python script, just try it:
#!/usr/bin/env python
import os
import sys
import re
def main(input_file, output_file):
fin = open(input_file, 'r')
fout = open(output_file, 'w')
pattern1 = '^\s*//\s.*'
pattern2 = '^\s*\w.*\s//\s.*'
for line in fin.readlines():
if re.match(pattern1, line) != None:
line = line.replace('//', '///', 1)
if re.match(pattern2, line) != None:
line = line.replace('//', '///<', 1)
fout.write(line)
fin.close()
fout.close()
if __name__ == '__main__':
if len(sys.argv) != 3:
print 'usage: %s input output' % sys.argv[0]
sys.exit(1)
main(sys.argv[1], sys.argv[2])

Easily comment (C++) code in vim

I have looked at the following question:
How to comment out a block of Python code in Vim
But that does not seem to work for me. How do I comment code easily without resorting to plugins/scripts?
Use ctrl-V to do a block selection and then hit I followed by //[ESC].
Alternatively, use shift-V to do a line-based select and then type :s:^://[Enter]. The latter part could easily go into a mapping. eg:
:vmap // :s:^://<CR>
Then you just shift-V, select the range, and type // (or whatever you bind it to).
You can add this to your .vimrc file
map <C-c> :s/^/\/\//<Enter>
Then when you need to comment a section just select all lines (Shift-V + movement) and then press CtrlC.
To un-comment you can define in a similar way
map <C-u> :s/^\/\///<Enter>
that removes a // at begin of line from the selected range when pressing CtrlU.
You can use the NERD commenter plugin for vim, which has support for a whole bunch of languages (I'm sure C++ is one of them). With this installed, to comment/uncomment any line, use <Leader>ci. To do the same for a block of text, select text by entering the visual mode and use the same command as above.
There are other features in this such as comment n lines by supplying a count before the command, yank before comment with <Leader>cy, comment to end of line with <Leader>c$, and many others, which you can read about in the link. I've found this plugin to be extremely useful and is one of my 'must have' plugins.
There's always #ifdef CHECK_THIS_LATER ... #endif which has the advantage of not causing problems with nested C-style comments (if you use them) and is easy to find and either uncomment or remove completely later.

Vim different textwidth for multiline C comments?

In our C++ code base we keep 99 column lines but 79-some-odd column multiline comments. Is there a good strategy to do this automagically? I assume the modes are already known because of smart comment line-joining and leading * insertion.
Apparently both code and comments use the same textwidth option. As far as I can see, the only trick is to set this option dynamically:
:autocmd CursorMoved,CursorMovedI * :if match(getline(.), '^\s*\*') == 0 | :setlocal textwidth=79 | :else | :setlocal textwidth=99 | :endif
Here the critical part is detecting when we are in a comment. If you only format comments this way:
/*
* my comment
*/
my regex should work... unless you have lines in the code starting with * (which I guess can happen in C, less frequently in C++). If you use comments like this:
// comment line 1
// comment line 2
the regex is even simpler to write. If you want to cover all possible situations, including corner cases, well... I guess the best thing would be to define a separate detection function and call that from the :autocmd instead of match().
I came across this same problem and think that I have found a suitable solution.
What I wanted my comments to word wrap so that when I'm typing I don't have to worry about formating text. This works well with comment text. But I wasn't comfortable with having vim format my code. So I wanted vim to highlight every thing in red after x column.
To do this with only cpp code you would add the following to your ~/.vim/ftdetect/cpp.vim file.
set textwidth=79
match ErrorMsg '\%>99v.\+'
note: You may have to create the file and folders if they don't exist.
If you have problems with this make sure that you have formatoptions set to:
formatoptions=croql
You can see this by running :set formatoptions inside of vim.