Line Comments in Standard ML - sml

I'm learning ML, with the SML/NJ dialect. What I'm trying to figure out is if there is a line comment operator. I found the block comment operator, (* ... *), but I really miss line comments.
Suggestions? Or am I just stuck with block comments?

You're stuck with block comments.
On the other hand, block comments can be nested: (* (* *) still comment here *)

There is an RFC for line comments, which proposes a hash mark followed by whitespace.

Single-line comments now ship in both MLton and SML/NJ, as long as you enable Successor ML ("sML") extensions (for SML/NJ, start it with sml -Cparser.succ-ml=true).
Here's a concrete example. In the definition below, the value 1 is ignored, and the definition of a is taken from the next line (2) instead. (Below, = denotes the REPL's continuation-line prompt.)
$ sml -Cparser.succ-ml=true
- val a = (*) 1
= 2;
val a = 2 : int
See https://github.com/SMLFamily/Successor-ML/wiki/Summary-of-proposed-changes for more about sML.

Deleting comments in a large file

I am trying to delete a bunch of comments that are all in the following format:
/**
* #ngdoc
... comment body (delete me, too!)
*/
I have tried using this command: %s/\/**\n * #ngdoc.\{-}*\///g
Here is the regex without the patterns: %s/pattern1.\{-}pattern2//g
Here are the individual patterns: \/**\n * #ngdoc and *\/
When I try my pattern in vim I get the following error:
E871: (NFA regexp) Can't have a multi follow a multi !
E61: Nested *
E476: Invalid command
Thanks for any help with this regexp nightmare!
Instead of trying to cram this into one complex regex, it's much easier to search for the start of a comment and delete from there to the end of the comment:
:g/^\/\*\*$/,/\*\/$/d_
This breaks down into
:g start a global command
/^\/\*\*$/ search for start of a comment: <sol>/**<eol>
,/\*\/$/ extend the range to the end of a comment: */<eol>
d delete the range
_ use the black hole register (performance optimization)
Your problem is that you have \{-} followed by *, which are the multis referenced in the error message. Escape the *:
%s/\/\*\*\n \* #ngdoc\_.\{-}\*\/\n//g
Using embedded newlines in the pattern is the wrong approach. You should instead use an address range. Something like:
sed '\#^/\*\*$#,\#^\*/$#d' file
This will delete all lines starting from one that matches /** anchored at column 1 to the line matching */ anchored at column 1. If your comments are well behaved (e.g. no trailing space after /**), this should do what you want.
Try this, using the gc flag so you can confirm each deletion:
%s/\v\/\*\*\n\s\*\s\#ngdoc\n((\s*\n)?(\s\*.*\n)?){-}\s?\*\///gc
Match comments like
/**
* #ngdoc
* ... comment body (delete me, too!)
*
*/
My approach consists of using a macro:
qa/\/\*\*<enter><shift-v>/\*\/<enter>dq
qa ........ starts recording macro "a"
/\/\*\* ... searches for the comment beginning
<Enter> ... use Ctrl-v Enter
V ......... starts linewise visual mode (until...)
/\*\/ ..... the end of your comment
<Enter> ... Ctrl-v Enter again
d ......... deletes the selected area
q ......... stops recording the macro
In order to insert <Enter> etc. literally while typing the command, press Ctrl-v followed by the key you want.

Regex clarification on escape sequences with lex

I'm creating a lexer.l file that is working as intended except for one part. I have the rule:
[\(\*.*\*\)] {}
which I want to make it so that when I encounter (* this is a test *) in a file, I simply do nothing with it. However, when I run lex lexer.l I get warnings on the lines with the rules \(, \*, and \), stating that they can never be matched. So I guess my question is: why would [\(\*.*\*\)] {} interfere with \( and the others? How can I catch (* this is a test *)?
Languages with the comment syntax (*…*) typically allow nested comments, and nested comments cannot easily be recognized by (f)lex because the nesting requires a context-free grammar, and the lexical scanner only implements regular languages.
If your comments do not nest (so that (* something (* else *) is a comment, rather than the prefix of a longer comment), then you can use the regular expression
[(][*][^*]*[*]+([^*)][^*]*[*]+)*[)]
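If it helps to see that pattern in action, here is a quick sanity check of the same expression using Python's re module (my own illustration; it is not needed for the flex rule itself):
import re

# The pattern from above: a (* ... *) comment with no nesting,
# tolerating stray '*' characters inside the body.
ml_comment = re.compile(r'[(][*][^*]*[*]+([^*)][^*]*[*]+)*[)]')

print(bool(ml_comment.fullmatch("(* this is a test *)")))       # True
print(bool(ml_comment.fullmatch("(* stars * inside * ok *)")))  # True
print(bool(ml_comment.fullmatch("(* unterminated")))            # False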
If you do require nested comments, you can use start conditions and a stack (or a simulated stack, as below):
%x SC_COMMENT
%%
    int comment_nesting = 0;   /* indented so flex copies it into yylex() as a local */

"(*"              { BEGIN(SC_COMMENT); }

<SC_COMMENT>{
  "(*"            { ++comment_nesting; }
  "*"+")"         { if (comment_nesting) --comment_nesting;
                    else BEGIN(INITIAL); }
  "*"+            ;
  [^(*\n]+        ;
  [(]             ;
  \n              ;
}
That snippet was taken from this answer, with a small adjustment because that answer recognizes nested /*…*/ comments. A fuller explanation of the code appears there.
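For what it's worth, the same counting idea can also be sketched outside of flex. Below is a small Python illustration of stripping nested (*…*) comments with a depth counter; it is my own sketch, not part of the answer above.
def strip_nested_ml_comments(text):
    out = []
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith("(*", i):               # opener: go one level deeper
            depth += 1
            i += 2
        elif depth and text.startswith("*)", i):   # closer: come back up one level
            depth -= 1
            i += 2
        elif depth:                                # inside a comment: skip
            i += 1
        else:                                      # outside any comment: keep
            out.append(text[i])
            i += 1
    return "".join(out)

print(strip_nested_ml_comments("val x = 1 (* outer (* inner *) still comment *) + 2"))
# val x = 1  + 2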

Is it possible to write comments in Xtend-templates?

Is it possible to write comments inside an Xtend template? (for example in order to quickly comment out an IF-statement or anything)
Yes, that's possible. Use the toggle-comment action in Eclipse or type the prefix ««« manually, e.g. as in ««« my comment in a template.
You can use ««« for single line comments like Sebastian Zarnekow mentioned.
A drawback of this commenting style is that it also comments out the newline character at the end of this line. Sometimes that's exactly what you want, but sometimes it's not.
For example: The following code snippet ...
val x = '''
line 1
line 2 ««« my comment
line 3
line 4
'''
println(x)
... will print the following output ...
line 1
line 2 line 3
line 4
Another way to comment is to insert an expression («») that contains a plain old Java comment: «/* comment */»
That way you can continue your template on the same line, the comment can span multiple lines, and you avoid losing the newline character.
PS: You can insert the guillemets this way:
« : hold down ALT and type 174 on the numeric keypad
» : hold down ALT and type 175 on the numeric keypad
or you map a good key combination to the two chars in your IDE, e.g. CTRL+< and CTRL+>
To toggle a comment in Xtend templates: Ctrl + /

Emacs Lisp Regular Expression Match everything until character sequence

I am trying to write a regular expression in emacs lisp that will match multi line comments.
For example:
{-
Some
Comment
Here
-}
Should match as a comment. Basically, anything between {- and -}. I am able to almost do it by doing the following:
"\{\-[^-]*\-\}"
However, this will fail if the comment includes a - not immediately followed by }
So, it will not match correctly in this case:
{-
Some -
Comment -
Here -
-}
Which should be valid.
Basically, I would like to match on everything (including newlines) up to the sequence -}
Thanks in advance!
Doesn't this work for you? {-[^-]*[^}]*-}
(You didn't specify things precisely, so I'm just guessing what you want. Must the {- and -} be at the line beginning? Must they be on lines by themselves? Must there be some other characters between them? Etc. For example, should it match a line like this? {--}?)
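As a side check (mine, not part of the answer): the suggested pattern uses no Emacs-specific regexp syntax, so it can be tried against the question's two examples with Python's re module.
import re

pattern = re.compile(r'{-[^-]*[^}]*-}')

simple = "{-\nSome\nComment\nHere\n-}"
dashes = "{-\nSome -\nComment -\nHere -\n-}"
print(bool(pattern.search(simple)))   # True
print(bool(pattern.search(dashes)))   # True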
I made a toolkit for such cases; it comes with a parser, beg-end.el.
What remains is to write functions that determine the beginning and the end of the object.
In pseudo-code:
(put 'MY-FORM 'beginning-op-at
     (lambda () (search-backward "{-")))
(put 'MY-FORM 'end-op-at
     (lambda () (search-forward "-}")))
When done, it should be available, i.e. copied and returned, like this:
(defun MY-FORM-atpt (&optional arg)
  " "
  (interactive "p")
  (ar-th 'MY-FORM arg))
Get it here:
https://launchpad.net/s-x-emacs-werkstatt/

Remove C and C++ comments using Python?

I'm looking for Python code that removes C and C++ comments from a string. (Assume the string contains an entire C source file.)
I realize that I could .match() substrings with a Regex, but that doesn't solve nesting /*, or having a // inside a /* */.
Ideally, I would prefer a non-naive implementation that properly handles awkward cases.
This handles C++-style comments, C-style comments, strings and simple nesting thereof.
import re

def comment_remover(text):
    def replacer(match):
        s = match.group(0)
        if s.startswith('/'):
            return " "  # note: a space and not an empty string
        else:
            return s
    pattern = re.compile(
        r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
        re.DOTALL | re.MULTILINE
    )
    return re.sub(pattern, replacer, text)
Strings need to be included, because comment markers inside them do not start a comment.
Edit: re.sub didn't take any flags, so I had to compile the pattern first.
Edit2: Added character literals, since they could contain quotes that would otherwise be recognized as string delimiters.
Edit3: Fixed the case where a legal expression int/**/x=5; would become intx=5; (which would not compile) by replacing the comment with a space rather than an empty string.
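A tiny usage sketch (my own, not part of the answer), showing the behaviour the edits describe: both comments collapse to a single space, and the // inside the string literal is left alone.
src = 'int/**/x=5; // trailing comment\nchar *s = "not a // comment";\n'
print(comment_remover(src))
# int x=5;
# char *s = "not a // comment";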
C (and C++) comments cannot be nested. Regular expressions work well:
//.*?\n|/\*.*?\*/
This requires the "single line" flag (re.S), because a C comment can span multiple lines.
import re

def stripcomments(text):
    return re.sub(r'//.*?\n|/\*.*?\*/', '', text, flags=re.S)
This code should work.
/EDIT: Notice that my above code actually makes an assumption about line endings! This code won't work on a Mac text file. However, this can be amended relatively easily:
//.*?(\r\n?|\n)|/\*.*?\*/
This regular expression should work on all text files, regardless of their line endings (covers Windows, Unix and Mac line endings).
/EDIT: MizardX and Brian (in the comments) made a valid remark about the handling of strings. I completely forgot about that because the above regex is plucked from a parsing module that has additional handling for strings. MizardX's solution should work very well but it only handles double-quoted strings.
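To make that caveat about strings concrete, here is a small demonstration of where the plain regex goes wrong (my own example): the // inside a string literal is treated as a comment start, and the rest of that line, including the newline, is removed.
print(stripcomments('char *url = "http://example.com";\nint x;\n'))
# char *url = "http:int x;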
Don't forget that in C, backslash-newline is eliminated before comments are processed, and trigraphs are processed before that (because ??/ is the trigraph for backslash). I have a C program called SCC (strip C/C++ comments), and here is part of the test code...
" */ /* SCC has been trained to know about strings /* */ */"!
"\"Double quotes embedded in strings, \\\" too\'!"
"And \
newlines in them"
"And escaped double quotes at the end of a string\""
aa '\\
n' OK
aa "\""
aa "\
\n"
This is followed by C++/C99 comment number 1.
// C++/C99 comment with \
continuation character \
on three source lines (this should not be seen with the -C flag)
The C++/C99 comment number 1 has finished.
This is followed by C++/C99 comment number 2.
/\
/\
C++/C99 comment (this should not be seen with the -C flag)
The C++/C99 comment number 2 has finished.
This is followed by regular C comment number 1.
/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.
/\
\/ This is not a C++/C99 comment!
This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.
/\
\* This is not a C or C++ comment!
This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.
This is followed by regular C comment number 3.
/\
\
\
\
* C comment */
This does not illustrate trigraphs. Note that you can have multiple backslashes at the end of a line; the line splicing doesn't care how many there are, but the subsequent processing might. Etc. Writing a single regex to handle all these cases will be non-trivial (but that is different from impossible).
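To illustrate the ordering described above (trigraphs first, then line splicing, then comment handling), here is a rough Python sketch of the two pre-steps. It is my own simplification, not part of SCC, and it glosses over edge cases such as overlapping trigraph sequences.
import re

# Simplified helpers: translate trigraphs, then splice backslash-newline
# pairs, so a later comment stripper sees ordinary /* ... */ text.
TRIGRAPHS = {'??=': '#', '??/': '\\', "??'": '^', '??(': '[',
             '??)': ']', '??!': '|', '??<': '{', '??>': '}', '??-': '~'}

def preprocess(text):
    for tri, ch in TRIGRAPHS.items():
        text = text.replace(tri, ch)      # trigraph replacement (simplified)
    return re.sub(r'\\\n', '', text)      # line splicing

print(preprocess('/??/\n* now an ordinary C comment *??/\n/'))
# /* now an ordinary C comment */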
This posting provides a coded-out version of the improvement to Markus Jarderot's code that was described by atikat, in a comment to Markus Jarderot's posting. (Thanks to both for providing the original code, which saved me a lot of work.)
To describe the improvement somewhat more fully: The improvement keeps the line numbering intact. (This is done by keeping the newline characters intact in the strings by which the C/C++ comments are replaced.)
This version of the C/C++ comment removal function is suitable when you want to generate error messages to your users (e.g. parsing errors) that contain line numbers (i.e. line numbers valid for the original text).
import re

def removeCCppComment( text ) :

    def blotOutNonNewlines( strIn ) :  # Return a string containing only the newline chars contained in strIn
        return "" + ("\n" * strIn.count('\n'))

    def replacer( match ) :
        s = match.group(0)
        if s.startswith('/'):  # Matched string is //...EOL or /*...*/  ==> Blot out all non-newline chars
            return blotOutNonNewlines(s)
        else:                  # Matched string is '...' or "..."  ==> Keep unchanged
            return s

    pattern = re.compile(
        r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
        re.DOTALL | re.MULTILINE
    )
    return re.sub(pattern, replacer, text)
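A quick check of the line-number property (my own example, not from the posting): the stripped text contains exactly as many newlines as the input, so line numbers reported against it remain valid for the original source.
src = 'int a; /* spans\ntwo lines */ int b; // tail\nint c;\n'
out = removeCCppComment(src)
print(out.count('\n') == src.count('\n'))   # True
print(out)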
I don't know if you're familiar with sed, the UNIX-based (but Windows-available) text parsing program, but I've found a sed script here which will remove C/C++ comments from a file. It's very smart; for example, it will ignore '//' and '/*' if found in a string declaration, etc. From within Python, it can be used using the following code:
import subprocess

# source_code is a string with the C/C++ source code.
process = subprocess.Popen(['sed', '-f', '/path/to/remccoms3.sed'],
                           stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                           universal_newlines=True)
stripped_code, _ = process.communicate(source_code)
return_code = process.returncode
In this snippet, source_code is the variable holding the C/C++ source code, and eventually stripped_code will hold the C/C++ code with the comments removed. Of course, if you have the file on disk, you could pass open file handles as stdin and stdout instead (the input file opened for reading, the output file for writing). remccoms3.sed is the file from the above link, and it should be saved in a readable location on disk. sed is also available on Windows, and comes installed by default on most GNU/Linux distros and Mac OS X.
This will probably be better than a pure Python solution; no need to reinvent the wheel.
The regular expression cases will fall down in some situations, like where a string literal contains a subsequence which matches the comment syntax. You really need a parse tree to deal with this.
You may be able to leverage Py++ to parse the C++ source with GCC.
Py++ does not reinvent the wheel. It uses the GCC C++ compiler to parse C++ source files. To be more precise, the tool chain looks like this: the source code is passed to GCC-XML; GCC-XML passes it to the GCC C++ compiler; GCC-XML generates an XML description of the C++ program from GCC's internal representation; Py++ uses the pygccxml package to read the GCC-XML generated file. The bottom line: you can be sure that all your declarations are read correctly.
Or, maybe not. Regardless, this is not a trivial parse.
Re RE-based solutions: you are unlikely to find an RE that handles all possible 'awkward' cases correctly unless you constrain the input (e.g. no macros). For a bulletproof solution, you really have no choice but to leverage the real grammar.
I'm sorry this not a Python solution, but you could also use a tool that understands how to remove comments, like your C/C++ preprocessor. Here's how GNU CPP does it.
cpp -fpreprocessed foo.c
There is also a non-python answer: use the program stripcmt:
StripCmt is a simple utility written in C to remove comments from C, C++, and Java source files. In the grand tradition of Unix text processing programs, it can function either as a FIFO (First In - First Out) filter or accept arguments on the command line.
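If you want to drive it from Python anyway, a minimal sketch using it as a stdin/stdout filter might look like this (assuming a stripcmt executable is on your PATH; adjust the command to your installation):
import subprocess

def strip_comments_with_stripcmt(source_code):
    # Feed the source to stripcmt on stdin and collect the filtered output.
    result = subprocess.run(['stripcmt'], input=source_code,
                            capture_output=True, text=True, check=True)
    return result.stdout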
The following worked for me:
from subprocess import check_output

class Util:
    def strip_comments(self, source_code):
        process = check_output(['cpp', '-fpreprocessed', source_code], shell=False)
        return process

if __name__ == "__main__":
    util = Util()
    print util.strip_comments("somefile.ext")
This is a combination of the subprocess and the cpp preprocessor. For my project I have a utility class called "Util" that I keep various tools I use/need.
I have used pygments to parse the string and then ignore all tokens that are comments. It works like a charm with any lexer on pygments' list, including JavaScript, SQL, and C-like languages.
from pygments import lex
from pygments.token import Token as ParseToken

def strip_comments(replace_query, lexer):
    generator = lex(replace_query, lexer)
    line = []
    lines = []
    for token in generator:
        token_type = token[0]
        token_text = token[1]
        if token_type in ParseToken.Comment:
            continue
        line.append(token_text)
        if token_text == '\n':
            lines.append(''.join(line))
            line = []
    if line:
        line.append('\n')
        lines.append(''.join(line))
    strip_query = "\n".join(lines)
    return strip_query
Working with C-like languages:
from pygments.lexers.c_like import CLexer
strip_comments("class Bla /*; complicated // stuff */ example; // out",CLexer())
# 'class Bla example; \n'
Working with SQL languages:
from pygments.lexers.sql import SqlLexer
strip_comments("select * /* this is cool */ from table -- more comments",SqlLexer())
# 'select * from table \n'
Working with JavaScript-like languages:
from pygments.lexers.javascript import JavascriptLexer
strip_comments("function cool /* not cool*/(x){ return x++ } /** something **/ // end",JavascriptLexer())
# 'function cool (x){ return x++ } \n'
Since this code only removes the comments and leaves everything else untouched, it is a very robust solution that can deal even with invalid inputs.
You don't really need a parse tree to do this perfectly, but you do in effect need the token stream equivalent to what is produced by the compiler's front end. Such a token stream must necessarily take care of all the weirdness, such as line-continued comment starts, comment starts in strings, trigraph normalization, etc. If you have the token stream, deleting the comments is easy. (I have a tool that produces exactly such token streams, as, guess what, the front end of a real parser that produces a real parse tree :).
The fact that the tokens are individually recognized by regular expressions suggests that you can, in principle, write a regular expression that will pick out the comment lexemes. The real complexity of the set of regular expressions for the tokenizer (at least the one we wrote) suggests you can't do this in practice; writing them individually was hard enough. If you don't want to do it perfectly, well, then, most of the RE solutions above are just fine.
Now, why you would want to strip comments is beyond me, unless you are building a code obfuscator. In that case, you have to get it perfectly right.
I ran across this problem recently when I took a class where the professor required us to strip javadoc from our source code before submitting it to him for a code review. We had to do this several times, but we couldn't just remove the javadoc permanently because we were required to generate javadoc HTML files as well. Here is a little Python script I made to do the trick. Since javadoc starts with /** and ends with */, the script looks for these tokens, but the script can be modified to suit your needs. It also handles single-line block comments and cases where a block comment ends but there is still non-commented code on the same line as the block comment ending. I hope this helps!
WARNING: This script modifies the contents of the files passed in and saves them to the original files. It would be wise to have a backup somewhere else.
#!/usr/bin/python
"""
A simple script to remove block comments of the form /** */ from files
Use example: ./strip_comments.py *.java
Author: holdtotherod
Created: 3/6/11
"""
import sys
import fileinput
for file in sys.argv[1:]:
    inBlockComment = False
    for line in fileinput.input(file, inplace = 1):
        if "/**" in line:
            inBlockComment = True
        if inBlockComment and "*/" in line:
            inBlockComment = False
            # If the */ isn't last, remove through the */
            if line.find("*/") != len(line) - 3:
                line = line[line.find("*/")+2:]
            else:
                continue
        if inBlockComment:
            continue
        sys.stdout.write(line)