Regular Expression to get comments in VB.Net source code - regex

I have a syntax highlighting function in VB.NET. I use regular expressions to match "!IF", for instance, and then color it blue. This worked perfectly until I tried to figure out how to handle comments.
In the language I'm writing this for, a comment either begins with a single quote ' at the start of a line, or begins with two single quotes '' anywhere in the line:
'this line is a comment
!if StackOverflow = "AWESOME" ''this is also a comment
Now I know how to check whether a line starts with a single quote (^'), but I need the match to extend all the way to the end of the line so I can color the entire comment green and not just the quotes.
You shouldn't need the code but here is a snippet just in case it helps.
For Each pass In frmColors.lbRegExps.Items
    ' Match case-insensitively by lowercasing both the text and the pattern
    RegExp = System.Text.RegularExpressions.Regex.Matches(LCase(rtbMain.Text), LCase(pass))
    For Each RegExpMatch In RegExp
        ' Select the matched span and apply the color paired with this pattern
        rtbMain.Select(RegExpMatch.Index, RegExpMatch.Length)
        rtbMain.SelectionColor = ColorTranslator.FromHtml(frmColors.lbHexColors.Items(PassNumber))
    Next
    PassNumber += 1
Next

Something along the lines of:
^(\'[^\r\n]+)$|(''[^\r\n]+)$
should give you the comment: a whole commented line in group 1, or the commented part of a line in group 2.
Actually, you do not even need the groups:
^\'[^\r\n]+$|''[^\r\n]+$
If it finds something, it is a comment.
"(^'|'').*$"
mentioned by Boaz would work if applied only line by line (which may be your case).
For multi-line detection, you must be sure to avoid the 'Dotall' mode (RegexOptions.Singleline in .NET), where '.' also matches the \r and \n characters. Otherwise that pattern would match both your lines entirely.
That is why I generally prefer [^\r\n] to '.': it avoids any dependency on the mode of the pattern. Even in 'Dotall' mode, it still works and never tries to match into the next line.
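To illustrate the point about 'Dotall' mode, here is a minimal sketch in Python (used only for illustration; the question itself is about VB.NET). The sample text and the helper name comment_texts are made up for this example, and the sample uses plain \n line endings.

```python
import re

# Hypothetical sample in the question's language: one whole-line comment,
# one trailing comment, and one line with no comment.
SOURCE = (
    "'this line is a comment\n"
    "!if StackOverflow = \"AWESOME\" ''this is also a comment\n"
    "!if x = 1\n"
)

DOT_PATTERN = r"(^'|'').*$"                  # relies on '.' not matching newlines
CLASS_PATTERN = r"^'[^\r\n]+$|''[^\r\n]+$"   # excludes line breaks explicitly


def comment_texts(pattern, flags):
    """Return the full text of every match of `pattern` in SOURCE."""
    return [m.group(0) for m in re.finditer(pattern, SOURCE, flags)]


line_mode = comment_texts(DOT_PATTERN, re.MULTILINE)
dotall_mode = comment_texts(DOT_PATTERN, re.MULTILINE | re.DOTALL)
class_dotall = comment_texts(CLASS_PATTERN, re.MULTILINE | re.DOTALL)

print(line_mode)
print(dotall_mode)
print(class_dotall)
```

With DOTALL set, the dot-based pattern swallows everything from the first quote to the end of the text, while the character-class version still reports the two comments separately.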

While the above would work you can simplify it:
"(^'|'').*$"
As VonC mentions - this would only work if you feed the Regex one line at a time. For multi line mode use:
"(^'|'').*?$"
The ? makes the * quantifier non-greedy, so the match stops at the end of the first line it reaches.

You can also use this regex pattern, which additionally matches REM comments: REM((\t| ).*$|$)|^\'[^\r\n]+$|''[^\r\n]+$
For more, see https://code.msdn.microsoft.com/How-to-find-code-comments-9d1f7a29/

Regex to capture a single new line instance, but not 2

I have a text file where lines are terminated by newline characters (\n) and paragraphs by double newlines (\n\n).
I want to strip out those single newlines and replace with simple spaces. But I do not want the double newlines affected.
I thought something like one of these would work:
(?!\n\n)\n
\n{1}
\n{1,1}
But no luck. Everything I try inevitably ends up affecting those double newlines too. How can I write a regex that effectively "ignores" the \n\n but captures the \n?
You can search using this regex:
(.)\n(?!\n)
And replace it with:
"\1 "
RegEx Breakup:
.\n: Match any character followed by a line break
(?!\n): Negative lookahead asserting that the next character is not a line break. One character is matched before \n so that an empty line is never matched; that character is captured in capture group #1. Together this matches all single line breaks but skips double line breaks.
\1 : The replacement puts back the first capture group, followed by a space.
Python Code:
import re
text = "line one\nline two\n\nnew paragraph"  # sample input
repl = re.sub(r'(.)\n(?!\n)', r'\1 ', text)
print(repl)
JavaScript Code:
const input = "line one\nline two\n\nnew paragraph"; // sample input
const repl = input.replace(/(.)\n(?!\n)/g, '$1 ');
console.log(repl);
You'll need a negative lookahead and a negative lookbehind. /(?<!\n)\n(?!\n)/g would probably work off the top of my head.
That said, you should be aware that browser support for lookbehind has been spotty. It's gotten better since I last checked, but at the time of writing Safari and IE don't support it at all.
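For comparison, the lookbehind/lookahead version can be sketched in Python, whose re module supports fixed-width lookbehind. The function name join_single_newlines and the sample text are assumptions made for this example.

```python
import re

# A newline that is neither preceded nor followed by another newline.
SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n)")


def join_single_newlines(text):
    """Replace lone newlines with a space; leave runs of 2+ newlines alone."""
    return SINGLE_NEWLINE.sub(" ", text)


sample = "first line\nsecond line\n\nnew paragraph\ncontinues here"
print(join_single_newlines(sample))
```

This version needs no capture group, but note that it also replaces a lone newline at the very start of the text, which the (.)\n(?!\n) pattern would leave alone.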
I thought of a simple way to do this. It may not be the right way from a regex point of view, but it's a workaround.
import re
sample = """This is a sentence in para1.

this is also a sentence in para1


The beginning of paragraph2 and sentence1

this is a second line in paragraph2.
"""
print(sample)
sample = re.sub(r'\n\n\n',"NPtag",sample)
sample = re.sub(r'\n\n'," ",sample)
sample = re.sub(r"NPtag",'\n\n\n',sample)
print("OUTPUT*****\n")
print(sample)
The workaround is to replace the paragraph break (three newlines in this case, to show the spacing clearly) with a placeholder tag (NPtag), then replace the line break within a paragraph (two newlines in the example above, to show the spacing clearly in a notebook environment) with a space, and finally turn NPtag back into the paragraph break.
Hope this helps. Eager to see other regex answers too! Happy coding

find a single quote at the end of a line starting with "mySqlQueryToArray"

I'm trying to use regex to find single quotes (so I can turn them all into double quotes) anywhere in a line that starts with mySqlQueryToArray (a function that makes a query to a SQL DB). I'm doing the regex in Sublime Text 3 which I'm pretty sure uses Perl Regex. I would like to have my regex match with every single quote in a line so for example I might have the line:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name'");
I want the regex to match in that line both of the quotes around $name but no other characters in that line. I've been trying to use (?<=mySqlQueryToArray.*)' but it tells me that the look behind assertion is invalid. I also tried (?<=mySqlQueryToArray)(?<=.*)' but that's also invalid. Can someone guide me to a regex that will accomplish what I need?
To find any number of single quotes in a line starting with your keyword you can use the \G anchor ("end of last match") by replacing:
(^\h*mySqlQueryToArray|(?!^)\G)([^\n\r']*)'
with \1\2<replacement>.
Explanation
(   ^\h*mySqlQueryToArray   # beginning of line: check the keyword is here
|   (?!^)\G   )             # if not at the BOL, check we did match something on this line
(   [^\n\r']*   ) '         # capture everything until the next single quote
The general idea is to match everything until the next single quote with ([^\n\r']*)' in order to replace it with \2<replacement>, but do so only if this everything is:
right after the beginning keyword (^mySqlQueryToArray), or
after the end of the last match ((?!^)\G): in that case we know we have the keyword and are on a relevant line.
\h* accounts for any leading indentation, as suggested by Xælias (\h being a shortcut for any kind of horizontal whitespace).
https://stackoverflow.com/a/25331428/3933728 is a better answer.
I'm not good enough with RegEx nor ST to do this in one step. But I can do it in two:
1/ Search for all mySqlQueryToArray strings
Open the search panel: ⌘F or Find->Find...
Make sure you have the Regex (.* ) button selected (bottom left) and the wrap selector (all other should be off)
Search for: ^\s*mySqlQueryToArray.*$
^ beginning of line
\s* any indentation
mySqlQueryToArray your call
.* whatever follows
$ end of line
Click on Find All
This will select every occurrence of what you want to modify.
2/ Enter the replace mode
⌥⌘F or Find->Replace...
This time, make sure that wrap, Regex AND In selection are active.
Then search for '([^']*)' and replace with "\1".
' are your single quotes
(...) is the capturing block, referenced by \1 in the replace field
[^']* is for any character that is not a single quote, repeated
Then hit Replace All
I know this is a little more complex than the other answer, but this one tackles cases where your line contains several single-quoted strings, like this:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name' and Value='1234'");
If this is too much, I guess something like find: (?<=mySqlQueryToArray)(.*?)'([^']*)'(.*?) and replace it with \1"\2"\3 will be enough.
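Outside Sublime Text, the same two-step idea (first pick the lines that start with mySqlQueryToArray, then replace quotes only inside them) can be sketched in Python with a replacement callback. This is not the \G technique from the other answer, and the function name requote_query_lines is an assumption made for this example.

```python
import re

# Step 1: a whole line that starts (after optional indentation) with the call.
QUERY_LINE = re.compile(r"^[ \t]*mySqlQueryToArray.*$", re.MULTILINE)


def requote_query_lines(source, replacement='"'):
    """Replace every single quote, but only on mySqlQueryToArray lines."""
    # Step 2: the callback rewrites just the text of each selected line.
    return QUERY_LINE.sub(lambda m: m.group(0).replace("'", replacement), source)


code = (
    "mySqlQueryToArray($con, \"SELECT * FROM Template WHERE Name='$name' and Value='1234'\");\n"
    "$label = 'left alone';\n"
)
print(requote_query_lines(code))
```

Because the callback sees one selected line at a time, any number of single quotes on that line is handled, and lines without the call are returned unchanged.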
You can use a regex like this:
(mySqlQueryToArray.*?)'(.*?)'(.*)
You can use \K, see this regex:
mySqlQueryToArray[^']*\K'(.*?)'

Regex optimization: negative character class "[^#]" nullifies multiline flag "m"

I'm trying to parse a text line by line, catching everything EXCEPT what's after a specific marker, # for example. No escaping to take into account, pretty basic.
For instance, if the input text is:
Multiline input text
Mid-sentence# cut, this won't be matched
Hey There
I want to retrieve
['Multiline input text',
'Mid-sentence',
'Hey There']
This is working fine with /(.*?)(?:#.*$|$)/mg (even though there are a few empty matches). However, if I try to improve the regex (by avoiding backtracking and getting rid of empty matches) with /([^#]++)(?:#.*$|$)/mg, it returns
[
"Multiline input text
Mid-sentence",
"
Hey There"
]
It is as if [^#] were including line breaks, even with the multiline flag on. As far as I can tell, I can fix that by adding \n and \r to the character class ([^#\n\r]), but this makes the multiline option kind of useless, and I'm afraid it could break on some unusual line breaks in some environments/encodings.
Would any of you know the reason for this behavior, and if there's another workaround? Thanks!
Edit
Originally, this happened in PCRE. But even in JavaScript with /([^#]+)(?:#.*$|$)/mg, I get the same unwanted multiline behavior. I know I could probably use the language to parse the text line by line, but I'd like to do it with regex only.
It seems you got your definition of /m wrong. The only thing this flag does is change what ^ and $ match, so that they also match at the beginning and end of each line respectively. It does not affect anything else. If you don't want to match line breaks you should do as you suggested and use [^#\n\r].
The regex that will work for you is:
^(.*?)(?:#.*|)$
Online Demo: http://regex101.com/r/aP8eV6
The difference is the use of .*? instead of [^#]+.
[^#]+ by definition matches anything but # and that includes newlines as well.
multiline flag m only lets you use line start/end anchors ^ and $ in multiline inputs.
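The behavior can be reproduced with a short Python sketch (Python's re module is used here only for illustration, with a plain + instead of the possessive ++; the variable names are made up for this example).

```python
import re

TEXT = "Multiline input text\nMid-sentence# cut, this won't be matched\nHey There"

# [^#] excludes only '#', so it happily runs across line breaks.
greedy_class = re.findall(r"([^#]+)(?:#.*$|$)", TEXT, re.MULTILINE)

# Excluding \r and \n as well keeps each match on one line.
line_class = re.findall(r"([^#\r\n]+)(?:#.*$|$)", TEXT, re.MULTILINE)

# The lazy-dot version from the answer; '.' does not match '\n' here
# because DOTALL is not set.
lazy_dot = re.findall(r"^(.*?)(?:#.*|)$", TEXT, re.MULTILINE)

print(greedy_class)
print(line_class)
print(lazy_dot)
```

The MULTILINE flag is the same in all three calls; only the character class or the dot decides whether a match may cross a line break.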

Adding Line Break After pattern in VIM

I have a css file and I want to add an empty line after every }.
How can I do this in Vim?
A substitution would work nicely.
:%s/}/\0\r/g
Replace } with the whole match \0 and a new line character \r.
or
:%s/}/&\r/g
Where & also is an alternative for the whole match, looks a bit funny though in my opinion. Vim golfers like it because it saves them a keystroke :)
\0 or & in the replacement part of the substitution acts as a special character. During the substitution the whole string that was matched replaces the \0 or the & character in the substitution.
We can demonstrate this with a more complex search and replace -
Which witch is which?
Apply a substitution -
:s/[wW][ih][ti]ch/The \0/g
Gives -
The Which The witch is The which?
The answer is :%s/}/}\r/ I guess.
:%s/pre/cur\r/g
%: operate on the entire buffer.
pre (previous pattern): the pattern to be replaced.
cur (current pattern): the text that replaces each match of the previous pattern.
\r: new line.
g: repeat for every match on a line (default is to just replace the first).
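Outside Vim, the same "whole match plus a newline" substitution can be sketched in Python, where \g<0> plays the role of Vim's \0 or &. This is an analogue for illustration only, not a Vim feature; the function name add_blank_line_after_braces is made up for this example.

```python
import re


def add_blank_line_after_braces(css):
    """Append a newline after every closing brace in `css`."""
    # \g<0> stands for the whole match, the counterpart of \0 or & in Vim.
    return re.sub(r"\}", "\\g<0>\n", css)


# The "witch" example from above, with the whole match reused in the replacement.
witch = re.sub(r"[wW][ih][ti]ch", r"The \g<0>", "Which witch is which?")

print(add_blank_line_after_braces("a { color: red; }\nb { margin: 0; }\n"))
print(witch)
```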

Regular Expression for comments but not within a "string" / not in another container

So I need a regular expression for finding single line and multi line comments, but not in a string. (eg. "my /* string")
for testing (# single line, /* & */ multi line):
# complete line should be found
lorem ipsum # from this to line end
/*
all three lines should be found
*/ but not here anymore
var x = "this # should not be found"
var y = "this /* shouldn't */ match either"
var z = "but" & /* this must match */ "_"
SO does the syntax display really well; I basically want all the gray text.
I don't care if its a single regex or two separates. ;)
EDIT: one more thing. the opposite would also satisfy me, searching for a string which is not in a comment
this is my current string matching: "[\s\S]*?(?<!\\)" (indeed: will not work with "\\")
EDIT2:
OK finally I wrote my own comment parser -.-
And if someone else is interested in the source code, grab it from here: https://github.com/relikd/CommentParser
Here's one possibility (it does have an Achilles heel that I'll get to):
(#[^"\n\r]*(?:"[^"\n\r]*"[^"\n\r]*)*[\r\n]|/\*([^*]|\*(?!/))*?\*/)(?=[^"]*(?:"[^"]*"[^"]*)*$)
With the GLOBAL and DOTALL flags, but not the MULTILINE flag.
Explanation of the regex:
(
  #[^"\n\r]*                      Hash mark followed by non-" and non-end-of-line
  (?:"[^"\n\r]*"[^"\n\r]*)*       If any quotes in the comment, they must be balanced
  [\r\n]                          Followed by end-of-line ($ except we
                                  don't have multiline flag)
|                                 OR
  /\*([^*]|\*(?!/))*?\*/          /* xxx */ sort of comment
)                                 BOTH FOLLOWED BY
(?=[^"]*(?:"[^"]*"[^"]*)*$)       only a *balanced* number of quotes for the
                                  *rest of the code :O!*
However, this relies on balanced quotes being used throughout the text (it also doesn't take into account escaped quotes, but it's easy enough to modify the regex to take that into account).
If a user has a comment with a " in it that isn't balanced...boom. You're screwed!
Regex is generally not recommended for things like HTML/code parsing, but if you can rely on the fact that quotes have to balance when you define a string, etc., you can sometimes get away with it.
Since you are also parsing comments, which have no set structure (ie you are not guaranteed that quotes within comments will be balanced), you won't be able to find a regex solution that works here.
Anything you think up can be outwitted by an unbalanced quote in a comment somewhere (say the comment was # remove all the " marks), or by multiline strings (where on a given line there may be unbalanced quotes).
Bottom line - you can probably make a regex that will work in most cases, but not for all. To get something watertight you'll have to write some code.
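As one possible sketch of "writing some code", the scanner below walks the text left to right and lets a single alternation consume double-quoted strings before it looks for comments, so comment markers inside strings are never reported. It is written in Python for illustration and assumes that strings are double-quoted, use backslash escapes, and stay on one line; the names TOKEN and find_comments are hypothetical.

```python
import re

TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'        # a double-quoted string (matched, then skipped)
    r'|(#[^\r\n]*'              # group 1: '#' comment up to the end of the line
    r'|/\*[\s\S]*?\*/)'         # group 1: /* ... */ comment, may span lines
)


def find_comments(source):
    """Return the comments in `source`, ignoring markers inside strings."""
    return [m.group(1) for m in TOKEN.finditer(source) if m.group(1) is not None]


SAMPLE = '''# complete line should be found
lorem ipsum # from this to line end
/*
all three lines should be found
*/ but not here anymore
var x = "this # should not be found"
var y = "this /* shouldn't */ match either"
var z = "but" & /* this must match */ "_"
'''

for comment in find_comments(SAMPLE):
    print(repr(comment))
```

Because whichever token starts first wins, a comment such as # remove all the " marks is consumed as a comment before its quote can open a string. An unterminated string would still confuse this sketch, so it is a starting point rather than a full parser.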
I would use two regular expressions for this:
/(\/\*[\s\S]*?\*\/)|(#.*$)/m to find all the comments; the "m" modifier makes $ match at the end of each line, and [\s\S] lets a /* */ comment span several lines
/"[^"]*?"/ to find all the strings
If you apply the highlighting to the comments first and only afterwards to the strings, any false comment matches inside strings are overwritten by the string highlighting.