Regex for single line comments - c++

I'm trying to make a regex to identify a comment. It has to start with // and end with a new line or a *) pattern.
So far I have (\/\/)([^\n\r]+), but I haven't managed to add the *) pattern.
Any tips?

Try it like this:
^\/\/[^\n\r]+(?:[\n\r]|\*\))$
Matches
^ Beginning of the string
\/\/ Match two forward slashes
[^\n\r]+ Match any character except a newline or a carriage return, 1 or more times
(?: Non capturing group
[\n\r]|\*\) Match a newline or a carriage return or *)
) Close non capturing group
$ The end of the string
Edit:
Updated according to the comments, this is the final regex:
\/\/[^\n\r]+?(?:\*\)|[\n\r])
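A minimal Python sketch of how that final pattern behaves, assuming Python's re module stands in for whatever engine the asker uses. The names COMMENT_RE, find_comments and sample are made up for the illustration.

```python
import re

# Final pattern from the answer above (forward slashes need no escaping in Python).
COMMENT_RE = re.compile(r"//[^\n\r]+?(?:\*\)|[\n\r])")

def find_comments(source):
    """Return every match of COMMENT_RE, terminator included."""
    return COMMENT_RE.findall(source)

# Hypothetical input: one comment cut short by "*)", one ended by a newline.
sample = "int a; // first *) rest\nint b; // second\n"
comments = find_comments(sample)
print(comments)
```

The lazy +? is what makes the first comment stop at the first *) instead of running on to the end of the line.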

You can use (\/\/)(.+?)(?=[\n\r]|\*\)).
?= means the last group is a positive lookahead. It only asserts that the following characters match the new-line-or-*) pattern, without consuming them. If you want to match the new-line-or-*) pattern as well, just remove ?=.
.+? means lazy matching, i.e. matching as few characters as possible. So for a string such as // something *) something *), it will stop matching before the first *).
Note that this pattern does not match //\n (your previous regex does not either) because + means at least one character. If you want to match such a string, use * instead of + in the regex.
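The three points above can be checked with a short Python sketch. This is only an illustration under the assumption that Python's re behaves like the asker's engine here; LOOKAHEAD_RE, ALLOW_EMPTY_RE and the sample strings are invented names and data.

```python
import re

# Lookahead form from the answer: the terminator is asserted, not consumed.
LOOKAHEAD_RE = re.compile(r"(//)(.+?)(?=[\n\r]|\*\))")
# Same pattern with * instead of +, so an empty comment body is allowed.
ALLOW_EMPTY_RE = re.compile(r"(//)(.*?)(?=[\n\r]|\*\))")

text = "// something *) something *)\n"
first = LOOKAHEAD_RE.search(text)
body = first.group(2)        # lazy match stops before the first "*)"
rest = text[first.end():]    # the terminator is still left in the input

empty_plus = LOOKAHEAD_RE.search("//\n")    # + needs at least one character
empty_star = ALLOW_EMPTY_RE.search("//\n")  # * accepts an empty body
```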
Finally, although you can use a regex to pick out such single-line comments, as Jerry Coffin said in a comment, don't try to parse programming source code with regexes, because the language made up of all legal source code is usually not a regular language.

Extending the answer of #the-fourth-bird: if you need to find a block of consecutive single-line comments, something like this should help (change 3 to the number of lines you want):
^(\/\/.*[\r\n]){3}$
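Here is a rough Python sketch of that idea with the line count as a parameter. It assumes multiline mode (re.MULTILINE in Python) so that ^ and $ work per line; comment_block_re and the sample strings are hypothetical. As far as I can tell, the trailing $ also means the block must be followed by the end of the text or an empty line, which the last check illustrates.

```python
import re

def comment_block_re(n):
    """Build the pattern above for n consecutive // lines."""
    # MULTILINE makes ^ and $ match at line boundaries, not only at string ends.
    return re.compile(r"^(//.*[\r\n]){%d}$" % n, re.MULTILINE)

three = comment_block_re(3)
block = "// a\n// b\n// c\n"

hit = three.search("int x;\n" + block)           # three comment lines, then end of text
miss = three.search("// a\n// b\nint y;\n")      # only two comment lines
followed = three.search(block + "int z;\n")      # code directly after the block
```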
And if you are trying to find a /** */ block comment, a few ways are explained here.

Related

RegEx for matching everything between two special characters [duplicate]

I want to find all characters between 2 special characters. I can't find the solution though, because the text contains new lines and they are not included in the match. It's probably easy, but I can't seem to find the right regex for it.
How do I solve this problem?
The source data is structured like this:
\#(.*)\;
doesn't include new lines and
(?!\#)([\S\s])(?!=\;)
doesn't work either.
It selects everything, but doesn't do the group trick...
Source looks like this:
#first line of text;
#second line of text;
#third line could easy
be on a new line;
#forth etc;
#this could (#hi,#hi,#hi) also
happen though:));
#so.... any idea;
any new line starts with # and every line ends with ;
I see two problems in your regex:
First, you are missing a quantifier on [\S\s], so it will only match one character.
Second, you need a non-greedy quantifier so it doesn't match all the lines at once.
Also, where you wrote (?!#), I guess you meant to match any one of those characters, for which you should place them in a character set like this: [?!#]
You need this regex, where you can capture your text from group 1:
#([\w\W]*?);
Regex Demo
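A small Python sketch of the group-1 capture, using part of the sample data from the question. The greedy variant is included only to show what the second problem looks like; SOURCE, LAZY_RE and GREEDY_RE are names I made up.

```python
import re

# Part of the sample data from the question.
SOURCE = (
    "#first line of text;\n"
    "#third line could easy\n"
    "be on a new line;\n"
    "#this could (#hi,#hi,#hi) also\n"
    "happen though:));\n"
)

LAZY_RE = re.compile(r"#([\w\W]*?);")   # the pattern from the answer
GREEDY_RE = re.compile(r"#([\w\W]*);")  # same pattern without the lazy ?

records = LAZY_RE.findall(SOURCE)   # one entry per "#...;" record
greedy = GREEDY_RE.findall(SOURCE)  # a single match running to the last ";"
```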
And like you attempted, if you want your full match to only select the intended text, you can use lookaround.
Regex Demo with lookarounds so your full match is intended text only
Also, writing [^;]* (which also matches newlines) is usually faster than the lazy .*?, because the engine does not have to test for the terminator after every character, so you should prefer this regex:
(?<=[?!#])[^;]*(?=;)
Regex Demo with best performance
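A quick Python check of the lookaround version, where the full match (not a group) is the wanted text. The sample strings are invented. One thing worth noticing, and this is my reading rather than something the answer states: because the lookbehind class is [?!#], a match can also start after a ? or !, so (?<=#) may be closer to the question's intent.

```python
import re

LOOKAROUND_RE = re.compile(r"(?<=[?!#])[^;]*(?=;)")

text = "#first line of text;\n#third line could easy\nbe on a new line;\n"
matches = LOOKAROUND_RE.findall(text)  # full matches, no "#" or ";" included

# Hypothetical input without any "#": the "!" still satisfies the lookbehind.
stray = LOOKAROUND_RE.findall("no hash! here;")
```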
You just need to modify your first regex a little bit so that it looks like this:
#([\s\S]*?);
. will only match non new line characters. So I replaced it with [\s\S] - the set of whitespaces union the set of non-whitespaces - the set of all characters. If your regex engine has the "single line" option, you can turn that on, and . will match new lines as well.
I also made * lazy. Otherwise it will just be one whole match that matches all the way to the last ;. For more info, see this question.
You don't need to escape the ;.
You have to either use the single line flag /s or add the whitespace class \s as a second alternative to the dot (.). Also, your * quantifier must be lazy/non-greedy, so the whole regex stops at the first ; it finds.
#((?:.|\s)*?); or #(.*?);/s
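In Python the "single line" option is called re.DOTALL (or re.S). A small sketch comparing the two variants, with made-up sample text:

```python
import re

text = "#third line could easy\nbe on a new line;\n#forth etc;\n"

# Without the flag, "." stops at the newline, so the first record is skipped.
no_flag = re.findall(r"#(.*?);", text)
# With DOTALL, "." also matches newlines.
dotall = re.findall(r"#(.*?);", text, flags=re.DOTALL)
# The alternation form needs no flag.
alternation = re.findall(r"#((?:.|\s)*?);", text)
```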

Regex - how do I match this?

I've been trying hard to get this Regex to work, but am simply not good enough at this stuff apparently :(
Regex - Trying to extract sources
I thought this would work... I'm trying to get all of the content where:
It starts with ds://
Ends with either carriage return or line feed
That's it! Essentially I'm going to then do a negative lookahead such that I can remove all content that is NOT conforming to above (in Notepad++) which allows for Regex search/replace.
Search for lines that contain the pattern, and mark them
Search menu > Mark
Find what: ds://.*\R
check Regular expression
Check Mark the lines
Find all
Remove the non marked lines
Search menu > Bookmark
Remove unmarked lines
You don't need to add the \w specifier to look for a word after the ds:// in the look ahead. Removing that and altering the final specification from "zero or one carriage return, then zero or one newline" to "either a carriage return or a newline" in capture group should do it for you:
(?=ds:\/\/).*(?:\r|\n)
Update: Carriage return or Line feed group does not need to be captured.
Update 2: The following regex will actually work for your proposed use case in the comments, matching everything but the pattern you described in the question.
^(?:(?!ds:\/\/.*(?:\r|\n)).)*$
Your regex (?=ds:\w+).*\r?\n? does not match because the content contains ds:// and \w does not match a forward slash. To make your regex work you could change it to:
(?=ds://\w+).*\r?\n? which can be shortened to ds://.*\R?
Note that you don't have to escape the forward slash.
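The \w point can be reproduced in Python. This is only a sketch: Python's re module has no \R, so the shortened pattern is shown without it, and the sample line is made up.

```python
import re

line = "ds://server/share\n"  # hypothetical input line

# \w+ cannot match "/", so the lookahead fails and nothing matches.
broken = re.search(r"(?=ds:\w+).*\r?\n?", line)
# Including the slashes in the lookahead fixes it.
fixed = re.search(r"(?=ds://\w+).*\r?\n?", line)
# Shortened form (without the trailing line break, since \R is unavailable here).
short = re.search(r"ds://.*", line)
```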
If you want to do a find and replace to keep the lines that contain ds:// you could use a negative lookahead:
Find what
^(?!.*ds://).*\R?
Replace with
Leave empty
Explanation
^ Start of the string
(?!.*ds://) Negative lookahead to assert the string does not contain ds://
.* Match any character 0+ times
\R? An optional unicode newline sequence to also match the last line if it is not followed by a newline
See the Regex demo
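The same find-and-replace can be sketched outside Notepad++ in Python. Two assumptions to flag: Python's re lacks \R, so (?:\r\n|\n|\r)? is swapped in for \R?, and re.MULTILINE is needed so that ^ matches at every line start. DROP_LINE_RE, keep_ds_lines and the sample text are invented names.

```python
import re

# (?:\r\n|\n|\r)? stands in for the answer's \R?, which Python's re does not support.
DROP_LINE_RE = re.compile(r"^(?!.*ds://).*(?:\r\n|\n|\r)?", re.MULTILINE)

def keep_ds_lines(text):
    """Delete every line that does not contain ds:// (replace it with nothing)."""
    return DROP_LINE_RE.sub("", text)

sample = "header\nds://a/b\nnoise\nds://c\n"
kept = keep_ds_lines(sample)
print(kept)
```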
Here you go, Andrew:
Regex: ds:\/\/.*
Link: https://regex101.com/r/ulO9GO/2
Let me know if you have any questions.

Regex to match Zero and Comma

I'm looking for a regex string that will capture the following text:
0, ,0,
I've tried a few variations of this, but to no avail:
^[0,]+$
^[0,]
Any advice would be greatly appreciated.
Edited:
This will be used within another program that does regex pattern matching using Perl. The program reads a file with a list of entries within it. Using different profiles within the program I need to pick out entries that look like the following:
0, ,0,
These entries could also read like this:
1, ,0,
So the ideal regex I'm looking for would check: "Does it start with a 1 or 0, immediately followed by a comma, then a space, then a comma, then a number (0-9), and end with a comma?"
Further examples:
0, ,8,
1, ,5,
I hope that helps to clarify the request.
Thanks,
(?:[0\s]+,)+
there is a space in your string, so you need \s to match it.
Your question doesn't mention a particular regex implementation, so the answers you have received might not work for you. (Lesson: always specify the environment in which you plan to use this.)
In any reasonably modern regex variant,
[0,]+
matches a sequence of one or more characters, each of which is either 0 or a comma. A character class such as [abc] matches a single character which is one of the enumerated characters inside the square brackets, and the quantifier + says to match the previous expression as many times as possible, but at least once.
Matching and capturing are separate concepts in some implementations. Perhaps you want to add parentheses around this regex to specify that you want to capture, not just match, the strings in the input which this regular expression describes (and in some implementations, you want to add a flag, commonly g, to say that you want all matches, not just the first).
Regex: ^(?:[0 ],)+$ or ^(?:[0\s],)+$
Details:
^ asserts position at start of the string
(?:) Non-capturing group
[] Match a single character present in the list
+ Matches between one and unlimited times
$ asserts position at the end of the string
\s matches any whitespace character
Regex demo
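A short Python check of why the original attempt fails on the sample entry and the grouped version does not. ORIGINAL_RE and PAIRS_RE are names chosen for the illustration.

```python
import re

ORIGINAL_RE = re.compile(r"^[0,]+$")     # attempt from the question
PAIRS_RE = re.compile(r"^(?:[0 ],)+$")   # a "0 or space" followed by a comma, repeated

entry = "0, ,0,"

original_hit = ORIGINAL_RE.search(entry)  # the space is not in [0,]
pairs_hit = PAIRS_RE.search(entry)
other_hit = PAIRS_RE.search("1, ,0,")     # a leading 1 is not covered by this pattern
```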
You need to capture spaces too with, for instance, \s:
^[0,\s]+$
\s will match any whitespace character and is equivalent to [\r\n\t\f\v ].
See result in action here: https://regex101.com/r/g3faWA/1
You can also remove the line anchors (^ and $) if you want to match the parts of the line that contain 0 and commas even if the line contains other characters. That would give:
[0,\s]+
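The edited question describes a stricter rule: a 1 or 0, a comma, a space, a comma, one digit, and a closing comma. A possible sketch of that rule in Python follows; the pattern is my own reading of that description rather than something taken from the answers, and ENTRY_RE and is_entry are hypothetical names.

```python
import re

# 1 or 0, comma, space, comma, one digit 0-9, comma, and nothing else on the line.
ENTRY_RE = re.compile(r"^[01], ,[0-9],$")

def is_entry(line):
    """Return True if the line has the shape described in the edited question."""
    return ENTRY_RE.search(line) is not None

checks = [is_entry(s) for s in ["0, ,0,", "1, ,0,", "0, ,8,", "1, ,5,"]]
```

Since the target program is said to use Perl-style matching, the same pattern text should carry over, but that is an assumption I have not verified against the program itself.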

Match pattern anywhere in string?

I want to match the following pattern:
Exxxx49 (where x is a digit 0-9)
For example, E123449abcdefgh, abcdefE123449987654321 are both valid. I.e., I need to match the pattern anywhere in a string.
I am using:
^*E[0-9]{4}49*$
But it only matches E123449.
How can I allow any amount of characters in front or after the pattern?
Remove the ^ and $ to search anywhere in the string.
In your case the * are probably not what you intended (^* tries to repeat an anchor, and 49* means a 4 followed by zero or more 9s); E[0-9]{4}49 should suffice. This will find an E, followed by four digits, followed by a 4 and a 9, anywhere in the string.
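For illustration, here is how the unanchored pattern behaves with a search-style call in Python (re.search looks anywhere in the string). The third sample string is made up to show a non-match.

```python
import re

CODE_RE = re.compile(r"E[0-9]{4}49")

samples = ["E123449abcdefgh", "abcdefE123449987654321", "E12349"]

found = []
for s in samples:
    m = CODE_RE.search(s)  # search scans the whole string for the pattern
    found.append(m.group(0) if m else None)
```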
I would go for
^.*E[0-9]{4}49.*$
EDIT:
since it fulfills all the requirements stated by the OP:
"[match] Exxxx49 (where x is digit 0-9)"
"allow for any amount of characters in front or after pattern"
It will match
^.* everything before the pattern, starting at the beginning of the line
E[0-9]{4}49 the requested pattern
.*$ everything after the pattern, up to the end of the line
Your original regex had a regex pattern syntax error at the first *. Fix it and change it to this:
.*E\d{4}49.*
This pattern is for engines whose matching is anchored to the whole input, such as Java's matches(), since you didn't specify a language.
.* matches any number of characters. As it surrounds the pattern, this will match the entire string as long as the pattern occurs somewhere in it.
Here is a regex demo!
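Python's closest equivalent to an implicitly anchored match is re.fullmatch, so it can serve as a stand-in to show why the surrounding .* matters in that setting. This is a sketch in Python, not Java, and the sample string comes from the question.

```python
import re

s = "abcdefE123449987654321"

# fullmatch requires the pattern to cover the entire string.
bare_full = re.fullmatch(r"E\d{4}49", s)        # fails: extra text on both sides
padded_full = re.fullmatch(r".*E\d{4}49.*", s)  # succeeds: .* absorbs the rest
```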
Just simply use this:
E[0-9]{4}49
How do I allow for any amount of characters in front or after pattern? but it only matches E123449
Use global flag /E\d{4}49/g if supported by the language
OR
Or try a capturing group, (E\d{4}49)+, formed by enclosing the pattern in parentheses (...)
Here is online demo

Why do I get successful but empty regex matches?

I'm searching the pattern (.*)\\1 on the text blabl with regexec(). I get successful but empty matches in regmatch_t structures. What exactly has been matched?
The regex .* can match successfully a string of zero characters, or the nothing that occurs between adjacent characters.
So your pattern is matching zero characters in the parens, and then matching zero characters immediately following that.
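So if your regex was /f(.*)\1/, it would match just the 'f' in a string like "fab", with the group capturing the empty string between the 'f' and the 'a'. (On "foo" the group can capture one 'o' and repeat it, so the whole of "foo" matches.)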
You might try using .+ instead of .*, as that matches one or more instead of zero or more. (Using .+ you should match the 'oo' in 'foo')
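The empty match is easy to reproduce in Python. Note the assumption: Python's re is a backtracking engine and not the POSIX regexec() from the question, so this only illustrates the same reasoning rather than the regmatch_t contents.

```python
import re

# (.*) can capture nothing, and \1 then repeats that nothing: an empty match at 0.
star = re.search(r"(.*)\1", "blabl")
star_span = star.span()
star_group = star.group(1)

# With .+ the group must hold at least one character that repeats immediately.
plus_blabl = re.search(r"(.+)\1", "blabl")  # "bl" repeats, but not back to back
plus_foo = re.search(r"(.+)\1", "foo")
```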
\1 is the backreference typically used for replacement later or when trying to further refine your regex by getting a match within a match. You should just use (.*), this will give you the results you want and will automatically be given the backreference number 1. I'm no regex expert but these are my thoughts based on my limited knowledge.
As an aside, I always revert back to RegexBuddy when trying to see what's really happening.
\1 is the "re-match" instruction. The question is, do you want to re-match immediately (e.g., BLABLA)
/(.+)\1/
or later (e.g., BLAahemBLA)
/(.+).*\1/
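Both variants can be tried in Python (again a stand-in for regexec(), so treat it as a sketch of the idea):

```python
import re

# Immediate re-match: the captured text must repeat right away.
adjacent = re.search(r"(.+)\1", "BLABLA")
# Later re-match: .* allows anything between the capture and its repeat.
later = re.search(r"(.+).*\1", "BLAahemBLA")
# The immediate form finds nothing when other text sits in between.
adjacent_gap = re.search(r"(.+)\1", "BLAahemBLA")
# The text from the question does match the "later" form.
question_text = re.search(r"(.+).*\1", "blabl")
```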