How to find only patterns that are not commented? - regex

I have some code where the same condition ABC is used as part of an if clause, at the end of it (as a comment) and in obsolete sections (which I do not want to remove yet). An example could look like this:
if (ABC) //this is the only line that should be matched, this comment should not change the outcome of the search
{
lots of code
} // if (ABC)
//if (ABC)
// {
// lots of obsolete code
// } // if (ABC)
How can I tell vim to search for the pattern ABC only where it is not commented out via // occurring before it on the same line?
^.*\(\/\/\)\@!.*ABC did not work, because the .* are also fulfilled by // and ^\(\/\/\)\@!*ABC complains about "Nested *".
Any ideas?
Thank you

For the example in your question, this line works:
/\v(\/\/.*)@<!ABC
or without very magic:
/\(\/\/.*\)\@<!ABC
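Outside Vim, the same idea (report ABC only when no // precedes it on the line) can be sketched in Python. This is only an analogue under stated assumptions, not the Vim pattern itself: Python's re module does not accept a variable-width lookbehind, so the sketch cuts each line at the first // and searches what is left. The function name uncommented_matches is hypothetical, and like the Vim pattern it treats a // inside a string literal as the start of a comment.

```python
import re

def uncommented_matches(text, pattern=r"ABC"):
    """Return 1-based line numbers where `pattern` occurs before any `//`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        code_part = line.split("//", 1)[0]  # everything before the first //
        if re.search(pattern, code_part):
            hits.append(lineno)
    return hits
```

Called on the example from the question, only the first line should be reported.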

Error while compiling regex function, why am I getting this issue?

My RAKU Code:
sub comments {
if ($DEBUG) { say "<filtering comments>\n"; }
my @filteredtitles = ();
# This loops through each track
for @tracks -> $title {
##########################
# LAB 1 TASK 2 #
##########################
## Add regex substitutions to remove superflous comments and all that follows them
## Assign to $_ with smartmatcher (~~)
##########################
$_ = $title;
if ($_) ~~ s:g:mrx/ .*<?[\(^.*]> / {
# Repeat for the other symbols
########################## End Task 2
# Add the edited $title to the new array of titles
@filteredtitles.push: $_;
}
}
# Updates @tracks
return @filteredtitles;
}
Result when compiling:
Error Compiling! Placeholder variable '@_' may not be used here because the surrounding block doesn't take a signature.
Is there something obvious that I am missing? Any help is appreciated.
So, in contrast with @raiph's answer, here's what I have:
my @tracks = <Foo Ba(r B^az>.map: { S:g / <[\(^]> // };
Just that. Nothing else. Let's dissect it, from the inside out:
This part: / <[\(^]> / is a regular expression that matches one character, as long as it is an open parenthesis (written \() or a caret (^). Characters placed inside the angle-bracket/square-bracket combination <[...]> form an enumerated character class.
Then, S introduces the non-destructive substitution, i.e., a quoting construct that performs regex-based substitutions on the topic variable $_ without modifying it; it returns the modified value instead. In the code above, S:g applies the adverb :g or :global (see the global adverb in the adverbs section of the documentation), which for a substitution means "make this substitution as many times as possible". The final / marks the end of the replacement text, and since it is adjacent to the second /, the replacement is empty. That means that
S:g / <[\(^]> //
means "please return the contents of $_, but modified in such a way that all its characters matching the regex <[\(^]> are deleted (substituted for the empty string)"
At this point, I should emphasize that regular expressions in Raku are really powerful, and that reading the entire regex documentation page (and probably the best practices and gotchas page too) is a good idea.
Next, the .map method can be applied to any Iterable (List, Array and the like) and returns a sequence built from each element of the Iterable, transformed by the Code passed to it. So, something like:
@x.map({ S:g / foo /bar/ })
essentially means "please return a Sequence of every item in @x, modified by substituting any appearance of the substring foo with bar" (nothing is altered in @x). The Raku documentation on sequences and iterables is a good place to start learning about them.
Finally, my one-liner
my @tracks = <Foo Ba(r B^az>.map: { S:g / <[\(^]> // };
can be translated as:
I have a List with three string elements
Foo
Ba(r
B^az
(This would be a placeholder for your "list of titles"). Take that list and generate a second one, that contains every element on it, but with all instances of the chars "open parenthesis" and "caret" removed.
Ah, and store the result in the variable @tracks (that has my scope)
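For readers more familiar with Python, the same "map a non-destructive global substitution over a list" idea can be sketched as follows. This is an analogue for comparison, not Raku: re.sub plays the role of S:g, the character class [(^] corresponds to <[\(^]>, and the variable names are placeholders.

```python
import re

titles = ["Foo", "Ba(r", "B^az"]  # stands in for the real list of titles

# re.sub returns a new string and leaves the input untouched,
# much like the non-destructive S/// form.
tracks = [re.sub(r"[(^]", "", title) for title in titles]
```

Inside a Python character class the caret is literal unless it comes first, which is why it is placed after the parenthesis here.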
Here's what I ended up with:
my @tracks = <Foo Ba(r B^az>;
sub comments {
    my @filteredtitles;
    for @tracks -> $_ is copy {
        s:g / <[\(^]> //;
        @filteredtitles.push: $_;
    }
    return @filteredtitles;
}
The is copy ensures the variable set up by the for loop is mutable.
The s:g/...//; is all that's needed to strip the unwanted characters.
One thing no one can help you with is the error you reported. I currently think you just got confused.
Here's an example of code that generates that error:
do { @_ }
But there is no way the code you've shared could generate that error because it requires that there is an @_ variable in your code, and there isn't one.
One way I can help in relation to future problems you may report on StackOverflow is to encourage you to read and apply the guidance in Minimal Reproducible Example.
While your code did not generate the error you reported, it will perhaps help you if you know about some of the other compile time and run time errors there were in the code you shared.
Compile-time errors:
You wrote s:g:mrx. That's invalid: Adverb mrx not allowed on substitution.
You missed out the third slash of the s///. That causes mayhem (see below).
There were several run-time errors, once I got past the compile-time errors. I'll discuss just one, the regex:
.*<?[...]> will match any sub-string with a final character that's one of the ones listed in the [...], and will then capture that sub-string except without the final character. In the context of an s:g/...// substitution this will strip ordinary characters (captured by the .*) but leave the special characters.
This makes no sense.
So I dropped the .*, and also the ? from the special character pattern, changing it from <?[...]> (which just tries to match against the character, but does not capture it if it succeeds) to just <[...]> (which also tries to match against the character, but, if it succeeds, does capture it as well).
A final comment is about an error you made that may well have seriously confused you.
In a nutshell, the s/// construct must have three slashes.
In your question you had code of the form s/.../ (or s:g/.../ etc), without the final slash. If you try to compile such code the parser gets utterly confused because it will think you're just writing a long replacement string.
For example, if you wrote this code:
if s/foo/ { say 'foo' }
if m/bar/ { say 'bar' }
it'd be as if you'd written:
if s/foo/ { say 'foo' }\nif m/...
which in turn would mean you'd get the compile-time error:
Missing block
------> if m/⏏bar/ { ... }
expecting any of:
block or pointy block
...
because Raku(do) would have interpreted the part between the second and third /s as the replacement double quoted string of what it interpreted as an s/.../.../ construct, leading it to barf when it encountered bar.
So, to recap, the s/// construct requires three slashes, not two.
(I'm ignoring syntactic variants of the construct such as, say, s [...] = '...'.)

Having a problem with multiple hyphens in a multistring regex match in Perl

I am downloading a webpage and converting it into a string using LWP::Simple. When I copy the results into an editor, I find multiple instances of the pattern I'm looking for, "data-src-hq".
I eventually want to do something more complex, but I am taking baby steps so I can properly learn how to use regex. I started off by just matching "data-src-hq" with the following code:
if($html =~ /data-src-hq/ism)
{
print "match\n";
}
else
{
print "nope\n";
}
My code returns "nope". However, if I modify the pattern search to just "data" or "data-src" I do get a match. The same happens no matter how I use and combine the string and multiline modifier.
My understanding is that a hyphen is not a special character unless it's within brackets, am I missing something simple?
How to fix this?
You are likely getting two outputs, one of match and one of nope. Your code is missing the keyword else:
if($html =~ /data-src-hq/ism)
{
print "match\n";
}
{
print "nope\n";
}
Should be:
if($html =~ /data-src-hq/ism)
{
print "match\n";
}
else {
print "nope\n";
}
Otherwise, your code is fine and works to identify whether data-src-hq exists in $html.
So why does your existing code output nope?
That's because {} is a basic block (see Basic BLOCKs in Perl's documentation). An excerpt from the documentation:
A BLOCK by itself (labeled or not) is semantically equivalent to a
loop that executes once. Thus you can use any of the loop control
statements in it to leave or restart the block. (Note that this is NOT
true in eval{}, sub{}, or contrary to popular belief do{} blocks,
which do NOT count as loops.) The continue block is optional.
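On the hyphen itself: outside a bracketed character class it is an ordinary literal, and that holds in Python's re module as well as in Perl. A minimal sketch, assuming a made-up html string (not the asker's page), with flags mirroring /ism:

```python
import re

# Hypothetical sample input; the real page content is not shown in the question.
html = '<img DATA-SRC-HQ="a.jpg"\n data-src-hq="b.jpg">'

# IGNORECASE, DOTALL and MULTILINE correspond to Perl's /i, /s and /m.
pattern = re.compile(r"data-src-hq", re.IGNORECASE | re.DOTALL | re.MULTILINE)
matches = pattern.findall(html)
```

Both attribute names are found, so the hyphens need no escaping.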

Regex to find the last "}" on last line

I am having trouble writing a regex for the last "}" in the last line in a file.
For example, if this is the file:
/*
blas
*/
import bla;
public class bla {
...
public void bla (blas){
...
}
...
} //this is the "}"
Can anyone help?
It's preferable if the solution doesn't rely on indentation or lack of additional comments at the end (I think it's fair to assume that even if there are comments at the end then they don't contain "}"), but right now I'm prepared to take anything.
The best I could do was: ^}$ but that relies on proper indentation.
Thanks a lot in advance.
No need for multi-line modifiers, just use this: }(?=[^}]*$)
This finds the last } in the file.
If you want to force a match only on the last line, it's (?m)^.*(}).*\z
or, if it should be the last visible line (ignoring trailing whitespace), (?m)^.*(}).*\s*\z
Simple as halwa!
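The lookahead idea can be tried out in Python as a rough sketch. Assumptions: the source string is an invented stand-in for the file, and \Z (Python's end-of-string anchor) is used in place of $ so a trailing newline does not matter.

```python
import re

# Invented sample file; trailing comment deliberately contains no "}".
source = """public class bla {
    public void bla (blas){
    }
} // end of class
"""

# Match a "}" that is followed only by non-"}" characters up to the end.
last_brace = re.search(r"\}(?=[^}]*\Z)", source)
```

last_brace.start() then gives the offset of the final closing brace.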
Use this:
}.*?$
Example here: https://regex101.com/r/oM3tW3/1
If you want to return the match, use:
(}).*?$

Regex replace special comments

So a few months ago one of my colleagues left. He used to comment all his code this way:
//----------------------------
// COMMENT
//----------------------------
private void func()...
So each comment, instead of using 1 line at most, uses 4 lines (including the line break), which drives me crazy. I'm trying to create a regex with which I can safely remove this decoration and replace it. The above code should look like this:
// COMMENT
private void func()...
I thought of just removing each '//----------------------------' line, but that leaves me with many empty lines as well as a line break between the comment and the line it describes. Any help will be much appreciated.
EDIT
Note one: Our project is written in Visual Studio.
Note two: Some comments may contain more than 1 line of comment, example:
//----------------------------
// LINE 1 COMMENT
// LINE 2 COMMENT
//----------------------------
This expression matches your case and any 3 lines of comments where the first and the last ones have trailing -:
((\s|\t)*\/{2,})(.*[-]+)(\r?\n)((\1(.*)\4?)+)\4\1\3\4?
And then you can replace it with:
\5 (or $5)
EDIT: for multi-line comments.
Here's a Regular Expression that you can use to strip out the excess (decorative) comment lines and convert these bulky comments into one-liners.
It also supports indentation and multi-line comments using this style:
//----------------------------
// LINE 1 COMMENT
// LINE 2 COMMENT
//----------------------------
private void func()...
Find:
(( |\t)*?\r\n)?( |\t)*?//-+(\r\n( |\t)*?// .+)+\r\n( |\t)*?//-+\r\n
Replace With:
\4
(Replace \4 with $4 if the replace failed)
Good luck!
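If running a script over the files is an option, the same cleanup can be sketched in Python. Note the assumptions: this uses its own pattern written for Python's re module rather than the Find expression above, the name strip_rulers is hypothetical, and a "ruler" is taken to be any line consisting of // followed only by dashes. Capturing the whole run of comment lines in a single group keeps every line of a multi-line comment.

```python
import re

DECORATED = re.compile(
    r"^[ \t]*//-+[ \t]*\r?\n"        # opening ruler line
    r"((?:[ \t]*//(?!-).*\r?\n)+)"   # one or more real comment lines (kept)
    r"[ \t]*//-+[ \t]*\r?\n",        # closing ruler line
    re.MULTILINE,
)

def strip_rulers(source):
    """Drop the dashed ruler lines around a comment block, keep the comment."""
    return DECORATED.sub(r"\1", source)
```

Indentation and line endings of the kept comment lines are preserved because they are part of the captured group.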

Jflex capturing match

I'm using JFlex and I want to match something like:
|MATCHED|NOTMACHED|
|NOTMACHED|NOTMACHED|NOTMACHED|
<newline>
|MATCHED|NOTMACHED|
|NOTMACHED|NOTMACHED|NOTMACHED|
my pattern:
FIXTURE_NAME_PATTERN=[^\|]\n\|[A-Za-z]+\|
<YYINITIAL> {FIXTURE_NAME}
{ yybegin(YYINITIAL); return FitnesseTypes.FIXTURE_NAME; }
But it matches the pipes "|" as well as the previous lines (whitespace). I tried to use capturing but I can't get it working. Any suggestions?
You'll want to use states. Define a state such as
%state AFTER_NEWLINE
in your state definitions.
Then, in your lexical definitions, you would have something like this:
<YYINITIAL>\n
{ yybegin(AFTER_NEWLINE); }
<AFTER_NEWLINE>\|
{ /*Do whatever you want with pipes*/ }
<AFTER_NEWLINE>[A-Za-z]+
{ yybegin(YYINITIAL); return FitnesseTypes.FIXTURE_NAME; }
//Any other lexical definitions you might need
What this basically does is every time a new line is hit, it sets the state to AFTER_NEWLINE. Then, it matches the next time a bunch of letters show up in a row, and sets the state back to YYINITIAL. Pipes are thrown away.
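The state-switching idea can also be illustrated outside JFlex. The following Python sketch is a line-level analogue, not generated-lexer code: it assumes tables are separated by a blank line (the <newline> in the question) and that the wanted token is the first cell of the first row of each table. The function name fixture_names and the flag at_table_start are hypothetical; the flag plays the role of the AFTER_NEWLINE state.

```python
def fixture_names(text):
    """Collect the first cell of the first row after each blank line."""
    names = []
    at_table_start = True  # analogous to being in AFTER_NEWLINE
    for line in text.splitlines():
        if not line.strip():
            at_table_start = True   # blank line: next row starts a table
        elif at_table_start:
            cells = [cell for cell in line.split("|") if cell]
            if cells:
                names.append(cells[0])  # pipes themselves are discarded
            at_table_start = False  # back to the default state
    return names
```

Rows that follow the first row of a table are skipped because the flag has already been reset.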