SonarQube: Ignore issues in blocks not working with regex - regex

I followed the instructions in question
Ignore Issues in Blocks
but since it didn't work, I suspect it may be something related with the regex rather than the Sonar configurations.
I have some code that is generated by EMF. I changed the code generation templates to also add //end-generated tag at the end of block, which looks like this:
/*
* #generated
*/
public class MyGeneratedClass implements Enumerator {
/*
* #generated
*/
public void myGeneratedMethod() {
// some code that SonarQube doesn't like
} //end-generated
}
The idea is that the regex must only match method blocks. So the START-BLOCK must match text #generated as long it's not followed by 'class', 'enum' or 'interface' word after the comment termination.
I created this regex:
For start block:
#generated\h*\s[\s\*]*\/(?!\s*[^\s]*\s*(class|enum|interface)\s)
For end block:
\/\/end-generated
This works in my tests using a simple code with Java Pattern class, which returns true for class sample above:
public static void main(String[] args) throws IOException {
String regex = "#generated\\h*\\s[\\s\\*]*\\/(?!\\s*[^\\s]*\\s*(class|enum|interface)\\s)";
String text = new String(Files.readAllBytes(Paths.get("test.txt")));
Matcher matcher = Pattern.compile(regex).matcher(text);
System.out.println(matcher.find());
}
However when I add it to SonarQube configurations, it doesn't work, issues found in the generated method are still reported by SonarQube.
I added the regex in:
SonarQube » Settings » Exclusions » Issues » Ignore Issues in Blocks
I also tried with escaped version of regex, and result is the same:
#generated\\h*\\s[\\s\\*]*\\/(?!\\s*[^\\s]*\\s*(class|enum|interface)\\s)
Also adding these settings directly into Sonar properties didn't work, as reported already in the question I mentioned above:
sonar.issue.ignore.block=rule1
sonar.issue.ignore.block.rule1.beginBlockRegexp=#generated\\h*\\s[\\s\\*]*\\/(?!\\s*[^\\s]*\\s*(class|enum|interface)\\s)
sonar.issue.ignore.block.rule1.endBlockRegexp=\\/\\/end-generated
I'm using SonarQube server 5.1.2 and running the analysis from Sonar-ant-task 2.3.
I'm wondering if it may be something wrong with the regex that makes SonarQube not able to match it, or maybe some limitation in SonarQube regex processing?
EDIT: I changed the regex for something more simple to match just the '#generated' word, and it worked. But if I put '#generated[\n\r]' to force to only match if there's a new line after #generated, then it doesn't work anymore.
It seems SonarQube doesn't support basic things such as new line characters. Can someone confirm this?

I confirm that all patterns used to exclude issues are applied line per line. In the end it is always translated to "exclude issues from line X to line Y". I agree this is not perfect (especially now we have precise issue location) but this is an "historical" feature. We will probably not invest time in it, since our mood is to avoid complex configuration. The ideal solution for your use case would be to have https://jira.sonarsource.com/browse/SONARJAVA-71 implemented.

Since SonarQube applies regex per line, I came up with a different solution for this use-case:
I use the start-block regex per line: (?m)#generated$ and then exclude class|enum|interface by adding this pattern to the end-block regex, together with end-generated.
Final configuration looks like this:
sonar.issue.ignore.block=rule1
sonar.issue.ignore.block.rule1.beginBlockRegexp=(?m)#generated$
sonar.issue.ignore.block.rule1.endBlockRegexp=\/\/\h*end-generated|\sclass\s|\senum\s|\sinterface\s

Related

How to match everything inside the first pair of square brackets

I'm trying to create a regular expression in sieve. The implementation of sieve that I'm using is Dovecot Pigeonhole
I'm subscribed to github project updates and I receive emails from github with the subject in the format that looks like this:
Re: [Opserver] Create issues on Jira from Exception details page (#77)
There is a project name in square bracket included in the subject line. Here is the relevant part of my sieve script:
if address "From" "notifications#github.com" {
if header :regex "subject" "\\[(.*)\\]" {
set :lower :upperfirst "repository" "${1}";
fileinto :create "Subscribtions.GitHub.${repository}"; stop;
} else {
fileinto :create "Subscribtions.GitHub"; stop;
}
}
As you can see from the above, I'm moving the messages to appropriate project IMAP folders. So the message with the subject above will end up in Subscribtions.Github.Opserver
Unfortunately, there is one small problem with this script. If someone adds square brackets in the title of their github issue, the filter breaks. For example if the subject is:
[Project] [Please look at it] - very weird issue
The above filter will move the message to folder Subscribtions.Github.Project] [please look at it which is completely undesirable. I'd like it to be moved to Subscribtions.Github.Project anyway.
This happens because by default regular expressions are greedy. So they match the longest possible match. However when I try to fix it the usual way changing "\\[(.*)\\]" to "\\[(.*?)\\]" nothing seems to change.
How do I write this regular expression so that it acts as desired?
The answer is to change "\\[(.*)\\]" to "\\[([^]]*)\\]".
By reading regex spec linked in the question we disvover that POSIX regular expression are used. Unfortunately those do not support non-greedy matches.
However there is a work around in this particular case, given above.

Maven replacer plugin - repeat while matches exist

I am using the maven replacer plugin and I've run into a situation where I have a regular expression that matches across lines which I need to run on the input file until all matches have been replaced. The configuration for this expression looks like this:
<regexFlags>
<regexFlag>DOTALL</regexFlag>
</regexFlags>
<replacements>
<replacement>
<token>\#([^\n\r=\#]+)\#=([^\n\r]*)(.*)(\#default\.\1\#=[^\n\r]*)(.*)</token>
<value>#$1#=$2$3$5</value>
<replacement>
<replacements>
The input could look like this:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#default.a.b.c#=QQQ
#asdfasd.fasdfs.asdfa#=23423
#default.h.i.j#=234
#default.RR.TT#=393993
and I want the output to look like this:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#asdfasd.fasdfs.asdfa#=23423
#default.RR.TT#=393993
The intention is to re-write the file, but without the tokens with a #default prefix, where another token without the prefix has already been defined.
#default.a.b.c#=QQQ and #default.h.i.j#=234 have been removed from the output because other tokens already contains a.b.c and h.i.j.
The current problem I have is that the replacer plugin only replaces the first match, so my output looks like this:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#asdfasd.fasdfs.asdfa#=23423
#default.h.i.j#=234
#default.RR.TT#=393993
Here, #default.a.b.c=QQQ is gone, which is correct, but #default.h.i.j#=234 is still present.
If I were writing this in code, I think I could probably just loop while attempting to match on the entire output, and break when there are no matches. Is there a way to do this with the replacer plugin?
Edit: I may have over simplified my example. A more realistic one is:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#default.a.b.c#=QQQ
#asdfasd.fasdfs.asdfa#=23423
#default.h.i.j#=234
#default.RR.TT#=393993
#x.y.z#=0
#default.q.r.s#=1
#l.m.n#=8.3
#q.r.s#=78
#blah.blah.blah#=blah
This shows that it's possible for a default.x.x.x=y to precede a x.x.x=y token (as #default.q.r.s#=1 preceedes #q.r.s#=78`), my prior example wasn't clear about this. I do actually have an expression to capture this, it looks a bit like this:
\#default\.([^\n\r=#|]+)#=([^\n\r|]*)(.*)#\1#=([^\n\r|]*)(.*)
I know line separators are missing from this even though they were in the other one - I was experimenting with removing all line separators and treating it as a single line but that hasn't helped. I can resolve this problem simply by running each replacement multiple times by copying and pasting the configurations a few times, but that is not a good solution and will fail eventually.
I don't believe you could solve this problem as is, a work-around is to reverse the order of the file top to bottom, perform lookahead regex and then reverse the result order
pattern = #default\.(.*?)#[^\r\n]+(?=[\s\S]*#\1#) Demo
another way (depending on the capabilities of "Maven") is to run this pattern
#(.*)(#[\s\S]*)#default\.\1.*
and replace with #$1$2 Demo in a loop until there are no matches
then run this pattern
#default\.(.*)#.*(?=[\s\S]*\1)
and replace with nothing Demo in a loop until there are no matches
It doesn't look like the replacer plugin can actually do what I want. I got around this by using regular expressions to build multiple filter files, and then applying them to the resource files.
My original goal had been to use regular expressions to build a single, clean, and tidy filter file. In the end, I discovered that I was able to get away with just using multiple filters (not as clean or tidy) and apply them in the correct order.

Using Regex to find function containing a specific method or variable

This is my first post on stackoverflow, so please be gentle with me...
I am still learning regex - mostly because I have finally discovered how useful they can be and this is in part through using Sublime Text 2. So this is Perl regex (I believe)
I have done searching on this and other sites but I am now genuinely stuck. Maybe I am trying to do something that can't be done
I would like to find a regex (pattern) that will let me find the function or method or procedure etc that contains a given variable or method call.
I have tried a number of expressions and they seem to get part of the way but not all the way. Particularly when searching in Javascript I pick up multiple function declarations instead of the one nearest to the call/variable that I am looking for.
for example:
I am looking for the function that calls the method save data()
I have learnt, from this excellent site that I can use (?s) to switch . to include newlines
function.*(?=(?s).*?savedata\(\))
however, that will find the first instance of the word function and then all the text unto and including savedata()
if there are multiple procedures then it will start at the next function and repeat until it gets to savedata() again
function(?s).*?savedata\(\) does something similar
I have tried asking it to ignore the second function (I believe) by using something like:
function(?s).*?(?:(?!function).*?)*savedata\(\)
But that doesn't work.
I have done some investigation with look forwards and look backwards but either I am doing it wrong (highly possible) or they are not the right thing.
In summary (I guess), how do I go backwards, from a given word to the nearest occurrence of a different word.
At the moment I am using this to search through some javascript files to try and understand the structure/calls etc but ultimately I am hoping to use on c# files and some vb.net files
Many thanks in advance
Thanks for the swift responses and sorry for not added an example block of code - which I will do now (modified but still sufficient to show the issue)
if I have a simple block of javascript like the following:
function a_CellClickHandler(gridName, cellId, button){
var stuffhappenshere;
var and here;
if(something or other){
if (anothertest) {
event.returnValue=false;
event.cancelBubble=true;
return true;
}
else{
event.returnValue=false;
event.cancelBubble=true;
return true;
}
}
}
function a_DblClickHandler(gridName, cellId){
var userRow = rowfromsomewhere;
var userCell = cellfromsomewhereelse;
//this will need to save the local data before allowing any inserts to ensure that they are inserted in the correct place
if (checkforarangeofthings){
if (differenttest) {
InsSeqNum = insertnumbervalue;
InsRowID = arow.getValue()
blnWasInsert = true;
blnWasDoubleClick = true;
SaveData();
}
}
}
running the regex against this - including the second one that was identified as should be working Sublime Text 2 will select everything from the first function through to SaveData()
I would like to be able to get to just the dblClickHandler in this case - not both.
Hopefully this code snippet will add some clarity and sorry for not posting originally as I hoped a standard code file would suffice.
This regex will find every Javascript function containing the SaveData method:
(?<=[\r\n])([\t ]*+)function[^\r\n]*+[\r\n]++(?:(?!\1\})[^\r\n]*+[\r\n]++)*?[^\r\n]*?\bSaveData\(\)
It will match all the lines in the function up to, and including, the first line containing the SaveData method.
Caveat:
The source code must have well-formed indentation for this to work, as the regex uses matching indentations to detect the end of functions.
Will not match a function if it starts on the first line of the file.
Explanation:
(?<=[\r\n]) Start at the beginning of a line
([\t ]*+) Capture the indentation of that line in Capture Group 1
function[^\r\n]*+[\r\n]++ Match the rest of the declaration line of the function
(?:(?!\1\})[^\r\n]*+[\r\n]++)*? Match more lines (lazily) which are not the last line of the function, until:
[^\r\n]*?\bSaveData\(\) Match the first line of the function containing the SaveData method call
Note: The *+ and ++ are possessive quantifiers, only used to speed up execution.
EDIT:
Fixed two minor problems with the regex.
EDIT:
Fixed another minor problem with the regex.

Highlighting Javascript Inline Block Comments in Vim

Background
I use JScript (Microsoft's ECMAScript implementation) for a lot of Windows system administration needs. This means I use a lot of ActiveX (Automated COM) objects. The methods of these objects often expect Number or Boolean arguments. For example:
var fso = new ActiveXObject("Scripting.FileSystemObject");
var a = fso.CreateTextFile("c:\\testfile.txt", true);
a.WriteLine("This is a test.");
a.Close();
(CreateTextFile Method on MSDN)
On the second line you see that the second argument is one that I'm talking about. A Boolean of "true" doesn't really describe how the method's behavior will change. This isn't a problem for me, but my automation-shy coworkers are easily spooked. Not knowing what an argument does spooks them. Unfortunately a long list of constants (not real constants, of course, since current JScript versions don't support them) will also spook them. So I've taken to documenting some of these basic function calls with inline block comments. The second line in the above example would be written as such:
var a = fso.CreateTextFile("c:\\testfile.txt", /*overwrite*/ true, /*unicode*/ false);
That ends up with a small syntax highlighting dilemma for me, though. I like my comments highlighted vibrantly; both block and line comments. These tiny inline block comments mean little to me, personally, however. I'd like to highlight those particular comments in a more muted fashion (light gray on white, for example). Which brings me to my dilemma.
Dilemma
I'd like to override the default syntax highlighting for block comments when both the beginning and end marks are on the same line. Ideally this is done solely in my vimrc file, and not in a superseding personal copy of the javascript.vim syntax. My initial attempt is pathetic:
hi inlineComment guifg=#bbbbbb
match inlineComment "\/\*.*\*\/"
Straight away you can see the first problem with this regular expression pattern is that it's a greedy search. It's going to match from the first "/*" to the last "*/" on the line, meaning everything between two inline block comments will get this highlight style as well. I can fix that, but I'm really not sure how to deal with my second concern.
Comments can't be defined inside of String literals in ECMAScript. So this syntax highlighting will override String highlighting as well. I've never had a problem with this in system administration scripts, but it does often bite me when I'm examining the source of many javascript libraries intended for browsers (less.js for example).
What regex pattern, syntax definition, or other solution would the amazing StackOverflow community recommend to restore my vimrc zen?
I'm not sure, but from your description it sounds like you don't need a new syntax definition. Vim syntax files usually let you override a particular syntax item with your own choice of highlighting. In this case, the item you want is called javaScriptComment, so a command like this will set its highlighting:-
hi javaScriptComment guifg=#bbbbbb
but you have to do this in your .vimrc file (or somewhere that's sourced from there), so it's evaluated before the syntax file. The syntax file uses the highlight default command, so the syntax file's choice of highlighting only affects syntax items with no highlighting set. See :help :hi-default for more details on that. BTW, it only works on Vim 5.8 and later.
The above command will change all inline /* */ comments, and leave // line comments with their default setting, because line comments are a different syntax item (javaScriptLineComment). You can find the names of all these groups by looking at the javascript.vim file. (The easiest way to do this is :e $VIMRUNTIME/syntax/javascript.vim .)
If you only want to change some inline comments, it's a little more complicated, but still easy to see what to do by looking at javascript.vim . If you do that, you can see that block comments are defined like this:-
syn region javaScriptComment start="/\*" end="\*/" contains=#Spell,javaScriptCommentTodo
See that you can use separate regexes for begin and end markers: you don't need to worry about matching the stuff in between with non-greedy quantifiers, or anything like that. To have a syntax item that works similarly but only on one line, try adding the oneline option (:h :syn-oneline for more details):-
syn region myOnelineComment start="/\*" end="\*/" oneline
I've removed the two contains groups because (1) if you're only using it for parameter names, you probably don't want spell-checking turned on inside these comments, and (2) contained sections that aren't oneline override the oneline in the container region, so you would still match all TODO comments with this region.
You can define this new kind of comment region in your .vimrc, and set the highlighting how you like: it looks like you already know how to do that, so I won't go into more details on that. I haven't tried out this particular example, so you may still need a bit of fiddling to make it work. Give it a try and let me know how it goes.
Why don't you simply add a comment line above the call?
I think that
// fso.CreateTextFile(filename:String, overwrite:Boolean, unicode:Boolean)
var a = fso.CreateTextFile("c:\\testfile.txt", true, false);
is a lot more readable and informative than
var a = fso.CreateTextFile("c:\\testfile.txt", /*overwrite*/ true, /*unicode*/ false);

StackOverflowError with Checkstyle 4.4 RegExp check

Hello,
Background:
I'm using Checkstyle 4.4.2 with a RegExp checker module to detect when the file name in out java source headers do not match the file name of the class or interface in which they reside. This can happen when a developer copies a header from one class to another and does not modify the "File:" tag.
The regular expression use in the RexExp checker has been through many incarnations and (though it is possibly overkill at this point) looks like this:
File: (\w+)\.java\n(?:.*\n)*?(?:[\w|\s]*?(?: class | interface )\1)
The basic form of files I am checking (though greatly simplified) looks like this
/*
*
* Copyright 2009
* ...
* File: Bar.java
* ...
*/
package foo
...
import ..
...
/**
* ...
*/
public class Bar
{...}
The Problem:
When no match is found, (i.e. when a header containing "File: Bar.java" is copied into file Bat.java ) I receive a StackOverflowError on very long files (my test case is #1300 lines).
I have experimented with several visual regular expression testers and can see that in the non-matching case when the regex engine passes the line containing the class or interface name it starts searching again on the next line and does some backtracking which probably causes the StackOverflowError
The Question:
How to prevent the StackOverflowError by modifying the regular expression
Is there some way to modify my regular expression such that in the non-matching case (i.e. when a header containing "File: Bar.java" is copied into file Bat.java ) that the matching would stop once it examines the line containing the interface or class name and sees that "\1" does not match the first group.
Alternatively if that can be done, Is is possible minimize the searching and matching that takes place after it examines the line containing the interface or class thus minimizing processing and (hopefully) the StackOverflow error?
Try
File: (\w+)\.java\n.*^[\w \t]+(?:class|interface) \1
in dot-matches-all mode. Rationale:
[\w\s] (the | doesn't belong there) matches anything, including line breaks. This results in a lot of backtracking back up into the lines that the previous part of the regex had matched.
If you let the greedy dot gobble up everything up to the end of the file (quick) and then backtrack until you find a line that starts with words or spaces/tabs (but no newlines) and then class or interface and \1, then that doesn't require as much stack space.
A different, and probably even better solution would be to split the problem into parts.
First match the File: (\w+)\.java part. Then do a second search with ^[\w \t]+(?:class|interface) plus the \1 match from the first search on the same file.
Follow up:
I plugged in Tim Pietzcher's suggestion above and his greedy solution did indeed fail faster and without a StackOverflowError when no match was found. However, in the positive case, the StackOverflowError still occurred.
I took a look at the source code RegexpCheck.java. The classes pattern is constructed in multiline mode such that the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. Then it reads the entire class file into a string and does a recursive search for the pattern(see findMatch()). That is undoubtedly the source of the StackOverflowException.
In the end I didn't get it to work (and gave up) Since Maven 2 released the maven-checkstyle-plugin-2.4/Checkstyle 5.0 about 6 weeks ago we've decided to upgrade our tools. This may not solve the StackOverflowError problem, but it will give me something else to work on until someone decides that we need to pursue this again.