Regex Notepad++ Function List - regex

I am trying to create a regex that will build a function list for a proprietary programming language inside of Notepad++ using the functionList.xml file. The regex needs to capture all instances of any Method, Function or Macro following this syntax:
Method Blah1(pParam1, pParam2)
[New] dynamicVar = "string"
[New] dynamicVar2 = 4
[New] result = Bar2(dynamicVar, dynamicVar2)
End Method
Function Bar2(pParam3, pParam2)
Foo3
Return pParam3 && pParam2
End Function
Macro Foo3()
End Macro
So the regexp should capture 3 instances for the above example.
Link to regexr: http://www.regexr.com/3fcd7
functionList.xml
<parser
displayName="MOX"
id ="mox_function"
commentExpr="(?s:/\*.*?\*/)|(?m-s://.*?$)"
>
<function
mainExpr="^(s|Method|Macro|Function)"
>
<functionName>
<nameExpr expr="[A-Za-z_]\w*\s*[=:]|[A-Za-z_]?\w*\s*\(" />
<nameExpr expr="[A-Za-z_]?\w*" />
</functionName>
<className>
<nameExpr expr="([A-Za-z_]\w*\.)*[A-Za-z_]\w*\." />
<nameExpr expr="([A-Za-z_]\w*\.)*[A-Za-z_]\w*" />
</className>
</function>
</parser>

Try this.
(?m)^(Method|Macro).*\(.*\)\n.*\n?^\s?End.*$
https://regex101.com/r/pAkFDU/

I've never been able to use FunctionList. It's been known to have stability issues for a long time and is even not listed, by default, in the plugin manager.
It's highly likely that a too-inclusive expression won't properly capture nested groups (unless FunctionList recurses through matches, or supports recursion in regex). Just something you should be aware of.
Too-inclusive regexes frequently stumble on scenarios like this
Function foo()
Function refoo()
End Function // 1
End Function // 2
Function solo(stuff,thing)
End Function
Function bar()
sample = Function (x,y)
return x & y
End Function
End Function // 3
Either failing at points 1 or 3. It can be quite difficult to properly capture the function, unless you have recursion available. (Function|Method|Macro)[\s\S]*?End \1 captures too little; take the ? away and it captures too much.
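The lazy-versus-greedy behaviour can be seen in a small sketch. It uses Python's re module rather than the engine inside Notepad++, and the variable names are only illustrative, so treat it as a demonstration of the two quantifiers and not of what FunctionList itself does.

```python
import re

# The nested example from above, with the numbered End lines kept.
sample = (
    "Function foo()\n"
    "Function refoo()\n"
    "End Function // 1\n"
    "End Function // 2\n"
    "Function solo(stuff,thing)\n"
    "End Function\n"
    "Function bar()\n"
    "sample = Function (x,y)\n"
    "return x & y\n"
    "End Function\n"
    "End Function // 3\n"
)

lazy = re.search(r"(Function|Method|Macro)[\s\S]*?End \1", sample)
greedy = re.search(r"(Function|Method|Macro)[\s\S]*End \1", sample)

# Lazy: foo() is closed by the first End Function it meets (point 1).
lazy_ends = lazy.group(0).count("End Function")
# Greedy: the match runs on to the last End Function in the text (point 3).
greedy_ends = greedy.group(0).count("End Function")

print(lazy_ends, greedy_ends)
```

Neither match stops at point 2, which is where foo() actually ends.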
Truly recursive ((?R)) Regular Expressions (or Balanced Pairs), which aren't widely supported, can properly capture to the right point, but without code to recurse through it, won't flag the nested functions.
(Function|Method|Macro) (\w+)\((.*)\)((?:(?R)|[\s\S]*?)+)End \1 would correctly stop at point 2.
You can use a similar expression with a lookahead: (Function|Method|Macro) (\w+)\((.*)\)(?=(?:(?R)|[\s\S]*?)+End \1) but it's also easy to fool. All it basically cares about is that End \1, follows it anywhere in the file.
If FunctionList does support recursive regex and doesn't recurse (code-side) through the results (I don't know, the plugin won't work for me), I would look at a simpler expression like below. Honestly, even if it does, you'll get the best performance out of a simple expression. It will work faster and you'll know exactly what it is and is not indicating. Anything else has too much overhead for relatively little gain.
^\s*(Function|Method|Macro) (\w+)\((.+)\)
Output: \2 Args: \3
^\s*(Function|Method|Macro) (\w+)\(\)
Output: \2
The outputs are just examples, the plugin I use lets you customize the output. Others may not. Also, prefixing with ^\s* allows you to find functions not assigned to variables or commented-out.
Or both above can be captured in one expression by changing the first expression's + to a *. It won't tell you if there's a matching End, but it will find the part you're interested in for a document outline.
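As a rough check of the combined expression, here is a sketch in Python's re module (not the Notepad++ engine) run over a shortened copy of the sample from the question. The name OUTLINE is made up for the example, and [ \t]* stands in for \s* so that the leading whitespace cannot run across line breaks.

```python
import re

# One pattern for both cases: (.*) instead of (.+) also accepts "()".
OUTLINE = re.compile(
    r"^[ \t]*(Function|Method|Macro) (\w+)\((.*)\)", re.MULTILINE
)

source = """Method Blah1(pParam1, pParam2)
[New] dynamicVar = "string"
[New] result = Bar2(dynamicVar, dynamicVar2)
End Method
Function Bar2(pParam3, pParam2)
Foo3
Return pParam3 && pParam2
End Function
Macro Foo3()
End Macro
"""

# Each entry is (keyword, name, argument text).
outline = [m.groups() for m in OUTLINE.finditer(source)]
for kind, name, args in outline:
    print(kind, name, "Args:", args)
```

Calls such as Bar2(...) inside a body are skipped because the line does not start with one of the three keywords.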
Personally, I've always had good luck with a similar plugin ambiguously named "SourceCookifier", but it only parses the file as it was last saved/opened. If I add a new function, it's not listed until save/open.
The interface for adding new languages isn't very intuitive, packing a lot of features in a small area, but it works and it works well. Like FunctionList, and probably most other alternatives, you'll need to add rules for "Mox".
I'm not saying you should switch to this particular utility, but FunctionList hasn't been updated in 7 years.

Related

Regex expression to recognize XdY+Z OR XdY

I've been trying to develop a program that will be used for DMing in an MMORPG but I'm having trouble parsing for the actual regex expression I need.
To quote myself from another thread on a less active forum:
I've officially taken over the DiceRoller addon from years and years ago and I've reworked it a lot since I've taken it over and done a lot of testing in game. While I haven't uploaded anything yet, I've been struggling on a piece of regex expression that is currently crucial to the design of the addon.
Some background: the newest iteration of the DiceRoller addon makes it so you can type "!XdY" (where X is the number of dice, Y is the dice value) into raid chat and the DM who has the addon will go through some logic in the addon (random number lua protocol) and then spit out an input after adding up the dice.
It is as follows:
local count, size = string.match(message, "^!(%d+)[dD](%d+)$")
Now the functionality I need it to do is parse for both "!XdY" OR "XdY+Z", but it seems as if I can't get close to "XdY+Z" no matter which regex expression I use since I need it to do both expressions. I can provide more source code context if necessary.
This is the closest I've ever gotten:
http://i.imgur.com/eMhPHQB.png
and this is with the regex expression:
local count, size, modifier = string.match(message, "^!(%d+)[dD](%d+)+?(%d+)$")
As you can see, it works just fine with the modifier. However, without the modifier the expression still treats the input as "XdY+Z", so it reads "1d20" as "1d2+0" and "1d200" as "1d20+0", etc. I've tried moving the optional character "?" around, but that just stops the expression from working at all: "!1d2" doesn't match. It's almost as if the optional character NEEDS to be there?
Thanks for the help ahead of time, I've always struggled with regex.
local function dice(input)
local count, size, modifier = input:match"^!(%d+)[dD](%d+)%+?(%d*)$"
if count then
return tonumber(count), tonumber(size), tonumber("0"..modifier)
end
end
for _, input in ipairs{"!1d6", "!1d24", "!1d200", "!1d2+4", "!1d20+24"} do
print(input, dice(input))
end
Output:
!1d6 1 6 0
!1d24 1 24 0
!1d200 1 200 0
!1d2+4 1 2 4
!1d20+24 1 20 24
Lua patterns are very limited. You would want something like ^!(%d+)[dD](%d+)(?:%+(%d+))?$, but the (?:%+(%d+))? part uses a non-capturing group and a quantifier on a group, and Lua patterns support neither.
Consider using a regex library like this one that allows you to use PCRE, the engine PHP uses and one of the most complete. That would be overkill if you only need it for this one expression, though. In that case you can handle it in code instead, which isn't hard for a task this simple.
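For comparison, this is roughly what the optional-group version looks like in an engine that does support it. The sketch uses Python's re module, not Lua, and parse_roll is a made-up helper name.

```python
import re

# Optional "+Z" part expressed as a non-capturing group with "?".
DICE = re.compile(r"^!(\d+)[dD](\d+)(?:\+(\d+))?$")

def parse_roll(message):
    """Return (count, size, modifier), or None if the message is not a roll."""
    m = DICE.match(message)
    if m is None:
        return None
    count, size, modifier = m.groups()
    # modifier is None when the "+Z" part is absent.
    return int(count), int(size), int(modifier or 0)

for text in ["!1d6", "!1d200", "!1d2+4", "!1d20+24"]:
    print(text, parse_roll(text))
```

Because the plus sign and the digits after it are optional only as a unit, "!1d20" can no longer be split into "1d2" and "0".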
While Lua patterns are not powerful enough to parse this with one expression (as they don't support optional groups), there is an easy option to handle it with two expressions:
-- check the longer expression first
local count, size, modifier = string.match(message, "^!(%d+)[dD](%d+)%+(%d+)$")
if not count then
count, size = string.match(message, "^!(%d+)[dD](%d+)$")
end

ctags and R regex

I'm trying to use ctags with R. Using this answer I've added
--langdef=R
--langmap=r:.R.r
--regex-R=/^[ \t]*"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t]function/\1/f,Functions/
--regex-R=/^"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t][^f][^u][^n][^c][^t][^i][^o][^n]/\1/g,GlobalVars/
--regex-R=/[ \t]"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t][^f][^u][^n][^c][^t][^i][^o][^n]/\1/v,FunctionVariables
to my .ctags file. However when trying to generate tags with the following R file
x <- 1
foo <- function () {
y <- 2
return(y)
}
Only the function foo is recognized. I want to generate tags also for variables (i.e for x and y in my code above). Should I change the regex in the ctags file? Thanks in advance.
Those patterns don't appear to be correct, because a variable is only identified if it is assigned a value that has the same length as "function" but does not share any of its characters. E.g.:
x <- aaaaaaaa
The following ctags configuration should work properly:
--langdef=R
--langmap=R:.R.r
--regex-R=/^[ \t]*"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t]function[ \t]*\(/\1/f,Functions/
--regex-R=/^"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t][^\(]+$/\1/g,GlobalVars/
--regex-R=/[ \t]"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t][^\(]+$/\1/v,FunctionVariables/
The idea here is that a function must be followed by a parenthesis while the variable declarations do not. Given the limitations of a regex parser this parenthesis must appear in the same line as the keyword "function" for this configuration to work.
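The difference between the original and the updated variable rule can be checked outside ctags. The sketch below copies the patterns into Python's re module, which is only an approximation of the regex engine ctags uses; the constant and function names are invented for the example.

```python
import re

NAME = r'"?([.A-Za-z][.A-Za-z0-9_]*)"?'

# Original rule: needs eight characters after "<- ", each differing from
# the letter of "function" in the same position.
ORIGINAL_VAR = re.compile(
    "^" + NAME + r"[ \t]*<-[ \t][^f][^u][^n][^c][^t][^i][^o][^n]"
)
# Updated rules: a function needs "function(", a variable needs no "(".
UPDATED_FUNC = re.compile("^[ \t]*" + NAME + r"[ \t]*<-[ \t]function[ \t]*\(")
UPDATED_VAR = re.compile("^" + NAME + r"[ \t]*<-[ \t][^\(]+$")

def tagged(pattern, line):
    """Return the tag name the pattern would produce, or None."""
    m = pattern.search(line)
    return m.group(1) if m else None

print(tagged(ORIGINAL_VAR, "x <- 1"))         # too short for the original rule
print(tagged(ORIGINAL_VAR, "x <- aaaaaaaa"))  # eight non-matching characters
print(tagged(UPDATED_VAR, "x <- 1"))
print(tagged(UPDATED_FUNC, "foo <- function () {"))
```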
Update: R support will be included in Universal Ctags, so give it a try and report bugs and missing features.
The problem with the original regex is that it does not capture variable declarations / assignments that are shorter than 8 characters (since the regex requires 8 characters that are NOT f-u-n-c-t-i-o-n in that specific order).
The problem with the regex updated by Vitor is that it does not capture variable declarations / assignments which incorporate parentheses, a situation which is quite common in R.
And a forgotten issue is that neither of them captures superassigned objects with <<- (only local assignment with <- is covered).
As I checked the Universal Ctags repo, although pcre regex is planned to be supported as also raised in Issue 519 and a commented out pcre flag exists in the configuration file, there is not yet support for pcre type positive/negative lookahead or lookbehind expressions in ctags unfortunately. When that support starts, things will be much easier.
My solution, first of all, takes into account superassignment by "<{1,2}-" and also the fact that the right side of assignment can include:
either a sequence of 8 characters which are not f-u-n-c-t-i-o-n followed by any or no characters (.*)
OR a sequence of at most 7 any characters (.{1,7})
My proposed regex patterns are as such:
--langdef=R
--langmap=R:.R.r
--regex-R=/^[ \t]*"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t]function[ \t]*\(/\1/f,Functions/
--regex-R=/^"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<{1,2}-[ \t]([^f][^u][^n][^c][^t][^i][^o][^n].*|.{1,7}$)/\1/g,GlobalVars/
--regex-R=/[ \t]"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<{1,2}-[ \t]([^f][^u][^n][^c][^t][^i][^o][^n].*|.{1,7}$)/\1/v,FunctionVariables/
And my tests show that it captures many more objects than the previous ones.
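A few sample lines can be run against the GlobalVars rule to see what it accepts. As before, this is a sketch in Python's re module, not ctags itself, and global_name is a hypothetical helper.

```python
import re

# The GlobalVars rule from the configuration above, copied unchanged.
GLOBAL_VAR = re.compile(
    r'^"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<{1,2}-[ \t]'
    r"([^f][^u][^n][^c][^t][^i][^o][^n].*|.{1,7}$)"
)

def global_name(line):
    """Return the variable name tagged for this line, or None."""
    m = GLOBAL_VAR.search(line)
    return m.group(1) if m else None

for line in ["x <- 1", "x <<- 1", "z <- paste(a, b)", "foo <- function () {"]:
    print(repr(line), "->", global_name(line))
```

One gap remains in this sketch: a right-hand side longer than seven characters that starts with "f", such as w <- format(x), fails both alternatives and is not tagged.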
Edit:
Even these do not capture the variables that are declared as arguments to functions. Since ctags regex makes a non-greedy search and non-capturing groups do not work, the result is still limited. But we can at least capture the first arguments to functions and define them as an additional type as a, Arguments:
--regex-R=/.+function[ ]*\([ \t]*(,*([^,= \(\)]+)[,= ]*)/\2/a,Arguments/

Hunspell/Aspell data conversion to human-readable inflection list

Is there an easy way to generate a human-readable inflection list from Hunspell/Aspell dictionary data files?
For example, I'd like to generate the following outputs (for different languages):
...
book, books
book, books, booked, booking
...
go, goes, went, gone, going
...
I looked at the Hunspell/Aspell docs, but couldn't find an API call that would do this.
There is a method that the command line one does, but it doesn't output quite in the format you're looking for. You could also do this manually if you wanted though just by some simple scripting with regex.
The format for each set of affixes is
TYPE TAG REMOVE REPLACE MATCH
Where TAG matches what follows the / in a given word in the .dic file, you can do the following (presuming you've already stripped the /... from the word):
$word =~ s/$remove$/$replace/ if $word =~ /$match$/;
Notice the $ there matching the end-of-line/word. Adjust with ^ if it's a prefix.
There are three caveats:
The $match directly from the .aff file is in almost all cases equivalent to standard regex. There are minor variations: if the match is something like [abc-gh], you'd be better off changing it to (a|b|c|-|g|h) or [abcgh-] (hunspell doesn't use the hyphen as a metacharacter), otherwise it'll be interpreted as [abcdefgh] (standard regex). For a negated character class, your options are to manually move the - to the end of the expression (e.g. [^a-df] to [^adf-]) or to use negative lookbehinds.
If $replace is 0, then you should change it to an empty string.
If your result ends with /..., you need to reprocess it again because it has a double affix.
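The suffix case can be sketched as a small function. This is Python, not the Perl one-liner above, and the two rules in PLURAL_RULES are illustrative examples in (REMOVE, REPLACE, MATCH) form, not taken from any particular .aff file. The sketch also treats a REMOVE of 0 as "strip nothing", and it ignores the hyphen and double-affix caveats.

```python
import re

def apply_suffix(word, remove, replace, match):
    """Apply one suffix rule; return the new form, or None if it does not apply."""
    # MATCH is used as a regex anchored at the end of the word.
    if not re.search("(?:" + match + ")$", word):
        return None
    # "0" stands for "nothing" in the REMOVE and REPLACE fields.
    remove = "" if remove == "0" else remove
    replace = "" if replace == "0" else replace
    if not word.endswith(remove):
        return None
    stem = word[: len(word) - len(remove)]
    return stem + replace

# Hypothetical rules for one tag, e.g. a plural.
PLURAL_RULES = [("y", "ies", "[^aeiou]y"), ("0", "s", "[^sxzhy]")]

def inflect(word, rules):
    """Return the word followed by every form the rules produce."""
    forms = [apply_suffix(word, *rule) for rule in rules]
    return [word] + [form for form in forms if form is not None]

print(", ".join(inflect("book", PLURAL_RULES)))
print(", ".join(inflect("pony", PLURAL_RULES)))
```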
Be careful. By my rough calculations, the dictionary I'm working on could have more than 50 million words being formed (and I wouldn't be surprised if it hits beyond 100 million).

How to click a link in only a single class

I have elements that can be in one of two states: class="icon" or class="icon active".
I thought that $browser.element(:class => /^icon$/).click would click the first button that isn't active but it just clicks the first one it finds regardless of whether or not it also contains "active."
Is the regex wrong? Or better yet, is there a non-regex way of doing it?
As mentioned in the comments, the regex you used should work in watir-webdriver. However if you need a solution that will work in both watir-classic and watir-webdriver, you will need to use find.
b.elements.find{ |e| e.class_name == 'icon'}.click
This will only match elements where the 'class' attribute is exactly 'icon'.
It is slower and less readable, but allows you to bypass watir-classic's method for matching classes. As seen below, watir-classic will check that the regex matches any of the element's classes.
def match_class? element, what
classes = element.class_name.split(/\s+/)
classes.any? {|clazz| what.matches(clazz)}
end
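To see why that per-class check defeats /^icon$/, here is the same logic transliterated into Python. It is only a model of the Ruby method above, not Watir code, and match_class is an illustrative name.

```python
import re

def match_class(class_attr, pattern):
    """Split the class attribute and test each class name on its own."""
    return any(re.search(pattern, name) for name in class_attr.split())

# Each class is tested separately, so "icon active" still contains a
# class that matches ^icon$ exactly.
print(match_class("icon", r"^icon$"))
print(match_class("icon active", r"^icon$"))

# Comparing the whole attribute, as the find approach does, tells them apart.
print("icon active" == "icon")
```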
This is theoretical, and I apologize for not having the time to construct a fake page and test to see if it works
browser.element(:class => /icon(?!active)$/).click
This works in theory (the regex) matching a line like icon but not icon active but, there may be some under the hood magic that goes on with how class names are matched which might cause it to return the wrong line.
If that does not work let me know, I'll suggest an alternative approach, which while less elegant, ought to work.
For reference I used the Rubular online regex tester along with this SO answer, "Regular expression to match a line that doesn't contain a word?", to come up with that.
Failing the ability to use a regex, another option would be to get a collection of matching items, and then inspect them more closely, clicking when you find one that works and abandoning the collection at that point.
browser.elements(:class => "icon").each do |possible|
unless possible.attribute_value("class").include? "active"
possible.click
break
end
end
I'm not always a big fan of unless, but in this case it results in readable code, so I used it
For troubleshooting, let's see what is being shown for the class info on the elements in that collection:
browser.elements(:class => "icon").each do |possible|
puts possible.attribute_value("class")
end

Using Regex to find function containing a specific method or variable

This is my first post on stackoverflow, so please be gentle with me...
I am still learning regex - mostly because I have finally discovered how useful they can be and this is in part through using Sublime Text 2. So this is Perl regex (I believe)
I have done searching on this and other sites but I am now genuinely stuck. Maybe I am trying to do something that can't be done
I would like to find a regex (pattern) that will let me find the function or method or procedure etc that contains a given variable or method call.
I have tried a number of expressions and they seem to get part of the way but not all the way. Particularly when searching in Javascript I pick up multiple function declarations instead of the one nearest to the call/variable that I am looking for.
for example:
I am looking for the function that calls the method savedata()
I have learnt, from this excellent site that I can use (?s) to switch . to include newlines
function.*(?=(?s).*?savedata\(\))
however, that will find the first instance of the word function and then all the text up to and including savedata()
if there are multiple procedures then it will start at the next function and repeat until it gets to savedata() again
function(?s).*?savedata\(\) does something similar
I have tried asking it to ignore the second function (I believe) by using something like:
function(?s).*?(?:(?!function).*?)*savedata\(\)
But that doesn't work.
I have done some investigation with look forwards and look backwards but either I am doing it wrong (highly possible) or they are not the right thing.
In summary (I guess), how do I go backwards, from a given word to the nearest occurrence of a different word.
At the moment I am using this to search through some javascript files to try and understand the structure/calls etc but ultimately I am hoping to use on c# files and some vb.net files
Many thanks in advance
Thanks for the swift responses and sorry for not adding an example block of code, which I will do now (modified but still sufficient to show the issue)
if I have a simple block of javascript like the following:
function a_CellClickHandler(gridName, cellId, button){
var stuffhappenshere;
var and here;
if(something or other){
if (anothertest) {
event.returnValue=false;
event.cancelBubble=true;
return true;
}
else{
event.returnValue=false;
event.cancelBubble=true;
return true;
}
}
}
function a_DblClickHandler(gridName, cellId){
var userRow = rowfromsomewhere;
var userCell = cellfromsomewhereelse;
//this will need to save the local data before allowing any inserts to ensure that they are inserted in the correct place
if (checkforarangeofthings){
if (differenttest) {
InsSeqNum = insertnumbervalue;
InsRowID = arow.getValue()
blnWasInsert = true;
blnWasDoubleClick = true;
SaveData();
}
}
}
Running the regex against this (including the second one, which was identified as one that should work), Sublime Text 2 selects everything from the first function through to SaveData()
I would like to be able to get to just the dblClickHandler in this case - not both.
Hopefully this code snippet will add some clarity and sorry for not posting originally as I hoped a standard code file would suffice.
This regex will find every Javascript function containing the SaveData method:
(?<=[\r\n])([\t ]*+)function[^\r\n]*+[\r\n]++(?:(?!\1\})[^\r\n]*+[\r\n]++)*?[^\r\n]*?\bSaveData\(\)
It will match all the lines in the function up to, and including, the first line containing the SaveData method.
Caveat:
The source code must have well-formed indentation for this to work, as the regex uses matching indentations to detect the end of functions.
Will not match a function if it starts on the first line of the file.
Explanation:
(?<=[\r\n]) Start at the beginning of a line
([\t ]*+) Capture the indentation of that line in Capture Group 1
function[^\r\n]*+[\r\n]++ Match the rest of the declaration line of the function
(?:(?!\1\})[^\r\n]*+[\r\n]++)*? Match more lines (lazily) which are not the last line of the function, until:
[^\r\n]*?\bSaveData\(\) Match the first line of the function containing the SaveData method call
Note: The *+ and ++ are possessive quantifiers, only used to speed up execution.
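A quick way to try the expression outside the editor is sketched below in Python. Since the possessive quantifiers only affect speed, the sketch swaps them for the ordinary * and +, which older versions of Python's re module require anyway. The function name, the call_name parameter and the indented sample are assumptions made for the example.

```python
import re

def find_enclosing_function(source, call_name):
    """Return the text from the enclosing function's start to the call, or None."""
    pattern = (
        r"(?<=[\r\n])([\t ]*)function[^\r\n]*[\r\n]+"  # declaration line
        r"(?:(?!\1\})[^\r\n]*[\r\n]+)*?"               # lines before the closing brace
        r"[^\r\n]*?\b" + re.escape(call_name) + r"\(\)"
    )
    m = re.search(pattern, source)
    return m.group(0) if m else None

# The sample starts with a newline because of the first-line caveat.
sample = (
    "\n"
    "function first(a){\n"
    "    if (a) {\n"
    "        return true;\n"
    "    }\n"
    "}\n"
    "function second(b){\n"
    "    if (b) {\n"
    "        SaveData();\n"
    "    }\n"
    "}\n"
)

found = find_enclosing_function(sample, "SaveData")
print(found)
```

The search cannot cross the closing brace of first, because that line starts with the same indentation as its function keyword followed by }.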
EDIT:
Fixed two minor problems with the regex.
EDIT:
Fixed another minor problem with the regex.