Excluding the pattern for vim syntax highlighting - regex

I am trying to adjust the reStructuredText syntax highlighting in Vim. I have tried several Vim regexes to get highlighting to work for the two examples below, without success. With the search/highlight function, all of the regexes below do the job, but they do not work for the highlighter (syn match). Maybe I need to change syn match to something else?
This is the example text from my .rst file:
.. item:: This is the title I want to highlight
there is some text here which I do not care
.. item-matrix:: This is the title I want to highlight
:source: XX
:target: YY
Regexes that match the text:
[.+].*[:+] \zs.*
\(.. .*:: \)\zs.*
When I put that into syn match, it does not work (.vim):
syn match rstHeading /[.+].*[:+] \zs.*/
I know I am close, because the pattern above does match the following:
..:: This is highlighted as rstHeading

When integrating with an existing syntax script (here: $VIMRUNTIME/syntax/rst.vim), you need to consider the existing syntax groups. :syn list shows all active groups, but it's easier when you install the SyntaxAttr.vim - Show syntax highlighting attributes of character under cursor plugin. (I maintain an extended fork.)
On your example headings, I see that the .. item:: part is matched by rstExplicitMarkup, and the remainder (what you want to highlight) by rstExDirective.
Assuming that you want to integrate with (and not completely override) these, you need your syntax group to be contained inside the latter. This can be done via containedin=rstExDirective.
Another pitfall is that \zs limits the highlighting, but internally still matches the whole text. In combination with syntax highlighting, this means that the existing rstExplicitMarkup prevents a match of your pattern. If you use a positive lookbehind (:help /\@<=) instead, it'll work:
syn match rstHeading /\%([.+].*[:+] \)\@<=.*/ containedin=rstExDirective
Of course, to actually see any highlighting, you also need to define or link a highlight group to your new syntax group:
hi link rstHeading Title

Related

using regex expressions to only select lines with an error code (-) and ignore others

I'm new to regular expressions and need help capturing only lines that contain a negative code in parentheses, such as (-999), and retrieving that number from a line like "2016/99/99 12:00:0.999 2 1 (-499) Cannot open the message store with error code"
This is for an ITM log monitor for Tivoli.
It would help if you showed what you have tried yourself, but anyway, try this: \((-.*?)\)
This captures anything between the parentheses that starts with a minus sign.
You never specified which language you are using. Some languages use different regex engines and may or may not require delimiters, so this is a bit of a shot in the dark.
Demo - https://regex101.com/r/aD2lM3/2
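As a rough illustration, here is the suggested pattern applied in Python. Python is an assumption, since the question does not name a language, and the ITM log monitor may use a different regex dialect. The stricter digits-only variant and the helper name negative_code are illustrative additions, not part of the original answer.

```python
import re

# Pattern from the answer: a minus sign and anything up to the next ")".
LOOSE = re.compile(r"\((-.*?)\)")
# Stricter variant (an assumption): only digits are allowed after the minus sign.
STRICT = re.compile(r"\((-\d+)\)")


def negative_code(line):
    """Return the negative code found in parentheses, or None if there is none."""
    match = STRICT.search(line)
    return match.group(1) if match else None


line = "2016/99/99 12:00:0.999 2 1 (-499) Cannot open the message store with error code"
print(negative_code(line))
print(negative_code("2016/99/99 12:00:0.999 2 1 (0) OK"))
```

Lines without a negative code return None, so they can simply be skipped when filtering a log.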

Regular expression to remove comment

I am trying to write a regular expression which finds all the comments in text.
For example all between /* */.
Example:
/* Hello */
When I do this: /\*.*\*/, it behaves oddly and nothing is shown. What is wrong with it?
EDIT: The comments can be spread across multiple lines
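Unlike in the example you posted, you are trying to match comments that span multiple lines. By default, . does not match a line break. Thus you have to enable the mode in which . also matches line breaks (usually called single-line or dot-all mode, not to be confused with multi-line mode, which only changes ^ and $) to match multi-line comments.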
Also, you probably need to use .*? instead of .*. Otherwise it will make the largest match possible, which will be everything between the first open comment and the last close comment.
I don't know how to make . match line breaks in Sublime Text 2, and I'm not sure such a mode is available. As an alternative, you can match the line break explicitly in the pattern:
/\*(.|\n)*?\*/
If Sublime Text 2 doesn't recognize the \n, you could alternatively use CTRL + Enter to insert a line break in the pattern, in place of \n.
I encountered this problem several years ago and wrote an entire article about it.
If you don't have access to non-greedy matching (not all regex libraries support non-greedy) then you should use this regex:
/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/
If you do have access to non-greedy matching then you can use:
/\*(.|[\r\n])*?\*/
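To make the difference concrete, here is a small Python sketch that runs the greedy pattern and both patterns from this answer against the same text. Python's re module and the sample text are assumptions chosen for illustration; other engines may need different flags or escaping.

```python
import re

text = "a /* one */ b /* two\nlines **/ c"

# Greedy and dot-all: swallows everything from the first /* to the last */.
GREEDY = re.compile(r"/\*.*\*/", re.DOTALL)
# Non-greedy version from the answer: stops at the first closing */.
LAZY = re.compile(r"/\*(.|[\r\n])*?\*/")
# Version from the answer for engines without non-greedy quantifiers.
NO_LAZY = re.compile(r"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/")

print(repr(GREEDY.sub("", text)))   # code between the two comments is lost
print(repr(LAZY.sub("", text)))     # each comment is removed separately
print(repr(NO_LAZY.sub("", text)))  # same result without non-greedy matching
```

The greedy pattern also deletes the code between the two comments, which is why the non-greedy or the character-class version is preferable.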
Also, keep in mind that regular expressions are just a heuristic for this problem. They cannot handle cases in which something looks like a comment to the regular expression but actually isn't:
someString = "An example comment: /* example */";
// The comment around this code has been commented out.
// /*
some_code();
// */
I just want to add that for HTML comments it is this:
\<!--(.|\n)*?-->
Just an additional note about using regex to remove comments inside a programming language file.
Warning!
When doing this, you must not forget the case where /* or */ appears inside a string in the code, like var string = "/*"; (you never know what a large codebase that is not yours contains)!
So the best approach is to parse the document with a programming language and keep a boolean that records whether a string is currently open (and ignore any match inside an open string).
Also, a string delimited by " can contain \", so be careful with the regex!
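The idea of tracking the open-string state can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a full parser: the function name strip_block_comments is hypothetical, only /* */ comments and strings delimited by " or ' are handled, and // comments, template strings and regex literals are ignored.

```python
def strip_block_comments(source):
    """Remove /* ... */ comments, leaving string literals untouched."""
    out = []
    i = 0
    n = len(source)
    quote = None  # the delimiter of the string we are inside, or None
    while i < n:
        ch = source[i]
        if quote is not None:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so \" does not close the string.
                out.append(source[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None
            i += 1
        elif ch in "\"'":
            quote = ch
            out.append(ch)
            i += 1
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            # An unterminated comment runs to the end of the input.
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(strip_block_comments('s = "/* keep */"; /* drop */ x'))
```

Only the comment outside the string literal is removed; the /* keep */ inside the quotes survives.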
You cannot write a regular expression that would be able to correctly find all comments, or even one type of comments - single-line or multiline.
Regular expressions can only provide a partial match, one that would cover perhaps 90% of all cases, but that's it.
The syntax of regular expression literals is so complex that they can only be identified correctly in 100% of cases by doing a full expression evaluation, which in turn is based on tokenizing the code. The latter is a huge task, which is implemented by all AST parsers today. See AST Explorer
Only a properly written AST parser can tell you precisely where all regular expressions are located in your code. You would then have to write a parser based on that.
Or, you could use one of the existing libraries that already do all that, like decomment.
RegEx examples where any head-on approach is going to stumble, being unable to tell a regular expression from a comment block:
/\// - it will think this reg-ex is a single-line comment
/\/*/ - it will think this reg-ex opens a multi-line comment
The answer which user1919238 wrote works. Just corroborating that here, although the many upvotes probably do give you a clue.
It got rid of all these annoying block comments. I am putting one here just to show the usefulness and to thank user1919238 for saving me time:
/*# sourceMappingURL=data:application/json;base64, */
If you want to remove the obnoxious comments from Flutter's main.dart:
Press Cmd+R on Mac or Ctrl+R on Windows.
Type //.* into the upper box and leave the lower box empty.
Click .* in the replace dialog to activate regex mode.
Then click Replace All. This removes all // comments; you can do the same in any file of a Flutter project.
Additionally, to reformat main.dart:
Press Cmd+A on Mac or Ctrl+A on Windows.
Then press Cmd+Alt(Option)+L or Ctrl+Alt+L to reformat the code.
I will attach a picture of main.dart; the green .* at the top of the page is what you press to activate regex.
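The effect of the //.* replacement can be checked with a short Python sketch. Python's re module stands in for the IDE's regex engine here, which is an assumption, and the sample lines are made up. It also shows one caveat: the pattern does not know about string literals, so a // inside a URL is cut off as well.

```python
import re

# Same pattern as typed into the replace dialog: "//" and the rest of the line.
LINE_COMMENT = re.compile(r"//.*")


def strip_line_comments(source):
    """Delete everything from // to the end of each line."""
    return LINE_COMMENT.sub("", source)


print(repr(strip_line_comments("runApp(MyApp()); // start the app")))
# Caveat: the URL inside the string literal is truncated too.
print(repr(strip_line_comments('var u = "https://example.com"; // note')))
```

Reviewing the replacements before saving is therefore a good idea in files that contain URLs.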

Regexp-replace: Multiple replacements within a match

I'm converting our MVC3 project to use T4MVC, and I would like to convert the JavaScript includes to work with T4MVC as well. So I need to replace
"~/Scripts/DataTables/TableTools/TableTools.min.js"
"~/Scripts/jquery-ui-1.8.24.min.js"
Into
Scripts.DataTables.TableTools.TableTools_min_js
Scripts.jquery_ui_1_8_24_min_js
I'm using Notepad++ as a regexp tool at the moment, and it is using POSIX regexps.
I can find script name and replace it with these regexps:
Find: \("~/Scripts/(.*)"\)
Replace with \(Scripts.\1\)
But I can't figure out how to replace dots and dashes in the file names with underscores and forward slashes with dots.
I can check that a JS file name contains a dot or dash with this:
\("~/Scripts/(?=\.*)(?=\-*).*"\)
But how do I replace groups within a group?
I need non-greedy replacements within a group, applied in order, so that forward slashes converted into dots are not converted into underscores afterwards.
This is a non-critical problem, I've already done all the replacements manually, but I thought I'm good with regexp, so this problem bugs me!!
p.s. preferred tool is Notepad++, but any POSIX regexp solution would do -)
p.p.s. Here you can get a sample of stuff to be replaced
And here is the target text
I would just use a site like RegexHero
You can paste the code into the target string box, then place (?<=(~/Script).*)[.-](?=(.*"[)]")) into the Regular Expression box, with _ in the Replacement String box.
Once the replace is done, click on Final String at the bottom, and select Move to target string and start a new expression.
From there, Paste (?<=(<script).*)("~/)(?=(.*[)]" ))|(?<=(Url.).*)(")(?=(.*(\)" ))) into the Regular Expression box and leave the Replacement String box empty.
Once the replace is done, click on Final String at the bottom, and select Move to target string and start a new expression.
From there paste (?<=(Script).*)[/](?=(.*[)]")) into the Regular Expression box and . into the Replacement String box.
After that, the Final String box will have what you are looking for. I'm not sure the upper limits of how much text you can parse, but it could be broken up if that's an issue. I'm sure there might be better ways to do it, but this tends to be the way I go about things like this. One reason I like this site, is because I don't have to install anything, so I can do it anywhere quickly.
Edit 1: Per the comments, I have moved step 3 to Step 5 and added new steps 3 and 4. I had to do it this way, because new Step 5 would have replaced the / in "~/Scripts with a ., breaking the removal of "~/. I also had to change Step 5's code to account for the changed beginning of Script
Here is a vanilla Notepad++ solution, but it's certainly not the most elegant one. I managed to do the transformation with several passes over the file.
First pass
Replace . and - with _.
Find: ("~/Scripts[^"]*?)[.-]
Replace With: \1_
Unfortunately, I could not find a way to match only the . or -, because it would require a lookbehind, which is apparently not supported by Notepad++. Due to this, every time you execute the replacement only the first . or - in a script name will be replaced (because matches cannot overlap). Hence, you have to run this replacement multiple times until no more replacements are done (in your example input, that would be 8 times).
Second pass
Replace / with ..
Find: ("~/Scripts[^"]*?)/
Replace with: \1.
This is basically the same thing as the first pass, just with different characters (you will have to do this 3 times for the example file). Doing the passes in this order ensures that no slashes will end up as underscores.
Third pass
Remove the surrounding characters.
Find: "~/(Scripts[^"]*?)"
Replace with: \1
This will now match all the script names that are still surrounded by "~/ and ", capturing what is in between and just outputting that.
Note that by including those surrounding characters in the find patterns of the first two passes, you can avoid converting the . in strings that are already of the new format.
As I said this is not the most convenient way to do it. Especially, since passes one and two have to be executed manually multiple times. But it would still save a lot of time for large files, and I cannot think of a way to get all of them - only in the correct strings - in one pass, without lookbehind capabilities. Of course, I would very much welcome suggestions to improve this solution :). I hope I could at least give you (and anyone with a similar problem) a starting point.
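The repeated passes can be simulated in Python to see how often each replacement has to run. This is only a sketch: Python's re module is used instead of Notepad++, the loop and the name run_passes are illustrative, and the input is the two sample strings from the question rather than the full example file, so the run counts differ from those quoted above.

```python
import re

# The three find/replace pairs from the answer, in order.
PASSES = [
    (r'("~/Scripts[^"]*?)[.-]', r"\g<1>_"),   # first pass: . and - become _
    (r'("~/Scripts[^"]*?)/', r"\g<1>."),      # second pass: / becomes .
    (r'"~/(Scripts[^"]*?)"', r"\g<1>"),       # third pass: drop "~/ and "
]


def run_passes(text):
    """Repeat each replacement until nothing changes; return text and run counts."""
    counts = []
    for pattern, replacement in PASSES:
        runs = 0
        while True:
            text, changed = re.subn(pattern, replacement, text)
            if changed == 0:
                break
            runs += 1
        counts.append(runs)
    return text, counts


sample = (
    '"~/Scripts/DataTables/TableTools/TableTools.min.js"\n'
    '"~/Scripts/jquery-ui-1.8.24.min.js"'
)
result, counts = run_passes(sample)
print(result)
print(counts)
```

Each run changes at most one character per script name, so the number of runs equals the largest number of dots and dashes (or slashes) in any single name.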
If, as your question indicates, you'd like to use N++, then use the N++ Python Script plugin. Set up the script and assign a shortcut key, and you have a single-pass solution that only requires you to open, modify, and save... can't get much simpler than that.
I think part of the problem is that N++ is not a regex tool, and the use of a dedicated regex tool, or even a search/replace solution, is sometimes warranted. You may be better off, both in speed and in time value, using a tool made for text processing rather than editing.
[Script Edit]:: Altered to match the modified in/out expectations.
# Substitute & Replace within matched group.
from Npp import *
import re

def repl(m):
    # Dots and dashes become underscores first, then slashes become dots.
    return "(Scripts." + re.sub( "[-.]", "_", m.group(1) ).replace( "/", "." ) + ")"

editor.pyreplace( '(?:[(].*?Scripts.)(.*?)(?:"?[)])', repl )
Install:: Plugins -> Plugin Manager -> Python Script
New Script:: Plugins -> Python Script -> script-name.py
Select target tab.
Run:: Plugins -> Python Script -> Scripts -> script-name
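Outside Notepad++, the same callback idea works with the standard re module alone. The following is a sketch under assumptions: the simplified pattern only looks for quoted "~/Scripts/..." paths, and the names to_t4mvc and convert are hypothetical, not part of the PythonScript plugin.

```python
import re

# Simplified pattern (an assumption): a quoted path starting with ~/Scripts/.
SCRIPT_PATH = re.compile(r'"~/Scripts/([^"]*)"')


def to_t4mvc(match):
    """Build the T4MVC-style name from the path captured after ~/Scripts/."""
    path = match.group(1)
    # Underscores first, then slashes to dots, so new dots are never converted.
    return "Scripts." + re.sub(r"[-.]", "_", path).replace("/", ".")


def convert(text):
    return SCRIPT_PATH.sub(to_t4mvc, text)


print(convert('"~/Scripts/DataTables/TableTools/TableTools.min.js"'))
print(convert('"~/Scripts/jquery-ui-1.8.24.min.js"'))
```

Because the nested replacements happen inside the callback, the whole conversion is a single pass over the text.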
[Edit: An extended one-liner PythonScript command]
Having need for the new regex module for Python (that I hope replaces re) I played around and compiled it for use with the N++ PythonScript plugin and decided to test it on your sample set.
Two commands on the console ended up with the correct results in the editor.
import regex as re
editor.setText( (re.compile( r'(?<=.*Content[(].*)((?<omit>["~]+?([~])[/]|["])|(?<toUnderscore>[-.]+)|(?<toDot>[/]+))+(?=.*[)]".*)' ) ).sub(lambda m: {'omit':'','toDot':'.','toUnderscore':'_'}[[ key for key, value in m.groupdict().items() if value != None ][0]], editor.getText() ) )
Very sweet!
What else is really cool about using regex instead of re was that I was able to build the expression in Expresso and use it as is! Which allows for a verbose explanation of it, just by copy-paste of the r'' string portion into Expresso.
The abbreviated text of which is::
Match a prefix but exclude it from the capture. [.*Content[(].*]
[1]: A numbered capture group. [(?<omit>["~]+?([~])[/]|["])|(?<toUnderscore>[-.]+)|(?<toDot>[/]+)], one or more repetitions
    Select from 3 alternatives
        [omit]: A named capture group. [["~]+?([~])[/]|["]]
            Select from 2 alternatives
                ["~]+?([~])[/]
                Any character in this class: ["]
        [toUnderscore]: A named capture group. [[-.]+]
        [toDot]: A named capture group. [[/]+]
Match a suffix but exclude it from the capture. [.*[)]".*]
The command breakdown is fairly nifty, we are telling Scintilla to set the full buffer contents to the results of a compiled regex substitution command by essentially using a 'switch' off of the name of the group that isn't empty.
Hopefully Dave (the PythonScript Author) will add the regex module to the ExtraPythonLibs part of the project.
Alternatively you could use a script that would do it and avoid copy pasting and the rest of the manual labor altogether. Consider using the following script:
$_.gsub!(%r{(?:"~/)?Scripts/([a-z0-9./-]+)"?}i) do |i|
  'Scripts.' + $1.split('/').map { |i| i.gsub(/[.-]/, '_') }.join('.')
end
And run it like this:
$ ruby -pi.bak script.rb *.ext
All the files with extension .ext will be edited in-place and the original files will be saved with .ext.bak extension. If you use revision control (and you should) then you can easily review changes with some visual diff tool, correct them if necessary and commit them afterwards.

Multiline regexp in jEdit custom mode

I'm currently creating a language with a friend and I would like to provide a highlighting for it in jEdit.
Its syntax is actually quite simple. Functions can only match this pattern:
$function_name(arguments)
Note that our parser currently works without a statement terminator like the C-style semicolon, and we would like to keep this feature.
I created my jEdit mode and (almost) succeeded in highlighting my pattern with <SPAN_REGEXP>. Here's how I did it:
<SPAN_REGEXP HASH_CAR="\$" TYPE="KEYWORD3" DELEGATE="ARGS">
<BEGIN>\$[A-Za-z0-9_]*\s*\(</BEGIN>
<END>)</END>
</SPAN_REGEXP>
But it's not good enough.
Here's what I would like:
Same color for the entire function skeleton : $func( )
Special highlighting (already defined within the ARGS rules set) for %content1% in $func(%content1%)
No highlighting for brackets not following a $func
Allow an alternative multiline syntax like
$func
(
args
)
which is for now not highlighted.
I guessed I needed to change my <BEGIN> regexp to accept newlines, but it seems that jEdit is unable to match multiline regexps for highlighting, although it does so perfectly for search and replace!
I tried the (?s) and (?m) flags, the [\d\D]* workaround, even [\r\n]* but it never works.
So, here are my questions:
Does anyone know how to match multiline regexps in jEdit mode <SPAN_REGEXP> rules?
If not, does anyone have any idea how to do what I need?
As stated in the help, SPAN_REGEXP does not support multi-line regexes. You can of course specify multi-line regexes, but they are only checked against individual lines and thus will never match. You could post a feature request to the jEdit Feature Request Tracker, though, if there is none for it yet.
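Why a multi-line pattern never matches when it is only checked against individual lines can be shown with a small Python analogy. This is not jEdit's implementation, just a sketch of the same effect; the pattern is a simplified form of the BEGIN regexp and the function names are hypothetical.

```python
import re

# Simplified BEGIN pattern: "$", a name, optional whitespace, then "(".
BEGIN = re.compile(r"\$[A-Za-z0-9_]*\s*\(")

text = "$func\n(\n    args\n)"


def matches_whole_text(source):
    """Search the whole buffer at once, as search and replace can."""
    return BEGIN.search(source) is not None


def matches_line_by_line(source):
    """Search each line on its own, as a per-line highlighter would."""
    return any(BEGIN.search(line) for line in source.splitlines())


print(matches_whole_text(text))
print(matches_line_by_line(text))
print(matches_line_by_line("$func(args)"))
```

Over the whole buffer, \s* can consume the line break between $func and the opening bracket. Line by line, no single line contains both the name and the bracket, so the multiline form is never found.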

In Yahoo-Pipes, how to use regex when you can't see non-printable characters and html tags?

I keep having problems extracting data with regex: the result is not what I want because there might be newlines, spaces, HTML tags, etc. in the string. Is there any way to actually see what is in the string? The debugger seems to show only the real text. How do you deal with this?
If the content of the string is HTML, then the debugger gives you a choice of viewing "HTML" or "Source". Source should show you any HTML tags that are there.
However if your concern is white space, this may not be enough. Your only option is to "view source" on the original page.
The best course of action is to explicitly handle these possibilities in your regex. For example, if you think you might be getting white space in your target string, use the \s* pattern in the critical positions. That will match zero or more spaces, tabs, and new lines (you must also have the "s" option checked in the regex panel for new lines).
However, without specific examples of source text and the regex you are using - advice can only be generic.
What I do is use a regex tester (whichever uses the same regex engine that you are using) and I test my pattern on it. I've tried using text editors that display invisible characters but to me they only add to the confusion.
So I just go by trial and error. For instance, if a line ends in:
</a>
Then I'll try the following patterns on the regex tester until I find one that works:
</a>.
</a>..
</a>\s
</a>\s*
</a>\n
</a>\r
</a>\r\n
Etc.
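The same trial-and-error check can be done offline in a few lines of Python, which also makes the invisible characters visible. This is a sketch under assumptions: Yahoo Pipes does not run Python, its regex engine may differ in details, and the sample string and the name which_match are made up for illustration.

```python
import re

# Made-up sample: a Windows line break hides between the two tags.
sample = "</a>\r\n<b>next</b>"

# repr() shows \r and \n explicitly instead of rendering them.
print(repr(sample))

CANDIDATES = [r"</a>.", r"</a>\s", r"</a>\s*<b>", r"</a>\n", r"</a>\r\n"]


def which_match(text, patterns):
    """Return the candidate patterns that find a match in text."""
    return [p for p in patterns if re.search(p, text)]


for pattern in which_match(sample, CANDIDATES):
    print(pattern)
```

Here </a>\n fails because the character right after the tag is \r, which is exactly the kind of detail that stays hidden in a debugger showing only the rendered text.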