Multiline regexp in jEdit custom mode - regex

I'm currently creating a language with a friend and I would like to provide syntax highlighting for it in jEdit.
Its syntax is actually quite simple. Functions can only match this pattern:
$function_name(arguments)
Note that our parser currently works without a statement terminator such as the C-style semicolon, and we would like to keep this feature.
I created my jEdit mode and (almost) succeeded in highlighting my pattern with <SPAN_REGEXP>. Here's how I did it:
<SPAN_REGEXP HASH_CAR="\$" TYPE="KEYWORD3" DELEGATE="ARGS">
<BEGIN>\$[A-Za-z0-9_]*\s*\(</BEGIN>
<END>)</END>
</SPAN_REGEXP>
But it's not good enough.
Here's what I would like:
Same color for the entire function skeleton: $func( )
Special highlighting (already defined within the ARGS rule set) for %content1% in $func(%content1%)
No highlighting for brackets not following a $func
Allow an alternative multiline syntax like
$func
(
args
)
which is currently not highlighted.
I guessed I needed to change my <BEGIN> regexp to accept newlines, but it seems that jEdit is unable to match multiline regexps for highlighting, although it does so perfectly for search and replace!
I tried the (?s) and (?m) flags, the [\d\D]* workaround, even [\r\n]*, but it never works.
So, here are my questions:
Does anyone know how to match a multiline regexp in a jEdit mode's <SPAN_REGEXP>?
If not, does anyone have any idea how to do what I need?

As stated in the help, SPAN_REGEXP does not support multi-line regexes. You can of course specify a multi-line regex, but it is only checked against individual lines and thus will never match. You could post a feature request to the jEdit Feature Request Tracker if there is none for it yet.
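The effect of checking a pattern against one line at a time can be sketched outside jEdit. The snippet below is not jEdit code; it is a minimal Python illustration, using a pattern modeled on the question's <BEGIN> rule, of why a regex that allows a line break still fails when each line is tested separately.

```python
import re

# Pattern modeled on the question's <BEGIN> rule; \s* could in principle
# span the line break between the function name and the parenthesis.
BEGIN = re.compile(r"\$[A-Za-z0-9_]*\s*\(")

source = "$func\n(\n  args\n)\n"

# Matching against the whole buffer: \s* consumes the newline, so this matches.
whole_buffer_match = BEGIN.search(source) is not None

# Matching one line at a time (the behavior described above): no single line
# contains both "$func" and "(", so nothing matches.
per_line_matches = [BEGIN.search(line) is not None for line in source.splitlines()]

print(whole_buffer_match)  # True
print(per_line_matches)    # [False, False, False, False]
```

This is consistent with search and replace working on the same pattern, assuming that feature searches the buffer as a whole rather than line by line.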

Related

Excluding the pattern for vim syntax highlighting

I am trying to adjust the reStructuredText syntax highlighting in Vim. I have tried several Vim regexes to get highlighting working for the two examples below, but without success. With the search/highlight function, all of the regexes below do the job, but in the highlighter (syn match) they do not work. Maybe I need to change syn match to something else?
This is the text example I am looking at in rst file:
.. item:: This is the title I want to highlight
there is some text here which I do not care
.. item-matrix:: This is the title I want to highlight
:source: XX
:target: YY
Regexes that match the text:
[.+].*[:+] \zs.*
\(.. .*:: \)\zs.*
When putting that to syn match it does not work (.vim):
syn match rstHeading /[.+].*[:+] \zs.*/
I know I am close because above example matches for
..:: This is highlighted as rstHeading
When integrating with an existing syntax script (here: $VIMRUNTIME/syntax/rst.vim), you need to consider the existing syntax groups. :syn list shows all active groups, but it's easier when you install the SyntaxAttr.vim - Show syntax highlighting attributes of character under cursor plugin. (I maintain an extended fork.)
On your example headings, I see that the .. item:: part is matched by rstExplicitMarkup, and the remainder (what you want to highlight) by rstExDirective.
Assuming that you want to integrate with (and not completely override) these, you need your syntax group to be contained inside the latter. This can be done via containedin=rstExDirective.
Another pitfall is that \zs limits the highlighting, but internally still matches the whole text. In combination with syntax highlighting, this means that the existing rstExplicitMarkup prevents a match of your pattern. If you use a positive lookbehind (:help /\@<=) instead, it'll work:
syn match rstHeading /\%([.+].*[:+] \)\@<=.*/ containedin=rstExDirective
Of course, to actually see any highlighting, you also need to define or link a highlight group to your new syntax group:
hi link rstHeading Title
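The difference between consuming a prefix and merely asserting it can be shown in any regex engine with lookbehind. The sketch below uses Python's re module, which is not Vim's engine and only allows fixed-width lookbehind, so it is an analogy for the \zs versus \@<= distinction rather than a reproduction of Vim's behavior.

```python
import re

line = ".. item:: This is the title I want to highlight"

# Consuming the prefix: the overall match starts at column 0, so it overlaps
# the text that another rule (the directive marker) would already own.
consuming = re.search(r"\.\. [\w-]+:: (.*)", line)

# Lookbehind: the prefix is only asserted, and the match itself starts
# right after ":: ".
lookbehind = re.search(r"(?<=:: ).*", line)

print(consuming.start(), consuming.start(1))  # 0 10
print(lookbehind.start())                     # 10
print(lookbehind.group(0))                    # This is the title I want to highlight
```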

Removing everything between a tag (including the tag itself) using Regex / Eclipse

I'm fairly new to figuring out how Regex works, but this one is just frustrating.
I have a massive XML document with a lot of <description>blahblahblah</description> tags. I want to basically remove any and all instances of <description></description>.
I'm using Eclipse and have tried a few examples of Regex I've found online, but nothing works.
<description>(.*?)</description>
Shouldn't that work?
EDIT:
Here is the actual code.
<description><![CDATA[<center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"><th>ID</th><td>308</td></tr></table></center>]]></description>
I'm not familiar with Eclipse, but I would expect its regex search facility to use Java's built-in regex flavor. You probably just need to check a box labeled "DOTALL" or "single-line" or something similar, or you can add the corresponding inline modifier to the regex:
(?s)<description>(.*?)</description>
That will allow the . to match newlines, which it doesn't by default.
EDIT: This is assuming there are newlines within the <description> element, which is the only reason I can think of why your regex wouldn't work. I'm also assuming you really are doing a regex search; is that automatic in Eclipse, or do you have to choose between regex and literal searching?
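The effect of the (?s) modifier can be checked with a small script. This sketch uses Python's re module rather than Eclipse, on the assumption that the description element spans several lines; the sample XML is made up for illustration.

```python
import re

# Made-up sample in which the <description> element spans several lines.
xml = (
    "<item>\n"
    "  <name>A</name>\n"
    "  <description><![CDATA[<center>\n"
    "    <table><tr><th>ID</th><td>308</td></tr></table>\n"
    "  </center>]]></description>\n"
    "</item>\n"
)

pattern = r"<description>(.*?)</description>"

# Without DOTALL, "." stops at the first newline, so nothing is removed.
without_dotall = re.sub(pattern, "", xml)

# With the inline (?s) modifier, "." also matches newlines.
with_dotall = re.sub("(?s)" + pattern, "", xml)

print(without_dotall == xml)           # True
print("<description>" in with_dotall)  # False
```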

negative look ahead to exclude html tags

I'm trying to come up with a validation expression to prevent users from entering html or javascript tags into a comment box on a web page.
The following works fine for a single line of text:
^(?!.*(<|>)).*$
..but it won't allow any newline characters because of the dot(.). If I go with something like this:
^(?!.*(<|>))(.|\s)*$
it will allow multiple lines but the expression only matches '<' and '>' on the first line. I need it to match any line.
This works fine:
^[-_\s\d\w"'\.,:;#/&\$\%\?!#\+\*\\(\)]{0,4000}$
but it's ugly and I'm concerned that it's going to break for some users because it's a multi-lingual application.
Any ideas? Thanks!
Note that your RE prevents users from entering < and >, in any context. "2 > 1", for example. This is very undesirable.
Rather than trying to use regular expressions to match HTML (which they aren't well suited to do), simply escape < and > by transforming them to &lt; and &gt;. Alternatively, find a package for your language-of-choice that implements whitelisting to allow a limited subset of HTML, or that supports its own markup language (I hear markdown is nice).
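As a rough illustration of the escaping approach, here is a sketch using Python's standard html module. The question targets ASP.NET, so this only shows the idea; the sample comment is invented.

```python
import html

# Invented sample comment containing both harmless and unwanted angle brackets.
comment = 'I think 2 > 1 and <script>alert("x")</script> is not welcome'

# html.escape replaces &, < and > (and quotes by default) with entities,
# so the text is displayed literally instead of being interpreted as markup.
escaped = html.escape(comment)

print(escaped)
```

The user can still type "2 > 1", and the stored text can be recovered with html.unescape if needed.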
As for "." not matching newline characters, some regexp implementations support a flag (usually "m" for "multi-line" and "s" for "single line"; the latter causes "." to match newlines) to control this behavior.
The first two are basically equivalent to /^[^<>]*$/, except this one works on multiline strings. Any reason why you didn't write the RE that way?
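The negated character class can be tried out as follows. This is a minimal Python sketch with a hypothetical helper name (is_allowed); a negated class matches newline characters, so no DOTALL-style flag is needed.

```python
import re

# [^<>] matches any character except the angle brackets, newlines included.
NO_ANGLE_BRACKETS = re.compile(r"[^<>]*")

def is_allowed(text: str) -> bool:
    """Return True when the whole text contains neither '<' nor '>'."""
    return NO_ANGLE_BRACKETS.fullmatch(text) is not None

print(is_allowed("first line\nsecond line"))    # True
print(is_allowed("first line\n<b>second</b>"))  # False
```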
So, I looked into it and there is a .NET 'SingleLine' option for regular expressions that causes "." to also match the newline character. Unfortunately, this isn't available in the ASP.NET RegularExpressionValidator. As far as I can see, there's no way to make something like ^(?!.*(<\w+>)).*$ work on a multi-line textbox without doing server-side validation.
I took your advice and went the route of escaping the tags on the server side. This requires setting the validation page directive to 'false' but in this particular instance that isn't a big deal because the comment box is really the only thing to worry about.

Help with an Emacs Regular Expression

I have statements like this all over my code:
LogWrite (String1,
String2,
L"=======format string======",
...
);
I want to change each of these to:
LogWrite (String1,
String2,
L"format string",
...
);
I'm trying to write the regexp required to do this using the Emacs function query-replace-regexp, but not much success yet. Help please!
UPDATE:
1) In case it is not clear, this question is emacs specific.
2) I would like to match the entire code chunk starting from Log... ending at );
3) I used the following reg-exp to match the code chunk:
L.*\n.*\n.*==.*;
I used re-builder to match this regexp. the \n is used because I found that otherwise emacs would stop matching at the new line. The problem is that I don't know how to select the format string and save it to use it in the replace regexp - hence the ==.* part in the regexp. That needs to be modified to save the format string.
If you don't have multiple (or escaped) double quotes in those format string lines, you can
//replace
L"=+(.*)=+"
//with
L"\1"
Update: Removed the lazy quantifier (thanks @tim). Make sure that the regex is not multiline; the greedy * will lead to pretty bad results if . matches newlines.
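One thing worth checking before running the replacement: with a greedy .* the capture group keeps all but one of the trailing = characters, because the final =+ only needs a single one. The sketch below uses Python's re module (not Emacs syntax) to compare the pattern above with a variant that uses a negated character class. That variant is an alternative, not the answer's original pattern; in Emacs syntax it would presumably be written L"=+\([^="]*\)=+" with L"\1" as the replacement.

```python
import re

line = 'L"=======format string======",'

# Greedy .* swallows the trailing "=" run and gives back only one "=".
greedy = re.sub(r'L"=+(.*)=+"', r'L"\1"', line)

# A negated class cannot run into the "=" padding or past the closing quote.
negated = re.sub(r'L"=+([^="]*)=+"', r'L"\1"', line)

print(greedy)   # L"format string=====",
print(negated)  # L"format string",
```

The negated-class variant assumes the format string itself contains no = or double-quote characters.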
A great tool to figure out emacs regular expressions is:
M-x re-builder
A brief description from the documentation:
When called up `re-builder' attaches itself to the current buffer which becomes its target buffer, where all the matching is done. The active window is split so you have a view on the data while authoring the RE. If the edited expression is valid the matches in the target buffer are marked automatically with colored overlays (for non-color displays see below) giving you feedback over the extents of the matched (sub) expressions. The (non-)validity is shown only in the modeline without throwing the errors at you. If you want to know the reason why RE Builder considers it as invalid call `reb-force-update' ("\C-c\C-u") which should reveal the error.
It comes built into Emacs (since 21)
And for the syntax of Emacs regular expressions, you can read these info pages:
Syntax of Regular Expressions
Backslash in Regular Expressions
/={7}(.*)={6}/\1/
this should do.

Regex not returning 2 groups

I'm having a bit of trouble with my regex and was wondering if anyone could please shed some light on what to do.
Basically, I have this Regex:
\[(link='\d+') (type='\w+')](.*|)\[/link]
For example, when I pass it the string:
[link='8' type='gig']Blur[/link] are playing [link='19' type='venue']Hyde Park[/link]
It only returns a single match from the opening [link] tag to the last [/link] tag.
I'm just wondering if anyone could please help me with what to put in my (.*|) section to only select one [link][/link] section at a time.
Thanks!
You need to make the wildcard selection ungreedy with the "?" operator. I make it:
/\[(link='\d+')\s+(type='\w+')\](.*?)\[\/link\]/
Of course, this all falls down for any kind of nesting, in which case the language is no longer regular and regexes aren't suitable; find a parser instead.
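The difference between the greedy and ungreedy versions can be seen on the sample string from the question. This is a sketch in Python's re syntax; the delimiters and escaping differ slightly from the /.../ form above.

```python
import re

text = "[link='8' type='gig']Blur[/link] are playing [link='19' type='venue']Hyde Park[/link]"

# Greedy: .* runs to the last [/link], so both tags collapse into one match.
greedy = re.compile(r"\[(link='\d+')\s+(type='\w+')\](.*)\[/link\]")

# Lazy: .*? stops at the first [/link] that allows the match to succeed.
lazy = re.compile(r"\[(link='\d+')\s+(type='\w+')\](.*?)\[/link\]")

print(len(greedy.findall(text)))  # 1
print(lazy.findall(text))
# [("link='8'", "type='gig'", 'Blur'), ("link='19'", "type='venue'", 'Hyde Park')]
```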
Regular Expressions Info is a fantastic site. This page gives an example of dealing with HTML tags. There's also an Eclipse plugin that lets you develop expressions and see the matching in real time.
You need to make the .* in the middle of your regex non-greedy. Look up the syntax and/or flag for non-greedy mode in your flavor of regular expressions.