How can I specify the regular expression dialect in IntelliJ IDEA?

I have a file which is in Java's regular expression dialect:
# Prevents matching at the second half of a version number and things like
# 1.16.2 splitting into 1.1 and 6.2
(?<![._\-\d])
(?<sign>-)?
(?<integerPart>\d+(?:,\d+)*)
(
(?<fractionalPart>\.\d+)?
(?<suffix>[kKMG%])?
# Prevents matching at the first half of a version number
(?![._\-\d])
|
# Note how this one does _not_ include '.' because we wanted to deal with
# integers with a period after them. This may change?
(?![_\-\d])
)
IDEA gives me errors on all the groups, saying: "This named group syntax is not supported in this regex dialect".
But when I edit settings for this inspection there is just one checkbox.
Questions:
What dialect is the default anyway? I'm mildly surprised that it isn't the Java Pattern one.
How do I configure this to use the Java one? Is there a magic comment I can put in the file to hint at the format, which IDEA and maybe even other text editors would recognise?

It looks like a known bug in IntelliJ IDEA. There is no way to change the dialect at the moment.

Regex to match everything except a pattern

Regex noob here struggling with this, which I know will be easy for some of you regex gods out there!
Given the following:
title: Some title
date: 2022-08-15
tags: <value to extract>
identifier: 1234567
---------------------------
Some text
some more text
I would like a regex to match everything except the value of tags (i.e. the "<value to extract>" text).
For context, this is supposed to run on emacs (in case it matters).
EDIT: Just to clarify as per @phils' question, all I care about is extracting the tags value. However, this is via a package setting that asks for a regex string and I don't have much control over how it gets used. It seems to expect a regex to strip what I don't need from the string rather than matching what I do want, which is slightly annoying. Also, since it seems to match everything with \\(.\\), I'm guessing it's using the global flag?
Please let me know if any of this isn't clear.
Emacs regular expressions can't trivially express "not foo" for arbitrary values of foo. (The likes of PCRE have non-regular extensions for zero-width negative look-ahead/behind assertions, but in Emacs that sort of functionality is generally done with the support of lisp code [1].)
You can still do it purely with regexp matching, but it's simply very cumbersome. An Emacs regexp which matches any line which does not begin with tags: is:
^\(?:$\|[^t]\|t[^a]\|ta[^g]\|tag[^s]\|tags[^:]\).*
or if you need to enter it in the elisp double-quoted read syntax for strings:
"^\\(?:$\\|[^t]\\|t[^a]\\|ta[^g]\\|tag[^s]\\|tags[^:]\\).*"
[1] In lisp code you would instead simply check each line to see whether it does start with tags: and, if so, skip it (which is why Emacs generally gets away without the feature you're looking for, but of course that doesn't help you here).
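For comparison outside Emacs, here is a minimal Python sketch of the three approaches mentioned above: the cumbersome alternation (rewritten in Python's syntax, which does not backslash-escape groups and |), a negative look-ahead of the kind PCRE-style engines offer, and a plain per-line check like the lisp approach in the footnote. The tags value "emacs regex" is made up for illustration, and Python is only a stand-in here, not something the package in question uses.

```python
import re

# Sample text from the question, with a made-up tags value.
text = "\n".join([
    "title: Some title",
    "date: 2022-08-15",
    "tags: emacs regex",
    "identifier: 1234567",
    "---------------------------",
    "Some text",
    "some more text",
])

# The alternation from the answer, rewritten in Python syntax.
alternation = re.compile(
    r"^(?:$|[^t]|t[^a]|ta[^g]|tag[^s]|tags[^:]).*", re.MULTILINE
)

# The same intent with a negative look-ahead, which Emacs regexps lack.
lookahead = re.compile(r"^(?!tags:).*", re.MULTILINE)

matched_by_alternation = alternation.findall(text)
matched_by_lookahead = lookahead.findall(text)

# The per-line approach: test each line and skip the ones starting with "tags:".
kept_by_filter = [
    line for line in text.split("\n") if not line.startswith("tags:")
]
```

On this sample all three give the same six lines, leaving out only the tags line.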
After playing around with it for a bit and taking inspiration from @phils' answer, I've come up with the following:
"^\\(?:\\(#\\+\\)?\\(?:filetags:\s+\\|tags:\s+\\|title:.*\\|identifier:.*\\|date:.*\\)\\|.*\\)"
I've also added an extra \\(#\\+\\)? to account for org meta keys which would usually have the format #+key: value.

Are named capture groups supported? If so, how to engage?

I understand that VSCode uses the JavaScript regex engine for its functionality.
The latest JavaScript specification allows for named capture groups to be used.
However, I can't work out whether this is enabled in VSCode v1.43.
I am using the following notations in the general find command:
(?<name-of-capture>pattern to find)( other stuff )(\k<name-of-capture>)
(?<name-of-capture>pattern to find)( other stuff )(\g<name-of-capture>)
I have also used the combinations of \k'name' and \g'name' and these have no effect.
If anyone has insights into this, I would appreciate hearing them.
If you want to use an inline backreference, they work in VSCode.
(?<group>[a-z]+) \d+ \k<group>
matches abc 1 abc.
However, the new JavaScript-like $<group> replacement does not work, and neither does the .NET-style replacement backreference, ${group}, probably due to the issue referred to by @JW.
NOTE: They say they need 20 votes on the issue and there are 3 days to go before they close the issue and turn down the suggestion to introduce backreferences in replacement. If you want this feature to be implemented, please consider voting for that issue.
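To see the inline backreference behaviour described above outside the editor, here is a small Python sketch. Note that Python's re module spells named groups differently from the JavaScript syntax VSCode uses: (?P<group>...) defines the group and (?P=group) refers back to it. So this only illustrates the idea and is not a pattern to paste into VSCode.

```python
import re

# Python spelling of the answer's pattern (?<group>[a-z]+) \d+ \k<group>
pattern = re.compile(r"(?P<group>[a-z]+) \d+ (?P=group)")

# The backreference must repeat exactly what the named group captured.
same_word = pattern.fullmatch("abc 1 abc")
different_word = pattern.fullmatch("abc 1 abd")

captured = same_word.group("group") if same_word else None
```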

Futile attempt to run regular expression find/replace in MS Word using groups on Mac

According to the received wisdom MS Word (more or less) supports find/replace with use of regular expressions. I have a simple regular expression:
^(C[[:alpha:]]*)(\d*)(.*)$
That I'm running on the data:
indSIMDdecile
CSdeccrim12006
CSdeccrim12006
CSdeccrim12009
CSdeccrim12009
CSdeccrim12012
CSdeccrim12012
CSdeceduc12004
CSdeceduc12004
CSdeceduc12006
CSdeceduc12006
CSdeceduc12009
CSdeceduc12009
CSdeceduc12012
CSdeceduc12012
CSdecemp12004.x
I'm interested in returning the first word prior to the digit 1, which works as demonstrated on regex101 here.
Problem
I would like to do the same but in MS Word (v. 15.18 on Mac). After getting error messages for supplying unsuitable syntax, I learned that MS Word does not support the full regex syntax. I simplified my expression to something along the lines of:
but the search does not find any strings and nothing gets replaced. Hence my question: is it possible to use MS Word on Mac with regex?
The linked help website hints that something like that should be possible, but so far no luck.
The simple answer is "no", if you mean "Does Mac Word have a UI feature that lets you use one of the modern dialects of regex?" Word's Find/Replace only supports its own Regular Expression syntax.
In this case, I think the following will give you what you need:
Find with wildcards:
(C)([!1]@)(1)
and a replace by
\1
(If you also had to find "C1", then that doesn't work, and unfortunately nor does
(C)([!1]{0,})(1)
because Word does not allow 0 in the {,} pattern)
But there is a problem with "@". If the text the "@" is looking for is long, the find/replace may fail. There is supposed to be a 255 limit, but it seems rather more arbitrary than that. (I have long suspected a buffer overrun type error in the Word code, but perhaps there is a simpler explanation).
If you mean, "is there any way to use modern regex with Word?", then the answer is "Yes, but you only get to operate on a copy of the text in the document." You will need to create your own code to do the "replace" part of the find/replace, and that means that you would have to deal with any of the issues, such as preserving formatting, that Word's built-in find/replace might get right for you.
On the Windows side, people who want a better regex than Word's often use VBScript's regexp object because it is easily used from VBA. VBA itself only really has the "like" operator, which also only has fairly crude pattern matching abilities. I think there are examples of VBScript regexp use on StackOverflow. On the Mac side, you would either have to use VBA and "shell out" to one of the built-in Mac/Unix utilities to do your finding (and perhaps replacing), or perhaps use AppleScript or JavaScript application scripting to do it. As far as I can remember AppleScript does not have a 'modern' regex built-in either.
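As a rough illustration of "operating on a copy of the text", here is a minimal Python sketch. It assumes the document text has already been exported as plain text (so all formatting is lost), and Python is just one possible outside tool, not something Word provides. Python's re module has no [[:alpha:]] class, so [A-Za-z] stands in for it.

```python
import re

# A few sample lines from the question; in practice this would be a
# plain-text copy of the Word document (an assumption of this sketch).
text = "\n".join([
    "indSIMDdecile",
    "CSdeccrim12006",
    "CSdeceduc12004",
    "CSdecemp12004.x",
])

# Same shape as the question's pattern, with [A-Za-z] in place of [[:alpha:]].
pattern = re.compile(r"^(C[A-Za-z]*)(\d*)(.*)$", re.MULTILINE)

# Keep only the leading word (group 1) on every matching line.
replaced = pattern.sub(r"\1", text)
prefixes = [m.group(1) for m in pattern.finditer(text)]
```

Lines that do not start with "C", such as indSIMDdecile, are left untouched by the substitution.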
[As a bit of history, Word's "regular expressions" were I think introduced in Word 6, around 1993, at a time when most dialects of regex were much more crude than they are today. I don't think Word's version has moved along much at all - it probably added some Unicode support at some point, but that's probably about it. I assume that people using modern regex don't regard it as regex at all, and I personally prefer not to call Word's Regular Expressions 'regex' precisely for that reason.]

Filter by regex example

Could anyone provide an example of a regex filter for the Google Chrome Developer toolbar?
I especially need exclusion. I've tried many regexes, but somehow they don't seem to work.
It turned out that Google Chrome actually didn't support this until early 2015, see Google Code issue. With newer versions it works great, for example excluding everything that contains banners:
/^(?!.*?banners)/
It's possible -- at least in Chrome 58 Dev. You just need to wrap your regex with forward-slashes: /my-regex-string/
For example, this is one I'm currently using: /^(.(?!fallback font))+$/
It successfully filters out any messages that contain the substring "fallback font".
EDIT
Something else to note is that if you want to use the ^ (caret) symbol to search from the start of the log message, you have to first match the "fileName.js?someUrlParam:lineNumber " part of the string.
That is to say, the regex is matching against not just the log message, but also the stack-entry for the line which made the log.
So this is the regex I use to match all log messages where the actual message starts with "Dog":
/^.+?:[0-9]+ Dog/
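The two patterns from this answer can be tried outside DevTools with a short Python sketch. The console lines below are made up, in the "fileName.js:lineNumber message" shape described above, and Python's engine is only assumed to behave like Chrome's for these particular constructs.

```python
import re

# Made-up console lines in the "file:line message" shape.
lines = [
    "app.js:10 Loaded banners/top.png",
    "app.js:12 Using fallback font for heading",
    "app.js:15 Dog barked",
]

# Matches only lines in which no character is followed by "fallback font".
exclude_fallback = re.compile(r"^(.(?!fallback font))+$")

# Skips the "file:line " prefix, then requires the message to start with "Dog".
starts_with_dog = re.compile(r"^.+?:[0-9]+ Dog")

without_fallback = [s for s in lines if exclude_fallback.search(s)]
dog_messages = [s for s in lines if starts_with_dog.search(s)]
```

One thing to be aware of, at least in Python: the look-ahead in the first pattern is only tested after each consumed character, so a string that begins with "fallback font" would still match. With the file and line prefix in front of every message that case should not come up.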
The negative or exclusion case is much easier to write and think about when using the DevTool's native syntax. To provide the exclusion logic you need, simply use this:
-/app/ -/some\sother\sregex/
The "-" prior to the regex makes the result negative.
Your expression should not contain the forward slashes and /s; these are not needed for crafting a filter.
I believe your regex should finally read:
!(appl)
Depending on what exactly you want to filter.
The regex above will filter out all lines without the string "appl" in them.
edit: apparently exclusion is not supported?

Selenium-server fails regexp pattern that passes in Selenium-IDE

I have a (I guess) simple question.
I am running Selenium test cases (HTML, Selenese) after a successful build in Hudson. The test cases pass when I run them in the IDE after the build, but not on the server. The cases that fail contain a regexp pattern like so:
regexp:The profile details have been updated, it can take up to [0-9]*\s\w*\(.\) until the changes are fully visible.
The goal is to match times like 1 hour(s), 30 second(s), 4 minutes(s).
Has anyone encountered a problem like this, and how did you solve it?
I would use a slightly simpler expression. Assuming you do not need to handle things like 1 second, 2 seconds, etc., but only things like 1 second(s) or 2 second(s), try replacing the regex part with:
[0-9]+ [a-z]+\(s\)
Some older regex flavors (POSIX ERE) don't allow the shorthand character classes, so if those rules are somehow being invoked, then it would fail on \s and \w (and maybe even the .). The above expression should work almost anywhere; though it is not as flexible, it should be flexible enough for your situation.
Some flavors that are even older (POSIX BRE) don't support the shorthand repetition and actually interpret curly braces and parentheses differently as well. If that is the flavor being expected, it might just fail on any normal expression, and you'd need to use something like:
[0-9]\{1,\} [a-z]\{1,\}(s)
If neither of these work, then there is a significant difference in the engines implementing your regex OR there is some difference in how the page is rendering and being parsed/searched/analyzed by the Selenium Server.
If that is the case, there could be <br> tags or formatting tags (like <i> or <span>) that are being parsed as text rather than being stripped from the string. Then something like changes are fully visible might be parsed as changes are <em>fully</em> visible, which would fail the test.
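Both points can be checked outside Selenium with a small Python sketch: that the portable pattern matches the intended durations, and that stray markup in the compared text would make an otherwise correct pattern fail. The string containing <em> is a hypothetical rendering, not something taken from the real page, and Python's engine is only a stand-in for whatever Selenium Server uses.

```python
import re

PREFIX = "The profile details have been updated, it can take up to "
SUFFIX = " until the changes are fully visible."

# Portable duration pattern from the answer: digits, a space,
# lowercase letters, then a literal "(s)".
duration = r"[0-9]+ [a-z]+\(s\)"
message = re.compile(re.escape(PREFIX) + duration + re.escape(SUFFIX))

samples = ["1 hour(s)", "30 second(s)", "4 minutes(s)"]
plain_results = [bool(message.fullmatch(PREFIX + s + SUFFIX)) for s in samples]

# Hypothetical case where inline markup leaks into the compared text.
with_markup = PREFIX + "1 hour(s)" + " until the changes are <em>fully</em> visible."
markup_result = bool(message.fullmatch(with_markup))
```

If the server-side comparison sees text like the second string, no change of regex flavor would help; the markup would have to be accounted for in the pattern or stripped first.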
When you test it locally versus on the Hudson run, are there differences in OS platform?
I know that I had some problems with Windows XP versus Unix (where Hudson was deployed). I don't think that is the case here, though; it's just a thought.