What regular expression variant is used in Visual Studio Code?

I know that I can use Ruby's regular expressions in a tmLanguage file; however, that does not seem to be the case in other configuration files, e.g. those for extensions. Take, for example, the firstLine value in the language contribution: I get errors when I use character classes (e.g. \s or \p{L}). Hence I wonder what is actually allowed there. How would you match whitespace there?
Update:
After the comments I tried this:
"firstLine": "^(lexer|parser)?\\s*grammar\\w+;"
It is supposed to match a first line such as lexer grammar G1; or just grammar G1;. Is there a way to test whether that regex works? I have no other way to validate it.
Update 2:
The mistake was in the pattern itself: it did not allow whitespace between grammar and the name. With \\s* added there (and before the semicolon), it works:
"firstLine": "^(lexer|parser)?\\s*grammar\\s*\\w+\\s*;"

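One way to test such a pattern outside the editor is to decode the JSON value and run it against a few sample first lines. The sketch below uses Python's json and re modules purely as a stand-in; it is not the engine the editor runs, but the pattern only uses basic features (^, groups, alternation, \s, \w, quantifiers) that mainstream flavors share. Apart from the two first lines quoted in the question, the samples are made up.

```python
import json
import re

# The two "firstLine" values as they would appear in package.json.
# json.loads turns each JSON escape "\\s" into the regex escape "\s".
config = json.loads(r'''
{
  "before": "^(lexer|parser)?\\s*grammar\\w+;",
  "after": "^(lexer|parser)?\\s*grammar\\s*\\w+\\s*;"
}
''')

# First lines the pattern should accept (the first three) and reject (the last).
samples = ["lexer grammar G1;", "parser grammar P;", "grammar G1;", "int x;"]


def check(pattern):
    """Return, for each sample line, whether the pattern matches it."""
    rx = re.compile(pattern)
    return [rx.search(line) is not None for line in samples]


if __name__ == "__main__":
    for name, pattern in config.items():
        print(name, pattern, check(pattern))
```

The first version fails on every sample, because it requires a word character immediately after grammar; the second accepts the three grammar headers and rejects the unrelated line.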
.NET regular expressions use a syntax that is largely based on Perl 5; however, .NET adds a few features of its own, such as named capture groups and right-to-left matching, so the two should not be treated as identical. Here is the full MSDN documentation for .NET regular expressions:
.NET Framework Regular Expressions
\s is a valid character class in .NET, but it is difficult to say exactly what the problem is without seeing the code you are trying. Andrew could be right that you simply did not escape the \ (inside a JSON string, \s has to be written as \\s).
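A likely source of the errors is JSON escaping rather than the regex flavor. Assuming the configuration file is parsed as standard JSON, "\s" inside a string is an invalid escape, while "\\s" decodes to the two characters \s that the regex engine then sees. A quick check with Python's json module, using made-up pattern strings and a hypothetical helper name:

```python
import json


def decode_json_string(literal):
    """Decode one JSON string literal (quotes included), or return None if invalid."""
    try:
        return json.loads(literal)
    except json.JSONDecodeError:
        return None


single = r'"^\s*grammar"'    # backslash not escaped: "\s" is not a JSON escape
doubled = r'"^\\s*grammar"'  # escaped: JSON "\\" decodes to a single backslash

if __name__ == "__main__":
    print(decode_json_string(single))   # rejected as an invalid escape
    print(decode_json_string(doubled))  # the regex ^\s*grammar
```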


How can I simulate a negative lookahead in a regular expression

I have the following regular expression, which includes a negative lookahead. Unfortunately, the tool I'm using does not seem to support it, so I'm wondering whether it's possible to achieve negative lookahead behaviour without actually using one.
Here is my regular expression:
(?<![ABCDEQ]|\[|\]|\w\w\d)(\d+["+-]?)(?!BE|AQ|N)(?:.*)
Here it is working with sample data on Regex101.com:
see expression on regex101.com
I'm using a tool called Alteryx. The documentation indicates that it uses Perl; however, for whatever reason, the lookahead does not work.
Alteryx appears to use the Boost library for its regex support, and the Boost documentation says lookbehind expressions must have a fixed length. It's more restrictive than PHP (PCRE), which allows you to use alternation in a lookbehind, as long as each branch is fixed-length. But that's easy enough to get around: just use multiple lookbehinds:
(?<![ABCDEQ])(?<!\[)(?<!\])(?<!\w\w\d)(\d+["+-]?)(?!BE|AQ|N)(?:.*)
That regex works for me in a Boost-powered regex tester, where yours doesn't. I would compress it a little more by putting square brackets inside the character set:
(?<![][ABCDEQ])(?<!\w\w\d)(\d+["+-]?)(?!BE|AQ|N)(?:.*)
The right bracket is treated as a literal when it's the first character listed, and the left bracket is never special (though some other flavors have different rules).
Here's the updated demo.
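Python's re module has the same fixed-width lookbehind restriction as Boost, so it can serve as a stand-in for checking the workaround. This is a sketch, not Alteryx itself; the helper names are hypothetical, and the sample lines are made up because the regex101 data is not reproduced in the post.

```python
import re

# Original: one lookbehind whose alternatives have different widths (1, 1, 1, 3).
ORIGINAL = r'(?<![ABCDEQ]|\[|\]|\w\w\d)(\d+["+-]?)(?!BE|AQ|N)(?:.*)'

# Workaround: one fixed-width lookbehind per alternative.
SPLIT = r'(?<![ABCDEQ])(?<!\[)(?<!\])(?<!\w\w\d)(\d+["+-]?)(?!BE|AQ|N)(?:.*)'

# Compressed: "]" listed first in the set is a literal, "[" is not special there.
COMPRESSED = r'(?<![][ABCDEQ])(?<!\w\w\d)(\d+["+-]?)(?!BE|AQ|N)(?:.*)'


def compiles(pattern):
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def captured(pattern, text):
    """Return group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


# Hypothetical sample lines.
samples = ['12 inches', '3" pipe', 'Q7', '5BE', '5N']

if __name__ == '__main__':
    print('original compiles:', compiles(ORIGINAL))
    for s in samples:
        print(repr(s), captured(SPLIT, s), captured(COMPRESSED, s))
```

Python rejects the original pattern at compile time, while both rewritten versions compile and capture the same groups on these samples.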

Remove text between two characters (parentheses) in a string

I'm working on a project and I want to remove text between two parentheses in a string.
Example:
std::string str = "I want to remove (this).";
How would I go about doing that?
I've searched Google and Stack Overflow and haven't found anything.
I'd use a regular expression for that; check out the link I provided. As for the expression, use the following:
(\()(?:[^\)\\]*(?:\\.)?)*\)
That one worked for me.
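A sketch in Python with a deliberately simpler pattern than the one above. This is a swapped-in alternative: it does not handle backslash-escaped parentheses, but it avoids nested quantifiers. Repeating the substitution until nothing changes also removes nested groups, innermost first. The function name is hypothetical, and the same pattern should also be usable with std::regex_replace in C++, though that is not shown here.

```python
import re

# Optional whitespace, "(", any run of non-parenthesis characters, ")".
# Because [^()] excludes parentheses, each pass removes only innermost groups.
INNERMOST_GROUP = re.compile(r"\s*\([^()]*\)")


def remove_parenthesized(text):
    """Remove every parenthesized part of text, including nested ones."""
    while True:
        stripped = INNERMOST_GROUP.sub("", text)
        if stripped == text:  # nothing left to remove
            return text
        text = stripped


if __name__ == "__main__":
    print(remove_parenthesized("I want to remove (this)."))  # I want to remove.
    print(remove_parenthesized("a (b (c) d) e"))             # a e
```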
Conditionally replace regex matches in string
Do not confuse regular expressions with common expressions. This is not like the more common expressions :-), :-O or >:(. Although effective, these are mutually exclusive expressions that not many languages understand, but they are more commonly used.

regex matching pair of brackets

I'm trying to write a Sublime Text 2 syntax highlighter for Simulink's Target Language Compiler (TLC) files. This is a scripting language for auto-generating code. In TLC, the syntax to expand the contents of a token (similar to dereferencing a pointer in C or C++) is
%<token>
The regular expression I wrote to match this is
%<.+?>
This works for most cases, but fails for the following statement
%<LibAddToCommonIncludes("<string.h>")>
Making the quantifier greedy fixes this if the statement is alone on a line, but fails in several other cases, so that is not an option.
For that line, the highlighting stops at the first > instead of the second. How can I modify the regular expression to handle this case?
It'd be great if there was a general expression that could handle any number of nested <> pairs; for example
%<...<...>...<...<...>...>...>
where the dots are optional characters. The entire expression above should be a single match.
A generic solution with regular expressions is difficult, as explained very well in this thread.
You can instead try to match the two < characters explicitly, with something like %<.+?<.+?>.+?>
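The difference between the lazy pattern from the question and the two-level pattern can be checked quickly in Python. Sublime Text uses its own regex engine, so Python's re is only a stand-in here, but lazy quantifiers should behave the same way for these simple patterns. The helper name is hypothetical.

```python
import re

LAZY = r"%<.+?>"               # the question's pattern: stops at the first ">"
TWO_LEVEL = r"%<.+?<.+?>.+?>"  # the answer's pattern: exactly one nested <...>

line = '%<LibAddToCommonIncludes("<string.h>")>'


def first_match(pattern, text):
    """Return the text of the first match, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(first_match(LAZY, line))       # ends after <string.h>, too early
    print(first_match(TWO_LEVEL, line))  # the whole expression
    # The two-level pattern needs an inner "<", so a plain token does not match
    # and still has to be handled by the lazy pattern.
    print(first_match(TWO_LEVEL, "%<token>"))
    # With several tokens on one line, the lazy dots can run from one into the next.
    print(first_match(TWO_LEVEL, "%<a> and %<b<c>d>"))
```

So in a highlighter the two-level rule would have to be tried before the plain one, and it can still over-match when tokens share a line.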

VB6 and C# regexes

I need to convert a VB6 project (a language I'm not familiar with) to a C# 4.0 one. The project contains some regexes for string validation.
I need to know whether the regexes behave the same in both cases: if I just copy the regex string from the VB6 project to the C# project, will it work the same?
I have a basic knowledge of regexes and can just about read what one does, but flavors and such are a bit over my head at the moment.
For example, are these 2 lines equivalent?
VB6:
isStringValid = (str Like "*[!0-9A-Z]*")
C#:
isStringValid = Regex.IsMatch(str, "*[!0-9A-Z]*");
Thanks!
The old VB Like operator, despite appearances, is not a regular expression interface. It's more of a glob pattern matcher. See http://msdn.microsoft.com/en-us/library/swf8kaxw.aspx
In your example:
Like "*[!0-9A-Z]*"
This matches any string that contains at least one character outside the ranges 0-9 and A-Z; the leading and trailing * match any run of characters, including none. The regular expression for this would be:
/.*[^0-9A-Z].*/
EDIT To answer your question: No, the two can't be used interchangeably. However, it's fairly easy to convert Like's operand into a proper regular expression:
Like          RegEx
==========    ==========
?             .
*             .*
#             \d
[abc0-9]      [abc0-9]
[!abc0-9]     [^abc0-9]
There are a few caveats to this, but that should get you started and cover most cases.
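The table can be turned into a small translator. The Python sketch below uses hypothetical helper names (like_to_regex is not a library function); it uses a full match because Like compares the entire string, and it assumes case-sensitive matching (Option Compare Binary). It also shows that the Like pattern copied verbatim is not a valid regex, since it starts with a bare *.

```python
import re


def like_to_regex(pattern):
    """Translate a VB6 Like pattern into regex syntax, following the table.

    Sketch only: it does not handle empty [] lists, Option Compare Text, or
    characters inside [...] that regex treats specially but Like may not.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "?":
            parts.append(".")         # exactly one character
        elif ch == "*":
            parts.append(".*")        # zero or more characters
        elif ch == "#":
            parts.append("[0-9]")     # one digit (the table's \d, ASCII only)
        elif ch == "[":
            end = pattern.index("]", i + 1)  # ValueError if the list is unclosed
            body = pattern[i + 1:end]
            if body.startswith("!"):
                body = "^" + body[1:]        # [!...] negates, like [^...]
            parts.append("[" + body + "]")
            i = end
        else:
            parts.append(re.escape(ch))      # everything else is literal
        i += 1
    return "".join(parts)


def is_like(text, pattern):
    """Like matches the whole string, so use fullmatch rather than search."""
    return re.fullmatch(like_to_regex(pattern), text) is not None


def compiles(regex):
    """Return True if the string is a valid Python regex."""
    try:
        re.compile(regex)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(like_to_regex("*[!0-9A-Z]*"))      # .*[^0-9A-Z].*
    print(is_like("ABC123", "*[!0-9A-Z]*"))  # False
    print(is_like("ABC-123", "*[!0-9A-Z]*")) # True
    print(compiles("*[!0-9A-Z]*"))           # False
```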
In a word, yes.
These are the same. Some quick googling should give you answers to more complex issues.
http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/bce145b8-95d4-4be4-8b07-e8adee7286f1/
http://www.regular-expressions.info/dotnet.html

Is the syntax for writing regular expressions standardized?

Is the syntax for writing regular expressions standardized? That is, if I write a regular expression in C++, will it work in Python or JavaScript without any modifications?
No, there are several dialects of Regular Expressions.
They generally have many elements in common.
Some popular ones are listed and compared here.
For simple regular expressions, mostly yes. However, across the spectrum of programming languages, there are differences.
No. Here are some differences that come to mind:
JavaScript lets you write regex literals delimited by the / character, in which the \ in \s need not be escaped as \\s. You can specify flags after the closing /. JS also has a RegExp constructor that takes the escaped pattern string as its first argument and an optional flags string as its second.
/^\w+$/i and new RegExp("^\\w+$", "i") are valid and the same.
In PHP, you can enclose the regex string in a delimiter of your choice; the PHP manual allows any non-alphanumeric, non-backslash, non-whitespace character. Again, you should escape backslashes here.
"|[0-9]+|" is the same as "#[0-9]+#"
Python and C# support raw strings (called verbatim strings in C#; not limited to regex, but really helpful for writing regex) that let you write unescaped backslashes in your regex.
"\\d+\\s+\\w+" can be written as r'\d+\s+\w+' in Python and @"\d+\s+\w+" in C#.
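Anchors such as \< (start of word) and \A (start of string) are not supported everywhere.
Older JavaScript engines do not support lookbehind or a DOTALL flag; both were only added in ECMAScript 2018 (lookbehind assertions and the s flag).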
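Two of the points above can be checked in Python: the escaped string and the raw string produce the same pattern, and Python's re supports \A but treats \< as a literal < rather than a start-of-word anchor (a backslash before a non-alphanumeric character just makes that character literal). A small illustration:

```python
import re

# The same pattern written two ways.
escaped = "\\d+\\s+\\w+"  # ordinary string: every backslash doubled
raw = r"\d+\s+\w+"        # raw string: backslashes kept as written

# \A anchors at the start of the string in Python's re ...
starts_with_cat = re.search(r"\Acat", "cat nap") is not None
# ... but \< is not a word-start anchor here: it matches a literal "<".
word_start = re.search(r"\<cat", "the cat")
literal_lt = re.search(r"\<cat", "a <cat")

if __name__ == "__main__":
    print(escaped == raw)       # True
    print(starts_with_cat)      # True
    print(word_start)           # None
    print(literal_lt.group(0))  # <cat
```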