Finding Elvis ?:

I have been tasked with finding Elvis (using Eclipse search). Is there a regex I can use to find him?
The "Elvis Operator" (?:) is a shortening of Java's ternary operator.
I have tried \?[\s\S]*[:] but it doesn't match multiline.
Is there such a refactoring where I could change Elvis into an if-else block?

Edit: Sorry, I had posted a regex for the ternary operator. If your problem is matching across multiple lines, you could use this:
\?(\p{Z}|\r\n|\n)*:

You'll need to explicitly match line delimiters if you want to match across multiple lines. \R will match any of them (platform-independent), in Eclipse 3.4 anyway, or you can use the proper one for your file (\r, \n, \r\n). E.g. \?.*\R*.*: will work if there's only one line break. You can't use \R in a character class, though, so if you don't know how many lines the operator might span, you'd have to construct a character class with your line delimiter and any character that might appear in an operand. Something like:
([-\r\n\w\s\[\](){}=!/%*+&^|."']*)\?([-\r\n\w\s\[\](){}=!/%*+&^|."']*):([-\r\n\w\s\[\](){}=!/%*+&^|."']*)
I've included parentheses to capture the operands as groups so you could find and replace.
You've got a pretty big problem, though, if this is Java (and probably any other language). The ternary conditional ?: operator creates an expression, while an if statement is not an expression. Consider:
boolean even = true;
int foo = even ? 2 : 3;
int bar = if (even) 2 else 3;
The third line is syntactically incorrect; the two conditional constructs are not equivalent. (What you'd actually get from the second line if you used my regex to find and replace is if (int foo = even) 2 else 3; which has additional problems.)
So, you can find the ?: operators with the regex above (or something similar; I may have missed some characters you need to include in the class), but you won't necessarily be able to replace them with 'if' statements.
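To see what the capture groups actually pick up, here is a minimal Python sketch that runs the character-class regex above against a ternary spread over three lines. Python's re module stands in for Eclipse's search dialog, which is an assumption (the two engines may differ in details), and the sample statement is made up for illustration.

```python
import re

# One operand: any run of the characters listed in the answer's class,
# including \r and \n so that a match can cross line breaks.
OPERAND = r"""([-\r\n\w\s\[\](){}=!/%*+&^|."']*)"""
TERNARY = re.compile(OPERAND + r"\?" + OPERAND + ":" + OPERAND)

# Hypothetical sample: a ternary written across three lines.
source = "int foo = even\n    ? 2\n    : 3;"

match = TERNARY.search(source)
condition, if_true, if_false = (group.strip() for group in match.groups())

print(repr(condition))  # the first group also swallows the declaration
print(repr(if_true))
print(repr(if_false))
```

The first group comes back as the whole of "int foo = even" rather than just "even", which is exactly why a blind find-and-replace into an if statement goes wrong.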

Related

Why JFlex reject .+?(?=->)

I'm trying to design an IDEA plugin for a simple language.
I want to split the example below into three tokens: the text before ->, the text between -> and :, and the text after :
Ex: First part -> Second Part: Third part
For the first part, the regex .+?(?=->) works when I try it at https://regex101.com/r/TDBWg0/1.
But JFlex reports a syntax error for .+?(?=->):
Error in file "Simple.flex" (line 41):
Syntax error.
FIRST_PART=.+(?=(->))
^

Lexer generators like JFlex often have a different syntax and feature set than most other regex implementations, so helpers like regex101 aren't always that useful for them. Instead you should look at the JFlex manual to see which syntax JFlex supports.
There are three things of note there:
The syntax for lookahead is /regex not (?=regex)
There is no syntax for non-greedy quantifiers
< and > need to be quoted or escaped
So .+/"->" would be a valid regex, but when there are multiple ->s it will match up to the last ->, not the first. Presumably you tried to make the + non-greedy specifically so that it would only match up to the first, so this is no good.
Since there are no non-greedy modifiers in JFlex, we need a different approach. If we look at the available regex features again, we'll see that there's an operator ~, which works as follows:
~a (upto)
matches everything up to (and including) the first occurrence of a text matched by a. The expression ~a is equivalent to !([^]* a [^]*) a. A traditional C-style comment is matched by "/*" ~"*/".
So the regex you want is simply ~"->" (bearing in mind that, as quoted above, the match includes the -> itself).
Another approach, which works with virtually every regex implementation, is to write a regex that matches only text that does not contain ->, i.e. any character other than -, or a - that is not followed by a >. That would be:
([^-]|-[^\>])+
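A quick way to check that last pattern outside JFlex is to run it in an ordinary regex engine. The sketch below uses Python's re module (an assumption; JFlex's own matching rules may differ) and the example line from the question.

```python
import re

# Any character other than '-', or a '-' that is not followed by '>'.
BEFORE_ARROW = re.compile(r"([^-]|-[^>])+")

line = "First part -> Second Part: Third part"
first_part = BEFORE_ARROW.match(line).group(0)
print(repr(first_part))

# Caveat: "-[^>]" consumes two characters at once, so in "a-->b" it eats
# the "--" pair and the match then runs straight past the arrow.
overshoot = BEFORE_ARROW.match("a-->b").group(0)
print(repr(overshoot))
```

The first match stops just before the arrow. The second shows the pattern's blind spot: a - immediately before -> is paired with the second - and the arrow is never noticed, so input like that needs extra care.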

Regex to match all lines not starting with ==

This should be simple, but I've been having trouble with it. I want to write a regex in perl to match all lines that do not begin with an "==". I created this expression:
^[^\=\=].*
Which works fine in a regex tester I use, but when I run the perl script I get an error stating:
POSIX syntax [= =] is reserved for future extensions in regex
And the script terminates. I assume I'm using some syntax wrong, but I haven't found anything regarding this. Does anyone have a better way to match these lines?

Your regex is incorrect: for example, it fails to match =A, a line that does not begin with == and so should be accepted.
A way to do it would be with a Perl Compatible Regular Expression(PCRE): ^(?!==)

You're misunderstanding how character classes work in regular expressions.
A character class is delimited by square brackets [...] and generally will match any one of the characters that it encloses. So [abc] will match a, b, or c, but only the first character of aa or cbc. You probably know that you can also use ranges, such as [a-c].
You can also negate the class, as you have done, so [^a] will match any one character that isn't an a, such as z or &, but only the first character of zz.
Replicating a character in a class will not change what it matches, so [aardvark] will match exactly one of a, d, k, r, or v, and is equivalent to [adkrv].
Your regex pattern uses the character class [^\=\=]. It's unnecessary to escape an equals sign, and replicating it has no effect, so you have the equivalent of [^=], which will match any single character other than the equals sign =.
The reason you got that error message is that character classes beginning [= and ending =] (just [=] doesn't count) are reserved for special behaviour yet to be implemented. As above, there would ordinarily be no reason to write a character class with multiple occurrences of the same character, so it's reasonable to disallow such a construction.
perldoc perldiag has this to say:
POSIX syntax [= =] is reserved for future extensions in regex; marked by <-- HERE in m/%s/
(F) Within regular expression character classes ([]) the syntax beginning with "[=" and ending with "=]" is reserved for future extensions. If you need to represent those character sequences inside a regular expression character class, just quote the square brackets with the backslash: "\[=" and "=\]". The <-- HERE shows whereabouts in the regular expression the problem was discovered. See perlre.
A solution depends on how you want to use the test in your Perl code, but if you need an if statement then I would simply invert the test and check that the line doesn't start with ==
unless ( /^==/ )
or, if you're allergic to Perl's unless
if ( not /^==/ )
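The difference between these approaches is easy to see on a handful of lines. This is a small Python sketch rather than Perl (Python's re supports the same lookahead syntax); the sample lines are invented for illustration.

```python
import re

lines = ["== heading ==", "=A", "plain text", "", "==="]

lookahead = re.compile(r"^(?!==)")  # succeeds unless the line starts with "=="
original = re.compile(r"^[^=].*")   # what [^\=\=] boils down to

kept_by_lookahead = [s for s in lines if lookahead.search(s)]
kept_by_original = [s for s in lines if original.search(s)]
# Inverting a plain positive test, like Perl's unless ( /^==/ ).
kept_by_inverted = [s for s in lines if not re.search(r"^==", s)]

print(kept_by_lookahead)
print(kept_by_original)
print(kept_by_inverted)
```

The lookahead and the inverted test keep the same lines, including "=A" and the empty line, while the single-character class drops both of those.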

Regular expression in C++ for mathematical expressions

I have this problem: I must verify the correctness of many mathematical expressions, in particular checking for consecutive operators + - * /.
For example:
6+(69-9)+3
is OK, while
6++8-(52--*3)
is not.
I am not using the <regex> library since it is only available from C++11 onwards.
Is there an alternative method to solve this problem? Thanks.

You can use a regular expression to verify everything about a mathematical expression except the check that parentheses are balanced. That is, the regular expression will only ensure that open and close parentheses appear at the point in the expression they should appear, but not their correct relationship with other parentheses.
So you could check both that the expression matches a regex and that the parentheses are balanced. Checking for balanced parentheses is really simple if there is only one type of parenthesis:
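bool check_balanced(const char* expr, char open, char close) {
    int parens = 0;  // number of currently unclosed open parentheses
    for (const char* p = expr; *p; ++p) {
        if (*p == open) ++parens;
        // A close with nothing open is an error; otherwise it consumes one open.
        else if (*p == close && parens-- == 0) return false;
    }
    return parens == 0;  // every open parenthesis must have been closed
}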
To get the regular expression, note that mathematical expressions without function calls can be summarized as:
BEFORE* VALUE AFTER* (BETWEEN BEFORE* VALUE AFTER*)*
where:
BEFORE is a sub-regex which matches an open parenthesis or a prefix unary operator (if you have prefix unary operators; the question is not clear).
AFTER is a sub-regex which matches a close parenthesis or, in the case that you have them, a postfix unary operator.
BETWEEN is a sub-regex which matches a binary operator.
VALUE is a sub-regex which matches a value.
For example, for ordinary four-operator arithmetic on integers you would have:
BEFORE: [-+(]
AFTER: [)]
BETWEEN: [-+*/]
VALUE: [[:digit:]]+
and putting all that together you might end up with the regex:
^[-+(]*[[:digit:]]+[)]*([-+*/][-+(]*[[:digit:]]+[)]*)*$
If you have a Posix C library, you will have the <regex.h> header, which gives you regcomp and regexec. There's sample code at the bottom of the referenced page in the Posix standard, so I won't bother repeating it here. Make sure you supply REG_EXTENDED in the last argument to regcomp; REG_EXTENDED|REG_NOSUB, as in the example code, is probably even better since you don't need captures and not asking for them will speed things up.
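For a quick experiment with the two checks combined, here is a rough Python sketch, not the regcomp/regexec code the answer refers to. It assumes [0-9] as a stand-in for the POSIX class [[:digit:]], which Python's re does not support; the function names are illustrative only.

```python
import re

# BEFORE* VALUE AFTER* (BETWEEN BEFORE* VALUE AFTER*)*
# [0-9] replaces [[:digit:]] because Python's re has no POSIX classes.
SHAPE = re.compile(r"^[-+(]*[0-9]+[)]*([-+*/][-+(]*[0-9]+[)]*)*$")


def check_balanced(expr, open_ch="(", close_ch=")"):
    """Return True if every close has a matching earlier open."""
    depth = 0
    for ch in expr:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                return False  # a close with nothing open
            depth -= 1
    return depth == 0


def is_well_formed(expr):
    """Both checks must pass: token order (regex) and balance (counter)."""
    return SHAPE.match(expr) is not None and check_balanced(expr)


for sample in ["6+(69-9)+3", "6++8-(52--*3)", "(1+2"]:
    print(sample, is_well_formed(sample))
```

The last sample shows why both checks are needed: "(1+2" has its tokens in an acceptable order, so the regex alone accepts it, and only the balance counter rejects it.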

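You can loop over each char in your expression.
If you encounter a + you can check whether it is followed by another +, /, *...
Additionally, you can group operators together to prevent code duplication.
bool is_operator(char c) { return c == '+' || c == '-' || c == '*' || c == '/'; }

bool no_consecutive_operators(const char* expression) {
    for (int i = 0; expression[i] != '\0'; ++i) {
        switch (expression[i]) {
        case '+': case '-': case '*': case '/':
            // Syntax check: an operator must not be followed directly by another.
            if (is_operator(expression[i + 1])) return false;
            break;
        }
    }
    return true;
}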

Well, in the general case, you can't solve this with a regex. The "language" of arithmetic expressions can't be described by a regular grammar; it needs a context-free grammar. So if you want to check the correctness of an arbitrary mathematical expression, you'll have to write a parser.
However, if you only need to make sure that your string doesn't have consecutive +-*/ operators then regex is enough. You can write something like this [-+*/]{2,}. It will match substrings with 2 or more consecutive symbols from +-*/ set.
Or something like this ([-+*/]\s*){2,} if you also want to handle situations with spaces like 5+ - * 123
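Here is a small sketch of that second pattern in Python, used as a reject test on the examples from the question. The helper name is made up for illustration.

```python
import re

# Two or more operators in a row, optionally separated by whitespace.
CONSECUTIVE_OPS = re.compile(r"([-+*/]\s*){2,}")


def has_consecutive_operators(expr):
    return CONSECUTIVE_OPS.search(expr) is not None


for sample in ["6+(69-9)+3", "6++8-(52--*3)", "5+ - * 123"]:
    print(sample, has_consecutive_operators(sample))
```

Note that this also flags input such as 2*-3, so it only fits if unary minus is not allowed directly after an operator.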

Well, you will have to define some rules if possible. It's not possible to completely parse mathematical language with a regex, but given some leniency it may work.
The problem is that often the way we write math can be interpreted as an error, but it's really not. For instance:
5--3 can be 5-(-3)
So in this case, you have two choices:
Ensure that the input is parenthesized well enough that no two operators meet
If you find something like --, treat it as a special case and investigate it further
If the formulas are in fact in your favor (well parenthesized), then you can just check for repeated operators. For instance:
--
+-
+*
-+
etc.
If you have a match, it means you have a poorly formatted equation and you can throw it out (or whatever you want to do).
You can check for this, using the following regex. You can add more constraints to the [..][..]. I'm giving you the basics here:
[+\-\*\\/][+\-\*\\/]
which will work for the following examples (and more):
6++8-(52--*3)
6+\8-(52--*3)
6+/8-(52--*3)
An alternative, probably a better one, is just to write a parser. It will process the equation step by step to check its validity. A well-written parser will be 100% accurate, whereas a regex approach leaves you with a lot of constraints.

There is no real way to do this with a regex because mathematical expressions inherently aren't regular. Heck, even balancing parens isn't regular. Typically this will be done with a parser.
A basic approach to writing a recursive-descent parser (IMO the most basic parser to write) is:
Write a grammar for a mathematical expression. (These can be found online)
Tokenize the input into lexemes. (This will be done with a regex, typically).
Match the expressions based on the next lexeme you see.
Recurse based on your grammar
A quick Google search can provide many example recursive-descent parsers written in C++.
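The four steps above can be sketched in a few dozen lines. This is a minimal Python validator, not a full calculator, and it assumes a deliberately small grammar (non-negative integers, the four binary operators, parentheses, no unary operators), so something like 5--3 is rejected. All names are illustrative.

```python
import re

# Assumed grammar (no unary operators):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into lexemes, raising ValueError on anything else."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        self.term()
        while self.peek() in ("+", "-"):
            self.take()
            self.term()

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            self.take()
            self.factor()

    def factor(self):
        tok = self.take()
        if tok is None:
            raise ValueError("unexpected end of input")
        if tok.isdigit():
            return
        if tok == "(":
            self.expr()  # recurse into the parenthesized sub-expression
            if self.take() != ")":
                raise ValueError("expected ')'")
            return
        raise ValueError(f"unexpected token {tok!r}")


def is_valid(text):
    """True if the whole input is one expression of the assumed grammar."""
    try:
        parser = Parser(tokenize(text))
        parser.expr()
        return parser.peek() is None  # no trailing tokens left over
    except ValueError:
        return False


for sample in ["6+(69-9)+3", "6++8-(52--*3)", "(1+2"]:
    print(sample, is_valid(sample))
```

Because factor() calls expr() for a parenthesized group, balanced parentheses come for free from the recursion; supporting unary minus would mean adding one more alternative to factor.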

Find strings that do not begin with something

This has been gone over but I've not found anything that works consistently... or assist me in learning where I've gone awry.
I have file names that start with 3 or more digits and ^\d{3}(.*) works just fine.
I also have strings that start with the word 'account' and ^ACCOUNT(.*) works just fine for these.
The problem I have is all the other strings that DO NOT meet the two previous criteria. I have been using ^[^\d{3}][^ACCOUNT](.*) but occasionally it fails to catch one.
Any insights would be greatly appreciated.

^[^\d{3}][^ACCOUNT](.*)
That's definitely not what you want. Square brackets create character classes: they match one character from the list of characters in brackets. If you put a ^ then the match is inverted and it matches one character that's not listed. The meaning of ^ inside brackets is completely different from its meaning outside.
In short, [] is not at all what you want. What you can do, if your regex implementation supports it, is use a negative lookahead assertion.
^(?!\d{3}|ACCOUNT)(.*)
This negative lookahead assertion doesn't match anything itself. It merely checks that the next part of the string (.*) does not match either \d{3} or ACCOUNT.
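As a rough check, here is the lookahead pattern next to the original bracket pattern in Python (assuming Python's re as the engine; the file names are invented for illustration).

```python
import re

OTHER = re.compile(r"^(?!\d{3}|ACCOUNT)(.*)")
BROKEN = re.compile(r"^[^\d{3}][^ACCOUNT](.*)")  # the pattern from the question

names = ["123_report.txt", "ACCOUNT_summary", "1st_draft", "xCode.txt", "notes.txt"]

# Names that start with neither three digits nor ACCOUNT.
others = [n for n in names if OTHER.match(n)]
# Of those, the ones the bracket pattern fails to catch.
missed = [n for n in others if not BROKEN.match(n)]

print(others)
print(missed)
```

The bracket version misses "1st_draft" because its first character is a digit, and "xCode.txt" because its second character happens to be one of the letters in ACCOUNT, which is the kind of occasional miss described in the question.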

De Morgan's law says: !(A v B) = !A ^ !B.
Unfortunately, regex itself does not support negating an expression. (You can always rewrite it by hand, but that is sometimes a huge task.)
Instead, look to your programming language, where you can negate values without any trouble:
let the matching function be "match", and suppose you use match("^(?:\d{3}|ACCOUNT)(.*)") to determine whether the string meets either of the two conditions. You can then simply negate the boolean return value of that function, and you'll get every string that does NOT match.
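In Python, for instance, that negation is a one-liner. The helper name is_other is hypothetical, and re.match stands in for the generic "match" function above.

```python
import re

# Positive pattern: the string starts with three digits or with ACCOUNT.
EITHER = re.compile(r"^(?:\d{3}|ACCOUNT)(.*)")


def is_other(name):
    # Negate the match result in the host language, not inside the regex.
    return not EITHER.match(name)


for name in ["123_report.txt", "ACCOUNT_summary", "12_short", "notes.txt"]:
    print(name, is_other(name))
```

This keeps the regex itself purely positive, which can be easier to read and works even in engines without lookahead support.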

regex matching pair of brackets

I'm trying to write a Sublime Text 2 syntax highlighter for Simulink's Target Language Compiler (TLC) files. This is a scripting language for auto-generating code. In TLC, the syntax to expand the contents of a token (similar to dereferencing a pointer in C or C++) is
%<token>
The regular expression I wrote to match this is
%<.+?>
This works for most cases, but fails for the following statement
%<LibAddToCommonIncludes("<string.h>")>
Modifying the regular expression to greedy fixes this if the statement is by itself on a line, but fails in several other cases. So that is not an option.
For that line, the highlighting stops at the first > instead of the second. How can I modify the regular expression to handle this case?
It'd be great if there was a general expression that could handle any number of nested <> pairs; for example
%<...<...>...<...<...>...>...>
where the dots are optional characters. The entire expression above should be a single match.

A generic solution using regular expressions alone is difficult, as explained very well in this thread.
You can try to specifically match 2 < characters through a regex. Something like %<.+?<.+?>.+?>.
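If the matching can be done in code rather than in a single pattern, arbitrary nesting is easy to handle with a depth counter. The following is a minimal Python sketch of that idea, not something a Sublime Text syntax definition can use directly. It assumes every < inside an expansion has a matching >, including those inside string literals; the function name is illustrative.

```python
def find_expansions(text):
    """Return every %<...> span in text, treating nested <> pairs as one match."""
    spans = []
    search_from = 0
    while True:
        start = text.find("%<", search_from)
        if start == -1:
            break
        depth = 0
        end = None
        # Scan from the '<' that follows '%', tracking nesting depth.
        for j in range(start + 1, len(text)):
            if text[j] == "<":
                depth += 1
            elif text[j] == ">":
                depth -= 1
                if depth == 0:
                    end = j
                    break
        if end is None:
            break  # unbalanced: give up rather than guess
        spans.append(text[start:end + 1])
        search_from = end + 1
    return spans


sample = 'x = %<LibAddToCommonIncludes("<string.h>")>; y = %<token>'
print(find_expansions(sample))
```

On the sample line this returns the whole LibAddToCommonIncludes expansion as one span, followed by the plain %<token>.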
