RegEx: Match only = and not first char in == - regex

I'm new to regex and I need an expression that matches on =, but not on ==.
So for example:
[x] == [y] // No match
[x] = [y] // Match
All my self-made regular expressions get a match on the first = in ==. I don't want that. I just want a match if the = is the only operator in the expression.
I'm working with delphi regular expressions.

Use negative lookaround:
(?<!=)=(?!=)
This matches an equals sign only if it is neither preceded nor followed by another equals sign.
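As a quick check of the lookaround pattern, here is a minimal sketch in Python rather than Delphi (a swap made only for illustration; Python's re module accepts the same (?<!...) and (?!...) syntax). The helper name lone_equals_positions is hypothetical.

```python
import re

# Matches '=' only when it is neither preceded nor followed by another '='.
LONE_EQUALS = re.compile(r"(?<!=)=(?!=)")

def lone_equals_positions(text):
    """Return the index of every stand-alone '=' in text."""
    return [m.start() for m in LONE_EQUALS.finditer(text)]

print(lone_equals_positions("[x] == [y]"))  # []
print(lone_equals_positions("[x] = [y]"))   # [4]
```

Because both lookarounds are zero-width, the match is the = itself and nothing around it.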

You have to check that neither the predecessor nor the successor is =:
[^=]=[^=]
Have a look at this example. Here is a little interactive tutorial which covers the important cases.
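One practical difference between the character-class pattern and a lookaround pattern is what gets consumed. The sketch below uses Python for illustration (not Delphi), and matched_text is a hypothetical helper: [^=]=[^=] swallows one character on each side and therefore needs a character to exist on both sides.

```python
import re

CHAR_CLASS = re.compile(r"[^=]=[^=]")     # consumes one character on each side
LOOKAROUND = re.compile(r"(?<!=)=(?!=)")  # consumes only the '='

def matched_text(pattern, text):
    """Return the matched substring, or None when there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(matched_text(CHAR_CLASS, "[x] = [y]"))   # ' = '
print(matched_text(CHAR_CLASS, "[x] == [y]"))  # None
print(matched_text(CHAR_CLASS, "x="))          # None, nothing follows the '='
print(matched_text(LOOKAROUND, "x="))          # '='
```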

Adapting this answer should do the trick:
(?:[^=]+(=)[^=]+)
Explanation:
(?: // Do not capture group
[^=]+ // Match 1 or more occurrences of character other than [=]
(=) // Match and capture a `=`
[^=]+ // Match 1 or more occurrences of character other than [=]
) // End of group

Related

Regular Expression Nucleotide Search

I am trying to find a regular expression that will allow me to know if there is a dinucleotide(Two letters) that appears 2 times in a row in my sequence. I give you an example:
Let's suppose I have this sequence (The character ; is to make clear that I am talking about dinucleotides):
"AT;GC;TA;CC;AG;AG;CC;CA;TA;TA"
The result I expect is that it matches the pattern AGAG and TATA.
I have tried this already but it fails because it gives me any pair of dinucleotides, not the same pair :
([ATGC]{2}){2}
You will need to use backreferences.
Start with matching one pair:
[ATGC]{2}
will match any pair of two of the four letters.
You need to put that in capturing parentheses and refer to the contents of the parentheses with \1, like so:
([ATGC]{2});\1
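A minimal sketch of the backreference pattern in Python (chosen only for illustration), assuming the ; separators are really present in the string as they are in the pattern above:

```python
import re

# Group 1 captures one dinucleotide; \1 requires the same two letters again
# right after the ';' separator.
REPEATED_PAIR = re.compile(r"([ATGC]{2});\1")

sequence = "AT;GC;TA;CC;AG;AG;CC;CA;TA;TA"
repeats = [m.group(0) for m in REPEATED_PAIR.finditer(sequence)]
print(repeats)  # ['AG;AG', 'TA;TA']
```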
Suppose the string were
"TA;TA;GC;TA;CC;AG;AG;CC;CA;TA;TA"
^^ ^^ ^^ ^^ ^^ ^^
If you wish to match "TA" twice (and "AG" once) you could apply @Andy's solution.
If you wish to match "TA" just once, no matter the number of instances of "TA;TA" in the string, you could match
([ATGC]{2});\1(?!.*\1;\1)
and retrieve the contents of capture group 1.
Demo
The expression can be broken down as follows.
([ATGC]{2}) # match two characters, each from the character class,
# and save to capture group 1
;\1 # match ';' followed by the content of capture group 1
(?! # begin a negative lookahead
.* # match zero or more characters
\1;\1 # match the content of capture group 1 followed by ';'
# followed by the content of capture group 1
) # end negative lookahead
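The effect of the negative lookahead can be seen by running both patterns on the same string. This is a sketch in Python (chosen only for illustration); the constant names are hypothetical.

```python
import re

ALL_REPEATS = re.compile(r"([ATGC]{2});\1")
LAST_REPEAT_ONLY = re.compile(r"([ATGC]{2});\1(?!.*\1;\1)")

sequence = "TA;TA;GC;TA;CC;AG;AG;CC;CA;TA;TA"

# findall returns the contents of capture group 1 for each match.
every_repeat = ALL_REPEATS.findall(sequence)
unique_repeats = LAST_REPEAT_ONLY.findall(sequence)

print(every_repeat)    # ['TA', 'AG', 'TA']
print(unique_repeats)  # ['AG', 'TA']
```

The lookahead rejects the first "TA;TA" because another "TA;TA" occurs later, so each repeated pair is reported once, at its last occurrence.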

Regular Expression to match first word with a character in each line

I am trying to write a regex that finds the first word in each line that contains the character a.
For a string like:
The cat ate the dog
and the mouse
The expression should find cat and
So far, I have:
/\b\w*a\w*\b/g
However this will return every match in each line, not just the first match (cat ate and).
What is the easiest way to only return the first occurrence?
Assuming you are only looking for words without numbers and underscores (\w would include those), I'd suggest:
(?i)^.*?(?<!\S)([b-z]*a[a-z]*)(?!\S)
And use whatever is in the 1st capture group. See an online demo. Or, if supported:
(?i)^.*?\K(?<!\S)[b-z]*a[a-z]*(?!\S)
See an online demo.
Please note that I used lookarounds to assert that the word is surrounded only by whitespace (or by the start or end of the line). You may also use word boundaries if you please and swap those lookarounds for \b. Also, depending on your application, you can probably replace the inline case-insensitive switch with a flag. For example, if you happen to use JavaScript, /^.*?(?<!\S)([b-z]*a[a-z]*)(?!\S)/gmi should probably be your option. See for example:
var myString = "The cat ate the dog\nand the mouse";
// Use a regex literal: inside a string literal "\S" would lose its backslash.
var myRegexp = /^.*?(?<!\S)([b-z]*a[a-z]*)(?!\S)/gmi;
var m = myRegexp.exec(myString);
while (m != null) {
  console.log(m[1]); // "cat", then "and"
  m = myRegexp.exec(myString);
}
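The same capture-group approach can be sketched in Python (used here only for illustration), with the case-insensitive and multiline behaviour passed as flags instead of the inline (?i) switch. The name FIRST_A_WORD is hypothetical.

```python
import re

FIRST_A_WORD = re.compile(
    r"^.*?(?<!\S)([b-z]*a[a-z]*)(?!\S)",
    re.IGNORECASE | re.MULTILINE,
)

text = "The cat ate the dog\nand the mouse"

# findall returns capture group 1 for each line that has a match.
first_words = FIRST_A_WORD.findall(text)
print(first_words)  # ['cat', 'and']
```

Since . does not match a newline, the lazy .*? can never run into the next line, so a line without an "a" simply yields no match.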
If you want to match a word using \w, you might also use a negated character class matching any character except a or a newline.
Then match a word that contains at least one a, using a word boundary \b:
^[^a\n\r]*\b([^\Wa]*a\w*)
The pattern matches:
^ Start of string (start of each line with the multiline flag)
[^a\n\r]*\b Optionally match any character except a or a newline
( Capture group 1
[^\Wa]*a\w* Optionally match a word character without a, then match a and optional word characters
) Close group 1
Regex demo
Using whitespace boundaries on the left and right:
^[^a\n\r]*(?<!\S)([^\Wa]*a\w*)(?!\S)
Regex demo
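Both variants can be compared side by side. The sketch below uses Python for illustration and assumes the multiline flag, so that ^ anchors at every line; the constant names are hypothetical.

```python
import re

# Skip everything before the first 'a' on the line, then back up to the
# start of the word that contains it.
WORD_BOUNDARY = re.compile(r"^[^a\n\r]*\b([^\Wa]*a\w*)", re.MULTILINE)
WHITESPACE_BOUNDARY = re.compile(
    r"^[^a\n\r]*(?<!\S)([^\Wa]*a\w*)(?!\S)", re.MULTILINE
)

text = "The cat ate the dog\nand the mouse"
print(WORD_BOUNDARY.findall(text))        # ['cat', 'and']
print(WHITESPACE_BOUNDARY.findall(text))  # ['cat', 'and']
```

The two differ on tokens that contain punctuation: for "x-ray vision" the \b version reports "ray", while the whitespace version finds nothing because "x-ray" is not made of word characters only.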
The text could be matched with the regular expression
(?=(\b[a-z]*a[a-z]*\b)).*\r?\n
with the multiline and case-insensitive flags set. For each match capture group 1 contains the first word (consisting only of letters) in a line that contains an "a". There are no matches in lines that do not contain an "a". Because the pattern ends with \r?\n, a final line without a line terminator is not matched either.
Demo
The expression can be broken down as follows.
(?= # begin a positive lookahead
\b # match a word boundary
([a-z]*a[a-z]*) # match a word containing an "a" and save to
# capture group 1
)
.*\r?\n # match the remainder of the line including the
# line terminator

Regular expression: matching only if the latter sequence is not a specific letter

I am trying to match the strings in the form of "${234}" but the ones that don't have a "=" character at the right side of it.
For example:
v1 = 345 + ${234};
Here ${234} should match. I can do this with \${([0-9]+)}
But the following shouldn't match:
${234} = 345 + v5;
Because there is a "=" at the right of the "${234}"sequence.
I know that there are some expressions to match for "sequences ending with". But as you see, it is a bit different here.
Is it possible to match the above sequence with regexp?
You can use a negative lookahead (?!...):
\${([0-9]+)}(?!.*=)
This will only match if there isn't an = at some point after the ${...}.
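A minimal sketch of this lookahead in Python (chosen only for illustration), with $, { and } escaped so they are read as literals. The helper name placeholders is hypothetical.

```python
import re

# '$', '{' and '}' are escaped so they are treated as literal characters.
NOT_FOLLOWED_BY_EQUALS = re.compile(r"\$\{([0-9]+)\}(?!.*=)")

def placeholders(line):
    """Return the numbers of the ${...} tokens that have no '=' after them."""
    return NOT_FOLLOWED_BY_EQUALS.findall(line)

print(placeholders("v1 = 345 + ${234};"))  # ['234']
print(placeholders("${234} = 345 + v5;"))  # []
```

The lookahead scans the rest of the line, so any later =, including one that belongs to ==, rejects the token.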
You could use
(?:(\$\{\d\d\d\}) ?[^=])
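This is essentially a literal $ followed by a literal { followed by 3 digits followed by a literal } followed by an optional space and then one character that is not =. Be aware that, because the space is optional, [^=] can match the space itself, so ${234} = 345 still matches; use this only when no space separates } from =.
For example, only the ${234} will be captured as a group.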
test here: https://regexr.com/3jgqk

How do I match the contents of parenthesis in a scala regular expression

I'm trying to get at the contents of a string like this (2.2,3.4) with a scala regular expression to obtain a string like the following 2.2,3.4
This will get me the string with parenthesis and all from a line of other text:
"""\(.*?\)"""
But I can't seem to find a way to get just the contents of the parenthesis.
I've tried: """\((.*?)\)""" """((.*?))""" and some other combinations, without luck.
I've used this one in the past in other Java apps: \\((.*?)\\), which is why I thought the first attempt in the line above """\((.*?)\)""" would work.
For my purposes, this looks something like:
var points = "pointA: (2.12, -3.48), pointB: (2.12, -3.48)"
var parenth_contents = """\((.*?)\)""".r;
val center = parenth_contents.findAllIn(points(0));
var cxy = center.next();
val cx = cxy.split(",")(0).toDouble;
Use Lookahead and Lookbehind
You can use this regex:
(?<=\()\d+\.\d+,\d+\.\d+(?=\))
Or, if you don't need precision inside the parentheses:
(?<=\()[^)]+(?=\))
See demo 1 and demo 2
Explanation
The lookbehind (?<=\() asserts that what precedes is a (
\d+\.\d+,\d+\.\d+ matches the string
or, in Option 2, [^)]+ matches any chars that are not a closing parenthesis
The lookahead (?=\)) asserts that what follows is a )
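To illustrate the two lookaround options, here is a sketch in Python rather than Scala (a swap made only for illustration; the regex syntax is the same). Note that the stricter pattern does not allow the space or the minus sign that appear in the question's sample data.

```python
import re

STRICT = re.compile(r"(?<=\()\d+\.\d+,\d+\.\d+(?=\))")
INSIDE_PARENS = re.compile(r"(?<=\()[^)]+(?=\))")

points = "pointA: (2.12, -3.48), pointB: (2.12, -3.48)"

print(STRICT.findall("(2.2,3.4)"))    # ['2.2,3.4']
print(STRICT.findall(points))         # []
print(INSIDE_PARENS.findall(points))  # ['2.12, -3.48', '2.12, -3.48']
```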
Reference
Lookahead and Lookbehind Zero-Length Assertions
Mastering Lookahead and Lookbehind
Maybe try this out:
val parenth_contents = "\\(([^)]+)\\)".r
parenth_contents: scala.util.matching.Regex = \(([^)]+)\)
val parenth_contents(r) = "(123, abc)"
r: String = 123, abc
An even simpler regex that matches every occurrence of the parentheses together with the content inside them:
(\([^)]+\)+)
1st Capturing Group (\([^)]+\)+)
\( matches the character ( literally
[^)]+ matches a single character not present in the list: )
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\)+ matches the character ) literally
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
https://regex101.com/r/MMNRRo/1
\((.*?)\) works - you just need to extract the matched group. The easiest way to do that is to use the unapplySeq method of scala.util.matching.Regex:
scala> val wrapped = raw"\((.*?)\)".r
wrapped: scala.util.matching.Regex = \((.*?)\)
val wrapped(r) = "(123,abc)"
r: String = 123,abc
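The same idea, reading capture group 1 instead of the whole match, can be sketched in Python (used only for illustration, not Scala) on the sample data from the question:

```python
import re

PARENS = re.compile(r"\((.*?)\)")

points = "pointA: (2.12, -3.48), pointB: (2.12, -3.48)"

first = PARENS.search(points)
inner = first.group(1)  # text between the parentheses, without them
cx, cy = (float(part) for part in inner.split(","))

print(inner)   # 2.12, -3.48
print(cx, cy)  # 2.12 -3.48
```

group(0) would still include the parentheses; only group(1) holds the contents.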

Regex not operator

Is there a NOT operator in regexes?
Like in this string: "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)"
I want to delete all \([0-9a-zA-z _\.\-:]*\) but not the one where it is a year: (2001).
So what the regex should return must be: (2001) name.
NOTE: something like \((?![\d]){4}[0-9a-zA-z _\.\-:]*\) does not work for me (the (20019) somehow also matches...)
Not quite, but you can usually work around it with one of these forms:
[^abc], which is character by character not a or b or c,
or negative lookahead: a(?!b), which is a not followed by b
or negative lookbehind: (?<!a)b, which is b not preceded by a
No, there's no direct not operator. At least not the way you hope for.
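You can use a zero-width negative lookahead, however:
\((?!2001)[0-9a-zA-z _\.\-:]*\)
The (?!...) part means "only match if the text following (hence: lookahead) this doesn't (hence: negative) match this". But it doesn't actually consume the characters it matches (hence: zero-width). Note that (?!2001) also protects (20019), because that group starts with 2001 too; to protect any four-digit year and nothing longer, use (?!\d{4}\)) instead.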
There are actually 4 combinations of lookarounds with 2 axes:
lookbehind / lookahead : specifies if the characters before or after the point are considered
positive / negative : specifies if the characters must match or must not match.
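As a worked example of a negative lookahead on the string from the question, here is a sketch in Python (chosen only for illustration). It makes two assumptions that are not in the answers above: the lookahead is generalized to (?!\d{4}\)) so that any four-digit year is kept, and the character class uses A-Z instead of A-z.

```python
import re

subject = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)"

# Delete a parenthesized group unless it is exactly four digits.
NOT_A_YEAR = re.compile(r"\((?!\d{4}\))[0-9a-zA-Z _.\-:]*\)")

stripped = NOT_A_YEAR.sub("", subject)
result = " ".join(stripped.split())  # collapse the leftover spaces
print(result)  # (2001) name
```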
You could capture the (2001) part and replace the rest with nothing.
public static String extractYearString(String input) {
    // Backslashes are doubled because this is a Java string literal.
    return input.replaceAll(".*\\(([0-9]{4})\\).*", "$1");
}
String subject = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)";
String result = extractYearString(subject);
System.out.println(result); // <-- "2001"
.*\(([0-9]{4})\).* means
.* match anything
\( match a ( character
( begin capture
[0-9]{4} any single digit four times
) end capture
\) match a ) character
.* anything (rest of string)
Here is an alternative:
(\(\d{4}\))((?:\s*\([0-9a-zA-z _\.\-:]*\))*)([^()]*)(( ?\([0-9a-zA-z _\.\-:]*\))*)
Repetitive patterns are wrapped in a single group with this construction, where the inner group is a non-capturing one: ((?:pattern)*). This keeps control over the numbers of the groups of interest.
Then you get what you want with: \1\3
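The \1\3 replacement can be tried out as follows. This is a sketch in Python (chosen only for illustration) with two assumptions: the inner group of the last part is written as non-capturing, which leaves groups 1 to 4 numbered as before, and A-z is written as A-Z.

```python
import re

subject = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)"

PATTERN = re.compile(
    r"(\(\d{4}\))"                       # group 1: the year
    r"((?:\s*\([0-9a-zA-Z _.\-:]*\))*)"  # group 2: groups right after it
    r"([^()]*)"                          # group 3: text outside parentheses
    r"((?: ?\([0-9a-zA-Z _.\-:]*\))*)"   # group 4: trailing groups
)

# Keep only the year and the free text, then trim the trailing space.
result = PATTERN.sub(r"\1\3", subject).strip()
print(result)  # (2001) name
```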