Is this possible with one regex?

I have a string like
{! texthere }
I want to capture everything from {! up to either the first } or the end of the string. So if I had
{!text here} {!text here again} {!more text here. Oh boy!
I would want ["{!text here}", "{!text here again}", "{!more text here. Oh boy!"]
I thought this would work
{!.*}??
but the above string would come out to be ["{!text here} {!text here again} {!more text here. Oh boy!"]
I'm still very inexperienced with regexes, so I don't understand why this doesn't work. I would think it would match '{!' followed by any number of characters (non-greedy) until it reaches a closing brace, which may not be there.

Using positive lookbehind (?<={!)[^}]+:
In [8]: import re
In [9]: str="{!text here} {!text here again} {!more text here. Oh boy!"
In [10]: re.findall('(?<={!)[^}]+',str)
Out[10]: ['text here', 'text here again', 'more text here. Oh boy!']
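This is a positive lookbehind: a run of non-} characters is matched only where it is preceded by {!. The lookbehind itself is not part of the match, so the results exclude the {! prefix (and the closing }), unlike the output asked for in the question.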

You can do it this way:
({![^}]+}?)
Then recover capture group $1, which corresponds to the first set of parentheses.
With this approach you need a "match all" style function (for example findall in Python or preg_match_all in PHP), because the regex itself matches only one tag at a time.
This approach doesn't use any lookaround. Also, the negated class [^}] should reduce the work the regex engine does, since it stops at the next } instead of matching the rest of the expression and then backtracking.
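A minimal Python sketch of this approach (Python is used only for illustration): re.finditer serves as the "match all" function, and m.group(1) recovers the first capture group. The braces are escaped, which Python does not strictly require here but which keeps the pattern portable.

```python
import re

text = "{!text here} {!text here again} {!more text here. Oh boy!"

# {! then one or more non-} characters, then an optional closing }
pattern = re.compile(r"(\{![^}]+\}?)")

# finditer walks over every match; group(1) is the first parenthesised group
tags = [m.group(1) for m in pattern.finditer(text)]
print(tags)  # ['{!text here}', '{!text here again}', '{!more text here. Oh boy!']
```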

I believe you want to use a reluctant quantifier:
{!.*?}?
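This makes the . stop at the first following } instead of the last one, but only if the } is required, as in {!.*?}. With the optional }? the lazy .*? is satisfied by zero characters, so each match is just {!; the negated-class version below avoids this.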
I had a question about greedy and reluctant quantifiers that has a good answer here.
Another option would be to specify the characters that are allowed to come between the two curly braces like so:
{![^}]*}?
This specifies that no closing curly brace can be matched between {! and the optional final }.
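A small Python comparison of the patterns discussed in this thread, run on the question's sample string. The "} or end" variant, (?:\}|$), is an extra alternative added here for illustration; it is not one of the answers above.

```python
import re

text = "{!text here} {!text here again} {!more text here. Oh boy!"

patterns = {
    "greedy (question)": r"\{!.*\}??",     # .* runs to the end, }?? then matches nothing
    "lazy, optional }": r"\{!.*?\}?",      # .*? is happy with zero characters
    "lazy, } or end": r"\{!.*?(?:\}|$)",   # lazy, but must stop at } or end of string
    "negated class": r"\{![^}]*\}?",       # can never run past a }
}

results = {name: re.findall(p, text) for name, p in patterns.items()}
for name, found in results.items():
    print(name, found)
```

Only the last two produce the three tags the question asks for; the lazy pattern with an optional } returns just '{!' three times.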

If your tool or language supports Perl-style regexes (with lookbehind), try this:
(?<={!)[^}]*

Related

Regex: Exact match string ending with specific character

I'm using Java. I have a comma-separated list of strings in this form:
aa,aab,aac
aab,aa,aac
aab,aac,aa
I want to use a regex to remove aa together with its trailing comma (or, if aa is the last string in the list, the comma before it). I need to end up with the following result in all 3 cases:
aab,aac
Currently I am using the following pattern:
"aa[,]?"
However it is returning:
b,c
If lookarounds are available, you can write:
,aa(?![^,])|(?<![^,])aa,
with an empty string as replacement.
Otherwise, with a POSIX ERE syntax you can do it with a capture:
^(aa(,|$))+|(,aa)+(,|$)
with the 4th group as replacement (so $4 or \4)
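A quick check of both patterns with Python's re.sub on the three sample lines (Python is used only for illustration). Python's re is not a POSIX ERE engine, but it accepts the second pattern as written, and since Python 3.5 an unmatched group in the replacement, such as group 4 when the first alternative matched, is substituted as an empty string.

```python
import re

lines = ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa"]

# ",aa" not followed by a non-comma, or "aa," not preceded by a non-comma
LOOKAROUND = r",aa(?![^,])|(?<![^,])aa,"
# Replace the whole match with group 4 (the trailing comma, if there is one)
CAPTURE = r"^(aa(,|$))+|(,aa)+(,|$)"

with_lookaround = [re.sub(LOOKAROUND, "", s) for s in lines]
with_capture = [re.sub(CAPTURE, r"\4", s) for s in lines]
print(with_lookaround)  # ['aab,aac', 'aab,aac', 'aab,aac']
print(with_capture)     # ['aab,aac', 'aab,aac', 'aab,aac']
```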
Without knowing your regex flavor, I propose this solution, assuming it supports \b.
I use Perl as the demo environment and replace with "_" so the effect is visible.
perl -pe "s/\baa,|,aa\b/_/"
\b is the word-boundary anchor, i.e. it matches at the start or end of anything that looks like a word. That lets it handle line end, line start, a blank, and a comma.
Using it, two alternatives suffice to cover all the cases in your sample input.
Output (input lines interleaved with output lines; the input was run twice, once with lines ending in a newline and once with lines ending in a blank):
aa,aab,aac
_aab,aac
aab,aa,aac
aab_,aac
aab,aac,aa
aab,aac_
aa,aab,aac
_aab,aac
aab,aa,aac
aab_,aac
aab,aac,aa
aab,aac_
If your regex engine doesn't support \b, then please state which one you are using, i.e. which tool (e.g. perl, awk, notepad++, sed, ...). In that case it might also be necessary to replace rather than delete, i.e. to fine-tune "," or "" as the replacement. To support that, please show the context of your regex, i.e. the replacement mechanism you are using. If you are deleting, please switch to replacing first.
(I picked up a suggestion from a comment by gisek that the capturing groups are not needed. I usually use () generously, in other syntaxes too; in my opinion, not having to think about or look up evaluation order saves time and reduces risk overall. But after testing, I now use this terser, more elegant form.)
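The same \b idea as a Python sketch, deleting instead of writing "_". Perl's s/// without the g flag replaces only the first match on each line, so count=1 is passed to mirror that; each sample line contains only one aa item anyway.

```python
import re

lines = ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa"]

# \b (word boundary) makes sure "aa" is a whole item, not part of "aab", "aac" or "baa"
WORD_BOUNDARY = r"\baa,|,aa\b"

cleaned = [re.sub(WORD_BOUNDARY, "", s, count=1) for s in lines]
print(cleaned)  # ['aab,aac', 'aab,aac', 'aab,aac']
```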
If your regex engine supports positive lookaheads and positive lookbehinds, this should work:
,aa(?=,)|(?<=,)aa,|(,|^)aa(,|$)
You could probably use the following and replace matches with nothing:
(aa,|,aa$)
aa, matches aa at the beginning or in the middle of the string
,aa$ matches it at the end of the string
Since you want to delete ,aa when it is followed by a comma or the end of the line (or aa, at the start of the line), this should do the trick: ,aa(?=,|$)|^aa,
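A small Python harness (illustrative only) that runs the last three suggestions against the sample lines. The fourth line, baa,aab, is hypothetical and added only to show that the unanchored aa, alternative also removes the tail of a longer item.

```python
import re

samples = ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa", "baa,aab"]  # last line is hypothetical

patterns = [
    r",aa(?=,)|(?<=,)aa,|(,|^)aa(,|$)",
    r"(aa,|,aa$)",
    r",aa(?=,|$)|^aa,",
]

# Delete every match, as the answers suggest
results = {p: [re.sub(p, "", s) for s in samples] for p in patterns}
for p, out in results.items():
    print(p, out)
```

All three fix the question's lines; only the second turns baa,aab into baab.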

regex to select only the zipcode

,Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,
I want to select only 13732 from the line. I came up with this regex:
(\d)(\s*\d+)*(\,y,,)
But it's also selecting the ,y,, part. If I remove that part from the regex, it also matches the date. Please help me with this.
Generally, if you want to require something without including it in the match, use a zero-length lookaround (lookahead or lookbehind). In your case, you can use a lookahead:
(\d)(\s*\d+)*(?=\,y,,)
The syntax (?=<stuff>) means "followed by <stuff>, without consuming it".
More information on lookarounds can be found in this tutorial.
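A quick Python check of the lookahead version on the question's line (the backslash before the comma is unnecessary, so it is dropped here):

```python
import re

line = ",Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,"

# (?=,y,,) requires ",y,," to follow, but it is not part of the match
match = re.search(r"(\d)(\s*\d+)*(?=,y,,)", line)
print(match.group(0))  # 13732
```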
Regex: \D*(\d{5})\D*
Explanation: match 5 digits surrounded by zero or more non-digits on both sides, then extract the group containing the match.
Here's the code in Python 3:
import re
string = ",Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,"
# raw string, so \D and \d reach the regex engine unchanged
search = re.search(r"\D*(\d{5})\D*", string)
print(search.group(1))
Output:
13732
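One caveat, shown as a small Python sketch: because \D* can match zero characters, it does not stop (\d{5}) from matching the first five digits of a longer number. It works on the question's line because 13732 is the only run of five or more digits. If that can't be guaranteed, a swapped-in alternative is to require exactly five digits with lookarounds. The function names and the second input line are hypothetical, made up for the comparison.

```python
import re

def zip_loose(text):
    m = re.search(r"\D*(\d{5})\D*", text)
    return m.group(1) if m else None

def zip_strict(text):
    # (?<!\d) and (?!\d): no digit immediately before or after the five digits
    m = re.search(r"(?<!\d)\d{5}(?!\d)", text)
    return m.group(0) if m else None

original = ",Ray Balwierczak,4/11/2017,,895 Forest Hill Rd,Apalachin,NY,13732,y,,"
hypothetical = "phone 6075551234, zip 13732"

print(zip_loose(original), zip_strict(original))          # 13732 13732
print(zip_loose(hypothetical), zip_strict(hypothetical))  # 60755 13732
```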

Get all matches for a certain pattern using RegEx

I am not really a RegEx expert and hence asking a simple question.
I have a few parameters that I need to use, which follow a particular pattern.
For example:
$$DATA_START_TIME
$$DATA_END_TIME
$$MIN_POID_ID_DLAY
$$MAX_POID_ID_DLAY
$$MIN_POID_ID_RELTM
$$MAX_POID_ID_RELTM
And these will be replaced at runtime in a string with their values (a SQL statement).
For example, I have a simple query:
select * from asdf where asdf.starttime = $$DATA_START_TIME and asdf.endtime = $$DATA_END_TIME
Now when I try to use the RegEx pattern
\$\$[^\W+]\w+$
I do not get all the matches (I get only the last match).
I am trying to test my usage here https://regex101.com/r/xR9dG0/2
If someone could correct my mistake, I would really appreciate it.
Thanks!
This will do the job:
\$\$\w+/g
Some clarification on why your regex does what it does:
\$\$[^\W+]\w+$
An unescaped $ means end of string, so your pattern only matches something at the very end of the string; that's why you get only the last match.
The character class [^\W+] doesn't really make sense. A class starting with [^...] negates the characters inside it, \W matches any non-word character, and + inside a class is just a literal +. So you are saying "match one character that is not a non-word character and not a + sign" (effectively the same as \w), which I guess is not what you wanted.
To match the rest of the name, \w+ alone will do. The global modifier /g makes sure the search doesn't stop at the first match.
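In Python, for example, there is no /g modifier: re.findall already returns every match. A quick comparison of the question's pattern and the corrected one on the sample query:

```python
import re

query = ("select * from asdf where asdf.starttime = $$DATA_START_TIME "
         "and asdf.endtime = $$DATA_END_TIME")

# The trailing $ anchors the match to the end of the string
anchored = re.findall(r"\$\$[^\W+]\w+$", query)
# Without the anchor, every $$NAME is found
unanchored = re.findall(r"\$\$\w+", query)
print(anchored)    # ['$$DATA_END_TIME']
print(unanchored)  # ['$$DATA_START_TIME', '$$DATA_END_TIME']
```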
Based on what you said you want to match, this should work. It won't match $$lower_case_strings; if you do want those, add the i flag.
\${2}[A-Z_]+/g
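A quick Python check of this pattern, where the i flag becomes re.IGNORECASE; the second parameter in the sample string is hypothetical. Note that [A-Z_] contains no digits, so a name such as $$PARAM1 would be cut short at $$PARAM.

```python
import re

text = "x = $$DATA_START_TIME and y = $$lower_case_param"

upper_only = re.findall(r"\${2}[A-Z_]+", text)
any_case = re.findall(r"\${2}[A-Z_]+", text, re.IGNORECASE)
print(upper_only)  # ['$$DATA_START_TIME']
print(any_case)    # ['$$DATA_START_TIME', '$$lower_case_param']
```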

Vim/Perl Regex Tag Match Problem

I have data that looks like this:
[Shift]);[Ctrl][Ctrl+S][Left mouse-click][Backspace][Ctrl]
I want to find all [.*] tags that have the word mouse in them. Keeping in mind non-greedy specifiers, I tried this in Vim: \[.\{-}mouse.\{-}\], but this yielded this result,
[Shift]);[Ctrl][Ctrl+S][Left mouse-click]
Rather than just the desired,
[Left mouse-click]
Any ideas? Ultimately I need this pattern in Perl syntax as well, so if anyone has a solution in Perl that would also be appreciated.
\[[^]]*mouse[^[]*\]
That is, match a literal opening bracket, then any number of characters that aren't closing brackets, then "mouse," then any number of non-opening-brackets, and finally a literal closing bracket. Should be the same in Perl.
You can use the following regex:
\[[^\]]*mouse.*?\]
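Why the lazy version fails: the engine reports the leftmost match, and non-greedy quantifiers only change where a match ends, not where it starts. The match therefore begins at the first [ and the lazy .*? stretches across the earlier tags to reach mouse. A Python sketch comparing a Perl-style translation of the Vim attempt with the two answers above, on the data line from the question:

```python
import re

data = "[Shift]);[Ctrl][Ctrl+S][Left mouse-click][Backspace][Ctrl]"

# Perl-style translation of the Vim attempt \[.\{-}mouse.\{-}\]
lazy = re.findall(r"\[.*?mouse.*?\]", data)
# Negated classes keep the match inside a single [...] tag
negated = re.findall(r"\[[^\]]*mouse[^\[]*\]", data)
mixed = re.findall(r"\[[^\]]*mouse.*?\]", data)

print(lazy)     # ['[Shift]);[Ctrl][Ctrl+S][Left mouse-click]']
print(negated)  # ['[Left mouse-click]']
print(mixed)    # ['[Left mouse-click]']
```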

Regular Expression to match fractions and not dates

I'm trying to come up with a regular expression that will match a fraction (1/2) but not a date (5/5/2005) within a string. Any help would be great; all I've been able to come up with is (\d+)/(\d+), which finds matches in both strings. Thanks in advance for the help.
Assuming PCRE, use negative lookahead and lookbehind:
(?<![\/\d])(\d+)\/(\d+)(?![\/\d])
A lookahead (a (?=) group) says "match this stuff only if it's followed by this other stuff." The contents of the lookahead aren't part of the match. We negate it (the (?!) group) so that the fraction can't be followed by a slash or a digit; that way, we don't match the start of a date.
The complement of a lookahead is a lookbehind (a (?<=) group), which does the opposite: it matches only if the current position is preceded by this other stuff. Just like the lookahead, we can negate it (the (?<!) group) so that we only match things that are not preceded by something.
Together, they ensure that our fraction has no slash or digit directly before or after it. It places no other arbitrary requirements on the input data. It will match the fraction 2/3 in the string "te2/3xt", unlike most of the other examples provided.
If your regex flavor uses slashes (/.../) to delimit regular expressions, you'll have to escape the slashes in the pattern or use a different delimiter (Perl's m{} would be a good choice here).
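A quick check of this pattern with Python's re, which needs no slash escaping, so \/ is written as /. The first test string is the one used in the PHP answer further down.

```python
import re

# Python needs no escaping for "/", so \/ from the PCRE version is written as /
FRACTION = re.compile(r"(?<![/\d])(\d+)/(\d+)(?![/\d])")

subject = "The match should include 1/2 but not 12/34/56 but 11/23, now that's ok."
in_sentence = FRACTION.findall(subject)
embedded = FRACTION.findall("te2/3xt")
date = FRACTION.findall("5/5/2005")
print(in_sentence)  # [('1', '2'), ('11', '23')]
print(embedded)     # [('2', '3')]
print(date)         # []
```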
Edit: Apparently, none of these regexes work, because the regex engine backtracks and matches fewer digits in order to satisfy the rest of the regex. When I've been working on one regex for this long, I sit back and decide that maybe one giant regex is not the answer, and I write a function that uses a regex and a few other tools to do it for me. You've said you're using Ruby. This works for me:
>> def get_fraction(s)
>>   if s =~ /(\d+)\/(\d+)(\/\d+)?/
>>     if $3 == nil
>>       return $1, $2
>>     end
>>   end
>>   return nil
>> end
=> nil
>> get_fraction("1/2")
=> ["1", "2"]
>> get_fraction("1/2/3")
=> nil
This function returns the two parts of the fraction, but returns nil if it's a date (or if there's no fraction). It fails for "1/2/3 and 4/5" but I don't know if you want (or need) that to pass. In any case, I recommend that, in the future, when you ask on Stack Overflow, "How do I make a regex to match this?" you should step back first and see if you can do it using a regex and a little extra. Regular expressions are a great tool and can do a lot, but they don't always need to be used alone.
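The same "regex plus a little code" idea, sketched in Python as an illustration. find_fractions is a hypothetical name, and this is a variation rather than a port of the Ruby function: it matches every run of slash-separated numbers and keeps only runs with exactly two parts. Because each run is consumed whole, it also handles the "1/2/3 and 4/5" case.

```python
import re

def find_fractions(text):
    """Return (numerator, denominator) pairs, skipping date-like runs such as 1/2/3."""
    fractions = []
    for m in re.finditer(r"\d+(?:/\d+)+", text):
        parts = m.group(0).split("/")
        if len(parts) == 2:  # exactly one slash: a fraction, not a date
            fractions.append((parts[0], parts[1]))
    return fractions

print(find_fractions("1/2"))            # [('1', '2')]
print(find_fractions("1/2/3"))          # []
print(find_fractions("1/2/3 and 4/5"))  # [('4', '5')]
```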
EDIT 2:
I figured out how to solve the problem without resorting to non-regex code, and updated the regex. It should work as expected now, though I haven't tested it. I also went ahead and escaped the /s since you're going to have to do it anyway.
EDIT 3:
I just fixed the bug j_random_hacker pointed out in my lookahead and lookbehind. I continue to see the amount of effort being put into this regex as proof that a pure regex solution was not necessarily the optimal solution to this problem.
Use negative lookahead and lookbehind.
/(?<![\/\d])(?:\d+)\/(?:\d+)(?![\/\d])/
EDIT: I've fixed my answer to guard against the backtracking bug identified by j_random_hacker. As proof, I offer the following quick and dirty PHP script:
<?php
$subject = "The match should include 1/2 but not 12/34/56 but 11/23, now that's ok.";
$matches = array();
preg_match_all('/(?<![\/\d])(?:\d+)\/(?:\d+)(?![\/\d])/', $subject, $matches);
var_dump($matches);
?>
which outputs:
array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(3) "1/2"
    [1]=>
    string(5) "11/23"
  }
}
Lookahead is great if you're using Perl or PCRE, but if it's unavailable in the regex engine you're using, you can use:
(^|[^/\d])(\d+)/(\d+)($|[^/\d])
The 2nd and 3rd captured segments will be the numerator and denominator.
If you do use the above in a Perl regex, remember to escape the slashes, or use a different delimiter, e.g.:
m!(?:^|[^/\d])(\d+)/(\d+)(?:$|[^/\d])!
In this case, you can use (?:...) to avoid saving the uninteresting parenthesised parts.
EDIT 18/12/2009: Chris Lutz noticed a tricky bug caused by backtracking that plagues most of these answers; I believe this is now fixed in mine.
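A Python sketch of the lookaround-free pattern, reading the numerator and denominator from groups 2 and 3. One limitation worth knowing, not mentioned in the answer: the pattern consumes the character after the fraction, so when two fractions are separated by a single character, the (^|[^/\d]) at the start of the next match has nothing left to consume and the second fraction is skipped. The second test string is made up to show this; the lookaround versions do not have this problem.

```python
import re

NO_LOOKAROUND = re.compile(r"(^|[^/\d])(\d+)/(\d+)($|[^/\d])")

def fractions(text):
    # Groups 2 and 3 hold the numerator and denominator
    return [(m.group(2), m.group(3)) for m in NO_LOOKAROUND.finditer(text)]

print(fractions("1/2 but not 12/34/56"))  # [('1', '2')]
print(fractions("1/2 3/4"))               # [('1', '2')]  (3/4 is missed)
```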
If it's line input, you can try:
^(\d+)\/(\d+)$
Otherwise, perhaps use this:
^(\d+)\/(\d+)[^\\]*.
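This handles simple cases: (?<![/]{1})\d+/\d+(?![/]{1}). Note, however, that because the lookarounds exclude only the slash, backtracking lets it match 12/3 inside 12/34/56 (the backtracking bug mentioned above); adding \d to both lookarounds avoids that.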
Depending on the language you're working with, you might try negative lookahead or lookbehind assertions: in Perl, (?!pattern) asserts that /pattern/ can't follow the matched string.
Or, again depending on the language and on anything you know about the context, a word-boundary match (\b in Perl) might be appropriate.