Regex for \begin{?} and \end{?} - regex

I need to match \begin{?} and \end{?} in a string, where ? is any number of alphanumeric or * characters, so it must match, for example, \begin{align} and \end{align*}.
I tried to do it, but I'm not sure what's wrong:
^\\begin{[^}]*}$
Start with \begin{, followed by anything that's not } any number of times, and close with }.
The same goes for \end{?}, but I would like to do it in a single regex if possible.

I think the regex below is what you need.
\\(begin|end){[a-zA-Z0-9*]+}

Your regex:
\\(begin|end){.*?}
The .* grabs anything between the { }, and the ? makes it lazy, so it stops at the first }.

{} are special characters used for expressing repetitions so you need to escape those as well.
^\\begin\{[^}]*\}$
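As a quick check of the combined, escaped pattern, here is a minimal Python sketch. The names ENV_TAG and find_env_tags are made up for illustration, and Python's re module stands in for whatever engine the question actually targets. The anchors ^ and $ are dropped so the tags can be found anywhere in a longer string.

```python
import re

# Escaped braces, and a character class limited to alphanumerics and '*'.
ENV_TAG = re.compile(r"\\(begin|end)\{([A-Za-z0-9*]+)\}")

def find_env_tags(text):
    """Return (kind, name) pairs for each begin/end tag found in text."""
    return [(m.group(1), m.group(2)) for m in ENV_TAG.finditer(text)]

sample = r"\begin{align} x &= 1 \end{align*} \begin{} \end{a b}"
print(find_env_tags(sample))  # [('begin', 'align'), ('end', 'align*')]
```

The empty \begin{} and the name containing a space are skipped because the class requires at least one allowed character and nothing else.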

Related

Regex not working correctly all the time, why?

The regex works in some cases but not in others.
https://regex101.com/r/p5u3N6/1
I expected the regex to match only groups of two "{ } { }" with nothing between the two groups.
I'm guessing that we wish to capture only three of the inputs listed in the demo, using an expression similar to:
(\{.*?\}(.+?){.*?\})
or
(\{(.+?)\}(.+?){(.+?)\})
The .*? in the first part of your pattern passes through the unexpected parts of your input until it finds a }, because . accepts all of those characters. Simply making the quantifier lazy with ? isn't enough: it will still proceed until it finds a match. A negated character class avoids this:
\{[^}]*?\}\s\{[^}]*?\}
https://regex101.com/r/p5u3N6/5
Not sure I understood your requirements. I suppose you only want pairs of {}{} to match, allowing no more than one space between the two. You can try this: \{([^\{]+)\}\ \{([^\}]+)\}.
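To see the negated-class idea in action, here is a small Python sketch. The inputs are invented (the contents of the regex101 demo are not reproduced here), and PAIR and brace_pairs are illustrative names. It uses [^{}] rather than [^}], a slight variation that also stops a group from swallowing an opening brace.

```python
import re

# Two brace groups separated by exactly one whitespace character.
# [^{}] keeps each group from running past its own braces.
PAIR = re.compile(r"\{[^{}]*\}\s\{[^{}]*\}")

def brace_pairs(line):
    """Return every '{...} {...}' pair found in line."""
    return PAIR.findall(line)

print(brace_pairs("{a} {b}"))          # ['{a} {b}']
print(brace_pairs("{a} text {b}"))     # []
print(brace_pairs("x {1} {2} y {3}"))  # ['{1} {2}']
```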

Interesting easy looking Regex

I am rephrasing my question to clear up the confusion.
I want to match if a string has certain letters; for this I use the character class:
[ACD]
and it works perfectly!
But I want to match only if the string has those letters 2 or more times, either the same letter repeated or 2 different letters from the set.
For example:
[AKL] should match:
ABCVL
AAGHF
KKUI
AKL
But the above should not match the following:
ABCD
KHID
LOVE
because those letters are there, but only once!
That's why I was trying to use:
[ACD]{2,}
But it's not working; it's probably not the right regex. Can a regex guru help me solve this puzzle?
Thanks
PS: I will use it in MySQL. A different approach is also welcome, but I'd like to use a regex for a smarter and shorter query.
To ensure that a string contains at least two occurrences of letters from a set (let's say A, K, L as in your example), you can write something like this:
[AKL].*[AKL]
Since the MySQL regex engine is a DFA, there is no need to use a negated character class like [^AKL] in place of the dot to avoid backtracking, or a lazy quantifier, which is not supported at all.
example:
SELECT 'KKUI' REGEXP '[AKL].*[AKL]';
will return 1
You can follow this link that speaks on the particular subject of the LIKE and the REGEXP features in MySQL.
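The same pattern can be checked against the examples from the question outside of MySQL. This is a rough Python sketch, not the MySQL query itself; has_two_from_set is a hypothetical helper name.

```python
import re

def has_two_from_set(text, letters="AKL"):
    """True if text contains at least two characters drawn from letters."""
    cls = "[" + re.escape(letters) + "]"
    # One set character, anything in between, then another set character.
    return re.search(cls + ".*" + cls, text) is not None

for word in ["ABCVL", "AAGHF", "KKUI", "AKL", "ABCD", "KHID", "LOVE"]:
    print(word, has_two_from_set(word))
```

The first four words contain two letters from the set and report True; the last three contain only one and report False.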
If I understood you correctly, this is quite simple:
[A-Z].*?[A-Z]
This looks for something in your set, [A-Z], and then lazily matches characters until it (potentially) comes across the set, [A-Z], again.
As @Enigmadan pointed out, a lazy match is not necessary here: [A-Z].*[A-Z]
The expression you are using searches for runs of two or more consecutive characters from ACDFGHIJKMNOPQRSTUVWXZ.
However, your regex excludes Y (UVWXZ]), so Z cannot be found, because it is not adjacent to another character from your set. The same applies to A, because B ([ACD) is also excluded from your regex. For example, Z and A would match in a string like ZABCDEFGHIJKLMNOPQRSTUVWXYZA.
If those letters were not excluded on purpose, it would probably be better to use a range like [A-Z].
If you want 2 or more matches of [AKL], you can use just [AKL] and check that the number of matches is >= 2.
I am not good at SQL regex, but maybe something like this?
check (dbo.RegexMatch( ['ABCVL'], '[AKL]' ) >= 2)
To put it in plain English: use [AKL] as your regex, and check that the number of matches in the string is at least 2. Here's how I would do it in Java:
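import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns true if the string contains at least two characters from the set.
private boolean search2orMore(String string) {
    Matcher matcher = Pattern.compile("[ACD]").matcher(string);
    int counter = 0;
    // Each successful find() is one more character from the set.
    while (matcher.find()) {
        counter++;
    }
    return counter >= 2;
}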
You can't use [ACD]{2,} because it requires two or more consecutive characters from the set, so it fails when the matching characters are separated by other characters.
your question is not very clear, but here is my trial pattern
\b(\S*[AKL]\S*[AKL]\S*)\b
Pretty sure this should work in any case:
(?<l>[^AKL\n]*[AKL]+[^AKL\n]*[AKL]+[^AKL\n]*)[\n\r]
Replacing AKL with the letters you need can easily be done dynamically; tell me if you need it.
Is this what you are looking for?
".*(.*[AKL].*){2,}.*" (without quotes)
It matches if there are at least two occurrences of your characters, surrounded by anything.
It is a .NET regex, but it should be the same elsewhere.
Edit
Overall, MySQL regular expression support is pretty weak.
If you only need to match your capture group a minimum of two times, then you can simply use:
select * from ... where ... regexp('([ACD].*){2,}') #could be `2,` or just `2`
If you need to match your capture group more than two times, then just change the number:
select * from ... where ... regexp('([ACD].*){3}')
#This number should match the number of matches you need
If you needed a minimum of 7 matches and you were using your previous capture group [ACDF-KM-XZ]
e.g.
select * from ... where ... regexp('([ACDF-KM-XZ].*){7,}')
Response before edit:
Your regex is trying to find at least two consecutive characters from the set [ACDFGHIJKMNOPQRSTUVWXZ].
([ACDFGHIJKMNOPQRSTUVWXZ]){2,}
The reason A and Z are not being matched in your example string (ABCDEFGHIJKLMNOPQRSTUVWXYZ) is because you are looking for two or more characters that are together that match your set. A is a single character followed by a character that does not match your set. Thus, A is not matched.
Similarly, Z is a single character preceded by a character that does not match your set. Thus, Z is not matched.
The characters in brackets below (B, E, L, and Y) do not match your set:
A[B]CD[E]FGHIJK[L]MNOPQRSTUVWX[Y]Z
If you were to do a global search in the string, only the runs CD, FGHIJK, and MNOPQRSTUVWX would be matched:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
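The "consecutive characters" behaviour is easy to confirm with a global search. The snippet below is a small Python illustration; the variable names are arbitrary, and Python's re module is used only as a convenient stand-in for the MySQL engine.

```python
import re

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# {2,} applies to the class, so only runs of adjacent set characters match.
runs = re.findall(r"[ACDFGHIJKMNOPQRSTUVWXZ]{2,}", alphabet)
print(runs)  # ['CD', 'FGHIJK', 'MNOPQRSTUVWX']

# A and L both occur in ABCVL, but they are not adjacent, so {2,} finds nothing.
adjacent_only = re.search(r"[AKL]{2,}", "ABCVL")
print(adjacent_only)  # None
```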

Replace a character by another, unless it is located in between braces

What I would like to do with the following string is replace all commas "," with tabs, unless the comma is between braces { }.
Say I have:
goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}"
The result should be:
goldRigged\t1\t0\t0\t0\t1\t0\t0\t0\t1\t"{"LootItemID": "goldOre", "Amount": 1}"
I already have \"(\\{((.*?))\\})\", which lets me match what's between { }.
The idea would be to exclude that content somehow and match any commas with something like \",^(\\{((.*?))\\})\"
But I guess that by doing that it will exclude the comma itself.
What you would need is called a negative lookahead and a negative lookbehind. However, this would make for quite a complex expression:
Match all commas that are not preceded by an opening brace, as long as they were not previously preceded by a closing brace (plus the reversed logic for the right side of the comma). This results in an expression that is difficult to process, because the regex engine constantly needs to run up and down your string from its current position, which is rather inefficient.
Instead, iterate over all characters of your string. If you find an opening brace, set an escape hint. Remove it when you find a closing brace. When you find a comma, replace it if your escape hint is not set. Write your result to some sort of string buffer, and your solution will be significantly more efficient than the regex.
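The character-by-character approach described above could look roughly like this in Python. It is a sketch under the same assumption as the answer (braces are not nested), and commas_to_tabs_outside_braces is a made-up name.

```python
def commas_to_tabs_outside_braces(line):
    """Replace commas with tabs, except between '{' and '}' (no nesting)."""
    out = []          # the "string buffer"
    inside = False    # the "escape hint": True between { and }
    for ch in line:
        if ch == "{":
            inside = True
        elif ch == "}":
            inside = False
        if ch == "," and not inside:
            out.append("\t")
        else:
            out.append(ch)
    return "".join(out)

row = 'goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}"'
print(commas_to_tabs_outside_braces(row))
```

The pass is linear in the length of the line, and the comma between "goldOre" and "Amount" is left untouched.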
You want to use a negative lookaround to achieve this:
(?<![\{\}]),*(?![\{\}]) should work, try here: http://regex101.com/r/gG3oU1
Use negative lookahead (?!expr) and negative lookbehind (?<!expr) in your regex.
For example, you can write:
System.Text.RegularExpressions.Regex.Replace(
    "goldRigged,1,0,0,0,1,0,0,0,1, {\"LootItemID\": \"goldOre\", \"Amount\": 1}",
    @"(?<!\{[^\}].*)[,](?![^\{]*\})", "\t");
Does your input line contain { only in the last token?
If so, you can try this brute-force approach:
echo 'goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}"' | awk -F'{' '{one=$1; gsub(",","\t",one); printf("%s{%s\n", one, $2);}'
The regex below is an expensive way of doing it. As suggested by @Sniffer, a parser would be nicer here :)
(?=,.*?"{),|(?!,.*?\}),
First alternation
(?=,.*?"{), - make sure comma is outside the sequence "{
Second alternation
(?!,.*?\}), - make sure comma isn't inside the sequence }"
There will be edge cases that haven't been accounted for; that's where the parser comes in.
I think you actually need only one lookahead:
,(?=[^{}]*({|$))
reads: a comma, followed by some non-braces and then either an open brace or the end.
Example in JS:
> x = 'goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}",some,more{stuff,ff}end'
> x.replace(/,(?=[^{}]*({|$))/g, "#")
"goldRigged#1#0#0#0#1#0#0#0#1#"{"LootItemID": "goldOre", "Amount": 1}"#some#more{stuff,ff}end"
Note that this doesn't work if braces can be nested; in that case you need either a regex engine with recursion (?R) or a proper parser.
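For comparison, the single-lookahead pattern carries over to Python almost unchanged. This is only a sketch: OUTSIDE_COMMA and tabify are illustrative names, the capturing group is written as a non-capturing one, and a tab is used as the replacement instead of "#".

```python
import re

# A comma followed by non-brace characters and then either '{' or end of string.
OUTSIDE_COMMA = re.compile(r",(?=[^{}]*(?:\{|$))")

def tabify(line):
    """Replace commas outside non-nested braces with tabs."""
    return OUTSIDE_COMMA.sub("\t", line)

x = ('goldRigged,1,0,0,0,1,0,0,0,1,"{"LootItemID": "goldOre", "Amount": 1}"'
     ',some,more{stuff,ff}end')
print(tabify(x))
```

The commas inside both brace groups survive; every other comma becomes a tab.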

Little vim regex

I have a bunch of strings that look like this: '../DisplayPhotod6f6.jpg?t=before&tn=1&id=130', and I'd like to take out everything after the question mark, to look like '../DisplayPhotod6f6.jpg'.
s/\(.\.\.\/DisplayPhoto.\{4,}\.jpg\)*'/\1'/g
This regex is capturing some but not all occurrences. Can you see why?
\.\{4,} is trying to match 4 or more . characters. What it looks like you wanted is "match 4 or more of any character" (.\{4,}) but "match 4 or more non-. characters" ([^.]\{4,}) might be more accurate. You'll also need to change the lone * at the end of the pattern to .* since the * is currently applying to the entire \(\) group.
I think the easiest way to go about this is:
s/?.*$/'/g
This says: delete everything after the question mark and replace it with a single quote.
I would use macros, which are sometimes simpler than a regexp (and interactive):
qa
/DisplayPhoto<Enter>
f?dt'
n
q
And then some @a, or 20000@a to go through all lines.
The following regexp: /(\.\./DisplayPhoto.*\.jpg)/gi
tested against the following examples:
../DisplayPhotocef3.jpg?t=before&tn=1&id=54
../DisplayPhotod6f6.jpg?t=before&tn=1&id=130
results in:
../DisplayPhotocef3.jpg
../DisplayPhotod6f6.jpg
%s/\('\.\.\/DisplayPhoto\w\{4,}\.jpg\).*'/\1'/g
Some notes:
% will cause the swap to work on all lines.
\w instead of '.', in case there are some malformed file names.
Replace '.' at the start of your matching regex with ' which is exactly what it should be matching.
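Outside of vim, the same substitution can be tried quickly in Python. This is a hedged equivalent rather than the vim command itself: strip_query is a hypothetical name, and the pattern assumes, as in the examples, that the file name has at least four word characters and that the query string ends at the closing quote.

```python
import re

def strip_query(text):
    """Drop '?' and everything up to (not including) the closing quote."""
    return re.sub(r"(\.\./DisplayPhoto\w{4,}\.jpg)\?[^']*", r"\1", text)

line = "'../DisplayPhotod6f6.jpg?t=before&tn=1&id=130'"
print(strip_query(line))  # '../DisplayPhotod6f6.jpg'
```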

How to Match The Inner Possible Result With Regular Expressions

I have a regular expression to match anything between { and } in my string.
"/{.*}/"
Couldn't be simpler. The problem arises when I have a single line with multiple matches. So if I have a line like this:
this is my {string}, it doesn't {work} correctly
The regex will match
{string}, it doesn't {work}
rather than
{string}
How do I get it to just match the first result?
A question mark after the quantifier means "non-greedy":
"/{.*?}/"
Use a character class that includes everything except a right brace:
/{[^}]+}/
This will work with nested braces, but only to a depth of one: {(({.*?})*|.*?)*}
I'm not sure how to get infinite depth, or whether it's even possible with regex.
The default behaviour is greedy matching, i.e. from the first { to the last }. Use lazy matching by adding ? after your *:
/{.*?}/
or, instead of ., use "not a }":
/{[^}]*}/
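The difference between the three variants shows up clearly on the line from the question. Here is a short Python comparison; the variable names are arbitrary, and the braces are escaped, which Python does not strictly require here but which keeps the patterns portable.

```python
import re

line = "this is my {string}, it doesn't {work} correctly"

greedy = re.search(r"\{.*\}", line).group()      # first { to last }
lazy = re.search(r"\{.*?\}", line).group()       # first { to nearest }
negated = re.search(r"\{[^}]*\}", line).group()  # cannot cross a }

print(greedy)   # {string}, it doesn't {work}
print(lazy)     # {string}
print(negated)  # {string}
```

With re.findall, the lazy and negated forms would return both {string} and {work} as separate matches.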