I would like to match the following string:
[-ed, -ing, -ment <i>n.</i>]
But exclude:
[-ed, -ing, -ment <b>n.</b>]
And my regex is:
\[[\-,\s\.(<i>)(</i>)a-z]+\]
But it doesn't work.
I put parentheses around <i> and </i> so that each is treated as a whole and <b> won't be matched.
However, parentheses inside square brackets don't seem to work.
The following works with the sample string:
\[([^<>]|<i>.*?<\/i>)+?\]
That is, square brackets containing one or more items, each of which is either a single character that is neither < nor >, or an <i>...</i> element with some content.
It will match the first string and not the second. The problem description, however, is quite vague, so the regex might need some tweaking. For example:
Is it just <i> or anything but <b>?
Can the square brackets contain nested square brackets?
Are the contents of the square brackets in fact comma-separated elements that must all begin with a hyphen?
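As a quick check, the pattern above can be run against both sample strings. This is only a minimal Python sketch: the variable names are illustrative, and it assumes Python's re module treats this pattern the same way as the engine used in the question.

```python
import re

# Each item inside the brackets is either one character other than < or >,
# or a complete <i>...</i> element.
pattern = re.compile(r"\[([^<>]|<i>.*?</i>)+?\]")

italic = "[-ed, -ing, -ment <i>n.</i>]"
bold = "[-ed, -ing, -ment <b>n.</b>]"

italic_match = pattern.search(italic)
bold_match = pattern.search(bold)

print(italic_match.group(0) if italic_match else None)
print(bold_match.group(0) if bold_match else None)
```

The second string fails because the < of <b> is neither an allowed single character nor the start of <i>.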
Can anyone tell me how to do the following task using regex?
Replace every ABC with DEF, but only when the ABC is inside both <> and "".
original string:
<tagA nameABC1="attr1ABCx xyzABC" name2="attABCa"> outside"ABC"xyz</tagA>
<tagB nameABC2="attr2ABCx cccABC" name3="testABCb"> outside_"ABC"</tagB>
desired string after replacing:
<tagA nameABC1="attr1DEFx xyzDEF" name2="attDEFa"> outside"ABC"xyz</tagA>
<tagB nameABC2="attr2DEFx cccDEF" name3="testDEFb"> outside_"ABC"</tagB>
Edited:
Thank you, guys.
I've decided to use the HTML parser library jsoup to handle all the HTML text properly.
Assuming well formed input (no dangling quotes or brackets):
Search: ABC(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)(?=[^<>]*>)
Replace: DEF
This works by applying two lookaheads:
the first lookahead (?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$) requires an odd number of quote characters in the remaining input, which means the match is inside quotes
the second lookahead (?=[^<>]*>) requires the next angle bracket to be a closing bracket, which means the match is inside an angle bracket pair
This is not bulletproof. For example, it doesn't cater for closing angle brackets inside quotes, but even that could be handled with a more complicated lookahead that applies the quote-counting logic of the first lookahead when matching angle brackets... an exercise left for the reader.
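To see the two lookaheads at work, here is a small Python sketch that applies the search and replace to the sample input from the question. It assumes well-formed input as stated above, treats the two lines as one string, and uses illustrative variable names.

```python
import re

# First lookahead: an odd number of double quotes remains, so we are inside quotes.
# Second lookahead: the next angle bracket is ">", so we are inside a tag.
pattern = re.compile(r'ABC(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)(?=[^<>]*>)')

original = (
    '<tagA nameABC1="attr1ABCx xyzABC" name2="attABCa"> outside"ABC"xyz</tagA>\n'
    '<tagB nameABC2="attr2ABCx cccABC" name3="testABCb"> outside_"ABC"</tagB>'
)

replaced = pattern.sub("DEF", original)
print(replaced)
```

The ABC in the attribute names and in the text outside the tags is left alone, because each of those fails one of the two lookaheads.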
I am looking for a solution to the following problem.
I have the text sequence below, and I would like to extract the square-bracketed item that most closely precedes each <em> tag.
[P1/1]0(4)0(5)**[P1/432]** g(5)I(2)d(7)a(8)`<em>`b(5)[P1/4]C(6)e(7)B(8)B`</em>`(9)[P1/5]0(6)i(7)[P1/6]0(1)I(2)[P1/7]0(6)[P1/1]0(1)0(2)[P1/2]E(1)c(2)d(3)a(4)**[P1/3]** 0(1)`<em>`b(2)[P1/4]C(1)e(2)B(3)B`</em>`(4)[P1/5]0(1)
So in the text above, what I am searching for is [P1/432] and [P1/3].
With the regular expression ((.(?!\[.*?]))+?)<em>, I get everything from [ to <em> rather than only the bracketed item.
Can someone help me?
There is a straightforward solution if we don't care about nested or unbalanced brackets:
\[[^\]\[]*\](?=[^\]\[]*<em>)
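A short Python sketch of how this pattern behaves, assuming the sample text from the question with its emphasis markup (the ** and backticks) stripped; the variable names are illustrative.

```python
import re

# \[[^\]\[]*\]        one bracketed item with no brackets inside it
# (?=[^\]\[]*<em>)    followed by <em> before any other bracket appears
pattern = re.compile(r"\[[^\]\[]*\](?=[^\]\[]*<em>)")

text = (
    "[P1/1]0(4)0(5)[P1/432] g(5)I(2)d(7)a(8)<em>b(5)[P1/4]C(6)e(7)B(8)B</em>(9)"
    "[P1/5]0(6)i(7)[P1/6]0(1)I(2)[P1/7]0(6)[P1/1]0(1)0(2)[P1/2]E(1)c(2)d(3)a(4)"
    "[P1/3] 0(1)<em>b(2)[P1/4]C(1)e(2)B(3)B</em>(4)[P1/5]0(1)"
)

closest = pattern.findall(text)
print(closest)
```

Because the lookahead refuses to cross another [ or ], only the last bracketed item before each <em> can match.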
I am trying to use a regular expression in MATLAB to extract all the numbers between '[' and ']' for every group. Here are the details:
pat = '(\[\d,\d,\d,\d\])';
s1 = 'frame_1:[1,2,3,5],[11,22,33,44],[23,12,12,33],'
[matched_string] = regexp(s1,pat,'match');
>> matched_string{:}
ans =
'[1,2,3,5]'
I want to get all the boxes, i.e. [1,2,3,5], [11,22,33,44] and [23,12,12,33].
Can someone help me figure out what I am doing wrong?
Your pattern matches only single digits between the commas, so only [1,2,3,5] qualifies. To match one or more digits, add + after each \d:
'(\[\d+,\d+,\d+,\d+\])'
If you do not care about the exact format inside the square brackets and just need to extract square brackets containing digits and commas, you can use the simpler
'\[[\d,]+]'
Note that the ] at the end of the regular expression is not a special character here: no unescaped [ is still open as a character class at that point, so it does not need escaping.
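For comparison, the three patterns can be tried side by side. The sketch below uses Python rather than MATLAB purely for illustration; it assumes MATLAB's regexp treats \d, + and the unescaped trailing ] the same way for these patterns, which is not verified here.

```python
import re

s1 = "frame_1:[1,2,3,5],[11,22,33,44],[23,12,12,33],"

# Original pattern: exactly one digit in each position.
single_digits = re.findall(r"\[\d,\d,\d,\d\]", s1)

# Fixed pattern: one or more digits in each of the four positions.
multi_digits = re.findall(r"\[\d+,\d+,\d+,\d+\]", s1)

# Looser pattern: any run of digits and commas between the brackets.
loose = re.findall(r"\[[\d,]+]", s1)

print(single_digits)
print(multi_digits)
print(loose)
```

The looser pattern would also accept groups with a different number of values, so the stricter one is the safer choice if exactly four numbers are expected.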
I have a dictionary list as a text file and want to select the words that contain all of the members of a specific list of characters. I am using the text editor Notepad++ to apply a regular expression to the dictionary list. I first tried the following regular expression:
[BLT]+
However, this matches any of the letters in the square brackets, not all of them. I then tried the following regular expression, which adds a word boundary:
\b[BLT]+
This expression again matches every word that contains any, but not necessarily all, of the letters listed between the square brackets.
Desired Behaviour
Let's say the dictionary contains the list below:
AL
BAL
BAK
LABAT
TAL
LAT
BALAT
LA
AB
LATAB
TAB
What I need is an expression that matches only the words containing all of the letters 'B', 'L' and 'T' (not just any of them), so the expected result is:
LABAT
BALAT
LATAB
What is the most minimalist and generic regular expression for this problem?
You can use lookaheads:
^(?=.*B)(?=.*L)(?=.*T).+$
As an example for a more general case, the optimized regex for at least 1 B, 2 Ls and 3 Ts:
^(?=[^B\n]*B)(?=(?:[^L\n]*L){2})(?=(?:[^T\n]*T){3}).+$
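Both lookahead patterns can be checked with a small Python sketch. It assumes the dictionary is held as one string with one word per line, hence re.MULTILINE so that ^ and $ anchor at each line; the second word list is a hypothetical sample made up for the counted case.

```python
import re

# Word list from the question, one word per line.
words = "AL BAL BAK LABAT TAL LAT BALAT LA AB LATAB TAB".split()
dictionary = "\n".join(words)

# Each lookahead checks, from the start of the line, that one letter occurs.
all_three = re.compile(r"^(?=.*B)(?=.*L)(?=.*T).+$", re.MULTILINE)
matches = all_three.findall(dictionary)
print(matches)

# Counted variant: at least 1 B, 2 Ls and 3 Ts on the same line.
counted = re.compile(
    r"^(?=[^B\n]*B)(?=(?:[^L\n]*L){2})(?=(?:[^T\n]*T){3}).+$", re.MULTILINE
)
samples = "\n".join(["BLT", "BLLTTT", "TLTBLT", "BLLTT"])  # hypothetical words
counted_matches = counted.findall(samples)
print(counted_matches)
```

The negated classes such as [^B\n] let each lookahead skip straight to the next required letter without backtracking and keep it from running past the end of the line.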
I am trying to match (a) and replace (b) the following occurrences:
array[0] -> atoi(array[0])
array[1] -> atoi(array[1])
...
array[i+1] -> atoi(array[i+1])
and so on...
(a) I am unable to match anything with any of the following expressions: array\\[(.\*?)\\] , array\\[.\*?\\] , or array\\[*\\]
I am able to match single-character occurrences between the brackets with array\\[.\\] and also segments with multiple matches on a single line with array\\[.*\\]
(b) After a working match I figure s/"MATCHING REGEX"/atoi(array\[\1\])/g should work, however attempting that with array\\[.\\] resulted in atoi(array[])
How about this?
:s/\<array\[[^\]]\+\]/atoi(\0)/
You can use:
:s/array\[.\{-}\]/atoi(&)
Well, you don't really say which regex engine you are using, but if I had to guess, this particular engine doesn't like the "non-greedy" qualifier. So let's try the regex without the non-greedy qualifier, using a character class of "not a closing square bracket" in place of the non-greedy ".*?". Try this instead:
array\[([^]]*)\]
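The negated-character-class approach can be sketched in Python as follows. This is only an illustration of the matching and replacement logic, not the editor command from the question; the sample source line and the variable names are made up, and a \b is added so that names merely ending in "array" are not touched.

```python
import re

# Hypothetical line of C source containing several array accesses.
source = "total = array[0] + array[1] + array[i+1];"

# [^\]]* matches everything up to the first closing bracket, so no
# non-greedy quantifier is needed; group 1 captures the index expression.
pattern = re.compile(r"\barray\[([^\]]*)\]")

wrapped = pattern.sub(r"atoi(array[\1])", source)
print(wrapped)
```

Each access is wrapped independently, even when several appear on the same line, because the match can never extend past the first ].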