I have this string:
<p> this is some text</p>
The leading whitespace after the opening tag can occur any number of times.
To match it I am using the regex (?<=<p.*?>* )(.*)(?=</p>)
but I am getting the text with its leading whitespace as output.
How do I get just this is some text?
EDIT
I am sorry, my string is actually <p class='randomstring'>a) this is some text</p>
In place of a) there is sometimes a digit.
You can use this regex:
(?<=<p[^>]*>)(?: )+(.*)(?=</p>)
Then grab captured group #1 for your match, which will be:
this is some text
EDIT: Based on your edited question try this regex:
(?<=<p[^>]*>)[^)]*\) *(?: )+(.*)(?=</p>)
You could use the regex below, which uses a variable-length positive lookbehind.
(?<=<p[^>]*>(?: )+)\b.*?(?=</p>)
This should match only the string this is some text.
Update:
(?<=<p[^>]*>\w*\)(?: )+)\b.*?(?=</p>)
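Variable-length lookbehind is not available in every engine; Python's built-in re module, for example, only accepts fixed-width lookbehind. A minimal sketch of the same extraction using a capture group instead, assuming Python and a hypothetical helper named extract_paragraph_text (not part of the answers above):

```python
import re

# Opening <p ...> tag, optional "a)" or "1)" style prefix, any whitespace,
# then the text itself in group 1. The names below are illustrative only.
P_TEXT = re.compile(r"<p[^>]*>\s*(?:\w*\))?\s*(.*?)</p>")

def extract_paragraph_text(html):
    """Return the paragraph text without prefix or leading whitespace."""
    match = P_TEXT.search(html)
    return match.group(1) if match else None

print(extract_paragraph_text("<p>   this is some text</p>"))
print(extract_paragraph_text("<p class='randomstring'>a)  this is some text</p>"))
```

Both calls should print this is some text. In Python 3, \s also matches a non-breaking space, so the same pattern covers that case.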
I need to match text between two tags, but starting at a specific occurrence of the tag.
Imagine this text:
Some long <br> text goes <br> here. And some <br> more can <br> go here.<br>
In my example, I would like to match "here. And some".
I successfully matched the text between the first occurrence (between the first and second br tags) with:
<br>(.*?)<br>
But I am looking for the text in the next match (which would be between the second and third br tags). This is probably more obvious than I realize, but regex is not my strong suit.
Just extend your regex:
<br>(.*?)<br>(.*?)<br>
or, for an unlimited number of matches, and trimming the spaces:
<br>\s*(.*?)(?=\s*<br>)
EDIT: Now that I see that you are parsing an HTML document, be aware that regular expressions may not be the best tool for that job, especially if your parsing requirements are complex.
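As a rough illustration of the lookahead version, here is a sketch in Python (an assumption; the question does not name a language). Because the lookahead does not consume the closing <br>, that tag can open the next match, and a specific occurrence can then be picked by index:

```python
import re

text = "Some long <br> text goes <br> here. And some <br> more can <br> go here.<br>"

# The lookahead leaves each closing <br> unconsumed, so it can start the next match.
segments = re.findall(r"<br>\s*(.*?)(?=\s*<br>)", text)

# Index 1 is the text between the second and third <br> tags.
second = segments[1]
print(second)
```

This should print here. And some for the sample text.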
I'm still fairly green when it comes to regular expressions. What I am trying to achieve is:
Source:
<!-- Text --><b>Text</b>
Link
<div class="col"><h1>Nested Content</h1><p>More content</p>
</div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->
Expected capture:
$1 = Text
$2 = <b>Text</b>
Link
<div class="col"><h1>Nested Content</h1><p>More content</p>
</div>
$3 = END of Text
Current regex:
/\<\!-*( *[A-Za-z]*) *-*\>([\s\S\t\r]*)\<\!-*( *[A-Za-z]*) *-*\>/igm
The issue is that it's too greedy: it continues until the last comment in the source, ending with:
$3 = Another Tag Comment
How do I go about refactoring my regex to end with the expected capture?
<!--((?:(?!-->).)*)-->((?:(?!<!--)[\s\S])+)<!--((?:(?!-->).)*)-->
You can try this. See the demo:
https://regex101.com/r/cA4wE0/17
You need to make the inner pattern [\s\S]* non-greedy, and you also need to add \s or a space inside the last character class [A-Za-z]*. Add word boundaries \b in order to do an exact string match.
\<\!-* *([A-Za-z]*) *-*\>([\s\S]*?)<!-* *(\b[A-Za-z ]*\b) *-*\>
DEMO
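To check the tempered-lookahead pattern from the first answer against the sample source, here is a small sketch in Python (an assumption, since the question uses JavaScript-style delimiters). The variable names are illustrative, and the captured comment bodies are stripped of surrounding spaces afterwards:

```python
import re

source = """<!-- Text --><b>Text</b>
Link
<div class="col"><h1>Nested Content</h1><p>More content</p>
</div>
<!-- END of Text -->
More text <!-- Another Tag Comment -->"""

COMMENT_PAIR = re.compile(
    r"<!--((?:(?!-->).)*)-->"    # opening comment, body in group 1
    r"((?:(?!<!--)[\s\S])+)"     # everything up to the next comment, group 2
    r"<!--((?:(?!-->).)*)-->"    # closing comment, body in group 3
)

match = COMMENT_PAIR.search(source)
label, body, end_label = (group.strip() for group in match.groups())
print(label)
print(end_label)
```

The second group stops at the first following comment because (?!<!--) is tested before every character it consumes.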
I'm using Sublime Text, and I want to use Find/Replace to convert HTML to Markdown. One problem I encountered: how do I replace multiple matches?
The HTML is below:
<blockquote>
<p> text 1 </p>
<p> text 2 </p>
<p> text 3 </p>
<p> text 4 </p>
</blockquote>
And I want to change it to
><p> text 1 </p>
><p> text 2 </p>
><p> text 3 </p>
><p> text 4 </p>
I use
<blockquote>\n(^.+$\n)+?.+</blockquote>
to capture the p tags within the blockquote. But how do I replace them?
Thanks a lot.
I have tested this for your simple test case. The main problem is, it may or may not work for more complex input, where you may need to further customize the regex.
Find what:
(?:<blockquote>\s*+|(?<!\A)(?<!</blockquote>)\G)(.*)\s++(?:</blockquote>)?
This solution will clean up the closing tag as it matches the last line. It fixes the caveat in the first solution, where the end tag </blockquote> is not removed.
Replace with:
\n> $1
Use regular expression mode and highlight matches to check what will be replaced.
It will strip all leading spaces, and leave only 1 space between > and the text.
The regex above is built based on my own answer to the question of solving this class of problem with regex alone: Collapse and Capture a Repeating Pattern in a Single Regex Expression.
My earlier solution is based on the second construct, while the current solution is based on the first construct. The initial solution is quoted here, in case you want to customize the regex to be more flexible with its end tag (e.g. free spacing):
(?:<blockquote>\s*+|(?!\A)\G\s++(?!</blockquote>))(.*)
You can do this in two steps.
1) Replace <blockquote>((?:(?!<\/blockquote>).)*)<\/blockquote> with $1.
See demo.
http://regex101.com/r/dZ1vT6/35
2) Replace ^\s+ with >
See demo.
http://regex101.com/r/dZ1vT6/36
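Outside the editor, the same two-step idea (unwrap the blockquote, then prefix each line) can be sketched in a few lines of Python. This is only an illustration under the assumption that scripting is an option; blockquote_to_markdown is a hypothetical helper name, and nested blockquotes are not handled:

```python
import re

def blockquote_to_markdown(html):
    """Replace each <blockquote> block with its lines prefixed by '>'."""
    def convert(match):
        # Drop blank lines and indentation, then prefix every remaining line.
        lines = [line.strip() for line in match.group(1).splitlines() if line.strip()]
        return "\n".join(">" + line for line in lines)

    # DOTALL lets .*? span the line breaks inside the blockquote.
    return re.sub(r"<blockquote>(.*?)</blockquote>", convert, html, flags=re.DOTALL)

sample = "<blockquote>\n<p> text 1 </p>\n<p> text 2 </p>\n</blockquote>"
print(blockquote_to_markdown(sample))
```

Using a replacement function avoids the repeated-group problem entirely, since the per-line work happens in ordinary code rather than inside the pattern.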
include_once($pathToRoot.'header.php');
echo('</div>');
Assume you have variations on the above code across hundreds of files. How do you match the first occurrence of
</div>
after
header.php'
?
In the find field:
(?s)(header\.php'.+?)</div>
In the replace field (if you want to replace </div> with </test>):
$1</test>
I don't know Sublime Text 2, but the regular expression would look like this (note the escaped $ and dots, which would otherwise be treated as metacharacters):
/include_once\(\$pathToRoot\.'header\.php'\);(.*?)(<\/div>)/s
The first group would be the string between the include and the closing div and the second group would be the closing div itself.
Do you have any ideas how to change this tag in item.description in Yahoo Pipes
<img src="http://mysite.com/img/pc/image.gif" class="big" style="background-image:url(http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg);" alt="" title="">
to this
<img src="http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg"/>
using a regex?
I don't know what variant of RegEx Pipes uses, so I'll go with the .NET variant and you can adjust for whatever syntax is needed. It should be pretty close.
Search for:
<img[^>]+url\(
([^\)]+)
\)[^>]+>
Replace with:
<img src="$1" />
Join the lines. Line 1 finds an image tag up to the url argument in the CSS style attribute. Line 2 matches the background image URL and captures it. Line 3 matches the rest of the image tag.
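Joined into one line, the pattern can be tried out as follows. This is a sketch in Python rather than in Pipes (whose regex flavor is not confirmed here), so the replacement uses \1 instead of $1:

```python
import re

item = (
    '<img src="http://mysite.com/img/pc/image.gif" class="big" '
    'style="background-image:url(http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg);" '
    'alt="" title="">'
)

# The three lines of the search pattern joined together.
IMG_WITH_BG = re.compile(r"<img[^>]+url\(([^)]+)\)[^>]+>")

converted = IMG_WITH_BG.sub(r'<img src="\1" />', item)
print(converted)
```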
Here is an extremely simple regex to accomplish what you're looking for, using Perl-style regexes:
<img.*background-image:url\((.*)\);.*>
Basically, here is the breakdown on how it matches:
It will start by matching the characters "<img"
It then matches any characters, between 0 and unlimited times.
Then it matches the string "background-image:url("
Then it matches any characters, between 0 and unlimited times, which is captured into backreference #1
Then it matches the characters ");"
Then it matches any characters, between 0 and unlimited times.
Then it matches the ">" character.
Note: You should replace the items that match any characters to something more specific, depending on the application that you're using the regex. This is why I've referred to this as "extremely simple".
Then, that gets replaced with:
<img src="$1">
Edit: Didn't see richardtallent's answer, pretty similar application just a different implementation.
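To illustrate the note about replacing the match-anything parts with something more specific, here is a small comparison in Python (an assumption; the answer targets Perl-style engines, and the two-image input is made up for the example). With two tags on one line, the greedy .* version swallows both tags in a single match, while negated character classes keep each match inside its own tag:

```python
import re

# Hypothetical input: two image tags on the same line.
html = (
    '<img style="background-image:url(a.jpg);">'
    '<img style="background-image:url(b.jpg);">'
)

# Greedy version from the answer: one match spanning both tags.
greedy = re.findall(r"<img.*background-image:url\((.*)\);.*>", html)

# More specific version: [^>] and [^)] cannot cross a tag or parenthesis boundary.
specific = re.findall(r"<img[^>]*background-image:url\(([^)]*)\);[^>]*>", html)

print(greedy)
print(specific)
```

The greedy pattern should report only the last URL, while the specific one should report both.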