Regexp capture unlimited groups - regex

I need a little help here.
So I have string:
{block name="something" param1="param" param2="param"}
it can be:
{block name="something"} or
{block name="something" param1="value" sm="value" ng="value" um="param" .. and so on}.
What I need is to capture all possible params.
What I could figure out so far is {(?<type>[\w]+) ((?<param>[\w]+)="(?<value>[\w]+)"), but it captures only the first param, "name" :/
Any help will be appreciated.

Here you need to use \G so that each match continues exactly where the previous one ended. \h matches any horizontal whitespace character.
(?:^\{(?<type>\w+)|\G)\h*((?<param>\w+)="(?<value>\w+)")
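For anyone doing this in Python: the built-in re module supports neither \G nor \h, so the pattern above will not work there as written. A minimal sketch of a two-step alternative (a different technique, not the \G approach): first match the whole tag, then run findall over the attribute text. The names BLOCK_RE, PARAM_RE and parse_block are made up for this illustration.

```python
import re

# Step 1: match the whole tag and separate the type from the attribute text.
BLOCK_RE = re.compile(r'\{(?P<type>\w+)(?P<attrs>(?:[ \t]+\w+="\w+")*)[ \t]*\}')
# Step 2: pull every param="value" pair out of the attribute text.
PARAM_RE = re.compile(r'(?P<param>\w+)="(?P<value>\w+)"')


def parse_block(text):
    """Return (type, [(param, value), ...]), or None if text is not a block tag."""
    m = BLOCK_RE.fullmatch(text)
    if m is None:
        return None
    return m.group("type"), PARAM_RE.findall(m.group("attrs"))


print(parse_block('{block name="something" param1="param" param2="param"}'))
print(parse_block('{block name="something"}'))
```

Like the original pattern, this only accepts values made of \w characters; values containing spaces or punctuation would need a wider character class.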

Related

regexp for hashtag/mention in href

My goal is to build HTML hashtags; for this I need to wrap text that starts with # in
<a class="tag"><span class="hash">#</span>text</a>
I want to write a regexp that gives me the words with # and #, but I have some trouble with URLs like this:
http://gitlab.com/#xxx or https://medium.com/#erikdkennedy
My example string:
<p>Some text <span class="highlighted">#test</span><br />
gitlab.com/#xxx<br />
<code>some feature</code></p>
My regexp is:
(?!.*(<mail-link|link))#([a-zA-Z0-9]+)
I get 2 matches, #test and the last #xxx (https://regex101.com/r/pXxIkf/1)
How can I get only test, without matching inside the href definition?
Thank you!
Try this:
(?<=\>)(?:[\s]*(?:#|#))([a-zA-Z0-9]+)
(?<=>) Positive lookbehind to make sure that there is a > before the hashtag.
(?: Start of a non-capturing group.
[\s]* Optional whitespace.
(?:#|#) Non-capturing group that makes sure there is either # or #.
([a-zA-Z0-9]+) Capturing group that holds the tag text itself.
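A quick way to check the lookbehind idea against the sample from the question, sketched in Python. Only the plain # is handled here, since both alternatives look identical in the pattern as posted; HASHTAG_RE and the variable names are just for illustration.

```python
import re

# The lookbehind requires ">" right before the optional whitespace and the "#".
HASHTAG_RE = re.compile(r'(?<=>)\s*#([a-zA-Z0-9]+)')

html = (
    '<p>Some text <span class="highlighted">#test</span><br />\n'
    'gitlab.com/#xxx<br />\n'
    '<code>some feature</code></p>'
)

# "#xxx" is preceded by "/", not ">", so it is skipped.
tags = HASHTAG_RE.findall(html)
print(tags)
```

Note that this only finds hashtags that directly follow a tag (apart from whitespace); a hashtag in the middle of a text node would not be matched.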

Regex - match every possible char and space

I want to extract data from HTML. The problem is that I can't extract two strings, one at the top and one at the bottom of my pattern.
I want to extract 23423423423 and 1234523453245, but only if the string Allan appears between them:
<h4>###### </h4> said12:49:32
</div>
<a href="javascript:void(0)" onclick="replyAnswer(##########,'GET','');" class="reportLink">
report </a>
</div>
<div class="details">
<p class="content">
Hi there, Allan.
</p>
<div id="AddAnswer1234523453245"></div>
Of course, I can do something like this: Profile\/(\d+).*\s*.*\s*.*\s*.*\s*.*\s*.*\s*.*\s*.*Allan.*\s*.*\s*.*AddAnswer(\d+). But the code is horrible. Is there any way to make it shorter?
I was thinking about:
Profile\/(\d+)(.\sAllan)*AddAnswer(\d+)
or
Profile\/(\d+)(.*Allan\s*)*AddAnswer(\d+)
but neither of them works properly. Do you have any ideas?
You can construct a character class that matches any character, including newlines, by using [\S\s]. Whitespace and non-whitespace characters together cover all characters.
With that, your attempts were reasonably close:
/Profile\/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)/
This looks for the profile, the number that comes after it, any characters before Allan, any characters before AddAnswer, and the number that comes after it. If you have single-line mode available (/s) then you can use dots instead.
/Profile\/(\d+).*Allan.*AddAnswer(\d+)/s
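To see the difference between the two variants, here is a small Python sketch. The sample input is an assumption: the excerpt in the question does not show the Profile/... link, so a hypothetical one is added at the top.

```python
import re

# Hypothetical sample: the "Profile/..." link is assumed, it is not in the excerpt.
html = (
    '<a href="/Profile/23423423423">user</a>\n'
    '<p class="content">\n'
    'Hi there, Allan.\n'
    '</p>\n'
    '<div id="AddAnswer1234523453245"></div>'
)

# [\S\s] matches any character, newlines included, without needing a flag.
m1 = re.search(r'Profile/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)', html)
# With re.DOTALL (the "s" flag) a plain dot does the same job.
m2 = re.search(r'Profile/(\d+).*Allan.*AddAnswer(\d+)', html, re.DOTALL)
# Without the flag the dot stops at the first newline, so nothing matches.
m3 = re.search(r'Profile/(\d+).*Allan.*AddAnswer(\d+)', html)

print(m1.groups(), m2.groups(), m3)
```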
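In Ruby, you can use the m flag to make . match newlines. In most other flavors (PCRE, Perl, Python, JavaScript) m only changes how ^ and $ behave, and the flag you need is s. Note that this pattern does not check for Allan.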
/Profile\/(\d+).+AddAnswer(\d+)/m
It is better to use a parser instead. If you must use regular expressions for whatever reason, you might get by with a tempered greedy solution (written in verbose mode, so whitespace and comments are ignored):
Profile/(\d+) # Profile followed by digits
(?:(?!Allan)[\S\s])+ # any character except when there's Allan ahead
Allan # Allan literally
(?:(?!AddAnswer)[\S\s])+ # same construct as above
AddAnswer(\d+) # AddAnswer, followed by digits
See a demo on regex101.com
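The tempered pattern pays off once the input contains more than one post. A rough Python sketch comparing it with the plain greedy [\S\s]* version; the two-post input is invented for the illustration.

```python
import re

TEMPERED_RE = re.compile(r'''
    Profile/(\d+)               # Profile followed by digits
    (?:(?!Allan)[\S\s])+        # any character, as long as Allan does not start here
    Allan                       # Allan literally
    (?:(?!AddAnswer)[\S\s])+    # any character, as long as AddAnswer does not start here
    AddAnswer(\d+)              # AddAnswer followed by digits
''', re.VERBOSE)

GREEDY_RE = re.compile(r'Profile/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)')

# Hypothetical input with two posts that both mention Allan.
text = (
    'Profile/111\nHi there, Allan.\nAddAnswer222\n'
    'Profile/333\nHello again, Allan.\nAddAnswer444\n'
)

greedy_pairs = GREEDY_RE.findall(text)      # one match spanning both posts
tempered_pairs = TEMPERED_RE.findall(text)  # one match per post
print(greedy_pairs)
print(tempered_pairs)
```

The greedy version pairs the first profile id with the last AddAnswer id, while the tempered version stops at the first Allan and the first AddAnswer after it. It can still run from a post without Allan into the next post, which is one more reason to prefer a parser.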

Match text via Regex that is within an HTML tag

Via a Regex, I'm trying to match the word one, only when it's within an HTML <p> tag.
<p>zero one two three</p>
zero one two<p>three</p>
<p>zero one <b>two</b></p><p>three</p>
<p>two</p>three one
#1 and #3 above should be matches. It feels like I need a lookahead that makes sure there is a closing </p> tag without an opening <p> tag that comes before it (or a lookbehind that does the opposite). But I can't seem to come up with the right expression. Any ideas are appreciated.
<p>(?:(?!<\/p>).)*(\bone\b)(?:(?!<\/p>).)*<\/p>
You can try this. Just grab the capture. See the demo.
http://regex101.com/r/xT7yD8/12
You could try the below regex to match the string one which is inside the <p> tag.
\bone\b(?=(?:(?!<\/?p>).)*<\/p>)
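The lookahead version can be checked against the four sample lines from the question with a few lines of Python. This is only a sketch; ONE_IN_P_RE and the variable names are made up here.

```python
import re

# "one" must be followed by a closing </p> with no <p> or </p> in between.
ONE_IN_P_RE = re.compile(r'\bone\b(?=(?:(?!</?p>).)*</p>)')

samples = [
    '<p>zero one two three</p>',
    'zero one two<p>three</p>',
    '<p>zero one <b>two</b></p><p>three</p>',
    '<p>two</p>three one',
]

# One boolean per sample line: does it contain "one" inside a <p> element?
results = [bool(ONE_IN_P_RE.search(s)) for s in samples]
print(results)
```

Lines #1 and #3 match, as the question asks. The dot does not cross newlines here, so a paragraph that spans several lines would need re.DOTALL, and a <p> tag with attributes is not recognised by the </?p> check.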

Non-greedy regex acts greedily

Here's a simple example:
Text: <input name="zzz" value="18754" type="hidden"><input name="zzz" value="18311" type="hidden"><input name="zzz" value="17138" type="hidden">
Regex: /<input.*?value="(18754|17138)".*?>/
When matches are replaced by an empty string, the result is an empty string. I expected the middle <input> to remain since I am using non-greedy matching (.*?). Could anyone explain why it is removed?
There are two matches:
<input name="zzz" value="18754" type="hidden">
<input name="zzz" value="18311" type="hidden"><input name="zzz" value="17138" type="hidden">
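In the second case, the first .*? matches name="zzz" value="18311" type="hidden"><input name="zzz". It's a match and it's non-greedy: a lazy quantifier still expands as far as it has to for the rest of the pattern to match, and the match always starts at the leftmost possible position.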
aix already explained why it matches the middle part.
To avoid this behaviour, get rid of the .*? and try this instead:
/<input[^>]*value="(18754|17138)"[^>]*>/
See it here on Regexr
Instead of matching any character, match any character except ">".
aix's answer is correct -- the second match includes the 2nd and 3rd input tags.
One possible fix for your regex would be to change . to [^>], like this:
/<input[^>]*?value="(18754|17138)"[^>]*?>/
That will cause it to match any character except >. But that has the obvious problem of breaking whenever > shows up inside a quoted literal. As everyone always says: Regexes aren't designed to work on HTML. Don't use them unless you have no other choice.
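The behaviour described in this thread is easy to reproduce. A short Python sketch using the text and the two patterns from above (the variable names are only for illustration):

```python
import re

text = (
    '<input name="zzz" value="18754" type="hidden">'
    '<input name="zzz" value="18311" type="hidden">'
    '<input name="zzz" value="17138" type="hidden">'
)

# Lazy dot: the second match starts at the middle tag and runs into the third.
lazy = re.sub(r'<input.*?value="(18754|17138)".*?>', '', text)
# [^>]* cannot cross a tag boundary, so the middle tag is left alone.
bounded = re.sub(r'<input[^>]*value="(18754|17138)"[^>]*>', '', text)

print(repr(lazy))
print(bounded)
```

The lazy pattern removes everything, while the [^>]* pattern leaves only the middle input in place.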

What Yahoo Pipes regex use in this case?

Do you have any ideas how to change this link in item.description in Yahoo Pipes
<img src="http://mysite.com/img/pc/image.gif" class="big" style="background-image:url(http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg);" alt="" title="">
to this
<img src="http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg"/>
using regex.
I don't know what variant of RegEx Pipes uses, so I'll go with the .NET variant and you can adjust for whatever syntax is needed. It should be pretty close.
Search for:
<img[^>]+url\(
([^\)]+)
\)[^>]+>
Replace with:
<img src="$1" />
Join the lines. Line 1 finds an image tag up to the url argument in the CSS style attribute. Line 2 matches the background image URL and captures it. Line 3 matches the rest of the image tag.
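Since I can't test in Pipes itself, here is the same search and replace sketched in Python, just to show the joined pattern doing its job on the tag from the question. Python writes the backreference as \1 instead of $1; IMG_RE is a name picked for this example.

```python
import re

# The three lines from above joined into one pattern.
IMG_RE = re.compile(r'<img[^>]+url\(([^)]+)\)[^>]+>')

html = (
    '<img src="http://mysite.com/img/pc/image.gif" class="big" '
    'style="background-image:url(http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg);" '
    'alt="" title="">'
)

# Group 1 holds the URL found inside url(...).
result = IMG_RE.sub(r'<img src="\1" />', html)
print(result)
```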
Here is an extremely simple regex to accomplish what you're looking for, using Perl-style regexes:
<img.*background-image:url\((.*)\);.*>
Basically, here is the breakdown on how it matches:
It will start by matching the characters "<img"
It then matches any characters, between 0 and unlimited times.
Then it matches the string "background-image:url("
Then it matches any characters, between 0 and unlimited times, which is captured into backreference #1
Then it matches the characters ");"
Then it matches any characters, between 0 and unlimited times.
Then it matches the ">" character.
Note: You should replace the items that match any characters with something more specific, depending on the application in which you're using the regex. This is why I've referred to this as "extremely simple".
Then, that gets replaced with:
<img src="$1">
Edit: Didn't see richardtallent's answer, pretty similar application just a different implementation.