I have the following regex, which will match all the <br> and <br /> tags in a string:
/<br[\s|\/]*>/gi
I actually want to match every set of two consecutive tags, with valid matches being:
<br><br>
<br/><br>
<br><br/>
<br/><br/>
(and all variations with a space before the slash)
Obviously I can just double up the expression to /<br[\s|\/]*><br[\s|\/]*>/gi, but is there a shorter way of taking the first expression and saying "this, but twice"?
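Wrap the expression in a group and apply the {2} quantifier to it:
/(<br[\s|\/]*>){2}/gi
Note that | is a literal character inside a character class, so [\s|\/] also accepts a pipe; [\s\/] is enough.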
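A minimal sketch of the "this, but twice" idea in Python's re module. The sample strings are made up for illustration, and the character class is simplified to [\s/] on the assumption that a literal pipe should not be accepted:

```python
import re

# A group followed by {2} must match exactly twice in a row.
# re.IGNORECASE plays the role of the /i flag in the original pattern.
DOUBLE_BR = re.compile(r"(<br[\s/]*>){2}", re.IGNORECASE)

# Hypothetical sample inputs: four valid pairs, then two that should fail.
samples = ["<br><br>", "<br/><br>", "<br><br />", "<BR /><br/>", "<br>", "<br> <br>"]

# fullmatch requires the whole string to be one pair of tags.
results = {s: DOUBLE_BR.fullmatch(s) is not None for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

The last sample fails because the pattern allows whitespace inside a tag but not between the two tags; adding \s* after the closing > inside the group would be one way to permit that, if it is wanted.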
I am new to regular expressions and I need a regex for the following pattern:
The string must have the format “TCK#”: TCK followed by digits.
For example, TCK123 is acceptable; 123 is not.
Here is my current regex expression:
<input class="form-control" required="true" type="text" name="TCKInput"
pattern="^[TCK][0-9]$">
With my current code, when the user enters TCK123, it is rejected, which is not what I want.
Change it to the regex below:
^(?:TCK)[0-9]+$
Demo: https://regex101.com/r/h9V7n1/1
Changes to the regex you were using:
1) You had [ and ] around TCK, which makes it a character class that matches any one of the characters inside the brackets. To match TCK literally, use a group, ( and ), or no brackets at all.
2) You had no + after [0-9], so exactly one digit was matched. With +, it matches one or more digits.
If you want all 3 letters, TCK, followed by one or more digits, then try this:
^TCK\d+$
[TCK] is a character class, so it accepts only a single character: one T, one C, or one K.
Demo
This Demo sends to a live test server, so a successful submission of data will result in a response from said server
<form id='main' action='https://httpbin.org/post' method='post'>
<input class="form-control" required="true" type="text" name="TCKInput" pattern="^TCK\d+$">
<input type='submit'>
</form>
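To see the difference between the two patterns outside the browser, here is a small sketch using Python's re module. The helper name accepts and the test values are illustrative assumptions, not part of the original form:

```python
import re

original = re.compile(r"^[TCK][0-9]$")  # pattern from the question
fixed = re.compile(r"^TCK\d+$")         # pattern from the answers

def accepts(pattern, value):
    """Return True if the anchored pattern matches the value."""
    return pattern.search(value) is not None

for value in ["TCK123", "123", "T1", "TCK"]:
    print(value, accepts(original, value), accepts(fixed, value))
```

The original pattern only accepts two-character strings such as T1, because the character class stands for one letter and [0-9] without a quantifier stands for one digit.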
I need to match text between two tags, but starting at a specific occurrence of the tag.
Imagine this text:
Some long <br> text goes <br> here. And some <br> more can <br> go here.<br>
In my example, I would like to match "here. And some".
I successfully matched the text between the first occurrence (between the first and second br tags) with:
<br>(.*?)<br>
But I am looking for the text in the next match (which would be between the second and third br tags). This is probably more obvious than I realize, but regex is not my strong suit.
Just extend your regex:
<br>(.*?)<br>(.*?)<br>
or, for an unlimited number of matches, and trimming the spaces:
<br>\s*(.*?)(?=\s*<br>)
EDIT: Now that I see that you are parsing an HTML document, be aware that regular expressions may not be the best tool for that job, especially if your parsing requirements are complex.
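As a rough illustration of the lookahead version, the sketch below runs it over the sample text with Python's re.findall and picks the second segment by index. The variable names are assumptions made for the example:

```python
import re

text = "Some long <br> text goes <br> here. And some <br> more can <br> go here.<br>"

# The lookahead (?=...) checks that a <br> follows without consuming it,
# so the same tag can also open the next match.
segments = re.findall(r"<br>\s*(.*?)(?=\s*<br>)", text)

# Index 1 is the text between the second and third <br> tags.
second = segments[1]
print(segments)
print(second)
```

Because the closing tag is not consumed, every stretch of text between neighbouring tags is returned, and choosing "the Nth occurrence" becomes a list index rather than a longer pattern.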
In an Ant build file, is there a way to use a replaceregexp to find and replace two tags, and retain what's in between them? For example, to find all of these:
</a>1234abcdefg</P>
</a>123456789. </p>
</a> yop </p>
</a></p>
and replace
</a> and </p>
with
<#> and <##>
so that I have, respectively:
<#>1234abcdefg<##>
<#>123456789. <##>
<#> yop <##>
<#><##>
I can't replace the tags individually since they occur in other places, I just want the instances in which </a> is followed by </p>, in the same line, with either nothing or something in between them, and I want to keep what's in between them.
Try this:
<replaceregexp file="notTested.xml" match="(&lt;)\/a(>.*?&lt;)\/p(>)" replace="\1#\2##\3" byline="true" flags="gi" />
As for your attempt that replaces what's between the tags with .*: I haven't seen .* used in a replacement/substitution expression. It is probably taken as the literal characters . and *.
As for </a>.*</p>, the greedy >.*< will not work when </a> and </p> occur more than once on the same line, such as:
</a>1234abcdefg</P>abcde</a>123456789. </p>
which would be replaced with
<#>1234abcdefg</P>abcde</a>123456789. <##>
You need the non-greedy form, made by adding ? after the quantifier. See the wiki for the use of .*? vs .*.
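The greedy versus non-greedy behaviour described here can be checked outside Ant. The sketch below uses Python's re.sub as a stand-in (an assumption; Ant's regex engine may differ in details), with the same groups and replacement string:

```python
import re

line = "</a>1234abcdefg</P>abcde</a>123456789. </p>"

# Groups 1-3 keep the angle brackets and the text in between;
# only the tag names are swapped for # and ##.
replacement = r"\1#\2##\3"

lazy = re.sub(r"(<)/a(>.*?<)/p(>)", replacement, line, flags=re.IGNORECASE)
greedy = re.sub(r"(<)/a(>.*<)/p(>)", replacement, line, flags=re.IGNORECASE)

print(lazy)    # each </a>...</p> pair is rewritten separately
print(greedy)  # one match spans from the first </a> to the last </p>
```

The case-insensitive flag is there because the sample line closes with </P> in one place and </p> in another.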
Solution 1:
Capture the part you want to keep with parentheses, then replace the whole match with that group.
exp = new Regex(@"YourtagStartRegex(bodyRegex)YourtagClosingRegex");
str = exp.Replace(str, "$1");
Reference: Replace the start and end of a string ignoring the middle with regex, how?
Or
Solution 2:
Regex ignore middle part of capture
Do you have any idea how to change this tag in item.description in Yahoo Pipes
<img src="http://mysite.com/img/pc/image.gif" class="big" style="background-image:url(http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg);" alt="" title="">
to this
<img src="http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg"/>
using regex.
I don't know what variant of RegEx Pipes uses, so I'll go with the .NET variant and you can adjust for whatever syntax is needed. It should be pretty close.
Search for:
<img[^>]+url\(
([^\)]+)
\)[^>]+>
Replace with:
<img src="$1" />
Join the lines. Line 1 finds an image tag up to the url argument in the CSS style attribute. Line 2 matches the background image URL and captures it. Line 3 matches the rest of the image tag.
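For reference, here is the joined pattern tried in Python's re module, which is an assumption since the Pipes regex flavour is unknown. Python writes the backreference as \1 where .NET uses $1:

```python
import re

tag = (
    '<img src="http://mysite.com/img/pc/image.gif" class="big" '
    'style="background-image:url(http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg);" '
    'alt="" title="">'
)

# The three lines of the search expression joined into one pattern.
pattern = re.compile(r"<img[^>]+url\(([^)]+)\)[^>]+>")

# Group 1 holds the URL found inside url(...).
result = pattern.sub(r'<img src="\1" />', tag)
print(result)
```

Using [^>] and [^)] instead of .* keeps each part of the match inside a single tag and a single url(...) argument.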
Here is an extremely simple regex that does what you're looking for, using Perl-style syntax:
<img.*background-image:url\((.*)\);.*>
Basically, here is the breakdown on how it matches:
It starts by matching the characters "<img".
It then matches any characters, between 0 and unlimited times.
Then it matches the string "background-image:url(".
Then it matches any characters, between 0 and unlimited times, which is captured into backreference #1
Then it matches the characters ");"
Then it matches any characters, between 0 and unlimited times.
Then it matches the ">" character.
Note: You should replace the parts that match any characters with something more specific, depending on the application in which you're using the regex. This is why I've referred to this as "extremely simple".
Then, that gets replaced with:
<img src="$1">
Edit: Didn't see richardtallent's answer, pretty similar application just a different implementation.
I have a RegEx that is working for me but I don't know WHY it is working for me. I'll explain.
RegEx: \s*<in.*="(<?.*?>)"\s*/>\s*
Text it finds (it finds the white-space before and after the input tag):
<td class="style9">
<input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value="<?php echo $data[guarantor4]; ?>" /> </td>
</tr>
The part I don't understand:
<in.*=" <--- As I understand it, this should only find up to the first =" as in it should only find <input name="
It actually finds: <input name="guarantor4" id="guarantor4" size="50" type="text" tabindex="10" value=" which happened to be what I was trying to do.
What am I not understanding about this RegEx?
You appear to be using 'greedy' matching.
Greedy matching says "eat as much as possible to make this work"
try with
<in[^=]*=
for starters, that will stop it matching the "=" as part of ".*"
but in future, you might want to read up on the
.*?
and
.+?
notation, which stops at the first possible condition that matches instead of the last.
The use of 'non-greedy' syntax would be better if you were trying to only stop when you saw TWO characters,
ie:
<in.*?=id
which would stop on the first '=id' regardless of whether or not there are '=' in between.
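The three variants can be compared side by side. This is a sketch in Python's re module, using a shortened version of the input tag in which the value attribute is a placeholder rather than the original PHP snippet:

```python
import re

# Simplified tag; value="x" stands in for the PHP echo in the question.
html = '<input name="guarantor4" id="guarantor4" size="50" type="text" value="x" />'

greedy = re.search(r'<in.*="', html).group()      # runs to the LAST ="
lazy = re.search(r'<in.*?="', html).group()       # stops at the FIRST ="
negated = re.search(r'<in[^=]*="', html).group()  # cannot cross an = at all

print(greedy)
print(lazy)
print(negated)
```

On this input the lazy and negated-class patterns give the same result. They differ once the terminator is longer than one character, which is the case the =id example above describes.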
.* is greedy. You want .*? to find up to only the first =.
.* is greedy, so it'll find up to the last =. If you want it non-greedy, add a question mark, like so: .*?
As I understand it, this should only
find up to the first =" as in it
should only find <input name="
You don't say what language you're writing in, but almost all regular expression systems are "greedy matchers" - that is, they match the longest possible substring of the input. In your case, that means everything from the start of the input tag to the last equal-quote sequence.
Most regex systems have a way to specify that the pattern should match only the shortest possible substring, not the longest - "non-greedy matching".
As an aside, don't assume the first parameter will be name= unless you have full control over the construction of the input. Both HTML and XML allow attributes to be specified in any order.
Your greedy approach is causing confusion. You want .*?
Consider the input 101000000000100.
Using 1.*1, .* is greedy: it first matches all the way to the end, then backtracks until a 1 can follow, leaving you with 1010000000001.
In 1.*?1, .*? is non-greedy: it first matches nothing, then takes one extra character at a time until a 1 can follow, eventually matching 101.
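The same experiment as a short Python sketch, assuming Python's re module behaves like the engine under discussion (it does for these two quantifiers):

```python
import re

s = "101000000000100"

greedy = re.search(r"1.*1", s).group()   # backtracks from the end to the last 1
lazy = re.search(r"1.*?1", s).group()    # grows from nothing to the first 1

print(greedy)  # everything except the two trailing zeros
print(lazy)    # 101
```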