Regular expression group word and sentence - regex

I would like to make a regular expression that does the following:
Gets the whole line of a text file
Gets the first word of that line
Outputs into an input
Currently I can do each of those separately but as one call it is getting hairy:
Whole Line
^\b(.*)\b
First Word
^\b(\w*)\b
Replace for Input
<div class="field"><label><input class="input-checkbox" id="Foo$1" name="Foo" type="checkbox" value="$1" /> <span>$1</span> </label></div>
I would like to use $1 and $2 to separate the full line (for the text display) from the first word (for the value and ID). Any thoughts? I really like regular expressions for their usefulness and speed, as long as I don't hit a knowledge roadblock like this.

Use the entire match:
Search: ^(\w+).*
Replace: First word is $1, whole line is $&
In your case, the replacement term would be (written here with $0; in Atom, use $& in its place):
<div class="field"><label><input class="input-checkbox" id="Foo$1" name="Foo" type="checkbox" value="$1" /> <span>$0</span> </label></div>
The entire match in Atom is coded as $&.
Most other tools/languages use group zero $0 for the entire match.
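For comparison, here is a minimal sketch of the same idea in Python, assuming the input is plain text with one entry per line. Python spells the entire match as \g<0> and the first group as \1. The names TEMPLATE, LINE_PATTERN and lines_to_checkboxes are made up for illustration.

```python
import re

# Hypothetical template: \1 is the first word, \g<0> is the whole matched line.
TEMPLATE = (
    r'<div class="field"><label>'
    r'<input class="input-checkbox" id="Foo\1" name="Foo" '
    r'type="checkbox" value="\1" /> <span>\g<0></span> </label></div>'
)

# ^(\w+).* captures the first word, then consumes the rest of the line.
# re.MULTILINE makes ^ match at the start of every line.
LINE_PATTERN = re.compile(r"^(\w+).*", re.MULTILINE)


def lines_to_checkboxes(text: str) -> str:
    """Replace every line that starts with a word by a checkbox field."""
    return LINE_PATTERN.sub(TEMPLATE, text)


if __name__ == "__main__":
    print(lines_to_checkboxes("Apple pie recipe\nBanana bread"))
```

Only one capture group is needed, because the whole line is already available as the entire match.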

Related

Regex Match All Characters Between Tags on nth occurrence

I need to match text between two tags, but starting at a specific occurrence of the tag.
Imagine this text:
Some long <br> text goes <br> here. And some <br> more can <br> go here.<br>
In my example, I would like to match here. And some.
I successfully matched the text between the first occurrence (between the first and second br tags) with:
<br>(.*?)<br>
But I am looking for the text in the next match (which would be between the second and third br tags). This is probably more obvious than I realize, but regex is not my strong suit.
Just extend your regex:
<br>(.*?)<br>(.*?)<br>
or, for an unlimited number of matches, and trimming the spaces:
<br>\s*(.*?)(?=\s*<br>)
EDIT: Now that I see that you are parsing an HTML document, be aware that regular expressions may not be the best tool for that job, especially if your parsing requirements are complex.
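A minimal sketch of the lookahead version in Python, picking the match by position. The helper name nth_between_br is hypothetical, and the sample text is the one from the question.

```python
import re

TEXT = "Some long <br> text goes <br> here. And some <br> more can <br> go here.<br>"

# The lookahead (?=\s*<br>) checks for the closing tag without consuming it,
# so the same <br> can also open the next match.
BETWEEN_BR = re.compile(r"<br>\s*(.*?)(?=\s*<br>)")


def nth_between_br(text: str, n: int) -> str:
    """Return the n-th (1-based) run of text between consecutive <br> tags."""
    return BETWEEN_BR.findall(text)[n - 1]


if __name__ == "__main__":
    print(nth_between_br(TEXT, 2))  # here. And some
```

Because the closing tag is not consumed, every gap between tags shows up in the result list, and the wanted occurrence is just an index into it.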

Regular expression replace start and end, ignore middle

In an Ant build file, is there a way to use a replaceregexp to find and replace two tags, and retain what's in between them? For example, to find all of these:
</a>1234abcdefg</P>
</a>123456789. </p>
</a> yop </p>
</a></p>
and replace
</a> and </p>
with
<#> and <##>
so that I have, respectively:
<#>1234abcdefg<##>
<#>123456789. <##>
<#> yop <##>
<#><##>
I can't replace the tags individually since they occur in other places, I just want the instances in which </a> is followed by </p>, in the same line, with either nothing or something in between them, and I want to keep what's in between them.
Try this:
<replaceregexp file="notTested.xml" match="(&lt;)\/a(>.*?&lt;)\/p(>)" replace="\1#\2##\3" byline="true" flags="g" />
As for the attempt that replaces what's between the tags with .*: I haven't seen .* used in a replacement/substitution expression. It is probably taken as the literal characters . and *.
As for </a>.*</p>, the greedy >.*< will not work when you have multiple occurrences of </a> and </p> on the same line, such as:
</a>1234abcdefg</P>abcde</a>123456789. </p> would be replaced as
<#>1234abcdefg</P>abcde</a>123456789. <##>
You need to make the quantifier non-greedy by adding ?. See the wiki for the use of .*? vs .*.
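The difference between .* and .*? can be checked outside Ant. Below is a rough Python sketch on the sample line above. It assumes case-insensitive matching (the sample mixes </P> and </p>), and the names GREEDY, LAZY and retag are made up for illustration.

```python
import re

LINE = "</a>1234abcdefg</P>abcde</a>123456789. </p>"

# Assumption: tags should match regardless of case, hence re.IGNORECASE.
GREEDY = re.compile(r"</a>(.*)</p>", re.IGNORECASE)
LAZY = re.compile(r"</a>(.*?)</p>", re.IGNORECASE)


def retag(pattern: re.Pattern, line: str) -> str:
    """Swap </a> ... </p> for <#> ... <##>, keeping the middle (group 1)."""
    return pattern.sub(r"<#>\1<##>", line)


if __name__ == "__main__":
    print(retag(GREEDY, LINE))  # one match spanning the whole line
    print(retag(LAZY, LINE))    # one match per </a> ... </p> pair
```

The greedy pattern runs from the first </a> to the last closing tag on the line, while the lazy one stops at the nearest closing tag, so each pair is replaced separately.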
Solution 1: You can try this
You store the match with parenthesis, and then replace it.
exp = new Regex(@"YourtagStartRegex(bodyRegex)YourtagClosingRegex");
str = exp.Replace(str, "$1");
Reference: Replace the start and end of a string ignoring the middle with regex, how?
Or
Solution 2:
Regex ignore middle part of capture

What Yahoo Pipes regex use in this case?

Do you have any ideas how to change this tag in item.description in Yahoo Pipes
<img src="http://mysite.com/img/pc/image.gif" class="big" style="background-image:url(http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg);" alt="" title="">
to this
<img src="http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg"/>
using a regex?
I don't know what variant of RegEx Pipes uses, so I'll go with the .NET variant and you can adjust for whatever syntax is needed. It should be pretty close.
Search for:
<img[^>]+url\(
([^\)]+)
\)[^>]+>
Replace with:
<img src="$1" />
Join the lines. Line 1 finds an image tag up to the url argument in the CSS style attribute. Line 2 matches the background image URL and captures it. Line 3 matches the rest of the image tag.
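To try the joined pattern outside Pipes, here is a small Python sketch. Python writes the group reference as \1 where the replacement above uses $1, and the names IMG_WITH_BG and background_to_src are hypothetical.

```python
import re

# The three lines joined: tag start up to url( , the captured URL, rest of the tag.
IMG_WITH_BG = re.compile(r"<img[^>]+url\(([^)]+)\)[^>]+>")


def background_to_src(html: str) -> str:
    """Rewrite an <img> that carries a CSS background URL into a plain <img src>."""
    return IMG_WITH_BG.sub(r'<img src="\1" />', html)


if __name__ == "__main__":
    tag = (
        '<img src="http://mysite.com/img/pc/image.gif" class="big" '
        'style="background-image:url(http://mysite.com/pre_big_crop/pic/pc/gallery/dd/c1/example.jpeg);" '
        'alt="" title="">'
    )
    print(background_to_src(tag))
```

Using [^>]+ and [^)]+ keeps each part of the match inside a single tag, so several images on one line are handled one by one.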
Here is an extremely simple regex to accomplish what you're looking for, using Perl-style regexes:
<img.*background-image:url\((.*)\);.*>
Basically, here is the breakdown on how it matches:
It will start by matching the characters "<img".
It then matches any characters, between 0 and unlimited times.
Then it matches the string "background-image:url(".
Then it matches any characters, between 0 and unlimited times, which is captured into backreference #1
Then it matches the characters ");"
Then it matches any characters, between 0 and unlimited times.
Then it matches the ">" character.
Note: You should replace the parts that match any characters with something more specific, depending on the application in which you're using the regex. This is why I've referred to this as "extremely simple".
Then, that gets replaced with:
<img src="$1">
Edit: Didn't see richardtallent's answer, pretty similar application just a different implementation.

AS3 regex - how to match 2 consecutive matches multiple times?

I have the following regex, which will match all the <br> and <br /> tags in a string:
/<br[\s|\/]*>/gi
I actually want to match every set of two consecutive tags, with valid matches being:
<br><br>
<br/><br>
<br><br/>
<br/><br/>
(and all variations with a space before the slash)
Obviously I can just double up the expression to /<br[\s|\/]*><br[\s|\/]*>/gi, but is there a shorter way of taking the first expression and saying "this, but twice"?
Try this one:
/(<br[\s|\/]*>){2}/gi
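A quick way to check the {2} quantifier is a Python sketch like the one below. Note that inside a character class | is a literal pipe, so the sketch uses [\s/] on the assumption that only whitespace and a slash are wanted there. The names DOUBLE_BR and find_double_breaks are made up.

```python
import re

# (?:...){2} repeats the whole group twice; [\s/]* allows optional spaces and a slash.
# re.IGNORECASE plays the role of the i flag; findall covers the g flag.
DOUBLE_BR = re.compile(r"(?:<br[\s/]*>){2}", re.IGNORECASE)


def find_double_breaks(text: str) -> list[str]:
    """Return every run of exactly two consecutive <br> tags, scanning left to right."""
    return DOUBLE_BR.findall(text)


if __name__ == "__main__":
    print(find_double_breaks("a<br><br/>b<BR /><br>c<br>d"))
```

The group is non-capturing here so that findall returns the full two-tag match instead of only the last repetition of the group.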

Parsing with regular expressions

I have some text like
some text [http://abc.com/a.jpg] here will be long text
can be multiple line breaks again [http://a.com/a.jpg] here will be other text
blah blah
Which I need to transform into
<div>some text</div><img src="http://abc.com/a.jpg"/><div>here will be long text
can be multiple line breaks again</div><img src="http://a.com/a.jpg"/><div>here will be other text
blah blah</div>
To get the <img> tags, I replaced \[(.*?)\] with <img src="$1"/>, resulting in
some text<img src="http://abc.com/a.jpg"/>here will be long text
can be multiple enters again<img src="http://a.com/a.jpg"/>here will be other text
blah blah
However, I have no idea how to wrap the text in a <div>.
I'm doing everything on the iPhone with RegexKitLite.
Here's the simplest approach:
Replace all occurrences of \[(.*?)\] with </div><img src="$1"/><div>
Prepend a <div>
Append a </div>
That does have a corner case where the result starts or ends with <div></div>, but this probably doesn't matter.
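The three steps of the simple approach can be sketched in Python as follows (the question uses RegexKitLite, so treat this only as an illustration of the logic; wrap_simple is a hypothetical name).

```python
import re


def wrap_simple(text: str) -> str:
    """Close the div before each image, reopen it after, then wrap everything."""
    # Step 1: every [url] becomes </div><img src="url"/><div>
    body = re.sub(r"\[(.*?)\]", r'</div><img src="\1"/><div>', text)
    # Steps 2 and 3: prepend <div> and append </div>
    return "<div>" + body + "</div>"


if __name__ == "__main__":
    print(wrap_simple("some text [http://abc.com/a.jpg] here will be long text"))
    # Corner case: input that starts and ends with an image yields empty divs.
    print(wrap_simple("[http://abc.com/a.jpg]"))
```

Whitespace next to the brackets is kept inside the divs; trimming it would need an extra step.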
If it does, then:
Replace all occurrences of \](.*?)\[ with ]<div>$1</div>[
Replace all occurrences of \[(.*?)\] with <img src="$1"/>
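A rough Python sketch of this two-step variant, assuming the text between images may contain line breaks (hence re.DOTALL in the first step). As written, the first replacement only wraps text that sits between two bracketed URLs, so text before the first image or after the last one stays unwrapped in this sketch. The name wrap_between_images is made up.

```python
import re


def wrap_between_images(text: str) -> str:
    """Wrap the text between ] and [ in a div, then turn [url] into <img>."""
    # Step 1: ]text[ becomes ]<div>text</div>[ ; DOTALL lets . cross line breaks.
    step1 = re.sub(r"\](.*?)\[", r"]<div>\1</div>[", text, flags=re.DOTALL)
    # Step 2: every [url] becomes <img src="url"/>
    return re.sub(r"\[(.*?)\]", r'<img src="\1"/>', step1)


if __name__ == "__main__":
    print(wrap_between_images("[a.jpg] middle\ntext [b.jpg]"))
```

The order matters: the brackets must still be present when the first step runs, because they are what marks the text to wrap.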