Extract multiple variable values from a single regular expression - regex

I want to extract ID and Name from a single regular expression, but I'm not able to get the correct response
<a href="/profiles/6635/Name"
I have used below regular expression
<a href="/profiles/(.*?)/(.*?)"

As #WiktorStribiżew suggested, you should fix your regular expression to
<a href="/profiles/([^/]+)/([^/]+)"
But also use $1$ and $2$ in the Template field to get both values, for example
$1$$2$
This saves the concatenated value, 6635Name, to the variable.
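To check the two capture groups outside of the tool, here is a minimal Python sketch. It assumes the input is exactly the snippet from the question; the function name extract_id_and_name is only illustrative.

```python
import re

# Pattern from the answer above: two capture groups separated by "/".
PATTERN = re.compile(r'<a href="/profiles/([^/]+)/([^/]+)"')

def extract_id_and_name(html):
    """Return (id, name) from the first matching link, or None if absent."""
    m = PATTERN.search(html)
    return (m.group(1), m.group(2)) if m else None

sample = '<a href="/profiles/6635/Name"'
profile_id, name = extract_id_and_name(sample)
print(profile_id, name)    # 6635 Name
print(profile_id + name)   # 6635Name, the same as the $1$$2$ template
```

If the tag carries further attributes after the href, [^/]+ can run past the closing quote, so [^/"]+ for the second group is a stricter variant (my assumption, not part of the answer).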

The regex you use, <a href="/profiles/(.*?)/(.*?)", is fine for capturing the ID and name from <a href="/profiles/6635/Name". The lazy (non-greedy) (.*?) matches only between profiles/ and the second /, just like [^\/]+ would, and then between that / and the ". So check again that everything else is set up correctly.
You may need to escape / as \/, so change it to:
<a href="\/profiles\/(.*?)\/(.*?)"
This is your same regex here DEMO
And if you want to verify it with a Java tester, use this tool: Java regex tester

Regex find specific character but just when inside an HTML tag

I have an HTML string, e.g. :
<a href=“{{foo.bar}}”>some text “nice” here</a>
I'm trying to find out if any opening/closing double quote (“”, not ") is present inside an HTML tag (i.e. inside <>, though there could be other things in the tag as well).
In my example, <a href=“{{foo.bar}}”> should match but “nice” or </a> shouldn't.
What is the right regex for this ?
Actually, I don't believe you've found it; rather, you fell into a common trap of regular expressions. You found a pattern which matches what you want in one specific case.
If you place a < character inside the link text, as in <a href=“{{foo.bar}}”>some text < “nice” here</a>, your regex will match both <a href=“{{foo.bar}}”> and < “nice” here</a>.
So extra caution is needed when it comes to regular expressions. To match any opening HTML tag, better use <\w+.*?>. After that, extract whatever you find inside “”.
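The trap described here can be reproduced with a short Python sketch. The pattern is the asker's <[^>]*[“”]+[^>]*>, and the input is the modified example with a stray < in the link text.

```python
import re

# The asker's pattern: any <...> span that contains a curly quote.
NAIVE = re.compile(r'<[^>]*[“”]+[^>]*>')

text = '<a href=“{{foo.bar}}”>some text < “nice” here</a>'
matches = NAIVE.findall(text)
print(matches)
# ['<a href=“{{foo.bar}}”>', '< “nice” here</a>']
```

The second match starts at the stray < and is not a tag at all, which is the false positive the answer warns about.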
ok, found it : <[^>]*[“”]+[^>]*>
That does not work as you probably expect it to. When you add capturing groups, you'll see which parts of the string are actually matched by which groups:
<([^>]*)([“”]+)([^>]*)>
matches your example in this way:
Full match: <a href=“{{foo.bar}}”>
1st group: a href=“{{foo.bar}}
2nd group: ”
3rd group: (nothing)
Building on #Themelis' answer, you probably want to start with something like this:
<(\w+ [^<>“]*)“([^”]+)”([^<>]*)>
matches your example in this way:
Full match: <a href=“{{foo.bar}}”>
1st group: a href=
2nd group: {{foo.bar}}
3rd group: (nothing)
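The two group breakdowns can be verified with a quick Python sketch that runs both patterns against the example from the question. The variable names loose and strict are mine.

```python
import re

html = '<a href=“{{foo.bar}}”>some text “nice” here</a>'

# First pattern: the greedy [^>]* swallows the opening quote and the value.
loose = re.search(r'<([^>]*)([“”]+)([^>]*)>', html)
# Second pattern: the quoted value gets a capture group of its own.
strict = re.search(r'<(\w+ [^<>“]*)“([^”]+)”([^<>]*)>', html)

loose_groups = loose.groups()
strict_groups = strict.groups()
print(loose_groups)   # ('a href=“{{foo.bar}}', '”', '')
print(strict_groups)  # ('a href=', '{{foo.bar}}', '')
```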

regex substitute two patterns in one match

I'm trying to do a find/replace in notepad++ where the string is similar to
<span class="CharOverride-1">Q</span>
With a single replace command I'd like the result to be
<span class="somethingNew">somethingElse</span>
This matches the two things I want replaced but I don't know how to form the substitution
(?<=<span class="(CharOverride-1)">)(Q)(?=<\/span>)
If possible I'd like to avoid doing something like this
(<span class=")(CharOverride-1)(">)(Q)(<\/span>)
and
\1somethingNew\3somethingElse\5
You can simply use 3 capture groups:
Search:
(<span class=").*?(">).*?(</span>)
Replace:
\1somethingNew\2somethingElse\3
Don't forget to check the "regular expression" checkbox.
But, if I can give you a piece of very personal advice: don't use Notepad++...
The regular expression (?<=<span class=")CharOverride-1">Q(?=<\/span>) uses lookahead and lookbehind to find the string CharOverride-1">Q, but only where it follows the string <span class=" and is followed by </span>. Use somethingNew">somethingElse as the replacement string.
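For comparison, both suggestions can be tried in Python. This is only a sketch: Notepad++ uses a different regex engine, and Python's \g<1> replacement syntax stands in here for \1.

```python
import re

line = '<span class="CharOverride-1">Q</span>'

# Three capture groups keep the fixed markup; the two .*? parts are replaced.
with_groups = re.sub(
    r'(<span class=").*?(">).*?(</span>)',
    r'\g<1>somethingNew\g<2>somethingElse\g<3>',
    line,
)

# Lookbehind/lookahead variant: only the text between the fixed parts matches.
with_lookarounds = re.sub(
    r'(?<=<span class=")CharOverride-1">Q(?=</span>)',
    'somethingNew">somethingElse',
    line,
)

print(with_groups)       # <span class="somethingNew">somethingElse</span>
print(with_lookarounds)  # <span class="somethingNew">somethingElse</span>
```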

Regular expressions: Find and replace url with c:url tag

I have a problem building good regular expressions for find and replace. I need to replace all URLs in many .jsf files. I want to replace URLs starting with XXX with <c:url value="URL_WITHOUT_XXX"/>. Examples below.
I am stuck with the find expression "XXX(.*)" and the replace expression "<c:url value="\1"/>": my find expression matches too long a string, for example "XXX/a" style="", but I need it to match only up to the first " (the end of the href). Can anybody help?
I have:
<a href="XXX/a" style="">
<a href="XXX/b" >
<a href="XXX/c" ...>
I want:
<a href="<c:url value="/a"/>" style="">
<a href="<c:url value="/b"/>" >
<a href="<c:url value="/c"/>" ...>
PS: Sorry for my poor english ;)
Edit:
I use Find/Replace in Eclipse (regular expressions on)
You should specify the language you're working with.
The following regex will match what you want:
<a href="XXX[^\"]*"
If you want to have some particular value, you can group the regex according to your needs. For example:
<a href="(XXX[^\"]*)"
will give you in the first group:
XXX/a
XXX/b
XXX/c
If you want to have only /a, /b, and /c, you can group it like that:
<a href="XXX([^\"]*)"
Edit:
I will explain what <a href="XXX[^\"]*" does:
It will match: <a href="XXX
Then it matches anything except a ", zero or more times: [^\"]*
Finally match the ", which is not really necessary
When you do: [^abc] you're telling it to match anything but not a, or b, or c.
So [^\"] is: Match anything except a ".
And the quantifier * means zero or more times, so a* will match either an empty string, or a, aa, aaa, ...
And the last thing: Groups
When you want to keep a value apart from the entire match, so you can do anything with it, you can use groups: (something).
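To see the difference between the greedy (.*) from the question and the negated class [^"]*, here is a small Python sketch. Eclipse's replacement syntax may differ; the helper name rewrite and the constant names are only illustrative.

```python
import re

lines = ['<a href="XXX/a" style="">', '<a href="XXX/b" >']

GREEDY = r'"XXX(.*)"'        # the question's pattern: runs to the last quote
BOUNDED = r'"XXX([^"]*)"'    # stops at the first quote after XXX
REPLACEMENT = r'"<c:url value="\1"/>"'

def rewrite(pattern, line):
    """Replace the quoted XXX URL with a c:url tag that keeps group 1."""
    return re.sub(pattern, REPLACEMENT, line)

for line in lines:
    print(rewrite(BOUNDED, line))
# <a href="<c:url value="/a"/>" style="">
# <a href="<c:url value="/b"/>" >
```

With GREEDY, the first line's group 1 also swallows the style attribute, which is the over-long match described in the question.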

Regex for extracting links with specified attributes

I'm trying to build a regex to extract links from text that do not have rel="nofollow".
Example:
aiusdiua asudauih <a rel="nofollow" hre="http://uashiuadha.asudh/adas>adsaag</a> uhwaida <br> asdgydug <a href="http://asdha.sda/uduih/dufhuis>aguuia</a>
Thanks!
The following regex will do the job:
<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"
The wanted urls will be in the capture group #1. E.g. in Ruby it would be:
if input =~ /<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"/
  match = $~[1]
end
Since it accepts [^>]*? before rel in the negative lookahead, href or anything else can come before rel. If href comes after rel, it'll of course also be ok.
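The same pattern can be tried in Python. The sample text below is made up for illustration (well-formed links on example hosts, not the asker's input), with rel placed both before and after href.

```python
import re

# Negative lookahead: reject the tag if rel="nofollow" appears before its ">".
FOLLOWED_LINK = re.compile(r'<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"')

# Hypothetical sample: two nofollow links and one ordinary link.
text = (
    '<a rel="nofollow" href="http://a.example/x">a</a> <br> '
    '<a href="http://b.example/y">b</a> '
    '<a href="http://c.example/z" rel="nofollow">c</a>'
)

urls = FOLLOWED_LINK.findall(text)
print(urls)  # ['http://b.example/y']
```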
Try this
<(?:A|AREA)\b[^<>]*?(?!rel="nofollow")[^<>]*?href=['"]([^>"]*)[^>]*?>
If you are using .NET regex, then
<(?:A|AREA)\b[^<>]*?(?!rel="nofollow")[^<>]*?href=['"](?<URL>[^>"]*)[^>]*?>
The URL is in the group named URL, or in group 1.

How to write this regex expression

In my HTML I have below tags:
<img src="../images/img.jpg" alt="sometext"/>
Using a regular expression, I want to remove alt=""
How would I write this?
Update
It's on Movable Type. I have to write it like so (textA is replaced by textB):
regex_replace="textA","textB"
Why don't you just find 'alt=""' and replace it with ' ' ?
On Movable Type try this:
regex_replace="/alt=""/",""
http://www.movabletype.org/documentation/developer/passing-multiple-parameters-into-a-tag-modifier.html
What regex are you asking for? Just remove it straight away:
$ sed 's/alt=""//'
<img src="../images/img.jpg" alt=""/>
<img src="../images/img.jpg" />
This does not require a regex.
The following expression matches alt="sometext"
alt=".*?"
Note that if you used alt=".*" instead, and you had <img alt="sometext src="../images/img.jpg"> then you would match the whole string alt="sometext src="../images/img.jpg" (from alt=" to the last ").
The .* means: Match as much as you can.
The .*? means: Match as little as you can.
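A quick Python sketch of greedy versus lazy matching. The tag below is my own well-formed sample, with the closing quote after sometext present.

```python
import re

tag = '<img alt="sometext" src="../images/img.jpg">'

# Greedy: .* runs to the last double quote in the string.
greedy = re.search(r'alt=".*"', tag).group(0)
# Lazy: .*? stops at the first double quote it can.
lazy = re.search(r'alt=".*?"', tag).group(0)

print(greedy)  # alt="sometext" src="../images/img.jpg"
print(lazy)    # alt="sometext"
```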
s/ alt="[^"]*"//
This regex_replace modifier should match any IMG tag with an alt attribute and capture everything preceding the alt attribute in group #1. The matched text is then replaced with the contents of group #1, effectively stripping off the alt attribute.
regex_replace='/(<img(?:\s+(?!alt\b)\w+="[^"]*")*)\s+alt="[^"]*"/g','$1'
Is that what you're looking for?
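The same expression can be checked in Python before putting it into a template. This is a sketch only: Movable Type's modifier syntax is not reproduced here, and the function name strip_alt is illustrative.

```python
import re

# Group 1 keeps "<img" plus any attributes before alt; the alt attribute
# itself sits outside the group and is dropped by the substitution.
ALT_ATTR = re.compile(r'(<img(?:\s+(?!alt\b)\w+="[^"]*")*)\s+alt="[^"]*"')

def strip_alt(html):
    """Remove the alt attribute from every img tag in html."""
    return ALT_ATTR.sub(r'\1', html)

print(strip_alt('<img src="../images/img.jpg" alt="sometext"/>'))
# <img src="../images/img.jpg"/>
```

Tags without an alt attribute are left unchanged, because the pattern only matches when \s+alt="..." is present.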