Regular expression: Remove first match pattern in front and behind certain text

Regular expression: Remove first match pattern in front and behind certain text - regex

I have the following text.
<span style="color:#FF0000;">赤色</span><span style="color:#0;">|*|</span><span style="color:#0070C0;">青色</span><span style="color:#0;">|*|</span><span style="color:#00B050;">緑色</span><span style="color:#0;">|*|</span>
I need to remove any span tag that defines color for "|*|" only. That is in this case, I need to remove
<span style="color:#0;">
and
</span>
Can anyone help to do that?
Thanks in advance!

You want something like this:
<span[^>]+style="[^"]*color:[^>]+>(\|\*\|)<\/span>
This matches <span, then one or more non-> characters, then a style attribute that contains color:, then the rest of the tag, then |*|, then </span>.
You would replace with $1 or just |*|.
Here's a demo.
Note: one reason your attempt didn't work is that you escaped the |s, but not the *. You need to escape the * as \*.

Related

How to Match Redundant Lines From Contenteditable Div in Regex

I'm trying to process the html inside a contenteditable div. It might look like:
<div>Hi I'm Jack...</div>
<div><br></div>
<div><br></div>
<div>More text.</div> *<div><br></div>*
*<div><br></div>**<div><br></div>*
*<div><br></div>*
*<div>
<br>
</div>*
What regex expression would match all trailing <div><br></div> but not the ones sandwiched between useful divs containing text, i.e., <div> text (not html) </div>?
I have enclosed all expressions I want to match in asterisks. The asterisk are for reference only and are not part of my string.
Thanks,
Jack

You can use the pattern:
(?:<div>[\n\s]*<br>[\n\s]*<\/div>)(?!.*?<div>[^<]+<\/div>)
You can try it here.
Let me know if this works for all your cases and I will write a detailed explanation of the pattern.

regex required for fetching value in </span>

I need a regex for fetching the value in the </span> tag
<span class="booking-id-value">U166097</span>
value required: U166097
can please someone suggest me. I have tried using
<span class="booking-id-value">(.+?)
but it is not deriving the desired result it display on "U"

I think you need to be more specific about your expected value - below I'll just accept alphabetic and numeric characters as value - and more flexible about your tag, then I can suggest you to use a regex like this:
/<\s*span.+?class\s*=\s*"\s*booking-id-value\s*".*?>/s*([A-Za-z0-9]+)\s*<\//
Regex Demo

? after the .+ makes it ungreedy, tells it to match as little as possible - and that’s just the first U in this case.
Remove the ?, and instead look for the closing </span> after (.*) to terminate what is matched correctly:
<span class="booking-id-value">(.+)<\/span>
https://regex101.com/r/vt4pgN/1/

Regex:
<span.*>(.*)<\/span>
Substitute with:
$1
Result

Regex Pattern to Match A Href and Remove

I am trying to create a regex to match all a href links that contain my domain and I will end up removing the links. It is working fine until I run into an a href link that has another HTML tag within the tag.
Regex Statement:
(<a[^<]*coreyjansen\.com[^<]*>)([^"]*?)(<\/a>)
It matches the a href links in this statement with no problem
Need a lawyer? Contact <span style="color: #000000">Random text is great Corey is awesome</span>
It is unable to match both of the a href links this statement:
<strong><a href="http://coreyjansen.com/"><img class="alignright size-full
wp-image-12" src="http://50h0.com/wp-content/uploads/2014/06/lawyers.jpg"
alt="lawyers" width="250" height="250" /></a>
I have been trying to play with the neglected character set with no luck. If I remove the neglected character set what ends up happening is it will match two links that are right after each other such as example 2 as one match.

The issue here is that [^<]*> matches everything up until last >. That's the greedy behaviour of * asterisk. You can make it non-greedy by appending ? after asterisk(which you already do in other part of your query). It will then match everything until first occurrence of >. Then you have to change the middle part of your regex too ie. to catch everything until first tag </a> like this:
(<a[^<]*coreyjansen\.com[^<]*?>)(.*?)(<\/a>)

Use below regex which matches only a tag
(<a[^>]*coreyjansen\.com[^>]*>)
Example data
<strong><a href="http://coreyjansen.com/"><img class="alignright size-full
wp-image-12" src="http://50h0.com/wp-content/uploads/2014/06/lawyers.jpg"
alt="lawyers" width="250" height="250" /><a href="http://coreyjansen.com/"/>
Above regex will match all three a tag with your required domain.
Try above on regex

I'm playing with the following regex and it seems to be working:
<a.*coreyjansen\.com.*</a>
it captures anything between anchor tags that contain your site name. I am using javascript pattern matching from www.regexpal.com, depending on the language it could be slightly different

You need to match start of tag <a then match address before > char. You are matching wrong char. When you match that, then everithing between <a> and </a> is displayed link. I don't know why you compare to not contain quotes, every tag attribute (in HTML5) has value inside quotes, so you need to match everything except link ending tag </a>. It's done by ((?!string to not match).)* and after that should follow </a>. The result regex is:
(<a[^>]*coreyjansen\.com[^>]*>)((?!<\/a>).)*(<\/a>)

how to match any string in Emacs regexp?

I'm referring to this page: http://ergoemacs.org/emacs/emacs_regex.html
which says that to capture a pattern in Emacs Regexp, you need to escape the paren like this: \(myPattern\).
It further says that the syntax for capturing a sequence of ASCII characters is [[:ascii:]]+
In my document, I'm trying to match all strings that occur between <p class="calibre3"> and </p>
So, following the syntax above, I do a replace-regexp for
<p class="calibre3">\([[:ascii:]]+\)</p>
but it finds no matches.
Suggestions?

Regexps are not good for general-purpose HTML parsing, but as paragraph tags cannot be validly nested, the following is going to be fine (provided the mark-up is valid & well-formed).
<p class="calibre3">\(.*?\)</p>
*? is the non-greedy zero-or-more repetitions operator, so it will match as little as possible -- in this case everything until the next </p> (as opposed to the greedy version, which would match everything until the final </p> in the text).
The [^<] approach is fine if it fits the data in question, but it won't work if there are other tags within the paragraphs.

You need to escape your angle brackets and I would use [^<] instead of [[:ascii]] like so:
\<p class="calibre3"\>([^<]+\)</p\>

<p class="calibre3">\([^<]\)+</p>
Source: #TooTone

How to write this regex expression

In my HTML I have below tags:
<img src="../images/img.jpg" alt="sometext"/>
Using regex expression I want to remove alt=""
How would I write this?
Update
Its on movable type. I have to write it a like so:(textA is replaced by textB)
regex_replace="textA","textB"

Why don't you just find 'alt=""' and replace it with ' ' ?

On Movable Type try this:
regex_replace="/alt=""/",""
http://www.movabletype.org/documentation/developer/passing-multiple-parameters-into-a-tag-modifier.html

What regex you are asking for ? Straight away remove ..
$ sed 's/alt=""//'
<img src="../images/img.jpg" alt=""/>
<img src="../images/img.jpg" />
This does not requires a regex.

The following expression matches alt="sometext"
alt=".*?"
Note that if you used alt=".*" instead, and you had <img alt="sometext src="../images/img.jpg"> then you would match the whole string alt="sometext src="../images/img.jpg" (from alt=" to the last ").
The .* means: Match as much as you can.
The .*? means: Match as little as you can.

s/ alt="[^"]*"//

This regex_replace modifier should match any IMG tag with an alt attribute and capture everything preceding the alt attribute in group #1. The matched text is then replaced with the contents of group #1, effectively stripping off the alt attribute.
regex_replace='/(<img(?:\s+(?!alt\b)\w+="[^"]*")*)\s+alt="[^"]*"/g','$1'
Is that what you're looking for?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Regular expression: Remove first match pattern in front and behind certain text - regex

Related

How to Match Redundant Lines From Contenteditable Div in Regex

regex required for fetching value in </span>

Regex Pattern to Match A Href and Remove

how to match any string in Emacs regexp?

How to write this regex expression

Categories

Resources