I am trying to do a search and replace using GREP/regex.
Here is what I am searching for:
<div align="center" class="orange-arial-11"><b>.+<br>
I want to remove the <b>, <br> tags, and place <h3> tags around what .+ finds.
But I can't get the text that .+ matches to survive the replace.
For example, I want to find this:
<div align="center" class="orange-arial-11"><b>This is the section I want intact<br>
and change it to this:
<div align="center" class="orange-arial-11"><h3>This is the section I want intact</h3>
Any help is appreciated.
Use sed instead of grep:
# Modify the file in-place
sed -i~ 's|\(<div align="center" class="orange-arial-11">\)<b>\(.\+\)<br>|\1<h3>\2</h3>|' the-file
It depends on exactly which tool you're using, but if you put something in parentheses you can refer to it later in the replacement.
So it might be something like:
s/<b>(.+)<br>/<h3>\1<\/h3>/
In TextWrangler:
search for:
<div align="center" class="orange-arial-11"><b>(.+?)<br>
replace with:
<div align="center" class="orange-arial-11"><h3>\1</h3>
The '\1' will be replaced with the string matched inside the parens in the search pattern.
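The same capture-group idea can be tried outside an editor. Below is a minimal Python sketch, with the hypothetical helper name to_heading; group 1 keeps the opening <div> and group 2 holds the text to preserve.

```python
import re

# Group 1: the opening <div ...> tag. Group 2: the text between <b> and <br>.
PATTERN = re.compile(
    r'(<div align="center" class="orange-arial-11">)<b>(.+?)<br>'
)


def to_heading(html: str) -> str:
    """Replace <b>text<br> with <h3>text</h3>, keeping the captured text."""
    return PATTERN.sub(r'\1<h3>\2</h3>', html)
```

The lazy .+? matters when a line contains more than one <br>: it stops at the first one instead of the last.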
I'm trying to find a way to make a list of everything between <right> and </> tags.
This is my regex:
<right>([^<\/>]*)<\/>?
I'm currently missing these cases:
when there is only an opening tag: <right>hi
when tags are nested: <right><right>hello</></> (in the nested case, the inner content is treated as text, and the next tag is processed as usual)
The result should look like this:
case1:
<div style="text-align: right">
hi
</>
case2:
<div style="text-align: right">
<div style="text-align: right">
hello
</>
</>
This is where I tested the regex: https://regex101.com/r/ybO9cV/1
Thanks for your help!
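Mostly bad news. Your pattern ends in <\/>?, which makes only the final > optional. To also match entries that are missing the trailing </>, make the whole closing tag an optional group: /<right>([^<\/>]*)(<\/>)?/g
The nesting requirement cannot be satisfied with a regular expression alone; see "Can regular expressions be used to match nested patterns?"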
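As a rough illustration of both points, here is a minimal Python sketch. The function names flat_contents and nested_contents are hypothetical, and the depth counter is only one possible way to handle nesting (it swaps the single regex for a small scanner); it assumes <right> and </> are the only tags of interest.

```python
import re

# The closing </> is an optional group, so an unclosed "<right>hi" still matches.
OPTIONAL_CLOSE = re.compile(r'<right>([^</>]*)(?:</>)?')

# Tokens for the scanner: an opening <right> or a closing </>.
TOKEN = re.compile(r'<right>|</>')


def flat_contents(text):
    """Return the text after each <right>, with or without a closing </>."""
    return [m.group(1) for m in OPTIONAL_CLOSE.finditer(text)]


def nested_contents(text):
    """Return (depth, text) pairs for text found inside <right> tags.

    A counter tracks how many <right> tags are currently open, which is
    the part a single regular expression cannot do in general.
    """
    results = []
    depth = 0
    pos = 0
    for m in TOKEN.finditer(text):
        chunk = text[pos:m.start()]
        if depth > 0 and chunk:
            results.append((depth, chunk))
        if m.group() == '<right>':
            depth += 1
        elif depth > 0:
            depth -= 1  # ignore a stray </> with nothing open
        pos = m.end()
    tail = text[pos:]
    if depth > 0 and tail:
        results.append((depth, tail))  # unclosed tag: keep the trailing text
    return results
```

The flat version covers the "only an opening tag" case; the scanner covers nesting by reporting how deep each piece of text sits.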
I have a few WordPress blogs on which I am trying to use a plugin named 'Search Regex' to remove a part of the post text that I placed as an advertisement at the bottom of each post (don't ask).
I've been going around in circles for a few months trying to find a proper answer to this. I know MySQL doesn't fully support regex, and it would probably be painful to even try doing it that way, so I decided to use this plugin.
My WordPress blog has a couple of thousand posts, with almost the same code at the bottom, and the code looks something like this:
<!--more-->
<br />
<center>
<table width="100%">
<tbody>
<tr>
<td bgcolor="#000000" style="text-align: center; font-size: 16px; font-weight: bold;">
<a href="http://myaffiliate.com/?q2=affiliateid" target="_blank" rel="nofollow" ><img title="blabla" src="http://someimage.com/somewhere></a><br />
<b>Some random Blah</b>
</td>
</tr>
</tbody>
</table>
</center>
Everything outside the <td> is fixed and doesn't change; everything inside the tags changes with each post.
This is multiline text, and I SERIOUSLY have trouble finding a matching pattern, or even a tool, that could help me solve this puzzle. The block is spread across several blogs, and I figured that instead of cluttering my blog with the same repetitive code, which makes up a third of the content of my posts, I can simply include it through single.php.
So, I want this permanently deleted from my database.
Thanks in advance for any help.
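I'm not an expert on blogs or regex, but couldn't you use a Python script to strip that text out?
Something like this:
import re
# The wrapper is fixed; only the contents of the <td> change from post to post.
ad_block = re.compile(
    r'<!--more-->\s*<br />\s*<center>\s*<table width="100%">\s*<tbody>\s*<tr>\s*'
    r'<td\b.*?</td>\s*</tr>\s*</tbody>\s*</table>\s*</center>',
    re.DOTALL,  # let .*? run across line breaks
)
# post_text is assumed to hold the content of one post
cleaned = ad_block.sub("", post_text)
If I'm not wrong, that should find every ad block and replace it with an empty string :)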
You could try something like this (always take a backup before testing)
global $wpdb;
// LIMIT 0,1 touches a single post while testing
$posts = $wpdb->get_results("SELECT ID, post_content FROM {$wpdb->posts} LIMIT 0,1");
foreach ($posts as $p) {
    $pos = strpos($p->post_content, '<!--more-->');
    if ($pos === false || $p->ID <= 0) {
        continue; // no marker in this post, nothing to cut
    }
    $more_content = substr($p->post_content, $pos);
    if (strpos($more_content, 'myaffiliate.com') !== false) {
        // keep everything before <!--more-->
        $content = substr($p->post_content, 0, $pos);
        $wpdb->query($wpdb->prepare("UPDATE {$wpdb->posts} SET post_content = %s WHERE ID = %d", $content, $p->ID));
    }
}
This is untested, but you get the idea.
This removes the post content from <!--more--> onward if that part contains the string "myaffiliate.com" (this is a fairly loose match, but we could refine it with preg_match or preg_replace if you need it).
Keep LIMIT 0,1 while testing, then increase it to suit your needs.
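For checking the cut logic on a few exported posts before touching the database, here is the same idea as a small Python sketch. The function name strip_ad and its needle parameter are hypothetical; trimming trailing whitespace after the cut is an extra assumption, not something the PHP above does.

```python
MORE_MARKER = "<!--more-->"


def strip_ad(post_content: str, needle: str = "myaffiliate.com") -> str:
    """Drop everything from <!--more--> onward if that part contains needle."""
    pos = post_content.find(MORE_MARKER)
    if pos == -1:
        return post_content  # no marker, leave the post alone
    if needle not in post_content[pos:]:
        return post_content  # marker present, but no ad after it
    # Assumption: trailing whitespace left before the marker is not wanted.
    return post_content[:pos].rstrip()
```

Posts without the marker, or with a marker but no affiliate link after it, come back unchanged, which makes the function safe to run repeatedly.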
I'm trying to figure out regular expressions in Flex, but I can't for the life of me work out how to do the following.
From the sample below, I need to extract only "Mike_Mercury".
So I have to somehow strip out everything around it with RegExp, or whatever works best. It also needs to work with other samples: I'm getting this from the reddit API, so I'd have to extract that same section from a whole bunch of these. Thanks!
<table>
<tr>
<td>
<a href="http://www.reddit.com/r/atheism/comments/q2sfe/barack_obamas_insightful_words_on_abortion/">
<img src="http://d.thumbs.redditmedia.com/9StfiHi7hEbf8v73.jpg" alt="Barack Obama's insightful words on abortion"
title="Barack Obama's insightful words on abortion" /></a>
</td>
<td>
submitted by Mike_Mercury
to atheism
<br />
[link] <a href="http://www.reddit.com/r/atheism/comments/q2sfe/barack_obamas_insightful_words_on_abortion/">
[1722 comments]</a>
</td>
</tr>
</table>
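Try this regex:
submitted by\s+([\w-]+)
The name ends up in group 1. Note that a lazy group at the very end of a pattern, as in "submitted by (.*?)", always matches the empty string, so the capture needs an explicit character class.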
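To see how the capture behaves, here is a minimal Python sketch (Python rather than Flex, purely for illustration). The helper name extract_submitter is hypothetical, and the character class [\w-] is an assumption about which characters a username may contain.

```python
import re

# Assumption: usernames consist of letters, digits, underscores and hyphens.
SUBMITTER = re.compile(r'submitted by\s+([\w-]+)')


def extract_submitter(html):
    """Return the name following 'submitted by', or None if absent."""
    match = SUBMITTER.search(html)
    return match.group(1) if match else None


def lazy_capture(html):
    """Show what a trailing lazy group captures: always the empty string."""
    match = re.search(r'submitted by (.*?)', html)
    return match.group(1) if match else None
```

On the sample above, extract_submitter returns the username, while lazy_capture returns an empty string because nothing after the lazy group forces it to consume any characters.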
I'm trying to convert an HTML file with 100 entries like this one:
<table>
<tr>
<td valign="top" width="30">
1.</td>
<td>
TEXT DESCRIPTION
</td>
</tr>
</table>
<table><tr><td></td></tr></table>
where the number "1." goes from 1 to 100, into this:
<li>
TEXT DESCRIPTION
</li>
I haven't found a way to do this, either with regexp or with extended search mode. Any ideas?
You could start with this:
Replace
.*<td>(.*[A-Za-z]+.*)<\/td>.*
with
<li>\1</li>
This matches one chunk of the form you posted. You will need to adapt it to match multiple chunks of the same form in the same file.
For that to work correctly the pattern also has to match lazily (typically .*? in place of .*). Does anyone know how to do that here?
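As one possible take on the lazy-matching question, here is a minimal Python sketch that handles every entry in one pass. The function name tables_to_list_items is hypothetical, and the pattern assumes each entry has exactly the layout shown in the question, including the empty spacer table.

```python
import re

# One numbered entry: the number cell, the description cell (captured lazily),
# and the optional empty spacer table that follows it.
ENTRY = re.compile(
    r'<table>\s*<tr>\s*<td valign="top" width="30">\s*\d+\.\s*</td>\s*'
    r'<td>\s*(.*?)\s*</td>\s*</tr>\s*</table>\s*'
    r'(?:<table><tr><td></td></tr></table>)?',
    re.DOTALL,  # let the lazy group span line breaks
)


def tables_to_list_items(html):
    """Rewrite each numbered table entry as <li>description</li>."""
    return ENTRY.sub(lambda m: '<li>\n' + m.group(1) + '\n</li>', html)
```

The lazy (.*?) stops at the first </td> it can, so each match covers a single entry instead of running from the first table to the last.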
I am trying to extract the contents of a table using regex.
I have removed most of the tags from the table, but I am stuck with <br>, <a href>, <img> and <b>. How can I remove them?
For the <b> tag I tried this regex:
\s*<b[^>]*>\s*
(?<value>.*?)
\s* </b>\s*
It worked for some lines, but for others it gives output like this:
<b class="saadirheader">Email:</b>
Can anyone help me remove these tags:
<br>, <a href>, <img> and <b>
Full tags:
<img src="Newrecord_files/spacer.gif" alt="" border="0" height="1" width="5">
<a href="mailto:first.last#email.org">
Thanking you,
Naveen HS
Use the following regex:
</?(?:br|a|img|b)\b[^>]*>
This matches the opening and closing forms of all the tags you mentioned above, with or without attributes. If there are more tags you forgot to mention, add each tag name inside the parentheses, separated by a "|" sign.
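Here is a minimal Python sketch of stripping these four tags while keeping the text between them. The helper name strip_tags is hypothetical, and the pattern is one reasonable choice rather than the only one: the \b after the tag name keeps <b from also matching longer names such as <body>, and [^>]* stops at the first > so text between tags is left alone.

```python
import re

# Opening or closing br, a, img and b tags, with any attributes.
TAGS = re.compile(r'</?(?:br|a|img|b)\b[^>]*>', re.IGNORECASE)


def strip_tags(html):
    """Remove the listed tags and keep the surrounding text."""
    return TAGS.sub('', html)
```

For example, strip_tags('<b class="saadirheader">Email:</b>') returns 'Email:'.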