I have a newsletter that contains a few images inside a div called nieuwsbrief-tekst. I want to find those images and add inline CSS to them. I can find the div with preg_match, and I can also find the image tag itself, but adding style="" to the image tag hasn't worked so far.
There is also more than one nieuwsbrief-tekst div; these divs are the different content blocks, so there are 3 or 4 of them. I tried preg_replace, but it has no effect.
Any tips or suggestions on how to handle this?
The HTML looks like this, and I only want to add the style attribute to the images inside these divs.
HTML code:
<div class="nieuwsbrief-tekst">lorum ipmsum</div>
<div class="nieuwsbrief-tekst"><img src="#"></div>
<div class="nieuwsbrief-tekst">lorum ipmsum</div>
<div class="nieuwsbrief-tekst"><img src="#"></div>
PHP code:
if (preg_match_all('/<div class="nieuwsbrief-tekst">(.*?)<\/div>/is', $var, $matches)) {
    foreach ($matches[0] as $match) {
        if (preg_match('/<img[^>]+>/is', $match, $match_img)) {
            echo 'image found';
            $pattern = '/<img[^>]+>/is';
            $replacement = '<img style="float:left; margin:0 10px 10px 0;';
            $test = preg_replace($pattern, $replacement, $match_img);
        }
    }
    echo '<pre>';
    print($test);
    echo '</pre>';
}
Thanks :)
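This should accomplish what you want. Your own attempt has no visible effect because preg_replace() runs on $match_img (an array holding a copy of the matched tag) and its result is never written back into $var; your replacement string also discards the rest of the <img> tag.
$pattern = "/<div class=\"tmp\">(\\w+)?<img ([^>]+)>(\\w+)?<\/div>/is";
$replacement = "<div class=\"tmp\">\\1<img style=\"float:left; margin:0 10px 10px 0;\" \\2>\\3</div>";
$str = preg_replace($pattern, $replacement, $str);
echo $str;
Just change tmp to your own class name (nieuwsbrief-tekst) :)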
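For comparison, here is a sketch of the same task using PHP's DOM extension instead of a regex. The helper name addImageStyle() is hypothetical. It only touches images inside a div whose class list contains nieuwsbrief-tekst, and it appends to any existing style attribute rather than overwriting it.

```php
<?php
// Sketch: add an inline style to every <img> inside a div whose class list
// contains "nieuwsbrief-tekst". addImageStyle() is a hypothetical helper name.
// Note: without an encoding hint, loadHTML() assumes ISO-8859-1 input.
function addImageStyle(string $html, string $style): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);           // collect parser warnings instead of printing them
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $imgs = $xpath->query(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' nieuwsbrief-tekst ')]//img"
    );
    foreach ($imgs as $img) {
        // Append to any existing style instead of overwriting it.
        $existing = $img->getAttribute('style');
        $img->setAttribute('style', trim($existing . ' ' . $style));
    }

    // Serialise only the children of <body>, i.e. the original fragment.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$var = '<div class="nieuwsbrief-tekst">lorum ipmsum</div>'
     . '<div class="nieuwsbrief-tekst"><img src="#"></div>';
echo addImageStyle($var, 'float:left; margin:0 10px 10px 0;'), "\n";
```

Because the parser handles attribute order and quoting, this keeps working if an <img> later gains extra attributes such as alt or width.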
use WWW::Mechanize;
mkdir "images";
$url = "https://www.somedomain.com/";
$mech = new WWW::Mechanize;
$mech->get($url);
$num = 1;
$year = 2019;
$number = 23;
$content = q{<P><div class="row" style="text-align:center"><a target="_blank" href="/image/SG0092a.gif"><img src="/image/SG0092a.gif" alt="graphic image" class="img-responsive graphic"/></a></div><div class="row" style="text-align:center"><a target="_blank" href="/image/SG0092b.gif"><img src="/image/SG0092b.gif" alt="graphic image" class="img-responsive graphic"/></a></div><div class="row" style="text-align:center"><a target="_blank" href="/image/SG0092c.gif"><img src="/image/SG0092c.gif" alt="graphic image" class="img-responsive graphic"/></a></div><div class="row" style="text-align:center"><a target="_blank" href="/image/SG0092d.gif"><img src="/image/SG0092d.gif" alt="graphic image" class="img-responsive graphic"/></a></div><div class="row" style="text-align:center"><a target="_blank" href="/image/SG0092e.gif"><img src="/image/SG0092e.gif" alt="graphic image" class="img-responsive graphic"/></a></div>};
while ($content =~ s/(<img.+?src=)"([^>]+?)\.([A-Za-z]+)"/$1"images\/${year}_${number}_$num.$3"/g)
{
    $imageuri = "$2.$3";
    print $imageuri, "\n";
    $mech->get($imageuri);
    $mech->save_content("images/${year}_${number}_$num.$3");
    $num++;
}
print $content, "\n";
Is it possible to do the above in Perl? I would like the src attributes of the img elements to be replaced with a new path and filename, and the image files to be downloaded and saved under that path and filename.
You could do the following (but you should really consider using a real HTML parser):
$content =~ s{(<img.+?src=)"([^>]+?)\.([A-Za-z]+)"}{
    my $imageuri = "$2.$3";
    print $imageuri, "\n";
    $mech->get($imageuri);
    my $file = "images/${year}_${number}_$num.$3";
    $num++;
    $mech->save_content($file);
    qq($1"$file")
}eg;
The e modifier on the substitution operator makes Perl evaluate the replacement part as a block of code, not as a string.
Other notes:
Always start your Perl files with use strict; use warnings; or equivalent (e.g. use strict can be replaced by use v5.12.0 or higher).
Avoid indirect object syntax (new WWW::Mechanize). Use normal method calls instead (WWW::Mechanize->new).
Use lexical variables (e.g. my $num = 1;) unless you really need package variables.
Here is one way to do it with an HTML parser, HTML::TreeBuilder.
This changes the src attribute to the new value in the processed node and replaces that node in the tree with the changed copy, for all img tags.
use warnings;
use strict;
use feature 'say';
use HTML::TreeBuilder;
my $content = join '', <DATA>; # join in general (not needed with one line)
my ($num, $year, $number) = (1, 2019, 23);
my $new_src_base = "images/${year}_${number}_$num";
my $tree = HTML::TreeBuilder->new_from_content($content);
my @nodes = $tree->look_down(_tag => 'img');
for my $node (@nodes) {
    my ($ext) = $node->attr('src') =~ m{.*/.*\.(.*)\z}; #/
    # Number each image in turn; attr() with two arguments sets 'src' and returns the old value
    my $orig_src = $node->attr('src', "images/${year}_${number}_" . $num++ . ".$ext");
    $node->replace_with($node);
    # my $imageurl = $orig_src; # fetch the image etc...
    # $mech->get($imageurl);
}
say $tree->as_HTML; # to inspect; otherwise print to file
__DATA__
<P><div class="row" style="text-align:center"><a target="_blank" href="/image/SG0092a.gif"> <img src="/image/SG0092a.gif" alt="graphic image" class="img-responsive graphic"/></a></div> <div class="row" style="text-align:center"><a target="_blank" href="/image/SG0092b.gif"> <img src="/image/SG0092b.gif" alt="graphic image" class="img-responsive graphic"/></a></div> <div class="row" style="text-align:center"><a target="_blank" href="/image/SG0092c.gif"> <img src="/image/SG0092c.gif" alt="graphic image" class="img-responsive graphic"/></a></div> <div class="row" style="text-align:center"><a target="_blank" href="/image/SG0092d.gif"> <img src="/image/SG0092d.gif" alt="graphic image" class="img-responsive graphic"/></a></div> <div class="row" style="text-align:center"><a target="_blank" href="/image/SG0092e.gif"> <img src="/image/SG0092e.gif" alt="graphic image" class="img-responsive graphic"/></a></div>
For the new src value I copied what I could infer from the OP's code. The code in the question leaves the href attribute of the link unchanged (it still points to the same gif), so this code leaves it unchanged as well.
There are other tools for this; see this post for more, for example.
The above could run into problems related to weak references in older versions of HTML::Element (see its documentation). In that case this should be safer:
for my $node (@nodes) {
    my ($ext) = ( $node->attr('src') ) =~ m{.*/.*\.(.*)\z}; #/
    my $copy = $node->clone;
    my $orig_src = $copy->attr('src', "images/${year}_${number}_" . $num++ . ".$ext");
    $node->replace_with($copy)->delete;
    ...
}
Using Mojo::DOM:
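use strict;
use warnings;
use Mojo::DOM;
use Mojo::URL;
use Mojo::UserAgent;
# $content, $url, $year and $number are as in the question
my $ua   = Mojo::UserAgent->new;
my $base = Mojo::URL->new($url);
my $dom  = Mojo::DOM->new($content);
my $num  = 1;
foreach my $img ($dom->find('img[src]')->each) {
    next unless $img->{src} =~ m/\.([a-zA-Z]+)\z/;
    my $ext  = $1;
    my $path = "images/${year}_${number}_$num.$ext";
    # The src values are relative ("/image/..."), so resolve them against the page URL
    my $abs = Mojo::URL->new($img->{src})->to_abs($base);
    $ua->get($abs)->result->save_to($path);
    $img->attr(src => $path);
    $num++;
}
print $dom->to_string;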
I would like to place an if statement within an echo and I am not quite sure how to do it. Here is the echo:
if (!$hideProduct) {
    echo '
    <div class="gearBorder">
        <img title="Sold Out" alt="Sold Out" class="soldOut" src="soldout.png" />
        <div class="gearInfo">
            <h4>' . $productName . '</h4>
            <p class="gearDesc">' . $productDescription . '</p>
            <p class="cost">$' . $productPrice . '</p>
        </div>
    </div>
    ';
}
I would like to wrap the image (the <img> line) in an if statement:
if ($productStatus == '0') {
}
What would be the best way to wrap the image in that statement? Thanks!
You can actually close control-flow blocks such as if statements in a different PHP block from the one they were opened in. For example, this should work:
<?php if (!$hideProduct) { ?>
<div class="gearBorder">
<?php if ($productStatus == '0') { ?>
<img title="Sold Out" ... />
<?php } ?>
...HTML...
</div>
<?php } ?>
If you don't like the curly braces, you can use PHP's alternative syntax instead: replace the opening brace with a colon (:) and the closing <?php } ?> with <?php endif; ?>. See this link for more information.
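Another option, if you want to keep a single echo as the question attempts, is a ternary expression inside the string concatenation. This is a sketch: the function name renderProduct() is hypothetical, and the htmlspecialchars() calls are an addition that escapes the product fields.

```php
<?php
// Build the product markup in one string; the "Sold Out" image is only
// included when $productStatus is '0'. renderProduct() is a hypothetical name.
function renderProduct(string $productName, string $productDescription,
                       string $productPrice, $productStatus): string
{
    return '<div class="gearBorder">'
        . ($productStatus == '0'
            ? '<img title="Sold Out" alt="Sold Out" class="soldOut" src="soldout.png" />'
            : '')
        . '<div class="gearInfo">'
        . '<h4>' . htmlspecialchars($productName) . '</h4>'
        . '<p class="gearDesc">' . htmlspecialchars($productDescription) . '</p>'
        . '<p class="cost">$' . htmlspecialchars($productPrice) . '</p>' // single quotes: $ is literal
        . '</div>'
        . '</div>';
}

echo renderProduct('Tent', 'Two-person tent', '199.00', '0'), "\n";
```

The parentheses around the ternary matter: without them, the concatenation operator would bind to the condition instead of the result.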
Use an array to hold the CSS classes (with background-image) for each $productStatus.
It is fast and efficient. There is a performance hit when you toggle from HTML mode to PHP mode, and this method also eliminates the if/elseif performance hit in the Intel microcode.
<style type="text/css">
.soldout{width:40px;height:40px;background-image: url('soldout.png');}
.backorder{width:40px;height:40px;background-image: url('backorder.png');}
.instock{width:40px;height:40px;background-image: url('instock.png');}
</style>
$icon = array(' class="soldout" ',' class="backorder" ',' class="instock" ');
echo '<div class="gearBorder"><div ' . $icon[$productStatus] . '></div><div class="gearInfo"> ... ';
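The array lookup above assumes $productStatus is always 0, 1 or 2. A slightly more defensive sketch uses a keyed array with a fallback for unknown values; the status meanings and the helper name statusIcon() are assumptions for illustration.

```php
<?php
// Assumed mapping: 0 = sold out, 1 = back order, 2 = in stock.
const STATUS_CLASSES = [
    0 => 'soldout',
    1 => 'backorder',
    2 => 'instock',
];

// Returns the status icon div, or an empty string for an unknown status.
function statusIcon($productStatus): string
{
    $class = STATUS_CLASSES[(int) $productStatus] ?? null;
    return $class === null ? '' : '<div class="' . $class . '"></div>';
}

echo '<div class="gearBorder">' . statusIcon('0') . '<div class="gearInfo"> ... </div></div>', "\n";
```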
I would also use a 4-bit color GIF icon and embed it as a Base64 data URI.
Make sure the page is served with gzip compression, and there will be little to no penalty for the Base64 encoding.
If you want to stick with image files, make sure images are served with a large cache max-age value.
background-image: url('data:image/gif;base64,R0lGODlhKAAoAK...');
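To produce such a data URI, PHP's base64_encode() can be applied to the raw bytes of the icon. A minimal sketch; the helper name dataUri() and the file name soldout.gif are hypothetical.

```php
<?php
// Build a data URI suitable for CSS url() from raw image bytes.
function dataUri(string $bytes, string $mime): string
{
    return 'data:' . $mime . ';base64,' . base64_encode($bytes);
}

// Usage with a file on disk (soldout.gif is a hypothetical file name):
// $uri = dataUri(file_get_contents('soldout.gif'), 'image/gif');
// echo ".soldout{background-image: url('$uri');}";
```

Every GIF starts with the bytes "GIF89a" (or "GIF87a"), which is why GIF data URIs begin with R0lGODlh.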
I've got a bunch of strings already extracted from an HTML file, for example:
<img alt="" src="//i.imgur.com/tApg8ebb.jpg" title="Some manly skills for you guys<p><span class='points-q7Vdm'>18,736</span> <span class='points-text-q7Vdm'>points</span> : 316,091 views</p>">
<img src="//i.imgur.com/SwmwL4Gb.jpg" width="48" height="48">
<img src="//s.imgur.com/images/blog_rss.png">
I am trying to write a regular expression that will grab the src="URL" part of the img tag so that I can replace it later based on a few other conditions. The many quotation marks are giving me the biggest problem. I'm still relatively new to regex, so a lot of the tricks are beyond my knowledge.
Thanks in advance
Use DOM or another parser for this, don't try to parse HTML with regular expressions.
Example:
$html = <<<DATA
<img alt="" src="//i.imgur.com/tApg8ebb.jpg" title="Some manly skills for you guys<p><span class='points-q7Vdm'>18,736</span> <span class='points-text-q7Vdm'>points</span> : 316,091 views</p>">
<img src="//i.imgur.com/SwmwL4Gb.jpg" width="48" height="48">
<img src="//s.imgur.com/images/blog_rss.png">
DATA;
$doc = new DOMDocument();
$doc->loadHTML($html); // load the html
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//img');
foreach ($imgs as $img) {
echo $img->getAttribute('src') . "\n";
}
Output
//i.imgur.com/tApg8ebb.jpg
//i.imgur.com/SwmwL4Gb.jpg
//s.imgur.com/images/blog_rss.png
If you would rather store the results in an array, you could do:
$sources = array();
foreach ($imgs as $img) {
    $sources[] = $img->getAttribute('src');
}
print_r($sources);
Output
Array
(
[0] => //i.imgur.com/tApg8ebb.jpg
[1] => //i.imgur.com/SwmwL4Gb.jpg
[2] => //s.imgur.com/images/blog_rss.png
)
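The question also mentions replacing the URL later based on other conditions. With the DOMDocument approach above, that is a setAttribute() call instead of a regex substitution. In this sketch the helper name rewriteSrc() is hypothetical, and the condition (turning protocol-relative "//host/..." URLs into "https://host/...") is only an example.

```php
<?php
// Rewrite img src attributes with DOMDocument instead of a regex.
// The rule below is an example condition, not part of the original question.
function rewriteSrc(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);           // ignore warnings about the fragment
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if (strpos($src, '//') === 0) {         // protocol-relative URL
            $img->setAttribute('src', 'https:' . $src);
        }
    }

    // Serialise only the original fragment (the children of <body>).
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo rewriteSrc('<img src="//s.imgur.com/images/blog_rss.png">'), "\n";
```

The quotes inside the title attribute of the first example are no problem here, because the parser, not a pattern, decides where each attribute ends.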
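$pattern = '/<img.+?src="([\w\/.\-]+)"/';
I'm not sure which language you're using, so the quoting syntax will vary. In PHP, the / inside the character class has to be escaped, because / is also the pattern delimiter.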
I have a blog post in WordPress for an audio podcast, like this:
[audio mp3="http://www.andrewbusch.com/wp-content/uploads/show_6195505.mp3"][/audio]
We talk the latest with forward guidance in the Federal Reserve with Wall Street Journal reporter, Jon Hilsenrath.
I also want to make a download link for the above MP3 URL.
How do I extract the URL from the above content?
I tried the following, but it is not working:
<div class="banner-text">
<h1><?php the_title(); ?></h1>
<?php the_content(); ?>
<?php
$content = get_the_content( $more_link_text, $stripteaser );
preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $content, $match);
$match = $match[0];
?>
<div class="banner-download"><span class="displace">Download</span></div>
</div>
Any help?
Interesting question. My instinct would be to use as much of WordPress's built in shortcode functionality as possible to do the work. The get_post_galleries function seems to do something similar. So something like this seems to do what you want:
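<?php
// Assumes $content is already set
preg_match_all( '/' . get_shortcode_regex() . '/s', $content, $matches, PREG_SET_ORDER );
foreach ( $matches as $match ) {
    if ( 'audio' === $match[2] ) {
        $tags = shortcode_parse_atts( $match[3] );
        // shortcode_parse_atts() can return a string when there are no attributes
        if ( is_array( $tags ) && isset( $tags['mp3'] ) ) {
            echo $tags['mp3'];
        }
    }
}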
You might need to refine it, depending on how you want to handle multiple audio shortcodes.
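Outside WordPress (or to try the idea in plain PHP), a narrower regex that only looks at the mp3 attribute of an [audio] shortcode is a possible fallback. This sketch assumes the attribute value is always double-quoted, as in the question; the helper name audioMp3Urls() is hypothetical.

```php
<?php
// Returns every mp3="..." URL found inside [audio ...] shortcodes.
function audioMp3Urls(string $content): array
{
    preg_match_all('/\[audio\b[^\]]*\bmp3="([^"]+)"/i', $content, $m);
    return $m[1];   // capture group 1 holds the URLs
}

$content = '[audio mp3="http://www.andrewbusch.com/wp-content/uploads/show_6195505.mp3"][/audio]';
foreach (audioMp3Urls($content) as $url) {
    echo '<a href="' . htmlspecialchars($url) . '" download>Download</a>', "\n";
}
```

Unlike the generic URL pattern in the question, this only matches inside the shortcode, so other links in the post body are ignored.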
How do you find the following div using regex? The URL and image location will change with every post URL, so I need to use a wildcard.
I must use a regular expression because I am limited in what I can use due to the software I am using: http://community.autoblogged.com/entries/344640-common-search-and-replace-patterns
<div class="tweetmeme_button" style="float: right; margin-left: 10px;"> <br /> <img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fjumpinblack.com%2F2011%2F11%2F25%2Fdrake-and-rick-ross-you-only-live-once-ep-mixtape-2011-download%2F&source=jumpinblack1&style=compact&b=2" height="61" width="50" /><br /> </div>
I tried using:
<div class="tweetmeme_button" style="float: right; margin-left: 10px;">.*<\/div>
Using regular expressions to process HTML is a bad idea. I'm using HTML::TreeBuilder::XPath for this:
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->get("http://www.someURL.com");
my $tree = HTML::TreeBuilder::XPath->new_from_content( $mech->content() );
my $div = $tree->findnodes( '//div[@class="tweetmeme_button"]')->[0];
Use an HTML parser to parse HTML, for example HTML::TokeParser::Simple or HTML::TreeBuilder::XPath, among many others.
E.g.:
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TokeParser::Simple;
my $parser = HTML::TokeParser::Simple->new( ... );
while (my $div = $parser->get_tag) {
next unless $div->is_start_tag('div');
{
no warnings 'uninitialized';
next unless $div->get_attr('class') eq 'tweetmeme_button';
next unless $div->get_attr('style') eq 'float: right; margin-left: 10px;'
# now do what you want until the next </div>
}
}
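If a regular expression really is the only option, as the question says, the main problem with the attempt is the greedy .* which runs to the last </div> in the document; a lazy .*? stops at the first one. This PHP sketch assumes the tool accepts PCRE-style patterns like preg_replace() does (its delimiter and flag syntax may differ) and that the button div contains no nested </div>.

```php
<?php
// Lazy .*? stops at the first </div> after the opening tag; the s flag lets
// . also match newlines. [^>]* skips the style attribute, whatever it contains.
$pattern = '#<div class="tweetmeme_button"[^>]*>.*?</div>#s';

$html = 'before<div class="tweetmeme_button" style="float: right; margin-left: 10px;"> <br /> '
      . '<img src="http://api.tweetmeme.com/imagebutton.gif?url=x&source=y" height="61" width="50" /><br /> </div>'
      . '<div>keep me</div>';

echo preg_replace($pattern, '', $html), "\n";  // before<div>keep me</div>
```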