I am building an email template in which I include some local pictures.
I am trying to write a shell script to convert my images to base64. So basically, automatically turn:
<img width="100%" src="./img/my_image.gif" />
to
<img width="100%" src="" />
For now I used this script:
#!/bin/bash
awk -F'[()]' -v q="'" '
/src="(.*)"/ {
cmd=sprintf("openssl enc -base64 -in %s | tr -d %c\\n%c",$2,q,q)
cmd | getline b64
close(cmd)
$0=$1 "(data:image/gif;base64," b64 ");"
}1' ./my_template.html
I run into two issues:
- my regex doesn't seem to be correct even though it worked fine on regex101
- this regex would also catch the images which are not local (src="https://....")
How can I tweak it to make it work here?
I don't know your file, but IMHO editing HTML with awk isn't the best idea. In general, I would use a better-suited tool, such as Perl or a real parser.
Here is an example using xmlstarlet. The following script:
#!/bin/bash
htmlfile=t.html
encode_image() {
    local img="$1"
    local ext="${img##*.}"
    # Quote the substitution so the base64 payload is passed as one argument.
    printf 'data:image/%s;base64,%s' "$ext" "$(openssl base64 -A -in "$img")"
}
while read -r src; do
    encoded=$(encode_image "$src")
    xmlstarlet ed --inplace -u "//img[@src='$src']/@src" -v "$encoded" "$htmlfile"
done < <(xmlstarlet sel -t -v '//img/@src' -n "$htmlfile")
from this t.html
<!DOCTYPE HTML>
<html>
<head>
<meta charset="UTF-8"/>
</head>
<body>
<p>bla</p>
<img width="100%" src="./img/my_image.gif" />
<p>otherbla</p>
<img width="100%" src="./img/my_image2.gif" />
</body>
</html>
creates this:
<!DOCTYPE HTML>
<html>
<head>
<meta charset="UTF-8"/>
</head>
<body>
<p>bla</p>
<img width="100%" src="....="/>
<p>otherbla</p>
<img width="100%" src="...="/>
</body>
</html>
Of course, the HTML must be correctly formatted, otherwise the parser will die.
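For comparison, here is a minimal Python sketch of the same idea that also skips non-local images (the second issue raised in the question). It is only an illustration under some assumptions: the function name `inline_local_images` and the `base_dir` parameter are made up for this example, the `src` values are double-quoted, and the MIME type is guessed from the file extension. It uses a regex for brevity, so it shares the usual limits of regex-based HTML handling.

```python
import base64
import mimetypes
import re
from pathlib import Path
from urllib.parse import urlparse

# Matches the src attribute of an <img> tag; group 2 is the attribute value.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=")([^"]*)(")', re.IGNORECASE)


def inline_local_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> paths with base64 data URIs."""

    def to_data_uri(match: re.Match) -> str:
        src = match.group(2)
        # Leave remote (http, https, ...) and already inlined (data:) images alone.
        if urlparse(src).scheme or src.startswith("//"):
            return match.group(0)
        path = base_dir / src
        # Leave the tag untouched if the file cannot be found.
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return IMG_SRC.sub(to_data_uri, html)
```

Usage would be along the lines of `inline_local_images(Path("t.html").read_text(), Path("."))`, writing the result back to the file afterwards.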
Hello I have a html file with several img tags:
<img src="https://www.pokeyplay.com/imagenes/backend/publicidad.gif" alt="Publicidad" align="left" />
<img src="https://www.pokeyplay.com/imagenes/backend/spacer.gif" alt="sp" />
<img src="imagenes/backend/etiqueta-pyp-pokedex.gif" alt="P&P PokéDex" width="184" height="100" />
<img src="imagenes/backend/spacer.gif" alt="sp" />
<img src="http://urpgstatic.com/img_library/pokemon_sprites/187.png" style="vertical-align:middle" />
In order to extract all img tags I am using the following regexp:
'<img[^>]* src=\"([^\"]*)\"[^>]*>'
But I want to extract only all IMG tags from urpgstatic.com
How can I do this?
I made several attempts like this:
<img.*?src="(http[s]?:\/\/)urpgstatic.com?([^\/\s]+\/)(.*)[png]$"[^\>]+>
Thanks
Try this
<img[^>]*(?=\"https?:\/\/(www\.)?urpgstatic\.com)\"([^\"]*)\"[^>]*>
Also, this will work with grep
grep -iP '<img[^>]*(?=\"https?:\/\/(www\.)?urpgstatic\.com)\"([^\"]*)\"[^>]*>' index.html
You may use this grep command:
grep -ioE '<img [^>]*src="https?://(www\.)?urpgstatic\.com/[^>]*>' file.html
<img src="http://urpgstatic.com/img_library/pokemon_sprites/187.png" style="vertical-align:middle" />
Please remember, though, that parsing HTML with a regex is error-prone; using an HTML parser such as DOM in PHP is more reliable.
RegEx Details:
<img [^>]*src=: Match <img <anything-except->src= text
"https?://: Match http:// or https://
(www\.)?urpgstatic\.com/: Match optional www. followed by urpgstatic.com/
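As a parser-based alternative to the regex, here is a small Python sketch using only the standard library. The class and function names (`ImgSrcCollector`, `img_sources_from`) are hypothetical, chosen for this example; it compares the host of each `src` against the wanted domain, accepting the optional `www.` prefix like the regex above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> whose host matches a wanted domain."""

    def __init__(self, domain: str):
        super().__init__()
        self.domain = domain
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        host = (urlparse(src).hostname or "").lower()
        # Accept the bare domain and the optional www. prefix.
        if host in (self.domain, "www." + self.domain):
            self.sources.append(src)


def img_sources_from(html: str, domain: str) -> list:
    collector = ImgSrcCollector(domain)
    collector.feed(html)
    collector.close()
    return collector.sources
```

Relative paths such as imagenes/backend/spacer.gif have no host at all, so they are skipped without any extra pattern.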
This is my test file. In this file, there is a newline character right after the opening of a script tag. I want to replace that character with a space. I want to write a shell script so that this kind of error can be removed.
<html>
<head>
<script
type="text/javascript" sfsf="test" src="http://test.mydomain.com/test"></script>
</head>
<body>
<script type="text/javascript" src = "http://test.com/public//test"></script>
</body>
</html>
The output should be like this:
<html>
<head>
<script type="text/javascript" sfsf="test" src="http://test.mydomain.com/test"></script>
</head>
<body>
<script type="text/javascript" src = "http://test.com/public//test"></script>
</body>
</html>
I found the solution myself:
cat file | tr '\n' ' ' | sed 's/> \?/>\n/g'
If you can use perl:
perl -n0e 's/(script)(\s+)/$1 /g; print' file
or you can omit the "print" command if you use -p instead of -n:
perl -p0e 's/(script)(\s+)/$1 /g;' file
Use tr to remove newlines and carriage returns, then strip the whitespace after the closing script tag with sed (-E is needed so that + is treated as a quantifier).
cat file | tr -d '\n' | tr -d '\r' | sed -E 's#</script>[[:space:]]+#</script>#g'
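The Perl one-liner above (collapse any whitespace after the word "script" into a single space) can be sketched in Python as follows. This is only an illustration; the function name `join_script_tags` is made up, and it only touches the whitespace directly after `<script`, leaving every other newline in the file alone.

```python
import re

# Whitespace (including newlines) between "<script" and its first attribute.
SCRIPT_OPEN = re.compile(r"(<script)\s+", re.IGNORECASE)


def join_script_tags(html: str) -> str:
    """Replace the whitespace after an opening <script with one space."""
    return SCRIPT_OPEN.sub(r"\1 ", html)
```

Reading the file, passing its contents through the function, and writing the result back gives the output asked for in the question.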
There are about 100 files and I need to go through each of them and delete all the data which is between <style> and </style> + delete these tags too.
For example
<html>
<head> <title> Example </title> </head>
<style>
p{color: red;
background-color: #FFFF;
}
div {......
...
}
</style>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
should become
<html>
<head> <title> Example </title> </head>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
Also, in some files the style pattern is like
<style type="text/css"> blah </style>
or
<link rel="stylesheet" type="text/css" href="$url_path/gridsorting.css">
I need to remove all 3 patterns. How do I do this in Perl?
use strict;
use warnings;
use XML::LibXML qw( );
my $qfn = 'a.html';
my $doc = XML::LibXML->load_html( location => $qfn );
my $root = $doc->documentElement();
for my $style_node ($root->findnodes('//style')) {
$style_node->parentNode()->removeChild($style_node);
}
{
open(my $fh, '>', $qfn)
or die;
print($fh $doc->toStringHTML());
}
It correctly handles:
style elements with attributes or spaces in the tag,
style elements that span more than one line,
style tags that span more than one line,
lines that contain part of a style element and something else,
documents with multiple style elements,
something that looks like a style tag in attribute values,
something that looks like a style tag in CDATA blocks, and
something that looks like a style tag in comments.
As of this update, the other solutions only handle 2 or 3 of these.
Ikegami is right, you really should use at least an HTML/XML parser to do this task. Personally I like using the Mojo::DOM parser. This is a Document-Object Model interface to your HTML and it supports CSS3 selectors, making it really flexible when you need it. This is a pretty easy one for it however:
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::DOM;
my $content = <<'END';
<html>
<head> <title> Example </title> </head>
<style>
p{color: red;
background-color: #FFFF;
}
div {......
...
}
</style>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
END
my $dom = Mojo::DOM->new( $content );
$dom->find('style')->pluck('remove');
print $dom;
The pluck method is a little confusing, but it's really just shorthand for calling a method on each resulting object. The analogous line could be
$dom->find('style')->each(sub{ $_->remove });
which is a little more understandable but less cute.
After reading your edit saying that you have to deal with more than just the basic form, I have to stress even further that this is why you use a parser for modifying HTML rather than letting your regex grow to ridiculous proportions.
Now let's say that the $content variable also contained these lines
<link rel="stylesheet" type="text/css" href="$url_path/gridsorting.css">
<link rel="icon" href="somefile.jpg">
where you want to remove the first one, and not the second. You can do this in one of two ways.
$dom->find('link')->each( sub{ $_->remove if $_->{rel} eq 'stylesheet' } );
This mechanism uses the object methods (Mojo::DOM exposes attributes as hash keys) to remove only the link tags which have rel=stylesheet. However, since Mojo::DOM has full CSS3 selector support, you can instead use a selector that finds only those elements:
$dom->find('link[rel=stylesheet]')->pluck('remove');
CSS3 selector statements can be joined with a comma to find all tags matching either selector, so we can simply include the line
$dom->find('style, link[rel=stylesheet]')->pluck('remove');
and get rid of all your offensive stylesheets in one fell swoop!
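For readers without Perl at hand, the same "use a parser, not a regex" idea can be sketched with Python's standard library. This is an illustrative sketch, not a port of Mojo::DOM: the names `StyleStripper` and `strip_styles` are made up here, and the approach is to re-emit everything the parser sees except style elements and link tags whose rel contains stylesheet.

```python
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-emit a document while dropping <style> elements and stylesheet links."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_style = False

    @staticmethod
    def _is_stylesheet_link(tag, attrs):
        rel = (dict(attrs).get("rel") or "").lower().split()
        return tag == "link" and "stylesheet" in rel

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True
        elif not self._is_stylesheet_link(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <link ... />.
        if tag != "style" and not self._is_stylesheet_link(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser hands over the CSS inside <style> as plain data; skip it.
        if not self.in_style:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_styles(html: str) -> str:
    parser = StyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Looping over the roughly 100 files and writing `strip_styles(text)` back to each one would cover all three patterns from the question. The lines that held the removed tags are left behind as blank lines.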
One more possible solution is to use HTML::TreeBuilder.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder 5; # Ensure weak references in use
foreach my $file_name (@ARGV) {
my $tree = HTML::TreeBuilder->new; # empty tree
$tree->parse_file($file_name);
# print "Hey, here's a dump of the parse tree of $file_name:\n";
# $tree->dump; # a method we inherit from HTML::Element
foreach my $e ($tree->look_down(_tag => "style")) {
$e->delete();
}
foreach my $e ($tree->look_down(_tag => "link", rel => "stylesheet")) {
$e->delete();
}
print "And here it is, bizarrely rerendered as HTML:\n",
$tree->as_HTML, "\n";
# Now that we're done with it, we must destroy it.
$tree = $tree->delete; # Not required with weak references
}
One way using sed:
sed '/<style>/,/<\/style>/d' file.txt
Results:
<html>
<head> <title> Example </title> </head>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
perl -lne 'print unless(/<style>/.../<\/style>/)' your_file
tested below:
> cat temp
<html>
<head> <title> Example </title> </head>
<style>
p{color: red;
background-color: #FFFF;
}
div {......
...
}
</style>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
> perl -lne 'print unless(/<style>/.../<\/style>/)' temp
<html>
<head> <title> Example </title> </head>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
>
If you want to do it in place, then:
perl -i -lne 'print unless(/<style>/.../<\/style>/)' your_file
I figured out one way, you can try the following:
#! /usr/bin/perl -w
use strict;
my $line = << 'END';
<html>
<head> <title> Example </title> </head>
<style>
p{color: red;
background-color: #FFFF;
}
div {......
...
}
</style>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
END
$line =~ s{<style[^>]*.*?</style>.}{}gs;
print $line;
I'm new to CodeIgniter, and I wonder how I can load my own XHTML template. I worked with CakePHP earlier, where adding your own template was pretty easy, but I switched to CodeIgniter since I've read it's a lot better and has a 'better future'. I searched the wiki, but the tutorials there didn't give me enough information.
put public folders in root directory,
index.php
application/
system/
images/
js/
css/
now include js like this: <script src="<?php echo base_url();?>js/jquery.js"></script>
for css: <link href="<?php echo base_url();?>css/style.css" rel="stylesheet" type="text/css" />
and for images: <img src="<?php echo base_url();?>images/1.jpg" />
The fastest and simplest way to display a page is as follows:
in controller:
$data['body'] = "welcome";
$this->load->view('page', $data);
Now create page.php inside the views folder:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="es">
<head>
<title>Template codeigniter</title>
<script src="<?php echo base_url();?>js/jquery.js"></script>
<link href="<?php echo base_url();?>css/style.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div>
<?=$body?>
</div>
<div class="clear"></div>
<div>Footer</div>
</body>
</html>
Loading templates is best described in the CI doc regarding templates.
http://codeigniter.com/user_guide/libraries/parser.html
$this->load->library('parser');
$data['val1'] = 'some string';
$data['val2'] = 2012;
$this->load->view('my_xhtml', $data);
Now, in your template, the PHP variables $val1 and $val2 are available for use in the dynamic elements of your HTML.
When a user posts a link to my webpage to share on Facebook, Facebook scans my webpage/site and offers up some images found in that page. The user selects one to associate with the fb post.
Can I control which images on my web page facebook will offer up to the user to use in a post?
A feature that looks close is Open Graph Tags, http://developers.facebook.com/docs/opengraph/ , but it doesn't seem to meet my particular need, as it seems to allow just one image to represent the page.
Yes, you can do this by building a "super proxy": a page that takes dynamic information from your application and generates the metadata the Facebook sharer needs. You can see this working at http://concursos.genommalab.com/soyflaca/. I made this website, and since all the content comes from a single page, I used a deep-link technique so that each item gets its own customized share data: the share link points to my super proxy, which generates the static content for Facebook's sharer.php. Here is the code:
<?php
// get our URL query parameters
$current_path = 'http://' . dirname($_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']);
$title = $_GET['t'];
$diet_id = $_GET['diet_id'];
$desciption = $_GET['desc'];
$image_thumb = $_GET['thumb'];
$shared_url = $_GET['surl'];
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title><?php echo $title;?></title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta name="title" content="<?php echo $title;?>" />
<meta name="description" content="<?php echo $desciption;?>" />
<meta http-equiv="refresh" content="1;URL=<?php echo $current_path . '/#diet_' . $diet_id; ?>" />
<link rel="image_src" href="<?php echo $image_thumb; ?>" />
</head>
<body>
<img src="<?php echo $image_thumb; ?>" alt="<?php echo 'Imagen de ' . $title; ?>" width="112" height="112" style="visibility: hidden;"/>
</body>
</html>
Or perhaps all you want to do is add a list of images available to Facebook, like this:
<link rel="image_src" href="images/example_1.jpg" />
<link rel="image_src" href="images/example_2.jpg" />
<link rel="image_src" href="images/example_3.jpg" />
<link rel="image_src" href="images/example_4.jpg" />
<link rel="image_src" href="images/example_5.jpg" />
Put them right inside the <head> </head> tags. Remember, these are scanned by Facebook's sharer.php.
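The head section that such a proxy page returns can be sketched in Python as below. This is a rough illustration under assumptions, not the code behind the site above: the function name `share_page_head` is hypothetical, and the `rel="image_src"` link form is my assumption about the legacy convention the sharer reads. The values are HTML-escaped because, as in the PHP version, they arrive from the query string.

```python
from html import escape


def share_page_head(title: str, description: str, image_urls: list) -> str:
    """Build the <head> markup a share proxy page would return."""
    lines = [
        "<head>",
        f"<title>{escape(title)}</title>",
        f'<meta name="title" content="{escape(title)}" />',
        f'<meta name="description" content="{escape(description)}" />',
    ]
    # One link tag per candidate image (rel="image_src" is an assumption).
    for url in image_urls:
        lines.append(f'<link rel="image_src" href="{escape(url)}" />')
    lines.append("</head>")
    return "\n".join(lines)
```

Calling it with a list of several image paths produces one link tag per image, which matches the list shown above.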