Replace local images with base 64

Replace local images with base 64 - regex

I am building some email templates in which I include some local pictures.
I am trying to write a shell script that converts my images to base64. So basically, it should automatically turn:
<img width="100%" src="./img/my_image.gif" />
to
<img width="100%" src="data:image/gif;base64,XXXXXXX" />
For now I used this script:
#!/bin/bash
awk -F'[()]' -v q="'" '
/src="(.*)"/ {
cmd=sprintf("openssl enc -base64 -in %s | tr -d %c\\n%c",$2,q,q)
cmd | getline b64
close(cmd)
$0=$1 "(data:image/gif;base64," b64 ");"
}1' ./my_template.html
I run into two issues:
- my regex doesn't seem to be correct even though it worked fine on regex101
- this regex would also catch the images which are not local (src="https://....")
How can I tweak it to make it work here?

I don't know your file, but IMHO editing HTML with awk isn't the best idea. In general, I would use a better tool, such as Perl.
Here is an example using xmlstarlet. The following script:
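#!/bin/bash
htmlfile=t.html
# Print a data URI for the image file given as $1.
encode_image() {
    local img="$1"
    local ext="${img##*.}"
    # -A keeps the base64 output on a single line
    printf 'data:image/%s;base64,%s' "$ext" "$(openssl base64 -A -in "$img")"
}
# List every img src attribute, then rewrite each one in place.
while read -r src; do
    encoded=$(encode_image "$src")
    xmlstarlet ed --inplace -u "//img[@src='$src']/@src" -v "$encoded" "$htmlfile"
done < <(xmlstarlet sel -t -v '//img/@src' -n "$htmlfile")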
from this t.html
<!DOCTYPE HTML>
<html>
<head>
<meta charset="UTF-8"/>
</head>
<body>
<p>bla</p>
<img width="100%" src="./img/my_image.gif" />
<p>otherbla</p>
<img width="100%" src="./img/my_image2.gif" />
</body>
</html>
create this:
<!DOCTYPE HTML>
<html>
<head>
<meta charset="UTF-8"/>
</head>
<body>
<p>bla</p>
<img width="100%" src="data:image/gif;base64,iVB....="/>
<p>otherbla</p>
<img width="100%" src="data:image/gif;base64,iVBO...="/>
</body>
</html>
Of course, the HTML must be correctly formatted, otherwise the parser will die.
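For comparison, here is a minimal Python sketch of the same replacement that also addresses the second issue in the question by leaving non-local sources (http://, https://, data:, protocol-relative //) untouched. It is not the approach from the answer above: it uses a regular expression rather than an XML parser, so it only suits simple templates where src is a double-quoted attribute. The function name inline_local_images and the base_dir parameter are made up for this illustration.

```python
import base64
import mimetypes
import re
from pathlib import Path

# <img ... src="VALUE": group 1 is everything up to the value, group 2 the value.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=")([^"]+)(")', re.IGNORECASE)
# Anything with a URL scheme (http:, https:, data:, ...) or starting with //.
REMOTE = re.compile(r'^(?:[a-z][a-z0-9+.-]*:|//)', re.IGNORECASE)


def inline_local_images(html, base_dir):
    """Return html with local <img> sources replaced by base64 data URIs."""
    base_dir = Path(base_dir)

    def to_data_uri(match):
        src = match.group(2)
        if REMOTE.match(src):
            return match.group(0)  # remote or already inlined: keep as-is
        path = base_dir / src
        if not path.is_file():
            return match.group(0)  # missing file: keep the original tag
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return IMG_SRC.sub(to_data_uri, html)
```

A call such as inline_local_images(Path("my_template.html").read_text(), ".") would return the rewritten template as a string; writing it back to disk is left to the caller.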

Related

How can I fix this regex in order to get html tag only from a particular url?

Hello I have a html file with several img tags:
<img src="https://www.pokeyplay.com/imagenes/backend/publicidad.gif" alt="Publicidad" align="left" />
<img src="https://www.pokeyplay.com/imagenes/backend/spacer.gif" alt="sp" />
<img src="imagenes/backend/etiqueta-pyp-pokedex.gif" alt="P&P PokéDex" width="184" height="100" />
<img src="imagenes/backend/spacer.gif" alt="sp" />
<img src="http://urpgstatic.com/img_library/pokemon_sprites/187.png" style="vertical-align:middle" />
In order to extract all img tags I am using the following regexp:
'<img[^>]* src=\"([^\"]*)\"[^>]*>'
But I want to extract only the img tags from urpgstatic.com.
How can I do this?
I made several attempts like this:
<img.*?src="(http[s]?:\/\/)urpgstatic.com?([^\/\s]+\/)(.*)[png]$"[^\>]+>
Thanks

Try this
<img[^>]*(?=\"https?:\/\/(www\.)?urpgstatic\.com)\"([^\"]*)\"[^>]*>
Also, this will work with grep
grep -iP '<img[^>]*(?=\"https?:\/\/(www\.)?urpgstatic\.com)\"([^\"]*)\"[^>]*>' index.html

You may use this grep command:
grep -ioE '<img [^>]*src="https?://(www\.)?urpgstatic\.com/[^>]*>' file.html
<img src="http://urpgstatic.com/img_library/pokemon_sprites/187.png" style="vertical-align:middle" />
Please remember, though, that parsing HTML using regex may be error-prone, and using an HTML parser such as DOM in PHP is more reliable.
RegEx Details:
<img [^>]*src=: Match <img followed by any characters except >, then src=
"https?://: Match "http:// or "https://
(www\.)?urpgstatic\.com/: Match optional www. followed by urpgstatic.com/
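As a parser-based alternative to the regular expressions above, here is a small sketch using only Python's standard library. It collects the raw text of every img tag whose src points at a given host, with or without a leading www. The class and function names are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HostImageCollector(HTMLParser):
    """Collect the source text of <img> tags whose src is on a given host."""

    def __init__(self, host):
        super().__init__()
        self.host = host.lower()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        hostname = urlparse(src).hostname or ""  # None for relative paths
        if hostname in (self.host, "www." + self.host):
            self.tags.append(self.get_starttag_text())


def img_tags_from_host(html, host):
    collector = HostImageCollector(host)
    collector.feed(html)
    collector.close()
    return collector.tags
```

Comparing the parsed hostname rather than matching text means a relative path such as imagenes/backend/spacer.gif, or a URL that merely mentions the host in its path, is not picked up.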

Removing newline with space character using sed or tr command in script tag in a file

This is my test file. In it, there is a newline character right after the opening of a script tag. I want to replace that newline with a space character, and I want to write a shell script so that this kind of error can be removed.
<html>
<head>
<script
type="text/javascript" sfsf="test" src="http://test.mydomain.com/test"></script>
</head>
<body>
<script type="text/javascript" src = "http://test.com/public//test"></script>
</body>
</html>
The output should be like this:
<html>
<head>
<script type="text/javascript" sfsf="test" src="http://test.mydomain.com/test"></script>
</head>
<body>
<script type="text/javascript" src = "http://test.com/public//test"></script>
</body>
</html>
I found the solution myself:
cat file | tr '\n' ' ' | sed 's/> \?/>\n/g'

If you can use perl:
perl -n0e 's/(script)(\s+)/$1 /g; print' file
or you can omit the "print" command if you use -p instead of -n:
perl -p0e 's/(script)(\s+)/$1 /g;' file

Use tr to remove newlines and carriage returns, then strip the whitespace after each closing script tag with sed.
tr -d '\n\r' < file | sed -E 's#(</script>)[[:space:]]+#\1#g'
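The commands above flatten the whole file. A narrower variant, sketched here in Python instead of sed or tr, joins only the line breaks that fall inside a tag and leaves the breaks between tags alone. The function name is made up, and the pattern assumes no attribute value contains < or >; JavaScript inside a script element that uses both < and > could also be matched by mistake.

```python
import re

# A tag: "<", then anything except angle brackets, then ">".
TAG = re.compile(r"<[^<>]+>")
# A line break together with the whitespace around it.
LINE_BREAK = re.compile(r"\s*[\r\n]+\s*")


def join_multiline_tags(html):
    """Replace line breaks inside tags with a single space."""
    return TAG.sub(lambda m: LINE_BREAK.sub(" ", m.group(0)), html)
```

Applied to the test file from the question, this turns the split script tag into a single line while keeping one element per line elsewhere.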

Delete the content between HTML tags including the tags themselves in Perl

There are about 100 files and I need to go through each of them and delete all the data which is between <style> and </style> + delete these tags too.
For example
<html>
<head> <title> Example </title> </head>
<style>
p{color: red;
background-color: #FFFF;
}
div {......
...
}
</style>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
should become
<html>
<head> <title> Example </title> </head>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
Also, in some files the style pattern is like
<style type="text/css"> blah </style>
or
<link rel="stylesheet" type="text/css" href="$url_path/gridsorting.css">
I need to remove all 3 patterns. How do I do this in Perl?

use strict;
use warnings;
use XML::LibXML qw( );
my $qfn = 'a.html';
my $doc = XML::LibXML->load_html( location => $qfn );
my $root = $doc->documentElement();
for my $style_node ($root->findnodes('//style')) {
$style_node->parentNode()->removeChild($style_node);
}
{
open(my $fh, '>', $qfn)
or die;
print($fh $doc->toStringHTML());
}
It correctly handles:
style elements with attributes or spaces in the tag,
style elements that span more than one line,
style tags that span more than one line,
lines that contain part of a style element and something else,
documents with multiple style elements,
something that looks like a style tag in attribute values,
something that looks like a style tag in CDATA blocks, and
something that looks like a style tag in comments.
As of this update, the other solutions only handle 2 or 3 of these.

Ikegami is right, you really should use at least an HTML/XML parser to do this task. Personally I like using the Mojo::DOM parser. This is a Document-Object Model interface to your HTML and it supports CSS3 selectors, making it really flexible when you need it. This is a pretty easy one for it however:
#!/usr/bin/env perl
use strict;
use warnings;
use Mojo::DOM;
my $content = <<'END';
<html>
<head> <title> Example </title> </head>
<style>
p{color: red;
background-color: #FFFF;
}
div {......
...
}
</style>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
END
my $dom = Mojo::DOM->new( $content );
$dom->find('style')->pluck('remove');
print $dom;
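The pluck method is a little confusing, but it's really just shorthand for calling a method on each resulting object. (In later Mojolicious releases, pluck was dropped from Mojo::Collection in favor of map, so ->map('remove') is the equivalent there.) The analogous line could be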
$dom->find('style')->each(sub{ $_->remove });
which is a little more understandable but less cute.
After reading your edit saying that you have to deal with more than just the basic form, I have to stress even further that this is why you use a parser for modifying HTML rather than letting your regex grow to ridiculous proportions.
Now let's say that the $content variable also contained these lines
<link rel="stylesheet" type="text/css" href="$url_path/gridsorting.css">
<link rel="icon" href="somefile.jpg">
where you want to remove the first one, and not the second. You can do this in one of two ways.
$dom->find('link')->each( sub{ $_->remove if $_->{rel} eq 'stylesheet' } );
This mechanism uses the object methods (and Mojo::DOM exposes attributes as hash keys) to remove only the link tags which have rel=stylesheet. However, you can also use CSS3 selectors to find only those elements, and since Mojo::DOM has full CSS3 selector support you can do
$dom->find('link[rel=stylesheet]')->pluck('remove');
CSS3 selector statements can be joined with a comma to find all tags matching either selector, so we can simply include the line
$dom->find('style, link[rel=stylesheet]')->pluck('remove');
and get rid of all your offensive stylesheets in one fell swoop!
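For readers without Perl modules at hand, the same idea (parse, then drop style elements and stylesheet links) can be sketched with Python's standard html.parser. This is a swapped-in technique, not Mojo::DOM: the parser is event-based, so the sketch re-emits everything it sees except the unwanted pieces. The class and function names are invented here, and constructs such as CDATA sections are not re-emitted by this minimal version.

```python
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-emit a document, skipping <style> elements and stylesheet <link>s."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_style = False

    def _emit(self, text):
        if not self.in_style:
            self.out.append(text)

    @staticmethod
    def _is_stylesheet_link(tag, attrs):
        rel = dict(attrs).get("rel") or ""
        return tag == "link" and "stylesheet" in rel.lower().split()

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True
        elif not self._is_stylesheet_link(tag, attrs):
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <link ... />
        if tag != "style" and not self._is_stylesheet_link(tag, attrs):
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def handle_pi(self, data):
        self._emit(f"<?{data}>")

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")


def strip_styles(html):
    parser = StyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because the style element's content is delivered as plain data while in_style is set, text that merely looks like a tag inside the stylesheet does not confuse the removal. A blank line is left behind where each removed element stood.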

One more possible solution is to use HTML::TreeBuilder.
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder 5; # Ensure weak references in use
foreach my $file_name (@ARGV) {
my $tree = HTML::TreeBuilder->new; # empty tree
$tree->parse_file($file_name);
# print "Hey, here's a dump of the parse tree of $file_name:\n";
# $tree->dump; # a method we inherit from HTML::Element
foreach my $e ($tree->look_down(_tag => "style")) {
$e->delete();
}
foreach my $e ($tree->look_down(_tag => "link", rel => "stylesheet")) {
$e->delete();
}
print "And here it is, bizarrely rerendered as HTML:\n",
$tree->as_HTML, "\n";
# Now that we're done with it, we must destroy it.
$tree = $tree->delete; # Not required with weak references
}

One way using sed:
sed '/<style>/,/<\/style>/d' file.txt
Results:
<html>
<head> <title> Example </title> </head>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>

perl -lne 'print unless(/<style>/.../<\/style>/)' your_file
tested below:
> cat temp
<html>
<head> <title> Example </title> </head>
<style>
p{color: red;
background-color: #FFFF;
}
div {......
...
}
</style>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
> perl -lne 'print unless(/<style>/.../<\/style>/)' temp
<html>
<head> <title> Example </title> </head>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
>
If you want to do it in place, then:
perl -i -lne 'print unless(/<style>/.../<\/style>/)' your_file

I figured out one way; you can try the following:
#! /usr/bin/perl -w
use strict;
my $line = << 'END';
<html>
<head> <title> Example </title> </head>
<style>
p{color: red;
background-color: #FFFF;
}
div {......
...
}
</style>
<body>
<p> hi I'm a paragraph. </p>
</body>
</html>
END
$line =~ s{<style[^>]*.*?</style>.}{}gs;
print $line;

What is the best way to load my own xhtml template in Codeigniter?

I'm new to CodeIgniter, and I wonder how I can load my own XHTML template. I was working with CakePHP earlier, and it was pretty easy to add your own template there, but I switched to CodeIgniter since I've read it's a lot better and has a 'better future'. I searched the wiki, but the tutorials there did not provide enough information for me.

Put the public folders in the root directory:
index.php
application/
system/
images/
js/
css/
Now include JS like this: <script src="<?php echo base_url();?>js/jquery.js"></script>
For CSS: <link href="<?php echo base_url();?>css/style.css" rel="stylesheet" type="text/css" />
And for images: <img src="<?php echo base_url();?>images/1.jpg" />
The fastest and simplest way to display a page is as follows.
In the controller:
$data['body'] = "welcome";
$this->load->view('page', $data);
Now create page.php inside the views folder:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="es">
<head>
<title>Template codeigniter</title>
<script src="<?php echo base_url();?>js/jquery.js"></script>
<link href="<?php echo base_url();?>css/style.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div>
<?=$body?>
</div>
<div class="clear"></div>
<div>Footer</div>
</body>
</html>

Loading templates is best described in the CI doc regarding templates.
http://codeigniter.com/user_guide/libraries/parser.html
$this->load->library('parser');
$data['val1'] = 'some string';
$data['val2'] = 2012;
$this->load->view('my_xhtml', $data);
Now, in your template, you will have PHP vars of $val1 & $val2 you can use in dynamic elements of your html.

How to tell Facebook what images to offer a user when posting a link to my web page? [duplicate]

Possible Duplicate:
How does Facebook Sharer select Images?
When a user posts a link to my webpage to share on Facebook, Facebook scans my webpage/site and offers up some images found in that page. The user selects one to associate with the fb post.
Can I control which images on my web page facebook will offer up to the user to use in a post?
A feature that looks close is Open Graph Tags, http://developers.facebook.com/docs/opengraph/ , but it doesn't seem to suffice my particular need as it seems to allow just one image to represent that page.

Yes, you can do this by generating a "super proxy": a page that builds the metadata the Facebook sharer needs from dynamic information your application provides. You can see this working at http://concursos.genommalab.com/soyflaca/. I built that website, and since all the content comes from a single page but each item needs its own customized share, I used a deep-link technique: each share link points to my super proxy, which generates the static content for Facebook's sharer.php. Here is the code:
<?php
// get our URL query parameters
$current_path = 'http://' . dirname($_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']);
$title = $_GET['t'];
$diet_id = $_GET['diet_id'];
$desciption = $_GET['desc'];
$image_thumb = $_GET['thumb'];
$shared_url = $_GET['surl'];
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title><?php echo $title;?></title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta name="title" content="<?php echo $title;?>" />
<meta name="description" content="<?php echo $desciption;?>" />
<meta http-equiv="refresh" content="1;URL=<?php echo $current_path . '/#diet_' . $diet_id; ?>" />
<link rel="<?php echo $image_thumb; ?>" />
</head>
<body>
<img src="<?php echo $image_thumb; ?>" alt="<?php echo 'Imagen de ' . $title; ?>" width="112" height="112" style="visibility: hidden;"/>
</body>
</html>
Or perhaps all you want to do is add a list of images available to Facebook, like this:
<link rel="images/example_1.jpg" />
<link rel="images/example_2.jpg" />
<link rel="images/example_3.jpg" />
<link rel="images/example_4.jpg" />
<link rel="images/example_5.jpg" />
Place these right inside the <head> </head> tags. Remember that these are scanned by Facebook's sharer.php.
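One detail worth noting about the proxy page: it writes query parameters straight into the HTML, so any value containing quotes or angle brackets can break the markup. Below is a hedged Python sketch of the same kind of page with every value escaped first. It mirrors only the title, description, refresh and hidden image tags from the PHP above; the function name, parameter names and template are assumptions made for illustration, not part of the original site.

```python
from html import escape

# Template mirroring the tags of the proxy page; placeholders are filled
# with already-escaped values.
PAGE = """<!DOCTYPE html>
<html>
<head>
<title>{title}</title>
<meta name="title" content="{title}" />
<meta name="description" content="{description}" />
<meta http-equiv="refresh" content="1;URL={target_url}" />
</head>
<body>
<img src="{thumb_url}" alt="{title}" width="112" height="112" style="visibility: hidden;" />
</body>
</html>
"""


def render_share_page(title, description, thumb_url, target_url):
    """Build the share page, escaping each value for use in HTML attributes."""
    values = {
        "title": title,
        "description": description,
        "thumb_url": thumb_url,
        "target_url": target_url,
    }
    return PAGE.format(**{k: escape(v, quote=True) for k, v in values.items()})
```

Escaping with quote=True covers both element text and double-quoted attribute values, which is where all four values end up in this template.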
