I tried really hard to find a solution but couldn't. Regex is just too complex for me. Anyway, here is the problem.
Objective:
I want to replace image links with CDN image links in PHP. I thought the best way to do that is preg_replace.
If a link is /var/b.png or http://www.example.com/png, it should be replaced with the CDN link, but if the src or class contains 'captcha', it shouldn't be, as those images are dynamic in nature.
For a start I am trying:
$_SERVER["HTTP_HOST"] = 'www.bring.com';
$preg_host = preg_quote($_SERVER["HTTP_HOST"], '/');
$content = preg_replace('/((\<image\s+.*?src\=)(["\']http\:\/\/'.$preg_host.')(\/.*?["\'](^(?=.*(captcha)))(.*)?\>))/i', '$2$3.nyud.net:8080$4', $content);
$content = preg_replace('/(\<image\s+.*?src\=["\'])(\/.*?["\'].*?\>)/i', '$1http://'.$_SERVER['HTTP_HOST'].'.nyud.net:8080$2', $content);
The condition is:
When not to replace: the src can contain the word "captcha", and in some cases the class contains "captcha"; this class can come before or after the src, which makes it more complicated. In these cases I don't want to replace the links, for example:
$content = <<<END
<image
type="image" src="/skins/bph/customer/images/icons/go.gif" alt="Search" title="Search" class="go-button" />
<image
id="verification_image_login_login_popup_form" src="http://www.bring.com/index.php?dispatch=image.captcha&verification_id=%3Alogin_login_popup_form&login_login_popup_form4ef33269bf30b=" alt="" onclick="this.src += 'reload' ;" width="100" height="25" class="image-captcha valign" /></p><div
class="clear">
<image
id="verification_image_login_login_popup_form" class="valign" src="http://www.bring.com/skins/bph/customer/images/icons/go.gif" alt="" onclick="this.src += 'reload' ;" width="100" height="25" /></p><div
class="clear">
END;
So as a result:
The captcha image above shouldn't be replaced, but the opposite is happening :(
The following should get replaced, as it has neither a class containing captcha nor a link with the word captcha in it:
<image
id="verification_image_login_login_popup_form" class="valign" src="http://www.bring.com/skins/bph/customer/images/icons/xxx" alt="" onclick="this.src += 'reload' ;" width="100" height="25" /></p>
Rather than trying to solve the whole problem with regex magic (which can bite you at unexpected times), it is highly recommended to use a PHP DOM parser.
Using a DOM parser, iterate through all the images, examine their src and class attributes, and modify the links as needed.
You can find tons of examples of using DOM if you search here on SO or on Google.
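A rough sketch of that parse-then-decide approach, written in Python with the standard-library html.parser only to keep it self-contained; in PHP, DOMDocument would play the same role. The helper names (cdn_url, ImageCollector) are made up for illustration, the sample markup is a shortened copy of the question's, and the sketch only collects the old-to-new src mapping rather than rewriting the markup.

```python
from html.parser import HTMLParser

HOST = "www.bring.com"          # host taken from the question
CDN_SUFFIX = ".nyud.net:8080"   # CDN suffix taken from the question


def cdn_url(src, css_class, host=HOST):
    """Return the CDN URL for src, or None when the link must stay untouched."""
    if "captcha" in src.lower() or "captcha" in css_class.lower():
        return None  # dynamic captcha image: never rewrite
    prefix = "http://" + host
    if src.startswith(prefix + "/"):
        path = src[len(prefix):]          # absolute link on our own host
    elif src.startswith("/") and not src.startswith("//"):
        path = src                        # site-relative link
    else:
        return None                       # other host or scheme: leave alone
    return prefix + CDN_SUFFIX + path


class ImageCollector(HTMLParser):
    """Collects {old src: new src} for every image that may be rewritten."""

    def __init__(self):
        super().__init__()
        self.rewrites = {}

    def handle_starttag(self, tag, attrs):
        # The question's markup uses <image>; <img> is accepted as well.
        if tag not in ("img", "image"):
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        new_src = cdn_url(src, attributes.get("class") or "")
        if new_src:
            self.rewrites[src] = new_src


sample = '''
<image type="image" src="/skins/bph/customer/images/icons/go.gif" class="go-button" />
<image src="http://www.bring.com/index.php?dispatch=image.captcha" class="image-captcha valign" />
<image class="valign" src="http://www.bring.com/skins/bph/customer/images/icons/go.gif" />
'''

collector = ImageCollector()
collector.feed(sample)
for old, new in collector.rewrites.items():
    print(old, "->", new)
```

Because the decision is made on parsed attributes, it does not matter whether class comes before or after src, which is the part that makes the pure-regex version so awkward.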
Related
I have generated my base64 string from my image and assigned it to : session['graph']
When I have this code, it works:
<pre>
<img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYA
..... " />
</pre>
This works as well:
def graph():
    d = session['graph']  # the base64 string
    return '<img src="data:image/png;base64,%s" width="640" height="480" border="0"/>' % d
But I want to use Jinja to display my images and graphs dynamically.
Using Jinja, this code is not working. Can you help me fix it, please? Thanks.
<pre>
<img src="data:image/png;base64,{{session['graph']}}" />
</pre>
or this one is also not working in html file:
<p>
'<img src="data:image/png;base64,%s" width="640" height="480" border="0"/>' %(d)
</p>
It might be worth checking that {{session['graph']}} does not accidentally already include the data:image/png;base64 part. That was exactly my issue recently in a similar case.
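One way to guard against that doubled prefix, as a minimal sketch: normalise the stored value before it reaches the template. The function names (to_base64_payload, to_data_uri) are hypothetical, and the sample bytes are just the PNG signature, not a real image.

```python
import base64

DATA_URI_PREFIX = "data:image/png;base64,"


def to_base64_payload(value):
    """Return only the base64 payload, dropping a data-URI prefix if present."""
    value = value.strip()
    if value.startswith("data:"):
        # Keep what follows the first comma, i.e. the payload itself.
        _, _, value = value.partition(",")
    return value


def to_data_uri(value):
    """Build a data URI that carries the prefix exactly once."""
    return DATA_URI_PREFIX + to_base64_payload(value)


raw = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
print(to_data_uri(raw))                    # payload stored without prefix
print(to_data_uri(DATA_URI_PREFIX + raw))  # payload stored with prefix
```

Both calls print the same URI, so the template can always emit the normalised value inside src. Separately, Flask's default session lives in a cookie, which browsers typically cap at roughly 4 KB, so it may also be worth checking that a large image string actually survives in session['graph'].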
I am trying to do a find and replace with a regex.
<div class="gallery-image-container">
<div jstcache="1116"
class="gallery-image-high-res loaded"
style="width: 396px;
height: 264px;
background-image: url(&quot;https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no&quot;);
background-size: 396px 264px;"
jsan="7.gallery-image-high-res,7.loaded,5.width,5.height,5.background-image,5.background-size">
</div>
</div>
On the code above I used this:
(https:\/\/[^&]*)
to extract this URL:
https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no
I used the regex s\d{3} to get s396.
Now I want to replace s396 with s1000 in the URL.
Now I am stuck and don't know how to go about it.
Please, is there any way all of this can be done in just one regex instead of several?
I would suggest using an HTML parser, but I understand sometimes that is not possible. Here is a little example in python.
import re
data = '''
<div class="gallery-image-container">
<div jstcache="1116"
class="gallery-image-high-res loaded"
style="width: 396px;
height: 264px;
background-image: url(&quot;https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no&quot;);
background-size: 396px 264px;"
jsan="7.gallery-image-high-res,7.loaded,5.width,5.height,5.background-image,5.background-size">
</div>
</div>
'''
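# [^&]+ matches everything from the start of the URL up to the first "&"
match = re.search(r"(https?://[^&]+)", data)
url = match.group(1)
# swap the size token, e.g. s396 -> s1000
url = re.sub(r"s\d{3}", "s1000", url)
print(url)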
The key part is the regex
(https?://[^&]+)
It uses a negated character class: it looks for http with an optional s, followed by ://, and then every character up to the first &. You can use this site to play around with regexes:
https://regex101.com/r/b0APFA/1
I'm sure you could do a clever 1 liner nested regex to find and replace all at once, but it's going to be easier to troubleshoot if you have a few lines.
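For what it is worth, a single-pass version could look roughly like the sketch below. It assumes the quotes inside the style attribute are encoded as &quot; (which is what the [^&] pattern relies on) and that the size token always follows "=s"; the markup is shortened and the function name resize_urls is made up.

```python
import re

# Assumed input: a shortened copy of the markup, with the quotes inside the
# style attribute encoded as &quot;.
html = (
    '<div class="gallery-image-high-res loaded" '
    'style="background-image: url(&quot;https://lh5.googleusercontent.com/p/'
    'AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no&quot;);"></div>'
)

# Group 1 keeps the URL up to and including "=s"; the digits after it are
# the size token that gets replaced.
SIZE_IN_URL = re.compile(r'(https?://[^&"\s]*=s)\d+')


def resize_urls(text, size=1000):
    """Replace the =sNNN size token of every matching URL in text."""
    return SIZE_IN_URL.sub(lambda m: f"{m.group(1)}{size}", text)


resized = resize_urls(html)
print(resized)
```

This rewrites the URL in place inside the markup instead of extracting it first, so it only fits if the modified HTML (rather than the bare URL) is what you need.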
I have a situation where I need to differentiate two calls by the path in the src of an HTML img tag. This is what the img tag looks like:
<img src="/folder/12280218/160024536.images.jpg" />
I am planning to alter the source to
<img src="/folder/12280218/160024536.images.jpg/1" />
observe the "/1" at the end of src
I need this so that I can change the flow in the controller when I am serving this image.
This is what I have tried until now.
my $string = '<p><img src="/folder/12280218/160024536.images.jpg" /></p>';
$string =~ s/<img\s+src\=\"(.*)"\s+\/><\/p>/<img src\=\"$1\/1" \><\/p>/g;
This is working as long as the $string looks like this.
In our application, user has the ability to alter the HTML input using CKEditor.
They can alter the image tag by adding width="800" before or after the src attribute. I want the regular expression to handle all of these situations.
Please let me know how to proceed.
Thanks in advance.
Replace :
(<img.*src="[^"]*)(".*\/>)
by
$1/1$2
Edit: changed the regex to handle situations with other attributes (like the "width" part).
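A slightly tighter variant of the same idea, sketched in Python rather than Perl just to have something easy to run: using [^>]* instead of .* keeps each match inside a single tag, so several images on one line are handled separately. The function name tag_image_sources and the sample markup are illustrative, and the sketch assumes src values are double-quoted.

```python
import re

# Group 1: the tag up to and including src=", group 2: the src value,
# group 3: the closing quote. [^>]*? keeps the match inside one tag.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def tag_image_sources(html, suffix="/1"):
    """Append suffix to the src of every <img> tag in html."""
    return IMG_SRC.sub(
        lambda m: m.group(1) + m.group(2) + suffix + m.group(3), html
    )


sample = (
    '<p><img width="800" src="/folder/12280218/160024536.images.jpg" />'
    '<img src="/folder/12280218/160024536.images.jpg" width="800" /></p>'
)
print(tag_image_sources(sample))
```

Note that running it twice appends the suffix twice, so it should be applied only once per document (or extended with a check for an existing suffix).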
I've got a bunch of strings already separated from an HTML file, examples:
<img alt="" src="//i.imgur.com/tApg8ebb.jpg" title="Some manly skills for you guys<p><span class='points-q7Vdm'>18,736</span> <span class='points-text-q7Vdm'>points</span> : 316,091 views</p>">
<img src="//i.imgur.com/SwmwL4Gb.jpg" width="48" height="48">
<img src="//s.imgur.com/images/blog_rss.png">
I am trying to write a regular expression that will grab the src="URL" part of the img tag so that I can replace it later based on a few other conditions. The many quotation marks are giving me the biggest problem. I'm still relatively new to regex, so a lot of the tricks are beyond my knowledge.
Thanks in advance
Use DOM or another parser for this, don't try to parse HTML with regular expressions.
Example:
$html = <<<DATA
<img alt="" src="//i.imgur.com/tApg8ebb.jpg" title="Some manly skills for you guys<p><span class='points-q7Vdm'>18,736</span> <span class='points-text-q7Vdm'>points</span> : 316,091 views</p>">
<img src="//i.imgur.com/SwmwL4Gb.jpg" width="48" height="48">
<img src="//s.imgur.com/images/blog_rss.png">
DATA;
$doc = new DOMDocument();
$doc->loadHTML($html); // load the html
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//img');
foreach ($imgs as $img) {
    echo $img->getAttribute('src') . "\n";
}
Output
//i.imgur.com/tApg8ebb.jpg
//i.imgur.com/SwmwL4Gb.jpg
//s.imgur.com/images/blog_rss.png
If you would rather store the results in an array, you could do..
foreach ($imgs as $img) {
    $sources[] = $img->getAttribute('src');
}
print_r($sources);
Output
Array
(
[0] => //i.imgur.com/tApg8ebb.jpg
[1] => //i.imgur.com/SwmwL4Gb.jpg
[2] => //s.imgur.com/images/blog_rss.png
)
$pattern = '~<img.+src="([\w/._\-]+)"~'; // ~ as delimiter, because the pattern itself contains /
I'm not sure which language you're using, so quote syntax will vary.
My puzzle: as a PHP newbie I am trying to extract some data from a string using a regular expression, but I cannot find the correct syntax.
The content of the string is the scraped HTML of several images from a website. I want the final output to be 3 separate variables: "$Number1", "$Number2" and "$Status".
An example of the content of the input string $html:
<div id="system">
<img alt="2" height="35" src="/images/numbers/2.jpg" width="18" /><img alt="2" height="35" src="/images/numbers/2.jpg" width="18" /><img alt=".5" height="35" src="/images/numbers/point5.jpg" style="margin-left: -4px" width="26" /><img alt="system statusA" height="35" src="/images/numbers/statusA.jpg" width="37" /><img alt="2" height="35" src="/images/numbers/2.jpg" width="18" /><img alt="1" height="35" src="/images/numbers/1.jpg" width="18" /><img alt=".0" height="35" src="/images/numbers/point0.jpg" style="margin-left: -4px" width="26" />
</div>
The possible values which can appear in this string are:
0.jpg
1.jpg
2.jpg
3.jpg
4.jpg
5.jpg
6.jpg
7.jpg
8.jpg
9.jpg
point0.jpg
point5.jpg
statusA.jpg
statusB.jpg
statusC.jpg
statusD.jpg
statusE.jpg
statusF.jpg
The result should be variables:
"Number1" (XX.X) based upon the first two numbers (0-9) and .0 or .5
"Status" (statusX) based upon the status
"Number2" (XX.X) based upon the last two numbers (0-9) and .0 or .5
Code so far:
$regex = '\balt='(.*?)';
preg_match($regex,$html,$match);
var_dump($match);
echo $match[0];
I probably have to do this in multiple steps or use another function. Can anyone help me?
The very first thing that you should ask yourself is: "in what format is my input data". Since in this case it is clearly a snippet of HTML, you should feed that snippet to an HTML parser, and not to a regular expression engine.
I don't know the exact function names, but your code should look like this:
$htmltext = '<div id="system">[...]</div>';
$htmltree = htmlparser_parse($htmltext);
$images = $htmltree->find_all('img');
foreach ($images as $image) {
echo $image->src;
}
So you need to find an HTML parser that parses a string into a tree of nodes. The nodes should have methods for finding nodes inside them based on CSS classes, element names or node IDs. For Python this library is called BeautifulSoup, for Java it is JSoup, and I'm sure that there is something similar for PHP.
The examples provided with simplehtmldom look promising.
Possibly DOM : http://www.php.net/manual/en/book.dom.php
See Robust and Mature HTML Parser for PHP too
You want just the alt's? Try this xpath example:
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// @alt selects the alt attribute of every img element
foreach ($xpath->query('//img/@alt') as $node) {
    echo $node->nodeValue . "\n";
}
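Once the alt values are available, turning them into the three variables is plain string handling. A sketch of that step, in Python with the standard-library html.parser for the sake of a runnable example (the same logic ports directly to the PHP loop above). It assumes the status image is the only one whose alt contains "status" and that the status name is the last word of that alt; AltCollector and split_reading are made-up names.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects the alt text of every <img> in document order."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt is not None:
                self.alts.append(alt)


def split_reading(alts):
    """Split the alt values into (number1, status, number2)."""
    for index, alt in enumerate(alts):
        if "status" in alt:
            number1 = "".join(alts[:index])      # e.g. "2" + "2" + ".5"
            status = alt.split()[-1]             # "system statusA" -> "statusA"
            number2 = "".join(alts[index + 1:])  # e.g. "2" + "1" + ".0"
            return number1, status, number2
    raise ValueError("no status image found")


html = (
    '<div id="system">'
    '<img alt="2" src="/images/numbers/2.jpg" />'
    '<img alt="2" src="/images/numbers/2.jpg" />'
    '<img alt=".5" src="/images/numbers/point5.jpg" />'
    '<img alt="system statusA" src="/images/numbers/statusA.jpg" />'
    '<img alt="2" src="/images/numbers/2.jpg" />'
    '<img alt="1" src="/images/numbers/1.jpg" />'
    '<img alt=".0" src="/images/numbers/point0.jpg" />'
    '</div>'
)

collector = AltCollector()
collector.feed(html)
number1, status, number2 = split_reading(collector.alts)
print(number1, status, number2)
```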