Regex to get src of iframe element

I am trying to retrieve part of the src from different iframes from an HTML input.
So far, I've tried different methods but none of them works for all iframes. What I've tried so far:
<iframe(.*?)><\/iframe>
<iframe src="(.+?)".+</iframe>
<iframe.+?src=[\"'](.+?)[\"'].*?>
And here is a sample of iframe tags that I have:
<iframe src="http://www.youtube.com/embed/NM51qOpwcIM?modestbranding=1;rel=0;showinfo=0;autoplay=0;autohide=1;yt:stretch=16:9;wmode=transparent;?wmode=transparent" allowfullscreen="" style="width: 640px; height: 361.057px;" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/VASywEuqFd8?feature=oembed" allowfullscreen="" width="660" height="371" frameborder="0"></iframe>
Ideally, I would like to retrieve the src from the beginning and just before the first question mark (?) as such:
http://www.youtube.com/embed/NM51qOpwcIM

This can be achieved using
(?<=src=").*?(?=[\?"])
See working example on Regex101
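Explanation
(?<=src=") Preceded by src="
.*? Lazy match of any characters
(?=[\?"]) Until the next character would be either a ? or a "
If the URL contains no ?, the match simply runs to the closing ". If you want the longer URL, that is, the whole src value including its query string, stop at the closing quote instead (the \* in this class additionally stops at a literal *, which these URLs do not contain):
(?<=src=").*?(?=[\*"])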
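As a rough illustration of the first pattern, here is a minimal Python sketch. The sample tags are shortened versions of the ones in the question, and the names `SRC_BEFORE_QUERY` and `iframe_sources` are made up for this example. Note that the pattern matches any `src="..."` in the input, not only those on iframe tags.

```python
import re

# Sample input: shortened versions of the iframe tags from the question.
html = (
    '<iframe src="http://www.youtube.com/embed/NM51qOpwcIM?modestbranding=1;rel=0" frameborder="0"></iframe>\n'
    '<iframe src="https://www.youtube.com/embed/VASywEuqFd8?feature=oembed" width="660"></iframe>'
)

# Lookbehind for src=", lazy match, stop before the first ? or ".
SRC_BEFORE_QUERY = re.compile(r'(?<=src=").*?(?=[?"])')

def iframe_sources(markup):
    """Return every src value, cut off before the query string (hypothetical helper)."""
    return SRC_BEFORE_QUERY.findall(markup)

sources = iframe_sources(html)
print(sources)
```

Python's `re` module accepts this lookbehind because `src="` has a fixed width.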

Related

How to find and replace in a regex code

I am trying to do a find and replace with a regex.
<div class="gallery-image-container">
<div jstcache="1116"
class="gallery-image-high-res loaded"
style="width: 396px;
height: 264px;
background-image: url("https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no");
background-size: 396px 264px;"
jsan="7.gallery-image-high-res,7.loaded,5.width,5.height,5.background-image,5.background-size">
</div>
</div>
In the code above I used this regex
(https:\/\/[^&]*)
to extract this URL
https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no
I then used the regex s\d{3} to get s396.
Now I want to replace s396 with s1000 in the URL.
I am stuck and don't know how to go about it.
Is there any way to do all of this with a single regex instead of several?

I would suggest using an HTML parser, but I understand sometimes that is not possible. Here is a little example in python.
import re
data = '''
<div class="gallery-image-container">
<div jstcache="1116"
class="gallery-image-high-res loaded"
style="width: 396px;
height: 264px;
background-image: url("https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no");
background-size: 396px 264px;"
jsan="7.gallery-image-high-res,7.loaded,5.width,5.height,5.background-image,5.background-size">
</div>
</div>
'''
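# Stop at the first " or &, whichever ends the URL in the markup
match = re.search(r'(https?://[^"&]+)', data)
url = match.group(1)
# Swap the three-digit size token for s1000
url = re.sub(r"s\d{3}", "s1000", url)
print(url)
The key part is the regex
(https?://[^"&]+)
It uses a negated character class. It says: look for http with an optional s, followed by ://, and then every character that is not a " or an &. The " ends the URL in the markup as shown above; the & covers pages where that quote is encoded as &quot;. You can use this site to play around with regexes: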
https://regex101.com/r/b0APFA/1
I'm sure you could do a clever 1 liner nested regex to find and replace all at once, but it's going to be easier to troubleshoot if you have a few lines.
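For what it is worth, the single-regex version the question asks for could look roughly like the sketch below. It assumes the size token always follows `=s` inside the URL, as it does in the sample; the names `SIZE_IN_URL` and `resize` are invented for this example.

```python
import re

# The style fragment from the question, reduced to the line that holds the URL.
style = 'background-image: url("https://lh5.googleusercontent.com/p/AF1QipMcTfMPZj_d5iip9WKtN2SQB9Je5U4rRB0nT_t8=s396-k-no");'

# Group 1 keeps the URL up to and including "=s" (assumed position of the size);
# the three digits that follow are what gets replaced.
SIZE_IN_URL = re.compile(r'(https?://[^"&]+?=s)\d{3}')

def resize(markup, size=1000):
    """Replace the three-digit size token in place (hypothetical helper)."""
    # \g<1> is used instead of \1 so the digits of `size` are not read as part of the group number.
    return SIZE_IN_URL.sub(rf'\g<1>{size}', markup)

resized = resize(style)
print(resized)
```

This rewrites the markup directly instead of extracting the URL first, which is shorter but, as the answer says, harder to troubleshoot.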

Match only specific url via regex

I want to match only this specific url
https://www.facebook.com/princessaustine.alcantara.3/about?lst=100002159119314%3A100022260619396%3A1507039852
Here's the source code
<div class="hidden_elem"><code id="u_0_17"><!-- <div class="fbTimelineTopSectionBase _6-d _529n"><div class="_5h60" id="pagelet_above_header_timeline" data-referrer="pagelet_above_header_timeline"></div><div id="above_header_timeline_placeholder"></div><div class="fbTimelineSection fbTimelineTopSection"><div id="fbProfileCover"><div class="cover" id="u_0_13"><a class="coverWrap coverImage" data-referrerid="100022260619396" href="https://www.facebook.com/photo.php?fbid=118243868927633&set=a.117907638961256.1073741827.100022260619396&type=3" rel="theater" ajaxify="https://www.facebook.com/photo.php?fbid=118243868927633&set=a.117907638961256.1073741827.100022260619396&type=3&size=1440%2C1080&source=10&player_origin=profile&referrer_profile_id=100022260619396" data-ploi="https://scontent.fmnl4-1.fna.fbcdn.net/v/t31.0-8/22136852_118243868927633_2950847275004458372_o.jpg?oh=fbcb3c8abc2023b35a5a36fb2989d850&oe=5A821DA8" title="Cover Photo" id="u_0_12" data-cropped="1"><img class="coverPhotoImg photo img" src="https://scontent.fmnl4-1.fna.fbcdn.net/v/t31.0-8/c0.81.851.315/p851x315/22136852_118243868927633_2950847275004458372_o.jpg?oh=7d0222f3c38b31acb33a7b1ffba2ac9e&oe=5A797385" style="top:0px;width:100%" data-fbid="118243868927633" alt="Cover Photo, Image may contain: 1 person, sitting" /><div class="coverBorder"></div><img class="coverChangeThrobber img" src="https://static.xx.fbcdn.net/rsrc.php/v3/yk/r/LOOn0JtHNzb.gif" alt="" width="16" height="16" /></a><div class="_2nlj _2xc6"><h1 class="_2nlv"><a class="_2nlw" href="https://www.facebook.com/princessaustine.alcantara.3"><span id="fb-timeline-cover-name" data-testid="profile_name_in_profile_page">Princess Austine Alcantara</span></a><span class="_2nly"></span></h1></div></div><div id="fbTimelineHeadline" class="clearfix"><div class="_50zj"><div class="actions _70j"><div class="_5h60 actionsDropdown" id="pagelet_timeline_profile_actions" data-referrer="pagelet_timeline_profile_actions"></div></div></div><div 
class="_70k"><ul class="_6_7 clearfix" data-referrer="timeline_light_nav_top" id="u_0_14"><li><a class="_6-6 _6-7" href="https://www.facebook.com/princessaustine.alcantara.3?lst=100002159119314%3A100022260619396%3A1507039852" data-tab-key="timeline">Timeline<span class="_513x"></span></a></li><li><a class="_6-6" href="https://www.facebook.com/princessaustine.alcantara.3/about?lst=100002159119314%3A100022260619396%3A1507039852" data-tab-key="about">About<span class="_513x"></span></a></li><li><a class="_6-6" href="https://www.facebook.com/princessaustine.alcantara.3/friends?lst=100002159119314%3A100022260619396%3A1507039852&source_ref=pb_friends_tl" data-tab-key="friends">Friends<span class="_gs6"><span id="u_0_10">7 Mutual</span></span><span class="_513x"></span></a></li><li><a class="_6-6" href="https://www.facebook.com/princessaustine.alcantara.3/photos?lst=100002159119314%3A100022260619396%3A1507039852&source_ref=pb_friends_tl" data-tab-key="photos">Photos<span class="_513x"></span></a></li><li><div class="_6a uiPopover _6-6 _9rx" id="u_0_15"><a class="_9ry _p" href="#" aria-haspopup="true" aria-expanded="false" rel="toggle" role="button" id="u_0_16">More<i class="_bxy img sp_AWfL8SqGWNa sx_41c408"></i></a></div></li></ul></div><div class="name"><div class="photoContainer"><div><a class="profilePicThumb" href="https://www.facebook.com/photo.php?fbid=116140922471261&set=a.116141002471253.1073741826.100022260619396&type=3&source=11&referrer_profile_id=100022260619396" rel="theater" id="u_0_11"><img class="profilePic img" alt="Princess Austine Alcantara's Profile Photo, Image may contain: 1 person, smiling, closeup" src="https://scontent.fmnl4-1.fna.fbcdn.net/v/t1.0-1/c0.0.160.160/p160x160/22050231_116140922471261_8103110572544919612_n.jpg?oh=d942ae339c7c9dc7c8add2e3dd34f6c4&oe=5A413CB6" /></a></div><meta 
content="https://scontent.fmnl4-1.fna.fbcdn.net/v/t1.0-1/p50x50/22050231_116140922471261_8103110572544919612_n.jpg?oh=e43d8f6e5cfb1387f1a5d864b7947225&oe=5A3CC115" itemprop="image" /></div></div></div></div></div>
I tried the regex below, but it also matches other items inside. How can I match only that specific URL? Thanks
The class is dynamic.
(?i)(?<=a class=".+" href=").*?(?=" data-tab-key="about)

If you want to match the href, you can use [^"]+ inside the href. This way your regex will not capture more than you need, because it stops at the closing ".
You can then write something like href="([^"]*?)" data-tab-key="about".
I'd suggest avoiding regex for matching HTML, though.
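To illustrate the advice about avoiding regex, here is a small sketch using Python's standard `html.parser`. The class name `TabLinkFinder` and the shortened sample markup (with a placeholder user name) are assumptions made for this example. In the question's actual source the markup sits inside an HTML comment, so that text would have to be extracted before a parser sees it as tags.

```python
from html.parser import HTMLParser

class TabLinkFinder(HTMLParser):
    """Collect href values of <a> tags whose data-tab-key equals a given key."""

    def __init__(self, tab_key):
        super().__init__()
        self.tab_key = tab_key
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("data-tab-key") == self.tab_key:
            self.hrefs.append(attributes.get("href"))

# Hypothetical, shortened markup in the shape of the question's navigation list.
sample = (
    '<ul><li><a class="_6-6 _6-7" href="https://www.facebook.com/someuser?lst=1" '
    'data-tab-key="timeline">Timeline</a></li>'
    '<li><a class="_6-6" href="https://www.facebook.com/someuser/about?lst=1" '
    'data-tab-key="about">About</a></li></ul>'
)

finder = TabLinkFinder("about")
finder.feed(sample)
print(finder.hrefs)
```

Because the lookup goes by attribute name, the dynamic class value does not matter here.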

Try..
(?i)a class=".+" href="\K.*?(?=" data-tab-key="about)
I believe you are struggling to get a variable-length lookbehind to work, namely
(?<=a class=".+" href=")
The .+ above is not valid there because it makes the lookbehind variable-length. Most regex engines do not support this (.NET is one exception I am aware of).
That said, to emulate a variable-length lookbehind you can use the \K escape, available in PCRE-style engines, which resets the starting point of the match to the current position (thereby dropping everything matched so far from the final match).
Demo regex is here.
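Python's built-in `re` module is one of the engines that rejects the variable-length lookbehind, and it has no \K either. A capture group gives the same result there. The sketch below uses a shortened, hypothetical sample link and the invented name `ABOUT_HREF`.

```python
import re

# Hypothetical, shortened link in the shape of the question's markup.
sample = '<a class="_6-6" href="https://www.facebook.com/someuser/about?lst=1" data-tab-key="about">About</a>'

# The question's pattern: compiling it fails because the lookbehind contains .+
try:
    re.compile(r'(?i)(?<=a class=".+" href=").*?(?=" data-tab-key="about)')
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)

# Workaround without \K: match the prefix normally and capture only the href value.
ABOUT_HREF = re.compile(r'(?i)a class="[^"]*" href="([^"]*)" data-tab-key="about"')
match = ABOUT_HREF.search(sample)
about_url = match.group(1) if match else None

print(lookbehind_error)
print(about_url)
```

The capture group plays the role that \K plays in PCRE: everything before the group is matched but not returned.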

Extract html tag from content:encoded in Yahoo Pipes

This is my pipe: link
I need to get the src attribute of the img tag that is inside the content:encoded element.
This is the content:encoded of the feed:
<p style="text-align:justify;"><img class="alignnone size-full wp-image-49549" src="http://i2.wp.com/heshootshescoores.com/wp-content/uploads/2014/08/nhl.jpg?resize=600%2C400" alt="nhl"/></p>
<p style="text-align:justify;">...etc.
So in this example I would like to extract this link: http://i2.wp.com/heshootshescoores.com/wp-content/uploads/2014/08/nhl.jpg?resize=600%2C400
And export this attribute to a new item.media:thumbnail like this
<media:thumbnail xmlns:media="http://search.yahoo.com/mrss/"
url="HERE GOES THE SRC ATTRIBUTE.png"/>
Is it possible to do this with Yahoo! Pipes? (I was thinking about regex, but I am not familiar with it and don't even know where to start.)
Thank you
Follow-up question: https://stackoverflow.com/questions/25605740/add-items-attribute-in-yahoo-pipes

(.*?)(?=src=)src=\"(.*?)\"(.*)
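This will work. The src value ends up in the second capture group; the first and third groups hold the text before and after it.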
See demo.
http://regex101.com/r/bJ6rZ5/3
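Outside of Yahoo! Pipes, the behaviour of this pattern can be checked with a few lines of Python. This is only a sketch: the variable names are made up, and the way the `media:thumbnail` element is assembled here is an illustration, not how Pipes itself does it.

```python
import re

# The content:encoded fragment from the question.
content = (
    '<p style="text-align:justify;"><img class="alignnone size-full wp-image-49549" '
    'src="http://i2.wp.com/heshootshescoores.com/wp-content/uploads/2014/08/nhl.jpg?resize=600%2C400" '
    'alt="nhl"/></p>'
)

# Same pattern as in the answer; group 2 is the src value.
IMG_SRC = re.compile(r'(.*?)(?=src=)src=\"(.*?)\"(.*)')

match = IMG_SRC.search(content)
thumbnail_url = match.group(2)

# Illustrative only: build the element the question wants to end up with.
thumbnail = f'<media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="{thumbnail_url}"/>'
print(thumbnail)
```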

Embedding issuu

I need to embed an issuu document inside a website. The website administrator should be allowed to decide which document is displayed on the frontend.
This is an easy task, using the embed link on the issuu page. But I need to customize some options - for instance, disable sharing, set the dimensions and so on. I cannot rely on the administrators doing this process every time they need to change the document.
I can easily customize the issuu embed code to my taste, and all that I need is the document id. Unfortunately, the id is not included in the issuu page for the document. For instance, the id for this random link happens to be 110209071155-d0ed1d10ac0b40dda80dad24166a76ee, which is nowhere to be found, neither in the URL nor easily inside the page. You have to dig into the embed code to find it.
I thought the issuu API could allow me to get a document id given its URL, but I cannot find anything like this. The closest match is the search API, but if I search for the exact name of the document I get only one match for a different document!
Is there some easy way to be able to embed a document only knowing its URL? Or an easy way for a non techie person to find a document id in the page?

Unfortunately, the only way to customize it is to pay for the service, which costs $39 per month.
You can force a fullscreen mode without ads by using
<body style="margin:0px;padding:0px;overflow:hidden">
<iframe src="YOUR ISSUU EMBED" frameborder="0" style="overflow:hidden;height:105%;width:105%;position:absolute;" height="100%" width="100%"></iframe>
</body>

You can of course embed stacks, although that isn't shown on the Issuu site. This is the code (it's old code, but it works):
<iframe src="http://static.issuu.com/widgets/shelf/index.html?folderId=FOLDERID&theme=theme1&rows=1&thumbSize=large&roundedCorners=true&showTitle=true&showAuthor=false&shadow=true&effect3d=true" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="100%" height="200"></iframe>
FOLDERID is the 36-character identifier shown in the address bar when you open a stack (example: https://issuu.com/username/stacks/FOLDERID). When you put it into the code, paste the 36 characters in the 8-4-4-4-12 format, with - between the groups. And voilà, it works.
You can change the theme and other settings in the code.

The Document ID is found in the HTML source of every document. It is in the og:video meta property.
<meta property="og:video" content="http://static.issuu.com/webembed/viewers/style1/v2/IssuuReader.swf?mode=mini&documentId=XXXXXXXX-XXXXXXXXXXXXX&pageNumber=0">
You can easily handle it by using the DOMDocument and DOMXPath PHP classes.
Here is how to do it in PHP:
// Your document URL
$url = 'https://issuu.com/proyectotres/docs/proyecto_3_edicion_135';
// Suppress libxml warnings while the page loads, then restore error reporting
libxml_use_internal_errors(true);
// loadHTMLFile() is an instance method, so create the document first
$dom = new DOMDocument();
$dom->loadHTMLFile($url);
libxml_use_internal_errors(false);
// DOMXPath helps find the <meta property="og:video" content="http://hereyoucanfindthedocumentid?documentId=xxxxx-xxxxxxx"/>
$xpath = new DOMXPath($dom);
// XPath selects attributes with @, not #
$meta = $xpath->query("//html/head/meta[@property='og:video']");
// Get the content attribute of the <meta> node and parse its query
$vars = [];
parse_str(parse_url($meta[0]->getAttribute('content'))['query'], $vars);
// Ready. The document ID is here:
$docID = $vars['documentId'];
// You can print it:
echo $docID;
You can try it with the URL of your own Issuu document.
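The parsing step can also be sketched in Python without fetching anything. The example below works on a hard-coded page fragment and assumes, as this answer states, that the page carries an og:video meta tag with a documentId query parameter. The names `OgVideoFinder` and `document_id` are invented, and the ID is the one quoted in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs

class OgVideoFinder(HTMLParser):
    """Remember the content attribute of the first <meta property="og:video"> tag."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("property") == "og:video" and self.content is None:
            self.content = attributes.get("content")

def document_id(page_html):
    """Return the documentId query parameter of the og:video URL, or None (hypothetical helper)."""
    finder = OgVideoFinder()
    finder.feed(page_html)
    if finder.content is None:
        return None
    query = parse_qs(urlparse(finder.content).query)
    return query.get("documentId", [None])[0]

# Assumed page fragment, modelled on the meta tag shown in this answer.
sample_page = (
    '<html><head><meta property="og:video" content="http://static.issuu.com/webembed/'
    'viewers/style1/v2/IssuuReader.swf?mode=mini&amp;'
    'documentId=110209071155-d0ed1d10ac0b40dda80dad24166a76ee&amp;pageNumber=0">'
    '</head><body></body></html>'
)

print(document_id(sample_page))
```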

You can use the Issuu URL of your document to complete this iframe :
<iframe width="100%" height="283" style="display: block; margin-left: auto; margin-right: auto;" src="https://e.issuu.com/issuu-reader3-embed-files/latest/twittercard.html?u=nantucketchamber&d=program-update1&p=1" frameborder="0" allowfullscreen="allowfullscreen" span="" id="CmCaReT"></iframe>
You just need to replace "nantucketchamber" by a user name and "program-update1" by the file name in the Issuu URL
(for this example the URL is https://issuu.com/nantucketchamber/docs/program-update1)
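The substitution described above can be automated along these lines. This is a sketch that assumes document URLs always have the form https://issuu.com/USER/docs/NAME, as in the example; `EMBED_BASE` and `embed_src` are names chosen for illustration.

```python
from urllib.parse import urlparse, urlencode

# Embed endpoint taken from the iframe in this answer.
EMBED_BASE = "https://e.issuu.com/issuu-reader3-embed-files/latest/twittercard.html"

def embed_src(document_url, page=1):
    """Build the iframe src from an Issuu document URL (hypothetical helper)."""
    parts = urlparse(document_url).path.strip("/").split("/")
    # Assumed URL shape: /<user>/docs/<document-name>
    if len(parts) != 3 or parts[1] != "docs":
        raise ValueError(f"unexpected Issuu document URL: {document_url}")
    user, _, name = parts
    query = urlencode({"u": user, "d": name, "p": page})
    return f"{EMBED_BASE}?{query}"

src = embed_src("https://issuu.com/nantucketchamber/docs/program-update1")
print(src)
```

An administrator would then only need to paste the document URL, and the site can fill in the rest of the iframe.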

Could anyone tell me why / how this XSS vector works in the browser?

I have suffered a number of XSS attacks against my site. The following HTML fragment is the XSS vector that has been injected by the attacker:
<a href="mailto:">
<a href=\"http://www.google.com onmouseover=alert(/hacked/); \" target=\"_blank\">
<img src="http://www.google.com onmouseover=alert(/hacked/);" alt="" /> </a></a>
It looks like script shouldn't execute, but using IE9's development tool, I was able to see that the browser translates the HTML to the following:
<a href="mailto:"/>
<a onmouseover="alert(/hacked/);" href="\"http://www.google.com" target="\"_blank\"" \?="">
</a/>
After some testing, it turns out that the \" makes the "onmouseover" attribute "live", but I don't know why. Does anyone know why this vector succeeds?

So to summarize the comments:
Sticking a character in front of the quote turns the quote into part of the attribute value, instead of letting it mark the beginning and end of the value.
This works just as well:
href=a"http://www.google.com onmouseover=alert(/hacked/); \"
HTML allows unquoted attribute values, and an unquoted value ends at the first space. So the href value stops after www.google.com, and onmouseover=alert(/hacked/); becomes a second attribute with its own value.
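The same tokenization can be observed outside a browser. Python's `html.parser` is not IE9, but it also treats a value that does not start with a quote as unquoted, so it serves as a rough illustration. The class name `AttrDump` is made up for this sketch.

```python
from html.parser import HTMLParser

class AttrDump(HTMLParser):
    """Record every start tag together with its parsed attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))

# The injected anchor from the question, with the literal backslashes kept.
injected = r'<a href=\"http://www.google.com onmouseover=alert(/hacked/); \" target=\"_blank\">'

dump = AttrDump()
dump.feed(injected)
dump.close()

tag, attrs = dump.tags[0]
attr_names = [name for name, _ in attrs]
for name, value in attrs:
    print(name, "=", value)
```

The href value begins with the backslash rather than with a quote, which is why the parser keeps reading it as an unquoted value and treats onmouseover as a separate attribute.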

We Keep Coding
