Regular expression in Java to extract URL from HTML

I am new to regexes and need help.
My HTML source is:
<img src ="planets.gif" width="145" height="126" alt="Planets" usemap ="#planetmap">
<map name="planetmap">
<area shape="rect" coords="0,0,82,126" href="http://www.sun.htm" alt="Sun">
<area shape="circle" coords="90,58,3" href="http://www.mercur.htm" alt="Mercury">
<area shape="circle" coords="124,58,8" href="http://www.www.venus.htm" alt="Venus">
</map>
I'm trying to extract all the href links, e.g. http://www.google.com.
Kindly help.
My regex is:
"href=[\\\"\\'](http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?[\\\"\\']"
It extracts the whole attribute, e.g. href="http://www.google.com".
But I need only the link, http://www.google.com, without the href= part.

Please use an XML parser for this kind of stuff.
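If a regex is kept for a quick one-off job, the usual way to get the URL without the href= prefix is a capturing group. The following is a minimal sketch, not a robust HTML parser; the class name HrefExtractor, the method extractHrefs, and the simplified pattern are illustrative assumptions, not the asker's original expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefExtractor {
    // Group 1 captures the opening quote, group 2 the attribute value.
    // The backreference \1 requires the same quote character to close the value.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns every quoted href value found in the given markup, in document order.
    static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group(2)); // only the URL, without href= or the quotes
        }
        return urls;
    }

    public static void main(String[] args) {
        String html =
                "<area shape=\"rect\" coords=\"0,0,82,126\" href=\"http://www.sun.htm\" alt=\"Sun\">\n"
              + "<area shape=\"circle\" coords=\"90,58,3\" href=\"http://www.mercur.htm\" alt=\"Mercury\">";
        for (String url : extractHrefs(html)) {
            System.out.println(url);
        }
    }
}
```

For real-world HTML a parser remains the safer choice, since unquoted attribute values, comments, and scripts easily trip up a regex like this one.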

Related

How can I deep link into an app from Google Assistant?

I'm creating a Dialogflow agent integrated with Google Assistant.
What I'd like to do is open an app (my app) when a matching intent is triggered. I've seen that actions like YouTube, Spotify, etc. are able to do that; for example, I can tell the YouTube action "search for cats video" and the YouTube app will open with a list of cat videos.
I tried to use the DeepLink class, but then noticed it's deprecated.
Can you suggest a way to do this?
Thanks in advance.

I think you are looking for App Actions. Here are the steps you need to follow:
1. Find the right built-in intent. actions.intent.OPEN_APP_FEATURE should be the right one for you.
2. Create and update actions.xml. It should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a sample actions.xml -->
<actions>
  <action intentName="actions.intent.OPEN_APP_FEATURE">
    <!-- Use url from inventory match for deep link fulfillment -->
    <fulfillment urlTemplate="{#url}" />
    <!-- Define parameters with inventories here -->
    <parameter name="feature">
      <entity-set-reference entitySetId="featureParamEntitySet" />
    </parameter>
  </action>
  <entity-set entitySetId="featureParamEntitySet">
    <!-- Provide a URL per entity -->
    <entity url="myapp://deeplink/one" name="featureParam_one" alternateName="#array/featureParam_one_synonyms" />
    <entity url="myapp://deeplink/two" name="featureParam_two" alternateName="#array/featureParam_two_synonyms" />
  </entity-set>
</actions>
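On the app side, each entity URL from actions.xml still has to be routed to a feature. A rough plain-Java sketch of that routing follows, assuming the app receives the URL as a string (on Android it would normally arrive as the data of an Intent, which is not shown here). DeepLinkRouter, resolveFeature, and the returned labels are hypothetical names; only the myapp://deeplink/one and myapp://deeplink/two URLs come from the sample above.

```java
import java.net.URI;

final class DeepLinkRouter {
    // Maps a deep link URL to a feature label, or "unknown" if it is not one of ours.
    static String resolveFeature(String url) {
        URI uri = URI.create(url);
        // For myapp://deeplink/one the scheme is "myapp", the host "deeplink", the path "/one".
        if (!"myapp".equals(uri.getScheme()) || !"deeplink".equals(uri.getHost())) {
            return "unknown";
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        switch (path) {
            case "/one":
                return "featureParam_one";
            case "/two":
                return "featureParam_two";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveFeature("myapp://deeplink/one"));
        System.out.println(resolveFeature("myapp://deeplink/two"));
    }
}
```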

Apache JMeter Regular Expressions Extractor Error

I made an HTTP request to a web page, and it responded successfully with VAST code (XML). Afterwards I tried to use the Apache JMeter Regular Expression Extractor to extract a URL from the MediaFile tag in the response XML, but it doesn't work.
Here is the response data (VAST XML):
<?xml version="1.0" encoding="UTF-8"?>
<VAST version="2.0">
<Ad id="brightroll_ad">
<InLine>
<AdSystem>BrightRoll</AdSystem>
<AdTitle></AdTitle>
<Impression><![CDATA[http://brxserv-22.btrll.com/v1/epix/6835714/3858435/84416/140363/AbQ93_XgMgCcRUTi_JAAFJwAACJEsAOuADAAAAAAAiyel-GCNFFg/event.imp/r_64.aHR0cDovL2Iuc2NvcmVjYXJkcmVzZWFyY2guY29tL3A_JmMxPTgmYzI9NjAwMDAwNiZjMz04NDQxNiZjND0zODU4NDM1JmM1PTIwNDYzJmM2PTY4MzU3MTQmYzEwPTE0MDM2MyZjdj0xLjcmY2o9MSZybj0xNDE0NDEwMTg1JnI9aHR0cCUzQSUyRiUyRnBpeGVsLnF1YW50c2VydmUuY29tJTJGcGl4ZWwlMkZwLWNiNkMwekZGN2RXakkuZ2lmJTNGbGFiZWxzJTNEcC42ODM1NzE0LjM4NTg0MzUuMCUyQ2EuMjA0NjMuODQ0MTYuMTQwMzYzJTJDdS45NjguNjQweDM2MCUzQm1lZGlhJTNEYWQlM0JyJTNEMTQxNDQxMDE4NQ]]></Impression>
<Impression><![CDATA[http://rc.rlcdn.com/361686.gif]]></Impression>
<Creatives>
<Creative id="140363" sequence="1">
<Linear>
<Duration>00:00:30</Duration>
<TrackingEvents>
<Tracking event="midpoint"><![CDATA[http://brxserv-22.btrll.com/v1/epix/6835714/3858435/84416/140363/AbQ93_XgMgCcRUTi_JAAFJwAACJEsAOuADAAAAAAAiyel-GCNFFg/event.mid]]></Tracking>
<Tracking event="complete"><![CDATA[http://brxserv-22.btrll.com/v1/epix/6835714/3858435/84416/140363/AbQ93_XgMgCcRUTi_JAAFJwAACJEsAOuADAAAAAAAiyel-GCNFFg/event.end]]></Tracking>
</TrackingEvents>
<AdParameters></AdParameters>
<VideoClicks>
<ClickTracking><![CDATA[http://brxserv-22.btrll.com/v1/epix/6835714/3858435/84416/140363/AbQ93_XgMgCcRUTi_JAAFJwAACJEsAOuADAAAAAAAiyel-GCNFFg/event.click]]></ClickTracking>
</VideoClicks>
<MediaFiles>
<MediaFile type="application/x-shockwave-flash" apiFramework="VPAID" height="360" width="640" delivery="progressive">
<![CDATA[http://shim.btrll.com/shim/20141023.75835_master/Scout.swf?type=VPAID&hidefb=true&asset_64=aHR0cDovL3J0ci5pbm5vdmlkLmNvbS9yMS41NDQ1OTU0ZDA5ZTY4OS40MjIxNTcxODtjYj0xNDE0NDEwMTg1O3NpdGVpZD0zODU4NDM1bGluZWl0ZW04NDQxNg&vid_click_url=&config_url_64=&h_64=YnJ4c2Vydi0yMi5idHJsbC5jb20&dn=-&e=p&p=6835714&s=3858435&l=84416&ic=140363&ii=20463&iq=t&cx=&x=AbQ93_XgMgCcRUTi_JAAFJwAACJEsAOuADAAAAAAAiyel-GCNFFg&adc=false&t=33&si=&vh_64=Z2VvLXJ0YnNlcnYtdjIuYnRybGwuY29t&apep=0.05&hbp=0.01&view=vast2]]>
</MediaFile>
</MediaFiles>
</Linear>
</Creative>
</Creatives>
</InLine>
And here are the settings I used:
Reference Name: mediaFileUrl_VASTAdTagURI
Regular Expression: <MediaFile type="application//x-shockwave-flash" apiFramework="VPAID" height="360" width="640" delivery="progressive"><([^"]+)http:\/\/([^"]+)]]>>
Template: $1$$2$
Match No.: -1
Default Value: No mediaFileUrl_VASTAdTagURI
The result is always the default value (No mediaFileUrl_VASTAdTagURI). Any clue what is wrong with the regular expression?

JMeter provides the XPath Extractor to deal with XML and XHTML data. It can also work for HTML, but you'll have to check the Use Tidy box so that JMeter can use JTidy to process the HTML.
XPath expression to extract contents of CDATA should look something like:
//MediaFile/text()[2]
See the XPath Tutorial for more details. A few tools that can help in building/debugging XPath expressions:
XPath Checker Firefox add-on
FirePath Firefox add-on
JMeter's View Results Tree listener, which provides an XPath Tester as well
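The same idea can be tried outside JMeter with the XPath support that ships with the JDK. This is a sketch under assumptions: the VAST sample is shortened, its example.com URL is a placeholder, and normalize-space(//MediaFile) is used instead of text()[2] so that the whitespace around the CDATA section is trimmed regardless of how the parser splits text nodes. MediaFileUrlExtractor and extractMediaFileUrl are made-up names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class MediaFileUrlExtractor {
    // Parses the VAST document and returns the trimmed text of the first MediaFile element.
    static String extractMediaFileUrl(String vastXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(vastXml)));
            return XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("normalize-space(//MediaFile)", doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read VAST XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sample in the shape of the response above.
        String vast =
                "<VAST version=\"2.0\"><Ad id=\"sample\"><InLine><Creatives><Creative><Linear><MediaFiles>\n"
              + "<MediaFile type=\"application/x-shockwave-flash\" apiFramework=\"VPAID\">\n"
              + "<![CDATA[http://example.com/Scout.swf?type=VPAID&view=vast2]]>\n"
              + "</MediaFile>\n"
              + "</MediaFiles></Linear></Creative></Creatives></InLine></Ad></VAST>";
        System.out.println(extractMediaFileUrl(vast));
    }
}
```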

Ant task replace escaped URI param

I'm trying to use Ant to remove a URL parameter in an XML file.
The line in the XML is similar to the one below.
<from uri="http://www.google.com?q=test&somethingElse=something" />
I want to remove "&somethingElse=something". The value "something" can vary, so the match must be generic.
I've tried:
<replaceregexp file="somefile.xml" match="&somethingElse(.*)" replace='" />' flags="gs" byline="true" />
<replaceregexp file="somefile.xml" match="\&somethingElse(.*)" replace='" />' byline="true" />
<replaceregexp file="somefile.xml" match="(&)somethingElse(.*)" replace='" />' flags="gs" byline="true" />
but those don't seem to work.
$(ant.regexp.regexpimpl) is not set so the default engine is being used.

In order to get & you need to write &amp; in the Ant build file, because it is XML. To match &somethingElse in the input of the replaceregexp Ant task, you might therefore need to specify &amp;somethingElse in the Ant build file.
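As for the generic match itself, a pattern that stops at the next & or closing quote avoids swallowing the rest of the line the way (.*) does. Here is a small Java sketch of such a pattern, chosen because java.util.regex is easy to test in isolation. UriParamRemover and removeParam are made-up names, and the optional amp; part is an assumption that the file may contain the escaped form &amp;somethingElse.

```java
import java.util.regex.Pattern;

final class UriParamRemover {
    // Matches "&somethingElse=<value>" or the XML-escaped "&amp;somethingElse=<value>".
    // The value ends at the next '&' or '"', so the closing quote and tag are kept.
    private static final Pattern PARAM =
            Pattern.compile("&(?:amp;)?somethingElse=[^\"&]*");

    static String removeParam(String line) {
        return PARAM.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        String line = "<from uri=\"http://www.google.com?q=test&somethingElse=something\" />";
        System.out.println(removeParam(line));
    }
}
```

In the build file, the & at the start of the same pattern would have to be written as &amp; inside the match attribute.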

Regular expression: match youtube links, but not youtube embed code

Could you help me, please? I need a regular expression that matches a string like this:
http://www.youtube.com/watch?v=eE4qPqMYsp8
but not this:
<object width="500" height="700"><param name="movie" value="http://www.youtube.com/v/eE4qPqMYsp8&hl=ru&fs=1&rel=0" /><param name="wmode" value="transparent" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed src="http://www.youtube.com/v/eE4qPqMYsp8&hl=ru&fs=1&rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" wmode="transparent" width="500" height="700">
I have this code:
%(?:(http://){0,1}(www.){0,1}youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|(http://){0,1}(www.){0,1}youtu\.be/)([^"&?/ ]{11})%
I don't know how to exclude some parameters.

How about an expression like this:
(?:https?://)?(?:www\.)?youtube\.com/watch.+?\bv=[a-zA-Z0-9]+
You can certainly add in more options (e.g. (?:-nocookie)), but it might be specific enough like this already.
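A quick way to check the suggested expression against both inputs from the question is a small Java harness. YoutubeWatchMatcher and containsWatchLink are illustrative names. One caveat to keep in mind: the character class [a-zA-Z0-9] does not include - or _, which can appear in video IDs, so the match may stop early on such IDs.

```java
import java.util.regex.Pattern;

final class YoutubeWatchMatcher {
    // The expression from the answer, with backslashes doubled for a Java string literal.
    private static final Pattern WATCH = Pattern.compile(
            "(?:https?://)?(?:www\\.)?youtube\\.com/watch.+?\\bv=[a-zA-Z0-9]+");

    // True if the text contains a youtube.com/watch link with a v= parameter.
    static boolean containsWatchLink(String text) {
        return WATCH.matcher(text).find();
    }

    public static void main(String[] args) {
        String watch = "http://www.youtube.com/watch?v=eE4qPqMYsp8";
        String embed = "<embed src=\"http://www.youtube.com/v/eE4qPqMYsp8&hl=ru&fs=1&rel=0\" "
                + "type=\"application/x-shockwave-flash\">";
        System.out.println(containsWatchLink(watch));
        System.out.println(containsWatchLink(embed));
    }
}
```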

Evernote export format (ENEX) to HTML, including pictures?

#Solved
The two subquestions I have created have been solved (yay for splitting this one up!), so this one is solved. I'll award the check mark to samjudson, since his answer was the closest. For actual working solutions though, see the below subquestions; both my implemented solutions and the checked answers.
#Deprecated
I am splitting this question into two separate questions, since this is a fairly complicated problem. Answers are still welcome though.
The subquestions are:
XSLT: Convert base64 data into image files
XSLT: Obtaining or matching hashes for base64 encoded data
Hi, just wondering if anyone here has had any success in converting Evernote's export format, which is XML, to HTML including the pictures. I do know that Evernote has an export to HTML function which does this, but I eventually want to do more fancy stuff with it.
I have managed to accomplish getting the text only using the following XSLT:
Sample code removed
See child questions for implemented solutions.
However, at the moment this simply ignores any pictures, and this is where I need help.
Stumbling block #1: Evernote stores its pictures as GIFs or PNGs, and when exported, it embeds them directly in the XML using what appears to be base64 (I could be wrong). I need to be able to reconstitute the pictures. If you open the file in a text editor, look for the huge blocks of data in //note/resource/data. For example (indents added manually):
<resource>
  <data encoding="base64">
    R0lGODlhEAAQAPMAMcDAwP/crv/erbigfVdLOyslHQAAAAECAwECAwECAwECAwECAwECAwECAwEC
    AwECAyH/C01TT0ZGSUNFOS4wGAAAAAxtc09QTVNPRkZJQ0U5LjAHgfNAGQAh/wtNU09GRklDRTku
    MBUAAAAJcEhZcwAACxMAAAsTAQCanBgAIf8LTVNPRkZJQ0U5LjATAAAAB3RJTUUH1AkWBTYSQXe8
    fQAh+QQBAAAAACwAAAAAEAAQAAADSQhgpv7OlDGYstCIMqsZAXYJJEdRQRWRrHk2I9t28CLfX63d
    ZEXovJ7htwr6dIQB7/hgJGXMzFApOBYgl6n1il0Mv5xuhBEGJAAAOw==
  </data>
  <mime>image/gif</mime>
  <resource-attributes>
    <file-name>clip_image001.gif</file-name>
  </resource-attributes>
</resource>
Stumbling block #2: Evernote stores the file name of each picture under the resource node //note/resource/resource-attributes/file-name. However, the actual note that refers to the picture references it not by its file name but by its hash, for example:
<en-media hash="4aaafc3e14314027bb1d89cf7d59a06c" type="image/gif" border="0" width="16" height="16" alt="Alt Text"/>
Can anyone shed some light on how to deal with (base64) encoded binary data inside XML?
Edit
I understand from the comments & answers that plain ol' XSLT won't get the job done when handling images. The XSLT processor I am using is Xalan; however, if it is not good enough for image processing or base64, then please suggest one that is!
Also, as requested, here is a sample Evernote export file. The code clips above are merely selected parts of this. I have stripped it down such that it contains just one note and edited most of the text out of it, and added indents for clarity.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export.dtd">
<en-export export-date="20091029T063411Z" application="Evernote/Windows" version="3.0">
  <note>
    <title>A title here</title>
    <content><![CDATA[
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml.dtd">
      <en-note bgcolor="#FFFFFF">
        <p>Some text here (followed by the picture)
        <p><en-media hash="4aaafc3e14314027bb1d89cf7d59a06c" type="image/gif" border="0" width="16" height="16" alt="A picture"/></p>
        <p>Some more text here (preceded by the picture)
      </en-note>
    ]]></content>
    <created>20090925T063154Z</created>
    <note-attributes>
      <author/>
    </note-attributes>
    <resource>
      <data encoding="base64">
        R0lGODlhEAAQAPMAMcDAwP/crv/erbigfVdLOyslHQAAAAECAwECAwECAwECAwECAwECAwECAwEC
        AwECAyH/C01TT0ZGSUNFOS4wGAAAAAxtc09QTVNPRkZJQ0U5LjAHgfNAGQAh/wtNU09GRklDRTku
        MBUAAAAJcEhZcwAACxMAAAsTAQCanBgAIf8LTVNPRkZJQ0U5LjATAAAAB3RJTUUH1AkWBTYSQXe8
        fQAh+QQBAAAAACwAAAAAEAAQAAADSQhgpv7OlDGYstCIMqsZAXYJJEdRQRWRrHk2I9t28CLfX63d
        ZEXovJ7htwr6dIQB7/hgJGXMzFApOBYgl6n1il0Mv5xuhBEGJAAAOw==
      </data>
      <mime>image/gif</mime>
      <resource-attributes>
        <file-name>clip_image001.gif</file-name>
      </resource-attributes>
    </resource>
  </note>
</en-export>
And this needs to be transformed into this:
<html>
  <body>
    <p>Some text here (followed by the picture)
    <p><img src="clip_image001.gif" border="0" width="16" height="16" alt="A picture"/></p>
    <p>Some more text here (preceded by the picture)
  </body>
</html>
With the file clip_image001.gif being generated and saved.

There is a newer Data URI scheme (http://en.wikipedia.org/wiki/Data_URI_scheme) which may be of some help, provided you only intend to support modern browsers and your images are small (for example, IE8 only supports images under 32k).
Other than that the only other thing you can do is use some external scripts to export the image data to file and use them. This would depend greatly on what XSLT processor you are using.
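Such an external step could look roughly like the following Java sketch. It decodes the base64 payload, can write it to a file, builds a data: URI, and computes an MD5 hex digest. That the en-media hash attribute is the MD5 of the decoded bytes is an assumption here and should be verified against a real export; EnexResources and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class EnexResources {
    // The MIME decoder tolerates the line breaks found inside <data> blocks.
    static byte[] decode(String base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    // Lower-case hex MD5 of the decoded bytes (assumed to correspond to en-media's hash).
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Inline alternative to a separate file: a data: URI usable in an img src attribute.
    static String toDataUri(String mimeType, byte[] data) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    // Writes the decoded bytes under the given directory, e.g. as clip_image001.gif.
    static Path save(Path directory, String fileName, byte[] data) {
        try {
            return Files.write(directory.resolve(fileName), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Only the first bytes of the sample resource, to keep the example short.
        byte[] image = decode("R0lGODlhEAAQ\nAPMAMcDAwP/c");
        System.out.println(md5Hex(image));
        System.out.println(toDataUri("image/gif", image));
    }
}
```

A transform could then look up each resource by its digest and emit either the saved file name or the data: URI as the img src.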

A pure XSLT answer to this issue exists; look at this page
