Verify whether an href has a valid URL in XSLT 1.0

I need to validate the URL inside an href attribute. If the URL is valid, do nothing; otherwise, remove the href attribute from the <a> tag. Any general regex or other kind of URL validation that checks the href can be used for this.
Example:
tinyurl
valid url
invalid url
Result:
<a rel="nofollow">tinyurl</a>
valid url
<a rel="nofollow">invalid url</a>
Thanks in advance. Any clue/help given is appreciated.
A regex that may help:
/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+(:[0-9]+)?|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/

Michael Sperberg-McQueen has defined XSD types that match different flavours of URI in
http://www.w3.org/2011/04/XMLSchema/TypeLibrary-URI-RFC3986.xsd
and
http://www.w3.org/2011/04/XMLSchema/TypeLibrary-IRI-RFC3987.xsd
To see the way these complex regular expressions are constructed, view these documents at the raw XML level using (for example) curl.
Regular expressions can be used for pattern matching in XSLT 2.0 (for example with the matches() function), but XSLT 1.0 has no regular expression support.
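XSLT 1.0 has no regex support, so the sketch below shows the same "keep valid hrefs, drop invalid ones" idea in Python rather than XSLT. It is a swapped-in technique, not an XSLT solution. It assumes the input is well-formed XML. It also treats a URL as valid only if it has an http or https scheme and a host. That rule lives in the hypothetical helper `is_valid_url` and is an assumption, so replace it with the regex above or a stricter check as needed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def is_valid_url(href: str) -> bool:
    """Hypothetical validity rule (an assumption): absolute http(s) URL with a host."""
    parts = urlparse(href.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def strip_invalid_hrefs(xml_text: str) -> str:
    """Remove the href attribute from every <a> whose URL fails is_valid_url."""
    root = ET.fromstring(xml_text)
    for a in root.iter("a"):
        href = a.get("href")
        if href is not None and not is_valid_url(href):
            # Drop only the attribute; the <a> element, rel and text stay.
            del a.attrib["href"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = (
        '<p><a href="http://example.com/" rel="nofollow">valid url</a>'
        '<a href="not a url" rel="nofollow">invalid url</a></p>'
    )
    print(strip_invalid_hrefs(doc))
```

The first link keeps its href. The second comes out as <a rel="nofollow">invalid url</a>, which is the shape of the expected result in the question.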

Related

Using a regular expression, how can I get the href URL of an anchor tag with a particular class value, such as foo?

Can anyone help me with a PHP regex (regular expression)? I want to get all URLs from anchors that have a certain attribute value. In the following example, I want to get all href URLs that have a class of 'foo'.
<a title="foo" href="http://foo.com/" class="foo">Foo</a>
Bar
<a class="foo" title="foobar" href="http://foobar.com/">FooBar</a>
The result should match these 2 URLs:
http://foo.com/
http://foobar.com/
I know this can be done easily using PHP packages such as DOM crawlers, but I want to use a PHP regex.
class="foo"[^>]*href="([^"]*)"[^>]*|href="([^"]*)"[^>]*class="foo"
[^>]* matches the other attributes; the URL ends up in group 1 or group 2, depending on which alternative matched.
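To check the alternation, the sketch below runs the same pattern with Python's re module. This is a stand-in for PHP's preg_match_all, for illustration only. The URL lands in group 1 or group 2 depending on which alternative matched, so the hypothetical helper `foo_hrefs` takes whichever group is set.

```python
import re
from typing import List

# Same pattern as above: class before href fills group 1, href before class fills group 2.
FOO_HREF = re.compile(
    r'class="foo"[^>]*href="([^"]*)"[^>]*|href="([^"]*)"[^>]*class="foo"'
)


def foo_hrefs(html: str) -> List[str]:
    """Return the href of every tag whose class attribute is exactly "foo"."""
    urls = []
    for m in FOO_HREF.finditer(html):
        # Only one alternative matched, so exactly one of the two groups is set.
        urls.append(m.group(1) if m.group(1) is not None else m.group(2))
    return urls


if __name__ == "__main__":
    html = (
        '<a title="foo" href="http://foo.com/" class="foo">Foo</a>\n'
        '<a class="foo" title="foobar" href="http://foobar.com/">FooBar</a>'
    )
    print(foo_hrefs(html))  # ['http://foo.com/', 'http://foobar.com/']
```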

JMeter: use a regex to get link text

I want to use JMeter's Regular Expression Extractor to catch a link from an HTTP response. How do I catch only what's inside the <a> tag? I want the TEXT.
<a([^>]+)>(.+?)<\/a>
The expression above gives me the whole link, including the <a> tag and the href.
I would recommend not using regular expressions to get data from HTML, as the href attribute may be positioned differently, on a new line, etc. See the epic comment on SO for a detailed explanation.
JMeter provides 2 test elements which can be used to extract the href attribute from HTML page links:
XPath Extractor
CSS/JQuery Extractor
XPath Example
Add an XPath Extractor as a child of the request (just like the Regular Expression Extractor)
Configure it as follows:
If your response is not XHTML-compliant, check the Use Tidy box
Reference name - anything meaningful, e.g. href
XPath query - //a/@href
You can refer to the extracted link URL as ${href} anywhere in the current thread group.
In case of multiple matches, the URLs can be accessed as ${href_1}, ${href_2}, etc.
For more information on the XPath Extractor, see the Using the XPath Extractor in JMeter guide.
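Outside JMeter, the //a/@href query can be approximated with Python's xml.etree.ElementTree. Its limited XPath support selects elements but not attribute nodes, so the sketch reads href with .get() instead. The href_1, href_2 keys only mimic how JMeter names multiple matches, and the helper name `extract_hrefs` is hypothetical. The sketch assumes the response is well-formed XHTML (what Use Tidy is for) with no default namespace declared.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def extract_hrefs(xhtml: str, ref_name: str = "href") -> Dict[str, str]:
    """Collect every <a href> value, keyed like JMeter's ref_name_1, ref_name_2, ..."""
    root = ET.fromstring(xhtml)
    # ".//a" finds <a> elements at any depth below the root, roughly like //a.
    links = [a.get("href") for a in root.iterfind(".//a") if a.get("href") is not None]
    return {f"{ref_name}_{i}": url for i, url in enumerate(links, start=1)}


if __name__ == "__main__":
    page = '<html><body><a href="/one">1</a><p><a href="/two">2</a></p></body></html>'
    print(extract_hrefs(page))  # {'href_1': '/one', 'href_2': '/two'}
```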
CSS/JQuery Example
Add a CSS/JQuery Extractor as a child of the request
Configure it as follows:
Reference name - any variable name, e.g. href
CSS/JQuery expression - a
Attribute - href
Match no:
default is blank - returns the first link
any number > 0 - returns the match with that number
0 - returns a random link URL
-1 - returns all link URLs and stores them as ${href_1}, ${href_2}, etc.
For information on building CSS/JQuery expressions, refer to the JSoup selector syntax guide
Try with this:
<a[^>]* href="([^"]*)"
regular expression for finding 'href' value of a <a> link
Try this, and use group 1 to get the content of the tag:
<a(?: [^>]+)?>((?:(?!<\/?a[ >]).)*)<\/a>
See demo: http://regex101.com/r/rV3eH6/1
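The sketch below runs that pattern with Python's re module instead of JMeter, to show that group 1 holds only the link text. The sample markup is made up. In this sketch `.` does not match newlines, so a link whose text spans lines would be skipped unless re.DOTALL is added.

```python
import re
from typing import List

# Group 1: everything between <a ...> and </a> that does not start another <a> or </a> tag.
LINK_TEXT = re.compile(r'<a(?: [^>]+)?>((?:(?!<\/?a[ >]).)*)<\/a>')


def link_texts(html: str) -> List[str]:
    # With exactly one group in the pattern, findall returns that group's text.
    return LINK_TEXT.findall(html)


if __name__ == "__main__":
    sample = '<p><a href="x.html">TEXT</a> and <a class="c" href="y">More</a></p>'
    print(link_texts(sample))  # ['TEXT', 'More']
```

The ` [^>]+` part requires a space after `<a`, so tags such as <abbr> are not mistaken for links.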

JMeter token value extraction

Using JMeter, I was trying to extract the value of a token from the following with the Regular Expression Extractor:
<input name="__RequestVerificationToken" type="hidden"
value="BeRYiSIRjZoQHq4VW8qbkgXlnnzdUINpFNoYF_ugx-FRk0tkImbQPhwyYjyz_0Q-w6F2A0gDOfMZrdklD6rVn6-QnYggfImb55f90V7nrD_kbSkT3-y3gPqoTFg0ynTBLyX5Lw2" />
When I used the following expression:
name="__RequestVerificationToken" type="hidden" value="(.+?)"
the value was not extracted.
After a few searches I used the following expression:
name="__RequestVerificationToken" type="hidden" value="([A-Za-z0-9-_]+?)"
which worked, but I don't know why :D
My question: why didn't the first expression work, since it basically says to extract any character, one or more times?
Use this:
name="__RequestVerificationToken" type="hidden"\s*value="(.+?)"
or, better still:
name="__RequestVerificationToken" type="hidden"\s*value="([^"]*)"
Neither of yours will work, because there is a \n between type and value that you have not taken care of. With \s* in place it works; see the demo:
http://regex101.com/r/dK1xR4/14
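The newline explanation can be checked with a quick sketch. Python's re module is used only for illustration, and these particular patterns should behave the same way in JMeter's Perl-style regexes. The input has the same shape as the markup in the question, but the token value is shortened and made up.

```python
import re

# Same markup shape as the question: a line break between type="hidden" and value=.
html = '<input name="__RequestVerificationToken" type="hidden"\nvalue="abc-123_XYZ" />'

# A literal space before value= cannot match the newline, so this search fails.
literal_space = re.search(
    r'name="__RequestVerificationToken" type="hidden" value="(.+?)"', html
)

# \s* accepts any run of whitespace, including the newline.
flexible = re.search(
    r'name="__RequestVerificationToken" type="hidden"\s*value="([^"]*)"', html
)

print(literal_space)      # None
print(flexible.group(1))  # abc-123_XYZ
```

The [^"]* variant is preferred because it cannot run past the closing quote of the value.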
First of all, don't use Regular Expressions to extract data from HTML. It is complicated and very fragile in case of even slight DOM changes.
JMeter provides the following components to extract data from HTML responses:
XPath Extractor
CSS/JQuery Extractor
XPath Extractor Guide
Add an XPath Extractor as a child of the request that produces the response
Configure it as follows:
Reference name: anything meaningful, e.g. token
XPath query: //input[@name='__RequestVerificationToken']/@value
If your response is not valid XHTML, check the Use Tidy box
Refer to the extracted value as ${token} or ${__V(token)} where required. Remember that the scope of JMeter Variables is limited to the current thread group.
For more information, see Using the XPath Extractor in JMeter.
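Python's xml.etree.ElementTree accepts a similar [@name='...'] predicate, so the lookup behind that XPath query can be sketched as below for comparison. It assumes the page parses as XML (roughly what Use Tidy provides) and has no default namespace. The helper name `find_token` and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_token(xhtml: str) -> Optional[str]:
    """Return the value of the __RequestVerificationToken input, or None if absent."""
    root = ET.fromstring(xhtml)
    # ElementTree supports [@attr='value'] predicates, but not a trailing /@value step,
    # so the attribute is read with .get() instead.
    node = root.find(".//input[@name='__RequestVerificationToken']")
    return None if node is None else node.get("value")


if __name__ == "__main__":
    page = (
        '<html><body><form><input name="__RequestVerificationToken" type="hidden"\n'
        'value="abc-123" /></form></body></html>'
    )
    print(find_token(page))  # abc-123
```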
CSS/JQuery Extractor Guide
Add a CSS/JQuery Extractor as a child of the request that produces the authentication token response
Configure it as follows:
Reference name: anything meaningful, e.g. token
CSS/JQuery expression: input[name=__RequestVerificationToken]
Attribute: value
Refer to the extracted value as ${token} or ${__V(token)} where required. The same restriction on JMeter Variables scope applies.
See the JSoup selector syntax guide for reference on how to build CSS selectors.
Hope this helps.

Looking for a regex to erase href text

If I have a bunch of urls like this:
<li>Xyz 123</li>
<li>Xyz 345</li>
What would a regex look like to erase the urls inside the hrefs so that they become:
<li>Xyz 123</li>
<li>Xyz 345</li>
The following should do what you like:
/href=\"([^\"]*)\"/
Basically match href="<any text but a '"'>".
Search for <a href="[^"]*" and replace with <a href="".
If you add more details about which language you're using, I can be more specific. Be aware also that regular expressions are usually not the tool of choice when dealing with HTML.
First of all, do not use regex to parse HTML — why? Have a look here or here.
Process the HTML using an XML reader / XML document processing engine. Then use XPath to find nodes matching your criteria and alter href attributes in the DOM.
Note: For HTML which is not well-formed XML a more-general HTML (SGML) parser is required.
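The sketch below applies that approach with Python's standard xml.etree.ElementTree and assumes well-formed input. It empties every href on <a> elements, which is the result the question asks for. The helper name `erase_hrefs` and the sample list are hypothetical. For HTML that is not well-formed XML, an HTML parser would be needed instead, as the note above says.

```python
import xml.etree.ElementTree as ET


def erase_hrefs(xml_text: str) -> str:
    """Set every <a href="..."> to href="" while keeping the link text."""
    root = ET.fromstring(xml_text)
    for a in root.iter("a"):
        if "href" in a.attrib:
            a.set("href", "")  # keep the attribute, drop its URL
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = '<ul><li><a href="http://example.com/1">Xyz 123</a></li><li><a href="/2">Xyz 345</a></li></ul>'
    print(erase_hrefs(doc))
```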
I partially agree with the others, but a more complete version would be:
s/(<a[^>]+href\s*=\s*\")(.*?)("[^>]*>)/$1$3/gi
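Translated to Python's re.sub, where $1$3 becomes \1\3, that substitution keeps the start of the tag and the closing quote and drops the URL between them. This is a sketch for simple, double-quoted href attributes only. Single-quoted or unquoted values are not handled, and the helper name `erase_href_urls` is hypothetical.

```python
import re

# \1 = '<a ... href="', \2 = the URL (discarded), \3 = the closing quote plus the rest of the tag.
HREF_VALUE = re.compile(r'(<a[^>]+href\s*=\s*")(.*?)("[^>]*>)', re.IGNORECASE)


def erase_href_urls(html: str) -> str:
    return HREF_VALUE.sub(r"\1\3", html)


if __name__ == "__main__":
    print(erase_href_urls('<li><a href="http://x.com/123">Xyz 123</a></li>'))
    # <li><a href="">Xyz 123</a></li>
```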

How to match a URL non-greedily

I am hoping someone can help me make this match non-greedy. I am using JavaScript and Classic ASP.
.match(/(<a\s+.*?><\/a>)/ig);
The purpose is to extract URLs from a page in this format: <a href ></a>
I need to capture just the URL.
Thanks
Try the following:
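.match(/(<a\s+[^>]*?href="([^"]*)"[^>]*>)/ig);
The URL is in group 2. With the g flag, match() returns only the whole matched tags, so to read group 2 for each link, call exec() in a loop instead.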
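The difference between capturing whole tags and capturing just the URL can be sketched in Python. Here re.findall with a single group returns only that group, unlike JavaScript's match() with the g flag. The sample markup and the helper name `hrefs` are made up. [^>]* and [^"]* keep each match inside one tag and one attribute value, which is what the non-greedy .*? was trying to achieve.

```python
import re
from typing import List

# Group 1 captures only the href value; [^>]* stops a match from leaving the tag.
HREF = re.compile(r'<a\s+[^>]*?href="([^"]*)"[^>]*>', re.IGNORECASE)


def hrefs(html: str) -> List[str]:
    return HREF.findall(html)


if __name__ == "__main__":
    page = '<a name="top"></a><a class="x" href="http://a.example/">A</a> <a href="/b"></a>'
    print(hrefs(page))  # ['http://a.example/', '/b']
```

The first tag has no href, and because [^>]* cannot cross its closing >, no match starts there.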