Regex pattern for <a attributes><img attributes/>

I am trying to write a regex pattern that matches markup like this:
<a attributes="some set of attributes"><img attributes="some set of attributes"/></a>
Rules:
An <a> tag with attributes that directly contains an <img> tag with attributes, and nothing else.
Sample Valid Data:
<a xlink:href="some link" title="Image" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/1999/xhtml">
<img alt="No Image" title="No Image" xlink:href="some path for image" xlink:title="Image" xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink" />
</a>
Invalid:
<a>data<img/></a> (text present, no attributes)
<a><img>abcd</img></a> (text present, no attributes)
<a><img/></a> (no attributes)
Can anyone suggest how to write a pattern for this?
Thank you.

You can do this in a completely bulletproof manner with XPath:
//*[local-name()='a' and count(@*)>0 and *[local-name()='img' and count(@*)>0] and count(.//*)=1 and normalize-space(.)='']
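This selects every element whose local name is 'a' that meets three conditions. It has at least one attribute. It contains no text other than whitespace. It has exactly one descendant element, which is an 'img' child with at least one attribute of its own.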
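For anyone who wants to check these rules outside an XPath engine, here is a rough Python sketch using only the standard library's xml.etree.ElementTree. ElementTree's path language has no local-name(), count() or normalize-space(), so the predicate is spelled out by hand. The helper names are made up for this illustration, the sample wraps the question's markup (lightly trimmed) in a hypothetical <root> element, and the sketch assumes the input is well-formed XML.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree stores namespaced tags as "{uri}local"; keep only the local part,
    # which is what XPath's local-name() returns.
    return tag.rsplit("}", 1)[-1]


def is_linked_image(el):
    """Hand-written version of the XPath predicate (illustrative helper)."""
    if not isinstance(el.tag, str):  # skip comments / processing instructions
        return False
    # local-name()='a' and count(@*)>0
    # (ElementTree, like XPath, does not report xmlns declarations as attributes)
    if local_name(el.tag) != "a" or not el.attrib:
        return False
    # count(.//*)=1 with an img child: exactly one child and no deeper elements.
    # el.iter() yields el itself first, so one descendant means two items.
    if len(el) != 1 or len(list(el.iter())) != 2:
        return False
    img = el[0]
    if not isinstance(img.tag, str) or local_name(img.tag) != "img" or not img.attrib:
        return False
    # normalize-space(.)='': nothing but whitespace anywhere inside the <a>.
    return "".join(el.itertext()).strip() == ""


def find_linked_images(root):
    # root.iter() includes root itself, like // starting from the document.
    return [el for el in root.iter() if is_linked_image(el)]


# The question's samples, wrapped in a hypothetical <root> so they parse as one document.
SAMPLES = """<root xmlns:xlink="http://www.w3.org/1999/xlink">
<a xlink:href="some link" title="Image" xmlns="http://www.w3.org/1999/xhtml">
<img alt="No Image" title="No Image" xlink:href="some path for image" xlink:title="Image" />
</a>
<a>data<img/></a>
<a><img>abcd</img></a>
<a><img/></a>
</root>"""

matches = find_linked_images(ET.fromstring(SAMPLES))
print(len(matches), matches[0].get("title"))  # 1 Image
```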
However, since your example is clearly XML, namespaces and all, perhaps you could reformulate your question to describe your overall task instead of asking "what regex should I use". At the very least, you should probably be paying attention to those namespaces instead of treating namespace declarations as attributes. In the XPath data model they are not attributes, so @* does not count them.
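For example, maybe what you really mean is this? It assumes the xhtml and xlink prefixes are bound to the XHTML and XLink namespace URIs in whatever XPath API you use.
//xhtml:a[@xlink:href and xhtml:img[@xlink:href] and count(.//*)=1 and normalize-space(.)='']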
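As a sketch of what "binding the prefixes" means in practice, here is the namespace-aware version in Python's standard xml.etree.ElementTree. The prefix map uses the URIs declared in the question's markup. ElementTree's path language supports the prefixed step and the [@xlink:href] test, but not nested predicates, count() or normalize-space(), so those checks are done in plain Python. Note that ".//" only searches below the element you call it on. The function name and sample document are illustrative.

```python
import xml.etree.ElementTree as ET

# Prefix bindings the expression needs; the URIs are the ones declared in the question's markup.
NS = {
    "xhtml": "http://www.w3.org/1999/xhtml",
    "xlink": "http://www.w3.org/1999/xlink",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"  # Clark notation used by ElementTree


def find_xhtml_links(root):
    hits = []
    # ElementPath understands the prefixed step and the [@xlink:href] predicate...
    for a in root.iterfind(".//xhtml:a[@xlink:href]", NS):
        # ...but not nested predicates, count() or normalize-space(), so check those here.
        img_ok = any(XLINK_HREF in img.attrib for img in a.findall("xhtml:img", NS))
        one_descendant = len(list(a.iter())) == 2  # the <a> itself plus one element
        no_text = "".join(a.itertext()).strip() == ""
        if img_ok and one_descendant and no_text:
            hits.append(a)
    return hits


doc = """<root xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink">
<a xlink:href="some link" title="Image"><img alt="No Image" xlink:href="some path"/></a>
<a title="no xlink:href"><img xlink:href="some path"/></a>
<a xlink:href="some link" title="img lacks xlink:href"><img alt="x"/></a>
</root>"""

print([a.get("title") for a in find_xhtml_links(ET.fromstring(doc))])  # ['Image']
```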

Related

Can't read the XML node elements in ColdFusion

I'm trying to read some values from the XML file which I created, but it gives me the following error:
coldfusion.runtime.UndefinedElementException: Element MYXML.UPLOAD is undefined in XMLDOC.
Here is my code:
<cffile action="read" file="#expandPath("./config.xml")#" variable="configuration" />
<cfset xmldoc = XmlParse(configuration) />
<div class="row"><cfoutput>#xmldoc.myxml.upload-file.size#</cfoutput></div>
Here is my config.xml:
<myxml>
<upload-file>
<size>15</size>
<accepted-format>pdf</accepted-format>
</upload-file>
</myxml>
Can someone help me figure out what the error is?
When I print the entire variable with <div class="row"><cfoutput>#xmldoc#</cfoutput></div>, it shows the values as
15 pdf
The problem is the hyphen (-) in the <upload-file> node name in your XML. In dot notation, ColdFusion reads xmldoc.myxml.upload-file.size as the subtraction xmldoc.myxml.upload - file.size, which is why the error says that MYXML.UPLOAD is undefined. If you control the XML contents, the easiest fix is to avoid hyphens in your node names. If you cannot control the XML contents, you will need to work around the issue.
Ben Nadel has a pretty good blog article on the topic: Accessing XML Nodes Having Names That Contain Dashes In ColdFusion
From that article:
To get ColdFusion to see the dash as part of the node name, we have to "escape" it, for lack of a better term. To do so, we either have to use array notation and define the node name as a quoted string; or, we have to use xmlSearch() where we can deal directly with the underlying document object model.
He goes on to give examples. As he shows in that article, you can either quote the node name using array (bracket) notation, like this:
<div class="row">
<cfoutput>#xmldoc.myxml["upload-file"].size#</cfoutput>
</div>
Or you can use the xmlSearch() function, which evaluates an XPath expression and returns an array of the matching nodes, like this:
<cfset xmlarray = xmlSearch(xmldoc,"/myxml/upload-file")>
<div class="row">
<cfoutput>#xmlarray[1].size#</cfoutput>
</div>
Both of these examples will output 15.
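The same idea of using a path string rather than dot notation carries over to other languages. As a point of comparison only, here is the lookup in Python's standard xml.etree.ElementTree. In Python, too, upload-file could not be written as a bare identifier, because - is the subtraction operator. Inside a path string, however, it is simply part of the element name.

```python
import xml.etree.ElementTree as ET

CONFIG = """<myxml>
  <upload-file>
    <size>15</size>
    <accepted-format>pdf</accepted-format>
  </upload-file>
</myxml>"""

root = ET.fromstring(CONFIG)  # root is the <myxml> element
# Inside a path string "-" is just part of the element name, so no quoting is needed.
size = root.findtext("upload-file/size")
accepted = root.findtext("upload-file/accepted-format")
print(size, accepted)  # 15 pdf
```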
I created a gist for you to see these examples as well.

XSLT how to ignore {} characters

I am creating an XSL file.
I want to output the following:
<li class="td-nav-flyout {position:'containerleft'}">
But when I run it, Java reports "cannot compile stylesheet".
Please help.
Thanks in advance.
-Ritesh
What about the xsl:attribute element?
<li>
<xsl:attribute name="class">td-nav-flyout {position:'containerleft'}</xsl:attribute>
</li>
Use the following:
<li class="td-nav-flyout {{position:'containerleft'}}">
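In XSLT, curly braces inside the attribute of a literal result element mark an attribute value template. The text between { and } is evaluated as an XPath expression, and position:'containerleft' is not a valid expression, so the stylesheet fails to compile. Doubling the braces ({{ and }}) outputs literal braces instead. See Attribute Value Templates in the XSLT spec for full details.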
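The doubling rule is not unique to XSLT. As an analogy only (this is Python, not XSLT), Python's str.format uses the same convention, which makes it easy to see what the escaped value turns into:

```python
# In str.format, "{...}" is a replacement field, much as "{...}" in an XSLT
# attribute value is an XPath expression; "{{" and "}}" produce literal braces.
template = "td-nav-flyout {{position:'containerleft'}}"
css_class = template.format()
print(f'<li class="{css_class}">')  # <li class="td-nav-flyout {position:'containerleft'}">
```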

Could anyone tell me why / how this XSS vector works in the browser?

I have suffered a number of XSS attacks against my site. The following HTML fragment is the XSS vector that has been injected by the attacker:
<a href="mailto:">
<a href=\"http://www.google.com onmouseover=alert(/hacked/); \" target=\"_blank\">
<img src="http://www.google.com onmouseover=alert(/hacked/);" alt="" /> </a></a>
It looks like the script shouldn't execute, but using IE9's developer tools, I could see that the browser translates the HTML into the following:
<a href="mailto:"/>
<a onmouseover="alert(/hacked/);" href="\"http://www.google.com" target="\"_blank\"" \?="">
</a/>
After some testing, it turns out that the \" is what makes the "onmouseover" attribute "live", but I don't know why. Does anyone know why this vector succeeds?
So to summarize the comments:
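Putting a character in front of the quote turns the quote into part of an unquoted attribute value instead of a delimiter marking the start and end of the value. HTML has no backslash escapes, so href=\"http://www.google.com is simply the unquoted value \"http://www.google.com, backslash and quote included.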
This works just as well:
href=a"http://www.google.com onmouseover=alert(/hacked/); \"
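HTML allows unquoted attribute values, and an unquoted value ends at the first whitespace. So the value of href stops at the space, and onmouseover=alert(/hacked/); is parsed as a second, separate attribute with its own value.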
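To see the split without a browser, here is a sketch using Python's standard html.parser. It is not a browser engine, but it also ends an unquoted attribute value at whitespace, so it shows how the injected <a> tag is tokenized. For comparison, the <img> tag keeps the same text safely inside a quoted src value. The class name is made up, and the outer <a href="mailto:"> is left out for brevity.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record (tag, attrs) for each start tag so the attribute split is visible."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


# The injected fragment from the question (raw strings keep the backslashes).
injected = (
    r'<a href=\"http://www.google.com onmouseover=alert(/hacked/); \" target=\"_blank\">'
    r'<img src="http://www.google.com onmouseover=alert(/hacked/);" alt="" /></a>'
)

logger = TagLogger()
logger.feed(injected)
logger.close()
for tag, attrs in logger.tags:
    print(tag, attrs)
```

In the <a> tag, onmouseover shows up as its own attribute. In the <img> tag, it stays part of the src string.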

Xsl - How to select value from attribute whose name contains special character

My goal is to get data from Google Places. I have the following HTML snippet:
<div class="rsw-pp rsw-pp-widget">
<div g:type="AverageStarRating"
g:secondaryurls="http://maps.google.com/?cid=12948004443906002997"
g:decorateusingsecondary="http://maps.google.com/?cid=12948004443906002997" g:groups="maps" g:rating_override="2.998000" class="rsw-stars">
</div>
</div>
I want to get the value of g:rating_override. I tried the following XSL:
<Rating>
<xsl:value-of
select="//div[@class='rsw-pp rsw-pp-widget']/div[@class='rsw-stars']/@g:rating_override" />
</Rating>
It fails with 'System.Xml.Xsl.XsltException: Prefix 'g' is not defined'. Can you help me?
You need to declare the "g" namespace prefix. This is typically done on the stylesheet element:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:g=" ... "
version="1.0">
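Here is the same namespace point outside XSLT, as a small sketch with Python's standard xml.etree.ElementTree. The URI bound to g below is a made-up placeholder. The real one is whatever the source document declares. Without an xmlns:g declaration, an XML parser rejects the snippet outright (unbound prefix). ElementTree paths also cannot select attribute nodes, so the sketch selects the element and reads the attribute with get().

```python
import xml.etree.ElementTree as ET

# Hypothetical URI: bind "g" to whatever URI the real document declares.
G_NS = "urn:example:google-widgets"

snippet = f"""<div xmlns:g="{G_NS}" class="rsw-pp rsw-pp-widget">
  <div g:type="AverageStarRating" g:groups="maps"
       g:rating_override="2.998000" class="rsw-stars"/>
</div>"""

root = ET.fromstring(snippet)
# ElementPath can filter on an attribute value but cannot return attribute nodes,
# so select the element first and read the attribute by its {uri}local name.
stars = root.find("div[@class='rsw-stars']")
rating = stars.get(f"{{{G_NS}}}rating_override")
print(rating)  # 2.998000
```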
If you're using the Html Agility Pack embedded XSLT tools, you're facing two hard Html Agility Pack limits (at least with version 1.3.0.0):
Support for namespaces is limited.
The XPath implementation does not support navigating to attributes (only selection). For example, the XPath "//tag1/tag2/@myatt" does not work.
You can overcome these limits with C# code, but not easily with pure XPath, and hence not with XSLT.
In such cases, it's often easier to convert the HTML to XML using the Html Agility Pack, and then run a regular XSLT over the XML with the standard .NET classes, instead of running XSLT over the HTML with the Html Agility Pack classes.

How to write this Regex

HTML:
<dt>
<a href="#profile-experience" >Past</a>
</dt>
<dd>
<ul class="past">
<li>
President, CEO & Founder <span class="at">at</span> China Connection
</li>
<li>
Professional Speaker and Trainer <span class="at">at</span> Edgemont Enterprises
</li>
<li>
Nurse & Clinic Manager <span class="at">at</span> <span>USAF</span>
</li>
</ul>
</dd>
I want to match the <li> nodes.
I wrote this regex:
<dt>.+?Past+?</dt>\s+?<dd>\s+?<ul class=""past"">\s+?(?:<li>\s*?([\W\w]+?)+?\s*?</li>)+\s+?</ul>
But it does not work.
Do not parse HTML with a regex as if it were just a big pile of text. Using a DOM parser is the proper way.
Don't use regular expressions to parse HTML...
Don't use a regular expression to match an HTML document. It is better to parse it into a DOM tree using a simple state machine instead.
I'm assuming you're trying to get the HTML list items. Since you haven't said which language you're using, here's a little pseudocode to get you going:
Pseudo code:
while (iterating through the text)
    if (<li> matched)
        find the position of the next </li>
        store the substring between <li> and </li> in a variable
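A literal Python translation of that pseudocode might look like this. It is a sketch only: it looks for the exact string "<li>", so an <li class="..."> or a nested list would defeat it, which is the usual argument for a real parser. The function name is made up.

```python
def extract_list_items(html):
    """Literal version of the pseudocode: collect text between each <li> and the next </li>."""
    open_tag, close_tag = "<li>", "</li>"
    items = []
    pos = 0
    while True:
        start = html.find(open_tag, pos)   # if (<li> matched)
        if start == -1:
            break
        end = html.find(close_tag, start)  # find the position of the next </li>
        if end == -1:
            break                          # unterminated <li>: stop scanning
        items.append(html[start + len(open_tag):end].strip())
        pos = end + len(close_tag)         # continue after this item
    return items


print(extract_list_items("<ul><li> a </li><li>b <span>c</span></li></ul>"))
# ['a', 'b <span>c</span>']
```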
There are, of course, numerous third-party libraries that do this sort of thing. Depending on your development environment, you might already have a function that does this (e.g. in JavaScript).
Which language do you use?
If you use Python, you should try lxml: http://lxml.de. With lxml, you can search for the ul node with class "past", then retrieve its li children and get the text of those nodes.
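lxml is a third-party package. As a dependency-free sketch of the same idea (find the ul with class "past", then take the text of each li), the standard library's html.parser can do it with a small event handler. The class name is made up, and the sketch assumes the "past" list contains no nested <ul>.

```python
from html.parser import HTMLParser


class PastItemsParser(HTMLParser):
    """Collect the text of each <li> inside <ul class="past"> (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.in_past_list = False
        self.current = None  # text chunks of the <li> being read, or None
        self.items = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "past" in classes:
            self.in_past_list = True
        elif tag == "li" and self.in_past_list:
            self.current = []

    def handle_endtag(self, tag):
        if tag == "li" and self.current is not None:
            # Join the chunks (text around the <span>s) and collapse whitespace.
            self.items.append(" ".join("".join(self.current).split()))
            self.current = None
        elif tag == "ul":
            self.in_past_list = False  # assumes no nested <ul> inside the list

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


snippet = """<dd><ul class="past">
<li>President, CEO & Founder <span class="at">at</span> China Connection</li>
<li>Nurse & Clinic Manager <span class="at">at</span> <span>USAF</span></li>
</ul></dd>"""

parser = PastItemsParser()
parser.feed(snippet)
parser.close()
print(parser.items)
# ['President, CEO & Founder at China Connection', 'Nurse & Clinic Manager at USAF']
```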
If you are trying to extract from or manipulate this HTML, XPath, XSLT, or CSS selectors in jQuery might be easier and more maintainable than a regex. What exactly is your goal, and in what framework are you working?
Please learn to use jQuery for this sort of thing.