Apache JMeter Regular Expressions Extractor Error - regex

I have made an HTTP request to a webpage, and it responds successfully with VAST code (XML). Afterwards, I tried to use the Apache JMeter Regular Expression Extractor to extract a URL from the MediaFile tag in the returned XML, but it doesn't work.
Here is the response data (VAST XML):
<?xml version="1.0" encoding="UTF-8"?>
<VAST version="2.0">
<Ad id="brightroll_ad">
<InLine>
<AdSystem>BrightRoll</AdSystem>
<AdTitle></AdTitle>
<Impression><![CDATA[http://brxserv-22.btrll.com/v1/epix/6835714/3858435/84416/140363/AbQ93_XgMgCcRUTi_JAAFJwAACJEsAOuADAAAAAAAiyel-GCNFFg/event.imp/r_64.aHR0cDovL2Iuc2NvcmVjYXJkcmVzZWFyY2guY29tL3A_JmMxPTgmYzI9NjAwMDAwNiZjMz04NDQxNiZjND0zODU4NDM1JmM1PTIwNDYzJmM2PTY4MzU3MTQmYzEwPTE0MDM2MyZjdj0xLjcmY2o9MSZybj0xNDE0NDEwMTg1JnI9aHR0cCUzQSUyRiUyRnBpeGVsLnF1YW50c2VydmUuY29tJTJGcGl4ZWwlMkZwLWNiNkMwekZGN2RXakkuZ2lmJTNGbGFiZWxzJTNEcC42ODM1NzE0LjM4NTg0MzUuMCUyQ2EuMjA0NjMuODQ0MTYuMTQwMzYzJTJDdS45NjguNjQweDM2MCUzQm1lZGlhJTNEYWQlM0JyJTNEMTQxNDQxMDE4NQ]]></Impression>
<Impression><![CDATA[http://rc.rlcdn.com/361686.gif]]></Impression>
<Creatives>
<Creative id="140363" sequence="1">
<Linear>
<Duration>00:00:30</Duration>
<TrackingEvents>
<Tracking event="midpoint"><![CDATA[http://brxserv-22.btrll.com/v1/epix/6835714/3858435/84416/140363/AbQ93_XgMgCcRUTi_JAAFJwAACJEsAOuADAAAAAAAiyel-GCNFFg/event.mid]]></Tracking>
<Tracking event="complete"><![CDATA[http://brxserv-22.btrll.com/v1/epix/6835714/3858435/84416/140363/AbQ93_XgMgCcRUTi_JAAFJwAACJEsAOuADAAAAAAAiyel-GCNFFg/event.end]]></Tracking>
</TrackingEvents>
<AdParameters></AdParameters>
<VideoClicks>
<ClickTracking><![CDATA[http://brxserv-22.btrll.com/v1/epix/6835714/3858435/84416/140363/AbQ93_XgMgCcRUTi_JAAFJwAACJEsAOuADAAAAAAAiyel-GCNFFg/event.click]]></ClickTracking>
</VideoClicks>
<MediaFiles>
<MediaFile type="application/x-shockwave-flash" apiFramework="VPAID" height="360" width="640" delivery="progressive">
<![CDATA[http://shim.btrll.com/shim/20141023.75835_master/Scout.swf?type=VPAID&hidefb=true&asset_64=aHR0cDovL3J0ci5pbm5vdmlkLmNvbS9yMS41NDQ1OTU0ZDA5ZTY4OS40MjIxNTcxODtjYj0xNDE0NDEwMTg1O3NpdGVpZD0zODU4NDM1bGluZWl0ZW04NDQxNg&vid_click_url=&config_url_64=&h_64=YnJ4c2Vydi0yMi5idHJsbC5jb20&dn=-&e=p&p=6835714&s=3858435&l=84416&ic=140363&ii=20463&iq=t&cx=&x=AbQ93_XgMgCcRUTi_JAAFJwAACJEsAOuADAAAAAAAiyel-GCNFFg&adc=false&t=33&si=&vh_64=Z2VvLXJ0YnNlcnYtdjIuYnRybGwuY29t&apep=0.05&hbp=0.01&view=vast2]]>
</MediaFile>
</MediaFiles>
</Linear>
</Creative>
</Creatives>
</InLine>
</Ad>
</VAST>
And here are the settings I have used:
Reference Name: mediaFileUrl_VASTAdTagURI
Regular Expression: <MediaFile type="application//x-shockwave-flash" apiFramework="VPAID" height="360" width="640" delivery="progressive"><([^"]+)http:\/\/([^"]+)]]>>
Template: $1$$2$
Match No.: -1
Default Value: No mediaFileUrl_VASTAdTagURI
The result is always "No mediaFileUrl_VASTAdTagURI". Any clue what the problem with the regular expression is?
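Comparing the pattern with the response, three things stop it from matching: `application//x-shockwave-flash` asks for two slashes where the data has one, the pattern expects `<` immediately after the tag's `>` although the data has a line break there, and `]]>>` requires an extra `>` that the data doesn't have. A looser pattern such as `<MediaFile[^>]*>\s*<!\[CDATA\[(.+?)\]\]>` with Template `$1$` avoids all three. The sketch below checks that pattern with java.util.regex; JMeter's extractor uses its own Perl5-style engine, so treat the transfer as an assumption and confirm it in the RegExp Tester of the View Results Tree listener. Separately, as far as I know the JMeter manual states that with Match No. -1 the plain reference variable always keeps the default value and the matches land in mediaFileUrl_VASTAdTagURI_1, _2, and so on, so Match No. 1 may be what you want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: java.util.regex stands in for JMeter's own regex engine here.
class MediaFileRegex {
    // <MediaFile[^>]*>  : the opening tag with whatever attributes it carries
    // \s*               : the line break between the tag and the CDATA section
    // (.+?)\]\]>        : group 1 = everything up to the first "]]>"
    static final Pattern MEDIA_FILE =
            Pattern.compile("<MediaFile[^>]*>\\s*<!\\[CDATA\\[(.+?)\\]\\]>");

    // Returns the URL, or null when nothing matches (JMeter would use the Default Value).
    static String extract(String response) {
        Matcher m = MEDIA_FILE.matcher(response);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-in for the VAST response above.
        String sample = "<MediaFiles>\n"
                + "<MediaFile type=\"application/x-shockwave-flash\" delivery=\"progressive\">\n"
                + "<![CDATA[http://example.com/Scout.swf?type=VPAID&view=vast2]]>\n"
                + "</MediaFile>\n</MediaFiles>";
        System.out.println(extract(sample)); // prints http://example.com/Scout.swf?type=VPAID&view=vast2
    }
}
```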

JMeter provides the XPath Extractor to deal with XML and XHTML data. It can also work with HTML, but you'll have to check the Use Tidy box so that JMeter can use JTidy to parse the HTML.
An XPath expression to extract the contents of the CDATA section should look something like:
//MediaFile/text()[2]
See the XPath Tutorial for more details. A few tools that can help in building/debugging XPath expressions:
XPath Checker Firefox add-on
FirePath Firefox add-on
The XPath Tester in JMeter's View Results Tree listener
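To check an expression outside JMeter, the sketch below runs one against a shortened copy of the response with the JDK's built-in XPath API. It uses normalize-space(//MediaFile), a swapped-in alternative to the text()[2] form above: it takes the element's whole string value (CDATA content included) and trims the surrounding line breaks, so it does not depend on how the parser splits whitespace and CDATA into separate text nodes. Whether JMeter's XPath Extractor accepts the same expression is an assumption to verify in the XPath Tester.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class MediaFileXPath {
    // Parses the response body and evaluates an XPath expression to a string.
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the response", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, shortened VAST document with the same MediaFile layout.
        String vast = "<VAST version=\"2.0\"><Ad><InLine><Creatives><Creative><Linear>"
                + "<MediaFiles><MediaFile type=\"application/x-shockwave-flash\">\n"
                + "<![CDATA[http://example.com/Scout.swf?type=VPAID&view=vast2]]>\n"
                + "</MediaFile></MediaFiles></Linear></Creative></Creatives></InLine></Ad></VAST>";
        // The string value of MediaFile includes the CDATA text; normalize-space trims the line breaks.
        System.out.println(evaluate(vast, "normalize-space(//MediaFile)"));
    }
}
```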

Related

How to process HTML entities in XSLT

I am trying to transform XHTML that contains the &nbsp; entity. Saxon complains that the entity is not defined. How can I define it?
Is it possible to add the entity definition at the beginning of the stylesheet, as suggested here:
http://s-n-ushakov.blogspot.com/2011/09/xslt-entities-java-xalan.html
or here:
Using an HTML entity in XSLT (e.g. &nbsp;)
My puny attempt, ignored by Saxon, was to add the following to the beginning of the XSLT:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE stylesheet [
<!ENTITY nbsp "&#160;">
]>
I am using Saxon 9.9 PE.
The HTML I am trying to transform is a complete document, not just a fragment.
One possibility is to pass the URL of the XHTML to the XSLT as a parameter; the stylesheet then reads the XHTML as text using the unparsed-text() function, expands the entity reference using the replace() function, and parses the result using the parse-xml() function. For example:
<xsl:template name="xsl:initial-template">
<xsl:param name="source"/>
<!-- read the file as text, turn the literal text &nbsp; into a no-break space (U+00A0), then parse it -->
<xsl:apply-templates select="
$source
=> unparsed-text()
=> replace('&amp;nbsp;', '&#x000A0;')
=> parse-xml()
"/>
</xsl:template>
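Outside XSLT, the same read-as-text, replace, then parse approach can be sketched with the JDK's DOM parser (an illustrative Java equivalent, not part of Saxon). Like the replace() call above, it is a blind text substitution, so it would also rewrite &nbsp; inside comments or CDATA; it also assumes the document's DOCTYPE, if any, doesn't make the parser fetch an external DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NbspFixer {
    // Treat the file as text, expand &nbsp; by hand, and only then hand it to an XML parser.
    static Document parseWithNbsp(String xhtml) {
        String fixed = xhtml.replace("&nbsp;", "\u00A0"); // U+00A0 NO-BREAK SPACE
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(fixed)));
        } catch (Exception e) {
            throw new IllegalStateException("input is still not well-formed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWithNbsp("<html><body><p>a&nbsp;b</p></body></html>");
        System.out.println(doc.getDocumentElement().getTextContent().equals("a\u00A0b")); // prints true
    }
}
```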
If the input document contains an entity reference that isn't declared in the DOCTYPE declaration, then it isn't a well-formed XML document, and therefore it isn't a well-formed XHTML document; and if it isn't well-formed, then Saxon can't handle it.
It would be best to look at the processing workflow that generated this ill-formed document and fix it so the documents it produces are well-formed.
If you can't do that, then you might be able to parse it as HTML. Saxon has an extension function saxon:parse-html(); or if your application is in Java then you could create a SAXSource that uses validator.nu as its XMLReader.
You should consider using the Tidy tool to convert the HTML files into XHTML; it corrects issues like this.
Just run tidy with the -asxml option.

Including a CDATA field in a Service Connector

An API I am communicating with is SOAP-based and requires XML with inner XML (CDATA) in the request.
For the service connector action test, I have hard-coded the inner XML in this format:
<![CDATA[
<Application xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ApplicationCrossReferenceId="123">
...
...
</Application> ]]>
where the dots indicate the data contained.
When running the test, the request payload has been transformed so that < becomes the HTML entity &lt;, as seen below:
Is there a way to avoid this?
This is a bug in Informatica. The other characters are decoded back to their originals correctly, as described in KB 512858; &gt; and &lt;, however, are not decoded.
A bug report was raised on 29.05.2020.
Edit: Further investigation revealed that using CDATA was not necessary in my case, instead I was able to use the following input for the body binding:
<Application xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ApplicationCrossReferenceId="123">
...
...
</Application>

Using Apache Nifi to extract HL7 values and apply regex

I need to extract patient info from the HL7 XML document using Apache NiFi,
and to apply a regex to extract diagnostic results from the sections that contain embedded HTML (yes, sorry, not my design choice :-( ).
The first path to the data of interest in the HL7 document is:
"ClinicalDocument" \ "recordTarget" \ "patientRole" \ "patient" \ "name",
and the second, more complicated one is:
"ClinicalDocument" \ "structuredBody" \ "component" \ "section" \ "text @mediaType="text/x-hl7-text+xml"" where the value of the title element equals "Diagnostic Results"
I need to match the section within component whose title sub-node has the value "Diagnostic Results" (<title>Diagnostic Results</title>), and then extract the text value of the sibling text node.
My HL7 XML snippets look like:
<ClinicalDocument>
...
<recordTarget>
<patientRole>
....
<patient>
<name><given>John</given><family>Doe</family></name>
...
<structuredBody>
...
<component>
<section classCode="DOCSECT" moodCode="EVN">
<templateId root="0.0.0.0.0.0.1" />
<code code="000-01" codeSystem="0.0.0.1.0.0" />
<title>Diagnostic Results</title>
<text mediaType="text/x-hl7-text+xml">
Some data of interest expressed in n microns.<content ID="NKN_results"/>
</text>
Any suggestions on how to do this in Apache NiFi?
You should be able to use XPath and the NiFi EvaluateXPath processor to match and extract the <text> element. I started with the structuredBody tag as root for the following expression:
/structuredBody/component/section[title = 'Diagnostic Results' and text[@mediaType='text/x-hl7-text+xml']]/text
But you should be able to adapt it for the full XML path. Once the <text> element is parsed out, starting in NiFi 0.5.0 you can use the GetHTMLElement processor to extract from the embedded HTML. Before NiFi 0.5.0, if the HTML is well-formed (e.g. XHTML), you can use another EvaluateXPath processor instead.
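EvaluateXPath does the matching inside NiFi, but the expression can be tried outside NiFi first. The sketch below runs it with the JDK's XPath API against the question's snippet, trimmed and closed up so it is well-formed. It assumes, as the snippet shows, that the elements carry no namespace; a document with a default namespace would need namespace-aware matching.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DiagnosticResultsXPath {
    // The expression from the answer, with structuredBody as the document root.
    static final String EXPR = "/structuredBody/component/section"
            + "[title = 'Diagnostic Results' and text[@mediaType='text/x-hl7-text+xml']]/text";

    static String extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // evaluate() returns the string value of the first matching <text> element.
            return XPathFactory.newInstance().newXPath().evaluate(EXPR, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        // The question's snippet, closed up so it is well-formed (no namespace).
        String xml = "<structuredBody><component>"
                + "<section classCode=\"DOCSECT\" moodCode=\"EVN\">"
                + "<title>Diagnostic Results</title>"
                + "<text mediaType=\"text/x-hl7-text+xml\">\n"
                + "Some data of interest expressed in n microns.<content ID=\"NKN_results\"/>\n"
                + "</text></section></component></structuredBody>";
        System.out.println(extract(xml)); // prints Some data of interest expressed in n microns.
    }
}
```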

XPath - Querying two XML documents

I have two XML docs:
XML1:
<Books>
<Book id="11">
.......
<AuthorName/>
</Book>
......
</Books>
XML2:
<Authors>
<Author>
<BookId>11</BookId>
<AuthorName>Smith</AuthorName>
</Author>
</Authors>
I'm trying to do the following:
Get the value of XML2/Author/AuthorName where XML1/Book/@id equals XML2/Author/BookId.
XML2/Author/AuthorName[../BookId = XML1/Book/@id]
An XPath 1.0 expression cannot refer to more than one XML document, unless the references to the additional documents have been set up in the context of the XPath engine by the hosting language. For example, if XSLT is the hosting language, then it makes its document() function available to the XPath engine it is hosting.
document($xml2Uri)/Authors/Author[BookId = $mainDoc/Books/Book/@id]
Note that even the main XML document needs to be referenced via another <xsl:variable>, here named $mainDoc.
The document() function is available only if XPath is hosted by XSLT! This is not mentioned in Doc Brown's answer, which misleads readers.
An XPath 2.x expression may refer to any additional XML document using the XPath 2.0 doc() function.
for $doc in /,
$doc2 in doc(someUri)
return
$doc2/Authors/Author[BookId = $doc/Books/Book/@id]
The document() function is your friend; here is a short tutorial on how to combine multiple input files.
EDIT: Of course, that works only if you are using XPath in an XSLT stylesheet.
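To make the first answer's point concrete: in plain JAXP (XPath 1.0, no document() function) the hosting language has to do the joining. The sketch below, using the JDK's built-in XPath API, reads each Book/@id from the first document and passes it into a query on the second through an XPath variable, $id, bound with a variable resolver. It is one host-language approach among several, not the XSLT document() technique itself.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TwoDocumentJoin {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Maps each Book/@id in the first document to the matching AuthorName in the second.
    static Map<String, String> authorsByBookId(String booksXml, String authorsXml) {
        try {
            Document books = parse(booksXml);
            Document authors = parse(authorsXml);
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList ids = (NodeList) xp.evaluate("/Books/Book/@id", books, XPathConstants.NODESET);
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < ids.getLength(); i++) {
                String id = ids.item(i).getNodeValue();
                // The host (Java) carries the key across: $id is bound before each query.
                xp.setXPathVariableResolver(name -> "id".equals(name.getLocalPart()) ? id : null);
                result.put(id, xp.evaluate("/Authors/Author[BookId = $id]/AuthorName", authors));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the documents", e);
        }
    }

    public static void main(String[] args) {
        String books = "<Books><Book id=\"11\"><AuthorName/></Book><Book id=\"12\"/></Books>";
        String authors = "<Authors><Author><BookId>11</BookId><AuthorName>Smith</AuthorName></Author></Authors>";
        System.out.println(authorsByBookId(books, authors)); // prints {11=Smith, 12=}
    }
}
```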

Using Regular Expressions in JSP EL

In EL expressions used in a JSP page, strings are taken literally. For example, in the following code snippet
<c:when test="${myvar == 'prefix.*'}">
the test does not evaluate to true if the value of myvar is 'prefixxxxx'. Does anyone know if there is a way to have the string interpreted as a regex instead? Does EL have something similar to awk's tilde (~) operator?
While this special case can be handled with the JSTL fn:startsWith function, regular expressions in general seem like very likely tests. It's unfortunate that JSTL doesn't include a function for these.
On the bright side, it's pretty easy to write an EL function that does what you want. You need the function implementation, and a TLD to let your web application know where to find it. Put these together in a JAR and drop it into your WEB-INF/lib directory.
Here's an outline:
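com/x/taglib/core/Regexp.java:
package com.x.taglib.core;
import java.util.regex.Pattern;
public class Regexp {
    // Called from EL through the TLD below; true only if the whole string matches.
    public static boolean matches(String pattern, CharSequence str) {
        return Pattern.compile(pattern).matcher(str).matches();
    }
}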
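Because the function above is built on Matcher.matches(), the pattern has to match the entire string, not just part of it (the same holds for Pattern.matches, used in the next answer). A quick standalone check with plain java.util.regex; the contains helper is hypothetical, added only to contrast matches() with find():

```java
import java.util.regex.Pattern;

class RegexpCheck {
    // Same body as Regexp.matches: true only if the whole input matches.
    static boolean matches(String pattern, CharSequence str) {
        return Pattern.compile(pattern).matcher(str).matches();
    }

    // Hypothetical contrast: find() succeeds if any part of the input matches.
    static boolean contains(String pattern, CharSequence str) {
        return Pattern.compile(pattern).matcher(str).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("prefix.*", "prefixxxxx")); // true
        System.out.println(matches("prefix", "prefixxxxx"));   // false: the rest of the string is unmatched
        System.out.println(contains("prefix", "prefixxxxx"));  // true
    }
}
```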
META-INF/x-c.tld:
<taglib xmlns="http://java.sun.com/xml/ns/j2ee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-jsptaglibrary_2_0.xsd" version="2.0">
<tlib-version>1.0</tlib-version>
<short-name>x-c</short-name>
<uri>http://dev.x.com/taglib/core/1.0</uri>
<function>
<description>Test whether a string matches a regular expression.</description>
<display-name>Matches</display-name>
<name>matches</name>
<function-class>com.x.taglib.core.Regexp</function-class>
<function-signature>boolean matches(java.lang.String, java.lang.CharSequence)</function-signature>
</function>
</taglib>
Sorry, I didn't test this particular function, but I hope it's enough to point you in the right direction.
Simply add the following to WEB-INF/tags.tld
<?xml version="1.0" encoding="ISO-8859-1" ?>
<taglib version="2.1"
xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-jsptaglibrary_2_1.xsd">
<display-name>Acme tags</display-name>
<short-name>custom</short-name>
<uri>http://www.acme.com.au</uri>
<function>
<name>matches</name>
<function-class>java.util.regex.Pattern</function-class>
<function-signature>
boolean matches(java.lang.String, java.lang.CharSequence)
</function-signature>
</function>
</taglib>
Then in your JSP:
<%@ taglib uri="http://www.acme.com.au" prefix="custom" %>
${custom:matches('aaa.+', someVar)}
This works exactly the same as Pattern.matches.
You can use JSTL functions like so:
<c:when test="${fn:startsWith(myVar, 'prefix')}">
Take a look: http://java.sun.com/products/jsp/jstl/1.1/docs/tlddocs/fn/tld-summary.html
To use Pattern.matches inside a JSP page, in my case it was enough to call
java.util.regex.Pattern.matches(regexString, stringToCompare) with the fully qualified name, because you can't import a package in JSP.