SoapUI xPath match tests - regex

I'm writing some tests for my web service right now and can't find a lot of information regarding xPath match and Contains. Looking for examples as well.
1) For example, I would like to check if the date has format YYYY-MM-DD.
Do I have to write a regular expression in the expected result?
2) How can I check whether the answer equals one of the allowed values (using an xsd enumeration)?

If you have control over the WSDL, those sorts of format and simple content validations can be built into the XML Schema within your WSDL that defines your response messages. Then, just use a Schema Compliance assertion on your test step.
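As an illustration only, the two checks from the question (a YYYY-MM-DD date and membership in a fixed list of values) come down to a pattern and an enumeration. The Python sketch below shows that logic outside SoapUI. The function names and the sample values in ALLOWED_STATUSES are made up for the example; in an XML Schema the same rules would be expressed with the built-in xs:date type and xs:enumeration facets rather than with code.

```python
import re
from datetime import datetime

# Shape check only: four digits, dash, two digits, dash, two digits.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

# Hypothetical enumeration; a real one would come from the schema.
ALLOWED_STATUSES = {"OPEN", "CLOSED", "PENDING"}


def is_iso_date(value):
    """Return True if value looks like YYYY-MM-DD and is a real calendar date."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # Right shape, impossible date (for example 2024-02-30).
        return False
    return True


def is_allowed(value, allowed=ALLOWED_STATUSES):
    """Return True if value is one of the enumerated values (case-sensitive)."""
    return value in allowed
```

The pattern alone would accept impossible dates, which is why the sketch also parses the value; a schema type such as xs:date performs both checks at once.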


Eliminate javascript from HTML with XSLT

I am trying to transform an HTML report into XML, but some javascript in the file is throwing errors, due to statements with a less-than character (e.g., for(var i=0; i<els.length;i++) ). I thought I could eliminate the javascript with the following template, which should remove entire script nodes:
<xsl:template match="script"/>
I assumed the XSLT processor would simply skip over the entire script nodes, but it's still throwing the same errors. I also tried adding this one:
<xsl:template match="script/text()"/>
No luck. If I manually remove all the javascript from the file, my transform works, but that's not practical as I need to create and run a daily automated process on these HTML files to extract some data in the HTML tables.
As a general rule, XSLT will only process well-formed XML input: it's not designed to process other formats like HTML.
However, XSLT will generally accept input from a parser that delivers a stream of events that looks sufficiently like an XML stream. This allows HTML parsers such as TagSoup to be used as a front-end to your XSLT processor.
Saxon packages this up with a parse-html() function that invokes TagSoup to parse HTML input and turn it into a DOM-like tree (actually an XDM tree) that it can process as if it came from XML. More up-to-date HTML parsers than TagSoup exist, but you would have to do a little more work to integrate one.
The question was answered by Martin Honnen in the comments: … suggests there is an HTML import feature, so try whether that helps. Of course, there are also standalone applications like HTML Tidy that I think you can use outside of the XSLT processing to first convert your HTML to XHTML.
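To show why a tolerant HTML parser sidesteps the less-than problem, here is a minimal Python sketch that uses the standard library's html.parser instead of XSLT. This is a different technique from the ones named above, not a replacement for TagSoup or HTML Tidy. The class name, helper function, and sample markup are assumptions made for the example. The parser treats the body of a script element as raw text, so `i<els.length` never gets read as a tag.

```python
from html.parser import HTMLParser


class TableCellExtractor(HTMLParser):
    """Collect the text of table cells row by row; everything else is ignored."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Script text arrives here too, but it is dropped because no cell is open.
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html_text):
    """Parse html_text and return the table rows as lists of strings."""
    parser = TableCellExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical report fragment containing the kind of script that breaks XML parsers.
SAMPLE = """<html><head><script>
for (var i=0; i<els.length; i++) { els[i].x = 1; }
</script></head>
<body><table>
<tr><th>Name</th><th>Count</th></tr>
<tr><td>alpha</td><td>3</td></tr>
</table></body></html>"""
```

Calling extract_rows(SAMPLE) yields the header row and the data row as plain lists, which can then be written out as XML or fed to any later processing step.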

Regex or Xpath for extracting nodes?

I have an XML file with the following structure;
This XML can sometimes be broken, leaving the closing </JobList> tag and a closing </Job> tag missing.
I would like to be able to extract the <Job> nodes with full content on those that are closed with </Job>. What is the best way to do this?
To make a long story short, I am using .NET and the built-in serializers for deserializing XML content. But since new properties are added, you cannot just go back and forth between different versions, as it is too strict. Mostly it works, but I would like to have a backup recovery method for this, hence the question.
The current situation is that the deserializer "crashes" the whole deserializing when a new property has been added instead of ignoring it. I am looking to manually parse it on error.
As mentioned in the comments, the ideal solution would be to make the XML valid. If for whatever reason that is not possible, the workaround is to parse the file as text with a regex.
A general regex for this case could be something like:
This will return everything between a complete <Job> and </Job> pair.
Please notice that this will also return nodes with 'broken' inner nodes, but according to your question you are only concerned about missing </JobList> and </Job> tags.
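The regex from the answer is not shown above, so the following is only a sketch of the idea in Python, not the original expression. It assumes that Job elements are never nested and that the element is literally named Job; the function name and sample text are invented for the example. The negative lookahead stops a match from running across the start of another Job, so an unclosed Job is skipped instead of swallowing the next complete one.

```python
import re

# <Job ...> followed by anything that does not start another <Job>, up to </Job>.
# \b keeps <JobList> from being mistaken for <Job>.
COMPLETE_JOB = re.compile(r"<Job\b[^>]*>(?:(?!<Job\b).)*?</Job>", re.DOTALL)


def complete_jobs(text):
    """Return every <Job>...</Job> block that has both its opening and closing tag."""
    return COMPLETE_JOB.findall(text)


# Hypothetical damaged document: job 2 lost its closing tag and job 4 is cut off.
BROKEN = (
    "<JobList>"
    "<Job><Id>1</Id></Job>"
    "<Job><Id>2</Id>"
    "<Job><Id>3</Id></Job>"
    "<Job><Id>4"
)
```

Each returned string is a self-contained fragment, so it can be handed to the normal deserializer one Job at a time as a recovery path.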

Calling webservice from voice xml

How can I call a web service from a VoiceXML (vxml) document? I am using an open-source IVR project and I need to call a web service for any given option from within the vxml document.
This is similar to this query:
how can I call a webservice from voiceXML?
However, solution is provided there but it is not
You cannot call a web service directly from a VoiceXML application. There are generally two approaches for getting data into a VoiceXML application:
Use the data element tag to make an http request. The result must be XML. You will need to parse the result with the provided DOM functions. Note, some browsers have extended features to facilitate XML parsing. This also requires a VoiceXML 2.1 compliant browser.
Transfer control to a dynamic bit of server code that returns VXML to be processed populating your desired variables. This can be done with a goto element or subdialog element.
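Both approaches rely on a piece of server code that sits between the VoiceXML browser and the real web service. As a rough sketch of that bridge for the first approach, the Python WSGI application below reads the posted form fields and answers with a small XML document. Everything here is an assumption made for illustration: the lookup function is a stand-in for the real web service call, and the result, status, and echo element names are invented.

```python
from urllib.parse import parse_qs
from xml.etree import ElementTree as ET


def lookup(params):
    """Hypothetical stand-in for the real web service call."""
    return {"status": "ok", "echo": params.get("var1", "")}


def to_xml(values):
    """Serialize a flat dict as <result><key>value</key>...</result> bytes."""
    root = ET.Element("result")
    for key, value in values.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def application(environ, start_response):
    """WSGI entry point: form-encoded POST body in, XML document out."""
    size = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(size).decode("utf-8")
    # Keep the first value of each posted field.
    params = {name: values[0] for name, values in parse_qs(body).items()}
    payload = to_xml(lookup(params))
    start_response("200 OK", [
        ("Content-Type", "text/xml; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

For a quick local trial, the application can be served with wsgiref.simple_server.make_server from the standard library; the VoiceXML document would then point at that URL.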
Your question is incomplete, but I suspect I know what's bothering you.
I get information from a webservice by using
<data name="return_data" srcexpr="some_url" method="post" namelist="var1 var2 var3" />
The data I get back is inside the return_data variable. In my case, the data are in XML format, and I use JavaScript functions to extract the data I need.
As an aside, for maintainability, re-usability, and ease of reading, I personally find it useful to create separate files for the JS functions and include them via <script> into my root VoiceXML document.

easiest way to parse html from soapui http request

I created an HTTP request step in SoapUI that returns an HTML page, from which I need to extract one single value:
<span class="result">12345<span>
I'm thinking about using Groovy; is it the best way? If yes, I'm a beginner in both SoapUI and Groovy, so any code snippet to get started (how to get the HTML content from the HTTP request step, how to parse it in Groovy) would help. Thanks.
If you feel more comfortable with Groovy, then go for it!
SoapUI internally represents almost everything as XML. Therefore the easiest way to manipulate things in SoapUI is using XPath. In your case, you could probably use a Property Transfer step to extract //span[@class="result"].
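To see what that XPath selects, here is a small sketch outside SoapUI using Python's standard xml.etree module. It assumes the page has already been converted to well-formed XML; the sample markup and function name are invented for the example. ElementTree supports only a subset of XPath, so the expression is written as a relative path.

```python
from xml.etree import ElementTree as ET

# Hypothetical well-formed version of the response page.
PAGE = """<html>
  <body>
    <span class="label">Result:</span>
    <span class="result">12345</span>
  </body>
</html>"""


def extract_result(xml_text):
    """Return the text of the first span whose class attribute is 'result', or None."""
    root = ET.fromstring(xml_text)
    node = root.find(".//span[@class='result']")
    return node.text if node is not None else None
```

The predicate uses @class because attributes are addressed with @ in XPath; the same expression, with a leading //, is what a Property Transfer step would use.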
Siking's answer worked once I set the source property to ResponseAsXml and used XPath expressions. As for common HTML XPath expressions, there's a resource online at

Using a regular expression to replace everything between tags contained within XML output

I've been trawling the internet trying to find a solution to this issue. Basically I am using a web service provided by the company that runs our support software to retrieve customer tickets and output them (dependent on filtering) through our system so that customers can see from their dashboard which current support tickets they have active. I've managed to get the desired tags from the XML that is returned via the web service and place their content in a html table (therefore listing the active tickets row by row in the table) however, as the ticket description tag is populated with the content from emails sent by clients, there is lots of nasty redundant css and styling that has been applied to the Email that I would like to remove.
So far I have managed to use the 'replace' function to replace some of the redundant content from this email content ->
l_html_build := replace(l_html_build,'&lt;','<');
l_html_build := replace(l_html_build,'&gt;','>');
l_html_build := replace(l_html_build,'&amp;lt;','');
l_html_build := replace(l_html_build,'&amp;gt;','');
l_html_build := replace(l_html_build,'&nbsp;',' ');
However I now need to overwrite the p tags which have all sorts of garbage added to them so that they just become standard p tags->
From this:
<p 0in;"="" 3.0pt="" padding:="" 1.0pt;="" solid="" border-top:="" none;="" _mce_style=""border:" 0in"="" 0in="" 1.0pt;padding:3.0pt="" #b5c4df="" style=""border:none;border-top:solid">
To this:
<p>
I've looked into using the regEXP function listed here psoug however this appears to require a select statement that is performed each time. The data I need to manipulate is stored in a CLOB called l_html_build so is there any way of adapting the regEXP function to be used in a similar way to the replace function above or is there an alternative method that I am not aware of?
I apologise if this is a noob question. My expertise lies in front-end development, PHP and MySQL, but unfortunately I'm now required to write bits of PL/SQL in my new role.
Any help would be greatly appreciated.
Knowing that:
There is no standard PL/SQL package that parses HTML.
You can't reliably parse HTML with regex. Furthermore, Oracle only supports basic regular expressions, which restricts what they can do.
You want to stay in PL/SQL
You are left with few options (that I can think of):
1) Write a simple procedure yourself that will work in most of the cases (but there will be many exceptions that will break your parser).
2) Use a Java parser, load the class in the database, and call Java from PL/SQL. Oracle comes with its integrated JVM, so this involves no extra setup.
I would go with option (2) if you want reliability, or option (1) if infrequent but inevitable losses are acceptable.
Since your content will be coming from email clients, we can assume that only a tiny (negligible?) fraction will have very obscure HTML.
In that case you could start with a simple regular expression that may need some tweaking:
SELECT regexp_replace(
         '<p1 3.0pt="" padding:="" #b5c4df="">
          text
          </p>',
         '<([[:alpha:]]+)[^>]*>',
         '<\1>') remove_attr_simple
  FROM dual;
This will fail to catch tricky valid HTML (such as <P attr=">">) but since your input is somewhat standard this should be fine often enough. You may need to remove HTML comments with another procedure -- I'm not sure it can be done with regex.
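For readers who want to try the pattern outside the database, the same substitution can be sketched in Python. This is only a transliteration of the regex above ([[:alpha:]] is written as [A-Za-z]); the function name is made up, and the sketch says nothing about how Oracle handles a CLOB. It also makes the limitation easy to reproduce: an attribute value containing > cuts the match short.

```python
import re

# An opening tag: '<', a run of letters (captured), then anything up to the next '>'.
OPENING_TAG = re.compile(r"<([A-Za-z]+)[^>]*>")


def strip_attributes(html):
    """Replace every opening tag with a bare version that keeps only its name."""
    return OPENING_TAG.sub(r"<\1>", html)


# Sample taken from the query above, plus the tricky case mentioned in the text.
NOISY = '<p1 3.0pt="" padding:="" #b5c4df="">\ntext\n</p>'
TRICKY = '<P attr=">">'
```

With these inputs, strip_attributes(NOISY) gives a plain <p> followed by the text and the untouched </p>, while strip_attributes(TRICKY) leaves the stray characters "> behind, which is the failure described above.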
SQL is really not the best tool for this job. Nor will regexes be able to perform this kind of task reliably. You would be better off extracting the data and processing it in another language using an XML parser.
Presumably Oracle itself is not sending these emails. What program does the sending, and can you add some programmatic processing at that point?
Since you already know PHP, here is a discussion of parsing HTML/XML in PHP. Similar tools are available in most other languages.