I am working with Struts 2 and calling a simple web service that returns a list of all countries. The result arrives as a string, but its content is actually XML. How can I extract the values from it as plain strings?
When I print it to the console, I get a result like this:
<NewDataSet>
<Table>
<Name>Zambia</Name>
</Table>
<Table>
<Name>Zimbabwe</Name>
</Table>
</NewDataSet>
This is all wrapped in a single string.
Can anyone help me get all the countries as normal strings?
Of the links provided by @nmenego in the comments, the answer below is the best one to start with, so I am reproducing it here. The others have their merits, but they require external libraries.
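import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

String xml = "<resp><status>good</status><msg>hi</msg></resp>";
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
InputSource source = new InputSource(new StringReader(xml));
// evaluate() returns the text content of the first node matching the expression
String status = xpath.evaluate("/resp/status", source);
System.out.println("status=" + status);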
Reference
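Applied to the country list from the question, the same XPath approach might look like the sketch below. It assumes the response has exactly the NewDataSet/Table/Name structure shown in the question; the class and method names (CountryNames, extractNames) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects every <Name> under /NewDataSet/Table.
class CountryNames {
    static List<String> extractNames(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // NODESET asks for all matching nodes instead of only the first one
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/NewDataSet/Table/Name",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent().trim());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<NewDataSet><Table><Name>Zambia</Name></Table>"
                + "<Table><Name>Zimbabwe</Name></Table></NewDataSet>";
        // Join the names into one plain string
        System.out.println(String.join(", ", extractNames(xml)));
    }
}
```

The returned list can be joined into one string, as in main, or handed to the Struts action as it is.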
Another approach is to convert the XML to a POJO.
Here's a very good example: http://www.mkyong.com/java/jaxb-hello-world-example/
With Delphi Rio, I am using an HTML/DOM parser. I am traversing the various nodes, and the parser is returning attributes/tags. Normally these are not a problem, but for some attributes/tags, the returned string includes multiple attributes. I need to parse this string into some type of container, such as a string list. The attribute string the parser returns already has the '<' and '>' removed.
Some examples of attribute strings are:
data-partnumber="BB3312" class=""
class="cb10"
account_number = "11432" model = "pay_plan"
The end result I want is a StringList with one or more name=value pairs.
I have not used RegEx to any real degree, but I think that I want to use RegEx. Would this be a valid approach? For a RegEx pattern, I think the pattern I want is
\w\s?=\s?\"[^"]+"
To identify multiple matches within a string, I would use TRegex.Matches. Am I overlooking something here that will cause me issues later on?
*** ADDITIONAL INFO ***
Several people have suggested using a decent parser. I am currently using the open-source HTML/DOM parser found here: https://github.com/sandbil/HTML-Parser
In light of that, I am posting more info. Here is an HTML snippet I am parsing. Look at the line where I have added ***** at the end. My parser is returning this as
Node.AttributeText= 'data-partnumber="B92024" data-model="pay_as_you_go" class="" '
Would a different HTML DOM parser return this as 3 different elements/attributes? If so, can someone recommend a parser?
<section class="cc02 cc02v0" data-trackas="cc02" data-ocomid="cc02">
<div class="cc02w1">
<div class="otable otable-scrolling">
<div class="otable-w1">
<table class="otable-w2">
<thead>
<tr>
<th>Product</th>
<th>Unit Price</th>
<th>Metric</th>
</tr>
</thead>
<tbody>
<tr>
<td class="cb152title"><div>MySQL Database for HeatWave-Standard-E3</div></td>
<td><div data-partnumber="B92024" data-model="pay_as_you_go" class="">$0.3536<span></span></div></td> *****
<td><div>Node per hour</div></td>
</tr>
<tr data-partnumber="B92426">
<td class="cb152title">MySQL Database—Storage</td>
<td><span data-model="pay_as_you_go" class="">$0.04<span></span></span></td>
<td>Gigabyte storage capacity per month</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</section>
The documentation for the parser you are using says TDomTreeNode has an AttributesText property that is a "string with all attributes", which you have shown examples of. But it also has an Attributes property that is "parsed attributes" provided as a TDictionary<string, string>. Have you tried looking into the values of that property yet? You should not need to use a RegEx at all, just enumerate the entries of the TDictionary instead, eg:
var
Attr: TPair<string, string>;
for Attr in Node.Attributes do begin
// use Attr.Key and Attr.Value as needed...
end;
(As the OP asked about using a RegEx to parse attribute=value pairs, this answers the question directly, which other users may be looking for in the future.)
RegEx based answer
A RegEx is a powerful tool for this. From the data you have provided, you can extract the attribute name and value pairs using:
(\S+)\s*=\s*(\"?)([^"]*)(\2|\s|$)
This uses grouping and can be explained as follows:
The first result group is the attribute name (it matches non-whitespace characters)
The second result group is an enclosing " if present, otherwise an empty string
The third result group is the value of the attribute
Because a RegEx can be applied repeatedly to the same subject, you can call MatchAgain to see if there's another match and so read all of the attributes in a loop.
// TPerlRegEx is declared in the System.RegularExpressionsCore unit
procedure ParseAttributes(AInput: String; ATarget: TStringList);
var
  pRegEx: TPerlRegEx;
  LMatched: Boolean;
begin
  pRegEx := TPerlRegEx.Create;
  try
    pRegEx.RegEx := '(\S+)\s*=\s*(\"?)([^"]*)(\2|\s|$)';
    pRegEx.Subject := AInput;
    LMatched := pRegEx.Match;
    while LMatched do
    begin
      // Groups[1] is the attribute name, Groups[3] the value without its quotes
      ATarget.Add(pRegEx.Groups[1] + '=' + '"' + pRegEx.Groups[3] + '"');
      LMatched := pRegEx.MatchAgain;
    end;
  finally
    pRegEx.Free;
  end;
end;
Disclaimer: I haven't tried compiling that code, but hopefully it's enough to get you started!
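For readers who want to try the pattern outside Delphi, here is a rough Java sketch of the same matching loop, run against the quoted attribute strings from the question. The class and method names (AttributeParser, parse) are invented for illustration, and only quoted values are exercised here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that applies the pattern above repeatedly to one string.
class AttributeParser {
    // Same pattern as in the answer, escaped for a Java string literal
    private static final Pattern ATTR =
            Pattern.compile("(\\S+)\\s*=\\s*(\"?)([^\"]*)(\\2|\\s|$)");

    static Map<String, String> parse(String attributeText) {
        // LinkedHashMap keeps the attributes in the order they appear
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(attributeText);
        while (m.find()) {
            // group(1) is the attribute name, group(3) the value without quotes
            result.put(m.group(1), m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "data-partnumber=\"B92024\" data-model=\"pay_as_you_go\" class=\"\" ";
        for (Map.Entry<String, String> e : parse(text).entrySet()) {
            System.out.println(e.getKey() + "=\"" + e.getValue() + "\"");
        }
    }
}
```

Each call to find() plays the role of Match/MatchAgain in the Delphi version: it resumes scanning where the previous match ended.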
Practical Point: With respect to the actual problem you posed with your DOM parser - this is a task that there are existing solutions for so a practical answer to solving the problem may well be to use a DOM parser that works! If a RegEx is something you need for whatever reason this one should do the job.
I have a String containing the following:
"<CV-ALL><CURRICULO-VITAE SISTEMA-ORIGEM-XML='DEGOIS_ONLINE' DATA-ATUALIZACAO='26032015' HORA-ATUALIZACAO='193918' VERSAO-DA-GRAMATICA='2.0'><DADOS-GERAIS NOME-COMPLETO='Gonçalves' NOME-EM-CITACOES-BIBLIOGRAFICAS='Pte, A.' NACIONALIDADE='P' CURRICULUM-CONCLUIDO='SIM' OUTRAS-INFORMACOES-RELEVANTES='' ID-DEGOIS='267296113190873275' ORCID='0000-0001-5944-3218'><ENDERECO FLAG-DE-PREFERENCIA='ENDERECO_INSTITUCIONAL'><ENDERECO-PROFISSIONAL CODIGO-INSTITUICAO-EMPRESA='1124000002312' NOME-INSTITUICAO-EMPRESA='Instituto Politécnico' CODIGO-ORGAO='43400884' NOME-ORGAO='Escola Superior' CODIGO-UNIDADE='11241886' NOME-UNIDADE='Centro de Investigação' PAIS='Portugal' UF='CE'/> .... "
Please note that, for the sake of simplicity, I have omitted the remaining content of this string; its content is in fact an XML document.
Does anybody know of, or have, an XSL transformation script that takes a string such as this as input and transforms it into an XML document, so that it can be navigated using XPath expressions?
Thanks!
Assuming you are passing the string to the XSLT stylesheet as a parameter, and processing a dummy XML document, you have two options:
Output the string with output escaping disabled, save the result as a new document, then process the new document with another XSLT stylesheet;
Use XSLT 3.0 (or a processor that supports parsing XML as an extension) to parse the string as XML before processing it.
Of course, you could save yourself all this trouble by writing the string to a file using your parent process, then pointing your XSLT processor to the resulting file.
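If the parent process happens to be Java (an assumption; the question does not say), the string may not even need to go through a file, since the JDK's built-in transformer accepts a character stream as its source. The sketch below uses a trimmed-down version of the document from the question and a made-up one-template stylesheet; the class name StringToXslt is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: feeds an XML string straight into an XSLT stylesheet.
class StringToXslt {
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            // The string is parsed as XML here, so XPath works inside the stylesheet
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Shortened sample; the real string from the question is much longer
        String xml = "<CV-ALL><CURRICULO-VITAE><DADOS-GERAIS ORCID='0000-0001-5944-3218'/>"
                + "</CURRICULO-VITAE></CV-ALL>";
        String xslt = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'>"
                + "<xsl:value-of select='/CV-ALL/CURRICULO-VITAE/DADOS-GERAIS/@ORCID'/>"
                + "</xsl:template></xsl:stylesheet>";
        System.out.println(transform(xml, xslt));
    }
}
```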
I have the following text in the HTML response:
<input type="hidden" name="test" value="testValue">
I need to extract the value from the above input tag.
I've tried both regexp and xpath extractor, but neither is working for me:
regexp pattern
input\s*type="hidden"\s*name="test"\s*value="(.+)"\s*>
xpath query
//input[@name="test"]/@value
The above XPath gives an error in the XPath Assertion listener: "No node matched".
After many attempts, I concluded that the XPath works only if I use it as //input[@name].
As soon as I add an actual name, it gives the error "No node matched".
Could anyone please suggest how to resolve this issue?
Please take a look at my previous answer :
https://stackoverflow.com/a/11452267/169277
The relevant part for you would be step 3:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
String html = prev.getResponseDataAsString(); // get response from your sampler
Document doc = Jsoup.parse(html);
Element inputElement = doc.select("input[name=test]").first();
String inputValue = inputElement.attr("value");
vars.put("inputTextValue", inputValue);
Update
So that you don't get tangled up in the code, I've created a JMeter post-processor called Html Extractor. Here is the GitHub URL:
https://github.com/c0mrade/Html-Extractor
Since you are using the XPath Extractor to parse an HTML (not XML) response, ensure that the Use Tidy (tolerant parser) option is CHECKED (in the XPath Extractor's control panel).
Your XPath query looks fine; check the option mentioned above and try again.
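To check the query itself outside JMeter, a small Java sketch like the one below can help. It assumes the markup has already been tidied into well-formed XML (note the self-closed input tag); a plain XML parser would reject the unclosed tag from the original response. The class and method names (HiddenInputXPath, extract) are made up.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical check of the XPath from the question against tidied markup.
class HiddenInputXPath {
    static String extract(String xhtml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Selects the value attribute of the input whose name is "test"
            return xpath.evaluate("//input[@name=\"test\"]/@value",
                    new InputSource(new StringReader(xhtml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><form>"
                + "<input type=\"hidden\" name=\"test\" value=\"testValue\"/>"
                + "</form></body></html>";
        System.out.println(extract(xhtml));
    }
}
```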
I am getting this html string returned via a web service call.
I am having trouble displaying the HTML, probably because of the odd format (notice how the opening brackets "<head>" show up as '&lt;head&gt;' instead).
This is my truncated html formatted response.
What I am trying to do is display this html page on a form. But I am even having trouble getting it to open when I write the string to a file.
Any help is greatly appreciated,
Thanks
<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://tempuri.org/">&lt;html&gt;&lt;head&gt;&lt;title&gt;......
...html......&lt;/div&gt;&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</string>
&lt; and &gt; are XML and HTML character entities. For some reason (probably so that it can return malformed HTML) this web service(?) returns the tag brackets < and > replaced with &lt; and &gt;. If you can assume that, inside the returned <string></string> element, &lt; and &gt; are used only as tag brackets, you can just replace the entities with the proper brackets. If you can't assume that, you need to parse the text of the string element to obtain valid HTML.
Thanks for your input. This is what worked for me.
uses
  HTTPApp;
var
  HtmlXmlText, HtmlText: String;
// HTMLDecode converts entities such as &lt; and &gt; back to < and >
HtmlText := HTMLDecode(HtmlXmlText);
I used HTMLDecode to clean up the odd characters to the standard formatting.
If you are just writing code to deal with this string (and you don't have access to code that retrieved it) then the most correct way is to use an XML parser.
uses XmlIntf;
procedure blah;
var
doc: IXMLDocument;
HtmlText: String;
begin
doc := CreateXMLDocument;
doc.LoadFromFile(...);
HtmlText := doc.documentElement.InnerText;
// Your text is already decoded here
DoWhatever(HtmlText);
end;
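The same idea in Java, as a rough sketch: let an XML parser read the wrapper and ask for the text of the root element, which comes back with the entities already decoded. This assumes the response carries the page as escaped text (&lt; and &gt;) inside the string element, as described in the question; the class name UnwrapHtml and the sample payload are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: unwraps HTML that a web service returned as escaped XML text.
class UnwrapHtml {
    static String unwrap(String response) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(response)));
            // The parser decodes &lt; and &gt; while reading the element text
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String response = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<string xmlns=\"http://tempuri.org/\">"
                + "&lt;html&gt;&lt;body&gt;Hi&lt;/body&gt;&lt;/html&gt;</string>";
        System.out.println(unwrap(response));
    }
}
```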
AS 3.0 / Flash
I am consuming XML which I have no control over.
The XML has HTML in it, which I am styling and displaying in an HTML text field.
I want to remove all the HTML except the links.
Strip all HTML tags except links
is not working for me.
Does anyone have any tips? A regex?
The following removes tables.
var reTable:RegExp = /<table\s+[^>]*>.*?<\/table>/s;
But now I realize I need to keep the content that is in the tables, and I also need the links.
thanks!!!
cp
Probably shouldn't use regex to parse html, but if you don't care, something simple like this:
find /<table\s+[^>]*>.*?<\/table\s*>/
replace ""
ActionScript has a pretty neat tool for handling XML: E4X. Rather than relying on RegEx, which I find often messes things up with XML, just modify the actual XML tree, and from within AS:
var xml : XML = <page>
<p>Other elements</p>
<table><tr><td>1</td></tr></table>
<p>won't</p>
<div>
<table><tr><td>2</td></tr></table>
</div>
<p>be</p>
<table><tr><td>3</td></tr></table>
<p>removed</p>
<table><tr><td>4</td></tr></table>
</page>;
clearTables (xml);
trace (xml.toXMLString()); // will output everything but the tables
function clearTables (xml : XML ) : void {
xml.replace( "table", "");
for each (var child:XML in xml.elements("*")) clearTables(child);
}
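For the original goal of dropping every tag except links while keeping the text, a regex-only sketch is shown below in Java; the pattern itself should carry over to an ActionScript RegExp largely unchanged, though that is untested. It assumes tags contain no '>' inside attribute values, and the usual caveats about parsing HTML with regexes apply. The names LinkKeeper and stripExceptLinks are made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes all tags except <a ...> and </a>, keeping the text.
class LinkKeeper {
    // The negative lookahead skips tags whose name is exactly "a" (open or close)
    private static final Pattern NON_LINK_TAG =
            Pattern.compile("(?i)<(?!/?a\\b)[^>]*>");

    static String stripExceptLinks(String html) {
        return NON_LINK_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>See <a href=\"x.html\">this</a> and <b>that</b></td></tr></table>";
        System.out.println(stripExceptLinks(html));
    }
}
```

Because only the tags are removed, the table cell contents survive, which is what the question asks for.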