DataSink step to return each response with all the children - web-services

I learnt very recently how to use data-driven testing in Ready API and how to loop calls based on the data. My goal is to run the steps in a loop and, at the end, have an auto-export facility with DataSink so that the results get exported automatically.
Now, when I go to the DataSink step, as I understand it I need to create column headers to store the corresponding child values.
This would work well if the SOAP response for each siteId had the same XML structure. But in my case, each of the 2000+ responses I get has a different number of children within
<return></return>
For example, take a look at Response 1 and Response 2 below. These two responses have different numbers of children.
Response 1
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<ns2:getSiteInfoResponse xmlns:ns2="http://billing.xyz.cc/">
<return>
<address1>A</address1>
<city>B</city>
<closeDate>2018-10-15T00:00:00-05:00</closeDate>
<contact1/>
<contact2>TBD</contact2>
<country>X1</country>
<customerNbr>288</customerNbr>
<emailAddr1/>
<emailAddr2/>
<fax1>0</fax1>
<fax2>0</fax2>
<gps>C</gps>
<grouping2>Leased</grouping2>
<grouping4>D</grouping4>
<jobTitle1/>
<jobTitle2/>
<phone1>0</phone1>
<phone2>0</phone2>
<siteId>862578</siteId>
<siteName>D</siteName>
<squareFoot>0.0</squareFoot>
<state>E</state>
<weatherStation>D</weatherStation>
<zip4>4</zip4>
<zip5>F</zip5>
</return>
</ns2:getSiteInfoResponse>
</soap:Body>
</soap:Envelope>
Response 2
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<ns2:getSiteInfoResponse xmlns:ns2="http://billing.xyz.cc/">
<return>
<address1>1202</address1>
<city>QA</city>
<contact1/>
<contact2>BL</contact2>
<country>A</country>
<customerNbr>288</customerNbr>
<emailAddr1/>
<emailAddr2/>
<fax1>0</fax1>
<fax2>0</fax2>
<gps>LTE</gps>
<grouping1>1345</grouping1>
<grouping2>Leased</grouping2>
<grouping3>ZX</grouping3>
<grouping4>AA</grouping4>
<grouping5>2000</grouping5>
<jobTitle1/>
<jobTitle2/>
<phone1>0</phone1>
<phone2>0</phone2>
<services>
<accountNbr>11099942</accountNbr>
<liveDt>2013-07-01T00:00:00-05:00</liveDt>
<service>2</service>
<serviceType>gas</serviceType>
<vendorAddr1/>
<vendorAddr2>M</vendorAddr2>
<vendorCity>N</vendorCity>
<vendorName>O</vendorName>
<vendorNbr>P</vendorNbr>
<vendorPhone>Q</vendorPhone>
<vendorState>R</vendorState>
<vendorZip>S</vendorZip>
</services>
<services>
<accountNbr>13064944</accountNbr>
<liveDt>2018-05-20T00:00:00-05:00</liveDt>
<service>2</service>
<serviceType>gas</serviceType>
<vendorAddr1/>
<vendorAddr2>A</vendorAddr2>
<vendorCity>B</vendorCity>
<vendorName>C</vendorName>
<vendorNbr>677</vendorNbr>
<vendorPhone>D</vendorPhone>
<vendorState>E</vendorState>
<vendorZip>F</vendorZip>
</services>
<siteId>101567</siteId>
<siteName>X</siteName>
<squareFoot>4226.0</squareFoot>
<state>Y</state>
<weatherStation>Z</weatherStation>
<zip4>0</zip4>
<zip5>L</zip5>
</return>
</ns2:getSiteInfoResponse>
</soap:Body>
</soap:Envelope>
Now, I need to create a table from the whole response to be used in business intelligence. If I have to create matching headers in the DataSink, I would need to go through each and every response to make sure I have created a corresponding property, which is not humanly possible without compromising accuracy.
Is there any way to program Ready API either to store each individual XML response from each loop iteration in a file I specify (2000+ XML files), or to store all the values of the response node's children without my having to specify every header name in the DataSink window? Either way, I would then be fine using a BI tool to build the corresponding table from there.
Thank you in advance.

As you point out, the differing number of children makes a linear DataSink problematic.
That said, you can still use the DataSink to dump out all values in one go. In the DataSink, create a single header and use 'Get Data' to select the root node of your response.
This will obviously generate a massive file, so you have two choices here: either dump everything into a single file, or create a new file per response.
If you're wondering about naming lots of little files, you can generate a filename on the fly for the DataSink to use. To do this, create a Groovy script step inside the loop and make it return a path and file name. You could use a timestamp value, e.g. c:\temp\myResults\2020120218150102.txt, which is year, month, day, hour, minute, second and milliseconds. Then, in the DataSink step where you browse for the file name, use Get Data to 'grab' the result of the Groovy script.

@Chris Adams, thanks for your awesome idea. I could not put it into practice exactly as described, but because of your idea (Get Data) I took a different route and got what I wanted.
So this is what I did: instead of using DataSink, I used a Create File step. The idea is that whenever I schedule this task, Ready API runs the whole thing in a loop and drops the results into a static folder, with:
a file name containing the siteId, obtained via Get Data from the raw request's arg3:
${getSiteInfo#RawRequest#declare namespace bil='http://billing.xyz.cc/'; //bil:getSiteInfo[1]/arg3[1]}.xml
and the file content holding the whole response from the root node:
${getSiteInfo#Response#declare namespace soap='http://schemas.xmlsoap.org/soap/envelope/'; //soap:Envelope[1]}
The end result is exactly what I wanted.
However, I am still interested in the suggestion below, which I could not get to work:
"That said, you can still use the DataSink to dump out all values in one go. In the DataSink, create a single header and use 'Get Data' to select the root node of your response."

Related

PDI - Check data types of field

I'm trying to create a transformation that reads CSV files and checks the data type of each field in the CSV.
For example: field A should be a string of one character, and field B should be an integer/number.
What I want is to check/validate: if A is not string(1), set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid should be moved to an error folder.
I know I can use the Data Validator to do the check, but how do I move the file based on that status? I can't find any step that does it.
You can read the files in a loop and add steps as follows:
After data validation, filter the rows with a negative result (not matched) -> add an Add constants step with error = 1 -> add a Set Variables step for the error field with a default value of 0.
After the transformation finishes, add a Simple evaluation step in the parent job to check the value of the ERROR variable.
If it has the value 1, move the files; otherwise carry on as normal.
I hope this helps.
You can do the same as in this question. Once the files are read, use a Group by to get one flag per file. However, this time you cannot do it in one transformation; you should use a job.
Your use case is covered in the samples shipped with your PDI distribution, in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace Filter 2 of Get Files - Get all transformations.ktr with your logic, which should include a Group by so that you get one status per file rather than one status per row.
In case you wonder why you need such complex logic for such a task, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it also means you cannot know whether it is safe to move the file until every row has been processed.
Alternatively, there is the quick-and-dirty solution from your similar question: change the Filter rows to a type check, and the final Synchronize after merge to a Process files / Move operation.
And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you could use a JavaScript step as suggested there. It is more flexible if you need to maintain the check in the long run.

How to store graph in files

I want to store the following information in a file.
My program consists of a set of strings that are connected to each other, forming a graph.
I call each single string a "Tag".
Let's say we have 3 main tags: $Mohammed, $car, $color.
Each of the main tags contains sub-tags, and each sub-tag has a value, another sub-tag, or a set of sub-tags.
$Mohammed:
    $Age: "18"
    $color: $red
    $kind_of: $human
    $car:
        $type: $toyota
        $color: $blue
        $doors:
            $number: "3"
$car:
    $made_of: $metal
    $used_for: $transporting
    $types: {$mercedes, $toyota, $nissan}
    $best_color: $red
$color:
    $usedto: $coloring_things
    $example: {$red, $green, $blue, ...}
But that is not the only thing: there is a connection between tags of the same name, so $Mohammed->$car->$color must be connected with the main tag $color, and $Mohammed->$color:$red, $car->$best_color:$red, $color->$best_color:$red, and the main tag $red must all be connected to each other.
"Connected" means stored in such a way that I can recall the connected tags at once, just like computer memory: when something is fetched from memory, the information before and after the requested information comes along with it.
When I first looked at my situation, I thought XML would solve it, but then I realized that XML can't represent a graph.
I don't want to use a database for this. I want to keep databases as my last weapon.
Any ideas or suggestions on how I can store, connect, and recall this information from my program?
Thanks in advance.
You actually could use XML, but I would recommend JSON or YAML.
Your example format is already very close to YAML.
Take a look at Boost's property_tree.
It provides a nice C++ way to represent your tree of tags, and lets you decide very easily what kind of file representation you want, be that XML, JSON, or INFO.
Also, I don't see why your graph can't be represented by XML, as it supports named nodes.
Note, though, that while property_tree also supports the INI format, INI can't represent your tree, which is more than two levels deep.
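To make that concrete, here is a minimal property_tree sketch (a hedged example, not tested against your data: the key names and the tags.json file name are just taken from your example, and since a property_tree is a tree rather than a graph, the links between same-named tags would still have to be resolved by your own code):

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <string>

int main() {
    boost::property_tree::ptree pt;
    // Nested tags become dotted paths; every value is stored as a string.
    pt.put("Mohammed.Age", "18");
    pt.put("Mohammed.color", "red");
    pt.put("Mohammed.car.color", "blue");
    pt.put("car.best_color", "red");
    pt.put("color.example", "red,green,blue");
    // One call picks the on-disk representation; write_xml and write_info
    // from the matching headers work the same way.
    boost::property_tree::write_json("tags.json", pt);
    // Reading it back is symmetric:
    boost::property_tree::ptree back;
    boost::property_tree::read_json("tags.json", back);
    std::string best = back.get<std::string>("car.best_color"); // "red"
}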

WSO2 CEP Multiple rows in resultset

I wanted to know whether the WSO2 CEP/Siddhi query language supports returning multiple rows, and if so, how the data from those rows can be mapped to the output XML. For example, my event stream has a field statusCode which can have the values A/B/C, and I wanted to write a query that gives me the count by status type for the past 5 minutes, e.g. A-10, B-5, C-2. In my current query I used group by statusCode to get the count per status.
My query:
... insert into TestStream statusCode, count(statusCode) as count group by statusCode
and my output XML is something like
<statusSmry>
<status>{statusCode}</status>
<count>{count}</count>
</statusSmry>
The output I receive is something like
<statusSmry>
<status>A</status>
<count>10</count>
</statusSmry>
.....
<statusSmry>
<status>B</status>
<count>5</count>
</statusSmry>
....
<statusSmry>
<status>C</status>
<count>2</count>
</statusSmry>
Is it possible to get the results of the query in a single XML document, i.e. in the above case the counts for A, B, and C in one XML?
Thanks
Rajiv
What you are asking for is not possible in Siddhi.
This is because whenever there is an input event, the total count is updated, and at the same time an output for the corresponding updated group needs to be triggered to notify the subscribers. Since this is a realtime process, Siddhi cannot accumulate all the events and output them as one event/XML. If it were to accumulate events, there would be the question of how long to accumulate for (1 second? 1 day?) and in what format the output should be sent; therefore accumulation is currently not supported (as of WSO2 CEP 2.0.1).
If you need this feature, you will have to send the output of CEP to an ESB and run some kind of aggregation process there.
Suho

Write XML to txt file with TinyXml

I have a simple but large XML file. Let's say:
<products>
<product_id>98667</product_id>
<name>Hiking Boots</name>
<price>34.99</price>
<product_id>10123</product_id>
<name>Work Gloves</name>
<price>12.99</price>
<product_id>15773</product_id>
<name>Belt</name>
<price>14.99</price>
</products>
I want to write the data of the entire file (say 500 entries) to a tab-delimited text file. To keep it simple, let's say I only want to write each product's name and price. I could not find a typical TinyXml tutorial covering this kind of output.
My code:
void MyProducts::writeProducts(const char *pFilename) {
    TiXmlDocument doc(pFilename);
    bool loadOkay = doc.LoadFile();
    if (loadOkay) {
        cout << "The file loaded" << endl;
        cout << pFilename << endl;
        cout << doc << endl;    // streaming the whole doc requires TIXML_USE_STL
    }
    else {
        cout << "Failed to load file \"" << pFilename << "\"" << endl;
    }
} // end function
I'm not sure whether I should parse line by line like a normal file, or whether that even makes sense with this API. I can print the entire contents of doc, so I know the file is loading.
Any help is appreciated. I'm sure it's just a matter of knowing which function calls to make.
I would first go through their tutorial. It shows some basic XML writing as well as reading.
The above comments are correct... You would load the XML document (as you have in your code example) and then traverse the XML.
Looking at the documentation, since you already have the TiXmlDocument loaded, you'd need to call the RootElement method. This returns an element. From an element you're able to pull out information about the current node. To go 'farther' into the tree, you'd use TiXmlNode::IterateChildren to go through the root element's children. As you traverse the tree, a TiXmlNode is returned. You then use the node's methods to get at the different class types, which in turn pull out different information.
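Concretely, here is a minimal sketch of that kind of traversal (untested; it assumes the flat layout from the question, where each <name> is followed by a sibling <price>, it uses the FirstChildElement/NextSiblingElement helpers rather than IterateChildren, and the function and file-name parameters are just placeholders):

#include <fstream>
#include "tinyxml.h"

void writeNamesAndPrices(const char *xmlFile, const char *txtFile) {
    TiXmlDocument doc(xmlFile);
    if (!doc.LoadFile())
        return;                                  // nothing to do if the parse fails
    std::ofstream out(txtFile);
    TiXmlElement *root = doc.RootElement();      // the <products> element
    // Walk the <name> elements; the matching <price> is the next sibling.
    for (TiXmlElement *name = root->FirstChildElement("name");
         name != 0;
         name = name->NextSiblingElement("name")) {
        TiXmlElement *price = name->NextSiblingElement("price");
        out << (name->GetText() ? name->GetText() : "") << '\t'
            << (price && price->GetText() ? price->GetText() : "") << '\n';
    }
}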

XML or CSV for "Tabular Data"

I have "Tabular Data" to be sent from server to client --- I am analyzing should I be going for CSV kind of formate or XML.
The data which I send can be in MB's, server will be streaming it and client will read it line by line to start paring the output as it gets (client can't wait for all data to come).
As per my present thought CSV would be good --- it will reduce the data size and can be parsed faster.
XML is a standard -- I am concerned with parsing data as it comes to system(live parsing) and data size.
What would be the best solution?
Thanks for all the valuable suggestions.
If it is "Tabular data" and the table is relatively fixed and regular, I would go for a CSV-format. Especially if it is one server and one client.
XML has some advantage if you have multiple clients and want to validate the file format before using the data. On the other hand, XML has cornered the market for "code bloat", so the amount transfered will be much larger.
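To back that up: consuming streamed CSV on the client side really is a few lines, parsed row by row as the data arrives. A naive sketch (it deliberately ignores quoted fields containing commas):

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::string line;
    // Each line is a complete row, so parsing can start immediately,
    // without waiting for the rest of the payload.
    while (std::getline(std::cin, line)) {
        std::vector<std::string> fields;
        std::istringstream row(line);
        std::string field;
        while (std::getline(row, field, ','))    // naive split on commas
            fields.push_back(field);
        // ... hand `fields` to the application here ...
    }
}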
I would use CSV, with a header that indicates the id of each field:
id, surname, givenname, phone-number
0, Doe, John, 555-937-911
1, Doe, Jane, 555-937-911
As long as you do not forget the header, you should be fine if the data format ever changes. Of course, the client needs to be updated before the server starts sending new streams.
If not all clients can be updated easily, then you need a more lenient messaging system.
Google Protocol Buffers was designed for exactly this kind of backward/forward compatibility issue, and combines that with excellent (fast and compact) binary encoding to reduce message sizes.
If you go with this, then the idea is simple: each message represents a line. If you want to stream them, you need a simple "message size | message blob" structure.
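A rough sketch of that framing (hand-rolled, with a plain string standing in for the serialized message; a real implementation would also pin down the endianness of the length prefix):

#include <cstdint>
#include <istream>
#include <ostream>
#include <string>

// Write one length-prefixed message: a 4-byte size, then the raw bytes.
void writeFramed(std::ostream &out, const std::string &blob) {
    std::uint32_t size = static_cast<std::uint32_t>(blob.size());
    out.write(reinterpret_cast<const char *>(&size), sizeof(size));
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
}

// Read one message back; returns false on a clean end-of-stream.
bool readFramed(std::istream &in, std::string &blob) {
    std::uint32_t size = 0;
    if (!in.read(reinterpret_cast<char *>(&size), sizeof(size)))
        return false;
    blob.resize(size);
    return size == 0 || static_cast<bool>(in.read(&blob[0], size));
}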
Personally, I have always considered XML bloated by design. If you do go with a human-readable format, at least pick JSON; you'll cut the tag overhead in half.
I would suggest you go for XML.
There are plenty of libraries available for parsing it.
Moreover, if the data format changes later, the XML parsing logic won't need to change; only the business logic might.
With CSV, by contrast, the parsing logic itself might need a change.
The CSV format will be smaller, since you only declare the headers on the first row and then send rows of data below, with only the commas in between adding extra characters to the stream size.