XML to JSON conversion using perl - regex

I am using Perl to convert an XML file to JSON. I was able to write the code that converts the input XML file to JSON format.
Here is the sample XML:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Person SchemaVersion="1.0.8">
<Home ID="ABC-XYZ" State="Unknown">
Sample Perl code doing the JSON conversion:
use strict;
use warnings;
use JSON;
use XML::Simple;
# Create the XML::Simple object, keeping the root element (<Person>) in the result
my $xmlSimple = XML::Simple->new(KeepRoot => 1);
# Load the XML file into a Perl data structure (nested hash references)
my $dataXML = $xmlSimple->XMLin("input/path/to/above/XMLFILE");
# Use the encode_json function to convert the data structure to a JSON string
my $jsonString = encode_json($dataXML);
# Finally, print the JSON
print $jsonString;
Output JSON value:
"Person": {
"ID": "0",
"SchemaVersion": "1.0.8",
"Home": {
"ID": "ABC-XYZ",
"Laptop": {
"FileName": "/usr/temp/RPM_020515_.tar.gz"
"Location": {
"Number": "62",
"MaxSize": "0",
"Comment": { },
"SiteName": { }
"State": "Unknown"
The above code works fine.
Here is my question, and where I am actually stuck:
Along with the JSON conversion, I need to check whether the "FileName" element in the XML is empty. If it is not, I need to extract its value as well.
So the output will be two things:
1. The XML to JSON conversion (working fine).
2. Along with point 1, the value of the nested "FileName" element in the XML. I need it for some business logic in the next step.
Can some Perl experts help me here and suggest how I can do that in my current Perl code?
Thanks in advance for your help.
This is my first Perl script, so please excuse me if this is too trivial a question.
I tried reading the Perl docs, but they were not that helpful.
NOTE: I am trying to use only Perl's built-in libraries, not any new third-party libraries; that is the reason I used XML::Simple. It's a production code restriction (a very bad constraint). I hope something for this exists in XML::Simple or JSON.

As the XML::Simple docs note, it 'slurps' the XML into a data structure analogous to the input XML. Thus, to inspect the contents of the XML, just treat the result as the hash reference it most likely is:
if ($dataXML->{Person}{Home}{ID} eq 'foo') {
    # Some action
}

You should use XML::LibXML instead of XML::Simple, and then use XPath to query each document. I'd also look at the code of XML::XML2JSON to see whether it is a good fit for the problem…


JMeter json path extractor and Regular expression combination

I want to extract the sys_id of every item whose employee_number does not start with "C".
"items": [{
"sys_updated_on": "2021-01-15 15:04:04",
"sys_id": "60eaa1dc47870d9132f624846d434a",
"employee_number": "C89"
}, {
"sys_updated_on": "2017-12-08 09:26:49",
"sys_id": "c57058e8db8689ca52c4be13961974",
"employee_number": "983"
}, {
"sys_updated_on": "2016-04-08 13:25:00",
"sys_id": "fd413e848716119096ca2d0ebb358e",
"employee_number": "565"
I tried multiple JSON Extractor expressions, but no luck.
I need a combination of XPath and regular expression, as the JSON contains many other fields and the snippet above is only a small part of it.
I also want to know how to combine regular expressions and JSON.
If you want to match only numeric values of employee_number, try the following:
$.items[?(@.employee_number =~ /\d+/)].sys_id
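For comparison, here is a minimal JavaScript sketch of the same selection done outside JMeter. It assumes the full response is valid JSON with an items array (the sample above is truncated), and the helper name sysIdsWithoutCPrefix is hypothetical. It applies the criterion from the question (employee_number not starting with "C") rather than the digits-only pattern; since RegExp.test searches anywhere in the string, the pattern is anchored with ^.

```javascript
// Hypothetical helper: returns the sys_id of every item whose employee_number
// does not start with "C". Assumes the response is valid JSON with an
// "items" array, like the (truncated) sample in the question.
function sysIdsWithoutCPrefix(responseText) {
  const data = JSON.parse(responseText);
  return (data.items || [])
    .filter((item) => !/^C/.test(String(item.employee_number)))
    .map((item) => item.sys_id);
}

// Sample built from the three items shown in the question.
const sample = JSON.stringify({
  items: [
    { sys_updated_on: "2021-01-15 15:04:04", sys_id: "60eaa1dc47870d9132f624846d434a", employee_number: "C89" },
    { sys_updated_on: "2017-12-08 09:26:49", sys_id: "c57058e8db8689ca52c4be13961974", employee_number: "983" },
    { sys_updated_on: "2016-04-08 13:25:00", sys_id: "fd413e848716119096ca2d0ebb358e", employee_number: "565" },
  ],
});

console.log(JSON.stringify(sysIdsWithoutCPrefix(sample)));
// ["c57058e8db8689ca52c4be13961974","fd413e848716119096ca2d0ebb358e"]
```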
More information:
JsonPath Operators
How to Use the JSON Extractor For Testing
With regard to what you "also want to know": it's one question per post, however I'll give you a hint:

Regular Expression If 2nd parameter is Enrollment

I have the response below:
"id": "3452",
"enrollable_id": "3452",
"enrollable_type": "Enrollment"
"id": "3453",
"enrollable_id": "3453",
"enrollable_type": "Task"
"id": "3454",
"enrollable_id": "3454",
"enrollable_type": "Enrollment"
"id": "3455",
"enrollable_id": "3455",
"enrollable_type": "Task"
I would like to get the id values (3452 and 3454) only if enrollable_type is Enrollment. This is for the JMeter Regular Expression Extractor, so it would be great if I could use a one-liner regex to fetch 3452 and 3454.
The RegEx you are looking for is:
_id":\s*"([^"]+(?=[^\0}]+_type":\s*"E))
_id":\s*" Finds the place where the enrollable_id is
[^"]+(?= Matches the ID if:
[^\0}]+_type":\s* Finds the place where enrollable_type is
"E Checks if the enrollable type begins with an uppercase E
) End if
( ) Captures the ID
Note that this RegEx matches only the valid entries and captures their IDs. This means you need to read each match's capture group rather than the whole match.
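The capture-group point can be checked with a small JavaScript sketch. It assumes the regex described token by token above reads _id":\s*"([^"]+(?=[^\0}]+_type":\s*"E)) and that each entry is wrapped in {...} braces, which the response in the question does not show; without braces, [^\0}]+ could run on into the next entry. The helper name enrollmentIds is hypothetical.

```javascript
// Regex assembled from the token-by-token explanation.
// [^\0}]+ keeps the lookahead inside the current {...} object, so a "Task"
// entry cannot borrow the "Enrollment" type of the next object.
const enrollmentIdRe = /_id":\s*"([^"]+(?=[^\0}]+_type":\s*"E))/g;

// Hypothetical helper: collect capture group 1 of every match,
// not the whole match (which also contains the _id": " prefix).
function enrollmentIds(text) {
  return Array.from(text.matchAll(enrollmentIdRe), (m) => m[1]);
}

// Assumes each entry is wrapped in braces.
const response =
  '{ "id": "3452", "enrollable_id": "3452", "enrollable_type": "Enrollment" } ' +
  '{ "id": "3453", "enrollable_id": "3453", "enrollable_type": "Task" } ' +
  '{ "id": "3454", "enrollable_id": "3454", "enrollable_type": "Enrollment" } ' +
  '{ "id": "3455", "enrollable_id": "3455", "enrollable_type": "Task" }';

console.log(JSON.stringify(enrollmentIds(response))); // ["3452","3454"]
```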
The above RegEx contains backslashes, which you will need to escape if using the RegEx as a string literal.
This is the RegEx with all necessary-to-escape characters escaped:
It's usually a bad idea to parse structured data with just a regex, but if you're intent on going this route then here you go:
This assumes that enrollable_type always follows enrollable_id and that everything is quoted consistently, with a little allowance for variation in whitespace. You should be able to handle a little more variation if necessary, for example if you're unsure whether keys or values are always quoted (["']?). However, if you can't depend on the order of the properties (for example, if the type may come before the id), then you should abandon the regex approach.
Here's a sample working in JavaScript:
const text = `{ "id": "3452", "enrollable_id": "3452", "enrollable_type": "Enrollment" } { "id": "3453", "enrollable_id": "3453", "enrollable_type": "Task" } { "id": "3454", "enrollable_id": "3454", "enrollable_type": "Enrollment" } { "id": "3455", "enrollable_id": "3455", "enrollable_type": "Task" }`;
const re = /"(\d+)"\s*,\s*(?="enrollable_type":\s*"Enrollment")/g;
let match;
while ((match = re.exec(text)) !== null) {
  // match[1] is the quoted number immediately followed by "enrollable_type": "Enrollment"
  console.log(match[1]); // logs 3452, then 3454
}
Your response seems to be JSON (although it's malformed). If it really is JSON, I would recommend using the JSON Extractor instead: regular expressions are fragile and sensitive to markup changes, new lines, the order of elements, etc., while the JSON Extractor looks only at the content.
The relevant JSON Path query would be something like:
$..[?(@.enrollable_type == 'Enrollment')].enrollable_id
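To illustrate what the .. deep scan and the filter do, here is a JavaScript sketch that approximates the query on parsed data. The helper enrollableIds is hypothetical and assumes the entries arrive as a valid JSON array; it is not how JMeter's JSON Extractor is implemented, and edge cases of a real JSON Path engine may differ.

```javascript
// Hypothetical helper roughly mirroring
//   $..[?(@.enrollable_type == 'Enrollment')].enrollable_id
// It walks every nested object/array (the ".." deep scan) and collects
// enrollable_id from objects whose enrollable_type is "Enrollment".
function enrollableIds(node, out = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => enrollableIds(child, out));
  } else if (node !== null && typeof node === "object") {
    if (node.enrollable_type === "Enrollment" && "enrollable_id" in node) {
      out.push(node.enrollable_id);
    }
    Object.values(node).forEach((child) => enrollableIds(child, out));
  }
  return out;
}

// Assumes the entries form a valid JSON array (the posted response is malformed).
const data = JSON.parse(`[
  { "id": "3452", "enrollable_id": "3452", "enrollable_type": "Enrollment" },
  { "id": "3453", "enrollable_id": "3453", "enrollable_type": "Task" },
  { "id": "3454", "enrollable_id": "3454", "enrollable_type": "Enrollment" },
  { "id": "3455", "enrollable_id": "3455", "enrollable_type": "Task" }
]`);

console.log(JSON.stringify(enrollableIds(data))); // ["3452","3454"]
```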
More information: JMeter's JSON Path Extractor Plugin - Advanced Usage Scenarios
You can extract the data in two ways.
1. Using the JSON Extractor.
To extract data with the JSON Extractor, the response data must follow JSON syntax rules.
To extract the data, use the following JSON path in the JSON Extractor
and set Match No. to -1.
2. Using the Regular Expression Extractor. To extract data with it, use the following regex:
id": "(.+?)",\s*(.+?)\s*"enrollable_type": "Enrollment
template : $1$2$3$4$
Match No.: -1
You can see the stored variables using a Debug Sampler.
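The following JavaScript sketch runs the regex above over the response with one property per line, as posted in the question. JMeter's template and Match No. settings are not modelled; collecting capture group 1 of every match is only a rough stand-in for Match No. -1. It also relies on . not matching line breaks (no s flag), which keeps (.+?) within a single line. The helper name allFirstGroups is hypothetical.

```javascript
// The regex from the Regular Expression Extractor answer.
// Without the s flag, "." does not cross line breaks, so (.+?) cannot
// run from one entry into the next.
const re = /id": "(.+?)",\s*(.+?)\s*"enrollable_type": "Enrollment/g;

// Hypothetical helper: rough stand-in for Match No. -1,
// collecting capture group 1 of every match.
function allFirstGroups(text) {
  return Array.from(text.matchAll(re), (m) => m[1]);
}

// One property per line, as in the response from the question.
const response = [
  '"id": "3452",', '"enrollable_id": "3452",', '"enrollable_type": "Enrollment"',
  '"id": "3453",', '"enrollable_id": "3453",', '"enrollable_type": "Task"',
  '"id": "3454",', '"enrollable_id": "3454",', '"enrollable_type": "Enrollment"',
  '"id": "3455",', '"enrollable_id": "3455",', '"enrollable_type": "Task"',
].join("\n");

console.log(JSON.stringify(allFirstGroups(response))); // ["3452","3454"]
```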
More information
extract variables

JMeter complicated regular expression solution? [duplicate]

I have the following JSON format in the response body:
"Name" : "Prashant",
"City" : "Sydney"
"Name" : "Yogi",
"City" : "London"
What is the best way to check whether this array has any records and, if so, get the "Name" at the first array index? I am using the jp@gc JSON extractor plugin for JMeter.
Is it possible to parse this using a plugin or do I need to do it using regular expressions?
Using the Ubik Load Pack JSON plugin for JMeter, which has been part of JMeter since version 3.0 (a donated plugin) under the name JSON Extractor, you can do it:
Test Plan overview:
ULP_JSON PostProcessor:
If Controller:
And here is the run result:
So, as you can see, it is possible with plain JMeter.
If you're looking to learn JMeter, this book by 3 developers of the project will help you.
I am not sure about your plugin but if it supports JSON path expressions it should be possible.
Try this expression: $.[0].Name
This is the plugin I use: http://jmeter-plugins.org/wiki/JSONPathExtractor/ and given expression works with it.
You can find more about JSON Path expressions here: http://goessner.net/articles/JsonPath/index.html#e2.
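As a sketch of the logic both answers describe (check that the array has records, then read the first element's Name), here is a JavaScript version. It assumes the response body is a top-level JSON array like the one in the question; firstName is a hypothetical helper, not part of any JMeter plugin.

```javascript
// Hypothetical helper: returns the "Name" of the first element if the
// response is a non-empty JSON array, otherwise null.
function firstName(responseText) {
  const records = JSON.parse(responseText);
  if (!Array.isArray(records) || records.length === 0) {
    return null; // no records: nothing to extract
  }
  const first = records[0];
  return first && first.Name !== undefined ? first.Name : null;
}

// Assumes a top-level array of objects like the one in the question.
const body = '[{"Name": "Prashant", "City": "Sydney"}, {"Name": "Yogi", "City": "London"}]';
console.log(firstName(body)); // Prashant
console.log(firstName("[]")); // null
```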
Working with JSON in JMeter is not easy, as JMeter was designed long ago, before JSON was invented.
There are, however, some extensions that make life easier:
We can add a Regular Expression Extractor to fetch the value from the response, like this:
If possible, always use the Regular Expression Extractor and try to avoid the JSON / XPath / other extractors. They might look easier to use, but they consume more memory and time, which affects the performance of your test plan.
source: http://www.testautomationguru.com/jmeter-response-data-extractors-comparison/
Sample REST GET service response:
"ObjectIdentifiers": {
"internal": 1,
"External1": "221212-12121",
"External3": "",
"Name": "koh"
"PartyType": "naturalPerson",
"NaturalPerson": {
"idNo": "221212-12121",
"Title": "Mr",
"Name": "koh",
"FirstName": "",
We had a similar requirement in our project for parsing JSON responses in JMeter: we had to validate all the fields in the JSON response, with the expected value of each field provided by an external data source.
I found the JSR223 PostProcessor quite useful in this case, as it lets us implement Groovy scripts. It is built into recent JMeter versions.
Below is the code snippet:
import net.minidev.json.parser.JSONParser
// get the JSON response from the previous sampler
String getResponse = prev.getResponseDataAsString();
// parse the response (json-smart JSONParser) and convert it back to a string
JSONParser parser = new JSONParser(JSONParser.MODE_JSON_SIMPLE);
Object parResponse = parser.parse(getResponse);
String preResponse = parResponse.toString();
// replace all commas with a semicolon
String csvResponse = preResponse.replaceAll(",", ";");
// append the response to the log file (java.io classes are auto-imported in Groovy)
String logFileName = "C:/apache-jmeter-5.1.1/Web_Service_Output.csv";
BufferedWriter outLog = new BufferedWriter(new FileWriter(logFileName, true));
outLog.write(csvResponse + "\n");
outLog.close();

Generic solution for removing the XML declaration using Perl

Hi, I want to remove the declaration from my XML file. The problem is that the declaration is sometimes on the same line as the root element.
Case 1: the XML looks as follows:
<?xml version="1.0" encoding="UTF-8"?> <document> This is a document root
Case 2:
<?xml version="1.0" encoding="UTF-8"?>
<document> This is a document root
The function should also work for the case where the root node is on the next line.
My function works only for case 2.
sub getXMLData {
    my ($xml) = @_;
    my @data = ();
    while(<FILE>) {
        if(/\<\?xml\sversion/) {next;}
        push(@data, $_);
    }
    return join("\n",@data);
}
Please note that the encoding is not always the same.
OK, so the problem here is that you're trying to parse XML line by line, and that DOESN'T WORK. You should avoid doing it, because it makes brittle code that will one day break, as you've noted, thanks to perfectly valid changes to the source XML. Both your documents are semantically identical, so the fact that your code handles one and not the other is an example of exactly why handling XML this way is a bad idea.
More importantly though - why are you trying to remove the XML declaration from your XML? What are you trying to accomplish?
Generically reformatting XML can be done like this:
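use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new(
    pretty_print => 'indented',
);
# assumption: the XML file name is passed as the first command-line argument
$twig->parsefile( $ARGV[0] );
$twig->print;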
This will parse your XML and reformat it in one of the valid ways XML may be formatted. However I would strongly urge you not to just discard your XML declaration, and instead carry on with something like XML::Twig to process it. (Open a new question with what you're trying to accomplish, and I'll happily give you a solution that doesn't trip up with different valid formats of XML).
When it comes to merging XML documents, XML::Twig can do this too - and still check and validate your XML as it goes.
So you might do something like (extending from the above):
foreach my $file ( @file_list ) {
    my $child = XML::Twig -> new ();
    $child -> parsefile ( $file );
    my $child_doc = $child -> root -> cut;
    $child_doc -> paste ( $twig -> root );
}
$twig -> print;
Exactly what you'd need to do depends a little on your desired output structure; you'd need to 'wrap' the documents in a root element anyway. Open a new question with some sample input and desired output, and I'll happily take a crack at it.
As an example - if you feed the above your sample input twice, you get:
<?xml version="1.0" encoding="UTF-8"?>
<document><document> This is a document root
<child>----</child></document> This is a document root
Which I know isn't likely to be what you want, but hopefully illustrates a parser based way of XML restructuring.

How to extract everything between 2 characters from JSON response?

I'm using regular expressions in JMeter 2.8 to extract some values from JSON responses.
The response looks like this:
"key": "prod",
"id": "p2301d",
"objects": [{
"id": "102955",
"key": "member",
I'm trying to get everything except the text between [{....}] with one regex.
I've tried "key":([^\[\{.*\}\],].+?), but I always get the other values between [{...}] as well (in this example: member).
Any ideas?
You could try the custom JSON utilities for JMeter (JSON Path Assertion, JSON Path Extractor, JSON Formatter), in this case the JSON Path Extractor.
Add the ATLANTBH jmeter-components to JMeter: https://github.com/ATLANTBH/jmeter-components#installation-instructions.
Add a JSON Path Extractor (from the Post Processors list) as a child of the sampler that returns the JSON response you want to process:
(I've used a Dummy Sampler to emulate your response; you will use your original sampler.)
Add as many extractors as there are values you want to extract (3 in this case: "key", "id", "features").
Configure each extractor: define variable name to store extracted value and JSONPath query to extract corresponding value:
for "key": $.key
for "id": $.id
for "features": $.features
Later in the script you can refer to the extracted values through JMeter variables (the variable name set in the "Name" field of the JSON Path Extractor settings): e.g. ${jsonKey}, ${jsonID}, ${$.features}.
Perhaps it's not the most optimal way, but it works.
My solution to my problem was to turn the JSON into an object so that I can extract just the value I want, and not the values inside the {...}.
Here you can see my code:
var JSON={"itemType":"prod","id":"p2301d","version":"10","tags":[{"itemType":"member","id":"p2301e"},{"itemType":"other","id":"prod10450"}],"multiPrice":null,"prices":null};
//Transformation into an object:
obj = eval(JSON );
//write in the Jmeter variable "itemtype", the content of obj.itemType:prod
vars.put("itemtype", obj.itemType);
For more information: http://www.havecomputerwillcode.com/blog/?p=500.
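Building on the idea of turning the JSON into an object, this JavaScript sketch keeps every top-level value except the nested arrays and objects, which is what the question asks for. It swaps eval for JSON.parse, which only accepts valid JSON and does not execute code; whether a given JMeter scripting engine provides JSON.parse is not covered here. The helper name topLevelScalars is hypothetical.

```javascript
// Hypothetical helper: keeps only the top-level properties whose values are
// not objects or arrays, i.e. everything except the nested [{...}] parts.
function topLevelScalars(responseText) {
  const obj = JSON.parse(responseText);
  const result = {};
  for (const [key, value] of Object.entries(obj)) {
    if (value === null || typeof value !== "object") {
      result[key] = value;
    }
  }
  return result;
}

// The sample object from the answer, as a JSON string.
const response = '{"itemType":"prod","id":"p2301d","version":"10","tags":[{"itemType":"member","id":"p2301e"},{"itemType":"other","id":"prod10450"}],"multiPrice":null,"prices":null}';
console.log(JSON.stringify(topLevelScalars(response)));
// {"itemType":"prod","id":"p2301d","version":"10","multiPrice":null,"prices":null}
```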
A general solution:
Regex: (\[{\n\s*(?:\s*"\w+"\s*:\s*[^,]+,)+\n\s*}\])
Explanation: you don't consume the whitespace correctly. Each line is preceded by spaces, and you must consume them before matching; that's why your regex isn't working. You also don't need to escape the { character.
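To see the regex in action, this JavaScript sketch removes the [{...}] block from a hypothetical response shaped like the question's snippet (including the trailing comma after "member", which the regex relies on) and then extracts the "key" values. The regex only matches blocks whose inner lines all end with a comma, so it is tied to this exact layout; the keys helper is hypothetical.

```javascript
// The regex from the answer: matches a "[{" ... "}]" block whose
// inner lines each look like  "name": value,  (note the trailing commas).
const nestedBlock = /(\[{\n\s*(?:\s*"\w+"\s*:\s*[^,]+,)+\n\s*}\])/;

// Hypothetical response shaped like the snippet in the question.
const response = [
  "{",
  '    "key": "prod",',
  '    "id": "p2301d",',
  '    "objects": [{',
  '        "id": "102955",',
  '        "key": "member",',
  "    }]",
  "}",
].join("\n");

// Hypothetical helper: extract every "key" value from the text.
const keys = (text) => Array.from(text.matchAll(/"key":\s*"([^"]+)"/g), (m) => m[1]);

console.log(JSON.stringify(keys(response))); // ["prod","member"]
console.log(JSON.stringify(keys(response.replace(nestedBlock, "")))); // ["prod"]
```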