Extract JSON data from file in C++

Hi there, sorry if this question is not well suited to this forum. I'm pretty new to programming and thought I'd get a better command of strings and files by creating this little project. What I'm trying to do is extract data from a JSON document. Eventually I suppose I'd store the data in an array and work with it later.
Basically, I'm wondering if there is a better way of going about this. The code seems kind of wordy and definitely not elegant. Again, sorry if this question is not a good one, but I figured there'd be no better way to learn than through a community like this.
#include <iostream>
#include <fstream>
#include <cstring>
#include <string> //probably including more than necessary
using namespace std; //should be specifying items using scope resolution operator instead
int main(int argc, const char * argv[])
{
    ifstream sfile("JSONdatatest.txt");
    string line,temp;
    while(!sfile.eof()){
        getline(sfile, line);
        temp.append(line); //creates string from file text, use of temp seems extraneous
    }
    sfile.close();
    cout << "Reading from the file.\n";
    size_t counter=0;
    size_t found=0;
    size_t datasize=0;
    while(found!=string::npos && found<1000*70){ //problem here, program was creating infinite loop
        //initial 'solution' was to constrain found var
        //but fixed with if statement
        found = temp.find("name: ",counter);
        if(found!=string::npos){
            found=found+7; //skip "name: " (6 chars) plus the opening quote, to where the data begins
            size_t ended=temp.find_first_of( "\"", found);
            size_t len=ended-found; //length of datum to extract
            string temp2(temp, found, len); //odd use of a second temp variable
            cout << temp2 << endl;
            counter=ended+1;
            datasize++; //also problem with data size and counter, so many counters, can they
                        //coordinate to have fewer?
        }
    }
    cout << datasize;
    return 0;
}
Where I indicate an infinite loop is created, I fixed it by adding the if statement inside the while loop. My guess is that because I add 7 to 'found', it can skip over npos and the loop continues. Adding the if statement fixed it, but made the code look clunky. There has to be a more elegant solution.
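The guess above is close: std::string::npos is the largest value a size_t can hold, so when find fails, npos + 7 wraps around to 6 and the search starts over near the beginning of the string instead of stopping. Below is a minimal standard-library-only sketch of a tidier version. It checks the result of find before doing any arithmetic on it, tracks a single position instead of several counters, and reads the whole file through rdbuf() instead of a while(!eof()) loop. The names readWholeFile and extractQuotedValues are hypothetical, and like the original it assumes each value is written as name: "..." (it is not a real JSON parser).

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Reads an entire file into one string; returns "" if the file cannot be opened.
std::string readWholeFile(const std::string& path) {
    std::ifstream in(path);
    if (!in) return std::string();
    std::ostringstream contents;
    contents << in.rdbuf();  // copy the whole stream in one go
    return contents.str();
}

// Collects every value written as <key>"value", e.g. key "name: " matches name: "JOHN".
std::vector<std::string> extractQuotedValues(const std::string& text, const std::string& key) {
    std::vector<std::string> values;
    std::size_t pos = 0;
    // Test find() against npos BEFORE adding anything to it, so npos never wraps around.
    while ((pos = text.find(key, pos)) != std::string::npos) {
        std::size_t begin = text.find('"', pos + key.size());  // opening quote
        if (begin == std::string::npos) break;
        ++begin;                                               // first character of the value
        std::size_t end = text.find('"', begin);               // closing quote
        if (end == std::string::npos) break;
        values.push_back(text.substr(begin, end - begin));
        pos = end + 1;                                         // resume after this value
    }
    return values;
}
```

The count the original kept in datasize is then simply values.size().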
Thanks in advance!

I would recommend that you use a third-party library to do all this, since it is pretty tough with raw tools. I actually did this kind of thing recently, so I can give you some help.
I would recommend you take a look at boost::property_tree .
Here is the theory: a JSON file is like a tree; you have a root and many branches.
The idea is to transform this JSON file into a boost::property_tree::ptree, so that you can then work easily with the ptree object rather than with the file.
First, let's say we have this JSON file:
{
    "document": {
        "person": {
            "name": "JOHN",
            "age": 21
        },
        "code": "AX-GFD123"
    },
    "body" : "none"
}
Then in your code, be sure to include:
#include "boost/property_tree/ptree.hpp"
#include "boost/property_tree/json_parser.hpp"
#include "boost/foreach.hpp" // needed for BOOST_FOREACH
Then here is the most interesting part:
boost::property_tree::ptree root;
You create the ptree object named root.
boost::property_tree::read_json("/path_to_my_file/doc.json", root);
Then you tell it which file to read and where to store the result (here, in root). Be careful: read_json throws an exception if the file doesn't exist or can't be parsed, so you should wrap this call in try / catch.
From then on you only work with the root tree, which is really easy to do. It has many functions (I invite you to see the Boost documentation page).
Say you want to access the name field. Then do this:
std::string myname = root.get<std::string> ("document.person.name", "NOT FOUND");
The get function takes the path to the attribute you want as its first parameter; the second is the default value returned if the path is incorrect or doesn't exist. The <std::string> specifies the type it must return.
Let's finish with another example. Say you want to go through all your root nodes, meaning every node at the top level.
BOOST_FOREACH(const boost::property_tree::ptree::value_type& child, root.get_child(""))
{
    std::cout << child.first << std::endl;
}
This is a bit more complicated, so let me explain. You tell Boost to look at every child of the root with root.get_child(""), where the empty path "" stands for the root itself. Then, for every child found (like a basic iterator), you get a const boost::property_tree::ptree::value_type& child.
So inside the foreach, you use child to access whatever you want. child.first gives you the name of the child node currently in use. In my example it will print first document, and then body.
I invite you to have a look at the Boost documentation. It may look hard at first, but it is really easy to use after that.
http://www.boost.org/doc/libs/1_41_0/doc/html/property_tree.html

Related

RapidXML accessing sibling nodes causes segfaults for seemingly no reason

So I recently got hold of RapidXML to parse XML in my program. I have mainly been using it to mess around, but I have been getting some very weird issues that I'm really struggling to track down. Bear with me, because I was pretty thorough in trying to fix this issue, but I must be missing something.
First off, here's the XML:
<?xml version="1.0" encoding="utf-8" ?>
<resources>
<image key="tilemap_roguelikesheet" path="res/media/tilemaps/roguelikesheet.png" />
<image key="tilemap_tiles" path="res/media/tilemaps/tiles.png" />
</resources>
The function where the segfault occurs:
void TextureManager::LoadResource(const char* pathToFile)
{
    rapidxml::xml_document<>* resource = Resources::LoadResource(pathToFile);
    std::string imgName;
    std::string imgPath;
    if (resource != NULL)
    {
        rapidxml::xml_node<>* resourcesNode = resource->first_node("resources");
        if (resourcesNode != NULL)
        {
            for (rapidxml::xml_node<>* child = resourcesNode->first_node("image"); child; child = child->next_sibling())
            {
                //Crash here on the second loop through.
                imgName = child->first_attribute("key")->value();
                imgPath = child->first_attribute("path")->value();
                Astraeus::Log(moduleName, "Image Name: " + imgName);
                Astraeus::Log(moduleName, "Image Path: " + imgPath);
                TextureManager::AddTexture(imgName, imgPath);
            }
        }
        else
        {
            Astraeus::Error(moduleName, "Resources node failed to load!");
        }
        resource->clear();
    }
    else
    {
        std::string fileName(pathToFile);
        Astraeus::Error(moduleName, fileName + " could not be loaded.");
    }
}
So the segfault happens on the second pass of the for loop that goes through all the nodes, and it triggers when it tries to do the imgName assignment. Here's where things get a bit odd. When debugging the program, the initial child node's breakdown shows it has memory pointers to the next nodes and its elements/attributes etc. When investigating those nodes, you can see that the values exist and RapidXML has seemingly parsed the file successfully.
However, when the second loop occurs, child is shown to still have the exact same memory pointers, but this time the breakdown shows the values are essentially NULL, so the program fails with exit code 139. If you look at the previous node, the one we have just come from, its values are also NULL.
Now, if I comment out the line that calls the AddTexture function, the loop is able to print out all the nodes' values with no problems at all. (The Log method is essentially just printing to the console until I do some more funky stuff with it.) So the problem must lie in that function? Here it is:
void TextureManager::AddTexture(const std::string name, const std::string path)
{
    Astraeus::Log(moduleName, "Loading texture: " + path);
    if (texturesLookup.find(name) != texturesLookup.end())
    {
        Astraeus::Error(moduleName, "Texture Key: " + name + " already exists in map!");
    }
    else
    {
        texturesLookup.insert(std::make_pair(name, path));
        //Texture* texture = new Texture();
        /*if (texture->LoadFromFile(path))
        {
            //textures.insert(std::make_pair(name, texture));
        }
        else
        {
            Astraeus::Error(moduleName, "Failed to add texture " + name + " to TextureManager!");
        }*/
    }
}
Even though the strings are passed by value and so shouldn't affect the nodes in any way, this function is still a bit iffy. If I comment out everything it can work, but sometimes it just crashes again. Some of the code is commented out because, as a workaround, I switched from directly storing the key name plus a pointer to a texture to storing the key and path strings, so that I could load the texture into memory later on. This solution worked for a little while, but sure enough it began to segfault all over again.
I can't reliably replicate or narrow down what causes the issue every time, so I would appreciate any help. Is the RapidXML document somehow going out of scope and being deleted?
For the record, the class is practically just static, along with the map that stores the texture pointers.
Thanks!
So for anybody coming back to this in the future, here's what was happening.
Yes, it was a scope issue, but not with the xml_document as I initially kept thinking. The xml_file variable in the resource-loading function was going out of scope. RapidXML does not copy the text it parses; the document's nodes point into that file buffer. So as soon as the buffer went out of scope its memory was freed, and the next time a certain function did a dynamic allocation, it overwrote that memory and filled the xml document with garbage data.
So the fix is to make sure that both xml_file and xml_document stay in scope for as long as the nodes are used. I have added some of the suggestions from previous answers, but I will point out that those checks WERE in the code before I removed them to help with debugging.
Thanks everybody for the help/advice.
I'm not sure, but I think that Martin Honnen made the point.
If next_sibling() returns a pointer to the text node between the two "image" elements, then when you write
imgName = child->first_attribute("key")->value();
child->first_attribute("key") is a null pointer, so ->value() dereferences a null pointer. Crash!
I suppose you should get the next_sibling("image") element; something like
for (rapidxml::xml_node<>* child = resourcesNode->first_node("image");
child;
child = child->next_sibling("image"))
And to be sure not to use a null pointer, I strongly suggest checking the attribute pointers (are you really sure that every "image" element carries the "key" and "path" attributes?); something like this:
if ( child->first_attribute("key") )
    imgName = child->first_attribute("key")->value();
else
    ; // do something
if ( child->first_attribute("path") )
    imgPath = child->first_attribute("path")->value();
else
    ; // do something
p.s.: sorry for my bad English.
This line is setting my teeth on edge...
rapidxml::xml_document<>* resource = Resources::LoadResource(pathToFile);
LoadResource returns a pointer, but you never free it anywhere...?
Are you 100% sure that function isn't returning a pointer to an object that has since gone out of scope? Like this classic bug...
int * buggy()
{
    int i= 42;
    return &i; // UB
}
As @max66 says, you should use next_sibling("image"). If that's failing, you need to find out why.
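For the buggy() example above, the usual fixes are to return the value itself or, when the object really has to live on the heap, to hand ownership to the caller with std::unique_ptr so it is freed automatically. A minimal sketch on the same toy example; the function names are hypothetical and this is not the question's LoadResource:

```cpp
#include <memory>

// Return by value: the caller gets its own copy, so nothing dangles.
int fixedByValue() {
    int i = 42;
    return i;
}

// Heap object with explicit ownership: the caller owns it, and it is
// deleted automatically when the unique_ptr goes out of scope.
std::unique_ptr<int> fixedOwned() {
    return std::make_unique<int>(42);
}
```

Applied to the question, LoadResource could hypothetically return a std::unique_ptr to the document; per the asker's follow-up, the file buffer the document was parsed from would also have to stay alive.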

How to use a variable in the same struct it's defined in?

I am making a roguelike ASCII game and made a struct called "Armor", and I want to use the struct's name variable to build the path to the item file with that name.
struct Armor {
    bool equipped;
    std::string name;
    int getBuff(int buff) {
        std::fstream item;
        std::string line;
        std::string response;
        std::string value;
        item.open("../Data/Items/" + name + ".item", std::fstream::in);
        if (item.fail())
            errorQuit("ERROR: There was a problem loading armor type .ITEM file."); // Error and quit function
        while (!item.eof()) {
            getline(item, line);
            response = split(line, '=', 0); // Splits string
            if (response == "buff" + std::to_string(buff)) {
                value = split(line, '=', 1);
                break;
            }
        }
        item.close();
        return std::stoi(value);
    }
};
Then I called it like this:
Armor sword;
sword.name = "Wooden Sword";
int buff = sword.getBuff(1);
But this throws an Unhandled exception error.
I changed it so that getBuff takes 2 parameters, int buff and std::string itemName, and replaced name in the path with itemName.
Then I tried calling it like this:
Armor sword;
sword.name = "Wooden Sword";
int buff = sword.getBuff(1, sword.name);
But this throws the same error.
I'm confused as to why I can't use the name variable, as it has already been defined. Is there any other way I can use the name variable like that?
I see you've just edited your comment to say you've figured your problem out, but I just want to add something else that may be helpful:
Without seeing how errorQuit() is defined, there's a potential problem in your getBuff() function. If item.fail() returns true, the function may still carry on trying to process the data (unless errorQuit() somehow exits the program, which probably isn't the best approach).
Basically, testing for fail() may or may not give you the behavior you need in every scenario, depending on which bits are set in the stream state. Implementations vary, but if the file fails to open, failbit and/or badbit will be set, but not eofbit. getline() will see the error state and so it will not try to read from the stream when you call it. But that also means eofbit will never be set, so a loop like while (!item.eof()) never ends!
There are lots of different "techniques" for reading files. Some people prefer an RAII approach; others like looping on getline(). Or you could simply use good() to check the error state if you don't care what happened and just want to know whether everything is fine.
In any case, you might be interested in the info on this page: std::ios_base::iostate.
Thanks for all your help but I figured it out on my own.
I just made a stupid error that I overlooked like an idiot.
It is searching for buff + int (e.g. buff1) in the file, but there are multiple lines that contain that word, so I guess that messed it up. I just made an adjustment to the if statement and it is working as expected.
Sorry to bother you!
Something in your getBuff() function throws an exception. You don't handle exceptions, so the application quits with the corresponding message. Try surrounding the call to getBuff() with try/catch (include <iostream> and <stdexcept>):
try {
    int buff = sword.getBuff(1);
}
catch (const std::exception &e) {
    std::cout << e.what() << std::endl;
}
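File streams do not throw by default, so one plausible thrower inside getBuff() is std::stoi: if no line matched, value is still empty, and std::stoi("") throws std::invalid_argument (numbers too large for an int throw std::out_of_range). A hedged sketch of a conversion that reports failure instead of letting either exception escape; tryParseInt is a hypothetical helper, not part of the question's code:

```cpp
#include <stdexcept>
#include <string>

// Parses text as an int. Returns false instead of throwing when the text is
// empty, has no leading digits, has trailing characters, or does not fit in an int.
// On failure, out is left unchanged.
bool tryParseInt(const std::string& text, int& out) {
    try {
        std::size_t used = 0;
        const int value = std::stoi(text, &used);
        if (used != text.size()) return false;  // trailing junk, e.g. "12abc"
        out = value;
        return true;
    } catch (const std::invalid_argument&) {    // no digits at all, e.g. "" or "abc"
        return false;
    } catch (const std::out_of_range&) {        // value larger than an int can hold
        return false;
    }
}
```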

How can one get the name of an HDF5 DataSet through the C or C++ API?

I'm trying to read the name of an HDF5 DataSet using the C++ API. For H5::Attribute objects, there is a getName() method. However, I don't see a similar getName() method for H5::DataSet objects.
Ideally I want to do this:
void Dump(H5::DataSet& ds)
{
    cout << "Dataset " << ds.getName() << endl;
    // continue to print dataset values
}
I know h5dump can do it, but from a brief look at its code, it only knows the name by walking the tree using H5Giterate; that is, only the parent knows the names of its children, and the children don't know their own names.
This is a partial answer, based on Simon's post. Note that the name is the full hierarchical path:
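#include <H5Cpp.h>
#include <string>
#include <vector>

std::string getName(const H5::DataSet& ds)
{
    // A first call with a NULL buffer returns the name length, excluding the terminating '\0'
    ssize_t len = H5Iget_name(ds.getId(), NULL, 0);
    if (len < 0)
        return std::string(); // could not retrieve the name
    // Leave room for the terminator (char buffer[len] is a non-standard VLA and one byte too short)
    std::vector<char> buffer(len + 1);
    H5Iget_name(ds.getId(), buffer.data(), buffer.size());
    return std::string(buffer.data());
}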
Example name:
"/toplevel/videodata"
In C, there is H5Iget_name. I couldn't find the equivalent in C++, but you can use DataSet::getId() and pass that to the C function.
I guess the reason why this is not as simple as having a getName() accessor in DataSet is that to read a dataset, you either need to know its name or walk the tree. The only exception I can think of is when dereferencing a reference to a dataset.

How do I use the value of a variable to do things in my code?

I'm trying to parse an HTML file for a C++ assignment. The assignment is demonstrating stacks; we're supposed to push to the stack every time we hit a tag, and then pop off when we find the corresponding closing tag.
The teacher obviously wants us to hard-code a set of tags to detect, like:
// Declare some stacks
Stack html;
Stack div;
...
// When you find an open tag, push to the corresponding stack
if (tagcontents == "html") { html.push(); }
if (tagcontents == "div") { div.push(); }
...
// When you find a close tag, pop from the corresponding stack
if (tagcontents == "/html") { html.pop(); }
if (tagcontents == "/div") { div.pop(); }
...
The obvious downside of this is that if I want to support all of the tags available in HTML, I can expect to do lots of redundant coding. The teacher obviously wants us to pick just a small subset of the available tags, and go off those, but I think that's lame. Since I'm lazy (and I firmly believe that all programmers should be), I'm trying to come up with a dynamic solution.
The idea is, whenever I encounter a new tag, create a stack for it. This would allow my program to support ANY tag, regardless of validity. I'm hitting an interesting theoretical problem, though, and I'm not even sure what to call it in order to research it: I need to use the VALUE of a variable as part of my actual code, i.e.:
if (no stack exists named "HTML") { create a stack named "HTML" }
In simplistic terms, how can I:
tag = "html";
Stack tag; // make a stack named HTML?
Or is there another way to do this? Any help would be greatly appreciated. If I can't figure this out, I'll probably just use a switch/case statement like a quitter.
Create the stacks inside a std::map<std::string, Stack>.
Use a std::map or std::unordered_map:
std::map<std::string, Stack> myStacks;
Then you can just do
myStacks[tagcontents].push();
This will initialize a new stack for the key if one does not yet exist.
For a closing tag, strip the slash, check whether that name is in the map, and pop from its stack.
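A minimal sketch of the map-of-stacks idea, assuming the tags have already been pulled out of the HTML as strings such as "div" and "/div" (the tokenizing step is not shown). std::stack stands in for the assignment's own Stack class, and tagsBalanced is a hypothetical name:

```cpp
#include <cstddef>
#include <map>
#include <stack>
#include <string>
#include <vector>

// One stack per tag name, created the first time that tag opens.
// Each stack stores the positions where its tag was opened.
bool tagsBalanced(const std::vector<std::string>& tags) {
    std::map<std::string, std::stack<std::size_t>> stacks;
    for (std::size_t i = 0; i < tags.size(); ++i) {
        const std::string& tag = tags[i];
        if (!tag.empty() && tag[0] == '/') {
            // Closing tag: strip the slash and look the name up in the map.
            auto it = stacks.find(tag.substr(1));
            if (it == stacks.end() || it->second.empty())
                return false;       // closed without a matching open tag
            it->second.pop();
        } else {
            stacks[tag].push(i);    // operator[] creates the stack if needed
        }
    }
    for (const auto& entry : stacks)
        if (!entry.second.empty())
            return false;           // some tag was never closed
    return true;
}
```

Because each tag name has its own stack, this only checks that opens and closes pair up per name; overlapping input like <b><i></b></i> still counts as balanced, which a single shared stack would catch.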
I would do it differently and more simply, with only one stack for all tags (which I think is very reasonable, unless your teacher actually instructed you to use several stacks): declare a stack of strings, where each string represents a tag. You can use the STL stack for this:
stack<string> my_tags;
my_tags.push("div") will push "div" into the stack.
string tag = my_tags.top(); will query the top of the stack, and my_tags.pop() will pop the top item from the stack. Very easy :-)
Again, this solution is good if you don't really need to practice with several stacks, but just need to know where you are within the HTML nesting.
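A sketch of the single-stack version with std::stack<std::string>, under the same assumption that tags are already extracted as strings like "div" and "/div"; tagsNested is a hypothetical name. Since a closing tag must match the most recently opened one, it also rejects overlapping tags such as <b><i></b></i>.

```cpp
#include <stack>
#include <string>
#include <vector>

// Returns true if every closing tag matches the most recently opened tag
// and nothing is left open at the end.
bool tagsNested(const std::vector<std::string>& tags) {
    std::stack<std::string> open;
    for (const std::string& tag : tags) {
        if (!tag.empty() && tag[0] == '/') {
            // Closing tag: it must match the top of the stack.
            if (open.empty() || open.top() != tag.substr(1)) return false;
            open.pop();
        } else {
            open.push(tag);
        }
    }
    return open.empty();  // leftovers were never closed
}
```

Void elements such as <br> never get a closing tag, so real HTML input would need those filtered out before they are pushed.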
Here is an example:
#include <stdio.h>
#include <map>
#include <string>
#include <list>
#include <iostream>
typedef std::list<std::string> stack;
typedef std::map<std::string, stack> stack_map;
stack_map my_stacks;
stack& getStack(const std::string& stack_name) {
    stack_map::iterator it = my_stacks.find(stack_name);
    if( it != my_stacks.end() ) {
        return it->second;
    } else {
        my_stacks[stack_name] = stack();
        return my_stacks[stack_name];
    }
}
...
stack& div_stack = getStack("div");
// and use that for example
div_stack.push_back("some info");
div_stack.push_back("some more info ... ");
div_stack.push_back("s even more ... ");
.....

How to parse XML in C++ (TinyXML)

I'm currently working on a project in C++ where I need to read some things from an XML file. I've figured out that TinyXML seems to be the way to go, but I still don't know exactly how to use it.
Also, my XML file is a little tricky, because it looks a little different for every user who needs to use this.
The XML file I need to read looks like this:
<?xml version="1.0" encoding="utf-8"?>
<cloud_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx xmlns:d="http://www.kuju.com/TnT/2003/Delta" d:version="1.0">
<cCareerModel d:id="154964152">
<ScenarioCareer>
<cScenarioCareer d:id="237116344">
<IsCompleted d:type="cDeltaString">CompletedSuccessfully</IsCompleted>
<BestScore d:type="sInt32">0</BestScore>
<LastScore d:type="sInt32">0</LastScore>
<ID>
<cGUID>
<UUID>
<e d:type="sUInt64">5034713268864262327</e>
<e d:type="sUInt64">2399721711294842250</e>
</UUID>
<DevString d:type="cDeltaString">0099a0b7-e50b-45de-8a85-85a12e864d21</DevString>
</cGUID>
</ID>
</cScenarioCareer>
</ScenarioCareer>
<MD5 d:type="cDeltaString"></MD5>
</cCareerModel>
</cloud_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx>
Now, the goal of this program is to take some string (via a variable), search for the cScenarioCareer with the corresponding d:id, and read its IsCompleted and BestScore values.
Those strings later need to be worked with in my program, but that I can handle.
My questions here are
A. How do I go about searching for a specific cScenarioCareer ID?
B. How do I get the IsCompleted and BestScore values into some variables in my program?
Note: The xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx string is unique for every user, so keep in mind it can be anything.
If anyone out there would like to help me, I'd be very grateful, thank you.
PS. I'd like to have some understanding of what I'm doing here. Although "paste this code into your program" answers are acceptable, I think it would be much better if you can tell me how and why it works.
Since you're doing this in C++, I'll make this example using the ticpp interface to TinyXml that is available at ticpp.googlecode.com.
Assumptions:
A given xml file will contain one root tag (the <cloud_xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx> tag, whose name differs per user) and multiple <cCareerModel> tags.
Each <cCareerModel> contains a single <ScenarioCareer> tag which in turn contains a single <cScenarioCareer> tag
You've parsed the xml file into a ticpp::Document called xmlDoc
You don't need to examine the data type attributes
You don't mind using exceptions
I'll also assume that you have a context variable somewhere containing a pointer to that root tag. Because its name changes from user to user, ask for the first child element without giving a name:
ticpp::Element* cloud = xmlDoc.FirstChildElement();
Here's a function that will locate the ticpp::Element for the cScenarioCareer with
the given ID.
ticpp::Element* findScenarioCareer(const std::string& careerId)
{
    try
    {
        // Declare an iterator to access all of the cCareerModel tags and construct an
        // end iterator to terminate the loop
        ticpp::Iterator<ticpp::Element> careerModel;
        const ticpp::Iterator<ticpp::Element> modelEnd = careerModel.end();
        // Loop over the careerModel tags
        for (careerModel = cloud->FirstChildElement() ; careerModel != modelEnd ;
             ++careerModel)
        {
            // Construct loop controls to access careers
            ticpp::Iterator<ticpp::Element> career;
            const ticpp::Iterator<ticpp::Element> careerEnd = career.end();
            // Loop over careers (FirstChildElement returns a pointer, hence ->)
            for (career = careerModel->FirstChildElement("ScenarioCareer")->FirstChildElement() ;
                 career != careerEnd ; ++career)
            {
                // If the d:id attribute value matches then we're done
                if (career->GetAttributeOrDefault("d:id", "") == careerId)
                    return career;
            }
        }
    }
    catch (const ticpp::Exception&)
    {
        // ticpp throws when an expected element is missing; treat that as "not found"
    }
    return 0;
}
Then to get at the information you want you'd do something like:
std::string careerId = "237116344";
std::string completion;
std::string score;
ticpp::Element* career = findScenarioCareer(careerId);
if (career)
{
    try
    {
        completion = career->FirstChildElement("IsCompleted")->GetText();
        score = career->FirstChildElement("BestScore")->GetText();
    }
    catch (const ticpp::Exception&)
    {
        // Handle missing element condition
    }
}
else
{
    // Not found
}
Naturally I haven't compiled or tested any of this, but it should give you the idea.