Quick way to extract the information from .xml files into an object - C++

I am a beginner, and right now I am trying to extract the key information from an .xml file and load it into an object of my class. For example,
here is some of the information in the .xml file:
<row Id="17" Phone="12468" Address="Bos" />
<row Id="242" Phone="98324" Address="Chi" Age="30"/>
<row Id="157" Phone="23268" Age="25" />
<row Id="925" Phone="54325" Address="LA" />
And my class would be:
class worker{
public:
string ID;
string Phone;
string Address;
string Age;
};
I know the information varies from line to line; if a field is missing from a line, we store "" (an empty string) for it. I also know the attributes appear in the same order as the fields in the class. I tried to implement a function, say extractInfo(const string& line, const string& key):
// line: the whole line read from the .xml file
// key: one of "Id=\"", "Phone=\"", "Address=\"" or "Age=\"", so that find() lands
// just before the value that I want to extract.
string extractInfo(const string& line, const string& key){
size_t index = line.find(key);
if(index == string::npos) return "";
size_t start = index + key.length(); // first character after the opening quote
size_t end = line.find('"', start); // position of the closing quote
if(end == string::npos) return ""; // malformed line: no closing quote
return line.substr(start, end - start);
}
int main(){
...// for each line read from .xml, I build a new object of class worker and fill in the fields
worker.ID = extractInfo(line, "Id=\"");
worker.Phone = extractInfo(line, "Phone=\"");
...//etc.
...//then work on other manipulation
return 0;
}
My questions are: is there any way to read and load the information from the XML much more quickly through other APIs or functions? That is, is there any way to improve this function when the .xml file is huge (terabytes in size)? And is there any way to use less memory to, for example, find the oldest worker and print it out? I know it's tough for me, but I am still working hard on it!
Thanks in advance for all the ideas and advice!

You can parse XML with existing XML parsing libraries, such as rapidxml, libxml2, etc.
Note that the DOM approach is not really suitable for huge XML files, because it has to read the whole document to build the DOM tree. Instead, you can use libxml2's xmlreader interface to process the nodes one by one.
libxml2 xml reader
static void
streamFile(const char *filename) {
xmlTextReaderPtr reader;
int ret;
reader = xmlReaderForFile(filename, NULL, 0);
if (reader != NULL) {
ret = xmlTextReaderRead(reader);
while (ret == 1) {
const xmlChar *name = xmlTextReaderConstName(reader);
if(xmlStrEqual(BAD_CAST "row", name)) {
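// xmlTextReaderGetAttribute returns a newly allocated string (or NULL) that the caller must free
xmlChar *id = xmlTextReaderGetAttribute(reader, BAD_CAST "Id");
xmlChar *phone = xmlTextReaderGetAttribute(reader, BAD_CAST "Phone");
// your code here...
xmlFree(id);
xmlFree(phone);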
}
ret = xmlTextReaderRead(reader);
}
xmlFreeTextReader(reader);
if (ret != 0) {
fprintf(stderr, "%s : failed to parse\n", filename);
}
} else {
fprintf(stderr, "Unable to open %s\n", filename);
}
}
If your XML format is always like the above, you can also use std::regex_search to handle it:
https://en.cppreference.com/w/cpp/regex/regex_search
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string str = R"(<row Id="17" Phone="12468" Address="Bos" />)";
std::regex regex("(\\w+)=\"(\\w+)\"");
// get all tokens
std::smatch result;
while (std::regex_search(str, result, regex))
{
std::cout << result[1] << ": " << result[2] << std::endl;
str = result.suffix().str();
}
}
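On the question's memory concern (finding the oldest worker in a very large file), here is a minimal sketch that reads the input one line at a time and keeps only the best candidate seen so far, so memory use does not grow with file size. It assumes the one-element-per-line layout shown in the question; Worker, attributeValue and findOldest are hypothetical names made up for this illustration, not part of any library.

```cpp
#include <cstddef>
#include <exception>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type mirroring the question's class.
struct Worker {
    std::string id;
    std::string phone;
    std::string address;
    std::string age;
};

// Return the value of attribute `name` in `line`, or "" if it is absent.
// Assumes attributes are written as name="value" and preceded by a space.
std::string attributeValue(const std::string& line, const std::string& name) {
    const std::string key = " " + name + "=\"";
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos) return "";
    const std::size_t start = pos + key.size();
    const std::size_t end = line.find('"', start);
    if (end == std::string::npos) return "";
    return line.substr(start, end - start);
}

// Scan the input once, remembering only the oldest worker seen so far.
// Returns false if no row carried a numeric, non-negative Age.
bool findOldest(std::istream& in, Worker& oldest) {
    std::string line;
    long bestAge = -1;
    while (std::getline(in, line)) {
        const std::string age = attributeValue(line, "Age");
        if (age.empty()) continue;  // rows without an Age cannot win
        long value = 0;
        try {
            value = std::stol(age);
        } catch (const std::exception&) {
            continue;  // skip ages that are not numbers
        }
        if (value > bestAge) {
            bestAge = value;
            oldest.id = attributeValue(line, "Id");
            oldest.phone = attributeValue(line, "Phone");
            oldest.address = attributeValue(line, "Address");
            oldest.age = age;
        }
    }
    return bestAge >= 0;
}
```

To use it, open the file with std::ifstream (or wrap test data in std::istringstream) and pass the stream to findOldest. The same "keep only the current best" idea carries over unchanged if the rows come from a streaming XML reader instead of std::getline.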

Related

Reading process name from /proc/ID/cmdline using null chars doesn't work

I'm trying to create a small program that reads all PIDs and their process names from /proc. I already saw that to do this I need to cut the contents of cmdline at the null characters. It works very well, but in some cases it just doesn't work. Look at the following results:
Proc ID: 3440
Proc name: code
Proc ID: 2588
Proc name: zsh
Proc ID: 2450
Proc name: app --enable-sandbox --num-raster-threads=4 --enable-main-frame-before-activation --renderer-client-id=9 --no-v8-untrusted-code-mitigations --shared-files=v8_context_snapshot_data:100 --vscode-window-config=vscode:30d26ee9-f58e-4c70-8410-066d00859f6c
In the last result, the null byte does not seem to exist. Here are my methods for reading the processes:
class process{
public:
vector<pair<int,string>> getProcess(){
vector<pair<int,string>> allProcess;
// Some necessary variables;
fs::path filePath;
fstream fileToRead;
string fileLine;
string finalExecutableName;
string processID;
// For each file in /proc do...
for(auto file : fs::directory_iterator(procDir)){
// Get the folder name to check if is numeric
processID = this->getExecutableOrFolderName(file.path().c_str());
if(this->directoryNameIsNumeric(processID)){
// Get the path and append /cmdline
filePath = file.path();
filePath.concat("/cmdline");
// Open the file
fileToRead.open(filePath.c_str(),ios::in);
if(fileToRead.is_open()){
// Get the line and check if is not null
getline(fileToRead,fileLine);
if(fileLine != ""){
// Try get the executable name
finalExecutableName = this->getExecutableOrFolderName(fileLine);
// Push the result and the pid to vector
allProcess.push_back(make_pair(atoi(processID.c_str()),finalExecutableName));
}
fileLine = "";
}
else{
}
fileToRead.close();
}
}
return allProcess;
};
private:
bool directoryNameIsNumeric(string folderName){
size_t folderNameSize = folderName.length();
for(int i = 0; i < folderNameSize;i++){
if(!isdigit(folderName[i])){
return false;
}
}
return true;
}
//Some possible error here ?
string getExecutableOrFolderName(string fileText){
size_t nullPosition = fileText.find('\0');
if(nullPosition != string::npos){
fileText = fileText.substr(0,nullPosition);
}
return fileText.substr(fileText.rfind("/") + 1);
}
};
any help is appreciated.
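A minimal sketch of the splitting step, working on the raw bytes already read from /proc/<pid>/cmdline. It rests on an assumption that is not confirmed anywhere in this thread: that some programs rewrite their argument area, so the whole command line arrives as one space-separated string with no '\0' between arguments. Under that assumption the sketch also cuts at the first space, which is only a heuristic and will truncate executable paths that contain spaces. executableName is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Extract a short process name from the raw contents of /proc/<pid>/cmdline.
std::string executableName(const std::string& cmdline) {
    // argv[0] normally ends at the first NUL; substr(0, npos) keeps everything
    // when no NUL is present.
    std::string arg0 = cmdline.substr(0, cmdline.find('\0'));
    // Heuristic (assumption): treat the first space as the end of the path.
    arg0 = arg0.substr(0, arg0.find(' '));
    // Keep only the part after the last '/'.
    const std::size_t slash = arg0.rfind('/');
    return slash == std::string::npos ? arg0 : arg0.substr(slash + 1);
}
```

When only the short name is needed, reading /proc/<pid>/comm instead may be simpler, since it holds just the command name (truncated by the kernel) with no arguments.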

CppUnitTestFramework: Test Method Fails, Stack Trace Lists Line Number at the End of Method, Debug Test Passes

I know, I know - that question title is very much all over the place. However, I am not sure what could be an issue here that is causing what I am witnessing.
I have the following method in class Project that is being unit tested:
bool Project::DetermineID(std::string configFile, std::string& ID)
{
std::ifstream config;
config.open(configFile);
if (!config.is_open()) {
WARNING << "Failed to open the configuration file for processing ID at: " << configFile;
return false;
}
std::string line = "";
ID = "";
bool isConfigurationSection = false;
bool isConfiguration = false;
std::string tempID = "";
while (std::getline(config, line))
{
std::transform(line.begin(), line.end(), line.begin(), ::toupper); // transform the line to all capital letters
boost::trim(line);
if ((line.find("IDENTIFICATIONS") != std::string::npos) && (!isConfigurationSection)) {
// remove the "IDENTIFICATIONS" part from the current line we're working with
std::size_t idStartPos = line.find("IDENTIFICATIONS");
line = line.substr(idStartPos + strlen("IDENTIFICATIONS"), line.length() - idStartPos - strlen("IDENTIFICATIONS"));
boost::trim(line);
isConfigurationSection = true;
}
if ((line.find('{') != std::string::npos) && isConfigurationSection) {
std::size_t bracketPos = line.find('{');
// we are working within the ids configuration section
// determine if this is the first character of the line, or if there is an ID that precedes the {
if (bracketPos == 0) {
// is the first char
// remove the bracket and keep processing
line = line.substr(1, line.length() - 1);
boost::trim(line);
}
else {
// the text before { is a temp ID
tempID = line.substr(0, bracketPos - 1);
isConfiguration = true;
line = line.substr(bracketPos, line.length() - bracketPos);
boost::trim(line);
}
}
if ((line.find("PORT") != std::string::npos) && isConfiguration) {
std::size_t indexOfEqualSign = line.find('=');
if (indexOfEqualSign == std::string::npos) {
WARNING << "Unable to determine the port # assigned to " << tempID;
}
else {
std::string portString = "";
portString = line.substr(indexOfEqualSign + 1, line.length() - indexOfEqualSign - 1);
boost::trim(portString);
// confirm that the obtained port string is not an empty value
if (portString.empty()) {
WARNING << "Failed to obtain the \"Port\" value that is set to " << tempID;
}
else {
// attempt to convert the string to int
int workingPortNum = 0;
try {
workingPortNum = std::stoi(portString);
}
catch (...) {
WARNING << "Failed to convert the obtained \"Port\" value that is set to " << tempID;
}
if (workingPortNum != 0) {
// check if this port # is the same port # we are publishing data on
if (workingPortNum == this->port) {
ID = tempID;
break;
}
}
}
}
}
}
config.close();
if (ID.empty())
return false;
else
return true;
}
The goal of this method is to parse any text file for the ID portion, based on matching the port # that the application is publishing data to.
Format of the file is like this:
Idenntifications {
ID {
port = 1001
}
}
A separate Visual Studio project unit tests various methods, including this Project::DetermineID method:
#define STRINGIFY(x) #x
#define EXPAND(x) STRINGIFY(x)
TEST_CLASS(ProjectUnitTests) {
Project* parser;
std::string projectDirectory;
TEST_METHOD_INITIALIZE(ProjectUnitTestInitialization) {
projectDirectory = EXPAND(UNITTESTPRJ);
projectDirectory.erase(0, 1);
projectDirectory.erase(projectDirectory.size() - 2);
parser = Project::getClass(); // singleton method getter/initializer
}
// Other test methods are present and pass/fail accordingly
TEST_METHOD(DetermineID) {
std::string ID = "";
bool x = parser->DetermineID(projectDirectory + "normal.cfg", ID);
Assert::IsTrue(x);
}
};
Now, when I run the tests, DetermineID fails and the stack trace states:
DetermineID
Source: Project Tests.cpp line 86
Duration: 2 sec
Message:
Assert failed
Stack Trace:
ProjectUnitTests::DetermineID() line 91
Now, in my test .cpp file, TEST_METHOD(DetermineID) { is present on line 86. But that method's } is located on line 91, as the stack trace indicates.
And, when debugging, the unit test passes, because the return of x in the TEST_METHOD is true.
Only when running the test individually or running all tests does that test method fail.
Some notes that may be relevant:
This is a single-threaded application with no tasks scheduled (no race condition to worry about supposedly)
There is another method in the Project class that also processes a file with an std::ifstream same as this method does
That method has its own test method that has been written and passes without any problems
That test method also accesses the "normal.cfg" file
Yes, this->port has an assigned value
Thus, my questions are:
Why does the stack trace reference the closing bracket of the test method instead of the single Assert within the method that is supposedly failing?
How can I get the unit test to pass when it is run? (It currently passes only during debugging, where I can confirm that x is true.)
If the issue is a race condition where perhaps the other test method is accessing the "normal.cfg" file, why does the test method fail even when it is run individually?
Any support/assistance here is very much appreciated. Thank you!

C++ Bad access when assigning an element to map value

So the question explains the problem...
Background:
I'm trying to solve this problem from HackerRank.
It's basically an html tag parser. Valid input guaranteed, attributes are strings only.
My Approach
I created a custom Tag class that can store a map<string,Tag> of other Tag's, as well as a map<string,string> of attributes. The parsing seems to be working correctly.
The Problem
During the querying part, I get a BAD_ACCESS error on the following query/html combo:
4 1
<a value = "GoodVal">
<b value = "BadVal" size = "10">
</b>
</a>
a.b~size
The error occurs when I try to access the b Tag from a. Specifically, it's in the t=t.tags[tag_name], Line 118 below.
Code
#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <map>
#include <stack>
using namespace std;
class Tag {
public:
Tag(){};
Tag(string name):name(name){};
string name;
map<string,Tag> tags = map<string, Tag>();
map<string,string> attribs=map<string,string>();
};
int main() {
int lines, queries;
std::cin>>lines>>queries;
std::string str;
getline(cin, str);
stack<string> open;
auto tags = map<string, Tag>();
for (int i = 0; i < lines; i++) {
getline(cin, str);
if (str.length()>1){
// If it's not </tag>, then it's an opening tag
if (str[1] != '/') {
// Parse tag name
auto wordidx = str.find(" ");
if (wordidx == -1) {
wordidx = str.length()-1.f;
}
string name = str.substr(1,wordidx-1);
auto t = Tag(name);
string sub = str.substr(wordidx);
auto equalidx=sub.find("=");
// Parse Attributes
while (equalidx != std::string::npos) {
string key = sub.substr(1,equalidx-2);
sub = sub.substr(equalidx);
auto attrib_start = sub.find("\"");
sub = sub.substr(attrib_start+1);
auto attrib_end = sub.find("\"");
string val = sub.substr(0, attrib_end);
sub = sub.substr(attrib_end+1);
t.attribs[key] = val;
equalidx=sub.find("=");
}
// If we're in a tag, push to that, else push to the base tags
if (open.size() == 0) {
tags[name] = t;
} else {
tags[open.top()].tags[name]=t;
}
open.push(name);
} else {
// Pop the stack if we reached a closing tag
auto wordidx = str.find(">");
string name = str.substr(2,wordidx-2);
// Sanity check, but we're assuming valid input
if (name.compare(open.top())) {
cout<<"FUCK"<<name<<open.top()<<endl;
return 9;
}
open.pop();
}
} else {
std::cout<<"FUCK\n";
}
}
//
// Parse in queries
//
for (int i = 0; i < queries; i++) {
getline(cin, str);
Tag t = Tag();
bool defined = false;
auto next_dot = str.find(".");
while (next_dot!=string::npos) {
string name = str.substr(0,next_dot);
if (defined && t.tags.find(name) == t.tags.end()) {
//TAG NOT IN T
cout<<"Not Found!"<<endl;
continue;
}
t = !defined ? tags[name] : t.tags[name];
defined = true;
str = str.substr(next_dot+1);
next_dot = str.find(".");
}
auto splitter = str.find("~");
string tag_name = str.substr(0,splitter);
string attrib_name = str.substr(splitter+1);
if (!defined) {
t = tags[tag_name];
} else if (t.tags.find(tag_name) == t.tags.end()) {
//TAG NOT IN T
cout<<"Not Found!"<<endl;
continue;
} else {
t = t.tags[tag_name];
}
// T is now set, check the attribute
if (t.attribs.find(attrib_name) == t.attribs.end()) {
cout<<"Not Found!"<<endl;
} else {
cout<<t.attribs[attrib_name]<<endl;
}
}
return 0;
}
What I've tried
This is fixed by defining Tag x = t.tags[tag_name]; as a new variable on the line above and then doing t = x;, but why is this even happening?
Also, the following query then fails as well: a.b.c~height. It fails on Line 99, when it tries to get a.tags["b"]. No idea why. I was going to just go with the hacky fix above, but this seems like a big core issue that I'm getting wrong.
I would suggest running this on an IDE and verifying that the parsing is indeed correct.
t=t.tags[tag_name]
This expression is unsafe because you are copy-assigning an object that is owned by t over t itself: the source of the assignment lives inside the object being overwritten.
Consider what happens on this line:
The map lookup is performed and returns a Tag&.
You try to copy-assign this to t, invoking the implicit copy-assignment operator.
This operator copy-assigns t.tags from the tags attribute of the copy source -- which lives in t.tags.
The result is that the object you're copying into t is destroyed in the middle of that copy. This causes undefined behavior, and an immediate crash is honestly the best possible outcome as it told you exactly where the problem was. (This kind of problem frequently manifests at some point later in the program, at which point you've lost the state necessary to figure out what caused the UB.)
One workaround would be to move the source object into a temporary and then move-assign that temporary over t:
t = Tag{std::move(t.tags[tag_name])};
This lifts the data we want to assign to t out of t before we try to put it in t. Then, when t's assignment operator goes to replace t.tags, the data you're trying to assign to t doesn't live there anymore.
However, this overall approach involves a lot of unnecessary copying. It would be better to declare t as Tag const *t; instead -- have it be a pointer to a tag. Then you can just move that pointer around to point at other tags in your data structure without making copies.
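The pointer idea can be sketched as follows. This is only an illustration under assumptions: Tag is reduced to the two maps the lookup needs, and query is a hypothetical helper that takes a whole query string such as "a.b~size". Because the loop only moves pointers, no Tag is ever copied or assigned, so the self-assignment problem cannot arise.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Same shape as the question's Tag class, reduced to what the lookup needs.
struct Tag {
    std::map<std::string, Tag> tags;
    std::map<std::string, std::string> attribs;
};

// Resolve a query such as "a.b~size" by walking the tree with pointers.
// Returns "Not Found!" if any tag on the path or the attribute is missing.
std::string query(const std::map<std::string, Tag>& roots, const std::string& q) {
    const std::string notFound = "Not Found!";
    const std::size_t tilde = q.find('~');
    if (tilde == std::string::npos) return notFound;
    const std::string path = q.substr(0, tilde);
    const std::string attrib = q.substr(tilde + 1);

    const std::map<std::string, Tag>* level = &roots;  // children to search next
    const Tag* current = nullptr;                      // tag reached so far
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = path.find('.', start);
        const std::string name = (dot == std::string::npos)
            ? path.substr(start)
            : path.substr(start, dot - start);
        const auto it = level->find(name);
        if (it == level->end()) return notFound;
        current = &it->second;     // just re-point, never copy
        level = &current->tags;
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    const auto a = current->attribs.find(attrib);
    return a == current->attribs.end() ? notFound : a->second;
}
```

Using find() instead of operator[] also keeps the lookup from inserting empty tags into the maps, which is what allows everything here to be const.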
Side note: I just did this problem the other day! Here's a hint that might help you simplify things: do you actually need a structure of tags? Is there a simpler type of lookup structure that would work instead of nested tags?

extract domain between two words

I have in a log file some lines like this:
11-test.domain1.com Logged ...
37-user1.users.domain2.org Logged ...
48-me.server.domain3.net Logged ...
How can I extract each domain without the subdomains? Something between "-" and "Logged".
I have the following code in C++ (Linux), but it doesn't extract the domain correctly. A function that returns the extracted string would be great, if you have an example.
regex_t preg;
regmatch_t mtch[1];
size_t rm, nmatch;
char tempstr[1024] = "";
int start;
rm=regcomp(&preg, "-[^<]+Logged", REG_EXTENDED);
nmatch = 1;
while(regexec(&preg, buffer+start, nmatch, mtch, 0)==0) /* Found a match */
{
strncpy(host, buffer+start+mtch[0].rm_so+3, mtch[0].rm_eo-mtch[0].rm_so-7);
printf("%s\n", tempstr);
start +=mtch[0].rm_eo;
memset(host, '\0', strlen(host));
}
regfree(&preg);
Thank you!
P.S. no, I cannot use perl for this because this part is inside of a larger c program which was made by someone else.
EDIT:
I replace the code with this one:
const char *p1 = strstr(buffer, "-")+1;
const char *p2 = strstr(p1, " Logged");
size_t len = p2-p1;
char *res = (char*)malloc(sizeof(char)*(len+1));
strncpy(res, p1, len);
res[len] = '\0';
which extracts the whole domain, including subdomains, very well.
How can I extract just domain.com or domain.net from abc.def.domain.com?
Is strtok a good option, and how can I work out which dot is the last one?
#include <algorithm>
#include <iostream>
#include <vector>
#include <string>
#include <boost/regex.hpp>
int main()
{
boost::regex re(".+-(?<domain>.+)\\s*Logged");
std::string examples[] =
{
"11-test.domain1.com Logged ...",
"37-user1.users.domain2.org Logged ..."
};
std::vector<std::string> vec(examples, examples + sizeof(examples) / sizeof(*examples));
std::for_each(vec.begin(), vec.end(), [&re](const std::string& s)
{
boost::smatch match;
if (boost::regex_search(s, match, re))
{
std::cout << match["domain"] << std::endl;
}
});
}
http://liveworkspace.org/code/1983494e6e9e884b7e539690ebf98eb5
something like this with boost::regex. Don't know about pcre.
Is the log in a standard format?
It appears so. Is there a split function?
Edit:
Here is some logic.
Iterate through each line to be parsed.
Find the index of the first occurrence of "-".
Next, find the index of "Logged"; the text between the two indexes is the host name.
Now you have the full domain.
Once you have the full domain, split it on "." into a container of your choice (I used an array).
With the array broken apart, take the last two elements and concatenate them to capture only the domain.
NOTE Written in C#
Main method which defines the first value and the second value
`static void Main(string[] args)
{
string firstValue ="-";
string secondValue = "Logged";
List<string> domains = new List<string> { "11-test.domain1.com Logged", "37-user1.users.domain2.org Logged","48-me.server.domain3.net Logged"};
foreach (string dns in domains)
{
Debug.WriteLine(Utility.GetStringBetweenFirstAndSecond(dns, firstValue, secondValue));
}
}
`
Method to parse the string:
`public string GetStringBetweenFirstAndSecond(string str, string firstStringToFind, string secondStringToFind)
{
string domain = string.Empty;
if(string.IsNullOrEmpty(str))
{
//throw an exception, return gracefully, whatever you determine
}
else
{
//This can all be done in one line, but I broke it apart so it can be better understood.
//returns the first occurrance.
//int start = str.IndexOf(firstStringToFind) + 1;
//int end = str.IndexOf(secondStringToFind);
//domain = str.Substring(start, end - start);
//i.e. Definitely not quite as legible, but doesn't create object unnecessarily
domain = str.Substring((str.IndexOf(firstStringToFind) + 1), str.IndexOf(secondStringToFind) - (str.IndexOf(firstStringToFind) + 1));
string[] dArray = domain.Split('.');
if (dArray.Length > 0)
{
if (dArray.Length > 2)
{
domain = string.Format("{0}.{1}", dArray[dArray.Length - 2], dArray[dArray.Length - 1]);
}
}
}
return domain;
}
`
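Since the question is about C++, the same logic can be sketched with std::string alone. This is a rough illustration, not a general solution: extractDomain is a hypothetical name, and it assumes the wanted domain is always the last two dot-separated labels, which is wrong for suffixes such as co.uk.

```cpp
#include <cstddef>
#include <string>

// Return the last two labels of the host found between the first '-' and
// " Logged", e.g. "domain1.com". Returns "" if either marker is missing.
std::string extractDomain(const std::string& line) {
    const std::size_t dash = line.find('-');
    if (dash == std::string::npos) return "";
    const std::size_t end = line.find(" Logged", dash + 1);
    if (end == std::string::npos) return "";
    const std::string host = line.substr(dash + 1, end - dash - 1);

    // Locate the last dot, then the dot before it.
    const std::size_t lastDot = host.rfind('.');
    if (lastDot == std::string::npos || lastDot == 0) return host;
    const std::size_t prevDot = host.rfind('.', lastDot - 1);
    // With fewer than two dots the host already is the bare domain.
    return prevDot == std::string::npos ? host : host.substr(prevDot + 1);
}
```

Calling rfind twice answers the "which is the last dot" part directly and avoids strtok, which would modify the buffer it scans.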

RapidXML, reading and saving values

I've worked myself through the rapidXML sources and managed to read some values. Now I want to change them and save them to my XML file:
Parsing file and set a pointer
void SettingsHandler::getConfigFile() {
pcSourceConfig = parsing->readFileInChar(CONF);
cfg.parse<0>(pcSourceConfig);
}
Reading values from XML
void SettingsHandler::getDefinitions() {
SettingsHandler::getConfigFile();
stGeneral = cfg.first_node("settings")->value();
/* stGeneral = 60 */
}
Changing values and saving to file
void SettingsHandler::setDefinitions() {
SettingsHandler::getConfigFile();
stGeneral = "10";
cfg.first_node("settings")->value(stGeneral.c_str());
std::stringstream sStream;
sStream << *cfg.first_node();
std::ofstream ofFileToWrite;
ofFileToWrite.open(CONF, std::ios::trunc);
ofFileToWrite << "<?xml version=\"1.0\"?>\n" << sStream.str() << '\0';
ofFileToWrite.close();
}
Reading file into buffer
char* Parser::readFileInChar(const char* p_pccFile) {
char* cpBuffer;
size_t sSize;
std::ifstream ifFileToRead;
ifFileToRead.open(p_pccFile, std::ios::binary);
sSize = Parser::getFileLength(&ifFileToRead);
cpBuffer = new char[sSize];
ifFileToRead.read( cpBuffer, sSize);
ifFileToRead.close();
return cpBuffer;
}
However, it's not possible to save the new value. My code is just saving the original file with a value of "60" where it should be "10".
Rgds
Layne
I think this is a RapidXML gotcha.
Try parsing with the parse_no_data_nodes flag, i.e. cfg.parse<rapidxml::parse_no_data_nodes>(pcSourceConfig) instead of cfg.parse<0>(pcSourceConfig).
You should definitely be testing that the output file opened correctly and that your write succeeded. At the simplest, you need something like:
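if ( ! ofFileToWrite.is_open() ) {
throw "open failed";
}
// the parentheses matter: without them, ! applies to the stream before <<
if ( ! ( ofFileToWrite << "<?xml version=\"1.0\"?>\n"
<< sStream.str() << '\0' ) ) {
throw "write failed";
}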
Note that you don't need the '\0' terminator, but it shouldn't do any harm.
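Two related checks can be sketched with the standard library alone. rapidxml's parse() expects a zero-terminated buffer, while the readFileInChar in the question allocates exactly the file size, so the first helper appends the terminator. The second helper reports whether the write really succeeded. Both names (readZeroTerminated, writeTextFile) are hypothetical and not part of rapidxml.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file into a buffer that ends with '\0'.
// Returns an empty vector if the file cannot be opened.
std::vector<char> readZeroTerminated(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    std::vector<char> buffer((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
    buffer.push_back('\0');  // terminator required by the parser
    return buffer;
}

// Replace the contents of `path` with `text`; report whether it worked.
bool writeTextFile(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::trunc);
    if (!out) return false;  // could not open the file for writing
    out << text;
    out.flush();
    return static_cast<bool>(out);  // false if the write or flush failed
}
```

The vector returned by the first helper has to stay alive for as long as the xml_document is in use, because rapidxml keeps pointers into the buffer it parsed; buffer.data() is what would be passed to parse().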
Use the following method to add an attribute to a node. The method uses the allocation of memory for strings from rapidxml. So rapidxml takes care of the strings as long as the document is alive. See http://rapidxml.sourceforge.net/manual.html#namespacerapidxml_1modifying_dom_tree for further information.
void setStringAttribute(
xml_document<>& doc, xml_node<>* node,
const string& attributeName, const string& attributeValue)
{
// allocate memory assigned to document for attribute value
char* rapidAttributeValue = doc.allocate_string(attributeValue.c_str());
// search for the attribute at the given node
xml_attribute<>* attr = node->first_attribute(attributeName.c_str());
if (attr != 0) { // attribute already exists
// only change value of existing attribute
attr->value(rapidAttributeValue);
} else { // attribute does not exist
// allocate memory assigned to document for attribute name
char* rapidAttributeName = doc.allocate_string(attributeName.c_str());
// create a new attribute with the given name and value
attr = doc.allocate_attribute(rapidAttributeName, rapidAttributeValue);
// append attribute to node
node->append_attribute(attr);
}
}