How often does boost::property_tree parse a property file?

I need to read properties from a file to affect program behavior. Looks like boost::property_tree will do quite nicely. But I'm wondering whether, when fetching different kinds of values, the library might read the file multiple times?
For performance reasons I'd like it to be read only once. Most of the properties will be simple values like numbers and strings, but occasionally there will be lists of numbers and lists of strings.
I figure it parses the file only once, but you never know, hence the question.
Thanks.

You control it, and it reads only once:
//
// All that is required to read an XML file into the tree
//
#include <fstream>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>

std::ifstream in("region.xml");
boost::property_tree::ptree result_tree;
boost::property_tree::read_xml(in, result_tree);
Include the correct header and read INI, JSON, or XML files into the tree using the corresponding read_XXX function (read_ini, read_json, read_xml). You then use a "path" to access elements from the tree, or you can iterate over subtrees:
#include <boost/foreach.hpp>

BOOST_FOREACH(boost::property_tree::ptree::value_type &v, result_tree.get_child("gpx.rte"))
{
    if (v.first == "rtept") // current node/element name
    {
        boost::property_tree::ptree subtree = v.second;
        // A path to the element is required to access the value
        const int lat = subtree.get<double>("<xmlattr>.lat") * 10000.0;
        const int lon = subtree.get<double>("<xmlattr>.lon") * 10000.0;
    }
}
or direct access via a path:
// Here is simplistic access of the data, again rewritten for flexibility and
// exception safety in real code (nodes shortened too)
const int distVal =
    result_tree.get<int>("Envelope.Body.MatrixResponse.Matrix.Route.<xmlattr>.distance") / 1000;

Every time you want to read the properties, you can first check whether the file has changed; if it has, read it again.
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

const char filename[] = "test.txt"; /* note: stat() does not expand "~" */
char timeStr[100] = "";
struct stat buf;
static time_t last_read = 0; /* modification time seen at the previous read */

if (!stat(filename, &buf)) {
    strftime(timeStr, sizeof timeStr, "%d-%m-%Y %H:%M:%S", localtime(&buf.st_mtime));
    printf("Last modified date and time = %s\n", timeStr);
}
else {
    printf("error getting mtime\n");
}

time_t ltime;
time(&ltime);
strftime(timeStr, sizeof timeStr, "%d-%m-%Y %H:%M:%S", localtime(&ltime));
printf("The current date and time is %s\n", timeStr);

if (difftime(buf.st_mtime, last_read) > 0) {
    /* The file was modified after our last read: re-read and re-parse it. */
    last_read = buf.st_mtime;
}

Related

C++ Read in file element by element, but executing functions every line

I have a file that I need to read in. Each line of the file is exceedingly long, so I'd rather not read each line into a temporary string and then manipulate those strings (unless this isn't actually inefficient - I could be wrong). Each line of the file contains a string of triplets - two numbers and a complex number, separated by colons (as opposed to the comma, which is used inside the complex number). My current code goes something like this:
while (states.eof() == 0)
{
    std::istringstream complexString;
    getline(states, tmp_str, ':');
    tmp_triplet.row() = stoi(tmp_str);
    getline(states, tmp_str, ':');
    tmp_triplet.col() = stoi(tmp_str);
    getline(states, tmp_str, ':');
    complexString.str(tmp_str);
    complexString >> tmp_triplet.value();
    // Then something useful done with the triplet before moving onto the next one
}
tmp_triplet is a variable that stores these three numbers. I want some way to run a function every line (specifically, the triplets in every line are pushed into a vector, and each line in the file denotes a different vector). I'm sure there's an easy way to go about this, but I just want a way to check whether the end of the line has been reached, and to run a function when this is the case.
When trying to plan stuff out, abstraction can be your best friend. If you break down what you want to do by abstract functionality, you can more easily decide which data types should be used and how they should be laid out, and often you'll find that some functions almost write themselves. Typically, your code will also be more modular (almost by definition), which makes it easier to reuse, maintain, and adapt if future changes are needed.
For example, it sounds like you want to parse a file. So that should be a function.
To do that, you want to read in the file lines and then process them. So you can make two functions, one for each of those actions, and just call them.
To read in the file lines you just want to take a file stream and return a collection of strings, one per line.
To process the file lines you want to take a collection of strings and parse each string into a triplet value. So you can create a method that takes a string and breaks it into a triplet, and just use that method here.
To parse a string you just need to take the first part as the row, the second part as the column, and the third part as the value.
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct TripletValue
{
    int Row;
    int Col;
    int Val;
};

// Forward declarations so ParseFile can be presented top-down
std::vector<std::string> ReadFileLines(std::istream& inputStream);
std::vector<TripletValue> GetValuesFromData(const std::vector<std::string>& data);
TripletValue ParseLine(const std::string& fileLine);

std::vector<TripletValue> ParseFile(std::istream& inputStream)
{
    std::vector<std::string> fileLines = ReadFileLines(inputStream);
    std::vector<TripletValue> parsedValues = GetValuesFromData(fileLines);
    return parsedValues;
}

std::vector<std::string> ReadFileLines(std::istream& inputStream)
{
    std::vector<std::string> fileLines;
    std::string fileLine;
    // Loop on getline itself rather than on eof(), so a failed read
    // does not push a bogus final entry
    while (std::getline(inputStream, fileLine))
    {
        fileLines.push_back(fileLine);
    }
    return fileLines;
}

std::vector<TripletValue> GetValuesFromData(const std::vector<std::string>& data)
{
    std::vector<TripletValue> values;
    for (std::size_t i = 0; i < data.size(); i++)
    {
        TripletValue parsedValue = ParseLine(data[i]);
        values.push_back(parsedValue);
    }
    return values;
}

TripletValue ParseLine(const std::string& fileLine)
{
    std::stringstream sstream;
    sstream << fileLine;
    TripletValue parsedValue;
    std::string strValue;
    sstream >> strValue;
    parsedValue.Row = std::stoi(strValue);
    sstream >> strValue;
    parsedValue.Col = std::stoi(strValue);
    sstream >> strValue;
    parsedValue.Val = std::stoi(strValue);
    return parsedValue;
}
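The question's lines are actually colon-delimited with a complex number as the third field, so here is a hedged variant of ParseLine closer to that format (my sketch, not part of the original answer; the struct and function names are my inventions):
#include <complex>
#include <sstream>
#include <string>

struct ComplexTriplet
{
    int Row;
    int Col;
    std::complex<double> Val;
};

ComplexTriplet ParseComplexLine(const std::string& fileLine)
{
    std::istringstream sstream(fileLine);
    std::string field;
    ComplexTriplet t;
    std::getline(sstream, field, ':');
    t.Row = std::stoi(field);
    std::getline(sstream, field, ':');
    t.Col = std::stoi(field);
    std::getline(sstream, field, ':');
    std::istringstream valueStream(field);
    valueStream >> t.Val; // std::complex input accepts "re", "(re)" or "(re,im)"
    return t;
}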

parse huge csv file with C++

In order to simulate my network I am using a trace file (a CSV file) with a size between 5 and 30 GB.
The file is row-based, where each row contains multiple fields delimited by a space, together forming the information for one network packet:
3 53 4 12 1 1 2 6
Since the file's size could reach several GBs (millions of lines), is it better to divide it into small chunks (myfile00.csv, myfile01.csv, ...), or can I process the entire file from the hard drive without loading it into memory?
I want to read the file line by line at a specific time, which is the clock cycle of the simulation, and get all the information in the line to create an OMNeT++ message.
packet MyTrace::getpacket() {
    int id;                   // first field
    int cycle;                // second field
    int source;               // third field
    int destination;          // fourth field
    int numberofDep;          // fifth field
    std::list<int> listofDep; // remaining fields
    if (traceFile.is_open()) {
        // get id
        // get cycle
        // ....
    }
}
Any suggestion would be helpful.
EDIT:
string line;
ifstream myfile("BlackSmall.csv");
int currentline = 0;
if (myfile.is_open())
{
    while (getline(myfile, line)) {
        istringstream ss(line);
        string request;
        int id, cycle, source, dest, srcType, destType, packetSize, dependency;
        int listdep;
        std::list<int> dep;
        ss >> id;
        ss >> cycle;
        ss >> source;
        ss >> dest;
        ss >> request;
        ss >> srcType;
        ss >> destType;
        ss >> packetSize;
        ss >> dependency;
        while (ss >> listdep) dep.push_back(listdep);
        // Create my packet
    }
    myfile.close();
}
else cout << "Unable to open file";
With the above code, I can get all the information I need from a line.
The problem is that I need to use this code inside a class which, when called, returns just one line's information. Is there a way to point to a specific line when I call this class?
It seems like your application requires a single sequential pass through the input, so processing a file that is 1 GB or 100 GB is perhaps just a matter of patience and perhaps parallelism.
The approach should be to translate records line by line. You should avoid strategies that attempt to read the entire file into memory. The standard library offers the easy-to-use std::ifstream class together with std::getline, which returns a std::string containing the line to be converted.
If you are feeling more ambitious and want to control the amount of data read or buffered more carefully, then you would not be the first developer to roll your own code to implement a buffered reader. This is a fairly empowering exercise and will help you think through some corner cases with reading partial lines and such. But in the end, it probably will not give you a significant boost toward your goal. I suspect the ifstream approach will get you up and running without the hassle, and it will not ultimately be the bottleneck in processing these files.
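For illustration, here is a rough sketch of what such a buffered reader might look like (my code, not a tested library; the chunk size is arbitrary):
#include <istream>
#include <string>

// Reads fixed-size chunks and hands out complete lines, carrying a
// partial line over between chunks - one of the corner cases mentioned above.
class BufferedLineReader {
public:
    explicit BufferedLineReader(std::istream& in) : in_(in) {}

    bool next(std::string& line) {
        std::string::size_type pos;
        while ((pos = carry_.find('\n')) == std::string::npos) {
            char chunk[4096];
            in_.read(chunk, sizeof chunk);
            std::streamsize got = in_.gcount();
            if (got == 0) { // EOF: flush any final partial line
                if (carry_.empty()) return false;
                line.swap(carry_);
                carry_.clear();
                return true;
            }
            carry_.append(chunk, static_cast<std::string::size_type>(got));
        }
        line = carry_.substr(0, pos);
        carry_.erase(0, pos + 1);
        return true;
    }

private:
    std::istream& in_;
    std::string carry_;
};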
If you were really concerned about optimizing execution time then having multiple files might help you launch parallel processing tasks.
// define a class to hold your custom record
class Record {
    // fields for the parsed values go here
};

// create a parser function to convert a line of text into the record
bool parse(std::string const &line, Record &record) {
    // fill record from line; return false on malformed input
    return true;
}

// create a translator method to convert a record into the desired output
bool write(Record const &record, std::ofstream &os) {
    // emit the record to os; return false on failure
    return true;
}

// actually open streams for the input and output files (names are placeholders)
std::ifstream is("input.csv");
std::ofstream os("output.dat");
std::string line;
while (std::getline(is, line)) {
    Record record;
    if (!parse(line, record)) break;
    if (!write(record, os)) break;
}
You can re-use the Record instance by moving it outside the while loop, so long as you are careful to reset the variable so that information from preceding records does not taint the current record. You can also dive head first into the C++ ecosystem by producing stream input and output operators ("<<", ">>"), but I personally find this approach to be more confusing than it is worth.
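A short sketch of that reuse, using the skeleton above (Record::clear() is a hypothetical reset method you would add to Record):
Record record;
std::string line;
while (std::getline(is, line)) {
    record.clear(); // reset so fields from the previous line cannot leak through
    if (!parse(line, record)) break;
    if (!write(record, os)) break;
}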
Perhaps the best approach for you would be to import your CSV file into an SQLite database.
Once you import it and add some indexes, you can easily and very efficiently query the necessary rows from that database. SQLite has lots of ready-to-use C/C++ client libraries available; you can start with the default one at https://www.sqlite.org/cintro.html.
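For illustration, a minimal sketch of pushing parsed rows into SQLite through the C API documented at that link (the database, table, and column names are my inventions, and error handling is mostly omitted):
#include <sqlite3.h>

int main()
{
    sqlite3 *db = 0;
    if (sqlite3_open("trace.db", &db) != SQLITE_OK) return 1;

    sqlite3_exec(db,
                 "CREATE TABLE IF NOT EXISTS packets("
                 "id INTEGER, cycle INTEGER, source INTEGER, dest INTEGER);",
                 0, 0, 0);

    // Prepared statement: bind one parsed CSV row per iteration
    sqlite3_stmt *stmt = 0;
    sqlite3_prepare_v2(db, "INSERT INTO packets VALUES(?,?,?,?);", -1, &stmt, 0);
    sqlite3_bind_int(stmt, 1, 3);  // id     (sample values from the question)
    sqlite3_bind_int(stmt, 2, 53); // cycle
    sqlite3_bind_int(stmt, 3, 4);  // source
    sqlite3_bind_int(stmt, 4, 12); // dest
    sqlite3_step(stmt);
    sqlite3_finalize(stmt);

    sqlite3_close(db);
    return 0;
}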

Clarification required regarding Arrays, Vectors and Maps in usage of a C++ Application

I want to know the right algorithm and container class for my application. I am trying to build a client-server communication system where the server contains a group of files (.txt). The file structure (prototype) is like:
A|B|C|D....|Z$(some integer value)#(some integer value)
Again, the contents of A to Z are a1_a2_a3_a4......aN|b1_b2_b3_b4......bN|......|z1_z2_z3_z4.....zN. So what I want to do is: when the server application has started, it has to load these files one by one, save the contents of each file in a container class, and then split the contents into particular variables based on the delimiters, i.e.
for (int i = 0; i < (Number of files); i++)
{
    1) Load file[i] into Container class[i]
    2) Read Container class[i], searching for occurrences of the delimiters "_" and "|"
    3) Until the next "|" occurs, save the values separated by "_" to an array or variable (save it in a buffer)
    4) Do this till the file length completes or EOF is reached
    5) Then read the next file, save it in Container class[i+1], and follow steps 2), 3) and 4)
}
I want to know whether a vector or a map suits my requirement, as I need to search for occurrences of delimiters, push_back the values, and access them when the need arises.
Can I read a whole file as one block and work on that buffer, or should I push values onto a stack while reading the file with seekg? Which will be better and easier to implement? And what are the possibilities of using regex?
Given the format of the input and its size, I'd suggest doing something along these lines for reading and parsing it:
#include <cassert>
#include <istream>
#include <string>
#include <utility>
#include <vector>

void ParseOneFile (std::istream & inp)
{
    std::vector<std::vector<std::string>> data;
    int some_int_1 = 0, some_int_2 = 0;
    std::string temp;
    data.push_back ({});
    while (true)
    {
        int c = inp.get();
        if (std::istream::traits_type::eof() == c)
            break; // malformed input: ran out of data before the '$'
        if ('$' == c)
        {
            data.back().emplace_back (std::move(temp));
            break;
        }
        else if ('|' == c)
        {
            data.back().emplace_back (std::move(temp));
            temp.clear(); // a moved-from string is left in an unspecified state
            data.push_back ({});
        }
        else if ('_' == c)
        {
            data.back().emplace_back (std::move(temp));
            temp.clear();
        }
        else
            temp += char(c);
    }
    char sharp;
    inp >> some_int_1 >> sharp >> some_int_2;
    assert ('#' == sharp);
    // Here, you have your data and your two integers...
}
The above function does not return the information it extracts, so you will want to change that. But it does read one of your files into a vector of vectors of strings called data, plus two integers (some_int_1 and some_int_2). It uses C++11 and does this reading and parsing quite efficiently, both in terms of processing and memory.
Note that the above code does not check for errors or inconsistent formatting in the input file.
Now, for your data structure problem. Since I have no idea about the nature of your data, I can't say for sure; all I can say is that a two-dimensional array with two integers on the side feels like a natural fit. Since you have several files, you can store them all in another dimension of vector (or perhaps in a map, mapping a file name to a data structure) like the following:
struct OneFile
{
    std::vector<std::vector<std::string>> data;
    int i1, i2;
};

std::vector<OneFile> all_files;
// or...
// std::map<std::string, OneFile> all_files;
The above function would fill one instance of the OneFile struct above.
As an example, all_files[0].data[0][0] will be a string referring to data item A0 in the first file, and all_files[7].data[25][3] will be another string referring to data item Z3 in the 8th file.

How can ofstream write NULL to a file in binary mode?

I am maintaining a C++ method with which one of my clients is hitting an issue. The method is supposed to write out a series of identifiers to a file, delimited by new lines. However, on their machine the method is somehow writing a series of NULs out to the file; opening the file in a binary editor shows that it contains all zeros.
I can't understand why this is happening. I've tried assigning empty strings and strings with the first character set to 0. There is no problem creating the file, just writing the identifiers to it.
Here is the method:
void writeIdentifiers(std::vector<std::string> IDs, std::string filename)
{
    std::ofstream out(filename.c_str(), std::ofstream::binary);
    if (out.is_open())
    {
        for (std::vector<std::string>::iterator it = IDs.begin();
             it != IDs.end();
             it++)
        {
            out << *it << "\n";
        }
    }
    out.close();
}
My questions: is there any possible input you can provide that method which will create a file which has NULL values in it?
Yeah, the following code quite clearly writes a series of NULL bytes:
std::vector<std::string> ids;
std::string nullstring;
nullstring.assign("\0\0\0\0\0\0\0\0\0\0", 10);
ids.push_back(nullstring);
writeIdentifiers(ids, "test.dat");
Because the std::string container stores the string length explicitly, it can't necessarily be used in the same way as an ordinary C (null-terminated) string. Here, I assign a string containing 10 NUL bytes; those are then output because the string length is 10.
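A quick illustration of that difference (my snippet, not from the question's code):
#include <cstring>
#include <iostream>
#include <string>

int main()
{
    std::string s("ab\0cd", 5);                  // 5 bytes, one of them NUL
    std::cout << s.size() << "\n";               // prints 5
    std::cout << std::strlen(s.c_str()) << "\n"; // prints 2: the C view stops at the NUL
}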

Very large look up table C++ - can I avoid typing the whole thing out?

I am not a programmer, but am an engineer who needs to use C++ coding on this occasion, so sorry if this question is a little basic.
I need to use a look up table as I have some highly non-linear dynamics going on that I need to model. It consists of literally 1000 paired values, from a pair of (0.022815, 0.7) up to (6.9453, 21.85).
I don't want to have to type all these values out in my C code. The values are currently stored in Matlab. Can I read them from a .dat file or something similar?
I will have calculated a value and simply want the program to kick out the paired value.
Thanks,
Adam
You can't read something stored in Matlab directly, unless you want to write a parser for whatever format Matlab stores its data in. I'm not familiar with Matlab, but I would be very surprised if it didn't have a function to output this data to a file in some text format, which you could read and parse.
Assuming this is constant data, if it could output something along the lines of:
{ 0.022815, 0.7 },
...
{ 6.9453, 21.85 },
you could include it as the initializer of a table in C++. (It may look strange to have a #include in the middle of a variable definition, but it's perfectly legal, and in such cases perfectly justified.) Or just copy/paste it into your C++ program.
If you can't get exactly this format directly, it should be trivial to write a small script that converts whatever format you do get into this one.
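For illustration, a minimal sketch of the #include-in-the-initializer idea, assuming the conversion script writes the "{ x, y }," lines into a file called table_data.inc (the file name is my invention):
#include <cstddef>

const double table[][2] = {
#include "table_data.inc"
};
const std::size_t table_size = sizeof table / sizeof table[0];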
This program defines a map, reads pairs from an a.txt file, inserts them into the map, iterates over the map for whatever purpose you have, and finally writes the map back into a file.
Just a simple practice:
#include <fstream>
#include <iostream>
#include <map>
#include <utility>
using namespace std;

int main(){
    ifstream inFile("a.txt", ios::in);
    if (!inFile){
        cout << "unable to open";
        return 1;
    }
    // reading the file and inserting into a map
    map<double,double> mymap;
    double a, b;
    while (inFile >> a >> b){ // loop on the read itself, not on eof()
        mymap.insert(make_pair(a, b));
    }
    inFile.close(); // be sure to close the file
    // iterating over the map
    map<double,double>::iterator it;
    for (it = mymap.begin(); it != mymap.end(); it++){
        // (*it).first
        // (*it).second
    }
    // writing the map into a file
    ofstream outFile;
    outFile.open("a.txt", ios::out); // or ios::app if you want to append
    for (it = mymap.begin(); it != mymap.end(); it++){
        outFile << (*it).first << " - " << (*it).second << endl; // whatever!
    }
    outFile.close();
    return 0;
}
What I would do is as follows, as I think it is faster than repeatedly opening and closing a file. First of all, create a header file which contains all the data in an array. You could use a "replace all" (available in Notepad and the like) to replace the ( ) parentheses with { } braces. Later on you could even write a script that generates the header file from the Matlab file:
>> cat import_data.h
#define TBL_SIZE 4 // In your case it is 1000
const double table[TBL_SIZE][2] =
{
{ 0.022815, 0.7 },
{ 6.9453, 21.85 },
{ 4.666, 565.9},
{ 567.9, 34.6}
};
Now include this header in the main program to pull in the data:
>> cat lookup.c
#include <stdio.h>
#include "import_data.h"

double lookup(double key)
{
    int i = 0;
    for (; i < TBL_SIZE; i++) {
        if (table[i][0] == key) /* exact floating-point comparison */
            return table[i][1];
    }
    return -1; /* error */
}

int main() {
    printf("1. Value is %f\n", lookup(6.9453));
    printf("2. Value is %f\n", lookup(4.666));
    printf("3. Value is %f\n", lookup(4.6));
    return 0;
}
Yes, you can read them from the .dat file. The question is, what format is the .dat file? Once you know that, you want to use fopen/fread/fclose for C, or ifstream for C++ (or something similar).
The program still has to get those pairs from the file and load them into memory. You can loop through the lines in the file, parse the pairs, and shove them into a std::map.
Something like this:
#include <fstream>
#include <map>
#include <string>
...
ifstream infile("yourdatfile.dat");
std::string str;
std::map<double, double> m; // use appropriate type(s)
while (getline(infile, str)){
    // split str by comma or some delimiter and get the key, value
    // put key, value in m
}
// use m
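A hedged completion of that splitting step (my sketch, assuming each line looks like "0.022815, 0.7", i.e. comma-delimited):
#include <fstream>
#include <map>
#include <sstream>
#include <string>

int main()
{
    std::ifstream infile("yourdatfile.dat");
    std::map<double, double> m;
    std::string str;
    while (std::getline(infile, str)) {
        std::istringstream line(str);
        double key, value;
        char comma; // consumes the ',' between the two fields
        if (line >> key >> comma >> value)
            m[key] = value;
    }
    // look up the pair nearest a computed value x with m.lower_bound(x), etc.
}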
For the Signal Processing Toolbox you can export data to C header files directly from Matlab (I don't know if that's your particular case):
Matlab export to C header
Or maybe the following article could be of help:
Exporting/Importing Data To/From MATLAB
One option is to generate the C++ lookup table in Matlab: just write the table out to a text file (lookup.cpp), producing the C++ source...