Clarification required regarding arrays, vectors and maps in a C++ application

I want to know the right algorithm and container class for my application. I am building a client-server communication system in which the server holds a group of text files (.txt). The file structure (prototype) is:
A|B|C|D....|Z$(some integer value)#(some integer value). The contents of A to Z are in turn a1_a2_a3_a4......aN|b1_b2_b3_b4......bN|......|z1_z2_z3_z4.....zN. When the server application starts, it has to load these files one by one, save the contents of each file in a container class, and then split each file's contents into particular variables based on the delimiters, i.e.
for (int i = 0; i < (number of files); i++)
{
    1) Load file[i] into container class[i];
    2) Read container class[i], searching for occurrences of the delimiters "_" and "|"
    3) Until the next "|" occurs, save each value delimited by "_" to an array or variable (save it in a buffer)
    4) Repeat until the end of the file (EOF) is reached
    5) Read the next file, save it in container class[i+1], and follow steps 2), 3) and 4)
}
I want to know whether a vector or a map suits my requirement, as I need to search for occurrences of the delimiters, push_back the values, and access them when needed.
Can I read a whole file as one block and manipulate the buffer, or should I push values onto a stack while reading the file with seekg? Which is better and easier to implement? What are the possibilities of using regex?

Given the format of the input, and its size, I'd suggest doing something along these lines for reading and parsing it:
#include <cassert>
#include <istream>
#include <string>
#include <utility>
#include <vector>

void ParseOneFile (std::istream & inp)
{
    std::vector<std::vector<std::string>> data;
    int some_int_1 = 0, some_int_2 = 0;
    std::string temp;

    data.push_back ({});
    while (true)
    {
        int c = inp.get();
        if (std::istream::traits_type::eof() == c)
            break;                                      // malformed input: no '$' found
        if ('$' == c)
        {
            data.back().emplace_back (std::move(temp));
            break;
        }
        else if ('|' == c)                              // end of one letter's group
        {
            data.back().emplace_back (std::move(temp));
            data.push_back ({});
        }
        else if ('_' == c)                              // end of one value inside a group
            data.back().emplace_back (std::move(temp));
        else
            temp += char(c);
    }

    char sharp;
    inp >> some_int_1 >> sharp >> some_int_2;
    assert ('#' == sharp);
    // Here, you have your data and your two integers...
}
The above function does not return the information it extracts, so you will want to change that. But it does read one of your files into a vector of vectors of strings called data, plus two integers (some_int_1 and some_int_2). It uses C++11 and does this reading and parsing quite efficiently, both in terms of processing and memory.
Note that the above code does not check for errors or inconsistent formatting in the input file.
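A minimal, hypothetical driver for it might look like this (the file name is an assumption, and error handling is elided):

#include <fstream>

int main ()
{
    std::ifstream inp ("server_file_0.txt");   // hypothetical file name
    if (inp)
        ParseOneFile (inp);
}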
Now, for your data structure problem. Since I have no idea about the nature of your data, I can't say for sure. All I can say is that a two-dimensional array plus two integers on the side feels like a natural fit for this data. Since you have several files, you can store them all in another dimension of vector (or perhaps in a map, mapping a file name to a data structure) like the following:
struct OneFile
{
    vector<vector<string>> data;
    int i1, i2;
};

vector<OneFile> all_files;
// or...
// map<string, OneFile> all_files;
The above function would fill one instance of the OneFile struct above.
As an example, all_files[0].data[0][0] will be a string holding the first data item of A in the first file, and all_files[7].data[25][3] will be a string holding the fourth data item of Z in the 8th file.
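On the regex question from the original post: splitting can also be done with C++11's <regex>, e.g. with std::sregex_token_iterator, though for delimiters this simple the character loop above is likely faster. A sketch, assuming the record's payload ends at '$' (the function name is mine, for illustration):

#include <regex>
#include <string>
#include <vector>

// Split the part before '$' on '|', then split each group on '_'.
std::vector<std::vector<std::string>> SplitWithRegex (const std::string & record)
{
    std::vector<std::vector<std::string>> data;
    std::regex group_sep ("\\|");
    std::regex value_sep ("_");

    std::string body = record.substr (0, record.find ('$'));
    std::sregex_token_iterator end;
    for (std::sregex_token_iterator g (body.begin(), body.end(), group_sep, -1); g != end; ++g)
    {
        std::string group = *g;   // one A..Z group
        // -1 selects the text between matches, i.e. the values between the '_' delimiters
        data.emplace_back (std::sregex_token_iterator (group.begin(), group.end(), value_sep, -1),
                           std::sregex_token_iterator ());
    }
    return data;
}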

Related

C++: Read in a file element by element, but execute functions every line

I have a file that I need to read in. Each line of the file is exceedingly long, so I'd rather not read each line to a temporary string and then manipulate those strings (unless this isn't actually inefficient - I could be wrong). Each line of the file contains a string of triplets: two numbers and a complex number, separated by colons (as opposed to commas, which are used within the complex numbers). My current code goes something like this:
while (states.eof() == 0)
{
    std::istringstream complexString;
    getline(states, tmp_str, ':');
    tmp_triplet.row() = stoi(tmp_str);
    getline(states, tmp_str, ':');
    tmp_triplet.col() = stoi(tmp_str);
    getline(states, tmp_str, ':');
    complexString.str(tmp_str);
    complexString >> tmp_triplet.value();
    // Then something useful is done with the triplet before moving on to the next one
}
tmp_triplet is a variable that stores these three numbers. I want some way to run a function every line (specifically, the triplets in every line are pushed into a vector, and each line in the file denotes a different vector). I'm sure there's an easy way to go about this, but I just want a way to check whether the end of the line has been reached, and to run a function when this is the case.
When trying to plan stuff out, abstraction can be your best friend. If you break down what you want to do by abstract functionality, you can more easily decide which data types should be used and how they should be laid out, and often you'll find that some functions almost write themselves. Typically your code will also be more modular (almost by definition), which makes it easier to reuse, maintain, and adapt if future changes are needed.
For example, it sounds like you want to parse a file. So that should be a function.
To do that function, you want to read in the file lines then process the file lines. So you can make two functions, one for each of those actions, and just call the functions.
To read in file lines you just want to take a file stream, and return a collection of strings for each line.
To process file lines you want to take a collection of strings and for each one parse the string into a triplet value. So you can create a method that takes a string and breaks it into a triplet, and just use that method here.
To process a string you just need to take a string and assign the first part as the row, the second part as the column, and the third part as the value.
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct TripletValue
{
    int Row;
    int Col;
    int Val;   // simplified to int here; the real data would hold a complex value
};

// Forward declarations so the top-level function can be read first
std::vector<std::string> ReadFileLines(std::istream& inputStream);
std::vector<TripletValue> GetValuesFromData(const std::vector<std::string>& data);
TripletValue ParseLine(const std::string& fileLine);

std::vector<TripletValue> ParseFile(std::istream& inputStream)
{
    std::vector<std::string> fileLines = ReadFileLines(inputStream);
    std::vector<TripletValue> parsedValues = GetValuesFromData(fileLines);
    return parsedValues;
}

std::vector<std::string> ReadFileLines(std::istream& inputStream)
{
    std::vector<std::string> fileLines;
    std::string fileLine;
    // Loop on getline itself; testing eof() before reading produces a spurious final line
    while (std::getline(inputStream, fileLine))
        fileLines.push_back(fileLine);
    return fileLines;
}

std::vector<TripletValue> GetValuesFromData(const std::vector<std::string>& data)
{
    std::vector<TripletValue> values;
    for (std::size_t i = 0; i < data.size(); i++)
    {
        TripletValue parsedValue = ParseLine(data[i]);
        values.push_back(parsedValue);
    }
    return values;
}

TripletValue ParseLine(const std::string& fileLine)
{
    std::stringstream sstream(fileLine);
    TripletValue parsedValue;
    std::string strValue;

    // The fields are colon-separated, so split on ':' rather than on whitespace
    std::getline(sstream, strValue, ':');
    parsedValue.Row = std::stoi(strValue);
    std::getline(sstream, strValue, ':');
    parsedValue.Col = std::stoi(strValue);
    std::getline(sstream, strValue, ':');
    parsedValue.Val = std::stoi(strValue);
    return parsedValue;
}
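A minimal, hypothetical driver for the above might look like this (the file name is an assumption):

#include <fstream>
#include <iostream>

int main()
{
    std::ifstream in("states.txt");   // hypothetical file name
    for (const TripletValue& t : ParseFile(in))
        std::cout << t.Row << ':' << t.Col << ':' << t.Val << '\n';
}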

Reading integers from file in separate arrays, one digit at a time, in C++

I have a text file that contains several lines, each line containing two very large integers.
I need to read the first integer on the line, store each one of its digits in an int array, read the second integer on the line, store each one of its digits in another int array. Then I should perform some operations (adding them, multiplying them etc), then repeat the procedure for the second line in the text file and so on.
I don't know how to read the integers this way. I would be able to read one integer on its own as an array of digits, but I don't know how to differentiate between the two integers separated by a space, much less how to tell the program when to move to the next line.
The reason why I can't read the integers as int variables is, as I said, that they are too large for common numeric operations, so I must do them the same way I would by hand. I've written functions to replicate the process, but they need arrays of digits.
I tried to use fscanf or getline, but anything similar reads both integers on the line into one single array. Also, anything that reads until a space is encountered will read ALL of my numbers, not only the ones on the line I'm at.
The ideal would be two arrays, each containing the digits of one integer, that I keep reinitialising every time I switch the line.
Any suggestions on how to do this (or ideas that follow a different approach to do the same) would be appreciated.
Using the Boost library (the string algorithms for the split function, and lexical_cast for conversion), you may take a look at this code snippet (without validation):
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>

typedef std::vector<int> intarray;
intarray da[2];
std::string s;
std::fstream f(filename, std::ios::in);
while (std::getline(f, s))   // loop on getline rather than testing eof()/fail() first
{
    std::vector<std::string> v;
    boost::algorithm::split(v, s, boost::algorithm::is_any_of(" "));
    for (std::size_t j = 0; j < 2 && j < v.size(); ++j)   // j < 2: one array per integer
    {
        da[j].clear();   // reinitialise the digit array for each new line
        const std::string& fs = v.at(j);
        for (std::size_t i = 0; i < fs.size(); ++i)
        {
            try
            {
                int d = boost::lexical_cast<int>(fs.at(i));   // one digit at a time
                da[j].push_back(d);
            }
            catch (boost::bad_lexical_cast& e)
            {
                std::cout << "caught exception.\n";
                break;
            }
        }
    }
}
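If Boost is not available, a plain-C++ sketch of the same idea, assuming exactly two space-separated numbers per line (the function name is mine, for illustration):

#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read each line, split it into two tokens, and store the digits of each
// token in its own array, reinitialised for every line.
void readDigitArrays(std::istream& in)
{
    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream iss(line);
        std::string first, second;
        if (!(iss >> first >> second))
            continue;                                   // skip malformed lines

        std::vector<int> a, b;
        for (char c : first)
            if (std::isdigit(static_cast<unsigned char>(c)))
                a.push_back(c - '0');
        for (char c : second)
            if (std::isdigit(static_cast<unsigned char>(c)))
                b.push_back(c - '0');

        // ... perform the hand-written arithmetic on a and b here ...
    }
}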

Reading a CSV to vectors in objects

I'm trying to write code that will, on a line-by-line basis, pass numerical data from a CSV to an object's vector. The object's structure is as follows: the object itself (let's call it CS) is an enclosed space, within which resides a vector of objects (called Points) which each have a vector of objects (Features) with 3 variables. The first two variables in these Features are descriptors of the feature and the third is the actual value taken by a specific Point[i].Feature[j]. Each point has the same set of Features, and aside from third value being different, the descriptors are likewise identical. (edit: Sadly I can't change this structure as it's part of a larger framework which is out of my hands)
Currently, my CSV has one column per feature, the first two rows being the descriptors which apply for all points and the rest of the rows being each individual point's third feature value. It's been a while since my introductory C++ course and I'm finding it hard to think of a fast way to implement this, as my CSVs could become fairly large (my current upper limit is 50000 points having 2000 features, this will probably grow) and I wouldn't want to do something silly like rereading the first two lines every time for each point. I've looked around and most CSV solutions involve string CSVs, which I don't have to deal with, and simpler array objects in which the CSV is stored. The problem for me is simply going up a level each time I reach the end of the line and restarting the procedure for the next point, and I can't think of anything. Any tips?
You could just create a temporary array of Descriptor objects which holds the two descriptors for each column and then read in your first row and create your Point objects from that. Afterwards you can just copy the descriptors from the Point a row above, e.g. Point[i-csvWidth], and deallocate the Descriptor array.
I guess I was nearly there, just used the wrong kind of variable to read in.
fstream myFile;
myFile.open(filePath.c_str());
if (!myFile) {
    cout << "File \"" << filePath << "\" doesn't exist, exiting program." << endl;
    exit(EXIT_FAILURE);
}
string line, line2, line3;
Points.clear();

// gets the range row
getline(myFile, line);
istringstream lineStream(line);
// gets the nomin row
getline(myFile, line2);
istringstream lineStream2(line2);
// gets the first person's traits
getline(myFile, line3);
istringstream lineStream3(line3);

CultVec originalCultVec = CultVec(RNG);
int val, val2, val3, val4;
while (lineStream >> val && lineStream2 >> val2 && lineStream3 >> val3) {
    Feature feature;
    feature.Range = (char)val;
    feature.Nomin = (bool)val2;
    feature.Trait = (char)val3;
    originalCultVec.addFeature(feature);
} // while
Points.push_back(originalCultVec);

while (getline(myFile, line)) {
    int i = 0;
    CultVec newVec = CultVec(RNG);
    istringstream lineStream4(line);
    while (lineStream4 >> val4) {
        Feature newFeat = originalCultVec.getFeature(i);
        newFeat.Trait = (char)val4;
        newVec.addFeature(newFeat);
        i++;
    }
    Points.push_back(newVec);
}
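Given the sizes mentioned in the question (up to 50000 points with 2000 features each), one small, hedged addition is to reserve capacity up front so the vectors don't repeatedly reallocate while reading; the counts below are assumptions taken straight from the question, and the features vector is hypothetical:

Points.reserve(50000);       // expected number of points, per the question
// and, if CultVec stores its features in a vector, something like:
// features.reserve(2000);   // expected features per point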

Entering and storing a string which can be of any length as the user wishes

What I want to do:
Store records in a file. These records have two things.
time_t rt; //which stores the time the record was entered by the user
and along with this I want to store one string. But I don't know the length of the string.
It will be decided at run time and will depend on how many characters the user enters.
What needs to be done(According to me):
I have no clue. I know about dynamic memory allocation, but I don't know how to apply it to such a problem.
What I have tried:
I have tried to take one character at a time from the user and store it in a text file (temporarily).
ofstream fileObject;
fileObject.open("temp.txt");
for (int j = 0; ; j++)
{
    ch = _getche();
    if (ch == 13) break;   // the user has pressed the return key
    fileObject << ch;
}
Then I found out the size of the file using the following code:
fileObject.seekp(0, ios::end);
long pos = fileObject.tellp(); // this is the size of the file (tellp, since fileObject is an output stream)
Then I declared a dynamic array of the size of the file.
char * entry;
entry = new char[pos + 1];   // +1 leaves room for the terminating '\0'
Closed the file in the "out" mode and opened it again in the "in" mode.
fileObject.close();
ifstream fout;
fout.open("temp.txt"); //this is the name of the text file that i had given
Then, character by character, I copied the content of the text file into the character array:
for (int i = 0; i < pos; i++)
    fout.get(entry[i]);   // get() keeps spaces; operator>> would skip them
entry[pos] = '\0';
fout.close();
But now I don't know what to do next.
What I need you to help me with:
Help me to write this record as a class object into a binary ".dat" file.
My specs:
Windows XP SP 3
IDE: Visual C++ 2010 Express
What are the restrictions on the string? And how do you recognize that the user has entered all of the data he wants in the string?
If the string has to be a single line, and we can assume "reasonable" length (i.e. it will easily fit into memory), then you can use std::getline to read the string into an std::string, and then define the output format, say "%Y-%m-%d %H:%M:%S: user string\n", for the file. If the user string can be several lines, you'll have to define a protocol for inputting them (so you can know when a single record is finished), and a more complex format for the file: one suggestion would be to separate records with an empty line (which means that the input cannot contain an empty line), or to use a record header along the lines of "%Y-%m-%d %H:%M:%S line_count\n". (Subversion uses a variant of this for its commit messages, with a bit more information, but the timestamp and the number of lines are there.)
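As a sketch of the single-line format suggested above, assuming C++11's std::put_time is available (the function name is mine, for illustration):

#include <ctime>
#include <fstream>
#include <iomanip>
#include <string>

// Append one timestamped, single-line record in "%Y-%m-%d %H:%M:%S: text" form.
void writeRecord(std::ofstream& out, std::time_t rt, const std::string& text)
{
    out << std::put_time(std::localtime(&rt), "%Y-%m-%d %H:%M:%S")
        << ": " << text << '\n';
}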
use std::string and std::getline, both from the <string> header
If you are using C++, then std::string is best.
std::string abc="";
I want to store one string. But I don't know the length of the string.
Then you need to use std::string and not a preallocated array of chars.
struct user_record
{
    time_t rt;               // stores the time the record was entered by the user
    std::string one_string;
};
Help me to write this record as a class object into a binary ".dat" file.
There are a number of serialisation options available to you. Perhaps the simplest is to write this as plain text using the standard stream operations:
std::ostream& operator <<(std::ostream& os, user_record const& ur)
{
    return os << ur.rt << ' ' << ur.one_string << '\n';
}

std::istream& operator >>(std::istream& is, user_record& ur)
{
    is >> ur.rt;
    is.ignore(1);                            // skip the single separating space
    return std::getline(is, ur.one_string);  // operator>> would stop at the first space
}
For anything more involved than a single-line string, perhaps you should investigate Boost's serialisation library.
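A hedged sketch of the Boost.Serialization route (the archive type and file name are my choices, not requirements); the struct is repeated here with the serialize member the library expects:

#include <ctime>
#include <fstream>
#include <string>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/string.hpp>

struct user_record
{
    time_t rt;
    std::string one_string;

    // Boost.Serialization hook: lists the members to archive
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/)
    {
        ar & rt & one_string;
    }
};

void save(const user_record& ur)
{
    std::ofstream ofs("records.dat");        // hypothetical file name
    boost::archive::text_oarchive oa(ofs);
    oa << ur;                                // uses user_record::serialize
}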

How to read in a data file of unknown dimensions in C/C++

I have a data file which contains data in row/column form. I would like a way to read this data into a 2D array in C or C++ (whichever is easier), but I don't know how many rows or columns the file might have before I start reading it.
At the top of the file is a commented line giving a series of numbers relating to what each column holds. Each row is holding the data for each number at a point in time, so an example data file (a small one - the ones i'm using are much bigger!) could be like:
# 1 4 6 28
21.2 492.1 58201.5 586.2
182.4 1284.2 12059. 28195.2
.....
I am currently using Python to read in the data with numpy.loadtxt, which conveniently splits the data into row/column form whatever the array size, but this is getting quite slow. I want to be able to do this reliably in C or C++.
I can see some options:
Add a header tag with the dimensions from my extraction program
# 1 4 6 28
# xdim, ydim
21.2 492.1 58201.5 586.2
182.4 1284.2 12059. 28195.2
.....
but this requires rewriting my extraction programs and programs which use the extracted data, which is quite intensive.
Store the data in a database file, e.g. MySQL, SQLite, etc. Then the data could be extracted on demand. This might be a requirement further along in the development process, so it might be good to look into anyway.
Use Python to read in the data and wrap C code for the analysis. This might be easiest in the short run.
Use wc on linux to find the number of lines and number of words in the header to find the dimensions.
echo $((`cat FILE | wc -l` - 1)) # get number of rows (-1 for header line)
echo $((`cat FILE | head -n 1 | wc -w` - 1)) # get number of columns (-1 for '#' character)
Use C/C++ code
This question is mostly related to point 5: whether there is an easy and reliable way to do this in C/C++. Otherwise, any other suggestions would be welcome.
Thanks
Create table as vector of vectors:
std::vector<std::vector<double> > table;
Inside infinite (while(true)) loop:
Read line:
std::string line;
std::getline(ifs, line);
If something went wrong (probably EOF), exit the loop:
if(!ifs)
break;
Skip the line if it's empty or a comment:
if (line.empty() || line[0] == '#')
    continue;
Read row contents into vector:
std::vector<double> row;
std::istringstream iss(line);   // iterate over this line only; iterating over ifs would consume the whole file
std::copy(std::istream_iterator<double>(iss),
          std::istream_iterator<double>(),
          std::back_inserter(row));
Add row to table;
table.push_back(row);
At the time you're out of the loop, "table" contains the data:
table.size() is the number of rows
table[i] is row i
table[i].size() is the number of cols. in row i
table[i][j] is the element at the j-th col. of row i
How about:
Load the file.
Count the number of rows and columns.
Close the file.
Allocate the memory needed.
Load the file again.
Fill the array with data.
Every .obj (3D model file) loader I've seen uses this method. :)
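A sketch of that two-pass method for this file format, assuming whitespace-separated doubles and '#'-prefixed comment lines (the function name is mine, for illustration):

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Pass 1: count rows and columns. Pass 2: read into preallocated storage.
std::vector< std::vector<double> > readTwoPass(const char* filename)
{
    std::size_t rows = 0, cols = 0;
    std::string line;

    std::ifstream count(filename);
    while (std::getline(count, line))
    {
        if (line.empty() || line[0] == '#')
            continue;
        if (cols == 0)   // count columns once, from the first data row
        {
            std::istringstream iss(line);
            double x;
            while (iss >> x)
                ++cols;
        }
        ++rows;
    }

    std::vector< std::vector<double> > table(rows, std::vector<double>(cols));

    std::ifstream in(filename);   // second pass
    std::size_t r = 0;
    while (std::getline(in, line))
    {
        if (line.empty() || line[0] == '#')
            continue;
        std::istringstream iss(line);
        for (std::size_t c = 0; c < cols; ++c)
            iss >> table[r][c];
        ++r;
    }
    return table;
}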
Figured out a way to do this. Thanks go mostly to Manuel as it was the most informative answer.
std::vector< std::vector<double> > readIn2dData(const char* filename)
{
    /* Function takes a char* filename argument and returns a
     * 2d dynamic array containing the data
     */
    std::vector< std::vector<double> > table;
    std::fstream ifs;

    /* open file */
    ifs.open(filename);

    while (true)
    {
        std::string line;
        double buf;
        getline(ifs, line);

        std::stringstream ss(line, std::ios_base::out | std::ios_base::in | std::ios_base::binary);

        if (!ifs)
            // mainly catch EOF
            break;

        if (line.empty() || line[0] == '#')
            // catch empty lines or comment lines (test empty() first for clarity)
            continue;

        std::vector<double> row;
        while (ss >> buf)
            row.push_back(buf);

        table.push_back(row);
    }

    ifs.close();
    return table;
}
Basically create a vector of vectors. The only difficulty was splitting by whitespace which is taken care of with the stringstream object. This may not be the most effective way of doing it but it certainly works in the short term!
Also, I'm looking for a replacement for the deprecated atof function, but never mind. It just needs some memory leak checking (it shouldn't have any, since most of the objects are std objects) and I'm done.
Thanks for all your help
Do you need a square or a ragged matrix? If the latter, create a structure like this:
std::vector< std::vector<double> > data;
Now read each line at a time into a:
vector <double> d;
and add the vector to the ragged matrix:
data.push_back( d );
All data structures involved are dynamic, and will grow as required.
I've seen your answer, and while it's not bad, I don't think it's ideal either. At least as I understand your original question, the first comment basically specifies how many columns you'll have in each of the remaining rows; e.g. the one you've given ("1 4 6 28") contains four numbers, which can be interpreted as saying each succeeding line will contain 4 numbers.
Assuming that's correct, I'd use that data to optimize reading the data. In particular, after that, (again, as I understand it) the file just contains row after row of numbers. That being the case, I'd put all the numbers together into a single vector, and use the number of columns from the header to index into the rest:
class matrix {
    std::vector<double> data;
    int columns;
public:
    // a matrix is 2D, with a fixed number of columns, and an arbitrary number of rows.
    matrix(int cols) : columns(cols) {}

    // just read raw data from the stream into the vector:
    std::istream &read(std::istream &stream) {
        std::copy(std::istream_iterator<double>(stream),
                  std::istream_iterator<double>(),
                  std::back_inserter(data));
        return stream;
    }

    // Do 2D addressing by converting rows/columns to a linear address.
    // If you want to check subscripts, use vector.at(x) instead of vector[x].
    double operator()(size_t row, size_t col) const {
        return data[row*columns + col];
    }
};
This is all pretty straightforward -- the matrix knows how many columns it has, so you can do x,y indexing into the matrix, even though it stores all its data in a single vector. Reading the data from the stream just means copying that data from the stream into the vector. To deal with the header, and simplify creating a matrix from the data in a stream, we can use a simple function like this:
matrix read_data(std::string name) {
    std::ifstream in(name.c_str());

    // read one line from the stream.
    std::string line;
    std::getline(in, line);

    // break that up into space-separated groups:
    std::istringstream temp(line);
    std::vector<std::string> counter;
    std::copy(std::istream_iterator<std::string>(temp),
              std::istream_iterator<std::string>(),
              std::back_inserter(counter));

    // the number of columns is the number of groups, -1 for the leading '#'.
    matrix m(counter.size() - 1);

    // Read the remaining data into the matrix.
    m.read(in);
    return m;
}
As it's written right now, this depends on your compiler implementing the "Named Return Value Optimization" (NRVO). Without that, the compiler will copy the entire matrix (probably a couple of times) when it's returned from the function. With the optimization, the compiler pre-allocates space for a matrix, and has read_data() generate the matrix in place.
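A hypothetical usage of read_data (the file name is an assumption):

#include <iostream>

int main() {
    matrix m = read_data("data.txt");   // hypothetical file name
    std::cout << m(0, 0) << '\n';       // element at row 0, column 0
}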