Tokenization: Refer to particular tokens after reading multiple .dat files - C++

I've used the code below to read multiple .dat files into 2D vectors and print out the token values. However, I need to know whether all token values will be stored in memory after the compilation completes, and how I can refer to a certain element, such as token[3][27], for further processing:
for (int i = 0; i < files.size(); ++i) {
cout << "file name: " << files[i] << endl;
fin.open(files[i].c_str());
if (!fin.is_open()) {
cout<<"error"<<endl;
}
std::vector<vector<string>> tokens;
int current_line = 0;
std::string line;
while (std::getline(fin, line))
{
cout<<"line number: "<<current_line<<endl;
// Create an empty vector for this line
tokens.push_back(vector<string>());
//copy line into is
std::istringstream is(line);
std::string token;
int n = 0;
//parsing
while (getline(is, token, DELIMITER))
{
tokens[current_line].push_back(token);
cout<<"token["<<current_line<<"]["<<n<<"] = " << token <<endl;
n++;
}
cout<<"\n";
current_line++;
}
fin.clear();
fin.close();
}
Do I need to create a 2D vector for each file? Can that be achieved at runtime in C++?

If you want to use your 2D vector later, you need to declare it outside the for loop. The way you wrote it, tokens is a local variable that is destroyed at the end of every loop iteration.
for (int i = 0; i < files.size(); ++i) {
std::vector<vector<string>> tokens(i);
}
tokens[0][0]; // you can't do it here: variable tokens not declared in this scope
Of course, you can use your tokens container right after the while loop (while still inside the for loop), addressing a certain token just the way you mentioned.
To use tokens outside the for loop, you can either make a 3D vector indexed by file, line, and token, or turn the reading code into a function that returns the 2D vector for a given file, which you can then process.
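Both options can be sketched together. This is only an illustration under assumptions: the names readTokens, readAllFiles, Line, and FileTokens are made up here and do not appear in the question, and a file that fails to open is stored as an empty entry so that indices still line up with files.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Line = std::vector<std::string>;   // tokens of one line
using FileTokens = std::vector<Line>;    // all lines of one file

// Hypothetical helper: split every line of `in` on `delimiter`.
FileTokens readTokens(std::istream& in, char delimiter) {
    FileTokens tokens;
    std::string line;
    while (std::getline(in, line)) {
        Line fields;
        std::istringstream is(line);
        std::string token;
        while (std::getline(is, token, delimiter)) {
            fields.push_back(token);
        }
        tokens.push_back(fields);
    }
    return tokens;
}

// Hypothetical helper: one FileTokens entry per file name (the "3D vector").
std::vector<FileTokens> readAllFiles(const std::vector<std::string>& files,
                                     char delimiter) {
    std::vector<FileTokens> all;
    for (const std::string& name : files) {
        std::ifstream fin(name);
        if (!fin) {
            all.emplace_back();  // assumption: keep an empty slot for unreadable files
            continue;
        }
        all.push_back(readTokens(fin, delimiter));
    }
    return all;
}
```

With something like this, all[2][3][27] would be token 27 on line 3 of the third file. Using all.at(2).at(3).at(27) instead throws std::out_of_range when an index does not exist, which is safer when line lengths vary.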

Related

Binary search check if string contains a string

I have a problem using binary_search: it works, but only if the whole string is entered as the search key.
I want it to work without searching for the whole string, using just a keyword, and to return "found" if the search key is part of another string (from a sorted vector of strings).
case 5: // case for searching by phone number
cout << "Enter phone number to search for: " << endl;
getline(cin >> ws, key);
vector<string> mylines_sorted;
for (int i = 0; i < mylines.size(); i++) {
mylines_sorted.push_back(mylines[i]); // vector of strings is transferred to new vector of strings
}
sort(mylines_sorted.begin(), mylines_sorted.end());
for (int i = 0; i < mylines_sorted.size(); i++) {
cout << mylines_sorted[i] << endl; // just a check if data is sorted
}
bool result = binary_search(mylines_sorted.begin(), mylines_sorted.end(), key);
cout << result << endl; // another check
if (result == false) {
cout << "Search returned nothing...!" << endl;
}
else {
cout << "Search: " << key << " found in data file!" << endl;
}
break;
}
return 0;
string line;
vector<string> mylines;
while (getline(database, line)) {
mylines.push_back(line);
}
I don't know if this part is relevant (I don't think so), but I transfer the data from the data file into a vector of strings.
struct Data {
char navn[80];
char addresse[80];
int alder;
unsigned int tlf;
};
There's a very simple way to get "words" from a string: put the string into a std::istringstream and use std::istream_iterator<std::string> to get the words out of it.
Combine this with the vector's insert function to add the strings to the vector you sort and search.
For example something like this:
// For each line...
for (auto const& line : mylines)
{
// Put the line into an input string stream
std::istringstream iss(line);
// Read from the string stream, adding words to the sorted_mylines vector
sorted_mylines.insert(end(sorted_mylines),
std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>());
}
After the above, sorted_mylines will contain all the words from all the lines in mylines.
You can now sort it and search for individual words. Or just skip the sorting and do a linear search.
Considering your edit, and the structure you use, I suggest you first read the file, parse each line into the corresponding structure, and create a vector of that instead of working with lines.
Then you could easily search for either name (which I recommend you split into separate first and last name) or address (which I recommend you also split up into its distinct parts, like street name, house number, postal code, etc.).
If you split it up, it will become much easier to search for specific parts. If you want a more generic search, then do a linear loop over all entries and look in all relevant structure members.
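A generic linear search of that kind might look roughly like the sketch below. The Person struct is an assumed std::string-based variant of the Data struct from the question, and searchAll is a made-up name. Substring matching is done with std::string::find, which is what makes a partial key such as part of a phone number match.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed std::string-based variant of the question's Data struct.
struct Person {
    std::string navn;
    std::string addresse;
    int alder;
    unsigned int tlf;
};

// Return the indices of all entries whose name, address, or phone number
// contains `key` as a substring.
std::vector<std::size_t> searchAll(const std::vector<Person>& people,
                                   const std::string& key) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < people.size(); ++i) {
        const Person& p = people[i];
        if (p.navn.find(key) != std::string::npos ||
            p.addresse.find(key) != std::string::npos ||
            std::to_string(p.tlf).find(key) != std::string::npos) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

Unlike binary_search, this does not need sorted data, and it costs one pass over all entries per query, which is usually fine for a small address book.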

How to separate strings into 2D vector?

This is the file with data that I'm reading from:
MATH201,Discrete Mathematics
CSCI300,Introduction to Algorithms,CSCI200,MATH201
CSCI350,Operating Systems,CSCI300
CSCI101,Introduction to Programming in C++,CSCI100
CSCI100,Introduction to Computer Science
CSCI301,Advanced Programming in C++,CSCI101
CSCI400,Large Software Development,CSCI301,CSCI350
CSCI200,Data Structures,CSCI101
I have successfully read from the file and stored this information in a 2D vector, but when I check the size of specific rows using course.info[x].size(), it appears that it's only counting the strings that have spaces between them. So, for example, course.info[2].size() returns 2, whereas I would like it to return 3. Essentially, I want it to count each bit of information that is separated by a comma instead of a space. If I use while (getline(courses, line, ',')), it puts each piece of comma-separated information in its own row, which is not what I want.
void LoadFile() {
vector<vector<string>> courseInfo;
ifstream courses;
courses.open("CourseInfo.txt");
if (!courses.is_open()) {
cout << "Error opening file";
}
if (courses) {
string line;
while (getline(courses, line)) {
courseInfo.push_back(vector<string>());
stringstream split(line);
string value;
while (split >> value) {
courseInfo.back().push_back(value);
}
}
}
for (int i = 0; i < courseInfo.size(); i++) {
for (int j = 0; j < courseInfo[i].size(); j++)
std::cout << courseInfo[i][j] << ' ';
std::cout << '\n';
}
}
getline(.., .., ',') is the tool for the job, but you need to use it in a different place.
Replace while (split >> value) with while (getline(split, value, ',')).
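As a small sketch of that inner loop in isolation (splitOnComma is a name invented for this example):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one line of the course file into its comma-separated fields.
std::vector<std::string> splitOnComma(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream split(line);
    std::string value;
    // getline with ',' stops at commas only, so spaces stay inside a field.
    while (std::getline(split, value, ',')) {
        fields.push_back(value);
    }
    return fields;
}
```

For the line CSCI350,Operating Systems,CSCI300 this yields three fields, with "Operating Systems" kept as a single string. One caveat: a trailing empty field (a line ending in a comma) is not reported by this loop.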

Output not displaying all information C++

I have to create a program that outputs various information about 1343 runners in a marathon. I have to import the data from a CSV spreadsheet, so I chose to use the getline function. I use a simple loop to fill a string array and then another loop to output the data. But for some reason, it only displays 300 or so runners' data. Here's the code:
int main(){
string data[1344];
vector<string> datav;
string header;
ifstream infile("C:\\Users\\Anthony\\Desktop\\cmarathon.csv");
int i = 0;
if (infile.is_open()) {
for (i=0; i<=1343; i++) {
getline(infile, data[i]);
}
datav.assign(data, data+1344);
for (int i = 0; i < datav.size(); i++) {
cout << datav[i] << "\n";
}
}
}
I attempted to use a vector in hopes it would help to allocate the required memory to execute the program properly (if that is in fact the problem here).
That code yields the perfect output for runners 1045-1343. I've tried simple workarounds, such as using several for() loops to combine the output seamlessly, to no avail. Any information would be appreciated.
You do not need to copy from the array to the vector. You can add to the vector directly instead. Also, it is somewhat bad practice to shadow a local variable from an outer scope.
int main(){
string line;
vector<string> datav;
string header;
ifstream infile("C:\\Users\\Anthony\\Desktop\\cmarathon.csv");
if (infile.is_open()) {
// Are you supposed to read the header line first?
getline( infile, header );
while( getline( infile, line ) ) // testing the stream itself also keeps a last line that has no trailing newline
datav.push_back( line );
cout << "Container has " << datav.size() << " lines\n";
for (size_t i = 0; i < datav.size(); i++) {
cout << datav[i] << "\n";
}
}
}
Of course, you still have to break down each line to the individual fields, so pushing back a class or struct as EToreo suggested would be a good idea.
You should try using a struct to represent the fields in the CSV file and then make a vector of that struct type.
Now, loop through the file, reading each line until you reach the end of the file (Google how to do that). DO NOT assume 1343 lines; you don't have to. When you read each line, create a new object from your struct and fill it with the content of that line (you will need to parse it by reading up to the delimiter, which is a comma in a CSV file, or the end of the string), and then datav.push_back(newObj) it onto your vector.
I suggest using the correct types in your struct (int for age, string for name, etc.) and parsing the string values from the file into those types. It will be much easier to do things like summing everyone's age. You will thank yourself (and maybe me?) later.
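Here is a rough sketch of that approach. The column layout is an assumption: the question never shows the CSV, so this pretends each line is name,age and that the first line is a header. Runner, parseRunner, and readRunners are names invented for the example.

```cpp
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed record layout: "name,age" (the real file likely has more columns).
struct Runner {
    std::string name;
    int age;
};

// Parse one CSV line into `out`; returns false if the line is malformed.
bool parseRunner(const std::string& line, Runner& out) {
    std::istringstream fields(line);
    std::string ageText;
    if (!std::getline(fields, out.name, ',')) return false;
    if (!std::getline(fields, ageText, ',')) return false;
    try {
        out.age = std::stoi(ageText);  // throws if ageText is not a number
    } catch (const std::exception&) {
        return false;
    }
    return true;
}

// Read every line until the stream ends; no hard-coded line count.
std::vector<Runner> readRunners(std::istream& in) {
    std::vector<Runner> runners;
    std::string line;
    std::getline(in, line);  // assumption: skip a header line
    while (std::getline(in, line)) {
        Runner r;
        if (parseRunner(line, r)) runners.push_back(r);
    }
    return runners;
}
```

Passing an std::ifstream opened on the CSV file to readRunners would fill the vector with however many rows the file actually has. Malformed rows are silently skipped here, which may or may not be what you want.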
If you don't need to use a vector:
for (i=0; i<=1343; i++) {
cout << data[i] << endl;
}
should work to print out whatever is in the data array.
It is also possible to specify a delimiter for the getline function if you need to put different strings in different variables.
However EToreo's method may be more useful to you in the long run.

Why is the next line not executing C++

I have attached my full source code of my program that can open a .txt file. It doesn't execute after the cout << length. I am trying to store the .txt file information in memory by using an array.
#include <iostream>
#include <string.h>
#include <fstream>
using namespace std;
char filename[128];
char file[10][250];
int count;
int length;
string line;
int main ()
{
int count = 0;
int length = 0;
cout << "Filename: ";
cin.clear();
cin.getline(filename, sizeof(filename));
string new_inputfile(filename);
ifstream inputfiles (new_inputfile.c_str());
if(!inputfiles.is_open())
{
cout << "File could not be opened. \n ";
}
else
{
for (int i=0; getline(inputfiles,line); i++)
{
length++;
}
cout << length;
// char file[length][250]; <- How can I create the array based on the length variable?
// CODE DOES NOT EXECUTE AFTER THIS.
while(!inputfiles.eof() && (count<10))
{
inputfiles.getline(file[count],250);
count++;
}
for(int i=0; i < count; i++)
{
cout << file[i] << endl;
}
}
inputfiles.close();
return 0;
}
Also, since file[] is char, say for example file[1] contained the text Name=Mike, how do I strip off everything up to and including the =? I want just Mike. I know that with string I can use the substr() method, but I don't know how to do it for char arrays.
This is a horribly wasteful way to count the number of lines in a file.
for (int i=0; getline(inputfiles,line); i++) // i is also completely useless here
{
length++;
}
You're reading the whole file only to throw everything away and start again! And after this loop is done, inputfiles.eof() will be true, so you'll never enter the next while loop, and the body of the last for loop never runs either (because count is still 0). Execution skips directly to inputfiles.close() and then you return from main.
I suggest you work on the line string as you go:
for ( ; getline(inputfiles, line); )
{
// do stuff with line and ditch the global char arrays
}
If you want store the lines for later, well, just save them :) The easiest thing to do is to use a vector:
std::vector<std::string> all_them_lines;
while (getline(inputfiles, line)) all_them_lines.emplace_back(line);
There, the entire file is now saved in all_them_lines, line by line. You can access them just like you would in an array, like all_them_lines[0]. You also don't need to know the number of lines beforehand - vectors expand automatically when you add stuff to them.
Now to parse a line and extract formatted input from it, check out what stringstream class has to offer.
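For the specific Name=Mike case from the question, a plain std::string is already enough, so the sketch below uses find and substr rather than a stringstream. The function name valueAfterEquals is invented for this example.

```cpp
#include <string>

// Return the text after the first '=' in `line`,
// or an empty string if the line contains no '='.
std::string valueAfterEquals(const std::string& line) {
    const std::string::size_type pos = line.find('=');
    if (pos == std::string::npos) {
        return std::string();
    }
    return line.substr(pos + 1);
}
```

Because std::string can be constructed from a null-terminated char array, this also works on the existing rows, for example valueAfterEquals(file[1]).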
You asked:
// char file[length][250]; <- How can I create the array based on the length variable?
Declare file as:
char (*file)[250] = NULL;
and then,
file = new char[length][250];
Make sure you call delete [] file before the end of the function.
You said:
// CODE DOES NOT EXECUTE AFTER THIS.
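You can rewind the stream and start reading from it again. Clear the error flags first, because the counting loop left eofbit and failbit set and the stream ignores seekg while failbit is set.
inputfiles.clear();
inputfiles.seekg(0);
count = 0;
while (count < length && inputfiles.getline(file[count], 250))
{
count++;
}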

c++ how to build a 2D matrix of strings from a .dat file? 5 columns x rows

I need to read a .dat file which looks like this:
Atask1 Atask2 Atask3 Atask4 Atask5
Btask1 Btask2 Btask3 Btask4 Btask5
Ctask1 Ctask2 Ctask3 Ctask4 Ctask5
Dtask1 Dtask2 Dtask3 Dtask4 Dtask5
and I need to be able to output information like this:
cout << line(3) << endl; // required output shown below
>>Ctask1 Ctask2 Ctask3 Ctask4 Ctask5
cout << line(2)(4) << endl; // required output shown below
>>Btask4
I don't know how to read one line and split it into an array of 5 different strings.
Ideally, I'd like to have the whole .dat file converted into a vector, a list, or some kind of matrix/array structure for easy reference.
Is there any simple code or solution for this?
PLEASE HELP?!?!?!? :-)
EDIT:
vector<string> dutyVec[5];
dut1.open(dutyFILE);
if( !dut1.is_open() ){
cout << "Can't open file " << dutyFILE << endl;
exit(1);
}
if(dut1.eof()){
cout << "Empty file - no duties" << endl;
exit(1);
}
while ( !dut1.eof()){
int count = 0;
getline(dut1, dutyVec[count]);
count++;
}
Your question touches on a number of issues, all of which I will attempt to answer in one go. So, forgive the length of this post.
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <fstream>
int main(int argc, char* argv[]){
std::vector <std::string> v;//just a temporary string vector to store each line
std::ifstream ifile;
ifile.open("C://sample.txt");//this is the location of your text file (windows)
//check to see that the file was opened correctly
if(ifile.is_open()) {
//read line by line until getline fails (end of file or read error);
//testing eof() before reading would push one extra empty line
std::string temp;//just a temporary string
while(std::getline(ifile, temp)) {//this gets all the text up to the newline character
v.push_back(temp);//add the line to the temporary string vector
}
ifile.close();//close the file
}
//this is the vector that will contain all the tokens that
//can be accessed via tokens[line-number][position-number]
std::vector < std::vector<std::string> > tokens(v.size());//initialize it to be the size of the temporary string vector
//iterate over the tokens vector and fill it with data
for (int i=0; i<v.size(); i++) {
//tokenize the string here:
//by using an input stringstream
//see here: http://stackoverflow.com/questions/5167625/splitting-a-c-stdstring-using-tokens-e-g
std::istringstream f(v[i].c_str());
std::string temp;
while(std::getline(f, temp, ' ')) {
tokens[i].push_back(temp);//now tokens is completely filled with all the information from the file
}
}
//at this point, the tokens vector has been filled with the information
//now you can actually use it like you wanted:
//tokens[line-number][position-number]
//let's test it below:
//let's see that the information is correct
for (int i=0; i<tokens.size(); i++) {
for(int j=0; j<tokens[i].size(); j++) {
std::cout << tokens[i][j] << ' ';
}
std::cout << std::endl;
}
system("pause");//only true if you use windows. (shudder)
return 0;
}
Note, I did not use iterators, which would have been beneficial here. But, that's something I think you can attempt for yourself.
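For completeness, here is one way the iterator version and the 1-based access from the question, line(2)(4), could be sketched on top of a filled tokens vector. The names Matrix, tokenAt, and joinLine are invented for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<std::string>>;

// 1-based access in the spirit of the question's line(2)(4).
// at() throws std::out_of_range for indices that do not exist.
const std::string& tokenAt(const Matrix& tokens,
                           std::size_t line, std::size_t position) {
    return tokens.at(line - 1).at(position - 1);
}

// Join the tokens of one line with single spaces, using iterators.
std::string joinLine(const std::vector<std::string>& line) {
    std::ostringstream out;
    for (auto it = line.begin(); it != line.end(); ++it) {
        if (it != line.begin()) out << ' ';
        out << *it;
    }
    return out.str();
}
```

With the sample file loaded, tokenAt(tokens, 2, 4) would give Btask4, and joinLine(tokens.at(2)) would give the third line, Ctask1 Ctask2 Ctask3 Ctask4 Ctask5.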