How to separate words in a line from a text file into separate vectors? - c++

I am new to coding in C++. I am trying to read a line from a text file. Each line contains two words. One word is the object, the other word is its color. So, one line could be Apple Red.
I am trying to read each line and store the first word (the object) in one string vector, and the second word (the color) in another string vector. Is there a way to do this? Thanks.
I have tried using splitstring, and reading the line using a for loop until I find a space character that separates the two words, but I'm still confused.

Something like this:
std::ifstream file("filename.txt");
if (file.is_open())
{
    std::string item, color;
    std::string *s = &item; // points at the string the next word is read into
    while (file >> *s)
    {
        if (s == &color)
        {
            // both words of the pair have been read
            Your_Code_To_Process_Item(item, color);
            item = "";
            color = "";
        }
        s = (s == &item) ? &color : &item; // toggle what s points to
    }
}

Related

C++ CSV Getline

I have one column of floats in a csv file. No column header.
string val;
vector<float> array;
string file = "C:/path/test.csv";
ifstream csv(file);
if (csv.is_open())
{
string line;
getline(csv, line);
while (!csv.eof())
{
getline(csv, val, '\n');
array.push_back(stof(val));
}
csv.close();
}
I want to push the values in the column to vector array. When I use ',' as a delimiter it pushes the first line to the array but the rest of the column gets stuck together and unpushable. If I use '\n' it doesn't return the first line and I get a stof error.
I tried other answers unsuccessfully. What is the correct way to format this here?
test.csv
Your raw test.csv probably looks like this:
1.41286
1.425
1.49214
...
So there are no commas, and using ',' as the "line" separator would read the whole file into one string and only parse the first float (up to the first \n).
Also, the first line is read but never used:
getline(csv, line); // <- first line never used
while (!csv.eof())
{
getline(csv, val, '\n');
array.push_back(stof(val));
}
Since there is only one field you don't have to use a separator and, as already mentioned in the comments, using while(getline(...)) is the right way to do this:
if (csv.is_open())
{
    string line;
    while (getline(csv, line)) // the loop ends as soon as a read fails
    {
        array.push_back(stof(line)); // convert the line that was just read
    }
    csv.close();
}

How to read in a file word by word in QT?

As you can probably tell, I am new to Qt, and I am attempting to import my console app's source code and headers into Qt to build a GUI. I am stuck on one particular function, which is supposed to load a file and read it in word by word. I know how to do this in standard C++, but in Qt I have been at it for hours and I am not sure how to go about it. Along with reading in the file, I have to insert a string (or in this case type T) using my own insert function (irrelevant to the question).
This is what I have right now, which I know does not work, for conversion reasons among others:
template <typename T>
bool HashTable<T>::load(const char* filename)
{
QString word;
QFile inputFile(filename);
QTextStream fin(filename);
// std::ifstream iss;
QString line;
// iss.clear();
// iss.open(filename);
while (fin >> word)
{
insert(word);
}
fin.close();
return true;
}
QTextStream does not (to my knowledge) offer a std::ifstream-style while (fin >> word) loop. The usual approaches are reading a certain number of characters (via read(qint64 maxlen)), reading entire lines (via readLine(qint64 maxlen = 0)), or a combination of the two. An example of how to do this is described in this answer.
What you might do, in order to get a list of words, is read line by line and split each line with QString's split() function, using a space as the separator.
template <typename T>
bool HashTable<T>::load(const char* filename)
{
    QFile inputFile(filename);
    if(!inputFile.open(QIODevice::ReadOnly)) {
        QMessageBox::information(0, "error", inputFile.errorString());
        return false; // nothing to read if the file could not be opened
    }
    QTextStream fin(&inputFile);
    while(!fin.atEnd()) {
        QString line = fin.readLine();
        QStringList words = line.split(" ");
        foreach(QString word, words){
            insert(word);
        }
    }
    inputFile.close();
    return true;
}
You have to open your file for reading first. Then the text stream should be read line by line. In the code above I read a line and then split it into words using a space (" ") as the separator. Then you can read the words from the QStringList.

I'm having trouble reading in lines from a text file?

I want to read in a line from a text file, store the line in an array of strings to later display the line numbers a word can be found on, and also break down the line into words for marking how many times unique words come up. I have successfully been able to break down the lines word by word and mark the frequency in which they appear but I'm struggling with storing the line in an array of strings so that I can use it later.
void get_word(istream& in_stream, string& w,
list<string> &wordlist, int& lineCount, string *line)
{
string t;
getline(in_stream,t);
for (int j=0; t[j]; j++)
t[j] = tolower(t[j]);
line = &t;
istringstream iss(t);
string word;
while(iss >> word)
{
insert_word(word, wordlist);
}
}
So far this is what I have and no matter what I try to do with the line where I try to assign the string "t" to the "line" array that's being pointed to it doesn't put anything in the array, I think I'm just completely missing something.
line is initialized as:
string line[0];
First of all, string line[0] gives you an array of no strings. That's not likely to be what you intended. Why not simply string line? That's one string. You'd pass a pointer to that string into your function, with &line.
Secondly, line = &t replaces the input pointer with a pointer to the local variable. This information is lost once the function ends. If you write *line = t then you are instead setting "the string that line points to" to the value of "the string t".
Ideally you'd avoid the "out argument" and just return the string you want, though.

Read contents of a text file character by character into a vector without skipping whitespace or new lines

So I have several text files. I need to figure out the 10 most common characters and words in the file. I've decided to use a vector, and load it with each character from the file. However, it needs to include white space and new lines.
This is my current function
void readText(ifstream& in1, vector<char> & list, int & spaces, int & words)
{
//Fills the list vector with each individual character from the text file
in1.open("test1");
in1.seekg(0, ios::beg);
std::streampos fileSize = in1.tellg();
list.resize(fileSize);
string temp;
char ch;
while (in1.get(ch))
{
//calculates words
switch(ch)
{
case ' ':
spaces++;
words++;
break;
default:
break;
}
list.push_back(ch);
}
in1.close();
}
But for some reason, it doesn't seem to properly hold all of the characters. I have another vector elsewhere in the program that has 256 ints all set to 0. It goes through the vector with the text in it and tallies up the characters, using each character's int value as the index into the other vector. It tallies most of them up fine, but spaces and newlines are causing problems. Is there a more efficient way of doing this?
The problem with your code right now is that you're calling
list.resize(fileSize);
and use
list.push_back(ch);
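in your read loop at the same time. resize() already creates fileSize elements, and push_back() then appends every character after them, so you only need one or the other.
Omit one of them.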
Is there a more efficient way of doing this?
The easiest way is to resize the std::vector<char> to the size you already know and use std::ifstream::read() to read in the whole file in one go. Calculate everything else from the vector contents afterwards.
Something along these lines:
list.resize(fileSize);
in1.read(&list[0],fileSize);
for(auto ch : list) {
switch(ch) {
// Process the characters ...
}
}

How to create an inverted index when I've already tokenized my file?

I'm trying to create an inverted index. I'm reading the lines of a text file, the text file has in the first position of each line the id of a document docId and the rest of the line has keywords about this document.
In order to create an inverted index, I first have to tokenize this text file. I did it with a function I wrote, and I store every word in a vector. My only gripe is that I also store the docId as a string in the vector. Here is the header of the tokenize function if you need it:
void tokenize(string& s, char c, vector<string>& v)
Now, after tokenizing the file, I have to create a function that puts every word in a map. I'm thinking of using an unordered map, in which every word appears only once. I also have to somehow store the frequency of the word somewhere. I thought that using the docId as a key in the map would be a good idea, but then I realized that a docId key can only map to one word, while in my text file a docId has more than one word.
So, how am I going to solve this problem? Where should I begin?
What a mess of a question. Breaking it down, if I understand correctly you have:
doc1 word1a word1b word1c word1d
doc2 word2a word2b word2c
...
You want mappings from words to documents and vice versa. It's hard to tell from your question whether your talk of word "frequency" reflects the same word being a keyword for multiple documents, or whether the description you have of your file format failed to incorporate a needed count for repetitions within each file. Assuming the former:
if (std::ifstream f{filename}) // a declaration used as a condition needs braces or =
{
    std::map<std::string, std::vector<std::string>> words_in_doc;
    std::map<std::string, std::vector<std::string>> docs_containing_word;
    std::string line;
    while (getline(f, line))
    {
        std::istringstream iss(line);
        std::string docid, word;
        if (iss >> docid) // first token on the line is the document id
            while (iss >> word) // every remaining token is a keyword
            {
                words_in_doc[docid].push_back(word);
                docs_containing_word[word].push_back(docid);
            }
    }
    // do whatever with your data/indices...
}
else
    std::cerr << "unable to open input file\n";
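Under that same assumption, the frequency of a word is just the number of documents listed for it in the word-to-documents map. Here is a small self-contained sketch; build_index, document_frequency and the Index alias are hypothetical names. Note that a word repeated within one line would be counted once per occurrence.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical alias: word -> ids of the documents that list it.
using Index = std::map<std::string, std::vector<std::string>>;

// Builds the inverted index from lines of the form "docId word word ...".
Index build_index(std::istream& in)
{
    Index docs_containing_word;
    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream iss(line);
        std::string docid, word;
        if (iss >> docid)
            while (iss >> word)
                docs_containing_word[word].push_back(docid);
    }
    return docs_containing_word;
}

// Number of documents recorded for a word; 0 if the word is unknown.
std::size_t document_frequency(const Index& index, const std::string& word)
{
    const auto it = index.find(word);
    return it == index.end() ? 0 : it->second.size();
}
```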