Reading from a file into a data structure in C++ - c++

So, I have a text file (data.txt). It's a story, so just sentence after sentence, and fairly long. What I'm trying to do is take every individual word from the file and store it in a data structure of some type. I'm going to get a word as user input, and then I need to find the 10 closest words (in data.txt) to that input word, using a function that finds the Levenshtein distance between 2 strings (I've figured that function out already). So I figured I'd use getline() with ' ' as the delimiter to read the words individually. But I don't know what I should store these words in so that I can access them easily. There's also the fact that I don't know how many words are in the data.txt file.
I may have explained this badly, sorry. I'll answer any questions you have to clarify.

In C++ you can store the words in a vector of strings:
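#include <fstream>
#include <string>
#include <vector>
//....
std::vector<std::string> wordsArray;
std::ifstream file("data.txt");
std::string oneWord;
// operator>> reads one whitespace-separated word at a time
while (file >> oneWord)
    wordsArray.push_back(oneWord);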

You need a data structure capable of "containing" the strings you read.
The standard library offers a number of "container" classes like:
vector
deque
list
set
map
Check the containers library at http://en.cppreference.com/w/cpp and find the one that best fits your needs.
The proper answer depends not only on the fact that you have to "store them" but also on what you have to do with them afterwards.

Related

most efficient way to check if a string is within a wordlist

I need the most efficient way to determine whether a string is in a wordlist (this is a textfile of all words).
Obviously I could create an ifstream object and loop through each line to see whether the string is present.
Is there a quicker way? Perhaps by using a map?
Thanks
To find a particular word in a list of many words, I would use a std::unordered_set, which is a hash-table by another name.
Basically:
1. Read words from file into set.
2. Pick a "random combination of letters".
3. Use find() to see if it's in the set.
4. Go to 2 as required.
Obviously, if you only want to search for a single word, there's no point in loading the file into a set. Just read the file and check as you go: when the word is present, you'll only need to read about half the file on average, so the search takes roughly half as long as reading the whole file.

C++: Read individual lines from text file, sort words alphabetically

I've created a text file with 3 lines filled with random words. I want to:
Read each line in the file individually.
Sort the words in each line alphabetically
Output the sorted line to console.
This is what I've come up with so far [runnable]: https://gist.github.com/anonymous/6211515
It reads the lines and puts them inside a vector, sorts that vector and then prints the resulting vector to console. But I'm just sorting the lines, I'm not sorting the actual words. I'm inputting the entire line as a string, which is causing my problems. I'm very new to C++ programming and I'm not sure what I should do to be able to sort the actual words, not the lines.
This is not homework, it's just a problem that was suggested to me that I solve in preparation for an upcoming exam. It is of paramount importance that the solution be as simple as possible as this exam will be done with pen and paper.
At least if I understand what you want correctly, I'd do something like this:
Read a line into a string with std::getline.
Initialize a std::stringstream from the string
Read words from the stringstream into a vector
Sort the vector
Write the sorted words to the output.
Repeat until done.
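The steps above could be sketched roughly like this. It assumes words are separated by whitespace, and the names sortedWords and sortEachLine are just for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split one line into words and return them sorted alphabetically.
std::vector<std::string> sortedWords(const std::string& line) {
    std::istringstream stream(line);   // stringstream initialised from the line
    std::vector<std::string> words;
    std::string word;
    while (stream >> word) words.push_back(word);
    std::sort(words.begin(), words.end());
    return words;
}

// Read lines from `in` and write each line's sorted words to `out`.
void sortEachLine(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        const std::vector<std::string> words = sortedWords(line);
        for (std::size_t i = 0; i < words.size(); ++i) {
            if (i != 0) out << ' ';
            out << words[i];
        }
        out << '\n';
    }
}
```

Calling sortEachLine with a std::ifstream opened on your file and std::cout gives the console output you asked for. Keep in mind that std::sort compares character codes, so capitalised words sort before lowercase ones.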
What you're looking for is lexicographical sorting, which means sorting words just like in a dictionary, in alphabetical order.
Standard C++ supports that: std::sort orders std::string values lexicographically by default, and the comparison itself is available as std::lexicographical_compare. Take a look here: http://www.cplusplus.com/reference/algorithm/lexicographical_compare/
To use either of them, just #include <algorithm>
I'm assuming you want to write an algorithm that will do this, rather than use anything at all that's pre-written.
Once you have the strings in the vector, now you need to separate each one into words. So you would want to loop through each character in each string in the vector, find the spaces (or whatever delimiter your file is using), and then put everything before each space into another vector. And delete everything you find before each space from the string once you record it, so that it doesn't show up a second time.
Now you have a vector of words. Then just sort that vector the same way you're currently sorting text_file.
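A hand-written split along those lines might look like the sketch below. It is a slight variation on the description: instead of deleting the consumed part of the string, it builds up the current word character by character, which has the same effect. It assumes a single space character is the delimiter, and splitOnSpaces is a made-up name.

```cpp
#include <string>
#include <vector>

// Split `line` on spaces without using any stream classes.
std::vector<std::string> splitOnSpaces(const std::string& line) {
    std::vector<std::string> words;
    std::string current;               // the word being built up
    for (char c : line) {
        if (c == ' ') {
            // A space ends the current word; skip runs of spaces.
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty()) words.push_back(current);  // last word has no trailing space
    return words;
}
```

Call it once per line you have already read, then sort the returned vector the same way you sort text_file now.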

Store characters in C++ tree

How is it possible to store character values in a binary tree? I have a CSV file with data, and I have to retrieve that data, search the database, then insert the search results. I did that using the C++ map from the Standard Template Library, but now my task is to do it using a tree structure. I searched the web, but haven't found anything about characters, just integers, like this: http://www.cprogramming.com/tutorial/lesson18.html
Thanks.
Edin.
Just use the code from your link and replace int with char.
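As an illustration of that idea (this is not the code from the linked tutorial, just a sketch), here is a small unbalanced binary search tree whose key type is a template parameter, so the same code works for char or std::string. The names Node, treeInsert and treeContains are invented for this example.

```cpp
#include <memory>
#include <string>
#include <utility>

// One tree node; Key can be char, std::string, or anything with operator<.
template <typename Key>
struct Node {
    Key key;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    explicit Node(Key k) : key(std::move(k)) {}
};

// Insert `key` into the tree rooted at `root`; duplicates are ignored.
template <typename Key>
void treeInsert(std::unique_ptr<Node<Key>>& root, const Key& key) {
    if (!root) {
        root.reset(new Node<Key>(key));
        return;
    }
    if (key < root->key) treeInsert(root->left, key);
    else if (root->key < key) treeInsert(root->right, key);
    // otherwise the key is already in the tree
}

// Return true if `key` is stored somewhere in the tree.
template <typename Key>
bool treeContains(const std::unique_ptr<Node<Key>>& root, const Key& key) {
    const Node<Key>* current = root.get();
    while (current) {
        if (key < current->key) current = current->left.get();
        else if (current->key < key) current = current->right.get();
        else return true;
    }
    return false;
}
```

Start with an empty std::unique_ptr<Node<char>> (or Node<std::string>) as the root and call treeInsert for each value. The tree is not balanced, so inserting already-sorted data degrades it into a linked list; that is usually fine for an exercise.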
I wouldn't use "my own" binary tree.
I would suggest you use std::map or std::vector (depending on the amount of data and many other factors). Start with vector, as that's the "easiest"; if that can be proven to be "bad", then change it. If you write your code well, it shouldn't change much.
But more importantly, when you say "character", I suspect you actually mean "string". So a vector of a class or struct containing your elements from the CSV file would be a suitable solution.

Treat only unique strings - what is faster vector<std::string> or just std::string

I read some strings from a file, and I need to ignore strings that I have already treated. My first thought was to create a vector<std::string> where I store the strings and, after receiving a new one, check whether it is already in the vector. But then I thought that I could do the same using just a std::string. I think that it is faster and uses less memory, but it isn't as obvious as using a vector. Which approach is better?
A better solution would be to store the strings that you have read in a std::set<string>.
Set lookups are generally faster than lookups in a vector, because sets in the C++ standard library are typically implemented as balanced binary search trees, so a lookup takes logarithmic rather than linear time. If you put all your strings in a single long string, your search would remain linear, and you would have one more problem to solve: dealing with word aliasing. You wouldn't be able to concatenate strings as-is, without a separator, because you wouldn't be able to distinguish between "abc"+"xyz" and "abcxyz".

ifstream best way to read without memory usage

I have a text file that contains authors and the books written by those authors. I am assigned to write a program in which the user provides the name of an author, and the program must print the names of any books written by that author.
I understand that I am supposed to use an ifstream to read this information. But how can I make it so my program doesn't read the entire file into memory (array, vector, etc.) to perform the search queries?
What would be the best way to approach this? My program should use classes as well.
I don't know the whole answer, or even the syntax, but a good way to get started is to ask what you know about the format of the input text file. Is it simply a two-column file like [Author Book], separated by a common delimiter? In that case, you could construct a loop that goes through the whole file and stores only the entries that match the search string in a vector.
A lot here depends on how often you're going to look for books from the file. If you're only going to look for one or two, then the most sensible method is probably to just scan through the file reading pairs of lines to find the ones you want.
Most other methods assume that you're going to look for data in the file often enough to justify spending some extra time up front to optimize the queries later. Assuming that's correct, one possibility would be to create an index by reading through the file, hashing each author's name, and recording the position in the file of the "record" for that author/book pair.
Then you'll store those hash/file offset pairs in a separate file. When you want to do a query, you'll read the hash/file offset pairs into memory. Hash the name of the author you're searching for (using the same algorithm) and see which (if any) file offsets have the same hash value. Seek to those spots in the file, and read in the record for the book. At that point, re-compare the author name in the file to the author name that was entered, in case of a hash collision. Show the records where you get a match.