Occurance of selected words in a text c++ - c++

I'm doing a research in linguistics and I need some help.
I have a list of names in a text file (names.txt)
and I need to find out how many times all the words that are in this file occur in another text file (data.txt).
So far I found a manual way by writing each word from the names.txt file in a string by hand. Is there a shorter way to solve this?
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
ifstream file("names.txt");
ifstream file("data.txt")
int wcount = 0;
string token;
string word("Jhon"); //here I write names which are supposed to be taken
string word1("James"); //from names.txt automatically
string word2("Rick");
string word3("Morty");
string word4("Alice");
string word5("Tina");
string word6("Timmy");
// ...
while (file>>token) //here I check if those words exist in data.txt
if ((word == token) || (word1== token)|| (word2 == token) || (word3== token)|| (word4 == token) || (word5== token) || (word6==token))
wcount++;
cout << wcount << endl;
return 0;

Use a std::vector<std::string> to hold the dictionary and std::find to look up words. Some people might argue that the find algorithm of std::set is faster than that of std::vector but you need a really huge number of elements before this algorithm outperforms the gain you get from std::vector being in consecutive memory.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <vector>
int main()
{
std::ifstream names("names.txt");
std::ifstream data("data.txt");
std::vector<std::string> words = { "Jhon", "James", "Rick", "Morty", "Alice", "Tina", "Timmy" };
int wcount = 0;
std::string token;
while (data >> token) //here I check if those words exist in data.txt
if (std::find(std::begin(words), std::end(words), token) != std::end(words))
++wcount;
std::cout << wcount << '\n';
}

Related

How to access last element of const char*

I would like to store a dictionary in a vector of lists. Each lists contains all words that have the same starting letter in the alphabet. (e. g. ananas, apple)
My problem is that I cannot read any words starting with "z" in my const char* array into the list.
Could someone explain to me why and how to fix this/ Is there a way to realize it with const char*? Thank you!
#include <iostream>
#include <list>
#include <vector>
#include <iterator>
#include <algorithm>
#include <string>
#include <fstream>
std::pair<bool, std::vector<std::list<std::string>> > loadwithList()
{
const char* prefix = "abcdefghijklmnopqrstuvwxyz";
std::vector<std::list<std::string>> dictionary2;
std::ifstream infile("/Users/User/Desktop/Speller/Dictionaries/large", std::ios::in);
if (infile.is_open())
{
std::list<std::string> data;
std::string line;
while (std::getline(infile, line))
{
if (line.starts_with(*prefix) && *prefix != '\0')
{
data.push_front(line);
}
else
{
dictionary2.push_back(data);
data.clear();
prefix++;
}
}
infile.close();
return std::make_pair(true, dictionary2);
}
std::cout << "Cant find file\n";
return std::make_pair(false, dictionary2);
}
int main()
{
auto [loaded, dictionary2] = loadwithList();
if (!loaded) return 1;
}
Answer is already given and problems are explained.
Basically you would need a double nested loop. Outer loop would read word by word, inner loop would check a mtach for each of the characters in "prefix". This will be a lot of looping . . .
And somehow not efficient. It would be better to take a std::mapfor storing the data in the first place. And if you really need a std::vectorof std::lists, then we can copy the data. We will take care to store only lowercase alpha characters as the key of the std::map.
For test purposes I loaded a list with words from here. There are roundabout 450'000 words in this list.
I used this for my demo program.
Please see below one potential solution proposal:
#include <iostream>
#include <fstream>
#include <map>
#include <list>
#include <vector>
#include <utility>
#include <string>
#include <cctype>
std::pair<bool, std::vector<std::list<std::string>> > loadwithList() {
std::vector<std::list<std::string>> data{};
bool resultOK{};
// Open File and check, if it could be opened
if (std::ifstream ifs{ "r:\\words.txt" }; ifs) {
// Here we will store the dictionary
std::map<char, std::list<std::string>> dictionary{};
// Fill dictionary. Read complete file and sort according to firstc character
for (std::string line{}; std::getline(ifs, line); )
// Store only alpha letters and words
if (not line.empty() and std::isalpha(line.front()))
// Use lower case start character for map. All all words starting with that character
dictionary[std::tolower(line.front())].push_back(line);
// Reserve space for resulting vector
data.reserve(dictionary.size());
// Move result to vector
for (auto& [letter, words] : dictionary)
data.push_back(std::move(words));
// All good
resultOK = true;
}
else
std::cerr << "\n\n*** Error: Could not open source file\n\n";
// And give back the result
return { resultOK , data };
}
int main() {
auto [result, data] = loadwithList();
if ( result)
for (const std::list<std::string>&wordList : data)
std::cout << (char)std::tolower(wordList.front().front()) << " has " << wordList.size() << "\twords\n";
}
You loose the first word of each letter after 'a'. This is because when you reach a word of the next letter, the if(line.starts_with(*prefix) && *prefix != '\0') fails and only then you go to the next letter but also go to the next word.
You loose the whole letter 'z' because after the last line in your file - the if(line.starts_with(*prefix) && *prefix != '\0') has succeeded at this point - the while (std::getline(infile, line)) terminates and you miss the dictionary2.push_back(data);.

Find specific text in string delimited by newline characters

I want to find a specific string in a list of sentence. Each sentence is a line delimited with a \n. When the newline is reached the current search should stop and start new on the next line.
My program is:
#include <iostream>
#include <string.h>
using namespace std;
int main(){
string filename;
string list = "hello.txt\n abc.txt\n check.txt\n"
cin >> filename;
// suppose i run programs 2 times and at 1st time i enter abc.txt
// and at 2nd time i enter abc
if(list.find(filename) != std::string::npos){
//I want this condition to be true only when user enters complete
// file name. This condition also becoming true even for 'abc' or 'ab' or even for 'a' also
cout << file<< "exist in list";
}
else cout<< "file does not exist in list"
return 0;
}
Is there any way around. i want to find only filenames in the list
list.find will only find substring in the string list, but if you want to compare the whole string till you find the \n, you can tokenize the list and put in some vector.
For that, you can put the string list in std::istringstream and make a std::vector<std::string> out of it by using std::getline like:
std::istringstream ss(list);
std::vector<std::string> tokens;
std::string temp;
while (std::getline(ss, temp)){
tokens.emplace_back(temp);
}
If there are leading or trailing spaces in the tokens, you can trim the tokens before adding them to the vector. For trimming, see What's the best way to trim std::string?, find a trimming solution from there that suits you.
And after that, you can use find from <algorithm> to check for complete string in that vector.
if (std::find(tokens.begin(), tokens.end(), filename) != tokens.end())
std::cout << "found" << std::endl;
First of all I wouldn't keep the list of files in a single string, but I would use any sort of list or vector.
Then if keeping the list in a string is a necessity of yours (for some kind of reason in your application logic) I would separate the string in a vector, then cycle through the elements of the vector checking if the element is exactly the one searched.
To split the elements I would do:
std::vector<std::string> split_string(const std::string& str,
const std::string& delimiter)
{
std::vector<std::string> strings;
std::string::size_type pos = 0;
std::string::size_type prev = 0;
while ((pos = str.find(delimiter, prev)) != std::string::npos)
{
strings.push_back(str.substr(prev, pos - prev));
prev = pos + 1;
}
// To get the last substring (or only, if delimiter is not found)
strings.push_back(str.substr(prev));
return strings;
}
You can see an example of the function working here
Then just use the function and change your code to:
#include <iostream>
#include <string.h>
#include <vector>
using namespace std;
int main(){
string filename;
string list = "hello.txt\n abc.txt\n check.txt\n"
cin >> filename;
vector<string> fileList = split_string(list, "\n");
bool found = false;
for(int i = 0; i<fileList.size(); i++){
if(fileList.at(i) == file){
found = true;
}
}
if(found){
cout << file << "exist in list";
} else {
cout << "file does not exist in list";
}
return 0;
}
Obviously you need to declare and implement the function split_string somewhere in your code. Possibly before main declaration.

Read file into array and return it from a function C++

In Lua, I have such a function to read a file into an array:
function readFile(file)
local output = {}
local f = io.open(file)
for each in f:lines() do
output[#output+1] = each
end
f:close()
return output
end
Now in C++, I tried to write that like this:
string * readFile(file) {
string line;
static string output[] = {};
ifstream stream(file);
while(getline(stream, line)) {
output[sizeof(output)+1] = line;
}
stream.close();
return output;
}
I know you can't return arrays from functions, only pointers. So I did this:
string *lines = readFile("stuff.txt");
And it threw me the error cannot convert 'std::string {aka std::basic_string<char>} to' std::string* {aka std::basic_string<char>*}' in intialization string *lines = readFile("stuff.txt");
Can anyone tell me what is wrong here, and is there a better way to read files into arrays?
EDIT:
I'm going to be using the returned array to do value matching using a for loop. In Lua this would be written as:
for _, each in ipairs(output) do
if each == (some condition here) then
--Do Something
end
end
How can this be done in C++, using vectors (according to the answer by Jerry Coffin)?
EDIT 2:
I can't match the vectors correctly for some reason. I wrote the code in a separate test file.
int main() {
vector<string> stuff = read_pass();
cout << stuff.size() << endl;
cout << stuff[0] << endl;
if (stuff[0] == "admin") {
cout << "true";
}
else {
cout << "false";
}
return 0;
}
read_pass() looks like this:
vector<string> read_pass() {
ifstream stream("stuff.txt");
string line;
vector<string> lines;
while(getline(stream, line)) {
lines.push_back(line);
}
stream.close();
return lines;
}
And stuff.txt looks like this:
admin
why?
ksfndj
I just put it some random lines to test the code. Every time I compile and run main.cpp the output I get is
3
admin
false
So why isn't the code being matched properly?
EDIT 3:
So instead of forcing myself down the vectors method of doing things, I decided to try this instead:
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <vector>
#include "basefunc.h"
using namespace std;
int main() {
string storedUsrnm;
string storedPw;
string pw = "admin";
string usrnm = "admin";
ifstream usernames("usrnm.accts");
ifstream passwords("usrpw.accts");
while(getline(usernames, storedUsrnm)) {
getline(passwords, storedPw);
print("StoredUsrnm " + storedUsrnm);
print("StoredPw: " + storedPw);
if (storedUsrnm == usrnm && storedPw == pw) {
print("True!");
return 0;
}
}
print("False!");
return 0;
}
Where print() is
void print(string str) {
cout << str << endl;
}
This still prints false, at the end, and it leads me to believe that for some reason, the "admin" read by the ifstream is different from the "admin" string. Any explanations for how this is so? Or does this code not work either?
Doesn't look to me like your current code should even compile. Anyway, I'd probably do something like this:
std::vector<std::string> read_file(std::istream &infile) {
std:string line;
std::vector<std::string> lines;
while (std::getline(infile, line))
lines.push_back(line);
return lines;
}
So the basic idea here is to read a line from the file, and if that succeeded, add that line (with push_back) to the vector of results. Repeat until reading a line from the file fails. Then return the vector of all the lines to the caller.
A few notes: especially at first, it's fairly safe to presume that any use of pointers is probably a mistake. That shouldn't be taken as an indication that pointers are terribly difficult to work with, or anything like that--just that they're almost never necessary for the kinds of things most relative beginners do in C++.
Likewise with arrays--at first, assume that what you might think of as an array in some other language translates to a std::vector in C++. C++ does also have arrays, but using them can wait a while (a long while, IMO--I've been writing C++ for decades now, and virtually never use raw pointers or arrays at all).
In the interest of simplicity, I've consolidated the data into the program, so it reads the data from the stringstream, like this:
#include <vector>
#include <string>
#include <fstream>
#include <iostream>
#include <sstream>
using namespace std;
vector<string> read_pass(istream &is) {
string line;
vector<string> lines;
while (getline(is, line)) {
lines.push_back(line);
}
return lines;
}
int main() {
istringstream input{ "admin\nwhy?\nksfndj" };
// To read from an external file, change the preceding line to:
// ifstream input{ "stuff.txt" };
vector<string> stuff = read_pass(input);
cout << stuff.size() << endl;
cout << stuff[0] << endl;
if (stuff[0] == "admin") {
cout << "true";
}
else {
cout << "false";
}
return 0;
}
At least for me, this produces:
3
admin
true
...indicating that it has worked as expected. I get the same with an external file. If you're not getting the same with an external file, my immediate guess would be that (at least the first line of) the file contains some data you're not expecting. If the problem continues, you might consider writing out the individual characters of the strings you read in numeric format, to give a more explicit idea of what you're really reading.
After a long time, I finally came up with the answer
#include <fstream>
#include <iostream>
#include <cstdlib>
#include <map>
using namespace std;
typedef map<int, string> strArr;
strArr readFile(string file) {
ifstream stream(file);
string line;
strArr output;
while(getline(stream, line)) {
output[output.size()+1] = line;
}
stream.close();
return output;
}
It doesn't read the file into an array, but it does return a map that does basically the same thing

Using strtok() to parse text file

I've been trying to make a program that parses a text file and feeds 6 pieces of information into an array of objects. The problem for me is that I'm having issues figuring out how to process the text file. I was told that the first step I needed to do was to write some code that counted how many letters long each entry was. The txt file is in this format:
"thing1","thing2","thing3","thing4","thing5","thing6"
This is the current version of my code:
#include<iostream>
#include<string>
#include<fstream>
#include<cstring>
using namespace std;
int main()
{
ifstream myFile("Book List.txt");
while(myFile.good())
{
string line;
getline(myFile, line);
char *sArr = new char[line.length() + 1];
strcpy(sArr, line.c_str());
char *sPtr;
sPtr = strtok(sArr, " ");
while(sPtr != NULL)
{
cout << strlen(sPtr) << " ";
sPtr = strtok(NULL, " ");
}
cout << endl;
}
myFile.close();
return 0;
}
So there are two things making it hard for me right now.
1) How do I deal with the delimiters?
2) How do I deal with "skipping" the first quotation mark in each line?
Read in a string instead of a c-style string. This means that you can use the handy std methods.
The std::string::find() method should help you out with finding each thing that you want to parse.
http://www.cplusplus.com/reference/string/string/find/
You can use this to find all the commas, which will give you the starts of all the things.
Then you can use std::string::substr() to cut up the string into each piece.
http://www.cplusplus.com/reference/string/string/substr/
You can manage to get rid of the quotation marks by passing in 1 more than the start and 1 less than the length of the thing, you can also use
If you have to use strtok then this code snippet should give enough to modify your program to parse your data:
#include <cstdio>
#include <cstring>
int main ()
{
char str[] ="\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str,"\",");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, ",\"");
}
return 0;
}
If you do not have to use strtok then you should use std::string as others have advised. Using std::string and std::istringstream:
#include <string>
#include <sstream>
#include <vector>
#include <iostream>
int main ()
{
std::string str2( "\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"" ) ;
std::istringstream is(str2);
std::string part;
while (getline(is, part, ','))
std::cout << part.substr(1,part.length()-2) << std::endl;
return 0;
}
For starters, don't use strtok if you can avoid it (and you easily can here - and you can even avoid using the find series of functions as well).
If you want to read in the whole line and then parse it:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
// defines a new ctype that treats commas as whitespace
struct csv_reader : std::ctype<char>
{
csv_reader() : std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
rc['\n'] = std::ctype_base::space;
rc[','] = std::ctype_base::space;
return &rc[0];
}
};
int main()
{
std::ifstream fin("yourFile.txt");
std::string line;
csv_reader csv;
std::vector<std::vector<std::string>> values;
while (std::getline(fin, line))
{
istringstream iss(line);
iss.imbue(std::locale(std::locale(), csv));
std::vector<std::string> vec;
std::copy(std::istream_iterator<std::string>(iss), std::istream_iterator<std::string>(), std::back_inserter(vec));
values.push_back(vec);
}
// values now contains a vector for each line that has the strings split by their commas
fin.close();
return 0;
}
That answers your first question. For your second, you can skip all the quotation marks by adding them to the rc mask (also treating them as whitespace) or you can strip them out afterwards (either directly or by using a transform):
std::transform(vec.begin(), vec.end(), vec.begin(), [](std::string& s)
{
std::string::iterator pend = std::remove_if(s.begin(), s.end(), [](char c)
{
return c == '"';
});
s.erase(pend, s.end());
});

Splitting std::string and inserting into a std::set

As per request of the fantastic fellas over at the C++ chat lounge, what is a good way to break down a file (which in my case contains a string with roughly 100 lines, and about 10 words in each line) and insert all these words into a std::set?
The easiest way to construct any container from a source that holds a series of that element, is to use the constructor that takes a pair of iterators. Use istream_iterator to iterate over a stream.
#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
using namespace std;
int main()
{
//I create an iterator that retrieves `string` objects from `cin`
auto begin = istream_iterator<string>(cin);
//I create an iterator that represents the end of a stream
auto end = istream_iterator<string>();
//and iterate over the file, and copy those elements into my `set`
set<string> myset(begin, end);
//this line copies the elements in the set to `cout`
//I have this to verify that I did it all right
copy(myset.begin(), myset.end(), ostream_iterator<string>(cout, "\n"));
return 0;
}
http://ideone.com/iz1q0
Assuming you've read your file into a string, boost::split will do the trick:
#include <set>
#include <boost/foreach.hpp>
#include <boost/algorithm/string.hpp>
std::string astring = "abc 123 abc 123\ndef 456 def 456"; // your string
std::set<std::string> tokens; // this will receive the words
boost::split(tokens, astring, boost::is_any_of("\n ")); // split on space & newline
// Print the individual words
BOOST_FOREACH(std::string token, tokens){
std::cout << "\n" << token << std::endl;
}
Lists or Vectors can be used instead of a Set if necessary.
Also note this is almost a dupe of:
Split a string in C++?
#include <set>
#include <iostream>
#include <string>
int main()
{
std::string temp, mystring;
std::set<std::string> myset;
while(std::getline(std::cin, temp))
mystring += temp + ' ';
temp = "";
for (size_t i = 0; i < mystring.length(); i++)
{
if (mystring.at(i) == ' ' || mystring.at(i) == '\n' || mystring.at(i) == '\t')
{
myset.insert(temp);
temp = "";
}
else
{
temp.push_back(mystring.at(i));
}
}
if (temp != " " || temp != "\n" || temp != "\t")
myset.insert(temp);
for (std::set<std::string>::iterator i = myset.begin(); i != myset.end(); i++)
{
std::cout << *i << std::endl;
}
return 0;
}
Let's start at the top. First off, you need a few variables to work with. temp is just a placeholder for the string while you build it from each character in the string you want to parse. mystring is the string you are looking to split up and myset is where you will be sticking the split strings.
So then we read the file (input through < piping) and insert the contents into mystring.
Now we want to iterate down the length of the string, searching for spaces, newlines, or tabs to split the string up with. If we find one of those characters, then we need to insert the string into the set, and empty our placeholder string, otherwise, we add the character to the placeholder, which will build up the string. Once we finish, we need to add the last string to the set.
Finally, we iterate down the set, and print each string, which is simply for verification, but could be useful otherwise.
Edit: A significant improvement on my code provided by Loki Astari in a comment which I thought should be integrated into the answer:
#include <set>
#include <iostream>
#include <string>
int main()
{
std::set<std::string> myset;
std::string word;
while(std::cin >> word)
{
myset.insert(std::move(word));
}
for(std::set<std::string>::const_iterator it=myset.begin(); it!=myset.end(); ++it)
std::cout << *it << '\n';
}