Getting Word Frequency From Vector in C++

I have googled this question and couldn't find an answer that worked with my code, so I wrote this to get the frequency of the words. The only issue is that I am getting the wrong number of occurrences for every word apart from one, which I think is a fluke. I am also checking whether a word has already been entered into the vector so I don't count the same word twice.
fileSize = textFile.size();
vector<wordFrequency> words (fileSize);
int index = 0;
for(int i = 0; i <= fileSize - 1; i++)
{
for(int j = 0; j < fileSize - 1; j++)
{
if(string::npos != textFile[i].find(textFile[j]) && words[i].Word != textFile[j])
{
words[j].Word = textFile[i];
words[j].Times = index++;
}
}
index = 0;
}
Any help would be appreciated.

Consider using a std::map<std::string,int> instead. The map class will handle ensuring that you don't have any duplicates.

Using an associative container:
typedef std::unordered_map<std::string, unsigned> WordFrequencies;
WordFrequencies count(std::vector<std::string> const& words) {
WordFrequencies wf;
for (std::string const& word: words) {
wf[word] += 1;
}
return wf;
}
It is hard to get simpler...
Note: you can replace unordered_map with map if you want the words sorted alphabetically, and you can write custom comparison operations to treat them case-insensitively.
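As a rough sketch of the case-insensitive variant mentioned in the note: the comparator below is a hypothetical helper (the names CaseInsensitiveLess, SortedWordFrequencies and countIgnoringCase are not from the answer), and it assumes plain ASCII text. Passing it as the third template argument makes std::map treat "The" and "the" as the same key.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical comparator: orders strings while ignoring ASCII case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

using SortedWordFrequencies =
    std::map<std::string, unsigned, CaseInsensitiveLess>;

// Counts words, treating spellings that differ only in case as one word.
SortedWordFrequencies countIgnoringCase(const std::vector<std::string>& words) {
    SortedWordFrequencies wf;
    for (const std::string& word : words) {
        ++wf[word];  // operator[] value-initializes a missing count to 0
    }
    return wf;
}
```

With this comparator the map keeps whichever spelling it saw first as the key, so lowercasing the words up front may be preferable if the output spelling matters.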

Try this code instead if you do not want to use a map container.
struct wordFreq{
string word;
int count;
wordFreq(string str, int c):word(str),count(c){}
};
vector<wordFreq> words;
int ffind(vector<wordFreq>::iterator i, vector<wordFreq>::iterator j, string s)
{
for(;i<j;i++){
if((*i).word == s)
return 1;
}
return 0;
}
The code for finding the number of occurrences in a textfile vector is then:
for(int i=0; i< textfile.size();i++){
if(ffind(words.begin(),words.end(),textfile[i])) // Check whether word already checked for, if so move to the next one, i.e. avoid repetitions
continue;
words.push_back(wordFreq(textfile[i],1)); // Add the word to vector as it was not checked before and set its count to 1
for(int j = i+1;j<textfile.size();j++){ // find possible duplicates of textfile[i]
if(textfile[j] == (*(words.end()-1)).word)
(*(words.end()-1)).count++;
}
}

Related

Sliding Window Function to Count Unique Words in Vector in C++

I have a vector with a block of text read from a txt file. I need to use a window function to find the number of unique words in a sliding window of size K. I've found this code online, which uses a similar technique but with an int array. However, when I try to adjust the code to fit my situation I'm getting an error:
"no match for ‘operator+’ (operand types are ‘std::vector<std::__cxx11::basic_string<char> >’ and ‘int’)gcc"
My question is, should I not be using a vector here? Should I be trying to figure out how to convert to an array? I didn't think there was too much of a difference between the two that I wouldn't be able to adapt the code, but perhaps I am wrong. I literally just started to learn C++ last night. Please help :(
// Counts distinct elements in window of size K
int countWindowDistinct(vector<string> text, int K)
{
int dist_count = 0;
// Traverse the window
for (int i = 0; i < K; i++) {
// Check if element arr[i] exists in arr[0..i-1]
int j;
for (j = 0; j < i; j++)
if (text[i] == text[j])
break;
if (j == i)
dist_count++;
}
return dist_count;
}
// Counts distinct elements in all windows of size k
void countDistinct(vector<string> text, int N, int K)
{
// Traverse through every window
for (int i = 0; i <= N - K; i++)
cout << countWindowDistinct(text + i, K) << endl;
}
int main()
{
//Declares two vectors
std::vector<std::string> book;
std::vector<std::string> unqWords;
//reads a text file and stores the contents in the vector book
book = readFile("test.txt");
//Ensures that all words in the text are lowercase
makeLower(book);
//Loops through the text (one word at a time) and removes all alphanumeric characters
for(int i = 0; i < book.size(); i++)
{
//Function used to remove alphanumeric characters from words
book[i] = removeAlpha(book[i]);
}
int K = 4;
int N = calculate_size(book);
// Function call
countDistinct(book, N, K);
}
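One possible way around the error, as a sketch rather than a definitive fix: `text + i` works for a raw array because the array decays to a pointer, but std::vector has no operator+ taking an int. Instead of converting to an array, the window can be described by a start index into the same vector. The parameter names start and k, and the choice to return the counts instead of printing them, are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Counts distinct words in text[start .. start + k - 1].
// The caller must ensure start + k <= text.size().
int countWindowDistinct(const std::vector<std::string>& text,
                        std::size_t start, std::size_t k) {
    int distinct = 0;
    for (std::size_t i = start; i < start + k; ++i) {
        // Look for text[i] among the earlier words of this window.
        std::size_t j = start;
        while (j < i && text[j] != text[i]) {
            ++j;
        }
        if (j == i) {
            ++distinct;  // not seen earlier in the window
        }
    }
    return distinct;
}

// Returns the distinct-word count for every window of size k.
std::vector<int> countDistinct(const std::vector<std::string>& text,
                               std::size_t k) {
    std::vector<int> counts;
    if (k == 0 || k > text.size()) {
        return counts;  // no complete window fits
    }
    for (std::size_t start = 0; start + k <= text.size(); ++start) {
        counts.push_back(countWindowDistinct(text, start, k));
    }
    return counts;
}
```

Passing the vector by const reference also avoids copying the whole book for every window, which the by-value parameters in the question would do. The vector already knows its own size, so a separate N argument is not needed.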

What is the problem in this code that redistributes characters to make all strings equal?

I wrote a program to solve the following problem:
You are given an array of strings words (0-indexed).
In one operation, pick two distinct indices i and j, where
words[i] is a non-empty string, and move any character from
words[i] to any position in words[j].
Return true if you can make every string in words equal using any
number of operations, and false otherwise.
But it doesn't seem to work the way I expect. Can someone help point out my error?
bool makeEqual(vector<string>& words)
{
int n = words.size();
map<char, int> mp;
for(int i = 0; i < n; i++)
{
string s = words[i];
for(int j = 0; j < words[i].length(); j++) //<-- what is problem here?
{
mp[s[j]]++;
}
}
set<int> st;
for(auto it : mp)
{
st.insert(it.second);
}
return st.size() == 1;
}
The issue with your program is not the part that you have marked. It is the logic that follows that loop.
It looks like your strategy to solve the problem is to count how many times each character appears. That is a reasonable start. From there, it should follow that to satisfy the problem, each character must be able to appear exactly the same number of times in every word. And so, every count must be a multiple of n.
To check if a value x is a multiple of n you can use the modulo operator:
if (x % n == 0) std::cout << x << " is a multiple of " << n << "\n";
And so, you should now see that the actual problem is the very last statement in your function:
return st.size() == 1;
What that does is say that all characters appear exactly the same number of times. However, it's not even correct in the simplest case because you still haven't checked that the value is a multiple of n. And it doesn't handle characters that appear different numbers of times (e.g. if all words became "eel" it would fail).
So what you actually need is to test that all of the character counts are the correct multiple. You can use std::all_of from <algorithm> to help with this:
bool makeEqual(const vector<string>& words)
{
int n = words.size();
map<char, int> mp;
for(int i = 0; i < n; i++)
{
const string& s = words[i];
for(int j = 0; j < s.length(); j++)
{
mp[s[j]]++;
}
}
// Return true if all character counts are a multiple of n
return all_of(mp.begin(), mp.end(),
[n](const pair<char, int>& cc) {
return cc.second % n == 0;
});
}
If you struggle with lambda syntax, this is basically equivalent to:
for (const auto& cc : mp)
{
if (cc.second % n != 0) return false;
}
return true;

Looking for a faster way to help reduce/create a huge list of strings

I tried to write an algorithm to guess correctly in the game "Mastermind".
It works, and the average number of guesses is 6, but it takes a lot of time to calculate the best guess.
I used Knuth's idea. The algorithm works as follows:
1. Create the set S of 1296 possible codes (1111, 1112 ... 6665, 6666).
2. Start with initial guess 1122 (Knuth gives examples showing that other first guesses such as 1123, 1234 do not win in five tries on every code).
3. Play the guess to get a response of colored and white pegs.
4. If the response is four colored pegs, the game is won, the algorithm terminates.
5. Otherwise, remove from S any code that would not give the same response if the current guess were the code.
In my code, step 2 is to take a random number.
I used vector<string> for this.
AllPoss is the vector full of strings, guess is the last guess that was used, and answer is the count of bulls and cows, formatted as "x,y" (where x and y are numbers).
void bullpgia::SmartGuesser::remove(string guess, string answer)
{
for (auto i= AllPoss.begin();i != AllPoss.end();i++){
string token = *i;
if (calculateBullAndPgia(token, guess) != answer)
AllPoss.erase(i--);
}
}
This is the part that takes a lot of time to calculate. Is there any way to improve it?
To create the list I used:
void bullpgia::SmartGuesser::All() {
/**
* creates a pool of all the possibilities strings
* we then delete the ones we dont need
* @param length is the length of the word we need to guess
*/
for(int i=0;i<pow(10,length);i++){
stringstream ss;
ss << setw(length) << setfill('0') << i;
string s = ss.str();
AllPoss.push_back(s);
}
}
The function calculateBullAndPgia(string, string) is:
string calculateBullAndPgia(const string &choice, const string &guess) {
string temp = choice;
string temp2 = guess;
unsigned int bull = 0;
unsigned int pgia = 0;
for (int i = 0; i < temp.length(); i++) {
if (temp[i] == temp2[i]) {
bull++;
temp[i] = 'a';
temp2[i] = 'z';
}
}
for (int i = 0; i < temp.length(); i++) {
for (int j = 0; j < temp2.length(); j++) {
if (i != j && temp[i] == temp2[j]) {
pgia++;
temp[i] = 'a';
temp2[j] = 'z';
}
}
}
return to_string(bull) + "," + to_string(pgia);
}
Erasing a single element in the middle of a vector is O(n). My guess is that you wind up doing it O(n) times per call to SmartGuesser::remove. Then you loop over that, so you probably have an O(n^3) algorithm. You could instead use std::remove_if, which is O(n), to shift the elements you want to keep to the front of the vector, so that the leftover tail can be erased cheaply in one go:
AllPoss.erase(std::remove_if(AllPoss.begin(), AllPoss.end(), [&](const std::string& token) { return calculateBullAndPgia(token, guess) != answer; }), AllPoss.end());
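For reference, here is a self-contained sketch of the same erase/remove_if pattern. The scoring function exactMatches is a hypothetical stand-in for calculateBullAndPgia that only counts exact-position matches, and filterCandidates is an illustrative name, not part of the original class.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for calculateBullAndPgia: counts only the positions
// where both strings hold the same character.
int exactMatches(const std::string& code, const std::string& guess) {
    int matches = 0;
    for (std::size_t i = 0; i < code.size() && i < guess.size(); ++i) {
        if (code[i] == guess[i]) {
            ++matches;
        }
    }
    return matches;
}

// Keeps only the candidates that would have produced `expected` for `guess`.
void filterCandidates(std::vector<std::string>& candidates,
                      const std::string& guess, int expected) {
    // remove_if compacts the kept elements at the front, preserving their
    // order, and returns an iterator to the start of the leftover tail.
    auto firstRemoved = std::remove_if(
        candidates.begin(), candidates.end(),
        [&](const std::string& code) {
            return exactMatches(code, guess) != expected;
        });
    // A single erase call drops the whole tail.
    candidates.erase(firstRemoved, candidates.end());
}
```

The predicate takes exactly one argument, the element being tested; guess and the expected response are captured by reference from the enclosing scope.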

Bug in selection sort loop

I need to make a program that will accept an input file of numbers (integer.txt), which will be stored one number per line in a vector, then use a selection sort algorithm to sort the numbers in descending order and write them to the output file (sorted.txt). I'm quite sure something is wrong in my selectionSort() function that is causing the loop not to get the right values, because when I tested it with cout I got vastly improper output. I'm sure it's a beginning programmer's goof.
vector<string> getNumbers()
{
vector<string> numberList;
ifstream inputFile ("integer.txt");
string pushToVector;
while (inputFile >> pushToVector)
{
numberList.push_back(pushToVector);
}
return numberList;
}
vector<string> selectionSort()
{
vector<string> showNumbers = getNumbers();
int vectorMax = showNumbers.size();
int vectorRange = (showNumbers.size() - 1);
int i, j, iMin;
for (j = 0; j < vectorMax; j++)
{
iMin = j;
for( i = j; i < vectorMax; i++)
{
if(showNumbers[i] < showNumbers[iMin])
{
iMin = i;
}
}
if (iMin != j)
{
showNumbers[j] = showNumbers [iMin];
}
}
return showNumbers;
}
void vectorToFile()
{
vector<string> sortedVector = selectionSort();
int vectorSize = sortedVector.size();
ofstream writeTo;
writeTo.open("sorted.txt");
int i = 0;
while (writeTo.is_open())
{
while (i < vectorSize)
{
writeTo << sortedVector[i] << endl;
i += 1;
}
writeTo.close();
}
return;
}
int main()
{
vectorToFile();
}
vectorRange is defined but not used.
In your selectionSort(), the only command that changes the vector is:
showNumbers[j] = showNumbers [iMin];
Every time control reaches that line, you overwrite an element of the vector.
You must learn to swap two values, before you even think about sorting a vector.
Also, your functions are over-coupled. If all you want to fix is selectionSort, then you should be able to post that plus a main that calls it with some test data and displays the result, but no, your functions all call each other. Learn to decouple.
Also your variable names are awful.
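To illustrate the swap point, here is a minimal sketch of a selection sort in descending order that exchanges the two elements with std::swap instead of overwriting one of them. It assumes the numbers are stored as int, not string (strings compare character by character, so "10" would sort before "9"), and the function name selectionSortDescending is made up for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts values in descending order using selection sort.
void selectionSortDescending(std::vector<int>& values) {
    for (std::size_t j = 0; j + 1 < values.size(); ++j) {
        // Find the largest element in the unsorted part values[j..end).
        std::size_t iMax = j;
        for (std::size_t i = j + 1; i < values.size(); ++i) {
            if (values[i] > values[iMax]) {
                iMax = i;
            }
        }
        // Exchange the two elements so that neither value is lost.
        if (iMax != j) {
            std::swap(values[j], values[iMax]);
        }
    }
}
```

Because the function takes the vector as a parameter and does no file I/O, it can be tested on its own with a small hard-coded vector.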

Writing a permutation function

I've been asked to write a permutation function that uses recursion. The only parameter of the function should be the string that I should find all the permutations of. The function should return a vector with all possible permutations. I know I can use next_permutation in STL Algorithms, but I've been asked not to.
I have the base case set up, and I know I need a for loop, but I'm not quite sure where to go from there. Can someone point me in the right direction?
vector <string> getPerm(string str)
{
vector<string> v;
if(str.length() <= 1)
{
v.push_back(str);
return v;
}
else
{
for(int i = 0; i < str.size(); i++)
{
//Some code
}
}
}
Any help would be appreciated.
Imagine you already have the result of the previous iteration of your function, which returns all the permutations of the first n-1 elements of your string.
vector<string> v_prev = getPerm(str.substr(0, str.length()-1));
Use this in the
//Some code
part of your code.
Another tip: use the 0-length string as the stop condition of your recursion. You can construct the 1-length permutations recursively ;)
Here is the entire solution:
vector<string> getPerm(string str)
{
vector<string> v;
if (str.empty())
{
v.push_back(string());
return v;
}
else
{
vector<string> v_prev = getPerm(str.substr(0, str.length()-1));
for(int i = 0; i < v_prev.size(); i++)
{
for (int j = 0; j < v_prev[i].length() + 1; j++)
{
string p = v_prev[i];
p.insert(j, str.substr(str.length() - 1, 1));
v.push_back(p);
}
}
return v;
}
}
Think about these permutations of the string "123"
123
132
213
231
312
321
And think about these permutations of "12"
12
21
Can you see how you might construct the permutations of an n-letter string if you know the permutations of all the (n-1)-letter substrings? That type of solution would be recursive.
For each element x in yourArray
Make a copy of yourArray without element x. Call this new array newArray.
Find all of the permutations of newArray
add element x to the beginning of each of those permutations
Implementing what Ken Bloom just wrote:
vector <string> getPerm(string str)
{
vector<string> v;
if(str.length() <= 1)
{
v.push_back(str);
return v;
}
else
{
for(int i = 0; i < str.size(); i++){
vector<string> perms = getPerm(str.substr(0,i)+str.substr(i+1));
for(int j = 0; j < perms.size(); j++){
v.push_back(str[i] + perms[j]);
}
}
}
return v;
}
Try something like this:
permut(s) :
if s.length=0 : exit;
else :
for i=0 to s.length :
front:=s[i];
remove(s,i);
s2 := front + permut(s);
print s2, NEWLINE;