Word search in an array of sentences in C++ - c++

How can I write a program that finds how many times a word entered from the keyboard is repeated in a sentence?

I am assuming you want to write a program that counts how many times a (case-sensitive) word has been repeated in an array of sentences. We'll go step-by-step.
#include <string>
#include <vector>

/**
 * Counts the number of times a word has been repeated in multiple sentences.
 *
 * @param sentences An array of string sentences.
 * @param word The word to be searched and counted.
 * @returns no. of occurrences of `word` in `sentences`.
 */
auto count(const std::vector<std::string>& sentences, const std::string& word) -> std::size_t
{
    // Initialize the no. of occurrences of `word` to 0
    std::size_t word_count{ 0 };
    // Store `word`'s C-string
    auto word_cstr = word.c_str();
    // Loop through all the sentences to check for `word`'s count in each sentence
    for (const auto& sentence : sentences)
    {
        // Find the first occurrence of `word` in `sentence`
        std::size_t found{ sentence.find(word_cstr) };
        // Increase the count as long as the word is occurring in the sentence
        while (found < std::string::npos)
        {
            ++word_count;
            // Search for the next occurrence, starting from the index right
            // after the current match and comparing word.size() characters
            found = sentence.find(word_cstr, found+1, word.size());
        }
    }
    // Return the final total no. of occurrences of `word` in all `sentences`
    return word_count;
}
Of course, there is (probably) a (much) better solution out there. But this is what I could come up with.
PS: Do note that this search is case-sensitive, and counts all occurrences of the word in the sentences. For example, counting the occurrences of the word "code" in "Coders gonna code the entire codebase." will return 2 ("code" and "codebase").
Alternatively, you can loop through the sentences one by one, split each sentence into words, and increase the count whenever the current word in the sentence is the same as the given word.
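A minimal sketch of that split-and-compare alternative follows. It assumes words are separated by whitespace and that punctuation touching a word should be ignored; the function name count_whole_words and the punctuation set are my own choices for illustration, not something from the question.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Counts whole-word, case-sensitive matches of `word` across all `sentences`.
// Assumption: words are separated by whitespace, and punctuation attached to
// a word (as in "code.") is trimmed from both ends before comparing.
std::size_t count_whole_words(const std::vector<std::string>& sentences,
                              const std::string& word)
{
    const std::string punctuation{ ".,;:!?\"'()" };
    std::size_t word_count{ 0 };
    for (const auto& sentence : sentences)
    {
        std::istringstream stream{ sentence };
        std::string token;
        // operator>> splits the sentence on whitespace
        while (stream >> token)
        {
            const auto first = token.find_first_not_of(punctuation);
            if (first == std::string::npos)
                continue; // the token consisted of punctuation only
            const auto last = token.find_last_not_of(punctuation);
            if (token.substr(first, last - first + 1) == word)
                ++word_count;
        }
    }
    return word_count;
}
```

With this version, "codebase" no longer counts as a match for "code", because whole tokens are compared rather than substrings.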

Related

Is there a way to search an unordered_set using a limited alphabetic range?

Context: I am coding an assignment in C++ where a user enters a word or a sentence to unscramble on a word by word basis. I have a text file full of English words that I have read into an unordered_set of strings. Then I go through permutations of each entered word and attempt to find it in the unordered_set. The unscrambled word possibilities are printed out to the user.
Problem: There are a lot of words in the text file. The program doesn't run properly because it takes too long to go through all the permutations and look for a match in the unordered_set.
Possible Solution: I want to limit the range of words to search through, because the text file was already in alphabetical order. For example, if the scrambled word was "cit", one permutation for this word would be "itc". I want to search all of the words in the unordered_set starting with i for "itc".
Here is what I have so far.
void unscramble() {
//issue - too slow, find in range?
string word;
string temp;
ifstream inDictionaryFile("words_alpha.txt");
unordered_set<string> dictionary;
//read dictionary file into a unordered_set
while (getline(inDictionaryFile, temp)) {
auto result = dictionary.insert(temp + " ");
}
cout << "Enter something to unscramble: ";
//find/print out matches for permutations of scrambled words
while (cin>>word) {
do {
word = word + " ";
auto result = dictionary.find(word);
if (result != end(dictionary)) {
cout << setw(10) << word;
}
} while (next_permutation(begin(word), end(word)));
}
}
If you only need to match permutations of the first 3 letters, you can use an unordered_multiset keyed on a canonical permutation (e.g. the first 3 letters, sorted). However, I suspect that your actual problem should not be solved with a single data structure but with several: one data structure for storage, and other data structures that serve as indexes into that storage.
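As a rough sketch of the canonical-permutation idea, here is a variation that keys on the sorted letters of the whole word instead of the first 3 letters, and uses an unordered_map from key to word list rather than an unordered_multiset. The names signature, build_index and unscramble are hypothetical. Each scrambled word then needs a single lookup instead of one lookup per permutation.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Maps a canonical key (the word's letters, sorted) to all dictionary
// words that are made of exactly those letters.
using AnagramIndex = std::unordered_map<std::string, std::vector<std::string>>;

// Canonical permutation of a word: its letters in sorted order.
std::string signature(std::string word)
{
    std::sort(word.begin(), word.end());
    return word;
}

// Builds the index once, e.g. right after reading the dictionary file.
AnagramIndex build_index(const std::vector<std::string>& dictionary)
{
    AnagramIndex index;
    for (const auto& word : dictionary)
        index[signature(word)].push_back(word);
    return index;
}

// Returns every dictionary word that is a permutation of `scrambled`.
std::vector<std::string> unscramble(const AnagramIndex& index,
                                    const std::string& scrambled)
{
    const auto it = index.find(signature(scrambled));
    if (it == index.end())
        return {};
    return it->second;
}
```

The dictionary itself can still live in whatever container suits storage; the index is a separate structure on top of it, which is the split suggested above.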

count longest series of chars next to each other in a string in c++

I am a beginner in C++. I need to find the longest series of identical chars (that are next to each other) in a string entered by the user. How can I do that? I have been trying for hours to write the code and nothing has worked for me.
Sorry for my bad English.
Example:
"**....*****......" (string input)
I need to find the longest series of '*',
so the output will be 5.
Problem (I need to count only the chars between 'S' and 'F'):
int cnt=0,cnt1=0,cnt2=0;
string s;
cin>>s;
for(int j=0;j<s.length();j++){
if(s[j]=='S')
cnt1++;
if(s[j]=='F')
break;
while(true) {
if(s[j]=='*'&&cnt1==1)
cnt++;
if(cnt>cnt2)
cnt2=cnt;
if(s[j]=='F')
break;
}
cnt=0;
}
cout<<cnt2<<endl;
So I assume you are trying to find a series of equal chars (like "aaa").
A simple method, probably not the most efficient, is to count those chars in a loop and hold the index of the longest set of chars.
1. Get your string from stdin.
2. Create an index variable and a variable to count the current number of chars.
3. Count equal consecutive chars in a loop using one int, and record the index where the run begins (only if the count is greater than the largest one encountered so far in the loop).
4. After the loop, return the char (and the equal chars that follow it) starting at string[index].
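The steps above can be sketched roughly as follows, restricted to one target character as in the question's '*' example. The Run struct and the name longest_run are made up for this illustration, and the 'S'/'F' boundary handling from the question is left out.

```cpp
#include <cstddef>
#include <string>

// Start index and length of a run of equal characters.
struct Run
{
    std::size_t start;
    std::size_t length;
};

// Returns the longest run of `target` in `s`; length is 0 if `target`
// never occurs. On ties, the earliest run wins.
Run longest_run(const std::string& s, char target)
{
    Run best{ 0, 0 };
    std::size_t current_length{ 0 };
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] == target)
        {
            ++current_length;
            // Remember this run only if it beats the best one so far
            if (current_length > best.length)
                best = Run{ i + 1 - current_length, current_length };
        }
        else
        {
            // The run was interrupted, so start counting again
            current_length = 0;
        }
    }
    return best;
}
```

For the input "**....*****......" and target '*', this yields a run of length 5 starting at index 6.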

how can I compare the contents on an array in C or c++

Basically, what I'm looking for is a way to split the contents of an array (e.g. a phrase) into individual words to be compared,
so that when the user enters the data, I can tell how many words of each length there are.
void main(){
char array[30];
int length, cont, array_tokens;
printf("enter a phrase: ");
scanf("%[^\n]s", array); //or gets(array); which ever one you like
/*-------------------------
*******magic happens*******
---------------------------*/
for(int i=0; i<wordcount;i++)
printf("%d word(s) with %d letters was entered", array_tokens, cont);//some sort of
system("pause"); //counter which came
} //with the magic that
//happened before
so the result should be:
enter a phrase: user entered a phrase with similar lentgh words
1 word(s) with 1 letter was entered
2 word(s) with 4 letters was entered
1 word(s) with 5 letters was entered
3 word(s) with 6 letters was entered
1 word(s) with 7 letters was entered
Well, strtok() is one way to solve this problem. If you're aiming for efficiency (you should), you should write a loop which iterates over the letters in the sentence. Count the number of non-space characters encountered since the last white-space, and update the n-letter-word-frequency array whenever you get a white-space. I could've written the code to do this, but I don't want to deprive you of the sense of gratification when you write the working piece of code yourself. :P
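You can have an array count where count[i] is the number of words of length i.
Set every element of count to 0
Set t to 0
for(int i=0;i<given_phrase.length();i++)
{
    if(given_phrase[i]==' ')
    {
        if(t>0) count[t]++; // skip the empty "words" produced by repeated spaces
        t=0;
    }
    else
    {
        t++;
    }
}
if(t>0) count[t]++; // the last word is not followed by a space
For C, simply check for '\0' to detect the end of the phrase.
Then you can display the results based on count.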
You'll want to split up the sentence into tokens (in this case individual words) by a delimiter (in this case a space char ' '). There are various ways to do that. A good old-school C method is to use strtok as Robert Harvey suggested, which is provided by the C Standard Library. It works by passing in the string you want to split up, followed by the delimiter at which to split the string.
Then, to count the similar length words, try having integer variables for the different word lengths (or more simply in a sort of frequency array, each index representing a word of that length and its value being a count of their occurrences), looping over all the word tokens, incrementing the corresponding variable when a word of a length is encountered. To obtain the length of a C-Style string, try strlen.
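Here is a small C++ sketch of the tokenize-then-count idea. Note that it swaps strtok for std::istringstream and the frequency array for a std::map, so the phrase length does not need a fixed upper bound; the function name word_length_counts is hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Returns a map from word length to the number of words of that length.
// Words are whatever operator>> extracts, i.e. whitespace-separated tokens.
std::map<std::size_t, int> word_length_counts(const std::string& phrase)
{
    std::map<std::size_t, int> counts;
    std::istringstream stream{ phrase };
    std::string word;
    while (stream >> word)
        ++counts[word.size()]; // a missing key starts at 0
    return counts;
}
```

Because std::map keeps its keys ordered, iterating over the result prints the lengths in ascending order, which matches the layout of the desired output.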

Identifying related words with the use of Wildcards in c++

Ok, I've been trying to come up with a solution to my problem. The problem is:
Given a list of 3-letter words (the size of the list is irrelevant, I think), how can I identify those words in the list that differ from the first word in the list by at most one letter?
Say I have the word pat
then I would like to identify all the words in the list that are:
pa_ such as pay
p_t such as pot
_at such as rat
Is there a way to implement wildcards in c++?
[This may be more complex than the assignment requires but it avoids slow string comparisons and regular expressions]
Given that everything is a 3-letter word, you might consider representing each word as a 4-byte integer. For example, in "pat" the letter 'p' is 0x70 (ascii), 'a' is 0x61 and 't' is 0x74 so represent "pat" by the integer 0x706174. Do likewise for all the 3-letter words in the test list.
Next, the combination of tests required for 2 of the 3 letters to match (in same order) is:
p?t where the test is 0x70??74
?at where the test is 0x??6174
pa? where the test is 0x7061??
PS can I just add that the stackoverflow 'code sample' button which is suppose to reformat your selection as code is weird in Firefox. This 1-minute post has taken 20 minutes to format!
// assume a container words of 3-letter strings (e.g. std::vector<std::string>)
int word0 = calc_int_from_word(words[0]);
for (std::size_t ii = 1; ii < words.size(); ii++)
{
    int wordii = calc_int_from_word(words[ii]);
    // == binds tighter than &, so each masked value needs its own parentheses
    if ((wordii & 0xFFFF00) == (word0 & 0xFFFF00) ||
        (wordii & 0xFF00FF) == (word0 & 0xFF00FF) ||
        (wordii & 0x00FFFF) == (word0 & 0x00FFFF))
    {
        // words[ii] matches words[0] in at least two letters
    }
}
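For completeness, a self-contained sketch of the masking approach could look like this. The body of calc_int_from_word is my guess at what the answer intends (one byte per letter, first letter in the highest of the three bytes), and near_matches is a hypothetical wrapper that collects the hits.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Packs a 3-letter word into the low 24 bits of an integer, so that
// "pat" becomes 0x706174. Assumes word.size() == 3.
std::uint32_t calc_int_from_word(const std::string& word)
{
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(word[0])) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(word[1])) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(word[2]));
}

// Returns the words that match words[0] in at least two of the three
// positions. Entries that are not exactly 3 letters long are skipped.
std::vector<std::string> near_matches(const std::vector<std::string>& words)
{
    std::vector<std::string> result;
    if (words.empty() || words[0].size() != 3)
        return result;
    const std::uint32_t word0 = calc_int_from_word(words[0]);
    for (std::size_t ii = 1; ii < words.size(); ++ii)
    {
        if (words[ii].size() != 3)
            continue;
        const std::uint32_t wordii = calc_int_from_word(words[ii]);
        if ((wordii & 0xFFFF00) == (word0 & 0xFFFF00) ||  // pa?
            (wordii & 0xFF00FF) == (word0 & 0xFF00FF) ||  // p?t
            (wordii & 0x00FFFF) == (word0 & 0x00FFFF))    // ?at
        {
            result.push_back(words[ii]);
        }
    }
    return result;
}
```

Given the list pat, pay, pot, rot, cat, this keeps pay, pot and cat, and rejects rot because it differs from pat in two positions.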
try looking at the strcmp function and some of the other related functions listed here : http://www.cplusplus.com/reference/cstring/strcmp/
....or you could use regex like Tony the Lion just said.
EDIT:
....also is the strcspn function
http://www.cplusplus.com/reference/cstring/strcspn/
/* strcspn example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] = "fcba73";
char keys[] = "1234567890";
int i;
i = strcspn (str,keys);
printf ("The first number in str is at position %d.\n",i+1);
return 0;
}
Output:
The first number in str is at position 5.
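There's also strstr, which finds the first occurrence of one string inside another: strstr(str2, str1) returns a pointer to the first occurrence of str1 in str2, or a null pointer if there is none.
for instance:
const char * str1 = "po";
const char * str2 = "potlock";
const char * pch;
pch = strstr (str2,str1); // points to the start of str2, where "po" first occurs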
Second EDIT:
you could definitely do a loop with
if (strcspn (str1,str2) < strlen (str1)){
//at least one character of str2 occurs in str1
}else {
//no matches
}

Horspool algorithm for multiple occurrences of the same pattern

I've implemented the Horspool algorithm in C++ (following Introduction to the Design and Analysis of Algorithms by Anany Levitin, 2nd edition, p. 258) for finding the position of the first occurrence of a desired pattern in the text. However, I want to extend the algorithm to find multiple occurrences of the same pattern. Unfortunately, I got stuck on the latter implementation. You can see my code below:
The function calculates and returns the position of the first occurrence of a desired pattern in the text. The shift sizes are stored in the ShiftTable, which is indexed by the characters of a desired alphabet. Additionally, the integer counter is used for counting the total number of comparisons between the pattern's and the text's characters. The counter initially has a value of zero. How could I extend this to find multiple occurrences of the same pattern?
I attempted the following in the body of the main() function but it's NOT EFFICIENT although it works. If the first occurrence of the pattern is encountered, its position will be printed and the part of the text which ends with the first occurrence of the pattern will be erased. Moreover, the programme will check the remaining text for the pattern and so on.
int counter=0;
while ((position = Find(pattern,text,ShiftTable,counter)) != -1) {
cout << position << endl;
text = text.erase(0,position+m);
}
Any ideas?
Currently you always start at the beginning (i = m - 1). If you want to resume a previous search, just pass in the last position to start from.
In the following I’ve removed the counter variable – what’s the use of that anyway?
int Find(string pattern, string text, int *ShiftTable, int start = 0)
… and …
i = start + m - 1,
… and just call the code as follows:
while ((position = Find(pattern,text,ShiftTable,position)) != -1) {
cout << position << endl;
++position;
}
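Put together, a resumable search might look like the sketch below. The original Find is not shown in the question, so this is a reconstruction under assumptions: the shift table is a std::array indexed by byte value instead of an int* table, start is a std::size_t, and build_shift_table and FindAll are hypothetical helper names.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Horspool shift table with one entry per possible byte value.
std::array<std::size_t, 256> build_shift_table(const std::string& pattern)
{
    std::array<std::size_t, 256> table;
    table.fill(pattern.size());
    // Every pattern character except the last gets its distance to the end
    for (std::size_t j = 0; j + 1 < pattern.size(); ++j)
        table[static_cast<unsigned char>(pattern[j])] = pattern.size() - 1 - j;
    return table;
}

// Returns the index of the first match that begins at or after `start`,
// or -1 if there is none.
int Find(const std::string& pattern, const std::string& text,
         const std::array<std::size_t, 256>& shift_table, std::size_t start = 0)
{
    const std::size_t m = pattern.size();
    const std::size_t n = text.size();
    if (m == 0 || m > n)
        return -1;
    std::size_t i = start + m - 1; // text index aligned with the pattern's last char
    while (i < n)
    {
        std::size_t k = 0; // number of characters matched, right to left
        while (k < m && pattern[m - 1 - k] == text[i - k])
            ++k;
        if (k == m)
            return static_cast<int>(i + 1 - m);
        i += shift_table[static_cast<unsigned char>(text[i])];
    }
    return -1;
}

// Collects all (possibly overlapping) match positions by resuming the
// search one character after each match.
std::vector<int> FindAll(const std::string& pattern, const std::string& text)
{
    const auto shift_table = build_shift_table(pattern);
    std::vector<int> positions;
    int position = 0;
    while ((position = Find(pattern, text, shift_table,
                            static_cast<std::size_t>(position))) != -1)
    {
        positions.push_back(position);
        ++position;
    }
    return positions;
}
```

Since the text is never erased, the reported positions stay relative to the original text and no characters are copied between searches.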