Partial string search in C++ - c++

Let's say I have a vector of strings called info that reads names of websites from a file one by one sequentially.
This is what I have that searches for names, by the complete name only:
int linearSearch(vector <string> inputs, string search_key){
for (int x=0; x<inputs.size(); x++){
if (search_key==inputs[x]){
return x;
}
}
return -1;
}
Now what if I wanted to COUNT the number of websites with a particular word in them?
So if I had
apple.com
mac.com
macapple.com
applepie.com
potato.com
and I searched for "apple", it would return 3.

You can use string::find to perform a partial search of the string and store the value into a size_t variable.
Compare that to std::string::npos and increment count if they are not equal.
Here is a simple example using an array rather than a vector, so you can learn from it and make modifications as required.
#include <iostream>
#include <string>
using namespace std;
int main() {
    string inputs[2] = {"stack overflow", "stack exchange"};
    string search_key = "stack";
    int count = 0; // must be initialized before it is incremented
    for (size_t i = 0; i < sizeof(inputs)/sizeof(inputs[0]); i++)
    {
        // find returns string::npos when the substring is not found.
        // Any other return value means a match, so count is incremented.
        if (inputs[i].find(search_key) != string::npos)
            count++;
    }
    cout << count;
    return 0;
}
The output is 2, as expected, because the word stack occurs in both elements of the inputs array.
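The same idea applied to a vector, as in the question, might look like the sketch below. The function name countContaining is an assumption made for illustration; it is not part of the original post.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Counts how many entries of `inputs` contain `search_key` as a substring.
// The vector is taken by const reference to avoid copying it on every call.
std::size_t countContaining(const std::vector<std::string>& inputs,
                            const std::string& search_key)
{
    std::size_t count = 0;
    for (const std::string& entry : inputs) {
        // find returns std::string::npos only when there is no match.
        if (entry.find(search_key) != std::string::npos) {
            ++count;
        }
    }
    return count;
}
```

With the five site names from the question and the key "apple", this returns 3.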

Related

How do you break a long string into words and iterate through each character of word and if they match increment a char count using stringstream

int MatchString::comparsion(string newq, string oldq){
//breaks down the string into the smaller strings
stringstream s1(newq);
stringstream s2(oldq);
string new_words;
string old_words;
int word_count = 0;
while(s1>>new_words&&s2>>old_words){
for(int i = 0; i<new_words.length();i++){
for(int j = 0; j<old_words.length();j++){
char a = new_words[i];
char b = old_words[j];
if(a == b){
char_count++;
}
else{
j++;
}
}//end of 2nd for
}//end of for
}
return char_count;
}
I'm trying to write a function that takes two strings, breaks them into words, and then into chars. I then compare the chars and, if they are equal, increment char_count by 1; otherwise I increment j so that I compare the next char in string 2 with string 1. I need char_count later to calculate a percentage difference between the two strings, which is why I return it rather than doing that calculation in this method, where it would be a bit messy. However, when I cout the return value I get something completely wrong. I don't know what I'm doing wrong. Can you please help?
Your j++ under else in the for-loop is redundant, if I'm correct. The for-loop already increments j on every pass, so the extra j++ makes the inner loop skip a character. Let the for-loop advance its counter naturally; don't force it inside else{}.

Finding Randomly Order Substring in String

For the first part of my question, we are given a long input string and have to count the occurrences of a search term in it.
For example:
Input = AXBHAAGHXAXBH
Find = AXBH
Output = 2
This can be achieved with a string.find("term") loop, e.g.:
#include <string>
#include <iostream>
int main()
{
int occurrences = 0;
std::string::size_type pos = 0;
std::string inputz = "AXBHAAGHXAXBH";
std::string target = "AXBH";
while ((pos = inputz.find(target, pos )) != std::string::npos) {
++ occurrences;
pos += target.length();
}
std::cout << occurrences << std::endl;
}
However, I am not sure how to do the second part, which needs to take the random structure into account:
Random structure refers to any ordering of the characters of our find. Important note: the characters of each occurrence are always grouped together, but can appear in a different order.
I do not want to enumerate cases, because some search terms are too long (e.g. Find = AXBHNMB would have too many cases to consider), and I would prefer a more general approach.
E.g. if AXBH is the find, then AXHB is also an acceptable occurrence.
A proper example:
Input = AXBHAAGHXAXBH**ABHX**NBMN**AHBX**
Find = AXBH
Output = 4
I would prefer it if you could code it for the given example, with an explanation (or a link to one) of any new function you use.
You are correct that checking all permutations would take a lot of time. Fortunately, we don't need to do that. We can store the string to find in a std::map<char, int> or std::unordered_map<char, int>, then grab substrings from the string to search through, convert those into the same type of map, and see if the maps are equal. This lets us compare without caring about the order; it just makes sure we have the correct number of each character. So we would have something like
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_map>
int main()
{
    std::string source = "AHAZHBCHZCAHAHZEHHAAZHBZBZHHAAZAAHHZBAAAAHHHHZZBEWWAAHHZ ";
    std::string string_to_find = "AAHHZ";
    int counter = 0;
    // build map of the characters to find
    std::unordered_map<char, int> to_find;
    for (auto e : string_to_find)
        ++to_find[e];
    // loop through the string, grabbing string_to_find-sized chunks and comparing;
    // the i + size <= source.size() bound also checks the final window and
    // avoids unsigned underflow when source is shorter than string_to_find
    for (std::size_t i = 0; i + string_to_find.size() <= source.size();)
    {
        std::unordered_map<char, int> part;
        for (std::size_t j = i; j < string_to_find.size() + i; ++j)
            ++part[source[j]];
        if (to_find == part)
        {
            // match: count it and jump past the matched chunk
            ++counter;
            i += string_to_find.size();
        }
        else
        {
            ++i;
        }
    }
    std::cout << counter;
}
A naive approach is to iterate over the given string, looking at each chunk of the target's length.
For each chunk, we sort the characters and compare the result with the sorted target string.
#include <string>
#include <iostream>
#include <algorithm>
int main()
{
int occurrences = 0;
std::string::size_type pos = 0;
std::string inputz = "AXBHAAGHXAXBH**ABHX**NBMN**AHBX**";
std::string target = "AXBH";
std::sort(target.begin(), target.end());
int inputz_length = inputz.length();
int target_length = target.length();
int i=0;
for(i=0; i<=inputz_length-target_length; i++)
{
std::string sub = inputz.substr(i, target_length);
std::sort(sub.begin(), sub.end());
if (target.compare(sub) == 0)
{
std::cout << i<<"-->"<< target<<"-->" << sub << std::endl;
occurrences++;
i += target_length - 1; // the loop's i++ then lands on the first character after the match
}
}
std::cout << occurrences << std::endl;
return 0;
}
Output:
0-->ABHX-->ABHX
9-->ABHX-->ABHX
15-->ABHX-->ABHX
27-->ABHX-->ABHX
4
Extra function: std::sort from the <algorithm> header.
Time complexity: O(n * m log m) for an input of length n and a target of length m, since every window is copied and sorted.
One solution is to find a canonical representation for both the search string and a substring. Two fast approaches are possible.
1) Sort the substring.
2) Calculate a histogram of letters.
Option 2 can be calculated incrementally by incrementing histogram bins for the incoming letters and decrementing the bins for outgoing letters in the search window.
While updating a histogram bin, one can also check whether that particular update toggles the overall match. Here matches counts the bins in which h equals target, including the bins of letters that do not occur in the search string (their target count is 0):
// adding the incoming letter
if (h[incoming] == target[incoming]) matches--;
if (++h[incoming] == target[incoming]) matches++;
// removing the outgoing letter
if (h[outgoing] == target[outgoing]) matches--;
if (--h[outgoing] == target[outgoing]) matches++;
// every bin agrees, so the window is a permutation of the search string
if (matches == number_of_bins) occurrences++;
Then the overall complexity becomes O(n).
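The incremental histogram idea can be sketched as a complete function. This is only one possible reading of it: the name countPermutedOccurrences, the 256-bin histogram over byte values, and the next_allowed bookkeeping (which skips windows overlapping an already counted match, mirroring the other answers) are assumptions, not part of the original answer.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Counts non-overlapping windows of `text` that are permutations of `pattern`,
// scanning left to right with a sliding character histogram.
std::size_t countPermutedOccurrences(const std::string& text,
                                     const std::string& pattern)
{
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m == 0 || n < m) return 0;

    std::array<int, 256> want{};  // histogram of the pattern
    std::array<int, 256> have{};  // histogram of the current window
    for (unsigned char c : pattern) ++want[c];

    // matches = number of bins where have == want (bins with want == 0 count too).
    std::size_t matches = 0;
    for (std::size_t b = 0; b < 256; ++b)
        if (have[b] == want[b]) ++matches;

    // Changes one bin by delta and keeps `matches` consistent.
    auto adjust = [&](unsigned char c, int delta) {
        if (have[c] == want[c]) --matches;
        have[c] += delta;
        if (have[c] == want[c]) ++matches;
    };

    std::size_t occurrences = 0;
    std::size_t next_allowed = 0;  // first start index not covered by a counted match
    for (std::size_t end = 0; end < n; ++end) {
        adjust(text[end], +1);                    // incoming letter
        if (end >= m) adjust(text[end - m], -1);  // outgoing letter
        if (end + 1 >= m) {
            const std::size_t start = end + 1 - m;
            if (matches == 256 && start >= next_allowed) {
                ++occurrences;
                next_allowed = start + m;
            }
        }
    }
    return occurrences;
}
```

For the question's example input AXBHAAGHXAXBH**ABHX**NBMN**AHBX** and find AXBH, this returns 4. Each character is added to and removed from the window once, so the work is linear in the input length.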

Find the most frequently word using hashmaps c++

I need to find the most frequently occurring word and return that value. I must use hash maps, and the function takes a file name. This is what I've done so far, but I'm very confused.
int most_frequent_word(string filename)
{
string words;
ifstream in(filename.c_str());
unordered_map<string, int> word_map;
while(in >> words)
{
for(int i = 0; i < 100; i++)
{
word_map[words[i]]++;
}
}
return words;
}
Any help would be appreciated. Thanks!
There are several issues in your code that might cause it not to work as expected.
The first is the for i loop. Why would you need that loop at all? You need to count words, so leave it like this:
while(in >> words)
{
word_map[words]++;
}
Second, rename words to word, since in >> words actually reads a single word.
The third is the return statement. You cannot return a string when the function is declared to return int.
However, there is nothing to return yet, because so far we only know how many times each word occurred. Run a loop to find the max value.
int result = 0;
for(unordered_map<string, int>::iterator it = word_map.begin(); it != word_map.end(); it++)
result = max(result, it->second);
return result;
Here word_map consists of pairs of a word and its number of occurrences. We need to iterate over all of these pairs looking for max occurrences. To do this we use iterator it.
I'm also confused!
for(int i = 0; i < 100; i++)
{
word_map[words[i]]++;
}
What are you doing here? Where does the 100 come from? Why do you care for single letters of your words (which is what words[i] gets you)?
If I understand your task correctly, wouldn't it suffice to
++word_map[words];
instead?
Also, why do you return words? It's a string, and your function should return an int. Instead, find the largest value in your map and you're done.
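Putting both answers together, a minimal sketch could look like the following. It reads from any std::istream so it can be tried without a file; the name mostFrequentCount is an assumption, and the question's most_frequent_word(filename) could open a std::ifstream and pass it in.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>  // only for the std::istringstream usage shown at the end
#include <string>
#include <unordered_map>

// Returns the highest occurrence count of any whitespace-separated word
// read from `in`, or 0 if the stream contains no words.
int mostFrequentCount(std::istream& in)
{
    std::unordered_map<std::string, int> word_map;
    std::string word;
    while (in >> word) {
        ++word_map[word];  // operator[] creates the entry with 0 on first use
    }

    int result = 0;
    for (const auto& entry : word_map) {
        result = std::max(result, entry.second);
    }
    return result;
}

// Usage with an in-memory stream instead of a file:
//   std::istringstream demo("a b a c a b");
//   int n = mostFrequentCount(demo);  // "a" occurs 3 times, so n == 3
```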

How to apply a sort function to a string.find( ) to print results alphabetically?

I have a program that reads a text file into a struct (members- str author and str title) and gives the user the option to display all records in the file, search for an author, or search for a title. Now I need to integrate a sort function into this process, so that when the user searches by author or title the results are listed alphabetically.
The following sort function works perfectly with my showAll function, but I have absolutely no idea how I can modify my search functions to alphabetize the results.
Sort function code:
void sortByTitle(int counter){
//variable
int a, b, minIndex;
string temp;
for (a = 0; a < counter; a++){
minIndex = a;
for (b = a + 1; b < counter - 1; b++){
if (books[b].title < books[minIndex].title){
minIndex = b;
}
}
if(minIndex != a) {
temp = books[a].title;
books[a].title = books[minIndex].title;
books[minIndex].title = temp;
cout << books[a].title << endl;
}
}
}
And this is my current title search function:
int showBooksByTitle(int counter, string bookTitle){
int recordCount = 0;
//find the user-input string inside bookTitle
for (int a = 0; a < counter; a++){ //loop through the whole file
if (books[a].title.find(bookTitle) != string::npos){
//print a matching record
cout << books[a].title << " " << "(" << books[a].author << endl;
//keep track of the number of matching records
recordCount++;
}
}
return recordCount;
}
My assignment specifies these function headers, that the search functions return the number of records found, and that the data be read into a struct (rather than a vector). So I have to leave those aspects as they are.
How can I apply the selection sort to the search function so that I can print the records in order?
Any help would be much appreciated!
Here you set
smallestIndex = index;
then you check if
books[index].title < books[smallestIndex].title
It appears you are trying to implement a selection sort.
Furthermore:
I don't see the need for declaring loc outside the for-loop; say for(int loc....
There is an abundance of sorting algorithms on Wikipedia, and you can use std::sort.
In the second line you shadow index with the same variable in the for-loop.
I don't know how you store books, but if you would use std::vector you don't have to pass counter every time. You can just use books.size().
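One way to get alphabetized search results is to collect the matching records first and sort only those before printing. The sketch below uses std::vector and std::sort, which the assignment in the question may not allow; the Book struct and the name findByTitleSorted are hypothetical stand-ins for the question's own types and function headers.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the struct described in the question.
struct Book {
    std::string title;
    std::string author;
};

// Returns the books whose title contains `key`, sorted alphabetically by title.
// The original collection is left untouched.
std::vector<Book> findByTitleSorted(const std::vector<Book>& books,
                                    const std::string& key)
{
    std::vector<Book> matches;
    for (const Book& b : books) {
        if (b.title.find(key) != std::string::npos) {
            matches.push_back(b);
        }
    }
    // Whole records are sorted, so each title stays paired with its author.
    std::sort(matches.begin(), matches.end(),
              [](const Book& a, const Book& b) { return a.title < b.title; });
    return matches;
}
```

The number of records found is then matches.size(), and printing the returned vector in order gives the alphabetized list. With a plain array and a selection sort, the same collect-then-sort order of operations would apply.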

C++ Dynamic Array Inputs

I am using two dynamic arrays to read from a file. They keep track of each word and the number of times it appears. If a word has already appeared, I must update its count in one array and not add it to the other array again. However, I am getting blank entries in my array whenever I meet a duplicate. I think it's because my pointer continues to advance when it shouldn't, and I do not know how to combat this. The only workaround I have is to use a continue when I print out the results if the array content is empty: if (*(words + i) == "") continue;. This basically ignores those blanks in the array, but I think that is messy. I just want to figure out how to move the pointer back in this method. words and frequency are my dynamic arrays.
I would like guidance in what my problem is, rather than solutions.
I have now changed my outer loop to be a while loop, and only increment when I have found the word. Thank you WhozCraig and poljpocket.
Now this occurs.
Instead of incrementing your loop variable [i] every loop, you need to only increment it when a NEW word is found [i.e. not one already in the words array].
Also, you're wasting time in your inner loop by looping through your entire words array, since words will only exist up to index i.
int idx = 0;
bool isFound = false; // reset to false at the end of every pass
while (file >> hold && idx < count) {
if (!valid_word(hold)) {
continue;
}
// You don't need to check past idx because you
// only have <idx> words so far.
for (int i = 0; i < idx; i++) {
if (toLower(words[i]) == toLower(hold)) {
frequency[i]++;
isFound = true;
break;
}
}
if (!isFound) {
words[idx] = hold;
frequency[idx] = 1;
idx++;
}
isFound = false;
}
First, to address your code, this is what it should probably look like. Note how we only increment i as we add words, and we only ever scan the words we've already added for duplicates. Note also how the first pass will skip the j-loop entirely and simply insert the first word with a frequency of 1.
void addWords(const std::string& fname, int count, std::string *words, int *frequency)
{
std::ifstream file(fname);
std::string hold;
int i = 0;
while (i < count && (file >> hold))
{
int j = 0;
for (; j<i; ++j)
{
if (toLower(words[j]) == toLower(hold))
{
// found a duplicate at j
++frequency[j];
break;
}
}
if (j == i)
{
// didn't find a duplicate
words[i] = hold;
frequency[i] = 1;
++i;
}
}
}
Second, to really address your code, this is what it should actually look like:
#include <iostream>
#include <fstream>
#include <map>
#include <string>
//
// Your implementation of toLower() goes here.
//
typedef std::map<std::string, unsigned int> WordMap;
WordMap addWords(const std::string& fname)
{
WordMap words;
std::ifstream inf(fname);
std::string word;
while (inf >> word)
++words[toLower(word)];
return words;
}
If it isn't obvious by now how a std::map<> makes this task easier, it never will be.
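The map-based version leaves toLower() to the reader. One possible implementation, assuming plain single-byte text and the default C locale, is sketched here; the original post does not show its own version.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a lower-cased copy of `s`.
// std::tolower must receive a value representable as unsigned char,
// so each character is converted before the call.
std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}
```

Taking the argument by value gives the function its own copy to modify, so callers such as ++words[toLower(word)] keep their original string.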
Check out SEEK_CUR (the origin constant passed to fseek) if you want to set the cursor back.
The problem is a logical one. Consider several situations:
1. Your algorithm does not find the current word. It is inserted at position i of your arrays.
2. Your algorithm does find the word. The frequency of the word is incremented along with i, which leaves you with blank entries in your arrays whenever there's a word which is already present.
To conclude, 1 works as expected but 2 doesn't.
My advice is that you don't rely on for loops to traverse the string but use a "get-next-until-end" approach which uses a while loop. With this, you can track your next insertion point and thus get rid of the blank entries.
int currentCount = 0;
while (file)
{
// your inner for loop
if (!found)
{
*(words + currentCount) = hold;
*(frequency + currentCount) = 1;
currentCount++;
}
}
Why not use a std::map?
void collect( std::string name, std::map<std::string,int> & freq ){
std::ifstream file;
file.open(name.c_str(), std::ifstream::in );
std::string word;
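// Testing the extraction itself stops on any failure (including a file
// that did not open) and still counts a last word not followed by whitespace.
while( file >> word ){ // add toLower
freq[word]++;
}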
file.close();
}
The problem with your solution is the use of count in the inner loop where you look for duplicates. You'll need another variable, say nocc, initially 0, used as limit in the inner loop and incremented whenever you add another word that hasn't been seen yet.