Count occurrences of a substring in a string


First of all, I know there are many duplicates of this question, but I have tried and tried and none of them have solved my issue.
I have the following string:
string s = "asdfqasdfp";
I need to loop through the string and find which substring appears more than once, so in this case it's
asdf
I have written the following code, but I do not know why it doesn't work. I start from the full string and shorten it by one character at a time. I should get an occurrence count of 2.
int t = s.length();
for (int i = 0; i < s.length(); i++) {
string str = s.substr(0, t);
int occurence = 0;
size_t start = 0;
while ((start = s.find(str, start)) != string::npos) {
++occurence;
start += str.length();
}
if (occurence > 1) {
cout << occurence;
}
else {
--t;
}
}
EDIT: I only want the largest substring that appears more than once, in this case
"asdf".

Here's a fixed version of your code, including Daniel's suggestions (thanks, Daniel!):
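#include <iostream>
#include <string>

int main() {
    std::string s = "asdfqasdfp";
    // try substring lengths t from longest to shortest
    for (size_t t = s.length(); t >= 1; --t) {
        // every start position i that leaves room for t characters
        for (size_t i = 0; (i + t) <= s.length(); i++) {
            std::string str = s.substr(i, t);
            size_t occurence = 0;
            size_t start = 0;
            // count non-overlapping occurrences of str in s
            while ((start = s.find(str, start)) != std::string::npos) {
                ++occurence;
                start += str.length();
            }
            if (occurence > 1) {
                // the first repeat found at this length is a largest one
                std::cout << str << " " << occurence << std::endl;
                return 0;
            }
        }
    }
    return 0;
}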
You need to:
1. Loop over t, and decrement it at the end of the i loop rather than inside it.
2. Limit i to s.length() - t, so that there's always a t-length string to take as str.
3. Fix your substr to start at i, not 0.
You can also stop the first time you find a duplicate, since it will be one of the largest duplicates (e.g. if there are two different repeated substrings of length 4, it will find only one of them, but it doesn't sound like you need both). You should also use size_t throughout as your integer type, since that's what the string functions used here work with.
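The same search can be wrapped in a reusable function, which makes it easier to try on other inputs. This is a minimal sketch, not the answer's code: the name longestRepeated and the std::pair return type are illustrative assumptions. Like the loop above, it counts non-overlapping occurrences (so in "aaaa" it reports "aa", not the overlapping "aaa"), and it returns an empty string with a count of 0 when nothing repeats.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: returns the longest substring of s that occurs at
// least twice without overlapping, together with its occurrence count.
// Returns {"", 0} if no substring repeats.
std::pair<std::string, std::size_t> longestRepeated(const std::string& s)
{
    // Try lengths from longest to shortest; the first hit is a longest one.
    for (std::size_t t = s.length(); t >= 1; --t) {
        for (std::size_t i = 0; i + t <= s.length(); ++i) {
            const std::string str = s.substr(i, t);
            std::size_t occurrences = 0;
            std::size_t start = 0;
            // Count non-overlapping occurrences by skipping past each match.
            while ((start = s.find(str, start)) != std::string::npos) {
                ++occurrences;
                start += str.length();
            }
            if (occurrences > 1) {
                return {str, occurrences};
            }
        }
    }
    return {"", 0};
}
```

For the question's input, longestRepeated("asdfqasdfp") yields "asdf" with a count of 2.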

Looks ok. Do you need an << endl; to see output in your terminal?

Related

Cannot find a logical sense

I already studied C++ in school, and over the last few days I have been doing the beginner C++ course on Codecademy. On Codecademy there is an exercise in which I have to identify palindromes and return true or false. I wasn't able to solve it, so I looked at the solution, and it was:
#include <iostream>
#include <string>
// Define is_palindrome() here:
bool is_palindrome(std::string text) {
std::string reversed_text = "";
for (int i = text.size() - 1; i >= 0; i--) {
reversed_text += text[i];
}
if (reversed_text == text) {
return true;
}
return false;
}
int main() {
std::cout << is_palindrome("madam") << "\n";
std::cout << is_palindrome("ada") << "\n";
std::cout << is_palindrome("lovelace") << "\n";
}
My only doubt is with this line:
for (int i = text.size() - 1; i >= 0; i--) {
reversed_text += text[i];
I know it has to do with index values, but I can't understand why it has a - 1.
Could somebody explain this to me?
Thanks in advance to whoever reads this post. I'm sorry for my English or my poor use of Stack Overflow; I'm Italian and this is my first time using this site.

for (int i = text.size() - 1; i >= 0; i--) {
reversed_text += text[i];
text is the string that the function receives as its argument. size() is a member function that returns the number of characters in the string, so in our test cases text.size() will return
5 for madam
3 for ada
8 for lovelace
If you think of each string as an array of exactly that size, the index ranges become
0-4 for madam
0-2 for ada
0-7 for lovelace
That's why text.size() - 1 is used as the starting index of the loop: text.size() returns the length of the string, and subtracting 1 gives the index of its last character.
So behind the scenes, your loops will look something like this:
// madam
for (int i = 4; i >= 0; i--) {
}
// ada
for (int i = 2; i >= 0; i--) {
}
// lovelace
for (int i = 7; i >= 0; i--) {
}
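One caveat not covered above: text.size() returns an unsigned std::size_t, so the int loop relies on converting text.size() - 1 to int. A common alternative keeps the index unsigned and decrements it inside the loop condition. A minimal sketch (reverse_copy is a hypothetical helper name, not part of the Codecademy solution):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds the reversed copy with an unsigned index.
// The condition i-- > 0 tests i and then decrements it, so the body sees
// size() - 1, size() - 2, ..., 0: exactly the valid indices, last to first.
std::string reverse_copy(const std::string& text)
{
    std::string reversed_text;
    for (std::size_t i = text.size(); i-- > 0;) {
        reversed_text += text[i];
    }
    return reversed_text;
}
```

For an empty string the condition is false on the first test, so the loop body never runs.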
I hope this clears up your confusion.
Thanks,

i know it has to do with index values but i can't understand why it has a -1.
If a string is n characters long, the characters in it are indexed from 0 to n−1.
Since the loop works with characters from the end of the string to the beginning, it starts with index text.size() - 1.
However, the solution you have shown is nominally inefficient. There is no reason to make a reversed copy of the string. It suffices merely to test whether each character in the first half of the string equals the character in the reflected position:
bool is_palindrome(const std::string& text)
{
    std::size_t e = text.size();
    // compare each character in the first half with its mirror from the end
    for (std::size_t i = 0; i < e / 2; ++i)
        if (text[i] != text[e - 1 - i])
            return false;
    return true;
}
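The same half-string comparison can also be expressed with the standard algorithm std::equal, comparing the first half of the string against a reverse iterator that starts at the last character. This is an alternative sketch rather than the answer's code; is_palindrome_equal is a hypothetical name.

```cpp
#include <algorithm>
#include <string>

// Compares text[0 .. n/2) with the characters read backwards from the end,
// i.e. text[i] against text[n-1-i], just like the explicit loop in this answer.
bool is_palindrome_equal(const std::string& text)
{
    return std::equal(text.begin(), text.begin() + text.size() / 2,
                      text.rbegin());
}
```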

If using a for loop to reverse the string is confusing, you could also use the std::reverse function from <algorithm>:
std::string reversed_text = text;
std::reverse(reversed_text.begin(), reversed_text.end());
This reverses the whole copy reversed_text in place and achieves the same result more simply.
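Another option, not in the original answer, is to build the reversed copy directly with std::string's iterator-range constructor and reverse iterators, so no separate reversal step is needed. A minimal sketch (is_palindrome_rev is a hypothetical name):

```cpp
#include <string>

// Constructs reversed_text from text read back to front (rbegin() to rend()),
// then compares it with the original, as the Codecademy solution does.
bool is_palindrome_rev(const std::string& text)
{
    const std::string reversed_text(text.rbegin(), text.rend());
    return reversed_text == text;
}
```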

Is there a struct function for finding the longest and not-repeating length substring within a string?

The aim of the function is to find the longest substring without repeating characters, so I need to find the start position and the length of that substring. What I'm struggling with is that the time complexity must be O(n), so I cannot use nested for loops to check whether each letter is repeated.
I created a struct and a function like this, but I don't know how to continue:
#include <cstddef>
#include <cstring>

struct Answer {
    int start;
    int length;
};

Answer findsubstring(char *string) {
    Answer sub = {0, 0};
    for (std::size_t i = 0; i < std::strlen(string); i++) {
        // not sure how to continue here
    }
    return sub;
}
For example, if the input is HelloWorld, the output should be World. The length is 5.
If the input is abagkfleoKi, then the output is bagkfleoKi. The length is 10.
Also, if two substrings have the same length, pick the latter one.

Use a std::unordered_map<char, size_t> to store, for each char, the index just past its last occurrence.
Keep track of the best match so far as well as the match you are currently testing. Iterating through the chars of the input results in 2 cases you need to handle:
1. The char already occurred inside the current potential match, so you must move the start of the match just past that earlier occurrence to avoid having the char twice. Before moving it, update the answer with the match ending just before the current char, if that's better than the current answer.
2. Otherwise: just update the map.
#include <cstddef>
#include <iostream>
#include <string_view>
#include <unordered_map>

// Answer is the struct from the question.
void printsubstring(const char* input)
{
    // for each char: the index just past its last occurrence
    std::unordered_map<char, size_t> lastOccurances;
    Answer answer{ 0, 0 };
    size_t currentPos = 0;
    size_t currentStringStart = 0;
    char c;
    while ((c = input[currentPos]) != 0)
    {
        auto entry = lastOccurances.insert({ c, currentPos + 1 });
        if (!entry.second)
        {
            // c was seen before; does that occurrence lie inside the current match?
            if (currentStringStart < entry.first->second)
            {
                // need to move the start of the potential answer
                // -> check, if the match up to the char before the current char was better
                // (>= so that, of two equally long matches, the later one wins)
                if (currentPos - currentStringStart >= static_cast<size_t>(answer.length))
                {
                    answer.start = static_cast<int>(currentStringStart);
                    answer.length = static_cast<int>(currentPos - currentStringStart);
                }
                // the start must move even when the old match was not better
                currentStringStart = entry.first->second;
            }
            entry.first->second = currentPos + 1;
        }
        ++currentPos;
    }
    // check the match ending at the end of the string
    if (currentPos - currentStringStart >= static_cast<size_t>(answer.length))
    {
        answer.start = static_cast<int>(currentStringStart);
        answer.length = static_cast<int>(currentPos - currentStringStart);
    }
    std::cout << answer.start << ", " << answer.length << std::endl;
    std::cout << std::string_view(input + answer.start, answer.length) << std::endl;
}
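If you can assume 8-bit chars, the hash map can be replaced by a plain table indexed by the character value, which keeps the single pass O(n) without hashing. The sketch below is a variation on the approach above, not its original code: the function name longest_unique and the choice to return the Answer instead of printing it are assumptions, and the question's Answer struct is repeated so the block compiles on its own. Ties go to the later substring, as the question asks.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <cstring>

struct Answer {
    int start;
    int length;
};

// For each character value, lastPlusOne stores the index just past its last
// occurrence (0 = not seen yet). One left-to-right pass: O(n) time.
Answer longest_unique(const char* input)
{
    std::array<std::size_t, 1 << CHAR_BIT> lastPlusOne{};  // zero-initialized
    std::size_t windowStart = 0;
    Answer best{0, 0};
    const std::size_t n = std::strlen(input);
    for (std::size_t pos = 0; pos < n; ++pos) {
        const auto c = static_cast<unsigned char>(input[pos]);
        // If c already occurs inside the current window, restart the window
        // just past that earlier occurrence.
        if (lastPlusOne[c] > windowStart) {
            windowStart = lastPlusOne[c];
        }
        lastPlusOne[c] = pos + 1;
        // The window is now [windowStart, pos]; >= lets a later tie win.
        const std::size_t len = pos + 1 - windowStart;
        if (len >= static_cast<std::size_t>(best.length)) {
            best.start = static_cast<int>(windowStart);
            best.length = static_cast<int>(len);
        }
    }
    return best;
}
```

For HelloWorld this returns start 5 and length 5 ("World"); for abagkfleoKi it returns start 1 and length 10.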

I'll outline one possible solution.
1. You'll need two loops: one pointing at the start of the substring and one pointing at the end.
auto stringlen = std::strlen(string);
for(size_t beg = 0; beg < stringlen - sub.length; ++beg) {
// See point 2.
for(size_t end = beg; end < stringlen; ++end) {
// See point 3.
}
}
2. Create a "blacklist" of characters already seen in the substring.
bool blacklist[1 << CHAR_BIT]{}; // zero initialized
3. Check if the current end character is already in the blacklist and break out of the loop if it is; otherwise, put it in the blacklist.
if(blacklist[ static_cast<unsigned char>(string[end]) ]) break;
else {
blacklist[ static_cast<unsigned char>(string[end]) ] = true;
// See point 4.
}
4. Check if the length of the current substring (end - beg + 1) is greater than the longest you currently have (sub.length). If it is longer, store sub.start = beg and sub.length = end - beg + 1.
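Putting the four points together gives something like the sketch below. It follows the outline, with the question's Answer struct and findsubstring name (the parameter is taken as const char* here). The outer-loop bound and the >= tie-break, which makes the later substring win as the question requires, are assumptions of this sketch. Because the inner loop stops at the first repeated character, it runs at most 1 << CHAR_BIT times per start position.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>

struct Answer {
    int start;
    int length;
};

Answer findsubstring(const char* string)
{
    Answer sub{0, 0};
    const std::size_t stringlen = std::strlen(string);
    // Point 1: beg marks the start of the candidate; starts that cannot give
    // a substring at least as long as sub are skipped.
    for (std::size_t beg = 0; beg + static_cast<std::size_t>(sub.length) <= stringlen; ++beg) {
        // Point 2: characters already seen in string[beg..end], fresh for each beg.
        bool blacklist[1 << CHAR_BIT]{};
        // Point 1: end marks the last character of the candidate.
        for (std::size_t end = beg; end < stringlen; ++end) {
            const auto c = static_cast<unsigned char>(string[end]);
            // Point 3: stop as soon as a character repeats.
            if (blacklist[c]) break;
            blacklist[c] = true;
            // Point 4: keep the longest; >= prefers the later of equal lengths.
            const std::size_t len = end - beg + 1;
            if (len >= static_cast<std::size_t>(sub.length)) {
                sub.start = static_cast<int>(beg);
                sub.length = static_cast<int>(len);
            }
        }
    }
    return sub;
}
```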

Why is my brute force substring search returning extra counts?

I'm doing some work timing different algorithms; however, my brute-force implementation, which I have found on numerous sites, sometimes returns more results than, say, the Notepad++ or VS Code search does. I'm not sure what I am doing wrong.
The program opens a .txt file containing a DNA strand of length 10,000,000 and counts the occurrences of the string passed in via the command line.
Algorithm:
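#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
using namespace std;

int main(int argc, char *argv[]) {
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " <pattern>" << endl;
        return 1;
    }
    // read in dna strand
    ifstream file("dna.txt");
    string dna((istreambuf_iterator<char>(file)), istreambuf_iterator<char>());
    int dnaLength = dna.length();
    cout << "DNA Strand Length: " << dnaLength << endl;
    string pat = argv[1];
    cout << "Pattern: " << pat << endl;
    // algorithm: try every start position i, compare pat character by character
    int M = pat.length();
    int N = dnaLength;
    int localCount = 0;
    for (int i = 0; i <= N - M; i++) {
        int j;
        for (j = 0; j < M; j++) {
            if (dna.at(i + j) != pat.at(j)) {
                break;
            }
        }
        if (j == M) {  // all M characters matched
            localCount++;
        }
    }
    // print the result
    cout << "Count: " << localCount << endl;
    return 0;
}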

The difference might be because your algorithm also counts overlapping results, while a quick check with Notepad++ shows that it does not.
Example:
Let dna be "FooFooFooFoo"
And your pattern "FooFoo"
What result do you expect? Notepad++ shows 2 (one starting at position 1, the second at position 7, right after the first).
Your algorithm will find 3 (at positions 1, 4 and 7).
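The difference can be reproduced directly. The sketch below uses the same character-by-character check as the question's loop but returns the 0-based start positions, so you can see which matches each variant counts. The name match_positions and its overlap flag are hypothetical; the non-overlapping mode mimics the Notepad++ behavior described above by jumping past a match instead of moving on by one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the 0-based start positions of pat in text.
// overlap = true moves on by 1 after a match (the question's loop);
// overlap = false resumes the scan just past the whole match.
std::vector<std::size_t> match_positions(const std::string& text,
                                         const std::string& pat, bool overlap)
{
    std::vector<std::size_t> positions;
    const std::size_t N = text.size(), M = pat.size();
    if (M == 0 || M > N) return positions;
    std::size_t i = 0;
    while (i <= N - M) {
        std::size_t j = 0;
        while (j < M && text[i + j] == pat[j]) ++j;  // compare character by character
        if (j == M) {
            positions.push_back(i);
            i += overlap ? 1 : M;
        } else {
            ++i;
        }
    }
    return positions;
}
```

With "FooFooFooFoo" and "FooFoo", the overlapping mode returns 0, 3, 6 (positions 1, 4 and 7 when counting from 1) and the non-overlapping mode returns 0 and 6.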

In your algorithm, the index i increases by 1 on every iteration. This may cause double counting for some search patterns. For example, search for ABAB in the text ... ABABABABABAB .... Your method may count 5 occurrences, whereas the answer would be 3 if no character is allowed to be counted twice. Which answer do you want?
To avoid double counting, you can rewrite the loop over i as a while loop:
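int i = 0;
int j;
while (i <= N - M)   // every start position that leaves room for M characters
{
    for (j = 0; j < M; j++) {
        if (dna.at(i + j) != pat.at(j)) {
            break;
        }
    }
    if (j == M) {
        localCount++;
        i += M;   // skip past the whole match, so matches cannot overlap
    }
    else ++i;
}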
Or, you can use the member function std::string::find(const std::string& str, size_type pos = 0). The first argument is the pattern to look for, and the second is the position at which to start the search:
std::string::size_type pos = dna.find(pat); // initial search starts from pos = 0
int count = 0;
while (pos != std::string::npos) { // while a match was found
    ++count;
    pos = dna.find(pat, pos + M); // resume the search just past this match
}
Implementing both methods lets you cross-check one result against the other.

Search a string for all occurrences of a substring in C++

Write a function countMatches that searches for the substring in the given string and returns how many times the substring appears in the string.
I've been stuck on this for a while now (6+ hours) and would really appreciate any help I can get. I would really like to understand this better.
int countMatches(string str, string comp)
{
int small = comp.length();
int large = str.length();
int count = 0;
// If string is empty
if (small == 0 || large == 0) {
return -1;
}
// Increment i over string length
for (int i = 0; i < small; i++) {
// Output substring stored in string
for (int j = 0; j < large; j++) {
if (comp.substr(i, small) == str.substr(j, large)) {
count++;
}
}
}
cout << count << endl;
return count;
}
When I call this function from main with countMatches("Hello", "Hello"); I get an output of 5, which is completely wrong, as it should return 1. I just want to know what I'm doing wrong here so I don't repeat the mistake and actually understand what I am doing.

I figured it out. I did not need a nested for loop, because I only need to compare the second string (comp) with each substring of the first string (str). That also removed the need to take a substring of comp. So, for those interested, it should have looked like this:
int countMatches(string str, string comp)
{
int small = comp.length();
int large = str.length();
int count = 0;
// If string is empty
if (small == 0 || large == 0) {
return -1;
}
// Increment i over string length
for (int i = 0; i < large; i++) {
// Output substring stored in string
if (comp == str.substr(i, small)) {
count++;
}
}
cout << count << endl;
return count;
}

The usual approach is to search in place:
std::string::size_type pos = 0;
int count = 0;
for (;;) {
    pos = str.find(comp, pos);
    if (pos == std::string::npos)
        break;
    ++count;
    ++pos;  // resume one character later, so overlapping matches are also counted
}
That can be tweaked if you don't want overlapping matches. For example, when looking for all occurrences of "ll" in the string "llll", the answer could be 3, which the above algorithm gives, or it could be 2, if you don't allow the next match to overlap the previous one. To get 2, change ++pos to pos += comp.size() so the search resumes after the entire preceding match.
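The tweak can also be made a parameter. This is a sketch, not part of the original answer: the third parameter allowOverlap is hypothetical, and treating an empty comp as zero matches is an assumption (an empty pattern would otherwise match at every position, and with pos += comp.size() the loop would never advance).

```cpp
#include <cstddef>
#include <string>

// Counts occurrences of comp in str with std::string::find.
// allowOverlap (a hypothetical parameter) selects how far to advance after a
// match: one character (overlapping) or the whole match (non-overlapping).
std::size_t countMatches(const std::string& str, const std::string& comp,
                         bool allowOverlap)
{
    if (comp.empty()) return 0;  // avoid an endless loop on an empty pattern
    std::size_t count = 0;
    std::string::size_type pos = 0;
    while ((pos = str.find(comp, pos)) != std::string::npos) {
        ++count;
        pos += allowOverlap ? 1 : comp.size();
    }
    return count;
}
```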

The problem with your function is that you are checking that:
Hello is substring of Hello
ello is substring of ello
llo is substring of llo
...
Of course, this matches 5 times in this case.
What you really need is:
For each position i of str
check if the substring of str starting at i and of length = comp.size() is exactly comp.
The following code should do exactly that:
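#include <cstddef>
#include <string>

std::size_t countMatches(const std::string& str, const std::string& comp)
{
    std::size_t count = 0;
    // written as j + comp.size() <= str.size() so the bound cannot wrap around
    // (size_t is unsigned) when comp is longer than str
    for (std::size_t j = 0; j + comp.size() <= str.size(); j++)
        if (comp == str.substr(j, comp.size()))
            count++;
    return count;
}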
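Each call to str.substr in that loop builds a temporary string. The three-argument overload std::string::compare(pos, count, str) compares that slice of str with comp in place instead. A sketch under that substitution (countMatchesNoCopy is a hypothetical name); like the loop above, it counts overlapping matches.

```cpp
#include <cstddef>
#include <string>

// Same counting loop, but str.compare(j, len, comp) compares the slice
// str[j, j + len) with comp without allocating a substring.
std::size_t countMatchesNoCopy(const std::string& str, const std::string& comp)
{
    std::size_t count = 0;
    for (std::size_t j = 0; j + comp.size() <= str.size(); ++j)
        if (str.compare(j, comp.size(), comp) == 0)  // 0 means equal
            ++count;
    return count;
}
```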

extraction of letters at even positions in strings?

string extract(string scrambeledword){
unsigned int index;
string output;
string input= " ";
for (index=0; index <= scrambeledword.length() ; index++);
{
if (index%2==0)
{
output+=input ;
cout << output;
}
}
return output;}
I want to extract the letters at even indices from the 40-letter word entered by the user. Does this make sense? I haven't learned arrays yet and do not want to use them.

Problems:
1. You have a ; right after your for (...), so the loop body is empty and the block below it is not part of the loop.
2. <= is wrong here, since index scrambeledword.length() is one past the last character. Use != or < instead.
3. You need to either assign something to input before adding it to output or get rid of it altogether.
4. As @Aconcagua pointed out, it is worth noting that I moved your declaration of index from the function scope into the for loop. Had you done that too, the compiler would have reported an error (since index would be undeclared outside the for loop), alerting you to the ; problem.
Fixed version:
string extract(const string &scrambeledword){ // copying strings is expensive
// unsigned int index; // obsolete
string output;
// string input= " "; // obsolete
for (size_t index = 0; index != scrambeledword.length(); ++index) // `<=` would be wrong since scrambeledword.length() is out of range
{
if (index % 2 == 0)
{
output += scrambeledword[index];
// cout << output; // obsolete. If you just want the characters, print scrambeledword[index]
cout << scrambeledword[index];
}
}
cout << endl; // break the line for better readability
return output;
}

Your code won't run the block under the for as the loop body, because there is a ; at the end of the for line. That gives the for loop an empty body: it just counts index up past the length of the given word, and the block after it then runs only once.
In the for condition, index <= scrambeledword.length() lets index reach length(), which is one past the last character, so you read outside the word's characters. Use index < scrambeledword.length() instead.
This is one possible solution:
string extract(const string& scrambeledword)
{
string output;
for (unsigned int index = 0; index < scrambeledword.length(); index++)
{
if (index % 2 == 0)
{
output += scrambeledword[index];
}
}
return output;
}

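// assumes #include <algorithm>, <iostream>, <string> and using namespace std;
auto str = "HelloWorld"s;  // the s suffix makes a std::string
int i = 0;
// print each character whose position (counted by i) is even
for_each(str.cbegin(), str.cend(), [&i](char const & c) { if (i++ % 2 == 0) cout << c; });
Output: Hlool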

You could go with something like this:
for (size_t i = 0; i < scrambeledword.length(); i += 2) {
    output += scrambeledword.at(i);  // indices 0, 2, 4, ...
}
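Wrapped in the question's extract function, the stride-2 loop looks like the sketch below. It keeps the question's function name and parameter spelling; stepping i by 2 visits only even indices, so the index % 2 test is no longer needed.

```cpp
#include <cstddef>
#include <string>

// Collects the characters at indices 0, 2, 4, ... by stepping the index by 2.
std::string extract(const std::string& scrambeledword)
{
    std::string output;
    for (std::size_t i = 0; i < scrambeledword.length(); i += 2) {
        output += scrambeledword[i];
    }
    return output;
}
```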
