finding the longest substring with k different/unique characters using hash c++


I came across the problem of finding the longest substring with k unique characters. For instance, given str=abcbbbddcc, the results should be:
k=2 => bcbbb
k=3 => bcbbbddcc
I wrote a function for this purpose using a hash table. The hash table acts as a search window: whenever there are more than k unique characters inside the current window, I shrink it by moving the window's current "start" to the right; otherwise, I just expand the window. Unfortunately, there seems to be a bug in my code and I still cannot find it. Could anyone please help me find the issue? The output of my function is the start index of the substring together with its length, i.e. substring(start, start+maxSize);. I found some related posts (java-sol and python-sol), but no C++ solution using a hash table.
#include <iostream>
#include <vector>
#include <string>
#include <unordered_map>
typedef std::vector<int> vector;
typedef std::string string;
typedef std::unordered_map<char, int> unordered_map;
typedef unordered_map::iterator map_iter;
vector longestSubstring(const string & str, int k){
if(str.length() == 0 || k < 0){
return {0};
}
int size = str.length();
int start = 0;
unordered_map map;
int maxSize = 0;
int count = 0;
char c;
for(int i = 0; i < size; i++){
c = str[i];
if(map.find(c)!=map.end()){
map[c]++;
}
else{
map.insert({c, 1});
}
while(map.size()>k){
c = str[start];
count = map[c];
if(count>1){
map[c]--;
}
else{
map.erase(c);
}
start++;
}
maxSize = std::max(maxSize, i-start+1);
}
return {start, maxSize};
}

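Before maxSize = std::max(maxSize, i-start+1); you must ensure that the map size is exactly k. The window may never reach k unique characters, but the current code updates maxSize immediately anyway.
Also remember the start value that belongs to the maximum, because start keeps moving after the best window has been found: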
if (map.size() == k)
if (i - start + 1 > maxSize) {
maxSize = i - start + 1;
astart = start;
}
...
return {astart, maxSize};
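Putting both changes together, the whole function might look like the sketch below. It assumes the goal is the longest window with exactly k distinct characters, and it returns the substring itself rather than a {start, length} pair. The name longest_with_k_distinct is made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (name not from the original post): returns the longest
// substring of `str` that contains exactly `k` distinct characters, or an
// empty string if no such substring exists.
std::string longest_with_k_distinct(const std::string& str, std::size_t k) {
    std::unordered_map<char, std::size_t> counts;  // characters in the window
    std::size_t start = 0;      // left edge of the current window
    std::size_t bestStart = 0;  // left edge of the best window seen so far
    std::size_t bestSize = 0;

    for (std::size_t i = 0; i < str.size(); ++i) {
        ++counts[str[i]];  // grow the window to the right

        // Shrink from the left while the window holds too many characters.
        while (counts.size() > k) {
            auto it = counts.find(str[start]);
            if (--it->second == 0) {
                counts.erase(it);
            }
            ++start;
        }

        // Only a window with exactly k distinct characters is a candidate.
        const std::size_t size = i - start + 1;
        if (counts.size() == k && size > bestSize) {
            bestSize = size;
            bestStart = start;
        }
    }
    return str.substr(bestStart, bestSize);
}
```

With str = abcbbbddcc this yields bcbbb for k=2 and bcbbbddcc for k=3, matching the expected results in the question.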

Related

Why unordered_map not showing correct index values of a vector?

I have a string "codeforces". I am storing each character of this string as a key in an unordered map, and the indexes at which that character occurs in a vector as the value, but the stored indexes are not correct.
In the string "codeforces" the character 'c' occurs at positions 1 and 8 (counting from 1). I would like to store 'c' as the key and the positions of its occurrences in the vector as the value, but the result is wrong. Can anybody tell me why this is happening?
#include <iostream>
#include <unordered_map>
#include <vector>
#include <algorithm>
#include <string>
using namespace std;
int main(){
string x = "codeforces";
unordered_map<char,vector<int>> data;
for(int i = 1;i <= x.size();i++){
data[x[i]].push_back(i);
}
for(auto it : data){
if(it.first == 'c'){
vector<int> out = it.second;
for(int j = 0;j < out.size();j++){
cout<<out[j]<<" ";
}
cout<<endl;
}
}
return 0;
}
Output should be like this (for character 'c') -> 1 8 .
But it is showing -> 7 .

Your for loop has the wrong range. You start at element 1, so x[0] (the first 'c') is never visited, and because of <= the last iteration reads x[x.size()], which is one past the last character of codeforces.
When iterating over arrays, the index starts at 0 and should end at size() - 1. This is easily achieved with i < size(): the comparison is false once the index reaches size(), so size() - 1 is the last iteration step.
You have two options: either go from 1 to size() and access [i - 1]
for(int i = 1; i <= x.size(); i++){
data[x[i - 1]].push_back(i);
}
or go from 0 to size() - 1 and push_back(i + 1)
for(int i = 0; i < x.size(); i++){
data[x[i]].push_back(i + 1);
}
I recommend the latter, as it's the common way to iterate over arrays.
Also, you should avoid writing using namespace std;.
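As a self-contained illustration of the second option, here is a minimal sketch that collects the 1-based positions of every character. The function name positions_by_char is hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each character of `s` to the 1-based positions
// at which it occurs. The loop index runs from 0 to size() - 1, and the
// stored position is index + 1.
std::unordered_map<char, std::vector<int>> positions_by_char(const std::string& s) {
    std::unordered_map<char, std::vector<int>> data;
    for (std::size_t i = 0; i < s.size(); ++i) {
        data[s[i]].push_back(static_cast<int>(i) + 1);
    }
    return data;
}
```

For "codeforces", the entry for 'c' then holds 1 and 8.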

Best method for finding and replacing a subarray

I've written some code to find a desired sub-array within a larger array and replace it with a different sub-array of the same length.
e.g.:
int array[] = {1,2,3,4,1,2,3,4};
int find[] = {1,2,3};
int replace[] = {7,8,9};
replaceSubArray(array, 8, find, replace, 3);
And replaceSubArray modifies 'array' to contain {7,8,9,4,7,8,9,4}
My function looks like this:
void replaceSubArray(char* longArray, int longLength, char* find, char* replace, int shortLength) {
int findIndex = 0, replaceIndex = 0;
for (int i = 0; i < longLength; ++i) //loop through larger array
{
if (longArray[i] == find[findIndex]) //if we find a match for an element
{
if (++findIndex == shortLength) //increment the findIndex and see if the entire subarray has been found in the larger array
{
for (int j = i - (shortLength - 1); j <= i; ++j) //entire array has been matched so go back to start of the appearance of subarray in larger array
{
longArray[j] = replace[replaceIndex]; //replace the matched subarray with the contents of replace[]
replaceIndex++;
}
replaceIndex = 0; //reset replaceIndex and findIndex to 0 so we can restart the search for more subarray matches
findIndex = 0;
}
} else { //if an element wasn't matched, reset findIndex to 0 to restart the search for subarray matches
findIndex = 0;
}
replaceIndex = 0;
}
}
It works fine, but I am a beginner programmer and was curious whether there is a better way to do this, or whether there are any built-in functions that would help.

Use standard algorithms. You have
int array[] = {1,2,3,4,1,2,3,4};
int find[] = {1,2,3};
int replace[] = {7,8,9};
then you can use (requires #include <algorithm>, #include <iterator>)
using std::begin, std::end;
auto it = begin(array);
for (;;) {
it = std::search(it, end(array), begin(find), end(find));
if (it == end(array))
break;
it = std::copy(begin(replace), end(replace), it);
}
You can also use the Boyer-Moore searcher: (requires #include <functional>)
using std::begin, std::end;
auto searcher = std::boyer_moore_searcher(begin(find), end(find));
auto it = begin(array);
for (;;) {
it = std::search(it, end(array), searcher);
if (it == end(array))
break;
it = std::copy(begin(replace), end(replace), it);
}
Whether or not this will improve performance depends on a lot of factors, so profile.

To replace just the first occurrence:
#include <string.h>
void replaceSubArray(int* longArray, int longLength, int* find, int* replace, int shortLength)
{
int i, k = 0;
for (i = 0 ; i < longLength ; ++i)
{
if (longArray[i] == find[k++])
{
if ( k == shortLength )
{
memcpy(longArray + i + 1 - k, replace, sizeof(int) * shortLength);
break;
}
continue;
}
k = 0;
}
}
To replace all occurrences:
#include <string.h>
void replaceSubArray(int* longArray, int longLength, int* find, int* replace, int shortLength)
{
int i, k = 0;
for (i = 0 ; i < longLength ; ++i)
{
if (longArray[i] == find[k++])
{
if ( k == shortLength )
memcpy(longArray + i + 1 - k, replace, sizeof(int) * shortLength);
else
continue;
}
k = 0;
}
}
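In C I would prefer this way. Note that both versions reset k on a mismatch without re-checking the current element against find[0], so a match that starts right after a partial match (for example, finding {1,2,3} in {1,1,2,3}) is missed.
PS: The question was also tagged C before; I noticed just now that the C tag has been removed. I am still posting this in case it helps.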

If the elements in your find array are all different, a mismatch after a partial match tells you that no match can begin inside the part already matched, so you never need to back up further than the current element:
replace:
else { //if an element wasn't matched, reset findIndex to 0 to restart the search for subarray matches
findIndex = 0;
}
with
else { //if an element wasn't matched, reset findIndex to 0 to restart the search for subarray matches
if (findIndex > 0) --i; // re-check the current element against find[0]; there could not be a match starting before this index.
findIndex = 0;
}
If not all entries in your find array are different, you could use a similar (more complicated) approach. See the Knuth–Morris–Pratt algorithm.
Using memcpy instead of a loop to do the actual replacement should also speed things up a bit.
Hint:
Always profile each change to see whether, and to what extent, it improved the performance.
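For the general case where find may contain repeated elements, a Knuth–Morris–Pratt based replacement could be sketched as follows. This is an illustration under assumptions, not code from the question: it works on std::vector<int>, replaces non-overlapping matches only, and the names build_failure_table and replace_all_kmp are invented for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the KMP failure table for `pattern`.
// fail[i] is the length of the longest proper prefix of pattern[0..i]
// that is also a suffix of pattern[0..i].
std::vector<std::size_t> build_failure_table(const std::vector<int>& pattern) {
    std::vector<std::size_t> fail(pattern.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (k > 0 && pattern[i] != pattern[k]) {
            k = fail[k - 1];
        }
        if (pattern[i] == pattern[k]) {
            ++k;
        }
        fail[i] = k;
    }
    return fail;
}

// Hypothetical helper: replaces every non-overlapping occurrence of `find`
// in `data` with `replace` (both must have the same, non-zero length).
// Returns the number of replacements made.
std::size_t replace_all_kmp(std::vector<int>& data,
                            const std::vector<int>& find,
                            const std::vector<int>& replace) {
    if (find.empty() || find.size() != replace.size()) {
        return 0;
    }
    const std::vector<std::size_t> fail = build_failure_table(find);
    std::size_t matched = 0;  // number of elements of `find` matched so far
    std::size_t count = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // On a mismatch, fall back to the longest prefix that still fits.
        while (matched > 0 && data[i] != find[matched]) {
            matched = fail[matched - 1];
        }
        if (data[i] == find[matched]) {
            ++matched;
        }
        if (matched == find.size()) {
            const std::size_t start = i + 1 - matched;
            for (std::size_t j = 0; j < replace.size(); ++j) {
                data[start + j] = replace[j];
            }
            ++count;
            matched = 0;  // continue after the replaced span
        }
    }
    return count;
}
```

Unlike the simple reset-to-zero loop, this finds {1,2,3} inside {1,1,2,3}, because a mismatch falls back to the longest prefix that can still match instead of discarding the current element.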

Here is sample code that uses std::vector and a few existing C++ features:
#include<stdio.h>
#include<iostream>
#include<vector>
#include<algorithm>
int main () {
std::vector<int> vect1 = {1,2,3,4,5};
std::vector<int> find = {3,4,5};
std::vector<int> replace = {5,6,7};
auto it = std::search(vect1.begin(),vect1.end(),find.begin(),find.end()); // Finds sub array in main vect1
int i = 0;
while ((it != vect1.end()) && (i< replace.size())) {
*it = replace[i]; // replace each elements on by one start from searched index from std::search
i++; //iterate replace vector
it++; //iterate main vector
}
return 0;
}

Convert a string into a char array

I'm new to C++, and this is part of a project I'm working on: taking a string and printing the most commonly used letter along with how many times it was used. I thought this was right, but for some reason my char array won't be read in. Any tips or suggestions on how to fix it?
#include <string>
#include <iostream>
using namespace std;
char getMostFreqLetter(string ss);
int main() {
string s; //initilizing a variable for string s
s = ("What is the most common letter in this string "); // giving s a string
getMostFreqLetter(s); // caling the function to print out the most freq Letter
return 0;
}
char getMostFreqLetter(string ss) {
int max, index, i = 0;
int array[255] = {0};
char letters[];
// convert all letters to lowercase to make counting letters non case sensative
for (int i = 0; i < ss.length(); i ++){
ss[i] = tolower(ss[i]);
}
//read each letter into
for (int i = 0; i < ss.length(); i ++){
++array[letters[i]];
}
//
max = array[0];
index = 0;
for (int i = 0; i < ss.length(); i ++){
if( array[i] > max)
{
max = array[i];
index = i;
}
}
return 0;
}

If you are not counting white space as a letter, a more efficient way would be:
vector<int> count(26,0);
for (int i = 0; i < s.length(); i++) {
int range = tolower(s[i])-'a';
if ( range >= 0 && range < 26)
count[range]++;
}
// Now you can find the max while iterating over count;

Use string::c_str().
It returns a pointer to a null-terminated character array holding the string's contents.

You have a few errors in your code.
Firstly, the char array letters is never filled (and char letters[]; without a size does not compile). You should drop it and iterate over the string ss instead, which is what I think you intended to do.
This would change the body of your second for loop from ++array[letters[i]]; to ++array[ss[i]];.
Secondly, your last for loop is buggy. You are using i as the index to look up the frequency in array, whereas you need to use the ASCII value of the character (ss[i]) instead. Here is a fixed version with comments:
index = ss[0];
max = array[index];
for (int i = 0; i < ss.length(); i ++){
if(!isspace(ss[i]) && array[ss[i]] > max)
{
max = array[ss[i]]; // you intended to use the ASCII values of the characters in ss to mark their place in array. In your code, you use i, which is just the index of the character in ss, not its ASCII value. Hence you need to use array[ss[i]].
index = ss[i];
}
}
return index;
Once you make the above changes you get the following output when run on your string:
Most freq character: t
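For reference, a complete version of the idea might look like the sketch below. It is a rewrite under assumptions rather than the asker's exact function: it counts only the letters a to z (case-insensitively, assuming an ASCII-compatible character set), breaks ties in favour of the alphabetically first letter, and uses the invented name most_frequent_letter.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical rewrite: returns the most frequent letter in `text`,
// ignoring case and skipping anything that is not a letter a-z.
// Returns '\0' if the text contains no letters at all.
char most_frequent_letter(const std::string& text) {
    std::array<int, 26> counts{};  // one counter per letter, all zero
    for (unsigned char ch : text) {
        const int idx = std::tolower(ch) - 'a';
        if (idx >= 0 && idx < 26) {
            ++counts[static_cast<std::size_t>(idx)];
        }
    }
    // Scan the 26 counters (not the string) for the largest count.
    std::size_t best = 0;
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i] > counts[best]) {
            best = i;
        }
    }
    return counts[best] == 0 ? '\0' : static_cast<char>('a' + best);
}
```

Iterating as unsigned char avoids passing a negative value to std::tolower, which can happen with plain char on platforms where char is signed.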

Getting Word Frequency From Vector In c++

I have googled this question and couldn't find an answer that worked with my code, so I wrote this to get the frequency of the words. The only issue is that I am getting the wrong number of occurrences for every word apart from one, which I think is a fluke. I am also checking whether a word has already been entered into the vector so I don't count the same word twice.
fileSize = textFile.size();
vector<wordFrequency> words (fileSize);
int index = 0;
for(int i = 0; i <= fileSize - 1; i++)
{
for(int j = 0; j < fileSize - 1; j++)
{
if(string::npos != textFile[i].find(textFile[j]) && words[i].Word != textFile[j])
{
words[j].Word = textFile[i];
words[j].Times = index++;
}
}
index = 0;
}
Any help would be appreciated.

Consider using a std::map<std::string,int> instead. The map class will handle ensuring that you don't have any duplicates.

Using an associative container:
typedef std::unordered_map<std::string, unsigned> WordFrequencies;
WordFrequencies count(std::vector<std::string> const& words) {
WordFrequencies wf;
for (std::string const& word: words) {
wf[word] += 1;
}
return wf;
}
It is hard to get simpler...
Note: you can replace unordered_map with map if you want the words sorted alphabetically, and you can write custom comparison operations to treat them case-insensitively.
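The case-insensitive variant mentioned in the note could be sketched like this. The comparator CaseInsensitiveLess, the alias WordFrequenciesCI, and the function count_case_insensitive are names invented for this example, and the comparison uses std::tolower character by character, which is only adequate for single-byte text.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical comparator: orders strings alphabetically while ignoring case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Sorted map whose keys compare equal when they differ only in case.
using WordFrequenciesCI = std::map<std::string, unsigned, CaseInsensitiveLess>;

WordFrequenciesCI count_case_insensitive(const std::vector<std::string>& words) {
    WordFrequenciesCI wf;
    for (const std::string& word : words) {
        ++wf[word];  // "The", "the" and "THE" all land on the same entry
    }
    return wf;
}
```

The key that ends up stored is the spelling seen first; later spellings only increment its count.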

Try this code instead if you do not want to use a map container:
struct wordFreq{
string word;
int count;
wordFreq(string str, int c):word(str),count(c){}
};
vector<wordFreq> words;
int ffind(vector<wordFreq>::iterator i, vector<wordFreq>::iterator j, string s)
{
for(;i<j;i++){
if((*i).word == s)
return 1;
}
return 0;
}
The code for finding the number of occurrences in the textfile vector is then:
for(int i=0; i< textfile.size();i++){
if(ffind(words.begin(),words.end(),textfile[i])) // Check whether the word was already counted; if so move to the next one, i.e. avoid repetitions
continue;
words.push_back(wordFreq(textfile[i],1)); // Add the word to the vector as it was not counted before and set its count to 1
for(int j = i+1;j<textfile.size();j++){ // find possible duplicates of textfile[i]
if(textfile[j] == words.back().word)
words.back().count++;
}
}

Need help optimizing a program that finds all possible substrings

I have to find all possible, unique substrings from a bunch of user-input strings. This group of substrings has to be alphabetically sorted without any duplicate elements, and the group must be queryable by number. Here's some example input and output:
Input:
3 // This is the user's desired number of strings
abc // So the user inputs 3 strings
abd
def
2 // This is the user's desired number of queries
7 // So the user inputs 2 queries
2
Output:
// From the alphabetically sorted group of unique substrings,
bd // This is the 7th substring
ab // And this is the 2nd substring
Here's my implementation:
#include <map>
#include <iostream>
using namespace std;
int main() {
int number_of_strings;
int number_of_queries;
int counter;
string current_string;
string current_substr;
map<string, string> substrings;
map<int, string> numbered_substrings;
int i;
int j;
int k;
// input step
cin >> number_of_strings;
string strings[number_of_strings];
for (i = 0; i < number_of_strings; ++i)
cin >> strings[i];
cin >> number_of_queries;
int queries[number_of_queries];
for (i = 0; i < number_of_queries; ++i)
cin >> queries[i];
// for each string in 'strings', I want to insert every possible
// substring from that string into my 'substrings' map.
for (i = 0; i < number_of_strings; ++i) {
current_string = strings[i];
for (j = 1; j <= current_string.length(); ++j) {
for (k = 0; k <= current_string.length()-j; ++k) {
current_substr = current_string.substr(k, j);
substrings[current_substr] = current_substr;
}
}
}
// my 'substrings' container is now sorted alphabetically and does
// not contain duplicate elements, because the container is a map.
// but I want to make the map queryable by number, so I'm iterating
// through 'substrings' and assigning each value to an int key.
counter = 1;
for (map<string,string>::iterator it = substrings.begin();
it != substrings.end(); ++it) {
numbered_substrings[counter] = it->second;
++counter;
}
// output step
for (i = 0; i < number_of_queries; ++i) {
if (queries[i] > 0 && queries[i] <= numbered_substrings.size()) {
cout << numbered_substrings[queries[i]] << endl;
} else {
cout << "INVALID" << endl;
}
}
return 0;
}
I need to optimize my algorithm, but I'm not sure how to do it. Maybe it's the fact that I have a second for loop for assigning new int keys to each substring. Help?

Check out suffix trees. A suffix tree can be built in O(n) time.
This article was helpful for me:
http://allisons.org/ll/AlgDS/Tree/Suffix/

Minor notes:
1. include <string>
2. careful with those } else {; one day you'll have a lot of else if branches
and a lot of lines and you'll wonder where an if starts and where it ends
3. careful with unsigned versus signed mismatching... again, one day it will
come back and bite (also, it's nice to compile without errors or warnings)
4. don't try to define static arrays with a variable size
5. nice with ++ i. not many know it can have a slight performance boost
(for iterators and other class types; for plain ints compilers generate the same code)
While I do agree that using proper algorithms when needed (say bubble sort, heap sort etc. for sorting, binary search, binary trees etc. for searching), sometimes I find it nice to do an optimization on current code. Imagine having a big project and implementing something requires rewrites... not many are willing to wait for you (not to mention the required unit testing, fat testing and maybe fit testing). At least my opinion. [and yes, I know some are gonna say that if it is so complicated then it was written badly from the start - but hey, you can't argue with programmers that left before you joined the team :P]
But I do agree, using existing stuff is a good alternative when called for. But back to the point. I tested it with
3, abc, def, ghi
4, 1, 3, 7, 12
I can't say whether yours is any slower than mine or vice-versa; perhaps a random string generator that adds maybe 500 inputs (then calculates all subs) might be a better test, but I am too lazy at 2 in the morning. At most, my way of writing it might help you (at least to me it seems simpler and uses less loops and assignments). Not a fan of vectors, cos of the slight overhead, but I used it to keep up with your requirement of dynamic querying... a static array of a const would be faster, obviously.
Also, while not my style of naming conventions, I decided to use your names so you can follow the code easier.
Anyway, take a look and tell me what you think:
#include <map>
#include <iostream>
#include <string> // you forgot to add this... trust me, it's important :)
#include <vector> // not a fan, but it's not that bad IF you want dynamic buffers
using namespace std;
int main ()
{
unsigned int number_of_strings = 0;
// string strings[number_of_strings]; // don't do this... you can't assign static arrays of a variable size
// this just defaults to 0; you're telling the compiler
cin >> number_of_strings;
map <string, string> substrings;
string current_string, current_substr;
unsigned int i, j, k;
for (i = 0; i < number_of_strings; ++ i)
{
cin >> current_string;
substrings[current_string] = current_string;
for (j = 1; j <= current_string.length(); ++ j)
{
for (k = 0; k <= current_string.length() - j; ++ k)
{
current_substr = current_string.substr(k, j);
substrings[current_substr] = current_substr;
}
}
}
vector <string> numbered_substrings;
for (map <string, string>::iterator it = substrings.begin(); it != substrings.end(); ++ it)
numbered_substrings.push_back(it->second);
unsigned int number_of_queries = 0;
unsigned int query = 0;
cin >> number_of_queries;
current_string.clear();
for (i = 0; i < number_of_queries; ++ i)
{
cin >> query;
-- query;
if ((query >= 0) && (query < numbered_substrings.size()))
current_string = current_string + numbered_substrings[query] + '\n';
else
cout << "INVALID: " << query << '\n' << endl;
}
cout << current_string;
return 0;
}
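The same idea can be written a little more compactly with std::set, which keeps the substrings sorted and unique without storing each one twice as key and value. This is only a sketch: it still generates every substring of every input, so it is not the asymptotic improvement a suffix tree would give, and the function name sorted_unique_substrings is made up for the example.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns all distinct substrings of the given strings
// in alphabetical order. Query number q (1-based) is then result[q - 1].
std::vector<std::string> sorted_unique_substrings(const std::vector<std::string>& inputs) {
    std::set<std::string> unique;  // sorted, no duplicates
    for (const std::string& s : inputs) {
        for (std::size_t start = 0; start < s.size(); ++start) {
            for (std::size_t len = 1; start + len <= s.size(); ++len) {
                unique.insert(s.substr(start, len));
            }
        }
    }
    // Copy into a vector so that lookups by position are O(1).
    return std::vector<std::string>(unique.begin(), unique.end());
}
```

For the inputs abc, abd and def from the question, the result has 14 entries, with ab in second place and bd in seventh.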
