Max Length Substring without any repeating character - c++

Given a string, find the length of the longest substring without any repeating character.
Example 1:
Input: s = "abcabcbb"
Output: 3
Explanation: The answer is abc, with a length of 3.
Example 2:
Input: s = "bbbbb"
Output: 1
Explanation: The answer is b, with a length of 1.
My solution works, but it isn't optimised. How can this be done in O(n) time?
#include <bits/stdc++.h>
using namespace std;
int solve(string str) {
    if (str.size() == 0)
        return 0;
    int maxans = 0;
    for (int i = 0; i < str.length(); i++) // outer loop for traversing the string
    {
        unordered_set<char> seen;
        for (int j = i; j < str.length(); j++) // nested loop for getting different strings starting with str[i]
        {
            if (seen.find(str[j]) != seen.end()) // repeated character found, so stop extending this substring
            {
                break;
            }
            seen.insert(str[j]);
            // update after every extension so substrings that reach the end of str are counted too
            maxans = max(maxans, j - i + 1);
        }
    }
    return maxans;
}
int main() {
    string str = "abcsabcds";
    cout << "The length of the longest substring without repeating characters is " <<
        solve(str);
    return 0;
}

Use a two-pointer approach along with a hash map here (a hash set is enough, since only membership matters).
Initialise two pointers i = 0, j = 0 (i and j denote the left and right boundaries of the current substring).
If the j-th character is not in the map, we can extend the substring: add the j-th character to the map and increment j.
If the j-th character is in the map, we cannot extend the substring without removing the earlier occurrence of the character: remove the i-th character from the map and increment i.
Repeat this while j < length of the string, keeping track of the largest j - i seen so far.
Each character is added and removed at most once, so the time complexity is O(n); the extra space is O(min(n, alphabet size)).
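The steps above can be sketched as follows. This is only a minimal illustration, not necessarily the exact code the answer had in mind: the function name longestUnique is made up for this example, and a std::unordered_set is assumed in place of a map because only membership is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper: length of the longest substring of s without repeats.
std::size_t longestUnique(const std::string& s) {
    std::unordered_set<char> window; // characters currently inside s[i, j)
    std::size_t i = 0, j = 0, best = 0;
    while (j < s.size()) {
        if (window.count(s[j]) == 0) {
            // s[j] is new: extend the window to the right
            window.insert(s[j]);
            ++j;
            best = std::max(best, j - i);
        } else {
            // s[j] is already present: shrink the window from the left
            window.erase(s[i]);
            ++i;
        }
    }
    return best;
}
```

For the two examples in the question, longestUnique("abcabcbb") gives 3 and longestUnique("bbbbb") gives 1.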

#include <string>
#include <iostream>
#include <vector>
int main() {
    // 1
    std::string s;
    std::cin >> s;
    // 2
    std::vector<int> lut(127, -1);
    int i, beg{ 0 }, len_curr{ 0 }, len_ans{ 0 };
    for (i = 0; i != s.size(); ++i) {
        if (lut[s[i]] == -1 || lut[s[i]] < beg) {
            ++len_curr;
        }
        else {
            if (len_curr > len_ans) {
                len_ans = len_curr;
            }
            beg = lut[s[i]] + 1;
            len_curr = i - lut[s[i]];
        }
        lut[s[i]] = i;
    }
    if (len_curr > len_ans) {
        len_ans = len_curr;
    }
    // 3
    std::cout << len_ans << '\n';
    return 0;
}
In // 1 you:
Define and read your string s.
In // 2 you:
Define your lookup table lut, which is a vector of int consisting of 127 buckets, each initialized with -1. As per this article, there are 95 printable ASCII characters, numbered 32 to 126, hence we allocate 127 buckets (indices 0 to 126); the code therefore assumes the input contains only such characters. lut[ch] is the position in s where you last found the character ch.
Define i (index variable for s), beg (the position in s where your current substring begins), len_curr (the length of your current substring), len_ans (the length you are looking for).
Loop through s. If you have never found the character s[i] before, OR you have found it but at a position BEFORE beg (it belonged to some previous substring in s), you increment len_curr. Otherwise you have a repeating character! You compare len_curr against len_ans and update len_ans if needed. Your new beg will be the position AFTER the one where you last found the repeating character. Your new len_curr will be the difference between your current position in s and the position where you last found the repeating character.
You assign i to lut[s[i]], which records that you last found the character s[i] at position i.
You repeat the if clause after the loop ends because your longest substring can be AT the end of s.
In // 3 you:
Print len_ans.

Related

Is there a struct function for finding the longest and not-repeating length substring within a string?

The aim of the function is to find out the longest and not repeating substring, so I need to find out the start position of the substring and the length of it. The thing I'm struggling with is the big O notation should be O(n). Therefore I cannot use nested for loops to check whether each letter is repeated.
I created a struct function like this but I don't know how to continue:
struct Answer {
    int start;
    int length;
};
Answer findsubstring(char *string){
    Answer sub={0, 0};
    for (int i = 0; i < strlen(string); i++) {
    }
    return (sub);
}
For example, if the input is HelloWorld, the output should be World. The length is 5.
If the input is abagkfleoKi, then the output is bagkfleoKi. The length is 10.
Also, if the length of two strings is the same, pick the latter one.
Use a std::unordered_map<char, size_t> to store the index just past the last occurrence of each char.
Keep the currently best match as well as the match you are currently testing. Iterating through the chars of the input results in 2 cases you need to handle:
The char already occurred, and its last occurrence requires you to move the start of the potential match to avoid the char occurring twice: update the answer with the match ending just before the current char, if that's better than the current answer, then move the start.
Otherwise: just update the map.
#include <iostream>
#include <string_view>
#include <unordered_map>
void printsubstring(const char* input)
{
    std::unordered_map<char, size_t> lastOccurances;
    Answer answer{ 0, 0 };
    size_t currentPos = 0;
    size_t currentStringStart = 0;
    char c;
    while ((c = input[currentPos]) != 0)
    {
        auto entry = lastOccurances.insert({ c, currentPos + 1 });
        if (!entry.second)
        {
            if (currentStringStart < entry.first->second)
            {
                // need to move the start of the potential answer
                // -> check, if the match up to the char before the current char was better
                // (>= so that the later of two equally long matches wins, as the question asks)
                if (currentPos - currentStringStart >= static_cast<size_t>(answer.length))
                {
                    answer.start = static_cast<int>(currentStringStart);
                    answer.length = static_cast<int>(currentPos - currentStringStart);
                }
                // the start must move even if the match was not better
                currentStringStart = entry.first->second;
            }
            entry.first->second = currentPos + 1;
        }
        ++currentPos;
    }
    // check the match ending at the end of the string
    if (currentPos - currentStringStart >= static_cast<size_t>(answer.length))
    {
        answer.start = static_cast<int>(currentStringStart);
        answer.length = static_cast<int>(currentPos - currentStringStart);
    }
    std::cout << answer.start << ", " << answer.length << std::endl;
    std::cout << std::string_view(input + answer.start, static_cast<size_t>(answer.length)) << std::endl;
}
I'll outline one possible solution.
1. You'll need two loops. One for pointing at the start of the substring and one that points at the end.
    auto stringlen = std::strlen(string);
    for(size_t beg = 0; beg < stringlen - sub.length; ++beg) {
        // See point 2.
        for(size_t end = beg; end < stringlen; ++end) {
            // See point 3.
        }
    }
2. Create a "blacklist" of characters already seen in the substring.
    bool blacklist[1 << CHAR_BIT]{}; // zero initialized
3. Check if the current end character is already in the blacklist and break out of the loop if it is, otherwise, put it in the blacklist.
    if(blacklist[ static_cast<unsigned char>(string[end]) ]) break;
    else {
        blacklist[ static_cast<unsigned char>(string[end]) ] = true;
        // See point 4.
    }
4. Check if the length of the current substring (end - beg + 1) is greater than the longest you currently have (sub.length). If it is longer, store sub.start = beg and sub.length = end - beg + 1.
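Put together, the pieces of this outline might look like the sketch below. It is an assembly under assumptions rather than the answerer's actual demo: the Answer fields are changed to std::size_t to avoid signed/unsigned comparisons, the parameter is made const, and the loop bound is written as beg + sub.length < stringlen. Because a substring without repeats can hold at most 1 << CHAR_BIT characters, the inner loop is bounded by a constant and the whole thing stays linear in the input length. Note that with a strict "greater than" comparison the earliest of several equally long substrings is kept.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstring>

struct Answer {
    std::size_t start;
    std::size_t length;
};

Answer findsubstring(const char* string) {
    Answer sub{0, 0};
    const std::size_t stringlen = std::strlen(string);
    // Stop once the remaining suffix cannot be longer than the best found so far.
    for (std::size_t beg = 0; beg + sub.length < stringlen; ++beg) {
        bool blacklist[1 << CHAR_BIT]{}; // characters seen in the current substring
        for (std::size_t end = beg; end < stringlen; ++end) {
            const auto ch = static_cast<unsigned char>(string[end]);
            if (blacklist[ch]) break; // repeated character: this substring ends here
            blacklist[ch] = true;
            if (end - beg + 1 > sub.length) {
                sub.start = beg;
                sub.length = end - beg + 1;
            }
        }
    }
    return sub;
}
```

With the inputs from the question, findsubstring("HelloWorld") yields start 5 and length 5 (World), and findsubstring("abagkfleoKi") yields start 1 and length 10 (bagkfleoKi).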

Given two strings S and T, determine a substring of S that has minimum difference with T

I have two strings S and T, where length of S >= length of T. I have to determine a substring of S which has the same length as T and has minimum difference with T. Here, the difference between two strings of the same length means the number of indexes where they differ. For example: "ABCD" and "ABCE" differ only at index 3 (0-based), so their difference is 1.
I know I can use KMP(Knuth Morris Pratt) Pattern Searching algorithm to search T within S. But, what if S doesn't contain T as a substring? So, I have coded a brute force approach to solve this:
#include <climits>
#include <iostream>
#include <string>
using namespace std;
int main() {
    string S, T;
    cin >> S >> T;
    int SZ_S = S.size(), SZ_T = T.size(), MinDifference = INT_MAX;
    string ans;
    for (int i = 0; i + SZ_T <= SZ_S; i++) { // I generate all the substrings of S
        int CurrentDifference = 0; // and check their difference with T
        for (int j = 0; j < SZ_T; j++) { // and store the substring with minimum difference
            if (S[i + j] != T[j])
                CurrentDifference++;
        }
        if (CurrentDifference < MinDifference) {
            ans = S.substr (i, SZ_T);
            MinDifference = CurrentDifference;
        }
    }
    cout << ans << endl;
}
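But my approach is only fast enough when S and T are short, and the problem is that S and T can have lengths as large as 2 * 10^5. How can I approach this?
Instead of minimizing the difference, let's maximize the number of characters that match. We can solve the problem for each character of the alphabet separately and then, for every substring position, sum up the results over all characters. To solve the problem for a particular character c, represent S and T as sequences of 0s and 1s (1 where the character equals c, 0 elsewhere), reverse the sequence for T, and multiply the two as polynomials using the FFT https://en.wikipedia.org/wiki/Fast_Fourier_transform. The coefficient at index i + |T| - 1 of the product is then the number of positions where the substring of S starting at i and T both contain c.
The complexity is O(|A| * N log N), where |A| is the size of the alphabet (26 for uppercase letters).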
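A possible sketch of this idea is shown below. Several things here are assumptions made for illustration: the function names fft and closestWindow are hypothetical, the alphabet is taken to be 'A' to 'Z', the inputs are assumed to satisfy 1 <= |T| <= |S|, and T is reversed so that an ordinary polynomial product yields the per-window match counts. The FFT is a plain iterative radix-2 version on std::complex<double>, so the results are rounded back to integers.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 FFT; a.size() must be a power of two.
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    for (std::size_t i = 1, j = 0; i < n; ++i) { // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = 2 * pi / static_cast<double>(len) * (invert ? -1 : 1);
        const cd wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1.0);
            for (std::size_t j = 0; j < len / 2; ++j) {
                const cd u = a[i + j];
                const cd v = a[i + j + len / 2] * w;
                a[i + j] = u + v;
                a[i + j + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
    if (invert)
        for (cd& x : a) x /= static_cast<double>(n);
}

// Returns {start index, number of mismatches} of the window of S closest to T.
// Assumes 1 <= |T| <= |S| and characters in 'A'..'Z' only.
std::pair<std::size_t, std::size_t> closestWindow(const std::string& S, const std::string& T) {
    const std::size_t n = S.size(), m = T.size();
    std::size_t size = 1;
    while (size < n + m) size <<= 1;
    std::vector<long long> matches(n - m + 1, 0); // matches[i]: equal positions for window i
    for (char c = 'A'; c <= 'Z'; ++c) {
        std::vector<cd> a(size), b(size);
        for (std::size_t i = 0; i < n; ++i) a[i] = (S[i] == c) ? 1.0 : 0.0;
        for (std::size_t j = 0; j < m; ++j) b[m - 1 - j] = (T[j] == c) ? 1.0 : 0.0; // reversed T
        fft(a, false);
        fft(b, false);
        for (std::size_t k = 0; k < size; ++k) a[k] *= b[k];
        fft(a, true);
        for (std::size_t i = 0; i + m <= n; ++i)
            matches[i] += std::llround(a[i + m - 1].real());
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < matches.size(); ++i)
        if (matches[i] > matches[best]) best = i;
    return {best, m - static_cast<std::size_t>(matches[best])};
}
```

The answer is then S.substr(start, T.size()) for the returned start index. Skipping characters that do not occur in T would save some of the 26 transforms.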

Splitting a string into maximum number of equal substrings

Given a string, what is the most efficient way to find the maximum number of equal substrings it can be split into? For example, "aaaa" is composed of four equal substrings "a", and "abab" is composed of two "ab"s. But for something like "abcd" there is no substring other than "abcd" itself that, when repeated, would make up "abcd".
Checking all the possible substrings isn't a solution since the input can be a string of length 1 million.
Since there is no given condition for the substrings, an optimized solution to find the maximum number of equal substrings is to count the shortest possible strings, letters. Create a map and count the letters of the string. Find the letter with the maximum number. That is your solution.
EDIT:
If the string must only consist of the substrings then the following code computes a solution
#include <iostream>
#include <string>
using ull = unsigned long long;
int main() {
    std::string str = "abab";
    ull length = str.length();
    for (ull i = 1; (2 * i) <= str.length(); ++i) {
        // the head length must divide the string length; skip (do not stop at) other lengths
        if (str.length() % i != 0) {
            continue;
        }
        bool found = true;
        for (ull j = 1; found && (j * i) < str.length(); ++j) {
            for (ull k = 0; k < i; ++k) {
                if (str[k] != str[k + j * i]) {
                    found = false;
                    break;
                }
            }
        }
        if (found) {
            length = i;
            break;
        }
    }
    std::cout << "Maximal number: " << str.length() / length << std::endl;
    return 0;
}
This algorithm checks if the head of the string is repeated and if the string only consists of repetitions of the head.
i-loop iterates over the length of the head,
j-loop iterates over each repetition,
k-loop iterates over each character in the substring
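For inputs as long as the 1 million characters mentioned in the question, a different technique can avoid re-checking every candidate head length: the prefix function from the Knuth-Morris-Pratt algorithm. This is not what the answer above does; it is a sketch of an alternative, with the function name maxEqualParts chosen for this example. If p is the length of the longest proper prefix of the string that is also a suffix, then n - p is the smallest candidate period, and the string is a repetition of its head exactly when n is divisible by n - p.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: maximum number of equal pieces s can be split into.
std::size_t maxEqualParts(const std::string& s) {
    const std::size_t n = s.size();
    if (n == 0) return 0;
    // pi[i] = length of the longest proper prefix of s[0..i] that is also its suffix
    std::vector<std::size_t> pi(n, 0);
    for (std::size_t i = 1; i < n; ++i) {
        std::size_t k = pi[i - 1];
        while (k > 0 && s[i] != s[k]) k = pi[k - 1];
        if (s[i] == s[k]) ++k;
        pi[i] = k;
    }
    const std::size_t period = n - pi[n - 1];
    // Only a period that divides n tiles the whole string.
    return (n % period == 0) ? n / period : 1;
}
```

The prefix function is computed in O(n), so the whole check is linear in the length of the string.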

String having maximum number of given substrings made after swapping some characters?

So, this is an interview question that I was going through.
I have strings a, b, and c. I want to obtain string k by swapping some letters in a, so that k should contain as many non-overlapping substrings equal either to b or c as possible. Substring of string x is a string formed by consecutive segment of characters from x. Two substrings of string x overlap if there is position i in string x occupied by both of them.
Input: The first line contains string a, the second line contains string b, and the third line contains string c (1 ≤ |a|, |b|, |c| ≤ 10^5, where |s| denotes the length of string s).
All three strings consist only of lowercase English letters.
It is possible that b and c coincide.
Output: Find one of possible strings k.
Example:
I/P
abbbaaccca
ab
aca
O/P
ababacabcc
This optimal solution has three non-overlapping substrings equal to either b or c, at positions 1 – 2 (ab), 3 – 4 (ab), 5 – 7 (aca).
Now, the approach that I could think of was to make a character count array for each of the strings, and then proceed from there. Basically, iterate over the original string (a) and check for occurrences of b and c. If there are none, swap as many characters as possible to make either b or c (whichever is shorter). But, clearly, this is not the optimal approach.
Can anyone suggest something better? (Only pseudocode will be enough)
Thanks!
The first thing you'll need to do is count the number of occurrences of each character in each string. The occurrence counts of a will be your knapsack, which you'll need to fill with as many b's or c's as possible.
Note that when I say knapsack I mean the character count vector of a, and inserting b into a means reducing the character count vector of a by the character count vector of b.
I don't have a full mathematical proof, but you'll need to:
1. Insert as many b's as possible into the knapsack.
2. Insert as many c's as possible into the knapsack (in the space that is left after step 1).
3. If removing a b from the knapsack enables the insertion of more c's, remove a b from the knapsack. Otherwise, finish.
4. Insert as many c's as you can into the knapsack.
5. Repeat steps 3-4.
Throughout the program, count the number of b's and c's in the knapsack, and the output should be:
[b_count times b][c_count times c][char_occurrence_left_in_knapsack_for_char_x times char_x for each char_x in lower_case_english]
This should solve your problem in O(n).
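Since the greedy steps above come without a proof, one cautious alternative is to simply try every feasible number of copies of b and, for each, take as many copies of c as still fit. The sketch below does that; it is a variation on the counting idea rather than the exact procedure described above, and the names countLetters, maxCopies and rearrange are invented for this example. It assumes lowercase English letters only, as stated in the question.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

using Counts = std::array<int, 26>;

Counts countLetters(const std::string& s) {
    Counts cnt{};
    for (char ch : s) ++cnt[ch - 'a'];
    return cnt;
}

// Largest number of copies of `need` that fit into `have`, capped at `cap`.
int maxCopies(const Counts& have, const Counts& need, int cap) {
    int best = cap;
    for (int i = 0; i < 26; ++i)
        if (need[i] > 0) best = std::min(best, have[i] / need[i]);
    return best;
}

std::string rearrange(const std::string& a, const std::string& b, const std::string& c) {
    const Counts ca = countLetters(a), cb = countLetters(b), cc = countLetters(c);
    const int n = static_cast<int>(a.size());
    const int maxB = maxCopies(ca, cb, n);
    int bestB = 0, bestC = 0;
    for (int nb = 0; nb <= maxB; ++nb) { // try every feasible count of b
        Counts rest = ca;
        for (int i = 0; i < 26; ++i) rest[i] -= nb * cb[i];
        const int nc = maxCopies(rest, cc, n); // as many c as still fit
        if (nb + nc > bestB + bestC) {
            bestB = nb;
            bestC = nc;
        }
    }
    std::string k;
    for (int i = 0; i < bestB; ++i) k += b;
    for (int i = 0; i < bestC; ++i) k += c;
    for (int i = 0; i < 26; ++i) { // append the letters of a that were not used
        const int left = ca[i] - bestB * cb[i] - bestC * cc[i];
        k.append(static_cast<std::size_t>(left), static_cast<char>('a' + i));
    }
    return k;
}
```

For the example in the question, rearrange("abbbaaccca", "ab", "aca") returns ababacabcc. The loop runs at most |a| + 1 times with 26 operations each, so the running time is linear in the input size.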
Assuming that the allowed characters have ASCII codes 0-127, I would write a function to count the occurrences of each character in a string:
int[] count(String s) {
    int[] res = new int[128]; // Java zero-initializes int arrays
    for (int i = 0; i < s.length(); i++)
        res[s.charAt(i)]++;
    return res;
}
We can now count occurrences in each string:
int[] aCount = count(a);
int[] bCount = count(b);
int[] cCount = count(c);
We can then write a function to count how many times a string can be carved out of characters of another string:
int carveCount(int[] strCount, int[] subStrCount) {
    int min = Integer.MAX_VALUE;
    for (int i = 0; i < subStrCount.length; i++) {
        if (subStrCount[i] == 0)
            continue;
        // number of copies that this character alone allows
        min = Math.min(min, strCount[i] / subStrCount[i]);
    }
    if (min == Integer.MAX_VALUE)
        return 0; // the substring has no characters
    // remove the characters used by the carved copies
    for (int i = 0; i < subStrCount.length; i++) {
        if (subStrCount[i] != 0)
            strCount[i] -= min * subStrCount[i];
    }
    return min;
}
and call the function:
int bFitCount = carveCount(aCount, bCount);
int cFitCount = carveCount(aCount, cCount);
EDIT: I didn't realize you wanted all characters originally in a, fixing here.
Finally, to produce the output:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < bFitCount; i++)
    sb.append(b);
for (int i = 0; i < cFitCount; i++)
    sb.append(c);
// append the characters of a that were not used
for (int i = 0; i < aCount.length; i++) {
    for (int j = 0; j < aCount[i]; j++)
        sb.append((char) i);
}
return sb.toString();
One more comment: if the goal is to maximize repetitions(b) + repetitions(c), then you may want to first swap b and c if c is shorter. This way, if they share some characters, you have a better chance of increasing the result.
The algorithm could be optimized further, but as it is it should have complexity O(n), where n is the sum of the length of the three strings.
A related problem is the knapsack problem.
This is basically the solution described by @Tal Shalti.
I tried to keep everything readable.
My program returns abbcabacac as one of the strings with the most occurrences (3).
To get the permutations without repeating a permutation, I use std::next_permutation from <algorithm>. There is not much happening in the main function. I only store the number of occurrences and the permutation if a higher number of occurrences was achieved.
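#include <algorithm>
#include <iostream>
#include <string>
unsigned int FindOccurrences(const std::string& word, const std::string& pattern1, const std::string& pattern2);
int main()
{
    std::string word = "abbbaaccca";
    std::string patternSmall = "ab";
    std::string patternLarge = "aca";
    unsigned int bestOccurrence = 0;
    std::string bestPermutation = "";
    // note: std::next_permutation only visits permutations that come lexicographically
    // after the starting word; sort word first if every permutation should be covered
    do {
        // count and remove occurrence
        unsigned int occurrences = FindOccurrences(word, patternLarge, patternSmall);
        if (occurrences > bestOccurrence) {
            bestOccurrence = occurrences;
            bestPermutation = word;
            std::cout << word << " .. " << occurrences << std::endl;
        }
    } while (std::next_permutation(word.begin(), word.end()));
    std::cout << "Best Permutation " << bestPermutation << " with " << bestOccurrence << " occurrences." << std::endl;
    return 0;
}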
This function handles the basic algorithm. pattern1 is the longer pattern, so it will be searched for last. If a pattern is found, it will be replaced with the string "##", since this should be very rare in the English language.
The variable occurrenceCounter keeps track of the number of occurrences found.
bool FindPattern(const std::string& word, const std::string& pattern, std::string::size_type& i);
unsigned int FindOccurrences(const std::string& word, const std::string& pattern1, const std::string& pattern2)
{
    unsigned int occurrenceCounter = 0;
    std::string tmpWord(word);
    // starting at npos (i.e. '-1') makes the while() easier: ++i wraps around to 0
    std::string::size_type i = std::string::npos;
    while (FindPattern(tmpWord, pattern2, ++i)) {
        occurrenceCounter++;
        tmpWord.replace(tmpWord.begin() + i, tmpWord.begin() + i + pattern2.size(), "##");
    }
    i = std::string::npos;
    while (FindPattern(tmpWord, pattern1, ++i)) {
        occurrenceCounter++;
        tmpWord.replace(tmpWord.begin() + i, tmpWord.begin() + i + pattern1.size(), "##");
    }
    return occurrenceCounter;
}
This function stores the first position of the found pattern in i and returns true. If the pattern is not found, string.find(...) returns std::string::npos and the function returns false. Note that string.find(...) starts searching for the pattern at index i.
bool FindPattern(const std::string& word, const std::string& pattern, std::string::size_type& i)
{
    std::string::size_type foundPosition = word.find(pattern, i);
    if (foundPosition == std::string::npos) {
        return false;
    }
    i = foundPosition;
    return true;
}

Find the minimum number of moves to get a "Good" string

A string is called good if and only if all the distinct characters in the string occur the same number of times.
Now, given a string of length n, what is the minimum number of changes we have to make to this string so that it becomes good?
Note: We are only allowed to use lowercase English letters, and we can change any letter to any other letter.
Example: Let the string be yyxzzxxx.
Then the answer is 2.
Explanation: One possible solution is yyxyyxxx. We have changed 2 'z' to 2 'y'. Now both 'x' and 'y' occur 4 times.
My Approach:
Make a hash of the occurrences of all 26 lowercase letters.
Also find the number of distinct letters in the string.
Sort this hash array and start checking if the length of the string is divisible by the number of distinct characters. If yes, then we have the answer.
Else reduce the number of distinct characters by 1.
But it gives wrong answers in some cases, as there may be cases where removing a character that does not occur the minimum number of times yields a good string in fewer moves.
So how should this be solved? Please help.
Constraints : Length of string is up to 2000.
My Approach :
string s;
cin>>s;
int hash[26]={0};
int total=s.length();
for(int i=0;i<26;i++){
    hash[s[i]-'a']++;
}
sort(hash,hash+total);
int ans=0;
for(int i=26;i>=1;i--){
    int moves=0;
    if(total%i==0){
        int eachshouldhave=total/i;
        int position=26;
        for(int j=1;j<26;j++){
            if(hash[j]>eachshouldhave && hash[j-1]<eachshouldhave){
                position=j;
                break;
            }
        }
        int extrasymbols=0;
        //THE ONES THAT ARE BELOW OBVIOUSLY NEED TO BE CHANGED TO SOME OTHER SYMBOL
        for(int j=position;j<26;j++){
            extrasymbols+=hash[j]-eachshouldhave;
        }
        //THE ONES ABOVE THIS POSITION NEED TO GET SOME SYMBOLS FROM OTHERS
        for(int j=0;j<position;j++){
            moves+=(eachshouldhave-hash[j]);
        }
        if(moves<ans)
            ans=moves;
    }
    else
        continue;
}
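The following should fix your implementation:
#include <algorithm>
#include <functional>
#include <iterator>
#include <string>
std::size_t compute_change_needed(const std::string& s)
{
    int count[26] = { 0 };
    for(char c : s) {
        // Assuming only valid char : a-z
        count[c - 'a']++;
    }
    // most frequent letters first: these are the ones worth keeping
    std::sort(std::begin(count), std::end(count), std::greater<int>{});
    std::size_t ans = s.length();
    for(std::size_t i = 1; i != 27; ++i) {
        if(s.length() % i != 0) {
            continue;
        }
        const int expected_count = static_cast<int>(s.length() / i);
        std::size_t moves = 0;
        for(std::size_t j = 0; j != i; j++) {
            // every change fills one missing slot of a kept letter,
            // so only the deficits are counted (surpluses would count each change twice)
            if(count[j] < expected_count) {
                moves += expected_count - count[j];
            }
        }
        ans = std::min(ans, moves);
    }
    return ans;
}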