Splitting a string into maximum number of equal substrings - c++

Given a string, what's the most optimized solution to find the maximum number of equal substrings? For example, "aaaa" is composed of four equal substrings "a", and "abab" is composed of two "ab"s. But for something like "abcd", there is no substring other than "abcd" itself that would make up "abcd" when concatenated with itself.
Checking all possible substrings isn't a solution, since the input can be a string of length 1 million.

Since there is no given condition for the substrings, an optimized solution to find the maximum number of equal substrings is to count the shortest possible substrings, i.e. single letters. Create a map and count the letters of the string. Find the letter with the maximum count. That is your solution.
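A minimal sketch of the letter-counting idea, assuming the string is treated as a sequence of bytes. It uses a fixed 256-entry array rather than a map, and the function name max_letter_count is made up for illustration:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

// Returns how often the most frequent byte occurs in s (0 for an empty string).
std::size_t max_letter_count(const std::string& s) {
    std::array<std::size_t, 256> freq{};  // one counter per possible byte value
    for (unsigned char c : s) {
        ++freq[c];
    }
    return *std::max_element(freq.begin(), freq.end());
}
```

This runs in a single pass over the string, so a length of 1 million is not a problem.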
EDIT:
If the string must consist only of the repeated substring, then the following code computes a solution:
#include <iostream>
#include <string>
using ull = unsigned long long;
int main() {
std::string str = "abab";
ull length = str.length();
for (ull i = 1; (2 * i) <= str.length(); ++i) {
if (str.length() % i != 0) continue; // skip head lengths that do not divide the string length
bool found = true;
for (ull j = 1; (j * i) < str.length(); ++j) {
for (ull k = 0; k < i; ++k) {
if(str[k] != str[k + j * i]) {
found = false;
}
}
}
if(found) {
length = i;
break;
}
}
std::cout << "Maximal number: " << str.length() / length << std::endl;
return 0;
}
This algorithm checks if the head of the string is repeated and if the string only consists of repetitions of the head.
i-loop iterates over the length of the head,
j-loop iterates over each repetition,
k-loop iterates over each character in the substring
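For very long inputs, a different technique than the loops above can be used: the KMP prefix function gives the shortest period of the string in O(n). The following is only a sketch of that alternative, and the name max_repetitions is an assumption, not part of the answer above:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the largest k such that s is k copies of one block (0 for an empty string).
std::size_t max_repetitions(const std::string& s) {
    const std::size_t n = s.size();
    if (n == 0) return 0;
    // pi[i] = length of the longest proper prefix of s[0..i] that is also its suffix
    std::vector<std::size_t> pi(n, 0);
    for (std::size_t i = 1; i < n; ++i) {
        std::size_t k = pi[i - 1];
        while (k > 0 && s[i] != s[k]) k = pi[k - 1];
        if (s[i] == s[k]) ++k;
        pi[i] = k;
    }
    // n - pi[n-1] is the shortest period; it only tiles s if it divides n
    const std::size_t period = n - pi[n - 1];
    return (n % period == 0) ? n / period : 1;
}
```

If the shortest period does not divide the length, no shorter block tiles the string, so the answer falls back to 1.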

Related

Max Length Substring without any repeating character

Given a String, find the length of longest substring without any repeating character.
Example 1:
Input: s = "abcabcbb"
Output: 3
Explanation: The answer is abc with length of 3.
Example 2:
Input: s = "bbbbb"
Output: 1
Explanation: The answer is b with length of 1.
My solution works, but it isn't optimised. How can this be done in O(n) time?
#include<bits/stdc++.h>
using namespace std;
int solve(string str) {
if(str.size()==0)
return 0;
int maxans = INT_MIN;
for (int i = 0; i < str.length(); i++) // outer loop for traversing the string
{
unordered_set < int > set;
for (int j = i; j < str.length(); j++) // nested loop for getting different string starting with str[i]
{
if (set.find(str[j]) != set.end()) // duplicate found: stop extending from i
{
break;
}
set.insert(str[j]);
maxans = max(maxans, j - i + 1); // str[i..j] has no repeats, so record its length
}
}
return maxans;
}
int main() {
string str = "abcsabcds";
cout << "The length of the longest substring without repeating characters is " <<
solve(str);
return 0;
}
Use a two pointer approach along with a hashmap here.
Initialise two pointers i = 0, j = 0 (i and j denote the left and right boundary of the current substring)
If the j-th character is not in the map, we can extend the substring. Add the j-th char to the map and increment j.
If the j-th character is in the map, we can not extend the substring without removing the earlier occurrence of the character. Remove the i-th char from the map and increment i.
Repeat this while j < length of string
This will have a time and space complexity of O(n).
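The steps above can be sketched as follows. This is a minimal illustration that uses a hash set instead of a map, since only membership is needed; the function name longest_unique_substring is an assumption:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>

// Length of the longest substring of s without a repeated character.
std::size_t longest_unique_substring(const std::string& s) {
    std::unordered_set<char> window;  // characters currently in s[i..j)
    std::size_t i = 0, j = 0, best = 0;
    while (j < s.size()) {
        if (window.count(s[j]) == 0) {
            window.insert(s[j]);  // extend the window to the right
            ++j;
            best = std::max(best, j - i);
        } else {
            window.erase(s[i]);   // shrink from the left until s[j] fits
            ++i;
        }
    }
    return best;
}
```

Each character is inserted and erased at most once, which is where the O(n) bound comes from.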
#include <string>
#include <iostream>
#include <vector>
int main() {
// 1
std::string s;
std::cin >> s;
// 2
std::vector<int> lut(127, -1);
int i, beg{ 0 }, len_curr{ 0 }, len_ans{ 0 };
for (i = 0; i != s.size(); ++i) {
if (lut[s[i]] == -1 || lut[s[i]] < beg) {
++len_curr;
}
else {
if (len_curr > len_ans) {
len_ans = len_curr;
}
beg = lut[s[i]] + 1;
len_curr = i - lut[s[i]];
}
lut[s[i]] = i;
}
if (len_curr > len_ans) {
len_ans = len_curr;
}
// 3
std::cout << len_ans << '\n';
return 0;
}
In // 1 you:
Define and read your string s.
In // 2 you:
Define your lookup table lut, a vector of int with 127 buckets, each initialized to -1. As per this article, there are 95 printable ASCII characters, numbered 32 to 126, hence we allocate 127 buckets. lut[ch] is the position in s where you last found the character ch.
Define i (index variable for s), beg (the position in s where your current substring begins), len_curr (the length of your current substring), len_ans (the length you are looking for).
Loop through s. If you have never found the character s[i] before, OR you have found it but at a position BEFORE beg (it belonged to some previous substring in s), you increment len_curr. Otherwise you have a repeating character! You compare len_curr against len_ans and assign if needed. Your new beg will be the position AFTER the one where you last found the repeating character. Your new len_curr will be the difference between your current position in s and the position where you last found the repeating character.
You assign i to lut[s[i]], which means that you last found the character s[i] at position i.
You repeat the if clause after the loop ends, because your longest substring can be at the end of s.
In // 3 you:
Print len_ans.

nth odd digit palindrome number

The following is the code for the nth even length palindrome, but I wanted to know the code for the nth odd length palindrome.
#include <bits/stdc++.h>
using namespace std;
// Function to find nth even length Palindrome
string evenlength(string n)
{
// string res stores the resulting
// palindrome. Initialize it to n
string res = n;
// In this loop res receives the
// reverse of n, appended after
// n itself.
for (int j = n.length() - 1; j >= 0; --j)
res += n[j];
return res;
}
// Driver code
int main()
{
string n = "10";
// Function call
cout << evenlength(n);
return 0;
}
If you want an odd length palindrome (i.e. 101 for an input of 10 or 1234321 for an input of 1234), you might be able to replace the for loop with:
for (int j = static_cast<int>(n.size()) - 2; j >= 0; --j)
res += n[j];
(note the - 2 rather than - 1 in the starting index)
This will skip the last character when adding the reversed version so that the last digit only shows up once.
Since you want to take the input as a string and return a string, you can also make a new variable that is a substring of the input (ignoring the last digit), reverse it, concatenate it to the input and return the result.
string Oddlength(string a)
{
string b;
b=a.substr(0,a.size()-1);
reverse(b.begin(),b.end());
a+=b;
return a;
}
This works because the nth (n >= 10) odd digit palindrome number is just the number concatenated with the reverse of the number ignoring the last digit. For n < 10, it's the number itself.
So the 120th odd digit palindrome number is "120"+ "21"="12021".
1200th odd digit palindrome number is "1200" + "021"="1200021".
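The two examples above can be checked with a small self-contained sketch. The name odd_length_palindrome is hypothetical, and the input is assumed to be the decimal string of n:

```cpp
#include <string>

// Builds the n-th odd-length palindrome from the decimal string n.
std::string odd_length_palindrome(const std::string& n) {
    std::string res = n;
    if (!n.empty()) {
        // append the reverse of n, skipping its last character
        res.append(n.rbegin() + 1, n.rend());
    }
    return res;
}
```

For a one-digit input nothing is appended, so the result is the number itself.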

Given two strings S and T, determine a substring of S that has minimum difference with T

I have two strings S and T, where the length of S >= the length of T. I have to determine a substring of S which has the same length as T and has minimum difference with T. Here, the difference between two strings of the same length means the number of indexes where they differ. For example, "ABCD" and "ABCE" differ only at index 3 (0-based), so their difference is 1.
I know I can use KMP(Knuth Morris Pratt) Pattern Searching algorithm to search T within S. But, what if S doesn't contain T as a substring? So, I have coded a brute force approach to solve this:
int main() {
string S, T;
cin >> S >> T;
int SZ_S = S.size(), SZ_T = T.size(), MinDifference = INT_MAX;
string ans;
for (int i = 0; i + SZ_T <= SZ_S; i++) { // I generate all the substring of S
int CurrentDifference = 0; // and check their difference with T
for (int j = 0; j < SZ_T; j++) { // and store the substring with minimum difference
if (S[i + j] != T[j])
CurrentDifference++;
}
if (CurrentDifference < MinDifference) {
ans = S.substr (i, SZ_T);
MinDifference = CurrentDifference;
}
}
cout << ans << endl;
}
But my approach only works when S and T are short. The problem is that S and T can have lengths as large as 2 * 10^5. How can I approach this?
Let's maximize the number of characters that match. We can solve the problem for each character of the alphabet separately, and then sum up the results for each substring. To solve the problem for a particular character, represent S and T as sequences of 0s and 1s (1 where that character occurs) and multiply them as polynomials, with T reversed, using the FFT https://en.wikipedia.org/wiki/Fast_Fourier_transform.
Complexity: O(|A| * N log N), where |A| is the size of the alphabet (26 for uppercase letters).
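A rough sketch of this idea, assuming uppercase letters 'A' to 'Z' and a textbook iterative radix-2 FFT on std::complex<double>. The names fft and best_window are made up for illustration, and the sketch returns the start index of the best window rather than the substring:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <string>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 FFT; a.size() must be a power of two.
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = 2 * pi / static_cast<double>(len) * (invert ? -1 : 1);
        const cd wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1);
            for (std::size_t j = 0; j < len / 2; ++j) {
                const cd u = a[i + j], v = a[i + j + len / 2] * w;
                a[i + j] = u + v;
                a[i + j + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
    if (invert) for (cd& x : a) x /= static_cast<double>(n);
}

// Start index in S of the length-|T| window with the fewest mismatches
// (the first such window on ties). Assumes letters 'A'..'Z' only.
std::size_t best_window(const std::string& S, const std::string& T) {
    const std::size_t n = S.size(), m = T.size();
    assert(m >= 1 && n >= m);
    std::size_t sz = 1;
    while (sz < n + m) sz <<= 1;
    std::vector<long long> matches(n - m + 1, 0);
    for (char c = 'A'; c <= 'Z'; ++c) {
        std::vector<cd> a(sz), b(sz);
        for (std::size_t i = 0; i < n; ++i) a[i] = (S[i] == c) ? 1.0 : 0.0;
        for (std::size_t j = 0; j < m; ++j) b[m - 1 - j] = (T[j] == c) ? 1.0 : 0.0;  // T reversed
        fft(a, false);
        fft(b, false);
        for (std::size_t i = 0; i < sz; ++i) a[i] *= b[i];
        fft(a, true);
        // coefficient m-1+i counts positions j with S[i+j] == T[j] == c
        for (std::size_t i = 0; i + m <= n; ++i)
            matches[i] += std::llround(a[m - 1 + i].real());
    }
    return static_cast<std::size_t>(
        std::max_element(matches.begin(), matches.end()) - matches.begin());
}
```

The substring itself is then S.substr(best_window(S, T), T.size()). Skipping letters that never occur in T would be an easy speed-up.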

Find the minimum number of moves to get a "Good" string

A string is called good if and only if "all the distinct characters in the string are repeated the same number of times".
Now, given a string of length n, what is the minimum number of changes we have to make so that the string becomes good?
Note : We are only allowed to use lowercase English letters, and we can change any letter to any other letter.
Example : Let the string be yyxzzxxx.
Then the answer is 2.
Explanation : One possible solution yyxyyxxx. We have changed 2 'z' to 2 'y'. Now both 'x' and 'y' are repeated 4 times.
My Approach :
Make a hash of the occurrences of all 26 lowercase letters.
Also find the number of distinct letters in the string.
Sort this hash array and check whether the length of the string is divisible by the number of distinct characters. If yes, then we have the answer.
Else reduce the number of distinct characters by 1.
But it's giving wrong answers for some inputs, as there may be cases where removing some character that does not occur the minimum number of times provides a good string in fewer moves.
So how should this be solved? Please help.
Constraints : Length of string is up to 2000.
My code :
string s;
cin>>s;
int hash[26]={0};
int total=s.length();
for(int i=0;i<26;i++){
hash[s[i]-'a']++;
}
sort(hash,hash+total);
int ans=0;
for(int i=26;i>=1;i--){
int moves=0;
if(total%i==0){
int eachshouldhave=total/i;
int position=26;
for(int j=1;j<26;j++){
if(hash[j]>eachshouldhave && hash[j-1]<eachshouldhave){
position=j;
break;
}
}
int extrasymbols=0;
//THE ONES THAT ARE BELOW OBVIOUSLY NEED TO BE CHANGED TO SOME OTHER SYMBOL
for(int j=position;j<26;j++){
extrasymbols+=hash[j]-eachshouldhave;
}
//THE ONES ABOVE THIS POSITION NEED TO GET SOME SYMBOLS FROM OTHERS
for(int j=0;j<position;j++){
moves+=(eachshouldhave-hash[j]);
}
if(moves<ans)
ans=moves;
}
else
continue;
}
The following should fix your implementation:
std::size_t compute_change_needed(const std::string& s)
{
int count[26] = { 0 };
for(char c : s) {
// Assuming only valid char : a-z
count[c - 'a']++;
}
std::sort(std::begin(count), std::end(count), std::greater<int>{});
std::size_t ans = s.length();
for(std::size_t i = 1; i != 27; ++i) {
if(s.length() % i != 0) {
continue;
}
const int expected_count = s.length() / i;
std::size_t moves = 0;
for(std::size_t j = 0; j != i; j++) {
// each change fills exactly one missing slot, so only deficits are counted
if (count[j] < expected_count) moves += expected_count - count[j];
}
ans = std::min(ans, moves);
}
return ans;
}

How many palindromes can be formed by selections of characters from a string?

I'm posting this on behalf of a friend since I believe this is pretty interesting:
Take the string "abb". By leaving out any number of letters less than the length of the string we end up with 7 strings:
a b b ab ab bb abb
Out of these, 4 are palindromes.
Similarly, for the string
"hihellolookhavealookatthispalindromexxqwertyuiopasdfghjklzxcvbnmmnbvcxzlkjhgfdsapoiuytrewqxxsoundsfamiliardoesit"
(a length 112 string), 2^112 - 1 strings can be formed.
Out of these, how many are palindromes?
Below there is his implementation (in C++, C is fine too though). It's pretty slow with very long words; he wants to know what's the fastest algorithm possible for this (and I'm curious too :D).
#include <iostream>
#include <cstring>
using namespace std;
void find_palindrome(const char* str, const char* max, long& count)
{
for(const char* begin = str; begin < max; begin++) {
count++;
const char* end = strchr(begin + 1, *begin);
while(end != NULL) {
count++;
find_palindrome(begin + 1, end, count);
end = strchr(end + 1, *begin);
}
}
}
int main(int argc, char *argv[])
{
const char* s = "hihellolookhavealookatthis";
long count = 0;
find_palindrome(s, strlen(s) + s, count);
cout << count << endl;
}
First of all, your friend's solution seems to have a bug since strchr can search past max. Even if you fix this, the solution is exponential in time.
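One possible way to fix the out-of-range search, shown here only as a sketch, is to replace strchr with memchr and pass the remaining length up to max. The name find_palindrome_bounded is made up, and the running time is still exponential:

```cpp
#include <cstring>

// Counts palindromes formed by selecting characters from [str, max).
void find_palindrome_bounded(const char* str, const char* max, long& count) {
    for (const char* begin = str; begin < max; ++begin) {
        ++count;  // the single character *begin
        // search for a matching end character, but never at or beyond max
        const char* end = static_cast<const char*>(
            std::memchr(begin + 1, *begin, static_cast<std::size_t>(max - (begin + 1))));
        while (end != nullptr) {
            ++count;  // the pair (*begin, *end) on its own
            // the same pair wrapped around every palindrome strictly between them
            find_palindrome_bounded(begin + 1, end, count);
            end = static_cast<const char*>(
                std::memchr(end + 1, *begin, static_cast<std::size_t>(max - (end + 1))));
        }
    }
}
```

For "abb" this yields 4, matching the example in the question.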
For a faster solution, you can use dynamic programming to solve this in O(n^3) time. This will require O(n^2) additional memory. Note that for long strings, even 64-bit ints as I have used here will not be enough to hold the solution.
#include <cstring> // for strlen
#define MAX_SIZE 1000 // strings must be shorter than this
long long numFound[MAX_SIZE][MAX_SIZE]; //intermediate results, indexed by [startPosition][endPosition]
long long countPalindromes(const char *str) {
int len = strlen(str);
for (int startPos=0; startPos<=len; startPos++)
for (int endPos=0; endPos<=len; endPos++)
numFound[startPos][endPos] = 0;
for (int spanSize=1; spanSize<=len; spanSize++) {
for (int startPos=0; startPos<=len-spanSize; startPos++) {
int endPos = startPos + spanSize;
long long count = numFound[startPos+1][endPos]; //if str[startPos] is not in the palindrome, this will be the count
char ch = str[startPos];
//if str[startPos] is in the palindrome, choose a matching character for the palindrome end
for (int searchPos=startPos; searchPos<endPos; searchPos++) {
if (str[searchPos] == ch)
count += 1 + numFound[startPos+1][searchPos];
}
numFound[startPos][endPos] = count;
}
}
return numFound[0][len];
}
Explanation:
The array numFound[startPos][endPos] will hold the number of palindromes contained in the substring with indexes startPos to endPos.
We go over all pairs of indexes (startPos, endPos), starting from short spans and moving to longer ones. For each such pair, there are two options:
The character at str[startPos] is not in the palindrome. In that case, there are numFound[startPos+1][endPos] possible palindromes - a number that we have calculated already.
The character at str[startPos] is in the palindrome (at its beginning). We scan through the string to find a matching character to put at the end of the palindrome. For each such character, we use the already-calculated results in numFound to find the number of possibilities for the inner palindrome.
EDIT:
Clarification: when I say "number of palindromes contained in a string", this includes non-contiguous substrings. For example, the palindrome "aba" is contained in "abca".
It's possible to reduce memory usage to O(n) by taking advantage of the fact that calculation of numFound[startPos][x] only requires knowledge of numFound[startPos+1][y] for all y. I won't do this here since it complicates the code a bit.
Pregenerating lists of indices containing each letter can make the inner loop faster, but it will still be O(n^3) overall.
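As a point of comparison, and not the method described above, a commonly used inclusion-exclusion recurrence over both ends of the span appears to bring this down to O(n^2) time. The sketch below uses hypothetical names and unsigned 64-bit arithmetic, so for long strings the result is only correct modulo 2^64:

```cpp
#include <cstddef>
#include <string>
#include <vector>

using ull = unsigned long long;

// Counts palindromes formed by selecting characters (by position) from s.
ull count_palindromic_subsequences(const std::string& s) {
    const std::size_t n = s.size();
    if (n == 0) return 0;
    // dp[i][j] = number of palindromic subsequences inside s[i..j], inclusive
    std::vector<std::vector<ull>> dp(n, std::vector<ull>(n, 0));
    for (std::size_t i = 0; i < n; ++i) dp[i][i] = 1;
    for (std::size_t len = 2; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len - 1;
            const ull inner = (len > 2) ? dp[i + 1][j - 1] : 0;
            if (s[i] == s[j]) {
                // the inner count cancels: every inner palindrome can also be
                // wrapped in s[i]..s[j], plus the pair s[i]s[j] itself
                dp[i][j] = dp[i + 1][j] + dp[i][j - 1] + 1;
            } else {
                // palindromes inside s[i+1..j-1] were counted twice
                dp[i][j] = dp[i + 1][j] + dp[i][j - 1] - inner;
            }
        }
    }
    return dp[0][n - 1];
}
```

This table needs O(n^2) memory, so it trades the inner scan for space rather than reducing both.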
I have a way to do it in O(N^2) time and O(1) space, although I think there must be better ways.
The basic idea is that a long palindrome must contain smaller palindromes, so we only search for the minimal matches, of which there are two kinds: "aa" and "aba". If we find either, we expand outwards to see whether it is part of a longer palindrome.
int count_palindromic_slices(const string &S) {
int count = 0;
for (int position=0; position<S.length(); position++) {
int offset = 0;
// Check the "aa" situation
while((position-offset>=0) && (position+offset+1)<S.length() && (S.at(position-offset))==(S.at(position+offset+1))) {
count ++;
offset ++;
}
offset = 1; // reset it for the odd length checking
// Check the string for "aba" situation
while((position-offset>=0) && position+offset<S.length() && (S.at(position-offset))==(S.at(position+offset))) {
count ++;
offset ++;
}
}
return count;
}
June 14th, 2012
After some investigation, I believe this is the best way to do it; it is faster than the accepted answer.
Is there any mileage in making an initial traversal and building an index of all occurrences of each character?
h = { 0, 2, 27}
i = { 1, 30 }
etc.
Now, working from the left with h, the only possible palindromes are at 3 and 17: does char[0 + 1] == char[3 - 1], etc.? If so, we have a palindrome. Does char[0 + 1] == char[27 - 1]? No, so no further analysis of char[0] is needed.
Move on to char[1]: we only need to examine char[30 - 1] and inwards.
Then we can probably get smart: once you've identified a palindrome running from position x to y, all inner subsets are known palindromes, so we've already dealt with some items and can eliminate those cases from later examination.
My solution using O(n) memory and O(n^2) time, where n is the string length:
palindrome.c:
#include <stdio.h>
#include <string.h>
typedef unsigned long long ull;
ull countPalindromesHelper (const char* str, const size_t len, const size_t begin, const size_t end, const ull count) {
if (begin <= 0 || end >= len) {
return count;
}
const char pred = str [begin - 1];
const char succ = str [end];
if (pred == succ) {
const ull newCount = count == 0 ? 1 : count * 2;
return countPalindromesHelper (str, len, begin - 1, end + 1, newCount);
}
return count;
}
ull countPalindromes (const char* str) {
ull count = 0;
size_t len = strlen (str);
size_t i;
for (i = 0; i < len; ++i) {
count += countPalindromesHelper (str, len, i, i, 0); // even length palindromes
count += countPalindromesHelper (str, len, i, i + 1, 1); // odd length palindromes
}
return count;
}
int main (int argc, char* argv[]) {
if (argc < 2) {
return 0;
}
const char* str = argv [1];
ull count = countPalindromes (str);
printf ("%llu\n", count);
return 0;
}
Usage:
$ gcc palindrome.c -o palindrome
$ ./palindrome myteststring
EDIT: I misread the problem as the contiguous substring version of the problem. Now given that one wants to find the palindrome count for the non-contiguous version, I strongly suspect that one could just use a math equation to solve it given the number of distinct characters and their respective character counts.
Hmmmmm, I think I would count up like this:
Each character is a palindrome on its own (minus repeated characters).
Each pair of the same character.
Each pair of the same character, with all palindromes sandwiched in the middle that can be made from the string between repeats.
Apply recursively.
Which seems to be what you're doing, although I'm not sure you don't double-count the edge cases with repeated characters.
So, basically, I can't think of a better way.
EDIT:
Thinking some more,
It can be improved with caching, because you sometimes count the palindromes in the same sub-string more than once. So, I suppose this demonstrates that there is definitely a better way.
Here is a program for finding all the possible palindromes in a string written in both Java and C++.
int main()
{
string palindrome;
cout << "Enter a String to check if it is a Palindrome";
cin >> palindrome;
int length = palindrome.length();
cout << "the length of the string is " << length << endl;
int end = length - 1;
int start = 0;
int check=1;
while (end >= start) {
if (palindrome[start] != palindrome[end]) {
cout << "The string is not a palindrome";
check=0;
break;
}
else
{
start++;
end--;
}
}
if(check)
cout << "The string is a Palindrome" << endl;
}
public String[] findPalindromes(String source) {
Set<String> palindromes = new HashSet<String>();
int count = 0;
for(int i=0; i<source.length()-1; i++) {
for(int j= i+1; j<source.length(); j++) {
String palindromeCandidate = new String(source.substring(i, j+1));
if(isPalindrome(palindromeCandidate)) {
palindromes.add(palindromeCandidate);
}
}
}
return palindromes.toArray(new String[palindromes.size()]);
}
private boolean isPalindrome(String source) {
int i =0;
int k = source.length()-1;
for(i=0; i<source.length()/2; i++) {
if(source.charAt(i) != source.charAt(k)) {
return false;
}
k--;
}
return true;
}
I am not sure, but you might try with Fourier. This problem reminded me of this: O(nlogn) Algorithm - Find three evenly spaced ones within binary string
Just my 2cents