So, this is an interview question that I was going through.
I have strings a, b, and c. I want to obtain string k by swapping some letters in a, so that k should contain as many non-overlapping substrings equal either to b or c as possible. Substring of string x is a string formed by consecutive segment of characters from x. Two substrings of string x overlap if there is position i in string x occupied by both of them.
Input: The first line contains string a, the second line contains string b, and the third line contains string c (1 ≤ |a|, |b|, |c| ≤ 10^5, where |s| denotes the length of string s).
All three strings consist only of lowercase English letters.
It is possible that b and c coincide.
Output: Find one of possible strings k.
Example:
I/P
abbbaaccca
ab
aca
O/P
ababacabcc
This optimal solution has three non-overlapping substrings equal to either b or c, at positions 1-2 (ab), 3-4 (ab), and 5-7 (aca).
Now, the approach that I could think of was to make a character count array for each of the strings and then proceed from there. Basically, iterate over the original string (a) and check for occurrences of b and c. If there are none, swap as many characters as possible to make either b or c (whichever is shorter). But clearly this is not the optimal approach.
Can anyone suggest something better? (Only pseudocode will be enough)
Thanks!
The first thing you'll need to do is count the number of occurrences of each character in each string. The occurrence counts of a will be your knapsack, which you'll need to fill with as many b's or c's as possible.
Note that when I say knapsack I mean the character count vector of a, and inserting b to a will mean reducing the character count vector of a by the character count vector of b.
I'm a little short on a mathematical proof, but you'll need to:
1. Insert as many b's as possible into the knapsack.
2. Insert as many c's as possible into the knapsack (in the space left after step 1).
3. If removing one b from the knapsack would allow more c's to be inserted, remove a b from the knapsack. Otherwise, finish.
4. Fill the knapsack with as many c's as you can.
5. Repeat steps 3-4.
Throughout the program count the number of b and c in the knapsack and the output should be:
[b_count times b][c_count times c][char_occurrence_left_in_knapsack_for_char_x times char_x for each char_x in lower_case_english]
This should solve your problem at O(n).
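For comparison, here is a minimal C++ sketch of a closely related idea. Instead of the remove-and-refill loop above, it simply tries every feasible number of copies of b and, for each, fills the rest of the knapsack with as many copies of c as fit. The names letterCounts and buildK are hypothetical, and the sketch assumes lowercase input with non-empty b and c, as the problem statement guarantees.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <string>

// Count how often each lowercase letter occurs in s.
std::array<int, 26> letterCounts(const std::string& s) {
    std::array<int, 26> counts{};
    for (char ch : s) ++counts[ch - 'a'];
    return counts;
}

// Hypothetical helper: builds one optimal k by trying every feasible
// number of copies of b and filling the remainder with copies of c.
std::string buildK(const std::string& a, const std::string& b, const std::string& c) {
    const auto ca = letterCounts(a), cb = letterCounts(b), cc = letterCounts(c);

    // How many copies of `need` fit once `used` copies of b are taken out of a.
    auto maxCopies = [&](const std::array<int, 26>& need, int used) {
        long long best = LLONG_MAX;
        for (int i = 0; i < 26; ++i) {
            if (need[i] == 0) continue;
            const long long left = ca[i] - 1LL * used * cb[i];
            best = std::min(best, left / need[i]);
        }
        return static_cast<int>(best);
    };

    const int maxB = maxCopies(cb, 0);
    int bestB = 0, bestC = 0;
    for (int nb = 0; nb <= maxB; ++nb) {
        const int nc = maxCopies(cc, nb);
        if (nb + nc > bestB + bestC) { bestB = nb; bestC = nc; }
    }

    // Output layout from the answer: all b's, all c's, then the leftovers.
    std::string k;
    for (int i = 0; i < bestB; ++i) k += b;
    for (int i = 0; i < bestC; ++i) k += c;
    for (int i = 0; i < 26; ++i) {
        const int left = ca[i] - bestB * cb[i] - bestC * cc[i];
        k.append(left, static_cast<char>('a' + i));
    }
    return k;
}
```

The loop runs at most |a| + 1 times with 26 steps each, so the whole thing stays linear in the input size up to the alphabet factor. For the example in the question, buildK("abbbaaccca", "ab", "aca") yields ababacabcc.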
Assuming that allowed characters have ASCII codes 0-127, I would write a function to count the occurrences of each character in a string:
int[] count(String s) {
    // a new int[] is already zero-initialized in Java
    int[] res = new int[128];
    for (int i = 0; i < s.length(); i++)
        res[s.charAt(i)]++;  // index by character code, not by position
    return res;
}
We can now count occurrences in each string:
int[] aCount = count(a);
int[] bCount = count(b);
int[] cCount = count(c);
We can then write a function to count how many times a string can be carved out of characters of another string:
int carveCount(int[] strCount, int[] subStrCount) {
    int min = Integer.MAX_VALUE;
    for (int i = 0; i < subStrCount.length; i++) {
        if (subStrCount[i] == 0)
            continue;
        // number of copies this character alone would allow
        min = Math.min(min, strCount[i] / subStrCount[i]);
    }
    if (min == Integer.MAX_VALUE)
        return 0;  // empty substring: nothing to carve
    // remove the characters used by the carved copies
    for (int i = 0; i < subStrCount.length; i++)
        strCount[i] -= subStrCount[i] * min;
    return min;
}
and call the function:
int bFitCount = carveCount(aCount, bCount);
int cFitCount = carveCount(aCount, cCount);
EDIT: I didn't realize you wanted all characters originally in a, fixing here.
Finally, to produce the output:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < bFitCount; i++)
    sb.append(b);
for (int i = 0; i < cFitCount; i++)
    sb.append(c);
// append whatever characters of a are left over
for (int i = 0; i < aCount.length; i++) {
    for (int j = 0; j < aCount[i]; j++)
        sb.append((char) i);
}
return sb.toString();
One more comment: if the goal is to maximize repetitions(b) + repetitions(c), then you may want to first swap b and c if c is shorter. This way, if they share some characters, you have a better chance of increasing the result.
The algorithm could be optimized further, but as it is it should have complexity O(n), where n is the sum of the length of the three strings.
A related problem is called Knapsack problem.
This is basically the solution described by @Tal Shalti.
I tried to keep everything readable.
My program returns abbcabacac as one of the strings with the most occurrences (3).
To get the permutations without repeating any, I use std::next_permutation from <algorithm>. Note that it only visits permutations that are lexicographically greater than the starting string, so sort word first if every permutation must be considered. There is not much happening in the main function: I only store the number of occurrences and the permutation if a higher number of occurrences was achieved.
int main()
{
    std::string word = "abbbaaccca";
    std::string patternSmall = "ab";
    std::string patternLarge = "aca";
    unsigned int bestOccurrence = 0;
    std::string bestPermutation = "";
    do {
        // count and remove occurrences
        unsigned int occurrences = FindOccurrences(word, patternLarge, patternSmall);
        if (occurrences > bestOccurrence) {
            bestOccurrence = occurrences;
            bestPermutation = word;
            std::cout << word << " .. " << occurrences << std::endl;
        }
    } while (std::next_permutation(word.begin(), word.end()));
    std::cout << "Best Permutation " << bestPermutation << " with " << bestOccurrence << " occurrences." << std::endl;
    return 0;
}
This function handles the basic algorithm. pattern1 is the longer pattern, so it will be searched for last. If a pattern is found, it will be replaced with the string "##", since this should be very rare in the English language.
The variable occurrenceCounter keeps track of the number of occurrences found.
unsigned int FindOccurrences(const std::string& word, const std::string& pattern1, const std::string& pattern2)
{
unsigned int occurrenceCounter = 0;
std::string tmpWord(word);
// start at size_type(-1) so that the first ++i in while() yields 0
std::string::size_type i = -1;
while (FindPattern(tmpWord, pattern2, ++i)) {
occurrenceCounter++;
tmpWord.replace(tmpWord.begin() + i, tmpWord.begin() + i + pattern2.size(), "##");
}
i = -1;
while (FindPattern(tmpWord, pattern1, ++i)) {
occurrenceCounter++;
tmpWord.replace(tmpWord.begin() + i, tmpWord.begin() + i + pattern1.size(), "##");
}
return occurrenceCounter;
}
This function reports whether the pattern was found and, if so, stores the position of the first match in i. If the pattern is not found, string.find(...) returns std::string::npos. Note that string.find(...) starts searching for the pattern at index i.
bool FindPattern(const std::string& word, const std::string& pattern, std::string::size_type& i)
{
std::string::size_type foundPosition = word.find(pattern, i);
if (foundPosition == std::string::npos) {
return false;
}
i = foundPosition;
return true;
}
I have two strings S and T, where length of S >= length of T. I have to find a substring of S that has the same length as T and the minimum difference from T. Here the difference between two strings of the same length means the number of indexes at which they differ. For example, "ABCD" and "ABCE" differ only at index 3 (0-based), so their difference is 1.
I know I can use KMP(Knuth Morris Pratt) Pattern Searching algorithm to search T within S. But, what if S doesn't contain T as a substring? So, I have coded a brute force approach to solve this:
int main() {
string S, T;
cin >> S >> T;
int SZ_S = S.size(), SZ_T = T.size(), MinDifference = INT_MAX;
string ans;
for (int i = 0; i + SZ_T <= SZ_S; i++) { // I generate all the substring of S
int CurrentDifference = 0; // and check their difference with T
for (int j = 0; j < SZ_T; j++) { // and store the substring with minimum difference
if (S[i + j] != T[j])
CurrentDifference++;
}
if (CurrentDifference < MinDifference) {
ans = S.substr (i, SZ_T);
MinDifference = CurrentDifference;
}
}
cout << ans << endl;
}
But my approach is only fast enough when S and T are short. The problem is that S and T can be as long as 2 * 10^5. How can I approach this?
Let's maximize the number of characters that match instead. We can solve the problem for each character of the alphabet separately and then sum the per-character results for each substring position. To solve the problem for a particular character, represent S and T as sequences of 0s and 1s (1 where the string has that character) and multiply them as polynomials, with T reversed, using the FFT: https://en.wikipedia.org/wiki/Fast_Fourier_transform.
The complexity is O(|A| * N log N), where |A| is the size of the alphabet (26 for uppercase letters).
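A rough C++ sketch of this idea follows. It assumes both strings consist only of 'A'..'Z' and that T is non-empty and no longer than S. The names fft and bestWindow are hypothetical, and the transform is a plain textbook radix-2 FFT rather than a tuned one.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <string>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 FFT; data.size() must be a power of two.
void fft(std::vector<cd>& data, bool inverse) {
    const std::size_t n = data.size();
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(data[i], data[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double angle = 2 * pi / static_cast<double>(len) * (inverse ? -1 : 1);
        const cd wLen(std::cos(angle), std::sin(angle));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd u = data[i + k];
                const cd v = data[i + k + len / 2] * w;
                data[i + k] = u + v;
                data[i + k + len / 2] = u - v;
                w *= wLen;
            }
        }
    }
    if (inverse)
        for (cd& x : data) x /= static_cast<double>(n);
}

// Returns the start index in S of the leftmost length-|T| window
// that has the fewest mismatches with T.
std::size_t bestWindow(const std::string& S, const std::string& T) {
    const std::size_t n = S.size(), m = T.size();
    std::size_t size = 1;
    while (size < n + m) size <<= 1;
    std::vector<long long> matches(n - m + 1, 0);
    for (char ch = 'A'; ch <= 'Z'; ++ch) {
        std::vector<cd> fa(size), fb(size);
        for (std::size_t i = 0; i < n; ++i) fa[i] = (S[i] == ch) ? 1.0 : 0.0;
        // T is reversed so that product coefficient i + m - 1 counts
        // the positions where window i and T both contain ch.
        for (std::size_t j = 0; j < m; ++j) fb[m - 1 - j] = (T[j] == ch) ? 1.0 : 0.0;
        fft(fa, false);
        fft(fb, false);
        for (std::size_t i = 0; i < size; ++i) fa[i] *= fb[i];
        fft(fa, true);
        for (std::size_t i = 0; i + m <= n; ++i)
            matches[i] += std::llround(fa[i + m - 1].real());
    }
    // Most matches is the same as fewest differences.
    return static_cast<std::size_t>(
        std::max_element(matches.begin(), matches.end()) - matches.begin());
}
```

The answer is then S.substr(bestWindow(S, T), T.size()). Skipping letters that never occur in T would avoid some of the 26 transforms, at the cost of a little extra bookkeeping.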
I'm trying to find the number of substrings in a string with same first and last characters.
I could solve it in the naive way by taking two for loops.
I feel that it can be solved much more efficiently.
How do I solve it in a more efficient way?
Loop over the string, and count the number of occurrences for each distinct character. Then, if a character occurs only once, there are no substrings ending and beginning with it, if it occurs twice -- there's only 1, if 3 times -- there are 3, if 4 times -- 6. The function is C(n,2) = n!/(2!(n-2)!) = n(n-1)/2.
Here's a potential implementation.
inline int Nchoose2(int n) {
return n*(n-1)/2;
}
std::string s;
std::map<char,int> m;
int cnt = 0;
for (char c : s)
    ++m[c];  // operator[] value-initializes a missing count to 0
for (auto &c : m)
    cnt += Nchoose2(c.second);
Suppose I have a string "abcdpqrs",
now "dcb" can be counted as a substring of above string as the characters are together.
Also "pdq" is a part of above string. But "bcpq" is not. I hope you got what I want.
Is there any efficient way to do this.
All I can think is taking help of hash to do this. But it is taking long time even in O(n) program as backtracking is required in many cases. Any help will be appreciated.
Here is an O(n * alphabet size) solution:
Let's maintain an array count[a] = how many times the character a occurs in the current window [pos; pos + length of substring - 1]. It can be recomputed in O(1) time when the window is moved by 1 to the right (count[s[pos]]--, count[s[pos + substring length]]++, pos++). Now all we need is to check for each pos that the count array is the same as the count array for the substring (which needs to be computed only once).
It can actually be improved to O(n + alphabet size):
Instead of comparing count arrays in a naive way, we can maintain the number diff = the number of characters whose count in the current window differs from their count in the substring. The key observation is that diff changes in an obvious way when we apply count[c]-- or count[c]++ (it gets incremented, gets decremented, or stays the same, depending only on the value of count[c]). The two count arrays are the same if and only if diff is zero for the current pos.
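The diff bookkeeping can be sketched in C++ roughly like this. The sketch assumes both strings contain only 'a'..'z'; the function name containsPermutation is hypothetical, and it stores count differences (window minus pattern) in a single array instead of keeping two separate count arrays.

```cpp
#include <array>
#include <string>

// Returns true if some window of `text` is a rearrangement of `pattern`.
// Assumes both strings contain only 'a'..'z'.
bool containsPermutation(const std::string& text, const std::string& pattern) {
    const std::size_t n = text.size(), m = pattern.size();
    if (m == 0) return true;
    if (m > n) return false;

    // balance[c] = (count of c in the window) - (count of c in the pattern)
    std::array<int, 26> balance{};
    for (char ch : pattern) --balance[ch - 'a'];
    for (std::size_t i = 0; i < m; ++i) ++balance[text[i] - 'a'];

    // diff = number of letters whose balance is non-zero
    int diff = 0;
    for (int b : balance) diff += (b != 0);

    // Change one letter's balance by delta and keep diff up to date in O(1).
    auto adjust = [&](char ch, int delta) {
        int& b = balance[ch - 'a'];
        if (b == 0) ++diff;   // about to become non-zero
        b += delta;
        if (b == 0) --diff;   // became zero again
    };

    for (std::size_t pos = 0;; ++pos) {
        if (diff == 0) return true;       // window [pos, pos + m - 1] matches
        if (pos + m >= n) return false;   // no further window exists
        adjust(text[pos], -1);            // character leaving the window
        adjust(text[pos + m], +1);        // character entering the window
    }
}
```

With the strings from the question, containsPermutation("abcdpqrs", "dcb") and containsPermutation("abcdpqrs", "pdq") are true, while containsPermutation("abcdpqrs", "bcpq") is false.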
Let's say you have the string "axcdlef" and want to search for "opde":
bool compare (string s1, string s2)
{
// sort both here
// return if they are equal when sorted;
}
you would need to call this function for this example with the following substrings of size 4(same as length as "opde"):
"axcd"
"xcdl"
"cdle"
"dlef"
bool exist = false;
for (/*every split that has the same size as the search */)
exist = exist || compare(currentsplit, search);
You can use a regex (e.g. Boost or Qt) for this. Alternatively, you can use this simple approach. You know the length k of the string s to be searched for in string str. So take each run of k consecutive characters from str and check whether every character of s can be found among them.
Starting point ( a naive implementation to make further optimizations):
#include <iostream>
#include <string>
/* pos position where to extract probable string from str
* s string set with possible repetitions being searched in str
* str original string
*/
bool find_in_string( int pos, std::string s, std::string str)
{
std::string str_s = str.substr( pos, s.length());
int s_pos = 0;
while( !s.empty())
{
std::size_t found = str_s.find( s[0]);
if ( found!=std::string::npos)
{
s.erase( 0, 1);
str_s.erase( found, 1);
} else return 0;
}
return 1;
}
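bool find_in_string( std::string s, std::string str)
{
    // guard against unsigned underflow in the loop bound below
    if ( s.length() > str.length()) return false;
    bool found = false;
    std::size_t pos = 0;
    while( !found && pos < str.length() - s.length() + 1)
    {
        found = find_in_string( pos++, s, str);
    }
    return found;
}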
Usage:
int main() {
std::string s1 = "abcdpqrs";
std::string s2 = "adcbpqrs";
std::string searched = "dcb";
std::string searched2 = "pdq";
std::string searched3 = "bcpq";
std::cout << find_in_string( searched, s1);
std::cout << find_in_string( searched, s2);
std::cout << find_in_string( searched2, s1);
std::cout << find_in_string( searched3, s1);
return 0;
}
prints: 1110
http://ideone.com/WrSMeV
To use an array for this you are going to need some extra code to map where each character goes in there... Unless you know you are only using 'a' - 'z' or something similar that you can simply subtract from 'a' to get the position.
bool compare(string s1, string s2)
{
int v1[SIZE_OF_ALFABECT];
int v2[SIZE_OF_ALFABECT];
int count = 0;
map<char, int> mymap;
// here is just pseudocode
foreach letter in s1:
if map doesnt contain this letter already:
mymap[letter] = count++;
// repeat the same foreach in s2
/* You can break and return false here if you try to add new char into map,
that means that the second string has a different character already... */
// count will now have the number of distinct chars that you have in both strs
// you will need to check only 'count' positions in the vectors
for(int i = 0; i < count; i++)
v1[i] = v2[i] = 0;
//another pseudocode
foreach letter in s1:
    v1[mymap[letter]]++;
foreach letter in s2:
    v2[mymap[letter]]++;
for(int i = 0; i < count; i++)
if(v1[i] != v2[i])
return false;
return true;
}
Here is an O(m) best case, O(m!) worst case solution, m being the length of your search string:
Use a suffix trie, e.g. one built with Ukkonen's algorithm (there are some implementations floating around, but I have no link at hand at the moment), and search for any permutation of the substring. Note that any lookup needs just O(1) for each character of the string to search for, regardless of the size of n.
However, while the size of n does not matter, this becomes impractical for large m.
If, however, n is small enough and one is willing to sacrifice index size for lookup performance, the suffix trie can store a string that contains all permutations of the original string.
Then the lookup will always be O(m).
I'd suggest to go with the accepted answer for the general case. However, here you have a suggestion that can perform (much) better for small substrings and large string.
I need to compare strings in the following way. Can anyone provide me with some insight or an algorithm in C++?
For example:
"a5" < "a11" - because 5 is less than 11
"6xxx" < "007asdf" - because 6 < 7
"00042Q" < "42s" - because Q < s alphabetically
"6 8" < "006 9" - because 8 < 9
I suggest you look at the algorithm strverscmp uses - indeed it might be that this function will do the job for you.
What this function does is the following. If both strings are equal,
return 0. Otherwise find the position between two bytes with the
property that before it both strings are equal, while directly after
it there is a difference. Find the largest consecutive digit strings
containing (or starting at, or ending at) this position. If one or
both of these is empty, then return what strcmp(3) would have
returned (numerical ordering of byte values). Otherwise, compare both
digit strings numerically, where digit strings with one or more
leading zeros are interpreted as if they have a decimal point in front
(so that in particular digit strings with more leading zeros come
before digit strings with fewer leading zeros). Thus, the ordering is
000, 00, 01, 010, 09, 0, 1, 9, 10.
Your examples only show digits, letters, and spaces. So for the moment I'll assume you ignore every other symbol (effectively treat them as spaces). You also seem to want to treat uppercase and lowercase letters as equivalent.
It also appears that you interpret runs of digits as a "term" and runs of letters as a "term", with any transition between a letter and a digit being equivalent to a space. A single space is considered equivalent to any number of spaces.
(Note: You are conspicuously missing an example of what to do in cases like:
"5a" vs "a11"
"a5" vs "11a"
So you have to work out what to do when you face a comparison of a numeric term with a string term. You also don't mention intrinsic equalities...such as should "5 a" == "5a" just because "5 a" < "5b"?)
One clear way of doing this would be turn the strings into std::vector of "terms", and then compare these vectors (rather than trying to compare the strings directly). These terms would be either numeric or string. This might help get you started, especially the STL answer:
how to split a string value that contains characters and numbers
Trickier methods that worked on the strings themselves without making an intermediary will be faster in one-off comparisons. But they'll likely be harder to understand and modify, and perhaps slower if you are going to repeatedly compare the same structures.
A nice aspect of parsing into a structure is that you get an intrinsic "cleanup" of the data in the process. Getting the information into a canonical form is often a goal in programs that are tolerating such a variety of inputs.
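As a rough illustration of the "vector of terms" idea, here is a C++ sketch. The names Term, splitTerms, compareTerms, and naturalLess are hypothetical. It follows the assumptions stated above (case-insensitive letters, every other symbol treated as a separator) and additionally assumes that a numeric term sorts before a letter term, which the question leaves open.

```cpp
#include <cctype>
#include <string>
#include <vector>

// One "term": either a run of digits or a run of letters.
struct Term {
    bool isNumber;
    std::string text;  // digits without leading zeros, or lowercased letters
};

// Splits s into terms; anything that is neither digit nor letter separates terms.
std::vector<Term> splitTerms(const std::string& s) {
    std::vector<Term> terms;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char ch = static_cast<unsigned char>(s[i]);
        if (std::isdigit(ch)) {
            std::size_t j = i;
            while (j < s.size() && std::isdigit(static_cast<unsigned char>(s[j]))) ++j;
            std::string digits = s.substr(i, j - i);
            digits.erase(0, digits.find_first_not_of('0'));  // "007" -> "7", "000" -> ""
            terms.push_back({true, digits});
            i = j;
        } else if (std::isalpha(ch)) {
            std::string word;
            while (i < s.size() && std::isalpha(static_cast<unsigned char>(s[i]))) {
                word += static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
                ++i;
            }
            terms.push_back({false, word});
        } else {
            ++i;  // skip spaces and other symbols
        }
    }
    return terms;
}

// Returns a negative, zero, or positive value, like strcmp.
int compareTerms(const Term& a, const Term& b) {
    if (a.isNumber != b.isNumber)
        return a.isNumber ? -1 : 1;  // assumption: numbers sort before words
    if (a.isNumber && a.text.size() != b.text.size())
        return a.text.size() < b.text.size() ? -1 : 1;  // more digits, larger value
    return a.text.compare(b.text);
}

// True if lhs orders strictly before rhs, term by term.
bool naturalLess(const std::string& lhs, const std::string& rhs) {
    const std::vector<Term> a = splitTerms(lhs), b = splitTerms(rhs);
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const int c = compareTerms(a[i], b[i]);
        if (c != 0) return c < 0;
    }
    return a.size() < b.size();  // a proper prefix sorts first
}
```

Numbers are compared as digit strings (length first, then lexicographically), so arbitrarily long digit runs do not overflow an integer type. If the same strings are compared many times, the result of splitTerms could be cached per string.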
I'm assuming that you want the compare to be done in this order: presence of digits in range 1-9; value of digits; number of digits; value of the string after the digits.
It's in C, but you can easily transform it into using the C++ std::string class.
int isdigit(int c)
{
return c >= '1' && c <= '9';
}
int ndigits(const char *s)
{
int i, nd = 0;
int n = strlen(s);
for (i = 0; i < n; i++) {
if (isdigit(s[i]))
nd++;
}
return nd;
}
int compare(const char *s, const char *t)
{
int sd, td;
int i, j;
sd = ndigits(s);
td = ndigits(t);
/* presence of digits */
if (!sd && !td)
return strcasecmp(s, t);
else if (!sd)
return 1;
else if (!td)
return -1;
/* value of digits */
for (i = 0, j = 0; i < sd && j < td; i++, j++) {
while (! isdigit(*s))
s++;
while (! isdigit(*t))
t++;
if (*s != *t)
return *s - *t;
s++;
t++;
}
/* number of digits */
if (i < sd)
return 1;
else if (j < td)
return -1;
/* value of string after last digit */
return strcasecmp(s, t);
}
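Try this and read about std::string::compare:
#include <iostream>
#include <string>
int main() {
    std::string fred = "a5";
    std::string joe = "a11";
    char x;
    // compare() returns a negative value when fred sorts before joe.
    // It compares character by character, so "a5" sorts after "a11" here.
    if (fred.compare(joe) < 0) {
        std::cout << "fred is less than joe" << std::endl;
    } else {
        std::cout << "joe is less than fred" << std::endl;
    }
    std::cin >> x;  // wait for input before exiting
}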
I'm posting this on behalf of a friend since I believe this is pretty interesting:
Take the string "abb". By leaving out
any number of letters less than the
length of the string we end up with 7
strings.
a b b ab ab bb abb
Out of these 4 are palindromes.
Similarly for the string
"hihellolookhavealookatthispalindromexxqwertyuiopasdfghjklzxcvbnmmnbvcxzlkjhgfdsapoiuytrewqxxsoundsfamiliardoesit"
(a length 112 string) 2^112 - 1
strings can be formed.
Out of these how many are
palindromes??
Below there is his implementation (in C++, C is fine too though). It's pretty slow with very long words; he wants to know what's the fastest algorithm possible for this (and I'm curious too :D).
#include <iostream>
#include <cstring>
using namespace std;
void find_palindrome(const char* str, const char* max, long& count)
{
for(const char* begin = str; begin < max; begin++) {
count++;
const char* end = strchr(begin + 1, *begin);
while(end != NULL) {
count++;
find_palindrome(begin + 1, end, count);
end = strchr(end + 1, *begin);
}
}
}
int main(int argc, char *argv[])
{
const char* s = "hihellolookhavealookatthis";
long count = 0;
find_palindrome(s, strlen(s) + s, count);
cout << count << endl;
}
First of all, your friend's solution seems to have a bug since strchr can search past max. Even if you fix this, the solution is exponential in time.
For a faster solution, you can use dynamic programming to solve this in O(n^3) time. This will require O(n^2) additional memory. Note that for long strings, even 64-bit ints as I have used here will not be enough to hold the solution.
#define MAX_SIZE 1000
long long numFound[MAX_SIZE][MAX_SIZE]; //intermediate results, indexed by [startPosition][endPosition]
long long countPalindromes(const char *str) {
int len = strlen(str);
for (int startPos=0; startPos<=len; startPos++)
for (int endPos=0; endPos<=len; endPos++)
numFound[startPos][endPos] = 0;
for (int spanSize=1; spanSize<=len; spanSize++) {
for (int startPos=0; startPos<=len-spanSize; startPos++) {
int endPos = startPos + spanSize;
long long count = numFound[startPos+1][endPos]; //if str[startPos] is not in the palindrome, this will be the count
char ch = str[startPos];
//if str[startPos] is in the palindrome, choose a matching character for the palindrome end
for (int searchPos=startPos; searchPos<endPos; searchPos++) {
if (str[searchPos] == ch)
count += 1 + numFound[startPos+1][searchPos];
}
numFound[startPos][endPos] = count;
}
}
return numFound[0][len];
}
Explanation:
The array numFound[startPos][endPos] will hold the number of palindromes contained in the substring with indexes startPos to endPos.
We go over all pairs of indexes (startPos, endPos), starting from short spans and moving to longer ones. For each such pair, there are two options:
The character at str[startPos] is not in the palindrome. In that case, there are numFound[startPos+1][endPos] possible palindromes - a number that we have calculated already.
character at str[startPos] is in the palindrome (at its beginning). We scan through the string to find a matching character to put at the end of the palindrome. For each such character, we use the already-calculated results in numFound to find number of possibilities for the inner palindrome.
EDIT:
Clarification: when I say "number of palindromes contained in a string", this includes non-contiguous substrings. For example, the palindrome "aba" is contained in "abca".
It's possible to reduce memory usage to O(n) by taking advantage of the fact that calculation of numFound[startPos][x] only requires knowledge of numFound[startPos+1][y] for all y. I won't do this here since it complicates the code a bit.
Pregenerating lists of indices containing each letter can make the inner loop faster, but it will still be O(n^3) overall.
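For reference, a different recurrence brings the time down to O(n^2). It is not the approach described above, but a standard inclusion-exclusion DP over index ranges. The sketch below is in C++, the name countPalindromicSubsequences is hypothetical, and the count wraps modulo 2^64 once the true value no longer fits.

```cpp
#include <string>
#include <vector>

// Counts non-empty palindromic subsequences of s, where subsequences taken
// from different positions count separately (so "abb" gives 4: a, b, b, bb).
// The result wraps modulo 2^64 on overflow.
unsigned long long countPalindromicSubsequences(const std::string& s) {
    const std::size_t n = s.size();
    if (n == 0) return 0;
    // dp[i][j] = number of palindromic subsequences inside s[i..j] (inclusive)
    std::vector<std::vector<unsigned long long>> dp(
        n, std::vector<unsigned long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i) dp[i][i] = 1;
    for (std::size_t len = 2; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len - 1;
            // palindromes strictly inside (i, j); empty range for len == 2
            const unsigned long long inner = (len > 2) ? dp[i + 1][j - 1] : 0;
            if (s[i] == s[j]) {
                // every inner palindrome can also be wrapped by s[i]..s[j],
                // plus the two-character palindrome s[i]s[j] itself
                dp[i][j] = dp[i + 1][j] + dp[i][j - 1] + 1;
            } else {
                // inner palindromes were counted in both halves; remove one copy
                dp[i][j] = dp[i + 1][j] + dp[i][j - 1] - inner;
            }
        }
    }
    return dp[0][n - 1];
}
```

When the end characters match, the subtracted "inner" term and the added "wrapped inner" term cancel, which is why only the + 1 remains in that branch.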
I have a way to do it in O(N^2) time and O(1) space; however, I think there must be better ways.
The basic idea is that a long palindrome must contain small palindromes, so we only search for the minimal matches, which come in two kinds: "aa" and "aba". If we find either, we then expand outwards to see whether it is part of a longer palindrome.
int count_palindromic_slices(const string &S) {
int count = 0;
for (int position=0; position<S.length(); position++) {
int offset = 0;
// Check the "aa" situation
while((position-offset>=0) && (position+offset+1)<S.length() && (S.at(position-offset))==(S.at(position+offset+1))) {
count ++;
offset ++;
}
offset = 1; // reset it for the odd length checking
// Check the string for "aba" situation
while((position-offset>=0) && position+offset<S.length() && (S.at(position-offset))==(S.at(position+offset))) {
count ++;
offset ++;
}
}
return count;
}
June 14th, 2012
After some investigation, I believe this is the best way to do it.
faster than the accepted answer.
Is there any mileage in making an initial traversal and building an index of all occurrences of each character?
h = { 0, 2, 27}
i = { 1, 30 }
etc.
Now working from the left, h, only possible palindromes are at 3 and 17, does char[0 + 1] == char [3 -1] etc. got a palindrome. does char [0+1] == char [27 -1] no, No further analysis of char[0] needed.
Move on to char[1], only need to examine char[30 -1] and inwards.
Then can probably get smart, when you've identified a palindrome running from position x->y, all inner subsets are known palindromes, hence we've dealt with some items, can eliminate those cases from later examination.
My solution using O(n) memory and O(n^2) time, where n is the string length:
palindrome.c:
#include <stdio.h>
#include <string.h>
typedef unsigned long long ull;
ull countPalindromesHelper (const char* str, const size_t len, const size_t begin, const size_t end, const ull count) {
if (begin <= 0 || end >= len) {
return count;
}
const char pred = str [begin - 1];
const char succ = str [end];
if (pred == succ) {
const ull newCount = count == 0 ? 1 : count * 2;
return countPalindromesHelper (str, len, begin - 1, end + 1, newCount);
}
return count;
}
ull countPalindromes (const char* str) {
ull count = 0;
size_t len = strlen (str);
size_t i;
for (i = 0; i < len; ++i) {
count += countPalindromesHelper (str, len, i, i, 0); // even length palindromes
count += countPalindromesHelper (str, len, i, i + 1, 1); // odd length palindromes
}
return count;
}
int main (int argc, char* argv[]) {
if (argc < 2) {
return 0;
}
const char* str = argv [1];
ull count = countPalindromes (str);
printf ("%llu\n", count);
return 0;
}
Usage:
$ gcc palindrome.c -o palindrome
$ ./palindrome myteststring
EDIT: I misread the problem as the contiguous substring version of the problem. Now given that one wants to find the palindrome count for the non-contiguous version, I strongly suspect that one could just use a math equation to solve it given the number of distinct characters and their respective character counts.
Hmmmmm, I think I would count up like this:
Each character is a palindrome on its own (minus repeated characters).
Each pair of the same character.
Each pair of the same character, with all palindromes sandwiched in the middle that can be made from the string between repeats.
Apply recursively.
Which seems to be what you're doing, although I'm not sure you don't double-count the edge cases with repeated characters.
So, basically, I can't think of a better way.
EDIT:
Thinking some more,
It can be improved with caching, because you sometimes count the palindromes in the same sub-string more than once. So, I suppose this demonstrates that there is definitely a better way.
Here is a C++ program that checks whether a string is a palindrome, followed by Java code that finds all the palindromic substrings of a string.
int main()
{
string palindrome;
cout << "Enter a String to check if it is a Palindrome";
cin >> palindrome;
int length = palindrome.length();
cout << "the length of the string is " << length << endl;
int end = length - 1;
int start = 0;
int check=1;
while (end >= start) {
if (palindrome[start] != palindrome[end]) {
cout << "The string is not a palindrome";
check=0;
break;
}
else
{
start++;
end--;
}
}
if(check)
cout << "The string is a Palindrome" << endl;
}
public String[] findPalindromes(String source) {
Set<String> palindromes = new HashSet<String>();
int count = 0;
for(int i=0; i<source.length()-1; i++) {
for(int j= i+1; j<source.length(); j++) {
String palindromeCandidate = new String(source.substring(i, j+1));
if(isPalindrome(palindromeCandidate)) {
palindromes.add(palindromeCandidate);
}
}
}
return palindromes.toArray(new String[palindromes.size()]);
}
private boolean isPalindrome(String source) {
int i =0;
int k = source.length()-1;
for(i=0; i<source.length()/2; i++) {
if(source.charAt(i) != source.charAt(k)) {
return false;
}
k--;
}
return true;
}
I am not sure, but you might try it with Fourier transforms. This problem reminded me of this: O(nlogn) Algorithm - Find three evenly spaced ones within binary string
Just my 2cents