How to get count of next combinations for given set?

I've edited original text to save potential readers some time and health. Maybe someone will actually use this.
I know it's basic stuff. Probably like very, very basic.
How to get all possible combinations of given set.
E.g.
string set = "abc";
I expect to get:
a b c aa ab ac aaa aab aac aba abb abc aca acb acc baa bab ...
and the list goes on (if no limit for length is set).
I'm looking for very clean code for that; everything I've found was kind of dirty and did not work correctly. I can say the same about the code I wrote.
I need such code because I'm writing a brute-force (md5) implementation that works on multiple threads. The pattern is that there's a parent process that feeds threads with chunks of their very own combinations, so they can work on these on their own.
Example: the first thread gets a package of 100 permutations, the second gets the next 100, etc.
Let me know if I should post the final program anywhere.
EDIT #2
Once again thank you guys.
Thanks to you I've finished my Slave/Master Brute-Force application implemented with MPICH2 (yep, can work under linux and windows across for example network) and since the day is almost over here, and I've already wasted a lot of time (and sun) I'll proceed with my next task ... :)
You've shown me that the StackOverflow community is awesome - thanks!

Here's some C++ code that generates permutations of a power set up to a given length.
The function getPowPerms takes a set of characters (as a vector of strings) and a maximum length, and returns a vector of permuted strings:
#include <iostream>
using std::cout;
#include <string>
using std::string;
#include <vector>
using std::vector;
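vector<string> getPowPerms( const vector<string>& set, unsigned length ) {
if( length == 0 ) return vector<string>();
if( length == 1 ) return set;
vector<string> substrs = getPowPerms(set,length-1);
vector<string> result = substrs;
for( unsigned i = 0; i < substrs.size(); ++i ) {
// extend only the longest strings (assumes single-character set elements);
// extending the shorter ones would repeat strings already in the result
if( substrs[i].size() != length-1 ) continue;
for( unsigned j = 0; j < set.size(); ++j ) {
result.push_back( set[j] + substrs[i] );
}
}
return result;
}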
int main() {
const int MAX_SIZE = 3;
string str = "abc";
vector<string> set; // use vector for ease-of-access
for( unsigned i = 0; i < str.size(); ++i ) set.push_back( str.substr(i,1) );
vector<string> perms = getPowPerms( set, MAX_SIZE );
for( unsigned i = 0; i < perms.size(); ++i ) cout << perms[i] << '\n';
}
When run, this example prints
a b c aa ba ca ab bb cb ... acc bcc ccc
Update: I'm not sure if this is useful, but here is a "generator" function called next that creates the next item in the list given the current item.
Perhaps you could generate the first N items and send them somewhere, then generate the next N items and send them somewhere else.
string next( const string& cur, const string& set ) {
string result = cur;
bool carry = true;
int loc = cur.size() - 1;
char last = *set.rbegin(), first = *set.begin();
while( loc >= 0 && carry ) {
if( result[loc] != last ) { // increment
string::size_type found = set.find(result[loc]);
if( found != string::npos && found + 1 < set.size() ) {
result[loc] = set.at(found+1);
}
carry = false;
} else { // reset and carry
result[loc] = first;
}
--loc;
}
if( carry ) { // overflow
result.insert( result.begin(), first );
}
return result;
}
int main() {
string set = "abc";
string cur = "a";
for( int i = 0; i < 20; ++i ) {
cout << cur << '\n'; // displays a b c aa ab ac ba bb bc ...
cur = next( cur, set );
}
}
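If each worker should start at its own chunk without replaying everything before it, one option is to compute the item at a given position directly. The following is only a sketch under that assumption: nthWord is a hypothetical helper (it is not part of the answer above) that treats the position as a number in bijective base-n, where n is the size of the set.

```cpp
#include <string>

// Hypothetical helper: returns the item at zero-based position `index`
// of the sequence a, b, c, aa, ab, ... built from `set`.
// Assumes `set` is non-empty.
std::string nthWord(unsigned long long index, const std::string& set) {
    const unsigned long long base = set.size();
    std::string word;
    unsigned long long n = index + 1;   // switch to 1-based numbering
    while (n > 0) {
        --n;                            // bijective numeration has no zero digit
        word.insert(word.begin(), set[n % base]);
        n /= base;
    }
    return word;
}
```

With chunks of 100 items, worker k could start at nthWord(k * 100, set) and then call next() 99 times to walk through the rest of its chunk.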

C++ has a function next_permutation(), but I don't think that's what you want.
You should be able to do it quite easily with a recursive function. e.g.
void combinations(string s, int len, string prefix) {
if (len<1) {
cout << prefix << endl;
} else {
for (int i=0;i<s.size();i++) {
combinations(s, len-1, prefix + s[i]);
}
}
}
EDIT: For the threading part, I assume you are working on a password brute forcer?
If so, I guess the password testing part is what you want to speed up rather than password generation.
Therefore, you could simply create a parent process which generates all combinations, then every kth password is given to thread k mod N (where N is the number of threads) for checking.

Python's standard library also offers a permutations function, although your question is about C++.
http://docs.python.org/library/itertools.html#itertools.permutations
However, your list is an infinite sequence in which each character may repeat, so I think you should first define how the items are ordered and state your algorithm clearly.

I can't give you the code, but what you need is a recursive algorithm. Here is some pseudo code.
The idea is simple: concatenate each string in your set with every other string, then permute the strings. Add all the smaller strings to your set and do the same thing again with the new set. Keep going till you are tired :)
Might be a bit confusing but think about it a little ;)
set = { "a", "b", "c"}
build_combinations(set)
{
new_set={}
for( Element in set ){
new_set.add(Element);
for( other_element in set ){
new_element = concatenate(Element, other_element);
new_set.add(new_element);
}
}
new_set = permute_all_elements(new_set);
return build_combinations(new_set);
}
This will obviously cause a stack overflow because there is no terminating condition :) so put whatever condition you like into the build_combinations function (maybe the size of the set?) to terminate the recursion.

Here's an odd and normally not ideal way of doing it, but hey, it works, and it doesn't use recursion :-)
void permutations(char c[], int l) // l is the length of c
{
int length = 1;
while (length < 5)
{
for (int j = 0; j < int(pow(double(l), double(length))); j++) // for each word of a particular length
{
for (int i = 0; i < length; i++) // for each character in a word
{
cout << c[(j / int(pow(double(l), double(length - i - 1))) % l)];
}
cout << endl;
}
length++;
}
}

I know you've got a perfectly good answer already (multiple ones in fact), but I was thinking a bit about this problem and I came up with a pretty neat algorithm that I might as well share.
Basically, you can do this by starting with a list of the symbols, and then appending each symbol to each other symbol to make two symbol words, and then appending each symbol to each word. That might not make much sense like that, so here's what it looks like:
Start with 'a', 'b' and 'c' as the symbols and add them to a list:
a
b
c
Append 'a', 'b' and 'c' to each word in the list. The list then looks like:
a
b
c
aa
ab
ac
ba
bb
bc
ca
cb
cc
Then append 'a', 'b' and 'c' to each new word in the list so the list will look like this:
a
b
c
aa
ab
ac
ba
bb
bc
ca
cb
cc
aaa
aab
aac
aba
abb
... and so on
You can do this easily by using an iterator and just let the iterator keep going from the start.
This code prints out each word that is added to the list.
void permutations(string symbols)
{
list<string> l;
// add each symbol to the list
for (int i = 0; i < symbols.length(); i++)
{
l.push_back(symbols.substr(i, 1));
cout << symbols.substr(i, 1) << endl;
}
// infinite loop that looks at each word in the list
for (list<string>::iterator it = l.begin(); it != l.end(); it++)
{
// append each symbol to the current word and add it to the end of the list
for (int i = 0; i < symbols.length(); i++)
{
string s(*it);
s.push_back(symbols[i]);
l.push_back(s);
cout << s << endl;
}
}
}

a Python example:
import itertools
import string
characters = string.ascii_lowercase
max_length = 3
count = 1
while count < max_length+1:
    for current_tuple in itertools.product(characters, repeat=count):
        current_string = "".join(current_tuple)
        print(current_string)
    count += 1
The output is exactly what you expect to get:
a b c aa ab ac aaa aab aac aba abb abc aca acb acc baa bab ...
(the example uses the whole set of ASCII lowercase characters; change the assignment to characters = ['a','b','c'] to reduce the size of the output)

What you want is called Permutation.
Check this for a Permutation implementation in java

Related

Checking if items from a particular txt file agree to constraints in c++ - Name That Number USACO

I have got some doubts while solving - Name That Number.
It goes like this -
Among the large Wisconsin cattle ranchers, it is customary to brand cows with serial numbers to please the Accounting Department. The cowhands don't appreciate the advantage of this filing system, though, and wish to call the members of their herd by a pleasing name rather than saying, "C'mon, #4734, get along."
Help the poor cowhands out by writing a program that will translate the brand serial number of a cow into possible names uniquely associated with that serial number. Since the cowhands all have cellular saddle phones these days, use the standard Touch-Tone(R) telephone keypad mapping to get from numbers to letters (except for "Q" and "Z"):
2: A,B,C 5: J,K,L 8: T,U,V
3: D,E,F 6: M,N,O 9: W,X,Y
4: G,H,I 7: P,R,S
Acceptable names for cattle are provided to you in a file named "dict.txt", which contains a list of fewer than 5,000 acceptable cattle names (all letters capitalized). Take a cow's brand number and report which of all the possible words to which that number maps are in the given dictionary which is supplied as dict.txt in the grading environment (and is sorted into ascending order).
For instance, brand number 4734 produces all the following names:
GPDG GPDH GPDI GPEG GPEH GPEI GPFG GPFH GPFI GRDG GRDH GRDI
GREG GREH GREI GRFG GRFH GRFI GSDG GSDH GSDI GSEG GSEH GSEI
GSFG GSFH GSFI HPDG HPDH HPDI HPEG HPEH HPEI HPFG HPFH HPFI
HRDG HRDH HRDI HREG HREH HREI HRFG HRFH HRFI HSDG HSDH HSDI
HSEG HSEH HSEI HSFG HSFH HSFI IPDG IPDH IPDI IPEG IPEH IPEI
IPFG IPFH IPFI IRDG IRDH IRDI IREG IREH IREI IRFG IRFH IRFI
ISDG ISDH ISDI ISEG ISEH ISEI ISFG ISFH ISFI
As it happens, the only one of these 81 names that is in the list of valid names is "GREG".
Write a program that is given the brand number of a cow and prints all the valid names that can be generated from that brand number or ``NONE'' if there are no valid names. Serial numbers can be as many as a dozen digits long.
Here is what I tried to solve this problem. Just go through all the names in the list and check which is satisfying the constraints given.
int numForChar(char c){
if (c=='A'||c=='B'||c=='C') return 2;
else if(c=='D'||c=='E'||c=='F') return 3;
else if(c=='G'||c=='H'||c=='I') return 4;
else if(c=='J'||c=='K'||c=='L') return 5;
else if(c=='M'||c=='N'||c=='O') return 6;
else if(c=='P'||c=='R'||c=='S') return 7;
else if(c=='T'||c=='U'||c=='V') return 8;
else if(c=='W'||c=='X'||c=='Y') return 9;
else return 0;
int main(){
ios::sync_with_stdio(0);
cin.tie(0);
freopen("namenum.in","r",stdin);
freopen("namenum.out","w",stdout);
string S; cin >> S;
int len = S.length();
freopen("dict.txt","r",stdin);
string x;
while(cin >> x){
string currName = x;
if(currName.length() != S.length()) continue;
string newString = x;
for(int i=0;i<len;i++){
//now encode the name as a number according to the rules
int num = numForChar(currName[i]);
currName[i] = (char)num;
}
if(currName == S){
cout << newString << "\n";
}
}
return 0;
}
Unfortunately, when I submit it to the judge, it says no output was produced; that is, my program created an empty output file. What could be going wrong?
Any help would be much appreciated. Thank you.
UPDATE: I tried what Some Programmer Dude suggested by adding the statement else return 0; at the end of the numForChar function to handle any other letter. Unfortunately, it didn't work.

So after looking further at the question and exploring the information for Name That Number, I realized that it is not a current contest, just a practice challenge. Thus, I updated my answer and am also giving you my version of a successful submission. That is a spoiler, though, so it comes after the explanation of why your code was not working.
First, you forgot a } after the definition of your numForChar() function. Second, you did not implement anything to handle the case where the input fails to yield a valid name. Third, when you call numForChar() on a character of currName, the function yields an integer value. That in itself is not a problem; the problem is that it is a raw number, not the ASCII code of a digit. You then compare it against a character of the input string, which holds the ASCII code of a digit. Thus, your code can never find a match. To fix that, you can add 48 (the ASCII code of '0') to the return value of numForChar(), or XOR the return value with 48.
You are on the right track with your method, but here are a few hints. If you are bored, you can always skip to the spoiler. You don't need the numForChar() function to get a digit from a character; you can use a constant array instead. A constant array lookup is faster than that many if statements.
For example, you know that A, B, and C yield 2, and that A's ASCII code is 65, B's is 66, and C's is 67. For those three letters, the array holds a 2 at each of the indices 0, 1, and 2. Thus, if you get B, subtracting 65 from B's ASCII code yields 1, and that is the index to read the value from.
To go from a number to a character, you can use a two-dimensional array of char instead. Skip the first two indices, 0 and 1. Each remaining first-level index holds an array of the 3 characters that belong to that digit.
For the dictionary comparison, it is right that we don't need to look at a word whose length differs from the input's. Besides that, since the dictionary words are sorted, we can skip a word if its first letter is lower than the range of letters for the first input digit. On the other hand, once a word's first letter is higher than the highest letter for the first input digit, there is no point in continuing the search. Take note that my English in code comments is almost always bad unless I extensively document it.
Your Code(fixed):
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int numForChar(char c){
if (c=='A'||c=='B'||c=='C') return 2;
else if(c=='D'||c=='E'||c=='F') return 3;
else if(c=='G'||c=='H'||c=='I') return 4;
else if(c=='J'||c=='K'||c=='L') return 5;
else if(c=='M'||c=='N'||c=='O') return 6;
else if(c=='P'||c=='R'||c=='S') return 7;
else if(c=='T'||c=='U'||c=='V') return 8;
else if(c=='W'||c=='X'||c=='Y') return 9;
else return 0;
}
int main(){
ios::sync_with_stdio(0);
cin.tie(0);
ifstream fin("namenum.in");
ifstream dict("dict.txt");
ofstream fout("namenum.out");
string S;
fin >> S;
int len = S.length();
bool match = false;
string x;
while(dict >> x){
string currName = x;
if(currName.length() != S.length()) continue;
string newString = x;
for(int i=0;i<len;i++){
//now encode the name as a number according to the rules
int num = numForChar(currName[i]) ^ 48;
currName[i] = (char)num;
}
if(currName == S){
fout << newString << "\n";
match = true;
}
}
if ( match == false ){
fout << "NONE" << endl;
}
return 0;
}
Spoiler Code(Improved):
#include <fstream>
#include <string>
using namespace std;
// A = 65
// 65 - 0 = 65
const char wToN[] = {
// A ,B ,C ,D ,E ,F ,G ,H ,I ,
'2','2','2','3','3','3','4','4','4',
// J ,K ,L ,M ,N ,O ,P ,Q ,R ,S
'5','5','5','6','6','6','7','7','7','7',
// T ,U ,V ,W ,X ,Y ,Z
'8','8','8','9','9','9','9'
};
// 2 = {A, B, C} = 2[0] = A, 2[1] = B, 2[2] C
const char nToW[10][3] = {
{}, // 0 skip
{}, // 1
{'A','B','C'},
{'D','E','F'},
{'G','H','I'},
{'J','K','L'},
{'M','N','O'},
{'P','R','S'},
{'T','U','V'},
{'W','X','Y'}
};
int main(){
ifstream fin("namenum.in");
ifstream dict("dict.txt");
ofstream fout("namenum.out");
string S;
fin >> S;
// Since this will not change
// make this a const to make it
// run faster.
const int len = S.length();
// lastlen is last Index of length
// We calculate this value here,
// So we do not have to calculate
// it for every loop.
const int lastLen = len - 1;
int i = 0;
unsigned char firstLetter[3];
// If not match print None
bool match = false;
for ( ; i < len; i++ ){
// No need to check upper bound
// constrain did not call for check.
if ( S[i] < '2' ) {
fout << "NONE" << endl;
return 0;
}
}
const char digit1 = S[0] ^ 48;
// There are 3 candidate first letters.
// We get them by converting the first
// digit using the nToW array.
firstLetter[0] = nToW[digit1][0];
firstLetter[1] = nToW[digit1][1];
firstLetter[2] = nToW[digit1][2];
string dictStr;
while(dict >> dictStr){
// For some reason, when keeping the i = 0 here
// it seem to work faster. That could be because of compiler xor.
i = 0;
// If it is higher than our range
// then there is no point continuing.
if ( dictStr[0] > firstLetter[2] ) break;
// Skip if first character is lower
// than our range. or If they are not equal in length
if ( dictStr[0] < firstLetter[0] || dictStr.length() != len ) continue;
// If we are in the letter range
// we always check the second letter
// not the first, since we skip the first
i = 1;
for ( int j = 1; j < len; j++ ){
// We convert each letter in the word
// to the corresponding int value
// by subtracting the word ASCII value
// to 65 and use it again our wToN array.
// if it does not match the digits at
// this current position we end the loop.
if ( wToN[dictStr[i] - 65] != S[j] ) break;
// if we get here and there isn't an unmatch then it is a match.
if ( j == lastLen ) {
match = true;
fout << dictStr << endl;
break;
}
i++;
}
}
// No match print none.
if ( match == false ){
fout << "NONE" << endl;
}
return 0;
}

I suggest you use C++ file streams. Overwriting stdin and stdout doesn't seem appropriate.
Add these,
std::ifstream dict ("dict.txt");
std::ofstream fout ("namenum.out");
std::ifstream fin ("namenum.in");
Accordingly change,
cin >> S --to--> fin >> S;
cin >> x --to--> dict >> x
cout << newString --to--> fout << newString

String having maximum number of given substrings made after swapping some characters?

So, this is an interview question that I was going through.
I have strings a, b, and c. I want to obtain string k by swapping some letters in a, so that k should contain as many non-overlapping substrings equal either to b or c as possible. Substring of string x is a string formed by consecutive segment of characters from x. Two substrings of string x overlap if there is position i in string x occupied by both of them.
Input: The first line contains string a, the second line contains string b, and the third line contains string c (1 ≤ |a|, |b|, |c| ≤ 10^5, where |s| denotes the length of string s).
All three strings consist only of lowercase English letters.
It is possible that b and c coincide.
Output: Find one of possible strings k.
Example:
I/P
abbbaaccca
ab
aca
O/P
ababacabcc
This optimal solution has three non-overlapping substrings equal to either b or c, at positions 1 – 2 (ab), 3 – 4 (ab), and 5 – 7 (aca).
Now, the approach that I could think of was to make a character count array for each of the strings and proceed from there. Basically, iterate over the original string (a) and check for occurrences of b and c. If there are none, swap as many characters as possible to make either b or c (whichever is shorter). But clearly this is not the optimal approach.
Can anyone suggest something better? (Only pseudocode will be enough)
Thanks!

The first thing you'll need to do is count the number of occurrences of each character in each string. The occurrence counts of a will be your knapsack, which you'll need to fill with as many b's or c's as possible.
Note that when I say knapsack I mean the character count vector of a, and inserting b into a means reducing the character count vector of a by the character count vector of b.
I'm a little short on a mathematical proof, but you'll need to:
1. Insert as many b as possible into the knapsack.
2. Insert as many c as possible into the knapsack (in the space that is left after step 1).
3. If removing a b from the knapsack enables the insertion of more c, remove a b from the knapsack. Otherwise, finish.
4. Insert as many c as you can into the knapsack.
5. Repeat steps 3-4.
Throughout the program, keep count of the number of b and c in the knapsack; the output should be:
[b_count times b][c_count times c][char_occurrence_left_in_knapsack_for_char_x times char_x for each char_x in lower_case_english]
This should solve your problem in O(n).
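The counting idea can be sketched in C++ as follows. Note that this sketch does not follow the remove-and-refill steps literally: it swaps in a plain enumeration that tries every feasible number of copies of b and fills the remaining letters with c. All function names here (countLetters, maxCopies, bestSplit, buildAnswer) are made up for the illustration, and the code assumes non-empty lowercase strings as stated in the question.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <string>
#include <utility>

using Counts = std::array<int, 26>;

// Count how often each lowercase letter occurs in s.
Counts countLetters(const std::string& s) {
    Counts cnt{};
    for (char ch : s) ++cnt[ch - 'a'];
    return cnt;
}

// Largest m such that m copies of `need` fit into `have`.
int maxCopies(const Counts& have, const Counts& need) {
    int best = INT_MAX;
    for (int i = 0; i < 26; ++i)
        if (need[i] > 0) best = std::min(best, have[i] / need[i]);
    return best;
}

// Try every feasible number of copies of b, fill the rest with c.
// Returns {copies of b, copies of c} with the largest total.
std::pair<int, int> bestSplit(const std::string& a, const std::string& b,
                              const std::string& c) {
    const Counts ca = countLetters(a), cb = countLetters(b), cc = countLetters(c);
    const int maxB = maxCopies(ca, cb);
    std::pair<int, int> best(0, 0);
    for (int nb = 0; nb <= maxB; ++nb) {
        Counts rest = ca;
        for (int i = 0; i < 26; ++i) rest[i] -= nb * cb[i];
        const int nc = maxCopies(rest, cc);
        if (nb + nc > best.first + best.second) best = std::make_pair(nb, nc);
    }
    return best;
}

// Build one optimal string: copies of b, copies of c, then leftover letters.
std::string buildAnswer(const std::string& a, const std::string& b,
                        const std::string& c) {
    const std::pair<int, int> split = bestSplit(a, b, c);
    Counts rest = countLetters(a);
    const Counts cb = countLetters(b), cc = countLetters(c);
    std::string k;
    for (int i = 0; i < split.first; ++i) k += b;
    for (int i = 0; i < split.second; ++i) k += c;
    for (int i = 0; i < 26; ++i) {
        rest[i] -= split.first * cb[i] + split.second * cc[i];
        k.append(rest[i], static_cast<char>('a' + i));
    }
    return k;
}
```

For the example from the question, buildAnswer("abbbaaccca", "ab", "aca") picks two copies of ab and one copy of aca. The loop runs at most |a| / |b| + 1 times with 26 steps each, so it stays roughly linear in the input size.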

Assuming that allowed characters have ASCII code 0-127, I would write a function to count the occurrences of each character in a string:
int[] count(String s) {
int[] res = new int[128]; // Java zero-initializes new int arrays
for(int i=0; i<s.length(); i++)
res[s.charAt(i)]++;
return res;
}
We can now count occurrences in each string:
int[] aCount = count(a);
int[] bCount = count(b);
int[] cCount = count(c);
We can then write a function to count how many times a string can be carved out of characters of another string:
int carveCount(int[] strCount, int[] subStrCount) {
int min = Integer.MAX_VALUE;
for(int i=0; i<subStrCount.length; i++) {
if (subStrCount[i] == 0)
continue;
// how many copies this character alone would allow
min = Math.min(min, strCount[i]/subStrCount[i]);
}
if (min == Integer.MAX_VALUE)
return 0; // empty substring: nothing to carve
for(int i=0; i<subStrCount.length; i++) {
// remove the characters used by the carved copies
strCount[i] -= min*subStrCount[i];
}
return min;
}
and call the function:
int bFitCount = carveCount(aCount, bCount);
int cFitCount = carveCount(aCount, cCount);
EDIT: I didn't realize you wanted all characters originally in a, fixing here.
Finally, to produce the output:
StringBuilder sb = new StringBuilder();
for(int i=0; i<bFitCount; i++)
sb.append(b);
for(int i=0; i<cFitCount; i++)
sb.append(c);
for(int i=0; i<aCount.length; i++) {
for(int j=0; j<aCount[i]; j++)
sb.append((char)i);
}
return sb.toString();
One more comment: if the goal is to maximize repetitions(b)+repetitions(c), then you may want to first swap b and c if c is shorter. This way, if they share some characters, you have a better chance of increasing the result.
The algorithm could be optimized further, but as it is it should have complexity O(n), where n is the sum of the length of the three strings.

A related problem is the knapsack problem.
This is basically the solution described by @Tal Shalti.
I tried to keep everything readable.
My program returns abbcabacac as one of the strings with the most occurrences (3).
To get all permutations without repeating one, I use std::next_permutation from <algorithm>. There is not much happening in the main function: I only store the number of occurrences and the permutation whenever a higher number of occurrences is achieved.
int main()
{
std::string word = "abbbaaccca";
std::string patternSmall = "ab";
std::string patternLarge = "aca";
unsigned int bestOccurrence = 0;
std::string bestPermutation = "";
do {
// count and remove occurrence
unsigned int occurrences = FindOccurrences(word, patternLarge, patternSmall);
if (occurrences > bestOccurrence) {
bestOccurrence = occurrences;
bestPermutation = word;
std::cout << word << " .. " << occurrences << std::endl;
}
} while (std::next_permutation(word.begin(), word.end()));
std::cout << "Best Permutation " << bestPermutation << " with " << bestOccurrence << " occurrences." << std::endl;
return 0;
}
This function handles the basic algorithm. pattern1 is the longer pattern, so it will be searched for last. If a pattern is found, it will be replaced with the string "##", since this should be very rare in the English language.
The variable occurrenceCounter keeps track of the number of occurrences found.
unsigned int FindOccurrences(const std::string& word, const std::string& pattern1, const std::string& pattern2)
{
unsigned int occurrenceCounter = 0;
std::string tmpWord(word);
// starting at -1 makes the while() easier: the first ++i wraps around to 0
std::string::size_type i = -1;
while (FindPattern(tmpWord, pattern2, ++i)) {
occurrenceCounter++;
tmpWord.replace(tmpWord.begin() + i, tmpWord.begin() + i + pattern2.size(), "##");
}
i = -1;
while (FindPattern(tmpWord, pattern1, ++i)) {
occurrenceCounter++;
tmpWord.replace(tmpWord.begin() + i, tmpWord.begin() + i + pattern1.size(), "##");
}
return occurrenceCounter;
}
This function returns the first position of the found pattern. If the pattern is not found, std::string::npos is returned by string.find(...). Also string.find(...) starts to search for the pattern starting by index i.
bool FindPattern(const std::string& word, const std::string& pattern, std::string::size_type& i)
{
std::string::size_type foundPosition = word.find(pattern, i);
if (foundPosition == std::string::npos) {
return false;
}
i = foundPosition;
return true;
}

Longest prefix string length for all the suffixes

Find the length of the longest prefix string for all the suffixes of the string.
For example suffixes of the string ababaa are ababaa, babaa, abaa, baa, aa and a. The similarities of each of these strings with the string "ababaa" are 6,0,3,0,1,1 respectively. Thus the answer is 6 + 0 + 3 + 0 + 1 + 1 = 11.
I wrote following code
#include <iostream>
#include <string.h>
#include <stdio.h>
#include <time.h>
int main ( int argc, char **argv) {
size_t T;
std::cin >> T;
char input[100000];
for ( register size_t i = 0; i < T; ++i) {
std::cin >> input;
double t = clock();
size_t len = strlen(input);
char *left = input;
char *right = input + len - 1;
long long sol = 0;
int end_count = 1;
while ( left < right ) {
if ( *right != '\0') {
if ( *left++ == *right++ ) {
sol++;
continue;
}
}
end_count++;
left = input; // reset the left pointer
right = input + len - end_count; // set right to one left.
}
std::cout << sol + len << std::endl;
printf("time= %.3fs\n", (clock() - t) / (double)(CLOCKS_PER_SEC));
}
}
It works fine, but for a string that is 100000 characters long and consists of a single repeated character, i.e. aaaaaaaaaa.......a, it takes a long time. How can I optimize this further?

You can use Suffix Array: http://en.wikipedia.org/wiki/Suffix_array

Let's say your ababaa is a pattern P.
I think you could use the following algorithm:
Create a suffix automaton for all possible suffixes of P.
Walk the automaton using P as input and count the edges traversed so far. For each accepting state of the automaton, add the current edge count to the total sum. Walk the automaton until you either reach the end of the input or there are no more edges to go through.
The total sum is the result.

Use the Z algorithm to calculate, in O(n), the length of the longest substring starting at each position that is also a prefix of the string. Then scan the resulting array and sum its values.
Reference: https://www.geeksforgeeks.org/sum-of-similarities-of-string-with-all-of-its-suffixes/
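A minimal C++ sketch of that approach is shown below. The function names zFunction and similaritySum are chosen for this illustration only. The whole string matches itself completely, so its length is added on top of the Z values of the other positions.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// z[i] = length of the longest common prefix of s and s.substr(i), for i >= 1.
// z[0] is left at 0 and handled separately by the caller.
std::vector<int> zFunction(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> z(n, 0);
    // [l, r) is the rightmost segment found so far that matches a prefix of s
    for (int i = 1, l = 0, r = 0; i < n; ++i) {
        if (i < r) z[i] = std::min(r - i, z[i - l]);   // reuse earlier work
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i];
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

// Sum of the similarities of s with all of its suffixes.
long long similaritySum(const std::string& s) {
    long long total = static_cast<long long>(s.size());   // the string itself
    for (int v : zFunction(s)) total += v;
    return total;
}
```

For ababaa the Z values are 0, 0, 3, 0, 1, 1, which gives 6 + 5 = 11. Because the matching window only ever moves to the right, the all-a input of length 100000 is handled in linear time as well.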

From what I see, you are using a plain array to evaluate the suffixes. Although that may turn out to be efficient for some data sets, it will not be efficient in some cases, such as the one you mentioned.
You would need to implement a prefix tree (trie) or a similar data structure. The code for those isn't straightforward, so if you are not familiar with them, I would suggest you read a little bit about them.

I'm not sure whether a Trie gives you much performance gain.. but I would certainly think about it.
The other idea I had is to try to compress your string. I didn't really think about it, just a crazy idea...
if you have a string like this: ababaa compress it maybe to: abab2a. Then you have to come up with a technique where you can use your algorithm with those strings. The advantage is you can then compare long strings 100000a efficiently with each other. Or more importantly: you can calculate your sum very fast.
But again, I didn't think it through, maybe this is a very bad idea ;)

Here is a Java implementation:
// sprefix
String s = "abababa";
Vector<Integer>[] v = new Vector[s.length()];
int sPrefix = s.length();
v[0] = new Vector<Integer>();
v[0].add(new Integer(0));
for(int j = 1; j < s.length(); j++)
{
v[j] = new Vector<Integer>();
v[j].add(new Integer(0));
for(int k = 0; k < v[j - 1].size(); k++)
if(s.charAt(j) == s.charAt(v[j - 1].get(k)))
{
v[j].add(v[j - 1].get(k) + 1);
v[j - 1].set(k, 0);
}
}
for(int j = 0; j < v.length; j++)
for(int k = 0; k < v[j].size(); k++)
sPrefix += v[j].get(k);
System.out.println("Result = " + sPrefix);

How to find string in a string

I somehow need to find the longest string made of repetitions of one string inside another string, so if string1 is "Alibaba" and string2 is "ba", the longest string will be "baba". I have the lengths of the strings, but what next?
char* fun(char* a, char* b)
{
int length1=0;
int length2=0;
int longer;
int shorter;
char end='\0';
int i=0;
while(a[i] != end)
{
i++;
length1++;
}
i=0;
while(b[i] != end)
{
i++;
length2++;
}
if(length1 > length2){
longer = length1;
shorter = length2;
}
else{
longer = length2;
shorter = length1;
}
//logics here
return a; // placeholder so the function compiles; the search logic is still missing
}
int main()
{
char name1[] = "Alibaba";
char name2[] = "ba";
cout << fun(name1, name2) << endl;
system("PAUSE");
return 0;
}

Wow lots of bad answers to this question. Here's what your code should do:
Find the first instance of "ba" using the standard string searching functions.
In a loop look past this "ba" to see how many of the next N characters are also "ba".
If this sequence is longer than the previously recorded longest sequence, save its length and position.
Find the next instance of "ba" after the last one.
Here's the code (not tested):
string FindLongestRepeatedSubstring(string longString, string shortString)
{
// The number of repetitions in our longest string.
int maxRepetitions = 0;
int n = shortString.length(); // For brevity.
// Where we are currently looking.
string::size_type pos = 0;
while ((pos = longString.find(shortString, pos)) != string::npos)
{
// Ok we found the start of a repeated substring. See how many repetitions there are.
int repetitions = 1;
// This is a little bit complicated.
// First go past the "ba" we have already found (pos += n)
// Then see if there is still enough space in the string for there to be another "ba"
// Finally see if it *is* "ba"
for (pos += n; pos+n <= longString.length() && longString.substr(pos, n) == shortString; pos += n)
++repetitions;
// See if this sequence is longer than our previous best.
if (repetitions > maxRepetitions)
maxRepetitions = repetitions;
}
// Construct the string to return. You really probably want to return its position, or maybe
// just maxRepetitions.
string ret;
while (maxRepetitions--)
ret += shortString;
return ret;
}

What you want should look like this pseudo-code:
i = j = count = max = 0
while (i < length1 && c = name1[i++]) do
if (j < length2 && name2[j] == c) then
j++
else
max = (count > max) ? count : max
count = 0
j = 0
end
if (j == length2) then
count++
j = 0
end
done
max = (count > max) ? count : max
for (i = 0 to max-1) do
print name2
done
The idea is here but I feel that there could be some cases in which this algorithm won't work (cases with complicated overlap that would require going back in name1). You may want to have a look at the Boyer-Moore algorithm and mix the two to have what you want.

The Algorithms Implementation Wikibook has an implementation of what you want in C++.

http://www.cplusplus.com/reference/string/string/find/
Maybe you did it on purpose, but you should use the std::string class and forget archaic things like the char* string representation.
It will let you use lots of optimized methods, such as string searching.

Why don't you use the strstr function provided by C?
const char * strstr ( const char * str1, const char * str2 );
char * strstr ( char * str1, const char * str2 );
Locate substring
Returns a pointer to the first occurrence of str2 in str1,
or a null pointer if str2 is not part of str1.
The matching process does not include the terminating null-characters.
Use the lengths now: create a loop, step through the original string, and find the longest string inside.
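One way such a loop could look is sketched below, assuming the goal is the longest run of back-to-back copies of the short string, as in the "Alibaba" / "ba" example. The function name longestRun is made up for this sketch, and it favours simplicity over speed.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Returns the longest run of back-to-back copies of `needle` in `haystack`.
std::string longestRun(const char* haystack, const char* needle) {
    const std::size_t n = std::strlen(needle);
    if (n == 0) return "";
    std::size_t bestCopies = 0;
    const char* p = std::strstr(haystack, needle);
    while (p != nullptr) {
        // count how many copies follow each other starting at p
        std::size_t copies = 0;
        const char* q = p;
        while (std::strncmp(q, needle, n) == 0) {
            ++copies;
            q += n;
        }
        if (copies > bestCopies) bestCopies = copies;
        // continue one character further so that no start position is missed
        p = std::strstr(p + 1, needle);
    }
    std::string result;
    for (std::size_t i = 0; i < bestCopies; ++i) result += needle;
    return result;
}
```

Calling longestRun("Alibaba", "ba") finds the first match at index 3, counts two consecutive copies there, and returns "baba".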

How many palindromes can be formed by selections of characters from a string?

I'm posting this on behalf of a friend since I believe this is pretty interesting:
Take the string "abb". By leaving out any number of letters less than the length of the string we end up with 7 strings.
a b b ab ab bb abb
Out of these 4 are palindromes.
Similarly for the string
"hihellolookhavealookatthispalindromexxqwertyuiopasdfghjklzxcvbnmmnbvcxzlkjhgfdsapoiuytrewqxxsoundsfamiliardoesit"
(a length 112 string) 2^112 - 1 strings can be formed.
Out of these how many are palindromes?
Below is his implementation (in C++, though C is fine too). It's pretty slow with very long words; he wants to know what the fastest possible algorithm for this is (and I'm curious too :D).
#include <iostream>
#include <cstring>
using namespace std;
void find_palindrome(const char* str, const char* max, long& count)
{
for(const char* begin = str; begin < max; begin++) {
count++;
const char* end = strchr(begin + 1, *begin);
while(end != NULL) {
count++;
find_palindrome(begin + 1, end, count);
end = strchr(end + 1, *begin);
}
}
}
int main(int argc, char *argv[])
{
const char* s = "hihellolookhavealookatthis";
long count = 0;
find_palindrome(s, strlen(s) + s, count);
cout << count << endl;
}

First of all, your friend's solution seems to have a bug since strchr can search past max. Even if you fix this, the solution is exponential in time.
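A minimal sketch of the bounds fix, assuming the intent is that the search for a matching character should never look at or beyond max: std::find over the range [begin + 1, max) replaces strchr. The name findPalindromesBounded is made up for illustration; the algorithm is otherwise the one from the question and is still exponential.

```cpp
#include <algorithm>
#include <cstring>

// Same recursion as in the question, but the search for a matching
// character is limited to [begin + 1, max) instead of the whole C string.
void findPalindromesBounded(const char* str, const char* max, unsigned long long& count) {
    for (const char* begin = str; begin < max; ++begin) {
        ++count;  // the single character *begin
        const char* end = std::find(begin + 1, max, *begin);
        while (end != max) {
            ++count;  // the pair (*begin, *end) with nothing in between
            // every palindrome strictly between begin and end, wrapped by the pair
            findPalindromesBounded(begin + 1, end, count);
            end = std::find(end + 1, max, *begin);
        }
    }
}
```

For "abb" this gives 4, matching the example in the question.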
For a faster solution, you can use dynamic programming to solve this in O(n^3) time. This will require O(n^2) additional memory. Note that for long strings, even 64-bit ints as I have used here will not be enough to hold the solution.
#include <cstring> // for strlen
#define MAX_SIZE 1000 // the string length must be smaller than this
long long numFound[MAX_SIZE][MAX_SIZE]; //intermediate results, indexed by [startPosition][endPosition]
long long countPalindromes(const char *str) {
int len = strlen(str);
for (int startPos=0; startPos<=len; startPos++)
for (int endPos=0; endPos<=len; endPos++)
numFound[startPos][endPos] = 0;
for (int spanSize=1; spanSize<=len; spanSize++) {
for (int startPos=0; startPos<=len-spanSize; startPos++) {
int endPos = startPos + spanSize;
long long count = numFound[startPos+1][endPos]; //if str[startPos] is not in the palindrome, this will be the count
char ch = str[startPos];
//if str[startPos] is in the palindrome, choose a matching character for the palindrome end
for (int searchPos=startPos; searchPos<endPos; searchPos++) {
if (str[searchPos] == ch)
count += 1 + numFound[startPos+1][searchPos];
}
numFound[startPos][endPos] = count;
}
}
return numFound[0][len];
}
Explanation:
The array numFound[startPos][endPos] will hold the number of palindromes contained in the substring with indexes startPos to endPos.
We go over all pairs of indexes (startPos, endPos), starting from short spans and moving to longer ones. For each such pair, there are two options:
The character at str[startPos] is not in the palindrome. In that case, there are numFound[startPos+1][endPos] possible palindromes - a number that we have calculated already.
The character at str[startPos] is in the palindrome (at its beginning). We scan through the string to find a matching character to put at the end of the palindrome. For each such character, we use the already-calculated results in numFound to find the number of possibilities for the inner palindrome.
EDIT:
Clarification: when I say "number of palindromes contained in a string", this includes non-contiguous substrings. For example, the palindrome "aba" is contained in "abca".
It's possible to reduce memory usage to O(n) by taking advantage of the fact that calculation of numFound[startPos][x] only requires knowledge of numFound[startPos+1][y] for all y. I won't do this here since it complicates the code a bit.
Pregenerating lists of indices containing each letter can make the inner loop faster, but it will still be O(n^3) overall.
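Here is a rough sketch of the O(n) memory idea, assuming the same half-open indexing as above. Only two rows are kept: one for startPos + 1 and one for startPos. The sketch also accumulates the inner sum as endPos grows instead of rescanning, so as written it does O(n^2) work. The names countPalindromesRolling, next, cur and withStart are mine, and the unsigned 64-bit counter still wraps around for long strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// next[e] = number of palindromes in s[start+1 .. e), cur[e] = same for s[start .. e).
unsigned long long countPalindromesRolling(const std::string& s) {
    const std::size_t n = s.size();
    std::vector<unsigned long long> next(n + 1, 0), cur(n + 1, 0);
    for (std::size_t start = n; start-- > 0; ) {
        std::fill(cur.begin(), cur.end(), 0ULL);
        // Palindromes that begin at s[start] and end somewhere before `end`.
        unsigned long long withStart = 0;
        for (std::size_t end = start + 1; end <= n; ++end) {
            const std::size_t closing = end - 1;  // candidate last character
            if (s[closing] == s[start]) {
                withStart += 1 + next[closing];  // empty inner part, or any inner palindrome
            }
            cur[end] = next[end] + withStart;  // s[start] unused, or used as first char
        }
        std::swap(cur, next);
    }
    return next[n];
}
```

For "abb" this returns 4, the same value the two-dimensional table gives.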

I have a way to do it in O(N^2) time and O(1) space, though I think there must be better ways.
The basic idea is that a long palindrome must contain smaller palindromes, so we only search for the minimal matches, which come in two kinds: "aa" and "aba". If we find either, we expand outward to see whether it is part of a longer palindrome.
int count_palindromic_slices(const string &S) {
int count = 0;
for (int position=0; position<S.length(); position++) {
int offset = 0;
// Check the "aa" situation
while((position-offset>=0) && (position+offset+1)<S.length() && (S.at(position-offset))==(S.at(position+offset+1))) {
count ++;
offset ++;
}
offset = 1; // reset it for the odd length checking
// Check the string for "aba" situation
while((position-offset>=0) && position+offset<S.length() && (S.at(position-offset))==(S.at(position+offset))) {
count ++;
offset ++;
}
}
return count;
}
June 14th, 2012
After some investigation, I believe this is the best way to do it.
It is faster than the accepted answer.

Is there any mileage in making an initial traversal and building an index of all occurrences of each character?
h = { 0, 2, 27}
i = { 1, 30 }
etc.
Now, working from the left with h, the only possible palindromes are at 3 and 17: does char[0 + 1] == char[3 - 1], and so on? If so, we have a palindrome. Does char[0 + 1] == char[27 - 1]? No, so no further analysis of char[0] is needed.
Move on to char[1]; we only need to examine char[30 - 1] and inwards.
Then we can probably get smart: once you have identified a palindrome running from position x to y, all inner subsets are known palindromes, so those cases have already been dealt with and can be eliminated from later examination.
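The initial traversal could look something like this. It is only a sketch of the index-building step, not of the full search, and the name buildIndex is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Maps each character to the ascending list of positions where it occurs.
std::map<char, std::vector<std::size_t>> buildIndex(const std::string& s) {
    std::map<char, std::vector<std::size_t>> index;
    for (std::size_t i = 0; i < s.size(); ++i) {
        index[s[i]].push_back(i);
    }
    return index;
}
```

For "hihello" the entry for 'h' is {0, 2} and the entry for 'l' is {4, 5}, so candidate end positions for a palindrome starting at a given character can be read straight from its list.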

My solution using O(n) memory and O(n^2) time, where n is the string length:
palindrome.c:
#include <stdio.h>
#include <string.h>
typedef unsigned long long ull;
ull countPalindromesHelper (const char* str, const size_t len, const size_t begin, const size_t end, const ull count) {
if (begin <= 0 || end >= len) {
return count;
}
const char pred = str [begin - 1];
const char succ = str [end];
if (pred == succ) {
const ull newCount = count == 0 ? 1 : count * 2;
return countPalindromesHelper (str, len, begin - 1, end + 1, newCount);
}
return count;
}
ull countPalindromes (const char* str) {
ull count = 0;
size_t len = strlen (str);
size_t i;
for (i = 0; i < len; ++i) {
count += countPalindromesHelper (str, len, i, i, 0); // even length palindromes
count += countPalindromesHelper (str, len, i, i + 1, 1); // odd length palindromes
}
return count;
}
int main (int argc, char* argv[]) {
if (argc < 2) {
return 0;
}
const char* str = argv [1];
ull count = countPalindromes (str);
printf ("%llu\n", count);
return 0;
}
Usage:
$ gcc palindrome.c -o palindrome
$ ./palindrome myteststring
EDIT: I misread the problem as the contiguous substring version of the problem. Now given that one wants to find the palindrome count for the non-contiguous version, I strongly suspect that one could just use a math equation to solve it given the number of distinct characters and their respective character counts.

Hmmmmm, I think I would count up like this:
Each character is a palindrome on its own (minus repeated characters).
Each pair of the same character.
Each pair of the same character, with all palindromes sandwiched in the middle that can be made from the string between repeats.
Apply recursively.
Which seems to be what you're doing, although I'm not sure you don't double-count the edge cases with repeated characters.
So, basically, I can't think of a better way.
EDIT:
Thinking some more,
It can be improved with caching, because you sometimes count the palindromes in the same sub-string more than once. So, I suppose this demonstrates that there is definitely a better way.
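To make the caching idea concrete, here is a sketch of a memoized recursion. Note that it does not cache the exact recursion from the question; it swaps in an inclusion-exclusion recurrence over half-open ranges [i, j), which I am assuming is acceptable since it counts the same thing (selections by position, so "abb" gives 4). All names here are hypothetical, and the 64-bit counter wraps around for long inputs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Count = unsigned long long;

// Counts palindromic selections of characters within s[i .. j), caching each range.
Count countRange(const std::string& s, std::size_t i, std::size_t j,
                 std::vector<std::vector<Count>>& memo,
                 std::vector<std::vector<char>>& known) {
    if (i >= j) return 0;  // empty range
    if (known[i][j]) return memo[i][j];
    const Count inner = countRange(s, i + 1, j - 1, memo, known);
    // Palindromes avoiding s[i], plus those avoiding s[j-1], minus the double count.
    Count result = countRange(s, i + 1, j, memo, known)
                 + countRange(s, i, j - 1, memo, known)
                 - inner;
    // Palindromes using both ends: the bare pair (or single char) plus each inner one wrapped.
    if (s[i] == s[j - 1]) result += inner + 1;
    known[i][j] = 1;
    memo[i][j] = result;
    return result;
}

Count countPalindromesMemo(const std::string& s) {
    const std::size_t n = s.size();
    std::vector<std::vector<Count>> memo(n + 1, std::vector<Count>(n + 1, 0));
    std::vector<std::vector<char>> known(n + 1, std::vector<char>(n + 1, 0));
    return countRange(s, 0, n, memo, known);
}
```

Each of the O(n^2) ranges is computed once, so the repeated work the uncached recursion does on the same sub-string disappears.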

Here is a program for finding all the possible palindromes in a string written in both Java and C++.

#include <iostream>
#include <string>
using namespace std;
int main()
{
string palindrome;
cout << "Enter a String to check if it is a Palindrome";
cin >> palindrome;
int length = palindrome.length();
cout << "the length of the string is " << length << endl;
int end = length - 1;
int start = 0;
int check=1;
while (end >= start) {
if (palindrome[start] != palindrome[end]) {
cout << "The string is not a palindrome";
check=0;
break;
}
else
{
start++;
end--;
}
}
if(check)
cout << "The string is a Palindrome" << endl;
}

public String[] findPalindromes(String source) {
Set<String> palindromes = new HashSet<String>();
int count = 0;
for(int i=0; i<source.length()-1; i++) {
for(int j= i+1; j<source.length(); j++) {
String palindromeCandidate = new String(source.substring(i, j+1));
if(isPalindrome(palindromeCandidate)) {
palindromes.add(palindromeCandidate);
}
}
}
return palindromes.toArray(new String[palindromes.size()]);
}
private boolean isPalindrome(String source) {
int i =0;
int k = source.length()-1;
for(i=0; i<source.length()/2; i++) {
if(source.charAt(i) != source.charAt(k)) {
return false;
}
k--;
}
return true;
}

I am not sure, but you might try it with Fourier transforms. This problem reminded me of this one: O(nlogn) Algorithm - Find three evenly spaced ones within binary string
Just my 2 cents.


How to get count of next combinations for given set? - c++

What you want is called Permutation. Check this for a Permutation implementation in java
