Smallest Binary String not Contained in Another String

The question is relatively simple: given a binary string, find the shortest binary string that does not occur in it as a contiguous substring. An example:
input: 10100011
output: 110
I have tried using BFS, but I don't think this is an efficient enough solution (maybe some sort of bitmap + sliding window solution?)
#include <iostream>
#include <queue>
#include <string>
using namespace std;
// true if sub occurs in s as a contiguous substring
bool is_substring(const string& s, const string& sub) {
    return s.find(sub) != string::npos;
}
// BFS over binary strings: shorter strings first, then in numeric order
string shortestNotSubstring(const string& s) {
    queue<string> q;
    q.push("0");
    q.push("1");
    while (!q.empty())
    {
        string str = q.front(); q.pop();
        if (!is_substring(s, str)) return str;
        q.push(str + '0');
        q.push(str + '1');
    }
    return "";
}
int main() {
    string N;
    cin >> N;
    cout << shortestNotSubstring(N) << endl;
    return 0;
}

You can do this pretty easily in O(N) time.
Let W = ceiling(log2(N+1)), where N is the length of the input string S.
There are 2^W possible strings of length W. S has at most N - W + 1 ≤ N substrings of length W, and since 2^W ≥ N + 1 > N, at least one string of length W must be missing from S.
W is also less than the number of bits in a size_t, and it only takes O(N) space to store a mask of all possible strings of length W. Initialize such a mask to 0s, and then iterate through S using the lowest W bits in a size_t as a sliding window of the substrings you encounter. Set the mask bit for each substring you encounter to 1.
When you're done, scan the mask to find the first 0, and that will be a string of length W that's missing.
There may also be shorter missing strings, though, so merge the mask bits in pairs to make a mask for the strings of length W-1, and then also set the mask bit for the last W-1 bits in S, since those might not be included in any W-length string. Then scan the mask for 0s to see if you can find a shorter missing string.
As long as you keep finding shorter strings, keep merging the mask for smaller strings until you get to length 1. Since each such operation halves the mask size, all the merges together cost fewer than 2^(W+1) steps, which is O(N), so they don't affect the overall O(N) time for the whole algorithm.
Here's an implementation in C++
#include <string>
#include <vector>
#include <algorithm>
std::string shortestMissingBinaryString(const std::string& instr) {
const size_t len = instr.size();
if (len < 2) {
if (!len || instr[0] != '0') {
return std::string("0");
}
return std::string("1");
}
// Find a string size guaranteed to be missing
size_t W_mask = 0x3;
unsigned W = 2;
while(W_mask < len) {
W_mask |= W_mask<<1;
W+=1;
}
// Make a mask of all the W-length substrings that are present
std::vector<bool> mask(W_mask+1, false);
size_t lastSubstr=0;
for (size_t i=0; i<len; ++i) {
lastSubstr = (lastSubstr<<1) & W_mask;
if (instr[i] != '0') {
lastSubstr |= 1;
}
if (i+1 >= W) {
mask[lastSubstr] = true;
}
}
//Find missing substring of length W
size_t found = std::find(mask.begin(), mask.end(), false) - mask.begin();
// try to find a shorter missing substring
while(W > 1) {
unsigned testW = W - 1;
W_mask >>= 1;
// calculate masks for length testW
for (size_t i=0; i<=W_mask; i++) {
mask[i] = mask[i*2] || mask[i*2+1];
}
mask.resize(W_mask+1);
// don't forget the missing substring at the end
mask[lastSubstr & W_mask] = true;
size_t newFound = std::find(mask.begin(), mask.end(), false) - mask.begin();
if (newFound > W_mask) {
// no shorter string
break;
}
W = testW;
found = newFound;
}
// build the output string
std::string ret;
for (size_t bit = ((size_t)1) << (W-1); bit; bit>>=1) {
ret.push_back((found & bit) ? '1': '0');
}
return ret;
}
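For testing, it can help to compare the function above against a slow but obviously correct reference. The sketch below is a hypothetical helper (`shortestMissingBrute` is not part of the original answer): for each length k = 1, 2, ... it collects all length-k substrings in a std::unordered_set and returns the numerically smallest k-bit string that is absent, which is the same tie-breaking the answer's std::find scan uses.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical reference helper (not from the original answer).
// Tries lengths k = 1, 2, ... and returns the numerically smallest k-bit
// string that does not occur in s as a substring. Much slower than the
// O(N) mask version, but simple enough to trust when cross-checking it.
std::string shortestMissingBrute(const std::string& s) {
    for (std::size_t k = 1;; ++k) {
        // every substring of length k that is present in s
        std::unordered_set<std::string> present;
        for (std::size_t i = 0; i + k <= s.size(); ++i) {
            present.insert(s.substr(i, k));
        }
        // enumerate k-bit strings in increasing numeric order
        for (unsigned long long v = 0; v < (1ULL << k); ++v) {
            std::string candidate(k, '0');
            for (std::size_t b = 0; b < k; ++b) {
                if (v & (1ULL << (k - 1 - b))) candidate[b] = '1';
            }
            if (present.count(candidate) == 0) return candidate;
        }
        // all 2^k strings occur, so try the next length
    }
}
```

Running both functions over many random binary strings and checking that they agree is a cheap way to gain confidence in the mask-merging step.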

Related

Checking whether a String is a Lapindrome or not [duplicate]

The question (from CodeChef) is to check whether a given string is a lapindrome. According to the problem, a lapindrome is a string which, when split in the middle, gives two halves having the same characters with the same frequency of each character. If the string has an odd number of characters, the middle character is ignored.
I have tried solving the problem in C++ with the code below:
#include <iostream>
#include<cstring>
using namespace std;
bool lapindrome(char s[],int len){
int firstHalf=0,secondHalf=0;
char c;
for(int i=0,j=len-1;i<j;i++,j--){
firstHalf += int(s[i]);
secondHalf += int(s[j]);
}
if(firstHalf == secondHalf){
return true;
}
else
return false;
}
int main() {
// your code goes here
int t,len;
bool result;
char s[1000];
cin>>t;
while(t){
cin>>s;
len = strlen(s);
result = lapindrome(s,len);
if(result == true)
cout<<"YES"<<endl;
else
cout<<"NO"<<endl;
--t;
}
return 0;
}
I used two variables that store the sum of the ASCII codes of the characters in the first half and in the second half. Then I compare the two variables to check whether both halves are equal.
I have tried the code on a couple of custom inputs and it works fine. But after I submit the code, the solution seems to be wrong.

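Replace the lapindrome function with this one (it assumes the input consists of lowercase letters only, as in the CodeChef problem):
#include <string>
const int MAX = 26; // one counter per lowercase letter 'a'..'z'
bool isLapindrome(std::string str)
{
int val1[MAX] = {0};
int val2[MAX] = {0};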
int n = str.length();
if (n == 1)
return true;
for (int i = 0, j = n - 1; i < j; i++, j--)
{
val1[str[i] - 'a']++;
val2[str[j] - 'a']++;
}
for (int i = 0; i < MAX; i++)
if (val1[i] != val2[i])
return false;
return true;
}
Example Output
Input a string here: asdfsasd
The string is NOT a lapindrome.
---
Input a string here: asdfsdaf
The string is a lapindrome.
Enjoy!

You're not counting the frequencies of the characters, only their sum, and different characters can add up to the same sum: the halves "ad" and "bc" both sum to 197 ('a' + 'd' = 97 + 100, 'b' + 'c' = 98 + 99). You could simply split the string into halves and create one map of character frequencies for each side, e.g. a std::map containing the count for each character. Then you can compare both maps with == (or std::equal) to check whether the halves contain the same characters with the same frequencies.
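A minimal sketch of the two-map approach, assuming the CodeChef convention that the middle character of an odd-length string is ignored; `isLapindromeMaps` is a hypothetical name. Since std::map keeps its keys sorted, operator== compares the two frequency tables directly:

```cpp
#include <cstddef>
#include <map>
#include <string>

// Sketch of the two-map idea: count the characters of each half separately
// and compare the maps. For odd lengths the middle character is skipped.
bool isLapindromeMaps(const std::string& s) {
    const std::size_t half = s.size() / 2;
    std::map<char, int> left, right;
    for (std::size_t i = 0; i < half; ++i) {
        ++left[s[i]];                   // first half, read from the front
        ++right[s[s.size() - 1 - i]];   // second half, read from the end
    }
    // same keys with the same counts <=> same characters, same frequencies
    return left == right;
}
```

`isLapindromeMaps("adbc")` returns false, whereas the sum-based check from the question accepts "adbc" because 'a' + 'd' == 'b' + 'c'.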

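Instead of counting the frequencies of characters (in the two halves of the input string) in two arrays or maps, it's actually sufficient to count them in one.
For this, negative counts have to be allowed: characters of the first half increment their count, characters of the second half decrement it, and the string is a lapindrome if every count ends at zero.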
Sample code:
#include <iostream>
#include <string>
#include <unordered_map>
bool isLapindrome(const std::string &text)
{
std::unordered_map<unsigned char, int> freq;
// iterate until index (growing from begin) and
// 2nd index (shrinking from end) cross over
for (size_t i = 0, j = text.size(); i < j--; ++i) {
++freq[(unsigned char)text[i]]; // count characters of 1st half positive
--freq[(unsigned char)text[j]]; // count characters of 2nd half negative
}
// check whether positive and negative counts didn't result in 0
// for at least one counted char
for (const auto &entry : freq) {
if (entry.second != 0) return false;
}
// Otherwise, the frequencies were balanced.
return true;
}
int main()
{
auto check = [](const std::string &text) {
std::cout << '\'' << text << "': "
<< (isLapindrome(text) ? "yes" : "no")
<< '\n';
};
check("");
check("abaaab");
check("gaga");
check("abccab");
check("rotor");
check("xyzxy");
check("abbaab");
}
Output:
'': yes
'abaaab': yes
'gaga': yes
'abccab': yes
'rotor': yes
'xyzxy': yes
'abbaab': no
Note:
About the empty input string, I was a bit uncertain. If it must not count as a lapindrome, an additional check is needed in isLapindrome(). This can be achieved by changing the final
return true;
to
return !text.empty(); // Empty input is considered as false.

The problem with your code is that you only compare the sums of the characters. "Same frequency" means you have to count the occurrences of each character. Instead of counting frequencies in maps as in the other solutions here, you can simply sort the two halves and compare them.
#include <iostream>
#include <string>
#include <algorithm>
bool lapindrome(const std::string& s) {
// true if size = 1, false if size = 0
if(s.size() <= 1) return (s.size());
std::string first_half = s.substr(0, s.size() / 2);
std::sort(first_half.begin(), first_half.end());
std::string second_half = s.substr(s.size() / 2 + s.size() % 2);
std::sort(second_half.begin(), second_half.end());
return first_half == second_half;
}
// here's a shorter hacky alternative:
bool lapindrome_short(std::string s) {
if (s.size() <= 1) return (s.size());
int half = s.size() / 2;
std::sort(s.begin(), s.begin() + half);
std::sort(s.rbegin(), s.rbegin() + half); // reverse half
return std::equal(s.begin(), s.begin() + half, s.rbegin());
}
int main() {
int count;
std::string input;
std::cin >> count;
while(count--) {
std::cin >> input;
std::cout << input << ": "
<< (lapindrome(input) ? "YES" : "NO") << std::endl;
}
return 0;
}

Run-length decompression using C++

I have a text file with a string which I encoded.
Let's say it is: aaahhhhiii kkkjjhh ikl wwwwwweeeett
Here is the code for encoding, which works perfectly fine:
void Encode(std::string &inputstring, std::string &outputstring)
{
for (int i = 0; i < inputstring.length(); i++) {
int count = 1;
while (inputstring[i] == inputstring[i+1]) {
count++;
i++;
}
if(count <= 1) {
outputstring += inputstring[i];
} else {
outputstring += std::to_string(count);
outputstring += inputstring[i];
}
}
}
Output is as expected: 3a4h3i 3k2j2h ikl 6w4e2t
Now, I'd like to decompress the output - back to original.
I have been struggling with this for a couple of days now.
My idea so far:
void Decompress(std::string &compressed, std::string &original)
{
char currentChar = 0;
auto n = compressed.length();
for(int i = 0; i < n; i++) {
currentChar = compressed[i++];
if(compressed[i] <= 1) {
original += compressed[i];
} else if (isalpha(currentChar)) {
//
} else {
//
int number = isnumber(currentChar).....
original += number;
}
}
}
I know my Decompress function seems a bit messy, but I am pretty lost with this one.
Sorry for that.
Maybe there is someone out there at stackoverflow who would like to help a lost and beginner soul.
Thanks for any help, I appreciate it.

Assuming input strings cannot contain digits (your encoding cannot handle them anyway: e.g. both "3a" and "aaa" would be encoded as "3a", so how could you ever decompress that unambiguously?), you can decompress as follows:
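#include <cctype>
#include <string>
void Decompress(const std::string& compressed, std::string& original)
{
    unsigned int num = 0;
    for (auto c : compressed)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            num = num * 10 + (c - '0');
        }
        else
        {
            num += num == 0; // no digit read yet: the character occurs once
            while (num > 0) // leaves num at 0 for the next run
            {
                original += c;
                --num;
            }
        }
    }
}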
Untested code, though...
Characters in a string are actually just numerical values, though. You can treat char (or signed char, unsigned char) as an ordinary 8-bit integer, and you can store a count in such a byte, too. A common way to do run-length encoding works exactly like that: count up to 255 equal characters, store the count in one byte and the character in the next. A single "a" would then be encoded as 0x01 0x61 (the latter being the ASCII value of a), "aa" as 0x02 0x61, and so on. If you have to store more than 255 equal characters, you store several pairs: 0xff 0x61, 0x07 0x61 for a string containing the character a 262 times. Decoding then becomes trivial: read the bytes pairwise, interpreting the first byte of each pair as the count and the second as the character. This nicely covers digits as well.
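A minimal sketch of the byte-pair scheme described above (the function names `encodePairs` and `decodePairs` are hypothetical). It stores each run as a raw count byte followed by the character and splits runs longer than 255:

```cpp
#include <cstddef>
#include <string>

// Each run becomes one count byte (1..255) followed by the character.
// Runs longer than 255 are split into several pairs.
std::string encodePairs(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const char c = in[i];
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == c && run < 255) ++run;
        out.push_back(static_cast<char>(static_cast<unsigned char>(run)));
        out.push_back(c);
        i += run;
    }
    return out;
}

// Reads the input pairwise: first byte is the count, second the character.
std::string decodePairs(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        const unsigned char count = static_cast<unsigned char>(in[i]);
        out.append(static_cast<std::size_t>(count), in[i + 1]);
    }
    return out;
}
```

For a string of 262 'a' characters, encodePairs produces the four bytes 0xff 'a' 0x07 'a', and decodePairs restores the original. Input containing digits, such as "aaa3a", round-trips unchanged, since counts are never written as text.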

#include <cctype>
#include <iostream>
#include <string>
void Encode(std::string& inputstring, std::string& outputstring)
{
for (unsigned int i = 0; i < inputstring.length(); i++) {
int count = 1;
while (inputstring[i] == inputstring[i + 1]) {
count++;
i++;
}
if (count <= 1) {
outputstring += inputstring[i];
}
else {
outputstring += std::to_string(count);
outputstring += inputstring[i];
}
}
}
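// Any character that is not a digit is data; digits only appear in counts.
bool is_literal(const char c)
{
return !isdigit(static_cast<unsigned char>(c));
}
void Decompress(std::string& compressed, std::string& original)
{
size_t i = 0;
size_t repeat;
while (i < compressed.length())
{
// plain characters (no count prefix) are copied once
while (i < compressed.length() && is_literal(compressed[i]))
original.push_back(compressed[i++]);
// repeat number
repeat = 0;
while (i < compressed.length() && isdigit(static_cast<unsigned char>(compressed[i])))
repeat = 10 * repeat + (compressed[i++] - '0');
// unroll the repeated character (only if a count was read)
if (repeat > 0 && i < compressed.length())
{
auto char_to_unroll = compressed[i++];
while (repeat--)
original.push_back(char_to_unroll);
}
}
}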
int main()
{
std::string deco, outp, inp = "aaahhhhiii kkkjjhh ikl wwwwwweeeett";
Encode(inp, outp);
Decompress(outp, deco);
std::cout << inp << std::endl << outp << std::endl<< deco;
return 0;
}

The decompression can't possibly work unambiguously because you didn't define a sentinel character: given the compressed stream, it's impossible to determine whether a digit is a literal digit from the original text or a repeat count. I would suggest using '0' as the sentinel character. While encoding, if you see '0', you output 0 1 0 (sentinel, count 1, the character '0'). Any other character X repeated N times translates to 0 N X, where N is the repeat count stored in a single byte. If a run is longer than 255, just output a new RLE repeat command.
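One possible reading of the sentinel suggestion, sketched under these assumptions: '0' is the sentinel, a run of two or more equal characters (or any '0') is written as '0', a raw count byte N and the character X, and every other single character is copied as-is. `encodeSentinel` and `decodeSentinel` are hypothetical names:

```cpp
#include <cstddef>
#include <string>

const char kSentinel = '0';  // assumed sentinel character

// A run of length >= 2, or any '0', becomes '0' N X with N a raw byte
// (1..255); other single characters are copied unchanged.
std::string encodeSentinel(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const char x = in[i];
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == x && run < 255) ++run;
        if (run == 1 && x != kSentinel) {
            out.push_back(x);  // plain literal
        } else {
            out.push_back(kSentinel);  // repeat command: 0 N X
            out.push_back(static_cast<char>(static_cast<unsigned char>(run)));
            out.push_back(x);
        }
        i += run;
    }
    return out;
}

// N and X are read positionally after the sentinel, so a count byte that
// happens to equal '0' (48) is not mistaken for another sentinel.
std::string decodeSentinel(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == kSentinel && i + 2 < in.size()) {
            const unsigned char n = static_cast<unsigned char>(in[i + 1]);
            out.append(static_cast<std::size_t>(n), in[i + 2]);
            i += 3;
        } else {
            out.push_back(in[i++]);
        }
    }
    return out;
}
```

Runs longer than 255 are split into several repeat commands, as suggested. With this layout a lone '0' costs three bytes, which is the price of keeping the format unambiguous.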

Insert symbol into string C++

I need to insert the symbol '+' into a string after every five characters.
st is a member of class String, of type string.
int i = 1;
int original_size = st.size;
int count = 0;
int j;
for (j = 0; j < st.size; j++)
{
if (i % 5)
count++;
}
while (st.size < original_size + count)
{
if (i % 5)
{
st.insert(i + 1, 1, '+');
st.size++;
}
i++;
}
return st;
I get an error in this part of the code. I think it is connected with the condition of the while loop. Can you please help me do this right?

If I've understood you correctly then you want to insert a '+' character every 5 chars in the original string. One way to do this would be to create a temporary string and then reassign the original string:
std::string st("A test string with some chars");
std::string temp;
for (int i = 1; i <= st.size(); ++i)
{
temp += st[i - 1];
if (i % 5 == 0)
{
temp += '+';
}
}
st = temp;
You'll notice I've started the loop at 1, this is to avoid the '+' being inserted on the first iteration (0%5==0).

@AlexB's answer shows how to generate a new string with the resulting text.
That said, if you need to perform in-place insertions, your code should look similar to this:
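#include <cassert>
#include <string>
std::string st{ "abcdefghijk" };
// after each insertion the next group starts 6 positions later (5 chars + '+')
for (std::size_t i = 5; i < st.size(); i += 6)
    st.insert(i, 1, '+'); // insert 1 character = '+' at position i
assert(st == "abcde+fghij+k");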

std::string InsertEveryNSymbols(const std::string & st, size_t n, char c)
{
const size_t size(st.size());
std::string result;
result.reserve(size + size / n);
for (size_t i(0); i != size; ++i)
{
result.push_back(st[i]);
if (i % n == n - 1)
result.push_back(c);
}
return result;
}
You don't need a loop to calculate the length of the resulting string. It's simply size + size / n (size + size / 5 in your case). And doing multiple inserts makes the algorithm quadratic, when you can just as easily keep it linear.

Nothing that others haven't done already, but this version avoids resizing the string and the modulus, and takes advantage of a few newer language features (auto and a range-based for loop).
std::string temp(st.length() + st.length()/5, '\0');
// preallocate string to eliminate need for resizing.
auto loc = temp.begin(); // iterator for temp string
size_t count = 5; // characters left until the next '+'
for (char ch: st) // iterate through source string
{
*loc++ = ch;
if (--count == 0) // decrement and test for zero much faster than
// modulus and test for zero
{
*loc++ = '+';
count = 5; // even with this assignment
}
}
st = temp;

Finding common characters in two strings

I am coding a problem in which we have to count the number of common characters in two strings. The main part of the count goes like this:
for(i=0; i < strlen(s1); i++) {
for(j = 0; j < strlen(s2); j++) {
if(s1[i] == s2[j]) {
count++;
s2[j] = '*';
break;
}
}
}
This is O(n^2) logic, and I could not think of a better solution. Can anyone help me write an O(n) solution?

This is very simple. Take two int arrays freq1 and freq2 and initialize all their elements to 0. Then read your strings and store the frequencies of the characters in these arrays. After that, compare freq1 and freq2: for each character, the number of times it is common to both strings is the smaller of its two frequencies.

It can be done in O(n) time with constant space.
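In C++ (assuming both strings contain only lowercase letters):
#include <algorithm>
#include <string>
int commonChars(const std::string& string1, const std::string& string2) {
    int map1[26] = {0}, map2[26] = {0};
    int common_chars = 0;
    for (char c1 : string1) map1[c1 - 'a']++;
    for (char c2 : string2) map2[c2 - 'a']++;
    for (int i = 0; i < 26; i++)
        common_chars += std::min(map1[i], map2[i]);
    return common_chars;
}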

Your current code is O(n^3), because strlen is O(n) and is re-evaluated in every loop condition. (Without the s2[j] = '*' marking and the break, it would also produce incorrect results, e.g. 4 for "aa", "aa".)
This code counts letters in common (each occurrence being matched at most once) in O(n).
int common(const char *a, const char *b) {
int table[256] = {0};
int result = 0;
for (; *a; a++) table[(unsigned char)*a]++;
for (; *b; b++) result += (table[(unsigned char)*b]-- > 0);
return result;
}
Depending on how you define "letters in common", you may need different logic. Here are some test cases for the definition I'm using (the size of the multiset intersection).
#include <stdio.h>
int main(int argc, char *argv[]) {
struct { const char *a, *b; int want; } cases[] = {
{"a", "a", 1},
{"a", "b", 0},
{"a", "aa", 1},
{"aa", "a", 1},
{"ccc", "cccc", 3},
{"aaa", "aaa", 3},
{"abc", "cba", 3},
{"aasa", "asad", 3},
};
int fail = 0;
for (int i = 0; i < sizeof(cases) / sizeof(*cases); i++) {
int got = common(cases[i].a, cases[i].b);
if (got != cases[i].want) {
fail = 1;
printf("common(%s, %s) = %d, want %d\n",
cases[i].a, cases[i].b, got, cases[i].want);
}
}
return fail;
}

You can do it with one linear pass over each string (this counts distinct common characters):
int i,j, len1 = strlen(s1), len2 = strlen(s2);
unsigned char allChars[256] = { 0 };
int count = 0;
for( i=0; i<len1; i++ )
{
allChars[ (unsigned char) s1[i] ] = 1;
}
for( i=0; i<len2; i++ )
{
if( allChars[ (unsigned char) s2[i] ] == 1 )
{
allChars[ (unsigned char) s2[i] ] = 2;
}
}
for( i=0; i<256; i++ )
{
if( allChars[i] == 2 )
{
cout << (char) i << endl; // print the common character itself
count++;
}
}

The following code traverses each string only once, so the complexity is O(n). It assumes that upper and lower case are considered the same; characters that are not letters are ignored.
#include <ctype.h>
#include <stdio.h>
int main() {
char a[] = "Hello world";
char b[] = "woowrd";
int x[26] = {0};
int i;
int index;
for (i = 0; a[i] != '\0'; i++) {
if (!isalpha((unsigned char)a[i]))
continue; // skip spaces, digits, punctuation
index = tolower((unsigned char)a[i]) - 'a'; // 'H' and 'h' share a slot
x[index]++;
}
for (i = 0; b[i] != '\0'; i++) {
if (!isalpha((unsigned char)b[i]))
continue;
index = tolower((unsigned char)b[i]) - 'a';
if (x[index] > 0)
x[index] = -1; // mark as common
}
printf("Common characters in '%s' and '%s' are ", a, b);
for (i = 0; i < 26; i++) {
if (x[i] < 0)
printf("%c", 'a'+i);
}
printf("\n");
}

#include <cstdlib>
#include <string>
int count(std::string a, std::string b)
{
int i,c[26]={0},c1[26]={0};
for(i=0;i<(int)a.length();i++)
{
if('a'<=a[i]&&a[i]<='z') // lowercase letters only
c[a[i]-'a']++;
}
for(i=0;i<(int)b.length();i++)
{
if('a'<=b[i]&&b[i]<='z')
c1[b[i]-'a']++;
}
int s=0;
for(i=0;i<26;i++)
{
s=s+(c[i]+c1[i]-std::abs(c[i]-c1[i]))/2; // (x + y - |x - y|) / 2 == min(x, y)
}
return (s);
}
This is a much easier and better solution.

for (std::vector<char>::iterator i = s1.begin(); i != s1.end(); ++i)
{
if (std::find(s2.begin(), s2.end(), *i) != s2.end())
{
dest.push_back(*i);
}
}

C implementation to run in O(n) time and constant space.
#define ALPHABETS_COUNT 26
int commonChars(char *s1, char *s2)
{
int c_count = 0, i;
int arr1[ALPHABETS_COUNT] = {0}, arr2[ALPHABETS_COUNT] = {0};
/* Compute the number of occurrences of each character (assumes only 'a'..'z') */
while (*s1) arr1[*s1++-'a'] += 1;
while (*s2) arr2[*s2++-'a'] += 1;
/* Increment count based on match found */
for(i=0; i<ALPHABETS_COUNT; i++) {
if(arr1[i] == arr2[i]) c_count += arr1[i];
else if(arr1[i]>arr2[i] && arr2[i] != 0) c_count += arr2[i];
else if(arr2[i]>arr1[i] && arr1[i] != 0) c_count += arr1[i];
}
return c_count;
}

First, your code does not run in O(n^2), it runs in O(nm), where n and m are the length of each string.
You can do it in O(n+m), but not better, since you have to go through each string, at least once, to see if a character is in both.
An example in C++, assuming:
ASCII characters
All characters included (letters, numbers, special, spaces, etc...)
Case sensitive
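#include <iostream>
#include <string>
#include <vector>
std::vector<char> strIntersect(std::string const&s1, std::string const&s2){
std::vector<bool> presents(256, false); //Assuming ASCII
std::vector<char> intersection;
for (unsigned char c : s1) { // unsigned char keeps the index in 0..255
presents[c] = true;
}
for (unsigned char c : s2) {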
if (presents[c]){
intersection.push_back(c);
presents[c] = false;
}
}
return intersection;
}
int main() {
std::vector<char> result;
std::string s1 = "El perro de San Roque no tiene rabo, porque Ramon Rodriguez se lo ha cortado";
std::string s2 = "Saint Roque's dog has no tail, because Ramon Rodriguez chopped it off";
//Expected: "S a i n t R o q u e s d g h l , b c m r z p"
result = strIntersect(s1, s2);
for (auto c : result) {
std::cout << c << " ";
}
std::cout << std::endl;
return 0;
}

There is a better version in C++, using std::bitset:
C++ bitset and its application
A bitset is an array of bools, but instead of storing each Boolean value separately, bitset packs them so that each bool takes only 1 bit of space. So the space taken by a bitset<N> bs is less than that of bool bs[N] (std::vector<bool> is typically bit-packed as well). However, a limitation of bitset is that N must be known at compile time, i.e. it must be a constant (this limitation does not exist for vector and dynamic arrays).
Because bitset stores the same information in compressed form, operations on a bitset can be faster than on an array or vector. We can access each bit of a bitset individually with the array indexing operator []: bs[3] gives the bit at index 3 of bitset bs, just like a simple array. Remember that bitset indexing starts from the right: for 10110, the 0s are at indices 0 and 3, and the 1s are at indices 1, 2 and 4.
A bitset can be constructed from an integer as well as from a binary string. Its size is fixed at compile time, i.e. it can't be changed at runtime.
For more information about bitset, see: https://www.geeksforgeeks.org/c-bitset-and-its-application
The code is as follows:
#include <bitset>
#include <iostream>
#include <string>
using namespace std;
// considering the strings to be of lower case.
int main()
{
string s1,s2;
cin>>s1>>s2;
//Declaration for bitset type variables
bitset<26> b_s1,b_s2;
// setting the bits in b_s1 for the encountered characters of string s1
for(auto& i : s1)
{
if(!b_s1[i-'a'])
b_s1[i-'a'] = 1;
}
// setting the bits in b_s2 for the encountered characters of string s2
for(auto& i : s2)
{
if(!b_s2[i-'a'])
b_s2[i-'a'] = 1;
}
// counting the number of set bits by the "Logical AND" operation
// between b_s1 and b_s2
cout<<(b_s1&b_s2).count();
}

There is no need to initialize and keep an array of 26 elements (one count for each letter of the alphabet). Just do the following:
1. Using a HashMap, store each letter as a key and an integer count as its value.
2. Create a Set of characters.
3. Iterate through the characters of each string and add them to the Set from step 2. If the add() method returns false (meaning the same character already exists in the Set), add the character to the map and increment its value.
These steps are written with the Java programming language in mind.

Python code:
>>> s1 = 'abbc'
>>> s2 = 'abde'
>>> p = sorted(set(s1).intersection(s2))
>>> print(p)
['a', 'b']
Hope this helps you, happy coding!

This can easily be done with hashing: count the characters of one string in a hash table, then look up the characters of the other.
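One concrete reading of the hashing suggestion, as a sketch (`commonCharCount` is a hypothetical name): count the characters of the first string in a std::unordered_map, then consume those counts while scanning the second string. Each occurrence is matched at most once, so the result is the size of the multiset intersection, the same definition used by the test cases in an earlier answer.

```cpp
#include <string>
#include <unordered_map>

// Counts characters common to a and b, matching each occurrence at most
// once (multiset intersection). Expected O(n + m) time.
int commonCharCount(const std::string& a, const std::string& b) {
    std::unordered_map<char, int> remaining;
    for (char c : a) ++remaining[c];  // how often each char is still available
    int common = 0;
    for (char c : b) {
        auto it = remaining.find(c);
        if (it != remaining.end() && it->second > 0) {
            --it->second;  // use up one occurrence from a
            ++common;
        }
    }
    return common;
}
```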

Find subsequence of given length from a given string?

To find the subsequences of a given length of a string, I have a recursive function (shown below), but it takes a long time when the string is long...
void F(int index, int length, string str)
{
if (length == 0) {
cout<<str<<endl;
//int l2=str.length();
//sum=0;
//for(int j=0;j<l2;j++)
//sum+=(str[j]-48);
//if(sum%9==0 && sum!=0)
//{c++;}
//sum=0;
} else {
for (int i = index; i < n; i++) {
string temp = str;
temp += S[i];
//sum+=(temp[i]-48);
F(i + 1, length - 1, temp);
}
}
}
Please help me with some idea for a non-recursive implementation, or some other approach.

You mentioned your current code is too slow when the input string length is large. It would be helpful if you could provide a specific example along with your timing info so we know what you consider to be "too slow". You should also specify what you would consider to be an acceptable run time. Here's an example:
I'll start with an initial version that I believe is similar to your current algorithm. It generates all subsequences of length >= 2:
#include <iostream>
#include <string>
void subsequences(const std::string& prefix, const std::string& suffix)
{
if (prefix.length() >= 2)
std::cout << prefix << std::endl;
for (size_t i=0; i < suffix.length(); ++i)
subsequences(prefix + suffix[i], suffix.substr(i + 1));
}
int main(int argc, char* argv[])
{
subsequences("", "ABCD");
}
Running this program produces the following output:
AB
ABC
ABCD
ABD
AC
ACD
AD
BC
BCD
BD
CD
Now let's change the input string to something longer. I'll use a 26-character input string:
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
This generates 67,108,837 subsequences. I won't list them here :-). On my machine, the code shown above takes just over 78 seconds to run (excluding output to cout) with the 26-character input string.
When I look for ways to optimize the above code, one thing that jumps out is that it's creating two new string objects for each recursive call to subsequences(). What if we could preallocate space once upfront and then simply pass pointers? Version 2:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void subsequences(char* prefix, int prefixLength, const char* suffix)
{
if (prefixLength >= 2)
printf("%s\n", prefix);
for (size_t i=0; i < strlen(suffix); ++i) {
prefix[prefixLength] = suffix[i];
prefix[prefixLength + 1] = '\0';
subsequences(prefix, prefixLength + 1, suffix + i + 1);
}
}
int main(int argc, char* argv[])
{
const char *inputString = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
// one buffer, allocated once and reused by every recursive call
char *prefix = (char*) malloc(strlen(inputString) + 1);
subsequences(prefix, 0, inputString);
free(prefix);
}
This generates the same 67,108,837 subsequences, but execution time is now just over 2 seconds (again, excluding output via printf).

Your code might be slow simply because your string is large. For a sequence of n unique elements there are C(n, k) ("n choose k") subsequences of length k. That means for the sequence "ABCDEFGHIJKLMNOPQRSTUVWXYZ" there are 10,400,600 different subsequences of length 13. That number grows very fast.
Nevertheless, since you asked, here is a non-recursive function that takes a string str and a size n and prints all subsequences of length n of that string.
#include <iostream>
#include <string>
#include <vector>
void print_subsequences(const std::string& str, size_t n)
{
if (n < 1 || str.size() < n)
{
return; // there are no subsequences of the given size
}
// start with the first n characters (indexes 0..n-1)
std::vector<size_t> indexes(n);
for (size_t i = 0; i < n; ++i)
{
indexes[i] = i;
}
while (true)
{
// build subsequence from indexes
std::string subsequence(n, ' ');
for (size_t i = 0; i < n; ++i)
{
subsequence[i] = str[indexes[i]];
}
// there you are
std::cout << subsequence << std::endl;
// the last subsequence starts with n-th last character
if (indexes[0] >= str.size() - n)
{
break;
}
// find rightmost incrementable index
size_t i = n;
while (i-- > 0)
{
if (indexes[i] < str.size() - n + i)
{
break;
}
}
// increment that index and set all following indexes
size_t value = indexes[i];
for (; i < n; ++i)
{
indexes[i] = ++value;
}
}
}
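To estimate the output size before enumerating, the count C(n, k) mentioned in this answer can be computed directly. A minimal sketch (`binomial` is a hypothetical helper; it does not check for overflow):

```cpp
#include <cstdint>

// Number of ways to choose k positions out of n, i.e. C(n, k).
// Multiplicative formula: after step i, result == C(n - k + i, i),
// so every division is exact.
std::uint64_t binomial(std::uint64_t n, std::uint64_t k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;  // C(n, k) == C(n, n - k), fewer steps
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= k; ++i) {
        result = result * (n - k + i) / i;
    }
    return result;
}
```

binomial(26, 13) gives 10,400,600, matching the figure above. When the input contains repeated characters, C(n, k) counts choices of positions, so it is an upper bound on the number of distinct subsequences.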
