Searching a string of ints for a repeating pattern [duplicate]


My problem is to find the repeating sequence of characters in a given array, that is, to identify the pattern in which the characters appear.
.---.---.---.---.---.---.---.---.---.---.---.---.---.---.
1: | J | A | M | E | S | O | N | J | A | M | E | S | O | N |
'---'---'---'---'---'---'---'---'---'---'---'---'---'---'
.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.
2: | R | O | N | R | O | N | R | O | N | R | O | N | R | O | N |
'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'
.---.---.---.---.---.---.---.---.---.---.---.---.
3: | S | H | A | M | I | L | S | H | A | M | I | L |
'---'---'---'---'---'---'---'---'---'---'---'---'
.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.
4: | C | A | R | P | E | N | T | E | R | C | A | R | P | E | N | T | E | R |
'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'
Example
Given the previous data, the result should be:
"JAMESON"
"RON"
"SHAMIL"
"CARPENTER"
Question
How to deal with this problem efficiently?

Tongue-in-cheek O(NlogN) solution
Perform an FFT on your string (treating characters as numeric values). Every peak in the resulting graph corresponds to a substring periodicity.

For your examples, my first approach would be to
get the first character of the array (for your last example, that would be C)
get the index of the next appearance of that character in the array (e.g. 9)
if it is found, search for the next appearance of the substring between the two appearances of the character (in this case CARPENTER)
if it is found, you're done (and the result is this substring).
Of course, this works only for a very limited subset of possible arrays, where the same word is repeated over and over again, starting from the beginning, without stray characters in between, and its first character is not repeated within the word. But all your examples fall into this category - and I prefer the simplest solution which could possibly work :-)
If the repeated word contains the first character multiple times (e.g. CACTUS), the algorithm can be extended to look for subsequent occurrences of that character too, not only the first one (so that it finds the whole repeated word, not only a substring of it).
Note that this extended algorithm would give a different result for your second example, namely RONRON instead of RON.
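The steps above could be sketched in C++ roughly as follows. The name repeated_word is made up for illustration, and the sketch makes two assumptions that go slightly beyond the description: it requires the whole text to consist of repetitions of the word, and it tries the occurrences of the first character from left to right, stopping at the first candidate that verifies. As a result it returns RON rather than RONRON for the second example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the repeated word, or "" when the text is not
// made of whole repetitions of one word starting at index 0.
std::string repeated_word(const std::string& text) {
    if (text.empty()) return "";
    // Each later occurrence of the first character is a candidate word length.
    for (std::size_t len = text.find(text[0], 1);
         len != std::string::npos;
         len = text.find(text[0], len + 1)) {
        if (text.size() % len != 0) continue;  // cannot tile the whole text
        bool ok = true;
        for (std::size_t pos = len; pos < text.size() && ok; pos += len)
            ok = (text.compare(pos, len, text, 0, len) == 0);
        if (ok) return text.substr(0, len);
    }
    return "";
}
```

With this sketch, repeated_word("CACTUSCACTUS") rejects the candidate length 2 (the second C) and then accepts length 6.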

In Python, you can leverage regexes thus:
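import re
def recurrence(text):
    # Try each candidate unit length i, shortest first.
    # Integer division (//) keeps range() happy on Python 3.
    for i in range(1, len(text) // 2 + 1):
        m = re.match(r'^(.{%d})\1+$' % i, text)
        if m:
            return m.group(1)
    return None
recurrence('abcabc') # Returns 'abc'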
I'm not sure how this would translate to Java or C. (That's one of the reasons I like Python, I guess. :-)

First write a method that finds the repeating substring sub in the container string, as below.
boolean findSubRepeating(String sub, String container);
Now keep calling this method with increasingly long substrings of the container: first try a 1-character substring, then 2 characters, and so on, going up to container.length/2.

Pseudocode
len = str.length
for (i in 1..len) {
if (len%i==0) {
if (str==str.substr(0,i).repeat(len/i)) {
return str.substr(0,i)
}
}
}
Note: For brevity, I'm inventing a "repeat" method for strings, which wasn't part of Java's String when this was written (Java 11 later added String.repeat); "abc".repeat(2)="abcabc"
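For C++ readers, the pseudocode could be written along these lines. This is only a sketch: smallest_repeating_unit is a made-up name, and instead of building a repeated copy it compares each character with the one i positions earlier, which amounts to the same check.

```cpp
#include <cstddef>
#include <string>

// Returns the shortest prefix whose repetition spells s.
// When no shorter unit exists, s itself is returned.
std::string smallest_repeating_unit(const std::string& s) {
    const std::size_t len = s.size();
    for (std::size_t i = 1; i <= len; ++i) {
        if (len % i != 0) continue;  // unit length must divide the total length
        bool ok = true;
        for (std::size_t k = i; k < len && ok; ++k)
            ok = (s[k] == s[k - i]);  // same as comparing with the repeated prefix
        if (ok) return s.substr(0, i);
    }
    return s;
}
```

For the inputs in the question, smallest_repeating_unit("JAMESONJAMESON") gives "JAMESON", and a string with no repetition comes back unchanged.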

Using C++:
#include <iostream>
#include <set>
#include <string>
using namespace std;
// Splits the string into fragments of the given size.
// Returns the set of distinct fragments.
set<string> split(const string& s, int frag)
{
    set<string> uni;
    int len = s.length();
    for (int i = 0; i < len; i += frag)
    {
        uni.insert(s.substr(i, frag));
    }
    return uni;
}
int main()
{
    string out;
    string s = "carpentercarpenter";
    int len = s.length();
    // Optimistic approach: hope the string is just two repetitions.
    // If that fails, try progressively shorter fragments,
    // down to a single character.
    for (int i = len / 2; i >= 1; --i)
    {
        set<string> uni = split(s, i);
        if (uni.size() == 1)
        {
            out = *uni.begin();
            break;
        }
    }
    cout << out;
    return 0;
}

The first idea that comes to my mind is trying all repeating sequences whose lengths divide length(S) = N. There are at most N/2 such lengths, so this results in an O(N^2) algorithm.
But I'm sure it can be improved...
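One known way to improve on that bound, which is not what this answer describes but a different technique swapped in for illustration, is the prefix function from the Knuth-Morris-Pratt algorithm. If the longest proper border of the whole string has length b, then N - b is the smallest period, and the string is a whole number of repetitions exactly when N - b divides N. A minimal sketch, with smallest_period_kmp as a made-up name:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Smallest repeating unit via the KMP prefix function, in O(N) time.
// Returns s itself when s is not a whole number of repetitions.
std::string smallest_period_kmp(const std::string& s) {
    const std::size_t n = s.size();
    if (n == 0) return s;
    // pi[i] is the length of the longest proper border of s[0..i].
    std::vector<std::size_t> pi(n, 0);
    for (std::size_t i = 1; i < n; ++i) {
        std::size_t k = pi[i - 1];
        while (k > 0 && s[i] != s[k]) k = pi[k - 1];
        if (s[i] == s[k]) ++k;
        pi[i] = k;
    }
    const std::size_t period = n - pi[n - 1];
    return (n % period == 0) ? s.substr(0, period) : s;
}
```

The divisibility check matters: "abcab" has period 3 in the loose sense, but it is not made of whole copies of "abc", so the sketch returns the full string.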

Here is a more general solution to the problem that will find repeating subsequences within a sequence (of anything), where the subsequences do not have to start at the beginning or immediately follow each other.
Given a sequence b[0..n] containing the data in question, and a threshold t being the minimum subsequence length to find:
l_max = 0, i_max = 0, j_max = 0;
for (i=0; i<n-(t*2);i++) {
for (j=i+t;j<n-t; j++) {
l=0;
while (i+l<j && j+l<n && b[i+l] == b[j+l])
l++;
if (l>t) {
print "Sequence of length " + l + " found at " + i + " and " + j;
if (l>l_max) {
l_max = l;
i_max = i;
j_max = j;
}
}
}
}
if (l_max>t) {
print "longest common subsequence found at " + i_max + " and " + j_max + " (" + l_max + " long)";
}
Basically:
Start at the beginning of the data, iterate until within 2*t of the end (no possible way to have two distinct subsequences of length t in less than 2*t of space!)
For the second subsequence, start at least t bytes beyond where the first sequence begins.
Then, reset the length of the discovered subsequence to 0, and check to see if you have a common character at i+l and j+l. As long as you do, increment l.
When you no longer have a common character, you have reached the end of your common subsequence.
If the subsequence is longer than your threshold, print the result.
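A compilable C++ rendering of the pseudocode might look like the following. It is a sketch under a few assumptions: the data is a std::string, only the longest match is returned rather than every match being printed, the Repeat struct and the name longest_repeat are invented here, and a run counts as soon as its length reaches t (the pseudocode uses a strict l>t).

```cpp
#include <cstddef>
#include <string>

struct Repeat {          // hypothetical result type
    std::size_t first;   // start of the first occurrence
    std::size_t second;  // start of the second occurrence
    std::size_t length;  // 0 means nothing of length >= t was found
};

Repeat longest_repeat(const std::string& b, std::size_t t) {
    Repeat best{0, 0, 0};
    const std::size_t n = b.size();
    if (t == 0 || n < 2 * t) return best;  // no room for two runs of length t
    for (std::size_t i = 0; i + 2 * t <= n; ++i) {
        for (std::size_t j = i + t; j + t <= n; ++j) {
            std::size_t l = 0;
            // Extend while both runs match and the first does not reach j.
            while (i + l < j && j + l < n && b[i + l] == b[j + l]) ++l;
            if (l >= t && l > best.length) best = {i, j, l};
        }
    }
    return best;
}
```

For "xxABCDyyABCDzz" with t = 3 this reports the run ABCD at offsets 2 and 8.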

Just figured this out myself and wrote some code for this (written in C#) with a lot of comments. Hope this helps someone:
// Check whether the string contains a repeating sequence.
public static bool ContainsRepeatingSequence(string str)
{
if (string.IsNullOrEmpty(str)) return false;
for (int i=0; i<str.Length; i++)
{
// Every iteration, cut down the string from i to the end.
string toCheck = str.Substring(i);
// Set N equal to half the length of the substring. At most, we have to compare half the string to half the string. If the string length is odd, the last character will not be checked against, but it will be checked in the next iteration.
int N = toCheck.Length / 2;
// Check strings of all lengths from 1 to N against the subsequent string of length 1 to N.
for (int j=1; j<=N; j++)
{
// Check from beginning to j-1, compare against j to j+j.
if (toCheck.Substring(0, j) == toCheck.Substring(j, j)) return true;
}
}
return false;
}
Feel free to ask any questions if it's unclear why it works.

And here is a concrete working example:
/* find greatest repeated substring */
char *fgrs(const char *s,size_t *l)
{
char *r=0,*a=s;
*l=0;
while( *a )
{
char *e=strrchr(a+1,*a);
if( !e )
break;
do {
size_t t=1;
for(;&a[t]!=e && a[t]==e[t];++t);
if( t>*l )
*l=t,r=a;
while( --e!=a && *e!=*a );
} while( e!=a && *e==*a );
++a;
}
return r;
}
size_t t;
const char *p;
p=fgrs("BARBARABARBARABARBARA",&t);
while( t-- ) putchar(*p++);
p=fgrs("0123456789",&t);
while( t-- ) putchar(*p++);
p=fgrs("1111",&t);
while( t-- ) putchar(*p++);
p=fgrs("11111",&t);
while( t-- ) putchar(*p++);

Not sure how you define "efficiently". For easy/fast implementation you could do this in Java:
private static String findSequence(String text) {
Pattern pattern = Pattern.compile("(.+?)\\1+");
Matcher matcher = pattern.matcher(text);
return matcher.matches() ? matcher.group(1) : null;
}
It tries to find the shortest string (.+?) that must be repeated at least once (\1+) to match the entire input text.
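Since the question is tagged C++, the same pattern can be tried with std::regex, whose default ECMAScript grammar supports both the lazy quantifier and the backreference. This is a sketch rather than a tuned solution: find_sequence mirrors the Java method name, an empty string stands in for Java's null, and backtracking regex engines can be slow or run deep on long inputs.

```cpp
#include <regex>
#include <string>

// Returns the shortest unit whose repetition (two or more times) spells the
// whole text, or an empty string when there is none.
std::string find_sequence(const std::string& text) {
    static const std::regex pattern(R"((.+?)\1+)");
    std::smatch match;
    // regex_match succeeds only if the pattern covers the entire input.
    return std::regex_match(text, match, pattern) ? match[1].str()
                                                  : std::string();
}
```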

This is a solution I came up with using a queue; it passed all the test cases of a similar problem on Codeforces (problem 745A).
#include<bits/stdc++.h>
using namespace std;
typedef long long ll;
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
string s, s1, s2; cin >> s; queue<char> qu; qu.push(s[0]); bool flag = true; int ind = -1;
s1 = s.substr(0, s.size() / 2);
s2 = s.substr(s.size() / 2);
if(s1 == s2)
{
for(int i=0; i<s1.size(); i++)
{
s += s1[i];
}
}
//cout << s1 << " " << s2 << " " << s << "\n";
for(int i=1; i<s.size(); i++)
{
if(qu.front() == s[i]) {qu.pop();}
qu.push(s[i]);
}
int cycle = qu.size();
/*queue<char> qu2 = qu; string str = "";
while(!qu2.empty())
{
cout << qu2.front() << " ";
str += qu2.front();
qu2.pop();
}*/
while(!qu.empty())
{
if(s[++ind] != qu.front()) {flag = false; break;}
qu.pop();
}
flag == true ? cout << cycle : cout << s.size();
return 0;
}

I'd convert the array to a String object and use a regex.

Put all your characters in an array, e.g. a[]:
i=0; j=0;
for( 0 < i < count )
{
if (a[i] == a[i+j+1])
{++i;}
else
{++j;i=0;}
}
Then the ratio i/j is the repeat count in your array.
You must pay attention to the limits of i and j, but it is a simple solution.

Related

Search For Subtext In Text With A Defined Algorithm

I want to create a program to search for a subtext in a text.
For example, I have this text: abcdeabbdfeg
And in that text I want to find: cd
But I want to use this algorithm:
start = 1
end = string length of the text
middle = (start + end) / 2
if (pattern < text[middle]) end = middle - 1;
if (pattern > text[middle]) start = middle + 1;
...and continue until the pattern is found in the text
I already have a simple program that works without any problem, but it does not use the algorithm above. I now want to implement that algorithm in my program. I have tried many ways, but once I add the algorithm, my program never prints anything.
This is the code that I have and works:
void search(char *pat, char *txt)
{
int M = strlen(pat);
int N = strlen(txt);
for (int i = 0; i <= N - M; i++)
{
int j;
for (j = 0; j < M; j++)
{
if (txt[i+j] != pat[j])
break;
}
if (j == M)
{
printf("Pattern found at index %d \n", i);
}
}
}
And this is the code above with the implementation of the algorithm:
int _tmain(int argc, _TCHAR* argv[])
{
char t[32];
cout << "Please enter your text (t):";
cin >> t;
char p[32];
cout << "Please enter the pattern (p) you wish to look for in that text (t):";
cin >> p;
int start, end = 0;
double middle = 0;
start = 1;
end = strlen(t);
while (start <= end)
{
int M = strlen(p);
int N = strlen(t);
middle = std::ceil((start + end) / 2.0);
int mid = (int)middle;
for (int i = mid; i <= M; i++)
{
int j;
for (j = 0; j < M; j++)
{
if (t[mid] != p[j]) break;
if (p[j] < t[mid]) { end = mid - 1; }
else if (p[j] > t[mid]) { start = mid + 1; }
}
if (j == M)
{
printf("Pattern found at index %d \n", i);
}
}
}
if (start > end) cout << "Search has ended: pattern p does not occur in the text." << endl;
return 0;
}

Your algorithm is still a binary search. You split the array into two partitions, then select a partition, based on the value of a letter.
A partitioned search requires an ordered collection.
Let's use your example.
  0   1   2   3   4   5   6   7   8   9  10  11
+---+---+---+---+---+---+---+---+---+---+---+---+
| a | b | c | d | e | a | b | b | d | f | e | g |
+---+---+---+---+---+---+---+---+---+---+---+---+
If you choose the midpoint at index 5, this yields the letter a. Since you are searching for the letter c first, then d, and c is greater than a, the algorithm says that the letter c must lie in the partition 6..11. This is the root of your issue.
The algorithm will not find cd because there is no c in the partition 6..11.
The algorithm assumes that the array is sorted and that for any given index, there will be one partition containing values less than array[index] and one partition containing values greater than array[index].
This assumption is demonstrated by the following code of yours:
if (p[j] < t[mid]) { end = mid - 1; }
else if (p[j] > t[mid]) { start = mid + 1; }
No matter how you name your algorithm, if it assumes an ordering on the array (e.g. p[j] < t[mid]), the array must be ordered.
Your data is not ordered, so it violates the algorithm's assumption, and thus the algorithm fails.
Edit 1:
Using Partitions
If you really must use a partitioning algorithm, you will need to build a set of partitions.
For example one partition starts at index 0 and proceeds until array[i] > array[i+1], this ends up at index 4. The other partition is 5..11.
(By the way, by determining the partitions, you have used more operations than a linear search.)
At this point, how do you know which partition to choose?
You don't. The letter c, that you are searching for, lies between a and e in the first partition; and a through g in the second partition. Pick a partition. If not found in the partition, you will have to search the other partition.
By performing a binary search on either partition, you have used more operations than a linear search.
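For unordered text like this, a plain linear scan is the straightforward route. As a small illustration (not the asker's code), std::string::find can do the scanning; find_all is a made-up helper name that collects every match position.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects every index at which pat occurs in txt (overlapping matches
// included). An empty pattern yields no matches in this sketch.
std::vector<std::size_t> find_all(const std::string& txt, const std::string& pat) {
    std::vector<std::size_t> hits;
    if (pat.empty()) return hits;
    for (std::size_t pos = txt.find(pat); pos != std::string::npos;
         pos = txt.find(pat, pos + 1))
        hits.push_back(pos);
    return hits;
}
```

With the text from the question, find_all("abcdeabbdfeg", "cd") yields the single index 2, which is the position a binary search over this unsorted data can never be relied on to reach.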

How would I cycle through all of the various possibilities in this situation?

I saw a programming assignment that I decided to try, and it's basically where the user inputs something like "123456789=120", and the program has to insert a '+' or '-' at different positions to make the statement true. For example, in this case, it could do 123+4-5+6-7+8-9 = 120. There are only 3^8 possible combinations, so I think it would be okay to brute force it, but I don't know exactly in what order I could go in/how to actually implement that. More specifically, I don't know what order I would go in in inserting the '+' and '-'. Here is what I have:
#include <iostream>
#include <cmath>
using namespace std;
int string_to_integer(string);
int main()
{
string input, result_string;
int result, possibilities;
getline(cin, input);
//remove spaces
for(int i = 0; i < input.size(); i++)
{
if(input[i] == ' ')
{
input.erase(i, 1);
}
}
result_string = input.substr(input.find('=') + 1, input.length() - input.find('='));
result = string_to_integer(result_string);
input.erase(input.find('='), input.length() - input.find('='));
possibilities = pow(3, input.length() - 1);
cout << possibilities;
}
int string_to_integer(string substring)
{
int total = 0;
int power = 1;
for(int i = substring.length() - 1; i >= 0; i--)
{
total += (power * (substring[i] - 48));
power *= 10;
}
return total;
}

The basic idea: generate all the possible variations of +, - operators (including the case where the operator is missing), then parse the string and obtain the sum.
The approach: combinatorially, it is easy to show that we can do this by associating the operators (or the absence thereof) with the base-3 digits. So we can just iterate over every 8-digit ternary number, but instead of printing 0, 1 and 2, we will append a "+", a "-" or nothing before the next digit in the string.
Note that we do not actually need a string for this; one could use digits and operators etc. directly as well, computing the result on the fly. I only took the string-based approach because it's simple to explain, trivial to implement, and additionally, it gives us some visual feedback, which helps understanding the solution.
Now that we have constructed our string, we can just parse it; the simplest solution is to use the C standard library function strtol() for this purpose, which will take signs into account and it will return a signed integer. Because of this, we can just sum all the signed integers in a simple loop and we are done.
Code:
#include <iostream>
#include <string>
#include <cstring>
#include <cstdlib>
int main()
{
const char *ops = " +-";
// 3 ^ 8 = 6561
for (int i = 0; i < 6561; i++) {
// first, generate the line
int k = i;
std::string line = "1";
for (int j = 0; j < 8; j++) {
if (k % 3)
line += ops[k % 3];
k /= 3;
line += (char)('2' + j);
}
// now parse it
int result = 0;
const char *s = line.c_str();
char *p;
while (*s) {
int num = strtol(s, &p, 10);
result += num;
s = p;
}
// output
std::cout << line << " = " << result << (result == 120 ? " MATCH" : "") << std::endl;
}
return 0;
}
Result:
h2co3-macbook:~ h2co3$ ./quirk | grep MATCH
12-3-45+67+89 = 120 MATCH
1+2-34-5+67+89 = 120 MATCH
12-3+4+5+6+7+89 = 120 MATCH
1-23+4+56-7+89 = 120 MATCH
1+2+34-5+6-7+89 = 120 MATCH
123+4+5-6-7-8+9 = 120 MATCH
1+2-3+45+6+78-9 = 120 MATCH
12-3+45+67+8-9 = 120 MATCH
123+4-5+6-7+8-9 = 120 MATCH
123-4+5+6+7-8-9 = 120 MATCH
h2co3-macbook:~ h2co3$

The following bool advance(string& s) function will give you all combinations of '+', '-' and ' ' in strings of arbitrary length, except one (the initial all-spaces string), and will return false when no more are available.
char advance(char c)
{
switch (c)
{
case ' ': return '+';
case '+': return '-';
default: case '-': return ' ';
}
}
bool advance(string& s)
{
for (int i = 0; i < s.size(); ++i)
if ((s[i] = advance(s[i])) != ' ')
return true;
return false;
}
You first have to feed it a string of the desired length containing only spaces, and then repeatedly 'advance' it. Usage:
string s = "    ";
while (advance(s))
    cout << '"' << s << '"' << endl;
The above code will print
"+   "
"-   "
" +  "
"++  "
"-+  "
" -  "
.
.
.
" ---"
"+---"
"----"
Note that the 'first' combination with just 4 spaces is not printed.
You can interleave those combinations with your lhs, skipping spaces, to produce expressions.
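The interleaving step might look like this. It is only a sketch: interleave is a hypothetical helper, and it assumes the operator string has one character per gap between digits (extra or missing operator characters are simply ignored).

```cpp
#include <cstddef>
#include <string>

// Weaves ops[i] between digits[i] and digits[i + 1], dropping the spaces
// that stand for "no operator".
std::string interleave(const std::string& digits, const std::string& ops) {
    std::string expr;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        expr += digits[i];
        if (i < ops.size() && i + 1 < digits.size() && ops[i] != ' ')
            expr += ops[i];
    }
    return expr;
}
```

For example, interleave("1234", "+ -") produces "1+23-4", which can then be parsed and summed.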

Another very similar approach, in C++, and a bit more configurable.
The same base-3 number trick is used to enumerate the combinations of no operator, + and -.
The string is handled as a list of positive or negative values that are added together.
The other contribution is very compact and elegant, but uses some C tricks to shorten the code.
This one is hopefully a bit more detailed, albeit not as beautiful.
#include <iostream>
#include <string>
#include <cstring>
using namespace std;
void solver (const char * str, int result)
{
    const size_t len = strlen(str);
    // number of operator combinations: 3 choices for each of the len-1 gaps
    int op_max = 1;
    for (size_t i = 1; i < len; i++) op_max *= 3;
    // loop through all possible operator combinations
    for (int o = 0 ; o != op_max ; o++)
    {
        int res = 0; // computed operation result
        int sign = 1; // sign of the current value
        int val = str[0]-'0'; // read 1st digit
        string literal; // literal display of the current operation
        // parse remaining digits
        int op = o;
        for (size_t i = 1 ; i != len ; i++, op /= 3)
        {
            // get current digit
            int c = str[i]-'0';
            // get current operator
            int oper = op % 3;
            // apply operator
            if (oper == 0) val = 10*val + c;
            else
            {
                // add previous value, with its sign, to the display and the sum
                if (sign < 0) literal += '-';
                else if (!literal.empty()) literal += '+';
                literal += to_string(val);
                res += sign*val;
                // store next sign
                sign = oper == 1 ? 1 : -1;
                // start a new value
                val = c;
            }
        }
        // add last value
        if (sign < 0) literal += '-';
        else if (!literal.empty()) literal += '+';
        literal += to_string(val);
        res += sign*val;
        // check result
        if (res == result)
        {
            cout << literal << " = " << result << endl;
        }
    }
}
int main(void)
{
    solver ("123456789", 120);
}
Note: I used std::strings out of laziness, though they are notoriously slow.

Longest common substring from more than two strings - C++

I need to compute the longest common substrings from a set of filenames in C++.
Precisely, I have a std::list of std::strings (or the Qt equivalent, also fine):
char const *x[] = {"FirstFileWord.xls", "SecondFileBlue.xls", "ThirdFileWhite.xls", "ForthFileGreen.xls"};
std::list<std::string> files(x, x + sizeof(x) / sizeof(*x));
I need to compute the n distinct longest common substrings of all strings, in this case e.g. for n=2
"File" and ".xls"
If I could compute the longest common substring, I could cut it out and run the algorithm again to get the second longest, so essentially this boils down to:
Is there a (reference?) implementation for computing the LCS of a std::list of std::strings?
This is not a good answer but a dirty solution that I have - brute force on a QList of QUrls from which only the part after the last "/" is taken. I'd love to replace this with "proper" code.
(I have discovered http://www.icir.org/christian/libstree/ - which would help greatly, but I can't get it to compile on my machine. Someone used this maybe?)
QString SubstringMatching::getMatchPattern(QList<QUrl> urls)
{
QString a;
int foundPosition = -1;
int foundLength = -1;
for (int i=urls.first().toString().lastIndexOf("/")+1; i<urls.first().toString().length(); i++)
{
bool hit=true;
int xj;
for (int j=0; j<urls.first().toString().length()-i+1; j++ ) // try to match from position i up to the end of the string :: test character at pos. (i+j)
{
if (!hit) break;
QString firstString = urls.first().toString().right( urls.first().toString().length()-i ).left( j ); // this needs to match all k strings
//qDebug() << "SEARCH " << firstString;
for (int k=1; k<urls.length(); k++) // test all other strings, k = test string number
{
if (!hit) break;
//qDebug() << " IN " << urls.at(k).toString().right(urls.at(k).toString().length() - urls.at(k).toString().lastIndexOf("/")+1);
//qDebug() << " RES " << urls.at(k).toString().indexOf(firstString, urls.at(k).toString().lastIndexOf("/")+1);
if (urls.at(k).toString().indexOf(firstString, urls.at(k).toString().lastIndexOf("/")+1)<0) {
xj = j;
//qDebug() << "HIT LENGTH " << xj-1 << " : " << firstString;
hit = false;
}
}
}
if (hit) xj = urls.first().toString().length()-i+1; // hit up to the end of the string
if ((xj-2)>foundLength) // have longer match than existing, j=1 is match length
{
foundPosition = i; // at the current position
foundLength = xj-1;
//qDebug() << "Found at " << i << " length " << foundLength;
}
}
a = urls.first().toString().right( urls.first().toString().length()-foundPosition ).left( foundLength );
//qDebug() << a;
return a;
}

If as you say suffix trees are too heavyweight or otherwise impractical, the following
fairly simple brute-force approach may be adequate for your application.
I assume distinct substrings shall be non-overlapping and are picked from
left to right.
Even with these assumptions, there need not be a unique set that comprises
"the N distinct longest common substrings" of a set of strings. Whatever N is,
there might be more than N distinct common substrings all of the same maximal
length and any choice of N from among them would be arbitrary. Accordingly
the solution finds the at-most N *sets* of the longest distinct common
substrings in which all those of the same length are one set.
The algorithm is as follows:
Q is the target quota of lengths.
Strings is the problem set of strings.
Results is an initially empty multimap that maps a length to a set of strings,
Results[l] being the set with length l
N, initially 0, is the number of distinct lengths represented in Results
If Q is 0 or Strings is empty return Results
Find any shortest member of Strings; keep a copy of it S and remove it
from Strings. We proceed by comparing the substrings of S with those
of Strings because all the common substrings of {Strings, S} must be
substrings of S.
Iteratively generate all the substrings of S, longest first, using the
obvious nested loop controlled by offset and length. For each substring ss of
S:
If ss is not a common substring of Strings, next.
Iterate over Results[l] for l >= the length of ss until end of
Results or until ss is found to be a substring of the examined
result. In the latter case, ss is not distinct from a result already
in hand, so next.
ss is a common substring distinct from any already in hand. Iterate over
Results[l] for l < the length of ss, deleting each result that is a
substring of ss, because all those are shorter than ss and not distinct
from it. ss is now a common substring distinct from any already in hand and
all others that remain in hand are distinct from ss.
For l = the length of ss, check whether Results[l] exists, i.e. if
there are any results in hand the same length as ss. If not, call that
a NewLength condition.
Check also if N == Q, i.e. we have already reached the target quota of distinct
lengths. If NewLength obtains and also N == Q, call that a StickOrRaise condition.
If StickOrRaise obtains then compare the length of ss with l = the
length of the shortest results in hand. If ss is shorter than l
then it is too short for our quota, so next. If ss is longer than l
then all the shortest results in hand are to be ousted in favour of ss, so delete
Results[l] and decrement N.
Insert ss into Results keyed by its length.
If NewLength obtains, increment N.
Abandon the inner iteration over substrings of S that have the
same offset of ss but are shorter, because none of them are distinct
from ss.
Advance the offset in S for the outer iteration by the length of ss,
to the start of the next non-overlapping substring.
Return Results.
Here is a program that implements the solution and demonstrates it with
a list of strings:
#include <list>
#include <map>
#include <string>
#include <iostream>
#include <algorithm>
using namespace std;
// Get a non-const iterator to the shortest string in a list
list<string>::iterator shortest_of(list<string> & strings)
{
auto where = strings.end();
size_t min_len = size_t(-1);
for (auto i = strings.begin(); i != strings.end(); ++i) {
if (i->size() < min_len) {
where = i;
min_len = i->size();
}
}
return where;
}
// Say whether a string is a common substring of a list of strings
bool
is_common_substring_of(
string const & candidate, list<string> const & strings)
{
for (string const & s : strings) {
if (s.find(candidate) == string::npos) {
return false;
}
}
return true;
}
/* Get a multimap whose keys are the at-most `quota` greatest
lengths of common substrings of the list of strings `strings`, each key
multi-mapped to the set of common substrings of that length.
*/
multimap<size_t,string>
n_longest_common_substring_sets(list<string> & strings, unsigned quota)
{
size_t nlengths = 0;
multimap<size_t,string> results;
if (quota == 0) {
return results;
}
auto shortest_i = shortest_of(strings);
if (shortest_i == strings.end()) {
return results;
}
string shortest = *shortest_i;
strings.erase(shortest_i);
for ( size_t start = 0; start < shortest.size();) {
size_t skip = 1;
for (size_t len = shortest.size(); len > 0; --len) {
string subs = shortest.substr(start,len);
if (!is_common_substring_of(subs,strings)) {
continue;
}
auto i = results.lower_bound(subs.size());
for ( ;i != results.end() &&
i->second.find(subs) == string::npos; ++i) {}
if (i != results.end()) {
continue;
}
for (i = results.begin();
i != results.end() && i->first < subs.size(); ) {
if (subs.find(i->second) != string::npos) {
i = results.erase(i);
} else {
++i;
}
}
auto hint = results.lower_bound(subs.size());
bool new_len = hint == results.end() || hint->first != subs.size();
if (new_len && nlengths == quota) {
size_t min_len = results.begin()->first;
if (min_len > subs.size()) {
continue;
}
results.erase(min_len);
--nlengths;
}
nlengths += new_len;
results.emplace_hint(hint,subs.size(),subs);
len = 1;
skip = subs.size();
}
start += skip;
}
return results;
}
// Testing ...
int main()
{
list<string> strings{
"OfBitWordFirstFileWordZ.xls",
"SecondZWordBitWordOfFileBlue.xls",
"ThirdFileZBitWordWhiteOfWord.xls",
"WordFourthWordFileBitGreenZOf.xls"};
auto results = n_longest_common_substring_sets(strings,4);
for (auto const & val : results) {
cout << "length: " << val.first
<< ", substring: " << val.second << endl;
}
return 0;
}
Output:
length: 1, substring: Z
length: 2, substring: Of
length: 3, substring: Bit
length: 4, substring: .xls
length: 4, substring: File
length: 4, substring: Word
(Built with gcc 4.8.1)

Generating all n-letter permutations

I am attempting to calculate all the possible 3 letter permutations, using the 26 letters (Which amounts to only 26*25*24=15,600). The order of the letters matters, and I don't want repeating letters. (I wanted the permutations to be generated in lexicographical order, but that isn't necessary)
So far I attempted to nest for loops, but I ended up iterating through every combination possible. So there are repeating letters, which I do not want, and the for loops can become difficult to manage if I want more than 3 letters.
I can flip through the letters until I get a letter that has not been used, but it isn't in lexicographical order and it is much slower than using next_permutation (I cannot use this std method because I'm left calculating all of the subsets of the 26 letters).
Is there a more efficient way to do this?
To put in perspective of the inefficiency, next_permutation iterates through the first 6 digits instantaneously. However, it takes several seconds to get all the three letter permutations using this method, and next_permutation still quickly becomes inefficient with the 2^n subsets I must calculate.
Here is what I have for the nested for loops:
char key[] = {'a','b','c','d','e','f','g','h','i','j','k',
'l','m','n','o','p','r','s','t','u','v','w','x','y','z'};
bool used[25];
ZeroMemory( used, sizeof(bool)*25 );
for( int i = 0; i < 25; i++ )
{
while( used[i] == true )
i++;
if( i >= 25 )
break;
used[i] = true;
for( int j = 0; j < 25; j++ )
{
while( used[j] == true )
j++;
if( j >= 25 )
break;
used[j] = true;
for( int k = 0; k < 25; k++ )
{
while( used[k] == true )
k++;
if( k >= 25 )
break;
used[k] = true;
cout << key[i] << key[j] << key[k] << endl;
used[k] = false;
}
used[j] = false;
}
used[i] = false;
}

Make a root which represents the start of a combination, so it has no value.
calculate all the possible children (26 letter, 26 children...)
for each root child calculate possible children (so: remaining letters)
use a recursive limited-depth search to find your combinations.

This is a solution I would try if I just wanted a "simple" solution. I'm not sure how resource-intensive this is, so I suggest you start with a small set of letters.
a = {a...z}
b = {a...z}
c = {a...z}
for each(a)
{
for each(b)
{
for each(c)
{
echo a + b + c;
}
}
}

For a specific, small n, manual loops like the ones you have are the easiest way. However, your code can be greatly simplified:
for(char a='a'; a<='z'; ++a) {
for(char b='a'; b<='z'; ++b) {
if (b==a) continue;
for(char c='a'; c<='z'; ++c) {
if (c==a) continue;
if (c==b) continue;
std::cout << a << b << c << '\n';
}
}
}
For a variable N, we obviously need a different strategy, and it turns out to be a very different one. This is based on DaMachk's answer, using recursion to generate subsequent letters:
template<class func_type>
void generate(std::string& word, int length, const func_type& func) {
for(char i='a'; i<='z'; ++i) {
bool used = false;
for(char c : word) {
if (c==i) {
used = true;
break;
}
}
if (used) continue;
word.push_back(i);
if (length==1) func(word);
else generate(word, length-1, func);
word.pop_back();
}
}
template<class func_type>
void generate(int length, const func_type& func) {
std::string word;
generate(word, length, func);
}
You can see it here
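As a quick sanity check of the recursive idea, here is a self-contained variant that collects the words into a vector instead of invoking a callback. That is an assumption made purely so the result can be counted; the names collect and all_permutations are invented for this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends to out every length-n arrangement of distinct letters a..z that
// extends word, in lexicographical order.
void collect(std::string& word, std::size_t n, std::vector<std::string>& out) {
    if (word.size() == n) {
        out.push_back(word);
        return;
    }
    for (char c = 'a'; c <= 'z'; ++c) {
        if (word.find(c) != std::string::npos) continue;  // letter already used
        word.push_back(c);
        collect(word, n, out);
        word.pop_back();
    }
}

std::vector<std::string> all_permutations(std::size_t n) {
    std::vector<std::string> out;
    std::string word;
    collect(word, n, out);
    return out;
}
```

For n = 3 the vector holds 26*25*24 = 15,600 words, starting at "abc" and ending at "zyx", which matches the count given in the question.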
I also made an unrolled version, which turned out to be incredibly complicated but is significantly faster. It has two helper functions. The first, next_unused, increases the letter at an index to the next unused letter, or returns false if it cannot. The second, reset_range, "resets" a range of letters, from a given index to the end of the string, to the first unused letters it can. First we use reset_range to find the first string. To find subsequent strings, we call next_unused on the last letter; if that fails, on the second-to-last letter; if that fails, on the third-to-last letter; and so on. When we find a letter we can increase, we "reset" all the letters to the right of it to the smallest unused values. If we get all the way to the first letter and it cannot be increased, we have reached the end and stop. The code is frightening, but it's the best I could figure out.
bool next_unused(char& dest, char begin, bool* used) {
used[dest] = false;
dest = 0;
if (begin > 'Z') return false;
while(used[begin]) {
if (++begin > 'Z')
return false;
}
dest = begin;
used[begin] = true;
return true;
}
void reset_range(std::string& word, int begin, bool* used) {
    int count = word.size()-begin;
    for(int i=0; i<count; ++i) {
        // call outside assert so it still runs when NDEBUG is defined
        bool ok = next_unused(word[i+begin], 'A'+i, used);
        assert(ok);
        (void)ok;
    }
}
template<class func_type>
void doit(int n, func_type func) {
bool used['Z'+1] = {};
std::string word(n, '\0');
reset_range(word, 0, used);
for(;;) {
func(word);
//find next word
int index = word.size()-1;
while(next_unused(word[index], word[index]+1, used) == false) {
if (--index < 0)
return; //no more permutations
}
reset_range(word, index+1, used);
}
}
Here it is at work.
And here it is running in a quarter of the time of the simple one.
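One way to sanity-check either generator is to count what it produces: over the 26 letters 'A'..'Z', the number of n-letter words with no repeated letter is 26 * 25 * ... * (26 - n + 1). The sketch below is only an illustration of that check. The names `count_words` and `expected_count` are invented here, and the alphabet 'A'..'Z' is assumed from the unrolled version.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: extends `word` with every unused letter 'A'..'Z'
// and counts the words that reach the requested length.
std::uint64_t count_words(std::string& word, bool (&used)[26], int remaining) {
    if (remaining == 0) return 1;
    std::uint64_t total = 0;
    for (int i = 0; i < 26; ++i) {
        if (used[i]) continue;
        used[i] = true;
        word.push_back(static_cast<char>('A' + i));
        total += count_words(word, used, remaining - 1);
        word.pop_back();
        used[i] = false;
    }
    return total;
}

// Convenience overload that starts from an empty word.
std::uint64_t count_words(int n) {
    std::string word;
    bool used[26] = {};
    return count_words(word, used, n);
}

// Closed form 26 * 25 * ... * (26 - n + 1).
// Limited to n <= 15 so the product fits in 64 bits.
std::uint64_t expected_count(int n) {
    assert(n >= 0 && n <= 15);
    std::uint64_t result = 1;
    for (int k = 0; k < n; ++k) result *= static_cast<std::uint64_t>(26 - k);
    return result;
}
```

Comparing `count_words(n)` with `expected_count(n)` for small n (for instance n = 3) catches generators that skip or repeat words.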

I was doing a similar thing in PowerShell: generating all the possible orderings (permutations) of 9 symbols. After a bit of trial and error, this is what I came up with.
$S1=New-Object System.Collections.ArrayList
$S1.Add("a")
$S1.Add("b")
$S1.Add("c")
$S1.Add("d")
$S1.Add("e")
$S1.Add("f")
$S1.Add("g")
$S1.Add("h")
$S1.Add("i")
$S1 | % {$a = $_
$S2 = $S1.Clone()
$S2.Remove($_)
$S2 | % {$b = $_
$S3 = $S2.Clone()
$S3.Remove($_)
$S3 | % {$c = $_
$S4 = $S3.Clone()
$S4.Remove($_)
$S4 | % {$d = $_
$S5 = $S4.Clone()
$S5.Remove($_)
$S5 | % {$e = $_
$S6 = $S5.Clone()
$S6.Remove($_)
$S6 | % {$f = $_
$S7 = $S6.Clone()
$S7.Remove($_)
$S7 | % {$g = $_
$S8 = $S7.Clone()
$S8.Remove($_)
$S8 | % {$h = $_
$S9 = $S8.Clone()
$S9.Remove($_)
$S9 | % {$i = $_
($a+$b+$c+$d+$e+$f+$g+$h+$i)
}
}
}
}
}
}
}
}
}
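The nine nested loops fix the depth at nine symbols. For comparison, here is a minimal C++ sketch that enumerates the same orderings for any number of symbols with `std::next_permutation` from the standard library. This is a different technique from the nested loops above, not a translation of them. `all_permutations` is a name made up for this example, and it collects the results in a vector only to keep the sketch easy to check.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns every distinct ordering of the characters
// in `symbols`. The input is sorted first so that std::next_permutation
// walks through all orderings in lexicographic order.
std::vector<std::string> all_permutations(std::string symbols) {
    std::sort(symbols.begin(), symbols.end());
    std::vector<std::string> result;
    do {
        result.push_back(symbols);
    } while (std::next_permutation(symbols.begin(), symbols.end()));
    return result;
}
```

For nine distinct symbols such as "abcdefghi" this yields 9! = 362880 strings, so printing each one as it is generated may be preferable to storing them all.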

How to on efficient and quick way add prefix to number and remove?

How can I efficiently add a prefix digit to a number and later remove it? (The number can have an arbitrary number of digits; there is no limit.)
For example, I have the number 122121 and want to add the digit 9 at the beginning to get 9122121; after that, I need to remove the first digit again. Currently I split the number into a vector of digits, push the new digit (9 in this case) to the front, and then rebuild the number from the digits (iterating and multiplying by 10).
Is there a more efficient way?

If you want efficiency, don't use anything other than numbers: no vectors, no strings, etc. In your case:
#include <iostream>
unsigned long long add_prefix( unsigned long long number, unsigned prefix )
{
// if you want the example marked (X) below to print "9", add this line:
if( number == 0 ) return prefix;
// without the above, the result of (X) would be "90"
unsigned long long tmp = ( number >= 100000 ) ? 1000000 : 10;
while( number >= tmp ) tmp *= 10;
return number + prefix * tmp;
}
int main()
{
std::cout << add_prefix( 122121, 9 ) << std::endl; // prints 9122121
std::cout << add_prefix( 122121, 987 ) << std::endl; // prints 987122121
std::cout << add_prefix( 1, 9 ) << std::endl; // prints 91
std::cout << add_prefix( 0, 9 ) << std::endl; // (X) prints 9 or 90
}
but watch out for overflows. Without overflows, the above even works for multi-digit prefixes. I hope you can figure out the reverse algorithm to remove the prefix.
Edited: As Andy Prowl pointed out, one could interpret 0 as "no digits", so the prefix should not be followed by the digit 0. But I guess it depends on the OP's use case, so I edited the code accordingly.
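The reverse operation, left as an exercise above, could look like the following sketch. It assumes the caller knows how many decimal digits the prefix has (1 for the 9 in the question). `remove_prefix` and its `prefix_digits` parameter are illustrative names and not part of the answer's code.

```cpp
#include <cassert>

// Illustrative inverse of add_prefix: drops the leading `prefix_digits`
// decimal digits of `number`. If the number has no more digits than the
// prefix, nothing is left and the result is 0.
unsigned long long remove_prefix(unsigned long long number, unsigned prefix_digits = 1) {
    assert(prefix_digits >= 1);
    // Count the decimal digits of number (0 counts as one digit).
    unsigned digits = 1;
    for (unsigned long long n = number; n >= 10; n /= 10) ++digits;
    if (prefix_digits >= digits) return 0;
    // divisor = 10^(digits - prefix_digits)
    unsigned long long divisor = 1;
    for (unsigned i = 0; i < digits - prefix_digits; ++i) divisor *= 10;
    return number % divisor;
}
```

Note that an integer cannot store leading zeros, so stripping the 9 from 9001 gives 1 rather than 001.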

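You can calculate the number of digits using floor(log10(number)) + 1. So the code would look like:
#include <cmath>
int number = 122121; // must be positive: log10 is not defined for 0
int noDigits = static_cast<int>(std::floor(std::log10(number))) + 1;
// 10^noDigits as an integer; % does not accept the double returned by pow
int scale = static_cast<int>(std::round(std::pow(10, noDigits)));
//Add 9 in front
number += 9 * scale;
//Now strip the 9
number %= scale;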
I hope I got everything right ;)

I shall provide an answer that makes use of binary search and a small benchmark of the answers provided so far.
Binary Search
The following function uses binary search to find the number of digits of the given number and then prepends the desired digit to it.
int addPrefix(int N, int digit) {
    // Bracketed comments give the range of digit counts handled by each branch.
    // Comparisons are strict: 10 already has two digits, 100 has three, etc.
    // Numbers with 10 digits are not handled.
    int multiplier = 0;
    // [1, 5]
    if(N < 100000) {
        // [1, 3]
        if(N < 1000) {
            //[1, 2]
            if(N < 100) {
                //[1, 1]
                if(N < 10) {
                    multiplier = 10;
                //[2, 2]
                } else {
                    multiplier = 100;
                }
            //[3, 3]
            } else {
                multiplier = 1000;
            }
        //[4, 4]
        } else if(N < 10000) {
            multiplier = 10000;
        //[5, 5]
        } else {
            multiplier = 100000;
        }
    //[6, 7]
    } else if(N < 10000000) {
        //[6, 6]
        if(N < 1000000) {
            multiplier = 1000000;
        //[7, 7]
        } else {
            multiplier = 10000000;
        }
    //[8, 9]
    } else {
        //[8, 8]
        if(N < 100000000) {
            multiplier = 100000000;
        //[9, 9]
        } else {
            multiplier = 1000000000;
        }
    }
    return N + digit * multiplier;
}
It is rather verbose. But, it finds the number of digits for a number in the range of int in a maximum of 4 comparisons.
Benchmark
I created a small benchmark that runs each provided algorithm for 450 million iterations: 50 million iterations for each number of digits.
int main(void) {
int i, j, N = 2, k;
for(i = 1; i < 9; ++i, N *= 10) {
for(j = 1; j < 50000000; ++j) {
k = addPrefix(N, 9);
}
}
return 0;
}
The results:
+-----+-----------+-------------+----------+---------+
| run | Alexander | Daniel Frey | kkuryllo | W.B. |
+-----+-----------+-------------+----------+---------+
| 1st | 2.204s | 3.983s | 5.145s | 23.216s |
+-----+-----------+-------------+----------+---------+
| 2nd | 2.189s | 4.044s | 5.081s | 23.484s |
+-----+-----------+-------------+----------+---------+
| 3rd | 2.197s | 4.232s | 5.043s | 23.378s |
+-----+-----------+-------------+----------+---------+
| AVG | 2.197s | 4.086s | 5.090s | 23.359s |
+-----+-----------+-------------+----------+---------+
You can find the sources used in this Gist here.

How about using lexical_cast from Boost? That way you're not doing the iteration and all that yourself.
http://www.boost.org/doc/libs/1_53_0/doc/html/boost_lexical_cast.html

You could put the digits in a std::string and use insert and erase, but it might be overkill.
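Since the question allows numbers of any length, keeping the digits as text avoids integer overflow altogether. Here is a minimal sketch of that idea using `std::string::insert` and `std::string::erase` (`std::string` has no member called delete). The function names are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prepends one decimal digit to a number stored as text.
std::string add_prefix_digit(std::string digits, char prefix) {
    assert(prefix >= '0' && prefix <= '9');
    digits.insert(digits.begin(), prefix);
    return digits;
}

// Hypothetical helper: removes the first digit, if there is one.
std::string remove_first_digit(std::string digits) {
    if (!digits.empty()) digits.erase(digits.begin());
    return digits;
}
```

Both operations shift the remaining characters, so each call is linear in the number of digits. Whether that matters depends on how long the numbers get.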

First, find the smallest power of 10 greater than your number. Then multiply the digit you want to add by that power and add the result to your number.
For example:
int x;
int addition;
int y = 1;
while (y <= x)
{
y *= 10;
}
x += addition * y;
I didn't test this code, so just take it as an example...
I also don't really understand your other instructions, you'll need to clarify.
Edit: okay, I think I understand that you also want to remove the first digit at some point. You can use a similar approach to do this.
int x;
int y = 1;
// find the largest power of 10 that does not exceed x
while (y <= x / 10)
{
y *= 10;
}
// the remainder drops the leading digit
x %= y;

