Longest prefix string length for all the suffixes

For every suffix of a string, find the length of the longest prefix it shares with the whole string, and sum those lengths.
For example, the suffixes of the string ababaa are ababaa, babaa, abaa, baa, aa and a. The similarities of each of these strings with the string "ababaa" are 6, 0, 3, 0, 1, 1 respectively. Thus the answer is 6 + 0 + 3 + 0 + 1 + 1 = 11.
I wrote the following code:
#include <iostream>
#include <string.h>
#include <stdio.h>
#include <time.h>
int main ( int argc, char **argv) {
size_t T;
std::cin >> T;
char input[100000];
for ( register size_t i = 0; i < T; ++i) {
std::cin >> input;
double t = clock();
size_t len = strlen(input);
char *left = input;
char *right = input + len - 1;
long long sol = 0;
int end_count = 1;
while ( left < right ) {
if ( *right != '\0') {
if ( *left++ == *right++ ) {
sol++;
continue;
}
}
end_count++;
left = input; // reset the left pointer
right = input + len - end_count; // set right to one left.
}
std::cout << sol + len << std::endl;
printf("time= %.3fs\n", (clock() - t) / (double)(CLOCKS_PER_SEC));
}
}
It works fine, but for a string that is 100000 characters long and consists of a single repeated character, i.e. aaaaaaaaaa.......a, it takes a long time. How can I optimize it further?

You can use a suffix array: http://en.wikipedia.org/wiki/Suffix_array

Let's say your ababaa is a pattern P.
I think you could use the following algorithm:
Create a suffix automaton for all possible suffixes of P.
Walk the automaton using P as input, counting the edges traversed so far. For each accepting state of the automaton, add the current edge count to the total sum. Walk the automaton until you either reach the end of the input or there are no more edges to follow.
The total sum is the result.

Use the Z algorithm to compute, in O(n), for every position the length of the longest substring starting there that is also a prefix of the string. Then scan the resulting array and sum its values.
Reference: https://www.geeksforgeeks.org/sum-of-similarities-of-string-with-all-of-its-suffixes/
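A minimal sketch of that approach in C++ follows. The function names zFunction and sumOfSimilarities are made up for this illustration, and z[0] is set to the full length so that the whole string counts as its own suffix, as in the question's example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// z[i] = length of the longest common prefix of s and the suffix starting at i.
// (Hypothetical helper name, not taken from the linked article.)
std::vector<std::size_t> zFunction(const std::string& s) {
    const std::size_t n = s.size();
    std::vector<std::size_t> z(n, 0);
    if (n == 0) return z;
    z[0] = n;  // the whole string matches itself
    std::size_t l = 0, r = 0;  // [l, r) is the rightmost segment known to match a prefix
    for (std::size_t i = 1; i < n; ++i) {
        if (i < r) z[i] = std::min(r - i, z[i - l]);  // reuse what is already known
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) ++z[i];  // extend by comparing
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

// Sum of the similarities of s with all of its suffixes.
unsigned long long sumOfSimilarities(const std::string& s) {
    unsigned long long total = 0;
    for (std::size_t v : zFunction(s)) total += v;
    return total;
}
```

For "ababaa" the Z array is 6, 0, 3, 0, 1, 1, which sums to the 11 from the question. On a string of 100000 identical characters each position is answered from the stored segment in constant time, so the quadratic blow-up of the original loop does not occur.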

From what I see, you are using a plain array to evaluate the suffixes, and though it may be efficient for some data sets, it will be inefficient for others, such as the one you mentioned.
You would need to implement a prefix tree (trie) or a similar data structure. The code for those isn't straightforward, so if you are not familiar with them, I would suggest you read a little about them first.

I'm not sure whether a trie gives you much of a performance gain, but I would certainly think about it.
The other idea I had is to try to compress your string. I didn't really think about it, just a crazy idea...
If you have a string like ababaa, compress it to something like abab2a. Then you have to come up with a technique that lets your algorithm work on such strings. The advantage is that you can then compare long strings (such as 100000 a's) with each other efficiently. Or, more importantly, you can calculate your sum very fast.
But again, I didn't think it through, maybe this is a very bad idea ;)

Here is a Java implementation:
// sprefix
String s = "abababa";
Vector<Integer>[] v = new Vector[s.length()];
int sPrefix = s.length();
v[0] = new Vector<Integer>();
v[0].add(new Integer(0));
for(int j = 1; j < s.length(); j++)
{
v[j] = new Vector<Integer>();
v[j].add(new Integer(0));
for(int k = 0; k < v[j - 1].size(); k++)
if(s.charAt(j) == s.charAt(v[j - 1].get(k)))
{
v[j].add(v[j - 1].get(k) + 1);
v[j - 1].set(k, 0);
}
}
for(int j = 0; j < v.length; j++)
for(int k = 0; k < v[j].size(); k++)
sPrefix += v[j].get(k);
System.out.println("Result = " + sPrefix);

Related

Given two strings S and T, determine a substring of S that has minimum difference with T

I have two strings S and T, where length of S >= length of T. I have to determine a substring of S which has the same length as T and has minimum difference with T. Here the difference between two strings of the same length means the number of indexes at which they differ. For example, "ABCD" and "ABCE" differ only at index 3 (0-based), so their difference is 1.
I know I can use the KMP (Knuth-Morris-Pratt) pattern searching algorithm to search for T within S. But what if S doesn't contain T as a substring? So I have coded a brute-force approach to solve this:
#include <climits>
#include <iostream>
#include <string>
using namespace std;
int main() {
    string S, T;
    cin >> S >> T;
    int SZ_S = S.size(), SZ_T = T.size(), MinDifference = INT_MAX;
    string ans;
    for (int i = 0; i + SZ_T <= SZ_S; i++) { // I generate all the substrings of S
        int CurrentDifference = 0;           // and check their difference with T
        for (int j = 0; j < SZ_T; j++) {     // and store the substring with minimum difference
            if (S[i + j] != T[j])
                CurrentDifference++;
        }
        if (CurrentDifference < MinDifference) {
            ans = S.substr(i, SZ_T);
            MinDifference = CurrentDifference;
        }
    }
    cout << ans << endl;
}
But my approach only works when S and T are short. The problem is that S and T can have lengths as large as 2 * 10^5. How can I approach this?

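Let's maximize the number of characters that match. We can solve the problem for each character of the alphabet separately and then sum up the per-character results for each substring position. To solve the problem for a particular character, represent S and T as sequences of 0s and 1s (1 where that character occurs, 0 elsewhere), reverse T, and multiply the two sequences as polynomials using the FFT https://en.wikipedia.org/wiki/Fast_Fourier_transform. The coefficients of the product give, for each alignment of T against S, the number of positions where both strings have that character.
The complexity is O(|A| * N log N), where |A| is the size of the alphabet (26 for uppercase letters).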
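As a rough sketch of the idea, assuming the alphabet is 'A'..'Z', that T is non-empty, and that S is at least as long as T (the names fft and bestWindowStart are made up for this example):

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <string>
#include <vector>

using cd = std::complex<double>;

// In-place iterative FFT; a.size() must be a power of two.
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = 2 * pi / len * (invert ? -1 : 1);
        const cd wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1);
            for (std::size_t j = 0; j < len / 2; ++j) {
                const cd u = a[i + j], v = a[i + j + len / 2] * w;
                a[i + j] = u + v;
                a[i + j + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
    if (invert)
        for (cd& x : a) x /= static_cast<double>(n);
}

// Start index in S of the first window of length |T| with the most matches
// (equivalently, the fewest mismatches). Assumes 1 <= |T| <= |S|, letters 'A'..'Z'.
std::size_t bestWindowStart(const std::string& S, const std::string& T) {
    const std::size_t n = S.size(), m = T.size();
    std::size_t size = 1;
    while (size < n + m) size <<= 1;
    std::vector<long long> matches(n - m + 1, 0);
    for (char c = 'A'; c <= 'Z'; ++c) {
        if (T.find(c) == std::string::npos) continue;  // contributes no matches
        std::vector<cd> a(size), b(size);
        for (std::size_t i = 0; i < n; ++i) a[i] = (S[i] == c) ? 1.0 : 0.0;
        for (std::size_t j = 0; j < m; ++j) b[m - 1 - j] = (T[j] == c) ? 1.0 : 0.0;  // T reversed
        fft(a, false);
        fft(b, false);
        for (std::size_t i = 0; i < size; ++i) a[i] *= b[i];
        fft(a, true);
        // Coefficient i + m - 1 counts positions where the window at i and T both hold c.
        for (std::size_t i = 0; i + m <= n; ++i)
            matches[i] += std::llround(a[i + m - 1].real());
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < matches.size(); ++i)
        if (matches[i] > matches[best]) best = i;
    return best;
}
```

The answer to the original question would then be S.substr(bestWindowStart(S, T), T.size()). Double precision is comfortable here because every coefficient is an integer no larger than the length of T.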

In c++, how can I grab the next few characters from a string?

My goal is to grab a specific part of a very large number, concatenate that part with another number, and then continue. Since integers only go so high, I have the number as a string. I do NOT know what this number could be, so I can't input it myself. I can use substr for the first part, but I am stuck shortly after.
An example:
"435509590420924949"
I want to take the first 5 characters out, convert them to an integer, do my own calculation on them, then concatenate the result with the rest of the string. So I will take 43550 out, apply a formula to get 49, then prepend 49 to the next 5 characters of the original string, "95904", so the new answer will be "4995904".
This is the code I wrote for the first part:
string temp;
int number;
temp = data.substr(0, 5);
number = atoi(temp.c_str());
This grabs the first five characters of the string and converts them to an integer that I can calculate with, but I don't know how to grab the next 5 characters of the long string.

You can get the length of the string, so something like:
std::size_t startIndex = 0;
std::size_t blockLength = 5;
std::size_t length = data.length();
while(startIndex < length)
{
std::string temp = data.substr(startIndex, blockLength);
// do something with temp
startIndex += blockLength;
// Note: substr clamps the count to the end of the string, so the last
// block is simply shorter when fewer than blockLength characters remain.
}

You can use loops. For example:
std::size_t subStrSize = 5;
for (std::size_t k = 0; k < data.size(); k+=subStrSize) {
std::size_t h = std::min(k + subStrSize - 1, data.size() - 1);
int number = 0;
for (std::size_t l = k; l <= h; ++l)
number = number * 10 + data[l] - '0';
//-- Some work with number --
}
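Either loop can be wrapped into a small helper. A sketch, where splitIntoBlocks is a hypothetical name introduced here:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits data into consecutive blocks of blockLength characters.
// The last block may be shorter; blockLength == 0 yields no blocks.
std::vector<std::string> splitIntoBlocks(const std::string& data, std::size_t blockLength) {
    std::vector<std::string> blocks;
    if (blockLength == 0) return blocks;
    for (std::size_t start = 0; start < data.size(); start += blockLength)
        blocks.push_back(data.substr(start, blockLength));  // substr stops at the end of data
    return blocks;
}
```

Each block can then be converted with std::stoi (or atoi, as in the question) before applying the formula, and the result concatenated with the next block as a string.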

How to replace certain items in a char array with an integer in C++?

Below is an example code that is not working the way I want.
#include <iostream>
using namespace std;
int main()
{
char testArray[] = "1 test";
int numReplace = 2;
testArray[0] = (int)numReplace;
cout<< testArray<<endl; //output is "? test" I wanted it 2, not a '?' there
//I was trying different things and hoping (int) helped
testArray[0] = '2';
cout<<testArray<<endl;//"2 test" which is what I want, but it was hardcoded in
//Is there a way to do it based on a variable?
return 0;
}
In a string with characters and integers, how do you go about replacing numbers? And when implementing this, is it different between doing it in C and C++?

If numReplace is in the range [0,9], you can do:
testArray[0] = numReplace + '0';
If numReplace is outside [0,9], you need to
a) convert numReplace into its string equivalent
b) code a function to replace a part of the string with the string obtained in (a)
Ref: "Best way to replace a part of string by another in c" and other relevant posts on SO
Also, since this is C++ code, you might consider using std::string, where replacement, number-to-string conversion, etc. are much simpler.
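For the std::string route, a minimal sketch could look like this (replaceWithNumber is a name invented for the example; std::to_string and std::string::replace are standard):

```cpp
#include <cstddef>
#include <string>

// Returns a copy of text in which the count characters starting at pos
// are replaced by the decimal representation of value.
std::string replaceWithNumber(std::string text, std::size_t pos, std::size_t count, int value) {
    text.replace(pos, count, std::to_string(value));
    return text;
}
```

With this, replaceWithNumber("1 test", 0, 1, 2) gives "2 test", and a multi-digit value such as 42 gives "42 test" because the string grows as needed.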

You should look over the ASCII table over here: http://www.asciitable.com/
It's very handy: always look at the Decimal column for the ASCII value of the character you're using.
In the line testArray[0] = (int)numReplace; you've actually put into the first spot the character whose decimal ASCII value is 2, which is a non-printable control character. numReplace + '0' would do the trick :)
As for the C/C++ question, it is the same in both. As for strings that mix characters and integers...
You should look for where your number starts and ends.
You should write something that looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char str[] = "asd 12983 asd"; /* the number inside will be incremented by 1 */
    size_t len = strlen(str);
    size_t from = 0, to, numberLen;
    char *nstr;
    int num;
    /* find the first digit */
    while (from < len && (str[from] < '0' || str[from] > '9'))
        from++;
    if (from == len)
        return 0; /* no number in the string */
    /* find the last digit of that run */
    to = from;
    while (to + 1 < len && str[to + 1] >= '0' && str[to + 1] <= '9')
        to++;
    numberLen = to - from + 1;
    nstr = (char *)malloc(numberLen + 1); /* +1 for the terminating '\0' */
    if (nstr == NULL)
        return 1;
    memcpy(nstr, str + from, numberLen);
    nstr[numberLen] = '\0';
    /* nstr now contains the number */
    num = atoi(nstr);
    num++; /* we want number+1 back in the string */
    /* itoa is non-standard, so use snprintf; the zero-padded width keeps the
       digit count, and a result needing an extra digit (999 + 1) is truncated */
    snprintf(nstr, numberLen + 1, "%0*d", (int)numberLen, num);
    memcpy(str + from, nstr, numberLen);
    free(nstr);
    /* now the string contains "asd 12984 asd" */
    printf("%s\n", str);
    return 0;
}
By the way, the most efficient way would probably be to find the last digit and add 1 to its character value, since the digits follow each other in ASCII ('0' = 48, '1' = 49, and so on). That shortcut only works directly when the last digit is not '9'; otherwise you have to carry into the digit before it. I just showed you how to treat them as numbers and work with them as integers. Hope it helped :)
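The last-digit shortcut, extended with a carry, might be sketched as follows in C++. The name incrementDigits is hypothetical, and the sketch assumes that s[from..to] contains only decimal digits and that from <= to < s.size().

```cpp
#include <cstddef>
#include <string>

// Adds 1 to the decimal number stored in s[from..to] (inclusive), in place.
// Returns true if the carry ran off the leftmost digit (e.g. "999" becomes "000").
bool incrementDigits(std::string& s, std::size_t from, std::size_t to) {
    for (std::size_t i = to + 1; i-- > from;) {  // walk from the last digit leftwards
        if (s[i] != '9') {
            ++s[i];       // digits are consecutive in ASCII, so this is the next digit
            return false;
        }
        s[i] = '0';       // 9 + 1 rolls over; keep carrying
    }
    return true;
}
```

Because the number never changes length here, no allocation or conversion is needed; the caller decides what to do when the function reports an overflow.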

Project Euler #8 - C++ code failed to work

I know there are multiple topics regarding Project Euler #8. But I am using a different approach, with no STL.
#include <iostream>
using namespace std;
int main(){
char str[] = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450";
int size = strlen(str);
int number = 1;
int max = 0;
int product = 0;
int lowerBound = 0;
int upperBound = 4;
for (int i = 0; i <= size/5; i++)
{
for (int j = lowerBound; j <= upperBound; j++)
{
number = number * str[j];
}
product = number;
number = 1;
lowerBound += 5;
upperBound += 5;
if (product > max)
{
max = product;
}
}
cout << "the largest product: " << max << endl;
return 0;
}
The answer it prints is 550386080, which is way too big and incorrect.
Please tell me what's wrong with my code. No advanced pointers or template techniques, just control flow statements and some basic stuff.

Part of your problem is the expression
number = number * str[j];
The str[j] is an ASCII character and you are incorrectly assuming it's a numeric value in the range 0..9. A cheap way to convert a single numeric character to a number would be to say
number = number * (str[j] - '0');
That gets you closer to the correct answer but there is another problem. You are testing each index range like [0..4], [5..9], [10..14], [15..19], etc. You should instead be testing indices [0..4], [1..5], [2..6], [3..7], etc. I'll leave that for you to correct.
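Putting both fixes together, one possible shape for the corrected loop is sketched below. The function name largestAdjacentProduct and the use of std::string are choices made for this illustration, not part of the original code; the same loops work on a char array.

```cpp
#include <cstddef>
#include <string>

// Largest product of `window` adjacent digits in `digits`.
// Returns 0 when window is 0 or the string is shorter than the window.
long long largestAdjacentProduct(const std::string& digits, std::size_t window) {
    long long best = 0;
    if (window == 0 || digits.size() < window) return best;
    // Slide the window one position at a time: [0..4], [1..5], [2..6], ...
    for (std::size_t start = 0; start + window <= digits.size(); ++start) {
        long long product = 1;
        for (std::size_t j = start; j < start + window; ++j)
            product *= digits[j] - '0';  // convert the ASCII character to its digit value
        if (product > best) best = product;
    }
    return best;
}
```

Calling it with the 1000-digit string and a window of 5 replaces the nested loops in main; long long leaves room for larger windows as well.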

How many palindromes can be formed by selections of characters from a string?

I'm posting this on behalf of a friend since I believe this is pretty interesting:
Take the string "abb". By leaving out any number of letters less than the length of the string we end up with 7 strings.
a b b ab ab bb abb
Out of these 4 are palindromes.
Similarly for the string
"hihellolookhavealookatthispalindromexxqwertyuiopasdfghjklzxcvbnmmnbvcxzlkjhgfdsapoiuytrewqxxsoundsfamiliardoesit"
(a length 112 string) 2^112 - 1 strings can be formed.
Out of these how many are palindromes??
Below there is his implementation (in C++, C is fine too though). It's pretty slow with very long words; he wants to know what's the fastest algorithm possible for this (and I'm curious too :D).
#include <iostream>
#include <cstring>
using namespace std;
void find_palindrome(const char* str, const char* max, long& count)
{
for(const char* begin = str; begin < max; begin++) {
count++;
const char* end = strchr(begin + 1, *begin);
while(end != NULL) {
count++;
find_palindrome(begin + 1, end, count);
end = strchr(end + 1, *begin);
}
}
}
int main(int argc, char *argv[])
{
const char* s = "hihellolookhavealookatthis";
long count = 0;
find_palindrome(s, strlen(s) + s, count);
cout << count << endl;
}

First of all, your friend's solution seems to have a bug since strchr can search past max. Even if you fix this, the solution is exponential in time.
For a faster solution, you can use dynamic programming to solve this in O(n^3) time. This will require O(n^2) additional memory. Note that for long strings, even 64-bit ints as I have used here will not be enough to hold the solution.
#include <string.h>
#define MAX_SIZE 1000 // the input string must be shorter than MAX_SIZE characters
long long numFound[MAX_SIZE][MAX_SIZE]; //intermediate results, indexed by [startPosition][endPosition]
long long countPalindromes(const char *str) {
int len = strlen(str);
for (int startPos=0; startPos<=len; startPos++)
for (int endPos=0; endPos<=len; endPos++)
numFound[startPos][endPos] = 0;
for (int spanSize=1; spanSize<=len; spanSize++) {
for (int startPos=0; startPos<=len-spanSize; startPos++) {
int endPos = startPos + spanSize;
long long count = numFound[startPos+1][endPos]; //if str[startPos] is not in the palindrome, this will be the count
char ch = str[startPos];
//if str[startPos] is in the palindrome, choose a matching character for the palindrome end
for (int searchPos=startPos; searchPos<endPos; searchPos++) {
if (str[searchPos] == ch)
count += 1 + numFound[startPos+1][searchPos];
}
numFound[startPos][endPos] = count;
}
}
return numFound[0][len];
}
Explanation:
The array numFound[startPos][endPos] will hold the number of palindromes contained in the substring from index startPos up to, but not including, endPos.
We go over all pairs of indexes (startPos, endPos), starting from short spans and moving to longer ones. For each such pair, there are two options:
The character at str[startPos] is not in the palindrome. In that case, there are numFound[startPos+1][endPos] possible palindromes - a number that we have calculated already.
The character at str[startPos] is in the palindrome (at its beginning). We scan through the string to find a matching character to put at the end of the palindrome. For each such character, we use the already-calculated results in numFound to find the number of possibilities for the inner palindrome.
EDIT:
Clarification: when I say "number of palindromes contained in a string", this includes non-contiguous substrings. For example, the palindrome "aba" is contained in "abca".
It's possible to reduce memory usage to O(n) by taking advantage of the fact that calculation of numFound[startPos][x] only requires knowledge of numFound[startPos+1][y] for all y. I won't do this here since it complicates the code a bit.
Pregenerating lists of indices containing each letter can make the inner loop faster, but it will still be O(n^3) overall.
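For illustration, here is one way the O(n) memory variant could look. This is a sketch rather than the answerer's code: it keeps only the rows for startPos and startPos+1, and it accumulates the inner search loop as a running sum while endPos grows, which, as far as I can tell, also brings the time down to O(n^2). The name countPalindromicSubsequences is invented here, and the count wraps around on overflow for long strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Counts palindromes formed by selecting characters of s (by position, order kept).
unsigned long long countPalindromicSubsequences(const std::string& s) {
    const std::size_t n = s.size();
    // prev[e] plays the role of numFound[start+1][e], cur[e] of numFound[start][e].
    std::vector<unsigned long long> prev(n + 1, 0), cur(n + 1, 0);
    for (std::size_t start = n; start-- > 0;) {
        std::fill(cur.begin(), cur.end(), 0ULL);
        // Palindromes that begin at s[start] and end before position `end`.
        unsigned long long withStart = 0;
        for (std::size_t end = start + 1; end <= n; ++end) {
            if (s[end - 1] == s[start])
                withStart += 1 + prev[end - 1];  // the pair itself plus every inner palindrome
            cur[end] = prev[end] + withStart;    // without s[start] + with s[start]
        }
        std::swap(prev, cur);
    }
    return prev[n];
}
```

For "abb" this returns 4, matching the example in the question.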

I have a way to do it in O(N^2) time and O(1) space, although I think there must be better ways.
The basic idea is that a long palindrome must contain smaller palindromes, so we only search for the minimal matches, which come in two kinds: "aa" and "aba". If we find either, we expand outward to see whether it is part of a longer palindrome.
int count_palindromic_slices(const string &S) {
int count = 0;
for (int position=0; position<S.length(); position++) {
int offset = 0;
// Check the "aa" situation
while((position-offset>=0) && (position+offset+1)<S.length() && (S.at(position-offset))==(S.at(position+offset+1))) {
count ++;
offset ++;
}
offset = 1; // reset it for the odd length checking
// Check the string for "aba" situation
while((position-offset>=0) && position+offset<S.length() && (S.at(position-offset))==(S.at(position+offset))) {
count ++;
offset ++;
}
}
return count;
}
June 14th, 2012
After some investigation, I believe this is the best way to do it. It is faster than the accepted answer.

Is there any mileage in making an initial traversal and building an index of all occurrences of each character?
h = { 0, 2, 27}
i = { 1, 30 }
etc.
Now working from the left with h, the only possible palindromes are at 3 and 17. If char[0 + 1] == char[3 - 1], etc., we have got a palindrome. If char[0 + 1] != char[27 - 1], no further analysis of char[0] is needed.
Move on to char[1]: we only need to examine char[30 - 1] and inwards.
Then we can probably get smart: when you've identified a palindrome running from position x to y, all inner subsets are known palindromes, so we have already dealt with those items and can eliminate them from later examination.

My solution using O(n) memory and O(n^2) time, where n is the string length:
palindrome.c:
#include <stdio.h>
#include <string.h>
typedef unsigned long long ull;
ull countPalindromesHelper (const char* str, const size_t len, const size_t begin, const size_t end, const ull count) {
if (begin <= 0 || end >= len) {
return count;
}
const char pred = str [begin - 1];
const char succ = str [end];
if (pred == succ) {
const ull newCount = count == 0 ? 1 : count * 2;
return countPalindromesHelper (str, len, begin - 1, end + 1, newCount);
}
return count;
}
ull countPalindromes (const char* str) {
ull count = 0;
size_t len = strlen (str);
size_t i;
for (i = 0; i < len; ++i) {
count += countPalindromesHelper (str, len, i, i, 0); // even length palindromes
count += countPalindromesHelper (str, len, i, i + 1, 1); // odd length palindromes
}
return count;
}
int main (int argc, char* argv[]) {
if (argc < 2) {
return 0;
}
const char* str = argv [1];
ull count = countPalindromes (str);
printf ("%llu\n", count);
return 0;
}
Usage:
$ gcc palindrome.c -o palindrome
$ ./palindrome myteststring
EDIT: I misread the problem as the contiguous substring version of the problem. Now given that one wants to find the palindrome count for the non-contiguous version, I strongly suspect that one could just use a math equation to solve it given the number of distinct characters and their respective character counts.

Hmmmmm, I think I would count up like this:
Each character is a palindrome on its own (minus repeated characters).
Each pair of the same character.
Each pair of the same character, with all palindromes sandwiched in the middle that can be made from the string between repeats.
Apply recursively.
Which seems to be what you're doing, although I'm not sure you don't double-count the edge cases with repeated characters.
So, basically, I can't think of a better way.
EDIT:
Thinking some more,
It can be improved with caching, because you sometimes count the palindromes in the same sub-string more than once. So, I suppose this demonstrates that there is definitely a better way.

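Here are two programs: a C++ one that checks whether a string is a palindrome, and a Java one that finds all the palindromic substrings (of length 2 or more) of a string.

#include <iostream>
#include <string>
using namespace std;
int main()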
{
string palindrome;
cout << "Enter a String to check if it is a Palindrome";
cin >> palindrome;
int length = palindrome.length();
cout << "the length of the string is " << length << endl;
int end = length - 1;
int start = 0;
int check=1;
while (end >= start) {
if (palindrome[start] != palindrome[end]) {
cout << "The string is not a palindrome";
check=0;
break;
}
else
{
start++;
end--;
}
}
if(check)
cout << "The string is a Palindrome" << endl;
}

public String[] findPalindromes(String source) {
Set<String> palindromes = new HashSet<String>();
int count = 0;
for(int i=0; i<source.length()-1; i++) {
for(int j= i+1; j<source.length(); j++) {
String palindromeCandidate = new String(source.substring(i, j+1));
if(isPalindrome(palindromeCandidate)) {
palindromes.add(palindromeCandidate);
}
}
}
return palindromes.toArray(new String[palindromes.size()]);
}
private boolean isPalindrome(String source) {
int i =0;
int k = source.length()-1;
for(i=0; i<source.length()/2; i++) {
if(source.charAt(i) != source.charAt(k)) {
return false;
}
k--;
}
return true;
}

I am not sure, but you might try with Fourier. This problem reminded me of this: O(nlogn) Algorithm - Find three evenly spaced ones within binary string
Just my 2 cents
