Find the character at the `k`th location in the infinite string - c++

I am trying to solve a problem:
Given two strings, s and t, we can form a string x of infinite length, as:
a. Append s to x 1 time;
b. Append t to x 2 times;
c. Append s to x 3 times;
d. Append t to x 4 times;
and so on...
Given k, find the kth character (1 indexed) in the resultant infinite string x.
For example, if s = a, t = bc and k = 4, the output is b (x = abcbc...). Both s and t contain between 1 and 100 characters, and 1 <= k <= 10^16.
The brute force way of actually constructing string x is trivial but too slow. How do I optimize it further?
In C++, the brute force solution would look like this:
#include <iostream>
#include <string>
using namespace std;
int main() {
    int repeat = 1, k = 4;
    string s = "a", t = "bc", x;
    bool appendS = true;
    // Build x until it has at least k characters.
    while (x.size() < (size_t)k) {
        for (int i = 1; i <= repeat; i++) {
            if (appendS) x += s;
            else x += t;
        }
        appendS = !appendS;
        repeat++;
    }
    cout << x[k - 1];
    return 0;
}
But how do I optimize it, given huge k?

The string looks like
sttsssttttsssssttttttssssssstttttttt...
Group the string into substrings like
(stts)(ssttttss)(sssttttttsss)(ssssttttttttssss)(sssss...
Let
len(s) = a
len(t) = b
len(s+t) = c
Group 1: stts -> length = 2*c.
Group 2: ssttttss -> length = 4*c.
Group 3: sssttttttsss -> length = 6*c.
Continuing the pattern, it is easy to see that the length of ith group will be 2*i*c.
Let the kth character be in group n.
Total length of first n groups =
2*c + 4*c + 6*c .... + 2*n*c = (2*c)*(1+2+3...+n) = c*n*(n+1)
Since the total length of the first n groups has to be greater than or equal to k,
c*n*(n+1) >= k
n*(n+1) >= k/c
Finding the smallest n that satisfies this inequality is straightforward (for example, by solving the quadratic or by binary search). Now, the nth group looks something like
ss...(n times) + tttt...(2*n times) + ss...(n times)
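Now you just need to find where position k falls in this block. The first n-1 groups have total length c*(n-1)*n, so the 1-based offset inside group n is r = k - c*(n-1)*n. If r <= n*a, it lies in the leading s's; otherwise, if r <= n*a + 2*n*b, it lies in the t's; otherwise it lies in the trailing s's. Subtract the lengths of the blocks before it and take (offset-1) modulo that string's length to get the character, which is a simple task.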
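A minimal C++ sketch of the grouping approach, assuming the constraints from the question (|s|, |t| <= 100 and k <= 10^16, so every intermediate product fits in a signed 64-bit integer). The names kthCharByGroups and kthCharBrute are my own; the second is just the question's brute force wrapped in a function, useful for cross-checking small k. Here n is found with a doubling search followed by a binary search.

```cpp
#include <string>

// Group g (1-based) is s^g t^(2g) s^g, of length 2*g*c with c = |s| + |t|,
// so the first n groups have total length c*n*(n+1).
char kthCharByGroups(const std::string& s, const std::string& t, long long k) {
    const long long a = static_cast<long long>(s.size());
    const long long b = static_cast<long long>(t.size());
    const long long c = a + b;
    // Smallest n with c*n*(n+1) >= k: doubling search for an upper bound, then binary search.
    long long lo = 1, hi = 1;
    while (c * hi * (hi + 1) < k) hi *= 2;
    while (lo < hi) {
        long long mid = lo + (hi - lo) / 2;
        if (c * mid * (mid + 1) >= k) hi = mid; else lo = mid + 1;
    }
    const long long n = lo;
    long long r = k - c * (n - 1) * n;          // 1-based offset inside group n
    if (r <= n * a) return s[(r - 1) % a];      // leading block: n copies of s
    r -= n * a;
    if (r <= 2 * n * b) return t[(r - 1) % b];  // middle block: 2n copies of t
    r -= 2 * n * b;
    return s[(r - 1) % a];                      // trailing block: n copies of s
}

// The question's brute force, for cross-checking small k only.
char kthCharBrute(const std::string& s, const std::string& t, long long k) {
    std::string x;
    bool appendS = true;
    for (long long repeat = 1; static_cast<long long>(x.size()) < k; ++repeat, appendS = !appendS)
        for (long long i = 0; i < repeat; ++i) x += appendS ? s : t;
    return x[k - 1];
}
```

For the example in the question, kthCharByGroups("a", "bc", 4) finds n = 1, r = 4, skips the single leading "a" and returns t[0], i.e. 'b'.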

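The search position is located in the nth append, where n is the last integer for which the following inequality holds, with l1 and l2 the lengths of s and t (odd appends use s, even appends use t):
$$\sum_{i=1}^{n-1} i \cdot l_{((i+1) \bmod 2)+1} < k$$
Hence the search position is in the nth append, which consists of copies of string number mod(n+1,2)+1 (s for odd n, t for even n), at offset $\Delta k = k - \sum_{i=1}^{n-1} i \cdot l_{((i+1) \bmod 2)+1}$ within that append. Since the append repeats a single string of length l, the answer is character ((Delta k - 1) mod l) + 1 of that string.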
Explanation:
The sum adds up the lengths of all appends so far. Since you know every appended string, you know the whole sum, and because both sides of the inequality are integer expressions, a short search over n finds that integer, as with any simple integer equation.
Once you have the append number n and the sum, the offset Delta k follows immediately.
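Moreover, note that Delta k has a closed form. You do not need a loop to compute it, only to evaluate a formula: among the first m appends, the odd-numbered ones contribute l1 times the sum of the first ceil(m/2) odd integers (= ceil(m/2)^2), and the even-numbered ones contribute l2 times the sum of the first floor(m/2) even integers (= floor(m/2)*(floor(m/2)+1)). Take m = n-1 for Delta k.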
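The closed form can be sketched in C++ as follows. The names are my own, and I use a binary search over n instead of a linear one, since n can be on the order of 10^8 when k = 10^16 and both strings are one character long. prefixLength(m) is the length of the first m appends, l1*ceil(m/2)^2 + l2*floor(m/2)*(floor(m/2)+1).

```cpp
#include <string>

// Length of the first m appends: odd appends i contribute i*l1, even appends i*l2.
// Odd i <= m sum to ceil(m/2)^2; even i <= m sum to floor(m/2)*(floor(m/2)+1).
long long prefixLength(long long m, long long l1, long long l2) {
    const long long odd = (m + 1) / 2, even = m / 2;
    return l1 * odd * odd + l2 * even * (even + 1);
}

char kthCharByAppends(const std::string& s, const std::string& t, long long k) {
    const long long l1 = static_cast<long long>(s.size());
    const long long l2 = static_cast<long long>(t.size());
    // Smallest n with prefixLength(n) >= k, i.e. the append that contains position k.
    long long lo = 1, hi = 1;
    while (prefixLength(hi, l1, l2) < k) hi *= 2;
    while (lo < hi) {
        long long mid = lo + (hi - lo) / 2;
        if (prefixLength(mid, l1, l2) >= k) hi = mid; else lo = mid + 1;
    }
    const long long n = lo;
    const long long deltaK = k - prefixLength(n - 1, l1, l2);  // 1-based offset in append n
    const std::string& str = (n % 2 == 1) ? s : t;              // odd appends use s, even use t
    return str[(deltaK - 1) % static_cast<long long>(str.size())];
}
```

With s = "a", t = "bc", k = 4 this gives n = 2 and Delta k = 3, so it returns t[0] = 'b', matching the example.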

Related

Count how many substrings exist in a Fibonacci string

The problem is this:
You are given an integer N and a substring SUB.
The Fibonacci String follows these rules:
F[0] = 'A'
F[1] = 'B'
F[k] = F[k - 2] + F[k - 1]
(Meaning F[2] = 'AB', F[3] = 'BAB', F[4] = 'ABBAB', ...)
Task: Count how many times the substring SUB appears in F[N].
Sample cases:
Input: 4 AB    Output: 2
Input: 6 BAB   Output: 4
(N <= 5 * 10^3, 1 <= SUB.length() <= 50)
I have an overall understanding of the problem and want to find a more efficient way to solve it.
My approach follows the formula F[k] = F[k - 2] + F[k - 1] to build the string, then loops over every index i up to F[k].length - 1. At each step I extract the substring of F[k] starting at i with the same length as SUB (call it F_sub) and increase count if F_sub equals SUB. (Yes, this approach is not fast enough for the big tests.)
I am also wondering whether dynamic programming is suited to this problem.
Starting with the first 2 strings that are at least as long as SUB, you should switch the representation of the strings F[n]. Instead of remembering the complete string, you only need to remember 3 numbers:
occurrences: the number of times SUB occurs within the string
prefix: The length of the longest prefix of the string that is a proper suffix of SUB
suffix: The length of the longest suffix of the string that is a proper prefix of SUB
Given o, p, and s for F[k] and F[k+1], you can calculate them for the concatenation F[k+2]:
F[k+2].p = F[k].p
F[k+2].s = F[k+1].s
F[k+2].o = F[k].o + F[k+1].o + JOIN(F[k].s,F[k+1].p)
The function JOIN(a,b) calculates the number of occurrences of SUB within the first a characters of SUB joined to the last b characters of SUB. There are only |SUB|^2 values. In fact, since all the values for p and s are copied from the first 2 strings, only 4 values of this function will ever be used. You can calculate them in advance.
F[N].o is the answer you are looking for.
A straightforward implementation of this takes O(N + |SUB|^2), assuming constant-time arithmetic. Since |SUB| <= 50, this is quite efficient.
If the constraint on N were much larger, there's an optimization using matrix exponentiation that could bring the complexity down to O(log N + |SUB|^2), but that's not necessary under the given constraints.
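A minimal C++ sketch of the (o, p, s) recurrence described above. The names countNaive, Summary, summarize and countInFib are my own, and it assumes 64-bit unsigned counters: because the counts grow like Fibonacci numbers, std::uint64_t stays exact only up to roughly N = 90, so for N up to 5000 you would need a big-integer type (or whatever modulus your judge specifies). JOIN is recomputed at each step here rather than precomputed, which costs O(N * |SUB|^2) but is still cheap for |SUB| <= 50.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Count (possibly overlapping) occurrences of pat in text by direct comparison.
std::uint64_t countNaive(const std::string& text, const std::string& pat) {
    std::uint64_t count = 0;
    for (std::size_t i = 0; i + pat.size() <= text.size(); ++i)
        if (text.compare(i, pat.size(), pat) == 0) ++count;
    return count;
}

// The three numbers kept per string instead of the string itself.
struct Summary {
    std::uint64_t o; // occurrences of SUB inside the string
    std::size_t p;   // longest prefix of the string that is a proper suffix of SUB
    std::size_t s;   // longest suffix of the string that is a proper prefix of SUB
};

Summary summarize(const std::string& x, const std::string& sub) {
    const std::size_t m = sub.size();
    Summary r{countNaive(x, sub), 0, 0};
    for (std::size_t j = 1; j < m && j <= x.size(); ++j) {
        if (x.compare(0, j, sub, m - j, j) == 0) r.p = j;        // first j chars of x == last j of SUB
        if (x.compare(x.size() - j, j, sub, 0, j) == 0) r.s = j; // last j chars of x == first j of SUB
    }
    return r;
}

// Occurrences of sub in F[n], where F[0] = "A", F[1] = "B", F[k] = F[k-2] + F[k-1].
std::uint64_t countInFib(int n, const std::string& sub) {
    std::string prev = "A", cur = "B";  // F[k-1] and F[k]
    if (n == 0) return countNaive(prev, sub);
    int k = 1;
    // Build explicit strings until both F[k-1] and F[k] are at least |SUB| long.
    while (k < n && (prev.size() < sub.size() || cur.size() < sub.size())) {
        std::string next = prev + cur;
        prev = std::move(cur);
        cur = std::move(next);
        ++k;
    }
    if (k == n) return countNaive(cur, sub);  // N small enough to count directly
    const std::size_t m = sub.size();
    Summary a = summarize(prev, sub), b = summarize(cur, sub);
    for (; k < n; ++k) {
        // JOIN(a.s, b.p): occurrences crossing the boundary of F[k-1] + F[k].
        const std::string joined = sub.substr(0, a.s) + sub.substr(m - b.p);
        const Summary c{a.o + b.o + countNaive(joined, sub), a.p, b.s};
        a = b;
        b = c;
    }
    return b.o;
}
```

On the samples, countInFib(4, "AB") returns 2 and countInFib(6, "BAB") returns 4.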

Infinite array problem for q queries

Given an array "A" of integers, define a new array "B" as the concatenation of array "A" an infinite number of times.
For example, if the given array "A" is [1,2,3], then the infinite array "B" is [1,2,3,1,2,3,1,2,3, ...].
Now you are given q queries; each query consists of two integers "L" and "R" (1-based indexing).
Your task is to find the sum of the subarray from index "L" to "R" (both inclusive) in the infinite array "B" for each query.
Note:
The value of the sum can be very large, so return the answer modulo 10^9+7.
Input Format
The first line of input contains a single integer T, representing the number of test cases or queries to be run.
Then the test cases follow.
The first line of each test case contains a single integer N, denoting the size of the array "A".
The second line of each test case contains N single space-separated integers, the elements of the array "A".
The third line of each test case contains a single integer Q, denoting the number of queries.
Then each of the Q lines of each test case contains two single space-separated integers L and R, denoting the left and the right index of the infinite array "B" whose sum is to be returned.
I have come up with an approach that I don't think is optimal. I used a two-pointer approach:
#include <iostream>
#include <vector>
using namespace std;
// Naive approach: walk every index from L to R (1-based) and add the element.
// This is O(R - L + 1) per query, which is too slow for large ranges.
vector<int> sumInRanges(vector<int> &arr, int n, vector<vector<long long>> &queries, int q) {
    const long long MOD = 1000000007LL;
    vector<int> res;
    for (int i = 0; i < q; i++) {
        long long sum = 0;
        long long start = queries[i][0];
        long long end = queries[i][1];
        while (start <= end) {
            // Map the 1-based index in B to a 0-based index in A.
            sum = (sum + arr[(start - 1) % n]) % MOD;
            start++;
        }
        res.push_back((int)sum);
    }
    return res;
}
Hint: think about where you are doing the same work repeatedly.
Suppose A=[1,2,3], L=2, R=14
Your approach iterates over
2,3, 1,2,3, 1,2,3, 1,2,3, 1,2
See how many times you iterate over the whole array?
Instead, calculate sum(A) ONCE,
then calculate how many times it's used in the query (something like floor(R/len(A)) - ceil(L/len(A))),
multiply them,
and iterate ONCE to find the missing partial sums (on the left from L%len(A) to len(A), and on the right from the beginning to R%len(A)).
And don't forget about the modulo 1e9+7.
If len(A) and q get huge, save the partial sums from the beginning (prefix sums) and the partial sums to the end (suffix sums) once,
and then retrieve them in constant time for each query.
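A sketch of the prefix-sum idea in C++, using a slightly different decomposition from the hint: define sumTo(x) = B[1] + ... + B[x] = (x / n) * sum(A) + prefix[x mod n], and answer each query as sumTo(R) - sumTo(L - 1). The name sumInRangesFast is my own; it assumes 1 <= L <= R and a non-empty A, and normalises negative elements into [0, 10^9+7).

```cpp
#include <cstddef>
#include <vector>

// Answer each query in O(1) after an O(len(A)) prefix-sum pass.
std::vector<int> sumInRangesFast(const std::vector<int>& arr,
                                 const std::vector<std::vector<long long>>& queries) {
    const long long MOD = 1000000007LL;
    const long long n = static_cast<long long>(arr.size());
    // prefix[i] = (A[0] + ... + A[i-1]) mod MOD, kept in [0, MOD)
    std::vector<long long> prefix(static_cast<std::size_t>(n) + 1, 0);
    for (long long i = 0; i < n; ++i)
        prefix[i + 1] = ((prefix[i] + arr[i]) % MOD + MOD) % MOD;
    const long long total = prefix[n];
    // sumTo(x) = B[1] + ... + B[x] mod MOD: x / n full copies of A plus the first x % n elements.
    auto sumTo = [&](long long x) -> long long {
        const long long fullCopies = (x / n) % MOD;
        return (fullCopies * total % MOD + prefix[x % n]) % MOD;
    };
    std::vector<int> res;
    res.reserve(queries.size());
    for (const auto& q : queries) {
        const long long L = q[0], R = q[1];
        res.push_back(static_cast<int>((sumTo(R) - sumTo(L - 1) + MOD) % MOD));
    }
    return res;
}
```

For A = [1,2,3], L = 2, R = 14 this computes sumTo(14) - sumTo(1) = 27 - 1 = 26, the sum of the 13 elements listed in the hint.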

A problem of taking combination for set theory

Given an array A of size N, the value of a subset of A is defined as the product of all numbers in that subset. We have to return the product of the values of all possible non-empty subsets of A, modulo 10^9+7.
For example, for A = {3,5}:
Value{3} = 3,
Value{5} = 5,
Value{3,5} = 5*3 = 15,
answer = 3*5*15 % (10^9+7) = 225.
Can someone explain the mathematics behind the problem? I am thinking of using combinations to solve it efficiently.
I have tried brute force; it gives the correct answer but is far too slow.
My next approach uses combinations. I think that if we take all the subsets and multiply all the numbers in them, we get the correct answer, so I have to find out how many times each number appears in the calculation. In the example, 5 and 3 both appear 2 times. Looking closely, each number in A appears the same number of times.
You're heading in the right direction.
Let x be an element of the given array A. In our final answer, x appears p times, where p is the number of subsets of A that include x.
How to calculate p? Once we have decided that x is definitely in our subset, we have two choices for each of the remaining N-1 elements: include it in the set or not. So p = 2^(N-1).
So each element of A appears exactly 2^(N-1) times in the final product. All that remains is to calculate the answer: (a1 * a2 * ... * an)^p. Since the exponent is very large, you can use binary exponentiation for fast calculation.
As Matt Timmermans suggested in the comments, we can obtain the answer without actually calculating p = 2^(N-1): first calculate the product a1 * a2 * ... * an, then square this product n-1 times.
The corresponding code in C++:
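#include <vector>
using namespace std;
int func(vector<int> &a) {
    int n = a.size();
    const long long m = 1e9 + 7;
    if (n == 0) return 0;
    // A single element is its own only subset; normalise a negative value into [0, m).
    if (n == 1) return (int)((m + a[0] % m) % m);
    long long ans = 1;
    // first calculate ans = (a1*a2*...*an) % m
    for (int v : a) {
        long long x = v;  // widen first so negating INT_MIN cannot overflow
        // the sign does not matter since the product is squared at least once (n >= 2)
        if (x < 0) x = -x;
        x %= m;
        ans = ans * x % m;
    }
    // now calculate ans = [ ans^(2^(n-1)) ] % m
    // we do this by squaring ans n-1 times
    for (int i = 1; i < n; i++) {
        ans = ans * ans % m;
    }
    return (int)ans;
}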
Let
A = {a,b,c}
All possible subsets of A are {{},{a},{b},{c},{a,b},{b,c},{c,a},{a,b,c}}.
Here each element occurs in 4 of them.
So if A = {a,b,c,d}, each element will occur 2^3 times.
So if the size of A is n, each element will occur 2^(n-1) times.
So the final result will be a1^p * a2^p * a3^p * ... * an^p,
where p is 2^(n-1).
We need to compute x^(2^(n-1)) % mod.
By Euler's theorem, for x coprime to mod, we can write x^(2^(n-1)) % mod as x^(2^(n-1) % phi(mod)) % mod.
Since mod is prime, phi(mod) = mod-1.
So first find p = 2^(n-1) % (mod-1).
Then compute Ai^p % mod for each number and multiply it into the final result.
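A sketch of this approach in C++ (powMod and subsetProductFermat are my own names). One caveat the exponent reduction needs: x^e = x^(e mod (mod-1)) (mod mod) only holds when x is not a multiple of mod, so elements congruent to 0 are handled separately; they make the whole product 0.

```cpp
#include <cstdint>
#include <vector>

// Modular exponentiation by repeated squaring (binary exponentiation).
std::int64_t powMod(std::int64_t base, std::int64_t exp, std::int64_t mod) {
    std::int64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Product of the values of all non-empty subsets of a, modulo the prime MOD.
std::int64_t subsetProductFermat(const std::vector<int>& a) {
    const std::int64_t MOD = 1000000007;
    if (a.empty()) return 0;  // no non-empty subsets; same convention as func() above
    // p = 2^(n-1) reduced mod (MOD - 1), valid for elements coprime to MOD (Fermat).
    const std::int64_t p = powMod(2, static_cast<std::int64_t>(a.size()) - 1, MOD - 1);
    std::int64_t ans = 1;
    for (int x : a) {
        const std::int64_t r = ((x % MOD) + MOD) % MOD;  // normalise negatives into [0, MOD)
        if (r == 0) return 0;  // x appears in at least one subset, so the product is 0
        ans = ans * powMod(r, p, MOD) % MOD;
    }
    return ans;
}
```

For A = {3,5}, p = 2 and the result is 3^2 * 5^2 = 225, which matches 3*5*15.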
I read the previous answers and understood the process of forming the subsets, so here I try to put it as simply as possible, so that people can apply it to similar problems.
Let i be an element of array A. Following the approach given in the question, i appears p times in the final answer.
Now, how do we form the different sets? We take sets containing only one element, then sets of two elements, then sets of three, ..., up to the set of all n elements.
For each group size, say sets of 3 elements, we want to know how many of these sets contain i.
There are n elements, so the number of 3-element sets that always contain i is (n-1)C(3-1), because we choose the remaining 3-1 elements from the other n-1 elements.
Doing this for every group size gives p = sum of (n-1)C(x-1) for x going from 1 to n. Thus p = 2^(n-1), since the binomial coefficients (n-1)C(0) + ... + (n-1)C(n-1) add up to 2^(n-1).
Similarly, p is the same for every element i. Thus we get
final answer = A[0]^p * A[1]^p * ... * A[n-1]^p
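To check the counting argument (or any of the fast versions) on small inputs, a brute force that enumerates all 2^n - 1 non-empty subsets with a bitmask is enough. bruteSubsetProduct is a hypothetical helper name, and it is only practical for small n (say n <= 20).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Multiply together the values of all non-empty subsets of a, modulo mod.
std::int64_t bruteSubsetProduct(const std::vector<int>& a, std::int64_t mod) {
    const std::size_t n = a.size();
    std::int64_t ans = 1;
    for (std::uint64_t mask = 1; mask < (std::uint64_t{1} << n); ++mask) {
        std::int64_t value = 1;  // product of the elements selected by mask
        for (std::size_t i = 0; i < n; ++i)
            if ((mask >> i) & 1)
                value = value * (((a[i] % mod) + mod) % mod) % mod;
        ans = ans * value % mod;
    }
    return ans;
}
```

Each element is selected by exactly half of the 2^n masks, i.e. 2^(n-1) of them, which is the p derived above.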

How do I solve this making it more efficient?

So, I am trying to solve the following question: https://www.codechef.com/TSTAM15/problems/ACM14AM3
The Mars Orbiter Mission probe lifted-off from the First Launch Pad at Satish Dhawan Space Centre (Sriharikota Range SHAR), Andhra Pradesh, using a Polar Satellite Launch Vehicle (PSLV) rocket C25 at 09:08 UTC (14:38 IST) on 5 November 2013.
The secret behind this successful launch was the launch pad that ISRO used. An important part of the launch pad is the launch tower. It is the long vertical structure which supports the rocket.
ISRO now wants to build a better launch pad for their next mission. For this, ISRO has acquired a long steel bar, and the launch tower can be made by cutting a segment from the bar. As part of saving the cost, the bar they have acquired is not homogeneous.
The bar is made up of several blocks, where the ith block has durability S[i], which is a number between 0 and 9. A segment is defined as any contiguous group of one or more blocks.
If they cut out a segment of the bar from ith block to jth block (i<=j), then the durability of the resultant segment is given by (S[i]*10^(j-i) + S[i+1]*10^(j-i-1) + S[i+2]*10^(j-i-2) + … + S[j]*10^0) % M. In other words, if W(i,j) is the base-10 number formed by concatenating the digits S[i], S[i+1], S[i+2], …, S[j], then the durability of the segment (i,j) is W(i,j) % M.
For technical reasons that ISRO will not disclose, the durability of the segment used for building the launch tower should be exactly L.
Given S and M, find the number of ways ISRO can cut out a segment from the steel bar whose durability is L.
Input
The first line contains a string S. The ith character of this string represents the durability of ith segment. The next line contains a single integer Q, denoting the number of queries. Each of the next Q lines contain two space separated integers, denoting M and L.
Output
For each query, output the number of ways of cutting the bar on a separate line.
Constraints
1 ≤ |S| ≤ 2 * 10^4
Q ≤ 5
0 < M < 500
0 ≤ L < M
Example
Input:
23128765
3
7 2
9 3
15 5
Output:
9
4
5
Explanation
For M=9, L=3, the substrings whose remainder is 3 when divided by 9 are: 3, 31287, 12 and 876.
What I did initially was generate all possible substrings of the given string, convert each one to a number, and check whether its remainder when divided by M equals L, adding it to the answer if so. My code for this was:
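#include <iostream>
#include <string>
using namespace std;
int main() {
    string s;
    int q;
    cin >> s >> q;
    while (q--) {
        int m, l;
        cin >> m >> l;
        long long ans = 0;
        for (size_t i = 0; i < s.length(); i++) {
            for (size_t len = 1; i + len <= s.length(); len++) {
                string p = s.substr(i, len);  // substr takes (start, length)
                long long num = stoll(p);     // throws std::out_of_range once p is too long for long long
                if (num % m == l)
                    ans++;
            }
        }
        cout << ans << "\n";
    }
    return 0;
}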
But since the input length is up to 2 * 10^4, this obviously doesn't run in the required time. How can I make it more efficient?
One small piece of advice: store s.length() in a variable once, to avoid calling the function in every loop condition.
Ok, here goes, with a working program at the bottom
Major optimization #1
Do not (ever) work with strings when it comes to integer arithmetic. You're converting string => integer over and over and over again (this is an O(n^2) problem), which is painfully slow. Besides, it also misses the point.
Solution: first convert your array-of-characters (string) to array-of-numbers. Integer arithmetic is fast.
Major optimization #2
Use a smart conversion from "substring" to number. After transforming the characters to actual integers, they become the coefficients a_i of the polynomial sum(a_i * 10^i). To convert a substring of n segments into a number, it is enough to compute sum(a_i * 10^i) for 0 <= i < n.
And nicely enough, if the coefficients a_i are arranged the way they are in the problem's statement, you can use Horner's method (https://en.wikipedia.org/wiki/Horner%27s_method) to very quickly evaluate the numerical value of the substring.
In short: keep a running value of the current substring; growing it by one element is just value * 10 + new element.
Example: string "128472373".
First substring = "1", value = 1.
For the second substring we need to add the digit "2" as follows: value = value * 10 + "2", thus: value = 1 * 10 + 2 = 12.
For the 3rd substring we need to add the digit "8": value = value * 10 + "8", thus: value = 12 * 10 + 8 = 128.
Etcetera.
I had some issues with formatting the C++ code inline so I stuck it in IDEone: https://ideone.com/TbJiqK
The gist of the program:
In the main loop, iterate over all possible start points:
// For all start points in the segments array ...
int n = 0;
for (int* f = segments; f < segments + n_segments; f++)
    // add up the substrings that fulfil the question
    n += count_segments(f, segments + n_segments, m, l);
// Output the answer for this question
cout << n << endl;
Implementation of the count_segments() function:
// Find all substrings that % m == l
// Use Horner's method to evaluate sum(a_n*10^n) incrementally, where
// a_n are the segments' durabilities. Only the remainder modulo m is kept,
// so the running value cannot overflow however long the substring gets.
int count_segments(int* first, int* last, int m, int l) {
    int n = 0, number = 0;
    while (first < last) {
        number = (number * 10 + *first) % m; // Horner's method, reduced mod m
        if (number == l) {
            n++;
            // If you don't believe it, enable this line of output and
            // check the printed remainders against the matching substrings:
            //cout << "[" << m << ", " << l << "]: " << number << endl;
        }
        first++;
    }
    return n;
}
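Since the IDEone program is not reproduced here, this is a self-contained sketch of the same two optimizations as one function over the input string (countSegments is my own name, not the code behind the link). It is still O(|S|^2) per query; it only removes the string conversions and keeps the running remainder instead of the full number.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Count substrings of the digit string s whose value modulo m equals l.
long long countSegments(const std::string& s, int m, int l) {
    // Optimization #1: convert the characters to integers once.
    std::vector<int> digits;
    digits.reserve(s.size());
    for (char ch : s) digits.push_back(ch - '0');
    long long count = 0;
    for (std::size_t start = 0; start < digits.size(); ++start) {
        int rem = 0;  // value of digits[start..end] modulo m
        for (std::size_t end = start; end < digits.size(); ++end) {
            rem = (rem * 10 + digits[end]) % m;  // Optimization #2: one Horner step
            if (rem == l) ++count;
        }
    }
    return count;
}
```

On the sample string 23128765 this gives 9, 4 and 5 for the queries (7, 2), (9, 3) and (15, 5).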

Finding the maximum weight subsequence of an array of positive integers?

I'm trying to find the maximum weight subsequence of an array of positive integers - the catch is that no adjacent members are allowed in the final subsequence.
The exact same question was asked here, and a recursive solution was given by MarkusQ thus:
function Max_route(A)
  if A is empty
    0
  else if A's length = 1
    A[0]
  else
    maximum of
      A[0] + Max_route(A[2...])
      Max_route(A[1...])
He provides an explanation, but can anyone help me understand how he has expanded the function? Specifically, what does he mean by
f[] :- [],0
f [x] :- [x],x
f [a,b] :- if a > b then [a],a else [b],b
f [a,b|t] :-
    ft = f t
    fbt = f [b|t]
    if a + ft.sum > fbt.sum
        [a|ft.path],a+ft.sum
    else
        fbt
Why does he expand f[] to [],0? Also, how does his solution ensure that no adjacent members are chosen?
I have some C++ code that is based on this algorithm, which I can post if anyone wants to see it, but I just can't for the life of me fathom why it works.
==========For anyone who's interested - the C++ code ==============
I should add that the array of integers is to be treated as a circular list, so any sequence containing the first element cannot contain the last.
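#include <algorithm>
#include <cstring>
#include <vector>
using namespace std;
int a[55];          // input values (at most 55 elements)
int memo[55][55];
// Best sum of non-adjacent elements of a[s..e].
int solve(int s, int e)
{
    if (s > e) return 0;
    int &ret = memo[s][e];
    if (ret != -1)
    {
        return ret;
    }
    // Either skip a[s], or take it and skip its neighbour a[s+1].
    ret = max(solve(s + 1, e), solve(s + 2, e) + a[s]);
    return ret;
}
class Sequence
{
public:
    int maxSequence(vector<int> s)
    {
        memset(memo, -1, sizeof(memo));
        int n = s.size();
        if (n == 1) return s[0];  // a single element is its own best choice
        for (int i = 0; i < n; i++)
            a[i] = s[i];
        // Circular list: either drop the last element or drop the first.
        return max(solve(0, n - 2), solve(1, n - 1));
    }
};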
I don't really understand that pseudocode, so post the C++ code if this isn't helpful and I'll try to improve it.
I'm trying to find the maximum weight subsequence of an array of positive integers - the catch is that no adjacent members are allowed in the final subsequence.
Let a be your array of positive ints. Let f[i] = value of the maximum weight subsequence of the sequence a[0..i].
We have:
f[0] = a[0] because if there's only one element, we have to take it.
f[1] = max(a[0], a[1]) because you have the no adjacent elements restriction, so if you have two elements, you can only take one of them. It makes sense to take the largest one.
Now, in general, for i > 1:
f[i] = max(f[i - 2] + a[i], f[i - 1])
where
f[i - 2] + a[i]: add a[i] to the largest subsequence of a[0..i - 2]. We cannot use a[0..i - 1], because its best subsequence might end with a[i - 1], which is adjacent to a[i].
f[i - 1]: don't add the current element; instead take the maximum of a[0..i - 1], to which a[i] cannot be added.
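The recurrence above translates almost directly into an iterative loop. A minimal C++ sketch (the function names are mine) that keeps only the last two f values, assumes non-negative values as in the question, and also handles the circular variant from the question by solving the two linear ranges that exclude either the last or the first element:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// f[i] = max(f[i-2] + a[i], f[i-1]) over the half-open range a[lo..hi).
// Only f[i-1] and f[i-2] are needed, so two variables replace the array.
long long maxNonAdjacent(const std::vector<int>& a, std::size_t lo, std::size_t hi) {
    long long f2 = 0;  // f[i-2]; 0 before the range starts
    long long f1 = 0;  // f[i-1]
    for (std::size_t i = lo; i < hi; ++i) {
        const long long fi = std::max(f2 + a[i], f1);
        f2 = f1;
        f1 = fi;
    }
    return f1;
}

// Circular version: the first and last elements are adjacent, so solve
// a[0..n-2] and a[1..n-1] separately and keep the better result.
long long maxNonAdjacentCircular(const std::vector<int>& a) {
    if (a.empty()) return 0;
    if (a.size() == 1) return a[0];
    return std::max(maxNonAdjacent(a, 0, a.size() - 1),
                    maxNonAdjacent(a, 1, a.size()));
}
```

For [5, 1, 1, 5] the linear answer is 10 (both 5s), while the circular answer is 6, because the two 5s are then neighbours.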
I think this way is easier to understand than what you have there. The approaches are equivalent; I just find this one clearer for this particular problem, since recursion makes things harder here and the pseudocode could be clearer either way.
But what do you NOT understand? It seems quite clear to me:
we will build the maximal subsequence for every prefix of our given sequence
to calculate the maximal subsequence for the prefix of length i, we consider two possibilities: either the last element is in the maximal subsequence or it isn't (clearly there are no other possibilities).
if it is there, we consider the value of the last element, plus the value of maximal subsequence of the prefix two elements shorter (because in this case, we know the last element cannot be present in the maximal subsequence because of the adjacent elements rule)
if it isn't we take the value of maximal sum of prefix one element shorter (if the last element of the prefix is not in the maximal subsequence, the maximal subsequence has to be equal for this and the previous prefix)
we compare and take the maximum of the two
Plus: you need to remember actual subsequences; you need to avoid superfluous function invocations, hence the memoization.
Why does he expand f[] to [],0?
Because the first from the pair in return value means current maximal subsequence, and the second is its value. Maximal subsequence of an empty sequence is empty and has value zero.
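To make the pair (path, value) reading concrete, here is a direct C++ translation of the f pseudocode (bestPath and Path are my own names). The base case returns an empty path with value 0, exactly like f[] :- [],0, and the two-element rule falls out of the general case because f of an empty tail is ([], 0). It is plain recursion without memoization, so it is exponential and only meant for small inputs.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Path = std::pair<std::vector<int>, long long>;  // (chosen elements, their sum)

// f applied to the suffix a[i..].
Path bestPath(const std::vector<int>& a, std::size_t i) {
    if (i >= a.size()) return {std::vector<int>{}, 0};              // f []  = ([], 0)
    if (i + 1 == a.size()) return {std::vector<int>{a[i]}, a[i]};   // f [x] = ([x], x)
    Path ft = bestPath(a, i + 2);   // f t: take a, so its neighbour b must be skipped
    Path fbt = bestPath(a, i + 1);  // f [b|t]: drop a
    if (a[i] + ft.second > fbt.second) {
        ft.first.insert(ft.first.begin(), a[i]);  // [a | ft.path]
        ft.second += a[i];
        return ft;
    }
    return fbt;
}
```

Adjacent members are excluded structurally: whenever a is taken, the recursion continues from t, which starts after b.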