IndexError: string index out of range, when comparing two strings


My code compares two strings character by character to find the differences between two words. However, the line "if a[r] == b[r]" raises the IndexError above. r is a variable that has already been given a value and is incremented as the loop runs, so I cannot see why this fails. My code in full is as follows:
a = str(input("Choose a word, any word: "))
b = str(input("Choose another word: "))
j = 0
r = 0
n = len(a)
m = len(b)
if n == m:
    while r <= n:
        if a[r] == b[r]:
            r = r + 1
        else:
            j = j + 1
            r = r + 1
    print("The hamming distance between ", a, "and ", b, "is: ", j)
else:
    p = max(n, m) - min(n, m)
    while r <= p:
        if a[r] == b[r]:
            r = r + 1
        else:
            j = j + 1
            r = r + 1
    k = p + j
    print ("The hamming distance between ", a, "and ", b, "is: ", k)
I know it likely isn't the most compact, but any help would be greatly appreciated. Thank you.
Edit: I have fixed it. It was a simple mistake on my part: removing the equals sign from the r <= n and r <= p conditions (so that they read r < n and r < p) fixed it.

I have fixed it. It was a simple mistake on my part: changing the loop conditions r <= n and r <= p to r < n and r < p fixed the error, because the last valid index of a string of length n is n - 1.
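For reference, here is a minimal sketch of the corrected loop. Note that in the original else branch p is the difference in lengths, not the length of the shared part, so r < p can still run past the end of the shorter word (for example with "a" and "abcd"). The sketch assumes the intent is to compare only the overlapping characters and to count the length difference as extra mismatches; the function name hamming_like is made up for illustration.

```python
def hamming_like(a, b):
    """Count differing positions; extra characters in the longer word count as differences."""
    overlap = min(len(a), len(b))  # only these indices are valid in both strings
    mismatches = 0
    r = 0
    while r < overlap:  # strictly less than: the last valid index is overlap - 1
        if a[r] != b[r]:
            mismatches += 1
        r += 1
    # Assumption: each unmatched trailing character counts as one difference.
    return mismatches + abs(len(a) - len(b))


print(hamming_like("karolin", "kathrin"))
```

For two words of equal length this is the ordinary Hamming distance; the extension to unequal lengths is a convention taken from the question's code, not a standard definition.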

Related

Pythagorean triple nested loops misunderstanding

Greetings,
I was trying to find Pythagorean triples in which every number is less than 1000.
Fortunately, I was able to find an algorithm for it. Here it is:
for (int a = 1; a < 1000; a++)
{
    for (int b = a; b < 1000; b++)
    {
        for (int c = b; c < 1000; c++)
        {
            if ((a * a) + (b * b) == (c * c))
            {
                cout << "( " << a << ", " << b << ", " << c << " )";
                cout << endl;
            }
        }
    }
}
But I don't understand this code.
Why does each loop start from the current value of the enclosing loop, when each loop could simply start from 1?
What is the reason for this?

For a < b:
Pythagorean triples come in pairs: if (a, b, c) is a triple, then so is (b, a, c), where a, b < c and a, b, c ∈ ℕ. Once one of the pair is found, the other is trivial. Suppose a Pythagorean triple (a, b, c) with a < b is found; we immediately know that (b, a, c) is also a Pythagorean triple, so we don't want the program to search for it, since that only enlarges the search domain and increases the execution time. To avoid that, the loops are set up so that a ≤ b. However, you can also start the inner loop at b = a + 1, so that a < b.
For b < c, or a < b < c:
You can start the loops so that a < b < c (b = a + 1 and c = b + 1), because no Pythagorean triple can have the form (b, b, c): b^2 + b^2 = 2 * b^2 = c^2 would mean c = b * sqrt(2), but c is an integer and b * sqrt(2) is irrational, so the two can never be equal and no integer solution exists. Moreover, since c^2 = a^2 + b^2 > b^2, we always have c > b.
Therefore, a < b < c.

Pythagorean triplets can be ordered in only one way: if a² + b² = c², then one can prove that a² + c² ≠ b² and b² + c² ≠ a².
From the above and a few special cases (a = 0 is excluded by definition, and a ∊ (0, 2] is easy to check by hand), it follows that one only has to check triplets for which 2 < a ≤ b < c, and this is (almost) what the three loops do.
There are two reasons for this:
By setting up the loops so that a ≤ b ≤ c, we guarantee that no triplet appears more than once.
There are fewer triplets to test, so we reduce the execution time by a constant factor.
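To see the effect of the loop bounds concretely, here is a small Python sketch (not the code from the question; the function name triples and the ordered flag are assumptions made for illustration). With ordered=True each loop starts at the enclosing loop's value, as in the question; with ordered=False every loop starts at 1.

```python
def triples(limit, ordered=True):
    """Return all (a, b, c) with a*a + b*b == c*c and every value below limit."""
    found = []
    for a in range(1, limit):
        # ordered=True mirrors the question's loops (b starts at a, c starts at b).
        for b in range(a if ordered else 1, limit):
            for c in range(b if ordered else 1, limit):
                if a * a + b * b == c * c:
                    found.append((a, b, c))
    return found


print(len(triples(21, ordered=True)))   # each triple once
print(len(triples(21, ordered=False)))  # each triple in both orders (a, b) and (b, a)
```

With the ordered bounds every triple is reported once; starting all loops at 1 reports each triple twice and tests far more combinations.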

Substrings of equal length comparison using hashing

For an assignment, given a string S, I need to compare two substrings of equal length. The output should be "Yes" if they are equal and "No" if they are not. I am given the starting indexes of the two substrings (a and b) and their length L.
For example, for S = "Hello", a = 1, b = 3, L = 2, the substrings are:
substring1 = "el" and substring2 = "lo", which aren't equal, so the answer is "No".
I think hashing each substring of the main string S and storing all the hashes in memory would be a good approach to take. Here is the code I have written for this (I have tried to implement what I learned from the Coursera course I was taking):
This function takes a string and the hashing parameters p and x, and computes a polynomial hash of the given string.
long long PolyHash(string str, long long p, int x){
    long long res = 0;
    for(int i = str.length() - 1; i > -1; i--){
        res = (res * x + (str[i] - 'a' + 1)) % p;
    }
    return res;
}
The function below precomputes all hashes and fills an array called ah, which is allocated in the main function. The array ah has n rows and n columns, where n is the string length (half of it is wasted because I couldn't work out how to store it as a triangle, so I went for a full rectangular array). Assuming n = 7, ah[0]-ah[6] are the hash values for string[0]-string[6] (all substrings of length 1), ah[7]-ah[12] are the hash values for string[0-1]-string[5-6] (all substrings of length 2), and so on until the end.
void PreComputeAllHashes(string str, int len, long long p, int x, long long* ah){
    int n = str.length();
    string S = str.substr(n - len, len);
    ah[len * n + n - len] = PolyHash(S, p, x);
    long long y = 1;
    for(int _ = 0; _ < len; _++){
        y = (y * x) % p;
    }
    for(int i = n - len - 1; i > -1; i--){
        ah[n * len + i] = (x * ah[n * len + i + 1] + (str[i] - 'a' + 1) - y * (str[i + len] - 'a' + 1)) % p;
    }
}
Below is the main function. I took p to be a large prime number and x to be a manually picked, somewhat "random" prime number.
I read the text, allocate the hash array, fill it, and then read the queries and answer each of them from the array.
int main(){
    long long p = 1e9 + 9;
    int x = 78623;
    string text;
    cin >> text;
    long long* allhashes = new long long[text.length() * text.length()];
    for(int i = 1; i <= text.length(); i++){
        PreComputeAllHashes(text, i, p, x, allhashes);
    }
    int queries;
    cin >> queries;
    int a, b, l;
    for(int _ = 0; _ < queries; _++){
        cin >> a >> b >> l;
        if(a == b){
            cout << "Yes" << endl;
        }else{
            cout << ((allhashes[l * text.length() + a] == allhashes[l * text.length() + b]) ? "Yes" : "No") << endl;
        }
    }
    return 0;
}
However, one of the test cases for this assignment on Coursera fails with the following error:
Failed case #7/14: unknown signal 6 (Time used: 0.00/1.00, memory used: 29396992/536870912.)
I looked this up online, and it means the following:
Unknown signal 6 (or 7, or 8, or 11, or some other). This happens when your program crashes. It can be because of division by zero, accessing memory outside of the array bounds, using uninitialized variables, too deep recursion that triggers stack overflow, sorting with contradictory comparator, removing elements from an empty data structure, trying to allocate too much memory, and many other reasons. Look at your code and think about all those possibilities.
I've been looking at my code the entire day and still haven't been able to find the cause of this error. Any help fixing it would be appreciated.
Edit: The assignment states that the input string can be up to 500000 characters long and that there can be up to 100000 queries. The task also has a 1 second time limit, which is too short for comparing the substrings character by character for each query.

So, I did some research on how to reduce the complexity of the algorithm I had implemented, and finally found it! It turns out there is a very simple way (well, not if you count the theory behind it) to get the hash value of any substring, given the prefix hashes of the initial string.
You can read more about it here, but I will try to explain it briefly.
What we do is precalculate the hash values of all prefix substrings.
The prefix substrings of the string "hello" are the following:
h
he
hel
hell
hello
Once we have the hash values of all these prefix substrings, we can collect them in a vector such that:
h[str] = str[0] + str[1] * P + str[2] * P^2 + str[3] * P^3 + ... + str[N] * P^N
where P is any prime number (I chose p = 263).
Then we need a large modulus to reduce everything by, to keep the values from growing too large. I chose m = 10^9 + 9.
First I create a vector to hold the precalculated powers of P:
vector<long long> p_pow (s.length());
p_pow[0] = 1;
for(size_t i=1; i<p_pow.size(); ++i){
    p_pow[i] = (m + (p_pow[i-1] * p) % m) % m;
}
Then I calculate the vector of hash values for prefix substrings:
vector<long long> h (s.length());
for (size_t i=0; i<s.length(); ++i){
    h[i] = (m + (s[i] - 'a' + 1) * p_pow[i] % m) % m;
    if(i){
        h[i] = (m + (h[i] + h[i-1]) % m) % m;
    }
}
Suppose I have q queries, each of which consists of 3 integers: a, b, and L.
To check equality of the substrings s1 = str[a...a+L-1] and s2 = str[b...b+L-1], I can compare their hash values. To get the hash value of a substring from the hash values of the prefix substrings we just computed, we use the following formula:
H[I..J] * P[I] = H[0..J] - H[0..I-1]
Again, you can read about the proof of this in the link.
So, to address each query, I would do the following:
cin >> a >> b >> len;
if(a == b){ // just avoid extra calculation, saves little time
    cout << "Yes" << endl;
}else{
    long long h1 = h[a+len-1] % m;
    if(a){
        h1 = (m + (h1 - h[a-1]) % m) % m;
    }
    long long h2 = h[b+len-1] % m;
    if(b){
        h2 = (m + (h2 - h[b-1]) % m) % m;
    }
    if (a < b && h1 * p_pow[b-a] % m == h2 % m || a > b && h1 % m == h2 * p_pow[a-b] % m){
        cout << "Yes" << endl;
    }else{
        cout << "No" << endl;
    }
}
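The same prefix-hash idea can be sketched compactly in Python. This is an illustrative sketch rather than a translation of the code above: the names build_prefix_hashes and substrings_equal are made up, and the prefix array is shifted by one (prefix[i] is the hash of the first i characters) so that no special case is needed for a = 0. It assumes lowercase input, as the C++ code does. Equal hashes only indicate that the substrings are very likely equal, since collisions are possible.

```python
MOD = 10**9 + 9   # the modulus m from the answer
BASE = 263        # the prime P from the answer


def build_prefix_hashes(s):
    """Return (prefix, powers): prefix[i] hashes s[:i], powers[i] is BASE**i mod MOD."""
    powers = [1] * (len(s) + 1)
    prefix = [0] * (len(s) + 1)
    for i, ch in enumerate(s):
        powers[i + 1] = powers[i] * BASE % MOD
        prefix[i + 1] = (prefix[i] + (ord(ch) - ord("a") + 1) * powers[i]) % MOD
    return prefix, powers


def substrings_equal(prefix, powers, a, b, length):
    """Compare s[a:a+length] and s[b:b+length] by hash in O(1) per query."""
    ha = (prefix[a + length] - prefix[a]) % MOD  # hash of first substring times BASE**a
    hb = (prefix[b + length] - prefix[b]) % MOD  # hash of second substring times BASE**b
    # Scale the hash with the smaller offset so both carry the same power of BASE.
    if a <= b:
        return ha * powers[b - a] % MOD == hb
    return ha == hb * powers[a - b] % MOD


prefix, powers = build_prefix_hashes("hello")
print("Yes" if substrings_equal(prefix, powers, 1, 3, 2) else "No")
```

Building the two arrays takes O(n) time and memory, so a string of 500000 characters needs two arrays of about 500001 entries instead of an n by n table.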

Your approach is very complex for such a simple task. Assuming you only need to do this operation once, you can compare the substrings directly with a for loop; there is no need for hashing. Take a look at this code:
for (int i = a, j = b, counter = 0; counter < L; counter++, i++, j++) {
    if (S[i] != S[j]) {
        cout << "Not the same" << endl;
        return 0;
    }
}
cout << "They are the same" << endl;

How to add integers in a string, using only TWO while loops, with one nested in the other

Here's the question:
"Standard input consists of a single addition involving exactly five integer terms."
For example: "271+9730+30+813+5".
I need to add all of these using at most two while loops, with one nested inside the other.
I'm only allowed to use constructs such as
if/else
while
and I can't use lists for this.
I've tried saving the first number as x and the second number as y, adding them, and then restarting the loop at the end with the string cut to exclude the first two numbers:
#!/usr/bin/env python
s = raw_input()
i = 0
y = 0
while i < len(s) and s[i] != "+":
    i = i + 1
x = s[:i]
if i < len(s):
    j = i + 1
    while j < len(s) and s[j] != "+":
        j += 1
    y = s[i + i:j]
    s = s[j:]
    i = 0

You need to think in terms of magnitude. Each time you move one place without hitting a '+' you increase the value of the first digit 10-fold.
result = 0
s = "271+9730+30+813+5"
spot = 0
dec = 0
i = 0
while spot < 4 and i < len(s):
    if s[0] == '+':
        # a whole term has been consumed; drop the separator
        spot += 1
        s = s[1:]
    if s[i] == '+':
        # add the leading digit, weighted by its place value, then drop it
        result = result + int(s[0]) * (10 ** (dec - 1))
        s = s[1:]
        i = 0
        dec = 0
        print(s)
        print(result)
    else:
        dec += 1
        i += 1
result = result + int(s)
print(result)
EDIT: Alternatively, a much more computationally efficient solution using int() and exactly 2 while loops:
result = 0
s = "271+9730+30+813+5"
spot = 0
i = 0
while spot < 4:
    # find the end of the current term
    while s[i] != '+':
        i += 1
    result += int(s[0:i])
    s = s[i+1:]
    i = 0
    print("remaining string: " + s)
    print("current result: " + str(result))
    spot += 1
result += int(s)
print("final result: " + str(result))
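The "magnitude" idea can also be applied digit by digit, without slicing the string at all: each new digit multiplies the term read so far by 10 and adds itself. The following is a minimal sketch of that variant using only two while loops (one nested in the other) and no lists; the function name add_terms is made up for illustration, and it assumes the input contains only digits and '+' signs.

```python
def add_terms(expr):
    """Sum the '+'-separated non-negative integers in expr using two nested while loops."""
    total = 0
    i = 0
    while i < len(expr):
        term = 0
        # Inner loop: read one term, shifting earlier digits up by a factor of 10.
        while i < len(expr) and expr[i] != "+":
            term = term * 10 + int(expr[i])
            i += 1
        total += term
        i += 1  # skip the '+' (or step past the end of the string)
    return total


print(add_terms("271+9730+30+813+5"))
```

Because the outer loop runs until the end of the string, this version does not depend on there being exactly five terms.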

count distinct slices in an array

I was trying to solve this problem.
An integer M and a non-empty zero-indexed array A consisting of N non-negative integers are given. All integers in array A are less than or equal to M.
A pair of integers (P, Q), such that 0 ≤ P ≤ Q < N, is called a slice of array A. The slice consists of the elements A[P], A[P + 1], ..., A[Q]. A distinct slice is a slice consisting of only unique numbers. That is, no individual number occurs more than once in the slice.
For example, consider integer M = 6 and array A such that:
A[0] = 3
A[1] = 4
A[2] = 5
A[3] = 5
A[4] = 2
There are exactly nine distinct slices: (0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2), (3, 3), (3, 4) and (4, 4).
The goal is to calculate the number of distinct slices.
Thanks in advance.
#include <algorithm>
#include <cstring>
#include <cmath>
#include <vector>
#define MAX 100002
// you can write to stdout for debugging purposes, e.g.
// cout << "this is a debug message" << endl;
using namespace std;
bool check[MAX];
int solution(int M, vector<int> &A) {
    memset(check, false, sizeof(check));
    int base = 0;
    int fibot = 0;
    int sum = 0;
    while(fibot < A.size()){
        if(check[A[fibot]]){
            base = fibot;
        }
        check[A[fibot]] = true;
        sum += fibot - base + 1;
        fibot += 1;
    }
    return min(sum, 1000000000);
}

The solution is not correct because your algorithm is wrong.
First of all, let me show you a counterexample. Let A = {2, 1, 2}. The first iteration: base = 0, fibot = 0, sum += 1. That's right. The second one: base = 0, fibot = 1, sum += 2. That's correct, too. The last step: fibot = 2, check[A[fibot]] is true, thus base = 2. But it should be 1. So your code returns 1 + 2 + 1 = 4, while the right answer is 1 + 2 + 2 = 5.
The right way to do it could be like this: start with L = 0. For each R from 0 to n - 1, keep moving L to the right until the subarray contains only distinct values (you can maintain the number of occurrences of each value in an array and use the fact that A[R] is the only element that can occur more than once).
There is one more issue with your code: the sum variable may overflow if int is a 32-bit type on the testing platform (for instance, if all elements of A are distinct).
As for the question of WHY your algorithm is incorrect: I see no reason why it should be correct in the first place. Can you prove it? The base = fibot assignment looks quite arbitrary to me.
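The left/right approach described above can be sketched in Python as follows. This is a minimal sketch, not the answerer's own code: the function name count_distinct_slices and the cap parameter are assumptions made for illustration (the cap mirrors the 1,000,000,000 limit used in the question's code).

```python
def count_distinct_slices(M, A, cap=1_000_000_000):
    """Count slices of A with no repeated value, capped at cap."""
    occurrences = [0] * (M + 1)  # how often each value occurs in A[left..right]
    left = 0
    total = 0
    for right, value in enumerate(A):
        occurrences[value] += 1
        # Only A[right] can be duplicated, so shrink from the left until it is unique.
        while occurrences[value] > 1:
            occurrences[A[left]] -= 1
            left += 1
        # Every slice ending at right and starting in left..right is distinct.
        total += right - left + 1
        if total >= cap:
            return cap
    return total


print(count_distinct_slices(6, [3, 4, 5, 5, 2]))
print(count_distinct_slices(2, [2, 1, 2]))
```

Each index enters and leaves the window at most once, so the running time is O(N), and Python integers do not overflow, which sidesteps the 32-bit issue mentioned above.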

I would like to share an explanation of the algorithm that I have implemented in C++, followed by the actual implementation.
Notice that the minimum number of distinct slices is N, because each element is a distinct one-item slice.
Start the back index from the first element.
Start the front index from the first element.
Advance the front until we find a duplicate in the sequence.
In each iteration, increment the counter by the necessary amount, which is the difference between front and back.
If we reach the maximum count at any iteration, just return immediately as a slight optimisation.
In each iteration of the sequence, record the elements that have occurred.
Once we have found a duplicate, advance the back index to one past the duplicate.
While we advance the back index, clear all the occurred elements, since we start a new slice beyond those elements.
The runtime complexity of this solution is O(N), since we go through each element.
The space complexity of this solution is O(M), because we have a hash to store the occurred elements in the sequences. The maximum element of this hash is M.
int solution(int M, vector<int> &A)
{
    int N = A.size();
    int distinct_slices = N;
    vector<bool> seq_hash(M + 1, false);
    for (int back = 0, front = 0; front < N; ++back) {
        while (front < N and !seq_hash[A[front]]) {
            distinct_slices += front - back;
            if (distinct_slices > 1000000000) return 1000000000;
            seq_hash[A[front++]] = true;
        }
        while (front < N and back < N and A[back] != A[front]) seq_hash[A[back++]] = false;
        seq_hash[A[back]] = false;
    }
    return distinct_slices;
}

100% python solution that helped me, thanks to https://www.martinkysel.com/codility-countdistinctslices-solution/
def solution(M, A):
    the_sum = 0
    front = back = 0
    seen = [False] * (M+1)
    while (front < len(A) and back < len(A)):
        while (front < len(A) and seen[A[front]] != True):
            the_sum += (front-back+1)
            seen[A[front]] = True
            front += 1
        else:
            while front < len(A) and back < len(A) and A[back] != A[front]:
                seen[A[back]] = False
                back += 1
            seen[A[back]] = False
            back += 1
    return min(the_sum, 1000000000)

Solution with 100% using Ruby
LIMIT = 1_000_000_000
def solution(_m, a)
  a.each_with_index.inject([0, {}]) do |(result, slice), (back, i)|
    return LIMIT if result >= LIMIT
    slice[back] = true
    a[(i + slice.size)..-1].each do |front|
      break if slice[front]
      slice[front] = true
    end
    slice.delete back
    [result + slice.size, slice]
  end.first + a.size
end

Using the caterpillar algorithm and the formula S(n+1) = S(n) + n + 1, where S(n) is the number of slices of an n-element array, a Java solution could be:
public int solution(int top, int[] numbers) {
    int len = numbers.length;
    long count = 0;
    if (len == 1) return 1;
    int front = 0;
    int[] counter = new int[top + 1];
    for (int i = 0; i < len; i++) {
        while (front < len && counter[numbers[front]] == 0) {
            count += front - i + 1;
            counter[numbers[front++]] = 1;
        }
        while (front < len && numbers[i] != numbers[front] && i < front) {
            counter[numbers[i++]] = 0;
        }
        counter[numbers[i]] = 0;
        if (count > 1_000_000_000) {
            return 1_000_000_000;
        }
    }
    // count is a long; it is at most 1_000_000_000 here, so the cast is safe
    return (int) count;
}

Simplest way to split n objects into m sets, where m doesn't divide n?

I have two integers m and n, with m < n. In general, m doesn't divide n.
Say that n = m*q + r, where q is the integer quotient of n and m, and r is the remainder (0 <= r < m). If we split n objects into m boxes as homogeneously as possible, r of the boxes will contain q+1 objects and the remaining boxes will contain q objects. Suppose that the objects are indexed from 1 to n, and that they are inserted into the boxes in order. Moreover, suppose that the first r boxes contain q+1 objects.
I want to write a function that returns a list of indices i1, i2, ..., im, such that i1 is the index of the smallest object in the first box, i2 the index of the smallest object in the second box, and so on.
I can think of a couple of ways to write this function myself, but I think they are too complicated. I believe there's a simple way to do this that I am not seeing.

Okay, I did not fully understand what you mean by 'inserted into the boxes in order', so I'll offer a solution for each of the two possible meanings.
a) The objects are inserted into the boxes like this:
9|
5|6|7|8
1|2|3|4
in which case the solution is fairly simple: just print all the numbers from 1 to m.
Code:
void foo(int n, int m) {
    for (int k = 1; k <= m; k++)
        cout << k << endl;
}
b) The objects are inserted into the boxes like this:
3|6|
2|5|8|10
1|4|7|9
in which case the object with the smallest index in box k is: (n / m) * (k - 1) + min(k, n % m + 1)
Code:
void foo(int n, int m) {
    for (int k = 1; k <= m; k++)
        cout << (n / m) * (k - 1) + min(k, n % m + 1) << endl;
}

Add q objects to every box. If the box is one of the first n - m*q == n % m boxes, add one extra object:
#include <vector>

std::vector<int> starts_of(int n, int m)
{
    std::vector<int> v;
    int q = n / m;
    int s = 1;  // index of the first object in the current box
    for (int i = 0; i < m; i++) {
        v.push_back(s);
        s += q;
        if (i < n % m) s++;  // the first n % m boxes hold one extra object
    }
    return v;
}

r = n % m
d = n / m
// with two loops, without conditionals:
for i = 0..r - 1
    I[i] = 1 + i * d + i
for i = r..m - 1
    I[i] = 1 + i * d + r
// or with a single loop:
for i = 0..m - 1
    I[i] = 1 + i * d + min(i, r)
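The single-loop formula above can be checked with a short Python sketch. The function name box_starts is made up for illustration; boxes are numbered from 0 inside the function and objects from 1, as in the pseudocode.

```python
def box_starts(n, m):
    """Return the smallest object index in each of m boxes holding n objects in order."""
    q, r = divmod(n, m)  # the first r boxes hold q + 1 objects, the rest hold q
    # Box i starts after i boxes of q objects plus one extra for each of the
    # first min(i, r) boxes.
    return [1 + i * q + min(i, r) for i in range(m)]


print(box_starts(10, 4))
```

For n = 10 and m = 4 this gives boxes {1, 2, 3}, {4, 5, 6}, {7, 8} and {9, 10}, which matches the layout in case b) above.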
