Finding the middle of an array without knowing the length - c++

Find the middle of the string or array with an unknown length. You may
not traverse the list to find the length. You may not use anything to
help you find the length - as it is "unknown." (ie. no sizeof (C) or count(C#) etc...)
I had this as an interview question and am wondering what the answer is. I asked whether I could use sizeof; he said "no, the size of the string or array is unknown - you just need to get to the middle."
BTW, I'm not sure this is actually possible to solve with no traversing. I almost felt as though he wanted to see how confident I am in my answer :S
His English was bad, which may have contributed to misunderstandings. He told me directly that I do not need to traverse the list to get to the middle :S I'm assuming he meant no traversing at all.

Have two counters, c1 and c2. Begin traversing the list, incrementing c1 every time and c2 every other time. By the time c1 gets to the end, c2 will be in the middle.
You haven't "traversed the list to find the length" and then divided it by two, you've just gone through once.
The only other way I can think of would be to keep taking off the first and last item of the list until you are left with the one(s) in the middle.
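A minimal sketch of the two-counter idea, assuming the data is a C string whose end is marked by '\0' (the question does not say how the end is detected, so that part is an assumption). The name middle_index is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Returns the index of the middle character of a '\0'-terminated string.
// For an even number of characters this is the upper of the two middle
// positions; for an empty string it is 0.
std::size_t middle_index(const char *s) {
    assert(s != nullptr);
    std::size_t c1 = 0; // advances on every step
    std::size_t c2 = 0; // advances on every other step
    while (s[c1] != '\0') {
        ++c1;
        if (c1 % 2 == 0) {
            ++c2;
        }
    }
    return c2;
}
```

For "abcde" this stops with c2 at index 2, i.e. the 'c'. The string is still walked once from start to end; what is avoided is a separate pass to measure the length first.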

You (or your interviewer) are very vague in what the data is (you mentioned "string" and "array"); there's no assumption that can be made, so it can be anything.
You mentioned that the length of the string is unknown, but from your wording it might seem like you (or the interviewer) actually meant to say unknowable.
a) If it's just unknown, then the question is, how can it be determined? In the case of strings, for example, you can consider the end to be '\0'. You can then apply some algorithms like the ones suggested by the other answers.
b) If it's unknowable, the riddle has no solution. The concept of middle has no meaning without a beginning and an end.
Bottom line, you cannot talk about a middle without a beginning and an end, or a length. Either this question was intentionally unanswerable, or you did not understand it properly. You must know more than just the beginning of the memory segment and maybe its type.

The following code will find the middle of an array WITHOUT traversing the list
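int thearray[80];
int *start = thearray;                               // first element
int *end = reinterpret_cast<int *>(&thearray + 1);   // one past the last element
std::ptrdiff_t half = (end - start) / 2;             // pointer subtraction counts elements, not bytes
std::cout << half << std::endl;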
EDITS:
This code assumes you are dealing with an actual array and not a pointer to the first element of one, thus code like:
int *pointer_to_first_element = (int *)malloc(someamount);
will not work, and neither will any other notation in which the array decays into a pointer to its first element (for example, an array passed to a function as a parameter). Basically, anything declared with the *.

You would just use the difference between the addresses of the first and last elements.

I think this problem is aimed to also test your skills in problem analysis and requirements gathering. As others have stated before, we will need at least another piece of data to solve this issue.
My approach is to make clear to the interviewer that we can solve the problem with one constraint on the function call: the caller must provide 2 pointers, one to the beginning and another to the end of the array. Given those 2 pointers, and using basic pointer arithmetic, I reached this solution; please let me know what you think about it.
int *findMiddleArray( int const *first, int const *last )
{
    if( first == NULL || last == NULL || first > last )
    {
        return NULL;
    }
    if( first == last )
    {
        return (int *)first;
    }
    size_t dataSize = ( size_t )( first + 1 ) - ( size_t )first,
           addFirst = ( size_t )first,
           addLast = ( size_t )last,
           arrayLen = ( addLast - addFirst ) / dataSize + 1,
           arrayMiddle = arrayLen % 2 > 0 ? arrayLen / 2 + 1 : arrayLen / 2;
    return ( int * )( ( arrayMiddle - 1 ) * dataSize + addFirst );
}
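For comparison, here is a shorter sketch of the same idea that lets pointer subtraction do the element counting instead of converting addresses to integers. It assumes, like the function above, that both pointers refer to the same int array and that last points at the final element (not one past it). The name find_middle is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// first and last point at the first and last element of the same int array.
// Returns the middle element (the lower of the two middle elements when the
// count is even), or nullptr on invalid input.
const int *find_middle(const int *first, const int *last) {
    if (first == nullptr || last == nullptr || first > last) {
        return nullptr;
    }
    // last - first is the number of elements minus one; pointer
    // subtraction already counts in elements, not bytes.
    return first + (last - first) / 2;
}
```

With a 5-element array this returns the element at index 2, and with a 4-element array the one at index 1, which matches what the longer version computes.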

One way to find the midpoint of an array (for an odd-length array):
Use two loops: the first traverses from index 0, and the other (nested) loop traverses from the last index of the array. Compare the elements, and where they are the same, that is the midpoint of the array, i.e. if(arr[i] == arr[j]). Hope you got the point!
For an even-length array you can do if(arr[i] == arr[j-1]) or if(arr[i] == arr[j+1]), since i and j never land on the same element. Try it with a dry run!

Related

Reversing an array by recursively splitting the array in C++

string recursion(int arr[], int arrSize) {
    if(arrSize == 1){
        return to_string(arr[0]);
    }
    string letterLeft, letterRight, letterFull;
    //logic1: Normal Recursion
    //letterFull = to_string(arr[arrSize-1]) + " " + recursion(arr, arrSize-1);
    //logic2: D&C ------------------------------------------------------ <
    letterLeft += recursion(arr, arrSize/2);
    letterRight += recursion(arr + arrSize/2, arrSize-arrSize/2) + " ";
    letterFull += letterRight + letterLeft;
    return letterFull;
}
I recently used the 'logic2: D&C' for splitting an array recursively to reverse them (more like Divide and Conquer).
(i.e. if 2,5,4,1 is given as input, output should be 1,4,5,2).
It passed all test cases like the example I just gave. Yet I feel like I don't really understand the flow of the recursion. Could someone explain what happens in 'logic2: D&C' step by step, with an example flowchart and values if possible?
PS:
I took help from https://stackoverflow.com/a/40853237/18762063 to implement this 'logic2: D&C'.
Edits - Made little changes to argument list.
It is unclear why objects of the type std::string are used. To reverse an array of objects of the type char there is no need to use the class std::string; it is just inefficient. The function can look the following way
void recursion( char arr[], size_t arrSize )
{
    if ( not ( arrSize < 2 ) )
    {
        std::swap( arr[0], arr[arrSize-1] );
        recursion( arr + 1, arrSize - 2 );
    }
}
The point of divide and conquer algorithms is breaking your data down to smaller and smaller pieces while processing it.
If you want to reverse an array, for example, then the following should happen:
The data is split into 2 equal parts and the order of these parts gets switched (so the back half becomes the front half, and the front half becomes the back half).
The 2 split parts are then each broken down into 2 equal parts as well, and those get switched.
And so on, until the splits have length 1 and can't be split any further, resulting in a fully reversed array.
Reading material on D and C algorithms
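To see the flow for a concrete input, here is a sketch of the question's 'logic2: D&C' with trace output added. The function name reverse_trace and the depth parameter are additions for illustration only; the splitting and joining follow the code in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// Same splitting as 'logic2: D&C', with a depth parameter used only to
// indent the trace output.
std::string reverse_trace(const int arr[], std::size_t n, int depth = 0) {
    assert(n >= 1);
    const std::string pad(static_cast<std::size_t>(depth) * 2, ' ');
    if (n == 1) {
        std::cout << pad << "leaf " << arr[0] << '\n';
        return std::to_string(arr[0]);
    }
    std::cout << pad << "split " << n << " -> " << n / 2 << " + "
              << n - n / 2 << '\n';
    const std::string left = reverse_trace(arr, n / 2, depth + 1);
    const std::string right = reverse_trace(arr + n / 2, n - n / 2, depth + 1);
    // The right half goes first, so the two halves swap places at every level.
    const std::string joined = right + " " + left;
    std::cout << pad << "join  \"" << joined << "\"\n";
    return joined;
}
```

For the input 2,5,4,1 the array is split into {2,5} and {4,1}. The first half comes back as "5 2", the second as "1 4", and joining right before left gives "1 4 5 2".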

Bitwise Operation in C/C++: ORing all XOR'd pairs in O(N)

I need to XOR each possible pair of elements in an array, and then OR those results together.
Is it possible to do this in O(N)?
Example:-
If list contain three numbers 10,15 & 17, Then there will be a total of 3 pairs:
d1=10^15=5;
d2=10^17=27;
d3=17^15=30;
k= d1 | d2 | d3 ;
K=31
Actually, it's even easier than Tanmay suggests.
It turns out that most of the pairs are redundant: (A^B)|(A^C)|(B^C) == (A^B)|(A^C) and
(A^B)|(A^C)|(A^D)|(B^C)|(B^D)|(C^D) == (A^B)|(A^C)|(A^D), etc. So you can just XOR each element with the first, and OR the results:
result = 0;
for (i=1; i<N;i++){
result|=data[0]^data[i];
}
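The redundancy holds because B^C == (A^B)^(A^C), so any bit set in B^C is already set in A^B or in A^C. As a rough check, the sketch below puts the O(N) loop next to a brute-force O(N^2) version; both function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// O(N): XOR every element with the first one and OR the results.
unsigned or_of_xor_pairs(const std::vector<unsigned> &data) {
    unsigned result = 0;
    for (std::size_t i = 1; i < data.size(); ++i) {
        result |= data[0] ^ data[i];
    }
    return result;
}

// O(N^2) reference: XOR every pair explicitly, used only for checking.
unsigned or_of_xor_pairs_brute(const std::vector<unsigned> &data) {
    unsigned result = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (std::size_t j = i + 1; j < data.size(); ++j) {
            result |= data[i] ^ data[j];
        }
    }
    return result;
}
```

For the list 10, 15, 17 from the question, both functions return 31.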
OR everything, NAND everything, AND both results
Finding all combinations in O(N) is obviously impossible, so the solution had to be some ad-hoc reformulation of the problem. This is pure intuition. (I don't have a proof, but it works.)
I am not sure how to solve it mathematically using boolean algebra since it involves finding all combination pairs, but I'll try to explain it using a Venn diagram.
The required area is exactly identical to the Venn diagram of OR except for the area of AND. Therefore they have to be subtracted. If you try it with n > 3, the picture would be even clearer.
The best way to test this method would be to simulate it with algorithms which don't have to be O(N). Anyways, you can try finding a direct proof. If you find it, please kindly share it with us too. :)
As far as your question goes, I'm sure you can implement it in O(N) yourself easily.
Good luck.
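One way to read "OR everything, NAND everything, AND both results": a bit ends up set in the final value exactly when the elements do not all agree on that bit, which means it is set in at least one element (the OR) but not in all of them (the NAND). A minimal sketch under that reading, with a made-up function name:

```cpp
#include <cassert>
#include <vector>

// O(N): a bit is set in the result when it is set in at least one element
// but not in every element.
unsigned or_of_xor_pairs_via_and_or(const std::vector<unsigned> &data) {
    assert(!data.empty());
    unsigned any_one = 0;   // bits set in at least one element (OR of all)
    unsigned all_one = ~0u; // bits set in every element (AND of all)
    for (unsigned v : data) {
        any_one |= v;
        all_one &= v;
    }
    return any_one & ~all_one; // OR of all, AND-ed with NAND of all
}
```

For 10, 15, 17 the OR of everything is 31 and the AND of everything is 0, so the result is 31, the same value as in the question's example.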
Bitwise means that you only care about 1 or 0...
The OR phase is true if at least one "pair XOR" is true.
There exist only two series for which all "pair XOR" are false: all ones (1,1,1,1,...) and all zeros (0,0,0,0,...).
The solution is therefore a for loop to test if all items are 1 or 0.
And this is O(n) !
Bye,
You can just do what is straightforward: loop over all the pairs, 'xor' them, and 'or' the sub results. Here is a function that expects a pointer to the start of the array, and the size of the array. I typed it here without trying it, but even if it is not correct, I hope you get the idea.
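#include <cassert>
#include <cstddef>

unsigned int compute(const unsigned int *p, size_t size)
{
    assert(size >= 2);
    size_t counter = size - 1;
    unsigned int value = 0;
    // XORs each element with its right-hand neighbour; a bit that differs
    // between any two elements also differs between some adjacent pair,
    // so the adjacent pairs give the same result as all pairs
    while (counter != 0) {
        value |= *p ^ *(p + 1);
        ++p;
        --counter;
    }
    return value;
}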

Increase string overlap matrix building efficiency

I have a huge list (N = ~1million) of strings 100 characters long that I'm trying to find the overlaps between. For instance, one string might be
XXXXXXXXXXXXXXXXXXAACTGCXAACTGGAAXA (and so on)
I need to build an N by N matrix that contains the longest overlap value for every string with every other string. My current method is (pseudocode)
read in all strings to array
create empty NxN matrix
compare each string to every string with a higher array index (to avoid redoing comparisons)
Write longest overlap to matrix
There's a lot of other stuff going on, but I really need a much more efficient way to build the matrix. Even with the most powerful computing clusters I can get my hands on this method takes days.
In case you didn't guess, these are DNA fragments. X indicates "wild card" (probe gave below a threshold quality score) and all other options are a base (A, C, T, or G). I tried to write a quaternary tree algorithm, but this method was far too memory intensive.
I'd love any suggestions you can give for a more efficient method; I'm working in C++ but pseudocode/ideas or other language code would also be very helpful.
Edit: some code excerpts that illustrate my current method. Anything not particularly relevant to the concept has been removed
//part that compares them all to each other
for (int j=0; j<counter; j++) //counter holds # of DNA
{
    for (int k=j+1; k<counter; k++)
    {
        int test = determineBestOverlap(DNArray[j],DNArray[k]);
        //boring stuff
    }
}
//part that compares strings. Definitely very inefficient,
//although I think the sheer number of comparisons is the main problem
int determineBestOverlap(const string &str1, const string &str2)
{
    int maxCounter = 0;
    size_t bestOffset = 0;
    //basically just tries overlapping the strings every possible way
    for (size_t j=0; j<str2.length(); j++)
    {
        int counter = 0;
        size_t offset = 0;
        //stop at the end of either string so no index runs out of range
        while (offset < str1.length() && j+offset < str2.length()
               && str1[offset] == str2[j+offset] && str1[offset] != 'X')
        {
            counter++;
            offset++;
        }
        if (counter > maxCounter)
        {
            maxCounter = counter;
            bestOffset = j;
        }
    }
    return maxCounter;
} //this simplified version doesn't account for flipped strings
Do you really need to know the match between ALL string pairs? If yes, then you will have to compare every string with every other string, which means you will need n^2/2 comparisons, and you will need one half terabyte of memory even if you just store one byte per string pair.
However, I assume what you are really interested in is long matches, those where the strings have more than, say, 20 or 30 or even more than 80 characters in common, and you probably don't really want to know if two strings have 3 characters in common while 50 others are X and the remaining 47 don't match.
What I'd try if I were you - still without knowing if that fits your application - is:
1) From each string, extract the largest substring(s) that make(s) sense. I guess you want to ignore 'X'es at the start and end entirely, and if some "readable" parts are broken by a large number of 'X'es, it probably makes sense to treat the readable parts individually instead of using the longer string. A lot of this "which substrings are relevant?" depends on your data and application, which I don't really know.
2) Make a list of these longest substrings, together with the number of occurrences of each substring. Order this list by string length. You may, but don't really have to, store the indexes of every original string together with the substring. You'll get something like (example)
AGCGCTXATCG 1
GAGXTGACCTG 2
.....
CGCXTATC 1
......
3) Now, from the top to the bottom of the list:
a) Set the "current string" to the string topmost on the list.
b) If the occurrence count next to the current string is > 1, you found a match. Search your original strings for the substring if you haven't remembered the indexes, and mark the match.
c) Compare the current string with all strings of the same length, to find matches where some characters are X.
d) Remove the 1st character from the current string. If the resulting string is already in your table, increase its occurrence counter by one, else enter it into the table.
e) Repeat 3d with the last, instead of the first, character removed from the current string.
f) Remove the current string from the list.
g) Repeat from 3a) until you run out of computing time, or your remaining strings become too short to be interesting.
Whether this is a better algorithm depends very much on your data and which comparisons you're really interested in. If your data is very random/you have very few matches, it will probably take longer than your original idea. But it might allow you to find the interesting parts first and skip the less interesting parts.
I don't see many ways around the fact that you need to compare each string with every other one, including shifting them, and that is by itself very slow; a computation cluster seems the best approach.
The only thing I can see to improve is the string comparison itself: replace A, C, T, G and X by binary patterns:
A = 0x01
C = 0x02
T = 0x04
G = 0x08
X = 0x0F
This way you can store one item in 4 bits, i.e. two per byte (this might not be a good idea, but it is an option to investigate), and then compare them quickly with an AND operation, so that you 'just' have to count how many consecutive non-zero values you have. That's just a way to process the wildcard; sorry, I don't have a better idea to reduce the complexity of the overall comparison.
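A rough sketch of that encoding, storing one code per byte rather than packing two per byte to keep it short. The function names are invented for this example. Note one difference from the code in the question: with X = 0x0F a wildcard is compatible with every base, whereas the question's loop stops at an 'X'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Maps a base to the bit pattern from the answer; 'X' has all four bits set.
std::uint8_t encode_base(char c) {
    switch (c) {
        case 'A': return 0x01;
        case 'C': return 0x02;
        case 'T': return 0x04;
        case 'G': return 0x08;
        case 'X': return 0x0F;
        default:  return 0x00; // unknown characters match nothing
    }
}

std::vector<std::uint8_t> encode(const std::string &s) {
    std::vector<std::uint8_t> out;
    out.reserve(s.size());
    for (char c : s) {
        out.push_back(encode_base(c));
    }
    return out;
}

// Length of the run of compatible positions when b is aligned at a[offset].
// Two codes are compatible when their AND is non-zero.
std::size_t match_run(const std::vector<std::uint8_t> &a,
                      const std::vector<std::uint8_t> &b,
                      std::size_t offset) {
    std::size_t run = 0;
    while (offset + run < a.size() && run < b.size() &&
           (a[offset + run] & b[run]) != 0) {
        ++run;
    }
    return run;
}
```

Encoding each string once up front means the inner comparison is a single AND per position instead of two character comparisons.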

Using a hash to find one duplicated and one missing number in an array

I had this question during an interview and am curious to see how it would be implemented.
Given an unsorted array of integers from 0 to x. One number is missing and one is duplicated. Find those numbers.
Here is what I came up with:
vector<int> counts(x+1, 0); //zero-initialized; needs #include <vector>
for(int i = 0; i <= x; i++){
    counts[a[i]]++;
    if(counts[a[i]] == 2)
        cout << "Duplicate element: " << a[i]; //I realized I could find this here
}
for(int j = 0; j <= x; j++){
    if(counts[j] == 0)
        cout << "Missing element: " << j;
    //if(counts[j] == 2)
    //    cout << "Duplicate element: " << j; //No longer needed here.
}
My initial solution was to create another array of size x+1, loop through the given array and index into my array at the values of the given array and increment. If after the increment any value in my array is two, that is the duplicate. However, I then had to loop through my array again to find any value that was 0 for the missing number.
I pointed out that this might not be the most time efficient solution, but wasn't sure how to speed it up when I was asked. I realized I could move finding the duplicate into the first loop, but that didn't help with the missing number. After waffling for a bit, the interviewer finally gave me the idea that a hash would be a better/faster solution. I have not worked with hashes much, so I wasn't sure how to implement that. Can someone enlighten me? Also, feel free to point out any other glaring errors in my code... Thanks in advance!
If the range of values is about the same as or smaller than the number of values in an array, then using a hash table will not help. In this case, there are x+1 possible values in an array of size x+1 (one missing, one duplicate), so a hash table isn't needed, just a histogram which you've already coded.
If the assignment were changed to be looking for duplicate 32 bit values in an array of size 1 million, then the second array (a histogram) could need to be 2^32 = 4 billion counts long. This is when a hash table would help, since the hash table size is a function of the array size, not the range of values. A hash table of size 1.5 to 2 million would be large enough. In this case, you would have 2^32 - 2^20 = 4293918720 "missing" values, so that part of the assignment would go away.
Wiki article on hash tables:
Hash Table
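For the changed assignment (duplicates among 32-bit values), a hash-based version could look roughly like the sketch below. It uses std::unordered_set, whose memory grows with the number of elements rather than with the range of values. The function name and the out-parameter are choices made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Stores the first value that is seen twice in dup and returns true,
// or returns false if all values are distinct.
bool find_duplicate(const std::vector<std::uint32_t> &a, std::uint32_t &dup) {
    std::unordered_set<std::uint32_t> seen;
    seen.reserve(a.size());
    for (std::uint32_t v : a) {
        // insert() reports through .second whether the value was new
        if (!seen.insert(v).second) {
            dup = v;
            return true;
        }
    }
    return false;
}
```

Each insert is O(1) on average, so the whole scan stays O(n) on average without a 2^32-entry histogram.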
If x were small enough (such that the sum of 0..x can be represented), you could compute the sum of the unique values in a, and subtract that from the sum of 0..x, to get the missing value, without needing the second loop.
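A sketch of how that could look, assuming the array holds exactly x+1 integers from 0..x with one value missing and one duplicated. The function name is invented here, and long long is used for the sums to push the overflow limit further out.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// a holds x+1 integers from 0..x with one value missing and one duplicated.
// Returns {duplicate, missing} using a single pass over a.
std::pair<long long, long long> find_dup_and_missing(const std::vector<int> &a) {
    assert(!a.empty());
    const long long x = static_cast<long long>(a.size()) - 1;
    std::vector<int> counts(a.size(), 0);
    long long unique_sum = 0;
    long long duplicate = -1;
    for (int v : a) {
        assert(v >= 0 && v <= x);
        if (++counts[static_cast<std::size_t>(v)] == 1) {
            unique_sum += v; // first time this value is seen
        } else {
            duplicate = v;   // second time: this is the duplicate
        }
    }
    // sum of 0..x minus the sum of the values actually present
    const long long missing = x * (x + 1) / 2 - unique_sum;
    return {duplicate, missing};
}
```

The histogram is still needed to spot the duplicate, but the second loop over it goes away.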
Here is a stab at a solution that uses an index (a true key-value hash doesn't make sense when the array is guaranteed to include only integers). Sorry OP, it's in Ruby:
values = mystery_array.sort.map.with_index { |n,i| n if n != i }.compact
missing_value,duplicate_value = mystery_array.include?(values[0] - 1) ? \
[values[-1] + 1, values[0]] : [values[0] - 1, values[-1]]
The functions used likely employ a non-trivial amount of looping behind the scenes, and this will create a (possibly very large) variable values which contains a range between the missing and/or duplicate value, as well as a second lookup loop, but it works.
Perhaps the interviewer meant to say Set instead of hash?
Sorting allowed?
auto first = std::begin(a);
auto last = std::end(a);
// sort it
std::sort( first, last );
// find duplicates
auto first_duplicate = *std::adjacent_find( first, last );
// find missing value
auto missing = std::adjacent_find(first, last, [](int x, int y) {return x+2 == y;});
int missing_number = 0;
if (missing != last)
{
    missing_number = 1 + *missing;
}
else
{
    // no gap inside the sorted array, so the missing value is at one end of the range
    if (*first != 0)
    {
        missing_number = 0;
    }
    else
    {
        missing_number = 9; // hard-coded upper end of the range
    }
}
Both could be done in a single hand-written loop, but I wanted to use only stl algorithms. Any better idea for handling the corner cases?
for (v = 0 to x) {                    // first loop: every value in the range
    count = 0;
    for (i = 0 to length-1) {         // second loop: scan the array for v
        if (t[i] == v)
            count++;
    }
    if (count == 0)
        v is the missing number;
    if (count == 2)
        v is the duplicated number;
    if (missingIsFound && duplicatedIsFound)
        exit;                         // exit program, missing and dup are found
}                                     // end first loop

counting of the occurrence of substrings

Is there an efficient algorithm to count the total number of occurrence of a sub-string X in a longer string Y ?
To be more specific, what I want is the total number of ways of selecting A.size() elements from B such that there exists a permutation of the selected elements that matches A (in the example below, A is X and B is Y).
An example is as follows: search the total number of occurrence of X=AB in string Y=ABCDBFGHIJ ?
The answer is 2 : first A and second B, and first A and 5-th B.
I know we can generate all permutations of the long string (which will be N! length N strings Y) and use KMP algorithm to search/count the occurrence of X in Y.
Can we do better than that ?
The original problem I try to solve is as follows: let's say we have a large matrix M of size r by c (r and c in the range of 10000's). Given a small matrix P of size a by b (a and b are in the range of 10's). Find the total number of different selections of a rows and b columns of M (this will give us an a by b "submatrix" H) so that there exists a permutation of the rows and columns of H that gives us a matrix which matches P.
I think once I can solve 1-D case, 2-D may follow the solution.
After research, I find out that this is a sub-graph isomorphism problem and it is NP-hard. There are some algorithms solve this efficiently. One can google it and see many papers on this.
After having read, then re-read the question (at @Charlie's suggestion), I have concluded that these answers are not addressing the real issue. I have also concluded that I still do not know exactly what the issue is, but if the OP answers my questions and clarifies the issue, I will come back and make a better attempt at addressing it. For now, I will leave this as a placeholder...
To find occurrences of a letter or other character:
char buf[]="this is the string to search";
int i, count=0, len;
len = strlen(buf);
for(i=0;i<len;i++)
{
if(buf[i] == 's') count++;
}
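or, using strstr(), find occurrences of a sub-string (strtok() is not suitable here, because it treats its second argument as a set of delimiter characters, not as a sub-string):
Not pretty, brute force method.
// strings to search
const char str1[]="is";
const char str2[]="s";
int count = 0;
const char buf[]="this is the string to search";
const char *p;
// strstr() returns a pointer to the next match, or NULL when none is left
for(p = strstr(buf, str1); p != NULL; p = strstr(p + 1, str1))
{
    count++;
}
for(p = strstr(buf, str2); p != NULL; p = strstr(p + 1, str2))
{
    count++;
}
count should contain the total of occurrences of "s", + occurrences of "is"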
[EDIT]
First, let me ask for a technical clarification of your question, given A = "AR", B = "START", the solutions would be "A", "R" and "AR", in this case all found in the 3rd and 4th letters of B. Is that correct?. If so, that's easy enough. You can do that with some small modifications and additions to what I have already done above. And if you have questions about that code, I would be happy to address them if I can.
The second part is your real question: Searching with better than, or at least with the same efficiency as the KMP algorithm - that's the real trick. If choosing the best approach is the real question, then some Google searching is in order. Because once you find, and settle on the best approach (efficiency >= KMP) to solving the sub-string search, then the implementation will be a set of simple steps (if you give it enough time), possibly, but not necessarily using some of the same components of C used above. (Pointer manipulation will be faster than using the string functions I think.) But these techniques are just implementation, and should always follow a good design. Here are a few Google searches to help you get started with a search... (you may have already been to some of these)
Validating KMP
KMP - Can we do better?
KMP - Defined
KMP - Improvements using Fibonacci String
If once you have made your algorithm selection, and begin to implement your design, you have questions about techniques, or coding suggestions, Post them. My guess is there are several people here who would enjoy helping with such a useful algorithm.
If X is a substring in Y, then each character of X must be in Y. So we first iterate through X and find the counts of each character, in an array counts.
Then for each character that has count >= 1, we count the number of times it appears in Y which can be done trivially in O(n).
From here the answer should just be the multiplication of the combinations C(count(Y),count(X)).
That is, if after reading your question for the 3rd time I finally understand it correctly.
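A minimal sketch of that counting, assuming single-byte characters and results small enough to fit in 64 bits (overflow is not handled). The names choose and count_selections are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// n choose k, computed incrementally; every intermediate value is itself
// a binomial coefficient, so the division is always exact.
std::uint64_t choose(std::uint64_t n, std::uint64_t k) {
    if (k > n) {
        return 0;
    }
    std::uint64_t r = 1;
    for (std::uint64_t i = 1; i <= k; ++i) {
        r = r * (n - k + i) / i;
    }
    return r;
}

// Product over every character c of X of C(count of c in Y, count of c in X).
std::uint64_t count_selections(const std::string &x, const std::string &y) {
    std::array<std::uint64_t, 256> in_x{};
    std::array<std::uint64_t, 256> in_y{};
    for (unsigned char c : x) ++in_x[c];
    for (unsigned char c : y) ++in_y[c];
    std::uint64_t total = 1;
    for (std::size_t c = 0; c < 256; ++c) {
        if (in_x[c] > 0) {
            total *= choose(in_y[c], in_x[c]);
        }
    }
    return total;
}
```

For X=AB and Y=ABCDBFGHIJ this gives C(1,1) * C(2,1) = 2, which agrees with the example in the question.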