May Leetcode Speedrun Question: Single element in a Sorted Array - c++

So I was watching Errichto complete these challenges and I was amazed at how fast he solved the "Single element in a Sorted Array". From a beginner's perspective, it does look impressive - maybe for senior devs the speed is quite normal.
You are given a sorted array where all elements are integers, and all elements appear exactly twice in the array, except for one element, which appears exactly once. (i.e., all elements are duplicated, except one.) You need to find the element appearing exactly once.
I am just here to understand how said code works:
class Solution {
public:
int singleNonDuplicate(vector<int>& nums) {
long long a = 0;
for(int x : nums) {
a ^= x;
}
return a;
}
};
Here's what I've got so far:
for every single integer "x" in the vector/array "nums", a is equal to a^x (if what I said is correct).
And here are my questions:
Wouldn't a^x be equal to 0 because a is 0 since the beginning?
int singleNonDuplicate(vector<int> nums) {
//...
}
and
int singleNonDuplicate(vector<int>& nums) {
//...
}
I've understood this: vector<int> nums is pass by value (you're working with a "copy" of nums inside the function) and vector<int>& nums is pass by reference (you're working with nums itself inside the function).
Does the "&" matter if you were to solve the problem just like Errichto?
ps:
sorry for possible mistakes from a programming perspective, I might've accidentally said some wrong things.
yes I will learn C++ sooner or later, 2020 is the first year in my life where I actually have an actual "programming" class in my schedule, these videos are entertaining and I'm curious to see why said code works & try understand etc.

Casual proof:
(If you're interested in areas of study that help you to come up with solutions like this and understand them, I'd suggest Discrete Mathematics and Group Theory / Abstract Algebra.)
I think I know the question you were referencing. It goes something like,
You are given an unsorted array where all elements are integers, and all elements appear exactly twice in the array, except for one element, which appears exactly once. (i.e., all elements are duplicated, except one.)
You're on the right track for the first part, why the algorithm works. It takes advantage of a few properties of XOR:
X^0=X
X^X=0
The XOR operation is commutative and associative.
# proof
# since XOR is commutative, we can take advantage
# of the fact that all elements except our target
# occur in pairs of two:
P1, P1 = Two integers of the same value in a pair.
T = Our target.
# sample unsorted order, array size = 7 = (3*2)+1
[ P3, P1, T, P1, P2, P3, P2 ]
# since XOR is commutative, we can re-arrange the array
# to our liking, and still get the same value as
# the XOR algorithm.
# here, we move our target to the front, and then group
# each pair together. I've arranged them in ascending
# order, but that's not important:
[ T, P1, P1, P2, P2, P3, P3 ]
# write out the equation our algorithm is solving:
solution = 0 ^ T ^ P1 ^ P1 ^ P2 ^ P2 ^ P3 ^ P3
# because XOR is associative, we can use parens
# to indicate elements of the form X^X=0:
solution = T ^ (P1 ^ P1) ^ (P2 ^ P2) ^ (P3 ^ P3) ^ 0
# substitute X^X=0
solution = T ^ 0 ^ 0 ^ 0 ^ 0
# use X^0=X repeatedly
solution = T
So we know that running that algorithm will give us our target, T.
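To make the proof concrete, here is a minimal sketch of the same XOR fold, run on the unsorted sample order from above. The function name xor_all is just a name I picked for illustration; it is not from the original solution.

```cpp
#include <vector>

// XOR-fold every element. Each paired value cancels itself (X^X=0),
// so only the unpaired target survives (X^0=X).
int xor_all(const std::vector<int>& nums) {
    int acc = 0;               // 0 is the identity for XOR
    for (int x : nums) {
        acc ^= x;
    }
    return acc;
}
```

Calling xor_all on { 3, 1, 7, 1, 2, 3, 2 } and on the regrouped { 7, 1, 1, 2, 2, 3, 3 } should give the same value, 7, which is what commutativity and associativity promise.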
On using & to pass-by-reference instead of pass-by-value:
Your understanding is correct. Here, it doesn't make a real difference.
Pass-by-reference lets you modify the original value in place, which he doesn't do.
Pass-by-value copies the vector, which costs one extra O(n) pass; since the XOR loop is already O(n), that wouldn't meaningfully impact performance here.
So he gets style points for using pass-by-reference, and if you're using leetcode to demonstrate your diligence as a software developer it's good to see, but it's not pertinent to his solution.
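If it helps to see the difference between the two signatures, here is a small sketch. Both function names are made up for this example; the only point is what the caller observes afterwards.

```cpp
#include <vector>

// Pass-by-value: nums is a copy, so the caller's vector is untouched.
void zero_first_by_value(std::vector<int> nums) {
    if (!nums.empty()) nums[0] = 0;
}

// Pass-by-reference: nums is the caller's vector itself.
void zero_first_by_reference(std::vector<int>& nums) {
    if (!nums.empty()) nums[0] = 0;
}
```

Since the XOR solution only reads nums, either signature returns the same answer; the reference version just skips the copy.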

In C++, ^ is the bitwise XOR operator, not the power operation (which I guess you are assuming).
I don't know which problem you are talking about, but if it's finding the only unique element in an array (given every other element occurs twice),
then the logic behind solving it is
a XOR a equals 0
a XOR 0 equals a
So if we XOR all the elements present in array, we will get 0 corresponding to the elements occurring twice.
The only element remaining will be XORed with 0 and hence we get the element.
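A tiny sketch of the difference between ^ and exponentiation, in case it clears up the original confusion. The helper names xor_of and power_of are hypothetical; std::pow is the standard library function for powers.

```cpp
#include <cmath>

// Bitwise XOR: this is what ^ means in C++.
constexpr int xor_of(int a, int b) {
    return a ^ b;
}

// Exponentiation has no operator in C++; std::pow is the usual route.
double power_of(double base, double exponent) {
    return std::pow(base, exponent);
}
```

So xor_of(0, 5) is 5 rather than 0, which is why starting with a = 0 does not wipe out the result.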
The answer to the second query is that we pass the array by reference whenever we want to modify it, or simply to avoid copying it.
PS: I am also new to programming. I hope I answered your queries.


How is binary search applicable here (since the values are not monotonic)?

I am solving a LeetCode problem Search in Rotated Sorted Array, in order to learn Binary Search better. The problem statement is:
There is an integer array nums sorted in ascending order (with distinct values). Prior to being passed to your function, nums is possibly rotated at an unknown pivot index. For example, [0,1,2,4,5,6,7] might be rotated at pivot index 3 and become [4,5,6,7,0,1,2]. Given the array nums after the possible rotation and an integer target, return the index of target if it is in nums, or -1 if it is not in nums.
With some online help, I came up with the solution below, which I mostly understand:
class Solution {
public:
int search(vector<int>& nums, int target) {
int l=0, r=nums.size()-1;
while(l<r) { // 1st loop; how is BS applicable here, since array is NOT sorted?
int m=l+(r-l)/2;
if(nums[m]>nums[r]) l=m+1;
else r=m;
}
// cout<<"Lowest at: "<<r<<"\n";
if(nums[r]==target) return r; //target==lowest number
int start, end;
if(target<=nums[nums.size()-1]) {
start=r;
end=nums.size()-1;
} else {
start=0;
end=r;
}
l=start, r=end;
while(l<r) {
int m=l+(r-l)/2;
if(nums[m]==target) return m;
if(nums[m]>target) r=m;
else l=m+1;
}
return nums[l]==target ? l : -1;
}
};
My question: Are we searching over a parabola in the first while loop, trying to find the lowest point of a parabola, unlike a linear array in traditional binary search? Are we finding the minimum of a convex function? I understand how the values of l, m and r change leading to the right answer - but I do not fully follow how we can be guaranteed that if(nums[m]>nums[r]), our lowest value would be on the right.
You actually skipped something important by “getting help”.
Once, when I was struggling to integrate something tricky for Calculus Ⅰ, I went for help and the advisor said, “Oh, I know how to do this” and solved it. I learned nothing from him. It took me another week of going over it (and other problems) to understand it well enough that I could do it myself.
The purpose of these assignments is to solve the problem yourself. Even if your solution is faulty, you have learned more than simply reading and understanding the basics of one example problem someone else has solved.
In this particular case...
Since you already have a solution, let’s take a look at it: Notice that it contains two binary search loops. Why?
As you observed at the beginning, the offset shift makes the array discontinuous (not convex). However, the subarrays on either side of the discontinuity remain monotonic.
Take a moment to convince yourself that this is true.
Knowing this, what would be a good way to find and determine which of the two subarrays to search?
Hints:
A binary search, as n ⟶ ∞, is O(log n)
O(log n) ≡ O(2 log n)
I should also point out that the prompt gives as its example an arithmetic progression with a common difference of 1, but the prompt itself imposes no such restriction. All it says is that you start with a strictly increasing sequence (no duplicate values). You could have as input [19 74 512 513 3 7 12].
Does the supplied solution handle this possibility?
Why or why not?
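For experimenting with that input, here is a minimal sketch of just the first loop, pulled out into its own function. The name find_min_index is mine, and the sketch assumes a non-empty array of distinct values, as the problem statement does. The idea it illustrates: every value in the left run is greater than every value in the right run, so comparing nums[m] with nums[r] tells you which run m is in, and that yes/no answer is monotonic across the array even though the values are not.

```cpp
#include <cstddef>
#include <vector>

// Index of the smallest element of a rotated, strictly increasing array.
// Assumes nums is non-empty.
std::size_t find_min_index(const std::vector<int>& nums) {
    std::size_t l = 0;
    std::size_t r = nums.size() - 1;
    while (l < r) {
        std::size_t m = l + (r - l) / 2;
        if (nums[m] > nums[r]) {
            l = m + 1;   // m is in the higher (left) run: minimum is right of m
        } else {
            r = m;       // m is in the same run as r: minimum is at m or left of it
        }
    }
    return l;
}
```

On [19 74 512 513 3 7 12] this should land on index 4 (the value 3), without relying on the values being consecutive.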

What's the logic behind the order the elements are passed to a comparison function in std::sort?

I'm practicing lambdas:
#include <algorithm>
#include <vector>

int main()
{
std::vector<int> v {1,2,3,4};
int count = 0;
std::sort(v.begin(), v.end(), [](const int& a, const int& b) -> bool
{
return a > b;
});
}
This is just code from GeeksForGeeks to sort in descending order, nothing special. I added some print statements (but took them out for this post) to see what was going on inside the lambda. They print the entire vector, and the a and b values:
1 2 3 4
a=2 b=1
2 1 3 4
a=3 b=2
3 2 1 4
a=4 b=3
4 3 2 1 <- final
So my more detailed question is:
What's the logic behind the order the vector elements are being passed into the a and b parameters?
Is b permanently at index 0 while a is iterating? And if so, isn't it a bit odd that the second param passed to the lambda stays at the first element? Is it compiler-specific? Thanks!
By passing a predicate to std::sort(), you are specifying your sorting criterion. The predicate must return true if the first parameter (i.e., a) precedes the second one (i.e., b), for the sorting criterion you are specifying.
Therefore, for your predicate:
return a > b;
If a is greater than b, then a will precede b.
So my more detailed question is: What's the logic behind the order the vector elements are being passed into the a and b parameters?
a and b are simply pairs of elements from the range you pass to std::sort(). Which pairs are compared, and in what order, depends on the underlying algorithm that std::sort() implements. The pairs may also differ between calls with identical input if the implementation uses randomization.
Is 'b' permanently at index 0 while 'a' is iterating? And if so, isn't it a bit odd that the second param passed to the lambda stays at the first element?
No. It only looks that way because, at each step, the first element is the highest one so far.
It seems that, with this algorithm, each element is checked against (and maybe swapped with) the highest one in the first round, and the highest one is placed in the first position; so b always points to the highest one.
For Visual Studio, std::sort uses insertion sort if the sub-array size is <= 32 elements. For a larger sub-array, it uses intro sort, which is quick sort unless the "recursion" depth gets too deep, in which case it switches to heap sort. The output your program produces appears to correspond to some variation of insertion sort. Since the compare function is "less than", and since insertion sort is looking for out-of-order pairs where the left value is "greater than" the right value, the input parameters are swapped.
You just compare two elements, with a given ordering. This means that if the order is a and then b, then the lambda must return true.
Whether a or b is the first or the last element of the array, or stays fixed, depends on the sorting algorithm and, of course, on your data!
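To see how an insertion-sort variant could produce exactly the a/b pairs from the question, here is a rough sketch. It is not the code of any particular standard library; it assumes a variant that first compares each new element against the front of the range, and both function names are invented for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical insertion sort that checks each new element against the
// front element first, then falls back to a normal backwards scan.
template <typename Comp>
void insertion_sort_front_first(std::vector<int>& v, Comp comp) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int val = v[i];
        if (comp(val, v[0])) {
            // val precedes the current front: shift [0, i) right by one.
            for (std::size_t j = i; j > 0; --j) v[j] = v[j - 1];
            v[0] = val;
        } else {
            // val does not precede v[0], so this scan stops before index 0.
            std::size_t j = i;
            while (comp(val, v[j - 1])) {
                v[j] = v[j - 1];
                --j;
            }
            v[j] = val;
        }
    }
}

// Sorts v in descending order and records every (a, b) pair the
// comparison function was called with.
std::vector<std::pair<int, int>> trace_descending(std::vector<int>& v) {
    std::vector<std::pair<int, int>> calls;
    insertion_sort_front_first(v, [&calls](const int& a, const int& b) {
        calls.emplace_back(a, b);
        return a > b;
    });
    return calls;
}
```

With {1,2,3,4} as input, the recorded pairs come out as (2,1), (3,2), (4,3): b is always the current front element only because every new element happens to beat it. A different input or a different algorithm would give a different pattern.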

Translating the following C++ code into Nim

I'm trying to learn Nim by converting different pieces of code, and I've stumbled upon something which I've never seen before.
#include<bits/stdc++.h>
...
for(int t=q&1?u+x:u+x>>1;t>1;)t/=p[++cnt]=sieve[t];
...
sort(p+1,p+cnt+1);
I understand what the ternary operator is and how it works, what I don't quite get is what's going on with the variables "t" and "cnt" (both integers) and the array "p" (an array of integers). How does using an increment as the index of "p" work?
Then there's the sort function, in which I completely gave up because I couldn't find any documentation on what it does (the fact that it's taking an integer added to an array obviously doesn't help).
Let's start off by making the code a little more readable. A little bit of whitespace never hurt anybody.
for(int t = (q & 1? u + x: u + x >> 1); t > 1;)
{
t /= p[++cnt] = sieve[t];
}
what's going on with the variables "t" and "cnt" (both integers) and the array "p" (an array of integers)
So t is being set to either u + x or u + x >> 1 (that is, (u + x) >> 1) depending on what q & 1 is. Then inside the loop we are dividing t by whatever the value of sieve at the index of t is. We are also assigning that value to the p array at the position of ++cnt. ++cnt uses the pre-increment operator to increase the value of cnt by 1 and then uses that new value as the index of p.
Then there's the sort function, in which I completely gave up because I couldn't find any documentation on what it does
For this I am assuming they are using the std::sort() function. When dealing with arrays, the name of the array is treated as a pointer to the first element of the array. So when we see sort(p+1,p+cnt+1); you can translate it to sort(one from the beginning of the array, cnt + 1 elements from the beginning of the array);. So this is going to sort all of the elements in the array from one past the beginning of the array up to one less than cnt + 1 elements from the beginning, i.e. p[1] through p[cnt].
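Here is the same loop and sort written out step by step, as a sketch that may be easier to port to Nim. Two loud assumptions: the original never shows how sieve is built, so build_sieve below is my guess (sieve[n] holds the smallest prime factor of n), and the function names are invented for this example.

```cpp
#include <algorithm>
#include <vector>

// ASSUMPTION: sieve[n] is the smallest prime factor of n (for n >= 2).
std::vector<int> build_sieve(int limit) {
    std::vector<int> sieve(limit + 1, 0);
    for (int i = 2; i <= limit; ++i) {
        if (sieve[i] != 0) continue;           // already marked, so composite
        for (int j = i; j <= limit; j += i) {
            if (sieve[j] == 0) sieve[j] = i;   // first prime to reach j
        }
    }
    return sieve;
}

// Expanded form of:
//   for (...; t > 1;) t /= p[++cnt] = sieve[t];
//   sort(p+1, p+cnt+1);
// Returns cnt; the collected values end up in p[1..cnt] (p[0] is unused).
int collect_factors(int t, const std::vector<int>& sieve, int* p) {
    int cnt = 0;
    while (t > 1) {
        int factor = sieve[t];   // read sieve at the current t
        ++cnt;                   // pre-increment: bump cnt first...
        p[cnt] = factor;         // ...then use the new value as the index
        t /= factor;             // finally divide t by the stored value
    }
    std::sort(p + 1, p + cnt + 1);   // sorts p[1] .. p[cnt]
    return cnt;
}
```

Under that assumption, calling collect_factors with t = 60 fills p[1..4] with 2, 2, 3, 5 and returns 4.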
Are you trying to learn Nim as you said, or trying to learn C? Both things you asked about are pretty basic C:
++cnt has the side effect (cnt=cnt+1) combined with the value that cnt ends up with. That value is used as the index. The side effect is a side effect.
p+1 and p+cnt+1 are each pointers. The name of an array is treated as a constant pointer to the first element of that array in most uses within C. A pointer plus an integer is another pointer, pointing that number of elements past the original.

How to write a 2d-array C++ function-parameter (eg. for int[x][y])?

I have a 2D array
int tilemap[800][600];
I want to use such an array in a function (as a parameter)
void load(int* tiles);
But the datatype int* is not the right one.
What is the right one? I have no idea.
Thanks!
void load(int tiles[][600]);
You can represent a matrix in two ways. One is using a pointer to an array of pointers (rows), each pointing to an array of data.
The other way is the one you have chosen.
It's, as you said, a 2-D array, which means that (in the case of C/C++) the rows of data sit one after another in memory.
This matrix:
+-------+
| 7 | 9 |
+-------+
| 5 | 2 |
+-------+
is represented like this in memory:
[ [7] [9] ] [ [5] [2] ]
To get the element at position [1][0], the computer needs to calculate its address from the address of the matrix (the address of its first element) and the position numbers.
In the above example you can see that the address of [1][0] is (address of number 7) + 2*sizeof(int), that is, &matrix[0][0] + 2.
So the actual formula to calculate the address of element [i][j] is
&matrix[0][0] + (number_of_columns*i + j)
I omitted *sizeof(int) or *sizeof(type) because arithmetic on a pointer and an integer already includes it, but if the base is a raw address to some data, without knowledge of its type, it should be included.
That's what happens at the level of machine instructions: the computer calculates
address_of_matrix + (number_of_columns*i + j)*sizeof(type)
So you need to tell the compiler at least how many columns this matrix has.
You'll also need the number of rows of course, but that's something you need, not the compiler.
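If you want to check the formula yourself, here is a small sketch that measures the byte distance between the first element and element [i][j]. The helper name byte_offset is made up; it goes through char pointers so the subtraction is counted in bytes.

```cpp
#include <cstddef>

// Byte distance from matrix[0][0] to matrix[i][j].
// R and C (rows, columns) are deduced from the array argument.
template <std::size_t R, std::size_t C>
std::ptrdiff_t byte_offset(int (&matrix)[R][C], std::size_t i, std::size_t j) {
    const char* base = reinterpret_cast<const char*>(&matrix[0][0]);
    const char* elem = reinterpret_cast<const char*>(&matrix[i][j]);
    return elem - base;   // measured in bytes because of the char pointers
}
```

For the 2x2 example above, byte_offset(matrix, 1, 0) should equal (2*1 + 0)*sizeof(int), matching address_of_matrix + (number_of_columns*i + j)*sizeof(type).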
I hope it's clearer. It's hard to explain it here. But there is a lot of material on the net.
the datatype int* is not the right one.
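Actually, it is just fine, as long as you pass a pointer to the first element (for example &tilemap[0][0]) and compute the index yourself. An array of arrays has all elements stored sequentially, so a pointer to the first is sufficient to find all the others.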
The "best" option, though, is a reference-to-array-of-array, because that will preserve the size information:
template<size_t M, size_t N>
void load(int (&tiles)[M][N])
{
}
(Note that like all templates, the definition should be visible at the point of instantiation)
If the second extent is always fixed, you can pass an array-of-array as pointer-to-array, because array-of-T can always be passed as pointer-to-T:
void load(int (*tiles)[600])
{
}
This is what both Marko and 2501's example code is doing, although they are using a misleading syntax that looks like an array, while in fact it is a pointer.
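A short sketch of how the two signatures above might be filled in and called. The function names element_count and fill_tiles are placeholders of mine, and the extra rows parameter is needed because the pointer-to-array form does not carry the first extent.

```cpp
#include <cstddef>

// Reference-to-array-of-array: both extents are deduced from the argument.
template <std::size_t M, std::size_t N>
std::size_t element_count(int (&tiles)[M][N]) {
    (void)tiles;          // only the deduced sizes are used here
    return M * N;
}

// Pointer-to-array: the second extent is fixed at 600, so the number of
// rows has to be passed separately.
void fill_tiles(int (*tiles)[600], std::size_t rows, int value) {
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < 600; ++j) {
            tiles[i][j] = value;
        }
    }
}
```

With int tilemap[800][600]; in scope, element_count(tilemap) would deduce M = 800 and N = 600, and fill_tiles(tilemap, 800, 0) would clear the whole map.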

Super long arrays in C++

I have two sets A and B. Set A contains unique elements. Set B contains all elements. Each element in B is a 10 by 10 matrix where all entries are either 1 or 0. I need to scan through set B and every time I encounter a new matrix I will add it to set A. Therefore set A is a subset of B containing only unique matrices.
It seems like you might really be looking for a way to manage a large, sparse array. Trivially, you could use a hash map with your giant index as your key, and your data as the value. If you talk more about your problem, we might be able to find a more appropriate data structure for your problem.
Update:
If set B is just some set of matrices and not the set of all possible 10x10 binary matrices, then you just want a sparse array. Every time you find a new matrix, you compute its key (which could simply be the matrix converted into a 100 digit binary value, or even a 100 character string!), look up that index. If no such key exists, insert the value 1 for that key. If the key does exist, increment and re-store the new value for that key.
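A minimal sketch of that key-and-count idea, assuming each matrix is stored as a std::bitset<100> (say, row-major). The Matrix alias and the function name count_matrices are my own; the standard library already provides a hash for std::bitset, so it can be used as an unordered_map key directly.

```cpp
#include <bitset>
#include <unordered_map>
#include <vector>

// ASSUMPTION: a 10x10 binary matrix packed into 100 bits.
using Matrix = std::bitset<100>;

// Maps each distinct matrix in B to the number of times it occurs.
// The keys of the result are exactly the unique matrices (set A).
std::unordered_map<Matrix, int> count_matrices(const std::vector<Matrix>& B) {
    std::unordered_map<Matrix, int> counts;
    for (const Matrix& m : B) {
        ++counts[m];   // inserts with count 0 on first sight, then increments
    }
    return counts;
}
```

Only the matrices that actually occur take up memory, which is the "sparse array" part.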
Here is some code, maybe not very efficient :
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>
// I assume your 10x10 boolean matrix is implemented as a bitset of 100 bits.
// Comparison of bitsets
template<size_t N>
class bitset_comparator
{
public :
bool operator () (const std::bitset<N> & a, const std::bitset<N> & b) const
{
for(size_t i = 0 ; i < N ; ++i)
{
if( !a[i] && b[i] ) return true ;
else if( !b[i] && a[i] ) return false ;
}
return false ;
}
} ;
int main(int, char * [])
{
std::set< std::bitset<100>, bitset_comparator<100> > A ;
std::vector< std::bitset<100> > B ;
// Fill B in some manner ...
// Keeping unique elements in A
std::copy(B.begin(), B.end(), std::inserter(A, A.begin())) ;
}
You can use std::list instead of std::vector. The relative order of elements in B is not preserved in A (elements in A are sorted).
EDIT : I inverted A and B in my first post. It's correct now. Sorry for the inconvenience. I also corrected the comparison functor.
Each element in the B is a 10 by 10 matrix where all entries are either 1 or 0.
Good, that means it can be represented by a 100-bit number. Let's round that up to 128 bits (sixteen bytes).
One approach is to use linked lists - create a structure like (in C):
typedef struct sNode {
unsigned char bits[16];
struct sNode *next;
} sNode;
and maintain the entire list B as a sorted linked list.
The performance will be somewhat less (a) than using the 100-bit number as an array index into a truly immense (to the point of impossible given the size of the known universe) array.
When it comes time to insert a new item into B, insert it at its desired position (before one that's equal or greater). If it was a brand new one (you'll know this if the one you're inserting before is different), also add it to A.
(a) Though probably not unmanageably so - there are options you can take to improve the speed.
One possibility is to use skip lists, for faster traversal during searches. These are another pointer that references not the next element but one 10 (or 100 or 1000) elements along. That way you can get close to the desired element reasonably quickly and just do the one-step search after that point.
Alternatively, since you're talking about bits, you can divide B into (for example) 1024 sub-B lists. Use the first 10 bits of the 100-bit value to figure out which sub-B you need to use and only store the next 90 bits. That alone would increase search speed by a factor of about 1000 on average (use more leading bits and more sub-Bs if you need improvement on that).
You could also use a hash on the 100-bit value to generate a smaller key which you can use as an index into an array/list, but I don't think that will give you any real advantage over the method in the previous paragraph.
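As a rough illustration of the sub-B idea, here is a sketch that picks one of 1024 buckets from the 10 highest bits. To keep it short it swaps in plain vectors with a linear scan instead of sorted linked lists, and it stores the full 100-bit value rather than only the remaining 90 bits. All names here (Matrix, bucket_of, BucketedSet) are invented for the example.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// ASSUMPTION: a 10x10 binary matrix packed into 100 bits.
using Matrix = std::bitset<100>;

// Bucket index taken from the 10 highest bits (bits 90..99).
std::size_t bucket_of(const Matrix& m) {
    return static_cast<std::size_t>((m >> 90).to_ulong());
}

class BucketedSet {
public:
    BucketedSet() : buckets_(1024) {}

    // Returns true if m had not been seen before (i.e. it belongs in A).
    bool insert(const Matrix& m) {
        std::vector<Matrix>& bucket = buckets_[bucket_of(m)];
        for (const Matrix& seen : bucket) {
            if (seen == m) return false;   // duplicate
        }
        bucket.push_back(m);
        return true;
    }

private:
    std::vector<std::vector<Matrix>> buckets_;   // 1024 sub-B lists
};
```

Each lookup only scans the one bucket that shares the leading 10 bits, which is where the speed-up comes from when the matrices are spread evenly.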
Convert each matrix into a string of 100 binary digits. Now run it through the Linux utilities:
sort | uniq
If you really need to do this in C++, it is possible to implement your own merge sort, then the uniq part becomes trivial.
You don't need N buckets where N is the number of all possible inputs. A balanced binary search tree will do just fine. That is how the set class in C++ is typically implemented.
vector<vector<vector<int> > > A; // vector of 10x10 matrices
// fill the matrices in A here
set<vector<vector<int> > > B(A.begin(), A.end()); // voila!
// now B contains all elements in A, but only once for duplicates