I came up with the following algorithm to find the second most occurring character in a string, and I want to work out its time complexity. The algorithm is divided into two parts. In the first part, characters are inserted into a map in O(n). I am having difficulty with the second part: iterating over the map is O(n), and push and pop are O(log(n)). What would be the big-O complexity of the second part, and what would the overall complexity be? Any help understanding this would be great.
#include <cstddef>
#include <iostream>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

void findKthHighestChar(int k, std::string str)
{
    std::unordered_map<char, int> map;
    // Step 1: O(n)
    for (std::size_t i = 0; i < str.size(); i++)
    {
        map[str[i]]++;
    }
    // Step 2: O(n*log(?)) -- the part I'm unsure about
    // Iterate through the map
    using mypair = std::pair<int, char>;
    std::priority_queue<mypair, std::vector<mypair>, std::greater<mypair>> pq;
    for (auto it = map.begin(); it != map.end(); it++) // This is O(n).
    {
        pq.push(mypair(it->second, it->first)); // push is O(log(n))
        if (static_cast<int>(pq.size()) > k) {
            pq.pop(); // pop() is O(log(n))
        }
    }
    std::cout << k << " highest is " << pq.top().second;
}
You have two input variables, k and n (with k < n), and one hidden one: the alphabet size A.
Step 1 has average-case complexity O(n).
Step 2 is O(min(A, n) * log(k)):
Iterating the map is O(min(A, n)).
The queue size is bounded by k, so its operations are O(log(k)).
The whole algorithm is therefore O(n) + O(min(A, n) * log(k)).
If we simplify and get rid of some variables to keep only n:
(k -> n, A -> n): O(n) + O(n * log(n)), so O(n * log(n)).
(k -> n, A constant, so min(A, n) -> A): O(n) + O(log(n)), so O(n).
Does it have to be this algorithm?
You can use an array (of the size of your alphabet) to hold the frequencies.
You can populate it in O(n) (one pass through your string). Then you can find the largest, or second-largest, frequency in one more pass. Still O(n).
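As a sketch of that counting-array idea (the function name and the 256-entry ASCII table are assumptions, not from the original post):

```cpp
#include <array>
#include <string>

// One pass to count frequencies (O(n)), one pass over the fixed-size
// table to find the character with the second-highest count (O(A), A = 256).
char secondMostFrequent(const std::string& str) {
    std::array<int, 256> freq{};  // zero-initialized counts, one slot per char
    for (unsigned char c : str) freq[c]++;

    int best = -1, second = -1;   // character codes of the top-two counts
    for (int c = 0; c < 256; ++c) {
        if (freq[c] == 0) continue;
        if (best < 0 || freq[c] > freq[best]) { second = best; best = c; }
        else if (second < 0 || freq[c] > freq[second]) { second = c; }
    }
    return second < 0 ? '\0' : static_cast<char>(second);
}
```

For "aaabbc" this returns 'b' (counts a:3, b:2, c:1). No heap is needed because the candidate set is bounded by the alphabet size.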
So I have an array that has even and odd numbers in it.
I have to sort it with the odd numbers first and then the even numbers.
Here is my approach:
int key, val;
int odd = 0;
int index = 0;
for (int i = 0; i < max; i++)
{
    if (arr[i] % 2 != 0)
    {
        int temp = arr[index];
        arr[index] = arr[i];
        arr[i] = temp;
        index++;
        odd++;
    }
}
First I separate the even and odd numbers, then I apply sorting.
For sorting I have this code:
for (int i = 1; i < max; i++)
{
    key = arr[i];
    if (i < odd)
    {
        val = 0;
    }
    if (i >= odd)
    {
        val = odd;
    }
    for (int j = i; j > val && key < arr[j - 1]; j--)
    {
        arr[j] = arr[j - 1];
        arr[j - 1] = key;
    }
}
The problem I am facing is that I can't work out the complexity of the above sorting code.
Insertion sort is applied to the odd numbers first.
When they are done, I skip that part and start sorting the even numbers.
Here is my approach for sorting once I have the partitioned array, e.g.: 3 5 7 9 2 6 10 12
(complexity table omitted)
How does all this work?
In the first for loop I traverse the array and put all the odd numbers before the even numbers.
But it doesn't sort them.
The next for loop contains the insertion sort. Using the if statements, I first sort only the odd numbers in the array. Then, once i == odd, the nested for loop no longer runs back through the odd numbers; instead it only covers the even numbers and sorts them.
I'm assuming you know the complexity of your partitioning (let's say A) and sorting algorithms (let's call this one B).
You first partition your n-element array, then sort m elements, and finally sort n - m elements. So the total complexity would be:
A(n) + B(m) + B(n - m)
Depending on what A and B actually are, you should be able to simplify that further.
Edit: Btw, unless the goal of your code is to try and implement partitioning/sorting algorithms, I believe this is much clearer:
#include <algorithm>
#include <iterator>
template <class T>
void partition_and_sort (T & values) {
    auto isOdd = [](auto const & e) { return e % 2 != 0; }; // != 0, so negative odds count too
    auto middle = std::partition(std::begin(values), std::end(values), isOdd);
    std::sort(std::begin(values), middle);
    std::sort(middle, std::end(values));
}
Complexity in this case is O(n) + 2 * O(n * log(n)) = O(n * log(n)).
Edit 2: I wrongly assumed std::partition keeps the relative order of elements. That's not the case. Fixed the code example.
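For illustration, a self-contained usage sketch (the sample values are made up; the template is repeated so the snippet compiles on its own, and its predicate uses e % 2 != 0 so negative odd numbers are classified as odd as well):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Partition odds before evens, then sort each block independently.
template <class T>
void partition_and_sort(T& values) {
    auto isOdd = [](auto const& e) { return e % 2 != 0; };
    auto middle = std::partition(std::begin(values), std::end(values), isOdd);
    std::sort(std::begin(values), middle);  // sort the odd block
    std::sort(middle, std::end(values));    // sort the even block
}
```

Given {2, 9, 3, 6, 1, 4}, this yields {1, 3, 9, 2, 4, 6}: the odds sorted first, then the evens sorted.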
I'm still a little confused about what the runtime complexity is of a std::map in C++. I know that the first for loop in the algorithm below takes O(N) or linear runtime. However, the second for loop has another for loop iterating over the map. Does that add anything to the overall runtime complexity? In other words, what is the overall runtime complexity of the following algorithm? Is it O(N) or O(Nlog(N)) or something else?
vector<int> smallerNumbersThanCurrent(vector<int>& nums) {
    vector<int> result;
    map<int, int> mp;
    for (int i = 0; i < nums.size(); i++) {
        mp[nums[i]]++;
    }
    for (int i = 0; i < nums.size(); i++) {
        int numElements = 0;
        for (auto it = mp.begin(); it != mp.end(); it++) {
            if (it->first < nums[i]) numElements += it->second;
        }
        result.push_back(numElements);
    }
    return result;
}
The complexity of a map matters for insertion, deletion, search, and so on; iterating it is always linear in its size.
Two for loops nested like this produce O(N^2) time, map or not, given the up-to-N iterations of the inner loop (the size of the map) for each iteration of the outer loop (the size of the vector; in your code the map has the same size when all elements of the vector are distinct).
Your second for loop runs nums.size() times, so let's call that N. In the worst case (all values distinct) the map has as many entries as nums, so it also contains N entries. Two nested loops of size N give N*N, or N^2.
The begin and end functions invoked by map are constant time, because they each hold a pointer from what I can tell:
C++ map.end function documentation
Note that if you have two nested loops where the outer one has size N and the inner one a different size M, the complexity is O(M*N), not O(N^2). Be careful on that point; but yes, if N is the same for both loops, then the runtime is N^2.
I have an unsorted vector of N elements and would like to find the K lowest or largest elements. K is expected to be way smaller than N (K << N), but the algorithm should also remain efficient for larger values of K, e.g. 50-80% of N.
Thinking along the lines of reusing quicksort would mean using exactly the Kth smallest/largest element as the pivot to partition. But finding the Kth smallest/largest value is already computing the solution to the OP.
Here is the partition bit of Quicksort:
template<typename T>
int partition(std::vector<T>& arr, int low, int high, T pivot) {
    int i = (low - 1);
    for (int j = low; j <= high - 1; ++j) {
        if (arr[j] <= pivot) {
            i++;
            std::swap(arr[i], arr[j]);
        }
    }
    std::swap(arr[i + 1], arr[high]);
    return (i + 1);
}
If I knew the pivot value corresponding to the Kth smallest/largest element, then I could use the partition above to solve my OP.
std::partial_sort will put the least (greatest) K elements at the front of a container, and sort them. Call it like:
std::partial_sort(arr.begin(), arr.begin() + K, arr.end());
std::partial_sort(arr.begin(), arr.begin() + K, arr.end(), std::greater<>());
It will run in about O(N log K) time.
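A quick sketch of the first form (firstKSorted and the sample data are made up for illustration):

```cpp
#include <algorithm>
#include <vector>

// partial_sort leaves the K smallest values, sorted, in arr[0..K-1];
// the rest of the range is left in unspecified order.
std::vector<int> firstKSorted(std::vector<int> arr, int K) {
    std::partial_sort(arr.begin(), arr.begin() + K, arr.end());
    arr.resize(K);
    return arr;
}
```

For {5, 1, 4, 2, 3} with K = 3 the result is {1, 2, 3}, already in order, which is the difference from nth_element below.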
The standard library std::nth_element algorithm does what you want in O(n) complexity. Given the call:
std::nth_element(arr.begin(), arr.begin() + K, arr.end());
The Kth element is the element that would occur at that position if the whole range were sorted. Elements before the Kth will all be less than or equal to it.
By default the algorithm uses the less-than operator. If you want the largest K elements you can use a different compare function, such as:
std::nth_element(arr.begin(), arr.begin() + K, arr.end(), std::greater<>{});
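For illustration, a sketch of what the call leaves in the range (kSmallest and the data are made up; the final sort is only there to make the result order deterministic):

```cpp
#include <algorithm>
#include <vector>

// After nth_element, the K smallest values occupy arr[0..K-1] in
// unspecified order; sorting that prefix makes the result easy to inspect.
std::vector<int> kSmallest(std::vector<int> arr, int K) {
    std::nth_element(arr.begin(), arr.begin() + K, arr.end());
    arr.resize(K);
    std::sort(arr.begin(), arr.end());
    return arr;
}
```

For {7, 2, 9, 4, 1, 8, 3} with K = 3 the prefix ends up holding {1, 2, 3}. The O(n) bound applies to the nth_element step; the extra sort here costs O(K log K).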
Take a look at the median of medians algorithm (https://en.m.wikipedia.org/wiki/Median_of_medians). It takes O(n) time and does exactly that. It's one of the most efficient algorithms for this problem, if not the best one.
I have some different implementations of code for finding the Kth largest element in an unsorted array. All three implementations use a min- or max-heap, but I am having trouble figuring out the runtime complexity of one of them.
Implementation 1:
int findKthLargest(vector<int> vec, int k)
{
    // build min-heap
    make_heap(vec.begin(), vec.end(), greater<int>());
    for (int i = 0; i < k - 1; i++) {
        vec.pop_back();
    }
    return vec.back();
}
Implementation 2:
int findKthLargest(vector<int> vec, int k)
{
    // build max-heap
    make_heap(vec.begin(), vec.end());
    for (int i = 0; i < k - 1; i++) {
        // move max. elem to back (from front)
        pop_heap(vec.begin(), vec.end());
        vec.pop_back();
    }
    return vec.front();
}
Implementation 3:
int findKthLargest(vector<int> vec, int k)
{
    // max-heap prio. q
    priority_queue<int> pq(vec.begin(), vec.end());
    for (int i = 0; i < k - 1; i++) {
        pq.pop();
    }
    return pq.top();
}
From my reading, I am under the impression that the runtime of the SECOND one is O(n) + O(k log n) = O(n + k log n): building the max-heap is done in O(n), and popping takes O(log n) each of the k times we do it.
However, here is where I get confused. For the FIRST one, with a min-heap, I assume building the heap is O(n). Since it is a min-heap, larger elements are in the back. Then popping the back element k times costs k * O(1) = O(k). Hence the complexity would be O(n + k).
Similarly, for the third one, I assume the complexity is also O(n + k log n), with the same reasoning I used for the max-heap.
But some sources say this problem cannot be done faster than O(n + k log n) with heaps/priority queues! Yet in my FIRST example the complexity appears to be O(n + k). Correct me if I'm wrong.
Properly implemented, getting the kth largest element from a min-heap is O((n-k) * log(n)). Getting the kth largest element from a max-heap is O(k * log(n)).
Your first implementation is not at all correct. For example, if you wanted to get the largest element from the heap (k == 1), the loop body would never be executed. Your code assumes that the last element in the vector is the largest element on the heap. That is incorrect. For example, consider the heap:
    1
  3   2
That is a perfectly valid heap, which would be represented by the vector [1,3,2]. Your first implementation would not work to get the 1st or 2nd largest element from that heap.
The second solution looks like it would work.
Your first two solutions end up removing items from vec. Is that what you intended?
The third solution is correct. It takes O(n) to build the heap, and O((k - 1) log n) to remove the (k-1) largest items. And then O(1) to access the largest remaining item.
There is another way to do it, that is potentially faster in practice. The idea is:
build a min-heap of size k from the first k elements in vec
for each following element
    if the element is larger than the smallest element on the heap
        remove the smallest element from the heap
        add the new element to the heap
return the element at the top of the heap
This is O(k) to build the initial heap. Then it's O((n-k) log k) in the worst case for the remaining items. The worst case occurs when the initial vector is in ascending order. That doesn't happen very often. In practice, a small percentage of items are added to the heap, so you don't have to do all those removals and insertions.
Some heap implementations have a heap_replace method that combines the two steps of removing the top element and adding the new element. That reduces the complexity by a constant factor. (i.e. rather than an O(log k) removal followed by an O(log k) insertion, you get an constant time replacement of the top element, followed by an O(log k) sifting it down the heap).
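The steps above might look like this in C++ (kthLargest is a made-up name; the sketch assumes 1 <= k <= vec.size()):

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Keep a min-heap of the k largest values seen so far; its top is the
// kth largest overall once every element has been considered.
int kthLargest(const std::vector<int>& vec, int k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap(
        vec.begin(), vec.begin() + k);   // O(k) initial build
    for (std::size_t i = k; i < vec.size(); ++i) {
        if (vec[i] > heap.top()) {       // only larger elements enter the heap
            heap.pop();                  // O(log k)
            heap.push(vec[i]);           // O(log k)
        }
    }
    return heap.top();
}
```

For {3, 2, 1, 5, 6, 4} with k = 2 this returns 5, the second largest. The comparison guard is what makes the average case cheap: most elements never touch the heap.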
Here is a heap solution for Java. We remove all elements that are smaller than the kth element from the min-heap. After that, the kth largest element sits at the top of the min-heap.
class Solution {
    int kLargest(int[] arr, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> Integer.compare(a, b));
        for (int a : arr) {
            heap.add(a);
            if (heap.size() > k) {
                // remove smallest element in the heap
                heap.poll();
            }
        }
        // return kth largest element
        return heap.poll();
    }
}
The worst-case time complexity is O(N log K), where N is the total number of elements. You use one heapify operation per element when inserting the initial k elements into the heap; after that you use two operations per element (one insert and one remove). This makes the worst-case time complexity O(N log K). You can improve it with other methods and bring the average-case time complexity of a heap update to Θ(1). Read this for more info.
Quickselect: Θ(N)
If you're looking for a solution that is faster on average, the quickselect algorithm, which is based on quicksort, is a good option. It provides O(N) average-case time complexity and O(1) space complexity. The worst-case time complexity is O(N^2), but the randomized pivot (used in the following code) makes that scenario very unlikely. The following is quickselect code for finding the kth largest element.
class Solution {
    public int findKthLargest(int[] nums, int k) {
        return quickselect(nums, k);
    }

    private int quickselect(int[] nums, int k) {
        int n = nums.length;
        int start = 0, end = n - 1;
        while (start < end) {
            int ind = partition(nums, start, end);
            if (ind == n - k) {
                return nums[ind];
            } else if (ind < n - k) {
                start = ind + 1;
            } else {
                end = ind - 1;
            }
        }
        return nums[start];
    }

    private int partition(int[] nums, int start, int end) {
        int pivot = start + (int) (Math.random() * (end - start));
        swap(nums, pivot, end);
        int left = start;
        for (int curr = start; curr < end; curr++) {
            if (nums[curr] < nums[end]) {
                swap(nums, left, curr);
                left++;
            }
        }
        swap(nums, left, end);
        return left;
    }

    private void swap(int[] nums, int i, int j) {
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }
}
Here is an algorithm counting occurrences of anagrams of one string (search_word) in the other (text):
#include <iostream>
#include <algorithm>
#include <string>
#include <deque>
using namespace std;

int main()
{
    string text = "forxxorfxdofr";
    string search_word = "for";
    deque<char> word;
    word.insert(word.begin(), text.begin(), text.begin() + search_word.size());
    int ana_cnt = 0;
    for (int ix = 3; ix <= text.size(); ++ix)
    {
        deque<char> temp = word;
        sort(word.begin(), word.end());
        if (string(word.begin(), word.end()) == search_word)
            ++ana_cnt;
        word = temp;
        word.pop_front();
        word.push_back(text[ix]);
    }
    cout << ana_cnt << endl;
}
What's the complexity of this algorithm?
I think it's an O(n) algorithm, where n is the length of text, because the amount of time needed to execute the inside of the for loop is independent of n. However, some think it is not O(n): they say the sorting algorithm also counts when computing the complexity.
It's O(n) if you only consider the string text with length n as input.
Proof: You're looping over ix from 3 (probably search_word.size(), isn't it?) to text.size(), so asymptotically you execute the loop body n times (there is no break, continue, or modification of ix in the loop body).
The loop body is independent of n. It sorts a deque of fixed size, namely m = search_word.size(). That is O(m log(m)) on average (worst case O(m^2) for a plain quicksort; since C++11, std::sort is required to be O(m log(m)) even in the worst case). As this is independent of n, we're done, with a total of O(n).
It's not O(n) if you want to be a little more precise: counting search_word with length m as input as well, this comes to a total of O(n m log(m)) on average, O(n m^2) in the worst case.
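For contrast, a counting-based sliding window avoids the per-window sort entirely (countAnagrams is a made-up name; the sketch assumes lowercase letters only). Comparing the two 26-entry tables per window makes it O(n * A) with A = 26; tracking a running mismatch count instead would bring it down to O(n + m):

```cpp
#include <array>
#include <cstddef>
#include <string>

// Slide a window of word.size() over text, maintaining character counts;
// a window is an anagram exactly when its counts match the word's counts.
int countAnagrams(const std::string& text, const std::string& word) {
    const std::size_t m = word.size();
    if (m == 0 || m > text.size()) return 0;
    std::array<int, 26> need{}, have{};
    for (char c : word) need[c - 'a']++;
    int count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        have[text[i] - 'a']++;                  // char entering the window
        if (i >= m) have[text[i - m] - 'a']--;  // char leaving the window
        if (i + 1 >= m && have == need) count++;
    }
    return count;
}
```

On the example above, countAnagrams("forxxorfxdofr", "for") finds the same three matches ("for", "orf", "ofr") as the sorting version.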