So I have been brushing up on C++ and found out I am very rusty compared to my other languages. I have been working on this problem from codewars.com:
Given a list lst and a number N, create a new list that contains each number of lst at most N times without reordering. For example if N = 2, and the input is [1,2,3,1,2,1,2,3], you take [1,2,3,1,2], drop the next [1,2] since this would lead to 1 and 2 being in the result 3 times, and then take 3, which leads to [1,2,3,1,2,3].
To accomplish this task I wanted to create a multidimensional vector to hold the unique values of the provided list in the first dimension and the corresponding number of occurrences in the second dimension. However, I was unfamiliar with C++'s syntax for this, so I just made two separate vectors (instances and countOfInstances).
Essentially what my algorithm will do is:
loop through the provided array (arr)
check to see if the value in "arr" does not exist in "instances"
If not found, then push the value in "arr" to "instances",
add a counting value of 1 that corresponds to this index in "countOfInstances",
and then add the value in "arr" to the nFilteredVector.
If the value in "arr" is found in "instances", then:
Find the index of the value of "arr" in "instances"
Use this index to find its corresponding count value in "countOfInstances"
Determine if the count is less than the provided "N"
if less than "N", add to "nFilteredVector"
Then increment the value in "countOfInstances"
However, when I try to access an index of "countOfInstances" using the index of "instances", I get an odd error:
no viable overloaded operator[] for type 'std::vector'
if (countOfInstances[std::find(instances.begin(), instances.end(),arr[i])] <=2){
Correct me if I am wrong, but it is my understanding that the find function returns the index of the found element. I wanted to use that index to access the "countOfInstances" vector.
Can someone please help me figure out the correct syntax for what I am looking for? Bonus points for integrating "instances" and "countOfInstances" as a multidimensional vector!!
#include <algorithm>
#include <vector>

std::vector<int> deleteNth(std::vector<int> arr, int n)
{
    std::vector<int> nFilteredVector;
    std::vector<int> instances;
    std::vector<int> countOfInstances;
    for (int i = 0; i < arr.size(); i++){
        if (std::find(instances.begin(), instances.end(), arr[i]) == instances.end()){
            // value not found: add it to instances, then add an element of 1 to the
            // corresponding index of the countOfInstances vector.
            instances.push_back(arr[i]);
            countOfInstances.push_back(1);
            nFilteredVector.push_back(arr[i]);
        } else { // value is found, just need to increment the value in countOfInstances
            // find the index of the value in the instances vector, use that to find
            // the corresponding value in countOfInstances
            if (countOfInstances[std::find(instances.begin(), instances.end(), arr[i])] <= n){
                nFilteredVector.push_back(arr[i]);
            }
            countOfInstances[std::find(instances.begin(), instances.end(), arr[i])]++;
        }
    }
    return nFilteredVector;
}
Here are some examples of what codewars will be testing for:
{
Assert::That(deleteNth({20,37,20,21}, 1), Equals(std::vector<int>({20, 37, 21})));
Assert::That(deleteNth({1,1,3,3,7,2,2,2,2}, 3), Equals(std::vector<int>({1, 1, 3, 3, 7, 2, 2, 2})));
}
If what you are trying to achieve is to get the index of the found item in a std::vector, the following does the job using std::distance:
#include <algorithm>
#include <vector>
auto iter = std::find(instances.begin(), instances.end(),arr[i]);
if ( iter != instances.end())
{
// get the index of the found item
auto index = std::distance(instances.begin(), iter);
//...
}
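Applied to the function from the question, a sketch of the fix could look like the following. Note that the count check also needs to be < n (not <= n) for each value to appear at most n times, since the first occurrence is already pushed with a count of 1:
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

std::vector<int> deleteNth(std::vector<int> arr, int n)
{
    std::vector<int> nFilteredVector;
    std::vector<int> instances;
    std::vector<int> countOfInstances;
    for (std::size_t i = 0; i < arr.size(); i++){
        auto iter = std::find(instances.begin(), instances.end(), arr[i]);
        if (iter == instances.end()){
            // first time we see this value: remember it with a count of 1
            instances.push_back(arr[i]);
            countOfInstances.push_back(1);
            nFilteredVector.push_back(arr[i]);
        } else {
            // convert the iterator into an index usable with countOfInstances
            auto index = std::distance(instances.begin(), iter);
            if (countOfInstances[index] < n){   // keep it only while under the limit
                nFilteredVector.push_back(arr[i]);
            }
            countOfInstances[index]++;
        }
    }
    return nFilteredVector;
}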
I believe std::find returns an iterator into instances. You cannot use an iterator from one vector on another, and you cannot use an iterator as an index.
What you could do is use
std::find(instances.begin(), instances.end(), arr[i]) - instances.begin()
as your index. This is a bit ugly, so you might also want to look a bit more at iterators and how to use them.
I'm practicing lambdas:
#include <algorithm>
#include <vector>

int main()
{
    std::vector<int> v {1,2,3,4};
    int count = 0;
    std::sort(v.begin(), v.end(), [](const int& a, const int& b) -> bool
    {
        return a > b;
    });
}
This is just code from GeeksForGeeks to sort in descending order, nothing special. I added some print statements (but took them out for this post) to see what was going on inside the lambda. They print the entire vector, and the a and b values:
1 2 3 4
a=2 b=1
2 1 3 4
a=3 b=2
3 2 1 4
a=4 b=3
4 3 2 1 <- final
So my more detailed question is:
What's the logic behind the order the vector elements are being passed into the a and b parameters?
Is b permanently at index 0 while a is iterating? And if so, isn't it a bit odd that the second param passed to the lambda stays at the first element? Is it compiler-specific? Thanks!
By passing a predicate to std::sort(), you are specifying your sorting criterion. The predicate must return true if the first parameter (i.e., a) should precede the second one (i.e., b) under that criterion.
Therefore, for your predicate:
return a > b;
If a is greater than b, then a will precede b.
So my more detailed question is: What's the logic behind the order the vector elements are being passed into the a and b parameters?
a and b are just pairs of elements from the range you are passing to std::sort(). The "logic" will depend on the underlying algorithm that std::sort() implements. The pairs may also differ between calls with identical input due to randomization.
Is 'b' permanently at index 0 while 'a' is iterating? And if so, isn't it a bit odd that the second param passed to the lambda stays at the first element?
No: it is just that the first element is the largest. It seems that, with this algorithm, all elements are checked (and maybe swapped) against the largest one in the first round, and the largest one is placed in the first position; so b always points to the largest one.
For Visual Studio, std::sort uses insertion sort if the sub-array size is <= 32 elements. For a larger sub-array, it uses introsort, which is quicksort unless the recursion depth gets too deep, in which case it switches to heapsort. The output your program produces appears to correspond to some variation of insertion sort. Since the compare function plays the role of "less than", and since insertion sort looks for out-of-order pairs where a left value is "greater than" a right value, the input parameters are swapped.
You just compare two elements with a given ordering. This means that if the order is a and then b, the lambda must return true.
Whether a or b is the first or the last element of the array, or a fixed one, depends on the sorting algorithm and of course on your data!
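To make this visible, here is roughly the experiment from the question with the print statement put back in. The exact (a, b) pairs you see are an implementation detail of your standard library's std::sort; only the final order is guaranteed:
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v {1, 2, 3, 4};

    // The lambda only defines the ordering; which (a, b) pairs it receives,
    // and in what sequence, depends on the std::sort implementation.
    std::sort(v.begin(), v.end(), [](int a, int b)
    {
        std::cout << "a=" << a << " b=" << b << '\n';
        return a > b;               // descending order
    });

    for (int x : v)
        std::cout << x << ' ';      // prints: 4 3 2 1
    std::cout << '\n';
}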
I want to sort an array with a huge number (millions or even billions) of elements, where the values are integers within a small range (1 to 100 or 1 to 1000). In such a case, are std::sort and the parallelized version __gnu_parallel::sort the best choice for me?
Actually, I want to sort a vector of my own class with an integer member representing the processor index.
As there are other members inside the class, even if two elements have the same integer member used for comparing, they might not be regarded as the same data.
Counting sort would be the right choice if you know that your range is so limited. If the range is [0, m), the most efficient way is to have a vector in which the index represents the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
    if (counts.size() <= static_cast<size_t>(i)) {
        counts.resize(i + 1, 0);
    }
    counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
    const int i = t.sort_field();
    if (count_sorted.size() <= static_cast<size_t>(i)) {
        count_sorted.resize(i + 1, {});
    }
    count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially because you need to store the vectors of pointers. The space complexity went from O(m) to O(n). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort stays in scope during the life cycle of count_sorted. If your Ts implement move semantics, you can store the objects themselves and move them in. If you need count_sorted to outlive to_sort, you will need to do so or make copies.
If you have a range of the form [-l, m), the substance does not change much, but a value v is now stored at index v + l, and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
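For the plain-int case, that expansion could look something like this (reusing to_sort and counts from the snippet above):
vector<int> sorted;
sorted.reserve(to_sort.size());
for (size_t value = 0; value < counts.size(); ++value) {
    // each value appears counts[value] times in the sorted output
    for (int c = 0; c < counts[value]; ++c) {
        sorted.push_back(static_cast<int>(value));
    }
}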
Note: in the previous version of this answer I mentioned multiset as a way to use a data structure to count sort. This would be efficient in some Java implementations (I believe the Guava implementation would be efficient) but not in C++, where the keys in the RB tree are just repeated many times.
You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jumping ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X), where X is the maximum value you allow the sorting of.
Regular old counting sort (as seen in some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(N log N)). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. Doing just that would work if there weren't any repetitions, but since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place, we have to increment the value in the prefix sum so that the next element with the same value doesn't displace the element we just placed.
Second: either
keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their place; this requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array, or
keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
#include <algorithm>   // std::iter_swap
#include <iterator>    // std::distance

template<class It, class KeyOf>
void countsort (It begin, It end, KeyOf key_of) {
constexpr int max_value = 1000;
int final_destination[max_value] = {}; // zero initialized
int destination[max_value] = {}; // zero initialized
// Record counts
for (It it = begin; it != end; ++it)
final_destination[key_of(*it)]++;
// Build prefix sum of counts
for (int i = 1; i < max_value; ++i) {
final_destination[i] += final_destination[i-1];
destination[i] = final_destination[i-1];
}
for (auto it = begin; it != end; ++it) {
auto key = key_of(*it);
// while item is not in the correct position
while ( std::distance(begin, it) != destination[key] &&
// and not all items of this value have reached their final position
final_destination[key] != destination[key] ) {
// swap into the right place
std::iter_swap(it, begin + destination[key]);
// tidy up for next iteration
++destination[key];
key = key_of(*it);
}
}
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const& p){
    return p.id() - 1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort,
here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I personally prefer not to go resizing the vector progressively, but I'd rather do it this way (assuming your range is [0-1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue=0;
for (int i : to_sort) {
if(i > maxvalue) maxvalue = i;
counts[i]++;
}
counts.resize(maxvalue+1);
It is essentially the same, but no need to be constantly managing the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.
Let's say we have the following 2D array of integers:
1 3 3 1
1 0 2 2
2 0 3 1
1 1 1 0
2 1 1 3
I was trying to create an implementation where the user could give as input the array itself and a string. An example of a string in the above example would be "03", which would mean that the user wants to sort the array based on the first and the fourth columns.
So in this case the result of the sorting would be the following:
1 1 1 0
1 3 3 1
1 0 2 2
2 0 3 1
2 1 1 3
I didn't know a lot about the compare functions that are used inside the STL's sort function; however, after searching, I created the following simple implementation:
I created a class called Comparator (Comparator.h):
#include <string>
#include <vector>

class Comparator{
private:
    std::string attr;
public:
    Comparator(std::string attr) { this->attr = attr; }
    bool operator()(const int* first, const int* second){
        std::vector<int> left;
        std::vector<int> right;
        size_t i;
        for (i = 0; i < attr.size(); i++){
            left.push_back(first[attr.at(i) - '0']);
            right.push_back(second[attr.at(i) - '0']);
        }
        for (i = 0; i < left.size(); i++){
            if (left[i] < right[i]) return true;
            else if (left[i] > right[i]) return false;
        }
        return false;
    }
};
I need to know the information inside the string, so I need a class where this string is a private variable. Inside the operator I have two parameters, first and second, each of which refers to a row. Having this information, I create a left and a right vector: the left vector holds only the numbers of the first row that matter for the sorting, as specified by the string variable, and the right vector holds only the corresponding numbers of the second row.
Then I do the needed comparisons and return true or false. The user can use this class by calling this function of the Sorting class (Sorting.cpp):
void Sorting::applySort(int **data, std::string attr, int amountOfRows){
    std::sort(data, data + amountOfRows, Comparator(attr));
}
Here is an example use:
int main(void){
    //create a data[][] variable and fill it with integers
    Sorting sort;
    sort.applySort(data, "03", number_of_rows);
}
I have two questions:
First question
Can my implementation get better? I use extra variables like the left and right vectors, and then I have some for loops, which bring some extra cost to the sorting operation.
Second question
Due to the extra cost, how much worse does the time complexity of the sorting become? I know that STL's sort is O(n*log n), where n is the number of integers that you want to sort. Here n has a different meaning: n is the number of rows, and each row can have up to m integers, which are processed inside the Comparator class by the overridden operator function using the extra variables (the vectors) and for loops.
Because I'm not sure how exactly STL's sort is implemented, I can only make some estimates.
My initial estimate would be O(n*m*log(n)), where m is the number of columns that are important to the sorting, but I'm not 100% certain about it.
Thank you in advance
You can certainly improve your comparator. There's no need to copy the columns and then compare them. Instead of the two push_back calls, just compare the values and either return true, return false, or continue the loop according to whether they're less, greater, or equal.
The relevant part of the complexity of sort is O(n * log n) comparisons (in C++11; C++03 doesn't give quite such a good guarantee), where n is the number of elements being sorted. So provided your comparator is O(m), your estimate for sorting the n rows is OK. Since attr.size() <= m, you're right.
First question: you don't need left and right; you add elements one by one and then iterate over the vectors in the same order. So instead of pushing values to vectors and then iterating over them, simply use the values as you generate them in the first loop, like so:
for (i = 0; i < attr.size(); i++){
    int left = first[attr.at(i) - '0'];
    int right = second[attr.at(i) - '0'];
    if (left < right) return true;
    else if (left > right) return false;
}
Second question: can the time complexity be improved? Not with a sorting algorithm that uses direct comparison. On the other hand, the problem you solve here is somewhat similar to radix sort, and so I believe you should be able to do the sorting in O(n*m), where m is the number of sorting criteria.
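For completeness, here is a hypothetical sketch of that radix-style idea: apply a stable counting sort per key column, processing the columns named in attr from least significant (last character) to most significant (first character). It uses vector<vector<int>> rows instead of the int** from the question and assumes cell values lie in [0, max_value]:
#include <string>
#include <vector>

void radixSortRows(std::vector<std::vector<int>>& rows,
                   const std::string& attr, int max_value)
{
    // Least-significant key first: walk the key columns from right to left.
    for (auto it = attr.rbegin(); it != attr.rend(); ++it) {
        const int col = *it - '0';
        // One stable counting-sort pass on column col: distribute rows into buckets...
        std::vector<std::vector<std::vector<int>>> buckets(max_value + 1);
        for (auto& row : rows)
            buckets[row[col]].push_back(std::move(row));
        // ...then concatenate the buckets back, preserving the previous order inside each bucket.
        rows.clear();
        for (auto& bucket : buckets)
            for (auto& row : bucket)
                rows.push_back(std::move(row));
    }
}
Each pass is O(n + max_value) and there are at most m passes, so for a small value range this matches the O(n*m) estimate.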
1) First, you should convert the string into an integer array in the constructor, with validation that the values are less than the number of columns.
(You could also have another constructor that takes an integer array as a parameter.
A slight enhancement is to allow negative values to indicate that the order of the sort is reversed for that column; in this case the values would be -N..-1, 1..N.)
2) There is no need for the intermediate left, right arrays.
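Putting those two suggestions together (column validation and the reversed-order enhancement omitted for brevity), an improved Comparator might look roughly like this:
#include <string>
#include <vector>

class Comparator {
private:
    std::vector<int> columns;   // precomputed column indices from the attr string
public:
    explicit Comparator(const std::string& attr) {
        for (char ch : attr)
            columns.push_back(ch - '0');   // assumes single-digit column indices, as in "03"
    }
    bool operator()(const int* first, const int* second) const {
        for (int c : columns) {
            if (first[c] < second[c]) return true;
            if (first[c] > second[c]) return false;
        }
        return false;   // rows compare equal on all key columns
    }
};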
Possible Duplicate:
Given two arrays a and b, find all pairs of elements (a1, b1) such that a1 belongs to array A and b1 belongs to array B and a1 + b1 = k.
Given: an unsorted array A of integers
Input: an integer k
Output: all the two-element sets with sum of elements equal to k, in O(n).
Example:
A = {3,4,5,1,4,2}
Input : 6
Output : {3,3}, {5,1}, {4,2}
Note: I know an O(n log n) solution but that would require the array to be sorted. Is there any way this problem can be solved in O(n)? A non-trivial C++ data structure can be used, i.e., there is no bound on space.
Make a constant-time lookup table (hash) so you can see if a particular integer is included in your array (O(n)). Then, for each element in the array, see if k-A[i] is included. This takes constant time for each element, so a total of O(n) time. This assumes the elements are distinct; it is not difficult to make it work with repeating elements.
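A minimal sketch of that idea with std::unordered_set (like the answer says, it treats elements as distinct, so {3, 3} would only appear if 3 occurred twice):
#include <iostream>
#include <unordered_set>
#include <vector>

// Report each pair {k - a, a} once by checking every element against the
// values already seen; expected O(n) time thanks to the hash set.
void printPairs(const std::vector<int>& A, int k)
{
    std::unordered_set<int> seen;
    for (int a : A) {
        if (seen.count(k - a))
            std::cout << "{" << k - a << ", " << a << "}\n";
        seen.insert(a);
    }
}

int main()
{
    printPairs({3, 4, 5, 1, 4, 2}, 6);   // prints {5, 1} and {4, 2}
}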
Just a simple algorithm off the top of my head:
Create a bitfield that represents the numbers from 0 to k, labeled B
For each number i in A
Set B[i]
If B[k-i] is set, add (i, k-i) to the output
Now, as people have raised, if you need to have two instances of the number 3 in order to output (3, 3), then you just switch the order of the last two statements in the above algorithm.
Also I'm sure that there's a name for this algorithm, or at least some better one, so if anyone knows I'd be appreciative of a comment.
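A sketch of the bitfield version as described (with "set" before "check", so a single 3 already produces (3, 3) for k = 6, matching the question's expected output; swap the two statements if you want to require two 3s):
#include <iostream>
#include <vector>

void printPairsBitfield(const std::vector<int>& A, int k)
{
    std::vector<bool> B(k + 1, false);   // B[i] == true once i has been seen
    for (int i : A) {
        if (i < 0 || i > k) continue;    // only values in [0, k] can take part in a pair
        B[i] = true;                     // set B[i]
        if (B[k - i])                    // if B[k - i] is set, add (i, k - i) to the output
            std::cout << "(" << i << ", " << k - i << ")\n";
    }
}

int main()
{
    printPairsBitfield({3, 4, 5, 1, 4, 2}, 6);   // prints (3, 3), (1, 5), (2, 4)
}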
http://codepad.org/QR9ptUwR
This will print all pairs. The algorithm is the same as the one described by @bdares above.
I have used STL maps as we don't have hash tables in the STL.
One can reduce the Element Uniqueness problem to this, so there is no O(n) solution.
There are k pairs of integers that sum to k: {0,k}, {1,k-1}, ... etc. Create an array B of size k+1 where elements are boolean. For each element e of the array A, if e <= k && B[e] == false, set B[e] = true and if B[k-e] == true, emit the pair {e,k-e}. Needs to be extended slightly for negative integers.