What is the cheapest way to sort a permutation in C++? - c++

The problem is:
You have to sort an array in ascending order (a permutation: the numbers from 1 to N in random order) using a series of swaps. Every swap has a price, and there are 5 types of prices. Write a program that sorts the given array for the smallest total price.
There are two kinds of prices: priceByValue and priceByIndex. All of the prices of a kind are given in two N*N two-dimensional arrays. Example of how to access prices:
You want to swap the 2nd and the 5th elements from the permutation with values of 4 and 7. The price for this swap will be priceByValue[4][7] + priceByIndex[2][5].
Indexes of all arrays are counted from 1 (not from 0) in order to have access to all of the prices (the permutation elements' values start from 1): priceByIndex[2][5] would actually be priceByIndex[1][4] in code. Moreover, the order of the indexes by which you access prices from the two-dimensional arrays doesn't matter: priceByIndex[i][j] = priceByIndex[j][i], and priceByIndex[i][i] is always equal to 0 (the same holds for priceByValue).
Types of prices:
Price[i][j] = 0;
Price[i][j] = random number between 1 and 4*N;
Price[i][j] = |i-j|*6;
Price[i][j] = sqrt(|i-j|) * sqrt(N) * 15/4;
Price[i][j] = max(i,j)*3;
When you access prices by index, i and j are the indexes of the elements you want to swap in the original array; when you access prices by value, i and j are the values of the elements you want to swap. (And they are always counted from 1.)
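For example, a minimal sketch of the price lookup in C++ (the helper name and the 0-based storage are my assumptions, not part of the statement):

#include <vector>

// Hypothetical helper: cost of swapping the elements at 1-based positions i and j.
long long swapCost(int i, int j,
                   const std::vector<int>& perm,                       // perm[0] holds the 1st element
                   const std::vector<std::vector<long long>>& priceByIndex,
                   const std::vector<std::vector<long long>>& priceByValue) {
    int vi = perm[i - 1], vj = perm[j - 1];                            // 1-based values at those positions
    return priceByValue[vi - 1][vj - 1]                                // shift the 1-based spec to 0-based storage
         + priceByIndex[i - 1][j - 1];
}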
Things given:
N - an integer from 1 to 400, Mixed array, Type of priceByIndex, priceByIndex matrix, Type of priceByValue, priceByValue matrix. (all elements of a matrix are from the given type)
Things that should 'appear on the screen': number of swaps, all swaps (only by index - 2 5 means that you have swapped the 2nd and the 5th elements) and the price.
As I am still learning C++, I was wondering what is the most effective way to sort the array in order to try to find the sort with the smallest cost.
There might be a way to enumerate series of swaps that result in a sorted array and see which one has the smallest price, and I suspect I need to sort the array by swapping elements which are close by both value and index, but I don't know how to do this. I would be very grateful if someone could show me how to find the cheapest sort in code. Thank you in advance!
More: this problem might have no exact solution; I am just trying to get a result close to the ideal.

Dynamic Programming!
Think of the problem as a graph. Each of the N-factorial permutations represents a graph vertex, and the allowed swaps are just arcs between vertices. The price-tag of a swap is just the weight on the arc.
When you look at the problem this way, it can be solved with Dijkstra's algorithm for finding the lowest-cost path through a graph from one vertex to another.
This is also called Single-Pair Shortest Path.
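A minimal sketch of that graph search, reusing the hypothetical swapCost helper sketched in the question above. Dijkstra over all N! permutation vertices is only feasible for tiny N (say N <= 8), so treat this as an illustration of the idea rather than a solution for N = 400:

#include <map>
#include <queue>
#include <utility>
#include <vector>

using Perm = std::vector<int>;
using Matrix = std::vector<std::vector<long long>>;

// Defined in the earlier sketch: price of swapping 1-based positions i and j.
long long swapCost(int i, int j, const Perm& perm,
                   const Matrix& priceByIndex, const Matrix& priceByValue);

long long cheapestSort(const Perm& start,
                       const Matrix& priceByIndex, const Matrix& priceByValue) {
    int n = (int)start.size();
    Perm goal(n);
    for (int i = 0; i < n; ++i) goal[i] = i + 1;          // the sorted permutation

    std::map<Perm, long long> dist;                        // best known cost per vertex
    using State = std::pair<long long, Perm>;
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    dist[start] = 0;
    pq.push({0, start});

    while (!pq.empty()) {
        auto [d, p] = pq.top(); pq.pop();
        if (p == goal) return d;                           // first settled vertex is optimal
        if (d > dist[p]) continue;                         // stale queue entry
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) {              // every swap is an arc
                Perm q = p;
                std::swap(q[i], q[j]);
                long long nd = d + swapCost(i + 1, j + 1, p, priceByIndex, priceByValue);
                auto it = dist.find(q);
                if (it == dist.end() || nd < it->second) {
                    dist[q] = nd;                          // relax the arc
                    pq.push({nd, q});
                }
            }
    }
    return -1;                                             // the goal is always reachable
}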

You can use an algorithm for sorting an array in lexicographical order and modify it so that it fits your needs (you did not mention the sorting criteria, i.e. the desired result such as least value first). There are multiple algorithms available for this, e.g. quicksort.
A code example is at https://www.geeksforgeeks.org/lexicographic-permutations-of-string/

Related

How can one populate a vector with all possible string combinations of n vector length in the C++ programming language

I have a vector of 100 strings and another empty vector and I'm trying to fill the empty vector with every possible combination of n strings from the group of 100. (n = 1, 2, 3,...)
If n = 1 then you get every unique vector composed of 1 string (or all 100 strings as vectors)
If n = 2 then you get every unique vector composed of 2 strings (or 100^2 variations)
C++ is not my native language.
I have some attempts so far, and what I'd do in Zou_script (proprietary in-house) would be to assign each string a number and then permute through all possible combinations of those numbers, and then reference individual strings through Vector[] to create the vectors.
This seems slow and inelegant, and holding the string bank in memory could be bad if the string bank were much bigger.
I have used std::next_permutation but I'm having trouble extending it elegantly to sorting vectors composed of strings.
How can one populate a vector with all possible string combinations of n vector length in the C++ programming language? <- Question.
Might anyone be of any assistance? If you're unsure, or intimidated by the question, it's OK to go to the next one.
Update
I have managed to replicate the technique in C++, but next_permutation is significantly slower because it does not know that it only needs to calculate the first n elements of each permutation, not the entire vector.
Any way to manipulate next_permutation to only calculate x elements of a vector permutation?
I have asked this question, but I will do the best that I can to give an answer. This is in fact a well-researched and often-asked question in C/C++.
"How to consider permutations of a group of N elements, r at a time?"
There are many ways to tackle the problem. One such way is to generate a vector of integers with which to map to a vector of your elements.
Using std::next_permutation you can generate a list of numbers (permutations of the integer vector) and truncate each to the number of items you are considering. This list can then be sorted using vector tools and duplicates removed. This will give you a list of all unique permutations of N integers, r at a time, for mapping to your element vector.
Then it can be as easy as calling the r numbers from your permutation integer list and using them in your element list index to generate the permutations of your elements.
for (int k = 0; k < linecount_of_integer_permutation_list; k++)
{
    // insert code for reading line k of the integer permutation list
    // and assigning that permutation to intvec
    for (int i = 0; i < r; i++)
    {
        file << element[intvec[i]]; // index with i (not r); add whatever delimiters you want/need
    }
    file << std::endl;
    intvec.clear();
    // remember to clear vectors, or other flags, depending on what you need
}
This is cumbersome and very slow.
https://howardhinnant.github.io/combinations.html
Has some very good ideas on how to handle this issue faster. The above will work for small sets, however the jump from small to absolutely unmanageable is very quick in permutations.
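On the update about making next_permutation compute only the first n elements: one known idiom (sketched here with my own example data, not code from this answer) is to reverse the tail after consuming each prefix, which makes next_permutation jump straight to the next distinct prefix instead of grinding through every tail ordering:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Enumerate permutations of N items taken r at a time, without revisiting
// permutations that share the current r-prefix.
int main() {
    std::vector<std::string> items = {"a", "b", "c", "d"};  // N = 4
    std::size_t r = 2;
    std::sort(items.begin(), items.end());                  // must start sorted
    do {
        for (std::size_t i = 0; i < r; ++i) std::cout << items[i] << ' ';
        std::cout << '\n';
        // Reversing the tail turns it into its last permutation, so the next
        // call to next_permutation advances the prefix itself.
        std::reverse(items.begin() + r, items.end());
    } while (std::next_permutation(items.begin(), items.end()));
}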
Thank you for all your help. It is actually an interesting question, but apparently it isn't needed for many people's applications in programming.

Random pairs from two lists

My question is similar to this one.
I have two lists: X with n elements and Y with m elements - let's say they hold row and column indices for a n x m matrix A. Now, I'd like to write something to k random places in matrix A.
I thought of two solutions:
Get a random element x from X and a random element y from Y. Check if something is already written to A[x][y] and if not, write. But if k is close to m*n I can shoot like this forever.
Create an m*n array with all possible combinations of indices, shuffle it, draw first k elements and write there. But the problem I see here is that if both n and m are very big, the newly created n*m array may be huge (and shuffling may take some time too).
Karoly Horvath suggested to combine the two. I guess I'd have to pick threshold t and:
if (k/(m*n) > t) {
    use option 2.
} else {
    use option 1.
}
Any advice on how to pick t then?
Are there any other (better) approaches I missed?
There's an elegant algorithm due to Floyd for sampling without replacement from a range of integers. You can map the resulting integers in [0, n*m) to coordinates by the C++ function [m](int i) { return std::make_pair(i / m, i % m); }.
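A minimal sketch of Floyd's sampling, assuming a std::mt19937 generator (function name is mine); it draws k distinct flat indices from [0, n*m) and maps them to coordinates exactly as above:

#include <random>
#include <unordered_set>
#include <utility>
#include <vector>

// Floyd's algorithm: k distinct integers drawn uniformly from [0, n*m).
std::vector<std::pair<int, int>> sampleCells(int n, int m, int k, std::mt19937& rng) {
    int total = n * m;
    std::unordered_set<int> chosen;
    for (int j = total - k; j < total; ++j) {
        int t = std::uniform_int_distribution<int>(0, j)(rng);
        // If t is already taken, j itself is guaranteed to be free.
        if (!chosen.insert(t).second) chosen.insert(j);
    }
    std::vector<std::pair<int, int>> coords;
    for (int i : chosen) coords.push_back({i / m, i % m});  // (row, column)
    return coords;
}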
The best approach depends on how full your resulting matrix will be. If you are going to fill more than half of it, your rate of collision (i.e. getting a random spot that is already "written" to) is going to be high and will cause you to loop a lot more than you would want.
I would not generate all possibilities, but instead I would build them as you go, using lists of lists: one list for all possible values of X, and for each x a list of possible values of Y. I would initialize the X list but not the Y ones.
Every time you pick a value of x for the first time, create a dictionary or list of its m possible y values, then remove the one you use. The next time you pick that x you will have m-1 elements; once an x value runs out of elements, remove it from the X list so it does not get picked again. This way you can guarantee never to pick an occupied space again, and you do not need to generate all n*m options.
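A sketch of that bookkeeping (the names are mine, and the actual write to A[x][y] is left as a comment):

#include <random>
#include <unordered_map>
#include <vector>

void writeKRandomCells(int n, int m, int k, std::mt19937& rng) {
    std::vector<int> xs(n);                            // x values with free cells left
    for (int i = 0; i < n; ++i) xs[i] = i;
    std::unordered_map<int, std::vector<int>> ys;      // y pools, built lazily
    for (int done = 0; done < k && !xs.empty(); ++done) {
        int xi = std::uniform_int_distribution<int>(0, (int)xs.size() - 1)(rng);
        int x = xs[xi];
        auto it = ys.find(x);
        if (it == ys.end()) {                          // first pick of this x:
            std::vector<int> all(m);                   // create its y pool
            for (int j = 0; j < m; ++j) all[j] = j;
            it = ys.emplace(x, std::move(all)).first;
        }
        auto& pool = it->second;
        int yi = std::uniform_int_distribution<int>(0, (int)pool.size() - 1)(rng);
        int y = pool[yi];
        pool[yi] = pool.back();                        // O(1) removal of the used y
        pool.pop_back();
        if (pool.empty()) {                            // x exhausted: drop it
            xs[xi] = xs.back();
            xs.pop_back();
        }
        // ... write to A[x][y] here ...
    }
}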
You have n x m elements, e.g. 200 elements for a 10 x 20 matrix. Picking one out of 200 should be easy. Point is, whatever you do, you can flatten the two dimensions into one, reducing that part of the problem.
Notes:
Use floor divide and modulo operations to get row and column out of the index.
Blacklist: Store the picked index in a set to quickly skip those that were already written.
Whitelist: Store the indices that are not yet picked in a set. If this is better than blacklisting depends on how full your set is.
Using the right container type for the set might come important, it doesn't have to be std::set. For the blacklist, you only need fast lookup and fast insertion, a vector<bool> might actually work pretty well. For the whitelist, you need fast random access and fast deletion, a vector<unsigned> with the remaining indices would be a good choice.
Prepare to switch between either method depending on the circumstances.
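For instance, a sketch of the blacklist variant with flat indices and a vector<bool> (helper name is mine):

#include <random>
#include <vector>

// Draw flat indices in [0, n*m) and skip collisions; fine while k << n*m.
void writeKCellsBlacklist(int n, int m, int k, std::mt19937& rng) {
    std::vector<bool> written(n * m, false);           // the blacklist
    std::uniform_int_distribution<int> pick(0, n * m - 1);
    for (int done = 0; done < k;) {
        int idx = pick(rng);
        if (written[idx]) continue;                    // collision: redraw
        written[idx] = true;
        int row = idx / m, col = idx % m;              // floor divide and modulo
        // ... write to A[row][col] ...
        ++done;
    }
}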
For an n x m matrix, you can consider [0..n*m-1] as the indexes for the matrix elements.
Filling in one random index is rather trivial: just generate a random number between 0 and n*m-1, and that is the position to be filled.
Subsequently doing this operation can be a little more tricky:
you can test whether you have already written something to a position and regenerate the random number, but as you fill the matrix you will have a larger number of index regenerations.
a better solution is to put all the indexes in a vector of n*m elements. As you generate an index, you remove it from the vector, and next time generate a random index between 0 and the reduced size minus 1.
example:
vector<int> indexVec;
for (int i = 0; i < n*m; i++)
    indexVec.push_back(i);
int nrOfIndexes = n*m;                         // one slot per matrix cell
while (nrOfIndexes > 1)
{
    int index = rand() % nrOfIndexes;          // position within the shrinking vector
    processMatrixLocation(indexVec[index]);    // use the matrix index stored there
    indexVec.erase(indexVec.begin() + index);
    nrOfIndexes--;
}
processMatrixLocation(indexVec[0]);            // the single remaining index

Search Algorithm to find the k lowest values in a list

I have a list that contains n double values and I need to find the k lowest double values in that list
k is much smaller than n
the initial list with the n double values is randomly ordered
the found k lowest double values are not required to be sorted
What algorithm would you recommend?
At the moment I use Quicksort to sort the whole list, and then I take the first k elements out of the sorted list. I expect there should be a much faster algorithm.
Thank you for your help!!!
You could model your solution to match the nlargest() code in Python's standard library.
Heapify the first k values on a maxheap.
Iterate over the remaining n - k values.
Compare each to the element of the top of the heap.
If the new value is lower, do a heapreplace operation (which replaces the topmost heap element with the new value and then sifts it downward).
The algorithm can be surprisingly efficient. For example, when n=100,000 and k=100, the number of comparisons is typically around 106,000 for randomly arranged inputs. This is only slightly more than 100,000 comparisons to find a single minimum value. And, it does about twenty times fewer comparisons than a full quicksort on the whole dataset.
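A minimal C++ sketch of that heap strategy (the function name is mine):

#include <queue>
#include <vector>

// Find the k smallest values using a max-heap of size k.
std::vector<double> kSmallest(const std::vector<double>& data, std::size_t k) {
    std::priority_queue<double> heap;                  // max-heap: largest kept value on top
    for (double x : data) {
        if (heap.size() < k) {
            heap.push(x);                              // heapify the first k values
        } else if (x < heap.top()) {
            heap.pop();                                // heapreplace: drop the current largest
            heap.push(x);                              // keep the smaller value instead
        }
    }
    std::vector<double> result;
    while (!heap.empty()) { result.push_back(heap.top()); heap.pop(); }
    return result;                                     // the k smallest, not sorted
}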
The relative strength of various algorithms is studied and summarized at: http://code.activestate.com/recipes/577573-compare-algorithms-for-heapqsmallest
You can use a selection algorithm to find the kth lowest element and then iterate and return it together with all elements that are lower than it. More work has to be done if the list can contain duplicates (making sure you don't end up with more elements than you need).
This solution is O(n).
The selection algorithm is implemented in C++ as nth_element().
Another alternative is to use a max heap of size k, and iterate the elements while maintaining the heap to hold all k smallest elements.
for each element x:
    if heap.size() < k:
        heap.add(x)
    else if x < heap.max():
        heap.pop()
        heap.add(x)
When you are done - the heap contains k smallest elements.
This solution is O(n log k).
Take a look at the partial_sort algorithm from the C++ standard library.
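For example, a one-call sketch (the function name is mine):

#include <algorithm>
#include <vector>

// std::partial_sort moves the k smallest values, sorted, to the front.
void frontKSorted(std::vector<double>& v, std::size_t k) {
    std::partial_sort(v.begin(), v.begin() + k, v.end());
    // v[0..k-1] are now the k lowest values in ascending order.
}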
You can use std::nth_element. This has average O(N) complexity because it doesn't sort all the elements; it just partitions them so that every element before the nth position is no greater than the element at that position.
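A sketch of that call for this question's setting (the function name is mine):

#include <algorithm>
#include <vector>

// std::nth_element partitions in average O(n) without fully sorting.
void frontKUnsorted(std::vector<double>& v, std::size_t k) {
    std::nth_element(v.begin(), v.begin() + k, v.end());
    // v[0..k-1] now hold the k lowest values, in no particular order.
}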
You can use selection sort: it takes O(n) to select the first lowest value. Once we have put this lowest value in position 1, we can rescan the data set to find the second lowest value, and we can do this until we have the kth lowest value. This way, if k is sufficiently smaller than n, we will have complexity O(k*n), which is equivalent to O(n) for constant k...

How to efficiently *nearly* sort a list?

I have a list of items; I want to sort them, but I want a small element of randomness so they are not strictly in order, only on average ordered.
How can I do this most efficiently?
I don't mind if the quality of the randomness is not especially good, e.g. if it is simply based on the chance ordering of the input, as in an early-terminated incomplete sort.
The context is implementing a nearly-greedy search by introducing a very slight element of inexactness; this is in a tight loop, so the speed of sorting and of calling random() both have to be considered.
My current code is to do a std::sort (this being C++) and then do a very short shuffle just in the early part of the array:
for (int i = 0; i < 3; i++) // I know I have more than 6 elements
    std::swap(order[i], order[i + rand() % 3]);
Use the first two passes of JSort: build the heap twice, but do not perform the insertion sort. If the element of randomness is not small enough, repeat.
There is an approach that (unlike incomplete JSort) allows finer control over the resulting randomness and has time complexity dependent on randomness (the more random result is needed, the less time complexity). Use heapsort with Soft heap. For detailed description of the soft heap, see pdf 1 or pdf 2.
You could use a standard sort algorithm (is a standard library available?) and pass a predicate that "knows", given two elements, which is less than the other, or if they are equal (returning -1, 0 or 1). In the predicate then introduce a rare (configurable) case where the answer is random, by using a random number:
pseudocode:
if random(1000) == 0 then
    return random(2) - 1   <-- -1, 0 or 1, randomly chosen
Here we have a 1/1000 chance to "scramble" two elements, but that number strictly depends on the size of the container you sort.
Another thing to add in the 1-in-1000 case could be to exclude the "right" answer, because returning it would not scramble the result!
Edit:
if random(100 * container_size) == 0 then   <-- here I consider the container size
{
    if element_1 < element_2
        return random(1);                   <-- do not return the "correct" value of -1
    else if element_1 > element_2
        return random(1) - 1;               <-- do not return the "correct" value of 1
    else
        return random(1) == 0 ? -1 : 1;     <-- do not return 0
}
in my pseudocode:
random(x) = y where 0 <= y <= x
One possibility that requires a bit more space but would guarantee that existing sort algorithms could be used without modification would be to create a copy of the sort value(s) and then modify those in some fashion prior to sorting (and then use the modified value(s) for the sort).
For example, if the data to be sorted is a simple character field Name[N] then add a field (assuming data is in a structure or class) called NameMod[N]. Fill in the NameMod with a copy of Name but add some randomization. Then 3% of the time (or some appropriate amount) change the first character of the name (e.g., change it by +/- one or two characters). And then 10% of the time change the second character +/- a few characters.
Then run it through whatever sort algorithm you prefer. The benefit is that you could easily change those percentages and randomness. And the sort algorithm will still work (e.g., it would not have problems with the compare function returning inconsistent results).
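A sketch of the copy-and-perturb idea (a hypothetical struct and function of my own, using the 3%/10% figures from above):

#include <algorithm>
#include <random>
#include <string>
#include <vector>

struct Record {
    std::string name;      // real data, left untouched
    std::string nameMod;   // perturbed copy, used only as the sort key
};

void nearlySortByName(std::vector<Record>& recs, std::mt19937& rng) {
    std::uniform_int_distribution<int> pct(0, 99);
    std::uniform_int_distribution<int> delta(-2, 2);
    for (auto& r : recs) {
        r.nameMod = r.name;
        if (!r.nameMod.empty() && pct(rng) < 3)        // ~3%: nudge the 1st character
            r.nameMod[0] = (char)(r.nameMod[0] + delta(rng));
        if (r.nameMod.size() > 1 && pct(rng) < 10)     // ~10%: nudge the 2nd character
            r.nameMod[1] = (char)(r.nameMod[1] + delta(rng));
    }
    // The comparator is consistent, so any standard sort is safe here.
    std::sort(recs.begin(), recs.end(),
              [](const Record& a, const Record& b) { return a.nameMod < b.nameMod; });
}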
If you are sure that each element is at most k positions away from where it should be, you can reduce quicksort's N log(N) sorting time complexity down to N log(k)...
edit
More specifically, you would split the array into N/k buckets, each containing k consecutive elements.
You can quicksort each bucket, which takes k * log(k) time per bucket, and then sort across the N/k buckets, which takes N/k * log(N/k) time. Combining the two, you can do the sorting in N log(max(N/k, k)).
This can be useful because you can run sorting for each bucket in parallel, reducing total running time.
This works if you are sure that any element in the list is at most k indices away from its correct position after the sorting,
but I do not think you meant any such restriction.
Split the list into two equally-sized parts. Sort each part separately, using any usual algorithm. Then merge these parts. Perform some merge iterations as usual, comparing the merged elements. For other merge iterations, do not compare the elements, but instead select the element from the same part as in the previous step. It is not necessary to use an RNG to decide how to treat each element; just ignore the sorting order for every N-th element.
Another variant of this approach nearly sorts an array almost in-place. Split the array into two parts with odd/even indexes. Sort them. (It is even possible to use a standard C++ algorithm with an appropriately modified iterator, like boost::permutation_iterator.) Reserve some limited space at the end of the array. Merge the parts, starting from the end. If the merged part is going to overwrite one of the non-merged elements, just select this element; otherwise select the element in sorted order. The level of randomness is determined by the amount of reserved space.
Assuming you want the array sorted in ascending order, I would do the following:
for M iterations:
    pick a random index i
    pick a random index k
    if (i < k) != (array[i] < array[k]) then swap(array[i], array[k])
M controls the "sortedness" of the array - as M increases the array becomes more and more sorted. I would say a reasonable value for M is n^2 where n is the length of the array. If it is too slow to pick random elements then you can precompute their indices beforehand. If the method is still too slow then you can always decrease M at the cost of getting a poorer sort.
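A small C++ sketch of that loop (the function name is mine):

#include <random>
#include <utility>
#include <vector>

// M random pair comparisons; swap only when the pair disagrees with
// ascending order. Larger M means a more sorted array.
void nearlySort(std::vector<int>& a, long long M, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, a.size() - 1);
    while (M-- > 0) {
        std::size_t i = pick(rng), k = pick(rng);
        if ((i < k) != (a[i] < a[k]))
            std::swap(a[i], a[k]);
    }
}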
Take a small random subset of the data and sort it. You can use this as a map to provide an estimate of where every element should appear in the final nearly-sorted list. You can scan through the full list now and move/swap elements that are not in a good position.
This is basically O(n), assuming the small initial sorting of the subset doesn't take a long time. Hopefully you can build the map such that the estimate can be extracted quickly.
Bubblesort to the rescue!
For an unsorted array, you could pick a few random elements and bubble them up or down (maybe by rotation, which is a bit more efficient). It will be hard to control the amount of (dis)order: even if you pick all N elements, you are not sure that the whole array will be sorted, because elements are moved and you cannot ensure that you touched every element only once.
BTW: this kind of problem tends to occur in game playing engines, where the list with candidate moves is kept more-or-less sorted (because of weighted sampling), and sorting after each iteration is too expensive, and only one or a few elements are expected to move.

Fast Algorithm for finding largest values in 2d array

I have a 2D array (an image actually) that is size N x N. I need to find the indices of the M largest values in the array ( M << N x N) . Linearized index or the 2D coords are both fine. The array must remain intact (since it's an image). I can make a copy for scratch, but sorting the array will bugger up the indices.
I'm fine with doing a full pass over the array (ie. O(N^2) is fine). Anyone have a good algorithm for doing this as efficiently as possible?
Selection is sorting's austere sister (repeat this ten times in a row). Selection algorithms are less known than sort algorithms, but nonetheless useful.
You can't do better than O(N^2) (in N) here, since nothing indicates that you must not visit each element of the array.
A good approach is to keep a priority queue made of the M largest elements. This makes something O(N x N x log M).
You traverse the array, enqueuing pairs (elements, index) as you go. The queue keeps its elements sorted by first component.
Once the queue has M elements, instead of enqueuing you now:
Query the min element of the queue
If the current element of the array is greater, insert it into the queue and discard the min element of the queue
Else do nothing.
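In C++, a minimal sketch of this traversal (assuming a row-major N*N image; the names are mine):

#include <queue>
#include <utility>
#include <vector>

// Keep the M largest (value, flat index) pairs in a min-heap of size M.
std::vector<std::pair<float, int>> topM(const std::vector<float>& img, int N, int M) {
    using Entry = std::pair<float, int>;               // (value, linearized index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    for (int i = 0; i < N * N; ++i) {                  // contiguous, cache-friendly scan
        if ((int)pq.size() < M)
            pq.push({img[i], i});
        else if (img[i] > pq.top().first) {            // beats the current minimum
            pq.pop();
            pq.push({img[i], i});
        }
    }
    std::vector<Entry> out;
    while (!pq.empty()) { out.push_back(pq.top()); pq.pop(); }
    return out;                                        // row = index / N, col = index % N
}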
If M is bigger, sorting the array is preferable.
NOTE: @Andy Finkenstadt makes a good point (in the comments to your question): you definitely should traverse your array in the "direction of data locality": make sure that you read memory contiguously.
Also, this is trivially parallelizable, the only non parallelizable part is when you merge the queues when joining the sub processes.
You could copy the array into a single-dimensioned array of tuples (value, original X, original Y) and build a basic heap out of it in O(n) time, provided you implement the heap as an array.
You could then retrieve the M largest tuples in O(M lg n) time and reference their original x and y from the tuple.
If you are going to make a copy of the input array in order to do a sort, that's way worse than just walking linearly through the whole thing to pick out numbers.
So the question is how big is your M? If it is small, you can store results (i.e. structs with 2D indexes and values) in a simple array or a vector. That'll minimize heap operations but when you find a larger value than what's in your vector, you'll have to shift things around.
If you expect M to get really large, then you may need a better data structure like a binary tree (std::set) or a sorted std::deque. std::set will reduce the number of times elements must be shifted in memory, while a std::deque will do some shifting but will significantly reduce the number of times you have to go to the heap, which may give you better performance.
Your problem doesn't use the 2 dimensions in any interesting way; it is easier to consider the equivalent problem in a 1D array.
There are 2 main ways to solve this problem:
Maintain a set of the M largest elements, and iterate through the array. (Using a heap allows you to do this efficiently.)
This is simple and is probably better in your case (M << N)
Use selection, (the following algorithm is an adaptation of quicksort):
Create an auxiliary array, containing the indexes [1..N].
Choose an arbitrary index (and its corresponding value), and partition the index array so that indexes corresponding to smaller elements go to the left and bigger elements go to the right.
Repeat the process, binary-search style, until you narrow down to the M largest elements.
This is good for cases with large M. If you want to avoid worst-case issues (the same ones quicksort has), then look at more advanced algorithms (like median-of-medians selection).
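A sketch of that index-array selection using the standard library's quickselect (the names are mine):

#include <algorithm>
#include <numeric>
#include <vector>

// Quickselect over an auxiliary index array; the image itself is never reordered.
std::vector<int> topMIndices(const std::vector<float>& img, int M) {
    std::vector<int> idx(img.size());
    std::iota(idx.begin(), idx.end(), 0);              // 0, 1, 2, ...
    std::nth_element(idx.begin(), idx.begin() + M, idx.end(),
                     [&](int a, int b) { return img[a] > img[b]; });
    idx.resize(M);                                     // indices of the M largest, unsorted
    return idx;
}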
How many times do you search for the largest value from the array?
If you only search once, then just scan through it, keeping the M largest ones.
If you do it many times, just insert the values into a sorted list (probably best implemented as a balanced tree).