C++ sort n-dimension array across arbitrary dimension

I have n-dimensional arrays of double values stored as vectors, last index major, that have to be sorted with respect to a given dimension k. A classical strategy consists of first permuting dimensions k and 1, then sorting with respect to the first index (e.g. with std::sort), then permuting the dimensions back. I was wondering how this strategy would compare, in terms of speed, to the use of a custom iterator.
For example, consider X, an N×M 2D array, second index major, that has to be sorted with respect to the second index. For a given row number i, the values to be sorted are X[i+N*j], j=0..M-1, and could be accessed and sorted using a custom iterator that takes into account that the values are not consecutive in memory.
How would the respective costs (in terms of e.g. cache misses, page faults, ...) compare: accessing non-consecutive values throughout the sort (with the custom iterator), versus the same kind of access happening only during the two permutations?
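For reference, such a custom iterator can be handed straight to std::sort. Below is a minimal sketch of a strided random-access iterator, assuming the X[i+N*j] layout described above and a pre-C++20 std::sort that only exercises the operations shown; the names are illustrative, and a production version would provide the complete random-access iterator interface.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <vector>

// Iterates over X[i + N*j] for fixed i: each step advances by `stride` doubles.
struct StridedIt {
    using iterator_category = std::random_access_iterator_tag;
    using value_type        = double;
    using difference_type   = std::ptrdiff_t;
    using pointer           = double*;
    using reference         = double&;

    double*        p;       // current element
    std::ptrdiff_t stride;  // element distance between logical neighbours

    reference operator*() const { return *p; }
    reference operator[](difference_type n) const { return p[n * stride]; }
    StridedIt& operator++()    { p += stride; return *this; }
    StridedIt& operator--()    { p -= stride; return *this; }
    StridedIt  operator++(int) { StridedIt t = *this; p += stride; return t; }
    StridedIt  operator--(int) { StridedIt t = *this; p -= stride; return t; }
    StridedIt& operator+=(difference_type n) { p += n * stride; return *this; }
    StridedIt& operator-=(difference_type n) { p -= n * stride; return *this; }
    StridedIt  operator+(difference_type n) const { return {p + n * stride, stride}; }
    StridedIt  operator-(difference_type n) const { return {p - n * stride, stride}; }
    difference_type operator-(const StridedIt& o) const { return (p - o.p) / stride; }
    bool operator==(const StridedIt& o) const { return p == o.p; }
    bool operator!=(const StridedIt& o) const { return p != o.p; }
    bool operator<(const StridedIt& o)  const { return p <  o.p; }
    bool operator>(const StridedIt& o)  const { return p >  o.p; }
    bool operator<=(const StridedIt& o) const { return p <= o.p; }
    bool operator>=(const StridedIt& o) const { return p >= o.p; }
};

int main() {
    const std::ptrdiff_t N = 3, M = 4;    // 3 rows, 4 columns
    std::vector<double> X = {9, 1, 5,     // second index major: X[i + N*j]
                             2, 8, 6,
                             7, 3, 4,
                             0, 9, 2};
    const std::ptrdiff_t i = 0;           // sort row i across j = 0..M-1
    std::sort(StridedIt{X.data() + i, N}, StridedIt{X.data() + i + N * M, N});
    for (std::ptrdiff_t j = 0; j < M; ++j)
        std::cout << X[i + N * j] << ' '; // prints 0 2 7 9
}

Each dereference of this iterator touches memory N doubles away from the previous one, which is exactly the access pattern whose cost the question asks about.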

Related

Time complexity for searching a list of lists?

Suppose I store a matrix as a list of lists, where the first list represents the rows, and each element in that list is a list representing the values in that row. What would the time complexity for finding an element be?
If I'm not mistaken, the average time complexity of finding an element in a list using linear search is O(n); does that mean the average time complexity for a list of lists is O(n^2)?
If n means the width and height of a square matrix, then a linear search in the matrix will take O(n^2) time. More generally, a linear search in a rectangular m×n matrix will take O(mn) time. Both are because that is the number of entries in the matrix, and a linear search will do O(1) work per entry.
If instead you use n to mean the total number of entries in the matrix, then the time complexity is O(n) for the same reason as above.
Note that the above assumes testing for the search target takes O(1) time (e.g. comparing primitive integers). If that's false, then you should multiply the above by the time complexity of the equality test; for example, if for some reason you have an m×n matrix of strings of length c, then the running time will be O(mnc).
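To make the counting concrete, here is a minimal sketch of the O(mn) linear search over a list of lists, using a C++ std::vector<std::vector<int>> as the representation (the helper name find is illustrative):

#include <cstddef>
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

// Visits each of the m*n entries once, doing O(1) work per entry.
std::optional<std::pair<std::size_t, std::size_t>>
find(const std::vector<std::vector<int>>& rows, int target) {
    for (std::size_t i = 0; i < rows.size(); ++i)
        for (std::size_t j = 0; j < rows[i].size(); ++j)
            if (rows[i][j] == target)
                return std::pair{i, j};
    return std::nullopt;
}

int main() {
    std::vector<std::vector<int>> m = {{4, 8}, {15, 16}, {23, 42}};
    if (auto hit = find(m, 16))
        std::cout << "found at (" << hit->first << ", " << hit->second << ")\n";
}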
Well, you would have two indexes, so the complexity is that of a list search nested inside another list search: the two O(n) costs multiply rather than add. That being said, if you want to model a matrix, you might consider using an array of arrays (not to be confused with a jagged array), because in normal circumstances a matrix has a fixed size (a List<T> is a wrapper around a T[] to allow adding and removing items).

What is the cheapest way to sort a permutation in C++?

The problem is:
You have to sort an array in ascending order (a permutation: the numbers from 1 to N in random order) using a series of swaps. Every swap has a price, and there are 5 types of prices. Write a program that sorts the given array for the smallest price.
There are two kinds of prices: priceByValue and priceByIndex. The prices of each kind are given in two N*N two-dimensional arrays. Example of how to access prices:
You want to swap the 2nd and the 5th elements from the permutation with values of 4 and 7. The price for this swap will be priceByValue[4][7] + priceByIndex[2][5].
Indexes of all arrays are counted from 1, not from 0, in order to have access to all of the prices (the permutation elements' values start from 1): priceByIndex[2][5] would actually be priceByIndex[1][4] in code. Moreover, the order of the indexes by which you access prices from the two-dimensional arrays doesn't matter: priceByIndex[i][j] = priceByIndex[j][i], and priceByIndex[i][i] is always equal to 0 (priceByValue is the same).
Types of prices:
Price[i][j] = 0;
Price[i][j] = random number between 1 and 4*N;
Price[i][j] = |i-j|*6;
Price[i][j] = sqrt(|i-j|) * sqrt(N) * 15/4;
Price[i][j] = max(i,j)*3;
When you access prices by index, i and j are the indexes of the elements you want to swap in the original array; when you access prices by value, i and j are the values of the elements you want to swap. (And they are always counted from 1.)
Things given:
N - an integer from 1 to 400, the mixed array, the type of priceByIndex, the priceByIndex matrix, the type of priceByValue, the priceByValue matrix (all elements of a matrix are of the given type).
Things that should 'appear on the screen': the number of swaps, all swaps (only by index - '2 5' means that you have swapped the 2nd and 5th elements) and the price.
As I am still learning C++, I was wondering what is the most effective way to sort the array in order to try to find the sort with the smallest cost.
There might be a way to enumerate series of swaps that result in a sorted array and see which one has the smallest price, and I suspect I need to sort the array by swapping elements which are close by both value and index, but I don't know how to do this. I would be very grateful if someone can give me a solution for finding the cheapest sort in code. Thank you in advance!
More: this problem might have no exact solution; I am just trying to get a result close to the ideal.
Dynamic Programming!
Think of the problem as a graph. Each of the N-factorial permutations represents a graph vertex, and the allowed swaps are just arcs between vertices. The price-tag of a swap is just the weight on the arc.
When you look at the problem this way, it can be solved with Dijkstra's algorithm for finding the lowest-cost path through a graph from one vertex to another.
This is also called the Single-Pair Shortest Path problem.
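As a heavily hedged illustration of this graph view, here is a minimal Dijkstra sketch over permutation states. Since the graph has N! vertices it is only practical for very small N, so it demonstrates the model rather than a solution for N up to 400; all names are illustrative.

#include <algorithm>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <map>
#include <queue>
#include <utility>
#include <vector>

using Perm = std::vector<int>;
using Matrix = std::vector<std::vector<long long>>;

// Dijkstra over permutation states: an edge "swap slots i and j" costs
// priceByIndex[i][j] + priceByValue[p[i]-1][p[j]-1] (values start at 1).
long long cheapestSort(const Perm& start,
                       const Matrix& priceByIndex, const Matrix& priceByValue) {
    const int n = (int)start.size();
    Perm goal = start;
    std::sort(goal.begin(), goal.end());

    using Item = std::pair<long long, Perm>;   // (cost so far, state)
    std::map<Perm, long long> dist{{start, 0}};
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    pq.push({0, start});

    while (!pq.empty()) {
        auto [d, p] = pq.top(); pq.pop();
        if (p == goal) return d;               // first pop of goal is optimal
        if (d > dist[p]) continue;             // stale queue entry
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) {
                long long w = priceByIndex[i][j]
                            + priceByValue[p[i] - 1][p[j] - 1];
                Perm q = p;
                std::swap(q[i], q[j]);
                auto it = dist.find(q);
                if (it == dist.end() || d + w < it->second) {
                    dist[q] = d + w;
                    pq.push({d + w, q});
                }
            }
    }
    return -1;                                 // unreachable for valid input
}

int main() {
    // Tiny example: N = 3, index prices of type |i-j|*6, value prices 0.
    Matrix byIdx(3, std::vector<long long>(3));
    Matrix byVal(3, std::vector<long long>(3, 0));
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) byIdx[i][j] = 6LL * std::abs(i - j);
    std::cout << cheapestSort({3, 1, 2}, byIdx, byVal) << "\n"; // prints 12
}

The first time the sorted permutation is popped from the queue, its cost is optimal, which is exactly the single-pair shortest path guarantee mentioned above.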
You can use an algorithm for sorting an array in lexicographical order and modify it so that it fits your needs (you did not mention the sorting criteria, i.e. the desired result: least value first, ...). There are multiple algorithms available for this, e.g. quicksort, ...
A code example is at https://www.geeksforgeeks.org/lexicographic-permutations-of-string/

Best data structure for finding maximum in a 2d matrix with update queries

I have a 2d matrix of doubles. My task is to find the maximum element of the matrix at any point.
Queries will be of 2 types:
Update query: 2n - 1 elements will be updated, i.e. all elements of row i and column i. (By an update I mean changing the element; it can be anything, an increment or a decrement.)
Maximum query: return the maximum element in the 2D array.
I came up with a solution by using binary heaps. My idea is to keep a maxheap of n^2 elements implemented using an array, and maintain another array of size n^2 to keep the indices of heap elements. So (i,j)th element in the matrix which is nothing but (i*n + j)th element in the flat array will store the index corresponding to its position in the heap.
So this way, the 2n-1 updates are handled in O((2n-1) log(n^2)) time, and the maximum query can be answered in O(1) time.
I wasn't able to use the STL implementation because I have to keep track of the heap elements, i.e. upon an update query I should know which heap elements need to be updated. std::priority_queue also doesn't support changing keys.
How do I improve the update query time? Is there some other data structure which can handle these operations faster?
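For what it's worth, here is a minimal sketch of the structure described in the question: a max-heap over the flattened matrix plus a position table, so an updated cell can be re-sifted in O(log n^2). The class and method names are illustrative, not from any library.

#include <iostream>
#include <utility>
#include <vector>

// Max-heap over the flattened matrix plus a position table, so a cell
// update can be sifted up or down in O(log n^2).
struct IndexedMaxHeap {
    std::vector<double> val;  // val[cell]  = current value of cell i*n+j
    std::vector<int>    heap; // heap[slot] = cell stored in that slot
    std::vector<int>    pos;  // pos[cell]  = slot currently holding that cell

    explicit IndexedMaxHeap(const std::vector<double>& v)
        : val(v), heap(v.size()), pos(v.size()) {
        for (int c = 0; c < (int)v.size(); ++c) heap[c] = pos[c] = c;
        for (int s = (int)v.size() / 2 - 1; s >= 0; --s) siftDown(s);
    }
    double max() const { return val[heap[0]]; }  // O(1) maximum query
    void update(int cell, double x) {            // O(log n^2) per cell
        val[cell] = x;
        siftUp(pos[cell]);
        siftDown(pos[cell]);                     // at most one of the two moves it
    }
  private:
    void swapSlots(int a, int b) {
        std::swap(heap[a], heap[b]);
        pos[heap[a]] = a;
        pos[heap[b]] = b;
    }
    void siftUp(int s) {
        while (s > 0 && val[heap[s]] > val[heap[(s - 1) / 2]]) {
            swapSlots(s, (s - 1) / 2);
            s = (s - 1) / 2;
        }
    }
    void siftDown(int s) {
        for (;;) {
            int l = 2 * s + 1, r = l + 1, m = s;
            if (l < (int)heap.size() && val[heap[l]] > val[heap[m]]) m = l;
            if (r < (int)heap.size() && val[heap[r]] > val[heap[m]]) m = r;
            if (m == s) return;
            swapSlots(s, m);
            s = m;
        }
    }
};

int main() {
    const int n = 3;
    std::vector<double> m = {1, 5, 3, 9, 2, 8, 4, 7, 6}; // 3x3, row-major
    IndexedMaxHeap h(m);
    std::cout << h.max() << "\n";  // 9
    h.update(1 * n + 0, 0.5);      // a row/column update calls this 2n-1 times
    std::cout << h.max() << "\n";  // 8
}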
I'd use an STL vector of the indices i*n+j. Keep this n^2-sized array sorted using your own compare function. Sorting after an update is O(n^2 log n^2). Querying the maximum is asking for the first element of the vector.
Edit
If you're interested only in the maximum value, you can keep its position (i,j) cached. When the matrix is updated, it will need to be sorted again only if this cached position is affected.

Hash function to map unique array of int to an index of range 0..n

I need to map arrays of sorted integers, with lengths varying from 1 to 4 at most, to indexes in a global array: e.g. [13,24,32] becomes a number in the range 0..n, with no other array mapping to that same number.
The quantity of arrays is a few million, and the mapping has to be "unique" (or at least have very few collisions for the smaller arrays), because these arrays represent itemsets and I use the itemsets of size k-1 to build those of size k.
My current implementation uses an efficient hash function that produces a double between 0 and 1 for an array, and I store the itemsets in an STL map with the double as the key. It comes from this article:
N. D. Atreas and C. Karanikas, "A faster pattern matching algorithm based on prime numbers and hashing approximation", 2007.
I'm going to implement a parallel version of this in CUDA, so I can't use something like an STL map. I could easily create a self-balancing binary search tree as a map in GPU global memory, but that would be really slow. So, in order to reduce global memory accesses to a minimum, I need to map each itemset to a slot in a huge array in global memory.
I've tried casting the double to a long integer and hashing it with a 64-bit hash function, but that produces some collisions, as expected.
So I ask: is there a "unique" hash function for doubles between 0 and 1, or for arrays of integers of size 1 to 4, that gives a unique index into a table of size N?
If I make this assumption about your arrays:
each item of your arrays (such as 13) is a 32-bit integer,
Then what you ask is impossible.
You have at least 2^(32*4) possible values, or 128 bits. And you are trying to pack those into an array of much smaller size (20 bits for one million entries). You cannot do this without collisions (or there being some agreement amongst the elements, such as each element choosing the "next available index", but that's not a hash).
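If a non-unique hash is acceptable in practice, a simple byte-wise combiner over the (at most 4) integers is enough; the sketch below uses FNV-1a purely as an illustration, and by the pigeonhole argument above the table must still resolve collisions (e.g. by storing the key next to the value). The table size N here is an assumed example.

#include <cstdint>
#include <cstdio>
#include <vector>

// FNV-1a over the bytes of each integer in the (sorted) itemset.
std::uint64_t fnv1a(const std::vector<int>& items) {
    std::uint64_t h = 1469598103934665603ULL;  // FNV offset basis
    for (int v : items)
        for (int b = 0; b < 4; ++b) {          // hash each byte of each int
            h ^= (std::uint64_t)((v >> (8 * b)) & 0xFF);
            h *= 1099511628211ULL;             // FNV prime
        }
    return h;
}

int main() {
    const std::uint64_t N = 1 << 20;           // assumed ~1M-slot table size
    std::vector<int> itemset = {13, 24, 32};
    std::printf("slot %llu\n", (unsigned long long)(fnv1a(itemset) % N));
}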

Fast Algorithm for finding largest values in 2d array

I have a 2D array (an image actually) that is size N x N. I need to find the indices of the M largest values in the array ( M << N x N) . Linearized index or the 2D coords are both fine. The array must remain intact (since it's an image). I can make a copy for scratch, but sorting the array will bugger up the indices.
I'm fine with doing a full pass over the array (i.e. O(N^2) is fine). Anyone have a good algorithm for doing this as efficiently as possible?
Selection is sorting's austere sister (repeat this ten times in a row). Selection algorithms are less known than sort algorithms, but nonetheless useful.
You can't do better than O(N^2) (in N) here, since nothing lets you rule out having to visit each element of the array.
A good approach is to keep a priority queue made of the M largest elements. This makes the whole pass O(N x N x log M).
You traverse the array, enqueuing (element, index) pairs as you go. The queue keeps its elements sorted by first component.
Once the queue has M elements, instead of enqueuing you now:
Query the min element of the queue
If the current element of the array is greater, insert it into the queue and discard the min element of the queue
Else do nothing.
If M is large (comparable to N x N), sorting a copy of the array is preferable.
NOTE: @Andy Finkenstadt makes a good point (in the comments to your question): you definitely should traverse your array in the "direction of data locality", i.e. make sure that you read memory contiguously.
Also, this is trivially parallelizable; the only non-parallelizable part is merging the per-chunk queues when joining the sub-processes.
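A minimal sketch of this min-heap-of-M scan, assuming a row-major N x N image flattened into a std::vector (the helper name topM is illustrative):

#include <cstddef>
#include <functional>
#include <iostream>
#include <queue>
#include <utility>
#include <vector>

// Returns the linearized indices (r*n + c) of the m largest values,
// keeping a min-heap of the best m (value, index) pairs seen so far.
std::vector<std::size_t> topM(const std::vector<double>& img,
                              std::size_t n, std::size_t m) {
    using Entry = std::pair<double, std::size_t>;  // (value, index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> q;
    for (std::size_t i = 0; i < n * n; ++i) {      // one contiguous pass
        if (q.size() < m)
            q.push({img[i], i});
        else if (img[i] > q.top().first) {         // beats the current minimum
            q.pop();
            q.push({img[i], i});
        }
    }
    std::vector<std::size_t> out;
    while (!q.empty()) { out.push_back(q.top().second); q.pop(); }
    return out;                                    // ascending by value
}

int main() {
    std::vector<double> img = {1, 9, 3,
                               7, 5, 8,
                               2, 6, 4};           // 3x3, row-major
    for (std::size_t idx : topM(img, 3, 2))
        std::cout << idx / 3 << "," << idx % 3 << "  "; // prints 1,2  0,1
}

Note that the scan itself reads the image contiguously, in line with the data-locality remark above.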
You could copy the array into a single-dimensional array of tuples (value, original X, original Y) and build a basic heap out of it in O(n) time, provided you implement the heap as an array.
You could then retrieve the M largest tuples in O(M lg n) time and reference their original x and y from the tuple.
If you are going to make a copy of the input array in order to do a sort, that's way worse than just walking linearly through the whole thing to pick out numbers.
So the question is how big is your M? If it is small, you can store results (i.e. structs with 2D indexes and values) in a simple array or a vector. That'll minimize heap operations but when you find a larger value than what's in your vector, you'll have to shift things around.
If you expect M to get really large, then you may need a better data structure such as a binary tree (std::set) or a sorted std::deque. std::set will reduce the number of times elements must be shifted in memory, while std::deque will still do some shifting but will significantly reduce the number of heap allocations, which may give you better performance.
Your problem doesn't use the 2 dimensions in any interesting way, so it is easier to consider the equivalent problem on a 1D array.
There are 2 main ways to solve this problem:
Maintain a set of the M largest elements and iterate through the array (using a heap allows you to do this efficiently).
This is simple and is probably better in your case (M << N)
Use selection (the following algorithm is an adaptation of quicksort); a sketch using std::nth_element appears after this list:
Create an auxiliary array containing the indexes [1..N] of the flattened array.
Choose an arbitrary index (and its corresponding value), and partition the index array so that indexes corresponding to smaller elements go to the left and those corresponding to bigger elements go to the right.
Repeat the process, binary-search style, until you narrow down the M largest elements.
This is good for cases with a large M. If you want to avoid the worst-case issues (the same ones quicksort has), then look at more advanced algorithms, like median-of-medians selection.
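Here is a minimal sketch of that selection approach using std::nth_element (which implements introselect) over an auxiliary index array; the original image stays untouched.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> img = {1, 9, 3,
                               7, 5, 8,
                               2, 6, 4};      // 3x3, row-major
    const std::size_t m = 2;
    std::vector<std::size_t> idx(img.size());
    std::iota(idx.begin(), idx.end(), 0);     // auxiliary index array
    std::nth_element(idx.begin(), idx.begin() + m, idx.end(),
                     [&](std::size_t a, std::size_t b) { return img[a] > img[b]; });
    for (std::size_t k = 0; k < m; ++k)       // first m indices are the m largest,
        std::cout << idx[k] / 3 << "," << idx[k] % 3 << "  "; // in unspecified order
}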
How many times do you search for the largest value from the array?
If you only search once, then just scan through it, keeping the M largest ones.
If you do it many times, just insert the values into a sorted list (probably best implemented as a balanced tree).