I have two datasets like below:
set1: 57.5276 55.3756 24.2798 54.5989
and
set2: 55.1118 55.004 24.824 57.1398
Now I want to arrange the second set so that it matches the first set as closely as possible (I mean in this order: 57.1398 55.1118 24.824 55.004).
How can I do that in C++?
Arrange the second set such that it is ordered in the same way as the first set.
More specifically, the first set goes from the greatest number (57.5276) to the second greatest number (55.3756) to the fourth greatest to the third greatest.
Arrange the second set in the same way: greatest (57.1398), second greatest (55.1118), fourth greatest (24.824), third greatest (55.004), in that order. This would minimize the average difference between items at the same index.
Programmatically, a simple way to implement this would be to sort both sets, find the sorted rank of each number in the first set, and arrange the second set in that same rank order.
Matching "closest" should be specified a little better. That could be, e.g., "minimum squared error", "minimum absolute error", "maximum correlation", which all would give different results.
Depending on what is meant by "closest", you may have to go through all permutations of set2, which would be expensive. If you want to go with the "sort both sets" solution, then a possible way to achieve this in C++ while retaining the order of the first set is to create a vector of indices into set1 and sort it based on the values in set1:
#include <algorithm>
#include <numeric>
#include <vector>

std::vector<double> set1{57.5276, 55.3756, 24.2798, 54.5989};

// set1index[r] ends up holding the index of the (r+1)-th smallest value in set1.
std::vector<size_t> set1index(set1.size());
std::iota(set1index.begin(), set1index.end(), 0);
std::sort(set1index.begin(), set1index.end(),
          [&](size_t a, size_t b){ return set1[a] < set1[b]; });
You can then sort set2 and use the indices to recreate the order of set1 with the values of set2.
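For completeness, here is a minimal end-to-end sketch of that idea (my own completion of the snippet above; result and the loop variable r are names I chose):

#include <algorithm>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> set1{57.5276, 55.3756, 24.2798, 54.5989};
    std::vector<double> set2{55.1118, 55.004, 24.824, 57.1398};

    // Rank permutation of set1: set1index[r] is the index of the
    // (r+1)-th smallest value in set1.
    std::vector<size_t> set1index(set1.size());
    std::iota(set1index.begin(), set1index.end(), 0);
    std::sort(set1index.begin(), set1index.end(),
              [&](size_t a, size_t b){ return set1[a] < set1[b]; });

    // Sort set2 ascending, then place its r-th smallest value at the
    // position where set1's r-th smallest value sits.
    std::sort(set2.begin(), set2.end());
    std::vector<double> result(set2.size());
    for (size_t r = 0; r < set2.size(); ++r)
        result[set1index[r]] = set2[r];

    // result is now {57.1398, 55.1118, 24.824, 55.004}.
}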
Suppose we have an array of length 10, like [1,1,1,1,1,1,1,1,1,1]. After multiple range queries, I want to update this array in this manner:
update(2,5) [1,2,2,2,2,1,1,1,1,1]
update(3,4) [1,2,3,3,2,1,1,1,1,1]
update(1,3) [2,3,4,3,2,1,1,1,1,1]
update(5,6) [2,3,4,3,3,2,1,1,1,1]
and so on.
In short, the update function increases all values within the given range by 1.
It's not necessary to print the array after each update; I want the final array after all Q queries. So is there an efficient way to do this?
I have already tried the naive approach, which takes O(n^2) time.
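For reference (this is my own sketch, not from the original post): the standard way to get this down to O(n + Q) is a difference array, where each update touches only two cells and one prefix-sum pass at the end reconstructs the whole array:

#include <iostream>
#include <vector>

int main() {
    const int n = 10;
    std::vector<long long> diff(n + 2, 0);
    diff[1] += 1;                  // the array starts as all 1s: +1 from index 1 on

    int q;
    std::cin >> q;                 // number of queries
    while (q--) {
        int l, r;                  // 1-based inclusive range, as in update(l, r)
        std::cin >> l >> r;
        ++diff[l];                 // +1 takes effect from l onward...
        --diff[r + 1];             // ...and is cancelled after r
    }

    // One prefix-sum pass yields the final array in O(n).
    long long running = 0;
    for (int i = 1; i <= n; ++i) {
        running += diff[i];
        std::cout << running << (i < n ? ' ' : '\n');
    }
}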
High level overview with simple integer order value to get my point across:
id (primary) | order (sort) | attributes ..
-------------|--------------|---------------
ft8df34gfx   | 1            | ...
ft8df34gfx   | 2            | ...
ft8df34gfx   | 3            | ...
ft8df34gfx   | 4            | ...
ft8df34gfx   | 5            | ...
Usually it would be easy to change the order (e.g. if the user drags and drops list items on the front-end): shift the item around, calculate new order values, and update the affected items in the db with the new order.
Constraints:
We don't have all the items at once, only a subset of them (think pagination)
Update only a single item in the db when a single item is moved (one write per shift)
My initial idea:
Use the epoch as the order and append something unique to avoid duplicate epoch times, e.g. <epoch>#<something-unique-to-item>. The initial value is the insertion time (the default order is therefore newest first).
The client/server (whoever calculates the order) knows the epoch for each item in the subset of items it has.
If an item is shifted, look at the epochs of the previous and next items (if they exist - the item could be moved to first or last), pick a value in between, and update. More than one shift? Repeat the process.
But...
If items are shifted enough times, the epoch values get closer and closer together until you can't find a middle ground with whole integers.
Add lots of zeroes to the epoch on insert? You still reach the limit at some point.
If an item is shifted to first or last and there are items on the previous or next page (remember, pagination), we don't know those values and can't reliably find a "value between".
Fetch one extra hidden item from the previous and next pages? Querying gets complicated.
Is this even possible? What type/value should I use as order?
DynamoDB does not allow the primary partition and sort keys to be changed for a particular item (to change them, the item would need to be deleted and recreated with the new key values), so you'll probably want to use a local or global secondary index instead.
Assuming the partition/sort keys you're mentioning are for a secondary index, I recommend storing natural numbers for the order (1, 2, 3, etc.) and then updating them as needed.
Effectively, you would have three cases to consider:
Adding a new item - You would perform a query on the secondary partition key with ScanIndexForward = false (to reverse the results order), with a projection on the "order" attribute, limited to 1 result. That will give you the maximum order value so far; the new item's order will just be this maximum value + 1 (a query sketch follows this list).
Removing an item - It may seem unsettling at first, but you can freely remove items without touching the orders of the other items. You may have some holes in your ordering sequence, but that's ok.
Changing the order - There's not really a way around it; your application logic will need to take the list of affected items and write all of their new orders to the table. If the items used to be (A, 1), (B, 2), (C, 3) and they get changed to A, C, B, you'll need to write to both B and C to update their orders accordingly so they end up as (A, 1), (C, 2), (B, 3).
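To illustrate the first case, here is a hedged sketch using the AWS SDK for C++. The table name "items", index name "order-index", and key names are placeholders I chose; note that order is a DynamoDB reserved word, so the projection needs an expression attribute name:

#include <aws/core/Aws.h>
#include <aws/dynamodb/DynamoDBClient.h>
#include <aws/dynamodb/model/AttributeValue.h>
#include <aws/dynamodb/model/QueryRequest.h>
#include <iostream>

int main() {
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::DynamoDB::DynamoDBClient client;

        // Fetch the current maximum "order" for one partition key.
        Aws::DynamoDB::Model::QueryRequest request;
        request.SetTableName("items");                 // placeholder name
        request.SetIndexName("order-index");           // the secondary index
        request.SetKeyConditionExpression("id = :id");
        request.AddExpressionAttributeValues(
            ":id", Aws::DynamoDB::Model::AttributeValue("ft8df34gfx"));
        request.SetProjectionExpression("#o");
        request.AddExpressionAttributeNames("#o", "order");  // "order" is reserved
        request.SetScanIndexForward(false);            // descending by sort key
        request.SetLimit(1);                           // only the maximum is needed

        const auto outcome = client.Query(request);
        if (outcome.IsSuccess() && !outcome.GetResult().GetItems().empty()) {
            const auto& item = outcome.GetResult().GetItems()[0];
            // The new item's order would be this maximum value + 1.
            std::cout << "max order: " << item.at("order").GetN() << "\n";
        }
    }
    Aws::ShutdownAPI(options);
}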
There is a table which grows as
1,1
1,1,2
1,1,3,3
1,1,4,4,6
1,1,5,5,10,10
1,1,6,6,15,15,20
... and so on.
If I want to find a specific element of the table, e.g. the 4th element of the 6th row, then the answer will be 6. But how do I find the nth element of the mth row for any n >= 1, m >= 1?
These numbers look like binomial coefficients, so this "table" could be Pascal's triangle row-wise re-ordered by size.
Though, this is just one of the infinitely many "tables" that would start like this. If you don't name a specific production rule or another way to deduce arbitrary values of the "table", there's no way of telling for sure which of those infinitely many "tables" you have here.
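If it really is Pascal's triangle with each row re-ordered ascending, then the equal pairs C(m,k) = C(m,m-k) line up so that the nth element (1-based) of the mth row is C(m, floor((n-1)/2)). A small sketch under that assumption:

#include <cstdint>
#include <iostream>

// C(m, k) via the multiplicative formula; exact for moderate m,
// overflows 64 bits for large central coefficients.
std::uint64_t binom(unsigned m, unsigned k) {
    if (k > m - k) k = m - k;
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i)
        result = result * (m - k + i) / i;   // each partial product is integral
    return result;
}

// nth element of the mth row (both 1-based) under the
// "sorted Pascal row" interpretation.
std::uint64_t element(unsigned m, unsigned n) {
    return binom(m, (n - 1) / 2);
}

int main() {
    std::cout << element(6, 4) << "\n";      // prints 6, as in the question
}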
I assume you want to hold the values in a kind of table without wasting memory by, for example, giving each line more slots than necessary.
To do that I'd suggest a vector of vectors (assuming your values are integers):
std::vector< std::vector<int> > table;
Provided you are sure that a value at (m, n) exists you can get it with:
int value = table[m][n];
(Note that m and n count from 0.)
If you're not sure use the safer
int value = table.at(m).at(n);
which will throw an exception if (m, n) doesn't exist.
To add a row you could call
table.resize(table.size() + 1);
and to add a column to a row
table[m].resize(table[m].size() + 1);
I'd recommend putting the table into the protected or private section of a dedicated class and adding functions to access the elements as needed.
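A minimal sketch of such a wrapper class (my own naming; it just packages the calls shown above):

#include <cstddef>
#include <vector>

class Table {
public:
    // Append an empty row and return its index.
    std::size_t addRow() {
        table_.resize(table_.size() + 1);
        return table_.size() - 1;
    }

    // Append a value to row m (throws if the row doesn't exist).
    void append(std::size_t m, int value) {
        table_.at(m).push_back(value);
    }

    // Bounds-checked access; throws std::out_of_range if (m, n) doesn't exist.
    int at(std::size_t m, std::size_t n) const {
        return table_.at(m).at(n);
    }

private:
    std::vector< std::vector<int> > table_;
};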
I need to keep data of the following form:
(a,b,1),
(c,d,2),
(e,f,3),
(g,h,4),
(i,j,5),
(k,l,6),
(m,a,7)
...
such that the integers within the data (3rd column) are consecutive and unique. There are 2,954,208,208 such rows. I am searching for a data structure which returns the value of the 3rd column given the values of the first two columns, e.g.
Given: (i,j) it returns 5
And given the value of the 3rd column, the first two columns can be retrieved. For example,
Given: 5 it returns (i,j)
Is there some data structure which may help me achieve this?
My approach to solving this problem was to use hash maps, but hash maps do not turn out to be efficient. Is there some other way out?
The values in the first, second, and third columns are all 64-bit.
I have 4GB of RAM.
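For what it's worth, the obvious in-memory layout would pair a hash map for (column 1, column 2) -> column 3 with a plain array for the reverse direction, since the third column is consecutive and can serve as an index. A sketch of that layout (my own illustration; note that at 2,954,208,208 rows of three 64-bit values the raw data alone is roughly 66 GiB, so with 4 GB of RAM a disk-backed variant would be needed in practice):

#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Key: the first two 64-bit columns.
struct Key {
    std::uint64_t first, second;
    bool operator==(const Key& o) const {
        return first == o.first && second == o.second;
    }
};

// Simple hash combining both 64-bit halves.
struct KeyHash {
    std::size_t operator()(const Key& k) const {
        return std::hash<std::uint64_t>{}(k.first) ^
               (std::hash<std::uint64_t>{}(k.second) * 0x9e3779b97f4a7c15ULL);
    }
};

int main() {
    std::unordered_map<Key, std::uint64_t, KeyHash> forward;  // (a,b) -> id
    std::vector<Key> backward;  // ids are consecutive: id k lives at index k-1

    // Store the row (i, j, 5) from the example (9 and 10 are placeholder values).
    const Key ij{9, 10};
    backward.resize(5);
    backward[5 - 1] = ij;
    forward[ij] = 5;

    std::uint64_t id = forward[ij];   // -> 5
    Key pair = backward[5 - 1];       // -> (i, j)
    (void)id; (void)pair;
}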
I'm not a specialist in signal processing. I'm doing simple processing on a 1D signal using C++. I really want to know how I can determine the part that has the highest zero-crossing rate (the highest frequency!). Is there a simple way or method to tell the beginning and the end of this part?
This image illustrates the form of my signal, and this image shows what I need to do (two indices, for the beginning and the end).
Edited:
Actually, I have no prior idea about the width of the beginning and the end; it's quite variable.
I can calculate the number of zero crossings, but I have no idea how to define their range:
int calculateZC(const std::vector<double>& signals){
    int ZC_counter = 0;
    const int size = static_cast<int>(signals.size());
    for (int i = 0; i < size - 1; i++){
        // A crossing occurs whenever consecutive samples differ in sign.
        if ((signals[i] >= 0 && signals[i+1] < 0) || (signals[i] < 0 && signals[i+1] >= 0)){
            ZC_counter++;
        }
    }
    return ZC_counter;
}
Here is a fairly simple strategy which might give you a starting point. The outline of the algorithm is as follows:
Input: Vector of your data points {y0,y1,...}
Parameters:
Window size sigma.
A threshold 0<p<1 defining when to start looking for a region.
Output: The start and end points {t0, t1} of the region with the most zero-crossings
I won't give any C++ code, but the method should be easy to implement. As an example, let us use a test signal (originally shown as a plot).
What we want is the region between about 480 and 600, where the zero-crossing density is higher than in the front part. The first step of the algorithm is to calculate the positions of the zeros. You can do this with what you already have, but instead of counting, you store the values of i at which you met a zero.
This will give you a list of zero positions
From this list (you can do this directly in the above for-loop!) you create a list having the same size as your input data which looks like {0,0,0,...,1,0,..,1,0,..}. Every zero-crossing position in your input data is marked with a 1.
The next step is to smooth this list with a smoothing filter of size sigma. Here you can use whatever you like; in the simplest case a moving average, or a Gaussian filter. The larger you choose sigma, the bigger the look-around window becomes which measures how many zero-crossings are around a certain point. For the example I used a Gaussian filter of size 10 (the plot of the filter output together with the original zero positions is not reproduced here).
In the next step, you go through the filtered data and find the maximum value; in this case it is about 0.15. Now you choose your second parameter, some percentage p of this maximum, say p=0.6.
The final step is to go through the filtered data: when a value exceeds the threshold p times the maximum, you start a new region; as soon as it drops below the threshold, you end the region and remember its start and end point. Once you have walked through all the data, you are left with a list of regions, each defined by a start and an end point. Now you choose the region with the biggest extent and you are done.
(Optionally, you could add the filter size to each end of the final region)
For the above example, I get 11 regions as follows
{{164,173},{196,205},{220,230},{241,252},{259,271},{278,290},
{297,309},{318,327},{341,350},{458,468},{476,590}}
where the one with the biggest extent is the last one, {476,590}. The final result (with half a filter width of padding at each end) was shown as a plot.
Conclusion
Please don't be discouraged by the length of my answer. I tried to explain everything in detail. The implementation is really just some loops:
one loop to create the zero-crossings list {0,0,..,1,0,...}
one nested loop for the moving-average filter (or use a library's Gaussian filter); here you can extract the maximum value at the same time
one loop to extract all regions
one loop to extract the largest region if you haven't already extracted it in the above step
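Although the answer above deliberately gives no C++ code, here is my own illustration of those loops, using a centered moving average in place of the Gaussian filter (the function and parameter names are mine):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns {start, end} of the region with the densest zero-crossings.
// sigma is the filter half-width, p the threshold fraction of the maximum.
std::pair<std::size_t, std::size_t>
densestZeroCrossings(const std::vector<double>& y, std::size_t sigma, double p) {
    const std::size_t n = y.size();
    if (n < 2) return {0, 0};

    // Step 1: mark every zero-crossing position with a 1.
    std::vector<double> marks(n, 0.0);
    for (std::size_t i = 0; i + 1 < n; ++i)
        if ((y[i] >= 0 && y[i + 1] < 0) || (y[i] < 0 && y[i + 1] >= 0))
            marks[i] = 1.0;

    // Step 2: smooth the marks with a moving average of half-width sigma.
    std::vector<double> density(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t lo = (i > sigma) ? i - sigma : 0;
        const std::size_t hi = std::min(n - 1, i + sigma);
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j) sum += marks[j];
        density[i] = sum / static_cast<double>(hi - lo + 1);
    }

    // Step 3: threshold at p times the maximum density.
    const double threshold =
        p * *std::max_element(density.begin(), density.end());

    // Step 4: collect contiguous regions above the threshold, keep the longest.
    std::pair<std::size_t, std::size_t> best{0, 0};
    std::size_t start = 0;
    bool inside = false;
    for (std::size_t i = 0; i < n; ++i) {
        const bool above = density[i] > threshold;
        if (above && !inside) { start = i; inside = true; }
        if (inside && (!above || i == n - 1)) {
            const std::size_t end = above ? i : i - 1;
            if (end - start > best.second - best.first) best = {start, end};
            inside = false;
        }
    }
    return best;  // optionally widen by sigma on each side
}

With sigma = 10 and p = 0.6 as in the walk-through, this should land near the {476,590} region, though a moving average will not match the Gaussian output exactly.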