there is a table which grows as
1,1
1,1,2
1,1,3,3
1,1,4,4,6
1,1,5,5,10,10
1,1,6,6,15,15,20
.....and so on
If i want to find an specific element of the table like if I want to find 4th element of 6th row then the answer will be 6 but if I want to find the nth element of mth row for any n>=1 m>=1 then how to do it?
These numbers look like binomial coefficients, so this "table" could be Pascal's triangle row-wise re-ordered by size.
Though, this is just one of the infinitely many "tables" that'd start like this. If you don't name a specific production rule or another way to deduce arbitrary values of the "table", there's no way telling for sure which of those infinitely many "tables" you have here.
I assume you want to hold the values in a kind of table without wasting memory by for example giving each line more slots than necessary.
To do that I'd suggest a vector of vectors (assuming your values are integers):
std::vector< std::vector<int> > table;
Provided you are sure that a value at (m, n) exists you can get it with:
int value = table[m][n];
(Note that m and n count from 0.)
If you're not sure use the safer
int value = table.at(m).at(n);
which will throw an exception if (m, n) doesn't exist.
To add a row you could call
table.resize(table.size() + 1);
and to add a column to a row
table[m].resize(table[m].size() + 1);
I'd recommend to put the table into the protected or private section of a special class and add functions to access the elements as needed.
Related
Assume I have the following array of objects:
Object 0:
[0]=1.1344
[1]=2.18
...
[N]=1.86
-----------
Object 1 :
[0]=1.1231
[1]=2.16781
...
[N]=1.8765
-------------
Object 2 :
[0]=1.2311
[1]=2.14781
...
[N]=1.5465
--------
Object 17:
[0]=1.31
[1]=2.55
...
[N]=0.75
How can I compare those objects?
You can see that object 0 and object 1 are very similar but object 17 not like any of them.
I would like to have algorithm tha twill give me all the similar object in my array
You tag this question with Algorithm (and I am not expert in C++) so lets give a pseudo code.
First, you should set a threshold which define 2 var with different under that threshold as similar. Second step will be to loop over all pair of elements and check for similarity.
Consider A to be array with n objects and m to be number of fields in each object.
threshold = 0.1
for i in (0, n):
for j in (i+1,n):
flag = true;
for k in (1,m):
if (abs(A[i][k] - A[j][k]) > threshold)
flag = false // if the absolute value of the diff is above the threshold object are not similar
break // no need to continue checks
if (flag)
print: element i and j similar // and do what ever
Time complexity is O(m * n^2).
Notice that you can use the same algorithm to sort the objects array - declare compare function as the max diff between field and then sort accordingly.
Hope that helps!
Your problem essentially boils down to nearest neighbor search which is a well researched problem in data mining.
There are diffent approaches to this problem.
I would suggest to decide first what number of similar elements you want OR to set a given threshold for the similarity. Than you have to iterate through all the vectors and compute a distance function between the query vector and each vector in the database.
I would suggest you to use Euclidean distance in your case since you have real nominal data.
You can read more about the topic of nearest neighbor search and Euclidean distancehere and here. Good luck!
What you need is a classifier, for your problem there are 2 algorithms depends on what you wanted.
If you need to find which object is most similar to the choosen object-m, you can use nearest neighbor algorithm or else if you need to find similar sets of objects you can use k-means algorithm to find k sets.
I have two datasets like below:
set1: 57.5276 55.3756 24.2798 54.5989
and
set2: 55.1118 55.004 24.824 57.1398
Now I want to arrange the second set such that it matches closest to the first set (I mean 57.1398 55.1118 24.824 55.004 this order).
How Can I do that in C++.
Arrange the second set such that it is ordered in the same way as the first set.
More specifically, the first set goes from the greatest number (57.5276) to the second greatest number (55.3756) to the fourth greatest to the third greatest.
Arrange the second set in the same way. Greatest (57.1398), second greatest (55.004), fourth greatest (24.824), third greatest (57.1398) in that order. This would minimize the average difference between items of the same index.
Programatically a simple way to implement this would be to sort both sets, then to find the sorted index of each number in the first set and arrange the second set in that same order.
Matching "closest" should be specified a little better. That could be, e.g., "minimum squared error", "minimum absolute error", "maximum correlation", which all would give different results.
Depending on what is meant with "closest", you may have to go through all permutations of set2, which would be expensive. If you want to go with the "sort both sets" solution, then a possible way to achieve this in c++ while retaining the order of the first set is to create a vector of indices into set1 and sorting this based on the values in set1
std::vector<double> set1{{57.5276,55.3756,24.2798,54.5989}};
std::vector<size_t> set1index(set1.size());
std::iota(set1index.begin(), set1index.end(), 0);
std::sort(set1index.begin(), set1index.end(), [&](size_t a, size_t b){return set1[a] < set1[b];});
You can then sort set2 and use the indices to recreate the order of set1 with the values of set2.
I have a model that has one attribute with a list of floats:
values = ArrayField(models.FloatField(default=0), default=list, size=64, verbose_name=_('Values'))
Currently, I'm getting my entries and order them according to the sum of all diffs with another list:
def diff(l1, l2):
return sum([abs(v1-v2) for v1, v2 in zip(l1, l2)])
list2 = [0.3, 0, 1, 0.5]
entries = Model.objects.all()
entries.sort(key=lambda t: diff(t.values, list2))
This works fast if my numer of entries is very slow small. But I'm afraid with a large number of entries, the comparison and sorting of all the entries will get slow since they have to be loaded from the database. Is there a way to make this more efficient?
best way is to write it yourself, right now you are iterating over a list over 4 times!
although this approach looks pretty but it's not good.
one thing that you can do is:
have a variable called last_diff and set it to 0
iterate through all entries.
iterate though each entry.values
from i = 0 to the end of list, calculate abs(entry.values[i]-list2[i])
sum over these values in a variable called new_diff
if new_diff > last_diff break from inner loop and push the entry into its right place (it's called Insertion Sort, check it out!)
in this way, in average scenario, time complexity is much lower than what you are doing now!
and maybe you must be creative too. I'm gonna share some ideas, check them for yourself to make sure that they are fine.
assuming that:
values list elements are always positive floats.
list2 is always the same for all entries.
then you may be able to say, the bigger the sum over the elements in values, the bigger the diff value is gonna be, no matter what are the elements in list2.
then you might be able to just forget about whole diff function. (test this!)
The only way to makes this really go faster, is to move as much work as possible to the database, i.e. the calculations and the sorting. It wasn't easy, but with the help of this answer I managed to actually write a query for that in almost pure Django:
class Unnest(models.Func):
function = 'UNNEST'
class Abs(models.Func):
function = 'ABS'
class SubquerySum(models.Subquery):
template = '(SELECT sum(%(field)s) FROM (%(subquery)s) _sum)'
x = [0.3, 0, 1, 0.5]
pairdiffs = Model.objects.filter(pk=models.OuterRef('pk')).annotate(
pairdiff=Abs(Unnest('values')-Unnest(models.Value(x, ArrayField(models.FloatField())))),
).values('pairdiff')
entries = Model.objects.all().annotate(
diff=SubquerySum(pairdiffs, field='pairdiff')
).order_by('diff')
The unnest function turns each element of the values into a row. In this case it happens twice, but the two resulting columns are instantly subtracted and made positive. Still, there are as many rows per pk as there are values. These need to be summed, but that's not as easy as it sounds. The column can't be simply be aggregated. This was by far the most tricky part—even after fiddling with it for so long, I still don't quite understand why Postgres needs this indirection. Of the few options there are to make it work, I believe a subquery is the single one expressible in Django (and only as of 1.11).
Note that the above behaves exactly the same as with zip, i.e. the when one array is longer than the other, the remainder is ignored.
Further improvements
While it will be a lot faster already when you don't have to retrieve all rows anymore and loop over them in Python, it doesn't change yet that it results in a full table scan. All rows will have to be processed, every single time. You can do better, though. Have a look into the cube extension. Use it to calculate the L1 distance—at least, that seems what you're calculating—directly with the <#> operator. That will require the use of RawSQL or a custom Expression. Then add a GiST index on the SQL expression cube("values"), or directly on the field if you're able to change the type from float[] to cube. In case of the latter, you might have to implement your own CubeField too—I haven't found any package yet that provides it. In any case, with all that in place, top-N queries on the lowest distance will be fully indexed hence blazing fast.
I need to keep data of the following form:
(a,b,1),
(c,d,2),
(e,f,3),
(g,h,4),
(i,j,5),
(k,l,6),
(m,a,7)
...
such that the integers within the data (3rd column) are consecutively ordered and are unique. Also there are 2,954,208,208 such rows. I am searching for a data structure which returns the value of the 3rd column given the value of first two columns e.g.
Given: (i,j) it returns 5
And given the value of 3rd column, first two columns can be retrieved. For example,
Given: 5 it returns (a,b)
Is there some data structure which may help me achieve the same.
My approach towards solving this problem was to use hash-maps..but hash-maps do not turn out to be efficient. Is there some other way out.
The values in the first, second and third column are all of 64-bit.
I have 4GB of RAM.
Here is a recursive function that I'm trying to create that finds all the subsets passed in an STL set. the two params are an STL set to search for subjects, and a number i >= 0 which specifies how big the subsets should be. If the integer is bigger then the set, return empty subset
I don't think I'm doing this correctly. Sometimes it's right, sometimes its not. The stl set gets passed in fine.
list<set<int> > findSub(set<int>& inset, int i)
{
list<set<int> > the_list;
list<set<int> >::iterator el = the_list.begin();
if(inset.size()>i)
{
set<int> tmp_set;
for(int j(0); j<=i;j++)
{
set<int>::iterator first = inset.begin();
tmp_set.insert(*(first));
the_list.push_back(tmp_set);
inset.erase(first);
}
the_list.splice(el,findSub(inset,i));
}
return the_list;
}
From what I understand you are actually trying to generate all subsets of 'i' elements from a given set right ?
Modifying the input set is going to get you into trouble, you'd be better off not modifying it.
I think that the idea is simple enough, though I would say that you got it backwards. Since it looks like homework, i won't give you a C++ algorithm ;)
generate_subsets(set, sizeOfSubsets) # I assume sizeOfSubsets cannot be negative
# use a type that enforces this for god's sake!
if sizeOfSubsets is 0 then return {}
else if sizeOfSubsets is 1 then
result = []
for each element in set do result <- result + {element}
return result
else
result = []
baseSubsets = generate_subsets(set, sizeOfSubsets - 1)
for each subset in baseSubssets
for each element in set
if no element in subset then result <- result + { subset + element }
return result
The key points are:
generate the subsets of lower rank first, as you'll have to iterate over them
don't try to insert an element in a subset if it already is, it would give you a subset of incorrect size
Now, you'll have to understand this and transpose it to 'real' code.
I have been staring at this for several minutes and I can't figure out what your train of thought is for thinking that it would work. You are permanently removing several members of the input list before exploring every possible subset that they could participate in.
Try working out the solution you intend in pseudo-code and see if you can see the problem without the stl interfering.
It seems (I'm not native English) that what you could do is to compute power set (set of all subsets) and then select only subsets matching condition from it.
You can find methods how to calculate power set on Wikipedia Power set page and on Math Is Fun (link is in External links section on that Wikipedia page named Power Set from Math Is Fun and I cannot post it here directly because spam prevention mechanism). On math is fun mainly section It's binary.
I also can't see what this is supposed to achieve.
If this isn't homework with specific restrictions i'd simply suggest testing against a temporary std::set with std::includes().