Suppose we have an array of length 10 like [1,1,1,1,1,1,1,1,1,1]. After multiple range queries, I want the array to be updated in this manner:
update(2,5) [1,2,2,2,2,1,1,1,1,1]
update(3,4) [1,2,3,3,2,1,1,1,1,1]
update(1,3) [2,3,4,3,2,1,1,1,1,1]
update(5,6) [2,3,4,3,3,2,1,1,1,1]
and so on.
In short the update function will increase the values within the given range by 1.
It's not necessary to print the array after each update. I want to get the array after Q queries. So is there any efficient way to do this?
I already did the naive approach which took O(n^2) time.
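One common technique for this kind of problem is a difference array: record only where each increment starts and stops, then take a single running sum at the end. The sketch below assumes 1-based inclusive ranges, as in the example above; the name apply_range_updates is made up for illustration.

```python
def apply_range_updates(n, updates):
    """Start from n ones and add 1 to every 1-based inclusive range in updates."""
    # diff[i] holds the change in "number of covering ranges" when moving to index i.
    diff = [0] * (n + 1)
    for left, right in updates:
        diff[left - 1] += 1  # the increment starts at position left
        diff[right] -= 1     # and stops just after position right

    result = []
    running = 0
    for i in range(n):
        running += diff[i]          # how many ranges cover position i + 1
        result.append(1 + running)  # initial value 1 plus the increments
    return result


queries = [(2, 5), (3, 4), (1, 3), (5, 6)]
final = apply_range_updates(10, queries)
print(final)
```

Each update costs O(1) and the final pass costs O(n), so Q queries take O(n + Q) in total. The array is only available after the final pass, which fits the requirement of reading it once after all queries.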
I have a model that has one attribute with a list of floats:
values = ArrayField(models.FloatField(default=0), default=list, size=64, verbose_name=_('Values'))
Currently, I'm getting my entries and order them according to the sum of all diffs with another list:
def diff(l1, l2):
    return sum(abs(v1 - v2) for v1, v2 in zip(l1, l2))

list2 = [0.3, 0, 1, 0.5]
# A QuerySet has no sort() method, so load the rows into a list first.
entries = list(Model.objects.all())
entries.sort(key=lambda t: diff(t.values, list2))
This works fast if my number of entries is very small. But I'm afraid that with a large number of entries, the comparison and sorting of all the entries will get slow, since they have to be loaded from the database. Is there a way to make this more efficient?
The best way is to write it yourself; right now you are iterating over a list over 4 times!
Although this approach looks pretty, it's not good.
One thing that you can do is:
have a variable called last_diff and set it to 0
iterate through all entries.
iterate through each entry.values
from i = 0 to the end of the list, calculate abs(entry.values[i]-list2[i])
sum these values in a variable called new_diff
if new_diff > last_diff, break from the inner loop and push the entry into its right place (this is called Insertion Sort, check it out!)
This way, in the average scenario, the time complexity is much lower than what you are doing now!
You may need to be creative too. I'm going to share some ideas; check them for yourself to make sure that they are fine.
Assuming that:
the elements of the values list are always positive floats.
list2 is always the same for all entries.
then you may be able to say that the bigger the sum over the elements in values, the bigger the diff value is going to be, no matter what the elements in list2 are.
Then you might be able to just forget about the whole diff function. (Test this!)
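A quick way to test that idea is to compare the two orderings on a small example. The sketch below uses made-up values that satisfy both assumptions (all positive floats, one fixed list2) and the diff function from the question.

```python
def diff(l1, l2):
    return sum(abs(v1 - v2) for v1, v2 in zip(l1, l2))


# Made-up sample data: all elements are positive floats.
list2 = [1.0, 0.1]
a = [1.0, 0.1]  # larger element sum, identical to list2
b = [0.1, 0.2]  # smaller element sum, far from list2

order_by_sum = sorted([a, b], key=sum)
order_by_diff = sorted([a, b], key=lambda v: diff(v, list2))

print(order_by_sum == order_by_diff)
```

Here a has the larger sum but the smaller diff, so the two orderings disagree. At least for this data, sorting by the plain sum is not a safe replacement for sorting by diff.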
The only way to make this really go faster is to move as much work as possible to the database, i.e. the calculations and the sorting. It wasn't easy, but with the help of this answer I managed to actually write a query for that in almost pure Django:
from django.contrib.postgres.fields import ArrayField
from django.db import models

class Unnest(models.Func):
    function = 'UNNEST'

class Abs(models.Func):
    function = 'ABS'

class SubquerySum(models.Subquery):
    # Wrap the subquery so its rows can be summed into a single value.
    template = '(SELECT sum(%(field)s) FROM (%(subquery)s) _sum)'

x = [0.3, 0, 1, 0.5]
pairdiffs = Model.objects.filter(pk=models.OuterRef('pk')).annotate(
    pairdiff=Abs(Unnest('values') - Unnest(models.Value(x, ArrayField(models.FloatField())))),
).values('pairdiff')
entries = Model.objects.all().annotate(
    diff=SubquerySum(pairdiffs, field='pairdiff')
).order_by('diff')
The unnest function turns each element of values into a row. In this case it happens twice, but the two resulting columns are immediately subtracted and made positive. Still, there are as many rows per pk as there are values. These need to be summed, but that's not as easy as it sounds: the column can't simply be aggregated. This was by far the trickiest part. Even after fiddling with it for so long, I still don't quite understand why Postgres needs this indirection. Of the few options there are to make it work, I believe a subquery is the only one expressible in Django (and only as of 1.11).
Note that the above behaves exactly the same as with zip, i.e. when one array is longer than the other, the remainder is ignored.
Further improvements
While it will be a lot faster already when you don't have to retrieve all rows anymore and loop over them in Python, it doesn't change yet that it results in a full table scan. All rows will have to be processed, every single time. You can do better, though. Have a look into the cube extension. Use it to calculate the L1 distance—at least, that seems what you're calculating—directly with the <#> operator. That will require the use of RawSQL or a custom Expression. Then add a GiST index on the SQL expression cube("values"), or directly on the field if you're able to change the type from float[] to cube. In case of the latter, you might have to implement your own CubeField too—I haven't found any package yet that provides it. In any case, with all that in place, top-N queries on the lowest distance will be fully indexed hence blazing fast.
I am using boost::accumulators::tag::extended_p_square_quantile for calculating percentile. In this, I also need to feed probabilities to the accumulator so I did this m_acc = AccumulatorType(boost::accumulators::extended_p_square_probabilities = probs); where probs is a vector containing the probabilities.
The values in the probs vector are {0.5,0.3,0.9,0.7}
I provided some sample values to accumulator.
But when I try to get the percentile using boost::accumulators::quantile(m_acc, boost::accumulators::quantile_probability = probs[0]); it returns incorrect values and even nan sometimes.
What is wrong here?
I ran into this problem and wasted a lot of time figuring it out, so I want to answer this.
The problem is with the vector. It must be sorted in increasing order of its values.
Change the vector values to {0.3,0.5,0.7,0.9} and it will work as expected.
So anyone using tag::extended_p_square_quantile for percentiles (which supports multiple probabilities) needs to give the probabilities (vector/array/list) in sorted order.
This isn't the case with tag::p_square_quantile, because it accepts only one value (probability).
I have a table with the properties ReadingTime and Frequency, and I would like to insert 3 values between those records where the time difference is greater than 12 hours. I could determine the time difference using the "Time Difference" node, but could not insert rows as required. Is there any way to achieve this in KNIME?
In case you are using Time Generator in a chunk loop (with the lagged column and Use second column option on the Time Difference node), you can generate as many nodes as you want (I assume you already use some switches/if nodes).
I need to keep data of the following form:
(a,b,1),
(c,d,2),
(e,f,3),
(g,h,4),
(i,j,5),
(k,l,6),
(m,a,7)
...
such that the integers within the data (3rd column) are consecutively ordered and are unique. Also there are 2,954,208,208 such rows. I am searching for a data structure which returns the value of the 3rd column given the value of first two columns e.g.
Given: (i,j) it returns 5
And given the value of 3rd column, first two columns can be retrieved. For example,
Given: 5 it returns (i,j)
Is there some data structure which may help me achieve this?
My approach to solving this problem was to use hash maps, but they did not turn out to be efficient. Is there some other way?
The values in the first, second and third column are all of 64-bit.
I have 4GB of RAM.
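Because the third column is consecutive and unique, one possible layout needs no hash map at all: the row position gives the id, and a sorted order of the pairs gives the reverse direction via binary search. The sketch below shows the idea on the sample rows only and assumes each (first, second) pair occurs once; the names pair_for_id and id_for_pair are made up for illustration.

```python
from bisect import bisect_left

# Sample rows from above; the third column is implied by position (index + 1).
pairs = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h"),
         ("i", "j"), ("k", "l"), ("m", "a")]


def pair_for_id(n):
    """id -> (first, second): a plain array index, since ids are consecutive."""
    return pairs[n - 1]


# Build once: row positions ordered by their pair, plus the pairs in that order.
order = sorted(range(len(pairs)), key=lambda i: pairs[i])
sorted_pairs = [pairs[i] for i in order]


def id_for_pair(p):
    """(first, second) -> id: binary search over the sorted pairs."""
    k = bisect_left(sorted_pairs, p)
    if k < len(sorted_pairs) and sorted_pairs[k] == p:
        return order[k] + 1
    raise KeyError(p)


print(id_for_pair(("i", "j")), pair_for_id(5))
```

At full size this cannot live in memory as written: 2,954,208,208 rows with two 64-bit values each is roughly 47 GB for the pairs alone, so with 4GB of RAM the arrays would have to be stored on disk (for example as memory-mapped files), with lookups reading only the few entries they touch.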
I am writing code to do some template matching using cv::matchTemplate, but I have run into some problems with the 2-dimensional vector of vectors (vov) I created, which I have called vvABC. At the moment, my vov has 10 elements, which can change based on the values I pass when running the code.
My problem is moving from one column in my vov to the next so I can calculate the size. From my understanding of how vov works, if I have my elements stored in my vov as:
C_A  C_B
0    0
1    1
2    2
     3
     4
     5
     6
To calculate the size of each column, I should simply do something like:
vvABC[0].size() to get the size of the first column (which would give 3 in this case) and vvABC[1].size() to get the size of the second column (which would give 7). The problem I am now faced with is that both calls give '3', which is obviously wrong.
Can someone please help me out on how I can get the correct size of the next column?
I stored my detections in my vvABC, now I want to match them one at a time.
It seems like you made a mistake here:
for (uint iCaTemplate = iCa + 1; iCaTemplate < vvABC[iCa].size(); ++iCaTemplate) {
iCa is an index on the 'first level' of vector (of size 2 in your example above), i.e. columns, and you use it to go through the elements of the 'second level' of vector, i.e. rows.
Thanks a lot guys, esp. JGab. After several debug outputs, I finally found that my vector of vectors wasn't being filled up the way I thought it was. Thanks once more, and my apologies for my belated response.