O(1) Django ORM strategy to query related objects of related objects - django

The relationship between Foo and Bar is through Baz as follows:
class Foo(Model):
    pass  # stuff
class Bar(Model):
    pass  # stuff
class Baz(Model):
    foos = ManyToManyField("Foo")
    bar = ForeignKey("Bar")
I basically need to generate the following dict representing the Bars that are related to each Foo through Baz (in dict comprehension pseudo-code):
{ foo.id: [list of unique bars related to the foo through any baz] for foo in all foos}
I can currently generate my data structure with O(N) queries (one query per Foo), but with lots of data this is a bottleneck. I need it optimized to O(1): not necessarily a single query, but a fixed number of queries regardless of how much data any of the models holds, while also minimizing the number of passes over the data in Python.

If you can drop to SQL, you could use a single query (the app name should prefix all the table names):
select distinct foo.id, bar.id
from baz_foos
join baz on baz_foos.baz_id = baz.id
join foo on baz_foos.foo_id = foo.id
join bar on baz.bar_id = bar.id
baz_foos is the many-to-many table Django creates.
@Alasdair's solution is probably more readable (although if you're doing this for performance reasons, that might not be the most important thing). His solution uses exactly two queries, which is hardly a difference. The only problem I see is if you have a large number of Baz objects, since the generated SQL looks like this:
SELECT "foobar_baz"."id", "foobar_baz"."bar_id", "foobar_bar"."id"
FROM "foobar_baz"
INNER JOIN "foobar_bar" ON ("foobar_baz"."bar_id" = "foobar_bar"."id")
SELECT
("foobar_baz_foos"."baz_id") AS "_prefetch_related_val",
"foobar_foo"."id"
FROM "foobar_foo"
INNER JOIN "foobar_baz_foos" ON ("foobar_foo"."id" = "foobar_baz_foos"."foo_id")
WHERE "foobar_baz_foos"."baz_id" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94,
95, 96, 97, 98, 99, 100, 101)
If you have only a few Bars and a few hundred Foos, I would do:
from django.db import connection
from collections import defaultdict
# foos = {f.id: f for f in Foo.objects.all()}
bars = {b.id: b for b in Bar.objects.all()}
c = connection.cursor()
c.execute(sql) # from above
d = defaultdict(set)
for f_id, b_id in c.fetchall():
    d[f_id].add(bars[b_id])
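To try the single-query approach without a Django project, here is a minimal sketch against an in-memory SQLite database. The unprefixed table names and the sample rows are assumptions made up for illustration; in a real project the tables carry the app name as a prefix.

```python
import sqlite3
from collections import defaultdict

# Hypothetical, unprefixed tables standing in for the Django-generated ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY);
    CREATE TABLE bar (id INTEGER PRIMARY KEY);
    CREATE TABLE baz (id INTEGER PRIMARY KEY, bar_id INTEGER REFERENCES bar(id));
    CREATE TABLE baz_foos (baz_id INTEGER REFERENCES baz(id),
                           foo_id INTEGER REFERENCES foo(id));
    INSERT INTO foo (id) VALUES (1), (2), (3);
    INSERT INTO bar (id) VALUES (10), (20);
    INSERT INTO baz (id, bar_id) VALUES (100, 10), (101, 10), (102, 20);
    -- foo 1 reaches bar 10 twice (via baz 100 and 101) and bar 20 once.
    INSERT INTO baz_foos (baz_id, foo_id) VALUES (100, 1), (101, 1), (102, 1), (102, 2);
""")

sql = """
    SELECT DISTINCT foo.id, bar.id
    FROM baz_foos
    JOIN baz ON baz_foos.baz_id = baz.id
    JOIN foo ON baz_foos.foo_id = foo.id
    JOIN bar ON baz.bar_id = bar.id
"""

# One query and one pass over the result rows, whatever the table sizes.
bars_by_foo = defaultdict(set)
for foo_id, bar_id in conn.execute(sql):
    bars_by_foo[foo_id].add(bar_id)

print(dict(bars_by_foo))
```

Note that a Foo with no Baz (foo 3 here) does not appear in the result at all, so look it up with .get() or add the missing ids afterwards if you need an entry for every Foo.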

Using select_related and prefetch_related, I think you can build the required data structure with 2 queries:
out = {}
bazes = Baz.objects.select_related('bar').prefetch_related('foos')
for baz in bazes:
    for foo in baz.foos.all():
        out.setdefault(foo.id, set()).add(baz.bar)
The values of the output dictionary are sets, not lists as in your question, to ensure uniqueness.


Different behavior between Enum.chunk(arr, 3) and Enum.chunk_every(arr, 3)

I have a data structure that is a flat array of numbers:
[145, 46, 200, 3, 178, 206, 73, 228, 165, 65, 6, 141, 73, 90, 181, 100]
I need to make an array of arrays with a maximum of 3 items per sub-array. Looking at some examples, Enum.chunk(arr, n) seemed like a candidate.
Enum.chunk(arr, 3) is reported as deprecated in favor of Enum.chunk_every(arr, 3), so I used that instead, but it produces a strange result compared with chunk.
For example, chunk returns
[[145, 46, 200], [3, 178, 206], [73, 228, 165], [65, 6, 141], [73, 90, 181]]
while chunk_every returns
[[145, 46, 200],
[3, 178, 206],
[73, 228, 165],
[65, 6, 141],
[73, 90, 181],
'p']
The main difference is an extra, seemingly random element, which is a string?
It's almost as if it took the element that chunk cuts off and converted it to a string.
Naturally, I expected the replacement function to produce the same output for the same input. Right?
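Look at the last element, 100. chunk discards that value, while chunk_every adds it as a final chunk on its own. That single-element list is the character you see: Elixir prints a list of integers as a charlist when every element is a printable character code, because that is how charlists are represented internally. (Strictly, [100] prints as 'd'; 'p' is code 112, so the output shown presumably came from slightly different data.)
As the documentation shows, you can pass :discard as the leftover argument to get the behavior of the deprecated chunk function.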
https://hexdocs.pm/elixir/Enum.html#chunk_every/2
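The difference between keeping and discarding the leftover chunk can be sketched outside Elixir. The following Python is only an illustration of the two behaviors; chunk_every and its discard_leftover flag are hypothetical names invented here, not Elixir's API.

```python
def chunk_every(items, count, discard_leftover=False):
    """Split items into lists of `count` items (hypothetical helper)."""
    chunks = [items[i:i + count] for i in range(0, len(items), count)]
    # Drop a short final chunk only when asked to, like the old chunk did.
    if discard_leftover and chunks and len(chunks[-1]) < count:
        chunks.pop()
    return chunks

data = [145, 46, 200, 3, 178, 206, 73, 228, 165, 65, 6, 141, 73, 90, 181, 100]

kept = chunk_every(data, 3)                            # leftover kept
dropped = chunk_every(data, 3, discard_leftover=True)  # leftover discarded

print(kept[-1])            # [100]
print(dropped[-1])         # [73, 90, 181]
print(chr(100), ord("p"))  # d 112
```

The last line shows why a lone [100] can look like text: 100 is the character code of "d", and a "p" would be 112.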
Enum.chunk_every/4 behaves this way by design.
The last element is actually a number, as you can see if you inspect it like this:
[145, 46, 200, 3, 178, 206, 73, 228, 165, 65, 6, 141, 73, 90, 181, 100]
|> Enum.chunk_every(3, 3, [])
|> Enum.each(fn item ->
  IO.inspect item, charlists: :as_lists
end)
You can find more details in the official discussion:
https://github.com/elixir-lang/elixir/issues/7260
Sometimes it makes sense to implement basic functionality ourselves instead of relying on the standard library, to get a precisely controlled result.
Here is a recursive implementation that discards the tail.
input =
  [145, 46, 200, 3, 178, 206, 73, 228,
   165, 65, 6, 141, 73, 90, 181, 100]
defmodule MyEnum do
  def chunk_3(input), do: do_chunk_3(input, [])
  defp do_chunk_3([e1, e2, e3 | rest], acc),
    do: do_chunk_3(rest, [[e1, e2, e3] | acc])
  defp do_chunk_3(_, acc), do: Enum.reverse(acc)
end
MyEnum.chunk_3(input)
#⇒ [[145, 46, 200],
# [3, 178, 206],
# [73, 228, 165],
# [65, 6, 141],
# [73, 90, 181]]

Divide (and replace) numbers extracted from a string in Google Sheets

I'm trying to convert numbers that were previously percentages to a decimal format by dividing them by 100 in Google Sheets. Basically, I have:
<polygon points="48, 6, 43, 7, 38, 9, 34, 12, 29, 16, 24, 22, 22, 30, 22, 44, 23, 50, 23, 65, 25, 72, 28, 77, 32, 82, 35, 86, 40, 90, 43, 92, 50, 93, 55, 91, 62, 87, 70, 76, 74, 69, 75, 64, 75, 54, 74, 49, 74, 40, 74, 32, 71, 23, 66, 15, 59, 9, 53, 6" />
And I want:
<polygon points=".48, .06, .43, .07, .38, .09, .34, .12, .29, .16, .24, .22, .22, .30, .22, .44, .23, .50, .23, .65, .25, .72, .28, .77, .32, .82, .35, .86, .40, .90, .43, .92, .50, .93, .55, .91, .62, .87, .70, .76, .74, .69, .75, .64, .75, .54, .74, .49, .74, .40, .74, .32, .71, .23, .66, .15, .59, .09, .53, .06" />
Is there any way to extract numbers, do an operation on them, then replace them in the previous string? I tried to use a regex token in REGEXREPLACE but it doesn't seem to be supported.
=(REGEXREPLACE(A2,"[^[:digit:]]",($/10)))
You cannot apply any function to the string replacement pattern in REGEXREPLACE. In this concrete case, you may simply append a 0 before single-digit numbers and then add dots before each sequence of 1 or more digits:
=REGEXREPLACE(REGEXREPLACE(A1,"\b\d\b", "0$0"), "\d+", ".$0")
NOTES:
REGEXREPLACE(A1,"\b\d\b", "0$0") - finds a single digit that is neither preceded nor followed by a letter/digit/_, and adds a 0 in front of it ($0 is the placeholder for the whole match)
REGEXREPLACE(..., "\d+", ".$0") - prepends a dot to each run of one or more digits.
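To check the two-step replacement outside Sheets, the same patterns can be mirrored with Python's re.sub. This is only a sketch on a shortened sample string; \g<0> is Python's spelling of the whole-match placeholder that Sheets writes as $0.

```python
import re

points = "48, 6, 43, 7, 38, 9"

# Step 1: pad isolated single digits with a leading zero ("6" -> "06").
padded = re.sub(r"\b\d\b", r"0\g<0>", points)
# Step 2: put a dot in front of every run of digits ("48" -> ".48").
scaled = re.sub(r"\d+", r".\g<0>", padded)

print(scaled)  # .48, .06, .43, .07, .38, .09
```

The trick is purely textual, so it only equals division by 100 for whole numbers from 0 to 99; a value such as 100 would come out as .100.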

Huge training error with pybrain

This is my training function:
def train(input_layer_data, output_layer_data, dnn, stn):
    ds = SupervisedDataSet(len(input_layer_data), len(output_layer_data))
    ds.addSample(input_layer_data, output_layer_data)
    if 'network' in dnn[stn]:
        net_dumped = dnn[stn]['network']
        net = pickle.loads(net_dumped)
    else:
        net = buildNetwork(len(input_layer_data), 50, len(output_layer_data), hiddenclass=SigmoidLayer, outclass=SigmoidLayer)
    trainer = BackpropTrainer(net, ds)
    trainer.trainEpochs(1)
    trnresult = percentError(trainer.testOnClassData(), input_layer_data)
    print "epoch: %4d" % trainer.totalepochs, \
          " train error: %5.2f%%" % trnresult
    return net
I call this function repeatedly, each time with a single input and output sample.
This is the output it generates:
inp=[48, 48, 8, 69, 69, 8, 57, 57, 8, 67, 67, 8, 71, 71, 8, 75, 75, 8, 71, 71, 8]
out=[27, 27, 8, 71, 71, 8, 75, 75, 8, 71, 71, 8, 67, 67, 8, 57, 57, 8, 69, 69, 8]
epoch: 0 train error: 2100.00%
FeedForwardNetwork-152
Modules:
[<BiasUnit 'bias'>, <LinearLayer 'in'>, <SigmoidLayer 'hidden0'>, <SigmoidLayer 'out'>]
Connections:
[<FullConnection 'FullConnection-148': 'bias' -> 'out'>, <FullConnection 'FullConnection-149': 'bias' -> 'hidden0'>, <FullConnection 'FullConnection-150': 'in' -> 'hidden0'>, <FullConnection 'FullConnection-151': 'hidden0' -> 'out'>]
I don't understand why the error is so huge.
The error persists through the whole program (this output is from just one call).
How do I reduce the error?
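One possible reading of the 2100% figure, offered as an assumption rather than a confirmed diagnosis: if percentError counts element-wise mismatches and divides by the number of predictions, then a single prediction (one sample in the dataset) compared against the 21-element input list gives 21 mismatches out of 1. The percent_error function below is a hypothetical stand-in written for illustration, not pybrain's code, and the predicted value 0 is made up.

```python
def percent_error(predicted, expected):
    """Hypothetical stand-in: mismatches divided by the number of predictions."""
    n_predictions = len(predicted)
    if n_predictions == 1:
        # A single prediction gets compared against every expected value.
        compared = predicted * len(expected)
    else:
        compared = predicted
    wrong = sum(1 for p, e in zip(compared, expected) if p != e)
    return 100.0 * wrong / n_predictions

inp = [48, 48, 8, 69, 69, 8, 57, 57, 8, 67, 67, 8, 71, 71, 8, 75, 75, 8, 71, 71, 8]
one_prediction = [0]  # assumed: one class index for the single training sample

print(percent_error(one_prediction, inp))  # 2100.0
```

If that reading is right, the number says little about how well the network fits; comparing the network's output with output_layer_data using a regression-style error would be the thing to check instead.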

If/else statement on raphael g.barchar

Hi, I have a bar chart made with g.raphael.
I would like to replace the 47 in parentheses (the first data value) with an if/else statement, but doing so seems to cause errors. Any help?
paper.barchart(-5, -20, 480, 260, [(47), 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52], {colors:["RGB(45,58,65)","RGB(217,31,53)","RGB(217,31,53)","RGB(217,31,53)","RGB(217,31,53)","RGB(217,31,53)","RGB(205,148,43)","RGB(205,148,43)","RGB(205,148,43)","RGB(205,148,43)","RGB(73,102,20)","RGB(73,102,20)","RGB(73,102,20)","RGB(73,102,20)","RGB(73,102,20)","RGB(0,99,186)","RGB(0,99,186)","RGB(0,99,186)","RGB(0,99,186)"]})
I've never tried a nested if block there, but you can define your data array outside of the call, build it with if statements according to your conditions, and then pass the array variable in the call:
var dataArray = [...];
var colorArray = [...];
paper.barchart(-5, -20,
    480, 260,
    dataArray,
    {
        colors : colorArray
    });

Extended tuple unpacking in Python 2

Is it possible to simulate extended tuple unpacking in Python 2?
Specifically, I have a for loop:
for a, b, c in mylist:
which works fine when mylist is a list of tuples of size three. I want the same for loop to work if I pass in a list of size four.
I think I will end up using named tuples, but I was wondering if there is an easy way to write:
for a, b, c, *d in mylist:
so that d eats up any extra members.
You can't do that directly, but it isn't terribly difficult to write a utility function to do this:
>>> def unpack_list(a, b, c, *d):
...     return a, b, c, d
...
>>> unpack_list(*range(100))
(0, 1, 2, (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99))
You could apply it to your for loop like this:
for sub_list in mylist:
    a, b, c, d = unpack_list(*sub_list)
You could define a wrapper function that converts your list to a four tuple. For example:
def wrapper(thelist):
    for item in thelist:
        yield (item[0], item[1], item[2], item[3:])
mylist = [(1,2,3,4), (5,6,7,8)]
for a, b, c, d in wrapper(mylist):
    print a, b, c, d
The code prints:
1 2 3 (4,)
5 6 7 (8,)
For the heck of it, generalized to unpack any number of elements:
lst = [(1, 2, 3, 4, 5), (6, 7, 8), (9, 10, 11, 12)]
def unpack(seq, n=2):
    for row in seq:
        yield [e for e in row[:n]] + [row[n:]]
for a, rest in unpack(lst, 1):
    pass
for a, b, rest in unpack(lst, 2):
    pass
for a, b, c, rest in unpack(lst, 3):
    pass
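The generator approaches above can also be written as a plain slicing helper that fails loudly when a row is too short. This is a small sketch that runs unchanged on Python 2 and 3; head_and_rest is a hypothetical name chosen for illustration.

```python
mylist = [(1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11, 12)]

def head_and_rest(row, n=3):
    """Return the first n items of row followed by a tuple of the remainder."""
    row = tuple(row)
    if len(row) < n:
        raise ValueError("need at least %d items, got %d" % (n, len(row)))
    return row[:n] + (row[n:],)

results = []
for a, b, c, d in (head_and_rest(row) for row in mylist):
    # d is always a tuple: empty when the row has exactly three items.
    results.append((a, b, c, d))
```

Rows with exactly three items give an empty tuple for d, which matches what `a, b, c, *d` would give in Python 3 apart from d being a tuple instead of a list.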
You can write a very basic function that has exactly the same functionality as the Python 3 extended unpack. Slightly verbose for legibility. Note that 'rest' is the position of where the asterisk would be (starting with first position 1, not 0)
def extended_unpack(seq, n=3, rest=3):
    res = []; cur = 0
    lrest = len(seq) - (n - 1) # length of 'rest' of sequence
    while (cur < len(seq)):
        if (cur != rest): # if I am not where I should leave the rest
            res.append(seq[cur]) # append current element to result
        else: # if I need to leave the rest
            res.append(seq[cur : lrest + cur]) # leave the rest
            cur = cur + lrest - 1 # current index moved to include rest
        cur = cur + 1 # update current position
    return(res)
Python 3 solution for those who landed here via a web search:
You can use itertools.zip_longest, like this:
from itertools import zip_longest
max_params = 4
lst = [1, 2, 3, 4]
a, b, c, d = next(zip(*zip_longest(lst, range(max_params))))
print(f'{a}, {b}, {c}, {d}') # 1, 2, 3, 4
lst = [1, 2, 3]
a, b, c, d = next(zip(*zip_longest(lst, range(max_params))))
print(f'{a}, {b}, {c}, {d}') # 1, 2, 3, None
For Python 2.x you can follow this answer.