How to do a union of multiple queryset from a clickhouse ORM - django

In a Django search app, I want to query a clickhouse database (using the infi.clickhouse_orm library) for pairs of values such as (a=1 AND b>=1.5) OR (a=2 AND b>=1). In SQL this could be done with
select * from table where a == 1 and b >= 1.5 UNION ALL select * from table where a == 2 and b >= 1
Looking at other exemples I have tried:
With the queryset defined as
qs = TABLE.objects_in(db)
qs_1 = qs.filter(A__eq=1, B__gte=1.5)
qs_2 = qs.filter(A__eq=2, B__gte=1)
The | operator
qs_union = qs_1 | qs_2
which returns
unsupported operand type(s) for |: 'QuerySet' and 'QuerySet'
The UNION operator
qs_union = qs_1.union(qs_2)
which returns
'QuerySet' object has no attribute 'union'
and the Q objects
qs_union = qs.filter(Q(A__eq=1, B__gte=1.5) | Q(A__eq=2, B__gte=1))
which returns
'Q' object has no attribute 'to_sql'
From a clickhouse models, how do you perfom a union of 2, or more, queryset?
Thanks!

Short answer: you should use the Q class of infi.clickhouse_orm.query, like:
from infi.clickhouse_orm.query import Q
The Q-class in info.clickhouse_orm [GitHub] has a to_sql method:
class Q(object):
# ...
def to_sql(self, model_cls):
if self._fovs:
sql = ' {} '.format(self._mode).join(fov.to_sql(model_cls) for fov in self._fovs)
else:
if self._l_child and self._r_child:
sql = '({} {} {})'.format(
self._l_child.to_sql(model_cls), self._mode, self._r_child.to_sql(model_cls))
else:
return '1'
if self._negate:
sql = 'NOT (%s)' % sql
return sql
Since the errors says it can not find the to_sql, it looks like you did not use that Q-class, but Django's Q class.

Related

How to update Django query object in a for loop

I know you can update all Django records matching a filter by using:
myQuery = myModel.objects.filter(fieldA = 1, FieldB = 2)
myQuery.update(fieldA = 5, FieldB = 6)
But if I want to iterate through the query results and only update certain values, how can I do this?
I have tried:
myQuery = myModel.objects.filter(fieldA = 1, FieldB = 2)
for item in range(myQuery.count())
if (myQuery[item].fieldC) == 10:
myQuery[item].update(fieldC = 100)
This returns AttributeError: 'myModel' object has no attribute 'update'
As you found out yourself, a Model object has no .update(..) method. You can .save(..) [Django-doc] the object, and specify what fields to update with the update_fields=… parameter [Django-doc]:
myQuery = myModel.objects.filter(fieldA=1, FieldB=2)
for item in myQuery:
if item.fieldC == 10:
item.fieldC = 100
item.save(update_fields=['fieldC'])
That being said, the above is very inefficient. Since for n objects, it will make a total of at most n+1 queries.
You can simply move the condition to the filter part here:
myModel.objects.filter(fieldA=1, FieldB=2, fieldC=10).update(fieldC=100)

'list' object has no attribute 'filter'

def g_view(request):
header_category = Category.objects.all()
m = Type1.objects.all()
r=Type2.objects.all()
g=Type3.objects.all()
from itertools import chain
orders=list(sorted(chain(m,r,g),key=lambda objects:objects.start))
mquery = request.GET.get('m')
if mquery:
orders = orders.filter(
Q(name__icontains=mquery) |
Q(game__name__icontains=mquery) |
Q(teams__name__icontains=mquery)).distinct()
(Type is abstract and type1 type2 type3 are inherited classes)
I got this error 'list' object has no attribute 'filter'
Basically when you chain multiple queryset, you loose the ability of queryset. After chaining, they become part of an iterator. And you can access the values of iterator by iterating it or calling list explicitly. You need to do the query before chaining the queryset.
query = Q(name__icontains=mquery) |
Q(game__name__icontains=mquery) |
Q(teams__name__icontains=mquery)
m = Type1.objects.filter(query).distinct()
r = Type2.objects.filter(query).distinct()
g = Type3.objects.filter(query).distinct()
orders=list(sorted(chain(m,r,g),key=lambda objects:objects.start))

cleaning up my SQLAlchemy operations (reducing repetition)

I have some server side processing of some data (client-side library = jQuery DataTables)
I am using POST as my ajax method. In my Flask webapp, I can access the POST data with request.values
The data type / structure of request.values is werkzeug.datastructures.CombinedMultiDict
If the user wants to sort a column, the request contains a key called action with a value of filter (note the below printouts are obtained with for v in request.values: print v, request.values[v])
...
columns[7][data] role
columns[8][search][regex] false
action filter
columns[10][name]
columns[3][search][value]
...
all the column names are also contained in the request as keys. The columns that have search terms will have the search string as a value for the column name key (as opposed to empty for columns with no search term entered. So, If I want to search for firstname containing bill, I would see the following in my request
columns[7][searchable] true
...
columns[6][name]
firstname bill
columns[0][search][value]
columns[2][searchable] true
...
columns[5][data] phone
role
columns[10][data] registered_on
...
columns[0][searchable] true
email
columns[7][orderable] true
...
columns[2][search][value]
Notice how role and email are empty. So my code below is very non-DRY
rv = request.values
if rv.get('action') == 'filter':
if len(rv.get('firstname')):
q = q.filter(User.firstname.ilike('%{0}%'.format(rv.get('firstname'))))
if len(rv.get('lastname')):
q = q.filter(User.lastname.ilike('%{0}%'.format(rv.get('lastname'))))
if len(rv.get('username')):
q = q.filter(User.username.ilike('%{0}%'.format(rv.get('username'))))
if len(rv.get('email')):
q = q.filter(User.email.ilike('%{0}%'.format(rv.get('email'))))
if len(rv.get('phone')):
q = q.filter(User.phone.ilike('%{0}%'.format(rv.get('phone'))))
if len(rv.get('region')):
q = q.filter(User.region.name.ilike('%{0}%'.format(rv.get('region'))))
if len(rv.get('role')):
q = q.filter(User.role.name.ilike('%{0}%'.format(rv.get('role'))))
if len(rv.get('is_active')):
q = q.filter(User.is_active_ == '{0}'.format(rv.get('is_active')))
if len(rv.get('is_confirmed')):
q = q.filter(User.is_confirmed == '{0}'.format(rv.get('is_confirmed')))
if len(rv.get('registered_on_from')):
fdate = datetime.strptime(rv.get('registered_on_from'), '%Y-%m-%d')
q = q.filter(User.registered_on > fdate)
if len(rv.get('registered_on_to')):
tdate = datetime.strptime(rv.get('registered_on_to'), '%Y-%m-%d')
q = q.filter(User.registered_on < tdate)
I was building the sorting functionality, and I found the following statement that greatly simplified my life (see this answer)
q = q.order_by('{name} {dir}'.format(name=sort_col_name, dir=sort_dir))
I was wondering if there was a way to simplify this set of filtering queries like the above sorting code since I will have to do this for many other models.
This should help:
from sqlalchemy import inspect
from sqlalchemy.sql.sqltypes import String,Boolean
def filter_model_by_request(qry,model,rv):
if rv.get('action') == 'filter':
mapper = inspect(model).attrs # model mapper
col_names = list(set([c.key for c in mapper]) & set(rv.keys()))
# col_names is a list generated by intersecting the request values and model column names
for col_name in col_names:
col = mapper[col_name].columns[0]
col_type = type(col.type)
if col_type == String: # filter for String
qry = qry.filter(col.ilike('%{0}%'.format(rv.get(col_name))))
elif col_type == Boolean: # filter for Boolean
qry = qry.filter(col == '{0}'.format(rv.get(col_name)))
return qry
Example call (I used it with a #app.before_request and a cURL call to verify):
qry = db.session.query(User)
print filter_model_by_request(qry,User,request.values).count()
The date range filtering is not included in the function, add this feature if you wish, your code is fine for that purpose.
side note: be careful with the bigger/smaller operators for the dates. You're excluding the actual requested dates. Use <= or >= to include dates in filtering action. It's always a pitfall for me..

Django ORM equivalent for this SQL..calculated field derived from related table

I have the following model structure below:
class Master(models.Model):
name = models.CharField(max_length=50)
mounting_height = models.DecimalField(max_digits=10,decimal_places=2)
class MLog(models.Model):
date = models.DateField(db_index=True)
time = models.TimeField(db_index=True)
sensor_reading = models.IntegerField()
m_master = models.ForeignKey(Master)
The goal is to produce a queryset that returns all the fields from MLog plus a calculated field (item_height) based on the related data in Master
using Django's raw sql:
querySet = MLog.objects.raw('''
SELECT a.id,
date,
time,
sensor_reading,
mounting_height,
(sensor_reading - mounting_height) as item_height
FROM db_mlog a JOIN db_master b
ON a.m_master_id = b.id
''')
How do I code this using Django's ORM?
I can think of two ways to go about this without relying on raw(). The first is pretty much the same as what #tylerl suggested. Something like this:
class Master(models.Model):
name = models.CharField(max_length=50)
mounting_height = models.DecimalField(max_digits=10,decimal_places=2)
class MLog(models.Model):
date = models.DateField(db_index=True)
time = models.TimeField(db_index=True)
sensor_reading = models.IntegerField()
m_master = models.ForeignKey(Master)
def _get_item_height(self):
return self.sensor_reading - self.m_master.mounting_height
item_height = property(_get_item_height)
In this case I am defining a custom (derived) property for MLog called item_height. This property is calculated as the difference of the sensor_reading of an instance and the mounting_height of its related master instance. More on property here.
You can then do something like this:
In [4]: q = MLog.objects.all()
In [5]: q[0]
Out[5]: <MLog: 2010-09-11 8>
In [6]: q[0].item_height
Out[6]: Decimal('-2.00')
The second way to do this is to use the extra() method and have the database do the calculation for you.
In [14]: q = MLog.objects.select_related().extra(select =
{'item_height': 'sensor_reading - mounting_height'})
In [16]: q[0]
Out[16]: <MLog: 2010-09-11 8>
In [17]: q[0].item_height
Out[17]: Decimal('-2.00')
You'll note the use of select_related(). Without this the Master table will not be joined with the query and you will get an error.
I always do the calculations in the app rather than in the DB.
class Thing(models.Model):
foo = models.IntegerField()
bar = models.IntegerField()
#Property
def diff():
def fget(self):
return self.foo - self.bar
def fset(self,value):
self.bar = self.foo - value
Then you can manipulate it just as you would any other field, and it does whatever you defined with the underlying data. For example:
obj = Thing.objects.all()[0]
print(obj.diff) # prints .foo - .bar
obj.diff = 4 # sets .bar to .foo - 4
Property, by the way, is just a standard property decorator, in this case coded as follows (I don't remember where it came from):
def Property(function):
keys = 'fget', 'fset', 'fdel'
func_locals = {'doc':function.__doc__}
def probeFunc(frame, event, arg):
if event == 'return':
locals = frame.f_locals
func_locals.update(dict((k,locals.get(k)) for k in keys))
sys.settrace(None)
return probeFunc
sys.settrace(probeFunc)
function()
return property(**func_locals)

How do I do an OR filter in a Django query?

I want to be able to list the items that either a user has added (they are listed as the creator) or the item has been approved.
So I basically need to select:
item.creator = owner or item.moderated = False
How would I do this in Django? (preferably with a filter or queryset).
There is Q objects that allow to complex lookups. Example:
from django.db.models import Q
Item.objects.filter(Q(creator=owner) | Q(moderated=False))
You can use the | operator to combine querysets directly without needing Q objects:
result = Item.objects.filter(item.creator = owner) | Item.objects.filter(item.moderated = False)
(edit - I was initially unsure if this caused an extra query but #spookylukey pointed out that lazy queryset evaluation takes care of that)
It is worth to note that it's possible to add Q expressions.
For example:
from django.db.models import Q
query = Q(first_name='mark')
query.add(Q(email='mark#test.com'), Q.OR)
query.add(Q(last_name='doe'), Q.AND)
queryset = User.objects.filter(query)
This ends up with a query like :
(first_name = 'mark' or email = 'mark#test.com') and last_name = 'doe'
This way there is no need to deal with or operators, reduce's etc.
You want to make filter dynamic then you have to use Lambda like
from django.db.models import Q
brands = ['ABC','DEF' , 'GHI']
queryset = Product.objects.filter(reduce(lambda x, y: x | y, [Q(brand=item) for item in brands]))
reduce(lambda x, y: x | y, [Q(brand=item) for item in brands]) is equivalent to
Q(brand=brands[0]) | Q(brand=brands[1]) | Q(brand=brands[2]) | .....
Similar to older answers, but a bit simpler, without the lambda...
To filter these two conditions using OR:
Item.objects.filter(Q(field_a=123) | Q(field_b__in=(3, 4, 5, ))
To get the same result programmatically:
filter_kwargs = {
'field_a': 123,
'field_b__in': (3, 4, 5, ),
}
list_of_Q = [Q(**{key: val}) for key, val in filter_kwargs.items()]
Item.objects.filter(reduce(operator.or_, list_of_Q))
operator is in standard library: import operator
From docstring:
or_(a, b) -- Same as a | b.
For Python3, reduce is not a builtin any more but is still in the standard library: from functools import reduce
P.S.
Don't forget to make sure list_of_Q is not empty - reduce() will choke on empty list, it needs at least one element.
Multiple ways to do so.
1. Direct using pipe | operator.
from django.db.models import Q
Items.objects.filter(Q(field1=value) | Q(field2=value))
2. using __or__ method.
Items.objects.filter(Q(field1=value).__or__(field2=value))
3. By changing default operation. (Be careful to reset default behavior)
Q.default = Q.OR # Not recommended (Q.AND is default behaviour)
Items.objects.filter(Q(field1=value, field2=value))
Q.default = Q.AND # Reset after use.
4. By using Q class argument _connector.
logic = Q(field1=value, field2=value, field3=value, _connector=Q.OR)
Item.objects.filter(logic)
Snapshot of Q implementation
class Q(tree.Node):
"""
Encapsulate filters as objects that can then be combined logically (using
`&` and `|`).
"""
# Connection types
AND = 'AND'
OR = 'OR'
default = AND
conditional = True
def __init__(self, *args, _connector=None, _negated=False, **kwargs):
super().__init__(children=[*args, *sorted(kwargs.items())], connector=_connector, negated=_negated)
def _combine(self, other, conn):
if not(isinstance(other, Q) or getattr(other, 'conditional', False) is True):
raise TypeError(other)
if not self:
return other.copy() if hasattr(other, 'copy') else copy.copy(other)
elif isinstance(other, Q) and not other:
_, args, kwargs = self.deconstruct()
return type(self)(*args, **kwargs)
obj = type(self)()
obj.connector = conn
obj.add(self, conn)
obj.add(other, conn)
return obj
def __or__(self, other):
return self._combine(other, self.OR)
def __and__(self, other):
return self._combine(other, self.AND)
.............
Ref. Q implementation
This might be useful https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Basically it sounds like they act as OR
Item.objects.filter(field_name__startswith='yourkeyword')