Django: objects.raw() returns the query but not records

I'm a Django newbie and have one fundamental question and one technical question.
1. I'm using a Postgres DB. I used a psycopg2 connection/cursor to fetch the data, and there was some delay while establishing a connection. I read that the ORM takes care of low-level activities such as establishing a connection. If I use Django, will the ORM take care of the connection challenge?
1.1. Can I expect the same (low-level activities) with raw() as well?
2. objects.raw(sql) is returning the query, not records from the table.
I defined Student Model as below
class Student(models.Model):
    firstname = models.CharField(max_length=100)
    surname = models.CharField(max_length=100)

    def __str__(self):
        return self.firstname
While creating the view,
def studentList(request):
    # posts = Student.objects.all()  # 1. working as expected (fetches the firstname of all records)
    cursor = connection.cursor()
    sql = "select * from api_student"
    cursor.execute(sql)
    posts = cursor.fetchone()  # 2. returns an entire record
    # posts = Student.objects.raw(sql)  # 3. RETURNS THE SQL QUERY, NOT RECORDS FROM THE TABLE ???
    print(posts)
    return render(request, 'output.html', {'posts': posts})
output:
<QuerySet [<Student: Anil>]>
<RawQuerySet: select * from api_student> --> this is the challenge, did I miss anything?
('Anil', 'kumar')

The raw() method takes a raw SQL query and returns a RawQuerySet instance. The query is executed lazily, when you iterate over the result, which is why printing the RawQuerySet shows only its SQL. You can iterate over a RawQuerySet like a normal QuerySet and get model objects.
sql = "select * from api_student"
student_qs = Student.objects.raw(sql)
for obj in student_qs:
    print(obj.pk, obj.firstname, obj.surname)
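On the connection question, which the answer above does not address: Django opens the database connection on first use, and both raw() and connection.cursor() go through that same managed connection. By default the connection is closed at the end of each request; the CONN_MAX_AGE setting keeps it open for reuse. A minimal settings sketch, in which the database name, the user, and the 60-second lifetime are placeholder assumptions rather than values from the question:

```python
# settings.py (fragment). NAME, USER and the 60-second value are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "USER": "myuser",
        # Seconds to keep a connection open for reuse.
        # The default, 0, closes the connection at the end of each request.
        "CONN_MAX_AGE": 60,
    }
}
```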

Related

Race condition when two different users inserting new records to database in Django

There is a race condition when I create a new instance of the Order model.
It has a daily_id field that restarts from one every day for each category. In other words, every category has its own daily id.
class Order(models.Model):
    daily_id = models.SmallIntegerField(default=0)
    category = models.ForeignKey(Categoty, on_delete=models.PROTECT, related_name="orders")
    declare_time = models.DateField()
    ...
The daily_id field of a new record is calculated using this function:
def get_daily_id(category, declare_time):
    try:
        last_order = Order.objects.filter(declare_time=declare_time,
                                          category=category).latest('daily_id')
        return last_order.daily_id + 1
    except Order.DoesNotExist:
        # No order has been registered yet on the declare_time date.
        return 1
The problem is that when two different users register orders in the same category at the same time, the orders are likely to end up with duplicate daily_id values.
I have tried the @transaction.atomic decorator on the post method of a DRF APIView, and it didn't work!
Use an auto-increment column and add a view that computes the daily order number from it, for example:
SELECT *, ROW_NUMBER() OVER(PARTITION BY MyDayDate ORDER BY id_autoinc) AS daily_id
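The idea in this answer can be tried outside Django with Python's built-in sqlite3 module. The sketch below is illustrative only: the table name, column names, and sample rows are assumptions, and the partition also includes the category (the answer's query partitions by date alone) because the question wants a separate counter per category. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- assigned by the database on insert
        category TEXT NOT NULL,
        declare_time TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO orders (category, declare_time) VALUES (?, ?)",
    [
        ("food", "2024-01-01"),
        ("toys", "2024-01-01"),
        ("food", "2024-01-01"),
        ("food", "2024-01-02"),
    ],
)

# daily_id is derived at read time from the unique id, so it is never
# stored and two concurrent inserts cannot write the same value.
rows = conn.execute(
    """
    SELECT id, category, declare_time,
           ROW_NUMBER() OVER (
               PARTITION BY category, declare_time ORDER BY id
           ) AS daily_id
    FROM orders
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Because the number is computed on read rather than written by the application, there is nothing for two requests to race on; the trade-off is that deleting an order renumbers the later orders of that day.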

How to return queryset instead of list from Django model manager with custom SQL

I'm dealing with a legacy data source and a driver not supported by the Django ORM. I can only submit queries using their proprietary ODBC driver via pyodbc. My workaround is to submit custom SQL via pyodbc from the model manager. This technique (inspired by the Django documentation) returns a list and not a queryset. This works great until I use packages that expect querysets.
How do I convert the result list to a queryset? Is there a way to inject the results into a queryset?
class MyManager(models.Manager):
    def getdata(self):
        con_string = 'DSN=myOdbcDsn;UID=id;PWD=pass'
        conn = pyodbc.connect(con_string)
        cursor = conn.cursor()
        result_list = []
        try:
            sql = "select distinct coalesce(WORKCENTER_GROUP, 'na') workcenterGroup, WORKCENTER_CODE workcenterCode FROM Workcenter"
            cursor.execute(sql)
            for row in cursor.fetchall():
                p = self.model(workcenterGroup=row[0], workcenterCode=row[1])
                result_list.append(p)
        except pyodbc.Error as ex:
            print("----------------ERROR %s: %s" % (ex.args[0], ex.args[1]))
        conn.close()
        return result_list

class ProdTrends2(models.Model):
    workcenterGroup = models.CharField("Group", max_length=100)
    workcenterCode = models.CharField("Code", max_length=100)
    objects = MyManager()
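Calling set(result_list) should be enough to satisfy packages that need a QuerySet. Note that this produces a plain Python set rather than a real QuerySet, so it only helps packages that merely iterate over the result.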

Django: Single query with multiple joins on the same one-to-many relationship

Using the Django QuerySet API, how can I perform multiple joins between the same two tables/models? See the following untested code for illustration purposes:
class DataPacket(models.Model):
    time = models.DateTimeField(auto_now_add=True)

class Field(models.Model):
    packet = models.ForeignKey(DataPacket, models.CASCADE)
    name = models.CharField(max_length=25)
    value = models.FloatField()
I want to grab a list of data packets with only specific named fields. I tried something like this:
pp = DataPacket.objects.prefetch_related('field_set')
result = []
for p in pp:
    o = {
        f.name: f.value
        for f in p.field_set.all()
        if f.name in ('latitude', 'longitude')
    }
    o['time'] = p.time
    result.append(o)
But this has proven extremely inefficient because I'm working with hundreds to thousands of packets with a lot of other fields besides the latitude and longitude fields I want.
Is there a Django QuerySet call which translates into an efficient SQL query performing two inner joins from the datapacket table to the field table on different rows? I can do it with raw SQL, as follows (assuming the Django application is named myapp) (again, untested code for illustration purposes):
from django.db import connection
with connection.cursor() as cursor:
    cursor.execute('''
        SELECT p.time AS time, f1.value AS lat, f2.value AS lon
        FROM myapp_datapacket AS p
        INNER JOIN myapp_field as f1 ON p.id = f1.packet_id
        INNER JOIN myapp_field as f2 ON p.id = f2.packet_id
        WHERE f1.name = 'latitude' AND f2.name = 'longitude'
    ''')
    result = list(cursor)
But instinct tells me not to use the low-level DB API if I don't have to. Possible reasons to back that up: my SQL code might not be compatible with all the DBMSs Django supports, and I feel more at risk of trashing my database by misunderstanding a SQL command than by misunderstanding a Django API call.
Try "Performing raw SQL queries" from the Django documentation, combined with prefetching related objects on the raw query.
Prefetch on a raw query:
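from django.db.models.query import prefetch_related_objects
raw_queryset = list(raw_queryset)
# In current Django versions, lookups are passed as separate positional arguments.
prefetch_related_objects(raw_queryset, 'a_related_lookup',
                         'another_related_lookup', ...)
Your example:
from django.db.models.query import prefetch_related_objects
raw_DataPacket = list(DataPacket.objects.raw('SELECT * FROM myapp_datapacket'))
# Fills the prefetch cache on the instances in place; the call returns None.
prefetch_related_objects(raw_DataPacket, 'field_set')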
Example of prefetch_related with a raw queryset:
models:
class Country(models.Model):
    name = models.CharField()

class City(models.Model):
    country = models.ForeignKey(Country, on_delete=models.CASCADE)
    name = models.CharField()
prefetch_related:
from django.db.models.query import prefetch_related_objects
# Raw querysets do not support len(),
# so evaluate them into a list first.
cities = list(City.objects.raw("select * from city inner join country on city.country_id = country.id where city.name = 'london'"))
prefetch_related_objects(cities, 'country')
Answer provided from information from these sources: djangoproject - performing raw queries | Related Stackoverflow Question | Google docs question
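The double self-join from the question was posted as untested code. It can be checked against a throwaway in-memory SQLite database using only the standard library. This is a sketch for verifying the SQL itself, not a Django solution: the table and column names follow the question (Django's default myapp_ prefix), while the sample rows and the extra 'speed' field are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE myapp_datapacket (id INTEGER PRIMARY KEY, time TEXT);
    CREATE TABLE myapp_field (
        id INTEGER PRIMARY KEY,
        packet_id INTEGER REFERENCES myapp_datapacket(id),
        name TEXT,
        value REAL
    );
    INSERT INTO myapp_datapacket (id, time) VALUES
        (1, '2024-01-01 10:00'), (2, '2024-01-01 10:05');
    INSERT INTO myapp_field (packet_id, name, value) VALUES
        (1, 'latitude', 48.5), (1, 'longitude', 2.25), (1, 'speed', 12.0),
        (2, 'latitude', 51.5), (2, 'longitude', -0.125), (2, 'speed', 7.0);
    """
)

# Join the field table twice: f1 picks the latitude row and f2 the
# longitude row of the same packet, so each packet yields one result row.
rows = conn.execute(
    """
    SELECT p.time AS time, f1.value AS lat, f2.value AS lon
    FROM myapp_datapacket AS p
    INNER JOIN myapp_field AS f1 ON p.id = f1.packet_id
    INNER JOIN myapp_field AS f2 ON p.id = f2.packet_id
    WHERE f1.name = 'latitude' AND f2.name = 'longitude'
    ORDER BY p.id
    """
).fetchall()

for row in rows:
    print(row)
```

The unrelated 'speed' rows never reach the result, which is the saving the question is after compared with prefetching every field of every packet.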

In terms of performance, which one is better modifying queryset or writing SQL through managers in django?

In terms of performance, which one is better modifying queryset or writing SQL through managers in django?
class DahlBookManager(models.Manager):
    def get_queryset(self):
        return super(DahlBookManager, self).get_queryset().filter(author='Roald Dahl')
or
class PollManager(models.Manager):
    def with_counts(self):
        from django.db import connection
        cursor = connection.cursor()
        cursor.execute("""
            SELECT p.id, p.question, p.poll_date, COUNT(*)
            FROM polls_opinionpoll p, polls_response r
            WHERE p.id = r.poll_id
            GROUP BY p.id, p.question, p.poll_date
            ORDER BY p.poll_date DESC""")
        result_list = []
        for row in cursor.fetchall():
            p = self.model(id=row[0], question=row[1], poll_date=row[2])
            p.num_responses = row[3]
            result_list.append(p)
        return result_list
I'd imagine it's the first one, but this isn't a fair contest.
Your second, pure SQL statement also has to do a grouping and an ordering, which your first does not; the first is just a WHERE.
The reason it could be the second is that the first fetches every column (*) rather than just the 3 items you need, so you may be better off with the following:
super(DahlBookManager, self).get_queryset().filter(author='Roald Dahl').values('id',
                                                                               'question',
                                                                               'poll_date')
Now this may just be my opinion, but for most queries you do in Django you should use the ORM and avoid raw queries. It will help you if you ever decide to use a different database schema, and it can potentially create more efficient queries.

DRF - How to get WritableField to not load entire database into memory?

I have a very large database (6 GB) that I would like to use Django-REST-Framework with. In particular, I have a model that has a ForeignKey relationship to the django.contrib.auth.models.User table (not so big) and a ForeignKey to a BIG table (let's call it Products). The model can be seen below:
class ShoppingBag(models.Model):
    user = models.ForeignKey('auth.User', related_name='+')
    product = models.ForeignKey('myapp.Product', related_name='+')
    quantity = models.SmallIntegerField(default=1)
Again, there are 6GB of Products.
The serializer is as follows:
class ShoppingBagSerializer(serializers.ModelSerializer):
    product = serializers.RelatedField(many=False)
    user = serializers.RelatedField(many=False)

    class Meta:
        model = ShoppingBag
        fields = ('product', 'user', 'quantity')
So far this is great- I can do a GET on the list and individual shopping bags, and everything is fine. For reference the queries (using a query logger) look something like this:
SELECT * FROM myapp_product WHERE product_id=1254
SELECT * FROM auth_user WHERE user_id=12
SELECT * FROM myapp_product WHERE product_id=1404
SELECT * FROM auth_user WHERE user_id=12
...
And so on, for as many shopping bags as are returned.
But I would like to be able to POST to create new shopping bags, but serializers.RelatedField is read-only. Let's make it read-write:
class ShoppingBagSerializer(serializers.ModelSerializer):
    product = serializers.PrimaryKeyRelatedField(many=False)
    user = serializers.PrimaryKeyRelatedField(many=False)
    ...
Now things get bad... GET requests to the list action take > 5 minutes and I noticed that my server's memory jumps up to ~6GB; why?! Well, back to the SQL queries and now I see:
SELECT * FROM myapp_products;
SELECT * FROM auth_user;
Ok, so that's not good. Clearly we're doing "prefetch related" or "select_related" or something like that in order to get access to all the products; but this table is HUGE.
Further inspection reveals where this happens on Line 68 of relations.py in DRF:
def initialize(self, parent, field_name):
    super(RelatedField, self).initialize(parent, field_name)
    if self.queryset is None and not self.read_only:
        manager = getattr(self.parent.opts.model, self.source or field_name)
        if hasattr(manager, 'related'):  # Forward
            self.queryset = manager.related.model._default_manager.all()
        else:  # Reverse
            self.queryset = manager.field.rel.to._default_manager.all()
If not readonly, self.queryset = ALL!!
So, I'm pretty sure that this is where my problem is, and that I need to say "don't select_related here", but I'm not 100% sure whether this is the issue or where to deal with it. It seems like everything should be memory safe with pagination, but this is simply not the case. I'd appreciate any advice.
In the end, we had to simply create our own PrimaryKeyRelatedField class to override the default behavior in Django-Rest-Framework. Basically we ensured that the queryset was None until we wanted to lookup the object, then we performed the lookup. This was extremely annoying, and I hope the Django-Rest-Framework guys take note of this!
Our final solution:
class ProductField(serializers.PrimaryKeyRelatedField):
    many = False

    def __init__(self, *args, **kwargs):
        kwargs['queryset'] = Product.objects.none()  # Hack to ensure ALL products are not loaded
        super(ProductField, self).__init__(*args, **kwargs)

    def field_to_native(self, obj, field_name):
        return unicode(obj)

    def from_native(self, data):
        """
        Perform query lookup here.
        """
        try:
            return Product.objects.get(pk=data)
        except Product.DoesNotExist:
            msg = self.error_messages['does_not_exist'] % smart_text(data)
            raise ValidationError(msg)
        except (TypeError, ValueError):
            msg = self.error_messages['incorrect_type'] % type(data)
            raise ValidationError(msg)
And then our serializer is as follows:
class ShoppingBagSerializer(serializers.ModelSerializer):
    product = ProductField()
    ...
This hack ensures the entire database isn't loaded into memory, but rather performs one-off selects based on the data. It's not as efficient computationally, but it also doesn't blast our server with 5 second database queries loaded into memory!