I have a views.py function that looks like this:
def GetAllCities(request):
    cities = list(City.objects.all())
    return HttpResponse(json.dumps(cities))
My City model looks like this:
class City(models.Model):
    city = models.CharField()
    loc = models.CharField()
    population = models.IntegerField()
    state = models.CharField()
    _id = models.CharField()

    class MongoMeta:
        db_table = "cities"

    def __unicode__(self):
        return self.city
I am using a MongoDB collection whose documents look like this:
{
    "_id" : ObjectId("5179837cbd7fe491c1f23227"),
    "city" : "ACMAR",
    "loc" : "[-86.51557, 33.584132]",
    "state" : "AL",
    "population" : 6055
}
I get the following error when trying to return the JSON from my GetAllCities function:
City ACMAR is not JSON serializable
So I tried this instead:
def GetAllCities(request):
    cities = serializers.serialize("json", City.objects.all())
    return HttpResponse(cities)
This works, but it is very slow: it takes about 9 seconds (my database contains 30000 rows).
Should it take this long, or am I doing something wrong?
I've built the same app in PHP, Rails and NodeJS.
On average PHP takes 2000ms, NodeJS 800ms, Rails 5882ms and Django 9395ms. I'm trying to benchmark here, so I wonder whether there is a way to optimize my Django code or whether this is as fast as it gets.
You almost certainly do not need to return ALL cities, as you probably won't display all 30000 rows anyway (at least not in a user-friendly way). Consider a solution where you return only the cities within some range of the requested location. Mongo supports geospatial indexes, so that should be no problem. There are also many tutorials on the internet about spatial filtering in Django/MongoDB.
def GetAllCities(request, lon, lat):
    # Pseudo-code: filterWithinXkmFromLonLat is a placeholder, not a real method
    cities = City.objects.filterWithinXkmFromLonLat(lon, lat).all()
    cities = serializers.serialize("json", cities)
    return HttpResponse(cities)
If you really, really need all cities, consider caching the response. The location, name and population of a city are not things that change from one second to the next. Cache the result and recalculate it only every hour, day or longer. Django supports caching out of the box:
from django.views.decorators.cache import cache_page

@cache_page(60 * 60)
def GetAllCities(request):
    (...)
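To make the idea concrete, here is a rough sketch of time-based caching in plain Python. It is not how Django implements cache_page; the names cache_for and all_cities_json are made up for illustration, and the wrapped function is assumed to take no arguments.

```python
import time
from functools import wraps


def cache_for(seconds, clock=time.monotonic):
    """Hypothetical decorator: reuse the last result until `seconds` have passed."""

    def decorator(func):
        cached = {}

        @wraps(func)
        def wrapper():
            now = clock()
            # Recompute on the first call or once the cached entry has expired.
            if "value" not in cached or now - cached["at"] >= seconds:
                cached["value"] = func()
                cached["at"] = now
            return cached["value"]

        return wrapper

    return decorator


calls = []        # records how often the expensive function really runs
fake_now = [0.0]  # controllable clock so the example does not have to sleep


@cache_for(60 * 60, clock=lambda: fake_now[0])
def all_cities_json():
    calls.append(1)
    return '{"Cities": []}'
```

Calling all_cities_json() repeatedly within the hour only runs the body once; after the hour has passed the next call recomputes the value.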
Another thing you can try to gain a little more speed is to fetch only the values you need from the database and let the QuerySet build the dictionaries.
A simple query like this would work:
City.objects.all().values('id', 'city', 'loc', 'population', 'state')
Or you can put it in a manager:
class CitiesManager(models.Manager):
    def as_dict(self):
        return self.all().values('id', 'city', 'loc', 'population', 'state')

class City(models.Model):
    .... your fields here...
    objects = CitiesManager()
And then use it in your view as:
City.objects.as_dict()
FOUND A SOLUTION
I am benchmarking with different methods, one method is to see how fast one language/framework is to select ALL rows in a database and return it as JSON. I found a solution now that speeds it up by half the time!
My new views.py:
def GetAllCities(request):
    dictionaries = [obj.as_dict() for obj in City.objects.all()]
    return HttpResponse(json.dumps({"Cities": dictionaries}), content_type='application/json')
And my new model:
class City(models.Model):
    city = models.CharField()
    loc = models.CharField()
    population = models.IntegerField()
    state = models.CharField()
    _id = models.CharField()

    def as_dict(self):
        return {
            "id": self.id,
            "city": self.city,
            "loc": self.loc,
            "population": self.population,
            "state": self.state
            # other stuff
        }

    class MongoMeta:
        db_table = "cities"

    def __unicode__(self):
        return self.city
Found the solution here
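For anyone wondering why the first attempt failed while as_dict() works, here is a minimal sketch that needs no Django. The City class below is a plain-Python stand-in for the model (an assumption for illustration), using the sample document from the question.

```python
import json


class City:
    """Plain-Python stand-in for the City model (hypothetical, no Django needed)."""

    def __init__(self, id, city, loc, population, state):
        self.id = id
        self.city = city
        self.loc = loc
        self.population = population
        self.state = state

    def as_dict(self):
        # Only built-in types end up in the dict, so json.dumps can handle it.
        return {
            "id": self.id,
            "city": self.city,
            "loc": self.loc,
            "population": self.population,
            "state": self.state,
        }


cities = [City("5179837cbd7fe491c1f23227", "ACMAR", "[-86.51557, 33.584132]", 6055, "AL")]

# Passing the objects directly fails, which mirrors the original error.
try:
    json.dumps(cities)
    direct_ok = True
except TypeError:
    direct_ok = False

# Converting each object to a dict first works.
payload = json.dumps({"Cities": [c.as_dict() for c in cities]})
```

json.dumps only knows how to encode built-in types (dict, list, str, numbers, and so on), so arbitrary objects have to be turned into those first.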
I'm trying to refer to a unique constraint in a query. In my project I have to count the number of M2M objects selected for a booking:
class Booking(models.Model):
    room_no = models.ForeignKey(Room, on_delete=models.CASCADE, blank=True, related_name='rooms')
    takes_by = models.ManyToManyField(Vistor)

    @property
    def no_persons(self):
        qnt = Booking.objects.filter(takes_by__full_information=self).count()  # but this doesn't work
        return qnt
This raises the following error:
Cannot query "some room information": Must be "Vistor" instance.
class Vistor(models.Model):
    full_name = models.CharField(max_length=150)
    dob = models.DateField(max_length=14)
    city = models.ForeignKey(City, on_delete=models.CASCADE)

    class Meta:
        constraints = [
            models.UniqueConstraint(fields=['full_name', 'dob', 'city'], name='full_information')
        ]

    def __str__(self):
        return f'{self.full_name} - {self.city} - {self.dob}'
Is it possible to access full_information through the Booking model? Thank you.
If you want to count the number of Visitors related to that booking, you can count these with:
@property
def no_persons(self):
    return self.takes_by.count()
This will make an extra query to the database, therefore it is often better to let the database count these in the query. You can thus remove the property, and query with:
from django.db.models import Count

Booking.objects.annotate(
    no_persons=Count('takes_by')
)
The Bookings that arise from this QuerySet will have an extra attribute no_persons with the number of related Visitors.
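To illustrate what "letting the database count" means, here is a rough sketch with the standard-library sqlite3 module. The table and column names are assumptions chosen to resemble what Django would create, and the SQL only approximates what the ORM generates for the annotate call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE booking (id INTEGER PRIMARY KEY);
    -- stand-in for the many-to-many join table behind takes_by
    CREATE TABLE booking_takes_by (booking_id INTEGER, vistor_id INTEGER);
    INSERT INTO booking (id) VALUES (1), (2), (3);
    INSERT INTO booking_takes_by VALUES (1, 10), (1, 11), (2, 12);
    """
)

# One query returns every booking together with its visitor count.
# LEFT JOIN keeps bookings without visitors; COUNT ignores the NULLs.
rows = conn.execute(
    """
    SELECT b.id, COUNT(t.vistor_id) AS no_persons
    FROM booking b
    LEFT JOIN booking_takes_by t ON t.booking_id = b.id
    GROUP BY b.id
    ORDER BY b.id
    """
).fetchall()
```

With the property, each booking would trigger its own COUNT query; here all counts come back in a single round trip.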
Let's assume I have a Product model in my project:
class Product(models.Model):
price = models.IntegerField()
and I want to have some sort of statistics (let's say I want to keep track of how has the price changed over time) for it:
class ProductStatistics(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    statistics_value = models.IntegerField()
    product = models.ForeignKey(Product)

    @classmethod
    def create_for_product(cls, product_ids):
        statistics = []
        products = Product.objects.filter(id__in=product_ids)
        for product in products:
            statistics.append(
                cls(
                    product=product,
                    statistics_value=product.price,
                )
            )
        cls.objects.bulk_create(statistics)

    @classmethod
    def get_latest_by_products_ids(cls, product_ids):
        return None
I have a problem with implementing get_latest_by_products_ids method. I want only latest statistic, so I can't do something like:
@classmethod
def get_latest_by_products_ids(cls, product_ids):
    return cls.objects.filter(product__id__in=product_ids)
because this would return all statistics I have gathered through time. How can I limit the query to only most recent one for each Product?
EDIT
I am using PostgreSQL database.
Querysets already have a last() method (and a first() method too, FWIW). The only question is what you want to define as "last", since this depends on the queryset's ordering. Assuming you want the last one by creation date (the created field), you can also use the latest() method:
@classmethod
def get_latest_by_products_ids(cls, product_ids):
    found = []
    for pid in product_ids:
        found.append(cls.objects.filter(product_id=pid).latest("created"))
    return found
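Note that the loop above runs one query per product id. If that becomes a concern, the same "latest row per product" result can be fetched in a single query. Below is a sketch of that idea in raw SQL using the standard-library sqlite3 module; the table layout and sample rows are assumptions for illustration. Since the question mentions PostgreSQL, Django's order_by('product_id', '-created').distinct('product_id') may be another option worth checking in the docs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE productstatistics (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        statistics_value INTEGER,
        created TEXT
    );
    INSERT INTO productstatistics (product_id, statistics_value, created) VALUES
        (1, 100, '2020-01-01'),
        (1, 120, '2020-02-01'),
        (2, 50,  '2020-01-15'),
        (2, 40,  '2020-03-01'),
        (3, 70,  '2020-01-10');
    """
)


def latest_by_product_ids(conn, product_ids):
    """Return the most recent statistics row for each requested product."""
    if not product_ids:
        return []
    placeholders = ",".join("?" for _ in product_ids)
    # The subquery finds the newest timestamp per product; the join picks
    # the full row that carries that timestamp.
    sql = f"""
        SELECT s.product_id, s.statistics_value, s.created
        FROM productstatistics s
        JOIN (
            SELECT product_id, MAX(created) AS created
            FROM productstatistics
            WHERE product_id IN ({placeholders})
            GROUP BY product_id
        ) latest
          ON latest.product_id = s.product_id AND latest.created = s.created
        ORDER BY s.product_id
    """
    return conn.execute(sql, list(product_ids)).fetchall()


latest = latest_by_product_ids(conn, [1, 2])
```

If two rows for the same product share the exact same timestamp, this sketch returns both, so a tie-breaker (for example the highest id) would be needed in that case.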
As a side note: Django's coding style is to use the Manager (and possibly the QuerySet) for operations that work on the whole table, so instead of creating classmethods on your model you should create a custom manager:
class ProductStatisticManager(models.Manager):
    def create_for_products(self, product_ids):
        statistics = []
        products = Product.objects.filter(id__in=product_ids)
        for product in products:
            statistics.append(
                # self.model is the model class this manager is attached to
                self.model(product=product, statistics_value=product.price)
            )
        self.bulk_create(statistics)

    def get_latest_by_products_ids(self, product_ids):
        found = []
        for pid in product_ids:
            last = self.filter(product_id=pid).latest("created")
            found.append(last)
        return found

class ProductStatistics(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    statistics_value = models.IntegerField()
    product = models.ForeignKey(Product)

    objects = ProductStatisticManager()
Putting the method on the Product model would be easier:
class Product(models.Model):
    price = models.IntegerField()

    def get_latest_stat(self):
        return self.productstatistics_set.all().order_by('-created')[0]  # or [:1]
Using [:1] instead of [0] returns a QuerySet with a single element, while [0] returns a single instance of the model class.
For example:
>>> type(cls.objects.filter(product__id__in=product_ids).order_by('-created')[:1])
<class 'django.db.models.query.QuerySet'>
>>> type(cls.objects.filter(product__id__in=product_ids).order_by('-created')[0])
<class 'myApp.models.MyModel'>
I want to support several kinds of complex queries.
For example:
If someone wants information where "Fruit" is "Guava" in "Pune District", they should get the data for guava in Pune district.
http://api/?fruit=Guava&?district=Pune
If someone wants information where "Fruit" is "Guava" in "Girnare Taluka", they should get the data for guava in Girnare taluka.
http://api/?fruit=Guava&?taluka=Girnare
If someone wants information where "Fruit" is "Guava" or "Banana", they should get the data for only these two fruits, and so on.
http://api/?fruit=Guava&?Banana
But when I run the server, I don't get the correct output.
If I use http://api/?fruit=Banana, I get data for all fruits (banana, pomegranate, guava) instead of only banana, so I am confused about what is happening here.
Can you please check my code and tell me where I made a mistake?
Here are all my files:
models.py
class Wbcis(models.Model):
    Fruit = models.CharField(max_length=50)
    District = models.CharField(max_length=50)
    Taluka = models.CharField(max_length=50)
    Revenue_circle = models.CharField(max_length=50)
    Sum_Insured = models.FloatField()
    Area = models.FloatField()
    Farmer = models.IntegerField()

def get_wbcis(fruit=None, district=None, taluka=None, revenue_circle=None, sum_insured=None, area=None, min_farmer=None, max_farmer=None, limit=100):
    query = Wbcis.objects.all()
    if fruit is not None:
        query = query.filter(Fruit=fruit)
    if district is not None:
        query = query.filter(District=district)
    if taluka is not None:
        query = query.filter(Taluka=taluka)
    if revenue_circle is not None:
        query = query.filter(Revenue_circle=revenue_circle)
    if sum_insured is not None:
        query = query.filter(Sum_Insured=sum_insured)
    if area is not None:
        query = query.filter(Area=area)
    if min_farmer is not None:
        query = query.filter(Farmer__gte=min_farmer)
    if max_farmer is not None:
        query = query.filter(Farmer__lt=max_farmer)
    return query[:limit]
Views.py
class WbcisViewSet(ModelViewSet):
    queryset = Wbcis.objects.all()
    serializer_class = WbcisSerializer

def wbcis_view(request):
    fruit = request.GET.get("fruit")
    district = request.GET.get("district")
    taluka = request.GET.get("taluka")
    revenue_circle = request.GET.get("revenue_circle")
    sum_insured = request.GET.get("sum_insured")
    area = request.GET.get("area")
    min_farmer = request.GET.get("min_farmer")
    max_farmer = request.GET.get("max_farmer")
    wbcis = get_wbcis(fruit, district, taluka, revenue_circle, sum_insured, area, min_farmer, max_farmer)
    # convert them to JSON:
    dicts = []
    for wbci in wbcis:
        dicts.append(model_to_dict(wbci))
    # safe=False is required because the top-level object is a list, not a dict
    return JsonResponse(dicts, safe=False)
Serializers.py
from rest_framework.serializers import ModelSerializer
from WBCIS.models import Wbcis

class WbcisSerializer(ModelSerializer):
    class Meta:
        model = Wbcis
        fields = ('id', 'Fruit', 'District', 'Sum_Insured', 'Area', 'Farmer', 'Taluka', 'Revenue_circle',)
What needs to change in this code so that these queries return the exact output?
I don't think you're actually calling that view. Judging by your usage, I presume you're calling the viewset itself, which ignores the query params.
You should follow the DRF docs on filtering, but essentially: give your viewset a get_queryset method and move the code you currently have in your view into it.
class WbcisViewSet(ModelViewSet):
    queryset = Wbcis.objects.all()  # Shouldn't need this anymore
    serializer_class = WbcisSerializer

    def get_queryset(self):
        fruit = self.request.query_params.get("fruit")
        ....
        return get_wbcis(...)
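Two details of the example URLs are worth checking as well: parameters are separated by a bare & (not &?), and a repeated parameter such as fruit=Guava&fruit=Banana yields several values. The sketch below shows this with the standard library only; the sample rows and the helper filter_rows are made up for illustration and stand in for the database query.

```python
from urllib.parse import parse_qs, urlsplit

# Made-up sample rows standing in for the Wbcis table.
ROWS = [
    {"Fruit": "Guava", "District": "Pune", "Taluka": "Haveli"},
    {"Fruit": "Guava", "District": "Nashik", "Taluka": "Girnare"},
    {"Fruit": "Banana", "District": "Pune", "Taluka": "Haveli"},
    {"Fruit": "Pomegranate", "District": "Nashik", "Taluka": "Girnare"},
]

# Query parameter name -> field name on the rows.
PARAM_TO_FIELD = {"fruit": "Fruit", "district": "District", "taluka": "Taluka"}


def filter_rows(url, rows=ROWS):
    """Keep the rows that match every query parameter present in the URL."""
    params = parse_qs(urlsplit(url).query)  # e.g. {"fruit": ["Guava", "Banana"]}
    result = rows
    for param, field in PARAM_TO_FIELD.items():
        wanted = params.get(param)
        if wanted:
            # Several values for one parameter are treated as "any of these".
            result = [row for row in result if row[field] in wanted]
    return result


only_banana = filter_rows("http://api/?fruit=Banana")
guava_in_pune = filter_rows("http://api/?fruit=Guava&district=Pune")
two_fruits = filter_rows("http://api/?fruit=Guava&fruit=Banana")

# With "&?" the second key is parsed as "?district", so it never matches.
odd_keys = parse_qs("fruit=Guava&?district=Pune")
```

In the viewset, the equivalent of the "any of these" step would presumably be query_params.getlist("fruit") combined with a Fruit__in filter.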
EDIT:
It turns out the real question is - how do I get select_related to follow the m2m relationships I have defined? Those are the ones that are taxing my system. Any ideas?
I have two classes for my django app. The first (Item class) describes an item along with some functions that return information about the item. The second class (Itemlist class) takes a list of these items and then does some processing on them to return different values. The problem I'm having is that returning a list of items from Itemlist is taking a ton of queries, and I'm not sure where they're coming from.
class Item(models.Model):
    # for archiving purposes
    archive_id = models.IntegerField()
    users = models.ManyToManyField(User, through='User_item_rel',
                                   related_name='users_set')
    # for many to one relationship (tags)
    tag = models.ForeignKey(Tag)
    sub_tag = models.CharField(default='', max_length=40)
    name = models.CharField(max_length=40)
    purch_date = models.DateField(default=datetime.datetime.now())
    date_edited = models.DateTimeField(auto_now_add=True)
    price = models.DecimalField(max_digits=6, decimal_places=2)
    buyer = models.ManyToManyField(User, through='Buyer_item_rel',
                                   related_name='buyers_set')
    comments = models.CharField(default='', max_length=400)
    house_id = models.IntegerField()

    class Meta:
        ordering = ['-purch_date']

    def shortDisplayBuyers(self):
        if len(self.buyer_item_rel_set.all()) != 1:
            return "multiple buyers"
        else:
            return self.buyer_item_rel_set.all()[0].buyer.name

    def listBuyers(self):
        return self.buyer_item_rel_set.all()

    def listUsers(self):
        return self.user_item_rel_set.all()

    def tag_name(self):
        return self.tag

    def sub_tag_name(self):
        return self.sub_tag

    def __unicode__(self):
        return self.name
and the second class:
class Item_list:
    def __init__(self, list=None, house_id=None, user_id=None,
                 archive_id=None, houseMode=0):
        self.list = list
        self.house_id = house_id
        self.uid = int(user_id)
        self.archive_id = archive_id
        self.gen_balancing_transactions()
        self.houseMode = houseMode

    def ret_list(self):
        return self.list
So after I construct Itemlist with a large list of items, Itemlist.ret_list() takes up to 800 queries for 25 items. What can I do to fix this?
Try using select_related
As per a question I asked here
Dan is right in telling you to use select_related.
select_related can be read about here.
What it does is return in the same query data for the main object in your queryset and the model or fields specified in the select_related clause.
So, instead of a query like:
select * from item
followed by several queries like this every time you access one of the item_list objects:
select * from item_list where item_id = <one of the items for the query above>
the ORM will generate a query like:
select a.*, b.*
from item a join item_list b
on a.id = b.item_id
In other words: it will hit the database once for all the data.
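To see the difference in query counts, here is a small sketch using the standard-library sqlite3 module. The item and tag tables loosely mirror the Item.tag foreign key from the question; the sample data and the run helper are assumptions for illustration, and the SQL is only an approximation of what the ORM emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, tag_id INTEGER);
    INSERT INTO tag VALUES (1, 'food'), (2, 'rent');
    INSERT INTO item VALUES (1, 'milk', 1), (2, 'bread', 1), (3, 'march rent', 2);
    """
)

query_log = []


def run(sql, params=()):
    """Execute a query and remember it, so the queries can be counted."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


# Without select_related: one query for the items, then one more per item.
lazy = []
for item_id, name, tag_id in run("SELECT id, name, tag_id FROM item ORDER BY id"):
    tag_name = run("SELECT name FROM tag WHERE id = ?", (tag_id,))[0][0]
    lazy.append((name, tag_name))
n_plus_one_count = len(query_log)

query_log.clear()

# With a join (the select_related idea): everything arrives in one query.
joined = run(
    "SELECT i.name, t.name FROM item i JOIN tag t ON t.id = i.tag_id ORDER BY i.id"
)
join_count = len(query_log)
```

Both approaches produce the same data, but the first needs one query plus one per item, which is how 25 items with several relations each can add up to hundreds of queries.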
You probably want to use prefetch_related.
It works similarly to select_related, but can deal with relations that select_related cannot, such as many-to-many relationships. The join happens in Python, but I've found it to be more efficient for this kind of work than a large number of queries.
Related reading on the subject
What I want is to be able to get this week's, this month's, this year's (and so on) hottest products. So I have a model named ProductStatistic that logs each hit and each purchase on a day-to-day basis. These are the models I have to work with:
class Product(models.Model):
    name = models.CharField(_("Name"), max_length=200)
    slug = models.SlugField()
    description = models.TextField(_("Description"))
    picture = models.ImageField(upload_to=product_upload_path, blank=True)
    category = models.ForeignKey(ProductCategory)
    prices = models.ManyToManyField(Store, through='Pricing')

    objects = ProductManager()

    class Meta:
        ordering = ('name', )

    def __unicode__(self):
        return self.name

class ProductStatistic(models.Model):
    # There is only 1 `date` each day. `date` is
    # set by datetime.today().date()
    date = models.DateTimeField(default=datetime.now)
    hits = models.PositiveIntegerField(default=0)
    purchases = models.PositiveIntegerField(default=0)
    product = models.ForeignKey(Product)

    class Meta:
        ordering = ('product', 'date', 'purchases', 'hits', )

    def __unicode__(self):
        return u'%s: %s - %s hits, %s purchases' % (self.product.name, str(self.date).split(' ')[0], self.hits, self.purchases)
How would you go about sorting the Products after say (hits+(purchases*2)) the latest week?
This structure isn't set in stone either, so if you would structure the models in any other way, please tell!
first idea:
In the view you could query for today's ProductStatistic, then loop over the queryset, add a ranking variable to every object, and add that object to a list. Then just sort by ranking and pass the list to your template.
second idea:
Create a field ranking (hidden from the admin) and write the result of your formula to it each time the object is saved to the database, using a pre_save signal. Now you can do ProductStatistic.objects.filter(date=today()).order_by('ranking')
Both ideas have pros and cons, but I like the second idea more.
edit as response to the comment
Use Idea 2
Write a view where you filter like this: ProductStatistic.objects.filter(product=aProductObject, date__gte=startdate, date__lte=enddate)
Loop over the queryset and do something like aProductObject.ranking += qs_obj.ranking
Pass a sorted list of the queryset to the template
Basically a combination of both ideas
edit to your own answer
Your solution isn't far from what I suggested, just done in SQL space.
But another solution:
Make a Hit model:
class Hit(models.Model):
    date = models.DateTimeField(auto_now=True)
    product = models.ForeignKey(Product)
    purchased = models.BooleanField(default=False)
    session = models.CharField(max_length=40)
In your view for displaying a product, you check whether there is already a Hit object for this session and product. If not, you save one:
Hit(product=product,
    date=datetime.datetime.now(),
    session=request.session.session_key).save()
In your purchase view you get the Hit object and set purchased=True.
Now in your templates/DB tools you can do real statistics.
Of course this can generate a lot of DB objects over time, so you should think about a good deletion strategy (such as summing the data into another model, MonthlyHitArchive, after 3 months).
If you think that displaying these statistics would generate too much DB traffic, you should consider using some caching.
I solved this the way I didn't want to solve it. I added week_rank, month_rank and overall_rank to Product and then I just added the following to my ProductStatistic model.
def calculate_rank(self, days_ago=7, overall=False):
    if overall:
        return self._default_manager.all().extra(
            select = {'rank': 'SUM(hits + (clicks * 2))'}
        ).values()[0]['rank']
    else:
        return self._default_manager.filter(
            date__gte = datetime.today()-timedelta(days_ago),
            date__lte = datetime.today()
        ).extra(
            select = {'rank': 'SUM(hits + (clicks * 2))'}
        ).values()[0]['rank']

def save(self, *args, **kwargs):
    super(ProductStatistic, self).save(*args, **kwargs)
    t = Product.objects.get(pk=self.product.id)
    t.week_rank = self.calculate_rank()
    t.month_rank = self.calculate_rank(30)
    t.overall_rank = self.calculate_rank(overall=True)
    t.save()
I'll leave it unsolved if there is a better solution.
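As a possible alternative to storing the ranks on Product, the score can be computed per product at query time with a grouped sum. The sketch below shows the idea in raw SQL via the standard-library sqlite3 module; the table layout, the sample rows, the fixed "today" and the helper ranked_products are all assumptions for illustration. In Django the same shape could probably be expressed with annotate and Sum over an F() expression.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE productstatistic "
    "(product_id INTEGER, date TEXT, hits INTEGER, purchases INTEGER)"
)

today = date(2024, 1, 31)  # fixed date so the example is reproducible
sample = [
    (1, today - timedelta(days=1), 10, 1),
    (1, today - timedelta(days=20), 500, 50),  # outside the 7-day window
    (2, today - timedelta(days=2), 4, 5),
    (2, today - timedelta(days=3), 1, 0),
    (3, today - timedelta(days=5), 3, 0),
]
conn.executemany(
    "INSERT INTO productstatistic VALUES (?, ?, ?, ?)",
    [(pid, day.isoformat(), hits, bought) for pid, day, hits, bought in sample],
)


def ranked_products(conn, days_ago=7, today=today):
    """Return (product_id, score) pairs, best first, for the given window."""
    start = (today - timedelta(days=days_ago)).isoformat()
    return conn.execute(
        """
        SELECT product_id, SUM(hits + purchases * 2) AS score
        FROM productstatistic
        WHERE date >= ? AND date <= ?
        GROUP BY product_id
        ORDER BY score DESC
        """,
        (start, today.isoformat()),
    ).fetchall()


week = ranked_products(conn)
month = ranked_products(conn, days_ago=30)
```

This keeps the ranking formula in one place and avoids recalculating three stored ranks on every save, at the cost of doing the aggregation when the list is requested (which is where caching could help).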