I have a database table with over a million records.
In my view, I select all records like below:
data = Student.objects.all()
I get a memory error when rendering the result to a grid in the template.
Are there any good practices for running large querysets without this error?
Regards
Joshua
I can't comment yet, so I'll just post this as an answer. You might want to consider using
jQuery DataTables for the front-end UI. It has a server-side processing option, which is ideal for dealing with large databases. Just a suggestion.
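To make the server-side processing idea concrete, here is a rough sketch of the kind of JSON endpoint DataTables would call in that mode, assuming the Student model from the question (the view name and response fields are just examples; DataTables sends draw, start and length parameters when server-side processing is enabled):

from django.http import JsonResponse

from .models import Student  # assumed model from the question

def students_datatable(request):
    draw = int(request.GET.get("draw", 1))
    start = int(request.GET.get("start", 0))
    length = int(request.GET.get("length", 25))

    queryset = Student.objects.order_by("pk")
    total = queryset.count()

    # Only the requested slice is fetched, so the full table
    # never has to fit in memory or be rendered at once.
    rows = [{"id": s.pk, "name": str(s)} for s in queryset[start:start + length]]

    return JsonResponse({
        "draw": draw,
        "recordsTotal": total,
        "recordsFiltered": total,
        "data": rows,
    })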
Related
This might sound like a stupid question, so apologies in case I'm wasting your time. I have tons of results coming from the data in my Django project. It's a table with many columns and almost 4,000 rows. I am using DataTables for pagination, filtering, horizontal scrolling, and sorting the columns.
Server-side, I am also using django-filter for querying the database.
My problem is that loading the initial data (not yet filtered via django-filter) takes a lot of time. Should I implement pagination on the server side? If so, how does this work with DataTables? Will DataTables paginate/display only the first page of paginated data coming from the server-side query? Is there a way for the two to work together?
Thanks.
I have a model with approximately 150K rows.
It takes 1.3s to render the ListView for this model.
When I click the change link in the ListView, it takes almost 2 minutes to render the change view.
Other models have normal render times for the edit view.
Any ideas how to speed this up?
Your best bet is to limit the number of returned rows and implement some type of pagination in your application.
Django conveniently implements pagination for you; a quick sketch is below.
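A minimal sketch of a paginated view using Django's built-in Paginator, assuming the Student model from the question; the template name and page size are just examples (get_page is available from Django 2.0, older versions use paginator.page()):

from django.core.paginator import Paginator
from django.shortcuts import render

from .models import Student  # the model from the question

def student_list(request):
    students = Student.objects.order_by("pk")  # a stable ordering keeps pages consistent
    paginator = Paginator(students, 100)       # 100 rows per page
    page = paginator.get_page(request.GET.get("page"))  # only this slice hits the database
    return render(request, "students.html", {"page": page})

In the template you then loop over page.object_list and use page.has_next / page.has_previous to build the navigation, so only one page of the million rows is ever rendered.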
First of all, ask yourself these questions:
Are you doing a lot of work with your data in the templates?
Can you do that work in the backend and only render the result in the template?
Are you using pagination?
As far as I know, pagination in Django is implemented with LIMIT and OFFSET SQL statements, which do not perform well once you have many pages. In our projects, we wrote raw SQL for this purpose, which works a bit faster.
Also, you can install the Django Debug Toolbar, which shows you which statements the Django ORM is executing and measures how long they take.
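To illustrate the raw-SQL point, one common trick is keyset (seek) pagination, which filters on the last seen id instead of using a large OFFSET. This is just a sketch with assumed table and column names, not the exact query we used:

from django.db import connection

def fetch_page(last_id=0, page_size=100):
    # WHERE id > last_id uses the primary key index, so deep pages
    # stay fast, unlike OFFSET which must skip over all earlier rows.
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT id, name FROM myapp_student "
            "WHERE id > %s ORDER BY id LIMIT %s",
            [last_id, page_size],
        )
        return cursor.fetchall()

The same idea works in the ORM as Student.objects.filter(pk__gt=last_id).order_by("pk")[:page_size]; the trade-off is that you can only page forward by remembering the last id of the previous page.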
I'm working on a template in one of my apps, and I need a lot of records in a table to see how it looks (and several other behaviors) in the template when queried. I don't want to waste my time inserting over 30 records one by one. I'm trying to do a bulk insert, but I have no previously dumped data or anything similar to populate from.
The correctness of the data is not important to me; the quantity is.
Does that have anything to do with mocking?
I'm not trying to unit test anything.
Thanks,
Which type of data do you want? I mean, blog posts or something else? Try using Faker with Django, which provides fake data.
Faker - Documentation
Faker - github
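A minimal sketch of using Faker together with bulk_create to get a few hundred throwaway rows; the Person model and its fields are placeholders for whatever your own model looks like:

from faker import Faker

from myapp.models import Person  # placeholder model and fields

fake = Faker()

# Build the objects in memory first, then insert them in a single query.
people = [
    Person(name=fake.name(), email=fake.email(), address=fake.address())
    for _ in range(300)
]
Person.objects.bulk_create(people)

You can run this from ./manage.py shell or wrap it in a small management command; the data is nonsense, but the quantity is there.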
I need to create some fixtures for Django tests. Does anyone know of a shortcut that lets me get x rows from every table in the database and serialize them?
Background:
I have multiple tables with tens of millions of entries. I have tried using ./manage.py dumpdata, but in addition to taking too long, no fixture should be that large.
Each table has multiple foreign keys.
The Problem:
The code I am trying to test frequently calls select_related(), meaning I need all of the foreign key relationships filled in.
Does anyone know of any tools that can help me follow foreign key relationships when serializing DB data? Any suggestions would be greatly appreciated.
Thank you for your time.
I have used the django-autofixture pluggable app in a couple of projects. You could give that a shot. Instead of using data from your database for tests, create a development database filled with autofixtures.
This link has a few other examples of similar pluggable apps.
http://djangopackages.com/grids/g/fixtures/
Another option is the tool Django Dynamic Fixture, which follows foreign keys and many-to-many fields. You can also use the "number_of_laps" option, which may help you.
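A rough sketch of what that looks like with django-dynamic-fixture's G helper; the model names are examples, and number_of_laps is the option mentioned above:

from ddf import G

from myapp.models import Book  # example model with foreign keys

# G creates a Book and automatically fills in any required
# foreign keys (Author, Publisher, ...) with dummy values.
book = G(Book)

# For self-referencing or cyclic relations, the number_of_laps option
# controls how many levels of related objects get created, e.g.:
# category = G(Category, number_of_laps=2)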
I have the following model
class Plugin(models.Model):
    name = models.CharField(max_length=50)
    # more fields
which represents a plugin that can be downloaded from my site. To track downloads, I have
class Download(models.Model):
    plugin = models.ForeignKey(Plugin)
    timestamp = models.DateTimeField(auto_now=True)
So to build a view showing plugins sorted by downloads, I have the following query:
# pbd is plugins by download - commented here to prevent scrolling
pbd = Plugin.objects.annotate(dl_total=Count('download')).order_by('-dl_total')
This works, but is very slow. With only 1,000 plugins, the average response is 3.6-3.9 seconds (devserver with a local PostgreSQL db), whereas a similar view with a much simpler query (sorting by plugin release date) takes 160 ms or so.
I'm looking for suggestions on how to optimize this query. I'd really prefer that the query return Plugin objects (as opposed to using values) since I'm sharing the same template for the other views (Plugins by rating, Plugins by release date, etc.), so the template is expecting Plugin objects - plus I'm not sure how I would get things like the absolute_url without a reference to the plugin object.
Or, is my whole approach doomed to failure? Is there a better way to track downloads? I ultimately want to provide users some nice download statistics for the plugins they've uploaded - like downloads per day/week/month. Will I have to calculate and cache Downloads at some point?
EDIT: In my test dataset, there are somewhere between 10-20 Download instances per Plugin - in production I expect this number would be much higher for many of the plugins.
That does seem unusually slow. There's nothing obvious in your query that would cause that slowness, though. I've done very similar queries in the past, with larger datasets, and they have executed in milliseconds.
The only suggestion I have for now is to install the Django Debug Toolbar, find the offending query in its SQL panel, and use EXPLAIN to get the database to tell you exactly what it is doing when it executes. If it's doing subqueries, for example, check that they are using an index; if not, you may need to define one manually in the db. If you like, post the result of EXPLAIN here and I'll help further if possible.
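If you would rather run EXPLAIN by hand instead of through the toolbar, one way (a sketch; the query is the one from the question, which has no string parameters, so str(queryset.query) is safe to pass through) is:

from django.db import connection
from django.db.models import Count

from myapp.models import Plugin  # the model from the question

pbd = Plugin.objects.annotate(dl_total=Count('download')).order_by('-dl_total')

# str(pbd.query) is the SQL Django will send to PostgreSQL.
with connection.cursor() as cursor:
    cursor.execute("EXPLAIN ANALYZE " + str(pbd.query))
    for (line,) in cursor.fetchall():
        print(line)

More recent Django versions also expose this directly as queryset.explain().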
An annotation like this is slow because the database has to count every related Download row for each Plugin before it can sort.
One direct way to fix this is to denormalize: add a download_count field to the Plugin model that is incremented whenever a new Download is saved, and sort Plugins by that field instead of the aggregate query.
If you think there will be too many downloads to update the Plugin row every time, you can instead update the download_count field periodically via a cron job.
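A sketch of that denormalization, assuming you add the counter field yourself; the F() expression keeps the increment atomic in the database rather than doing a read-modify-write in Python:

from django.db import models
from django.db.models import F

class Plugin(models.Model):
    name = models.CharField(max_length=50)
    download_count = models.PositiveIntegerField(default=0)  # denormalized counter
    # more fields

class Download(models.Model):
    plugin = models.ForeignKey(Plugin)
    timestamp = models.DateTimeField(auto_now=True)

    def save(self, *args, **kwargs):
        created = self.pk is None
        super(Download, self).save(*args, **kwargs)
        if created:
            # Single UPDATE, no race between concurrent downloads.
            Plugin.objects.filter(pk=self.plugin_id).update(
                download_count=F('download_count') + 1)

# The listing then becomes a plain ORDER BY on an indexed column:
# pbd = Plugin.objects.order_by('-download_count')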