What is the best way to run heavy periodic tasks in Django?

I have a Django app with more than 500 tables, one table per device (each device sends about 500 readings every day, and I store them in the database).
I need to compute 10-minute, hourly, daily, weekly, and monthly averages and store them in another table called averages.
I don't know what the best way to run these periodic tasks is: using a Django-side tool like celery-beat, or using supervisor on the host?
Thanks a lot.
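If the job stays inside Django, celery-beat is a natural fit. Below is a minimal sketch, assuming for illustration a single readings model keyed by device_id (the question describes one table per device) and hypothetical model, field, and module names:

```python
# Hypothetical sketch of an hourly averaging task driven by celery-beat.
# Reading, Average, and the "myapp" module are assumptions, not names from
# the question, which describes one table per device rather than one model.
from datetime import timedelta

from celery import shared_task
from django.db.models import Avg
from django.utils import timezone

from myapp.models import Average, Reading  # hypothetical models


@shared_task
def compute_hourly_averages():
    """Average the last hour of readings per device and store the results."""
    since = timezone.now() - timedelta(hours=1)
    rows = (
        Reading.objects.filter(created_at__gte=since)
        .values("device_id")
        .annotate(avg_value=Avg("value"))
    )
    Average.objects.bulk_create(
        [
            Average(
                device_id=row["device_id"],
                value=row["avg_value"],
                period="hourly",
                computed_at=timezone.now(),
            )
            for row in rows
        ]
    )


# Registered in settings (when Celery reads its config from Django settings):
# from celery.schedules import crontab
# CELERY_BEAT_SCHEDULE = {
#     "hourly-averages": {
#         "task": "myapp.tasks.compute_hourly_averages",
#         "schedule": crontab(minute=0),
#     },
# }
```

The 10-minute, daily, weekly, and monthly windows follow the same pattern with different crontab arguments. Running a management command from the host's cron/supervisor works too; celery-beat simply keeps the schedule inside the project and lets long averaging runs happen off the request path.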

Related

Best way to store Celery results

I am using Celery for my async tasks; however, I am using Redis as the broker/result backend. My current plan is to filter and manipulate the data inside the backend and store it in the Django DB for viewing, etc.
Is this the recommended way? Or should I use the Django DB as the result backend, store all the raw data there, and then filter and manipulate it into different tables?
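If the goal is to filter and reshape results with the ORM afterwards, one common setup keeps Redis as the broker and stores results in the Django DB via django-celery-results. A sketch, assuming that package and the usual CELERY_-prefixed settings namespace:

```python
# settings.py sketch: Redis stays the message broker, task results go to the
# Django database (requires the django-celery-results package).
INSTALLED_APPS = [
    # ...
    "django_celery_results",
]

CELERY_BROKER_URL = "redis://localhost:6379/0"
CELERY_RESULT_BACKEND = "django-db"
```

Whether the raw data should live there too is a separate question; the result backend is meant for task state and return values, and large raw datasets are usually written to their own tables by the task itself.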

What is the best way of manipulating millions of records and saving them back to the database, with the option to cancel and pause?

I am using Django with Python 2.7. I have an Excel sheet with millions of rows. I have to manipulate the row data and save it back to the database (PostgreSQL), and I want to do it efficiently. These are the approaches I am thinking of:
1.) Enqueue all the rows (data) in a queue (preferably RabbitMQ), fetch them in batches of 100 entries at a time, process them, and save the results to the database.
2.) Use background threads, each managing 100 rows and saving the results back to the database. I'm not sure how many database connections would be opened in this scenario.
Can you please suggest an efficient way to achieve this? It would be really helpful.
How do I implement cancel and pause logic in this scenario? Should I be using a database?
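One way to get batching plus cancel/pause is to keep a status row in the database and have each batch-sized task check it before doing work. A rough sketch, assuming a hypothetical JobControl model and a Celery worker (the question mentions Python 2.7, so treat this as an illustration of the pattern rather than drop-in code):

```python
# Sketch of chunked processing with a DB-backed cancel/pause flag.
# JobControl, Row, and transform() are hypothetical stand-ins.
from celery import shared_task

from myapp.models import JobControl, Row  # hypothetical models

BATCH_SIZE = 100


def transform(value):
    """Placeholder for the real per-row manipulation."""
    return value


@shared_task
def process_rows(job_id, last_pk=0):
    job = JobControl.objects.get(pk=job_id)
    if job.status == "cancelled":
        return
    if job.status == "paused":
        # Check again later without losing the current position.
        process_rows.apply_async((job_id, last_pk), countdown=30)
        return

    batch = list(Row.objects.filter(pk__gt=last_pk).order_by("pk")[:BATCH_SIZE])
    if not batch:
        JobControl.objects.filter(pk=job_id).update(status="done")
        return

    for row in batch:
        row.value = transform(row.value)
    Row.objects.bulk_update(batch, ["value"])

    # Queue the next batch; each task stays short and re-checks the flag.
    process_rows.delay(job_id, batch[-1].pk)
```

Cancelling or pausing is then just an UPDATE on the JobControl row from a view or admin action; no thread or queue has to be interrupted mid-batch.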

Long running prepare statements on Azure SQL Data Warehouse

I am running a functional test of a 3rd party application on an Azure SQL Data Warehouse database set at DWU 1000. In reviewing the current activity via:
sys.dm_pdw_exec_requests
I see:
prepare statements taking 30+ seconds,
NULL statements taking up to 25 seconds,
statement compilation taking up to 60 seconds,
explain statements taking 60+ seconds, and
select count(1) on empty tables taking 60+ seconds.
How does one identify the bottleneck involved?
The test has been running for a few hours and the Azure portal shows little DWU consumed on average, so I doubt that modifying the DWU will make any difference.
The third-party application has a workload management feature, so I've specified a limit of 30 connections to the ADW database (understanding that only 32 sessions can be active on the database itself).
There are approximately 1,800 tables and 350 views in the database across 29 schemas (per information_schema.tables).
I am in a functional testing mode, so many of the tables involved in the queries have not yet been loaded, but statistics have been created on every column on every table in the scope of the test.
One userID is being used in the test. It is in smallrc.
Have a look at the tables involved in the query: make sure all columns used in joins, GROUP BY, and ORDER BY clauses have up-to-date statistics.
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-statistics

How to do task scheduling in Python?

I have a Python application which is used to fetch data from one database and load it into another. I want to schedule this process on a daily basis. How can I implement this?
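One option (an assumption, since the question names no library) is APScheduler; a system cron entry or celery-beat would do the same job:

```python
# Daily scheduling sketch using APScheduler's cron trigger.
from apscheduler.schedulers.blocking import BlockingScheduler


def sync_databases():
    """Placeholder for the fetch-from-one-DB, write-to-the-other logic."""
    ...


scheduler = BlockingScheduler()
scheduler.add_job(sync_databases, "cron", hour=2, minute=0)  # every day at 02:00
scheduler.start()  # blocks and runs the job on schedule
```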

Cronjob: Web Service query

I have a cronjob that runs every hour and parses 150,000+ records. Each record is summarized individually in a MySQL table. I use two web services to retrieve user information:
User demographics (IP, country, city, etc.)
Phone information (whether it is a landline or a cell phone, and if a cell phone, which carrier)
Every time I process a record, I check whether I already have the information, and if not I call these web services. After tracing my code I found out that both of these calls take 2 to 4 seconds each, which makes my cronjob very slow, and I can't compile statistics on time.
Is there a way to make these web service calls faster?
Thanks
Simple:
Get the data locally and use Melissa Data:
for IP: http://w10.melissadata.com/dqt/websmart/ip-locator.htm
for phone: http://www.melissadata.com/fonedata.html
You can also cache the results using memcache or APC, which will make it faster since the job does not have to request the data from the API or database each time.
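The caching suggested here is the usual cache-aside pattern; a Python sketch (the key scheme, TTL, and web-service call are placeholders, and the original context may well be PHP with APC):

```python
# Cache-aside lookup: ask the cache first, call the slow web service only on a miss.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))


def call_phone_webservice(phone_number):
    """Placeholder for the real 2-4 second carrier lookup call."""
    ...


def lookup_phone_info(phone_number):
    key = "phone:" + phone_number
    cached = cache.get(key)
    if cached is not None:
        return cached
    info = call_phone_webservice(phone_number)
    cache.set(key, info, expire=86400)  # keep the result for one day
    return info
```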
A couple of ideas: if the same users are returning, caching the data in another table would be very helpful; you would only look it up once and have it available for returning users. On re-reading the question, it looks like you are already doing that.
Another option would be to spawn new threads when you need to do the look-ups. This could be a new thread for each request, or if this is not feasible you could have n service threads ready to do the look-ups and update the results.
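A sketch of the "n service threads" idea using Python's ThreadPoolExecutor (the record shape and lookup functions are hypothetical placeholders):

```python
# Run the slow per-record lookups concurrently; each call mostly waits on the
# network, so ~20 worker threads shorten the wall-clock time of the hourly job.
from concurrent.futures import ThreadPoolExecutor


def lookup_ip_info(ip):
    """Placeholder for the demographic web-service call."""
    ...


def lookup_phone_info(phone):
    """Placeholder for the phone/carrier web-service call."""
    ...


def enrich(record):
    record["geo"] = lookup_ip_info(record["ip"])
    record["carrier"] = lookup_phone_info(record["phone"])
    return record


def enrich_all(records, workers=20):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(enrich, records))
```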