App Engine GQL: querying a date range - django

What would be the App Engine equivalent of this Django statement?
return Post.objects.get(created_at__year=bits[0],
                        created_at__month=bits[1],
                        created_at__day=bits[2],
                        slug__iexact=bits[3])
I've ended up writing this:
Post.gql('WHERE created_at > DATE(:1, :2, :3) AND created_at < DATE(:1, :2, :4) and slug = :5',
         int(bits[0]), int(bits[1]), int(bits[2]), int(bits[2]) + 1, bits[3])
But it's pretty horrific compared to Django. Any other more Pythonic/Django-magic way, e.g. with Post.filter() or created_at.day/month/year attributes?

How about
from datetime import datetime, timedelta
created_start = datetime(year, month, day)
created_end = created_start + timedelta(days=1)
slug_value = 'my-slug-value'
posts = Post.all()
posts.filter('created_at >=', created_start)
posts.filter('created_at <', created_end)
posts.filter('slug =', slug_value)
# You can iterate over this query set just like a list
for post in posts:
    print post.key()
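A note on the bounds: building the end of the range with timedelta(days=1) rolls over month and year boundaries, whereas adding 1 to the day number, as the GQL version in the question does, produces an invalid date on the last day of a month. A minimal sketch of just the bounds computation in plain Python; the helper name day_bounds is made up for illustration:

```python
from datetime import datetime, timedelta

def day_bounds(year, month, day):
    """Return the half-open range [start, end) covering one calendar day."""
    start = datetime(int(year), int(month), int(day))
    return start, start + timedelta(days=1)

# The end bound rolls over month and year boundaries.
start, end = day_bounds("2009", "12", "31")
print(start, end)
```

The two values can then be passed to the 'created_at >=' and 'created_at <' filters.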

You don't need 'relativedelta' - what you describe is a datetime.timedelta. Otherwise, your answer looks good.
As far as processing time goes, the nice thing about App Engine is that nearly all queries have the same cost-per-result - and all of them scale proportionally to the records returned, not the total datastore size. As such, your solution works fine.
Alternately, if you need your one inequality filter for something else, you could add a 'created_day' DateProperty, and do a simple equality check on that.

Ended up using the relativedelta library + chaining the filters in jQuery style, which although not too Pythonic yet, is a tad more comfortable to write and much DRYer. :) Still not sure if it's the best way to do it, as it'll probably require more database processing time?
date = datetime(int(year), int(month), int(day))
... # then
queryset = (Post.objects_published()
            .filter('created_at >=', date)
            .filter('created_at <', date + relativedelta(days=+1)))
...
and passing slug to the object_detail view or yet another filter.

By the way, you could use datetime.timedelta, which lets you compute date ranges and date deltas.

Related

How to add weeks to a datetime column, depending on a django model/dictionary?

Context
There is a dataframe of customer invoices and their due dates (identified by customer code).
Week(s) need to be added depending on customer code
Model is created to persist the list of customers and week(s) to be added
What is done so far:
Models.py
class BpShift(models.Model):
    bp_name = models.CharField(max_length=50, default='')
    bp_code = models.CharField(max_length=15, primary_key=True, default='')
    weeks = models.IntegerField(default=0)
helper.py
from .models import BpShift
# used in views later
def week_shift(self, df):
    df['DueDateRange'] = df['DueDate'] + datetime.timedelta(
        weeks=BpShift.objects.get(pk=df['BpCode']).weeks)
I realised my understanding of Dataframes is seriously flawed.
df['A'] and df['B'] return Series, so of course timedelta won't work when it is called like this: weeks=BpShift.objects.get(pk=df['BpCode']).weeks.
Dataframe
d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d)
Customer List csv
BP Name,BP Code,Week(s)
Customer1,CA0023MY,1
Customer2,CA0064SG,1
Error
BpShift matching query does not exist.
Commentary
I used these methods in hope that I would be able to change the dataframe at once, instead of
using df.iterrows(). I have recently been avoiding for loops like the plague and wondering if this
is the "correct" mentality. Is there any recommended way of doing this? Thanks in advance for any guidance!
The question "Python & Pandas: series to timedelta" will help take you from a Series to a timedelta. And although
pandas.Series(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('weeks', flat=True)
)
will give you a Series of integers, I doubt the order is the same as in df['BpCode'], because it depends on the Django model and the database backend.
So you might be better off to explicitly create not a Series, but a DataFrame with pk and weeks columns so you can use df.join. Something like this
pandas.DataFrame(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('pk', 'weeks'),
    columns=['BpCode', 'weeks'],
)
should give you a DataFrame that you can join with.
So combined this should be the gist of your code:
django_response = [('customer1', 1), ('customer2', '2')]
d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d).set_index('BpCode').join(
    pd.DataFrame(django_response, columns=['BpCode', 'weeks']).set_index('BpCode')
)
df['DueDate'] = pd.to_datetime(df['DueDate'])
df['weeks'] = pd.to_numeric(df['weeks'])
df['new_duedate'] = df['DueDate'] + df['weeks'] * pd.Timedelta('1W')
print(df)
             DueDate  weeks new_duedate
BpCode
customer1 2020-05-30      1  2020-06-06
customer2 2020-04-30      2  2020-05-14
You were right to want to avoid looping. This approach fetches all the data from your Django model in one SQL query by using filter, then does a left join with the DataFrame you already have. It casts the dates and weeks to the right types and then computes the new due date on whole columns instead of looping over rows.
NB the left join will give NaN and NaT for customers that don't exist in your Django database. You can either avoid those rows by passing how='inner' to df.join or handle them whatever way you like.
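For comparison, the same keyed lookup can be sketched without pandas. The pairs below stand in for the result of values_list('pk', 'weeks') and are made-up data, and the function name shift_due_dates is hypothetical. Keying by customer code avoids relying on the order in which the database returns rows, and unknown customers are skipped, much like an inner join:

```python
from datetime import date, timedelta

def shift_due_dates(invoices, weeks_by_code):
    """Add each customer's week offset to its due date.

    invoices: list of (bp_code, due_date) tuples.
    weeks_by_code: dict mapping bp_code to a number of weeks.
    Invoices whose code is missing from the dict are dropped.
    """
    return [
        (code, due, due + timedelta(weeks=weeks_by_code[code]))
        for code, due in invoices
        if code in weeks_by_code
    ]

# Stand-in for list(BpShift.objects.values_list('pk', 'weeks')).
weeks_by_code = dict([("customer1", 1), ("customer2", 2)])
invoices = [
    ("customer1", date(2020, 5, 30)),
    ("customer2", date(2020, 4, 30)),
    ("customer3", date(2020, 1, 1)),  # not in the lookup, so skipped
]
shifted = shift_due_dates(invoices, weeks_by_code)
```

This still loops in Python, so for large frames the join shown above remains the better fit.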

How can I use the F() object to do this with the Django ORM?

I encountered a model like this:
class Task(models.Model):
    timespan = models.IntegerField(null=True, blank=True)

class Todo(models.Model):
    limitdate = models.DateTimeField(null=True, blank=True)
    task = models.ForeignKey(Task)
I need to extract all Todos with a limitdate that is lower or equal to today's date + a timespan defined in the related Task model.
Something like (dummy example):
today = datetime.datetime.now()
Todo.objects.filter(limitdate__lte=today + F('task__timespan'))
Now, I can do that with a loop but I'm looking for a way to do it with F(), and I can't find one.
I'm starting to wonder if I can do that with F(). Maybe I should use extra ?
Please note that I don't have the luxury of changing the model code.
The main issue is that databases do not support date + integer directly, and it is hard to write an ORM query that produces date + integer::interval (PostgreSQL syntax, for example), where integer is the value of the task timespan column as a count of days.
However, since
limitdate <= today + task__timespan is equivalent to
limitdate - today <= task__timespan
We could transform the query to
Todo.objects.filter(task__timespan__gte=F('limitdate') - today).distinct()
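Thus the SQL becomes something like integer >= date - date. That should work in PostgreSQL for date columns, where date - date yields a day count that can be compared with an integer; subtracting timestamps yields an interval instead, which would need converting before the comparison.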
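The rearrangement itself can be sanity-checked in plain Python before trying it against a database. This is only a sketch with made-up rows, assuming timespan is a number of days; it says nothing about what SQL a particular backend accepts:

```python
from datetime import datetime, timedelta

today = datetime(2012, 5, 1)

# (limitdate, timespan in days) pairs standing in for Todo rows joined to Task.
rows = [
    (datetime(2012, 5, 3), 5),   # due in 2 days, timespan 5
    (datetime(2012, 5, 10), 5),  # due in 9 days, timespan 5
    (datetime(2012, 4, 28), 0),  # already overdue
]

# Original form: limitdate <= today + timespan
original = [r for r in rows if r[0] <= today + timedelta(days=r[1])]

# Rearranged form: limitdate - today <= timespan
rearranged = [r for r in rows if r[0] - today <= timedelta(days=r[1])]

assert original == rearranged
```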
In other databases such as SQLite, it's more complicated because dates need to be cast with julianday() first, and I think you need to play with extra() or even raw() to get the correct SQL.
Also, as Chris Pratt suggests, if you could use timestamps in all the relevant fields, the query might become easier because add and subtract operations are less limited.
P.S. I don't have env to verify it now, you could try it first.
The problem is that there's no TIMESPAN type in a database, so F cannot return something that you can actually work with in this context. I'm not sure what type of field you actually used in your database, but the only way I can think of to do this is to store the timespan as an integer number of seconds, add that to "today" as a timestamp, and then convert the result back into a datetime that you can compare with limitdate. However, I'm unsure if Django will accept such complex logic with F.

Django queryset aggregate by time interval

Hi, I am writing a Django view which outputs data for graphing on the client side (Highcharts). The data is climate data with a given parameter recorded once per day.
My query is this:
format = '%Y-%m-%d'
sd = datetime.datetime.strptime(startdate, format)
ed = datetime.datetime.strptime(enddate, format)
data = Climate.objects.filter(recorded_on__range = (sd, ed)).order_by('recorded_on')
Now, as the range is increased the dataset obviously gets larger and this does not present well on the graph (aside from slowing things down considerably).
Is there a way to group my data as averages over time periods, specifically the average for each month or for each year?
I realize this could be done in SQL as mentioned here: django aggregation to lower resolution using grouping by a date range
But I would like to know if there is a handy way in Django itself.
Or is it perhaps better to modify the db directly and use a script to populate month and year fields from the timestamp?
Any help much appreciated.
Have you tried using django-qsstats-magic (https://github.com/kmike/django-qsstats-magic)?
It makes things very easy for charting, here is a timeseries example from their docs:
from django.contrib.auth.models import User
import datetime, qsstats
qs = User.objects.all()
qss = qsstats.QuerySetStats(qs, 'date_joined')
today = datetime.date.today()
seven_days_ago = today - datetime.timedelta(days=7)
time_series = qss.time_series(seven_days_ago, today)
print 'New users in the last 7 days: %s' % [t[1] for t in time_series]
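If pulling the rows into Python is acceptable, the monthly averaging itself is short. Below is a sketch with made-up (recorded_on, value) pairs standing in for the queryset rows; the function name monthly_averages is hypothetical, and for large date ranges doing the grouping in the database would avoid transferring every row:

```python
from collections import defaultdict
from datetime import date

def monthly_averages(records):
    """Average values per (year, month), returned in chronological order."""
    buckets = defaultdict(list)
    for recorded_on, value in records:
        buckets[(recorded_on.year, recorded_on.month)].append(value)
    return [(key, sum(vals) / len(vals)) for key, vals in sorted(buckets.items())]

records = [
    (date(2012, 1, 1), 10.0),
    (date(2012, 1, 2), 20.0),
    (date(2012, 2, 1), 30.0),
]
averages = monthly_averages(records)
```

Grouping by year instead only requires changing the bucket key to recorded_on.year.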

Select DISTINCT individual columns in django?

I'm curious if there's any way to do a query in Django that's not a "SELECT * FROM..." underneath. I'm trying to do a "SELECT DISTINCT columnName FROM ..." instead.
Specifically I have a model that looks like:
class ProductOrder(models.Model):
    Product = models.CharField(max_length=20, primary_key=True)
    Category = models.CharField(max_length=30)
    Rank = models.IntegerField()
where the Rank is a rank within a Category. I'd like to be able to iterate over all the Categories doing some operation on each rank within that category.
I'd like to first get a list of all the categories in the system and then query for all products in that category and repeat until every category is processed.
I'd rather avoid raw SQL, but if I have to go there, that'd be fine. Though I've never coded raw SQL in Django/Python before.
One way to get the list of distinct values of a column from the database is to use distinct() in conjunction with values().
In your case you can do the following to get the names of distinct categories:
q = ProductOrder.objects.values('Category').distinct()
print q.query # See for yourself.
# The query would look something like
# SELECT DISTINCT "app_productorder"."category" FROM "app_productorder"
There are a couple of things to remember here. First, this will return a ValuesQuerySet which behaves differently from a QuerySet. When you access say, the first element of q (above) you'll get a dictionary, NOT an instance of ProductOrder.
Second, it would be a good idea to read the warning note in the docs about using distinct(). The above example will work but all combinations of distinct() and values() may not.
PS: it is a good idea to use lower case names for fields in a model. In your case this would mean rewriting your model as shown below:
class ProductOrder(models.Model):
    product = models.CharField(max_length=20, primary_key=True)
    category = models.CharField(max_length=30)
    rank = models.IntegerField()
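To see what a DISTINCT query on a single column returns, the SQL can be run against a throwaway SQLite database from the standard library. The table name, column names and rows below are made up for illustration and are not necessarily what Django would generate for this model:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE productorder ("
    "product TEXT PRIMARY KEY, category TEXT, rank_in_category INTEGER)"
)
conn.executemany(
    "INSERT INTO productorder VALUES (?, ?, ?)",
    [("apple", "fruit", 1), ("pear", "fruit", 2), ("kale", "vegetable", 1)],
)

# One row per distinct category, not one row per product.
categories = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT category FROM productorder ORDER BY category"
    )
]
```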
It's quite simple actually if you're using PostgreSQL, just use distinct(columns) (documentation).
ProductOrder.objects.all().distinct('category')
Note that this feature has been included in Django since 1.4.
Use order_by with that field, and then do distinct.
ProductOrder.objects.order_by('category').values_list('category', flat=True).distinct()
The other answers are fine, but this is a little cleaner, in that it only gives the values like you would get from a DISTINCT query, without any cruft from Django.
>>> set(ProductOrder.objects.values_list('category', flat=True))
{u'category1', u'category2', u'category3', u'category4'}
or
>>> list(set(ProductOrder.objects.values_list('category', flat=True)))
[u'category1', u'category2', u'category3', u'category4']
And, it works without PostgreSQL.
This is less efficient than using a .distinct(), presuming that DISTINCT in your database is faster than a python set, but it's great for noodling around the shell.
Update:
This answer is great for making queries in the Django shell during development. DO NOT use this solution in production unless you are absolutely certain that you will always have a trivially small number of results before set is applied. Otherwise, it's a terrible idea from a performance standpoint.
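For the original goal of processing each category's products in rank order, a single ordered result set can also be grouped in Python instead of issuing one query per category. Here is a sketch using itertools.groupby on made-up (category, rank, product) tuples, which would need to arrive already sorted by category and then rank:

```python
from itertools import groupby
from operator import itemgetter

# Stand-in for rows ordered by category, then rank.
rows = [
    ("fruit", 1, "apple"),
    ("fruit", 2, "pear"),
    ("vegetable", 1, "kale"),
]

# groupby only merges adjacent rows, hence the sorting requirement.
by_category = {
    category: [product for _, _, product in group]
    for category, group in groupby(rows, key=itemgetter(0))
}
```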

Bulk updating a table

I want to update a customer table with a spreadsheet from our accounting system. Unfortunately I can't just clear out the data and reload all of it, because there are a few records in the table that are not in the imported data (don't ask).
For 2000 records this is taking about 5 minutes, and I wondered if there was a better way of doing it.
for row in data:
    try:
        try:
            customer = models.Retailer.objects.get(shared_id=row['Customer'])
        except models.Retailer.DoesNotExist:
            customer = models.Retailer()
            customer.shared_id = row['Customer']
        customer.name = row['Name 1']
        customer.address01 = row['Street']
        customer.address02 = row['Street 2']
        customer.postcode = row['Postl Code']
        customer.city = row['City']
        customer.save()
    except:
        print formatExceptionInfo("Error with Customer ID: " + str(row['Customer']))
Look at my answer here: Django: form that updates X amount of models
The QuerySet has an update() method; the rest is explained in the link above.
I've had some success using this bulk update snippet:
http://djangosnippets.org/snippets/446/
It's a bit outdated, but it worked on django 1.1, so I suppose you can still make it work. If you are looking for a quick way to do a one time bulk insert, this is the quickest (I'm not sure I'd trust it for regular use without seriously testing performance).
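A large part of the cost in a loop like the one in the question is typically one lookup plus one write (and often one commit) per row. A common remedy is to load the existing keys once, split the rows into updates and inserts, and write each batch inside a single transaction. The sketch below shows that idea with the standard library's sqlite3 rather than the Django ORM; the retailer table and its three columns are a simplified, made-up stand-in for the model, and sync_customers is a hypothetical name:

```python
import sqlite3

def sync_customers(conn, rows):
    """Update known customers and insert new ones in a single transaction."""
    # One query to learn which keys already exist.
    existing = {r[0] for r in conn.execute("SELECT shared_id FROM retailer")}
    updates = [(r["Name 1"], r["City"], r["Customer"])
               for r in rows if r["Customer"] in existing]
    inserts = [(r["Customer"], r["Name 1"], r["City"])
               for r in rows if r["Customer"] not in existing]
    with conn:  # commits once at the end, or rolls back on error
        conn.executemany(
            "UPDATE retailer SET name = ?, city = ? WHERE shared_id = ?", updates)
        conn.executemany(
            "INSERT INTO retailer (shared_id, name, city) VALUES (?, ?, ?)", inserts)
    return len(updates), len(inserts)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retailer (shared_id TEXT PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO retailer VALUES ('C1', 'Old Name', 'Leeds')")
conn.execute("INSERT INTO retailer VALUES ('LEGACY', 'Not In Import', 'York')")
conn.commit()

data = [
    {"Customer": "C1", "Name 1": "New Name", "City": "Leeds"},
    {"Customer": "C2", "Name 1": "Second", "City": "Bath"},
]
updated, inserted = sync_customers(conn, data)
```

Records that are in the table but missing from the import, like LEGACY here, are left untouched.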
I've made a terribly crude attempt at a solution for this problem, but it's not finished yet and it doesn't support working with Django ORM objects directly, at least not yet.
http://pypi.python.org/pypi/dse/0.1.0
It hasn't been properly tested, so let me know if you have any suggestions on how to improve it. Using the Django ORM to do stuff like this is terrible.
Thomas