I have the following model on my postgresql database:
class UrlXML(models.Model):
    uuid = models.UUIDField(default=uuid.uuid4, editable=False, db_index=True)
    url = models.TextField()
    is_active = models.BooleanField(default=True, db_index=True)
    run_last_time = models.DateTimeField(blank=True, null=True)
    run_frequency = models.IntegerField(default=24)
Every hour I need to fetch from the database the URLs that are due for download: those for which, at the current time, at least run_frequency hours have passed since run_last_time.
I managed to write the raw query, but I can't work out how to express it as a Django QuerySet.
Here is the query:
select (run_last_time + INTERVAL '1 hours' * run_frequency), run_frequency, NOW(), run_last_time from urlxml where is_active=True and (run_last_time + INTERVAL '1 hours' * run_frequency) <= NOW();
Example:
Current time is 2017-04-03 11:00:00
I have two URLs in the database:
Url A: Ran last time 2017-04-03 08:00:00 and its frequency is 6 hours
Url B: Ran last time 2017-04-02 11:00:00 and its frequency is 24 hours
When I execute the function at 2017-04-03 11:00:00 (within a margin of plus or minus 30 minutes), it must return only Url B, because the last time it ran was 24 hours ago.
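The rule in this example can be sanity-checked in plain Python before translating it to a query. The sketch below is only an illustration: `is_due` is a hypothetical helper, not part of the model, and it assumes the frequency is a whole number of hours, as in run_frequency.

```python
from datetime import datetime, timedelta

def is_due(run_last_time, run_frequency_hours, now):
    """Return True when at least run_frequency_hours have passed since the last run."""
    return run_last_time + timedelta(hours=run_frequency_hours) <= now

now = datetime(2017, 4, 3, 11, 0, 0)
url_a_due = is_due(datetime(2017, 4, 3, 8, 0, 0), 6, now)    # next run would be 14:00
url_b_due = is_due(datetime(2017, 4, 2, 11, 0, 0), 24, now)  # next run would be 11:00
print(url_a_due, url_b_due)
```

Only Url B comes out as due, which is the behaviour the query needs to reproduce.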
I managed to find a solution using the extra in the Queryset.
Here it is:
UrlXML.objects.filter(is_active=True).extra(
    where=["run_last_time + INTERVAL '1 hours' * run_frequency <= NOW()"]
)
I don't know if this is the best way to do this, but it is the only one I managed to find.
If there are better ways to do it, I'm open to suggestions.
If you were to change your model slightly, you could use Query Expressions.
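# models.py
from datetime import timedelta

class UrlXML(models.Model):
    ...
    run_frequency = models.DurationField(default=timedelta(hours=24))

# query
from django.db.models import DateTimeField, ExpressionWrapper, F
from django.utils import timezone

UrlXML.objects \
    .annotate(expires=ExpressionWrapper(
        F('run_last_time') + F('run_frequency'),
        output_field=DateTimeField())) \
    .filter(expires__lte=timezone.now())  # timezone.now() respects USE_TZ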
This solution is also a bit more robust, as the frequency can be any datetime.timedelta rather than whole hours only.
Context
There is a dataframe of customer invoices and their due dates (identified by customer code).
Week(s) need to be added depending on the customer code.
A model is created to persist the list of customers and the week(s) to be added.
What is done so far:
Models.py
class BpShift(models.Model):
    bp_name = models.CharField(max_length=50, default='')
    bp_code = models.CharField(max_length=15, primary_key=True, default='')
    weeks = models.IntegerField(default=0)
helper.py
from .models import BpShift
# used in views later
def week_shift(self, df):
    df['DueDateRange'] = df['DueDate'] + datetime.timedelta(
        weeks=BpShift.objects.get(pk=df['BpCode']).weeks)
I realised my understanding of Dataframes is seriously flawed.
df['A'] and df['B'] return Series, so of course timedelta(weeks=BpShift.objects.get(pk=df['BpCode']).weeks) can't work: get() receives a whole Series rather than a single key.
Dataframe
d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d)
Customer List csv
BP Name,BP Code,Week(s)
Customer1,CA0023MY,1
Customer2,CA0064SG,1
Error
BpShift matching query does not exist.
Commentary
I used these methods in the hope that I would be able to change the dataframe at once, instead of using df.iterrows(). I have recently been avoiding for loops like the plague and am wondering if this is the "correct" mentality. Is there any recommended way of doing this? Thanks in advance for any guidance!
The question "Python & Pandas: series to timedelta" will help you get from a Series to a timedelta. And although
pandas.Series(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('weeks', flat=True)
)
will give you a Series of integers, I doubt the order is the same as in df['BpCode'], because it depends on the Django model and the database backend.
So you might be better off explicitly creating not a Series but a DataFrame with pk and weeks columns, so you can use df.join. Something like this
pandas.DataFrame(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('pk', 'weeks'),
    columns=['BpCode', 'weeks'],
)
should give you a DataFrame that you can join with.
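Why the ordering matters can be seen without pandas or a database. In the sketch below, `rows` is made-up data standing in for what values_list('pk', 'weeks') might return in a different order than the DataFrame's codes. Pairing by position assigns the wrong weeks, while looking up by key does not.

```python
codes = ['customer1', 'customer2']              # order of df['BpCode']
rows = [('customer2', 2), ('customer1', 1)]     # hypothetical database order

# Pairing by position silently gives customer2's weeks to customer1.
by_position = dict(zip(codes, (weeks for _, weeks in rows)))

# Matching on the key is independent of row order.
weeks_by_code = dict(rows)
by_key = {code: weeks_by_code.get(code) for code in codes}

print(by_position)
print(by_key)
```

A join on BpCode plays the same role as the dictionary lookup here, but for whole columns at once.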
So combined this should be the gist of your code:
import pandas as pd

django_response = [('customer1', 1), ('customer2', '2')]
d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d).set_index('BpCode').join(
    pd.DataFrame(django_response, columns=['BpCode', 'weeks']).set_index('BpCode')
)
df['DueDate'] = pd.to_datetime(df['DueDate'])
df['weeks'] = pd.to_numeric(df['weeks'])
df['new_duedate'] = df['DueDate'] + df['weeks'] * pd.Timedelta('1W')
print(df)
             DueDate  weeks new_duedate
BpCode
customer1 2020-05-30      1  2020-06-06
customer2 2020-04-30      2  2020-05-14
You were right to want to avoid looping. This approach fetches all the data from your Django model in one SQL query by using filter, then does a left join with the DataFrame you already have. It casts the dates and weeks to the right types and then computes the new due date on whole columns instead of looping over rows.
NB: the left join will give NaN and NaT for customers that don't exist in your Django database. You can either avoid those rows by passing how='inner' to df.join or handle them whatever way you like.
I am making a scheduler in Django and having issues filtering my events for the weekly calendar view. The calendar supports multi-day events, and my current filter doesn't work with this weekly view.
Here is my model:
class Event(models.Model):
    title = models.CharField(max_length=40)
    start = models.DateTimeField()
    end = models.DateTimeField()
    description = models.TextField()
    all_day = models.BooleanField(default=False)
    recuring = models.BooleanField(default=False)
    recuring_end = models.DateTimeField(blank=True, null=True)

    def __str__(self):
        return self.title

    def get_absolute_url(self):
        return '/cab/event/%i/' % self.id
I'm trying to filter the events that occur during a given week. For single-day events I do something like:
events = Event.objects.order_by('start').filter(Q(start__gte=monday) | Q(end__lte=sunday))
This works to retrieve all single day events that occur during the week. It also works for multi-day events that either start or stop during the given week. The issue is retrieving objects that start before and finish after the week, but do span the week.
My idea is to try and filter out any event spanning longer than 9 days (i.e. one that would start on the Sunday of the prior week and finish on the Monday of the next week), since I know these are rare and won't completely destroy performance. I want to do this without specifying a date range, as this is not dynamic.
To try and minimise the performance impact I was trying to use F expressions to evaluate the duration of the event with the start and end of the event. My first idea was to do something like:
my_events = Event.objects.order_by('start').filter(Q(start__gte=monday) | Q(end__lte=sunday) | Q( (F('end_day') - F('start_day')) >= 9 ) )
but I get the error 'bool' object is not iterable.
I also tried:
my_events = Event.objects.order_by('start').filter(Q(start__gte=monday) | Q(end__lte=sunday) | Q( (F('end_day') - F('start_day')) >= datetime.timedelta(days=9) ) )
but I get the error can't compare datetime.timedelta to ExpressionNode.
Does anyone have insight into how to do such a thing?
from datetime import timedelta
from django.db.models import F
Event.objects.filter(end__gt=F('start') + timedelta(days=9))
The documentation has an example.
UPDATE:
Events that span more than 9 days AND (start later than Monday OR end sooner than Sunday), ordered by start:
(Event.objects
.filter(end__gt=F('start') + timedelta(days=9),
Q(start__gte=monday) | Q(end__lte=sunday))
.order_by('start'))
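For the original weekly-view problem, another way to frame it is as an interval-overlap test: an event touches the week when it starts on or before the end of the week and ends on or after its start. The sketch below shows that test in plain Python. `overlaps_week` is a hypothetical helper, and the assumption is that monday and sunday mark the first and last instants of the week. The same condition should translate to a queryset as filter(start__lte=sunday, end__gte=monday), with no need for a duration threshold.

```python
from datetime import datetime

def overlaps_week(start, end, monday, sunday):
    """True when [start, end] shares at least one instant with [monday, sunday]."""
    return start <= sunday and end >= monday

monday = datetime(2017, 4, 3, 0, 0, 0)
sunday = datetime(2017, 4, 9, 23, 59, 59)

# A single-day event inside the week.
inside = overlaps_week(datetime(2017, 4, 5, 9), datetime(2017, 4, 5, 10), monday, sunday)
# An event that starts before the week and ends after it.
spanning = overlaps_week(datetime(2017, 3, 27, 9), datetime(2017, 4, 14, 9), monday, sunday)
# An event that is over before the week begins.
before = overlaps_week(datetime(2017, 3, 27, 9), datetime(2017, 3, 28, 9), monday, sunday)
print(inside, spanning, before)
```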
A word of warning about the answer by f43d65 (https://stackoverflow.com/users/4907653/f43d65): the last query, which puts a keyword argument before the Q objects, is invalid.
From the docs:
Lookup functions can mix the use of Q objects and keyword arguments. All arguments provided to a lookup function (be they keyword arguments or Q objects) are “AND”ed together. However, if a Q object is provided, it must precede the definition of any keyword arguments. For example:
Poll.objects.get(
Q(pub_date=date(2005, 5, 2)) | Q(pub_date=date(2005, 5, 6)),
question__startswith='Who',)
… would be a valid query, equivalent to the previous example; but:
# INVALID QUERY
Poll.objects.get(
question__startswith='Who',
Q(pub_date=date(2005, 5, 2)) | Q(pub_date=date(2005, 5, 6))
)
… would not be valid.
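The ordering rule is in fact enforced by Python itself, before Django ever sees the call: a positional argument (such as a Q object) may not follow a keyword argument. The sketch below checks this with the built-in compile(). The names f, q1 and q2 are placeholders, not Django API; the source is only parsed, never executed.

```python
def compiles(source):
    """Return True if source is syntactically valid Python, without running it."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# Positional argument first, keyword argument second: accepted.
valid = compiles("f(q1 | q2, question__startswith='Who')")
# Keyword argument first, positional argument second: rejected by the parser.
invalid = compiles("f(question__startswith='Who', q1 | q2)")
print(valid, invalid)
```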
I'm wondering if it is possible to use Django on a database of transactions to get all transactions that happened on Mondays between 10 and 11.
For completeness here is the model definition:
class P1data(models.Model):
    date_time = models.DateTimeField(auto_now_add=True, db_index=True)
    price = models.DecimalField(max_digits=40, decimal_places=12)
    volume = models.DecimalField(max_digits=40, decimal_places=12)
Use the week_day and hour lookups:
P1data.objects.filter(date_time__week_day=2, date_time__hour__range=(10, 11))
UPDATE: If the hour lookup doesn't support range, try combining gte and lte:
P1data.objects.filter(date_time__week_day=2,
                      date_time__hour__gte=10,
                      date_time__hour__lte=11)
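Two details of these lookups are easy to get wrong, so here is a plain-Python sketch of what the filter is meant to match. Django's week_day lookup numbers days from 1 (Sunday) to 7 (Saturday), which differs from Python's own conventions, and a range of (10, 11) on the hour is inclusive, so it also matches 11:00 to 11:59. `django_week_day` and `matches` are hypothetical helpers for illustration only.

```python
from datetime import datetime

def django_week_day(dt):
    """Map a datetime to Django's week_day numbering: 1 = Sunday ... 7 = Saturday."""
    return dt.isoweekday() % 7 + 1

def matches(dt):
    """Mimic week_day=2 (Monday) combined with an inclusive hour range of 10..11."""
    return django_week_day(dt) == 2 and 10 <= dt.hour <= 11

monday_1030 = datetime(2017, 4, 3, 10, 30)   # a Monday
monday_1145 = datetime(2017, 4, 3, 11, 45)   # still matched: the range is inclusive
tuesday_1030 = datetime(2017, 4, 4, 10, 30)
print(matches(monday_1030), matches(monday_1145), matches(tuesday_1030))
```

If only 10:00 to 10:59 is wanted, filtering on date_time__hour=10 alone would be the narrower choice.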
This is driving me crazy. I've used all the lookup_types and none seem to work.
I need to select an object that was created two weeks ago from today.
Here's what I've got:
twoweeksago = datetime.datetime.now() - datetime.timedelta(days=14)
pastblast = Model.objects.filter(user=user, created=twoweeksago, done=False)
The model has a created field that does this: created = models.DateTimeField(auto_now_add=True, editable=False)
But my query isn't returning everything. Before you ask, yes, there are records in the db with the right date.
Can someone make a suggestion as to what I'm doing wrong?
Thanks
DateTimeField is very different from DateField. If you do
twoweeksago = datetime.datetime.now() - datetime.timedelta(days=14)
that returns the current date and time (hour, minute, second, and so on) minus 14 days, so the result still carries a time component. So the query:
pastblast = Model.objects.filter(user=user, created=twoweeksago, done=False)
will only match an instance created at exactly that moment. If you only care about the day, and not the hours, minutes and seconds, you can do something like
pastblast = Model.objects.filter(user=user, created__year=twoweeksago.year, created__month=twoweeksago.month, created__day=twoweeksago.day, done=False)
Check the django docs:
https://docs.djangoproject.com/en/1.4/ref/models/querysets/#year
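An alternative to matching year, month and day separately is to filter on a half-open range covering the whole day. The sketch below only computes the bounds in plain Python. `day_bounds` is a hypothetical helper, and it assumes naive datetimes as in the question. The two values could then be used with created__gte and created__lt.

```python
from datetime import datetime, time, timedelta

def day_bounds(moment):
    """Return (start, end): midnight of moment's day and the following midnight."""
    start = datetime.combine(moment.date(), time.min)
    return start, start + timedelta(days=1)

now = datetime(2012, 6, 15, 13, 45, 12)   # fixed "now" so the example is reproducible
twoweeksago = now - timedelta(days=14)
start, end = day_bounds(twoweeksago)
print(start, end)
```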
I have a Django model, shown below, that I use to keep track of which ip addresses visit my site and when.
class Visit(models.Model):
created = models.DateTimeField(default=datetime.utcnow)
ip = models.IPAddressField(editable=False)
I'd like to write a method on this model that returns the number of days it took for the last 100 visits from a particular IP. Multiple visits in a single day (hour etc) from an IP all count as separate visits. So, if someone visited the site 100 times in the past 2 days, it would return 2; 100 times in the past 8 days, it would return 8; and so on.
You probably want to change the default= for created to auto_now_add=True. The callable datetime.utcnow is evaluated on every save, but if it is ever written as datetime.utcnow() the value is computed once at import time and won't update on long-running servers:
class Visit(models.Model):
    created = models.DateTimeField(auto_now_add=True, editable=False)
    ip = models.IPAddressField(editable=False)
from django.utils import timezone

def days_for_100(ip_addr):
    now = timezone.now()  # matches the values auto_now_add stores
    visits = Visit.objects.filter(ip=ip_addr).order_by('-created')
    if visits.count() < 100:
        # fewer than 100 visits: fall back to the oldest one
        oldest = visits.last()
    else:
        # the 100th most recent visit
        oldest = visits[99]
    return (now - oldest.created).days  # timedelta.days
This returns how many days ago the 100th most recent visit happened (or how long ago the first visit occurred if there are fewer than 100 visits).
An easy approach is to get the last 100 Visit objects for an IP address and count the number of unique creation dates among them.
def num_of_dates(ip_addr):
    visits = Visit.objects.filter(ip=ip_addr).order_by('-created')[0:100]
    # compare calendar dates, not full timestamps, so same-day visits collapse
    unique_dates = {v.created.date() for v in visits}
    return len(unique_dates)
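The two answers read the requirement differently, which a small plain-Python comparison makes visible. The data below is made up: 100 visits spread over three calendar days, the oldest of them five days before a fixed `now`. `days_since_nth` mirrors the first approach and `distinct_days` the second. Both are hypothetical helpers operating on a list of datetimes rather than on a queryset.

```python
from datetime import datetime, timedelta

def days_since_nth(created, now, n=100):
    """Days between now and the n-th most recent visit (or the oldest if fewer than n)."""
    recent = sorted(created, reverse=True)[:n]
    return (now - recent[-1]).days

def distinct_days(created, n=100):
    """Number of distinct calendar days among the n most recent visits."""
    recent = sorted(created, reverse=True)[:n]
    return len({moment.date() for moment in recent})

now = datetime(2017, 4, 10, 12, 0)
# Made-up visits: 40 five days ago, 30 two days ago, 30 today.
created = (
    [now - timedelta(days=5, minutes=i) for i in range(40)]
    + [now - timedelta(days=2, minutes=i) for i in range(30)]
    + [now - timedelta(minutes=i) for i in range(30)]
)
print(days_since_nth(created, now), distinct_days(created))
```

For this data the first reading gives 5 (elapsed days) and the second gives 3 (days with at least one visit), so it is worth deciding which of the two the method should report.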