How can I schedule the invalidation of a Redis cache? - Django

I am using django as a framework to build a content management system for a site with a blog.
Each blog post will have a route that contains a unique identifier for the blog post. These blog posts can be scheduled and have an expiry date. This means that the routes have to be dynamic.
The entire site needs to be cached, and we have Redis set up as a backend cache. We currently cache rendered pages against our static routes, but need to find a way of caching pages against the dynamic routes (and invalidating them when the blog posts expire).
I could use a cron job but it isn't appropriate because...
a) New blog posts go live rarely and not periodically
b) Users can schedule posts to the minute. This means that a cron job would have to run every minute which seems like overkill!
I've just found the django-cacheops library, which seems to do exactly what I need (schedule the invalidation of our cache entries and also invalidate them via signals). Is this compatible with our existing setup, and how easy is it to set up?
I assume this is a pretty common problem - does anyone have any better ideas than the above?

I can't comment on django-cacheops because I've never used it, but Redis provides a really easy way to do this using the EXPIRE command:
Set a timeout on key. After the timeout has expired, the key will automatically be deleted.
Usage:
SET some_key "some_value"
EXPIRE some_key 10
The key some_key will now be deleted automatically by Redis after 10 seconds. Since you know from the outset when each blog post's cache entry should be deleted, this should serve your needs perfectly.
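From Python, the same idea could look like this (a minimal sketch assuming the redis-py client; the key layout and the expires_at parameter are illustrative):

import redis
from datetime import datetime, timezone

r = redis.Redis(host='localhost', port=6379)

def cache_rendered_post(post_key, rendered_html, expires_at):
    # TTL is the number of seconds from now until the post expires.
    ttl = int((expires_at - datetime.now(timezone.utc)).total_seconds())
    if ttl > 0:
        # SET with ex= stores the value and the timeout atomically
        # (equivalent to SET followed by EXPIRE).
        r.set(post_key, rendered_html, ex=ttl)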

Cacheops invalidates the cache when a post is changed; that's its primary use. But you can also expire entries by timeout:
from cacheops import cached_as, cached_view_as

# A queryset
post = Post.objects.cache(timeout=your_timeout).get(pk=post_pk)

# A function
@cached_as(Post.objects.filter(pk=post_pk), timeout=your_timeout)
def get_post_data(...):
    ...

# A view
@cached_view_as(Post, timeout=your_timeout)
def post(request, ...):
    ...
However, there is currently no way to make the timeout depend on the cached object.
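If you do need a per-object timeout, one workaround (a sketch using Django's low-level cache API rather than cacheops; the expires_at field name is an assumption) is to compute the timeout from the post's expiry date yourself:

from datetime import datetime, timezone
from django.core.cache import cache

def cache_rendered_post(post, rendered_html):
    # Seconds until this particular post expires (expires_at is illustrative).
    timeout = (post.expires_at - datetime.now(timezone.utc)).total_seconds()
    if timeout > 0:
        cache.set('post:%s:html' % post.pk, rendered_html, timeout)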

Related

How to handle network errors when saving to Django Models

I have a Django .save() call that executes in a loop n times.
My concern is how to guard against network errors during saving, as some entries could be saved while others aren't, with no way of telling which.
What is the best way to make sure that the execution is completed?
Here's a sample of my code
# SAVE DEBIT ENTRIES
for i in range(len(debit_journals)):
    # UPDATE JOURNAL RECORD
    debit_journals[i].approval_no = journal_transaction_id
    debit_journals[i].approval_status = 'approved'
    debit_journals[i].save()
Either use bulk_create / bulk_update to execute a single DB query, or use transaction.atomic as a decorator for your function so that any error during a save rolls the database back to the state it was in before the function ran.
Try something like the following (assuming your model is named DebitJournal and debit_journals is a list):
for debit_journal in debit_journals:
    debit_journal.approval_no = journal_transaction_id
    debit_journal.approval_status = 'approved'

DebitJournal.objects.bulk_update(debit_journals, ["approval_no", "approval_status"])
If debit_journals is a QuerySet you can also try
debit_journals.update(approval_no=journal_transaction_id, approval_status='approved').
It depends on what you call a network error: one between the user and the Django application, or one between the Django application and the database. If it's only between the user and the app, note that once the request has been received correctly, the objects will be created even if the user loses the connection afterwards. The user might not see the response, but the objects will still be created.
If the error is between the database and the Django application, some objects might still be created before the error occurs.
Usually, if you want all-or-nothing behaviour, you should use manual transactions as described here: https://docs.djangoproject.com/en/4.1/topics/db/transactions/
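Applied to the loop from the question, that might look like this (a minimal sketch; either the whole batch commits or none of it does):

from django.db import transaction

with transaction.atomic():
    for debit_journal in debit_journals:
        debit_journal.approval_no = journal_transaction_id
        debit_journal.approval_status = 'approved'
        debit_journal.save()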
Note that if the creation takes a really long time you might hit the request timeout. If it takes more than a few seconds you should consider making it a background task; the request is only there to create the task.
See Python task queue alternatives and frameworks for 3rd party solutions.

How to invalidate AWS APIGateway cache

We have a service which inserts certain values into DynamoDB. For the sake of this question, let's say it's a key:value pair, i.e. customer_id:customer_email. The inserts don't happen frequently, and once an insert is done, that specific key doesn't get updated.
What we have done is create a client library which, provided with a customer_id, will fetch the customer_email from DynamoDB.
Given that the customer_id data is static, we were thinking of adding a cache in front of the table, but we are not sure what will happen in the following use case:
client_1 uses our library to fetch customer_email for customer_id = 2.
The customer doesn't exist so API Gateway returns not found
APIGateway will cache this response
For any subsequent calls, this cached response will be sent
Now another system inserts customer_id = 2 with its email. This system doesn't know whether the response has been cached previously, or even that any other system has fetched this data. How can we invalidate the cache for this specific customer_id when it gets inserted into DynamoDB?
You can send a request to the API endpoint with a Cache-Control: max-age=0 header, which will cause API Gateway to refresh that cache entry.
This could open your application up to attack, as a bad actor could simply flood an expensive endpoint with lots of traffic and buckle your servers/database. To safeguard against that, it's best to require a signed request.
In case it's useful to people, here's .NET code to create the signed request:
https://gist.github.com/secretorange/905b4811300d7c96c71fa9c6d115ee24
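A rough Python equivalent (a sketch assuming the requests and requests-aws4auth packages; the credentials and endpoint URL are placeholders):

import requests
from requests_aws4auth import AWS4Auth

# SigV4-signed request so API Gateway can authorize the cache invalidation.
auth = AWS4Auth('ACCESS_KEY_ID', 'SECRET_ACCESS_KEY', 'us-east-1', 'execute-api')

response = requests.get(
    'https://your-api-id.execute-api.us-east-1.amazonaws.com/prod/customers/2',
    headers={'Cache-Control': 'max-age=0'},  # asks API Gateway to refresh this entry
    auth=auth,
)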
We've built a Lambda which takes care of refilling the cache with updated results. It's quite a manual process, with very little reusable code, but it works.
The Lambda is triggered by the application itself as its needs dictate. For example, in CRUD operations the Lambda is triggered upon successful execution of POST, PATCH and DELETE on a specific resource, in order to clear the cached general GET request (i.e. clear GET /books whenever POST /book succeeds).
Unfortunately, if you have a view with a server-side paginated table you are going to face all sorts of issues, because invalidating /books is not enough: you may actually have /books?page=2, /books?page=3 and so on cached as separate entries... a nightmare!
I believe API Gateway should allow more granular control of cache entries, otherwise many use cases aren't covered. It would be enough if it allowed choosing a root cache group for each request, so that we could manage cache entries by group rather than by single request (which, imho, is also less common).
Did you look at this: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html ?
There is a way to invalidate the entire cache or a particular cache entry.
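For the whole-stage case, the flush can also be done programmatically (a sketch using boto3; the API ID and stage name are placeholders):

import boto3

client = boto3.client('apigateway')

# Flush every entry in the stage's cache at once.
client.flush_stage_cache(restApiId='your-api-id', stageName='prod')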

Does Django have a method of storage like HTML5's localStorage or sessionStorage?

Does Django have a method of storage like HTML5's localStorage or sessionStorage?
I want to use Django/Django REST Framework as the backend of my project, but does Django have a convenient storage method to serve my project? HTML5 has localStorage and sessionStorage, which are very useful.
EDIT
I want a simple way to store temporary data for cases where the data needs to be shared.
For example, I have 3 providers (a_provider, b_provider, c_provider) that can each process some origin_data.
In a function:
def process_data():
    a_provider(get_data())  # process a
    b_provider(get_data())  # process b
    c_provider(get_data())  # process c
Here get_data() fetches the shared data, rather than each provider returning its processed data as a parameter to be passed into the next provider.
There are some 'Offline Solutions' you can check out here
However, if you are trying to run completely in local storage, Django probably isn't the right choice. Some new development on this particular topic is being explored by the awesome team at BeeWare.
Hope this helps.
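That said, for the temporary data-sharing pattern in the question, Django's low-level cache API may be all that's needed (a minimal sketch; the key name and timeout are illustrative, and the providers are the ones from the question):

from django.core.cache import cache

def set_data(data):
    cache.set('origin_data', data, 60 * 15)  # keep the shared data for 15 minutes

def get_data():
    return cache.get('origin_data')

For per-user temporary data, request.session is the closest server-side analogue to sessionStorage.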

How do I update my DB data (MongoDB) after a fixed interval of time using Django?

My User collection contains documents such as:
{
    "user1": "zera",
    "my_status": "active",
    "date_creation": ISODate("2013-10-01T10:15:52.055Z")
}
{
    "user2": "dfgf",
    "my_status": "noactive",
    "date_creation": ISODate("2013-10-01T08:55:41.212Z")
}
I need to find each user with my_status: "active" and update their my_status 24 hours after that user's date_creation.
Can anyone suggest a method to do it using django?
Well, I'd write an async task to keep polling the database for users with active status. If a user's date_creation is more than 24 hours old, update their status.
For the asynchronous tasks you can use python-rq; to make things easier there's a Django module for it, django-rq. Celery is another popular and good option, and it also has a Django module, which you can find here.
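As an illustration, the polling task could look like this (a sketch assuming Celery and a MongoEngine User document with the fields shown above; both are assumptions, since the question doesn't say which ODM is in use):

from datetime import datetime, timedelta
from celery import shared_task
from myapp.models import User  # assumed MongoEngine Document

@shared_task
def deactivate_expired_users():
    cutoff = datetime.utcnow() - timedelta(hours=24)
    # Flip users created more than 24 hours ago from "active" to "noactive".
    User.objects(my_status='active', date_creation__lte=cutoff).update(set__my_status='noactive')

Scheduled every minute or so via Celery beat, this approximates per-user 24-hour expiry without a system-level cron job.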

Updating a hit counter when an image is accessed in Django

I am working on doing some simple analytics on a Django website (v1.4.1). Seeing as this data will be gathered on pretty much every server request, I figured the right way to do this would be with a piece of custom middleware.
One important metric for the site is how often given images are accessed. Since each image is its own object, I thought about using django-hitcount, but figured that was unnecessary for what I was trying to do. If it proves easier, I may use it though.
The current conundrum I face is that I don't want to query the database and look for a given object for every HttpRequest that occurs. Instead, I would like to wait until a successful response (indicated by an HttpResponse.status_code of 200 or whatever), and then query the server and update a hit field for the corresponding image. The problem is that the only place I can access the path of the image is in process_request, while the only place I can access the status code is in process_response.
So, what do I do? Is it as simple as holding on to the path and then looking up the file once a 200 response code comes back, or should I just use django-hitcount?
Thanks for your help
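For what it's worth, the pairing described in the question could look something like this (a minimal sketch assuming Django 1.4-style middleware and a hypothetical Image model; stashing the path on the request object avoids a class variable, which would be shared across requests):

from django.db.models import F
from myapp.models import Image  # hypothetical model with path and hits fields

class ImageHitMiddleware(object):
    def process_request(self, request):
        # Remember the path so process_response can use it.
        request._hit_path = request.path

    def process_response(self, request, response):
        if response.status_code == 200 and getattr(request, '_hit_path', None):
            # F() makes the increment atomic at the database level;
            # non-image paths simply match no rows.
            Image.objects.filter(path=request._hit_path).update(hits=F('hits') + 1)
        return response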
Set up a cron task to parse your Apache/Nginx/whatever access logs on a regular basis, perhaps with something like pylogsparser.
You could use memcache to store the counters and then periodically persist them to the database. There is a risk that memcache will evict a value before it has been persisted, but that may be acceptable to you.
This article provides more information and highlights a risk that arises when using hosted memcache with keys distributed over multiple servers: http://bjk5.com/post/36567537399/dangers-of-using-memcache-counters-for-a-b-tests
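A minimal sketch of the counter side, using Django's cache API (the key layout is illustrative; a periodic job would then flush the counters to the image's hit field):

from django.core.cache import cache

def record_hit(image_id):
    key = 'image_hits_%s' % image_id
    # add() is a no-op if the key already exists, so this seeds the counter once.
    cache.add(key, 0)
    # incr() is atomic on memcached and Redis backends.
    cache.incr(key)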