Best way to cache statistics that don't change often - django

I am writing a django app and as part of the job I need to crunch some data and generate statistics and serve them on a RESTful API. The statistics don't change that often, however when the statistics do change, the next request needs to serve the most up to date request. What I am currently doing is using a caching mechanism like django-redis to store the statistics and when a request is made, the view calls the cache client and serve its content. What I would prefer is a caching mechanism that prevents my view from ever being called and also provides up to date content. Is there such away (django plugin) that would allow me to do this?

One way to accomplish this would be to use a 'reverse proxy cache' such as Nginx or Varnish.
Basically when a request would be made to your Django application, it would first pass through your proxy cache. The proxy cache would check to see if the request is available in the cache, and if so, it would serve the response from cache. If the request isn't in the cache, it would hand the request off to django to process te request and issue a response. The response would then pass back through the proxy cache and set the contents of the response into cache so that subsequent requests use the response from the cache.
Invalidating items in the cache per a write through policy as updating an item in the database could accomplished by issuing a cache purge command that is specific to the reverse proxy cache server that you have installed.

Related

How does web (axios) cache works?

When server caches a page, I thought it worked like this, but apparently I'm wrong.
server stores http response contents in a cache-store with a key (that's generated for the endpoint)
one could create separate cache key for certain aspects of requests (such as user language)
when there's a request for the same key, and it's stored in cache-store (such as redis), webserver returns the cached response instead of recreating the response
I think the above understanding is flawed because,,
when I hit the endpoint with axios, first request is hit on the server, but the second request doesn't even hit the endpoint, it seems axios is reusing the previous response it received.
So, if I clear all cache store between two requests, the second request doesn't get to the request handler (view function in django)
for http request (browser reload) does work as I described in the above, it hits the web server and gets the cached response.
In django, I'm using from django.views.decorators.cache import cache_page
But the question is more of general working of web response cache
General web caching is explained very well here * https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching
I guess certain client never reaches out to server when cache-control timeout or expires is set, and some client does reach out to server..

AWS nginx as a service?

I'm looking for a service that allows me to proxy/modify incoming requests inside AWS.
Currently I am using cloudfront, but that has limited functions.
I need to be able to see user agent strings and make proxy decisions based on that - like reverse proxying to another domain, or routing all requests to /index.html.
Anyone know of a service that within AWS - or outside of AWS.
It sounds like you are describing Lambda#Edge, which is a CloudFront enhancement that allows you to define Lambda functions that will fire at any of 4 hook points in the CloudFront signal flow, and modify the request or generate a dynamic response.
Viewer Request triggers allow inspection/modification of requests and dynamic generation of small responses before the cache lookup.
Origin Request triggers are similar, but fire after the cache is checked. They allow you to inspect and modify the request, including changing the origin server, path, and/or query string, or to generate a response instead of allowing CloudFront to proceed with the connection to the origin.
If the request goes to the origin, then once it returns, an Origin Response trigger can fire to modify the response headers or replace the response body with a different body you generate. The response after this trigger is finished with it is what gets stored in the cache, if cacheable.
Once a reaponse is cached, further firing of the Origin Request and Origin Response triggers doesn't occur for subsequent requests that can be served from the cache.
Finally, when the response is ready, whether it came from the cache or the origin, a Viewer Response trigger can modify it further, if desired.
Response triggers can also inspect many of the headers from the original request.
Lambda#Edge functions are written in Node.js, and are presented with the request or responses as simple structured objects that you inspect and/or modify.

Redirecting API requests in Django Rest Framework

I have a two-layer backend architecture:
a "front" server, which serves web clients. This server's codebase is shared with a 3rd party developer
a "back" server, which holds top-secret-proprietary-kick-ass-algorithms, and has a single endpoint to do its calculation
When a client sends a request to a specific endpoint in the "front" server, the server should pass the request to the "back" server. The back server then crunches some numbers, and returns the result.
One way of achieving it is to use the requests library. A simpler way would be to have the "front" server simply redirect the request to the "back" server. I'm using DRF throughout both servers.
Is redirecting an ajax request possible using DRF?
You don't even need the DRF to add a redirection to urlconf. All you need to redirect is a simple rule:
urlconf = [
url("^secret-computation/$",
RedirectView.as_view(url=settings.BACKEND_SECRET_COMPUTATION_URL))),
url("^", include(your_drf_router.urls)),
]
Of course, you may extend this to a proper DRF view, register it with the DRF's router (instead of directly adding url to urlconf), etc etc - but there isn't much sense in doing so to just return a redirect response.
However, the code above would only work for GET requests. You may subclass HttpResponseRedirect to return HTTP 307 (replacing RedirectView with your own simple view class or function), and depending on your clients, things may or may not work. If your clients are web browsers and those may include IE9 (or worse) then 307 won't help.
So, unless your clients are known to be all well-behaving (and on non-hostile networks without any weird way-too-smart proxies - you'll never believe what kinds of insanity those may do to HTTP requests), I'd suggest to actually proxy the request.
Proxying can be done either in Django - write a GenericViewSet subclass that uses requests library - or by using something in front of it, e.g. nginx or Caddy (or any other HTTP server/load balancer that you know best).
For production purposes, as you probably have a fronting webserver, I suggest to use that. This would save implementation time and also a little bit of server resources, as your "front" Django project won't even have to handle the request and keep the worker busy as it waits for the response.
For development purposes, your options may vary. If you use bare runserver then a proxy view may be your best option. If you use e.g. Docker, you may just throw in an HTTP server container in front of your Django container.
For example, I currently have a two-project setup (legacy Django 1.6 project and newer Django 1.11 project, sharing the same database) and a Caddy server in front of those, routing on per-URL basis. With a simple 9-line Caddyfile things just work:
:80
tls off
log / stdout "{common}"
proxy /foo project1:8000 {
transparent
}
proxy / project2:8000 {
transparent
}
(This is a development-mode config.) If you can have something similar, then, I guess, that would be the simplest option.

Django Upstream Caching (Vary On Headers) Not working

I have a view which displays user specific, meaning the content of the response for the same URL is unique per individual authenticated user.
Ideally, these pages would be cached in the browser. However, that does not appear to be the case in Chrome or Firefox (on production or locally).
The development server is processing the view each time, despite the fact that I've set the #vary_on_cookies decorator.
I have the right middleware in place (in the right order):
django.middleware.cache.UpdateCacheMiddleware
django.middleware.cache.FetchFromCacheMiddleware
Do I need to set CACHE_MIDDLEWARE_ANONYMOUS_ONLY = False?
One thing that I've noticed is that the request is sending this cache control header:
Cache-Control:max-age=0
I assume that that might be the root problem. Or is this related to the development server?
Any suggestions?

Website Forms (POST) On Multiple Instances (Servers) Website (Python Django / PHP)

Suppose I have a PHP / Python (Django) website.
The website is running on multiple instances servers.
Meaning the URL for the website is www.test.com, and from a load balancer, it can get the client to www.server1.com or www.server2.com and so on.
When there is a form on the website, and the processing of this form is located on the same page:
Can the following situation exist ? :
- User go to www.test.com - behind the scenes, through the load balancer, he gets to www.server*1*.com. He fills a form.
- The form action (URL) is for www.test.com - so behind the scenes, through the load balancer, he gets to www.server*2*.com.
So here, will the needed form data, and more important for my question maybe - the 'request' data, (like request.SOMETHING at Python Django) will be missing ? Because maybe it was saved before on the session, at www.server*1*.com, and now it is missing at www.server*2*.com ?
The request will always have all data, as that gets forwarded to the edge server. request.POST and request.GET will have all the data from the request. The problem however, is that the session data might not be available at that edge server. Example, you started your session on server1, then request another page from server2. server2 might assign a new session and forbid you to access certain contents.
To overcome this session problem, you can do one of two things:
Share sessions between servers (central session storage)
Always forward the user to the same edge server. Some loadbalancers store the forwared-to edge server in a cookie. On subsequent requests, the user gets forwarded to the same edge node every time. That same edge node will keep the session of that user, so no problems.
Yes, this is a valid concern. Due to the nature of the Web (HTTP), the other request might end up on the other server. This issue is called persistence or stickiness.
The solution here would be to save all this information on the client side (using cookies) and not rely on server-side sessions. So it would be up to you to implement it like this using Python/Django. Using the client-side approach gives the best performance, and should be the easiest to implement.
Keep in mind that this solution bears quite a significant security risk for man-in-the-middle attacks, unless you encrypt the connection with SSL/TSL (using HTTPS), as all of the client data is stored in the cookies which could be intercepted.