As a little backstory, I'm working on an application that pipes KML to Google Earth based on packet data from a mesh network. Example:
UDP packet ---> Django ORM to place organized data in the DB ---> Django view to read the DB and return a KML representation of the packet data (GPS, connections, etc.) to Google Earth.
The problem is that the DB rows tell a story, and a query, or a series of queries, isn't enough to "paint a picture" of this mesh network. I need to retain some internal Python structures and classes to maintain a "state" of the network between requests/responses.
Here is where I need help. Currently, to retain this "state", I use Django's low-level cache API to store a class with an unlimited timeout. On every request, I retrieve that class from the cache, add to its structures, and save it back to the cache. This seems to be working, and pretty well actually, but it doesn't feel right.
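For reference, the cache round-trip currently looks roughly like this (simplified; NetworkState and the cache key are placeholders for my real structures):

    from django.core.cache import cache

    class NetworkState(object):
        """Placeholder for the structures I keep between requests."""
        def __init__(self):
            self.nodes = {}   # node_id -> last known GPS fix
            self.links = []   # (node_id, node_id) connections

    def update_state(node_id, gps):
        # timeout=None tells Django's cache "never expire this entry"
        state = cache.get('mesh_state') or NetworkState()
        state.nodes[node_id] = gps
        cache.set('mesh_state', state, timeout=None)
        return state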
Maybe I should ditch Django and extend Python's BaseHTTP class to handle my requests/responses?
Maybe I should create a separate application to retain the "state" and have Django pipe request data to it through a socket?
I just feel like I'm misusing Django and being unsafe with crucial data. Any help?
I know this is unconventional and a little crazy.
(Note: I'm currently using Django's ORM outside of a Django instance for the UDP socket listener, so I am aware I can use Django's environment outside of an instance.)
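For what it's worth, the standalone listener bootstraps Django's environment roughly like this (a sketch; django.setup() assumes Django 1.7+, and the settings module, app, and model names are placeholders):

    import os
    import socket

    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'meshproject.settings')

    import django
    django.setup()  # makes the ORM usable outside a normal Django instance

    from packets.models import Packet  # placeholder app/model

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('0.0.0.0', 5000))
    while True:
        data, addr = sock.recvfrom(4096)
        Packet.objects.create(raw=data, source_ip=addr[0])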
Maybe I should ditch Django and extend Python's BaseHTTP class to handle my requests/responses?
Ditching Django for Python's BaseHTTP won't change the fact that HTTP is a stateless protocol and you want to add state to it. You are correct that storing state in the cache is somewhat volatile depending on the cache backend. It's possible you could switch this to the session instead of the cache.
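A sketch of the session variant (the key name and structure are placeholders; note that session data is keyed per client, which only fits if a single Google Earth client consumes the feed):

    from django.http import HttpResponse

    def kml_view(request):
        # request.session persists between requests for this client, backed by
        # the DB, a cache, or signed cookies depending on SESSION_ENGINE
        state = request.session.get('mesh_state', {'nodes': {}, 'links': []})
        state['nodes'][request.GET.get('node_id', 'unknown')] = request.GET.get('gps')
        request.session['mesh_state'] = state  # reassign so the change is saved
        return HttpResponse('<kml/>',  # placeholder KML payload
                            content_type='application/vnd.google-earth.kml+xml')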
Maybe I should create a separate application to retain the "state" and have Django pipe request data to it through a socket?
Yes, this seems like a viable option. Again, HTTP is stateless, so if you want state you need to persist it somewhere, and the DB is another place you could store it.
This really sounds like the kind of storage problem Redis and MongoDB are made to efficiently handle. You should be able to find a suitable data structure to keep track of your packet data and matching support for creating cheap, atomic updates to boot.
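For instance, with redis-py you could keep one hash per network and update it a node at a time; HSET is atomic, so concurrent writers won't clobber each other (a sketch; key and field names are invented):

    import json
    import redis

    r = redis.StrictRedis(host='localhost', port=6379, db=0, decode_responses=True)

    def record_packet(node_id, gps, neighbours):
        # one hash for the whole network, one field per node
        r.hset('mesh:nodes', node_id, json.dumps({'gps': gps, 'neighbours': neighbours}))

    def network_snapshot():
        # read the whole network state back in a single round trip
        return {node: json.loads(blob) for node, blob in r.hgetall('mesh:nodes').items()}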
I'm working on a Django middleware to store all requests/responses in my main database (Postgres / SQLite).
But it's not hard to guess that the overhead will be crazy, so I'm thinking of using Redis to queue the requests for a while and then write them to my database in batches.
For example: receive 100 requests, store them in the database, wait for another 100 requests and do the same, and so on.
The model is like this:
url
method
status
user
remote_ip
referer
user_agent
user_ip
metadata # any important piece of data related to request/response e.g. errors or ...
created_at
updated_at
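In Django terms the model would be roughly the following (a sketch; the field types are my guesses, and JSONField assumes Django 3.1+):

    from django.conf import settings
    from django.db import models

    class RequestLog(models.Model):
        url = models.TextField()
        method = models.CharField(max_length=10)
        status = models.PositiveSmallIntegerField()
        user = models.ForeignKey(settings.AUTH_USER_MODEL, null=True, blank=True,
                                 on_delete=models.SET_NULL)
        remote_ip = models.GenericIPAddressField()
        referer = models.TextField(blank=True)
        user_agent = models.TextField(blank=True)
        user_ip = models.GenericIPAddressField(null=True, blank=True)
        metadata = models.JSONField(null=True, blank=True)  # errors or other context
        created_at = models.DateTimeField(auto_now_add=True)
        updated_at = models.DateTimeField(auto_now=True)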
My questions are: is this a good approach, and how can we implement it? Do you have any example that does such a thing?
And the other question: is there any better solution?
This doesn't suit the concrete question/answer format particularly well, unfortunately.
"Is this a good approach" is difficult to answer directly with a yes or no response. It will work and your proposed implementation looks sound, but you'll be implementing quite a bit of software and adding quite a bit of complexity to your project.
Whether this is desirable isn't easily answerable without context only you have.
Some things you'll want to answer:
What am I doing with these stored requests? Debugging? Providing an audit trail?
If it's for debugging, what does a database record get us that our web server's request logs do not?
If it's for an audit trail, is every individual HTTP request the best representation of that audit trail? Does an auditor care that someone asked for /favicon.ico? Does it convey the meaning and context they need?
Do we absolutely need every request stored? For how long? How do we handle going over our storage budget? How do we handle edge cases like the client hanging up before getting the response, or the case where we've processed the request but crashed before sending a response or logging the record?
Does logging a request in band with the request itself present a performance cost we actually can't afford?
Compare the strengths and weaknesses of your approach to some alternatives:
We can rely on the web server's logs, which we're already paying the cost for and are built to handle many of the oddball cases here.
We can write an HTTPLog model in band with the request using a simple middleware function, which sidesteps some complexities like "what if redis is down but django and the database aren't?" (a sketch of this option follows the list)
We can write an audit logging system by providing any context needed to an out-of-band process (perhaps through signals or redis+celery)
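A minimal version of the in-band middleware option might look like this (a sketch; it assumes an HTTPLog model shaped like the one in the question and that AuthenticationMiddleware runs before it):

    # middleware.py
    from .models import HTTPLog  # assumed model, similar to the one in the question

    def request_logging_middleware(get_response):
        def middleware(request):
            response = get_response(request)
            HTTPLog.objects.create(
                url=request.get_full_path(),
                method=request.method,
                status=response.status_code,
                user=request.user if request.user.is_authenticated else None,
                remote_ip=request.META.get('REMOTE_ADDR', ''),
                referer=request.META.get('HTTP_REFERER', ''),
                user_agent=request.META.get('HTTP_USER_AGENT', ''),
            )
            return response
        return middleware

Register it at the end of MIDDLEWARE and you have the simplest thing that works, which you can then measure before reaching for Redis or Celery.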
Above all: capture your actual requirements first, implement the simplest solution that works second, and optimize only after you actually see performance issues.
I would not put this functionality in my Django application. There are many tools to do that. One of them is NGINX, a reverse proxy server that you can put in front of Django. You can then use the access log from NGINX, and you can format those logs according to your needs. For this volume of data it is usually better not to store it in the database, because it will rarely be used. You can store the logs in an S3 bucket or just in plain files and use a log-parser tool to read them.
Is there any utility in defining models in Ember if I am only going to read from the server?
Yes there is. Modelling your frontend to be similar to the backend will always be helpful.
The plus points I can think of are:
Hassle free handling of relationships.
Caching of results in the store would save on server calls.
Writing back to the server becomes easy if you suddenly need that.
Selective data loading with the {async:true} option in relationships. Load only what you need.
Check out this link http://emberjs.com/blog/2014/03/18/the-road-to-ember-data-1-0.html for more information.
I'm creating a REST-centric application that will use a NoSQL data store of some kind for most of the domain-specific models. For the primary site that I intend to build around the REST data framework, I still want to use a traditional relational database for users, billing info, and other metadata that's outside the scope of the domain data model.
I've been advised that this approach is only a good idea if I can avoid performing I/O to both the RDBMS and NoSQL data stores on the same request as much as possible.
My questions:
Is this good advice? (I'm assuming so, but the rest of these questions are useless if the first premise is wrong.)
I'd like to cache at least the logged on user as much as possible. Is it possible to use Django sessions to do this in a way that is secure, reliably correct, and fault-tolerant? Ideally, I would like to have the session API be a safe, drop-in replacement for retrieving the current user with as little interaction with the users table as possible. What legwork will I need to do to hook everything up?
If this ends up being too much of a hassle, how easy is it to store user information in the NoSQL store (that is, eliminate the RDBMS completely) without using django-nonrel? Can custom authentication/authorization backends do this?
I'm pondering using the same approach for my application and I think it is generally safe but requires special care to tackle cache consistency issues.
The way Django normally operates is that when a request is received, a query is run against the Session table to find the session associated with a cookie from the request. Then, when you access request.user, a query is run against the User table to find the user for that session (if any, because Django supports anonymous sessions). So, by default, Django needs two queries to associate each request with a user, which is expensive.
A nice thing about the Django session is that it can be used as a key-value store without extending any model class (unlike, for example, the User class, which is hard to extend with additional fields). So you can, for example, do request.session['email'] = user.email to store additional data in the session. This is safe in the sense that what you read from the request.session dictionary is exactly what you put there; the client has no way to change these values. So you can indeed use this technique to avoid queries to the User table.
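For example (the cached fields here are just illustrations):

    from django.http import JsonResponse

    # at login time, cache what later requests need so they can skip the User table
    def remember_user(request, user):
        request.session['user_id'] = user.pk
        request.session['email'] = user.email
        request.session['is_staff'] = user.is_staff

    # in a view, read it back without touching the User table
    def whoami(request):
        return JsonResponse({
            'email': request.session.get('email'),
            'is_staff': request.session.get('is_staff', False),
        })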
To avoid query to the Session table, you need to enable session caching (or store session data in the client cookie with django.contrib.sessions.backends.signed_cookies, which is safe, because such cookies are cryptographically protected against modification by a client).
With caching enabled, you need 0 queries to associate a request with user data. But the problem is cache consistency. If you use a local in-memory cache with the write-through option (django.core.cache.backends.locmem.LocMemCache with django.contrib.sessions.backends.cached_db), the session data will be written to the DB on each modification, but it won't be read from the DB if it is present in the cache. This introduces a problem if you have multiple Django processes: if one process modifies a session (for example, changes session['email']), another process can still use an old, cached value.
You can solve it by using shared cache (Memcached backend), which guarantees that changes done by one process are visible to all other processes. In this way, you are replacing a query to a Session table with a request to a Memcached backend, which should be much faster.
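In settings terms that combination is roughly this (a sketch; the backend names are the stock Django ones, the address is an example):

    # settings.py
    CACHES = {
        'default': {
            # use MemcachedCache instead on Django versions before 3.2
            'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
            'LOCATION': '127.0.0.1:11211',
        }
    }

    # sessions are written through to the DB but read from the shared cache
    SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'
    # or, to keep session data in the client cookie instead:
    # SESSION_ENGINE = 'django.contrib.sessions.backends.signed_cookies'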
Storing session data in a client cookie can also solve cache consistency issues. If you modify an email field in the cookie, all future requests sent by the client should carry the new email. A client can, however, deliberately send an old cookie, which still carries the old values. Whether this is a problem depends on the application.
Currently, I'm working on a project where I have a server-client relationship between two Django applications running on separate hosts.
The server has to store and provide a large amount of relational data, e.g. Suppliers, Companies, Products, etc.
The client downloads data on request from the server and adds it to its database. Clients can also upload data from their station to the database to expand it.
The previous person who developed this used XMLRPC to transfer a vast (typically 13MB) XML file from server to client. Now, really all we're sending are database-agnostic objects to be stored in a database, so I wondered if there was a more efficient way of doing it.
Please ask for more details if you need them; I wasn't really sure what you'd need to know.
EDIT: Efficient in terms of Networking, and Server Side Processing. Clients can do the heavy lifting.
A shared database design seems more suitable. But of course there may be security, political or organisational reasons ruling that out. Plus there would be significant re-design required.
To reduce network bandwidth, first check that HTTP gzip compression is enabled.
If it's just a dumb data transfer, JSON would generally be a lot more compact than XMLRPC. Does the data look amenable to a straight translation to JSON? This would still require some server-side processing.
For minimal server-side processing (if the database tables are relatively similar) it may be very efficient to just send the client a dump of the relevant DB query. Of course, unless the tables have the same schema, you would have to do some client-side processing of the raw SQL, which is not ideal.
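As a rough illustration of the gzip + JSON route (a sketch; the model and field names are invented):

    # settings.py: put 'django.middleware.gzip.GZipMiddleware' near the top of
    # MIDDLEWARE so large responses are compressed on the wire.

    # views.py -- dump the relevant rows as compact JSON instead of XMLRPC
    from django.http import JsonResponse
    from .models import Product  # placeholder model

    def export_products(request):
        rows = list(Product.objects.values('id', 'name', 'supplier_id', 'price'))
        return JsonResponse(rows, safe=False)  # safe=False allows a top-level list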
I have two Django servers, each with its own database, and I want to exchange some specific objects between them over HTTP.
Actually, I planned to create some views to generate XML output on one side to be imported on the other side. Is there a nicer way?
Is there a reason this needs to happen through http?
If you just want to read data from one server to be used on the other, you could create a simple API that returns a representation of the object you queried for (in xml/json or whatever other format you wanted).
If there is going to be a decent amount of processing going on, or slow communication, and you don't need it to happen in real time (in the request/response cycle), you could look at a message queue. Something like RabbitMQ, for instance.
If you want both servers to have direct access to both databases, you could try to take advantage of Django's multiple database support.
If it's more of a one-off copy of data, just write a small (non-Django) script to do it.
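For the simple-API option, the two ends can be quite small (a sketch; the Widget model and the URL are placeholders):

    # server side: expose the objects you want to share
    from django.http import JsonResponse
    from .models import Widget  # placeholder model

    def widget_export(request):
        return JsonResponse(list(Widget.objects.values('id', 'name')), safe=False)

    # client side: pull and upsert (this could also live in a plain script)
    import requests

    def pull_widgets():
        for row in requests.get('https://other-server.example/widgets/').json():
            Widget.objects.update_or_create(id=row['id'], defaults={'name': row['name']})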