Secure-by-default Django ORM layer: how?

I'm running a Django shop where we serve each of our clients an object graph which is completely separate from the graphs of all the other clients. The data is moderately sensitive, so I don't want any of it to leak from one client to another, nor for one client to delete or alter another client's data.
I would like to structure my code such that I by default write code which adheres to the security requirements (No hard guarantees necessary), but lets me override them when I know I need to.
My main fear is that in a Twig.objects.get(...), I forget to add client=request.client, and likewise for Leaf.objects.get where I have to check that twig__client=request.client. This quickly becomes error-prone and complicated.
What are some good ways to get around my own forgetfulness? How do I make this a thing I don't have to think about?

One candidate solution I have in mind is this:
Set the default object manager as DANGER = models.Manager() on my abstract base class(es).
Have a method ok(request) on said base classes which applies .filter(leaf__twig__branch__trunk__root__client=request.client) as applicable.
Use MyModel.ok(request) instead of MyModel.objects wherever feasible.
Can this be improved upon? One not so nice issue is when a view calls a model method, e.g. branch.get_twigs_with_fruit, I now have to either pass a request for it to run through ok or I have to invoke DANGER. I like neither :-\
Is there some way of getting access to the current request? I think that might mitigate the situation...
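
On the "access to the current request" idea: one common pattern is a small middleware that stashes the request in a context variable for the duration of the call. The following is only a minimal sketch under that assumption; the names get_current_request and CurrentRequestMiddleware are hypothetical, and it relies only on the fact that a Django middleware can be a callable built around get_response.

```python
import contextvars

# Holds the request being handled in the current thread or async task.
_current_request = contextvars.ContextVar("current_request", default=None)


def get_current_request():
    """Return the request being handled right now, or None outside a request."""
    return _current_request.get()


class CurrentRequestMiddleware:
    """Django-style middleware: a callable that wraps get_response."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        token = _current_request.set(request)
        try:
            return self.get_response(request)
        finally:
            # Always restore the previous value so one request never sees another's.
            _current_request.reset(token)
```

With something like this in place, a model method such as branch.get_twigs_with_fruit could call get_current_request() itself instead of taking a request argument. Code that runs outside a request (management commands, background jobs) would get None and has to decide explicitly what to do.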

I'll explain a different problem I had; however, I think the solution might be something to look into.
Once I was working on a project to visualize data where I needed a really big table which would store all the data for all visualizations. That turned out to be a big problem because I would have to do things like Model.objects.filter(visualization=5), which was neither elegant nor efficient.
To make things simpler and more efficient I ended up creating dynamic models on the fly. Essentially I would create a separate table in the db on the fly and then store the data for only that one visualization in it. My code is something like:
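from django.db import models
from django.db.models.base import ModelBase
def get_model_class(table_name):
    # Metaclass that appends the table name to the generated class name.
    # It needs its own name: reusing "ModelBase" here would shadow the import.
    class TableModelBase(ModelBase):
        def __new__(cls, name, bases, attrs):
            name = '{}_{}'.format(name, table_name)
            return super(TableModelBase, cls).__new__(cls, name, bases, attrs)
    # Python 3 syntax; on Python 2, set __metaclass__ = TableModelBase in the class body instead.
    class Data(models.Model, metaclass=TableModelBase):
        # fields here
        class Meta(object):
            db_table = table_name
    return Data
dynamic_model = get_model_class('foo')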
This was useful for my purposes because it made queries much faster. Getting back to your issue, I think something like this could be useful because it ensures that each client's data is separated not only via a foreign key but physically, in its own table in the db.
Using this method is pretty straightforward, except that before using the model you have to call the function to get it for each client. To make things more efficient you can cache/memoize the results of the function call so that it does not have to recompute the same thing more than once.
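
The caching step could be as simple as wrapping the factory in functools.lru_cache. This is a sketch only: the factory below is a stand-in that builds a plain class with type() so the example runs without Django, and it is not the real model factory.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_model_class(table_name):
    # Stand-in for the Django model factory: build one class per table name.
    # Because of lru_cache, the body runs only once for each distinct table_name.
    return type('Data_{}'.format(table_name), (object,), {'db_table': table_name})
```

Calling get_model_class('foo') twice returns the very same class object, which also matters for Django, since defining the same model class twice is something you generally want to avoid.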

Related

Django: How to depend on an external ID, which can be switched?

Consider the following scenario:
Our Django database objects must rely on IDs that are provided by external service A (ESA). This is because we use this ID to pull information about objects that aren't created yet directly from the external service. ESA might shut down soon, so we also pull information about the same objects from external service B (ESB), and save them as a fallback.
Because these IDs are relied on heavily in views and URLs, the ideal scenario would be to use a @property:
@property
def dynamic_id(self):
    return self.ESA_id
And then, if ESA shuts down, we can switch easily by changing dynamic_id to ESB_id. The problem with this though, is that properties cannot be used in queryset filters and various other scenarios, which is also a must in this case.
My current thought is to just save ESA_id, ESB_id, and dynamic_ID as regular fields separately and assign dynamic_ID = ESA_id, and then, in case ESA shuts down, simply go over the objects and do dynamic_ID = ESB_id.
But I feel there must be a better way?
Having ESA_id and ESB_id fields in the same table is a good solution. Then add some kind of setting (DEFAULT_SERVICE_ID = 'ESA_id' or 'ESB_id') and have your code change the lookup based on this option.
Here you can see an approach to creating filters dynamically:
https://stackoverflow.com/a/310785/1448667
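
A rough sketch of that idea, assuming the setting simply holds the name of the active field. The helper name external_id_lookup is hypothetical; it only builds the keyword arguments that would be passed to filter() or get().

```python
DEFAULT_SERVICE_ID = "ESA_id"  # switch to "ESB_id" if ESA shuts down


def external_id_lookup(value, field=None):
    """Build filter keyword arguments for whichever ID field is active."""
    return {field or DEFAULT_SERVICE_ID: value}


# Usage with a hypothetical model:
#   Thing.objects.filter(**external_id_lookup("abc123"))
```

Because the field name is resolved in one place, switching services means changing a single setting rather than rewriting every query.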

Django ManyToMany with inheritance

I've checked a number of SO articles and I don't believe there is any way to accomplish what I want to do, but before I abandon Django I wanted to articulate the question itself and see if I missed something.
I'm implementing a graph (nodes, edges) which can contain subclasses of a base type. That is, an edge can connect a base class to a subclass, or a subclass to a subclass, etc . . . I want to be able to pull all the edges for a given object, find the objects these edges point to, and call some function on these terminal objects. I was hoping that I could call the function in a polymorphic way, but I can't figure out a way to make that happen.
class Node(models.Model):
    ...
    def dosomething(self):
        ...
class SpecialNode(Node):
    ...
    def dosomething(self):
        ...
class Edge(models.Model):
    # yes, related_name is weird, but this seems to be what makes sense
    source = models.ForeignKey(Node, related_name='targets', on_delete=models.CASCADE)
    target = models.ForeignKey(Node, related_name='sources', on_delete=models.CASCADE)
With this structure I can do:
sourceedges = node.sources.all()
for sourceedge in sourceedges:
    sourceedge.source.dosomething()
But the "dosomething" function is always called on the Node object, even if source is actually a SpecialNode object.
I've tried doing this with django_polymorphic but I don't believe this supports M2M inheritance through an Edge object (this is required for other reasons in my app).
I've tried to use contenttypes, but I think you're only allowed one generic relation per class. Edge, in other words, can't have 2 different generic relations in it.
I imagine I could establish an object called an Endpoint which would have just a single generic relation on it, then link the Edge object to it like so:
class Endpoint(models.Model):
    ...
    content_object = generic.GenericForeignKey(...)
class Edge(models.Model):
    source = models.ForeignKey(Endpoint, related_name='targets', on_delete=models.CASCADE)
    target = models.ForeignKey(Endpoint, related_name='sources', on_delete=models.CASCADE)
But this introduces another level of indirection in my model and I start to feel like the framework is coding me rather than the other way around :)
Anyway, if someone has figured out a way to get this particular use case done, please let me know. A better way than what I suggested above would be welcome, as I currently like a lot of things I get out of the box with Django.
Thanks

Django - How to pass dynamic models between pages

I have made a django app that creates models and database tables on the fly. This is, as far as I can tell, the only viable way of doing what I need. The problem arises of how to pass a dynamically created model between pages.
I can think of a few ways of doing such but they all sound horrible. The methods I can think of are:
1. Use global variables within views.py. This seems like a horrible hack and likely to cause conflicts if there are multiple simultaneous users.
2. Pass a reference in the URL and use some eval hackery to try and refind the model. This is probably stupid as the model could potentially be garbage collected en route.
3. Use a place-holder app. This seems like a bad idea due to conflicts between multiple users.
4. Have an invisible form that posts the model when a link is clicked. Again very hacky.
Is there a good way of doing this, and if not, is one of these methods more viable than the others?
P.S. In case it helps, my app receives data (as a JSON string) from a pre-existing database, and then caches it locally (i.e. on the webserver), creating an appropriate model and table on the fly. The idea is then to present this data and do various filtering and drill-downs on it without placing undue strain on the main database (as each query returns a few hundred results out of a database of hundreds of millions of data points). W.R.T. 3, the tables are named based on a hash of the query and time stamp, however a place-holder app would have a predetermined name.
Thanks,
jhoyla
EDITED TO ADD: Thanks guys, I have now solved this problem. I ended up using both answers together to give a complete answer. As I can only accept one I am going to accept the contenttypes one, sadly I don't have the reputation to give upvotes yet, however if/when I ever do I will endeavor to return and upvote appropriately.
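The solution in its totality:
from django.contrib.contenttypes.models import ContentType
from django.http import Http404
def view_a(request):
    model = create_model(...)
    # Store the ContentType's pk rather than the object itself:
    # the default JSON session serializer cannot store model instances.
    request.session['model'] = ContentType.objects.get_for_model(model).pk
    ...
def view_b(request):
    ct_id = request.session.get('model', None)
    if not ct_id:
        # Http404 is an exception, so it must be raised, not returned.
        raise Http404
    model = ContentType.objects.get_for_id(ct_id).model_class()
    ...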
My first thought would be to use content types and to pass the type/model information via the url.
You could also use Django's sessions framework, e.g.
def view_a(request):
    your_model = request.session.get('your_model', None)
    if isinstance(your_model, YourModel):
        your_model.name = 'something_else'
    request.session['your_model'] = your_model
    ...
def view_b(request):
    your_model = request.session.get('your_model', None)
    ...
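You can store almost anything the configured session serializer can handle in the session dictionary (the default JSON serializer is limited to JSON-compatible values), and managing it is also easy: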
del request.session['your_model']

Move a python / django object from a parent model to a child (subclass)

I am subclassing an existing model. I want many of the members of the parent class to now, instead, be members of the child class.
For example, I have a model Swallow. Now, I am making EuropeanSwallow(Swallow) and AfricanSwallow(Swallow). I want to take some but not all Swallow objects and make them either EuropeanSwallow or AfricanSwallow, depending on whether they are migratory.
How can I move them?
It's a bit of a hack, but this works:
swallow = Swallow.objects.get(id=1)
swallow.__class__ = AfricanSwallow
# set any required AfricanSwallow fields here
swallow.save()
I know this is much later, but I needed to do something similar and couldn't find much. I found the answer buried in some source code here, but also wrote an example class-method that would suffice.
class AfricanSwallow(Swallow):
    @classmethod
    def save_child_from_parent(cls, swallow, new_attrs):
        """
        Inputs:
        - swallow: instance of Swallow we want to turn into an AfricanSwallow
        - new_attrs: dictionary of new attributes for AfricanSwallow
        Adapted from:
        https://github.com/lsaffre/lino/blob/master/lino/utils/mti.py
        """
        parent_link_field = AfricanSwallow._meta.parents.get(swallow.__class__, None)
        new_attrs[parent_link_field.name] = swallow
        for field in swallow._meta.fields:
            new_attrs[field.name] = getattr(swallow, field.name)
        s = AfricanSwallow(**new_attrs)
        s.save()
        return s
I couldn't figure out how to get my form validation to work with this method, however, so it certainly could be improved. It probably means a database refactoring would be the best long-term solution...
Depends on what kind of model inheritance you'll use. See
http://docs.djangoproject.com/en/dev/topics/db/models/#model-inheritance
for the three classic kinds. Since it sounds like you want Swallow objects, that rules out Abstract Base Class.
If you want to store different information in the db for Swallow vs AfricanSwallow vs EuropeanSwallow, then you'll want to use MTI (multi-table inheritance). The biggest problem with MTI as the official Django docs describe it is that polymorphism doesn't work properly. That is, if you fetch a Swallow object from the DB which is actually an AfricanSwallow object, you won't get an instance of AfricanSwallow. (See this question.) Something like django-model-utils InheritanceManager can help overcome that.
If you have actual data you need to preserve through this change, use South migrations. Make two migrations -- first one that changes the schema and another that copies the appropriate objects' data into subclasses.
I suggest using django-model-utils's InheritanceCastModel. This is one implementation I like. You can find many more in djangosnippets and some blogs, but after going through them all I chose this one. Hope it helps.
Another (outdated) approach: If you don't mind keeping parent's id you can just create brand new child instances from parent's attrs. This is what I did:
ids = [s.pk for s in Swallow.objects.all()]
# I get ids list to avoid memory leak with long lists
for i in ids:
    p = Swallow.objects.get(pk=i)
    c = AfricanSwallow(att1=p.att1, att2=p.att2.....)
    p.delete()
    c.save()
Once this runs, a new AfricanSwallow instance will have been created to replace each initial Swallow instance.
Maybe this will help someone :)

Multijoin queries in Django

What's the best and/or fastest method of doing multijoin queries in Django using the ORM and QuerySet API?
If you are trying to join across tables linked by ForeignKeys or ManyToManyField relationships then you can use the double underscore syntax. For example if you have the following models:
class Foo(models.Model):
    name = models.CharField(max_length=255)
class FizzBuzz(models.Model):
    bleh = models.CharField(max_length=255)
class Bar(models.Model):
    foo = models.ForeignKey(Foo, on_delete=models.CASCADE)
    fizzbuzz = models.ForeignKey(FizzBuzz, on_delete=models.CASCADE)
You can do something like:
FizzBuzz.objects.filter(bar__foo__name="Adrian")
Don't use the API ;-) Seriously, if your JOINs are complex, you should see significant performance increases by dropping down into SQL rather than by using the API. And this doesn't mean you need to get dirty dirty SQL all over your beautiful Python code; just make a custom manager to handle the JOINs and then have the rest of your code use it rather than direct SQL.
Also, I was just at DjangoCon where they had a seminar on high-performance Django, and one of the key things I took away from it was that if performance is a real concern (and you plan to have significant traffic someday), you really shouldn't be doing JOINs in the first place, because they make scaling your app while maintaining decent performance virtually impossible.
Here's a video Google made of the talk:
http://www.youtube.com/watch?v=D-4UN4MkSyI&feature=PlayList&p=D415FAF806EC47A1&index=20
Of course, if you know that your application is never going to have to deal with that kind of scaling concern, JOIN away :-) And if you're also not worried about the performance hit of using the API, then you really don't need to worry about the (AFAIK) minuscule, if any, performance difference between using one API method over another.
Just use:
http://docs.djangoproject.com/en/dev/topics/db/queries/#lookups-that-span-relationships
Hope that helps (and if it doesn't, hopefully some true Django hacker can jump in and explain why method X actually does have some noticeable performance difference).
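
For a feel of what the double-underscore lookup on the Foo / Bar / FizzBuzz models amounts to, and what "dropping down into SQL" would look like, here is a rough sketch using the standard-library sqlite3 module. The table and column names are assumptions chosen for illustration, not the names or the exact SQL Django would generate, and fizzbuzz_for_foo_name is a hypothetical helper of the kind a custom manager might wrap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fizzbuzz (id INTEGER PRIMARY KEY, bleh TEXT);
    CREATE TABLE bar (
        id INTEGER PRIMARY KEY,
        foo_id INTEGER REFERENCES foo(id),
        fizzbuzz_id INTEGER REFERENCES fizzbuzz(id)
    );
    INSERT INTO foo VALUES (1, 'Adrian'), (2, 'Jacob');
    INSERT INTO fizzbuzz VALUES (1, 'first'), (2, 'second');
    INSERT INTO bar VALUES (1, 1, 1), (2, 2, 2);
""")


def fizzbuzz_for_foo_name(name):
    """Return the bleh of every fizzbuzz linked through bar to a foo with this name."""
    rows = conn.execute(
        """
        SELECT fizzbuzz.bleh
        FROM fizzbuzz
        JOIN bar ON bar.fizzbuzz_id = fizzbuzz.id
        JOIN foo ON foo.id = bar.foo_id
        WHERE foo.name = ?
        """,
        (name,),
    )
    return [bleh for (bleh,) in rows]
```

Keeping the hand-written JOIN behind one function means the rest of the code never touches SQL directly, which is the point of the custom-manager suggestion.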
Use the queryset.query.join method, but only if the other method described here (using double underscores) isn't adequate.
Caktus blog has an answer to this: http://www.caktusgroup.com/blog/2009/09/28/custom-joins-with-djangos-queryjoin/
Basically there is a hidden QuerySet.query.join method that allows adding custom joins.