Data arrays in Django model - django

I am just starting out with Django and would like to know the best way to deal with the following data. I have DataSets, each comprising many x,y coordinate pairs that I wish to plot. It is my understanding that Django doesn't support numeric arrays directly in its models, so what is the best way to deal with these? Right now all I can think of is something like I have below:
class DataSet(models.Model):
    set_name = models.CharField(max_length=100)
class DataPoint(models.Model):
    x = models.FloatField()
    y = models.FloatField()
    dataset = models.ForeignKey(DataSet)
This seems a bit odd, and without any experience with databases or Django I am not sure how to proceed. I am using PostgreSQL right now, which I believe does support array entries, but I am not sure I am prepared to write a custom field in Django.

Short of a custom field, your idea of defining DataPoint and DataSet models seems to be the way to go. You should consider changing the relationship between the two to a many-to-many field if a data point could occur in more than one data set.
It would also help to write (with tests) a thin business layer combining the two to minimize the need to think in terms of how the models are stored in the database.
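A thin layer like that can be very small. Here is a minimal sketch, assuming the points reach you as (x, y) tuples, for example from something like dataset.datapoint_set.values_list("x", "y"). The helper name split_points and the sample rows are made up for illustration:

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def split_points(points: Iterable[Point]) -> Tuple[List[float], List[float]]:
    """Turn (x, y) pairs into separate x and y lists, ready for plotting."""
    xs: List[float] = []
    ys: List[float] = []
    for x, y in points:
        xs.append(float(x))
        ys.append(float(y))
    return xs, ys


# Hypothetical stand-in for rows fetched from the DataPoint table.
rows = [(0.0, 1.5), (1.0, 2.5), (2.0, 4.0)]
xs, ys = split_points(rows)
print(xs)
print(ys)
```

The plotting code then only ever sees two plain lists and does not need to know how the points are stored.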

You can use the Many-to-many model relationship.
From django docs:
from django.db import models
class Topping(models.Model):
    # ...
    pass
class Pizza(models.Model):
    # ...
    toppings = models.ManyToManyField(Topping)
More info:
https://docs.djangoproject.com/en/2.1/topics/db/models/#many-to-many-relationships
Example:
https://docs.djangoproject.com/en/2.1/topics/db/examples/many_to_many/

Related

Should I use ArrayField or ManyToManyField for tags

I am trying to add tags to a model for a Postgres database in Django and I found two solutions:
using foreign keys:
class Post(models.Model):
    tags = models.ManyToManyField('Tag')
    ...
class Tag(models.Model):
    name = models.CharField(max_length=140)
using array field:
from django.contrib.postgres.fields import ArrayField
class Post(models.Model):
    tags = ArrayField(models.CharField(max_length=140))
    ...
Assuming that I don't care about supporting other database backends in my code, what is the recommended solution?
If you use an ArrayField:
Each row in your DB is going to be somewhat larger, so Postgres is going to make more use of TOAST tables.
Every time you get the row, unless you specifically defer the field or otherwise exclude it from the query via only, values or something similar, you pay the cost of loading all those values every time you iterate across that row. If that's what you need, then so be it.
Filtering based on values in that array, while possible, isn't going to be as nice, and the Django ORM doesn't make it as obvious as it does for M2M tables.
If you use an M2M field:
You can filter more easily on those related values.
The related rows are not loaded by default. You can use prefetch_related if you need them, and then get fancy if you want only a subset of those values loaded.
Total storage in the DB is going to be slightly higher with M2M because of the keys and the extra id fields.
The cost of the joins in this case is completely negligible because the joins run on keys.
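To make the M2M side concrete, here is a rough sketch of the kind of join table and filter the ORM works with, written against an in-memory SQLite database with the standard sqlite3 module rather than Django and Postgres. The table names, sample rows and the helper titles_tagged are all made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- the join table: one row per (post, tag) pair
    CREATE TABLE post_tags (
        post_id INTEGER REFERENCES post(id),
        tag_id INTEGER REFERENCES tag(id),
        PRIMARY KEY (post_id, tag_id)
    );
    INSERT INTO post VALUES (1, 'Intro'), (2, 'Arrays'), (3, 'Joins');
    INSERT INTO tag VALUES (1, 'django'), (2, 'postgres');
    INSERT INTO post_tags VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")


def titles_tagged(name):
    """Return the titles of all posts carrying the given tag."""
    rows = conn.execute(
        """
        SELECT post.title FROM post
        JOIN post_tags ON post_tags.post_id = post.id
        JOIN tag ON tag.id = post_tags.tag_id
        WHERE tag.name = ?
        ORDER BY post.id
        """,
        (name,),
    )
    return [title for (title,) in rows]


print(titles_tagged("django"))
print(titles_tagged("postgres"))
```

Both joins run on primary keys, which is why their cost stays small.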
With that being said, the above answer doesn't belong to me. A while ago, I had stumbled upon this dilemma when I was learning Django. I had found the answer here in this question, Django Postgres ArrayField vs One-to-Many relationship.
Hope you get what you were looking for.
If you want the tags to be tracked (for example, how many tags there are, or how many times a particular tag is used), then go for the first option, as you can add more fields to the Tag model, which adds richness to the app.
On the other hand, if you just want an array for the sake of display or minimal processing, go for the second option.
But if you wish to save time and still add richness to the app, you can use this:
https://github.com/alex/django-taggit
It is as simple as this to initialise:
from django.db import models
from taggit.managers import TaggableManager
class Food(models.Model):
    # ... fields here
    tags = TaggableManager()
and can be used in the following way:
>>> apple = Food.objects.create(name="apple")
>>> apple.tags.add("red", "green", "delicious")
>>> apple.tags.all()
[<Tag: red>, <Tag: green>, <Tag: delicious>]

Django - querying all models that have common fields

If I have many Django models, all with the following common fields:
created_by = models.ForeignKey(User)
modified_by = models.ForeignKey(User)
and I would like to query all models to find out which objects were created or modified by a specific user, is there a sane way to go about achieving this?
Or do I have to fall back to doing ModelA.objects.filter(created_by=userone) ModelB.objects.filter(created_by=userone) and so on?
I should mention that in reality these fields are in an abstract base class from which all other models inherit them. But let's pretend I didn't tell you what I just told you about the abstract base class, is there still a way to do what I want to do?
To expand on what akshar raaj is saying, you can simplify the code by keeping a list of models.
query_models = [ModelA, ModelB, ModelC, ModelD]
for query_model in query_models:
    results = query_model.objects.filter(created_by=userone)
    if results.exists():
        print('%s has it!!!' % query_model.__name__)
You will have to operate on one model at a time. Django ORM managers internally use a class called QuerySet, which talks to the database. Each QuerySet instance has a model attribute, which corresponds to a single table. So a QuerySet can only talk to one table, though join operations are possible. For your scenario you will have to make multiple calls to the database.
The ORM is just an abstraction over the database, so think about whether you could do this directly in SQL.
In SQL:
select val from test union select val from test2 union......;
You say you have a lot of models, so there would be a lot of unions, and it doesn't make much sense to do this in a single database call.
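For what it's worth, this is roughly what the union approach looks like when you go around the ORM. It is only a sketch against an in-memory SQLite database using the standard sqlite3 module. The table names model_a and model_b and the sample rows are invented, and the table names are spliced into the SQL, so they must come from a hard-coded list and never from user input:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model_a (id INTEGER PRIMARY KEY, created_by TEXT);
    CREATE TABLE model_b (id INTEGER PRIMARY KEY, created_by TEXT);
    INSERT INTO model_a VALUES (1, 'userone'), (2, 'usertwo');
    INSERT INTO model_b VALUES (1, 'userone'), (2, 'userone');
""")

# Trusted, hard-coded table names only (they are interpolated into the SQL).
tables = ["model_a", "model_b"]

# One SELECT per table, glued together with UNION ALL; each row is labelled
# with the table it came from so the results can be told apart.
query = " UNION ALL ".join(
    f"SELECT '{t}' AS source, id FROM {t} WHERE created_by = ?" for t in tables
)
rows = conn.execute(
    query + " ORDER BY source, id", ["userone"] * len(tables)
).fetchall()
print(rows)
```

The query grows by one SELECT per model, which is the point made above: with many models it quickly gets unwieldy compared with a simple loop.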

Django Haystack/Solr: Faceting on a model but show results only from a ForeignKey field

I have two models in Django as follows (in pseudo code):
class Medicine(db.Model):
    field_1 = db.CharField()
    field_2 = db.CharField()
class Application(db.Model):
    field_1 = db.CharField()
    field_2 = db.CharField()
    medicine = db.ForeignKey(Medicine)
The relationship is 1:M: one medicine can have many applications.
I need to facet on the fields of Application but only show related Medicine objects. Something like DISTINCT in SQL.
What would be the most straight forward way to accomplish this with haystack?
Do I make SearchIndex for Medicine or Application? If I make SearchIndex for Application, how do I detect/filter duplicate Medicine objects?
PS: I know there's a Field Collapsing feature in dev releases of Solr, but I want to avoid it, because this is a huge database and performance is critical.
I solved this with the help of Daniel Lindsley (Haystack/pySolr author) on the Haystack mailing list.
from haystack import indexes
class Medicine(indexes.SearchIndex):
    field_1 = indexes.MultiValueField(faceted=True)
    # Other field definitions
    def prepare_field_1(self, obj):
        # Collect the facet value from every Application related to this Medicine
        values = list()
        for app in obj.applications.all():
            values.append(app.field_on_which_to_facet)
        return values
    # define "prepare_fieldname" methods for other fields in similar fashion.
Indexing takes some time as the data to be indexed is huge, but it worked like a charm.

Creation of dynamic model fields in django

This is a problem concerning Django.
I have a model, say "Automobiles". This will have some basic fields like "Color", "Vehicle Owner Name", "Vehicle Cost".
I want to provide a form where the user can add extra fields depending on the automobile that he is adding. For example, if the user is adding a "Car", he will add extra fields in the form, dynamically at run time, like "Car Mileage", "Car Manufacturer".
Suppose if the user wants to add a "Truck", he will add "Load that can be carried", "Permit" etc.
How do I achieve this in django?
There are two questions here:
How to provide a form where the user can add new fields at run time?
How to add the fields to the database so that it can be retrieved/queried later?
There are a few approaches:
key/value model (easy, well supported)
JSON data in a TextField (easy, flexible, can't search/index easily)
Dynamic model definition (not so easy, many hidden problems)
It sounds like you want the last one, but I'm not sure it's the best for you. Django is very easy to change/update: if system admins want extra fields, just add the fields for them and use South to migrate. I don't like generic key/value database schemas. The whole point of a powerful framework like Django is that you can easily write and rewrite custom schemas without resorting to generic approaches.
If you must allow site users/administrators to directly define their data, I'm sure others will show you how to do the first two approaches above. The third approach is what you were asking for and is a bit more crazy, so that is the one I'll show you. I don't recommend it in almost all cases, but sometimes it's appropriate.
Dynamic models
Once you know what to do, this is relatively straightforward. You'll need:
1 or 2 models to store the names and types of the fields
(optional) An abstract model to define common functionality for your (subclassed) dynamic models
A function to build (or rebuild) the dynamic model when needed
Code to build or update the database tables when fields are added/removed/renamed
1. Storing the model definition
This is up to you. I imagine you'll have a model CustomCarModel and CustomField to let the user/admin define and store the names and types of the fields you want. You don't have to mirror Django fields directly, you can make your own types that the user may understand better.
Use a forms.ModelForm with inline formsets to let the user build their custom class.
2. Abstract model
Again, this is straightforward, just create a base model with the common fields/methods for all your dynamic models. Make this model abstract.
3. Build a dynamic model
Define a function that takes the required information (maybe an instance of your class from #1) and produces a model class. This is a basic example:
from django.db.models.loading import cache
from django.db import models
def get_custom_car_model(car_model_definition):
    """ Create a custom (dynamic) model class based on the given definition.
    """
    # What's the name of your app?
    _app_label = 'myapp'
    # you need to come up with a unique table name
    _db_table = 'dynamic_car_%d' % car_model_definition.pk
    # you need to come up with a unique model name (used in model caching)
    _model_name = "DynamicCar%d" % car_model_definition.pk
    # Remove any existing model definition from Django's cache
    try:
        del cache.app_models[_app_label][_model_name.lower()]
    except KeyError:
        pass
    # We'll build the class attributes here
    attrs = {}
    # Store a link to the definition for convenience
    attrs['car_model_definition'] = car_model_definition
    # Create the relevant meta information
    class Meta:
        app_label = _app_label
        db_table = _db_table
        managed = False
        verbose_name = 'Dynamic Car %s' % car_model_definition
        verbose_name_plural = 'Dynamic Cars for %s' % car_model_definition
        ordering = ('my_field',)
    attrs['__module__'] = 'path.to.your.apps.module'
    attrs['Meta'] = Meta
    # All of that was just getting the class ready, here is the magic
    # Build your model by adding django database Field subclasses to the attrs dict
    # What this looks like depends on how you store the user's definitions
    # For now, I'll just make them all CharFields
    for field in car_model_definition.fields.all():
        attrs[field.name] = models.CharField(max_length=50, db_index=True)
    # Create the new model class
    model_class = type(_model_name, (CustomCarModelBase,), attrs)
    return model_class
4. Code to update the database tables
The code above will generate a dynamic model for you, but won't create the database tables. I recommend using South for table manipulation. Here are a couple of functions, which you can connect to pre/post-save signals:
import logging
from south.db import db
from django.db import connection
def create_db_table(model_class):
    """ Takes a Django model class and create a database table, if necessary.
    """
    table_name = model_class._meta.db_table
    if (connection.introspection.table_name_converter(table_name)
            not in connection.introspection.table_names()):
        fields = [(f.name, f) for f in model_class._meta.fields]
        db.create_table(table_name, fields)
        logging.debug("Creating table '%s'" % table_name)
def add_necessary_db_columns(model_class):
    """ Creates new table or relevant columns as necessary based on the model_class.
    No columns or data are renamed or removed.
    XXX: May need tweaking if db_column != field.name
    """
    # Create table if missing
    create_db_table(model_class)
    # Add field columns if missing
    table_name = model_class._meta.db_table
    fields = [(f.column, f) for f in model_class._meta.fields]
    db_column_names = [row[0] for row in connection.introspection.get_table_description(connection.cursor(), table_name)]
    for column_name, field in fields:
        if column_name not in db_column_names:
            logging.debug("Adding field '%s' to table '%s'" % (column_name, table_name))
            db.add_column(table_name, column_name, field)
And there you have it! You can call get_custom_car_model() to deliver a django model, which you can use to do normal django queries:
CarModel = get_custom_car_model(my_definition)
CarModel.objects.all()
Problems
Your models are hidden from Django until the code creating them is run. You can however run get_custom_car_model for every instance of your definitions in the class_prepared signal for your definition model.
ForeignKeys/ManyToManyFields may not work (I haven't tried)
You will want to use Django's model cache so you don't have to run queries and create the model every time you want to use this. I've left this out above for simplicity
You can get your dynamic models into the admin, but you'll need to dynamically create the admin class as well, and register/reregister/unregister appropriately using signals.
Overview
If you're fine with the added complication and problems, enjoy! Once it's running, it works exactly as expected thanks to Django and Python's flexibility. You can feed your model into Django's ModelForm to let the user edit their instances, and perform queries using the database's fields directly. If there is anything you don't understand in the above, you're probably best off not taking this approach (I've intentionally not explained what some of the concepts are for beginners). Keep it Simple!
I really don't think many people need this, but I have used it myself, where we had lots of data in the tables and really, really needed to let the users customise the columns, which changed rarely.
Database
Consider your database design once more.
You should think in terms of how those objects that you want to represent relate to each other in the real world and then try to generalize those relations as much as you can, (so instead of saying each truck has a permit, you say each vehicle has an attribute which can be either a permit, load amount or whatever).
So let's try it:
If you say you have a vehicle and each vehicle can have many user-specified attributes, consider the following models:
class Attribute(models.Model):
    type = models.CharField()
    value = models.CharField()
class Vehicle(models.Model):
    attribute = models.ManyToManyField(Attribute)
As noted before, this is a general idea that lets you add as many attributes to each vehicle as you want.
If you want a specific set of attributes to be available to the user, you can use choices in the Attribute.type field.
ATTRIBUTE_CHOICES = (
    ('1', 'Permit'),
    ('2', 'Manufacturer'),
)
class Attribute(models.Model):
    # The stored keys are strings because this is a CharField
    type = models.CharField(max_length=1, choices=ATTRIBUTE_CHOICES)
    value = models.CharField()
Now, perhaps you would want each sort of vehicle to have its own set of available attributes. This can be done by adding yet another model and setting foreign key relations from both the Vehicle and Attribute models to it.
class VehicleType(models.Model):
    name = models.CharField()
class Attribute(models.Model):
    vehicle_type = models.ForeignKey(VehicleType)
    type = models.CharField()
    value = models.CharField()
class Vehicle(models.Model):
    vehicle_type = models.ForeignKey(VehicleType)
    attribute = models.ManyToManyField(Attribute)
This way you have a clear picture of how each attribute relates to some vehicle.
Forms
Basically, with this database design, you would require two forms for adding objects into the database. Specifically a model form for a vehicle and a model formset for attributes. You could use jQuery to dynamically add more items on the Attribute formset.
Note
You could also separate Attribute class to AttributeType and AttributeValue so you don't have redundant attribute types stored in your database or if you want to limit the attribute choices for the user but keep the ability to add more types with Django admin site.
To be totally cool, you could use autocomplete on your form to suggest existing attribute types to the user.
Hint: learn more about database normalization.
Other solutions
As suggested in the previous answer by Stuart Marsh
On the other hand, you could hard code your models for each vehicle type, so that each vehicle type is represented by a subclass of the base vehicle and each subclass can have its own specific attributes, but that solution is not very flexible (if you require flexibility).
You could also keep a JSON representation of additional object attributes in one database field, but I am not sure this would be helpful when querying attributes.
Here is my simple test in the Django shell. I just typed it in and it seems to work fine:
In [25]: attributes = {
    "__module__": "lekhoni.models",
    "name": models.CharField(max_length=100),
    "address": models.CharField(max_length=100),
}
In [26]: Person = type('Person', (models.Model,), attributes)
In [27]: Person
Out[27]: <class 'lekhoni.models.Person'>
In [28]: p1= Person()
In [29]: p1.name= 'manir'
In [30]: p1.save()
In [31]: Person.objects.a
Person.objects.aggregate Person.objects.all Person.objects.annotate
In [32]: Person.objects.all()
Out[33]: [<Person: Person object>]
It seems very simple, so I am not sure why it should not be considered an option. Reflection is very common in other languages like C# or Java. Anyway, I am very new to Django things.
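The type() call is plain Python and has nothing Django-specific about it. Here is a minimal sketch of the same trick without Django, so it only shows how the class itself is built and says nothing about creating the database table, which is where the real work is. Base, build_class and the field names are made up for illustration:

```python
class Base:
    """Hypothetical base class shared by all dynamically built classes."""

    field_names = ()

    def describe(self):
        fields = ", ".join(sorted(self.field_names))
        return f"{type(self).__name__}({fields})"


def build_class(name, field_names):
    """Build a new subclass of Base at run time using type()."""
    attrs = {"field_names": tuple(field_names)}
    for field in field_names:
        attrs[field] = None  # class-level default for each "field"
    return type(name, (Base,), attrs)


Person = build_class("Person", ["name", "address"])
p1 = Person()
p1.name = "manir"
print(Person.__name__, p1.name, p1.address)
print(p1.describe())
```

type(name, bases, attrs) is what a class statement does behind the scenes, so the resulting class behaves like any hand-written one.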
Are you talking about in a front end interface, or in the Django admin?
You can't create real fields on the fly like that without a lot of work under the hood. Each model and field in Django has an associated table and column in the database. Adding new fields usually requires either raw SQL or migrations using South.
From a front end interface, you could create pseudo fields, and store them in JSON format in a single model field.
For example, create an other_data text field in the model. Then allow users to create fields, and store them like {'userfield':'userdata','mileage':54}
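A minimal sketch of the round trip, using only the standard json module. The helper names dump_extra and load_extra are made up, and other_data here is just a string standing in for the text field's value:

```python
import json


def dump_extra(fields):
    """Serialize user-defined fields to a JSON string for the text column."""
    return json.dumps(fields, sort_keys=True)


def load_extra(text):
    """Read the JSON string back into a dict; an empty column means no fields."""
    return json.loads(text) if text else {}


other_data = dump_extra({"userfield": "userdata", "mileage": 54})
restored = load_extra(other_data)
print(other_data)
print(restored["mileage"])
```

Note that json.dumps writes double quotes, so the stored text is valid JSON and the numeric value comes back as an int.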
But I think if you're using a finite class like vehicles, you would create a base model with the basic vehicle characteristics, and then create models that inherit from the base model for each of the vehicle types.
class base_vehicle(models.Model):
    color = models.CharField()
    owner_name = models.CharField()
    cost = models.DecimalField()
class car(base_vehicle):
    mileage = models.IntegerField(default=0)
# etc.

How to store arbitrary name/value key pairs in a Django model?

I have a fixed data model that has a lot of data fields.
class Widget(models.Model):
    widget_owner = models.ForeignKey(auth.User)
    val1 = models.CharField()
    val2 = models.CharField()
    ...
    val568 = ...
I want to cram even more data into this Widget by letting my users specify custom data fields. What's a sane way to do this? Is storing name/value pairs where the user can specify additional "Widget fields" a good idea? My pseudo thoughts are below:
data_types = ('free_text', 'date', 'integer', 'price')
class CustomWidgetField(models.Model):
    owner = ForeignKey(auth.User)
    field_title = models.CharField()
    field_value_type = models.CharField(choices = data_types)
class CustomWidgetValue(models.Model):
    field_type = ForeignKey(CustomWidgetField)
    widget = ForeignKey(Widget)
    value = models.TextField()
So I want to let each user build a new type of data field that will apply to all of their widgets and then specify values for each custom field in each widget. I will probably have to do filtering/searching on these custom fields just as I would on a native field (which I assume will be much slower than operating on native fields.) But the scale is to have a few dozen custom fields per Widget and each User will only have a few thousand Widgets in their inventory. I can also probably batch most of the searching/filtering on the custom fields into a backend script (maybe.)
Consider representing all custom properties with a serialized dict. I used this in a recent project and it worked really well.
class Widget(models.Model):
    owner = models.ForeignKey(auth.User)
    props = models.TextField(blank=True) # serialized custom data
    @property
    def props_dict(self):
        return simplejson.loads(self.props)
class UserProfile(models.Model):
    user = models.ForeignKey(auth.User)
    widget_fields = models.TextField(blank=True) # serialized schema declaration
It looks like you've reinvented the triple store. I think it's a common thing, as we follow the idea of database flexibility to its natural conclusion. Triple stores tend to be fairly inefficient in relational database systems, but there are systems designed specifically for them.
http://en.wikipedia.org/wiki/Triplestore
At the scales you're talking about, your performance is likely to be acceptable, but they don't generally scale well without a specialized DB.
In my opinion, the best way to achieve this sort of completely extensible model is really with EAV (Entity, Attribute, Value). It's basically a way to bring a schemaless, non-relational database to SQL. You can read a bunch more about it on Wikipedia, http://en.wikipedia.org/wiki/Entity-attribute-value_model, but one of the better implementations of it in Django is from the EveryBlock codebase. Hope it's a help!
http://github.com/brosner/everyblock_code/blob/master/ebpub/ebpub/db/models.py
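To give a feel for what filtering on EAV data involves, here is a rough sketch using the standard sqlite3 module and an in-memory database. It is not taken from the EveryBlock code. The tables loosely mirror the CustomWidgetField / CustomWidgetValue idea from the question, and all names and sample rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widget (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE custom_field (
        id INTEGER PRIMARY KEY, title TEXT, value_type TEXT
    );
    -- one row per (widget, field) pair; every value is stored as text
    CREATE TABLE custom_value (
        widget_id INTEGER REFERENCES widget(id),
        field_id INTEGER REFERENCES custom_field(id),
        value TEXT
    );
    INSERT INTO widget VALUES (1, 'sprocket'), (2, 'gear'), (3, 'cog');
    INSERT INTO custom_field VALUES
        (1, 'colour', 'free_text'), (2, 'weight', 'integer');
    INSERT INTO custom_value VALUES
        (1, 1, 'red'), (1, 2, '12'),
        (2, 1, 'blue'), (2, 2, '7'),
        (3, 1, 'red'), (3, 2, '100');
""")

JOINS = """
    SELECT widget.name FROM widget
    JOIN custom_value ON custom_value.widget_id = widget.id
    JOIN custom_field ON custom_field.id = custom_value.field_id
"""


def widgets_where(title, value):
    """Widgets whose custom field `title` equals `value` (text comparison)."""
    sql = JOINS + "WHERE custom_field.title = ? AND custom_value.value = ? ORDER BY widget.id"
    return [name for (name,) in conn.execute(sql, (title, value))]


def widgets_heavier_than(limit):
    """Numeric filter: the text value has to be cast before comparing."""
    sql = JOINS + (
        "WHERE custom_field.title = 'weight' "
        "AND CAST(custom_value.value AS INTEGER) > ? ORDER BY widget.id"
    )
    return [name for (name,) in conn.execute(sql, (limit,))]


print(widgets_where("colour", "red"))
print(widgets_heavier_than(10))
```

Every filter costs two joins, and typed comparisons need a cast because all values live in one text column. That is the main price of EAV compared with native columns.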
http://github.com/tuttle/django-expando may be of interest to you.
When I had an object that could be completely customized by users, I created a field on the model that would contain some JSON in the column. Then you can just serialize back and forth when you need to use it or save it.
However, it does make it harder to use the data in SQL queries.