SQLAlchemy Reflection Using Metaclass with Column Override - python-2.7

I have a set of dynamic database tables (Postgres 9.3 with PostGIS) that I am mapping using a python metaclass:
cls = type(str(tablename), (db.Model,), {'__tablename__':tablename})
where db.Model is the declarative base exposed by Flask-SQLAlchemy's db object and tablename is a unicode string.
The cls is then added to an application wide dictionary current_app.class_references (using Flask's current_app) to avoid attempts to instantiate the class multiple times.
Each table contains a geometry column, wkb_geometry stored in Well Known Binary. I want to map these to use geoalchemy2 with the final goal of retrieving GeoJSON.
If I was declaring the table a priori, I would use:
class GeoPoly():
    __tablename__ = 'somename'
    wkb_geometry = db.Column(Geometry("POLYGON"))
    # more columns...
Since I am trying to do this dynamically, I need to be able to override the reflection of cls with the known type.
Attempts:
Define the column explicitly, using the reflection override syntax.
cls = type(str(tablename), (db.Model,), {'__tablename__': tablename,
                                         'wkb_geometry': db.Column(Geometry("POLYGON"))})
which returns the following on a fresh restart, i.e. the class has not yet been instantiated:
InvalidRequestError: Table 'tablename' is already defined for this MetaData instance. Specify 'extend_existing=True' to redefine options and columns on an existing Table object
Use mixins with the class defined above (sans tablename):
cls = type(str(tablename), (GeoPoly, db.Model), {'__tablename__':tablename})
Again MetaData issues.
Override the column definition attribute after the class is instantiated:
cls = type(str(tablename), (db.Model,), {'__tablename__':tablename})
current_app.class_references[tablename] = cls
cls.wkb_geometry = db.Column(Geometry("POLYGON"))
Which results in:
InvalidRequestError: Implicitly combining column tablename.wkb_geometry with column tablename.wkb_geometry under attribute 'wkb_geometry'. Please configure one or more attributes for these same-named columns explicitly.
Is it possible to use the metadata construction to support dynamic reflection **and** override a column that is known to be available on all tables?

I'm not sure if I exactly follow what you're doing, but I've overridden reflected columns in the past inside my own __init__ method on a custom metaclass that inherits from DeclarativeMeta. Any time the new base class is used, it checks for a 'wkb_geometry' column name, and replaces it with (a copy of) the one you created.
import sqlalchemy as sa
from sqlalchemy.ext.declarative import DeclarativeMeta, declarative_base

wkb_geometry = db.Column(Geometry("POLYGON"))

class MyMeta(DeclarativeMeta):
    def __init__(cls, clsname, parents, dct):
        for key, val in dct.iteritems():
            if isinstance(val, sa.Column) and key == 'wkb_geometry':
                # swap the declared/reflected column for a copy of the one above
                dct[key] = wkb_geometry.copy()
        # let DeclarativeMeta set up the mapping as usual
        super(MyMeta, cls).__init__(clsname, parents, dct)

MyBase = declarative_base(metaclass=MyMeta)

cls = type(str(tablename), (MyBase,), {'__tablename__': tablename})
This may not exactly work for you, but it's an idea. You probably need to add db.Model to the MyBase tuple, for example.

This is what I use to customize a particular column while relying on autoload for everything else. The code below assumes an existing declarative Base object for a table named my_table. It loads the metadata for all columns but overrides the definition of a column named polygon:
class MyTable(Base):
    __tablename__ = 'my_table'
    __table_args__ = (
        Column('polygon', Geometry("POLYGON")),
        {'autoload': True},
    )
Other arguments to the Table constructor can be provided in the dictionary. Note that the dictionary must appear last in the tuple!
The SQLAlchemy documentation Using a Hybrid Approach with __table__ provides more details and examples.
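For reference, a minimal sketch of that hybrid approach, assuming reflection runs against an available engine (with Flask-SQLAlchemy that would typically be db.engine); the table and column names follow the original question, everything else is illustrative:
from sqlalchemy import Table, Column
from geoalchemy2 import Geometry

class GeoPoly(Base):
    # reflect all columns of the existing table, but declare wkb_geometry
    # explicitly so it is mapped with the GeoAlchemy2 type
    __table__ = Table(
        'somename', Base.metadata,
        Column('wkb_geometry', Geometry("POLYGON")),
        autoload=True,
        autoload_with=engine,
    )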

Related

Django add objects to Related Manager during creation

Consider a simple ForeignKey relationship:
class A(Model):
    pass

class B(Model):
    a = ForeignKey(A)
I have an API view that creates an A and a set of B's based on outside data (data NOT passed from the user), then serializes the created objects and returns the serialized data. My object creation code looks something like:
a = A()
a.b_set.bulk_create(B(a=a) for b in [...])
My issue is that this does not add the B objects to a's b_set, so that if I were to run
print(a.b_set.all())
afterwards, it would re-query the DB to get b_set. This is unnecessary though, because I already have a's entire b_set as I just created it. I'm doing this with a series of nested objects so it results in a LOT of unnecessary queries. My current workaround is to, after creation, run a query like
A.objects.prefetch_related('b_set').get(pk=a.id)
then serialize that fetched object. This limits serialization to just one unnecessary query, but I'd like to eliminate that one as well. It seems to me like there should be a way to cache the created B objects on a, and eliminate
any need to hit the DB again during serialization.
I believe you need to first execute a.save() before you can bulk_create. Here are my results using the two models you described:
>>> a = A()
>>> a.save()
>>> a.b_set.bulk_create([B(a=a), B(a=a), B(a=a)])
>>> a.b_set.count()
3
After some investigation into the QuerySet and Model source code, I decided my best/only option was to directly modify _prefetched_objects_cache on each object. Definitely not pretty but it works. Here's the gist of what I did:
a = A()
b_set = a.b_set.bulk_create(B(a=a) for b in [...])
a._prefetched_objects_cache = {}
a._prefetched_objects_cache['b_set'] = b_set
This ensures that all the created B's are cached on a. Note that if B has an auto-created primary key field, those fields won't be populated in the objects returned by bulk_create with most backends. Fortunately I'm using PostgreSQL which returns auto-PKs from bulk_create so that's not a problem for me.
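A small usage sketch of that workaround; whether the related manager actually picks up the cache depends on Django internals (the answer above reports it working on a PostgreSQL backend), and the models are the A/B pair from the question:
a = A.objects.create()
created = a.b_set.bulk_create([B(a=a) for _ in range(3)])

# prime the prefetch cache by hand with the objects we just created
a._prefetched_objects_cache = {'b_set': created}

# a serializer that iterates a.b_set.all() should now reuse the cached list
# instead of issuing another query
for b in a.b_set.all():
    print(b.pk)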

ndb verify entity uniqueness in transaction

I've been trying to create entities with a property which should be unique or None, something like this:
class Thing(ndb.Model):
    something = ndb.StringProperty()
    unique_value = ndb.StringProperty()
Since ndb has no way to specify that a property should be unique it is only natural that I do this manually like this:
def validate_unique(the_thing):
    if the_thing.unique_value and Thing.query(Thing.unique_value == the_thing.unique_value).get():
        raise NotUniqueException
This works like a charm until I want to do this in an ndb transaction which I use for creating/updating entities. Like:
@ndb.transactional
def create(the_thing):
    validate_unique(the_thing)
    the_thing.put()
However, inside a transaction ndb only allows ancestor queries, and the problem is that my model does not have an ancestor/parent. I could do the following to prevent this error from popping up:
@ndb.non_transactional
def validate_unique(the_thing):
    ...
This feels a bit out of place, declaring something to be a transaction and then having one (important) part being done outside of the transaction. I'd like to know if this is the way to go or if there is a (better) alternative.
Also some explanation as to why ndb only allows ancestor queries would be nice.
Since your uniqueness check involves a (global) query, it is subject to the datastore's eventual consistency, which means it won't work reliably: the query might not detect freshly created entities.
One option would be to switch to an ancestor query (or some other strongly consistent method), if your expected usage allows such a data architecture; the datastore documentation on eventual consistency covers this in more detail.
Another option is to use an additional piece of data as a temporary cache, in which you'd store a list of all newly created entities for "a while" (giving them ample time to become visible in the global query) which you'd check in validate_unique() in addition to those from the query result. This would allow you to make the query outside the transaction and only enter the transaction if uniqueness is still possible, but the ultimate result is the manual check of the cache, inside the transaction (i.e. no query inside the transaction).
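A rough sketch of that second option, assuming a memcache-backed list of recently created values; the key name and the 60-second retention window are illustrative, and memcache eviction makes this best-effort rather than a hard guarantee:
from google.appengine.api import memcache

RECENT_KEY = 'recently_created_unique_values'  # hypothetical cache key

def remember_unique_value(value):
    recent = memcache.get(RECENT_KEY) or []
    memcache.set(RECENT_KEY, recent + [value], time=60)  # keep for "a while"

def create_if_unique(the_thing):
    # the global (eventually consistent) query stays outside the transaction
    if the_thing.unique_value and Thing.query(
            Thing.unique_value == the_thing.unique_value).get():
        raise NotUniqueException
    _create(the_thing)

@ndb.transactional
def _create(the_thing):
    # inside the transaction only the manual cache check remains, no query
    if the_thing.unique_value:
        if the_thing.unique_value in (memcache.get(RECENT_KEY) or []):
            raise NotUniqueException
        remember_unique_value(the_thing.unique_value)
    the_thing.put()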
A 3rd option exists (with some extra storage consumption as the price), based on the datastore's enforcement of unique entity IDs for a certain entity model with the same parent (or no parent at all). You could have a model like this:
class Unique(ndb.Model):  # will use the unique values as specified entity IDs!
    something = ndb.BooleanProperty(default=False)
which you'd use like this (the example uses a Unique parent key, which allows re-using the model for multiple properties with unique values; you can drop the parent altogether if you don't need it):
from google.appengine.api import memcache

@ndb.transactional(xg=True)  # cross-group: touches both the Unique group and the_thing's group
def create(the_thing):
    if the_thing.unique_value:
        parent_key = get_unique_parent_key()
        exists = Unique.get_by_id(the_thing.unique_value, parent=parent_key)
        if exists:
            raise NotUniqueException
        Unique(id=the_thing.unique_value, parent=parent_key).put()
    the_thing.put()

def get_unique_parent_key():
    parent_id = 'the_thing_unique_value'
    parent_key = memcache.get(parent_id)
    if not parent_key:
        parent = Unique.get_by_id(parent_id)
        if not parent:
            parent = Unique(id=parent_id)
            parent.put()
        parent_key = parent.key
        memcache.set(parent_id, parent_key)
    return parent_key

Updating derived values in SQLAlchemy

Usual sqlalchemy usage:
my_prop = Column("my_prop", Text)
I would like different semantics. Let's say an object has a set of fields (propA, propB, propC). I would like to maintain a database column which is derived from these fields (say, propA + propB + propC). I would like the column to be updated whenever any one of these fields is updated. Thank you.
Hybrid properties provide the functionality you are looking for. They allow you to write python properties that are usable in queries.
Here's how you might start if you wanted to have a name column and provide access to first and last name properties.
@hybrid_property
def first_name(self):
    # get the first name from the name column
    pass

@first_name.setter
def first_name(self, value):
    # update the name column with the first name replaced
    pass

@first_name.expression
def first_name(cls):
    # return a SQL expression that extracts the first name from the name column
    # (this is appropriate to be used in queries)
    pass
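To make that skeleton concrete, here is a minimal self-contained sketch assuming a single name column that stores "First Last"; the split-on-first-space rule and the PostgreSQL split_part() function are illustrative choices, not requirements:
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property

Base = declarative_base()

class Person(Base):
    __tablename__ = 'person'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.Text, nullable=False)

    @hybrid_property
    def first_name(self):
        # Python-side: everything before the first space
        return self.name.split(' ', 1)[0]

    @first_name.setter
    def first_name(self, value):
        # rewrite the stored name with the first part replaced
        rest = self.name.split(' ', 1)[1] if ' ' in self.name else ''
        self.name = (value + ' ' + rest).strip()

    @first_name.expression
    def first_name(cls):
        # SQL-side equivalent, usable in filters and ordering
        return sa.func.split_part(cls.name, ' ', 1)
A query such as session.query(Person).filter(Person.first_name == 'Alice') then compiles to a comparison against split_part(person.name, ' ', 1).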

Can I safely assume that Django models IDs are unique upon save()

I need to store matches in my database, and those matches already have a unique ID from the source they come from. For easier cross-referencing later, it is best for me to keep this ID:
match = Match(id=my8digitsid)
match.save()
However, incoming matches (not played yet) don't have an ID yet. Can I safely save my match as follows:
match = Match()
match.save()
And then, once the match is played, modify it as such:
match.id = my8digitsid
When I say safely, I mean whether or not that the default ID generated (auto-incremented I guess) is unique and won't have any conflicts with my self-made IDs.
Yes, you can be sure that the ORM will generate unique IDs, as described in the Django documentation. The database is the one calculating the new number.
If a model has an AutoField — an auto-incrementing primary key — then
that auto-incremented value will be calculated and saved as an
attribute on your object the first time you call save():
>>> b2 = Blog(name='Cheddar Talk', tagline='Thoughts on cheese.')
>>> b2.id     # Returns None, because b2 doesn't have an ID yet.
>>> b2.save()
>>> b2.id     # Returns the ID of your new object.
There's no way to tell what the value of an ID will be before you call save(), because that value is calculated by your database, not by Django.
For convenience, each model has an AutoField named id by default
unless you explicitly specify primary_key=True on a field in your
model.
You can also provide the ID yourself if you want. I copy below the relevant info from the Django documentation.
Explicitly specifying auto-primary-key values
If a model has an AutoField but you want to define a new object's ID explicitly when saving, just define it explicitly before saving, rather than relying on the auto-assignment of the ID:
>>> b3 = Blog(id=3, name='Cheddar Talk', tagline='Thoughts on cheese.')
>>> b3.id # Returns 3.
>>> b3.save()
>>> b3.id # Returns 3.
If you assign auto-primary-key values manually, make sure not to use
an already-existing primary-key value! If you create a new object with
an explicit primary-key value that already exists in the database,
Django will assume you’re changing the existing record rather than
creating a new one.
Given the above 'Cheddar Talk' blog example, this example would
override the previous record in the database:
b4 = Blog(id=3, name='Not Cheddar', tagline='Anything but cheese.')
b4.save() # Overrides the previous blog with ID=3!
But I don't recommend assigning that ID yourself. I think it is more convenient to create a separate model field to store the ID from the source the matches come from.
The reason why I don't recommend this is that you would always have to verify that the provided ID has not been used before inserting it. As a general rule I try to avoid modifying the standard behaviour of Django as much as possible.
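A sketch of that recommendation, keeping Django's auto primary key and putting the externally assigned match ID in its own field; the field name external_id and the BigIntegerField type are illustrative:
from django.db import models

class Match(models.Model):
    # Django still adds the usual auto-incrementing `id` primary key
    external_id = models.BigIntegerField(null=True, blank=True, unique=True)

# before the match is played there is no external ID yet
match = Match.objects.create()

# once it has been played, attach the 8-digit ID from the source
match.external_id = my8digitsid
match.save()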

Why do you have to set both id and pk to None to copy a Django model with inheritance?

The Django docs recommend copying a model instance thus:
original.pk = None
original.save()
But if you "use inheritance" -- apparently meaning if the model's class is a subclass of a subclass of models.Model -- you need to do it slightly differently.
Specifically, the doc says:
Due to how inheritance works, you have to set both pk and id to None:
and gives an example analogous to this:
original.pk = None
original.id = None
original.save()
This seems kludgey. In any case, I'd like to understand what's going on. Why does using inheritance require you to set the id field to None also? Don't all Django models inherit from models.Model in any case?
(NOTE: I'm omitting the bit from the doc about copying m2m fields, which incidentally seems even more kludgey.)
It's because MTI (Multiple Table Inheritance), the type you're talking about here, stores the object across multiple tables. Take this example:
class Animal(models.Model):
    ...

class Dog(Animal):
    ...
When you create a Dog, all the fields on Animal are saved into the table for Animal, and just the fields directly on Dog are saved to the table for Dog. When you lookup the Dog later, Django queries both tables and stitches them together.
Both tables, however, need primary keys, and Django uses AutoFields for that, which are simply positive integer fields. So Dog has an id and Animal has an id. The pk is filled with the id for the Animal part because this is the main piece, and Dog's id doesn't matter. However, if you're going to make a copy, you need to copy both pieces; otherwise, the Animal part of the copy will not get its own Dog part.
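Put in code, copying a Dog therefore looks roughly like this (some_pk is a placeholder for the instance you want to duplicate):
dog = Dog.objects.get(pk=some_pk)

dog.pk = None   # on Dog, pk is the parent-link key into the Animal table
dog.id = None   # id is the AutoField inherited from Animal
dog.save()      # inserts a fresh row in both the Animal and Dog tables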