Pythonic and SQLAlchemy best way to map data that requires multiple lookups - python-2.7

I'm learning SQLAlchemy and using a database where multiple table lookups are needed to find a single piece of data.
I'm trying to find the best (most efficient and Pythonic) way to map the multiple lookups to a single SQLAlchemy object or reusable python method.
Ultimately, there will be dozens if not hundreds of mapped objects such as these, so something like a .map file might be handy.
For example (in pseudocode): if I want to find the data 'Status' starting from 'Patient Name', I have to use three tables.
Instead of writing a function for every potential 'this' to 'that' data request, is there an SQLAlchemy or Pythonic way to make the mappings?
I CAN make new, temporary SQLAlchemy Tables to store data. I am NOT at liberty to change the database I'm reading from. I'm hoping to reduce the number of individual calls to the database, because it is remote and slow.
I'm not sure a plain join will work, because the primary keys, foreign keys, and column names are inconsistent across the database. But I don't really know how to make select-joins in SQLAlchemy.
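For what it's worth, a select-join in SQLAlchemy doesn't need consistent key names or declared foreign keys; you can spell out each ON clause yourself. A minimal sketch, with Patient, Visit, and Status as hypothetical stand-ins for the three real tables (the deliberately inconsistent column names are invented):

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Patient(Base):
    __tablename__ = 'patient'
    patient_id = Column(Integer, primary_key=True)
    patient_name = Column(String)

class Visit(Base):
    __tablename__ = 'visit'
    id = Column(Integer, primary_key=True)
    visit_patient_no = Column(Integer)  # refers to patient.patient_id
    statusID = Column(Integer)          # refers to status.pk

class Status(Base):
    __tablename__ = 'status'
    pk = Column(Integer, primary_key=True)
    status_text = Column(String)

engine = create_engine('sqlite://')  # stand-in for the real (remote) database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# One round trip for the whole lookup chain; each ON clause is written
# out explicitly because the schema declares no usable foreign keys.
statuses = (
    session.query(Status.status_text)
    .select_from(Patient)
    .join(Visit, Visit.visit_patient_no == Patient.patient_id)
    .join(Status, Status.pk == Visit.statusID)
    .filter(Patient.patient_name == 'John Doe')
    .all()
)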
Perhaps I need to create a new table, with relationships to those three previous tables? But I'm not understanding the relationships well.
Can these tables be auto-generated from a map.ini file?
EDIT:
I might add that some of these relationships could be one-to-many. E.g. a patient may be associated with more than one statusID...such as...
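On the one-to-many point: relationship() accepts an explicit primaryjoin (with the foreign() annotation) when the database declares no real foreign keys, so nothing in the source schema has to change. A sketch that replaces the bare Patient class from the example above, reusing the hypothetical Visit class:

from sqlalchemy.orm import relationship

class Patient(Base):
    __tablename__ = 'patient'
    patient_id = Column(Integer, primary_key=True)
    patient_name = Column(String)
    # One patient -> many visits (and hence possibly several statusIDs).
    # foreign() tells SQLAlchemy which side to treat as the foreign key,
    # since the database itself declares none.
    visits = relationship(
        'Visit',
        primaryjoin='foreign(Visit.visit_patient_no) == Patient.patient_id',
        viewonly=True,
    )

patient = session.query(Patient).filter_by(patient_name='John Doe').one()
status_ids = [v.statusID for v in patient.visits]  # may hold several values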

Related

Alternatives to dynamically creating model fields

I'm trying to build a web application where users can upload a file (specifically the MDF file format) and view the data in the form of various charts. The files can contain any number of time-based signals (of various numeric data types), and users may name the signals arbitrarily.
My thought on saving the data involves 2 steps:
Maintain a master table as an index, storing meta information such as file names, who uploaded them, when, etc. A record (row) is added each time a new file is uploaded.
Create a new table (I'll refer to these as data tables) for each file uploaded; within each table, every column holds one signal (the first column being timestamps).
This brings the problem that I can't pre-define the Model for the data tables because the number, name, and datatype of the fields will differ among virtually all uploaded files.
I'm aware of some libs that help to build runtime dynamic models but they're all dated and questions about them on SO basically get zero answers. So despite the effort to make it work, I'm not even sure my approach is the optimal way to do what I want to do.
I also came across this Postgres-specific model field which can take nested arrays (which I believe fits the 2-D time-based signal lists). In theory I could parse the raw uploaded file, construct such an array, and basically save all the data in one field. Not knowing the limit on the size of the data, this could also be a nightmare for queries later on, since creating the charts usually takes only a few columns of signals at a time, out of a total of up to hundreds of signals.
So my question is:
Is there a better way to organize the storage of data? And how?
Any insight is greatly appreciated!
If the name, number, and datatypes of the fields will differ for each user, then you do not need an ORM. What you need is a query builder or SQL string composition, like Psycopg's. You will be programmatically creating a table for each combination of user and uploaded file (if they differ) and programmatically inserting the records.
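A minimal sketch of that approach using psycopg2's sql module for safe composition (the DSN, table name, and signal names below are all made up):

import psycopg2
from psycopg2 import sql

conn = psycopg2.connect('dbname=mdf_store')  # hypothetical DSN
signal_names = ['timestamp', 'engine_rpm', 'veh_speed']  # parsed from the file

with conn, conn.cursor() as cur:
    # Identifiers cannot go through normal parameter binding, so
    # sql.Identifier quotes them safely instead.
    columns = [sql.SQL('{} double precision').format(sql.Identifier(name))
               for name in signal_names]
    cur.execute(sql.SQL('CREATE TABLE {} ({})').format(
        sql.Identifier('data_file_42'),  # one data table per uploaded file
        sql.SQL(', ').join(columns)))
    # Values still go through ordinary parameter binding.
    cur.execute(sql.SQL('INSERT INTO {} VALUES (%s, %s, %s)').format(
        sql.Identifier('data_file_42')), (0.0, 812.5, 0.0))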
Using PostgreSQL might be a good choice; you might also create a GIN index on the arrays to speed up queries.
However, if you are primarily working with time-series data, then using a time-series database like InfluxDB or Prometheus makes more sense.

Wrapping a Django query set in a raw query or vice versa?

I'm thinking of using a raw query to quickly get around limitations with either my brain or the Django ORM, but I don't want to redevelop the infrastructure required to support the existing ORM code such as filters. Right now I'm stuck with two dead ends:
Writing an inner raw query and reusing that like any other query set. Even though my raw query selects the correct columns, I can't filter on it:
AttributeError: 'RawQuerySet' object has no attribute 'filter'
This is corroborated by another answer, but I'm still hoping that that information is out of date.
Getting the SQL and parameters from the query set and wrapping that in a raw query. It seems the raw SQL should be retrievable using queryset.query.get_compiler(DEFAULT_DB_ALIAS).as_sql() - how would I get the parameters as well (obviously without actually running the query)?
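A sketch of that second approach, assuming a hypothetical Author model: as_sql() already returns a (sql, params) pair, and Query.sql_with_params() wraps the same call.

from django.db import DEFAULT_DB_ALIAS

qs = Author.objects.filter(name__startswith='A')  # Author is hypothetical

# Neither call executes the query; both return the SQL text plus the
# parameters that would be bound into it.
sql_text, params = qs.query.get_compiler(DEFAULT_DB_ALIAS).as_sql()
sql_text, params = qs.query.sql_with_params()

# Wrap the compiled queryset SQL inside a raw query as a subquery.
wrapped = Author.objects.raw('SELECT * FROM (' + sql_text + ') sub', params)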
One option for dealing with complex queries is to write a VIEW that encapsulates the query, and then stick a model in front of that. You will still be able to filter (and depending upon your view, you may even get push-down of parameters to improve query performance).
All you need to do to get a model that is backed by a view is mark it as "unmanaged", and then have the view created by a migration operation.
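A sketch of that pattern; the app label, view name, columns, and SQL are illustrative only:

# models.py
from django.db import models

class ReportRow(models.Model):
    patient_id = models.IntegerField(primary_key=True)
    patient_name = models.TextField()
    status = models.TextField()

    class Meta:
        managed = False          # Django never creates or drops this "table"
        db_table = 'report_view'

# migrations/0002_create_report_view.py (illustrative)
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [('reports', '0001_initial')]
    operations = [
        migrations.RunSQL(
            'CREATE VIEW report_view AS '
            'SELECT p.id AS patient_id, p.name AS patient_name, s.text AS status '
            'FROM patient p JOIN status s ON s.patient_id = p.id',
            'DROP VIEW report_view',
        ),
    ]

After that, ReportRow.objects.filter(status='active') works like any other queryset.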
It's better to write a QuerySet if you can, but at times it is not possible (because you are using something that cannot be expressed through the ORM, for instance, or you need to do something like a LATERAL JOIN).

Django Postgres ArrayField vs One-to-Many relationship

For a model in my database I need to store around 300 values for a specific field. What would be the drawbacks, in terms of performance and simplicity in query, if I use Postgres-specific ArrayField instead of a separate table with One-to-Many relationship?
If you use an array field
The size of each row in your DB is going to be a bit large, so Postgres is going to make much heavier use of TOAST tables (http://www.postgresql.org/docs/9.5/static/storage-toast.html).
Every time you fetch the row, unless you specifically defer (https://docs.djangoproject.com/en/1.9/ref/models/querysets/#defer) the field or otherwise exclude it from the query via only(), values(), or the like, you pay the cost of loading all those values every time you iterate over that row. If that's what you need, then so be it.
Filtering based on values in that array, while possible, isn't going to be as nice, and the Django ORM doesn't make it as obvious as it does for M2M tables.
If you use M2M
You can filter more easily on those related values
Those related values are loaded lazily by default; you can use prefetch_related if you need them, and then get fancy if you want only a subset of those values loaded.
Total storage in the DB is going to be slightly higher with M2M because of the keys and extra id fields.
The cost of the joins in this case is completely negligible, because the keys are indexed.
Personally I'd say go with the M2M tables, but I don't know your specific application. If you're going to be working with a massive amount of data it's likely worth grabbing a representative dataset and testing both methods with it.
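For concreteness, a sketch of both shapes and the queries they lead to (all model and field names are invented):

from django.contrib.postgres.fields import ArrayField
from django.db import models

class ArrayReading(models.Model):
    # Option 1: all ~300 values inline in one row.
    values = ArrayField(models.FloatField())

class Reading(models.Model):
    # Option 2: a related table, one row per value.
    pass

class Sample(models.Model):
    reading = models.ForeignKey(Reading, related_name='samples',
                                on_delete=models.CASCADE)
    value = models.FloatField()

# Skip loading the big array column when it isn't needed:
light = ArrayReading.objects.defer('values')

# Array membership filtering works, but the lookups are terse:
hits = ArrayReading.objects.filter(values__contains=[42.0])

# The related-table version filters like any other join:
readings = Reading.objects.filter(samples__value__gt=42.0).distinct()

# ...and prefetch_related avoids N+1 queries when the values are needed:
for r in Reading.objects.prefetch_related('samples'):
    total = sum(s.value for s in r.samples.all())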

Django and Oracle nested table support

Can Django support Oracle nested tables or varrays or collections in some manner? Asking just for completeness as our project is reworking the data model, attempting to move away from EAV organization, but I don't like creating a bucket load of dependent supporting tables for each main entity.
e.g. (not proper Oracle syntax, but it gets the idea across):
Events
eventid
report_id
result_tuple (result_type_id, result_value)
anomaly_tuple(anomaly_type_id, anomaly_value)
contributing_factors_tuple(cf_type_id, cf_value)
etc.,
where there can be multiple rows of the tuples for one eventid.
Each of these tuples can, of course, exist as a separate table, but this seems more concise. If it's something Django can't do, or that I can't easily modify the model classes to do, then perhaps just having Django create the extra tables is the way to go.
--edit--
I note that django-hstore is doing something very similar to what I want to do, but using PostgreSQL's hstore capability. Maybe I can branch off of that for an Oracle nested table implementation. I dunno...I'm pretty new to Python and Django, so my reach may exceed my grasp in this case.
Querying a nested table gives you a cursor to traverse the tuples, one member of which is yet another cursor, so you can get the rows from the nested table.
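One way to reproduce that cursor-of-cursors traversal from Python is a CURSOR() expression with cx_Oracle; the connection string and the table/column names below are invented:

import cx_Oracle

conn = cx_Oracle.connect('user/password@host/service')  # placeholder DSN
cur = conn.cursor()

# Each outer row carries eventid plus a nested cursor of tuples.
cur.execute("""
    SELECT e.eventid,
           CURSOR(SELECT r.result_type_id, r.result_value
                    FROM results r
                   WHERE r.eventid = e.eventid)
      FROM events e
""")

results = []
for eventid, nested in cur:
    # 'nested' is itself a cx_Oracle cursor; iterate it for the tuple rows.
    for result_type_id, result_value in nested:
        results.append((eventid, result_type_id, result_value))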

Django - Raw SQL Queries - What Happens in Joins

I'm reading that I can use raw SQL in Django and have Django actually build my models from the results.
However I'm wondering what happens if I use joins in the raw SQL. How will Django know what models to use?
(Are there any other issues I should be aware of?)
It's not the joins that matter, but the column names. You could, for example, do the following:
SELECT table.id, other_table.name AS name FROM table JOIN other_table USING (id)
and pass that into your table model. Django would then treat the names from other_table as though they were names from table and give you normal table instances. I can't imagine why you would want to do that though...
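For instance, with a minimal hypothetical Table model matching the query above:

from django.db import models

class Table(models.Model):
    name = models.TextField()

rows = Table.objects.raw(
    'SELECT table.id, other_table.name AS name '
    'FROM table JOIN other_table USING (id)')

# Each object in rows is an ordinary Table instance; the aliased column
# from other_table is mapped onto .name as though it were Table's own.
names = [row.name for row in rows]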
The important thing to remember is that Django is using a very simple mapping from your SQL to its model structure. You can subvert it if you want, but you'll probably end up with some hard to maintain code.