Using SQL SEQUENCEs with Django

I need to generate numbers that are unique and in sequence (for easy traceability). Also, they should be as gapless as possible (for decent non-repudiation).
SQL SEQUENCEs are the obvious answer. What's the cleanest way to use them with Django 1.8 and MySQL? There doesn't seem to be a good way to use SEQUENCEs with Django.

Since MySQL doesn't have separate sequences, the only clear way is to create a new table with an auto-incrementing ID.
Thinking about the design in a super relational way, I am also able to store other related data in the new table.
That data is otherwise stored redundantly in other tables (redundantly, because some of those tables exist only for logging). However, storing it in a more organised way should give better traceability and a smaller database.
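A minimal sketch of that approach (the model and field names below are illustrative assumptions, not something prescribed by Django):

    from django.db import models

    class ReceiptNumber(models.Model):
        """Each row's AUTO_INCREMENT primary key is the next sequence value."""
        id = models.AutoField(primary_key=True)  # explicit, though Django adds this by default
        # "Other related data" that would otherwise be stored redundantly elsewhere:
        issued_at = models.DateTimeField(auto_now_add=True)
        reference = models.CharField(max_length=100, blank=True)

    def next_receipt_number(**extra):
        # Allocating a number is just inserting a row; MySQL assigns the ID.
        return ReceiptNumber.objects.create(**extra).id

Bear in mind that MySQL does not reuse AUTO_INCREMENT values from rolled-back inserts, so the numbers are unique and increasing but only "as gapless as possible", not strictly gapless.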

Related

Alternatives to dynamically creating model fields

I'm trying to build a web application where users can upload a file (specifically the MDF file format) and view the data in the form of various charts. The files can contain any number of time-based signals (of various numeric data types), and users may name the signals arbitrarily.
My thought on saving the data involves 2 steps:
1. Maintain a master table as an index, to save meta information such as file names, who uploaded it, when, etc. Records (rows) are added each time a new file is uploaded.
2. Create a new table (I'll refer to these as data tables) for each file uploaded; within each table, one column will hold one signal (the first column being timestamps).
This brings up the problem that I can't pre-define the Model for the data tables, because the number, names, and datatypes of the fields will differ across virtually all uploaded files.
I'm aware of some libs that help to build runtime dynamic models but they're all dated and questions about them on SO basically get zero answers. So despite the effort to make it work, I'm not even sure my approach is the optimal way to do what I want to do.
I also came across this Postgres-specific model field which can take nested arrays (which I believe fits the 2-D time-based signal lists). In theory I could parse the raw uploaded file, construct such an array, and basically save all the data in one field. Not knowing the size limit of the data, this could also be a nightmare for queries later on, since creating the charts usually takes only a few columns of signals at a time, out of a total of up to hundreds of signals.
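A rough sketch of what I imagine that option would look like (the field names are placeholders, and it needs django.contrib.postgres):

    from django.contrib.postgres.fields import ArrayField
    from django.db import models

    class UploadedFile(models.Model):
        name = models.CharField(max_length=255)
        uploaded_by = models.CharField(max_length=100)
        uploaded_at = models.DateTimeField(auto_now_add=True)
        # The whole 2-D signal matrix in one field: a list of rows,
        # each row being [timestamp, signal1, signal2, ...]
        data = ArrayField(ArrayField(models.FloatField()))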
So my question is:
Is there a better way to organize the storage of data? And how?
Any insight is greatly appreciated!
If the name, number, and datatypes of the fields will differ for each user, then you do not need an ORM. What you need is a query builder or SQL string composition, like Psycopg. You will be programmatically creating a table for each combination of user and uploaded file (if they are different) and programmatically inserting the records.
Using PostgreSQL might be a good choice; you could also create a GIN index on the arrays to speed up queries.
However, if you are primarily working with time-series data, then using a time-series database like InfluxDB or Prometheus makes more sense.
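As a rough sketch of that programmatic table creation, using psycopg2's sql composition module (psycopg2 >= 2.7; the column layout and names here are hypothetical):

    import psycopg2
    from psycopg2 import sql

    def create_data_table(conn, table_name, signal_names):
        """Create one 'data table' per uploaded file, one column per signal."""
        columns = [
            sql.SQL("{} double precision").format(sql.Identifier(name))
            for name in signal_names
        ]
        query = sql.SQL("CREATE TABLE {} (ts timestamptz, {})").format(
            sql.Identifier(table_name),
            sql.SQL(", ").join(columns),
        )
        with conn.cursor() as cur:
            cur.execute(query)
        conn.commit()

    # Usage (connection parameters are placeholders):
    # conn = psycopg2.connect("dbname=mdf user=app")
    # create_data_table(conn, "file_42_data", ["engine_rpm", "vehicle_speed"])

Using sql.Identifier keeps the user-supplied file and signal names safely quoted.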

Pythonic and SQLAlchemy best way to map data that requires multiple lookups

I'm learning SQLAlchemy and using a database where multiple table lookups are needed to find a single piece of data.
I'm trying to find the best (most efficient and Pythonic) way to map the multiple lookups to a single SQLAlchemy object or reusable python method.
Ultimately, there will be dozens if not hundreds of mapped objects such as these, so something like a .map file might be handy.
I.e. (Using pseudocode)
If I want to find the data 'Status' starting from 'Patient Name', I have to use three tables.
Instead of writing a function for every potential 'this' to 'that' data request, is there an SQLAlchemy or Pythonic way to make the mappings?
I CAN make new, temporary SQLAlchemy Tables to store data. I am NOT at liberty to change the database I'm reading from. I'm hoping to reduce the number of individual calls to the database, because it is remote and slow.
I'm not sure a data join will work, because the Primary Keys, Foreign Keys and Column names are inconsistent in the database. But, I don't really know how to make select-joins in SQLAlchemy.
Perhaps I need to create a new table, with relationships to those three previous tables? But I'm not understanding the relationships well.
Can these tables be auto-generated from a map.ini file?
EDIT:
I might add that some of these relationships could be one-to-many. I.e. a patient may be associated with more than one statusID... such as...
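(For illustration, a rough sketch of what an explicit select-join across three such tables might look like in SQLAlchemy Core, classic pre-1.4 syntax; the table, column, and key names are all invented.)

    from sqlalchemy import create_engine, MetaData, Table, select

    engine = create_engine("postgresql://user:pass@remote-host/db")  # placeholder URL
    metadata = MetaData()

    # Reflect the existing (read-only) tables instead of redefining them:
    patient = Table("patient", metadata, autoload=True, autoload_with=engine)
    patient_status = Table("patient_status", metadata, autoload=True, autoload_with=engine)
    status = Table("status", metadata, autoload=True, autoload_with=engine)

    # Spell out the join conditions, since the key/column names are inconsistent:
    joined = patient.join(
        patient_status, patient.c.patient_id == patient_status.c.pat_id
    ).join(
        status, patient_status.c.status_ref == status.c.status_id
    )

    stmt = (
        select([patient.c.name, status.c.status_text])
        .select_from(joined)
        .where(patient.c.name == "Some Patient")
    )

    with engine.connect() as conn:
        rows = conn.execute(stmt).fetchall()

A single round trip returns the joined result, which matters with a slow remote database.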

Container for in-memory representation of a DB table

Let's say I have a (MySQL) DB. I want to automate the update of this database via an application that will:
1. Import from DB
2. Calculate updated data
3. Export back updated data
The timing is important: I don't want to import while calculating; in fact, I don't want any queries running then. I want to import the table(s) as a whole, then calculate. So, my question is: if a row is represented by an instance of a class, then what container do I put these objects into?
A vector? A set? What about ordered vs. unordered? Just use what seems best for my case according to big O times? Any special traps to fall into here? Is this case no different than with data "born in memory", so the only things to consider besides size overhead are "do I want the lookup or the insertion to be faster" ?
Probably the best route is to use some ORM, but let's say I don't want to.
I've seen some apps use boost::unordered_set, and I wondered if there is a particular reason for its use...
I use a jdbc-like interface as the connector (libmysqlcpp).
I do not think the container you should use can be guessed from so little information. It mainly depends on the data size, the data types, and the algorithm you will run.
But my main concern with such a design is that it will quickly choke your network or your database. If you have a big table you'll:
select all the data from the table
retrieve all the data over the network
process, on your machine, part of the data (some columns?) or all of it
push the data over the network
update your rows (or erase/replace maybe)
Why don't you consider working directly on the MySQL server? You could create a user-defined function that works directly on the data, saving the network traffic and even taking advantage of the fact that MySQL is built to handle gigantic amounts of data, quantities that an in-memory container is not built to handle.

Django and Oracle nested table support

Can Django support Oracle nested tables or varrays or collections in some manner? Asking just for completeness as our project is reworking the data model, attempting to move away from EAV organization, but I don't like creating a bucket load of dependent supporting tables for each main entity.
e.g.
(not the proper Oracle syntax, but gets the idea across)
Events
eventid
report_id
result_tuple (result_type_id, result_value)
anomaly_tuple(anomaly_type_id, anomaly_value)
contributing_factors_tuple(cf_type_id, cf_value)
etc,
where there can be multiple rows of the tuples for one eventid
Each of these tuples can, of course, exist as a separate table, but this seems more concise. If it's something Django can't do, or something I can't easily modify the model classes to do, then perhaps just having Django create the extra tables is the way to go.
--edit--
I note that django-hstore is doing something very similar to what I want to do, but using postgresql's hstore capability. Maybe I can branch off of that for an Oracle nested table implementation. I dunno...I'm pretty new to python and django, so my reach may exceed my grasp in this case.
Querying a nested table gives you a cursor to traverse the tuples, one member of which is yet another cursor, so you can get the rows from the nested table.
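A hedged illustration of that cursor-of-cursors access pattern, using cx_Oracle directly with a CURSOR() subquery expression (the table and column names are invented, and a true nested-table column may come back differently depending on the driver version):

    import cx_Oracle

    conn = cx_Oracle.connect("user/password@host/service")  # placeholder credentials
    cur = conn.cursor()
    cur.execute("""
        SELECT e.eventid,
               CURSOR(SELECT r.result_type_id, r.result_value
                        FROM event_results r
                       WHERE r.eventid = e.eventid) AS results
          FROM events e
    """)
    for eventid, results in cur:
        # "results" is itself a cursor over the nested rows
        for result_type_id, result_value in results:
            print(eventid, result_type_id, result_value)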

Data browsing/querying tool for loosely structured data

I have a set of statistical data (about 100M in size), organized in key-value pairs; some of the values are plain numbers (e.g. a person's age or weight) and some are hierarchical (e.g. a person's employments: there can be a set of employment records, each again containing key/value pairs, etc.). The real data is not exactly this, but the structure is similar.
I need to query these data with an arbitrary set of criteria - i.e. I may want to ask something like "where did the 20 oldest people work 3 years ago" or "what is the sum of all salaries for all people that ever worked at company X for more than a year", or "give me all you know about people that found a new job recently", etc.
I can program each individual query pretty easily but since there can be many of them and they vary all the time it becomes tedious to program each one anew, so the question is if there's an existing tool that would make it easier for me to do such queries (if it has a nice GUI that's a bonus :). Something like SQL wouldn't work well because data fields aren't really fixed and making hierarchy work in SQL would be too much trouble IMHO. So is there a tool that I could use with relative ease for this task (i.e. not learning a whole new language for that - I'd better stay with hand-coding the queries then)?
You may want to look at MongoDB. It is a JSON document store, so it essentially works with key/value pairs, and you can nest documents within documents. The mongo shell uses JavaScript, and queries are expressed as JSON-like documents. Of course, you'd need to convert your data to JSON, but this is not difficult.
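For example, the salary question above might look like this with pymongo (collection and field names are invented):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder connection
    people = client["stats"]["people"]

    # "Sum of all salaries for people who worked at company X for more than a year"
    pipeline = [
        {"$unwind": "$employments"},
        {"$match": {"employments.company": "Company X",
                    "employments.years": {"$gt": 1}}},
        {"$group": {"_id": None, "total_salary": {"$sum": "$employments.salary"}}},
    ]
    total = list(people.aggregate(pipeline))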
Another option may be a graph database like Neo4j. Each record is a node and you can define relationships between nodes (visualized as edges).
I do not think either of these have any type of GUI, but they are pretty easy to query. MongoDB uses JS with bindings you can use to call the DB. Neo4j uses Java, but there are some bindings for other languages.
SQL queries would be challenging, but they would work. I will also throw in PostgreSQL as an option since it is somewhat object-oriented, but I am more familiar with the others.