Django: Specifying Dynamic Database at Runtime

I'm attempting to set up a system in Django where I specify the database connection to use at runtime. I suspect I may need to go fairly low level, but I want to work within Django's idioms where possible, perhaps stretching them as far as they will go.
The general premise is that I have a centralised database that stores meta information about Datasets - but the actual datasets are created as dynamic models at runtime, in the database in question. I need to be able to specify which database to connect to at runtime in order to extract the data back out...
I have kind of the following idea:
db = {}
db['ENGINE'] = 'django.db.backends.postgresql'
db['OPTIONS'] = {'autocommit': True}
db['NAME'] = my_model_db['database']
db['PASSWORD'] = my_model_db['password']
db['USER'] = my_model_db['user']
db['HOST'] = my_model_db['host']
logger.info("Connecting to database {db} on {host}".format(db=db['NAME'], host=db['HOST']))
connections.databases['my_model_dynamic_db'] = db
DynamicObj.objects.using('my_model_dynamic_db').all()
Has anyone achieved this? And how?
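For illustration, the settings-building step above can be factored into a small helper. The function name and the shape of the stored metadata (`database`, `user`, `password`, `host`) are assumptions taken from the snippet in the question, not Django API:

```python
def build_db_config(meta):
    """Build a Django DATABASES-style dict from stored connection metadata.

    `meta` is assumed to hold the keys used in the question
    (database, user, password, host); adjust to your model's fields.
    """
    return {
        'ENGINE': 'django.db.backends.postgresql',
        'OPTIONS': {'autocommit': True},
        'NAME': meta['database'],
        'USER': meta['user'],
        'PASSWORD': meta['password'],
        'HOST': meta['host'],
        # Being explicit about PORT avoids surprises on some setups;
        # an empty string means "use the backend's default".
        'PORT': meta.get('port', ''),
    }

# The resulting dict can then be registered under an alias:
#   connections.databases['my_model_dynamic_db'] = build_db_config(row)
```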

Related

Django - how to copy one, specific model instance between two databases

I have an application that has a model called CalculationResults. I also have two databases with the same schema - let's call them "source" and "target". On the source database, I have a specific instance of CalculationResults that I want to migrate to the "target" database - preferably, I also want to change the field "owner" of this instance in the process. What is the easiest way to achieve this goal? The data I'm talking about is not huge, so it's rather a question of manpower vs computational power.
I have never done this, but I believe that following will work:
results = CalculationResults.objects.using('source').filter(field=search_field)
for result in results:
    result.owner = 'new owner'
    result.save(using='target')
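The read-mutate-save-elsewhere flow is easy to sanity-check outside the ORM. Below is a plain-Python sketch where dicts stand in for the two databases; none of the names are Django API. It also illustrates that the copied row keeps its primary key, just as `save(using='target')` does when the pk is still set:

```python
def copy_records(source, target, match, new_owner):
    """Copy records whose 'field' equals `match` from source to target,
    rewriting 'owner' on the way -- a stand-in for reading via
    .objects.using('source') and writing via .save(using='target').
    """
    copied = 0
    for pk, row in source.items():
        if row['field'] != match:
            continue
        new_row = dict(row)
        new_row['owner'] = new_owner
        # Like Model.save(using='target') with the pk still set:
        # an existing pk on target is overwritten, otherwise inserted.
        target[pk] = new_row
        copied += 1
    return copied
```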

SQL Server to Big Query DDL

I am new to BigQuery and looking for ways to migrate all the table structures from SQL Server to BigQuery. There are close to 300 tables that need to be created. Is there any way to automate this or do it in less time? Kindly throw some light on it, as I presume many will have done this task.
Thanks in advance,
Venkat.
I recently wrote a blog post about the subject
https://ael-computas.medium.com/copy-sql-server-data-to-bigquery-without-cdc-c520b408bddf
I also released a package on PyPI that you can use as a base for your integrations:
from database_to_bigquery.sql_server import SqlServerToCsv, SqlServerToBigquery
sql_server_to_csv = SqlServerToCsv(username="scott",
                                   password="t1ger",
                                   host="127.0.0.1//optionalinstance_name",
                                   database="thedb",
                                   destination="gs://gcsbucketname")
bigquery = SqlServerToBigquery(sql_server_to_csv=sql_server_to_csv)
result = bigquery.ingest_table(sql_server_table="table_to_read",
                               sql_server_schema="dbo",
                               bigquery_destination_project="bigqueryproject",
                               bigquery_destination_dataset="bigquerydataset")
print(result.full_str())
Just getting the schema could be something like this.
columns, pks = sql_server_to_csv.get_columns("table", "dbo")
bigquery.write_bigquery_schema(columns, "/path/to/file")
# /path/to/file can be in cloud storage, ie - gs://foo/bar.json
It does make some assumptions on datatypes, that you might want to override, and that is entirely possible.
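To make the datatype assumption concrete, here is a hypothetical SQL Server-to-BigQuery type map with an override hook. The mapping values are common defaults, not the package's actual table, and the function name is invented for the sketch:

```python
# A plausible default mapping from SQL Server types to BigQuery types.
# These are common choices, not the exact table the package uses.
DEFAULT_TYPE_MAP = {
    'int': 'INT64',
    'bigint': 'INT64',
    'bit': 'BOOL',
    'float': 'FLOAT64',
    'decimal': 'NUMERIC',
    'varchar': 'STRING',
    'nvarchar': 'STRING',
    'datetime': 'TIMESTAMP',
    'date': 'DATE',
}

def to_bigquery_type(sql_server_type, overrides=None):
    """Resolve a SQL Server type to a BigQuery type.

    `overrides` lets the caller replace any default, mirroring the
    override possibility mentioned above. Unknown types fall back to
    STRING, which is lossless if inelegant.
    """
    merged = {**DEFAULT_TYPE_MAP, **(overrides or {})}
    return merged.get(sql_server_type.lower(), 'STRING')
```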

How to retrieve values from Django ForeignKey -> ManyToMany fields?

I have a model (Realtor) with a ForeignKey field (BillingTier), which has a ManyToManyField (BillingPlan). For each logged in realtor, I want to check if they have a billing plan that offers automatic feedback on their listings. Here's what the models look like, briefly:
class Realtor(models.Model):
    user = models.OneToOneField(User)
    billing_tier = models.ForeignKey(BillingTier, blank=True, null=True, default=None)

class BillingTier(models.Model):
    plans = models.ManyToManyField(BillingPlan)

class BillingPlan(models.Model):
    automatic_feedback = models.BooleanField(default=False)
I have a permissions helper that checks the user permissions on each page load, and denies access to certain pages. I want to deny the feedback page if they don't have the automatic feedback feature in their billing plan. However, I'm not really sure the best way to get this information. Here's what I've researched and found so far, but it seems inefficient to be querying on each page load:
def isPermitted(user, url):
    premium = [t[0] for t in user.realtor.billing_tier.plans.values_list('automatic_feedback') if t[0]]
I saw some solutions which involved using filter (ManyToMany field values from queryset), but I'm equally unsure of using the query for each page load. I would have to get the billing tier id from the realtor: bt_id = user.realtor.billing_tier.id and then query the model like so:
BillingTier.objects.filter(id=bt_id).filter(plans__automatic_feedback=True).distinct()
I think the second option reads nicer, but I think the first would perform better because I wouldn't have to import and query the BillingTier model.
Is there a better option, or are these two the best I can hope for? Also, which would be more efficient for every page load?
As per the OP's invitation, here's an answer.
The core question is how to define an efficient permission check based on a highly relational data model.
The first variant involves building a Python list from evaluating a Django query set. The immediate suspicion is that it imposes unnecessary work on the Python interpreter. Whether that is tolerable if it allows for a less complex database query is hard to assess, but the underlying DB query is not exactly simple either.
The second approach involves fetching additional 1:1 data through relational lookups and then checking if there is any record fulfilling access criteria in a different, 1:n relation.
Let's have a look at them.
bt_id = user.realtor.billing_tier.id: This is required to get the hook for the following 1:n query. It is indeed highly inefficient in itself. It can be optimized in two ways.
As per Django: Access Foreign Keys Directly, it can be written as bt_id = user.realtor.billing_tier_id because the id is of course present in billing_tier and need not be found via a relational operation.
Assuming that the page in itself would only load a user object, Django can be told to fetch and cache relational data along with that via select_related. So if the page does not only fetch the user object but the required billing_tier_id as well, we have saved one additional DB hit.
BillingTier.objects.filter(id=bt_id).filter(plans__automatic_feedback=True).distinct() can be optimized using Django's exists() because that will reduce effort both in the database and in data traffic between the database and Python.
Maybe even Django's prefetch_related can be used to combine the 1:1 and 1:n queries into a single query, but it's much more difficult to judge whether that pays. Could be worth a try.
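The short-circuit behaviour that makes exists() cheap can be illustrated in plain Python: like exists(), any() stops at the first match instead of materialising every row. The plan dicts below are stand-ins for BillingPlan rows, not ORM objects:

```python
def has_automatic_feedback(plans):
    """Return True as soon as one plan grants automatic feedback.

    Analogous to
        BillingTier.objects.filter(id=bt_id,
            plans__automatic_feedback=True).exists()
    the scan stops at the first hit rather than building a full list.
    """
    return any(plan['automatic_feedback'] for plan in plans)
```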
In any case, it's worth installing a package called Django Debug Toolbar, which will allow you to analyze how much time your implementation spends on database queries.

How can I feed data from Hibernate to the Weka Java API?

I am developing a data mining application with the Weka API, Java and MySQL DB connectivity. I want to feed data from the database to the algorithm. I used http://weka.wikispaces.com/Use+Weka+in+your+Java+code#Instances-Database.
Since I use Hibernate and the hibernate.cfg.xml file has the database connection information, can't I just write a normal method in the DAO class to retrieve data and then pass that to the algorithm?
The Weka API is, unfortunately, quite constrained in some places. As such, you will need Instances objects. IIRC, this is not an interface that you could implement otherwise, but an actual object you have to create.
Therefore, you will likely need to query your whole database and produce Instances out of it. Using raw database access rather than Hibernate will save you from doing things twice, and thus from needing twice as much memory.
I've recently done this with Hibernate, but there is no way that a hibernate class can simply be put into WEKA. I've done it this way:
generate a table in the database that has the model information available as you need it (I've done this since I would have needed to do very complex, time-consuming queries for every row; this way, I do the heavy work once and just read it from a simple table)
create your POJO, DAO and whatnot
then just set up your WEKA model
Sample Code (WEKA 3.7)
ArrayList<Attribute> atts = new ArrayList<Attribute>();
atts.add(new Attribute("attribute1"));
atts.add(new Attribute("attribute2"));
atts.add(new Attribute("id", (ArrayList<String>) null));
Instances data = new Instances("yourData", atts, 0);
DAOModel dao = getYourDaoModelHereFromHibernateHoweverYouWantIt();
for (Model m : dao.findAll()) {
    double[] vals = new double[data.numAttributes()];
    vals[0] = m.getAttribute1();
    vals[1] = m.getAttribute2();
    vals[2] = data.attribute(2).addStringValue(m.getId());
    data.add(new DenseInstance(1.0, vals));
}
data now has the proper format and the algorithms can work with it (you could also save it to an .arff file if you want to work with the GUI)

Django create/alter tables on demand

I've been looking for a way to define database tables and alter them via a Django API.
For example, I'd like to be able to write code which directly manipulates table DDL, letting me define tables or add columns to a table on demand programmatically (without running a syncdb). I realize that django-south and django-evolution may come to mind, but I don't really think of these as tools meant to be integrated into an application and used by an end user... rather, they are utilities for upgrading your database tables. I'm looking for something where I can do something like:
class MyModel(models.Model): # wouldn't run syncdb.. instead do something like below
    a = models.CharField()
    b = models.CharField()
model = MyModel()
model.create() # this runs the create table (instead of a syncdb)
model.add_column(c = models.CharField()) # this would set a column to be added
model.alter() # and this would apply the alter statement
model.del_column('a') # this would set column 'a' for removal
model.alter() # and this would apply the removal
This is just a toy example of how such an API would work, but the point is that I'd be very interested in finding out if there is a way to programmatically create and change tables like this. This might be useful for things such as content management systems, where one might want to dynamically create a new table. Another example would be a site that stores datasets of an arbitrary width, for which tables need to be generated dynamically by the interface or data imports. Does anyone know any good ways to dynamically create and alter tables like this?
(Granted, I know one can do direct SQL statements against the database, but that solution lacks the ability to treat the databases as objects)
Just curious as to if people have any suggestions or approaches to this...
You can try to interface with Django's code that manages changes in the database. It is a bit limited (no ALTER, for example, as far as I can see), but you may be able to extend it. Here's a snippet from django.core.management.commands.syncdb.
for app in models.get_apps():
    app_name = app.__name__.split('.')[-2]
    model_list = models.get_models(app)
    for model in model_list:
        # Create the model's database table, if it doesn't already exist.
        if verbosity >= 2:
            print "Processing %s.%s model" % (app_name, model._meta.object_name)
        if connection.introspection.table_name_converter(model._meta.db_table) in tables:
            continue
        sql, references = connection.creation.sql_create_model(model, self.style, seen_models)
        seen_models.add(model)
        created_models.add(model)
        for refto, refs in references.items():
            pending_references.setdefault(refto, []).extend(refs)
            if refto in seen_models:
                sql.extend(connection.creation.sql_for_pending_references(refto, self.style, pending_references))
        sql.extend(connection.creation.sql_for_pending_references(model, self.style, pending_references))
        if verbosity >= 1 and sql:
            print "Creating table %s" % model._meta.db_table
        for statement in sql:
            cursor.execute(statement)
        tables.append(connection.introspection.table_name_converter(model._meta.db_table))
Take a look at connection.creation.sql_create_model. The creation object is created in the database backend relevant to the database you are using in your settings.py. All of them are under django.db.backends.
If you must have ALTER TABLE, I think you can create your own custom backend that extends an existing one and adds this functionality. Then you can interface with it directly through an ExtendedModelManager you create.
Quickly off the top of my head..
Create a Custom Manager with the Create/Alter methods.
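To make the idea concrete, here is a minimal sketch of what such a manager's DDL generation might look like. It emits raw SQL strings from simple field specs; the class and method names are invented for illustration, and everything (quoting, types, dialect) is deliberately naive. A real implementation would go through the backend's schema-generation machinery rather than formatting SQL by hand:

```python
class TableBuilder:
    """Toy DDL generator sketching the create/alter API discussed above.

    Not Django API: the SQL is built by plain string formatting and
    makes no attempt at identifier quoting or dialect differences.
    """

    def __init__(self, table):
        self.table = table

    def create_sql(self, columns):
        # columns: list of (name, sql_type) pairs
        cols = ", ".join("%s %s" % (name, sqltype) for name, sqltype in columns)
        return "CREATE TABLE %s (%s);" % (self.table, cols)

    def add_column_sql(self, name, sqltype):
        return "ALTER TABLE %s ADD COLUMN %s %s;" % (self.table, name, sqltype)

    def drop_column_sql(self, name):
        return "ALTER TABLE %s DROP COLUMN %s;" % (self.table, name)
```

A custom manager could collect pending add/drop operations and execute the generated statements through a database cursor when alter() is called.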