I'm putting together a partitioned table in Postgres which will be used by an API written in Django. Postgres has a number of issues with this, most of them having to do with the RETURNING clause returning NULL or creating duplicate records (search for "postgres partition returning" if you want to learn more).
I believe the solution is to override the save() method in the ORM to use a stored procedure or custom SQL, but how do I map the incoming arguments to a custom SQL statement?
Ideally it would look like this, but instead of calling the super method it would map the args to a custom SQL statement.
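A minimal sketch of what I mean, with hypothetical model, table, and column names (and ignoring the pk/update path entirely):

from django.db import connection, models

class MyPartitionedModel(models.Model):
    col_a = models.IntegerField()
    col_b = models.TextField()

    def save(self, *args, **kwargs):
        # Instead of calling super().save(), map the field values
        # into a hand-written INSERT against the partitioned table.
        with connection.cursor() as cursor:
            cursor.execute(
                "INSERT INTO myapp_mypartitionedmodel (col_a, col_b) VALUES (%s, %s)",
                [self.col_a, self.col_b],
            )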
The simplest way is to create a BEFORE INSERT/UPDATE trigger on the PostgreSQL side.
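For older, inheritance-based partitioning, the usual pattern is a trigger on the parent table that redirects each row into the right child partition and then returns NULL, which is exactly why RETURNING gives you NULL. A rough sketch installed from a Django migration, with placeholder table, partition, and migration names:

from django.db import migrations

TRIGGER_SQL = """
CREATE OR REPLACE FUNCTION route_insert() RETURNS trigger AS $$
BEGIN
    -- hypothetical routing: send every row to one child partition
    INSERT INTO myapp_mypartitionedmodel_2020 VALUES (NEW.*);
    RETURN NULL;  -- row never lands in the parent, so RETURNING yields NULL
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER route_insert_trigger
    BEFORE INSERT ON myapp_mypartitionedmodel
    FOR EACH ROW EXECUTE PROCEDURE route_insert();
"""

class Migration(migrations.Migration):
    dependencies = [('myapp', '0001_initial')]
    operations = [migrations.RunSQL(TRIGGER_SQL)]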
I am performing a MySQL to BigQuery data migration using the JDBC to BigQuery template in Dataflow.
But while running the "select * from table1" query against MySQL, I also want to insert the selected data into another table in the same database, for some reason.
How can I perform both the SELECT and the INSERT in the Dataflow template? I got an error when I used a semicolon between the two queries.
The JDBC to BigQuery template will write all the data you read to the table specified under "BigQuery output table" (<my-project>:<my-dataset>.<my-table>), so there is no need to write an insert statement.
(The parameter is "outputTable" for gcloud/REST.)
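For example, launching the classic template with gcloud could look roughly like this (the project, bucket, and connection values are placeholders, and exact parameter names can vary between template versions):

gcloud dataflow jobs run mysql-to-bq \
    --gcs-location gs://dataflow-templates/latest/Jdbc_to_BigQuery \
    --parameters connectionURL=jdbc:mysql://host:3306/mydb,driverClassName=com.mysql.cj.jdbc.Driver,query='select * from table1',outputTable=my-project:my_dataset.my_table,bigQueryLoadingTemporaryDirectory=gs://my-bucket/tmp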
As @PeterKim mentioned, the JDBC to BigQuery template may not be the best approach for your use case.
You could use that template as a reference and modify it to also write into MySQL; in this post you will find an implementation showing how to insert into a MySQL database.
After modifying the pipeline source code you can create a custom template.
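The post referenced above uses the Java JdbcIO connector. If you prefer Python, Beam also exposes that connector through a cross-language transform; a minimal sketch, with hypothetical table and credentials (WriteToJdbc wraps the Java JdbcIO, so it needs a runner with Java support):

from typing import NamedTuple

import apache_beam as beam
from apache_beam import coders
from apache_beam.io.jdbc import WriteToJdbc

class Row(NamedTuple):
    id: int
    name: str

# WriteToJdbc requires schema'd rows, so register a RowCoder for the type.
coders.registry.register_coder(Row, coders.RowCoder)

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create([Row(1, 'abc')]).with_output_types(Row)
        | WriteToJdbc(
            table_name='table2',
            driver_class_name='com.mysql.cj.jdbc.Driver',
            jdbc_url='jdbc:mysql://host:3306/mydb',
            username='user',
            password='secret',
        )
    )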
I'm trying to translate this raw SQL query into a Django ORM query, but I'm having trouble reproducing the raw query with the ORM.
The original query, which works fine with Postgres query tools:
SELECT SUM(counter), type, manufacturer
FROM sells GROUP BY manufacturer, type
Now I tried to integrate this into a Django ORM query like this:
res_postgres = Sells.objects.all().values('manufacturer','type','counter').aggregate(cnter=Sum('counter'))
But what I get back is just the cnter aggregate ...
What I need is the result of the raw query, which looks like this:
What I also tried is to use values() with field names, like Sells.objects.values('manufacturer', ...).aggregate(cnter=Sum('counter'))
But then Django builds a query that includes a GROUP BY id, which is not what I need. I need an aggregation over the entire data set, not at the object level, while keeping the information from the other fields.
When I use Sells.objects.raw() it asks me for primary keys, which I also don't need.
Any hints here? Is that possible with the Django ORM at all?
Use annotate(...) instead of aggregate()
res_postgres = Sells.objects.values('manufacturer','type').annotate(cnter=Sum('counter'))
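The values() call before annotate() tells the ORM which columns to GROUP BY, so each entry in the queryset is a dict holding the group columns plus the aggregate:

for row in res_postgres:
    print(row['manufacturer'], row['type'], row['cnter'])

# To double-check the generated SQL:
print(res_postgres.query)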
I have been crawling through the Django docs, but they mostly assume the database is used through models. The problem is that my database is too large and I don't want to create any models, since it's a legacy one, and I will have to query different tables dynamically, so I just want to pull data from it. Is that possible in Django?
You can bypass the model layer and use SQL directly. However, you will have to process the rows in Python, without the advantage of ORM objects.
https://docs.djangoproject.com/en/1.10/topics/db/sql/#executing-custom-sql-directly
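A minimal sketch of going through a raw cursor, using the dict-per-row pattern from the Django docs (the dynamic table name here is hypothetical; identifiers cannot be bound as query parameters, so validate it against a whitelist):

from django.db import connection

def fetch_rows(table_name, limit=100):
    # table_name must come from a trusted whitelist: only values,
    # not identifiers, can be bound as %s parameters.
    with connection.cursor() as cursor:
        cursor.execute(f"SELECT * FROM {table_name} LIMIT %s", [limit])
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]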
As pointed out in a comment, Django provides a way to automatically generate the models from the legacy database with inspectdb.
This guide describes the few manual steps required to "clean" the automatically generated models.
While this doesn't directly answer the stated question of avoiding models entirely, it does address your concern about not wanting to write them yourself, given the size of the database.
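The generation itself is a single command run against your configured legacy database:

python manage.py inspectdb > models.py

By default, inspectdb marks every generated model with managed = False in its Meta, so Django migrations will never try to create, alter, or drop those legacy tables.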
Data should be stored somewhere. There are a lot of ways to store data, but the most reliable one is a database (hence the name).
You could store data in a JSON file. You could also store data in environment variables. You can even store data in a plain text file. None of those are recommended. I would just use a database, any type of database (MongoDB / Postgres / MySQL, anything). That's what they are meant for.
I think this is a recurring question on the Internet, but unfortunately I'm still unable to find a satisfactory answer.
I'm using Ruby on Rails 4 and I would like to create a model that is backed by a SQL query, not by an actual table in the database. For example, let's suppose I have two tables in my database: Questions and Answers. I want to build a report containing statistics of both tables. For that purpose, I have a complex SQL statement that takes data from these tables to build up the statistics. However, the SELECT used in the SQL statement does not take values directly from either the Answers or the Questions table, but from nested SELECTs.
So far I've been able to create the StatItem model, without any migration, but when I try StatItem.find_by_sql("...nested selects...") the system complains about the nonexistent table stat_items in the database.
How can I create a model whose instances' data is retrieved from a complex query and not from a table? If that's not possible, I could create a temporary table to store the data in. In that case, how can I tell the migration file not to create such a table (it would be created by the query)?
How about creating a materialized view from your complex query and following this tutorial:
ActiveRecord + PostgreSQL Materialized Views
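In case it helps, a rough sketch of that approach (the SELECT body is a placeholder for your nested-select statistics query): create the view once, named after the model's expected table, then refresh it whenever the underlying data changes:

ActiveRecord::Base.connection.execute('CREATE MATERIALIZED VIEW stat_items AS SELECT ...')
ActiveRecord::Base.connection.execute('REFRESH MATERIALIZED VIEW stat_items')

Since the view is called stat_items, the StatItem model will read from it like a regular table.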
Michael Kohl's proposal of materialized views has given me an idea, which I initially discarded because I wrongly thought that a single database connection could be shared by two processes, but after reading about how Rails processes requests, I think my solution is fine.
STEP 1 - Create the model without migration
rails g model StatItem --migration=false
STEP 2 - Create a temporary table called stat_items
#First, drop any existing table created by older requests (database connections are kept open by the server process(es))
ActiveRecord::Base.connection.execute('DROP TABLE IF EXISTS stat_items')
#Second, create the temporary table with the desired columns (notice: a dummy integer column called 'id' should exist in the table)
ActiveRecord::Base.connection.execute('CREATE TEMP TABLE stat_items (id integer, ...)')
STEP 3 - Execute an SQL statement that inserts rows in stat_items
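For example, following the same pattern (the column list and the SELECT body are placeholders for your actual statistics query):

#Third, populate the temporary table straight from the complex query
ActiveRecord::Base.connection.execute('INSERT INTO stat_items (id, ...) SELECT ... FROM questions ...')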
STEP 4 - Access the table using the model, as usual
For example:
StatItem.find_by_...
Any comments/improvements are highly appreciated.
I was wondering whether this could produce any problems if I directly add rows to, or remove rows from, a model's table. I thought maybe Django records the number of rows in all tables? Or could this mess up the auto-generated ids?
I don't think it matters, but I'm using MySQL.
No, it's not a problem, because Django does the same thing that you do "directly" to the database: it executes SQL statements. The auto-generated id is handled by the database server (the MySQL server in this case) no matter where the SQL queries come from, whether from the MySQL client or from Django.
Since you can have Django work on a pre-existing database (one that wasn't created by Django), I don't think you will have problems if you access/write the tables of your own app (you might want to avoid modifying Django's internal tables like auth, permission, content_type, etc. until you are familiar with them).
When you create a model through Django, Django doesn't store the row count or anything like that (unless your app does), so it's okay to create the model with Django on the SQL database and then have another app write to and read from that same SQL table.
If you use Django signals, those will not be triggered by modifying the SQL table directly through the DB, so you might want to pay attention to side effects like that.
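For example, a hypothetical post_save receiver like this only fires for ORM saves, never for rows inserted straight through MySQL:

from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.models import MyModel  # placeholder model

@receiver(post_save, sender=MyModel)
def refresh_cache(sender, instance, **kwargs):
    # Runs when MyModel instances are saved through the ORM, but a
    # direct INSERT in the MySQL client bypasses it entirely.
    ...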
Your RDBMS handles its own auto-generated IDs, referential integrity, counts, etc., so you don't have to worry about messing that up.