Manually adding data causes IntegrityError - django

I have two Django sites: one for development and one for production. Every once in a while, the data from the development database needs to be transferred to the production database or the other way around. I use postgresql.
This works fine: I empty the tables from the database I want to copy to, I generate sql from the applicable tables, and insert the data in the emptied tables. So far, so good.
But when I enter data into the database via the admin interface, Django raises IntegrityErrors, because appname_modelname_pkey already exists.
I think this is because the admin interface wants to add data with id 1, but that's already an imported record. Django isn't aware that id '1' is already taken.
How do I fix this problem? I want Django to increment the id (like SQL auto_increment would do), no matter what data is already stored.
Any help is appreciated!

If you're using postgres on both sides, then the sequence associated with the primary keys are going to be different.
For example, suppose you're moving production data to development. Also, suppose the sequence value in production is 20 and the sequence in development is 10. Then the first new item you add in development will have an id of 11. That id (probably) already exists in the production data, so you get an integrity error.
When you restore tables from a dump, you can reset the sequences by dropping and recreating the existing tables before you restore.
(Or, you can probably use the ALTER SEQUENCE command to sync up the sequences. However, I'm not familiar enough with postgres to say whether that's the right way to do it.)

Related

Am I required to use django data migrations?

I migrated a database that was manipulated via SQL to use Django's migration module: https://docs.djangoproject.com/en/3.0/topics/migrations/.
Initially I thought of using migrations only for changes in the model such as changes in columns or new tables, performing deletes, inserts and updates through SQL, as they are constant and writing a migration for each case would be a little impractical.
Could using the migration module without using data migrations cause me some kind of problem or inconsistency in the future?
You can think about data migration if you made a change that required also a manual fix on the data.
Maybe you decided to normalize something in the database, e.g. to split a name column to the first name and the last name. If you have only one instance of the application with one database and you are the only developer then you will also not write a data migration, but if you want to change it on a live production site with 24 x 7 hours traffic or you cooperate with other developers then you probably prepare a data migration for their databases or you will thoroughly test the migration on a copy of live data that the update will work on the production site correctly without issues and with minimal shutdown. If you don't write a data migration and you had no problem immediately then it is OK and will be not worse then a possible ill-conceived data migration.

Archive data after every year

I have lots of models in my project like Advertisements, UserDetails etc. Currently I have to delete the entire database every year so as to not create any conflicts between this year data and previous year data.
I want to implement a feature that can allow me to switch between different years. What can be the best way to implement this?
I think you could switch schemas in PostgreSQL. It's not completely straightforward. There are several ways to do that you can look into. The way I did it was to use a default search path for the Django database user account (e.g. user2018, user2019, etc) that only included the schema I wanted to use. I can't check the exact settings right now because my office network is down. You can also do it in settings.py or in each individual model using db_table according to what I've read, although both those solutions seem more convoluted that using the search path.
You would have to shutdown, change the database username in settings.py (or change the search path in PostgreSQL, change the schema over to a new one, and then run migrate to create the tables again. If you have reference data in any of the tables then schema-to-schema copies are easy to do.
Try searching for change django database schema postgresql to see what options there are for specifying the schema.

Rails Adding a column to a table at runtime

What are the cons of allowing a user to add a column to a table in the database at runtime in a production environment. Is there a correct way to do it?
Normally when using a relational DB we never extend the DB at runtime.
At high performance, this is basically impossible (requires modifying the entire dataset, so the request will hang for the user)
Besides that, we do not really want to give users the power to grow our dataset (adding a new column means requiring an extra field for every row in the DB).
However, some relational DBs like Postgres support unstructured data like JSON. This might serve your purpose.

PostgreSQL: update table with new records from the same table on remote server

We have a PostgreSQL server running in production and a plenty of workstations with an isolated development environments. Each one has its own local PostgreSQL server (with no replication with the production server). Developers need to receive updates stored in production server periodically.
I am trying to figure out how to dump the contents of several selected tables from server in order to update the tables on development workstations. The biggest challenge is that the tables I'm trying to synchronize may be diverged (developers may add - but not delete - new fields to the tables through the Django ORM, while schema of the production database remains unchanged for a long time).
Therefore the updated records and new fields of the tables stored on workstations must be preserved against the overwriting.
I guess that direct dumps (e.g. pg_dump -U remote_user -h remote_server -t table_to_copy source_db | psql target_db) are not suitable here.
UPD: If possible I would also like to avoid the use of third (intermediate) database while transferring the data from production database to the workstations.
I would recommend the following approach.
I'll outline example based on a single table customer.
We want to copy some entries from this table on production. Obviously, full table dump will break new stuff that exists on development envs;
Therefore, create a table with the similar structure, but a different name, say customer_$. Another way is to create a dedicated schema for such “copying” tables. You might also want to include a couple of extra columns there, like copy_id and/or copy_stamp;
Now you can INSERT INTO customer_$ SELECT ... to populate your copying table with wanted data. You might need to think of the way how to do this, though. In the tool we use here we can supply predicate data via the -w switch, like -w "customer_id IN (SELECT id FROM cust2copy)";
After you've populated your copying table(s), you can dump them. Make sure to use the following switches to the pg_dump:
--column-inserts to explicitly list target columns, for on development env copying table might have changed it's structure. This might be “slow” for big volumes though;
--table / -t to specify tables to dump.
On the target env, make sure to (1) empty copying tables and (2) prevent parallel activities of similar nature;
Load date into the copying tables;
The most interesting part comes: you need to check, that data you're bout to INSERT into the main tables will not conflict with any of the constraints defined on the tables. You might have:
PRIMARY KEY violations. You can (1) replace existing entries or (2) merge entries together or (3) skip entries from the copying tables or (4) choose to assign different ID in the copying tables;
UNIQUE KEY violations, most likely you'll have to UPDATE some columns in the copying tables;
FOREIGN KEY violations, you'll have either to give up on such entries, or to copy over missing stuff from the production as well;
CHECK violations, you'll have to investigate this ones manually.
After checks are done and data in the copying tables is fixed, you can copy it into the main tables.
This is a very formal description of the approach. Say, for step #7 we have a huge pile of extra tools to do ID or ID ranges remapping, to manipulate data in the copying tables, adjusting security settings, ownership, some defaults, etc.
Also, we have a so-called catalogue for this tool, which allows us to group logically tied tables under common names. Say, to copy customers from production we have to check round 50 tables in order to satisfy all possible dependencies.
I haven't seen similar tools in the wild though so far.

Coldfusion: Move data from one datasource to another

I need to move a series of tables from one datasource to another. Our hosting company doesn't give shared passwords amongst the databases so I can't write a SQL script to handle it.
The best option is just writing a little coldfusion scripty that takes care of it.
Ordinarily I would do something like:
SELECT * INTO database.table FROM database.table
The only problem with this is that cfquery's don't allow you to use two datasources in the same query.
I don't think I could use a QoQ's either because you can't tell it to use the second datasource, but to have a dbType of 'Query'.
Can anyone think of any intelligent ways of getting this done? Or is the only option to just loop over each line in the first query adding them individually to the second?
My problem with that is that it will take much longer. We have a lot of tables to move.
Ok, so you don't have a shared password between the databases, but you do seem to have the passwords for each individual database (since you have datasources set up). So, can you create a linked server definition from database 1 to database 2? User credentials can be saved against the linked server, so they don't have to be the same as the source DB. Once that's set up, you can definitely move data between the two DBs.
We use this all the time to sync data from our live database into our test environment. I can provide more specific SQL if this would work for you.
You CAN access two databases, but not two datasources in the same query.
I wrote something a few years ago called "DataSynch" for just this sort of thing.
http://www.bryantwebconsulting.com/blog/index.cfm/2006/9/20/database_synchronization
Everything you need for this to work is included in my free "com.sebtools" package:
http://sebtools.riaforge.org/
I haven't actually used this in a few years, but I can't think of any reason why it wouldn't still work.
Henry - why do any of this? Why not just use SQL manager to move over the selected tables usign the "import data" function? (right click on your dB and choose "import" - then use the native client and permissions for the "other" database to specify the tables. Your SQL manager will need to have access to both DBs, but the db servers themselves do not need access to each other. Your manager studio will serve as a conduit.