Push changes to data in one Django model to another - django

I am writing an application that has a change-control workflow. Users retrieve data for a particular month, make edits to it, and there is a review phase where they can approve records. There are two identical tables, a master table and a staging table. When users load the application they load data from the master table and can edit it in a CRUD grid. When they hit the stage button I want that data to be inserted into the staging table. How do I tell my view to do that? The staging table doesn't have the associated records yet; I want the records that are sent back as part of the push to be inserted there rather than being applied as an update to the master table.
Any advice would be greatly appreciated.

You can add a new field, say status, to your master table that shows whether a record is in the staging process.
For example:
When you insert a new record into the master table, the initial value of status is 1 (new_created).
When you want to process a master record, you change its status to 2 (in_staging), which indicates that the record is already in the staging process and cannot be processed further.
Using this new field you can manage your process easily and check how many records are in the staging process at any given time.
When you save the master record you can check which fields have changed in the form.
Save the master record with status 2, then copy the master record and save the copy into the staging table (sketched below).
After that, when staging is complete, you can use the same process to save your objects.
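A minimal sketch of that copy step, assuming hypothetical MasterRecord and StagingRecord models with matching columns (month, amount and description stand in for your real fields) and a status column on the master:

from django.db import transaction
from myapp.models import MasterRecord, StagingRecord  # hypothetical app and models

NEW_CREATED, IN_STAGING = 1, 2

@transaction.atomic
def stage_records(month):
    # Lock the master rows for this month that have not been staged yet.
    masters = list(
        MasterRecord.objects.select_for_update().filter(month=month, status=NEW_CREATED)
    )
    # Copy the shared columns into new staging rows (an insert, not an update of master).
    StagingRecord.objects.bulk_create(
        [StagingRecord(master=m, amount=m.amount, description=m.description) for m in masters]
    )
    # Mark the originals so they cannot be staged twice.
    MasterRecord.objects.filter(pk__in=[m.pk for m in masters]).update(status=IN_STAGING)

The view behind the stage button can then call stage_records(month) instead of saving the grid edits back to the master table.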

Related

Django starts primary key from 1 when there is already data in database

I have a PostgreSQL database created by Django's migrate; all the tables were created successfully and are empty at the start.
Now I have to fill one of the tables from a database backup created by pg_dump. The backup has a transactions table that contains data (assume no FKs, and the table schema is the same), so using pg_restore I restored only that transactions table from the backup. Everything was restored and the data shows up in the Django web app as well.
But now when I create a new entry in that table through the Django web app, Django starts assigning primary keys from 1, and because the table was restored from a backup that id already exists. If I try again, Django will try to assign PK 2, then 3, and so on, but those transactions were already restored from the backup.
How do I tell Django the last transaction id so that it can start assigning from there?
Django does not decide how the primary keys are generated for an AutoField [Django-doc] (or BigAutoField [Django-doc] or SmallAutoField [Django-doc]): it is the database that assigns values for these.
For PostgreSQL, the database makes use of sequences, and each time it has to determine a value it updates the sequence, such that next time a different value will be given. You thus need to update that sequence.
As you found out yourself, you can do this with:
ALTER SEQUENCE public.modelname_id_seq RESTART WITH some_value;
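If you prefer to do this from Django, the built-in sqlsequencereset management command will print the equivalent statements for an app. Below is a minimal sketch that calls PostgreSQL's setval directly, assuming the table is myapp_transaction with primary-key column id (both names are placeholders):

from django.db import connection

with connection.cursor() as cursor:
    # Move the sequence just past the highest id restored from the backup;
    # the third argument keeps nextval() correct even if the table were empty.
    cursor.execute(
        "SELECT setval(pg_get_serial_sequence('myapp_transaction', 'id'), "
        "COALESCE(MAX(id), 1), MAX(id) IS NOT NULL) FROM myapp_transaction;"
    )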

Fetch desired ID contents from multiple servers

Imagine I have a distributed system with 500 servers. I have a main database server that stores some metadata, and each entry's primary key is the content ID. The actual content related to each content ID is spread across the 500 servers, but not every content ID's content is on those servers yet; say only half of them are.
How could I find the content IDs that have not yet been deployed to any of the 500 servers?
I'm thinking of using a MapReduce-style approach to solve this, but I'm not sure what the process would look like.
Given the context in the question:
You can build a table in your database containing a contentID-to-instance mapping.
Whenever an instance has the data for a given content ID, it makes a call and registers that contentID.
If your instances can crash and you need to remove their content from the mapping, you can implement a health check that updates your database every 30 seconds to 1 minute.
Now, whenever you need the instanceID for a given contentID, or want to know whether it has been loaded at all, you can consult that table and check whether the contentID has an instanceID whose last health check is within the last minute; a sketch of that lookup follows below.
Note: You can also consider using ZooKeeper or an in-memory datastore like Redis to store this data.
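A minimal sketch of that lookup, assuming a metadata table content_metadata(content_id) and a registry table content_location(content_id, instance_id, last_heartbeat); all names are placeholders and cursor is a standard DB-API cursor:

import datetime

HEALTH_WINDOW = datetime.timedelta(minutes=1)

def undeployed_content_ids(cursor):
    # Return the contentIDs that have no instance with a recent heartbeat.
    cursor.execute(
        """
        SELECT m.content_id
        FROM content_metadata AS m
        LEFT JOIN content_location AS l
               ON l.content_id = m.content_id
              AND l.last_heartbeat >= %s
        WHERE l.instance_id IS NULL
        """,
        (datetime.datetime.utcnow() - HEALTH_WINDOW,),
    )
    return [row[0] for row in cursor.fetchall()]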

Reflecting changes on big tables in hdfs

I have an order table in the OLTP system.
Each order record has an OrderStatus field.
When an end user creates an order, the OrderStatus field is set to "Open".
When somebody cancels the order, the OrderStatus field is set to "Canceled".
When an order finishes processing (it is transformed into an invoice), the OrderStatus field is set to "Close".
There are more than one hundred million records in the table in the OLTP system.
I want to design and populate a data warehouse and data marts on the HDFS layer.
To design the data marts, I need to import the whole order table into HDFS and then reflect changes on the table continuously.
First, I can import the whole table into HDFS in the initial load by using Sqoop. That may take a long time, but I only need to do it once.
When an order record is updated or a new order record is entered, I need to reflect those changes in HDFS. How can I achieve this for such a big transaction table?
Thanks
One of the easier ways is to work with database triggers in your OLTP source DB: every time an update happens, the trigger pushes an update event to your Hadoop environment.
On the other hand (this depends on the requirements of your data users), it might be enough to reload the whole data dump every night.
Also, if there is some kind of last-changed timestamp, it might be possible to load only the newest data and do some kind of delta check.
This all depends on your data structure, your requirements and the resources at hand.
There are several other ways to do this, but they usually involve messaging, development and new servers, and I suppose in your case that infrastructure or those resources are not available.
EDIT
Since you have a last-changed date, you might be able to pull the data with a statement like
SELECT columns FROM table WHERE lastchangedate >= (now - 24 hours)
or whatever your interval for loading might be.
Then process the data with Sqoop, ETL tools or the like. If a record is already available in your Hadoop environment, you want to UPDATE it; if it is not available, INSERT it with your appropriate mechanism. This is sometimes called UPSERTING; a short sketch of that merge follows below.
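A minimal sketch of the upsert step, assuming the existing HDFS snapshot and the daily delta are both available as iterables of dicts keyed by a hypothetical order_id field (in practice this merge would typically run in Spark, Hive or your ETL tool):

def upsert(existing_rows, delta_rows, key="order_id"):
    # Merge delta records into the existing snapshot: update matches, insert the rest.
    merged = {row[key]: row for row in existing_rows}
    for row in delta_rows:
        merged[row[key]] = row  # overwrite if present (UPDATE), add if missing (INSERT)
    return list(merged.values())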

Sitecore Publishing Problems and determining item state

Can anyone explain to me what state the data should be in for a healthy Sitecore instance in each database?
for example:
We currently have an issue with publishing in a 2 server setup.
Our staging server hosts the SQL instance and the authoring/staging instance of Sitecore.
We then have a second server to host just the production website for our corp site.
When I look in the master database the PublishQueue table is full of entries and the same table in the web database is empty.
Is this correct?
No amount of hitting publish buttons is changing that at the moment.
How do I determine the state of an item in both the staging and production environments without having to write an application on top of the Sitecore API, which I really don't have time for?
It is normal behavior for the Publish Queue of the web database to be blank. The reason is that changes are made in the master database, which adds entries to its Publish Queue.
After publishing, items are not removed from the Publish Queue table; it is the job of the CleanupPublishQueue to clean up that table.
In general, tables WILL be different between the two databases as they are used for different purposes. Your master database is generally connected to by authors and the publishing logic, while the web database is generally used as a holding place for the latest published version of content that should be visible.
In terms of debugging publishing, from the Sitecore desktop, you can swap between 'master' and 'web' databases in the lower right corner and use the Content Editor to examine any individual item. This is useful for spot checking individual items have been published successfully.
If an item is missing from 'web', or the wrong version is in 'web', you should examine the following:
Publishing Restrictions on the item: Is there a restriction applied to the item or version that prevents it from publishing at this time?
Workflow state: Is the item/version in the final approved workflow state? You can use the workbox to do a quick check for items needing approval.
Connection strings: Are your staging system's connection strings set up to connect to the correct 'web' database used by the production delivery server?
The [PublishQueue] database table is where all saves and other mutations are recorded. This table is used by an incremental publish: Sitecore gets all the items from the PublishQueue table that were modified more recently than the last incremental publish date. The PublishQueue table is not used by a full publish.
So it is fine that this table contains a lot of records on the master. The web database has the same database schema, but not the same data: web contains only one version of an item, optimized for performance. The PublishQueue table on web being empty is normal.
To know the state of an item, compare the master version with the web version; there can be more than one web database, and the master database does not know the state/version of the web databases.

Auto id in database vs Auto id through code

I am working on an enterprise application developed in C++ with a MariaDB database. The application processes two audit files, Authentication.log and SystemDetails.log.
The audit operation requires data to be inserted into two tables, Authentication and SystemDetails. An auto id is the primary key of the Authentication table and a foreign key in the SystemDetails table.
The Authentication table keeps authentication information (session open, login session info) and SystemDetails keeps the details of the commands executed during each session.
Right now an AuthID is auto-generated in the database, as follows:
Authentication: AuthID, ParentAuthID, info1, info2, ...
SystemDetails: sysid, authid, info1, info2, ...
It works as follows:
1. The app inserts one Authentication record without a parentauthid.
2. It gets the generated auth id.
3. It updates the parentauthid field of the Authentication table.
4. It finds the related system details record.
5. It gets the auth id from the database and inserts the record into the database table.
Problem:
The DB holds about 200k records (Authentication table).
I found that 6000 records take more than 30 minutes.
After analysis, I found that steps 2 and 3 become increasingly time-consuming as the database grows.
I have a feeling that it would be better to generate the auth id in the C++ code instead of in the database. With this change we could remove steps 3 and 5.
Which is the better technique for generating the auto ID for a table?
Generating unique IDs in the presence of multiple users is non-trivial, and probably requires a write to permanent storage. That's slow. A client-generated GUID would be faster.
That said, when inserting a new child record you should already know the parent ID (which avoids step 3), and the system details should likewise only be inserted into the DB once the Authentication record exists (which avoids step 4). This is true regardless of where you generate those IDs.
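The original application is C++, but the idea is language-independent; here is a minimal Python sketch of client-generated IDs, assuming hypothetical table and column names and a standard DB-API cursor for MariaDB. Because the IDs are created in application code, the parent and child rows can be inserted in one pass, with no read-back of generated keys:

import uuid

def insert_session(cursor, parent_auth_id, auth_info, detail_rows):
    auth_id = str(uuid.uuid4())  # generated by the client, not by the database
    cursor.execute(
        "INSERT INTO Authentication (AuthID, ParentAuthID, info1) VALUES (%s, %s, %s)",
        (auth_id, parent_auth_id, auth_info),
    )
    cursor.executemany(
        "INSERT INTO SystemDetails (sysid, authid, info1) VALUES (%s, %s, %s)",
        [(str(uuid.uuid4()), auth_id, info) for info in detail_rows],
    )
    return auth_id

The trade-off is a wider key column (36 characters for a textual UUID) than an integer auto-increment, which is worth measuring before switching.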