Auto ID in database vs auto ID through code - C++

I am working on an enterprise application developed in C++, and the database is MariaDB. The application processes two audit files, Authentication.log and SystemDetails.log.
The audit operation requires data to be inserted into two tables, Authentication and SystemDetails. An auto-generated ID is the primary key of the Authentication table and a foreign key in the SystemDetails table.
The Authentication table keeps authentication information (i.e. session open and login session info), and SystemDetails keeps details of the commands executed during each session.
Right now the AuthID is auto-generated in the database, as follows:
Authentication: AuthID, ParentAuthID, Info1, Info2, ...
SystemDetails: SysID, AuthID, Info1, Info2, ...
It works as follows:
1. The app inserts one Authentication record without a ParentAuthID.
2. It gets the generated AuthID back from the database.
3. It updates the ParentAuthID field of the Authentication table.
4. It finds the related SystemDetails record.
5. It gets the AuthID from the database and inserts the record into the SystemDetails table.
Problem:
DB size: 200k records (Authentication table).
I found that 6,000 records take more than 30 minutes to process.
After analysis, I found that steps 2 and 3 are the time-consuming parts, and they get slower as the database grows.
I have a feeling that it would be better to generate the AuthID in the C++ code instead of through the database. With this change we can remove steps 3 and 5.
Which is the better technique for generating an auto ID for a table?

Generating unique IDs in the presence of multiple users is non-trivial, and probably requires a write to permanent storage. That's slow. A client-generated GUID would be faster.
That said, when inserting a new child record you should already know the parent ID (which avoids step 3), and the system details should similarly only be inserted into the DB when the Authentication record exists (which avoids step 4). This is true regardless of where you generate those IDs.
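For illustration, here is a minimal sketch of the client-generated-ID approach, shown in Python with the MariaDB connector for brevity (the table and column names come from the question, but the connection details and CHAR(36) UUID key columns are assumptions; the same pattern applies from a C++ connector):

import uuid
import mariadb  # MariaDB Connector/Python; any driver works the same way

conn = mariadb.connect(user="app", password="secret", database="audit")  # hypothetical credentials
cur = conn.cursor()

# Generate the IDs client-side, so neither a read-back of LAST_INSERT_ID()
# nor a follow-up UPDATE of ParentAuthID is needed.
parent_auth_id = str(uuid.uuid4())  # in practice: the ID of the parent record you already hold
auth_id = str(uuid.uuid4())

cur.execute(
    "INSERT INTO Authentication (AuthID, ParentAuthID, info1) VALUES (?, ?, ?)",
    (auth_id, parent_auth_id, "session open"),
)
cur.execute(
    "INSERT INTO SystemDetails (sysid, authid, info1) VALUES (?, ?, ?)",
    (str(uuid.uuid4()), auth_id, "command executed"),
)
conn.commit()

Note that random UUID keys make the primary-key index larger and inserts less sequential than an AUTO_INCREMENT integer, so it is worth measuring that trade-off before committing to it.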

Django starts primary key from 1 when there is already data in database

I have a PostgreSQL database created by migrate with Django; all the tables have been created successfully and are empty at the start.
Now I have to fill one of the tables from a database backup created by pg_dump. This backup has a table transactions which contains data (consider no FK, and the schema of the table is the same), so using pg_restore I restored only that transactions table from the backup. Everything restored, and the data is shown in the Django web app as well.
But now when I create a new entry in that table using the Django web app, Django starts assigning the primary key from 1 to the newly created entry; but as the table was restored from a backup, that id already exists. If I try again, Django will try to assign PK 2, then 3 and so on, but those transactions have already been restored from the backup.
How can I tell Django the last transaction id so that it can start assigning from there?
Django does not decide how the primary keys are generated for an AutoField [Django-doc] (or BigAutoField [Django-doc] or SmallAutoField [Django-doc]): it is the database that assigns values for these.
For PostgreSQL, the database makes use of sequences, and each time it has to determine a value it updates the sequence, such that next time a different value will be given. You thus need to update that sequence.
As you found out yourself, you can do this with:
ALTER SEQUENCE public.modelname_id_seq RESTART some_value
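If you'd rather not hand-write that statement, Django can generate the reset SQL for you; a small sketch, assuming a hypothetical app myapp with a Transaction model:

from django.core.management.color import no_style
from django.db import connection

from myapp.models import Transaction  # hypothetical app and model

# Ask Django for the sequence-reset statements for this model's table
# (the table was filled outside the ORM, e.g. via pg_restore), then run them.
statements = connection.ops.sequence_reset_sql(no_style(), [Transaction])
with connection.cursor() as cursor:
    for sql in statements:
        cursor.execute(sql)

Equivalently, python manage.py sqlsequencereset myapp prints the same statements so you can run them against the database yourself.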

Push changes to data in one Django model to another

I am writing an application that has a change-control workflow. Users retrieve data for a particular month, make edits to it, and there is a review phase where they can approve records. There are two identical tables, a master table and a staging table. When the user loads up the application they load data from the master table and can edit it in a CRUD grid. When they hit the stage button I want that data to get inserted into the staging table. How do I tell my view to do that? The staging table doesn't have the associated records yet; I want the records that are sent back as part of the push to get inserted there rather than doing an update to the master table.
Any advice would be greatly appreciated.
You can add a new field, say status, to your master table which shows whether the record is in the staging process.
For example:
You insert a new record into the master table, so the initial value of status will be 1 (new_created).
When you want to process a master table record, you change the status to 2 (in_staging), which shows that this record is already in the staging process and can't be processed further.
Using this new field you can manage your process easily and check how many records are in the staging process at any given time.
When you store the master record you can check which fields have changed in the form.
Save the master record with status 2, then copy the master record and save the copy into the staging table.
After that, when your staging is completed, you can use the same process to save your objects back.
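A rough sketch of that flow in Django (the model names, status values and copy step are illustrative; your real master and staging models carry the actual data columns):

from django.db import models, transaction

class MasterRecord(models.Model):  # hypothetical master model
    STATUS_NEW_CREATED = 1
    STATUS_IN_STAGING = 2
    status = models.IntegerField(default=STATUS_NEW_CREATED)
    # ... the editable data fields ...

class StagingRecord(models.Model):  # hypothetical staging model with the same data fields
    master = models.ForeignKey(MasterRecord, on_delete=models.CASCADE)
    # ... copies of the data fields ...

def push_to_staging(record):
    """Mark the master record as in_staging and copy it into the staging table."""
    with transaction.atomic():
        record.status = MasterRecord.STATUS_IN_STAGING
        record.save(update_fields=["status"])
        # copy whatever data fields the grid edited
        return StagingRecord.objects.create(master=record)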

Django REST Framework as backend for tracking user history

I'm trying to track user history using a DRF backend. For that, I've created a history table that will get a new timestamped row with each update. This model is many-to-one and references the user model via a foreign key.
Here comes the confusing part. Every time I pull up the user profile, I would like to also pull the last entry in the history table. Thus, I am considering adding a couple of columns to the user table which get updated with every insert into the history table, since this is probably less expensive than performing a secondary lookup each time. I'd like some feedback on this approach.
Additionally, I'm slightly confused by how to perform this update/insert combination via a single API endpoint as DRF seems to only support one-to-one CRUD.
For illustrative purposes, I'd like to achieve the following via a single API view:
User hits the API endpoint with an access token and updated values --> insert a row into the history table --> update the user's row in the user table with the inserted details.
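For illustration, that combined flow could be expressed as a single DRF view along these lines (the History model, the denormalised last_* column on the user and the payload field are all hypothetical):

from django.db import transaction
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response
from rest_framework.views import APIView

from .models import History  # hypothetical history model (FK to user, timestamped)

class HistoryUpdateView(APIView):  # hypothetical endpoint
    permission_classes = [IsAuthenticated]

    def post(self, request):
        value = request.data["value"]  # hypothetical payload field
        with transaction.atomic():
            # 1. insert the new timestamped history row
            entry = History.objects.create(user=request.user, value=value)
            # 2. update the denormalised copy on the user's own row
            request.user.last_history_value = value  # hypothetical column
            request.user.save(update_fields=["last_history_value"])
        return Response({"history_id": entry.pk})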
Thanks!

DynamoDB table/index schema design for querying multi-valued attributes

I'm building a DynamoDB app that will eventually serve a large number (millions) of users. Currently the app's item schema is simple:
{
  userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
  email: "foo@foo.com",
  ... other attributes ...
}
When a new user signs up, or if a user wants to find another user by email address, we'll need to look up users by email instead of by userId. With the current schema that's easy: just use a global secondary index with email as the Partition Key.
But we want to enable multiple email addresses per user, and the DynamoDB Query operation doesn't support a List-typed KeyConditionExpression. So I'm weighing several options to avoid an expensive Scan operation every time a user signs up or wants to find another user by email address.
Below is what I'm planning to change to enable additional emails per user. Is this a good approach? Is there a better option?
Add a sort key column (e.g. itemTypeAndIndex) to allow multiple items per userId.
{
  userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
  itemTypeAndIndex: "main", // sort key
  email: "foo@foo.com",
  ... other attributes ...
}
If the user adds a second, third, etc. email, then add a new item for each email, like this:
{
  userId: "08074c7e0c0a4453b3c723685021d0b6", // partition key
  itemTypeAndIndex: "Email-2", // sort key
  email: "bar@bar.com"
  // no more attributes
}
The same global secondary index (with email as the Partition Key) can still be used to find both primary and non-primary email addresses.
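For illustration, a lookup against that GSI might look like the following boto3 sketch (the table name Users and index name email-index are assumptions):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")  # hypothetical table name

# Query the GSI by email; this returns whichever item carries that address,
# whether its sort key is "main" or "Email-N".
response = table.query(
    IndexName="email-index",  # hypothetical GSI name, partition key: email
    KeyConditionExpression=Key("email").eq("bar@bar.com"),
)
items = response["Items"]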
If a user wants to change their primary email address, we'd swap the email values in the "primary" and "non-primary" items. (Now that DynamoDB supports transactions, doing this will be safer than before!)
If we need to delete a user, we'd have to delete all the items for that userId. If we need to merge two users then we'd have to merge all items for that userId.
The same approach (new items with the same userId but different sort keys) could be used for other 1-user-has-many-values data that needs to be Query-able.
Is this a good way to do it? Is there a better way?
Justin, for searching on attributes I would strongly advise not to use DynamoDB. I am not saying you can't achieve this; however, I see a few problems that will eventually come in your path if you go this route.
Using a sort key on email-id will result in creating duplicate records for the same user, i.e. if a user has registered 5 emails, that implies 5 records in your table with the same schema and attributes except for the email-id attribute.
What if a new use-case comes in the future where you also want to search for a user based on some other attribute (for example cell phone number, assuming a user may have more than one cell phone number)?
DynamoDB has a hard limit on the number of secondary indexes you can create for a table, i.e. 5.
Thus, with increasing search criteria, this solution will easily become a bottleneck for your system. As a result, your system may not scale well.
To the best of my knowledge, I can suggest a few options that you may choose from, based on your requirements/budget, to address this problem using a combination of databases.
Option 1. DynamoDB as a primary store and AWS Elasticsearch as secondary storage [Preferred]
Store the user records in a DynamoDB table (let's call it UserTable) as and when a user registers.
Enable DynamoDB Streams on the UserTable table.
Build an AWS Lambda function that reads from the table's stream and persists the records in AWS Elasticsearch.
Now in your application, use DynamoDB to fetch user records by id. For all other search criteria (like searching on emailId, phone number, zip code, location etc.) fetch the records from AWS Elasticsearch. AWS Elasticsearch by default indexes all the attributes of your record, so you can search on any field with milliseconds of latency.
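A minimal sketch of such a Lambda handler, assuming a hypothetical Elasticsearch endpoint and index and omitting request signing and error handling:

import json
import urllib3

http = urllib3.PoolManager()
ES_ENDPOINT = "https://my-es-domain.example.com"  # hypothetical Elasticsearch endpoint

def handler(event, context):
    # Re-index every inserted or modified user item from the DynamoDB stream.
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        image = record["dynamodb"]["NewImage"]
        user_id = image["userId"]["S"]
        # naive un-marshalling of the DynamoDB attribute-value map ({"S": "..."} etc.)
        doc = {attr: next(iter(value.values())) for attr, value in image.items()}
        http.request(
            "PUT",
            f"{ES_ENDPOINT}/users/_doc/{user_id}",
            body=json.dumps(doc),
            headers={"Content-Type": "application/json"},
        )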
Option 2. Use AWS Aurora [Less preferred solution]
If your application has a relational use-case where data are related, you may consider this option. Just to call out, Aurora is a SQL database.
Since this is a relational storage, you can opt for organizing the records in multiple tables and join them based on the primary key of those tables.
I would suggest the 1st option because:
DynamoDB will provide you durable, highly available, low latency primary storage for your application.
AWS Elasticsearch will act as secondary storage, which is also durable, scalable and low latency storage.
With AWS Elasticsearch, you can run any search query on your table. You can also do analytics on the data. A Kibana UI is provided out of the box, which you may use to plot the analytical data on a dashboard (e.g. how user growth is trending, how many users belong to a specific location, user distribution by city/state/country, etc.).
With DynamoDB Streams and AWS Lambda, you will be syncing these two databases in near real time [within a few milliseconds].
Your application will be scalable and the search feature can further be enhanced to do filtering on multi-level attributes. [One such example: search all users who belong to a given city]
Having said that, now I will leave this up to you to decide. 😊

How RedShift Sessions are handled from a Server Connection for TEMP tables

I'm using ColdFusion to connect to a RedShift database and I'm trying to understand (and assure myself of) how the connections work in relation to TEMP tables in RedShift.
1. In my CFADMIN for the datasource I have unchecked "Maintain connections across client requests". I would assume then that each user who is using my website would have their own "Connection" to the DB? Is that correct?
2. Per the RedShift docs about temp tables:
TEMP: Keyword that creates a temporary table that is visible only within the current session. The table is automatically dropped at the end of the session in which it is created. The temporary table can have the same name as a permanent table. The temporary table is created in a separate, session-specific schema. (You cannot specify a name for this schema.) This temporary schema becomes the first schema in the search path, so the temporary table will take precedence over the permanent table unless you qualify the table name with the schema name to access the permanent table.
Am I to understand that, if #1 is true and each user has their own connection to the database and thereby their own session, then per #2 any tables that are created will exist only in that session, even though the "user" is the same since it's a connection from my server using the same credentials?
3. If my assumptions in #1 and #2 are correct, then if I have ColdFusion code that runs queries like so:
drop table if exists tablea
create temp table tablea (...)
insert into tablea
select * from realtable inner join ...
drop table tablea
And multiple users are using that same function, they should never run into any conflicts where one table gets dropped while another request is trying to use it, correct?
How do I test that this is the case? Besides throwing it into production and waiting for an error, how can I know? I tried running a few windows side by side in different browsers and didn't notice an issue, but I don't know how to tell whether the temp tables truly are different between clients (as they should be). I imagine I could query some metadata, but what metadata about the table would tell me that?
I have a similar situation, but with redbrick database software. I handle it by creating unique table names. The general idea is:
Create a table name something like this:
<cfset tablename = TableText & randrange(1, 100000)>
Try to create a table with that name. If you fail try again with a different name.
If you fail 3 times stop trying and mail the cfcatch information to someone.
I have all this code in a custom tag.
Edit starts here
Based on the comments, here is some more information about my situation. In CFAdmin, for the datasource being discussed, the Maintain Connections box is checked.
I put this code on a ColdFusion page:
<cfquery datasource="dw">
create temporary table dan (f1 int)
</cfquery>
I ran the page and then refreshed it. The page executed successfully the first time. When refreshed, I got this error.
Error Executing Database Query.
** ERROR ** (7501) Name defined by CREATE TEMPORARY TABLE already exists.
That's why I use unique table names. I don't cache the queries though. Ironically, my most frequent motivation for using temporary tables is that there are situations where they make things run faster than using the permanent tables.