I have a table which is updated with some regularity, and I want to be able to version it so that I can roll back to a previous version at any time. I want to do this at an abstract level, so that I am not versioning the data itself (i.e. making versioning part of the table) but rather storing the transaction log in another table. What is the best way of doing this?
Possible solution: add a trigger for onInsert, onUpdate, onDelete, etc. and have those perform the relevant inserts into some other table (see the sketch after the example below), but it seems like more work than necessary, seeing as how SQLite already has a transaction log: if only I could somehow query the log (and have it not be deleted).
For example:
CREATE TABLE Contacts(
    id INT PRIMARY KEY,
    first VARCHAR,
    middle VARCHAR,
    last VARCHAR,
    phone VARCHAR,
    date TIMESTAMP
);
With that table, I would want to be able to answer the question: "What phone number did so and so have prior to 'some day'" or "Did so and so ever change their first name?"
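For illustration, the trigger approach I mentioned might look roughly like this (the ContactsHistory table and trigger name are just placeholders I made up):

-- audit table holding one row per change to Contacts
CREATE TABLE ContactsHistory(
    hist_id INTEGER PRIMARY KEY AUTOINCREMENT,
    contact_id INT,
    first VARCHAR,
    middle VARCHAR,
    last VARCHAR,
    phone VARCHAR,
    date TIMESTAMP,
    operation TEXT,                                   -- 'I', 'U' or 'D'
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- log the previous values whenever a contact is updated
CREATE TRIGGER contacts_after_update
AFTER UPDATE ON Contacts
FOR EACH ROW
BEGIN
    INSERT INTO ContactsHistory(contact_id, first, middle, last, phone, date, operation)
    VALUES (OLD.id, OLD.first, OLD.middle, OLD.last, OLD.phone, OLD.date, 'U');
END;

Queries against ContactsHistory (filtered on contact_id and changed_at) could then answer both of the questions above.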
Sounds like you want to store the DATA as different versions - so for example, you could have a table that contains the filename (or some similar identifier) and a version number, and a second table that contains the content of each "file" (based on some ID). When a file is updated, the file is stored again. If you want to do it like a proper version system, you store the difference between the previous and the new data in the second table [or the latest file and the differences towards the older versions of the file, or some variation on that theme - there are many open source version control systems out there, all of which have slightly different solutions for how the actual file data is stored].
However, there are also existing version control systems, such as Git, which may already do what you want in the first place - but without further details of exactly what you want to do, it's hard to say.
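A rough sketch of that two-table layout (all names are made up for illustration):

-- one row per logical file
CREATE TABLE files(
    file_id INTEGER PRIMARY KEY,
    filename VARCHAR NOT NULL
);

-- one row per stored version: either a full copy of the content
-- or a diff against the previous version
CREATE TABLE file_versions(
    file_id INTEGER NOT NULL REFERENCES files(file_id),
    version INTEGER NOT NULL,
    content BLOB,
    PRIMARY KEY (file_id, version)
);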
Suppose I have created a table like this.
CREATE TABLE Vehicle
and insert some documents into this table.
INSERT INTO Vehicle
<< {
    'VIN' : '1N4AL11D75C109151',
    'Type' : 'Sedan'
} >>
My requirement is to change the table name from Vehicle to VehicleCar and to rename the field 'VIN' to 'VID'.
How can I do that?
Thanks,
Dasun.
QLDB doesn't currently offer an ALTER TABLE capability. You'd have to DROP the table and re-create it. This counts against your table limits, so don't do it too often.
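For the rename itself that means something like this (sketch only; dropping a table does not carry its data over, so existing documents would have to be re-inserted into the new table):

DROP TABLE Vehicle;
CREATE TABLE VehicleCar;
CREATE INDEX ON VehicleCar (VID);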
QLDB is schema-less, so you can change your field names and/or the structure of your documents anytime you want to, simply by writing new revisions to your documents in the new format. The journal will still contain the old revisions, however. If your application has any functionality that uses the history() function to access old revisions, then it needs to be able to gracefully handle variations in the document format.
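For example, a single document could be moved over to the new field name with something like this (sketch only, assuming the VehicleCar table from above and QLDB's PartiQL UPDATE/REMOVE statements):

-- copy the old field into the new name
UPDATE VehicleCar AS v
SET v.VID = v.VIN
WHERE v.VIN = '1N4AL11D75C109151';

-- then drop the old field from that document
FROM VehicleCar AS v
WHERE v.VID = '1N4AL11D75C109151'
REMOVE v.VIN;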
It is important to note that QLDB is not optimized for scanning large volumes of data. It's optimized for targeted queries against an index using an equality operator. A query like "SELECT * FROM table" will scan the entire table. This is an anti-pattern for QLDB and will not perform well as your ledger grows. So if you change your document format, running a SELECT * and updating every document to the new format may be more work than you realize. First, that SELECT * scan query may time out, or it may be aborted with an Optimistic Concurrency Control exception because another process inserted a document into the table. Second, you'd have to do it in batches of 40 documents at a time because of the limit on the number of documents in a transaction.
All of this is to say that making your application resilient to schema changes is a good idea. :-)
I have an existing HANA warehouse which was built without create/update timestamps. I need to generate a number of nightly batch delta files to send to another platform. My problem is how to detect which records are new or changed so that I can capture those records within the replication process.
Is there a way to use HANA's built-in features to detect new/changed records?
SAP HANA does not provide a general change data capture interface for tables (up to current version HANA 2 SPS 02).
That means that, to detect "changed records since a given point in time", some other approach has to be taken.
Depending on the information in the tables, different options can be used (see the sketch after this list):
if a table explicitly contains a reference to the last change time, this can be used
if a table has guaranteed update characteristics (e.g. no in-place updates and monotonically increasing ID values), this can be used, e.g. read all records where the ID is larger than the last processed ID
if the table does not provide intrinsic information about change time, one could maintain a copy of the table that contains only the records processed so far; this copy can then be used to compare against the current table and compute the difference. SAP HANA's Smart Data Integration (SDI) flowgraphs support this approach.
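As a sketch (schema, table, and column names are hypothetical placeholders), the first two options boil down to queries like:

-- "monotone key" option: assumes the ID only ever grows and rows are never updated in place
SELECT * FROM "MY_SCHEMA"."SOURCE_TABLE"
WHERE "ID" > :last_processed_id
ORDER BY "ID";

-- "last change time" option: assumes a CHANGED_AT column is maintained by the application
SELECT * FROM "MY_SCHEMA"."SOURCE_TABLE"
WHERE "CHANGED_AT" > :last_run_timestamp;

Here :last_processed_id and :last_run_timestamp are whatever watermark your replication job stored from the previous run.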
In my experience, efforts to try to "save time and money" on this seemingly simple problem of a delta load usually turn out to be more complex, time-consuming and expensive than using the corresponding features of ETL tools.
It is possible to create a log table and organize its columns according to your needs, so that by creating triggers on your database tables you can write a log record with timestamp values for every change. Then you can query your log table to determine which records were inserted, updated or deleted in your source tables.
For example, the following is one of my test triggers:
CREATE TRIGGER "A00077387"."SALARY_A_UPD"
AFTER UPDATE ON "A00077387"."SALARY"
REFERENCING OLD ROW MYOLDROW, NEW ROW MYNEWROW
FOR EACH ROW
BEGIN
    INSERT INTO SalaryLog (Employee, Salary, Operation, DateTime)
    VALUES (:mynewrow.Employee, :mynewrow.Salary, 'U', CURRENT_DATE);
END;
You can create AFTER INSERT and AFTER DELETE triggers as well, similar to the AFTER UPDATE one (see the sketch below).
You can organize your log table so that you can track more than one table if you wish, just by keeping the table name, PK fields and values, operation type, timestamp values, etc.
But it is better and easier to use separate log tables for each source table.
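For completeness, a matching AFTER DELETE trigger could look roughly like this (untested sketch; it mirrors the update trigger above but reads the OLD row, since there is no NEW row on delete):

CREATE TRIGGER "A00077387"."SALARY_A_DEL"
AFTER DELETE ON "A00077387"."SALARY"
REFERENCING OLD ROW MYOLDROW
FOR EACH ROW
BEGIN
    INSERT INTO SalaryLog (Employee, Salary, Operation, DateTime)
    VALUES (:myoldrow.Employee, :myoldrow.Salary, 'D', CURRENT_DATE);
END;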
I am currently developing server software in C++ with a MySQL data backend. I am using the official MySQL/connector library from Oracle to work with MySQL. The connection itself is working and I'm not having any issues with that.
My problem is that the database and the table schemas tend to change every once in a while, because new tables and columns keep getting added. Also, existing columns may be changed for the same reason. To make sure I recognize outdated server software quickly, I wanted to add a warning when the database has changed.
My first idea was to hardcode how the database (and tables and such) should look and then check whether the current database matches the hardcoded data. But I have no clue how to achieve that.
In summary I want to be able to detect whether
A table has been added or removed
A column in a table has been altered
A column in a table has been added or removed
with as little C++ code as possible. Also it should be quite easy to maintain.
Additional information will be added when required.
I would suggest the following approach:
1) Fork and execute the mysql command-line client. Set up a pair of pipes to mysql's standard input and output.
2) At this point you should be able to execute simple commands by piping them to mysql via the standard input pipe, and read the output from the standard output pipe.
You will need to make careful notes as to the output format of each mysql command, so that you know when you finished reading its output, and you can send the next command.
3) As the first order of business, execute:
show tables;
The output that comes back will list all tables in the database. Parsing the output into a list of table names is trivial. Then execute for each table:
show create table <tablename>;
The resulting output shows all fields in the table, its keys, and constraints. Pretty much all of this table's schema. Lather, rinse, repeat, for every table.
4) In this manner you can capture a basic schema of the entire database, for comparison purposes. If necessary, use the same approach to capture the triggers, and other objects. You'll likely need to do some minor massaging of the data, and exclude a few bits. "show create table", for example, will include the current AUTO_INCREMENT values, which you can ignore.
This general approach, of driving a mysql process via its standard input and output, is a bit wobbly, of course. With a little bit of work, you can use mysql's native client library, execute all of these commands, and capture their results directly. This should be more reliable.
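If you do switch to the native client library, note that the same information is also exposed through information_schema, which saves you from parsing the output of SHOW statements. A sketch (the schema name is a placeholder):

SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, COLUMN_KEY
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'my_database'
ORDER BY TABLE_NAME, ORDINAL_POSITION;

Comparing this result set between the expected and the actual database tells you which tables and columns were added, removed, or altered.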
Suppose I have a dynamo table named Person, which has 2 fields, name (string), age (int). Let's assume it has a TB worth of data and experiences a small amount of read throughput, but a ton of write throughput. Now I want to add a new field called Phone (string). What is the best way to go about moving the data from one table to another?
Note: Dynamo doesn't let you rename tables, and fields cannot be null.
Here are the options I think I have:
Dump the table to .csv and run a script (overnight, probably, since it's a TB worth of data) to add a default phone number to this file. (Not ideal; this will also lose any new data submitted to the old table, unless I bring the service offline to perform the migration, which is not an option in this case.)
Use the SCAN API call. (SCAN will read all values, and then consume significant write throughput on the new table to insert all the old data into it.)
How can I perform a Dynamo migration on a large table without significant data loss?
You don't need to do anything. This is NoSQL, not SQL (i.e. there is no idiomatic way to do this, as you normally don't need migrations for NoSQL).
Just start writing entries with the additional key.
Records you get back that were written before will not have this key; what you normally do is use a default value when it is missing.
If you want to backfill, just go through and read each item and put it back with the additional field. You can do this in one run via a scan, or again do it lazily when accessing the data.
I have a Borland C++ Builder 6 application calling an Oracle 10g database, operating over a LAN. When the application in question makes a simple DB select, e.g.
select table_name from element_tablenames where element_id = 10023842
the following is recorded as happening in Oracle (from the performance logs)
select table_name
from element_tablenames
where element_id = 10023842
then immediately (and not from C++ source code but perhaps deeper)
select table_name, element_tablenames.ROWID
from element_tablenames
where element_id = 10023842
The select statement is only called once in the TADODbQuery object, yet two queries are being performed - one to parse, and the other adds the ROWID for execution.
Over a WAN and many, many queries this is obviously a problem to the user.
Does anyone know why this might be happening, can someone suggest a solution?
Agree with Robert.
The ROWID uniquely identifies a row in a table so that the returned record can be applied back to the database with any changes (or as a DELETE).
Is there a way to identify a particular column (or set of columns) as a primary key so that it can be used to identify a row without using a ROWID?
I don't know exactly where the ROWID is coming from; it could be either the TAdoQuery implementation or the Oracle driver. But I am sure I found the reason.
From the Oracle docs:
If the database table does not contain a primary key, the ROWID must be selected explicitly when populating DataTable.
So I suspect your table does not have a primary key; either add one or select the ROWID explicitly (see the sketch below).
Either way this will solve the duplicate query problem.
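As a sketch (assuming element_id really does identify a single row, which you would have to verify first):

-- option 1: give the table a primary key
ALTER TABLE element_tablenames
  ADD CONSTRAINT element_tablenames_pk PRIMARY KEY (element_id);

-- option 2: select the ROWID explicitly so the component does not issue a second query for it
SELECT table_name, ROWID
FROM element_tablenames
WHERE element_id = 10023842;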
Since you are concerned about performance, in general: using TAdoQuery you can set the CursorType to optimize for different behaviors. This article covers this from a TAdoQuery perspective. MSDN also has an article that covers it from a general ADO perspective. Finally, the specifications from the Oracle driver can be useful.
I would recommend setting the cursor to either of the following, as they are the only ones supported by Oracle:
ctStatic - Bi-directional query produced.
ctOpenForwardOnly - Unidirectional query produced; fastest, but you can't call Prior.
You can also play with CursorLocation to see how it affects your speed.